Landing AI, a company that provides enterprise-wide transformation programs and solutions for industrial AI applications, today announced the launch of Visual Prompting. This technology takes the framework of text prompting found in technologies such as ChatGPT and brings it to computer vision. This technology shift enables users to build computer-vision systems using Visual Prompting conversations.
Visual Prompting is a feature of LandingLens, the company’s flagship product that makes computer vision easy for everyone to implement. LandingLens is an intuitive software platform that allows users to create, deploy and scale AI-powered industrial computer-vision applications — such as defect detection — faster and with higher accuracy.
The traditional AI workflow requires multiple steps: finding and labeling data, training a model and then making predictions. It is typically a long, serial process in which feedback doesn’t arrive until late in the cycle. In contrast, text interfaces like ChatGPT follow a dramatically simpler process: users give text prompts saying what they want and get answers quickly, then rapidly iterate to gain more insight or clarity. In a similar vein, Visual Prompting reduces the time to get a computer vision result from days or months to minutes or even seconds.
“Our Visual Prompting product makes it easy for users to develop new vision applications,” Andrew Ng, founder and CEO of Landing AI, told VentureBeat. “Visual Prompting takes the transformational technology from text to images and will democratize the creation of AI and make it easier for everyone to use custom AI systems tuned to their own data and application.”
Landing AI’s customers have already seen the benefits of Visual Prompting in their projects. For example, Utilight, a company that develops advanced laser technologies for solar cell manufacturing, used Visual Prompting to detect defects on solar cells that were previously undetectable by conventional methods.
Landing AI claims that this technology is the world’s first commercial Visual Prompting capability, and was developed by its team of experts in computer vision and natural language processing (NLP).
Landing AI’s Visual Prompting is available as a public beta now. In addition, the company has partnered with top tech executives in multiple industries, including manufacturing, life sciences, satellite imagery and retail, to test and apply this technology.
Streamlining computer vision through generative AI
Building AI models has traditionally been a complex and lengthy endeavor, often involving multiple steps such as data labeling, model training, and deployment before getting any predictions. But a new innovation, Visual Prompting, aims to transform the way computer vision systems are created.
Visual Prompting represents a significant shift in the AI development workflow by simplifying and accelerating the creation of computer vision models. With this technology, developers can leverage visual cues to quickly and efficiently label data, reducing the time needed for this crucial step.
The novel technology is inspired by recent advances in generative AI text interfaces like ChatGPT, which streamline the process of iterating text to gain valuable insights. By leveraging such innovations, Landing AI’s developers created a powerful new tool that enables faster and more efficient iterations of AI models.
“In the prompting workflow, you come up with a simple ‘visual prompt’ and in seconds can start getting predictions. This enables a much faster speed of development of applications,” Andrew Ng, founder and CEO of Landing AI, told VentureBeat. “The GPT-3 moment — where prompting makes it easy to develop new applications — isn’t here yet for computer vision, but I believe visual prompts will get us closer.”
Ng explained that the Visual Prompting technology enables users to specify a visual prompt by painting over the object classes they wish the system to detect, using just one or a few images. The algorithm then immediately begins making inferences based on the user-provided prompt. If the initial results are subpar, users can immediately revise the prompt to refine their model, guiding the AI system toward better recognition by highlighting the specific pixels they want to improve.
“This powerful functionality empowers users to fine-tune their models with ease, resulting in faster iterations and greater accuracy,” Ng added. “If the results look good, you can also deploy to a cloud API endpoint, perhaps in tens of seconds. This means that you can get a first model up and running perhaps in minutes or at most a small number of hours, and use that to keep iterating and improving its performance.”
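The paint-and-refine loop Ng describes can be illustrated with a toy sketch. This is not Landing AI's implementation and uses no Landing AI API: it stands in for their pretrained vision transformers with raw pixel colors, and propagates a painted prompt to every pixel by nearest-neighbor matching. All names here are hypothetical, for illustration only.

```python
# Toy illustration of a visual-prompting loop (hypothetical, NOT Landing AI's
# system): the user paints a few pixels per class, and every remaining pixel
# is classified by its nearest painted example. A real system would use rich
# pretrained features instead of raw colors.
import numpy as np

def propagate_prompt(image, prompt):
    """image: (H, W, C) float array of per-pixel features.
    prompt: (H, W) int array; -1 = unpainted, k >= 0 = painted as class k.
    Returns a full (H, W) class prediction via 1-nearest-neighbor."""
    H, W, C = image.shape
    feats = image.reshape(-1, C)                 # per-pixel "features"
    flat = prompt.reshape(-1)
    painted = flat >= 0
    protos, labels = feats[painted], flat[painted]
    # Distance from every pixel to every painted example
    dists = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)].reshape(H, W)

# Tiny example: a 4x4 image whose left half is dark and right half bright.
img = np.zeros((4, 4, 3))
img[:, 2:] = 1.0
prompt = np.full((4, 4), -1)
prompt[0, 0] = 0   # paint one dark pixel as class 0 ("background")
prompt[0, 3] = 1   # paint one bright pixel as class 1 ("defect")
pred = propagate_prompt(img, prompt)
# The prompt propagates: left half -> class 0, right half -> class 1.
```

If the prediction were wrong in some region, the user would paint a few more pixels there and rerun, which is the fast iteration loop the article describes.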
Democratizing the creation of AI
Landing AI’s Visual Prompting tool leverages multiple state-of-the-art pretrained vision transformer models developed in-house by a team of expert computer vision and NLP researchers. With this technology breakthrough, the company is poised to transform computer vision workflows and make AI creation accessible to everyone.
Ng believes that the success of text prompting in transforming natural language processing has paved the way for explosive innovation in the field of computer vision.
“Many groups in computer vision are exploring how to take the ideas of text prompting and adapt them to vision. For example, Meta’s recent work on SAM (segment anything) was a great piece of work on using prompting for the task of image segmentation,” he added. “There’s still much work ahead, and I expect this technology to continue to rapidly improve through our work and the work of many others.”
The company said the tool is already being tested across multiple industries for various use cases. One standout example involves a leading multinational pharmaceutical company that utilized the technology to create a real-time proof of concept for a computer vision project focused on estimating crystal shape and growth from laboratory images. This enabled them to develop models on a small dataset while removing the annotation burden.
Recently, therapeutic antibody discovery firm OmniAb used the LandingLens platform with Visual Prompting to analyze individual cells in honeycomb-like arrays, a task that previously required hours of hand-labeling hundreds of hexagonal shapes.
“With Visual Prompting, our team can build new models with greater ease, which allows us to build tailored models for new applications of our technology. The greatest impact of Visual Prompting has been in use cases where it would be laborious to exhaustively label all features, such as cells in our high throughput screening platform,” Bob Chen, senior director at OmniAb, told VentureBeat. “Thanks to Visual Prompting’s intuitive prompt interface, we can achieve high-quality results in a fraction of the time and with significantly reduced effort.”
Current challenges and Landing AI’s future plans
Ng acknowledged that, as a beta release, the software doesn’t work for every use case. However, of the 40 use cases Landing AI analyzed, Visual Prompting and its post-processing capabilities were sufficient for more than two-thirds.
“One limitation of our current system is that it is better at distinguishing between classes with different textures/colors than shape features; this reflects a limitation of the pre-trained vision transformers we’re using,” he said. “We’re continuing to work to improve the system.”
Ng added that the company plans to improve the new capability based on user feedback and interactions.
“We’ll keep improving visual prompting and are eager to engage with the community to keep developing this technology together,” he said. “As vision transformers improve, we’ll also keep looking into how to incorporate the latest ideas to further help our customers.”
The post Andrew Ng’s Landing AI makes it easier to create computer vision apps with Visual Prompting appeared first on VentureBeat.