INFOFLA: Overcoming the Limitations of Traditional AI with VLM-Based Automation

Amazon Web Services (AWS), a leading player in the cloud market, hosted the ‘AWS Summit Seoul 2024’ on May 16-17 at COEX, Seoul. This event, which sheds light on the current state of Korea’s cloud ecosystem and AWS’s vision, began in 2015 and has now reached its 10th year.
This year’s AWS Summit featured over 60 partner booths and welcomed approximately 29,000 attendees, making it the largest cloud-related event in Korea. It drew particular attention for the many innovative technologies, products, and services related to Generative AI that were introduced.
INFOFLA CEO Choi In-mook presenting at the AWS Summit / Source: IT Donga
The event also featured case studies from AWS ecosystem partners developing AI-related products and services. Among them were not only large corporations but also startups and SMEs. On the 17th, INFOFLA (CEO Choi In-mook) delivered a presentation in the EXPO session titled ‘Journey of Generative AI-Based IT Service Automation Technology and Cloud Service Development.’
In his presentation, CEO Choi highlighted the potential and limitations of Robotic Process Automation (RPA) solutions, which automate repetitive tasks. He pointed out that non-developers need expert assistance to use existing script-based RPA solutions effectively. He also noted that RPA-based automation of repetitive web and app tasks struggles with unexpected pop-up windows or screen disruptions, which often cause the RPA to stop functioning.
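The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Screen` and `run_script` are hypothetical names invented for this example, not part of any RPA product, and the "screen" is a toy object rather than a real application window.

```python
from dataclasses import dataclass, field


@dataclass
class Screen:
    """Toy stand-in for an application window (hypothetical, for illustration)."""
    popup_open: bool = False
    fields: dict = field(default_factory=dict)

    def type_into(self, name: str, text: str) -> None:
        # A modal pop-up blocks input to everything behind it.
        if self.popup_open:
            raise RuntimeError(f"cannot reach field '{name}': pop-up is in the way")
        self.fields[name] = text


def run_script(screen: Screen, steps: list[tuple[str, str]]) -> int:
    """Replay a fixed list of (field, text) steps; return how many completed."""
    done = 0
    for name, text in steps:
        try:
            screen.type_into(name, text)
        except RuntimeError:
            break  # the script has no branch for this situation, so it halts
        done += 1
    return done


steps = [("name", "Kim"), ("email", "kim@example.com")]
print(run_script(Screen(), steps))                 # no pop-up on screen
print(run_script(Screen(popup_open=True), steps))  # pop-up already open
```

The fixed script completes every step on a clean screen but makes no progress once a pop-up is present, because nothing in the step list anticipates it.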
VLM can overcome the limitations of traditional RPA
INFOFLA proposed ‘VLM (Vision Language Model)’ as a solution to overcome these RPA limitations. VLM enhances Large Language Models (LLM) with image processing capabilities, allowing automation purely through visual recognition without scripts. It also supports remote environments, effectively addressing many of the constraints of traditional RPA.
Moreover, VLM can recognize and respond to unexpected changes in a way similar to human intuition, and its capabilities improve over time through continuous learning. For instance, in a scenario where VLM automates repetitive text input on a webpage, if a pop-up window appears, VLM can analyze the situation, close the pop-up, and continue entering text in the designated field. While traditional RPA could also be programmed to handle such cases, the more scripts are added, the higher the chances of errors.
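The pop-up scenario above amounts to an observe, decide, act loop. The following is a minimal sketch, not INFOFLA's implementation: `Page`, `choose_action`, and `run_agent` are hypothetical names, and `choose_action` is a two-rule stand-in for the decision a vision language model would make from a screenshot.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy web page; a real system would work from screenshots instead."""
    popup_open: bool = False
    text_field: str = ""


def choose_action(page: Page, remaining: str) -> str:
    """Hypothetical stand-in for the model's decision; here just two rules."""
    if page.popup_open:
        return "close_popup"
    if remaining:
        return "type_char"
    return "done"


def run_agent(page: Page, text: str, popup_at: int = -1, max_steps: int = 100) -> list[str]:
    """Observe, decide, act until the text is entered; return the action log."""
    log = []
    typed = 0
    for _ in range(max_steps):
        # Simulate an unexpected pop-up appearing partway through the task.
        if typed == popup_at and "close_popup" not in log:
            page.popup_open = True
        action = choose_action(page, text[typed:])
        log.append(action)
        if action == "close_popup":
            page.popup_open = False
        elif action == "type_char":
            page.text_field += text[typed]
            typed += 1
        else:
            break
    return log


page = Page()
print(run_agent(page, "abc", popup_at=1))
print(page.text_field)
```

Because the next action is chosen from the current state of the screen on every iteration, the interruption is handled as one more step rather than as a special case written into a script.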
On this note, INFOFLA introduced its in-house developed ‘VLAgent (VLM + Agent).’ VLAgent is an agent model capable of understanding the screen and executing commands through VLM. Unlike traditional RPAs that require manually written scripts, VLAgent autonomously creates and executes work and action plans based on AI-driven screen recognition.
This approach integrates the latest trends, including LLMs, AI agents, and business process automation. Unlike conventional AI services that simply respond to user queries, VLAgent actively executes solutions. Existing models, however, required excessive processing power for high-resolution video recognition and often lacked Korean language support, making them less practical for users in Korea.
Structure of ‘VLAgent’
To address these challenges, INFOFLA developed its own lightweight, Korean-supported 4K high-resolution solution that runs even on standard PCs and includes self-learning capabilities. VLAgent, including INFOFLA’s real-time object recognition RPA ‘RPACA,’ is now available on AWS and can be integrated into INFOFLA’s AI-based IT management system, ‘ITOMS.’
During the presentation, INFOFLA also showcased a demo video. In a Windows environment, a user entered a command asking VLAgent for directions from Konkuk University Station to Gangnam Station. The AI then took control of the mouse and keyboard, launched the Chrome web browser, accessed Google Maps, entered the start and destination points, and retrieved the route, all autonomously.
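The sequence in the demo can be written down as an explicit action plan. The sketch below is purely illustrative: `Step`, `plan_directions`, and `execute` are hypothetical names, the plan is hand-written here whereas the article describes the agent producing its own, and the executor only prints each step instead of driving a real mouse or keyboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str      # e.g. "launch", "open_url", "type", "click"
    target: str      # what the action applies to
    value: str = ""  # text to type, if any


def plan_directions(start: str, destination: str) -> list[Step]:
    """Hand-written plan mirroring the demo; an agent would generate this itself."""
    return [
        Step("launch", "Chrome"),
        Step("open_url", "Google Maps"),
        Step("type", "start point", start),
        Step("type", "destination", destination),
        Step("click", "directions"),
    ]


def execute(plan: list[Step]) -> list[str]:
    """Dry-run executor: describes each step instead of moving a real mouse."""
    lines = []
    for i, step in enumerate(plan, start=1):
        detail = f" = {step.value!r}" if step.value else ""
        lines.append(f"{i}. {step.action} -> {step.target}{detail}")
    return lines


for line in execute(plan_directions("Konkuk University Station", "Gangnam Station")):
    print(line)
```

Keeping the plan as plain data separates deciding what to do from carrying it out, so the same executor could in principle replay any plan an agent proposes.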
INFOFLA CEO demonstrating ‘VLAgent’ in action / Source: IT Donga
Wrapping up the presentation, CEO Choi emphasized, "Our solution is applicable across various fields, including business and service automation, support for the visually impaired, entertainment, education, manufacturing, healthcare, and customer service. This is the first attempt of its kind in Korea, and even globally, similar cases are rare."
IT Donga Reporter Kim Young-woo pengo@itdonga.com
Source: IT Donga
https://www.donga.com/news/It/article/all/20240518/124992204/1