At Google I/O, the tech giant revealed Project Astra, which is Google’s attempt at creating a new generation of AI assistants, or agents, that go beyond just understanding natural language to actually taking actions and completing multi-step tasks.
Notably, Astra is widely seen as Google’s response to OpenAI’s GPT-4o, which demonstrated similar capabilities. Before long, AI agents could be executing tasks on your behalf.
These recent advancements indicate that AI agents are something tech companies have been working towards. In the not-so-distant future, your favourite applications could have an AI agent built in, with which you can converse to order food, book a cab, or even make payments on your behalf.
Identifying early where the industry is headed, Bengaluru-based deep tech startup KOGO AI has developed a platform that helps companies build AI agents that can converse in Indic languages.
KOGO, which started out as an AI travel app called Mappls, has since expanded its focus to enterprise AI solutions.
The platform, known as the KOGO AI Operating System (OS), is a low-code environment that allows companies to build an AI agent from scratch within minutes.
The agents will initially have conversational capabilities in Urdu, Hindi and English. However, another 73 languages, both Indian and global, are expected to be added soon.
For this, the Bengaluru-based startup has partnered with Bhashini, the Indian government’s initiative to break language barriers in India, and Microsoft to make the agents multilingual.
“Today, if a company, whether it’s a developer, system integrator, or large enterprise, intends to develop an AI agent from the ground up, it could take them months, depending on the complexity of the use case.
“We’ve developed an OS that allows you to utilise pre-built building blocks, enabling you to create an agent within minutes for simple use cases and efficiently tackle complex ones. Think of us as similar to Xcode, where you can come in and develop AI agents or AI applications,” KOGO AI CEO Raj K Gopalakrishnan told AIM.
Leveraging Large Action Models (LAM)
KOGO OS uses large action models (LAMs) to create AI agents, a development that likely represents the next iteration in the AI journey.
LAMs are a type of AI system that can understand human intentions and take action to accomplish tasks. Unlike LLMs, which primarily generate outputs, LAMs can execute actions by interfacing with applications, websites, and other systems.
At its core, a LAM employs a hierarchical approach to action representation and execution. It decomposes complex actions into smaller sub-actions, facilitating efficient planning and execution.
“You can’t use ChatGPT to book a flight ticket, make a reservation at a hotel or initiate a refund for a cancelled transaction. But with LAMs, it is possible. So LAMs can do all these things, and they use LLMs as a base,” Gopalakrishnan said.
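The hierarchical decomposition described above can be sketched in a few lines. This is a minimal illustration, not KOGO’s actual implementation: the plan table and function names (`PLANS`, `execute_task`) are hypothetical, and real LAMs would derive sub-actions from a model rather than a hard-coded dictionary.

```python
# Illustrative sketch of LAM-style hierarchical task decomposition.
# PLANS and the step names are invented for the example.

PLANS = {
    "book_flight": ["search_flights", "select_fare",
                    "enter_passenger_details", "pay"],
    "hotel_reservation": ["search_hotels", "pick_room", "pay"],
}

def decompose(task: str) -> list[str]:
    """Break a high-level intent into ordered sub-actions."""
    return PLANS.get(task, [task])  # atomic tasks pass through unchanged

def execute_task(task: str) -> list[str]:
    """Recursively expand each sub-action, returning an execution log."""
    log = []
    for step in decompose(task):
        if step in PLANS:            # composite step: expand further
            log.extend(execute_task(step))
        else:                        # leaf step: a tool/API call would go here
            log.append(f"executed: {step}")
    return log
```

In a production system, each leaf step would interface with an external application, website, or payment system, which is precisely what distinguishes a LAM from an LLM that only generates text.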
The KOGO OS platform also leverages multiple small language models (SLMs), which are trained and hosted locally by the startup, and commercially available LLMs.
Even though the platform leverages LLMs now, eventually, it will be powered by the company’s proprietary models, according to Gopalakrishnan. “Currently, we are leveraging LLMs for its intelligence and knowledge wrapping, not for its data,” he said.
A Swarm of SLMs
KOGO’s platform features an orchestration graph that assembles the AI agent, deciding whether to utilise an SLM or query an LLM for a given task. It also connects to the contextual data of the particular enterprise.
“We achieve this by creating a swarm of SLMs. Think of it like a school of fish, where each tiny fish handles small tasks, and, together, they work synchronously to deliver faster, domain-specific results,” Gopalakrishnan said.
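The routing idea behind the swarm can be sketched as a simple dispatch table. This is a hedged illustration only; the registry, model names, and domains below are placeholders, not KOGO’s actual models.

```python
# Minimal sketch of routing subtasks across a "swarm" of domain SLMs,
# falling back to a general-purpose LLM. All names are hypothetical.

SLM_REGISTRY = {
    "billing": "slm-billing-v1",
    "travel": "slm-travel-v1",
}

def route(domain: str) -> str:
    """Pick a small domain-specific model when one exists, else the LLM."""
    return SLM_REGISTRY.get(domain, "general-llm")

def dispatch(subtasks: list[tuple[str, str]]) -> dict[str, str]:
    """Map each (subtask, domain) pair to the model best suited for it."""
    return {task: route(domain) for task, domain in subtasks}
```

Each “fish” in the school handles only the domain it was trained for, which is what enables the faster, domain-specific results Gopalakrishnan describes.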
When it comes to enterprise data, AI agents can handle various data formats, including vector databases, PDFs, unstructured data, and CSV files. They’re also compatible with over 600 different types of apps.
“The system ingests and understands your requirements, whether you are part of the workforce or a business, and completes the task,” he added.
Because SLMs are smaller, more agile, and trained for specific domains, they can perform functions faster. And since they are trained on narrowly scoped data, the chances of hallucination are reduced.
Over the last few months, we have seen many smaller models popping up, including Llama 3 8B. Microsoft, too, has released several SLMs, including the recent Phi-3-mini, which has just 3.8 billion parameters.
The tech giant is making these models accessible to its customers through Azure because it believes such smaller models are cost-efficient and perform certain functions more effectively.
Now, SLMs are proving to be a useful tool for KOGO in developing its platform.
What AI Agents Can Do
Venture capitalist Vinod Khosla envisioned a future where internet access will primarily be through agents. He predicted that most consumer interactions online would involve agents performing tasks on their behalf and protecting them from marketers and bots.
So far, KOGO has carried out 14 proofs of concept (PoCs), spanning use cases such as business intelligence and customer experience.
Moreover, around half of these PoCs are expected to go live this quarter. Gopalakrishnan also revealed that three large system integrators (SIs) are carrying out another six to seven PoCs with their customers, as well as internally.
“One of the PoCs involves an apparel brand, which, for instance, processes around 200 transactions daily, totalling 700 weekly. Each transaction varies with different items and charges, and payment gateways apply different fees based on payment methods.
“Reconciling these transactions, GST charges, and gateway fees is complex and time-consuming for accounting departments. Our AI agent streamlines this entire process, reducing the time from days to minutes by handling repetitive, mundane tasks efficiently,” Gopalakrishnan said.
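The reconciliation task described above lends itself to a short sketch. This is an illustration of the general technique, not KOGO’s agent: the fee rates, GST treatment, and field names are invented for the example.

```python
# Hedged illustration of payment reconciliation: match each order's
# expected payout (amount minus gateway fee and GST on that fee)
# against what the gateway actually paid out. Rates are hypothetical.

FEE_RATES = {"card": 0.02, "upi": 0.00}   # gateway fee by payment method
GST = 0.18                                 # GST applied on the fee

def expected_payout(amount: float, method: str) -> float:
    """Amount the merchant should receive after fees and GST on fees."""
    fee = amount * FEE_RATES[method]
    return round(amount - fee * (1 + GST), 2)

def reconcile(orders, payouts):
    """Return ids of orders whose payout differs from the expected amount.

    orders:  list of (order_id, amount, payment_method)
    payouts: dict of order_id -> amount actually received
    """
    mismatches = []
    for oid, amount, method in orders:
        if abs(payouts.get(oid, 0.0) - expected_payout(amount, method)) > 0.01:
            mismatches.append(oid)
    return mismatches
```

An agent automating this loop over hundreds of daily transactions is what turns a days-long accounting chore into a minutes-long one.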
Going forward, as AI agents become more advanced and ubiquitous, they are expected to completely change how we interact with machines. While the prospect is exciting, it’s important to carefully consider the ethical implications.