A new era of autonomous AI agents has begun. At Google I/O 2024, the tech giant unveiled Project Astra, a first-of-its-kind initiative to develop universal AI agents capable of perceiving, reasoning, and conversing in real time.
“Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall,” said Google DeepMind chief Demis Hassabis, in a blog post.
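The pipeline Hassabis describes — continuously encoding video frames, merging video and speech into a single timeline of events, and caching that timeline for recall — can be illustrated with a toy sketch. All class and method names below are hypothetical, invented purely for illustration; a keyword lookup stands in for the semantic retrieval a real system would use:

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """A timestamped entry in the agent's timeline (video frame or speech)."""
    timestamp: float
    modality: str   # "video" or "speech"
    content: str    # e.g. a frame caption or a speech transcript


class TimelineCache:
    """Toy sketch of Astra-style processing: continuously ingest video
    frames and speech, merge them into one ordered timeline, and cache
    it for efficient recall later."""

    def __init__(self, max_events: int = 1000):
        # Bounded buffer: old events fall off as new ones arrive.
        self.timeline: deque = deque(maxlen=max_events)

    def ingest_frame(self, caption: str) -> None:
        # In a real system this would be an encoded video embedding,
        # not a text caption.
        self.timeline.append(Event(time.time(), "video", caption))

    def ingest_speech(self, transcript: str) -> None:
        self.timeline.append(Event(time.time(), "speech", transcript))

    def recall(self, query: str) -> list:
        # Naive keyword match standing in for semantic retrieval.
        q = query.lower()
        return [e for e in self.timeline if q in e.content.lower()]


cache = TimelineCache()
cache.ingest_frame("glasses on the desk next to a red apple")
cache.ingest_speech("where did I leave my glasses?")
print([e.modality for e in cache.recall("glasses")])
```

Because both modalities land in one time-ordered structure, a question asked in speech can be answered from something seen earlier on video — the trick behind the demo's "where did I leave my glasses?" moment.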
Hassabis added that with technology like this, it is easy to imagine a future where people have an expert AI assistant by their side, through a phone or glasses.
The release comes just a day after OpenAI unveiled GPT-4o, which won hearts online with its ‘omni’ capabilities across text, vision, and audio. OpenAI’s demos, which included a real-time translator, coding assistant, AI tutor, friendly companion, poet, and singer, soon became the talk of the town.
However, its agentic capabilities in particular have caught everyone’s attention, with some even calling it ‘the biggest part of the update’ and ‘a step closer to autonomous agents’.
The GPT-4o desktop app can read your screen in real-time and interact with your OS, revolutionising the way people work. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT. It’s like having an AI teammate on your device who can help you with whatever you’re working on.
OpenAI president and co-founder Greg Brockman also demonstrated human-computer interactions (and even human-computer-computer interactions), giving users a glimpse of pre-AGI vibes.
You can get different instances of GPT-4o to interact with each other. The model can be interrupted in real-time, change its emotion, and even adjust its response with little to no latency. All this is a big breakthrough for building AI agents.
Real-time conversation with a voice agent that can understand the emotion in a person’s voice, and that can be interrupted without lag, makes GPT-4o a strong foundation for building voice- and vision-enabled smart agents.
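The agent-to-agent pattern described above — two model instances talking to each other, with a user able to cut a response short mid-stream — can be sketched in a few lines. The `fake_model` stub below stands in for a real GPT-4o call (e.g. via the OpenAI API); nothing here is OpenAI's actual interface, it is purely an illustration of the conversation loop:

```python
def fake_model(name: str, incoming: str) -> str:
    """Stand-in for a model call: returns a canned reply to `incoming`."""
    return f"{name} heard: {incoming!r}"


def converse(turns: int, interrupt_at: int = -1) -> list:
    """Alternate two agent instances for `turns` turns, feeding each
    reply back as the next agent's input. If `interrupt_at` matches a
    turn index, that reply is truncated mid-stream, mirroring GPT-4o's
    interruptible real-time voice mode."""
    transcript = []
    message = "hello"
    speakers = ["agent_a", "agent_b"]
    for turn in range(turns):
        speaker = speakers[turn % 2]
        reply = fake_model(speaker, message)
        if turn == interrupt_at:
            # A user interruption cuts the in-flight response short.
            reply = reply[: len(reply) // 2] + " [interrupted]"
        transcript.append(f"{speaker}: {reply}")
        message = reply  # the reply becomes the other agent's input
    return transcript


for line in converse(turns=4, interrupt_at=2):
    print(line)
```

The key design point is that each agent's output is simply the other agent's next input; interruption is just truncating the current reply before it is handed over.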
A promising application is customer service, including a new kind of technical support where customers walk through their problems over a video stream and the agent troubleshoots the issue with them in real time.
These developments show that with GPT-4o, the future is poised to be agent-to-agent. With its latest release, however, it’s clear that Google has also gone all in on AI agents, deploying them across its product ecosystem.
From an agent that can continuously organise the receipts in your inbox into a spreadsheet to one that can return your orders, Google has it all. Use cases for the assistant range from multi-step research and reasoning to shopping for a meal plan. Even tedious chores are covered: a browser agent can work across multiple external websites to handle tasks like updating your address on dozens of them.
Google also introduced an AI Teammate that lives inside Google Workspace and handles collaborative tasks.
Despite all this, Google’s Project Astra and AI agent developments have received mixed responses online.
On the one hand, people appreciate Astra’s long-context support, its memory (in the demo, it recalled where a user had left their glasses), and its native video processing, compared to GPT-4o, which some contend processes only a single frame at a time.
Many are also saying that with these advanced AI email, browser, and search agent demos, as well as the AI Teammate, Google will likely obliterate many startups building email- and browser-based agents.
“One thing Google is doing right: they are finally making serious efforts to integrate AI into the search box. I sense the agent flow: planning, real-time browsing, and multimodal input, all from the landing page. Google’s strongest moat is distribution. Gemini doesn’t have to be the best model to be the most used one in the world,” wrote a user on X.
On the other hand, some are unimpressed by Astra’s slightly longer latency, and are sceptical about whether the real product will live up to the ‘too good to be true’ promises made in the demo.
“Remember the last time Google demo’d (sic) their AI it was all a complete lie,” wrote one user online. “Google promised a lot in events but never released like OpenAI does,” added another. Many went so far as to call the demo an advertisement rather than an actual demo.
Unlike OpenAI’s live demonstration, Google’s demo was pre-recorded, leading some to take its claims with a pinch of salt, at least until they can test the product themselves.
Everybody Is Bullish on ‘AI Agents’
Regardless of who wins this race, the important thing is that everybody seems to be bullish on AI agents, and soon, we might see a lot more interesting developments take shape.
Recently, venture capitalist Vinod Khosla predicted a future in which most consumer access to the internet happens through agents acting on consumers’ behalf, doing tasks and fending off marketers and bots. “Tens of billions of agents on the internet will be normal,” he wrote.
Similarly, Meta CEO Mark Zuckerberg highlighted the evolving role of AI agents in customer interactions, envisioning a future where businesses and creators each have their own AI to represent their interests.
“A lot of people talk about the ‘ChatGPT moment’, where you’re like ‘Wow, never seen anything like this’. Many people will have kind of a ‘Wow, I couldn’t imagine an AI agent doing this’ moment,” said DeepLearning.AI founder Andrew Ng at Sequoia Capital’s AI Ascent.
Looks like it is finally happening.