A new era of autonomous AI agents has begun. At Google I/O 2024, the tech giant unveiled Project Astra, a first-of-its-kind initiative to develop universal AI agents capable of perceiving, reasoning, and conversing in real time.
“Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall,” said Google DeepMind chief Demis Hassabis, in a blog post.
Hassabis added that, with this release, it is easy to envision a future in which people have an expert AI assistant by their side, through a phone or glasses.
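To make the idea in Hassabis's description more concrete, here is a minimal Python sketch of merging video and speech inputs into a single timeline of events and caching it for recall. It is an illustration only, not Google's implementation: the names `Event` and `TimelineCache` are hypothetical, plain strings stand in for encoded video frames and transcribed speech, and a simple keyword match stands in for whatever recall mechanism the real prototype uses.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One entry on the timeline (hypothetical structure)."""
    timestamp: float  # seconds since the session started
    source: str       # "video" or "speech"
    content: str      # stand-in for an encoded frame or a speech transcript


class TimelineCache:
    """Keeps recent events in time order so they can be recalled later."""

    def __init__(self, max_events: int = 1000):
        self.max_events = max_events
        self._events = []  # always kept sorted by timestamp

    def add(self, event: Event) -> None:
        # Insert by timestamp so video and speech interleave into one timeline,
        # even when the two streams arrive slightly out of order.
        keys = [e.timestamp for e in self._events]
        index = bisect.bisect_right(keys, event.timestamp)
        self._events.insert(index, event)
        # Drop the oldest event once the cache is full.
        if len(self._events) > self.max_events:
            self._events.pop(0)

    def timeline(self) -> list:
        return list(self._events)

    def recall(self, keyword: str):
        """Return the most recent event mentioning keyword, or None."""
        needle = keyword.lower()
        for event in reversed(self._events):
            if needle in event.content.lower():
                return event
        return None


if __name__ == "__main__":
    cache = TimelineCache(max_events=100)
    cache.add(Event(1.0, "video", "glasses on the desk next to an apple"))
    cache.add(Event(3.0, "speech", "what does this part of the code do?"))
    cache.add(Event(2.0, "video", "monitor showing code"))
    print(cache.recall("glasses"))
```

Keeping the events sorted by time is what allows a later question such as "where did I leave my glasses?" to be answered from the most recent matching observation.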
We’re sharing Project Astra: our new project focused on building a future AI assistant that can be truly helpful in everyday life. 🤝
Watch it in action, with two parts – each was captured in a single take, in real time. ↓ #GoogleIO pic.twitter.com/x40OOVODdv
— Google DeepMind (@GoogleDeepMind) May 14, 2024
The release comes just a day after OpenAI unveiled GPT-4o, which won hearts online with its ‘omni’ capabilities across text, vision, and audio. OpenAI’s demos, which included a real-time translator, coding assistant, AI tutor, friendly companion, poet, and singer, soon became the talk of the town.
However, its agentic capabilities in particular have caught everyone’s attention, with some even calling it ‘the biggest part of the update’ and ‘a step closer to autonomous agents’.
The GPT-4o desktop app can read your screen in real time and interact with your OS, which could change the way people work. The app allows for voice conversations, screenshot discussions, and instant access to ChatGPT. It’s like having an AI teammate on your device who can help you with whatever you’re working on.
The ChatGPT desktop app just became the best coding assistant on the planet.
Simply select the code, and GPT-4o will take care of it.
Combine this with audio/video capability, and you get your own engineer teammate. pic.twitter.com/g4fWcbhXy2
— Pietro Schirano (@skirano) May 13, 2024
OpenAI president and co-founder Greg Brockman also demonstrated human-computer interactions (and even human-computer-computer interactions), giving users a glimpse of pre-AGI vibes.
Introducing GPT-4o, our new model which can reason across text, audio, and video in real time.
It's extremely versatile, fun to play with, and is a step towards a much more natural form of human-computer interaction (and even human-computer-computer interaction): pic.twitter.com/VLG7TJ1JQx
— Greg Brockman (@gdb) May 13, 2024
You can get different instances of GPT-4o to interact with each other. The model can be interrupted in real time, change its emotion, and even adjust its response with little to no latency. All this is a big breakthrough for building AI agents.
The ability to hold a real-time conversation with a voice agent that understands the emotion in a person’s voice, and that can be interrupted with no lag, makes GPT-4o extremely helpful for building voice- and vision-enabled smart agents.
A promising application is customer service, including a new type of technical support in which customers walk through their problems via video stream, allowing the agent to troubleshoot them in real time with the customer.
This was a fun one! Take a look at 2 AI agents resolving a customer service claim with #OpenAI new #GPT4o.
Working with customers to build transformational solutions always gets me fired up. The potential solutions we can build with this new SOTA model has my head spinning! pic.twitter.com/86SNgNI6Tl
— Joe Beutler (@JoeBeutler) May 14, 2024
These developments show that with GPT-4o, the future is poised to be agent-to-agent. However, with its latest release, it’s clear that Google has also gone all in on AI agents, deploying them across the company’s product ecosystem.
From an agent that continuously organises all the receipts in your inbox into a spreadsheet to one that returns your orders, Google has it all. Use cases for the assistant also include multi-step research and reasoning, and even shopping to prepare a meal plan. Need to do something as tedious as updating your address? It has you covered there too, with a browser agent that works across dozens of external websites to update it for you.
Google also introduced an AI Teammate who lives inside Google Workspace to do collaborative tasks.
TLDR: Google is ALL IN on AI agents
AI agents are deployed across their whole product ecosystem.
8 wild demos from Google I/O today:
1. An email agent to continuously organise all receipts in your inbox into a spreadsheet pic.twitter.com/A4ij23uOV1
— Chief AI Officer (@chiefaioffice) May 14, 2024
Despite all this, Google’s Project Astra and AI agent developments have received mixed responses online.
On the one hand, people appreciate Astra’s long context support, its memory (‘to remember where the glasses were’), and its native video processing capabilities, in contrast to GPT-4o, which some contend processes only a single frame at a time.
Many are also saying that with these advanced AI email, browser, and search agent demos, as well as an AI Teammate, Google will likely obliterate many startups focusing on email and browser-based agents.
“One thing Google is doing right: they are finally making serious efforts to integrate AI into the search box. I sense the agent flow: planning, real-time browsing, and multimodal input, all from the landing page. Google’s strongest moat is distribution. Gemini doesn’t have to be the best model to be the most used one in the world,” wrote a user on X.
Not just perplexity but every other vertically focused tool that google touches
Google meet + Gemini – competes against zoom, even gong
Gmail + Gemini – competes against any other AI email assistant
Directly competing against any one of googles offerings will be challenging
— Jerry Liu (@jerryjliu0) May 14, 2024
On the other hand, some are not impressed with Google Astra’s slightly longer latency and are even sceptical about whether the real product will match the ‘too good to be true’ promises made in the demo.
“Remember the last time Google demo’d (sic) their AI it was all a complete lie,” wrote a user online. “Google promised a lot in events but never released like OpenAI does,” added another. Many even went so far as to call the demo an advertisement rather than an actual demo.
Because Google used a pre-recorded demo, whereas OpenAI demonstrated GPT-4o live, some are taking the claims with a pinch of salt, at least until they get to test the product.
After watching Google I/O, it's safe to say what OAI showed yesterday was mind-blowing!!🤯🤯
Astra is a prototype voice assistant and seemed like a 2-year-old baby to OAI's Scarlett Johansson!!
— Bindu Reddy (@bindureddy) May 14, 2024
Everybody Is Bullish on ‘AI Agents’
Regardless of who wins this race, the important thing is that everybody seems to be bullish on AI agents, and soon, we might see a lot more interesting developments take shape.
Recently, venture capitalist Vinod Khosla predicted a future in which most consumer access to the internet will be through agents that act for consumers, carrying out tasks and fending off marketers and bots. “Tens of billions of agents on the internet will be normal,” he wrote.
Similarly, Meta CEO Mark Zuckerberg highlighted the evolving role of AI agents in customer interactions, envisioning a future where businesses and creators each have their own AI to represent their interests.
“A lot of people talk about the ‘ChatGPT moment’, where you’re like ‘Wow, never seen anything like this’. Many people will have kind of a ‘Wow, I couldn’t imagine an AI agent doing this’ moment,” said DeepLearning.AI founder Andrew Ng at Sequoia Capital’s AI Ascent.
Looks like it is finally happening.