You Don’t Mess with Google

‘Attention’ has returned to Google

Share

Illustration by Nikhil Kumar

Published on August 5, 2024

by Siddharth Jindal

The headline seems apt for Google at the moment. Google DeepMind’s latest model, Gemini 1.5 Pro’s experimental version, 0801, recently claimed the top spot in Arena, surpassing GPT-4o and Claude-3.5 with an impressive score of 1300. It also achieved first place on the Vision Leaderboard.

We have an experimental updated version of Gemini 1.5 Pro that is #1 on the @lmsysorg Chatbot Arena. This model is a significant improvement over earlier versions of Gemini 1.5 Pro (it cracks into 1300+ elo score territory).

I'm really proud of the whole team of people that… https://t.co/0uczGKBlFr
— Jeff Dean (@🏡) (@JeffDean) August 1, 2024

It excels in multilingual tasks and delivers a robust performance in technical areas like math, hard prompts, and coding. This version of Gemini 1.5 Pro is available for early testing and feedback in Google AI Studio and the Gemini API, where developers can try it out.

“Very impressed with the extraction capabilities from images and PDFs. It does feel super close to what GPT-4o gives in terms of vision capabilities and what Claude 3.5 Sonnet gives me on code generation and PDF understanding/reasoning,” said Elvis Saravia, co-founder, DAIR.AI.

Dang, the new Gemini model is quite good

I gave it an excellent, complicated game source book PDF and asked it to roll up a character, referencing many pages. There may be errors, but wow. First time I have seen an AI get so far

Interestingly, it decided to play… a secret AI pic.twitter.com/XkAs4Ps9YU
— Ethan Mollick (@emollick) August 2, 2024

“The new Gemini model can execute code and is quite ‘sharp.’ We might see it become the first competitor to the Code Interpreter,” said Ethan Mollick, assistant professor at Wharton. He further said that, while experimenting with it for data analysis, some capabilities seemed odd, like the fact that it couldn’t access files unless they are uploaded each time. “Nevertheless, it seems promising so far,” he said.

Most recently, Google also upgraded its free tier Gemini model with 1.5 Flash, which brings improvements in quality and latency, with especially noticeable enhancements in reasoning and image understanding. The tech giant also expanded the context window in Gemini Advanced, quadrupling it to 32K tokens.

Following in the footsteps of its competitor OpenAI, Google has announced improvements to Gemini 1.5 Flash, reducing input costs by up to 85% and output costs by up to 80%, effective from August 12, 2024.

This is unreal. We're legitimately getting to a point where intelligence might be too cheap to meter

Gemini flash will soon cost ~$0.05/1m tokens

For reference, ~2 years ago gpt-3.5 was $0.06$/1k tokens

In time we got 100x cheaper models that are 10x smarter pic.twitter.com/C2qxJThDIz
— Sully (@SullyOmarr) August 2, 2024

Apart from focusing on the LLMs, Google entered the SLM scene as well by releasing Gemma 2. This new model outperformed GPT-3.5 on the Chatbot Arena leaderboard and is optimised for efficient deployment across a wide range of hardware, from edge devices to robust cloud environments. Leveraging NVIDIA’s TensorRT-LLM library optimisations, it supports deployments in data centres, local workstations, and edge AI applications.

InflectionAI, AdeptAI, CharacterAI. Who’s next?

Google recently brought back Noam Shazeer, one of the original authors of the famous Transformers paper ‘Attention is All You Need’, who had left Google to create his own chatbot company, Character.AI. Surprisingly, Character.AI has more web traffic in the US than Gemini, according to Similarweb.

This is similar to what Microsoft and Amazon have done with Inflection and Adept AI, respectively. Microsoft hired much of Inflection’s leadership team and employees, agreeing to pay a licensing fee of approximately $650 million. Meanwhile, Amazon hired Adept’s co-founder and CEO David Luan, along with several other co-founders and key team members.

What is going on with large tech companies 'acquiring' AI companies (Inflection, Adept, now Character)? They license their technology, hire the co-founders and researchers, but leave a shell of a company that still exists. It must be an antitrust/regulatory arb pic.twitter.com/7Y2j0O463S
— Luke Michael Byrne (@byrnemluke) August 2, 2024

“Absolutely delighted to welcome my longtime Google colleague Noam Shazeer back to Google!” said Jeff Dean, Google Deepmind’s chief scientist.

Similarly, others from Deepmind followed suit in welcoming Shazeer back. “Welcome back Noam Shazeer to Google! It’ll be a great time working together again since 2018. Let’s take Gemini which is #1 and continue expanding the limits of its capabilities,” said Dustin Tran, research scientist at Google Deepmind.

In a new arrangement, Google will pay a licensing fee to Character.AI for its models. Around 30 of Character.AI’s 130 staff members, specialising in model training and voice AI, will join Google to advance its Gemini AI initiatives.

Interestingly, Character.AI recently introduced Character Calls, allowing users to engage in two-way voice conversations with their favourite characters, like Michael Jackson, Sherlock Holmes, or Albert Einstein. “It’s like having a phone call with a friend,” the company said.

“So this also means that google is getting into the “ai companion” market to compete with openai’s “her” lol,” one user posted on X in response.

Character.AI allows users to create their own chatbots, defining their personalities, traits, and backstories. These characters can be based on real people, fictional characters, or entirely new creations.

This is similar to what Meta AI is doing by introducing customised AI characters across its social media platforms. Meta recently introduced AI Studio, a new platform that allows users to create, share, and discover custom AI characters without needing technical skills.

OpenAI recently rolled its advanced voice mode to a small group, which has been received well with the users finding several usecases for it. However, GPT-4o is still not multimodal yet.

“Advanced voice mode on ChatGPT is very impressive, but it doesn’t yet have the full features enabled: no multimodal video or images in, no GPTs, no new image generator, no code interpreter or internet access,” said Mollick.

On the other hand, Google Deepmind’s latest models AlphaProof and AlphaGeometry 2 solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medalist in the competition.

The word on the street is that OpenAI is also working on advanced reasoning capabilities for its next frontier model with ‘Project Strawberry’, which is most likely to be GPT-5. However, it appears that it is not coming out anytime soon.

“Little birdies told me Q* hasn’t been released yet as they aren’t happy with the latency and other little things they want to further optimise,” posted an insider who goes by the name Jimmy Apples on X.

“Not sure if it’s big patience or small patience.”

📣 Want to advertise in AIM? Book here

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.