Last updated May 27, 2024
In AI News

Google Gemini 1.5 Crushes ChatGPT and Claude with Largest-Ever 1 Mn Token Context Window

It can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.

Share

Published on February 15, 2024

by Siddharth Jindal

Google today released Gemini 1.5. This new model outperforms ChatGPT and Claud with 1 million token context window — the largest ever seen in natural processing models. In contrast, GPT-4 Turbo has 128K context window and Claude 2.1 has 200K context window.

“We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.,” reads the blog, co-authored by Google chief Sundar Pichai and Google DeepMind chief Demis Hassabis, comparing it with existing models like ChatGPT and Claude.

Gemini 1.5 Pro comes with a standard 128,000 token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via AI Studio and Vertex AI in private preview.

It can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words. In their research, Google also successfully tested up to 10 million tokens.

Gemini 1.5 is built upon Transformer and MoE architecture. While a traditional Transformer functions as one large neural network, MoE models are divided into smaller “expert” neural networks.

Gemini 1.5 Pro’s capabilities span various modalities, from analysing lengthy transcripts of historical events, such as those from Apollo 11’s mission, to understanding and reasoning about a silent movie. The model’s proficiency in processing extensive code further establishes its relevance in complex problem-solving tasks, showcasing its adaptability and efficiency.

Gemini 1.5 Pro’s performance in the Needle In A Haystack (NIAH) evaluation stands out, where it excels at locating specific facts within long blocks of text, achieving a remarkable 99% success rate. Its ability to learn in-context, demonstrated in the Machine Translation from One Book (MTOB) benchmark, solidifies Gemini 1.5 Pro as a frontrunner in adaptive learning.

This new development comes after Google released the first version of Gemini Ultra just last week. Recently Google added generative AI features to Chrome as well. Google has introduced the “Help me Write” feature across all websites. By right-clicking on any text box, users can access the feature, prompting Google’s AI to inquire about their writing requirements and subsequently generate an initial draft.

While Google is focusing on improving its AI models, OpenAI is reportedly working on a web search product to challenge Google. Additionally, OpenAI is working on its next LLM, GPT-5, which is expected to be smarter than ever, according to Altman.

OpenAI also recently released text to video generation model Sora. It can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Meanwhile, Meta is expected to release Llama 3 soon.

📣 Want to advertise in AIM? Book here

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.