Last updated February 28, 2024

What to Expect from Google’s Gemini?

Gemini is expected to be launched sometime next month. But what can be expected from the AI model on the enterprise, medical and other fronts?

Published on August 16, 2023

by Shyam Nandan Upadhyay

Google DeepMind’s most awaited foundational model, Gemini, is expected to be launched sometime next month. Demis Hassabis recently claimed that engineers at DeepMind are using techniques from AlphaGo for Gemini, which was teased during Google’s I/O event and will be the company’s challenger in the AI race. Hassabis claims that Gemini will be more capable than OpenAI’s GPT-4.

“At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models,” Hassabis said. “We also have some new innovations that are going to be pretty interesting,” he added.

In April, Google brought Google Brain and DeepMind teams together into a single unit—Google DeepMind. Pichai’s unexpected merger aimed to boost efficiency, using Google’s seemingly unending computational resources and DeepMind’s meticulous research to build more capable systems which would be the next frontier in this AI arms race.

Before that, both entities developed individual responses against ChatGPT. While DeepMind initiated Project Goodall, using an undisclosed model called Chipmunk, Google launched Bard based on Google Brain models. Despite a rivalry between the teams, DeepMind abandoned Goodall to collaborate on Gemini.

However, people forget that PaLM and PaLM 2 were not created by DeepMind. Gemini will potentially be DeepMind’s first popular commercialised model, one that won’t be stuck in research like Gato and its other interesting models.

Although Gemini is still in the early stages of development, Google reports significant advancements in its multimodal capabilities, surpassing preceding models. It is notable that Gemini was conceived from the ground up with a multifaceted design approach. This design not only prioritises being multimodal, allowing it to process and understand various forms of data, but also emphasises high efficiency in terms of tools and API integrations. Gemini’s architecture is furthermore poised to facilitate future innovations, specifically memory and planning.

The implications of this progress are substantial, as it hints at enhanced comprehension and interaction with diverse types of data. While GPT-4 is adept at understanding and generating conversational text, Gemini will transcend this by being proficient in processing various inputs, including text, images, and videos. Gemini will also be capable of generating outputs in the form of text, videos, audio, music, and images. Additionally, it will possess reasoning capabilities and the ability to facilitate translations across diverse languages and input formats.

In addition, discussions among Google employees have revolved around using Gemini for various functionalities. This includes tasks such as analysing charts, producing graphics accompanied by text descriptions, and operating software through text or voice commands.

Boosting Enterprise Services

Google is pinning its hopes on Gemini to fuel a range of services. These applications span from the Bard chatbot, which competes with OpenAI’s ChatGPT, to enterprise-oriented platforms like Google Docs and Slides. In pursuit of this goal, Google envisions charging app developers for access to Gemini via its Google Cloud server-rental division. At present, Google Cloud provides access to less advanced Google-designed AI models through Vertex AI. By incorporating these new attributes, Google aims to narrow the gap with Microsoft, which has surged ahead in integrating new AI features into its Office 365 suite. Microsoft has also been offering OpenAI’s models to its application users.

Unleashing New Medical Use-cases

Google has been keen on integrating its AI models into medical use cases. It has been testing an AI tool called Med-PaLM 2, which is designed to answer medical questions. The product is being tested at renowned healthcare institutions like the Mayo Clinic research hospital.

These efforts could be magnified with Gemini, which could be used in medical chatbots or in robotics to assist with surgeries and other medical procedures.

Building Super Cool Robots

In addition, Google might also look to integrate its insights from building DeepMind’s Gato, a “general-purpose” system, which was trained to complete 604 tasks through multi-modal, multi-task training, including image captioning, dialogue, robot-arm block stacking, playing games, and navigating 3D environments.

Gato’s unique aspect is its task diversity and its training approach, which employed a transformer neural network and various data modalities like text, images, and actions. During deployment, Gato tokenises prompts and observations to generate actions sequentially.

With the recent launch of RT-2, a successor to its Robotics Transformer model, Google DeepMind has taken a leap forward in robotics as well. RT-2 is based on the Transformer architecture and trained on web text and images, which empowers it to directly generate robotic actions.

This innovation builds on vision-language models (VLMs) like PaLI-X and PaLM-E, using action tokens in its output to control robots’ behaviour effectively. Similar to language models, it learns from web data to guide robot behaviour.
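
As a rough illustration of what “action tokens” can mean, the Python sketch below turns a continuous robot action into integer tokens and back again. It is a minimal sketch under stated assumptions, not Google DeepMind’s implementation: the uniform binning scheme, the bin count of 256, the value range of -1 to 1, and the function names encode_action and decode_action are all illustrative choices.

```python
from typing import List

# Assumed number of discrete bins per action dimension (illustrative only).
NUM_BINS = 256


def encode_action(action: List[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action value to an integer token in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        # Clip to the assumed valid range before binning.
        clipped = min(max(value, low), high)
        fraction = (clipped - low) / (high - low)
        # The upper edge of the range falls into the last bin.
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def decode_action(tokens: List[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map each token back to the centre of its bin."""
    width = (high - low) / NUM_BINS
    return [low + (token + 0.5) * width for token in tokens]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action, e.g. a small arm displacement.
    action = [-1.0, 0.0, 1.0]
    tokens = encode_action(action)
    print(tokens)
    print(decode_action(tokens))
```

Because every action becomes a short sequence of integers, a language-style model can emit it the same way it emits words; the cost is a small rounding error of at most half a bin width when the tokens are decoded.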

While DeepMind’s Gato was seen as a stride toward artificial general intelligence (AGI) because of its ability to handle diverse tasks, Gemini could accelerate Google’s AGI push.

Might Kill OpenAI’s GPT-4

The fact that Google Brain and DeepMind are working on this together possibly means trouble for OpenAI and other competitors. Additionally, Google co-founder and former Alphabet president Sergey Brin has joined the effort to strengthen the company’s AI capabilities.

OpenAI chief Sam Altman believes video training is the next frontier. However, Google has an edge here, as it is sitting on top of the world’s biggest video library: YouTube.

Gemini is being trained on YouTube videos and would be the first multi-modal model being trained on video rather than just text (or in GPT-4’s case text plus images). This might equip Gemini with capabilities well beyond GPT-4. Don’t forget that it has access to nearly all of the web, to which Google claimed a stake recently by changing its privacy policy.

Not just that, there are reports indicating that Gemini is being trained on twice as many tokens as GPT-4 was, and 10x as many as PaLM 2, with manifold compute, which could make it significantly smarter and less prone to hallucinations. Moreover, with friction between OpenAI and Microsoft lately, Google could be the tortoise that beats OpenAI to the punch and becomes the first to arrive at AGI or an AGI-like model.

Shyam Nandan Upadhyay

Shyam is a tech journalist with expertise in policy and politics, and exhibits a fervent interest in scrutinising the convergence of AI and analytics in society. In his leisure time, he indulges in anime binges and mountain hikes.