
With Google’s Gemini 1.5 Flash, the Possibilities are Endless

The cost + latency + context window size of Flash can create so many new startups.


Google announced a new model, Gemini 1.5 Flash, at Google I/O 2024. It’s a lightweight AI model optimised for speed and efficiency, with a massive context window of 1M tokens.

Designed to handle tasks that require quick responses, it is capable of multimodal reasoning, which means it can simultaneously process and understand various types of data such as text, images, audio, and video.

It is a valuable tool for situations where time and efficiency are crucial, and can be used in applications ranging from customer service chatbots and generating captions or images for social media posts to scientific research and business analytics.

“Gemini 1.5 Flash excels at summarisation, chat applications, image and video captioning, data extraction from long documents and tables, and more,” wrote Demis Hassabis, the CEO of Google DeepMind. 

Hassabis further added that Google created Gemini 1.5 Flash to provide developers with a model that was lighter and less expensive than the Gemini 1.5 Pro version. 

Despite being lighter than Gemini 1.5 Pro, Gemini 1.5 Flash is nearly as powerful. That’s because it was trained through a process called “distillation”, in which the most essential knowledge and skills from Gemini 1.5 Pro are transferred to 1.5 Flash in a way that makes the Flash model smaller and more efficient.

In addition to being the fastest model in the Gemini family, it’s also more cost-efficient to use, making it a faster and less expensive option for developers building their own AI products and services.
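To see why the cost difference matters to developers, consider a back-of-the-envelope comparison for a long-document summarisation request. The per-token prices below are illustrative assumptions for the sake of the sketch, not official figures; check Google’s pricing page for current rates.

```python
# Rough cost comparison between Flash and Pro for one request.
# Prices are ASSUMED example values (USD per 1M tokens), not official pricing.
PRICES_PER_1M_TOKENS = {
    "gemini-1.5-flash": (0.35, 1.05),   # (input, output) -- assumed
    "gemini-1.5-pro": (3.50, 10.50),    # (input, output) -- assumed
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request for the given token counts."""
    in_price, out_price = PRICES_PER_1M_TOKENS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Summarising a 100K-token document into a 1K-token summary:
flash = estimate_cost("gemini-1.5-flash", 100_000, 1_000)
pro = estimate_cost("gemini-1.5-pro", 100_000, 1_000)
print(f"Flash: ${flash:.4f}  Pro: ${pro:.4f}")
```

Under these assumed prices the same request costs roughly ten times less on Flash than on Pro, which is what makes it attractive for high-volume workloads.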

How Does Gemini 1.5 Flash Compare to Other Models?


Many users have tested Gemini 1.5 Flash against other models, and in most cases 1.5 Flash performed impressively.

One user posted that 1.5 Flash performed almost as well as GPT-4o on the StaticAnalysisEval benchmark, while being faster and more cost-effective, making it a compelling alternative.

Another user tested GPT-3.5 Turbo, Claude Haiku, and Gemini 1.5 Flash to check which model aligns most closely with GPT-4o in terms of accuracy on a specific classification task. Flash emerged as the clear winner.

Another posted that Gemini 1.5 Flash was better than Llama-3-70b on long-context tasks. “It’s way faster than my locally hosted 70b model (on 4*A6000) and hallucinates less. The free of charge plan is good enough for me to do prompt engineering for prototyping,” he wrote.

A user ran 1.5 Flash on some evals for automatically triaging vulnerabilities in code, and did the same with GPT-4-Turbo hosted on Azure, Llama-3 70B hosted on Groq, and GPT-4o hosted on OpenAI.

“It’s very fast and very cheap. The results were pretty much on par with the other models in terms of accuracy,” he concluded.

Another user ran various tests on both Gemini Flash and GPT-4o and agreed that Google’s new model is impressive: cheaper, sometimes faster, and producing results similar to GPT-4o’s. “A combination of the two using LLM agentic workflow is the solution,” he added.

However, some have also raised concerns about the model’s low rate limits, which create roadblocks to using it in production in any capacity.


Interesting Use Cases of Gemini 1.5 Flash 

Online users have been trying their hand at the model and coming up with interesting use cases.

DIY-Astra, a multi-modal AI assistant powered by Gemini 1.5 Flash

The 1M token context, low cost, and high speed of Gemini 1.5 Flash make it a perfect tool to create exciting applications like these. 

Gemini 1.5 Flash for Web Scraping

Gemini 1.5 Flash is ideal for web scraping. It simplifies the process by eliminating the need for HTML selectors and adapts to various HTML structures across devices, countries, and products. The model works efficiently with any web page technology, including JavaScript and pre-rendered HTML.
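The selector-free approach described above can be sketched as follows: instead of writing CSS or XPath selectors, hand the raw HTML to the model and ask for structured JSON. The prompt builder below is plain Python; the actual model call (shown commented out) assumes the `google-generativeai` SDK and an API key.

```python
# Selector-free scraping sketch: ask the model to extract fields from raw HTML.
import json

def build_extraction_prompt(html: str, fields: list[str]) -> str:
    """Ask the model to pull the named fields out of arbitrary HTML as JSON."""
    schema = json.dumps({f: "..." for f in fields})
    return (
        "Extract the following fields from the HTML below and reply only "
        f"with JSON matching this shape: {schema}\n\nHTML:\n{html}"
    )

prompt = build_extraction_prompt(
    "<div class='p'><h2>Acme Kettle</h2><span>$29.99</span></div>",
    ["name", "price"],
)

# To actually run it (requires the google-generativeai package and an API key):
# import google.generativeai as genai
# genai.configure(api_key="YOUR_KEY")
# model = genai.GenerativeModel("gemini-1.5-flash")
# print(model.generate_content(prompt).text)
```

Because the model reads the page like a human would, the same prompt keeps working when the site’s markup changes, which is exactly the robustness the selector-based approach lacks.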

Analysing a Video to Produce a Script

An online user gave Gemini 1.5 Flash a video recording of himself shopping on a site, and the model generated Selenium code to automate it in about five seconds.

Gemini-1.5-Flash as a Copilot in VSCode

By connecting CodeGPT with Google AI Studio, you can leverage the power of Gemini 1.5 Flash to enhance your coding experience. 

A Great Option for Voice AI

Gemini 1.5 Flash is a great option for voice AI, with a time to first token of around 500 ms and a throughput of 150 tokens/s.
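Those two figures are enough to budget a voice pipeline’s text-generation stage. A minimal sketch, using the ~500 ms time-to-first-token and ~150 tokens/s quoted above as assumed constants:

```python
# Latency budget for the text-generation stage of a voice pipeline,
# using the rough figures quoted in the article as default assumptions.

def response_latency_ms(tokens: int,
                        ttft_ms: float = 500.0,
                        tokens_per_s: float = 150.0) -> float:
    """Milliseconds until the full text response has been generated."""
    return ttft_ms + tokens / tokens_per_s * 1000.0

# A short 30-token spoken reply would be fully generated in roughly 700 ms:
print(f"{response_latency_ms(30):.0f} ms")
```

In practice a voice agent would stream tokens into a text-to-speech engine as they arrive, so the perceived delay is closer to the 500 ms first-token figure than to the full-response number.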

Gemini YouTube Researcher

Let Gemini be your YouTube researcher. Simply input a topic, and the AI analyses relevant videos to deliver a comprehensive summary, simplifying your research by extracting key insights efficiently.
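The researcher pattern above leans on Flash’s large context window: gather transcripts for several videos on the topic, pack them all into one request, and ask for a combined summary. The sketch below shows only the prompt assembly; transcript fetching and the model call are left out, and the titles used are hypothetical examples.

```python
# "YouTube researcher" sketch: combine several video transcripts into one
# summarisation request that fits in Flash's 1M-token context window.

def build_research_prompt(topic: str, transcripts: dict[str, str]) -> str:
    """Merge {video title: transcript} pairs into a single research prompt."""
    sections = "\n\n".join(
        f"--- {title} ---\n{text}" for title, text in transcripts.items()
    )
    return (
        "You are a research assistant. Summarise the key insights about "
        f"'{topic}' from the video transcripts below, noting which video "
        f"each insight came from.\n\n{sections}"
    )

prompt = build_research_prompt(
    "vector databases",
    {"Intro to pgvector": "…", "FAISS deep dive": "…"},  # hypothetical titles
)
# Send `prompt` to gemini-1.5-flash via the google-generativeai SDK.
```

With a 1M-token window, dozens of hour-long transcripts can go into a single request, which is what makes this workflow practical on Flash where it would overflow smaller context windows.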

This shows that with Gemini 1.5 Flash’s cost, latency, and 1M-token context, alongside OpenAI’s GPT-4o, which is also believed to be a lightweight model, the possibilities are endless.



Sukriti Gupta

Having done her undergrad in engineering and masters in journalism, Sukriti likes combining her technical know-how and storytelling to simplify seemingly complicated tech topics in a way everyone can understand.