Last year, Snowflake acquired Neeva, which turned out to be one of its best decisions.
Sridhar Ramaswamy founded Neeva, an ad-free, privacy-focused search engine, alongside another ex-Google executive, Vivek Raghunathan, in 2020. At the time of its last funding round in 2021, Neeva was valued at about $250 million.
Fast forward to the present, and Snowflake is a household name among enterprises. The company recently introduced a slew of open-source models, including Arctic LLM, designed for enterprises that want to use large language models (LLMs) to create conversational SQL data copilots, code copilots, and RAG chatbots.
Credit for this goes to Sridhar Ramaswamy, who became Snowflake’s new CEO earlier this year. Since he assumed the position, the company has transformed from a data cloud company into a data and AI-driven entity with a strong emphasis on generative AI.
“I think it’s a huge opportunity in the world of data applications and AI. It will keep me busy for many years to come,” said Ramaswamy in a recent interview after taking the helm at Snowflake.
In an exclusive interview with AIM, Snowflake head of AI Baris Gultekin said that he had worked with Ramaswamy for over 20 years at Google, calling him an incredible leader. “Sridhar brings incredible depth in AI as well as data systems. He has managed super large-scale data systems and AI systems at Google,” Gultekin said.
Neeva’s expertise in generative AI and LLMs, now integrated into the Snowflake Data Cloud, has enhanced Snowflake’s AI capabilities, especially in natural language processing and search functionalities within its cloud data platform.
“Neeva is an important acquisition for Snowflake. We are integrating many things from Neeva into Snowflake’s offerings, the most obvious one of which is Snowflake’s Universal Search product,” said Gultekin.
Universal Search helps customers quickly and easily find database objects in their account, data products available in the Snowflake Marketplace, relevant Snowflake Documentation topics, and Snowflake Community Knowledge Base articles.
Snowflake’s Generative AI Prowess
While there are several generative AI models on the market, Snowflake has carved out its niche by targeting enterprise customers. Recently, the company made Snowflake Cortex generally available.
Cortex grants access to pre-trained LLMs from various providers, including Snowflake’s own Arctic LLM. These models can perform tasks like text summarisation, sentiment analysis, question answering, and code generation, all within the Snowflake environment.
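To make this concrete, here is a minimal sketch of how such tasks might be expressed as SQL queries. It only builds the query text in Python and does not connect to Snowflake. The table name `support_tickets` and the column `ticket_body` are hypothetical and used purely for illustration. `SUMMARIZE`, `SENTIMENT`, and `EXTRACT_ANSWER` are Cortex LLM functions in the `SNOWFLAKE.CORTEX` schema, though availability can vary by region and account.

```python
# Sketch only: composes SQL text, opens no Snowflake connection.
# TABLE and TEXT_COLUMN are hypothetical names chosen for illustration.
TABLE = "support_tickets"
TEXT_COLUMN = "ticket_body"


def quote(text: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def cortex_query(function: str, *extra_args: str) -> str:
    """Build a SELECT that applies one SNOWFLAKE.CORTEX function to TEXT_COLUMN."""
    args = ", ".join([TEXT_COLUMN, *extra_args])
    return f"SELECT SNOWFLAKE.CORTEX.{function}({args}) AS result FROM {TABLE}"


# Text summarisation over every row of the assumed table.
summarise_sql = cortex_query("SUMMARIZE")

# Sentiment analysis over the same column.
sentiment_sql = cortex_query("SENTIMENT")

# Question answering: the question is passed as a second argument.
answer_sql = cortex_query(
    "EXTRACT_ANSWER", quote("What product is the customer's ticket about?")
)

for sql in (summarise_sql, sentiment_sql, answer_sql):
    print(sql)
```

Each generated statement could then be run in a Snowflake worksheet or through a client library, so the text being analysed stays inside the Snowflake environment.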
Moreover, Cortex offers pre-built SQL functions that enable users to perform machine learning tasks on their data without extensive coding expertise. These functions handle tasks like classification, regression, and anomaly detection.
According to Snowflake's reported benchmarks, Arctic outperforms leading open models such as DBRX, Llama 2 70B, and Mixtral-8x7B in coding (HumanEval+, MBPP+) and SQL generation (Spider and Bird-SQL), while also providing superior performance in general language understanding (MMLU).
Snowflake has also partnered with Mistral, Meta, and Reka to host their LLMs on Cortex. “We’ve partnered with Landing AI, AI21 Labs, and other capable partners to build amazing products. They’re important to us as they allow us to provide choices to our customers,” said Gultekin.
Gultekin further said that Snowflake is developing LLMs at a very affordable price while prioritising the security of its customers’ data. “Despite using a 17x less compute budget, Arctic is on par with Llama 3 70B in language understanding and reasoning while surpassing in Enterprise Metrics,” said Gultekin.
Additionally, he said that 10,000 customers entrust Snowflake with their sensitive data. With this in mind, he emphasised that all the LLMs the company operates run within strict security parameters, meaning no customer data leaves Snowflake.
Moreover, he added that even though Arctic LLM is orders of magnitude smaller than OpenAI’s models, benchmarks show that Snowflake excels in document understanding and question answering with its document data model.
Snowflake recently introduced Document AI to extract valuable content from unstructured data like PDFs, images, and videos. Powered by Arctic-TILT, a multimodal large language model, it offers efficient content extraction for enterprises.
“We’re just getting started. There’s a lot to build. I’ll say the core use cases for us are being able to talk to data and how we can make that a lot better and a lot easier,” concluded Gultekin, adding that the company recently released a number of products in public preview, including a series of chat products that can converse with structured data.
Snowflake Is Not Alone
Coincidentally, Snowflake’s acquisition of Neeva is similar to Databricks’ acquisition of MosaicML. Naveen Rao, who founded MosaicML, is now the VP of generative AI at Databricks.
MosaicML specialises in optimising machine learning models and has been integrated into Databricks’ offerings to enhance generative AI development.
Recently, Databricks also released its own mixture-of-experts model, DBRX, built with 132 billion parameters and pre-trained on a dataset of 12 trillion tokens. DBRX outperforms GPT-4, particularly in niche areas like SQL and RAG tasks.