Shiprocket and Snowflake Partner to Boost Data Infrastructure and GenAI

The partnership claims to cut data processing time from days to minutes, giving merchants real-time insights to optimise operations and decision-making.

Indian eCommerce logistics and shipping software provider Shiprocket and AI data cloud company Snowflake have partnered to improve data operations and provide faster data access for over one lakh merchants in India. The collaboration aims to help businesses make quicker, data-driven decisions and scale their data infrastructure.

Shiprocket says the integration with Snowflake reduces data processing time from days to minutes, giving merchants real-time data insights that optimise operations and decision-making.

“This collaboration with Snowflake is a transformative milestone for our 1.5-lakh-strong seller community, collectively driving an annualised GMV of over three billion dollars,” said Saahil Goel, managing director and chief executive officer of Shiprocket.

Moreover, the McKinsey-backed company plans to explore generative AI by developing chatbots that let sellers interact with their data using natural language, improving accessibility and user experience through the Snowflake AI Data Cloud.

Recently, Snowflake introduced new features in Snowflake Cortex AI and Snowflake ML, expanding access to enterprise AI through a no-code interactive interface and providing access to leading LLMs. Some of the new features are built on Meta’s Llama 3 and Mistral AI’s Mistral Large models.

“As Shiprocket expands its operations, Snowflake’s AI Data Cloud provides a scalable, cost-effective, secure platform to support their diverse data needs to drive business value,” said Vijayant Rai, MD India, Snowflake.

In a previous interaction with AIM, Rai had said, “We’re looking at India in a multi-dimensional way,” underscoring the company’s diverse operations. These include a significant presence in Pune, where a team of 500 professionals handles operations and support. Additionally, Snowflake is leveraging India as a hub for global customers through its Global Capability Centers (GCCs).

Snowflake is also addressing the need for skilled developers through extensive training and certification programmes that extend beyond major cities to Tier-2 and Tier-3 locations and small villages in India.

AIM Workshop ALERT: RAG & Fine Tuning in GenAI with Snowflake

This webinar will clarify the differences between RAG and fine-tuning, and explain how and when to use each method effectively.

AIM, in partnership with Snowflake, is hosting an exclusive webinar titled “RAG & Fine Tuning in GenAI with Snowflake” on 25th July from 6 PM to 7:30 PM.

This event is designed to give AI enthusiasts, researchers, and professionals a deep dive into the advanced techniques of retrieval augmented generation (RAG) and fine-tuning, both crucial for developing AI applications.

Meet the Expert: Prashant Wate

Prashant Wate, Senior Sales Engineer at Snowflake India, will lead this insightful session. With extensive experience in AI and enterprise solutions, Wate will guide the participants through the intricacies of RAG and fine-tuning, demonstrating how these techniques can revolutionise your AI projects.

Register Now!

What Will You Learn?

Understanding RAG and Fine Tuning

The session will clarify the differences between RAG and fine-tuning, and explain how and when to use each method effectively. Gain practical knowledge and explore real-world applications that will set the stage for advanced AI innovations in your organisation.

Enhancing Generative Models with RAG

RAG is a technique designed to reduce hallucinations (incorrect responses) in generative models by retrieving relevant passages from private datasets, located via vector embeddings, and grounding the model’s answer in them. This method significantly boosts the capabilities of generative models, ensuring they align with your enterprise requirements.
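To make the mechanics concrete, here is a minimal, self-contained sketch of the retrieval step. The documents, vectors, and question are invented for illustration; a production system would swap in a real embedding model and vector store (for example, via Snowflake Cortex).

```python
import numpy as np

# Toy corpus with hand-made 4-dimensional "embeddings". In a real RAG
# pipeline these vectors would come from an embedding model and live in
# a vector store; the numbers here are illustrative only.
documents = [
    "Refunds are processed within 5 business days.",
    "Shipping to Tier-2 cities takes 3 to 4 days.",
    "Our support desk is open 9 am to 6 pm IST.",
]
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.3, 0.0],
    [0.0, 0.2, 0.9, 0.1],
])

def retrieve(query_vec, k=1):
    """Return the k documents closest to the query by cosine similarity."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

# Grounding the prompt in retrieved text is what curbs hallucination: the
# model answers from supplied context rather than from its own memory.
query_vec = np.array([0.1, 0.9, 0.2, 0.1])  # stands in for an embedded question
context = "\n".join(retrieve(query_vec))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How long is shipping?"
print(prompt)
```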

Optimising Models with Fine-tuning

Fine-tuning involves adjusting pre-trained LLMs on a specific dataset to improve their performance on targeted tasks. This process is essential for optimising models to better understand and respond to domain-specific queries, ensuring higher accuracy and relevance in the generated outputs.
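As an illustration of the general idea (not Snowflake’s serverless fine-tuning service), a minimal open-source sketch with Hugging Face transformers might look like the following; the base model name and the domain.jsonl training file are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"          # any causal-LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token   # Mistral ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# "domain.jsonl" is a placeholder: one {"text": ...} example per line,
# drawn from the domain you want the model to specialise in.
data = load_dataset("json", data_files="domain.jsonl", split="train")
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    # The causal-LM collator copies inputs to labels so the loss can be computed.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # gradient updates specialise the pre-trained weights
```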

Leveraging Snowflake’s Platform

Discover how Snowflake’s robust platform for data governance and management can be leveraged to develop and deploy end-to-end AI applications using RAG. 

With features like Snowflake Cortex, Streamlit, and Snowpark, you can achieve this without the need for additional integrations, infrastructure management, or data movement.
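For instance, a Snowpark for Python session can call a Cortex LLM function directly in SQL. The connection parameters below are placeholders, and the model identifier is an assumption based on Cortex’s documented model names at the time.

```python
from snowflake.snowpark import Session

# Placeholder credentials; fill in your own account details.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "...",
    "warehouse": "my_wh",
}).create()

# One SQL call runs the LLM next to the data: no extra integration,
# infrastructure management, or data movement.
row = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'mistral-large', 'Summarise RAG in one sentence.') AS answer"
).collect()[0]
print(row["ANSWER"])
```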

Why Attend?

This webinar is particularly valuable for developers who are eager to enhance their skills and knowledge in Generative AI. By attending, you will be equipped with the tools and techniques needed to drive advanced AI innovations in your organisation.

Register Now!

Secure your spot for the AIM Workshop on 25th July from 6 PM to 7:30 PM. 

Register today and take the first step towards mastering RAG and fine-tuning with Snowflake.

Databricks is Taking the Ultimate Risk of Building ‘USB for AI’

Databricks envisions bringing both Delta Lake and Iceberg formats closer in the future to a point where their differences won’t matter.

Databricks acquiring Tabular was the talk of the Bay Area at the recent Data + AI Summit.

Whether by coincidence or not, the announcement was made during Snowflake’s Data Cloud Summit, held last week.

Databricks chief Ali Ghodsi has some answers, though perhaps not all of them.

“Now, at Databricks, we have employees from both of these projects, Delta and Iceberg. We really want to double down on ensuring that Delta Lake UniForm has full 100% compatibility and interoperability for both of those,” Ghodsi said, admitting they don’t understand all the intricacies of the Iceberg format, but the original creators of Apache Iceberg do. 

Talking about Databricks’ mission to democratise data and AI, Ghodsi opened his keynote by saying that every company today wants GenAI, but at the same time, everybody is worried about the security and privacy of their highly fragmented data estate.

He pointed out that every company’s data estate is spread across several data warehouses, leaving data siloed everywhere. This brings a lot of complexity and huge costs, and ultimately locks companies into proprietary silos.

Databricks’ Delta Lake Project (+ Apache Iceberg) to the Rescue!

With a focus on addressing these issues, Databricks announced the open-source Delta Lake Project a few years back. 

Ghodsi explained that the idea was to let users own their data and store it in data lakes where any vendor can then plug their data platforms into that data, allowing users to decide which platform suits them best. This removes lock-in, reduces the cost, and also lets users get many more use cases by giving them the choice to use different engines for different purposes if they want. 

“This was our vision and we almost succeeded but unfortunately there are now two camps. At Databricks we have Delta Lake, but a lot of other vendors are using this other format called Apache Iceberg,” said Ghodsi.

Delta Lake and Apache Iceberg emerged as the two leading open-source standards for data lakehouse formats. Despite sharing similar goals and designs, they became incompatible due to their independent development.

Over time, various open-source and proprietary engines adopted these formats. However, they typically adopted only one of the standards, and frequently, only aspects of it. This selective adoption effectively fragmented and siloed enterprise data, undermining the value of the lakehouse architecture.

Now, with the Tabular acquisition, Databricks intends to work closely with the Iceberg and Delta Lake communities to bring interoperability to the formats themselves, as Ghodsi highlighted.

Tabular, Inc., a data management company, was founded by Ryan Blue, Daniel Weeks, and Jason Reid. Blue and Weeks had developed the Iceberg project at Netflix and donated it to the Apache Software Foundation.

As the largest contributor to Iceberg, Tabular is seen as the company driving the project forward and advancing its role within data management frameworks.

“I’ve known Ryan for a long time. We worked closely with him when he was back at Netflix, and some of the team members were working with him even before that when he was at Cloudera. So it’s been a very close collaboration,” Ghodsi said. 

Databricks’ UniForm, now generally available, offers interoperability among Delta Lake, Iceberg, and Hudi. It supports the Iceberg REST catalogue interface so that companies can use their existing analytics engines and tools across all their data.
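As a sketch of what that looks like in practice: on a Spark cluster with a recent Delta Lake runtime, UniForm is switched on through table properties. The property names below follow Delta’s UniForm documentation; the table and columns are illustrative.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

# Enabling UniForm makes the Delta table also publish Iceberg metadata,
# so Iceberg-speaking engines can read the same underlying files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_uniform (id BIGINT, amount DOUBLE)
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```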

Furthermore, with the inclusion of the original Iceberg team, the company plans to expand the scope and ambitions of Delta Lake UniForm. It envisions bringing both the Delta Lake and Iceberg formats closer in the future to a point where their differences won’t matter, according to Ghodsi.

In essence, with the Tabular acquisition, Databricks is trying to build a USB port of sorts that can be plugged into the AI systems of the future, achieving 100% interoperability.

“It will simplify the developer experience and allow them to move up the stack in the value chain. Instead of worrying about which version of Iceberg or Delta they are using, developers can rest assured that all of that is solved through the UniForm format,” said Databricks’ vice president of field engineering APJ, Nick Eayrs, in an exclusive interview with AIM.

Eayrs explained that with this, developers will now be able to spend more time on analysis, enrichment, and transformation of the data rather than worrying about version conflicts. 

“Our commitment is reinforced to open source. We have open-sourced our Unity Catalog, and we continue to build the default open standard when it comes to the data format,” he added. 

The Other Side of the Acquisition 

The Tabular acquisition came just after Databricks’ competitor Snowflake announced it had adopted Apache Iceberg tables as a native format and introduced Polaris, a catalogue for Iceberg tables accessible by any data processing engine that can read the format, such as Spark, Dremio, and Snowflake itself.

In addition, Microsoft announced an expanded partnership with Snowflake. As part of this, Microsoft Fabric’s OneLake will now support Snowflake’s Iceberg tables and facilitate bi-directional data access between Snowflake and Fabric.

Databricks’ decision to acquire Tabular was spurred by customer demand for better interoperability among data lake formats. People have also weighed in on the significance of Databricks’ Tabular purchase in light of Snowflake’s activity.

While the acquisition of Tabular indicates that both Databricks and Snowflake are positioning for the influence of AI on data infrastructure, the purchase has clearly put new pressure on Databricks’ competitors, including Snowflake.

When asked if Databricks is planning to work closely with Snowflake to bring Iceberg and Delta Lake together, Ghodsi added, “Governance in open-source projects involves working closely with the community and those that have committers in the project. And if some of these committers happen to be employed by Snowflake, we’ll work with them as well.”

Snowflake Looks to Upskill Developers in India’s Rural Towns

"We believe it's part of our charter to educate the market," said Vijayant Rai.

Snowflake is making significant strides in India, with a vision that transcends traditional go-to-market strategies. AIM caught up with Vijayant Rai, the managing director of Snowflake India, at Snowflake’s Data Cloud Summit 2024. Rai elaborated on the company’s multifaceted approach to establishing a robust presence in the country.

“We’re looking at India in a multi-dimensional way,” Rai said, underscoring the company’s diverse operations. This includes a significant presence in Pune, where a team of 500 professionals handles operations and support. Additionally, Snowflake is leveraging India as a hub for global customers through its Global Capability Centers (GCCs). 

“It’s not just about go-to-market; it’s also about what we can do for global customers from India,” he explained. Snowflake aims to be an integral part of this transformation by partnering with enterprises and SMBs to drive innovation.

The company’s engagement with India is driven by the country’s rapid economic growth and digital transformation. “India is the fastest growing global economy, with a growth rate of over 7%,” he said. Rai noted that the widespread adoption of digital public goods like the UPI framework has positioned India as a data-driven economy. 

The AI in India Approach

Snowflake’s approach to AI is pragmatic, focusing on laying strong data foundations. “There’s no AI strategy without a data strategy,” Rai asserted. The company is helping enterprises break data silos and establish robust data governance frameworks. This is essential for leveraging advanced technologies like generative AI, which Rai believes will revolutionise how businesses operate.

Snowflake is launching boardroom-level workshops to support this transformation and educate senior management on devising effective data strategies. “We believe it’s part of our charter to educate the market,” Rai said. These workshops are designed to ensure that enterprises can maximise the potential of AI and other emerging technologies.

Snowflake is also addressing the demand for skilled developers by offering extensive training and certification programs. These initiatives extend beyond major cities to Tier-2 and Tier-3 locations, even small villages in India, reflecting Snowflake’s commitment to democratise access to AI. 

“India is front and centre in our strategy,” Rai affirmed, highlighting Snowflake’s dedication to making a meaningful impact in the country.

The Tech Talent Prowess

One of Snowflake’s key initiatives is to leverage India’s talent and technological prowess. “We see India not just as a support hub but as a centre for innovation, especially in AI and data-driven technologies,” Rai stated. 

He highlighted the significant role of GCCs, with over 600,000 tech professionals in India driving data innovation. Snowflake is committed to supporting these centres as they scale and enhance their capabilities.

The company is also focused on nurturing the developer community in India. “We’re investing heavily in skilling and ensuring that people are exposed to the Snowflake platform and various aspects of data and AI,” Rai said. This includes initiatives like language support in their large language model, which accommodates all major Indian languages. 

Teams from Pune and other locations are already contributing to Snowflake’s global projects, and this collaboration is set to deepen over time.

In terms of market strategy, Rai emphasised the importance of understanding local business cultures and nuances. “We have experienced teams in Delhi, Bengaluru, and Mumbai who have worked in various verticals and understand the unique needs of different industries,” he said. 

This local expertise is crucial in navigating the fast-paced technological changes that Indian enterprises are embracing.

Snowflake’s Shift to Generative AI Paying Off

Snowflake has been on an acquisition spree and much of its focus is on expanding its generative AI capabilities. Ever since Sridhar Ramaswamy joined as the CEO of Snowflake after the acquisition of Neeva, generative AI has been one of its biggest focuses. 

In an exclusive interview with AIM, Snowflake head of AI Baris Gultekin said that he had worked with Ramaswamy for over 20 years at Google, and called him an incredible leader. “Sridhar brings incredible depth in AI as well as data systems. He has managed super large-scale data systems and AI systems at Google,” Gultekin said.

In addition, Microsoft announced an expanded partnership with Snowflake, aiming to deliver a seamless data experience for customers. As part of this, Microsoft Fabric’s OneLake will now support Apache Iceberg and facilitate bi-directional data access between Snowflake and Fabric.

Moreover, in a recent interview, Ramaswamy revealed that the cloud data company plans to deepen its collaboration with AI powerhouse NVIDIA. “We collaborated with NVIDIA on a number of fronts – our foundation model Arctic was, unsurprisingly, done on top of NVIDIA chips. There’s a lot to come, and Jensen’s, of course, a visionary when it comes to AI,” Ramaswamy said.

Snowflake Enhances AI for Enterprise with Upgrades to Cortex AI with Meta’s Llama 3 and Mistral LLMs

Snowflake is also unveiling Snowflake Cortex Guard, an LLM-based safeguard for filtering harmful content across organisational data.

Snowflake is expanding access to enterprise AI with significant updates to Snowflake Cortex AI and Snowflake ML, democratising AI customisation through a no-code interactive interface and providing access to leading LLMs. 

These enhancements include serverless fine-tuning capabilities and an integrated ML experience, enabling developers to manage models across the ML lifecycle. This unified platform allows businesses to derive more value from their data while ensuring full security and governance.

“Snowflake is at the epicentre of enterprise AI, putting easy, efficient, and trusted AI in the hands of every user so they can solve their most complex business challenges, without compromising on security or governance,” said Baris Gultekin, Head of AI at Snowflake.

The company is introducing two new chat capabilities, Snowflake Cortex Analyst and Snowflake Cortex Search, both entering public preview soon. These tools enable users to develop chatbots that interact with structured and unstructured data, facilitating faster and more efficient decision-making processes. 

Cortex Analyst, utilising Meta’s Llama 3 and Mistral Large models, allows secure application building on Snowflake’s analytical data. Cortex Search integrates Neeva’s retrieval and ranking technology for enhanced document and text-based dataset searches.

Awinash Sinha, Corporate CIO at Zoom, highlighted the importance of Snowflake’s AI solutions for their enterprise analytics: “By combining the power of Snowflake Cortex AI and Streamlit, we’ve been able to quickly build apps leveraging pre-trained large language models in just a few days.”
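To illustrate the kind of app Sinha describes, here is a minimal Streamlit-in-Snowflake sketch. The get_active_session helper is provided inside Snowflake’s hosted Streamlit runtime, and the model identifier is an assumption based on Cortex’s documented names.

```python
import streamlit as st
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # available inside Snowflake's Streamlit runtime
st.title("Ask your data")

question = st.text_input("Ask a question")
if question:
    safe_q = question.replace("'", "''")  # escape quotes before splicing into SQL
    row = session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-70b', "
        f"'Answer briefly: {safe_q}') AS answer"
    ).collect()[0]
    st.write(row["ANSWER"])
```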

Snowflake is also unveiling Snowflake Cortex Guard, an LLM-based safeguard for filtering harmful content across organisational data, further ensuring the safety and usability of AI models. This feature, leveraging Meta’s Llama Guard, will be generally available soon.

In addition to these advancements, Snowflake is introducing Document AI and Snowflake Copilot, both generally available soon. Document AI allows users to extract content from documents using the multimodal LLM Snowflake Arctic-TILT. Snowflake Copilot enhances productivity for SQL users by combining Mistral Large with Snowflake’s proprietary SQL generation model.

“Although businesses typically use dashboards to consume information from their data for strategic decision-making, this approach has some drawbacks including information overload, limited flexibility, and time-consuming development,” said Mukesh Dubey, Product Owner Data Platform at Bayer.

Snowflake’s new AI & ML Studio, currently in private preview, offers a no-code interface for AI development, enabling users to test and evaluate models for cost-effectiveness. Cortex Fine-Tuning, now in public preview, provides serverless customization for a subset of Meta and Mistral AI models.

Additionally, Snowflake ML enhances MLOps capabilities, facilitating the management of models and features across their lifecycle. This includes the Snowflake Model Registry, now generally available, and the Snowflake Feature Store, in public preview.

These comprehensive updates reinforce Snowflake’s commitment to making AI accessible and effective for enterprises while maintaining robust security and governance frameworks.

Snowflake Unveils Polaris Catalog, a Vendor-Neutral, Open Catalog Implementation for Apache Iceberg

Polaris Catalog relies on Iceberg’s open-source REST protocol, which provides an open standard for users to access and retrieve data from any engine that supports the Iceberg REST API.

At its annual user conference, Snowflake Summit 2024, the company today announced Polaris Catalog, a vendor-neutral, open catalog implementation for Apache Iceberg, the open standard of choice for implementing data lakehouses, data lakes, and other modern architectures.

Snowflake plans to open-source Polaris Catalog in the next 90 days to provide enterprises and the entire Iceberg community with new levels of choice, flexibility, and control over their data, with full enterprise security and Apache Iceberg interoperability with Amazon Web Services (AWS), Confluent, Dremio, Google Cloud, Microsoft Azure, Salesforce, and more.

Apache Iceberg emerged from incubation to a top-level Apache Software Foundation project in May 2020, and has since surged in popularity to become a leading open-source data table format.

With Polaris Catalog, users now gain a single, centralised place for any engine to find and access an organisation’s Iceberg tables with full, open interoperability.

Polaris Catalog relies on Iceberg’s open-source REST protocol, which provides an open standard for users to access and retrieve data from any engine that supports the Iceberg REST API, including Apache Flink, Apache Spark, Dremio, Python, Trino, and more.
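For example, pointing Spark at a REST catalogue such as Polaris takes only configuration. The endpoint URL, catalogue name, and table below are placeholders, and the iceberg-spark-runtime package must be on the classpath; the config keys are Iceberg’s standard Spark settings for a REST catalogue.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-demo")
    # "polaris" and the URI are placeholders for your own deployment.
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://example.com/api/catalog")
    .getOrCreate()
)

# Any engine that speaks the same REST protocol sees the same tables.
spark.sql("SELECT * FROM polaris.analytics.events LIMIT 10").show()
```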

“Organisations want open storage and interoperable query engines without lock-in. Now, with the support of industry leaders, we are further simplifying how any organisation can easily access their data across diverse systems with increased flexibility and control,” said Christian Kleinerman, EVP of product, Snowflake.

Moreover, Snowflake revealed that organisations can get started running Polaris Catalog hosted in Snowflake’s AI Data Cloud within minutes (Snowflake-hosted in public preview soon), or self-host it in their own infrastructure using containers such as Docker or Kubernetes.

Since Polaris Catalog’s backend implementation will be open source, organisations can freely swap the hosting infrastructure while eliminating vendor lock-in.

To ensure Polaris Catalog can meet the evolving needs of the wider community and landscape, Snowflake is collaborating with the Iceberg ecosystem to drive the project forward. Interestingly, a part of what makes Apache Iceberg so powerful is its vibrant community of diverse adopters, contributors, and commercial offerings.

Snowflake Proves AI is Not Just at the Tip of its Iceberg

“Our product pipeline, especially in AI, has been in overdrive. The era of enterprise AI is here, right here at Snowflake.”

Snowflake’s latest quarterly numbers have made Wall Street happy. The company issued a stronger-than-expected sales forecast for the current quarter, indicating that its new generative AI-focused products are driving faster growth.

Snowflake has announced that its product revenue is projected to be between $805 million and $810 million for the current quarter ending in July, up 34% year-over-year. This surpasses forecasts from analysts who projected a revenue of $787.5 million. Additionally, the company increased its annual product sales projection to $3.3 billion from $3.25 billion.

Snowflake CEO Sridhar Ramaswamy said, “Our AI products, now generally available, are generating strong customer interest. They will help our customers deliver effective and efficient AI-powered experiences faster than ever.” Ever since Ramaswamy joined Snowflake after its acquisition of his company, Neeva, generative AI has been one of its biggest focuses. 

All About Generative AI

Snowflake had been negotiating to acquire generative AI startup Reka AI for over $1 billion, but the discussions ended without an agreement. In April, Snowflake introduced its own LLM suite called Arctic and now allows customers to utilise third-party AI models, such as Mistral and Meta, on their data within the company’s platform, dubbed Snowflake Cortex.

But apart from the Reka AI deal that fell through, Snowflake announced a definitive agreement to acquire TruEra, an AI startup specialising in tools for testing, debugging, and monitoring machine learning models and large language model applications in production.

Snowflake announced plans to “acquire certain technology assets and hire key employees” from the AI-focused startup, which raised $25 million in 2022. 

On the acquisition, TruEra co-founder, president, and chief scientist Anupam Datta said, “We are looking forward to this next phase in our journey with the Snowflake team with whom we share a commitment to delivering effective & trustworthy generative AI and predictive ML at scale.”

This acquisition marks Snowflake’s sixth significant investment to enhance the capabilities of its data cloud and its third major initiative in the data observability space. Prior to this, Snowflake had invested in two monitoring solutions companies – Observe and Metaplane.

In a blog post, Snowflake highlighted that the TruEra acquisition will help the company ensure more accuracy and trustworthiness in the data used for training AI models.

TruEra has been instrumental in solving the black box problem in AI, and the team behind the startup are experts in RAG-based solutions. Following the acquisition, all three co-founders will be joining Snowflake alongside the TruEra team.

This AI-focused approach is enabling Snowflake to better compete with others in the field, such as Databricks, which offers similar services and, interestingly, has employed a similar acquiring strategy ever since it acquired MosaicML.

In an exclusive interview with AIM, Snowflake head of AI Baris Gultekin said that he had worked with Ramaswamy for over 20 years at Google, calling him an incredible leader. “Sridhar brings incredible depth in AI as well as data systems. He has managed super large-scale data systems and AI systems at Google,” Gultekin said.

Gultekin further said that Snowflake is developing LLMs at a very affordable price, prioritising the security of their customers’ data. “Despite using a 17x less compute budget, Arctic is on par with Llama 3 70B in language understanding and reasoning while surpassing enterprise metrics,” said Gultekin.

The Microsoft Fabric and NVIDIA Spread

In addition, Microsoft announced an expanded partnership with Snowflake, aiming to deliver a seamless data experience for customers. As part of this, Microsoft Fabric‘s OneLake will now support Apache Iceberg and facilitate bi-directional data access between Snowflake and Fabric.

OneLake, a unified, SaaS-based open data foundation, was launched by Microsoft with the introduction of Fabric. The foundation underscores the company’s commitment to open standards. The support for Iceberg, alongside Delta Lake in Microsoft Fabric OneLake, further enhances this commitment. 

In essence, Snowflake can store data in Iceberg’s format in OneLake. Data written by either Snowflake or Fabric will be accessible in both Iceberg and Delta Lake formats through XTable translation in OneLake. Snowflake can read any Fabric data artefact in OneLake, whether stored physically or virtually, through shortcuts.

And that’s not all.

In a recent interview, Ramaswamy revealed that the cloud data company plans to deepen its collaboration with AI powerhouse NVIDIA. “We collaborated with NVIDIA on a number of fronts – our foundation model Arctic was, unsurprisingly, done on top of NVIDIA chips. There’s a lot to come, and Jensen’s, of course, a visionary when it comes to AI,” Ramaswamy said.

Snowflake is expected to make a lot more announcements at its Data Cloud Summit this June. As Ramaswamy said, “Our product pipeline, especially in AI, has been in overdrive. The era of enterprise AI is here, right here at Snowflake.”

Snowflake’s Strategic Acquisition of Neeva Pays Off

The company recently made Snowflake Cortex generally available.

Last year, Snowflake acquired Neeva, which turned out to be one of its best decisions.

Sridhar Ramaswamy founded Neeva, an ad-free, privacy-focused search engine, alongside another ex-Google executive, Vivek Raghunathan, in 2020. At the time of its last funding round in 2021, Neeva was valued at about $250 million.

Fast forward to the present, Snowflake is a household name for enterprises. The company recently introduced a slew of open-source models, including Arctic LLM, designed for enterprises that want to use large language models (LLMs) to create conversational SQL data copilots, code copilots, and RAG chatbots.

Credit for this goes to Sridhar Ramaswamy, who became Snowflake’s new CEO earlier this year. Since he assumed the position, the company has transformed from a data cloud company into a data and AI-driven entity with a strong emphasis on generative AI.

“I think it’s a huge opportunity in the world of data applications and AI. It will keep me busy for many years to come,” said Ramaswamy in a recent interview after taking the helm at Snowflake.

In an exclusive interview with AIM, Snowflake head of AI Baris Gultekin said that he had worked with Ramaswamy for over 20 years at Google, calling him an incredible leader. “Sridhar brings incredible depth in AI as well as data systems. He has managed super large-scale data systems and AI systems at Google,” Gultekin said.

Neeva’s expertise in generative AI and LLMs, now integrated into the Snowflake Data Cloud, has enhanced Snowflake’s AI capabilities, especially in natural language processing and search functionalities within its cloud data platform.

“Neeva is an important acquisition for Snowflake. We are integrating many things from Neeva into Snowflake’s offerings, the most obvious one of which is Snowflake’s Universal Search product,” said Gultekin.

Universal Search helps customers quickly and easily find database objects in their account, data products available in the Snowflake Marketplace, relevant Snowflake Documentation topics, and Snowflake Community Knowledge Base articles.

Snowflake’s Generative AI Prowess 

While there are several generative AI models out in the market, Snowflake has carved out its niche by targeting enterprise customers. Recently, the company made Snowflake Cortex generally available.

Cortex grants access to pre-trained LLMs from various providers, including Snowflake’s own Arctic LLM. These models can perform tasks like text summarisation, sentiment analysis, question answering, and code generation, all within the Snowflake environment.

Moreover, Cortex offers pre-built SQL functions that enable users to perform machine learning tasks on their data without extensive coding expertise. These functions handle tasks like classification, regression, and anomaly detection.
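A quick sketch of how such functions are invoked from Python follows; the connection details and the product_reviews table are placeholders, while SENTIMENT and SUMMARIZE are two of the documented Cortex task functions.

```python
import snowflake.connector

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", warehouse="my_wh"
)
cur = conn.cursor()

# SENTIMENT returns a score in [-1, 1]; positive text scores near 1.
cur.execute("SELECT SNOWFLAKE.CORTEX.SENTIMENT('The rollout went smoothly!')")
print(cur.fetchone()[0])

# SUMMARIZE condenses free text; product_reviews is an illustrative table.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.SUMMARIZE(review_text) FROM product_reviews LIMIT 5"
)
for (summary,) in cur.fetchall():
    print(summary)
```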

Currently, Snowflake Arctic outperforms leading open models such as DBRX, Llama 2 70B, Mixtral-8x7B, and more in coding (HumanEval+, MBPP+) and SQL generation (Spider and Bird-SQL), while also providing superior performance in general language understanding (MMLU).

Snowflake has also partnered with Mistral, Meta, and Reka to host their LLMs on Cortex. “We’ve partnered with Landing AI, AI21 Labs, and other capable partners to build amazing products. They’re important to us as they allow us to provide choices to our customers,” said Gultekin.

Gultekin further said that Snowflake is developing LLMs at a very affordable price, prioritising the security of their customers’ data. “Despite using a 17x less compute budget, Arctic is on par with Llama 3 70B in language understanding and reasoning while surpassing in Enterprise Metrics,” said Gultekin.

Additionally, he said that they had 10,000 customers entrusting Snowflake with their sensitive data. With this in mind, he emphasised that all the LLMs that they operate are within strict security parameters, meaning no data leaves and everything remains secure.

Moreover, he added that even though Arctic LLM is orders of magnitude smaller compared to OpenAI, the benchmark proves that they excel in document understanding and question answering with their document data model.

Snowflake recently introduced Document AI to extract valuable content from unstructured data like PDFs, images, and videos. Powered by Arctic-TILT, a multimodal large language model, it offers efficient content extraction for enterprises.

“We’re just getting started. There’s a lot to build. I’ll say the core use cases for us are being able to talk to data and how we can make that a lot better and a lot easier,” concluded Gultekin, adding that they recently released a whole pile of products for public preview, including a series of chat products that can converse with structured data.

Snowflake Is Not Alone

Coincidentally, Snowflake’s acquisition of Neeva is similar to Databricks’ acquisition of MosaicML. Naveen Rao, who founded MosaicML, is now the VP of generative AI at Databricks.

MosaicML specialises in optimising machine learning models and has been integrated into Databricks’ offerings to enhance generative AI development.

Recently, Databricks also released its own mixture-of-experts model, DBRX, built with 132 billion parameters and pre-trained on a dataset of 12 trillion tokens. Databricks claims DBRX outperforms established open models and GPT-3.5, particularly in niche areas like SQL and RAG tasks.

Snowflake Releases Open Enterprise LLM, Arctic with 480 Billion Parameters

Arctic activates approximately 50% fewer parameters than DBRX, and 80% fewer than Grok-1 during inference or training.

After open-sourcing the Arctic family of text embedding models, Snowflake is now adding another LLM to the list for enterprise use cases. Snowflake Arctic sets a new standard for openness and enterprise-grade performance. 

Designed with a unique Mixture-of-Experts (MoE) architecture, Arctic provides top-tier optimisation for complex enterprise workloads, surpassing several industry benchmarks in SQL code generation, instruction following, and more. 

Arctic’s unique MoE design enhances both training systems and model performance with a carefully crafted data composition tailored to enterprise needs. In a breakthrough in efficiency, Arctic activates only 17 billion of its 480 billion parameters at a time, achieving industry-leading quality with unprecedented token efficiency.
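To see why an MoE model touches only a fraction of its weights per token, consider this toy top-k routing sketch. It is a generic illustration of the gating idea, not Arctic’s actual architecture, and the sizes are deliberately tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2      # toy sizes; Arctic's are far larger

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(token):
    """Route a token through only the top-k experts, weighted by the gate."""
    logits = token @ router
    top = np.argsort(logits)[-top_k:]                        # winning experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    # Only top_k of the n_experts weight matrices are touched per token,
    # which is the source of MoE's "fewer active parameters" efficiency.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

print(moe_forward(rng.standard_normal(d_model)).shape)  # (16,)
```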

“Despite using 17x less compute budget, Arctic is on par with Llama3 70B in language understanding and reasoning while surpassing in Enterprise Metrics,” said Baris Gultekin,  Snowflake’s head of AI.

Compared to other models, Arctic activates approximately 50% fewer parameters than DBRX, and 80% fewer than Grok-1 during inference or training. Moreover, it outperforms leading open models such as DBRX, Llama 2 70B, Mixtral-8x7B, and more in coding (HumanEval+, MBPP+) and SQL generation (Spider and Bird-SQL), while also providing superior performance in general language understanding (MMLU).

“This is a watershed moment for Snowflake, with our AI research team innovating at the forefront of AI,” said Sridhar Ramaswamy, CEO, Snowflake. “By delivering industry-leading intelligence and efficiency in a truly open way to the AI community, we are furthering the frontiers of what open source AI can do. Our research with Arctic will significantly enhance our capability to deliver reliable, efficient AI to our customers,” he said. 

The best open model?

The best part is that Snowflake is releasing Arctic’s weights under an Apache 2.0 licence, along with details of the research behind its training, establishing a new level of openness for enterprise AI technology. “With the Apache 2 licensed Snowflake Arctic embed family of models, organisations now have one more open alternative to black-box API providers such as Cohere, OpenAI, or Google,” says Snowflake.

“The continued advancement and healthy competition between open source AI models is pivotal not only to the success of Perplexity, but the future of democratising generative AI for all,” said Aravind Srinivas, co-founder and CEO, Perplexity. “We look forward to experimenting with Snowflake Arctic to customise it for our product, ultimately generating even greater value for our end users.”

As part of the Snowflake Arctic model family, Arctic is the most open LLM available, allowing ungated personal, research, and commercial use with its Apache 2.0 licence. Snowflake goes further by providing code templates, along with flexible inference and training options, enabling users to deploy and customise Arctic quickly using their preferred frameworks, including NVIDIA NIM with NVIDIA TensorRT-LLM, vLLM, and Hugging Face.

Yoav Shoham, co-founder and co-CEO, AI21 Labs, said, “We are excited to see Snowflake help enterprises harness the power of open source models, as we did with our recent release of Jamba — the first production-grade Mamba-based Transformer-SSM model.”

For immediate use, Arctic is available for serverless inference in Snowflake Cortex, Snowflake’s fully managed service offering machine learning and AI solutions in the Data Cloud, alongside other model gardens and catalogues such as Hugging Face, Lamini, Microsoft Azure, NVIDIA API catalogue, Perplexity, Together, and more.

“We’re pleased to increase enterprise customer choice in the rapidly evolving AI landscape by bringing the robust capabilities of Snowflake’s new LLM model Arctic to the Microsoft Azure AI model catalogue,” said Eric Boyd, corporate vice president, Azure AI Platform, Microsoft.

Everyone loves the winter

Snowflake’s AI research team, comprising industry-leading researchers and system engineers, developed Arctic in less than three months, spending roughly one-eighth of the training cost of similar models. Snowflake has set a new benchmark for the speed at which state-of-the-art open, enterprise-grade models can be trained, enabling users to create cost-efficient custom models at scale.

Clement Delangue, CEO and co-founder of Hugging Face said, “We’re excited to see Snowflake contributing significantly with this release not only of the model with an Apache 2.0 licence but also with details on how it was trained. It gives the necessary transparency and control for enterprises to build AI and for the field as a whole to break new grounds.”

Snowflake Ventures has also recently invested in LandingAI, Mistral AI, Reka, and others, reinforcing its commitment to helping customers derive value from their enterprise data with LLMs and AI. 

“Snowflake and Reka are committed to getting AI into the hands of every user, regardless of their technical expertise, to drive business outcomes faster,” said Dani Yogatama, co-founder and CEO, Reka. “With the launch of Snowflake Arctic, Snowflake is furthering this vision by putting world-class truly-open large language models at users’ fingertips.”

Additionally, Snowflake has expanded its partnership with NVIDIA to further AI innovation, combining the full-stack NVIDIA accelerated platform with Snowflake’s Data Cloud to provide a secure and powerful infrastructure and compute capabilities for unlocking AI productivity. 

Snowflake Open Sources Arctic, Family of Embedding Models for RAG

“With the Apache 2 licensed Snowflake Arctic embed family of models, organisations now have one more open alternative to black-box API providers such as Cohere, OpenAI, or Google," says Snowflake.

Snowflake today announced the launch of the Snowflake Arctic embed family of models under an Apache 2.0 licence. These models, ranging in size and context window, are designed for text embedding tasks and offer SOTA performance for retrieval applications. 

The largest model in the family, with 330 million parameters, leads the Massive Text Embedding Benchmark (MTEB) Retrieval Leaderboard, achieving an average retrieval score of 55.9.

Sridhar Ramaswamy, CEO of Snowflake, credited the expertise of the Neeva team and the company’s commitment to AI for making the models possible. Snowflake acquired Neeva in May last year.

The Snowflake Arctic embed models, available on Hugging Face and soon in Snowflake Cortex embed function, provide organisations with advanced retrieval capabilities when integrating proprietary datasets with LLMs for Retrieval Augmented Generation (RAG) or semantic search services. 
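For instance, the open weights can be pulled from Hugging Face with sentence-transformers. The model ID matches Snowflake’s published checkpoints, and the query prefix follows the model card’s retrieval recipe; verify both against the card before relying on them.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

docs = [
    "Snowflake released the Arctic embed family of text embedding models.",
    "Unrelated text about shipping logistics in Tier-2 cities.",
]
# Arctic embed's model card prescribes a prefix for queries (not documents).
query = ("Represent this sentence for searching relevant passages: "
         "What did Snowflake release?")

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec   # cosine similarity on normalised vectors
print(scores)                   # the first document should score highest
```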

The success of these models lies in the application of effective web searching techniques to training text embedding models. Improved sampling strategies and competence-aware hard-negative mining have significantly boosted the quality of the models. 

Snowflake Arctic embed models come in five sizes, from x-small to large, catering to different organisational needs regarding latency, cost, and retrieval performance. 

Snowflake claims that Arctic-embed-l stands out as the leading open-source model suitable for production due to its excellent performance-to-size ratio. Although models like SFR-Embedding-Mistral surpass Arctic-embed-l, they come with a vector dimensionality four times greater (4096 vs. 1024) and require over 20 times more parameters (7.1 billion vs. 335 million).

“With the Apache 2 licensed Snowflake Arctic embed family of models, organisations now have one more open alternative to black-box API providers such as Cohere, OpenAI, or Google,” reads Snowflake’s blog.

These enhancements, combined with Snowflake’s data processing power, were achieved without the need for a massive expansion of computing resources, utilising just eight H100 GPUs.

Snowflake plans to continue expanding its range of models and targeted workloads to maintain its commitment to providing customers with top-quality models for enterprise use cases such as RAG and search.

Snowflake and Mistral AI Partner to Bring LLMs to Data Cloud

Snowflake Cortex LLM Functions, now in public preview, enables users to quickly, easily, and securely build generative AI apps.

Snowflake, the data cloud company, and Mistral AI, a European-based AI startup, have announced a global partnership to provide Mistral AI’s most powerful language models directly to Snowflake customers in the Data Cloud.

The collaboration, backed by a parallel investment in Mistral’s Series A from Snowflake Ventures, focuses on providing enterprises with seamless access to large language models while prioritising security, privacy, and governance over their data.

As part of the partnership, Snowflake customers can now access Mistral AI’s latest LLM, Mistral Large, via the Snowflake Data Cloud. Additionally, Snowflake customers gain access to Mixtral 8x7B, an open-source model exceeding OpenAI’s GPT-3.5 in speed and quality on most benchmarks, and Mistral 7B, a foundation model optimised for low latency, low memory requirements, and high throughput for its size.

“By partnering with Mistral AI, Snowflake is putting one of the most powerful LLMs on the market directly in the hands of our customers, empowering every user to build cutting-edge, AI-powered apps with simplicity and scale,” said Sridhar Ramaswamy, CEO of Snowflake. 

Snowflake Cortex LLM Functions, now in public preview, enables users to build generative AI apps quickly, easily, and securely. The Snowflake Cortex service, fully managed by Snowflake, supports industry-leading LLMs for specialised tasks like sentiment analysis, translation, and summarisation. 

Snowflake recently announced Sridhar Ramaswamy as its new chief executive officer and a member of the board of directors, effective immediately. He replaces Frank Slootman, who decided to retire and step down from the helm of the company.

Ramaswamy, who graduated from IIT Madras, joined Snowflake in May 2023 after selling his startup Neeva AI to the data cloud company. He held the position of Senior Vice President of AI at Snowflake.

Snowflake Announces Sridhar Ramaswamy as its New Chief

Ramaswamy, who graduated from IIT Madras, joined Snowflake in May 2023.

Snowflake recently announced Sridhar Ramaswamy as its new chief executive officer and a member of the board of directors, effective immediately. He will replace Frank Slootman, who decided to retire and step down from the helm of the company.

Ramaswamy, who graduated from IIT Madras, joined Snowflake in May 2023 after selling his startup Neeva AI to the data cloud company. He held the position of Senior Vice President of AI at Snowflake.

“There is no better person than Sridhar to lead Snowflake into this next phase of growth and deliver on the opportunity ahead in AI and machine learning. He is a visionary technologist with a proven track record of running and scaling successful businesses. I have the utmost confidence in Sridhar and look forward to working with him as he assumes this new role,” Slootman said.

Back in 2021, Ramaswamy, along with Vivek Raghunathan, set out to address the gap in the search engine space and launch an alternative product. In February 2023, Neeva AI launched its search engine powered by generative AI. The LLM-powered search engine challenged Google’s fundamentals and offered an ad-free, privacy-focused search experience.

However, just three months later, data cloud company Snowflake acquired Neeva AI for an undisclosed amount.

In an interaction with AIM last year, Ramaswamy stated that becoming the default search engine in Safari, as it turns out, is an incredibly convoluted endeavour. “In reality, there is no formal process; it all hinges on Cupertino’s (Apple’s) subjective judgement of one’s qualifications. There is no process and it’s pretty tough to create a sustainable business,” he said.

Before this, he also worked with Google for over 15 years and led its USD 115 billion advertising tech division.

This company’s AI FinOps serves Snowflake, Databricks & more

“In 2024 (and beyond) cloud data costs are going to be much higher because you’re gathering, retaining, and processing more data,” said Kunal Agarwal, CEO and co-founder of Unravel Data.

With the global cloud FinOps market expected to grow from $832.2 million in 2023 to $2,750.5 million by 2028, at an average annual growth rate of 18.8%, companies offering data observability and FinOps have found rising prevalence in the market. One such player is Unravel Data, which has been in the market for over a decade, leveraging machine learning and automation to serve some of the biggest cloud data platforms, such as Databricks, Snowflake, BigQuery, and others.

AI is Not New

Using AI-powered tools and an insights engine, this US-based company, which also has an office in Bengaluru, has been relying on in-house ML models and algorithms. “AI is not new to Unravel. We have used AI to automate tasks for data teams over the last 10 years after observing more than 50 million data pipelines and queries. Today, AI is woven into Unravel’s platform at all levels,” said Kunal Agarwal, CEO and co-founder of Unravel Data, in an exclusive interaction with AIM.

Unravel Data’s ML algorithms have been developed in-house and have been trained across a wide variety of workloads for each specific platform to ensure maximum accuracy in insights and predictions. The company’s AI-powered Insights Engine utilises a robust tech stack that starts with data collection from diverse sources, covering big data application performance, cloud expenses and historical usage patterns. 

While AI has been at the core of Unravel’s business, generative AI is not far behind. Speaking about its implementation, Agarwal mentioned that Unravel has big plans for it, which will be shared with the market very soon.

Customer-Centric Solutions

Increasing adoption of data engineering practices guided by DataOps is expected to lead to fruitful results. By 2025, data teams supported by DataOps tools and practices are said to be 10 times more productive than teams that don’t use them. With the need to stand apart and offer specialised solutions, Unravel has addressed that as well.

“The AI isn’t just reactive; it employs predictive analytics, forecasting future cloud spending based on historical data and trends. This foresight empowers businesses to make proactive adjustments, avoiding budgetary pitfalls. This also means that our ML models are trained for each specific platform, across a wide variety of workloads to provide accurate insights,” said Agarwal. 
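As a toy illustration of that idea, and only the idea, since Unravel’s models are proprietary and far more sophisticated, projecting next month’s spend from a historical trend can be as simple as a linear fit; the monthly figures below are invented.

```python
import numpy as np

# Six months of illustrative cloud bills, in USD.
monthly_spend = np.array([41_000, 44_500, 47_200, 52_800, 55_100, 60_300])
months = np.arange(len(monthly_spend))

slope, intercept = np.polyfit(months, monthly_spend, deg=1)  # least-squares line
forecast = slope * len(monthly_spend) + intercept
print(f"Projected next month: ${forecast:,.0f}")  # about $63,500 on this trend
```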

Catering to specific needs, Unravel has distinguished products for each of their big customers including Databricks, AWS’ EMR and others. Unravel’s purpose-built AI provides insights in real time at the job, user, and workgroup levels to help teams improve their cost allocation and workload efficiency. Furthermore, a standout feature of Unravel’s Insights Engine is its ability to act as a financial detective. It scrutinises cloud spending patterns, identifying anomalies and inefficiencies in resource allocation. “This is invaluable for organisations aiming to streamline costs and enhance operational effectiveness,” said Agarwal. 

Agarwal believes that the AI-driven resource rightsizing recommendations are akin to having a personal trainer for one’s cloud resources. “Unravel Data ensures that your resources are neither underutilised nor oversized, optimising costs with precision. The AI also plays a crucial role in cost allocation, accurately attributing cloud costs to different business units or projects.”

Data Observability in 2024 

Gartner expected cloud end-user spending to hit $600 billion in 2023, and the forecast is only headed higher, spiking the need for data observability and FinOps platforms.

“In 2024 (and beyond) cloud data costs are going to be much higher because you’re gathering, retaining, and processing more data. Data observability to understand what’s going on with data applications/pipelines will become table stakes. What companies will really need are solutions that leverage data observability with FinOps and AI-powered recommendations that optimise performance and costs of data workloads,” said Agarwal. However, challenges of navigating around generative AI in FinOps will continue. 

“The total impact, both fiscal and environmental, will have companies putting greater scrutiny on their AI projects, such as which models they really need to run, which projects need generative AI, whether a model can be repurposed/fine-tuned as opposed to starting from scratch, and rightsizing jobs to ensure that they’re wasting neither resources nor money,” said Agarwal, who believes these are some of the nuances that need to be dealt with.

Companies such as Dynatrace, Datadog, and Microsoft System Center are among Unravel Data’s notable competitors.

The post This company’s AI FinOps serves Snowflake, Databricks & more appeared first on AIM.

]]>
Are Databricks and Snowflake Ferraris in a Toyota World? https://analyticsindiamag.com/ai-origins-evolution/are-databricks-and-snowflake-ferraris-in-a-toyota-world/ Tue, 22 Aug 2023 08:13:57 +0000 https://analyticsindiamag.com/?p=10098821

Companies looking to cut costs shouldn’t rely on open source solutions either. It's a juggle

The post Are Databricks and Snowflake Ferraris in a Toyota World? appeared first on AIM.

]]>

While opting for data architecture solutions, companies frequently fall into the trap of paying exorbitant prices for services they don’t need. A recent blog by Kieran Healey points out that companies like Databricks or Snowflake are offering Ferraris when many companies could do their work with a Toyota.

Databricks and Snowflake are undoubtedly robust platforms that offer impressive capabilities. Snowflake’s partnership with NVIDIA and Databricks’ Spark tooling showcase their technical prowess and have made them even bigger. Yet such features often serve as marketing tactics rather than essential solutions, and companies end up paying for them instead of turning to open source alternatives.

For example, instead of paying for an LLM-based chatbot, most companies could effectively address their data challenges with simpler, more cost-efficient solutions, even a plain “press 1 to choose this option” menu. When it comes to addressing data-related challenges without overspending, companies should adopt an anti-hype mindset.

A person from Databricks suggested on Hacker News that though companies might be able to create their own Spark deployment, it would run much slower than it does on Databricks and its proprietary runtime. He further added that a lot of businesses have other problems to solve, and that focusing on building DIY platforms is a horrible approach.

Interestingly, none of this matters if a company only has gigabytes of data; at that scale, pretty much anything works cheaply and easily. The debate really concerns companies with terabytes or hundreds of terabytes of data.

Open source vs commercial solutions

On the other end, it seems easy to hop onto open source solutions as well, given how cost-effective they are presented to be. One side of the debate emphasises the financial advantage of open source solutions. Supporters highlight the fact that open source software is often free to use, suggesting that the cost savings alone make it a compelling choice.

However, it is essential to point out that while the software itself may be free, deploying, maintaining, and expertly managing open source solutions can incur significant costs. Paying skilled professionals to ensure proper deployment and upkeep can strain both time and resources.

“Open source it may be. Free it is not. Paying an expert to correctly deploy an open source solution takes time and money,” said another user. This argument underscores the idea that simply adopting open source software isn’t a guaranteed money-saving solution without proper expertise and management.

On the opposite side, commercial solutions such as Databricks and Snowflake might come with upfront costs, but offer comprehensive support, integration, and scalability that can be invaluable. These solutions often package features, support, and maintenance into a single offering, reducing the need for extensive in-house expertise. Furthermore, commercial solutions can provide a level of assurance and accountability that can be lacking in open source alternatives.

You pay, though, to change the parameters of the problem; dismissing that trade-off reflects a fundamental misunderstanding of how to get things done in a constrained environment. This viewpoint highlights the notion that the choice between open source and commercial solutions is about more than just cost: it’s about shifting the focus from technical challenges to non-technical ones.

Funnily enough, it’s like saying no company needs a cloud provider; perhaps true, but a cloud provider definitely helps a company focus on better things than building a data centre itself.

The Anti-Hype Approach

In the debate over data platform choices, context and expertise play pivotal roles. While open source solutions can be powerful tools when implemented correctly, they require a skilled team to navigate potential challenges. Conversely, commercial solutions can mitigate many technical complexities, enabling organisations to concentrate on their core business goals. However, this often involves a trade-off between flexibility and vendor lock-in.

Ultimately, there is no one-size-fits-all answer to the open source vs commercial debate in the context of data platforms. The decision depends on the unique circumstances of each organisation—its budget, existing expertise, scalability requirements, and risk tolerance.

In the current age, when CEOs are being pushed by everyone to say “generative AI”, it is easy to fall into the trap of overspending on over-engineered solutions. It is essential to scrutinise a technology’s applicability. Instead of chasing novel technologies, companies should adhere to the age-old principle of delivering tangible returns on investment; CEOs are always looking for solutions that not only enhance operations but also generate profits.

The post Are Databricks and Snowflake Ferraris in a Toyota World? appeared first on AIM.

]]>
Snowflake Now Wants You to Converse With Your Data https://analyticsindiamag.com/intellectual-ai-discussions/snowflake-now-wants-you-to-converse-with-your-data/ Mon, 14 Aug 2023 09:30:00 +0000 https://analyticsindiamag.com/?p=10098549

The Neeva AI acquisition will give Snowflake’s customers the ability to talk to their data essentially in a conversational way

The post Snowflake Now Wants You to Converse With Your Data appeared first on AIM.

]]>

Snowflake’s growth trajectory has been nothing short of remarkable. Since 2012, the company has witnessed exponential market adoption and has attracted a diverse range of clients, from startups to Fortune 500 giants. Some of its notable customers include Adobe, Airbnb, BlackRock, Dropbox, PepsiCo, ConAgra Foods, Novartis and Yamaha. In India, Snowflake caters to the needs of companies such as Porter, Swiggy and Urban Company. The rapid expansion is a testament to Snowflake’s ability to address the ever-increasing demands of the data-driven world we live in.

But today, we are stepping into the age of generative AI, and Snowflake too is gearing up to bring the best of the technology to its long list of customers. Torsten Grabs, senior director of product management at Snowflake, told AIM that with the advent of generative AI, we will increasingly see less technical users successfully interact with computers, and that is probably the broadest and biggest impact he expects from generative AI and large language models (LLMs) across the board. Talking about the impact of generative AI on Snowflake itself, he said it has affected the company on two distinct levels.

Firstly, like almost every other company, Snowflake is seeing generative AI drive productivity improvements. Grabs anticipates developers working on Snowflake will benefit the most from generative AI. The concept is akin to Microsoft’s Copilot and Amazon’s CodeWhisperer, where a coding assistant aids productivity by comprehending natural language and engaging in interactive conversations to facilitate faster and more precise code creation.

Moreover, Snowflake is harnessing generative AI to enhance conversational search capabilities. For instance, when accessing the Snowflake marketplace, it employs conversational methods to identify suitable datasets that address your business needs effectively. “There’s another layer that I think is actually very critical for everybody in the data space, which is around applying LLMs to the data that’s being stored or managed in a system like Snowflake,” Grabs said. The big opportunity for Snowflake lies in leveraging generative AI to offer enhanced insights into the data managed and stored within these systems. 

Conversing with your data 

On May 24, 2023, Snowflake acquired Neeva AI with the aim of accelerating search capabilities within Snowflake’s Data Cloud platform by leveraging Neeva’s expertise in generative AI-based search technology. “We recognised the necessity of integrating robust search functionality directly into Snowflake, making it an inherent and valuable capability. Partnering with Neeva AI further enriched our approach, combining their expertise in advanced search with generative AI, benefiting us in multiple dimensions,” Grabs said.

Grabs believes the Neeva AI acquisition is going to bring a host of benefits to Snowflake’s customers. Most importantly, it will give them the ability to talk to their data essentially in a conversational way. “It’s analogous to the demonstration we presented, where a conversation with the marketplace utilizes metadata processed by the large language model to discover relevant datasets,” Grabs said.

Now consider scaling this process and going beyond metadata, involving proprietary sensitive data. By employing generative AI, Snowflake’s customers can engage in natural language conversations to gain precise insights about their enterprise’s data.
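
Snowflake has not detailed the mechanics here, but the general pattern of grounding an LLM in table metadata so users can ask questions in plain language can be sketched as follows. The OpenAI-style client, model name and schema are illustrative assumptions, not Snowflake’s actual stack.

```python
from openai import OpenAI  # stand-in LLM client, not Snowflake's stack

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Metadata the model is grounded in: a single hypothetical table
schema = "TABLE orders(order_id INT, region TEXT, amount FLOAT, placed_at DATE)"
question = "Which region generated the most revenue last quarter?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system",
         "content": "Translate the user's question into a single SQL query "
                    f"against this schema and return nothing else:\n{schema}"},
        {"role": "user", "content": question},
    ],
)

# The generated SQL would then be validated and run inside the warehouse
print(response.choices[0].message.content)
```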

Building LLMs for customers

Building on its generative AI capabilities, Snowflake, at its annual user conference called ‘Snowflake Summit 2023’, also announced a new LLM built from Applica’s generative AI technology to help customers understand documents and put their unstructured data to work. “We have specifically built this model for document understanding use cases and we started with TILT base model that we leveraged and then built on top of it,” Grabs said.

When compared to the GPT models from OpenAI or models developed by labs such as Anthropic, Snowflake’s LLMs offer a few distinct advantages. For example, the GPT models are trained on the entirety of publicly available internet data, resulting in broad capabilities but high resource demands. Their resource-intensive nature also makes them costly to operate, and much of that resource goes to aspects irrelevant to a given use case. Grabs believes utilising a more tailored, specialised model designed for a specific use case allows for a narrower model with a reduced resource footprint, leading to increased cost-effectiveness.

“This approach is also poised to yield significantly superior outcomes due to its tailor-made design for the intended use case. Furthermore, the model can be refined and optimised using your proprietary data. This principle isn’t confined solely to the document AI scenarios; rather, it’s a pattern that will likely extend more widely across various use cases.”

In many instances, these specialised models are expected to surpass broad foundational models in both accuracy and result quality. Additionally, they are likely to prove more resource-efficient and cost-effective to operate. “Our document AI significantly aids financial institutions in automating approval processes, particularly for mortgages. Documents are loaded into the system, the model identifies document types (e.g., salary statements), extracts structured data, and suggests approvals. An associate reviews and finalises decisions, streamlining the process and enhancing efficiency.”

Addressing customer’s concerns

While generative AI has garnered significant interest, enterprises, including Snowflake’s clients, which include 590 Forbes Global 2000 companies, remain concerned about the potential risks tied to its utilisation. “I think some of the top concerns for pretty much all of the customers that I’m talking to is around security, privacy, data governance and compliance,” Grabs said.

This presents a significant challenge, especially concerning advanced commercial LLMs. These models are often hosted in proprietary cloud services that require interaction. For enterprise clients with sensitive data containing personally identifiable information (PII), the prospect of sending such data to an external system outside their control and unfamiliar with their cybersecurity processes raises concerns. This limitation hinders the variety of data that can interact with such systems and services. 

“Our long-standing stance has been to avoid dispersing data across various locations within the data stack or across the cloud. Instead, we advocate for bringing computation to the data’s location, which is now feasible with the abundant availability of compute resources,” Grabs said. Unlike a decade or two ago when compute was scarce, the approach now is to keep data secure and well-governed in its place and then bring computation to wherever the data resides. 

He believes this argument extends to generative AI and LLMs as well. “We would like to offer the state-of-the-art LLMs and side by side the compelling open-source options that operate within the secure confines of the customer’s Snowflake account. This approach ensures that the customer’s proprietary or sensitive data remains within the security boundary of their Snowflake account, offering them peace of mind.”

Moreover, on the flip side, another crucial aspect to consider is the protection of proprietary intellectual property (IP) within commercial LLMs. The model’s code, weights, and parameters often involve sensitive proprietary information. “With our security model integrated into native apps on the marketplace, we can ensure that commercial LLM vendors’ valuable IP remains undisclosed to customers utilising these models within their Snowflake account. Our role in facilitating the compute for both parties empowers us to maintain robust security and privacy boundaries among all participants involved in the process,” Grabs concluded. 

The post Snowflake Now Wants You to Converse With Your Data appeared first on AIM.

]]>
Everyone’s a Winner in the AI Race https://analyticsindiamag.com/ai-origins-evolution/everyones-a-winner-in-the-ai-race/ Tue, 06 Jun 2023 10:30:00 +0000 https://analyticsindiamag.com/?p=10094589

“AI race is going to create more losers than winners,” says GQG Partners’ Rajiv Jain. Really?

The post Everyone’s a Winner in the AI Race appeared first on AIM.

]]>

Rajiv Jain, chairman and chief investment officer of GQG Partners, one of the biggest buyers of Nvidia Corp in the first quarter (8.2 million shares worth over $2.3 billion), recently said that the AI race is going to create ‘more losers than winners’ as it disrupts several business models across industries. With everyone jumping onto the AI bandwagon, the fate of them all is a mystery for now.  

Jain believes that apart from NVIDIA, the winners in the AI race will be large tech companies. However, not everyone necessarily emerges a winner, not even big tech. As with the buzz around any disruptive technology, most companies are eager to try their hand in the hope of reaping benefits, yet several fail to survive in a competitive market, owing mainly to cost dynamics. Here, one can’t help but draw parallels to the dot-com boom of the early 2000s, when a number of companies failed owing to market overconfidence, investors’ fear of missing out, an abundance of venture capital funding, and an inability to turn a profit. Yes, the AI euphoria does give you that déjà vu.

Tracing the AI Winners

Looking at NVIDIA’s success story, and its recent valuation of close to $1 trillion, it is clear that to be the forerunner of a race, you should either be a disruptor, innovating first and staying ahead, or simply be an essential enabler fuelling the existing race. NVIDIA has been at the forefront of fuelling AI companies. Companies such as OpenAI, Meta, Amazon and Oracle are utilising NVIDIA’s H100 GPUs to power their AI models. It was reported that OpenAI had in fact reached out to Microsoft five years ago to build AI infrastructure on thousands of NVIDIA GPUs.

While OpenAI is taking home all the trophies, Microsoft is gaining big from it. The company has been riding high on the success of ChatGPT and related applications that are now an integral part of Microsoft products. From the $13-billion investment in OpenAI to the major announcements at the recent Build conference, none of it would have been possible for Microsoft without OpenAI.

Not Everyone’s Game

While a few companies are surely flying high, not everyone is reaping rewards. Google, one of the major players in the game, has been struggling to establish dominance in the AI race. Reeling from the infamous debacle at its Paris event, which cost the company a jaw-dropping $100 billion, Google came up with major AI announcements at the recent Google I/O event endorsing Bard and PaLM 2. Though its chatbot Bard has received updates and is substantially popular, it is still unable to catch up with ChatGPT, which is now available as an app on iOS.

In trying to cash in on the chatbot wave, China’s Baidu launched its ChatGPT rival Ernie. However, post the announcement, the company’s shares dropped to an eight-week low; Baidu’s stock continued to show no improvement and dipped further. The company has not given up, though, and continues to be optimistic. It recently announced a $145-million venture capital AI fund to back companies that focus on AI-generated content applications.

Online community Stack Overflow has been witnessing a slump ever since ChatGPT came into existence. Instead of spending time browsing Stack Overflow for answers, users have been gravitating towards ChatGPT. To protect its content, the company announced a plan to charge AI developers for access to its programming-driven questions. Recently, the company also laid off 10% of its workforce, citing a focus on profitability and a change in strategy.

Building LLMs comes with its fair share of challenges. AI advisor Vin Vashishta recently compared LLMs to a boat: “It’s fun until it’s your boat and you have to take care of it.” The costs of compute and of fine-tuning datasets will be a deterrent, as will finding people with the experience to deploy LLMs; hence, one should make sure the expected RoI is significant before starting a project.

The Benefiting Intermediaries

Though not all can shine in the AI race, and not everyone can build their own LLMs, there are companies that benefit from simply utilising generative AI. Companies that use AI as a support function will benefit; examples include Zoho, which has integrated ChatGPT into its in-house applications.

Consumer search startup Neeva, founded by former Google executives, launched its AI-powered search earlier this year, but last month the company had to shut down its operations. Within days, cloud-based data company Snowflake acquired Neeva to enable users to have a conversational experience.

It was noted that the usage of ChatGPT in companies led to a positive influence on the company stocks, as it reflected a jump in labour-force productivity. In addition to integrating ChatGPT-like conversational chatbot SlackGPT on their platform, cloud-based company Salesforce also announced their plans to collaborate with Accenture to bring in generative AI for CRM. To further ride the AI wave, some companies are leveraging ChatGPT by offering their services in the form of plugins on the portal. 

With the accelerated pace of AI development, it is evident that AI is inevitable and companies will be involved in one way or another. But how they leverage AI is what matters. NVIDIA’s Jensen Huang recently remarked that rather than worrying about AI taking away jobs, people should worry about someone who is an expert at AI taking them. To quote, “Either you are running for food, or you are running from becoming food.”

The post Everyone’s a Winner in the AI Race appeared first on AIM.

]]>
5 Most Powerful AI Tools for Data Science and Analytics  https://analyticsindiamag.com/top-ai-tools/5-powerful-ai-tools-for-data-science-and-analytics/ Tue, 02 May 2023 13:00:20 +0000 https://analyticsindiamag.com/?p=10092644

Powerful to time-saving, this list of AI-powered tools will come in handy for data science and analytics

The post 5 Most Powerful AI Tools for Data Science and Analytics  appeared first on AIM.

]]>

With the evolution of LLMs and AI models, tools and applications that support data science and analytics are seeing significant development. With AI-powered tools and no-code platforms, ease of use is pushing these applications towards wider adoption. While the traditional roles of data scientists may shift, the new AI tools will complement them in their day-to-day work. By automating repetitive and unwanted tasks, time is better utilised for productive work, which ultimately saves cost.

Here is a list of 5 AI tools for Data Science and Analytics: 

1. ProbeAI 

Known as the “AI Copilot for Data Analysts,” ProbeAI helps with a variety of tasks that simplify the work of data analysts. It avoids time-consuming, repetitive tasks and automates parts of the workflow, including auto-generating complex SQL code, identifying the relevant tables for a question, and optimising and fixing SQL code. It supports databases such as MongoDB, MySQL, PostgreSQL, Snowflake, Google BigQuery and Amazon Redshift.


2. ObviouslyAI

Obviously AI builds no-code AI models for businesses. The tool helps with multiple functions such as data cleaning, model selection, hyperparameter tuning, model deployment and management. It can even help in predicting fraudulent transactions. In a previous interview with AIM, Nirman Dave, CEO of Obviously AI, said that the tool will not replace data scientists but will accelerate the data science process.

3. Relevance AI

Relevance AI is a platform that helps users analyse and visualise unstructured data effortlessly, without the need for coding skills. Functions that can be executed on the platform include automated categorisation, AI workflows, and semantic search, which help users categorise and analyse text, images and audio. The platform also supports use cases such as market research, customer experience, and analytics and insights.

4. MonkeyLearn

MonkeyLearn, a no-code text analytics platform, is designed to help businesses visualise customer feedback. The platform is built on machine learning models, with a range of pre-trained classifiers and extractors. It simplifies data and pulls insights into an easily understandable format. The platform has been used across domains, and its clients include Dell, Freshly, MoxiWorks, and others.


5. IBM Cognos Analytics with Watson

Considered an AI copilot for businesses, IBM Cognos Analytics with Watson is a Business Intelligence solution with AI capabilities integrated. It helps in cleaning data, interpreting it and producing data visualisations, and it supports predictive analysis for forecasting business trends. You can import data from CSV files and spreadsheets, and connect to data sources such as SQL databases, Google BigQuery, Amazon Redshift, etc.

The post 5 Most Powerful AI Tools for Data Science and Analytics  appeared first on AIM.

]]>
AIM Research: Product Partnerships Of Data Service Providers https://analyticsindiamag.com/ai-insights-analysis/aim-research-product-partnerships-of-data-service-providers/ Thu, 23 Mar 2023 08:00:00 +0000 https://analyticsindiamag.com/?p=10089865 Product Partnerships Of Data Service Providers

A significantly high percentage of partnerships with companies like AWS, Microsoft, and Google shows their wide-ranging technology facilities, like cloud storage, database, reporting and visualization, data and analytics, computing, etc.

The post AIM Research: Product Partnerships Of Data Service Providers appeared first on AIM.

]]>
Product Partnerships Of Data Service Providers

Data science is a rapidly growing field, and demand for these services has been increasing as organizations of all sizes seek to harness the value of the large amounts of data they collect. Data service providers are companies that specialize in helping organizations leverage the power of data to make informed decisions and drive business outcomes.

To provide such services, data service providers form product partnerships with different technology providers. A product partnership involves two or more companies collaborating to provide a more comprehensive suite of data services to their clients; these partnerships typically include sharing resources and technology to create innovative solutions that meet the evolving needs of the business.

The report mainly considers data science service providers in India. This could be firms headquartered in India or companies with a majority of their delivery team sitting in India.

This report can help data science service providers analyse the current market trend in forging technology partnerships that enable a high standard of work delivery. It can also be used by consumer companies to understand which tools are predominantly being used, and subsequently invest in the right technologies when making an effort toward digitisation. Technology providers can gauge where they stand in terms of their competition.

Read the complete report here:

The post AIM Research: Product Partnerships Of Data Service Providers appeared first on AIM.

]]>
Mizu Renames Itself To Kubeshark For Kubernetes https://analyticsindiamag.com/ai-news-updates/mizu-renames-itself-to-kubeshark-for-kubernetes/ Tue, 22 Nov 2022 12:22:29 +0000 https://analyticsindiamag.com/?p=10080459

The Mizu team claims that ‘Kubeshark’ is similar to Wireshark re-born for Kubernetes. 

The post Mizu Renames Itself To Kubeshark For Kubernetes appeared first on AIM.

]]>

Earlier this week, Mizu, the API traffic viewer for Kubernetes, renamed itself Kubeshark. Mizu was originally developed by software firm UP9 and was open sourced at the end of 2021.

Check out Kubeshark on GitHub

UP9’s chief executive officer Alon Girmonsky explained in a blog post that UP9 had been building an autonomous testing product that attempted to infer API tests, the capability most sought after by testing professionals, from API traffic.

Simply put, Kubeshark is a monitoring tool that is used to capture all the network traffic inside a Kubernetes cluster, including egress, ingress, and various containers. 

As gaining access to API traffic is a challenging process, the team developed the capability with significant investment so that it would work in large-scale production clusters with minimal impact on performance. Later, realising how other people could utilise the capability, the team decided to contribute the technology to open source as a standalone project called ‘Mizu’.

The shark is re-born 

In 2022, Girmonsky stated that due to a significant strategic shift in UP9, it became challenging to prioritise investments in Mizu. He said, “Lately, we decided to spin off the open source project (fka Mizu) and assign a dedicated core team that can focus solely on the project, developing new features and supporting its users”.

Caption: HTTP traffic captured inside a Kubernetes cluster

As the Mizu project is no longer under UP9, it is now developed and maintained by individuals from the same team that originally built it, partly sponsored by the CEO. “To recognize that the project now has a new home, we decided to change the project name and from now on, Mizu is Kubeshark”, said Girmonsky. Moreover, the company wanted to give the project its own repository and identity on spaces such as Docker Hub and GitHub.

Back in the day, users loved Wireshark, the open-source packet analyser used for network troubleshooting and analysis. The Mizu team claims that Kubeshark is essentially Wireshark re-born for Kubernetes.

Read: Do you really need to learn Kubernetes?

A standalone project

The team believes that the focus will be solely on Kubeshark as a standalone project. 

Some of the areas that Kubeshark will focus on are scalability, ease of use, and real-time as well as historical traffic visibility.

Kubeshark can also capture and display encrypted (TLS) traffic with the help of Linux kernel technologies. It supports a variety of application-layer protocols and RPC frameworks such as gRPC and GraphQL.

Along with the new developments, the GitHub issues section will be used to discuss bug fixes, feature requests, and development in general. The blog reads, “We are aiming to refactor the codebase, reduce the bloat of features, reconsider some of the old design decisions, reduce the technical debt and improve the overall code quality. We plan for this work to translate to a far better software quality and the user experience”.

Further, the GitHub repository has been transferred from up9inc/mizu to kubeshark/kubeshark, meaning everything related to the repository, including pull requests, the issue tracker, and releases, has been shifted to the new repo without any data loss.

Besides Mizu, RStudio also announced that it has a new name, ‘Posit’. The move exhibits the company’s expansion plans with a focus beyond R, including users of Visual Studio Code and Python.

Read: Is Python Slowly Eating R? The Reason Why RStudio Became Posit

The company claims that it rebranded itself to better represent its evolving business.

Hadley Wickham, chief scientist at RStudio, said, “We’re not pivoting from R to Python. I’m not going to stop writing R code, and I’m not going to learn Python.”

Speaking of Python, cloud company Snowflake unveiled new Python-friendly additions to its platform. The firm aims to help developers and data scientists find new ways to develop applications, pipelines, and ML models with Snowflake’s single data platform.
Read: Snowflake is Now Python-friendly

The post Mizu Renames Itself To Kubeshark For Kubernetes appeared first on AIM.

]]>
Is Python Slowly Eating R? The Reason Why RStudio Became Posit https://analyticsindiamag.com/ai-origins-evolution/is-python-slowly-eating-r-the-reason-why-rstudio-became-posit/ Wed, 16 Nov 2022 07:30:00 +0000 https://analyticsindiamag.com/?p=10079845

Python and R are used for similar purposes, but differ in essence.

The post Is Python Slowly Eating R? The Reason Why RStudio Became Posit appeared first on AIM.

]]>

Following the company’s annual user conference in Washington, D.C., held on July 27, RStudio, maker of the popular R IDE (Integrated Development Environment), announced that it has a new name: Posit. The move signalled the company’s expansion plans with a focus beyond R, including users of Python and Visual Studio Code.

Meanwhile, the open-source data science company Posit said, “While many things will stay the same, our rebrand will result in changes beyond a new name.”

RStudio has been emphasising that its commercial products are “bilingual” for both R and Python for many years. However, the “RStudio” brand has made it difficult to convince organisations to consider its products for Python users. 

But does the move suggest that Python is somehow supplanting R in the data science ecosystem?

Cocktail of languages 

At Snowday 2022, cloud company Snowflake announced new additions to its platform that are set to help data scientists and developers figure out new ways to develop pipelines, applications, and ML models with the firm’s single data platform.

Read: Snowflake is Now Python-friendly 

Over the years, Posit (formerly RStudio) has transitioned from R-exclusive tooling to a language-agnostic ecosystem. There has been a gradual shift in the RStudio IDE towards being more Python-friendly. RStudio, a name synonymous with open source R development, rebranded itself to better represent the evolving business.

This led to the rebranding of tools and commercial products: RStudio Connect was renamed ‘Posit Connect’ and RStudio Workbench became ‘Posit Workbench’. RStudio said in a tweet that the RStudio IDE will still be around for open source R development.

RStudio’s chief scientist Hadley Wickham said, “We’re not pivoting from R to Python.” He further explained, “I’m not going to stop writing R code. . . I’m not going to learn Python,” putting users’ concerns at ease.

Although RStudio is seeking to balance the share of engineers working on R with other advancements over time, the company claims that the majority of work will continue to be R-related. 

Can Python replace R?

Python and R are used for similar purposes, but differ in essence.

Python is a high-level, object-oriented programming language that comes with built-in data structures, making it a top language for the development of applications. Python syntaxes are simple and easy to read. 

On the other hand, R is a programming language used for statistical analysis of data and comes with a wide range of techniques for linear modelling, statistical tests, non-linear modelling, and clustering. One of R’s core strengths is the easy production of plots, including mathematical notation and formulas.

However, both languages are preferred for data science, data analysis, and machine learning. R primarily focuses on the statistical aspect of a project, while Python is more flexible in its data analysis and usage.

R plays an important role in visualising data in graphs. However, it is difficult to use the language in a production environment because its production tooling is still maturing. In contrast, Python can be easily integrated into a complex work environment.
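
To give a flavour of that flexibility, here is the kind of few-line exploration Python’s pandas makes routine; the CSV file and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical sales data with "region" and "amount" columns
df = pd.read_csv("sales.csv")

# Per-region summary statistics in one chained expression
print(df.groupby("region")["amount"].describe())
```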

When it comes to performance, users prefer Python as it runs faster than R in most environments. However, a user posted on Reddit that some Python libraries are ‘embarrassing’ compared to their feature-rich counterparts in R.


Nevertheless, both languages are among top favourites for users to work with depending on their usage in a given environment. 

A Single Home for R and Python

Through the R and RStudio communities, the company has helped users pose and answer difficult questions around data. By building open source tools that make “code-first” data science accessible to millions of people and establish reproducibility as a baseline for analysis and communication, the company aims to foster development in a diverse community.

RStudio said that one of the core ideas that the community believed in was the imperativeness of using open source software for scientific work. 

“Scientific work needs to be reproducible, resilient, and must encourage broad participation in the creation of the tools themselves.”

Hadley Wickham said, “The name had just started to feel increasingly constraining.”  Both Wickham and CEO J.J. Allaire emphasised that the rebranding doesn’t signify a shift away from R. 

However, a user claims that above all, the foremost problem with R is governance.

On Python’s GitHub, users can see thousands of pull requests and issues from the various people trying to contribute to the core language. In addition, Python even holds elections, and anyone can theoretically become a “core Python developer”.

The user further said, “Do you want to contribute to the core R language? You can’t. R is open source in terms of the source code being available, but completely closed in terms of development. You cannot even directly create an issue on the bug tracker if you find a bug.”

If Python is a democracy, R is a feudal system. Since R’s core developers are unelected and few enough to be counted, Python’s governance is considerably more open and diverse than R’s.

The post Is Python Slowly Eating R? The Reason Why RStudio Became Posit appeared first on AIM.

]]>
Snowflake is Now Python-friendly   https://analyticsindiamag.com/ai-news-updates/snowflake-is-now-python-friendly/ Fri, 11 Nov 2022 04:03:27 +0000 https://analyticsindiamag.com/?p=10079381

At Snowday 2022, Snowflake announced exciting new additions to its platform. The additional features will help developers, data scientists, and data engineers to increase productivity and uncover new ways to develop applications, pipelines, and ML models with Snowflake’s single data platform.  Interestingly, after announcing its acquisition of Streamlit, an open-source framework for machine learning and […]

The post Snowflake is Now Python-friendly   appeared first on AIM.

]]>

At Snowday 2022, Snowflake announced exciting new additions to its platform. The additional features will help developers, data scientists, and data engineers to increase productivity and uncover new ways to develop applications, pipelines, and ML models with Snowflake’s single data platform. 

Interestingly, after announcing its acquisition of Streamlit, an open-source framework for machine learning and data science teams to build and share data apps, Snowflake has said that users will be able to develop data applications using Python directly on its platform. The applications created can then be run on Snowflake’s secure and governed platform. 

Snowflake’s developer framework, Snowpark, supports multiple programming languages like Java, Scala, and SQL. Python is the latest addition and one that enables developers to co-create projects without any concern about data security and compliance roadblocks. 
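
For a sense of what Snowpark for Python code looks like, here is a minimal sketch; the connection parameters and the ORDERS table are placeholders, and error handling is omitted.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder credentials; supply your own account details
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# DataFrame operations are pushed down and executed inside Snowflake
open_orders = (
    session.table("ORDERS")
           .filter(col("STATUS") == "OPEN")
           .group_by("REGION")
           .count()
)
open_orders.show()
```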

Further, Snowflake has been able to leverage the abilities offered by partners like Anaconda, dbt Labs and more. Anaconda’s integration with Snowflake means that the open-source Python libraries curated by Anaconda are now available to Snowflake users, eliminating the need for manual installation or package dependency management. Snowflake’s integration with dbt Labs, on the other hand, combines the power of SQL and Python, bridging the widening gap between analytics and data science teams.

Additionally, Snowflake is planning to release its own optimised warehouses, now in public preview on AWS, so that developers can run large-scale ML training and other memory-intensive operations in Snowflake, along with Python Worksheets, in private preview, where applications, data pipelines and ML models can be developed.

Snowflake has also taken other measures to bring developers to build applications in the Data Cloud. The Schema Interface allows developers to onboard data faster, thereby increasing productivity, while also executing pipelines seamlessly with Serverless Tasks natively in the platform. Additionally, Snowflake has also introduced two new tools—dynamic tables and observability & experiences. 

Dynamic tables automate incremental processing through the use of declarative data pipelines for coding efficacy and ease. Observability & experiences include alerting (private preview), logging (private preview), event tracing (private preview), task graphs and history (public preview), and more, in order to build, test, debug, deploy, and monitor data pipelines more productively. 

Python is known to be the most popular language among data scientists and the third most popular among developers. By supporting Python and integrating an open-source package library, Snowflake will be able to onboard a large portion of the developer community to the Data Cloud.

The post Snowflake is Now Python-friendly   appeared first on AIM.

]]>
Building a resilient, scalable modern data platform https://analyticsindiamag.com/ai-origins-evolution/building-a-resilient-scalable-modern-data-platform/ Mon, 02 May 2022 10:30:00 +0000 https://analyticsindiamag.com/?p=10066057

The first generation platforms were based on a data warehouse model.

The post Building a resilient, scalable modern data platform appeared first on AIM.

]]>

Sumit Jindal, director of data engineering and Rashmi Purbey, manager of data engineering, from Publicis Sapient, spoke in detail about the evolution of modern data architecture and application at the Data Engineering Summit 2022. The duo unpacked the data buzzwords doing the rounds in the world of AI and data science in an information-packed session.

Sumit started off with an example of an online and offline multiformat retail and financial giant with a presence in multiple countries. The client needed multi-language support. “We had to build data for multi-business units spanning across multiple business domains, and also account for country dimensions,” he said.

“We enable the data platform, which is seamlessly working on diverse data set from different business units and countries. And the outcome of this system helps our clients become digitally integrated enterprises,” he added.

A modular view

“This is a logical view of a modern data platform. What we are seeing here is the data is coming in different formats such as structured data, unstructured data, semi-structured data etc. The data can be integrated through APIs such as on-demand at batch loads, direct integration through databases, and we can have real-time streaming data as a prerequisite for the system,” Sumit said.

The data collection layer should have the functionality of combining or consuming data from different sources. It should have a data provenance layer where you can go back and see if something is wrong. Apart from this, storage is a fundamental point of a data platform.

Evolution of data architecture

Sumit said the first-generation platforms were based on a data warehouse model. Data was integrated with ETL tools like Informatica, DataStage, etc, and databases were integrated in a batch fashion. The data warehouse would be a SQL-based system such as Oracle or Teradata, which were somewhat more performant for ad-hoc BI queries. The process of data integration faced limitations such as storage and compute power. Normally, a top-down approach is used while building such a data warehouse.

Two-tier architecture

The two-tier architecture is a more modern method of data warehousing. With the advent of systems like Hadoop and Spark, a data lake based model has emerged. In a data warehouse, the storage of unstructured data is particularly challenging, and frequent data updates or ingestions pose another bottleneck.

“The two sections of a two-tier architecture are: First, a data lake layer, where you are processing your data. You are first loading data from multiple sources. And the second is data transformation where you are transforming data and making it available for ML as well as analytics use case and for a lot of ad hoc analytics,” he said.

The advantage is you can have multi-modal data available in all formats. However, the inconsistency or staleness of data is an issue.
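
As a rough illustration of that two-tier flow, the PySpark sketch below loads raw events into the lake and then curates them for analytics; the paths and columns are hypothetical, not from the talk.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("two-tier-demo").getOrCreate()

# Tier 1: land raw, semi-structured events in the data lake as-is
raw = spark.read.json("s3://lake/raw/events/")  # hypothetical path

# Tier 2: transform into a curated, analytics-ready table
curated = (
    raw.filter(F.col("event_type") == "purchase")
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "country")
       .agg(F.sum("amount").alias("revenue"))
)

# Persist for BI queries and downstream ML feature pipelines
curated.write.mode("overwrite").parquet("s3://lake/curated/daily_revenue/")
```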

Data lakehouse

Rashmi Purbey spoke about the applications of data lakehouse on various cloud systems.

  1. Lakehouse on Databricks (Azure as cloud platform)
  2. Lakehouse on AWS
  3. BigLake – Lakehouse on GCP
  4. Lakehouse using Snowflake

Databricks combines the best of both worlds, data warehousing and data lake. Lakehouse (AWS) provides a stable interface layer to query the data from both the data warehouse as well as the data lake. BigLake is a storage engine that allows organisations to unify data warehouses and lakes, enables them to perform uniform fine-grained access control, and accelerates query performance across multi-cloud storage and open formats. Snowflake is a data warehouse built for the cloud. It enables the data-driven enterprise with instant elasticity, secure data sharing, and per-second pricing. Snowflake combines the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions.

“When building a resilient, scalable data platform, businesses normally focus on the platform that they are building, rather than concentrating on the analytics that goes behind building such a platform. Apart from that, one has to even consider the data being generated as a product in itself as there is a demand for such data in the market. One needs to keep enhancing and improving it to keep its quality up. Having the right quality checks and monitoring the output is of paramount importance in building a robust data product,” said Sumit.


The post Building a resilient, scalable modern data platform appeared first on AIM.

]]>
What’s new in Streamlit version 1.8.0? https://analyticsindiamag.com/ai-news-updates/whats-new-in-streamlit-version-1-8-0/ Fri, 25 Mar 2022 06:57:04 +0000 https://analyticsindiamag.com/?p=10063571

Streamlit is an open-source Python library that makes it easy to create custom web apps for machine learning and data science.

The post What’s new in Streamlit version 1.8.0? appeared first on AIM.

]]>

Streamlit has announced its latest version, 1.8.0. Streamlit is an open-source Python library that makes it easy to create custom web apps for machine learning and data science.

The key changes in the new release include:

  1. Performance improvements on dataframes – In this update, the time taken to parse dataframes is massively reduced. A 150MB dataframe now goes from taking 3s to process on the frontend down to 125ms, which is primarily the time spent by Arrow turning the byte array into a table.
  2. ‘st.slider’ handles timezones better by removing timezone conversions on the backend – This update removes any timezone conversion on the backend and uses the UTC timezone on the frontend instead of the browser timezone, so DateTime instances are always shown with their exact date and time (see the sketch below).
  3. Design improvements to the header – This update introduces a handful of design adjustments to the app header, such as styled top buttons, a different hover state for buttons, a semi-opaque background for the top header, and the consequent removal of the background behind the running man animation.
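
A minimal sketch of the improved datetime slider follows; the dates are illustrative. Saved as app.py and launched with `streamlit run app.py`, the selected instants display exactly as constructed, independent of the browser’s timezone.

```python
import datetime

import streamlit as st

# DateTime values are rendered in UTC rather than the browser's timezone,
# so the instants below display exactly as constructed.
start, end = st.slider(
    "Select a reporting window",
    min_value=datetime.datetime(2022, 1, 1),
    max_value=datetime.datetime(2022, 12, 31),
    value=(datetime.datetime(2022, 3, 1), datetime.datetime(2022, 6, 1)),
)
st.write("Window:", start, "to", end)
```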

Recently, Streamlit agreed to be acquired by Snowflake to join forces and open new frontiers in data science and data application development.

The post What’s new in Streamlit version 1.8.0? appeared first on AIM.

]]>
Why did Snowflake acquire Streamlit? https://analyticsindiamag.com/ai-origins-evolution/why-did-snowflake-acquire-streamlit/ Mon, 14 Mar 2022 10:30:42 +0000 https://analyticsindiamag.com/?p=10062694 Snowflake acquires Streamlit

Existing Snowflake customers would be able to leverage Streamlit’s app development framework to use data with Snowflake Data Cloud.

The post Why did Snowflake acquire Streamlit? appeared first on AIM.

]]>
Snowflake acquires Streamlit

Data cloud company Snowflake has announced that it will be acquiring Streamlit, a framework that simplifies and accelerates the development of data applications. The deal, currently valued at $800 million, is at a very preliminary stage and subject to regulatory approvals and customary closing conditions.

Snowflake-Streamlit: A good match

The two companies would be working together to help developers build apps with simplified data access and governance. Existing Snowflake customers would be able to leverage Streamlit’s app development framework to use data with Snowflake Data Cloud. Streamlit customers, on the other hand, will now have access to ‘trusted and secure’ data for their applications.

Founded in 2018 by Adrien Treuille, Amanda Kelly, and Thiago Teixeira, the California-headquartered Streamlit offers an open-source framework to build and share apps quickly. Streamlit has over 8 million downloads, and 1.5 million applications have been built using this framework.

Streamlit’s motivation was to make building tools as easy as writing Python scripts. Instead of offering a one-size-fits-all tool, Streamlit creates Lego-like capabilities which users can join together to suit their needs. Streamlit treats widgets as variables, and every interaction reruns the script from top to bottom. The product deploys apps directly from private Git repos and updates on commits.
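
That execution model is easiest to see in a toy script, sketched below: every interaction reruns the whole file, and the widget’s current value is just an ordinary Python variable.

```python
import streamlit as st

# This entire script re-executes on every interaction;
# `n` simply holds the slider's current value for this run.
n = st.slider("Number of points", min_value=10, max_value=1000, value=100)

st.line_chart([i ** 0.5 for i in range(n)])
```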

Co-founder and president of products at Snowflake, Benoit Dageville, said that with this partnership, Snowflake would be able to let even non-technical users interact with data and build apps. Until now, Snowflake had tools for accessing and managing technical data but lacked a data visualisation platform; Streamlit fills that void.

Where is Snowflake going?

In September 2020, San Mateo-headquartered Snowflake raised $3.36 billion in its initial public offering. At that time, it was the biggest US listing of the year, surpassing the previous best IPO of Royalty Pharma. Snowflake’s IPO was a rebound for the US stock market when a lot of companies had put a hold on IPO due to the pandemic.

Snowflake makes virtual machines available to anyone on the public cloud platform. Despite facing stiff competition from the likes of Oracle, Microsoft SQL Server, Amazon Redshift, and Google BigQuery, Snowflake continues to hold its ground.

Snowflake’s multi-cloud support, which makes cloud data warehousing useful for its clients, is its differentiating factor. Snowflake uses scalable cloud blob storage available on AWS, Azure or GCP, and offers reliability and scalability by utilising distributed storage systems. Its cloud data warehouse architecture can process massive volumes of data with a high degree of efficiency. This unique architecture makes Snowflake suitable for a wide range of applications: streamlining data ingestion and integration, data warehousing, and streamlining data science workloads.

As of January 2021, Snowflake has over 4,000 customers, including 186 from the Fortune 500 list. The company has a free-to-join marketplace called Snowflake Data Exchange, where customers can connect with data providers and get access to an additional data stream.

Snowflake and open source

Most developers would prefer a third-party vendor to manage their database, provided factors like safety and reliability are taken into account. This explains why developers might prefer cloud over open source, even as the latter is a great way to build software and, in turn, foster a community.

That said, companies like Snowflake remain largely unchallenged by their open-source counterparts. In an interesting blog on the company’s website titled ‘Choose open wisely’, the authors write that Snowflake believes in ‘open where open matters’. It means that while the company values open standards and open source, it would not trade off ease of use, transparent optimisations, and continuous improvements for them. “Some companies tout being open and pride themselves on being open source, but in reality, their embrace is not 100%; as described in this document, there are good reasons dictating such departures. Our goal is to be clear and transparent about how we think about these topics at Snowflake and to dispel myths and misconceptions,” the authors write. Snowflake’s adamancy about remaining in a closed environment has irked stakeholders in the past.

This makes one think about the future of Streamlit after its acquisition. Streamlit has, until now, been a completely open-source platform. What lies ahead remains to be seen.

The post Why did Snowflake acquire Streamlit? appeared first on AIM.

]]>
Snowflake to acquire Streamlit to empower developers and data scientists https://analyticsindiamag.com/ai-news-updates/snowflake-to-acquire-streamlit-and-empower-developers-and-data-scientists/ Thu, 03 Mar 2022 11:08:48 +0000 https://analyticsindiamag.com/?p=10062059

Streamlit and Snowflake developer communities will now be able to tap into cutting edge technologies and get a lot more done from data.

The post Snowflake to acquire Streamlit to empower developers and data scientists appeared first on AIM.

]]>

Snowflake, a data cloud company, has signed a definitive agreement to acquire Streamlit, a framework built specifically for machine learning and data science teams. Streamlit has more than 8 million downloads, and over 1.5 million applications have been built using their framework.

This strategic acquisition allows the two companies to unlock the potential of data and make it easier to build applications.

The Snowflake Data Cloud provides developers and data scientists with a secure collaboration hub for data, and Streamlit’s open-source framework enables them to build and share data apps quickly and iteratively. Together, the two companies will enable developers to build apps with simplified data access and governance, and will benefit from a larger, more active community contributing to the Streamlit framework.

At the announcement, Snowflake Co-Founder and President of Products, Benoit Dageville, said, “When Snowflake and Streamlit come together, we will be able to provide developers and data scientists with a single, powerful hub to discover and collaborate with data they can trust to build next-generation data apps and shape the future of data science.”

Streamlit CEO and Co-Founder Adrien Treuille said, “By joining forces with Snowflake, both the Streamlit and Snowflake developer communities will be able to tap into cutting edge technologies for unlocking data’s true potential.”

Closing of the acquisition is subject to the receipt of required regulatory approvals and other customary closing conditions.

The post Snowflake to acquire Streamlit to empower developers and data scientists appeared first on AIM.

]]>
Snowflake makes Data Classification available in public preview https://analyticsindiamag.com/ai-news-updates/snowflake-makes-data-classification-available-in-public-preview/ https://analyticsindiamag.com/ai-news-updates/snowflake-makes-data-classification-available-in-public-preview/#respond Wed, 23 Feb 2022 10:45:30 +0000 https://analyticsindiamag.com/?p=10061396 data science platforms

Currently, Snowflake’s Data Classification focuses on direct identifiers, quasi-identifiers, and sensitive attributes.

The post Snowflake makes Data Classification available in public preview appeared first on AIM.

]]>
data science platforms

Snowflake has made Data Classification available in public preview. Data Classification is built into Snowflake’s platform and is available to users at no extra cost.

Snowflake’s Data Classification aims at helping organisations govern sensitive data by removing manual processes or dependence on a third-party tool. Organisations can leverage Data Classification to understand their data and unlock the analytical value present in the data. Once classified using Snowflake’s Data Classification, organisations can easily run queries defined in INFORMATION_SCHEMA to search for this data, protect it with role-based policies, and audit access through Access History, which is all part of Snowflake’s suite of native data governance features. 

Data Classification analyses the contents and metadata of columns in a table and then feeds that information into a pre-built machine learning model to help determine the appropriate categories of personal information that may be considered sensitive, requiring more protection or limited access, and applies the results as System Tags. Snowflake will continue to add more categories, giving customers more functionality with little additional input.

Currently, Snowflake’s Data Classification focuses on direct identifiers, quasi-identifiers, and sensitive attributes.
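
During the preview, classification was surfaced through SQL functions such as EXTRACT_SEMANTIC_CATEGORIES; the sketch below shows roughly how one might invoke it from Python, with the connection details and table name as placeholder assumptions.

```python
import snowflake.connector

# Placeholder connection details
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
)

cur = conn.cursor()
# EXTRACT_SEMANTIC_CATEGORIES analyses a table's columns and returns
# suggested semantic/privacy categories as JSON
cur.execute("SELECT EXTRACT_SEMANTIC_CATEGORIES('MYDB.PUBLIC.CUSTOMERS')")
print(cur.fetchone()[0])
```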

The post Snowflake makes Data Classification available in public preview appeared first on AIM.

]]>
https://analyticsindiamag.com/ai-news-updates/snowflake-makes-data-classification-available-in-public-preview/feed/ 0
Is Machine Learning Currently Hyped? https://analyticsindiamag.com/ai-origins-evolution/machine-learning/ Tue, 23 Nov 2021 04:30:00 +0000 https://analyticsindiamag.com/?p=10054027

With 90 per cent of organisations struggling to implement ML, it is important to take a relook at the actual benefits of the technology and its possibilities.

The post Is Machine Learning Currently Hyped? appeared first on AIM.

]]>

The main challenge that organisations face in implementing machine learning is complex infrastructure or workload needs; a whopping 90% of CXOs feel this way. Digging into the details, 88% struggle with the integration and compatibility of AI/ML technologies, while 86% struggle with the frequent updates required for data science tooling.

These statistics, from DataRobot’s 5 Latest Trends in Enterprise Machine Learning 2021 report, show that many organisations have a difficult time keeping up with ML. This raises the question: is ML really overhyped?

Reality Check

Every year, some technologies are more popular than others, as happened with cloud computing, big data and cybersecurity. Machine learning is currently the topic that lets people dream about the future and the possibilities the technology could open up. Some of those dreams even turn frightening, featuring self-learning robots that take over the world. The reality is far removed from this: today, practitioners still struggle to fully understand the statistical and mathematical supervised learning models deployed in production.

Such visions of the future surely motivate investment in the technology, but they also drive the so-called hype. Experts suggest these situations arise when ML is demanded without actually considering internal data readiness or the requirements of the tooling.

ML Engineers Struggle

A quick look at answers from ML engineers and data scientists on Quora suggests they are dissatisfied with jobs at companies that hold ‘hyped’ expectations. Some of the top complaints include:

  • They were hired to do basic data analysis in Excel sheets, R, or Python — none of which involved ML at all.
  • Data is sparse, and the company does not collect the right features, so models exhibit a high degree of non-linearity and come back with low accuracy.
  • They end up being an ‘SQL junkie.’
  • Apprehensive managers, unsure of what ML can do, refuse to fund experiments for fear of the business losing money.
  • IT does not cooperate, sometimes even withholding cloud credentials needed for complete access.

Becoming Data First

Deploying machine learning successfully requires a solid data foundation, which in turn demands a complete change in organisational processes and culture.

Enterprises must work on ‘data readiness’ before any machine learning development begins. This includes getting clean and stable data, and creating data governance processes as well as scalable data structures. Companies need to implement long-term data-based plans and policies to create a common data architecture.

Employees require time to adapt while onboarding any new technologies, and ML is no exception.

New technologies are always overhyped

When computers were gaining popularity in the 1950s, people — especially the military — thought the future of these machines was humanoids. Nobody imagined the Internet would actually change the world. The situation is similar today, where the latest algorithms developed in AI and ML are routinely overhyped.

ML is not new, though. Arthur Lee Samuel, an American pioneer in computer gaming and AI who popularised the term “machine learning,” defined it in 1959 as the field that gives computers the ability to learn without being explicitly programmed. Today, ML is more objective in nature, concerned with what can actually be achieved in realistic terms.

Wrapping up

ML has been successful at many impressive things, like production parameter adjustment, predictive maintenance, and visual quality control. It is essential to set achievable long-term goals and work on organisational infrastructure, data strategies and culture. According to Paul Zhao, Principal Product Manager, Data Science and Machine Learning, Snowflake, “The need to leverage machine learning for better and faster insights is clear. Only organisations that are able to rein in the complexities around infrastructure, tooling, operations, and workloads will be able to deliver on the value of those insights.” The possibilities of what ML can achieve are endless, and it probably deserves the hype.

The post Is Machine Learning Currently Hyped? appeared first on AIM.

Top 8 Alternatives To Snowflake https://analyticsindiamag.com/ai-origins-evolution/top-8-alternatives-to-snowflake/ Fri, 06 Aug 2021 12:30:00 +0000 https://analyticsindiamag.com/?p=10045427

Snowflake offers fast, reliable, secure and cost-effective access to data by creating a single, governed and immediately available source.

Data warehouses let organisations consolidate their data for deep analytics, and one of the leading players in the data warehousing space is Snowflake.

US-based data warehousing company Snowflake Inc was founded in July 2012 by Benoit Dageville, Thierry Cruanes and Marcin Zukowski to spare companies from buying expensive hardware appliances to run in their own data centres for storing data. Its query engine is built in-house.

It offers fast, reliable, secure and cost-effective access to data by creating a single, governed and immediately available source. It has partnered with data integration and business intelligence solution providers, including Tableau, Qlik, Sigma, and Stitch, to give customers seamless access to their data. Snowflake has run on Amazon S3 since 2014, Microsoft Azure since 2018 and Google Cloud Platform since 2019.

In this list, we explore alternatives to Snowflake. 

Google BigQuery 

Google’s multi-cloud data warehouse BigQuery is highly scalable and designed for business agility. BigQuery claims to help businesses run analytics at scale with up to 34 per cent lower three-year TCO than its alternatives. It also integrates seamlessly with Google products such as Google Analytics. However, it lacks native integrations for pulling data from non-Google sources.

New BigQuery customers get $300 in free credits to spend during their first 90 days on Google Cloud. Additionally, customers get 10 GB of storage and up to 1 TB of queries per month free of charge.
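
To give a flavour of the developer experience, here is a minimal sketch of running a query with the google-cloud-bigquery Python client; the project ID is a hypothetical placeholder, and the query reads one of Google’s public datasets.

```python
# A minimal sketch of an ad-hoc BigQuery query from Python.
# The project ID is a hypothetical placeholder; credentials are
# assumed to be configured in the environment.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():  # result() blocks until the job completes
    print(row.name, row.total)
```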

Amazon Redshift 

Cloud-based data warehouse Amazon Redshift is a product of the Amazon Web Services cloud platform. Mostly designed for data scientists and data engineers, Redshift is fast and fully managed, making it simple and cost-effective to analyse data using SQL and existing BI tools. 

However, Redshift users need third-party tools for ETL and data transformation. It is HIPAA and GDPR compliant. Redshift offers a two-month free trial, after which pricing starts at $0.25 per hour.
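
Because Redshift speaks the PostgreSQL wire protocol, a common pattern is to query it with a standard Postgres driver. Here is a minimal sketch using psycopg2; the cluster endpoint, credentials and table are hypothetical placeholders.

```python
# A minimal sketch of querying Redshift over its PostgreSQL-compatible interface.
# Endpoint, database, credentials and table are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,            # Redshift's default port
    dbname="dev",
    user="awsuser",
    password="...",
)
with conn.cursor() as cur:
    cur.execute("SELECT venuename FROM venue LIMIT 5;")  # placeholder table
    for (venuename,) in cur.fetchall():
        print(venuename)
conn.close()
```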

IBM Db2 

IBM’s Db2 warehouse offers in-memory BLU processing technology and in-database analytics. It provides scalability and performance through an MPP architecture and is compatible with Oracle and Netezza, making it a suitable option for businesses that need to keep data on-premises or want the flexibility of the cloud without compromising privacy requirements. Additionally, IBM Db2 is a good fit for businesses considering a hybrid architecture to modernise their data warehouse.

Panoply 

Co-founded by Yaniv Leven and Roi Avinoam, Panoply is an end-to-end cloud data warehouse and management service. Its no-code data integrations ensure zero maintenance. Its features include:

  • Automatic data type detection.
  • Built-in performance monitoring system.
  • Hands-free scaling.
  • Pre-built SQL queries. 

As an ETL tool, Panoply comes with built-in ETL integrations to ready-to-use data sources. In addition, Panoply offers plug-and-play compatibility with analytic notebook and BI tools. 

Panoply offers a 14-day free trial, after which it has three annual pricing tiers, starting at $399 per month.

Microsoft SQL Server

Microsoft SQL Server combines data analytics and warehousing, making it one of the most popular SQL database platforms. It underpins the Azure data warehouse and Microsoft’s transactional databases. With the launch of Azure Synapse Analytics, Microsoft created a unified platform for ingesting, preparing, managing and serving data that can be channelled into BI and ML tools.
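
As a quick illustration, here is a minimal sketch of connecting to a SQL Server (or Azure SQL) instance with pyodbc; the server, database and credentials are hypothetical placeholders, and the connection string assumes ODBC Driver 17 is installed.

```python
# A minimal sketch of querying SQL Server from Python via pyodbc.
# Server, database and credentials are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"  # placeholder server
    "DATABASE=mydb;UID=myuser;PWD=..."
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")  # list a few tables
for (name,) in cursor.fetchall():
    print(name)
conn.close()
```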

Microsoft SQL Server’s pricing is volume-dependent; however, the services are free for the first 180 days.

Azure Synapse Analytics 

Azure Synapse Analytics brings together data integration, enterprise data warehousing and big data analytics. It is a cloud-based enterprise data warehouse. Azure Synapse Analytics leverages MPP to run complex queries across data. It allows ingestion, exploration, preparation, management and serving of data for immediate BI and ML needs. 

With Azure Synapse Analytics, a customer pays only for the capabilities they opt to use.

Azure Data Lake Storage

Microsoft Azure Data Lake Storage allows developers, data scientists and analysts to store data of any size, shape and speed, and to perform all types of processing and analytics across platforms and languages. Furthermore, it integrates with existing operational stores and data warehouses, allowing users to extend their existing data applications.

It is a massively scalable, secure data lake for customers’ high-performance analytics workloads. Additionally, it offers a single storage platform for ingestion, processing and visualisation that supports common analytics frameworks.

Oracle Exadata Cloud Service 

Oracle Exadata Cloud Service allows customers to run Oracle Database workloads in the cloud. Its infrastructure is isolated from other users, ensuring maximum security, performance and uptime. 

Oracle Exadata Cloud Service is cost-effective and reliable for data warehousing and business intelligence, and it offers flexible licensing options. The first 30 days are free.

The post Top 8 Alternatives To Snowflake appeared first on AIM.

Top Private Cloud Service Providers For Data Analytics https://analyticsindiamag.com/ai-origins-evolution/topprivate-cloud-service-providers-snowflake/ Sun, 04 Oct 2020 10:30:00 +0000 https://analyticsindiamag.com/?p=10008807

Maintaining data centres is not a core competency for many enterprises. So, over the past decade, many companies surfaced promising to alleviate these enterprise woes by offering seamless intelligence services via the cloud, and a new tribe of platforms known as private cloud services came to the fore. These startups let customers shift the burden by virtualising hardware for storage and compute, integrating multiple intelligent platforms and offloading cumbersome maintenance to experts.

IT companies are rapidly migrating to the cloud as it has become the ideal channel for delivering enterprise analytics. This was further fueled by the COVID-19 pandemic. Cloud data stores are so performant that complex queries on massive data now execute in seconds.

In this article, we take a look at the fastest-growing cloud service providers for B2B data analytics, AI and more. The list is arranged in order of funding raised so far. (Funding details via Forbes The Cloud 100)

Snowflake

Funding: $1.4 billion

The San Mateo-based cloud data platform Snowflake made headlines by raising $3.36 billion in its IPO last month. Backed by Warren Buffett, the cloud startup also became 2020’s biggest IPO. Snowflake is a fully-managed service that is simple to use but can power a near-unlimited number of concurrent workloads. It offers solutions for data warehousing, data lakes, data engineering, data science, data application development, and for securely sharing and consuming shared data.

Samsara

Funding: $930 million

Samsara, which claims to be the leader in the Industrial IoT market, was founded in 2015 and today powers over 2 million networks worldwide. Headquartered in San Francisco, Samsara’s portfolio of complete Internet of Things (IoT) solutions combines hardware, software, and the cloud to bring real-time visibility, analytics, and AI to operations. The company serves over 15,000 customers of diverse sizes across industries, from transportation and logistics to field services, food production, energy, construction, local governments, and manufacturing.

Databricks

Funding: $897 million

Founded in 2013 by the creators of Apache Spark™, Delta Lake and MLflow, Databricks marries data engineering, science and analytics on an open, unified platform so that customers can collaborate and innovate faster. Headquartered in San Francisco, the company’s global partners include Microsoft, Amazon, Tableau, Informatica, Capgemini and Booz Allen Hamilton. More than 5,000 organisations worldwide — including Shell, Conde Nast and Regeneron — rely on Databricks for data science, full-lifecycle machine learning and business analytics.


Rubrik

Funding: $553 million

Rubrik offers a one-stop solution for the data hurdles an organisation has to overcome, spanning cloud, edge and on-prem deployments for backup, disaster recovery, archival, compliance, analytics, and copy data management. Rubrik calls itself an intelligent data management stack in which every layer is independently resistant to failures. Designed to run on-prem or in the cloud, the stack is anchored by Infinity (API), Cerebro and Atlas (a cloud-scale file system built from scratch).

Rubrik’s Blob Engine and Distributed Task Framework orchestrate data from on-prem to the cloud so that it can be retrieved quickly in real time. Unlike legacy solutions, Rubrik has an API-first architecture and consumes the same APIs it publishes and offers to users.

Confluent

Funding: $456 million

Built by the original creators of Apache Kafka, Confluent Cloud is the industry’s only fully managed, cloud-native event streaming platform powered by Apache Kafka. Apache Kafka is an open-source, distributed streaming platform that enables 100,000+ organisations globally to build event-driven applications at scale. Confluent Cloud provides a simple, scalable, resilient, and secure event streaming platform for the cloud-first enterprise, the DevOps-starved organisation, or the agile developer on a mission.
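
To make the event-streaming model concrete, here is a minimal sketch of publishing events with the kafka-python client, which works against any Kafka-compatible broker; the broker address and topic name are hypothetical placeholders.

```python
# A minimal sketch of producing JSON events to a Kafka topic.
# Broker address and topic name are hypothetical placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "status": "shipped"})
producer.flush()  # block until the event has actually been delivered
```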

DataRobot

Funding: $431 million

DataRobot’s enterprise AI platform offers seamless cloud integration with all of the preferred hosting providers. Supported by all of the major cloud hosts, DataRobot allows users to scale their infrastructure securely and at lower cost, without having to commit to one cloud vendor for storage, compute, and machine learning. DataRobot’s multi-cloud strategy includes cloud data integration with AWS, Microsoft Azure, and GCP.

Sisense

Funding: $270 million

Ever since its first commercial release in 2010, Sisense has grown to become an industry leader with over $100M in annual recurring revenue and over $200M in funding.

With more than 2,000 global customers, Sisense for Cloud Data Teams provides data teams with the ability to build cloud data pipelines, perform advanced analysis using languages they already know like SQL, Python, and R, and create advanced, custom visualizations to easily share insights.

The post Top Private Cloud Service Providers For Data Analytics appeared first on AIM.

Nvidia’s $40 Billion Bet, Apple’s Best Chip So Far And More In This Week’s Top News https://analyticsindiamag.com/ai-news-updates/nvidia-arm-apple-microsoft-oracle-news/ Sat, 19 Sep 2020 12:30:10 +0000 https://analyticsindiamag.com/?p=10007686

This week, American tech giants reignited their ambitions of becoming the centre of global chipset innovation. Nvidia, which has been on a tremendous run, has probably become one of the biggest chip companies ever by pocketing the UK’s crown jewel, Arm Holdings; Arm’s architecture powers 95% of the world’s smartphones, even Apple’s. Apple, meanwhile, has set a new benchmark by launching the iPad Air, which houses the world’s first 5-nanometre chip, the A14 Bionic. Here is what happened this week, brought to you by Analytics India Magazine.

Big Pharma Tunes Into Mellody

Machine Learning Ledger Orchestration for Drug Discovery (MELLODDY) — an Innovative Medicines Initiative-funded consortium of 10 pharmaceutical partners, including AstraZeneca, GSK and Novartis — has announced the creation of a secure predictive modelling platform and the first successful federated learning run using the new platform.

The aim of the consortium is to develop a cutting-edge FL platform that enables the generation and enhancement of predictive ML models using distributed pharmaceutical data, without exposing or revealing any individual company’s proprietary data and models.
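
To illustrate the idea, here is a toy sketch of federated averaging, the family of techniques such platforms build on: each partner trains on its own private data and only model weights, never raw records, leave the premises. This is an illustration only, not MELLODDY’s actual protocol.

```python
# A toy illustration of federated averaging: partners share weights, not data.
# Synthetic data stands in for each partner's private dataset.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient-descent step on a partner's private least-squares problem."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
partners = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
global_w = np.zeros(3)

for _ in range(10):
    # Each partner computes an update on data that never leaves its site...
    local_ws = [local_update(global_w, X, y) for X, y in partners]
    # ...and only the resulting weight vectors are averaged centrally.
    global_w = np.mean(local_ws, axis=0)

print(global_w)  # the collaboratively trained model
```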

World’s First 5 nm Chips Are Here

Apple, at its latest event ‘Time Flies’, introduced an all-new iPad Air powered by the A14 Bionic, a 5 nm chipset. This makes the iPad Air the world’s first device to ship with a 5 nm chip. “We’re excited to introduce Apple’s most powerful chip ever made, the A14 Bionic,” said Greg Joswiak, Apple’s senior vice president of Worldwide Marketing.

A14 Highlights 

  • Apple claims to be the “first in the industry” to use the 5 nm process technology to manufacture chips.
  • The A14 houses 11.8 billion transistors, nearly 40% more than the A13.
  • The CPU is 40% faster than the one in the previous iPad Air.
  • Graphics performance is up 30%.
  • It delivers 11 trillion operations per second.

Nvidia Pockets Arm 

On Tuesday, NVIDIA announced that it would acquire Arm Limited from SBG and the SoftBank Vision Fund in a transaction valued at $40 billion. According to the terms of the transaction, NVIDIA will pay SoftBank $21.5 billion in common stock and $12 billion in cash. This includes $2 billion payable at signing. 

Deal Highlights

  • Unites NVIDIA’s leadership in AI with Arm’s vast computing ecosystem 
  • NVIDIA will build a world-class AI research and education center, and build an Arm/NVIDIA-powered AI supercomputer.
  • Arm’s open-licensing model remains intact, and its IP licensing portfolio will expand with NVIDIA technology. Arm’s IP will remain registered in the U.K.

Oracle Closes In On TikTok Deal

From among Microsoft and a handful of other suitors, Chinese internet company ByteDance has chosen Larry Ellison’s Oracle to anchor TikTok on US soil.

On Wednesday, Oracle confirmed it had struck a deal with TikTok owner ByteDance to become a “trusted technology provider”, under which Oracle would reportedly manage TikTok’s US data. The deal still needs approval from the US and China, and it may yet fall through: today, the US government announced a ban on two of the most successful Chinese exports, WeChat and TikTok, from app stores.

According to the Department of Commerce, as of September 20, 2020, the following transactions are prohibited:

  • Any provision of service to distribute or maintain the WeChat or TikTok mobile applications, constituent code, or application updates through an online mobile application store in the U.S.
  • Any provision of services through the WeChat mobile application for the purpose of transferring funds or processing payments within the U.S.

Following this announcement, ByteDance filed a case against the Trump administration in the Washington Federal Court.

Cloud Underwater?!

Microsoft came up with the underwater datacenter concept back in 2014, with the objective of delivering lightning-quick cloud services to coastal populations while saving energy. In 2018, Microsoft’s Natick team placed a container-sized data center on the seabed off Scotland. This week, after two years underwater, the team retrieved it, and phase 2 of the experiment has, to their surprise, performed well. The majority of the world’s population lives near a coast, so why not its data?

The success of underwater data centers can mean the following:

  • Data centers can now be deployed in 90 days.
  • Since half of the world’s population lives within 200 km of the ocean, the data centers can be taken to the customers. 
  • Low latency.

Snowflake IPO Raises $3.36 Billion

San Mateo-based cloud data platform Snowflake raised $3.36 billion in its initial public offering. Debuting on Wednesday, the Buffett-backed cloud startup also became 2020’s biggest IPO, expected to rake in nearly $4 billion at a valuation of roughly $33 billion just eight years after being established. Snowflake had earlier raised $479 million at a $12.4 billion valuation, with the likes of Salesforce and Sequoia leading the investments.

Qualcomm Cloud AI Kit

Qualcomm Technologies announced the release of Cloud AI 100, a high-performance AI inference accelerator that uses advanced signal processing and cutting-edge power efficiency to support AI solutions in multiple environments, including the datacenter, cloud edge, edge appliance, and 5G infrastructure. The Cloud AI 100 software suite supports popular frameworks including TensorFlow and PyTorch. The newly announced Qualcomm Cloud AI 100 Edge Development Kit is engineered to accelerate the adoption of edge applications.

The post Nvidia’s $40 Billion Bet, Apple’s Best Chip So Far And More In This Week’s Top News appeared first on AIM.
