In response to significant demand from its customers, Databricks is intensifying its efforts in data engineering. According to CEO and co-founder Ali Ghodsi, the company initially perceived AI as the primary area of interest; however, customer feedback guided it to prioritise data integration, leading to the acquisition of Arcion and its subsequent integration into Databricks.
“Two years ago, at the CIO Forum, we asked our customers what they wanted most from Databricks, and the majority expressed a need for easier data integration,” Ghodsi said in an exclusive conversation with AIM.
“Now, customers can seamlessly integrate data from sources like Salesforce, Workday, Google Analytics, SQL Server, MySQL, and Postgres into Databricks. This strategic move aligns with our customers’ needs and has the potential to significantly impact our financial performance,” he explained.
Speaking to AIM, Nick Eayrs, vice president of field engineering APJ at Databricks, explained that the emphasis on data engineering over AI is essential for building the solid data foundation necessary for effective AI implementation.
He highlighted a collaborative approach involving data-literate C-suite executives who work closely with data engineers to source and enrich data.
“We need more enriched data to kind of answer the problem space. They’ll be able to then give that on to analysts in the same sort of environment to then go and explore and visualise and, you know, double click into the data to see if they can find something interesting in the data, and then kind of resurface that back to the C level very, very seamlessly,” he pointed out.
Eayrs described a comprehensive process where analysts explore and visualise data to uncover valuable patterns. These insights are then communicated back to the executives, fostering a team-oriented approach.
Advanced data platforms facilitate real-time collaboration and sharing, ensuring that the entire organisation can leverage data effectively.
Databricks LakeFlow
Databricks LakeFlow is a new solution designed to unify and streamline data engineering from ingestion through transformation and orchestration. With LakeFlow, data teams can efficiently ingest data from various databases.
LakeFlow automates the deployment, operation, and monitoring of pipelines at scale, featuring built-in CI/CD support and advanced workflows with capabilities such as triggering, branching, and conditional execution.
It incorporates data quality checks and health monitoring, integrated with alerting systems like PagerDuty. LakeFlow simplifies the construction and management of production-grade data pipelines, empowering data teams to meet the increasing demand for reliable data and AI solutions.
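To make the data quality idea concrete, here is a minimal, generic Python sketch of the kind of quality gate a pipeline stage might apply before publishing records. This is an illustration only, not LakeFlow's actual API; the `QualityCheck` class and `run_checks` function are hypothetical names introduced for this example.

```python
# Generic illustration of a pipeline data-quality gate (NOT LakeFlow's API).
# Records failing any check are quarantined rather than published downstream.
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityCheck:
    name: str
    predicate: Callable[[dict], bool]  # returns True if the row passes

def run_checks(rows, checks):
    """Partition rows into those passing all checks and those failing any."""
    passed, failed = [], []
    for row in rows:
        if all(check.predicate(row) for check in checks):
            passed.append(row)
        else:
            failed.append(row)
    return passed, failed

# Example checks: reject records with a missing or negative amount.
checks = [
    QualityCheck("amount_present", lambda r: r.get("amount") is not None),
    QualityCheck("amount_non_negative", lambda r: (r.get("amount") or 0) >= 0),
]
rows = [{"amount": 10}, {"amount": None}, {"amount": -5}]
good, bad = run_checks(rows, checks)
# good → [{"amount": 10}]; the other two rows are quarantined in bad.
```

In a production pipeline, the quarantined rows would typically feed the health-monitoring and alerting layer (e.g. paging an on-call engineer via a system like PagerDuty) rather than silently being dropped.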
Convergence of AI and Data Engineering
Data engineering ensures that data is clean, complete, and reliable, as AI models rely heavily on high-quality and accurate data to function correctly.
A year ago, there was a discussion about when AI would be able to make sense of, and take ownership of, the mountains of SQL that engineers and analysts have been accumulating for years.
Maxime Beauchemin, the CEO & founder of Preset and a pioneering data engineer who created Apache Airflow and Apache Superset, humorously commented on a concept from both AI and software development: “But what happens when AI can create spaghetti SQL faster than all of us!? That’s when we reach the spaghetti SQL singularity. Infinite happiness ensues.”
At the Data + AI Summit, Databricks announced several innovations for the Mosaic AI platform to help customers build production-quality generative AI applications. The focus is on supporting compound AI systems, improving model quality, and introducing new AI governance tools.
Further, it introduced Shutterstock ImageAI, an AI tool for advanced image analysis that integrates seamlessly into business workflows.
Additionally, Databricks unveiled Databricks AI/BI, an intelligent analytics platform featuring AI-powered dashboards and a conversational interface, Genie, for natural language queries. This platform aims to make data analytics accessible to all organisational levels without needing specialised knowledge.