When choosing data architecture solutions, companies frequently fall into the trap of paying exorbitant prices for services they don’t need. A recent blog by Kieran Healey points out that companies like Databricks and Snowflake are offering Ferraris when many companies could get their work done with a Toyota.
Databricks and Snowflake are undoubtedly robust platforms with impressive capabilities. Snowflake’s partnership with NVIDIA and Databricks’ integration with the Spark Human API showcased their technical prowess and raised their profiles further. Yet such features often serve as marketing tactics rather than essential solutions, and companies end up paying for them when open source solutions would do.
For example, instead of paying for an LLM-based chatbot, most companies could meet their needs with a simpler, more cost-efficient solution such as a basic “press 1 to choose this option” menu. To address data-related challenges without overspending, companies should adopt an anti-hype mindset.
A Databricks employee suggested on Hacker News that although companies might be able to build their own Spark deployment, it would run much slower than on Databricks’ proprietary runtime. He added that many businesses have other problems to solve, and that focusing on building DIY platforms is a horrible approach.
Interestingly, none of this matters if you only have gigabytes of data, since a company can then use pretty much anything cheaply and easily. The question only becomes relevant for companies with terabytes or hundreds of terabytes of data.
Open source vs commercial solutions
On the other hand, it is tempting to jump to open source solutions, given their reputation for cost-effectiveness. One side of the debate emphasises the financial advantage of open source. Supporters highlight that open source software is often free to use, suggesting that the cost savings alone make it a compelling choice.
However, it is essential to point out that while the software itself may be free, deploying, maintaining and expertly managing open source solutions can incur significant costs. Paying skilled professionals to ensure proper deployment and upkeep can strain both time and resources.
“Open source it may be. Free it is not. Paying an expert to correctly deploy an open source solution takes time and money,” said another user. This argument underscores that simply adopting open source software isn’t a guaranteed way to save money without proper expertise and management.
On the other side, commercial solutions such as Databricks and Snowflake come with upfront costs but offer comprehensive support, integration and scalability that can be invaluable. They package features, support and maintenance into a single offering, reducing the need for extensive in-house expertise. Commercial solutions can also provide a level of assurance and accountability that open source alternatives may lack.
In other words, what you pay for is a change in the parameters of the problem, and dismissing that is a fundamental misunderstanding of how to get things done in a constrained environment. This viewpoint highlights that the trade-off between open source and commercial solutions is about more than cost: it is about shifting the focus from technical challenges to non-technical ones.
It’s a bit like saying no company needs a cloud provider. Strictly speaking that may be true, but a cloud provider lets companies focus on more important things instead of building a data centre themselves.
The Anti-Hype Approach
In the debate over data platform choices, context and expertise play pivotal roles. While open source solutions can be powerful tools when implemented correctly, they require a skilled team to navigate potential challenges. Conversely, commercial solutions can absorb many technical complexities, enabling organisations to concentrate on their core business goals. However, this often comes at the cost of flexibility and the risk of vendor lock-in.
Ultimately, there is no one-size-fits-all answer to the open source vs commercial debate for data platforms. The decision depends on the circumstances of each organisation: its budget, existing expertise, scalability requirements and risk tolerance.
In an age when everyone is pushing CEOs to talk about generative AI, it is easy to fall into the trap of overspending on over-engineered solutions. It is essential to scrutinise whether a technology actually fits the problem. Instead of chasing novel technologies, companies should stick to the age-old principle of delivering tangible returns on investment; after all, CEOs are always looking for solutions that not only improve operations but also generate profits.