Cosine similarity, a popular metric for assessing semantic similarity between high-dimensional embeddings, is facing scrutiny over its universal applicability. While commonly employed across domains, recent work suggests its efficacy is not consistent across all scenarios, prompting a re-evaluation of when it is actually warranted.
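For reference, cosine similarity is simply the dot product of two vectors divided by the product of their norms, which makes it blind to magnitude. A minimal NumPy sketch:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: their dot product
    divided by the product of their Euclidean norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])  # same direction, twice the magnitude
print(cosine_similarity(u, v))  # 1.0 -- magnitude is ignored entirely
```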
In a recent paper by Netflix researchers Harald Steck, Chaitanya Ekanadham, and Nathan Kallus, titled ‘Is Cosine-Similarity of Embeddings Really About Similarity?’, the authors explore the nuanced landscape surrounding cosine similarity within the context of linear matrix factorisation models. Their findings challenge the assumption that cosine similarity is indispensable in all AI applications, advocating for a more discerning approach.
Central to the debate is the realisation that cosine similarity may not always capture semantic similarity as reliably as previously thought. The Netflix study uncovers instances where cosine similarity fails to provide meaningful insights, emphasising the importance of considering alternative measures in certain contexts.
In natural language processing, for example, where cosine similarity is often used to assess the semantic similarity between words or documents, researchers may benefit from considering alternative metrics that better capture the nuances of language semantics.
Similarly, in recommender systems, where cosine similarity is employed to measure the similarity between users or items, exploring alternative techniques may lead to more accurate recommendations and improved user satisfaction.
What to do?
“Why would anyone expect cosine-similarity to be a useful metric?” said a user on HackerNews. “In the real world, the arbitrary absolute position of an object in the universe (if it could be measured) isn’t that important, it’s the directions and distances to nearby objects that matter most,” he added.
In traditional linear matrix factorisation models, the reliance on cosine similarity as a default similarity measure is commonplace. However, as the Netflix study reveals, this may not always be justified.
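As a rough illustration of that default, an item-to-item lookup over matrix-factorisation embeddings typically ranks items by the cosine of the angle between their embedding vectors. The sketch below is illustrative only, with random matrices standing in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy factorisation X ≈ A @ B.T; in practice A and B come from
# a trained model, not random data.
n_users, n_items, k = 100, 50, 8
A = rng.normal(size=(n_users, k))   # user embeddings
B = rng.normal(size=(n_items, k))   # item embeddings

def most_similar_items(item_id: int, top_n: int = 5) -> np.ndarray:
    """Rank items by cosine similarity to item_id's embedding."""
    norms = np.linalg.norm(B, axis=1)
    sims = (B @ B[item_id]) / (norms * norms[item_id])
    sims[item_id] = -np.inf          # exclude the query item itself
    return np.argsort(sims)[::-1][:top_n]

print(most_similar_items(3))
```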
Bindu Reddy, the founder of Abacus.AI, posted on X saying, “It goes back to building good RAG systems, which is hard. Before deploying these systems, you have to make intelligent decisions about chunking, hierarchical chunking, embedding, and even the algorithm for similarity look-up. Failure modes will be high and accuracy low if you don’t use the appropriate techniques.”
A recent blog by Andrew Nguonly also suggested adding keyword search alongside embedding-based retrieval to improve the results of RAG applications. “In fact, I’d bet that this is the expectation of many users of RAG applications already,” he said in the blog.
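In the spirit of that suggestion, a hybrid retrieval score can be as simple as a weighted blend of keyword overlap and embedding similarity. The sketch below is not from Nguonly’s blog; it uses a naive Jaccard keyword score as a stand-in for a proper ranker such as BM25, with a purely illustrative weighting:

```python
import numpy as np

def keyword_score(query: str, doc: str) -> float:
    """Jaccard overlap between query and document token sets
    (a stand-in for a real keyword ranker such as BM25)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_score(query: str, doc: str,
                 q_emb: np.ndarray, d_emb: np.ndarray,
                 alpha: float = 0.5) -> float:
    """Blend keyword overlap with embedding cosine similarity;
    alpha is an illustrative weight, tuned per application."""
    cos = q_emb @ d_emb / (np.linalg.norm(q_emb) * np.linalg.norm(d_emb))
    return alpha * keyword_score(query, doc) + (1 - alpha) * float(cos)
```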
To make this concrete, the authors leverage analytical derivations from linear matrix factorisation models, shedding light on the limitations of cosine similarity and motivating alternative metrics for similarity evaluation.
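The crux of their argument is easy to reproduce: a regularised matrix factorisation admits rescaled solutions A·D and B·D⁻¹ (for any invertible diagonal D) that make identical predictions, yet yield different cosine similarities between the embeddings, meaning those similarities are partly arbitrary. A NumPy sketch with made-up matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))           # item embeddings from X ≈ A @ B.T
B = rng.normal(size=(5, 3))
D = np.diag([0.1, 1.0, 10.0])         # an arbitrary diagonal rescaling

A2, B2 = A @ D, B @ np.linalg.inv(D)  # rescaled factors

def cos(M, i, j):
    return M[i] @ M[j] / (np.linalg.norm(M[i]) * np.linalg.norm(M[j]))

# The model's predictions are identical under the rescaling...
print(np.allclose(A @ B.T, A2 @ B2.T))  # True
# ...but cosine similarities between embeddings are not.
print(cos(A, 0, 1), cos(A2, 0, 1))      # generally different values
```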
Their findings highlight the need for a nuanced understanding of the underlying data and of which metric suits the specific requirements of each application, a consideration that applies just as much to the vector embeddings at the heart of RAG systems.
One alternative metric gaining traction in similarity evaluation is the unnormalised dot-product between embedded vectors. Unlike cosine similarity, which relies on the normalisation of vectors, the unnormalised dot-product preserves the original magnitudes of the vectors, providing a different perspective on similarity assessment.
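The difference is easy to see in code: two vectors pointing in the same direction are indistinguishable under cosine similarity, while the unnormalised dot-product keeps the magnitude information, which in some embedding models can carry a signal such as popularity or confidence:

```python
import numpy as np

a = np.array([1.0, 1.0])
b = np.array([3.0, 3.0])   # same direction as a, three times the magnitude
c = np.array([1.0, 0.9])   # slightly different direction

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(a, b), a @ b)  # cosine: 1.0      dot: 6.0
print(cosine(a, c), a @ c)  # cosine: ~0.9986  dot: 1.9
# Cosine ranks b and a as identical; the dot-product ranks b far higher.
```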
Additionally, techniques based on optimal transport have emerged as promising alternatives to cosine similarity, particularly in scenarios where the underlying data exhibits complex geometric structure. While computationally expensive, these techniques offer a more theoretically grounded approach, measuring similarity in terms of the geometry of the underlying data rather than the angle between a single pair of vectors.
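As one deliberately simplified instance, entropy-regularised optimal transport compares two sets of embeddings as distributions rather than as single vectors. The sketch below implements plain Sinkhorn iterations with uniform weights and illustrative parameters, omitting the numerical safeguards a dedicated library such as POT would provide:

```python
import numpy as np

def sinkhorn_distance(X: np.ndarray, Y: np.ndarray,
                      reg: float = 1.0, n_iter: int = 200) -> float:
    """Entropy-regularised optimal-transport cost between two point
    clouds X and Y (one vector per row), with uniform weights."""
    # Pairwise squared Euclidean costs
    M = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-M / reg)                    # Gibbs kernel
    a = np.full(len(X), 1.0 / len(X))       # uniform source weights
    b = np.full(len(Y), 1.0 / len(Y))       # uniform target weights
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                 # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]         # transport plan
    return float((P * M).sum())             # cost under that plan

X = np.random.default_rng(0).normal(size=(5, 3))
Y = np.random.default_rng(1).normal(size=(6, 3))
print(sinkhorn_distance(X, Y))
```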
Not all that redundant
By embracing a more nuanced approach to similarity evaluation, informed by the latest research and guided by the specific characteristics of their data and analysis goals, researchers and practitioners can unlock new insights and drive innovation in their respective fields. However, the paper stops short of prescribing a definitive replacement, leaving practitioners to experiment for themselves.
“There aren’t any alternatives: cosine similarity is effectively an extension of Euclidean distance, which is the mathematically correct way for finding the distance between vectors,” explained a user on HackerNews. “You may not want to use cosine similarity as your only metric for rankings, however, and you may want to experiment with how you construct the embeddings.”
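The user’s point has a precise form: for unit-normalised vectors, squared Euclidean distance and cosine similarity are linked by ‖u − v‖² = 2(1 − cos(u, v)), so ranking by one is equivalent to ranking by the other. A quick check:

```python
import numpy as np

rng = np.random.default_rng(42)
u, v = rng.normal(size=3), rng.normal(size=3)
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)  # unit-normalise

lhs = np.linalg.norm(u - v) ** 2
rhs = 2 * (1 - u @ v)        # u @ v is the cosine for unit vectors
print(np.isclose(lhs, rhs))  # True
```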
While acknowledging the utility of cosine similarity in many scenarios, the authors caution against its indiscriminate use. Instead, they advocate for a case-by-case evaluation of the appropriateness of cosine similarity, considering factors such as data characteristics, model architecture, and the intended use of similarity metrics.
The debate surrounding the necessity of cosine similarity underscores the evolving nature of similarity evaluation in data-driven fields. While cosine similarity remains a valuable tool in many contexts, its blind application may lead to suboptimal outcomes.
Through rigorous analysis and thoughtful consideration of alternatives, researchers and practitioners can navigate the complexities of similarity measurement more effectively, ensuring that the right metric is used for the right task.