Claude-2 vs GPT-4 – Which is Better?

A true competitor for OpenAI is finally here and might make the company drop its prices and come down to earth to compete


Published on July 13, 2023

by Mohit Pandey

Google-backed Anthropic, an AI lab based in San Francisco, has unveiled Claude 2, a publicly accessible alternative to GPT-4. Previously, Claude, the earlier iteration, was exclusively offered to enterprises, but the latest version is now open to the general public in the United States and the United Kingdom. Distinguishing itself from its predecessor, Claude 2 is accessible through both a beta website and an API.

The timing couldn’t have been better. Claude-2 comes at a time when the popularity of GPT has seen a decline in recent months. Users are seeking alternatives that offer superior performance and affordability. Claude-2 appears to fit the bill, with its enhanced capabilities and cost-effectiveness.

Learning from Google’s Bard and OpenAI’s ChatGPT and taking user feedback into account, Anthropic has made significant enhancements to Claude-2. Users on Twitter have been lauding Claude’s ability to engage in natural language conversations, clearly explain its reasoning, and produce less harmful outputs. Claude-2 builds on these strengths and adds several key features that elevate its performance to new heights.

One notable improvement is Claude-2’s enhanced coding, maths, and reasoning skills. It can also read PDFs, something that GPT-based models still struggle with. The release comes at around the same time that OpenAI introduced Code Interpreter for its paying users.

The best part? 100K context, so it can "see" the entire file

Oh yah and its free

This is arguable better than GPT4-32K because Its cheaper and has a larger context window
— Sully (@SullyOmarr) July 11, 2023

Let’s Evaluate

Anthropic has put considerable effort into fine-tuning the model. According to the model card of Claude-2, the model is built using unsupervised learning and reinforcement learning from human feedback (RLHF), similar to what OpenAI used for GPT. Moreover, the model is trained on data up to early 2023 and cannot access the internet.

Claude-2 now scores 71.2% on Codex HumanEval, a Python coding test, up from the 56.0% achieved by its predecessor, Claude-1.3. GPT-4 scores 67% on the same test, so Claude-2 wins here.

Similarly, on the GSM8k maths problem set, Claude-2 scored 88%, an improvement on Claude-1.3’s 85.2%. GPT-4 still wins here with a score of 92%. Even so, these advancements position Claude-2 as a valuable asset for developers and individuals seeking assistance with technical challenges.
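To make the comparison concrete, here is a minimal Python sketch that tabulates the scores quoted above and works out Claude-2’s gain over Claude-1.3 and its gap to GPT-4. The numbers are the ones reported in this article; the `SCORES` table and the `summarize` helper are illustrative names, not part of any official tooling.

```python
# Benchmark scores in percent, as quoted in this article.
SCORES = {
    "Codex HumanEval": {"claude-1.3": 56.0, "claude-2": 71.2, "gpt-4": 67.0},
    "GSM8k": {"claude-1.3": 85.2, "claude-2": 88.0, "gpt-4": 92.0},
}


def summarize(benchmark: str) -> tuple[float, float, str]:
    """Return (gain over Claude-1.3, gap to GPT-4, winner) for one benchmark.

    Gain and gap are in percentage points; a negative gap means GPT-4 leads.
    """
    scores = SCORES[benchmark]
    gain = round(scores["claude-2"] - scores["claude-1.3"], 1)
    gap = round(scores["claude-2"] - scores["gpt-4"], 1)
    winner = max(("claude-2", "gpt-4"), key=lambda model: scores[model])
    return gain, gap, winner


if __name__ == "__main__":
    for name in SCORES:
        gain, gap, winner = summarize(name)
        print(f"{name}: gain {gain:+.1f} pts, gap to GPT-4 {gap:+.1f} pts, winner {winner}")
```

On these figures, Claude-2 leads on the coding test and trails on the maths test by a similar margin of about four points.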

Claude-2, Anthropic's shot at GPT-4, has arrived. It's cheaper than GPT-4 and far stronger in reasoning & coding than its older self.

Things you should know:
▸ On standard exams, it's not quite at GPT-4 yet but catching up fast compared to v1.3. Winner in bracket:
GRE verbal:… pic.twitter.com/k8Vblc1BDg
— Jim Fan (@DrJimFan) July 11, 2023

The most important aspect is the expansion of Claude-2’s input and output capabilities. Users can now input up to 100,000 tokens per prompt, compared with 32,000 for GPT-4, allowing Claude-2 to process extensive technical documentation or even entire books. Additionally, Claude-2 can generate longer documents, ranging from memos to letters to stories, up to a few thousand tokens in length.

Claude-2 is also roughly four to five times cheaper than GPT-4-32K, where filling the full 32K-token context costs about $1.96 in prompt tokens alone. Prompt tokens cost $11 per million for Claude-2 versus $60 per million for GPT-4-32K, and completion tokens cost $32 versus $120 per million, assuming similar tokenisation length. This is likely to push a lot of users to start using Claude-2 instead of GPT-4.
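The arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough illustration that assumes the per-million-token prices quoted in this article and, as the article does, similar tokenisation length for both models. The `PRICES` table, the function names, and the 30,000/1,000-token example request are all hypothetical.

```python
# Prices in US dollars per million tokens, as quoted in this article.
PRICES = {
    "claude-2": {"prompt": 11.0, "completion": 32.0},
    "gpt-4-32k": {"prompt": 60.0, "completion": 120.0},
}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of a single request at the quoted per-million-token rates."""
    rates = PRICES[model]
    total = prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]
    return total / 1_000_000


def price_ratio(kind: str) -> float:
    """How many times more GPT-4-32K charges than Claude-2 for `kind` tokens."""
    return PRICES["gpt-4-32k"][kind] / PRICES["claude-2"][kind]


if __name__ == "__main__":
    # Hypothetical request: a long document plus a short answer.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 30_000, 1_000):.3f}")
    print(f"prompt ratio: {price_ratio('prompt'):.2f}x")
    print(f"completion ratio: {price_ratio('completion'):.2f}x")
```

At these prices the gap is widest on prompt tokens (about 5.5x) and narrower on completions (3.75x), so the saving on any given workload depends on its mix of input and output.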


Price drop and availability

Anthropic has made Claude-2 available through multiple channels. Users can access Claude-2 via the API, allowing businesses to integrate it into their systems seamlessly. Remarkably, Anthropic has maintained the same pricing for the Claude-2 API as its predecessor, Claude-1.3, making the upgrade to the latest model even more appealing to budget-conscious users.

If Claude 2 turns out to be as strong as GPT-4, thereby breaking the OpenAI monopoly on strong LMing, the number of companies building products on top of LMs will increase substantially. pic.twitter.com/prfWj3jlSu
— Ofir Press (@OfirPress) July 12, 2023

Partners like Jasper, a generative AI platform, have reported Claude-2’s strength in a wide range of use cases, particularly those involving extended content generation. With a 3X larger context window and improved semantics, Claude-2 has empowered Jasper’s customers to stay ahead of the curve and achieve their content strategy goals.

Another notable collaboration involves Sourcegraph, a code AI platform that assists developers in writing, fixing, and maintaining code. Sourcegraph’s coding assistant, Cody, leverages Claude-2’s improved reasoning and access to a larger context window of up to 100,000 tokens. By providing accurate answers and incorporating codebase context, Cody assists developers in speeding up their workflow and staying up to date with the latest frameworks and libraries.

Safe but still hallucinatory

According to Anthropic, the model has undergone rigorous evaluation, including internal red-teaming and automated tests on harmful prompts. In these evaluations, Claude-2 demonstrated a twofold improvement in providing harmless responses compared to Claude-1.3. Anthropic accepts, however, that no model is completely immune to misuse.

“For example, Claude models could support a lawyer but should not be used instead of one, and any work should still be reviewed by a human,” reads the paper. People on Twitter have already been pointing out that the claims of being good at maths are overstated.

It seems that Claude 2 is as bad at math as GPT4, maybe even slightly worse. Here's one example interaction: it makes a massive and fairly obvious error in screenshots 1) and 2), and in 3) it isn't aware of an important (openly available) paper published in 2015 pic.twitter.com/2OKojeE3qC
— Zygi (@nonagonono) July 11, 2023

Anthropic acknowledges the evolving nature of AI and is committed to responsible deployment. Claude-2 is poised to become a trusted companion for individuals and a valuable tool for businesses.

As ChatGPT usage declines and users seek alternatives, Claude-2’s budget-friendly offering and remarkable feature set make it an enticing option. It seems that a true competitor for OpenAI is finally here, one that might make the company drop its prices and come down to earth to compete.
