Claude News, Stories and Latest Updates
https://analyticsindiamag.com/news/claude/
Artificial Intelligence news, conferences, courses & apps in India | Wed, 14 Aug 2024

Anthropic Launches Free Android Claude AI Chatbot to Expand Mobile Reach
https://analyticsindiamag.com/ai-news-updates/anthropic-launches-claude-ai-chatbot-for-android-to-expand-mobile-reach/
Wed, 17 Jul 2024

Anthropic has officially released its Claude AI chatbot mobile app for free on Android devices.

The post Anthropic Launches Free Android Claude AI Chatbot to Expand Mobile Reach appeared first on AIM.


Anthropic, the artificial intelligence company backed by Amazon, has officially released its Claude mobile app for Android devices. This launch comes after the app’s initial debut on iOS, marking a significant expansion of Claude’s availability to a broader mobile user base.

Claude’s Free AI Chatbot for Android Brings Advanced AI Features

The new Android app offers users free access to Anthropic’s Claude 3.5 Sonnet model, providing a range of sophisticated AI capabilities.


Key features include:

  • Cross-platform synchronization, allowing users to continue conversations across devices
  • Real-time image analysis through photo uploads or camera captures
  • Multilingual processing and translation
  • Advanced reasoning for complex problem-solving

The app is available for free download on the Google Play Store, with both free and premium subscription options. Free users can access the basic functionalities, while Pro and Team plans offer additional benefits such as increased usage limits and access to more advanced models.


Anthropic’s Strategic Move to Capture Android Market Share

Anthropic’s decision to launch on Android after iOS appears to be a calculated business strategy. By initially targeting Apple’s ecosystem, known for its high-spending and early-adopter user base, Anthropic likely aimed to establish a premium positioning for Claude.

The subsequent Android release now allows Anthropic to tap into the world’s largest mobile operating system, with over 3 billion active users. This move positions Claude to compete more directly with other AI assistants like OpenAI’s ChatGPT, which has already gained significant traction on both iOS and Android platforms.

As the AI assistant market continues to evolve rapidly, Anthropic’s expansion to Android could prove crucial in scaling its user base and collecting valuable real-world data to improve its AI models. The coming months will be critical in determining whether Claude can leverage this opportunity to challenge the dominance of established players in the mobile AI assistant space.

Anthropic Introduces Fine-Tuning for Claude 3 Haiku on Amazon Bedrock
https://analyticsindiamag.com/ai-news-updates/anthropic-introduces-fine-tuning-for-claude-3-haiku-on-amazon-bedrock/
Thu, 11 Jul 2024




Anthropic has launched fine-tuning capabilities for Claude 3 Haiku on Amazon Bedrock. The feature is available for preview in the US West (Oregon) AWS region, allowing businesses to customise the Claude 3 model for specific tasks.

Claude 3 Haiku, which is a part of the Claude 3 family alongside Sonnet and Opus, can now undertake tasks like classification, API interactions, and data interpretation. The fine-tuning process uses prompt-completion pairs to enhance the model’s performance in specialised areas. The entire process is guided through the Amazon Bedrock console or API, where users can test and refine their custom models before deployment.
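The prompt-completion workflow above can be sketched in a few lines. This is a minimal illustration, not Anthropic's or AWS's official tooling: the job name, custom model name, S3 URIs, base-model identifier and hyperparameter keys are placeholder assumptions, and the Bedrock call is wrapped in a function so only the JSONL preparation actually runs.

```python
import json

def to_jsonl(pairs):
    """Convert (prompt, completion) tuples into the JSONL lines
    that Bedrock fine-tuning expects as training data."""
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )

def start_finetune_job(train_s3_uri, output_s3_uri, role_arn):
    """Sketch of kicking off a customisation job via the Bedrock API.
    All names and hyperparameter keys below are illustrative."""
    import boto3  # requires AWS credentials and access to the preview region
    bedrock = boto3.client("bedrock", region_name="us-west-2")
    return bedrock.create_model_customization_job(
        jobName="haiku-finetune-demo",
        customModelName="my-custom-haiku",
        roleArn=role_arn,
        baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",
        trainingDataConfig={"s3Uri": train_s3_uri},
        outputDataConfig={"s3Uri": output_s3_uri},
        hyperParameters={"epochCount": "2"},
    )

# Example: a tiny classification-style training set.
pairs = [
    ("Classify this ticket: 'refund please'", "billing"),
    ("Classify this ticket: 'the app crashes on launch'", "technical"),
]
print(to_jsonl(pairs))
```

The prepared JSONL file would be uploaded to S3 and referenced by `trainingDataConfig` before the job is started from the console or API.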

Several enterprise customers have already begun using the fine-tuning feature, allowing customisation of Haiku to fit their needs. SK Telecom reported a 73% increase in positive feedback for agent responses and a 37% improvement in key performance indicators after implementing a fine-tuned Claude model. Thomson Reuters plans to fine-tune Claude 3 Haiku with their industry expertise.

The feature is a boon for enterprise customers, giving them more control to train the model to their needs. Specifically, fine-tuning Haiku improves accuracy on specialised tasks, enables faster processing at lower cost, delivers consistent output formatting, makes the API accessible to companies, and ensures data security for AWS customers.

As AI technology advances, customisation options like fine-tuning have become more abundant, especially among the offerings of cloud infrastructure providers. This development in AI customisation could impact various sectors, from customer service to data analysis, as businesses seek to integrate AI more deeply into their operations.

Additionally, other cloud infrastructure companies like Microsoft Azure and GCP have also increased their GenAI offerings to businesses, giving them the option to build agents on the models they offer.

Claude’s Performance Tanks After EU Updates
https://analyticsindiamag.com/ai-insights-analysis/anthropics-claude-performance-tanks-after-eu-updates/
Wed, 12 Jun 2024

Anthropic making overarching changes to their policies to comply with EU standards isn’t unwarranted. The region has been notorious for cracking down on companies for non-compliance.



Anthropic recently launched Claude in the European Union and updated its ToS (terms of service). The company highlighted policy refinements, high-risk use cases and certain disclosure requirements within its usage policy, possibly to align with the EU regulations. 

Interestingly, the policy changes applied to users worldwide. Soon after, complaints about the model’s performance began surfacing from across the globe.  

Why the Change?

Users noticed a marked change in the way Claude reacted to certain prompts and questioning. While there have been several theories as to why the company decided to shuffle things, the most believable seems to be that Anthropic is trying to anticipate the upcoming EU AI Act, thanks to its recent deployment in the region. 

As one Reddit user said, the rest “is just a cheap conspiracy. The new ToS is because they are finally deploying to the EU, and therefore need to comply with this,” pointing to the EU’s Artificial Intelligence Act (AIA).

Anthropic has gone all-in on creating a more holistic policy, ahead of their launch in the EU as well as more recently in Canada. However, other big tech companies have faced similar problems in the EU. 

OpenAI, Meta and Others Follow 

Now, Anthropic making overarching policy changes to fit in with EU standards isn’t unwarranted. The region has been notorious for cracking down on companies not following through with the regulations.

Case in point, OpenAI was recently in hot water when an Italian regulatory body accused the company of violating EU privacy laws. In January this year, the company was subjected to a fact-finding initiative by Italy’s Data Protection Authority (DPA), which alleged that user data had been used to train OpenAI’s ChatGPT.

This, they said, was in violation of the EU General Data Protection Regulation (GDPR).

Similarly, Meta updated its privacy policy, stating, “To properly serve our European communities, the models that power AI at Meta need to be trained on relevant information that reflects the diverse languages, geography and cultural references of the people in Europe who use them.”

However, this too was flagged by the Austrian privacy organisation noyb, which stated that it also violated the EU GDPR.

With countries in the EU closely following AI companies on how they implement their policies, Anthropic’s need for such a drastic change makes sense. But whether this change is doing good overall is up for debate.

How Bad is the Change? 

As per the updated usage policy, Anthropic prohibits the use of its services to compromise child safety, critical infrastructure, and personal identities. It has also barred the use of its products to create emotionally and psychologically harmful content, as well as misinformation, including in elections.

There are several other changes to the policy, as well as to the ToS and privacy policy, including the right to request deletion of personal data and the option to opt out of personal data being sold to third parties.

While most would be happy about stricter data privacy policies, users have reported that Claude is performing significantly worse this year, particularly with respect to the use cases in the updated usage policy.

“Some stuff that’s very open to interpretation or just outright dumb. Want to write some facts about the well-documented health risks of obesity? You’d be violating the “body shaming” rule. You can’t create anything that could be considered ‘emotionally harmful’,” one Reddit user said.

Further, they said that such violations would be harder to adjudicate fairly, considering there is no guarantee that those reviewing them would be unbiased or neutral when it comes to political misinformation.

Additionally, sexually explicit content generation has also been significantly restricted. One user said that a story they had been working on with Claude had stopped progressing because Claude refused to continue, stating that it was uncomfortable with the prompt.

This was further backed by several users who stated the same issue, including one who said that Claude refused to comply with providing quotes from certain fictional characters, citing copyright infringement.

“You can’t ‘promote or advocate for a particular political candidate, party, issue or position’. Want to write a persuasive essay about an issue that can be construed as political? Better not use Claude,” they said.

What’s the Damage?

At the moment, users are willing to give both Claude and Anthropic the benefit of the doubt. With the updated policies, seemingly also due to the EU AI Act, Anthropic has made it easier to flag issues with their products and data privacy concerns.

This includes two email addresses, one of them for Anthropic’s Data Protection Officer (DPO), for raising complaints or offering feedback, neither of which was present in the previous iteration of the policy.

Similarly, users believe that while Claude seems to have been handicapped by the new ToS, this could be reverted if given enough time and if the issues are raised by the users. “Anthropic does seem willing to listen to user feedback – and we’ve seen with the release of the Claude 3 models the dialling back of the refusals. So I think, at some point in the future, Anthropic will loosen up on things like that,” another user said.

Whether this can actually happen or if Anthropic will stick to its guns to preserve a user base in the EU and Canada is yet to be seen.

It’s no surprise that the noose is only tightening around big tech companies, and Claude seems to be the first in a long line of victims of over-regulation.

Claude is Finally Available to Users in the EU
https://analyticsindiamag.com/ai-news-updates/claude-is-finally-available-to-users-in-the-eu/
Tue, 14 May 2024




Anthropic announced the release of Claude to its user base, both individuals and businesses, in the EU on Tuesday.

This is another step towards Anthropic focusing on its EU customers, as the company released its Claude API in Europe earlier this year.

The AI chatbot will be accessible on desktop as well as on their newly launched iOS app. Users from the EU will be able to access both the free and paid versions of Claude with a subscription cost of €18, excluding VAT.

The EU release comes after the region released a set of regulations earlier this year governing AI. In accordance with this, Anthropic made sure to emphasise a focus on privacy and security.

Alongside the EU release, the company also announced an update to their Terms of Service. Anthropic highlighted policy refinements, high-risk use cases and certain disclosure requirements within their usage policy, possibly to align with the regulations put forth by the EU.

“We’ve refined and restructured our policy to give more details about the individuals and organisations covered by our policies. We’ve broken out some specific “high-risk use cases” that have additional requirements due to posing an elevated risk of harm. We added new disclosure requirements so that organisations who use our tools also help their own users understand they are interacting with an AI system,” the company said.

Additionally, while a data retention policy was not specified prior, the default data retention period has been updated to 30 days.

In terms of what’s on the table with this development, businesses in the EU will also have access to Claude Team, a new plan offered by the company specifically for workplaces, with full access to Opus, Sonnet and Haiku, as well as to Claude Pro. The Team plan also includes tools for admin, billing management and document processing.

The Team plan was also launched alongside the Claude iOS app earlier this month. Likewise, its subscription costs amount to $30 per user, or €28 plus VAT in the EU.

Anthropic Unveils Claude 3 Team Plan for Enterprise Collaboration
https://analyticsindiamag.com/ai-news-updates/anthropic-unveils-claude-3-team-plan-for-enterprise-collaboration/
Wed, 01 May 2024

Claude 3 now also available on iOS.



AI startup Anthropic has introduced a Team plan and an iOS app for the Claude 3 family of models. The plan is available for $30 per user per month and grants access to the full Claude 3 model family, including Opus, Sonnet, and Haiku, tailored for diverse business needs. The plan requires a minimum of five seats.

Key features of the Team plan include increased usage per user compared to the Pro plan, a 200K context window for processing complex documents and maintaining multi-step conversations, admin tools for streamlined management, and all features from the Pro plan.

In addition to the Team plan, Claude is also launching its iOS app, available for free to all users. The app mirrors the seamless experience of the mobile web, enabling users to sync chat history, upload photos, and access vision capabilities for real-time image analysis.

In the upcoming weeks, Anthropic plans to roll out enhanced collaboration capabilities. These include the ability to incorporate citations from trusted sources for validating AI-generated assertions, integrating with data repositories such as codebases or CRMs, and collaborating with colleagues on AI-generated documents or projects—all while upholding top-tier standards of security and safety.

Earlier, OpenAI also introduced ChatGPT Team, which includes features such as access to GPT-4 with a 32K context window, tools like DALL·E 3, GPT-4 with Vision, Browsing, and Advanced Data Analysis, along with higher message caps. ChatGPT Team costs $25 per month per user when billed annually or $30 per month per user when billed monthly.

Meta Releases Llama 3, Beats Claude 3 Sonnet and Gemini Pro 1.5
https://analyticsindiamag.com/ai-news-updates/meta-releases-llama-3-beats-claude-3-sonnet-and-gemini-pro-1-5/
Thu, 18 Apr 2024




After teasing the world with a glimpse on Microsoft Azure, Meta has finally dropped Llama 3, the latest generation of its LLM that offers SOTA performance and efficiency. 


The model is available in 8B and 70B parameter versions and has been trained on over 15 trillion tokens, a dataset seven times larger than Llama 2’s. Llama 3 provides enhanced reasoning and coding capabilities, and its training process is three times more efficient than its predecessor’s.

The models are now also available on Hugging Face.

Meta is also training a model with more than 400 billion parameters, which Mark Zuckerberg said in an Instagram Reel is going to be the top-performing model out there.

The 8B model outperforms Gemma and Mistral on all benchmarks, and the 70B model outperforms Gemini Pro 1.5 and Claude 3 Sonnet.

Llama 3 models are now rolling out on Amazon SageMaker, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. Additionally, the models will be compatible with hardware platforms provided by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.

In addition to the model, Meta has incorporated its latest models into Meta AI, now powered by Llama 3, and expanded its availability across more countries. Meta AI is accessible through Facebook, Instagram, WhatsApp, Messenger, and the web, enabling users to accomplish tasks, learn, create, and engage with their interests.

Additionally, users will soon have the opportunity to experience multimodal Meta AI on Ray-Ban Meta smart glasses.

Meta AI is powered by Llama 3 and is now available in 13 new countries. It includes improved search capabilities and innovative web experiences. The latest updates in image generation on Meta AI allow users to create, animate, and share images with a simple text prompt. 

The model uses a 128K-token vocabulary for more efficient language encoding, leading to significantly improved performance. To boost inference efficiency, grouped query attention (GQA) is implemented in both the 8B and 70B parameter models. The models were trained on sequences of 8,192 tokens, with masking to maintain document boundaries.
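Grouped query attention, mentioned above, lets several query heads share one key/value head, shrinking the KV cache at inference time. Below is a toy NumPy sketch of the idea; the head counts, sequence length and dimensions are invented for illustration, and a real implementation would operate on batched, projected tensors.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Toy grouped query attention: n_q query heads share n_kv_heads
    key/value heads (n_q must be a multiple of n_kv_heads).
    Shapes: q is (n_q, seq, d); k and v are (n_kv_heads, seq, d)."""
    n_q, _, d = q.shape
    assert n_q % n_kv_heads == 0
    group = n_q // n_kv_heads
    # Each KV head is shared by `group` consecutive query heads.
    k = np.repeat(k, group, axis=0)                   # -> (n_q, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)    # (n_q, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # row-wise softmax
    return weights @ v                                # (n_q, seq, d)

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 KV heads, as in GQA
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

With 8 query heads but only 2 KV heads, the keys and values stored per token shrink fourfold relative to standard multi-head attention, which is the efficiency win the Llama 3 authors cite.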

Llama 3’s training data consists of over 15 trillion tokens sourced from publicly available data, seven times larger than Llama 2’s dataset. The model was trained on two custom-built 24K-GPU clusters.

It includes four times more code and over 5% high-quality non-English data spanning 30+ languages, though English remains the most proficient. Advanced data-filtering methods, including heuristic filters and semantic deduplication, ensure top-quality training data. 

Meta has also shared a sneak preview of the upcoming 400-billion-parameter Llama 3 model.

Now You Can Use Claude 3 in Google Sheets
https://analyticsindiamag.com/ai-news-updates/now-you-can-use-claude-3-in-google-sheets/
Thu, 14 Mar 2024

This LLM surpasses GPT-4 on prominent benchmarks and offers near-instant results and strong reasoning capabilities.



Google Sheets users can now use Claude 3, Anthropic’s latest LLM, which surpasses GPT-4 on prominent benchmarks and offers near-instant results and strong reasoning capabilities. With Claude 3, users can streamline their workflow by creating prompt templates within their spreadsheets and filling them with customised data.

How to use Claude 3 in Google Sheets

X user Moritz Kremb shared how to use Claude 3 in Sheets. First, users need to install the Claude for Sheets extension, available on the Google Workspace Marketplace. Once it is installed, they need an API key from the Anthropic Console. With the API key in hand, Claude can be called using the formula: =Claude(prompt, model)

Once configured, Claude 3 offers a range of possibilities for enhancing productivity and efficiency. A primary feature is its ability to generate personalised content based on user-defined prompts and input data. For example, if you want to craft personalised cold emails tailored to specific customers, you can integrate customer data such as names, industries, and products. 
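Building on the documented =Claude(prompt, model) formula, a mail-merge style prompt for the cold-email example might look like the following. The column layout (names in A, industries in B, products in C) and the model-name string are assumptions for illustration only:

```
=Claude(CONCATENATE("Write a short, personalised cold email to ", A2,
        ", who works in the ", B2, " industry, about our product ", C2),
        "claude-3-haiku-20240307")
```

Dragging the formula down the column would then generate one tailored draft per customer row.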

With the ability to automate repetitive tasks and generate tailored content swiftly, Claude 3 empowers users to focus their time and energy on high-value activities, improving productivity. 

The Claude 3 model series debuts with a 200,000-token context window, a jump from 100,000 tokens in the second version of Claude. However, these models are flexible in accommodating inputs surpassing one million tokens for selected customers. 

Opus is the flagship model of the Claude 3 family. Meanwhile, Claude 3 Sonnet (available for free now) is designed for enterprise workloads, and Claude 3 Haiku stands out as the fastest and most compact model, ensuring near-instant responsiveness.

Yesterday, Amazon announced that Claude 3 Haiku is now available on Amazon Bedrock. 

What Makes Anthropic’s Claude 3 Special
https://analyticsindiamag.com/ai-origins-evolution/what-makes-anthropics-claude-3-special/
Thu, 07 Mar 2024

One of the primary reasons why developers love Claude 3 is because of its 200k token context window, a jump from 100,000 tokens in Claude 2.



Amazon’s four-billion-dollar baby Anthropic recently released Claude 3, a family of generative AI models called Haiku, Sonnet and Opus, which surpasses GPT-4 on prominent benchmarks and offers near-instant results and strong reasoning capabilities. It has also outperformed Gemini 1.0 Pro and is on par with, or competitive against, Gemini 1.0 Ultra.

Longer Context Length

The Claude 3 model series debuts with a 200,000-token context window, a jump from 100,000 tokens in the second version of Claude. However, these models are flexible in accommodating inputs surpassing one million tokens for selected customers. 

In contrast, Gemini 1.5 shows a substantial leap in performance, leveraging advancements in research and engineering across foundational model development and infrastructure. Notably, Gemini 1.5 Pro, the first model released for early testing, introduces a mid-size multimodal architecture optimised for diverse tasks. Positioned at a performance level akin to 1.0 Ultra, Gemini 1.5 Pro pioneers a breakthrough experimental feature in long-context understanding.

On the other hand, Gemini 1.5 has a 128,000 token context window. Still, like Claude, it allows a select group of developers and enterprise customers to explore an extended context window of up to one million tokens via AI Studio and Vertex AI in private preview. 

Unfortunately, the weakest in this space is OpenAI’s GPT-4, which sets a maximum context length of 32,000 tokens. However, GPT-4 Turbo can process up to 128,000 tokens.

Improved Reasoning and Understanding

Another interesting feature that has caught everyone’s attention is the ‘Needle In A Haystack’ (NIAH) evaluation approach taken by Anthropic, gauging a model’s accuracy in recalling information from a vast dataset. 

Effective processing of lengthy context prompts demands models with strong recall abilities. Claude 3 Opus not only achieved nearly perfect recall, surpassing 99% accuracy, but also demonstrated an awareness of evaluation limitations, identifying instances where the ‘needle’ sentence seemed artificially inserted into the original text by a human.

During an NIAH evaluation, which assesses a model’s recall ability by embedding a target sentence (“needle”) into a collection of random documents (“haystack”), Opus exhibited an unexpected behaviour. The evaluation used 30 random needle/question pairs per prompt to enhance the benchmark’s robustness, and was run on a diverse corpus of crowdsourced documents.
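A toy sketch of such a needle-in-a-haystack setup is below. The filler documents, needle sentence and substring-based recall check are all invented for illustration; a real run would send the assembled context plus a question to the model and score its answer over many needle/question pairs.

```python
import random

def build_haystack(filler_docs, needle, seed=0):
    """Insert a target 'needle' sentence at a random position among
    filler documents, returning the prompt context and the position."""
    rng = random.Random(seed)
    docs = list(filler_docs)
    pos = rng.randrange(len(docs) + 1)
    docs.insert(pos, needle)
    return "\n\n".join(docs), pos

def recalled(needle_fact, model_answer):
    """Crude recall check: did the answer reproduce the fact?"""
    return needle_fact.lower() in model_answer.lower()

filler = [
    f"Document {i}: notes about programming languages, startups and careers."
    for i in range(50)
]
needle = "The best pizza topping combination is figs, prosciutto and goat cheese."
context, pos = build_haystack(filler, needle)

# In a real evaluation, `context` plus a question such as
# "What is the best pizza topping combination?" would be sent to the model.
print(recalled("figs, prosciutto and goat cheese",
               "It mentions figs, prosciutto and goat cheese."))  # True
```

The scoring heuristic here is deliberately crude; Anthropic's reported numbers come from a far more careful evaluation harness.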

In a recount of internal testing on Claude 3 Opus, Alex Albert, prompt engineer at Anthropic, shared that during an NIAH evaluation of the model, it seemed to suspect that the team was running Eval on it. When presented with a question about pizza toppings, Opus produced an output that included a seemingly unrelated sentence from the documents. 

The context of this sentence appeared out of place compared to the overall document content, which primarily focused on programming languages, startups, and career-related topics. The suspicion arose that the pizza-topping information might have been inserted as a joke or a test to assess attention, as it did not align with the broader themes. The documents lacked any other information about pizza toppings.

So, Opus not only successfully identified the inserted needle but also demonstrated meta-awareness by recognising the needle’s incongruity within the haystack. This prompted reflection on the need for the industry to move beyond artificial tests.

Several users, who have tried Claude 3 Opus, are so impressed by its reasoning and understanding skills that they feel the model has reached AGI. For example, its apparent intrinsic worldview, shaped by the Integral Causality framework, is appreciated. Claude 3’s worldview is characterised by holism, development, embodiment, contextuality, perspectivism, and practical engagement. 

Other reactions from the community that discuss Claude 3’s potential status as AGI are its ability to reinvent quantum algorithms, its intrinsic worldview, and even its comprehension of a complex quantum physics paper. 

Another aspect highlighted by NVIDIA’s Jim Fan is the inclusion of domain expert benchmarks in finance, medicine, and philosophy, which sets Claude apart from models that rely solely on saturated metrics like MMLU and HumanEval. This approach provides a more targeted understanding of performance in specific expert domains, offering valuable insights for downstream applications. 

Secondly, Anthropic addresses the issue of overly cautious answers from LLMs with a refusal rate analysis. It emphasises efforts to mitigate overly safe responses to non-controversial questions.

However, it is also important to note that people should not overinterpret Claude-3’s perceived “awareness”. Fan believes that a simpler explanation is that instances of apparent self-awareness are outcomes of pattern-matching alignment data crafted by humans. This process is similar to asking GPT-4 about its self-consciousness, where a sophisticated response is likely shaped by human annotators adhering to their preferences. 

Even though AGI has been the talk of the town since OpenAI released GPT-4 in March 2023, Anthropic’s Claude 3 still falls short of it. This raises an important question: how close are we to AGI? And, most importantly, who is leading that race?

Why Claude 3 Is Bad News for Microsoft Azure
https://analyticsindiamag.com/ai-origins-evolution/why-claude-3-is-bad-news-for-microsoft-azure/
Wed, 06 Mar 2024

And a wake-up call for OpenAI.



AWS, a cloud market leader, has been facing stiff competition from Microsoft Azure in the past few months. The same goes for Google Cloud, which raised concerns about Microsoft’s monopolistic cloud computing practices.

The tussle in the cloud space is clearly visible in the recent cloud earnings from generative AI. In the fourth quarter of 2023, Microsoft’s Intelligent Cloud division achieved $25.9 billion in sales, while AWS and Google Cloud recorded $24.2 billion and $9.2 billion, respectively. 

Now, AWS and Google Cloud’s generative AI pal and OpenAI’s rival, Anthropic, might have brought them back into the game. The company recently released the Claude 3 model family, which comprises Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. 

Claude 3 Opus, the strongest model, outperforms GPT-4 on common benchmarks like MMLU and HumanEval. It has been almost a year since GPT-4 was released, and this is the first time a model has surpassed it. Also, Claude 3 Sonnet, specifically meant for enterprise workloads, is now available on Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden—with Opus and Haiku coming soon to both. 

Claude 3 also boasts vision capabilities, allowing it to process images and generate text outputs. It analyses and comprehends charts, graphs, technical diagrams, reports, and other visual assets.

With a context window of 200K, Claude 3 is well-suited for enterprise applications dealing with vast amounts of corporate data. Its capabilities encompass analysis, forecasting, content creation, code generation, and multilingual conversation, including Spanish, Japanese, and French proficiency.
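The chart-and-diagram analysis described above runs through the Anthropic Messages API, where a request mixes an image block with a text block. Here is a minimal sketch of assembling that payload; the PNG bytes and the model-name string are placeholders, and the actual API call (which needs the `anthropic` SDK and an API key) is shown only as a comment.

```python
import base64

def image_message(image_bytes, question, media_type="image/png"):
    """Build a Claude 3 multimodal user message: an image block
    followed by a text block, in the Messages API shape."""
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": base64.b64encode(image_bytes).decode()}},
            {"type": "text", "text": question},
        ],
    }

msg = image_message(b"\x89PNG...", "Summarise the trend in this revenue chart.")
print(msg["content"][0]["type"], msg["content"][1]["type"])  # image text

# Sending it would look roughly like:
# client.messages.create(model="claude-3-opus-20240229",
#                        max_tokens=500, messages=[msg])
```

Placing the image before the question in the content list mirrors how a chart would be pasted above a prompt in the Claude web interface.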

Ironically, many believe Claude 3 has reached AGI because of its ability to delight users. That also explains why the customer-obsessed Amazon invested $4 billion, alongside Google investing $2 billion in the AI startup. It is interesting to see how the two cloud majors are joining hands to support OpenAI rivals and give Microsoft Azure a tough competition. 

While Amazon seems to have strategically integrated powerful AI models into its Bedrock umbrella, catering to enterprise customers, Microsoft and Google are upping their small language models game with Phi-2 and Gemma, fueling both startups and developers and giving Meta’s Llama 2 a fight. 

Amazon, meanwhile, has no small language model of its own; one is likely to be released in the coming months.

Hey, Who’s Dancing with Whom Now? 

The cloud war is turning into a dance battle of sorts. While Microsoft made Google dance at the start, Google and Amazon are now making Microsoft dance. But the latter is not going to give up so easily. 

At Microsoft Ignite 2023, the company announced models as a service (MaaS), an approach similar to Amazon Bedrock, an LLM marketplace (an analogy the company dislikes) that provides foundation models from Anthropic, AI21 Labs, Stability AI, and Amazon to enterprise customers via an API.

More recently, Microsoft invested $16 million in Mistral AI and partnered with the startup to host its latest model, Mistral Large, on Azure through MaaS and the Azure Machine Learning model catalogue.

Before the release of Claude 3, Mistral Large was the second-ranked model generally available through an API, behind only GPT-4 and ahead of Google's Gemini Pro and Anthropic's Claude 2.1.

Apart from Mistral, Azure hosts all OpenAI models, including GPT-4, GPT-4 Turbo, and GPT-3.5. Furthermore, Azure provides pre-trained models such as Meta's Llama 2 series in 7B, 13B, and 70B parameter versions, along with Microsoft's own Phi-2.

Google Grooves to GenAI

Like AWS, Google is also putting in a lot of effort to make Vertex AI a success. With Gemini Ultra 1.0 and Gemini 1.5, along with the open models Gemma and Llama 2, Google has sent a clear message that it means business.

With Claude 3 and Gemini 1.5 in its arsenal, Google Cloud appears to be an excellent choice for developers. 

Gemini 1.5 offers a context window of 1 million tokens, the largest till date, which even OpenAI has not achieved yet. The Claude 3 models ship with a 200K window but are capable of processing inputs exceeding 1 million tokens. The largest context window OpenAI offers is 128K, with GPT-4 Turbo.

Now, the ball is in OpenAI's court, and everybody is impatiently waiting for the release of GPT-5. But OpenAI seems to have other priorities, distracted by controversies like the recent one where Elon Musk sued the company, accusing it of deviating from its original mission and becoming a de facto subsidiary of Microsoft. Microsoft badly needs OpenAI's GPT-5 to win the cloud war.

The post Why Claude 3 Is Bad News for Microsoft Azure appeared first on AIM.

Anthropic Claude 3 Opus Beats OpenAI GPT-4 https://analyticsindiamag.com/ai-news-updates/anthropic-claude-3-opus-beats-openai-gpt-4/ Mon, 04 Mar 2024 16:16:49 +0000 https://analyticsindiamag.com/?p=10114971

It’s only a matter of time before OpenAI releases GPT-5

The post Anthropic Claude 3 Opus Beats OpenAI GPT-4  appeared first on AIM.


Anthropic today released the Claude 3 model family, which comprises Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

Opus is the flagship model of the Claude 3 family. Claude 3 Sonnet is designed for enterprise workloads, while Claude 3 Haiku stands out as the fastest and most compact model, ensuring near-instant responsiveness. It excels at answering simple queries and requests with unmatched speed.

Claude 3 Opus (the strongest model) outperforms GPT-4 on common benchmarks like MMLU and HumanEval. Claude 3 capabilities include analysis, forecasting, content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.

The Claude 3 family initially offers a 200K context window, but all models are capable of processing inputs exceeding 1 million tokens. Opus, in particular, showcases near-perfect recall, surpassing 99% accuracy in the ‘Needle In A Haystack’ evaluation. In contrast, Gemini 1.5 has a context window of 1 million tokens.

The Claude 3 models prove their mettle by enabling near-instantaneous results, fueling live customer chats, auto-completions, and real-time data extraction tasks. Haiku stands out as the fastest and most cost-effective in its category, reading an information-dense research paper in less than three seconds.

The models also have strong vision capabilities for processing formats like photos, charts, and graphs. Anthropic claims these models have a more nuanced understanding of requests and make fewer refusals. 

The input cost for Opus is $15 per million tokens, with an output cost of $75 per million tokens. For Sonnet, the input cost is $3 per million tokens, and the output cost is $15 per million tokens. As for Haiku, the input cost is $0.25 per million tokens, and the output cost is $1.25 per million tokens.
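
Those per-token rates make per-request costs easy to estimate. A minimal sketch in Python (the rates are copied from the figures above; the helper itself is hypothetical, not an Anthropic utility):

```python
# Claude 3 list prices in USD per million tokens, as quoted above.
PRICES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "haiku":  {"input": 0.25,  "output": 1.25},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 10K-token prompt with a 1K-token reply on Opus:
print(estimate_cost("opus", 10_000, 1_000))  # 0.225
```

At these rates, Haiku is roughly 60x cheaper than Opus per input token, which is the capability-versus-cost trade-off Anthropic is drawing across the family.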

Opus and Sonnet are available today via the API, which is now generally available, enabling developers to sign up and start using these models immediately. Haiku will be available soon. Sonnet powers the free experience on claude.ai, with Opus available for Claude Pro subscribers.

Sonnet is also available today through Amazon Bedrock and in private preview on Google Cloud’s Vertex AI Model Garden—with Opus and Haiku coming soon to both.

It’s time OpenAI released GPT-5. In conversation with Bill Gates, OpenAI chief Sam Altman spoke at length about GPT-5, emphasising customisation and personalisation.

“The ability to know about you, your email, your calendar, how you like appointments booked, connected to other outside data sources—all of that. Those will be some of the most important areas of improvement,” said Altman. Furthermore, he claimed that GPT-5 would have much better reasoning capabilities than GPT-4. 

Google Gemini 1.5 Crushes ChatGPT and Claude with Largest-Ever 1 Mn Token Context Window https://analyticsindiamag.com/ai-news-updates/google-gemini-1-5-crushes-chatgpt-and-claude-with-largest-ever-1-mn-token-context-window/ Thu, 15 Feb 2024 17:17:27 +0000 https://analyticsindiamag.com/?p=10113029

It can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.

The post Google Gemini 1.5 Crushes ChatGPT and Claude with Largest-Ever 1 Mn Token Context Window appeared first on AIM.


Google today released Gemini 1.5. The new model outperforms ChatGPT and Claude with a 1-million-token context window, the largest ever seen in natural language processing models. In contrast, GPT-4 Turbo has a 128K context window and Claude 2.1 has a 200K context window.

“We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet,” reads the blog, co-authored by Google chief Sundar Pichai and Google DeepMind chief Demis Hassabis, comparing it with existing models like ChatGPT and Claude.

Gemini 1.5 Pro comes with a standard 128,000 token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via AI Studio and Vertex AI in private preview. 

It can process vast amounts of information in one go, including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words. In their research, Google also successfully tested up to 10 million tokens.

Gemini 1.5 is built upon Transformer and MoE architecture. While a traditional Transformer functions as one large neural network, MoE models are divided into smaller “expert” neural networks.
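
The routing idea behind MoE can be sketched in a few lines. This is a toy, illustrative example only; it says nothing about how Gemini's experts are actually built or trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 4 small "expert" networks plus a router.
n_experts, d_in, d_out, top_k = 4, 8, 8, 2
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
router = rng.normal(size=(d_in, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route input x to the top-k experts and mix their outputs."""
    scores = x @ router                   # one routing score per expert
    top = np.argsort(scores)[-top_k:]     # pick the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the selected experts
    # Only the selected experts run; the others stay idle, which is
    # where the compute saving over one monolithic network comes from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.normal(size=d_in))
print(y.shape)  # (8,)
```

The design choice is conditional computation: total parameter count grows with the number of experts, but each token only pays for the few experts it is routed to.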

Gemini 1.5 Pro’s capabilities span various modalities, from analysing lengthy transcripts of historical events, such as those from Apollo 11’s mission, to understanding and reasoning about a silent movie. The model’s proficiency in processing extensive code further establishes its relevance in complex problem-solving tasks, showcasing its adaptability and efficiency.

Gemini 1.5 Pro’s performance in the Needle In A Haystack (NIAH) evaluation stands out, where it excels at locating specific facts within long blocks of text, achieving a remarkable 99% success rate. Its ability to learn in-context, demonstrated in the Machine Translation from One Book (MTOB) benchmark, solidifies Gemini 1.5 Pro as a frontrunner in adaptive learning.
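
The Needle In A Haystack setup itself is simple to sketch: bury one fact at a random depth in a long distractor text and score how often it is retrieved. A toy harness follows, with a trivial substring-matching stand-in where a real evaluation would prompt the model:

```python
import random

random.seed(0)

FILLER = "The sky was a pleasant shade of blue that day."

def build_haystack(needle: str, n_sentences: int, position: int) -> str:
    """Bury `needle` among n_sentences of filler at the given position."""
    sentences = [FILLER] * n_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)

def recall_rate(answer_fn, needle: str, key: str, trials: int = 20) -> float:
    """Fraction of trials in which answer_fn recovers the buried fact."""
    hits = 0
    for _ in range(trials):
        haystack = build_haystack(needle, 1000, random.randrange(1000))
        hits += key in answer_fn(haystack)
    return hits / trials

# Stand-in "model": naive substring search. A real harness would send the
# haystack plus a question about the needle to the LLM under test.
needle = "The secret ingredient is rosemary."
score = recall_rate(lambda text: "rosemary" if "rosemary" in text else "", needle, "rosemary")
print(score)  # 1.0
```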

This new development comes after Google released the first version of Gemini Ultra just last week. Recently, Google also added generative AI features to Chrome, introducing the “Help me Write” feature across all websites. By right-clicking on any text box, users can access the feature, prompting Google’s AI to ask about their writing requirements and generate an initial draft.

While Google is focusing on improving its AI models, OpenAI is reportedly working on a web search product to challenge Google. Additionally, OpenAI is working on its next LLM, GPT-5, which is expected to be smarter than ever, according to Altman.

OpenAI also recently released its text-to-video generation model, Sora, which can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Meanwhile, Meta is expected to release Llama 3 soon.

 

China Open Sources DeepSeek LLM, Outperforms Llama 2 and Claude-2 https://analyticsindiamag.com/ai-news-updates/china-open-sources-deepseek-llm-outperforms-llama-2-and-claude-2/ Fri, 01 Dec 2023 10:23:35 +0000 https://analyticsindiamag.com/?p=10103982

DeepSeek LLM 7B/67B models, including base and chat versions, are released to the public on GitHub, Hugging Face and also AWS S3.

The post China Open Sources DeepSeek LLM, Outperforms Llama 2 and Claude-2 appeared first on AIM.


DeepSeek, a company based in China which aims to “unravel the mystery of AGI with curiosity,” has released DeepSeek LLM, a 67 billion parameter model trained meticulously from scratch on a dataset consisting of 2 trillion tokens. 

Available in both English and Chinese languages, the LLM aims to foster research and innovation. The research community is granted access to the open-source versions, DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat.


Check out the GitHub repository here.

The model is available under the MIT licence.

DeepSeek LLM 67B Base has showcased unparalleled capabilities, outperforming Llama 2 70B Base in key areas such as reasoning, coding, mathematics, and Chinese comprehension. In particular, its proficiency in coding is highlighted by an outstanding HumanEval Pass@1 score of 73.78, and in mathematics it achieves remarkable scores, including 84.1 on GSM8K (0-shot) and 32.6 on MATH (0-shot).

The model’s generalisation abilities are underscored by an exceptional score of 65 on the challenging Hungarian National High School Exam.

DeepSeek LLM 7B/67B models, including base and chat versions, are released to the public on GitHub, Hugging Face and also AWS S3. Access to intermediate checkpoints during the base model’s training process is provided, with usage subject to the outlined licence terms. 

In-depth evaluations have been conducted on the base and chat models, comparing them to existing benchmarks. Results reveal DeepSeek LLM’s supremacy over LLaMA-2, GPT-3.5, and Claude-2 in various metrics, showcasing its prowess in English and Chinese languages. 

The evaluation extends to never-before-seen exams, including the Hungarian National High School Exam, where DeepSeek LLM 67B Chat exhibits outstanding performance.

Experimentation with multi-choice questions has proven to enhance benchmark performance, particularly in Chinese multiple-choice benchmarks. By incorporating 20 million Chinese multiple-choice questions, DeepSeek LLM 7B Chat demonstrates improved scores in MMLU, C-Eval, and CMMLU.

DeepSeek LLM’s pre-training involved a vast dataset, meticulously curated to ensure richness and variety. The architecture, akin to LLaMA, employs auto-regressive transformer decoder models with unique attention mechanisms. The pre-training process, with specific details on training loss curves and benchmark metrics, is released to the public, emphasising transparency and accessibility.

Recently, Alibaba, the Chinese tech giant, also unveiled its own LLM, Qwen-72B, trained on high-quality data consisting of 3T tokens and offering an expanded context window length of 32K. The company also added a smaller language model, Qwen-1.8B, touting it as a gift to the research community.

Anthropic Launches Claude 2.1, Surpasses GPT-4 Turbo in Context Length https://analyticsindiamag.com/ai-news-updates/anthropic-launches-claude-2-1-surpasses-gpt-4-turbo-in-context-length/ Tue, 21 Nov 2023 17:23:20 +0000 https://analyticsindiamag.com/?p=10103468

The new model offers context length up to 200K, compared to OpenAI’s GPT-4 Turbo, which offers only 128K.

The post Anthropic Launches Claude 2.1, Surpasses GPT-4 Turbo in Context Length appeared first on AIM.


Anthropic’s Claude just got a massive upgrade. Claude 2.1, the latest iteration of its AI language model, is now available through the API, revolutionising the claude.ai chat experience. Bringing key enhancements for enterprises, Claude 2.1 introduces a remarkable 200K-token context window, substantial reductions in model hallucination rates, and a beta feature called tool use.

This update also includes a pricing overhaul aimed at improving cost efficiency for its diverse customer base.

Responding to user feedback, Claude 2.1 doubles the allowable token limit, enabling a context window of 200,000 tokens, compared to the 128,000 of GPT-4 Turbo announced at OpenAI DevDay. This corresponds to approximately 150,000 words or over 500 pages of material. Users can leverage this extended capacity to upload comprehensive documents such as codebases, financial statements, or lengthy literary works.
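
The word and page figures follow from common rules of thumb, roughly 0.75 English words per token and about 300 words per page. These conversion factors are approximations, not Anthropic's published numbers:

```python
# Rough rules of thumb: ~0.75 English words per token, ~300 words per page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def tokens_to_words(tokens: int) -> int:
    return int(tokens * WORDS_PER_TOKEN)

def words_to_pages(words: int) -> int:
    return words // WORDS_PER_PAGE

words = tokens_to_words(200_000)
pages = words_to_pages(words)
print(words, pages)  # 150000 500
```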

Claude’s capabilities now extend to summarisation, Q&A, trend forecasting, document comparison, and more, all with the ability to process complex tasks in a matter of minutes.

Addressing user demands, a new beta feature, tool use, has been introduced, allowing Claude to seamlessly integrate with existing processes, products, and APIs. This expanded interoperability enhances Claude’s utility in day-to-day operations, enabling it to orchestrate across developer-defined functions, search web sources, retrieve information from private knowledge bases, and perform various actions on behalf of users.
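
The pattern described, a model emitting structured calls that are executed against developer-defined functions, can be sketched generically. All names below are hypothetical placeholders; this is not Anthropic's tool-use API:

```python
import json

# Developer-defined tools the assistant is allowed to call, keyed by name.
TOOLS = {
    "get_stock_price": lambda symbol: {"symbol": symbol, "price": 123.45},
    "search_kb": lambda query: {"results": [f"doc about {query}"]},
}

def dispatch(tool_call_json: str) -> dict:
    """Execute one JSON-encoded tool call emitted by the model."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool: {call['name']}"}
    return fn(**call["arguments"])

# The model's side of the loop would emit something like this; the result
# is fed back into the conversation for the model to continue with.
result = dispatch('{"name": "get_stock_price", "arguments": {"symbol": "ACME"}}')
print(result)  # {'symbol': 'ACME', 'price': 123.45}
```

Keeping the tool registry on the developer's side is what makes the feature safe to expose: the model can only request actions from an allow-list the integrator controls.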

Read: Meet Silicon Valley’s Generative AI Darling

Furthermore, Claude 2.1 achieves a significant milestone with a 2x decrease in false statements compared to its predecessor, Claude 2.0. This boost in honesty empowers enterprises to deploy AI applications with greater trust and reliability across their operations. Rigorous testing revealed Claude 2.1’s increased likelihood to demur rather than provide incorrect information, bolstering its credibility in handling complex, factual questions.

Simplifying the developer Console experience, Anthropic introduced Workbench, a tool that enables developers to iterate on prompts in a playground-style environment. This facilitates faster learning and optimisation of Claude’s behaviour. System prompts have also been introduced, allowing users to provide custom instructions that align Claude’s responses with specific personalities or roles.

Claude 2.1 is now available in the API and powers the chat interface at claude.ai for both free and Pro tiers. The usage of the 200K token context window is exclusive to Claude Pro users, who can now upload larger files than ever before.

Anthropic Expands Claude’s Services Across 95 Nations https://analyticsindiamag.com/ai-news-updates/anthropic-expands-claudes-services-across-95-nations/ Tue, 17 Oct 2023 11:38:35 +0000 https://analyticsindiamag.com/?p=10101589

This expansion comes as good news for users in supported countries, granting access to both the free version and Claude Pro.

The post Anthropic Expands Claude’s Services Across 95 Nations  appeared first on AIM.


Anthropic recently announced Claude is now available to users in 95 countries worldwide. Launched in July, Claude has rapidly become a go-to solution for millions of users seeking professional and day-to-day task assistance.

This expansion comes as good news for users in supported countries, granting access to both the free version and Claude Pro. This dual offering provides a tailored experience, enhancing productivity and efficiency. Users can streamline their workflow, accomplish tasks more effectively, and achieve their goals using Claude’s intuitive interface and robust features.

Users worldwide can now harness Claude’s vast capabilities, leveraging its expansive memory, a remarkable 100K token context window, and unique file upload feature. These functionalities empower users to analyze data, refine their writing skills, and even engage in conversations with books and research papers.

Notably, Claude excels in data analysis, enabling users to gain valuable insights through advanced algorithms and drive well-informed decision-making. Its file upload feature also facilitates seamless collaboration, allowing users to effortlessly share and discuss documents.

This significant development follows Amazon’s recent announcement of a $4 billion investment in Anthropic. Moreover, reports indicate that Anthropic is gearing up for another funding round, with discussions underway with past investors, including Google. The company aims to secure approximately $2 billion in funding. This strategic move solidifies Anthropic as a key player in the competitive landscape, positioning itself as a prime independent rival to OpenAI.

You can find the list of supported countries here. 

Meet Silicon Valley’s Generative AI Darling https://analyticsindiamag.com/ai-origins-evolution/meet-silicon-valleys-generative-ai-darling/ Thu, 05 Oct 2023 05:12:39 +0000 https://analyticsindiamag.com/?p=10101119

Even the crypto folks are dreaming about FTX’s return from Anthropic

The post Meet Silicon Valley’s Generative AI Darling appeared first on AIM.


Looks like the entire Silicon Valley is head over heels for Anthropic. According to recent reports, the company is ready to raise another round of funding from past investors, including Google. One of the prime independent rivals of OpenAI, the company is in talks with investors to raise around $2 billion in funding.

This comes just a week after Amazon committed $1.25 billion to Anthropic, with plans to invest a total of $4 billion in the future. In return, Amazon expects to be the sole cloud provider for Anthropic.

Interestingly, Google has already made a $300 million investment in Anthropic, acquiring a 10% stake in the company. The two-year-old startup building Claude, a rival to ChatGPT, now aims for a valuation between $20 billion and $30 billion, five to seven times the roughly $4 billion valuation it reached after the investment in March.

To put this in perspective, OpenAI roughly has a valuation of around $28 billion after raising several rounds of funds.

The CEO of Anthropic, Dario Amodei, said in a recent interview with Andreessen Horowitz that the biggest thing the company wants to do is give Claude an infinite context window. The only thing holding it back, according to Amodei, is that “at some point, it just becomes too expensive in terms of compute”.

It is clear that Anthropic has high expectations when it comes to what it wants to achieve. But it seems like the current funds are holding the company back from its ambitions. It is only fair for the company to go around looking for more funds and raise the stakes for the big-tech.

A tussle with Google?

Interestingly, there have been rumours about a senior Google engineer delivering some challenging news to over fifty colleagues: a segment of the company’s cloud services, crucial for Anthropic, was experiencing issues, necessitating overtime efforts to rectify the situation. To address the problems in the service, specifically an underperforming and unstable NVIDIA H100 cluster, Google Cloud leadership initiated a month-long, seven-day-per-week sprint.

The consequences of not resolving this issue were deemed substantial, affecting Anthropic primarily but also leaving an adverse impact on Google Cloud and Google as a whole, as per the documents examined by Big Technology.

Just a week after Google launched the sprint, Anthropic announced its deal with Amazon, designating Amazon Web Services as its primary cloud provider for mission-critical workloads. It’s worth noting that the Amazon deal had been in the works for a while and was unrelated to Google Cloud’s performance problems.

For Google, this development must have been unsettling, especially considering how much money it had invested in the company. Nevertheless, Anthropic’s new funding from Amazon is also a benefit for Google, as the value of its own stake would increase, not just Amazon’s.

On the other hand, Google is already developing its own AI models with Google DeepMind. Gemini, which is expected to be arriving soon, might be the biggest bet the company has made. 

While Google may have the capability to manage these endeavours simultaneously, it faces the risk of being outpaced by competitors with fewer complicated trade-offs. Notably, Google Cloud’s performance issues with Anthropic appear to be stabilising, albeit not without requiring engineers to engage in a rare phenomenon at Google — weekend work.

Everyone loves Anthropic

Even though OpenAI is not profitable yet, it is still generating revenue through its offerings. Anthropic also plans to monetise its generative AI capabilities, aiming for an annualised revenue pace of $200 million, and hopes to eventually reach a $500 million annualised rate, according to a person with knowledge of the matter.

Anthropic believes that AI models from companies like OpenAI will be ahead of everything in the next few years, making it impossible to catch up with them. That is clearly why every AI startup in the world wants the highest possible valuation right now, to stay ahead in the race.

At present, generative AI startups are the biggest draw for investors and cloud providers. Emerging startups such as Mistral AI, Reka AI, Cohere AI, and Inflection AI have all been raising funds and have their own strategies for making money. Amidst all this, investors and big tech are getting a run for their money.

Anthropic has raised the stakes even more, as in the end, the only moat that generative AI companies have is money. 

Interestingly, FTX held a $500 million stake in Anthropic. Even after bankruptcy, the Sam Bankman-Fried-led company halted the sale of its shares. Now, three months later, the stake would be worth $2 billion, effectively making its customers very happy.

How could someone not love Anthropic? 

Anthropic Partners with BCG to Expand Claude Capabilities https://analyticsindiamag.com/ai-news-updates/anthropic-and-bcgs-partnership-in-ai-will-foster-new-models-in-nlps/ Fri, 15 Sep 2023 09:31:39 +0000 https://analyticsindiamag.com/?p=10100131

BCG’s customers globally will get direct access to Anthropic’s proprietary AI assistant Claude to power their strategic AI offerings and deploy safer, more reliable AI solutions.

The post Anthropic Partners with BCG to Expand Claude Capabilities appeared first on AIM.


San Francisco-based AI lab Anthropic has announced a new collaboration with Boston Consulting Group (BCG), one of the big three strategy-consulting firms, to expand its AI assistant, Claude, to more enterprises.

BCG’s customers globally will get direct access to Anthropic’s proprietary AI assistant Claude to power their strategic AI offerings and deploy safer, more reliable AI solutions.

Through this collaboration, BCG will advise their customers on strategic applications of AI and help them deploy Anthropic models including Claude 2, the newest version of the AI lab’s assistant, to deliver business results. Use cases involving Claude span knowledge management, market research, fraud detection, demand forecasting, report generation, business analysis and more.

Why Anthropic Chose BCG

Anthropic chose to partner with BCG as its ‘Constitutional AI’ ideology aligns with BCG’s ‘Responsible AI’ idea. Constitutional AI is a set of principles developed by Anthropic to make judgments about the AI’s outputs; the constitution guides the model to adopt the normative behaviour described in it. BCG’s ‘Responsible AI’, on the other hand, is the process of developing and operating artificial intelligence systems that align with organisational purpose and ethical values while achieving transformative business impact.

In addition to working together to bring AI to new organisations, BCG has partnered with Anthropic to use Claude within its own teams.

“Our new collaboration with Anthropic will help deliver that alignment on harnessing value and bottom-line impact from AI,” said Sylvain Duranton, global leader of BCG X, BCG’s tech build and design unit. “Together, we aim to set a new standard for responsible enterprise AI and promote a safety race to the top for AI to be deployed ethically,” he added.

Onboard the AI Bandwagon 

BCG had partnered with OpenAI in March this year to establish the Center for Responsible Generative AI within BCG X, the unit that unites tech builders, entrepreneurs, and designers with AI talent to enter into partnerships and build rapid solutions.

BCG has also partnered with Intel to bring generative AI into the enterprise. Bain & Company, too, partnered with OpenAI to help its clients integrate the technology developer’s innovations into daily tasks, reducing waste and supercharging productivity. McKinsey & Co internally developed an AI chatbot, Lilli, to deliver insights to employees based on a knowledge base of over 10,000 documents and archival data. It also aggregates external sources and allows employees to engage in dialogue with the platform.

Read more: Anthropic’s USD 580 Mn Series B raises eyebrows

Former Google DeepMind Researchers Go Deep for Sales Triumph https://analyticsindiamag.com/intellectual-ai-discussions/former-google-deepmind-researchers-go-deep-for-sales-triumph/ Mon, 28 Aug 2023 05:05:46 +0000 https://analyticsindiamag.com/?p=10099117

Currently, with ten employees, Glyphic focuses on applying large language models and generative AI to transform B2B sales processes

The post Former Google DeepMind Researchers Go Deep for Sales Triumph appeared first on AIM.


Amidst the rush to build prototypes on large language models and release them on Hacker News, few of which have seen integrated use, one startup is working on feasible solutions. Plunging right into a market where LLMs can be leveraged for sales solutions, projected to reach $770M by 2032, are former Google DeepMind employees with their startup Glyphic AI, an AI copilot for sales teams.

“I think one of the biggest challenges of Glyphic, as well as any LLM-based project, is handling hallucinations,” said Devang Agrawal, cofounder and CTO of Glyphic AI, in an exclusive interview with AIM. “One of the advantages of Glyphic is that, since most of our tasks are quite grounded, i.e. we are not just answering questions randomly, we’re trying to go through questions, calls, emails, and things like that, the scope of hallucinations is minimised. The answer is always based on something.”

In June, Glyphic AI officially came out of stealth mode, raising pre-seed funding of $5.5 million.

Navigating Large Language Models

“I think ChatGPT has brought to everyone’s mind the capabilities of AI, and everyone is now thinking about how it could be useful within their particular use case,” said Agrawal. 

Speaking of other challenges with LLMs, he said, “Most people are using large language models these days with some prompt engineering, but what they will soon realise is that this approach is not scalable. You can have evaluation sets or tests, but it is difficult to know if you’re getting better or worse, and iterating on large language models is very challenging. For instance, by changing your prompt, you can completely break everything, but you won’t truly be able to detect what broke. You can trip the model and the accuracy can come down, but it’s hard to test these models.”

At Glyphic, a mix of a few small models and large language models is used. “We, kind of, cleverly decide between GPT-4, Claude or Cohere based on the context and tasks we are trying to accomplish. For instance, Claude is able to understand long context and can understand 100,000 tokens at one go. This is something you can’t do with GPT-4, which can understand only up to 16,000 tokens. Claude is also trained in a much more conversational way, whereas GPT-4 can be direct. So, based on the sort of context, we use one or the other model depending on what behaviour we actually want.”

Glyphic AI plans to sufficiently validate its product and use cases, then move towards optimising them; after that, the company intends to build its own language model. Agrawal also spoke about a recent trend: “So, everyone is building these pretty-looking prototypes with large language models and putting them on Hacker News. While they look nice, we still haven’t seen deeply integrated use cases, which are of high quality, high fidelity, and are being used every day — because it is really challenging to do so.”

DeepMind Harvesting Entrepreneurs

Glyphic AI co-founders Devang Agrawal and Adam Liska

Agrawal graduated from Cambridge University and had always been fascinated by AI. He worked as a machine learning engineer at Apple on the Siri project. Wanting to switch from a product-focused role to the research and academic side of things, he joined the ‘research-heavy’ Google DeepMind as a research engineer, where he worked for two years before moving on to start Glyphic AI with fellow DeepMind senior researcher Adam Liska.

“DeepMind is one of the most innovative companies, and one of those few places where you can be working for years and still be learning so much, because you are always on the forefront. It was a tough decision to leave as the multimodal project we were doing was going in a really exciting direction,” said Agrawal. “But, Adam and I were always clear about wanting to build a startup. Right from my college days, I knew what I wanted, and was doing jobs to get the correct skills so when you do have a startup, it becomes more effective.” 

Google DeepMind supports people by giving them the flexibility to work on projects that they desire and even allows them to take risks. “It is a very supportive organisation where you can take speculative bets which sometimes work out and sometimes don’t. It is a great platform to develop as researchers and helps us inculcate critical thinking, which is pertinent for research.” 

Agrawal believes that having expertise and being at the centre of transformational technology helped with the switch. “Many people are leaving DeepMind to set up startups, as they have expertise in this new transformational technology, which is now ready to be used in a product.” 

There were two waves of exits fuelled by technology change at Google DeepMind. “When reinforcement learning had come to the market, a number of people were leaving DeepMind to set up startups using the same, but what we realised was that it was really hard to bring deep reinforcement learning in a product context, and that wave slightly died out. This was around 2017. Now, having specialist skills in large language and multimodal models, people want to build products on it.”  

A number of people have also gone on to build research-driven startups to solve large problems such as cancer. “Designing new molecules that might work better for treating certain sorts of cancer, etc. is a different sort of startup that requires a different sort of thinking. We’re just focusing on slightly more open-ended things and solving the problem in a fundamental way.” 

Roadmap for Glyphic 

Currently, with ten employees, Glyphic focuses on applying large language models and generative AI to transform B2B sales processes. “Right now, we’re just focusing on improving and optimising sales processes but we want to build on top of it to optimise the entire go-to market and product strategy with everything else.”

In line with the company’s vision in a year’s time, Agrawal is looking to have a research team as well. “As we grow and scale, we would invest deeply into research and this would be one of the key differentiators.” Glyphic currently provides services for software companies. “They’re so innovative with their sales processes and are willing to try out new products. Once we prove our products and models, we can expand into enterprises.”

Glyphic also works with a few Indian companies headquartered or registered in the US. “Microsoft is building something in this area, and so are other companies which are building copilot for sales, but I think this is one of the spaces where startups have a huge advantage. The technology is moving fast and you need to be able to completely rip out everything and change it within a couple of weeks if you want to stay ahead of the game. I think we definitely have an advantage because of our kind of technology background.”

The post Former Google DeepMind Researchers Go Deep for Sales Triumph appeared first on AIM.

]]>
Llama 2 vs GPT-4 vs Claude-2 – Comparison https://analyticsindiamag.com/ai-origins-evolution/llama-2-vs-gpt-4-vs-claude-2/ Wed, 19 Jul 2023 12:30:00 +0000 https://analyticsindiamag.com/?p=10097230

While GPT-4 and Claude 2 are better at coding, Llama 2 excels at writing

The post Llama 2 vs GPT-4 vs Claude-2 – Comparison appeared first on AIM.

]]>

Last night Meta released LLaMA 2, an upgraded version of its large language model LLaMa, in a surprise partnership with Microsoft. Soon to be available on the Microsoft Azure platform catalogue and Amazon SageMaker, the model can be used for both research and commercial purposes through licensing. 

The introduction of the 7B, 13B, and 70B pre-trained and fine-tuned models brings a remarkable 40% increase in pre-training data, a longer context length for training, and grouped-query attention (GQA) to enhance the inference efficiency of the larger models.

Meanwhile, over the past couple of months, several companies and labs have launched their own LLMs, including TII’s Falcon, Stanford’s Alpaca, Vicuna-13B, Anthropic’s Claude 2 and more. So before your timeline gets flooded with posts like “ChatGPT is just the tip of the iceberg, Llama is here” or “Meta is Microsoft’s new favourite child”, let’s cut to the chase and see how these models fare.

Grades Matter

Llama 2-Chat was made using fine-tuning and reinforcement learning with human feedback (RLHF), involving preference-data collection and the training of reward models, along with a new technique called Ghost Attention (GAtt). It is also trained on GPT-4 outputs. Meta performed a human study to evaluate the helpfulness of Llama 2 using 4,000 prompts. A “win rate” metric was used to compare the models, similar to the Vicuna benchmark. The study compares Llama 2-Chat models to both open-source and closed-source models like ChatGPT and PaLM using single- and multi-turn prompts.

The 70B Llama-2 model performs roughly on par with GPT-3.5-0301 and outperforms Falcon, MPT, and Vicuna. Llama 2-Chat models outperform open-source models in terms of helpfulness for both single and multi-turn prompts. It has a win rate of 36% and a tie rate of 31.5% compared to ChatGPT. It also outperforms the MPT-7B-chat model on 60% of the prompts. The Llama 2-Chat 34B model has an overall win rate of over 75% against the equivalently sized Vicuna-33B and Falcon 40B models. Additionally, the 70B model outperforms the PaLM-bison chat model by a significant margin. 

However, Llama-2 is weak in coding. 

It is not at GPT-3.5 (48.1) or GPT-4 (67) level when it comes to coding. Although its MMLU (Massive Multitask Language Understanding) score is good, HumanEval shows its coding capability is quite a bit lower than that of StarCoder (33.6) and many other models designed specifically for coding. But, considering that Llama-2 has open weights, it is highly likely that it will improve significantly over time.

On the other hand, Claude-2 excels in coding, mathematics, and logical thinking, including the ability to comprehend PDFs—a task that GPT 4 still struggles with. It attained an impressive score of 71.2% on the Codex HumanEval, an evaluation specifically designed to assess Python coding skills. 

When it comes to writing, Llama-2 and GPT-4 are very different, too. 

When asked to write a poem, the two took different approaches. ChatGPT makes more intentional word choices, attending to the way words sound, like a sophisticated poet with a wide vocabulary, while Llama-2 opts for more obvious rhymes, like a high-school poem.

Even though Llama-2 is trained at a much smaller scale, its output is commendable, according to several users with beta access. Meta initially used publicly available annotation data but, finding it insufficient, collected its own high-quality data, achieving better results with fewer examples. It also examined how different annotation platforms and vendors affected performance and found the model’s outputs comparable to human annotations.

Open Source Or Openness?

Building LLaMA likely cost Meta over USD 20 million. And although it is being touted as open-source, it comes with a condition. Meta is helping the open-source community by releasing the model with a commercially-friendly license.

As per the license, any company with over 700 million monthly active users must request permission to use the model, and it is at Meta’s discretion whether or not to grant access. To sum up, it is “free for everyone except FAANG”, as commentators have put it.

However, other LLMs like GPT-4 and Claude 2 are not open source but can be accessed through APIs.

Microsoft’s Second Child

Microsoft’s new partnership with Meta came as a surprise. After investing in a ten-year partnership with OpenAI, Satya Nadella seems to yearn for more. Meanwhile, Meta’s Threads amassed a staggering 10 million registrations within a mere seven hours of its debut, while ChatGPT saw an unprecedented 9.7% decline in traffic in June, the first downturn since its introduction in December.

When OpenAI released the GPT-4 paper, the ChatGPT maker received immense flak for lacking crucial details about the architecture, model size, hardware, training compute, dataset construction, and training method. Researchers believe that OpenAI’s approach undermines the principles of disclosure, perpetuates biases, and fails to establish the validity of GPT-4’s performance on human exams.

On the other hand, Meta’s white paper is itself a masterpiece. It spelt out the entire recipe, including model details, training stages, hardware, data pipeline, and annotation process. For example, there’s a systematic analysis of the effect of RLHF with nice visualisations. 

According to Percy Liang, director of Stanford’s Center for Research on Foundation Models, Llama-2 poses a considerable threat to OpenAI. Meta’s research paper admits there is still a large gap in performance between Llama 2 and GPT-4. So even though Llama 2 can’t compete with GPT-4 on all parameters, it could pressure OpenAI to keep improving. “To have Llama-2 become the leading open-source alternative to OpenAI would be a huge win for Meta,” says Steve Weber, a professor at the University of California, Berkeley.

Thus, with the arrival of Meta’s Llama-2, Microsoft now has a new child to rely upon should its older child fail. 

Read more: Claude-2 Vs GPT-4

]]>
Claude-2 vs GPT-4 – Which is Better? https://analyticsindiamag.com/ai-origins-evolution/claude-2-vs-gpt-4/ Thu, 13 Jul 2023 07:22:34 +0000 https://analyticsindiamag.com/?p=10096853 Claude-2 vs GPT-4

A true competitor for OpenAI is finally here and might make the company drop its prices and come to the ground to compete

The post Claude-2 vs GPT-4 – Which is Better? appeared first on AIM.

]]>
Google-backed Anthropic, an AI lab based in San Francisco, has unveiled Claude 2, a publicly accessible alternative to GPT-4. Previously, Claude, the earlier iteration, was exclusively offered to enterprises, but the latest version is now open to the general public in the United States and the United Kingdom. Distinguishing itself from its predecessor, Claude 2 is accessible through both a beta website and an API.

The timing couldn’t have been better. Claude-2 comes at a time when the popularity of GPT has seen a decline in recent months. Users are seeking alternatives that offer superior performance and affordability. Claude-2 appears to fit the bill, with its enhanced capabilities and cost-effectiveness.

Learning from Google’s Bard and OpenAI’s ChatGPT and taking user feedback into account, Anthropic has made significant enhancements to Claude-2. Users on Twitter have been lauding Claude’s ability to engage in natural language conversations, clearly explain its reasoning, and produce less harmful outputs. Claude-2 builds on these strengths and adds several key features that elevate its performance to new heights.

One notable improvement is Claude-2’s enhanced coding, maths and reasoning skills, including the ability to read PDFs, something GPT-based models still struggle with. It arrives just as OpenAI has introduced Code Interpreter on its paid plans.

Let’s Evaluate

Anthropic has put considerable effort into fine-tuning the model. According to the model card, Claude-2 is built using unsupervised learning and reinforcement learning with human feedback (RLHF), similar to what OpenAI used for GPT. The model is trained on data up to early 2023 and does not access the internet.

Claude-2 now boasts an impressive 71.2% score on the Codex HumanEval, a Python coding test, up from the 56.0% achieved by its predecessor, Claude-1.3, and ahead of GPT-4’s 67%. Claude-2 wins.

Similarly, on the GSM8k maths problem set, Claude-2 scored 88%, an improvement over Claude-1.3’s 85.2%. These advancements position Claude-2 as a valuable asset for developers and individuals seeking assistance with technical challenges. GPT-4 wins here with a 92% score.

The most important change is the expansion of Claude-2’s input and output capabilities. Users can now input up to 100,000 tokens per prompt, compared to GPT-4’s 32,000, allowing Claude-2 to process extensive technical documentation or even entire books. Additionally, Claude-2 can generate longer documents, from memos to letters to stories, up to a few thousand tokens in length.

Claude-2 is also 4-5 times cheaper than GPT-4-32K. Prompt tokens cost $11 per million, versus $60 per million for GPT-4-32K, and completion tokens cost $32 per million versus $120 per million, assuming similar tokenisation length. This will definitely push a lot of users to start using Claude-2 instead of GPT-4.
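As a quick sanity check on the “4-5 times cheaper” claim, here is a minimal sketch that computes per-call cost from the per-million-token rates quoted above. The rates and the example token counts are assumptions taken at face value from this article and may be outdated:

```python
# Per-million-token rates (USD) as cited in this article; assumptions only.
RATES = {
    "claude-2": {"prompt": 11.0, "completion": 32.0},
    "gpt-4-32k": {"prompt": 60.0, "completion": 120.0},
}

def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of a single API call for the given token counts."""
    r = RATES[model]
    return (prompt_tokens * r["prompt"] + completion_tokens * r["completion"]) / 1_000_000

# Example: a 30,000-token document plus a 2,000-token summary.
claude = call_cost("claude-2", 30_000, 2_000)
gpt4 = call_cost("gpt-4-32k", 30_000, 2_000)
print(f"Claude 2: ${claude:.2f}, GPT-4-32K: ${gpt4:.2f}, ratio: {gpt4 / claude:.1f}x")
```

At these assumed rates, the example call costs about $0.39 on Claude-2 versus $2.04 on GPT-4-32K, roughly a 5x gap, consistent with the article’s claim.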

Read: Busting the Myth of Context Length

Price drop and availability

Anthropic has made Claude-2 available through multiple channels. Users can access Claude-2 via the API, allowing businesses to integrate it into their systems seamlessly. Remarkably, Anthropic has maintained the same pricing for the Claude-2 API as its predecessor, Claude-1.3, making the upgrade to the latest model even more appealing to budget-conscious users.

Partners like Jasper, a generative AI platform, have reported Claude-2’s strength in a wide range of use cases, particularly those involving extended content generation. With a 3X larger context window and improved semantics, Claude-2 has empowered Jasper’s customers to stay ahead of the curve and achieve their content strategy goals. 

Another notable collaboration involves Sourcegraph, a code AI platform that assists developers in writing, fixing, and maintaining code. Sourcegraph’s coding assistant, Cody, leverages Claude-2’s improved reasoning and access to a larger context window of up to 100,000 tokens. By providing accurate answers and incorporating codebase context, Cody assists developers in speeding up their workflow and staying up to date with the latest frameworks and libraries.

Safe but still hallucinatory

According to Anthropic, the model has undergone rigorous evaluation, including internal red-teaming and automated tests on harmful prompts. In these evaluations, Claude-2 demonstrated a twofold improvement in providing harmless responses compared to Claude-1.3. Anthropic accepts that no model is completely immune to misuse.

“For example, Claude models could support a lawyer but should not be used instead of one, and any work should still be reviewed by a human,” reads the paper. People on Twitter have already been pointing out that the claims of it being good at maths are overstated.

Anthropic acknowledges the evolving nature of AI and is committed to responsible deployment. Claude-2 is poised to become a trusted companion for individuals and a valuable tool for businesses. 

As users seek alternatives amid declining ChatGPT usage, Claude-2’s budget-friendly pricing and remarkable feature set make it an enticing option. It seems a true competitor for OpenAI is finally here, one that might finally make the company drop its prices and come to the ground to compete.

]]>
Anthropic Launches ChatGPT Rival, Claude 2 https://analyticsindiamag.com/ai-news-updates/anthropic-launches-chatgpt-rival-claude-2/ Wed, 12 Jul 2023 06:45:02 +0000 https://analyticsindiamag.com/?p=10096759

The model is the latest version of ‘Claude’ released merely five months ago which was available only to businesses.

The post Anthropic Launches ChatGPT Rival, Claude 2 appeared first on AIM.

]]>

San Francisco-based AI lab Anthropic has announced Claude 2, a new ChatGPT rival open to the public in the US and the UK. The model is the latest version of ‘Claude’, released merely five months ago and available only to businesses. Unlike its predecessor, the latest version is available via a public-facing beta site as well as an API.

What Is Claude 2? How To Access This ChatGPT Competitor.

One of the chatbot’s beta testers, Ethan Mollick, an associate professor at the Wharton School of the University of Pennsylvania, said in a LinkedIn post that it has two big advantages over other models: it is very good at handling documents (especially PDFs, which GPT struggles with) and shows a very sophisticated “understanding” of them. Furthermore, it continues to be the most “pleasant” AI personality. On the downside, he suggested users refrain from using the model for data work, even though it accepts CSV files, because it hallucinates answers, whereas Code Interpreter does not.

The startup, run by former senior members of the OpenAI team, Daniela and Dario Amodei, purports to be a more ethically driven company that makes generative AI safe and “steerable”, according to its website.

According to the announcement blog, the latest version of the AI assistant scored 76.5 percent on the multiple choice section of the Bar exam and in the 90th percentile on the reading and writing portion of the GRE. Its coding skills have notably improved, scoring 71.2 percent on a Python coding test compared to Claude’s 56 percent.

Earlier this year, in February, Anthropic introduced a waitlist for early access to Claude, following Google’s recent investment in the startup. The investment, worth $300 million, gave Google a 10% stake in the company, bringing Anthropic’s valuation to approximately $5 billion. The partnership had been anticipated, as Anthropic announced in January that it had chosen Google Cloud as its preferred cloud provider.

The OpenAI competitor has set itself apart with its focus on understanding and developing safe AI systems, with the “constitutional AI” approach. “We have an internal red-teaming evaluation that scores our models on a large representative set of harmful prompts, using an automated test while we also regularly check the results manually,” said the blog. This is to ensure that Claude 2 is less susceptible to jailbreaks or nefarious uses.

If you’re in the US or UK, you can access the AI chatbot through the Claude 2 page, and sign up for free. Click on “Talk to Claude”, provide an email address and you’ll be ready to go.

]]>
Anthropic Scores Fresh $300M Funding Boost https://analyticsindiamag.com/ai-news-updates/anthropic-scores-fresh-300m-funding-boost/ Thu, 09 Mar 2023 09:46:56 +0000 https://analyticsindiamag.com/?p=10088968 OpenAI-Anthropic Merger No More

Privacy-focused search engine DuckDuckGo’s new AI assistant service, DuckAssist, is built on the Davinci LLM from OpenAI and Anthropic’s Claude.

The post Anthropic Scores Fresh $300M Funding Boost appeared first on AIM.

]]>
Fresh off Google’s recent investment, AI safety and research startup Anthropic is raising another $300 million at a pre-investment valuation of $4.1 billion, with Spark Capital leading the round.

Joining the likes of Google, Microsoft, and You.com, privacy-focused search engine DuckDuckGo announced the debut of its AI assistant service, DuckAssist, which is built on the Davinci LLM from OpenAI and Anthropic’s Claude.  

Legal tech startup Robin AI and Quora’s chatbot app Poe also use Anthropic models.

Anthropic had recently introduced a waitlist for early access to its AI assistant Claude, following Google’s recent investment in the company. 

The investment, worth around $300-400 million, gives Google a 10% stake in the company and could value Anthropic at approximately $5 billion. The collaboration between the two companies had been hinted at, as Anthropic announced in January that it had chosen Google Cloud as its preferred cloud provider. Before this, Anthropic had raised $580 million in a Series B round led by Sam Bankman-Fried, former chief executive of FTX, with participation from Caroline Ellison, Jim McClave, Nishad Singh, Jaan Tallinn, and the Center for Emerging Risk Research (CERR).

The VC firm Spark Capital, founded by Paul Conway, Santo Politi, and Todd Dagres in 2005, has made substantial investments in top tech firms such as Twitter, Discord, and Slack.

]]>
The Battle for AI Supremacy Begins https://analyticsindiamag.com/ai-origins-evolution/the-battle-for-ai-supremacy-begins/ Tue, 07 Feb 2023 11:46:42 +0000 https://analyticsindiamag.com/?p=10086761

The race for AI arms is enhanced with the recent acquisitions and big revelations made by the tech giants.

The post The Battle for AI Supremacy Begins appeared first on AIM.

]]>

The Cold War was an era of invention. Reliable transistors, chips, solid-state storage, mass-produced programmable computers and other major inventions that shaped the future of technology emerged between 1950 and 1990. The rivalry between the two superpowers put the first artificial satellite (Sputnik) into space and the first man on the moon.

Fast forward to 2023, and the race for artificial intelligence dominance bears a striking resemblance to the Cold War era, but with Google and Microsoft locking horns in the AI battle.

AI War Just Got Real

When Microsoft invested $1 billion in OpenAI in 2019, everybody sensed that something was cooking. That initial investment allowed OpenAI to build two remarkable products, Codex and DALL-E, and emboldened Microsoft to put another $10 billion into the viral ChatGPT creator.

Meanwhile, Google—slightly threatened by the partnership—issued ‘code red’ and its founders, Larry Page and Sergey Brin, made a surprise entry to work on making the search engine better. Soon, Google made a $300 million investment in AI safety and research company Anthropic, founded by Dario Amodei, ex-VP of research at OpenAI, along with his sister Daniela and nine other former OpenAI employees, in 2021.

The deal is likely to have taken Anthropic’s valuation to about $5 billion, with Google occupying an estimated 10% stake while Microsoft is set to own 49% of OpenAI—almost five times more than Google. Microsoft has already added ChatGPT to Teams and is also preparing to add it as well as GPT-4 to its search engine ‘Bing’ very soon, which otherwise falls fairly short compared to Google Search. 

However, not all is bleak, because Google has a lot in store that it is revealing slowly. Yesterday, the tech giant unveiled ChatGPT’s rival Bard, its conversational AI chatbot, based on Google’s LaMDA (Language Model for Dialogue Applications). It is the same AI model that one of Google’s own engineers had claimed was sentient, a claim the company dismissed.

Besides, the recent investment in Anthropic will also allow Google to use its language model assistant, Claude, for enterprises. DeepMind’s in-house AI chatbot, Sparrow, is also being fine-tuned simultaneously.

Riled up by the fire of competition, the big techs are even hosting events at the same time. Google’s ‘Live from Paris’ event will be held on February 8, 2023, along with Microsoft–OpenAI’s surprise event to “share some progress on a few exciting projects” today. Microsoft announced this event minutes after Bard’s announcement. 

Big Tech, Big Risks

The tech giants often acquire a host of startups to minimise the risk of new competitors emerging and potentially disrupting their dominant market position. Over the years, Microsoft has invested in GitHub, LinkedIn, Skype and more. Similarly, Google has DeepMind, Verily, Nest, Waymo and, most recently, Anthropic. Google recently moved DeepMind into its core development organisation, while the rest remain under Other Bets.

The acquisition of startups by larger firms serves as a strategic buffer, allowing them to deflect blame and avoid public backlash in the event of a failure or misstep. 

For instance, when Anthony Levandowski, one of the prominent engineers of autonomous vehicles, was asked to build a self-driving car for the Discovery Channel, Google was supportive of the idea but wanted to keep its name off the project due to reputational concerns over the risk of a crash. Levandowski then created a new company, Anthony’s Robots, which quickly built the ‘Pribot’, a self-driving Toyota Prius that was the first to drive on public roads. Impressed with the Pribot, Google’s founders provided Levandowski with the funds to bring it into a discrete unit within Google called ‘X’, which is part of Other Bets.

Ultimately, Google acquired both of Levandowski’s companies, Anthony’s Robots and 510 Systems. In its earnings reports, Alphabet accounts for Other Bets separately.

Addressing the increasing popularity of ChatGPT, Google executives have previously stated that its AI models are on par with OpenAI’s. Still, due to the “reputational risk” created by the technology, it is acting “more conservatively than a small startup”.

Meanwhile, Google’s latest catch, Anthropic, has an interesting backstory. Former OpenAI employee Amodei stepped down from the company over disagreements with its commercial direction, accelerated by Microsoft’s investment. Ironically, Anthropic, now backed by Google, is pursuing a similar path.

AI Wins the Race

As the AI race heats up, it has moved beyond building something new and groundbreaking. Instead, it’s about who can execute and perfect the existing models better, and in this battle, no one is holding back.

However, unlike Google, Microsoft is not taking it slow. The company is all-in, leaving no room for complacency. With both tech giants relying on existing models, the competition has become personal and the stakes couldn’t be any higher. More will be unveiled after today’s events. 

It’s no longer just about technical capabilities; it’s about who will emerge as the ultimate AI champion and take the trophy home.

The AI arms race just got real.

]]>
OpenAI Rival Anthropic Starts Claude Early Access https://analyticsindiamag.com/ai-news-updates/openai-rival-anthropic-starts-claude-early-access/ Mon, 06 Feb 2023 07:22:09 +0000 https://analyticsindiamag.com/?p=10086587 OpenAI-Anthropic Merger No More

Anthropic has released a waitlist for early access to its AI chatbot, Claude.

The post OpenAI Rival Anthropic Starts Claude Early Access appeared first on AIM.

]]>
San Francisco-based AI safety and research lab Anthropic has introduced a waitlist for early access to its AI assistant ‘Claude’, following Google’s recent investment in the company.

Register for the waitlist here.

The investment—worth $300 million—gives Google a 10% stake in the company and can value Anthropic at approximately $5 billion. The collaboration between the two companies has been hinted at, as Anthropic announced in January that it had chosen Google Cloud as its preferred cloud provider.

Founded by the former-OpenAI sibling duo in 2021, Anthropic raised $580 million in a Series B funding round led by Sam Bankman-Fried, former chief executive of FTX and now a suspected fraudster, among others.

‘Claude’ is a large language model assistant that is claimed to give tough competition to OpenAI’s ChatGPT.

The company is focused on understanding and developing safe AI systems, utilising a unique approach known as “constitutional AI”. 

With a goal of responsibly scaling AI technology, Anthropic is making progress in reverse engineering the behaviour of small language models and understanding the logic behind the pattern-matching behaviour in larger language models.

The post OpenAI Rival Anthropic Starts Claude Early Access appeared first on AIM.

]]>