Anthropic Unveils AI Chatbot Models Outperforming OpenAI’s GPT-4

Updated: Mar 12

Anthropic, an AI startup backed by Google and substantial venture capital, has just unveiled the latest iteration of its GenAI technology, named Claude. Notably, Anthropic claims that its new AI chatbot, Claude 3, surpasses the performance of OpenAI's GPT-4.

Claude 3, the newest addition to Anthropic's GenAI lineup, comprises three models: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus positioned as the most powerful. Anthropic asserts that all three exhibit increased capabilities in analysis and forecasting, and that they outperform models such as ChatGPT, GPT-4 (excluding GPT-4 Turbo), and Google’s Gemini 1.0 Ultra (excluding Gemini 1.5 Pro) on specific benchmarks.

A noteworthy feature of Claude 3 is that it represents Anthropic's inaugural foray into multimodal GenAI, allowing it to analyze both text and images. This versatility enables Claude 3 to process various visual elements such as photos, charts, graphs, and technical diagrams, drawing information from diverse document types like PDFs, slideshows, and others.

Going a step beyond its competitors, Claude 3 can analyze multiple images in a single request, up to a maximum of 20. This allows it to compare and contrast images, but its image processing has limits. Anthropic has deliberately prevented the models from identifying people, citing ethical and legal implications. The company also acknowledges that Claude 3 may make mistakes with low-quality images (under 200 pixels) and struggles with tasks involving spatial reasoning (such as reading an analog clock face) and object counting.
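As a rough illustration of those two limits, the sketch below pre-checks a batch of images before it is sent anywhere. It is not Anthropic's API: the `ImageInfo` type and `check_image_batch` function are hypothetical names made up for this example, and reading "under 200 pixels" as the image's shorter side is an assumption, since the announcement does not say which dimension is meant.

```python
from dataclasses import dataclass

MAX_IMAGES_PER_REQUEST = 20  # limit stated in the article
MIN_SIDE_PIXELS = 200        # assumption: "under 200 pixels" means the shorter side


@dataclass
class ImageInfo:
    """Hypothetical record describing one image in a request."""
    name: str
    width: int
    height: int


def check_image_batch(images):
    """Return a list of warnings for a batch of images (empty if none apply)."""
    warnings = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        warnings.append(
            f"{len(images)} images exceeds the limit of "
            f"{MAX_IMAGES_PER_REQUEST} per request"
        )
    for img in images:
        if min(img.width, img.height) < MIN_SIDE_PIXELS:
            warnings.append(
                f"{img.name}: shorter side under {MIN_SIDE_PIXELS} px, "
                "results may be unreliable"
            )
    return warnings


if __name__ == "__main__":
    batch = [ImageInfo("chart.png", 1200, 800), ImageInfo("thumb.png", 150, 900)]
    for warning in check_image_batch(batch):
        print(warning)
```

A check like this only flags likely trouble; it says nothing about the spatial-reasoning or counting weaknesses, which depend on image content rather than size.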

Claude 3 analyzes images but will not generate artwork, at least in its current state. Whether it is processing text or images, Anthropic assures customers that Claude 3 is better than its predecessors at following multi-step instructions, producing structured output in formats like JSON, and conversing in multiple languages. Anthropic further claims that Claude 3 refuses to answer questions less often, thanks to a more nuanced understanding of requests. In the near future, the models will cite the sources of their answers, allowing users to verify information.

Anthropic emphasizes that Claude 3 tends to generate expressive and engaging responses and is easier to prompt and steer than its legacy models. The company believes that users will achieve the results they want with shorter, more concise prompts.

Some of these enhancements result from the expansion of Claude 3’s context capabilities.

A model’s context, often referred to as a context window, is the input data (e.g., text) the model considers before generating output. Models with smaller context windows tend to "forget" content from even recent exchanges, causing them to drift off topic in problematic ways. Conversely, large-context models are better at grasping the narrative flow of the data they process, yielding more contextually rich responses, at least in theory.
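The "forgetting" effect can be pictured with a toy sketch. It assumes, purely for illustration, that a token is a whitespace-separated word and that anything beyond the window is simply cut off from the oldest end; real tokenizers and models are more involved, and `fit_to_window` is a hypothetical helper, not part of any library.

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens; older ones are dropped."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


# Crude stand-in for tokenization: split on whitespace (18 tokens here).
conversation = (
    "my name is Ada . what is the capital of France ? "
    "and what is my name ?"
).split()

small = fit_to_window(conversation, 6)    # only the final question survives
large = fit_to_window(conversation, 50)   # the whole conversation fits

print("Ada" in small)  # False: the name fell outside the small window
print("Ada" in large)  # True
```

With the six-token window the model would be asked for a name it can no longer see, which is the kind of off-topic drift described above.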

Anthropic states that Claude 3 will initially support a 200,000-token context window, equivalent to about 150,000 words, with select customers getting access to a 1-million-token context window (~700,000 words). This matches Google’s latest GenAI model, Gemini 1.5 Pro, which also provides a context window of up to a million tokens.
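Those figures imply roughly 0.75 words per token (150,000 words for 200,000 tokens). The sketch below uses that ratio to estimate whether a document of a given word count would fit in a window. The ratio is an approximation taken from the article's own numbers, the 1-million-token figure implies a slightly lower ratio of 0.7, and actual token counts depend on the tokenizer and the text. The function names are hypothetical.

```python
import math

# Ratio implied by the article: 150,000 words per 200,000 tokens.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75, an approximation


def estimated_tokens(word_count):
    """Estimate how many tokens a text of `word_count` words would use."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_window(word_count, window_tokens=200_000):
    """Return True if the estimated token count fits within the window."""
    return estimated_tokens(word_count) <= window_tokens


if __name__ == "__main__":
    for words in (90_000, 150_000, 400_000):
        print(words, estimated_tokens(words), fits_in_window(words))
```

An estimate like this is only good for rough planning; anything near the limit should be measured with the provider's actual token counts.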

However, despite being an improvement over its predecessor, Claude 3 is not without its imperfections.

In a technical whitepaper, Anthropic concedes that Claude 3 shares challenges with other GenAI models, including bias and hallucinations (i.e., generating inaccurate information). Unlike some counterparts, Claude 3 cannot search the web, so it answers questions using only data from before August 2023. Additionally, while multilingual, Claude 3 is less fluent in certain "low-resource" languages than it is in English.

Anthropic commits to frequent updates to address these issues in the coming months.

Opus and Sonnet are currently accessible on the web and through Anthropic's development console and API, Amazon's Bedrock platform, and Google's Vertex AI. Haiku is anticipated to launch later this year.

Pricing breakdown:

  • Opus: $15 per million input tokens, $75 per million output tokens

  • Sonnet: $3 per million input tokens, $15 per million output tokens

  • Haiku: $0.25 per million input tokens, $1.25 per million output tokens
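To make the per-million-token rates concrete, here is a minimal sketch that estimates the cost of a single request from the prices listed above. The lowercase model keys and the `estimate_cost` function are labels invented for this example, not official API model identifiers, and the token counts in the usage lines are made up.

```python
# (input price, output price) in US dollars per million tokens, from the list above.
PRICES_PER_MILLION = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in dollars for one request."""
    in_price, out_price = PRICES_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a full 200,000-token prompt and a 1,000-token reply.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 200_000, 1_000):.5f}")
```

For that hypothetical request the estimate works out to about $3.08 on Opus, $0.62 on Sonnet, and $0.05 on Haiku, which shows how much the choice of model dominates cost for long prompts.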

So, what's the big picture with Claude 3?

As previously reported, Anthropic's aspiration is to develop a cutting-edge algorithm for "AI self-teaching." Such an algorithm could power virtual assistants that handle emails, conduct research, and create art and literature, capabilities that GPT-4 and other large language models have already begun to offer.

Anthropic hints at this in its blog post, indicating plans to build on Claude 3's initial capabilities by allowing it to interact with other systems, code "interactively," and deliver advanced agentic capabilities. That last point echoes OpenAI's ambitions to create a software agent that automates complex tasks; OpenAI already offers an API that lets developers embed "agent-like experiences" in their applications. Anthropic appears committed to delivering comparable functionality.

The possibility of an image generator from Anthropic is intriguing, although it would be surprising given the controversies around image generators, including copyright and bias concerns. Vendors of image generators also face legal battles with artists who accuse them of profiting from their work without compensation or credit.

The evolution of Anthropic's technique for training GenAI, known as "constitutional AI," is another area of curiosity. This approach, according to Anthropic, makes the behavior of GenAI more understandable, predictable, and adjustable. Constitutional AI aims to align AI with human intentions by having models respond to questions and perform tasks based on a straightforward set of guiding principles. For instance, for Claude 3, Anthropic introduced a principle, informed by crowdsourced feedback, instructing the models to be understanding and accessible to people with disabilities.

Regardless of Anthropic's ultimate goals, the company is in it for the long haul. According to a pitch deck leaked in May of last year, the company aims to raise up to $5 billion over the following 12 months, a baseline it deems necessary to remain competitive with OpenAI. With substantial commitments from Google, Amazon, and other backers, Anthropic is well on its way to that target.

