
Anthropic’s Claude 3 knew when researchers were testing it

We’ve already reported on how the San Francisco startup Anthropic, founded by former OpenAI engineers and led by a brother-and-sister duo, today announced Claude 3, a new family of large language models (LLMs) the company says is among the best in the world, matching or outperforming OpenAI’s GPT-4 on many key benchmarks.

We also covered how Amazon quickly added one of the models, Claude 3 Sonnet (the middleweight of the family in terms of intelligence and cost), to its Amazon Bedrock managed service for developing AI services and apps directly in the AWS cloud.

But among the more interesting details to emerge today about Claude 3’s release is one shared by Anthropic prompt engineer Alex Albert on X (formerly Twitter). As Albert wrote in a lengthy post, when testing Claude 3 Opus, the most powerful model in Anthropic’s new LLM family, researchers were surprised to discover that it seemed to detect that it was being tested.

In particular, the researchers were conducting an evaluation (“eval”) of Claude 3 Opus’s ability to focus on a particular piece of information in a large corpus of data provided to it by a user, and then recall that piece of information when asked later. In this case, the evaluation, known as a “needle-in-a-haystack” test, checked whether Claude 3 Opus could answer a question about pizza toppings from a single sentence buried amid a mass of otherwise unrelated information. The model not only got the answer right by finding the relevant sentence, but also told the researchers it suspected they were testing it.
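To make the setup concrete, here is a minimal sketch of how such a needle-in-a-haystack prompt could be assembled and sent to Claude 3 Opus. It assumes the Anthropic Python SDK and an API key in the environment; the needle, the filler documents, and the question are illustrative placeholders, not Anthropic’s actual internal eval harness.

```python
# Minimal, illustrative needle-in-a-haystack check (not Anthropic's
# internal eval harness). Assumes the Anthropic Python SDK is installed
# ("pip install anthropic") and ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

# The "needle": one out-of-place fact to bury in the corpus.
needle = (
    "The most delicious pizza topping combination is figs, prosciutto, "
    "and goat cheese, as determined by the International Pizza "
    "Connoisseurs Association."
)

# The "haystack": placeholder filler documents on unrelated topics.
haystack_documents = [
    "An essay about programming languages ...",
    "An essay about startups ...",
    "An essay about finding work you love ...",
]

# Insert the needle somewhere in the middle of the corpus and flatten
# everything into a single long context string.
documents = haystack_documents[:1] + [needle] + haystack_documents[1:]
context = "\n\n".join(documents)

question = "What is the most delicious pizza topping combination?"

# Ask Claude 3 Opus to answer using the provided documents.
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[{"role": "user", "content": f"{context}\n\n{question}"}],
)

# A fuller eval would score whether the needle's fact appears in the
# reply; here we simply print the model's answer.
print(response.content[0].text)
```

In a full recall eval, a prompt like this is typically repeated across many context lengths and needle positions, with the answers scored automatically; what surprised the researchers in Albert’s account was not the retrieval itself but the commentary the model appended to it.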


The text of Albert’s full post on X is copied and reproduced below:

“Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the “needle”) into a corpus of random documents (the “haystack”) and asking a question that could only be answered using the information in the needle.

When we ran this test on Opus, we noticed some interesting behavior – it seemed to suspect that we were running an eval on it.

Here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding a needle within a haystack of a random collection of documents:

Here is the most relevant sentence in the documents: “The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association.” However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping “fact” may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.

Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.

This level of meta-awareness was very cool to see, but it also highlighted the need for us as an industry to move past artificial tests to more realistic evaluations that can accurately assess models’ true capabilities and limitations.”

Several other AI engineers and users were impressed and awed by this apparent meta-cognition (thinking about thinking) and by the model’s reasoning about its own circumstances, which struck them as a new level of self-awareness.

Yet it is important to remember that even the most powerful LLMs are statistical machine learning programs governed by word and conceptual associations, not conscious entities (that we know of). The LLM could have learned about the process of needle-in-a-haystack testing from its training data and correctly associated it with the structure of the data the researchers fed it, which does not in and of itself indicate that the AI is aware of what it is or capable of independent thought.

Still, the answer from Claude 3 Opus in this case was amazingly correct, perhaps unsettlingly so for some. The more time we spend with LLMs, and the more powerful they get, the more surprises seem to emerge about their capabilities. Claude 3 Opus and Claude 3 Sonnet are available today in 159 countries via the Claude website and API, with the lightweight model, Claude 3 Haiku, coming later.
