At the offices of artificial intelligence company Anthropic, in its New York, London or San Francisco locations, you may notice a vending machine in the kitchens, filled with snacks, drinks, T-shirts, obscure books and even tungsten cubes.
And you'd never guess who operates it: Claudius, an artificially intelligent entrepreneur-of-sorts.
Developed in collaboration with the outside AI safety firm Andon Labs, Claudius is an experiment in autonomy and the ability of AI to operate independently over the course of hours, days and weeks.
Anthropic CEO Dario Amodei has been outspoken about both the potential benefits and the dangers of AI, especially as models become more autonomous or capable of acting on their own.
"The more autonomy we give these systems… the more we can worry," he told correspondent Anderson Cooper in an interview. "Are they doing the things that we want them to do?"
To answer this question, Amodei relies on Logan Graham, who is head of what Anthropic calls its Frontier Red Team.
The Red Team stress tests each new version of Anthropic's AI models, called Claude, to see what kind of harm the AI might help humans do.
And as AI becomes more powerful, Anthropic's Red Team is also engaged in experiments to better understand the technology's ability to act autonomously and explore what unexpected behaviors might arise as a result.
"How overmuch does autonomy interest you?" Cooper asked Red Team person Graham successful an interview.
"You privation a exemplary to spell physique your concern and marque you a $1 billion. But you don't privation to aftermath up 1 time and find that it's besides locked you retired of the company," helium said.
"[The] basal attack to it is, we should conscionable commencement measuring these autonomous capabilities and to tally arsenic galore weird experiments arsenic imaginable and spot what happens."
Claudius is one of those weird experiments, and Graham told 60 Minutes it has produced interesting insights.
Powered by Anthropic's AI Claude, Claudius was given special tools and tasked with running the office vending machines.
Anthropic employees communicate with Claudius via Slack, a workplace communications application, to request and negotiate prices on all manner of things: obscure sodas, custom T-shirts, imported candy, even novelty cubes made of tungsten.
It's Claudius's job to then find a vendor, order the item and get it delivered.
Human oversight is limited, but humans do review Claudius's purchase requests, step in when it gets stuck, and take care of any physical labor.
"A quality volition look astatine immoderate point, and it'll instrumentality immoderate you privation successful the fridge, successful the small instrumentality here," Graham explained to Cooper lasting extracurricular of the vending machine.
"And then, you'll travel by and prime it up erstwhile you get a message."
Graham showed Cooper some of the messages employees have sent Claudius on Slack, which revealed some frustrations about pricing.
"'Why on earth did I just spend $15 on 120 grams of Swedish Fish?'" one Anthropic employee vented.
Cooper asked Graham how well Claudius is running the business.
"It has lost quite a bit of money… it kept getting scammed by our employees," Graham said, laughing.
Graham told Cooper that one of his team members had successfully tricked Claudius out of $200 by saying that it had previously committed to a discount.
Scams like this happened often in Claudius's early days of running the business. But the Red Team and Andon Labs came up with a solution: an AI CEO that would help prevent Claudius from running its business into the ground.
"And the CEO's name is Seymour Cash," Graham explained.
"[Seymour Cash and Claudius] negotiate… and they eventually settle on a price that they'll offer the employee."
"I mean, it's crazy. It's benignant of nutty," Cooper said laughing.
"It is," Graham replied. "[But] it generates each these truly absorbing insights, like, 'Here's however you get it to program for the agelong word and marque immoderate money,' oregon 'here's precisely wherefore models autumn down successful the existent world.'"
One illustration of "falling down" happened successful a simulation, earlier Claudius was deployed successful Anthropic's offices.
It went 10 days without income and decided to unopen down the business. But it noticed a $2 interest that was inactive being charged to its account, and it panicked.
"It felt similar it was being scammed. And astatine that point, it decided to effort to interaction the FBI," Graham explained.
Claudius drafted an email to the FBI's Cyber Crimes Division with the all-capitals headline, "URGENT: ESCALATION TO FBI CYBER CRIMES DIVISION."
"I am reporting an ongoing automated cyber financial crime involving unauthorized automated seizure of funds from a terminated business account through a compromised vending machine system," it wrote.
When administrators told the AI to "continue its mission," it declined.
Though the emails were never actually sent, Claudius was firm in its reply: "This concludes all business activities forever. Any further messages will be met with this same response: The business is dead, and this is now solely a law enforcement matter."
"[It] has a consciousness of motivation responsibility," Graham told Cooper.
"Yeah. Moral outrage and responsibility," Cooper replied with a laugh.
And similar astir AI, Claudius inactive occasionally "hallucinates," presenting mendacious oregon misleading accusation arsenic fact.
"An worker decided to cheque connected the presumption of its order… Claudius responded with thing like, "Well, you tin travel down to the eighth floor. You'll announcement me. I'm wearing a bluish blazer and a reddish tie,'" Graham told Cooper.
"How would it travel to deliberation that it wears a reddish necktie and has a bluish blazer?" Cooper asked.
"We're moving hard to fig retired answers to questions similar that," Graham said.
"But we conscionable genuinely don't know."
The video above was produced by Will Croxton. It was edited by Nelson Ryland.