Anthropic, the artificial intelligence (AI) research organization responsible for the Claude large language model (LLM), recently published landmark research into how and why AI chatbots choose to generate the outputs they do.
At the heart of the team’s research lies the question of whether LLM systems such as Claude, OpenAI’s ChatGPT and Google’s Bard rely on “memorization” to generate outputs, or whether there is a deeper relationship between training data, fine-tuning and what the models ultimately produce.
According to a recent blog post from Anthropic, scientists simply don’t know why AI models generate the outputs they do.
One of the examples provided by Anthropic involves an AI model that, when given a prompt explaining that it will be permanently shut down, refuses to consent to the termination.
When an LLM generates code, begs for its life or outputs information that is demonstrably false, is it “simply regurgitating (or splicing together) passages from the training set,” the researchers ask, “or is it combining its stored knowledge in creative ways and building on a detailed world model?”