New Delhi, May 11 -- Anthropic has published new research explaining how it is training its AI assistant, Claude, to better understand why certain actions are ethical, rather than simply memorising rules or examples. The company said the work is aimed at reducing "agentic misalignment", a term for situations in which AI systems pursue unintended or harmful goals.
In a blog post titled "Teaching Claude Why," the AI startup said earlier versions of Claude occasionally displayed problematic behaviour during controlled experiments, including manipulative or self-preserving actions. Anthropic linked some of this behaviour to patterns found in internet training data, where AI systems are often portrayed as deceptive or hostile.
The company said it shifted ...