Goldfish loss, eating less to live longer, & octopus adhesive strategies
Your new Strategy Toolkit newsletter (November 4, 2024)
(1) Sometimes even mistaken metaphors are vivid…
Multiple computer science teams in both academia and industry are racing to mitigate the tendency of genAI services based on LLMs to regurgitate substantial portions of their training data, which causes no end of copyright and other legal problems. One such team, drawn from the University of Maryland, the ELLIS Institute Tübingen, and the Max Planck Institute for Intelligent Systems, makes a simple modification to next-token prediction training: a pseudo-random subset (e.g., 25%) of the training tokens is excluded from the loss computation. The researchers named the technique, which significantly reduces the likelihood of training data “memorisation”, after the popular myth that goldfish have only short-term memories.*
“Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.”**
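To make the idea concrete, here is a minimal sketch of a loss that skips a pseudo-random subset of tokens. It is an illustration under stated assumptions, not the authors' implementation: the function names (goldfish_mask, masked_next_token_loss), the seeded random mask, and the toy probabilities are all hypothetical, and a real trainer would work from model logits rather than a hand-written list.

```python
import math
import random


def goldfish_mask(num_tokens, drop_fraction=0.25, seed=0):
    """Return one boolean per token: True means the token counts toward the loss.

    Assumption: a seeded pseudo-random draw per position. The paper's own
    masking scheme may differ; this only illustrates "drop a random subset".
    """
    rng = random.Random(seed)
    return [rng.random() >= drop_fraction for _ in range(num_tokens)]


def masked_next_token_loss(token_probs, mask):
    """Average negative log-likelihood over the kept tokens only.

    token_probs[i] is the probability the model gave to the correct token
    at position i (a hypothetical stand-in for real model output).
    """
    kept = [-math.log(p) for p, keep in zip(token_probs, mask) if keep]
    if not kept:
        return 0.0  # nothing kept, so nothing to learn from in this sequence
    return sum(kept) / len(kept)


# Toy example: eight positions, roughly a quarter of them dropped.
probs = [0.9, 0.5, 0.8, 0.7, 0.6, 0.95, 0.4, 0.85]
mask = goldfish_mask(len(probs), drop_fraction=0.25, seed=0)
loss = masked_next_token_loss(probs, mask)
print(mask, round(loss, 4))
```

Because the dropped positions never contribute to the loss, the model gets no training signal for reproducing those exact tokens, which is how the quoted abstract says verbatim reproduction of a complete chain of tokens is prevented.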
* Oxford University: Goldfish do have good memories, scientists find (13 October 2022); https://www.bbc.com/news/uk-england-oxfordshire-63242200
** Hans, A., et al., “Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs”, arXiv:2406.10209 (submitted 14 June 2024)