Lun Wang leaves Google DeepMind, calls LLM evaluation unsolved problem

NewsBytes | May 19, 2026 5:39 PM CST

Lun Wang proposes 'self-evolving evals'

In his blog post, Wang explained that current tests work well enough for today's models but miss the mark once AI systems start showing new abilities or hiding their weaknesses.
He suggested creating "self-evolving evals": tests that adapt as AI systems become more capable.
Without this upgrade, he warns, we could make bad decisions about model training and safety.
Wang's message is clear: if we want responsible AI progress, our evaluation tools need to level up too.