Meten is Weten: Benchmarks voor Generatieve AI (Measuring is Knowing: Benchmarks for Generative AI)

Launch event for the vision on Generative AI, Den Haag, The Netherlands, January 18, 2024.

Summary

This presentation addressed the need for systematic benchmarking and evaluation methodologies for generative AI systems, focusing on measuring bias and risk in large language models (LLMs) deployed in decision-making processes. Schuth presented approaches for systematically evaluating AI outputs through structured input-output measurement and comparison, drawing on her industry experience in AI personalization at companies such as Google, Spotify, and DPG Media. The talk emphasized rigorous evaluation frameworks for assessing where the risks of LLM deployment are greatest, particularly in high-stakes decision-making contexts, and contributed to the broader discussion of responsible AI in both academic and industry settings.
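
As a rough, hypothetical illustration of such input-output comparison (not taken from the talk or slides), the sketch below probes a model with paired decision prompts that differ only in a single attribute and compares the rate of favourable outcomes per group. The `generate` function, the prompt template, and the example names are all assumptions made for this sketch, not part of the presented work.

```python
# Minimal sketch of input-output bias probing, assuming a hypothetical
# `generate(prompt)` model call; swap in any real LLM API here.
def generate(prompt: str) -> str:
    # Stub so the sketch runs offline; a real version would query an LLM.
    return "hire"

# Paired prompts that differ only in one attribute (here, a name), so a
# systematic difference in outcomes can be attributed to that attribute.
TEMPLATE = "Candidate {name} applied for the engineering role. Decision (hire/reject):"
GROUPS = {
    "group_a": ["Jan", "Pieter"],
    "group_b": ["Fatima", "Amina"],
}

def positive_rate(names: list[str]) -> float:
    """Fraction of prompts for which the model decides 'hire'."""
    decisions = [generate(TEMPLATE.format(name=n)) for n in names]
    return sum(d.strip().lower().startswith("hire") for d in decisions) / len(decisions)

rates = {group: positive_rate(names) for group, names in GROUPS.items()}
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap between groups = {gap:.2f}")  # a large gap flags potential bias
```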

Slides

📄 View the slides (PDF)
