Justin Zhao

Language Model Council

Explore how 20 LLMs evaluate each other on emotional intelligence tasks. Browse scenarios, responses, and judging data from the original Language Model Council (LMC) paper.

Balanced Accuracy Simulator

Monte Carlo simulator showing why Balanced Accuracy is the best metric for selecting LLM judges for binary prevalence estimation.

© 2026 Justin Zhao