Justin Zhao
USA Baby Names Analysis

What can 145 years of US baby names tell us about American culture? Patterns in conformity, gender, sound, contagion, and regional identity.

Sheet Music Bench

Can frontier multimodal LLMs read sheet music? Models answer 5 basic questions about 35 sheet-music images.

Eval Landscape (Capabilities)

How are frontier AI capability benchmarks born, saturated, and retired? Tracking 150+ model releases from 20+ labs.

Nym Bridge

Find baby names that sound or look like a set of reference names, searching 100K+ names from the SSA dataset.

Language Model Council (Paper)

Can LLMs evaluate each other on emotional intelligence? Browse scenarios, responses, and judging data from the original NAACL paper.

Language Model Council (Live)

What does a council of LLMs think of your prompt, and of each other's answers? Run one live with your own OpenRouter API key.

Balanced Accuracy Simulator

Why is Balanced Accuracy the right metric for selecting LLM judges in binary prevalence estimation?

Linguistic Noun Atlas

How do nouns behave across 39 languages? Gender, case, classifiers, honorifics, tone, animacy, and more.

© 2026 Justin Zhao