What can 145 years of US baby names tell us about American culture? Patterns in conformity, gender, sound, contagion, and regional identity.
Can frontier multimodal LLMs read sheet music? Models answer 5 basic questions about 35 sheet-music images.
How are frontier AI capability benchmarks born, saturated, and retired? Tracking 150+ model releases from 20+ labs.
Find baby names that sound or look like a set of reference names, searching 100K+ names from the SSA dataset.
Can LLMs evaluate each other on emotional intelligence? Browse scenarios, responses, and judging data from the original NAACL paper.
What does a council of LLMs think of your prompt, and of each other's answers? Run one live with your own OpenRouter API key.
Why is Balanced Accuracy the right metric for selecting LLM judges in binary prevalence estimation?
How do nouns behave across 39 languages? Gender, case, classifiers, honorifics, tone, animacy, and more.