Linguistic Noun Atlas

How nouns behave across the world's languages

Every human language has a way to talk about things (people, places, objects, ideas), but the grammatical machinery differs drastically. German assigns every noun one of three genders. Chinese requires a classifier (measure word) between a numeral and a noun, and the classifier changes depending on whether you're counting flat things, long things, or animals. Finnish expresses "in," "from," and "into" with case suffixes rather than separate words, with fifteen cases in all. Yoruba uses tone (pitch) to distinguish otherwise identical words. Inuktitut can pack an entire sentence into a single word. Linguistics tries to taxonomize these patterns, and this atlas is a small attempt to make those systems visible for nouns.

This project takes inspiration from Google's CLSE dataset (Corpus of Linguistically Significant Entities), which cataloged linguistic annotations for thousands of entities across 34 languages. That original dataset was built from production localization data with dozens of overlapping schema fields. This version takes the opposite approach: it is hand-curated, with fewer entities, each chosen to illuminate a language's distinctive features. The focus is still on nouns, and the goal is to be a concise reference.

Languages
Linguistic Attributes
Annotated Examples
