corpus
n. (countable) a large collection of written or spoken texts used for studying a language. Researchers use it to see how people actually use words in real life.
n. a large and structured set of texts, typically electronically stored and processed, used for linguistic analysis and statistical hypothesis testing. Also, the entire body of work by a specific author or on a specific subject.
The researcher searched the corpus for examples of the word.
By analyzing a massive corpus of emails, the linguists identified how professional greetings have changed over the last decade.
The development of a representative corpus is essential for training natural language processing models that can accurately reflect the nuances of contemporary dialectal variations.
Borrowed from Latin corpus (“body”). Doublet of corpse, corps, and riff.
The plural form is 'corpora', though 'corpuses' is occasionally accepted in non-technical contexts.