LINGUISTICS RESEARCH

INSTAR's linguistics program investigates language as both a cognitive capacity and a social system, combining experimental methods, computational analysis, and fieldwork to address questions that matter for AI development, public communication, and cultural preservation. Our interdisciplinary approach connects formal linguistics with machine learning, cognitive science, and data analytics to advance understanding at the frontier where human language meets computational systems.

Computational Linguistics & NLP

We develop and critically evaluate neural language models, parsing architectures, and semantic analysis tools that bridge formal linguistic theory with contemporary machine learning. Research interests include multilingual information extraction, automated discourse analysis, and building language technologies for low-resource languages whose speakers are underserved by existing AI systems.

Language Documentation

INSTAR's language documentation research focuses on creating rigorous records of under-documented languages in collaboration with speaker communities. Research outputs include annotated corpora, grammatical analyses, and digital archives that serve a dual purpose: advancing linguistic science and supporting community-led language revitalization efforts.

Psycholinguistics

Using eye-tracking, EEG, and reaction-time paradigms, we study how the brain processes language in real time. Research interests include how listeners resolve syntactic ambiguity, integrate intonational cues, and interpret non-literal meaning, with comparisons across typologically diverse languages that test the generality of cognitive models. These findings inform both basic science and the design of more human-like AI language systems.

Phonetics & Phonology

Our phonetics research combines articulatory measurement, acoustic analysis, and signal-processing tools to investigate the physical and perceptual properties of speech sounds and the abstract phonological patterns that govern them. Cross-linguistic comparison is central: understanding how the world's languages partition acoustic space illuminates both universal constraints and the range of human sound systems. Researchers at any level interested in phonetics or computational linguistics are welcome to explore our Fellowship at /fellowship/.

Public Data Foundations

Grounded in Open Linguistic Data

INSTAR's linguistics research is anchored in publicly accessible speech, text, and language corpora that allow findings to be reproduced, scrutinized, and built upon by the broader scientific community. Open datasets are especially critical in computational linguistics, where training and evaluation corpora must be transparent to avoid concealed biases.

Our primary open linguistic sources:

  • Linguistic Data Consortium — curated speech and text corpora used for training and evaluating NLP and computational linguistics systems.
  • Mozilla Common Voice — open, multilingual speech dataset for acoustic modeling and low-resource language research.
  • TalkBank / CHILDES — open transcripts of human conversation and child language acquisition data for developmental and discourse research.
  • Data.gov — federal open datasets supporting sociolinguistics, language policy, and communication research.

For Researchers

Join the INSTAR Fellowship

The INSTAR Fellowship is an open citizen-scientist program: no minimum degree is required, and selection is based on fit with our research culture. It offers structured mentorship, interdisciplinary scope, and the freedom to pursue hard problems.