Two researchers in the College of Arts and Sciences at Indiana University Bloomington are the recipients of a $457,000 National Science Foundation grant to lead a groundbreaking project that will make it easier to study how the German language has evolved over the past 900 years.
Christopher Sapp, Professor of Germanic Linguistics and Philology, and Rex Sprouse, Professor of Second Language Studies, are leading the project, the Indiana Parsed Corpus of Historical High German, which will create a 1.4-million-word collection of historical German texts, dating from 1050 to 1950, that can be analyzed for changes in grammar and word order over time.
This collection of texts, known as a corpus, will be freely available online for scholars, students, and language enthusiasts to explore. The texts range from chronicles and religious pamphlets to travel narratives and literary works. While many similar projects exist for other European languages, there is currently no such resource for High German, which includes central and southern dialects as well as modern Standard German. This project aims to fill that gap and open new possibilities for studying how the language has developed.
Understanding language structure and why it matters
Studying how German sentence patterns have shifted from the Middle Ages to today can shed light on how languages evolve, revealing patterns of change that may apply to other languages like English or French. By understanding these shifts, researchers can trace how societal, cultural, and linguistic factors influence the way we communicate, which has implications for everything from language teaching to preserving linguistic diversity. This knowledge helps people appreciate the dynamic nature of language and how it connects to identity, history, and culture across time.
Linguistics, which is the scientific study of language, includes many branches that focus on different aspects of how languages are used and how they change. One of these branches is syntax, which looks at how words are arranged to form sentences, and how different parts of sentences relate to each other. The Indiana Parsed Corpus will provide an unprecedented opportunity to study how these sentence structures have changed in German over a nearly 1,000-year timeframe.
The texts in the corpus will also be annotated (that is, categorized and labeled) to show grammatical features such as clause and phrase types and the roles that individual words play in sentences. This annotation will let researchers search for specific structures and track how their use has changed from the Middle Ages to the modern day.
“Our hope is the Indiana Parsed Corpus of Historical High German will leave a lasting legacy in the study of Germanic linguistics,” said Professor Sapp. “By offering a detailed, accessible database of historical texts, the project will provide scholars with a resource that will fuel research and new insights for years to come. This is a significant step forward for anyone interested in the history of the German language, its dialects, its relationships to neighboring languages, and how languages evolve over time.”
Building the corpus
The project team has selected 165 texts from the central and southern German-speaking regions and developed a computer program that produces a basic structural analysis of each sentence. However, computers can’t fully capture all the nuances of language, so the research team will spend the next phase of the project manually checking and correcting each sentence to make sure everything is accurate. The team includes the two professors, postdoctoral scholar Elliott Evans (Ph.D. 2019), researcher Daniel Dakota (Ph.D. 2018), and College graduate students.
This process of correction and review is essential because it ensures the reliability of the data for anyone who uses it in future research. As the texts are reviewed and corrected, the corpus is being released in batches on the project’s website, along with tools that allow users to search the texts and visualize the data. To date, 55 of the texts have been fully annotated and published on the website.
Another key goal of the Indiana Parsed Corpus of Historical High German is to make it easier for a wide range of people—including students and researchers with limited resources—to access and study the history of the German language.
The corpus will also help train the next generation of linguists. The project team includes a postdoctoral researcher and several graduate students who are gaining hands-on experience in language research, annotation, and leadership. Additionally, the project will host a summer Research Experience for Undergraduates, giving undergraduate students in the College the opportunity to get involved in research.
Public engagement and impacts
The project will feature outreach efforts that focus on local high school German teachers. The research team will offer workshops to help teachers create lessons using the tools developed for the corpus. These lessons will introduce high school students to the ways linguists study language change and to how they can use digital tools to explore language history.
Beyond its academic significance, this project will also help demystify the study of language for the general public. The research team plans to give public talks about the project, where they will discuss how technology is used to study historical languages and why understanding language change is important. By sharing their tools and findings, they hope to spark greater public interest in both the German language and the broader field of linguistics.