DIGITAL SCHOLARSHIP GUIDES

Building your corpus

Understanding your research goals

In computational text analysis, text is your data, and the text corpus is your dataset. The most important part of any data analysis is knowing the data you are working with: the context in which it was collected, its strengths, its limitations, why it has value and how you relate to it.

Maintain a healthy skepticism as you move through the iterative process of building corpora and applying different text analysis methods. The analysis will always be shaped by which texts you choose to include (or exclude) and by the perspectives that you and your team bring to the research process.

As you begin to build your corpus, consider:

  • What is the main goal of your research? What texts do you anticipate needing for this project?
  • What kinds of patterns are you interested in exploring, and why?
  • Whose perspectives are incorporated into this text corpus? What historical and social contexts informed the creation of the texts? How might these contexts affect the analysis?
  • Positionality: How do you (or the research team) relate to the concepts reflected in these texts?
  • What assumptions do you have about the texts and the computational methodologies you’d like to use for analysis?

Further reading

The Digital Humanities Coursebook by Johanna Drucker
ISBN: 9781003106531
Publication Date: 2021-03-24
Chapter 7: “Data Mining and Analysis” provides an overview of key concepts and histories in computational text analysis methodologies. It introduces critical approaches to thinking about the social implications of data and features several exercises for text analysts of all skill levels.

The Digital Black Atlantic by Roopika Risam (Editor); Kelly Baker Josephs (Editor)
ISBN: 9781452965307
Publication Date: 2021-03-16
Chapter 7: “Text Analysis for Thought in the Black Atlantic” assesses the limitations of text analysis methodologies as well as assumptions in understanding the meaning of words, centering perspectives from digital African diaspora studies.
