DIGITAL SCHOLARSHIP GUIDES

Introduction to Collections as Data

What Is Collections As Data?

Collections as data is the idea and practice of treating collections (groups of objects, items, or texts, typically digital or digitised) as data that people can analyse and represent using computers. The term came into common use in the digital humanities with the collaborative project Always Already Computational: Collections as Data, directed by Thomas Padilla, which “documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven research and teaching.” This project was followed by a report on the responsible implementation of collections as data practices, Collections as Data: Part to Whole.

Collections are groups of objects, items, texts, and other things. The “as” in this term really means “is”: the collections (and their metadata) are serving as data, becoming data, and being considered as data. Data are groups of ordered information, stored digitally, that can be processed by a computer. In sum, collections as data means groups of objects formatted as groups of ordered information that are then analysed by a computer.

Collections as data work thus explores the potential of computational methods for analysing digital collections, digital objects, and their metadata, treating digitised and born-digital collections and their metadata as datasets for computational analysis.
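
To make the idea of metadata serving as a dataset more concrete, here is a minimal sketch in Python. It assumes a collection's metadata has been exported as CSV; the sample records, the column names (id, title, year, format), and the function names are all hypothetical and invented for illustration, not drawn from any real collection or tool.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata export. A real collection would supply its own
# file and its own column names; these five records are made up.
SAMPLE_METADATA = """id,title,year,format
1,Letter to the editor,1891,text
2,Harbor photograph,1904,image
3,Field notebook,1897,text
4,City map,1912,map
5,Portrait photograph,1908,image
"""


def load_records(csv_text):
    """Parse CSV metadata into a list of dictionaries, one per item."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def items_per_decade(records):
    """Count items by the decade of their 'year' field."""
    counts = Counter()
    for record in records:
        decade = (int(record["year"]) // 10) * 10
        counts[decade] += 1
    return dict(sorted(counts.items()))


def items_per_format(records):
    """Count items by their 'format' field."""
    return dict(Counter(record["format"] for record in records))


records = load_records(SAMPLE_METADATA)
print(items_per_decade(records))   # {1890: 2, 1900: 2, 1910: 1}
print(items_per_format(records))   # {'text': 2, 'image': 2, 'map': 1}
```

Even simple counts like these can show how a collection is distributed over time or across formats, which is often a useful first step before any deeper analysis.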

What Does Collections As Data Work Look Like?

  • Digital collections and exhibits
  • Interactive maps and visualisations
  • Digital databases
  • Scholarly websites and web archives
  • Processing, presenting, and interpreting metadata from collections
  • Much, much more!

What Can Collections As Data Entail?

  • Public humanities work
  • Creation of datasets that serve a known user need
  • Collaboratively sharing and documenting processes and practices beyond one’s institution
  • Openly publishing datasets and associated documentation
  • Encouraging computational use of digitised and born-digital collections
  • Lowering barriers to use/access
  • Enabling bulk download of data and optimising data access
  • Prioritising static directories and zipped collections
  • Direct contact with communities
  • Being guided by ongoing ethical commitments
  • Aiming to respect the rights and needs of content creators, collections subjects, and user communities (including crowdsourcing, when appropriate)
  • Valuing interoperability
  • Ongoing, iterative processes
  • Much, much more!
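
As one illustration of the points above about bulk download, zipped collections, and publishing documentation alongside datasets, the following Python sketch bundles metadata and a short README into a single zip archive. The file names (README.txt, metadata.json), the sample records, and the function name are assumptions chosen for this example; a real project would follow its own institution's packaging conventions.

```python
import io
import json
import zipfile


def build_collection_zip(records, readme_text):
    """Return the bytes of a zip archive holding a README and the metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Documentation travels with the data so users can interpret it.
        archive.writestr("README.txt", readme_text)
        archive.writestr("metadata.json", json.dumps(records, indent=2))
    return buffer.getvalue()


# Hypothetical records and documentation, invented for illustration.
sample_records = [
    {"id": "1", "title": "Letter to the editor", "year": "1891"},
    {"id": "2", "title": "Harbor photograph", "year": "1904"},
]
sample_readme = "Sample collection. Fields: id, title, year."

zip_bytes = build_collection_zip(sample_records, sample_readme)

# Reopen the archive to confirm what a downloader would receive.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    archive_names = archive.namelist()
    restored_records = json.loads(archive.read("metadata.json"))

print(archive_names)  # ['README.txt', 'metadata.json']
```

The archive is built in memory here only to keep the sketch self-contained; writing it to a file on a static web directory would serve the same purpose for bulk download.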

Have questions, or are you interested in exploring a collections as data project? Reach out to Kiran Mohammadi-Williams at kam535@cornell.edu!
