Introduction to Collections as Data

What is collections as data?

Collections as data is the idea and practice of treating collections (groups of objects, items, or texts, typically digital or digitised) as data that people can analyse and represent using computers. The term came into common use in the digital humanities through the collaborative project Always Already Computational: Collections as Data, directed by Thomas Padilla, which “documented, iterated on, and shared current and potential approaches to developing cultural heritage collections that support computationally-driven research and teaching.” This project was followed by Collections as Data: Part to Whole, a report on the responsible implementation of collections as data practices.

To unpack the term: collections are groups of objects, items, texts, and so on. The “as” really means “is”: the collections (and their metadata) serve as data, become data, and are treated as data. Data are sets of ordered information, stored digitally, that a computer can process. In sum, collections as data means formatting groups of objects as ordered information that a computer can then analyse.

Collections as data work thus explores the potential of computational methods for analysing digitised and born-digital collections, the objects within them, and their metadata, treating all of these as datasets.
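
As a minimal sketch of what "ordered information that a computer can analyse" might look like in practice, the Python example below represents a few collection items as metadata records and counts them by format and by decade. The records, field names, and helper functions are all invented for illustration and are not drawn from any real collection or from the projects named above.

```python
from collections import Counter

# Hypothetical metadata records for three digitised items (invented for illustration).
items = [
    {"title": "Letter to a friend", "year": 1862, "format": "manuscript"},
    {"title": "Harbour at dawn", "year": 1868, "format": "photograph"},
    {"title": "Town meeting minutes", "year": 1871, "format": "manuscript"},
]


def count_by(records, key):
    """Count how many records share each value of the given metadata field."""
    return Counter(record[key] for record in records)


def decade(year):
    """Round a year down to the start of its decade, e.g. 1868 becomes 1860."""
    return year - year % 10


# Once items are structured records, simple questions become one-line computations.
format_counts = count_by(items, "format")
decade_counts = Counter(decade(item["year"]) for item in items)

print(format_counts)
print(decade_counts)
```

The same pattern would apply to a much larger set of records, for example one loaded from a CSV or JSON export, assuming the institution provides such a file.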

What does collections as data work look like?

Digital collections and exhibits
Interactive maps and visualizations
Digital databases
Scholarly websites and web archives
Processing, presenting, and interpreting metadata from collections
Computational text analysis
Computational image comparison and analysis
Much, much more!

What can collections as data entail?

Public humanities work
Creation of datasets that serve a known user need
Collaboratively sharing and documenting processes and practices beyond one’s institution
Openly publishing datasets and associated documentation
Encouraging computational use of digitised and born-digital collections
Lowering barriers to use/access
Enabling bulk download of data and optimising data access
Prioritising static directories and zipped collections
Direct contact with communities
Being guided by ongoing ethical commitments
Aiming to respect the rights and needs of content creators, collection subjects, and user communities (including through crowdsourcing, when appropriate)
Valuing interoperability
Ongoing, iterative processes
Much, much more!

Have questions or interested in exploring a collections as data project? Reach out to Kiran Mohammadi-Williams at kam535@cornell.edu!
