{"id":2304,"date":"2025-03-20T20:43:03","date_gmt":"2025-03-20T20:43:03","guid":{"rendered":"https:\/\/digitalscholarship.library.cornell.edu\/?p=2304"},"modified":"2025-07-17T17:49:20","modified_gmt":"2025-07-17T17:49:20","slug":"web-archiving-guide","status":"publish","type":"post","link":"https:\/\/digitalscholarship.library.cornell.edu\/?p=2304","title":{"rendered":"Web archiving guide"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2304\" class=\"elementor elementor-2304\" data-elementor-post-type=\"post\">\n\t\t\t\t<div class=\"elementor-element elementor-element-4270461 e-flex e-con-boxed e-con e-parent\" data-id=\"4270461\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-d181989 elementor-widget elementor-widget-text-editor\" data-id=\"d181989\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>According to the Society of American Archivists, web archiving is&#8230;<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1e1b731 elementor-blockquote--skin-quotation elementor-blockquote--align-center elementor-widget elementor-widget-blockquote\" data-id=\"1e1b731\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"blockquote.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<blockquote class=\"elementor-blockquote\">\n\t\t\t<p class=\"elementor-blockquote__content\">\n\t\t\t\tthe process of collecting, preserving, and providing enduring access to web content\t\t\t<\/p>\n\t\t\t\t\t\t\t<div class=\"e-q-footer\">\n\t\t\t\t\t\t\t\t\t\t\t<cite class=\"elementor-blockquote__author\">Dictionary, Society of American 
Archivists<\/cite>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/blockquote>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-c4e9347 e-flex e-con-boxed e-con e-parent\" data-id=\"c4e9347\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-67602e6 elementor-widget elementor-widget-text-editor\" data-id=\"67602e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWeb archiving is a process which can take many forms but most commonly involves making and storing \u201cpreserved copies of live web content collected for permanent retention and access\u201d. Practically, this means creating a copy of all of the code behind a webpage and the way that code is displayed at a very specific point in time, with the intention of being able to access that capture of the webpage as-is in the future.\n\nWeb archiving is often done by <a class=\"tooltip\"><strong>crawlers<\/strong><span class=\"tooltiptext\">A web archiving (or &#8220;capture&#8221;) operation that is conducted by an automated agent, called a crawler, a robot, or a spider. Crawls identify materials on the live web that belong in your collections, based upon your choice of seed URLs and scope. Crawl can also reference the archived content associated with the action.<\/span><\/a>, which are automated functions that scrape the web for pages related to a specific URL or collection. Crawlers (also called spiders because they crawl the web) are capable of making distinctions about what pages to preserve, but those decisions are based on human direction and input. 
For this reason, and given the sheer size and scope of the World Wide Web today, many webpages are not archived by crawlers and will naturally disappear as domains break, are blocked, and expire, as servers go down, as web hosts and builders give up their projects, and as transitions between managers create massive changes to existing sites.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9112f16 elementor-widget elementor-widget-heading\" data-id=\"9112f16\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">So why archive webpages?\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0956761 elementor-widget elementor-widget-text-editor\" data-id=\"0956761\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThings are constantly disappearing off of the Internet, and scholarly material is becoming increasingly <a class=\"tooltip\"><span class=\"tooltiptext\">originating in a computer environment<\/span><strong>born-digital<\/strong><\/a>, meaning that it was first published on the web or in digital format. Web archiving is the primary way to preserve the things on the web that you find important. If you have sources you use often on the web, resources you consult, memes you enjoy sharing, or even social media posts that you find important to you, your process, or your research, you should save them so that they do not become inaccessible if\/when they disappear off of the Internet. 
Web archiving is also a great safeguard against <a class=\"tooltip\"><span class=\"tooltiptext\">The action of examining books, plays, films, correspondence, etc., in order to identify and delete, suppress, or obscure material deemed to be obscene, blasphemous, politically unacceptable, classified information, damaging to morale, etc.; a system or organization for doing this. Sometimes also more generally: the suppression or restriction of free speech, thought, etc.<\/span><strong>censorship<\/strong><\/a>, since no matter what external forces decide to remove from public view, you\u2019ll still be able to view the content that matters to you using an archived copy.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-79272db elementor-widget elementor-widget-image\" data-id=\"79272db\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/media4.giphy.com\/media\/v1.Y2lkPTc5MGI3NjExcWZoMzhiNHo3NGpzcnp4Y3NiNW5veHFwMnBqMmdkNWVob2g3MmI3cCZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw\/l0HU2jsr6sQNaqFUs\/giphy.gif\" title=\"\" alt=\"Skeleton slowly disappearing into nothingness\" loading=\"lazy\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-01fda4d elementor-widget elementor-widget-text-editor\" data-id=\"01fda4d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Personal websites, digital portfolios, and digital humanities projects in particular are prone to dying. Since these projects may sometimes be one-off assignments for a course or digital work completed by students with more limited resources than gigantic institutions, they often become defunct. 
Archiving a copy of web-based or web-hosted projects before you officially decide to step away from them is the best way to preserve your work, either for future self-reflection or for other scholars\/users who may find it crucial to their own workflows or research.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ebdf98d elementor-widget elementor-widget-heading\" data-id=\"ebdf98d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">What is web archiving good for?\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2acf828 elementor-widget elementor-widget-text-editor\" data-id=\"2acf828\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Note that if you\u2019re looking to archive webpages <i>just<\/i> so that you can see an earlier version of your site, there are much better ways to do that, such as version control systems like Git or Mercurial. Web archiving is most useful for saving an accessible copy of webpages and websites that are not likely to be repeatedly crawled or are not already part of a dedicated archiving program. At Cornell, for example, archiving a cornell.edu site yourself is redundant, since Cornell University Library\u2019s Web Archiving Technician crawls all cornell.edu sites on a schedule using an institutional instance of the Internet Archive\u2019s paid Archive-It service. 
Archiving my personal website, my favourite blog post, or an obscure digitized copy of a crucial 19th-century text would be a great use of web archiving, since these sites and pages may not be captured by someone else\u2019s proprietary or institutional crawler.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-89dee2e elementor-widget elementor-widget-image\" data-id=\"89dee2e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"576\" height=\"384\" src=\"https:\/\/digitalscholarship.library.cornell.edu\/wp-content\/uploads\/2025\/03\/web-archive-choices.gif\" class=\"attachment-medium_large size-medium_large wp-image-2308\" alt=\"Bad choice of webpage to archive (Cornell.edu webpage) and good page to archive (personal blog)\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-319d592 elementor-widget elementor-widget-heading\" data-id=\"319d592\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Getting started<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-491b85b elementor-widget elementor-widget-heading\" data-id=\"491b85b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Terms to know\n<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c947a32 elementor-widget elementor-widget-text-editor\" data-id=\"c947a32\" data-element_type=\"widget\" 
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><a href=\"https:\/\/support.archive-it.org\/hc\/en-us\/articles\/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms\">Archive-It has a glossary of terms<\/a> that may be useful to review, but the main five that will come up in this guide are: snapshot, capture, WARC file, website, and webpage. You may be thinking that website and webpage are self-explanatory terms, but it\u2019s important to know the difference since some web archiving tools will only capture pages, while others will capture entire websites.<\/p><ul><li aria-level=\"1\"><b>Capture:<\/b> A full copy of the digital information encoded within a webpage or website that is then archived.<\/li><li aria-level=\"1\"><b>Snapshot:<\/b> A complete capture of a website or webpage\u2019s content at a specific point in time. It is as if you had taken a screenshot of the page, but interactive.<\/li><li aria-level=\"1\"><b>WARC file:<\/b> A file in the <b>w<\/b>eb <b>arc<\/b>hive format that contains one or more captures of specific URLs.<\/li><li aria-level=\"1\"><b>Website:<\/b> A collection of web resources and pages grouped in one particular place and served on the Internet by a web server.<\/li><li aria-level=\"1\"><b>Webpage:<\/b> A single HTML-based document on the web, usually part of a website and reached at its own URL (e.g. 
<a href=\"http:\/\/website.com\/web-page\/\">website.com\/web-page\/<\/a>).<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a282727 elementor-widget elementor-widget-heading\" data-id=\"a282727\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Types of web-archiving tools\n<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dbb87d4 elementor-widget elementor-widget-text-editor\" data-id=\"dbb87d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Before you can get started with archiving a page or a site, it\u2019s important to know what kinds of tools exist. While there are dozens of tools out there, most of them can be categorized using a few basic types: self-hosted and service-hosted; one-page or full-site; and\/or open-source and closed-source.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-473944d elementor-widget elementor-widget-heading\" data-id=\"473944d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Self-hosted and service-hosted\n<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f646ff2 elementor-widget elementor-widget-text-editor\" data-id=\"f646ff2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Whenever something goes on the Internet, it needs to be hosted and served by a web 
server of some sort. You can read more about the basics of web servers on our guides site. What this means is that when you make your capture of the site or page you want to archive, you\u2019ll have to store that capture somewhere and use some sort of web service to view it.<\/p><p><strong>Self-stored and self-hosted web archiving tools<\/strong> allow you to create a WARC file that you can store on your computer, which you can then serve from your computer\u2019s <a href=\"https:\/\/www.freecodecamp.org\/news\/what-is-localhost\/\">local server<\/a> (which is not connected to the Internet) or view through a browser-based WARC viewer like <a href=\"https:\/\/webrecorder.net\/replaywebpage\/\">Webrecorder\u2019s ReplayWeb.page<\/a>. The file stays on your computer and is fundamentally controlled and owned by you, but you can use a variety of methods to view the archived page or site.<\/p><p><strong>Service-hosted web archiving tools<\/strong> are full-service pieces of web-based software, meaning all you have to do is make an account, pop in URLs for websites and pages that you&#8217;d like to preserve, and wait for the software to process them. Service-hosted archiving tools often have built-in viewers, let you add captures to specific collections, and have graphical user interfaces (visual interfaces you can interact with in your browser) for accessing, adding, viewing, and editing captures.<\/p><p>So which one should you use? There are many pros and cons to both self-hosted and service-hosted web archives, so the decision you make can be informed by those positives and negatives. 
Here are just a few:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4179759 e-n-tabs-mobile elementor-widget elementor-widget-n-tabs\" data-id=\"4179759\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;horizontal_scroll&quot;:&quot;disable&quot;}\" data-widget_type=\"nested-tabs.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"e-n-tabs\" data-widget-number=\"68654937\" aria-label=\"Tabs. Open items with Enter or Space, close with Escape and navigate using the Arrow keys.\">\n\t\t\t<div class=\"e-n-tabs-heading\" role=\"tablist\">\n\t\t\t\t\t<button id=\"e-n-tab-title-686549371\" data-tab-title-id=\"e-n-tab-title-686549371\" class=\"e-n-tab-title\" aria-selected=\"true\" data-tab-index=\"1\" role=\"tab\" tabindex=\"0\" aria-controls=\"e-n-tab-content-686549371\" style=\"--n-tabs-title-order: 1;\">\n\t\t\t\t\t\t<span class=\"e-n-tab-title-text\">\n\t\t\t\tPros\t\t\t<\/span>\n\t\t<\/button>\n\t\t\t\t<button id=\"e-n-tab-title-686549372\" data-tab-title-id=\"e-n-tab-title-686549372\" class=\"e-n-tab-title\" aria-selected=\"false\" data-tab-index=\"2\" role=\"tab\" tabindex=\"-1\" aria-controls=\"e-n-tab-content-686549372\" style=\"--n-tabs-title-order: 2;\">\n\t\t\t\t\t\t<span class=\"e-n-tab-title-text\">\n\t\t\t\tCons\t\t\t<\/span>\n\t\t<\/button>\n\t\t\t\t\t<\/div>\n\t\t\t<div class=\"e-n-tabs-content\">\n\t\t\t\t<div id=\"e-n-tab-content-686549371\" role=\"tabpanel\" aria-labelledby=\"e-n-tab-title-686549371\" data-tab-index=\"1\" style=\"--n-tabs-title-order: 1;\" class=\"e-active elementor-element elementor-element-737ff2e e-con-full e-flex e-con e-child\" data-id=\"737ff2e\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t<div class=\"elementor-element elementor-element-ce458c4 e-con-full e-flex e-con e-child\" data-id=\"ce458c4\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div 
class=\"elementor-element elementor-element-5e337fc elementor-widget elementor-widget-heading\" data-id=\"5e337fc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h6 class=\"elementor-heading-title elementor-size-default\">Self-hosted<\/h6>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3d050e4 elementor-widget elementor-widget-text-editor\" data-id=\"3d050e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Storage limited only by your computer and external hard drives (well beyond 5 GB)<\/li><li>Full control over the archived page<\/li><li>Can experiment with the code of the archived page<\/li><li>Relatively stable copy of the page DOM<\/li><li>Accessible on the fly<\/li><li>Can view on a local server or choose your own viewer<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-fa5933b e-con-full e-flex e-con e-child\" data-id=\"fa5933b\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-347ccc5 elementor-widget elementor-widget-heading\" data-id=\"347ccc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h6 class=\"elementor-heading-title elementor-size-default\">Service-hosted<\/h6>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-99b5603 elementor-widget elementor-widget-text-editor\" data-id=\"99b5603\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Almost no work<\/li><li>Easily 
accessible<\/li><li>Viewer built into the platform<\/li><li>Links are easily sharable with others<\/li><li>Can download WARC and WACZ if needed<\/li><li>PDF or png conversion of snapshot<\/li><li>Automated provenance summaries<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div id=\"e-n-tab-content-686549372\" role=\"tabpanel\" aria-labelledby=\"e-n-tab-title-686549372\" data-tab-index=\"2\" style=\"--n-tabs-title-order: 2;\" class=\" elementor-element elementor-element-8b3bb7f e-con-full e-flex e-con e-child\" data-id=\"8b3bb7f\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t<div class=\"elementor-element elementor-element-7826ed2 e-con-full e-flex e-con e-child\" data-id=\"7826ed2\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-0ff2134 elementor-widget elementor-widget-heading\" data-id=\"0ff2134\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h6 class=\"elementor-heading-title elementor-size-default\">Self-hosted<\/h6>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a5085f elementor-widget elementor-widget-text-editor\" data-id=\"2a5085f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>High upfront investment of time and energy<\/li><li>Might need an external viewer<\/li><li>Requires technical knowledge (command line, programming languages)<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-8646e99 e-con-full e-flex e-con e-child\" data-id=\"8646e99\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t<div class=\"elementor-element elementor-element-80ff2a6 
elementor-widget elementor-widget-heading\" data-id=\"80ff2a6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h6 class=\"elementor-heading-title elementor-size-default\">Service-hosted<\/h6>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2bae5e6 elementor-widget elementor-widget-text-editor\" data-id=\"2bae5e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Upload limits<\/li><li>Sometimes paid features<\/li><li>Snapshots on proprietary servers (companies go out of business)<\/li><li>Storage limits<\/li><li>No access if the service\/tool server goes down<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-824c3fe elementor-widget elementor-widget-text-editor\" data-id=\"824c3fe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If you\u2019re just archiving individual webpages that are useful for your research or personal endeavours, then a service-hosted archive on an open-source platform may be your best option. 
While we can\u2019t guarantee that even big service-hosted options like the Internet Archive\u2019s Wayback Machine will remain available forever, there\u2019s a decent likelihood of their long-term stability given current institutional support.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-85e4e84 elementor-widget elementor-widget-heading\" data-id=\"85e4e84\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">One-page and full-site\n\n<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-654a37c elementor-widget elementor-widget-text-editor\" data-id=\"654a37c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The difference between one-page tools and full-site tools is effectively the difference between taking a snapshot and fully crawling a site. One-page web archiving tools will only capture the single page for the URL you give the tool. For example, if I want to archive the projects page of my personal website, I\u2019d tell my tool to capture <a href=\"https:\/\/kam535.github.io\/projects.html\">https:\/\/kam535.github.io\/projects.html<\/a>. If I wanted to archive the entire site, I would have to tell my tool to index the site or find the site\u2019s sitemap for a list of every page on the site with the base URL <a href=\"https:\/\/kam535.github.io\/\">https:\/\/kam535.github.io\/<\/a>. The tool would then crawl the site and archive every page on that list.<\/p><p>The caveat is that full-site software will almost always require a crawler. Crawlers take considerable computing resources to run and are often difficult to use if you\u2019re unfamiliar with web archiving. 
For this reason, archiving of entire websites is usually only done by institutions, memory organizations, and other larger groups and polities. Institutions that require constant captures of full websites usually use full-service software like Archive-It, which is managed by the Internet Archive. However, archiving an entire website is possible using a variety of tools that can group individually archived webpages into collections for easy access to the entire archived site. Depending on the size of the website, you can also simply archive each page of a website individually, and then access them all from a single tool.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b5b4e2 elementor-widget elementor-widget-heading\" data-id=\"5b5b4e2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Open source and closed source<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-26692f1 elementor-widget elementor-widget-text-editor\" data-id=\"26692f1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Open source refers to software whose source code (the code that makes up and runs the software) is freely available and open to the public to access, modify, distribute, etc. Open source software is (almost always) free and can be remixed into other tools and software, tailored to an individual\u2019s needs and use case, and shared with anyone you want. 
Open source software is usually published under a specific <a href=\"https:\/\/opensource.org\/licenses\">open source license<\/a> (such as MIT, GPL, or Apache 2.0) that stipulates what can be done with the software (for example, whether modified versions must be shared under the same terms).<\/p><p>Closed source refers to software whose source code is owned and copyrighted by an individual, institution, or corporate entity and that cannot be freely shared with the public, modified for personal or commercial use, or distributed openly. Closed source programs are black boxes; you can\u2019t see the mechanisms that make them work, only what you put into them and what they spit out. Closed source software is usually paid, with tiers depending on how much storage and which features you need.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-653a162 elementor-widget elementor-widget-text-editor\" data-id=\"653a162\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>At the Digital CoLab, we prioritize the use of open source software for teaching, learning, and consultations, since we aim to contribute and help others contribute to the greatest possible human good.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1736c19 elementor-widget elementor-widget-heading\" data-id=\"1736c19\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Software and tool options<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f293b16 elementor-widget elementor-widget-text-editor\" data-id=\"f293b16\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Michael Hucka, a Staff Software Engineer at Google, has compiled a gigantic <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1FqxwaZnIhhQ7jDCC-W64NMRf5rDeh2Shx3u01MsBmTQ\/edit?gid=0#gid=0\">web archiving software comparison list<\/a> assessing a broad range of web archiving tools, both open-source and closed-source. Given that this list may be impenetrable to those new to web archiving, we have a few suggestions and recommendations for tools and software depending on your needs.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b4a1835 elementor-widget elementor-widget-text-editor\" data-id=\"b4a1835\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Note that because of the Digital CoLab\u2019s values, all of the tools listed will be open-source (and thus free). Paying for web archiving software can be useful for institutions, but individuals should have access to high quality tools and software that can help them contribute to the public good without accruing a financial burden. 
Additionally, open-source tools invite innovation; developments and improvements to open-source software are common and encouraged, meaning that a useful plugin or even a better version of what you\u2019re using may crop up in the future (and who knows, maybe even you could build one!).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-147dd6f elementor-widget elementor-widget-heading\" data-id=\"147dd6f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Full-service software<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-37cde22 e-grid e-con-boxed e-con e-parent\" data-id=\"37cde22\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-e84d580 elementor-flip-box--effect-flip elementor-flip-box--direction-up elementor-widget elementor-widget-flip-box\" data-id=\"e84d580\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"flip-box.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-flip-box\" tabindex=\"0\">\n\t\t\t<div class=\"elementor-flip-box__layer elementor-flip-box__front\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__image\">\n\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/seeklogo.com\/images\/P\/perma-cc-logo-C47ADCFF7E-seeklogo.com.png\" title=\"\" alt=\"\" loading=\"lazy\" \/>\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<h6 
class=\"elementor-flip-box__layer__title\">\n\t\t\t\t\t\t\t\tPerma.cc\t\t\t\t\t\t\t<\/h6>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\t\tPerma.cc is a service that allows you to paste URLs, one-by-one or in bulk, to generate a permalink that you can use to access a snapshot of a webpage. \t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t<a class=\"elementor-flip-box__layer elementor-flip-box__back\" href=\"https:\/\/guides.library.cornell.edu\/perma\">\n\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\tPerma.cc is open-source and free, but requires you to create an account to start archiving pages. Perma.cc allows you to export your captures as WARC or WACZ files, sort your links into folders\/collections, and copy the unique permalink for your snapshot to share across the web or to use in your citations. 
Cornell University offers free Perma.cc accounts that will give you unlimited storage.\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t<\/a>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9de1cc2 elementor-flip-box--effect-flip elementor-flip-box--direction-up elementor-widget elementor-widget-flip-box\" data-id=\"9de1cc2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"flip-box.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-flip-box\" tabindex=\"0\">\n\t\t\t<div class=\"elementor-flip-box__layer elementor-flip-box__front\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<h6 class=\"elementor-flip-box__layer__title\">\n\t\t\t\t\t\t\t\t\ud83c\udf32 Conifer\t\t\t\t\t\t\t<\/h6>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\t\tFormerly known as Webrecorder.io, Conifer is similar to Perma.cc in that you paste URLs to generate snapshots for a page. \t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t<a class=\"elementor-flip-box__layer elementor-flip-box__back\" href=\"https:\/\/conifer.rhizome.org\/\">\n\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\tConifer lets you start recording a capture where you can browse through multiple pages on a single website, which are then individually snapshotted as part of the capture (you press a stop button when you\u2019re done). Conifer then organizes captures into collections, which can be downloaded as a WARC file. Conifer is open-source and free, but requires you to create an account. 
Conifer is the easiest full-service option for saving an entire website.\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t<\/a>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6a675ed elementor-flip-box--effect-flip elementor-flip-box--direction-up elementor-widget elementor-widget-flip-box\" data-id=\"6a675ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"flip-box.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-flip-box\" tabindex=\"0\">\n\t\t\t<div class=\"elementor-flip-box__layer elementor-flip-box__front\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__image\">\n\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/0\/01\/Wayback_Machine_logo_2010.svg\" title=\"\" alt=\"\" loading=\"lazy\" \/>\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<h6 class=\"elementor-flip-box__layer__title\">\n\t\t\t\t\t\t\t\tThe Wayback Machine\t\t\t\t\t\t\t<\/h6>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\t\tThe Wayback Machine is managed by Internet Archive and is perhaps the most popular web archiving tool. 
\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t<a class=\"elementor-flip-box__layer elementor-flip-box__back\" href=\"https:\/\/web.archive.org\/\">\n\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\tWhen you paste a URL into the search bar, the Wayback Machine will search for existing snapshots of that URL and display a calendar that shows when each existing snapshot, if any, was taken. If no snapshot exists yet, the Wayback Machine will prompt you to archive the page. The Wayback Machine is free and open-source; an account is not required to archive a page, but is required to save a screenshot to your personal web archive and to get a WACZ file of the capture.\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t<\/a>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-06ce664 e-flex e-con-boxed e-con e-parent\" data-id=\"06ce664\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-872357a elementor-widget elementor-widget-heading\" data-id=\"872357a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Self-managed<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-a25d1b0 e-grid e-con-boxed e-con e-parent\" data-id=\"a25d1b0\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-c2a669f 
elementor-flip-box--effect-flip elementor-flip-box--direction-up elementor-widget elementor-widget-flip-box\" data-id=\"c2a669f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"flip-box.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-flip-box\" tabindex=\"0\">\n\t\t\t<div class=\"elementor-flip-box__layer elementor-flip-box__front\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<h6 class=\"elementor-flip-box__layer__title\">\n\t\t\t\t\t\t\t\t\ud83c\udf68 Scoop\t\t\t\t\t\t\t<\/h6>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\t\tScoop is an easy-to-use, high-fidelity, one-page web archiving library run through your computer\u2019s command line.\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t<a class=\"elementor-flip-box__layer elementor-flip-box__back\" href=\"https:\/\/github.com\/harvard-lil\/scoop\">\n\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\tScoop allows you to run a command to \u201cscoop\u201d a browser-based snapshot of a web page in a WARC or WACZ file, which you can then save directly to a folder on your computer. 
Scoop is easy to install, open-source, and free.\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t<\/a>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f3c038 elementor-flip-box--effect-flip elementor-flip-box--direction-up elementor-widget elementor-widget-flip-box\" data-id=\"2f3c038\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"flip-box.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-flip-box\" tabindex=\"0\">\n\t\t\t<div class=\"elementor-flip-box__layer elementor-flip-box__front\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__image\">\n\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/avatars.githubusercontent.com\/u\/74894248?s=280&#038;v=4\" title=\"\" alt=\"\" loading=\"lazy\" \/>\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<h6 class=\"elementor-flip-box__layer__title\">\n\t\t\t\t\t\t\t\tArchiveBox\t\t\t\t\t\t\t<\/h6>\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\t\tArchiveBox is a self-hosted application that you can use to archive URLs through the command line or through a browser-based GUI run through your local host.\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t\t<a class=\"elementor-flip-box__layer elementor-flip-box__back\" href=\"https:\/\/www.google.com\/url?sa=t&#038;source=web&#038;rct=j&#038;opi=89978449&#038;url=http:\/\/archivebox.io\/\">\n\t\t\t<div class=\"elementor-flip-box__layer__overlay\">\n\t\t\t\t<div class=\"elementor-flip-box__layer__inner\">\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"elementor-flip-box__layer__description\">\n\t\t\t\t\t\t\tYou can also use node.js to host the instance somewhere else. 
ArchiveBox\u2019s graphical user interface makes it easier to archive links to your local computer without having to rely exclusively on the command line. ArchiveBox also has great documentation! ArchiveBox is open-source and free, and you can download it using pip or git.\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t<\/a>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-b54d8c6 e-flex e-con-boxed e-con e-parent\" data-id=\"b54d8c6\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-445f0bb elementor-widget elementor-widget-text-editor\" data-id=\"445f0bb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>With all of the above self-managed software, you can use your local server or a viewer like Replay Webpage to look at your WARC files.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cd9a28 elementor-widget elementor-widget-heading\" data-id=\"5cd9a28\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Guide for using self-managed software<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c81ec42 elementor-widget elementor-widget-heading\" data-id=\"c81ec42\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 1. 
Learn to use the command line<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5cf9736 elementor-widget elementor-widget-text-editor\" data-id=\"5cf9736\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If you\u2019d like to have a self-managed web archive, you will need to learn how to use your computer\u2019s command line effectively. Almost all self-hosted web archiving software uses the command line, in part because this is where you\u2019ll connect to your local server. Programming Historian has approachable lessons in English for using the <a href=\"https:\/\/programminghistorian.org\/en\/lessons\/intro-to-bash\">Bash Command Line<\/a> (macOS) or <a href=\"https:\/\/programminghistorian.org\/en\/lessons\/intro-to-powershell\">PowerShell<\/a> (Windows).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1fd86af elementor-widget elementor-widget-heading\" data-id=\"1fd86af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 2. Decide what software you want to use\n<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c602c9e elementor-widget elementor-widget-text-editor\" data-id=\"c602c9e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Deciding what software you\u2019d like to use can be difficult! 
Luckily, there are plenty of lists like <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1FqxwaZnIhhQ7jDCC-W64NMRf5rDeh2Shx3u01MsBmTQ\/edit?gid=0#gid=0\">this web archiving software comparison list<\/a> and this <a href=\"https:\/\/github.com\/iipc\/awesome-web-archiving?tab=readme-ov-file\">awesome web archiving list<\/a>, which will give you basic information about the features of each piece of software or tool. If you don\u2019t understand a term that\u2019s used to describe a piece of software, make sure you look it up! The difference between software that offers a CLI (command-line interface) and software that offers a GUI (graphical user interface) could be the difference between having a usable tool and having an archive you don\u2019t enjoy working with. Looking for something specific or not sure what you\u2019re looking for? Feel free to reach out to the Digital CoLab for help.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0e2f63b elementor-widget elementor-widget-heading\" data-id=\"0e2f63b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 3. Read the software documentation<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4c6d798 elementor-widget elementor-widget-text-editor\" data-id=\"4c6d798\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>After you\u2019ve learned to use the command line and have decided on a tool, you\u2019ll want to navigate to the documentation page for the software you\u2019ve chosen. Documentation includes information for every step and feature of a piece of software. 
Good documentation should give you a list of dependencies, which are pieces of software or tools that you need to install on your computer before you can use the software. The best place to start in any piece of documentation is usually the <b>Quickstart<\/b>. Quickstart sections list the basic steps for getting the software up and running, followed by the steps for basic tasks (like capturing a single webpage and saving it into a folder).\u00a0<\/p><p>For example, ArchiveBox\u2019s documentation has a <a href=\"https:\/\/github.com\/ArchiveBox\/ArchiveBox#quickstart\"><b>Quickstart<\/b><\/a> section that guides you through three steps for getting started with archiving pages. You\u2019ll want to make sure you\u2019re following the steps that match your operating system (macOS, Linux, Windows, etc.). If you\u2019re not sure what your operating system is, you can find it in the system settings for your computer.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-25058ac elementor-widget elementor-widget-heading\" data-id=\"25058ac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 4. Follow the documentation and start archiving!<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1015ba0 elementor-widget elementor-widget-text-editor\" data-id=\"1015ba0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Once you\u2019ve followed the documentation, you should be all set up to start archiving pages! 
If you want to further control the methods of archiving, the output (snapshot, WARC file, etc.), and any other settings (such as setting up a browser-based interface to access your links), you should peruse the documentation for your software for more advanced guidance and information.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1e200d6 elementor-widget elementor-widget-heading\" data-id=\"1e200d6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 5. (Optional) Learn proper file management<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ee09c1 elementor-widget elementor-widget-text-editor\" data-id=\"7ee09c1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>You may assume that self-hosted or self-captured options are more stable, since they&#8217;re not living on someone else&#8217;s services, but declining literacy in basic file management means that this isn&#8217;t necessarily true. If you delete or misplace your WARC file, your copy is gone.<\/p><p>File management is a skill that depends on understanding how your computer organizes files. 
One of the best ways to learn how to set up a file management system is to watch other people do it and try to find what works for you.<\/p><p>Some file management guides for Mac:<\/p><ul><li aria-level=\"1\"><a href=\"https:\/\/www.youtube.com\/watch?v=3TAEC-1YUZw\">Understanding the File And Folder Structure Of Your Mac<\/a> by macmostvideo<\/li><li aria-level=\"1\"><a href=\"https:\/\/www.youtube.com\/watch?v=Ar19Bqmaprc&amp;pp=ygUTZmlsZSBtYW5hZ2VtZW50IG1hYw%3D%3D\">MacOS File and Folder Structure Tutorial &#8211; The Basics<\/a> by Craig Neidel<\/li><\/ul><p>Some file management guides for Windows:<\/p><ul><li aria-level=\"1\"><a href=\"https:\/\/www.youtube.com\/watch?v=gDhhXI7hGoI\">Windows 11 &#8211; Files &amp; Folders for Beginners &#8211; Get Organized &#8211; Get Control of Your Files &amp; Folders<\/a> by Your Windows Guru<\/li><li aria-level=\"1\"><a href=\"https:\/\/www.youtube.com\/watch?v=Mlb09xsIDLc\">Windows 10 &#8211; File Explorer Management Tutorial &#8211; How to Organize Files and Folders &#8211; Folder Manager<\/a> by Professor Adam Morgan<ul><li aria-level=\"2\">Note that Files is a third-party app for Windows that is intended to optimize file management. 
See <a href=\"https:\/\/www.youtube.com\/watch?v=qI4cVPN3f1U&amp;pp=ygURZmlsZXMgYXBwIHdpbmRvd3M%3D\">The File Explorer Replacement &#8211; The Files App for Windows!<\/a> by Productive Tech.<\/li><\/ul><\/li><\/ul><p>Some file management guides for Linux:<\/p><ul><li aria-level=\"1\"><a href=\"https:\/\/www.youtube.com\/watch?v=JPUGuKqbqwo\">File Management in Linux<\/a> by Peter Kay<\/li><\/ul><p>You can find more guides on YouTube; just make sure that you\u2019re searching for a guide that addresses your operating system and operating system version, since operating systems are updated somewhat regularly by Apple and Microsoft.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d78af63 elementor-widget elementor-widget-heading\" data-id=\"d78af63\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Step 6. (Optional) View your archive using an external viewer<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c45673c elementor-widget elementor-widget-text-editor\" data-id=\"c45673c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Depending on what software you\u2019re using, you may have to use an external viewer to look at your archived pages, especially if you have a folder full of WARC files. If your software produces HTML files, you should be able to open those files in any browser on your computer by double-clicking the file, or by right-clicking it and choosing \u201cOpen\u201d.<\/p><p>The main browser-based viewer for WARC and WACZ files is <a href=\"https:\/\/replayweb.page\/\">Replay Webpage<\/a>. 
Replay Webpage lets you upload WARC and WACZ files and view their contents in your browser. You can also install and use a replay tool like <a href=\"https:\/\/netpreserve.org\/web-archiving\/playback\/\">pywb, SolrWayback, or OpenWayback<\/a> and use the command line to serve them to a browser.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>According to the Society of American Archivists, web archiving is&#8230; the process of collecting, preserving, and providing enduring access to web content Dictionary, Society of American Archivists Web archiving is a process which can take many forms but most commonly involves making and storing \u201cpreserved copies of live web content collected for permanent retention and [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[19],"tags":[],"class_list":["post-2304","post","type-post","status-publish","format-standard","hentry","category-platform-guides"],"_links":{"self":[{"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/posts\/2304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2304"}],"version-history":[{"count":22,"href":"
https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/posts\/2304\/revisions"}],"predecessor-version":[{"id":2550,"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=\/wp\/v2\/posts\/2304\/revisions\/2550"}],"wp:attachment":[{"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2304"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/digitalscholarship.library.cornell.edu\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}