Presentation by Jason Scott of Archive Team at Digital Preservation 2013 held in Alexandria, VA.
Tagged with “archive” (12)
Dick speaks with Brewster Kahle, who is collecting copies of all the books he can from around the world.
Most of us think nothing of putting our lives in the cloud: photos on Flickr, videos on YouTube, most everything on Facebook. But what happens when those services abruptly go away, taking all of our collective contributions with them? Well, Jason Scott operates on the assumption that everything online will one day disappear. He explains to Bob why he and the Archive Team are dedicated to saving user-generated content for posterity.
GUESTS: Jason Scott
HOSTED BY: Bob Garfield
April 2011: Friendster announces that it will delete its entire database of user photos, posts, and profiles. The news is met with an outcry from long-lost members who are not ready to let go of that part of their digital lives. Like Geocities before it, Friendster faces a distinctly contemporary dilemma: what happens when you’re responsible for thousands of digital memories?
With so much of our lives experienced digitally, the stories we tell and the lives we construct online have become increasingly tied to our real-life selves. Our ‘digital self’ has a memory, one made up of wall posts, status updates, photos, and blogs (or, more precisely, data). What happens when these online artifacts are deleted or lost? How much value do we assign to these digital memories, and what does it mean to lose them forever?
This not only affects us as individuals, but also has ramifications for understanding and preserving our current cultural and historical moment. Future generations will only have the digital memories we preserve to learn about us; what will archaeologists say when they find a world without Facebook? With such a disposable way of documenting our lives, have social networks set us up for cultural extinction?
Using Geocities and Friendster as case studies, this panel will explore the issues and possible solutions to the loss of digital memory on both a personal and cultural level.
Alexis Rossi, Web Collections Mgr, Internet Archive
Alexis is on her second tour of duty at Internet Archive, working on a program to archive the entire Internet and thinking about questions like "what does ‘the entire Internet’ mean?" and "do we really want it ALL?" Alexis currently manages Internet Archive collections work for every type of media (audio, video, web, texts), and runs the Wayback Machine project. Alexis previously managed the Open Library project from 2006-2008.
Alexis has been working with Internet content since 1996 when she discovered that being picky about words in books was good training for being picky about data on computers. She spent several years managing news content at ClariNet (the first online news aggregator), worked as the Editorial Director at Alexa Internet, and as Product Manager at Mixercast. Alexis has a Masters of Library and Information Science, concentrating on web technologies and interfaces, and enjoys making jewelry, dancing, costuming, and baking Cookie Smackdown-winning cookies.
Brian Fitzpatrick, Engineering Mgr, Google Data Liberation Front
Brian Fitzpatrick started Google’s Chicago engineering office in 2005, and currently leads Google’s Transparency Engineering team, which uses data to help protect free expression and free speech on the web. He also founded and leads Google’s Data Liberation Front, a team that systematically works to make it easy for users to move their data both to and from Google (e.g. via Google Takeout). He serves as both thought leader and internal advisor for Google’s open data efforts and has previously led the Google Code and Google Affiliate Network teams.
Prior to joining Google, Brian was a senior software engineer on the version control team at CollabNet, working on Subversion, cvs2svn, and CVS. He has also worked at Apple Computer as a senior engineer in their professional services division, developing both client and web applications for Apple’s largest corporate customers. Brian has been an active open source contributor for over thirteen years. After years of writing small open source programs and bugfixes, he became a core Subversion developer in 2000, and then the lead developer of the cvs2svn utility. He was nominated as a member of the Apache Software Foundation in 2002 and spent two years as the ASF’s VP of Public Relations. He is also a member of the Open Web Foundation. Brian has written numerous articles and given many presentations on a wide variety of subjects from open data to version control to software development, including co-writing "Version Control with Subversion" (now in its second edition) as well as chapters for "Unix in a Nutshell" and "Linux in a Nutshell."
Brian has an A.B. in Classics from Loyola University Chicago with a major in Latin, a minor in Greek, and a concentration in Fine Arts and Ceramics. Despite growing up in New Orleans and working for Silicon Valley companies for most of his career, he decided years ago that Chicago was his home and stubbornly refuses to move to California.
Dana Herlihey, Production Coord, Community Mgr, Stitch Media Inc
A lover of all things digital, Dana Herlihey has been working in new media since she was 15 years old, co-pioneering what was Canada’s first online entertainment magazine ‘for teens by teens’. Following an adolescence filled with red carpet interviews, she attended McMaster University, earning a combined honors degree in Multimedia and Cultural Studies. She later spent a year in Geneva, Switzerland working as a Webmaster and digital communications assistant for the Ecumenical Advocacy Alliance.
As Stitch Media’s Production Coordinator, she has managed large interactive teams for projects such as Redress Remix and Showcase’s Drunk and On Drugs: Happy Funtime Hour. She has also led social media campaigns for Stitch Media, recently winning a 2011 Digi Award for Best in Digital Advertising (Drunk and On Drugs: Happy Funtime Hour).
Duncan Smith, Programmer-Archivist, Archive Team
I’ve spoken previously about international toll-free telephone number routing and about the history of public works in Seattle. Now I speak about how we preserve history when those to whom we entrust it show every sign of having abdicated that responsibility.
For over 20 years the web has provided a continuous deluge of cultural production. Digital artifacts such as websites, images, and videos have much to communicate about our social and cultural evolution, and yet their messages or moments can be fleeting or quickly lost. Both the accessibility and the longevity of digital content are subject to a wide range of risks, from technological obsolescence to outright deletion by their creator or host. So what is being done to preserve these cultural objects for the long term? Approaching web content from a cultural and artistic perspective, this panel will convene leading writers, archivists, thinkers, and technologists to discuss the questions, challenges, and imperatives involved in preserving the creative culture of the web. We’ll cover topics like "what is the long-term significance of a website, and why would it be worth preserving?", "should web sites and artifacts be treated like works of art or architecture?", and "how do we go about archiving digital content to ensure its accessibility and longevity?". Example initiatives to be discussed include the Archive Team’s various projects (such as the Geocities torrent), the Internet Archive’s Wayback Machine, Internet Archeology, and the Rhizome ArtBase. This panel will be presented by Rhizome, an organization dedicated to the creation, presentation, preservation, and critique of emerging artistic practices that engage technology.
A few days ago, hundreds of thousands of seeds from around the world arrived at an underground storage vault on a remote Arctic island. That vault holds a growing collection of seeds from all the different kinds of crops that humans around the world grow for food.
Humanity’s agricultural legacy is on a par with any of our great cultural legacies, Richardson said, but preserving it is not just a matter of honoring the history and richness of our most fundamental civilization-enabling technology. For the health of future crops and livestock we need the deep genetic reservoir of all those millennia of sophisticated breeding. A million people died in the Irish Potato Famine because the whole nation depended on just two varieties of potato. In Peru, where potatoes originally came from, Richardson visited a field at 14,000 feet where 400 varieties of potato (with names like “Ashes of the Soul” and “Puma Paw”) are grown in just two acres. The local 1,300 varieties of potato are managed by a “Guardian of the Potatoes,” whose job it is in the community to know the story and uses of all the potatoes.
The accumulated wisdom in the crops and livestock is profound. We’ve been breeding cattle for 10,000 years, goats for 9,000 years, dogs for 12,000 years, chickens for 8,000 years, llamas for 6,500 years, horses for 6,000 years, camels for 4,000 years. All those millennia we have been in deep partnership with the animals. All of our staple foods are ancient. Wheat has been bred for 11,000 years, corn for 8,000 years, rice for 8,000 years, potatoes for 7,000 years, soybeans for 5,000 years.
“For 9,900 years,” Richardson said, “we’ve been building up variety in domesticated crops and livestock, this whole wealth of specific solutions to specific problems. For the last 100 years we’ve been throwing it away.” 95% is gone. In the US in 1903 there were 497 varieties of lettuce; by 1983 there were only 36 varieties. (Also changed from 1903 to 1983: sweet corn from 307 varieties to 13; peas from 408 to 25; tomatoes from 408 to 79; cabbage from 544 to 28.) Seed banks have been one way to slow the rate of loss. The famous seed vault at Svalbard serves as backup for some 1,300 seed banks around the world. The great limitation is that seeds don’t remain viable for long: they have to be grown out every 7 to 20 years, and the new seeds returned to storage.
Even with living heirlooms, the rule is Use It Or Lose It. Devotees of exotic cattle say “You have to eat them to save them.” With dramatic photos Richardson compared the livestock shows in Wales with the livestock markets in Ethiopia. You see children adoring the young animals and breeders obsessing over details of excellence and uniqueness. “One guy says, ‘You see that sheep with the heart-shaped spot on his left shoulder? I’ll bet you I can move it to his rump in four generations.’” There’s a sheep called the North Ronaldsay that is bred to live solely on seaweed on the coast. Ethiopia has some specialists, like the Sheko cattle that are resistant to tsetse flies, but unlike in Europe, most of their breeds have to be generalists capable of providing meat, milk, labor (such as pulling plows), and warmth in the winter.
Helping preserve agricultural biodiversity is open to anyone. The Seed Savers Exchange in Decorah, Iowa, has 13,000 members. Their catalog is a cornucopia of heirloom garden delights, and members learn how to produce and store their own seeds and then share them. “It’s a wonderful example of citizens participating in the process.” And we can always acquire a new taste for old foods. Teff! Quinoa! Amaranth! Randall Lineback cows! You have to eat them to save them.
On tonight’s show, Dickturnip doesn’t get spit on by a llama, Peann takes a photo of Jupiter, and we talk to technology historian and documentary creator Jason Scott.
Universal access to all knowledge, Kahle declared, will be one of humanity’s greatest achievements. We are already well on the way. "We’re building the Library of Alexandria, version 2. We can one-up the Greeks!"
Start with what the ancient library had: books. The Internet Library already has 3 million books digitized. With its Scribe Book Scanner robots (29 of them around the world), it is churning out a thousand digitized books a day in every handy ebook format, including robot-audio for the blind and dyslexic. Even modern, heavily copyrighted books are being made available for free as lending-library ebooks you can borrow from physical libraries: 100,000 such books so far. (Kahle announced that every citizen of California is now eligible to borrow online from the Oakland Library’s "ePort.")
As for music, Kahle noted that the 2 to 3 million records ever made are intensely litigated, so the Internet Archive offered music makers free unlimited storage of their works forever, and the music poured in. The Archive’s audio collection has 100,000 concerts so far (including all of the Grateful Dead) and a million recordings, with three new bands uploading every day.
Moving images. The 150,000 commercial movies ever made are tightly controlled, but 2 million other films are readily available and fascinating, and 600,000 of them are accessible in the Archive already. In the year 2000, without asking anyone’s permission, the Internet Archive started recording 20 channels of TV all day, every day. When 9/11 happened, they were able to assemble an online archive of that whole week’s TV news coverage from around the world ("TV comes with a point of view!") and make it available on Oct. 11, 2001, just a month after the event.
The Web itself. When the Internet Archive began in 1996, there were just 30 million web pages. Now the Wayback Machine copies every page of every website every two months and makes them time-searchable from its 6-petabyte database of 150 billion pages. It has 500,000 users a day making 6,000 queries a second.
"What is the Library of Alexandria most famous for?" Kahle asked. "For burning! It’s all gone!" To survive, digital archives have to be used and loved, with every byte migrated forward onto new media every five years. For backup, the whole Internet Archive is mirrored at the new Bibliotheca Alexandrina in Egypt and in Amsterdam. ("So our earthquake zone archive is backed up in the turbulent Mideast and a flood zone. I won’t sleep well until there are five or six backup sites.")
Speaking of institutional longevity, Kahle noted during the Q&A that nonprofits demonstrably live much longer than businesses. It might be because they have softer edges, he surmised, or because they’re free of the grow-or-die demands of commercial competition. Whatever the cause, they are proliferating.
This is a collection of Geocities data downloaded by a bunch of people who call themselves ARCHIVE TEAM, who began scraping the Yahoo! Geocities site during a six-month period in 2009, before Yahoo! shut down geocities.com on October 26th, 2009.
At the time Yahoo! purchased it, Geocities was the THIRD most popular website on the Internet. Even by the time of its shutdown, it was in the top 250. We don’t have complete, rock-solid knowledge of why it was shut down, but all signs point to Yahoo! trying to get back to basics (like, uh, having a huge audience?), and Geocities magically didn’t fall into this new "focus" and lacked any internal cheerleader to make it last through meetings.
Yahoo! succeeded in destroying the greatest amount of history in the shortest amount of time, certainly on purpose, in living memory. Millions of files and user accounts, all gone.