Metadata is a Love Note to the Future

A talk given at Confab 2015

Published in: Marketing, Technology

  1. Rachel Lovinger @rlovinger Confab, 22 May, 2015 Image via Bond
  2. 2 ©2015 All rights reserved. • Experience Director, Content Strategy; Razorfish New York • Co-editor of scatter/gather, a content strategy blog: http://scattergather.razorfish.com • Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010): http://nimble.razorfish.com • Twitter: @rlovinger
  3. 4
  4. 5
  5. 6
  6. 7
  7. 8
  8. 9
  9. 10
  10. 11 is HARDCORE
  11. 12 2006 2009 2008 2012 2011 2010
  12. 13 Metadata = Context. Context enables Connections. How does one convey that in a concise and powerful way?
  13. 14 Photo by Jesse Chan-Norris
  14. Metadata Is a Love Note to the Future
  15. 16 Tweet and photo by Erin Kissane, Tumblr by Austin Kleon 429 notes 82 retweets
  16. 17 Photo and shirt by Sarah
  17. 18 Photo by Rachel Lovinger
  18. 19 Content Strategy for Mobile by Karen McGrane
  19. 21 • Nearly 60,000 files archived • Mostly from 1980-1995 • Collected and curated since 1998 • Almost no metadata Textfiles.com
  20. 22 Who needs a database?
  21. 23 Metadata Skeptic transformed into… Metadata Warrior Photos by Jason Scott and Rachel Lovinger
  22. 24 Photo by Rachel Lovinger
  23. 25 • Me? Photo by Rachel Lovinger
  24. ENTERTAINMENT WEEKLY: Metadata for Journalism Products
  25. 27 ~3 years of online content, ~10 years of magazine content
  26. 28 Imported from text files to CMS
  27. 29 Semi-structured information allowed us to map the files to content types and site sections, and add some metadata (author, published date, keywords, etc.). 10 years x 50 issues per year x 100 files per issue (approx.) = 50,000 estimated articles
  28. 30 Once in the CMS, we could add photos, links, formatting, etc.
  29. 31 For the content already in the CMS, keywords had been manually typed in by authors. • 6790 “different” keywords • Removed 12% during clean up - Typos - Redundant - Not Useful
  30. 33 • Star Wars: Episode I -- The Phantom Menace • Episode 1 • Episode I • Phantom Menace • Star Wars Episode I The Phantom Menace • Star Wars Episode I: The Phantom Menace • Star Wars prequel • Star Wars: Episode 1 -- The Phantom Menace • Star Wars: Episode i -- the Phantom Menace • Star Wars: Episode I: The Phantom Menace • Star Wars: Episode I--The Phantom Menace • Star Wars: Episode I--The Phantom Menance • Star Wars: Episode One -- The Phantom Menace • Star Wars: The Phantom Menace • Star Wars: The Phantom Menace -- Episode I • The Phantom Menace • The Phanton Menace
  31. 34 • TAFKAP?
  32. 35 • TAFKAP? • The Artist • Artist Formerly Known as Prince • The Artist Formerly Known As Prince • The Artist formerly known as Prince • the Artist Formerly Known as Prince • The Artist Formerly Known as Prince (PKA)
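The keyword cleanup described on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process Entertainment Weekly actually used: the `normalize` helper, the `ALIASES` table, and the choice of canonical spelling are all hypothetical. Normalization folds away differences in case, punctuation, and spacing; typos and alternate titles still need a hand-maintained alias table.

```python
import re

# Assumed canonical spelling, picked from the variants listed on the slide.
CANONICAL = "Star Wars: Episode I -- The Phantom Menace"


def normalize(keyword: str) -> str:
    """Lowercase the keyword and turn every run of non-alphanumeric
    characters into a single space (ASCII only, for simplicity)."""
    return re.sub(r"[^a-z0-9]+", " ", keyword.lower()).strip()


# Hypothetical alias table: normalized variant -> canonical term.
# Entries cover what normalization alone cannot fix (typos, short titles).
ALIASES = {
    "star wars episode i the phantom menace": CANONICAL,
    "star wars episode 1 the phantom menace": CANONICAL,
    "star wars episode one the phantom menace": CANONICAL,
    "star wars episode i the phantom menance": CANONICAL,  # typo variant
    "the phantom menace": CANONICAL,
    "phantom menace": CANONICAL,
    "the phanton menace": CANONICAL,  # typo variant
}


def canonicalize(keyword: str):
    """Return the canonical term, or None if the keyword is unknown
    and needs a human decision."""
    return ALIASES.get(normalize(keyword))
```

Several of the variants on the slide, such as "Star Wars: Episode i -- the Phantom Menace" and "Star Wars: Episode I--The Phantom Menace", collapse to the same key through normalization alone. Ambiguous entries like "Episode I" are deliberately left out of the table, since they cannot be mapped without human judgment.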
  33. 37 • The magazine was published once a week • The website published new articles several times a day • Plus: Over 50,000 past articles! • How could we better use all that content?
  34. 38 If you like James Bond, we wanted it to be easy for you to discover everything we had: Cover Story, Interview, Photo Gallery, etc.
  35. 39 Entertainment Weekly Journalism IMDb-like Information
  36. 40
  37. 41 We put our controlled vocabulary into categories, to make them more distinct and meaningful. For example: • Book > Product > Harry Potter and the Goblet of Fire • Movie > Product > Harry Potter and the Goblet of Fire • Person > Individual > Daniel Radcliffe • Person > Individual > J.K. Rowling
  38. 42 Capsule Movie Review, Preview, Movie Review, DVD Review
  39. 43 • Relationships defined for each media type • Managed separately from the article content • The full set of metadata was available to all articles
  40. 44 • Standard relationships • For example, for Movie: - Lead Performers - Director - Writer - Release Date - EW Grade - Etc. • Select a related category for each relationship, as applicable • Some allow multiple values
  41. 45 • Authors just selected the primary category • Related metadata pulled in automatically • Updates appeared on all articles *Metadata categories and relationships were managed by a dedicated data librarian
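The category and relationship model on the slides above could look roughly like this in code. It is a sketch under assumptions, not EW's actual system: the class names, field names, the `RELATIONSHIPS` store, and the "Author" relationship for books are hypothetical. The point it illustrates is that an article stores only its primary category, while related metadata lives in one shared place and is looked up when needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """One controlled-vocabulary term, e.g. Movie > Product > <title>."""
    kind: str   # "Movie", "Book", "Person", ...
    group: str  # "Product", "Individual", ...
    name: str

    def path(self) -> str:
        return f"{self.kind} > {self.group} > {self.name}"


goblet_movie = Category("Movie", "Product", "Harry Potter and the Goblet of Fire")
goblet_book = Category("Book", "Product", "Harry Potter and the Goblet of Fire")
radcliffe = Category("Person", "Individual", "Daniel Radcliffe")
rowling = Category("Person", "Individual", "J.K. Rowling")

# Relationships are managed separately from article content.
# Each relationship may hold multiple values.
RELATIONSHIPS: dict[Category, dict[str, list[Category]]] = {
    goblet_movie: {"Lead Performers": [radcliffe]},
    goblet_book: {"Author": [rowling]},  # "Author" is an assumed relationship name
}


@dataclass
class Article:
    title: str
    primary: Category  # the only metadata an author has to select


def related_metadata(article: Article) -> dict[str, list[Category]]:
    """Pull in the shared relationships for the article's primary category."""
    return RELATIONSHIPS.get(article.primary, {})


# Two hypothetical articles about the same movie share one metadata record.
review = Article("Movie Review", goblet_movie)
preview = Article("Preview", goblet_movie)
```

Because both articles look up the same record, an update made once in `RELATIONSHIPS` shows up on every article tagged with that category. The book and the movie share a title but remain distinct terms, because the category path is part of their identity.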
  42. 46
  43. 47 • “Best Results” linked directly to an aggregated page based on the category. • For example: - “Cats & Dogs” vs. “The Truth About Cats & Dogs” - The Green Mile (Movie) vs. The Green Mile (Book)
  44. 49 • Wal-Mart sold gallon jars of Vlasic pickles for $2.97. • A popular item – priced so low it nearly put Vlasic out of business. • By achieving their goals, they put themselves in a position they might not survive. See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
  45. 50 • We wanted people to discover older content, and they did! • By 2006, we had 16 years of magazine and web content. • Other Time Inc. publications were interested in using our categorization system, too.
  46. 51 Not well-suited for our expensive and frequent database calls.
  47. 52 Our webservers were optimized to serve up the latest “issue” of content. 40% of Time Inc.’s database calls, only 25% of the total traffic
  48. 53 A 2007 redesign removed the “third column” entirely.
  49. 54 The creator of Freebase (a semi-semantic UGC site for structured content, now read-only) said EW.com was way ahead of its time.
  50. The making of a METADATA WARRIOR
  51. 57 Who needs a database?
  52. 58 “The hardest part of [recording] history is to be there when it happens.” Photo by Rachel Lovinger
  53. 59
  54. 60 • An informal post on August 4th • Notification sent out September 30th • Shut down October 31st
  55. 61 “What happened to my web page on my husband, Bob Champine, that took me many years to put together on his career and which meant a lot to me and to the aviation community. I noticed with 9.0 I lost the left margin and the picture of him exiting the X-1. I need to restore it to the internet as it is history. Please tell me what to do. I will be glad to retype it, I just don’t want it lost to the world. I need help. Gloria Champine”
  56. 62 Illustration from “Fire in the Library,” MIT Technology Review
  57. 63 “Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.”
  58. 64
  59. 65
  60. 66
  61. 67
  62. 68
  63. 69
  64. 70
  65. 71
  66. 72 • In 6 months Archive Team saved 900 GB • Estimated 4-5 TB total • Other people saved additional pages, but probably ¼ is gone forever • For many people, Geocities was their first web presence
  67. 73
  68. 74
  69. 75
  70. 76 Those screenshots were automatically generated from Geocities sites rescued by Archive Team in 2009 See more at One Terabyte of Kilobyte Age Photo Op: http://oneterabyteofkilobyteage.tumblr.com/
  71. 77 Due to lack of metadata: • The rescued data was less useful - Really bulky files - Case-sensitive filenames difficult to access and read - Not in a web-ready format (WARC) • The process was less efficient and more error-prone - Poor tracking of completed activity - Lots of duplication of data - Took way too long (6 months vs. 3 days) - Could have gotten all the data in a month (estimated)
  72. 78
  73. 79 Mission: The Internet Archive’s purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Photo by Ulf Benjaminsson
  74. 80
  75. 81
  76. 82
  77. 83 Save the history before it's lost forever Offer permanent access to historical collections that exist in digital format
  78. 84 Internet Archive contains: web pages, texts, videos, audio files, software, and images. (Plus concerts and collections) • Media Type makes it Readable or Playable • Emulator (for software) makes it Executable • Subject Keywords make it Findable
  79. 86 • Is it Accurate? • Is it Credible? • What is the Source? (machines or people) • It’s a lot of Effort. Do we have enough people and time?
  80. 88 Additional processing takes place, depending on the type
  81. 89 • Description and keywords are required, but open fields • Other metadata is optional
  82. 90
  83. 91 • Metadata attributes determined by the community
  84. 92 • For user-generated content, it’s just easier for people not to. • Internet Archive will never have enough people on staff to do it properly.
  85. 93 Crowdsource manual creation of metadata Photo by Pascal
  86. 94 • A small pool of volunteers, and their drive didn’t last long • Tools didn’t provide immediate feedback/satisfaction. They had to email their inputs and wait. Photo by psyberartist
  87. 95 • 10 most common words + 10 most common 2-word phrases • Applied to 200,000 items • Much more scalable • Heavily machine assisted: a person can validate data and create collections Photo by James St. John
  88. 96
  89. 97 “Controversial, but roughly as good as a bored intern.”
  90. 98 Topics: switch, atari, antenna, game, cable, terminals, console, television, video, program, power supply, console unit, video computer, game program, computer system, atari game, power switch, switch box, atari video, screw terminals
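The "10 most common words + 10 most common 2-word phrases" approach from the slides above can be sketched as follows. This is an illustration under assumptions, not the Internet Archive's actual code: the function name, the tokenization rule, the tiny stopword list, and the sample text are all hypothetical, and the slides do not say whether stopwords were filtered at all.

```python
import re
from collections import Counter

# Assumed stopword list; a real one would be much longer.
STOPWORDS = frozenset(
    {"a", "an", "and", "for", "in", "is", "it", "of", "on", "the", "to", "with"}
)


def extract_topics(text: str, top_n: int = 10) -> list[str]:
    """Return the top_n most common words followed by the top_n most
    common 2-word phrases, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())

    # Single-word counts, stopwords excluded.
    word_counts = Counter(w for w in words if w not in STOPWORDS)

    # Adjacent word pairs, kept only when neither word is a stopword.
    phrase_counts = Counter(
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )

    top_words = [word for word, _ in word_counts.most_common(top_n)]
    top_phrases = [phrase for phrase, _ in phrase_counts.most_common(top_n)]
    return top_words + top_phrases


# Hypothetical sample text, loosely in the style of a console manual.
sample = (
    "Connect the switch box to the antenna terminals. "
    "The switch box has a power switch. Set the power switch to off."
)
topics = extract_topics(sample, top_n=2)
```

The output is a flat list of candidate topics, like the one on the slide, that a person can then validate. Pure frequency counting has no notion of meaning, which fits the "bored intern" level of quality quoted above.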
  91. 99 Having the stuff is vital, the most important thing. But it’s also vital to have a system by which these things are described. “If a person can’t get the information they need, then we’re failing.” Photo by Rachel Lovinger
  92. 101 • Jason had converted to a metadata advocate But I realized that… • Content strategists who care about the long game should think like historians, archivists and futurists, too.
  93. NATURALIS BIODIVERSITY CENTER Metadata from the past
  94. 103 • Dutch leader in academic research and education on biodiversity and taxonomy. • Has a collection of 37 million natural history objects.
  95. 104 Describe, understand and explore biodiversity for human wellbeing and the future of our planet. They do this with: • Accessible collections • Contributions to global scientific research • Awe of natural history • Openly shared knowledge
  96. 105 • From 2010 to June 2015 • 250 staff members & 450 volunteers • Digitizing 7 million objects in detail • Adding metadata for the other 30 million objects
  97. 106 • Information is more easily discovered, studied, and used. • Scientists worldwide can access it directly online, without assistance. • Some of this data has never been available in digital form before.
  98. 107 • Scientific name • Where it was found • When it was found • Who found it “Objects [in the collection] have no scientific value without this information.” - Suzanne de Jong-Kole
  99. 108
  100. 109 Employees enter data, verbatim, into the collection registration system.
  101. 110 This allows them to retrieve the physical specimen if requested.
  102. 111 • Vele Handen = Many Hands • People helped transcribe handwritten labels • In 9 months, people transcribed 200,000 labels, of which about half were usable.
  103. 112 The person who collected the specimen wrote the metadata on the label. This could be a professional researcher, or a non-professional enthusiast.
  104. 113 Darwin’s Finches
  105. 114 The oldest is this Spanish pepper from 1550!
  106. 115 When they wrote this metadata, they had no idea that nearly half a millennium later people would be “digitizing” it.
  107. 116 The ‘love note’ is when you behave selflessly for a partner – or customer – that doesn’t exist yet. A drawing Jason drew in my notebook in high school, 20+ years before we ever dated.
  108. Rachel Lovinger @rlovinger Image via Bond
