Metadata is a Love Note to the Future

Rachel Lovinger, Content Strategy Director
Rachel Lovinger @rlovinger
Confab, 22 May, 2015
Image via Bond
©2015 All rights reserved.
• Experience Director, Content Strategy;
Razorfish New York
• Co-editor of scatter/gather, a content
strategy blog:
http://scattergather.razorfish.com
• Author of Nimble: A Razorfish Report
on Publishing in the Digital Age (June
2010): http://nimble.razorfish.com
• Twitter: @rlovinger
is
HARDCORE
2006
2009
2008
2012
2011
2010
Metadata = Context
Context enables Connections
How does one convey that in a concise and powerful way?
Photo by Jesse Chan-Norris
Metadata Is A
Love note
To the Future
Tweet and photo by Erin Kissane, Tumblr by Austin Kleon
429 notes
82 retweets
Photo and shirt by Sarah
Photo by Rachel Lovinger
Content Strategy for Mobile by Karen McGrane
• Nearly 60,000 files archived
• Mostly from 1980-1995
• Collected and curated since 1998
• Almost no metadata
Textfiles.com
Who needs a database?
Metadata Skeptic transformed into… Metadata Warrior
Photos by Jason Scott and Rachel Lovinger
Photo by Rachel Lovinger
• Me?
Photo by Rachel Lovinger
ENTERTAINMENT WEEKLY
Metadata for Journalism Products
~3 years online content
~10 years magazine content
Imported from text files to CMS
Semi-structured information
allowed us to map the files to
content types and site sections,
and add some metadata (author,
published date, keywords, etc.)
10 years
x 50 issues per year
x 100 files per issue (approx.)
= 50,000 estimated articles
Once in the CMS, we could add
photos, links, formatting, etc.
For the content already in the
CMS, keywords had been
manually typed in by authors
• 6790 “different” keywords
• Removed 12% during clean up
• Typos
• Redundant
• Not Useful
• Star Wars: Episode I -- The Phantom Menace
• Episode 1
• Episode I
• Phantom Menace
• Star Wars Episode I The Phantom Menace
• Star Wars Episode I: The Phantom Menace
• Star Wars prequel
• Star Wars: Episode 1 -- The Phantom Menace
• Star Wars: Episode i -- the Phantom Menace
• Star Wars: Episode I: The Phantom Menace
• Star Wars: Episode I--The Phantom Menace
• Star Wars: Episode I--The Phantom Menance
• Star Wars: Episode One -- The Phantom Menace
• Star Wars: The Phantom Menace
• Star Wars: The Phantom Menace -- Episode I
• The Phantom Menace
• The Phanton Menace
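A cleanup like this can be partly automated. The following is a minimal sketch, not the process EW actually used: it assumes a hand-maintained alias table (the hypothetical `ALIASES` mapping) and uses Python's standard `difflib` to catch punctuation, capitalization, and typo variants. The cutoff value is an arbitrary assumption.

```python
import re
from difflib import get_close_matches

# Hypothetical alias table: each known spelling points at one preferred label.
PREFERRED = "Star Wars: Episode I - The Phantom Menace"
ALIASES = {
    "Star Wars: Episode I - The Phantom Menace": PREFERRED,
    "Star Wars: The Phantom Menace": PREFERRED,
    "The Phantom Menace": PREFERRED,
}


def normalize(keyword):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", keyword.lower())
    return " ".join(text.split())


def preferred_label(keyword, cutoff=0.8):
    """Return the preferred label for a typed keyword, or None if nothing is close."""
    normalized = {normalize(alias): label for alias, label in ALIASES.items()}
    hits = get_close_matches(normalize(keyword), list(normalized), n=1, cutoff=cutoff)
    return normalized[hits[0]] if hits else None


print(preferred_label("Star Wars: Episode I--The Phantom Menance"))
print(preferred_label("The Phanton Menace"))
```

String similarity only catches near-identical spellings. A keyword like "Star Wars prequel" still needs a person to decide where it belongs.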
• TAFKAP?
• The Artist
• Artist Formerly Known as Prince
• The Artist Formerly Known As Prince
• The Artist formerly known as Prince
• the Artist Formerly Known as Prince
• The Artist Formerly Known as Prince (PKA)
• The magazine was published once a week
• The website published new
articles several times a day
• Plus: Over 50,000 past articles!
• How could we better use all
that content?
If you like James Bond, we wanted it to be easy for you to
discover everything we had.
Cover Story
Interview
Photo Gallery
Etc.
Entertainment Weekly
Journalism
IMDb-like
Information
We organized the terms in our controlled vocabulary into categories, to make
them more distinct and meaningful.
For example:
• Book > Product > Harry Potter and the Goblet of Fire
• Movie > Product > Harry Potter and the Goblet of Fire
• Person > Individual > Daniel Radcliffe
• Person > Individual > J.K. Rowling
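One way to picture these categorized terms is as small records where the category is part of the term's identity. This is an illustrative sketch only. The `Term` class and its field names are assumptions, not EW's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A controlled-vocabulary term, identified by its full category path."""
    top: str    # e.g. "Book", "Movie", "Person"
    kind: str   # e.g. "Product", "Individual"
    label: str  # the name people actually read

    def path(self):
        return " > ".join((self.top, self.kind, self.label))


book = Term("Book", "Product", "Harry Potter and the Goblet of Fire")
movie = Term("Movie", "Product", "Harry Potter and the Goblet of Fire")

# Same label, different category: two distinct terms.
print(book.path())
print(movie.path())
print(book == movie)
```

Because the category is part of the term, the book and the movie can share a title without being confused with each other.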
Capsule
Movie
Review
Preview
Movie Review
DVD Review
• Relationships defined for each media type
• Managed separately from the article content
• The full set of metadata was available to all articles
• Standard relationships
• For example, for Movie:
- Lead Performers
- Director
- Writer
- Release Date
- EW Grade
- Etc.
• Select a related category for
each relationship, as applicable
• Some allow multiple values
• Authors just selected the primary category
• Related metadata pulled in automatically
• Updates appeared on all articles
*Metadata categories and relationships were managed by a dedicated data librarian
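The pattern described here (authors pick one primary category, everything else is looked up) can be sketched as articles holding a reference to a shared category record. This is a rough illustration under stated assumptions: the `Category` and `Article` classes, the `related_metadata` helper, the category key, and the headlines are all hypothetical, not the actual CMS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Category:
    """One vocabulary entry, maintained separately from article content."""
    path: str
    relationships: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Article:
    headline: str
    primary_category: str  # the only thing the author selects


def related_metadata(article, categories):
    """Look up relationships at read time instead of copying them into the article."""
    return categories[article.primary_category].relationships


categories = {
    "movie/goblet-of-fire": Category(
        path="Movie > Product > Harry Potter and the Goblet of Fire",
        relationships={"Lead Performers": ["Daniel Radcliffe"]},
    )
}
review = Article("Review (placeholder headline)", "movie/goblet-of-fire")
gallery = Article("Photo gallery (placeholder headline)", "movie/goblet-of-fire")

# The data librarian updates the category once...
categories["movie/goblet-of-fire"].relationships["Lead Performers"].append("Emma Watson")

# ...and every article that points at it sees the change.
print(related_metadata(review, categories))
print(related_metadata(gallery, categories))
```

Storing the relationships once and resolving them on read is what lets an update appear on all articles. The cost is a lookup for every page view.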
• “Best Results” linked directly to
an aggregated page based on
the category.
• For example:
- “Cats & Dogs” vs. “The Truth
About Cats & Dogs”
- The Green Mile (Movie) vs. The
Green Mile (Book)
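A "Best Results" lookup of this kind can be approximated by matching the whole query against category labels, rather than searching for words inside titles. The sketch below is an assumption about how such a lookup might work, not a description of EW.com's search. The `VOCABULARY` list and `best_results` function are hypothetical.

```python
# Hypothetical slice of the vocabulary: (category type, label) pairs.
VOCABULARY = [
    ("Movie", "Cats & Dogs"),
    ("Movie", "The Truth About Cats & Dogs"),
    ("Movie", "The Green Mile"),
    ("Book", "The Green Mile"),
]


def best_results(query, vocabulary=VOCABULARY):
    """Return every category whose label equals the query, ignoring case."""
    wanted = query.strip().casefold()
    return [(kind, label) for kind, label in vocabulary if label.casefold() == wanted]


print(best_results("cats & dogs"))
print(best_results("The Green Mile"))
```

An exact label match keeps "Cats & Dogs" from pulling in "The Truth About Cats & Dogs", while the category type keeps the movie and the book as separate results.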
• Wal-Mart sold gallon jars of
Vlasic pickles for $2.97.
• A popular item – priced so low
it nearly put Vlasic out of
business.
• By achieving their goals, they
put themselves in a position
they might not survive.
See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
• We wanted people to
discover older content, and
they did!
• By 2006, we had 16 years of
magazine and web content.
• Other Time Inc. publications
were interested in using our
categorization system, too.
Not well-suited for our expensive
and frequent database calls.
Our webservers were optimized to
serve up the latest “issue” of content.
EW.com made 40% of Time Inc.’s database calls,
but drew only 25% of the total traffic
A 2007 redesign removed the “third column” entirely.
The creator of Freebase (a semi-semantic UGC site for structured
content, now read-only) said EW.com was way ahead of its time.
The making of a
METADATA WARRIOR
Who needs a database?
“The hardest part of
[recording] history is to be
there when it happens.”
Photo by Rachel Lovinger
• An informal post on August 4th
• Notification sent out September 30th
• Shut down October 31st
“What happened to my web page on my husband, Bob Champine,
that took me many years to put together on his career and which
meant a lot to me and to the aviation community. I noticed with 9.0
I lost the left margin and the picture of him exiting the X-1. I need to
restore it to the internet as it is history. Please tell me what to do. I
will be glad to retype it, I just don’t want it lost to the world. I need
help. Gloria Champine”
Illustration from “Fire in the Library,” MIT Technology Review
“Archive Team is a loose collective of rogue archivists,
programmers, writers and loudmouths dedicated to
saving our digital heritage. Since 2009 this variant
force of nature has caught wind of shutdowns,
shutoffs, mergers, and plain old deletions - and done
our best to save the history before it's lost forever.”
• In 6 months Archive Team saved 900 GB
• Estimated 4-5 TB total
• Other people saved additional pages,
but probably ¼ is gone forever
• For many people, Geocities was their
first web presence
Those screenshots were automatically generated from
Geocities sites rescued by Archive Team in 2009
See more at One Terabyte of Kilobyte Age Photo Op:
http://oneterabyteofkilobyteage.tumblr.com/
Due to lack of metadata:
• The rescued data was less useful
• Really bulky files
• Case-sensitive filenames difficult to access and read
• Not in a web-ready format (WARC)
• The process was less efficient and more error prone
• Poor tracking of completed activity
• Lots of duplication of data
• Took way too long (6 months vs. 3 days)
• Could have gotten all the data in a month (estimated)
Mission:
The Internet Archive’s purposes include offering permanent access
for researchers, historians, scholars, people with disabilities, and the
general public to historical collections that exist in digital format.
Photo by Ulf Benjaminsson
Save the history before it's lost
forever
Offer permanent access to
historical collections that exist in
digital format
Internet Archive contains: web pages, texts, videos, audio files,
software, and images. (Plus concerts and collections)
• Media Type makes it Readable or Playable
• Emulator (for software) makes it Executable
• Subject Keywords make it Findable
• Is it Accurate?
• Is it Credible?
• What is the Source? (machines or people)
• It’s a lot of Effort. Do we have enough people and time?
Additional processing takes place, depending on the type
• Description and keywords are required, but open fields
• Other metadata is optional
• Metadata attributes determined by the community
• For user-generated content, it’s just easier for people not to.
• Internet Archive will never have enough people on staff to do it
properly.
Crowdsource manual creation of metadata
Photo by Pascal
• Too small a pool of volunteers, and
their drive didn’t last long
• Tools didn’t provide immediate
feedback/satisfaction. They had
to email their inputs and wait.
Photo by psyberartist
• 10 most common words + 10
most common 2-word phrases
• Applied to 200,000 items
• Much more scalable
• Heavily machine assisted: a
person can validate data and
create collections
Photo by James St. John
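The "10 words + 10 phrases" approach is simple enough to sketch with the standard library. This is a guess at the general technique, not the Internet Archive's actual script. The `STOPWORDS` set and the `suggest_keywords` function are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "has", "in", "is", "of", "on", "the", "to"}


def suggest_keywords(text, n=10):
    """Return the n most common words, then the n most common 2-word phrases."""
    words = re.findall(r"[a-z]+", text.lower())
    content_words = [w for w in words if w not in STOPWORDS]
    top_words = [w for w, _ in Counter(content_words).most_common(n)]

    # A phrase counts only if both adjacent words are content words.
    phrases = [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    top_phrases = [p for p, _ in Counter(phrases).most_common(n)]
    return top_words + top_phrases


sample = (
    "Connect the switch box to the antenna terminals. "
    "The switch box has a power switch. "
    "Connect the switch box to the console."
)
print(suggest_keywords(sample, n=3))
```

The output is only a list of candidates. As the slide says, a person still validates the data and builds collections from it.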
“Controversial, but roughly as
good as a bored intern.”
Topics:
switch, atari, antenna, game, cable, terminals, console, television, video, program,
power supply, console unit, video computer, game program, computer system, atari game,
power switch, switch box, atari video, screw terminals
Having the stuff is vital, the
most important thing. But
it’s also vital to have a
system by which these
things are described.
“If a person can’t get the
information they need, then
we’re failing.”
Photo by Rachel Lovinger
• Jason had converted to a
metadata advocate
But I realized that…
• Content strategists who care
about the long game should
think like historians,
archivists and futurists, too.
NATURALIS BIODIVERSITY
CENTER
Metadata from the past
• Dutch leader in academic research and education on
biodiversity and taxonomy.
• Has a collection of 37 million natural history objects.
Describe, understand and explore biodiversity for human
wellbeing and the future of our planet.
They do this with:
• Accessible collections
• Contributions to global
scientific research
• Awe of natural history
• Openly shared knowledge
• From 2010 to June 2015
• 250 staff members & 450 volunteers
• Digitizing 7 million objects in detail
• Adding metadata for the other 30 million objects
• Information is more easily discovered, studied, and used.
• Scientists worldwide can access it directly online, without assistance.
• Some of this data has never been available in digital form before.
• Scientific name
• Where it was found
• When it was found
• Who found it
“Objects [in the collection] have no scientific value
without this information.” - Suzanne de Jong-Kole
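Since a specimen has no scientific value without these four facts, a registration system might flag any record that lacks one. The check below is a minimal sketch. The field names and the `missing_fields` helper are assumptions for illustration, not Naturalis's actual schema, and the sample values are placeholders.

```python
# Assumed field names for the four facts listed above.
REQUIRED_FIELDS = ("scientific_name", "where_found", "when_found", "who_found")


def missing_fields(record):
    """Return the required fields that are absent or blank in a specimen record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


complete = {
    "scientific_name": "Genus species (placeholder)",
    "where_found": "Locality (placeholder)",
    "when_found": "Date (placeholder)",
    "who_found": "Collector (placeholder)",
}
partial = {"scientific_name": "Genus species (placeholder)", "where_found": "  "}

print(missing_fields(complete))
print(missing_fields(partial))
```

A check like this only confirms that something was entered. Whether the label was transcribed correctly is a separate question.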
Employees enter data, verbatim, into the collection registration system.
This allows them to retrieve the physical specimen if requested.
• Vele Handen = Many Hands
• People helped transcribe
handwritten labels
• In 9 months, volunteers transcribed
200,000 labels, of which about half
were usable.
The person who collected the specimen wrote the metadata on the label.
This could be a professional researcher, or a non-professional enthusiast.
Darwin’s Finches
The oldest is this Spanish
pepper from 1550!
When they wrote this metadata, they had no idea that nearly
half a millennium later people would be “digitizing” it.
The ‘love note’ is when
you behave selflessly for
a partner – or customer –
that doesn’t exist yet.
A drawing Jason drew in my notebook in high
school, 20+ years before we ever dated.
Vbout.com31 views
Bridging the Gap: How SEO and CRO Work Together to Maximize User Satisfaction... by Rio Ichikawa
Bridging the Gap: How SEO and CRO Work Together to Maximize User Satisfaction...Bridging the Gap: How SEO and CRO Work Together to Maximize User Satisfaction...
Bridging the Gap: How SEO and CRO Work Together to Maximize User Satisfaction...
Rio Ichikawa144 views
Weekly Media Update_04_12_2023.pdf by BalmerLawrie
Weekly Media Update_04_12_2023.pdfWeekly Media Update_04_12_2023.pdf
Weekly Media Update_04_12_2023.pdf
BalmerLawrie14 views
Weekly Media Update_28_11_2023.pdf by BalmerLawrie
Weekly Media Update_28_11_2023.pdfWeekly Media Update_28_11_2023.pdf
Weekly Media Update_28_11_2023.pdf
BalmerLawrie15 views
Branding Proposal for Company.pptx by DSOMGuy
Branding Proposal for Company.pptxBranding Proposal for Company.pptx
Branding Proposal for Company.pptx
DSOMGuy5 views
First 30 days of Your CRO Program by VWO
First 30 days of Your CRO ProgramFirst 30 days of Your CRO Program
First 30 days of Your CRO Program
VWO58 views
"SEO Keyword Checklist: Supercharge Your Website's Ranking Strategy" by Beacon Coders
"SEO Keyword Checklist: Supercharge Your Website's Ranking Strategy""SEO Keyword Checklist: Supercharge Your Website's Ranking Strategy"
"SEO Keyword Checklist: Supercharge Your Website's Ranking Strategy"
Beacon Coders8 views
Monetizing Your Newsletter with Affiliate Marketing by David Clayton
Monetizing Your Newsletter with Affiliate MarketingMonetizing Your Newsletter with Affiliate Marketing
Monetizing Your Newsletter with Affiliate Marketing
David Clayton8 views

Metadata is a Love Note to the Future

  • 1. Rachel Lovinger @rlovinger Confab, 22 May, 2015 Image via Bond
  • 2. 2 ©2015 All rights reserved. • Experience Director, Content Strategy; Razorfish New York • Co-editor of scatter/gather, a content strategy blog: http://scattergather.razorfish.com • Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010): http://nimble.razorfish.com • Twitter: @rlovinger
  • 4. 4
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8
  • 9. 9
  • 10. 10
  • 11. 11 ©2015 All rights reserved. is HARDCORE
  • 12. 12 ©2015 All rights reserved. 2006 2009 2008 2012 2011 2010
  • 13. 13 ©2015 All rights reserved. Metadata = Context Context enables Connections How does one convey that in a concise and powerful way?
  • 14. 14 Photo by Jesse Chan-Norris
  • 15. Metadata Is A Love note To the Future
  • 16. 16 Tweet and photo by Erin Kissane, Tumblr by Austin Kleon 429 notes 82 retweets
  • 17. 17 Photo and shirt by Sarah
  • 18. 18 Photo by Rachel Lovinger
  • 19. 19 Content Strategy for Mobile by Karen McGrane
  • 21. 21 • Nearly 60,000 files archived • Mostly from 1980-1995 • Collected and curated since 1998 • Almost no metadata Textfiles.com
  • 22. 22 Who needs a database?
  • 23. 23 Metadata Skeptic transformed into… Metadata Warrior Photos by Jason Scott and Rachel Lovinger
  • 24. 24 Photo by Rachel Lovinger
  • 25. 25 • Me? Photo by Rachel Lovinger
  • 27. 27 ©2015 All rights reserved. ~3 years online content ~10 years magazine content
  • 28. 28 ©2015 All rights reserved. Imported from text files to CMS
  • 29. 29 ©2015 All rights reserved. Semi-structured information allowed us to map the files to content types and site sections, and add some metadata (author, published date, keywords, etc.) 10 years x 50 issues per year x 100 files per issue (approx.) 50,000 estimated articles
  • 30. 30 ©2015 All rights reserved. Once in the CMS, we could add photos, links, formatting, etc.
  • 31. 31 ©2015 All rights reserved. For the content already in the CMS, keywords had been manually typed in by authors • 6790 “different” keywords • Removed 12% during clean up • Typos • Redundant • Not Useful
  • 33. 33 ©2015 All rights reserved. • Star Wars: Episode I -- The Phantom Menace • Episode 1 • Episode I • Phantom Menace • Star Wars Episode I The Phantom Menace • Star Wars Episode I: The Phantom Menace • Star Wars prequel • Star Wars: Episode 1 -- The Phantom Menace • Star Wars: Episode i -- the Phantom Menace • Star Wars: Episode I: The Phantom Menace • Star Wars: Episode I--The Phantom Menace • Star Wars: Episode I--The Phantom Menance • Star Wars: Episode One -- The Phantom Menace • Star Wars: The Phantom Menace • Star Wars: The Phantom Menace -- Episode I • The Phantom Menace • The Phanton Menace
  • 34. 34 ©2015 All rights reserved. • TAFKAP?
  • 35. 35 ©2015 All rights reserved. • TAFKAP? • The Artist • Artist Formerly Known as Prince • The Artist Formerly Known As Prince • The Artist formerly known as Prince • the Artist Formerly Known as Prince • The Artist Formerly Known as Prince (PKA)
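The keyword variants on the slides above suggest why free-typed keywords are hard to use: one title ends up under many spellings. A minimal sketch of one way to collapse such variants is shown here. It is not the method EW.com used; the `normalize` and `canonicalize` helpers and the hand-maintained alias table are assumptions made for illustration.

```python
import re
from typing import Optional

def normalize(keyword: str) -> str:
    """Lowercase, turn every run of punctuation or spaces into one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", keyword.lower()).strip()

CANONICAL = "Star Wars: Episode I -- The Phantom Menace"

# Hypothetical alias table, the kind of list a person would curate by hand.
# Misspellings and short forms must be listed explicitly; normalization
# alone only absorbs differences in case, spacing, and punctuation.
ALIASES = {
    normalize(variant): CANONICAL
    for variant in [
        "Star Wars: Episode I -- The Phantom Menace",
        "Episode 1",
        "Phantom Menace",
        "Star Wars: Episode I--The Phantom Menance",
        "The Phanton Menace",
    ]
}

def canonicalize(keyword: str) -> Optional[str]:
    """Return the canonical term for a keyword, or None if it is unknown."""
    return ALIASES.get(normalize(keyword))
```

Under these assumptions, `canonicalize("Star Wars: Episode i -- the Phantom Menace")` resolves to the canonical term because it differs only in case, while an unlisted keyword returns `None` and would need human review.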
  • 37. 37 ©2015 All rights reserved. • The magazine was published once a week • The website published new articles several times a day • Plus: Over 50,000 past articles! • How could we better use all that content?
  • 38. 38 ©2015 All rights reserved. If you like James Bond, we wanted it to be easy for you to discover everything we had. Cover Story Interview Photo Gallery Etc.
  • 40. 40
  • 41. 41 ©2015 All rights reserved. We put our controlled vocabulary into categories, to make them more distinct and meaningful. For example: • Book > Product > Harry Potter and the Goblet of Fire • Movie > Product > Harry Potter and the Goblet of Fire • Person > Individual > Daniel Radcliffe • Person > Individual > J.K. Rowling
  • 43. 43 • Relationships defined for each media type • Managed separately from the article content • The full set of metadata was available to all articles
  • 44. 44 ©2015 All rights reserved. • Standard relationships • For example, for Movie: - Lead Performers - Director - Writer - Release Date - EW Grade - Etc. • Select a related category for each relationship, as applicable • Some allow multiple values
  • 45. 45 • Authors just selected the primary category • Related metadata pulled in automatically • Updates appeared on all articles *Metadata categories and relationships were managed by a dedicated data librarian
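The model described on slides 41 to 45 (categories with relationships, managed apart from the articles and pulled in automatically) can be sketched roughly as follows. This is an illustration under assumptions, not the actual EW.com schema: the `Category` and `Article` classes, the `metadata` method, and the article titles are hypothetical, and the grade value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Category:
    """A controlled-vocabulary entry, managed separately from article content."""
    path: str
    # Relationship name -> related category paths (some allow multiple values).
    relationships: Dict[str, List[str]] = field(default_factory=dict)

@dataclass
class Article:
    """An article whose author selects only the primary category."""
    title: str
    primary_category: Category

    def metadata(self) -> Dict[str, object]:
        # Related metadata is read from the shared category at display time,
        # so an update to the category shows up on every article that uses it.
        return {"Primary Category": self.primary_category.path,
                **self.primary_category.relationships}

movie = Category(
    path="Movie > Product > Harry Potter and the Goblet of Fire",
    relationships={"Lead Performers": ["Person > Individual > Daniel Radcliffe"]},
)
cover_story = Article("Cover Story", movie)      # hypothetical article
photo_gallery = Article("Photo Gallery", movie)  # hypothetical article

# A later edit to the category (placeholder value, not a real grade).
movie.relationships["EW Grade"] = ["(placeholder)"]
```

Because both articles hold a reference to the same category rather than a copy of its values, the later edit appears in `cover_story.metadata()` and `photo_gallery.metadata()` alike.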
  • 46. 46
  • 47. 47 ©2015 All rights reserved. • “Best Results” linked directly to an aggregated page based on the category. • For example: - “Cats & Dogs” vs. “The Truth About Cats & Dogs” - The Green Mile (Movie) vs. The Green Mile (Book)
  • 49. 49 • Wal-Mart sold gallon jars of Vlasic pickles for $2.97. • A popular item – priced so low it nearly put Vlasic out of business. • By achieving their goals, they put themselves in a position they might not survive. See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
  • 50. 50 ©2015 All rights reserved. • We wanted people to discover older content, and they did! • By 2006, we had 16 years of magazine and web content. • Other Time Inc. publications were interested in using our categorization system, too.
  • 51. 51 Not well-suited for our expensive and frequent database calls.
  • 52. 52 Our webservers were optimized to serve up the latest “issue” of content. 40% of Time Inc.’s database calls, only 25% of the total traffic
  • 53. 53 A 2007 redesign removed the “third column” entirely.
  • 54. 54 ©2015 All rights reserved. The creator of Freebase (a semi-semantic UGC site for structured content, now read-only) said EW.com was way ahead of its time.
  • 57. 57 Who needs a database?
  • 58. 58 “The hardest part of [recording] history is to be there when it happens.” Photo by Rachel Lovinger
  • 59. 59
  • 60. 60 • An informal post on August 4th • Notification sent out September 30th • Shut down October 31st
  • 61. 61 “What happened to my web page on my husband, Bob Champine, that took me many years to put together on his career and which meant a lot to me and to the aviation community. I noticed with 9.0 I lost the left margin and the picture of him exiting the X-1. I need to restore it to the internet as it is history. Please tell me what to do. I will be glad to retype it, I just don’t want it lost to the world. I need help. Gloria Champine”
  • 62. 62 Illustration from “Fire in the Library,” MIT Technology Review
  • 63. 63 “Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.”
  • 64. 64
  • 65. 65
  • 66. 66
  • 67. 67
  • 68. 68
  • 69. 69
  • 70. 70
  • 71. 71
  • 72. 72 • In 6 months Archive Team saved 900 GB • Estimated 4-5 TB total • Other people saved additional pages, but probably ¼ is gone forever • For many people, Geocities was their first web presence
  • 73. 73
  • 74. 74
  • 75. 75
  • 76. 76 Those screenshots were automatically generated from Geocities sites rescued by Archive Team in 2009 See more at One Terabyte of Kilobyte Age Photo Op: http://oneterabyteofkilobyteage.tumblr.com/
  • 77. 77 Due to lack of metadata: • The rescued data was less useful • Really bulky files • Case-sensitive filenames difficult to access and read • Not in a web-ready format (WARC) • The process was less efficient and more error prone • Poor tracking of completed activity • Lots of duplication of data • Took way too long (6 months vs. 3 days) • Could have gotten all the data in a month (estimated)
  • 78. 78
  • 79. 79 ©2015 All rights reserved. Mission: The Internet Archive’s purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Photo by Ulf Benjaminsson
  • 80. 80
  • 81. 81
  • 82. 82
  • 83. 83 Save the history before it's lost forever Offer permanent access to historical collections that exist in digital format
  • 84. 84 ©2015 All rights reserved. Internet Archive contains: web pages, texts, videos, audio files, software, and images. (Plus concerts and collections) • Media Type makes it Readable or Playable • Emulator (for software) makes it Executable • Subject Keywords makes it Findable
  • 86. 86 ©2015 All rights reserved. • Is it Accurate? • Is it Credible? • What is the Source? (machines or people) • It’s a lot of Effort. Do we have enough people and time?
  • 88. 88 ©2015 All rights reserved. Additional processing takes place, depending on the type
  • 89. 89 • Description and keywords are required, but open fields • Other metadata is optional
  • 90. 90
  • 92. 92 ©2015 All rights reserved. • For user-generated content, it’s just easier for people not to. • Internet Archive will never have enough people on staff to do it properly.
  • 93. 93 Crowdsource manual creation of metadata Photo by Pascal
  • 94. 94 • Small pool of volunteers, and their drive didn’t last long • Tools didn’t provide immediate feedback/satisfaction. They had to email their inputs and wait. Photo by psyberartist
  • 95. 95 • 10 most common words + 10 most common 2-word phrases • Applied to 200,000 items • Much more scalable • Heavily machine assisted: a person can validate data and create collections Photo by James St. John
  • 96. 96
  • 97. 97 “Controversial, but roughly as good as a bored intern.”
  • 98. 98 Topics: switch, atari, antenna, game, cable, terminals, console, television, video, program, power supply, console unit, video computer, game program, computer system, atari game, power switch, switch box, atari video, screw terminals
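The automated approach on slide 95 (the 10 most common words plus the 10 most common 2-word phrases) can be sketched as below. This is not the Internet Archive's actual code: the `extract_topics` function, the tokenizing regex, and the small stopword list are assumptions, and the slides do not say how common words such as "the" were handled.

```python
import re
from collections import Counter
from typing import List

# Assumed stopword list; the source does not describe the filtering used.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "it", "of", "on",
             "or", "that", "the", "this", "to", "with"}

def extract_topics(text: str, n: int = 10) -> List[str]:
    """Return up to n frequent words followed by up to n frequent 2-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    words = Counter(t for t in tokens if t not in STOPWORDS)
    # Count only pairs that were adjacent in the original text and that
    # contain no stopword, so phrases are never stitched across a gap.
    phrases = Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    )
    return ([w for w, _ in words.most_common(n)]
            + [p for p, _ in phrases.most_common(n)])
```

Run over the text of a scanned manual, a function like this yields a flat list of single words and phrases, similar in shape to the topic list on slide 98; a person can then validate the result rather than type it from scratch.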
  • 99. 99 Having the stuff is vital, the most important thing. But it’s also vital to have a system by which these things are described. “If a person can’t get the information they need, then we’re failing.” Photo by Rachel Lovinger
  • 101. 101 • Jason had been converted into a metadata advocate. But I realized that… • Content strategists who care about the long game should think like historians, archivists and futurists, too.
  • 103. 103 • Dutch leader in academic research and education on biodiversity and taxonomy. • Has a collection of 37 million natural history objects.
  • 104. 104 Describe, understand and explore biodiversity for human wellbeing and the future of our planet. They do this with: • Accessible collections • Contributions to global scientific research • Awe of natural history • Openly shared knowledge
  • 105. 105 • From 2010 to June 2015 • 250 staff members & 450 volunteers • Digitizing 7 million objects in detail • Adding metadata for the other 30 million objects
  • 106. 106 • Information is more easily discovered, studied, and used. • Scientists worldwide can access it directly online, without assistance. • Some of this data has never been available in digital form before.
  • 107. 107 • Scientific name • Where it was found • When it was found • Who found it “Objects [in the collection] have no scientific value without this information.” - Suzanne de Jong-Kole
  • 108. 108
  • 109. 109 Employees enter data, verbatim, into the collection registration system.
  • 110. 110 This allows them to retrieve the physical specimen if requested.
  • 111. 111 • Vele Handen = Many Hands • People helped transcribe handwritten labels • In 9 months, people transcribed 200,000 labels, of which about half were usable.
  • 112. 112 The person who collected the specimen wrote the metadata on the label. This could be a professional researcher, or a non-professional enthusiast.
  • 114. 114 The oldest is this Spanish pepper from 1550!
  • 115. 115 When they wrote this metadata, they had no idea that nearly half a millennium later people would be “digitizing” it.
  • 116. 116 ©2015 All rights reserved. The ‘love note’ is when you behave selflessly for a partner – or customer – that doesn’t exist yet. A drawing Jason drew in my notebook in high school, 20+ years before we ever dated.