Smxeastbarbarastarr2012
1. Schema 101
Why Metadata Matters: From a Search Engine Perspective.
By: Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@Ontologica.us
2. Meta Information
ME
• Pursued a doctorate in Artificial Intelligence from South Africa in the 80's.
• Recruited to build intelligent/predictive trading systems on Wall Street
• Migrated to government-based contracts, several of which turned into real world products like
– SIRI (PAL from DARPA)
– WATSON (Acquaint - IBM Watson Labs was a team member)
• From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web.
• “Shocked into the real world” when working as a consultant @ Overstock
• Today - Educator, Consultant, Developer.
My favorite author: Isaac Asimov
Favorite book: I, Robot
Favorite character: MULTIVAC
Linkedin: http://www.linkedin.com/in/barbarastarr
7. SEARCH ENGINE POINT OF VIEW
I can provide direct answers to queries by searching on consumed, verified and validated information
8. SEARCH ENGINE POINT OF VIEW
I can even aggregate answers or deduce them (like a timeline of events)
9. SEARCH ENGINE POINT OF VIEW
I can detect relevancy signals, i.e. what content to show to what audience.
I can use it to assist in interpreting a user query.
I can even use it in conjunction with machine learning techniques, e.g. to train other components.
(Image: Penn Treebank tagset)
10. SEARCH ENGINE POINT OF VIEW
Really interesting in terms of exposing long tail content too. It makes things findable for me when pages are published with structured markup!
I meant the beer brewer in Arizona
11. SEARCH ENGINE POINT OF VIEW
I could really use this stuff. And it is like the Tower of Babel out there!
Multiple conflicting vocabularies that I will have to align internally, and multiple syntax formats as well.
• Microdata
• Microformats
• RDFa
• Goodrelations for e-commerce
I’m a Search Engine Robot
Prior to Schema.org
13. What has been the history?
RDFa exploded in 2012 – Source Peter Mika - Yahoo
Chart: Percentage of URLs with embedded metadata in various formats
• Five-fold increase between March, 2009 and October, 2010
• Another five-fold increase between October 2010 and January, 2012
14. Current state of metadata on the Web
• 31% of webpages, 5% of domains contain some metadata
– Analysis of the Bing Crawl (US crawl, January, 2012)
– RDFa is most common format
• By URL: 25% RDFa, 7% microdata, 9% microformat
• By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat
– Adoption is stronger among large publishers
• Especially for RDFa and microdata
• See also
– P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
– H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
15. What’s been the History
Linked Open Data exploded from 2007 thru 2010
Oct 2007
Nov 2007
16. What’s been the History
Linked Open Data exploded from 2007 thru 2010
Sept 2008
March 2009
17. What’s been the History
Linked Open Data exploded from 2007 thru 2010
Sept 2010
[Figure: LOD Cloud diagram of interlinked datasets, as of September 2010. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
18. Timeline of RDFa and Semantic Web Adoption
As of Semtech 2011
Inevitable passage of Semantic Web adoption, culminating in schema.org
19. SEARCH ENGINE POINT OF VIEW
A Search Engine alliance has the power to MANDATE vocabulary and syntax!
Align and consume many vocabularies that may not be of interest to search engines?
Rather mandate vocabulary And Syntax - microdata
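As a rough illustration of why one mandated vocabulary and syntax is easier to consume, the sketch below reads schema.org microdata from a page fragment using only the Python standard library. The HTML fragment, the business name, and the `FlatMicrodataParser` and `extract_item` names are hypothetical, invented for this example. It handles a single, non-nested item only and is not how any search engine actually parses pages.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with schema.org microdata.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/LocalBusiness">
  <span itemprop="name">Example Brewery</span>
  <span itemprop="addressRegion">AZ</span>
  <a itemprop="url" href="http://example.com/">Website</a>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects itemprop values from a single, non-nested microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # property waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the URL, not the link text.
            self.properties[prop] = attrs["href"]
        else:
            self._current_prop = prop

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None


def extract_item(html):
    """Return (item_type, properties) for the first item in the fragment."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, props = extract_item(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"  {name}: {value}")
```

With one agreed vocabulary, the consumer needs no per-site alignment step: the property names it reads (`name`, `addressRegion`, `url`) already mean the same thing on every page that uses them.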
23. SEARCH ENGINE POINT OF VIEW
Ensure your data feeds match information with the structured markup or “metadata” on your web pages.
Make sure you are not cloaking by feeding one set of information to me and another to human users!
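One way to act on this advice is to compare each feed record with the properties published on the matching page before submitting the feed. The sketch below is a minimal version of such a check. The field names, the sample values, and the `find_mismatches` helper are assumptions made for illustration and do not correspond to any specific feed format or search engine tool.

```python
def _normalize(value):
    """Collapse whitespace and ignore case so cosmetic differences do not count."""
    return " ".join(str(value).split()).casefold()


def find_mismatches(feed_record, page_properties):
    """Return {field: (feed_value, page_value)} for every feed field that is
    missing from the page markup or disagrees with it."""
    mismatches = {}
    for field, feed_value in feed_record.items():
        page_value = page_properties.get(field)
        if page_value is None or _normalize(feed_value) != _normalize(page_value):
            mismatches[field] = (feed_value, page_value)
    return mismatches


# Hypothetical data: one record from a product feed and the properties
# extracted from the structured markup on the corresponding page.
FEED_RECORD = {"name": "Blue Widget", "price": "19.99", "availability": "InStock"}
PAGE_PROPERTIES = {"name": "blue  widget", "price": "24.99"}

if __name__ == "__main__":
    for field, (feed_value, page_value) in find_mismatches(
        FEED_RECORD, PAGE_PROPERTIES
    ).items():
        print(f"{field}: feed={feed_value!r} page={page_value!r}")
```

In this sample the name matches after normalization, while the price differs and the availability is absent from the page, so both would be flagged for review.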
24. SEARCH ENGINE POINT OF VIEW
Serving RELEVANT ANSWERS is IMPERATIVE and central to my very being!
27. SEARCH ENGINE POINT OF VIEW
Adding context in search verticals really helps me serve up relevant information (seriously increases my recall), as does geospatial information.
Google’s “SearchVerticals”
Notice any correlations? I would advise you to!
Consumed information - Structured Data Dashboard
28. SEARCH ENGINE POINT OF VIEW
“Amazing fact: same amount of computing to answer one Google Search query as all the computing done -- in flight and on the ground -- for the entire Apollo program!”
I also have a pretty good understanding of big data and web intelligence so I can leverage them!
SIRI
OH! And be sure to check out Moore's law
29. SEARCH ENGINE POINT OF VIEW
I can combine it with computer vision techniques.
I can leverage metadata for better image search
SIRI
I can enhance user’s shopping experience.
30.
31. SEARCH ENGINE POINT OF VIEW
Symbolic reasoning vs stochastic reasoning (Latter is more like NLP or page rank)
Know rather than Recognize?
INTRODUCING THE KNOWLEDGE GRAPH
32. SEARCH ENGINE POINT OF VIEW
And if you thought the knowledge graph was cool, check out the knowledge carousel!
Talk of increase in screen real estate and CTR?
33. SEARCH ENGINE POINT OF VIEW
Thank you for your time!
And just a by-the-by, this technology is still in its nascent stages. Can you imagine what I will be able to do soon?
Resources to help you! Make sure to use them wisely!
34. Resources at this point in time
Caveat: Some training may be required for some of the tools
Programming Languages:
– JavaScript: Microdatajs, Live microdata
– Php: Microdataphp
– Ruby: RDF Microdata
– RDF Lib plugin
– Perl/Ruby: RDF Microdata Gem, Mida
– Java: Sindice any23 library
Form Based tools:
– Schema Creator
– Microdata generator
Standalone tools:
– Web.instadata
Editors:
– Topbraid Composer
– Protege
Publishing Platforms:
– Drupal
– Joomla
– Wordpress (about 7 of them)
– Virtuoso
– Topbraid Composer
Validators, Testers and More:
– Check.rdfa.info
– Sindice Inspector
– Rich Snippets Testing Tool
– Bing Validator
– Structured data Linter
– Online Parser/viewer and RSS generator
– Validator.nu
– Google Structured Data Tester
35. Resources at this point in time
Goodrelations: Resources, generators, validators, more, ….
38. Other Semantic Web Resources
Caveat: Some training may be required for some of the tools
OpenCalais – Can extract information about people, places and things
AlchemyAPI – named entity extraction, topic recognition, keyword tagging, more ….
Cogito – Expert System
Franz Inc. – Gruff
Many More….
Barbara Starr
Twitter: @BarbaraStarr
Email: bstarr@Ontologica.us
For more info contact: Linkedin: http://www.linkedin.com/in/barbarastarr