Exploring our world with freebase

I gave this talk on Oct 2 at the Semantic Technology and Business conference. In this talk I discuss how I process Freebase data with the open source Infovore framework, which processes Freebase and other RDF data quickly by using Hadoop, Map/Reduce, and Amazon Web Services.


  1. 1. Exploring Our World With Freebase Paul Houle paul@ontology2.com
  2. 2. Generic Databases
  3. 3. Where does the data come from? Copyright 2009 CC-BY by Richard Heaven. Robot Image Copyright 2007 CC-BY by Crispin Summers
  4. 4. Google Knowledge Graph
  5. 5. The Wikipedia Data Ecosystem
  6. 6. API RDF Dereferencing Quad Dump Simple Topic Dump Type Tables
  7. 7. MQL { "status": "200 OK", "code": "/api/status/ok", "result": { "type": "/music/artist", "name": "The Police", "album": [ "Outlandos d'Amour", "Reggatta de Blanc", "Zenyatta Mondatta", "Ghost in the Machine", "Synchronicity" ] } }
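
A minimal sketch of reading the response envelope shown on the MQL slide, in Python. The JSON text is copied from the slide; the helper name `albums_from_response` is hypothetical, and no network call is made, since the sketch only parses a response that has already been received.

```python
import json

# Response envelope copied from the MQL slide.
RESPONSE = """
{
  "status": "200 OK",
  "code": "/api/status/ok",
  "result": {
    "type": "/music/artist",
    "name": "The Police",
    "album": [
      "Outlandos d'Amour",
      "Reggatta de Blanc",
      "Zenyatta Mondatta",
      "Ghost in the Machine",
      "Synchronicity"
    ]
  }
}
"""


def albums_from_response(text):
    """Return the album list, or raise if the envelope reports an error."""
    envelope = json.loads(text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError("MQL request failed: %s" % envelope.get("status"))
    return envelope["result"]["album"]


if __name__ == "__main__":
    for title in albums_from_response(RESPONSE):
        print(title)
```
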
  8. 8. My path to the semantic web
  9. 9. My path to the semantic web
  10. 10. My path to the semantic web
  11. 11. Infovore 1 Quad Dump Simple Topic Dump :BaseKB Pro :BaseKB Lite
  12. 12. Spring 2012
  13. 13. Fall 2012 Quad Dump Official RDF Dump Infovore 1.0 released as open source under Apache License
  14. 14. 13+ million Invalid Facts Image cc-by from arj03
  15. 15. Infovore 1.0 Quad Dump -> RDF Infovore 1.1 General RDF Cleanup & Filtering Millipede framework – Map/Reduce on a single computer
  16. 16. Infovore 2
  17. 17. What does Freebase cover?
  18. 18. Is it a bibliographic database?
  19. 19. Ahead of their time? Reading Room, Library of Congress
  20. 20. MARC… in electronic form since 1969! First standard data format with variable length fields & I18N.
  21. 21. Now everybody has a bibliographic database…
  22. 22. Or, do documents annotate the world?
  23. 23. Social Semantic Systems Linked Data User-Generated Content
  24. 24. The dominant paradigm Triple store
  25. 25. How to break your triple store http://gen5.info/q/2009/02/25/putting-freebase-in-a-star-schema/
  26. 26. The RDF data warehouse ETL warehouse operations development science
  27. 27. The RDF data warehouse II warehouse Operations tools Science Tools
  28. 28. Latency: low is not low enough
  29. 29. operations development science
  30. 30. Tools Popular With :BaseKB Users (bar chart, axis ticks 0 to 60): Freebase; DBpedia; any relational database; machine learning; Jena; Amazon Web Services; PHP; map/reduce frameworks (ex. Hadoop); MongoDB; Sesame; Virtuoso OpenLink; other NoSQL database; Solid State Drives (SSD); other cloud computing service; Neo4J; Ruby; Drupal; alternative JVM languages (ex. Scala or Clojure); other triple store; any key/value store (ex. JDBM or Berkeley DB); OWLIM; Allegrograph; 4store; Factual; dotNetRDF; Stardog; Kasabi/Talis Platform; Oracle Spatial RDF
  31. 31. Map/Reduce Inputs Mappers Shuffle Sort Reducers Output
  32. 32. RDF: Reduction on Subject :Goat :Bear :Alligator :Iguana :Dog :Elephant :Cat :Horse :Fox :Alligator :Dog :Goat :Bear :Elephant :Horse :Cat :Fox :Iguana
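
The two slides above (the Map/Reduce pipeline and "Reduction on Subject") can be illustrated with a small single-process sketch in Python. This is not Hadoop and not Infovore code: it only mimics the map, shuffle/sort, and reduce stages in memory. The predicates and objects in the sample data are made up for illustration; only the subject names come from the slide.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(triples):
    """Mapper: emit (key, value) pairs keyed on the subject."""
    for s, p, o in triples:
        yield s, (p, o)


def shuffle_and_sort(pairs):
    """Shuffle/sort: bring equal keys next to each other.

    sorted() is stable, so values keep their input order within a key.
    """
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs):
    """Reducer: see all values for one subject at once."""
    for subject, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield subject, [value for _, value in group]


def reduce_on_subject(triples):
    return dict(reduce_phase(shuffle_and_sort(map_phase(triples))))


# Hypothetical sample facts; only the subject names appear on the slide.
SAMPLE = [
    (":Goat", ":eats", ":Grass"),
    (":Dog", ":chases", ":Cat"),
    (":Alligator", ":livesIn", ":Swamp"),
    (":Dog", ":a", ":Mammal"),
    (":Goat", ":a", ":Mammal"),
]

if __name__ == "__main__":
    for subject, facts in reduce_on_subject(SAMPLE).items():
        print(subject, facts)
```

Grouping on the subject means a reducer sees every fact about one topic together, which is what makes per-topic processing of a large dump practical.
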
  33. 33. Jena Framework SDB Relational db-based Triple store TDB Native disk-based triple store Model In-memory triple store “We use Jena Models like PHP programmers use hashtables” -- Kendall Clark, Clark and Parsia
  34. 34. Hadoop Physical Architecture Namenode Jobtracker Datanodes & Tasktrackers HDFS
  35. 35. My development cluster – Namenode/JobTracker
  36. 36. Hadoop tolerates Hardware failures
  37. 37. My other computer is
  38. 38. Amazon Elastic Map/Reduce Amazon S3 (Permanent Storage)
  39. 39. “It’s harder to make up names for things than to invent them” - Tom Swift Fictional American Inventor
  40. 40. Infovore modules bakemono haruhi centipede chopper
  41. 41. Bakemono Super JAR
  42. 42. Bakemono Super JAR Contains applications like freebaseRDFPrefilter pse3 ranSample sieve3 Named after Japanese word for “monsters”
  43. 43. “Haruhi” (1) Japanese religious word for “Full of Spirit” ; (2) a very dominant person
  44. 44. Unpacking the Freebase RDF Dump photograph Copyright 2010 Ian Munroe CC-BY SA
  45. 45. Eliminate Bulk Up Front BIG DATA
  46. 46. Eliminate Bulk Up Front DATA
  47. 47. Inputs Mappers
  48. 48. freebaseRDFPrefilter removes… Wasteful Facts • 120M+ copies of the “a” predicate • 60M+ access control predicates Violent and Dangerous facts ns:common.topic ns:type.type.instance ?o . Is repeated 30M times, and if you group on ?s and keep them in memory…
  49. 49. … uneven bin distribution … 330 331 332 333 334 335 … …
  50. 50. Prefiltering stops memory exhaustion before it happens!
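
A rough sketch of the prefiltering idea from the three slides above, in Python rather than as a Hadoop mapper. It is not the actual freebaseRDFPrefilter. The predicate `ns:type.type.instance` is named on the slide; the second entry in the blocklist is a hypothetical placeholder, because the slides do not name the access control predicates. The point is that filtering happens line by line, before anything is grouped on `?s` and held in memory.

```python
# Predicates to drop before any grouping happens.
DROPPED_PREDICATES = frozenset([
    "ns:type.type.instance",       # named on the slide
    "ns:example.access_control",   # hypothetical placeholder, not a real predicate
])


def prefilter(lines, dropped=DROPPED_PREDICATES):
    """Yield only the lines whose predicate is not in the blocklist.

    Works as a streaming generator, so memory use does not depend on
    how many facts share one subject.
    """
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in dropped:
            continue
        # Lines that do not split into three parts are passed through;
        # validating them is left to a later stage.
        yield line


if __name__ == "__main__":
    sample = [
        "ns:common.topic ns:type.type.instance ns:m.010bs8 .",
        "ns:m.02qvftw rdf:type ns:business.employer .",
    ]
    for kept in prefilter(sample):
        print(kept)
```
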
  51. 51. Parallel Super Eyeball “triples” valid triples junk Currently, 250,000 or so triples in Freebase are rejected by PSE3
  52. 52. Parallel Super Eyeball 3
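
The accept/reject split that Parallel Super Eyeball performs can be sketched as below. This is a deliberately simplified stand-in written for this transcript: it checks each line against a loose regular expression instead of running a real RDF parser, so it accepts and rejects different inputs than PSE3 would. All function and pattern names are hypothetical.

```python
import re

# Simplified token patterns; a real RDF parser is much stricter than this.
IRI = r'<[^<>"\s]+>'
PNAME = r'[A-Za-z][\w-]*:[\w.\-]+'
NODE = '(?:' + IRI + '|' + PNAME + ')'
STRING = r'"(?:[^"\\]|\\.)*"'
LITERAL = (
    '(?:' + STRING + r'(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + NODE + r')?'
    + r'|[+-]?\d+(?:\.\d+)?)'
)
TRIPLE = re.compile(
    r'^\s*' + NODE + r'\s+' + NODE + r'\s+(?:' + NODE + '|' + LITERAL + r')\s*\.\s*$'
)


def split_triples(lines):
    """Return (accepted, rejected) lists; rejected lines are kept for inspection."""
    accepted, rejected = [], []
    for line in lines:
        (accepted if TRIPLE.match(line) else rejected).append(line)
    return accepted, rejected


if __name__ == "__main__":
    ok, junk = split_triples([
        'ns:m.02qvftw rdf:type ns:business.employer .',
        'ns:m.010005 key:wikipedia.pt "Corinth_$0028Texas$0029 .',  # unterminated string
    ])
    print(len(ok), "accepted;", len(junk), "rejected")
```

Keeping the rejected lines instead of discarding them matches the approach described later in the deck, where rejected facts are collected for inspection.
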
  53. 53. Sieve3 literal facts (ex. ?s ?p 55. ) ?s :a ?p . ?s ?p ns:some_topic . ?s rdfs:label ?o .
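
One way to picture the sieve is as an ordered list of rules, where each triple lands in the bin of the first rule it matches. The sketch below assumes that first-match-wins behavior and a particular rule order; neither is stated on the slide. The bin names follow the partition names used in the horizontal decomposition slides, and the matching functions are simplifications.

```python
# Ordered rules: (bin name, predicate over (s, p, o)). First match wins.
# The ordering and the matching logic are assumptions made for this sketch.
RULES = [
    ("a", lambda s, p, o: p in ("a", "rdf:type")),
    ("label", lambda s, p, o: p == "rdfs:label"),
    ("description", lambda s, p, o: p == "ns:common.topic.description"),
    ("links", lambda s, p, o: o.startswith("ns:")),
]


def sieve(triples):
    """Partition triples into named bins; unmatched triples go to 'other'."""
    bins = {}
    for triple in triples:
        name = next((n for n, matches in RULES if matches(*triple)), "other")
        bins.setdefault(name, []).append(triple)
    return bins


if __name__ == "__main__":
    sample = [
        ("ns:m.02qvftw", "rdf:type", "ns:business.employer"),
        ("ns:american_football.football_division", "rdfs:label",
         '"American football division"@en'),
        ("ns:m.0100007", "ns:common.topic.notable_types", "ns:m.0kpv11"),
        ("ns:m.1", "ns:example.height", "55"),  # hypothetical literal fact
    ]
    for name, members in sieve(sample).items():
        print(name, len(members))
```

Rule order matters here: the type rule has to come before the links rule, because a type assertion also has a topic-like object.
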
  54. 54. Horizontal Decomposition of Freebase
  55. 55. Percentage of gz compressed size: a 5%, description 18%, key 11%, keyNs 13%, label 6%, name 6%, notability 0%, nfp 0%, text 8%, web 6%, links 20%, other 7%
  56. 56. Percentage of facts: a 16%, description 1%, key 9%, keyNs 11%, label 6%, name 6%, notability 2%, nfp 2%, text 0%, web 5%, links 32%, other 10%
  57. 57. Percentage of uncompressed size: a 15%, description 7%, key 8%, keyNs 9%, label 4%, name 4%, notability 2%, nfp 1%, text 3%, web 6%, links 30%, other 11%
  58. 58. rdf:type aka “a”: 16% of facts, 15% of bytes, 5% of compressed bytes. ns:m.02qvftw rdf:type ns:business.employer .
  59. 59. RDFS Inference :a :Actor ?
  60. 60. RDFS Inference Jesse Plemons Todd
  61. 61. :a :Actor . Jesse Plemons Todd implies
  62. 62. Descriptions 1% facts 18% bytes 7% compressed
  63. 63. Descriptions ns:m.010bfy ns:common.topic.description "Riverside \u00E9 uma cidade localizada no estado norte-americano de Texas, no Condado de Walker."@pt . ns:m.010bs8 ns:common.topic.description "El Campo is a city in Wharton County, Texas, United States. The population was 10,945 at the 2000 census, making it the largest city in Wharton County."@en .
  64. 64. Descriptions ns:m.010bfy ns:common.topic.description "Riverside \u00E9 uma cidade localizada no estado norte-americano de Texas, no Condado de Walker."@pt . ns:m.010bs8 ns:common.topic.description "El Campo is a city in Wharton County, Texas, United States. The population was 10,945 at the 2000 census, making it the largest city in Wharton County."@en . This does not compute!
  65. 65. Descriptions ns:m.010bfy ns:common.topic.description "Riverside \u00E9 uma cidade localizada no estado norte-americano de Texas, no Condado de Walker."@pt . ns:m.010bs8 ns:common.topic.description "El Campo is a city in Wharton County, Texas, United States. The population was 10,945 at the 2000 census, making it the largest city in Wharton County."@en .
  66. 66. Labels and Names ns:american_football.football_division rdfs:label "American football division"@en . ns:american_football.football_conference rdfs:label "Grupper inom amerikansk fotboll"@sv . ns:american_football.football_player ns:type.object.name "Football-Spieler"@de . ns:american_football.football_team ns:type.object.name "American football-team"@nl .
  67. 67. Freebase Labels Are Not Unique
  68. 68. DBpedia Labels Are Unique
  69. 69. https://github.com/paulhoule/infovore/wiki https://groups.google.com/forum/#!forum/infovore-basekb
  70. 70. Keys in the Freebase dump • Most objects represented by mid identifiers
  71. 71. Keys in the Freebase dump • Schema objects have friendly identifiers
  72. 72. Keys in the Freebase dump
  73. 73. Examples… ns:m.010bs8 ns:common.topic.description "El Campo is a city in Wharton County, Texas, United States. The population was 10,945 at the 2000 census, making it the largest city in Wharton County."@en . ns:american_football.football_division rdfs:label "American football division"@en . Freebase always uses the same key in the ?s, ?p, and ?o fields, but...
  74. 74. It wasn’t always this way … the old quad dump used mids in the subject field, but others in the destination field …
  75. 75. Turtle0 Turtle1 Turtle2 Turtle3 Extract namespace graph Convert all identifiers to mids Extract type information from schema Convert to RDF types :BaseKB 2012
  76. 76. Freebase Knows Many Keys ns:g.11vk55hmr ns:type.object.key "/base/dspl/us_census/population/place" . ns:m.010004m ns:type.object.key "/authority/musicbrainz/339a2897-9ba4-4820-a2a8-f234c22608a4" . ns:m.01003_ ns:type.object.key "/wikipedia/de/Krum_$0028Texas$0029" . ns:m.01010d ns:type.object.key "/wikipedia/en_id/135860" . ns:m.0100_b ns:type.object.key "/authority/gnis/1352653" . ns:m.0100l2 ns:type.object.key "/authority/hud/countyplace/4814101390" . ns:m.01031l ns:type.object.key "/en/chandler_texas" . ns:m.015g9m ns:type.object.key "/en/aliens_from_space" . ns:m.015gdl ns:type.object.key "/en/self-publishing" . ns:m.015gjr ns:type.object.key "/authority/nndb/231$002F000085973" . … and type.object.key spells them out …
  77. 77. A directed acyclic graph /m/01 root /m/019s wikipedia /m/047w32v authority /m/0gt9 en /m/05x_rjr Geoff_Simmons /wikipedia/en/Geoff_Simmons = /authority/wikipedia/en/Geoff_Simmons
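
The namespace graph on the slide above can be sketched as a lookup table of (namespace, key) edges, with a path resolved by walking from the root. The mids and key names are taken from the slide; the exact set of edges is one reading of the diagram, inferred from the stated equality of the two paths, so treat it as an assumption. The function name `resolve` is hypothetical.

```python
ROOT = "/m/01"

# (namespace mid, key) -> mid of the object the key points at.
# Edges are inferred from the slide and are an assumption of this sketch.
NAMESPACE_EDGES = {
    ("/m/01", "wikipedia"): "/m/019s",
    ("/m/01", "authority"): "/m/047w32v",
    ("/m/047w32v", "wikipedia"): "/m/019s",
    ("/m/019s", "en"): "/m/0gt9",
    ("/m/0gt9", "Geoff_Simmons"): "/m/05x_rjr",
}


def resolve(path):
    """Follow a key path such as /wikipedia/en/Geoff_Simmons from the root."""
    node = ROOT
    for key in path.strip("/").split("/"):
        node = NAMESPACE_EDGES[(node, key)]
    return node


if __name__ == "__main__":
    print(resolve("/wikipedia/en/Geoff_Simmons"))
    print(resolve("/authority/wikipedia/en/Geoff_Simmons"))
```

Because the wikipedia namespace is reachable both from the root and from the authority namespace, two different paths end at the same mid, which is why the structure is a graph and not a tree.
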
  78. 78. key: namespace encodes the graph ns:m.010005 key:wikipedia.pt "Corinth_$0028Texas$0029" . ns:m.010005h key:authority.musicbrainz "ab0b82ce-d1be-4641-b0d1-838896a25887" .
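
Keys such as `Corinth_$0028Texas$0029` escape characters as `$` followed by four hexadecimal digits of the code point. A minimal decoder in Python might look like the following; the function name `decode_key` is hypothetical, and the sketch handles decoding only.

```python
import re

# "$" followed by exactly four hex digits encodes one character.
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(key):
    """Turn $XXXX escapes in a Freebase key back into characters."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


if __name__ == "__main__":
    print(decode_key("Corinth_$0028Texas$0029"))
    print(decode_key("231$002F000085973"))
```
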
  79. 79. Useful external keys
  80. 80. Music
  81. 81. http://www.freebase.com/authority/musicbrainz/e217a1e9-9ec8-4e88-aebc-7d6b720384c1
  82. 82. Musical Composition … Recording “Recording appears on Album as track #”
  83. 83. Functional Requirements For Bibliographic Records (FRBR)
  84. 84. Nick Hexum Rap Rock 311 Omaha, NE Los Angeles, CA
  85. 85. Unique data in DBpedia
  86. 86. Wikipedia Categories
  87. 87. Wikipedia Page Links
  88. 88. “Smushing” dbpedia:Striated_Heron :linksTo dbpedia:Heron . dbpedia:Striated_Heron owl:sameAs ns:m.01v7dp . dbpedia:Heron owl:sameAs ns:m.01jgnh . ns:m.01v7dp :linksTo ns:m.01jgnh .
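
The smushing step on the slide above amounts to rewriting identifiers through the `owl:sameAs` statements. A small sketch, assuming each DBpedia identifier maps to exactly one Freebase mid and that there are no chains of `owl:sameAs` to follow; the function name `smush` is hypothetical.

```python
def smush(triples):
    """Rewrite subjects and objects to their owl:sameAs targets.

    Assumes one-to-one sameAs links with no chains; the sameAs
    statements themselves are dropped from the output.
    """
    canonical = {s: o for s, p, o in triples if p == "owl:sameAs"}

    def rename(node):
        return canonical.get(node, node)

    return [
        (rename(s), p, rename(o))
        for s, p, o in triples
        if p != "owl:sameAs"
    ]


# Data from the slide.
SLIDE_TRIPLES = [
    ("dbpedia:Striated_Heron", ":linksTo", "dbpedia:Heron"),
    ("dbpedia:Striated_Heron", "owl:sameAs", "ns:m.01v7dp"),
    ("dbpedia:Heron", "owl:sameAs", "ns:m.01jgnh"),
]

if __name__ == "__main__":
    for triple in smush(SLIDE_TRIPLES):
        print(" ".join(triple), ".")
```
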
  89. 89. Duck Types • ?a performed on music track ?b - ?a is a musician
  90. 90. Duck Types • ?a employed ?b - ?a is an employer
  91. 91. Duck Types • Book ?a was written about ?b – ?b is a book subject
  92. 92. The Problem of Notability
  93. 93. ns:m.0100007 ns:common.topic.notable_types ns:m.0kpv11. ns:m.01000_r ns:common.topic.notable_types ns:m.0kpv11. ns:m.01000dh ns:common.topic.notable_types ns:m.09jd9nh. ns:m.01000pp ns:common.topic.notable_types ns:m.09jd9nh. ns:m.01000px ns:common.topic.notable_types ns:m.0kpv11. ns:m.01000w ns:common.topic.notable_types ns:m.01m9. ns:m.01000yk ns:common.topic.notable_types ns:m.0kpv11. ns:m.010012t ns:common.topic.notable_types ns:m.0kpv11. ns:m.010014_ ns:common.topic.notable_types ns:m.09jd9nh. ns:m.010019c ns:common.topic.notable_types ns:m.09jd9nh.
  94. 94. Analysis with Chopper and Pig
  95. 95. Why APIs suck (Including SPARQL endpoints) • Provider can afford maximum $/query • If you need a more complex query you’ve got no option!
  96. 96. :BaseKB Now YOU AWS S3
  97. 97. Cluster creation made easy :BaseKB Now
  98. 98. Pig Script – count common types $ pig grunt> run chopper/src/main/pig/lib/chopper.pig grunt> a = LOAD '/freebase/20130915/a/' USING com.ontology2.chopper.io.PrimitiveTripleInput(); grunt> oNodes = FOREACH a GENERATE o; grunt> groupNodes = GROUP oNodes BY o; grunt> countedNodes = FOREACH groupNodes GENERATE group AS uri:chararray,COUNT(oNodes) AS cnt:long; grunt> sortedNodes = ORDER countedNodes BY cnt DESC; grunt> top100 = LIMIT sortedNodes 100; grunt> DUMP top100;
  99. 99. Most frequent types (<http://rdf.basekb.com/ns/common.topic>,39030195) (<http://rdf.basekb.com/ns/common.notable_for>,18747254) (<http://rdf.basekb.com/ns/music.release_track>,13304261) (<http://rdf.basekb.com/ns/music.recording>,8902041) (<http://rdf.basekb.com/ns/music.single>,6297869) (<http://rdf.basekb.com/ns/common.document>,5580077) (<http://rdf.basekb.com/ns/media_common.cataloged_instance>,3030634) (<http://rdf.basekb.com/ns/book.book_edition>,2771323) (<http://rdf.basekb.com/ns/people.person>,2742157) (<http://rdf.basekb.com/ns/type.namespace>,2689781) (<http://rdf.basekb.com/ns/book.isbn>,2601099) (<http://rdf.basekb.com/ns/type.content>,2499648) (<http://rdf.basekb.com/ns/measurement_unit.dated_integer>,2466557)
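
For readers without a Pig installation, the same group-and-count logic can be tried in plain Python on a small in-memory sample. This is only an illustration of what the script computes, not a replacement for running it on a cluster; the sample triples are made up, and `count_types` is a hypothetical name.

```python
from collections import Counter


def count_types(type_triples, limit=100):
    """Count how often each object occurs and return the most frequent first.

    Mirrors the Pig steps: project o, group by o, count, order descending, limit.
    """
    counts = Counter(o for _s, _p, o in type_triples)
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample of type assertions; real input is the "a" partition.
    sample = [
        ("ns:m.1", "a", "ns:common.topic"),
        ("ns:m.2", "a", "ns:common.topic"),
        ("ns:m.2", "a", "ns:people.person"),
        ("ns:m.3", "a", "ns:common.topic"),
    ]
    for uri, cnt in count_types(sample):
        print((uri, cnt))
```
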
  100. 100. Compound Value Types and our 4D world
  101. 101. The 13th most prevalent type (<http://rdf.basekb.com/ns/common.topic>,39030195) (<http://rdf.basekb.com/ns/common.notable_for>,18747254) (<http://rdf.basekb.com/ns/music.release_track>,13304261) (<http://rdf.basekb.com/ns/music.recording>,8902041) (<http://rdf.basekb.com/ns/music.single>,6297869) (<http://rdf.basekb.com/ns/common.document>,5580077) (<http://rdf.basekb.com/ns/media_common.cataloged_instance>,3030634) (<http://rdf.basekb.com/ns/book.book_edition>,2771323) (<http://rdf.basekb.com/ns/people.person>,2742157) (<http://rdf.basekb.com/ns/type.namespace>,2689781) (<http://rdf.basekb.com/ns/book.isbn>,2601099) (<http://rdf.basekb.com/ns/type.content>,2499648) (<http://rdf.basekb.com/ns/measurement_unit.dated_integer>,2466557)
  102. 102. :Las_Vegas 945 1910 :US_Census_Bureau population number date source
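
A compound value type bundles several values into one intermediate node, which is how a fact like "Las Vegas had a population of 945 in 1910 according to the US Census Bureau" fits into triples. The sketch below models that as a small record and flattens it into triples through a blank node. The property names come from the slide; the class name `DatedInteger` echoes the `measurement_unit.dated_integer` type, and the blank node label is arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatedInteger:
    """One compound value: a number, the date it applies to, and its source."""
    number: int
    date: int                      # year
    source: Optional[str] = None   # not every value on the slides shows a source


def to_triples(subject, prop, value, node_id):
    """Flatten one compound value into triples hanging off a blank node."""
    node = "_:" + node_id
    triples = [
        (subject, prop, node),
        (node, ":number", value.number),
        (node, ":date", value.date),
    ]
    if value.source is not None:
        triples.append((node, ":source", value.source))
    return triples


if __name__ == "__main__":
    fact = DatedInteger(945, 1910, ":US_Census_Bureau")
    for triple in to_triples(":Las_Vegas", ":population", fact, "b0"):
        print(triple)
```
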
  103. 103. Population of Las Vegas, NV (chart, by year): 1900: 25; 1910: 945; 1920: 2,304; 1930: 5,165; 1940: 8,422; 1950: 24,624; 1960: 64,405; 1970: 125,787; 1980: 164,674; 1990: 260,561; 1991: 284,931; 1992: 297,326; 1993: 312,634; 1994: 336,380; 1995: 354,559; 1996: 372,849; 1997: 391,074; 1998: 405,245; 1999: 418,658; 2000: 484,487; 2001: 498,638; 2002: 507,219; 2003: 516,723; 2004: 534,168; 2005: 544,806; 2006: 552,855; 2007: 559,892; 2008: 562,849; 2009: 567,641; 2010: 584,539; 2011: 589,317
  104. 104. Vertical Divisions of Freebase Wikipedia Topics Movies and Television Travel and Lodging :BaseKB Lite
  105. 105. Separating Blank Nodes
  106. 106. Separating Blank Nodes
  107. 107. Separating Blank Nodes
  108. 108. Separating Blank Nodes
  109. 109. :BaseKB Now • Created Weekly by automated process • Delivered to AMZN S3 • Accepted facts are 100% Valid RDF • Rejected facts collected for inspection • “Violent” predicates removed to fight skew • Horizontally divided for fast processing http://basekb.com/
  110. 110. Infovore Software http://github.com/paulhoule/infovore/wiki