
Broad Data

In this talk I compare "Broad" data, the idea of thousands of datas

Published in: Technology, Education
3 Comments
  • Oh, you know what Facebook will do in September 2012, eh? Btw, I often get the feeling that various Semantic Web/Linked Data etc. protagonists somehow doom the upper parts of the Semantic Web technology stack (whatever ;) ). However, they are also very important. Of course, could/should start with the easier parts, but propagate the impression the other parts are overly complex or not useful. I guess, appliers will notice when they will need those bits too ;)
  • thanks for catching that - obviously I meant data.gouv.fr - will fix in next version
  • For the french side, maybe you were talking about www.data.gouv.fr?

Slide notes:
  • http://www.mkbergman.com/458/new-currents-in-the-deep-web/ http://academics.smcvt.edu/sburks/Definition_search_engine.htm
  • Broad Data

    1. Broad Data
       Jim Hendler, Tetherless World Constellation
       Tetherless World Professor of Computer and Cognitive Science
       Director, Information Technology and Web Science Program
       Rensselaer Polytechnic Institute
       http://www.cs.rpi.edu/~hendler
       @jahendler (Twitter)
    2. Outline (if I stick to it)
       • Big Data ≠ Broad Data
       • Broad Data problem
       • Broad Data example
         • Open Government Data
       • Broad Data challenges
       • How can you make money off this stuff?
    3. BIG Data
       • The term “Big Data” is widely used nowadays
         • 3 main contexts
           • The large data collections of “big science” projects
           • The data holdings of a Google, Facebook, or other large Web company
           • The enterprise data of large, non-Web-based companies (IBM, TATA, etc.)
    4. Big Data Challenge: Scaling
       • Most of the focus of (current) Big Data research is on scaling (traditional) database-related technologies
         • Schema modeling
         • Data warehousing
         • Data mining
         • Statistical analysis
         • Mathematical analytics
         • …
    5. How BIG is Big?
       • Science uses some extremely large databases, and many of them are crucial to society
         • Petabytes of data
       • World Wide Web data is also extremely large
         • With the primary resources to explore it held by companies
           • e.g., Facebook
             • 25 terabytes of logged data per day; valuation $100B?
           • e.g., Google
             • In 2008 it was estimated at 20 petabytes per day (not including YouTube); 2010 valuation >$190B
    6. Big Data
       Facebook generates terabytes of data per day. What could be learned from this?
    7. BIG Data
       Google uses its data in many ways: Search => ads => user
    8. Big Data is becoming different on the Web
       • New work
         • is moving away from traditional relational models
           • cf. NoSQL
         • is moving towards third-party applications and extensions
           • cf. mobile apps for local governments
         • includes a focus on interoperability and exchange with “lightweight” semantics
           • using ideas from the Semantic Web
             • Search: Schema.org
             • Social networking: OGP
    9. BROAD data
       • 4th context: Broad Data
         • The huge amount of freely available, but widely varied, Open Data on the World Wide Web (structured and semi-structured)
           • Example: the extended Facebook OGP graph (the part outside Facebook’s datasets)
           • Example: the growing linked open data cloud of freely available RDF linked data
           • Example: more than 710,000 datasets available free on the Web from governments around the world
    10. Example: adding “Breadth” (April 2010)
    11. Facebook’s Open Graph Protocol
       • Facebook now allows other sites to extend the graph
       • The Open Graph Protocol uses RDFa to let Web sites contain information about the things people “like”
           • og:title - The title of your object as it should appear within the graph, e.g., “The Rock”.
           • og:type - The type of your object, e.g., “movie”. Depending on the type you specify, other properties may also be required.
           • og:image - An image URL that should represent your object within the graph.
           • og:url - The canonical URL of your object that will be used as its permanent ID in the graph.
           • og:description - A one- to two-sentence description of your object.
           • og:site_name - If your object is part of a larger Web site, the name that should be displayed for the overall site, e.g., “IMDb”.
         • Not a traditional “ontology”
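In practice, the og: properties listed above are usually published as `<meta property="og:..." content="...">` tags in a page's `<head>`. The following is a minimal sketch of how a consumer might collect them. It assumes that tag form and uses only Python's standard-library `html.parser`. The class name, function name, and sample page are hypothetical illustrations, not part of the talk or of Facebook's own tooling.

```python
from __future__ import annotations

from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property="og:..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property")
        content = attr_map.get("content")
        # Keep only Open Graph properties; if a property repeats, the first value wins.
        if prop and prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict[str, str]:
    """Return the og:* properties found in an HTML document."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical sample page modeled on the slide's "The Rock" / "movie" example.
SAMPLE_PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Rock (1996)</title>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
  <meta name="description" content="Not an Open Graph tag" />
</head>
</html>
"""

if __name__ == "__main__":
    for key, value in extract_open_graph(SAMPLE_PAGE).items():
        print(f"{key} = {value}")
```

The plain `name="description"` tag is ignored because it is not an Open Graph property. Only the four `og:` entries are returned.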
    12. OGP use growing quickly
       • 15,178 of the top 1,000,000 sites as of 3/3/11
       • In Sept 2011 Facebook announced an extension of OGP for new uses
    13. Goal: OGP-powered social (e-commerce) apps
    14. Broad data (in Science)
       • The “Deep Web” in Science (cf. Fox 2011)
         • Data behind Web services
         • Data behind query interfaces (databases or files)
       • Introduces a different curation problem
    15. Broad Data Science (Fox & Hendler, Science, 2/11/11)
    16. BROAD data challenges
       • For broad data, the new challenges that emerge include
         • (Web-scale) data search
         • “Crowd-sourced” modeling
         • Rapid (and potentially ad hoc) integration of datasets
         • Visualization and analysis of only partially modeled datasets
         • Policies for data use, reuse, and combination
    17. Example: Government Data on the Web
    18. Government Data Sharing: “Year 1” [timeline figure, 2009-2010]
       • January 1, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama)
       • May 21, 2009: data.gov online
       • June 30, 2009: putting govt data online (data.gov.uk beta)
       • December 8, 2009: “Open Government Directive” released
       • January 19, 2010: data.gov.uk online
       • May 21, 2010: data.gov relaunch with the Semantic Web featured
       • Dataset counts shown on the timeline: 57; ~6,000; ~2,000; >305,000
    19. Government Data Sharing: Year 2
    20. Government Data Sharing: Year 3
       2012 so far:
       • http://www.data.gouv.fr released 300,000 French databases
       • US/India to release Open Government Platform
       • Kenya announces “Open Africa” project
    21. Government Data in the linked open data cloud (http://linkeddata.org/)
       • Government data currently makes up over ½ of the cloud in size (~17B triples), with tens of thousands of links to other data (both inside and outside government)
    22. Important to citizens: e.g., education (Data.gov.uk; RPI NYS demos)
    23. Government “Data” Mashups
    24. Data.gov + epa.gov
    26. Linking GDP of the US and China [Temporal Mashup]
       Sources: bea.gov + federalreserve.gov + stats.gov.cn
       Chart series: GDP of China (billion Chinese yuan); GDP of the US (billion US dollars)
    27. Linking GDP of the US and China [Temporal Mashup] (same chart as slide 26)
       Sources: bea.gov + federalreserve.gov + stats.gov.cn
       This mashup was built in less than 4 hours, including conversion of data, web interface, and visualization!
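The slide does not say how the three sources were combined. One plausible reading, offered here only as an assumption, is a join on year. In that reading, the federalreserve.gov data supplies yuan-per-dollar exchange rates so that China's GDP can be expressed in the same unit as the US series. The following is a minimal sketch of such a temporal join. Every number in it is a made-up toy value, not real GDP or exchange-rate data, and the function name is hypothetical.

```python
from __future__ import annotations

# Toy, made-up values for illustration only; these are NOT real statistics.
US_GDP_BILLION_USD = {2007: 100.0, 2008: 110.0, 2009: 105.0}
CHINA_GDP_BILLION_CNY = {2007: 400.0, 2008: 480.0, 2009: 500.0, 2010: 560.0}
CNY_PER_USD = {2007: 8.0, 2008: 8.0, 2009: 6.25}  # hypothetical exchange-rate series


def temporal_mashup(
    us_usd: dict[int, float],
    china_cny: dict[int, float],
    cny_per_usd: dict[int, float],
) -> list[tuple[int, float, float]]:
    """Join three yearly series on year and express China's GDP in US dollars.

    Only years present in all three inputs are kept (an inner join on time).
    """
    rows: list[tuple[int, float, float]] = []
    for year in sorted(set(us_usd) & set(china_cny) & set(cny_per_usd)):
        china_usd = china_cny[year] / cny_per_usd[year]
        rows.append((year, us_usd[year], round(china_usd, 2)))
    return rows


if __name__ == "__main__":
    # 2010 is dropped: the toy data has no US value or exchange rate for it.
    for year, us, china in temporal_mashup(US_GDP_BILLION_USD, CHINA_GDP_BILLION_CNY, CNY_PER_USD):
        print(f"{year}: US {us:.1f} vs China {china:.1f} (billion USD)")
```

An inner join is the conservative choice for a quick mashup. A year missing from any source is left out rather than filled in.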
    28. Linking to “context” is important
       • Datasets: acres burned and agency budgets
       • DBpedia: Wikipedia descriptions of major US fires
    29. Integrate with social media
    30. Combining data from different data-sharing sites
    31. http://logd.tw.rpi.edu: demos, tutorials, RDF-ized datasets, and more
    32. Broad Data “Integration” requires simple semantics
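One common form of such “simple semantics” in linked data is an owl:sameAs link. It states that two identifiers from different datasets name the same thing, so facts about either can be pulled together, for example when government fire statistics are linked to DBpedia descriptions. The following sketch assumes triples are held as plain Python tuples and treats owl:sameAs as an equivalence, grouping identifiers with a small union-find. The identifiers, values, and function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

SAME_AS = "owl:sameAs"
Triple = tuple[str, str, str]


class UnionFind:
    """Groups identifiers that are declared equivalent."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic representative: the lexicographically smaller identifier.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def merge_same_as(triples: list[Triple]) -> dict[str, set[tuple[str, str]]]:
    """Group (predicate, object) facts under one representative subject per sameAs group."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == SAME_AS:
            uf.union(s, o)
    merged: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            merged[uf.find(s)].add((p, o))
    return dict(merged)


# Hypothetical triples from two sources (e.g., a government table and an encyclopedia).
TRIPLES: list[Triple] = [
    ("gov:fire/42", "gov:acresBurned", "1234"),
    ("wiki:Fire_42", "wiki:description", "Encyclopedia text about the fire"),
    ("gov:fire/42", SAME_AS, "wiki:Fire_42"),
    ("gov:fire/7", "gov:acresBurned", "99"),
]

if __name__ == "__main__":
    for entity, facts in merge_same_as(TRIPLES).items():
        print(entity, sorted(facts))
```

Only subjects are canonicalized here. A fuller version would also rewrite objects that are identifiers, but even this much shows how one asserted equivalence lets two datasets be queried as one.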
    33. Example: any Wikipedia topic!
    34. Metadata is crucial for Broad Data
       • Metadata design is crucial to govt data sharing
         • Needed for search and federation in large data-sharing efforts
       • International data sharing
         • W3C Govt Linked Data Working Group
         • Need for vocabularies within govt sectors
           • Especially for cross-language use
             • How can we compare health (or legal, or social, or …) data between countries like the US, UK, India, and Kenya (English) and Norway, China, France, etc.?
             • How can we link local govts (in traditional languages, local dialects, etc.) with national data?
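A shared sector vocabulary helps with cross-language comparison by giving each concept one identifier with labels in several languages. This is the pattern SKOS supports with language-tagged labels. The following minimal sketch maps the column headers of a French dataset and an English dataset to the same concept before any comparison. The concept identifiers, labels, and function names are hypothetical.

```python
from __future__ import annotations

import unicodedata

# Hypothetical shared vocabulary: one identifier per concept, one label per language.
VOCABULARY: dict[str, dict[str, str]] = {
    "vocab:infant-mortality": {"en": "infant mortality", "fr": "mortalité infantile"},
    "vocab:hospital-beds": {"en": "hospital beds", "fr": "lits d'hôpital"},
}


def normalize(label: str) -> str:
    """Canonical form for matching: Unicode NFC, trimmed, case-folded."""
    return unicodedata.normalize("NFC", label).strip().casefold()


def build_label_index(vocab: dict[str, dict[str, str]]) -> dict[tuple[str, str], str]:
    """Index (language, normalized label) -> concept identifier."""
    index: dict[tuple[str, str], str] = {}
    for concept, labels in vocab.items():
        for lang, label in labels.items():
            index[(lang, normalize(label))] = concept
    return index


def harmonize_columns(
    columns: list[str], lang: str, index: dict[tuple[str, str], str]
) -> dict[str, str | None]:
    """Map column headers written in `lang` to shared concept IDs; unknown headers map to None."""
    return {col: index.get((lang, normalize(col))) for col in columns}


if __name__ == "__main__":
    index = build_label_index(VOCABULARY)
    print(harmonize_columns(["Mortalité infantile", "Région"], "fr", index))
    print(harmonize_columns(["Infant Mortality", "State"], "en", index))
```

Exact matching on normalized labels is deliberately naive. Real alignment across languages and dialects needs curated mappings, which is why the slide calls for vocabularies within govt sectors.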
    35. International Open Government Data Search
    36. Searching for data
       • Faceted browser with
         • Keyword search
         • Catalogs
         • Countries
         • Agencies
         • Categories
         • (in any order)
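In a faceted browser like this, each facet acts as a filter and the filters are combined conjunctively. As a result, they can be applied in any order and still yield the same result set. The following minimal sketch assumes a flat list of catalog records with one value per facet. The field names, function name, and sample records are hypothetical, not the demo's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass

FACETS = {"catalog", "country", "agency", "category"}


@dataclass(frozen=True)
class DatasetRecord:
    title: str
    catalog: str
    country: str
    agency: str
    category: str


def apply_facets(
    records: list[DatasetRecord], keyword: str | None = None, **facets: str
) -> list[DatasetRecord]:
    """Keep records whose title contains `keyword` and whose facet fields equal the given values."""
    unknown = set(facets) - FACETS
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    results = list(records)
    if keyword:
        needle = keyword.casefold()
        results = [r for r in results if needle in r.title.casefold()]
    for field, value in facets.items():
        results = [r for r in results if getattr(r, field) == value]
    return results


# Hypothetical catalog entries for illustration.
CATALOG = [
    DatasetRecord("Hospital inspections", "data.gov", "US", "HHS", "Health"),
    DatasetRecord("School enrollment", "data.gov", "US", "ED", "Education"),
    DatasetRecord("Hospital waiting times", "data.gov.uk", "UK", "NHS", "Health"),
]

if __name__ == "__main__":
    # Keyword first, then country ...
    a = apply_facets(apply_facets(CATALOG, keyword="hospital"), country="US")
    # ... or country first, then keyword: the same records come back.
    b = apply_facets(apply_facets(CATALOG, country="US"), keyword="hospital")
    print(a == b, [r.title for r in a])
```

Order independence follows from each step being a pure filter over the previous result. This is what lets a user drill down by catalog, country, agency, or category in whatever sequence suits them.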
    37. Details and download: http://logd.tw.rpi.edu/demo/international_dataset_catalog_search
    38. Research in Govt Data => Broad Data challenges
       • Trust
         • Government data is controversial and potentially biased
           • How do we confirm or dispute it?
       • Combination
         • When we combine data, we need to keep the provenance of information (see Trust)
           • How do we make policies explicit and sharable?
       • Scaling
         • Our project has already converted 9.9B triples from just over 2,000 of the 710,000 government databases we can identify (116 catalogs, 32 countries, 16 languages)
           • Cross-catalog
           • Cross-language
       • Versioning and updating
       • Archiving
       • Visualization
       • …
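Keeping provenance while combining data can be as simple as never letting a merged value lose the record of who asserted it. Disagreements then stay visible instead of being silently overwritten, which gives a starting point for the “confirm or dispute” question under Trust. The following minimal sketch assumes the data arrives as (subject, property, value, source) claims. All identifiers, values, and function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    prop: str
    value: str
    source: str  # provenance: which dataset asserted this value


Combined = dict[tuple[str, str], dict[str, set[str]]]


def combine(*datasets: list[Claim]) -> Combined:
    """Merge claims, keeping every value for each (subject, property) and the sources behind it."""
    merged: defaultdict = defaultdict(lambda: defaultdict(set))
    for dataset in datasets:
        for claim in dataset:
            merged[(claim.subject, claim.prop)][claim.value].add(claim.source)
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {key: dict(values) for key, values in merged.items()}


def disputed(combined: Combined) -> list[tuple[str, str]]:
    """(subject, property) pairs for which sources report different values."""
    return sorted(key for key, values in combined.items() if len(values) > 1)


def confirmed(combined: Combined, min_sources: int = 2) -> list[tuple[str, str]]:
    """Pairs with a single value asserted by at least `min_sources` independent sources."""
    return sorted(
        key
        for key, values in combined.items()
        if len(values) == 1 and len(next(iter(values.values()))) >= min_sources
    )


# Hypothetical claims from two agencies about the same county.
AGENCY_A = [
    Claim("county:X", "ex:population", "1000", "agency-a"),
    Claim("county:X", "ex:unemploymentRate", "5.1", "agency-a"),
]
AGENCY_B = [
    Claim("county:X", "ex:population", "1000", "agency-b"),
    Claim("county:X", "ex:unemploymentRate", "5.4", "agency-b"),
]

if __name__ == "__main__":
    combined = combine(AGENCY_A, AGENCY_B)
    print("disputed:", disputed(combined))
    print("confirmed:", confirmed(combined))
    print(combined[("county:X", "ex:unemploymentRate")])
```

Because every value keeps its sources, a downstream user can see not only the combined answer but also who said what. This addresses the explicit-policy and trust questions only partially, but without it they cannot be addressed at all.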
    39. Exploring new visualizations (data from http://littlesis.org)
    40. Reaching beyond the government
    41. Broad Data Goes Beyond the Govt (http://linkeddata.org/)
    42. Broad Data Challenges
       • Finding and using Broad Data is an emerging challenge
         • How do I find a dataset, among the many out there, that might be of use to me?
           • You cannot keyword-search inside data
         • How do I know what is in a large data store? In the cloud?
           • What is the coverage?
           • What is the access?
           • Whom do I need to ask for what?
         • What are the rules about using it?
           • What can I combine it with?
           • How do downstream users know I’ve combined it?
    43. Broad Data Market?
       • Significant and growing commercial interest…
         • Web: Google, Amazon, Travelocity…
         • Web 2.0: Facebook, Wikipedia, YouTube, Twitter…
         • Web 3.0: ??
    44. Broad Data Market? (as slide 43, with the added callout “Broad Data Goes Here”)
    45. Research (and business) Opportunities
       • Broad Data is a great field for those looking for emerging opportunities
         • Tooling is needed
         • (Business) models are just starting to emerge
         • Scalability infrastructure is there
         • Massive distribution (think mobile) is wide open to Web 3.0 innovation
       • Govt data gives us a place to cooperate (for the public good) while exploring all of the above
    46. Conclusions
       • Big data is going Broad
         • World Wide Web trend towards more and more varied data
           • In many domains
             • E-commerce, Open Govt, many more (cf. Health/Medical care)
       • Broad data requires thinking outside the “Database” box
       • Broad data opens exciting possibilities for research and innovation
         • Come play!