Broad Data




In this talk I compare "Broad" data, the idea of thousands of datas





Usage Rights

© All Rights Reserved

  • Oh, you know what Facebook will do in September 2012, eh? Btw, I often get the feeling that various Semantic Web/Linked Data etc. protagonists somehow doom the upper parts of the Semantic Web technology stack (whatever ;) ). However, they are also very important. Of course, one could/should start with the easier parts, but without propagating the impression that the other parts are overly complex or not useful. I guess appliers will notice when they need those bits too ;)
  • thanks for catching that - obviously I meant - will fix in next version
  • For the French side, maybe you were talking about

Broad Data Presentation Transcript

  • 1. Broad Data. Jim Hendler, Tetherless World Constellation; Tetherless World Professor of Computer and Cognitive Science; Director, Information Technology and Web Science Program; Rensselaer Polytechnic Institute; @jahendler (Twitter)
  • 2. Outline (if I stick to it)
    • Big Data ≠ Broad Data
    • Broad Data problem
    • Broad Data Example
      • Open Government Data
    • Broad Data challenges
    • How can you make money off this stuff?
  • 3. BIG Data
    • The term “Big Data” is widely used nowadays
      • 3 main contexts
        • The large data collections of “big science” projects
        • The data holdings of a Google, Facebook or other large Web company
        • The enterprise data of large, non-Web-based companies (IBM, TATA, etc.)
  • 4. Big Data Challenge: Scaling
    • Most of the focus of (current) Big Data research is on scaling (traditional) database-related technologies
      • Schema Modeling
      • Data Warehousing
      • Data mining
      • Statistical analysis
      • Mathematical Analytics
  • 5. How BIG is Big?
    • Science uses some extremely large databases and many of them are crucial to society
      • Petabytes of Data
    • World Wide Web data is also extremely large
      • With the primary resources for exploring it held by companies
        • e.g., Facebook
          • 25 terabytes of logged data per day; valuation $100B?
        • e.g., Google
          • In 2008 its data processing was estimated at 20 petabytes per day (not including YouTube); 2010 valuation >$190B
  • 6. Big Data: Facebook generates terabytes of data per day. What could be learned from this?
  • 7. BIG Data: Google uses its data in many ways: Search => ads => user
  • 8. Big Data is becoming different on the Web
    • New Work
      • Is moving away from traditional relational models
        • cf. NoSQL
      • Is moving towards third-party applications and extensions
        • cf. Mobile apps for local governments
      • Includes a focus on interoperability and exchange with “lightweight” semantics
        • Using ideas from the Semantic Web
          • Search:
          • Social Networking: OGP
  • 9. BROAD data
    • 4th context: Broad Data
      • The huge amount of freely available, but widely varied, Open Data on the World Wide Web (Structured and Semi-structured)
        • Example: The extended Facebook OGP graph (the part outside Facebook’s datasets)
        • Example: The growing linked open data cloud of freely available RDF linked data
        • Example: More than 710,000 datasets that are available on the Web free from governments around the world
  • 10. Example: adding “Breadth” April 2010
  • 11. Facebook ’s Open Graph Protocol
    • Facebook now allows other sites to extend the graph
    • Open Graph Protocol uses RDFa to let web sites contain information about the things people “like”
        • og:title - The title of your object as it should appear within the graph, e.g., "The Rock".
        • og:type - The type of your object, e.g., "movie". Depending on the type you specify, other properties may also be required.
        • og:image - An image URL which should represent your object within the graph.
        • og:url - The canonical URL of your object that will be used as its permanent ID in the graph
        • og:description - A one to two sentence description of your object.
        • og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site. e.g., "IMDb".
      • Not a traditional “ontology”
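The properties listed above are ordinary HTML meta tags, so a consumer of the extended graph needs little more than an HTML parser to read them. What follows is a minimal sketch using Python's standard html.parser; the sample page, its image URL, and the names OGPParser and extract_ogp are illustrative assumptions, not Facebook's own tooling.

```python
from html.parser import HTMLParser


class OGPParser(HTMLParser):
    """Collect Open Graph Protocol (og:*) properties from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        # OGP uses RDFa-style markup: <meta property="og:..." content="...">
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_ogp(html_text):
    """Return a dict mapping og:* property names to their content values."""
    parser = OGPParser()
    parser.feed(html_text)
    parser.close()
    return parser.properties


# Illustrative page marked up in the style of the slide's examples.
PAGE = """
<html prefix="og: https://ogp.me/ns#">
<head>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
  <meta property="og:url" content="https://www.imdb.com/title/tt0117500/" />
  <meta property="og:image" content="https://example.org/images/rock.jpg" />
  <meta property="og:site_name" content="IMDb" />
</head>
</html>
"""

if __name__ == "__main__":
    for name, value in extract_ogp(PAGE).items():
        print(name, "=", value)
```

Because OGP is not a traditional ontology, the sketch treats the properties as a flat key/value map rather than reasoning over them.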
  • 12. OGP use growing quickly: 15,178 sites of the top 1,000,000 as of 3/3/11. In Sept 2011 Facebook announced an extension of OGP for new uses
  • 13. Goal: OGP-powered social (e-commerce) apps
  • 14. Broad data (in Science)
    • The “Deep Web” in Science (cf. Fox 2011)
      • Data behind web services
      • Data behind query interfaces (databases or files)
    • Introduces a different curation problem
  • 15. Broad Data Science (Fox & Hendler, Science, 2/11/10)
  • 16. BROAD data challenges
    • For broad data the new challenges that emerge include
      • (Web-scale) data search
      • “Crowd-sourced” modeling
      • rapid (and potentially ad hoc) integration of datasets
      • visualization and analysis of only-partially modeled datasets
      • policies for data use, reuse and combination.
  • 17. Example: Government Data on the Web
  • 18. Government Data Sharing: “Year 1” (timeline)
    • January 1, 2009: “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama)
    • May 21, 2009: Putting Govt Data online (beta)
    • June 30, 2009
    • December 8, 2009: “Open Government Directive” released
    • January 19, 2010: online
    • May 21, 2010: online relaunch with the Semantic Web featured
    • Data set counts along the 2009 to 2010 timeline: 57 Data Sets; ~6,000 Data Sets; ~2,000 Data Sets; >305,000 Data Sets
  • 19. Government Data Sharing: Year 2
  • 20. Government Data Sharing: Year 3
    • 2012 so far:
      • 300,000 French databases released
      • US/India to release Open Government Platform
      • Kenya announces “Open Africa” project
  • 21. Government data in the linked open data cloud: government data is currently over ½ the cloud in size (~17B triples), with tens of thousands of links to other data (within and without)
  • 22. Important to citizens, e.g., education: RPI NYS demos
  • 23. Government “Data” Mashups
  • 24. +
  • 25.  
  • 26. Linking GDP of the US and China: GDP of China (billion Chinese yuan) and GDP of the US (billion dollars) [Temporal Mashup]
  • 27. Linking GDP of the US and China: GDP of China (billion Chinese yuan) and GDP of the US (billion dollars) [Temporal Mashup]. This mashup was built in less than 4 hours, including conversion of data, web interface, and visualization!
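At its core, a temporal mashup converts each source to a common key (the year) and joins on it. A minimal Python sketch, assuming each dataset arrives as CSV with a year column; the column names, the function names, and every number below are placeholders, not the GDP figures used in the demo.

```python
import csv
import io


def load_yearly_series(csv_text, value_column):
    """Read CSV text with a 'year' column into {year: value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["year"]): float(row[value_column]) for row in reader}


def temporal_mashup(series_a, series_b):
    """Join two yearly series on the years both of them cover."""
    shared_years = sorted(series_a.keys() & series_b.keys())
    return [(year, series_a[year], series_b[year]) for year in shared_years]


# Placeholder values for illustration only; NOT real GDP figures.
# Units stay as published (billion yuan vs. billion dollars); no currency
# conversion is attempted, mirroring the two-axis chart on the slide.
CHINA_CSV = "year,gdp_billion_yuan\n2007,100\n2008,110\n2009,120\n"
US_CSV = "year,gdp_billion_usd\n2008,200\n2009,205\n2010,210\n"

china = load_yearly_series(CHINA_CSV, "gdp_billion_yuan")
us = load_yearly_series(US_CSV, "gdp_billion_usd")

if __name__ == "__main__":
    for year, cn, usa in temporal_mashup(china, us):
        print(year, cn, usa)
```

Only years present in both sources survive the join; converting yuan to dollars would be a separate step that needs its own explicitly sourced exchange-rate data.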
  • 28. Linking to “context” is important. Datasets: acres burned and agency budgets; DBpedia: Wikipedia descriptions of major US fires
  • 29. Integrate with Social media
  • 30. Combining data from different data sharing sites
  • 31. demos, tutorials, RDF-ized datasets, and more
  • 32. Broad Data “Integration” requires simple semantics
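One way to read "simple semantics": if datasets reuse the same URIs for the same things, integration reduces to a set union of triples, with no shared schema agreed up front. A minimal sketch in plain Python (no RDF library); the example.org predicates are hypothetical, and the DBpedia-style URI only stands in for an identifier that two sources happen to share.

```python
from collections import defaultdict

# Each dataset is a set of (subject, predicate, object) triples.
# example.org predicates are hypothetical; rdfs:label is the standard RDFS term.
CHINA = "http://dbpedia.org/resource/China"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

dataset_a = {
    (CHINA, "http://example.org/stats#gdpUnit", "billion CNY"),
}
dataset_b = {
    (CHINA, RDFS_LABEL, "China"),
    (CHINA, "http://example.org/geo#continent", "Asia"),
}


def merge(*datasets):
    """With shared URIs, merging datasets is just a set union of triples."""
    merged = set()
    for ds in datasets:
        merged |= ds
    return merged


def describe(graph, subject):
    """Collect every predicate -> objects pair known about a subject."""
    facts = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            facts[p].add(o)
    return dict(facts)
```

The merged graph answers questions neither source could alone (here, the unit of a statistic and the continent of the same country), yet neither dataset had to adopt the other's schema.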
  • 33. Example: any Wikipedia topic!
  • 34. Metadata is crucial for Broad Data
    • Metadata design is crucial to govt data sharing
      • Needed for search and federation in large data sharing efforts
    • International data sharing
      • W3C Govt Linked Data Working Group
      • Need for vocabularies within govt sectors
        • Especially for cross-language use
          • How can we compare health (or legal, or social, or ….) data between countries like US, UK, India, Kenya (English) with Norway, China, France, etc.
          • How can we link local govts (in traditional languages, local dialects, etc.) with national data?
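A shared sector vocabulary with labels in several languages is one way to attack the cross-language questions above. The sketch below loosely follows the SKOS idea of one concept carrying multiple language-tagged labels; the concept IDs, the labels, and the function names are hypothetical.

```python
# Hypothetical shared vocabulary: one concept ID per government sector,
# with labels in several languages (in the spirit of SKOS prefLabel/altLabel).
VOCAB = {
    "sector:health": {"en": ["health", "public health"], "fr": ["santé"], "no": ["helse"]},
    "sector:education": {"en": ["education"], "fr": ["éducation"], "no": ["utdanning"]},
}


def build_index(vocab):
    """Map (language code, case-folded label) to a shared concept ID."""
    index = {}
    for concept, labels in vocab.items():
        for lang, terms in labels.items():
            for term in terms:
                index[(lang, term.casefold())] = concept
    return index


def to_concept(index, lang, label):
    """Return the shared concept for a local category label, or None if unknown."""
    return index.get((lang, label.strip().casefold()))
```

Once a French "Santé" category and a Norwegian "helse" category resolve to the same concept, health data from both countries can be compared without translating the datasets themselves.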
  • 35. International Open Government Data Search
  • 36. Searching for data
    • Faceted browser with
      • Keyword search
      • Catalogs
      • Countries
      • Agencies
      • Categories
      • (in any order)
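The faceted browser amounts to a keyword filter plus equality filters on catalog metadata, applied in whatever order the user picks. A minimal Python sketch over made-up catalog records; the Dataset fields mirror the facets on the slide, while the records and function names are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

FACETS = ("catalog", "country", "agency", "category")


@dataclass(frozen=True)
class Dataset:
    title: str
    catalog: str
    country: str
    agency: str
    category: str


def search(datasets, keyword="", **selected):
    """Keyword match on titles plus facet filters, which may be given in any order."""
    unknown = set(selected) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    kw = keyword.casefold()
    return [
        d for d in datasets
        if kw in d.title.casefold()
        and all(getattr(d, facet) == value for facet, value in selected.items())
    ]


def facet_counts(datasets, facet):
    """Number of datasets per facet value, as shown beside each facet link."""
    return Counter(getattr(d, facet) for d in datasets)


# Made-up records for illustration only.
CATALOG = [
    Dataset("Hospital admissions by region", "catalog-uk", "UK", "Health Agency", "Health"),
    Dataset("School enrollment", "catalog-us", "US", "Education Agency", "Education"),
    Dataset("Hospital inspection results", "catalog-us", "US", "Health Agency", "Health"),
]
```

Because each filter is an independent predicate, the order in which facets are chosen does not change the result set, which is what lets the browser offer them "in any order".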
  • 37. Details and download…
  • 38. Research in Govt Data => Broad Data challenges
    • Trust
      • Government data is controversial, and potentially biased
        • How do we confirm or dispute?
    • Combination
      • When we combine data we need to keep the provenance of information (see trust)
        • How do we make policies explicit and shareable?
    • Scaling
      • Our project has already converted 9.9B triples from only 2,000+ of the 710,000 government databases we can identify (116 catalogs, 32 countries, 16 languages)
        • Cross-catalog
        • Cross-language
    • Versioning and updating
    • Archiving
    • Visualization
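Keeping provenance while combining, as the Combination point asks, can start with never discarding the source of a claim. That also serves the Trust point: sources that disagree become visible instead of being silently overwritten. A minimal Python sketch; the Fact structure, the identifiers, the source names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: object
    source: str  # where the claim came from (hypothetical dataset or catalog ID)


def combine(*datasets):
    """Merge facts, grouping identical claims and keeping every source for each."""
    merged = {}
    for facts in datasets:
        for f in facts:
            key = (f.subject, f.predicate, f.value)
            merged.setdefault(key, set()).add(f.source)
    return merged


def conflicts(merged):
    """Return subject/predicate pairs whose sources disagree on the value."""
    by_pair = {}
    for (s, p, v), sources in merged.items():
        by_pair.setdefault((s, p), {})[v] = sources
    return {pair: values for pair, values in by_pair.items() if len(values) > 1}
```

A claim backed by several independent sources can be confirmed, and a conflicting one can be disputed by pointing at exactly which sources disagree.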
  • 39. Exploring new visualizations Data from
  • 40. Reaching beyond the government
  • 41. Broad Data Goes Beyond the Govt
  • 42. Broad Data Challenges
    • Finding and Using Broad Data is an emerging challenge
      • How do I find a dataset in the many out there that might be of use to me?
        • Cannot keyword-search in data
      • How do I know what is in a large data store? In the cloud?
        • What is the coverage?
        • What is the access?
        • Who do I need to ask for what?
      • What are the rules about using it?
        • What can I combine it with?
        • How do downstream users know I’ve combined it?
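The last two questions (what can I combine it with, and how do downstream users know I've combined it) suggest carrying use conditions and lineage along with the data itself. A minimal sketch, assuming each dataset's terms can be reduced to a set of condition tags; the tags such as "non-commercial", the dataset names, and the helper functions are hypothetical, and real licenses need far more careful handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    conditions: frozenset = frozenset()  # hypothetical use-condition tags
    derived_from: tuple = ()             # inputs, if this is a combination


def combine(name, *inputs):
    """A combination inherits every input's conditions and records its inputs."""
    conditions = frozenset().union(*(d.conditions for d in inputs))
    return DatasetInfo(name, conditions, tuple(inputs))


def lineage(dataset):
    """Names of all original (non-derived) datasets behind a dataset, in order."""
    if not dataset.derived_from:
        return [dataset.name]
    names = []
    for parent in dataset.derived_from:
        for n in lineage(parent):
            if n not in names:
                names.append(n)
    return names


def permits_commercial_use(dataset):
    """Hypothetical rule: any 'non-commercial' input restricts the combination."""
    return "non-commercial" not in dataset.conditions
```

Because conditions and inputs travel with every combination, a downstream user can recover both the rules that apply and the original sources, however many times the data has been remixed.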
  • 43. Broad Data Market?
    • Significant and growing commercial interest…
      • Web: Google, Amazon, Travelocity…
      • Web 2.0: Facebook, Wikipedia, YouTube, Twitter…
      • Web 3.0: ??
  • 44. Broad Data Market?
    • Significant and growing commercial interest…
      • Web: Google, Amazon, Travelocity…
      • Web 2.0: Facebook, Wikipedia, YouTube, Twitter…
      • Web 3.0: ??
    • Broad Data Goes Here
  • 45. Research (and business) Opportunities
    • Broad Data is a great field for those looking for emerging opportunities
      • Tooling is needed
      • (Business) Models are just starting to emerge
      • Scalability Infrastructure is there
      • Massive Distribution (think mobile) is wide open to Web 3.0 innovation
    • Govt data gives us a place to cooperate (for the public good) while exploring all of the above
  • 46. Conclusions
    • Big data is going Broad
      • World Wide Web trend towards more and more varied data
        • In many domains
          • E-commerce, Open Govt, many more (cf. Health/Medical care)
    • Broad data requires thinking outside the “Database” box
    • Broad data opens exciting possibilities for research and innovation
      • Come play!