CAEPIA 2011

Slides for my keynote at the KEESOS workshop (http://ir.ii.uam.es/keesos2011/), CAEPIA 2011

Published in: Technology, Education

  1. 1. The Data Era: Production, Consumption, Challenges. Miriam Fernández, 8th November, CAEPIA 2011. Website: http://people.kmi.open.ac.uk/miriam/about/ Twitter: @miri_fs SlideShare: http://www.slideshare.net/miriamfs
  2. 2. What is … ?
  3. 3. How do humans infer knowledge? Alejandro in Chicago! (semantic interpretation) A picture! (syntactic interpretation)
  4. 4. How do machines infer knowledge? Semantic interpretation. A picture! (syntactic interpretation)
  5. 5. The Challenge • We need to find a way for machines to interpret and extract knowledge for us!
  6. 6. The Challenge
  7. 7. The Data Era • The 2011 Digital Universe Study: Extracting Value from Chaos (IDC) – We have entered the Zettabyte era (a zettabyte is a trillion gigabytes or a billion terabytes) – The rate of information growth appears to be exceeding Moore’s Law. http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
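As a quick check of the units quoted on this slide, here is a minimal sketch. It assumes decimal (SI) prefixes, which the slide does not state, and the constant and function names are purely illustrative.

```python
# Decimal (SI) byte units. Assumption: the slide means SI units, not binary ones.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21

def units_per_zettabyte(unit_bytes):
    """Return how many units of the given size fit in one zettabyte."""
    return ZETTABYTE // unit_bytes

print(units_per_zettabyte(GIGABYTE))  # 10**12, i.e. a trillion gigabytes
print(units_per_zettabyte(TERABYTE))  # 10**9, i.e. a billion terabytes
```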
  8. 8. Big Value from Data • Big Data: The next frontier for innovation, competition and productivity (McKinsey) – $300 billion potential annual value to US health care – €250 billion potential annual value to Europe’s public sector administration. http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
  9. 9. IBM City Forward • The Smarter Cities Challenge is a competitive grant program awarding $50 million worth of IBM expertise over the next three years to 100 cities around the globe. It is designed to address the wide range of challenges facing cities today.
  10. 10. Consumption • We need to provide efficient ways to consume data in order to extract its value: the knowledge – Syntactic approaches (visual analytics) • The data is collected, centralized and analysed • Visualizations let humans extract knowledge – Semantic approaches • The information is distributed / interlinked • Semantic structures are added to the data so that machines can better understand it
  11. 11. Syntactic approaches • Some examples – Gapminder – IBM Many Eyes – Google Public Data Explorer – Google Correlate – Google N-Gram Viewer • What is the most popular hair colour in the literature?
  12. 12. Google N-Gram Viewer
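To make the hair-colour question concrete, the following is a minimal sketch of the kind of n-gram counting that a tool like this is built on. It is not Google's implementation; the toy text is made up for illustration, and `ngram_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count word n-grams in text (lowercased, punctuation ignored)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

# Made-up toy text, for illustration only; not real literature data.
TOY_TEXT = ("She had black hair. He had brown hair, "
            "and his sister had black hair too.")

bigrams = ngram_counts(TOY_TEXT, 2)
# Keep only the bigrams of the form "<colour> hair".
hair_counts = {g: c for g, c in bigrams.items() if g.endswith(" hair")}
most_common = max(hair_counts, key=hair_counts.get)
print(most_common, hair_counts[most_common])
```

This is a purely syntactic approach: the program counts strings and has no notion that "black" is a colour, so interpreting the result is left to the human reader.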
  13. 13. Semantic approaches • “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” (Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001)
  14. 14. The SW vision • Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge.
  15. 15. Adding Structure • Two paths towards the SW vision – Metadata embedded in HTML • Microformats • RDFa • Microdata – Linked Data • Put the data online in a standard, web-enabled representation (RDF) • Make the data Web-addressable (URIs)
  16. 16. Metadata in HTML • An example
      <div class="vcard">
        <div class="fn org">Knowledge Media Institute</div>
        <div class="adr">
          <div class="street-address">Walton Hall</div>
          <div>
            <span class="locality">Milton Keynes</span>,
            <span class="postal-code">MK7 6AA</span>
          </div>
          <div class="country-name">United Kingdom</div>
        </div>
      </div>
      Rendered text: Knowledge Media Institute / Walton Hall / Milton Keynes, MK7 6AA
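The class names in this markup follow the hCard microformat, so a program can pull out the fields without understanding the surrounding prose. Below is a minimal sketch using only Python's standard `html.parser`. The `HCardParser` class, the `parse_hcard` helper and the list of properties are illustrative assumptions, not a complete microformats parser.

```python
from html.parser import HTMLParser

HCARD = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""

class HCardParser(HTMLParser):
    """Collect text for a few hCard properties (hypothetical helper)."""
    PROPS = {"fn", "org", "street-address", "locality",
             "postal-code", "country-name"}

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: the properties it carries
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Assumes no void elements such as <br>, which have no end tag.
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in self.PROPS])

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Text belongs to the properties of the innermost open element.
            for prop in self.stack[-1]:
                self.fields[prop] = self.fields.get(prop, "") + text

def parse_hcard(html):
    parser = HCardParser()
    parser.feed(html)
    return parser.fields

card = parse_hcard(HCARD)
print(card["fn"], "|", card["locality"], "|", card["postal-code"])
```

The comma between the two spans is ignored because it sits in an element that carries no hCard property, which is exactly the distinction the embedded metadata makes available to machines.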
  17. 17. Metadata in HTML • Schema.org • Example: Semantically enhanced Information Retrieval: an ontology-based approach, http://people.kmi.open.ac.uk/miriam/about/
  18. 18. Metadata in HTML • The Open Graph protocol
  19. 19. Linked Data • [Figure: Linking Open Data cloud diagram for 2007, 2008, 2009 and 2010] Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  20. 20. Linked Data • An example: http://data.semanticweb.org/person/miriam-fernandez/rdf
      <ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
      @prefix dbpedia: <http://dbpedia.org/resource/> .
      @prefix dbterm: <http://dbpedia.org/property/> .
      dbpedia:Amsterdam dbterm:officialName "Amsterdam" ;
                        dbterm:longd "4" ;
                        dbterm:longm "53" ;
                        dbterm:longs "32" ;…
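The prefixed names in the Amsterdam example are shorthand for full URIs, which is what makes each term Web-addressable. Here is a minimal sketch of that expansion in plain Python; no RDF library is used, and the `expand` and `properties_of` helpers are hypothetical names introduced for illustration.

```python
# Prefix table and triples copied from the slide's Amsterdam example.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbterm": "http://dbpedia.org/property/",
}

# Each statement is a (subject, predicate, object) triple; objects are literals.
TRIPLES = [
    ("dbpedia:Amsterdam", "dbterm:officialName", "Amsterdam"),
    ("dbpedia:Amsterdam", "dbterm:longd", "4"),
    ("dbpedia:Amsterdam", "dbterm:longm", "53"),
    ("dbpedia:Amsterdam", "dbterm:longs", "32"),
]

def expand(qname):
    """Turn a prefixed name such as 'dbpedia:Amsterdam' into a full URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local

def properties_of(subject):
    """Map each full predicate URI to its literal value for one subject."""
    return {expand(p): o for s, p, o in TRIPLES if s == subject}

print(expand("dbpedia:Amsterdam"))
for predicate, value in properties_of("dbpedia:Amsterdam").items():
    print(predicate, "->", value)
```

Because subjects and predicates resolve to URIs, a second dataset can refer to the same city simply by reusing `http://dbpedia.org/resource/Amsterdam`.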
  21. 21. Open Government • Data.gov • Data.gov.uk • Many others… (Research Funding Explorer)
  22. 22. BBC • Programs • Music • Artist • World Cup. Who won it? ;)
  23. 23. Open University • Currently: OU public data sit in different systems – hard to discover, obtain, integrate by users. • Exposed as linked data, our data interlink with each other and the external world: they become part of the “global data space” on the Web. • Diagram labels: Data from Research Outputs (ORO), OpenLearn Content, Archive of Course Material, Library’s Catalogue of Digital Content, A/V Material (Podcasts, iTunesU); DBPedia, RAE, geonames, data.gov.uk, BBC, DBLP
  24. 24. Data.open.ac.uk
  25. 25. The Value • Recognized as a critical step forward for the HE sector in the UK – Favors transparency and reuse of data, both externally and internally – Reduces the cost of dealing with our own public data – Enables new kinds of applications and makes those that are already feasible more cost-effective
  26. 26. The Value • Linking educational material across universities. http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm
  27. 27. The Value • Exploring research communities
  28. 28. The Value • And many others…
  29. 29. Conclusions • We have reached the Data Era – Production: there is currently more than a Zettabyte of information in the digital world, and it is growing rapidly – Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data – Challenges: provide machines with the capabilities to extract the knowledge for us!
  30. 30. Conclusions • Many more challenges ahead… – Different formats (text vs. multimedia) – Different dynamics (time / location) – Different provenance – Different topics (heterogeneous) – Distributed, massive, streaming – Various quality – …
  31. 31. THX! • Any ideas to make me rich? ☺ • SlideShare: http://www.slideshare.net/miriamfs • Website: http://people.kmi.open.ac.uk/miriam/about/ • Twitter: @miri_fs • Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing some of their slides with me and for their valuable comments on this presentation
