Slides for my keynote at the KEESOS workshop (http://ir.ii.uam.es/keesos2011/), CAEPIA 2011

    CAEPIA 2011 Presentation Transcript

    • The Data Era: Production, Consumption, Challenges
      Miriam Fernández, 8th November, CAEPIA 2011
      Website: http://people.kmi.open.ac.uk/miriam/about/
      Twitter: @miri_fs
      Slide_share: http://www.slideshare.net/miriamfs
    • What is … ?
    • How do humans infer knowledge?
      – Semantic interpretation: “Alejandro in Chicago!”
      – Syntactic interpretation: “A picture!”
    • How do machines infer knowledge?
      – Semantic interpretation
      – Syntactic interpretation: “A picture!”
    • The Challenge
      • We need to find the way in which machines will interpret and extract knowledge for us!
    • The Challenge
    • The Data Era
      • The 2011 Digital Universe Study: Extracting Value from Chaos (IDC)
        – We have entered the Zettabyte era (a zettabyte is a trillion gigabytes or a billion terabytes)
        – The rate of information growth appears to be exceeding Moore’s Law
      http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
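The unit equivalence quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where a zettabyte is 10^21 bytes; the helper name `units_per_zettabyte` is made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def units_per_zettabyte(unit_bytes: int) -> int:
    """Return how many units of the given size make up one zettabyte."""
    return ZETTABYTE // unit_bytes


if __name__ == "__main__":
    # A trillion (10**12) gigabytes and a billion (10**9) terabytes.
    print(f"1 ZB = {units_per_zettabyte(GIGABYTE):,} GB")
    print(f"1 ZB = {units_per_zettabyte(TERABYTE):,} TB")
```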
    • Big Value from Data
      • Big Data: The next frontier for innovation, competition and productivity (McKinsey)
        – $300 billion potential annual value to US health care
        – €250 billion potential annual value to Europe’s public sector administration
      http://www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf
    • IBM City Forward
      The Smarter Cities Challenge is a competitive grant program awarding $50 million worth of IBM expertise over the next three years to 100 cities around the globe. It is designed to address the wide range of challenges facing cities today.
    • Consumption
      • We need to provide efficient ways to consume data in order to extract the value out of it, the knowledge
        – Syntactic approaches (visual analytics)
          • The data is collected, centralized and analysed
          • Visualizations for humans to extract knowledge
        – Semantic approaches
          • The information is distributed / interlinked
          • Semantic structures are added to the data so that machines can better understand it
    • Syntactic approaches
      • Some examples
        – Gapminder
        – IBM Many Eyes
        – Google Public Data Explorer
        – Google Correlate
        – Google N-Gram Viewer
          • What is the most popular hair colour in the literature?
    • Google N-Gram Viewer
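The hair-colour question is an n-gram frequency question: count how often each "<colour> hair" bigram occurs in a body of text. The sketch below illustrates the idea on a tiny made-up corpus. The sentences, the colour list and the function names are all assumptions for illustration, and it reports raw counts only, not how the Google tool itself computes its figures.

```python
import re
from collections import Counter


def ngrams(text: str, n: int):
    """Yield every run of n consecutive lowercase word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(text: str, n: int) -> Counter:
    """Count each n-gram, keyed by its words joined with single spaces."""
    return Counter(" ".join(gram) for gram in ngrams(text, n))


# Toy corpus invented for this example; a real study would use a book corpus.
TOY_CORPUS = (
    "She had black hair. His brown hair was wet. "
    "Her black hair shone. The blond hair of the child."
)
HAIR_COLOURS = ["black", "brown", "blond", "red", "grey"]

bigram_counts = count_ngrams(TOY_CORPUS, 2)
# A Counter returns 0 for bigrams that never occur.
hair_counts = {colour: bigram_counts[f"{colour} hair"] for colour in HAIR_COLOURS}
most_popular = max(hair_counts, key=hair_counts.get)

if __name__ == "__main__":
    print(hair_counts)
    print("Most frequent in the toy corpus:", most_popular)
```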
    • Semantic approaches
      • “The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation”
        Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
    • The SW vision
      • Use semantic structures (ontologies) to represent data. Provide machines with the ability to interpret and extract knowledge
    • Adding Structure
      • Two paths towards the SW vision
        – Metadata embedded in HTML
          • Microformats
          • RDFa
          • Microdata
        – Linked Data
          • Putting the data online in a standard, web-enabled representation (RDF)
          • Making the data Web addressable (URIs)
    • Metadata in HTML
      • An example
          <div class="vcard">
            <div class="fn org">Knowledge Media Institute</div>
            <div class="adr">
              <div class="street-address">Walton Hall</div>
              <div>
                <span class="locality">Milton Keynes</span>,
                <span class="postal-code">MK7 6AA</span>
              </div>
              <div class="country-name">United Kingdom</div>
            </div>
          </div>
        Rendered as:
          Knowledge Media Institute
          Walton Hall
          Milton Keynes, MK7 6AA
          United Kingdom
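To show what a machine gains from the hCard markup on this slide, here is a minimal sketch that pulls the marked-up fields out of the HTML with Python's standard `html.parser`. It is not a full microformats parser: it assumes each property's text sits directly inside the element carrying the class, and the `HCardParser` and `parse_hcard` names are made up for illustration.

```python
from html.parser import HTMLParser

# The hCard example from the slide.
HCARD_HTML = """
<div class="vcard">
  <div class="fn org">Knowledge Media Institute</div>
  <div class="adr">
    <div class="street-address">Walton Hall</div>
    <div>
      <span class="locality">Milton Keynes</span>,
      <span class="postal-code">MK7 6AA</span>
    </div>
    <div class="country-name">United Kingdom</div>
  </div>
</div>
"""

# hCard property class names that appear in the example.
HCARD_PROPERTIES = {
    "fn", "org", "street-address", "locality", "postal-code", "country-name",
}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names are hCard properties."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        # One entry per open element: the hCard properties it carries.
        # Assumes every start tag has a matching end tag (no bare <br>).
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in HCARD_PROPERTIES])

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        # Attribute the text to the properties of the innermost open element.
        for prop in self._stack[-1]:
            self.fields[prop] = self.fields.get(prop, "") + text


def parse_hcard(html: str) -> dict:
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    for name, value in parse_hcard(HCARD_HTML).items():
        print(f"{name}: {value}")
```

The comma between the locality and the postal code is skipped because it belongs to an element with no hCard class, so only the labelled values end up in the result.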
    • Metadata in HTML
      • Schema.org
        Semantically enhanced Information Retrieval: an ontology-based approach
        http://people.kmi.open.ac.uk/miriam/about/
    • Metadata in HTML
      • The Open Graph protocol
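Open Graph metadata is carried in `<meta property="og:..." content="...">` tags in a page's head. The sketch below reads those tags with the standard `html.parser`. The sample page is hypothetical: its markup and values were written for this example (reusing the talk's title and SlideShare URL) and are not taken from any real page.

```python
from html.parser import HTMLParser

# Hypothetical page head, invented for illustration only.
SAMPLE_HTML = """
<html>
  <head>
    <title>The Data Era</title>
    <meta property="og:title" content="The Data Era: Production, Consumption, Challenges" />
    <meta property="og:type" content="website" />
    <meta property="og:url" content="http://www.slideshare.net/miriamfs" />
    <meta name="description" content="Not an Open Graph property" />
  </head>
  <body></body>
</html>
"""


class OpenGraphParser(HTMLParser):
    """Collect the content of every <meta> tag whose property starts with 'og:'."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def parse_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    for name, value in parse_open_graph(SAMPLE_HTML).items():
        print(f"{name} = {value}")
```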
    • Linked Data
      Linking Open Data cloud diagrams for 2007, 2008, 2009 and 2010, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    • Linked Data
      • An example
        http://data.semanticweb.org/person/miriam-fernandez/rdf
          <ns1:Person rdf:about="http://data.semanticweb.org/person/miriam-fernandez">
            <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>

          @prefix dbpedia: <http://dbpedia.org/resource/> .
          @prefix dbterm: <http://dbpedia.org/property/> .
          dbpedia:Amsterdam dbterm:officialName "Amsterdam" ;
              dbterm:longd "4" ;
              dbterm:longm "53" ;
              dbterm:longs "32" ;
              …
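The Amsterdam statements on this slide are subject, predicate, object triples whose prefixed names abbreviate full URIs. As a rough sketch of how a program might consume them, the code below expands the prefixes, stores the four triples as tuples and answers simple pattern queries. It uses plain Python rather than an RDF library; the `expand`, `match` and `literal` helpers are made up for illustration, and reading `longd`, `longm` and `longs` as longitude degrees, minutes and seconds is an assumption.

```python
# Prefix declarations from the slide's example.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbterm": "http://dbpedia.org/property/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dbpedia:Amsterdam' into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


# Each triple is (subject URI, predicate URI, object).
# Objects here are literal strings, exactly as quoted on the slide.
TRIPLES = [
    (expand("dbpedia:Amsterdam"), expand("dbterm:officialName"), "Amsterdam"),
    (expand("dbpedia:Amsterdam"), expand("dbterm:longd"), "4"),
    (expand("dbpedia:Amsterdam"), expand("dbterm:longm"), "53"),
    (expand("dbpedia:Amsterdam"), expand("dbterm:longs"), "32"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple for triple in triples
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


def literal(subject: str, prop: str) -> str:
    """Return the first object stored for the given subject and property."""
    return match(TRIPLES, subject=subject, predicate=expand(prop))[0][2]


amsterdam = expand("dbpedia:Amsterdam")

# Assumption: longd / longm / longs hold degrees, minutes and seconds.
longitude = (
    int(literal(amsterdam, "dbterm:longd"))
    + int(literal(amsterdam, "dbterm:longm")) / 60
    + int(literal(amsterdam, "dbterm:longs")) / 3600
)

if __name__ == "__main__":
    print(literal(amsterdam, "dbterm:officialName"))
    print(f"Longitude in decimal degrees: {longitude:.4f}")
```

Because every name is a full URI, the same `match` call would work unchanged on triples merged in from another dataset, which is the point of making the data Web addressable.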
    • Open Government
      • Data.gov
      • Data.gov.uk
      • Many others…
      Research Funding Explorer
    • BBC
      • Programs
      • Music
      • Artist
      • World Cup
      Who won it? ;)
    • Open University
      • Currently: OU public data sit in different systems: hard for users to discover, obtain and integrate
      • Exposed as linked data, our data interlink with each other and the external world: become part of the “global data space” on the Web
      Diagram labels: DBPedia, RAE, Data from Research Outputs, ORO, OpenLearn Content, Archive of Course Material, Library’s Catalogue of Digital Content, A/V Material, Podcasts, iTunesU, geonames, data.gov.uk, BBC, DBLP
    • data.open.ac.uk
    • The Value
      • Recognized as a critical step forward for the HE sector in the UK
        – Favors transparency and reuse of data, both externally and internally
        – Reduces the cost of dealing with our own public data
        – Enables new kinds of applications and makes the ones that are already feasible more cost-effective
    • The Value
      • Linking educational material across universities
      http://smartproducts1.kmi.open.ac.uk/web-linkeduniversities/index.htm
    • The Value
      • Exploring research communities
    • The Value
      • And many others…
    • Conclusions
      • We have reached the Data Era
        – Production: there is currently more than a Zettabyte of information in the digital world, and it is growing fast
        – Consumption: syntactic and semantic approaches have emerged to extract the value (the knowledge) out of the data
        – Challenges: provide machines with the capabilities to extract the knowledge for us!
    • Conclusions
      • Many more challenges ahead…
        – Different formats (text vs. multimedia)
        – Different dynamics (time / location)
        – Different provenance
        – Different topics (heterogeneous)
        – Distributed, massive, stream
        – Various quality
        – …
    • THX!
      • Any ideas to make me rich? ☺
      • Slide_share: http://www.slideshare.net/miriamfs
      • Website: http://people.kmi.open.ac.uk/miriam/about/
      • Twitter: @miri_fs
      Thanks to Fouad Zablith and Mathieu d'Aquin ☺ for sharing some of their slides with me and for their valuable comments on this presentation