• Like
Small, Medium and Big Data
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Small, Medium and Big Data



Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • Thank you Pierre for delivering such an engaging talk! The students from mastic.ulb.ac.be learned a lot and the invitees from the outside world (consultants and researchers) were also very exited about the topics you raised. It's not always easy to get a global view on the recent developments in the context of Big Data and the NoSQL movement but your presentation helped to clear things up. Hope you'll be able to deliver this talk also elsewhere! Seth van Hooland
    Are you sure you want to
    Your message goes here
No Downloads


Total Views
On SlideShare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide


  • 1. Small, Medium & Big DataPierre De Wilde23 November 2012ULB - MASTIChttp://mastic.ulb.ac.be
  • 2. Sir Tim Berners-Lee http://www.w3.org/People/Berners-Lee/
  • 3. Semantic Web Trends http://www.google.com/trends/explore#q=semantic%20web
  • 4. Linked Data Trends http://www.google.com/trends/explore#q=semantic%20web%2C%20linked%20data
  • 5. Linked Data Cloud Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 6. Semantic Web Semantic URI, RDF(S), OWL, SPARQL Web Scale ?
  • 7. Web Scale Million of servers Billion of users Billion of objects => its really Big
  • 8. Big Data Trends http://www.google.com/trends/explore#q=semantic%20web%2C%20big%20data
  • 9. Big Data 3 Vs Its not only about big volume of data...
  • 10. V for ... Source: Anonymous
  • 11. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 12. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 13. How Big is our Data? M mega million 106 G giga billion 109 T tera trillion 1012 P peta quadrillion 1015 E exa quintillion 1018 Z zetta sextillion 1021 Y yotta septillion 1024 Check The Powers of Ten (1977) on YouTube
  • 14. Big Data Sources Million of servers (logs) Billion of users (social networks) Billion of devices (smartphones) + Time/Space = Big Data
  • 15. Big Data Examples Facebook collects 500 TB per day (1) Google processes 24 PB per day (2) We create 2.5 EB per day (3) (1) http://gigaom.com/data/facebook-is-collecting-your-data-500-terabytes-a-day/ (2) http://en.wikipedia.org/wiki/Petabyte (2009) (3) http://www-01.ibm.com/software/data/bigdata/
  • 16. How Small is our Wisdom? Wisdom Knowledge Information Big Data Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? T. S. Eliot, The Rock
  • 17. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 18. Scalability Scaling up and Scaling out Partitioning and Sharding
  • 19. Relational Databases
  • 20. RDBMS Row Store B-tree indexing SQL as query language
  • 21. RDBMS issues Scale up (big servers) Schemaful (structured) Index-intensive (join)
  • 22. NoSQL Scale out (commodity servers) Schemaless (semi-structured) Index-free adjacency (graph)
  • 23. NoSQL databases Credit: Neo Technology
  • 24. Key-Value Stores (Key:string) => Value fast read, low write latency used for sessions, carts Dynamo: Amazon’s Highly Available Key-value Store (2007)
  • 25. Bigtable Clones Googles Distributed Storage System (row:string, col:string, ts:int64) => string used by Google & most companies Bigtable: A Distributed Storage System for Structured Data (2006)
  • 26. Document Databases document-oriented (content query) semi-structured data (JSON) used for web apps
  • 27. Graph Databases property graph index-free adjacency used for recommendations, social networks
  • 28. Graph G = (V, E)
  • 29. Property Graph A property graph is a directed, labeled, attributed graph
  • 30. Graph Traversal Gremlin is jumping - from vertex to vertex - from vertex to edge - from edge to vertex https://github.com/tinkerpop/gremlin/wiki
  • 31. DBpedia Traversal + +gremlin> g = new SparqlRepositorySailGraph("http://dbpedia.org/sparql")gremlin> r = g.v(http://dbpedia.org/resource/Tim_Berners-Lee)gremlin> r.out(http://www.w3.org/2000/01/rdf-schema#comment).has(lang,fr).value==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeurdu World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nomofficiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium(W3C), organisme quil a fondé.gremlin> r.in(http://dbpedia.org/ontology/influenced)==>v[http://dbpedia.org/resource/Paul_Otlet]gremlin> r.in(http://dbpedia.org/ontology/influenced).out(http://dbpedia.org/ontology/influenced)==>v[http://dbpedia.org/resource/Douglas_Engelbart]==>v[http://dbpedia.org/resource/Ted_Nelson]==>v[http://dbpedia.org/resource/Vannevar_Bush]==>v[http://dbpedia.org/resource/Tim_Berners-Lee]...
  • 32. Triple/RDF Stores Subject-Predicate-Object SPARQL as query language AllegroGraph, OpenLink Virtuoso, ...
  • 33. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 34. Big Data Processing Batch Processing MapReduce Interactive Analysis BigQuery
  • 35. MapReduce MapReduce: Simplified Data Processing on Large Clusters (2004)
  • 36. Apache Hadoop Distributed Data + MapReduce http://hadoop.apache.org/
  • 37. Last Trends http://www.google.com/trends/explore#q=hadoop%2C%20mongodb%2C%20neo4j
  • 38. NoSQL issues No Distributed Transactions No SQL as query language
  • 39. NewSQL NoSQL + Distributed Transactions + SQL Spanner: Googles Globally-Distributed Database (2012)
  • 40. Thank youCredit: Most images created by Flickr Creative Commons Artists or Wikipedia Commons Artists