Small, Medium and Big Data


  • Thank you Pierre for delivering such an engaging talk! The students from learned a lot, and the invitees from the outside world (consultants and researchers) were also very excited about the topics you raised. It's not always easy to get a global view of the recent developments in Big Data and the NoSQL movement, but your presentation helped clear things up. I hope you'll be able to deliver this talk elsewhere too! Seth van Hooland

  1. Small, Medium & Big Data: Pierre De Wilde, 23 November 2012, ULB - MASTIC
  2. Sir Tim Berners-Lee
  3. Semantic Web Trends
  4. Linked Data Trends
  5. Linked Data Cloud: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
  6. Semantic Web: Semantic = URI, RDF(S), OWL, SPARQL; Web = Web scale?
  7. Web Scale: millions of servers, billions of users, billions of objects => it's really Big
  8. Big Data Trends
  9. Big Data 3 Vs: it's not only about a big volume of data...
  10. V for ... (Source: Anonymous)
  11. V for ...: Volume (scale, sources), Variety (relational, NoSQL), Velocity (operational, analytical)
  12. V for ...: Volume (scale, sources), Variety (relational, NoSQL), Velocity (operational, analytical)
  13. How Big is our Data? M: mega, million, 10^6; G: giga, billion, 10^9; T: tera, trillion, 10^12; P: peta, quadrillion, 10^15; E: exa, quintillion, 10^18; Z: zetta, sextillion, 10^21; Y: yotta, septillion, 10^24. Check out Powers of Ten (1977) on YouTube.
  14. Big Data Sources: millions of servers (logs), billions of users (social networks), billions of devices (smartphones) + time/space = Big Data
  15. Big Data Examples: Facebook collects 500 TB per day (1); Google processes 24 PB per day (2); we create 2.5 EB per day (3). Sources: (1); (2) (2009); (3).
  16. How Small is our Wisdom? Wisdom, Knowledge, Information, Big Data. "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" (T. S. Eliot, The Rock)
  17. V for ...: Volume (scale, sources), Variety (relational, NoSQL), Velocity (operational, analytical)
  18. Scalability: scaling up and scaling out; partitioning and sharding
  19. Relational Databases
  20. RDBMS: row store, B-tree indexing, SQL as query language
  21. RDBMS issues: scale up (big servers), schemaful (structured), index-intensive (joins)
  22. NoSQL: scale out (commodity servers), schemaless (semi-structured), index-free adjacency (graphs)
  23. NoSQL databases (Credit: Neo Technology)
  24. Key-Value Stores: (key:string) => value; fast reads, low write latency; used for sessions and shopping carts. Dynamo: Amazon's Highly Available Key-value Store (2007)
  25. Bigtable Clones: Google's distributed storage system, (row:string, col:string, ts:int64) => string; used by Google and most companies. Bigtable: A Distributed Storage System for Structured Data (2006)
  26. Document Databases: document-oriented (content queries), semi-structured data (JSON), used for web apps
  27. Graph Databases: property graph, index-free adjacency, used for recommendations and social networks
  28. Graph: G = (V, E), a set of vertices V and a set of edges E
  29. Property Graph: a property graph is a directed, labeled, attributed graph
  30. Graph Traversal: Gremlin is jumping from vertex to vertex, from vertex to edge, and from edge to vertex
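  31. DBpedia Traversal:

      gremlin> g = new SparqlRepositorySailGraph("")
      gremlin> r = g.v(> r.out(,fr).value
      ==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeur du World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nom officiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium (W3C), organisme qu'il a fondé.
      gremlin>>v[]
      gremlin>>v[]
      ==>v[]
      ==>v[]
      ==>v[]
      ...

      (English translation of the French abstract returned above: "Sir Timothy John Berners-Lee is a British citizen best known as the principal inventor of the World Wide Web. In July 2004, he was knighted by Queen Elizabeth II for this work, and his official name became Sir Timothy John Berners-Lee. Since 1994, he has chaired the World Wide Web Consortium (W3C), an organization he founded.")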
  32. Triple/RDF Stores: subject-predicate-object; SPARQL as query language; AllegroGraph, OpenLink Virtuoso, ...
  33. V for ...: Volume (scale, sources), Variety (relational, NoSQL), Velocity (operational, analytical)
  34. Big Data Processing: batch processing (MapReduce), interactive analysis (BigQuery)
  35. MapReduce: Simplified Data Processing on Large Clusters (2004)
  36. Apache Hadoop: distributed data + MapReduce
  37. Latest Trends
  38. NoSQL issues: no distributed transactions, no SQL as query language
  39. NewSQL: NoSQL + distributed transactions + SQL. Spanner: Google's Globally-Distributed Database (2012)
  40. Thank you. Credit: most images created by Flickr Creative Commons artists or Wikimedia Commons artists
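Slides 13 and 15 pair SI prefixes with daily data volumes. As a quick sanity check, the Python sketch below converts the three daily volumes quoted on slide 15 into per-second rates. It is an illustration, not material from the talk. It uses the decimal prefixes of slide 13 (1 TB = 10^12 bytes), and the helper names `to_bytes` and `humanize` are hypothetical.

```python
# Decimal (SI) prefixes from slide 13, as powers of ten.
SI_PREFIXES = {"M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15,
               "E": 10**18, "Z": 10**21, "Y": 10**24}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def to_bytes(amount, prefix):
    """Convert e.g. (500, "T") into a plain number of bytes."""
    return amount * SI_PREFIXES[prefix]


def humanize(num_bytes):
    """Render a byte count with the largest SI prefix that keeps it >= 1."""
    for prefix in sorted(SI_PREFIXES, key=SI_PREFIXES.get, reverse=True):
        if num_bytes >= SI_PREFIXES[prefix]:
            return f"{num_bytes / SI_PREFIXES[prefix]:.1f} {prefix}B"
    return f"{num_bytes:.0f} B"


# Daily volumes quoted on slide 15.
daily_volumes = {
    "Facebook collects": (500, "T"),
    "Google processes": (24, "P"),
    "We create": (2.5, "E"),
}

for label, (amount, prefix) in daily_volumes.items():
    per_second = to_bytes(amount, prefix) / SECONDS_PER_DAY
    print(f"{label} {amount} {prefix}B/day = {humanize(per_second)}/s")
```

By this arithmetic, 500 TB per day is roughly 5.8 GB every second, 24 PB per day is roughly 277.8 GB per second, and 2.5 EB per day is roughly 28.9 TB per second.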
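Slide 18 names partitioning and sharding as the way to scale out across many commodity servers. Here is a minimal sketch of hash partitioning in Python, assuming string keys and a fixed number of shards. The functions `shard_for` and `partition` are hypothetical.

```python
import hashlib
from collections import defaultdict


def shard_for(key, num_shards):
    """Hash partitioning: map a key to a shard number in [0, num_shards).

    A stable hash (MD5) is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by shard; each shard could live on its own server."""
    shards = defaultdict(list)
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return dict(shards)


users = [f"user:{i}" for i in range(10)]
for shard, members in sorted(partition(users, 3).items()):
    print(shard, members)
```

The shard is computed as `hash mod N`, so changing N (adding a server) remaps most keys. This is one motivation for consistent hashing, which Dynamo (slide 24) uses instead.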
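Slides 20 and 21 describe the relational side: a schemaful row store, B-tree indexes, SQL, and joins. Python's built-in `sqlite3` module gives a small, self-contained illustration, since SQLite keeps tables and indexes in B-trees. The `person`/`knows` schema is a hypothetical example. In a friends-of-friends query, each extra hop costs another join.

```python
import sqlite3

# In-memory SQLite database: a small row-store RDBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schemaful: every row must fit the declared columns.
cur.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE knows (src INTEGER NOT NULL REFERENCES person(id),
                    dst INTEGER NOT NULL REFERENCES person(id));
-- Secondary B-tree index so edges can be looked up by src.
CREATE INDEX knows_src ON knows(src);
""")

cur.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob"), (3, "Carol")])
cur.executemany("INSERT INTO knows (src, dst) VALUES (?, ?)",
                [(1, 2), (2, 3)])

# Friends of friends: two hops means joining knows with itself,
# plus joins back to person; the planner typically uses index lookups.
rows = cur.execute("""
SELECT p1.name, p3.name
FROM person p1
JOIN knows k1 ON k1.src = p1.id
JOIN knows k2 ON k2.src = k1.dst
JOIN person p3 ON p3.id = k2.dst
""").fetchall()
print(rows)  # [('Alice', 'Carol')]
conn.close()
```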
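Slide 24 cites Dynamo, which partitions keys with consistent hashing over a ring of virtual nodes. The Python sketch below shows only that partitioning idea, applied to a toy in-memory key-value store. The paper's replication, versioning, and failure handling are left out, and `HashRing`, `put`, and `get` are hypothetical names.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring with virtual nodes (Dynamo-style partitioning)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears vnodes times at pseudo-random positions.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key):
        """First virtual node clockwise from the key's position (wrapping around)."""
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# A toy key-value store: each node holds a plain dict of key -> value.
nodes = ["node-a", "node-b", "node-c"]
ring = HashRing(nodes)
stores = {n: {} for n in nodes}


def put(key, value):
    stores[ring.node_for(key)][key] = value


def get(key):
    return stores[ring.node_for(key)].get(key)


put("session:42", {"user": "alice", "cart": ["book"]})
print(get("session:42"))

# Adding a node only moves the keys that now fall into its ranges.
keys = [f"cart:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
bigger = HashRing(nodes + ["node-d"])
moved = sum(before[k] != bigger.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

With 100 virtual nodes per server, adding a fourth node should move roughly a quarter of the keys, and every moved key goes to the new node. By contrast, `hash mod N` would remap most keys.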
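Slide 25 gives Bigtable's data model: a sorted map from (row, column, timestamp) to a string. The in-memory Python sketch below mimics only the shape of that map, with no tablets, SSTables, or distribution. `ToyBigtable` is a hypothetical class. The reversed-domain row key follows the Webtable example in the Bigtable paper, and the cell values are made up.

```python
import bisect
from collections import defaultdict


class ToyBigtable:
    """In-memory sketch of (row:string, col:string, ts:int64) -> string.

    Row keys are kept sorted so range scans are cheap; each cell keeps
    several timestamped versions, newest first.
    """

    def __init__(self):
        self._rows = {}      # row -> {col -> [(ts, value), ...] newest first}
        self._row_keys = []  # sorted row keys

    def put(self, row, col, ts, value):
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][col]
        versions.append((ts, value))
        versions.sort(reverse=True)  # newest timestamp first

    def get(self, row, col, ts=None):
        """Latest value at or before ts (latest overall if ts is None)."""
        for version_ts, value in self._rows.get(row, {}).get(col, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start, end):
        """Row keys in [start, end), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]


t = ToyBigtable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.example", "contents:", 1, "<html>ex")
print(t.get("com.cnn.www", "contents:"))     # <html>v5
print(t.get("com.cnn.www", "contents:", 4))  # <html>v3
print(t.scan("com.c", "com.d"))              # ['com.cnn.www']
```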
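Slide 26 describes document databases as stores of semi-structured JSON documents that are queried by content. Here is a minimal Python sketch, assuming an in-memory list stands in for a "collection". The dotted-path and array-contains matching rules loosely imitate MongoDB-style queries. However, `find` and its helpers are hypothetical, not any database's API.

```python
import json

# Semi-structured documents: no shared schema is enforced,
# so documents in the same collection may have different fields.
raw = """[
  {"_id": 1, "title": "Small Data", "tags": ["sql"], "views": 120},
  {"_id": 2, "title": "Big Data", "tags": ["nosql", "hadoop"],
   "author": {"name": "Alice"}},
  {"_id": 3, "title": "Graphs", "tags": ["nosql", "graph"], "views": 40}
]"""
collection = json.loads(raw)


def get_path(doc, path):
    """Follow a dotted path such as 'author.name'; None if any step is missing."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def matches(doc, field, wanted):
    value = get_path(doc, field)
    # Array fields match if they contain the wanted value.
    return wanted in value if isinstance(value, list) else value == wanted


def find(docs, criteria):
    """Content query: keep documents that satisfy every field criterion."""
    return [d for d in docs if all(matches(d, f, w) for f, w in criteria.items())]


print([d["_id"] for d in find(collection, {"tags": "nosql"})])            # [2, 3]
print([d["title"] for d in find(collection, {"author.name": "Alice"})])   # ['Big Data']
```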
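Slides 27 to 30 describe property graphs, index-free adjacency, and Gremlin-style traversal. In the Python sketch below, each vertex holds direct references to its edges, so a traversal hop follows pointers instead of consulting an index or performing a join (compare the relational approach on slide 21). The step names `out_e`, `in_v`, and `out` echo Gremlin's `outE`, `inV`, and `out` steps. This is a hypothetical toy, not the Gremlin API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality avoids recursing through edges
class Vertex:
    vid: str
    props: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)  # direct references: index-free adjacency
    in_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    tail: Vertex  # outgoing vertex
    head: Vertex  # incoming vertex
    props: dict = field(default_factory=dict)


def add_edge(tail, label, head, **props):
    """Directed, labeled, attributed edge, linked from both endpoints."""
    e = Edge(label, tail, head, props)
    tail.out_edges.append(e)
    head.in_edges.append(e)
    return e


def out_e(v, label):  # vertex -> edge
    return [e for e in v.out_edges if e.label == label]


def in_v(e):  # edge -> vertex
    return e.head


def out(v, label):  # vertex -> vertex, via the edge
    return [in_v(e) for e in out_e(v, label)]


alice = Vertex("alice", {"name": "Alice"})
bob = Vertex("bob", {"name": "Bob"})
carol = Vertex("carol", {"name": "Carol"})
add_edge(alice, "knows", bob, since=2010)
add_edge(bob, "knows", carol, since=2012)

# Friends of friends: two pointer hops, no join.
fof = [w.props["name"] for v in out(alice, "knows") for w in out(v, "knows")]
print(fof)  # ['Carol']
print([e.props["since"] for e in out_e(alice, "knows")])  # [2010]
```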
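Slide 32 describes triple stores queried with SPARQL. The Python sketch below stores facts as (subject, predicate, object) tuples. It then evaluates a conjunction of triple patterns, in the spirit of a SPARQL basic graph pattern, by joining variable bindings pattern by pattern. The `ex:` terms are hypothetical. `rdf:type` and `foaf:name` are real vocabulary terms, but they appear here only as plain strings, with no RDF library involved. The `:=` operator requires Python 3.8 or later.

```python
# A tiny in-memory triple store: every fact is (subject, predicate, object).
triples = {
    ("ex:TimBL", "rdf:type", "foaf:Person"),
    ("ex:TimBL", "foaf:name", "Tim Berners-Lee"),
    ("ex:TimBL", "ex:invented", "ex:WWW"),
    ("ex:TimBL", "ex:founded", "ex:W3C"),
    ("ex:W3C", "rdf:type", "ex:Organization"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to something else
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns):
    """All variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


# Roughly: SELECT ?name ?org WHERE { ?p foaf:name ?name .
#          ?p ex:founded ?org . ?org rdf:type ex:Organization }
result = query([("?p", "foaf:name", "?name"),
                ("?p", "ex:founded", "?org"),
                ("?org", "rdf:type", "ex:Organization")])
for row in result:
    print(row["?name"], row["?org"])  # Tim Berners-Lee ex:W3C
```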
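Slide 35 cites the MapReduce paper, whose running example counts word occurrences. The single-process Python sketch below reproduces only the programming model: map, shuffle by key, and reduce. It is not Hadoop's API. On a real cluster, the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id, text):
    """Map: emit (word, 1) for every word in one input document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all partial counts emitted for one word."""
    return word, sum(counts)


def map_reduce(inputs, mapper, reducer):
    # Map phase: each input could be processed by a different worker.
    intermediate = chain.from_iterable(mapper(k, v) for k, v in inputs.items())
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one call per distinct key (also parallelizable).
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


docs = {"d1": "big data small data", "d2": "medium data"}
print(map_reduce(docs, map_fn, reduce_fn))
# {'big': 1, 'data': 3, 'medium': 1, 'small': 1}
```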
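Slides 38 and 39 hinge on distributed transactions. The textbook protocol for committing one transaction across several shards is two-phase commit, which Spanner also runs across its Paxos groups. The Python sketch below covers only the happy-path decision logic. Durable logging, timeouts, and coordinator failure are omitted, and `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """One shard taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates the outcome of local checks
        self.state = "idle"

    def prepare(self, txn_id):
        """Phase 1: stage the writes and vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txn_id):
        self.state = "committed"

    def abort(self, txn_id):
        self.state = "aborted"


def two_phase_commit(txn_id, participants):
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare(txn_id) for p in participants]
    decision = all(votes)
    # Phase 2 (commit/abort): send the same global decision to everyone.
    for p in participants:
        if decision:
            p.commit(txn_id)
        else:
            p.abort(txn_id)
    return "committed" if decision else "aborted"


ok = [Participant("shard-1"), Participant("shard-2")]
print(two_phase_commit("t1", ok), [p.state for p in ok])
# committed ['committed', 'committed']

bad = [Participant("shard-1"), Participant("shard-2", can_commit=False)]
print(two_phase_commit("t2", bad), [p.state for p in bad])
# aborted ['aborted', 'aborted']
```

The key property is atomicity: a single "no" vote aborts the transaction on every shard, including shards that had already prepared successfully.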