Your SlideShare is downloading. ×
Small, Medium and Big Data
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Small, Medium and Big Data


Published on

Published in: Technology
1 Comment
1 Like
  • Thank you Pierre for delivering such an engaging talk! The students from learned a lot and the invitees from the outside world (consultants and researchers) were also very exited about the topics you raised. It's not always easy to get a global view on the recent developments in the context of Big Data and the NoSQL movement but your presentation helped to clear things up. Hope you'll be able to deliver this talk also elsewhere! Seth van Hooland
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Small, Medium & Big DataPierre De Wilde23 November 2012ULB - MASTIC
  • 2. Sir Tim Berners-Lee
  • 3. Semantic Web Trends
  • 4. Linked Data Trends
  • 5. Linked Data Cloud Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
  • 6. Semantic Web Semantic URI, RDF(S), OWL, SPARQL Web Scale ?
  • 7. Web Scale Million of servers Billion of users Billion of objects => its really Big
  • 8. Big Data Trends
  • 9. Big Data 3 Vs Its not only about big volume of data...
  • 10. V for ... Source: Anonymous
  • 11. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 12. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 13. How Big is our Data? M mega million 106 G giga billion 109 T tera trillion 1012 P peta quadrillion 1015 E exa quintillion 1018 Z zetta sextillion 1021 Y yotta septillion 1024 Check The Powers of Ten (1977) on YouTube
  • 14. Big Data Sources Million of servers (logs) Billion of users (social networks) Billion of devices (smartphones) + Time/Space = Big Data
  • 15. Big Data Examples Facebook collects 500 TB per day (1) Google processes 24 PB per day (2) We create 2.5 EB per day (3) (1) (2) (2009) (3)
  • 16. How Small is our Wisdom? Wisdom Knowledge Information Big Data Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? T. S. Eliot, The Rock
  • 17. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 18. Scalability Scaling up and Scaling out Partitioning and Sharding
  • 19. Relational Databases
  • 20. RDBMS Row Store B-tree indexing SQL as query language
  • 21. RDBMS issues Scale up (big servers) Schemaful (structured) Index-intensive (join)
  • 22. NoSQL Scale out (commodity servers) Schemaless (semi-structured) Index-free adjacency (graph)
  • 23. NoSQL databases Credit: Neo Technology
  • 24. Key-Value Stores (Key:string) => Value fast read, low write latency used for sessions, carts Dynamo: Amazon’s Highly Available Key-value Store (2007)
  • 25. Bigtable Clones Googles Distributed Storage System (row:string, col:string, ts:int64) => string used by Google & most companies Bigtable: A Distributed Storage System for Structured Data (2006)
  • 26. Document Databases document-oriented (content query) semi-structured data (JSON) used for web apps
  • 27. Graph Databases property graph index-free adjacency used for recommendations, social networks
  • 28. Graph G = (V, E)
  • 29. Property Graph A property graph is a directed, labeled, attributed graph
  • 30. Graph Traversal Gremlin is jumping - from vertex to vertex - from vertex to edge - from edge to vertex
  • 31. DBpedia Traversal + +gremlin> g = new SparqlRepositorySailGraph("")gremlin> r = g.v(> r.out(,fr).value==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeurdu World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nomofficiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium(W3C), organisme quil a fondé.gremlin>>v[]gremlin>>v[]==>v[]==>v[]==>v[]...
  • 32. Triple/RDF Stores Subject-Predicate-Object SPARQL as query language AllegroGraph, OpenLink Virtuoso, ...
  • 33. V for ... Volume Scale Sources Variety Relational NoSQL Velocity Operational Analytical
  • 34. Big Data Processing Batch Processing MapReduce Interactive Analysis BigQuery
  • 35. MapReduce MapReduce: Simplified Data Processing on Large Clusters (2004)
  • 36. Apache Hadoop Distributed Data + MapReduce
  • 37. Last Trends
  • 38. NoSQL issues No Distributed Transactions No SQL as query language
  • 39. NewSQL NoSQL + Distributed Transactions + SQL Spanner: Googles Globally-Distributed Database (2012)
  • 40. Thank youCredit: Most images created by Flickr Creative Commons Artists or Wikipedia Commons Artists