Small, Medium & Big Data
Pierre De Wilde
23 November 2012
ULB - MASTIC
http://mastic.ulb.ac.be
  • Thank you Pierre for delivering such an engaging talk! The students from mastic.ulb.ac.be learned a lot, and the invitees from the outside world (consultants and researchers) were also very excited about the topics you raised. It's not always easy to get a global view of the recent developments in the context of Big Data and the NoSQL movement, but your presentation helped to clear things up. Hope you'll be able to deliver this talk elsewhere too! Seth van Hooland
Transcript of "Small, Medium and Big Data"

  1. Small, Medium & Big Data. Pierre De Wilde, 23 November 2012, ULB - MASTIC, http://mastic.ulb.ac.be
  2. Sir Tim Berners-Lee: http://www.w3.org/People/Berners-Lee/
  3. Semantic Web Trends: http://www.google.com/trends/explore#q=semantic%20web
  4. Linked Data Trends: http://www.google.com/trends/explore#q=semantic%20web%2C%20linked%20data
  5. Linked Data Cloud: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  6. Semantic Web: Semantic (URI, RDF(S), OWL, SPARQL); Web (Scale?)
  7. Web Scale: millions of servers; billions of users; billions of objects => it's really Big
  8. Big Data Trends: http://www.google.com/trends/explore#q=semantic%20web%2C%20big%20data
  9. Big Data 3 Vs: It's not only about big volume of data...
  10. V for ... (Source: Anonymous)
  11. V for ...: Volume (Scale, Sources); Variety (Relational, NoSQL); Velocity (Operational, Analytical)
  12. V for ...: Volume (Scale, Sources); Variety (Relational, NoSQL); Velocity (Operational, Analytical)
  13. How Big is our Data?
        M  mega   million      10^6
        G  giga   billion      10^9
        T  tera   trillion     10^12
        P  peta   quadrillion  10^15
        E  exa    quintillion  10^18
        Z  zetta  sextillion   10^21
        Y  yotta  septillion   10^24
      Check The Powers of Ten (1977) on YouTube
  14. Big Data Sources: millions of servers (logs); billions of users (social networks); billions of devices (smartphones); + Time/Space = Big Data
  15. Big Data Examples: Facebook collects 500 TB per day (1); Google processes 24 PB per day (2); we create 2.5 EB per day (3). Sources: (1) http://gigaom.com/data/facebook-is-collecting-your-data-500-terabytes-a-day/ (2) http://en.wikipedia.org/wiki/Petabyte (2009) (3) http://www-01.ibm.com/software/data/bigdata/
  16. How Small is our Wisdom? Wisdom, Knowledge, Information, Big Data. "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?" T. S. Eliot, The Rock
  17. V for ...: Volume (Scale, Sources); Variety (Relational, NoSQL); Velocity (Operational, Analytical)
  18. Scalability: scaling up and scaling out; partitioning and sharding
  19. Relational Databases
  20. RDBMS: row store; B-tree indexing; SQL as query language
  21. RDBMS issues: scale up (big servers); schemaful (structured); index-intensive (join)
  22. NoSQL: scale out (commodity servers); schemaless (semi-structured); index-free adjacency (graph)
  23. NoSQL databases (Credit: Neo Technology)
  24. Key-Value Stores: (Key:string) => Value; fast read, low write latency; used for sessions, carts. Dynamo: Amazon's Highly Available Key-value Store (2007)
  25. Bigtable Clones: Google's Distributed Storage System; (row:string, col:string, ts:int64) => string; used by Google & most companies. Bigtable: A Distributed Storage System for Structured Data (2006)
  26. Document Databases: document-oriented (content query); semi-structured data (JSON); used for web apps
  27. Graph Databases: property graph; index-free adjacency; used for recommendations, social networks
  28. Graph: G = (V, E)
  29. Property Graph: A property graph is a directed, labeled, attributed graph
  30. Graph Traversal: Gremlin is jumping from vertex to vertex, from vertex to edge, from edge to vertex. https://github.com/tinkerpop/gremlin/wiki
  31. DBpedia Traversal
        gremlin> g = new SparqlRepositorySailGraph("http://dbpedia.org/sparql")
        gremlin> r = g.v('http://dbpedia.org/resource/Tim_Berners-Lee')
        gremlin> r.out('http://www.w3.org/2000/01/rdf-schema#comment').has('lang','fr').value
        ==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeur du World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nom officiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium (W3C), organisme qu'il a fondé.
        gremlin> r.in('http://dbpedia.org/ontology/influenced')
        ==>v[http://dbpedia.org/resource/Paul_Otlet]
        gremlin> r.in('http://dbpedia.org/ontology/influenced').out('http://dbpedia.org/ontology/influenced')
        ==>v[http://dbpedia.org/resource/Douglas_Engelbart]
        ==>v[http://dbpedia.org/resource/Ted_Nelson]
        ==>v[http://dbpedia.org/resource/Vannevar_Bush]
        ==>v[http://dbpedia.org/resource/Tim_Berners-Lee]
        ...
  32. Triple/RDF Stores: Subject-Predicate-Object; SPARQL as query language; AllegroGraph, OpenLink Virtuoso, ...
  33. V for ...: Volume (Scale, Sources); Variety (Relational, NoSQL); Velocity (Operational, Analytical)
  34. Big Data Processing: Batch Processing (MapReduce); Interactive Analysis (BigQuery)
  35. MapReduce: MapReduce: Simplified Data Processing on Large Clusters (2004)
  36. Apache Hadoop: Distributed Data + MapReduce. http://hadoop.apache.org/
  37. Last Trends: http://www.google.com/trends/explore#q=hadoop%2C%20mongodb%2C%20neo4j
  38. NoSQL issues: no distributed transactions; no SQL as query language
  39. NewSQL: NoSQL + Distributed Transactions + SQL. Spanner: Google's Globally-Distributed Database (2012)
  40. Thank you. Credit: Most images created by Flickr Creative Commons Artists or Wikipedia Commons Artists
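
The property graph and traversal ideas from the Graph, Property Graph, Graph Traversal and DBpedia Traversal slides can be illustrated with a small in-memory sketch in Python. This is not Gremlin and not how DBpedia is stored: the `PropertyGraph` class and its `out` / `in_` methods are hypothetical names chosen to mirror the traversal steps in the transcript, and the four "influenced" edges are toy data copied from the output shown on the DBpedia slide (the real result list is elided there with "..."). Each vertex keeps its own lists of outgoing and incoming edges, which is roughly the idea behind index-free adjacency: a hop follows local references instead of consulting a global index.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: directed, labeled edges and attributed vertices.

    Hypothetical illustration only; not the Gremlin or Blueprints API.
    """

    def __init__(self):
        self.props = {}                 # vertex id -> dict of properties
        self._out = defaultdict(list)   # vertex id -> [(label, head vertex id)]
        self._in = defaultdict(list)    # vertex id -> [(label, tail vertex id)]

    def add_vertex(self, vid, **props):
        # Create the vertex if needed and merge in any properties.
        self.props.setdefault(vid, {}).update(props)

    def add_edge(self, tail, label, head):
        # Store the edge on both endpoints so either direction is one local hop.
        self.add_vertex(tail)
        self.add_vertex(head)
        self._out[tail].append((label, head))
        self._in[head].append((label, tail))

    def out(self, vids, label):
        # Follow outgoing edges with the given label from every vertex in vids.
        return [h for v in vids for (l, h) in self._out[v] if l == label]

    def in_(self, vids, label):
        # Follow incoming edges with the given label back to their tail vertices.
        return [t for v in vids for (l, t) in self._in[v] if l == label]


g = PropertyGraph()
# Toy data taken from the output on the DBpedia Traversal slide.
for person in ["Douglas_Engelbart", "Ted_Nelson", "Vannevar_Bush", "Tim_Berners-Lee"]:
    g.add_edge("Paul_Otlet", "influenced", person)

# Who influenced Tim Berners-Lee, and whom else did those people influence?
influencers = g.in_(["Tim_Berners-Lee"], "influenced")
peers = g.out(influencers, "influenced")
print(influencers)
print(peers)
```

Chaining `in_` and then `out` corresponds to the two-step traversal in the transcript: one hop backwards along "influenced" edges, then one hop forwards from each vertex reached.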