Small, Medium & Big Data
Pierre De Wilde
23 November 2012
ULB - MASTIC
http://mastic.ulb.ac.be
Sir Tim Berners-Lee




             http://www.w3.org/People/Berners-Lee/
Semantic Web Trends




        http://www.google.com/trends/explore#q=semantic%20web
Linked Data Trends




   http://www.google.com/trends/explore#q=semantic%20web%2C%20linked%20data
Linked Data Cloud




 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Semantic Web


               Semantic
                 URI, RDF(S), OWL, SPARQL



               Web
                 Scale ?
Web Scale


            Millions of servers
            Billions of users
            Billions of objects


            => it's really Big
Big Data Trends




    http://www.google.com/trends/explore#q=semantic%20web%2C%20big%20data
Big Data 3 V's




    It's not only about big volume of data...
V for ...




            Source: Anonymous
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
How Big is our Data?


        M     mega            million             10^6
        G     giga            billion             10^9
        T     tera            trillion            10^12
        P     peta            quadrillion         10^15
        E     exa             quintillion         10^18
        Z     zetta           sextillion          10^21
        Y     yotta           septillion          10^24



            Watch Powers of Ten (1977) on YouTube
Big Data Sources


       Millions of servers (logs)

       Billions of users (social networks)

       Billions of devices (smartphones)

       + Time/Space = Big Data
Big Data Examples


            Facebook collects 500 TB per day (1)

            Google processes 24 PB per day (2)

            We create 2.5 EB per day (3)




    (1) http://gigaom.com/data/facebook-is-collecting-your-data-500-terabytes-a-day/
                       (2) http://en.wikipedia.org/wiki/Petabyte (2009)
                     (3) http://www-01.ibm.com/software/data/bigdata/
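To put the three daily figures on one scale, here is a minimal sketch that converts them to bytes with the decimal prefixes from the "How Big is our Data?" slide. The helper name `to_bytes` is illustrative and not from the talk; the input figures are the ones quoted above.

```python
# Decimal (SI) prefixes as listed on the "How Big is our Data?" slide.
PREFIX_EXPONENT = {"M": 6, "G": 9, "T": 12, "P": 15, "E": 18, "Z": 21, "Y": 24}


def to_bytes(value, prefix):
    """Convert a value such as 500 'T' (terabytes) to bytes."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


facebook = to_bytes(500, "T")   # 500 TB per day
google = to_bytes(24, "P")      # 24 PB per day
world = to_bytes(2.5, "E")      # 2.5 EB per day

# Ratios between the quoted daily volumes.
google_vs_facebook = google / facebook   # 48.0
world_vs_google = world / google         # roughly 104
```

On these numbers, each step up the list is one to two orders of magnitude larger than the previous one.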
How Small is our Wisdom?

                           Wisdom




                        Knowledge



                      Information


                   Big Data

            Where is the wisdom we have lost in knowledge?
          Where is the knowledge we have lost in information?

                                        T. S. Eliot, The Rock
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
Scalability


        Scaling up and Scaling out

        Partitioning and Sharding
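Sharding can be sketched as a function from a key to one of N servers. The example below assumes simple hash-modulo placement, which is only one of several possible schemes; the names `shard_for` and `partition` are hypothetical.

```python
import zlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (CRC32 here, as an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(keys, num_shards):
    """Group keys into num_shards buckets, one bucket per server."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Spread 1000 user keys over 4 commodity servers (scaling out).
shards = partition(["user:%d" % i for i in range(1000)], 4)
```

Note that with plain modulo placement, changing the number of shards moves most keys; schemes such as consistent hashing exist to limit that.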
Relational Databases
RDBMS


        Row Store

        B-tree indexing

        SQL as query language
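A minimal sketch of the three points above, using SQLite from the Python standard library as a stand-in RDBMS. The table and index names are made up for illustration; the single fact stored (Paul Otlet influenced Tim Berners-Lee) is taken from the DBpedia slide in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE influenced (src INTEGER, dst INTEGER);
    -- SQLite stores indexes as B-trees.
    CREATE INDEX influenced_dst ON influenced (dst);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Paul Otlet"), (2, "Tim Berners-Lee")],
)
conn.execute("INSERT INTO influenced VALUES (1, 2)")

# Who influenced person 2? Answering requires a join between the two tables.
rows = conn.execute(
    """
    SELECT p.name
    FROM influenced AS i
    JOIN person AS p ON p.id = i.src
    WHERE i.dst = ?
    """,
    (2,),
).fetchall()
```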
RDBMS issues


      Scale up (big servers)

      Schemaful (structured)

      Index-intensive (join)
NoSQL


        Scale out (commodity servers)

        Schemaless (semi-structured)

        Index-free adjacency (graph)
NoSQL databases




              Credit: Neo Technology
Key-Value Stores


       (Key:string) => Value

       fast read, low write latency

       used for sessions, carts




        Dynamo: Amazon’s Highly Available Key-value Store (2007)
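The `(Key:string) => Value` model can be sketched with an in-memory dictionary. This is only an illustration of the interface, not of Dynamo's replication or partitioning; the class name and the session data are hypothetical.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store: string key -> opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
# Typical use from the slide: a session holding a shopping cart.
store.put("session:42", {"user": "alice", "cart": ["book"]})
cart = store.get("session:42")["cart"]
```

The store never inspects the value, which is why lookups are only possible by key.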
Bigtable Clones


        Google's Distributed Storage System

        (row:string, col:string, ts:int64) => string

        used by Google & most companies




       Bigtable: A Distributed Storage System for Structured Data (2006)
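The `(row, col, ts) => string` signature describes a sparse, versioned map. A toy sketch follows, assuming that a read without a timestamp returns the newest version; the class `SparseTable` is hypothetical, and the row and column names are used only as sample data.

```python
class SparseTable:
    """Toy model of the (row, column, timestamp) -> string map."""

    def __init__(self):
        self._cells = {}  # (row, col) -> {timestamp: value}

    def put(self, row, col, ts, value):
        self._cells.setdefault((row, col), {})[ts] = value

    def get(self, row, col, ts=None):
        """Return the newest value at or before ts (newest overall if ts is None)."""
        versions = self._cells.get((row, col), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
latest = table.get("com.cnn.www", "contents:")
as_of_4 = table.get("com.cnn.www", "contents:", ts=4)
```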
Document Databases


       document-oriented (content query)

       semi-structured data (JSON)

       used for web apps
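Unlike a key-value store, a document database can query on content. A minimal sketch with JSON documents of differing shapes follows; the `find` helper and the sample documents are assumptions made for illustration and do not reproduce any particular product's API.

```python
import json

# Two documents with different fields: no shared schema is required.
docs = [json.loads(s) for s in (
    '{"_id": 1, "title": "Big Data", "tags": ["nosql", "hadoop"]}',
    '{"_id": 2, "title": "Semantic Web", "tags": ["rdf"], "year": 2012}',
)]


def find(collection, **criteria):
    """Return documents whose fields match; list fields match on membership."""
    def matches(doc):
        for field, wanted in criteria.items():
            value = doc.get(field)
            if isinstance(value, list):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [d for d in collection if matches(d)]


rdf_docs = find(docs, tags="rdf")
```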
Graph Databases


       property graph

       index-free adjacency

       used for recommendations, social networks
Graph




        G = (V, E)
Property Graph




     A property graph is a directed, labeled, attributed graph
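One possible in-memory shape for such a graph is sketched below. Edges are directed (tail to head), labeled, and both vertices and edges carry properties; each vertex keeps direct references to its edges, which is the idea behind index-free adjacency. The class names and the `kind` property are hypothetical; the "influenced" fact comes from the DBpedia slide in this deck.

```python
class Vertex:
    def __init__(self, vid, **props):
        self.id = vid
        self.props = props        # attributed
        self.out_edges = []       # direct references, no index lookup needed
        self.in_edges = []


class Edge:
    def __init__(self, label, tail, head, **props):
        self.label = label        # labeled
        self.tail = tail          # directed: tail -> head
        self.head = head
        self.props = props        # attributed
        tail.out_edges.append(self)
        head.in_edges.append(self)


otlet = Vertex("Paul_Otlet", kind="person")
tbl = Vertex("Tim_Berners-Lee", kind="person")
edge = Edge("influenced", otlet, tbl)
```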
Graph Traversal


                              Gremlin is jumping

                              - from vertex to vertex
                              - from vertex to edge
                              - from edge to vertex




            https://github.com/tinkerpop/gremlin/wiki
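The three kinds of jump can be sketched over a plain edge list. The function names are loosely modeled on Gremlin step names but this is not Gremlin itself; the toy data contains only the four "influenced" results listed on the DBpedia Traversal slide of this deck.

```python
# Toy edge list: (tail vertex, label, head vertex).
EDGES = [
    ("Paul_Otlet", "influenced", "Douglas_Engelbart"),
    ("Paul_Otlet", "influenced", "Ted_Nelson"),
    ("Paul_Otlet", "influenced", "Vannevar_Bush"),
    ("Paul_Otlet", "influenced", "Tim_Berners-Lee"),
]


def out_e(vertex, label):
    """Vertex -> its outgoing edges."""
    return [e for e in EDGES if e[0] == vertex and e[1] == label]


def in_e(vertex, label):
    """Vertex -> its incoming edges."""
    return [e for e in EDGES if e[2] == vertex and e[1] == label]


def out_v(edge):
    """Edge -> the vertex it leaves (tail)."""
    return edge[0]


def in_v(edge):
    """Edge -> the vertex it enters (head)."""
    return edge[2]


def out(vertex, label):
    """Vertex -> vertex, following outgoing edges."""
    return [in_v(e) for e in out_e(vertex, label)]


def in_(vertex, label):
    """Vertex -> vertex, following incoming edges."""
    return [out_v(e) for e in in_e(vertex, label)]


influencers = in_("Tim_Berners-Lee", "influenced")
peers = [v for i in influencers for v in out(i, "influenced")]
```

Chaining `in_` and then `out` answers "who else was influenced by the people who influenced this person".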
DBpedia Traversal


gremlin> g = new SparqlRepositorySailGraph("http://dbpedia.org/sparql")

gremlin> r = g.v('http://dbpedia.org/resource/Tim_Berners-Lee')

gremlin> r.out('http://www.w3.org/2000/01/rdf-schema#comment').has('lang','fr').value
==>Sir Timothy John Berners-Lee est un citoyen britannique surtout connu comme le principal inventeur
du World Wide Web. En juillet 2004, il est anobli par la reine Elizabeth II pour ce travail et son nom
officiel devient Sir Timothy John Berners-Lee. Depuis 1994, il préside le World Wide Web Consortium
(W3C), organisme qu'il a fondé.

gremlin> r.in('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Paul_Otlet]

gremlin> r.in('http://dbpedia.org/ontology/influenced').out('http://dbpedia.org/ontology/influenced')
==>v[http://dbpedia.org/resource/Douglas_Engelbart]
==>v[http://dbpedia.org/resource/Ted_Nelson]
==>v[http://dbpedia.org/resource/Vannevar_Bush]
==>v[http://dbpedia.org/resource/Tim_Berners-Lee]
...
Triple/RDF Stores


        Subject-Predicate-Object

        SPARQL as query language

        AllegroGraph, OpenLink Virtuoso, ...
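A triple store answers patterns in which any of subject, predicate, or object may be left open. The sketch below is a toy matcher, not SPARQL; it assumes the usual `dbr:` and `dbo:` prefix abbreviations for the DBpedia resource and ontology namespaces, and the triples restate results shown on the DBpedia Traversal slide.

```python
TRIPLES = {
    ("dbr:Paul_Otlet", "dbo:influenced", "dbr:Tim_Berners-Lee"),
    ("dbr:Paul_Otlet", "dbo:influenced", "dbr:Ted_Nelson"),
    ("dbr:Paul_Otlet", "dbo:influenced", "dbr:Vannevar_Bush"),
}


def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a variable."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly: SELECT ?s WHERE { ?s dbo:influenced dbr:Tim_Berners-Lee }
who = [s for s, _, _ in match(p="dbo:influenced", o="dbr:Tim_Berners-Lee")]
```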
V for ...
            Volume
              Scale
              Sources

            Variety
              Relational
              NoSQL

            Velocity
              Operational
              Analytical
Big Data Processing



        Batch Processing
          MapReduce


        Interactive Analysis
          BigQuery
MapReduce




      MapReduce: Simplified Data Processing on Large Clusters (2004)
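The programming model can be sketched in a single process with the classic word-count example: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step folds each group. The function names are illustrative, and nothing here is distributed.

```python
from collections import defaultdict


def map_phase(doc):
    """Emit (word, 1) for every word in a document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


def map_reduce(docs):
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = map_reduce(["big data", "Big Data big volume"])
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data across the network; only the two user-supplied functions stay the same.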
Apache Hadoop




        Distributed Data + MapReduce




                http://hadoop.apache.org/
Latest Trends




   http://www.google.com/trends/explore#q=hadoop%2C%20mongodb%2C%20neo4j
NoSQL issues


       No Distributed Transactions

       No SQL as query language
NewSQL




    NoSQL + Distributed Transactions + SQL




         Spanner: Google's Globally-Distributed Database (2012)
Thank you




Credit: Most images created by Flickr Creative Commons Artists or Wikipedia Commons Artists
