Anthony Nyström
Fellow, Managing Director of Engineering
What is Intridea?
• We design and develop apps: Web, Mobile and Data
• Founded in Washington, DC
• We work with cool clients – really!
• 40+ Intrideans: Designers/Developers/Scientists + Smart biz folks
• We work from anywhere!
• We are growing
• We hire the best and the smartest
Intridean: The guy on stage
Anthony Nyström
Fellow, Managing Director of Engineering
Data Science in the NOW!
It takes an army of TOOLS
An Army of Tools you say?
• I am going to talk about what NOW means in Data Science
• Databases, Streaming Engines, Query Engines and Interfaces
• We are going to look at many of them and single out a few
• Each has its own respected, and in some cases competing, set of features
Then was NOW as NOW was Then
  Now is indeed Then
  Then is indeed Now
Why is NOW in Data Special?
  Actionable Intelligence & Knowledge
  NOW has innate context
  TIME is THE natural facet for our minds & life!
Why is NOW in Data Special?
  Trends | Patterns | Extraction
  Data Centric Trends
    Trends in the data itself, not in user-input data as at Google, Yahoo, etc.
  Pattern Extraction (ML/NLP)
    “I am looking for data that conforms to a learned or known pattern”
  Signature Extraction (Binary, Encoded)
    “I am looking for data that matches a predefined signature”
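The difference between the two kinds of extraction fits in a few lines of Python. This is a minimal illustration, not a production detector. The PNG header stands in for a “predefined signature” and a regular expression stands in for a “known pattern”. Both are assumptions chosen for the example. A *learned* pattern would come from a trained ML/NLP model rather than a hand-written regex.

```python
import re

# Hypothetical predefined signature: the first four bytes of a PNG file header.
PNG_SIGNATURE = b"\x89PNG"

# Hypothetical known pattern: a US-style phone number such as 555-123-4567.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def matches_signature(payload: bytes, signature: bytes = PNG_SIGNATURE) -> bool:
    """Signature extraction: does the record match a predefined byte signature?"""
    return payload.startswith(signature)


def extract_pattern(text: str, pattern: re.Pattern = PHONE_PATTERN) -> list:
    """Pattern extraction: every substring that conforms to a known pattern."""
    return pattern.findall(text)


stream = [
    b"\x89PNG\r\n\x1a\n...image bytes...",
    b"call 555-123-4567 or 555-987-6543",
    b"nothing interesting here",
]

for record in stream:
    if matches_signature(record):
        print("signature hit:", record[:4])
    else:
        hits = extract_pattern(record.decode("utf-8", errors="ignore"))
        if hits:
            print("pattern hits:", hits)
```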
Why is NOW in Data Special?
  Routing | Transformation | Computation
  Intelligent Routing
    “I need to replicate/fork the portions of this data stream that meet criteria x”
  Transformation & Computation
    “I need to transform certain fields” or “I need to compute some value on certain fields”
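Here is a minimal sketch of routing plus transformation/computation over a stream of records. It assumes records are plain Python dicts. The field names (`symbol`, `price`, `qty`, `notional`) and the criterion used to fork the stream are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def route(records: Iterable[Record],
          criteria: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Intelligent routing: every record goes to the main sink, and records
    matching `criteria` are also replicated (forked) to a second sink."""
    main, forked = [], []
    for record in records:
        main.append(record)
        if criteria(record):
            forked.append(dict(record))  # copy so downstream edits don't leak back
    return main, forked


def transform(record: Record) -> Record:
    """Transformation & computation on certain (hypothetical) fields."""
    out = dict(record)
    out["symbol"] = str(out["symbol"]).upper()    # transform a field
    out["notional"] = out["price"] * out["qty"]   # compute a value from fields
    return out


trades = [
    {"symbol": "abc", "price": 10.0, "qty": 3},
    {"symbol": "xyz", "price": 2.5, "qty": 100},
]
# Fork only the "large" trades (notional over 100) to a second stream.
main, big = route(trades, lambda r: r["price"] * r["qty"] > 100)
enriched = [transform(r) for r in big]
print(enriched)  # [{'symbol': 'XYZ', 'price': 2.5, 'qty': 100, 'notional': 250.0}]
```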
Why is NOW in Data Special?
  Algorithmic Specialty
  Concepts
    What does a value represent or imply? (NLP/ML/k-NN)
  Regression
    How is a value related to another value, or how can we predict such relations?
  Relationships
    Topological, Ontological, Forest (Evolutionary/Random) (NLP)
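Two of the techniques named above fit in a short sketch: k-NN for concepts and ordinary least-squares regression. The toy data and labels are made up for illustration. A real pipeline would more likely use a library than these hand-rolled versions.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """k-NN: label `query` by majority vote of its k nearest training points.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Toy data, made up for illustration.
train = [((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
         ((5.0, 5.2), "anomaly"), ((5.1, 4.9), "anomaly")]
print(knn_classify(train, (1.1, 1.0)))  # normal

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```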
When NOW matters!
  Industry/Vertical
  Medical
    Algorithms are the new medical tests
  Scientific
    Eco, Bio, Geo
  Financial
    Stocks, Actuarial Science
Point of Sale System
• Terminal
• Admin
• Tablet
Merck
• RT Persona
• RT Data
• Browser
Where is NOW in data?
  Data Creation Time | Data Consumption Time
Latency
  Data Creation Time | Data Consumption Time
  Standard - NOPE!
  Depends upon the Medium - YEP!
  Depends upon the Consumer - YEP!
  Depends upon Technology - YEP!
NOW and Latency
  Real-Time
    Data is consumed immediately after creation
  Near Real-Time
    Data is consumed within seconds/minutes
  Some-Time
    Data is consumed when requested, and is neither RT nor NRT
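Latency here means consumption time minus creation time, and the three classes bucket that value. The cut-offs below (1 second for Real-Time, 15 minutes for Near Real-Time) are assumptions for illustration and do not come from the slides. Some-Time is approximated here as “anything slower”. The slide itself defines Some-Time by on-request consumption.

```python
from datetime import datetime, timedelta

# Assumed thresholds: the slides only say "immediately" and "seconds/minutes".
RT_MAX = timedelta(seconds=1)
NRT_MAX = timedelta(minutes=15)


def latency(created_at: datetime, consumed_at: datetime) -> timedelta:
    """Latency = data consumption time - data creation time."""
    return consumed_at - created_at


def classify(lat: timedelta) -> str:
    """Bucket a latency into the three NOW classes (assumed cut-offs)."""
    if lat <= RT_MAX:
        return "Real-Time"
    if lat <= NRT_MAX:
        return "Near Real-Time"
    return "Some-Time"


t0 = datetime(2020, 1, 1, 12, 0, 0)  # arbitrary creation time
for consumed in (t0 + timedelta(milliseconds=150),
                 t0 + timedelta(seconds=45),
                 t0 + timedelta(hours=6)):
    lat = latency(t0, consumed)
    print(lat, classify(lat))
```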
Physiological Latency
  Perception:
    Research suggests that the human retina transmits data to the brain at a rate of about 10 million bits per second, close to that of a 10 Mbit/s (10BASE) Ethernet connection!
    We can perceive changes in reality at ~13-15 frames per second (fps, or Hz). Our perception of reality fully refreshes itself ~ once every 77 milliseconds (ms).
  Stock Exchange ~ 5-100 milliseconds (ms)
  Web Sites ~ 50-400 milliseconds (ms)
  Games (first-person shooters) ~ 10-150 milliseconds (ms)
  Social/Games ~ 200 ms - 1 second
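Assuming the “77” above is a frame period in milliseconds, it follows directly from the 13 fps perception rate, since period = 1 / frequency:

```latex
T = \frac{1}{f}, \qquad
T_{13\,\mathrm{Hz}} = \frac{1}{13}\,\mathrm{s} \approx 76.9\,\mathrm{ms} \approx 77\,\mathrm{ms}, \qquad
T_{15\,\mathrm{Hz}} = \frac{1}{15}\,\mathrm{s} \approx 66.7\,\mathrm{ms}
```

By this measure, a website answering in 50-400 ms spans roughly one to five “perception frames”.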
Let’s talk about TOOLS!
  Real or Near Real-Time (Databases, Indexes, File Systems)
  Real or Near Real-Time (nRTQEs: near Real-Time Query Engines)
  Real or Near Real-Time (nRTSEs: near Real-Time Streaming Engines)
Real-Time (Databases, Indexes, File Systems)
  In no particular order
• MySQL
• SQL Server
• PostgreSQL
• Neo4j (Graph)
• MongoDB
• Elasticsearch (Lucene)
• Solr
• HDFS
• HBase
• Oracle
• ERTFS
• Redis
• Cassandra
• Riak
HBase
  Regions and HDFS
    Data files for “regions” are stored in HDFS and replicated to multiple nodes in the cluster. Allocation of regions across the cluster is also largely automatic.
  Scaling
    Fault Tolerance
    Commodity Machines
  Hadoop
    Runs on top of Hadoop
    MapReduce Integration
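Regions are contiguous, sorted ranges of row keys. The toy model below shows only that idea: a row key is served by the region whose start key is the largest one not greater than that key. The server names are hypothetical. Real HBase clients locate regions through the cluster's catalog (meta) table, not a local list like this.

```python
import bisect


class RegionMap:
    """Toy model of a table split into regions by row-key range. Each region
    owns keys from its start key up to (not including) the next start key."""

    def __init__(self, regions):
        # regions: (start_key, server) pairs sorted by start key; first is b"".
        self.start_keys = [start for start, _ in regions]
        self.servers = [server for _, server in regions]

    def locate(self, row_key: bytes) -> str:
        # Rightmost region whose start key is <= row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[idx]


regions = RegionMap([(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")])
print(regions.locate(b"apple"), regions.locate(b"grape"), regions.locate(b"zebra"))
# rs1 rs2 rs3
```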
Cassandra
  Always Writable
    Writes are accepted even when the write to some replicas fails internally; the data will eventually become consistent (consistency is tunable)
  Scaling
    Can span data centers
    Peer-to-peer communication between nodes (Gossip)
  More...
    Supports MapReduce
    Supports Range Queries
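“Tunable” consistency means each read and write can choose how many replicas must respond. Cassandra exposes levels such as ONE, QUORUM and ALL. A common rule of thumb for Dynamo-style stores is that reads are guaranteed to see the latest write when R + W > N. The sketch below just does that arithmetic. It assumes a single data center and a replication factor N of 3.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Reads see the latest write when every read set overlaps every write set."""
    return r + w > n


N = 3  # assumed replication factor
for name, r, w in [("ONE/ONE", 1, 1),
                   ("QUORUM/QUORUM", quorum(N), quorum(N)),
                   ("ONE/ALL", 1, N)]:
    print(name, is_strongly_consistent(N, r, w))
```

Reading and writing at ONE is the fastest setting but only eventually consistent. QUORUM on both sides trades some latency for overlap.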
Redis
  Transactions
    Atomic operations (MULTI/EXEC/DISCARD): queue your operations, then EXEC to commit them as one transaction. DISCARD abandons the queue before EXEC; Redis does not roll back commands once EXEC has run them.
  An evolutionary Key-Value Store
    Supports complex types (lists, sets, sorted sets, hashes) that map closely to fundamental data structures. No need for an abstraction layer.
  Pub-Sub
    Publish - Push messages to a channel
    Subscribe - Listen to a channel
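The MULTI/EXEC/DISCARD and Pub/Sub semantics can be made concrete without a running server. Below is a small in-memory toy. The `ToyRedis` class and its methods are hypothetical and are not the API of any Redis client. As in real Redis, DISCARD only drops the queue, and nothing can be undone once EXEC has run.

```python
from collections import defaultdict
from typing import Callable


class ToyRedis:
    """In-memory toy of Redis MULTI/EXEC/DISCARD queueing and Pub/Sub.
    It is NOT a client for a real Redis server."""

    def __init__(self):
        self.data = {}
        self.queue = None                  # None means "not inside MULTI"
        self.channels = defaultdict(list)  # channel -> subscriber callbacks

    # --- transactions ---
    def multi(self):
        self.queue = []

    def set(self, key, value):
        if self.queue is not None:         # inside MULTI: queue, don't run yet
            self.queue.append((key, value))
            return "QUEUED"
        self.data[key] = value
        return "OK"

    def exec(self):
        if self.queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self.queue = self.queue, None
        for key, value in queued:          # run every queued command together
            self.data[key] = value
        return ["OK"] * len(queued)

    def discard(self):
        self.queue = None                  # drop queued commands; none ran

    # --- pub/sub ---
    def subscribe(self, channel, callback: Callable[[str], None]):
        self.channels[channel].append(callback)

    def publish(self, channel, message) -> int:
        for callback in self.channels[channel]:
            callback(message)
        return len(self.channels[channel])  # number of receivers


r = ToyRedis()
r.multi()
r.set("a", 1)
r.set("b", 2)
print(r.exec())        # ['OK', 'OK']
r.multi()
r.set("a", 99)
r.discard()            # the queued SET never runs
print(r.data)          # {'a': 1, 'b': 2}

received = []
r.subscribe("news", received.append)
print(r.publish("news", "hello"), received)  # 1 ['hello']
```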
Near Real-Time & Real-Time
  Queries and Streams
• Storm
• Kafka
• Drill/Dremel
• Hadoop
• MapReduce
• MapReduce v2 (YARN)
• Pig
• Hive
• Cascalog
• DataTurbine
MapReduce/Hadoop
  Scale
    100s to 1000s of server nodes
    Extreme and cheap
    Simple programming model
  Development
    Java, Python, Grep & others...
  Batch
    Complex multi-step processing
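The “simple programming model” is the map, shuffle and reduce sequence. A single-process Python sketch of the classic word count shows the three phases. Hadoop runs these same phases distributed across many nodes, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


lines = ["the quick brown fox", "the lazy dog", "The fox"]
grouped = shuffle(chain.from_iterable(map_phase(line) for line in lines))
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```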
Storm
  FAST
    Over a million tuples processed per second per node
  Integration
    Integrates with any queueing system and any database system
    Handles the parallelization, partitioning, and retrying on failures when necessary
  Assurance
    Scalable, fault-tolerant, guarantees your data will be processed!
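Storm's processing guarantee comes from tracking each tuple until it is acknowledged and replaying it when it fails. The toy below models only that ack/fail/replay loop in plain Python. `ReplayingSpout` and `flaky_bolt` are hypothetical names, not Storm's API, and the 30% failure rate is made up.

```python
import random
from collections import deque


class ReplayingSpout:
    """Toy spout: keeps each emitted tuple pending until it is acked, and
    re-queues it when it fails (at-least-once processing)."""

    def __init__(self, items):
        self.queue = deque(enumerate(items))  # (msg_id, value)
        self.pending = {}

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, value = self.queue.popleft()
        self.pending[msg_id] = value
        return msg_id, value

    def ack(self, msg_id):
        self.pending.pop(msg_id)

    def fail(self, msg_id):
        self.queue.append((msg_id, self.pending.pop(msg_id)))  # replay later


def flaky_bolt(value, rng):
    """Toy bolt: doubles a number, but fails about 30% of the time."""
    if rng.random() < 0.3:
        raise RuntimeError("transient failure")
    return value * 2


rng = random.Random(42)  # seeded so runs are repeatable
spout = ReplayingSpout([1, 2, 3, 4, 5])
results = {}
while (tup := spout.next_tuple()) is not None:
    msg_id, value = tup
    try:
        results[msg_id] = flaky_bolt(value, rng)
        spout.ack(msg_id)
    except RuntimeError:
        spout.fail(msg_id)

print(sorted(results.values()))  # [2, 4, 6, 8, 10]
```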
CQL/StreamQL/SPARQL/QL-RTDB/
  Languages
    Human readable
  Scalable
    Simultaneous n queries upon both stream data and static data
  SQL Idioms
    All support, to a large degree, what you would expect from SQL
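A typical continuous query combines a sliding window over stream data with a lookup against static data. The class below is a hand-written Python sketch of that idea: a per-symbol moving average over the last N seconds, enriched with names from a static table. The field names and table contents are assumptions. A stream query language would express the same thing declaratively.

```python
from collections import deque


class SlidingAverage:
    """Continuous query sketch: average `price` per symbol over the last
    `window_seconds` of a stream, joined with a static lookup table."""

    def __init__(self, window_seconds, static_names):
        self.window = window_seconds
        self.static_names = static_names  # static data, e.g. loaded from a DB
        self.events = deque()             # (timestamp, symbol, price)

    def on_event(self, ts, symbol, price):
        self.events.append((ts, symbol, price))
        # Evict events that have slid out of the (ts - window, ts] window.
        while self.events and self.events[0][0] <= ts - self.window:
            self.events.popleft()
        prices = [p for _, s, p in self.events if s == symbol]
        return self.static_names.get(symbol, symbol), sum(prices) / len(prices)


q = SlidingAverage(60, {"IBM": "International Business Machines"})
q.on_event(0, "IBM", 100.0)
print(q.on_event(30, "IBM", 110.0))  # ('International Business Machines', 105.0)
print(q.on_event(90, "IBM", 130.0))  # ('International Business Machines', 130.0)
```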
PIG
  Language
    High level and easy to understand (Pig Latin)
  Parallelization
    It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks
  Underneath
    Essentially a MapReduce sequence compiler
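This example illustrates what “a MapReduce sequence compiler” means. It pairs a hypothetical Pig Latin script (in the comment) with a hand-written Python version of the map and reduce phases that script roughly corresponds to. The log records and field names are made up. Pig's actual physical plan can differ; for example, it may add a combiner.

```python
from collections import defaultdict

# Hypothetical log records: (host, level).
logs = [("web1", "ERROR"), ("web2", "INFO"), ("web1", "ERROR"),
        ("web2", "ERROR"), ("web1", "WARN")]

# Roughly what a Pig Latin script such as
#   errors  = FILTER logs BY level == 'ERROR';
#   by_host = GROUP errors BY host;
#   counts  = FOREACH by_host GENERATE group, COUNT(errors);
# asks for, written out as the map and reduce phases it would compile to.


def mapper(record):
    host, level = record
    if level == "ERROR":   # FILTER runs map-side
        yield host, 1      # the GROUP key becomes the map output key


def reducer(host, ones):
    return host, sum(ones)  # COUNT runs reduce-side


shuffled = defaultdict(list)  # stands in for Hadoop's shuffle/sort
for record in logs:
    for key, value in mapper(record):
        shuffled[key].append(value)

counts = dict(reducer(k, v) for k, v in shuffled.items())
print(counts)  # {'web1': 2, 'web2': 1}
```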
PIG
Example Pig Script
PIG
That same example using MR Java code
The perfect Army!
  In Memory
    Keep as much as you can IN MEMORY! Think Redis...
  Identify and Plan
    Work out what data can be batch processed and what can’t! Think Hadoop (for batch), Storm (for stream) and HBase (for ad hoc)
  Consumer
    Who is the data consumer? Person or process? Think Pig or the xQLs for both!
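“Keep as much as you can in memory” often takes the shape of a read-through cache in front of a slower store. In production Redis is the natural home for this. The sketch below uses a plain Python `OrderedDict` with least-recently-used eviction only to show the pattern. The `loader` function stands in for a hypothetical query against the slower backing store.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ReadThroughCache:
    """Serve hot data from memory; on a miss, load from the slow store and
    evict the least recently used entry once capacity is exceeded."""

    def __init__(self, loader: Callable[[Hashable], object], capacity: int = 1024):
        self.loader = loader      # hypothetical slow lookup (e.g. a DB query)
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)              # slow path
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value


cache = ReadThroughCache(loader=lambda key: key * 10, capacity=2)
for key in [1, 2, 1, 3, 2]:
    cache.get(key)
print(cache.misses, list(cache.entries))  # 4 [3, 2]
```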
Thank You | Gracias | Merci | Danke
Anthony Nyström
Fellow, Managing Director of Engineering
anthony@intridea.com
@AnthonyNystrom
www.intridea.co
More Related Content

What's hot

iTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT developmentMark Krebs
 
2017 11-10 The Changing Role of Today's CIO
2017 11-10 The Changing Role of Today's CIO2017 11-10 The Changing Role of Today's CIO
2017 11-10 The Changing Role of Today's CIORaffa Learning Community
 
2018 3-14 The Changing Role of Today's CIO
2018 3-14 The Changing Role of Today's CIO2018 3-14 The Changing Role of Today's CIO
2018 3-14 The Changing Role of Today's CIORaffa Learning Community
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceAnnie Flippo
 
IQTechPros- Who are we
IQTechPros- Who are weIQTechPros- Who are we
IQTechPros- Who are weIQTechPros123
 
2017 05-04 The Changing Role of Today's CIO
2017 05-04 The Changing Role of Today's CIO2017 05-04 The Changing Role of Today's CIO
2017 05-04 The Changing Role of Today's CIORaffa Learning Community
 
Neo4j Innovation Lab - Accelerate Innovation through Graph Thinking
Neo4j Innovation Lab - Accelerate Innovation through Graph ThinkingNeo4j Innovation Lab - Accelerate Innovation through Graph Thinking
Neo4j Innovation Lab - Accelerate Innovation through Graph ThinkingNeo4j
 

What's hot (13)

iTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun SukhaniiTrain Malaysia: Data Science by Tarun Sukhani
iTrain Malaysia: Data Science by Tarun Sukhani
 
2016-12-07 The Changing Role of the CIO
2016-12-07 The Changing Role of the CIO2016-12-07 The Changing Role of the CIO
2016-12-07 The Changing Role of the CIO
 
Lean approach to IT development
Lean approach to IT developmentLean approach to IT development
Lean approach to IT development
 
2016-06-08 Who Needs a CIO?
2016-06-08 Who Needs a CIO?2016-06-08 Who Needs a CIO?
2016-06-08 Who Needs a CIO?
 
2017 11-10 The Changing Role of Today's CIO
2017 11-10 The Changing Role of Today's CIO2017 11-10 The Changing Role of Today's CIO
2017 11-10 The Changing Role of Today's CIO
 
2018 3-14 The Changing Role of Today's CIO
2018 3-14 The Changing Role of Today's CIO2018 3-14 The Changing Role of Today's CIO
2018 3-14 The Changing Role of Today's CIO
 
2018 2-6 The Changing Role of Today's CIO
2018 2-6 The Changing Role of Today's CIO2018 2-6 The Changing Role of Today's CIO
2018 2-6 The Changing Role of Today's CIO
 
Who Needs a CIO?
Who Needs a CIO?Who Needs a CIO?
Who Needs a CIO?
 
What Managers Need to Know about Data Science
What Managers Need to Know about Data ScienceWhat Managers Need to Know about Data Science
What Managers Need to Know about Data Science
 
IQTechPros- Who are we
IQTechPros- Who are weIQTechPros- Who are we
IQTechPros- Who are we
 
2017 05-04 The Changing Role of Today's CIO
2017 05-04 The Changing Role of Today's CIO2017 05-04 The Changing Role of Today's CIO
2017 05-04 The Changing Role of Today's CIO
 
Neo4j Innovation Lab - Accelerate Innovation through Graph Thinking
Neo4j Innovation Lab - Accelerate Innovation through Graph ThinkingNeo4j Innovation Lab - Accelerate Innovation through Graph Thinking
Neo4j Innovation Lab - Accelerate Innovation through Graph Thinking
 
E245 personal libraries-week5
E245 personal libraries-week5E245 personal libraries-week5
E245 personal libraries-week5
 

Viewers also liked (20)

Writing 7 referencing
Writing 7 referencingWriting 7 referencing
Writing 7 referencing
 
Les anophèles du Niger, Recherche et appui au programme national de lutte
Les anophèles du Niger, Recherche et appui au programme national de lutte Les anophèles du Niger, Recherche et appui au programme national de lutte
Les anophèles du Niger, Recherche et appui au programme national de lutte
 
Guia del usuario
Guia del usuarioGuia del usuario
Guia del usuario
 
Emerald
EmeraldEmerald
Emerald
 
Biologia-Animais geneticamente modificados
Biologia-Animais geneticamente modificadosBiologia-Animais geneticamente modificados
Biologia-Animais geneticamente modificados
 
Evangelho no lar com crianças (23)
Evangelho no lar com crianças (23)Evangelho no lar com crianças (23)
Evangelho no lar com crianças (23)
 
No.243 english
No.243 englishNo.243 english
No.243 english
 
Employer branding w rekrutacji
Employer branding w rekrutacjiEmployer branding w rekrutacji
Employer branding w rekrutacji
 
BB16 Spider by OneQube
BB16 Spider by OneQubeBB16 Spider by OneQube
BB16 Spider by OneQube
 
SCURC Presentation-RA 4.9.15-fix
SCURC Presentation-RA 4.9.15-fixSCURC Presentation-RA 4.9.15-fix
SCURC Presentation-RA 4.9.15-fix
 
JRH Resume
JRH ResumeJRH Resume
JRH Resume
 
Enfermedad & muerte (steve jobs)
Enfermedad & muerte (steve jobs)Enfermedad & muerte (steve jobs)
Enfermedad & muerte (steve jobs)
 
Tema i
Tema iTema i
Tema i
 
Fièvres et viroses à Madagascar
Fièvres et viroses à MadagascarFièvres et viroses à Madagascar
Fièvres et viroses à Madagascar
 
La teoría histórico cultural de l
La teoría histórico cultural de lLa teoría histórico cultural de l
La teoría histórico cultural de l
 
Ass4 2
Ass4 2Ass4 2
Ass4 2
 
Santiago
SantiagoSantiago
Santiago
 
Know4 3
Know4 3Know4 3
Know4 3
 
Kaos
KaosKaos
Kaos
 
Guia del usuario
Guia del usuarioGuia del usuario
Guia del usuario
 

Anthony Nystrom - Intridea - Data Science in the NOW, it takes an Army of tools

  • 1. Anthony Nyström Fellow, Managing Director of Engineering
  • 4-10. What is Intridea? We design and develop apps: Web, Mobile and Data. Founded in Washington, DC. We work with cool clients, really! 40+ Intrideans: Designers/Developers/Scientists + Smart biz folks. We work from anywhere! We are growing. We hire the best and the smartest.
  • 11-12. Intridean: The guy on stage. Anthony Nyström, Fellow, Managing Director of Engineering
  • 14. Data Science in the NOW! It takes an army of TOOLS
  • 16-17. An Army of Tools you say? • I am going to talk about what NOW means in Data Science • Databases, Streaming Engines, Query Engines and Interfaces • We are going to look at many of them and single out a few • Each has a well-respected and, in some cases, competing set of features
  • 19-22. Then was NOW as NOW was Then. Now is indeed Then. Then is indeed Now.
  • 24-27. Why is NOW in data special? Actionable Intelligence & Knowledge. NOW has innate context. TIME is THE natural facet for our minds & life!
  • 29-36. Why is NOW in data special? Trends | Patterns | Extraction. Data Centric Trends: trends derived from the data itself, not from user input as with Google, Yahoo, etc. Pattern Extraction (ML/NLP): “I am looking for data that conforms to a learned or known pattern.” Signature Extraction (Binary, Encoded): “I am looking for data that matches a predefined signature.”
  • 38-43. Why is NOW in data special? Routing | Transformation | Computation. Intelligent Routing: “I need to replicate/fork the portions of this data stream that meet criteria x.” Transformation & Computation: “I need to transform certain fields” or “I need to compute some value on certain fields.”
  • 45-52. Why is NOW in data special? Algorithmic Speciality. Concepts: What does a value represent or infer? (NLP/ML/k-NN) Regression: How is a value related to another value, or how can we predict such relations? Relationships: Topological, Ontological, Forest (Evolutionary/Random) (NLP)
  • 55-56. When NOW matters! Industry/Vertical: Medical (algorithms are the new medical tests); Scientific (Eco, Bio, Geo); Financial (Stocks, Actuarial Science)
  • 60. Point of Sale System • Terminal • Admin • Tablet
  • 62. Merck • RT Persona • RT Data • Browser
  • 64-65. Where is NOW in data? Data Creation Time | Data Consumption Time
  • 68-72. Latency: Data Creation Time | Data Consumption Time. Standard - NOPE! Depends upon the Medium - YEP! Depends upon the Consumer - YEP! Depends upon Technology - YEP!
  • 75-80. NOW and Latency. Real-Time: data that is consumed immediately after creation. Near Real-Time: data that is consumed within seconds/minutes. Some-Time: data that is consumed when requested and is neither RT nor NRT.
  • 83-87. Physiological Latency. Perception: Research suggests that the human retina transmits data to the brain at about 10 million bits per second, close to the rate of a 10 Mbit/s Ethernet connection! We can perceive changes in reality at ~13-15 frames per second (fps, or Hz); our perception of reality fully refreshes itself ~once every 77 milliseconds (ms). Stock Exchange ~5-100 ms • Web Sites ~50-400 ms • Games (FPS, first-person shooters) ~10-150 ms • Social/Games ~200 ms-1 second
  • 90-92. Let’s talk about TOOLS! Real or Near Real-Time (DBs, indexes, file systems) • Real or Near Real-Time (nRTQEs: near real-time query engines) • Real or Near Real-Time (nRTSEs: near real-time streaming engines)
  • 95-109. Real-Time (DBs, indexes, file systems), in no particular order: • MySQL • SQL Server • PostgreSQL • Neo4j (Graph) • MongoDB • Elasticsearch (Lucene) • Solr • HDFS • HBase • Oracle • ERTFS • Redis • Cassandra • Riak
  • 111-117. HBase. Regions and HDFS: data files for “regions” are stored in HDFS and replicated to multiple nodes in the cluster; assigning regions across the cluster is largely automatic. Scaling: Fault Tolerance, Commodity Machines. Hadoop: runs on top of Hadoop, MapReduce integration.
  • 122-125. Cassandra. Always Writable: writes are accepted even when some replicas cannot take them at that moment; the data eventually becomes consistent, and the consistency level is tunable. Scaling: can span data centers; nodes communicate peer-to-peer (Gossip). More...: supports MapReduce; supports range queries.
  • 127-133. Redis. An evolutionary Key-Value Store: supports complex types that are closely related to fundamental data structures, so no abstraction layer is needed. Transactions: atomic operations (MULTI/EXEC/DISCARD); queue your operations and EXEC (commit) them as one transaction. DISCARD abandons the queued operations before EXEC; Redis does not roll back commands once EXEC has run. Pub-Sub: Publish pushes messages to a channel; Subscribe listens to a channel.
  • 135-146. Near Real-Time & Real-Time Queries and Streams: • Storm • Kafka • Drill/Dremel • Hadoop • MapReduce • MapReduce v2 (YARN) • Pig • Hive • Cascalog • DataTurbine
  • 150-154. MapReduce/Hadoop. Scale: 100s to 1000s of server nodes; extreme and cheap; simple programming model. Development: Java, Python, Grep & others... Batch: complex multi-step processing.
  • 156-162. Storm. FAST: over a million tuples processed per second per node. Integration: integrates with any queueing system and any database system; handles the parallelization, partitioning, and retrying on failures when necessary. Assurance: scalable, fault-tolerant, guarantees your data will be processed!
  • 166-170. CQL/StreamQL/SPARQL/QL-RTDB Languages. Human Readable. SQL Idioms: all support, to a large degree, what you would expect from SQL. Scalable: simultaneous n queries upon both stream data and static data.
  • 172-178. Pig. Language: high-level and easy to understand (Pig Latin). Parallelization: it is trivial to achieve parallel execution of simple, “embarrassingly parallel” data analysis tasks. Underneath: essentially a MapReduce sequence compiler.
  • 180. Pig
  • 184-185. Pig: that same example using MapReduce (MR) Java code
  • 188-193. The perfect Army! In Memory: keep as much as you can IN MEMORY! Think Redis... Identify and Plan: what data can be batch processed and what can’t! Think Hadoop, Storm (for streams) and HBase (for ad hoc). Consumer: who is the data consumer, a person or a process? Think Pig or xQLs for both!
  • 194. Thank You / Gracias / Merci / Danke. Anthony Nyström, Fellow, Managing Director of Engineering. anthony@intridea.com • @AnthonyNystrom • www.intridea.co
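Slides 29-36 separate pattern extraction (data that conforms to a learned or known pattern) from signature extraction (data that matches a predefined signature). The sketch below illustrates only the signature side, under the assumption that a "signature" is a fixed byte prefix. The `SIGNATURES` table is hypothetical; its two entries are the well-known PNG and PDF file headers, chosen purely as examples.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical signature table: fixed byte prefixes ("magic numbers").
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
}

def match_signature(payload: bytes) -> Optional[str]:
    """Return the name of the first signature that payload starts with, else None."""
    for name, prefix in SIGNATURES.items():
        if payload.startswith(prefix):
            return name
    return None

def extract_matches(stream: Iterable[bytes]) -> Iterator[Tuple[int, str]]:
    """Yield (position, signature_name) for every record in the stream that matches."""
    for i, record in enumerate(stream):
        name = match_signature(record)
        if name is not None:
            yield i, name

records = [b"%PDF-1.7 ...", b"hello", b"\x89PNG\r\n\x1a\n\x00\x00"]
print(list(extract_matches(records)))  # [(0, 'pdf'), (2, 'png')]
```

Pattern extraction would replace the exact prefix test with a learned model's decision; the streaming loop around it stays the same.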
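Slides 38-43 describe routing (replicating or forking the part of a stream that meets some criterion) and per-field transformation or computation. A minimal sketch, assuming records are plain Python dicts; the criterion (`amount > 100`) and the field names `amount`, `currency` and `amount_cents` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]

def fork(stream: Iterable[Record], criterion: Callable[[Record], bool]) -> Tuple[List[Record], List[Record]]:
    """Replicate matching records to a side channel; the main channel keeps every record."""
    main: List[Record] = []
    side: List[Record] = []
    for record in stream:
        main.append(record)
        if criterion(record):
            side.append(dict(record))  # copy, so edits on one branch do not leak into the other
    return main, side

def transform(record: Record) -> Record:
    """Transform one field (upper-case the currency) and compute a new one (amount in cents)."""
    out = dict(record)
    out["currency"] = str(out["currency"]).upper()
    out["amount_cents"] = round(float(out["amount"]) * 100)
    return out

events = [
    {"amount": 12.5, "currency": "usd"},
    {"amount": 900.0, "currency": "eur"},
]
main, large = fork(events, lambda r: float(r["amount"]) > 100)
print([transform(r) for r in large])
# [{'amount': 900.0, 'currency': 'EUR', 'amount_cents': 90000}]
```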
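Slides 45-52 frame regression as "how is a value related to another value, or how can we predict such relations?" As one concrete instance, here is ordinary least-squares fitting of a straight line in plain Python; the four data points are made-up toy values, and the talk does not say which regression technique it has in mind.

```python
from typing import Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0  (prediction for x = 5)
```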
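Slides 64-80 define latency as the gap between data creation time and data consumption time, then bucket it into Real-Time, Near Real-Time and Some-Time. The slides only say "immediately" and "seconds/minutes", so the 1-second and 10-minute cut-offs below are hypothetical and should be tuned to the medium, consumer and technology, exactly as slides 68-72 warn.

```python
from datetime import datetime, timedelta

# Hypothetical cut-offs; the talk gives no exact numbers.
REAL_TIME_MAX = timedelta(seconds=1)
NEAR_REAL_TIME_MAX = timedelta(minutes=10)

def latency(created: datetime, consumed: datetime) -> timedelta:
    """Latency is the gap between data creation time and data consumption time."""
    if consumed < created:
        raise ValueError("data cannot be consumed before it is created")
    return consumed - created

def classify(created: datetime, consumed: datetime) -> str:
    """Bucket a creation/consumption pair into the talk's three latency classes."""
    gap = latency(created, consumed)
    if gap <= REAL_TIME_MAX:
        return "Real-Time"
    if gap <= NEAR_REAL_TIME_MAX:
        return "Near Real-Time"
    return "Some-Time"

t0 = datetime(2020, 1, 1, 12, 0, 0)
print(classify(t0, t0 + timedelta(milliseconds=40)))  # Real-Time
print(classify(t0, t0 + timedelta(seconds=45)))       # Near Real-Time
print(classify(t0, t0 + timedelta(hours=6)))          # Some-Time
```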
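The 77 ms figure on slides 83-87 appears to be the period of a 13 Hz refresh; the same arithmetic at the top of the quoted 13-15 fps range gives roughly 67 ms:

```latex
T = \frac{1}{f}, \qquad
\frac{1}{13\ \text{Hz}} \approx 76.9\ \text{ms}, \qquad
\frac{1}{15\ \text{Hz}} \approx 66.7\ \text{ms}
```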
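Slides 111-117 mention HBase "regions". In HBase a table's rows are kept sorted by row key and split into regions, each covering a contiguous key range. The sketch below models only that lookup idea with a sorted list of start keys and `bisect`; the region names and split points are hypothetical, and this is not the HBase client API.

```python
import bisect

# Hypothetical layout: each region starts at its start key and ends just before the next one.
REGION_START_KEYS = ["", "g", "n", "t"]  # "" means "from the smallest possible key"
REGION_NAMES = ["region-1", "region-2", "region-3", "region-4"]

def region_for(row_key: str) -> str:
    """Return the region whose key range contains row_key."""
    # bisect_right gives the position of the first start key greater than row_key;
    # the region just before that position is the one holding the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]

print(region_for("apple"))  # region-1
print(region_for("hbase"))  # region-2
print(region_for("zebra"))  # region-4
```

Because the first start key is the empty string, every key falls into some region; splitting a hot region just means inserting a new start key.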
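Slides 122-125 call Cassandra's consistency tunable. One common way to reason about it is general Dynamo-style quorum arithmetic (not something the slides spell out): with replication factor N, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The helper names below are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas; Cassandra's QUORUM level uses floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every set of r read replicas must overlap every set of w write replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

n = 3
print(quorum(n))                                        # 2
print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True  (QUORUM reads + QUORUM writes)
print(reads_see_latest_write(n, 1, 1))                  # False (ONE + ONE: only eventually consistent)
```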
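Slides 127-133 describe Redis transactions. A toy, in-process model of the queuing semantics: after MULTI, writes are queued instead of applied; EXEC applies the whole queue in order; DISCARD throws the queue away before anything is applied. The `ToyRedis` class and its methods are hypothetical stand-ins; with a real server you would send the MULTI, EXEC and DISCARD commands themselves, typically through a client library.

```python
class ToyRedis:
    """Hypothetical in-memory stand-in that mimics MULTI/EXEC/DISCARD queuing."""

    def __init__(self):
        self.data = {}
        self._queue = None  # None means "not inside MULTI"

    def multi(self):
        if self._queue is not None:
            raise RuntimeError("MULTI calls can not be nested")
        self._queue = []

    def set(self, key, value):
        if self._queue is not None:
            self._queue.append((key, value))  # queued, not applied yet
            return "QUEUED"
        self.data[key] = value
        return "OK"

    def exec(self):
        if self._queue is None:
            raise RuntimeError("EXEC without MULTI")
        queued, self._queue = self._queue, None
        results = []
        for key, value in queued:  # apply every queued command in order
            self.data[key] = value
            results.append("OK")
        return results

    def discard(self):
        if self._queue is None:
            raise RuntimeError("DISCARD without MULTI")
        self._queue = None  # drop the queue; nothing was ever applied

r = ToyRedis()
r.multi()
r.set("a", 1)
r.discard()
print(r.data)    # {}
r.multi()
r.set("a", 1)
r.set("b", 2)
print(r.exec())  # ['OK', 'OK']
print(r.data)    # {'a': 1, 'b': 2}
```

Note what is absent: there is no undo step after `exec`, matching Redis, which does not roll back commands that already ran.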
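Slides 150-154 credit MapReduce with a simple programming model. A single-process sketch of its three phases (map, shuffle/sort, reduce) using the classic word-count example; it shows the data flow only, not Hadoop's API or how work is spread across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value by its key, in key order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

print(word_count(["the NOW", "now is then", "then is now"]))
# {'is': 2, 'now': 3, 'the': 1, 'then': 2}
```

In a real cluster each map and reduce call can run on a different node, which is what makes the model scale to the 100s-1000s of nodes the slides mention.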
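Slides 156-162 say Storm handles retries and guarantees that data will be processed. A toy, single-process illustration of the replay idea: the spout keeps each tuple until it is acknowledged, and a tuple whose processing fails is replayed later. The `run_topology` function and its `fail_first_attempt` parameter are hypothetical; real Storm topologies are assembled from spouts and bolts through Storm's own API.

```python
from collections import Counter, deque
from typing import AbstractSet, Iterable, Set

def run_topology(sentences: Iterable[str], fail_first_attempt: AbstractSet[int] = frozenset()) -> Counter:
    """Toy spout -> bolt pipeline that replays tuples whose processing failed."""
    pending = deque(enumerate(sentences))  # spout: tuples not yet acknowledged
    already_failed: Set[int] = set()
    counts: Counter = Counter()
    while pending:
        tuple_id, sentence = pending.popleft()
        if tuple_id in fail_first_attempt and tuple_id not in already_failed:
            # Simulated bolt failure before any side effect: no ack, so the spout replays it.
            already_failed.add(tuple_id)
            pending.append((tuple_id, sentence))
            continue
        counts.update(sentence.split())  # bolt succeeded; the tuple is treated as acknowledged
    return counts

result = run_topology(["storm is fast", "storm scales"], fail_first_attempt={0})
print(result["storm"], result["fast"])  # 2 1
```

Here the failure happens before the bolt touches `counts`, so the replay does not double-count; in general, replay-based guarantees are at-least-once, and downstream state must tolerate duplicates.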
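Slides 166-170 describe stream query languages that run queries over both stream data and static data using SQL idioms. The plain-Python sketch below mimics the kind of query such a language expresses, roughly "count events per region over tumbling 60-second windows, joined with a static sensor-to-region table". The event shape, the window size and the `SENSOR_REGION` table are hypothetical, input is assumed to arrive in timestamp order, and this is not CQL, StreamQL or SPARQL syntax.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Static ("table") data: which region each sensor belongs to. Hypothetical values.
SENSOR_REGION: Dict[str, str] = {"s1": "east", "s2": "east", "s3": "west"}

def tumbling_counts(events: Iterable[Tuple[int, str]], window_s: int = 60) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Group time-ordered (timestamp, sensor_id) events into fixed windows and
    count events per region, joining each event against the static table."""
    current_window = None
    counts: Counter = Counter()
    for ts, sensor in events:
        window = ts // window_s * window_s  # start of this event's window
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)  # window closed: emit its result
            counts = Counter()
        current_window = window
        counts[SENSOR_REGION.get(sensor, "unknown")] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window

stream = [(5, "s1"), (20, "s3"), (59, "s2"), (61, "s1"), (130, "s9")]
for window_start, per_region in tumbling_counts(stream):
    print(window_start, per_region)
# 0 {'east': 2, 'west': 1}
# 60 {'east': 1}
# 120 {'unknown': 1}
```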
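Slides 172-185 present Pig Latin as a high-level language that Pig compiles into a sequence of MapReduce jobs; the talk's own example did not survive in this transcript. As a stand-in, here is a rough Python equivalent of a typical Pig Latin flow, with each step commented with the Pig statement it mirrors (LOAD, FILTER, GROUP, FOREACH ... GENERATE with COUNT). The log records and field positions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (user, url, status)

# logs = LOAD ... AS (user, url, status);   (hypothetical records)
logs: List[Row] = [
    ("ann", "/home", 200),
    ("bob", "/home", 404),
    ("ann", "/buy", 200),
    ("cat", "/home", 200),
]

# ok = FILTER logs BY status == 200;
ok = [row for row in logs if row[2] == 200]

# grouped = GROUP ok BY url;
grouped: Dict[str, List[Row]] = defaultdict(list)
for row in ok:
    grouped[row[1]].append(row)

# hits = FOREACH grouped GENERATE group, COUNT(ok);
hits = {url: len(rows) for url, rows in grouped.items()}

print(hits)  # {'/home': 2, '/buy': 1}
```

The equivalent hand-written MapReduce job needs explicit mapper, reducer and driver classes, which is the contrast slides 184-185 draw.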