10. What is Intridea?
We design and develop apps: Web, Mobile and Data
We work with cool clients – really!
We work from anywhere!
Founded in Washington, DC
40+ Intrideans: Designers/Developers/Scientists + Smart biz folks
We are growing
We hire the best and the smartest
17. An Army of Tools you say?
• I am going to talk about what NOW means in Data Science
• Databases, Streaming Engines, Query Engines and Interfaces
• We are going to look at many of them and single out a few
• Each has its own and, in some cases, competing set of features
36. Why is NOW in data Special?
Trends | Patterns | Extraction
Data Centric Trends
Not user input data like Google, Yahoo etc.
Pattern Extraction (ML/NLP)
“I am looking for data that conforms to a learned or known pattern”
Signature Extraction (Binary, Encoded)
“I am looking for data that matches a predefined signature”
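The two quotes above can be sketched side by side. This is a minimal illustration, not a production matcher: the signature table and the IPv4-style pattern are hypothetical stand-ins chosen for the example, and a learned (ML/NLP) pattern would replace the regular expression in practice.

```python
import re

# Hypothetical signature table: leading "magic" bytes of a few binary formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"\x1f\x8b": "gzip",
}

# Hypothetical known pattern: an IPv4-looking token inside a text record.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def match_signature(payload: bytes):
    """Signature extraction: return the name of the signature the payload starts with."""
    for magic, name in SIGNATURES.items():
        if payload.startswith(magic):
            return name
    return None


def extract_pattern(record: str):
    """Pattern extraction: return every substring that conforms to the known pattern."""
    return IPV4_PATTERN.findall(record)
```

Signature matching is an exact byte comparison, while pattern matching accepts anything that fits the shape, which is why the two are listed separately.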
43. Why is NOW in data Special?
Routing | Transformation | Computation
Intelligent Routing
“I need to replicate/fork the portions of this data stream that
match criteria x”
Transformation & Computation
“I need to transform certain fields” or “I need to compute
some value on certain fields”
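A rough sketch of both ideas on an in-memory stream. The record fields (`symbol`, `price`, `qty`), the threshold, and the function names are all assumptions made for illustration; a real stream engine would do this continuously rather than over a list.

```python
def fork_and_transform(stream, criteria, transform):
    """Route and transform in one pass.

    Records matching `criteria` are replicated, untouched, into a forked list;
    every record is also transformed and sent down the main path.
    """
    main_path, forked = [], []
    for record in stream:
        if criteria(record):
            forked.append(dict(record))  # replicate the matching portion
        main_path.append(transform(record))
    return main_path, forked


def add_notional(record):
    """Transform one field and compute a new value from two others."""
    out = dict(record)  # copy so the source record is left unchanged
    out["symbol"] = out["symbol"].upper()
    out["notional"] = out["price"] * out["qty"]
    return out


# Hypothetical events, used only to show the call shape.
events = [
    {"symbol": "abc", "price": 10, "qty": 3},
    {"symbol": "xyz", "price": 250, "qty": 2},
]
main_path, large_trades = fork_and_transform(
    events, lambda r: r["price"] > 100, add_notional
)
```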
52. Why is NOW in data Special?
Algorithmic Speciality
Concepts
What does a value represent or imply? (NLP/ML/k-NN)
Regression
How is a value related to another value, or
how can we predict such relations?
Relationships
Topological, Ontological, Forest (Evolutionary/Random) (NLP)
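For the Regression item, one common way to relate one value to another and predict from it is an ordinary least-squares line. The slides do not name a specific method, so this is only one possible sketch, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict y for a new x from a fitted (slope, intercept) pair."""
    slope, intercept = model
    return slope * x + intercept
```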
72. Latency
Data Creation Time | Data Consumption Time
Standard - NOPE!
Depends upon the Medium - YEP!
Depends upon the Consumer - YEP!
Depends upon Technology - YEP!
80. NOW and Latency
Real-Time
Data that is consumed immediately after creation
Near Real-Time
Data is consumed within seconds/minutes
Some-Time
Data is consumed when requested and is neither Real-Time nor Near Real-Time
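The three classes can be expressed as a function of the gap between creation time and consumption time. The cut-offs below (1 second and 5 minutes) are assumptions picked for illustration; the slides only say "immediately" and "within seconds/minutes", and the right values depend on the medium, the consumer and the technology.

```python
from datetime import datetime, timedelta

# Assumed thresholds, not taken from the slides.
REAL_TIME_LIMIT = timedelta(seconds=1)
NEAR_REAL_TIME_LIMIT = timedelta(minutes=5)


def classify_latency(created: datetime, consumed: datetime) -> str:
    """Label a record by how long after creation it was consumed."""
    latency = consumed - created
    if latency < timedelta(0):
        raise ValueError("data cannot be consumed before it is created")
    if latency <= REAL_TIME_LIMIT:
        return "real-time"
    if latency <= NEAR_REAL_TIME_LIMIT:
        return "near real-time"
    return "some-time"
```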
87. Physiological Latency
Perception:
Research suggests that the human retina transmits data to the brain at a
rate of about 10 million bits per second, which is close to that of a 10 Mbit/s Ethernet
connection!
We can perceive changes in reality at ~ 13-15 frames per second (fps, or
Hz). Our perception of reality fully refreshes itself ~ once every 77
Stock Exchange ~ 5-100 milliseconds (ms)
Web Sites ~ 50-400 milliseconds (ms)
Games (FPS) ~ 10-150 milliseconds (ms)
Social/Games ~ 200 ms -1 second
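The frame rates above convert to a per-frame period with simple arithmetic. If the trailing "77" in the perception note is read as milliseconds (an assumption, since the unit is cut off in the slide), it lines up with the period at 13 fps, which puts it on the same scale as the system latencies listed.

```python
def frame_period_ms(fps: float) -> float:
    """Time between frames, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


# Periods for the 13-15 fps range quoted above.
slow_period = frame_period_ms(13)  # roughly 77 ms
fast_period = frame_period_ms(15)  # roughly 67 ms
```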
117. HBase
Regions and HDFS
“Regions”: data files for regions are stored in HDFS and replicated to
multiple nodes in the cluster. Allocation of regions across the cluster is
largely automatic
Scaling
Fault Tolerance
Commodity Machines
Hadoop
Runs on top of Hadoop
MapReduce Integration
125. Cassandra
Always Writable
Writes are accepted even when a write fails internally on some replica. The data
will eventually become consistent (the consistency level is tunable)
Scaling
Can span data centers
Peer-to-Peer communication between nodes (Gossip)
More...
Supports MapReduce
Supports Range Queries
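"Tunable" consistency is often reasoned about with the replica-overlap rule: with N replicas, a read touching R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below only does that arithmetic; it does not talk to a cluster, and the function names are hypothetical.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1


def read_overlaps_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_acks + write_acks > replicas
```

With 3 replicas, quorum writes and quorum reads (2 + 2 > 3) overlap, while single-replica writes and reads (1 + 1) do not, which is the eventually consistent end of the dial.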
133. Redis
Transactions
Atomic operations (MULTI/EXEC/DISCARD): queue your operations and
EXEC to commit them as a transaction. DISCARD drops the queue before EXEC;
Redis does not roll back commands that have already run.
An evolutionary Key-Value Store
Supports complex types that are closely related to fundamental data
structures. No need for abstraction layer.
Pub-Sub
Publish - Push messages to a channel
Subscribe - Listen to a channel
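The publish/subscribe behaviour can be modelled in a few lines. This is a toy in-process broker written for illustration, not a Redis client: the class and method names are hypothetical, and it only mimics two properties of Redis Pub/Sub, namely that publishing reports how many subscribers received the message and that messages are not stored for late subscribers.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub/sub server (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Listen to a channel: register a callback for future messages."""
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Push a message to a channel; return the number of receivers."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```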
154. MapReduce/Hadoop
Scale
100s to 1000s of server nodes
Extreme and cheap
Simple programming model
Development
Java, Python, Grep & Others...
Batch
Complex Multi-Step Processing
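The "simple programming model" can be shown with the usual word-count example, run in a single process. This sketch only imitates the map, shuffle and reduce steps; it does not use Hadoop, and the function names are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases over a batch of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

On a cluster the map and reduce calls run on different nodes and the shuffle moves data between them, which is where the batch latency comes from.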
162. Storm
FAST
Over a million tuples processed per second per node
Integration
Integrates with any queueing system and any database system
Handles the parallelization, partitioning, and retrying on
failures when necessary
Assurance
Scalable, Fault-Tolerant, Guarantees your data will be processed!
170. CQL/StreamQL/SparQL/QL-RTDB
Languages
Human Readable
Scalable
Simultaneous n queries upon both stream data and static data
SQL Idioms
All support, to a large degree, what you would expect from SQL
178. PIG
Language
High Level and easy to understand (Pig Latin)
Parallelization
It is trivial to achieve parallel execution of simple, "embarrassingly
parallel" data analysis tasks
Underneath
Essentially a MapReduce sequence compiler
193. The perfect Army!
In Memory
Keep as much as you can IN MEMORY! Think Redis...
Identify and Plan
What data can be batch processed and what can’t? Think
Hadoop (for batch), Storm (for stream) and HBase (for ad hoc)
Consumer
Who is the data consumer? Person or process? Think Pig or the xQLs for
both!
194. Thank You
Gracias | Merci | Danke
Anthony Nyström
Fellow, Managing Director of Engineering
anthony@intridea.com
@AnthonyNystrom
www.intridea.co