Tuesday, June 18, 2013
10. What is Intridea?
We design and develop apps: Web, Mobile and Data
Founded in Washington, DC
We work with cool clients – really!
40+ Intrideans: Designers/Developers/Scientists + Smart biz folks
We work from anywhere!
We are growing
We hire the best and the smartest
17. An Army of Tools, you say?
• I am going to talk about what NOW means in Data Science
• Databases, Streaming Engines, Query Engines and Interfaces
• We are going to look at many of them and single out a few
• Each has its own respected, and in some cases competing, set of features
22. Why is NOW in Data Special?
Actionable Intelligence & Knowledge
NOW has innate context
TIME is THE natural facet for our minds & life!
31. Why is NOW in Data Special?
Trends | Patterns | Extraction
Data-Centric Trends
Trends in the data itself, not in user-input data as at Google, Yahoo, etc.
Pattern Extraction (ML/NLP)
“I am looking for data that conforms to a learned or known pattern”
Signature Extraction (Binary, Encoded)
“I am looking for data that matches a predefined signature”
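The signature case ("data that matches a predefined signature") can be sketched for a binary stream that arrives in chunks. This is a minimal sketch under stated assumptions: the function name `scan_stream` and the sample signatures are hypothetical, and signatures are fixed byte strings. The one subtlety it shows is carrying a small tail between chunks so a signature split across a chunk boundary is still found, without reporting any match twice.

```python
from typing import Dict, Iterable, Iterator, Tuple


def scan_stream(chunks: Iterable[bytes],
                signatures: Dict[str, bytes]) -> Iterator[Tuple[str, int]]:
    """Yield (signature_name, absolute_offset) for each match in a chunked byte stream.

    Assumes a non-empty dict of non-empty signatures (hypothetical example input).
    """
    # Carry (longest signature - 1) bytes between chunks so a match that straddles
    # a chunk boundary is still found.
    keep = max(len(sig) for sig in signatures.values()) - 1
    buffer = b""
    base = 0  # absolute stream offset of buffer[0]
    for chunk in chunks:
        prev_len = len(buffer)  # bytes already scanned in an earlier round
        buffer += chunk
        for name, sig in signatures.items():
            idx = buffer.find(sig)
            while idx != -1:
                # Report only matches that end in the new bytes; matches lying wholly
                # inside the carried-over tail were reported in an earlier round.
                if idx + len(sig) > prev_len:
                    yield name, base + idx
                idx = buffer.find(sig, idx + 1)  # idx + 1 also finds overlapping matches
        # Keep only the tail needed for the next round.
        drop = max(len(buffer) - keep, 0)
        base += drop
        buffer = buffer[drop:]


# A signature split across two chunks is still found at its absolute offset.
print(list(scan_stream([b"xxDEA", b"DBEEFyyCAFE"],
                       {"deadbeef": b"DEADBEEF", "cafe": b"CAFE"})))
# [('deadbeef', 2), ('cafe', 12)]
```

The learned-pattern case would replace the fixed byte signatures with a trained model's scoring function; the buffering concern stays the same.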
38. Why is NOW in Data Special?
Routing | Transformation | Computation
Intelligent Routing
“I need to replicate/fork the portions of this data stream that match criteria x”
Transformation & Computation
“I need to transform certain fields” or “I need to compute some value on certain fields”
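The two quoted needs map onto a small per-record pipeline: a predicate ("criteria x") decides which records are forked to an extra output, and per-field functions rewrite or compute selected fields. This is a minimal sketch, assuming records are dicts; the name `route_and_transform`, the field names and the example criteria are hypothetical, and a real streaming engine would run this continuously rather than over a list.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def route_and_transform(
    stream: Iterable[Record],
    criteria: Callable[[Record], bool],
    transforms: Dict[str, Callable[[Any], Any]],
) -> Tuple[List[Record], List[Record]]:
    """Return (main, forked) outputs for a stream of records.

    Every record goes to `main`; records that satisfy `criteria` are also
    replicated (forked) to `forked`. Fields named in `transforms` are rewritten
    first, so `criteria` sees the transformed record.
    """
    main: List[Record] = []
    forked: List[Record] = []
    for record in stream:
        # Transformation/computation: rewrite only the named fields.
        out = {k: transforms[k](v) if k in transforms else v for k, v in record.items()}
        main.append(out)
        # Intelligent routing: fork a copy of matching records to a second output.
        if criteria(out):
            forked.append(dict(out))
    return main, forked


events = [{"user": "a", "amount": 250}, {"user": "b", "amount": 40}]
main, big = route_and_transform(events, lambda r: r["amount"] > 100, {"user": str.upper})
print(big)  # [{'user': 'A', 'amount': 250}]
```

Forking a copy (rather than moving the record) keeps the main stream complete, which matches the "replicate/fork" wording.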
47. Why is NOW in Data Special?
Algorithmic Speciality
Concepts: What does a value represent or imply? (NLP/ML/k-NN)
Regression: How is a value related to another value, or how can we predict such relations?
Relationships: Topological, Ontological, Forest (Evolutionary/Random) (NLP)
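k-NN, one of the techniques named for working out what a value represents, fits in a few lines: a new point takes the majority label among its k closest labelled points. This is a minimal sketch, assuming Euclidean distance, k = 3 and toy two-dimensional data (all choices made for illustration); `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]],
                 query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest labelled points."""
    # Sort the labelled points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common() lists equal counts in first-seen order, so ties go to the nearer label.
    return votes.most_common(1)[0][0]


train = [((0.0, 0.0), "low"), ((0.1, 0.2), "low"), ((0.2, 0.1), "low"),
         ((5.0, 5.0), "high"), ((5.1, 4.9), "high"), ((4.8, 5.2), "high")]
print(knn_classify(train, (0.3, 0.3)))  # low
```

Sorting the whole training set is fine for a sketch; at scale one would use a spatial index instead.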
63. Latency
Standard - NOPE!
Depends upon the Consumer - YEP!
Depends upon the Medium - YEP!
Depends upon Technology - YEP!
Latency spans Data Creation Time to Data Consumption Time
71. NOW and Latency
Real-Time: Data that is consumed immediately after creation
Near Real-Time: Data that is consumed within seconds/minutes
Some-Time: Data that is consumed when requested and is neither RT nor NRT
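Taking latency as data consumption time minus data creation time, the three tiers can be expressed as a classifier over that gap. This is a sketch under stated assumptions: the slides give no exact boundaries ("immediately", "seconds/minutes"), so the 1-second and 10-minute thresholds below are hypothetical, and Some-Time is really defined by on-request consumption, which a pure threshold only approximates.

```python
from datetime import datetime, timedelta

# Hypothetical boundaries: the slides only say "immediately" and "seconds/minutes".
REAL_TIME_MAX = timedelta(seconds=1)
NEAR_REAL_TIME_MAX = timedelta(minutes=10)


def latency(created: datetime, consumed: datetime) -> timedelta:
    """Latency = data consumption time - data creation time."""
    return consumed - created


def tier(created: datetime, consumed: datetime) -> str:
    """Map a creation/consumption pair onto the Real-Time / Near Real-Time / Some-Time tiers."""
    gap = latency(created, consumed)
    if gap <= REAL_TIME_MAX:
        return "Real-Time"
    if gap <= NEAR_REAL_TIME_MAX:
        return "Near Real-Time"
    return "Some-Time"


created = datetime(2013, 6, 18, 12, 0, 0)
print(tier(created, created + timedelta(seconds=45)))  # Near Real-Time
```

Because the right thresholds depend on the consumer, medium and technology, they are module-level constants rather than hard-coded comparisons.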
78. Perception:
Physiological Latency
Research suggests that the human retina transmits data to the brain at a rate of about 10 million bits per second, close to that of a 10 Mbit/s (10BASE-T) Ethernet connection!
We can perceive changes in reality at ~13-15 frames per second (fps, or Hz); our perception of reality fully refreshes itself ~once every 77 milliseconds (ms).
Stock Exchange ~ 5-100 milliseconds (ms)
Games (first-person shooters) ~ 10-150 milliseconds (ms)
Web Sites ~ 50-400 milliseconds (ms)
Social/Games ~ 200 ms - 1 second
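The 77 figure matches the frame period at the low end of the 13-15 fps range: 1000 ms / 13 ≈ 77 ms. The helper below does that arithmetic and checks each quoted latency range against one perceptual refresh. The function names are ours, the ranges are copied from the slide, and treating "shorter than one refresh" as "feels instantaneous" is a simplifying assumption.

```python
def frame_period_ms(fps: float) -> float:
    """Time between successive frames, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def fits_in_refresh(latency_ms: float, fps: float = 13.0) -> bool:
    """True if a delay is shorter than one perceptual refresh at `fps` frames per second."""
    return latency_ms < frame_period_ms(fps)


# Typical latency ranges from the slide, in milliseconds: (low, high).
RANGES_MS = {
    "Stock Exchange": (5, 100),
    "Games (FPS)": (10, 150),  # FPS here means first-person shooter games
    "Web Sites": (50, 400),
    "Social/Games": (200, 1000),
}

print(f"13 fps -> {frame_period_ms(13):.0f} ms, 15 fps -> {frame_period_ms(15):.0f} ms")
for name, (low, high) in RANGES_MS.items():
    print(f"{name}: low end within one refresh: {fits_in_refresh(low)}, "
          f"high end: {fits_in_refresh(high)}")
```

Only the best cases of the faster domains land inside one refresh; every high end exceeds it.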
103. HBase
Regions and HDFS
“Regions”: data files for regions are stored in HDFS and replicated to multiple nodes in the cluster. Assigning regions across the cluster is also largely automatic.
Hadoop
Runs on top of Hadoop
MapReduce Integration
Scaling
Fault Tolerance
Commodity Machines
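HBase splits a table into regions, each holding a contiguous range of row keys in sorted byte order, and a row belongs to the region with the greatest start key not exceeding that row key. The sketch below models only that lookup with Python's `bisect`; it is not the HBase client API, and the region start keys and server names are invented for illustration.

```python
import bisect
from typing import List, Tuple

# Hypothetical table layout: (region start key, hosting region server), sorted by start key.
# The first region starts at the empty key, so every row key falls into some region.
REGIONS: List[Tuple[bytes, str]] = [
    (b"", "rs1.example"),
    (b"g", "rs2.example"),
    (b"n", "rs3.example"),
    (b"t", "rs1.example"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate(row_key: bytes) -> Tuple[bytes, str]:
    """Return (start key, server) of the region whose range [start_i, start_i+1) holds row_key.

    Row keys compare as raw bytes (lexicographically), as in HBase.
    """
    i = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[i]


for key in (b"apple", b"g", b"hbase", b"zebra"):
    print(key, locate(key))
```

When a region grows too large HBase splits it in two; in this model that is simply inserting a new start key into the sorted list.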
111. Cassandra
Always Writable
Writes are accepted even when the write fails internally on some replicas. The data will eventually become consistent (Tunable)
Scaling
Can span data centers
Peer-to-Peer communication between nodes (Gossip)
More...
Supports MapReduce
Supports Range Queries
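The "(Tunable)" refers to Cassandra's per-request consistency levels. With replication factor N, a write acknowledged by W replicas and a read that consults R replicas must share at least one replica whenever R + W > N, the usual rule of thumb for getting up-to-date reads out of an eventually consistent store. The helper names below are ours; only the arithmetic (including QUORUM as a majority, floor(N/2) + 1) is standard.

```python
def quorum(replication_factor: int) -> int:
    """Cassandra-style QUORUM: a majority of the replicas, floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True if any R replicas read must include one of the W replicas written (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return r + w > n


N = 3  # replication factor
for label, w, r in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", quorum(N), quorum(N)), ("ALL/ONE", N, 1)]:
    print(label, reads_see_latest_write(N, w, r))
```

Low write levels such as ONE (or ANY) are what keep the cluster writable while replicas are down; the trade-off is that a low-level read may return stale data until the replicas converge.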
119. Redis
An evolutionary Key-Value Store
Supports complex types (lists, sets, sorted sets, hashes) that map closely to fundamental data structures, so there is no need for an abstraction layer.
Transactions
Atomic operations (MULTI/EXEC/DISCARD): queue your operations after MULTI, then EXEC to commit them as one transaction or DISCARD to drop the queue. There is no roll-back: if a command fails during EXEC, the other commands still run.
Pub-Sub
Publish - Push messages to a channel
Subscribe - Listen to a channel
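Redis transaction semantics are easiest to see in a toy model: commands issued after MULTI are queued rather than run, EXEC runs the whole queue in order, DISCARD throws the queue away, and a command that fails during EXEC does not undo the ones before it. The class below is an in-memory imitation written for illustration (it supports only SET and INCR); it is not a Redis client, and its method names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyRedis:
    """In-memory imitation of Redis MULTI/EXEC/DISCARD semantics (SET and INCR only)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.queue: Optional[List[Tuple[str, tuple]]] = None  # None: no transaction open

    def _run(self, cmd: str, args: tuple) -> Any:
        if cmd == "SET":
            key, value = args
            self.data[key] = value
            return "OK"
        if cmd == "INCR":
            (key,) = args
            try:
                number = int(self.data.get(key, "0"))
            except ValueError:
                # Runtime error: reported in EXEC's reply, nothing is undone.
                return "ERR value is not an integer or out of range"
            self.data[key] = str(number + 1)
            return number + 1
        raise ValueError(f"unsupported command: {cmd}")

    def command(self, cmd: str, *args: str) -> Any:
        """Run a command now, or queue it if a transaction is open."""
        if self.queue is not None:
            self.queue.append((cmd, args))
            return "QUEUED"
        return self._run(cmd, args)

    def multi(self) -> str:
        self.queue = []
        return "OK"

    def discard(self) -> str:
        if self.queue is None:
            raise RuntimeError("ERR DISCARD without MULTI")
        self.queue = None  # drop the queued commands; none of them ran
        return "OK"

    def exec(self) -> List[Any]:
        if self.queue is None:
            raise RuntimeError("ERR EXEC without MULTI")
        queued, self.queue = self.queue, None
        # Run every queued command in order; a failing command does not stop the rest.
        return [self._run(cmd, args) for cmd, args in queued]


r = ToyRedis()
r.multi()
r.command("SET", "s", "hello")
r.command("INCR", "s")      # fails at EXEC time: "hello" is not an integer
r.command("SET", "c", "3")
print(r.exec())             # ['OK', 'ERR value is not an integer or out of range', 'OK']
print(r.data)               # {'s': 'hello', 'c': '3'}: the SETs were not rolled back
```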
139. MapReduce/Hadoop
Scale
100s to 1000s of server nodes
Extreme and cheap
Simple programming model
Batch
Development
Java, Python, Grep & Others...
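The "simple programming model" is map, shuffle, reduce, and word count is the customary first example. The sketch runs all three phases in one Python process purely to show the data flow; Hadoop instead distributes the map and reduce tasks across nodes and performs the shuffle over the network. The function names are ours, not Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


print(word_count(["the quick fox", "The lazy dog", "fox"]))
```

Because each map call sees one line and each reduce call sees one key, both phases parallelize without coordination; that is what makes the model batch-friendly and cheap to scale.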
148. Storm
FAST
Over a million tuples processed per second per node
Assurance
Scalable, Fault-Tolerant, Guarantees your data will be processed!
Integration
Integrates with any queueing system and any database system
Handles the parallelization, partitioning, and retrying on failures when necessary
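Storm's guarantee that data will be processed rests on tracking each spout tuple's tree of derived tuples. Its acker keeps one 64-bit value per spout tuple, XORing in the random ID of every tuple when it is emitted into the tree and again when it is acked; the value returns to zero once every tuple has been acked (barring a vanishingly unlikely ID collision), and a tree that does not reach zero before a timeout is replayed by the spout. The class below is a simplified single-process model of that bookkeeping with hypothetical method names, not Storm's API.

```python
import random
from typing import Dict, List


class ToyAcker:
    """Simplified single-process model of Storm's XOR-based tuple-tree tracking."""

    def __init__(self, seed: int = 0) -> None:
        self.pending: Dict[str, int] = {}  # spout tuple (root) -> running XOR of tuple ids
        self._rng = random.Random(seed)

    def _new_id(self) -> int:
        return self._rng.getrandbits(64)  # random 64-bit tuple id

    def spout_emit(self, root: str) -> int:
        """The spout emits a new root tuple; its id starts the tree's XOR value."""
        tid = self._new_id()
        self.pending[root] = tid
        return tid

    def bolt_emit(self, root: str) -> int:
        """A bolt emits a tuple anchored into `root`'s tree; XOR its id in."""
        tid = self._new_id()
        self.pending[root] ^= tid
        return tid

    def ack(self, root: str, tid: int) -> bool:
        """A tuple was fully processed; XOR its id out. True once the tree is complete."""
        self.pending[root] ^= tid
        if self.pending[root] == 0:
            del self.pending[root]
            return True
        return False

    def unfinished(self) -> List[str]:
        """Roots whose trees are incomplete; a spout would replay these after a timeout."""
        return list(self.pending)


acker = ToyAcker()
s = acker.spout_emit("sentence-1")   # spout tuple
w1 = acker.bolt_emit("sentence-1")   # a split bolt emits two word tuples
w2 = acker.bolt_emit("sentence-1")
acker.ack("sentence-1", s)           # the split bolt finished the sentence
acker.ack("sentence-1", w1)
print(acker.unfinished())            # ['sentence-1']: w2 not yet acked
print(acker.ack("sentence-1", w2))   # True: whole tree processed
```

The appeal of XOR is constant memory per spout tuple, however large the tree grows.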
156. CQL/StreamQL/SPARQL/QL-RTDB/
Languages
All support, to a large degree, what you would expect from SQL
Human Readable
SQL Idioms
Scalable
Run n queries simultaneously over both stream data and static data
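What sets these languages apart from plain SQL is the continuous query over a window of a stream, for example a CQL-style `[Range 60 Seconds]` window. The class below hand-rolls the equivalent of a per-key average over a time-based sliding window and joins each result with a static lookup table, to show stream and static data being queried together. The class name, window length and data are hypothetical, and this is not any of these engines' APIs.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class SlidingAverage:
    """Per-key average over the last `window` seconds of a stream (a continuous query)."""

    def __init__(self, window: float, static_names: Dict[str, str]) -> None:
        self.window = window
        self.static_names = static_names  # static table: key -> display name
        self.events: Deque[Tuple[float, str, float]] = deque()  # (timestamp, key, value)

    def push(self, ts: float, key: str, value: float) -> Tuple[str, float]:
        """Add one event and return the refreshed (name, average) for its key."""
        self.events.append((ts, key, value))
        # Evict events that have slid out of the window (events arrive in timestamp order).
        while self.events and self.events[0][0] <= ts - self.window:
            self.events.popleft()
        # Rescanning the window keeps the sketch short; real engines update aggregates incrementally.
        values = [v for _, k, v in self.events if k == key]
        # Join the streaming result with the static table.
        return self.static_names.get(key, key), sum(values) / len(values)


q = SlidingAverage(60, {"IBM": "International Business Machines"})
for ts, key, price in [(0, "IBM", 100.0), (30, "IBM", 110.0), (80, "IBM", 130.0)]:
    print(ts, q.push(ts, key, price))
```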
164. Pig
Language
High Level and easy to understand (Pig Latin)
Underneath
Essentially a MapReduce sequence compiler
Parallelization
It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks
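To make "MapReduce sequence compiler" concrete: the Pig Latin script in the comment groups log records by user, counts them and orders the counts, and Pig turns a script like this into a chain of MapReduce jobs, each job's output feeding the next. The Python below imitates that chain with a generic `run_job` helper (hypothetical, single-process). Pig's real plan differs in detail; ORDER BY, for instance, also uses a sampling step.

```python
# Pig Latin being imitated (illustrative):
#   logs    = LOAD 'access.log' AS (user:chararray, url:chararray);
#   by_user = GROUP logs BY user;
#   counts  = FOREACH by_user GENERATE group AS user, COUNT(logs) AS hits;
#   ranked  = ORDER counts BY hits DESC;
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple


def run_job(records: Iterable[Any],
            mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
            reducer: Callable[[Any, List[Any]], Iterable[Any]]) -> List[Any]:
    """One MapReduce job: map every record, group by key (sorted), reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out: List[Any] = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def pig_script(log: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    # Job 1: GROUP logs BY user, then FOREACH ... COUNT(logs)
    counts = run_job(log, lambda rec: [(rec[0], 1)], lambda user, ones: [(user, len(ones))])
    # Job 2: ORDER counts BY hits DESC (key on negated hits so the shuffle's sort does the work)
    return run_job(counts, lambda c: [((-c[1], c[0]), c)], lambda _key, rows: rows)


log = [("ann", "/a"), ("bob", "/b"), ("ann", "/c"), ("cy", "/a"), ("ann", "/b"), ("bob", "/a")]
print(pig_script(log))
```

The Pig Latin author writes four declarative lines; deciding how many jobs to run and what each one's map and reduce do is the compiler's work.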
179. The perfect Army!
In Memory
Keep as much as you can IN MEMORY! Think Redis...
Consumer
Who is the data consumer? Person or Process? Think Pig or xQLs for both!
Identify and Plan
What data can be batch processed and what can’t? Think Hadoop (for batch), Storm (for stream) and HBase (for ad hoc)