This document discusses streaming analytics and how traditional machine learning algorithms are not well-suited for streaming data. It introduces Hierarchical Temporal Memory (HTM) as a new approach inspired by neuroscience that can handle streaming data, continuous learning, and temporal modeling. HTM uses sparse distributed representations and models sequences to make predictions and detect anomalies. The document provides examples of how HTM can be applied to problems like anomaly detection in server metrics, human behavior, geospatial tracking, social media streams, and stock prices. HTM algorithms are domain-independent and use the same codebase and parameters across different problem types.
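The prediction-based anomaly scoring described above can be illustrated with a toy streaming detector that learns continuously and scores each point by its normalized prediction error. This is a simplified stand-in for illustration only, not HTM or the NuPIC API (HTM predicts using sparse distributed representations, not a window mean):

```python
from collections import deque

class PredictionErrorDetector:
    """Toy streaming detector: predict the next value from a short
    history, score each point by its normalized prediction error."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def score(self, value):
        # Not enough history yet: just learn, report no anomaly.
        if len(self.history) < 2:
            self.history.append(value)
            return 0.0
        # Predict the next value as the mean of the recent window.
        mean = sum(self.history) / len(self.history)
        spread = max(self.history) - min(self.history) or 1.0
        anomaly = min(abs(value - mean) / spread, 1.0)
        self.history.append(value)  # learn continuously, point by point
        return anomaly

detector = PredictionErrorDetector()
scores = [detector.score(v) for v in [1, 1, 1, 1, 1, 9, 1, 1]]
```

Note how the detector processes one point at a time and updates its model on every point, the continuous-learning property the document emphasizes, in contrast to batch-trained models.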
Abstract:
There’s no question that we are seeing an increase in the availability of streaming, time-series data. Largely driven by the rise of the Internet of Things (IoT) and connected real-time data sources, we now have an enormous number of applications with sensors that produce important data that changes over time. This data presents a challenge and opportunity for businesses across every industry. How do they handle the onslaught of streaming data? How can they exploit it to make decisions in real-time? One way is to detect, in real time, when something unusual occurs. Early anomaly detection in streaming data has significant implications, yet can be very difficult to execute. It requires detectors to process data in real-time, not batches, and learn while simultaneously making predictions. In this talk, we’ll look at algorithms designed for such data and analyze the components that lead to optimal performance. We’ll also discuss a new benchmark with a labeled, real-world data set, designed to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. How do we score in a way that rewards algorithms that detect all anomalies as soon as possible, triggers no false alarms, works with real-world time-series data across a variety of domains, and automatically adapts to changing statistics?
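The scoring question the abstract closes with, rewarding early detection inside a labeled anomaly window while penalizing false alarms, can be sketched in a few lines. The sigmoid weighting and the false-positive penalty constant below are illustrative assumptions, not NAB's actual implementation (which is open source at github.com/numenta/NAB):

```python
import math

def detection_weight(position):
    """Sigmoid weight for a detection at `position`, where 0.0 is the
    start of a labeled anomaly window and 1.0 its end: earlier
    detections earn more credit."""
    return 1.0 / (1.0 + math.exp(10.0 * (position - 0.5)))

def score_run(detections, window, fp_penalty=0.11):
    """Reward the earliest true detection inside `window`; penalize
    every detection outside it as a false positive.
    `fp_penalty` is an illustrative constant, not NAB's profile."""
    start, end = window
    score = 0.0
    in_window = [d for d in detections if start <= d <= end]
    if in_window:
        first = min(in_window)
        score += detection_weight((first - start) / (end - start))
    score -= fp_penalty * sum(1 for d in detections
                              if not (start <= d <= end))
    return score
```

Under this scheme a detection at the start of the window scores near 1, a detection near its end scores near 0, and a detector that fires outside any window loses points, which captures the trade-off the abstract describes.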
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark (Numenta)
Subutai Ahmad, VP Research, presents NAB and discusses the need for evaluating real-time anomaly detection algorithms. This presentation was delivered at MLconf (Machine Learning Conference) in San Francisco in 2015.
Extending Flink for anomaly detection with Hierarchical Temporal Memory (HTM). Presented at the Bay Area Apache Flink Meetup in San Jose on June 27, 2016.
https://github.com/htm-community/flink-htm
Exploration of U-Net and Support Vector Machine classification methods for UAV multispectral image segmentation
Recently, many solutions have been introduced to accurately and automatically analyze data acquired with Unmanned Aerial Vehicles (UAVs), in particular by relying on algorithms based on Artificial Intelligence (AI) techniques. Among these, the most popular are those belonging to the category of neural networks. These techniques allow the development of ad-hoc, end-to-end solutions for the classification and segmentation of different object categories through the analysis of high-resolution multispectral images. In our research, two main methodologies have been explored for the automatic segmentation of crop rows from multispectral images acquired with UAVs. The first is based on Support Vector Machines, known to handle overfitting issues well; the second is an implementation of “U-Net”, a state-of-the-art Convolutional Neural Network.
Anomaly detection in real-time data streams using Heron (Arun Kejariwal)
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
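The robust-measure idea mentioned above (median rather than mean, so a spike cannot distort the baseline) can be sketched with a rolling median/MAD detector. This is a generic illustration, not Heron's code; MCD, which the talk also covers, is a multivariate extension of the same robustness idea:

```python
import statistics
from collections import deque

def mad_outliers(stream, window=20, threshold=3.5):
    """Flag points whose deviation from the rolling median exceeds
    `threshold` robust z-scores (median/MAD instead of mean/stddev,
    so a single spike cannot distort the baseline)."""
    recent = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(recent) >= 5:
            med = statistics.median(recent)
            mad = statistics.median(abs(v - med) for v in recent) or 1e-9
            # 0.6745 scales MAD to be comparable with a standard deviation
            flags.append(0.6745 * abs(x - med) / mad > threshold)
        else:
            flags.append(False)  # warm-up: too little history to judge
        recent.append(x)
    return flags
```

Because the baseline is a median, the point right after a large spike is still judged against a sane reference, which is exactly why robust measures suit high-velocity streams better than mean/stddev thresholds.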
Energy Monitoring With Self-taught Deep Network (Yiqun Hu)
This is the presentation of my talk at the O'Reilly Strata Data Conference Singapore 2017. It is about how we can extract useful knowledge from unlabelled time series to help energy monitoring applications.
Data Data Everywhere: Not An Insight to Take Action Upon (Arun Kejariwal)
The big data era is characterized by ever-increasing velocity and volume of data. Over the last two or three years, several talks at Velocity have explored how to analyze operations data at scale, focusing on anomaly detection, performance analysis, and capacity planning, to name a few topics. Knowledge sharing of the techniques for the aforementioned problems helps the community to build highly available, performant, and resilient systems.
A key aspect of operations data is that data may be missing—referred to as “holes”—in the time series. This may happen for a wide variety of reasons, including (but not limited to):
# Packets being dropped due to unresponsive downstream services
# A network hiccup
# Transient hardware or software failure
# An issue with the data collection service
“Holes” in the time series can potentially skew the analysis of data, which in turn can materially impact decision making. Arun Kejariwal presents approaches for analyzing operations data in the presence of “holes” in the time series, highlighting how missing data impacts common analyses such as anomaly detection and forecasting, discussing the implications of missing data for time series of different granularities, such as minutely and hourly, and exploring a gamut of techniques that can be used to address the missing data issue (e.g., approximating the data using interpolation, regression, ensemble methods, etc.). Arun then walks you through how these techniques can be leveraged using real data.
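A minimal sketch of the interpolation option listed among those techniques, filling interior "holes" linearly from their neighbors; the function name and behavior are illustrative assumptions, not code from the talk:

```python
def fill_holes(values):
    """Linearly interpolate interior runs of None in a time series,
    leaving leading/trailing holes untouched (no neighbor to anchor them)."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1  # find the end of this run of missing points
            if 0 < i and j < len(out):  # interior gap with both anchors
                left, right = out[i - 1], out[j]
                step = (right - left) / (j - i + 1)
                for k in range(i, j):
                    out[k] = left + step * (k - i + 1)
            i = j
        else:
            i += 1
    return out
```

For example, `fill_holes([10, 11, None, None, 14, 15])` fills the gap with 12 and 13. Leading and trailing holes are deliberately left alone, since extrapolating them is a modeling decision (regression, ensembles) rather than interpolation.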
Wolfram Alpha (also styled Wolfram|Alpha or WolframAlpha) is a computational knowledge engine or answer engine developed by Wolfram Research. It is an online service that answers factual queries directly by computing the answer from externally sourced "curated data", rather than providing a list of documents or web pages that might contain the answer as a search engine might.
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016 (MLconf)
Building a Machine Learning Platform at Quora: Each month, over 100 million people use Quora to share and grow their knowledge. Machine learning has played a critical role in enabling us to grow to this scale, with applications ranging from understanding content quality to identifying users’ interests and expertise. By investing in a reusable, extensible machine learning platform, our small team of ML engineers has been able to productionize dozens of different models and algorithms that power many features across Quora.
In this talk, I’ll discuss the core ideas behind our ML platform, as well as some of the specific systems, tools, and abstractions that have enabled us to scale our approach to machine learning.
Using OpenAI Gym and GNU Radio to Improve 5G
OpenAI built Gym, a toolkit for developing reinforcement learning algorithms and applying them in different environments.
GNU Radio, on the other hand, is a free, open-source tool for software radio and signal processing.
Combined, these two tools create a powerful framework for researchers applying machine learning approaches to radio-related problems.
In this talk, we will focus on how we can improve the next generation of mobile communication using reinforcement learning and open-source software accessible to everyone.
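The Gym abstraction this pairing builds on is a simple agent-environment loop: observe, act, receive a reward. Below is a minimal sketch with a stub environment standing in for a GNU Radio-backed one; the channel-selection task and all names are invented for illustration:

```python
class StubChannelEnv:
    """Stand-in environment (a real setup would wrap a GNU Radio
    flowgraph): the agent picks one of 3 radio channels per step,
    and channel 2 is reliably the cleanest."""

    def reset(self):
        return 0  # a single dummy observation

    def step(self, action):
        reward = 1.0 if action == 2 else 0.0
        # Gym-style return: observation, reward, done flag, info dict
        return 0, reward, False, {}

def run_episode(env, policy, steps=10):
    """Run one episode: the agent-environment loop Gym formalizes."""
    obs = env.reset()
    total = 0.0
    for _ in range(steps):
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

Swapping `StubChannelEnv` for an environment whose `step` drives a real flowgraph and measures, say, signal-to-noise ratio is what makes the combination useful for radio research: the learning loop stays unchanged.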
A Fast Decision Rule Engine for Anomaly Detection (Databricks)
Description: We present a supervised anomaly detection approach that is scalable and interpretable. It works with tabular data and searches over all decision rules for the anomaly class involving one or two features. It creates a classifier out of all rules meeting user-specified precision and recall constraints, classifying a test example as an anomaly if any of the rules fire. Overlapping decision rules can be pruned to reduce model complexity, leaving a small number of simple rules that a user can easily understand. Our system operates on Pandas DataFrames and has a high-performance C++ backend with experimental GPU and FPGA acceleration available. It is available open-source at https://github.com/jjthomas/rule_engine
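The rule search described here can be sketched for the single-feature case: enumerate threshold rules, keep those meeting the precision/recall constraints on the anomaly class, and flag a row as an anomaly if any kept rule fires. This is an illustrative reimplementation of the idea, not the linked rule_engine code:

```python
def mine_rules(rows, labels, min_precision=0.9, min_recall=0.1):
    """Search single-feature threshold rules `feature > cutoff` and
    keep those meeting the precision/recall constraints on the
    anomaly class (label 1)."""
    n_features = len(rows[0])
    positives = sum(labels)
    kept = []
    for f in range(n_features):
        for cutoff in sorted({r[f] for r in rows}):
            fired = [r[f] > cutoff for r in rows]
            tp = sum(1 for hit, y in zip(fired, labels) if hit and y)
            fp = sum(1 for hit, y in zip(fired, labels) if hit and not y)
            if tp and tp / (tp + fp) >= min_precision \
                  and tp / positives >= min_recall:
                kept.append((f, cutoff))
    return kept

def classify(rules, row):
    """A test example is an anomaly if any of the rules fire."""
    return any(row[f] > cutoff for f, cutoff in rules)
```

Each kept rule is a human-readable statement like "feature 0 > 3", which is the interpretability payoff the description emphasizes; the real system extends this to two-feature rules and prunes overlapping ones.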
Brains, Data, and Machine Intelligence (2014-04-14 London Meetup) (Numenta)
Jeff Hawkins discusses brains, data, and machine intelligence, covering the Cortical Learning Algorithm he developed and the Numenta Platform for Intelligent Computing (NuPIC).
A very distilled introduction to the concepts of Hierarchical Temporal Memory (HTM) and Sparse Distributed Representations (SDRs) as implemented by Numenta.
"Kate, a Platform for Machine Intelligence" by Wayne Imaino, IBM Research (diannepatricia)
Wayne Imaino, Distinguished Research Staff Member at IBM Almaden Research Center, currently working to develop machine intelligence, made this presentation as part of the Cognitive Systems Institute Speaker Series on Jan 28, 2016.
Why Do Neurons Have Thousands of Synapses? A Model of Sequence Memory in the Brain (Numenta)
Presentation given by Yuwei Cui, Numenta Research Engineer, at Beijing Normal University, December 2015.
Collaborators: Jeff Hawkins, Subutai Ahmad, Chetan Surpur
SmartData Webinar: Applying Neocortical Research to Streaming Analytics (DATAVERSITY)
We are witnessing an explosion of sensors and machine generated data. Every server, every building, and every device generates a continuous stream of information that is ever changing and potentially valuable. The existing big data paradigm requires storing data for batch analysis, and extensive modeling by a human expert, prior to deployment. This is incredibly inefficient and cannot scale.
In this webinar, Ahmad will describe a new paradigm for streaming data algorithms, based on recent neuroscience findings and on the computational properties of the neocortex. These systems are highly automated, adapt to changing statistics, and naturally deal with temporal data streams. Many of the core ideas have been implemented in the open source project NuPIC, and validated in commercial anomaly detection and predictive maintenance applications. Given the massive increase in the number of data sources, a general-purpose automated approach is the only scalable way to effectively analyze and act on continuously streaming information.
Data Summer Conf 2018, “Architecting IoT system with Machine Learning (ENG)” (Provectus)
In this presentation, the speaker will share his experiences from building successful IoT systems. He will also explain why many IoT systems fail to get traction and how Machine Learning can help in that. Finally, he will talk about the right system architecture and touch upon some of the ML algorithms for IoT systems.
Machine learning and predictive analytics have started entering our daily lives. Businesses and enterprises can use predictive analytics to improve efficiency, enhance user experience, and create new business opportunities. This talk will present WSO2 Machine Learner, our experiences predicting Super Bowl winners, and a few real-life use cases. Furthermore, the talk will discuss open challenges and problems people are working on.
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15 (MLconf)
Incorporating the Real Time Component into Analytics and Machine Learning: Many industries and organizations today want to harness the power of big data analytics and machine learning for their potential to improve margins, enhance discoveries, give insight into the business, and enable fast, data-driven decisions. The challenges include the inability and/or difficulty of using available systems, not knowing where to start or which tools make sense for a particular problem, and dealing with data sets that are too big, too fast, or too complicated to handle with traditional systems.
RTDS Inc. has developed SymetryML™, a set of technologies for zero-latency machine learning and analytics/exploration of very large datasets in real time, with a focus on speed, accuracy, and simplicity. Our goals have been to cut the memory footprint required to learn large data sets, to provide “reducer” functionality that automatically selects the best attributes for model creation, and to build models on the fly. SymetryML is also designed for easy integration into existing business processes via either an easy-to-use web UI or RESTful APIs.
This talk will explore some of the functionality of these systems, including real-time exploration of data, fast multivariate model prototyping, and our use of GPUs and parallelization. An example of brain-related data and the complexities of its analytics will be discussed, as well as a brief overview of other verticals we are exploring. Our work is geared toward making big data make sense in real time and enabling users to gain insights faster than traditional methods.
Meetup sthlm - introduction to Machine Learning with demo cases (Zenodia Charpy)
Data science and Machine Learning
Machine Learning vs Artificial Intelligence
Machine Learning Algorithms
How to choose ML algorithm mindmap
Supervised Learning generic flow
Unsupervised Learning generic flow
Example cases for supervised and unsupervised learning
Introduction to streaming data; the difference between batch processing and stream processing; research issues in streaming data processing; performance evaluation metrics; and tools for stream processing.
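The batch-versus-stream distinction can be made concrete with Welford's single-pass algorithm, which maintains a running mean and variance without ever storing the data it has seen, the defining constraint of stream processing:

```python
class RunningStats:
    """Welford's single-pass mean/variance: the streaming counterpart
    of a batch computation that would need the whole dataset in memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; use (n - 1) for the sample estimate.
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 6.0, 8.0]:
    stats.update(x)
```

A batch system would compute the same statistics in one pass over a stored dataset; the streaming version trades storage for constant per-item work, which is why metrics like memory footprint and per-record latency matter when evaluating stream processors.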
Data Science in the Real World: Making a Difference (Srinath Perera)
We use the terms “Big Data” and “Data Science” for the use of data processing to make sense of the world around us. Spanning many fields, Big Data brings together technologies like Distributed Systems, Machine Learning, Statistics, and the Internet of Things. It is a multi-billion-dollar industry with use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like Smart Cities, Smart Health, and Smart Agriculture.
These use cases employ basic analytics, advanced statistical methods, and predictive technologies like Machine Learning. However, it is not just about crunching the data. Some use cases, like urban planning, can be slow, with enough time to process the data. But with use cases like traffic, patient monitoring, and surveillance, the value of results degrades much faster with time, and results are needed within milliseconds to seconds. Collecting data from many sources, cleaning it up, processing it using computation clusters, and doing all of this fast is a major challenge.
This talk will discuss the motivation behind big data and data science and how they can make a difference. It will then discuss the challenges, systems, and methodologies for implementing and sustaining a data science pipeline.
Big Data & Machine Learning - TDC2013 Sao Paulo (OCTO Technology)
BigData and Machine Learning: Usage and Opportunities for your IT department
Talk presented at The Developer Conference in São Paulo - 12/07/13
Mathieu DESPRIEE
Keynote presentation from the ECBS conference. The talk is about how to use machine learning and AI to improve software engineering, drawing on experiences from our project in Software Center (www.software-center.se).
Machine Learning open studio solution for data scientists & developers (Activeeon)
Machine Learning Open Studio (ML-OS) is an interactive graphical interface that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. It provides a rich set of generic machine learning tasks that can be connected together to build either basic or complex machine learning workflows for various use cases such as: fraud detection, text analysis, online offer recommendations, prediction of equipment failures, facial expression analysis, etc. These tasks are open source and can be easily customized according to your needs. ML-OS can schedule and orchestrate executions while optimising the use of computational resources. Usage of resources (e.g. CPU, GPU, local, remote nodes) can be easily monitored.
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy (Numenta)
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
Neuromodulators are signalling chemicals in the brain, which control the emergence of adaptive learning and behaviour. Neuromodulators including dopamine, acetylcholine, serotonin and noradrenaline operate on a spectrum of spatio-temporal scales in tandem and opposition to reconfigure functions of biological neural networks and to regulate global cognition and state transition. Although neuromodulators are important in shaping cognition, their phenomenology is yet to be fully realized in deep neural networks (DNNs). In this talk, we will give an overview of the biological organizing principles of neuromodulators in adaptive cognition and highlight the competition and cooperation across neuromodulators.
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi (Numenta)
Meetup page: https://www.meetup.com/Brains-Bay/events/284481247/
A hallmark of intelligence is the ability to learn new flexible, cognitive behaviors - that is, behaviors that require discovering, storing and exploiting novel information for each new instance of the task. In meta-learning, agents are trained with external algorithms to learn one specific cognitive task. However, animals are able to pick up such cognitive tasks automatically, as a result of their evolved neural architecture and synaptic plasticity mechanisms, including neuromodulation. Here we evolve neural networks, endowed with plastic connections and reward-based neuromodulation, over a sizable set of simple meta-learning tasks based on a framework from computational neuroscience. The resulting evolved networks can automatically acquire a novel simple cognitive task, never seen during evolution, through the spontaneous operation of their evolved neural organization and plasticity system. We suggest that attending to the multiplicity of loops involved in natural learning may provide useful insight into the emergence of intelligent behavior.
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici... (Numenta)
We receive information about the world through our sensors and influence the world through our effectors. Such low-level data has gradually come to play a greater role in AI during its 70-year history. I see this as occurring in four steps, two of which are mostly past and two of which are in progress or yet to come.

The first step was to view AI as the design of agents which interact with the world and thereby have sensorimotor experience; this viewpoint became prominent in the 1980s and 1990s. The second step was to view the goal of intelligence in terms of experience, as in the reward signal of optimal control and reinforcement learning. The reward formulation of goals is now widely used but rarely loved. Many would prefer to express goals in non-experiential terms, such as reaching a destination or benefiting humanity, but settle for reward because, as an experiential signal, reward is directly available to the agent without human assistance or interpretation. This is the pattern that we see in all four steps. Initially a non-experiential approach seems more intuitive, is preferred and tried, but ultimately proves a limitation on scaling; the experiential approach is more suited to learning and scaling with computational resources.

The third step in the increasing role of experience in AI concerns the agent’s representation of the world’s state. Classically, the state of the world is represented in objective terms external to the agent, such as “the grass is wet” and “the car is ten meters in front of me”, or with probability distributions over world states such as in POMDPs and other Bayesian approaches. Alternatively, the state of the world can be represented experientially in terms of summaries of past experience (e.g., the last four Atari video frames input to DQN) or predictions of future experience (e.g., successor representations).

The fourth step is potentially the biggest: world knowledge. Classically, world knowledge has always been expressed in terms far from experience, and this has limited its ability to be learned and maintained. Today we are seeing more calls for knowledge to be predictive and grounded in experience. After reviewing the history and prospects of the four steps, I propose a minimal architecture for an intelligent agent that is entirely grounded in experience.
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev... (Numenta)
In this talk, I will propose a conceptual framework sketching a path toward open-ended skill acquisition through the coupling of environmental, morphological, sensorimotor, cognitive, developmental, social, cultural and evolutionary mechanisms. I will illustrate parts of this framework through computational experiments highlighting the key role of intrinsically motivated exploration in the generation of behavioral regularity and diversity. Firstly, I will show how some forms of language can self-organize out of generic exploration mechanisms without any functional pressure to communicate. Secondly, we will see how language — once invented — can be recruited as a cognitive tool that enables compositional imagination and bootstraps open-ended cultural innovation.
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe... (Numenta)
Most current deep neural networks learn from a static data set without active interaction with the world. We take a look at how learning through a closed loop between action and perception affects the representations learned in a DNN. We demonstrate how these representations are significantly different from those of DNNs that learn supervised or unsupervised from a static dataset without interaction. These representations are much sparser and encode meaningful content in an efficient way. Even an agent that learned without any external supervision, purely through curious interaction with the world, acquires encodings of the high-dimensional visual input that enable it to recognize objects using only a handful of labeled examples. Our results highlight the capabilities that emerge from letting DNNs learn more like biological brains, through sensorimotor interaction with the world.
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen (Numenta)
Numenta's Director of ML Architecture Lawrence Spracklen presented a talk at the SBMT Annual Congress on July 10th, 2021. He talked about how neuroscience principles can inspire better machine learning algorithms.
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks - ... (Numenta)
Nick Ni (Xilinx) and Lawrence Spracklen (Numenta) presented a talk at the FPGA Conference Europe on July 8th, 2021. In this talk, they presented a neuroscience-inspired approach to optimizing state-of-the-art deep learning networks into sparse topologies, and showed how it can unlock significant performance gains on FPGAs without major loss of accuracy. They then walked through the FPGA implementation, where they exploited the advantage of sparse networks with a unique Domain Specific Architecture (DSA).
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...Numenta
Jeff Hawkins presented a talk on "The Thousand Brains Theory: A Roadmap to Machine Intelligence" at the Beijing Academy of Artificial Intelligence Conference on 1st June 2021. In this talk, he discussed the key components of The Thousand Brains Theory and Numenta's recent work.
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Numenta
Jeff Hawkins presents a talk on "How the Brain Uses Reference Frames to Model the World, Why AI Needs to do the Same." In this talk, he gives an overview of The Thousand Brains Theory and discusses how machine intelligence can benefit from working on the same principles as the neocortex.
This talk was first presented at the NAISys conference on November 10, 2020. You can find a re-recording of the talk here: https://youtu.be/mGSG7I9VKDU
OpenAI’s GPT 3 Language Model - guest Steve OmohundroNumenta
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...Numenta
Numenta VP Research Subutai Ahmad presents a talk on "Sparsity in the Neocortex and its Implications for Continual Learning" at the virtual CVPR 2020 workshop. In this talk, he discusses how continuous learning systems can benefit from sparsity, active dendrites and other neocortical mechanisms.
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...Numenta
Recent advances in reverse engineering the neocortex reveal that it is a highly-distributed sensory-motor modeling system. Each cortical column learns complete models of observed objects through movement and sensation. The columns use long-range connections to vote on what objects are currently being observed. In this talk, we introduce the key elements of this theory and describe how these elements can be introduced into current machine learning techniques to improve their capabilities, robustness, and power requirements.
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Numenta
Jeff Hawkins delivered this keynote presentation at the 2018 Human Brain Project Summit Open Day in Maastricht, the Netherlands on October 15, 2018. A screencast recording of the slides is also available at: https://numenta.com/resources/videos/jeff-hawkins-human-brain-project-screencast/
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Numenta
Jeff Hawkins gave this presentation as part of the Johns Hopkins APL Colloquium Series on Septemer 21, 2018.
View the video of the talk here: https://numenta.com/resources/videos/jeff-hawkins-johns-hopkins-apl-talk/
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...Numenta
Numenta VP of Research Subutai Ahmad delivered this presentation at the Centre for Theoretical Neuroscience, University of Waterloo on October 2, 2018.
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)Numenta
These are Matt Taylor's slides from the AI Singapore Meetup on May 17, 2018.
Abstract:
Today’s wave of AI technology is still being driven by the ANN neuron pioneered decades ago. Hierarchical Temporal Memory (HTM) is a realistic biologically-constrained model of the pyramidal neuron reflecting today’s most recent neocortical research. This talk will describe and visualize core HTM concepts like sparse distributed representations, spatial pooling and temporal memory. Strong AI is a common goal of many computer scientists. So far, machine learning techniques have created amazing results in narrow fields, but haven’t produced something we could all call “intelligent”. Given recent advances in neuroscience research, we know a lot more about how neurons work together now than we did when ANNs were created. We believe systems with a more realistic neuronal model will be more likely to produce Strong AI. Hierarchical Temporal Memory is a theory of intelligence based upon neuroscience research. The neocortex is the seat of intelligence in the brain, and it is structurally homogeneous throughout. This means a common algorithm is processing all your sensory input, no matter which sense. We believe we have discovered some of the foundational algorithms of the neocortex, and we’ve implemented them in software. I’ll show you how they work with detailed dynamic visualizations of Sparse Distributed Representations, Spatial Pooling, and Temporal Memory.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
1. Strata + Hadoop World, February 20, 2015
Subutai Ahmad, sahmad@numenta.com
Streaming Analytics: It’s Not The Same Game
2. Revenue Forecasting Customer Story
Today: at 10pm, a team of 10 analysts reports, “Dear CEO, today’s revenue forecast is $63.4M.”
Objectives for the next generation:
- Generate predictions every 15 minutes
- Track all product categories and geographies (hundreds of thousands)
- React rapidly to changes
Problems:
- Cumbersome data infrastructure
- Algorithm approach completely unclear
- Slow business processes
3. Data: Past and Future
Batch paradigm: 1. Store data → 2. Look at data → 3. Build models
Problem: doesn’t scale with data velocity and the number of models
Streaming paradigm: streaming data → automated model creation → continuous learning → temporal inference → predictions, anomalies, actions
Solution: streaming data infrastructure, a new algorithm approach, optimized business processes
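The contrast above can be made concrete. Below is a minimal sketch (not Numenta's code) of the streaming paradigm: a detector that updates its statistics incrementally with Welford's algorithm, so it learns continuously while making judgments, without ever storing or re-scanning the raw data the way a batch pipeline must.

```python
class StreamingDetector:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        """Learn from one value and return True if it looks anomalous."""
        anomalous = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) > 3 * std:
                anomalous = True
        # Welford's incremental update: no raw data is retained.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = StreamingDetector()
flags = [detector.update(v) for v in [10, 11, 9, 10, 12, 10, 50]]
print(flags[-1])  # True: the spike to 50 is flagged as it arrives
```

The threshold and statistics here are deliberately simple; the point is the shape of the loop, in which learning and detection happen on every arriving value.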
4. Talk Outline
1) Challenges for traditional machine learning algorithms
2) A new approach to streaming, based on neuroscience
3) Streaming applications
12. Functional Properties Of The Neocortex
1) Hierarchy of nearly identical regions
- a common algorithm processes every data stream (retina, cochlea, somatic)
2) Regions are mostly sequence memory
- used for both inference and motor control
3) Sparse Distributed Representations
- a common data structure
4) Every region is continually learning
- fully unsupervised
“Hierarchical Temporal Memory” (HTM)
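To make the "common data structure" point concrete, here is a hedged sketch of a Sparse Distributed Representation: a large, mostly-zero binary vector, modeled below as the set of indices of its active bits. The sizes and example patterns are illustrative, not Numenta's actual parameters.

```python
N = 2048   # total bits in the SDR (illustrative)
W = 40     # active bits, roughly 2% sparsity (illustrative)

def overlap(a, b):
    """Count shared active bits: the natural similarity measure between SDRs."""
    return len(a & b)

cat = set(range(0, 40))        # hypothetical SDR for "cat"
dog = set(range(20, 60))       # shares half its active bits with "cat"
car = set(range(1000, 1040))   # shares none

print(overlap(cat, dog))  # 20 -> semantically similar inputs overlap
print(overlap(cat, car))  # 0  -> unrelated inputs do not
```

Because similarity is carried by bit overlap, the same operations work no matter which sense, or sensor, produced the bits.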
13. Physical Architecture of the Cortex
Cortical regions are built from layers with columns; the columns contain neurons with thousands of synapses, and their dendrites act as coincidence detectors. The HTM learning algorithms model this architecture.
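The "dendrites as coincidence detectors" idea can be sketched in a few lines, with an assumed threshold and illustrative cell numbers: a dendritic segment becomes active when enough of its synapses coincide with the currently active cells.

```python
THRESHOLD = 10  # assumed activation threshold for a segment

def segment_active(synapses, active_cells):
    """A segment detects a coincidence when >= THRESHOLD of its synapses
    connect to currently active cells."""
    return len(synapses & active_cells) >= THRESHOLD

segment = set(range(30))        # cells this segment has synapses onto
pattern = set(range(5, 20))     # 15 of those cells are currently active
noise = set(range(100, 140))    # unrelated activity elsewhere

print(segment_active(segment, pattern))  # True: coincidence detected
print(segment_active(segment, noise))    # False
```

Thousands of such segments per neuron let a single cell recognize many independent patterns, which is what the sequence memory below builds on.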
14. HTM Example
Data → encoders (time of day, sensor value) → SDR → Spatial Pooler → Temporal Memory → predictions, anomalies
The HTM learning algorithms model common spatial patterns and temporal sequences in the stream.
At every time step they improve the representation of that spatial pattern and that transition.
At each time step the Temporal Memory makes multiple predictions about what might occur next.
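The first stage of the pipeline on this slide is the encoder. Below is a simplified sketch of the idea behind a scalar encoder (it mirrors NuPIC's scalar encoder in spirit, but the function and parameters are illustrative): map a value to a contiguous block of active bits, so that nearby values produce overlapping SDRs.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=64, width=8):
    """Map a scalar to a contiguous block of `width` active bits."""
    value = max(min_val, min(max_val, value))
    span = n_bits - width
    start = int(round((value - min_val) / (max_val - min_val) * span))
    bits = [0] * n_bits
    for i in range(start, start + width):
        bits[i] = 1
    return bits

a = encode_scalar(50.0)
b = encode_scalar(52.0)   # a nearby value shares most active bits
c = encode_scalar(90.0)   # a distant value shares none

shared_ab = sum(x & y for x, y in zip(a, b))
shared_ac = sum(x & y for x, y in zip(a, c))
print(shared_ab, shared_ac)  # 7 0
```

Overlap between encodings is what gives the Spatial Pooler and Temporal Memory their notion of similarity between sensor readings.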
15. HTM Learning Algorithm Codebase
Models a single layer of cortex:
1) High-capacity memory-based system
2) Models complex high-order temporal sequences
3) Makes predictions and detects anomalies
4) Learns continuously
5) No sensitive parameters
A basic building block of the neocortex and of machine intelligence.
Whitepaper and full source code available at numenta.org & github.com/numenta
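"High-order" sequence modeling is worth unpacking. The table-based sketch below is not the HTM Temporal Memory, but it shows what high-order prediction buys: in the sequences A,B,C,D and X,B,C,Y, the element after C depends on context from several steps back, so a first-order model is ambiguous where a higher-order one is not.

```python
from collections import defaultdict

def train(sequences, order):
    """Map each length-`order` context tuple to the set of elements seen next."""
    table = defaultdict(set)
    for seq in sequences:
        for i in range(order, len(seq)):
            table[tuple(seq[i - order:i])].add(seq[i])
    return table

seqs = [["A", "B", "C", "D"], ["X", "B", "C", "Y"]]

first_order = train(seqs, order=1)
high_order = train(seqs, order=3)

print(first_order[("C",)])          # {'D', 'Y'} -- ambiguous after C alone
print(high_order[("A", "B", "C")])  # {'D'}      -- context resolves it
print(high_order[("X", "B", "C")])  # {'Y'}
```

The Temporal Memory achieves this contextual disambiguation with distributed cell states rather than an explicit table, which is how it scales without a fixed Markov order.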
16. HTM Engine For Streaming Analytics
Per-metric pipeline, repeated for Metric 1 … Metric N:
System metric → encoder → SDR → HTM → prediction → point anomaly → time average → historical comparison → anomaly score
The engine outputs anomaly scores and predictions for every metric.
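The post-processing stages of this pipeline can be sketched under assumed definitions (the exact formulas here are illustrative, not necessarily the engine's): take the point anomaly score as the fraction of the input's active bits that were not predicted, then smooth it with an exponential moving average so isolated mispredictions do not trigger alarms.

```python
def point_anomaly(active_bits, predicted_bits):
    """1.0 = completely unexpected input, 0.0 = perfectly predicted."""
    if not active_bits:
        return 0.0
    return len(active_bits - predicted_bits) / len(active_bits)

def time_average(scores, alpha=0.2):
    """Exponential moving average over raw anomaly scores."""
    avg, out = 0.0, []
    for s in scores:
        avg = alpha * s + (1 - alpha) * avg
        out.append(avg)
    return out

active = {1, 2, 3, 4}
print(point_anomaly(active, predicted_bits={1, 2, 3, 4}))  # 0.0
print(point_anomaly(active, predicted_bits={1, 2}))        # 0.5
print(time_average([0, 0, 1, 0])[2])                       # 0.2: one spike, damped
```

The historical-comparison stage would then ask how unusual the smoothed score is relative to its own past distribution, yielding the final anomaly score.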
18. Grok: Anomaly Detection For Amazon Web Services
Unique value of HTM algorithms
Automated model creation: configure hundreds of models in minutes
Continuously learning: automatically adapts to changes
Detects sophisticated temporal anomalies
19. 3) Anomaly Detection in Geospatial Tracking Data
Fleets, planes, materiel, kids, pets
Pipeline: GPS + velocity → encoder → SDRs → CLA → prediction, anomaly detection, classification
Trick: convert GPS coordinates into an SDR
- Represents both location and speed
- Works anywhere on Earth or in space
Once the input is encoded as an SDR, the learning algorithm is agnostic to the data type
- The encoding only needs to be built once per sensor type
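The "trick" can be sketched as follows. This is modeled loosely on the idea behind a coordinate encoder, but every detail here (grid resolution, neighborhood size, bit count) is an illustrative assumption: snap the position to a grid whose cell size grows with speed, then hash nearby grid cells to choose active bits, so nearby positions share cells and therefore share bits.

```python
import hashlib

def gps_sdr(lat, lon, speed_mps, n_bits=1024):
    # Faster movement -> coarser grid, so speed shapes the encoding too.
    cell = 0.001 * max(1.0, speed_mps)   # grid resolution in degrees (assumed)
    gx, gy = int(lat / cell), int(lon / cell)
    bits = set()
    # Hash the 5x5 neighborhood of grid cells to pick active bits; nearby
    # positions share most cells, so their SDRs overlap.
    for dx in range(-2, 3):
        for dy in range(-2, 3):
            h = hashlib.sha256(f"{gx + dx},{gy + dy}".encode()).digest()
            bits.add(int.from_bytes(h[:4], "big") % n_bits)
    return bits

a = gps_sdr(37.7740, -122.4190, 1.0)   # a vehicle in San Francisco
b = gps_sdr(37.7741, -122.4190, 1.0)   # a few meters away
c = gps_sdr(40.0000, -100.0000, 1.0)   # far away
print(len(a & b) > len(a & c))  # True: nearby positions yield overlapping SDRs
```

Once positions are SDRs, the downstream sequence memory treats a GPS track exactly like any other data stream, which is the sense in which the learning algorithm is agnostic.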
25. These HTM Applications Use the Exact Same Code Base
- HTM learning algorithms
- Identical learning parameters
- Suitable for many data types
Applications: Grok (server anomalies), rogue human behavior, geospatial tracking, stock anomalies, social media streams (Twitter)
26. Summary: The Future of Data is Streaming Data
- High-velocity data streams that change often
- Massive numbers of models
- The existing batch paradigm cannot scale
Streaming analytics and algorithms must move towards:
- Automated model building
- Continuous learning
- Temporal modeling
We can learn how to do this from neuroscience, and the HTM learning algorithms demonstrate that it is possible.
27. Learn More
- Documentation, videos at numenta.com/learn
- Source code and examples at github.com/numenta
Get Involved
- Provide feedback to info@numenta.com
- Participate in NuPIC (the Numenta Platform for Intelligent Computing) and its active mailing lists
- Try out Grok on the AWS Marketplace