The Briefing Room with Mark Madsen and WebAction
Live Webcast Feb. 10, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=fa83c6283de99dfb6f38b9d7199cb452
In our increasingly interconnected world, the windows of opportunity for meaningful action are shrinking. Where hours once sufficed, minutes are now the norm. For some transactions, seconds, or even fractions of a second, make all the difference. Meeting these demands requires a new approach to information architecture, one that embraces the many innovations that are fundamentally changing the data-driven economy.
Register for this episode of The Briefing Room to hear veteran analyst Mark Madsen of Third Nature as he explains how a confluence of advances is changing the nature of data management. He'll be briefed by Sami Akbay of WebAction, who will showcase his company's real-time data platform, designed from the ground up to meet the challenges of leveraging Big Data in concert with all manner of operational enterprise systems.
Visit InsideAnalysis.com for more information.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
Reveal the essential characteristics of enterprise software, good and bad
Provide a forum for detailed analysis of today's innovative technologies
Give vendors a chance to explain their products to savvy analysts
Allow audience members to pose serious questions... and get answers!
5. Topics
February: DATA IN MOTION
March: BI/ANALYTICS
April: BIG DATA
6. Parmenides and the Truth of Now
Image: "Parmenides". Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Parmenides.jpg#mediaviewer/File:Parmenides.jpg
There is no tomorrow
There is no yesterday
There is only today
There is only now
7. Analyst: Mark Madsen
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributor to Forbes Online, and a member of the O’Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
8. WebAction
WebAction offers real-time data-driven apps and the underlying enterprise platform
The platform captures structured and unstructured data from a wide variety of data sources and allows users to correlate and enrich data streams
WebAction leverages in-memory data processing and is architected to scale up and scale out
9. Guest: Sami Akbay
Sami Akbay is a founder of WebAction. Prior to WebAction, he served as the CEO of Altibase, Inc., an in-memory RDBMS company with customers in financial services, utilities, and telecommunications. Sami was Vice President of Marketing and Product Management for GoldenGate Software from 2004 through its acquisition by Oracle. Prior to GoldenGate, he served in senior product marketing and business development roles at Embarcadero and AltoWeb. He spent his earlier career in technical and consulting roles working at Rabobank Nederlands, Hearst New Media, American Stock Exchange, MediaMetrix, OneMain.com (Earthlink), and ALK Associates. He is a graduate of Rutgers University.
12. PROPRIETARY & CONFIDENTIAL
• Insights come from analyzing historical data:
– What are the average hourly sales for our Boston store on a typical weekday in February?
– Who are my top 1% of passengers by revenue for 2014?
– How many dropped calls does my average subscriber experience before cancelling service if they have a 2-year contract and a $250 cancellation penalty?
13.
• Events without context are not very meaningful:
– In the last 30 minutes, we had revenue of $8,000 in our Boston store.
– Mark Madsen will miss his connection from ORD to EWR because his flight departed late from SFO.
– Sami Akbay has had 3 dropped calls in the last 30 minutes.
14.
• Actionable insights combine analyzed history with real-time event streams:
– We typically sell $3,000 per hour on a weekday in February at our Boston store. In the last 30 minutes we sold $8,000. Alert the store manager and require an ID check at checkout.
– Mark Madsen is a top 1% passenger by revenue. Have an agent meet him at the gate and deliver his boarding pass for the next flight.
– A subscriber will drop 8 calls before becoming a churn risk. Don’t give him a service discount as an incentive if he calls 611.
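The Boston store example above combines a historical baseline with a live window of events. A minimal Python sketch of that logic follows; the `Sale` record, the `should_alert` rule, and its 2x threshold are hypothetical illustrations, not WebAction's API or the rule the presenters used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sale:
    store: str
    amount: float
    at: datetime


def window_revenue(sales, store, now, window=timedelta(minutes=30)):
    """Sum one store's sales whose timestamps fall in (now - window, now]."""
    start = now - window
    return sum(s.amount for s in sales if s.store == store and start < s.at <= now)


def should_alert(sales, store, now, hourly_baseline, factor=2.0,
                 window=timedelta(minutes=30)):
    """Hypothetical rule: alert when live revenue, scaled to an hourly rate,
    exceeds `factor` times the historical hourly baseline."""
    revenue = window_revenue(sales, store, now, window)
    hourly_rate = revenue * (timedelta(hours=1) / window)  # 30 min -> x2.0
    return hourly_rate > factor * hourly_baseline, revenue, hourly_rate


# Usage with the slide's figures: $3,000/hour baseline, $8,000 in 30 minutes.
now = datetime(2015, 2, 10, 14, 0)
sales = [Sale("Boston", 2000.0, now - timedelta(minutes=m)) for m in (5, 10, 15, 20)]
alert, revenue, rate = should_alert(sales, "Boston", now, hourly_baseline=3000.0)
print(alert, revenue, rate)  # True 8000.0 16000.0
```

The baseline would come from the historical analysis (slide 12) and the window from the live stream (slide 13); neither alone triggers the action.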
17.
Diagram: device data, industry data, social feeds, transaction data, and system/IT data are loaded through ETL into the data warehouse and Hadoop
18.
Diagram: the same data sources (device data, industry data, social feeds, transaction data, system/IT data) feed two paths. Batch / high-latency: the existing ETL into the EDW and Hadoop, serving legacy applications and Pig, Hive, and Map/Reduce applications. Real-time / low-latency: WebAction, serving real-time applications. Both paths serve users.
19.
WebAction® delivers the most comprehensive Realtime Stream Analytics Platform, enabling tailored, enterprise-scale Big Data Applications for the Agile Enterprise
20.
Diagram: two paths separated by the REAL-TIME BARRIER.
Batch, reactive: structured data, machine data, location, and click stream data are acquired, stored (RDBMS, EDW), then processed for BI / analytics.
Proactive, real-time: the same sources are acquired, processed in memory, and delivered to data-driven apps as visualizations, alerts, storage, and integration.
22.
Acquire: structured and unstructured data
Process: distributed, in-memory, as data is created
Deliver: correlated, enriched, and filtered real-time big data records
23. Acquire
Structured and unstructured data
§ Data from transactional sources is acquired via redo or transaction logs
§ Structured and unstructured data
§ No production impact
§ No application changes
TYPE | EXAMPLE | COMPLEXITY
Common file format | CSV, JSON, XML | SIMPLE
Social feeds | Facebook, Twitter | VERY HIGH
System/IT data | Syslogs, weblogs, NetFlow | SIMPLE TO MEDIUM
Device data | SmartMeter, medical device, RFID | MEDIUM
Industry data | SWIFT, HL7, FIX | MEDIUM
Transaction data (real-time) | Oracle, DB2, SQL Server, MySQL, HP NonStop | HIGH
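For the simplest row of the table above (common file formats), acquisition mostly means turning each format into one shared record shape before processing. A minimal Python sketch using only the standard library; the `{source, format, data}` envelope and the function names are hypothetical, not WebAction's internal format, and redo-log capture from databases is not covered.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def make_event(source, fmt, data):
    """Hypothetical common envelope: every source becomes {source, format, data}."""
    return {"source": source, "format": fmt, "data": data}


def from_csv(text, source):
    """One event per CSV row; the first line must be a header."""
    return [make_event(source, "csv", dict(row))
            for row in csv.DictReader(io.StringIO(text))]


def from_json_lines(text, source):
    """One event per non-blank line, each line holding one JSON object."""
    return [make_event(source, "json", json.loads(line))
            for line in text.splitlines() if line.strip()]


def from_xml(text, source, record_tag):
    """One event per <record_tag> element; its child elements become fields."""
    root = ET.fromstring(text)
    return [make_event(source, "xml", {child.tag: child.text for child in elem})
            for elem in root.iter(record_tag)]


events = (
    from_csv("meter,kwh\nm1,3.2\n", "smartmeter")
    + from_json_lines('{"user": "a", "text": "hi"}\n', "social")
    + from_xml("<log><entry><host>web1</host><status>500</status></entry></log>",
               "weblog", "entry")
)
for event in events:
    print(event)
```

Downstream stages can then filter, correlate, and enrich on `data` without caring which format an event arrived in.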
24. Process
Distributed, in-memory, as data is created
§ Enrich live Big Data with historical data sources
§ Process Big Data faster using partitioned streams, caches, and additional nodes
§ Execute SQL-like queries on in-memory Big Data
§ Alert in real time based on predictive analytic model results
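Two of the processing ideas above, partitioned streams and enrichment from a cache of historical data, can be sketched together. This is a minimal Python illustration, not WebAction's implementation: events are hash-partitioned by key (in a real cluster each partition would run on its own node; here they run in a loop), and each event is joined with a hypothetical per-subscriber churn threshold held in memory.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Stable hash partitioning: the same key always maps to the same partition.
    zlib.crc32 is used because Python's built-in hash() of str is salted per process."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(events, key_field, num_partitions):
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event[key_field], num_partitions)].append(event)
    return partitions


def enrich(event, cache, key_field):
    """Join a live event with historical/reference data held in an in-memory cache."""
    return {**event, **cache.get(event[key_field], {})}


# Hypothetical historical cache: per-subscriber churn thresholds from past analysis.
history = {"sub-1": {"churn_threshold": 8}, "sub-2": {"churn_threshold": 5}}
live = [{"subscriber": "sub-1", "dropped_calls": 3},
        {"subscriber": "sub-2", "dropped_calls": 6}]

parts = partition_stream(live, "subscriber", num_partitions=4)
alerts = []
for pid in sorted(parts):
    for event in parts[pid]:
        record = enrich(event, history, "subscriber")
        if record["dropped_calls"] >= record.get("churn_threshold", float("inf")):
            alerts.append(record)
print(alerts)  # [{'subscriber': 'sub-2', 'dropped_calls': 6, 'churn_threshold': 5}]
```

Partitioning by key keeps every event for one subscriber on the same partition, so each node only needs the slice of the cache for its own keys.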
25. Deliver
Correlated, enriched, and filtered real-time big data records
§ Continuous Big Data Records
§ Real-Time Dashboards
§ Predictive Alerts
§ Business Trends
§ Data Patterns
§ Outliers
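One simple way to surface the "outliers" deliverable listed above is a rolling z-score over recent values. A minimal Python sketch; the window size, the 3-sigma threshold, and the `OutlierDetector` class are assumptions for illustration, not a description of WebAction's analytics.

```python
from collections import deque
from statistics import mean, pstdev


class OutlierDetector:
    """Flags a value as an outlier when it lies more than `k` population standard
    deviations from the mean of the previous `size` values (rolling z-score)."""

    def __init__(self, size=20, k=3.0, min_history=5):
        self.window = deque(maxlen=size)  # oldest values drop off automatically
        self.k = k
        self.min_history = min_history    # don't judge until some history exists

    def observe(self, value):
        is_outlier = False
        if len(self.window) >= self.min_history:
            mu = mean(self.window)
            sigma = pstdev(self.window)
            if sigma > 0 and abs(value - mu) > self.k * sigma:
                is_outlier = True
        self.window.append(value)
        return is_outlier


detector = OutlierDetector(size=10, k=3.0)
readings = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
flags = [detector.observe(r) for r in readings]
print([r for r, f in zip(readings, flags) if f])  # [250]
```

Note that the flagged value still enters the window and inflates the spread for later readings; a production detector might exclude flagged values from its history.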
28. Perceptions & Questions
Analyst: Mark Madsen
29. Copyright Third Nature, Inc.
We are in a transitional phase in IT architecture
Dimension | Then | State of practice | Now, forward
Architecture | Timeshare | Client/server | Cloud
Data | Core TXs | All TXs, some events, docs | All data
Rate of change | Slow | Rapid | Continuous
Uses | Few | Many | Everything
Latency | Daily+++ | < daily to minutes | Immediate
Data platform | Uniprocessor | SMP, cluster | Shared nothing
30. Majority use of computing over time
1930s-1950s: Calculate
1960s-1980s: Automate
1990s-2010s: Informate
2010s+: Analyze and Actuate
Computing technology has become a tool of observation and actuation, not just a recipient of human-entered data
Axis: rising organizational complexity
31. The data warehouse vs. business agility
All the data
Ready-to-use, common, typed, tabular data
The bottleneck is you
32. Polling is not streaming, minutes is not real time
Figure: timeline from 0 to 7 minutes comparing a streaming model and a polling model after something breaks and bad events cross an alert threshold.
Streaming model: the problem is visible 4 seconds after the first bad event. Reaction takes 3 minutes, so action is taken at 3.5 minutes and the problem is completely resolved at 4 minutes.
Polling model: events are recorded, processed, stored in the DB, and ready after 2.5 minutes, so the problem is visible after 2.5 minutes at the earliest. Reaction takes 3 minutes, so action is taken at 6 minutes and the problem is completely resolved at 6.5 minutes. Meanwhile, the problem gets worse.
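The timeline above reduces to simple arithmetic: resolution time = first bad event + detection delay + reaction time + fix time, so the gap between the two models equals the difference in detection delay. A minimal Python sketch in seconds; the 4 s, 2.5 min (150 s), and 3 min (180 s) figures come from the slide, while the first bad event at 30 s and the 30 s fix time are assumptions chosen so the polling numbers land on the slide's 6 and 6.5 minutes.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    detected: float
    action: float
    resolved: float


def incident_timeline(first_bad_event_s, detection_delay_s, reaction_s=180, fix_s=30):
    """All values in seconds. reaction_s (3 min) is from the slide; fix_s is assumed."""
    detected = first_bad_event_s + detection_delay_s
    action = detected + reaction_s
    return Timeline(detected, action, action + fix_s)


streaming = incident_timeline(30, detection_delay_s=4)    # visible 4 s after event
polling = incident_timeline(30, detection_delay_s=150)    # ready after 2.5 minutes
print(streaming)  # Timeline(detected=34, action=214, resolved=244)
print(polling)    # Timeline(detected=180, action=360, resolved=390)
print(polling.resolved - streaming.resolved)  # 146
```

The streaming run puts action at 214 s (about 3.6 minutes), close to the slide's rounded 3.5 minutes; the 146-second gap is exactly the extra detection latency that polling introduces.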
33. The data warehouse is not designed for real time
A polling architecture does not work well for event data
▪ Introduces latency
▪ Polling creates performance and scaling problems
The DW can’t handle real-time ingest
▪ One of the original DW design assumptions: solve for conflicting workloads by separating them in time
▪ Workload management has limits
▪ Scalability problem for event streams
▪ Spiky flow patterns and dynamic scaling
Static schema:
▪ What happens first, upstream change or data model change?
▪ What is your reaction time?
The problem of dropped packets
34. The creation and flow of data is different for transactions and machine-generated events
The process for most human-entered data, at human speed: Data entry → Extract → Cleanse → Load → Use
The process for machine-generated data, at machine speed: Data generation → Store → Use, with Cleanse and Program steps applied to stored data before it is stored and used again
35. Real-time monitoring is not polling
Real-time monitoring often needs to access history
The data in motion and the data at rest are the same data.
Therefore: real time (in motion) and persistence (at rest) must be supported by the same architecture
36. Real time isn’t either-or, it’s part of the architecture
Flowing (ESB): a sliding window of “now”
Unloaded (cache/queue): persisted but not yet loaded into the DB
Streaming SQL, stream engines, or CEP may be used for these
Queryable history (database): stored in a database / datastore
A DB can get you to within minutes (at large scale), but it won’t be easy or cheap
Real-time monitoring doesn’t use only real-time data: windows, restarts, and deviation detection all cross the boundaries above.
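The three tiers above, and the point that monitoring crosses them, can be sketched in one small class. A minimal Python illustration under stated assumptions: a `deque` stands in for the flowing tier (ESB / stream), a plain list stands in for the durable cache/queue of unloaded data, and an in-memory SQLite table stands in for queryable history. `ThreeTierStore` and its methods are hypothetical names, not a product API.

```python
import sqlite3
from collections import deque


class ThreeTierStore:
    """Hypothetical sketch: flowing window, unloaded buffer, queryable history."""

    def __init__(self, window_s=60):
        self.window_s = window_s
        self.flowing = deque()   # (ts, value) pairs inside the sliding window of "now"
        self.unloaded = []       # stands in for a durable queue or log awaiting load
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE history (ts REAL, value REAL)")

    def ingest(self, ts, value):
        self.flowing.append((ts, value))
        self.unloaded.append((ts, value))
        # Evict events that have slid out of the window (ts - window_s, ts].
        while self.flowing and self.flowing[0][0] <= ts - self.window_s:
            self.flowing.popleft()

    def load_batch(self):
        """Periodic batch load from the unloaded buffer into the database."""
        self.db.executemany("INSERT INTO history VALUES (?, ?)", self.unloaded)
        self.db.commit()
        self.unloaded.clear()

    def window_mean(self):
        if not self.flowing:
            return None
        return sum(v for _, v in self.flowing) / len(self.flowing)

    def history_mean(self):
        (avg,) = self.db.execute("SELECT AVG(value) FROM history").fetchone()
        return avg

    def deviation(self):
        """Crosses the tiers: compares the live window with loaded history."""
        live, hist = self.window_mean(), self.history_mean()
        if live is None or hist is None:
            return None
        return live - hist


store = ThreeTierStore(window_s=60)
for t in range(0, 300, 10):      # five minutes of normal readings
    store.ingest(t, 10.0)
store.load_batch()
for t in range(300, 360, 10):    # one minute of elevated readings, not yet loaded
    store.ingest(t, 25.0)
print(store.window_mean(), store.history_mean(), store.deviation())  # 25.0 10.0 15.0
```

Detecting the deviation needs both the live window and the loaded history, which is why the slide argues the tiers must live in one architecture rather than in separate either-or systems.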
37. This implies a new DW architecture and data modeling approach
Layers: Ingest, Store, Manage, Refine, Deliver; then Analyze and Use
Decouple the data architecture layers
38. If you want to do real time and still manage your data effectively, then you need this data architecture
Layers: Stream, Collect, Refine, Manage, Deliver
Data states: Flowing, Persisted, Managed history
Metadata? Metadata?
Flowing, persisted, and managed data define different storage and retrieval requirements
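One way to read "decouple the data architecture layers" is that each layer talks to the next only through a queue, so any layer can be replaced or scaled without touching the others. A minimal Python sketch of Collect, Refine, Manage, and Deliver stages connected by `queue.Queue`, each stage on its own thread; the stage logic (parsing `sensor,value` lines, tagging a retention class) is hypothetical, and the slide's open metadata questions are not addressed.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(fn, inbox, outbox):
    """Run one layer: read from inbox, apply fn, forward non-None results."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        result = fn(item)
        if result is not None:
            outbox.put(result)


def refine(raw):
    """Refine layer (hypothetical): parse 'sensor,value' lines, drop malformed input."""
    parts = raw.split(",")
    if len(parts) != 2:
        return None
    try:
        value = float(parts[1])
    except ValueError:
        return None
    return {"sensor": parts[0].strip(), "value": value}


def manage(record):
    """Manage layer (hypothetical): tag each record with a retention class."""
    return {**record, "retention": "hot"}


def run_pipeline(raw_lines):
    collected, refined, managed = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=stage, args=(refine, collected, refined)),
               threading.Thread(target=stage, args=(manage, refined, managed))]
    for t in threads:
        t.start()
    for line in raw_lines:  # collect layer
        collected.put(line)
    collected.put(SENTINEL)
    delivered = []
    while (item := managed.get()) is not SENTINEL:  # deliver layer
        delivered.append(item)
    for t in threads:
        t.join()
    return delivered


print(run_pipeline(["s1,3.5", "garbage", "s2,4.0"]))
```

Because the layers share nothing but queues, the refine stage could be swapped for a different parser, or run as several workers, without changing collect, manage, or deliver.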
39. Questions
Why an integrated product rather than alternatives like an RT streaming engine or a streaming SQL database?
What do you do at the metadata layer to expose data: is this a message, a table, or both?
What mechanisms does it use to scale?
How does one deploy the user interface portion of an application?
What happens if there’s a reader/writer lag or failure?
How do you handle recovery in the event of a stream failure (one stream, correlated streams)?
Can you persist data that you calculate and display, and if so, how?
What types of streaming functions do you support (e.g., windows: sliding/jumping time, count, time series alignment)?
How complex a calculation can you create?
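The window types named in the questions above differ in how they group events. A minimal Python sketch of three of them, assuming events arrive as time-ordered `(timestamp_s, value)` pairs and reading "jumping" as non-overlapping (tumbling) windows; time series alignment is not shown, and none of this describes how WebAction implements windows.

```python
from collections import deque


def jumping_time_windows(events, size_s):
    """Jumping (tumbling) windows: non-overlapping buckets [k*size, (k+1)*size)."""
    buckets = {}
    for ts, value in events:
        buckets.setdefault(int(ts // size_s), []).append(value)
    return [(k * size_s, vals) for k, vals in sorted(buckets.items())]


def sliding_time_window_sums(events, size_s):
    """For each event, the sum of values in the trailing window (ts - size_s, ts]."""
    window, total, out = deque(), 0.0, []
    for ts, value in events:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - size_s:  # evict events older than the window
            total -= window.popleft()[1]
        out.append((ts, total))
    return out


def count_windows(values, n):
    """Sliding count window: every run of n consecutive values."""
    window, out = deque(maxlen=n), []
    for v in values:
        window.append(v)
        if len(window) == n:
            out.append(list(window))
    return out


events = [(0, 1), (20, 1), (45, 1), (70, 1), (130, 1)]
print(jumping_time_windows(events, 60))      # [(0, [1, 1, 1]), (60, [1]), (120, [1])]
print(sliding_time_window_sums(events, 60))  # [(0, 1.0), (20, 2.0), (45, 3.0), (70, 3.0), (130, 1.0)]
print(count_windows([1, 2, 3, 4], 3))        # [[1, 2, 3], [2, 3, 4]]
```

A jumping window emits one result per bucket, while a sliding window emits one per event; that difference drives both the latency and the memory an engine needs.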