Streaming Analytics - Comparison of Open Source Frameworks and Products

Stream Processing is a concept used to create a high-performance system for rapidly building applications that analyze and act on real-time streaming data. Benefits, amongst others, are faster processing and reaction to real-time complex event streams and the flexibility to quickly adapt to changing business and analytic needs. Big data, cloud, mobile and internet of things are the major drivers for stream processing and streaming analytics.

This session discusses the technical concepts of stream processing and how it is related to big data, mobile, cloud and internet of things. Different use cases such as predictive fault management or fraud detection are used to show and compare alternative frameworks and products for stream processing and streaming analytics.

The audience will understand when to use open source frameworks such as Apache Storm, Apache Spark or Esper, and when to use powerful engines from software vendors such as IBM InfoSphere Streams or TIBCO StreamBase. Live demos will give the audience a good feel for how to use these frameworks and tools.

The session will also discuss how stream processing is related to Hadoop and to statistical analysis with software such as SAS, Apache Spark’s MLlib or the R language.

Published in: Technology

  1. 1. Fast Data and Streaming Analytics in the Era of Hadoop, R and Apache Spark Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing: Please connect!
  2. 2. Key Messages – Streaming Analytics processes Data while it is in Motion! – Automation and Proactive Human Interaction are BOTH needed! – Time to Market is the Key Requirement for most Use Cases!
  3. 3. Agenda – Real World Use Cases – Introduction to Stream Processing – Market Overview – Relation to other Big Data Components
  4. 4. Agenda – Real World Use Cases – Introduction to Stream Processing – Market Overview – Relation to other Big Data Components
  5. 5. © Copyright 2015 TIBCO Software Inc. Find and Act on “Critical Business Moments” “Business Moments” occur in Every Facet of Enterprise Operations, they drive competitive differentiation, customer satisfaction and business success! Optimize Pricing Identify fraud Make cross-sell offers Restock inventory Reroute trucks Deliver proactive customer service Predict equipment failure & fix proactively Anticipate and handle disruptions
  6. 6. Operational Intelligence in Action © Copyright 2000-2015 TIBCO Software Inc. Actions by Operations: Human decisions in real time informed by up-to-date information. The Challenge: Empower operations staff to see and seize key business moments. Machine-to-Machine Automation: Automated action based on models of history combined with live context and business rules. The Challenge: Create, understand, and deploy algorithms & rules that automate key business reactions.
  7. 7. 7 Success Story © Copyright 2000-2015 TIBCO Software Inc. Predictive Fault Management
  8. 8. © Copyright 2000-2013 TIBCO Software Inc. “An outage on one well can cost $10M per hour. We have 20-100 outages per year.“ - Drilling operations VP, major oil company
  9. 9. Predictive Analytics (Fault Management): Electric Submersible Pumps (ESP). Data Monitoring • Motor temperature • Motor vibration • Current • Intake pressure • Intake temperature • Flow. Diagram labels: Electrical power cable, Pump, Intake, Protector, ESP motor, Pump monitoring unit
  10. 10. Voltage Temperature Vibration Device history Temporal analytic: “If vibration spike is followed by temp spike then voltage spike [within 12 minutes] then flag high severity alert.” Predictive Analytics (Fault Management)
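The temporal rule on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the StreamBase implementation behind the success story: the `Spike` record, the device names and the reading of "within 12 minutes" as "the whole sequence completes within 12 minutes of the vibration spike" are assumptions made for the example.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 12 * 60  # "within 12 minutes" from the slide
SEQUENCE = ("vibration", "temperature", "voltage")  # spike order from the slide


@dataclass
class Spike:
    device: str       # hypothetical device identifier
    kind: str         # "vibration", "temperature" or "voltage"
    timestamp: float  # seconds since an arbitrary start


def detect_high_severity(spikes):
    """Yield (device, start, end) when the three spikes occur in order within the window."""
    progress = {}  # device -> timestamps of the spikes matched so far
    for spike in sorted(spikes, key=lambda s: s.timestamp):
        matched = progress.setdefault(spike.device, [])
        # Forget a partial match whose first spike is older than the window.
        if matched and spike.timestamp - matched[0] > WINDOW_SECONDS:
            matched.clear()
        if spike.kind == SEQUENCE[len(matched)]:
            matched.append(spike.timestamp)
        elif spike.kind == SEQUENCE[0]:
            # A fresh vibration spike restarts the pattern.
            matched[:] = [spike.timestamp]
        if len(matched) == len(SEQUENCE):
            yield (spike.device, matched[0], matched[-1])
            matched.clear()
```

A streaming product would evaluate the same pattern incrementally as events arrive; the sketch sorts a finite list only to keep the example self-contained.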
  11. 11. Live Surveillance of Equipment © Copyright 2000-2014 TIBCO Software Inc. Continuous, live geospatial display of pump health and predictive signal breaches Alerts based on predictive signals Compare live readings and signals to historical average and means Continuous, live visualization of stats for hundreds of wells
  12. 12. 12 Success Story © Copyright 2000-2015 TIBCO Software Inc. Smart Manufacturing
  13. 13. IoT for High Tech Manufacturing Yield Optimization © Copyright 2000-2014 TIBCO Software Inc. • Before: Solar Panel Manufacturer with No Unified View of Manufacturing Process – Multiple manufacturing facilities, multiple processes – no way to compare production to yield expectations • Negative Consequences: Sub-Optimal Production – Operations are sub-optimal: high tolerance leads to better yield but less output; tight tolerance means high throughput but lower yield • Business Outcome: Higher Yield and More Runs – Process Manufacturing can run tighter tolerances and adjust them mid-run, predicting yield and adjusting to changing variables – Systems proactively re-route high-value customers around affected network areas in real-time • How We Do It: The TIBCO Fast Data Platform – IoT, Spotfire, StreamBase, and TERR for predictive modeling, high-speed network by TIBCO “For every 1% increase in shipped product, we make $11MM in profit. The demand is there, we just need to fulfill it.” - Head of Quality, Solar Panel Manufacturer
  14. 14. High Tech Manufacturing Yield Optimization © Copyright 2000-2014 TIBCO Software Inc. Live streaming datamart analysis Continuous update and exploration of top yield metrics; take action
  15. 15. High Tech Manufacturing Yield Optimization © Copyright 2000-2014 TIBCO Software Inc. Continuously computed real-time analytics on streams by StreamBase (thresholds, min / max, average) Analysis, alerts and triggers are based on streaming analytics
  16. 16. High Tech Manufacturing Yield Optimization © Copyright 2000-2014 TIBCO Software Inc. Manufacturing operations staff drill down on any machine, any time, to inspect and fix problems before they impact yield
  17. 17. 17 Success Story © Copyright 2000-2015 TIBCO Software Inc. Crowd Management
  18. 18. 18 © Copyright 2000-2015 TIBCO Software Inc. Crowd Management (Stadium, Airport, Conference, …)
  19. 19. Sacramento Kings: World’s Smartest Building © Copyright 2000-2015 TIBCO Software Inc.
  27. 27. 27 Success Story © Copyright 2000-2015 TIBCO Software Inc. Retailing in the 21st Century
  28. 28. Challenges of the 21st Century Retailer • Retailing and Retail Challenges are changing • Consumers expect better and integrated customer experience across all channels – Rapid adoption of mobile is a major driver – Customers want an integrated service across physical and digital channels… Simultaneously – Customer experience is becoming one of the main differentiators • Real-Time, one-on-one marketing can: – Improve a retailer’s relevance with the customer – Increase customer wallet-share • Key to being able to achieve this is: – Identifying and knowing your customer, in depth in real-time – Understanding the opportunity their past behavior reveals – Understanding your inventory (availability, velocity, pipeline)
  29. 29. 29 © Copyright 2000-2014 TIBCO Software Inc. All Customers are different… Treat them that way… Capture – Engage – Expand – Monetize. Patterns – Real time. MORE PERSONAL, MORE CONTEXT: social, CRM, POS, mobile, web, e-mails
  30. 30. National Retailer Loyalty 2015 © Copyright 2000-2015 TIBCO Software Inc. Top Benefits • Smart cross-selling based on iBeacons • Location-based services in real time • Leveraging partner offerings
  31. 31. New Real-Time Fraud Detection Based on Deep Historical Insight Real-time fraud action can be taken based on historical insight – system not “whiplashed” by real-time events Streaming Analytics for Gift Card Fraud Protection
  32. 32. 32 © Copyright 2000-2015 TIBCO Software Inc. Internet of Things Hybrid Stores Smart Tags Smart Shelves Smart Warehouse Faster Delivery Buy Online Pickup at Store Same Day Delivery Omni Channel 2.0 Store Fulfillment Social Media Predictive Shopping National Retailer Loyalty 2018
  33. 33. 33 Great success stories, but … © Copyright 2000-2015 TIBCO Software Inc. … how to realize these use cases?
  34. 34. 34 © Copyright 2000-2014 TIBCO Software Inc. Real Time Close Loop Model Develop model Deploy into Stream Processing flow Act Automatically monitor real-time transactions Automatically trigger action Analyze Analyze data via Data Discovery Uncover patterns, trends, correlations
  35. 35. Agenda – Real World Use Cases – Introduction to Stream Processing – Market Overview – Relation to other Big Data Components
  36. 36. Traditional Data Processing: Challenges • Introduces too much “decision latency” into the business. • Responses are delivered “after-the-fact”. • Maximum value of the identified situation is lost. – Cross-sell / up-sell opportunities are lost, impending equipment failure is missed, business processes are slow to respond and lack timely context. • Decisions are made on old and stale data. © Copyright 2000-2015 TIBCO Software Inc. Store Analyze Act
  37. 37. The New Era: Fast Data Processing • Events are analyzed and processed in real-time as they arrive. • Decisions are timely, contextual, and based on fresh data. • Decision latency is eliminated, resulting in: – Superior Customer Experience – Operational Excellence – Instant Awareness and Timely Decisions © Copyright 2000-2015 TIBCO Software Inc. Act Analyze Store
  38. 38. Streaming Analytics © Copyright 2000-2015 TIBCO Software Inc. time 1 2 3 4 5 6 7 8 9 Event Streams • Continuous Queries • Sliding Windows • Filter • Aggregation • Correlation • …
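To make "continuous query" and "sliding window" concrete, here is a minimal Python sketch of a time-based sliding window whose aggregate is re-evaluated on every incoming event. The class name and the choice of an average as the aggregate are assumptions for illustration; real engines express the same idea declaratively, often in an SQL-like language.

```python
from collections import deque


class SlidingWindowAverage:
    """Time-based sliding window that updates its aggregate on every event."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value), oldest first
        self.total = 0.0

    def on_event(self, timestamp, value):
        """Add one event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have slid out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)
```

The query never "finishes": each call to `on_event` yields a fresh result, which is the difference from a store-then-analyze batch query. Filtering or correlation would be further steps applied to the same event stream.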
  39. 39. 39 Act while data is in motion! Time Business Value Business Event Data Ready for Analysis Analysis Completed Decision Made $$$$ $$$ $$ $ Action Taken Stream Processing speeds action and increases business value by seizing opportunities while they matter
  40. 40. Operational Analytics Operations Live UI SENSOR DATA TRANSACTIONS MESSAGE BUS MACHINE DATA SOCIAL DATA Streaming Analytics Action Aggregate Rules Stream Processing Analytics Correlate Live Datamart Continuous query processing Alerts Manual action, escalation HISTORICAL ANALYSIS MS Excel SAS Data Scientists Cleansed Data History Data Discovery R Enterprise Service Bus ERP MDM DB WMS SOA BIG DATA Data Warehouse, Hadoop Internal Data Integration Bus API Event Server Streaming Analytics Reference Architecture Spark
  41. 41. Agenda – Real World Use Cases – Introduction to Stream Processing – Market Overview – Relation to other Big Data Components
  42. 42. Operational Analytics Operations Live UI SENSOR DATA TRANSACTIONS MESSAGE BUS MACHINE DATA SOCIAL DATA Streaming Analytics Action Aggregate Rules Stream Processing Analytics Correlate Live Datamart Continuous query processing Alerts Manual action, escalation HISTORICAL ANALYSIS MS Excel SAS Data Scientists Cleansed Data History Data Discovery R Enterprise Service Bus ERP MDM DB WMS SOA BIG DATA Data Warehouse, Hadoop Internal Data Integration Bus API Event Server Streaming Analytics Reference Architecture Spark
  43. 43. 44 Alternatives for Stream Processing Time to Market (Slow to Fast): Streaming Concepts, Streaming Frameworks, Streaming Products. Frameworks include Concepts; Products include Frameworks. © Copyright 2000-2015 TIBCO Software Inc.
  44. 44. Concepts (Continuous Queries, Sliding Windows) Patterns (Counting, Sequencing, Tracking, Trends) Build everything by yourself!  45 What Streaming Alternative do you need? Time to Market Streaming Frameworks Streaming Products Slow Fast Streaming Concepts © Copyright 2000-2015 TIBCO Software Inc.
  45. 45. 46 Usually not an option ... © Copyright 2000-2015 TIBCO Software Inc. … as there are a lot of Frameworks and Products available!
  46. 46. 47 Alternatives © Copyright 2000-2015 TIBCO Software Inc. OPEN SOURCE CLOSED SOURCE PRODUCT FRAMEWORK (no complete list!)
  47. 47. Library (Java, .NET, Python) Query Language (often similar to SQL) Scalability (horizontal and vertical, fail over) Connectivity (technologies, markets, products) Operators (Filter, Sort, Aggregate) 48 What Streaming Alternative do you need? Time to Market Streaming Frameworks Streaming Products Slow Fast Streaming Concepts © Copyright 2000-2015 TIBCO Software Inc.
  48. 48. 49 Apache Storm © Copyright 2000-2015 TIBCO Software Inc. Spout Bolt
  49. 49. 50 Apache Storm – Hello World © Copyright 2000-2015 TIBCO Software Inc. http://wpcertification.blogspot.ch/2014/02/helloworld-apache-storm-word-counter.html
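The spout and bolt roles can be illustrated without a Storm cluster. The sketch below chains plain Python generators into a word counter, which is what the linked Hello World appears to build judging by its URL. It does not use the Storm API; the function names are hypothetical, and a real topology would run its spouts and bolts in parallel across worker processes.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one tuple per sentence."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: consumes sentences and emits one lower-cased word per tuple."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream):
    """Bolt: keeps a running count and emits (word, count) after every word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# Wiring spout -> bolt -> bolt plays the role of the topology definition.
topology = count_bolt(split_bolt(sentence_spout(["hello storm", "hello world"])))
results = list(topology)
```

Each stage sees one tuple at a time and emits results immediately, which is the per-event processing model that distinguishes Storm from micro-batch systems.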
  50. 50. 51 Amazon Kinesis © Copyright 2000-2015 TIBCO Software Inc. https://aws.amazon.com/kinesis/ AWS S3 RedShift DynamoDB
  51. 51. 52 Amazon Kinesis – Hello World © Copyright 2000-2015 TIBCO Software Inc.
  52. 52. 53 Amazon Kinesis – The Cloud ... © Copyright 2000-2015 TIBCO Software Inc. … is easy to set up and scale! But you do not have full control: • Any data that is older than 24 hours is automatically deleted • Every Kinesis application consists of just one procedure, so you can’t use Kinesis to perform complex stream processing unless you connect multiple applications • Kinesis can only support a maximum size of 50KB for each data item http://diamondstream.com/amazon-kinesis-big-real-time-data-processing-solution/ (blog post from 2014, might be outdated, but shows that you do not have full control over a cloud service)
  53. 53. 54 Apache Spark © Copyright 2000-2015 TIBCO Software Inc. General Data-processing Framework. However, focus is especially on Analytics (these days) http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
  54. 54. 55 Apache Spark – Focus on Analytics © Copyright 2000-2015 TIBCO Software Inc. http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/ http://fortune.com/2015/09/09/cloudera-spark-mapreduce/ http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/ http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/ “[IBM’s initiatives] include: • deepening the integration between Apache Spark and existing IBM products like the Watson Health Cloud; • open sourcing IBM’s existing SystemML machine learning technology;
  55. 55. 56 Spark Streaming © Copyright 2000-2015 TIBCO Software Inc. Spark Streaming • is not a true streaming solution • uses micro-batches • cannot process data in real time (i.e. no ultra-low latency) • allows easy combination with other Spark components (SQL, Machine Learning, etc.)
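Why micro-batching rules out ultra-low latency can be shown with a small model in plain Python. This is not the Spark Streaming API: the function names, the millisecond timestamps and the assumption that a batch is processed exactly when its interval closes are all simplifications for the example.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = {}
    for timestamp, value in events:
        index = int(timestamp // batch_interval)
        batches.setdefault(index, []).append((timestamp, value))
    for index in sorted(batches):
        # A batch is only processed once its interval has closed.
        processed_at = (index + 1) * batch_interval
        yield processed_at, batches[index]


def worst_case_latency(events, batch_interval):
    """Longest wait between an event's arrival and the processing of its batch."""
    return max(
        processed_at - timestamp
        for processed_at, batch in micro_batches(events, batch_interval)
        for timestamp, _ in batch
    )
```

With a one-second batch interval, an event that arrives just after a batch opens waits almost the full second before anything can react to it, no matter how fast the processing itself is. An event-at-a-time engine has no such floor.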
  56. 56. 57 Apache Spark – Hello World © Copyright 2000-2015 TIBCO Software Inc. Spark Streaming API Spark Core API
  57. 57. 58 Alternatives © Copyright 2000-2015 TIBCO Software Inc. OPEN SOURCE CLOSED SOURCE PRODUCT FRAMEWORK (no complete list!)
  58. 58. Visual IDE (Dev, Test, Debug) Simulation (Feed Testing, Test Generation) Live UI (monitoring, proactive interaction) Maturity (24h support, consulting) Integration (ootb integration: ESB, MDM, etc.) Library (Java, .NET, Python) Query Language (often similar to SQL) Scalability (horizontal and vertical, fail over) Connectivity (technologies, markets, products) Operators (Filter, Sort, Aggregate) What Streaming Alternative do you need? Time to Market Streaming Frameworks Streaming Products Slow Fast Streaming Concepts
  59. 59. 60 IBM InfoSphere Streams © Copyright 2000-2015 TIBCO Software Inc.
  60. 60. 61 IBM InfoSphere Streams © Copyright 2000-2015 TIBCO Software Inc. https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
  61. 61. TIBCO StreamBase • Performance: Latency, Throughput, Scalability – Multi-threaded and clustered server from version 1 – High throughput: Millions of messages, 100,000s of quotes, 10,000s of orders – Low-latency: microsecond latency for algo trading, pre-trade risk, market data • Take Advantage of High Performance Hardware – Multicore (12, 24, 32 core) large memory (10s of gigabytes) – 64-bit Linux, Windows, Solaris deployment – Hardware acceleration (GPU, Solace, Tervela) • Enterprise Deployment – High availability and fault tolerance – Distributed state management for large data sets – Management and monitoring tools – Security and entitlements Integration – Continuous deployment and QA Process Support StreamSQL compiler and static optimizer In process, in thread adapter architecture Visual parallelism and scaling ActiveSpaces integration for distributed shared state Data parallelism and dispatch StreamBase Server Innovations “The StreamBase engine is for real. We couldn’t break it, and believe me, I tried” SVP Development, Top 5 Broker Dealer
  62. 62. StreamBase: The Power of Visual Programming © Copyright 2000-2015 TIBCO Software Inc. 1) Get ideas into market in days or weeks, not months or years 2) Unlock the power of IT and data scientists working together
  63. 63. 64 © Copyright 2000-2013 TIBCO Software Inc. Code Anyone Can Read Limit Gift Card Activation Amounts at One Location Aggregate Capture card activations per location Sales too high! Log to any database No Fraud Sales too high?
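The visual flow on this slide (aggregate activations per location, then branch on "sales too high?") corresponds roughly to the following Python sketch. The limit value, the location names and the omission of a time window are assumptions for illustration; the slide does not state the actual threshold or window used.

```python
from collections import defaultdict

ACTIVATION_LIMIT = 500.00  # hypothetical per-location limit, not from the slide


def check_activations(activations, limit=ACTIVATION_LIMIT):
    """Aggregate gift card activations per location and classify each event."""
    totals = defaultdict(float)
    for location, amount in activations:
        totals[location] += amount
        if totals[location] > limit:
            # In the slide's flow, this branch is logged to a database.
            yield ("sales too high", location, totals[location])
        else:
            yield ("no fraud", location, totals[location])
```

A production rule would normally reset or window the totals (for example per hour), so that a busy but legitimate store is not flagged forever.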
  64. 64. Visual Debugger Feed Simulation Unit Testing “StreamBase’s modeling tools are easy to use and will enable the exchange to quickly react to the ever changing needs of our customers.” Steve Goldman, Director of Enterprise Architecture StreamBase Development Studio
  65. 65. Live Datamart Continuous Query Processor Alerts BusinessEvents FTL EMS ActiveSpaces Live Datamart BusinessWorks Social Media Data Market Data Sensor Data Historical Data ActiveSpaces Datagrid Enterprise dataMarket Data IoT Mobile Social LiveView Desktop Command & Control ACTION Continuous Query
  66. 66. 67 Dynamic aggregation Live visualization Ad-hoc continuous query Alerts Action LiveView Desktop
  67. 67. Live Datamart Clients and APIs • Rich Desktop Client – Drag&Drop, no coding • Rich Web Client – Drag&Drop, no coding • HTML5 and Javascript API – D3, jQuery, ExtJS, Google Charts, Bing, AngularJS • .NET API – For custom .NET development • Java API – For custom Java GUI development • Combination – Rich Client + HTML5 Extensions
  68. 68. Predictive Sensor Analytics Live Demo (Stream Processing)
  69. 69. 70 Spoilt for Choice – Which one to choose? © Copyright 2000-2015 TIBCO Software Inc. What are the key aspects?
  70. 70. 71 What do you need (out-of-the-box)? © Copyright 2000-2015 TIBCO Software Inc. • A stream processing programming language for streaming analytics • Visual development and debugging instead of coding • Out-of-the-box connectivity to streaming and historical data sources • Performance (real-time vs. micro-batches) • Automated monitoring and alerts • Live UI for proactive human interaction • Maturity and proven deployments • Fault tolerance • Commercial support • Professional services and training
  71. 71. 72 Spoilt for Choice – Framework or Product? © Copyright 2000-2015 TIBCO Software Inc. Does it make sense to combine both?
  72. 72. Example: Apache Storm + TIBCO Live Datamart External Data Snapshot Results Continuous Query Processor Query TIBCO Live Datamart Continuous Alerting Active Tables Active Tables Continuous Updates Clients Message Bus Public Data Customer Data StreamBase Bolt StreamBase Spout Operational Data StreamBase Bolt and Spout connect Apache Storm to StreamBase to provide real-time analytics on operational data
  73. 73. Agenda – Real World Use Cases – Introduction to Stream Processing – Market Overview – Relation to other Big Data Components
  74. 74. Operational Analytics Operations Live UI SENSOR DATA TRANSACTIONS MESSAGE BUS MACHINE DATA SOCIAL DATA Streaming Analytics Action Aggregate Rules Stream Processing Analytics Correlate Live Datamart Continuous query processing Alerts Manual action, escalation HISTORICAL ANALYSIS MS Excel SAS Data Scientists Cleansed Data History Data Discovery R Enterprise Service Bus ERP MDM DB WMS SOA BIG DATA Data Warehouse, Hadoop Internal Data Integration Bus API Event Server Streaming Analytics Reference Architecture Spark
  75. 75. 76 © Copyright 2000-2014 TIBCO Software Inc. Real Time Close Loop Model Develop model Deploy into Stream Processing flow Act Automatically monitor real-time transactions Automatically trigger action Analyze Analyze data via Data Discovery Uncover patterns, trends, correlations
  76. 76. Real Time Close Loop: Understand – Anticipate – Act Big Data • store everything in Hadoop, DWH, NoSQL, etc. • even without structure • even if you do not need it today http://blogs.teradata.com/international/tag/hadoop/
  77. 77. Real Time Close Loop: Understand – Anticipate – Act Data Discovery + Statistics + Machine Learning to find insights and patterns in historical data
  78. 78. Real Time Close Loop: Understand – Anticipate – Act Streaming Analytics to operationalize insights and patterns in real time Stream Processing Hadoop Open Source R TERR SAS MATLAB In-database analytics Spark
  79. 79. R with Revolution Analytics (now Microsoft) © Copyright 2000-2015 TIBCO Software Inc. Open Source GPL License http://www.revolutionanalytics.com/webinars/introducing-revolution-r-open-enhanced-open-source-r-distribution-revolution-analytics
  80. 80. R with TIBCO Enterprise Runtime for R (TERR) TIBCO TERR delivers production-grade R analytics to enterprises • Flexibility & analytic power of R language • Time-to-market agility • Enterprise-grade platform • A TIBCO licensed & supported product • Not GPL, not a repackaging of the Open source R engine • Deployment in TIBCO products and 3rd party applications (e.g. Hadoop) http://spotfire.tibco.com/discover-spotfire/what-does-spotfire-do/predictive-analytics/tibco-enterprise-runtime-for-r-terr
  81. 81. Use Open Source R or Not? © Copyright 2000-2015 TIBCO Software Inc. http://www.forbes.com/sites/danwoods/2015/01/27/microsofts-revolution-analytics-acquisition-is-the-wrong-way-to-embrace-r/
  82. 82. Spark MLlib © Copyright 2000-2015 TIBCO Software Inc. MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs. You can even combine the MLlib module with the R language
  83. 83. Predictive Sensor Analytics Live Demo (Data Discovery, Statistics)
  84. 84. © Copyright 2000-2013 TIBCO Software Inc. 80% of betting happens AFTER the game begins TODAY
  85. 85. Case Study: Streaming Analytics for Betting • Situation: Today, 80% of Betting is Done After the Game Starts • It’s not your father’s bookie anymore! • Problem: How to Analyze Big Betting Data? • Thousands of concurrent games, constantly adjusting odds, dozens of betting networks – firms must correlate millions of events a day to find the best betting opportunities in real-time • Solution: TIBCO for Fast Data Architecture • TXOdds uses TIBCO to correlate, aggregate, and analyze large volumes of streaming betting data in real-time and publish innovative predictive betting analytics to their customers • Result: TXOdds First to Market with Innovative Zero Latency Betting Analytics • Innovative real-time analytics give players who can process electronic data in real-time the edge “With StreamBase, in two months we had our first betting analytics feed live, and we continually deploy new ideas and evolve our old ones.” - Alex Kozlenkov, VP of technology, TXOdds
  86. 86. 87 “WHEN 5 KEY BOOKIES RAISE THE SAME ODDS IN A 5-SECOND WINDOW, BET LESS”
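The rule quoted on this slide is a correlation over a short sliding window. A minimal Python sketch follows, assuming each input is a time-ordered `(timestamp, bookie, market)` tuple that represents one odds raise; the tuple layout and the function name are hypothetical and not taken from the TXOdds system.

```python
from collections import deque


def bet_less_signals(odds_raises, bookies_needed=5, window_seconds=5):
    """Yield (timestamp, market) when enough distinct bookies raised odds in the window."""
    recent = {}  # market -> deque of (timestamp, bookie), oldest first
    for timestamp, bookie, market in odds_raises:
        window = recent.setdefault(market, deque())
        window.append((timestamp, bookie))
        # Drop raises that are older than the window.
        while window and window[0][0] < timestamp - window_seconds:
            window.popleft()
        if len({b for _, b in window}) >= bookies_needed:
            yield timestamp, market
```

Counting distinct bookies rather than raw events matters here: five raises by the same bookie should not trigger the signal.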
  87. 87. 88 “WHEN THE REAL-TIME ODDS ARE 5% GREATER THAN THE HISTORICAL SPREAD, INCREASE MY BET”
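This rule combines a live value with historical context. The sketch below reads "historical spread" as the mean of past odds for the same market, which is an assumption; the slide does not define the term, and the function name is hypothetical.

```python
def should_increase_bet(live_odds, historical_odds, threshold=0.05):
    """Return True when live odds exceed the historical mean by more than the threshold."""
    baseline = sum(historical_odds) / len(historical_odds)
    return live_odds > baseline * (1 + threshold)
```

In the architecture described in this deck, the baseline would come from the historical store (for example Hadoop) and be cached close to the stream processor, so the comparison itself stays cheap enough to run on every odds update.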
  88. 88. Reference Architecture: Streaming Betting Analytics Event Processing MONITOR REAL-TIME ANALYTICS AGGREGATE HISTORICAL COMPARISON Predictive odds analytics Zero Latency Betting Analytics GLOBAL, DISTRIBUTED INFRASTRUCTURE Historical odds deviations BUS BETTING LINES SCORES NEWS HADOOP Context: Historical Betting Data, Odds, Outcomes BUS CACHE CACHE CACHE Real-Time Analytics CORRELATE StreamBase LiveView SOCIAL
  89. 89. Twitter (#TomBradyBrokenLeg) Twitter (#Boston) Brady’s Stats Actionable Insights Real-Time Social Media Analytics Twitter (#NFL) Something relevant happening? Every minute counts! Change Odds (automated or manually triggered): • Stop live-betting for the currently running game? • How many interceptions will the Quarterback throw? • Will the Patriots win the Super Bowl? • …
  90. 90. 91 Real-Time Social Sentiment Analysis
  91. 91. Did you get the Key Message?
  92. 92. – Streaming Analytics processes Data while it is in Motion! – Automation and Proactive Human Interaction are BOTH needed! – Time to Market is the Key Requirement for most Use Cases! Key Messages
  93. 93. Questions? Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing: Please connect!
