WEBINAR
Apache Spark Empowering the Real-Time
Data Driven Enterprise
October 13, 2017
Mike Gualtieri, VP & Principal Analyst, Forrester (Twitter: mgualtieri)
Anand Venugopal, Product Head & AVP, StreamAnalytix (Twitter: streamanalytix)
Our Agenda
• Business Value of Streaming Analytics
• Use Cases / Architecture
• Streaming Analytics Platform Criteria
• Spark as a Streaming Technology
• Introducing StreamAnalytix - Visual Spark Studio
• Success Stories and Demo
• Q & A
Mission critical
technology solutions
since 1996
Fortune 500:
Big Data clients
1700 people; US,
India, global reach
Unique mix of
Big Data products
and services
About Impetus
— Mike Gualtieri, VP & Principal Analyst
The Real-Time Enterprise with Apache Spark
Twitter: @mgualtieri | Linkedin: mgualtieri
#Priority
© 2017 Forrester Research, Inc. Reproduction Prohibited
Better leverage big data and analytics in business…: 52%
Create a comprehensive strategy for addressing digital…: 53%
Create a comprehensive digital marketing strategy: 53%
Better comply with regulations and requirements: 54%
Improve differentiation in the market: 58%
Increase influence and brand reach in the market: 64%
Address rising customer expectations: 64%
Improve our ability to innovate: 65%
Reduce costs: 66%
Improve our products/services: 73%
Improve the experience of our customers: 75%
• Base: 3,005 global data and analytics decision-makers
• Source: Global Business Technographics Data And Analytics Online Survey 2016
Data and analytics decision-makers are driven by business
priorities
Most firms struggle to analyze data and make
insights actionable in real-time
Real-time means business time
#Business
Is this customer thinking about moving to a
rival firm right now?
What offers should you make to your customer
if they are eCommerce’ing right now?
How can you warn other drivers that the road is
slippery to avoid a crash right now?
What are movers and shakers saying about
equities that we cover right now?
How can you prevent this dude from fleecing
you right now?
How do you detect customer SLA problems
right now?
How can IoT data be used to predict machine
failure right now?
#Analytics
Ideate, Model, Detect, Adapt
Machine Learning, Streaming Analytics, Descriptive Analytics, Prescriptive Analytics
(Real-time Analytics) and (Batch Analytics)
Only the analytical enterprise can compete and win in the
age of the customer
#Data
10-49 terabytes: 5%
50-99 terabytes: 12%
100-500 terabytes: 54%
Greater than 500 terabytes: 29%
Enterprises have plenty of data from both internal and
external sources
Using your best estimate, what is the size of all data
stored within your company?
Source: Forrester Research, September 2015
Base: 100 US Managers and above currently using Hadoop for processing and analyzing data.
Internal business data: 49%
External source data: 51%
What % of the data available is from internal
business applications (ERP and business
applications) versus external sources (social, IoT)?
Data is like a drop of rain
It forms instantaneously in a cloud…
...and travels far before it makes a ripple
#Real-time
All data originates in real-time!
But, analytics to gain insights is usually done
much, much later
#WhyWait
Insights are perishable
Enterprises must act on a range of perishable insights to get
value from data and analytics
Real-time insights: time to act sub-second to seconds; perishability sub-second to seconds
Operational insights: time to act seconds to hours; perishability seconds to hours
Performance insights: time to act days to weeks; perishability hours to weeks
Strategic insights: time to act weeks to years; perishability weeks to years
Example insights and actions:
• Insight: Shopping for furniture. Action: Recommend cleaning supplies
• Insight: Profit lower than goal. Action: Optimize price
• Insight: Demand forecast strong. Action: Increase inventory
• Insight: Furniture demand high. Action: Expand product line
Time To Action
Data originated, analytics performed, insights gleaned, decision made, action taken
Outdated insights, poor decision, impotent or harmful actions
Business value: positive to negative
Most analytics operations are too slow
Chart: Business Value (positive to negative) against Time to Action, marking the Real-time Enterprise
You must compress analytics time-to-insight to maximize
the value of data
Real-time insights: time to act sub-second to seconds; perishability sub-second to seconds
Operational insights: time to act seconds to hours; perishability seconds to hours
Performance insights: time to act days to weeks; perishability hours to weeks
Strategic insights: time to act weeks to years; perishability weeks to years
Streaming analytics
Batch analytics
IoT applications must act on a range of perishable insights
to get value from big data
#Applications
The opportunity to become real-time is high, but
enterprises must redesign applications
Streaming Data
Application Interface
App Logic
Context
Actions
Real-time Context
Programmed Logic
Learned Logic
Machine Learning
Learning
External
Actions
External
Context
From other data
sources of
applications
To other data
sources or
applications
Applications
Modern applications infuse analytics to respond in real-time
and become smarter
Streaming is essential technology to identify
and act on perishable insights
#Streaming
Streaming analytics lets applications sense, think, and act
in real-time
Source: Forrester Research
Streaming analytics is very different from plain vanilla
stream ingestion
Source: Forrester Research
Architecture
• Workload scalability
• Workload latency
• Fault tolerance
• Operational management
Stream/event Handling
• Event sequencing
• Enrichment
Analytical Operators
• Transformation
• Correlation
• Time windows
• Complex event processing
Applications Development
• Development tools
• Data connectors
• Business solution accelerators
• Community innovation
Streaming analytics solutions must be scalable and have
a rich set of stateful analytical operators
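A time window is one of the stateful operators listed above. As a minimal sketch in plain Python (not Spark or StreamAnalytix code; the function name and the (timestamp, key) event shape are assumptions made for illustration), a tumbling window count could look like this:

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) pairs. Both the event
    shape and this function are illustrative assumptions.
    """
    counts = defaultdict(int)  # the state carried from one event to the next
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "login"), (3, "login"), (9, "error"),
          (10, "login"), (14, "error"), (19, "error")]
result = tumbling_window_counts(events, window_seconds=10)
```

Here `result` holds two logins and one error for the window starting at 0, and one login and two errors for the window starting at 10. A production engine would also have to handle late and out-of-order events, which this sketch ignores.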
#Solutions
Historical, Transactions, Customer data, Security
Ability to ingest structured and unstructured
data from multiple sources in real-time
Scale to handle any volume & velocity of data
Process and analyse in real-time
Provide fault-tolerance for mission-critical
applications
Provide tools that make it easy to manage and
monitor the platform and its interaction with
technology components
Offer tools for business users to visualize
insights from real-time data
Capture perishable events and insights
at low latency
Offer sophisticated stateful and stateless
analytics
Leverage existing skills to make it easy for
developers to develop, test and deploy
applications
Hadoop is designed for volume
Spark is designed for speed
Spark and Hadoop often coexist in the same cluster
Hadoop and Spark are friends, but…
…Spark is where developers go to create
real-time enterprises
58,000x
Spark is designed to process in-memory
datasets, but can spool to disk if necessary
Spark’s directed acyclic graph (DAG) engine
optimizes parallelization to dramatically reduce
intermediary data movement
Spark doesn’t need Hadoop; it just needs great compute
and great storage
Spark includes capabilities for streaming analytics and
machine learning!
#Opportunity
Ideate, Model, Detect, Adapt
Machine Learning, Streaming Analytics, Descriptive Analytics, Prescriptive Analytics
(Real-time Analytics) and (Batch Analytics)
Unify batch and streaming analytics to create your
real-time enterprise
#Time
Stop wasting it
Use it to your advantage
Thank you
Mike Gualtieri
mgualtieri@forrester.com
Twitter: @mgualtieri
Real-Time Stream Processing and Machine Learning Platform
ENABLING THE REAL TIME ENTERPRISE
“Impetus has the
opportunity to make
StreamAnalytix the
de facto tooling
standard for Spark
and future streaming
engines…”
Impetus Technologies covers open source bases without the headaches.
Take your pick. Impetus’ StreamAnalytix supports Apache Storm and Apache
Spark and is architecturally positioned to support other open source streaming
analytics software such as Apache Flink.
StreamAnalytix also embeds EsperTech to provide advanced streaming
analytics capabilities such as complex event processing.
What also shines about the StreamAnalytix solution is that it includes
enterprise-grade visual tooling for both development and deployment of
streaming applications.
StreamAnalytix tooling also unifies streaming and batch by supporting arbitrary
Spark jobs such as machine learning.
A Strong Performer in The Forrester Wave™:
Streaming Analytics, Q3 2017
1
Real-Time Streaming
Data Analytics
2
Makes Spark Easy
(Visual Spark Studio)
Not so real-time: SENSE, ANALYZE, ACT in hours/days
Near real-time / real-time: SENSE, ANALYZE, ACT in sec/ms
StreamAnalytix is a platform to build real-time apps
Wherever you are – we can make you faster
• Slow processing jobs: Hadoop MR or other non-big data tech
• Spark batch jobs: faster due to in-memory
• Spark Streaming jobs: faster due to micro batch
• Event stream processing: fastest
Real-time C360 and Churn
Fraud and Anomaly Detection
IoT and Log Analytics
Next Best Offer or Action
Predictive Maintenance
Cyber Security
Real-time Call Center Analytics
Use Cases
Real-time Streaming
Data Analytics
Learning / Training → Real-time + Batch
PMML, H2O, Python – on Spark
Kafka, Storm, Esper
Scoring → Real-time + Batch
Spark Streaming, Spark ML, MLlib
Stack
Real-time Streaming
Data Analytics
Shortage of Spark talent and the urgent need for it
• Spark projects are increasing
• Need to get done quickly, with budget controls
• But, there is a big barrier: Talent - both quality and quantity
• Deep Spark / Scala skills are hard to find
• Big gap between Spark prototype app vs. production grade,
scalable, stable apps that don’t need a lot of baby-sitting
IMPACT
• S…LLL...O..OO...WW
• DIFFICULT
• COSTLY
• RISK RIDDEN
• SPARK PROJECTS
Is the real-time enterprise possible, with Spark use cases taking too long to deliver?
Is the real-time enterprise possible?
SOLUTION
• More people? (They don’t exist yet – just gets more messy and costly)
• Ditch Spark and buy proprietary platforms? ($$$$ - That’s going backwards)
• Just bite the bullet, and delay the project? (Oops!)
• Hire outsourcing companies? (Do they really have more skilled people?)
Is the real-time enterprise possible?
SOLUTION
• Get the right tools
• Make existing people and teams much more productive
The right Spark tool or platform – does this…
Maintain
Deploy
Develop
+ Debug
Monitor
+ Tune
Apps
Ingest
Analytics/
ML
ETL
Visual IDE
Scale
Performance
Data360
Visual Spark IDE – Drag and Drop
Analytics – Feature extract, ML, Time windows
Transform / Enrich – Filter, Blend, Lookup
Streaming, Batch + Oozie Workflow
Load – HDFS, HBase, Hive, Any NoSQL
View – Real-time Dashboards
Ingest – Tables, Files, Kafka, APIs
Visual Spark Studio
User Configurable
Real-time dashboard
Monitoring Spark pipelines
Deployment diagram
1. StreamAnalytix web server (CentOS / RHEL 6.x or above), running the StreamAnalytix web container (Tomcat)
2. Secured communication via Kerberos
3. Standalone Spark cluster or Spark over YARN
4. StreamAnalytix leverages Zookeeper for configuration management
Other components: user, load balancer with sticky session, Hadoop cluster, MySQL/Postgres, RabbitMQ
Overview
Local
Mode
+ StreamAnalytix Spark portion
+ All dependencies
= One Binary
Full
Cluster
Identical user experience for building and managing Spark jobs
Desktop or
Single VM
Go to
“StreamAnalytix.com”
to view demo
and download
Visual Spark Studio
Success Stories
Why improve?
…when you can transform your business
Transforming the Business - means….
• Creating a real-time enterprise
• Dramatic non-linear increase in performance / cost trade off
• Net new capabilities or revenue streams – that were previously not possible
Top airline boosts customer digital experience
• Funnels all app data to enterprise bus and into StreamAnalytix
• Couldn’t handle the volume and velocity of data earlier
• Analytical capacity went from 3 days to 3 months
• Ability to correlate events and see patterns across a larger time window
• Customer experience issues proactively resolved in real-time
• Foundation laid for real-time ML, predictive and prescriptive analytics
StreamAnalytix Pipeline Overview: High Level Solution Architecture
Diagram labels: raw JSON data (multiple apps, multiple services; X service data, all services data); Kafka (data ingestion); StreamAnalytix Spark pipeline (parsing, filtering, emitting); YARN; data search and data querying; UI data diagnostic tool and query results; user
Highlights
• Input data velocity ~7K /sec
• Contributing to ~5 TB /day
• ES Data retention of 30 days
• Custom built Web UI for queries
• StreamAnalytix implementation providing
easy onboarding of additional services
and application logs
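The velocity, daily volume and retention figures above can be cross-checked with a little arithmetic. This is a back-of-envelope sketch only: it assumes "~7K /sec" means about 7,000 events per second sustained all day, that "TB" is a decimal terabyte, and that retained volume equals raw volume (an index may well be larger or smaller). The variable names are illustrative.

```python
EVENTS_PER_SECOND = 7_000      # "~7K /sec" from the highlights
BYTES_PER_DAY = 5 * 10**12     # "~5 TB /day", assuming decimal terabytes
RETENTION_DAYS = 30            # "ES data retention of 30 days"
SECONDS_PER_DAY = 24 * 60 * 60

# Events arriving in one day at the stated rate.
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY

# Average event size implied by the two stated figures.
avg_event_bytes = BYTES_PER_DAY / events_per_day

# Raw volume held at any time if every day's data is kept for the full window.
retained_bytes = BYTES_PER_DAY * RETENTION_DAYS

print(f"{events_per_day:,} events/day")
print(f"{avg_event_bytes:,.0f} bytes/event on average")
print(f"{retained_bytes / 10**12:,.0f} TB retained")
```

Under these assumptions the stated figures imply roughly 600 million events per day, about 8 KB per event, and on the order of 150 TB of raw data inside the retention window.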
Benefits
• Diagnostic ability on a larger range of data
• SLAs unaffected: similar or better
• Improved searching with custom Web UI
• Scalable architecture
• Supporting even larger data sets
Solution
ElasticSearch
• 5X performance gain from the same hardware
• New solution based on StreamAnalytix – costs less
• Can onboard 5 times more application traffic for detecting threats
Major bank - insider threat detection: 5X boost
Data Ingestion
Processing and
Enrichment
Data Sink and
Persistence
Data pipeline – high level processing stages
Pharmacy business processing giant
• Spark-based real-time CDC and flow management
• Sense-change, Ingest, Transform, Load
• 100s of source tables – data from a large number of pharmacies
• Plus some important real-time ETL / Analytics use cases
• Attunity → Kafka → StreamAnalytix / Spark → HDFS, Hive
• 2 mission critical data pipelines delivered in 1 day, 2 days
• “I could hire a 3 person team instead of a 10 person team”
Problem Statement
• Oracle-based transactions → merge to → Hive reporting tables in seconds
ACHIEVEMENT
• Spark pipelines for this task built and deployed in 2 days
• Partner integration with Attunity for CDC
• Consume Oracle multi-table CDC events in real-time
• Capture and reconcile changes into Hive tables
• De-normalize data while landing into Hive
Workflow: Modelled as StreamAnalytix Oozie workflow to
automate execution of Spark pipelines that perform data
de-normalization and incremental updates to Hive
StreamAnalytix Solution
A complete CDC solution has 3 parts. Each aspect of the solution is modelled as a StreamAnalytix pipeline.
1. Data ingestion and staging: stream data from Attunity Replicate for multiple tables from Kafka and store raw data into HDFS
2. Data de-normalization: join transactional data with data at rest and store de-normalized data on HDFS
3. Incremental updates in Hive: merge previously processed transactional data with new incremental updates
Pipeline #1 - Data ingestion and staging (Streaming)
• Data ingestion via Attunity ‘Channel’: reads the data from the Attunity target Kafka. This channel is configured to read data feeds as well as metadata from a separate topic
• Data enrichment: enriches incoming data with metadata information and event timestamp
• HDFS: stores CDC data on HDFS in the landing area using the OOB HDFS emitter. HDFS files are rotated based on time and size configuration
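Rotating files "based on time and size" can be illustrated with a small policy object. This is a plain-Python sketch, not the StreamAnalytix HDFS emitter: the class, function and parameter names are hypothetical, and the rule shown (close the current file once it reaches a size limit or an age limit, whichever comes first) is an assumption about what such a configuration typically means.

```python
from dataclasses import dataclass


@dataclass
class RotationPolicy:
    max_bytes: int            # close the file once it holds this many bytes
    max_age_seconds: float    # ...or once it has been open this long

    def should_rotate(self, current_bytes, opened_at, now):
        too_big = current_bytes >= self.max_bytes
        too_old = (now - opened_at) >= self.max_age_seconds
        return too_big or too_old


def assign_files(records, policy):
    """Group (arrival_time_seconds, payload_bytes) records into rotated files."""
    files, current, size, opened_at = [], [], 0, None
    for arrival, payload in records:
        # Check the policy before writing, so a full or stale file is closed first.
        if current and policy.should_rotate(size, opened_at, arrival):
            files.append(current)
            current, size, opened_at = [], 0, None
        if opened_at is None:
            opened_at = arrival  # a file's age starts at its first record
        current.append(payload)
        size += len(payload)
    if current:
        files.append(current)
    return files


policy = RotationPolicy(max_bytes=10, max_age_seconds=60)
records = [(0, b"aaaa"), (1, b"bbbb"), (2, b"cccc"), (3, b"dd"), (100, b"ee")]
files = assign_files(records, policy)
```

In this run the first file is closed on size after three records, the second on age, and the last record starts a third file.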
Pipeline #2 - Data de-normalization (Batch)
• HDFS data channel: ingests incremental data from previous runs of the staging location (Pipeline #1)
• Reads reference data (data at rest) from a fixed HDFS location
• Performs outer join to merge incremental and static data
• Stores de-normalized data to HDFS directory
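The de-normalization step can be pictured as a join between incremental rows and reference rows. The slide does not say which kind of outer join is used; the sketch below assumes a left outer join that keeps every incremental row, and the column names (`pharmacy_id`, `amount`, `city`) are invented for illustration. It is plain Python, not the Spark pipeline itself.

```python
def left_outer_join(incremental_rows, reference_rows, key):
    """Attach reference columns to each incremental row, or None when unmatched."""
    reference_by_key = {row[key]: row for row in reference_rows}
    reference_columns = sorted(
        {column for row in reference_rows for column in row if column != key}
    )
    joined = []
    for row in incremental_rows:
        match = reference_by_key.get(row[key])
        out = dict(row)
        for column in reference_columns:
            # Unmatched incremental rows are kept, with empty reference columns.
            out[column] = match.get(column) if match else None
        joined.append(out)
    return joined


incremental = [{"pharmacy_id": 1, "amount": 20}, {"pharmacy_id": 3, "amount": 5}]
reference = [{"pharmacy_id": 1, "city": "Austin"}, {"pharmacy_id": 2, "city": "Boston"}]
denormalized = left_outer_join(incremental, reference, key="pharmacy_id")
```

The row for pharmacy 3 has no reference match, so it survives with `city` set to `None` rather than being dropped, which is the property that distinguishes an outer join from an inner join.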
Pipeline #3 - Incremental updates in Hive (Batch)
• Hive SQL query to load a managed table from the HDFS incremental data generated from Pipeline #2
• Reconciliation step: Hive “merge into” SQL performs insert, update and delete operations based on the operation in the incremental data
• Clean-up step: runs a drop table command on the managed table to clean up processed data, so that it doesn’t get repeatedly processed
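The reconciliation logic (insert, update or delete depending on the operation carried by each incremental record) can be sketched in a few lines. This is plain Python over an in-memory dict, not Hive SQL; the change-record shape with `op`, `key` and `row` fields is an assumption made for illustration.

```python
def apply_changes(table, changes):
    """Return a new table with ordered CDC changes applied.

    `table` maps primary key -> row; each change is a dict with
    'op' ('insert', 'update' or 'delete'), 'key', and 'row' for non-deletes.
    """
    merged = dict(table)  # leave the input table untouched
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            merged[key] = change["row"]
        elif op == "delete":
            merged.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


table = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"op": "update", "key": 1, "row": {"name": "a2"}},
    {"op": "delete", "key": 2},
    {"op": "insert", "key": 3, "row": {"name": "c"}},
]
reconciled = apply_changes(table, changes)
```

Changes are applied in order, so a later record for the same key wins. The clean-up step in the slide serves a related purpose: dropping the processed batch prevents the same changes from being applied twice.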
Workflow: Oozie Coordinator Job
Oozie orchestration flow created using StreamAnalytix webstudio:
it orchestrates pipeline #2 and pipeline #3 into a single Oozie flow that
can be scheduled as shown here
“After a long time we now have a new offering we can go sell proudly to our customers”
- Product Manager
• Net new capability for real-time inspection and diagnostics of call quality and customer experience at the contact center
• Dramatically improves end-user service for their B2B customers
Hosted call center adds new premium product / revenue source
Hosted call center
Challenges solved
• Individual events scattered in different media servers
• Needed to filter a lot of noise in the data at the source itself
• Tech support took too long to correlate and solve issues
• Call center manager had no real-time view on IVR operations
• Needed a variety of call center metrics in real-time
Hosted call center solution
Diagram components (packet and circuit connections):
• Internet caller: chat, VOIP, e-mail, collaboration, video
• Wireless caller: live call, IVR, voice mail
• Telephone caller: live call, IVR, voice mail
• Core servers: routing, admin, stats, logging
• Agent servers: agent interaction
• Connection servers: IVR, voice, chat, video, message
• Dialing servers: predictive engine, campaign manager
• Gateways: circuit networks, legacy call centers
• Administrator/supervisor: administration, monitoring, service creation, recording, reports
• Agents: PC agent (softphone), PC agent (IP phone), hybrid agent, phone agents
• Also labelled: public Internet, ACD
• 8000+ agent desktops monitored for unethical behaviour in real-time
• Secures customer information
• Ensures top quality service
• Net new capability they couldn’t get earlier at any reasonable price point
Tier 1 Telco deploys new “agent monitoring system”
Desktop Analytics
Key Business Metrics :
• Average Handling Time
• First Call Resolution
• Sales Close Rate
• Disconnect Save Rate
1yr benefit is $5.41M
in the form of Call Volume Reduction
30 sec AHT reduction for Tech
15 sec AHT reduction for Sales
Desktop analytics – desktop data pipeline
Call
Center
Agent
Machine
Big Data Platform
Desktop Raw data
processing
App activity
aggregation
Event activity
aggregation
System data enrich
and persist
App and Event data
enrich and persist
• Consume raw ACD events
• Parse and split the bulk JSON message into individual events
• Data processing for App, Event, System events
• Aggregate data: mini batching, data sequencing, enrich data with agent hierarchy, aggregate data
• Persist data into Hive, HBase, Elastic
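The parse-and-split step followed by aggregation can be sketched in plain Python. The bulk message layout used here (an `agent_id` plus an `events` array, each event with a `type`) is a hypothetical shape chosen for illustration; the slides do not show the real schema.

```python
import json
from collections import Counter


def split_bulk_message(bulk_message):
    """Parse one bulk JSON message into individual event dicts.

    Assumes a layout of {"agent_id": ..., "events": [...]}, which is
    an illustrative guess rather than the actual message format.
    """
    payload = json.loads(bulk_message)
    # Copy the agent id onto every event so each one stands alone downstream.
    return [{"agent_id": payload["agent_id"], **event} for event in payload["events"]]


def count_by_type(events):
    """Aggregate individual events into a count per event type."""
    return Counter(event["type"] for event in events)


bulk = json.dumps({
    "agent_id": "a1",
    "events": [
        {"type": "app", "name": "crm"},
        {"type": "event", "name": "click"},
        {"type": "app", "name": "mail"},
    ],
})
events = split_bulk_message(bulk)
counts = count_by_type(events)
```

Splitting first and aggregating afterwards keeps each stage simple, which mirrors the separate raw-processing and aggregation boxes in the pipeline.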
Desktop analytics - data volume

Pilot
Source System   Data Type    No. of Agents   Records/Day
Desktop Data    Raw          9               69,461
Desktop Data    Aggregated   9               45,428
Call Data       Raw          7,000           900,000
Call Data       Aggregated   7,000           900,000

GA
Source System   Data Type    No. of Agents   Records/Day
Desktop Data    Raw          7,000           60M
Desktop Data    Aggregated   7,000           20M
Call Data       Raw          7,000           900,000
Call Data       Aggregated   7,000           900,000
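A quick way to read the desktop data volumes is per agent. The sketch below assumes the first table is the pilot (9 agents) and the second is GA (7,000 agents), and that "60M" and "20M" mean 60 and 20 million records; the dictionary layout and function names are illustrative.

```python
desktop_volumes = {
    "pilot": {"agents": 9, "raw": 69_461, "aggregated": 45_428},
    "ga": {"agents": 7_000, "raw": 60_000_000, "aggregated": 20_000_000},
}


def raw_records_per_agent(stage):
    """Raw desktop records per agent per day for the given rollout stage."""
    volume = desktop_volumes[stage]
    return volume["raw"] / volume["agents"]


def aggregation_ratio(stage):
    """Aggregated records as a fraction of raw records."""
    volume = desktop_volumes[stage]
    return volume["aggregated"] / volume["raw"]


for stage in desktop_volumes:
    print(stage, round(raw_records_per_agent(stage)), round(aggregation_ratio(stage), 2))
```

On these assumptions the per-agent load is similar in both stages (roughly 7,700 raw records per agent per day in the pilot and roughly 8,600 at GA), while aggregation reduces the record count more at GA (to about a third) than in the pilot (to about two thirds).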
Thank you.
Questions?
© 2017 Impetus Technologies
Email: inquiry@streamanalytix.com Twitter : @StreamAnalytix

More Related Content

What's hot

Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...SnapLogic
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics Franco Ucci
 
Big Data – A New Testing Challenge
Big Data – A New Testing ChallengeBig Data – A New Testing Challenge
Big Data – A New Testing ChallengeTEST Huddle
 
Splunk for Developers
Splunk for DevelopersSplunk for Developers
Splunk for DevelopersSplunk
 
Data Science in the Cloud with Spark, Zeppelin, and Cloudbreak
Data Science in the Cloud with Spark, Zeppelin, and CloudbreakData Science in the Cloud with Spark, Zeppelin, and Cloudbreak
Data Science in the Cloud with Spark, Zeppelin, and CloudbreakDataWorks Summit
 
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...Codemotion
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunk
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout SessionSplunk
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...SnapLogic
 
Taking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - ArchitectureTaking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - ArchitectureSplunk
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Databricks
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkJerry Wen
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastDatabricks
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for releaseJen Stirrup
 
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoCodecamp Romania
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 

What's hot (20)

DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
OpenPOWER Update
OpenPOWER UpdateOpenPOWER Update
OpenPOWER Update
 
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
Strata + Hadoop World: Jump Into the Data Lake with Hadoop-Scale Data Integra...
 
Stream Analytics
Stream Analytics Stream Analytics
Stream Analytics
 
Big Data – A New Testing Challenge
Big Data – A New Testing ChallengeBig Data – A New Testing Challenge
Big Data – A New Testing Challenge
 
Analyst Toolbox August 2017
Analyst Toolbox August 2017Analyst Toolbox August 2017
Analyst Toolbox August 2017
 
Splunk for Developers
Splunk for DevelopersSplunk for Developers
Splunk for Developers
 
Data Science in the Cloud with Spark, Zeppelin, and Cloudbreak
Data Science in the Cloud with Spark, Zeppelin, and CloudbreakData Science in the Cloud with Spark, Zeppelin, and Cloudbreak
Data Science in the Cloud with Spark, Zeppelin, and Cloudbreak
 
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...
 
SplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojoSplunkLive! London: Splunk ninjas- new features and search dojo
SplunkLive! London: Splunk ninjas- new features and search dojo
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout Session
 
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
Weathering the Data Storm – How SnapLogic and AWS Deliver Analytics in the Cl...
 
Taking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - ArchitectureTaking Splunk to the Next Level - Architecture
Taking Splunk to the Next Level - Architecture
 
Oracle Analytics Cloud
Oracle Analytics CloudOracle Analytics Cloud
Oracle Analytics Cloud
 
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with Da...
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of spark
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
 
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 

Similar to Apache spark empowering the real time data driven enterprise - StreamAnalytix webinar

Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
Streaming analytics webinar | 9.13.16 | Guest: Mike Gualtieri from Forrester
Streaming analytics webinar | 9.13.16 | Guest: Mike Gualtieri from ForresterStreaming analytics webinar | 9.13.16 | Guest: Mike Gualtieri from Forrester
Streaming analytics webinar | 9.13.16 | Guest: Mike Gualtieri from ForresterCubic Corporation
 
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarFuture-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarImpetus Technologies
 
IW14 Session: Mike Gualtieri, Forrester Research
IW14 Session: Mike Gualtieri, Forrester ResearchIW14 Session: Mike Gualtieri, Forrester Research
IW14 Session: Mike Gualtieri, Forrester ResearchSoftware AG
 
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAWS Initiate Day Manchester 2019 – AWS Big Data Meets AI
AWS Initiate Day Manchester 2019 – AWS Big Data Meets AIAmazon Web Services
 
Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...Moving from data to insights: How to effectively drive business decisions & g...
Moving from data to insights: How to effectively drive business decisions & g...Cloudera, Inc.
 
Moving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataMoving Beyond Batch: Transactional Databases for Real-time Data
Moving Beyond Batch: Transactional Databases for Real-time DataVoltDB
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationMapR Technologies
 
AWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAmazon Web Services
 
The Future of Data Science
Apache spark empowering the real time data driven enterprise - StreamAnalytix webinar

  • 1. WEBINAR Apache Spark Empowering the Real-Time Data Driven Enterprise, October 13, 2017. Mike Gualtieri, VP & Principal Analyst, Forrester (Twitter: mgualtieri); Anand Venugopal, Product Head & AVP, StreamAnalytix (Twitter: streamanalytix)
  • 2. Our Agenda • Business Value of Streaming Analytics • Use Cases / Architecture • Streaming Analytics Platform Criteria • Spark as a Streaming Technology • Introducing StreamAnalytix - Visual Spark Studio • Success Stories and Demo • Q & A
  • 3. About Impetus • Mission critical technology solutions since 1996 • Fortune 500: Big Data clients • 1700 people; US, India, global reach • Unique mix of Big Data products and services
  • 4. The Real-Time Enterprise with Apache Spark. Mike Gualtieri, VP & Principal Analyst. Twitter: @mgualtieri | LinkedIn: mgualtieri
  • 6. © 2017 Forrester Research, Inc. Reproduction Prohibited Data and analytics decision-makers are driven by business priorities • Better leverage big data and analytics in business…: 52% • Create a comprehensive strategy for addressing digital…: 53% • Create a comprehensive digital marketing strategy: 53% • Better comply with regulations and requirements: 54% • Improve differentiation in the market: 58% • Increase influence and brand reach in the market: 64% • Address rising customer expectations: 64% • Improve our ability to innovate: 65% • Reduce costs: 66% • Improve our products/services: 73% • Improve the experience of our customers: 75% • Base: 3,005 global data and analytics decision-makers • Source: Global Business Technographics Data And Analytics Online Survey 2016
  • 7. Most firms struggle to analyze data and make insights actionable in real-time
  • 8. Real-time means business time
  • 10. Is this customer thinking about moving to a rival firm right now?
  • 11. What offers should you make to your customer if they are eCommerce’ing right now?
  • 12. How can you warn other drivers that the road is slippery to avoid a crash right now?
  • 13. What are movers and shakers saying about equities that we cover right now?
  • 14. How can you prevent this dude from fleecing you right now?
  • 15. How do you detect customer SLA problems right now?
  • 16. How can IoT data be used to predict machine failure right now?
  • 18. Only the analytical enterprise can compete and win in the age of the customer. Diagram labels: Ideate, Model, Detect, Adapt; Machine Learning, Streaming Analytics, Descriptive Analytics, Prescriptive Analytics; (Real-time Analytics), (Batch Analytics)
  • 19. #Data
  • 20. Enterprises have plenty of data from both internal and external sources. Using your best estimate, what is the size of all data stored within your company? 10-49 Terabytes: 5%; 50-99 Terabytes: 12%; 100-500 Terabytes: 54%; Greater than 500 Terabytes: 29%. What % of the data available is from internal business applications (ERP and business applications) versus external sources (social, IoT)? Internal business data: 49%; External source data: 51%. Source: Forrester Research, September 2015. Base: 100 US Managers and above currently using Hadoop for processing and analyzing data.
  • 21. Data is like a drop of rain
  • 22. It forms instantaneously in a cloud…
  • 23. ...and travels far before it makes a ripple
  • 25. #
  • 26. All data originates in real-time!
  • 27. But, analytics to gain insights is usually done much, much later
  • 30. Enterprises must act on a range of perishable insights to get value from data and analytics. Insight types: Real-time Insights, Operational Insights, Performance Insights, Strategic Insights. Examples: Insight: Shopping for furniture, Action: Recommend cleaning supplies; Insight: Profit lower than goal, Action: Optimize price; Insight: Demand forecast strong, Action: Increase inventory; Insight: Furniture demand high, Action: Expand product line. Time to Act: Sub-second to seconds; Seconds to hours; Days to weeks; Weeks to years. Perishability: Sub-second to seconds; Seconds to hours; Hours to weeks; Weeks to years.
  • 31. Most analytics operations are too slow. Chart: Business Value (Positive to Negative) against Time To Action. Stages: Data originated, Analytics performed, Insights gleaned, Decision made, Action taken. Outcomes: Outdated insights, Poor decision, Impotent or harmful actions.
  • 32. You must compress analytics time-to-insight to maximize the value of data. Chart: Business Value (Positive to Negative) against Time to Action; The Real-time Enterprise
  • 33. IoT applications must act on a range of perishable insights to get value from big data. Insight types: Real-time Insights, Operational Insights, Performance Insights, Strategic Insights. Time to Act: Sub-second to seconds; Seconds to hours; Days to weeks; Weeks to years. Perishability: Sub-second to seconds; Seconds to hours; Hours to weeks; Weeks to years. Streaming analytics; Batch analytics
  • 35. The opportunity to become real-time is high, but enterprises must redesign applications
  • 36. Modern applications infuse analytics to respond in real-time and become smarter. Diagram labels: Streaming Data; Applications: Application Interface, App Logic, Context, Actions; Real-time Context; Programmed Logic; Learned Logic; Machine Learning; Learning; External Context (from other data sources or applications); External Actions (to other data sources or applications)
  • 37. Streaming is essential technology to identify and act on perishable insights
  • 39. Streaming analytics lets applications sense, think, and act in real-time. Source: Forrester Research
  • 40. Streaming analytics is very different from plain vanilla stream ingestion. Source: Forrester Research
  • 41. Streaming analytics solutions must be scalable and have a rich set of stateful analytical operators. Architecture: • Workload scalability • Workload latency • Fault tolerance • Operational management. Stream/event Handling: • Event sequencing • Enrichment. Analytical Operators: • Transformation • Correlation • Time windows • Complex event processing. Applications Development: • Development tools • Data connectors • Business solution accelerators • Community innovation.
  • 44. Scale to handle any volume & velocity of data
  • 45. Process and analyse in real-time
  • 46. Provide fault-tolerance for mission-critical applications
  • 47. Provide tools that make it easy to manage and monitor the platform and its interaction with technology components
  • 48. Offer tools for business users to visualize insights from real-time data
  • 49. Capture perishable events and insights at low latency
  • 50. Offer sophisticated stateful and stateless analytics
  • 51. Leverage existing skills to make it easy for developers to develop, test and deploy applications
  • 52. #
  • 53. Hadoop is designed for volume
  • 54. Spark is designed for speed
  • 55. Spark and Hadoop often coexist in the same cluster
  • 56. Hadoop and Spark are friends, but…
  • 57. …Spark is where developers go to create real-time enterprises
  • 58. 58,000x Spark is designed to process in-memory datasets, but can spool to disk if necessary
  • 59. Spark’s directed acyclic graph (DAG) engine optimizes parallelization to dramatically reduce intermediary data movement
  • 60. Spark doesn’t need Hadoop; it just needs great compute and great storage
  • 61. Spark includes capabilities for streaming analytics and machine learning!
  • 63. Unify batch and streaming analytics to create your real-time enterprise. Diagram labels: Ideate, Model, Detect, Adapt; Machine Learning, Streaming Analytics, Descriptive Analytics, Prescriptive Analytics; (Real-time Analytics), (Batch Analytics)
  • 64. #Time
  • 66. Use it to your advantage
  • 68. Real-Time Stream Processing and Machine Learning Platform ENABLING THE REAL TIME ENTERPRISE
  • 69. “Impetus has the opportunity to make StreamAnalytix the de facto tooling standard for Spark and future streaming engines…” Impetus Technologies covers open source bases without the headaches. Take your pick. Impetus’ StreamAnalytix supports Apache Storm and Apache Spark and is architecturally positioned to support other open source streaming analytics software such as Apache Flink. StreamAnalytix also embeds EsperTech to provide advanced streaming analytics capabilities such as complex event processing. What also shines about the StreamAnalytix solution is that it includes enterprise-grade visual tooling for both development and deployment of streaming applications. StreamAnalytix tooling also unifies streaming and batch by supporting arbitrary Spark jobs such as machine learning. A Strong Performer in The Forrester Wave™: Streaming Analytics, Q3 2017
  • 70. 1 Real-Time Streaming Data Analytics 2 Makes Spark Easy (Visual Spark Studio)
  • 71. StreamAnalytix is a platform to build real-time apps. Not so real-time: sense, analyze, act in hours/days. Near real-time / real-time: sense, analyze, act in seconds/milliseconds.
  • 72. Wherever you are, we can make you faster. Slow processing jobs: Hadoop MR or other non-big-data tech. Faster due to in-memory: Spark batch jobs. Faster due to micro-batch: Spark Streaming jobs. Fastest: event stream processing.
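To make the difference between micro-batch and event-at-a-time processing concrete, here is a deliberately simplified latency model. It assumes batches close at exact multiples of the batch interval and ignores processing time; the function name and the sample arrival times are hypothetical.

```python
def micro_batch_wait(arrival_time, batch_interval):
    """Seconds an event waits until the micro-batch containing it closes.

    Simplified model (an assumption for illustration): batches close at
    exact multiples of `batch_interval`, and processing time is ignored.
    """
    batch_close = (arrival_time // batch_interval + 1) * batch_interval
    return batch_close - arrival_time


arrivals = [0.25, 0.5, 1.75]  # seconds since stream start (made-up values)
waits = [micro_batch_wait(t, batch_interval=1.0) for t in arrivals]
```

In this model an event-at-a-time engine has no batching wait at all, which is the intuition behind ranking event stream processing as "fastest" on the slide above.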
  • 73. Use cases for real-time streaming data analytics: real-time C360 and churn; fraud and anomaly detection; IoT and log analytics; next best offer or action; predictive maintenance; cyber security; real-time call center analytics
  • 74. Stack for real-time streaming data analytics. Learning / training: real-time + batch. Scoring: real-time + batch. Technologies: PMML, H2O, Python on Spark; Kafka, Storm, Esper; Spark Streaming, Spark ML, MLlib
  • 75. 1 Real-Time Streaming Data Analytics 2 Makes Spark Easy (Visual Spark Studio)
  • 76. Shortage of Spark talent and the urgent need for it • Spark projects are increasing • They need to get done quickly, with budget controls • But there is a big barrier: talent, in both quality and quantity • Deep Spark / Scala skills are hard to find • There is a big gap between a Spark prototype app and production-grade, scalable, stable apps that don’t need a lot of baby-sitting. Impact: Spark projects become slow, difficult, costly and risk-ridden
  • 77. Is the real-time enterprise possible, with Spark use cases taking too long to deliver?
  • 78. Is the real-time enterprise possible? Solution: • More people? (They don’t exist yet; it just gets messier and costlier) • Ditch Spark and buy proprietary platforms? ($$$$. That’s going backwards) • Just bite the bullet and delay the project? (Oops!) • Hire outsourcing companies? (Do they really have more skilled people?)
  • 79. Is the real-time enterprise possible? Solution: • Get the right tools • Make existing people and teams much more productive
  • 80. The right Spark tool or platform does this… • Visual IDE • Ingest • ETL • Analytics / ML • Apps: develop + debug, deploy, monitor + tune, maintain • Scale • Performance
  • 81. Visual Spark Studio (Data360): visual Spark IDE, drag and drop • Ingest: tables, files, Kafka, APIs • Transform / Enrich: filter, blend, lookup • Analytics: feature extraction, ML, time windows • Load: HDFS, HBase, Hive, any NoSQL • View: real-time dashboards • Streaming, batch + Oozie workflow
  • 87. Deployment diagram. Components: user; load balancer with sticky sessions; StreamAnalytix web server (CentOS / RHEL 6.x or above); StreamAnalytix web container (Tomcat); Hadoop cluster; MySQL / Postgres; RabbitMQ. Callouts: secured communication via Kerberos; standalone Spark cluster or Spark over YARN; StreamAnalytix leverages ZooKeeper for configuration management.
  • 89. Overview. Local mode (desktop or single VM): StreamAnalytix Spark portion + all dependencies = one binary. Full cluster. Identical user experience for building and managing Spark jobs.
  • 90. Go to “StreamAnalytix.com” to view demo and download Visual Spark Studio
  • 92. Why improve? …when you can transform your business
  • 93. Transforming the Business - means…. • Creating a real-time enterprise • Dramatic non-linear increase in performance / cost trade off • Net new capabilities or revenue streams – that were previously not possible
  • 94. Top airline boosts customer digital experience • Funnels all app data to enterprise bus and into StreamAnalytix • Couldn’t handle the volume and velocity of data earlier • Analytical capacity went from 3 days to 3 months • Ability to correlate events and see patterns across a larger time window • Customer experience issues proactively resolved in real-time • Foundation laid for real-time ML, predictive and prescriptive analytics
  • 95. High-level solution architecture: StreamAnalytix pipeline overview. Solution: raw JSON data from multiple apps and multiple services (all services data) is ingested through Kafka into a StreamAnalytix Spark pipeline on YARN (parsing, filtering, emitting) and lands in ElasticSearch; the user queries and searches it through the Data Diagnostic Tool UI, which returns query results. Highlights • Input data velocity ~7K/sec • Contributing to ~5 TB/day • ES data retention of 30 days • Custom-built web UI for queries • StreamAnalytix implementation provides easy onboarding of additional services and application logs. Benefits • Diagnostic ability on a larger range of data • SLAs unaffected: similar or better • Improved searching with custom web UI • Scalable architecture • Supports even larger data sets
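The two headline figures on this slide can be sanity-checked against each other with some back-of-the-envelope arithmetic. The sketch below assumes that "~7K /sec" means roughly 7,000 events per second, that "TB" is a decimal terabyte, and that the full daily volume is retained for the 30 days; none of these readings is confirmed by the slide.

```python
EVENTS_PER_SECOND = 7_000        # "~7K /sec", assumed to mean events per second
BYTES_PER_DAY = 5 * 10**12       # "~5 TB /day", assuming decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 30              # "ES Data retention of 30 days"

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
avg_event_bytes = BYTES_PER_DAY / events_per_day

# Raw volume held at steady state, before any replication,
# compression or index overhead (all ignored in this sketch).
retained_bytes = BYTES_PER_DAY * RETENTION_DAYS

print(f"{events_per_day:,} events/day")
print(f"{avg_event_bytes:,.0f} bytes per event on average")
print(f"{retained_bytes / 10**12:,.0f} TB retained")
```

Under those assumptions the numbers are mutually consistent: about 605 million events per day at a little over 8 KB each, with roughly 150 TB of raw data inside the retention window.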
  • 96. Major bank, insider threat detection: 5X boost • 5X performance gain from the same hardware • New solution based on StreamAnalytix costs less • Can onboard 5 times more application traffic for detecting threats
  • 97. Data pipeline, high-level processing stages: data ingestion; processing and enrichment; data sink and persistence
  • 98. Pharmacy business processing giant • Spark-based real-time CDC and flow management • Sense change, ingest, transform, load • 100s of source tables: data from a large number of pharmacies • Plus some important real-time ETL / analytics use cases • Attunity → Kafka → StreamAnalytix / Spark → HDFS, Hive • 2 mission-critical data pipelines delivered in 1 day, 2 days • “I could hire a 3 person team instead of a 10 person team”
  • 99. Problem statement • Merge Oracle-based transactions into Hive reporting tables in seconds. Achievement • Spark pipelines for this task built and deployed in 2 days • Partner integration with Attunity for CDC • Consume Oracle multi-table CDC events in real-time • Capture and reconcile changes into Hive tables • De-normalize data while landing into Hive
  • 100. StreamAnalytix solution. A complete CDC solution has 3 parts; each aspect of the solution is modelled as a StreamAnalytix pipeline • Data ingestion and staging: stream data from Attunity Replicate for multiple tables from Kafka and store raw data into HDFS • Data de-normalization: join transactional data with data at rest and store de-normalized data on HDFS • Incremental updates in Hive: merge previously processed transactional data with new incremental updates • Workflow: modelled as a StreamAnalytix Oozie workflow to automate execution of the Spark pipelines that perform data de-normalization and incremental updates to Hive
  • 101. Pipeline #1 - Data ingestion and staging (Streaming) • Data ingestion via Attunity ‘Channel’: reads the data from the Attunity target Kafka. This channel is configured to read data feeds as well as metadata from a separate topic • Data enrichment: enriches incoming data with metadata information and event timestamp • HDFS: stores CDC data on HDFS in the landing area using the OOB HDFS emitter. HDFS files are rotated based on time and size configuration
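Rotating output files "based on time and size" usually comes down to a simple either/or check before each write. The following is a minimal sketch of such a policy in plain Python; the class name, field names and threshold values are hypothetical and are not taken from the StreamAnalytix HDFS emitter.

```python
from dataclasses import dataclass


@dataclass
class RotationPolicy:
    """Hypothetical rotation rule: roll the file when it is too big OR too old."""

    max_bytes: int
    max_age_seconds: float

    def should_rotate(self, bytes_written, opened_at, now):
        too_big = bytes_written >= self.max_bytes
        too_old = (now - opened_at) >= self.max_age_seconds
        return too_big or too_old


# Example thresholds (assumptions): 128 MiB or 5 minutes, whichever comes first.
policy = RotationPolicy(max_bytes=128 * 1024 * 1024, max_age_seconds=300)
```

Combining both limits keeps files large enough to avoid a flood of tiny files while bounding how long fresh data waits before a downstream batch job can see a closed file.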
  • 102. Pipeline #2 - Data de-normalization (Batch) • HDFS data channel: ingests incremental data from previous runs of the staging location (Pipeline #1) • Reads reference data (data at rest) from a fixed HDFS location • Performs an outer join to merge incremental and static data • Stores de-normalized data to an HDFS directory
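The de-normalization step described here can be pictured as a join between newly arrived rows and a reference table. The slide only says "outer join"; the sketch below assumes a left outer join that keeps every incremental row, and the column names and sample data are invented for the example.

```python
def left_outer_join(incremental_rows, reference_by_key, key):
    """Attach reference columns to each incremental row.

    Rows with no matching reference entry are kept unchanged
    (left outer join semantics, assumed for this sketch).
    """
    joined = []
    for row in incremental_rows:
        reference = reference_by_key.get(row[key], {})
        # Incremental values win if a column name appears on both sides.
        joined.append({**reference, **row})
    return joined


# Hypothetical sample data.
reference = {"p1": {"pharmacy_id": "p1", "city": "Austin"}}
incremental = [
    {"txn_id": 1, "pharmacy_id": "p1", "amount": 20},
    {"txn_id": 2, "pharmacy_id": "p9", "amount": 35},  # no reference match
]
denormalized = left_outer_join(incremental, reference, key="pharmacy_id")
```

Keeping unmatched rows matters in a CDC flow, because a transaction that arrives before its reference data should still land downstream rather than be silently dropped.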
  • 103. Pipeline #3 - Incremental updates in Hive (Batch) • Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2 • Reconciliation step: Hive “merge into” SQL performs insert, update and delete operations based on the operation recorded in the incremental data • Clean-up step: runs a drop table command on the managed table to clean up processed data, so that it doesn’t get processed repeatedly
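The reconciliation logic can be sketched independently of Hive: each change record carries an operation, and applying the records in order yields the merged table. This plain-Python version stands in for the "merge into" statement purely as an illustration; the `op` / `row` record layout and the key name are assumptions.

```python
def apply_cdc(table, changes, key="id"):
    """Apply insert/update/delete change records to a keyed table, in order.

    `table` maps key -> row dict; `changes` is a list of
    {"op": ..., "row": {...}} records (a hypothetical layout).
    """
    merged = dict(table)
    for change in changes:
        op, row = change["op"], change["row"]
        if op == "delete":
            merged.pop(row[key], None)
        elif op in ("insert", "update"):
            # Updates overwrite only the columns present in the change row.
            merged[row[key]] = {**merged.get(row[key], {}), **row}
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return merged


current = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
changes = [
    {"op": "update", "row": {"id": 1, "status": "shipped"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "status": "new"}},
]
reconciled = apply_cdc(current, changes)
```

Clearing the staged changes after a successful merge, as the clean-up step on the slide does, is what prevents the same change records from being applied twice.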
  • 104. Workflow: Oozie Coordinator Job. Oozie orchestration flow created using StreamAnalytix webstudio; it orchestrates pipeline #2 and pipeline #3 into a single Oozie flow that can be scheduled as shown here
  • 105. Hosted call center adds new premium product / revenue source • “After a long time we now have a new offering we can go sell proudly to our customers” (Product Manager) • Net new capability for real-time inspection and diagnostics of call quality and customer experience at the contact center • Dramatically improves end-user service for their B2B customers
  • 106. Hosted call center: challenges solved • Individual events scattered across different media servers • Needed to filter a lot of noise in the data at the source itself • Tech support took too long to correlate and solve issues • Call center manager had no real-time view of IVR operations • Needed a variety of call center metrics in real-time
  • 107. Hosted call center solution (architecture diagram). Callers: Internet caller (chat, VoIP, e-mail, collaboration, video); wireless caller (live call, IVR, voice mail); telephone caller (live call, IVR, voice mail). Servers: core servers (routing, admin, stats, logging); agent servers (agent interaction); connection servers (IVR, voice, chat, video, message); dialing servers (predictive engine, campaign manager). Also shown: public Internet, gateways, circuit networks, legacy call centers, ACD. Administrator / supervisor: administration, monitoring, service creation, recording, reports. Agents: PC agent (softphone), PC agent (IP phone), hybrid agent, phone agents. The legend distinguishes packet and circuit links.
  • 111. Tier 1 telco deploys new “agent monitoring system” • 8000+ agent desktops monitored for unethical behaviour in real-time • Secures customer information • Ensures top quality service • Net new capability they couldn’t get earlier at any reasonable price point
  • 112. Desktop analytics. Key business metrics: • Average Handling Time (AHT) • First Call Resolution • Sales Close Rate • Disconnect Save Rate. 1-year benefit is $5.41M in the form of call volume reduction • 30 sec AHT reduction for Tech • 15 sec AHT reduction for Sales
  • 113. Desktop analytics: desktop data pipeline. Stages (call center agent machine to big data platform): desktop raw data processing; app activity aggregation; event activity aggregation; system data enrich and persist; app and event data enrich and persist • Consume raw ACD events • Parse and split the bulk JSON message into individual events • Process data for app, event and system events • Aggregate data: mini-batching, data sequencing, enriching data with agent hierarchy, aggregating data • Persist data into Hive, HBase and Elastic
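The "parse and split, then aggregate" steps can be sketched with the standard `json` module. The message schema below (an `events` array whose items carry `type`, `app` and `seconds` fields) is entirely hypothetical; the slides do not describe the real payload.

```python
import json
from collections import Counter


def split_bulk_message(bulk_json):
    """Parse one bulk JSON message and return its individual events."""
    payload = json.loads(bulk_json)
    return payload["events"]  # assumed field name


def aggregate_seconds_by_app(events):
    """Sum active seconds per application, ignoring non-app events."""
    totals = Counter()
    for event in events:
        if event["type"] == "app":
            totals[event["app"]] += event["seconds"]
    return dict(totals)


bulk = json.dumps({
    "agent": "a-17",
    "events": [
        {"type": "app", "app": "crm", "seconds": 40},
        {"type": "system", "cpu": 0.3},
        {"type": "app", "app": "crm", "seconds": 25},
        {"type": "app", "app": "email", "seconds": 10},
    ],
})
per_app = aggregate_seconds_by_app(split_bulk_message(bulk))
```

Splitting first and aggregating second mirrors the pipeline stages on the slide, where raw processing feeds separate app, event and system branches.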
  • 114. Desktop analytics - data volume
    Pilot
    Source System | Data Type  | No. of Agents | Records/Day
    Desktop Data  | Raw        | 9             | 69461
    Desktop Data  | Aggregated | 9             | 45428
    Call Data     | Raw        | 7000          | 900000
    Call Data     | Aggregated | 7000          | 900000
    GA
    Source System | Data Type  | No. of Agents | Records/Day
    Desktop Data  | Raw        | 7000          | 60M
    Desktop Data  | Aggregated | 7000          | 20M
    Call Data     | Raw        | 7000          | 900000
    Call Data     | Aggregated | 7000          | 900000
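A quick per-agent calculation shows how the pilot desktop volumes compare with the GA figures on the data-volume slide. The sketch uses only the desktop-data numbers from that slide and reads "60M" and "20M" as 60 and 20 million records, which is an assumption.

```python
# Desktop data figures from the data-volume slide.
pilot = {"agents": 9, "raw": 69_461, "aggregated": 45_428}
ga = {"agents": 7_000, "raw": 60_000_000, "aggregated": 20_000_000}  # "60M" / "20M"


def per_agent(records, agents):
    """Average records per agent per day."""
    return records / agents


pilot_raw_per_agent = per_agent(pilot["raw"], pilot["agents"])
ga_raw_per_agent = per_agent(ga["raw"], ga["agents"])

# How much aggregation shrinks the raw record count.
pilot_reduction = pilot["raw"] / pilot["aggregated"]
ga_reduction = ga["raw"] / ga["aggregated"]

print(f"Pilot: {pilot_raw_per_agent:,.0f} raw records/agent/day")
print(f"GA:    {ga_raw_per_agent:,.0f} raw records/agent/day")
print(f"Aggregation reduction: pilot {pilot_reduction:.1f}x, GA {ga_reduction:.1f}x")
```

On this reading, per-agent raw volume is of the same order in both phases (roughly 7,700 versus 8,600 records per day), so the GA figure looks like a straightforward scale-out of the pilot rather than a change in per-agent behaviour.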
  • 115. Thank you. Questions? © 2017 Impetus Technologies Email: inquiry@streamanalytix.com Twitter : @StreamAnalytix