SlideShare a Scribd company logo
1 of 27
Download to read offline
Huawei Advanced Data Science
With Spark Streaming
Jianfeng Qian, Cheng He
Huawei Research Institute
Contents
• streamDM: Stream Mining in Spark Streaming
(Jianfeng Qian)
• Business scenarios in Huawei with Spark Steaming
(Cheng He)
Open Source Machine Learning Projects
 Apache Mahout
• May 2010, v0.1: support Hadoop
• Apr 2014, Mahout-Samsara, v0.10: support Spark and H2O
• April 2016,v0.12: R-like DSL, support Flink
 oryx&oryx2
• Dec 2013, v0.3.0: real-time large-scale machine learning support
Hadoop
• Dec 2015, v2.0: support Spark Streaming
 Apache SAMOA: Scalable Advanced Massive Online Analysis
• Jul 2015, v0.3.0: support Storm, Samza and Flink
Stream Data Mining?
Stream Data Mining
Data Streams
• Sequence is potentially infinite
• High amount of data: sublinear space
• High speed of arrival: sublinear time per example
• Once an element from a data stream has been processed it is
discarded or archived
• Data is evolving
Approximation algorithms
• Small error rate with high probability
streamDM?
streamDM
streamDM is incremental
• streamDM is designed specifically to be used inside Spark
Streaming.
• All algorithms are incremental
streamDM!
streamDM for users
• Download streamDM
git clone https://github.com/huawei-noah/streamDM.git
• Build streamDM
sbt package
• streamDM execute tasks
./spark.sh "EvaluatePrequential 
-l (SGDLearner -l 0.01 –o LogisticLoss -r ZeroRegularizer) 
–s (FileReader –k 100 –d 60 –f ../data/mydata)" 
1> ../sgd.log 2>../sgd.result
streamDM for Programmers
• StreamReader read and parse Example and create a stream
• Learner provides the train method from an input stream
• Model data structure and set of methods used for Learner
• Evaluator evaluation of predictions
• StreamWriter output of streams
streamDM
• Advanced machine learning methods including streaming decision
trees, streaming clustering methods as CluStream and StreamKM++.
• Ease of use. Experiments can be performed from the command-line,
as in WEKA or MOA.
• High extensibility
• No dependence on third-part libraries, specially on the linear algebra
package Breeze.
streamDM
First Release 31/12/15
• SGD Learner and Perceptron
• Naive Bayes
• CluStream
• Hoeffding Decision Trees
• Bagging
• Stream KM++
Next Release 31/12/16(support Spark 2.0)
• Random Forests
• Frequent Itemset Miner: IncMine
Something else…
Google Cloud
Dataflow
Spark Flink Storm
streamDM
Platform
Stream Mining SAMOA
mllibMachine Learning(Batch) Mahout
oryx2
Something else…
Beam
Google Cloud
Dataflow
Spark Flink Storm
streamDM
Interface &SDK
Platform
Stream Mining SAMOA
BeamML
mllibMachine Learning(Batch) Mahout
oryx2
or something else…
Contents
• streamDM: Stream Mining in Spark Streaming
(Jianfeng Qian)
• Business scenarios in Huawei with Spark Steaming
(Cheng He)
Special Challenges in Telecom Big Data Analytics
Case 1: Alarm Analysis
Cloud Service
Alarm
Trouble Tickets
Field Operation /
Maintenance
Root Cause Analysis
`40-100 Millions Alarms / day
`Change rapidly with unnoticed phenomenon
Project A Project B Project DProject C
。。。
Data Base
AABD : Automatic Alarm Behavior Discovery
…
Topol
ogy
Config
uration
Alarm correlation
Automatic correlation rules generation
& optimization
Root Cause Identification
Quick troubleshooting, reduce MTTR
Fault Penetration
Multi-dimensional analysis the history alarms to identify the potential
network risk or fault
Forecasting
Forecasting next alarm/fault
Rule Database
Flapping
behavior
mining
Frequent item
sets mining
Casual
Inference
Algorithms Library
Automatic Alarm Behavior Discovery
Correlation Method
Cross Domain Performance
Service
Impact …
…
Mllib - >
StreamDM
AABD for Automatic Alarm Management
30X
Rule Design
Efficiency
30%
Alarm Correlation
Implementation
Process Time
Rule Database
Flapping
behavior
mining
Frequent item
sets mining
Casual
Inference
Algorithms Library
Automatic Alarm
Behavior Discovery
Correlation Method
Cross Domain Performance
Service
Impact …
…
Fault
Management
Auto
Troubleshooting Rule
Auto Alarm Rule
Auto Diagnostic
Expert
AABD: Automatic Alarm Behavior Discovery
Systems
AABD : Results from practical applications
` Deployed in 13 Operator sites all over the world
` Improve efficiency of deployment from 5 man * month to 3
man * day
` Millions of alarm sequences, < 1 hour;
Case 2 : Fault Localization for Customer Care
Complaint
Query
Log file / Signaling
events
Fault Localization Engine
Network
KB
Knowledge Base
Maintenance
Complaint Process
Application
KB
Network/Charging
/Support Dept.…
Front line
Customer
Service Staff
Expert / Manager Review
Maintenance
Dept.
Traditional CCA
Complaints
Fault Localization
Results
Ensemble
Classification Model
Automatic Feature
Engineering
Intelligent CCA
Expert
Rules
K
B
Answers
Difficult
issues
Challenges on Discriminative sequence pattern mining
1. It’s an NP-Hard problem to mine patterns out from
massive sequence data for accurate classification
2. The upper bound of the pattern search space is
3. Usually, important patterns occur not so frequently,
so we probably will search the total space of
(1 )
( )s p
O S 
Important and
discriminative patterns
( )s
O S
Original issues State of art
1. Based on Divide-and-conquer,we can reduce
the search space, even when we adopt larger p,
we can still get discriminative patterns
2. Scale down ratio is 1/ sp
s
(1 )
(2 )l m p
O m 
Research goals
1. Online stream pattern mining,dynamically
generate trees for pattern mining,, further
reduce the search space to ;
2. Approximation analysis: balance
complexity and accuracy;
1. Open dataset:192 samples,p = 12% ,no. of
patterns 8600;p = 4%,No. of patterns 92,000;
2. In CCA,538 samples,p=5%,No. of patterns
687402;
1. In CCA:,538 samples,p=20%,No. of
patterns: 29696;
2
1 2
1
ln
(F ) G(F )
2
st nd
R
G
n

  
Ref: “Direct mining of discriminative and essential frequent
patterns via model-based search tree”, W. Fan, etc.. KDD 2008
Experiment results
Mining discriminative and essential patterns with extremely low global support directly.
Better efficiency for stream sequential data mining while preserving accuracy.
With the ability to detect concept drift quickly and adapt to new concept fast;
Design of Fault Localization Solution
Data preprocessing
Sequence Encoding
Sequential Pattern Mining
Modeling
Pattern Matching
Online Classification
Label Feedback
Model Update
`4-6 TB data / day;
`8M – 10M sequences;
`E-2-E feedback support
< 3.6 sec;
`Model update 5 hours
Spark-streaming
Spark
Lessons from practical applications
 About spark:
 Advantages:
 Easy to use;
 Perfect community & ecosystem;
 Limitations:
 Delay;
 Throughput;
 About big data analytics in Telecom networks:
 Efficient sequential pattern mining framework
 Deep reinforcement learning;
 Robust ML / AI & Domain Knowledge;
 Close-loop evaluation;
StreamSMART
Scenarios & Apps:
 App recommendation system;
100M+ customers;
30M – 100M features;
 Anti-DDoS Solution;
4M - 10M flow / sec;
Huawei Innovation Research Program
• The Huawei Innovation Research Program (HIRP) provides funding opportunities to
leading universities and research institutes conducting innovative research in
communication technology, computer science, engineering, and related fields. HIRP
seeks to identify and support world-class, full-time faculty members pursuing
innovation of mutual interest. Outstanding HIRP winners may be invited to establish
further long-term research collaboration with Huawei.
• Call for Proposals for Big Data & Artificial Intelligence
• HIRPO20160606: Novel Algorithm Design and Use Cases for Data Stream Mining based on streamDM
https://innovationresearch.huawei.com/IPD/hirp/portal/index.html
Join us to build a better connected world
THANK YOU
jianfeng.qian@outlook.com, hecheng@huawei.com

More Related Content

What's hot

Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingJen Aman
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkDatabricks
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterJen Aman
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit
 
Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionDatabricks
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaSpark Summit
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzDatabricks
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Spark Summit
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduJen Aman
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit
 

What's hot (20)

Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 
Resource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache SparkResource-Efficient Deep Learning Model Selection on Apache Spark
Resource-Efficient Deep Learning Model Selection on Apache Spark
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
CaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark ClusterCaffeOnSpark: Deep Learning On Spark Cluster
CaffeOnSpark: Deep Learning On Spark Cluster
 
Spark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan RavatSpark Summit EU talk by Brij Bhushan Ravat
Spark Summit EU talk by Brij Bhushan Ravat
 
Apache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and ProductionApache Spark MLlib 2.0 Preview: Data Science and Production
Apache Spark MLlib 2.0 Preview: Data Science and Production
 
Spark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf JagermanSpark Summit EU talk by Rolf Jagerman
Spark Summit EU talk by Rolf Jagerman
 
Spark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena LazovikSpark Summit EU talk by Elena Lazovik
Spark Summit EU talk by Elena Lazovik
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim LauSpark Summit EU talk by Kent Buenaventura and Willaim Lau
Spark Summit EU talk by Kent Buenaventura and Willaim Lau
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
 
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan FritzAnalytics at Scale with Apache Spark on AWS with Jonathan Fritz
Analytics at Scale with Apache Spark on AWS with Jonathan Fritz
 
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In Baidu
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
 

Viewers also liked

Morticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsMorticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsSpark Summit
 
Large Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkLarge Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkJen Aman
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbJen Aman
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitSpark Summit
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLJen Aman
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudJen Aman
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsDatabricks
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousJen Aman
 
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleNetflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleJen Aman
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web ServiceSpark Summit
 
Interactive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkInteractive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkSpark Summit
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Databricks
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisJen Aman
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
Analyzing Log Data With Apache Spark
Analyzing Log Data With Apache SparkAnalyzing Log Data With Apache Spark
Analyzing Log Data With Apache SparkSpark Summit
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Databricks
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browsergethue
 
Huawei - Celcom Site Survey April 2016 proposal
Huawei - Celcom Site Survey April 2016 proposalHuawei - Celcom Site Survey April 2016 proposal
Huawei - Celcom Site Survey April 2016 proposalAbdullah Arsath
 

Viewers also liked (20)

Morticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsMorticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark Workflows
 
Large Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkLarge Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On Spark
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
Understanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And ProfitUnderstanding Memory Management In Spark For Fun And Profit
Understanding Memory Management In Spark For Fun And Profit
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Big Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the CloudBig Data in Production: Lessons from Running in the Cloud
Big Data in Production: Lessons from Running in the Cloud
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte ScaleNetflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
Netflix - Productionizing Spark On Yarn For ETL At Petabyte Scale
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
 
Interactive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by SparkInteractive Visualization of Streaming Data Powered by Spark
Interactive Visualization of Streaming Data Powered by Spark
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Analyzing Log Data With Apache Spark
Analyzing Log Data With Apache SparkAnalyzing Log Data With Apache Spark
Analyzing Log Data With Apache Spark
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
 
Huawei - Celcom Site Survey April 2016 proposal
Huawei - Celcom Site Survey April 2016 proposalHuawei - Celcom Site Survey April 2016 proposal
Huawei - Celcom Site Survey April 2016 proposal
 

Similar to Huawei Advanced Data Science With Spark Streaming

Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko GlobalLogic Ukraine
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoTJim Haughwout
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesPaco Nathan
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_VeriticalsPeyman Mohajerian
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022HostedbyConfluent
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Eugene
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 

Similar to Huawei Advanced Data Science With Spark Streaming (20)

Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
Stream Data Processing at Big Data Landscape by Oleksandr Fedirko
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
IoT meets Big Data
IoT meets Big DataIoT meets Big Data
IoT meets Big Data
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 

More from Jen Aman

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaJen Aman
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéJen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesJen Aman
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsJen Aman
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkJen Aman
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkJen Aman
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkJen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityJen Aman
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache SparkJen Aman
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkJen Aman
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonJen Aman
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on MesosJen Aman
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibJen Aman
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics Jen Aman
 

More from Jen Aman (20)

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
 

Recently uploaded

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 

Recently uploaded (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 

Huawei Advanced Data Science With Spark Streaming

  • 1. Huawei Advanced Data Science With Spark Streaming Jianfeng Qian, Cheng He Huawei Research Institute
  • 2. Contents • streamDM: Stream Mining in Spark Streaming (Jianfeng Qian) • Business scenarios in Huawei with Spark Steaming (Cheng He)
  • 3. Open Source Machine Learning Projects  Apache Mahout • May 2010, v0.1: support Hadoop • Apr 2014, Mahout-Samsara, v0.10: support Spark and H2O • April 2016,v0.12: R-like DSL, support Flink  oryx&oryx2 • Dec 2013, v0.3.0: real-time large-scale machine learning support Hadoop • Dec 2015, v2.0: support Spark Streaming  Apache SAMOA: Scalable Advanced Massive Online Analysis • Jul 2015, v0.3.0: support Storm, Samza and Flink
  • 5. Stream Data Mining Data Streams • Sequence is potentially infinite • High amount of data: sublinear space • High speed of arrival: sublinear time per example • Once an element from a data stream has been processed it is discarded or archived • Data is evolving Approximation algorithms • Small error rate with high probability
  • 8. streamDM is incremental • streamDM is designed specifically to be used inside Spark Streaming. • All algorithms are incremental
  • 10. streamDM for users • Download streamDM git clone https://github.com/huawei-noah/streamDM.git • Build streamDM sbt package • streamDM execute tasks ./spark.sh "EvaluatePrequential -l (SGDLearner -l 0.01 –o LogisticLoss -r ZeroRegularizer) –s (FileReader –k 100 –d 60 –f ../data/mydata)" 1> ../sgd.log 2>../sgd.result
  • 11. streamDM for Programmers • StreamReader read and parse Example and create a stream • Learner provides the train method from an input stream • Model data structure and set of methods used for Learner • Evaluator evaluation of predictions • StreamWriter output of streams
  • 12. streamDM • Advanced machine learning methods including streaming decision trees, streaming clustering methods as CluStream and StreamKM++. • Ease of use. Experiments can be performed from the command-line, as in WEKA or MOA. • High extensibility • No dependence on third-part libraries, specially on the linear algebra package Breeze.
  • 13. streamDM First Release 31/12/15 • SGD Learner and Perceptron • Naive Bayes • CluStream • Hoeffding Decision Trees • Bagging • Stream KM++ Next Release 31/12/16(support Spark 2.0) • Random Forests • Frequent Itemset Miner: IncMine
  • 14. Something else… Google Cloud Dataflow Spark Flink Storm streamDM Platform Stream Mining SAMOA mllibMachine Learning(Batch) Mahout oryx2
  • 15. Something else… Beam Google Cloud Dataflow Spark Flink Storm streamDM Interface &SDK Platform Stream Mining SAMOA BeamML mllibMachine Learning(Batch) Mahout oryx2 or something else…
  • 16. Contents • streamDM: Stream Mining in Spark Streaming (Jianfeng Qian) • Business scenarios in Huawei with Spark Steaming (Cheng He)
  • 17. Special Challenges in Telecom Big Data Analytics
  • 18. Case 1: Alarm Analysis Cloud Service Alarm Trouble Tickets Field Operation / Maintenance Root Cause Analysis `40-100 Millions Alarms / day `Change rapidly with unnoticed phenomenon
  • 19. Project A Project B Project DProject C 。。。 Data Base AABD : Automatic Alarm Behavior Discovery … Topol ogy Config uration Alarm correlation Automatic correlation rules generation & optimization Root Cause Identification Quick troubleshooting, reduce MTTR Fault Penetration Multi-dimensional analysis the history alarms to identify the potential network risk or fault Forecasting Forecasting next alarm/fault Rule Database Flapping behavior mining Frequent item sets mining Casual Inference Algorithms Library Automatic Alarm Behavior Discovery Correlation Method Cross Domain Performance Service Impact … … Mllib - > StreamDM
  • 20. AABD for Automatic Alarm Management 30X Rule Design Efficiency 30% Alarm Correlation Implementation Process Time Rule Database Flapping behavior mining Frequent item sets mining Casual Inference Algorithms Library Automatic Alarm Behavior Discovery Correlation Method Cross Domain Performance Service Impact … … Fault Management Auto Troubleshooting Rule Auto Alarm Rule Auto Diagnostic Expert AABD: Automatic Alarm Behavior Discovery Systems AABD : Results from practical applications ` Deployed in 13 Operator sites all over the world ` Improve efficiency of deployment from 5 man * month to 3 man * day ` Millions of alarm sequences, < 1 hour;
  • 21. Case 2 : Fault Localization for Customer Care Complaint Query Log file / Signaling events Fault Localization Engine Network KB Knowledge Base Maintenance Complaint Process Application KB Network/Charging /Support Dept.… Front line Customer Service Staff Expert / Manager Review Maintenance Dept. Traditional CCA Complaints Fault Localization Results Ensemble Classification Model Automatic Feature Engineering Intelligent CCA Expert Rules K B Answers Difficult issues
  • 22. Challenges on Discriminative sequence pattern mining 1. It’s an NP-Hard problem to mine patterns out from massive sequence data for accurate classification 2. The upper bound of the pattern search space is 3. Usually, important patterns occur not so frequently, so we probably will search the total space of (1 ) ( )s p O S  Important and discriminative patterns ( )s O S Original issues State of art 1. Based on Divide-and-conquer,we can reduce the search space, even when we adopt larger p, we can still get discriminative patterns 2. Scale down ratio is 1/ sp s (1 ) (2 )l m p O m  Research goals 1. Online stream pattern mining,dynamically generate trees for pattern mining,, further reduce the search space to ; 2. Approximation analysis: balance complexity and accuracy; 1. Open dataset:192 samples,p = 12% ,no. of patterns 8600;p = 4%,No. of patterns 92,000; 2. In CCA,538 samples,p=5%,No. of patterns 687402; 1. In CCA:,538 samples,p=20%,No. of patterns: 29696; 2 1 2 1 ln (F ) G(F ) 2 st nd R G n     Ref: “Direct mining of discriminative and essential frequent patterns via model-based search tree”, W. Fan, etc.. KDD 2008
  • 23. Experiment results Mining discriminative and essential patterns with extremely low global support directly. Better efficiency for stream sequential data mining while preserving accuracy. With the ability to detect concept drift quickly and adapt to new concept fast;
  • 24. Design of Fault Localization Solution Data preprocessing Sequence Encoding Sequential Pattern Mining Modeling Pattern Matching Online Classification Label Feedback Model Update `4-6 TB data / day; `8M – 10M sequences; `E-2-E feedback support < 3.6 sec; `Model update 5 hours Spark-streaming Spark
  • 25. Lessons from practical applications  About spark:  Advantages:  Easy to use;  Perfect community & ecosystem;  Limitations:  Delay;  Throughput;  About big data analytics in Telecom networks:  Efficient sequential pattern mining framework  Deep reinforcement learning;  Robust ML / AI & Domain Knowledge;  Close-loop evaluation; StreamSMART Scenarios & Apps:  App recommendation system; 100M+ customers; 30M – 100M features;  Anti-DDoS Solution; 4M - 10M flow / sec;
  • 26. Huawei Innovation Research Program • The Huawei Innovation Research Program (HIRP) provides funding opportunities to leading universities and research institutes conducting innovative research in communication technology, computer science, engineering, and related fields. HIRP seeks to identify and support world-class, full-time faculty members pursuing innovation of mutual interest. Outstanding HIRP winners may be invited to establish further long-term research collaboration with Huawei. • Call for Proposals for Big Data & Artificial Intelligence • HIRPO20160606: Novel Algorithm Design and Use Cases for Data Stream Mining based on streamDM https://innovationresearch.huawei.com/IPD/hirp/portal/index.html
  • 27. Join us to build a better connected world THANK YOU jianfeng.qian@outlook.com, hecheng@huawei.com