DATA DATA EVERYWHERE
ARUN KEJARIWAL
— Not An Insight to Take Action Upon ­­­­­—
WHAT’S UP WITH THE TITLE?
­­­­­— Metrics Arms Race ­­­­­—
•  “… scaling it to about two million distinct time
series …” (Netflix)
•  “… highly accurate, real-time alerts on millions of
system and business metrics …” (Uber)
•  “As we have hundreds of systems exposing
multiple data items, the write data rate might
easily exceed tens of millions of data points each
second.” (Facebook)
•  “… we are taking tens of billions of
measurements…” (Google)
•  “The Observability stack collects 170 million
individual data metrics (time series) …” (Twitter)
•  “… serving over 50 million distinct time
series.” (Spotify)
WHAT’S UP WITH THE TITLE?
­­­­­— Metrics Arms Race ­­­­­—
•  >95% of metrics data is NEVER read!!
•  Legacy instrumentation
•  Lack of understanding of how to use metrics
Latency
10p, 20p, …, 90p
95p, 99p, 99.9p
Mean
•  Retention
“Hard disk is cheap”
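A minimal sketch of why the latency bullets list percentiles alongside the mean. The data is synthetic, and the `percentile` helper (nearest-rank definition) and the `latencies_ms` values are assumptions made for illustration, not something taken from the talk:

```python
import math
import random
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Synthetic latencies in ms (assumed): mostly fast requests plus a slow 2% tail.
random.seed(7)
latencies_ms = [random.uniform(8, 12) for _ in range(980)]
latencies_ms += [random.uniform(400, 600) for _ in range(20)]

summary = {
    "mean": statistics.mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
    "p99.9": percentile(latencies_ms, 99.9),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:8.2f} ms")
```

In this toy data the mean sits above p95 yet far below p99, so neither the mean nor the middle percentiles describe what the slowest 2% of requests experience.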
WHAT’S UP WITH THE TITLE?
“Rime of the Ancient Mariner”
INSPIRATION
By Samuel Taylor Coleridge
Water, water, every where,
Nor any drop to drink.
­­­­­— Rooted in 1798! ­­­­­—
Image Source: https://ebooks.adelaide.edu.au/c/coleridge/samuel_taylor/rime/
DATA EXPLOSION
— Relation to Big/Fast Data —
Growing number of data sources
•  Mobile (Smartphones, Tablets, Smart Watches)
•  IoT
•  Wearables
Data collection has become a commodity
•  Millions of time series
DATA, DATA, EVERYWHERE
­­­­­— Non-trivial to Mine Actionable Insights ­­­­­—
DATA, DATA, EVERYWHERE
­­­­­— Time to Market, sinking ships – Can get “lonely” ­­­­­—
Image Source: https://ebooks.adelaide.edu.au/c/coleridge/samuel_taylor/rime/
1896 Olympics, Greece: Thomas Burke
Purpose and ability to act
Applying hard metrics and asking hard questions
“Advanced data analytics is a quintessential business matter.” [1]
[1] http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-data-analytics-work-for-you-instead-of-the-other-way-around, McKinsey, October 2016.
DATA, DATA, EVERYWHERE
­­­­­— Value-driving analytics, NOT pristine data sets, interesting patterns or killer algorithms ­­­­­—
BREAK IT DOWN
The impact of “big data” analytics is
often manifested by thousands—or
more—of incrementally small
improvements. If an organization can
atomize a single process into its
smallest parts and implement advances
where possible, the payoffs can be
profound. [1]
DATA, DATA, EVERYWHERE
­­­­­— Purpose and Action ­­­­­—
[1] http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-data-analytics-work-for-you-instead-of-the-other-way-around, McKinsey, October 2016.
ITERATE
“Victory often resulted from the way
decisions are made; the side that
reacts to situations more quickly and
processes new information more
accurately should prevail.”
DATA ANALYTICS
PERFORMANCE
KNEE POINT DETECTION
AVAILABILITY
ROOT CAUSE ANALYSIS
AUTOSCALING
EFFICIENCY
A/B TESTING
INTRUSION DETECTION
­­­­­— Not a Hype, But Not Trivial Either ­­­­­—
ANOMALY DETECTION
­­­­­— >100 years History ­­­­­—
Bessel and Baeyer 1838, Chauvenet 1863, Stone 1868, Wright 1884, Irwin 1925, Student 1927, Thompson 1935
“… present and exact rule for the rejection of observations,
which shall be legitimately derived from the fundamental
principles of Calculus of Probabilities.”
ANOMALY DETECTION
­­­­­— Prior to 1950s ­­­­­—
ANOMALY DETECTION
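Median - Wright 1599 (“Certaine Errors in Navigation”), Cournot 1843, Fechner 1874, Galton 1882, Edgeworth 1887
Median Absolute Deviation: MAD = median_i |X_i – median(X)|
Estimators Sn = c · median_i { median_j |X_i – X_j| } and Qn = d · { |X_i – X_j| ; i < j }_(k), the k-th order statistic of the pairwise distances, with k = C(h, 2), h = ⌊n/2⌋ + 1 (c and d are consistency constants)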
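A naive sketch of the three robust scale estimators, for illustration only. Assumptions: plain medians are used throughout, the O(n²) pairwise computation is acceptable, and Qn is taken as the k-th smallest pairwise distance as defined in “Alternatives to the Median Absolute Deviation” (1993) from the readings. Sn and Qn are returned unscaled; only the well-known normal-consistency factor 1.4826 for MAD is shown, as an optional argument.

```python
import math
import statistics
from itertools import combinations


def mad(xs, scale=1.0):
    """Median absolute deviation; pass scale=1.4826 for consistency at the normal."""
    center = statistics.median(xs)
    return scale * statistics.median([abs(x - center) for x in xs])


def sn_raw(xs):
    """Unscaled Sn: median over i of the median over j of |x_i - x_j|."""
    return statistics.median(
        [statistics.median([abs(xi - xj) for xj in xs]) for xi in xs]
    )


def qn_raw(xs):
    """Unscaled Qn: k-th smallest |x_i - x_j| (i < j), k = C(h, 2), h = n // 2 + 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    h = n // 2 + 1
    k = math.comb(h, 2)
    distances = sorted(abs(a - b) for a, b in combinations(xs, 2))
    return distances[k - 1]


data = [1, 2, 3, 4, 100]  # one gross outlier
print("stdev :", round(statistics.stdev(data), 2))
print("MAD   :", mad(data))
print("Sn    :", sn_raw(data))
print("Qn    :", qn_raw(data))
```

The single outlier inflates the standard deviation to well above 40, while the three robust estimators stay at 1 or 2.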
ROBUST STATISTICS
NORMAL
DIST
STATIONARITY CONTEXT
­­­­­— Check your assumptions­­­­­—
ANOMALY DETECTION
­­­­­— Early work post 1950 ­­­­­—
ANOMALY DETECTION
— Operations —
Increasing focus of monitoring solutions (RUM/Synthetic): DataDog, Catchpoint, Opsclarity, Ruxit
Netflix, Airbnb, Cloudera
EGADS: A Scalable, Configurable, and Novel Anomaly Detection System [1]
Introducing Practical and Robust Anomaly Detection in Time Series [2]
Identifying Outages with Argos, Uber Engineering’s Real-Time Monitoring and Root-Cause Exploration Tool [3]
Robust Anomaly Detection System for Real User Monitoring Data [4]
[1] https://research.yahoo.com/news/announcing-open-source-egads-scalable-configurable-and-novel-anomaly-detection-system, 2015 (https://github.com/yahoo/egads)
[2] https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series, 2015 (https://github.com/twitter/AnomalyDetection)
[3] https://eng.uber.com/argos/, 2015
[4] http://bit.ly/luminol-velocity, 2016 (https://github.com/linkedin/luminol)
ANOMALY DETECTION
AZURE [1] AWS [2] IBM Cloud [3]
­­­­­— As A Service On The Cloud ­­­­­—
[1] https://azure.microsoft.com/en-us/documentation/articles/machine-learning-apps-anomaly-detection
[2] https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda
[3] https://developer.ibm.com/recipes/tutorials/engage-machine-learning-for-detecting-anomalous-behaviors-of-things
ANOMALY DETECTION
SEASONALITY
Geographic
TREND
Growth
RESIDUAL
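One way to make the seasonality, trend and residual split concrete is a classical additive decomposition. The sketch below is an assumption-laden illustration, not the method of any tool named in the talk: it handles only odd periods, uses a centered moving average for the trend, and runs on a synthetic series with one injected spike.

```python
def decompose_additive(series, period):
    """Classical additive decomposition for an odd period.

    Returns (trend, seasonal, residual); trend and residual are None at the
    edges where the centered moving average is undefined.
    """
    if period % 2 == 0 or period < 3:
        raise ValueError("this sketch only handles odd periods >= 3")
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half : t + half + 1]) / period
    # Average the detrended values per phase, then center the pattern at zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    phase_means = [sum(b) / len(b) for b in buckets]
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic daily metric (assumed): linear growth + weekly pattern + one spike.
weekly = [3, 1, -2, -4, -1, 1, 2]
series = [0.5 * t + weekly[t % 7] for t in range(70)]
series[40] += 25

trend, seasonal, residual = decompose_additive(series, period=7)
valid = [t for t, r in enumerate(residual) if r is not None]
worst = max(valid, key=lambda t: abs(residual[t]))
print(f"largest residual at t={worst}: {residual[worst]:.2f}")
```

Once growth and the weekly cycle are removed, the injected spike is the largest residual, which is the component an anomaly detector would then test.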
­­­­­— Key Aspects ­­­­­—
ANOMALY DETECTION
— Finding needle in a haystack —
Finding outlier nodes
•  HADOOP CLUSTER
•  LOAD BALANCER
•  DISTRIBUTED DATABASE
Clustering Techniques
•  DBSCAN
•  OPTICS
•  K-MEANS/MEDOID
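A compact, quadratic-time sketch of DBSCAN applied to per-node metrics, where nodes that belong to no dense cluster are reported as outliers. The node names, the two features (CPU %, p99 latency in ms) and the `eps` / `min_pts` values are hypothetical choices for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical nodes: (CPU %, p99 latency in ms).
nodes = {
    "node-01": (40, 100), "node-02": (42, 105), "node-03": (38, 98),
    "node-04": (41, 110), "node-05": (39, 102), "node-06": (43, 99),
    "node-07": (95, 900),
}
names = list(nodes)
labels = dbscan([nodes[name] for name in names], eps=15, min_pts=3)
outliers = [name for name, label in zip(names, labels) if label == NOISE]
print("outlier nodes:", outliers)
```

In practice the features would usually be normalized first, since Euclidean distance lets the metric with the largest numeric range dominate.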
ANOMALY DETECTION
Time Domain VS Frequency Domain
Real-time Implications: Frequency-based techniques may incur additional overhead
­­­­­— Distinct Approaches ­­­­­—
Time series analysis
Statistical tests, Clustering
Fast Fourier Transform, DWT
Fractals, Filters (Kalman Filter)
ANOMALY DETECTION
­­­­­— Distinct Approaches ­­­­­—
UNSUPERVISED: Clustering. The common case in Operations
SUPERVISED: Decision Trees, SVM, Neural Networks. Higher accuracy, Bias-Variance Tradeoff
ANOMALY DETECTION
Complex Architectures: Hundreds of Microservices
Loss of productivity (TTD, TTR)
Impact on end-user experience
FALSE POSITIVE/NEGATIVE
TRUE POSITIVE
­­­­­— Too Many Alerts ­­­­­—
ANOMALY DETECTION
ACTIONABLE
PRODUCTIVITY
SEVERITY
PRIORITIZATION
ROOT CAUSE ANALYSIS
CORRELATION
­­­­­— Properties ­­­­­—
SOURCES
Data Collection Issues (network hiccups, bursty traffic, queue overflow – packet loss)
System Failures (bugs, hardware) – cascading effects
MISSING DATA
­­­­­— Often Overlooked During Analysis ­­­­­—
Makes analysis non-trivial
•  Methods are in general not prepared to handle them
Loss of efficiency
•  Fewer patterns extracted from data
•  Conclusions statistically less strong
Inference bias
•  Resulting from differences between missing and complete data
Larger standard error
•  Reduced sample size
MISSING DATA
­­­­­— A Large Amount of Prior Work ­­­­­—
MISSING DATA
MISSING COMPLETELY AT RANDOM (MCAR)
Does not depend on either the observed or missing data
MISSING AT RANDOM (MAR)
Depends on the observed data, but not on missing data
MISSING NOT AT RANDOM (MNAR)
Depends on missing/unobserved (latent) values
­­­­­— Characterization of Missing Values ­­­­­—
Cause
Random
Uncorrelated with variables
of interest
Most Common Assumption Factors
Correlation between cause
of missingness and
Variables of interest
Missingness of variable
of interest
Common Case!
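The three missingness mechanisms can be contrasted with a small simulation. Everything here is assumed for illustration: a synthetic `load` metric that is always observed, a `latency` metric that goes missing, and arbitrary drop probabilities. The regression adjustment at the end is one simple way to use the observed variable under the "missing at random" mechanism, not a method prescribed by the slides.

```python
import random
import statistics

random.seed(42)
N = 20000
# Hypothetical metrics: load is always observed, latency may be missing.
load = [random.gauss(0, 1) for _ in range(N)]
latency = [100 + 20 * x + random.gauss(0, 10) for x in load]
true_mean = statistics.fmean(latency)

# keep[i] is True when latency[i] is observed.
keep_mcar = [random.random() > 0.3 for _ in range(N)]                      # independent of everything
keep_mar = [not (x > 0.5 and random.random() < 0.8) for x in load]         # depends on observed load
keep_mnar = [not (y > 120 and random.random() < 0.8) for y in latency]     # depends on latency itself


def complete_case_mean(values, keep):
    return statistics.fmean([v for v, k in zip(values, keep) if k])


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


mean_mcar = complete_case_mean(latency, keep_mcar)
mean_mar = complete_case_mean(latency, keep_mar)
mean_mnar = complete_case_mean(latency, keep_mnar)

# Under the second mechanism the observed load explains the missingness,
# so a model fitted on complete cases can be applied to every row.
xs_obs = [x for x, k in zip(load, keep_mar) if k]
ys_obs = [y for y, k in zip(latency, keep_mar) if k]
slope, intercept = fit_line(xs_obs, ys_obs)
mean_mar_adjusted = statistics.fmean([intercept + slope * x for x in load])

print(f"true mean            {true_mean:7.2f}")
print(f"completely at random {mean_mcar:7.2f}")
print(f"at random, raw       {mean_mar:7.2f}")
print(f"at random, adjusted  {mean_mar_adjusted:7.2f}")
print(f"not at random        {mean_mnar:7.2f}")
```

Only the first mechanism leaves the complete-case mean unbiased; the second is recoverable from observed data, and the third is not without a model of the missingness itself.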
MISSING DATA
— TAXONOMY of METHODS —
Completely Recorded Units
•  Discard variables with missing data
•  Subset data to ignore missing data
Weighting Procedures
•  Differentially weigh the complete cases to adjust for bias
•  Weights are a function of response probability
Imputation Based
•  Assign a value to missing one
•  Leverage existing ML/data mining methods
Model Based
•  Define a model for the observed data
•  Inference based on the likelihood or posterior distribution under the model
MISSING VALUES
Transformations
Normalization
Prediction
­­­­­— Imputation Methods ­­­­­—
•  Mean Value
•  Last value
•  Mean of x_{t-1} and x_{t+1}
•  Most Common Value
•  Regression
•  Similarity
•  K-Nearest Neighbor
•  K-Means/Medoid
•  Fuzzy K-Means/Medoid
•  SVM, SVD
•  Event Covering
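The three simplest imputation methods listed (mean value, last value, mean of the two neighbors) can be sketched for a time series as follows. Missing points are represented as `None`; the function names and the sample series are hypothetical.

```python
import statistics


def impute_mean(series):
    """Replace every missing point with the mean of the observed points."""
    fill = statistics.mean([x for x in series if x is not None])
    return [fill if x is None else x for x in series]


def impute_last_value(series):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out


def impute_neighbor_mean(series):
    """Fill x_t with the mean of x_{t-1} and x_{t+1} when both are observed."""
    out = list(series)
    for t in range(1, len(series) - 1):
        if series[t] is None and series[t - 1] is not None and series[t + 1] is not None:
            out[t] = (series[t - 1] + series[t + 1]) / 2
    return out


series = [10, 12, None, 16, None, None, 22]
print(impute_mean(series))
print(impute_last_value(series))
print(impute_neighbor_mean(series))
```

The neighbor-mean variant leaves the two-point gap untouched, which shows why longer gaps need one of the other methods on the list.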
MISSING VALUES
Resampling Methods: Jackknifing (Quenouille, 1949) and Bootstrapping (Efron, 1979)
Based on large-sample theory
MI: Based on Bayesian theory and provides useful inferences for small samples
MULTIPLE IMPUTATION
­­­­­— Imputation Methods ­­­­­—
•  Replace each missing value by a vector of D ≥ 2 imputed values
•  Single imputation cannot reflect sampling variability under one model for missing data or uncertainty about the correct model for missing data
•  D complete-data inferences are combined to form one inference that reflects uncertainty due to missing data under that model
•  Throws light on sensitivity of inference to models for missing data
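The slide does not say how the D complete-data inferences are combined; a common choice, assumed here, is Rubin's rules: average the D point estimates, and add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/D). The function name and the sample numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine D complete-data estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    d = len(estimates)
    if d < 2 or d != len(variances):
        raise ValueError("need D >= 2 estimates with matching variances")
    pooled = statistics.mean(estimates)       # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (D - 1 denominator)
    total = within + (1 + 1 / d) * between
    return pooled, total


# Hypothetical results from D = 3 imputed data sets.
pooled, total = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(f"pooled estimate {pooled:.2f}, total variance {total:.3f}")
```

The between-imputation term is what a single imputation cannot supply: it is the part of the variance that reflects uncertainty about the missing values.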
MISSING VALUES
MICE: Chained Equations
(Sequential Regression Multiple Imputation)
Assumes missing data is MAR
Each variable with missing data is modeled conditional upon the other variables in the data
Learn a Bayesian Network using complete data and use it to simultaneously impute all missing values via abductive inference
Incremental imputation
MULTIVARIATE IMPUTATION
­­­­­— Imputation Methods ­­­­­—
Multivariate regression
Linear, Logistic, Poisson
Use of auxiliary variables – not used in the analysis (predictive of missingness) but can improve imputations
MISSING VALUES
Expectation-Maximization (EM)
Hartley 1958, Orchard and Woodbury 1972
Dempster et al. 1977
Converges reliably to a local maximum or a
saddle point
Slow to converge in presence of large number of
missing values
M step can be difficult (e.g., has no closed form)
MODEL BASED
­­­­­— Methods ­­­­­—
Expectation/Conditional Maximization (ECM) – two or more conditional (on parameters) maximization steps
Alternating Expectation/Conditional Maximization (AECM) – complete-data/actual log-likelihood
Parameter-Expanded EM (PX-EM) – include parameters whose values are known during maximization
Variational Bayes
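A small worked instance of EM with missing values, under assumptions chosen for illustration: a bivariate normal (x, y) where x is fully observed and some y values are missing depending on x. The E step fills in the expected sufficient statistics of each missing y given x; the M step re-estimates the moments, which has a closed form here. The data, names and iteration count are hypothetical.

```python
import random
import statistics


def em_bivariate(xs, ys, iterations=100):
    """EM estimates of (mu_y, s_xy, s_yy) when some ys are None and xs are complete."""
    n = len(xs)
    mu_x = statistics.fmean(xs)
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    observed = [y for y in ys if y is not None]
    # Start from complete-case moments and zero covariance.
    mu_y, s_yy, s_xy = statistics.fmean(observed), statistics.pvariance(observed), 0.0
    for _ in range(iterations):
        beta = s_xy / s_xx
        cond_var = s_yy - beta * s_xy  # Var[y | x] under current parameters
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:  # E step: expected y and y^2 given x
                ey = mu_y + beta * (x - mu_x)
                eyy = ey * ey + cond_var
            else:
                ey, eyy = y, y * y
            sum_y += ey
            sum_yy += eyy
            sum_xy += x * ey
        # M step: moments from the expected sufficient statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return mu_y, s_xy, s_yy


# Synthetic data (assumed): y = 5 + 2x + noise, so mu_y = 5, s_xy = 2, s_yy = 5.
random.seed(3)
xs = [random.gauss(0, 1) for _ in range(4000)]
ys = [5 + 2 * x + random.gauss(0, 1) for x in xs]
ys = [None if (x > 0 and random.random() < 0.7) else y for x, y in zip(xs, ys)]

complete_case_mean = statistics.fmean([y for y in ys if y is not None])
mu_y, s_xy, s_yy = em_bivariate(xs, ys)
print(f"complete-case mean of y: {complete_case_mean:.2f}")
print(f"EM estimates: mu_y={mu_y:.2f}, s_xy={s_xy:.2f}, s_yy={s_yy:.2f}")
```

Because missingness depends only on the observed x, the complete-case mean is biased low while the EM estimates land near the generating values.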
READINGS
— Research papers —
“IF I HAVE SEEN FURTHER, IT IS BY STANDING ON THE SHOULDERS OF GIANTS” (ISAAC NEWTON)
•  On Achieving Energy Efficiency and Reducing CO2 Footprint in Cloud Computing, 2015.
•  Fractal based Anomaly Detection Over Data Streams, 2013.
•  Techniques for Optimizing Cloud Footprint, 2011.
•  Alternatives to the Median Absolute Deviation, 1993.
•  Knee Point Detection on Bayesian Information Criterion, 2008.
•  Knee Point Search Using Cascading Top-k Search with Minimized Time Complexity, 2013.
•  Finding a ‘Kneedle’ in a Haystack: Detecting Knee Points in System Behavior, 2011.
•  NbClust R Package
READINGS
— Techniques —
•  Saturated Correlates (“Spider”) Model
•  Multiple Group SEM (Structural Equation Modeling)
•  Latent Transition Analysis
•  General Location Model
•  Extra Dependent Variable (DV) Model
•  Bayesian Principal Component Analysis
•  Local Least Squares Imputation
•  Full Information Maximum Likelihood (FIML)
COFFEE BREAK
— 50 minutes —

More Related Content

What's hot

Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudDatabricks
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Keynote 1 the rise of stream processing for data management &amp; micro serv...
Keynote 1  the rise of stream processing for data management &amp; micro serv...Keynote 1  the rise of stream processing for data management &amp; micro serv...
Keynote 1 the rise of stream processing for data management &amp; micro serv...Sabri Skhiri
 
Big image analytics for (Re-) insurer
 Big image analytics for (Re-) insurer Big image analytics for (Re-) insurer
Big image analytics for (Re-) insurerFlavio Trolese
 
Streaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameNumenta
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
Detecting Anomalies in Streaming Data
Detecting Anomalies in Streaming DataDetecting Anomalies in Streaming Data
Detecting Anomalies in Streaming DataSubutai Ahmad
 
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkEvaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkNumenta
 
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Amazon Web Services
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine LearningGraphAware
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareCarol McDonald
 
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...sparktc
 
How the growth of R helps data-driven organizations succeed
How the growth of R helps data-driven organizations succeedHow the growth of R helps data-driven organizations succeed
How the growth of R helps data-driven organizations succeedRevolution Analytics
 
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Artificial Intelligence Institute at UofSC
 
Getting Started with Numenta Technology
Getting Started with Numenta Technology Getting Started with Numenta Technology
Getting Started with Numenta Technology Numenta
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learningVinoth Kannan
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...confluent
 

What's hot (20)

An Analytics Platform for Connected Vehicles
An Analytics Platform for Connected VehiclesAn Analytics Platform for Connected Vehicles
An Analytics Platform for Connected Vehicles
 
I'm being followed by drones
I'm being followed by dronesI'm being followed by drones
I'm being followed by drones
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Keynote 1 the rise of stream processing for data management &amp; micro serv...
Keynote 1  the rise of stream processing for data management &amp; micro serv...Keynote 1  the rise of stream processing for data management &amp; micro serv...
Keynote 1 the rise of stream processing for data management &amp; micro serv...
 
Big image analytics for (Re-) insurer
 Big image analytics for (Re-) insurer Big image analytics for (Re-) insurer
Big image analytics for (Re-) insurer
 
Streaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same Game
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
Detecting Anomalies in Streaming Data
Detecting Anomalies in Streaming DataDetecting Anomalies in Streaming Data
Detecting Anomalies in Streaming Data
 
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkEvaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
 
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
 
How the growth of R helps data-driven organizations succeed
How the growth of R helps data-driven organizations succeedHow the growth of R helps data-driven organizations succeed
How the growth of R helps data-driven organizations succeed
 
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
 
Getting Started with Numenta Technology
Getting Started with Numenta Technology Getting Started with Numenta Technology
Getting Started with Numenta Technology
 
Real time machine learning
Real time machine learningReal time machine learning
Real time machine learning
 
GoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 SkinnedGoTo Amsterdam 2013 Skinned
GoTo Amsterdam 2013 Skinned
 
Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...Event Driven Architecture: Mistakes, I've made a few...
Event Driven Architecture: Mistakes, I've made a few...
 

Viewers also liked

Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterArun Kejariwal
 
Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsArun Kejariwal
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactArun Kejariwal
 
Anomaly Detection and You
Anomaly Detection and YouAnomaly Detection and You
Anomaly Detection and YouMary Kelly Rich
 
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...Forum One
 
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...MRAMidAtlanticChapter
 
Everyone is a Data Analyst Adobe EMEA Summit 2014
Everyone is a Data Analyst Adobe EMEA Summit 2014Everyone is a Data Analyst Adobe EMEA Summit 2014
Everyone is a Data Analyst Adobe EMEA Summit 2014Simon James
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceArun Kejariwal
 
Data, data, everywhere… - SEE UK - 2016
Data, data, everywhere… - SEE UK - 2016Data, data, everywhere… - SEE UK - 2016
Data, data, everywhere… - SEE UK - 2016TOPdesk
 
Digital Marketing Trends Disrupting Consumer Behavior v. 19
Digital Marketing Trends Disrupting Consumer Behavior v. 19Digital Marketing Trends Disrupting Consumer Behavior v. 19
Digital Marketing Trends Disrupting Consumer Behavior v. 19Kyle Lacy
 
Plotting your path to success in fundraising
Plotting your path to success in fundraisingPlotting your path to success in fundraising
Plotting your path to success in fundraisingTPP Recruitment
 
Q214 earnings presentation
Q214 earnings presentationQ214 earnings presentation
Q214 earnings presentationTextronCorp
 
How effective is the combination of your main question 2 evaluation
How effective is the combination of your main question 2 evaluationHow effective is the combination of your main question 2 evaluation
How effective is the combination of your main question 2 evaluationGrayce
 
Getting Started Blogging
Getting Started BloggingGetting Started Blogging
Getting Started BloggingTonia.Johnson
 

Viewers also liked (19)

Statistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ TwitterStatistical Learning Based Anomaly Detection @ Twitter
Statistical Learning Based Anomaly Detection @ Twitter
 
Real Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and SystemsReal Time Analytics: Algorithms and Systems
Real Time Analytics: Algorithms and Systems
 
Finding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impactFinding bad apples early: Minimizing performance impact
Finding bad apples early: Minimizing performance impact
 
Velocity 2015-final
Velocity 2015-finalVelocity 2015-final
Velocity 2015-final
 
Anomaly Detection and You
Anomaly Detection and YouAnomaly Detection and You
Anomaly Detection and You
 
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...
When Data is Everywhere, Where Do You Start?: Using Drupal to Manage, Distrib...
 
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...
Everyone Is an Analyst and Data Is Everywhere, But Research Has Never Been Ne...
 
Everyone is a Data Analyst Adobe EMEA Summit 2014
Everyone is a Data Analyst Adobe EMEA Summit 2014Everyone is a Data Analyst Adobe EMEA Summit 2014
Everyone is a Data Analyst Adobe EMEA Summit 2014
 
Days In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy serviceDays In Green (DIG): Forecasting the life of a healthy service
Days In Green (DIG): Forecasting the life of a healthy service
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Data, data, everywhere… - SEE UK - 2016
Data, data, everywhere… - SEE UK - 2016Data, data, everywhere… - SEE UK - 2016
Data, data, everywhere… - SEE UK - 2016
 
Astikh Etaireia
Astikh EtaireiaAstikh Etaireia
Astikh Etaireia
 
Make employees brand ambassador
Make employees brand ambassadorMake employees brand ambassador
Make employees brand ambassador
 
9 de febrero 2016 powerpoint
9 de febrero 2016 powerpoint9 de febrero 2016 powerpoint
9 de febrero 2016 powerpoint
 
Digital Marketing Trends Disrupting Consumer Behavior v. 19
Digital Marketing Trends Disrupting Consumer Behavior v. 19Digital Marketing Trends Disrupting Consumer Behavior v. 19
Digital Marketing Trends Disrupting Consumer Behavior v. 19
 
Plotting your path to success in fundraising
Plotting your path to success in fundraisingPlotting your path to success in fundraising
Plotting your path to success in fundraising
 
Q214 earnings presentation
Q214 earnings presentationQ214 earnings presentation
Q214 earnings presentation
 
How effective is the combination of your main question 2 evaluation
How effective is the combination of your main question 2 evaluationHow effective is the combination of your main question 2 evaluation
How effective is the combination of your main question 2 evaluation
 
Getting Started Blogging
Getting Started BloggingGetting Started Blogging
Getting Started Blogging
 

Similar to Data Data Everywhere: Not An Insight to Take Action Upon

The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.Theo Schlossnagle
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Sarah Aerni
 
Dynamics Day 2015: Systems of Intelligence in Action
Dynamics Day 2015: Systems of Intelligence in ActionDynamics Day 2015: Systems of Intelligence in Action
Dynamics Day 2015: Systems of Intelligence in ActionIntergen
 
Machine Learning + Analytics in Splunk
Machine Learning + Analytics in Splunk Machine Learning + Analytics in Splunk
Machine Learning + Analytics in Splunk Splunk
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Pete Burnap
 
Partner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataPartner webinar presentation aws pebble_treasure_data
Partner webinar presentation aws pebble_treasure_dataTreasure Data, Inc.
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6Rod Soto
 
Machine Learning and Analytics in Splunk
Machine Learning and Analytics in SplunkMachine Learning and Analytics in Splunk
Machine Learning and Analytics in SplunkSplunk
 
Machine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionMachine Learning and Analytics Breakout Session
Machine Learning and Analytics Breakout SessionSplunk
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Srinath Perera
 
Navy security contest-bigdataforsecurity
Navy security contest-bigdataforsecurityNavy security contest-bigdataforsecurity
Navy security contest-bigdataforsecuritystelligence
 
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera Breakfast: Advanced Analytics Part II: Do More With Your Data
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your DataCloudera, Inc.
 
Big Data and Machine Learning on AWS
Big Data and Machine Learning on AWSBig Data and Machine Learning on AWS
Big Data and Machine Learning on AWSCloudHesive
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationRaffael Marty
 
Echelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy WorkshopEchelon Asia Summit 2017 Startup Academy Workshop
Echelon Asia Summit 2017 Startup Academy WorkshopGarrett Teoh Hor Keong
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsOsman Ali
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 

Data Data Everywhere: Not An Insight to Take Action Upon

  • 1. DATA DATA EVERYWHERE: Not An Insight to Take Action Upon. ARUN KEJARIWAL
  • 2. WHAT’S UP WITH THE TITLE? (Metrics Arms Race) • “… scaling it to about two million distinct time series …” (Netflix) • “… highly accurate, real-time alerts on millions of system and business metrics …” (Uber) • “As we have hundreds of systems exposing multiple data items, the write data rate might easily exceed tens of millions of data points each second.” (Facebook) • “… we are taking tens of billions of measurements …” (Google) • “The Observability stack collects 170 million individual data metrics (time series) …” (Twitter) • “… serving over 50 million distinct time series.” (Spotify)
  • 3. WHAT’S UP WITH THE TITLE? (Metrics Arms Race) • >95% of metrics data is NEVER read! • Legacy instrumentation • Lack of understanding of how to use metrics. Latency: 10p, 20p, …, 90p, 95p, 99p, 99.9p, mean • Retention: “Hard disk is cheap”
  • 4. WHAT’S UP WITH THE TITLE? (Rooted in 1798!) INSPIRATION: “Rime of the Ancient Mariner” by Samuel Taylor Coleridge. “Water, water, every where, Nor any drop to drink.” Image Source: https://ebooks.adelaide.edu.au/c/coleridge/samuel_taylor/rime/
  • 5. DATA EXPLOSION (Relation to Big/Fast Data) Growing number of data sources • Mobile (Smartphones, Tablets, Smart Watches). Data collection has become a commodity • Millions of time series • IoT • Wearables. DATA, DATA, EVERYWHERE
  • 6. DATA, DATA, EVERYWHERE (Non-trivial to Mine Actionable Insights)
  • 7. DATA, DATA, EVERYWHERE (Time to Market, sinking ships: Can get “lonely”) Image Source: https://ebooks.adelaide.edu.au/c/coleridge/samuel_taylor/rime/
  • 8. DATA, DATA, EVERYWHERE (Value-driving analytics, NOT pristine data sets, interesting patterns or killer algorithms) 1896 Olympics, Greece: Thomas Burke. Purpose and ability to act. Applying hard metrics and asking hard questions. “Advanced data analytics is a quintessential business matter.” [1] [1] http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-data-analytics-work-for-you-instead-of-the-other-way-around, McKinsey, October 2016.
  • 9. DATA, DATA, EVERYWHERE (Purpose and Action) BREAK IT DOWN: The impact of “big data” analytics is often manifested by thousands, or more, of incrementally small improvements. If an organization can atomize a single process into its smallest parts and implement advances where possible, the payoffs can be profound. [1] ITERATE: “Victory often resulted from the way decisions are made; the side that reacts to situations more quickly and processes new information more accurately should prevail.” [1] http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-data-analytics-work-for-you-instead-of-the-other-way-around, McKinsey, October 2016.
  • 10. DATA ANALYTICS (Not a Hype, But Not Trivial Either) PERFORMANCE, KNEE POINT DETECTION, AVAILABILITY, ROOT CAUSE ANALYSIS, AUTOSCALING, EFFICIENCY, A/B TESTING, INTRUSION DETECTION
  • 11. ANOMALY DETECTION (>100 years History) Bessel and Baeyer 1838, Chauvenet 1863, Stone 1868, Wright 1884, Irwin 1925, Student 1927, Thompson 1935. “… present and exact rule for the rejection of observations, which shall be legitimately derived from the fundamental principles of Calculus of Probabilities.”
  • 12. ANOMALY DETECTION (Prior to 1950s)
  • 13. ANOMALY DETECTION (Prior to 1950s)
  • 14. ANOMALY DETECTION (Check your assumptions) ROBUST STATISTICS. Median: Wright 1599 (“Certaine Errors in Navigation”), Cournot 1843, Fechner 1874, Galton 1882, Edgeworth 1887. Median Absolute Deviation: median_i |X_i − median(X)|. Estimators S_n (median_i {median_j |X_i − X_j|}) and Q_n (first quartile of {|X_i − X_j|; i < j}). NORMAL DIST, STATIONARITY, CONTEXT
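As an illustration of the Median Absolute Deviation on slide 14, here is a minimal Python sketch that flags outliers with a MAD-based score. The 0.6745 scaling constant and the 3.5 cutoff are common conventions (the "modified z-score"), not values taken from the slides, and the latency samples are made up.

```python
import statistics


def mad(values):
    """Median absolute deviation: median(|x_i - median(x)|)."""
    center = statistics.median(values)
    return statistics.median([abs(x - center) for x in values])


def mad_outliers(values, threshold=3.5):
    """Return the points whose MAD-based score exceeds `threshold`.

    Both the 0.6745 factor and the default threshold are assumptions
    (a widely used convention), not part of the original deck.
    """
    center = statistics.median(values)
    scale = mad(values)
    if scale == 0:
        return []  # more than half the points are identical; score undefined
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [x for x in values if 0.6745 * abs(x - center) / scale > threshold]


# Hypothetical latency samples in milliseconds with one obvious spike.
latencies_ms = [101, 99, 102, 98, 100, 103, 97, 500]
print(mad(latencies_ms))           # 2.0
print(mad_outliers(latencies_ms))  # [500]
```

Unlike a mean and standard deviation, the median and MAD barely move when the 500 ms spike is added, which is why the spike still stands out.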
  • 15. ANOMALY DETECTION (Early work post 1950)
  • 16. ANOMALY DETECTION (>100 years History)
  • 17. ANOMALY DETECTION (>100 years History)
  • 18. ANOMALY DETECTION (Operations) Increasing focus of monitoring solutions (RUM/Synthetic): DataDog, Catchpoint, Opsclarity, Ruxit; Netflix, Airbnb, Cloudera. EGADS: A Scalable, Configurable, and Novel Anomaly Detection System [1]. Introducing Practical and Robust Anomaly Detection in Time Series [2]. Identifying Outages with Argos, Uber Engineering’s Real-Time Monitoring and Root-Cause Exploration Tool [3]. Robust Anomaly Detection System for Real User Monitoring Data [4]. [1] https://research.yahoo.com/news/announcing-open-source-egads-scalable-configurable-and-novel-anomaly-detection-system 2015 (https://github.com/yahoo/egads) [2] https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series, 2015 (https://github.com/twitter/AnomalyDetection) [3] https://eng.uber.com/argos/, 2015 [4] http://bit.ly/luminol-velocity, 2016 (https://github.com/linkedin/luminol)
  • 19. ANOMALY DETECTION (As A Service On The Cloud) AZURE [1], AWS [2], IBM Cloud [3]. [1] https://azure.microsoft.com/en-us/documentation/articles/machine-learning-apps-anomaly-detection [2] https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda [3] https://developer.ibm.com/recipes/tutorials/engage-machine-learning-for-detecting-anomalous-behaviors-of-things
  • 21. ANOMALY DETECTION (Finding a needle in a haystack) Finding outlier nodes: HADOOP CLUSTER, LOAD BALANCER, DISTRIBUTED DATABASE. Clustering Techniques: DBSCAN, OPTICS, K-MEANS/MEDOID
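To make "finding outlier nodes" with a clustering technique concrete, the following is a small pure-Python sketch of DBSCAN in which points labelled as noise are treated as outlier nodes. The per-node metrics, `eps`, and `min_pts` are hypothetical values chosen for illustration; in practice the features would usually be normalized first so that one metric does not dominate the distance.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point.

    Labels are cluster ids (0, 1, ...) or -1 for noise. Neighborhoods are
    found by brute force, so this is only suitable for small inputs.
    """
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual definition.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical (cpu %, p99 latency ms) readings, one tuple per node.
nodes = [(50, 20), (52, 21), (49, 19), (51, 22), (50, 21), (95, 80)]
labels = dbscan(nodes, eps=5.0, min_pts=3)
outlier_nodes = [n for n, label in zip(nodes, labels) if label == -1]
print(labels)         # [0, 0, 0, 0, 0, -1]
print(outlier_nodes)  # [(95, 80)]
```

A density-based method is a natural fit here because it does not need the number of clusters up front and it has an explicit notion of "noise".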
  • 22. ANOMALY DETECTION (Distinct Approaches) Time Domain VS Frequency Domain. Real-time Implications: frequency-based techniques may incur additional overhead. Time series analysis; Statistical tests, Clustering; Fast Fourier Transform, DWT; Fractals, Filters (Kalman Filter)
  • 23. ANOMALY DETECTION (Distinct Approaches) SUPERVISED: Higher accuracy, Bias-Variance Tradeoff; Decision Trees, SVM, Neural Networks. UNSUPERVISED: Clustering; the common case in Operations
  • 24. ANOMALY DETECTION (Too Many Alerts) Complex Architectures: Hundreds of Microservices. Loss of productivity (TTD, TTR). Impact on end-user experience. FALSE POSITIVE/NEGATIVE, TRUE POSITIVE
  • 25. ANOMALY DETECTION (Properties) ACTIONABLE, PRODUCTIVITY, SEVERITY, PRIORITIZATION, ROOT CAUSE ANALYSIS, CORRELATION
  • 26. MISSING DATA (Often Overlooked During Analysis) SOURCES: Data Collection Issues (network hiccups, bursty traffic, queue overflow, packet loss); System Failures (bugs, hardware) with cascading effects. Makes analysis non-trivial: methods are in general not prepared to handle them. Loss of efficiency: fewer patterns extracted from data; conclusions statistically less strong. Inference bias: resulting from differences between missing and complete data. Larger standard error: reduced sample size
  • 27. MISSING DATA (A Large Amount of Prior Work)
  • 28. MISSING DATA (A Large Amount of Prior Work)
  • 29. MISSING DATA (Characterization of Missing Values) MISSING COMPLETELY AT RANDOM: does not depend on either the observed or missing data. MISSING AT RANDOM: depends on the observed data, but not on missing data. MISSING NOT AT RANDOM: depends on missing/unobserved (latent) values. Cause: random, uncorrelated with variables of interest. Most Common Assumption. Factors: correlation between cause of missingness and variables of interest; missingness of variable of interest. Common Case!
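The practical difference between these missingness mechanisms can be seen in a short simulation. The sketch below is illustrative only: the distribution, drop probabilities, and the 110 cutoff are assumptions, chosen to mimic a metric whose large values (for example, timed-out requests) tend to go unreported.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Compare the mean of the full data with MCAR and MNAR samples of it."""
    rng = random.Random(seed)
    values = [rng.gauss(100.0, 15.0) for _ in range(n)]

    # MCAR: every point is dropped with the same probability (30%),
    # independent of its value.
    mcar = [v for v in values if rng.random() > 0.3]

    # MNAR: the chance of being dropped depends on the unobserved value
    # itself; here, values above 110 are lost 80% of the time.
    mnar = [v for v in values if not (v > 110.0 and rng.random() < 0.8)]

    return (statistics.mean(values),
            statistics.mean(mcar),
            statistics.mean(mnar))


full_mean, mcar_mean, mnar_mean = simulate()
print(f"full={full_mean:.1f} mcar={mcar_mean:.1f} mnar={mnar_mean:.1f}")
```

Under MCAR the observed mean stays close to the true mean and only the sample size shrinks; under MNAR the observed mean is biased downward, which is the "inference bias" named on slide 26.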
  • 30. MISSING DATA (TAXONOMY of METHODS) Procedures Based on Completely Recorded Units: discard variables with missing data; subset data to ignore missing data. Weighting Procedures: differentially weight the complete cases to adjust for bias; weights are a function of response probability. Imputation Based: assign a value to each missing one; leverage existing ML/data mining methods. Model Based: define a model for the observed data; inference based on the likelihood or posterior distribution under the model
  • 31. MISSING VALUES (Imputation Methods) Transformations, Normalization, Prediction. Mean Value; Last value; Mean of x_{t−1} and x_{t+1}; Most Common Value; Regression; Similarity; K-Nearest Neighbor; K-Means/Medoid; Fuzzy K-Means/Medoid; SVM, SVD; Event Covering
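The three simplest methods listed on slide 31 (mean value, last value, and the mean of x_{t−1} and x_{t+1}) can be sketched in a few lines of Python. Missing points are represented as `None`; the function names and the sample series are hypothetical.

```python
import statistics


def impute_mean(series):
    """Replace every missing point with the mean of the observed points."""
    fill = statistics.mean(x for x in series if x is not None)
    return [fill if x is None else x for x in series]


def impute_last_value(series):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out


def impute_neighbor_mean(series):
    """Fill a gap with the mean of x[t-1] and x[t+1] when both are observed."""
    out = list(series)
    for t in range(1, len(series) - 1):
        prev, nxt = series[t - 1], series[t + 1]
        if series[t] is None and prev is not None and nxt is not None:
            out[t] = (prev + nxt) / 2
    return out


# Hypothetical CPU utilization readings with one single gap and one double gap.
cpu = [10.0, None, 14.0, 16.0, None, None, 20.0]
print(impute_mean(cpu))           # [10.0, 15.0, 14.0, 16.0, 15.0, 15.0, 20.0]
print(impute_last_value(cpu))     # [10.0, 10.0, 14.0, 16.0, 16.0, 16.0, 20.0]
print(impute_neighbor_mean(cpu))  # [10.0, 12.0, 14.0, 16.0, None, None, 20.0]
```

The neighbor mean leaves the double gap unfilled because neither missing point has two observed neighbors, a reminder that each method makes different assumptions about the gaps it can handle.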
  • 32. MISSING VALUES (Imputation Methods) MULTIPLE IMPUTATION: Replace each missing value by a vector of D ≥ 2 imputed values. Single imputation cannot reflect sampling variability under one model for missing data or uncertainty about the correct model for missing data. D complete-data inferences are combined to form one inference that reflects uncertainty due to missing data under that model. Throws light on sensitivity of inference to models for missing data. Resampling Methods: Jackknifing (Quenouille, 1949) and Bootstrapping (Efron, 1979), based on large-sample theory. MI: based on Bayesian theory and provides useful inferences for small samples
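One standard way to combine the D complete-data inferences is Rubin's rules; the slide does not name the combining rule, so treating it as Rubin's rules is an assumption here. With per-imputation estimates Q_d and variances U_d, the pooled estimate is the mean of the Q_d, and the total variance is the mean of the U_d plus (1 + 1/D) times the between-imputation variance. A minimal sketch with made-up inputs:

```python
import statistics


def pool_estimates(estimates, variances):
    """Combine D complete-data results with Rubin's rules.

    estimates: point estimate from each imputed data set (Q_d)
    variances: squared standard error from each imputed data set (U_d)
    Returns (pooled_estimate, total_variance).
    """
    d = len(estimates)
    if d < 2 or d != len(variances):
        raise ValueError("need D >= 2 estimates with matching variances")
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)       # average sampling variance
    between = statistics.variance(estimates)  # spread across imputations (D - 1 divisor)
    total = within + (1 + 1 / d) * between
    return pooled, total


# Hypothetical results from D = 3 imputed data sets.
pooled, total = pool_estimates([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(pooled)  # 11.0
print(total)   # about 2.33: 1.0 within + (4/3) * 1.0 between
```

The between-imputation term is what a single imputation cannot provide: it is the part of the uncertainty that comes from not knowing the missing values.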
  • 33. MISSING VALUES (Imputation Methods) MULTIVARIATE IMPUTATION. MICE: Chained Equations (Sequential Regression Multiple Imputation); assumes missing data is MAR; each variable with missing data is modeled conditional upon the other variables in the data. Learn a Bayesian Network using complete data and use it to simultaneously impute all missing values via abductive inference. Incremental imputation. Multivariate regression: Linear, Logistic, Poisson. Use of auxiliary variables: not used in the analysis (predictive of missingness) but can improve imputations
  • 34. MISSING VALUES (Methods) MODEL BASED. Expectation-Maximization (EM): Hartley 1958, Orchard and Woodbury 1972, Dempster et al. 1977; converges reliably to a local maximum or a saddle point; slow to converge in presence of large number of missing values; M step can be difficult (e.g., has no closed form). Expectation/Conditional Maximization (ECM): two or more conditional (on parameters) maximization steps. Alternating Expectation/Conditional Maximization (AECM): complete-data/actual log-likelihood. Parameter-Expanded EM (PX-EM): include parameters whose values are known during maximization. Variational Bayes
  • 35. READINGS (Research papers) “IF I HAVE SEEN FURTHER, IT IS BY STANDING ON THE SHOULDERS OF GIANTS” ISAAC NEWTON. On Achieving Energy Efficiency and Reducing CO2 Footprint in Cloud Computing, 2015. Fractal based Anomaly Detection Over Data Streams, 2013. Techniques for Optimizing Cloud Footprint, 2011. Alternatives to the Median Absolute Deviation, 1993. Knee Point Detection on Bayesian Information Criterion, 2008. Knee Point Search Using Cascading Top-k Search with Minimized Time Complexity, 2013. Finding a ‘Kneedle’ in a Haystack: Detecting Knee Points in System Behavior, 2011. NbClust R Package
  • 36. READINGS (Techniques) Saturated Correlates (“Spider”) Model; Multiple Group SEM (Structural Equation Modeling); Latent Transition Analysis; General Location Model; Extra Dependent Variable (DV) Model; Bayesian Principal Component Analysis; Local Least Squares Imputation; Full Information Maximum Likelihood (FIML)
  • 37. COFFEE BREAK (50 minutes)