The Analytics Continuum
Rob Marano
7 May 2014
5/7/14
© 2014 The Hackerati, Inc. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
“What’s measured improves.”
Peter F. Drucker
“Knowledge has to be improved, challenged,
and increased constantly, or it vanishes.”
Peter F. Drucker
“When you develop your opinions on the basis
of weak evidence, you will have difficulty
interpreting subsequent information that
contradicts these opinions, even if this new
information is obviously more accurate.”
Nassim Nicholas Taleb
The Black Swan: The Impact of the Highly Improbable
Agenda
• Execution vs. search
• Balancing the “knowns” & “unknowns”
• Data here, there, everywhere …
• Machine learning as the foundation of analytics
• Visualization as the action stage of analytics
• Imminent opportunities
History of Analytics
Source: The Economic Times (India)
What drives the progression?
Why Consider Such an Investment?
• Like any innovation, right?
• Enable the business to gain
– Competitive advantage
– Cost cutting via productivity or automation
– Compliance
• But what about all that tech we already have?
Is change good for the bottom line?
Why Consider Such an Investment?
• Machine learning is used in
– Web search
– Spam filters
– Recommender systems
– Ad placement
– Credit scoring
– Fraud detection
– Stock trading
– Drug design
– and much more
Impact of “Startup Culture”
• The most successful businesses have
perfected execution
• They run operations with the highest level of
efficiency and effectiveness for the business
• Like any auto-assist or fully automated system,
the operations are modeled perfectly
Change is not considered a constant or asset
Impact of “Startup Culture”
• The most successful startups have turned
change into their advantage in searching for a niche
• Startups build solutions that anticipate
change, especially in how they use data to pivot
• Data & analytics form the core of managing change
Startups value change inherently
Impact of “Startup Culture”
• The startup community continues to be the
vendor of choice behind all modern analytics
• Google, Yahoo, Facebook, Twitter, etc … the
list goes on
• Google started this “analytics age” – open
source now dominates it
Any business has access to modern analytics
Knowns & Unknowns
• Knowledge & business strategy
– “Known knowns”
– “Known unknowns”
– “Unknown unknowns”
• Operations & strategy depend upon evidence
• Get the right info to the right person at the right time
(Big) Data Here, There, Everywhere
• Data drives every process, but much of it is not collected
• The more online, the more potential
• Advantages
– Competitive
– Productivity/efficiency
– Compliance
[Figure: pyramid with Data at the base, rising through Info and Knowledge to Wisdom at the top]
How Big Is “Big Data”?
What’s big for your department? Company?
Source: InfoChimps, “[Infographic] Taming Big Data from Wikibon”
Foundation of Analytics
• Historically, rigid data dictionaries provided
advantages via SQL and RDBMSs
• As compute and storage fell in cost and
deployment complexity, more data was processed
• Cost of infrastructure kept rising; the state of
the art did not keep pace
Big Data enables commodity analytics
Analytics Core
• Big Data
– Commodity computation & storage
– Modern computation framework
– Open, loose-coupling of components
• Machine learning
– Commodity knowledge discovery
• Delivered as a cost-effective service
IT Transition to Big Data Analytics
• Startup advantages lead to cost-effective
analysis of large quantities of data
• Traditional data warehouse solutions do not
effectively scale in either cost or productivity
• Growth of open source delivers both
New “open” vendors leading the way
Big Data as Enabler
Source: VMware Blog, “4 Key Architecture Considerations for Big Data Analytics”
Apache Hadoop as Epicenter
• Data Integration (Flume, Chukwa, Sqoop)
• Scripting (Pig)
• Distributed Storage (HDFS)
• Systems Management & Monitoring (Ambari, ZooKeeper)
• Workflow & Scheduling (Oozie)
• Database (HBase, Cassandra)
• Distributed Compute (MapReduce)
• Metadata Services (HCatalog)
• Query (Hive)
• Machine Learning (Mahout)
Source: Hortonworks, “About Hortonworks Data Platform”
The Hadoop Ecosystem
• Ambari: Deployment, configuration and monitoring
• Flume: Collection and import of log and event data
• HBase: Column-oriented database scaling to billions of rows
• HCatalog: Schema and data type sharing over Pig, Hive and MapReduce
• HDFS: Distributed redundant file system for Hadoop
• Hive: Data warehouse with SQL-like access
• Mahout: Library of machine learning and data mining algorithms
• MapReduce: Parallel computation on server clusters
• Pig: High-level programming language for Hadoop computations
• Oozie: Orchestration and workflow management
• Sqoop: Imports data from relational databases
• Whirr: Cloud-agnostic deployment of clusters
• ZooKeeper: Configuration management and coordination
Source: Edd Dumbill, “What is Apache Hadoop?”
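The MapReduce model in the list above can be illustrated in a single Python process. This is a minimal sketch under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the code mimics only the map, shuffle, and reduce data flow. It is not Hadoop's Java API and has none of Hadoop's distribution across nodes or fault tolerance.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data big analytics", "data here data there"]))
```

On a real cluster the map calls run in parallel on the nodes that hold each HDFS block, and the shuffle moves data over the network. The single-process version only shows why the model parallelizes: map sees one record at a time, and reduce sees one key at a time.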
So, what is Machine Learning?
• Non-trivial process of finding and
communicating “valid, novel, potentially
useful and understandable patterns in data.”1
• Delivers the engineering behind the science of
automated classification, categorization, and
recommendation without being explicitly
programmed
• Allows data to be transformed with relative
ease into actionable knowledge
ML powers today’s internet economies
1: Ciro Donalek, “Supervised & Unsupervised Learning”
Machine Learning as Enabler
• Open source, cloud computing, & startup
culture powered the rise of analytics
• Delivers powerful processing & results
• Figures out how to perform a largely
manual task by generalizing from examples
Tactics & strategy require evidence that learns
Learning – Human or Machine
• Learning is an iterative process that converges
• The ML “space” is huge and growing, so get a
handle on the intended mission objectives
– Representation
• Which set of classifiers can “it” learn; which features
– Evaluation
• Distinguish good from bad classifiers
– Optimization
• Search for the highest-scoring classifier
1: Pedro Domingos, “A Few Useful Things to Know about Machine Learning”
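A deliberately tiny learner makes Domingos's three components concrete. This sketch assumes one numeric feature and binary labels; the toy `DATA` and all function names are hypothetical. Representation is the family of single-threshold rules (decision stumps), evaluation is accuracy, and optimization is an exhaustive search over candidate thresholds.

```python
from itertools import product

# Hypothetical toy data: (feature value, label) pairs with labels 0/1.
DATA = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def make_stump(threshold, polarity):
    """Representation: the learner can only express 'x >= threshold' rules."""
    def classify(x):
        return polarity if x >= threshold else 1 - polarity
    return classify


def accuracy(classifier, data):
    """Evaluation: fraction of examples the classifier labels correctly."""
    return sum(classifier(x) == y for x, y in data) / len(data)


def learn_stump(data):
    """Optimization: exhaustive search for the highest-scoring stump.

    Candidate thresholds are the observed feature values; ties go to the
    first candidate found, because max() keeps the first maximal element.
    """
    thresholds = sorted({x for x, _ in data})
    best = max(product(thresholds, (0, 1)),
               key=lambda tp: accuracy(make_stump(*tp), data))
    return best, accuracy(make_stump(*best), data)


if __name__ == "__main__":
    print(learn_stump(DATA))
```

Changing any one component changes the learner. A richer representation (for example, trees of stumps), a different evaluation (for example, cost-weighted errors), or a smarter search (for example, greedy splitting) each yields a different algorithm.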
Analytics Starts With Data
[Figure: Ingestion → Conversion → Upload, from websites + web services]
Image Source: Research Live, “Order from Chaos”
and It Ends with Knowledge
[Figure: Aggregation → Analysis → Visualization]
Image Source: Visualize This by Nathan Yau
[Figure: pyramid with Data at the base, rising through Info and Knowledge to Wisdom at the top]
Taxonomy of ML
• ML converts data trends into logic to
automate data processing
• Based upon pattern recognition
• Basic goal is generalization
• Built upon two key techniques
– Supervised learning
– Unsupervised learning
Supervised Learning
• ML technique that learns a model from a training
data set of labeled examples with specific features
• The model is used to assess whether an input
belongs to a pre-defined class (or to predict a target value)
• Feature set extraction remains key to
supervised learning
• Popular examples include
– Regression
– Classification
– Outlier detection
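A nearest-centroid classifier illustrates the train-then-classify flow described above. It was chosen only because it is short; the deck does not single it out. The two-feature vectors and the "spam"/"ham" labels are hypothetical. Training reduces the labeled examples to one mean vector per class, and that dictionary of means is the model. Prediction assigns a new input to the class with the closest mean.

```python
from math import dist
from statistics import fmean


def train_nearest_centroid(examples):
    """Training: compute one centroid (mean feature vector) per class label.

    `examples` is a list of (feature_tuple, label) pairs.
    """
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Prediction: pick the class whose centroid is closest (Euclidean)."""
    return min(model, key=lambda label: dist(model[label], features))


if __name__ == "__main__":
    # Hypothetical features: (links in message, exclamation marks).
    training = [((0.0, 1.0), "ham"), ((1.0, 0.0), "ham"),
                ((8.0, 9.0), "spam"), ((9.0, 8.0), "spam")]
    model = train_nearest_centroid(training)
    print(model, predict(model, (7.0, 7.5)))
```

The quality of the prediction depends entirely on which features go into each tuple, which is why feature extraction is singled out above.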
Unsupervised Learning
• ML technique to group data according to
similar features, or characteristics
• The technique does not require training a model on
labeled examples; rather, similarity is calculated
• Popular examples include
– Clustering
– Density estimation
– Visualization by projection
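Grouping purely by computed similarity, with no labels, can be sketched as follows. This example assumes Euclidean distance as the similarity measure and a hypothetical `max_distance` cutoff. Points land in the same group when a chain of neighbours, each within the cutoff of the next, connects them. This is single-linkage clustering with a distance threshold, one of several possible choices.

```python
from math import dist


def group_by_similarity(points, max_distance):
    """Return groups of point indices. Two points share a group when they
    are linked by a chain of neighbours no farther apart than max_distance."""
    groups = []
    unassigned = list(range(len(points)))
    while unassigned:
        frontier = [unassigned.pop(0)]  # seed a new group
        group = []
        while frontier:
            i = frontier.pop()
            group.append(i)
            # Every still-unassigned point close enough to i joins this group.
            near = [j for j in unassigned
                    if dist(points[i], points[j]) <= max_distance]
            for j in near:
                unassigned.remove(j)
            frontier.extend(near)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
    print(group_by_similarity(pts, 1.5))
```

No training step is involved. The outcome depends only on the distance function and the cutoff, so both deserve as much scrutiny as the data itself.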
Most Important Step in ML
• “Know thine data like thyself”
– Know features about your data in order to narrow
the algorithm selection process
– Are the features nominal or continuous?
– Are there missing values in the features?
– If missing values, where are they missing?
– Are there outliers in the data?
– Are you looking for something that occurs very
infrequently?
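Several of the questions above can be answered mechanically for one column of data. This is a hypothetical helper, not from the source, and it makes two assumptions. Missing values are represented as `None`. A column counts as "continuous" when every present value is numeric, which is a heuristic: integer codes may really be nominal. Outliers are flagged with the common 1.5 × IQR fence rule.

```python
from statistics import quantiles


def profile_column(values):
    """Summarise one feature column: its kind, where values are missing,
    and (for numeric columns) which values fall outside the 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    summary = {
        # Positions of missing values, to show *where* they are missing.
        "missing_at": [i for i, v in enumerate(values) if v is None],
        # Heuristic: numeric-only columns are treated as continuous.
        "kind": "continuous"
        if present and all(isinstance(v, (int, float)) for v in present)
        else "nominal",
    }
    if summary["kind"] == "continuous" and len(present) >= 2:
        q1, _, q3 = quantiles(present, n=4)  # quartiles of present values
        fence = 1.5 * (q3 - q1)
        summary["outliers"] = [v for v in present
                               if v < q1 - fence or v > q3 + fence]
    return summary


if __name__ == "__main__":
    ages = [34, 29, None, 41, 38, 35, None, 250]
    print(profile_column(ages))
    print(profile_column(["red", "blue", None]))
```

The question about rare events is easier to answer for the target column: count how often each label occurs before choosing an algorithm or an evaluation measure.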
Choosing the ML Algorithm
• Know your data inside out & back again
• Consider the goal
• Use unsupervised learning unless you need to predict
specific target values; then use supervised
• Choose a set of algos matched to goal/data
• Try each algorithm, assess and compare
• Adjust and combine optimization techniques
• Choose, operate, and continually measure
• Repeat
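The "try each algorithm, assess and compare" step can be sketched as a loop over candidate learners scored on held-out data. This example makes several assumptions. The two candidates (a majority-class baseline and a 1-nearest-neighbour classifier) and the function names are hypothetical. Each candidate is a function that takes training pairs and returns a predictor. A single hold-out split stands in for a more careful sampling method.

```python
from collections import Counter
from math import dist


def majority_baseline(train):
    """Candidate 1: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nearest_neighbor(train):
    """Candidate 2: predict the label of the closest training example."""
    return lambda x: min(train, key=lambda ex: dist(ex[0], x))[1]


def compare(candidates, train, held_out):
    """Fit every candidate on `train`, score accuracy on `held_out`,
    and return (name, accuracy) pairs, best first."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        scores[name] = sum(model(x) == y for x, y in held_out) / len(held_out)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((5, 6), "b")]
    held_out = [((0.5, 0.5), "a"), ((5, 5.5), "b"), ((6, 5), "b")]
    print(compare({"majority": majority_baseline,
                   "1-NN": one_nearest_neighbor}, train, held_out))
```

A trivial baseline such as the majority classifier is worth including in any comparison. If a sophisticated algorithm barely beats it, the features or the goal probably need another look.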
Generalized ML Application Steps
• Collect data
• Prepare the input data
• Analyze input data & features
• Train the algorithm (if supervised)
• Test the algorithm with fresh data
• Operate ML
• Detect subtle changes to data (cycles, seasons)
• Measure for performance
• Repeat as frequently as needed
Portions sourced: Machine Learning in Action by Peter Harrington, Manning Publications
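The "detect subtle changes to data" step can start with a simple mean-shift check, sketched here under explicit assumptions. The incoming values are assumed to be roughly independent. The spread of a reference window is used to compute a z-score for the mean of a recent window. The name `drift_detected` and the default threshold of 3 standard errors are hypothetical choices. For seasonal data, the reference window should come from the same season or cycle.

```python
from statistics import fmean, stdev


def drift_detected(reference, recent, z_threshold=3.0):
    """Return True when the mean of `recent` lies more than `z_threshold`
    standard errors from the mean of `reference` (a simple mean-shift test)."""
    ref_mean = fmean(reference)
    # Standard error of a mean over len(recent) points, using reference spread.
    standard_error = stdev(reference) / len(recent) ** 0.5
    if standard_error == 0:
        # Constant reference: any change in the mean counts as drift.
        return fmean(recent) != ref_mean
    return abs(fmean(recent) - ref_mean) / standard_error > z_threshold


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
    print(drift_detected(baseline, [10, 11, 9, 10]))
    print(drift_detected(baseline, [14, 15, 13, 14]))
```

This check only catches shifts in the mean. Changes in variance or in the relationship between features and labels need other tests, and a detected shift is a prompt to re-measure and possibly retrain, as the steps above suggest.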
Highlights of Supervised Algos
• Generalized Linear Models
– Bayesian Regression
– Ordinary least squares (regression)
• Support Vector Machines
• K Nearest Neighbors
• Naïve Bayes
• Decision Trees
• Neural Networks
• Ensemble Methods
Portions sourced: “Supervised Learning” from scikit-learn.org
Highlights of Unsupervised Algos
• Clustering
– K-means
– DBSCAN
• Hidden Markov Models
• Density Estimation
• Neural Networks (restricted Boltzmann machines)
Portions sourced: “Unsupervised Learning” from scikit-learn.org
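K-means, listed above, is short enough to sketch with the standard library. This is plain Lloyd's iteration, and it assumes the caller supplies the initial centroids; real libraries typically use smarter seeding such as k-means++. The name `k_means` and its parameters are hypothetical, and `iterations` must be at least 1.

```python
from math import dist
from statistics import fmean


def k_means(points, initial_centroids, iterations=100):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points.

    Returns (centroids, clusters), where clusters come from the last
    assignment step.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute centroids; an empty cluster keeps its old one.
        new_centroids = [
            tuple(fmean(column) for column in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
    print(k_means(pts, [(0, 0), (10, 10)]))
```

Unlike the threshold-based grouping sometimes used for exploration, k-means needs the number of clusters up front (here, the length of `initial_centroids`), and different starting centroids can lead to different results.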
Learning -> Evaluation
• The Classifier Evaluation Framework
– Choice of Learning Algorithm(s)
– Datasets Selection
– Error-Estimation/Sampling Method
– Performance Measure of Interest
– Statistical Test
– Perform Evaluation
[Figure legend; the arrows themselves were lost: one arrow type means knowledge of step 1 is necessary for step 2, the other means feedback from step 1 should be used to adjust step 2]
Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
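k-fold cross-validation is one common choice for the "Error-Estimation/Sampling Method" box in the framework above. This sketch makes several assumptions. The data are already shuffled, because folds are contiguous blocks. `fit` is any hypothetical function that takes a list of (features, label) pairs and returns a predictor. The error rate is the performance measure. `k_fold_indices` and `cross_validated_error` are hypothetical names.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds.

    Assumes the samples were shuffled beforehand.
    """
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_error(fit, data, k=5):
    """Average error rate over k folds; each fold is held out exactly once."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        wrong = sum(model(data[i][0]) != data[i][1] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(list(k_fold_indices(5, 2)))
```

Per-fold error rates, rather than only their average, are what a later statistical test would typically consume when comparing two algorithms on the same folds.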
Overview of Performance Measures
• All Measures
– Confusion Matrix
– Additional Info (Classifier Uncertainty, Cost Ratio, Skew)
– Alternate Information
• Deterministic Classifiers
– Multi-class Focus, No Chance Correction: Accuracy, Error Rate
– Multi-class Focus, Chance Correction: Cohen’s Kappa, Fleiss’ Kappa
– Single-class Focus: TP/FP Rate, Precision/Recall, Sens./Spec., F-measure, Geom. Mean, Dice
• Scoring Classifiers
– Graphical Measures: ROC Curves, PR Curves, DET Curves, Lift Charts, Cost Curves
– Summary Statistic: AUC, H Measure, Area under ROC-Cost Curve
• Continuous and Prob. Classifiers (Reliability Metrics)
– Distance/Error Measures: RMSE
– Information Theoretic Measures: KL Divergence, K&B IR, BIR
• Alternate Information
– Interestingness, Comprehensibility, Multi-criteria
Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
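Cohen's kappa, one of the chance-corrected measures above, compares observed agreement p_o with the agreement p_e expected if both label sequences were independent: κ = (p_o − p_e) / (1 − p_e). This sketch applies it to true versus predicted labels. The function name is hypothetical, and kappa is undefined when p_e = 1 (both sequences use a single identical label).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance both pick class c, summed over classes.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    actual = ["y", "y", "n", "n"]
    predicted = ["y", "n", "n", "n"]
    print(cohens_kappa(actual, predicted))
```

Here accuracy is 0.75, but half of that agreement would be expected by chance, so kappa reports a more modest 0.5.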
Confusion Matrix-Based
Performance Measures
• Multi-Class Focus:
– Accuracy = (TP+TN)/(P+N)
• Single-Class Focus:
– Precision = TP/(TP+FP)
– Recall = TP/P
– Fallout = FP/N
– Sensitivity = TP/(TP+FN)
– Specificity = TN/(FP+TN)
Confusion Matrix
                         True class
Hypothesized class       Pos          Neg
Yes                      TP           FP
No                       FN           TN
                         P=TP+FN      N=FP+TN
Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
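The confusion-matrix formulas above translate directly into code. In this sketch, `confusion_counts` and `measures` are hypothetical names, and the caller chooses which label is "positive". Each division assumes its denominator is non-zero (for example, precision is undefined when nothing is predicted positive).

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, FN, TN) for a binary view of the labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def measures(tp, fp, fn, tn):
    """Confusion-matrix measures, using P = TP + FN and N = FP + TN."""
    p, n = tp + fn, fp + tn
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),
        "recall": tp / p,                    # same quantity as sensitivity
        "fallout": fp / n,                   # false positive rate
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
    }


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 0],
                              [1, 1, 0, 1, 0, 0, 0, 0], positive=1)
    print(counts, measures(*counts))
```

The example shows why single-class measures matter: accuracy is 0.75, yet only two of the three positives are found (recall 2/3), which is easy to miss when positives are rare.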
Tying It All Together
Visualization as Action
Imminent Opportunities
• Any business with high volume of data
– Look at processes, human-machine interfaces
– Sentiment; Customer Experience; Campaigns
– Infosec; Network Services; Customer Churn
• Sectors becoming analytics-ready
– Healthcare; Government; Retail
– Manufacturing; Utilities
• Imagine a world of the Internet of Things?
Can you imagine keeping all data? Analyze it?
Analytics
• Big Data
– Commodity compute & storage
• Analytics
– Commodity intelligence
• Big Data Analytics
– Store everything
– Analyze everything
– Do it every day
Cost effectively manage “unknown unknowns”
“Know the enemy and know yourself; in a
hundred battles you will never be in peril.”
Sun Tzu
“It’s no longer hard to find the answer to a given
question; the hard part is finding the right
question, and as questions evolve, we gain
better insight into our own ecosystem and our
business.”
Kevin Weil
Director of Product for Revenue
Twitter
The Analytics Continuum
Rob Marano
rob@thehackerati.com
7 May 2014
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

The Analytics Continuum

  • 1. The Analytics Continuum
      Rob Marano, 7 May 2014
      © 2014 The Hackerati, Inc. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • 2. “What’s measured improves.”
      Peter F. Drucker
  • 3. “Knowledge has to be improved, challenged, and increased constantly, or it vanishes.”
      Peter F. Drucker
  • 4. “When you develop your opinions on the basis of weak evidence, you will have difficulty interpreting subsequent information that contradicts these opinions, even if this new information is obviously more accurate.”
      Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable
  • 5. Agenda
      • Execution vs. search
      • Balancing the “knowns” & “unknowns”
      • Data here, there, everywhere …
      • Machine learning as foundation to analytics
      • Visualization as action to analytics
      • Imminent opportunities
  • 6. History of Analytics
      What drives the progression?
      Source: The Economic Times (India)
  • 7. Why Consider Such an Investment?
      • Like any innovation, right?
      • Enable the business to gain
          – Competitive advantage
          – Cost cutting via productivity or automation
          – Compliance
      • But what about all that tech we already have?
      Is change good for the bottom line?
  • 8. Why Consider Such an Investment?
      • Machine learning is used in
          – Web search
          – Spam filters
          – Recommender systems
          – Ad placement
          – Credit scoring
          – Fraud detection
          – Stock trading
          – Drug design
          – and much more
  • 9. Impact of “Startup Culture”
      • The most successful businesses have perfected execution
      • They run operations with the highest level of efficiency and effectiveness for the business
      • Like any auto-assist or fully automated system, their operations are modeled perfectly
      Change is not considered a constant or an asset
  • 10. Impact of “Startup Culture”
      • The most successful startups have made change their advantage in searching for a niche
      • Startups build solutions that anticipate change, especially in how they use data to pivot
      • Data & analytics form the core of managing change
      Startups value change inherently
  • 11. Impact of “Startup Culture”
      • The startup community continues to be the vendor of choice behind all modern analytics
      • Google, Yahoo, Facebook, Twitter, etc.; the list goes on
      • Google started this “analytics age”; open source now dominates it
      Any business has access to modern analytics
  • 12. Knowns & Unknowns
      • Knowledge & business strategy
          – “Known knowns”
          – “Known unknowns”
          – “Unknown unknowns”
      • Operations & strategy depend upon evidence
      • Get the right info to the right person at the right time
  • 13. (Big) Data Here, There, Everywhere
      • Data runs through every process but is not collected
      • The more that is online, the more potential
      • Advantages
          – Competitive
          – Productivity/efficiency
          – Compliance
      Diagram: pyramid of Data, Info, Knowledge, Wisdom (bottom to top)
  • 14. How Big Is “Big Data”?
      What’s big for your department? Company?
      Source: InfoChimps, “[Infographic] Taming Big Data from Wikibon”
  • 15. Foundation of Analytics
      • Historically, rigid data dictionaries provided advantages via SQL and RDBMS
      • As compute and storage dropped in cost and deployment complexity, more data was processed
      • The cost of infrastructure kept rising; the state of the art was not keeping pace
      Big Data enables commodity analytics
  • 16. Analytics Core
      • Big Data
          – Commodity computation & storage
          – Modern computation framework
          – Open, loosely coupled components
      • Machine learning
          – Commodity knowledge discovery
      • Delivered as a cost-effective service
  • 17. IT Transition to Big Data Analytics
      • Startup advantages lead to cost-effective analysis of large quantities of data
      • Traditional data warehouse solutions do not scale effectively in either cost or productivity
      • Growth of open source delivers both
      New “open” vendors are leading the way
  • 18. Big Data as Enabler
      Source: VMware Blog, “4 Key Architecture Considerations for Big Data Analytics”
  • 19. (no text on this slide)
  • 20. Apache Hadoop as Epicenter
      Diagram of Hadoop components:
          – Data integration (Flume, Chukwa, Sqoop)
          – Scripting (Pig)
          – Distributed storage (HDFS)
          – Systems management & monitoring (Ambari, Zookeeper)
          – Workflow & scheduling (Oozie)
          – Database (HBase, Cassandra)
          – Distributed compute (MapReduce)
          – Metadata services (HCatalog)
          – Query (Hive)
          – Machine learning (Mahout)
      Source: Hortonworks, “About Hortonworks Data Platform”
  • 21. The Hadoop Ecosystem
      • Ambari: deployment, configuration and monitoring
      • Flume: collection and import of log and event data
      • HBase: column-oriented database scaling to billions of rows
      • HCatalog: schema and data type sharing over Pig, Hive and MapReduce
      • HDFS: distributed redundant file system for Hadoop
      • Hive: data warehouse with SQL-like access
      • Mahout: library of machine learning and data mining algorithms
      • MapReduce: parallel computation on server clusters
      • Pig: high-level programming language for Hadoop computations
      • Oozie: orchestration and workflow management
      • Sqoop: imports data from relational databases
      • Whirr: cloud-agnostic deployment of clusters
      • Zookeeper: configuration management and coordination
      Source: Edd Dumbill, “What is Apache Hadoop?”
  • 22. So, what is Machine Learning?
      • A non-trivial process of finding and communicating “valid, novel, potentially useful and understandable patterns in data.” [1]
      • Delivers the engineering behind the science of automated classification, categorization, and recommendation without being explicitly programmed
      • Allows data to be transformed with relative ease into actionable knowledge
      ML powers today’s internet economies
      [1] Ciro Donalek, “Supervised & Unsupervised Learning”
  • 23. Machine Learning as Enabler
      • Open source, cloud computing, & startup culture powered the rise of analytics
      • Delivers powerful processing & results
      • Figures out how to perform a particular manual task by generalizing from examples
      Tactics & strategy require evidence that learns
  • 24. Learning: Human or Machine
      • Learning is an iterative process that converges
      • The ML “space” is huge and growing, but get a handle on the intended mission objectives
          – Representation: which set of classifiers can “it” learn; which features to use
          – Evaluation: distinguish good classifiers from bad ones
          – Optimization: find the highest-scoring classifier
      Source: Pedro Domingos, “A Few Useful Things to Know about Machine Learning”
  • 25. Analytics Starts With Data
      Diagram labels: websites + web svcs; ingestion; conversion; upload
      Image source: Research Live, “Order from Chaos”
  • 26. … and It Ends with Knowledge
      Diagram labels: aggregation; analysis; visualization; pyramid of Data, Info, Knowledge, Wisdom
      Image source: Visualize This by Nathan Yau
  • 27. Taxonomy of ML
      • ML converts data trends into logic to automate data processing
      • Based upon pattern recognition
      • Basic goal is generalization
      • Built upon two key techniques
          – Supervised learning
          – Unsupervised learning
  • 28. Supervised Learning
      • ML technique that takes a labeled training data set with specific features and produces a model
      • The model is used to assess whether an input belongs to a pre-defined class
      • The key to supervised learning remains feature set extraction
      • Popular examples include
          – Regression
          – Classification
          – Outlier detection
  • 29. Unsupervised Learning
      • ML technique to group data according to similar features, or characteristics
      • Such a technique does not require a model to be generated from labeled examples; rather, similarity is calculated
      • Popular examples include
          – Clustering
          – Density estimation
          – Visualization by projection
  • 30. Most Important Step in ML
      • “Know thine data like thyself”
          – Know the features of your data in order to narrow the algorithm selection process
          – Are the features nominal or continuous?
          – Are there missing values in the features?
          – If there are missing values, where are they missing?
          – Are there outliers in the data?
          – Are you looking for something that occurs very infrequently?
  • 31. Choosing the ML Algorithm
      • Know your data inside out & back again
      • Consider the goal
      • Use unsupervised learning unless you need to predict specific target values; then use supervised learning
      • Choose a set of algorithms matched to the goal and data
      • Try each algorithm, assess and compare
      • Adjust and combine optimization techniques
      • Choose, operate, and continually measure
      • Repeat
  • 32. Generalized ML Application Steps
      • Collect data
      • Prepare the input data
      • Analyze input data & features
      • Train the algorithm (if supervised)
      • Test the algorithm with fresh data
      • Operate ML
      • Detect subtle changes to data (cycles, seasons)
      • Measure performance
      • Repeat as often as needed
      Portions sourced: Machine Learning in Action by Peter Harrington, Manning Publications
  • 33. Highlights of Supervised Algos
      • Generalized linear models
          – Bayesian regression
          – Ordinary least squares (regression)
      • Support vector machines
      • K nearest neighbors
      • Naïve Bayes
      • Decision trees
      • Neural networks
      • Ensemble methods
      Portions sourced: “Supervised Learning” from scikit-learn.org
  • 34. Highlights of Unsupervised Algos
      • Clustering
      • K-means
      • DBSCAN
      • Hidden Markov models
      • Density estimation
      • Neural networks (restricted Boltzmann machines)
      Portions sourced: “Supervised Learning” from scikit-learn.org
  • 35. Learning → Evaluation
      The Classifier Evaluation Framework (diagram steps):
          – Choice of learning algorithm(s)
          – Dataset selection
          – Error-estimation/sampling method
          – Performance measure of interest
          – Statistical test
          – Perform evaluation
      Legend: one arrow type means knowledge of step 1 is necessary for step 2; the other means feedback from step 1 should be used to adjust step 2.
      Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
  • 36. Overview of Performance Measures
      Taxonomy diagram of all measures:
      • Confusion matrix (deterministic classifiers)
          – Multi-class focus, no chance correction: accuracy, error rate
          – Multi-class focus, chance correction: Cohen’s kappa, Fleiss’ kappa
          – Single-class focus: TP/FP rate, precision/recall, sensitivity/specificity, F-measure, geometric mean, Dice
      • Additional information (classifier uncertainty, cost ratio, skew)
          – Scoring classifiers, graphical measures: ROC curves, PR curves, DET curves, lift charts, cost curves
          – Scoring classifiers, summary statistics: AUC, H measure, area under the ROC-cost curve
          – Continuous and probabilistic classifiers (reliability metrics): distance/error and information-theoretic measures, e.g., KL divergence, K&B IR, BIR, RMSE
      • Alternate information: interestingness, comprehensibility, multi-criteria
      Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
  • 37. Confusion Matrix-Based Performance Measures
      • Multi-class focus:
          – Accuracy = (TP+TN)/(P+N)
      • Single-class focus:
          – Precision = TP/(TP+FP)
          – Recall = TP/P
          – Fallout = FP/N
          – Sensitivity = TP/(TP+FN)
          – Specificity = TN/(FP+TN)
      Confusion matrix:
                                   True class
          Hypothesized class       Pos          Neg
          Yes                      TP           FP
          No                       FN           TN
                                   P=TP+FN      N=FP+TN
      Source: “Performance Evaluation of Machine Learning Algorithms” by Mohak Shah & Nathalie Japkowicz
  • 38. Tying It All Together
  • 39. Visualization as Action
  • 40. Imminent Opportunities
      • Any business with a high volume of data
          – Look at processes, human-machine interfaces
          – Sentiment; Customer Experience; Campaigns
          – Infosec; Network Services; Customer Churn
      • Sectors becoming analytics-ready
          – Healthcare; Government; Retail
          – Manufacturing; Utilities
      • Imagine a world of the Internet of Things?
      Can you imagine keeping all that data? Analyzing it?
  • 41. Analytics
      • Big Data
          – Commodity compute & storage
      • Analytics
          – Commodity intelligence
      • Big Data Analytics
          – Store everything
          – Analyze everything
          – Do it every day
      Cost-effectively manage “unknown unknowns”
  • 42. “Know the enemy and know yourself; in a hundred battles you will never be in peril.”
      Sun Tzu
  • 43. “It’s no longer hard to find the answer to a given question; the hard part is finding the right question, and as questions evolve, we gain better insight into our own ecosystem and our business.”
      Kevin Weil, Director of Product for Revenue, Twitter
  • 44. The Analytics Continuum
      Rob Marano, rob@thehackerati.com
      7 May 2014
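
Slide 24 (expanded in editor's note 8) splits any learner into representation, evaluation, and optimization. The sketch below makes the three parts concrete for about the simplest case available, assuming one numeric feature and 0/1 labels: the representation is the set of single-threshold rules ("decision stumps"), the evaluation is training-set accuracy, and the optimization is an exhaustive search over candidate thresholds. The names `predict`, `accuracy`, and `fit_stump` are hypothetical, not taken from the talk or from Domingos' paper.

```python
from typing import List, Tuple

# Representation (hypothetical): a stump is (threshold, polarity) and predicts
# 1 when polarity * (x - threshold) > 0, otherwise 0.
Stump = Tuple[float, int]


def predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return 1 if polarity * (x - threshold) > 0 else 0


def accuracy(stump: Stump, xs: List[float], ys: List[int]) -> float:
    """Evaluation: fraction of labeled examples the stump gets right."""
    hits = sum(predict(stump, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def fit_stump(xs: List[float], ys: List[int]) -> Tuple[Stump, float]:
    """Optimization: exhaustive search for the highest-scoring stump.

    Candidate thresholds are one value below the minimum plus the midpoints
    between consecutive distinct feature values; both polarities are tried.
    """
    values = sorted(set(xs))
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    best: Stump = (candidates[0], 1)
    best_score = -1.0
    for threshold in candidates:
        for polarity in (1, -1):
            score = accuracy((threshold, polarity), xs, ys)
            if score > best_score:  # strict: ties keep the earlier candidate
                best, best_score = (threshold, polarity), score
    return best, best_score


xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
print(fit_stump(xs, ys))  # ((4.5, 1), 1.0)
```

Swapping any one part (say, a richer representation such as two-feature rules, or a cost-weighted evaluation) changes the learner without touching the other two, which is the point of the decomposition.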
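
Slide 28 describes supervised learning as turning a labeled training set into a model that then assigns new inputs to a predefined class. The slide names no specific algorithm, so as a stand-in this minimal sketch uses a nearest-centroid classifier: training averages the feature vectors of each class, and prediction picks the class whose average is closest in Euclidean distance. The function names and the toy "ham"/"spam" data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Build the 'model': one mean feature vector (centroid) per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in total) for y, total in sums.items()}


def classify(model: Dict[str, Vector], x: Vector) -> str:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(model[y], x))


model = train(
    [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)],
    ["ham", "ham", "spam", "spam"],
)
print(classify(model, (2.0, 1.0)))  # ham
```

The quality of such a model depends almost entirely on which features go into the vectors, which is why slide 28 singles out feature set extraction.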
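
Slide 29 says unsupervised learning groups data by similarity without a labeled training set, and slide 34 lists K-means as a clustering method. Below is a minimal sketch of K-means (Lloyd's algorithm), assuming 2-D points, Euclidean distance, and initial centers supplied by the caller; real implementations usually seed the centers randomly or with k-means++ and run several restarts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(
    points: Sequence[Point], centers: Sequence[Point], iterations: int = 100
) -> Tuple[List[Point], List[int]]:
    """Return final centers and the cluster index assigned to each point."""
    centers = list(centers)
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centers, assignment


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(pts, [(0.0, 0.0), (0.0, 1.0)]))
```

Even from the poor starting centers in the usage line, the two obvious groups are recovered after a few iterations; with other data, the result can depend on the starting centers, which is why restarts are common.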
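
The checklist on slide 30 (nominal vs. continuous features, missing values and where they occur, outliers, rare events) can be answered with a short profiling pass before any algorithm is chosen. This is a minimal stdlib sketch, assuming rows arrive as dictionaries with `None` (or an absent key) marking a missing value. The "all numeric means continuous" rule, the 1.5 × IQR outlier fence, and the `rare_share` cutoff are common heuristics chosen for illustration, not prescriptions from the talk; `profile` is a hypothetical name.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List


def profile(rows: List[Dict[str, Any]], rare_share: float = 0.05) -> Dict[str, Dict[str, Any]]:
    """Summarize each column: kind, missing rows, outliers or rare values."""
    report: Dict[str, Dict[str, Any]] = {}
    columns = {key for row in rows for key in row}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        missing_rows = [i for i, v in enumerate(values) if v is None]
        present = [v for v in values if v is not None]
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        info: Dict[str, Any] = {
            "kind": "continuous" if numeric else "nominal",
            "missing_rows": missing_rows,
        }
        if numeric and len(present) >= 4:
            # Tukey fences: flag values beyond 1.5 interquartile ranges.
            q1, _, q3 = statistics.quantiles(present, n=4)
            low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
            info["outliers"] = [v for v in present if v < low or v > high]
        elif not numeric:
            # Rare categories hint at the "very infrequent" events on slide 30.
            counts = Counter(present)
            info["rare_values"] = sorted(
                v for v, c in counts.items() if c / len(present) < rare_share
            )
        report[col] = info
    return report
```

For example, a column of transaction amounts with one `None` and one extreme value would be reported as continuous, with the `None` row listed under `missing_rows` and the extreme value under `outliers`.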
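
Slide 32 lists "detect subtle changes to data (cycles, seasons)" and "measure performance" as ongoing steps once ML is in operation. One simple way to flag a shift, sketched here as an assumption rather than as the talk's method, is to compare the mean of a recent window against the training-time baseline with a z-score. The names `mean_shift_z` and `drifted` and the default threshold of 3 are illustrative. Genuine seasonal cycles would also trip this check, so known seasonality would need to be modeled or removed first.

```python
import math
import statistics
from typing import Sequence


def mean_shift_z(baseline: Sequence[float], recent: Sequence[float]) -> float:
    """z-score of the recent window's mean relative to the baseline data.

    Uses the baseline standard deviation and the standard error of a mean of
    len(recent) independent draws; autocorrelated data violate that
    independence assumption and would need a more careful test.
    """
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    standard_error = sigma / math.sqrt(len(recent))
    return (statistics.fmean(recent) - mu) / standard_error


def drifted(baseline: Sequence[float], recent: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean is more than `threshold` standard errors away."""
    return abs(mean_shift_z(baseline, recent)) > threshold


training_window = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 9.0, 10.0, 11.0]
print(drifted(training_window, [15.0, 16.0, 15.0, 16.0]))
```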
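
Slide 33 lists ordinary least squares under generalized linear models. For a single feature, OLS has a closed form: the slope is the sum of centered cross-products of x and y divided by the sum of squared centered x values, and the intercept makes the fitted line pass through the two means. A minimal stdlib sketch; `ols_fit` is a hypothetical name, and fitting several features at once would call for a linear-algebra library instead.

```python
from typing import Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


print(ols_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (2.0, 1.0)
```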
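
Density estimation appears on slides 29 and 34 (and in editor's note 11). A Gaussian kernel density estimate places a normal "bump" on every sample and averages them, giving a smooth estimate of where the data concentrate. This 1-D sketch assumes the caller picks the bandwidth; in practice the bandwidth matters more than the kernel shape and is usually set by a rule of thumb or by cross-validation. `gaussian_kde` here is a hypothetical stdlib function, not SciPy's class of the same name.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function f(x) estimated from 1-D samples.

    Each sample contributes a normal density with standard deviation
    `bandwidth`; averaging them yields a function that integrates to 1.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


f = gaussian_kde([0.0, 1.0, 5.0], bandwidth=0.5)
print(f(1.0) > f(3.0))
```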
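
Slide 35 names an error-estimation/sampling method as one step of the evaluation framework. k-fold cross-validation is a common choice: each example is held out for testing exactly once while the model is trained on the other folds. This sketch assumes the data are already shuffled (the folds are contiguous), and `fit` and `score` are caller-supplied placeholders for whatever learner and performance measure are under evaluation; all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size.

    Returns one (train_indices, test_indices) pair per fold. Assumes the data
    were shuffled beforehand, since contiguous folds preserve any ordering.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validate(
    xs: Sequence,
    ys: Sequence,
    k: int,
    fit: Callable[[list, list], Model],
    score: Callable[[Model, list, list], float],
) -> List[float]:
    """Train on k-1 folds and score on the held-out fold, once per fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


# Usage with a majority-class baseline as the "learner" (ties go to the smaller label).
def majority(_xs: list, labels: list) -> int:
    return max(sorted(set(labels)), key=labels.count)


def hit_rate(label: int, _xs: list, labels: list) -> float:
    return sum(y == label for y in labels) / len(labels)


print(cross_validate(list(range(6)), [0, 0, 0, 0, 1, 1], 3, majority, hit_rate))  # [1.0, 1.0, 0.0]
```

The spread across folds (here, one fold scores 0.0) is itself informative; it is the kind of result the statistical-test step on slide 35 is meant to examine.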
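
Slide 36 separates multi-class measures with and without chance correction, and Cohen's kappa is the chance-corrected one for two raters (here, classifier versus ground truth). It compares the observed agreement p_o (the share of cases on the diagonal) with the agreement p_e expected if predictions and true labels were independent with the same marginal totals: κ = (p_o − p_e) / (1 − p_e). A minimal sketch over a square confusion matrix; `cohens_kappa` is a hypothetical name.

```python
from typing import Sequence


def cohens_kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix of counts.

    The result is the same whether rows hold predicted or true classes,
    because the formula is symmetric under transposition.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


print(cohens_kappa([[20, 5], [10, 15]]))
```

In that usage line the raw accuracy is 0.7, but half of the agreement would be expected by chance alone, so kappa comes out at about 0.4.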
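
The formulas on slide 37 translate directly into code. A minimal sketch, assuming a binary problem with the slide's counts TP, FP, FN, and TN; it also adds the F-measure listed on slide 36, in its balanced F1 form (the harmonic mean of precision and recall). Recall and sensitivity are the same quantity because P = TP + FN, and specificity equals 1 − fallout. Division by zero (for example, no predicted positives) is not guarded here. The `Confusion` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def p(self) -> int:  # actual positives: P = TP + FN
        return self.tp + self.fn

    @property
    def n(self) -> int:  # actual negatives: N = FP + TN
        return self.fp + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.p + self.n)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:  # identical to sensitivity / TP rate
        return self.tp / self.p

    def fallout(self) -> float:  # FP rate
        return self.fp / self.n

    def specificity(self) -> float:  # equals 1 - fallout
        return self.tn / self.n

    def f1(self) -> float:  # harmonic mean of precision and recall
        prec, rec = self.precision(), self.recall()
        return 2 * prec * rec / (prec + rec)


c = Confusion(tp=40, fp=10, fn=20, tn=30)
print(c.accuracy(), c.precision(), c.recall(), c.specificity())
```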

Editor's Notes

  1. The “data” in “Big Data”
  2. The “BIG” in “Big Data”
  3. Data affects the opinions of business management, which steer the tactics and strategies that ultimately affect business operations; those effects can be seen, witnessed, and evidenced by measuring the business.
  4. Execution vs. search; balancing the “knowns” & “unknowns”; data here, there, everywhere …; machine learning as foundation to analytics; visualization as action to analytics; imminent opportunities
  5. Most business systems and processes are geared towards two main needs: (a) know what you know already and (b) know what you don't know already. However, not knowing what you don't know remains what challenges businesses most and serves as the catalyst for business failures. Therefore, all businesses need to know more, faster and better. This is where analytics becomes super valuable.
  6. Describe the business need for timely, detailed data that is transformed into information, then into knowledge, then into actionable decisions, in order to drive the business forward via (a) competitive advantage, (b) compliance, and (c) productivity/cost savings.
  7. Analytics design and implementation are based upon "embrace and extend," not "rip out and replace." Therefore all businesses can benefit, one step at a time.
  8. Source: http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
      Suppose you have an application that you think machine learning might be good for. The first problem facing you is the bewildering variety of learning algorithms available. Which one to use? There are literally thousands available, and hundreds more are published each year. The key to not getting lost in this huge space is to realize that it consists of combinations of just three components. The components are: Representation. A classifier must be represented in some formal language that the computer can handle. Conversely, choosing a representation for a learner is tantamount to choosing the set of classifiers that it can possibly learn. This set is called the hypothesis space of the learner. If a classifier is not in the hypothesis space, it cannot be learned. A related question, which we will address in a later section, is how to represent the input, i.e., what features to use. Evaluation. An evaluation function (also called objective function or scoring function) is needed to distinguish good classifiers from bad ones. The evaluation function used internally by the algorithm may differ from the external one that we want the classifier to optimize, for ease of optimization (see below) and due to the issues discussed in the next section. Optimization. Finally, we need a method to search among the classifiers in the language for the highest-scoring one. The choice of optimization technique is key to the efficiency of the learner, and also helps determine the classifier produced if the evaluation function has more than one optimum. It is common for new learners to start out using off-the-shelf optimizers, which are later replaced by custom-designed ones.
  9. Generalization is defined as the ability of the ML algorithm to assign correct labels to input data beyond the examples in the training set.
  10. If using supervised learning, what is your target value? Use classification if the target value is discrete, e.g., True/False, Yes/No, 1/2/3, A/B/C, Red/Green/Blue. Use regression if the target value can take on any number in a range, e.g., 0.00 to 100.00, -999 to 999, or -∞ to +∞.
  11. If you’re NOT looking to predict a target value, use unsupervised learning: clustering if you are trying to fit data into some discrete groups; density estimation if you are trying to get a numerical estimate of how strongly the data fit into each group.
      Clustering: discover groups of similar examples within the data.
      Density estimation: determine the distribution of data within the input space (all features considered).
      Visualization by projection: project the data from high-dimensional space down to 2 or 3 dimensions for the purpose of visualization.
  12. To perform classification with generalized linear models (linear regression), see Logistic regression.Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.The advantages of support vector machines are:Effective in high dimensional spaces.Still effective in cases where number of dimensions is greater than the number of samples.Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.The disadvantages of support vector machines include:If the number of features is much greater than the number of samples, the method is likely to give poor performances.SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).K Nearest Neighborsprovides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels.The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning), or vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric measure: standard Euclidean distance is the most common choice. 
Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of their training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree). Despite its simplicity, nearest neighbors has been successful in a large number of classification and regression problems, including handwritten digits and satellite image scenes. Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.
Naive Bayes
Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of independence between every pair of features.
Decision Trees
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
Ensemble Methods
The goal of ensemble methods is to combine the predictions of several models built with a given learning algorithm in order to improve generalizability and robustness over a single model.
Two families of ensemble methods are usually distinguished:
• In averaging methods, the driving principle is to build several models independently and then average their predictions. On average, the combined model is usually better than any single base model because its variance is reduced. Examples: bagging methods, forests of randomized trees, ...
• By contrast, in boosting methods, models are built sequentially and each new model tries to reduce the bias of the combined model. The motivation is to combine several weak models to produce a powerful ensemble.
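The “naive” independence assumption is easiest to see in code: the class score becomes a sum of per-feature terms. The sketch below assumes Gaussian Naive Bayes, i.e. each feature is normally distributed within a class (the slide does not say which likelihood model to use). The helper names, the `var_smoothing` parameter, and the toy data are hypothetical choices for this illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Per class: prior P(class) plus a mean and variance for each feature."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        stats = []
        for column in zip(*rows):  # one column per feature
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + var_smoothing))  # avoid zero variance
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, x):
    """Pick the class maximizing log P(class) + sum_i log P(x_i | class).

    The sum over features is the "naive" step: features are treated as
    independent given the class, so the joint likelihood factorizes."""
    def unnormalized_log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=unnormalized_log_posterior)


samples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
targets = ["low", "low", "low", "high", "high", "high"]
model = fit_gaussian_nb(samples, targets)
print(predict_gaussian_nb(model, (1.1, 2.0)))  # low
print(predict_gaussian_nb(model, (5.1, 6.0)))  # high
```

Working in log space turns the product of per-feature likelihoods into a sum, which avoids numerical underflow when there are many features.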
  13. This and similar analyses reveal that each performance measure conveys some information and hides other information. There is therefore an information trade-off carried through the different metrics. A practitioner has to choose, quite carefully, the quantity they are interested in monitoring, while keeping in mind that other values matter as well. Classifiers may rank differently depending on the metrics along which they are being compared.
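A small worked example shows how two classifiers can swap places depending on the metric. It assumes a hypothetical, imbalanced binary test set and uses accuracy, precision, and recall as example metrics (the slide does not name specific ones); the helper names and the labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Precision is undefined with no positive predictions; report 0.0 here.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical imbalanced test set: 2 positives out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred_a = [0] * 10                        # always predicts "negative"
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # catches both positives, plus 3 false alarms
print(evaluate(y_true, pred_a))  # {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
print(evaluate(y_true, pred_b))  # {'accuracy': 0.7, 'precision': 0.4, 'recall': 1.0}
```

Classifier A wins on accuracy while finding none of the positives; classifier B loses on accuracy but achieves perfect recall. Which one is “better” depends entirely on the quantity the practitioner chooses to monitor.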
  14. The “data” in “Big Data”
  15. The “data” in “Big Data”