Agile Analytics:
An explanatory study of
complexity management.
Agnirudra Sikdar
Contents
▪ Introduction to Big Data and Analytics
▪ What is Agile Analytics and how it is related to the scope of this thesis
▪ Analytics and its case studies
▪ Lessons learned from case studies on technical complexity management
▪ Proposed Methodology
▪ Data Acquisition
▪ Selection of Tools for Analytics
▪ Selection of Algorithms
▪ Choice of modelling approach
▪ Outcomes and Benefits from proposed methodology
▪ Conclusions
Introduction to Big Data and Analytics
▪ What is Big Data?
▪ What is Analytics?
▪ Why do we need Analytics?
Fig 1. Big Data Analytics Pipeline Model
What is Agile Analytics and how is it related to
the scope of this thesis?
• Agile Analytics is a style of building data
warehouses, data marts, business intelligence
applications and analytics applications that focuses
on early and continuous delivery of business value.
• The Agile approach incorporates feedback in each
iteration, making faster delivery possible with
every iteration.
• It is not a rigid methodology.
• The scope of this thesis is to understand the
technical complexities involved in different Big Data
analytics projects and to apply an agile approach to
address them. Technical complexity is not addressed
in current agile methodologies, and this thesis
focuses on bridging that gap.
Fig 2: Various Influences of Agile Analytics
Goals and Methodology
▪ Agile analytics focuses on soft/managerial variables.
▪ My goal is to establish technical guidelines.
▪ This thesis reports on an explanatory study surveying public case
studies to identify the technical complexities involved and
how they can be addressed by implementing Agile Analytics.
▪ 12 cases have been reviewed in a multi-case study.
▪ 2 detailed case studies on Customer Segmentation and Clustering
have also been reviewed.
▪ After intensive review of all the cases, I implemented my
proposed methodologies.
Analytical Data Solutions: 2 cases
IBM: 5 cases
Invenco Oy: 1 case
PricewaterhouseCoopers: 2 cases
Deloitte: 1 case
ThoughtWorks: 1 case
Online UK Retailer
4 Major Australian Banks
Methodology Steps
[Figure: methodology flowchart with the steps Case Studies, Data Acquisition (Data Acquisition variable, Data Sampling Period), Selection of Tools, Selection of Algorithm, Modelling Techniques]
From the various case studies, the following
methodological steps have been performed:
1) Data Acquisition:
a) Determine the variables that can be used to sample the data.
b) Determine the time period of the data to be sampled.
2) Selection of Tools: Determine the type of tool to be used and
the pros and cons of the various tools.
3) Selection of Algorithm: Determine the algorithm with the most
efficient time complexity for the required analysis.
4) Modelling Techniques: Identify whether descriptive,
predictive or prescriptive modelling has been used, and how to
improve it.
Discussion of Benefits from the methodology: The outcomes and
advantages of applying the methodology.
Catalogue of case studies: a blueprint
Domain | Company | Modelling | Software
Retail | US Based Luxury Company, Top Toy, GAP | Descriptive, Predictive | MongoDB, IBM Cognos
Electronics | Global PC manufacturer | Descriptive, Predictive | Proprietary software of Analytical Data Solutions
Food | Colombus Food | Descriptive, Predictive | IBM Cognos TM1
Finance | Corona Direct, Canadian Insurance Firm | Descriptive, Predictive, and slight Prescriptive | IBM SPSS
Marketing | Mediamath | Descriptive, Predictive, Prescriptive | Netezza, MathClarity
Manufacturing | Mueller Inc | Descriptive, Predictive, Prescriptive | IBM Cognos, IBM SPSS
Tele-communications | Telecom company in US | Descriptive | Hadoop
Information Technology | PricewaterhouseCoopers | Descriptive | Open Source Tools
Fig 3: Geolocations of the countries in which the case
studies were carried out
Conclusion from the Multi-Case studies
• Agile is not fully
present in all the cases.
• Descriptive modelling and
predictive modelling are present
in most of the cases.
• Marketing is the only sector
that shows the influence of all
the modelling approaches.
• We will focus on the
marketing domain, specifically on
the topic of Customer
Segmentation and Clustering,
in the following slides.
Fig 4 : Domains Vs Modelling Chart
Key technical complexity issues in case studies
Key points from the Customer
Segmentation case studies:
1) The algorithm chosen was K-means clustering,
combined with the RFM (recency, frequency, monetary) approach.
2) Data complexities were experienced while
pre-processing the datasets.
3) The choice of algorithm also resulted in algorithm
processing complexity.
4) Process complexity lies in how parallelization
of computer processing can be used to
compute the data.
Fig 5: A flowchart of Customer Segmentation procedure.
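As a rough illustration of the RFM approach mentioned in the key points above, the following sketch derives recency, frequency and monetary values per customer from a list of transactions. The function name, the tuple layout and the sample data are assumptions made for this example; they are not taken from the case studies.

```python
from datetime import date

def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (a hypothetical layout chosen for this sketch).
    """
    table = {}
    for customer, day, amount in transactions:
        rec = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)   # most recent purchase date
        rec["frequency"] += 1                 # number of purchases
        rec["monetary"] += amount             # total spend
    return {
        c: ((today - r["last"]).days, r["frequency"], r["monetary"])
        for c, r in table.items()
    }

# Made-up transactions, for illustration only.
sample_transactions = [
    ("a", date(2015, 1, 1), 10.0),
    ("a", date(2015, 1, 10), 5.0),
    ("b", date(2015, 1, 5), 20.0),
]
rfm = rfm_table(sample_transactions, today=date(2015, 1, 11))
```

The resulting three-value vectors could then be scaled and passed to a clustering step such as K-means.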
Proposed Methodologies
I. Data Acquisition
Proper data acquisition techniques are very important for the analytics domain.
My proposed techniques are:
1) Start with a selection of 6 variables:
3 from the customer's perspective (Postal Code, Gender, Age Group)
3 from the company's perspective (Product, Price, Quantity)
2) For analytic segmentation, we should try to gather a large sample (more than 600).
Segments as small as 5% can often promise high profitability
once they are weighted according to their spend levels.
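The point about small segments can be illustrated with a short sketch that compares a segment's share of customers with its share of total spend. The segment names and amounts are invented for this example, and the function is a hypothetical helper rather than part of the proposed methodology.

```python
from collections import defaultdict

def segment_shares(customers):
    """Return {segment: (headcount_share, spend_share)}.

    customers: iterable of (segment, spend) pairs (hypothetical layout).
    """
    customers = list(customers)
    counts = defaultdict(int)
    spend = defaultdict(float)
    for segment, amount in customers:
        counts[segment] += 1
        spend[segment] += amount
    total_count = len(customers)
    total_spend = sum(spend.values())
    return {
        s: (counts[s] / total_count, spend[s] / total_spend)
        for s in counts
    }

# Made-up data: one high spender among 20 customers.
sample = [("premium", 500.0)] + [("regular", 50.0)] * 19
shares = segment_shares(sample)
```

In this invented sample the "premium" segment is 5% of the customers but carries roughly a third of the spend (500 out of 1450).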
Conclusion:
An Agile approach of early analytics and implementation
can reduce Data Warehouse/Business Intelligence risk.
II. Selection of Tools for Analytics
When choosing data analysis software,
we need to answer a basic set of questions:
Does it run natively on the
computer?
Does it handle large datasets?
Does the software provide all the
methods we need?
Is it affordable?
Is it easy to use?
Fig 6.A (left): Lavastorm survey of analytics tools
Fig 6.B (right): Tools used by respondents to the O’Reilly 2015 salary survey
A Tabular Summary of Software and their
functionalities
Name | Advantages | Disadvantages | Open source? | Typical users
R | Library support; visualization | Steep learning curve | Yes | Finance; Statistics
Matlab | Elegant matrix support; visualization | Expensive; incomplete statistics support | No | Engineering
SciPy/NumPy/Matplotlib | Python (general-purpose programming language) | Immature | Yes | Engineering
Excel | Easy; visual; flexible | Large datasets | No | Business
SAS | Large datasets | Expensive; outdated programming language | No | Business; Government
Stata | Easy statistical analysis | | No | Science
SPSS | Like Stata but more expensive and worse
III. Choice of Algorithm
The choice of algorithm is crucial to any
form of analytics, because algorithms address
two issues: a) time complexity and b) space
complexity.
Some proposals for the selection of an
algorithm are:
1) Implement pre-clustering methods such as Canopy
Clustering. (Canopy Clustering is an unsupervised
pre-clustering algorithm that can process huge
datasets efficiently.)
2) Divisive algorithms can minimize a cut-based
cost in place of K-means, as the latter
maximizes only the similarity within a cluster and
ignores the cost of cuts.
Fig 7: Hierarchical Distribution of Analytical Segmentation.
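A minimal sketch of canopy pre-clustering, as proposed in point 1 above, may help make the idea concrete. It assumes Euclidean distance as the cheap distance measure and two user-chosen thresholds, a loose one (`t1`) and a tight one (`t2`); the function name, thresholds and sample points are illustrative assumptions, not values from the thesis.

```python
import math
import random

def canopy_clusters(points, t1, t2, seed=0):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold and t2 the tight one (t1 > t2 > 0).
    Returns a list of (center_index, member_indices) pairs.
    """
    if not (t1 > t2 > 0):
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        # Pick any remaining point as the next canopy center.
        center = remaining[rng.randrange(len(remaining))]
        members, still_remaining = [], []
        for i in remaining:
            d = math.dist(points[center], points[i])
            if d < t1:           # loosely close: joins this canopy
                members.append(i)
            if d >= t2:          # not tightly close: may seed or join others
                still_remaining.append(i)
        canopies.append((center, members))
        remaining = still_remaining
    return canopies

# Made-up points forming two well-separated groups.
sample_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0)]
canopies = canopy_clusters(sample_points, t1=3.0, t2=1.0)
```

The number of canopies and their centers can then be used to seed a more expensive algorithm such as K-means, which is one common reason for pre-clustering.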
K-Means Clustering Method
1) It is one of the simplest algorithms for solving clustering
problems.
2) The number of clusters is chosen first, and
the initial cluster centers are then
randomly selected.
3) A new partition is created by assigning
each data point to the cluster that has the closest
centroid, after which each centroid is recomputed
from its assigned points.
4) Step 3 is repeated until the
centroids no longer move.
5) The downside of K-Means is that
the clustering result depends heavily
on the initially selected centroids.
6) Non-spherical datasets cannot be
efficiently clustered using K-Means.
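The steps above can be sketched in a few lines of plain Python. This is a minimal illustration using Euclidean distance, not a production implementation; the `initial_centroids` parameter is an assumption added here so that the dependence on the starting centroids (point 5) can be explored with fixed inputs.

```python
import math
import random

def k_means(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Return (centroids, assignment) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Choose k and pick the initial centers (randomly unless provided).
    if initial_centroids is not None:
        centroids = list(initial_centroids)
    else:
        centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the partition, and hence the centroids, no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment

# Made-up points forming two groups.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(
    sample_points, k=2, initial_centroids=[(0, 0), (10, 10)]
)
```

Running the function with different seeds or starting centroids is a simple way to observe how the result can vary with the initialization.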
A few guidelines: clustering as a sample problem
1) Precede interdependence segmentation with focus groups.
2) To address differential bias, an exhaustive 5-point scale can be used.
3) Conjoint or discrete-choice based utilities can be used as the basis of clustering.
4) We should try to cluster only unique dimensions, identified by factor analysis.
5) We should try to focus on the reliability and accuracy of the dependent variable (e.g. using CHAID).
6) Heavy data cleansing can be avoided when using decision trees (e.g. CHAID).
7) Different algorithms have a best-practice parametrization, e.g. for CHAID:
(i) smallest child node set at approximately 5% of total N,
(ii) parent node set at twice the size of the child node,
(iii) alpha set at 0.5, with Bonferroni adjustment turned off.
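The node-size rules in the CHAID parametrization above reduce to simple arithmetic. The sketch below only computes those two sizes for a given sample size; the function name is hypothetical and no particular CHAID library is assumed.

```python
def chaid_node_sizes(total_n, child_fraction=0.05):
    """Return (min_child_size, min_parent_size) for a sample of total_n rows."""
    min_child = round(total_n * child_fraction)  # about 5% of total N
    min_parent = 2 * min_child                   # twice the child node size
    return min_child, min_parent

# Example with the sample size of 600 mentioned under Data Acquisition.
child_size, parent_size = chaid_node_sizes(600)
```

For a sample of 600 this gives a minimum child node of 30 and a minimum parent node of 60.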
IV. Modelling Approach
DESCRIPTIVE MODELLING:
a) We estimate probability densities using either a parametric or a non-parametric approach.
b) Some of the parametric modelling techniques that can be followed are:
I) Mixture models
II) Expectation Maximization (time complexity O(Kp^2 n); space complexity O(Kn);
can be slow to converge; may stop at local maxima)
c) Non-parametric density estimation doesn’t scale well.
d) For determining the number of clusters, we can use K-Means clustering, Partitioning
Around Medoids (PAM), or Clustering Large Applications (CLARA).
e) For identifying the number of clusters, we can select the Method of Mojena.
f) Some of the defining criteria for clustering using the descriptive approach are:
1) Ability to deal with various attributes.
2) Discovery of clusters with arbitrary shapes.
3) Minimal requirement for domain knowledge to determine input parameters.
Fig 8: Components of Modelling
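To make the Expectation Maximization item above more concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is a simplification chosen for this example (one dimension, fixed number of iterations, caller-supplied starting means), and reading K as the number of components, p as the number of dimensions and n as the number of points is an interpretation of the complexity figures, which the slide does not spell out.

```python
import math

def em_gmm_1d(data, initial_means, n_iter=50):
    """Fit a 1-D Gaussian mixture; returns (weights, means, variances)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances

# Made-up data drawn around 1.0 and 5.0.
sample_data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, means, variances = em_gmm_1d(sample_data, initial_means=[0.0, 6.0])
```

Different starting means can lead to different fitted mixtures, which is the local-maxima caveat noted in the list.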
Predictive Modelling
The most widely used clustering approaches in predictive modelling are:
1) Behavioral Clustering
This approach shows how people behave while purchasing.
2) Product Based Clustering / Category Based Clustering
This approach discovers which groupings of products people buy from.
3) Brand Based Clustering
This approach tells us which brands people prefer.
Other modelling techniques include Propensity Models for predictions and Collaborative
Filtering for recommendations.
Prescriptive Modelling
Prescriptive modelling is an emerging modelling approach.
It is still in its infancy.
Maintaining an Agile approach towards failures during
prescriptive modelling has made it one of the
strongest modelling techniques on the rise.
From the case studies, we have seen that very few companies
are able to implement prescriptive modelling.
It can also be said that all prescriptive models are
descriptive in nature, but not all descriptive models are
prescriptive.
Fig 9: Comparison of different modeling techniques.
V. Outcomes and Benefits from Methodology
By extracting common themes from the case studies and
literature reviews, I have noticed three underlying principles.
These principles are present in different magnitudes across organizations,
irrespective of their size, sector and structure. They are as
follows:
1. Commonality, i.e. business functions in a common field.
2. Concentration, i.e. grouping of enterprises that can
interact.
3. Connectivity, i.e. interconnected organizations with different
types of relationships.
Conclusion
1) The data integration process is cumbersome, and companies should try to migrate to the latest Big
Data technologies such as Hadoop, Cloudera, Spark, Hive and the like.
2) Various technical complexities can be approached using an Agile approach.
3) My guidelines on technical complexity management are Agile, as they help the data scientist
to obtain continuous outputs. Agile analytics focuses on soft managerial variables.
Applying this idea to the situations of technical complexity found in the
case study reviews, we can see that the result can be improved with every iteration.
Faster deployment. Faster feedback incorporation. Faster delivery. The potential to scale the
system using feedback, together with its inherently flexible nature, makes it the state-of-the-art
analytics approach of choice.
In one phrase: “Get Lean. Get Agile. Get Started.”

Data processingData processing
Data processing
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
 
churn_detection.pptx
churn_detection.pptxchurn_detection.pptx
churn_detection.pptx
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
A Firefly based improved clustering algorithm
A Firefly based improved clustering algorithmA Firefly based improved clustering algorithm
A Firefly based improved clustering algorithm
 
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNINGCUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
CUSTOMER SEGMENTATION IN SHOPPING MALL USING CLUSTERING IN MACHINE LEARNING
 
R in Insurance 2014
R in Insurance 2014R in Insurance 2014
R in Insurance 2014
 
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
Irjet v4 iA Survey on FP (Growth) Tree using Association Rule Mining7351
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 

Agile analytics : An exploratory study of technical complexity management

  • 1. Agile Analytics: An explanatory study of complexity management. Agnirudra Sikdar
  • 2. Contents ▪ Introduction to Big Data and Analytics ▪ What is Agile Analytics and how it is related to the scope of this thesis ▪ Analytics and its case studies ▪ Lessons learned from case studies on technical complexity management ▪ Proposed Methodology ▪ Data Acquisition ▪ Selection of Tools for Analytics ▪ Selection of Algorithms ▪ Choice of modelling approach ▪ Outcomes and Benefits from proposed methodology ▪ Conclusions
  • 3. Introduction to Big Data and Analytics ▪ What is Big Data ? ▪ What is Analytics? ▪ Why do we need Analytics? Fig 1. Big Data Analytics Pipeline Model
  • 4. What is Agile Analytics and how is it related to the scope of this thesis? • Agile Analytics is a style of building the data warehouse, data marts, business intelligence applications and analytics applications that focuses on early and continuous delivery of business value. • The Agile approach incorporates feedback in each iteration, making faster delivery possible with every iteration. • It is not a rigid methodology. • The scope of this thesis is to understand the technical complexities involved in different Big Data analytics projects and to apply an agile approach to address them. Technical complexity is not addressed in current agile methodologies, and I focus on bridging this gap. Fig 2: Various Influences of Agile Analytics
  • 5. Goals and Methodology ▪ Agile analytics focuses on soft/managerial variables. ▪ My goal is to establish the technical guidelines. ▪ This thesis reports on an explanatory study surveying public case studies to identify the technical complexities involved and how they can be solved by implementing Agile Analytics. ▪ 12 cases have been reviewed in a multi-case study. ▪ 2 detailed case studies on Customer Segmentation and Clustering have also been reviewed. ▪ After reviewing all the cases in depth, I implemented my proposed methodologies.
      Number of cases per source: Analytical Data Solutions (2), IBM (5), Invenco Oy (1), PriceWater House Cooper (2), Deloitte (1), ThoughtWorks (1).
      Detailed case studies: Online UK Retailer; 4 Major Australian Banks.
  • 6. Methodology Steps Diagram: Case Studies; Data Acquisition (Data Acquisition variable, Data Sampling Period); Selection of Tools; Selection of Algorithm; Modelling Techniques. From the various case studies, the following methodological steps have been performed. 1) Data Acquisition: this step determines a) the variables that can be used to sample the data and b) the time period of the data to be sampled. 2) Selection of Tools: this step determines the type of tool to be used and the pros and cons of the various tools. 3) Selection of Algorithm: this step determines the algorithm with the most efficient time complexity for the required analysis. 4) Modelling Techniques: this step identifies whether descriptive, predictive or prescriptive modelling has been used, and how to improve it. Discussion of benefits from the methodology: the outcomes and advantages of applying the methodology.
  • 7. Catalogue of case studies: a blueprint
      Domain | Company | Modelling | Software
      Retail | US Based Luxury Company, Top Toy, GAP | Descriptive, Predictive | MongoDB, IBM Cognos
      Electronics | Global PC manufacturer | Descriptive, Predictive | Proprietary software of Analytical Data Solutions
      Food | Colombus Food | Descriptive, Predictive | IBM Cognos TM1
      Finance | Corona Direct, Canadian Insurance Firm | Descriptive, Predictive, and slight Prescriptive | IBM SPSS
      Marketing | Mediamath | Descriptive, Predictive, Prescriptive | Netezza, MathClarity
      Manufacturing | Mueller Inc | Descriptive, Predictive, Prescriptive | IBM Cognos, IBM SPSS
      Tele-communications | Telecom company in US | Descriptive | HADOOP
      Information Technology | PriceWater House Cooper | Descriptive | Open Source Tools
      Fig 3: Geolocations of the countries on which the case studies were carried out
  • 8. Conclusion from the Multi-Case studies • Agile is not completely present in all the cases. • Descriptive modelling and predictive modelling are present in most of the cases. • Marketing is the only sector that shows the influence of all the modelling approaches. • The following slides focus on the marketing domain, specifically on Customer Segmentation and Clustering. Fig 4: Domains Vs Modelling Chart
  • 9. Key technical complexity issues in case studies Key points from the Customer Segmentation case studies: 1) The algorithm chosen was the K-clustering method along with the RFM approach. 2) Data complexities were experienced while pre-processing datasets. 3) The choice of algorithm also resulted in algorithm processing complexity. 4) Process complexity lies in how parallelization of computer processing can be used to compute the data. Fig 5: A flowchart of Customer Segmentation procedure.
  • 10. Proposed Methodologies I. Data Acquisition Proper data acquisition techniques are very important for the analytics domain. My proposed techniques are: 1) Start with a selection of 6 variables: 3 from the customer's perspective (Postal Code, Gender, Age Group) and 3 from the company's perspective (Product, Price, Quantity). 2) For analytic segmentation, try to gather a large sample (more than 600). Segments as small as 5% can often promise high profitability once they are weighted according to their spend levels. Conclusion: The Agile approach of early analytics and implementation can reduce the Data Warehouse/Business Intelligence risk.
  • 11. II. Selection of Tools for Analytics When choosing data analysis software, we need to answer a basic set of questions, such as: Does it run natively on the computer? Does it handle large datasets? Does the software provide all the methods we need? Is it affordable? Is it easy to use? Fig 6.A (left): Lavastorm survey of Analytics Tools. Fig 6.B (right): Tools used by respondents to the O'Reilly 2015 salary survey.
  • 12. A Tabular Summary of Software and their functionalities
      Name | Advantages | Disadvantages | Open source? | Typical users
      R | Library support; visualization | Steep learning curve | Yes | Finance; Statistics
      Matlab | Elegant matrix support; visualization | Expensive; incomplete statistics support | No | Engineering
      SciPy/NumPy/Matplotlib | Python (general-purpose programming language) | Immature | Yes | Engineering
      Excel | Easy; visual; flexible | Large datasets | No | Business
      SAS | Large datasets | Expensive; outdated programming language | No | Business; Government
      Stata | Easy statistical analysis | | No | Science
      SPSS | Like Stata but more expensive and worse
  • 13. III. Choice of Algorithm The choice of algorithm is crucial to any form of analytics, because algorithms address two issues: a) time complexity and b) space complexity. Some proposals for the selection of an algorithm are: 1) Implement pre-clustering methods like Canopy Clusters. (Canopy clustering is an unsupervised pre-clustering algorithm that can process huge datasets efficiently.) 2) Divisive algorithms can minimize cut-based cost in place of K-means, as the latter maximizes only similarity within a cluster and ignores the cost of cuts. Fig 7: Hierarchical Distribution of Analytical Segmentation.
  • 14. K-Clustering Method 1) It is the simplest algorithm for solving clustering problems. 2) The number of clusters is initialized first, and the initial cluster centers are selected at random accordingly. 3) A new partition is created by assigning each data point to the cluster with the closest centroid, after which the centroids are recomputed. 4) Step 3 is repeated until the centroids no longer move. 5) The downside of K-Means is that the clustering result depends strongly on the initially selected centroids. 6) Non-spherical datasets cannot be efficiently clustered using K-Means.
  • 15. A few guidelines: clustering as a sample problem 1) Precede interdependence segmentation with focus groups. 2) To solve differential bias, an exhaustive 5-point scale can be used. 3) Conjoint or discrete-choice based utilities can be used as the basis of clustering. 4) We should try to cluster only unique dimensions, identified by factor analysis. 5) We should try to focus on the reliability and accuracy of the dependent variable (e.g. using CHAID). 6) Heavy data cleansing can be avoided when using decision trees (e.g. CHAID). 7) Different algorithms have a best-practice parametrization, e.g. for CHAID: (i) the smallest child node should be set at approximately 5% of total N, (ii) the parent node at twice the size of the child node, and (iii) alpha set at 0.5, with Bonferroni adjustment turned off.
  • 16. IV. Modelling Approach DESCRIPTIVE MODELLING: a) We estimate probability densities using either a parametric or a non-parametric approach. b) Some of the parametric modelling techniques that can be followed are: I) mixture models; II) Expectation Maximization (time complexity O(K p^2 n); space complexity O(K n); can be slow to converge; can get stuck in local maxima). c) Non-parametric density estimation doesn't scale well. d) For clustering, we can use K-Means clustering, Partitioning Around Medoids (PAM) or Clustering Large Applications (CLARA). e) For identifying the number of clusters, we can select the Method of Mojena. f) Some of the defining criteria for clustering using the descriptive approach are: 1) ability to deal with various attributes; 2) discovery of clusters with arbitrary shapes; 3) minimal requirement for domain knowledge to determine input parameters. Fig 8: Components of Modelling
  • 17. Predictive Modelling The most used clustering approaches based on predictive modelling are as follows: 1) Behavioral clustering: this approach shows how people behave while purchasing. 2) Product-based clustering / category-based clustering: this approach discovers which groupings of products people buy from. 3) Brand-based clustering: this approach tells us which brands people prefer. Other modelling techniques are used in propensity models for predictions and in collaborative filtering for recommendations.
  • 18. Prescriptive Modelling Prescriptive modelling is an emerging modelling approach that is still in its infancy. Maintaining an Agile approach towards failures during prescriptive modelling has made it one of the strongest modelling techniques on the rise. From the case studies, we have seen that very few companies are able to implement prescriptive modelling. It can also be said that all prescriptive models are descriptive in nature, but not all descriptive models are prescriptive. Fig 9: Comparison of different modeling techniques.
  • 19. V. Outcomes and Benefits from Methodology By extracting common themes from the case studies and literature reviews, I have noticed three underlying principles. These principles are present in different magnitudes irrespective of size, sector and structure. They are as follows: 1. Commonality, i.e. business functions in a common field. 2. Concentration, i.e. grouping of enterprises that can interact. 3. Connectivity, i.e. interconnected organizations with different types of relationships.
  • 20. Conclusion 1) The data integration process is cumbersome, and companies should try to migrate to the latest Big Data technologies like Hadoop, Cloudera, SPARK, HIVE and the like. 2) Various technical complexities can be approached using the Agile approach. 3) My guidelines on technical complexity management are Agile, as they help the data scientist to obtain continuous outputs. Agile analytics focuses on soft managerial variables. Putting this idea into practice in the situations of technical complexity found in the case study reviews, we can see that the result can be improved with every iteration. Faster deployment. Faster feedback incorporation. Faster delivery. The potential to scale the system using feedback, together with its inherently flexible nature, makes it the state-of-the-art analytics approach of choice. In one phrase, "Get Lean. Get Agile. Get Started"
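
The sample-size guideline from slide 10 (a segment as small as 5% still needs roughly 30 respondents to be described reliably) can be checked with a few lines of Python. This is only a minimal sketch: the function name min_total_sample and its parameters are hypothetical, and the threshold of 30 per segment is the rule of thumb quoted in the editor's notes, not a general statistical guarantee.

```python
import math


def min_total_sample(min_segment_n: int, smallest_segment_pct: float) -> int:
    """Smallest overall sample so that a segment covering
    `smallest_segment_pct` percent of it still has `min_segment_n` members.

    Both parameter names are illustrative assumptions, not from the slides.
    """
    if not 0 < smallest_segment_pct <= 100:
        raise ValueError("smallest_segment_pct must be in (0, 100]")
    # Total * (pct / 100) >= min_segment_n, solved for Total and rounded up.
    return math.ceil(min_segment_n * 100 / smallest_segment_pct)


if __name__ == "__main__":
    # 30 respondents in a 5% segment requires an overall sample of 600.
    print(min_total_sample(30, 5))  # 600
```

Raising the per-segment minimum or lowering the smallest segment share increases the required sample proportionally, which is why the slide recommends gathering more than 600 records.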

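The K-Clustering steps listed on slide 14 can be sketched in plain Python as follows. This is an illustrative sketch only, not the implementation used in the case studies: the function names, the toy data and the seed parameter are assumptions, and a production analysis would more likely rely on an established library.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means: random initial centers, assign, recompute, repeat."""
    rng = random.Random(seed)
    # Step 2: pick k initial cluster centers at random from the data.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign every point to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop once the partition (and so the centroids) is stable.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated groups of customers.
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

Changing the seed changes the initial centroids, which is the sensitivity that point 5 of the slide warns about; on less cleanly separated data, different seeds can give different partitions.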
Editor's Notes

  1. 1) Big Data is characterised by Volume, Velocity, Variety and Value. 2) Analytics is the discovery and communication of meaningful patterns in data. It depends on the simultaneous application of computer programming, operations research and statistics, and is a multidimensional discipline. 3) We need analytics for recommending actions and for decision making. 4) The primary goal of big data analytics is to help companies make more informed business decisions by helping data scientists, predictive modelers and other analytics professionals to analyze data, including data left untapped by conventional business intelligence programs.
  2. Agile analytics starts with learning and testing, so that companies can build their models and strategies based on solid answers to their most crucial business questions. Agile big data analytics focuses not on the data itself but on the insight and action that can ultimately be drawn from nimble business intelligence systems. The Agile approach focuses on continuously delivering working features. In traditional models, developers take months on a feature, only to find it no longer applicable to the business environment. Frequent testing should be done. Using an Agile approach, we can use big data to find agility instead of pivoting to a single key insight. Adapt Agile methods to individual projects and teams. The scope of this thesis is to understand the technical complexities involved in various Big Data analytics projects and to apply an agile approach to address these issues.
  3. Descriptive analytics looks at past performance and interprets it by mining past information for the reasons behind previous success or failure. Classical descriptive approaches are usually seen in data warehousing systems. Predictive models select major characteristics differently, i.e. while we may know a lot about our customers, we may not be able to precisely estimate what they will do next. Prescriptive models, though, take advantage of the data of descriptive models and the premise of predictive models and try to answer not only what the client will do next, but why they will do so.
  4. 1) This approach can be considered a good starting point for companies that want to cluster their clients for marketing purposes, because it makes it possible to answer questions like "Which product is preferred by which age group?" or "How much money is spent by people of a particular location?". 2) Segments as small as 5% can often promise high profitability once they are weighted according to their spend levels. But generally, one would never want to make inferences from a subsample any smaller than around n=30. Thus, one would have to start with an overall sample of 600 (30/5% = 600) in order to ensure that a segment as small as 5% could be reliably described. Conclusion: We cannot know how well the data warehouse design matches the available data until we try to load it, nor how well it matches the stakeholders' actual BI requirements until they use it. This is why the Agile approach of early analytics and implementation can reduce the Data Warehouse/Business Intelligence risk.
  5. When choosing data analysis software, we need to answer a basic set of questions, such as: Does it run natively on the computer? Does it handle large datasets? Does the software provide all the methods we need? Is it affordable? Is it easy to use?
  6. Although all of these software packages support the Agile approach, the final choice lies with the user and depends on the purpose they want to focus on.
  7. I would propose implementing pre-clustering methods like Canopy Clusters. (Canopy clustering is an unsupervised pre-clustering algorithm that can process huge datasets efficiently.) It can be used as a pre-processing step for the K-Means algorithm or the Hierarchical Clustering algorithm, which speeds up clustering operations on large datasets.
  8. The reasons why cluster development policies can benefit from understanding the process of clustering are as follows: 1. Clusters are not static and do not have fixed boundaries. 2. Clusters have different stages of development. (The Agile approach can be implemented in developing a cluster as well as in analyzing it.)
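
The canopy pre-clustering idea proposed in note 7 can be illustrated with a short Python sketch. It is a simplified version written under stated assumptions: the function name canopy, the loose and tight thresholds t1 and t2, and the toy data are hypothetical, and only points still on the candidate list are compared against each new center.

```python
import math
import random


def canopy(points, t1, t2, seed=0):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose distance threshold, t2 the tight one (t1 > t2).
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick any remaining point as the center of a new canopy.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_candidates = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)  # loosely close: joins this canopy
            if d >= t2:
                still_candidates.append(p)  # not tightly close: stays listed
        remaining = still_candidates
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    # Toy data (hypothetical): two compact groups far apart.
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    for center, members in canopy(data, t1=4.0, t2=2.0):
        print(center, members)
```

The canopy centers can then serve as initial centroids for K-Means, or the canopies can limit which distance comparisons a later clustering pass needs to make, which is where the speed-up on large datasets would come from.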