2. CCG
Analytics Solutions & Services
Data & analytics consultants with a passion for helping clients
overcome business challenges & increase performance by
leveraging modern analytic solutions.
DATA MANAGEMENT
BUSINESS ANALYTICS
DATA STRATEGY
3. VOICES OF OUR CUSTOMERS
“CCG brought the expertise and the vision to help us execute, to provide
visibility to the data in a manner that we can use it faster.”
- Gary Gray, Business Solutions Executive, Corsicana Mattress Company
“The people we talked to know us. CCG wasn’t trying to fit us into a boilerplate
template but prescribe a tailored solution. Their RapidRoadmap was the basis of
our BI Strategy for the next two years.”
- Kevin Davis, Sr. Director of BI, Kforce
“Many times with CCG, we come to the table with questions or ideas and within a
couple of days or weeks the team comes back with above and beyond what we
actually asked for. They care.”
- Chris Fitzpatrick, Vice President of Business Analytics & Strategy, vineyard vines
“I'm amazed at the talent at CCG, not just the skillset - they're really good people.
We've already referred them once and will do so again!”
- CIO, Ruth’s Chris Hospitality Group
4. AGENDA
What is Machine Learning?
Why should anyone care about
machine learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
5. Advanced Analytics (“AA”) enable predictive and prescriptive uses of data by
applying sophisticated math and statistics to automate parts of the analysis.
What is Advanced Analytics?
Traditional analytics focuses on
understanding and explaining the
data that has been collected.
AA focuses on generating new
data in the form of predictions or
decisions, and going the extra step
to automate decision-making
when possible.
6. Advanced Analytics deal with making “best guesses” that are faster, better, and
more consistent than those of human SMEs.
Traditional Analytics
Provide insights on existing data using:
• Raw data points
• Summaries of data
• Calculations across existing data fields
• KPIs
The data reported are historical or current facts.
Generally requires the application of basic
mathematics or arithmetic.
Advanced Analytics
Generate new data, including:
• Predicted future values
• Best guesses of missing values
• Suggested next steps
• Categorizations
The data generated are “best guesses” and
contain some uncertainty.
Requires the application of advanced
mathematics, statistics and computing principles.
Traditional vs. Advanced Analytics
7. Machine Learning is one of several tools that enable advanced
analytics. It’s more of a HOW than a WHAT.
What is Machine Learning?
Data Science
A broad process for generating insights
that may involve data ingestion from
one or many sources (including external
data, streaming data, or big data), data
processing and cleansing, model
generation using statistical or machine
learning approaches, model selection,
model deployment and maintenance,
and visualization of data.
Advanced Analytics
Apply data science to predictive (what
will happen?) or prescriptive (what
should we do?) business use cases.
Artificial Intelligence /
Cognitive Computing
Apply data science to approximate
human intuition and decision making
(e.g. strategy, creativity, planning) or
human sensory functions (e.g.
computer vision, natural language
understanding, etc.)
Statistics
A branch of math for generating
descriptions of or inferences about a
population, often based on samples
of the population. Inferences may
take the form of “models,” which
are equations that approximate the
data’s inherent relationships.
Machine Learning
Combines computer science with
math concepts to generate models
by rapidly iterating on large
datasets.
Other Analytics Disciplines
(e.g. Data Engineering, Visualization)
Disciplines Process Outputs
Automation /
Robotics /
Intelligent
Devices
Actions
Strategy /
Operations
8. AGENDA
What is Machine Learning?
Why should anyone care about
machine learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
9. The concepts in Machine Learning are not new.
How has Machine Learning Evolved?
https://www.quantinsti.com/blog/machine-learning-basics
10. Even though the concepts are decades old,
machine learning has only become feasible at scale in recent years.
Why Machine Learning Now?
Flood of data and decreasing costs of storage
Increasing computational power
Increased attention from researchers
Growth of open source technologies
Support from industries
11. Machine Learning has tons of useful applications you already encounter or
hear about every day.
Analyzing
Images
Understanding
Language
Forming &
Executing Strategy
Personalized
Recommendations
Autonomous
Decisions
Predicting
Asset Values
How is Machine Learning used?
12. Machine Learning isn’t just applicable to high tech.
There are suitable use cases present in most business sectors.
Where is Machine Learning used?
Healthcare
• Claims Fraud
• Real-time mortality risk
for ICU patients
• Response Adapted
Radiotherapy
• Predicting patient
medication adherence
• Translational/precision
medicine
Finance
• Foreclosure/credit risk
• Risk analysis
• Fraud detection
• Demand forecasting
• Anti Money Laundering
• Algorithmic trading
Energy
• Resource allocation
• Load forecasting
• Grid optimization
• Robotics
• Anomaly detection
• Image recognition
• Predictive maintenance
Retail
• Single view of customer
• Customer service analysis
• Inventory planning
• Social media analysis
• Lead scoring
• Marketing campaign
evaluation
13. Machine learning sits at the intersection of statistics and computer science to
help businesses make decisions.
Why Machine Learning Now?
Computational
Power
Statistics
Predictive & Prescriptive Decision Support
Faster
More Accurate
More Powerful
Self-Improving
Always-On
14. AGENDA
What is Machine Learning?
Why should anyone care about
machine learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
15. A model is a repeatable, data-driven approach to making a best guess.
It does this by formalizing mathematical relationships between data in the form of either:
– Rules (e.g. predict applicants will default on a loan if Credit Score < 700 and Debt to Income Ratio > 30%)
– Or an equation (e.g. predict Home Price = 100*Square Footage + 2*Average Income in the Area)
Machine Learning works by using “algorithms” to generate “models.”
How does Machine Learning work?
Data Model Statistical Model
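The rule and the equation above can each be written as a small function. This is a minimal sketch using the slide's own example thresholds and coefficients; the function names, parameter names, and sample inputs are hypothetical.

```python
def predict_default(credit_score: float, debt_to_income: float) -> bool:
    """Rule-style model: predict default when both conditions from the slide hold."""
    return credit_score < 700 and debt_to_income > 0.30


def predict_home_price(square_footage: float, avg_area_income: float) -> float:
    """Equation-style model: a weighted sum of the inputs."""
    return 100 * square_footage + 2 * avg_area_income


# Hypothetical applicant and home, chosen only for illustration.
print(predict_default(credit_score=650, debt_to_income=0.35))
print(predict_home_price(square_footage=2000, avg_area_income=60000))
```

Either form is repeatable: the same inputs always produce the same best guess.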
16. In the past we’ve told computers how to use data to answer our
questions.
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month sales =
(prior month +
2 months prior +
3 months prior)
/ 3
Answer
This month’s sales = $3MM?
What’s a model?
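The hand-written program on this slide is a three-month average. A minimal sketch, with a hypothetical function name and the slide's sales figures in $MM:

```python
def three_month_average(prior_month, two_months_prior, three_months_prior):
    """Hand-written program: forecast this month as the mean of the last three months."""
    return (prior_month + two_months_prior + three_months_prior) / 3


# Sales figures in $MM, taken from the slide.
forecast = three_month_average(4, 3, 2)
print(f"This month's sales = ${forecast:g}MM")
```

Here a person chose the formula; the computer only carries it out.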
17. Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $1MM
But we’ve found that if we give the machine historic facts, we can let it find
the right program / model to plug in for future answers.
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month’s sales =
1/8 * Prior month +
1/3 * 2 months prior +
1/4 * 3 months prior
What’s a model?
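One way a machine could find such weights from historic facts is ordinary least squares. The slide does not say which method produced its weights, so the sketch below is an assumption. Only the first history row comes from the slide; the other rows are hypothetical and were constructed to be consistent with the slide's weights.

```python
from fractions import Fraction


def fit_weights(rows, targets):
    """Ordinary least squares: solve (X^T X) w = X^T y exactly with fractions."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [Fraction(sum(r[i] * r[j] for r in rows)) for j in range(n)]
        + [Fraction(sum(r[i] * t for r, t in zip(rows, targets)))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination reduces the left side to the identity.
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# (prior month, 2 months prior, 3 months prior) in $MM. Only the first row
# comes from the slide; the rest are hypothetical history.
history = [(4, 3, 2), (8, 3, 4), (16, 6, 4), (8, 9, 12)]
actual_sales = [2, 3, 5, 7]

weights = fit_weights(history, actual_sales)
print(weights)
```

Because the hypothetical history fits the weighted-sum form exactly, the fit recovers the slide's weights of 1/8, 1/3, and 1/4. Real sales data would be noisy and would yield approximate weights.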
18. Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $1MM
Once we have our machine-defined program, we can use it with new data to
make better predictions.
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month’s sales =
1/8 * Prior month +
1/3 * 2 months prior +
1/4 * 3 months prior
New Data
Prior month sales: $8MM
2 months prior: $6MM
3 months prior: $8MM
Answer
This month’s sales = $5MM
What’s a model?
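The arithmetic behind this slide can be checked directly. A minimal sketch that applies the slide's weights to both the historical example and the new data; the function name is hypothetical.

```python
from fractions import Fraction

# Weights of the machine-defined model, as given on the slide.
WEIGHTS = (Fraction(1, 8), Fraction(1, 3), Fraction(1, 4))


def predict_sales(prior_month, two_months_prior, three_months_prior):
    """Weighted sum of the three most recent months (all values in $MM)."""
    inputs = (prior_month, two_months_prior, three_months_prior)
    return sum(w * x for w, x in zip(WEIGHTS, inputs))


print(predict_sales(4, 3, 2))  # historical example: 1/2 + 1 + 1/2 = 2
print(predict_sales(8, 6, 8))  # new data: 1 + 2 + 2 = 5
```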
19. A defined set of steps for solving a problem
Often involves repeating steps
May or may not have an ending condition
– The problem is solved to our satisfaction
• For example – stop when the last 4 iterations have been 95% accurate or better
– The problem hasn’t been solved but we don’t seem to be getting any closer to solving it
• For example – stop if the last 10 iterations have not seen any improvement in accuracy
– The process has run for a long time
• For example – stop after the program has run for 12 hours, regardless of whether progress is still being made
The word algorithm gets used a lot, but it isn’t always defined.
What is an algorithm?
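The three ending conditions above can be sketched as one loop. This is an illustrative helper, not a standard library routine: `run_until_done`, its parameters, and the sample accuracy readings are all hypothetical, with defaults taken from the slide's examples (4 iterations at 95%, 10 iterations without improvement, 12 hours).

```python
import time


def run_until_done(step, target=0.95, streak=4, patience=10, max_seconds=12 * 3600):
    """Repeat step() until one of three ending conditions is met.

    step() performs one iteration and returns its accuracy (0.0 to 1.0).
    """
    history = []
    best = float("-inf")
    stale = 0  # iterations since accuracy last improved
    started = time.monotonic()
    while True:
        accuracy = step()
        history.append(accuracy)
        if accuracy > best:
            best, stale = accuracy, 0
        else:
            stale += 1
        # 1. Solved to our satisfaction.
        if len(history) >= streak and min(history[-streak:]) >= target:
            return "solved", history
        # 2. Not getting any closer.
        if stale >= patience:
            return "no improvement", history
        # 3. Ran for too long.
        if time.monotonic() - started > max_seconds:
            return "time limit", history


# Hypothetical accuracy readings standing in for real training iterations.
readings = iter([0.50, 0.70, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99])
reason, history = run_until_done(lambda: next(readings))
print(reason, len(history))
```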
20. Collect the data and randomly create initial decision rules.
Design a method for measurably evaluating how good or bad your hypothesis is.
Update your hypothesis in a way that marginally improves the performance of your decision rules.
Continue this process until either you are satisfied with the results, or your hypothesis can’t improve
anymore with the data available.
Almost all machine learning algorithms follow the same general pattern.
Create a
hypothesis
Evaluate the
hypothesis
Adjust the
hypothesis
Repeat until
convergence
What is an algorithm?
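The create, evaluate, adjust, repeat pattern can be sketched with gradient descent on a one-weight model. Gradient descent is only one way to make the "adjust" step; the data, function names, and settings below are hypothetical.

```python
import random

# Hypothetical training data: home size (thousands of sq ft) and price ($K).
sizes = [1.0, 2.0, 3.0]
prices = [100.0, 200.0, 300.0]


def mean_squared_error(weight):
    """Evaluate: how far off are predictions of the form price = weight * size?"""
    return sum((weight * x - y) ** 2 for x, y in zip(sizes, prices)) / len(sizes)


def fit(learning_rate=0.05, tolerance=1e-12, max_iterations=10_000, seed=0):
    # Create a hypothesis: start from a random weight.
    weight = random.Random(seed).uniform(0.0, 200.0)
    loss = mean_squared_error(weight)
    for _ in range(max_iterations):
        # Adjust: step the weight against the gradient of the error.
        gradient = sum(2 * x * (weight * x - y) for x, y in zip(sizes, prices)) / len(sizes)
        candidate = weight - learning_rate * gradient
        new_loss = mean_squared_error(candidate)
        # Repeat until convergence: stop once the error no longer improves.
        if loss - new_loss < tolerance:
            break
        weight, loss = candidate, new_loss
    return weight


learned_weight = fit()
print(round(learned_weight, 3))
```

The data were generated with a weight of 100, so the learned weight should land very close to that value whatever the random starting point.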
21. AGENDA
What is Machine Learning?
Why should anyone care about
machine learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
22. There are two main families of algorithms to choose from.
Supervised Learning
We know the “right answers” for some of the scenarios.
– We may have history we can look back on
– We may be hoping to replicate human decision making
Unsupervised Learning
There aren’t necessarily “right answers”; we just want to
get a better understanding of our data.
23. Supervised or Unsupervised?
Predict our profits next quarter. Supervised
Identify the number written on a check. Supervised
Group our customers into segments. Unsupervised
Predict a user’s rating for a given product. Supervised
Find the most important variables in a dataset. Unsupervised
Identify credit card transactions that are out of the ordinary. Unsupervised
24. Now let’s walk through two of the most popular machine learning
approaches and discuss how the algorithms are applied.
How does an algorithm really work for businesses?
Classification Clustering
25. Use classification when you want to guess a non-numeric value, like a
yes/no answer. We will take a decision tree approach.
Everyone will repay their loan.
Create a
hypothesis
20 outstanding loans
26. Use classification when you want to guess a non-numeric value, like a
yes/no answer. We will take a decision tree approach.
Calculate accuracy as the % of predictions that are correct based on your current set of rules.
Evaluate the
hypothesis
20 outstanding loans
12 repaid, 8 defaulted
Accuracy = 12/20 = 60%
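The accuracy of the starting hypothesis can be computed as follows. A minimal sketch using the slide's counts of 12 repaid and 8 defaulted loans; the variable names are hypothetical.

```python
# Outcomes for the 20 loans on the slide: 12 repaid, 8 defaulted.
actual = ["repaid"] * 12 + ["defaulted"] * 8

# Starting hypothesis: everyone will repay their loan.
predicted = ["repaid"] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy = {correct}/{len(actual)} = {accuracy:.0%}")
```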
27. Use classification when you want to guess a non-numeric value, like a
yes/no answer. We will take a decision tree approach.
Find the next branch by looking for the data split that would have the biggest impact on the purity of
each node. There are several ways to do this mathematically (Gini Index, Information Gain, Chi-Square).
Adjust the
hypothesis
Candidate splits of the 20 outstanding loans:
• Income > $60k | Income < $60k: 70% | 50% (60% weighted)
• Credit Score > 700 | Credit Score < 700: 71% | 53% (59% weighted)
• DTI > 40% | DTI < 40%: 80% | 73% (75% weighted)
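Scoring a candidate split can be sketched as below, using both weighted accuracy and the Gini index named on this slide. The slide gives only percentages, so the per-node counts are assumptions: they were chosen to reproduce the 70%/50% and 80%/73% figures while adding up to 12 repaid and 8 defaulted loans. Which side of each split they belong to is not specified here.

```python
def weighted_accuracy(nodes):
    """Share of loans classified correctly if each node predicts its majority class."""
    total = sum(repaid + defaulted for repaid, defaulted in nodes)
    correct = sum(max(repaid, defaulted) for repaid, defaulted in nodes)
    return correct / total


def weighted_gini(nodes):
    """Size-weighted Gini impurity across nodes (0 means perfectly pure)."""
    total = sum(repaid + defaulted for repaid, defaulted in nodes)
    impurity = 0.0
    for repaid, defaulted in nodes:
        size = repaid + defaulted
        p = repaid / size
        impurity += (size / total) * (1 - p**2 - (1 - p) ** 2)
    return impurity


# Hypothetical (repaid, defaulted) counts per node, chosen to reproduce the
# slide's percentages; the slide does not give the underlying counts.
income_split = [(7, 3), (5, 5)]  # 70% and 50% -> 60% weighted
dti_split = [(1, 4), (11, 4)]    # 80% and 73% -> 75% weighted

for name, split in [("Income", income_split), ("DTI", dti_split)]:
    print(name, weighted_accuracy(split), round(weighted_gini(split), 3))
```

Under these assumed counts the DTI split has both the higher weighted accuracy and the lower Gini impurity, so it would be chosen as the next branch.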
28. Use classification when you want to guess a non-numeric value, like a
yes/no answer. We will take a decision tree approach.
Repeat the process for each of your new “leaf” nodes. Stop when you reach an acceptable level of
accuracy, or when your accuracy begins getting worse with independent data.
Repeat until
convergence
20 outstanding loans
DTI > 40% | DTI < 40%
Credit Score > 700 | Credit Score < 700 | Income > $60k | Income < $60k
100% | 50% | 100% | 100%
80% weighted
29. Classification is used for lots of problems that copy human intuition. Think
about how you classify information to identify these images!
These use cases are obviously
more complex than our
simple decision tree, but with
more advanced approaches
like convolutional neural
networks these pictures can
definitely be classified by a
machine.
30. Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
Imagine Marketing
has asked you to
split these customers
into 3 groups.
How would you do
it?
31. Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
I can segment my customers by assigning them to 3 groups. We’ll set down 3 random “anchors” and
assign each customer to its closest anchor.
Create a
hypothesis
32. Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
Move the anchors to the center of each cluster. Count how many customers are now closer to one of
the other anchors.
Evaluate the
hypothesis
33. Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
Re-assign each customer to the group corresponding to the center they’re closest to.
Adjust the
hypothesis
34. Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
Repeat until
convergence
Move the anchors again. Continue re-assigning customers and moving the anchors until the anchors
stop moving.
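The four k-means steps walked through above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the customer coordinates and starting anchors are hypothetical, and when no anchors are supplied the sketch picks random customers as anchors, which is one common variant of "random anchors".

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, anchors=None, seed=0, max_iterations=100):
    """Return (anchors, clusters) once the anchors stop moving."""
    # Create a hypothesis: use the given anchors, or pick k customers at random.
    if anchors is None:
        anchors = random.Random(seed).sample(points, k)
    anchors = [tuple(a) for a in anchors]
    clusters = []
    for _ in range(max_iterations):
        # Assign each customer to its closest anchor.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, anchors[i]))
            clusters[nearest].append(point)
        # Move each anchor to the center of its cluster (leave it if empty).
        moved = [centroid(c) if c else anchors[i] for i, c in enumerate(clusters)]
        if moved == anchors:
            break  # converged
        anchors = moved
    return anchors, clusters


# Hypothetical customers described by two numeric attributes.
customers = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10), (20, 1), (21, 1), (20, 2)]
anchors, clusters = kmeans(customers, k=3, anchors=[(0, 0), (12, 12), (18, 0)])
print([len(c) for c in clusters])
```

Results depend on where the anchors start, so in practice the algorithm is often run from several random starts and the best grouping is kept.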
35. This is just the tip of the iceberg. There are several
algorithms available for various types of problems.
36. AGENDA
What is Machine Learning?
Why should anyone care about
machine learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
37. Delivering analytics with Machine Learning requires alignment
across people, process, technology, and data.
Engaging with Machine Learning
Image inspired by Microsoft
People
Process Technology
Data
Guide
Support
Enable
38. Data scientists combine broad skills to integrate data, build
models, and drive business value.
39. Let’s look at the Microsoft Team Data Science Process to see how
data scientists spend their time.
40. The outputs of the process can be used in traditional analytics,
analyzed directly, or fed into automated decision-making.
Traditional Analytics
Store and access data. Filter and aggregate it. Visualize it.
Show it to the business
so they can take action.
so they can take action.
Machine Learning
Filter and aggregate it.
$\frac{1}{N}\sum_{n=1}^{N} x_n$
Create a model. Generate new data
(predictions, etc.).
The new data can be
stored with the rest of the
data for use in analytics.
Or it can be visualized
directly to gain insights.
Or it can automate
decisions or actions,
allowing better processes
to run faster and 24/7.
41. Model performance naturally degrades over time as relationships in data
shift. Model maintenance is critical to using models on an ongoing basis.
It’s 1995.
Roughly 25% of Americans
own cell phones.
Imagine you want to build a model
predicting an individual’s income. This
model would justifiably give a significant
premium to cell phone owners.
It’s 2007.
Roughly 70% of Americans
own cell phones.
Ownership still has some significance, but
the premium for cell phone ownership
should be much smaller.
A machine learning model can
adapt over time if it is retrained
on fresh data, but a hand-built model
would need its parameters adjusted manually.
It’s 2019.
Over 95% of Americans own
cell phones.
At this point cell phone ownership should
probably be dropped in favor of other data,
like ownership of an electric car.
Even machine learning models can only
learn from the data at their disposal, so
the data acquisition pipeline requires
updates regardless of the approach.
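A simple way to notice this kind of degradation is to compare recent error against the error measured at deployment. The sketch below is illustrative only: the function name, the 10% tolerance, and the error rates are hypothetical.

```python
def needs_retraining(recent_errors, baseline_error, tolerance=0.10):
    """Flag a model whose recent average error exceeds its baseline by more than `tolerance`."""
    recent_average = sum(recent_errors) / len(recent_errors)
    return recent_average > baseline_error * (1 + tolerance)


# Hypothetical error rates: 0.10 measured at deployment, then two later periods.
print(needs_retraining([0.10, 0.09, 0.10], baseline_error=0.10))
print(needs_retraining([0.15, 0.16, 0.17], baseline_error=0.10))
```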
42. The sources of data for use in data science can be broad.
Data
Warehouses
•Curated &
Governed data
•Big data
•Cloud or on-prem
Data Lakes
•Unstructured &
Semi-structured
data
•Streaming data
•Partially curated
Externally
Procured
Data
•May be purchased
from 3rd party
providers
•May be scraped
from the web
•May require
designing research
experiments
Data scientists typically have the
programming and data integration skills to
use data from anywhere it can be found.
43. The Microsoft technology stack provides a holistic
solution to your Machine Learning needs.
45. We can work with your business to deliver custom predictive and
prescriptive analytics across the lifecycle.
What can CCG do?
Use Case Definition
• Develop a backlog of
predictive and
prescriptive use cases
• Refine and prioritize use
cases by value
• Develop a predictive
roadmap
Model Development
• Aggregate data from
across internal and
external data sources
• Develop and test
multiple models to find
the best approach to
making predictions
Model Maintenance
• Monitor and maintain
statistical models to
sustain predictive power
• Develop a model telemetry
dashboard
• Test model design changes
to improve predictive
power
Model Governance & Processes
• Assess existing Data Science capabilities
• Develop standards and processes to help guide data science output
• Build a Data Science Center of Excellence
Model Deployment
• Customize and deploy
pre-existing models from
Azure Cognitive Services
• Deploy custom model as
an API or batch job, or
support deployment in
existing systems
Rapid Insight Prototype Offering
Model as a Service Subscription Offering
46. CCG’s Rapid Insight Solution
Actionable Backlog
– Of use cases ripe for predictive
analytics to transform your
business
Detailed Readouts
– The materials we leave behind
will include extensive analysis
of our methodology, findings,
and recommendations
Ownership of the Model
– Just because the project ends
doesn’t mean the model stops
working. Unlike other managed
service providers, what we
produce on your behalf is yours
to keep
Identify Use Cases
– By holding a workshop with
process SMEs to identify
opportunities to supercharge the
business
Summarize the Findings
– So you can understand the
model’s outputs and begin
taking action on what we’ve
learned
Develop a Prototype Model
– To generate forecasts,
classifications, or exploratory
analysis for one of your use
cases using an industry-standard
tool like Azure Machine Learning
Studio or Databricks
Week 1 Weeks 2-5 Week 6
47. Fully Operational Production Model
– Available at all times, in production
– Batch & API integrations
Model Supervision
– Model is monitored for ongoing usability
– Performance dashboard
– Guaranteed accuracy SLAs
Model Retraining & Support
– Scheduled & triggered model re-tuning or re-training
– Add new data features over time
Model as a Service Solution
Set up model as
a web service
Visualize model
performance in a
dashboard
Maintain and
enhance model
51. What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Best of Microsoft
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
52. Azure Databricks key audiences & benefits
Data scientist
Unified analytics platform
Integrated workspace
Easy data exploration
Collaborative experience
Interactive dashboards
Faster insights
• Best of Spark & serverless
• Databricks managed Spark
Data engineer
Improved ETL performance
• Zero management clusters, serverless
Easy to schedule jobs
Automated workflows
Enhanced monitoring & troubleshooting
• Automated alerts & easy access to logs
CDO, VP of analytics
Zero Management Spark
Cluster democratization (serverless)
Fast, collaborative analytics platform
accelerating time to market
No dev-ops required
Enterprise grade security
• Encryption
• End-to-end auditing
• Role-based control
• Compliance
Provided by Microsoft and Databricks under NDA