Introduction to Descriptive & Predictive Analytics
CS5122 DESCRIPTIVE & PREDICTIVE ANALYTICS
DILUM BANDARA
DILUM.BANDARA@UOM.LK
Data Analytics, Engineering, & Science
• Buzzwords that every decision maker either wants to or is forced to look into
• Data-driven decision making is hard
• Needs the right data, fitting tools, skilled analysts, & a supportive environment
  ◦ Data analysts
  ◦ Domain experts
  ◦ Tool experts
Useful Insights
Objectives
• Given a dataset, to train you to
  ◦ Ask the right questions
  ◦ Identify the right tool(s)
  ◦ Derive the right answers/insights
• We take a data-driven approach
  ◦ First, derive a set of questions based on the data available for the analysis
  ◦ Explore potential techniques to support answering those questions while using the available data
  ◦ Derive the right answers by interpreting processed data & visualizations
Source: https://moz.com/blog/when-it-comes-to-analytics-are-you-doing-enough
Descriptive, Predictive, & Prescriptive Analytics
• Descriptive Analytics
  ◦ Use data aggregation & data mining techniques to provide insight into the past & answer: “What has happened?”
• Predictive Analytics
  ◦ Use statistical models & forecasting techniques to understand the future & answer: “What could happen?”
• Prescriptive Analytics
  ◦ Use optimization & simulation algorithms to advise on possible outcomes & answer: “What should we do?”
Source: https://halobi.com/2014/10/descriptive-predictive-and-prescriptive-analytics-explained/
Example
Descriptive analytics
◦ Wal-Mart found that on Friday afternoons, young American males who buy diapers also tend to buy beer
◦ Potential sales of each item can increase if they are kept close to each other
Predictive analytics
◦ Demand for diapers could increase in mid to late summer as more babies are expected to be born in the USA
◦ Make sure expectant mothers are informed of their diaper choices through advertising, & production & supply are ready to meet the extra demand
◦ Increased sales
Prescriptive analytics
◦ When to start advertising & when to give discounts?
◦ Helps us understand the most effective dates & discount percentages that increase not only sales but also profit
Tools can help reduce difficulty
Source: IBM
Review of Basic Statistics & Probability
Populations & Samples
• Population
  ◦ All items of interest for a particular decision or investigation
  ◦ E.g., all Gmail users, all subscribers to Netflix
• Sample
  ◦ A subset of the population
  ◦ E.g., all Google Apps for Education users, list of customers who rented a comedy from Netflix in the past year
• Purpose of sampling is to obtain sufficient information to draw a valid inference about a population
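A minimal sketch of drawing a sample to infer something about a population. The population here is simulated (hypothetical monthly viewing hours for 100,000 subscribers), and the sample is a simple random sample, which is an assumption of this sketch rather than something the slide prescribes.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly viewing hours for 100,000 subscribers.
population = [random.gauss(30, 8) for _ in range(100_000)]

# Simple random sample of 500 subscribers, drawn without replacement.
sample = random.sample(population, k=500)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean should land close to the population mean even though only 0.5% of the population was inspected, which is the point of sampling.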
Sample Space & Events
• Sample Space
  ◦ All possible outcomes of an experiment
  ◦ E.g., flipping a coin {H, T}
  ◦ E.g., rolling a die {1, 2, 3, 4, 5, 6}
• Event
  ◦ Any subset of the sample space
  ◦ E.g., {H}, {T}, {H, T}, {1}, or {2, 4, 6}
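Sample spaces and events map naturally onto sets. The sketch below assumes every outcome is equally likely (a fair die); the helper name `probability` is ours, not from the slides.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of rolling one die
even = {2, 4, 6}                   # event: the roll is even


def probability(event, space):
    """P(event) when every outcome in `space` is equally likely."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


print(probability(even, sample_space))  # 1/2
print(probability({1}, sample_space))   # 1/6
```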
Random Variable
• Variable whose value is subject to variations due to chance
• Discrete random variables
  ◦ E.g., outcome of tossing a coin or rolling a die
• Continuous random variables
  ◦ E.g., stock value, voltage of a sensor
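A short simulation may help contrast the two kinds of random variable. The sensor model (normally distributed around 3.3 V with 0.05 V of noise) is an assumption made only for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Discrete random variable: the face shown by a six-sided die.
die_rolls = [random.randint(1, 6) for _ in range(10)]

# Continuous random variable: a sensor voltage, assumed here to be
# normally distributed around 3.3 V with 0.05 V of noise.
voltages = [random.gauss(3.3, 0.05) for _ in range(10)]

print(die_rolls)
print([round(v, 3) for v in voltages])
```

The die can only take one of six values, while the voltage can take any real value in a range.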
Measures of Location
• Mean
  ◦ Population mean: μ = (Σ x_i) / N
  ◦ Sample mean: x̄ = (Σ x_i) / n
• Median
  ◦ Middle value of data when sorted from least to greatest
• Mode
  ◦ Observation that occurs most often
• Midrange
  ◦ Average of greatest & least values = (max + min)/2
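The four measures of location can be computed with Python's standard `statistics` module. The data values are made up; the midrange is taken as the average of the greatest and least values, (max + min)/2.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up observations

mean = statistics.mean(data)            # 30 / 6 = 5
median = statistics.median(data)        # average of the two middle values, 3 and 5
mode = statistics.mode(data)            # 3 occurs most often
midrange = (max(data) + min(data)) / 2  # (10 + 2) / 2

print(mean, median, mode, midrange)
```

Note how the single large value (10) pulls the mean and midrange above the median.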
Probability Distribution/Mass Function
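As an illustration of a probability mass function, the sketch below tabulates P(X = x) where X is the sum of two fair dice. The choice of random variable is ours; the slide itself only showed a figure.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Random variable X = sum of the two faces.
counts = Counter(a + b for a, b in outcomes)

# PMF: maps each possible value of X to its probability.
pmf = {}
for total in sorted(counts):
    pmf[total] = Fraction(counts[total], len(outcomes))

for total, p in pmf.items():
    print(total, p)
```

The probabilities sum to 1, and 7 is the most likely total.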
Measures of Dispersion
• Dispersion
  ◦ Refers to the degree of variation in data
• Range
  ◦ Difference between max & min value
• Interquartile Range (IQR)
  ◦ Difference between 3rd and 1st quartiles
• Variance
  ◦ Average of squared deviations from the mean
• Standard Deviation (STD)
  ◦ Square root of the variance
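A sketch of these dispersion measures using the standard `statistics` module on made-up data. Quartile conventions differ between tools, so the IQR computed here (Python's default "exclusive" method) may not match another package exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations, mean = 5

data_range = max(data) - min(data)  # 9 - 2

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

pop_variance = statistics.pvariance(data)    # sum of squared deviations / N
sample_variance = statistics.variance(data)  # sum of squared deviations / (N - 1)
pop_std = statistics.pstdev(data)            # square root of the population variance
sample_std = statistics.stdev(data)          # square root of the sample variance

print(data_range, iqr, pop_variance, sample_variance, pop_std, sample_std)
```

The population and sample versions correspond to the `.P` and `.S` variants of the Excel functions.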
Measures of Dispersion (Cont.)
• z-score
  ◦ Standard score is the number of STDs an observation is above/below the mean: z = (x - mean) / STD
• For many data sets encountered in practice (roughly bell-shaped):
  ◦ ~68% of observations fall within 1 STD of the mean
  ◦ ~95% fall within 2 STDs
  ◦ ~99.7% fall within 3 STDs
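The 68/95/99.7 pattern can be checked on simulated data. The sketch assumes normally distributed observations (mean 50, STD 10, both arbitrary); data that are far from bell-shaped need not follow these percentages.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable
data = [random.gauss(50, 10) for _ in range(10_000)]  # simulated bell-shaped data

mean = statistics.fmean(data)
std = statistics.pstdev(data)


def z_score(x, mean, std):
    """Number of STDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / std


def share_within(values, mean, std, k):
    """Fraction of observations whose |z-score| is at most k."""
    inside = sum(1 for x in values if abs(z_score(x, mean, std)) <= k)
    return inside / len(values)


shares = {}
for k in (1, 2, 3):
    shares[k] = share_within(data, mean, std, k)
    print(f"within {k} STD of the mean: {shares[k]:.1%}")
```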
Measures of Dispersion (Cont.)
• Coefficient of Variation (CV)
  ◦ A relative measure of dispersion: CV = STD / mean
  ◦ Return to risk = 1/CV
Exercise
Mean & STD of closing stock prices:
• Intel (INTC): Mean = $18.81, STD = $0.50
• General Electric (GE): Mean = $16.19, STD = $0.35
Which stock has the higher investment risk?
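One way to approach the exercise is to compare the coefficient of variation of the two stocks, taking CV as STD divided by mean. The function and variable names below are ours; the figures are the ones given in the exercise.

```python
def coefficient_of_variation(mean, std):
    """CV = standard deviation / mean, a unit-free measure of relative dispersion."""
    return std / mean


stocks = {
    "INTC": {"mean": 18.81, "std": 0.50},
    "GE": {"mean": 16.19, "std": 0.35},
}

cv = {}
for name, s in stocks.items():
    cv[name] = coefficient_of_variation(s["mean"], s["std"])
    print(f"{name}: CV = {cv[name]:.4f}, return to risk = {1 / cv[name]:.1f}")

# The stock with the larger CV varies more relative to its mean price.
riskier = max(cv, key=cv.get)
print("Higher relative risk:", riskier)
```

Comparing STDs alone would be misleading when the means differ; the CV puts both stocks on the same relative scale.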
Measures of Dispersion (Cont.)
• Percentiles
  ◦ Value below which a given percentage of the observations in a group fall
Source: www.mathsisfun.com/data/percentiles.html
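A minimal sketch of the idea behind percentiles: for a given value, count what percentage of observations fall below it. The scores are made up, and "strictly below" is one of several conventions in use.

```python
def percentile_rank(values, x):
    """Percentage of observations in `values` that fall strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 93]  # made-up exam scores

# 7 of the 10 scores are below 81, so 81 sits at the 70th percentile.
print(percentile_rank(scores, 81))  # 70.0
```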
Measures of Shape
• Skewness
  ◦ Describes lack of symmetry
• Coefficient of Skewness (CS)
  ◦ CS < 0 for left-skewed data
  ◦ CS > 0 for right-skewed data
  ◦ |CS| > 1 suggests a high degree of skewness
  ◦ 0.5 ≤ |CS| ≤ 1 suggests moderate skewness
  ◦ |CS| < 0.5 suggests relative symmetry
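The slide's formula did not survive extraction, so the sketch below assumes the common population (moment-based) definition: the mean cubed deviation divided by the cube of the population STD. Spreadsheet functions such as SKEW apply a sample-size adjustment, so their values can differ slightly.

```python
import statistics


def coefficient_of_skewness(data):
    """Population skewness: mean cubed deviation from the mean / STD cubed."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 3 for x in data) / (len(data) * sigma ** 3)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail to the right

print(coefficient_of_skewness(symmetric))     # 0 for perfectly symmetric data
print(coefficient_of_skewness(right_skewed))  # positive, and above 1 here
```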
Measures of Shape (Cont.)
• Kurtosis
  ◦ Refers to peakedness or flatness
• Coefficient of Kurtosis (CK)
  ◦ CK < 3 indicates data is somewhat flat with a wide degree of dispersion
  ◦ CK > 3 indicates data is somewhat peaked with less dispersion
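As with skewness, the slide's formula is missing, so this sketch assumes the population definition: the mean fourth-power deviation divided by the fourth power of the population STD, which equals 3 for a normal distribution. As far as we know, Excel's KURT reports excess kurtosis (already reduced by 3, with a small-sample adjustment), so its output is compared against 0 rather than 3.

```python
import statistics


def coefficient_of_kurtosis(data):
    """Population kurtosis: mean fourth-power deviation / STD to the fourth power."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 4 for x in data) / (len(data) * sigma ** 4)


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # most values at the centre, two far out

print(coefficient_of_kurtosis(flat))    # below 3
print(coefficient_of_kurtosis(peaked))  # above 3
```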
Measures of Association
• Covariance
  ◦ Measure of linear association between 2 variables, X & Y
  ◦ Population: cov(X, Y) = Σ (x_i - μ_X)(y_i - μ_Y) / N
  ◦ Sample: cov(X, Y) = Σ (x_i - x̄)(y_i - ȳ) / (n - 1)
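A sketch of the population and sample covariance using the standard definitions (sum of products of deviations, divided by N or by n - 1). The data and the `sample` flag are illustrative choices of ours.

```python
import statistics


def covariance(xs, ys, sample=True):
    """Covariance of paired observations; divides by n - 1 (sample) or n (population)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y, sample=False))  # population covariance: 6 / 5
print(covariance(x, y, sample=True))   # sample covariance: 6 / 4
```

A positive value means Y tends to increase with X, but the magnitude depends on the units of both variables.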
Measures of Association
• Correlation
  ◦ Measure of linear association between 2 variables, X & Y
  ◦ Correlation Coefficient
  ◦ Doesn’t depend upon units of measurement (unlike covariance)
  ◦ Population: ρ_XY = cov(X, Y) / (σ_X σ_Y)
  ◦ Sample: r_XY = cov(X, Y) / (s_X s_Y)
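The sketch below computes the correlation coefficient as covariance divided by the product of the two STDs (the n or n - 1 factors cancel) and checks that changing units leaves it unchanged. The height and weight figures are made up.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 77, 88]
height_m = [h / 100 for h in height_cm]  # same data, different unit

r_cm = correlation(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)

print(round(r_cm, 4), round(r_m, 4))  # the two values agree
```

The covariance of the same data would shrink by a factor of 100 when heights are expressed in metres, while the correlation stays put.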
Measures of Association
Outliers
• Mean & range are sensitive to outliers
• No standard definition of what constitutes an outlier
• Possible methods to identify outliers are:
  ◦ z-scores greater than +3 or less than -3
  ◦ Extreme outliers are more than 3*IQR to the left of Q1 or right of Q3
  ◦ Mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
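Both detection rules can be sketched in a few lines. The data are made up, the function names are ours, and the quartiles use Python's default method, so boundary cases may be classified differently by other tools.

```python
import statistics


def classify_by_iqr(data):
    """Return (mild, extreme) outliers using the 1.5*IQR and 3*IQR rules."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        distance = max(q1 - x, x - q3)  # how far x lies outside [Q1, Q3]
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme


def z_score_outliers(data, threshold=3.0):
    """Return observations whose |z-score| exceeds the threshold."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [x for x in data if abs((x - mu) / sigma) > threshold]


data = [12, 13, 14, 15, 16] * 4 + [23, 40]  # 20 typical values plus two large ones

mild, extreme = classify_by_iqr(data)
print("mild:", mild)                        # [23]
print("extreme:", extreme)                  # [40]
print("z-score:", z_score_outliers(data))   # [40]
```

The z-score rule misses 23 here because the outliers themselves inflate the mean and STD, which is one reason the quartile-based rule is often preferred for small data sets.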


Editor's Notes

  1. Excel: AVERAGE(data range), MEDIAN(data range), MODE.SNGL(data range)
  2. Excel: VAR.P(data range), VAR.S(data range), STDEV.P(data range), STDEV.S(data range)
  3. Excel: STANDARDIZE(x, mean, standard deviation)
  4. CV: INTC = 0.0265, GE = 0.0216. INTC is a higher-risk investment than GE.
  5. A total of 10,000 people visited the shopping mall over 12 hours
  6. Excel: SKEW(data range)
  7. kur·to·sis (ker to sis). Excel: KURT(data range)
  8. Excel: COVARIANCE.P(array1, array2), COVARIANCE.S(array1, array2)
  9. Normalized. Excel: CORREL(array1, array2)