DATA ANALYTICS
Introduction to Big Data
• Data are everywhere.
• IBM estimates that 2.5 quintillion bytes of data are generated every day.
• 90 percent of the world's data has been created in the last two years.
• 85 percent of organizations are projected to be unable to exploit big data for competitive advantage.
• 4.4 million jobs are projected to be created around big data.
Largest Data Sets Analyzed (KDnuggets)
Data size       Count  Percentage (%)
Less than 1 MB     12   3.3
1.1 to 10 MB        8   2.5
11 to 100 MB       14   4.3
101 MB to 1 GB     50  15.5
1.1 to 10 GB       59    18
11 to 100 GB       52    16
101 GB to 1 TB     59    18
1.1 to 10 TB       39    12
11 to 100 TB       15   4.7
101 TB to 1 PB      6   1.9
1.1 to 10 PB        2   0.6
11 to 100 PB        0     0
Over 100 PB         6   1.9
Example Applications
• Mailbox analysis
• Internet bills
• Electricity bills
• Social media
Analytics Process Model
ANALYTICS
• Analytics is a term that is often used interchangeably with data science, data mining, and knowledge discovery.
• It refers to extracting useful business patterns or mathematical decision models from a preprocessed data set.
• Different underlying techniques can be used for this purpose:
  • Statistics (linear and logistic regression)
  • Machine learning (decision trees)
  • Biology-inspired methods (neural networks)
  • Kernel methods (support vector machines, SVM)
Predictive and Descriptive - Distinction
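• Predictive analytics
  • A target (dependent) variable is available.
  • The target can be categorical (classification) or continuous (regression).
• Descriptive analytics
  • No target variable is available.
  • Examples: association rules, sequence rules, and clustering.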
Analytical Model Requirements
• A first critical success factor is business relevance.
  • The analytical model should actually solve the business problem for which it was developed.
  • A working analytical model is of little use if it has drifted away from the original problem statement.
  • To achieve business relevance, the business problem must be appropriately defined, qualified, and agreed upon by all parties involved at the outset of the analysis.
Analytical Model Requirements
• A second criterion is statistical performance.
  • The model should have statistical significance and predictive power.
• Depending on the application, analytical models should also be:
  • Interpretable: the patterns that the analytical model captures can be understood.
  • Justifiable: the degree to which the model corresponds to prior business knowledge.
Analytical Model Requirements
• Analytical models should also be operationally efficient. This covers the effort needed to:
  • collect the data,
  • preprocess the data,
  • evaluate the model, and
  • feed its outputs to the business application.
• The economic cost needed to set up the analytical model should also be taken into account.
• Analytical models should also comply with both local and international regulations.
STANDARDIZING AND CATEGORIZING
• Data standardization is the process of converting data into a common format so that users can process and analyze it.
• Bringing data into a common format is a critical step that enables:
  • collaborative research,
  • large-scale analytics, and
  • sharing of sophisticated tools and methodologies.
Steps to Standardize Data
• Four steps to standardize customer data for better insights:
  • Step 1: Conduct a data source audit.
  • Step 2: Define standards for data formats.
  • Step 3: Standardize the format of external data sources.
  • Step 4: Standardize existing data in the database.
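Steps 2 to 4 can be illustrated with a minimal Python sketch that converts date strings arriving in different formats into one standard format (ISO 8601, YYYY-MM-DD). The list `KNOWN_FORMATS` and the function `standardize_date` are hypothetical; in practice the accepted formats would come from the data source audit (Step 1) and the standards agreed in Step 2.

```python
from datetime import datetime

# Hypothetical formats found during the data source audit (Step 1).
# Day-first is assumed for "05/03/2023"; month names assume an English (C) locale.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y", "%B %d, %Y"]

def standardize_date(raw):
    """Return the date in the agreed standard format (ISO 8601), or raise ValueError."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")

for value in ["2023-03-05", "05/03/2023", "05-Mar-2023", "March 5, 2023"]:
    print(value, "->", standardize_date(value))
```

Ambiguous inputs such as 05/03/2023 (day first or month first?) show why Step 2 matters: the standard has to fix one interpretation before existing and external data can be converted.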
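• In the statistical sense, standardizing a numeric variable rescales it using its mean and standard deviation (the z-score):
  • z_i = (x_i - μ) / σ, where μ is the mean and σ is the standard deviation.
  • The standardized variable has mean 0 and standard deviation 1.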
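A minimal Python sketch of the z-score formula, using the standard library `statistics` module. It divides by the population standard deviation (`statistics.pstdev`); for the sample version (n - 1 denominator) use `statistics.stdev` instead. The function name `z_scores` is illustrative.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population standard deviation 2
print(z_scores(data))             # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```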
CATEGORIZATION
• Categorization is a major component of qualitative data analysis by which investigators attempt to group patterns observed in the data into meaningful units or categories.
• Categorization is also referred to as coarse classification, classing, grouping, binning, etc.
• For categorical variables, categorization is needed to reduce the number of categories.
  • E.g., the variable "purpose of loan" has 50 distinct values.
  • Without grouping, 49 dummy variables would be needed to represent this single variable in a model.
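Why 49 and not 50: with k categories, one category serves as the baseline (coded as all zeros), so only k - 1 dummy variables are needed; adding all k next to an intercept would make them perfectly collinear. A minimal Python sketch, where `dummy_encode` and the loan-purpose values are hypothetical (pandas offers the same idea via `pandas.get_dummies(..., drop_first=True)`):

```python
def dummy_encode(values):
    """Encode a categorical variable as k - 1 dummy (0/1) columns.

    Assumes a non-empty list. The alphabetically first category is the
    baseline and is represented by all zeros.
    """
    categories = sorted(set(values))
    reference, others = categories[0], categories[1:]
    rows = [[1 if value == category else 0 for category in others] for value in values]
    return reference, others, rows

loan_purpose = ["car", "house", "education", "car"]
reference, columns, rows = dummy_encode(loan_purpose)
print(reference)  # car
print(columns)    # ['education', 'house']
print(rows)       # [[0, 0], [0, 1], [1, 0], [0, 0]]

# A variable with 50 distinct values needs 49 dummy columns.
purposes = [f"purpose_{i}" for i in range(50)]
print(len(dummy_encode(purposes)[1]))  # 49
```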
Categorization Methods
• Two very basic methods are used for categorization:
  • equal interval binning
  • equal frequency binning
• Consider, for example, the income values 1,000, 1,200, 1,300, 2,000, 1,800, and 1,400.
• Equal interval binning creates two bins with the same range:
  • Bin 1: 1,000 to 1,500 (1,000, 1,200, 1,300, 1,400)
  • Bin 2: 1,500 to 2,000 (1,800, 2,000)
• Equal frequency binning creates two bins with the same number of observations:
  • Bin 1: 1,000, 1,200, 1,300
  • Bin 2: 1,400, 1,800, 2,000
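A minimal Python sketch of both binning methods that reproduces the income example. The function names are illustrative; ties and uneven splits are handled in one simple way (the maximum value goes into the last equal-interval bin, and bin sizes may differ by one when n is not divisible by k).

```python
def equal_interval_bins(values, k):
    """Split values into k bins of equal width (same range)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: everything falls into the first bin
        return [sorted(values)] + [[] for _ in range(k - 1)]
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for x in values:
        # The maximum sits on the upper edge; keep it in the last bin.
        idx = min(int((x - lo) / width), k - 1)
        bins[idx].append(x)
    return [sorted(b) for b in bins]

def equal_frequency_bins(values, k):
    """Split sorted values into k bins holding (roughly) the same number of observations."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

incomes = [1000, 1200, 1300, 2000, 1800, 1400]
print(equal_interval_bins(incomes, 2))   # [[1000, 1200, 1300, 1400], [1800, 2000]]
print(equal_frequency_bins(incomes, 2))  # [[1000, 1200, 1300], [1400, 1800, 2000]]
```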
Weight of Evidence Coding
• Weight of evidence (WOE) coding is a transformation of the independent variables.
• It is used for grouping (coarse classification), variable selection, etc.
• The weight of evidence indicates the predictive power of an independent variable in relation to the dependent variable.
Weight of Evidence Coding
• Example: predict whether a customer is good or bad based on age or income.
  • Model 1: customer type = a + b × income -> predicts 70% of cases correctly
  • Model 2: customer type = a + b × age -> predicts 60% of cases correctly
• The ability of "income" to separate good and bad customers is therefore greater than that of "age", so income carries more weight of evidence.
Weight of Evidence Coding
• Definition:
  • Because WOE evolved from the credit scoring world, it is generally described as a measure of the separation between good and bad customers.
  • "Bad customers" are customers who defaulted on a loan, and "good customers" are customers who paid back their loan.
  • Positive WOE means Distribution of Goods > Distribution of Bads.
  • Negative WOE means Distribution of Goods < Distribution of Bads.
Weight of Evidence Coding
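A common way to compute WOE that matches the sign convention above is, per bin, WOE = ln(Distribution of Goods / Distribution of Bads), where the distribution of goods is the bin's share of all good customers (likewise for bads). A minimal Python sketch; the function name and the age-bin counts are hypothetical, not taken from the slides, and every bin is assumed to contain at least one good and one bad customer so the logarithm is defined.

```python
import math

def weight_of_evidence(goods, bads):
    """WOE per bin = ln(distribution of goods / distribution of bads).

    goods[i] and bads[i] are the counts of good and bad customers in bin i.
    Assumes every count is positive (otherwise the logarithm is undefined).
    """
    total_good, total_bad = sum(goods), sum(bads)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in zip(goods, bads)]

# Hypothetical counts for three age bins.
age_bins = ["18-30", "31-50", "51+"]
goods = [100, 200, 300]
bads = [40, 40, 20]
for label, woe in zip(age_bins, weight_of_evidence(goods, bads)):
    print(label, round(woe, 3))  # negative, negative, then positive
```

Bins where bads are over-represented get a negative WOE and bins where goods dominate get a positive WOE; the coded value can then replace the raw category in a model.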
DATA SEGMENTATION
• Sometimes data is segmented before analytical modeling starts.
• Segmentation can be conducted:
  • using the experience and knowledge of a business expert, or
  • based on statistical analysis using decision trees, k-means, or self-organizing maps.
• Segmentation is used to estimate separate analytical models, each tailored to a specific segment.
• Segmentation should be done carefully, because every additional model increases production, monitoring, and maintenance costs.
DATA SEGMENTATION
• Data segmentation is the process of dividing the data you hold into groups of similar records based on chosen parameters, so that the data can be used more efficiently in marketing and operations.
• It groups the data into at least two subsets.
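k-means, one of the statistical segmentation options listed above, can be sketched in a few lines. This minimal Python version works on a single numeric variable and starts from user-supplied centroids; the function name and the spend figures are hypothetical, and a real segmentation would typically use several variables and a library implementation such as scikit-learn's `KMeans`.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain k-means on one numeric variable, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each value joins the segment with the nearest centroid.
        clusters = [[] for _ in centroids]
        for x in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its segment (keep it if empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: centroids, and hence assignments, no longer change
        centroids = updated
    return centroids, clusters

# Hypothetical monthly spend of six customers, with two starting centroids.
spend = [120, 150, 130, 900, 950, 1000]
centroids, segments = kmeans_1d(spend, [100, 1000])
print(segments)  # [[120, 150, 130], [900, 950, 1000]]
```

Each resulting segment could then get its own analytical model, which is exactly where the extra maintenance cost mentioned above comes from.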
F-TEST
• The F-test is used to test the equality of two population variances.
• If a researcher wants to test whether two independent samples have been drawn from normal populations with the same variability, the F-test is generally used.
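The F statistic for comparing two variances is the ratio of the two sample variances, conventionally with the larger one in the numerator, and has (n1 - 1, n2 - 1) degrees of freedom. A minimal Python sketch with hypothetical samples; `f_statistic` is an illustrative name. The resulting F would be compared with a critical value from an F table, or turned into a p-value with a statistics library (for example SciPy's `scipy.stats.f.sf(F, df1, df2)`, doubled for a two-sided test).

```python
import statistics

def f_statistic(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

a = [2, 4, 6, 8, 10]  # variance 10
b = [5, 6, 7, 6, 6]   # variance 0.5
print(f_statistic(a, b))  # (20.0, 4, 4)
```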
F-TEST
• It is a statistical test used to compare the variability of two data sets.
• Its output typically summarizes each sample's mean, variance, and number of observations.
• F-test in regression:
  • The overall F-test compares your model with a model that has zero predictor variables (intercept only) and decides whether the added coefficients significantly improved the model.
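For a regression with p predictors fitted on n observations, the overall F statistic is F = ((SSE_0 - SSE_1) / p) / (SSE_1 / (n - p - 1)), where SSE_0 is the sum of squared errors of the intercept-only model (which predicts the mean of y) and SSE_1 that of the fitted model. A minimal Python sketch for simple linear regression (p = 1) with hypothetical data; it assumes n > 2 and that x is not constant.

```python
import math

def overall_f_test(x, y):
    """Overall F-test for simple linear regression y = a + b*x.

    Compares the fitted model with the intercept-only model (zero predictors),
    whose prediction for every observation is the mean of y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                  # least-squares slope
    a = mean_y - b * mean_x        # least-squares intercept
    sse_null = sum((yi - mean_y) ** 2 for yi in y)                    # intercept-only model
    sse_full = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # model with the predictor
    p = 1                          # number of predictors added
    df_resid = n - p - 1
    if sse_full == 0:
        return math.inf, p, df_resid  # perfect fit
    f = ((sse_null - sse_full) / p) / (sse_full / df_resid)
    return f, p, df_resid

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f, df1, df2 = overall_f_test(x, y)
print(round(f, 2), df1, df2)  # 4.5 1 3
```

A large F (relative to the critical value for (p, n - p - 1) degrees of freedom) indicates that the predictors improve the fit beyond simply predicting the mean.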
T-distribution
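• The t-distribution is used instead of the normal distribution when sample sizes are small (and the population standard deviation is unknown), for example to estimate confidence intervals.
• It is also used to determine critical values: how far from the mean, in standard errors, an observation or estimate must lie to be considered significant at a given level.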
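A confidence interval for a mean from a small sample is mean ± t_crit × s / sqrt(n), where s is the sample standard deviation and t_crit comes from the t-distribution with n - 1 degrees of freedom. A minimal Python sketch; the sample is hypothetical, and the critical value 2.262 (two-sided 95%, df = 9) is read from a t table and passed in, since the standard library has no t-distribution (SciPy's `scipy.stats.t.ppf(0.975, df=9)` returns approximately the same value).

```python
import math
import statistics

def t_confidence_interval(sample, t_critical):
    """Confidence interval for the mean: mean +/- t_critical * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 denominator
    margin = t_critical * std_err
    return mean - margin, mean + margin

# Hypothetical sample of 10 observations; 2.262 is the two-sided 95% critical value for df = 9.
sample = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]
low, high = t_confidence_interval(sample, t_critical=2.262)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")  # (12.37, 14.63)
```

Using the normal critical value 1.96 instead would give a narrower interval, which understates the uncertainty of a small sample.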
