SlideShare a Scribd company logo
BAS 250
Lesson 2: Data Preparation
 Explain concepts and purpose of Data Preparation
 Understand solutions for handling missing and
inconsistent data
 Utilize data and attribute reduction techniques
 Effectively work in RapidMiner to prepare your data.
This Week’s Learning Objectives
The Data Mining Process: CRISP-DM
o Join data sets that are needed for your analysis
o Reduce data sets to only include pertinent
variables
o Scrub data to remove anomalies- outliers or
missing data
o Reformat for consistency and effective use
3. Data Preparation
 Ensure robustness of data
o Combine more 2 or more data sets to create a “mini – database”
with all variables needed for analysis in one place.
o Merge by a unique identifier common to both data sets
 “Key Identifier”, “Common ID”, “ID Number”, etc.
 Example: Social Security Number (links Medical and Insurance)
Data Preparation
Data Preparation
Example: Sources of Data
Customer Purchases - “Point of Sale data” – CSV file format
Cost of Products Sold – “Accounting department” – Excel file format
Inventory of Products - “ IT Data Warehouse” - XML file format
Merge By Product ID or SKU
 Data Reduction…two part
o Observations (rows, instances, etc)
o Attributes (variables, records, columns, etc)
Data Preparation
 Attribute reduction to filter out irrelevant or
uninteresting data without completely removing them
from the original set.
 Even if a variable isn’t interesting for answering some
questions, it may still be useful in others.
It is recommended to import all attributes first, then filter as necessary
Data Preparation
 Observation Reduction…
 Observation reduction is to reduce the # of observations to create a
smaller data set.
 Some reasons to do so:
o Create a sample set for:
 Training data, proof of concept analysis, testing theories, sharing data
o Improve analysis speed or process time
o Data scrubbing for outliers, missing values, etc.
Data Preparation
 Ensure consistency of data
o Missing information
o Spelling errors, typos
o Multiple responses for an attribute
o Characters in numeric fields and vice-versa
Data Preparation
 Ensure consistency of data
Data Preparation
KEY: Missing data is data that does not exist in a data set
• Not the same as zero or some other value
• In a dataset, it is blank and the value is unknown
• Sometimes referred to as null values
• Depending on your objective and the circumstance, you may
choose to leave missing data as they are or replace with some
other value
 Ensure consistency of data
Data Preparation
KEY: Inconsistent data is different from missing data
• Occurs when a value does exist but its value is not valid
or meaningful.
• Common = “.” or “zero”
 Ensure consistency of data
Data Preparation
Replace or remove missing or inconsistent data
• For numeric data…
• Can be replaced using Measures of Central Tendency
• Mean, Median, and Mode
• Mean - Average value
• Median - Middle value
• Mode - Most frequent or common value
 Ensure consistency of data
Data Preparation
Replace or remove missing or inconsistent data
• For character data…
• Can be replaced using Best Estimated Value
• “Like Others”
• Ex. All males in data like bass fishing. If attribute “Fish Type” is blank and
attribute “Gender” equals male, then “Bass”
• “Clustering Techniques”
• “Best Guess”
 Ensure consistency of data
Data Preparation
• Replacing missing or inconsistent values found in
data should be done:
• With intention, not haphazardly
• Use common sense
• Be transparent
It is recommended to always document your
missing or consistent data processes.
 This course is a practical application course in Data Mining. Learning to
use RapidMiner is required.
 If you have not done so yet, please plan to walk through the tutorial
examples in RapidMiner.
 To assist you in understanding RapidMiner, I will take screenshots of what
I am doing to get the results we are looking for.
 RapidMiner is pretty intuitive. You will get it quickly.
Basics of RapidMiner
 Types of files that can be imported into RapidMiner:
o CSV File
o Excel File
o XML File
o Access Database Table
o … and much more
 We use mainly CSV files which contain Comma Separated Values- be mindful if
your dataset contains commas
o Alternative delimiters can be selected in this case:
 Tab
 Semicolon
 Pipe ( l ), etc.
Basics of RapidMiner
 Three main areas that contain useful tools in
RapidMiner:
o Operators – Every possible task you can think of
o Repositories – Where you store your data
o Parameters – Task set up details
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
Basics of RapidMiner
 Explain concepts and purpose of Data Preparation
 Understand solutions for handling missing and inconsistent
data
 Utilize data and attribute reduction techniques
 Effectively work in RapidMiner to prepare your data.
Summary
“This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s
Employment and Training Administration. The solution was created by the grantee and does not
necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor
makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such
information, including any information on linked sites and including, but not limited to, accuracy of the
information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.”
Except where otherwise stated, this work by Wake Technical Community College Building Capacity in
Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative
Commons Attribution 4.0 International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/4.0/
Copyright Information

More Related Content

What's hot

Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
Dr. C.V. Suresh Babu
 
Data analytics
Data analyticsData analytics
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Srinimf-Slides
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
Umasree Raghunath
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
Edureka!
 
SAS/MIT/Sloan Data Analytics
SAS/MIT/Sloan Data AnalyticsSAS/MIT/Sloan Data Analytics
SAS/MIT/Sloan Data AnalyticsSteven Kimber
 
Introducing SPSS customer overview
Introducing SPSS customer overviewIntroducing SPSS customer overview
Introducing SPSS customer overviewebuc
 
Data analytics
Data analyticsData analytics
Data analytics
davidfergarcia
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
Ashish Chandra Jha
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceVignesh Prajapati
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
Jason Geng
 
Challenges in business analytics
Challenges in business analyticsChallenges in business analytics
Challenges in business analytics
Miklos Koren
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overview
Colleen Farrelly
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
Seth Grimes
 
Analytics
AnalyticsAnalytics
Data Science Project Lifecycle and Skill Set
Data Science Project Lifecycle and Skill SetData Science Project Lifecycle and Skill Set
Data Science Project Lifecycle and Skill Set
IDEAS - Int'l Data Engineering and Science Association
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive Industry
Domino Data Lab
 
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
Edureka!
 

What's hot (20)

Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
SAS/MIT/Sloan Data Analytics
SAS/MIT/Sloan Data AnalyticsSAS/MIT/Sloan Data Analytics
SAS/MIT/Sloan Data Analytics
 
Introducing SPSS customer overview
Introducing SPSS customer overviewIntroducing SPSS customer overview
Introducing SPSS customer overview
 
Data analytics
Data analyticsData analytics
Data analytics
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Challenges in business analytics
Challenges in business analyticsChallenges in business analytics
Challenges in business analytics
 
Big data and data science overview
Big data and data science overviewBig data and data science overview
Big data and data science overview
 
Analysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ DataAnalysis of ‘Unstructured’ Data
Analysis of ‘Unstructured’ Data
 
Reports vs analysis
Reports vs analysisReports vs analysis
Reports vs analysis
 
Analytics
AnalyticsAnalytics
Analytics
 
Data Science Project Lifecycle and Skill Set
Data Science Project Lifecycle and Skill SetData Science Project Lifecycle and Skill Set
Data Science Project Lifecycle and Skill Set
 
Leveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive IndustryLeveraging Data Science in the Automotive Industry
Leveraging Data Science in the Automotive Industry
 
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
How to Become a Data Analyst? | Data Analyst Skills | Data Analyst Training |...
 

Viewers also liked

Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolutionLearning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Vibeesh CS
 
SAS BASICS
SAS BASICSSAS BASICS
SAS BASICS
Bhuwanesh Rawat
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
Venkata Reddy Konasani
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
guest2160992
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
Pratima Pandey
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
Wake Tech BAS
 
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
Ayapparaj SKS
 
SAS TRAINING
SAS TRAININGSAS TRAINING
SAS TRAINING
Krishna Stansys
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
Wake Tech BAS
 
Where Vs If Statement
Where Vs If StatementWhere Vs If Statement
Where Vs If Statement
Sunil Gupta
 
Base 9.1 preparation guide
Base 9.1 preparation guideBase 9.1 preparation guide
Base 9.1 preparation guideimaduddin91
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
Edureka!
 
Base SAS Full Sample Paper
Base SAS Full Sample Paper Base SAS Full Sample Paper
Base SAS Full Sample Paper
Jimmy Rana
 
Statistical analytical programming for social media analysis .
Statistical analytical programming for social media analysis .Statistical analytical programming for social media analysis .
Statistical analytical programming for social media analysis .
Felicita Florence
 
Base SAS Exam Questions
Base SAS Exam QuestionsBase SAS Exam Questions
Base SAS Exam Questionsguestc45097
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Edureka!
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
Buhwan Jeong
 
The Second Little Book of Leadership
The Second Little Book of LeadershipThe Second Little Book of Leadership
The Second Little Book of LeadershipPhil Dourado
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
Durgadatta Dash
 

Viewers also liked (20)

Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolutionLearning SAS by Example -A Programmer’s Guide by Ron CodySolution
Learning SAS by Example -A Programmer’s Guide by Ron CodySolution
 
SAS BASICS
SAS BASICSSAS BASICS
SAS BASICS
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
 
BAS 250 Lecture 1
BAS 250 Lecture 1BAS 250 Lecture 1
BAS 250 Lecture 1
 
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
SAS Ron Cody Solutions for even Number problems from Chapter 16 to 20
 
SAS TRAINING
SAS TRAININGSAS TRAINING
SAS TRAINING
 
BAS 250 Lecture 8
BAS 250 Lecture 8BAS 250 Lecture 8
BAS 250 Lecture 8
 
Where Vs If Statement
Where Vs If StatementWhere Vs If Statement
Where Vs If Statement
 
Base 9.1 preparation guide
Base 9.1 preparation guideBase 9.1 preparation guide
Base 9.1 preparation guide
 
Analytics with SAS
Analytics with SASAnalytics with SAS
Analytics with SAS
 
Sas demo
Sas demoSas demo
Sas demo
 
Base SAS Full Sample Paper
Base SAS Full Sample Paper Base SAS Full Sample Paper
Base SAS Full Sample Paper
 
Statistical analytical programming for social media analysis .
Statistical analytical programming for social media analysis .Statistical analytical programming for social media analysis .
Statistical analytical programming for social media analysis .
 
Base SAS Exam Questions
Base SAS Exam QuestionsBase SAS Exam Questions
Base SAS Exam Questions
 
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | EdurekaBig Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
Big Data Career Path | Big Data Learning Path | Hadoop Tutorial | Edureka
 
Deep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applicationsDeep learning - Conceptual understanding and applications
Deep learning - Conceptual understanding and applications
 
The Second Little Book of Leadership
The Second Little Book of LeadershipThe Second Little Book of Leadership
The Second Little Book of Leadership
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
 

Similar to BAS 250 Lecture 2

data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjh
VISHALMARWADE1
 
Data Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptxData Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptx
PratikshaSurve4
 
ML-ChapterTwo-Data Preprocessing.ppt
ML-ChapterTwo-Data Preprocessing.pptML-ChapterTwo-Data Preprocessing.ppt
ML-ChapterTwo-Data Preprocessing.ppt
belay41
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
DrGnaneswariG
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
Mahmoud Alfarra
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
ShaikSikindar1
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
Ramakrishna Reddy Bijjam
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptx
bajajrishabh96tech
 
Data Quality: principles, approaches, and best practices
Data Quality: principles, approaches, and best practicesData Quality: principles, approaches, and best practices
Data Quality: principles, approaches, and best practices
Carl Anderson
 
Data processing
Data processingData processing
Data processing
AnupamSingh211
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processing
Mahmoud Alfarra
 
Knowledge discovery claudiad amato
Knowledge discovery claudiad amatoKnowledge discovery claudiad amato
Knowledge discovery claudiad amato
SSSW
 
Data Preparation.pptx
Data Preparation.pptxData Preparation.pptx
Data Preparation.pptx
YashikaSengar2
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessing
Knoldus Inc.
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation a
Rai University
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptx
Akash527744
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
priyanka rajput
 

Similar to BAS 250 Lecture 2 (20)

data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjh
 
Metopen 6
Metopen 6Metopen 6
Metopen 6
 
Data Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptxData Processing & Explain each term in details.pptx
Data Processing & Explain each term in details.pptx
 
ML-ChapterTwo-Data Preprocessing.ppt
ML-ChapterTwo-Data Preprocessing.pptML-ChapterTwo-Data Preprocessing.ppt
ML-ChapterTwo-Data Preprocessing.ppt
 
Chapter 3.pdf
Chapter 3.pdfChapter 3.pdf
Chapter 3.pdf
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 
Business analyst
Business analystBusiness analyst
Business analyst
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
Pandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptxPandas Data Cleaning and Preprocessing PPT.pptx
Pandas Data Cleaning and Preprocessing PPT.pptx
 
Data Quality: principles, approaches, and best practices
Data Quality: principles, approaches, and best practicesData Quality: principles, approaches, and best practices
Data Quality: principles, approaches, and best practices
 
Data processing
Data processingData processing
Data processing
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processing
 
Knowledge discovery claudiad amato
Knowledge discovery claudiad amatoKnowledge discovery claudiad amato
Knowledge discovery claudiad amato
 
Data Preparation.pptx
Data Preparation.pptxData Preparation.pptx
Data Preparation.pptx
 
KNOLX_Data_preprocessing
KNOLX_Data_preprocessingKNOLX_Data_preprocessing
KNOLX_Data_preprocessing
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation a
 
BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptx
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
 

More from Wake Tech BAS

BAS 250 Lecture 5
BAS 250 Lecture 5BAS 250 Lecture 5
BAS 250 Lecture 5
Wake Tech BAS
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
Wake Tech BAS
 
BAS 250 Lecture 3
BAS 250 Lecture 3BAS 250 Lecture 3
BAS 250 Lecture 3
Wake Tech BAS
 
BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 Lecture
Wake Tech BAS
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 Lecture
Wake Tech BAS
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 Lecture
Wake Tech BAS
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
Wake Tech BAS
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
Wake Tech BAS
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
Wake Tech BAS
 

More from Wake Tech BAS (9)

BAS 250 Lecture 5
BAS 250 Lecture 5BAS 250 Lecture 5
BAS 250 Lecture 5
 
BAS 250 Lecture 4
BAS 250 Lecture 4BAS 250 Lecture 4
BAS 250 Lecture 4
 
BAS 250 Lecture 3
BAS 250 Lecture 3BAS 250 Lecture 3
BAS 250 Lecture 3
 
BAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 LectureBAS 150 Lesson 8 Lecture
BAS 150 Lesson 8 Lecture
 
BAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 LectureBAS 150 Lesson 7 Lecture
BAS 150 Lesson 7 Lecture
 
BAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 LectureBAS 150 Lesson 6 Lecture
BAS 150 Lesson 6 Lecture
 
BAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 LectureBAS 150 Lesson 5 Lecture
BAS 150 Lesson 5 Lecture
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
 
BAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 LectureBAS 150 Lesson 3 Lecture
BAS 150 Lesson 3 Lecture
 

Recently uploaded

Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
Kartik Tiwari
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 

Recently uploaded (20)

Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 

BAS 250 Lecture 2

  • 1. BAS 250 Lesson 2: Data Preparation
  • 2.  Explain concepts and purpose of Data Preparation  Understand solutions for handling missing and inconsistent data  Utilize data and attribute reduction techniques  Effectively work in RapidMiner to prepare your data. This Week’s Learning Objectives
  • 3. The Data Mining Process: CRISP-DM
  • 4. o Join data sets that are needed for your analysis o Reduce data sets to only include pertinent variables o Scrub data to remove anomalies- outliers or missing data o Reformat for consistency and effective use 3. Data Preparation
  • 5.  Ensure robustness of data o Combine more 2 or more data sets to create a “mini – database” with all variables needed for analysis in one place. o Merge by a unique identifier common to both data sets  “Key Identifier”, “Common ID”, “ID Number”, etc.  Example: Social Security Number (links Medical and Insurance) Data Preparation
  • 6. Data Preparation Example: Sources of Data Customer Purchases - “Point of Sale data” – CSV file format Cost of Products Sold – “Accounting department” – Excel file format Inventory of Products - “ IT Data Warehouse” - XML file format Merge By Product ID or SKU
  • 7.  Data Reduction…two part o Observations (rows, instances, etc) o Attributes (variables, records, columns, etc) Data Preparation
  • 8.  Attribute reduction to filter out irrelevant or uninteresting data without completely removing them from the original set.  Even if a variable isn’t interesting for answering some questions, it may still be useful in others. It is recommended to import all attributes first, then filter as necessary Data Preparation
  • 9.  Observation Reduction…  Observation reduction is to reduce the # of observations to create a smaller data set.  Some reasons to do so: o Create a sample set for:  Training data, proof of concept analysis, testing theories, sharing data o Improve analysis speed or process time o Data scrubbing for outliers, missing values, etc. Data Preparation
  • 10.  Ensure consistency of data o Missing information o Spelling errors, typos o Multiple responses for an attribute o Characters in numeric fields and vice-versa Data Preparation
  • 11.  Ensure consistency of data Data Preparation KEY: Missing data is data that does not exist in a data set • Not the same as zero or some other value • In a dataset, it is blank and the value is unknown • Sometimes referred to as null values • Depending on your objective and the circumstance, you may choose to leave missing data as they are or replace with some other value
  • 12.  Ensure consistency of data Data Preparation KEY: Inconsistent data is different from missing data • Occurs when a value does exist but its value is not valid or meaningful. • Common = “.” or “zero”
  • 13.  Ensure consistency of data Data Preparation Replace or remove missing or inconsistent data • For numeric data… • Can be replaced using Measures of Central Tendency • Mean, Median, and Mode • Mean - Average value • Median - Middle value • Mode - Most frequent or common value
  • 14.  Ensure consistency of data Data Preparation Replace or remove missing or inconsistent data • For character data… • Can be replaced using Best Estimated Value • “Like Others” • Ex. All males in data like bass fishing. If attribute “Fish Type” is blank and attribute “Gender” equals male, then “Bass” • “Clustering Techniques” • “Best Guess”
  • 15.  Ensure consistency of data Data Preparation • Replacing missing or inconsistent values found in data should be done: • With intention, not haphazardly • Use common sense • Be transparent It is recommended to always document your missing or consistent data processes.
  • 16.  This course is a practical application course in Data Mining. Learning to use RapidMiner is required.  If you have not done so yet, please plan to walk through the tutorial examples in RapidMiner.  To assist you in understanding RapidMiner, I will take screenshots of what I am doing to get the results we are looking for.  RapidMiner is pretty intuitive. You will get it quickly. Basics of RapidMiner
  • 17.  Types of files that can be imported into RapidMiner: o CSV File o Excel File o XML File o Access Database Table o … and much more  We use mainly CSV files which contain Comma Separated Values- be mindful if your dataset contains commas o Alternative delimiters can be selected in this case:  Tab  Semicolon  Pipe ( l ), etc. Basics of RapidMiner
  • 18.  Three main areas that contain useful tools in RapidMiner: o Operators – Every possible task you can think of o Repositories – Where you store your data o Parameters – Task set up details Basics of RapidMiner
  • 26.  Explain concepts and purpose of Data Preparation  Understand solutions for handling missing and inconsistent data  Utilize data and attribute reduction techniques  Effectively work in RapidMiner to prepare your data. Summary
  • 27. “This workforce solution was funded by a grant awarded by the U.S. Department of Labor’s Employment and Training Administration. The solution was created by the grantee and does not necessarily reflect the official position of the U.S. Department of Labor. The Department of Labor makes no guarantees, warranties, or assurances of any kind, express or implied, with respect to such information, including any information on linked sites and including, but not limited to, accuracy of the information or its completeness, timeliness, usefulness, adequacy, continued availability, or ownership.” Except where otherwise stated, this work by Wake Technical Community College Building Capacity in Business Analytics, a Department of Labor, TAACCCT funded project, is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ Copyright Information