SlideShare a Scribd company logo
1 of 13
DATA WRANGLING
ABDULLAH ABBASI
WHAT IS DATA WRANGLING?
 Data wrangling—also called data cleaning, data remediation, or data munging—refers to a
variety of processes designed to transform raw data into more readily used formats. The
exact methods differ from project to project depending on the data you’re leveraging and
the goal you’re trying to achieve.
Examples
 Merging multiple data sources into a single dataset for analysis
 Identifying gaps in data (for example, empty cells in a spreadsheet) and
either filling or deleting them
 Deleting data that’s either unnecessary or irrelevant to the project you’re
working on
 Identifying extreme outliers in data and either explaining the discrepancies
or removing them so that analysis can take place
DATA WRANGLING STEPS
 Each data project requires a unique approach to ensure its final dataset is
reliable and accessible. That being said, several processes typically inform
the approach. These are commonly referred to as data wrangling steps or
activities.
1. Discovery
 Discovery refers to the process of familiarizing yourself with data so you can conceptualize
how you might use it. You can liken it to looking in your refrigerator before cooking a meal
to see what ingredients you have at your disposal.
 During discovery, you may identify trends or patterns in the data, along with obvious
issues, such as missing or incomplete values that need to be addressed. This is an important
step, as it will inform every activity that comes afterward.
2. Structuring
 Raw data is typically unusable in its raw state
because it’s either incomplete or misformatted
for its intended application. Data structuring is
the process of taking raw data and transforming
it to be more readily leveraged. The form your
data takes will depend on the analytical model
you use to interpret it.
3. Cleaning
 Data cleaning is the process of removing inherent errors in data that might
distort your analysis or render it less valuable. Cleaning can come in
different forms, including deleting empty cells or rows, removing outliers,
and standardizing inputs. The goal of data cleaning is to ensure there are no
errors (or as few as possible) that could influence your final analysis.

4. Enriching
 Once you understand your existing data and have transformed it into a more
usable state, you must determine whether you have all of the data necessary
for the project at hand. If not, you may choose to enrich or augment your
data by incorporating values from other datasets. For this reason, it’s
important to understand what other data is available for use.
 If you decide that enrichment is necessary, you need to repeat the steps
above for any new data.
5. Validating
 Data validation refers to the process of verifying that your data is both
consistent and of a high enough quality. During validation, you may
discover issues you need to resolve or conclude that your data is ready to be
analyzed. Validation is typically achieved through various automated
processes and requires programming.
6. Publishing
 Once your data has been validated, you can publish it. This involves making
it available to others within your organization for analysis. The format you
use to share the information—such as a written report or electronic file—
will depend on your data and the organization’s goals.

THE IMPORTANCE OF DATA WRANGLING
 Any analyses a business performs will ultimately be constrained by the data that
informs them. If data is incomplete, unreliable, or faulty, then analyses will be
too—diminishing the value of any insights gleaned.
 Data wrangling seeks to remove that risk by ensuring data is in a reliable
state before it’s analyzed and leveraged. This makes it a critical part of the
analytical process.
 It’s important to note that data wrangling can be time-consuming and taxing on
resources, particularly when done manually. This is why many organizations
institute policies and best practices that help employees streamline the data
cleanup process—for example, requiring that data include certain information or
be in a specific format before it’s uploaded to a database.
 For this reason, it’s vital to understand the steps of the data wrangling process
and the negative outcomes associated with incorrect or faulty data.
Benefits of Data Wrangling
 Data wrangling helps to improve data usability as it converts data into a
compatible format for the end system.
 It helps to quickly build data flows within an intuitive user interface and
easily schedule and automate the data-flow process.
 Integrates various types of information and their sources (like databases,
web services, files, etc.)
 Help users to process very large volumes of data easily and easily share data-
flow techniques.
Data Wrangling Tools
 There are different tools for data wrangling that can be used for gathering,
importing, structuring, and cleaning data before it can be fed into analytics
and BI apps. You can use automated tools for data wrangling, where the
software allows you to validate data mappings and scrutinize data samples
at every step of the transformation process. This helps to quickly detect and
correct errors in data mapping. Automated data cleaning becomes necessary
in businesses dealing with exceptionally large data sets. For manual data
cleaning processes, the data team or data scientist is responsible for
wrangling. In smaller setups, however, non-data professionals are
responsible for cleaning data before leveraging it.
Some examples of basic data munging
tools are:
 Spreadsheets / Excel Power Query - It is the most basic manual data
wrangling tool
 OpenRefine - An automated data cleaning tool that requires programming
skills
 Tabula – It is a tool suited for all data types
 Google DataPrep – It is a data service that explores, cleans, and prepares
data
 Data wrangler – It is a data cleaning and transforming tool

More Related Content

What's hot

DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data MiningDHIVYADEVAKI
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycleManoj Mishra
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data AnalyticsUtkarsh Sharma
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introductionkrishna singh
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 

What's hot (20)

DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data preprocessing in Data Mining
Data preprocessing in Data MiningData preprocessing in Data Mining
Data preprocessing in Data Mining
 
Data analytics vs. Data analysis
Data analytics vs. Data analysisData analytics vs. Data analysis
Data analytics vs. Data analysis
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introduction
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 

Similar to DATA WRANGLING presentation.pptx

BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxAkash527744
 
data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhVISHALMARWADE1
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfShaikSikindar1
 
Decoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDecoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDatavalley.ai
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business IntelligenceSukirti Garg
 
Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Leigh Ulpen
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxYashaswiniSrinivasan1
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data CleaningCarolineSmith912130
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEijcsa
 
Machine learning topics machine learning algorithm into three main parts.
Machine learning topics  machine learning algorithm into three main parts.Machine learning topics  machine learning algorithm into three main parts.
Machine learning topics machine learning algorithm into three main parts.DurgaDeviP2
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Qualitypriyanka rajput
 
Data Entry Best Practices to Enhance Data Accuracy - DEIO
Data Entry Best Practices to Enhance Data Accuracy - DEIOData Entry Best Practices to Enhance Data Accuracy - DEIO
Data Entry Best Practices to Enhance Data Accuracy - DEIOData Entry India Outsource
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineSrikanth Sharma Boddupalli
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Chain Sys Corporation
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYFreelance
 
Data Collection Process And Integrity
Data Collection Process And IntegrityData Collection Process And Integrity
Data Collection Process And IntegrityGerrit Klaschke, CSM
 

Similar to DATA WRANGLING presentation.pptx (20)

BDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptxBDA TAE 2 (BMEB 83).pptx
BDA TAE 2 (BMEB 83).pptx
 
data wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjhdata wrangling (1).pptx kjhiukjhknjbnkjh
data wrangling (1).pptx kjhiukjhknjbnkjh
 
Top 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdfTop 30 Data Analyst Interview Questions.pdf
Top 30 Data Analyst Interview Questions.pdf
 
Decoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDecoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdf
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptx
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data Cleaning
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
 
Machine learning topics machine learning algorithm into three main parts.
Machine learning topics  machine learning algorithm into three main parts.Machine learning topics  machine learning algorithm into three main parts.
Machine learning topics machine learning algorithm into three main parts.
 
Quality Assurance in Knowledge Data Warehouse
Quality Assurance in Knowledge Data WarehouseQuality Assurance in Knowledge Data Warehouse
Quality Assurance in Knowledge Data Warehouse
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
 
Data Entry Best Practices to Enhance Data Accuracy - DEIO
Data Entry Best Practices to Enhance Data Accuracy - DEIOData Entry Best Practices to Enhance Data Accuracy - DEIO
Data Entry Best Practices to Enhance Data Accuracy - DEIO
 
Business analyst
Business analystBusiness analyst
Business analyst
 
Data analytics
Data analyticsData analytics
Data analytics
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 
Data Mining
Data MiningData Mining
Data Mining
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
 
Data Collection Process And Integrity
Data Collection Process And IntegrityData Collection Process And Integrity
Data Collection Process And Integrity
 

Recently uploaded

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 

DATA WRANGLING presentation.pptx

  • 2. WHAT IS DATA WRANGLING?  Data wrangling—also called data cleaning, data remediation, or data munging—refers to a variety of processes designed to transform raw data into more readily used formats. The exact methods differ from project to project depending on the data you’re leveraging and the goal you’re trying to achieve.
  • 3. Examples  Merging multiple data sources into a single dataset for analysis  Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them  Deleting data that’s either unnecessary or irrelevant to the project you’re working on  Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place
  • 4. DATA WRANGLING STEPS  Each data project requires a unique approach to ensure its final dataset is reliable and accessible. That being said, several processes typically inform the approach. These are commonly referred to as data wrangling steps or activities. 1. Discovery  Discovery refers to the process of familiarizing yourself with data so you can conceptualize how you might use it. You can liken it to looking in your refrigerator before cooking a meal to see what ingredients you have at your disposal.  During discovery, you may identify trends or patterns in the data, along with obvious issues, such as missing or incomplete values that need to be addressed. This is an important step, as it will inform every activity that comes afterward.
  • 5. 2. Structuring  Raw data is typically unusable in its raw state because it’s either incomplete or misformatted for its intended application. Data structuring is the process of taking raw data and transforming it to be more readily leveraged. The form your data takes will depend on the analytical model you use to interpret it.
  • 6. 3. Cleaning  Data cleaning is the process of removing inherent errors in data that might distort your analysis or render it less valuable. Cleaning can come in different forms, including deleting empty cells or rows, removing outliers, and standardizing inputs. The goal of data cleaning is to ensure there are no errors (or as few as possible) that could influence your final analysis. 
  • 7. 4. Enriching  Once you understand your existing data and have transformed it into a more usable state, you must determine whether you have all of the data necessary for the project at hand. If not, you may choose to enrich or augment your data by incorporating values from other datasets. For this reason, it’s important to understand what other data is available for use.  If you decide that enrichment is necessary, you need to repeat the steps above for any new data.
  • 8. 5. Validating  Data validation refers to the process of verifying that your data is both consistent and of a high enough quality. During validation, you may discover issues you need to resolve or conclude that your data is ready to be analyzed. Validation is typically achieved through various automated processes and requires programming.
  • 9. 6. Publishing  Once your data has been validated, you can publish it. This involves making it available to others within your organization for analysis. The format you use to share the information—such as a written report or electronic file— will depend on your data and the organization’s goals. 
  • 10. THE IMPORTANCE OF DATA WRANGLING  Any analyses a business performs will ultimately be constrained by the data that informs them. If data is incomplete, unreliable, or faulty, then analyses will be too—diminishing the value of any insights gleaned.  Data wrangling seeks to remove that risk by ensuring data is in a reliable state before it’s analyzed and leveraged. This makes it a critical part of the analytical process.  It’s important to note that data wrangling can be time-consuming and taxing on resources, particularly when done manually. This is why many organizations institute policies and best practices that help employees streamline the data cleanup process—for example, requiring that data include certain information or be in a specific format before it’s uploaded to a database.  For this reason, it’s vital to understand the steps of the data wrangling process and the negative outcomes associated with incorrect or faulty data.
  • 11. Benefits of Data Wrangling  Data wrangling helps to improve data usability as it converts data into a compatible format for the end system.  It helps to quickly build data flows within an intuitive user interface and easily schedule and automate the data-flow process.  Integrates various types of information and their sources (like databases, web services, files, etc.)  Help users to process very large volumes of data easily and easily share data- flow techniques.
  • 12. Data Wrangling Tools  There are different tools for data wrangling that can be used for gathering, importing, structuring, and cleaning data before it can be fed into analytics and BI apps. You can use automated tools for data wrangling, where the software allows you to validate data mappings and scrutinize data samples at every step of the transformation process. This helps to quickly detect and correct errors in data mapping. Automated data cleaning becomes necessary in businesses dealing with exceptionally large data sets. For manual data cleaning processes, the data team or data scientist is responsible for wrangling. In smaller setups, however, non-data professionals are responsible for cleaning data before leveraging it.
  • 13. Some examples of basic data munging tools are:  Spreadsheets / Excel Power Query - It is the most basic manual data wrangling tool  OpenRefine - An automated data cleaning tool that requires programming skills  Tabula – It is a tool suited for all data types  Google DataPrep – It is a data service that explores, cleans, and prepares data  Data wrangler – It is a data cleaning and transforming tool