SlideShare a Scribd company logo
1 of 15
G.H RAISONI COLLEGE OF ENGINEERING
& MANAGEMENT, PUNE
Department of Mechanical Engineering
TAE 2
Topic :-DATA WRANGLING
Subject:- BIG DATA ANALYTICS
Guided By: Presented By: Roll No:
Dr.Apshabi pathan Mr.Akash Gaikwad BMEB 83
Contents
• What is data wrangling
• Examples
• Steps in data wrangling
• Importance of data wrangling
• Benefits of data wrangling
• Data wrangling tools
• Examples of basic mungling tools
 Data wrangling—also called data cleaning, data remediation, or data munging—refers
to a variety of processes designed to transform raw data into more readily used
formats. The exact methods differ from project to project depending on the data
you’re leveraging and the goal you’re trying to achieve.
WHAT IS DATA WRANGLING?
Examples
 Merging multiple data sources into a single dataset for analysis
▶ Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling
or deleting them
▶ Deleting data that’s either unnecessary or irrelevant to the project you’re
working on
▶ Identifying extreme outliers in data and either explaining the discrepancies or removing
them so that analysis can take place
DATA WRANGLING STEPS
 Each data project requires a unique approach to ensure its final dataset is reliable and accessible.
That being said, several processes typically inform the approach. These are commonly referred to
as data wrangling steps or activities.
1. Discovery
 Discovery refers to the process of familiarizing yourself with data so you can
conceptualize how you might use it. You can liken it to looking in your refrigerator before
cooking a meal to see what ingredients you have at your disposal.
▶ During discovery, you may identify trends or patterns in the data, along with obvious
issues, such as missing or incomplete values that need to be addressed. This is an important
step, as it will inform every activity that comes afterward.
2. Structuring
 Raw data is typically unusable in its raw state because it’s either incomplete or
misformatted for its intended application. Data structuring is the process of taking
raw data and transforming it to be more readily leveraged. The form your data takes
will depend on the analytical model you use to interpret it.
3. Cleaning
 Data cleaning is the process of removing inherent errors in data that might distort your
analysis or render it less valuable. Cleaning can come in different forms, including
deleting empty cells or rows, removing outliers, and standardizing inputs. The goal of
data cleaning is to ensure there are no errors (or as few as possible) that could influence
your final analysis.
4. Enriching
 Once you understand your existing data and have transformed it into a more usable
state, you must determine whether you have all of the data necessary for the project at
hand. If not, you may choose to enrich or augment your data by incorporating values
from other datasets. For this reason, it’s important to understand what other data is
available for use.
▶ If you decide that enrichment is necessary, you need to repeat the steps above for
any new data.
5. Validating
 Data validation refers to the process of verifying that your data is both consistent and
of a high enough quality. During validation, you may discover issues you need to
resolve or conclude that your data is ready to be analyzed. Validation is typically
achieved through various automated processes and requires programming.
6. Publishing
 Once your data has been validated, you can publish it. This involves making it available to
others within your organization for analysis. The format you use to share the
information—such as a written report or electronic file— will depend on your data and
the organization’s goals.
THE IMPORTANCE OF DATA WRANGLING
 Any analyses a business performs will ultimately be constrained by the data that
informs them. If data is incomplete, unreliable, or faulty, then analyses will be too—
diminishing the value of any insights gleaned.
 Data wrangling seeks to remove that risk by ensuring data is in a reliable state
before it’s analyzed and leveraged. This makes it a critical part of the analytical
process.
 It’s important to note that data wrangling can be time-consuming and taxing on
resources, particularly when done manually. This is why many organizations institute
policies and best practices that help employees streamline the data cleanup process—
for example, requiring that data include certain information or be in a specific format
before it’s uploaded to a database.
Benefits of Data Wrangling
 Data wrangling helps to improve data usability as it converts data into a compatible
format for the end system.
▶ It helps to quickly build data flows within an intuitive user interface and easily
schedule and automate the data-flow process.
▶ Integrates various types of information and their sources (like databases,
web services, files, etc.)
▶ Help users to process very large volumes of data easily and easily share data- flow
techniques.
Data Wrangling Tools
 There are different tools for data wrangling that can be used for gathering, importing,
structuring, and cleaning data before it can be fed into analytics and BI apps. You can
use automated tools for data wrangling, where the software allows you to validate
data mappings and scrutinize data samples at every step of the transformation process.
This helps to quickly detect and correct errors in data mapping. Automated data
cleaning becomes necessary in businesses dealing with exceptionally large data sets.
For manual data cleaning processes, the data team or data scientist is responsible for
wrangling. In smaller setups, however, non-data professionals are responsible for
cleaning data before leveraging it.
Some examples of basic data munging tools are
 Spreadsheets / Excel Power Query - It is the most basic manual data wrangling tool
▶ OpenRefine - An automated data cleaning tool that requires programming skills
▶ Tabula – It is a tool suited for all data types
▶ Google DataPrep – It is a data service that explores, cleans, and prepares data
▶ Data wrangler – It is a data cleaning and transforming tool
Thank you !

More Related Content

Similar to BDA TAE 2 (BMEB 83).pptx

Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Leigh Ulpen
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfNeha Singh
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEijcsa
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYFreelance
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxYashaswiniSrinivasan1
 
Moh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptxMoh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptxAbdullahEmam4
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data CleaningCarolineSmith912130
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Chain Sys Corporation
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfData Science Council of America
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfAbdulrahimShaibuIssa
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGAhtesham Ullah khan
 
Data Cleaning Best Practices.pdf
Data Cleaning Best Practices.pdfData Cleaning Best Practices.pdf
Data Cleaning Best Practices.pdfUncodemy
 
How do you assess the quality and reliability of data sources in data analysi...
How do you assess the quality and reliability of data sources in data analysi...How do you assess the quality and reliability of data sources in data analysi...
How do you assess the quality and reliability of data sources in data analysi...Soumodeep Nanee Kundu
 

Similar to BDA TAE 2 (BMEB 83).pptx (20)

Self-service analytics risk_September_2016
Self-service analytics risk_September_2016Self-service analytics risk_September_2016
Self-service analytics risk_September_2016
 
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdfData Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
Data Analytics Course Curriculum_ What to Expect and How to Prepare in 2023.pdf
 
Quality Assurance in Knowledge Data Warehouse
Quality Assurance in Knowledge Data WarehouseQuality Assurance in Knowledge Data Warehouse
Quality Assurance in Knowledge Data Warehouse
 
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREEA ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE
 
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITYMANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
MANAGING RESOURCES FOR BUSINESS ANALYTICS BA4206 ANNA UNIVERSITY
 
data analysis-mining
data analysis-miningdata analysis-mining
data analysis-mining
 
thegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptxthegrowingimportanceofdatacleaning-211202141902.pptx
thegrowingimportanceofdatacleaning-211202141902.pptx
 
Moh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptxMoh.Abd-Ellatif_DataAnalysis1.pptx
Moh.Abd-Ellatif_DataAnalysis1.pptx
 
The Growing Importance of Data Cleaning
The Growing Importance of Data CleaningThe Growing Importance of Data Cleaning
The Growing Importance of Data Cleaning
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Business analyst
Business analystBusiness analyst
Business analyst
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
Introduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdfIntroduction to Business and Data Analysis Undergraduate.pdf
Introduction to Business and Data Analysis Undergraduate.pdf
 
DATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSINGDATA PREPROCESSING AND DATA CLEANSING
DATA PREPROCESSING AND DATA CLEANSING
 
Data Cleaning Best Practices.pdf
Data Cleaning Best Practices.pdfData Cleaning Best Practices.pdf
Data Cleaning Best Practices.pdf
 
How do you assess the quality and reliability of data sources in data analysi...
How do you assess the quality and reliability of data sources in data analysi...How do you assess the quality and reliability of data sources in data analysi...
How do you assess the quality and reliability of data sources in data analysi...
 

Recently uploaded

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 

Recently uploaded (20)

Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

BDA TAE 2 (BMEB 83).pptx

  • 1. G.H RAISONI COLLEGE OF ENGINEERING & MANAGEMENT, PUNE Department of Mechanical Engineering TAE 2 Topic :-DATA WRANGLING Subject:- BIG DATA ANALYTICS Guided By: Presented By: Roll No: Dr.Apshabi pathan Mr.Akash Gaikwad BMEB 83
  • 2. Contents • What is data wrangling • Examples • Steps in data wrangling • Importance of data wrangling • Benefits of data wrangling • Data wrangling tools • Examples of basic mungling tools
  • 3.  Data wrangling—also called data cleaning, data remediation, or data munging—refers to a variety of processes designed to transform raw data into more readily used formats. The exact methods differ from project to project depending on the data you’re leveraging and the goal you’re trying to achieve. WHAT IS DATA WRANGLING?
  • 4. Examples  Merging multiple data sources into a single dataset for analysis ▶ Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them ▶ Deleting data that’s either unnecessary or irrelevant to the project you’re working on ▶ Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place
  • 5. DATA WRANGLING STEPS  Each data project requires a unique approach to ensure its final dataset is reliable and accessible. That being said, several processes typically inform the approach. These are commonly referred to as data wrangling steps or activities. 1. Discovery  Discovery refers to the process of familiarizing yourself with data so you can conceptualize how you might use it. You can liken it to looking in your refrigerator before cooking a meal to see what ingredients you have at your disposal. ▶ During discovery, you may identify trends or patterns in the data, along with obvious issues, such as missing or incomplete values that need to be addressed. This is an important step, as it will inform every activity that comes afterward.
  • 6. 2. Structuring  Raw data is typically unusable in its raw state because it’s either incomplete or misformatted for its intended application. Data structuring is the process of taking raw data and transforming it to be more readily leveraged. The form your data takes will depend on the analytical model you use to interpret it.
  • 7. 3. Cleaning  Data cleaning is the process of removing inherent errors in data that might distort your analysis or render it less valuable. Cleaning can come in different forms, including deleting empty cells or rows, removing outliers, and standardizing inputs. The goal of data cleaning is to ensure there are no errors (or as few as possible) that could influence your final analysis.
  • 8. 4. Enriching  Once you understand your existing data and have transformed it into a more usable state, you must determine whether you have all of the data necessary for the project at hand. If not, you may choose to enrich or augment your data by incorporating values from other datasets. For this reason, it’s important to understand what other data is available for use. ▶ If you decide that enrichment is necessary, you need to repeat the steps above for any new data.
  • 9. 5. Validating  Data validation refers to the process of verifying that your data is both consistent and of a high enough quality. During validation, you may discover issues you need to resolve or conclude that your data is ready to be analyzed. Validation is typically achieved through various automated processes and requires programming.
  • 10. 6. Publishing  Once your data has been validated, you can publish it. This involves making it available to others within your organization for analysis. The format you use to share the information—such as a written report or electronic file— will depend on your data and the organization’s goals.
  • 11. THE IMPORTANCE OF DATA WRANGLING  Any analyses a business performs will ultimately be constrained by the data that informs them. If data is incomplete, unreliable, or faulty, then analyses will be too— diminishing the value of any insights gleaned.  Data wrangling seeks to remove that risk by ensuring data is in a reliable state before it’s analyzed and leveraged. This makes it a critical part of the analytical process.  It’s important to note that data wrangling can be time-consuming and taxing on resources, particularly when done manually. This is why many organizations institute policies and best practices that help employees streamline the data cleanup process— for example, requiring that data include certain information or be in a specific format before it’s uploaded to a database.
  • 12. Benefits of Data Wrangling  Data wrangling helps to improve data usability as it converts data into a compatible format for the end system. ▶ It helps to quickly build data flows within an intuitive user interface and easily schedule and automate the data-flow process. ▶ Integrates various types of information and their sources (like databases, web services, files, etc.) ▶ Help users to process very large volumes of data easily and easily share data- flow techniques.
  • 13. Data Wrangling Tools  There are different tools for data wrangling that can be used for gathering, importing, structuring, and cleaning data before it can be fed into analytics and BI apps. You can use automated tools for data wrangling, where the software allows you to validate data mappings and scrutinize data samples at every step of the transformation process. This helps to quickly detect and correct errors in data mapping. Automated data cleaning becomes necessary in businesses dealing with exceptionally large data sets. For manual data cleaning processes, the data team or data scientist is responsible for wrangling. In smaller setups, however, non-data professionals are responsible for cleaning data before leveraging it.
  • 14. Some examples of basic data munging tools are  Spreadsheets / Excel Power Query - It is the most basic manual data wrangling tool ▶ OpenRefine - An automated data cleaning tool that requires programming skills ▶ Tabula – It is a tool suited for all data types ▶ Google DataPrep – It is a data service that explores, cleans, and prepares data ▶ Data wrangler – It is a data cleaning and transforming tool