Cleansing Big Data
MIS 4596 - Team 3
Brendon Lee, Brody McGillen, John Dinh, Kashif Malik, and Khuong Tang
Agenda
01 Background
02 Strategy
03 Process & Demo
04 Results
05 Q&A
Background
The Client: Aleksi Aaltonen
The Data:
● User-generated content from Wikipedia
● 70k+ celebrity, influencer, and politician deaths across the globe
● Key metrics: name, age, occupation, and nationality
● Range: 2004 to 2018
The Objective:
Develop a program that reads notable deaths extracted from Wikipedia and transforms that ‘raw’ dataset to match the ‘ground truth’ dataset as closely as possible.
Strategy
● Coding Process: research while coding and meet with the client for consultations.
● Examine Ground Truth Dataset: determine the objectives and the time range of the data.
● Match Data: visually compare our solution to the ground truth dataset.
● Decide on Python: use Python with the Spyder IDE for editing, interactive testing, and debugging.
● Evaluate Research Tools: Bioinformatic Studio study group and online research with w3schools.
Process
● Consult with Client: demonstrate the coding approach to the client and request feedback.
● Reevaluate Code: use client feedback, review the code, and implement changes.
● Coding: apply knowledge and research to extract data.
● Research: leverage resources such as Stack Overflow and w3schools to better understand Python.
● Use Excel: clean the dataset with VLOOKUP and Filter to match the ground truth dataset.
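The Excel comparison step can also be approximated in Python. The sketch below is only an illustration, not the team's method: the function name match_rate, the column names, and the sample rows are all hypothetical, and it assumes each person's name is unique enough to act as the lookup key (much like the lookup column in VLOOKUP).

```python
def match_rate(ours, truth, key="name"):
    """Fraction of ground-truth rows that our output reproduces exactly."""
    if not truth:
        return 0.0
    # Index our rows by the lookup key, similar to a VLOOKUP table.
    ours_by_key = {row[key]: row for row in ours}
    matched = sum(1 for row in truth if ours_by_key.get(row[key]) == row)
    return matched / len(truth)


# Hypothetical sample rows; the real datasets are not shown in the deck.
truth = [
    {"name": "Ada Example", "age": "90"},
    {"name": "Bob Sample", "age": "75"},
]
ours = [
    {"name": "Ada Example", "age": "90"},
    {"name": "Bob Sample", "age": "70"},  # age differs from the ground truth
]

rate = match_rate(ours, truth)
print(f"{rate:.0%} of ground-truth rows matched")
```

A single number like this is easier to track between code revisions than a visual comparison, although it would not explain why a row failed to match.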
Demo: Importing Data
● Import Packages and Modules
● Variables Correspond to Month and Year
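The slide's code is not reproduced here, so the following is a minimal sketch of what month and year variables might look like. It assumes the raw data is organised one month at a time across the 2004 to 2018 range; the names START_YEAR, END_YEAR, and month_year_labels are illustrative, not taken from the team's program.

```python
import calendar

# Assumption: the data covers every month from 2004 through 2018.
START_YEAR, END_YEAR = 2004, 2018


def month_year_labels(start_year=START_YEAR, end_year=END_YEAR):
    """Return one 'Month Year' label per month in the inclusive range."""
    labels = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # calendar.month_name[1] is "January" in the default locale.
            labels.append(f"{calendar.month_name[month]} {year}")
    return labels


labels = month_year_labels()
print(labels[0], "to", labels[-1], f"({len(labels)} months)")
```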
Demo: Concatenation
● Use NumPy to Concatenate Into One List
● While Loop to Find the Data
● Try Statement to Find Names and Titles
● Except Clause to Catch Errors
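As a rough sketch of how these four steps could fit together, the example below joins two batches with np.concatenate, walks the result with a while loop, and uses try/except to skip entries it cannot parse. The entry format ("Name, age, description") and the sample strings are assumptions made for illustration; the deck does not show the real raw format.

```python
import numpy as np

# Hypothetical raw entries, e.g. one batch per month.
batch_a = ["Ada Example, 90, British mathematician.", "Malformed entry"]
batch_b = ["Bob Sample, 75, American actor."]

# np.concatenate joins the arrays; tolist() converts back to a plain list.
entries = np.concatenate([np.array(batch_a), np.array(batch_b)]).tolist()

names, skipped = [], []
i = 0
while i < len(entries):
    try:
        # Unpacking raises ValueError when the entry has no comma.
        name, rest = entries[i].split(",", 1)
        names.append(name.strip())
    except ValueError:
        skipped.append(entries[i])
    i += 1

print(names)
print(skipped)
```

Catching only ValueError, rather than using a bare except, keeps unrelated bugs visible while still tolerating malformed entries.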
Demo: Exporting Data
● Find the Third Values
● Get the First Values in the Strings
● Except Clause to Catch Errors
● Use CSV to Write These Variables into Their Respective Columns
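One way to read these steps is sketched below, under stated assumptions: each entry is split on commas, the third value is treated as a description, and the first word of that description is taken as the nationality. The helper names parse_entry and write_csv and the sample entries are hypothetical. The column names follow the key metrics listed in the Background slide.

```python
import csv
import io


def parse_entry(entry):
    """Split 'Name, age, Nationality occupation.' into four fields."""
    # Unpacking raises ValueError when there are fewer than three parts.
    name, age, description = [p.strip() for p in entry.split(",", 2)]
    # Third value: first word as nationality, the remainder as occupation.
    nationality, _, occupation = description.rstrip(".").partition(" ")
    return [name, age, occupation, nationality]


def write_csv(entries, fileobj):
    """Write a header row plus one row per entry that parses cleanly."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "age", "occupation", "nationality"])
    for entry in entries:
        try:
            writer.writerow(parse_entry(entry))
        except ValueError:
            continue  # skip entries without three comma-separated values


# An in-memory buffer stands in for a real output file in this sketch.
buffer = io.StringIO()
write_csv(["Ada Example, 90, British mathematician.", "Malformed entry"], buffer)
output = buffer.getvalue()
print(output)
```

Taking only the first word as the nationality would mishandle multi-word nationalities such as "New Zealand", so a real cleaner would likely need a lookup list for those cases.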
Demo: Excel
Comparison: Ground Truth Dataset
Comparison: Our Team’s Dataset
Thank You!
Questions?
Appendix
Appendix (Cont.)
