Data Wrangle and Visualization


MuhammedMostafa9

Data gathered from different sources about the WeRateDogs Twitter account.

Data & Analytics

wrangle_report
December 12, 2020
1 Wrangle Report
1.1 WeRateDogs Twitter Account Data Wrangling
1.2 Introduction
1.2.1 The purpose of this project is to wrangle data about the Twitter account WeRateDogs
from 3 different sources to create interesting and trustworthy analyses and
visualizations.
2 Project Details
2.0.1 1. Gathering
2.0.2 2. Assess
2.0.3 3. Clean
2.0.4 4. Store
2.1 Gathering
2.1.1 1. The Twitter archive file was downloaded manually; it contains basic tweet data
for all 5000+ of their tweets, but not everything.
2.1.2 2. The image predictions file was downloaded programmatically. Every image in the
WeRateDogs Twitter archive was run through a neural network that can classify breeds of
dogs*. The result: a table of image predictions (the top three only) alongside
each tweet ID, image URL, and the image number that corresponded to
the most confident prediction (numbered 1 to 4, since tweets can have up to
four images).
2.1.3 3. Additional data via the Twitter API: I created a Twitter Developer
account and collected more data using the tweet ID column from the
Twitter archive file.
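A programmatic download like the one in step 2 can be sketched with the Python standard library. This is a minimal illustration, not the code used in the project: the helper name `download_file`, the example URL, and the destination path are all hypothetical, since the report does not give them.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path) -> Path:
    """Save the resource at `url` to `dest` and return the destination path."""
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response body straight into the output file.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Hypothetical usage (the real URL is not stated in this report):
# download_file("https://example.com/image_predictions.tsv",
#               Path("data/image_predictions.tsv"))
```

Downloading in code rather than by hand keeps the gathering step repeatable.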
3 Assess
3.0.1 Twitter archive
1. The archive has 2356 rows, but only 2278 of them are tweets
2. Some ratings are too high, and the rating type should be float
3. The rating numerator has some very high and very low values; it should be 10, or a multiple
of ten for ratings of multiple dogs
4. The dog name column contains non-name values
5. The timestamp column is of type object
6. doggo, floofer, pupper, and puppo are values, not column names; they should be melted into
one column
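Checks like these can be sketched in pandas on a toy table. The column names `retweeted_status_id`, `rating_numerator`, and `rating_denominator`, the sample rows, and the thresholds 5 and 20 are assumptions made for illustration; the report itself only names the rating, name, timestamp, and dog stage columns.

```python
import pandas as pd

# Toy stand-in for the archive (made-up rows; the real file has 2356 rows).
archive_df = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2017-07-31 00:18:03 +0000",
                  "2017-07-30 15:58:51 +0000", "2017-07-29 16:00:24 +0000"],
    "retweeted_status_id": [None, None, 99.0, None],  # assumed column name
    "rating_numerator": [13, 1776, 12, 0],
    "rating_denominator": [10, 10, 10, 10],
    "name": ["Phineas", "a", "None", "Tilly"],
})

# One possible check for rows that are retweets rather than original tweets.
n_retweets = int(archive_df["retweeted_status_id"].notna().sum())

# Numerators far outside the usual range (thresholds chosen arbitrarily).
unusual = archive_df[(archive_df["rating_numerator"] > 20)
                     | (archive_df["rating_numerator"] < 5)]

# Entries in `name` that are not names: lowercase words or the string "None".
bad_names = archive_df[archive_df["name"].str.islower()
                       | archive_df["name"].eq("None")]

# The timestamp column holds strings, not datetimes.
timestamp_is_datetime = pd.api.types.is_datetime64_any_dtype(archive_df["timestamp"])
```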
3.0.2 Image Predictions
1. Not all tweets have a valid picture of a dog; for some, the columns p1_dog, p2_dog, and p3_dog are all False
2. jpg_url has duplicates
3. Tidiness issue: (p1, p2, p3), (p1_conf, p2_conf, p3_conf), and (p1_dog, p2_dog, p3_dog)
are each spread across 3 columns instead of one
3.0.3 Twitter API
1. Too much information about the tweets was retrieved from the Twitter API; I chose the retweet count
and favorite count
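Keeping only those two counts can be sketched as follows, assuming the API responses were saved as one JSON tweet object per line. The storage layout and the sample values are assumptions; `id`, `retweet_count`, and `favorite_count` are fields of the standard tweet object.

```python
import json

import pandas as pd

# Two abbreviated tweet objects, one JSON document per line (made-up values).
raw_lines = [
    '{"id": 1, "retweet_count": 8, "favorite_count": 39, "lang": "en"}',
    '{"id": 2, "retweet_count": 6, "favorite_count": 33, "lang": "en"}',
]

rows = []
for line in raw_lines:
    tweet = json.loads(line)
    # Keep only the fields needed for the analysis and drop everything else.
    rows.append({
        "tweet_id": tweet["id"],
        "retweet_count": tweet["retweet_count"],
        "favorite_count": tweet["favorite_count"],
    })

api_df = pd.DataFrame(rows)
```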
4 Clean
4.0.1 Twitter archive
1. First, make a copy of archive_df
2. Remove all retweets and tweets without a photo
3. Convert timestamp to datetime format
4. Extract ratings from the tweet text and investigate them
5. Clean the name column
6. Create a column named dog_class and combine the 4 dog stage columns into it
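Several of these steps can be sketched in pandas on a toy table. This is an illustration under assumptions, not the project's code: the column names `retweeted_status_id` and `text`, the use of the string "None" for a missing dog stage, the regular expression, and the sample rows are all assumed. The photo filter and the name cleanup are not shown.

```python
import pandas as pd

archive_df = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2017-07-31 00:18:03 +0000",
                  "2017-07-30 15:58:51 +0000"],
    "retweeted_status_id": [None, 99.0, None],  # assumed column name
    "text": ["This is Bella. 13.5/10 would pet", "RT @dog_rates: 12/10",
             "Here is a pupper. 11/10"],
    "doggo": ["None", "None", "None"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "None", "pupper"],
    "puppo": ["None", "None", "None"],
})

# Step 1: work on a copy so the raw data stays untouched.
archive_clean = archive_df.copy()

# Step 2 (retweets only): keep rows with no retweet reference.
archive_clean = archive_clean[archive_clean["retweeted_status_id"].isna()].copy()

# Step 3: parse the timestamp strings into datetimes.
archive_clean["timestamp"] = pd.to_datetime(archive_clean["timestamp"])

# Step 4: pull "numerator/denominator" out of the text, allowing decimals.
ratings = archive_clean["text"].str.extract(r"(\d+(?:\.\d+)?)/(\d+)")
archive_clean["rating_numerator"] = ratings[0].astype(float)
archive_clean["rating_denominator"] = ratings[1].astype(float)

# Step 6: collapse the four stage columns into one dog_class column.
stage_cols = ["doggo", "floofer", "pupper", "puppo"]


def combine_stages(row):
    """Join the stages present in a row, or return None if there are none."""
    stages = [value for value in row if value != "None"]
    return ",".join(stages) if stages else None


archive_clean["dog_class"] = archive_clean[stage_cols].apply(combine_stages, axis=1)
archive_clean = archive_clean.drop(columns=stage_cols)
```

Joining with a comma keeps tweets that mention more than one stage instead of silently picking one.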
4.0.2 Image Predictions
1. Remove all tweets for which all 3 predictions failed to identify a dog breed
2. Remove all duplicated photos
3. I chose to keep only the first prediction for the rest of the analysis
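These three steps can be sketched in pandas as below. The sample rows are made up, and the p2, p3, p2_conf, and p3_conf columns are left out of the toy table for brevity; the column names follow the ones listed in the assessment.

```python
import pandas as pd

image_df = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "jpg_url": ["a.jpg", "b.jpg", "a.jpg", "c.jpg"],
    "p1": ["golden_retriever", "paper_towel", "golden_retriever", "seat_belt"],
    "p1_conf": [0.9, 0.6, 0.9, 0.5],
    "p1_dog": [True, False, True, False],
    "p2_dog": [True, False, True, True],
    "p3_dog": [True, False, True, False],
})

image_clean = image_df.copy()

# Step 1: drop rows where none of the three predictions is a dog breed.
any_dog = image_clean[["p1_dog", "p2_dog", "p3_dog"]].any(axis=1)
image_clean = image_clean[any_dog]

# Step 2: drop repeated photos, keeping the first occurrence of each URL.
image_clean = image_clean.drop_duplicates(subset="jpg_url", keep="first")

# Step 3: keep only the columns of the first prediction.
image_clean = image_clean[["tweet_id", "jpg_url", "p1", "p1_conf", "p1_dog"]]
```

Note that a row can pass step 1 through its second or third prediction and still have `p1_dog` equal to False after step 3, as tweet 4 does here.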
4.0.3 Twitter API: no cleaning needed
5 Store
5.0.1 I merged the three data frames into one master data frame and stored it as
‘twitter_archive_master.csv’. It contains only tweets with one or more photos, along with the
retweet count, the favorite count, the most confident dog-breed prediction (as a name)
if one exists, and the dog stage if one exists.
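The merge and store step can be sketched in pandas as follows. The frame names, the sample rows, and the choice of inner joins on `tweet_id` are assumptions; an inner join is one way to end up with only the tweets that have a photo prediction.

```python
import pandas as pd

# Toy versions of the three cleaned frames (made-up rows).
archive_clean = pd.DataFrame({"tweet_id": [1, 2, 3],
                              "dog_class": ["pupper", None, "doggo"]})
image_clean = pd.DataFrame({"tweet_id": [1, 3],
                            "p1": ["golden_retriever", "pug"]})
api_df = pd.DataFrame({"tweet_id": [1, 2, 3],
                       "retweet_count": [8, 6, 4],
                       "favorite_count": [39, 33, 25]})

# Inner joins keep only tweets present in all three frames.
master_df = (
    archive_clean
    .merge(image_clean, on="tweet_id", how="inner")
    .merge(api_df, on="tweet_id", how="inner")
)

# Store the result without the pandas index column.
master_df.to_csv("twitter_archive_master.csv", index=False)
```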