The Body Shop Data Analysis

The document summarizes a term project analyzing product data scraped from The Body Shop website. It describes scraping 326 products, collecting 12 data points per product, and analyzing the dataset using Pandas. It identifies popular products and ingredients, visualizes the dataset to show category and sub-category trends, and explores correlations between variables like reviews and ratings. The project encountered challenges with website changes and complex HTML, but was able to mine useful insights from the scraped Body Shop data.

Introduction
– The Body Shop International Limited, trading as The Body Shop, is a cosmetics, skin care and perfume company and a subsidiary of the Brazilian company Natura & Co.
– This presentation shows how we scraped the Body Shop website using Python 3 and the BeautifulSoup library.
– We scraped product information for all products listed at https://www.thebodyshop.com/en-us/ and then analyzed it using the Pandas library.

Popular products
• Body butters (including Moringa, Satsuma, Strawberry, Olive, Shea, Mango and Coconut)
• Body products such as body scrub, body butter and bath lilies
• Cosmetics (including mascara, lipstick, lip gloss, eye shadow and cotton rounds)
• Full skin care ranges (including Tea tree, Vitamin C, Vitamin E, Aloe vera and Seaweed)
• Men's skin care (including maca root and white musk)
• Hair care (including their famous Banana shampoo and Banana conditioner)
• Fragrances (Women's and Men's)
• Bath products including shower gels and solid soaps

Web Scraping
• We parsed through all main categories and scraped their product information.
• We were able to fetch about 326 products from the website.
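
The scraping step can be sketched roughly as follows. This is a minimal illustration, not our actual scraper: the HTML snippet, the CSS classes (`product`, `name`, `rating`, `reviews`) and the helper names `text_or_none` and `parse_products` are all hypothetical, since the real site's markup is different and changes often.

```python
from bs4 import BeautifulSoup

# Hypothetical product-listing markup; the real site's tags and classes differ.
SAMPLE_HTML = """
<ul class="products">
  <li class="product" data-item="1001">
    <span class="name">Shea Body Butter</span>
    <span class="rating">4.7</span>
    <span class="reviews">120</span>
  </li>
  <li class="product" data-item="1002">
    <span class="name">Banana Shampoo</span>
  </li>
</ul>
"""


def text_or_none(tag, selector):
    """Return the stripped text of the first match, or None if it is missing."""
    found = tag.select_one(selector)
    return found.get_text(strip=True) if found else None


def parse_products(html, main_category):
    """Turn one category page into a list of row dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.product"):
        rows.append({
            "Item Number": item.get("data-item"),
            "Main Category": main_category,
            "Product Name": text_or_none(item, ".name"),
            "Ratings out of 5": text_or_none(item, ".rating"),
            "Reviews Count": text_or_none(item, ".reviews"),
        })
    return rows


products = parse_products(SAMPLE_HTML, "Body")
```

Returning `None` for a missing element, instead of raising an error, lets one category page with incomplete products still produce rows that can be loaded into Pandas.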

Columns in our dataset
• Item Number
• Main Category
• Sub-Category
• Product Name
• Main Ingredient
• Reviews Count
• Ratings out of 5
• Sizes Available
• Sizes
• Prices
• Short Description
• Long Description
Dataset shape: 328 rows, 12 columns

[Figure: snapshot of our dataset after web scraping, showing the head rows]

Challenges
• External sites can change without warning.
• During the holiday season the website changed frequently, which often broke our scrapers.
• The HTML tags were confusing and difficult to dig through.
• ‘Reviews Count’ and ‘Ratings’ were difficult to scrape.
• ‘Main Ingredient’ sometimes failed to scrape, resulting in a large number of missing values (NaNs).
• The nested sizes and prices columns were difficult to mine or explode.
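
One way to see how much damage the failed scrapes did is to count missing values per column. The sketch below uses made-up rows, with `None` standing in for a field the scraper could not find; it is an illustration, not output from our dataset.

```python
import pandas as pd

# Made-up rows; None marks a field the scraper could not find.
df = pd.DataFrame({
    "Product Name": ["Shea Body Butter", "Banana Shampoo", "Tea Tree Oil"],
    "Main Ingredient": ["Shea", None, None],
    "Ratings out of 5": ["4.7", None, "4.5"],
})

# Scraped text becomes numeric; anything unparseable turns into NaN.
df["Ratings out of 5"] = pd.to_numeric(df["Ratings out of 5"], errors="coerce")

# Number of missing values in each column.
missing = df.isna().sum()
print(missing)
```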

What are the most popular Main Ingredients?
• There are 31 unique Main Ingredients.
• This bar plot shows that the top 5 most popular Main Ingredients are:
  • Aloe Vera
  • Shea
  • Marula
  • Organic Alcohol
  • Honey
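
A ranking like this can be produced with `nunique` and `value_counts`. The sketch below runs on a small made-up column, so its counts do not match the 31 ingredients reported above.

```python
import pandas as pd

# Made-up 'Main Ingredient' values; None is a product whose ingredient was not scraped.
ingredients = pd.Series(
    ["Aloe Vera", "Shea", "Aloe Vera", "Honey", "Shea", "Aloe Vera", None],
    name="Main Ingredient",
)

# Number of distinct ingredients (missing values are not counted).
n_unique = ingredients.nunique()

# Most frequent ingredients first; head(2) keeps the top 2 of this toy sample.
top = ingredients.value_counts().head(2)
print(n_unique)
print(top)
```

With matplotlib installed, `top.plot.bar()` would draw the corresponding bar plot.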

How many products are in each Main Category?
The BODY category has the most products, with 36.89% of all products.
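
A percentage share per category can be computed with `value_counts(normalize=True)`. The category values below are made up for illustration (only Body and Makeup are named in this deck), so the resulting shares are not our reported figures.

```python
import pandas as pd

# Made-up 'Main Category' values for eight products.
categories = pd.Series(
    ["Body", "Body", "Face", "Makeup", "Body", "Hair", "Face", "Body"],
    name="Main Category",
)

# Fraction of products per category, expressed as a percentage.
share = (categories.value_counts(normalize=True) * 100).round(2)
print(share)
```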

What are the most popular Sub-Categories?
• This horizontal bar plot shows that the top 5 sub-categories are:
  • Lotions & Creams
  • Hand Creams
  • Body Butter
  • Body Wash
  • Lip Balm

What are the top 10 highest reviewed products?
What are the top 25 highest rated products?
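
Both questions reduce to picking the largest values in one column, which `DataFrame.nlargest` does directly. The rows below are placeholders, and the sketch keeps the top 2 instead of the top 10 and top 25 used in the slides.

```python
import pandas as pd

# Placeholder products with made-up review counts and ratings.
df = pd.DataFrame({
    "Product Name": ["A", "B", "C", "D"],
    "Reviews Count": [120, 45, 300, 8],
    "Ratings out of 5": [4.7, 4.9, 4.2, 5.0],
})

# Products with the most reviews, and products with the highest ratings.
top_reviewed = df.nlargest(2, "Reviews Count")
top_rated = df.nlargest(2, "Ratings out of 5")
print(top_reviewed)
print(top_rated)
```

In this toy data the top-rated product has only 8 reviews, which is why it helps to read the two rankings side by side.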

Data Mining
• How many products are Vegan or Vegetarian?
• Created a function to mine the “Long Description” text.
• Created a new column called “Cruelty-Free” to show whether the product is Vegan or Vegetarian.
Pie chart showing the percentage of products in each cruelty-free category.
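
A text-mining function of this kind might look like the sketch below. The function name `diet_label`, the keyword patterns and the "Unknown" fallback label are assumptions for illustration; the descriptions are made up.

```python
import re

import pandas as pd


def diet_label(description):
    """Label a description as Vegan, Vegetarian or Unknown by keyword."""
    if not isinstance(description, str):
        return "Unknown"  # missing description
    text = description.lower()
    if re.search(r"\bvegan\b", text):
        return "Vegan"
    if re.search(r"\bvegetarian", text):  # also matches "vegetarians"
        return "Vegetarian"
    return "Unknown"


# Made-up long descriptions, including one missing value.
df = pd.DataFrame({
    "Long Description": [
        "100% vegan body butter.",
        "Suitable for vegetarians.",
        "Enriched with honey.",
        None,
    ]
})

df["Cruelty-Free"] = df["Long Description"].apply(diet_label)

# Percentage of products per label, the numbers a pie chart would show.
label_share = df["Cruelty-Free"].value_counts(normalize=True) * 100
print(label_share)
```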

Correlation Matrix between Reviews_Count, Ratings_out_of_5 & Sizes_Available
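
A correlation matrix over these three columns can be computed with `DataFrame.corr`, which uses Pearson correlation by default. The values below are made up, so the coefficients say nothing about our dataset.

```python
import pandas as pd

# Made-up values for the three numerical columns.
df = pd.DataFrame({
    "Reviews_Count": [10, 20, 30, 40],
    "Ratings_out_of_5": [4.0, 4.5, 4.2, 4.8],
    "Sizes_Available": [1, 2, 1, 3],
})

# Pairwise Pearson correlation between all numerical columns.
corr = df.corr()
print(corr.round(2))
```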

Pairplot or Correlogram
• Created using Seaborn
• Shows a scatter plot for each pair of numerical variables
• No linear trend is visible here

Data Slicing and Dicing
Exploding nested columns
• The Sizes column was a string containing all sizes.
• Created a function to clean it and convert it to a list.
• Exploded the new list column into separate rows.
• Grouped the rows for each Main Category into three sizes: Large, Medium and Small.
• Visualized the grouped data.
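
The clean, convert and explode steps can be sketched as below. The raw string format of the Sizes column and the helper name `sizes_to_list` are assumptions, and the rows are made up. The rule used to bin sizes into Large, Medium and Small is not stated in these slides, so the sketch stops at counting exploded rows per category.

```python
import pandas as pd

# Made-up rows; the Sizes column is assumed to hold a list written out as a string.
df = pd.DataFrame({
    "Main Category": ["Body", "Makeup"],
    "Product Name": ["Shea Body Butter", "Matte Lipstick"],
    "Sizes": ["['50ml', '200ml', '400ml']", "['4g']"],
})


def sizes_to_list(raw):
    """Strip brackets and quotes, then split the string on commas."""
    cleaned = raw.strip("[]").replace("'", "")
    return [part.strip() for part in cleaned.split(",") if part.strip()]


# Convert each string to a real list, then give every size its own row.
df["Sizes"] = df["Sizes"].apply(sizes_to_list)
exploded = df.explode("Sizes", ignore_index=True)

# Number of size variants per main category.
per_category = exploded.groupby("Main Category")["Sizes"].count()
print(exploded)
print(per_category)
```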

Visualizing grouped sizes
• Two example categories: Body and Makeup
• Small-sized products are popular in Makeup
• Body sells more medium-sized and some large-sized products

CIS5357: Computing for Data Analytics, Term Project 2019. The Body Shop, by Aditi & Pooja
