Iryna Smologonova
Data Analyst - Portfolio
https://smologonova.github.io/
Hello!
My name is Iryna Smologonova
I am a data analyst with a background in product
development, audit and customer service.
With my curiosity and tenacity, I make connections
between different data sets, translate data into
actionable insights and communicate the ideas to
stakeholders.
I am excited to expand my data analysis and
critical-thinking skills, and to uncover the most
desirable solutions, which not only solve problems
from the customer’s perspective but also support
business growth.
Technical skills
• Excel
• Tableau
• SQL
• Project management
• Python
• Big data processing
Soft skills
• Problem-solving
• Collaboration
• Leadership
• Business acumen
• Storytelling
• Curiosity
Projects
• Rockbuster Stealth
Analyzing online movie rental transactions to answer business questions
• Flu Season
Analyzing regional and seasonal trends of influenza in the US
• GameCo
Analyzing global video game sales
• Instacart
Analyzing historical grocery order data to generate insights for marketing
strategy
Rockbuster Stealth – a global movie rental company
Objective
To assist with the launch strategy for
the new online video service
• Perform an analysis of historical
data to identify sales trends,
customer behavior and rental duration
• Develop insights and
recommendations for Rockbuster
(fictitious company)
Tools
• PostgreSQL
• PowerPoint
• Tableau
Data
• Rockbuster dataset
Source: PostgreSQL Tutorial
Skills
Creating data dictionary
Database querying
Joining tables
Subqueries
Common table expressions
ERD visualization
Database querying
Summarizing & cleaning data in SQL
Filtering & grouping
Answer business questions
Data visualization
Recommend strategies
Rockbuster Stealth SQL functions:
Aggregating, ranking, joining & grouping
SQL queries available here
Business questions
Which movies contributed the most to
revenue gain?
Which genres are the most popular?
Answers
• The top 20 movies make up 2% of all movies but account for 6% of total
revenue
• Top 3 genres by revenue: Sports, Sci-Fi and Animation
• Top genres by revenue per film: Comedy, New and Sports
Rockbuster Stealth Business questions:
Do sales figures vary between geographic regions?
Which ratings yield the most revenue?
The top 3 countries (India, China and the United States) account for
25% of the total number of customers and of company revenue.
The PG-13 and NC-17 ratings generate the most revenue.
Data visualizations created in Tableau, available here
Rockbuster Stealth Project deliverables:
• Project report
• GitHub Repo
• Data Dictionary
Key learning experience:
• Common table expressions (CTEs) are more readable than
subqueries and can be referenced more than once in a query. However, both
subqueries and CTEs have pros and cons, so the choice between them
should be made on a case-by-case basis.
• SQL ranking functions let me identify the top 20
movies by revenue in a simple way and
made my query more readable. The RANK() and DENSE_RANK()
functions are well suited to ordering and comparing data
across various factors.
• A bubble chart can visualize three metrics at once:
number of transactions, revenue and average revenue
per transaction. Encoding the third metric as bubble size or color
made it possible to emphasize the most popular genres.
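The CTE and ranking points above can be illustrated with a minimal sketch. It is not the project's actual query: the `film` and `payment` tables, their columns and the sample rows are hypothetical, and SQLite (via Python's standard `sqlite3` module, which needs SQLite 3.25 or later for window functions) stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical miniature schema; table names, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE payment (film_id INTEGER, amount REAL);
    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Bravo'), (3, 'Charlie'), (4, 'Delta');
    INSERT INTO payment VALUES
        (1, 10.0), (1, 5.0),
        (2, 15.0),
        (3, 9.0),
        (4, 2.0), (4, 3.0);
""")

query = """
WITH film_revenue AS (      -- CTE: revenue per film, defined once and queried below
    SELECT f.title, SUM(p.amount) AS revenue
    FROM film AS f
    JOIN payment AS p ON p.film_id = f.film_id
    GROUP BY f.title
)
SELECT title,
       revenue,
       RANK()       OVER (ORDER BY revenue DESC) AS rnk,        -- leaves gaps after ties
       DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk   -- no gaps after ties
FROM film_revenue
ORDER BY revenue DESC, title;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Alpha and Bravo tie on revenue, so both get rank 1. RANK() then assigns 3 to Charlie, while DENSE_RANK() assigns 2, which matters when a "top N" cut-off falls on a tie.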
Recommendations:
Focus on:
• Adding movies to the inventory in the categories that generate the most
revenue: ratings PG-13 and NC-17, and genres Sports,
Sci-Fi and Animation
• Comedy, New and Sports, which produce more revenue per film
and could be beneficial for a
pilot project
• The top 3 countries (India, China and the United States),
which account for 25% of the total number of customers and
of company revenue. I would therefore recommend
starting the streaming service with a pilot in these
countries
Click links
to check
the project
Flu season
Sourcing the proper data
Data profiling & integrity
Data quality measures
Data transformation & integration
Conducting statistical analysis
Consolidating analytical insights
Statistical hypothesis testing
Objective
To assist in preparing a
staffing plan in the United
States for the upcoming influenza
season:
• Analyze death trends
• Prioritize states with
vulnerable populations
Tools
• Excel
• Tableau
Data
• Influenza deaths by geography, time,
age, and gender
Source: CDC
• Population data by geography
Source: US Census Bureau
• Influenza lab test results by state
Source: CDC (FluView)
Skills
Translating business requirements
Data cleaning
Data integration & transformation
Statistical hypothesis testing
Visual analysis
Forecasting
Storytelling in Tableau
Presenting results
Scenario: To help a medical staffing agency prepare for the upcoming flu season by examining trends in
influenza and how they can be used to proactively plan for clinic and hospital staffing needs across the country
Data viz & storytelling
Flu Season
• Data transformation using pivot tables and VLOOKUP functions
• Combining different data sets using the common state and year/month variables
• Normalizing the flu deaths data by state population, deriving new variables that represent flu
deaths as a percentage of state population
• Examining data variability by calculating the variance and the standard deviation
• The correlation coefficient between the death rate of the 65+ population and the death rate of the under-65 population was 0.79,
which indicates a strong positive relationship: states with a higher death rate in one age group
tend to have a higher death rate in the other
• Null hypothesis: the flu death rate of the population aged 65+ is less than or equal to that of people under 65
years old
• Alternative hypothesis: the flu death rate of the population aged 65+ is higher than that of people under 65
years old
• The p-value is much less than the significance level of 0.05, so the null hypothesis is
rejected: the data support the conclusion that the flu death rate of people aged 65+ is higher
• At the 95% confidence level (alpha = 0.05), there is a statistically significant difference between the flu death
rate of people aged 65+ and that of other age groups.
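The statistical steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's workbook: the death rates are made-up numbers, and because the slides do not say which test was run, a one-sided permutation test is swapped in here since it needs only the standard library.

```python
import math
import random
import statistics

# Hypothetical flu death rates (deaths per 100K) for six states; not the CDC figures.
rate_65_plus = [120.0, 135.0, 110.0, 150.0, 128.0, 142.0]
rate_under_65 = [4.0, 6.0, 3.5, 7.0, 5.0, 6.5]

# Variability of one group: sample variance and standard deviation
var_65 = statistics.variance(rate_65_plus)
std_65 = statistics.stdev(rate_65_plus)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def one_sided_permutation_p(a, b, n_perm=10_000, seed=0):
    """Approximate p-value for H1: mean(a) > mean(b), by shuffling group labels."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

r = pearson(rate_65_plus, rate_under_65)
p_value = one_sided_permutation_p(rate_65_plus, rate_under_65)
alpha = 0.05

print(f"variance={var_65:.1f}, stdev={std_65:.1f}, r={r:.2f}, p={p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

A p-value below alpha means that data this extreme would be unlikely if the null hypothesis were true. It is not the probability that the alternative hypothesis is true.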
Data transformation & integration
Conducting statistical analysis
Statistical hypothesis testing
Transformation & integration
Statistical Analysis
• Project management plan
• Interim report
Flu Season
Charts: Influenza death rate among population 65+ years old; Number of influenza deaths by state;
One-year forecast of influenza deaths by state
• The flu season starts in December and ends in March, with a peak in January
• The death rate forecast for the upcoming flu season is roughly the same as
in the historical data
• Less populous states have higher death rates among the elderly population
per 100K: Alaska, Hawaii, Wyoming, District of Columbia, South and North
Dakota, Vermont
• More populous states have higher numbers of deaths: California,
New York, Texas, Pennsylvania and Florida
Key questions:
When does the flu season start and end?
Where to focus during the flu season?
Flu Season Project deliverables:
• Storytelling in Tableau – an interactive slide deck
• GitHub Repo
Key learning experience:
• It is important to consider data limitations and assess
their impact on the analysis and on the interpretation of results.
As the analysis progresses, further limitations may become
apparent and should be added to the analysis plan.
• Data mapping helps to match variables between
different data sets. State-level death data and population data
were mapped using an age variable in 10-year
ranges starting from age 5.
• Normalization is a part of data preparation that allows
data on different scales to be compared on a common
scale. The flu deaths data was normalized by
state population by deriving new variables representing
flu deaths as a percentage of state population.
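The normalization step described above can be sketched as follows. The state names and counts are hypothetical placeholders, not CDC or Census values, and the per-100K variable is included only because the slides report rates per 100K elsewhere.

```python
# Hypothetical records; real inputs would come from the merged deaths and population data.
records = [
    {"state": "State A", "deaths": 50, "population": 700_000},
    {"state": "State B", "deaths": 400, "population": 39_000_000},
]

# Derive normalized variables so that states of different sizes are comparable
for rec in records:
    rec["death_pct"] = rec["deaths"] / rec["population"] * 100
    rec["deaths_per_100k"] = rec["deaths"] / rec["population"] * 100_000

ranked_by_count = sorted(records, key=lambda rec: rec["deaths"], reverse=True)
ranked_by_rate = sorted(records, key=lambda rec: rec["deaths_per_100k"], reverse=True)

print("Most deaths:", ranked_by_count[0]["state"])
print("Highest death rate:", ranked_by_rate[0]["state"])
```

Ranking by raw count and ranking by rate can put different states on top, which is why the deck reports both views.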
Recommendations:
Focus on:
• Top 6 states, plus the District of Columbia, with the highest death rate:
Alaska, Hawaii, Wyoming, South Dakota, North Dakota
and Vermont
• Top 5 states with the largest elderly
populations:
California, Texas, Florida, New York and
Pennsylvania
• The influenza season months:
December to March, with a peak in January
GameCo – an online video game rental company
Data exploration
Data cleaning
Grouping data
Descriptive analytics
Developing insights
Proposal report
Viz data insights
Objective
• Perform a descriptive analysis of
an online video game sales data
set to foster a better
understanding of how GameCo’s
(fictitious company) new games
might fare in the market.
• Compare historical regional sales
assumptions with the reality of
current market conditions.
Tools
• Excel
• PowerPoint
Data
• VGSales
Skills
Grouping data
Summarizing data
Descriptive analysis
Visualizing results in Excel
Presenting results
GameCo
[Line chart: Regional Sales as a Percentage of Global Sales. X-axis: Years; y-axis: Percent of Sales (0% to 60%); series: NA sales, EU sales, JP sales]
Key question: How have their sales figures varied between geographic regions over time?
Sales by region 2008 vs 2016
Region          2008   2016
North America   52%    32%
Europe          27%    38%
Japan           9%     19%
Assumption:
The initial understanding of the business was that
video game sales have remained consistent across
regions over time.
Testing:
• Plotting each region as a percentage of global
sales in a line graph reveals that sales in Europe
overtook North America in 2016
• The column chart comparing
regional sales in 2008 vs 2016 shows that North
America's share of sales dropped by 20 percentage points, while the shares of Europe and Japan
increased by 11 and 10 percentage points respectively
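The comparison above is a difference between two shares, so it is measured in percentage points rather than percent. A short sketch using the 2008 and 2016 shares from the column chart shows both the percentage-point change and the relative change; the variable names are illustrative.

```python
# Regional share of global sales (%), taken from the 2008 vs 2016 column chart
shares = {
    "North America": (52, 32),
    "Europe": (27, 38),
    "Japan": (9, 19),
}

changes = {}
for region, (share_2008, share_2016) in shares.items():
    points = share_2016 - share_2008        # change in percentage points
    relative = points / share_2008 * 100    # change relative to the 2008 share, in percent
    changes[region] = {"points": points, "relative_pct": relative}
    print(f"{region}: {points:+d} percentage points ({relative:+.1f}% relative)")
```

Japan's share rises by only 10 percentage points, yet that is more than a doubling of its 2008 share, which is why the two measures should not be mixed.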
[Pie chart: New games by Genre 2015-2016. Legend: Action, Role-Playing, Sports, Shooter, Fighting; slice values: 54%, 17%, 14%, 10%, 5%]
GameCo Key question: Are certain types of games more popular than others?
[Clustered bar chart: Regional Sales by Genre 2015-2016. X-axis: Sales in Millions ($), 0 to 40; y-axis: Genre (Shooter, Action, Sports, Role-Playing, Fighting, Misc, Racing); series: JP sales, EU sales, NA sales]
The clustered bar chart was created in Excel to
show regional sales by video game
genre.
The pie chart shows the percentage breakdown of new
game sales for the last two years.
Insights:
• Shooter dominates the North American market
• Action is popular in all regional markets and
has the highest number of new games
• Role-Playing ranks second in Japan
GameCo Project deliverables:
• Proposal report
• GitHub Repo
Key learning experience:
• It is important to test
the assumptions behind an
analysis. Doing so helps determine whether
conclusions are drawn correctly
from the results.
• The goal of the project, the
regional customers, sales over
time and best-selling genres were
all taken into consideration to
develop a regional approach to
setting goals, focusing on a target
audience and developing
recommendations for marketing
activities.
Recommendations:
North American market: refocus the budget allocation
• Goal: stabilize sales and prevent further decline
• Target audience: current and former customers, drawing on the large historical customer base
• Actions: launch direct marketing campaigns promoting new games in the Shooter, Action and Sports genres
European market: support sales with a slight increase in budget resources
• Goal: maintain the growth trend over time
• Target audience: loyal customers, plus new customer acquisition
• Actions: promote new games intensively via marketing campaigns such as BOGO (buy one game in the Shooter/Sports genres and get one in the Action/Role-Playing genres at a discount)
Japanese market: allocate additional resources to promotion and to attracting new customers
• Goal: continue the growth trend that started last year
• Target audience: current customers, plus new ones
• Actions: extend promotions using last year's approach, keeping the main emphasis on new Action and Role-Playing games
Instacart – an online grocery store that operates via an app
Objective
To assist with identifying sales
patterns for better segmentation:
• Explore historical data to
define buying trends and
customer behavior
• Select sub-groups of
customers and analyze their
ordering habits
Tools
• Python
• Jupyter Notebook
• pandas & NumPy libraries
• Matplotlib (pyplot) & seaborn
• Excel
Data
• Orders
• Products
• Departments
• Customers
Skills
Data cleaning & wrangling
Data merging
Deriving new variables
Aggregating
Population flows
Data wrangling & subsetting
Data consistency check
Combining & exporting data
Deriving new variables
Grouping & aggregating
Excel report
Data viz with Python
Instacart Population flows
Merging the datasets
The project began with cleaning,
organising and merging the data before
conducting the analysis
• Data wrangling procedure:
- dropping and renaming columns in the “orders” dataset
- renaming columns and changing data types in the “products” and “customers” datasets
- transposing the “departments” dataset
• Merging the data together
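The wrangling and merging steps listed above might look roughly like this in pandas. The miniature data frames and every column name in them are assumptions made for illustration; the project's real files and columns may differ.

```python
import pandas as pd

# Hypothetical miniature versions of the data sets; column names are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [10, 11, 99],
    "order_dow": [0, 3, 6],
    "eval_set": ["prior", "prior", "prior"],   # column not needed for the analysis
})
products = pd.DataFrame({
    "product_id": [10, 11],
    "product_name": ["Bananas", "Oat milk"],
    "prices": ["0.50", "3.20"],                # stored as text in this toy example
})
departments_wide = pd.DataFrame({"1": ["frozen"], "2": ["bakery"]}, index=["department"])

# Wrangling: drop and rename columns, change data types, transpose
orders = orders.drop(columns=["eval_set"]).rename(columns={"order_dow": "order_day_of_week"})
products["prices"] = products["prices"].astype("float64")
departments = departments_wide.T               # one row per department id

# Left merge with an indicator column that records which rows found a match
merged = orders.merge(products, on="product_id", how="left", indicator=True)
print(merged["_merge"].value_counts())
```

The `_merge` column makes it easy to check the merge before analysis: here one order refers to a product that is missing from the products table and is flagged `left_only`.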
Instacart Deriving variables & crosstabs
Flag creation:
• A spending flag was defined by spending amount: less than $8, or $8 and over
• Product price range was divided into 4 categories: High-range product (over $15), Mid-high
range (between $10 and $15), Mid-low range (between $5 and $10) and Low-range
product ($5 or less)
Crosstab calculation:
• To display the number of orders placed on
each day of the week, the ‘busiest days’ flag was cross-tabulated with
the ‘order_day_of_week’ variable
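The flag and crosstab logic above can be sketched in plain Python. Several details are assumptions: the spending-flag labels, the treatment of prices exactly equal to $10 or $15 (placed in the lower band here), and the sample orders. In the project itself the same result would more likely come from pandas, for example with `pd.cut` and `pd.crosstab`.

```python
from collections import Counter

def price_range(price):
    """Assign one of the four price categories; boundary handling is an assumption."""
    if price > 15:
        return "High-range product"
    if price > 10:
        return "Mid-high range product"
    if price > 5:
        return "Mid-low range product"
    return "Low-range product"

def spending_flag(amount):
    """Flag by spending amount; the label names are hypothetical."""
    return "Low spender" if amount < 8 else "High spender"

# Hypothetical orders with a day-of-week variable and a 'busiest days' flag
orders = [
    {"order_day_of_week": 0, "busiest_days": "Busiest days"},
    {"order_day_of_week": 0, "busiest_days": "Busiest days"},
    {"order_day_of_week": 3, "busiest_days": "Least busy"},
    {"order_day_of_week": 5, "busiest_days": "Regularly busy"},
]

# Crosstab: count of orders for each (day, flag) combination
crosstab = Counter((o["order_day_of_week"], o["busiest_days"]) for o in orders)

for (day, flag), count in sorted(crosstab.items()):
    print(day, flag, count)
```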
Instacart Visualization:
• Matplotlib
• Seaborn
Business question
What differences can you find in
the ordering habits of different
customer profiles?
Answer
• High-income customers
prefer products from the alcohol
and pets departments.
• Affluent customers prefer
products from the meat seafood
and canned goods departments.
• Middle-income customers show
no clear preference.
• Low-income customers prefer
products from the snacks,
beverages and breakfast
departments.
Business question
What different classifications does the
demographic information suggest?
Answer
• Middle-income and Affluent customers across
all age groups place the majority
of orders
• Young and Middle-aged customers
with a Middle income level are the
core of the customer base
Instacart
Key learning experience:
• The risk of running out of RAM during code execution
can be substantially reduced by converting data
types, sampling the data or restarting the kernel
• A vast collection of libraries helps with exploring and
cleaning large data sets and creating
visualizations in a simple way
• Deriving new variables and creating crosstabs
open up many ways to find insights and
communicate them through
informative, customized and appealing plots that present the data
simply and effectively.
Recommendations:
• The least busy days are in the middle of the week: Tuesday
and Wednesday. Ads should run Monday to
Wednesday with the goal of increasing the number of orders on
Tuesday and Wednesday
• To boost sales, focus on middle-income customers with the
young single and single parent profiles. Target younger customers
with a low income level with lower-priced products from the snacks,
beverages and breakfast departments
• Promote high-range products from the meat seafood, canned goods, alcohol and pets
departments to High-income and Affluent customers, because they have greater capacity to buy a
range of products
Project deliverables:
• GitHub Repo
• Excel report
Iryna Smologonova
Winnipeg, Canada
Get in touch
Email:
email me directly

Data Analytics Portfolio_IS.pptx

  • 1. Iryna Smologonova Data Analyst - Portfolio https://smologonova.github.io/
  • 2. Hello! My name is Iryna Smologonova I am a data analyst with a background in product development, audit and customer service. With my curiosity and tenacity, I make connections between different data sets, translate data into actionable insights and communicate the ideas to stakeholders. I am excited to expand my data analysis and critical-thinking skills, and to uncover the most desirable solutions of not only solving problems from the customer’s perspective and also affecting business growth. 2 Technical skills Excel Tableau SQL Project management Python Big data processing Soft skills Problem-solving Collaboration Leadership Business acumen Storytelling Curiosity
  • 3. Projects 3  Rockbuster Stealth Analyzing online movie rental transactions to answer business questions  Flu Season Analyzing regional and seasonal trends of influenza in the US  GameCo Analyzing global video game sales  Instacart Analyzing historical grocery order data to generate insights for Marketing strategy
  • 4. Rockbuster Stealth – a global movie rental company. Objective: To assist with the launch strategy for the new online video service: perform an analysis of historical data to identify sales trends, customer behavior and rental duration; develop insights and recommendations for Rockbuster (fictitious company). Tools: PostgreSQL, PowerPoint, Tableau. Data: Rockbuster dataset (source: PostgreSQL Tutorial). Skills: creating a data dictionary, database querying, joining tables, subqueries, common table expressions. Workflow: ERD visualization, database querying, summarizing & cleaning data in SQL, filtering & grouping, answering business questions, data visualization, recommending strategies.
  • 5. Rockbuster Stealth. SQL functions: aggregating, ranking, joining & grouping. SQL queries available here. Business questions: Which movies contributed the most to revenue gain? Which genres are the most popular? Answers: The top 20 movies account for 6% of total revenue and 2% of the number of movies. The top 3 genres by revenue are Sports, Sci-Fi and Animation. The top genres by revenue per film are Comedy, New and Sports.
  • 6. Rockbuster Stealth. Business questions: Do sales figures vary between geographic regions? Which ratings yield the most revenue? Answers: The top 3 countries (India, China and the United States) account for 25% of the total number of customers and of company revenue. The PG-13 and NC-17 ratings generate the most revenue. Data visualizations created in Tableau, available here.
  • 7. Rockbuster Stealth. Project deliverables (click links to check the project): project report, GitHub repo, data dictionary. Key learning experience: Common table expressions (CTEs) are more readable than subqueries and can be reused. However, subqueries and CTEs both have pros and cons, and the choice between them should be made on a case-by-case basis. SQL ranking functions allowed me to identify the top 20 movies with the highest revenue in a simple way and made my query more readable. The RANK()/DENSE_RANK() functions are great for sequencing and comparing data across various factors. A bubble chart is a way to visualize three metrics: number of transactions, revenue and average revenue per transaction. It allowed me to add a third dimension as bubble size/color to emphasize the most popular genres. Recommendations: Focus on adding to the inventory movies with the ratings and genres that generate the most revenue (ratings: PG-13 & NC-17; genres: Sports, Sci-Fi and Animation). Comedy, New and Sports produced more revenue per film and could be beneficial for a pilot project. The top 3 countries (India, China and the United States) account for 25% of the total number of customers and of company revenue; therefore, I would recommend starting the streaming service with a pilot in these countries.
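The CTE and ranking approach described for Rockbuster Stealth can be sketched as follows. This is a minimal illustration, not the project's actual query: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), and the `payment` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical payment rows (film title, amount); not the Rockbuster data.
ROWS = [
    ("Film A", 10.0), ("Film A", 5.0),
    ("Film B", 15.0),
    ("Film C", 4.0), ("Film C", 3.0),
    ("Film D", 1.0),
]

# Two chained CTEs: the first aggregates revenue per film,
# the second ranks the films by that revenue.
QUERY = """
WITH film_revenue AS (
    SELECT title, SUM(amount) AS revenue
    FROM payment
    GROUP BY title
),
ranked AS (
    SELECT title, revenue,
           RANK() OVER (ORDER BY revenue DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
    FROM film_revenue
)
SELECT title, revenue, rnk, dense_rnk
FROM ranked
WHERE rnk <= ?
ORDER BY rnk, title
"""


def top_films(rows, n):
    """Return (title, revenue, rank, dense_rank) for films ranked in the top n."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE payment (title TEXT, amount REAL)")
        conn.executemany("INSERT INTO payment VALUES (?, ?)", rows)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()


for row in top_films(ROWS, 3):
    print(row)
```

When two films tie on revenue, `RANK()` leaves a gap after the tie while `DENSE_RANK()` does not, which is the main practical difference when picking a "top N" cut-off.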
  • 8. Flu Season. Workflow: sourcing the proper data, data profiling & integrity, data quality measures, data transformation & integration, conducting statistical analysis, consolidating analytical insights, statistical hypothesis testing, data viz & storytelling. Objective: To assist in preparing a staffing plan in the United States for the upcoming influenza season: analyze death trends; prioritize states with vulnerable populations. Tools: Excel, Tableau. Data: influenza deaths by geography, time, age and gender (source: CDC); population data by geography (source: US Census Bureau); influenza lab test results by state (source: CDC FluView). Skills: translating business requirements, data cleaning, data integration & transformation, statistical hypothesis testing, visual analysis, forecasting, storytelling in Tableau, presenting results. Scenario: To help a medical staffing agency prepare for the upcoming flu season by examining trends in influenza and how they can be used to proactively plan for clinic and hospital staffing needs across the country.
  • 9. Flu Season. Data transformation & integration: transforming data using pivot tables and VLOOKUP functions; combining different data sets through the common state and year/month variables; normalizing the flu deaths data by state population by deriving new variables that represent flu deaths as a percentage of state population. Conducting statistical analysis: examining the variability of the data by calculating the variance and the standard deviation; the correlation coefficient between the death rate of the 65+ population and the death rate of the population under 65 was 0.79, which indicates a strong positive relationship: states with a higher death rate among people aged 65+ also tend to have a higher death rate among people under 65. Statistical hypothesis testing: Null hypothesis: the flu death rate of people aged 65+ is less than or equal to that of people under 65. Alternative hypothesis: the flu death rate of people aged 65+ is higher than that of people under 65. The p-value is much lower than the significance level of 0.05, so the null hypothesis is rejected. At the 95% confidence level (alpha = 0.05) we can say that the flu death rate of people aged 65+ is significantly higher than that of other age groups. Project management plan and interim report: click links to check the project.
  • 10. Flu Season. Charts: influenza death rate among the population 65+ years old; number of influenza deaths by state; one-year forecast of influenza deaths by state. Key questions: When does the flu season start and end? Where to focus during the flu season? The flu season starts in December and ends in March, with a peak in January. The death rate forecast for the upcoming flu season is roughly the same as in the historical data. Less populous states have higher death rates among the elderly population per 100K: Alaska, Hawaii, Wyoming, District of Columbia, South and North Dakota, Vermont. More populous states have higher numbers of deaths: California, New York, Texas, Pennsylvania and Florida.
  • 11. Flu Season. Project deliverables (click links to check the project): storytelling in Tableau (an interactive slide deck), GitHub repo. Key learning experience: It is important to consider data limitations and assess their impact on the analysis and on the interpretation of the results. As the analysis progresses, data limitations may become apparent and should be added to the analysis plan. Data mapping helps to match variables between different data sets. Death data and population data by state were mapped using an age variable in 10-year ranges starting from age 5. Normalization is a part of data preparation and allows data in different units to be compared using the same units. The flu deaths data was normalized by state population by deriving new variables that represent flu deaths as a percentage of state population. Recommendations: Focus on the top states with the highest death rate (Alaska, Hawaii, Wyoming, District of Columbia, South and North Dakota, Vermont); the top 5 states with the highest elderly populations (California, Texas, Florida, New York and Pennsylvania); and the influenza season months, December to March, with a peak in January.
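The normalization and correlation steps described for the Flu Season project were done in Excel; the sketch below only illustrates the same arithmetic in Python. All state names and figures in `SAMPLE` are made up for illustration, and the helper names are hypothetical.

```python
import math
import statistics


def death_rate_percent(deaths, population):
    """Flu deaths expressed as a percentage of the population."""
    return 100.0 * deaths / population


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows: (state, deaths 65+, population 65+, deaths <65, population <65)
SAMPLE = [
    ("State A", 1200, 800_000, 150, 4_200_000),
    ("State B", 300, 150_000, 40, 900_000),
    ("State C", 2500, 2_000_000, 400, 11_000_000),
    ("State D", 90, 50_000, 10, 300_000),
]

# Normalize raw death counts by population so states of different size compare fairly.
rates_65_plus = [death_rate_percent(d, p) for _, d, p, _, _ in SAMPLE]
rates_under_65 = [death_rate_percent(d, p) for _, _, _, d, p in SAMPLE]

# Variability of the normalized rates and the relationship between the two groups.
print("variance:", statistics.variance(rates_65_plus))
print("std dev:", statistics.stdev(rates_65_plus))
print("correlation:", pearson(rates_65_plus, rates_under_65))
```

Working with rates rather than raw counts is what separates the "highest death rate" states from the "highest number of deaths" states in the findings.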
  • 12. GameCo – an online rental video game company. Workflow: data exploration, data cleaning, grouping data, descriptive analytics, developing insights, proposal report, visualizing data insights. Objective: Perform a descriptive analysis of an online video game sales data set to foster a better understanding of how new games from GameCo (fictitious company) might fare in the market. Compare historical regional sales assumptions with the reality of current market conditions. Tools: Excel, PowerPoint. Data: VGSales. Skills: grouping data, summarizing data, descriptive analysis, visualizing results in Excel, presenting results.
  • 13. GameCo. Key question: How have sales figures varied between geographic regions over time? Assumption: The initial understanding of the business was that video game sales have remained consistent across regions over time. Testing: plotting each region as a percentage of global sales in a line graph (Regional Sales as a Percentage of Global Sales: NA, EU and JP sales by year) reveals that sales in Europe overtook North America in 2016; a column chart comparing sales by region in 2008 vs 2016 (North America 52% vs 32%, Europe 27% vs 38%, Japan 9% vs 19%) shows that the North American share dropped by 20 percentage points, while the shares of Europe and Japan increased by 11 and 10 percentage points respectively.
  • 14. GameCo. Key question: Are certain types of games more popular than others? Charts: New games by Genre 2015-2016 (pie chart; genres: Action, Role-Playing, Sports, Shooter, Fighting; shares: 54%, 17%, 14%, 10%, 5%); Regional Sales by Genre 2015-2016 (clustered bar chart; sales in millions ($) for JP, EU and NA by genre: Shooter, Action, Sports, Role-Playing, Fighting, Misc, Racing). The clustered bar chart was created in Excel to represent regional sales by video game genre. The pie chart shows the percentage breakdown of new game sales for the last two years. Insights: Shooter dominates the North American market. Action is popular in all regional markets and has the highest number of new games. Role-Playing is the second most popular genre in Japan.
  • 15. GameCo. Project deliverables (click links to check the project): proposal report, GitHub repo. Key learning experience: It is important to test the assumptions that go with the analysis. Doing so helps determine whether conclusions are correctly drawn from the results of the analysis. The goal of the project, the regional customers, sales over time and best-selling genres were taken into consideration to develop a regional approach to setting goals, focusing on a target audience and developing recommendations for marketing activities. Recommendations by market: North American market (refocus the budget allocation): goal: stabilize sales and prevent further decline; target audience: current and former customers, using the large historical customer base; actions: launch direct marketing campaigns promoting new games in the Shooter, Action and Sports genres. European market (support sales with a slight increase in budget resources): goal: keep the growing trend over time; target audience: loyal customers and new customers to acquire; actions: intensively promote new games via marketing campaigns such as BOGO (buy one game in the Shooter/Sports genre and get one in the Action/Role-Playing genre at a certain discount). Japanese market (allocate additional resources to emphasize promotion and attract new customers): goal: continue the growing trend that started last year; target audience: current customers and new ones to attract; actions: advance promotions using last year's approach, keeping the main accent on new Action and Role-Playing games.
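The regional comparison in the GameCo project (slide 13) is simple arithmetic that was done in Excel; as a hedged illustration, the same calculation looks like this in Python. The shares are the 2008 and 2016 values shown on the slide's column chart, while the function names are hypothetical.

```python
# Regional shares of global sales (%), as shown in the 2008 vs 2016 chart on slide 13.
SHARES_2008 = {"North America": 52, "Europe": 27, "Japan": 9}
SHARES_2016 = {"North America": 32, "Europe": 38, "Japan": 19}


def share_of_global(regional_sales, global_sales):
    """Regional sales as a percentage of global sales."""
    return 100.0 * regional_sales / global_sales


def point_change(before, after):
    """Change in share per region, in percentage points."""
    return {region: after[region] - before[region] for region in before}


changes = point_change(SHARES_2008, SHARES_2016)
for region, delta in changes.items():
    print(f"{region}: {delta:+d} percentage points")
```

The differences are percentage points of global share, not relative changes: a drop from 52% to 32% is 20 points, which is a relative decline of roughly 38%.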
  • 16. Instacart – an online grocery store that operates via an app. Objective: To assist with identifying sales patterns for better segmentation: explore historical data to define buying trends and customer behavior; select sub-groups of customers and analyze their ordering habits. Tools: Python, Jupyter Notebook, pandas & NumPy libraries, Matplotlib, seaborn & pyplot, Excel. Data: Orders, Departments, Products, Customers. Skills: data cleaning & wrangling, data merging, deriving new variables, aggregating, population flows. Workflow: data wrangling & subsetting, data consistency check, combining & exporting data, deriving new variables, grouping & aggregating, Excel report, data viz with Python.
  • 17. Instacart. Population flows: merging the datasets. The project started with cleaning, organising and merging the data before conducting the analysis. Data wrangling procedure: dropping and renaming columns in the “orders” dataset; renaming columns and changing data types in the “products” and “customers” datasets; transposing the “departments” dataset. Merging the data together.
  • 18. Instacart. Deriving variables & crosstabs. Flag creation: The spending flag was defined by spending amount: less than $8, and $8 or more. The product price range was divided into 4 categories: high-range products over $15, mid-high range between $10 and $15, mid-low range between $5 and $10, and low-range products at $5 or less. Crosstab calculation: To display the number of orders made on each day of the week, the ‘busiest days’ flag and the ‘order_day_of_week’ variable were cross-tabulated.
  • 19. Instacart. Visualization: Matplotlib, seaborn. Business question: What differences can you find in the ordering habits of different customer profiles? Answer: High-income customers prefer products from the alcohol and pets departments. Affluent customers prefer products from the meat seafood and canned goods departments. There is no clear preference among middle-income customers. Low-income customers prefer products from the snacks, beverages and breakfast departments. Business question: What different classifications does the demographic information suggest? Answer: Middle-income and affluent customers across all age groups place the majority of orders. Young and middle-aged customers with a middle income level are the core of the customer base.
  • 20. Instacart. Key learning experience: The risk of running out of RAM during code execution is substantially reduced by converting data types, sampling data or restarting kernels. A vast collection of libraries helps with exploring and cleaning large data sets and creating visualizations in a simple way. Deriving new variables and creating crosstabs open up many opportunities to find insights and communicate them through informative, customized and appealing plots that present data in a simple and effective way. Recommendations: The least busy days are in the middle of the week: Tuesday and Wednesday. Ads should run Monday to Wednesday with the target of increasing the number of orders on Tuesday and Wednesday. To boost sales, focus on middle-income customers with the young single and single parent profiles. Target younger, low-income customers with lower-priced products from the snacks, beverages and breakfast departments. Promote high-range products from the meat seafood, canned goods, alcohol and pets departments to high-income and affluent customers, because they have more potential to buy a group of products. Project deliverables (click links to check the project): GitHub repo, Excel report.
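The flag creation and crosstab steps described for the Instacart project (slide 18) were done with pandas; the sketch below shows the same logic in plain Python so it runs without extra libraries. The thresholds come from the slide, but the label strings, the treatment of the exact $10 and $15 boundaries, the helper names and the sample rows are assumptions for illustration.

```python
from collections import Counter


def price_range(price):
    """Price range label; boundary handling at exactly $10 and $15 is an assumption."""
    if price > 15:
        return "High-range product"
    if price > 10:
        return "Mid-high range product"
    if price > 5:
        return "Mid-low range product"
    return "Low-range product"


def spending_flag(amount):
    """Spending flag: less than $8 vs $8 or more (label names are hypothetical)."""
    return "Low spender" if amount < 8 else "High spender"


def crosstab(rows, row_key, col_key):
    """Count rows for each (row_key value, col_key value) pair."""
    return Counter((row[row_key], row[col_key]) for row in rows)


# Hypothetical orders; the flag values are illustrative only.
ORDERS = [
    {"order_day_of_week": 0, "busiest_days": "Busiest days"},
    {"order_day_of_week": 0, "busiest_days": "Busiest days"},
    {"order_day_of_week": 3, "busiest_days": "Least busy days"},
    {"order_day_of_week": 5, "busiest_days": "Regularly busy"},
]

counts = crosstab(ORDERS, "order_day_of_week", "busiest_days")
for (day, flag), n in sorted(counts.items()):
    print(day, flag, n)
```

In pandas the same steps would typically use conditional assignment for the flags and a cross-tabulation function for the counts; the plain-Python version only makes the category boundaries explicit.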
  • 21. Iryna Smologonova, Winnipeg, Canada. Get in touch. Email: email me directly