I just finished the Coursera Data Analytics Certificate course and decided to do the optional capstone data analysis project. I had the option of choosing my own project or working with Chicago’s real bikeshare data set from Divvy, which is operated by Lyft. After looking into the data and the program, I saw that Lyft runs a bikeshare program called Bay Wheels in my home city of San Francisco, so I decided to do the bikeshare project but work with the Bay Wheels data instead.
In this project, I am the junior data analyst on a marketing analytics team for a fictional company called Cyclistic. My role is to ask, prepare, process, analyze, share, and act. While the scenario is fictional, the data and findings are real.
The director of marketing believes the company’s future success depends on maximizing the number of annual memberships, and wants to know how casual riders and members use the bikes differently. My report will be shared with my analytics team, the director and the executive team.
The director has set a clear goal: design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. The director, Moreno, and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
My assignment is to answer one question: how do annual members and casual riders differ?
I am responsible for producing a report with the following deliverables:
1. A clear statement of the business task
2. A description of all data sources used
3. Documentation of any cleaning or manipulation of data
4. A summary of my analysis
5. Supporting visualizations and key findings
6. Top three recommendations based on my analysis
1. Nate DeWaele November 1st, 2023
Ridership Trends of Bay Wheels
Member and Casual Riders
Data analysis exploring bike share rider behavior
2. Contents
• Intro / Background
• Statement of Business Task
• Description of Data Sources Used
• Documentation of any cleaning or manipulation of data
• Summary of Analysis
• Supporting Visualizations
• Recommendations
• Appendix
3. Intro / Background
Coursera Data Analytics Certificate Capstone Project
• This project demonstrates what I have learned in the course
• The data is real, taken from San Francisco Bay Wheels data provided by Lyft
• My role: Data Analyst on a marketing analytics team for a fictional company called Cyclistic
• Goal: Determine the difference between casual and member bike riders
• Deliverable: This presentation which meets the business task
5. Business Task
• Determine how annual and casual riders differ
• Report findings to:
• Marketing team
• Director
• Executive team
6. Data Source Details
Bay Wheels bike-share data is real and was collected by Lyft; it is made
publicly available by the city of San Francisco and Motivate International Inc.
• Data spans from Oct. 2022 - Sept. 2023
• Final processing of data was on November 6th, 2023
• Total data: 2,485,313 observations
• Credit card info was removed to protect user identities and financial info
• Insights about who converted from casual to member cannot be determined
• See appendix for more info
• Variables
• ride_id
• rideable_type
• started_at
• ended_at
• start_station_name
• start_station_id
• end_station_name
• end_station_id
• start_lat
• start_lng
• end_lat
• end_lng
• member_casual
7. Documentation of Data Cleaning
From 2,485,313 raw observations to a clean data set of 1,865,753
• RStudio was used due to the large volume of records
• Removed Observations:
• Duplicates
• Nulls
• Containing empty strings in member_casual, started_at, ended_at
• Trip times < 0
• Records with the word “Test” in them
• Confirmed there were no outliers
• started_at and ended_at converted from string to date time format
• duration column made by subtracting started_at from ended_at
Data cleaning and manipulation were done in RStudio. After all records were collected, they were merged into one raw data set containing 2,485,313 observations. Here is a list of things I did to clean the data:
• Removed duplicates
• Removed observations with null values
• Removed observations with empty strings, after confirming that those records were evenly distributed and not concentrated in large chunks of time
• Removed observations where the trip time was less than zero
• Removed any observations with the word “test” in them
• Confirmed there were no large outliers, so no observations were removed for that reason
After removing records, I had 1,865,753 records left. I then checked some basic stats about trip duration to see if there might be anything concerning. To do that, I converted ended_at and started_at from strings to datetime data types and made a duration column, which is ended_at - started_at.
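The cleaning itself was done in R. As a rough illustration only, the same steps can be sketched in Python with the standard library. The function name `clean_trips`, the helper `make_row`, the sample rows, the timestamp format, the choice of `ride_id` as the duplicate key, and the use of minutes for duration are all assumptions made for this sketch; only the column names come from the data set.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout
REQUIRED = ("member_casual", "started_at", "ended_at")


def clean_trips(rows):
    """Return cleaned copies of trip rows (dicts keyed by column name)."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Remove rows with a null in any column.
        if any(value is None for value in row.values()):
            continue
        # Remove rows with empty strings in the three key columns.
        if any(row.get(col, "") == "" for col in REQUIRED):
            continue
        # Remove rows containing "test" anywhere (crude substring match).
        if any("test" in str(value).lower() for value in row.values()):
            continue
        # Remove duplicates; keying on ride_id is an assumption.
        if row.get("ride_id") in seen_ids:
            continue
        # Convert the timestamp strings and derive the duration (minutes assumed).
        started = datetime.strptime(row["started_at"], TIME_FORMAT)
        ended = datetime.strptime(row["ended_at"], TIME_FORMAT)
        duration_min = (ended - started).total_seconds() / 60
        # Remove trips with a negative duration.
        if duration_min < 0:
            continue
        seen_ids.add(row.get("ride_id"))
        cleaned.append({**row, "started_at": started, "ended_at": ended,
                        "duration_min": duration_min})
    return cleaned


def make_row(ride_id, started_at, ended_at, member_casual="member",
             start_station_name="Market St. at 10th St."):
    """Build a hypothetical sample row with a subset of the columns."""
    return {"ride_id": ride_id, "started_at": started_at, "ended_at": ended_at,
            "member_casual": member_casual, "start_station_name": start_station_name}


sample = [
    make_row("A1", "2023-05-01 08:00:00", "2023-05-01 08:15:00"),
    make_row("A1", "2023-05-01 08:00:00", "2023-05-01 08:15:00"),  # duplicate
    make_row("B2", "2023-05-01 09:00:00", "2023-05-01 09:10:00", member_casual=""),
    make_row("C3", "2023-05-01 10:00:00", "2023-05-01 09:50:00"),  # negative duration
    make_row("D4", "2023-05-01 11:00:00", "2023-05-01 11:20:00",
             start_station_name="Test Station"),
    make_row("E5", "2023-05-01 12:00:00", "2023-05-01 12:05:00",
             start_station_name=None),
]
cleaned = clean_trips(sample)
print(len(cleaned), cleaned[0]["duration_min"])  # 1 15.0
```

Only the first sample row survives: the others are a duplicate, an empty rider type, a negative duration, a "test" record and a null station name.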
8. Documentation of Data Cleaning
Final cleaned data set
• Mean Duration: 22.00771
• Median Duration: 22.74417
• Standard Deviation: 4.766666
The mean and median durations are relatively close, which suggests that the distribution is roughly symmetric and may be approximately normal. The standard deviation is relatively low, indicating that most of the data points are close to the mean, so the averages should tell an accurate story.
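One quick way to put a number on how close the mean and median are is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. This check was not part of the original analysis; the sketch below simply applies the formula to the three figures reported above, and it is a rough indicator of symmetry rather than a test of normality.

```python
# Figures reported for the cleaned data set.
mean_duration = 22.00771
median_duration = 22.74417
std_duration = 4.766666

# Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.
skewness = 3 * (mean_duration - median_duration) / std_duration
print(round(skewness, 2))  # -0.46
```

A value of 0 would mean the mean and median coincide. The negative sign here reflects the mean sitting slightly below the median, which points to a mild left skew.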
9. Summary of Analysis
Basic Stats: No significant difference in trip duration
Member Riders Summary
• Mean duration: 21.91673
• Median duration: 22.68861
• Standard deviation: 4.769218
Casual Riders Summary
• Mean duration: 22.20975
• Median duration: 22.83722
• Standard deviation: 4.754778
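The summaries above were produced in R. A minimal Python sketch of the same kind of per-rider-type summary might look like the following; the function name `duration_summary` and the sample durations are made up for illustration, and `stdev` here is the sample standard deviation, which may or may not match the setting used originally.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def duration_summary(trips):
    """Map each rider type to the mean, median and sample stdev of its durations."""
    by_type = defaultdict(list)
    for rider_type, duration in trips:
        by_type[rider_type].append(duration)
    return {
        rider_type: {"mean": mean(values), "median": median(values),
                     "stdev": stdev(values)}
        for rider_type, values in by_type.items()
    }


# Made-up durations, only to show the shape of the result.
sample_trips = [("member", 20.0), ("member", 22.0), ("member", 24.0),
                ("casual", 21.0), ("casual", 23.0), ("casual", 25.0)]
summary = duration_summary(sample_trips)
print(summary["member"]["mean"], summary["casual"]["mean"])  # 22.0 23.0
```

For the reported figures, the two means differ by about 0.29 (22.20975 versus 21.91673), which is small next to a standard deviation of roughly 4.8. That comparison is descriptive and is not a formal significance test.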
10. Summary of Analysis
Top routes
Member routes
Start Station Name | End Station Name | Total Rides
Market St. at Steuart St. | Barry St. at 4th St. | 811
North Point St. at Polk St. | Market St. at Steuart St. | 811
Bay Pl. at Vernon St. | 19th St. BART Station | 773
Market St. at 10th St. | Market St. at 10th St. | 738
The Embarcadero St. at Sansome | Market St. at Steuart St. | 738
Casual routes
Start Station Name | End Station Name | Total Rides
Mason St. at Halleck | Mason St. at Halleck | 1763
Lincoln Blvd. at Graham St. | Lincoln Blvd. at Graham St. | 1176
Pier 1/2 at The Embarcadero | North Point St. at Powell St. | 876
Fell St. at Stanyan St. | Fell St. at Stanyan St. | 875
North Point St. at Polk St. | North Point St. at Polk St. | 784
Looking at the top 5 routes for member and casual rider types, we see that 4 of the 5 routes casual riders take are round trips: they end at the same station where the bike was checked out. By contrast, 4 of the 5 most common routes member riders take end at a different station than the start.
This suggests that members have a place to go and that casual riders are generally taking bikes for a joy ride.
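The route ranking was done in R. As a hedged Python sketch of the idea, the block below counts (start, end) station pairs and then counts how many of the top routes are round trips. The function names `top_routes` and `round_trip_count` are hypothetical; the two route lists are copied from the slide, with the station spelling "Steuart" normalized.

```python
from collections import Counter


def top_routes(trips, n=5):
    """Return the n most common (start, end) station pairs with their ride counts."""
    return Counter(trips).most_common(n)


def round_trip_count(routes):
    """Count routes that end at the station where they started."""
    return sum(1 for (start, end), _rides in routes if start == end)


# Top 5 routes per rider type, as listed on the slide.
member_top = [
    (("Market St. at Steuart St.", "Barry St. at 4th St."), 811),
    (("North Point St. at Polk St.", "Market St. at Steuart St."), 811),
    (("Bay Pl. at Vernon St.", "19th St. BART Station"), 773),
    (("Market St. at 10th St.", "Market St. at 10th St."), 738),
    (("The Embarcadero St. at Sansome", "Market St. at Steuart St."), 738),
]
casual_top = [
    (("Mason St. at Halleck", "Mason St. at Halleck"), 1763),
    (("Lincoln Blvd. at Graham St.", "Lincoln Blvd. at Graham St."), 1176),
    (("Pier 1/2 at The Embarcadero", "North Point St. at Powell St."), 876),
    (("Fell St. at Stanyan St.", "Fell St. at Stanyan St."), 875),
    (("North Point St. at Polk St.", "North Point St. at Polk St."), 784),
]
print(round_trip_count(member_top), round_trip_count(casual_top))  # 1 4

# Tiny made-up trip list showing how the ranking itself would be built.
demo = top_routes([("A", "B"), ("A", "A"), ("A", "B")], n=2)
print(demo)  # [(('A', 'B'), 2), (('A', 'A'), 1)]
```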
11. Summary of Analysis
Top start stations
Member riders
Start Station Name | Total Rides
Market St. at Steuart St. | 18,327
Market St. at 10th St. | 18,014
Powell St. BART Station (Market St. at 4th St.) | 14,521
San Francisco Caltrain (Townsend St. at 4th St.) | 13,991
Montgomery St. BART Station (Market St. at 2nd St.) | 12,853
Casual riders
Start Station Name | Total Rides
Market St. at Steuart St. | 8,060
San Francisco Ferry Building (Harry Bridges Plaza) | 7,008
Pier 1/2 at The Embarcadero | 6,775
Powell St. BART Station (Market St. at 4th St.) | 6,068
North Point St. at Polk St. | 6,053
While the focus is on differences, I found a similarity in start stations for casual and member riders. A look at the top 5 start stations for each rider type showed that Market St. at Steuart St. and Powell St. BART Station (Market St. at 4th St.) are popular among both. These may be stations that many commuters depart from, which would make them better targets for advertising.
Market St. at Steuart St. and Powell St. BART Station (Market St. at 4th St.) are the two stations that are most common for both casual and member riders. It may be worth targeting marketing in this area. Some users may be in town for work and biking into the site from these stations. People who live in the area may also have used these stations casually; advertisements around these stations may help convert casual riders to member riders.
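The overlap between the two top-5 lists can be checked mechanically. The Python sketch below is only an illustration (the original work was done in R): it uses the station names and ride counts from the slide, with the spelling "Steuart" normalized, and the variable names are my own.

```python
# Top 5 start stations per rider type, as listed on the slide.
member_top = {
    "Market St. at Steuart St.": 18327,
    "Market St. at 10th St.": 18014,
    "Powell St. BART Station (Market St. at 4th St.)": 14521,
    "San Francisco Caltrain (Townsend St. at 4th St.)": 13991,
    "Montgomery St. BART Station (Market St. at 2nd St.)": 12853,
}
casual_top = {
    "Market St. at Steuart St.": 8060,
    "San Francisco Ferry Building (Harry Bridges Plaza)": 7008,
    "Pier 1/2 at The Embarcadero": 6775,
    "Powell St. BART Station (Market St. at 4th St.)": 6068,
    "North Point St. at Polk St.": 6053,
}

# Stations that appear in both lists, with member and casual rides combined.
shared = sorted(member_top.keys() & casual_top.keys())
combined = {name: member_top[name] + casual_top[name] for name in shared}
for name, rides in combined.items():
    print(name, rides)
```

This prints the two shared stations with 26387 and 20589 combined rides respectively.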
12. Visualizations
Ridership for members is highest from the start of the business week through midweek. It drops significantly on Friday and Saturday and then picks up again on Sunday. This may be because members use bikes to commute during the week. Members may be less inclined to ride on Friday and Saturday because that is when their routine shifts away from work, and the increase on Sunday may be more for leisure.
Casual riders show a more even distribution of bike checkouts from day to day, with the highest number of checkouts on Thursday, Friday and Saturday. Interestingly, on Monday, Tuesday and Thursday, the pattern of casual checkouts is very similar to that of members on those days. This indicates that casual riders on those days may be using the bikes more like members do, to commute.
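The chart behind these notes is a count of checkouts per weekday for each rider type. The original was built in R; a minimal Python sketch of that count, assuming `started_at` has already been converted to a datetime, could look like this. The function name `rides_by_weekday` and the sample trips are hypothetical.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def rides_by_weekday(trips):
    """Count rides per (rider type, weekday name)."""
    counts = Counter()
    for rider_type, started_at in trips:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[(rider_type, DAY_NAMES[started_at.weekday()])] += 1
    return counts


# Made-up trips: May 1st, 2023 was a Monday and May 6th a Saturday.
sample_trips = [
    ("member", datetime(2023, 5, 1, 8, 30)),
    ("member", datetime(2023, 5, 1, 17, 5)),
    ("casual", datetime(2023, 5, 6, 13, 0)),
]
weekday_counts = rides_by_weekday(sample_trips)
print(weekday_counts[("member", "Monday")], weekday_counts[("casual", "Saturday")])  # 2 1
```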
13. Visualizations
Ridership for members and casual riders shows a similar hourly trend, which makes sense because people are less likely to check bikes out during the late hours of the night.
Member riders begin checking out more bikes before work starts, with a peak between 8 and 9am. Following rush hour, member ridership drops and evens out until the post-work rush hour at 4pm, when a significantly larger number of bikes are checked out until 6pm. This trend indicates that members are largely riding as part of a commute and that, between the ride to work and the ride home, they prefer to commute home by bike.
Casual riders don’t show as large an uptick in checkouts as members during the morning rush hour; however, ridership does increase around that time. Interestingly, casual ridership spikes during the afternoon commute in the same proportion as member ridership, which indicates that casual riders who check out bikes around that time are commuting home or back to their hotel after work.
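Because there are far more member rides than casual rides, comparing the two groups "in proportion" means looking at each hour's share of a group's own total rather than raw counts. The Python sketch below illustrates that normalization; it is not the original R code, and the function name `hourly_share` and the sample trips are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_share(trips):
    """For each rider type, return the fraction of its rides starting in each hour."""
    counts = defaultdict(Counter)
    for rider_type, started_at in trips:
        counts[rider_type][started_at.hour] += 1
    shares = {}
    for rider_type, by_hour in counts.items():
        total = sum(by_hour.values())
        shares[rider_type] = {hour: n / total for hour, n in sorted(by_hour.items())}
    return shares


# Made-up trips: four member rides and two casual rides.
sample_trips = [
    ("member", datetime(2023, 5, 1, 8, 15)),
    ("member", datetime(2023, 5, 1, 12, 0)),
    ("member", datetime(2023, 5, 1, 17, 0)),
    ("member", datetime(2023, 5, 1, 17, 30)),
    ("casual", datetime(2023, 5, 1, 11, 0)),
    ("casual", datetime(2023, 5, 1, 17, 10)),
]
shares = hourly_share(sample_trips)
print(shares["member"][17], shares["casual"][17])  # 0.5 0.5
```

In this toy example the raw 5pm counts differ (2 versus 1), but the shares are equal, which is the sense in which two groups can spike "in the same proportion".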
14. Visualizations
The rate of bike checkouts is proportionally similar between member and casual riders throughout the year. April through October are the warmest months in San Francisco and the uptick in bike checkouts reflects that; these may be the best months for marketing campaigns.
15. Recommendations
• Implement a strategic marketing campaign spanning from April to October to capture the
peak ridership seasons, targeting the transition of casual riders into loyal members.
• Highlight the practical utility of bike usage, particularly during the afternoon rush hour.
Emphasize biking as a means of unwinding after work, aligning with the observed surge in
checkouts during this time.
• Focus marketing efforts on stations like Market St. at Steuart St. and Powell St. BART
Station (Market St. at 4th St.), the most frequent start stations for both casual and member
riders. These locations, often used by commuters or locals, present a prime opportunity for
targeted advertising. Encouraging riders to shift from casual use to committed membership
could significantly benefit from localized marketing strategies around these stations.
Roll out marketing campaigns from April to October.
There are many reasons to use a bike, such as leisure, but it seems even casual riders largely use the bikes for the practical purpose of commuting. Since a larger number of checkouts happen during the afternoon rush hour, the marketing message may appeal to people who want to unwind after work.
16. Appendix
Links to Published Work
• This project can be found and checked on my Kaggle profile
• Supporting scripts can be accessed on my GitHub account
• This presentation is also on my LinkedIn page and my SlideShare page
• If you like my work please consider sharing it or connecting with me on
LinkedIn
• I am currently looking for a Software Product Manager or Data Analyst
position
17. Appendix
Kudos / Citations
• Data set: Motivate International Inc. Bay Wheels, Lyft Bikes and Scooters LLC. 2022-2023. Bay Wheels Trip Data. Retrieved from https://s3.amazonaws.com/baywheels-data/index.html
• License: Bay Wheels, Lyft Bikes and Scooters LLC. (2023). Data License Agreement. Retrieved from https://baywheels-assets.s3.amazonaws.com/data-license-agreement.html
• Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org
• Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.7. Retrieved from https://CRAN.R-project.org/package=dplyr
18. Appendix
Kudos / Citations
• Xie, Y. (2021). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.33. Retrieved from https://CRAN.R-project.org/package=knitr
• Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. Retrieved from https://doi.org/10.21105/joss.01686
• Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1-25. Retrieved from https://doi.org/10.18637/jss.v040.i03