B-Cycle Report
Brett Keim
November 24, 2016
Abstract
With bike rentals in Omaha gaining popularity each year from 2013 to 2016, it is important
for the Heartland B-Cycle team to understand the patterns behind this growth. This research report
reveals some of those patterns and provides data that will aid researchers in planning for future
bike rental demand. The report was also created to find inefficiencies in the current system and
to explain why certain phenomena occurred. Conclusions about the bike rentals are
provided within each subtopic of discussion, followed by a closing future works section.
Contents
1 About the Data
2 Bike Rebalancing
2.1 Station Distance Matrices
2.1.1 Driving Distances
2.1.2 Biking Distances
2.2 Station Counts
2.2.1 Initial Station Counts
2.2.2 Station Counts on any Date/Time
3 Data Analysis
3.1 General Information
3.2 Checkouts vs Returns
3.3 Day of the Week and Time of day Analysis
3.3.1 Day of the Week
3.3.2 Time of the Day
3.4 Distribution of Membership Types
3.5 Station Pairs Usage
3.6 Bus Route Interaction
3.7 Bike Histories
3.8 Station Histories
3.8.1 Time a station is full and empty (or near)
3.8.2 Why did a station’s full or empty count change?
3.8.3 What about the busy months?
4 Forecasting
4.1 Discovery
4.2 Low Count Histogram
5 Future Works
A Appendix
A.1 Stations Table
A.2 Driving Distances Matrix Code
A.3 Biking Distances Matrix Code
A.4 Station Bike Count Code
A.5 Station Counts at any Date/Time Code
A.6 Summary and Describe Output from Section 3.1
A.7 Code for Section 3.2
A.8 Code for Section 3.3
A.9 Code for Section 3.4
A.10 Line Frequency Plots
A.11 Facetted Line Frequency Plots
A.12 Code used for Top 10 Tables
A.13 Code used in Section 3.6
A.14 Bike Histories Code
A.15 Bike Jumps
A.16 Station Histories
A.17 Station Histories (cont.)
A.18 Code for full/empty changes
A.19 Busy Months Full/Empty Tables
A.20 Bike Usage at Stations according to Duration
A.21 Histogram Code
A.22 Individual Frequency Tables for Max Bike Demand
A.23 Individual Histograms for Max Bike Demand
List of Figures
1 Checkouts vs Returns per each Kiosk Station
2 Difference between Checkouts and Returns for each station over the lifetime of the data set.
3 Usage per day of the week with overall percentage
4 Usage throughout the day with hour time splits
5 Time grouping usage distribution
6 Usage plot according to time of day and grouped by day of the week.
7 Usage according to Membership Type
8 Time of day usage plot split by day of the week and facetted by membership type
9 Station Pairs usage conveyed through a line frequency plot.
10 Station pairs line frequency plot of downtown Omaha
11 Station pairs line frequency plot of downtown Omaha facetted by membership type.
12 Popular bus routes plotted against station pairs line frequency plot.
13 Downtown Omaha station pairs line frequency plot with popular bus routes shown.
14 Number of Instances Maintenance did (or did not) cause a station to go from full to not full
15 Times Maintenance did (or did not) cause a station to go from empty (or near) to not empty (or near)
16 Bike Usage for each Return Station excluding Bob Kerrey Pedestrian Bridge
17 Bike Usage for each Checkout Station excluding Bob Kerrey Pedestrian Bridge
18 Frequency of Max Demands for all stations compiled
19 Minimum Counts per Day Split by Station
List of Tables
1 Station Driving Distances (in meters)
2 Station Biking Distances (in meters)
3 Bike Station count on 2015-07-20 21:00:00
4 Station Counts on 2015-07-20 21:00:00
5 Station Checkouts and Returns
6 Stations with more Checkouts than Returns
7 Stations with more Returns than Checkouts
8 Top 10 Station Pairs with Membership Type
9 Top 10 Station Pairs with Membership Type excluding round trips
10 Sample of a Bike’s History
11 Sample of Jumps from bikes 951 and 942
12 Sample of Bob Kerrey Pedestrian Bridge Station History
13 Stations Time being Full and Empty (or near) (in mins) with Percentage
14 Full changes according to UserRole
15 Empty (or near empty) changes according to UserRole
16 Summer Stats for Full/Empty/or near for all Stations
17 Bike Usage Distribution at each Checkout Station
18 Bike Usage Distribution at each Return Station
1 About the Data
The data used in this research report was gathered through the Heartland B-Cycle Administration website.
The site stores data in several locations, but the primary set used in this report
is the trips data. This data set is continuously updated as trips come in, so to obtain a
closed set a snapshot was taken from the website. The snapshot has a starting date of 2014-01-11 and a
final date of 2016-08-31. It contains 39,414 rows and 28 columns:
• TripID
• UserProgramName
• User Id
• UserRole
• UserCity
• UserState
• UserZip
• UserCountry
• MembershipType
• Bike
• BikeType
• CheckoutKioskName
• ReturnKioskName
• DurationMins
• AdjustedDurationMins
• UsageFee
• AdjustmentsFlag
• Distance
• EstimatedCarbonOffset
• EstimatedCaloriesBurned
• CheckoutDateLocal
• ReturnDateLocal
• CheckoutTimeLocal
• ReturnTimeLocal
• TripOver30Mins
• LocalProgramFlag
• TripRouteCategory
• TrpProgramName
These original columns were missing some key information, so the following columns were added:
• DOWC: Day of the week for the checkout entry.
• DOWR: Day of the week for the return entry.
• Checkout date time: Combines the checkout date and time into one entry.
• Return date time: Combines the return date and time into one entry.
The data set needed some cleaning and data type changes to make it easier to manipulate.
I first changed CheckoutDateLocal and ReturnDateLocal from factor fields to date fields. I left the
Checkout date time and Return date time fields as "POSIXct" "POSIXt" classes, which allow easy
sorting, subsetting, and elimination of rows in the data set based on their date and time stamps. It is also
worth mentioning that all of the time stamps were converted to 24-hour (military) time for accuracy and readability.
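The preparation steps above can be sketched in base R as follows. This is only an illustration on a toy data frame: the input date format (`%m/%d/%Y`), the 24-hour time strings, the `UTC` time zone, and the underscore spelling `Checkout_date_time` are assumptions made for the example, not details taken from the actual export or from the report's code.

```r
# Toy stand-in for the trips export; formats and values are assumed.
trips <- data.frame(
  CheckoutDateLocal = factor(c("07/20/2015", "07/21/2015")),
  CheckoutTimeLocal = c("21:15", "08:05"),
  stringsAsFactors = FALSE
)

# Combine date and time into a single POSIXct stamp.
trips$Checkout_date_time <- as.POSIXct(
  paste(trips$CheckoutDateLocal, trips$CheckoutTimeLocal),
  format = "%m/%d/%Y %H:%M", tz = "UTC"
)

# Convert the factor date column into a Date column.
trips$CheckoutDateLocal <- as.Date(as.character(trips$CheckoutDateLocal),
                                   format = "%m/%d/%Y")

# DOWC: day of the week for the checkout entry (names follow the locale).
trips$DOWC <- weekdays(trips$CheckoutDateLocal)

# POSIXct stamps sort and subset directly.
trips <- trips[order(trips$Checkout_date_time), ]
cutoff <- as.POSIXct("2015-07-21 00:00:00", tz = "UTC")
before_cutoff <- trips[trips$Checkout_date_time < cutoff, ]

class(trips$Checkout_date_time)  # "POSIXct" "POSIXt"
```

The same pattern would apply to the return columns (DOWR and the combined return stamp).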
2 Bike Rebalancing
This report reflects only part of the work of the Heartland B-Cycle research team. It covers the data
analysis side, whereas the other group of researchers worked on finding the most efficient way to
rebalance the distribution of bikes among the stations. Ideally the stations would be rebalanced every day,
but in a real-world situation with real costs this is not feasible. The team is working on a method that
would ensure the stations are rebalanced every week. To achieve their goal they needed
certain information about the data and the system as a whole, which is where our work intersects.
2.1 Station Distance Matrices
The first item the bike rebalancing team asked for was the distance from station to station for the 31 original
stations. The most efficient way to obtain these was the Google API, which is essentially the programmatic
equivalent of typing each pair of station addresses into google.com/maps. To do this efficiently, a function
was written in R that afforded me all of the same options as the Google Maps site. Those options include
transportation type (bus, walking, driving, biking) as well as output type (distance, time). The function was
coded to identify five different types of errors (invalid request, unknown error, max elements exceeded, over
query limit, request denied), each of which points the user to a specific problem to address. The function needed
the address of each station, specifically a four-digit street address and not just a two-street intersection.
The intersections in the station list provided by Heartland B-Cycle’s associates were converted to such addresses. These
addresses were then converted into latitude and longitude coordinates and stored in a separate data
frame for future reference; see A.1. The function and addresses were then run in RStudio to yield two
different results.
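As a rough illustration of the kind of function described above (the report's actual code is in A.2 and may differ), the sketch below builds a request URL for Google's Distance Matrix web service and translates the five error statuses into messages. The helper names `build_distance_url` and `explain_status`, the placeholder key, and the message wording are hypothetical. The endpoint, parameter names, and status codes are written as I understand the public web service and should be checked against its documentation. No request is actually sent.

```r
# Build a Distance Matrix request URL (nothing is sent over the network).
# `api_key` is a placeholder; both helper names are hypothetical.
build_distance_url <- function(origin, destination,
                               mode = c("driving", "walking",
                                        "bicycling", "transit"),
                               api_key = "YOUR_KEY") {
  mode <- match.arg(mode)
  paste0(
    "https://maps.googleapis.com/maps/api/distancematrix/json",
    "?origins=", utils::URLencode(origin, reserved = TRUE),
    "&destinations=", utils::URLencode(destination, reserved = TRUE),
    "&mode=", mode,
    "&key=", api_key
  )
}

# Map the five error statuses named in the text to a hint for the user.
explain_status <- function(status) {
  messages <- c(
    INVALID_REQUEST       = "Malformed request; check the addresses and mode.",
    UNKNOWN_ERROR         = "Server-side error; retrying may succeed.",
    MAX_ELEMENTS_EXCEEDED = "Too many origin-destination pairs in one request.",
    OVER_QUERY_LIMIT      = "Query quota exceeded; wait or slow down requests.",
    REQUEST_DENIED        = "Request refused; check the API key."
  )
  if (status == "OK") return("OK")
  if (status %in% names(messages)) return(unname(messages[status]))
  paste("Unrecognized status:", status)
}

url <- build_distance_url("10th & Farnam, Omaha, NE",
                          "10th & Cass, Omaha, NE", mode = "bicycling")
explain_status("OVER_QUERY_LIMIT")
```

Looping such a call over every ordered pair of the 31 stations would fill one cell of the matrix per response.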
2.1.1 Driving Distances
The first distance matrix calculated was the driving distance between all of the stations. Although
users ride bikes between stations, the driving distances are still worth finding
because the bike balancers will be driving among the stations to collect bikes. The final distance
matrix is shown in Table 1 and the code for obtaining it is located in A.2. To keep the
matrix readable, the kiosk names were replaced with index values. To identify a station from its index, please
see the stations table in A.1.
Kiosk 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 0 523 730 1188 1506 1440 617 1174 1509 1488 5083 6598 1581 4216 5975 6764 12086 12247 14490 2149 1774 1305 13885 5083 11110 7791 2149 2924 7365 1441 4677
2 920 0 420 878 983 916 1096 1526 986 964 4773 6413 1058 4695 5671 6579 12565 12725 14969 1631 2141 995 14363 4773 11589 7482 1631 2739 7055 1808 4367
3 1406 940 0 464 786 727 1366 1795 789 999 5253 6216 1327 4965 5940 6849 12834 12995 8450 1915 1944 796 14633 5253 11859 7961 1915 2569 7535 1611 4847
4 1213 747 473 0 340 964 1607 2037 558 1241 5059 5683 1569 5206 6182 7091 11246 11406 7917 1711 820 331 13044 5059 12101 7768 1711 2811 7341 487 4653
5 1310 844 804 554 0 625 1269 1699 219 903 5157 5481 1231 4868 5844 6753 7679 7840 7716 1819 1375 872 8265 5157 11762 7865 1819 2472 7438 1041 4751
6 1125 658 618 1076 516 0 1084 1513 519 729 4971 5781 1045 4683 5658 6567 12552 12713 8015 1633 1674 1193 14351 4971 11577 7679 1633 2299 7253 1341 4565
7 705 1161 1374 1610 1271 1205 0 436 1275 1253 5379 6687 1347 2810 5237 6853 12175 12335 14579 2162 2430 1927 13973 5379 11199 8088 2162 3013 7661 2096 4973
8 1133 1569 1775 2024 1686 2111 417 0 1689 1942 5794 6797 1761 2374 4801 6920 12042 12203 9031 2123 2844 2342 13841 5794 10072 8502 2123 3103 8076 2511 5388
9 1755 1288 1248 780 679 630 1714 2276 0 679 5601 5262 1008 4677 5620 6529 7460 7620 7496 2263 1597 1327 8046 5601 11572 8310 2263 2249 7883 1264 5195
10 1294 827 787 1246 685 169 1253 1682 688 0 5140 5667 547 4216 5159 6068 11985 12145 7901 1802 1844 1362 13783 5140 11111 7849 1802 2019 7422 1510 4734
11 5493 5027 5233 5691 6009 5943 5853 6286 6013 5991 0 14491 6085 9376 11015 11924 13032 13192 15436 6204 7909 5808 14830 0 16270 9689 6204 8083 9296 7576 3753
12 7862 6644 6604 5868 5763 5986 7146 6729 5541 5817 11411 0 5702 7444 1999 1863 2881 2621 2234 8852 6219 5949 2784 11411 14338 14120 8852 3787 13693 5885 11005
13 1855 1389 1349 1807 1246 731 1814 2162 1250 561 7637 5355 0 3670 4613 5521 7553 7713 7589 2364 2392 1924 8085 7637 10564 10346 2364 1681 9919 2058 7231
14 3519 4882 5095 5331 4992 4548 2802 2386 4996 4379 9149 7241 4203 0 5429 7407 12736 12896 15140 4509 6151 5648 14534 9149 7702 11857 4509 3567 11431 5817 8743
15 5949 6344 5893 6793 6455 5275 5232 4815 6458 5106 10611 2058 4991 5463 0 1866 4569 4729 3763 6939 6873 6468 4313 10611 12378 13320 6939 3358 12893 6540 10205
16 7308 6842 6802 7260 6699 6184 7544 7128 6703 6014 11609 2108 5900 7642 1885 0 4139 3117 2520 7817 7782 7377 3070 11609 14536 14318 7817 4267 13891 7448 11203
17 15360 15086 15299 8194 15197 14752 15066 14649 15200 14583 16960 3018 14407 15304 4560 3236 0 466 736 16071 8544 8275 532 16960 22198 21432 16071 6112 21039 8211 18326
18 13480 13206 13419 8237 8131 12872 13186 12769 7909 12703 15080 2616 12527 13424 4489 3105 365 0 605 14191 8587 8317 400 15080 20318 19552 14191 6155 19159 8254 16446
19 10292 8807 8766 8298 8193 8148 9576 9159 7970 7979 15686 2430 7864 14031 3831 2507 966 605 0 11282 8649 8379 557 15686 16920 20159 11282 6216 19765 8315 17052
20 1510 1258 1241 1699 2240 1967 2182 2125 2244 2222 5827 8922 2316 4499 6926 7837 13606 13767 16011 0 2285 1816 15405 5827 12722 8536 0 3810 8109 1952 5421
21 1512 1046 772 539 1143 1498 2137 2573 1097 1771 5358 6222 2104 5736 6716 7625 11582 11743 13987 2010 0 234 13381 5358 12630 8067 2010 3340 7640 370 4952
22 1353 886 612 379 656 1339 1978 2414 876 1611 5199 6062 1944 5577 6557 7466 11375 11535 13779 1851 496 0 13173 5199 12471 7908 1851 3181 7481 162 4793
23 15187 14913 15125 8848 8742 14579 14892 14476 8520 14409 16786 2980 14234 15131 4381 3057 765 400 557 15897 9198 8929 0 16786 22025 21259 15897 6766 20865 8865 18152
24 5493 5027 5233 5691 6009 5943 5853 6286 6013 5991 0 14491 6085 9376 11015 11924 13032 13192 15436 6204 7909 5808 14830 0 16270 9689 6204 8083 9296 7576 3753
25 11942 11668 11881 12117 11779 11334 10596 10179 11782 11165 21591 14027 10989 7824 12238 14193 19522 19682 21926 12721 12937 12435 21320 21591 0 18501 12721 10353 18075 12604 21185
26 7411 6945 7151 7610 7928 7861 7771 8205 7931 7909 9855 13676 8003 11294 12933 13842 18708 18869 21112 8122 8196 7726 20507 9855 18896 0 8122 10001 642 7862 7159
27 1510 1258 1241 1699 2240 1967 2182 2125 2244 2222 5827 8922 2316 4499 6926 7837 13606 13767 16011 0 2285 1816 15405 5827 12722 8536 0 3810 8109 1952 5421
28 3762 3296 3256 2803 2465 2638 3858 3441 2468 2468 8063 3528 2354 4096 3358 4267 5726 5887 5762 4271 3621 3352 6258 8063 10990 10772 4271 0 10345 3288 7657
29 7436 6970 7176 7634 7952 7886 7796 8229 7956 7934 9399 13700 8027 11318 12958 13867 18252 18412 20656 8147 8221 7751 20050 9399 19072 460 8147 10026 0 7887 6672
30 1461 994 720 487 826 1454 2093 2523 1045 1727 5307 6168 2055 5692 6668 7576 11266 11426 13670 1959 387 182 13064 5307 12586 8015 1959 3296 7589 0 4901
31 4881 4415 4621 5079 5397 5331 5241 5675 5401 5379 3384 11145 5473 8764 10403 11312 14936 15097 17340 5592 5666 5196 16735 3384 15658 6991 5592 7471 6646 5332 0
Table 1: Station Driving Distances (in meters)
2.1.2 Biking Distances
The other distance matrix calculated was the biking distance between all of the stations. These distances
give us vital information about user travel and are especially useful for identifying where
new stations might need to be built according to the usage of each pair of stations. This matrix also lets a person
know how far they would have to travel from station to station using the system. The matrix is shown in
Table 2 and the code for obtaining the matrix is located in A.3.
Kiosk 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 0 523 730 1188 1295 999 615 1174 1516 1169 5622 11254 1479 3519 5835 7982 9805 9920 9364 2129 1599 1506 10015 5622 10670 10679 2129 3299 8804 1404 3637
2 792 0 265 724 831 580 1094 1582 1051 749 5173 10805 1058 3859 6174 8321 9340 9455 8899 1407 1135 1041 9550 5173 11010 10230 1407 2834 8355 940 3188
3 743 276 0 464 568 727 1366 1853 789 999 5780 11413 1325 4130 5909 8056 9075 9190 8634 1964 872 777 9285 5780 11281 10838 1964 2569 8963 677 3796
4 1213 747 473 0 339 965 2042 2092 558 1238 6023 11655 1564 4369 6148 8295 8707 8867 8873 2207 820 316 9524 6023 11519 11080 2207 2807 9205 430 4038
5 1179 842 568 342 0 626 1705 1755 220 901 6128 11760 1227 4032 5811 7957 8369 8530 8536 2311 1314 655 9187 6128 11182 11185 2311 2470 9310 775 4143
6 995 658 618 854 517 0 1301 1351 519 169 5399 11032 1055 3628 5639 7786 8805 8920 8364 1633 1487 1172 9015 5399 10778 10456 1633 2299 8582 1292 3415
7 615 1141 1347 1610 1272 868 0 488 1275 1038 5635 11267 1348 2765 5220 7367 9560 9676 9238 2142 2420 1927 9889 5635 9916 10692 2142 3054 8817 2021 3650
8 1133 1569 1775 2024 1687 1283 417 0 1689 1452 5596 11228 1383 2345 4976 7123 9609 9591 8994 2103 2835 2342 9645 5596 9496 10653 2103 3103 8778 2450 3611
9 1404 1063 789 780 339 408 1482 1532 0 463 6347 11979 986 3820 5589 7736 8149 8310 8314 2530 1535 1096 8965 6347 10960 11404 2530 2249 9529 1211 4362
10 1164 827 787 1023 686 169 1244 1294 688 0 5569 11201 776 3571 5360 7507 8525 8641 8085 1802 1657 1341 8736 5569 10722 10626 1802 2019 8751 1462 3584
11 5709 5355 5286 5744 6128 6286 5857 5641 6348 6559 0 8035 6170 7986 11469 13615 14634 14750 14194 3767 6018 6251 14844 0 16412 7457 3767 8128 7343 5960 2113
12 9848 9494 9424 9883 10266 10424 9996 9780 10487 10697 8065 0 10308 12125 15607 17754 22207 18888 18332 7905 10156 10390 18983 8065 20550 578 7905 12267 1037 10099 7598
13 1477 1389 1349 1583 1241 731 1331 1381 1243 561 7130 12763 0 2809 5063 7210 8229 8344 7788 3314 2319 1896 8439 7130 10882 12188 3314 1722 10313 2015 5146
14 3380 3920 4127 4394 4052 3542 2765 2386 4055 3373 7981 13613 2834 0 5488 7417 10337 9972 9375 4489 5180 4708 10025 7981 8070 13038 4489 3729 11164 4801 5996
15 5833 6367 6100 6110 5768 5719 5218 4798 5771 5569 10394 16026 5246 5464 0 2147 4980 4615 4018 6901 6847 6424 4669 10394 10679 15451 6901 3340 13576 6542 8409
16 7874 8408 8142 8152 7810 7760 7259 6839 7813 7611 12435 18067 7288 7377 2042 0 3482 3117 2520 8942 8889 8465 3183 12435 13051 17492 8942 5382 15617 8584 10450
17 10561 9701 9427 8958 9095 9045 9946 9526 8632 8895 15122 20754 8780 10064 4729 3236 0 466 736 11629 9314 9044 641 15122 15737 21637 11629 6880 21523 9388 13137
18 10429 9234 8960 8491 8628 8578 9814 9394 8165 8428 14517 20149 8313 9932 4597 3105 365 0 605 10701 8847 8577 509 14517 15606 19574 10701 6413 17700 8921 12532
19 9832 9435 9161 8692 8829 8779 9217 8797 8366 8629 14393 20025 8514 9335 4000 2507 970 605 0 10900 9048 8778 743 14393 15008 19450 10900 7340 17575 9122 12408
20 1546 1539 1469 1928 2311 2469 2365 2148 2532 2742 3767 9399 2354 4494 7652 9799 10818 10933 10377 0 2201 2435 11028 3767 13137 8824 0 4312 6949 2144 1782
21 1512 1046 894 549 889 1516 2139 2573 1109 1789 6018 11650 2115 4918 6699 8846 9258 9419 9424 2201 0 234 10075 6018 12069 11075 2201 3358 9200 755 4033
22 1353 886 612 316 655 1283 1980 2414 875 1555 6252 11885 1881 4759 6465 8612 9024 9185 9190 2436 694 0 9841 6252 11909 11310 2436 3125 9435 744 4268
23 10501 9631 9357 8888 9025 8975 9886 9466 8562 8825 15062 20694 8710 10004 4669 3183 874 509 666 11569 9244 8974 0 15062 15677 20119 11569 8009 18244 9318 13077
24 5709 5355 5286 5744 6128 6286 5857 5641 6348 6559 0 8035 6170 7986 11469 13615 14634 14750 14194 3767 6018 6251 14844 0 16412 7457 3767 8128 7343 5960 2113
25 10530 11070 11277 11519 11182 10778 9915 9495 11184 10708 16332 21964 10169 7994 11778 13707 16627 16262 15665 11598 12330 11836 16316 16332 0 21389 11598 10376 19514 11951 14347
26 9272 8918 8849 9307 9691 9849 9420 9204 9911 10122 7489 578 9733 11549 15032 17179 21631 18313 17757 7330 9581 9814 18408 7489 19975 0 7330 11691 460 9523 7022
27 1546 1539 1469 1928 2311 2469 2365 2148 2532 2742 3767 9399 2354 4494 7652 9799 10818 10933 10377 0 2201 2435 11028 3767 13137 8824 0 4312 6949 2144 1782
28 3422 3084 2810 2820 2478 2428 3525 3537 2480 2279 8368 14000 2164 4185 3340 5487 6506 6621 6065 4551 3556 3133 6716 8368 10458 13425 4551 0 11550 3252 6383
29 8876 8521 8452 8910 9294 9452 9024 8807 9514 9725 7294 1038 9336 11153 14635 16782 17801 17916 17360 6933 9184 9418 18011 7294 19578 460 6933 11294 0 9126 6827
30 1417 951 677 444 775 1403 2044 2478 995 1676 5960 11592 2002 4823 6586 8733 10526 10687 9311 2144 1423 535 11321 5960 11974 11017 2144 3246 9142 0 3975
31 3724 3370 3301 3759 4143 4301 3873 3656 4363 4574 2113 7568 4185 6001 9484 11631 12649 12765 12209 1782 4033 4266 12860 2113 14427 6991 1782 6143 6876 3975 0
Table 2: Station Biking Distances (in meters)
2.2 Station Counts
For the bike rebalancing team to develop a model or system for balancing the bike distribution, they
needed to know how many bikes were at a station at a given time. They wanted to be able to
type in any date in the system’s life and know exactly how many bikes were at each station at
that date/time. The request sounds simple: enter a date and count the bikes
at each station. It proved to be a much more complex task, both because there is no data on the number of
bikes at each station at the start of the data set and because more bikes and stations were added
as the system gained usage. My initial idea was to
start at the date the user wanted and "crawl" through the data set, counting each unique bike’s first entry in
the system and tallying it at its respective station. However, this idea was problematic given
the structure of the data and how much it changed throughout the life of the system. Instead I split the
data set into two parts: the entries before the user date and the entries after the user date. I only needed
the entries before the user date because those show where the bikes were up to the user date. The initial
idea of crawling through the data set was then used, but instead of starting at the user date and moving
forward in time I started at the user date and crawled backwards. When a bike first came up, its station was
identified and gained a tally toward its total bikes. This process was repeated until every bike ID had been identified in
the system. This method eliminated the need to account for new stations or new bikes being added to the system.
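The backward crawl can be sketched in a few lines of base R. This is a minimal illustration on invented data, not the code from A.4: the function name `station_counts_at` and the column spelling `Return_date_time` are assumptions. It also simplifies by treating a bike's most recent return before the cutoff as its current location, even if that bike happens to be checked out at the requested moment.

```r
# Toy trips table; column names mirror the trips data, values are invented.
trips <- data.frame(
  Bike = c(101, 102, 101, 103, 102),
  ReturnKioskName = c("10th & Cass", "10th & Dodge", "10th & Dodge",
                      "10th & Cass", "Shop"),
  Return_date_time = as.POSIXct(
    c("2015-07-18 10:00:00", "2015-07-19 12:30:00", "2015-07-20 09:00:00",
      "2015-07-20 20:45:00", "2015-07-21 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

station_counts_at <- function(trips, when) {
  # Keep only the entries at or before the requested date/time.
  before <- trips[trips$Return_date_time <= when, ]
  # Newest first, so the first row met for a bike is its latest return.
  before <- before[order(before$Return_date_time, decreasing = TRUE), ]
  # Crawl backwards: keep only the first (most recent) row per bike.
  latest <- before[!duplicated(before$Bike), ]
  # Tally bikes per station.
  table(latest$ReturnKioskName)
}

counts <- station_counts_at(
  trips, as.POSIXct("2015-07-20 21:00:00", tz = "UTC"))
counts
sum(counts)  # one tally per distinct bike seen up to the cutoff
```

Because only the latest row per bike is kept, stations and bikes that join the system later need no special handling: they simply do not appear before their first entry.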
2.2.1 Initial Station Counts
It was important to look at the initial bike counts at the respective stations, so I chose an
arbitrary date at which to get the station counts. The date I chose was 2015-07-20, and I used the approach
described in 2.2 to obtain Table 3. To see the code used to create the table please see A.4.
KioskName Count
10th & Cass 3
10th & Dodge 4
10th & Farnam 7
11th & Jackson 13
13th & Howard 3
14th & Douglas 7
14th & Fahey 3
1516 Cuming St 8
15th & Farnam 0
15th & Howard 3
16th & Douglas 4
1819 Farnam 2
1st & Broadway 3
20th & Dodge 2
24th & Lake 2
50th & Underwood 2
62nd & Dodge 1
66th & Center 2
67th & Frances 6
67th & Pine 4
711 S Main St 0
7th & Jones 6
9th & Jones 1
Aksarben Drive 8
Ameristar 4
Ben’s House 0
Bike Share HQ 0
Bike Union 0
Bob Kerrey Pedestrian Bridge 11
Broadway & Main 6
College of Saint Mary 0
Lewis & Clark Landing 6
Midtown Crossing: 32nd & Farnam 5
Pearl St & Willow Ave 2
Shop 2
The Durham Museum 5
Tip Top Building 0
Tom Hanafan River’s Edge Park 1
dead station 14
Table 3: Bike Station count on 2015-07-20 21:00:00
Looking at Table 3, a couple of features stand out. The first is the number
of bikes (14) at the dead station. The second is a sanity check: the counts should add up to 150 because
there are 150 unique bikes in the system, and in this case they do.
2.2.2 Station Counts on any Date/Time
I have shown that we can find the station counts at a single date, but the more important question was whether
this process could be repeated easily for different dates and times. The bike rebalancing team needed a way to check
the station counts at any given date as well as at any time. Their motivation is to better understand
what each station looks like at the end of a day, which would help them identify which stations have too
many bikes, which have too few, and which are close enough to each other to trade bikes. This
turned out not to be a big problem because of the lubridate package in R and its ability to easily subset dates
that contain a time of day. Table 4 shows the counts of the stations on 2015-07-20 21:00:00. The code for
creating the table is the same as the code used for 2.2.1, but because the bike rebalancing team needed to use
this code, it was wrapped in a function, which can be found in A.5.
KioskName Count
10th & Cass 3
10th & Dodge 4
10th & Farnam 7
11th & Jackson 13
13th & Howard 3
14th & Douglas 7
14th & Fahey 3
1516 Cuming St 9
15th & Farnam 0
15th & Howard 3
16th & Douglas 4
1819 Farnam 2
1st & Broadway 3
20th & Dodge 2
24th & Lake 2
50th & Underwood 2
62nd & Dodge 1
66th & Center 2
67th & Frances 6
67th & Pine 4
711 S Main St 0
7th & Jones 6
9th & Jones 1
Aksarben Drive 8
Ameristar 4
Ben’s House 0
Bike Share HQ 0
Bike Union 0
Bob Kerrey Pedestrian Bridge 10
Broadway & Main 6
College of Saint Mary 0
Lewis & Clark Landing 6
Midtown Crossing: 32nd & Farnam 5
Pearl St & Willow Ave 2
Shop 2
The Durham Museum 5
Tip Top Building 0
Tom Hanafan River’s Edge Park 1
dead station 14
Table 4: Station Counts on 2015-07-20 21:00:00
The use of the function, along with the idea of reversing the data set according to date/time,
led to this efficient method of obtaining the counts of each and every station at any given date/time. Before
this method was discovered the process was extremely complicated because of inconsistencies inside the
data set itself. This was the first time I uncovered major discrepancies in the entry of the data as well as in the
available destinations of a bike. This was the point when I discovered "ghost" stations like "Ben’s House"
and "Shop". These emerging stations, which were not part of the original set of stations, make the data
analysis portion more challenging.
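One simple way to surface such "ghost" stations is to compare the kiosk names that appear in the trips data against the official station list. The sketch below assumes this approach and uses invented rows and a shortened, hypothetical `official` list. It is not taken from the report's code.

```r
# Hypothetical excerpt of the official station list.
official <- c("10th & Cass", "10th & Dodge", "10th & Farnam")

# Toy trips table (values invented for illustration).
trips <- data.frame(
  CheckoutKioskName = c("10th & Cass", "10th & Dodge",
                        "10th & Farnam", "10th & Cass"),
  ReturnKioskName   = c("10th & Dodge", "Shop",
                        "Ben's House", "10th & Farnam"),
  stringsAsFactors = FALSE
)

# Every kiosk name seen in the data, as a checkout or a return.
observed <- union(trips$CheckoutKioskName, trips$ReturnKioskName)

# Names seen in the data but missing from the official list.
ghost <- sort(setdiff(observed, official))
ghost

# How many trips ended at each ghost station.
table(trips$ReturnKioskName[trips$ReturnKioskName %in% ghost])
```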
3 Data Analysis
This report now turns to data analysis. The trips information obtained from B-Cycle
includes a large amount of data that could hold the key to better understanding the system’s usage and
ultimately lead to a more efficient system. Over the next few subtopics I
will look into different aspects of the data, and within each subtopic I will discuss the
findings I consider interesting.
3.1 General Information
When looking at a large amount of data it is always wise to start by getting a
general summary of the data. This summary output should include the basic summary statistics (mean,
median, mode, standard deviation, range) as well as how each field is stored (its data type).
I have produced a basic summary, which yielded the following initial interesting themes (for the
code used to produce the summary please see A.6):
1. Bob Kerrey Pedestrian Bridge is by far the most used station, with nearly 7,000 more checkouts and returns
than the second-place station (10th & Farnam).
2. Saturday and Sunday were the days with the most usage among all of the stations accounting for 40%
of the trips.
3. Trips were split fairly evenly between those over 30 minutes and those under 30 minutes (55% to
45%).
4. The average number of calories burned was 181.5 which leads to the question of how that is calculated.
5. The 24-Hour Pass was far and away the most used Membership Type accounting for 72% of the trips.
6. Most of the trips have an adjustment flag tagged on them (97%).
7. The non-RFID card member is the most common user role accounting for 58% of the trips.
8. There are a few different user program names in the system other than Heartland and Omaha B-Cycle
(53).
These are just some general findings from looking at the summary and describe outputs. The outputs raise
more questions than they answer; they spark the "why" questions. These outputs also served as motivation
for numerous upcoming tables and plots.
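Percentages like those in the list above can be obtained from frequency tables. The following is a minimal base R sketch on invented rows. The membership value "Annual" and the calorie figures are made up for illustration, and the report's own summary code is in A.6.

```r
# Toy rows; all values are invented for illustration.
trips <- data.frame(
  MembershipType = c("24-Hour Pass", "24-Hour Pass", "Annual", "24-Hour Pass"),
  EstimatedCaloriesBurned = c(200, 150, 90, 260),
  stringsAsFactors = FALSE
)

# Share of trips by membership type, in percent.
shares <- round(100 * prop.table(table(trips$MembershipType)), 1)
shares

# Basic numeric summary of one column, plus its mean.
summary(trips$EstimatedCaloriesBurned)
avg_calories <- mean(trips$EstimatedCaloriesBurned)

# Storage type of every column.
sapply(trips, class)
```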
3.2 Checkouts vs Returns
When looking at this data set it was clear that an essential part of the data analysis was going to be
looking into how each station performed in the system. Table 5 shows the number of checkouts and returns
for each station; for the code used to create the table please see A.7.
Kiosk Name Checkouts Returns Difference
10th & Cass 954 960 6
10th & Dodge 1228 1249 21
10th & Farnam 3522 3522 0
11th & Jackson 3109 3129 20
13th & Howard 1627 1641 14
14th & Douglas 970 958 -12
14th & Fahey 994 990 -4
1516 Cuming St 1328 1302 -26
15th & Farnam 265 275 10
15th & Howard 627 614 -13
16th & Douglas 745 747 2
1819 Farnam 474 459 -15
1st & Broadway 177 179 2
20th & Dodge 523 502 -21
24th & Lake 121 113 -8
50th & Underwood 281 280 -1
62nd & Dodge 410 409 -1
66th & Center 475 460 -15
67th & Frances 1669 1676 7
67th & Pine 654 644 -10
711 S Main St 151 155 4
7th & Jones 234 229 -5
9th & Jones 283 287 4
Aksarben Drive 1395 1385 -10
Ameristar 511 509 -2
Ben’s House 0 1 1
Bike Share HQ 5 66 61
Bike Union 0 5 5
Bob Kerrey Pedestrian Bridge 10288 10218 -70
Broadway & Main 164 166 2
College of Saint Mary 79 80 1
dead station 0 130 130
Lewis & Clark Landing 2755 2762 7
Midtown Crossing: 32nd & Farnam 615 600 -15
Pearl St & Willow Ave 90 94 4
Shop 0 8 8
The Durham Museum 435 408 -27
Tip Top Building 0 5 5
Tom Hanafan River’s Edge Park 2256 2197 -59
Table 5: Station Checkouts and Returns
An additional column was added to the table to show the difference between checkouts and returns at each
station. A negative number in the “Difference” column means there were more checkouts than returns at
that station, while a positive number means there were more returns than checkouts. Because these are
two very different situations it was worth splitting the data frame into two separate tables, one for
more checkouts than returns (Table 6) and one for more returns than checkouts (Table 7).
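As a minimal sketch of the split described above (this is not the code from A.7), the following uses three rows copied from Table 5. The data frame name `stations` and the result names `more_checkouts` and `more_returns` are assumptions made for illustration.

```r
# Illustrative sketch only: three rows copied from Table 5.
# The object names used here are assumptions, not the names used in A.7.
stations <- data.frame(
  KioskName = c("10th & Cass", "10th & Farnam", "14th & Douglas"),
  Checkouts = c(954, 3522, 970),
  Returns   = c(960, 3522, 958),
  stringsAsFactors = FALSE
)

# Difference = Returns - Checkouts, so a negative value means more checkouts.
stations$Difference <- stations$Returns - stations$Checkouts

# Split on the sign of the difference.
more_checkouts <- stations[stations$Difference < 0, ]  # layout of Table 6
more_returns   <- stations[stations$Difference > 0, ]  # layout of Table 7

print(more_checkouts)
print(more_returns)
```

A station with a difference of exactly zero, such as 10th & Farnam, lands in neither subset, which matches its absence from both Table 6 and Table 7.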
Kiosk Name Checkouts Returns Difference
6 14th & Douglas 970 958 -12
7 14th & Fahey 994 990 -4
8 1516 Cuming St 1328 1302 -26
10 15th & Howard 627 614 -13
12 1819 Farnam 474 459 -15
14 20th & Dodge 523 502 -21
15 24th & Lake 121 113 -8
16 50th & Underwood 281 280 -1
17 62nd & Dodge 410 409 -1
18 66th & Center 475 460 -15
20 67th & Pine 654 644 -10
22 7th & Jones 234 229 -5
24 Aksarben Drive 1395 1385 -10
25 Ameristar 511 509 -2
29 Bob Kerrey Pedestrian Bridge 10288 10218 -70
34 Midtown Crossing: 32nd & Farnam 615 600 -15
37 The Durham Museum 435 408 -27
39 Tom Hanafan River’s Edge Park 2256 2197 -59
Table 6: Stations with more Checkouts than Returns
Kiosk Name Checkouts Returns Difference
1 10th & Cass 954 960 6
2 10th & Dodge 1228 1249 21
4 11th & Jackson 3109 3129 20
5 13th & Howard 1627 1641 14
9 15th & Farnam 265 275 10
11 16th & Douglas 745 747 2
13 1st & Broadway 177 179 2
19 67th & Frances 1669 1676 7
21 711 S Main St 151 155 4
23 9th & Jones 283 287 4
26 Ben’s House 0 1 1
27 Bike Share HQ 5 66 61
28 Bike Union 0 5 5
30 Broadway & Main 164 166 2
31 College of Saint Mary 79 80 1
32 dead station 0 130 130
33 Lewis & Clark Landing 2755 2762 7
35 Pearl St & Willow Ave 90 94 4
36 Shop 0 8 8
38 Tip Top Building 0 5 5
Table 7: Stations with more Returns than Checkouts
Looking at Tables 6 and 7, nothing really jumps off the page as far as the numbers go, meaning there isn’t
one station that has an outrageous number of returns compared to checkouts or vice versa. We can see the
few stations in the system that do not allow checkouts, which are interesting for seeing how many bikes
travel through them. In this case the dead station, Shop, Tip Top Building, Bike Union and Ben’s House
didn’t account for many entries, given they only accounted for 0.38% of all trips. The tables can be hard
to read, but in a bar graph it is much easier to see unique features of the table, so Figure 1 has been
created, which covers 2014-01-11 to 2016-08-31. For the code used to create it please see A.7.
[Figure 1 image: horizontal grouped bar chart. y-axis: KioskName (the 39 kiosks listed in Table 5); x-axis: Value, 0 to 10000; legend “variable”: Checkouts, Returns.]
Figure 1: Checkouts vs Returns per each Kiosk Station
Figure 1 was extremely useful because it gave me a real sense of which stations were being used the most.
Of course the plot then made me want to see the differences between checkouts and returns in the same type
of visual, which is why Figure 2 was created. When looking at this figure it is important to remember that
red bars indicate stations with more checkouts than returns, whereas blue bars indicate stations with
more returns than checkouts; the absence of a bar means the station had the same number of checkouts as
returns. For the code used to create the difference plot please see A.7.
[Figure 2 image: horizontal bar chart. y-axis: KioskName (the 39 kiosks listed in Table 5); x-axis: Difference (ticks at 0, 50, 100); legend “type”: Negative, Neutral, Positive.]
Figure 2: Difference between Checkouts and Returns for each station over the lifetime of the data set.
3.3 Day of the Week and Time of day Analysis
After looking at checkouts vs returns I wondered how the system was used according to the day of the week
as well as the time of the day. In this section I will look into the system usage for each day of the week, the
time of day and the way those two interact together.
3.3.1 Day of the Week
First, I wanted to look at how the data was split according to the day of the week, and because of the
additional column I made inside the data set this was rather straightforward. Figure 3 shows the number
of uses each day of the week received, with the percentage of the total number of trips that day
accounted for at the end of each individual bar. Please see A.8 for the code used to create the figure.
[Figure 3 image: horizontal bar chart. y-axis: DayoftheWeek (Monday through Sunday); x-axis: Count, 0 to 8000; percentage labels at the bar ends, as extracted: 14, 12, 21, 19, 11, 12, 11.]
Figure 3: Usage per day of the week with overall percentage
As I looked at the plot the weekend clearly jumped out as the time when the system is used the most.
If I think of Friday, Saturday and Sunday as the weekend then it accounts for 54% of the trips.
The other days are very comparable, which led me to believe that the system was being used more
for casual riding than for commuting to and from work. If people were using the system mainly for
commuting to and from work, we would see a larger percentage of the trips on the weekdays.
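The day-of-week shares can be sketched as follows. This is only an illustration on made-up checkout dates (the real trips are not reproduced here, and this is not the code from A.8); the variable names are assumptions.

```r
# Hypothetical checkout dates; placeholders for the real trip data.
checkout <- as.Date(c("2016-08-05", "2016-08-06", "2016-08-06",
                      "2016-08-07", "2016-08-08"))

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
# Using the number avoids depending on the locale's day names.
dow <- factor(format(checkout, "%u"),
              levels = as.character(1:7),
              labels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                         "Friday", "Saturday", "Sunday"))

counts <- table(dow)                      # trips per day of the week
pct <- round(100 * prop.table(counts))    # percentage of all trips per day

# Share of trips on Friday, Saturday and Sunday combined.
weekend_share <- sum(counts[c("Friday", "Saturday", "Sunday")]) / sum(counts)

print(pct)
print(weekend_share)
```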
3.3.2 Time of the Day
After seeing the day of the week distribution I found myself wondering how the system usage throughout a
day might look, so Figure 4 was created. The figure’s x-axis is the time of the day based on the hour of
the checkout (in 24-hour time) and the y-axis is the percentage of trips. As I expected, the middle
of the day is the most common time for the system to be used, more specifically between noon
and 7pm. The code used to create the plot can also be found in A.8.
[Figure 4 image: plot of usage by hour. x-axis: Hour of the Day (ticks at 5, 10, 15, 20); y-axis: PercentageofAllTrips, 0.000 to 0.075.]
Figure 4: Usage throughout the day with hour time splits
Figure 4 was interesting but it led me to wonder how the system behaves during certain blocked times.
I know time is a continuous scale, but when thinking about a ’normal’ day there are certain blocks of time
a person is concerned with such as rush hour and lunch time. The next plot was made with the following
time splits in mind:
• 12(midnight) - 6am
• 6am - 11am
• 11am - 2pm
• 2pm - 6pm
• 6pm - 12(midnight)
When creating Figure 5 there was quite a bit of coding that needed to be done to get the correct time
splits; the code used to create the figure can be found in A.8. The figure shows that 2pm-6pm is the most
used time slot, which might have been obvious based on the last couple of plots. Remember that the weekend
had the most use, which suggests that people were probably using the system casually when they went out,
which usually occurs in the evening hours.
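One possible way to obtain the five time blocks listed above is base R's `cut()`. The sketch below is an assumption about how the grouping could be done, not the code from A.8, and the checkout hours are made up.

```r
# Hypothetical checkout hours in 24-hour time (0-23).
hour <- c(0, 5, 6, 10, 11, 13, 14, 17, 18, 23)

# right = FALSE makes every block closed on the left and open on the right:
# [0,6), [6,11), [11,14), [14,18), [18,24).
block <- cut(hour,
             breaks = c(0, 6, 11, 14, 18, 24),
             labels = c("12(Midnight)-6am", "6am-11am", "11am-2pm",
                        "2pm-6pm", "6pm-12(Midnight)"),
             right = FALSE)

# Number of checkouts falling into each block.
print(table(block))
```

With `right = FALSE` a checkout at exactly 6am counts toward the 6am-11am block rather than the midnight-6am block; the source does not say which convention was used, so this boundary choice is an assumption.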
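[Figure 5 image: horizontal bar chart. y-axis: time block (6pm-12(Midnight), 2pm-6pm, 11am-2pm, 6am-11am, 12(Midnight)-6am); x-axis: Count, 0 to 10000; percentage labels at the bar ends, as extracted: 0, 14, 24, 34, 28.]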
Figure 5: Time grouping usage distribution
When I looked at the day of the week and time of day figures plotted separately it made me want to see
how they would interact with each other, so Figure 6 was created. This figure shows the time of day
distribution for each individual day of the week. The plot is very interesting to look at because it
reveals some ideas that were previously difficult to see in the other plots. One of the most interesting
things I saw on this plot was the peaking of the individual days of the week. Thursday and Wednesday
seemed to peak earlier in the day compared to the rest of the days. The graph is open to interpretation
but can be very useful in better understanding the system.
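[Figure 6 image: line plot, one line per day of the week. x-axis: Time of Day (ticks at 5, 10, 15, 20, 25); y-axis: Checkouts, 0 to 600; legend “DOWC”: Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday.]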
Figure 6: Usage plot according to time of day and grouped by day of the week.
3.4 Distribution of Membership Types
The trips are recorded with many different variables, one of which is the type of membership the user has
when they check out a bike. It was worth looking into the distribution of the membership types inside the
data set. This information could prove very important to the administration team when they look at the
price of each individual membership relative to its use in the system. Figure 7 shows the distribution of
membership type. This figure is a simplification of the many different membership types inside the data
set. The data set is a running tally of trips, and as the administration team updated the system,
membership types changed names and new ones were created. To ensure I correctly combined the
different types, a member of the administration provided the grouping types. The code used to create the
figure can be found in A.9.
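[Figure 7 image: horizontal bar chart. y-axis: MembershipType (FUN, Annual and Heartland Pass, 30-Day, Other); x-axis: Count, 0 to 20000; percentage labels at the bar ends, as extracted: 72, 14, 0, 14.]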
Figure 7: Usage according to Membership Type
It is clear the “FUN” pass is by far the most common membership type with 72% of the trips. The FUN
pass allows a user to use the system for 24 hours. Given I have already seen that a majority of the
system’s use is casual, it makes sense that this pass is used the most and that there are not many
annual members. I thought there might have been more 30-day memberships given the summer months, but
that is clearly not the case in this data set. Naturally I wanted to see how membership types were used
according to the day of the week and the time of day, which is shown in Figure 8; the code for the
figure can be found in A.9.
[Figure 8 image: four line-plot panels (FUN, Annual and Heartland Pass, 30-Day, Other), one line per day of the week. x-axis: Time of Day (ticks at 5, 10, 15, 20, 25); y-axis: Checkouts, 0 to 600; legend “DOWC”: Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday.]
Figure 8: Time of day usage plot split by day of the week and facetted by membership type
There are a couple of ideas that stand out when looking at Figure 8. The first two lines that jump off the
page are located in the “FUN” membership type and they are Saturday and Sunday. Not only are these two
the most used combinations, but they also show a huge spike at around 3pm. This shows that a majority
of “FUN” users were using the system during the weekend. The next two lines that really catch the eye
are in the “Other” section for Mondays and Tuesdays. There was a noticeable spike in their usage from
12:30pm-4pm for Tuesday and from 3pm-5:30pm for Monday. It was also interesting to see that annual
members seemed to be using the system more on the weekdays than on the weekends, perhaps indicating
that annual members use the system to commute to and from work.
3.5 Station Pairs Usage
The B-Cycle system allows users to check out a bike and then return the bike to any station inside
the system (assuming there is an open slot). These trips can either be one-way or a loop, meaning the user
returns the bike to the same station they checked it out of. This section mainly focuses on the one-way
trips, because the main question I wanted to answer was, “Which stations are used together frequently and
which areas of Omaha in general are used together?” This question can be answered by looking at a table
of values giving every combination of stations and their counts, but this can be cumbersome. Instead, it is
much easier to look at a map with the stations plotted on it and lines showing the frequencies between pairs
of stations. Figure 9 is a line frequency plot where the thickness of the line is an indication of how many
trips there were between that pair of stations: the thicker the line, the more trips. This is the initial
plot showing most of the stations, and the code used to produce it can be found in A.10.
Figure 9: Station Pairs usage conveyed through a line frequency plot.
Figure 9 shows there is a clustering of usage in the downtown area. I was interested in seeing how the
downtown area looked so Figure 10 was created which is a zoomed in look at the downtown area.
Figure 10: Station pairs line frequency plot of downtown Omaha
This is the best looking and most revealing plot when looking at all of the trips taken in the downtown
area. I noticed the Bob Kerrey station had the thickest lines attached to it, which makes sense when
thinking back to the total number of checkouts and returns at that station. The usage could differ
according to membership type as well, thus it was worth creating a plot faceted by membership type,
similar to the one in section 3.4. Figure 11 shows this idea and the code used to create the plot can be
found in A.11.
[Figure 11 image: four map panels, one per membership type: 30-Day, Annual and Heartland Pass, FUN, Other.]
Figure 11: Station pairs line frequency plot of downtown Omaha facetted by membership type.
Figure 11 shows some interesting things about the Bob Kerrey station. The station continues to be the
most dominant station in the system, but the annual and monthly memberships’ frequency line plots show
Bob Kerrey as just a normal part of the system, which provides further evidence that people
using the system with annual and monthly memberships might be using it to travel to and from work. The FUN
pass frequency line plot shows the Bob Kerrey station is used quite frequently. Given that the station is
close to the water and that we believe FUN pass holders are the casual system users, it makes sense for
the two to go together.
The frequency line plots were great for getting a general idea of how the membership types behaved as
station pairs, but being able to see the true quantities of the top 10 pairs could be valuable
information. Table 8 was created to show the top 10 pairings along with the membership type associated
with each pairing.
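One way such pair counts could be tallied is with base R's `aggregate()`. The sketch below is an assumption about the approach, not the code from A.12; the five trips are made up, and the column names simply mirror the headers of Table 8.

```r
# Made-up trips; only the column layout (Start, End, MemType) follows Table 8.
trips <- data.frame(
  Start   = c("Bob Kerrey Pedestrian Bridge", "Bob Kerrey Pedestrian Bridge",
              "Bob Kerrey Pedestrian Bridge", "11th & Jackson",
              "Bob Kerrey Pedestrian Bridge"),
  End     = c("Bob Kerrey Pedestrian Bridge", "Bob Kerrey Pedestrian Bridge",
              "11th & Jackson", "Bob Kerrey Pedestrian Bridge",
              "Bob Kerrey Pedestrian Bridge"),
  MemType = c("FUN", "FUN", "FUN", "Other", "FUN"),
  stringsAsFactors = FALSE
)

# Each trip contributes one to the count of its (Start, End, MemType) group.
trips$Count <- 1
pairs <- aggregate(Count ~ Start + End + MemType, data = trips, FUN = sum)

# Largest counts first, then keep at most ten rows.
pairs <- pairs[order(pairs$Count, decreasing = TRUE), ]
top <- head(pairs, 10)
print(top)
```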
Start End MemType Count
Bob Kerrey Pedestrian Bridge Bob Kerrey Pedestrian Bridge FUN 7550
10th & Farnam 10th & Farnam FUN 1850
Lewis & Clark Landing Lewis & Clark Landing FUN 1504
11th & Jackson 11th & Jackson FUN 1354
67th & Frances 67th & Frances FUN 1243
Tom Hanafan River’s Edge Park Tom Hanafan River’s Edge Park FUN 905
Aksarben Drive Aksarben Drive FUN 695
10th & Dodge 10th & Dodge FUN 490
Tom Hanafan River’s Edge Park Bob Kerrey Pedestrian Bridge FUN 468
13th & Howard 13th & Howard FUN 467
Table 8: Top 10 Station Pairs with Membership Type
As shown by Table 8, Bob Kerrey dominates the usage and, along with some other stations, seems to
be dominated by users doing round trips (checking out and returning the bike at the same
location). In fact the only one-way trip in the top 10 was Tom Hanafan River’s Edge Park to Bob Kerrey
Pedestrian Bridge, a pair which according to Table 2 is 11,182 meters (6.9mi) apart, which seems like an
extremely long journey for a person to make. This leads one to believe these trips are due to bike
rebalancing. I wanted to see the top pairs of stations excluding round trips, so a new top 10 table was
created (Table 9) showing the top 10 pairs of stations with their membership type; the code used to
create both tables can be found in A.12.
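# Keep the first four columns of trimpc (Start, End, MemType, Count)
top10nr <- trimpc[, 1:4]
# Drop round trips (same start and end station)
top10nr <- top10nr[top10nr$Start != top10nr$End, ]
# Sort by trip count, largest first, and keep the top 10 pairs
top10nr <- top10nr[order(top10nr$Count, decreasing = TRUE), ]
top10nr <- head(top10nr, 10)
rownames(top10nr) <- seq_len(nrow(top10nr))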
Start End MemType Count
Tom Hanafan River’s Edge Park Bob Kerrey Pedestrian Bridge FUN 468
Bob Kerrey Pedestrian Bridge 11th & Jackson FUN 382
Bob Kerrey Pedestrian Bridge Tom Hanafan River’s Edge Park FUN 369
Bob Kerrey Pedestrian Bridge Lewis & Clark Landing FUN 316
Bob Kerrey Pedestrian Bridge 13th & Howard FUN 308
11th & Jackson Bob Kerrey Pedestrian Bridge Other 245
Bob Kerrey Pedestrian Bridge 10th & Farnam FUN 226
11th & Jackson Bob Kerrey Pedestrian Bridge FUN 207
Lewis & Clark Landing Bob Kerrey Pedestrian Bridge FUN 184
10th & Farnam Bob Kerrey Pedestrian Bridge FUN 156
Table 9: Top 10 Station Pairs with Membership Type excluding round trips
Table 9 confirms that Bob Kerrey is essential to the system, given every one of the top pairs included
Bob Kerrey. It is very surprising to me that not a single pair of stations without Bob Kerrey cracked
the top 10.
3.6 Bus Route Interaction
The city of Omaha has a bus system just like many big cities, which is used by many different people.
Heartland B-Cycle is a competitor to the bus system in the sense that they are both vying for people to
use their transportation services. I thought it would be worth looking into how the stations’ usage looked
compared to the three most used bus routes (2, 11, 15). Figure 12 shows the entirety of the bus routes
overlaid with the frequency line plot from section 3.5; the code used to create it can be found in A.13.
The pink line shows bus route 2, the green line shows bus route 11 and the blue line shows bus route 15.
Figure 12: Popular bus routes plotted against station pairs line frequency plot.
Figure 12 was not incredibly fruitful other than showing us again that downtown is where most of the
trips occur. I wanted to see how the downtown plot from section 3.5 would look with the bus routes, so
Figure 13 was created.
Figure 13: Downtown Omaha station pairs line frequency plot with popular bus routes shown.
This is an easier graph to read but doesn’t provide much information about how the bus routes affect the
bike system or vice versa. There doesn’t seem to be a station that is close to the routes that is exploding
with usage, thus Figure 13 is inconclusive.
3.7 Bike Histories
So far I have looked into the use of the stations in pairs or just looked at individual station usage.
I would now like to switch gears and look at the trips a bike takes. Individual bike histories could
provide vital information about how a bike moves from station to station as well as whether the data
collection process is efficient. The idea was to get each bike’s full history, which would include each
individual trip it takes from station to station, showing the times it was checked out and returned. This
process took some coding which can be found in A.14. The bike histories were stored in a list containing
one element per bike, each a data frame showing every trip the bike took.
This was an extremely large list and thus has been excluded from this report, but a sample of one
bike’s data frame can be seen in Table 10.
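A list with one data frame per bike can be sketched with base R's `split()`. This is a minimal illustration, not the code from A.14; the three trips are copied from Tables 10 and 11, while the column names `CheckoutDateTime` and `ReturnDateTime` and the UTC time zone are assumptions.

```r
# Three trips copied from Tables 10 and 11, deliberately out of order.
trips <- data.frame(
  Bike = c(11, 942, 11),
  CheckoutKioskName = c("Aksarben Drive", "10th & Dodge", "67th & Pine"),
  ReturnKioskName   = c("Aksarben Drive", "10th & Dodge", "Aksarben Drive"),
  CheckoutDateTime  = as.POSIXct(c("2015-03-09 12:40:53", "2016-06-29 08:49:23",
                                   "2015-03-08 17:36:36"), tz = "UTC"),
  ReturnDateTime    = as.POSIXct(c("2015-03-09 12:56:52", "2016-06-29 09:56:20",
                                   "2015-03-08 18:12:26"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Sort by bike and checkout time so each history is in chronological order.
trips <- trips[order(trips$Bike, trips$CheckoutDateTime), ]

# One list element (a data frame) per bike, named by the bike number.
histories <- split(trips, trips$Bike)

print(names(histories))
print(histories[["11"]])
```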
Bike CheckoutKioskName ReturnKioskName Checkout date time Return date time
8751 11 67th & Pine Aksarben Drive 2015-03-08 17:36:36 2015-03-08 18:12:26
8762 11 Aksarben Drive Aksarben Drive 2015-03-09 12:40:53 2015-03-09 12:56:52
8789 11 Aksarben Drive Aksarben Drive 2015-03-10 12:56:43 2015-03-10 13:22:30
8834 11 Aksarben Drive 67th & Pine 2015-03-11 16:31:21 2015-03-11 23:05:06
8900 11 67th & Pine 66th & Center 2015-03-12 19:02:35 2015-03-12 19:43:24
8911 11 66th & Center 66th & Center 2015-03-13 13:30:59 2015-03-13 14:20:47
8925 11 66th & Center 66th & Center 2015-03-13 17:28:50 2015-03-13 18:21:34
8945 11 66th & Center 66th & Center 2015-03-13 19:21:00 2015-03-13 20:07:15
9009 11 66th & Center dead station 2015-03-14 16:28:51 2015-03-16 10:45:10
9213 11 Aksarben Drive Aksarben Drive 2015-03-17 19:11:40 2015-03-17 19:12:08
10457 11 Aksarben Drive Aksarben Drive 2015-04-21 21:42:52 2015-04-21 22:01:14
Table 10: Sample of a Bike’s History
Notice from Table 10 there was a ‘jump’ in the bike’s history even in this small sample. A jump occurs
when the bike was checked out from a station different from the station it was last returned to. This was
a problem inside the data set, because I was essentially missing a trip for every occurrence of this
situation. The main point of making this list of bike histories was to gain information about the bikes’
trips, but with missing information my data is compromised. Because of this I had to take the time to
design a method of filling in these gaps.
Finding the Jumps The end goal is to fill in the gaps of the trips, but in order to do that I had to know
where I needed to fill. The first step in this process was obtaining a list of all the jumps for each
individual bike. This was done by comparing each trip’s checkout station to the previous trip’s return
station, inside the same bike history of course. This process was then repeated for all 150 bikes. Table
11 shows a small sample of the jumping list, whereas the entire list along with the code used to produce
it can be found in A.15.
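The comparison described above can be sketched as follows. This is an illustrative version, not the code from A.15; `find_jumps` is a hypothetical helper name, and the three rows are the last trips of bike 11 before and after its jump in Table 10.

```r
# Three consecutive trips of bike 11, copied from Table 10.
history <- data.frame(
  Bike = 11,
  CheckoutKioskName = c("66th & Center", "66th & Center", "Aksarben Drive"),
  ReturnKioskName   = c("66th & Center", "dead station", "Aksarben Drive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the rows whose checkout station differs from
# the return station of the previous row. Assumes `h` is in time order.
find_jumps <- function(h) {
  n <- nrow(h)
  if (n < 2) return(h[0, ])
  prev_return <- c(NA, h$ReturnKioskName[-n])   # return station of the row above
  is_jump <- !is.na(prev_return) & h$CheckoutKioskName != prev_return
  h[is_jump, ]
}

jumps <- find_jumps(history)
print(jumps)
```

For a list of bike histories, the same helper could be applied to every element with `lapply(histories, find_jumps)`.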
Bike CheckoutKioskName ReturnKioskName Checkout date time Return date time co
16496 951 1819 Farnam 1516 Cuming St 2015-07-22 13:02:56 2015-07-22 13:10:00 82
19639 951 15th & Howard 15th & Howard 2015-08-29 12:53:35 2015-08-29 14:10:43 83
33406 942 14th & Douglas Bike Share HQ 2016-06-28 13:48:52 2016-06-28 15:04:29 19
33565 942 10th & Dodge 10th & Dodge 2016-06-29 08:49:23 2016-06-29 09:56:20 20
Table 11: Sample of Jumps from bikes 951 and 942
This was a very big problem inside the data set, with 944 rows affected by this phenomenon. Using these
jumps the next step was to create the filler rows to insert into the database, which would create a
complete history for each of the bikes and eliminate the jumps from the set. The code used to
produce these filler rows can be found in A.15.
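A filler row can be sketched as a synthetic trip from the previous return station to the next checkout station. The version below is an assumption about how this might be done, not the code from A.15: `make_fillers` is a hypothetical helper, the "Jumper" role follows the sample in Table 12, and stamping the filler with the time of the previous return is purely an illustrative choice.

```r
# Two consecutive trips of bike 11 from Table 10 that contain a jump.
history <- data.frame(
  Bike = 11,
  CheckoutKioskName = c("66th & Center", "Aksarben Drive"),
  ReturnKioskName   = c("dead station", "Aksarben Drive"),
  CheckoutDateTime  = as.POSIXct(c("2015-03-14 16:28:51", "2015-03-17 19:11:40"), tz = "UTC"),
  ReturnDateTime    = as.POSIXct(c("2015-03-16 10:45:10", "2015-03-17 19:12:08"), tz = "UTC"),
  UserRole = "Subscriber",
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one filler row per jump in a time-ordered history.
make_fillers <- function(h) {
  n <- nrow(h)
  if (n < 2) return(h[0, ])
  idx <- which(h$CheckoutKioskName[-1] != h$ReturnKioskName[-n]) + 1
  if (length(idx) == 0) return(h[0, ])
  fill <- h[idx, ]
  fill$CheckoutKioskName <- h$ReturnKioskName[idx - 1]   # where the bike was left
  fill$ReturnKioskName   <- h$CheckoutKioskName[idx]     # where it next appeared
  fill$CheckoutDateTime  <- h$ReturnDateTime[idx - 1]    # assumed timestamp
  fill$ReturnDateTime    <- fill$CheckoutDateTime
  fill$UserRole <- rep("Jumper", length(idx))
  fill
}

fill <- make_fillers(history)
complete <- rbind(history, fill)
complete <- complete[order(complete$CheckoutDateTime), ]
print(complete)
```

After the filler is inserted, every checkout station in `complete` matches the return station of the row before it, so the jump is gone.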
3.8 Station Histories
After I looked into the bike histories I found myself wondering what the individual station histories might
tell me. The station histories could show how often a station was full or empty, and this of course is of
great importance to both the administration at B-Cycle and the bike rebalancing team. This information is
also important to individual customers: if a station is full when they are trying to turn in their bike
they are in a real predicament, as they are when they need a bike and the station is empty. Table 12 shows
a sample of one station’s individual history and the code used to produce it can be found in A.16.
Bike CheckoutKioskName ReturnKioskName Checkout date time Return date time UserRole
3586 258 Bob Kerrey Pedestrian Bridge Bob Kerrey Pedestrian Bridge 2014-07-05 10:33:00 2014-07-05 11:44:39 Subscriber
3587 197 14th & Fahey Bob Kerrey Pedestrian Bridge 2014-07-05 10:38:35 2014-07-05 11:31:36 Subscriber
3588 193 14th & Fahey Bob Kerrey Pedestrian Bridge 2014-07-05 10:39:00 2014-07-05 11:31:32 Subscriber
3589 112 11th & Jackson Bob Kerrey Pedestrian Bridge 2014-07-05 10:49:00 2014-07-05 11:44:02 Subscriber
70100 18 11th & Jackson Bob Kerrey Pedestrian Bridge 2014-07-05 10:49:52 2014-07-05 10:49:52 Jumper
3590 197 Bob Kerrey Pedestrian Bridge 11th & Jackson 2014-07-05 11:31:43 2014-07-05 11:48:57 Subscriber
3591 193 Bob Kerrey Pedestrian Bridge 11th & Jackson 2014-07-05 11:32:02 2014-07-05 11:48:53 Subscriber
3594 55 Bob Kerrey Pedestrian Bridge Bob Kerrey Pedestrian Bridge 2014-07-05 12:13:45 2014-07-05 15:34:30 Subscriber
3595 44 Bob Kerrey Pedestrian Bridge Bob Kerrey Pedestrian Bridge 2014-07-05 12:14:27 2014-07-05 15:34:35 Subscriber
3597 258 Bob Kerrey Pedestrian Bridge 11th & Jackson 2014-07-05 12:53:12 2014-07-05 13:17:15 Subscriber
3600 9 Bob Kerrey Pedestrian Bridge Bob Kerrey Pedestrian Bridge 2014-07-05 13:09:39 2014-07-05 14:00:29 Subscriber
Table 12: Sample of Bob Kerrey Pedestrian Bridge Station History
3.8.1 Time a station is full and empty(or near)
If a customer is riding a bike and reaches their desired location to find the station at that location is
full, we have a serious problem on our hands because the person then has to go looking for the next station.
This could be a very frustrating situation for the customer and thus is something I wanted to look into. In
order to do this I had to split the station histories up, making each trip essentially two separate pieces
of data. One piece consists of the checkout location and date/time for each trip and the other consists of
the return location and date/time. This allowed me to categorize each event as the station either
gaining a bike (+1) or losing a bike (-1) whenever a piece of data was entered into the system.
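The split into +1 and -1 events can be sketched as below. This is only an illustration of the idea, not the code from A.17; the two trips are bike 197's rows from Table 12, and the names `events`, `Station`, `Time` and `Change` are assumptions.

```r
# Two trips of bike 197 copied from Table 12.
trips <- data.frame(
  CheckoutKioskName = c("14th & Fahey", "Bob Kerrey Pedestrian Bridge"),
  ReturnKioskName   = c("Bob Kerrey Pedestrian Bridge", "11th & Jackson"),
  CheckoutDateTime  = as.POSIXct(c("2014-07-05 10:38:35", "2014-07-05 11:31:43"), tz = "UTC"),
  ReturnDateTime    = as.POSIXct(c("2014-07-05 11:31:36", "2014-07-05 11:48:57"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Every trip becomes two events: the checkout station loses a bike (-1)
# and the return station gains a bike (+1).
events <- rbind(
  data.frame(Station = trips$CheckoutKioskName, Time = trips$CheckoutDateTime,
             Change = -1, stringsAsFactors = FALSE),
  data.frame(Station = trips$ReturnKioskName, Time = trips$ReturnDateTime,
             Change = 1, stringsAsFactors = FALSE)
)
events <- events[order(events$Time), ]

# Event stream for a single station.
bob <- events[events$Station == "Bob Kerrey Pedestrian Bridge", ]
print(bob)
```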
Once I had that information I crept through the database row by row, counting each event and waiting
until a station became full. I wanted to be able to record the time that a station was full, so I had a
time difference function built into my ‘creeping’ function, all of which can be found in A.17. An
essential question that came about as I was doing this was, “How did the station go from full to not
full?” That is, was the change of status due to bike rebalancing, or did it come about by chance through
normal usage? Table 13 shows how long each station was full in minutes over the time period 2015-04-01
03:00:00 to 2016-08-31 22:30:08. This time period was chosen to ensure the system was not under the
effects of having new bikes and stations being added to it.
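The time-full calculation can be sketched on a single station's event stream. Note that this swaps the row-by-row ‘creeping’ loop for a vectorised running total with `cumsum()`, so it is an alternative illustration rather than the method in A.17. The events, the dock capacity of 5, the starting count of 3 and the end of the period are all hypothetical values.

```r
# Hypothetical +1/-1 events for one station, already in time order.
events <- data.frame(
  Time = as.POSIXct(c("2015-04-01 08:00:00", "2015-04-01 09:00:00",
                      "2015-04-01 09:30:00", "2015-04-01 12:00:00"), tz = "UTC"),
  Change = c(1, 1, -1, 1)
)
capacity    <- 5   # assumed number of docks at the station
start_count <- 3   # assumed number of bikes before the first event
period_end  <- as.POSIXct("2015-04-01 13:00:00", tz = "UTC")

# Running number of bikes at the station after each event.
events$Bikes <- start_count + cumsum(events$Change)

# Each count holds from its event until the next event (or the period end).
next_time <- c(events$Time[-1], period_end)
events$Minutes <- as.numeric(difftime(next_time, events$Time, units = "mins"))

minutes_full  <- sum(events$Minutes[events$Bikes >= capacity])
minutes_empty <- sum(events$Minutes[events$Bikes == 0])

# Percentage of the observed period spent full, as in the Percentage columns.
total_minutes <- as.numeric(difftime(period_end, events$Time[1], units = "mins"))
pct_full <- 100 * minutes_full / total_minutes

print(events)
print(c(minutes_full = minutes_full, pct_full = pct_full))
```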
Time Full Percentage Time Empty Percentage Time near Empty Percentage Time Empty or near Percentage
Bob Kerrey Pedestrian Bridge 7128 0.95 8176 1.09 5320 0.71 13496 1.81
13th & Howard 2226 0.30 2 0.00 3963 0.53 3965 0.53
Aksarben Drive 758 0.10 56298 7.54 56637 7.58 112936 15.12
67th & Pine 50256 6.73 14832 1.99 80068 10.72 94901 12.70
67th & Frances 76 0.01 149808 20.05 31167 4.17 180975 24.22
62nd & Dodge 0 0.00 11439 1.53 42852 5.74 54291 7.27
1819 Farnam 0 0.00 53251 7.13 119067 15.94 172318 23.07
66th & Center 0 0.00 187440 25.09 131665 17.62 319105 42.71
11th & Jackson 6215 0.83 11947 1.60 22259 2.98 34206 4.58
14th & Fahey 6532 0.87 24455 3.27 83535 11.18 107991 14.45
1516 Cuming St 0 0.00 24375 3.26 34207 4.58 58582 7.84
1st & Broadway 0 0.00 44740 5.99 354549 47.46 399289 53.45
711 S Main St 0 0.00 30850 4.13 171806 23.00 202656 27.13
Lewis & Clark Landing 39411 5.28 13060 1.75 41301 5.53 54361 7.28
10th & Farnam 18486 2.47 44825 6.00 117189 15.69 162014 21.69
50th & Underwood 0 0.00 93874 12.57 126768 16.97 220643 29.53
Midtown Crossing: 32nd & Farnam 0 0.00 13764 1.84 8370 1.12 22134 2.96
The Durham Museum 0 0.00 31019 4.15 15708 2.10 46727 6.25
15th & Howard 0 0.00 47895 6.41 51152 6.85 99047 13.26
9th & Jones 0 0.00 88044 11.78 23087 3.09 111131 14.88
20th & Dodge 0 0.00 1275 0.17 11 0.00 1286 0.17
7th & Jones 0 0.00 33732 4.52 177687 23.78 211419 28.30
Broadway & Main 15608 2.09 0 0.00 0 0.00 0 0.00
24th & Lake 0 0.00 362515 48.52 94099 12.60 456614 61.12
Ameristar 0 0.00 22317 2.99 101298 13.56 123615 16.55
Tom Hanafan River’s Edge Park 0 0.00 13825 1.85 4502 0.60 18326 2.45
14th & Douglas 2979 0.40 113920 15.25 107785 14.43 221705 29.68
10th & Cass 1227 0.16 3980 0.53 22924 3.07 26904 3.60
16th & Douglas 5472 0.73 13612 1.82 36097 4.83 49709 6.65
Tip Top Building 0 0.00 1 0.00 28523 3.82 28524 3.82
Pearl St & Willow Ave 0 0.00 25272 3.38 323604 43.32 348876 46.70
College of Saint Mary 0 0.00 48268 6.46 4940 0.66 53207 7.12
15th & Farnam 14921 2.00 6984 0.93 7314 0.98 14298 1.91
Table 13: Stations Time being Full and Empty (or near) (in mins) with Percentage
Table 13 is extremely fruitful. It shows all the stations that had time being completely full or being
completely empty along with the percentage over the given data range that the station was full or empty.
The near empty column was added because some people might be trying to ride the bikes with a companion
and if there was only one bike that option of course would be eliminated for that particular pair of people.
The information provided here can be evaluated and looked into immensely but one facet I found interesting
was the fact that 1st and Broadway was empty or near empty 53% of the time. I could spend a lot of time
breaking down this table but I will leave further interpretation up to the individual reader.
3.8.2 Why did a station’s full or empty count change?
It was important to look at the amount of time a station was full and empty (or near), but it is also
important to find out how it went from full or empty to not full or not empty. In other words, did a
customer cause the station to open up or get another bike, or was that the rebalancing team? In order to
answer this question I needed to have the list of stations split as I did in the previous section, but
this time each row inside the data frame of each element needs to have a “UserRole” column. I then needed
to work my way through each station’s list and find the UserRole of the row after the row which caused
the station to become full/empty (or near). The code used can be found in A.18, which produced Figure 14
for the full data and Figure 15 for the empty (or near empty) data.
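The lookup of the next row's UserRole can be sketched for the "full" case as below. This is an assumption about the approach, not the code from A.18; the events, the capacity of 5 and the starting count of 3 are hypothetical, and the sketch assumes the row right after a station fills up is the checkout that frees a dock.

```r
# Hypothetical +1/-1 events for one station, in time order, with user roles.
events <- data.frame(
  Change   = c(1, 1, -1, 1, -1),
  UserRole = c("Subscriber", "Subscriber", "Maintenance", "Subscriber", "Subscriber"),
  stringsAsFactors = FALSE
)
capacity    <- 5   # assumed number of docks
start_count <- 3   # assumed number of bikes before the first event

bikes <- start_count + cumsum(events$Change)
prev_bikes <- c(start_count, bikes[-length(bikes)])

# Rows at which the station goes from not full to full.
became_full <- which(bikes == capacity & prev_bikes < capacity)

# The row after each of those is the event that ends the full state.
next_row <- became_full + 1
next_row <- next_row[next_row <= nrow(events)]

cleared_by <- factor(
  ifelse(events$UserRole[next_row] == "Maintenance", "Maintenance", "Not Maintenance"),
  levels = c("Maintenance", "Not Maintenance")
)
print(table(cleared_by))
```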
[Figure 14 image: horizontal bar chart. y-axis: Station (16 stations); x-axis: Freq, 0 to 300; legend “UserRole”: Maintenance, Not Maintenance.]
Figure 14: Number of Instances Maintenance did (or did not) cause a station to go from full to not full
[Figure 15 image: horizontal bar chart. y-axis: Station (32 stations); x-axis: Freq, 0 to 1000; legend “UserRole”: Maintenance, Not Maintenance.]
Figure 15: Times Maintenance did (or did not) cause a station to go from empty (or near) to not empty (or
near)
I can also display the values as tables to make it easier to see each individual value. The full and
empty counts are shown in Tables 14 and 15 respectively.
UserRole Freq Station
1 Maintenance 2 Bob Kerrey Pedestrian Bridge
2 Not Maintenance 37 Bob Kerrey Pedestrian Bridge
3 Maintenance 5 13th & Howard
4 Not Maintenance 22 13th & Howard
6 Not Maintenance 12 Aksarben Drive
7 Maintenance 14 67th & Pine
8 Not Maintenance 30 67th & Pine
10 Not Maintenance 2 67th & Frances
17 Maintenance 6 11th & Jackson
18 Not Maintenance 13 11th & Jackson
19 Maintenance 8 14th & Fahey
20 Not Maintenance 13 14th & Fahey
27 Maintenance 14 Lewis & Clark Landing
28 Not Maintenance 291 Lewis & Clark Landing
29 Maintenance 9 10th & Farnam
30 Not Maintenance 78 10th & Farnam
43 Maintenance 1 7th & Jones
45 Maintenance 2 Broadway & Main
46 Not Maintenance 2 Broadway & Main
56 Not Maintenance 3 10th & Dodge
60 Not Maintenance 2 14th & Douglas
61 Maintenance 4 10th & Cass
62 Not Maintenance 6 10th & Cass
63 Maintenance 1 16th & Douglas
64 Not Maintenance 20 16th & Douglas
71 Maintenance 8 15th & Farnam
72 Not Maintenance 33 15th & Farnam
Table 14: Full changes according to UserRole
UserRole Freq Station
1 Maintenance 26 Bob Kerrey Pedestrian Bridge
2 Not Maintenance 389 Bob Kerrey Pedestrian Bridge
4 Not Maintenance 13 13th & Howard
5 Maintenance 13 Aksarben Drive
6 Not Maintenance 287 Aksarben Drive
7 Maintenance 14 67th & Pine
8 Not Maintenance 66 67th & Pine
9 Maintenance 30 67th & Frances
10 Not Maintenance 406 67th & Frances
11 Maintenance 10 62nd & Dodge
12 Not Maintenance 19 62nd & Dodge
13 Maintenance 19 1819 Farnam
14 Not Maintenance 111 1819 Farnam
15 Maintenance 46 66th & Center
16 Not Maintenance 166 66th & Center
17 Maintenance 23 11th & Jackson
18 Not Maintenance 371 11th & Jackson
19 Maintenance 22 14th & Fahey
20 Not Maintenance 146 14th & Fahey
21 Maintenance 51 1516 Cuming St
22 Not Maintenance 182 1516 Cuming St
23 Maintenance 27 1st & Broadway
24 Not Maintenance 64 1st & Broadway
25 Maintenance 33 711 S Main St
26 Not Maintenance 8 711 S Main St
27 Maintenance 24 Lewis & Clark Landing
28 Not Maintenance 340 Lewis & Clark Landing
29 Maintenance 38 10th & Farnam
30 Not Maintenance 995 10th & Farnam
31 Maintenance 14 50th & Underwood
32 Not Maintenance 134 50th & Underwood
33 Maintenance 18 Midtown Crossing: 32nd & Farnam
34 Not Maintenance 59 Midtown Crossing: 32nd & Farnam
35 Maintenance 2 The Durham Museum
36 Not Maintenance 16 The Durham Museum
37 Maintenance 24 15th & Howard
38 Not Maintenance 133 15th & Howard
39 Maintenance 10 9th & Jones
40 Not Maintenance 23 9th & Jones
41 Maintenance 2 20th & Dodge
42 Not Maintenance 7 20th & Dodge
43 Maintenance 29 7th & Jones
44 Not Maintenance 43 7th & Jones
47 Maintenance 34 24th & Lake
48 Not Maintenance 48 24th & Lake
49 Maintenance 14 Ameristar
50 Not Maintenance 104 Ameristar
51 Maintenance 6 Tom Hanafan River’s Edge Park
52 Not Maintenance 66 Tom Hanafan River’s Edge Park
59 Maintenance 33 14th & Douglas
60 Not Maintenance 380 14th & Douglas
61 Maintenance 20 10th & Cass
62 Not Maintenance 70 10th & Cass
63 Maintenance 13 16th & Douglas
64 Not Maintenance 66 16th & Douglas
66 Not Maintenance 2 Tip Top Building
67 Maintenance 16 Pearl St & Willow Ave
68 Not Maintenance 38 Pearl St & Willow Ave
69 Maintenance 7 College of Saint Mary
70 Not Maintenance 23 College of Saint Mary
71 Maintenance 66 15th & Farnam
72 Not Maintenance 2 15th & Farnam
Table 15: Empty (or near empty) changes according to UserRole
3.8.3 What about the busy months?
With Omaha being located in the Midwest, there are definitely times of the year, such as the winter, when the
stations lie dormant for the most part. One would be led to believe that the summer months are the
time of year when the system sees the most usage. This is due in part to numerous factors including
the weather, the hours of daylight, activities in Omaha, and work/school schedules. The previous
Figure 13 showed all the numbers calculated from the start of the data set to the end of the data set. I
wondered what this would look like for the summer months only. To do these calculations I needed to set a
new time frame to look at. I chose to start at 2016-05-01 01:00:00 and go until 2016-09-19 23:00:00. The code
used for the creation of the new table can be found in A.19 and the statistics are shown in Table 16.
Time Full Percentage Time Empty Percentage Time near Empty Percentage Time Empty or near Percentage
Bob Kerrey Pedestrian Bridge 0 0.00 6590 3.22 5690 2.78 12280 6.01
13th & Howard 12748 6.24 0 0.00 0 0.00 0 0.00
Aksarben Drive 38202 18.69 0 0.00 0 0.00 0 0.00
67th & Pine 36501 17.86 0 0.00 0 0.00 0 0.00
67th & Frances 0 0.00 4544 2.22 2181 1.07 6725 3.29
62nd & Dodge 0 0.00 2595 1.27 8 0.00 2602 1.27
1819 Farnam 73 0.04 0 0.00 0 0.00 0 0.00
66th & Center 2617 1.28 0 0.00 0 0.00 0 0.00
11th & Jackson 20896 10.23 0 0.00 0 0.00 0 0.00
14th & Fahey 23816 11.65 0 0.00 1 0.00 1 0.00
1516 Cuming St 0 0.00 10819 5.29 25153 12.31 35972 17.60
1st & Broadway 0 0.00 10255 5.02 12398 6.07 22653 11.09
Lewis & Clark Landing 18015 8.82 1 0.00 1398 0.68 1399 0.68
10th & Farnam 0 0.00 13909 6.81 25814 12.63 39723 19.44
50th & Underwood 13044 6.38 0 0.00 0 0.00 0 0.00
Midtown Crossing: 32nd & Farnam 0 0.00 21524 10.53 15503 7.59 37027 18.12
The Durham Museum 16 0.01 11328 5.54 10169 4.98 21497 10.52
15th & Howard 0 0.00 25248 12.35 12081 5.91 37329 18.27
9th & Jones 0 0.00 19214 9.40 3125 1.53 22340 10.93
20th & Dodge 0 0.00 7421 3.63 2889 1.41 10310 5.05
7th & Jones 1013 0.50 14289 6.99 24321 11.90 38610 18.89
24th & Lake 0 0.00 2153 1.05 68 0.03 2221 1.09
Ameristar 796 0.39 0 0.00 0 0.00 0 0.00
Tom Hanafan River’s Edge Park 0 0.00 11562 5.66 6697 3.28 18259 8.93
10th & Dodge 30661 15.00 0 0.00 0 0.00 0 0.00
14th & Douglas 11940 5.84 0 0.00 0 0.00 0 0.00
10th & Cass 0 0.00 20945 10.25 19215 9.40 40160 19.65
16th & Douglas 0 0.00 6012 2.94 9823 4.81 15835 7.75
Pearl St & Willow Ave 0 0.00 77 0.04 74012 36.22 74089 36.25
College of Saint Mary 0 0.00 48268 23.62 4940 2.42 53207 26.04
15th & Farnam 14921 7.30 6984 3.42 7314 3.58 14298 7.00
Table 16: Summer Stats for Full/Empty/or near for all Stations
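As a rough illustration of how a "Time Full" value and its percentage could be derived for one station, the sketch below sums how long each bike count lasted inside a window. It is not the code from A.19. The event times, counts, `docks` capacity and window end are hypothetical, and treating the time columns of Table 16 as minutes is an assumption.

```r
# Hypothetical state changes for one station: bikes docked after each event
events <- data.frame(
  time  = as.POSIXct(c("2016-05-01 01:00:00", "2016-05-01 03:00:00",
                       "2016-05-01 04:00:00", "2016-05-01 09:00:00"), tz = "UTC"),
  Count = c(10, 11, 5, 0)
)
window_end <- as.POSIXct("2016-05-01 11:00:00", tz = "UTC")  # assumed end of window
docks <- 11                                                   # assumed dock capacity

# Minutes each state lasted: until the next event, or the window end for the last one
mins <- as.numeric(difftime(c(events$time[-1], window_end), events$time, units = "mins"))

total      <- sum(mins)
time_full  <- sum(mins[events$Count == docks])
time_empty <- sum(mins[events$Count == 0])

pct_full  <- round(100 * time_full / total, 2)
pct_empty <- round(100 * time_empty / total, 2)
print(c(TimeFull = time_full, PctFull = pct_full,
        TimeEmpty = time_empty, PctEmpty = pct_empty))
```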
4 Forecasting
4.1 Discovery
There are many different forecasting models, and each one varies in technique depending on the data it
represents. In order to pick the best type of model I first needed to take a closer look at the data for the
bike usage at each station. The distribution of bike usage by trip duration at each checkout
station is shown in Table 17. This table allowed me to see what kind of trips people are taking from each
of the stations. It is also worth looking at what each station has coming back to it by trip
duration; these numbers are shown in Table 18. The code used to find the numbers and create the tables
can be found in A.20.
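One way such a duration table could be produced is with `cut()` and `table()`. This is a sketch under stated assumptions and not the code from A.20: the toy data is made up, and the bin edges (for example, a trip of exactly 5 minutes counting as "Under 5 min") are guesses based on the column labels of Tables 17 and 18.

```r
# Toy trips using two column names from the trips data (values are invented)
trips_toy <- data.frame(
  CheckoutKioskName = c("A", "A", "A", "B", "B", "B"),
  DurationMins      = c(3, 8, 45, 5, 61, 25),
  stringsAsFactors  = FALSE
)

bucket_labels <- c("Under 5 min", "6-10 min", "11-20 min",
                   "21-30 min", "31-60 min", "Over 60 min")

# Right-closed intervals: (-Inf,5], (5,10], (10,20], (20,30], (30,60], (60,Inf)
trips_toy$Bucket <- cut(trips_toy$DurationMins,
                        breaks = c(-Inf, 5, 10, 20, 30, 60, Inf),
                        labels = bucket_labels)

# Rows are stations, columns are duration buckets
usage <- table(trips_toy$CheckoutKioskName, trips_toy$Bucket)
print(usage)
```

Swapping `CheckoutKioskName` for `ReturnKioskName` would give the return-side version of the table.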
Checkout Station Under 5 min 6-10 min 11-20 min 21-30 min 31-60 min Over 60 min
10th & Cass 61 135 176 85 251 182
10th & Dodge 99 155 127 112 416 306
10th & Farnam 372 180 340 328 1487 782
11th & Jackson 273 304 499 259 1010 736
13th & Howard 213 224 252 126 451 346
14th & Douglas 217 242 146 66 154 140
14th & Fahey 130 145 263 76 170 195
1516 Cuming St 234 235 230 95 199 328
15th & Farnam 80 43 31 16 46 40
15th & Howard 135 110 69 36 145 124
16th & Douglas 94 169 157 58 142 117
1819 Farnam 98 89 95 25 84 83
1st & Broadway 111 2 4 5 31 24
20th & Dodge 90 90 80 44 108 101
24th & Lake 56 6 7 11 21 20
50th & Underwood 80 7 32 35 74 52
62nd & Dodge 78 60 99 35 58 75
66th & Center 126 13 25 38 162 108
67th & Frances 415 34 75 146 633 343
67th & Pine 166 29 111 61 187 93
711 S Main St 122 1 2 5 7 14
7th & Jones 132 22 25 3 23 26
9th & Jones 117 26 23 30 40 31
Aksarben Drive 120 25 99 208 665 275
Ameristar 136 7 75 64 176 50
Bike Share HQ 3 0 2 0 0 0
Bob Kerrey Pedestrian Bridge 590 212 891 1449 5305 1756
Broadway & Main 98 5 14 8 22 16
College of Saint Mary 46 5 3 3 14 8
Lewis & Clark Landing 209 242 217 284 1294 493
Midtown Crossing: 32nd & Farnam 42 16 198 89 124 103
Pearl St & Willow Ave 18 2 7 12 27 16
The Durham Museum 97 136 42 26 79 54
Tom Hanafan River’s Edge Park 310 69 282 329 943 314
Table 17: Bike Usage Distribution at each Checkout Station
Return Station Under 5 min 6-10 min 11-20 min 21-30 min 31-60 min Over 60 min
10th & Cass 84 136 110 46 244 180
10th & Dodge 100 145 110 91 466 335
10th & Farnam 366 158 363 356 1521 742
11th & Jackson 207 284 412 317 1107 793
13th & Howard 185 138 259 167 496 388
14th & Douglas 224 231 147 57 185 113
14th & Fahey 151 190 166 93 199 190
1516 Cuming St 238 248 170 102 202 337
15th & Farnam 79 60 18 18 31 44
15th & Howard 133 84 94 52 134 116
16th & Douglas 115 231 129 33 127 106
1819 Farnam 77 72 91 32 100 87
1st & Broadway 112 2 6 9 30 20
20th & Dodge 75 68 77 45 103 131
24th & Lake 56 4 10 8 19 15
50th & Underwood 80 7 38 25 74 56
62nd & Dodge 78 14 138 29 55 89
66th & Center 127 13 18 31 189 80
67th & Frances 422 29 84 160 627 330
67th & Pine 159 73 68 67 197 73
711 S Main St 119 0 4 8 9 15
7th & Jones 126 32 18 13 27 10
9th & Jones 133 23 20 13 28 43
Aksarben Drive 119 32 101 200 675 256
Ameristar 136 4 55 80 158 75
Ben’s House 0 0 0 0 0 1
Bike Share HQ 2 1 3 2 3 55
Bike Union 0 0 0 1 1 3
Bob Kerrey Pedestrian Bridge 600 390 1192 1389 5030 1544
Broadway & Main 99 6 9 4 29 18
College of Saint Mary 46 5 3 3 12 11
dead station 1 0 1 1 3 124
Lewis & Clark Landing 255 185 230 261 1324 499
Midtown Crossing: 32nd & Farnam 42 32 137 102 156 96
Pearl St & Willow Ave 19 2 10 12 28 16
Shop 0 0 0 0 0 8
The Durham Museum 94 74 75 24 91 44
Tip Top Building 0 0 0 2 1 2
Tom Hanafan River’s Edge Park 309 67 332 314 867 306
Table 18: Bike Usage Distribution at each Return Station
Tables 17 and 18 were created as a talking point with members of the research team. The team wanted to
get a better sense of the data at the station level according to trip times to see if a predictive model could be
produced to predict bike usage at certain stations. This idea did not end up gaining traction,
but I left the tables in this report for the reader to consult for future work. Figures 16 and 17
show the tables in a bar plot format, which allows viewers to quickly identify
unique aspects of the data. Because of the Bob Kerrey Pedestrian Bridge’s overwhelming presence in the data
set, it has been removed from the figures.
[Bar chart: Value (x-axis) by ReturnStation (y-axis), split by trip duration (Under 5 min, 6-10 min, 11-20 min, 21-30 min, 31-60 min, Over 60 min). The plotted values are listed in Table 18.]
Figure 16: Bike Usage for each Return Station excluding Bob Kerrey Pedestrian Bridge
[Bar chart: Value (x-axis) by CheckoutStation (y-axis), split by trip duration (Under 5 min, 6-10 min, 11-20 min, 21-30 min, 31-60 min, Over 60 min). The plotted values are listed in Table 17.]
Figure 17: Bike Usage for each Checkout Station excluding Bob Kerrey Pedestrian Bridge
Figures 16 and 17 show some of the stations that have the most checkouts and returns for each type
of trip duration, but this data did not reveal enough about the trips for me to
fit a predictive model. Instead of making a predictive model I went in a different direction: demand. If
the administration team knew the average demand and the maximum
demand for each station in the system, they could better estimate where they need more bikes, slots,
rebalancing efforts, etc.
4.2 Low Count Histogram
Most of the research done throughout this report was motivated by the idea of helping Omaha B-Cycle
become a more efficient company as well as better plan future expansion of the system. The initial problem
Omaha B-Cycle presented to the research group was one of rebalancing the bikes, as I have mentioned before.
In this section I provide information that is useful for both bike rebalancing and future
expansion of stations.
What if B-Cycle knew how many bikes a station was down at most on any given day? A station has
bikes coming in and going out constantly, but what I wanted to know was how many bikes the station
went down before it gained bikes back. For example, assume station A has nine bikes at the start of the day. Three
people check out bikes from station A, so the station is down three bikes relative to the start
of the day. If two bikes are then returned, the station is only down one bike relative to the start of the day.
I wanted to know the largest number of bikes a station went down on any given day.
This idea essentially builds a demand frequency for each station. Figure 18 shows a compilation of the stations’
individual daily max demand for bikes. A value of two in the figure indicates that at most two bikes were taken
in a row without replacement from a station, which occurred the number of times given by the height of the
bar. A zero indicates that the station on that day either didn’t have any bikes checked out or had as many or more
returns to the station than checkouts for that particular day. The figure only shows data from the summer
of 2016, which was chosen because of the stability of the time period (as previously discussed) as well as
because of the usage during those months. For the code used to obtain the data behind the figure,
please see A.21.
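The station A example can be worked through in a few lines. This is a minimal sketch of the "down relative to the start of the day" reading of max demand, not the code from A.21. The `max_demand` helper and the toy log are hypothetical: each checkout is coded as -1 and each return as +1.

```r
# Largest number of bikes a station is down relative to its start-of-day count.
# changes: -1 for each checkout, +1 for each return, in time order.
max_demand <- function(changes) {
  if (length(changes) == 0) return(0)
  net <- cumsum(changes)   # running position relative to the start of the day
  max(0, -min(net))        # deepest dip below the starting count, never negative
}

# Station A: three checkouts followed by two returns
station_a_day <- c(-1, -1, -1, 1, 1)
print(max_demand(station_a_day))

# Hypothetical two-day log for one station, summarised per day
toy_log <- data.frame(
  Date   = as.Date(c(rep("2016-06-01", 5), rep("2016-06-02", 3))),
  Change = c(-1, -1, -1, 1, 1,   1, -1, 1)
)
daily_max <- tapply(toy_log$Change, toy_log$Date, max_demand)
print(daily_max)

# Frequency of each max-demand value, the quantity a histogram would plot
print(table(daily_max))
```

On the second toy day a return arrives before the only checkout, so the station never drops below its starting count and the max demand is zero.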
[Histogram titled "All Station Lows Frequency": Max Demand on any given Day (x-axis, 0 to 15) against Count (y-axis). Bar labels as extracted: 636, 605, 451, 229, 149, 110, 74, 48, 29, 17, 15, 11, 11, 4, 1, 2.]
Figure 18: Frequency of Max Demands for all stations compiled
Now that I was able to see the max demand counts for the system as a whole, I wanted to see how the stations
looked individually. Figure 19 shows the individual histograms for each station. These plots are read
the same way as Figure 18. The count labels on the bars have been removed because of the small size of
each histogram. Individual station frequency tables can be found in A.22.
Figure 19: Minimum Counts per Day Split by Station
[Grid of per-station histograms: Max Demand on any given Day (x-axis, 0 to 15) against Count (y-axis, 0 to 60).
Panels: 10th & Cass, 10th & Dodge, 10th & Farnam, 11th & Jackson, 13th & Howard, 14th & Douglas,
14th & Fahey, 1516 Cuming St, 15th & Farnam, 15th & Howard, 16th & Douglas, 1819 Farnam,
1st & Broadway, 20th & Dodge, 24th & Lake, 50th & Underwood, 62nd & Dodge, 66th & Center,
67th & Frances, 67th & Pine, 711 S Main St, 7th & Jones, 9th & Jones, Aksarben Drive,
Ameristar, Bike Share HQ, Bike Union, Bob Kerrey Pedestrian Bridge, Broadway & Main, College of Saint Mary,
dead station, Lewis & Clark Landing, Midtown Crossing: 32nd & Farnam, Pearl St & Willow Ave,
The Durham Museum, Tom Hanafan River's Edge Park.]
Figure 19 is not the easiest figure to read, so it made sense to plot a few of the interesting
stations on a larger scale, making their qualities easier to see. I have chosen to look at the following
stations: 1516 Cuming St, The Durham Museum and Tom Hanafan River’s Edge Park. The rest of the individual
histograms can be found in A.23.
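[Histogram: Tom Hanafan River's Edge Park. Max Demand on any given Day (x-axis, 0 to 15) against Count (y-axis). Bar labels as extracted: 24, 16, 20, 13, 8, 10, 8, 3, 5, 1, 1, 3.]
[Histogram: 1516 Cuming St. Max Demand on any given Day (x-axis, 0 to 15) against Count (y-axis). Bar labels as extracted: 20, 30, 23, 9, 4, 2, 2, 1.]
[Histogram: The Durham Museum. Max Demand on any given Day (x-axis, 0 to 15) against Count (y-axis). Bar labels as extracted: 10, 66, 17, 8, 3, 1.]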
5 Future Works
The goal of this research report was to provide data analysis on the Omaha B-Cycle data that would reveal
useful traits about the bike sharing system. The deeper I looked into a specific topic, the more layers I
discovered, and I realized that there is still a lot of work that could be done. The bike rebalancing team now
has more data to use in their model design and testing, which should help them further their research
in finding the best method to redistribute bikes amongst the stations. Unfortunately I do not have infinite
time and I had to stop pursuing some topic areas, but I believe future researchers should focus their efforts
on the following topics.
The first topic, which could be of great importance, would be to further develop a way to predict/determine
the demand for bikes at each station. The histogram technique is quick and allows for real-time analysis of
the demand for bikes, which is very useful to the bike rebalancing team and the B-Cycle administration, but
as the system gets more and more data a model could be developed and trained using data from the system.
Future models could be designed using decision trees, support vector machines and random kernel methods.
Another topic that should be looked into would be to map out GPS data from the bikes in the system.
Seeing where bikes travel from station to station, as well as just around the city of Omaha, could
help not only with the creation of new stations and with bike distribution but could also be
used on the business side of things. If the administration team knew what routes bikes
were traveling, maybe they could sell advertisements or come up with other profitable ideas.
The final topic I feel would be fruitful would be to look into how to update and maintain the physical
bikes in the system. When talking with the administration from B-Cycle they informed me they want to do
maintenance on each bike every three weeks. The idea initially was to see if the bikes circulated enough to
just do maintenance on them as part of the rebalancing effort, in combination with the bikes that were docked
at the stations close to the headquarters. Before the research report was started I looked slightly into this idea.
It didn’t end up gaining any traction, but I do believe it could be incredibly useful to the company.
The door has been opened and a small light has been shed on the Omaha B-Cycle System. There is
still much work to be done, but with the research provided in this paper there is at least a good
starting point. Obtaining information about each station’s incoming and outgoing trips, each bike’s individual
history, station demands, station interaction and time analysis helped to slingshot researchers in the right direction.
References
[1] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical
Computing, Vienna, Austria, http://www.R-project.org/ , 2016
[2] Yihui Xie knitr: A general-purpose package for dynamic report generation in R, http://yihui.name/
knitr/, 2016
[3] Hadley Wickham, Flexibly Reshape Data: A Reboot of the Reshape Package, https://github.com/
hadley/reshape, 2016
[4] G. Grothendieck, Perform SQL Selects on R Data Frames, http://sqldf.googlecode.com, 2016
[5] Garrett Grolemund, Vitalie Spinu, and Hadley Wickham, Make Dealing with Dates a Little Easier,
http://sqldf.googlecode.com, 2016
[6] Winston Chang and Hadley Wickham, An Implementation of the Grammar of Graphics, http://
ggplot2.org, 2016
[7] David Kahle and Hadley Wickham, Spatial Visualization with ggplot2, https://github.com/dkahle/
ggmap, 2016
[8] Frank E Harrell Jr, Harrell Miscellaneous, http://biostat.mc.vanderbilt.edu/Hmisc, 2016
[9] Heartland B-Cycle, Heartland B-Cycle, https://admin.bcycle.com/Admin/Default.aspx, 2016
[10] Dr. Betty N. Love and Dr. Andrew Swift, Discussion of Research Techniques, 2016
[11] Ben Turner, B-Cycle Information, 2016
A Appendix
A.1 Stations Table
Kiosk Index Kiosk Name Address Latitude Longitude Total Docks
1 10th & Cass 1001 Cass St 41.26370 -95.92943 13
2 10th & Dodge 1008 Dodge St 41.25966 -95.92928 11
3 10th & Farnam 1000 Farnam St 41.25756 -95.92921 11
4 11th & Jackson 1100 Jackson St 41.25440 -95.93058 13
5 13th & Howard 1300 Howard St 41.25547 -95.93322 10
6 14th & Douglas 1400 Douglas St 41.25864 -95.93451 11
7 14th & Fahey 1400 Fahey St 41.26583 -95.93449 11
8 1516 Cuming St 1516 Cuming St 41.26826 -95.93669 9
9 15th & Howard 1500 Howard St 41.25549 -95.93585 11
10 16th & Douglas 1600 Douglas St 41.25865 -95.93719 11
11 1819 Farnam 2200 River Rd 41.25708 -95.94028 11
12 1st & Broadway 100 Broadway St 41.26333 -95.84373 10
13 20th & Dodge 2000 Dodge St 41.25975 -95.94242 9
14 24th & Lake 2400 Lake St 41.28153 -95.94701 9
15 50th & Underwood 5000 Underwood St 41.26506 -95.99028 10
16 62nd & Dodge 6200 Dodge St 41.25961 -96.00819 10
17 66th & Center 6600 Center St 41.25236 -95.99799 10
18 67th & Frances 6700 Frances St 41.23984 -96.01464 9
19 67th & Pine 6700 Pine St 41.24374 -96.01462 7
20 711 S Main St 601 Riverfront Dr 41.25474 -95.85110 9
21 7th & Jones 700 Jones St 41.25337 -95.92539 10
22 9th & Jones 900 Jones St 41.25339 -95.92785 10
23 Aksarben Drive 1919 Ak-Sar-Ben Dr 41.24123 -96.01784 11
24 Ameristar 2200 River Rd 41.24312 -95.90857 11
25 Bob Kerrey Pedestrian Bridge 644 Bridge 41.26565 -95.92413 15
26 Broadway & Main 00 Main St 41.26088 -95.84973 10
27 Lewis & Clark Landing 601 Riverfront Dr 41.26217 -95.92446 11
28 Midtown Crossing: 32nd & Farnam 3200 Farnam St 41.25763 -95.96043 11
29 Pearl St & Willow Ave 533 Willow Ave 41.25844 -95.85103 11
30 The Durham Museum 801 S 10th S 41.25144 -95.92808 11
31 Tom Hanafan River’s Edge Park River Edge Service Rd 41.26253 -95.91838 15
A.2 Driving Distances Matrix Code
library(rjson)

# Look up the distance (meters) or duration (minutes) between paired
# addresses in Omaha using the Google Distance Matrix API.
getDist <- function(from, to, modus = "driving", get = "distance") {
  metric <- numeric(length = length(from))
  for (i in seq_along(from)) {
    url <- paste0("http://maps.googleapis.com/maps/api/distancematrix/json?origins=Omaha+",
                  gsub(" ", "+", from[i]), "&destinations=Omaha+", gsub(" ", "+", to[i]),
                  "&mode=", modus, "&units=metric&language=de-DE&sensor=false")
    data <- fromJSON(file = url)
    # Request-level problems: stop on hard failures, otherwise record NA and move on
    if (data$status == "MAX_ELEMENTS_EXCEEDED") stop("Maximum number of items per request exceeded")
    if (data$status == "OVER_QUERY_LIMIT") stop("Exceeded maximum number of requests")
    if (data$status == "REQUEST_DENIED") stop("rejected request")
    if (data$status == "INVALID_REQUEST") {
      warning("Invalid request")
      metric[i] <- NA
      next
    }
    if (data$status == "UNKNOWN_ERROR") {
      warning("Unknown error - try again !")
      metric[i] <- NA
      next
    }
    # Element-level problems for this origin/destination pair
    element <- data$rows[[1]]$elements[[1]]
    if (element$status == "NOT_FOUND") {
      warning("Start and / or destination can not be found")
      metric[i] <- NA
    }
    if (element$status == "ZERO_RESULTS") {
      warning("Found No route between starting point and destination")
      metric[i] <- NA
    }
    if (data$status == "OK" && element$status == "OK") {
      if (get == "distance") {
        # distance$value is reported in meters
        metric[i] <- round(element$distance$value, digits = 1)
      }
      if (get == "duration") {
        # duration$value is reported in seconds; convert to minutes
        metric[i] <- round(element$duration$value / 60, digits = 1)
      }
    }
  }
  return(metric)
}

# Build the matrix in three row blocks, one outer() call per block
first <- outer(stations$Address[1:10], stations$Address, getDist, modus = 'driving', get = 'distance')
second <- outer(stations$Address[11:20], stations$Address, getDist, modus = 'driving', get = 'distance')
third <- outer(stations$Address[21:31], stations$Address, getDist, modus = 'driving', get = 'distance')
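The three row blocks computed above still have to be stacked into one square matrix. The sketch below shows one way to chunk the rows and bind the pieces back together. It is illustrative only: `toy_dist` is a made-up stand-in for `getDist` so that the example runs without network access, the chunk size is arbitrary, and the guess that the original split exists to keep each request batch small is an assumption.

```r
# A few addresses from the stations table in A.1
addresses <- c("1001 Cass St", "1008 Dodge St", "1000 Farnam St",
               "1100 Jackson St", "1300 Howard St")

# Hypothetical vectorized stand-in for getDist (not a real distance)
toy_dist <- function(from, to) abs(nchar(from) - nchar(to))

# Split the row indices into blocks of two rows each
chunk_size <- 2
chunks <- split(seq_along(addresses), ceiling(seq_along(addresses) / chunk_size))

# One outer() call per block, then stack the blocks in order
parts <- lapply(chunks, function(idx) outer(addresses[idx], addresses, toy_dist))
dist_matrix <- do.call(rbind, unname(parts))
dimnames(dist_matrix) <- list(addresses, addresses)
print(dist_matrix)
```

Because the blocks cover each row index exactly once, the stacked result has one row per station and matches a single `outer()` call over all addresses.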
A.3 Biking Distances Matrix Code
# getDist() is the same function defined in A.2; only the travel mode changes here
first <- outer(stations$Address[1:10], stations$Address, getDist, modus = 'bicycling', get = 'distance')
second <- outer(stations$Address[11:20], stations$Address, getDist, modus = 'bicycling', get = 'distance')
third <- outer(stations$Address[21:31], stations$Address, getDist, modus = 'bicycling', get = 'distance')
A.4 Station Bike Count Code
library(sqldf)

# subset the data set to only include trips checked out on or before the date specified (adate)
end <- trips[trips$Checkout_date_time <= adate, c("Bike", "CheckoutKioskName",
             "ReturnKioskName", "Checkout_date_time", "Return_date_time")]
# order the subset newest to oldest by return time
end <- end[with(end, order(Return_date_time, decreasing = TRUE)), ]
# grab the most recent row for each bike, i.e. its last known location
first <- end[match(unique(end$Bike), end$Bike), ]
# count the number of times a station appears in the unique bike list
stat_c <- as.data.frame(table(first$ReturnKioskName))
names(stat_c) <- c('KioskName', 'Count')
# make a readable table
count <- sqldf('select * from stat_c
               order by KioskName')
A.5 Station Counts at any Date/Time Code
library(sqldf)

# flipping the bike column method to find the number of bikes at each station
bike_count <- function(dt) {
  # prepare the date for the subset logical
  dt <- as.POSIXlt(dt, tz = 'UTC')
  # subset the data set to only include trips returned on or before the date specified
  end <- trips[trips$Return_date_time <= dt, c("Bike", "CheckoutKioskName", "ReturnKioskName",
               "Checkout_date_time", "Return_date_time")]
  # order the subset newest to oldest by return time
  end <- end[with(end, order(Return_date_time, decreasing = TRUE)), ]
  # grab the most recent row for each bike, i.e. its last known location
  first <- end[match(unique(end$Bike), end$Bike), ]
  # count the number of times a station appears in the unique bike list
  stat_c <- as.data.frame(table(first$ReturnKioskName))
  names(stat_c) <- c('KioskName', 'Count')
  # make a readable table
  count <- sqldf('select * from stat_c
                 order by KioskName')
  count
}
# use the date/time inside the function
count <- bike_count(bdate)
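To see why taking the newest row per bike yields station counts, the idea can be tried on a few invented trips. This is a sketch with hypothetical data and a hypothetical cutoff. It uses `duplicated()` in place of `match(unique(...))`, which selects the same rows, and base R ordering in place of `sqldf`.

```r
# Invented trips: bike 1 and bike 2 each move more than once
trips_toy <- data.frame(
  Bike             = c(1, 2, 1, 3, 2),
  ReturnKioskName  = c("A", "B", "C", "A", "A"),
  Return_date_time = as.POSIXct(c("2016-06-01 10:00:00", "2016-06-01 11:00:00",
                                  "2016-06-02 09:00:00", "2016-06-02 12:00:00",
                                  "2016-06-03 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
cutoff <- as.POSIXct("2016-06-02 23:59:00", tz = "UTC")  # hypothetical date/time of interest

# Keep trips returned by the cutoff, newest first
seen <- trips_toy[trips_toy$Return_date_time <= cutoff, ]
seen <- seen[order(seen$Return_date_time, decreasing = TRUE), ]

# The first row seen for each bike is its most recent return, i.e. where it sits now
last_seen <- seen[!duplicated(seen$Bike), ]

# Bikes per station at the cutoff
station_counts <- table(last_seen$ReturnKioskName)
print(station_counts)
```

Bike 2's later return to station A falls after the cutoff, so at the cutoff it is still counted at station B.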
A.6 Summary and Describe Output from Section 3.1
summary(trips)
## TripId UserProgramName UserId
## Min. : 2008284 Heartland B-cycle :32777 Min. : 13020
## 1st Qu.: 4317028 Omaha B-cycle : 6584 1st Qu.: 601822
## Median : 6138047 Denver B-cycle : 25 Median : 831570
## Mean : 6846018 Boulder B-cycle : 13 Mean : 838811
## 3rd Qu.:10104673 Des Moines B-cycle : 3 3rd Qu.:1149241
## Max. :11691781 Kansas City B-cycle: 3 Max. :1359695
## (Other) : 9
## UserRole UserCity UserState
## Demo Member : 28 :33643 :33611
## InternalMember : 31 Omaha : 4371 NE : 5241
## Maintenance : 5282 OMAHA : 282 IA : 130
## Member : 1182 omaha : 159 KS : 73
## Non-RFID Card Member:22846 Council Bluffs: 91 WI : 67
## RFID Card Member : 4354 Omaha : 80 DC : 51
## Subscriber : 5691 (Other) : 788 (Other): 241
## UserZip UserCountry
## Min. : 0 FRANCE : 5
## 1st Qu.:63701 NETHERLANDS : 1
## Median :68106 UNITED STATES:39408
## Mean :63572
## 3rd Qu.:68135
## Max. :99999
## NA's :6798
## MembershipType Bike BikeType
## 24-Hour Pass Kiosk :28302 193 : 588 Standard:39414
## : 5341 185 : 525
## Heartland Pass (Annual Pay) : 2623 197 : 502
## Heartland Pass (Monthly Pay): 1685 267 : 487
## Annual : 1171 36 : 486
## FUN! Pass : 103 109 : 485
## (Other) : 189 (Other):36341
## CheckoutKioskName
## Bob Kerrey Pedestrian Bridge :10288
## 10th & Farnam : 3522
## 11th & Jackson : 3109
## Lewis & Clark Landing : 2755
## Tom Hanafan River's Edge Park : 2256
## 67th & Frances : 1669
## (Other) :15815
## ReturnKioskName DurationMins
## Bob Kerrey Pedestrian Bridge :10218 Min. : 0.0
## 10th & Farnam : 3522 1st Qu.: 12.0
## 11th & Jackson : 3129 Median : 34.0
## Lewis & Clark Landing : 2762 Mean : 104.7
## Tom Hanafan River's Edge Park : 2197 3rd Qu.: 53.0
## 67th & Frances : 1676 Max. :49914.0
## (Other) :15910
## AdjustedDurationMins UsageFee AdjustmentFlag Distance
## Min. : 0.000 Min. : 0.0000 N:38372 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.0000 Y: 1042 1st Qu.: 1.500
## Median : 0.000 Median : 0.0000 Median : 3.000
## Mean : 1.556 Mean : 0.6856 Mean : 4.556
## 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 6.600
## Max. :304.000 Max. :83.0000 Max. :99.600
##
## EstimatedCarbonOffset EstimatedCaloriesBurned CheckoutDateLocal
## Min. : 0.000 Min. : 0.0 Min. :2014-01-11
## 1st Qu.: 1.400 1st Qu.: 58.0 1st Qu.:2015-03-31
## Median : 2.900 Median : 120.0 Median :2015-08-09
## Mean : 4.317 Mean : 181.5 Mean :2015-08-27
## 3rd Qu.: 6.300 3rd Qu.: 264.0 3rd Qu.:2016-05-22
## Max. :94.600 Max. :3984.0 Max. :2016-08-31
##
## ReturnDateLocal CheckoutTimeLocal
## Min. :2014-01-11 Min. :5H 44M 38S
## 1st Qu.:2015-03-31 1st Qu.:12H 27M 51.25S
## Median :2015-08-09 Median :15H 24M 32.5S
## Mean :2015-08-27 Mean :15H 23M 22.4273354645557S
## 3rd Qu.:2016-05-22 3rd Qu.:18H 26M 2.75S
## Max. :2016-08-31 Max. :23H 24M 41S
##
## ReturnTimeLocal TripOver30Mins LocalProgramFlag
## Min. :27S N:17718 N: 53
## 1st Qu.:12H 59M 5.25S Y:21696 Y:39361
## Median :16H 0M 21S
## Mean :15H 56M 56.3876795047472S
## 3rd Qu.:19H 6M 51.5S
## Max. :23H 59M 57S
##
## TripRouteCategory TripProgramName DOWC
## One Way :16220 Heartland B-cycle:32818 Friday :5481
## Round Trip:23194 Omaha B-cycle : 6596 Monday :4663
## Saturday :8177
## Sunday :7630
## Thursday :4211
## Tuesday :4807
## Wednesday:4445
## DOWR Checkout_date_time
## Friday :5471 Min. :2014-01-11 18:27:10
## Monday :4716 1st Qu.:2015-03-31 15:31:23
## Saturday :8102 Median :2015-08-09 16:02:43
## Sunday :7623 Mean :2015-08-27 19:59:26
## Thursday :4226 3rd Qu.:2016-05-22 13:21:57
## Tuesday :4814 Max. :2016-08-31 22:03:35
## Wednesday:4462
## Return_date_time dur_sec
## Min. :2014-01-11 19:27:24 Min. : 0
## 1st Qu.:2015-03-31 15:34:21 1st Qu.: 737
## Median :2015-08-09 17:07:02 Median : 2063
## Mean :2015-08-27 22:06:53 Mean : 6264
## 3rd Qu.:2016-05-22 14:39:22 3rd Qu.: 3192
## Max. :2016-08-31 23:05:24 Max. :2994814
## NA's :43
describe(trips)
## trips
##
## 33 Variables 39414 Observations
## ---------------------------------------------------------------------------
## TripId
## n missing unique Info Mean .05 .10 .25
## 39414 0 39414 1 6846018 2572828 2854288 4317028
## .50 .75 .90 .95
## 6138047 10104673 10966710 11302699
##
## lowest : 2008284 2008286 2010589 2010594 2010596
## highest: 11691514 11691517 11691522 11691776 11691781
## ---------------------------------------------------------------------------
## UserProgramName
## n missing unique
## 39414 0 11
##
## Austin B-cycle (2, 0%), Boulder B-cycle (13, 0%)
## Broward B-cycle (2, 0%), Cincy Red Bike (1, 0%)
## Denver B-cycle (25, 0%)
## Des Moines B-cycle (3, 0%)
## Heartland B-cycle (32777, 83%)
## Houston B-cycle (1, 0%)
## Kansas City B-cycle (3, 0%)
## Nashville B-cycle (3, 0%)
## Omaha B-cycle (6584, 17%)


1 About the Data

The data used in this research report was gathered through the Heartland B-Cycle Administration website. The data was collected from a couple of different locations on the site, but the primary set used in this report was the trips data. This data set is continuously updated as trips come in, so in order to have a closed set a snapshot was taken from the website. The snapshot has a starting date of 2014-01-11 and a final date of 2016-08-31. It contained 39414 rows with 28 different columns, which included:

• TripId
• UserProgramName
• User Id
• UserRole
• UserCity
• UserState
• UserZip
• UserCountry
• MembershipType
• Bike
• BikeType
• CheckoutKioskName
• ReturnKioskName
• DurationMins
• AdjustedDurationMins
• UsageFee
• AdjustmentsFlag
• Distance
• EstimatedCarbonOffset
• EstimatedCaloriesBurned
• CheckoutDateLocal
• ReturnDateLocal
• CheckoutTimeLocal
• ReturnTimeLocal
• TripOver30Mins
• LocalProgramFlag
• TripRouteCategory
• TripProgramName

These original columns were missing some key information, so the following columns were added:

• DOWC: Day of the week for the checkout entry.
• DOWR: Day of the week for the return entry.
• Checkout_date_time: Combines the checkout date and time into one entry.
• Return_date_time: Combines the return date and time into one entry.

The data set needed some cleaning and data type changes to make it easier to manipulate. I first changed CheckoutDateLocal and ReturnDateLocal from factor fields to date fields. I left the Checkout_date_time and Return_date_time fields as "POSIXct" "POSIXt" classes, which allowed for easy sorting, subsetting, and elimination of rows based on their date and time stamps. All of the time stamps were also converted to 24-hour time for accuracy and readability.

2 Bike Rebalancing

This report reflects the work of only part of the Heartland B-Cycle research team. It covers the data analysis side, whereas the other group of researchers worked on finding the most efficient way to rebalance the distribution of bikes among the stations. Ideally the stations would be rebalanced every day, but in a real-world situation with real costs this is not feasible. The team is working on a method that would ensure the stations are rebalanced every week. To achieve that goal they needed certain information about the data and the system as a whole, which is where our work intertwines.

2.1 Station Distance Matrices

The first item the bike balancing team asked for was the distance from station to station for the 31 original stations. The most efficient way to do this was to use the Google API, which is essentially equivalent to typing each station-to-station origin and destination into google.com/maps. To make this work efficiently, a function was written in R that afforded me all of the same options as the Google Maps site. Those options include transportation type (bus, walking, driving, biking) as well as output type (distance, time). The function was coded to identify five different types of errors (invalid request, unknown error, max elements exceeded, over query limit, request denied), each of which tells the user a specific problem to address.

The function needed to know the address of each station, specifically a four-digit street address and not just a two-street intersection. The addresses were derived from the intersections in the station list provided by Heartland B-Cycle's associates. These addresses were also converted into longitude and latitude coordinates and stored in a separate data frame for future reference; see A.1. The function and addresses were then run in RStudio to yield two different results.

2.1.1 Driving Distances

The first distance matrix to be calculated was the driving distance between all of the stations. Although users ride bikes between stations, the driving distances are still worth finding because the bike balancers will be driving among the stations to collect bikes. The final distance matrix is shown in Table 1 and the code for obtaining the matrix is located in A.2. To keep the matrix readable, the kiosk names were replaced with an index value. To identify a station from its index, please see the stations table in A.1.
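The request-and-parse step described in 2.1 can be sketched roughly as follows. This is a minimal illustration, not the function from A.2: the helper names `build_distance_url` and `parse_distance_response` are hypothetical, the API key is a placeholder, and the sketch assumes the Google Distance Matrix web service, whose mode names are "driving", "walking", "bicycling" and "transit" and whose top-level status values include the five errors listed above. No network call is made; the parser takes a response that has already been converted to a nested R list (for example with a JSON package).

```r
# Hypothetical helpers; base R only, no request is actually sent.
build_distance_url <- function(origins, destinations,
                               mode = "driving", key = "YOUR_API_KEY") {
  # Mode names assumed from the Distance Matrix web service.
  stopifnot(mode %in% c("driving", "walking", "bicycling", "transit"))
  # Several addresses are joined with "|" and percent-encoded.
  enc <- function(x) utils::URLencode(paste(x, collapse = "|"), reserved = TRUE)
  paste0("https://maps.googleapis.com/maps/api/distancematrix/json",
         "?origins=", enc(origins),
         "&destinations=", enc(destinations),
         "&mode=", mode,
         "&key=", key)
}

# Convert an already-parsed response (nested list) into a numeric matrix:
# rows are origins, columns are destinations.
parse_distance_response <- function(resp, field = c("distance", "duration")) {
  field <- match.arg(field)
  # Request-level failures (INVALID_REQUEST, UNKNOWN_ERROR,
  # MAX_ELEMENTS_EXCEEDED, OVER_QUERY_LIMIT, REQUEST_DENIED) stop here.
  if (!identical(resp$status, "OK")) {
    stop("Distance Matrix request failed with status: ", resp$status)
  }
  rows <- lapply(resp$rows, function(row) {
    vapply(row$elements, function(el) {
      # Pairs without a route are recorded as NA instead of aborting.
      if (identical(el$status, "OK")) as.numeric(el[[field]]$value) else NA_real_
    }, numeric(1))
  })
  do.call(rbind, rows)
}

# Placeholder kiosk names stand in for the real street addresses.
example_url <- build_distance_url(c("10th & Cass", "10th & Dodge"),
                                  "10th & Farnam", mode = "bicycling")
```

Separating URL construction from response parsing keeps the error handling testable offline, which matters when the service enforces query limits.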
Kiosk 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 0 523 730 1188 1506 1440 617 1174 1509 1488 5083 6598 1581 4216 5975 6764 12086 12247 14490 2149 1774 1305 13885 5083 11110 7791 2149 2924 7365 1441 4677
2 920 0 420 878 983 916 1096 1526 986 964 4773 6413 1058 4695 5671 6579 12565 12725 14969 1631 2141 995 14363 4773 11589 7482 1631 2739 7055 1808 4367
3 1406 940 0 464 786 727 1366 1795 789 999 5253 6216 1327 4965 5940 6849 12834 12995 8450 1915 1944 796 14633 5253 11859 7961 1915 2569 7535 1611 4847
4 1213 747 473 0 340 964 1607 2037 558 1241 5059 5683 1569 5206 6182 7091 11246 11406 7917 1711 820 331 13044 5059 12101 7768 1711 2811 7341 487 4653
5 1310 844 804 554 0 625 1269 1699 219 903 5157 5481 1231 4868 5844 6753 7679 7840 7716 1819 1375 872 8265 5157 11762 7865 1819 2472 7438 1041 4751
6 1125 658 618 1076 516 0 1084 1513 519 729 4971 5781 1045 4683 5658 6567 12552 12713 8015 1633 1674 1193 14351 4971 11577 7679 1633 2299 7253 1341 4565
7 705 1161 1374 1610 1271 1205 0 436 1275 1253 5379 6687 1347 2810 5237 6853 12175 12335 14579 2162 2430 1927 13973 5379 11199 8088 2162 3013 7661 2096 4973
8 1133 1569 1775 2024 1686 2111 417 0 1689 1942 5794 6797 1761 2374 4801 6920 12042 12203 9031 2123 2844 2342 13841 5794 10072 8502 2123 3103 8076 2511 5388
9 1755 1288 1248 780 679 630 1714 2276 0 679 5601 5262 1008 4677 5620 6529 7460 7620 7496 2263 1597 1327 8046 5601 11572 8310 2263 2249 7883 1264 5195
10 1294 827 787 1246 685 169 1253 1682 688 0 5140 5667 547 4216 5159 6068 11985 12145 7901 1802 1844 1362 13783 5140 11111 7849 1802 2019 7422 1510 4734
11 5493 5027 5233 5691 6009 5943 5853 6286 6013 5991 0 14491 6085 9376 11015 11924 13032 13192 15436 6204 7909 5808 14830 0 16270 9689 6204 8083 9296 7576 3753
12 7862 6644 6604 5868 5763 5986 7146 6729 5541 5817 11411 0 5702 7444 1999 1863 2881 2621 2234 8852 6219 5949 2784 11411 14338 14120 8852 3787 13693 5885 11005
13 1855 1389 1349 1807 1246 731 1814 2162 1250 561 7637 5355 0 3670 4613 5521 7553 7713 7589 2364 2392 1924 8085 7637 10564 10346 2364 1681 9919 2058 7231
14 3519 4882 5095 5331 4992 4548 2802 2386 4996 4379 9149 7241 4203 0 5429 7407 12736 12896 15140 4509 6151 5648 14534 9149 7702 11857 4509 3567 11431 5817 8743
15 5949 6344 5893 6793 6455 5275 5232 4815 6458 5106 10611 2058 4991 5463 0 1866 4569 4729 3763 6939 6873 6468 4313 10611 12378 13320 6939 3358 12893 6540 10205
16 7308 6842 6802 7260 6699 6184 7544 7128 6703 6014 11609 2108 5900 7642 1885 0 4139 3117 2520 7817 7782 7377 3070 11609 14536 14318 7817 4267 13891 7448 11203
17 15360 15086 15299 8194 15197 14752 15066 14649 15200 14583 16960 3018 14407 15304 4560 3236 0 466 736 16071 8544 8275 532 16960 22198 21432 16071 6112 21039 8211 18326
18 13480 13206 13419 8237 8131 12872 13186 12769 7909 12703 15080 2616 12527 13424 4489 3105 365 0 605 14191 8587 8317 400 15080 20318 19552 14191 6155 19159 8254 16446
19 10292 8807 8766 8298 8193 8148 9576 9159 7970 7979 15686 2430 7864 14031 3831 2507 966 605 0 11282 8649 8379 557 15686 16920 20159 11282 6216 19765 8315 17052
20 1510 1258 1241 1699 2240 1967 2182 2125 2244 2222 5827 8922 2316 4499 6926 7837 13606 13767 16011 0 2285 1816 15405 5827 12722 8536 0 3810 8109 1952 5421
21 1512 1046 772 539 1143 1498 2137 2573 1097 1771 5358 6222 2104 5736 6716 7625 11582 11743 13987 2010 0 234 13381 5358 12630 8067 2010 3340 7640 370 4952
22 1353 886 612 379 656 1339 1978 2414 876 1611 5199 6062 1944 5577 6557 7466 11375 11535 13779 1851 496 0 13173 5199 12471 7908 1851 3181 7481 162 4793
23 15187 14913 15125 8848 8742 14579 14892 14476 8520 14409 16786 2980 14234 15131 4381 3057 765 400 557 15897 9198 8929 0 16786 22025 21259 15897 6766 20865 8865 18152
24 5493 5027 5233 5691 6009 5943 5853 6286 6013 5991 0 14491 6085 9376 11015 11924 13032 13192 15436 6204 7909 5808 14830 0 16270 9689 6204 8083 9296 7576 3753
25 11942 11668 11881 12117 11779 11334 10596 10179 11782 11165 21591 14027 10989 7824 12238 14193 19522 19682 21926 12721 12937 12435 21320 21591 0 18501 12721 10353 18075 12604 21185
26 7411 6945 7151 7610 7928 7861 7771 8205 7931 7909 9855 13676 8003 11294 12933 13842 18708 18869 21112 8122 8196 7726 20507 9855 18896 0 8122 10001 642 7862 7159
27 1510 1258 1241 1699 2240 1967 2182 2125 2244 2222 5827 8922 2316 4499 6926 7837 13606 13767 16011 0 2285 1816 15405 5827 12722 8536 0 3810 8109 1952 5421
28 3762 3296 3256 2803 2465 2638 3858 3441 2468 2468 8063 3528 2354 4096 3358 4267 5726 5887 5762 4271 3621 3352 6258 8063 10990 10772 4271 0 10345 3288 7657
29 7436 6970 7176 7634 7952 7886 7796 8229 7956 7934 9399 13700 8027 11318 12958 13867 18252 18412 20656 8147 8221 7751 20050 9399 19072 460 8147 10026 0 7887 6672
30 1461 994 720 487 826 1454 2093 2523 1045 1727 5307 6168 2055 5692 6668 7576 11266 11426 13670 1959 387 182 13064 5307 12586 8015 1959 3296 7589 0 4901
31 4881 4415 4621 5079 5397 5331 5241 5675 5401 5379 3384 11145 5473 8764 10403 11312 14936 15097 17340 5592 5666 5196 16735 3384 15658 6991 5592 7471 6646 5332 0

Table 1: Station Driving Distances (in meters)

2.1.2 Biking Distances

The other distance matrix calculated was the biking distance between all of the stations. These distances give us vital information about how far users travel and are especially useful for identifying where new stations might need to be built, based on the usage of each pair of stations. The matrix also lets a person know how far they would have to travel from station to station using the system. The matrix is shown in Table 2 and the code for obtaining the matrix is located in A.3.
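Both matrices are directional: the entry in row i, column j is the distance from kiosk i to kiosk j, and it need not equal the entry for the return leg. As a small illustration, the sketch below copies the values for kiosks 1 to 3 from Tables 1 and 2 and compares them. The variable names are hypothetical and only this 3 x 3 corner of each table is used.

```r
kiosks <- as.character(1:3)

# Driving distances in meters, kiosks 1-3 (values copied from Table 1).
driving <- matrix(c(   0, 523, 730,
                     920,   0, 420,
                    1406, 940,   0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(from = kiosks, to = kiosks))

# Biking distances in meters, kiosks 1-3 (values copied from Table 2).
biking <- matrix(c(  0, 523, 730,
                   792,   0, 265,
                   743, 276,   0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(from = kiosks, to = kiosks))

# Outbound distance minus return distance for every pair of kiosks.
driving_gap <- driving - t(driving)

# Extra meters a driver covers compared with a cyclist on the same pair.
drive_minus_bike <- driving - biking

# TRUE when at least one pair differs by direction.
is_directional <- any(driving != t(driving))
```

For example, driving from kiosk 1 to kiosk 2 is 523 m but the return trip is 920 m, so any routing built on these tables should index them as from-to rather than treating them as symmetric.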
Kiosk 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 0 523 730 1188 1295 999 615 1174 1516 1169 5622 11254 1479 3519 5835 7982 9805 9920 9364 2129 1599 1506 10015 5622 10670 10679 2129 3299 8804 1404 3637
2 792 0 265 724 831 580 1094 1582 1051 749 5173 10805 1058 3859 6174 8321 9340 9455 8899 1407 1135 1041 9550 5173 11010 10230 1407 2834 8355 940 3188
3 743 276 0 464 568 727 1366 1853 789 999 5780 11413 1325 4130 5909 8056 9075 9190 8634 1964 872 777 9285 5780 11281 10838 1964 2569 8963 677 3796
4 1213 747 473 0 339 965 2042 2092 558 1238 6023 11655 1564 4369 6148 8295 8707 8867 8873 2207 820 316 9524 6023 11519 11080 2207 2807 9205 430 4038
5 1179 842 568 342 0 626 1705 1755 220 901 6128 11760 1227 4032 5811 7957 8369 8530 8536 2311 1314 655 9187 6128 11182 11185 2311 2470 9310 775 4143
6 995 658 618 854 517 0 1301 1351 519 169 5399 11032 1055 3628 5639 7786 8805 8920 8364 1633 1487 1172 9015 5399 10778 10456 1633 2299 8582 1292 3415
7 615 1141 1347 1610 1272 868 0 488 1275 1038 5635 11267 1348 2765 5220 7367 9560 9676 9238 2142 2420 1927 9889 5635 9916 10692 2142 3054 8817 2021 3650
8 1133 1569 1775 2024 1687 1283 417 0 1689 1452 5596 11228 1383 2345 4976 7123 9609 9591 8994 2103 2835 2342 9645 5596 9496 10653 2103 3103 8778 2450 3611
9 1404 1063 789 780 339 408 1482 1532 0 463 6347 11979 986 3820 5589 7736 8149 8310 8314 2530 1535 1096 8965 6347 10960 11404 2530 2249 9529 1211 4362
10 1164 827 787 1023 686 169 1244 1294 688 0 5569 11201 776 3571 5360 7507 8525 8641 8085 1802 1657 1341 8736 5569 10722 10626 1802 2019 8751 1462 3584
11 5709 5355 5286 5744 6128 6286 5857 5641 6348 6559 0 8035 6170 7986 11469 13615 14634 14750 14194 3767 6018 6251 14844 0 16412 7457 3767 8128 7343 5960 2113
12 9848 9494 9424 9883 10266 10424 9996 9780 10487 10697 8065 0 10308 12125 15607 17754 22207 18888 18332 7905 10156 10390 18983 8065 20550 578 7905 12267 1037 10099 7598
13 1477 1389 1349 1583 1241 731 1331 1381 1243 561 7130 12763 0 2809 5063 7210 8229 8344 7788 3314 2319 1896 8439 7130 10882 12188 3314 1722 10313 2015 5146
14 3380 3920 4127 4394 4052 3542 2765 2386 4055 3373 7981 13613 2834 0 5488 7417 10337 9972 9375 4489 5180 4708 10025 7981 8070 13038 4489 3729 11164 4801 5996
15 5833 6367 6100 6110 5768 5719 5218 4798 5771 5569 10394 16026 5246 5464 0 2147 4980 4615 4018 6901 6847 6424 4669 10394 10679 15451 6901 3340 13576 6542 8409
16 7874 8408 8142 8152 7810 7760 7259 6839 7813 7611 12435 18067 7288 7377 2042 0 3482 3117 2520 8942 8889 8465 3183 12435 13051 17492 8942 5382 15617 8584 10450
17 10561 9701 9427 8958 9095 9045 9946 9526 8632 8895 15122 20754 8780 10064 4729 3236 0 466 736 11629 9314 9044 641 15122 15737 21637 11629 6880 21523 9388 13137
18 10429 9234 8960 8491 8628 8578 9814 9394 8165 8428 14517 20149 8313 9932 4597 3105 365 0 605 10701 8847 8577 509 14517 15606 19574 10701 6413 17700 8921 12532
19 9832 9435 9161 8692 8829 8779 9217 8797 8366 8629 14393 20025 8514 9335 4000 2507 970 605 0 10900 9048 8778 743 14393 15008 19450 10900 7340 17575 9122 12408
20 1546 1539 1469 1928 2311 2469 2365 2148 2532 2742 3767 9399 2354 4494 7652 9799 10818 10933 10377 0 2201 2435 11028 3767 13137 8824 0 4312 6949 2144 1782
21 1512 1046 894 549 889 1516 2139 2573 1109 1789 6018 11650 2115 4918 6699 8846 9258 9419 9424 2201 0 234 10075 6018 12069 11075 2201 3358 9200 755 4033
22 1353 886 612 316 655 1283 1980 2414 875 1555 6252 11885 1881 4759 6465 8612 9024 9185 9190 2436 694 0 9841 6252 11909 11310 2436 3125 9435 744 4268
23 10501 9631 9357 8888 9025 8975 9886 9466 8562 8825 15062 20694 8710 10004 4669 3183 874 509 666 11569 9244 8974 0 15062 15677 20119 11569 8009 18244 9318 13077
24 5709 5355 5286 5744 6128 6286 5857 5641 6348 6559 0 8035 6170 7986 11469 13615 14634 14750 14194 3767 6018 6251 14844 0 16412 7457 3767 8128 7343 5960 2113
25 10530 11070 11277 11519 11182 10778 9915 9495 11184 10708 16332 21964 10169 7994 11778 13707 16627 16262 15665 11598 12330 11836 16316 16332 0 21389 11598 10376 19514 11951 14347
26 9272 8918 8849 9307 9691 9849 9420 9204 9911 10122 7489 578 9733 11549 15032 17179 21631 18313 17757 7330 9581 9814 18408 7489 19975 0 7330 11691 460 9523 7022
27 1546 1539 1469 1928 2311 2469 2365 2148 2532 2742 3767 9399 2354 4494 7652 9799 10818 10933 10377 0 2201 2435 11028 3767 13137 8824 0 4312 6949 2144 1782
28 3422 3084 2810 2820 2478 2428 3525 3537 2480 2279 8368 14000 2164 4185 3340 5487 6506 6621 6065 4551 3556 3133 6716 8368 10458 13425 4551 0 11550 3252 6383
29 8876 8521 8452 8910 9294 9452 9024 8807 9514 9725 7294 1038 9336 11153 14635 16782 17801 17916 17360 6933 9184 9418 18011 7294 19578 460 6933 11294 0 9126 6827
30 1417 951 677 444 775 1403 2044 2478 995 1676 5960 11592 2002 4823 6586 8733 10526 10687 9311 2144 1423 535 11321 5960 11974 11017 2144 3246 9142 0 3975
31 3724 3370 3301 3759 4143 4301 3873 3656 4363 4574 2113 7568 4185 6001 9484 11631 12649 12765 12209 1782 4033 4266 12860 2113 14427 6991 1782 6143 6876 3975 0

Table 2: Station Biking Distances (in meters)

2.2 Station Counts

In order for the bike rebalancing team to develop a model or system for balancing the bike distribution, they needed to know how many bikes were at a station at a given time. They wanted to be able to type in any date in the system's life and know exactly how many bikes were at each station at that date and time. What the team was asking for sounded simple: enter a date and count how many bikes were at each station. It proved to be a much more complex task because there was no data on the number of bikes at each station at the start of the period, and because more bikes and stations were added as the system gained usage. My initial idea was to start at the date the user wanted and "crawl" through the data set, counting each unique bike's first entry in the system and tallying it at its respective station.
However, this idea was problematic given the structure of the data and how much it changed throughout the life of the system. Instead I split the data set into two parts: the entries before the user date and the entries after it. I only needed the entries before the user date, because those show where the bikes were up to that date. The crawling idea was then used, but instead of starting at the user date and moving forward in time I started at the user date and crawled backwards. When a bike first came up, its station was identified and gained a tally toward its total bikes; this process was repeated until every bike ID in the system had been identified. This method eliminated the need to account for new stations and new bikes being added to the system.

2.2.1 Initial Station Counts

It was important to look at the initial bike counts at the respective stations, so I chose an arbitrary date at which to get the station counts. The date I chose was 2015-07-20, and I used the idea described in 2.2 to obtain Table 3. To see the code used to create the table please see A.4.
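The backward crawl described in 2.2 can be sketched in base R as follows. This is a minimal illustration, not the code in A.4: the data frame, its column names (Bike, ReturnKioskName, ReturnDateTime) and the toy rows are assumptions, and a bike's location is assumed to be the station of its most recent return. Stations holding no bikes simply do not appear in the result.

```r
# Hypothetical trip log; the column names are assumptions, not the B-Cycle schema.
trips <- data.frame(
  Bike = c(11, 11, 12, 12, 13),
  ReturnKioskName = c("10th & Cass", "Aksarben Drive", "10th & Cass",
                      "10th & Cass", "dead station"),
  ReturnDateTime = as.POSIXct(
    c("2015-07-01 10:00:00", "2015-07-19 12:00:00", "2015-07-02 09:00:00",
      "2015-07-21 08:00:00", "2015-07-10 15:00:00"),
    tz = "UTC"),
  stringsAsFactors = FALSE
)

station_counts_at <- function(trips, when) {
  # Keep only the entries up to the requested date/time.
  before <- trips[trips$ReturnDateTime <= when, ]
  # Crawl backwards: newest return first.
  before <- before[order(before$ReturnDateTime, decreasing = TRUE), ]
  # The first time a bike comes up is its most recent known station.
  latest <- before[!duplicated(before$Bike), ]
  # Tally the bikes per station.
  as.data.frame(table(KioskName = latest$ReturnKioskName),
                responseName = "Count", stringsAsFactors = FALSE)
}

counts <- station_counts_at(trips, as.POSIXct("2015-07-20 21:00:00", tz = "UTC"))
print(counts)
```

Because the date/time is an argument, the same function can be called for any moment in the life of the system, and the counts should always sum to the number of bikes seen so far.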
KioskName                        Count
10th & Cass                      3
10th & Dodge                     4
10th & Farnam                    7
11th & Jackson                   13
13th & Howard                    3
14th & Douglas                   7
14th & Fahey                     3
1516 Cuming St                   8
15th & Farnam                    0
15th & Howard                    3
16th & Douglas                   4
1819 Farnam                      2
1st & Broadway                   3
20th & Dodge                     2
24th & Lake                      2
50th & Underwood                 2
62nd & Dodge                     1
66th & Center                    2
67th & Frances                   6
67th & Pine                      4
711 S Main St                    0
7th & Jones                      6
9th & Jones                      1
Aksarben Drive                   8
Ameristar                        4
Ben’s House                      0
Bike Share HQ                    0
Bike Union                       0
Bob Kerrey Pedestrian Bridge     11
Broadway & Main                  6
College of Saint Mary            0
Lewis & Clark Landing            6
Midtown Crossing: 32nd & Farnam  5
Pearl St & Willow Ave            2
Shop                             2
The Durham Museum                5
Tip Top Building                 0
Tom Hanafan River’s Edge Park    1
dead station                     14
Table 3: Bike Station count on 2015-07-20 21:00:00

Looking at Table 3, a couple of features jump out at me. The first is the number of bikes (14) at the dead station. I also wanted to make sure the number of bikes added up to 150, because there are 150 unique bikes in the system; this is true in this case.

2.2.2 Station Counts on any Date/Time

I have shown that we can find the station counts on a single date, but the really important question was whether this process could be repeated easily for different dates and times. The bike rebalancing team needed a way to check the station counts on any given date and at any time. Their motivation is to better understand what each station looks like at the end of a day, which would help them know which stations have too many bikes, which have too few, and which are close enough to each other to trade bikes. This did not turn out to be a big problem, thanks to the lubridate package in R and its ability to easily subset dates that contain a time of day. Table 4 shows the counts of the stations on 2015-07-20 21:00:00. The code for creating the table is the same as the code used for 2.2.1, but because the bike rebalancing team needed to use
this code, a function was created, which can be found in A.5.

KioskName                        Count
10th & Cass                      3
10th & Dodge                     4
10th & Farnam                    7
11th & Jackson                   13
13th & Howard                    3
14th & Douglas                   7
14th & Fahey                     3
1516 Cuming St                   9
15th & Farnam                    0
15th & Howard                    3
16th & Douglas                   4
1819 Farnam                      2
1st & Broadway                   3
20th & Dodge                     2
24th & Lake                      2
50th & Underwood                 2
62nd & Dodge                     1
66th & Center                    2
67th & Frances                   6
67th & Pine                      4
711 S Main St                    0
7th & Jones                      6
9th & Jones                      1
Aksarben Drive                   8
Ameristar                        4
Ben’s House                      0
Bike Share HQ                    0
Bike Union                       0
Bob Kerrey Pedestrian Bridge     10
Broadway & Main                  6
College of Saint Mary            0
Lewis & Clark Landing            6
Midtown Crossing: 32nd & Farnam  5
Pearl St & Willow Ave            2
Shop                             2
The Durham Museum                5
Tip Top Building                 0
Tom Hanafan River’s Edge Park    1
dead station                     14
Table 4: Station Counts on 2015-07-20 21:00:00

The use of the function, along with the rationale of inverting the data set according to date/time, led to this efficient method for obtaining the counts of each and every station at any given date/time. Before this method was discovered the process was extremely complicated because of the inconsistencies in the data set itself. This was the first time I found major discrepancies in the entry of the data as well as in the available destinations of a bike. This was the point when I discovered “ghost” stations like “Ben’s House” and “Shop”. These emerging stations, which were not part of the original set of stations, make the data analysis portion more challenging.
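One way such ghost stations could be surfaced is to compare the station names seen in the trip records against a list of official kiosks. The sketch below is only an illustration: both vectors are small hypothetical subsets, and the existence of an official kiosk list is an assumption.

```r
# Illustrative subset of official kiosks (assumed to be available as a list).
official_kiosks <- c("10th & Cass", "Bob Kerrey Pedestrian Bridge", "Aksarben Drive")

# Illustrative station names as they might appear in the trip records.
logged_kiosks <- c("10th & Cass", "Shop", "Bob Kerrey Pedestrian Bridge",
                   "Ben's House", "Shop")

# Names that appear in the records but not in the official list.
ghost_stations <- setdiff(logged_kiosks, official_kiosks)
print(ghost_stations)
```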
3 Data Analysis

This report now turns to data analysis. The trip information obtained from B-Cycle includes a large amount of data that could hold the key to better understanding the system’s usage and ultimately lead to a more efficient system. Over the next few subtopics I will look into different aspects of the data, and in each subtopic I will discuss information from the data that I find interesting.

3.1 General Information

When looking at a large amount of data it is always wise to start with a general summary. This summary should include the basic summary statistics (mean, median, mode, standard deviation, range) as well as how each variable is stored. I produced a basic summary, which yielded the following initial themes (for the code used to produce the summary please see A.6):

1. Bob Kerrey Pedestrian Bridge is by far the most used station, with 7,000 more checkouts and returns than the second place station (10th and Farnam).
2. Saturday and Sunday were the days with the most usage across all of the stations, accounting for 40% of the trips.
3. There was basically an even split between trips over 30 minutes and trips under 30 minutes (55% to 45%).
4. The average number of calories burned was 181.5, which leads to the question of how that is calculated.
5. The 24-Hour Pass was far and away the most used Membership Type, accounting for 72% of the trips.
6. Most of the trips have an adjustment flag tagged on them (97%).
7. The non-RFID card member is the most common user role, accounting for 58% of the trips.
8. There are a few different user program names in the system other than Heartland and Omaha B-Cycle (53).

These are just some general findings from looking at the summary and describe outputs. The outputs raise more questions than they answer. They spark the “why” questions.
These outputs also served as motivation for numerous upcoming tables and plots.

3.2 Checkouts vs Returns

When looking at this data set it was clear that an essential part of the data analysis was going to be looking into how each station performed in the system. Table 5 shows the number of checkouts and returns for each station; for the code used to create the table please see A.7.
Kiosk Name                       Checkouts  Returns  Difference
10th & Cass                      954        960      6
10th & Dodge                     1228       1249     21
10th & Farnam                    3522       3522     0
11th & Jackson                   3109       3129     20
13th & Howard                    1627       1641     14
14th & Douglas                   970        958      -12
14th & Fahey                     994        990      -4
1516 Cuming St                   1328       1302     -26
15th & Farnam                    265        275      10
15th & Howard                    627        614      -13
16th & Douglas                   745        747      2
1819 Farnam                      474        459      -15
1st & Broadway                   177        179      2
20th & Dodge                     523        502      -21
24th & Lake                      121        113      -8
50th & Underwood                 281        280      -1
62nd & Dodge                     410        409      -1
66th & Center                    475        460      -15
67th & Frances                   1669       1676     7
67th & Pine                      654        644      -10
711 S Main St                    151        155      4
7th & Jones                      234        229      -5
9th & Jones                      283        287      4
Aksarben Drive                   1395       1385     -10
Ameristar                        511        509      -2
Ben’s House                      0          1        1
Bike Share HQ                    5          66       61
Bike Union                       0          5        5
Bob Kerrey Pedestrian Bridge     10288      10218    -70
Broadway & Main                  164        166      2
College of Saint Mary            79         80       1
dead station                     0          130      130
Lewis & Clark Landing            2755       2762     7
Midtown Crossing: 32nd & Farnam  615        600      -15
Pearl St & Willow Ave            90         94       4
Shop                             0          8        8
The Durham Museum                435        408      -27
Tip Top Building                 0          5        5
Tom Hanafan River’s Edge Park    2256       2197     -59
Table 5: Station Checkouts and Returns

An additional column was added to the table to show the difference between returns and checkouts at each station (returns minus checkouts). A negative number in the “Difference” column means there were more checkouts than returns at that station, while a positive number means there were more returns than checkouts. Because these are two very different situations, it was worth splitting the data frame into two separate tables: one for stations with more checkouts than returns (Table 6) and one for stations with more returns than checkouts (Table 7).
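The checkout and return tallies and the split by sign can be sketched in base R as follows. This is a minimal illustration rather than the code in A.7; the toy trips and the column names CheckoutKioskName and ReturnKioskName are assumptions.

```r
# Hypothetical trips; station names are placeholders.
trips <- data.frame(
  CheckoutKioskName = c("A", "A", "B", "C", "A"),
  ReturnKioskName   = c("B", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# Use one common set of stations so both tallies line up row by row.
stations <- sort(unique(c(trips$CheckoutKioskName, trips$ReturnKioskName)))
checkouts <- table(factor(trips$CheckoutKioskName, levels = stations))
returns   <- table(factor(trips$ReturnKioskName, levels = stations))

station_summary <- data.frame(
  KioskName = stations,
  Checkouts = as.integer(checkouts),
  Returns   = as.integer(returns),
  stringsAsFactors = FALSE
)
# Difference is returns minus checkouts, so negative means more checkouts.
station_summary$Difference <- station_summary$Returns - station_summary$Checkouts

more_checkouts <- station_summary[station_summary$Difference < 0, ]
more_returns   <- station_summary[station_summary$Difference > 0, ]
print(station_summary)
```

Stations whose difference is exactly zero fall into neither subset, which matches how 10th & Farnam is absent from both Table 6 and Table 7.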
Row  Kiosk Name                       Checkouts  Returns  Difference
6    14th & Douglas                   970        958      -12
7    14th & Fahey                     994        990      -4
8    1516 Cuming St                   1328       1302     -26
10   15th & Howard                    627        614      -13
12   1819 Farnam                      474        459      -15
14   20th & Dodge                     523        502      -21
15   24th & Lake                      121        113      -8
16   50th & Underwood                 281        280      -1
17   62nd & Dodge                     410        409      -1
18   66th & Center                    475        460      -15
20   67th & Pine                      654        644      -10
22   7th & Jones                      234        229      -5
24   Aksarben Drive                   1395       1385     -10
25   Ameristar                        511        509      -2
29   Bob Kerrey Pedestrian Bridge     10288      10218    -70
34   Midtown Crossing: 32nd & Farnam  615        600      -15
37   The Durham Museum                435        408      -27
39   Tom Hanafan River’s Edge Park    2256       2197     -59
Table 6: Stations with more Checkouts than Returns

Row  Kiosk Name                       Checkouts  Returns  Difference
1    10th & Cass                      954        960      6
2    10th & Dodge                     1228       1249     21
4    11th & Jackson                   3109       3129     20
5    13th & Howard                    1627       1641     14
9    15th & Farnam                    265        275      10
11   16th & Douglas                   745        747      2
13   1st & Broadway                   177        179      2
19   67th & Frances                   1669       1676     7
21   711 S Main St                    151        155      4
23   9th & Jones                      283        287      4
26   Ben’s House                      0          1        1
27   Bike Share HQ                    5          66       61
28   Bike Union                       0          5        5
30   Broadway & Main                  164        166      2
31   College of Saint Mary            79         80       1
32   dead station                     0          130      130
33   Lewis & Clark Landing            2755       2762     7
35   Pearl St & Willow Ave            90         94       4
36   Shop                             0          8        8
38   Tip Top Building                 0          5        5
Table 7: Stations with more Returns than Checkouts

Looking at Tables 6 and 7, nothing really jumps off the page as far as the numbers go, meaning no station has an outrageous number of returns compared to checkouts or vice versa. We can also see the few stations in the system that do not allow checkouts, which are interesting for seeing how many bikes travel through them. In this case the dead station, Shop, Tip Top Building, Bike Union and Ben’s House did not account for many entries; together they accounted for only 0.38% of all trips.
The tables can be hard to read, and a bar graph makes it much easier to see their distinctive features, so Figure 1 was created; it covers 2014-01-11 to 2016-08-31. For the code used to create it please see A.7.
[Figure 1 plot: bar chart with axes KioskName and Value (0 to 10000); legend “variable”: Checkouts, Returns]
Figure 1: Checkouts vs Returns per each Kiosk Station

Figure 1 was extremely useful because it gave me a real sense of which stations were being used the most. Of course, the plot then made me want to see the differences between checkouts and returns in the same type of visual, which is why Figure 2 was created. When looking at this figure it is important to remember that red bars indicate stations with more checkouts than returns, whereas blue bars indicate stations with more returns than checkouts; the absence of a bar means the station had the same number of checkouts as returns. For the code used to create the difference plot please see A.7.
[Figure 2 plot: bar chart with axes KioskName and Difference; legend “type”: Negative, Neutral, Positive]
Figure 2: Difference between Checkouts and Returns for each station over the lifetime of the data set.

3.3 Day of the Week and Time of day Analysis

After looking at checkouts vs returns I wondered how the system was used according to the day of the week as well as the time of day. In this section I look into the system usage for each day of the week, for each time of day, and the way those two interact.

3.3.1 Day of the Week

First, I wanted to look at how the data was split according to the day of the week, and because of the additional column I had made in the data set this was rather straightforward. Figure 3 shows the number of uses on each day of the week, with the percentage of the total number of trips shown at the end of each bar. Please see A.8 for the code used to create the figure.
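A day-of-the-week column and the per-day percentages can be derived along the following lines. This is a minimal base R sketch, not the code in A.8; the checkout times are made up, and the report itself mentions lubridate, which would serve equally well.

```r
# Hypothetical checkout times (2016-08-26 was a Friday).
checkout <- as.POSIXct(
  c("2016-08-26 17:30:00", "2016-08-27 10:00:00",
    "2016-08-27 15:00:00", "2016-08-28 13:00:00"),
  tz = "UTC"
)

day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), independent of locale.
day_of_week <- factor(day_names[as.POSIXlt(checkout)$wday + 1], levels = day_names)

day_counts <- table(day_of_week)
day_pct <- round(100 * prop.table(day_counts))
print(day_pct)
```

Keeping the days as a factor with all seven levels means days with no trips still show up with a count of zero.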
[Figure 3 plot: bar chart with axes DayoftheWeek (Monday to Sunday) and Count; each bar labeled with its percentage of all trips]
Figure 3: Usage per day of the week with overall percentage

As I looked at the plot, the weekend clearly jumped out as the time when the system is used the most. If I think of Friday, Saturday and Sunday as the weekend, then it accounts for 54% of the trips. The other days are very comparable, which led me to believe that the system was being used more for casual riding than for commuting to and from work; if people were using the system mainly for commuting, we would see a higher percentage of the trips on the weekdays.

3.3.2 Time of the Day

After seeing the day of the week distribution I wondered how system usage looks throughout a day, so Figure 4 was created. The figure’s x-axis is the time of day based on the hour of the checkout (in military time) and the y-axis is the percentage of trips. As I expected, the middle of the day is the most common time for the system to be used, more specifically between noon and 7pm. The code used to create the plot can also be found in A.8.

[Figure 4 plot: x-axis Hour of the Day; y-axis Percentage of All Trips]
Figure 4: Usage throughout the day with hour time splits

Figure 4 was interesting, but it led me to wonder how the system behaves during certain blocks of time. I know time is a continuous scale, but when thinking about a ’normal’ day there are certain blocks of time a person is concerned with, such as rush hour and lunch time. The next plot was made with the following time splits in mind:

• 12(midnight) - 6am
• 6am - 11am
• 11am - 2pm
• 2pm - 6pm
• 6pm - 12(midnight)

Creating Figure 5 took quite a bit of coding to get the correct time splits; the code used to create the figure can be found in A.8. The figure shows that 2pm-6pm is the most used time slot, which might have been obvious from the last couple of plots. Remembering that the weekend had the most use, this suggests people were probably using the system casually when they went out, which usually occurs in the evening hours.

[Figure 5 plot: bar chart of Count for the time groups 12(Midnight)-6am, 6am-11am, 11am-2pm, 2pm-6pm and 6pm-12(Midnight); each bar labeled with its percentage]
Figure 5: Time grouping usage distribution

Looking at the day of the week and time of day figures separately made me want to see how they interact with each other, so Figure 6 was created. This figure shows the time of day distribution for each individual day of the week. The plot is very interesting to look at because it reveals some ideas that were difficult to see in the other plots. One of the most interesting things I saw in this plot was where the individual days of the week peak. Thursday and Wednesday seemed to peak earlier in the day than the rest of the days. The graph is open to interpretation but can be very useful in better understanding the system.
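The time splits listed above can be produced with `cut()` on the checkout hour. The sketch below is an illustration, not the code in A.8; the checkout times are made up, and treating each block as including its start hour and excluding its end hour is an assumption.

```r
# Hypothetical checkout times covering every block and its boundaries.
checkout <- as.POSIXct(
  c("2016-08-27 05:59:00", "2016-08-27 06:00:00", "2016-08-27 13:30:00",
    "2016-08-27 14:00:00", "2016-08-27 17:45:00", "2016-08-27 18:00:00",
    "2016-08-27 23:10:00"),
  tz = "UTC"
)

# Hour of the checkout, 0 to 23.
checkout_hour <- as.POSIXlt(checkout)$hour

# right = FALSE makes each block closed on the left: [0,6), [6,11), and so on.
time_block <- cut(
  checkout_hour,
  breaks = c(0, 6, 11, 14, 18, 24),
  right = FALSE,
  labels = c("12(Midnight)-6am", "6am-11am", "11am-2pm",
             "2pm-6pm", "6pm-12(Midnight)")
)

block_counts <- table(time_block)
print(block_counts)
```

Cross-tabulating the block with a day-of-the-week factor, for example `table(day_of_week, time_block)`, would give the kind of interaction that Figure 6 displays.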
[Figure 6 plot: x-axis Time of Day; y-axis Checkouts; one line per day of the week (legend “DOWC”)]
Figure 6: Usage plot according to time of day and grouped by day of the week.

3.4 Distribution of Membership Types

The trips are recorded with many different variables, one of which is the type of membership the user has when they check out a bike. It was worth looking into the distribution of the membership types in the data set. This information could prove very important to the administration team when they look at the price of each membership relative to its use in the system. Figure 7 shows the distribution of membership type. This figure is a simplification of the many different membership types in the data set. The data set is a running tally of trips, and as the administration team updated the system, membership types changed names and new ones were created. To ensure I combined the different types correctly, a member of the administration provided the groupings. The code used to create the figure can be found in A.9.

[Figure 7 plot: bar chart with axes MembershipType (FUN, Annual and Heartland Pass, 30-Day, Other) and Count; each bar labeled with its percentage]
Figure 7: Usage according to Membership Type
It is clear the “FUN” pass is by far the most common membership type, with almost 75% of the trips. The FUN pass allows a user to use the system for 24 hours. Given that I have already seen that a majority of the system’s use is casual, it makes sense that this pass is used the most and that there are not many annual members. I thought there might have been more 30-day memberships given the summer months, but that is clearly not the case in this data set. Naturally, I wanted to see how membership types were used according to the day of the week and the time of day, which is shown in Figure 8; the code for the figure can be found in A.9.

[Figure 8 plot: panels FUN, Annual and Heartland Pass, 30-Day, Other; x-axis Time of Day; y-axis Checkouts; one line per day of the week (legend “DOWC”)]
Figure 8: Time of day usage plot split by day of the week and facetted by membership type

A couple of ideas stand out in Figure 8. The first two lines that jump off the page are in the “FUN” membership type: Saturday and Sunday. Not only are these two the most used combinations, but they also show a huge spike at around 3pm. This shows that a majority of “FUN” users were using the system during the weekend. The next two lines that catch the eye are in the “Other” panel, for Monday and Tuesday. There was a noticeable spike in usage from 12:30pm-4pm on Tuesday and from 3pm-5:30pm on Monday. It was also interesting to see that annual members seemed to use the system more on weekdays than on weekends, perhaps indicating that annual members use the system to commute to and from work.

3.5 Station Pairs Usage

The B-Cycle system allows users to check out a bike and then return it to any station in the system (assuming there is an open slot).
These trips can be either one-way or a loop, meaning the user returns the bike to the same station they checked it out of. This section mainly focuses on the one-way trips, because the main questions I wanted to answer were, “What stations are used together frequently, and what locations of Omaha in general are used together?” These questions can be answered by looking at a table giving every combination of stations and their counts, but this can be cumbersome. Instead, it is much easier to look at a map with the stations plotted on it and lines showing the frequencies between pairs of stations. Figure 9 is a line frequency plot where the thickness of the line is an indication of how many
trips there were between that pair of stations: the thicker the line, the more trips. This is the first plot, showing most of the stations, and the code used to produce it can be found in A.10.

Figure 9: Station Pairs usage conveyed through a line frequency plot.

Figure 9 shows there is a clustering of usage in the downtown area. I was interested in seeing how the downtown area looked, so Figure 10 was created, which is a zoomed-in look at downtown.
Figure 10: Station pairs line frequency plot of downtown Omaha

This is the best looking and most revealing plot of all the trips taken in the downtown area. I noticed the Bob Kerrey station had the thickest lines attached to it, which makes sense when thinking back to the total number of checkouts and returns at that station. Usage could also differ by membership type, so it was worth creating a similar plot facetted by membership type, as in section 3.4. Figure 11 shows this idea, and the code used to create the plot can be found in A.11.
[Figure 11 plot: panels 30-Day, Annual and Heartland Pass, FUN, Other]
Figure 11: Station pairs line frequency plot of downtown Omaha facetted by membership type.

Figure 11 shows some interesting things about the Bob Kerrey station. The station continues to be the most dominant station in the system, but in the annual and monthly membership line frequency plots Bob Kerrey is just a normal part of the system, which provides further evidence that people with annual and monthly memberships might be using the system to travel to and from work. The FUN pass line frequency plot shows the Bob Kerrey station is used quite frequently. Since the station is close to the water and we believe FUN pass holders are the casual system users, it makes sense for the two to go together.

The line frequency plots were great for getting a general idea of how the membership types behaved as station pairs, but being able to see true quantities for the top 10 pairs could be valuable information. Table 8 was created to show the top 10 pairings along with the membership type associated with each pairing.
Start                          End                            MemType  Count
Bob Kerrey Pedestrian Bridge   Bob Kerrey Pedestrian Bridge   FUN      7550
10th & Farnam                  10th & Farnam                  FUN      1850
Lewis & Clark Landing          Lewis & Clark Landing          FUN      1504
11th & Jackson                 11th & Jackson                 FUN      1354
67th & Frances                 67th & Frances                 FUN      1243
Tom Hanafan River’s Edge Park  Tom Hanafan River’s Edge Park  FUN      905
Aksarben Drive                 Aksarben Drive                 FUN      695
10th & Dodge                   10th & Dodge                   FUN      490
Tom Hanafan River’s Edge Park  Bob Kerrey Pedestrian Bridge   FUN      468
13th & Howard                  13th & Howard                  FUN      467
Table 8: Top 10 Station Pairs with Membership Type

As shown by Table 8, Bob Kerrey dominates the usage and, along with some other stations, seems to be dominated by users doing round trips (checking out and returning the bike at the same location). In fact the only one-way trip in the top 10 was Bob Kerrey to 13th and Howard, which according to Table 2 are 11,182 meters (6.9mi) apart, which seems like an extremely long journey for a person to make. This leads one to believe it is due to the bike rebalancing act. I wanted to see the top pairs of stations excluding round trips, so a new top 10 table was created (Table 9) showing the top 10 pairs of stations with their membership type. The code for creating both tables can be found in A.12.
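A table of pair counts like the one behind Table 8 might be built as sketched below. This is an illustration under assumptions, not the code in A.12: the toy trips are made up, and the column names Start, End, MemType and Count are taken from the table headers.

```r
# Hypothetical trips between a few real station names.
trips <- data.frame(
  Start = c(rep("Bob Kerrey Pedestrian Bridge", 4),
            rep("Tom Hanafan River's Edge Park", 2),
            "11th & Jackson"),
  End = c(rep("Bob Kerrey Pedestrian Bridge", 3),
          "Tom Hanafan River's Edge Park",
          rep("Bob Kerrey Pedestrian Bridge", 3)),
  MemType = c(rep("FUN", 6), "Other"),
  stringsAsFactors = FALSE
)

# Give every trip a count of 1 and sum within each Start/End/MemType group.
pair_counts <- aggregate(Count ~ Start + End + MemType,
                         data = transform(trips, Count = 1L),
                         FUN = sum)

# Most frequent pairs first; head() keeps at most the top 10.
pair_counts <- pair_counts[order(pair_counts$Count, decreasing = TRUE), ]
top10 <- head(pair_counts, 10)
print(top10)
```

Round trips appear here as rows where Start equals End, which is why they can later be filtered out with a simple comparison of the two columns.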
    # Keep the first four columns (Start, End, MemType and Count in Table 9).
    top10nr <- trimpc[, c(1:4)]
    # Drop round trips, where the start and end stations are the same.
    top10nr <- top10nr[top10nr$Start != top10nr$End, ]
    # Sort by trip count, largest first, and keep the top 10.
    top10nr <- top10nr[order(top10nr$Count, decreasing = TRUE), ]
    top10nr <- head(top10nr, 10)
    rownames(top10nr) <- 1:10

Start                          End                            MemType  Count
Tom Hanafan River’s Edge Park  Bob Kerrey Pedestrian Bridge   FUN      468
Bob Kerrey Pedestrian Bridge   11th & Jackson                 FUN      382
Bob Kerrey Pedestrian Bridge   Tom Hanafan River’s Edge Park  FUN      369
Bob Kerrey Pedestrian Bridge   Lewis & Clark Landing          FUN      316
Bob Kerrey Pedestrian Bridge   13th & Howard                  FUN      308
11th & Jackson                 Bob Kerrey Pedestrian Bridge   Other    245
Bob Kerrey Pedestrian Bridge   10th & Farnam                  FUN      226
11th & Jackson                 Bob Kerrey Pedestrian Bridge   FUN      207
Lewis & Clark Landing          Bob Kerrey Pedestrian Bridge   FUN      184
10th & Farnam                  Bob Kerrey Pedestrian Bridge   FUN      156
Table 9: Top 10 Station Pairs with Membership Type excluding round trips

Table 9 confirms that Bob Kerrey is essential to the system, given that every top pair included Bob Kerrey. It is very surprising to me that not a single pair of stations without Bob Kerrey cracked the top 10.

3.6 Bus Route Interaction

The city of Omaha has a bus system, just like many big cities, which is used by many different people. Heartland B-Cycle is a competitor to the bus system in that both are vying for people to
use their transportation services. I thought it would be worth looking into how station usage compared to the three most used bus routes (2, 11, 15). Figure 12 shows the entirety of the bus routes overlaid with the line frequency plot from section 3.5; the code used to create it can be found in A.13. The pink line shows bus route 2, the green line shows bus route 11 and the blue line shows bus route 15.

Figure 12: Popular bus routes plotted against station pairs line frequency plot.

Figure 12 was not incredibly fruitful, other than showing us again that downtown is where most of the trips occur. I wanted to see how the downtown plot from section 3.5 would look with the bus routes, so Figure 13 was created.
Figure 13: Downtown Omaha station pairs line frequency plot with popular bus routes shown.

This is an easier graph to read, but it doesn’t provide much information about how the bus routes affect the bike system or vice versa. There doesn’t seem to be a station close to the routes that is exploding with usage, so Figure 13 is inconclusive.

3.7 Bike Histories

So far I have looked at the use of stations in pairs or at the usage of individual stations. I would now like to switch gears and look at the trips a bike takes. Individual bike histories could provide vital information about how a bike moves from station to station, as well as whether the data collection process is efficient. The idea was to get each bike’s full history, which would include each individual trip it takes from station to station, showing the times it was checked out and returned. This process took some coding, which can be found in A.14. The bike histories were stored in a list with one element per bike, each element being a data frame showing every trip the bike took. This was an extremely large list and thus has been excluded from this report, but a sample of one bike’s data frame can be seen in Table 10.
Row    Bike  CheckoutKioskName  ReturnKioskName  Checkout date time   Return date time
8751   11    67th & Pine        Aksarben Drive   2015-03-08 17:36:36  2015-03-08 18:12:26
8762   11    Aksarben Drive     Aksarben Drive   2015-03-09 12:40:53  2015-03-09 12:56:52
8789   11    Aksarben Drive     Aksarben Drive   2015-03-10 12:56:43  2015-03-10 13:22:30
8834   11    Aksarben Drive     67th & Pine      2015-03-11 16:31:21  2015-03-11 23:05:06
8900   11    67th & Pine        66th & Center    2015-03-12 19:02:35  2015-03-12 19:43:24
8911   11    66th & Center      66th & Center    2015-03-13 13:30:59  2015-03-13 14:20:47
8925   11    66th & Center      66th & Center    2015-03-13 17:28:50  2015-03-13 18:21:34
8945   11    66th & Center      66th & Center    2015-03-13 19:21:00  2015-03-13 20:07:15
9009   11    66th & Center      dead station     2015-03-14 16:28:51  2015-03-16 10:45:10
9213   11    Aksarben Drive     Aksarben Drive   2015-03-17 19:11:40  2015-03-17 19:12:08
10457  11    Aksarben Drive     Aksarben Drive   2015-04-21 21:42:52  2015-04-21 22:01:14
Table 10: Sample of a Bike’s History

Notice from Table 10 that there was a ’jump’ in the bike’s history even in this small sample. A jump occurs when the bike is checked out from a station different from the one it was last returned to. This was a problem in the data set, because I was essentially missing a trip for every occurrence of this situation. The main point of making this list of bike histories was to gain information about the bikes’ trips, but with missing information my data is compromised. Because of this I had to take the time to design a method of filling in these gaps.

Finding the Jumps

The end goal is to fill in the gaps in the trips, but in order to do that I had to know where the gaps were. The first step in this process was obtaining a list of all the jumps for each individual bike. This was done by comparing each trip’s checkout station to the previous trip’s return station within the same bike history. This process was then repeated for all 150 bikes.
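The comparison of each checkout station with the previous return station can be sketched as below. This is a minimal illustration, not the code in A.15: the rows are the last four trips from Table 10, while the function name `find_jumps` and the column name CheckoutDateTime are hypothetical.

```r
# Last four trips of bike 11 from Table 10.
trips <- data.frame(
  Bike = c(11, 11, 11, 11),
  CheckoutKioskName = c("66th & Center", "66th & Center",
                        "Aksarben Drive", "Aksarben Drive"),
  ReturnKioskName = c("66th & Center", "dead station",
                      "Aksarben Drive", "Aksarben Drive"),
  CheckoutDateTime = as.POSIXct(
    c("2015-03-13 19:21:00", "2015-03-14 16:28:51",
      "2015-03-17 19:11:40", "2015-04-21 21:42:52"),
    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Return the trips of one bike that start somewhere other than the last return.
find_jumps <- function(history) {
  history <- history[order(history$CheckoutDateTime), ]
  n <- nrow(history)
  if (n < 2) return(history[0, ])
  previous_return <- history$ReturnKioskName[-n]
  # The first trip has no predecessor, so it can never be a jump.
  is_jump <- c(FALSE, history$CheckoutKioskName[-1] != previous_return)
  history[is_jump, ]
}

# Apply the check to every bike's history and stack the results.
jumps <- do.call(rbind, lapply(split(trips, trips$Bike), find_jumps))
print(jumps)
```

In this sample the only jump is the 2015-03-17 checkout from Aksarben Drive, which follows a return to the dead station.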
Table 11 shows a small sample of the list of jumps; the entire list, along with the code used to produce it, can be found in A.15.

Row    Bike  CheckoutKioskName  ReturnKioskName  Checkout date time   Return date time     co
16496  951   1819 Farnam        1516 Cuming St   2015-07-22 13:02:56  2015-07-22 13:10:00  82
19639  951   15th & Howard      15th & Howard    2015-08-29 12:53:35  2015-08-29 14:10:43  83
33406  942   14th & Douglas     Bike Share HQ    2016-06-28 13:48:52  2016-06-28 15:04:29  19
33565  942   10th & Dodge       10th & Dodge     2016-06-29 08:49:23  2016-06-29 09:56:20  20
Table 11: Sample of Jumps from bikes 951 and 942

This was a very big problem in the data set, with 944 rows affected by this phenomenon. Using these jumps, the next step was to create filler rows to insert into the database, giving each bike a complete history and eliminating the jumps from the set. The code used to produce these filler rows can be found in A.15.

3.8 Station Histories

After looking into the bike histories I wondered what the individual station histories might tell me. The station histories could show how often a station was full or empty, which is of great importance to both the administration at B-Cycle and the bike rebalancing team. This information is also important to individual customers: if a station is full when they are trying to turn in their bike they are in a real predicament, as they are when they need a bike and the station is empty. Table 12 shows a sample of one station’s history, and the code used to produce it can be found in A.16.
Row    Bike  CheckoutKioskName             ReturnKioskName               Checkout date time   Return date time     UserRole
3586   258   Bob Kerrey Pedestrian Bridge  Bob Kerrey Pedestrian Bridge  2014-07-05 10:33:00  2014-07-05 11:44:39  Subscriber
3587   197   14th & Fahey                  Bob Kerrey Pedestrian Bridge  2014-07-05 10:38:35  2014-07-05 11:31:36  Subscriber
3588   193   14th & Fahey                  Bob Kerrey Pedestrian Bridge  2014-07-05 10:39:00  2014-07-05 11:31:32  Subscriber
3589   112   11th & Jackson                Bob Kerrey Pedestrian Bridge  2014-07-05 10:49:00  2014-07-05 11:44:02  Subscriber
70100  18    11th & Jackson                Bob Kerrey Pedestrian Bridge  2014-07-05 10:49:52  2014-07-05 10:49:52  Jumper
3590   197   Bob Kerrey Pedestrian Bridge  11th & Jackson                2014-07-05 11:31:43  2014-07-05 11:48:57  Subscriber
3591   193   Bob Kerrey Pedestrian Bridge  11th & Jackson                2014-07-05 11:32:02  2014-07-05 11:48:53  Subscriber
3594   55    Bob Kerrey Pedestrian Bridge  Bob Kerrey Pedestrian Bridge  2014-07-05 12:13:45  2014-07-05 15:34:30  Subscriber
3595   44    Bob Kerrey Pedestrian Bridge  Bob Kerrey Pedestrian Bridge  2014-07-05 12:14:27  2014-07-05 15:34:35  Subscriber
3597   258   Bob Kerrey Pedestrian Bridge  11th & Jackson                2014-07-05 12:53:12  2014-07-05 13:17:15  Subscriber
3600   9     Bob Kerrey Pedestrian Bridge  Bob Kerrey Pedestrian Bridge  2014-07-05 13:09:39  2014-07-05 14:00:29  Subscriber
Table 12: Sample of Bob Kerrey Pedestrian Bridge Station History

3.8.1 Time a station is full and empty (or near)

If a customer riding a bike reaches their desired location and finds the station there is full, we have a serious problem on our hands, because the person then has to go looking for the next station. This could be a very frustrating situation for the customer and thus is something I wanted to look into. To do this I had to split the station histories up so that each trip essentially became two separate pieces of data: one consisting of the checkout location and date/time, and the other consisting of the return location and date/time.
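Splitting each trip into a checkout event and a return event, and then timing how long a station stays full, can be sketched as follows. This is only an illustration, not the code in A.17: the trips, the starting bike count and the dock capacity are made-up assumptions, and `minutes_full` is a hypothetical helper name.

```r
# Hypothetical trips between two real station names.
bridge  <- "Bob Kerrey Pedestrian Bridge"
jackson <- "11th & Jackson"
trips <- data.frame(
  CheckoutKioskName = c(jackson, bridge, jackson, bridge),
  ReturnKioskName   = c(bridge, jackson, bridge, jackson),
  CheckoutDateTime = as.POSIXct(
    c("2015-04-01 09:40:00", "2015-04-01 10:30:00",
      "2015-04-01 10:55:00", "2015-04-01 11:45:00"), tz = "UTC"),
  ReturnDateTime = as.POSIXct(
    c("2015-04-01 10:00:00", "2015-04-01 10:50:00",
      "2015-04-01 11:00:00", "2015-04-01 12:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Each trip becomes two events: the checkout station loses a bike (-1)
# and the return station gains one (+1).
events <- rbind(
  data.frame(KioskName = trips$CheckoutKioskName,
             Time = trips$CheckoutDateTime, Change = -1L,
             stringsAsFactors = FALSE),
  data.frame(KioskName = trips$ReturnKioskName,
             Time = trips$ReturnDateTime, Change = 1L,
             stringsAsFactors = FALSE)
)

# Minutes a station spends at capacity, given an assumed starting count.
minutes_full <- function(events, station, start_count, capacity) {
  ev <- events[events$KioskName == station, ]
  ev <- ev[order(ev$Time), ]
  bikes <- start_count + cumsum(ev$Change)
  # Each state lasts until the next event at the same station.
  held <- as.numeric(diff(ev$Time), units = "mins")
  sum(held[bikes[-length(bikes)] == capacity])
}

full_minutes <- minutes_full(events, bridge, start_count = 2, capacity = 3)
print(full_minutes)
```

The same running count could be compared against zero or one to time the empty and near empty states.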
Splitting the trips this way allowed each event to be categorized as the station gaining a bike (+1) or losing a bike (-1) whenever a piece of data was entered into the system. Once I had that information I crept through the database row by row, counting each event and waiting until a station became full. I wanted to record how long a station was full, so I built a time difference function into my 'creeping' function; all of this can be found in A.17. An essential question that came up as I was doing this was, "How did the station go from full to not full?" In other words, was the change of status due to bike rebalancing, or did it come about by chance or normal usage? Table 13 shows how long each station was full, in minutes, over the time period 2015-04-01 03:00:00 to 2016-08-31 22:30:08. This time period was chosen to ensure the system was not affected by new bikes and stations being added to it.

Station | Time Full | Percentage | Time Empty | Percentage | Time near Empty | Percentage | Time Empty or near | Percentage
Bob Kerrey Pedestrian Bridge | 7128 | 0.95 | 8176 | 1.09 | 5320 | 0.71 | 13496 | 1.81
13th & Howard | 2226 | 0.30 | 2 | 0.00 | 3963 | 0.53 | 3965 | 0.53
Aksarben Drive | 758 | 0.10 | 56298 | 7.54 | 56637 | 7.58 | 112936 | 15.12
67th & Pine | 50256 | 6.73 | 14832 | 1.99 | 80068 | 10.72 | 94901 | 12.70
67th & Frances | 76 | 0.01 | 149808 | 20.05 | 31167 | 4.17 | 180975 | 24.22
62nd & Dodge | 0 | 0.00 | 11439 | 1.53 | 42852 | 5.74 | 54291 | 7.27
1819 Farnam | 0 | 0.00 | 53251 | 7.13 | 119067 | 15.94 | 172318 | 23.07
66th & Center | 0 | 0.00 | 187440 | 25.09 | 131665 | 17.62 | 319105 | 42.71
11th & Jackson | 6215 | 0.83 | 11947 | 1.60 | 22259 | 2.98 | 34206 | 4.58
14th & Fahey | 6532 | 0.87 | 24455 | 3.27 | 83535 | 11.18 | 107991 | 14.45
1516 Cuming St | 0 | 0.00 | 24375 | 3.26 | 34207 | 4.58 | 58582 | 7.84
1st & Broadway | 0 | 0.00 | 44740 | 5.99 | 354549 | 47.46 | 399289 | 53.45
711 S Main St | 0 | 0.00 | 30850 | 4.13 | 171806 | 23.00 | 202656 | 27.13
Lewis & Clark Landing | 39411 | 5.28 | 13060 | 1.75 | 41301 | 5.53 | 54361 | 7.28
10th & Farnam | 18486 | 2.47 | 44825 | 6.00 | 117189 | 15.69 | 162014 | 21.69
50th & Underwood | 0 | 0.00 | 93874 | 12.57 | 126768 | 16.97 | 220643 | 29.53
Midtown Crossing: 32nd & Farnam | 0 | 0.00 | 13764 | 1.84 | 8370 | 1.12 | 22134 | 2.96
The Durham Museum | 0 | 0.00 | 31019 | 4.15 | 15708 | 2.10 | 46727 | 6.25
15th & Howard | 0 | 0.00 | 47895 | 6.41 | 51152 | 6.85 | 99047 | 13.26
9th & Jones | 0 | 0.00 | 88044 | 11.78 | 23087 | 3.09 | 111131 | 14.88
20th & Dodge | 0 | 0.00 | 1275 | 0.17 | 11 | 0.00 | 1286 | 0.17
7th & Jones | 0 | 0.00 | 33732 | 4.52 | 177687 | 23.78 | 211419 | 28.30
Broadway & Main | 15608 | 2.09 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
24th & Lake | 0 | 0.00 | 362515 | 48.52 | 94099 | 12.60 | 456614 | 61.12
Ameristar | 0 | 0.00 | 22317 | 2.99 | 101298 | 13.56 | 123615 | 16.55
Tom Hanafan River’s Edge Park | 0 | 0.00 | 13825 | 1.85 | 4502 | 0.60 | 18326 | 2.45
14th & Douglas | 2979 | 0.40 | 113920 | 15.25 | 107785 | 14.43 | 221705 | 29.68
10th & Cass | 1227 | 0.16 | 3980 | 0.53 | 22924 | 3.07 | 26904 | 3.60
16th & Douglas | 5472 | 0.73 | 13612 | 1.82 | 36097 | 4.83 | 49709 | 6.65
Tip Top Building | 0 | 0.00 | 1 | 0.00 | 28523 | 3.82 | 28524 | 3.82
Pearl St & Willow Ave | 0 | 0.00 | 25272 | 3.38 | 323604 | 43.32 | 348876 | 46.70
College of Saint Mary | 0 | 0.00 | 48268 | 6.46 | 4940 | 0.66 | 53207 | 7.12
15th & Farnam | 14921 | 2.00 | 6984 | 0.93 | 7314 | 0.98 | 14298 | 1.91
Table 13: Stations Time being Full and Empty (or near) (in mins) with Percentage

Table 13 is extremely informative. It shows every station that spent time completely full or completely empty, along with the percentage of the given date range that the station was full or empty. The near empty column was added because some people might want to ride with a companion, and if only one bike were available that option would be eliminated for that pair. The information provided here can be examined in great depth, but one facet I found interesting
was the fact that 1st & Broadway was empty or near empty 53% of the time. I could spend a lot of time breaking down this table, but I will leave further interpretation to the individual reader.

3.8.2 Why did a station’s full or empty count change?
It was important to look at the amount of time a station was full or empty (or near empty), but it is also important to find out how it went from full or empty to not full or not empty. In other words, did a customer cause the station to open up a dock or gain another bike, or was that the rebalancing team? To answer this question I needed the list of stations split as I did in the previous section, but this time each row in each station's data frame needs a "UserRole" column. I then worked my way through each station's list and found the UserRole of the row that follows the row causing the station to become full or empty (or near empty). The code used can be found in A.18; it produced Figure 14 for the full data and Figure 15 for the empty (or near empty) data.

[Figure 14: bar chart; axes: Station, Freq; bars split by UserRole (Maintenance, Not Maintenance)]
Figure 14: Number of Instances Maintenance did (or did not) cause a station to go from full to not full
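The lookup described in 3.8.2 can be sketched for a single station as follows. This is an illustration under stated assumptions rather than the code from A.18: the helper name `next_role_after_full`, the `Change` column (+1 for a return, -1 for a checkout), the `start_count` and `docks` arguments, and the toy events are all hypothetical. The sketch keeps a running bike count, finds each event that makes the station full, and records whether the next event came from a Maintenance user.

```r
# next_role_after_full() is an illustrative helper, not the function from A.18.
# events: one station's events with columns Time, Change (+1/-1) and UserRole
# start_count: bikes docked before the first event; docks: station capacity
next_role_after_full <- function(events, start_count, docks) {
  events <- events[order(events$Time), ]
  count <- start_count + cumsum(events$Change)        # bikes docked after each event
  previous <- c(start_count, head(count, -1))         # bikes docked before each event
  became_full <- which(count == docks & previous < docks)
  following <- became_full + 1                        # the event that ends the full state
  following <- following[following <= nrow(events)]   # ignore a full state still open at the end
  role <- ifelse(events$UserRole[following] == "Maintenance",
                 "Maintenance", "Not Maintenance")
  table(factor(role, levels = c("Maintenance", "Not Maintenance")))
}

# Toy station with 3 docks and 2 bikes docked at the start
toy <- data.frame(
  Time = as.POSIXct("2016-06-01 08:00:00", tz = "UTC") + 60 * (0:4),
  Change = c(1L, -1L, 1L, -1L, 1L),
  UserRole = c("Subscriber", "Maintenance", "Member", "Subscriber", "Member"),
  stringsAsFactors = FALSE
)
full_changes <- next_role_after_full(toy, start_count = 2, docks = 3)
```

The same idea would apply to the empty (or near empty) case by testing `count` against 0 or 1 instead of `docks`.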
[Figure 15: bar chart; axes: Station, Freq; bars split by UserRole (Maintenance, Not Maintenance)]
Figure 15: Times Maintenance did (or did not) cause a station to go from empty (or near) to not empty (or near)

I can also display the tables to make it easier to see each individual value. The full and empty counts are shown in Tables 14 and 15 respectively.

Row | UserRole | Freq | Station
1 | Maintenance | 2 | Bob Kerrey Pedestrian Bridge
2 | Not Maintenance | 37 | Bob Kerrey Pedestrian Bridge
3 | Maintenance | 5 | 13th & Howard
4 | Not Maintenance | 22 | 13th & Howard
6 | Not Maintenance | 12 | Aksarben Drive
7 | Maintenance | 14 | 67th & Pine
8 | Not Maintenance | 30 | 67th & Pine
10 | Not Maintenance | 2 | 67th & Frances
17 | Maintenance | 6 | 11th & Jackson
18 | Not Maintenance | 13 | 11th & Jackson
19 | Maintenance | 8 | 14th & Fahey
20 | Not Maintenance | 13 | 14th & Fahey
27 | Maintenance | 14 | Lewis & Clark Landing
28 | Not Maintenance | 291 | Lewis & Clark Landing
29 | Maintenance | 9 | 10th & Farnam
30 | Not Maintenance | 78 | 10th & Farnam
43 | Maintenance | 1 | 7th & Jones
45 | Maintenance | 2 | Broadway & Main
46 | Not Maintenance | 2 | Broadway & Main
56 | Not Maintenance | 3 | 10th & Dodge
60 | Not Maintenance | 2 | 14th & Douglas
61 | Maintenance | 4 | 10th & Cass
62 | Not Maintenance | 6 | 10th & Cass
63 | Maintenance | 1 | 16th & Douglas
64 | Not Maintenance | 20 | 16th & Douglas
71 | Maintenance | 8 | 15th & Farnam
72 | Not Maintenance | 33 | 15th & Farnam
Table 14: Full changes according to UserRole
Row | UserRole | Freq | Station
1 | Maintenance | 26 | Bob Kerrey Pedestrian Bridge
2 | Not Maintenance | 389 | Bob Kerrey Pedestrian Bridge
4 | Not Maintenance | 13 | 13th & Howard
5 | Maintenance | 13 | Aksarben Drive
6 | Not Maintenance | 287 | Aksarben Drive
7 | Maintenance | 14 | 67th & Pine
8 | Not Maintenance | 66 | 67th & Pine
9 | Maintenance | 30 | 67th & Frances
10 | Not Maintenance | 406 | 67th & Frances
11 | Maintenance | 10 | 62nd & Dodge
12 | Not Maintenance | 19 | 62nd & Dodge
13 | Maintenance | 19 | 1819 Farnam
14 | Not Maintenance | 111 | 1819 Farnam
15 | Maintenance | 46 | 66th & Center
16 | Not Maintenance | 166 | 66th & Center
17 | Maintenance | 23 | 11th & Jackson
18 | Not Maintenance | 371 | 11th & Jackson
19 | Maintenance | 22 | 14th & Fahey
20 | Not Maintenance | 146 | 14th & Fahey
21 | Maintenance | 51 | 1516 Cuming St
22 | Not Maintenance | 182 | 1516 Cuming St
23 | Maintenance | 27 | 1st & Broadway
24 | Not Maintenance | 64 | 1st & Broadway
25 | Maintenance | 33 | 711 S Main St
26 | Not Maintenance | 8 | 711 S Main St
27 | Maintenance | 24 | Lewis & Clark Landing
28 | Not Maintenance | 340 | Lewis & Clark Landing
29 | Maintenance | 38 | 10th & Farnam
30 | Not Maintenance | 995 | 10th & Farnam
31 | Maintenance | 14 | 50th & Underwood
32 | Not Maintenance | 134 | 50th & Underwood
33 | Maintenance | 18 | Midtown Crossing: 32nd & Farnam
34 | Not Maintenance | 59 | Midtown Crossing: 32nd & Farnam
35 | Maintenance | 2 | The Durham Museum
36 | Not Maintenance | 16 | The Durham Museum
37 | Maintenance | 24 | 15th & Howard
38 | Not Maintenance | 133 | 15th & Howard
39 | Maintenance | 10 | 9th & Jones
40 | Not Maintenance | 23 | 9th & Jones
41 | Maintenance | 2 | 20th & Dodge
42 | Not Maintenance | 7 | 20th & Dodge
43 | Maintenance | 29 | 7th & Jones
44 | Not Maintenance | 43 | 7th & Jones
47 | Maintenance | 34 | 24th & Lake
48 | Not Maintenance | 48 | 24th & Lake
49 | Maintenance | 14 | Ameristar
50 | Not Maintenance | 104 | Ameristar
51 | Maintenance | 6 | Tom Hanafan River’s Edge Park
52 | Not Maintenance | 66 | Tom Hanafan River’s Edge Park
59 | Maintenance | 33 | 14th & Douglas
60 | Not Maintenance | 380 | 14th & Douglas
61 | Maintenance | 20 | 10th & Cass
62 | Not Maintenance | 70 | 10th & Cass
63 | Maintenance | 13 | 16th & Douglas
64 | Not Maintenance | 66 | 16th & Douglas
66 | Not Maintenance | 2 | Tip Top Building
67 | Maintenance | 16 | Pearl St & Willow Ave
68 | Not Maintenance | 38 | Pearl St & Willow Ave
69 | Maintenance | 7 | College of Saint Mary
70 | Not Maintenance | 23 | College of Saint Mary
71 | Maintenance | 66 | 15th & Farnam
72 | Not Maintenance | 2 | 15th & Farnam
Table 15: Empty (or near empty) changes according to UserRole

3.8.3 What about the busy months?
With Omaha located in the Midwest, there are definitely times of the year, such as the winter, when the stations lie mostly dormant. One would expect the summer months to be the time of year when the system sees the most usage. This is due in part to numerous factors, including the weather, the length of the day (in sunlight), activities in Omaha, and work/school schedules. Table 13 showed the numbers calculated over the long time period described above. I wondered what this would look like for the summer months only. To do these calculations I needed to set a new time frame. I chose to start at 2016-05-01 01:00:00 and go until 2016-09-19 23:00:00. The code used to create the new table can be found in A.19 and the statistics are shown in Table 16.
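Restricting the calculation to a time window can be sketched as below. This is a hedged illustration, not the code from A.19: the helper name `minutes_at_level`, its arguments, the `Change` column and the toy events are assumptions. The sketch sums the minutes a station's bike count sits at a given level (for example its dock count for "full", or 0 for "empty") between `from` and `to`, then expresses that as a percentage of the window length.

```r
# minutes_at_level() is an illustrative helper, not the function from A.19.
# events: one station's events with columns Time and Change (+1 return, -1 checkout)
# start_count: bikes docked at the start of the window (assumed known)
minutes_at_level <- function(events, start_count, level, from, to) {
  events <- events[events$Time >= from & events$Time <= to, ]
  events <- events[order(events$Time), ]
  # Count in force during each interval: before the first event, then after each event
  counts <- c(start_count, start_count + cumsum(events$Change))
  bounds <- c(from, events$Time, to)
  held <- as.numeric(difftime(bounds[-1], bounds[-length(bounds)], units = "mins"))
  minutes <- sum(held[counts == level])
  total <- as.numeric(difftime(to, from, units = "mins"))
  c(minutes = minutes, percentage = round(100 * minutes / total, 2))
}

# Length of the summer window used in this section, in minutes
from <- as.POSIXct("2016-05-01 01:00:00", tz = "UTC")
to   <- as.POSIXct("2016-09-19 23:00:00", tz = "UTC")
summer_total <- as.numeric(difftime(to, from, units = "mins"))

# Toy check on a two-hour window: a 3-dock station starting with 2 bikes
toy_from <- as.POSIXct("2016-05-01 01:00:00", tz = "UTC")
toy_to   <- as.POSIXct("2016-05-01 03:00:00", tz = "UTC")
toy_events <- data.frame(Time = toy_from + 60 * c(30, 60, 90),
                         Change = c(1L, -1L, 1L))
toy <- minutes_at_level(toy_events, start_count = 2, level = 3,
                        from = toy_from, to = toy_to)
```

The summer window is 204360 minutes long, and 6590 / 204360 is about 3.22%, which agrees with the Time Empty percentage reported for Bob Kerrey Pedestrian Bridge in Table 16; this suggests the percentages there are taken relative to the full window length.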
Station | Time Full | Percentage | Time Empty | Percentage | Time near Empty | Percentage | Time Empty or near | Percentage
Bob Kerrey Pedestrian Bridge | 0 | 0.00 | 6590 | 3.22 | 5690 | 2.78 | 12280 | 6.01
13th & Howard | 12748 | 6.24 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
Aksarben Drive | 38202 | 18.69 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
67th & Pine | 36501 | 17.86 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
67th & Frances | 0 | 0.00 | 4544 | 2.22 | 2181 | 1.07 | 6725 | 3.29
62nd & Dodge | 0 | 0.00 | 2595 | 1.27 | 8 | 0.00 | 2602 | 1.27
1819 Farnam | 73 | 0.04 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
66th & Center | 2617 | 1.28 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
11th & Jackson | 20896 | 10.23 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
14th & Fahey | 23816 | 11.65 | 0 | 0.00 | 1 | 0.00 | 1 | 0.00
1516 Cuming St | 0 | 0.00 | 10819 | 5.29 | 25153 | 12.31 | 35972 | 17.60
1st & Broadway | 0 | 0.00 | 10255 | 5.02 | 12398 | 6.07 | 22653 | 11.09
Lewis & Clark Landing | 18015 | 8.82 | 1 | 0.00 | 1398 | 0.68 | 1399 | 0.68
10th & Farnam | 0 | 0.00 | 13909 | 6.81 | 25814 | 12.63 | 39723 | 19.44
50th & Underwood | 13044 | 6.38 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
Midtown Crossing: 32nd & Farnam | 0 | 0.00 | 21524 | 10.53 | 15503 | 7.59 | 37027 | 18.12
The Durham Museum | 16 | 0.01 | 11328 | 5.54 | 10169 | 4.98 | 21497 | 10.52
15th & Howard | 0 | 0.00 | 25248 | 12.35 | 12081 | 5.91 | 37329 | 18.27
9th & Jones | 0 | 0.00 | 19214 | 9.40 | 3125 | 1.53 | 22340 | 10.93
20th & Dodge | 0 | 0.00 | 7421 | 3.63 | 2889 | 1.41 | 10310 | 5.05
7th & Jones | 1013 | 0.50 | 14289 | 6.99 | 24321 | 11.90 | 38610 | 18.89
24th & Lake | 0 | 0.00 | 2153 | 1.05 | 68 | 0.03 | 2221 | 1.09
Ameristar | 796 | 0.39 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
Tom Hanafan River’s Edge Park | 0 | 0.00 | 11562 | 5.66 | 6697 | 3.28 | 18259 | 8.93
10th & Dodge | 30661 | 15.00 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
14th & Douglas | 11940 | 5.84 | 0 | 0.00 | 0 | 0.00 | 0 | 0.00
10th & Cass | 0 | 0.00 | 20945 | 10.25 | 19215 | 9.40 | 40160 | 19.65
16th & Douglas | 0 | 0.00 | 6012 | 2.94 | 9823 | 4.81 | 15835 | 7.75
Pearl St & Willow Ave | 0 | 0.00 | 77 | 0.04 | 74012 | 36.22 | 74089 | 36.25
College of Saint Mary | 0 | 0.00 | 48268 | 23.62 | 4940 | 2.42 | 53207 | 26.04
15th & Farnam | 14921 | 7.30 | 6984 | 3.42 | 7314 | 3.58 | 14298 | 7.00
Table 16: Summer Stats for Full/Empty/or near for all Stations

4 Forecasting
4.1 Discovery
There are many different forecasting models, and each one varies in technique depending on the data it represents.
To pick the best type of model I first needed to take a closer look at the bike usage data at each station. The distribution of bike usage by trip duration at each checkout station is shown in Table 17. This table allowed me to see what kind of trips people are taking from each station. It is also worth looking at what each station has coming back to it by trip duration; these numbers are shown in Table 18. The code used to find the numbers and create the tables can be found in A.20.
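A duration breakdown of this kind could be produced along the following lines. This is a minimal sketch and not the code from A.20: the helper name `duration_table` and the bin edges are assumptions (in particular, the report does not say which bin a trip of exactly 5 or 10 minutes falls in; here each bin includes its upper edge). The sample trips reuse four rows from Table 12.

```r
# duration_table() is an illustrative helper, not the function from A.20.
duration_table <- function(trips) {
  mins <- as.numeric(difftime(trips$Return_date_time,
                              trips$Checkout_date_time, units = "mins"))
  # Assumed bin edges; cut() makes each bin right-closed, e.g. (5, 10]
  bins <- cut(mins,
              breaks = c(-Inf, 5, 10, 20, 30, 60, Inf),
              labels = c("Under 5 min", "6-10 min", "11-20 min",
                         "21-30 min", "31-60 min", "Over 60 min"))
  table(Station = trips$CheckoutKioskName, Duration = bins)
}

# Four trips taken from Table 12
trips <- data.frame(
  Bike = c(258, 197, 18, 197),
  CheckoutKioskName = c("Bob Kerrey Pedestrian Bridge", "14th & Fahey",
                        "11th & Jackson", "Bob Kerrey Pedestrian Bridge"),
  Checkout_date_time = as.POSIXct(c("2014-07-05 10:33:00", "2014-07-05 10:38:35",
                                    "2014-07-05 10:49:52", "2014-07-05 11:31:43"),
                                  tz = "UTC"),
  Return_date_time = as.POSIXct(c("2014-07-05 11:44:39", "2014-07-05 11:31:36",
                                  "2014-07-05 10:49:52", "2014-07-05 11:48:57"),
                                tz = "UTC"),
  stringsAsFactors = FALSE
)
usage <- duration_table(trips)
```

Swapping `CheckoutKioskName` for `ReturnKioskName` in the final `table()` call would give the return-station view instead.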
Checkout Station | Under 5 min | 6-10 min | 11-20 min | 21-30 min | 31-60 min | Over 60 min
10th & Cass | 61 | 135 | 176 | 85 | 251 | 182
10th & Dodge | 99 | 155 | 127 | 112 | 416 | 306
10th & Farnam | 372 | 180 | 340 | 328 | 1487 | 782
11th & Jackson | 273 | 304 | 499 | 259 | 1010 | 736
13th & Howard | 213 | 224 | 252 | 126 | 451 | 346
14th & Douglas | 217 | 242 | 146 | 66 | 154 | 140
14th & Fahey | 130 | 145 | 263 | 76 | 170 | 195
1516 Cuming St | 234 | 235 | 230 | 95 | 199 | 328
15th & Farnam | 80 | 43 | 31 | 16 | 46 | 40
15th & Howard | 135 | 110 | 69 | 36 | 145 | 124
16th & Douglas | 94 | 169 | 157 | 58 | 142 | 117
1819 Farnam | 98 | 89 | 95 | 25 | 84 | 83
1st & Broadway | 111 | 2 | 4 | 5 | 31 | 24
20th & Dodge | 90 | 90 | 80 | 44 | 108 | 101
24th & Lake | 56 | 6 | 7 | 11 | 21 | 20
50th & Underwood | 80 | 7 | 32 | 35 | 74 | 52
62nd & Dodge | 78 | 60 | 99 | 35 | 58 | 75
66th & Center | 126 | 13 | 25 | 38 | 162 | 108
67th & Frances | 415 | 34 | 75 | 146 | 633 | 343
67th & Pine | 166 | 29 | 111 | 61 | 187 | 93
711 S Main St | 122 | 1 | 2 | 5 | 7 | 14
7th & Jones | 132 | 22 | 25 | 3 | 23 | 26
9th & Jones | 117 | 26 | 23 | 30 | 40 | 31
Aksarben Drive | 120 | 25 | 99 | 208 | 665 | 275
Ameristar | 136 | 7 | 75 | 64 | 176 | 50
Bike Share HQ | 3 | 0 | 2 | 0 | 0 | 0
Bob Kerrey Pedestrian Bridge | 590 | 212 | 891 | 1449 | 5305 | 1756
Broadway & Main | 98 | 5 | 14 | 8 | 22 | 16
College of Saint Mary | 46 | 5 | 3 | 3 | 14 | 8
Lewis & Clark Landing | 209 | 242 | 217 | 284 | 1294 | 493
Midtown Crossing: 32nd & Farnam | 42 | 16 | 198 | 89 | 124 | 103
Pearl St & Willow Ave | 18 | 2 | 7 | 12 | 27 | 16
The Durham Museum | 97 | 136 | 42 | 26 | 79 | 54
Tom Hanafan River’s Edge Park | 310 | 69 | 282 | 329 | 943 | 314
Table 17: Bike Usage Distribution at each Checkout Station
Return Station | Under 5 min | 6-10 min | 11-20 min | 21-30 min | 31-60 min | Over 60 min
10th & Cass | 84 | 136 | 110 | 46 | 244 | 180
10th & Dodge | 100 | 145 | 110 | 91 | 466 | 335
10th & Farnam | 366 | 158 | 363 | 356 | 1521 | 742
11th & Jackson | 207 | 284 | 412 | 317 | 1107 | 793
13th & Howard | 185 | 138 | 259 | 167 | 496 | 388
14th & Douglas | 224 | 231 | 147 | 57 | 185 | 113
14th & Fahey | 151 | 190 | 166 | 93 | 199 | 190
1516 Cuming St | 238 | 248 | 170 | 102 | 202 | 337
15th & Farnam | 79 | 60 | 18 | 18 | 31 | 44
15th & Howard | 133 | 84 | 94 | 52 | 134 | 116
16th & Douglas | 115 | 231 | 129 | 33 | 127 | 106
1819 Farnam | 77 | 72 | 91 | 32 | 100 | 87
1st & Broadway | 112 | 2 | 6 | 9 | 30 | 20
20th & Dodge | 75 | 68 | 77 | 45 | 103 | 131
24th & Lake | 56 | 4 | 10 | 8 | 19 | 15
50th & Underwood | 80 | 7 | 38 | 25 | 74 | 56
62nd & Dodge | 78 | 14 | 138 | 29 | 55 | 89
66th & Center | 127 | 13 | 18 | 31 | 189 | 80
67th & Frances | 422 | 29 | 84 | 160 | 627 | 330
67th & Pine | 159 | 73 | 68 | 67 | 197 | 73
711 S Main St | 119 | 0 | 4 | 8 | 9 | 15
7th & Jones | 126 | 32 | 18 | 13 | 27 | 10
9th & Jones | 133 | 23 | 20 | 13 | 28 | 43
Aksarben Drive | 119 | 32 | 101 | 200 | 675 | 256
Ameristar | 136 | 4 | 55 | 80 | 158 | 75
Ben’s House | 0 | 0 | 0 | 0 | 0 | 1
Bike Share HQ | 2 | 1 | 3 | 2 | 3 | 55
Bike Union | 0 | 0 | 0 | 1 | 1 | 3
Bob Kerrey Pedestrian Bridge | 600 | 390 | 1192 | 1389 | 5030 | 1544
Broadway & Main | 99 | 6 | 9 | 4 | 29 | 18
College of Saint Mary | 46 | 5 | 3 | 3 | 12 | 11
dead station | 1 | 0 | 1 | 1 | 3 | 124
Lewis & Clark Landing | 255 | 185 | 230 | 261 | 1324 | 499
Midtown Crossing: 32nd & Farnam | 42 | 32 | 137 | 102 | 156 | 96
Pearl St & Willow Ave | 19 | 2 | 10 | 12 | 28 | 16
Shop | 0 | 0 | 0 | 0 | 0 | 8
The Durham Museum | 94 | 74 | 75 | 24 | 91 | 44
Tip Top Building | 0 | 0 | 0 | 2 | 1 | 2
Tom Hanafan River’s Edge Park | 309 | 67 | 332 | 314 | 867 | 306
Table 18: Bike Usage Distribution at each Return Station

Tables 17 and 18 were created as a talking point with members of the research team. The team wanted a better sense of the station-level data by trip duration, to see whether a model predicting bike usage at particular stations could be produced. This idea did not end up gaining traction, but I left the tables in this report for the reader in case of future work.
Figures 16 and 17 show the tables in bar plot format, which lets viewers quickly identify unique aspects of the data. Because of Bob Kerrey Pedestrian Bridge's overwhelming presence in the data set, it has been removed from the figures.
[Figure 16: bar plot; axes: ReturnStation, Value; bars grouped by trip duration (Under 5 min, 6-10 min, 11-20 min, 21-30 min, 31-60 min, Over 60 min)]
Figure 16: Bike Usage for each Return Station excluding Bob Kerrey Pedestrian Bridge
[Figure 17: bar plot; axes: CheckoutStation, Value; bars grouped by trip duration (Under 5 min, 6-10 min, 11-20 min, 21-30 min, 31-60 min, Over 60 min)]
Figure 17: Bike Usage for each Checkout Station excluding Bob Kerrey Pedestrian Bridge

Figures 16 and 17 show some of the stations that have the most checkouts and returns for each type of trip duration, but this data did not reveal enough about the trips for me to fit a predictive model. Instead of making a predictive model I went in a different direction: demand. If the administration team knew the average demand and the maximum demand for each station in the system, they could better estimate where they needed more bikes, slots, rebalancing efforts, etc.
4.2 Low Count Histogram
Most of the research in this report was motivated by the idea of helping Omaha B-Cycle become a more efficient company and better plan future expansion of the system. As mentioned before, the initial problem Omaha B-Cycle presented to the research group was one of rebalancing the bikes. In this section I provide information that is useful for both bike rebalancing and future expansion of stations.

What if B-Cycle knew how many bikes a station was down, at most, on any given day? A station has bikes coming in and going out constantly, but what I wanted to know was how far the station's count dropped before it gained bikes back. For example, assume station A has nine bikes at the start of the day. Three people check out bikes from station A, so the station is down three bikes relative to the start of the day. If two bikes are then returned, the station is down only one bike relative to the start of the day. I wanted to know the largest number of bikes a station was down on any given day. This idea essentially builds a demand frequency for each station.

Figure 18 shows a compilation of the stations' individual daily max demand for bikes. A value of two in the figure indicates that at most 2 bikes were taken in a row without replacement from a station; the height of the bar gives the number of times this occurred. A zero indicates that on that day the station either had no bikes checked out or had as many or more returns than checkouts. The figure only shows data from the summer of 2016, which was chosen because of the stability of the time period (as previously discussed) and because of the usage during those months. For the code used to obtain the data behind the figure, see A.21.
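The station A example above can be turned into a small calculation. This is a sketch under stated assumptions, not the code from A.21: the helper name `max_daily_demand`, the `Change` column (+1 for a return, -1 for a checkout) and the toy events are hypothetical, and "max demand" is taken to mean the lowest point of the running net change within a day, reported as a positive number of bikes.

```r
# max_daily_demand() is an illustrative helper, not the function from A.21.
# events: one station's events with columns Time and Change (+1 return, -1 checkout)
max_daily_demand <- function(events) {
  events <- events[order(events$Time), ]
  day <- as.Date(events$Time, tz = "UTC")
  # For each day, follow the running net change and take its lowest point;
  # a day that never dips below its starting count has a demand of 0
  sapply(split(events$Change, day),
         function(change) max(0, -min(cumsum(change))))
}

# Day 1 mirrors the station A example: three checkouts, then two returns
toy <- data.frame(
  Time = as.POSIXct(c("2016-06-01 08:00:00", "2016-06-01 08:10:00",
                      "2016-06-01 08:20:00", "2016-06-01 09:00:00",
                      "2016-06-01 09:30:00", "2016-06-02 10:00:00",
                      "2016-06-02 11:00:00"), tz = "UTC"),
  Change = c(-1L, -1L, -1L, 1L, 1L, 1L, -1L)
)
demand <- max_daily_demand(toy)
# Frequency of each max demand value, the quantity a histogram would plot
demand_freq <- table(demand)
```

On the first toy day the station is down three bikes at its lowest point, so that day's max demand is 3; on the second day a return arrives before the checkout, so the value is 0.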
[Figure 18: histogram titled "All Station Lows Frequency"; x-axis: Max Demand on any given Day; y-axis: Count. Bar labels, left to right: 636, 605, 451, 229, 149, 110, 74, 48, 29, 17, 15, 11, 11, 4, 1, 2]
Figure 18: Frequency of Max Demands for all stations compiled
I wanted to see how the stations looked individually now that I could see the max demand counts for the system as a whole. Figure 19 shows the individual histograms for each station. These plots are read the same way as Figure 18. The total count of each occurrence has been removed because of the small size of each histogram. Individual station frequency tables can be found in A.22.

[Figure 19: grid of histograms, one panel per station; x-axis: Max Demand on any given Day; y-axis: Count]
Figure 19: Minimum Counts per Day Split by Station

Figure 19 is not the easiest figure to read, so it made sense to plot a few of the interesting stations on a larger scale to make their qualities easier to see. I have chosen to look at the following stations: 1516 Cuming St, The Durham Museum and Tom Hanafan River's Edge Park. The rest of the individual histograms can be found in A.23.
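[Figure: histogram for Tom Hanafan River's Edge Park; x-axis: Max Demand on any given Day; y-axis: Count. Bar labels, left to right: 24, 16, 20, 13, 8, 10, 8, 3, 5, 1, 1, 3]
[Figure: histogram for 1516 Cuming St; x-axis: Max Demand on any given Day; y-axis: Count. Bar labels, left to right: 20, 30, 23, 9, 4, 2, 2, 1]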
[Figure: histogram for The Durham Museum; x-axis: Max Demand on any given Day; y-axis: Count. Bar labels, left to right: 10, 66, 17, 8, 3, 1]

5 Future Works
The goal of this research report was to provide data analysis on the Omaha B-Cycle data that would reveal useful traits of the bike sharing system. The deeper I looked into a specific topic, the more layers I discovered, and I realized that there is still a lot of work that could be done. The bike rebalancing team now has more data to use in their model design and testing, which should help them further their research into the best method of redistributing bikes among the stations. Unfortunately I do not have infinite time and I had to stop pursuing some topic areas, but I believe future researchers should focus their efforts on the following topics.

The first topic, which could be of great importance, is to further develop a way to predict or determine the demand for bikes at each station. The histogram technique is quick and allows for real-time analysis of demand, which is very useful to the bike rebalancing team and the B-Cycle administration, but as the system gathers more data a model could be developed and trained on it. Future models could be designed using decision trees, support vector machines and random kernel methods.

Another topic that should be looked into is mapping GPS data from the bikes in the system. Seeing where bikes travel from station to station, as well as around the city of Omaha, could help with the creation of new stations and with bike distribution, and could also be used on the business side: if the administration team knew what routes bikes were traveling, they might be able to sell advertisements or come up with other profitable ideas.

The final topic I feel would be fruitful is how to update and maintain the physical bikes in the system.
When I talked with the administration at B-Cycle, they told me they want to do maintenance on each bike every three weeks. The initial idea was to see whether the bikes circulated enough for maintenance to be done as part of the rebalancing effort, in combination with the bikes docked at the stations close to the headquarters. Before this research report was started I looked briefly into this idea and it did not gain any traction, but I do believe it could be incredibly useful to the company.

The door has been opened and a small light has been shed on the Omaha B-Cycle system. There is still much work to be done, but the research provided in this paper offers at least a good starting point. Obtaining information about trips to and from each station, individual bike histories, station demand, station interaction and time analysis helped to slingshot researchers in the right direction.
References
[1] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/, 2016
[2] Yihui Xie, knitr: A general-purpose package for dynamic report generation in R, http://yihui.name/knitr/, 2016
[3] Hadley Wickham, Flexibly Reshape Data: A Reboot of the Reshape Package, https://github.com/hadley/reshape, 2016
[4] G. Grothendieck, Perform SQL Selects on R Data Frames, http://sqldf.googlecode.com, 2016
[5] Garrett Grolemund, Vitalie Spinu, and Hadley Wickham, Make Dealing with Dates a Little Easier, http://sqldf.googlecode.com, 2016
[6] Winston Chang and Hadley Wickham, An Implementation of the Grammar of Graphics, http://ggplot2.org, 2016
[7] David Kahle and Hadley Wickham, Spatial Visualization with ggplot2, https://github.com/dkahle/ggmap, 2016
[8] Frank E Harrell Jr, Harrell Miscellaneous, http://biostat.mc.vanderbilt.edu/Hmisc, 2016
[9] Heartland B-Cycle, Heartland B-Cycle, https://admin.bcycle.com/Admin/Default.aspx, 2016
[10] Dr. Betty N. Love and Dr. Andrew Swift, Discussion of Research Techniques, 2016
[11] Ben Turner, B-Cycle Information, 2016
A Appendix
A.1 Stations Table
Kiosk Index | Kiosk Name | Address | Latitude | Longitude | Total Docks
1 | 10th & Cass | 1001 Cass St | 41.26370 | -95.92943 | 13
2 | 10th & Dodge | 1008 Dodge St | 41.25966 | -95.92928 | 11
3 | 10th & Farnam | 1000 Farnam St | 41.25756 | -95.92921 | 11
4 | 11th & Jackson | 1100 Jackson St | 41.25440 | -95.93058 | 13
5 | 13th & Howard | 1300 Howard St | 41.25547 | -95.93322 | 10
6 | 14th & Douglas | 1400 Douglas St | 41.25864 | -95.93451 | 11
7 | 14th & Fahey | 1400 Fahey St | 41.26583 | -95.93449 | 11
8 | 1516 Cuming St | 1516 Cuming St | 41.26826 | -95.93669 | 9
9 | 15th & Howard | 1500 Howard St | 41.25549 | -95.93585 | 11
10 | 16th & Douglas | 1600 Douglas St | 41.25865 | -95.93719 | 11
11 | 1819 Farnam | 2200 River Rd | 41.25708 | -95.94028 | 11
12 | 1st & Broadway | 100 Broadway St | 41.26333 | -95.84373 | 10
13 | 20th & Dodge | 2000 Dodge St | 41.25975 | -95.94242 | 9
14 | 24th & Lake | 2400 Lake St | 41.28153 | -95.94701 | 9
15 | 50th & Underwood | 5000 Underwood St | 41.26506 | -95.99028 | 10
16 | 62nd & Dodge | 6200 Dodge St | 41.25961 | -96.00819 | 10
17 | 66th & Center | 6600 Center St | 41.25236 | -95.99799 | 10
18 | 67th & Frances | 6700 Frances St | 41.23984 | -96.01464 | 9
19 | 67th & Pine | 6700 Pine St | 41.24374 | -96.01462 | 7
20 | 711 S Main St | 601 Riverfront Dr | 41.25474 | -95.85110 | 9
21 | 7th & Jones | 700 Jones St | 41.25337 | -95.92539 | 10
22 | 9th & Jones | 900 Jones St | 41.25339 | -95.92785 | 10
23 | Aksarben Drive | 1919 Ak-Sar-Ben Dr | 41.24123 | -96.01784 | 11
24 | Ameristar | 2200 River Rd | 41.24312 | -95.90857 | 11
25 | Bob Kerrey Pedestrian Bridge | 644 Bridge | 41.26565 | -95.92413 | 15
26 | Broadway & Main | 00 Main St | 41.26088 | -95.84973 | 10
27 | Lewis & Clark Landing | 601 Riverfront Dr | 41.26217 | -95.92446 | 11
28 | Midtown Crossing: 32nd & Farnam | 3200 Farnam St | 41.25763 | -95.96043 | 11
29 | Pearl St & Willow Ave | 533 Willow Ave | 41.25844 | -95.85103 | 11
30 | The Durham Museum | 801 S 10th S | 41.25144 | -95.92808 | 11
31 | Tom Hanafan River’s Edge Park | River Edge Service Rd | 41.26253 | -95.91838 | 15

A.2 Driving Distances Matrix Code
getDist <- function(from,to,modus="driving",get="distance") {
  library(rjson)
  metric <- numeric(length=length(from))
  for (i in seq_along(from)) {
    url <- paste0("http://maps.googleapis.com/maps/api/distancematrix/json?origins=Omaha+",
                  gsub(" ","+",from[i]),"&destinations=Omaha+",gsub(" ","+",to[i]),"&mode=",
                  modus,"&units=metric&language=de-DE&sensor=false")
    data <- fromJSON(file=url)
    if (data$status=="INVALID_REQUEST") {
      warning("Invalid request")
      metric[i] <- NA
    }
    if (data$status=="UNKNOWN_ERROR") {
      warning("Unknown error - try again !")
      metric[i] <- NA
    }
    if (data$status=="MAX_ELEMENTS_EXCEEDED") {
      stop("Maximum number of items per request exceeded")
      metric[i] <- NA
    }
    if (data$status=="OVER_QUERY_LIMIT") {
      stop("Exceeded maximum number of requests")
      metric[i] <- NA
    }
    if (data$status=="REQUEST_DENIED") {
      stop("rejected request")
      metric[i] <- NA
    }
    #element-level status only exists when the request itself succeeded
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="NOT_FOUND") {
      warning("Start and / or destination can not be found")
      metric[i] <- NA
    }
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="ZERO_RESULTS") {
      warning("Found No route between starting point and destination")
      metric[i] <- NA
    }
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="OK") {
      if (get=="distance") {
        metric[i] <- data$rows[[1]]$elements[[1]]$distance$value
        metric[i] <- round(metric[i]/1,digits=1)
      }
      if(get=="duration") {
        metric[i] <- data$rows[[1]]$elements[[1]]$duration$value
        metric[i] <- round(metric[i]/(60),digits=1)
      }
    }
  }
  return(metric)
}
first <- outer(stations$Address[1:10],stations$Address,getDist,modus='driving',get='distance')
#rows 11-20; row 21 is already covered by 'third'
second <- outer(stations$Address[11:20],stations$Address,getDist,modus='driving',get='distance')
third <- outer(stations$Address[21:31],stations$Address,getDist,modus='driving',get='distance')

A.3 Biking Distances Matrix Code
getDist <- function(from,to,modus="driving",get="distance") {
  library(rjson)
  metric <- numeric(length=length(from))
  for (i in seq_along(from)) {
    url <- paste0("http://maps.googleapis.com/maps/api/distancematrix/json?origins=Omaha+",
                  gsub(" ","+",from[i]),"&destinations=Omaha+",gsub(" ","+",to[i]),"&mode=",
                  modus,"&units=metric&language=de-DE&sensor=false")
    data <- fromJSON(file=url)
    if (data$status=="INVALID_REQUEST") {
      warning("Invalid request")
      metric[i] <- NA
    }
    if (data$status=="UNKNOWN_ERROR") {
      warning("Unknown error - try again !")
      metric[i] <- NA
    }
    if (data$status=="MAX_ELEMENTS_EXCEEDED") {
      stop("Maximum number of items per request exceeded")
      metric[i] <- NA
    }
    if (data$status=="OVER_QUERY_LIMIT") {
      stop("Exceeded maximum number of requests")
      metric[i] <- NA
    }
    if (data$status=="REQUEST_DENIED") {
      stop("rejected request")
      metric[i] <- NA
    }
    #element-level status only exists when the request itself succeeded
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="NOT_FOUND") {
      warning("Start and / or destination can not be found")
      metric[i] <- NA
    }
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="ZERO_RESULTS") {
      warning("Found No route between starting point and destination")
      metric[i] <- NA
    }
    if (data$status=="OK" && data$rows[[1]]$elements[[1]]$status=="OK") {
      if (get=="distance") {
        metric[i] <- data$rows[[1]]$elements[[1]]$distance$value
        metric[i] <- round(metric[i]/1,digits=1)
      }
      if(get=="duration") {
        metric[i] <- data$rows[[1]]$elements[[1]]$duration$value
        metric[i] <- round(metric[i]/(60),digits=1)
      }
    }
  }
  return(metric)
}
#the travel mode is spelled 'bicycling'
first <- outer(stations$Address[1:10],stations$Address,getDist,modus='bicycling',get='distance')
#rows 11-20; row 21 is already covered by 'third'
second <- outer(stations$Address[11:20],stations$Address,getDist,modus='bicycling',get='distance')
third <- outer(stations$Address[21:31],stations$Address,getDist,modus='bicycling',get='distance')

A.4 Station Bike Count Code
library(sqldf)
#subset the data set to only include trips before the date specified
end <- trips[trips$Checkout_date_time <= adate,
             c("Bike","CheckoutKioskName","ReturnKioskName","Checkout_date_time","Return_date_time")]
#order the subset newest to oldest
end <- end[with(end,order(Return_date_time,decreasing=T)),]
#keep the most recent row for each bike, which gives its current location
first <- end[match(unique(end$Bike), end$Bike),]
#count the number of times a station appears in the unique bike list
stat_c <- as.data.frame(table(first$ReturnKioskName))
names(stat_c) <- c('KioskName','Count')
#make a readable table
count <- sqldf('select * from stat_c order by KioskName')

A.5 Station Counts at any Date/Time Code

#sqldf provides the SQL query used to sort the final table
library(sqldf)
#flipping the bike column method to find the number of bikes at each station
bike_count <- function(dt){
  #prepare the date for subset logical
  dt <- as.POSIXlt(dt,tz='UTC')
  #subset the data set to only include trips returned before the date specified
  end <- trips[trips$Return_date_time <= dt,
               c("Bike","CheckoutKioskName","ReturnKioskName","Checkout_date_time","Return_date_time")]
  #order the subset newest to oldest
  end <- end[with(end,order(Return_date_time,decreasing=TRUE)),]
  #grab the first row for each bike id, which is its most recent return
  first <- end[match(unique(end$Bike), end$Bike),]
  #count the number of times a station appears in the unique bike list
  stat_c <- as.data.frame(table(first$ReturnKioskName))
  names(stat_c) <- c('KioskName','Count')
  #make a readable table
  count <- sqldf('select * from stat_c order by KioskName')
  count
}
#use the date/time inside the function
count <- bike_count(bdate)

A.6 Summary and Describe Output from Section 3.1

summary(trips)
##       TripId             UserProgramName            UserId
##  Min.   : 2008284   Heartland B-cycle  :32777   Min.   :  13020
##  1st Qu.: 4317028   Omaha B-cycle      : 6584   1st Qu.: 601822
##  Median : 6138047   Denver B-cycle     :   25   Median : 831570
##  Mean   : 6846018   Boulder B-cycle    :   13   Mean   : 838811
##  3rd Qu.:10104673   Des Moines B-cycle :    3   3rd Qu.:1149241
##  Max.   :11691781   Kansas City B-cycle:    3   Max.   :1359695
##                     (Other)            :    9
##           UserRole                  UserCity           UserState
##  Demo Member         :   28                 :33643          :33611
##  InternalMember      :   31   Omaha         : 4371   NE     : 5241
##  Maintenance         : 5282   OMAHA         :  282   IA     :  130
##  Member              : 1182   omaha         :  159   KS     :   73
##  Non-RFID Card Member:22846   Council Bluffs:   91   WI     :   67
##  RFID Card Member    : 4354   Omaha         :   80   DC     :   51
##  Subscriber          : 5691   (Other)       :  788   (Other):  241
##     UserZip          UserCountry
##  Min.   :    0   FRANCE       :    5
##  1st Qu.:63701   NETHERLANDS  :    1
##  Median :68106   UNITED STATES:39408
##  Mean   :63572
##  3rd Qu.:68135
##  Max.   :99999
##  NA's   :6798
##            MembershipType                 Bike           BikeType
##  24-Hour Pass Kiosk          :28302   193    :  588   Standard:39414
##                              : 5341   185    :  525
##  Heartland Pass (Annual Pay) : 2623   197    :  502
##  Heartland Pass (Monthly Pay): 1685   267    :  487
##  Annual                      : 1171   36     :  486
##  FUN! Pass                   :  103   109    :  485
##  (Other)                     :  189   (Other):36341
##           CheckoutKioskName
##  Bob Kerrey Pedestrian Bridge :10288
##  10th & Farnam                : 3522
##  11th & Jackson               : 3109
##  Lewis & Clark Landing        : 2755
##  Tom Hanafan River's Edge Park: 2256
##  67th & Frances               : 1669
##  (Other)                      :15815
##            ReturnKioskName              DurationMins
##  Bob Kerrey Pedestrian Bridge :10218   Min.   :    0.0
##  10th & Farnam                : 3522   1st Qu.:   12.0
##  11th & Jackson               : 3129   Median :   34.0
##  Lewis & Clark Landing        : 2762   Mean   :  104.7
##  Tom Hanafan River's Edge Park: 2197   3rd Qu.:   53.0
##  67th & Frances               : 1676   Max.   :49914.0
##  (Other)                      :15910
##  AdjustedDurationMins      UsageFee       AdjustmentFlag      Distance
##  Min.   :  0.000        Min.   : 0.0000   N:38372          Min.   : 0.000
##  1st Qu.:  0.000        1st Qu.: 0.0000   Y: 1042          1st Qu.: 1.500
##  Median :  0.000        Median : 0.0000                    Median : 3.000
##  Mean   :  1.556        Mean   : 0.6856                    Mean   : 4.556
##  3rd Qu.:  0.000        3rd Qu.: 0.0000                    3rd Qu.: 6.600
##  Max.   :304.000        Max.   :83.0000                    Max.   :99.600
##
##  EstimatedCarbonOffset   EstimatedCaloriesBurned   CheckoutDateLocal
##  Min.   : 0.000          Min.   :   0.0            Min.   :2014-01-11
##  1st Qu.: 1.400          1st Qu.:  58.0            1st Qu.:2015-03-31
##  Median : 2.900          Median : 120.0            Median :2015-08-09
##  Mean   : 4.317          Mean   : 181.5            Mean   :2015-08-27
##  3rd Qu.: 6.300          3rd Qu.: 264.0            3rd Qu.:2016-05-22
##  Max.   :94.600          Max.   :3984.0            Max.   :2016-08-31
##
##   ReturnDateLocal             CheckoutTimeLocal
##  Min.   :2014-01-11   Min.   :5H 44M 38S
##  1st Qu.:2015-03-31   1st Qu.:12H 27M 51.25S
##  Median :2015-08-09   Median :15H 24M 32.5S
##  Mean   :2015-08-27   Mean   :15H 23M 22.4273354645557S
##  3rd Qu.:2016-05-22   3rd Qu.:18H 26M 2.75S
##  Max.   :2016-08-31   Max.   :23H 24M 41S
##
##           ReturnTimeLocal            TripOver30Mins   LocalProgramFlag
##  Min.   :27S                         N:17718          N:   53
##  1st Qu.:12H 59M 5.25S               Y:21696          Y:39361
##  Median :16H 0M 21S
##  Mean   :15H 56M 56.3876795047472S
##  3rd Qu.:19H 6M 51.5S
##  Max.   :23H 59M 57S
##
##  TripRouteCategory       TripProgramName            DOWC
##  One Way   :16220    Heartland B-cycle:32818   Friday   :5481
##  Round Trip:23194    Omaha B-cycle    : 6596   Monday   :4663
##                                                Saturday :8177
##                                                Sunday   :7630
##                                                Thursday :4211
##                                                Tuesday  :4807
##                                                Wednesday:4445
##       DOWR            Checkout_date_time
##  Friday   :5471   Min.   :2014-01-11 18:27:10
##  Monday   :4716   1st Qu.:2015-03-31 15:31:23
##  Saturday :8102   Median :2015-08-09 16:02:43
##  Sunday   :7623   Mean   :2015-08-27 19:59:26
##  Thursday :4226   3rd Qu.:2016-05-22 13:21:57
##  Tuesday  :4814   Max.   :2016-08-31 22:03:35
##  Wednesday:4462
##       Return_date_time             dur_sec
##  Min.   :2014-01-11 19:27:24   Min.   :      0
##  1st Qu.:2015-03-31 15:34:21   1st Qu.:    737
##  Median :2015-08-09 17:07:02   Median :   2063
##  Mean   :2015-08-27 22:06:53   Mean   :   6264
##  3rd Qu.:2016-05-22 14:39:22   3rd Qu.:   3192
##  Max.   :2016-08-31 23:05:24   Max.   :2994814
##                                NA's   :43

describe(trips)
## trips
##
##  33  Variables      39414  Observations
## ---------------------------------------------------------------------------
## TripId
##        n  missing   unique     Info     Mean      .05      .10      .25
##    39414        0    39414        1  6846018  2572828  2854288  4317028
##      .50      .75      .90      .95
##  6138047 10104673 10966710 11302699
##
## lowest :  2008284  2008286  2010589  2010594  2010596
## highest: 11691514 11691517 11691522 11691776 11691781
## ---------------------------------------------------------------------------
## UserProgramName
##       n missing  unique
##   39414       0      11
##
## Austin B-cycle (2, 0%), Boulder B-cycle (13, 0%)
## Broward B-cycle (2, 0%), Cincy Red Bike (1, 0%)
## Denver B-cycle (25, 0%)
## Des Moines B-cycle (3, 0%)
## Heartland B-cycle (32777, 83%)
## Houston B-cycle (1, 0%)
## Kansas City B-cycle (3, 0%)
## Nashville B-cycle (3, 0%)
## Omaha B-cycle (6584, 17%)