Building a Standard for Open Bikeshare Data
Originally published at Michael Schade’s Mystery Incorporated Blog
March 2nd, 2014
Should the bikeshare industry adopt an open data standard? As
bikesharing spreads to more cities, having a common method for
accessing and analyzing data will become more important. We know
that transit systems work best when agencies concentrate on their
core mission. Transit agencies are not in the information technology
business; all they should do is release their data to let third parties
build apps that let passengers use the systems.
To use open data, programmers need to know: Where is the
data? What are the files called? Which fields are available? What are
the fields called?
Bikesharing systems should adopt the standard of having a “data”
page which can be found by appending “data” immediately after the
main URL. This is what many U.S. government web sites are doing
(like justice.gov/data, dot.gov/data, and state.gov/data). It would be
awesome to have consistent URLs like capitalbikeshare.com/data and
velib.paris.fr/data.
To standardize what the files are called, we have to decide how many
files are used, and what formats to use. Some systems do not
separate the station information data (which is static) from the station
status data (which is dynamic). The Capital Bikeshare XML file and
the Bixi Montreal XML file are examples of combining both static and
dynamic data in a single file (both use the Bixi public bike system).
This might be more convenient in some cases, but for systems that
frequently update their displays, it wastes a lot of bandwidth. This
process could be made more efficient by using two files. JCDecaux,
which manages many bikesharing systems in Europe, separates
the static data from the dynamic real-time data.
Denver’s B-cycle doesn’t seem to offer any data at all, though
Denver’s Open Data Catalog does offer a variety of formats for data
about B-cycle Stations. I doubt this is the true, live system data,
because the locations are given as street addresses rather than latitude
and longitude coordinates.
In addition to information needed by apps, we also need historic data
in order to analyze how people use the system. The most common
kind is system metrics, such as the type released by Bay Area
Bikeshare. This typically shows ridership and membership totals, and
is good for showing how the system has grown. It would be updated at
the end of each day.
Planners and analysts rely on two other types of historic data: trip
history information shows every trip made within a certain period,
and station history data shows the status of the stations within a
certain period. The best example of the former is the Capital Bikeshare
trip history data page, which releases a new data set every quarter.
The latter is sometimes recorded by enthusiasts on their own initiative,
such as the CaBi Tracker website. In San Francisco, Eric Fisher keeps a
daily log of Bay Area Bikeshare stats at trafficways.org/babs (I used
his data in Probing Data from Bay Area Bikeshare).
The trip history and station history files need a naming convention to
reflect the content’s date range. CaBi’s largest quarterly file is 72.5MB,
for the 572,919 trips in the 2nd quarter of 2012 (they have now
started zipping the files). A filename format like trips-2012-3-1-to-2012-5-30.csv would work well.
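As a minimal sketch of that naming convention, the filename can be generated from the first and last dates a file covers. The function name `trip_history_filename` is hypothetical, and the unpadded month and day numbers are an assumption based on the example name trips-2012-3-1-to-2012-5-30.csv.

```python
from datetime import date


def trip_history_filename(start: date, end: date) -> str:
    """Build a trip history filename that encodes the covered date range.

    Assumption: month and day are written without zero padding, as in the
    example name from the text.
    """
    if end < start:
        raise ValueError("end date must not precede start date")
    return (
        f"trips-{start.year}-{start.month}-{start.day}"
        f"-to-{end.year}-{end.month}-{end.day}.csv"
    )


if __name__ == "__main__":
    print(trip_history_filename(date(2012, 3, 1), date(2012, 5, 30)))
```

Zero-padded dates (2012-03-01) would make the files sort chronologically by name, which may be worth considering for a real standard.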
While the systems are expected to protect their customers’ privacy by
not including customer IDs, users should be able to download their
own personal trip history files, and those files should use the same
format as the main trip history files.
Finally, there should be a standard way of summarizing general
information about the entire system. Who provides the equipment,
who runs the system, which jurisdictions participate, where the system
is located and what its boundaries are, what the hours of operation
are, what the operating season is, what the URL is and other contact
info. And to really integrate all the various systems, we also could
benefit from having the URL for a standard-size logo image, plus the
system’s colors. This system information file should also include data
found in a manifest file, namely, a list of all the associated open-data
files.
The system information should include definitions of available
membership types. This might merit being listed as a separate table.
Each membership type should include the cost and duration. We also
need to know how long rides can be, and what the charges are for
going beyond the time limit. For example, the CaBi pricing rules say
rides are free for the first 30 minutes; going up to 30 minutes longer
costs $2.00 for casual members (those with 1- or 3-day memberships)
and $1.50 for subscribers. In contrast, the Citi Bike pricing rules say
rides are free for the first 45 minutes; going up to 30 minutes longer
costs $4.00 for those with 24-hour & 7-day passes, and $2.50 for
those with annual memberships.
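To illustrate why the free period and the overage charge belong in the membership definitions, here is a rough sketch that applies the two rules quoted above. The function name and parameters are hypothetical. The text only gives the charge for the first 30 minutes past the free period, so the sketch returns None for longer rides rather than guessing, and the exact handling of the boundary minute is an assumption.

```python
from typing import Optional


def first_overage_fee(
    ride_minutes: float, free_minutes: int, fee_next_30: float
) -> Optional[float]:
    """Return the usage fee for a ride under a simple two-tier rule.

    Assumptions: a ride of exactly `free_minutes` is still free, and the
    quoted fee covers rides up to 30 minutes past the free period.
    Longer rides return None because the text does not state those charges.
    """
    if ride_minutes < 0:
        raise ValueError("ride length cannot be negative")
    if ride_minutes <= free_minutes:
        return 0.0
    if ride_minutes <= free_minutes + 30:
        return fee_next_30
    return None


if __name__ == "__main__":
    # CaBi casual member, 45-minute ride: 30 free minutes, $2.00 for the next 30.
    print(first_overage_fee(45, free_minutes=30, fee_next_30=2.00))
    # Citi Bike annual member, 50-minute ride: 45 free minutes, $2.50 for the next 30.
    print(first_overage_fee(50, free_minutes=45, fee_next_30=2.50))
```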
This table summarizes the six types of bikesharing data:
System information: general info
Station information: a mostly-static list of all stations
Station status: the number of available bikes and docks
System metrics: membership and trip totals
Trip history: every trip made during a given period
Station history: a history of the station status list
Here’s how I would organize the files. I’ll use ▶ to indicate a primary
key (one that must be unique within the system), and ▷ to indicate a
foreign key (one that references another table’s primary key, and
which must exist).
The station information data is the information most likely to be
shared by bikeshare systems. At the very least, it includes the latitude
& longitude coordinates for every station, and the name. The file is
fairly static, changing mostly when new stations are added.
Here are the fields I would include, compared with CaBi (DC), Vélib
(Paris), and Denver’s B-cycle to see what names they use.
Station information
proposal | CaBi | Vélib | B-cycle
stationid ▶ | id, terminalName | number | GLOBALID
name | name | name | STATION_NAME
address | (not used) | address | STATION_ADDRESS, ADDRESS_LINE1, ADDRESS_LINE2
region | (not used) | (not used) | CITY, STATE
zip | (not used) | (not used) | ZIP
lat | lat | latitude | (not used)
lng | long | longitude | (not used)
installed | installDate | (not used) | (not used)
removed | removalDate | (not used) | (not used)
public | public | (not used) | (not used)
capacity | (not used) | (not used) | NUM_DOCKS
message | (not used) | (not used) | (not used)
Most systems don’t use a region field, but for multi-jurisdictional
systems, it is important to know which jurisdiction manages each
station. For example, Capital Bikeshare operates within DC,
Montgomery County, Arlington, and Alexandria. Bay Area
Bikeshare operates within San Francisco, Redwood City, Palo Alto,
Mountain View, and San Jose. Nice Ride operates within Minneapolis
and St Paul. Other systems could use this field to track which
neighborhood the station is in.
Vélib appends the postal code & city to the address field, but these
would be better as separate fields. For example, the Bastille Richard
Lenoir station has an address of “2 BOULEVARD RICHARD LENOIR –
75011 PARIS”, but this should be just “2 BOULEVARD RICHARD
LENOIR”, with a zip of “75011” and a city of “Paris.” And there is no
reason for Vélib to use all-uppercase letters. The data should be in the
proper mixed-case (using French rules for capitalization), and
programs can easily convert to uppercase if they wish.
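A consumer of the Vélib feed could split the combined field itself. The sketch below assumes every address follows the pattern in the example (street, a dash, a five-digit postal code, then the city); the function name and regular expression are hypothetical. It only re-cases the city with a naive title-case and leaves the street in its original case, since proper French capitalization cannot be recovered mechanically.

```python
import re

# Street, then a dash (en dash or hyphen) surrounded by spaces,
# then a five-digit postal code and the city name.
_ADDRESS_RE = re.compile(
    r"^(?P<address>.+?)\s+[–-]\s+(?P<zip>\d{5})\s+(?P<city>.+)$"
)


def split_velib_address(raw: str) -> dict:
    """Split a combined Vélib-style address into address, zip, and city."""
    text = raw.strip()
    match = _ADDRESS_RE.match(text)
    if match is None:
        # Pattern not recognized: keep the text and leave the rest empty.
        return {"address": text, "zip": None, "city": None}
    return {
        "address": match.group("address"),
        "zip": match.group("zip"),
        "city": match.group("city").title(),  # naive casing, an assumption
    }


if __name__ == "__main__":
    print(split_velib_address("2 BOULEVARD RICHARD LENOIR – 75011 PARIS"))
```

Every app doing this work separately is exactly the duplication a standard with separate fields would avoid.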
I would suggest a message field so systems can communicate that a
station will be shutting down early, or moved to a new location. Or
during snow storms, the rebalancing van might not be able to service
a station.
Denver has other fields that should be considered for a standard.
“PROPERTY_TYPE” shows whether the station’s location
is Private or Public. This could be expanded to show exactly who the
property owner or responsible agency is. “POWER_TYPE” has values
of Solar Only, Wired Only, and Solar with Wire Backup.
Cities often provide temporary stations. The station ID should
correspond to a specific location. If a station returns to the same
location for an annual event, it should re-use the old ID.
The station status file should have the smallest amount of data needed
to describe the current state of each station. This is the file that will be
called most often, potentially thousands of times per minute, so every
byte counts. And many people will be querying this data from mobile
devices, another reason to keep the file size as small as possible.
Here’s how I would design the standard for this file, compared with
CaBi (DC) and Denver’s B-cycle to see what names they use. Because
I couldn’t find Denver’s XML feed, I used CityBike’s Denver JSON feed.
Station status
proposal | CaBi | Denver B-cycle
stationid ▷ | id, terminalName | id, idx
bikes | nbBikes | bikes
docks | nbEmptyDocks | free
open | locked | (not used)
time | lastCommWithServer | timestamp
The bikes and docks numbers will generally add up to
the capacity value in the station information file, but if there are nonfunctioning bikes or docks, the total could be smaller. The open field
would be true or false. Sometimes stations are temporarily closed,
perhaps because they have become inaccessible. The time value shows
the last time the station communicated with the server. This is useful
to determine if the data might no longer be accurate, such as during a
power outage.
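Two checks follow from this description, sketched below with hypothetical function names. The first estimates how many docking points are out of service from the gap between capacity and bikes plus docks. The second flags a status record as possibly stale; it assumes time is a Unix timestamp in seconds, and the five-minute threshold is an arbitrary illustrative value.

```python
def out_of_service(capacity: int, bikes: int, docks: int) -> int:
    """Estimate non-functioning bikes or docks at a station.

    Returns 0 when bikes + docks meets or exceeds the listed capacity.
    """
    return max(capacity - (bikes + docks), 0)


def is_stale(status_time: int, now: int, max_age_seconds: int = 300) -> bool:
    """Report whether the station last contacted the server too long ago.

    Assumption: both times are Unix timestamps in seconds.
    """
    return now - status_time > max_age_seconds


if __name__ == "__main__":
    print(out_of_service(capacity=15, bikes=6, docks=7))
    print(is_stale(status_time=1_393_718_400, now=1_393_719_000))
```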
Notice we don’t duplicate any of the fields in the station
information file, other than our foreign key, the stationid field.
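The key relationship can be verified mechanically. The following sketch assumes both files have already been parsed into lists of dictionaries keyed by the proposed field names; the function name `check_station_keys` is hypothetical.

```python
from typing import Dict, List


def check_station_keys(stations: List[Dict], statuses: List[Dict]) -> List[str]:
    """Check primary-key uniqueness and foreign-key existence for stationid.

    Returns a list of human-readable problems; an empty list means the
    station status records are consistent with the station information.
    """
    problems = []
    known_ids = set()
    # Primary key: each stationid may appear only once in station information.
    for station in stations:
        station_id = station["stationid"]
        if station_id in known_ids:
            problems.append(f"duplicate stationid {station_id}")
        known_ids.add(station_id)
    # Foreign key: each status record must reference a known station.
    for status in statuses:
        if status["stationid"] not in known_ids:
            problems.append(f"unknown stationid {status['stationid']}")
    return problems


if __name__ == "__main__":
    stations = [{"stationid": 1, "name": "A"}, {"stationid": 2, "name": "B"}]
    statuses = [{"stationid": 2, "bikes": 3, "docks": 8}, {"stationid": 9, "bikes": 0, "docks": 0}]
    print(check_station_keys(stations, statuses))
```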
The trip history file also needs to be as compact as possible, not
because people will be downloading it frequently, but because these
files could be used to store millions of records.
Trip history
startdate
startstation ▷
enddate
endstation ▷
bikeid
usertype
The duration of each trip can be computed on-the-fly and doesn’t need
to be included in the file. The startstation and endstation values link up
to the stationid field in the station information file. The usertype field
describes the type of membership the rider has.
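Computing the duration on the fly is a one-line subtraction, as this sketch shows. It assumes startdate and enddate are stored as Unix timestamps in seconds; the function name and sample record are hypothetical.

```python
def trip_duration_seconds(trip: dict) -> int:
    """Derive a trip's duration from its start and end timestamps.

    Assumption: startdate and enddate are Unix timestamps in seconds.
    """
    duration = int(trip["enddate"]) - int(trip["startdate"])
    if duration < 0:
        raise ValueError("trip ends before it starts")
    return duration


if __name__ == "__main__":
    # Illustrative record using the proposed trip history field names.
    trip = {
        "startdate": 1393718400,
        "startstation": 31000,
        "enddate": 1393719300,
        "endstation": 31005,
        "bikeid": "W00001",
        "usertype": "subscriber",
    }
    print(trip_duration_seconds(trip) / 60, "minutes")
```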
Though few systems release trip history data on a regular basis, there
have been occasions when systems have released data in support of a
visualization contest. The Hubway Data Visualization Challenge took
place in 2013, and included demographic data about the rider of each
trip: residential zip code, year of birth, and sex. The Divvy Data
Challenge (for Chicago) is currently underway; its data includes riders’
year of birth and sex.
The station history file should be a list of every change in status
(available bikes and docks) for every station, listed in chronological
order. In order to avoid having to repeat the state of the entire system
when only a few stations have new values, the file should start with
every station, and thereafter list a station only when it has changed.
The initial value would be needed in order to compute the state of any
later times recorded in the file.
Station history
stationid ▷
bikes
docks
open
time
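Because the file records only changes after the initial snapshot, reading the state at a given moment means replaying the records up to that moment. A minimal sketch, assuming the records are already parsed into dictionaries in chronological order with numeric time values (the function name `state_at` is hypothetical):

```python
from typing import Dict, Iterable


def state_at(history: Iterable[Dict], when: int) -> Dict:
    """Replay a change-only station history up to and including `when`.

    Each station keeps its most recent values until a later record
    replaces them, so the initial full snapshot is required.
    """
    state = {}
    for record in history:
        if record["time"] > when:
            break  # records are chronological, so nothing later applies
        state[record["stationid"]] = {
            "bikes": record["bikes"],
            "docks": record["docks"],
            "open": record["open"],
        }
    return state


if __name__ == "__main__":
    history = [
        {"stationid": 1, "bikes": 5, "docks": 6, "open": True, "time": 100},
        {"stationid": 2, "bikes": 0, "docks": 11, "open": True, "time": 100},
        {"stationid": 1, "bikes": 4, "docks": 7, "open": True, "time": 160},
        {"stationid": 2, "bikes": 1, "docks": 10, "open": True, "time": 220},
    ]
    print(state_at(history, 200))
```

The trade-off is that any query has to read the file from the start, which is why each file needs to begin with the full list of stations.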
The dominant data formats nowadays are XML and JSON. CSV is
also a good choice, as long as the data fits in a tabular format,
consisting of simple rows and columns. For CSV files, the order of
fields should be consistent.
The values of the fields are numeric, string, Boolean, and timestamp.
Boolean is easily expressed as “true” or “false,” and Unix time is a
common way of recording date and time.
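As a sketch of how these value types might be read from a CSV file, the example below parses a station status row with the proposed field names. The helper names and the sample row are hypothetical, and treating the timestamp as UTC seconds is an assumption.

```python
import csv
import io
from datetime import datetime, timezone


def parse_bool(text: str) -> bool:
    """Accept only the literal strings "true" and "false"."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a Boolean: {text!r}")


def parse_unix_time(text: str) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(text), tz=timezone.utc)


def read_status(csv_text: str) -> list:
    """Parse station status rows, converting each field to its proper type."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "stationid": row["stationid"],
            "bikes": int(row["bikes"]),
            "docks": int(row["docks"]),
            "open": parse_bool(row["open"]),
            "time": parse_unix_time(row["time"]),
        })
    return rows


if __name__ == "__main__":
    sample = "stationid,bikes,docks,open,time\n31000,5,6,true,1393718400\n"
    print(read_status(sample))
```

Because `csv.DictReader` looks fields up by header name, a header row makes readers tolerant of column order, though a consistent order still helps simpler tools.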
Publishing and standardizing bikesharing open data lets developers and
analysts make it easier for the public to discover and use
bikesharing systems across the globe, with tools such as the Bike Share
Map by Oliver O’Brien. The vendors, operators, and managing
jurisdictions should work together to create a standard that can be
used by everyone.