SlideShare a Scribd company logo
Weather & Transportation
Streaming the Data, Finding Correlations
Provide capability to Data for Democracy democratizing_weather_data
University of Washington Professional & Continuing Education
BIG DATA 230 B Su 17: Emerging Technologies In Big Data
Team D-Hawks
John Bever, Karunakar Kotha, Leo Salemann, Shiva Vuppala, Wenfan Xu
Overview
Our
“Client”
Their Mission
Learn More www.datafordemocracy.org https://github.com/Data4Democracy democratizing_weather_data/streaming
Our Mission ● Provide a streaming capability to extract weather and traffic data from multiple Web API’s, and produce a
clean merged dataframe suitable for Machine Learning and other Data Science analysis.
● Deliver code to D4D’s Github Repository
● Use vendor-neutral, opensource solutions, implemented in python and Jupyter notebooks
Pipeline
• Kafka transport mechanism (vendor-neutral, open source)
• Message value is an entire JSON document
• One topic per source API, guarantees consistent schema
• Multiple json documents (sharing same schema) combined into a single dataframe
• Dataframe records joined based on space and time
Web APIs
Postman
• Great tool for interacting with potential
APIs.
• Friendly GUI for constructing requests
and reading responses.
• Provided JSON files before pipeline
was completed. Allowed analysis of
data in parallel
ProgrammableWeb.com
● A massive searchable directory of over
15,500 web APIs that are updated daily
● Includes sample source code for APIs
Producers
Arguments
● Topic
● URL + Access Key
Message.Value
● JSON document
Consumers
● Filename includes timestamp
● “utf-8” decoded text file
● One complete JSON file on
disk per message
Analysis
Load Json file, normalize, save as dataframe.
Repeat for next json file, append to prior.
7 days of data (includes eclipse!) 30 minutes between readings
1 Merged Traffic/Weather Table (52,975 rows x 30 columns)
54 Weather Json Files from Yahoo (54 rows x 31 columns)
394 Weather Json Files from WSDOT (40,931 rows x 16 columns)
395 Traffic Json Files from WSDOT (70,998 rows x 20 columns)
Merge WSDOT & Yahoo Weather Dataframes (use columns common to both)
Merge Traffic/Weather Dataframes. Each Row has:
- Traffic data from a specific Traffic dataframe row
- Weather data from a weather station within 20 miles and 30 minutes of traffic reading.
Visualization
Charting with AltairMapping with Folium (traffic in black; weather in blue)
TemperatureforZillah,WA
CurrentTravelTimeforI-5
SBCorridor
Eclipse
Analyzing the Merged/Traffic Weather Dataset
Scatterplot Matrix with Seaborn (10% random sample)
Average Travel Time
Current Travel Time
Wind Direction
Wind SpeedTemp.
Humidity
Barometer
Wrapping Up ...
Key Takeaways
• Choose your python libraries carefully (2 lines of code for a fully-labeled lineplot vs. dozens)
• Spatial plots first, data-joins later (I-5 traffic data vs. statewide weather, also Portland)
• The fastest way to count records in a dataframe is df.shape[0]
Conclusion
• Data for Democracy has a repeatable way to extract weather and transportation data from WSDOT and Yahoo
• Jupyter Notebook provides a teaching/coding environment
• Bitnami provides low-cost simple Kafka infrastructure
Further Work
• Upload csv and zipped json’s to data.world
• Better parameters for Producer scripts (ex. Longitude, Latitude, Date, Time)
• Config files for access keys
• More matrix plots, Data Science, Machine Learning
•Gather data for longer time frames (fewer readings per day?)
•Isolate matrix plots to specific locations and/or time.
THANKYOU!

More Related Content

What's hot

Open trip planner status update may 2011
Open trip planner status update may 2011Open trip planner status update may 2011
Open trip planner status update may 2011
bibianamchugh
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...Xiaogang (Marshall) Ma
 
What to do with the existing spatial data in planning
What to do with the existing spatial data in planningWhat to do with the existing spatial data in planning
What to do with the existing spatial data in planning
Karel Charvat
 
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
Anita Graser
 
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
andimou
 
Query Reranking As A Service
Query Reranking As A ServiceQuery Reranking As A Service
Query Reranking As A Service
Abolfazl Asudeh
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
Kostis Kyzirakos
 
06 preview of a global survey of selected deep underground facilities tynan l...
06 preview of a global survey of selected deep underground facilities tynan l...06 preview of a global survey of selected deep underground facilities tynan l...
06 preview of a global survey of selected deep underground facilities tynan l...
leann_mays
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
Shivaprakash Yaragal
 
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
Sean Barbeau
 
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...grssieee
 
Graph of UK train stations
Graph of UK train stationsGraph of UK train stations
Graph of UK train stations
Daniyar Mukhanov
 
Using topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban dataUsing topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban data
ivaderivader
 
Itasec2020
Itasec2020Itasec2020
Itasec2020
Ivan Letteri
 
Tabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large dataTabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large data
Edwin de Jonge
 
Starting work with R
Starting work with RStarting work with R
Starting work with R
Vladimir Bakhrushin
 
Provenance Analytics at AAAI Human Computation Conference 2013
Provenance Analytics at AAAI Human Computation Conference 2013Provenance Analytics at AAAI Human Computation Conference 2013
Provenance Analytics at AAAI Human Computation Conference 2013
T Dong Huynh
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modeling
Graph-TA
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceMatteo Romanello
 

What's hot (20)

Open trip planner status update may 2011
Open trip planner status update may 2011Open trip planner status update may 2011
Open trip planner status update may 2011
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...
 
What to do with the existing spatial data in planning
What to do with the existing spatial data in planningWhat to do with the existing spatial data in planning
What to do with the existing spatial data in planning
 
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se...
 
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
 
Query Reranking As A Service
Query Reranking As A ServiceQuery Reranking As A Service
Query Reranking As A Service
 
The spatiotemporal RDF store Strabon
The spatiotemporal RDF store StrabonThe spatiotemporal RDF store Strabon
The spatiotemporal RDF store Strabon
 
Poster Final
Poster FinalPoster Final
Poster Final
 
06 preview of a global survey of selected deep underground facilities tynan l...
06 preview of a global survey of selected deep underground facilities tynan l...06 preview of a global survey of selected deep underground facilities tynan l...
06 preview of a global survey of selected deep underground facilities tynan l...
 
Big data in GIS Environment
Big data in GIS Environment Big data in GIS Environment
Big data in GIS Environment
 
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency...
 
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...
Xiaolin Wang - Managing and Integrating Geography Models in Distributed Envir...
 
Graph of UK train stations
Graph of UK train stationsGraph of UK train stations
Graph of UK train stations
 
Using topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban dataUsing topological analysis to support event guided exploration in urban data
Using topological analysis to support event guided exploration in urban data
 
Itasec2020
Itasec2020Itasec2020
Itasec2020
 
Tabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large dataTabplotd3, interactive inspection of large data
Tabplotd3, interactive inspection of large data
 
Starting work with R
Starting work with RStarting work with R
Starting work with R
 
Provenance Analytics at AAAI Human Computation Conference 2013
Provenance Analytics at AAAI Human Computation Conference 2013Provenance Analytics at AAAI Human Computation Conference 2013
Provenance Analytics at AAAI Human Computation Conference 2013
 
Synthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modelingSynthetic Data Generation using exponential random Graph modeling
Synthetic Data Generation using exponential random Graph modeling
 
DARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and SpaceDARIAH Geo-browser: Exploring Data through Time and Space
DARIAH Geo-browser: Exploring Data through Time and Space
 

Similar to Streaming Weather Data from Web APIs to Jupyter through Kafka

Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual ObservationsUnderstanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Artificial Intelligence Institute at UofSC
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
Vincenzo Gulisano
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
marpierc
 
2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATTRudolf Husar
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Data
aharth
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
Ian Foster
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Brian O'Neill
 
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
DataStax Academy
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
Anubhav Jain
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
Ian Foster
 
Understanding Public Transport Networks using Free and Open Source Software
Understanding Public Transport Networks using Free and Open Source SoftwareUnderstanding Public Transport Networks using Free and Open Source Software
Understanding Public Transport Networks using Free and Open Source Software
Patrick Sunter
 
Integrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City EventsIntegrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City Events
Artificial Intelligence Institute at UofSC
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
FogGuru MSCA Project
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
Daniel S. Katz
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
diannepatricia
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
HPCC Systems
 
Reflections on Almost Two Decades of Research into Stream Processing
Reflections on Almost Two Decades of Research into Stream ProcessingReflections on Almost Two Decades of Research into Stream Processing
Reflections on Almost Two Decades of Research into Stream Processing
Kyumars Sheykh Esmaili
 
2010 06-08 chania stochastic web modelling - copy
2010 06-08 chania stochastic web modelling - copy2010 06-08 chania stochastic web modelling - copy
2010 06-08 chania stochastic web modelling - copyvafopoulos
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Ian Foster
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
Vincenzo Gulisano
 

Similar to Streaming Weather Data from Web APIs to Jupyter through Kafka (20)

Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual ObservationsUnderstanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT
 
On-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked DataOn-the-fly Integration of Static and Dynamic Linked Data
On-the-fly Integration of Static and Dynamic Linked Data
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
 
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R...
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Understanding Public Transport Networks using Free and Open Source Software
Understanding Public Transport Networks using Free and Open Source SoftwareUnderstanding Public Transport Networks using Free and Open Source Software
Understanding Public Transport Networks using Free and Open Source Software
 
Integrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City EventsIntegrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City Events
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Reflections on Almost Two Decades of Research into Stream Processing
Reflections on Almost Two Decades of Research into Stream ProcessingReflections on Almost Two Decades of Research into Stream Processing
Reflections on Almost Two Decades of Research into Stream Processing
 
2010 06-08 chania stochastic web modelling - copy
2010 06-08 chania stochastic web modelling - copy2010 06-08 chania stochastic web modelling - copy
2010 06-08 chania stochastic web modelling - copy
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 

Recently uploaded

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 

Recently uploaded (20)

1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 

Streaming Weather Data from Web APIs to Jupyter through Kafka

  • 1. Weather & Transportation Streaming the Data, Finding Correlations Provide capability to Data for Democracy democratizing_weather_data University of Washington Professional & Continuing Education BIG DATA 230 B Su 17: Emerging Technologies In Big Data Team D-Hawks John Bever, Karunakar Kotha, Leo Salemann, Shiva Vuppala, Wenfan Xu
  • 2. Overview Our “Client” Their Mission Learn More www.datafordemocracy.org https://github.com/Data4Democracy democratizing_weather_data/streaming Our Mission ● Provide a streaming capability to extract weather and traffic data from multiple Web API’s, and produce a clean merged dataframe suitable for Machine Learning and other Data Science analysis. ● Deliver code to D4D’s Github Repository ● Use vendor-neutral, opensource solutions, implemented in python and Jupyter notebooks
  • 3. Pipeline • Kafka transport mechanism (vendor-neutral, open source) • Message value is an entire JSON document • One topic per source API, guarantees consistent schema • Multiple json documents (sharing same schema) combined into a single dataframe • Dataframe records joined based on space and time
  • 4. Web APIs Postman • Great tool for interacting with potential APIs. • Friendly GUI for constructing requests and reading responses. • Provided JSON files before pipeline was completed. Allowed analysis of data in parallel ProgrammableWeb.com ● A massive searchable directory of over 15,500 web APIs that are updated daily ● Includes sample source code for APIs
  • 5. Producers Arguments ● Topic ● URL + Access Key Message.Value ● JSON document
  • 6. Consumers ● Filename includes timestamp ● “utf-8” decoded text file ● One complete JSON file on disk per message
  • 7. Analysis Load Json file, normalize, save as dataframe. Repeat for next json file, append to prior. 7 days of data (includes eclipse!) 30 minutes between readings 1 Merged Traffic/Weather Table (52,975 rows x 30 columns) 54 Weather Json Files from Yahoo (54 rows x 31 columns) 394 Weather Json Files from WSDOT (40,931 rows x 16 columns) 395 Traffic Json Files from WSDOT (70,998 rows x 20 columns) Merge WSDOT & Yahoo Weather Dataframes (use columns common to both) Merge Traffic/Weather Dataframes. Each Row has: - Traffic data from a specific Traffic dataframe row - Weather data from a weather station within 20 miles and 30 minutes of traffic reading.
  • 8. Visualization Charting with AltairMapping with Folium (traffic in black; weather in blue) TemperatureforZillah,WA CurrentTravelTimeforI-5 SBCorridor Eclipse
  • 9. Analyzing the Merged/Traffic Weather Dataset Scatterplot Matrix with Seaborn (10% random sample) Average Travel Time Current Travel Time Wind Direction Wind SpeedTemp. Humidity Barometer
  • 10. Wrapping Up ... Key Takeaways • Choose your python libraries carefully (2 lines of code for a fully-labeled lineplot vs. dozens) • Spatial plots first, data-joins later (I-5 traffic data vs. statewide weather, also Portland) • The fastest way to count records in a dataframe is df.shape[0] Conclusion • Data for Democracy has a repeatable way to extract weather and transportation data from WSDOT and Yahoo • Jupyter Notebook provides a teaching/coding environment • Bitnami provides low-cost simple Kafka infrastructure Further Work • Upload csv and zipped json’s to data.world • Better parameters for Producer scripts (ex. Longitude, Latitude, Date, Time) • Config files for access keys • More matrix plots, Data Science, Machine Learning •Gather data for longer time frames (fewer readings per day?) •Isolate matrix plots to specific locations and/or time.