Weather & Transportation
Streaming the Data, Finding Correlations
Provide capability to Data for Democracy democratizing_weather_data
University of Washington Professional & Continuing Education
BIG DATA 230 B Su 17: Emerging Technologies In Big Data
Team D-Hawks
John Bever, Karunakar Kotha, Leo Salemann, Shiva Vuppala, Wenfan Xu
Overview
Our “Client”
Their Mission
Learn More www.datafordemocracy.org https://github.com/Data4Democracy democratizing_weather_data/streaming
Our Mission
● Provide a streaming capability to extract weather and traffic data from multiple Web APIs, and produce a clean merged dataframe suitable for Machine Learning and other Data Science analysis.
● Deliver code to D4D’s GitHub repository
● Use vendor-neutral, open-source solutions, implemented in Python and Jupyter notebooks
Pipeline
• Kafka transport mechanism (vendor-neutral, open source)
• Message value is an entire JSON document
• One topic per source API, so every message in a topic shares the same schema
• Multiple JSON documents (sharing the same schema) combined into a single dataframe
• Dataframe records joined based on space and time
Web APIs
Postman
• Great tool for interacting with potential APIs.
• Friendly GUI for constructing requests and reading responses.
• Provided JSON files before pipeline was completed. Allowed analysis of data in parallel.
ProgrammableWeb.com
● A massive searchable directory of over 15,500 web APIs that are updated daily
● Includes sample source code for APIs
Producers
Arguments
● Topic
● URL + Access Key
Message.Value
● JSON document
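The producer step above can be sketched as follows. This is a minimal illustration, not the team's script: it assumes the kafka-python package for the real producer, and the function names (`fetch_json_text`, `publish_document`, `make_kafka_producer`) are hypothetical. The access key is assumed to be part of the URL, as the argument list suggests.

```python
import json
from urllib.request import urlopen


def fetch_json_text(url: str) -> str:
    """Download one API response and check that it parses as JSON."""
    with urlopen(url, timeout=30) as response:
        text = response.read().decode("utf-8")
    json.loads(text)  # raises ValueError if the body is not valid JSON
    return text


def publish_document(producer, topic: str, url: str, fetch=fetch_json_text) -> bytes:
    """Send the whole JSON document as the value of a single message."""
    value = fetch(url).encode("utf-8")
    producer.send(topic, value=value)
    producer.flush()  # block until the message has actually been handed off
    return value


def make_kafka_producer(bootstrap_servers: str):
    """Build a real producer (assumption: the kafka-python package is installed)."""
    from kafka import KafkaProducer  # imported lazily so the sketch loads without Kafka
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

A call might look like `publish_document(make_kafka_producer("localhost:9092"), "traffic", url_with_access_key)`, where the broker address and topic name are placeholders.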
Consumers
● Filename includes timestamp
● “utf-8” decoded text file
● One complete JSON file on disk per message
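A consumer along these lines could look like the sketch below. The file-naming scheme (topic plus a UTC timestamp) and the function names are assumptions for illustration; the slides only state that the filename includes a timestamp.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_message(value: bytes, topic: str, out_dir, now=None) -> Path:
    """Write one message value to its own UTF-8 text file, named by topic and timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S%f")  # microseconds keep names distinct
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{topic}_{stamp}.json"
    path.write_text(value.decode("utf-8"), encoding="utf-8")
    return path


def consume_to_disk(consumer, out_dir) -> int:
    """Save every message from an iterable consumer; return the number of files written."""
    count = 0
    for message in consumer:
        # Each message is expected to expose .topic and .value (bytes).
        save_message(message.value, message.topic, out_dir)
        count += 1
    return count
```

With kafka-python, a `KafkaConsumer` is iterable and yields records carrying `topic` and `value`, so it can be passed in directly; setting `consumer_timeout_ms` on the consumer makes the loop end when no new messages arrive.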
Analysis
Load JSON file, normalize, save as dataframe.
Repeat for next JSON file, append to prior.
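The load-normalize-append loop can be sketched with pandas as below. The folder layout, the `source_file` column, and the function name are assumptions added for illustration; the real WSDOT and Yahoo documents may need a `record_path` argument to reach the nested records.

```python
import json
from pathlib import Path

import pandas as pd


def load_json_files(folder, pattern="*.json") -> pd.DataFrame:
    """Normalize every JSON file in a folder and stack the results into one dataframe."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            document = json.load(handle)
        frame = pd.json_normalize(document)  # flattens nested keys into "a.b" columns
        frame["source_file"] = path.name     # remember which file each row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Concatenating once at the end avoids repeatedly copying a growing dataframe.
    return pd.concat(frames, ignore_index=True)
```

Collecting the frames in a list and concatenating once gives the same result as appending file by file, and tends to scale better as the number of files grows.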
7 days of data (includes eclipse!), 30 minutes between readings
1 Merged Traffic/Weather Table (52,975 rows x 30 columns)
54 Weather JSON Files from Yahoo (54 rows x 31 columns)
394 Weather JSON Files from WSDOT (40,931 rows x 16 columns)
395 Traffic JSON Files from WSDOT (70,998 rows x 20 columns)
Merge WSDOT & Yahoo Weather Dataframes (use columns common to both)
Merge Traffic/Weather Dataframes. Each Row has:
- Traffic data from a specific Traffic dataframe row
- Weather data from a weather station within 20 miles and 30 minutes of traffic reading.
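One way to express the 20-mile, 30-minute rule is the plain-Python sketch below. It is an assumption-laden illustration: the field names (`lat`, `lon`, `time`) are hypothetical, distance is computed with the haversine formula, and every matching traffic/weather pair is kept, since the slides do not say whether only the nearest station was used.

```python
from datetime import timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def join_space_time(traffic, weather, max_miles=20.0, max_gap=timedelta(minutes=30)):
    """Pair each traffic record with every weather record close in space and time."""
    merged = []
    for t in traffic:
        for w in weather:
            close = haversine_miles(t["lat"], t["lon"], w["lat"], w["lon"]) <= max_miles
            recent = abs(t["time"] - w["time"]) <= max_gap
            if close and recent:
                row = {f"traffic_{k}": v for k, v in t.items()}
                row.update({f"weather_{k}": v for k, v in w.items()})
                merged.append(row)
    return merged
```

The nested loop compares every traffic row with every weather row, which is easy to read but slow for tens of thousands of rows; filtering by time window first, or using a spatial index, would reduce the work.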
Visualization
Charting with Altair
Mapping with Folium (traffic in black; weather in blue)
[Charts: Temperature for Zillah, WA; Current Travel Time for I-5 SB Corridor. Chart annotation: Eclipse]
Analyzing the Merged Traffic/Weather Dataset
Scatterplot Matrix with Seaborn (10% random sample)
[Plot variables: Average Travel Time, Current Travel Time, Wind Direction, Wind Speed, Temp., Humidity, Barometer]
Wrapping Up ...
Key Takeaways
• Choose your Python libraries carefully (2 lines of code for a fully-labeled lineplot vs. dozens)
• Spatial plots first, data-joins later (I-5 traffic data vs. statewide weather, also Portland)
• The fastest way to count records in a dataframe is df.shape[0]
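The record-count takeaway can be illustrated with a small made-up dataframe (the column names and values here are invented for the example). `df.shape[0]` and `len(df)` both read the row count from the index without scanning the data, whereas `count()` inspects values and skips missing ones, so it answers a different question.

```python
import pandas as pd

# Tiny invented dataframe with some missing readings.
df = pd.DataFrame({"travel_time": [25, 31, None], "temp_f": [70.0, None, 68.5]})

rows_by_shape = df.shape[0]            # row count read from the shape tuple
rows_by_len = len(df)                  # same value, via the length of the index
non_null = df["travel_time"].count()   # counts only non-missing values in one column

print(rows_by_shape, rows_by_len, non_null)
```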
Conclusion
• Data for Democracy has a repeatable way to extract weather and transportation data from WSDOT and Yahoo
• Jupyter Notebook provides a teaching/coding environment
• Bitnami provides low-cost simple Kafka infrastructure
Further Work
• Upload CSV and zipped JSON files to data.world
• Better parameters for Producer scripts (e.g., longitude, latitude, date, time)
• Config files for access keys
• More matrix plots, Data Science, Machine Learning
• Gather data for longer time frames (fewer readings per day?)
• Isolate matrix plots to specific locations and/or time.
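As one possible shape for the "config files for access keys" item, the sketch below reads keys from an INI file with the standard-library `configparser`. The file layout (one section per API, each with an `access_key` entry) and the function name are assumptions, not part of the delivered code.

```python
import configparser


def load_access_keys(path: str) -> dict:
    """Return {api_name: access_key} from an INI file with one section per API."""
    # interpolation=None so that '%' characters inside a key are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    return {section: parser[section]["access_key"] for section in parser.sections()}
```

Keeping the keys in a file that is excluded from version control lets the producer scripts be shared without exposing credentials.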
THANK YOU!

Streaming Weather Data from Web APIs to Jupyter through Kafka

  • 1. Weather & Transportation: Streaming the Data, Finding Correlations
      Provide capability to Data for Democracy democratizing_weather_data
      University of Washington Professional & Continuing Education
      BIG DATA 230 B Su 17: Emerging Technologies In Big Data
      Team D-Hawks: John Bever, Karunakar Kotha, Leo Salemann, Shiva Vuppala, Wenfan Xu
  • 2. Overview
      Our “Client”
      Their Mission
      Learn More: www.datafordemocracy.org, https://github.com/Data4Democracy democratizing_weather_data/streaming
      Our Mission
      ● Provide a streaming capability to extract weather and traffic data from multiple Web APIs, and produce a clean merged dataframe suitable for Machine Learning and other Data Science analysis.
      ● Deliver code to D4D’s GitHub repository.
      ● Use vendor-neutral, open-source solutions, implemented in Python and Jupyter notebooks.
  • 3. Pipeline
      • Kafka is the transport mechanism (vendor-neutral, open source)
      • Each message value is an entire JSON document
      • One topic per source API, so every message in a topic shares a consistent schema
      • Multiple JSON documents (sharing the same schema) are combined into a single dataframe
      • Dataframe records are joined based on space and time
  • 4. Web APIs
      Postman
      • Great tool for interacting with potential APIs.
      • Friendly GUI for constructing requests and reading responses.
      • Provided JSON files before the pipeline was completed, which allowed data analysis to proceed in parallel.
      ProgrammableWeb.com
      ● A massive searchable directory of over 15,500 web APIs, updated daily
      ● Includes sample source code for APIs
  • 5. Producers
      Arguments
      ● Topic
      ● URL + Access Key
      Message.Value
      ● JSON document
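The producer described on slide 5 can be sketched roughly as follows. This is a minimal illustration, not the team's script: the function names (`build_message`, `publish_once`), the `fetch` and `send` callables, the topic name, the URL, and the `temperature_f` field are all hypothetical. The Kafka client is passed in as a plain callable so the sketch runs without a broker or network access.

```python
import json
from typing import Callable


def build_message(document: dict) -> bytes:
    """Serialize one whole JSON document as UTF-8 bytes (the message value)."""
    return json.dumps(document).encode("utf-8")


def publish_once(
    topic: str,
    url: str,
    access_key: str,
    fetch: Callable[[str, str], dict],
    send: Callable[[str, bytes], None],
) -> bytes:
    """Fetch one JSON document from a web API and publish it to one topic.

    `fetch` and `send` are injected (an assumption of this sketch) so the
    HTTP client and the Kafka client can be swapped or faked.
    """
    document = fetch(url, access_key)
    payload = build_message(document)
    send(topic, payload)
    return payload


# Stand-ins so the example runs offline; the field name is made up.
def fake_fetch(url: str, access_key: str) -> dict:
    return {"source": url, "temperature_f": 71.5}


sent = []  # collects (topic, value) pairs instead of talking to a broker
publish_once(
    "wsdot_weather",                     # hypothetical topic name
    "https://example.invalid/weather",   # placeholder URL
    "DEMO_KEY",                          # placeholder access key
    fake_fetch,
    lambda topic, value: sent.append((topic, value)),
)
```

With the kafka-python package, `send` could wrap a real producer, for example `lambda t, v: producer.send(t, value=v)` where `producer` is a `KafkaProducer`; whether the original scripts used that client is not stated in the slides.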
  • 6. Consumers
      ● Filename includes timestamp
      ● “utf-8” decoded text file
      ● One complete JSON file on disk per message
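The consumer behavior on slide 6 (one UTF-8 text file per message, with a timestamp in the filename) might look like the sketch below. The filename pattern, the function names, and the topic name are assumptions made for illustration; the slides only say that the filename includes a timestamp.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def message_filename(topic: str, received_at: datetime) -> str:
    """Build a filename that embeds the receive time (hypothetical pattern)."""
    stamp = received_at.strftime("%Y%m%dT%H%M%S%fZ")
    return f"{topic}_{stamp}.json"


def save_message(
    value: bytes,
    topic: str,
    out_dir: Path,
    received_at: Optional[datetime] = None,
) -> Path:
    """Decode one message value as UTF-8 and write it to its own file."""
    received_at = received_at or datetime.now(timezone.utc)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / message_filename(topic, received_at)
    path.write_text(value.decode("utf-8"), encoding="utf-8")
    return path


# Demo with a temporary directory standing in for the real output folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = save_message(
        b'{"travel_time": 42}',          # made-up message value
        "wsdot_traffic",                 # hypothetical topic name
        Path(tmp),
        datetime(2017, 8, 21, 17, 15, 0, tzinfo=timezone.utc),
    )
    demo_name = demo_path.name
    demo_text = demo_path.read_text(encoding="utf-8")
```

In a real consumer loop, `save_message` would be called once per received message with that message's value bytes. Including microseconds in the stamp reduces the chance that two messages arriving in the same second overwrite each other.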
  • 7. Analysis
      Load a JSON file, normalize it, and save it as a dataframe. Repeat for the next JSON file, appending to the prior dataframe.
      7 days of data (includes the eclipse!), 30 minutes between readings
      ● 1 Merged Traffic/Weather table (52,975 rows x 30 columns)
      ● 54 Weather JSON files from Yahoo (54 rows x 31 columns)
      ● 394 Weather JSON files from WSDOT (40,931 rows x 16 columns)
      ● 395 Traffic JSON files from WSDOT (70,998 rows x 20 columns)
      Merge the WSDOT and Yahoo weather dataframes (use columns common to both).
      Merge the traffic and weather dataframes. Each row has:
      - Traffic data from a specific traffic dataframe row
      - Weather data from a weather station within 20 miles and 30 minutes of the traffic reading
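The space-and-time join on slide 7 (a weather station within 20 miles and 30 minutes of each traffic reading) can be illustrated with plain Python records. This is a sketch under stated assumptions: the record keys (`lat`, `lon`, `time`, `station`, `route`), the haversine distance, and the choice to emit one row per qualifying pair are all illustrative. The slides do not say how distance was computed or how multiple matching stations were resolved, and the team worked with dataframes rather than lists of dicts.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_traffic_weather(traffic, weather, max_miles=20.0, max_gap=timedelta(minutes=30)):
    """Pair each traffic record with every weather record close in space and time."""
    merged = []
    for t in traffic:
        for w in weather:
            close_in_time = abs(t["time"] - w["time"]) <= max_gap
            close_in_space = haversine_miles(t["lat"], t["lon"], w["lat"], w["lon"]) <= max_miles
            if close_in_time and close_in_space:
                row = {f"traffic_{k}": v for k, v in t.items()}
                row.update({f"weather_{k}": v for k, v in w.items()})
                merged.append(row)
    return merged


# Made-up records with approximate coordinates, for illustration only.
traffic = [
    {"route": "I-5 SB", "lat": 47.6062, "lon": -122.3321, "time": datetime(2017, 8, 21, 10, 0)},
]
weather = [
    {"station": "Bellevue", "lat": 47.6101, "lon": -122.2015, "time": datetime(2017, 8, 21, 10, 20)},
    {"station": "Bellevue", "lat": 47.6101, "lon": -122.2015, "time": datetime(2017, 8, 21, 11, 0)},
    {"station": "Zillah", "lat": 46.4021, "lon": -120.2620, "time": datetime(2017, 8, 21, 10, 0)},
]
merged = join_traffic_weather(traffic, weather)
```

Only the first Bellevue reading qualifies here: the second is 60 minutes away in time, and Zillah is well beyond 20 miles. The nested loop is quadratic, so for tens of thousands of rows a spatial index or a time-bucketed merge would be a more practical choice.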
  • 8. Visualization
      Charting with Altair
      Mapping with Folium (traffic in black; weather in blue)
      [Figures: Temperature for Zillah, WA; Current Travel Time for I-5 SB Corridor; Eclipse]
  • 9. Analyzing the Merged Traffic/Weather Dataset
      Scatterplot Matrix with Seaborn (10% random sample)
      [Figure variables: Average Travel Time, Current Travel Time, Wind Direction, Wind Speed, Temp., Humidity, Barometer]
  • 10. Wrapping Up ...
      Key Takeaways
      • Choose your Python libraries carefully (2 lines of code for a fully-labeled lineplot vs. dozens)
      • Spatial plots first, data joins later (I-5 traffic data vs. statewide weather, also Portland)
      • The fastest way to count records in a dataframe is df.shape[0]
      Conclusion
      • Data for Democracy has a repeatable way to extract weather and transportation data from WSDOT and Yahoo
      • Jupyter Notebook provides a teaching/coding environment
      • Bitnami provides low-cost, simple Kafka infrastructure
      Further Work
      • Upload CSV and zipped JSON files to data.world
      • Better parameters for Producer scripts (e.g. Longitude, Latitude, Date, Time)
      • Config files for access keys
      • More matrix plots, Data Science, Machine Learning
      • Gather data for longer time frames (fewer readings per day?)
      • Isolate matrix plots to specific locations and/or times