The process and steps followed in creating a successful visualization, illustrated with Encyclopedia of Life data and a Tableau visualization prototype.
A Knowledge Graph Framework for Detecting Traffic Events Using Stationary Cam... - RoopTeja Muppalla
Imagery-based Traffic Sensing Knowledge Graph (ITSKG) framework utilizes the stationary traffic camera information as sensors to understand the traffic patterns. This system extracts image-based features from traffic camera images, adds a semantic layer to the sensor data for traffic information, and then labels traffic imagery with semantic labels such as congestion. This framework adds a new dimension to existing traffic modeling systems by incorporating dynamic image-based features as well as creating a knowledge graph to add a layer of abstraction to understand and interpret concepts like congestion to the traffic event detection system.
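The labeling step can be pictured with a tiny sketch. This is not the authors' ITSKG code; the feature names (`vehicle_count`, `avg_speed_kmh`) and the thresholds are hypothetical stand-ins for the image-based features the framework extracts.

```python
# Illustrative sketch (not the authors' code): mapping image-derived
# features from a traffic camera frame to a semantic label such as
# "congestion". Feature names and thresholds here are hypothetical.

def label_frame(features, vehicle_threshold=20, speed_threshold=15.0):
    """Attach a semantic label to one camera frame's extracted features."""
    if features["vehicle_count"] >= vehicle_threshold and \
       features["avg_speed_kmh"] <= speed_threshold:
        return "congestion"
    return "free_flow"

frame = {"camera_id": "cam-42", "vehicle_count": 31, "avg_speed_kmh": 9.5}
print(label_frame(frame))  # congestion
```

In the framework itself, such labels would then be attached to the sensor data as a semantic layer in the knowledge graph rather than returned as plain strings.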
This work was presented at the Industrial Knowledge workshop, co-located with the 9th International ACM Web Science Conference 2017, on 25 June 2017.
Visualising statistical Linked Data with Plone - Eau de Web
Presentation of a Plone-based tool that can create graphical visualisations of semantic statistical data expressed using the RDF Data Cube Vocabulary and queried using generated SPARQL statements. The tool was developed under a project funded by the European Commission and is publicly available at www.digital-agenda-data.eu
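To illustrate the kind of SPARQL the tool generates, here is a hedged Python sketch that assembles a query over RDF Data Cube observations; the dataset, dimension, and measure URIs are placeholders, not the tool's actual output.

```python
# Sketch of generating a SPARQL query over data expressed with the
# RDF Data Cube Vocabulary (qb:). The URIs passed in are illustrative
# placeholders, not the tool's real identifiers.

QB = "http://purl.org/linked-data/cube#"

def build_query(dataset_uri, dimension_uri, measure_uri):
    """Build a SELECT query pairing one dimension with one measure."""
    return f"""PREFIX qb: <{QB}>
SELECT ?dim ?value WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset_uri}> ;
       <{dimension_uri}> ?dim ;
       <{measure_uri}> ?value .
}}
ORDER BY ?dim"""

q = build_query("http://example.org/dataset/broadband",
                "http://example.org/dim/year",
                "http://example.org/measure/penetration")
print(q)
```

Each chart in such a tool boils down to a query of this shape: pick a dataset, one or more dimensions for the axes, and a measure for the values.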
What to do with the existing spatial data in planning - Karel Charvat
Spatial planning cuts across all levels of government, so planners face important challenges every day in developing territorial frameworks and concepts.
Spatial planning systems, the legal situation, and spatial planning data management are completely different and fragmented throughout Europe.
Nevertheless, planning is a holistic activity.
All tasks and processes must be solved comprehensively, with input from various sources.
Making these inputs interoperable is therefore essential: it allows users to search data from different sources, view them, download them, and use them with the help of geoinformation technologies (GIT).
From Simple Features to Moving Features and Beyond? at OGC Member Meeting, Se... - Anita Graser
Presentation of arxiv preprint https://arxiv.org/abs/2006.16900
Mobility data science lacks common data structures and analytical functions. This position paper assesses the current status and open issues towards a universal API for mobility data science. In particular, we look at standardization efforts revolving around the OGC Moving Features standard which, so far, has not attracted much attention within the mobility data science community. We discuss the hurdles any universal API for movement data has to overcome and propose key steps of a roadmap that would provide the foundation for the development of this API.
Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality - andimou
RDF dataset quality assessment is currently performed primarily after the data is published. Incorporating its results, by applying corresponding adjustments to the dataset, happens manually and occurs rarely. In the case of (semi-)structured data (e.g., CSV, XML), the root of the violations often lies in the mappings that specify how the RDF dataset will be generated. Thus, we suggest shifting the quality assessment from the RDF dataset to the mapping definitions that generate it. The proposed test-driven approach for assessing mappings relies on RDFUnit test cases applied over mappings specified with RML. Our evaluation covers different cases, e.g., DBpedia, and indicates that the overall quality of an RDF dataset is quickly and significantly improved.
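The shift from dataset-level to mapping-level testing can be illustrated with a minimal sketch. This is not RDFUnit or RML syntax; each rule below is a simplified stand-in for an [R2]RML predicate-object map, checked against sample source values before any RDF is generated.

```python
# Illustrative sketch of the test-driven idea: validate mapping rules
# against sample values *before* generating the dataset, instead of
# assessing the published RDF afterwards. Rule structure is made up.
import re

rules = [
    {"predicate": "ex:birthYear", "datatype": "xsd:gYear",  "pattern": r"\d{4}"},
    {"predicate": "ex:name",      "datatype": "xsd:string", "pattern": r".+"},
]

def assess_rule(rule, sample_values):
    """Return the sample values that would violate the rule's datatype."""
    return [v for v in sample_values if not re.fullmatch(rule["pattern"], v)]

violations = assess_rule(rules[0], ["1984", "19-84"])
print(violations)  # ['19-84']
```

Catching the malformed `19-84` at the mapping stage fixes every triple that rule would have produced, which is the efficiency argument the abstract makes.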
In Proceedings of the VLDB Endowment (PVLDB)
Vol. 9 No. 11
www.vldb.org/pvldb/vol9/p888-asudeh.pdf
--
The ranked retrieval model has rapidly become the de facto way for search query processing in client-server databases, especially those on the web. Despite the extensive efforts in the database community on designing better ranking functions and mechanisms, many such databases in practice still fail to address the diverse and sometimes contradicting preferences of users on tuple ranking, perhaps (at least partially) due to the lack of expertise and/or motivation for the database owner to design truly effective ranking functions. This paper takes a different route by defining a novel "query reranking problem": we aim to design a third-party service that uses nothing but the public search interface of a client-server database to enable on-the-fly processing of queries with any user-specified ranking function (with or without selection conditions), whether or not the ranking function is supported by the database. We analyze the worst-case complexity of the problem and introduce a number of ideas, e.g., on-the-fly indexing, domination detection, and virtual tuple pruning, to reduce the average-case cost of the query reranking algorithm. We also present extensive experimental results on real-world datasets, in both offline and live online systems, that demonstrate the effectiveness of our proposed techniques.
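The domination-detection idea can be sketched as follows (a hedged illustration, not the paper's implementation; the attribute names are made up): if tuple a is at least as good as tuple b on every ranking attribute and strictly better on at least one, then b can never outrank a under any monotone ranking function, so b can be pruned from reranking.

```python
# Sketch of Pareto-domination pruning for query reranking.
# Higher attribute values are assumed better; attributes are invented.

def dominates(a, b, attrs):
    """True if a is >= b on every attribute and > b on at least one."""
    ge = all(a[k] >= b[k] for k in attrs)
    gt = any(a[k] > b[k] for k in attrs)
    return ge and gt

def prune_dominated(tuples, attrs):
    """Keep only tuples that no other tuple dominates."""
    return [t for t in tuples
            if not any(dominates(o, t, attrs) for o in tuples if o is not t)]

hotels = [
    {"id": 1, "rating": 4.5, "cheapness": 0.8},
    {"id": 2, "rating": 4.0, "cheapness": 0.5},   # dominated by id 1
    {"id": 3, "rating": 4.9, "cheapness": 0.2},
]
print([t["id"] for t in prune_dominated(hotels, ["rating", "cheapness"])])  # [1, 3]
```

The pruned tuple can be discarded no matter which monotone user-specified ranking function is later applied, which is what makes domination a safe filter.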
Presentation of the spatiotemporal RDF store Strabon at the Linked Data Europe Workshop, co-located with the European Data Forum in Athens, Greece (21 March 2014)
2011 ITS World Congress - GO-Sync - A Framework to Synchronize Transit Agency... - Sean Barbeau
Discusses an open-source tool that can sync GTFS datasets with OpenStreetMap to help small agencies manage their bus stop inventory via crowd-sourcing. Includes some actual results of crowd-sourcing bus stop location accuracy in Tampa, FL.
Abstract - In the present paper we describe a new, updated and refined dataset specifically tailored to train and evaluate machine learning based malware traffic analysis algorithms. To generate it, we started from the largest databases of network traffic captures available online, deriving a dataset with a set of widely-applicable features and then cleaning and preprocessing it to remove noise, handle missing data and keep its size as small as possible. The resulting dataset is not biased by any specific application (although specifically addressed to machine learning algorithms), and the entire process can run automatically to keep it updated.
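The cleaning and preprocessing steps can be sketched minimally; the actual feature set and rules are the authors', and the field names below are placeholders.

```python
# Minimal sketch of the kind of cleaning described in the abstract:
# drop records with missing values and collapse exact duplicates, so
# the dataset stays denoised and as small as possible.

def clean(records, required):
    seen, out = set(), []
    for r in records:
        if any(r.get(k) in (None, "") for k in required):
            continue                      # handle missing data: drop row
        key = tuple(sorted(r.items()))
        if key in seen:                   # keep dataset size small: dedupe
            continue
        seen.add(key)
        out.append(r)
    return out

raw = [
    {"src_port": 443, "bytes": 1200},
    {"src_port": 443, "bytes": 1200},        # exact duplicate
    {"src_port": None, "bytes": 80},         # missing value
]
print(clean(raw, ["src_port", "bytes"]))  # [{'src_port': 443, 'bytes': 1200}]
```

In an automated refresh pipeline, a deterministic step like this is what lets the dataset be regenerated and kept updated without manual intervention.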
Provenance Analytics at AAAI Human Computation Conference 2013 - T Dong Huynh
Trung Dong Huynh presenting the paper entitled "Interpretation of Crowdsourced Activities using Provenance Network Analysis": how analysing provenance graphs can help interpret crowdsourced activities in CollabMap.
Understanding speed and travel-time dynamics in response to various city related events is an important and challenging problem. Sensor data (numerical) containing average speed of vehicles passing through a road link can be interpreted in terms of traffic related incident reports from city authorities and social media data (textual), providing a complementary understanding of traffic dynamics. State-of-the-art research is focused on either analyzing sensor observations or citizen observations; we seek to exploit both in a synergistic manner.
We demonstrate the role of domain knowledge in capturing the non-linearity of speed and travel-time dynamics by segmenting speed and travel-time observations into simpler components amenable to description using linear models such as the Linear Dynamical System (LDS). Specifically, we propose the Restricted Switching Linear Dynamical System (RSLDS) to model normal speed and travel-time dynamics and thereby characterize anomalous dynamics. We utilize city traffic events extracted from text to explain the anomalous dynamics. We present a large-scale evaluation of the proposed approach on a real-world traffic and Twitter dataset collected over a year, with promising results.
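A greatly simplified stand-in (not RSLDS itself) conveys the underlying idea: fit a linear model to "normal" dynamics and flag observations with large residuals as anomalous, to be explained afterwards by textual event reports.

```python
# Toy scalar stand-in for the linear-dynamics idea: fit x[t+1] ~ a*x[t]
# by least squares, then flag time steps whose residual is large.
# RSLDS is far richer (switching regimes, restrictions); this only
# illustrates "linear model for normal, residuals for anomalous".

def fit_ar1(series):
    """Least-squares estimate of a in x[t+1] = a * x[t]."""
    num = sum(series[t] * series[t + 1] for t in range(len(series) - 1))
    den = sum(x * x for x in series[:-1])
    return num / den

def anomalies(series, a, threshold):
    """Indices whose observation deviates from the linear prediction."""
    return [t + 1 for t in range(len(series) - 1)
            if abs(series[t + 1] - a * series[t]) > threshold]

speeds = [60, 59, 61, 60, 58, 25, 22, 59, 60]  # sudden drop ~ incident
a = fit_ar1(speeds)
print(anomalies(speeds, a, threshold=15))  # [5, 7]
```

Both the abrupt drop (index 5) and the abrupt recovery (index 7) break the linear model; in the paper's setting, traffic events extracted from tweets would be used to explain such flagged segments.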
The data streaming processing paradigm and its use in modern fog architectures - Vincenzo Gulisano
Invited lecture at the University of Trieste.
The lecture covers (briefly) the data streaming processing paradigm, research challenges related to distributed, parallel, and deterministic streaming analysis, and the research of the DCS (Distributed Computing and Systems) group at Chalmers University of Technology.
On-the-fly Integration of Static and Dynamic Linked Data - aharth
Slides of COLD 2013 Paper "On-the-fly Integration of Static and Dynamic Linked Data", Andreas Harth (KIT), Craig Knoblock (USC), Steffen Stadtmüller (KIT), Rudi Studer (KIT), Pedro Szekely (USC)
I summarize requirements for an "Open Analytics Environment" (aka "the Cauldron"), and some work being performed at the University of Chicago and Argonne National Laboratory towards its realization.
Cassandra Day 2014: Re-envisioning the Lambda Architecture - Web-Services & R... - DataStax Academy
Do you want to expose a kick-ass REST API? Do you want interactions with that API to drive slick dashboards that answer hard questions? Do you want all of that in near real-time, distributed across a number of machines, and tolerant of system faults? Take one part Cassandra, stir in a bit of Paxos, and blend with Storm. Coat the rim of the glass with DropWizard. Sit back, relax, and enjoy the show. Come see how Health Market Science is using this mixology to fix our healthcare system.
My talk at the Winter School on Big Data in Tarragona, Spain.
Abstract: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers.
Presentation by Steffen Zeuch, Researcher at German Research Center for Artificial Intelligence (DFKI) and Post-Doc at TU Berlin (Germany), at the FogGuru Boot Camp training in September 2018.
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
The Swift scripting language was created to provide a simple, compact way to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, reducing the need for complex parallel programming or arcane scripting to achieve this common high-level task. The result was a highly portable programming model based on implicitly parallel functional dataflow. The same Swift script runs on multi-core computers, clusters, grids, clouds, and supercomputers, and is thus a useful tool for moving workflow computations from laptop to distributed and/or high performance systems.
Swift has proven to be very general, and is in use in domains ranging from earth systems to bioinformatics to molecular modeling. It’s more recently been adapted to serve as a programming model for much finer-grain in-memory workflow on extreme scale systems, where it can perform task rates in the millions to billion-per-second.
In this talk, we describe the state of Swift’s implementation, present several Swift applications, and discuss ideas for of the future evolution of the programming model on which it’s based.
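Swift itself is not shown here; as a rough Python analogy under that caveat, the implicitly parallel dataflow style the talk describes can be imitated with futures, where calls with no data dependency run concurrently and results are awaited only when consumed.

```python
# Python analogy (not Swift code) for implicitly parallel functional
# dataflow: fan out many invocations of an ordinary function, then
# join on the results only where a downstream value needs them.
from concurrent.futures import ThreadPoolExecutor

def simulate(x):
    """Stands in for one invocation of an ordinary program."""
    return x * x

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(simulate, i) for i in range(8)]  # fan out
    total = sum(f.result() for f in futures)                # join
print(total)  # 140
```

In Swift, this fan-out/join structure is implicit in the language semantics rather than spelled out with an executor, which is what lets the same script scale from a laptop to a supercomputer.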
"From complex Systems to Networks: Discovering and Modeling the Correct Network" - diannepatricia
From complex Systems to Networks: Discovering and Modeling the Correct Network" by Nitesh Chawla as part of the Cognitive Systems Institute Speaker Series
Nitesh Chawla is the Frank M. Freimann Professor of Computer Science and Engineering, and director of the research center on network and data sciences (iCeNSA) at the University of Notre Dame.
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre - HPCC Systems
Data-centric approach: our platform is built on the premise of absorbing data from multiple data sources and transforming them into highly intelligent social network graphs that can be processed to reveal non-obvious relationships.
Reflections on Almost Two Decades of Research into Stream Processing - Kyumars Sheykh Esmaili
This is the slide deck that I used during my tutorial presentation at the ACM DEBS Conference (http://www.debs2017.org/) that was held in Barcelona between June 19 and June 23, 2017.
The tutorial paper itself can be accessed here: http://dl.acm.org/citation.cfm?id=3095110
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem... - Ian Foster
Ever more data- and compute-intensive science makes computing increasingly important for research. But for advanced computing infrastructure to benefit more than the scientific 1%, we need new delivery methods that slash access costs, new sustainability models beyond direct research funding, and new platform capabilities to accelerate the development of new, interoperable tools and services.
The Globus team has been working towards these goals since 2010. We have developed software-as-a-service methods that move complex and time-consuming research IT tasks out of the lab and into the cloud, thus greatly reducing the expertise and resources required to use them. We have demonstrated a subscription-based funding model that engages research institutions in supporting service operations. And we are now also showing how the platform services that underpin Globus applications can accelerate the development and use of an integrated ecosystem of advanced science applications, such as NCAR’s Research Data Archive and OSG Connect, thus enabling access to powerful data and compute resources by many more people than is possible today.
In this talk, I introduce Globus services and the underlying Globus platform. I present representative applications and discuss opportunities that this platform presents for both small science and large facilities.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions), and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool that displays detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean, optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
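A toy sketch conveys the core idea (this is not DIAR's actual algorithm, and the `coverage` function below is a stand-in for running the instrumented target): drop seed bytes whose removal leaves the coverage fingerprint unchanged, so mutations concentrate on bytes that matter.

```python
# Toy illustration of seed shrinking: greedily delete bytes that do not
# change a coverage fingerprint. `coverage` fakes a target program whose
# behavior depends only on a header and one attribute marker.

def coverage(data: bytes) -> frozenset:
    feats = set()
    if data.startswith(b"<xml>"):
        feats.add("header")
    if b"name=" in data:
        feats.add("attr")
    return frozenset(feats)

def shrink_seed(seed: bytes) -> bytes:
    base = coverage(seed)
    out, i = seed, 0
    while i < len(out):
        candidate = out[:i] + out[i + 1:]
        if coverage(candidate) == base:   # byte was uninteresting: drop it
            out = candidate
        else:                             # byte matters: keep it, move on
            i += 1
    return out

seed = b"<xml>  padding  name=x trailing junk"
print(shrink_seed(seed))  # b'<xml>name='
```

Every byte surviving the shrink is one whose mutation can actually change program behavior, which is why lean seeds make each fuzzing mutation count.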
These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) 2022.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
GridMate - End to end testing is a critical piece to ensure quality and avoid... - ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... - Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Streaming Weather Data from Web APIs to Jupyter through Kafka
1. Weather & Transportation
Streaming the Data, Finding Correlations
Provide capability to Data for Democracy (democratizing_weather_data)
University of Washington Professional & Continuing Education
BIG DATA 230 B Su 17: Emerging Technologies In Big Data
Team D-Hawks
John Bever, Karunakar Kotha, Leo Salemann, Shiva Vuppala, Wenfan Xu
2. Overview
Our "Client": Data for Democracy
Learn more: www.datafordemocracy.org | https://github.com/Data4Democracy | democratizing_weather_data/streaming
Our Mission
● Provide a streaming capability to extract weather and traffic data from multiple Web APIs, and produce a clean, merged dataframe suitable for Machine Learning and other Data Science analysis.
● Deliver code to D4D's GitHub repository.
● Use vendor-neutral, open-source solutions, implemented in Python and Jupyter notebooks.
3. Pipeline
• Kafka transport mechanism (vendor-neutral, open source)
• Message value is an entire JSON document
• One topic per source API, guarantees consistent schema
• Multiple json documents (sharing same schema) combined into a single dataframe
• Dataframe records joined based on space and time
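The "multiple JSON documents into a single dataframe" step can be sketched with only the standard library; in the real pipeline the strings below would arrive as Kafka message values rather than inline literals.

```python
# Sketch of combining JSON documents that share a schema into one table.
# In the pipeline, each message value is an entire JSON document read
# from a per-API Kafka topic; here they are inline example strings.
import json

messages = [
    '{"station": "S1", "time": "2017-08-21T10:00", "temp_f": 71}',
    '{"station": "S2", "time": "2017-08-21T10:00", "temp_f": 68}',
]

# One topic per source API guarantees a consistent schema, so the
# parsed dicts stack directly into a table (list of rows).
rows = [json.loads(m) for m in messages]
columns = sorted(rows[0])
print(columns)    # ['station', 'temp_f', 'time']
print(len(rows))  # 2
```

The one-topic-per-API rule is what makes this safe: every message parsed from a topic yields the same columns, so no per-row schema reconciliation is needed.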
4. Web APIs
Postman
• Great tool for interacting with potential APIs.
• Friendly GUI for constructing requests and reading responses.
• Provided JSON files before the pipeline was completed, allowing analysis of the data in parallel.
ProgrammableWeb.com
● A massive searchable directory of over 15,500 web APIs, updated daily
● Includes sample source code for APIs
7. Analysis
Load each JSON file, normalize it, and save it as a dataframe; repeat for the next JSON file, appending to the prior.
7 days of data (includes the eclipse!), 30 minutes between readings:
• 54 weather JSON files from Yahoo (54 rows x 31 columns)
• 394 weather JSON files from WSDOT (40,931 rows x 16 columns)
• 395 traffic JSON files from WSDOT (70,998 rows x 20 columns)
Merge the WSDOT and Yahoo weather dataframes (using the columns common to both), then merge the traffic and weather dataframes. Each row has:
- Traffic data from a specific traffic dataframe row
- Weather data from a weather station within 20 miles and 30 minutes of the traffic reading
Result: 1 merged traffic/weather table (52,975 rows x 30 columns)
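The space-and-time join rule above can be sketched in plain Python (a hedged illustration, not the project's notebook code; the field names are assumptions):

```python
# Sketch of the space/time join: attach to each traffic reading a weather
# reading taken within 20 miles and 30 minutes of it.
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

def miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(h))   # Earth radius ~ 3959 miles

def match(traffic, weather, max_miles=20, max_minutes=30):
    out = []
    for t in traffic:
        for w in weather:
            close = miles(t["lat"], t["lon"], w["lat"], w["lon"]) <= max_miles
            recent = abs(t["time"] - w["time"]) <= timedelta(minutes=max_minutes)
            if close and recent:
                out.append({**t, **{"weather_" + k: v for k, v in w.items()}})
                break   # first qualifying station wins in this sketch
    return out

traffic = [{"lat": 47.61, "lon": -122.33,
            "time": datetime(2017, 8, 21, 10, 15), "speed": 42}]
weather = [{"lat": 47.45, "lon": -122.31,
            "time": datetime(2017, 8, 21, 10, 0), "temp_f": 71}]
print(len(match(traffic, weather)))  # 1
```

A real implementation would index stations spatially instead of scanning all pairs, but the acceptance test per pair (distance threshold plus time window) is the same.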
9. Analyzing the Merged Traffic/Weather Dataset
Scatterplot matrix with Seaborn (10% random sample) over: Average Travel Time, Current Travel Time, Wind Direction, Wind Speed, Temp., Humidity, Barometer
10. Wrapping Up ...
Key Takeaways
• Choose your python libraries carefully (2 lines of code for a fully-labeled lineplot vs. dozens)
• Spatial plots first, data-joins later (I-5 traffic data vs. statewide weather, also Portland)
• The fastest way to count records in a dataframe is df.shape[0]
Conclusion
• Data for Democracy has a repeatable way to extract weather and transportation data from WSDOT and Yahoo
• Jupyter Notebook provides a teaching/coding environment
• Bitnami provides low-cost simple Kafka infrastructure
Further Work
• Upload CSVs and zipped JSONs to data.world
• Better parameters for producer scripts (e.g., longitude, latitude, date, time)
• Config files for access keys
• More matrix plots, Data Science, Machine Learning
• Gather data for longer time frames (fewer readings per day?)
• Isolate matrix plots to specific locations and/or times