i4Trust Website
i4Trust Community
End-to-end AI Solution With PySpark & Real-time Data Processing With Apache NiFi
Rihab Feki, Machine Learning Engineer and Evangelist
Sherifa Fayed, Technical Expert and Evangelist
FIWARE Foundation
Learning goals
● Managing real-time data with the Context Broker
● Data transformation (JSON-LD to CSV) and persistence with Apache NiFi
● Setting up a Google Cloud environment
○ Creating a Dataproc cluster and connecting it to Jupyter Notebook
○ Using Google Cloud Storage (GCS)
● Modeling an ML solution based on PySpark for multi-class classification
● Deploying the ML model with Flask and getting predictions in real time
End to End AI service architecture powered by FIWARE
What is Apache NiFi?
● System to process and distribute data
● Supports powerful and scalable directed graphs of data routing and transformation
● Web-based user interface
● Tracks data flow from beginning to end
Connecting NiFi to the Context Broker
[Diagram: cURL or Postman; NGSI-LD Context Broker (1026:1026); NiFi or Draco (5050:5050); MongoDB (27017:27017)]
Entity: Steel plate geometric measurements
Link to dataset
Dataflow overview
Ingesting
Data processing and persistence with NiFi
The overall NiFi workflow
Overview of the NiFi workflow
● ListenHTTP: Configured as the source that receives notifications from the Context Broker
● GetFile: Reads data in JSON-LD format
● JoltTransformJSON: Transforms the nested JSON into a simple attribute-value JSON file, which is used to form the CSV file
● ConvertRecord: Converts each JSON file to a CSV file
● MergeContent: Merges the resulting CSV records into an aggregated CSV dataset (a minimum number of entries can be required before the merge runs, and a maximum number of flow files can also be set)
● PutGCSObject: Saves the resulting CSV in a Google Cloud Storage bucket
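The transformation steps of this workflow can be sketched outside NiFi in plain Python. This is only an illustration of what the JoltTransformJSON, ConvertRecord and MergeContent steps achieve together, not the NiFi configuration itself. The entity id, the type name SteelPlate and the attribute values are hypothetical.

```python
import csv
import io


def flatten_entity(entity):
    """Reduce an NGSI-LD entity to a flat {attribute: value} record."""
    record = {"id": entity["id"], "type": entity["type"]}
    for name, attr in entity.items():
        # NGSI-LD properties look like {"type": "Property", "value": ...}
        if isinstance(attr, dict) and attr.get("type") == "Property":
            record[name] = attr["value"]
    return record


def merge_to_csv(entities):
    """Flatten each entity and merge the rows into one CSV string."""
    rows = [flatten_entity(e) for e in entities]
    if not rows:
        return ""
    buffer = io.StringIO()
    # Assumes every entity carries the same attributes as the first one.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample entity; names and values are illustrative only.
sample = {
    "id": "urn:ngsi-ld:SteelPlate:001",
    "type": "SteelPlate",
    "X_Minimum": {"type": "Property", "value": 42},
    "X_Maximum": {"type": "Property", "value": 50},
    "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}
csv_text = merge_to_csv([sample])
print(csv_text)
```

The header row is taken from the first record, which mirrors the idea of merging many single-row CSV files that share one schema.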
Demo: Data transformation and persistence
What is PySpark?
PySpark is the Python interface for Apache Spark.
It is used for performing exploratory data analysis at scale, building machine learning pipelines, and creating ETL jobs for a data platform.
What is Cloud Dataproc?
Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for big data processing: batch processing, querying, streaming, and machine learning.
The main benefits of Dataproc
● It’s a managed service: No need for a system administrator to set it up.
● It’s fast: Cluster creation takes about 90 seconds.
● It’s cheaper than building your own cluster: You can spin up a Dataproc cluster when you need to run a job and shut it down afterward, so you only pay while jobs are running.
● It’s integrated with other Google Cloud services: Including Cloud Storage, BigQuery, and Cloud Bigtable, so it’s easy to get data into and out of it.
What makes Dataproc special?
The typical mode of operation for Hadoop/Spark, on premises or in the cloud, requires you to deploy a cluster first and then fill that cluster with jobs.
What makes Dataproc special?
Rather than submitting the job to an already-deployed cluster, you submit the job to Dataproc, which creates a cluster on your behalf on demand.
➢ A cluster is now a means to an end for job execution.
Let’s see how Dataproc makes it easy and scalable...
Data scientists are big fans of Jupyter Notebooks
However, getting an Apache Spark cluster set up with Jupyter Notebooks can be complicated
Apache Spark and JupyterLab architecture on Google Cloud
How it works
1. Setting up the Google Cloud environment and creating a project
2. Creating a Google Cloud Storage bucket for your cluster
3. Creating a Dataproc Cluster with Jupyter and Component Gateway
4. Accessing the JupyterLab web UI on Dataproc
5. Creating a Notebook and developing the AI algorithm with PySpark
Creating a Dataproc cluster using cloud shell
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --image-version=1.4 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --bucket=${BUCKET_NAME} \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway
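When cluster creation has to be scripted, for example from a notebook or a deployment script, the same command can be assembled in Python. This is a minimal sketch: the helper name and the sample cluster, region and bucket values are hypothetical, and the flags are copied from the command above.

```python
import shlex


def dataproc_create_command(cluster_name, region, bucket_name):
    """Return the argument list for the cluster-creation command."""
    return [
        "gcloud", "beta", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        "--image-version=1.4",
        "--master-machine-type=n1-standard-4",
        "--worker-machine-type=n1-standard-4",
        f"--bucket={bucket_name}",
        "--optional-components=ANACONDA,JUPYTER",
        "--enable-component-gateway",
    ]


# Hypothetical values; substitute your own cluster, region and bucket names.
command = dataproc_create_command("steel-cluster", "europe-west1", "steel-bucket")
print(shlex.join(command))
# To actually create the cluster: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the command is later handed to subprocess.run.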
Component gateway for additional cluster components
Steel plates faults prediction
● Features: 27 geometric measurements of the steel plates
● Fault types: 7
○ Pastry
○ Z_Scratch
○ K_Scatch
○ Stains
○ Dirtiness
○ Bumps
○ Other_Faults
Dataset format: CSV | Number of Samples: 1941
Link to dataset
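A multi-class classifier needs one label per sample. Assuming the CSV stores the seven fault types as binary indicator columns (an assumption about the file layout, not something stated on the slide), a single class index can be derived as sketched below. The helper name is hypothetical.

```python
FAULT_TYPES = [
    "Pastry", "Z_Scratch", "K_Scatch", "Stains",
    "Dirtiness", "Bumps", "Other_Faults",
]


def fault_label(row):
    """Return the index of the single fault column set to 1 in `row`."""
    active = [i for i, name in enumerate(FAULT_TYPES) if int(row[name]) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one fault flag, got {len(active)}")
    return active[0]


# Example row as csv.DictReader would return it (values are strings).
row = {name: "0" for name in FAULT_TYPES}
row["Stains"] = "1"
label = fault_label(row)
print(label, FAULT_TYPES[label])
```

Raising an error for rows with zero or several flags makes data problems visible before training starts.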
Demo:
Cloud environment set up
Modeling the ML solution based on PySpark
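A PySpark training pipeline for this kind of multi-class problem could look like the sketch below. It is not the notebook's code: the random forest classifier, the numeric column named label, and the input and output paths are all assumptions made for illustration.

```python
def feature_columns(all_columns, label_column="label"):
    """Treat every column except the label as a numeric feature."""
    return [c for c in all_columns if c != label_column]


def train_classifier(csv_path, model_path, label_column="label"):
    """Train a multi-class classifier on a CSV file and save the fitted pipeline."""
    # Imported here so the helper above can be used where PySpark is not installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("steel-faults").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    train_df, test_df = df.randomSplit([0.8, 0.2], seed=42)

    # Spark ML estimators expect all features in a single vector column.
    assembler = VectorAssembler(
        inputCols=feature_columns(df.columns, label_column),
        outputCol="features",
    )
    classifier = RandomForestClassifier(featuresCol="features", labelCol=label_column)
    model = Pipeline(stages=[assembler, classifier]).fit(train_df)

    evaluator = MulticlassClassificationEvaluator(
        labelCol=label_column, predictionCol="prediction", metricName="accuracy"
    )
    accuracy = evaluator.evaluate(model.transform(test_df))
    model.write().overwrite().save(model_path)
    return accuracy
```

On a Dataproc cluster, csv_path and model_path would typically be gs:// URIs pointing at the project's Cloud Storage bucket, for example train_classifier("gs://<bucket>/faults.csv", "gs://<bucket>/model").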
ML model deployment with Flask architecture
[Diagram: cURL or Postman; Orion Context Broker (1026:1026); MongoDB (27017:27017); Flask web service for model prediction (5000:5000); saved model (.parquet); model training in a Jupyter Notebook]
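The prediction service in this architecture can be sketched as a small Flask app that receives a notification from the Context Broker and returns a prediction. The route name /predict, the predict callable (which would wrap the saved model) and the feature names are assumptions for illustration. The notification layout with a data array of entities follows the NGSI-LD notification format.

```python
def features_from_notification(notification, feature_names):
    """Extract feature values, in a fixed order, from an NGSI-LD notification."""
    entity = notification["data"][0]
    return [float(entity[name]["value"]) for name in feature_names]


def create_app(predict, feature_names):
    """Build a Flask app whose /predict route scores incoming notifications."""
    # Imported here so the parsing helper can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_route():
        features = features_from_notification(request.get_json(), feature_names)
        return jsonify({"prediction": predict(features)})

    return app


# Hypothetical notification payload; ids and values are illustrative only.
notification = {
    "type": "Notification",
    "data": [
        {
            "id": "urn:ngsi-ld:SteelPlate:001",
            "type": "SteelPlate",
            "X_Minimum": {"type": "Property", "value": 42},
            "X_Maximum": {"type": "Property", "value": 50},
        }
    ],
}
features = features_from_notification(notification, ["X_Minimum", "X_Maximum"])
print(features)
```

To serve it on the port shown in the diagram, one could call create_app(my_predict, names).run(host="0.0.0.0", port=5000), where my_predict is a function that loads the saved model and scores one feature vector.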
Useful links
● Source code and documentation
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi
● Jupyter Notebook for Steel faults classification based on PySpark
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/blob/master/PySpark/PySpark_Steel_faults_Classification.ipynb
● Data processing and persistence with Apache NiFi documentation
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/tree/master/Nifi
● NGSI-LD Context Broker
○ Docker hub: https://hub.docker.com/r/fiware/orion-ld
○ Documentation: https://github.com/FIWARE/context.Orion-LD
● Google Cloud Console: https://console.cloud.google.com/
● Flask Apps with Docker: https://runnable.com/docker/python/docker-compose-with-flask-apps
Summary
● The Context Broker does not keep a history of the data or persist it
● Google Cloud Dataproc gives data scientists an easy way to set up, control, and secure data science environments, and makes it simple and fast to integrate them with other open source data tools.
● Once the Dataproc cluster is created, it is not possible to change the configuration or install new dependencies, libraries, etc.
● Dataproc jobs are limited to some programming languages.
● Apache NiFi might not be the easiest tool for data processing, but it manages and automates data flows and is a good fit for large-scale or real-time data.
● Other cloud platforms could be used (AWS, Azure, Databricks, etc.)
Thank you!
http://fiware.org
Follow @FIWARE on Twitter
Q&A
Annex
Creating an entity in the Context Broker
[Screenshot annotations: unique id and type; attributes of the created entity]
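Creating an entity amounts to a POST to the broker's NGSI-LD entities endpoint. The sketch below builds that request without sending it. The broker address assumes the port 1026 mapping from the architecture diagram, and the entity id, type and values are hypothetical.

```python
import json
import urllib.request

# Assumed broker address; 1026 is the port shown in the architecture diagram.
BROKER_URL = "http://localhost:1026"


def build_create_entity_request(entity):
    """Build (but do not send) the POST request that creates an NGSI-LD entity."""
    return urllib.request.Request(
        url=f"{BROKER_URL}/ngsi-ld/v1/entities",
        data=json.dumps(entity).encode("utf-8"),
        # application/ld+json because the payload carries its own @context.
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


# Hypothetical entity: a unique id and type, followed by its attributes.
entity = {
    "id": "urn:ngsi-ld:SteelPlate:001",
    "type": "SteelPlate",
    "X_Minimum": {"type": "Property", "value": 42},
    "X_Maximum": {"type": "Property", "value": 50},
    "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}
request = build_create_entity_request(entity)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # uncomment to send to a running broker
```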
Subscribing to changes and listening
[Screenshot annotations: posting the subscription to Orion; subscribing to all entities of a certain type; subscribing to the relevant attributes; sending notifications to the port NiFi is listening on]
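A subscription of this kind can be sketched as an NGSI-LD Subscription payload posted to the broker. The broker address, the NiFi host name, the contentListener path (the usual ListenHTTP base path), the type name SteelPlate and the attribute list are assumptions for illustration. Port 5050 is the one shown in the architecture diagram.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:1026"                 # assumed broker address
NIFI_ENDPOINT = "http://nifi:5050/contentListener"   # assumed ListenHTTP URL


def build_subscription(entity_type, attributes, notify_uri):
    """Describe which entities and attributes to watch and where to notify."""
    return {
        "type": "Subscription",
        "description": f"Notify NiFi when {entity_type} measurements change",
        "entities": [{"type": entity_type}],        # all entities of this type
        "watchedAttributes": attributes,            # changes that trigger a notification
        "notification": {
            "attributes": attributes,               # attributes included in the payload
            "endpoint": {"uri": notify_uri, "accept": "application/json"},
        },
        "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
    }


subscription = build_subscription("SteelPlate", ["X_Minimum", "X_Maximum"], NIFI_ENDPOINT)
request = urllib.request.Request(
    url=f"{BROKER_URL}/ngsi-ld/v1/subscriptions",
    data=json.dumps(subscription).encode("utf-8"),
    headers={"Content-Type": "application/ld+json"},
    method="POST",
)
print(json.dumps(subscription, indent=2))
# urllib.request.urlopen(request)  # uncomment to send to a running broker
```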
Inducing a change and receiving a notification
[Screenshot annotations: changing the value of X_Minimum; the processor's Out count jumps to 1]
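Changing an attribute value is a PATCH on the entity's attrs resource. The sketch below builds such a request for X_Minimum without sending it. The broker address, entity id and new value are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BROKER_URL = "http://localhost:1026"  # assumed broker address


def build_attribute_update(entity_id, attribute, value):
    """Build (but do not send) a PATCH request that updates one attribute."""
    fragment = {attribute: {"type": "Property", "value": value}}
    entity_path = urllib.parse.quote(entity_id, safe=":")
    return urllib.request.Request(
        url=f"{BROKER_URL}/ngsi-ld/v1/entities/{entity_path}/attrs",
        data=json.dumps(fragment).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical entity id and value, used only to show the request shape.
request = build_attribute_update("urn:ngsi-ld:SteelPlate:001", "X_Minimum", 45)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request)  # uncomment to send to a running broker
```

If a subscription is watching X_Minimum, a successful update should cause the broker to notify the subscribed endpoint.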
Setting up the cloud environment
Creating a project in Google Cloud Platform
We can manage the project via the Cloud Shell
Creating a Google Cloud Storage bucket
➢ Store datasets
➢ Store Notebooks
➢ Store logs
➢ Store output files
Creating a Dataproc cluster using cloud shell
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --image-version=1.4 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --bucket=${BUCKET_NAME} \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway
Creating a Dataproc cluster using GUI
Component gateway for additional cluster components
Overview of the Dataproc cluster
Dataproc cluster web interfaces
Dataproc cluster: JupyterLab interface
Creating a Jupyter Notebook and provisioning data from a Google Cloud Storage bucket
Link to Notebook
Submitting a PySpark job using the Dataproc GUI
Submitting a PySpark job to a Dataproc cluster
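Besides the GUI, a PySpark script can be submitted from the command line with gcloud dataproc jobs submit pyspark. The sketch below assembles that command in Python. The helper name, script URI, cluster name and region are hypothetical.

```python
import shlex


def dataproc_submit_command(script_uri, cluster_name, region):
    """Return the argument list that submits a PySpark script to a cluster."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_uri,
        f"--cluster={cluster_name}",
        f"--region={region}",
    ]


# Hypothetical values; the script is assumed to live in a Cloud Storage bucket.
command = dataproc_submit_command(
    "gs://steel-bucket/jobs/train.py", "steel-cluster", "europe-west1"
)
print(shlex.join(command))
# To actually submit the job: subprocess.run(command, check=True)
```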
www.egm.io
Fluid Machine Learning lifecycle with FIWARE
Benoit Orihuela – i4Trust Training Webinar
A TYPICAL ML LIFECYCLE
• A Data Scientist
• Get and clean up data
• Prepare and train an ML model
• An IT person
• Package and deploy the ML model
• An end user
• Discover the available ML models (with respect to privacy)
• Ask to use one or more of them (and optionally pay for it)
• Get real-time data (predictions, outliers, …) from an ML model
ML lifecycle with FIWARE - i4Trust - 12/05/2021 3
WHAT DO WE AIM AT?
Bridge the gap between data scientists and operations (MLOps)
Develop the Machine Learning as a Service (MLaaS) model
And also:
More and more use cases requiring ML / AI activities
FIWARE needs to offer a rich variety of tools
THE TRAINING AND PREPARATION PHASE
THE DISCOVERY AND REGISTRATION PHASE
THE PREDICTION PHASE
DEMONSTRATIONS
• Demonstration #1 - End-to-end demonstration of the development, deployment and use of an ML model
• Use of Jupyter notebook as interface
• Applied to a simplistic water flow calculation
• Demonstration #2 – Events generation from video stream analysis
• Realtime extraction of context information from a video stream
Thank You!
Benoit ORIHUELA
Lead Architect
Tel: +33 687427107
E-mail: benoit.orihuela@egm.io
www.egm.io
www.egm.io
MLaaS for Image analysis
Anwar ALFATAYRI
REAL LIFE EXAMPLE: SOCIAL DISTANCING
Number of people : 14
Groups of 2 people : 1
Groups of 3 people : 2
Groups of 4 people : 1
Groups >4 People: 0
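Counts like these can be derived from detected positions by grouping people who stand closer than a chosen distance. The sketch below is one possible way to do it with a union-find structure. It is not the method used in the demonstration, and the positions and threshold are made up for illustration.

```python
from collections import Counter
from math import dist


def group_sizes(positions, threshold):
    """Group points closer than `threshold` and count groups by size."""
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the group's representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if dist(positions[i], positions[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    members = Counter(find(i) for i in range(len(positions)))
    return Counter(members.values())  # {group size: number of groups}


# Hypothetical positions in metres: one pair, one trio, one person alone.
positions = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (10, 10.5), (50, 50)]
sizes = group_sizes(positions, threshold=1.0)
print(f"Number of people : {len(positions)}")
for size in sorted(sizes):
    print(f"Groups of {size} : {sizes[size]}")
```

Grouping is transitive here: two people who are each close to a third person end up in the same group even if they are far from each other.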
TWO APPROACHES
Machine learning on the edge
[Diagram labels: Street, Image, 3 people detected, FIWARE Cloud]
Machine learning as a service
[Diagram labels: Street, Image, 3 people detected, FIWARE Cloud, REST API]

More Related Content

What's hot

KeyRock and Wilma - Openstack-based Identity Management in FIWARE
KeyRock and Wilma - Openstack-based Identity Management in FIWAREKeyRock and Wilma - Openstack-based Identity Management in FIWARE
KeyRock and Wilma - Openstack-based Identity Management in FIWAREÁlvaro Alonso González
 
FIWARE Wednesday Webinars - Introduction to NGSI-LD
FIWARE Wednesday Webinars - Introduction to NGSI-LDFIWARE Wednesday Webinars - Introduction to NGSI-LD
FIWARE Wednesday Webinars - Introduction to NGSI-LDFIWARE
 
FIWARE Wednesday Webinars - Core Context Management
FIWARE Wednesday Webinars - Core Context ManagementFIWARE Wednesday Webinars - Core Context Management
FIWARE Wednesday Webinars - Core Context ManagementFIWARE
 
FIWARE Training: API Umbrella
FIWARE Training: API UmbrellaFIWARE Training: API Umbrella
FIWARE Training: API UmbrellaFIWARE
 
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2FIWARE
 
FIWARE Context Information Management
FIWARE Context Information ManagementFIWARE Context Information Management
FIWARE Context Information Managementfisuda
 
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...FIWARE
 
Actuation, Federation and Interoperability of Context Brokers
Actuation, Federation and Interoperability of Context BrokersActuation, Federation and Interoperability of Context Brokers
Actuation, Federation and Interoperability of Context BrokersFIWARE
 
Introduction to Smart Data Models
Introduction to Smart Data ModelsIntroduction to Smart Data Models
Introduction to Smart Data ModelsFIWARE
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingAmuhinda Hungai
 
Towards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachTowards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachFIWARE
 
FIWARE Global Summit - NGSI-LD - NGSI with Linked Data
FIWARE Global Summit - NGSI-LD - NGSI with Linked DataFIWARE Global Summit - NGSI-LD - NGSI with Linked Data
FIWARE Global Summit - NGSI-LD - NGSI with Linked DataFIWARE
 
i4Trust IAM Components
i4Trust IAM Componentsi4Trust IAM Components
i4Trust IAM ComponentsFIWARE
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySparkRussell Jurney
 
Introduction to OPA
Introduction to OPAIntroduction to OPA
Introduction to OPAKnoldus Inc.
 
FIWARE Training: NGSI-LD Introduction
FIWARE Training: NGSI-LD IntroductionFIWARE Training: NGSI-LD Introduction
FIWARE Training: NGSI-LD IntroductionFIWARE
 
Splunk Cloud
Splunk CloudSplunk Cloud
Splunk CloudSplunk
 
Integrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaIntegrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaDalton Valadares
 

What's hot (20)

KeyRock and Wilma - Openstack-based Identity Management in FIWARE
KeyRock and Wilma - Openstack-based Identity Management in FIWAREKeyRock and Wilma - Openstack-based Identity Management in FIWARE
KeyRock and Wilma - Openstack-based Identity Management in FIWARE
 
FIWARE and Smart Data Models
FIWARE and Smart Data ModelsFIWARE and Smart Data Models
FIWARE and Smart Data Models
 
FIWARE Wednesday Webinars - Introduction to NGSI-LD
FIWARE Wednesday Webinars - Introduction to NGSI-LDFIWARE Wednesday Webinars - Introduction to NGSI-LD
FIWARE Wednesday Webinars - Introduction to NGSI-LD
 
FIWARE Wednesday Webinars - Core Context Management
FIWARE Wednesday Webinars - Core Context ManagementFIWARE Wednesday Webinars - Core Context Management
FIWARE Wednesday Webinars - Core Context Management
 
FIWARE Training: API Umbrella
FIWARE Training: API UmbrellaFIWARE Training: API Umbrella
FIWARE Training: API Umbrella
 
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2
FIWARE Global Summit - NGSI-LD – an Evolution from NGSIv2
 
FIWARE Context Information Management
FIWARE Context Information ManagementFIWARE Context Information Management
FIWARE Context Information Management
 
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to...
 
Actuation, Federation and Interoperability of Context Brokers
Actuation, Federation and Interoperability of Context BrokersActuation, Federation and Interoperability of Context Brokers
Actuation, Federation and Interoperability of Context Brokers
 
Introduction to Smart Data Models
Introduction to Smart Data ModelsIntroduction to Smart Data Models
Introduction to Smart Data Models
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
Towards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachTowards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approach
 
FIWARE Global Summit - NGSI-LD - NGSI with Linked Data
FIWARE Global Summit - NGSI-LD - NGSI with Linked DataFIWARE Global Summit - NGSI-LD - NGSI with Linked Data
FIWARE Global Summit - NGSI-LD - NGSI with Linked Data
 
i4Trust IAM Components
i4Trust IAM Componentsi4Trust IAM Components
i4Trust IAM Components
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Introduction to OPA
Introduction to OPAIntroduction to OPA
Introduction to OPA
 
Open Policy Agent
Open Policy AgentOpen Policy Agent
Open Policy Agent
 
FIWARE Training: NGSI-LD Introduction
FIWARE Training: NGSI-LD IntroductionFIWARE Training: NGSI-LD Introduction
FIWARE Training: NGSI-LD Introduction
 
Splunk Cloud
Splunk CloudSplunk Cloud
Splunk Cloud
 
Integrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and WilmaIntegrating Fiware Orion, Keyrock and Wilma
Integrating Fiware Orion, Keyrock and Wilma
 

Similar to Session 8 - Creating Data Processing Services | Train the Trainers Program

Day 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramDay 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramFIWARE
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Alluxio, Inc.
 
Building Modern Data Pipelines on GCP via a FREE online Bootcamp
Building Modern Data Pipelines on GCP via a FREE online BootcampBuilding Modern Data Pipelines on GCP via a FREE online Bootcamp
Building Modern Data Pipelines on GCP via a FREE online BootcampData Con LA
 
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)オラクルエンジニア通信
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Google's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesGoogle's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesIntel® Software
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAlluxio, Inc.
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analyticsKyle Bader
 
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...Andrejs Prokopjevs
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 

Similar to Session 8 - Creating Data Processing Services | Train the Trainers Program (20)

Day 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers ProgramDay 13 - Creating Data Processing Services | Train the Trainers Program
Day 13 - Creating Data Processing Services | Train the Trainers Program
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Building Modern Data Pipelines on GCP via a FREE online Bootcamp
Building Modern Data Pipelines on GCP via a FREE online BootcampBuilding Modern Data Pipelines on GCP via a FREE online Bootcamp
Building Modern Data Pipelines on GCP via a FREE online Bootcamp
 
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Google's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT ServicesGoogle's Infrastructure and Specific IoT Services
Google's Infrastructure and Specific IoT Services
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
 
JAM23-24_ppt.pptx
JAM23-24_ppt.pptxJAM23-24_ppt.pptx
JAM23-24_ppt.pptx
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
 
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 

More from FIWARE

Behm_Herne_NeMo_akt.pptx
Behm_Herne_NeMo_akt.pptxBehm_Herne_NeMo_akt.pptx
Behm_Herne_NeMo_akt.pptxFIWARE
 
Katharina Hogrebe Herne Digital Days.pdf
 Katharina Hogrebe Herne Digital Days.pdf Katharina Hogrebe Herne Digital Days.pdf
Katharina Hogrebe Herne Digital Days.pdfFIWARE
 
Christoph Mertens_IDSA_Introduction to Data Spaces.pptx
Christoph Mertens_IDSA_Introduction to Data Spaces.pptxChristoph Mertens_IDSA_Introduction to Data Spaces.pptx
Christoph Mertens_IDSA_Introduction to Data Spaces.pptxFIWARE
 
Behm_Herne_NeMo.pptx
Behm_Herne_NeMo.pptxBehm_Herne_NeMo.pptx
Behm_Herne_NeMo.pptxFIWARE
 
Evangelists + iHubs Promo Slides.pptx
Evangelists + iHubs Promo Slides.pptxEvangelists + iHubs Promo Slides.pptx
Evangelists + iHubs Promo Slides.pptxFIWARE
 
Lukas Künzel Smart City Operating System.pptx
Lukas Künzel Smart City Operating System.pptxLukas Künzel Smart City Operating System.pptx
Lukas Künzel Smart City Operating System.pptxFIWARE
 
Pierre Golz Der Transformationsprozess im Konzern Stadt.pptx
Pierre Golz Der Transformationsprozess im Konzern Stadt.pptxPierre Golz Der Transformationsprozess im Konzern Stadt.pptx
Pierre Golz Der Transformationsprozess im Konzern Stadt.pptxFIWARE
 
Dennis Wendland_The i4Trust Collaboration Programme.pptx
Dennis Wendland_The i4Trust Collaboration Programme.pptxDennis Wendland_The i4Trust Collaboration Programme.pptx
Dennis Wendland_The i4Trust Collaboration Programme.pptxFIWARE
 
Ulrich Ahle_FIWARE.pptx
Ulrich Ahle_FIWARE.pptxUlrich Ahle_FIWARE.pptx
Ulrich Ahle_FIWARE.pptxFIWARE
 
Aleksandar Vrglevski _FIWARE DACH_OSIH.pptx
Aleksandar Vrglevski _FIWARE DACH_OSIH.pptxAleksandar Vrglevski _FIWARE DACH_OSIH.pptx
Aleksandar Vrglevski _FIWARE DACH_OSIH.pptxFIWARE
 
Water Quality - Lukas Kuenzel.pdf
Water Quality - Lukas Kuenzel.pdfWater Quality - Lukas Kuenzel.pdf
Water Quality - Lukas Kuenzel.pdfFIWARE
 
Cameron Brooks_FGS23_FIWARE Summit_Keynote_Cameron.pptx
Cameron Brooks_FGS23_FIWARE Summit_Keynote_Cameron.pptxCameron Brooks_FGS23_FIWARE Summit_Keynote_Cameron.pptx
Cameron Brooks_FGS23_FIWARE Summit_Keynote_Cameron.pptxFIWARE
 
FiWareSummit.msGIS-Data-to-Value.2023.06.12.pptx
FiWareSummit.msGIS-Data-to-Value.2023.06.12.pptxFiWareSummit.msGIS-Data-to-Value.2023.06.12.pptx
FiWareSummit.msGIS-Data-to-Value.2023.06.12.pptxFIWARE
 
Boris Otto_FGS2023_Opening- EU Innovations from Data_PUB_V1_BOt.pptx
Boris Otto_FGS2023_Opening- EU Innovations from Data_PUB_V1_BOt.pptxBoris Otto_FGS2023_Opening- EU Innovations from Data_PUB_V1_BOt.pptx
Boris Otto_FGS2023_Opening- EU Innovations from Data_PUB_V1_BOt.pptxFIWARE
 
Bjoern de Vidts_FGS23_Opening_athumi - bjord de vidts - personal data spaces....
Bjoern de Vidts_FGS23_Opening_athumi - bjord de vidts - personal data spaces....Bjoern de Vidts_FGS23_Opening_athumi - bjord de vidts - personal data spaces....
Bjoern de Vidts_FGS23_Opening_athumi - bjord de vidts - personal data spaces....FIWARE
 
Abdulrahman Ibrahim_FGS23 Opening - Abdulrahman Ibrahim.pdf
Abdulrahman Ibrahim_FGS23 Opening - Abdulrahman Ibrahim.pdfAbdulrahman Ibrahim_FGS23 Opening - Abdulrahman Ibrahim.pdf
Abdulrahman Ibrahim_FGS23 Opening - Abdulrahman Ibrahim.pdfFIWARE
 
FGS2023_Opening_Red Hat Keynote Andrea Battaglia.pdf
FGS2023_Opening_Red Hat Keynote Andrea Battaglia.pdfFGS2023_Opening_Red Hat Keynote Andrea Battaglia.pdf
FGS2023_Opening_Red Hat Keynote Andrea Battaglia.pdfFIWARE
 
HTAG_Skalierung_Plattform_lokal_final_versand.pptx
HTAG_Skalierung_Plattform_lokal_final_versand.pptxHTAG_Skalierung_Plattform_lokal_final_versand.pptx
HTAG_Skalierung_Plattform_lokal_final_versand.pptxFIWARE
 
WE_LoRaWAN _ IoT.pptx
WE_LoRaWAN  _ IoT.pptxWE_LoRaWAN  _ IoT.pptx
WE_LoRaWAN _ IoT.pptxFIWARE
 
EU Opp_Clara Pezuela - German chapter.pptx
EU Opp_Clara Pezuela - German chapter.pptxEU Opp_Clara Pezuela - German chapter.pptx
EU Opp_Clara Pezuela - German chapter.pptxFIWARE
 

More from FIWARE (20)

Behm_Herne_NeMo_akt.pptx
Behm_Herne_NeMo_akt.pptxBehm_Herne_NeMo_akt.pptx
Behm_Herne_NeMo_akt.pptx
 
Katharina Hogrebe Herne Digital Days.pdf
 Katharina Hogrebe Herne Digital Days.pdf Katharina Hogrebe Herne Digital Days.pdf
Katharina Hogrebe Herne Digital Days.pdf
 
Christoph Mertens_IDSA_Introduction to Data Spaces.pptx
Christoph Mertens_IDSA_Introduction to Data Spaces.pptxChristoph Mertens_IDSA_Introduction to Data Spaces.pptx
Christoph Mertens_IDSA_Introduction to Data Spaces.pptx
 
Behm_Herne_NeMo.pptx
Behm_Herne_NeMo.pptxBehm_Herne_NeMo.pptx
Behm_Herne_NeMo.pptx
 
Evangelists + iHubs Promo Slides.pptx
Evangelists + iHubs Promo Slides.pptxEvangelists + iHubs Promo Slides.pptx
Evangelists + iHubs Promo Slides.pptx
 


Session 8 - Creating Data Processing Services | Train the Trainers Program

  • 1. End-to-end AI Solution With PySpark & Real-time Data Processing With Apache NiFi. Rihab Feki, Machine Learning Engineer and Evangelist; Sherifa Fayed, Technical Expert and Evangelist; FIWARE Foundation
  • 2. Learning goals
      ● Managing real-time data with the Context Broker
      ● Data transformation (JSON-LD to CSV) and persistence with Apache NiFi
      ● Setting up a Google Cloud environment
          ○ Creating a Dataproc cluster and connecting it to Jupyter Notebook
          ○ Using Google Cloud Storage Service (GCS)
      ● Modeling an ML solution based on PySpark for multi-class classification
      ● Deploying the ML model with Flask and getting predictions in real time
  • 3. End-to-end AI service architecture powered by FIWARE
  • 4. What is Apache NiFi?
      ● System to process and distribute data
      ● Supports powerful and scalable directed graphs of data routing and transformation
      ● Web-based user interface
      ● Tracking data flow from beginning to end
  • 5. Connecting NiFi to the Context Broker (diagram labels: cURL or Postman; NGSI-LD Context Broker, 1026:1026; NiFi (or Draco), 5050:5050; MongoDB, 27017:27017)
  • 6. Entity: Steel plate geometric measurements (Link to dataset)
  • 7. End-to-end AI service architecture powered by FIWARE
  • 9. Data processing and persistence with NiFi
  • 10. The overall NiFi workflow
  • 11. Overview of the NiFi workflow
      ● ListenHTTP: Configured as the source for receiving notifications from the Context Broker
      ● GetFile: Reads data in JSON-LD format
      ● JoltTransformJSON: Transforms nested JSON into a simple attribute-value JSON file, which is used to form the CSV file
      ● ConvertRecord: Converts each JSON file to a CSV file
      ● MergeContent: Merges the resulting CSV record files into an aggregated CSV dataset (note: a minimum number of entries can be set before the merge is performed, as well as a maximum number of flow files)
      ● PutGCSObject: Saves the resulting CSV in a Google Cloud Storage bucket
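The transformation chain above can be sketched outside NiFi in plain Python. This is only an illustration of what the JoltTransformJSON, ConvertRecord and MergeContent steps achieve, not the NiFi configuration used in the demo. The sample entity, its id and type, and the function names are assumptions made for the example.

```python
import csv
import io

# Hypothetical normalized NGSI-LD entity, as a notification might carry it.
SAMPLE_ENTITY = {
    "id": "urn:ngsi-ld:SteelPlate:001",
    "type": "SteelPlate",
    "X_Minimum": {"type": "Property", "value": 42},
    "X_Maximum": {"type": "Property", "value": 50},
    "@context": ["https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"],
}


def flatten_entity(entity):
    """Reduce nested attributes to plain attribute/value pairs (the Jolt step)."""
    flat = {}
    for key, value in entity.items():
        if key == "@context":
            continue  # context metadata is not a data column
        if isinstance(value, dict) and "value" in value:
            flat[key] = value["value"]  # unwrap {"type": "Property", "value": x}
        else:
            flat[key] = value  # id, type and anything already flat
    return flat


def to_csv(rows):
    """Write flat records as one CSV document with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def merge_when_ready(pending, min_entries):
    """Return the merged CSV once enough records are pending, else None."""
    if len(pending) < min_entries:
        return None
    return to_csv(pending)
```

Calling `merge_when_ready` with fewer records than `min_entries` returns `None`, which mirrors the minimum-entries setting mentioned for MergeContent.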
  • 12. Demo: Data transformation and persistence
  • 13. End-to-end AI service architecture powered by FIWARE
  • 14. What is PySpark? PySpark is the Python interface for Apache Spark. It is used for performing exploratory data analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
  • 15. What is Cloud Dataproc? Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for big data processing: batch processing, querying, streaming, and machine learning.
  • 16. The main benefits of Dataproc
      ● It’s a managed service: no system administrator is needed to set it up.
      ● It’s fast: cluster creation takes about 90 seconds.
      ● It’s cheaper than building your own cluster: you can spin up a Dataproc cluster when you need to run a job and shut it down afterward, so you only pay while jobs are running.
      ● It’s integrated with other Google Cloud services, including Cloud Storage, BigQuery, and Cloud Bigtable, so it’s easy to get data into and out of it.
  • 17. What makes Dataproc special? The typical mode of operation for Hadoop/Spark, on premises or in the cloud, requires you to deploy a cluster first and then fill that cluster with jobs.
  • 18. What makes Dataproc special? Rather than submitting the job to an already-deployed cluster, you submit the job to Dataproc, which creates a cluster on your behalf on demand.
      ➢ A cluster is now a means to an end for job execution.
  • 19. Let’s see how Dataproc makes it easy and scalable...
      ● Data scientists are big fans of Jupyter Notebooks.
      ● However, setting up an Apache Spark cluster with Jupyter Notebooks can be complicated.
  • 20. Apache Spark and JupyterLab architecture on Google Cloud
  • 21. How it works
      1. Setting up the Google Cloud environment and creating a project
      2. Creating a Google Cloud Storage bucket for your cluster
      3. Creating a Dataproc cluster with Jupyter and Component Gateway
      4. Accessing the JupyterLab web UI on Dataproc
      5. Creating a Notebook and developing the AI algorithm with PySpark
  • 22. Creating a Dataproc cluster using Cloud Shell
      gcloud beta dataproc clusters create ${CLUSTER_NAME} \
        --region=${REGION} \
        --image-version=1.4 \
        --master-machine-type=n1-standard-4 \
        --worker-machine-type=n1-standard-4 \
        --bucket=${BUCKET_NAME} \
        --optional-components=ANACONDA,JUPYTER \
        --enable-component-gateway
  • 23. Component gateway for additional cluster components
  • 24. Steel plates faults prediction
      ● Features: 27 geometric measurements of the steel plates
      ● Fault types: 7
          ○ Pastry
          ○ Z_Scratch
          ○ K_Scatch
          ○ Stains
          ○ Dirtiness
          ○ Bumps
          ○ Other_Faults
      ● Dataset format: CSV | Number of samples: 1941 (Link to dataset)
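A multi-class classifier needs one target label per sample. The sketch below assumes, which the slide does not state, that each CSV row encodes the fault type as seven 0/1 indicator columns named after the fault types above. It collapses them into a single class index. `fault_label` is a hypothetical helper, not code from the notebook.

```python
# Fault type names as listed on the slide, kept verbatim as column identifiers.
FAULT_TYPES = [
    "Pastry",
    "Z_Scratch",
    "K_Scatch",
    "Stains",
    "Dirtiness",
    "Bumps",
    "Other_Faults",
]


def fault_label(row):
    """Return the index of the single active fault column in a CSV row dict."""
    active = [i for i, name in enumerate(FAULT_TYPES) if int(row[name]) == 1]
    if len(active) != 1:
        # Rows with zero or several active faults cannot be mapped to one class.
        raise ValueError(f"expected exactly one fault flag, found {len(active)}")
    return active[0]


# Example row with only the fault columns filled in (values are illustrative).
example_row = {name: "0" for name in FAULT_TYPES}
example_row["Stains"] = "1"
label = fault_label(example_row)
```

The resulting integer can then serve as the label column for a classifier, while `FAULT_TYPES[label]` recovers the readable name.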
  • 25. Demo: Cloud environment setup and modeling the ML solution based on PySpark
  • 26. ML model deployment with Flask architecture (diagram labels: cURL or Postman; Orion Context Broker, 1026:1026; 27017:27017; www, 5000:5000; Model prediction; Saved Model (.parquet); Model training; Jupyter Notebook)
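One step in this architecture is turning a Context Broker notification into model input. The sketch below is framework-agnostic and leaves out the Flask route and the loading of the saved model. `FEATURE_NAMES` is a hypothetical two-attribute subset of the 27 features, and `predict` stands in for whatever callable wraps the trained model. The notification is assumed to carry its entities in a `data` array, in either normalized or key-value form.

```python
# Hypothetical subset of the 27 features; the order must match model training.
FEATURE_NAMES = ["X_Minimum", "X_Maximum"]


def attribute_value(attribute):
    """Unwrap a normalized attribute, or return a key-value attribute as is."""
    if isinstance(attribute, dict) and "value" in attribute:
        return attribute["value"]
    return attribute


def features_from_notification(notification, feature_names=FEATURE_NAMES):
    """Build one ordered feature vector per entity in the notification."""
    vectors = []
    for entity in notification.get("data", []):
        vectors.append([float(attribute_value(entity[name])) for name in feature_names])
    return vectors


def handle_notification(notification, predict):
    """Apply the supplied predict callable to every entity in the notification."""
    return [predict(vector) for vector in features_from_notification(notification)]


# Illustrative notification in key-value format with made-up values.
sample_notification = {
    "type": "Notification",
    "data": [
        {"id": "urn:ngsi-ld:SteelPlate:001", "type": "SteelPlate", "X_Minimum": 42, "X_Maximum": 50}
    ],
}
```

In a Flask service, a function like `handle_notification` would be called from the route that receives the POSTed notification body.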
  • 27. Useful links
      ● Source code and documentation: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi
      ● Jupyter Notebook for steel faults classification based on PySpark: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/blob/master/PySpark/PySpark_Steel_faults_Classification.ipynb
      ● Data processing and persistence with Apache NiFi documentation: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/tree/master/Nifi
      ● NGSI-LD Context Broker
          ○ Docker Hub: https://hub.docker.com/r/fiware/orion-ld
          ○ Documentation: https://github.com/FIWARE/context.Orion-LD
      ● Google Cloud Console: https://console.cloud.google.com/
      ● Flask apps with Docker: https://runnable.com/docker/python/docker-compose-with-flask-apps
  • 28. Summary
      ● The Context Broker does not store or persist data.
      ● The Google Cloud Dataproc service gives data scientists an easy way to set up, control and secure data science environments. It also makes it simple and fast to integrate with other open source data tools.
      ● Once the Dataproc cluster is created, it is not possible to change the configuration or install new dependencies or libraries.
      ● Dataproc jobs are limited to some programming languages.
      ● Apache NiFi might not be the easiest tool for data processing, but it manages and automates data flows, and it is a good fit for large-scale or real-time data.
      ● Other cloud platforms could be used (AWS, Azure, Databricks, ...).
  • 32. Creating an entity in the Context Broker (screenshot annotations: unique id and type; attributes of the created entity)
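As a rough illustration of the two annotated parts, a unique id plus type and then the attributes, the sketch below builds an NGSI-LD entity payload in Python. The id, the `SteelPlate` type and the measurement values are made up for the example, and `post_entity` assumes a broker reachable at a base URL such as `http://localhost:1026`. The screenshot in the deck may differ in detail.

```python
import json
import urllib.request

NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def build_entity(entity_id, entity_type, measurements):
    """Assemble an NGSI-LD entity: id and type first, then one Property per value."""
    entity = {"id": entity_id, "type": entity_type}
    for name, value in measurements.items():
        entity[name] = {"type": "Property", "value": value}
    entity["@context"] = [NGSI_LD_CORE_CONTEXT]
    return entity


def post_entity(base_url, entity):
    """POST the entity to the broker (not called here; needs a running broker)."""
    request = urllib.request.Request(
        f"{base_url}/ngsi-ld/v1/entities",
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )
    return urllib.request.urlopen(request)


# Hypothetical entity with two of the geometric measurements.
entity = build_entity(
    "urn:ngsi-ld:SteelPlate:001",
    "SteelPlate",
    {"X_Minimum": 42, "X_Maximum": 50},
)
body = json.dumps(entity, indent=2)
```

The same JSON in `body` could be sent with cURL or Postman, the two clients shown in the architecture diagram.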
  • 33. Subscribing to changes and listening (screenshot annotations: posting the subscription to Orion; subscribing to all entities of a certain type; subscribing to relevant attributes; sending notifications to the port NiFi is listening on)
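The annotated parts of the subscription map onto fields of an NGSI-LD subscription payload. The sketch below shows one plausible shape, not the exact request from the screenshot. The entity type, the watched attribute and the endpoint URI (host name and path) are assumptions. Only port 5050 comes from the architecture diagram.

```python
import json

NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def build_subscription(entity_type, watched_attributes, endpoint_uri):
    """Assemble a subscription for one entity type and a set of attributes."""
    return {
        "type": "Subscription",
        # Subscribing to all entities of a certain type.
        "entities": [{"type": entity_type}],
        # Subscribing to the relevant attributes: changes to these trigger a notification.
        "watchedAttributes": list(watched_attributes),
        "notification": {
            "attributes": list(watched_attributes),
            "format": "keyValues",
            # Sending notifications to the port NiFi is listening on.
            "endpoint": {"uri": endpoint_uri, "accept": "application/json"},
        },
        "@context": [NGSI_LD_CORE_CONTEXT],
    }


# Hypothetical values; the path after the port depends on the ListenHTTP setup.
subscription = build_subscription(
    "SteelPlate",
    ["X_Minimum"],
    "http://nifi:5050/notifications",
)
payload = json.dumps(subscription)
```

Posting `payload` to the broker's `/ngsi-ld/v1/subscriptions` endpoint with content type `application/ld+json` would register the subscription.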
  • 34. Subscribing to changes and listening
  • 35. Inducing a change and receiving a notification
  • 36. Inducing a change and receiving a notification (screenshot annotations: changing the value of X_Minimum; the processor Out count jumps to 1)
  • 37. Setting up the cloud environment
  • 38. Creating a project in Google Cloud Platform. We can manage the project via the Cloud Shell.
  • 39. Creating a Google Cloud Storage bucket
      ➢ Store datasets
      ➢ Store Notebooks
      ➢ Store logs
      ➢ Store output files
  • 40. Creating a Dataproc cluster using Cloud Shell
      gcloud beta dataproc clusters create ${CLUSTER_NAME} \
        --region=${REGION} \
        --image-version=1.4 \
        --master-machine-type=n1-standard-4 \
        --worker-machine-type=n1-standard-4 \
        --bucket=${BUCKET_NAME} \
        --optional-components=ANACONDA,JUPYTER \
        --enable-component-gateway
  • 41. Creating a Dataproc cluster using the GUI
  • 42. Component gateway for additional cluster components
  • 43. Overview of the Dataproc cluster
  • 44. Dataproc cluster web interfaces
  • 45. Dataproc cluster: JupyterLab interface
  • 46. Creating a Jupyter Notebook and provisioning data from a Google Cloud Storage bucket (Link to Notebook)
  • 47. Submitting a PySpark job using the Dataproc GUI
  • 48. Submitting a PySpark job to the Dataproc cluster
  • 49. Fluid Machine Learning lifecycle with FIWARE. Benoit Orihuela, i4Trust Training Webinar (www.egm.io)
  • 50. A TYPICAL ML LIFECYCLE (ML lifecycle with FIWARE - i4Trust - 12/05/2021)
      • A Data Scientist
          • Get and clean up data
          • Prepare and train an ML model
      • An IT person
          • Package and deploy the ML model
      • An end user
          • Discover the available ML models (with respect to privacy)
          • Ask to use one or more of them (and optionally pay for it)
          • Get real-time data (predictions, outliers, …) from an ML model
  • 51. WHAT DO WE AIM AT?
      • Bridge the gap between data scientists and operations (MLOps)
      • Develop the Machine Learning as a Service (MLaaS) model
      • And also:
          • More and more use cases require ML / AI activities
          • FIWARE needs to offer a rich variety of tools
  • 52. THE TRAINING AND PREPARATION PHASE
  • 53. THE DISCOVERY AND REGISTRATION PHASE
  • 54. THE PREDICTION PHASE
  • 55. DEMONSTRATIONS
      • Demonstration #1: End-to-end demonstration of an ML model's development, deployment and use
          • Use of Jupyter Notebook as the interface
          • Applied to a simplistic water flow calculation
      • Demonstration #2: Event generation from video stream analysis
          • Real-time extraction of context information from a video stream
  • 56. Thank you! Benoit ORIHUELA, Lead Architect. Tel: +33 687427107. E-mail: benoit.orihuela@egm.io. www.egm.io
  • 57. MLaaS for image analysis. Anwar ALFATAYRI (www.egm.io)
  • 58. REAL-LIFE EXAMPLE: SOCIAL DISTANCING
      • Number of people: 14
      • Groups of 2 people: 1
      • Groups of 3 people: 2
      • Groups of 4 people: 1
      • Groups of more than 4 people: 0
  • 59. TWO APPROACHES: Machine learning on the edge (diagram labels: Street; Image; 3 people detected; FIWARE Cloud)
  • 60. TWO APPROACHES: Machine learning as a service (diagram labels: Street; Image; REST API; FIWARE Cloud; 3 people detected)