SlideShare a Scribd company logo
1 of 21
Download to read offline
Cloud Tools Guidance
Author: Timothy Spann
Michael Kohs George Vetticaden
Date: 04/19/2023
Last Updated: 5/01/2023
Notice
This document assumes that you have registered for an account, activated it and logged into
the CDP Sandbox. This is for authorized users only who have attended the webinar and have
read the training materials.
A short guide and references are listed here.
THIS IS NOT FOR USE WITH THE FIRST TWO
TUTORIALS. THIS IS FOR BUILDING ASSETS FOR
YOUR OWN NEW FLOWS.
1. How To Build Data Assets
1.1 Create a Kafka Topic
1. Navigate to Data Hub Clusters
2. Navigate to the oss-kafka-demo cluster
3. Navigate to Streams Messaging Manager
Info: Streams Messaging Manager (SMM) is a tool for working with Apache Kafka.
4. Now that you are in SMM.
5. Navigate to the round icon third from the top, click this Topic button.
6. You are now in the Topic browser.
7. Click Add New to build a new topic.
8. Enter the name of your topic prefixed with your “Workload User Name“
<yourusername>_yournewtopic, ex: tim_younewtopic.
9. Enter the name of your topic prefixed with your Workload User Name, ex:
tim_younewtopic. For settings you should create it with (3 partitions,
cleanup.policy: delete, availability maximum) as shown above.
Congratulations! You have built a new topic.
1.2 Create a Schema If You Need One. Not Required For Using Kafka Topics or Tutorials.
1. Navigate to Schema Registry from the Kafka Data Hub.
2. You will see existing schemas.
3. Click the white plus sign in the gray hexagon to create a new schema.
4. You can now add a new schema by entering a unique name starting with your Workload
User Name (ex: tim), followed by a short description and then the schema text as
shown. If you need examples, see the github list at the end of this guide.
5. Click Save and you have a new schema. If there were errors they will be shown and
you can fix them. For more help see, Schema Registry Documentation and Schema
Registry Public Cloud.
Congratulations! You have built a new schema. Start using it in your DataFlow application.
1.3 Create an Apache Iceberg Table
1. Navigate to oss-kudu-demo from the Data Hubs list
2. Navigate to Hue from the Kudu Data Hub.
3. Inside of Hue you can now create your table.
4. Navigate to your database, this was created for you.
Info: The database name pattern is your email address and then all special characters are
replaced with underscore and then _db is appended to that to make the db name and the
ranger policy is created to limit access to just the user and those that are in the admin
group. For example:
5. Create your Apache Iceberg table, it must be prefixed with your Work Load User Name
(userid).
CREATE TABLE <<userid>>_syslog_critical_archive
(priority int, severity int, facility int, version int, event_timestamp bigint, hostname string,
body string, appName string, procid string, messageid string,
structureddata struct<sdid:struct<eventid:string,eventsource:string,iut:string>>)
STORED BY ICEBERG
6. Your table is created in s3a://oss-uat2/iceberg/
7. Once you have sent data to your table, you can query it.
Additional Documentation
● Create a Table
● Query a Table
● Apache Iceberg Table Properties
2. Streaming Data Sets Available for Apps
The following Kafka topics are being populated with streaming data for you.
These come from the read-only Kafka cluster.
Navigate to the Data Hub Clusters.
Click on oss-kafka-datagen.
Click Schema Registry.
Click Streams Messaging Manager.
Use these brokers to connect to them:
Brokers
oss-kafka-datagen-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-dat
agen-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-datagen-corebro
ker2.oss-demo.qsm5-opic.cloudera.site:9093
Use this link for Schema Registry
https://#{Schema2}:7790/api/v1
Schema Registry Parameter Hostname: Schema2
oss-kafka-datagen-master0.oss-demo.qsm5-opic.cloudera.site
To View Schemas in the Schema Registry click the icon from the datahub
https://oss-kafka-datagen-gateway.oss-demo.qsm5-opic.cloudera.site/oss-kafka-datagen/cd
p-proxy/schema-registry/ui/#/
Schemas
https://github.com/tspannhw/FLaNK-DataFlows/tree/main/schemas
Group ID: yourid_cdf
Customers (customer)
Example Row
{"first_name":"Charley","last_name":"Farrell","age":19,"city":"Sawaynside","country":"Guinea","em
ail":"keven.herzog@hotmail.com","phone_number":"312-269-6619"}
IP Tables (ip_address)
Example Row
{"source_ip":"216.25.204.241","dest_port":219,"tcp_flags_ack":0,"tcp_flags_reset":0,"ts":"2023-0
4-20 15:26:45.517"}
Orders (orders)
Example Row
{"order_id":84170282,"city":"Wintheiserton","street_address":"80206 Caroyln
Lakes","amount":29,"order_time":"2023-04-20 13:25:06.097","order_status":"DELIVERED"}
Plants (plant)
Example Row
{"plant_id":829,"city":"Lake Gerald","lat":"39.568679","lon":"-151.64497","country":"Eritrea"}
Sensors (sensor)
Example Row
{"sensor_id":264,"timestamp_of_production":"2023-04-20 18:28:42.751"}
Sensor Data (sensor_data)
Example Row
{"sensor_id":250,"timestamp_of_production":"2023-04-20 18:42:04.847","sensor_value":-72}
Weather (weather)
Example Row
{"city":"New Ernesto","temp_c":21,"description":"Sleet"}
Transactions (transactions)
Example Row
{"sender_id":40816,"receiver_id":96057,"amount":557,"execution_date":"2023-04-20
16:15:30.744","currency":"UYU"}
These are realistic generated data sources that you can use, they are available from
read-only Kafka topics. These can be consumed by any developers in the sandbox.
Make sure you name your Kafka Consumer your Workload Username _ Some
Name.
Ex: tim_customerdata_reader
3. Bring Your Own Data (Public Only)
● Data is visible and downloadable to all, make sure it is safe, free, open, public data.
● Public REST Feeds are good
○ Wikipedia
https://docs.cloudera.com/dataflow/cloud/flow-designer-beginners-guide-readyflo
w/topics/cdf-flow-designer-getting-started-readyflow.html
○ https://gbfs.citibikenyc.com/gbfs/en/station_status.json
○ https://travel.state.gov/_res/rss/TAsTWs.xml
○ https://www.njtransit.com/rss/BusAdvisories_feed.xml
○ https://www.njtransit.com/rss/RailAdvisories_feed.xml
○ https://www.njtransit.com/rss/LightRailAdvisories_feed.xml
○ https://www.njtransit.com/rss/CustomerNotices_feed.xml
○ https://w1.weather.gov/xml/current_obs/all_xml.zip
○ https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json?page=1&pagesize=1
00
○ https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json?pagesize=10
0
○ https://dailymed.nlm.nih.gov/dailymed/rss.cfm
● Generic data files
○ https://aws.amazon.com/data-exchange
● Simulators
○ Use external data simulators via REST
○ Use GeneralFlowFile see:
https://www.datainmotion.dev/2019/04/integration-testing-for-apache-nifi.html
● Schemas, Data Sources and Examples
○ https://github.com/tspannhw/FLaNK-AllTheStreams/
○ https://github.com/tspannhw/FLaNK-DataFlows
○ https://github.com/tspannhw/FLaNK-TravelAdvisory/
○ https://github.com/tspannhw/FLiP-Current22-LetsMonitorAllTheThings
○ https://github.com/tspannhw/create-nifi-kafka-flink-apps
○ https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html
CloudToolGuidance03May2023

More Related Content

Similar to CloudToolGuidance03May2023

Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...buildacloud
 
Open writing-cloud-collab
Open writing-cloud-collabOpen writing-cloud-collab
Open writing-cloud-collabKaren Vuong
 
Hortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureHortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureAnita Luthra
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...HostedbyConfluent
 
HPC and HPGPU Cluster Tutorial
HPC and HPGPU Cluster TutorialHPC and HPGPU Cluster Tutorial
HPC and HPGPU Cluster TutorialDirk Hähnel
 
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanRunning Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanDatabricks
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesYevgeniy Brikman
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Amazon Web Services
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guideslidedown1
 
Snowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdfSnowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdfssuserf8f9b2
 
Azure Data serices and databricks architecture
Azure Data serices and databricks architectureAzure Data serices and databricks architecture
Azure Data serices and databricks architectureAdventureWorld5
 

Similar to CloudToolGuidance03May2023 (20)

Mysql ppt
Mysql pptMysql ppt
Mysql ppt
 
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
Open Writing! Collaborative Authoring for CloudStack Documentation by Jessica...
 
Open writing-cloud-collab
Open writing-cloud-collabOpen writing-cloud-collab
Open writing-cloud-collab
 
Lampstack (1)
Lampstack (1)Lampstack (1)
Lampstack (1)
 
SubjectsPlus Manual in Compatible with XAMPP
SubjectsPlus Manual in Compatible with XAMPPSubjectsPlus Manual in Compatible with XAMPP
SubjectsPlus Manual in Compatible with XAMPP
 
Hortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureHortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on Azure
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
HPC and HPGPU Cluster Tutorial
HPC and HPGPU Cluster TutorialHPC and HPGPU Cluster Tutorial
HPC and HPGPU Cluster Tutorial
 
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanRunning Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
 
Team lab install_en
Team lab install_enTeam lab install_en
Team lab install_en
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modules
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
Snowflake free trial_lab_guide
Snowflake free trial_lab_guideSnowflake free trial_lab_guide
Snowflake free trial_lab_guide
 
Snowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdfSnowflake_free_trial_LabGuide.pdf
Snowflake_free_trial_LabGuide.pdf
 
Azure Data serices and databricks architecture
Azure Data serices and databricks architectureAzure Data serices and databricks architecture
Azure Data serices and databricks architecture
 
slides.pptx
slides.pptxslides.pptx
slides.pptx
 
slides.pptx
slides.pptxslides.pptx
slides.pptx
 
mush With Xampp
mush With Xamppmush With Xampp
mush With Xampp
 

More from Timothy Spann

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-PipelinesTimothy Spann
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsTimothy Spann
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Timothy Spann
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...Timothy Spann
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
 

More from Timothy Spann (20)

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...2024 XTREMEJ_  Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...
 
28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines28March2024-Codeless-Generative-AI-Pipelines
28March2024-Codeless-Generative-AI-Pipelines
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...
 
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Building Real-Time Travel Alerts
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
 
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
 
AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101AIDevWorldApacheNiFi101
AIDevWorldApacheNiFi101
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 

Recently uploaded

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

CloudToolGuidance03May2023

  • 1. Cloud Tools Guidance Author: Timothy Spann Michael Kohs George Vetticaden Date: 04/19/2023 Last Updated: 5/01/2023 Notice This document assumes that you have registered for an account, activated it and logged into the CDP Sandbox. This is for authorized users only who have attended the webinar and have read the training materials. A short guide and references are listed here. THIS IS NOT FOR USE WITH THE FIRST TWO TUTORIALS. THIS IS FOR BUILDING ASSETS FOR YOUR OWN NEW FLOWS. 1. How To Build Data Assets 1.1 Create a Kafka Topic 1. Navigate to Data Hub Clusters
  • 2. 2. Navigate to the oss-kafka-demo cluster 3. Navigate to Streams Messaging Manager
  • 3. Info: Streams Messaging Manager (SMM) is a tool for working with Apache Kafka. 4. Now that you are in SMM. 5. Navigate to the round icon third from the top, click this Topic button.
  • 4. 6. You are now in the Topic browser. 7. Click Add New to build a new topic. 8. Enter the name of your topic prefixed with your “Workload User Name“ <yourusername>_yournewtopic, ex: tim_younewtopic.
  • 5. 9. Enter the name of your topic prefixed with your Workload User Name, ex: tim_younewtopic. For settings you should create it with (3 partitions, cleanup.policy: delete, availability maximum) as shown above. Congratulations! You have built a new topic.
  • 6. 1.2 Create a Schema If You Need One. Not Required For Using Kafka Topics or Tutorials. 1. Navigate to Schema Registry from the Kafka Data Hub. 2. You will see existing schemas. 3. Click the white plus sign in the gray hexagon to create a new schema.
  • 7.
  • 8. 4. You can now add a new schema by entering a unique name starting with your Workload User Name (ex: tim), followed by a short description and then the schema text as shown. If you need examples, see the github list at the end of this guide. 5. Click Save and you have a new schema. If there were errors they will be shown and you can fix them. For more help see, Schema Registry Documentation and Schema Registry Public Cloud. Congratulations! You have built a new schema. Start using it in your DataFlow application.
  • 9. 1.3 Create an Apache Iceberg Table 1. Navigate to oss-kudu-demo from the Data Hubs list 2. Navigate to Hue from the Kudu Data Hub. 3. Inside of Hue you can now create your table. 4. Navigate to your database, this was created for you.
  • 10. Info: The database name pattern is your email address and then all special characters are replaced with underscore and then _db is appended to that to make the db name and the ranger policy is created to limit access to just the user and those that are in the admin group. For example: 5. Create your Apache Iceberg table, it must be prefixed with your Work Load User Name (userid). CREATE TABLE <<userid>>_syslog_critical_archive (priority int, severity int, facility int, version int, event_timestamp bigint, hostname string, body string, appName string, procid string, messageid string, structureddata struct<sdid:struct<eventid:string,eventsource:string,iut:string>>) STORED BY ICEBERG 6. Your table is created in s3a://oss-uat2/iceberg/
  • 11. 7. Once you have sent data to your table, you can query it.
  • 12. Additional Documentation ● Create a Table ● Query a Table ● Apache Iceberg Table Properties
  • 13. 2. Streaming Data Sets Available for Apps The following Kafka topics are being populated with streaming data for you. These come from the read-only Kafka cluster. Navigate to the Data Hub Clusters. Click on oss-kafka-datagen.
  • 14. Click Schema Registry. Click Streams Messaging Manager.
  • 15. Use these brokers to connect to them: Brokers oss-kafka-datagen-corebroker1.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-dat agen-corebroker0.oss-demo.qsm5-opic.cloudera.site:9093,oss-kafka-datagen-corebro ker2.oss-demo.qsm5-opic.cloudera.site:9093 Use this link for Schema Registry https://#{Schema2}:7790/api/v1 Schema Registry Parameter Hostname: Schema2 oss-kafka-datagen-master0.oss-demo.qsm5-opic.cloudera.site To View Schemas in the Schema Registry click the icon from the datahub https://oss-kafka-datagen-gateway.oss-demo.qsm5-opic.cloudera.site/oss-kafka-datagen/cd p-proxy/schema-registry/ui/#/ Schemas https://github.com/tspannhw/FLaNK-DataFlows/tree/main/schemas Group ID: yourid_cdf Customers (customer) Example Row {"first_name":"Charley","last_name":"Farrell","age":19,"city":"Sawaynside","country":"Guinea","em ail":"keven.herzog@hotmail.com","phone_number":"312-269-6619"}
  • 16. IP Tables (ip_address) Example Row {"source_ip":"216.25.204.241","dest_port":219,"tcp_flags_ack":0,"tcp_flags_reset":0,"ts":"2023-0 4-20 15:26:45.517"} Orders (orders) Example Row
  • 17. {"order_id":84170282,"city":"Wintheiserton","street_address":"80206 Caroyln Lakes","amount":29,"order_time":"2023-04-20 13:25:06.097","order_status":"DELIVERED"} Plants (plant) Example Row {"plant_id":829,"city":"Lake Gerald","lat":"39.568679","lon":"-151.64497","country":"Eritrea"} Sensors (sensor) Example Row {"sensor_id":264,"timestamp_of_production":"2023-04-20 18:28:42.751"}
  • 18. Sensor Data (sensor_data) Example Row {"sensor_id":250,"timestamp_of_production":"2023-04-20 18:42:04.847","sensor_value":-72} Weather (weather) Example Row {"city":"New Ernesto","temp_c":21,"description":"Sleet"}
  • 19. Transactions (transactions) Example Row {"sender_id":40816,"receiver_id":96057,"amount":557,"execution_date":"2023-04-20 16:15:30.744","currency":"UYU"} These are realistic generated data sources that you can use, they are available from read-only Kafka topics. These can be consumed by any developers in the sandbox. Make sure you name your Kafka Consumer your Workload Username _ Some Name. Ex: tim_customerdata_reader
  • 20. 3. Bring Your Own Data (Public Only) ● Data is visible and downloadable to all, make sure it is safe, free, open, public data. ● Public REST Feeds are good ○ Wikipedia https://docs.cloudera.com/dataflow/cloud/flow-designer-beginners-guide-readyflo w/topics/cdf-flow-designer-getting-started-readyflow.html ○ https://gbfs.citibikenyc.com/gbfs/en/station_status.json ○ https://travel.state.gov/_res/rss/TAsTWs.xml ○ https://www.njtransit.com/rss/BusAdvisories_feed.xml ○ https://www.njtransit.com/rss/RailAdvisories_feed.xml ○ https://www.njtransit.com/rss/LightRailAdvisories_feed.xml ○ https://www.njtransit.com/rss/CustomerNotices_feed.xml ○ https://w1.weather.gov/xml/current_obs/all_xml.zip ○ https://dailymed.nlm.nih.gov/dailymed/services/v2/spls.json?page=1&pagesize=1 00 ○ https://dailymed.nlm.nih.gov/dailymed/services/v2/drugnames.json?pagesize=10 0 ○ https://dailymed.nlm.nih.gov/dailymed/rss.cfm ● Generic data files ○ https://aws.amazon.com/data-exchange ● Simulators ○ Use external data simulators via REST ○ Use GeneralFlowFile see: https://www.datainmotion.dev/2019/04/integration-testing-for-apache-nifi.html ● Schemas, Data Sources and Examples ○ https://github.com/tspannhw/FLaNK-AllTheStreams/ ○ https://github.com/tspannhw/FLaNK-DataFlows ○ https://github.com/tspannhw/FLaNK-TravelAdvisory/ ○ https://github.com/tspannhw/FLiP-Current22-LetsMonitorAllTheThings ○ https://github.com/tspannhw/create-nifi-kafka-flink-apps ○ https://www.datainmotion.dev/2021/01/flank-real-time-transit-information-for.html