Future of Data : Berlin
Apache NiFi and MiniFi with Apache MXNet and Tensorfor for IoT from edge devices like Raspberry Pis. Including Python and other tools.
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi PrincetonTimothy Spann
We walk through a design of an open source Predictive Analytics Pipeline with MiniFi on a Raspberry Pi running Python and ingesting SenseHat sensor readings, a webcam image and running Inception classification on that image. MiniFi sends the resulting JSON and Image along with data provenance to an Apache NiFi server for preprocessing. It is then routed, converted, queried and stored as an Apache Hive table in Apache ORC format.
Apache NiFi, MiniFi, Apache Spark, HDP 3.0, HDF 3.1.2, Hortonworks Schema Registry Raspberry Pi with Intel Movidius running TensorFlow Python JSON Apache Avro Apache Hive HDFS
We also touched on integration with Blockchain, Ethereum and accessing REST API and Websocket interfaces offered by online blockchain explorers like Etherdelta and Etherscan.
This talk was June 28th, 2018 in Hamilton, NJ as part of the Future of Data Princeton and NJ Blockchain/Big Data joint meetup.
Apache MXNet for IoT with Apache NiFi. Using Apache MXNet with Apache NiFi and MiniFi for IoT use cases. Ingesting, managing, orchestration and running IoT workloads.
Apache Deep Learning 101 - DWS Berlin 2018Timothy Spann
Apache Deep Learning 101 with Apache MXNet, Apache NiFi, MiniFi, Apache Tika, Apache Open NLP, Apache Spark, Apache Hive, Apache HBase, Apache Livy and Apache Hadoop. Using Python we run various existing models via MXNet Model Server and via Python APIs. We also use NLP for entity resolution
Enterprise IIoT Edge Processing with Apache NiFiTimothy Spann
April 5, 2018 IoT Fusion 2018 Conference in Philadelphia, PA hosted by Chariot Solutions. This talk is about Apache NiFi, MiniFi, Python, Deep Learning, NVidia Jetson TX1, Raspberry Pi, Apache MXNet, TensorFlow and how to run things at the edge and process in your big data center. http://iotfusion.net/session/ https://github.com/tspannhw/IoTFusion2018Talk
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Pre-requisites: Registrants must bring a laptop that has the latest VirtualBox installed and an image for Hortonworks DataFlow (HDF) Sandbox will be provided.
Speakers: Andy LoPresto, Timothy Spann
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, PythonTimothy Spann
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
For processing images from IoT devices like Raspberry Pis, NVidia Jetson TX1, NanoPi Duos, and more that are equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT device using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiniFi coordinates running these Python scripts and decides when and what to send from that analysis and the image to a remote Apache NiFi server for additional processing.
At the Apache NiFi cluster, in the cluster it routes the images to one processing path and the JSON encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL. This data in enriched and augmented before conversion to AVRO to be sent via Apache Kafka to SAM. Streaming Analytics Manager then does deeper processing on this stream and others including weather and twitter to determine what should be done on this data. https://github.com/tspannhw/OpenSourceComputerVision http://dataworkssummit.com/ San Jose 2018 DWS18
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi PrincetonTimothy Spann
We walk through a design of an open source Predictive Analytics Pipeline with MiniFi on a Raspberry Pi running Python and ingesting SenseHat sensor readings, a webcam image and running Inception classification on that image. MiniFi sends the resulting JSON and Image along with data provenance to an Apache NiFi server for preprocessing. It is then routed, converted, queried and stored as an Apache Hive table in Apache ORC format.
Apache NiFi, MiniFi, Apache Spark, HDP 3.0, HDF 3.1.2, Hortonworks Schema Registry Raspberry Pi with Intel Movidius running TensorFlow Python JSON Apache Avro Apache Hive HDFS
We also touched on integration with Blockchain, Ethereum and accessing REST API and Websocket interfaces offered by online blockchain explorers like Etherdelta and Etherscan.
This talk was June 28th, 2018 in Hamilton, NJ as part of the Future of Data Princeton and NJ Blockchain/Big Data joint meetup.
Apache MXNet for IoT with Apache NiFi. Using Apache MXNet with Apache NiFi and MiniFi for IoT use cases. Ingesting, managing, orchestration and running IoT workloads.
Apache Deep Learning 101 - DWS Berlin 2018Timothy Spann
Apache Deep Learning 101 with Apache MXNet, Apache NiFi, MiniFi, Apache Tika, Apache Open NLP, Apache Spark, Apache Hive, Apache HBase, Apache Livy and Apache Hadoop. Using Python we run various existing models via MXNet Model Server and via Python APIs. We also use NLP for entity resolution
Enterprise IIoT Edge Processing with Apache NiFiTimothy Spann
April 5, 2018 IoT Fusion 2018 Conference in Philadelphia, PA hosted by Chariot Solutions. This talk is about Apache NiFi, MiniFi, Python, Deep Learning, NVidia Jetson TX1, Raspberry Pi, Apache MXNet, TensorFlow and how to run things at the edge and process in your big data center. http://iotfusion.net/session/ https://github.com/tspannhw/IoTFusion2018Talk
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
Pre-requisites: Registrants must bring a laptop that has the latest VirtualBox installed and an image for Hortonworks DataFlow (HDF) Sandbox will be provided.
Speakers: Andy LoPresto, Timothy Spann
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, PythonTimothy Spann
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
For processing images from IoT devices like Raspberry Pis, NVidia Jetson TX1, NanoPi Duos, and more that are equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT device using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiniFi coordinates running these Python scripts and decides when and what to send from that analysis and the image to a remote Apache NiFi server for additional processing.
At the Apache NiFi cluster, in the cluster it routes the images to one processing path and the JSON encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL. This data in enriched and augmented before conversion to AVRO to be sent via Apache Kafka to SAM. Streaming Analytics Manager then does deeper processing on this stream and others including weather and twitter to determine what should be done on this data. https://github.com/tspannhw/OpenSourceComputerVision http://dataworkssummit.com/ San Jose 2018 DWS18
Introduction: This workshop will provide a hands on introduction to simple event data processing and data flow processing using a Sandbox on students’ personal machines.
Format: A short introductory lecture to Apache NiFi and computing used in the lab followed by a demo, lab exercises and a Q&A session. The lecture will be followed by lab time to work through the lab exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Apache NiFi. In the lab, you will install and use Apache NiFi to collect, conduct and curate data-in-motion and data-at-rest with NiFi. You will learn how to connect and consume streaming sensor data, filter and transform the data and persist to multiple data sources.
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiDataWorks Summit
Apache NiFi MiNiFi enables data collection in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it.
Concept: Apache NiFi offers a revolutionary data flow management system and extensive integration of existing data production, consumption and analysis ecosystems, all of which are robust data delivery and a (data) logging infrastructure It is protected by. Learn about the additional project Apache MiNiFi, which extends the scope of NiFi's power to the maximum. MiNiFi is a lightweight application that can be placed on hardware that is one order of magnitude smaller than the existing standard data collection platform and is less powerful. As a JVM-enabled native agent MiNiFi enables data gathering in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it. Regional governance and regulatory policies are applied to geopolitical boundaries and comply with legal requirements. And all of this configuration can be done from the existing NiFi and central control using the stable data UI that the data flow administrator has already liked and trusted.
Required prior knowledge / targeted participants: Developers and data flow administrators need some knowledge of Apache NiFi as a platform for routing, conversion, and data delivery through the system (a brief overview is provided ). In this talk we will focus on extending data collection, routing, data history, and NiFi control functions, through IoT / edge integration via MiNiFi.
Key Points: Participants will learn about the opportunity to collect and capture data flows close to the source of data, "edge", such as IoT devices, vehicles, machines, etc. Participants prioritize, filter, protect, and manipulate this data in the initial data lifecycle and understand the potential for data visibility and performance improvement.
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
This presentation was created as an introduction to the Apache NiFi project; to be followed by “Lab 0” of the “Realtime Event Processing in Hadoop with NiFi, Kafka and Storm” tutorial hosted here: http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/#section_1
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
A walk-through of various options in integration Apache Spark and Apache NiFi in one smooth dataflow. There are now several options in interfacing between Apache NiFi and Apache Spark with Apache Kafka and Apache Livy.
Intelligently collecting data at the edge—intro to Apache MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr Member of Technical Staff, Hortonworks
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Apache NiFi: Ingesting Enterprise Data At Scale Timothy Spann
Ingesting various enterprise data sources including JMS, MQTT, Sensors, JSON, XML, CSV, Text, Images, RDBMS and more. Processing, transforming and storing in HDFS, HBase, Phoenix, Hive and more. Also processing with TensorFlow. Hortonworks and TRAC Intermodal Event. Future of Data Princeton.
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately.
Abstract: Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have a passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Takeaways: Attendees will learn about opportunities to bring their data flow and capture closer to the "edge" -- sources of data like IoT devices, vehicles, machinery, etc. They will understand the possibilities to prioritize, filter, secure, and manipulate this data earlier in the data lifecycle to enhance their data visibility and performance.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Hadoop operations started on-prem primarily driven by Apache Ambari. However, due to the agility and flexibility of the cloud, it has driven many Hadoop cluster operations to the cloud and to hybrid environments. Cloud is enabling many ephemeral on-demand use cases which is a game-changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
In this session, we will also take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through a live demo of how the latest from Cloudbreak enables enterprises to easily and securely run Apache Hadoop. This includes deep-dive discussion on Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
As part of this talk, will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari and Cloudbreak community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari and Cloudbreak are being re-imagined to handle these new challenges.
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
A hands-on deep dive on using Apachee MiniFi with Apache MXNet on the edge device including Raspberry Pi with Movidius and NVidia Jetson TX1. We run deep learning models on the edge device and send images, sensor data and deep learning results if values exceed norms. Using S2S data is sent to NiFi for further processing, additional deep learning processing, data augmentation. A stream of data is landed as ORC files in HDFS with Hive tables on-top.
Processed data in AVRO format with a schema stored in Schema Registry. Visualization is shown in Zeppelin.
Use Cases: Security Camera Monitoring, Utility Asset Anomaly Detection, Temperature and Humdiity filtering for devices.
This talk builds on several existing articles I have written:
https://community.hortonworks.com/articles/142686/real-time-ingesting-and-transforming-sensor-and-so.html
https://community.hortonworks.com/articles/121916/controlling-big-data-flows-with-gestures-minifi-ni.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
https://community.hortonworks.com/articles/136039/integrating-nvidia-jetson-tx1-running-tensorrt-int-3.html
https://community.hortonworks.com/articles/101904/part-2-iot-augmenting-gps-data-with-weather.html
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018Timothy Spann
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018. A quick talk on how to ingest IoT sensor data, camera images and run deep learning prebuilt models on edge devices including Raspberry Pis and NVidia Jetson TX1s. From there data is processed and orchestrated with Apache NiFi to send to various Big Data backends.
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiDataWorks Summit
Apache NiFi MiNiFi enables data collection in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it.
Concept: Apache NiFi offers a revolutionary data flow management system and extensive integration of existing data production, consumption and analysis ecosystems, all of which are robust data delivery and a (data) logging infrastructure It is protected by. Learn about the additional project Apache MiNiFi, which extends the scope of NiFi's power to the maximum. MiNiFi is a lightweight application that can be placed on hardware that is one order of magnitude smaller than the existing standard data collection platform and is less powerful. As a JVM-enabled native agent MiNiFi enables data gathering in a brand new environment - small sensor footprint, intermittent or limited bandwidth distributed system, and disposable or short-lived hardware. You can prioritize this data or perform initial analysis on the edge, as well as immediately encrypt and protect it. Regional governance and regulatory policies are applied to geopolitical boundaries and comply with legal requirements. And all of this configuration can be done from the existing NiFi and central control using the stable data UI that the data flow administrator has already liked and trusted.
Required prior knowledge / targeted participants: Developers and data flow administrators need some knowledge of Apache NiFi as a platform for routing, conversion, and data delivery through the system (a brief overview is provided ). In this talk we will focus on extending data collection, routing, data history, and NiFi control functions, through IoT / edge integration via MiNiFi.
Key Points: Participants will learn about the opportunity to collect and capture data flows close to the source of data, "edge", such as IoT devices, vehicles, machines, etc. Participants prioritize, filter, protect, and manipulate this data in the initial data lifecycle and understand the potential for data visibility and performance improvement.
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
What is “dataflow?” — the process and tooling around gathering necessary information and getting it into a useful form to make insights available. Dataflow needs change rapidly — what was noise yesterday may be crucial data today, an API endpoint changes, or a service switches from producing CSV to JSON or Avro. In addition, developers may need to design a flow in a sandbox and deploy to QA or production — and those database passwords aren’t the same (hopefully). Learn about Apache NiFi — a robust and secure framework for dataflow development and monitoring.
Abstract: Identifying, collecting, securing, filtering, prioritizing, transforming, and transporting abstract data is a challenge faced by every organization. Apache NiFi and MiNiFi allow developers to create and refine dataflows with ease and ensure that their critical content is routed, transformed, validated, and delivered across global networks. Learn how the framework enables rapid development of flows, live monitoring and auditing, data protection and sharing. From IoT and machine interaction to log collection, NiFi can scale to meet the needs of your organization. Able to handle both small event messages and “big data” on the scale of terabytes per day, NiFi will provide a platform which lets both engineers and non-technical domain experts collaborate to solve the ingest and storage problems that have plagued enterprises.
Expected prior knowledge / intended audience: developers and data flow managers should be interested in learning about and improving their dataflow problems. The intended audience does not need experience in designing and modifying data flows.
Takeaways: Attendees will gain an understanding of dataflow concepts, data management processes, and flow management (including versioning, rollbacks, promotion between deployment environments, and various backing implementations).
Current uses: I am a committer and PMC member for the Apache NiFi, MiNiFi, and NiFi Registry projects and help numerous users deploy these tools to collect data from an incredibly diverse array of endpoints, aggregate, prioritize, filter, transform, and secure this data, and generate actionable insight from it. Current users of these platforms include many Fortune 100 companies, governments, startups, and individual users across fields like telecommunications, finance, healthcare, automotive, aerospace, and oil & gas, with use cases like fraud detection, logistics management, supply chain management, machine learning, IoT gateway, connected vehicles, smart grids, etc.
This presentation was created as an introduction to the Apache NiFi project; to be followed by “Lab 0” of the “Realtime Event Processing in Hadoop with NiFi, Kafka and Storm” tutorial hosted here: http://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/#section_1
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
A walk-through of various options in integration Apache Spark and Apache NiFi in one smooth dataflow. There are now several options in interfacing between Apache NiFi and Apache Spark with Apache Kafka and Apache Livy.
Intelligently collecting data at the edge—intro to Apache MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Speaker
Andy LoPresto, Sr Member of Technical Staff, Hortonworks
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Apache NiFi: Ingesting Enterprise Data At Scale Timothy Spann
Ingesting various enterprise data sources including JMS, MQTT, Sensors, JSON, XML, CSV, Text, Images, RDBMS and more. Processing, transforming and storing in HDFS, HBase, Phoenix, Hive and more. Also processing with TensorFlow. Hortonworks and TRAC Intermodal Event. Future of Data Princeton.
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
Apache NiFi MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately.
Abstract: Apache NiFi provided a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, all covered with robust data delivery and provenance infrastructure. Now learn about the follow-on project which expands the reach of NiFi to the edge, Apache MiNiFi. MiNiFi is a lightweight application which can be deployed on hardware orders of magnitude smaller and less powerful than the existing standard data collection platforms. With both a JVM compatible and native agent, MiNiFi allows data collection in brand new environments — sensors with tiny footprints, distributed systems with intermittent or restricted bandwidth, and even disposable or ephemeral hardware. Not only can this data be prioritized and have some initial analysis performed at the edge, it can be encrypted and secured immediately. Local governance and regulatory policies can be applied across geopolitical boundaries to conform with legal requirements. And all of this configuration can be done from central command & control using an existing NiFi with the trusted and stable UI data flow managers already love.
Expected prior knowledge / intended audience: developers and data flow managers should have a passing knowledge of Apache NiFi as a platform for routing, transforming, and delivering data through systems (a brief overview will be provided). The talk will focus on extending the data collection, routing, provenance, and governance capabilities of NiFi to IoT/edge integration via MiNiFi.
Takeaways: Attendees will learn about opportunities to bring their data flow and capture closer to the "edge" -- sources of data like IoT devices, vehicles, machinery, etc. They will understand the possibilities to prioritize, filter, secure, and manipulate this data earlier in the data lifecycle to enhance their data visibility and performance.
Speaker: Andy LoPresto, Sr. Member of Technical Staff, Hortonworks
Hadoop operations started on-prem primarily driven by Apache Ambari. However, due to the agility and flexibility of the cloud, it has driven many Hadoop cluster operations to the cloud and to hybrid environments. Cloud is enabling many ephemeral on-demand use cases which is a game-changing opportunity for analytic workloads. But all of this comes with the challenges of running enterprise workloads in the cloud securely and with ease.
Apache Ambari is used by thousands of Hadoop Operators to manage the deployment, lifecycle, and automation of DevOps for Hadoop ecosystem projects. Starting out, Apache Ambari installed a handful of Apache Hadoop ecosystem projects, on a few operating systems, and helped with the most basic Hadoop operational tasks. Today, the product manages over 20 different services, runs on multiple major operating systems and versions, and automates many of the most challenging Hadoop operational tasks in the most secure customer environments.
In this session, we will also take you through Cloudbreak as a solution to simplify provisioning and managing enterprise workloads while providing an open and common experience for deploying workloads across clouds. We will discuss the challenges (and opportunities) to run enterprise workloads in the cloud and will go through a live demo of how the latest from Cloudbreak enables enterprises to easily and securely run Apache Hadoop. This includes deep-dive discussion on Ambari Blueprints, recipes, custom images, and enabling Kerberos -- which are all key capabilities for Enterprise deployments.
As part of this talk, will walk you through what we've learned, the challenges we've overcome, and how the Apache Ambari and Cloudbreak community has changed the product to handle them. The future is fast approaching, and with it comes new on-premise and cloud deployment architectures. See how Apache Ambari and Cloudbreak are being re-imagined to handle these new challenges.
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
A hands-on deep dive on using Apachee MiniFi with Apache MXNet on the edge device including Raspberry Pi with Movidius and NVidia Jetson TX1. We run deep learning models on the edge device and send images, sensor data and deep learning results if values exceed norms. Using S2S data is sent to NiFi for further processing, additional deep learning processing, data augmentation. A stream of data is landed as ORC files in HDFS with Hive tables on-top.
Processed data in AVRO format with a schema stored in Schema Registry. Visualization is shown in Zeppelin.
Use Cases: Security Camera Monitoring, Utility Asset Anomaly Detection, Temperature and Humdiity filtering for devices.
This talk builds on several existing articles I have written:
https://community.hortonworks.com/articles/142686/real-time-ingesting-and-transforming-sensor-and-so.html
https://community.hortonworks.com/articles/121916/controlling-big-data-flows-with-gestures-minifi-ni.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
https://community.hortonworks.com/articles/136039/integrating-nvidia-jetson-tx1-running-tensorrt-int-3.html
https://community.hortonworks.com/articles/101904/part-2-iot-augmenting-gps-data-with-weather.html
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018Timothy Spann
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018. A quick talk on how to ingest IoT sensor data, camera images and run deep learning prebuilt models on edge devices including Raspberry Pis and NVidia Jetson TX1s. From there data is processed and orchestrated with Apache NiFi to send to various Big Data backends.
Hands-On Deep Dive with MiniFi and Apache MXNetTimothy Spann
Deep Learning on The Edge, a hands-on a approach to running deep learning work loads on the edge for IoT as well as in Apache NiFi and in Hadoop 3.1 YARN as dockerized work loads.
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications.
As part of my talk I will walk through using Apache NXNet Pre-Built Models, MXNet's New Model Server with Apache NiFi, executing MXNet with Apache NiFi and running Apache MXNet on edge nodes utilizing Python and Apache MiniFi.
This talk is geared towards Data Engineers interested in the basics of Deep Learning with open source Apache tools in a Big Data environment. I will walk through source code examples available in github and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
My talk at Data Works Summit Sydney was listed in top 7 -> https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
Also have speak at and run Future of Data Princeton and at Oracle Code NYC.
Ref:
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Speaker
Timothy Spann, Solutions Engineer, Hortonworks
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
Today enterprises are increasingly leveraging hybrid cloud data lakes while taking advantage of the elastic resources and services available in the public cloud. However, such gains come with risks and challenges in the areas of security and privacy. In this talk, we will cover how an enterprise can use Apache Knox as a secure point of entry into a Multi-cluster hybrid cloud data lakes. We will outline how enterprises can securely test out new big data applications or concepts in the public cloud while protecting their production clusters on-premises. We will show how enterprises can leverage their existing on-premises Active Directory infrastructure for authenticating users trying to access their services in the cloud. Further, we will cover how you can leverage Apache Knox Authorization to thwart an unauthorized access to a multi-cloud and multi-cluster data lake and bring to bear Multi Factor Authentication (MFA) on Apache Knox to block a hacker with stolen credentials. KIRAN MATTY, Senior Product Manager, Hortonworks and SANDEEP MORE, Sr. Software Engineer, Hortonworks
As Hadoop becomes the defacto big data platform, enterprises deploy HDP across wide range of physical and virtual environments spanning private and public clouds. This session will cover key considerations for cloud deployment and showcase Cloudbreak for simple and consistent deployment across cloud providers of choice.
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...DataWorks Summit
For processing images from IoT devices like Raspberry Pis, NVidia Jetson TX1, NanoPi Duos, and more that are equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT device using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiniFi coordinates running these Python scripts and decides when and what to send from that analysis and the image to a remote Apache NiFi server for additional processing. Also includes using custom NiFi processors for Atrribute Cleaning, Apache Tika, TensorFlow and Camera ingest.
At the Apache NiFi cluster, in the cluster it routes the images to one processing path and the JSON encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL. TIMOTHY SPANN, Solutions Engineer, Hortonworks and NAGARAJ JAYAKUMAR, Architect, Hortonworks
Big data security challenges are bit different from traditional client-server applications and are distributed in nature, introducing unique security vulnerabilities. Cloud Security Alliance (CSA) has categorized the different security and privacy challenges into four different aspects of the big data ecosystem. These aspects are infrastructure security, data privacy, data management and, integrity and reactive security. Each of these aspects are further divided into following security challenges:
1. Infrastructure security
a. Secure distributed processing of data
b. Security best practices for non-relational data stores
2. Data privacy
a. Privacy-preserving analytics
b. Cryptographic technologies for big data
c. Granular access control
3. Data management
a. Secure data storage and transaction logs
b. Granular audits
c. Data provenance
4. Integrity and reactive security
a. Endpoint input validation/filtering
b. Real-time security/compliance monitoring
In this talk, we are going to refer above classification and identify existing security controls, best practices, and guidelines. We will also paint a big picture about how collective usage of all discussed security controls (Kerberos, TDE, LDAP, SSO, SSL/TLS, Apache Knox, Apache Ranger, Apache Atlas, Ambari Infra, etc.) can address fundamental security and privacy challenges that encompass the entire Hadoop ecosystem. We will also discuss briefly recent security incidents involving Hadoop systems.
Speakers
Krishna Pandey, Staff Software Engineer, Hortonworks
Kunal Rajguru, Premier Support Engineer, Hortonworks
This webinar series covers Apache Kafka and Apache Storm for streaming data processing. Also, it discusses new streaming innovations for Kafka and Storm included in HDP 2.2
As containerization continues to gain momentum and become a de facto standard for application deployment, challenges around containerization of big data workloads are coming to light. Great strides have been made within the open source communities towards running big data workloads in containers, but much is left to be done.
Apache Hadoop YARN is the modern distributed operating system for big data applications. It has morphed the Hadoop compute layer into a common resource-management platform that can host a wide variety of applications. At its core, YARN has a very powerful scheduler which enforces global cluster level invariants and helps sites manage user and operator expectations of elastic sharing, resource usage limits, SLAs, and more. YARN recently increased its support for Docker containerization and added a YARN service framework supporting long-running services.
In this session we will explore the emerging patterns and challenges related to containers and big data workloads, including running applications such as Apache Spark, Apache HBase, and Kubernetes in containers on YARN. BILLIE RINALDI, Principal Software Engineer, Hortonworks and SHANE KUMPF, Software Engineer, Hortonworks
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
Today enterprises desire to move more and more of their data lakes to the cloud to help them execute faster, increase productivity, drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with risks and challenges in the areas of data security, privacy, and governance. In this talk we cover how enterprises can overcome governance and security obstacles to leverage these new advances that the cloud can provide to ease the management of their data lakes in the cloud. We will also show how the enterprise can have consistent governance and security controls in the cloud for their ephemeral analytic workloads in a multi-cluster cloud environment without sacrificing any of the data security and privacy/compliance needs that their business context demands. Additionally, we will outline some use cases and patterns as well as best practices to rationally manage such a multi-cluster data lake infrastructure in the cloud.
From leading IoT Protocols to Python Dashboarding_finalLukas Ott
First i like to give an overview on common IoT Protocols:
#CoAP (Constrained Application Protocol -> Close to HTTP / REST ) #MQTT ( Message Queue Telemetry Transport -> Pub/Sub with Broker -> Well defined Quality of Service -> Newest addition Eclipse Amlem (formerly the core of IBM Watson IoT platform) -> Eclipse Sparkplug -> Standardization of the topics and payloads -> Interoperability!) , #DDS (Data Distribution Service -> Pub/Sub without Broker -> Drones / Robotics) #LwM2M (Lightweight M2M -> Runs on Top of CoAP or MQTT -> standard sets of payloads for sensors) #zenoh (https://zenoh.io/ Pub/Sub Protocol -> combines the advantages of #DDS and #MQTT) #eclipsefoundation #apache #opensource #lightweight (+ some comments that this is not complete and does not encompass Industrial and Building Automation)
Then I would like to show the leading edge IoT protocol Zenoh. Saving Zenoh Payload to Apache IoTDB. After that I would like to dive into Panel and the awesome capabilities of Apache ECharts.
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Timothy Spann
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
An overview for Big Data Engineers on how one could use Apache projects to run deep learning workflows with Apache NiFi, YARN, Spark, Kafka and many other Apache projects.
In my talk I will discuss and show examples of using Apache Hadoop, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to last years Apache Deep Learning 101 that was done at Dataworks Summit and ApacheCon.
As part of my talk I will walk through using Apache NXNet Pre-Built Models, MXNet's New Model Server with Apache NiFi, executing MXNet with Apache NiFi and running Apache MXNet on edge nodes utilizing Python and Apache MiniFi.
This talk is geared towards Data Engineers interested in the basics of Deep Learning with open source Apache tools in a Big Data environment. I will walk through source code examples available in github and run the code live on an Apache Hadoop / YARN / Apache Spark cluster.
This will be an introduction to executing Deep Learning Pipelines in an Apache Big Data environment.
My talk at Data Works Summit Sydney was listed in top 7 -> https://hortonworks.com/blog/7-sessions-dataworks-summit-sydney-see/
Also have speak at and run Future of Data Princeton and at Oracle Code NYC.
https://www.slideshare.net/oom65/hadoop-security-architecture?next_slideshow=1
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
https://dzone.com/refcardz/introduction-to-tensorflow
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
Hortonworks DataFlow (HDF) is built with the vision of creating a platform that enables enterprises to build dataflow management and streaming analytics solutions that collect, curate, analyze and act on data in motion across the datacenter and cloud. Do you want to be able to provide a complete end-to-end streaming solution, from an IoT device all the way to a dashboard for your business users with no code? Come to this session to learn how this is now possible with HDF 3.1.
Similar to MiniFi and Apache NiFi : IoT in Berlin Germany 2018 (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
Building Real-Time Pipelines With FLaNK
Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp
The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making.
Apache NiFi
Apache Kafka
Apache Flink
Apache Iceberg
LLM
Generative AI
Slack
Postgresql
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
Gen AI on Enterprise Cloud
Apache NiFi
Milvus
Apache Kafka
Apache Flink
Cloudera Machine Learning
Cloudera DataFlow
https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa
https://www.meetup.com/futureofdata-princeton/events/300737266/
https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY
If you're interested in working with Generative AI on the cloud, this virtual workshop is for you.
Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale.
9:00 - 9:05: Intro
9:05 - 9:15: What is Milvus
9:15 - 9:25: Cloudera Development Platform
9:25 - 10:00: Demo
Location
https://www.youtube.com/watch?v=IfWIzKsoHnA
https://github.com/tspannhw/SpeakerProfile
https://www.linkedin.com/in/yujiantang/
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
https://www.youtube.com/watch?v=Yeua8NlzQ3Y
https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming
Adding Generative AI to Real-Time Streaming Pipelines
Abstract
Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results.
Summary
Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, Nifi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there.
Llm, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive.
Nifi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into.
So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from slack. And then I'm going to clean it up, add a couple fields and push that out to slack. Every model is a little bit of different tweaking.
Nifi acts as a whole website. And as you see here, it can be get, post, put, whatever you want. We send that response back to flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Comp 42, which is always a nice one, great place to be. I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, Nifi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data machine learning, streaming IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code and GitHub. Everything you need is out there. Let's get into the talk. Llm, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...Timothy Spann
2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
https://xtremej.dev/2023/schedule/
Building Real-time Pipelines with FLaNK: A Case Study with Transit Data
Overview of the problem, the application (code walkthru and running), overview of FLaNK, introduction to NiFi, introduction to Kafka, and introduction to Flink.
28March2024-Codeless-Generative-AI-Pipelines
https://www.meetup.com/futureofdata-princeton/events/299440871/
https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/
******Note*****
The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend.
-----------------------
We're excited to invite you to join us in-person, for a Real-Time Analytics exploration!
Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field!
Agenda:
05:30-06:00: Pizza and friends
06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi
06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders
07:20-07:30 QNA
Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera
Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications!
Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree
Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot!
-------
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera.
He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
https://princetonacm.acm.org/tcfpro/
18th Annual IEEE IT Professional Conference (ITPC)
Armstrong Hall at The College of New Jersey
Friday, March 15th, 2024 | 10:00 AM to 5:00 PM
IT Professional Conference at Trenton Computer Festival
IEEE Information Technology Professional Conference on Friday, March 15th, 2024
TCFPro24 Building Real-Time Generative AI Pipelines
Building Real-Time Generative AI Pipelines
In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms.
From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment.
Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time.
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark.
Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipelines
https://www.meetup.com/futureofdata-newyork/events/298660453/
Unlocking Financial Data with Real-Time Pipelines
(Flink Analytics on Stocks with SQL )
By Timothy Spann
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. I will be utilizing NiFi 2.0 with Python and Vector Databases.
Timothy Spann
Principal Developer Advocate, Cloudera
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
https://twitter.com/PaaSDev
https://www.linkedin.com/in/timothyspann/
https://medium.com/@tspann
https://github.com/tspannhw/FLiPStackWeekly/
Conf42-Python-Building Apache NiFi 2.0 Python Processors
https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors
Building Apache NiFi 2.0 Python Processors
Abstract
Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM.
Summary
Tim Spann: I'm going to be talking today, be building Apache 9520 Python processors. One of the main purposes of supporting Python in the streaming tool Apache Nifi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera.
You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 310 and again JDK 21 on your machine. You got to be smart about how you use these models.
There are a ton of python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own.
When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested on writing your own python code for Apache Nifi, definitely reach out and thank.
Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM
Abstract
In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrated the ingestion, processing, and analysis of stock data. Additionally, we illustrated how to use an LLM to generate predictions from the analyzed data.
Karin Wolok
Developer Relations, Dev Marketing, and Community Programming @ Project Elevate
Karin Wolok's LinkedIn account Karin Wolok's twitter account
Tim Spann
Principal Developer Advocate @ Cloudera
Tim Spann's LinkedIn account Tim Spann's twitter account
https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI PipelinesTimothy Spann
2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines
https://www.aicamp.ai/event/eventdetails/W2024022214
apache nifi
llm
generative ai
gen ai
ml
dl
machine learning
apache kafka
apache flink
postgresql
python
AI Meetup (NYC): GenAI, LLMs, ML and Data
Feb 22, 05:30 PM EST
Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers
Agenda:
* 5:30pm~6:00pm: Checkin, Food/drink and networking
* 6:00pm~6:10pm: Welcome/community update
* 6:10pm~8:30pm: Tech talks
* 8:30pm: Q&A, Open discussion
Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs
Speaker: Zain Hasan (Weaviate LinkedIn)
Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can that can see, hear, read, and feel data(!), to perform cross-modal search(searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enables users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
Tech Talk: Codeless Generative AI Pipelines
Speaker: Timothy Spann (Cloudera LinkedIn)
Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task.
Speakers/Topics:
Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics
Sponsors:
We are actively seeking sponsors to support our community. Whether it is by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide.
Venue:
Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036
Room Name: Central Park West 6501
Community on Slack/Discord
- Event chat: chat and connect with speakers and attendees
- Sharing blogs, events, job openings, projects collaborations
Join Slack (search and join the #newyork channel) | Join Discord
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
20-Feb-2024
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
Tim Spann
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Unlocking Financial Data with Real-Time Pipelines
Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence.
Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data.
Key Points to be Covered:
Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing.
Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data.
Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. Integration of Kafka with financial data sources and consumers.
Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources.
Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions.
Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies.
Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an
Building Real-time Travel Alerts
In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts.
We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development.
Let's get streaming.
Apache Flink
Apache Kafka
Apache NiFi
FLaNK Stack
Tim Spann
Big Data Conference Europe 2023
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
JConWorld: Continuous SQL with Kafka and Flink
In this talk, I will walk through how someone can setup and run continous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data.
We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic over view will show hands-on techniques, tips and examples of how to do this.
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html
https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.