Big Data Conference Europe: Real-Time Streaming in Any and All Clouds, Hybrid and Beyond - Timothy Spann
Biography
Tim Spann is a Principal DataFlow Field Engineer at Cloudera, where he works with Apache NiFi, MiNiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Talk
Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at scale, as events arrive.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai, Apache MXNet.
References:
https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Source Code: https://github.com/tspannhw/MmFLaNK
FLiP Stack
StreamNative
ApacheCon 2021: Apache NiFi 101: Introduction and Best Practices - Timothy Spann
Thursday 14:10 UTC
Apache NiFi 101: Introduction and Best Practices
Timothy Spann
In this talk, we will walk step by step through Apache NiFi, from first load to first application. I will include slides, articles and examples to take away as a quick start to using Apache NiFi in your real-time dataflows. I will help you get up and running locally on your laptop or in Docker.
DZone Zone Leader and Big Data MVB
@PaasDev
https://github.com/tspannhw https://www.datainmotion.dev/
https://github.com/tspannhw/SpeakerProfile
https://dev.to/tspannhw
https://sessionize.com/tspann/
https://www.slideshare.net/bunkertor
Scenic City Summit (2021): Real-Time Streaming in Any and All Clouds, Hybrid and Beyond - Timothy Spann
24-September-2021. Scenic City Summit. Virtual.
Apache Pulsar, Apache NiFi, Apache Flink
StreamNative
Tim Spann
https://sceniccitysummit.com/
OSA Con 2021: Hello Hydrate! From Stream to ClickHouse with Apache Pulsar and Friends - Timothy Spann
https://altinity.com/osa-con-2021/
An empty real-time SQL data warehouse is not useful to anyone. How can you load data quickly from diverse data sources? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. We’ll show how to use them to load CDC, logs, events, XML, images, and many other types of data into ClickHouse and similar data warehouses.
PASS Data Community Summit (2021): Real-Time Streaming in Azure with Apache Pulsar - Timothy Spann
Apache NiFi, Apache Flink, Apache Pulsar
FLiP Stack
https://passdatacommunitysummit.com/
DBCC 2021: FLiP Stack for Cloud Data Lakes - Timothy Spann
The FLiP(N) stack for event processing and IoT, built with Apache Pulsar, Apache NiFi and Apache Flink on StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Hail Hydrate! From Stream to Lake Using Open Source - Timothy Spann
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably and reliably fill your cloud data lake with the diverse sources of data you already have, and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semi-structured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
Real-Time Cloud-Native Open Source Streaming of Any Data to Apache Solr - Timothy Spann
Utilizing Apache Pulsar and Apache NiFi, we can parse any document in real time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all of the content and metadata available to Apache Solr for categorization, full-text search, optimization and combination with other datastores. We will not only stream documents, but also REST feeds, logs and IoT data. Once data is produced to Pulsar topics, it can instantly be ingested into Solr through the Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time, scalable, any-document-parsing data flow. We use Apache Tika for document processing with real-time language detection, natural language processing with Apache OpenNLP, and sentiment analysis with Stanford CoreNLP, spaCy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We will also extract and parse images; if they contain text, we can extract it with TensorFlow and Tesseract.
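The enrichment pattern described above (parse a document, detect attributes, attach metadata before indexing) can be sketched in plain Python. This is a simplified stand-in: the real flow uses Apache Tika for extraction and Apache OpenNLP / Stanford CoreNLP for NLP, while the naive word-list sentiment scorer and field names below are illustrative assumptions only.

```python
# Simplified stand-in for the document-enrichment flow described above.
# The real pipeline uses Apache Tika for text extraction and OpenNLP /
# CoreNLP for NLP; this naive word-list scorer only illustrates the
# shape of the metadata attached to each document before indexing.
import json

POSITIVE = {"good", "great", "excellent", "up"}
NEGATIVE = {"bad", "poor", "terrible", "down"}

def enrich(doc_id: str, text: str) -> dict:
    """Attach illustrative sentiment and size metadata to a document."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        "id": doc_id,
        "text": text,
        "word_count": len(words),
        "sentiment": "positive" if score > 0 else "negative" if score < 0 else "neutral",
    }

if __name__ == "__main__":
    record = enrich("doc-1", "Great results and excellent uptime")
    # An enriched record like this is what a flow would publish to a
    # topic for downstream indexing (e.g. into Apache Solr).
    print(json.dumps(record))
```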
Data Science Online Camp: Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) - Timothy Spann
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
Using FLiP with InfluxDB for Edge AI IoT at Scale (2022) - Timothy Spann
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with InfluxDB for Edge AI and IoT workloads at scale
Tim Spann
Developer Advocate
StreamNative
datainmotion.dev
Real-Time Streaming Pipelines with FLaNK - Data Con LA
Introducing the FLaNK stack, which combines Apache Flink, Apache NiFi and Apache Kafka to build fast applications for IoT, AI and rapid ingest, and to deploy them anywhere. I will walk through live demos and show how to do this yourself.
FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
We will discuss a use case: Smart Stocks with FLaNK (NiFi, Kafka, Flink SQL).
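As a sketch of what the Flink SQL half of such a use case can look like, here is a hypothetical windowed-aggregation job over a Kafka-backed table. The topic name, schema, broker address and window size below are illustrative assumptions, not the ones from the demo.

```sql
-- Hypothetical stock-quote table backed by a Kafka topic
-- (topic, fields and broker address are illustrative assumptions).
CREATE TABLE stocks (
    symbol STRING,
    price  DOUBLE,
    ts     TIMESTAMP(3),
    WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'stocks',
    'properties.bootstrap.servers' = 'localhost:9092',
    'format' = 'json'
);

-- One-minute average price per symbol
SELECT symbol,
       TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
       AVG(price) AS avg_price
FROM stocks
GROUP BY symbol, TUMBLE(ts, INTERVAL '1' MINUTE);
```

In a FLaNK flow, NiFi would feed the `stocks` topic and Flink SQL would run continuously over it, emitting one row per symbol per window.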
Bio -
Tim Spann is an avid blogger and the Big Data Zone Leader for DZone (https://dzone.com/users/297029/bunkertor.html). He runs the successful Future of Data Princeton meetup, with over 1,200 members, at http://www.meetup.com/futureofdata-princeton/. He is currently a Senior Solutions Engineer at Cloudera in the Princeton, New Jersey area. You can find all the source and material behind his talks on his GitHub and community blog:
https://github.com/tspannhw/ApacheDeepLearning201
https://community.hortonworks.com/users/9304/tspann.html
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks - Timothy Spann
Using the FLaNK stack for Edge AI and Streaming AI.
Apache Flink, Apache Kafka, Apache NiFi, Apache Kudu, DJL, Apache MXNet, Apache OpenNLP, Apache Tika, Apache Hue, Apache Hadoop, Apache HDFS
Presented virtually at AI DevWorld 2020.
ApacheCon 2021: Cracking the Nut with Apache Pulsar (FLiP)
by Timothy Spann
Wednesday 17:10 UTC: Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately, using the all-Apache FLiP stack, we can do this with ease! Streaming AI-powered analytics from the edge to the data center is now a simple use case. With MiNiFi we can ingest the data, run data checks and cleansing, run machine learning and deep learning models, and route our data in real time to Apache NiFi and Apache Pulsar for further transformation and processing. Apache Flink will provide our advanced streaming capabilities, fed in real time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in various Apache datastores, and event-driven microservices are built with Apache Pulsar Functions.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet
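The event-driven microservice step mentioned above can be sketched as a small routing function. A real deployment would subclass `Function` from the Pulsar Functions Python SDK and run inside a cluster; this stand-alone version keeps the same message-in, message-out shape using only the standard library, and the temperature threshold and topic names are hypothetical.

```python
# Stand-alone sketch of an event-driven routing step like the one a
# Pulsar Function would perform. A real deployment would subclass the
# Pulsar Functions SDK's Function class; here the routing logic is
# isolated so it runs with the standard library only. The threshold
# and topic names are hypothetical.
import json

ALERT_THRESHOLD = 80.0  # hypothetical sensor limit

def route(message: str) -> tuple:
    """Return (destination_topic, payload) for an incoming sensor event."""
    event = json.loads(message)
    topic = "alerts" if event.get("temperature", 0.0) > ALERT_THRESHOLD else "readings"
    return topic, json.dumps(event)

if __name__ == "__main__":
    topic, payload = route('{"device": "rpi-4", "temperature": 92.5}')
    print(topic)  # hot readings go to the alerts topic
```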
PortoTechHub: Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends - Timothy Spann
https://portotechhub.com/conference-2021/
Timothy Spann
Developer Advocate
StreamNative
A cloud data lake that is empty is not useful to anyone.
How can you quickly, scalably and reliably fill your cloud data lake with the diverse sources of data you already have, and new ones you never imagined you needed? Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, logs, REST, XML, images, PDFs, documents, text, semi-structured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before.
I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
TRACK RIBEIRA Fri 07:00 — 50 min
19-Nov-2021
Automation + DevOps Summit: Hail Hydrate! From Stream to Lake - Timothy Spann
2021
Apache Pulsar, Apache NiFi, Apache Flink
StreamNative
https://sessionize.com/app/speaker/session/265189
Tim Spann, Developer Advocate
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks - Timothy Spann
27-April-2021. Developer Week Europe. OPEN Stage A. 11:00
Using Apache Flink, Apache Airflow, Apache Arrow, Apache NiFi, Apache Kafka, Apache MXNet, DJL.AI, Apache Tika, Apache OpenNLP, Apache Kudu, Apache Impala, Apache HBase and more open source tools for edge AI.
ApacheCon 2021: Apache Deep Learning 302 - Timothy Spann
Tuesday 18:00 UTC
Apache Deep Learning 302
Timothy Spann
This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. It is the follow-up to the Apache Deep Learning 101, 201 and 301 talks given at ApacheCon, DataWorks Summit, Strata and other events. As part of this talk, the presenter will walk through using Apache MXNet pre-built models, integrating new open source deep learning libraries with Python and Java, and running real-time AI streams from edge devices to servers utilizing Apache NiFi and MiNiFi. This talk is geared towards data engineers interested in the basics of architecting deep learning pipelines with open source Apache tools in a big data environment. The presenter will also walk through source code examples available on GitHub and run the code live on Apache NiFi and Apache Flink clusters.
Tim Spann is a Developer Advocate at StreamNative, where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
* https://github.com/tspannhw/ApacheDeepLearning302/
* https://github.com/tspannhw/nifi-djl-processor
* https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
* https://github.com/tspannhw/nifi-djlqa-processor
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
Since April 2016, Spark-as-a-service has been available to researchers in Sweden from the Swedish ICT SICS Data Center at www.hops.site. Researchers work in an entirely UI-driven environment on a platform built with only open-source software.
Spark applications can be either deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin. Spark applications are run within a project on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In this talk we will discuss the challenges in building multi-tenant Spark streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark streaming applications, how we use Grafana and Graphite for monitoring Spark streaming applications, and how users can debug and optimize terminated Spark Streaming jobs using Dr. Elephant. We will also discuss the experiences of our users (over 120 users as of September 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
To conclude, we will also give an overview on our course ID2223 on Large Scale Learning and Deep Learning, in which 60 students designed and ran SparkML applications on the platform.
The "Apache Way" is the process by which Apache Software Foundation projects are managed. It has evolved over many years and has produced over 100 highly successful open source projects. But what is it and how does it work?
In this session Ross Gardler will describe how an Apache project is managed. He will describe how the foundation provides a technical and legal infrastructure for each project and how the Apache Way provides the governance scaffolding for individual projects. This provides the framework for Apache projects, which are then free to apply the Apache Way to ensure their project succeeds.
Having attended this session you will have a better understanding of the inner workings of both the foundation and its projects. With this understanding you will be better equipped to engage with and benefit from Apache projects.
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Timothy Spann
PASS Data Community Summit
2021
Apache NiFi, Apache Flink, Apache Pulsar
FLiP Stack
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache Pulsar
https://passdatacommunitysummit.com/
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.
DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.
Hail hydrate! from stream to lake using open sourceTimothy Spann
(VIRTUAL) Hail Hydrate! From Stream to Lake Using Open Source - Timothy J Spann, StreamNative
https://osselc21.sched.com/event/lAPi?iframe=no
A cloud data lake that is empty is not useful to anyone. How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed. Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before. I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
https://osselc21.sched.com/event/lAPi/virtual-hail-hydrate-from-stream-to-lake-using-open-source-timothy-j-spann-streamnative
Real time cloud native open source streaming of any data to apache solrTimothy Spann
Real time cloud native open source streaming of any data to apache solr
Utilizing Apache Pulsar and Apache NiFi we can parse any document in real-time at scale. We receive a lot of documents via cloud storage, email, social channels and internal document stores. We want to make all the content and metadata to Apache Solr for categorization, full text search, optimization and combination with other datastores. We will not only stream documents, but all REST feeds, logs and IoT data. Once data is produced to Pulsar topics it can instantly be ingested to Solr through Pulsar Solr Sink.
Utilizing a number of open source tools, we have created a real-time scalable any document parsing data flow. We use Apache Tika for Document Processing with real-time language detection, natural language processing with Apache OpenNLP, Sentiment Analysis with Stanford CoreNLP, Spacy and TextBlob. We will walk everyone through creating an open source flow of documents utilizing Apache NiFi as our integration engine. We can convert PDF, Excel and Word to HTML and/or text. We can also extract the text to apply sentiment analysis and NLP categorization to generate additional metadata about our documents. We also will extract and parse images that if they contain text we can extract with TensorFlow and Tesseract.
Data science online camp using the flipn stack for edge ai (flink, nifi, pu...Timothy Spann
Data science online camp using the flipn stack for edge ai (flink, nifi, pulsar)
Dec 3, 2021
Apache NiFi
Apache Flink
Apache Pulsar
Edge AI
Cloud Native Made Easy
StreamNative
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
https://adtmag.com/webcasts/2021/12/influxdata-february-10.aspx?tc=page0
FLiP Stack (Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark) with Influx DB for Edge AI and IoT workloads at scale
Tim Spann
Developer Advocate
StreamNative
datainmotion.dev
Real-time Streaming Pipelines with FLaNKData Con LA
Introducing the FLaNK stack which combines Apache Flink, Apache NiFi and Apache Kafka to build fast applications for IoT, AI, rapid ingest and deploy them anywhere. I will walk through live demos and show how to do this yourself.
FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
We will discuss a use case - Smart Stocks with FLaNK (NiFi, Kafka, Flink SQL)
Bio -
Tim Spann is an avid blogger and the Big Data Zone Leader for Dzone (https://dzone.com/users/297029/bunkertor.html). He runs the the successful Future of Data Princeton meetup with over 1200 members at http://www.meetup.com/futureofdata-princeton/. He is currently a Senior Solutions Engineer at Cloudera in the Princeton New Jersey area. You can find all the source and material behind his talks at his Github and Community blog:
https://github.com/tspannhw/ApacheDeepLearning201
https://community.hortonworks.com/users/9304/tspann.html
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
Cracking the nut, solving edge ai with apache tools and frameworks
Using the FLaNK stack for Edge AI and Streaming AI.
Apache Flink, Apache Kafka, Apache Nifi, Apache Kudu, DJL, Apache MXNet, Apache OpenNLP, Apache Tika, Apache Hue, Apache Hadoop, Apache HDFS
Presented at AI DevWorld 2020 virtual
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)Timothy Spann
ApacheCon 2021: Cracking the nut with Apache Pulsar (FLiP)
by Timothy Spann
Wednesday 17:10 UTC - Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Wednesday 17:10 UTC
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately using the all Apache FLiP Stack we can do this with ease! Streaming AI Powered Analytics From the Edge to the Data Center is now a simple use case. With MiNiFi we can ingest the data, do data checks, cleansing, run machine learning and deep learning models and route our data in real-time to Apache NiFi and Apache Pulsar for further transformations and processing. Apache Flink will provide our advanced streaming capabilities fed real-time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in various Apache datastores. Event-Driven Microservices in Apache Pulsar Functions.
Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, Apache MXNet
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsTimothy Spann
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
https://portotechhub.com/conference-2021/
Timothy Spann
Developer Advocate
StreamNative
A cloud data lake that is empty is not useful to anyone.
How can you quickly, scalably and reliably fill your cloud data lake with diverse sources of data you already have and new ones you never imagined you needed. Utilizing open source tools from Apache, the FLiP stack enables any data engineer, programmer or analyst to build reusable modules with low or no code. FLiP utilizes Apache NiFi, Apache Pulsar, Apache Flink and MiNiFi agents to load CDC, Logs, REST, XML, Images, PDFs, Documents, Text, semistructured data, unstructured data, structured data and a hundred data sources you could never dream of streaming before.
I will teach you how to fish in the deep end of the lake and return a data engineering hero. Let's hope everyone is ready to go from 0 to Petabyte hero.
TRACK RIBEIRA Fri 07:00 — 50 min
19-Nov-2021
Automation + dev ops summit hail hydrate! from stream to lakeTimothy Spann
Automation + dev ops summit hail hydrate! from stream to lake
2021
Apache Pulsar, APache NiFi, Apache Flink
StreamNative
https://sessionize.com/app/speaker/session/265189
Tim Spann, Developer Advocate
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
27-April-2021. Developer Week Europe. OPEN Stage A. 11:00
Tspann cracking the nut, solving edge ai with apache tools and frameworks
Using Apache Flink, Apache Airflow, Apache Arrow, Apache NiFi, Apache Kafka, Apache MXNet, DJL.AI, Apache Tika, Apache OpenNLP, Apache Kudu, Apache Impala, Apache HBase and more open source tools for edge AI.
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
ApacheCon 2021 Apache Deep Learning 302
Tuesday 18:00 UTC
Apache Deep Learning 302
Timothy Spann
This talk will discuss and show examples of using Apache Hadoop, Apache Kudu, Apache Flink, Apache Hive, Apache MXNet, Apache OpenNLP, Apache NiFi and Apache Spark for deep learning applications. This is the follow up to previous talks on Apache Deep Learning 101 and 201 and 301 at ApacheCon, Dataworks Summit, Strata and other events. As part of this talk, the presenter will walk through using Apache MXNet Pre-Built Models, integrating new open source Deep Learning libraries with Python and Java, as well as running real-time AI streams from edge devices to servers utilizing Apache NiFi and Apache NiFi - MiNiFi. This talk is geared towards Data Engineers interested in the basics of architecting Deep Learning pipelines with open source Apache tools in a Big Data environment. The presenter will also walk through source code examples available in github and run the code live on Apache NiFi and Apache Flink clusters.
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
* https://github.com/tspannhw/ApacheDeepLearning302/
* https://github.com/tspannhw/nifi-djl-processor
* https://github.com/tspannhw/nifi-djlsentimentanalysis-processor
* https://github.com/tspannhw/nifi-djlqa-processor
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
Since April 2016, Spark-as-a-service has been available to researchers in Sweden from the Swedish ICT SICS Data Center at www.hops.site. Researchers work in an entirely UI-driven environment on a platform built with only open-source software.
Spark applications can be either deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin. Spark applications are run within a project on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In this talk we will discuss the challenges in building multi-tenant Spark streaming applications on YARN that are metered and easy-to-debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark streaming applications, how we use Graphana and Graphite for monitoring Spark streaming applications, and how users can debug and optimize terminated Spark Streaming jobs using Dr Elephant. We will also discuss the experiences of our users (over 120 users as of Sept 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
To conclude, we will also give an overview on our course ID2223 on Large Scale Learning and Deep Learning, in which 60 students designed and ran SparkML applications on the platform.
The "Apache Way" is the process by which Apache Software Foundation projects are managed. It has evolved over many years and has produced over 100 highly successful open source projects. But what is it and how does it work?
In this session Ross Gardler will describe how an Apache project is managed. He will describe how the foundation provides a technical and legal infrastructure for each project and how the Apache Way provides the governance scaffolding for individual projects. This provides the framework within which Apache projects are then free to apply the Apache Way to ensure their project succeeds.
Having attended this session you will have a better understanding of the inner workings of both the foundation and its projects. With this understanding you will be better equipped to engage with and benefit from Apache projects.
The Hadoop Distributed File System is the foundational storage layer in typical Hadoop deployments. Performance and stability of HDFS are crucial to the correct functioning of applications at higher layers in the Hadoop stack. This session is a technical deep dive into recent enhancements committed to HDFS by the entire Apache contributor community. We describe real-world incidents that motivated these changes and how the enhancements prevent those problems from reoccurring. Attendees will leave this session with a deeper understanding of the implementation challenges in a distributed file system and identify helpful new metrics to monitor in their own clusters.
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. The core of Apache Hadoop consists of a storage part (HDFS) and a processing part (MapReduce).
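The storage/processing split described above can be illustrated with the canonical MapReduce word count. The sketch below is a toy model in plain Python, not Hadoop's actual Java API; all function names are ours.

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big storage", "big processing"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

In real Hadoop the input lines come from HDFS blocks and the map/reduce phases run in parallel across the cluster, but the data flow is the same.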
This presentation is an introduction to Apache Hadoop technology, aimed at beginners. It covers core Hadoop terminology and includes diagrams illustrating how the technology works.
Thank you.
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
The Hadoop ecosystem has improved real-time access capabilities recently, narrowing the gap with relational database technologies. However, gaps remain in the storage layer that complicate the transition to Hadoop-based architectures. In this session, the presenter will describe these gaps and discuss the tradeoffs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. The session also will cover Kudu (currently in beta), the new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark and Apache Impala (incubating), that achieves fast scans and fast random access from a single API.
This session will provide an introduction to The Apache Software Foundation – its history, organization and principles, and how Apache projects work. You will learn about The Apache Way of managing meritocracy-based and community-driven projects as practiced by all of the 100+ Apache projects, the levels of participation in Apache projects, and how you can get involved. This talk will also touch on how the Apache governance process and the permissive Apache 2.0 license help ensure longer-lived open source projects, and provide a different opportunity for engagement than some other open source communities and license models.
Shane Curcuru was elected as a Member of the ASF in 2002, and has been volunteering on public relations, conferences, brand management, and various other areas at Apache ever since. He also serves as a Director.
Improving Your Apache Project's Image And BrandShane Curcuru
Want to find new ways to draw in contributors to your project? Looking to attract ideas and attention from some of the corporate vendors, but don't want to lose your independence? Don't know how to approach your employer's plans to launch BigCo's SuperLucene product?
Learn how to improve your project's brand, drawing in newcomers as productive contributors, and defending your brand from aggressive vendors. Dealing fairly and firmly with companies mis-using your good reputation seems hard, but it doesn't need to be.
Learn which uses of Apache brands are OK, versus the infringing uses hungry vendors attempt - and how to stop them. The strong independent reputation of your project and Apache overall relies on every PMC policing its own brand effectively and fairly. The Trademarks Committee is here to help!
Hadoop is emerging as the preferred solution for big data analytics across unstructured data. Using real world examples learn how to achieve a competitive advantage by finding effective ways of analyzing new sources of unstructured and machine-generated data.
Docker moves very fast, with an edge channel released every month and a stable release every 3 months. Patrick will talk about how Docker introduced Docker EE and a certification program for containers and plugins with Docker CE and EE 17.03 (from March), the announcements from DockerCon (April), and the many new features planned for Docker CE 17.05 in May.
This talk will be about what's new in Docker and what's next on the roadmap
The new buzzword in the world of Agile is "DevOps". So what exactly is DevOps, and why do we need it? When development got married to deployment (sys-admin/operations), what was born is a new advanced species known to us today as "DevOps".
Docker Orchestration: Welcome to the Jungle! Devoxx & Docker Meetup Tour Nov ...Patrick Chanezon
In two years, Docker hit the sweet spot for devs and ops, with tools for building, shipping, and running distributed apps architected as a set of collaborating microservices packaged as Linux containers. One area of the Docker ecosystem that saw a lot of innovation in the past year is container orchestration systems. This session compares and contrasts various Docker orchestration systems (Swarm, Machine, and Compose), the batteries included with Docker itself, Mesos, Kubernetes, CoreOS/Fleet, Deis, Cloud Foundry, and Tutum. It includes a demo of how to deploy a Java 8 app with MongoDB on several of these systems. The goal of the session is to give you a framework to help evaluate how these systems can meet your particular requirements.
Demo code at https://github.com/chanezon/docker-tips/blob/master/orchestration-networking/README.md
Some technologies are tools of the DevOps trade. Chef, Jenkins, Vagrant and Zookeeper are all tools that can be used for huge leverage and impact by the right people. Rarely, however, is there a technology that *enables* the practice of DevOps. The advent of the cloud and disposable infrastructure is one example. Docker is in this second, more rarified class.
A Primer on Kubernetes and Google Container EngineRightScale
Docker and other container technologies offer the promise of improved productivity and portability. Kubernetes is one of the leading cluster management systems for Docker and powers the Google Container Engine managed service.
- A review of key Linux container concepts
- The role of Kubernetes in deploying Docker-based applications
- Primer on Google Container Engine
- How RightScale works with containers and clusters
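As a rough illustration of how Kubernetes deploys Docker-based applications, the snippet below builds a Deployment manifest as a plain Python dict - the same structure `kubectl apply -f` consumes as YAML. The app name, image, and replica count are made-up examples, not from the talk.

```python
# A Kubernetes Deployment: "keep N replicas of this container image running".
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        # Kubernetes continuously reconciles toward this replica count.
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [{
                    "name": "web",
                    "image": "nginx:1.25",  # any Docker image works here
                    "ports": [{"containerPort": 80}],
                }],
            },
        },
    },
}
print(deployment["spec"]["replicas"])  # 3
```

The declarative shape is the point: you state the desired end state, and the cluster's controllers do the deploying, restarting, and rescheduling.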
Versioning an API can be a somewhat daunting task for the uninitiated. Even worse, some of the most common approaches are less than ideal. In this session I discuss the struggles and outcomes of my first foray into versioning and deploying. I will show how a combination of immutable Docker containers, nginx, and a few other friendly tools made it possible to create a fully automated versioning and deployment system, triggered at the push of a button.
OCCIware Project at EclipseCon France 2016, by Marc Dutoo, Open WideOCCIware
Hear hear, devs and ops alike - ever been bitten by the fragmentation of the Cloud space at deployment time, by AWS vs. Azure, OpenShift vs. Heroku? In a word, ever dreamt of configuring your Cloud application along with both its VMs and database at once? Well, the extensible Open Cloud Computing Interface (OCCI) REST API (see http://occi-wg.org/) allows just that, by addressing the whole XaaS spectrum.
And now, OCCI is getting powerboosted by Eclipse Modeling and formal foundations. Enter Cloud Designer and other outputs of the OCCIware project (see http://www.occiware.org): multiple visual representations, one per Cloud layer and technology; XaaS Cloud extension model validation, documentation and ops scripting generation; simulation and decision-making comparison; connectors that bring those models to life by getting their status from common Cloud services; and runtime middleware that is deployed, monitored, and administered. All while tackling the very interesting challenge of modeling a meta API in EMF's metamodel, staying true to EMF, Eclipse tools, and the OCCI standard.
Featuring Eclipse Sirius, Acceleo generators, and EMF at runtime. Coming soon to a new Eclipse Foundation project near you, if you'd so like.
This talk includes a demonstration of the Docker connector and of how to use Cloud Designer to configure a simple Cloud application's deployment on the Roboconf PaaS system and OpenStack infrastructure.
Some tools such as Chef and Jenkins are used by engineers in ops to great effect. Rarely, though, does a technology bring a paradigm to the masses. Docker, like cloud virtualization, is of this rarer breed.
The challenge of application distribution - Introduction to Docker (2014 dec ...Sébastien Portebois
Live recording with the demos: https://www.youtube.com/watch?v=0XRcmJEiZOM
Contents
- The application distribution challenge
- The current solutions
- Introduction to Docker, Containers, and the Matrix from Hell
- Why people care: Separation of Concerns
- Technical Discussion
- Ecosystem, momentum
- How to build Docker images
- How to make containers talk to each other, how to handle data persistence
- Demo 1: isolation
- Demo 2: real case - installing Go Math! Academy, tail -f containers, unit tests
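The "how to build Docker images" item in the contents above normally starts from a Dockerfile. A minimal hypothetical one might look like this - the base image and file names are placeholders, not from the talk:

```dockerfile
# Start from an official base image.
FROM python:3.12-slim
# Copy the application into the image.
WORKDIR /app
COPY app.py .
# The command the container runs when started.
CMD ["python", "app.py"]
```

Building it with `docker build -t myapp .` produces an image that `docker run myapp` can start as an isolated container in milliseconds.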
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPDaniel Zivkovic
Enterprises traditionally think of App Platforms as PCF (Pivotal Cloud Foundry) or Red Hat OpenShift. In reality, public Clouds have evolved into Application Platforms - especially when using Managed Services & Serverless.
• If you are an IT Executive under increased pressure to cut costs, see how better Technology Stack choices (not layoffs or pay cuts) can reduce IT costs and increase business agility, while avoiding vendor lock-in.
• If you are a Developer lost in the sea of the Cloud Computing choices, watch Ray Tsang (Java Champion from GCP) live-code, and you will walk away Cloud-Native :)
See how to stop cannibalization of IT by deploying your good ol' Java Spring Boot Apps directly to Google Cloud Platform - no Servers/PCF/OpenShift/Kubernetes to manage, nor to limit your creativity: https://youtu.be/2B0wWagE0dc
P.S. For more forward-looking Software Development topics, join ServerlessToronto.org Meetups, and if you have any questions about the Architectural Patterns discussed, reach out to me to chat.
Docker containers & the Future of Drupal testing Ricardo Amaro
Story of an investigation to improve cloud
The sad VirtualMachine story
Containers and non-containers
DEMO - Drupal Docker
Drupal Testbots story in a Glance
Docker as a testing automation factor
DEMO - Docker Testbot
Integration path
Reuse, Reduce, Recycle in Serverless WorldDmitri Zimine
Slides for the talk at @ServerlessConf San Francisco 2018
Reuse is fundamental to any software development. Serverless development, however, still misses a coherent end-to-end reusability story. AWS Application Repository, Serverless Components from @goserverless, and LogicApps' Connectors are all steps in the right direction. But we are still far away from an npm/pip-install developer's paradise. What is missing, and what is the path forward?
In this talk, I reflect on the current state of reusability in Serverless, share relevant learnings from establishing reusability in DevOps tools, and show working code: a proof of concept for an open-source catalog of reusable Serverless functions. How exactly? We recycled StackStorm Exchange - a mature open-source action catalog - with a plugin to the serverless framework. Come and see the details, and bring your ideas to discuss how we can promote reusability in Serverless.
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
I presented to the Georgia Southern Computer Science ACM group. Rather than one topic for 90 minutes, I decided to do an UnConference. I presented them a list of 8-9 topics, let them vote on what to talk about, then repeated.
Each presentation was ~8 minutes (except Career) and was by no means an attempt to explain the full concept or technology - only to wake up their interest.
Common Sense for the C-Suite: Relevance is the New ReputationW2O Group
In today’s social/digital reality, Relevance has become the new reputation. This means that an organization must connect consistently and authentically on multiple levels with its key audiences, in areas that are meaningful to the business (its core purpose and strategic direction) as well as areas that are meaningful to its audiences. What makes this different is the speed at which relevance forms and dissipates, and the agility necessary to harness it for sustained growth and success.
In an age where information is ubiquitous and people move from one subject to another in the blink of an eye, if your brand, product, service, or company isn’t on their radar, you don’t exist.
It’s all about connection.
In this issue of Common Sense for the C-Suite, we explore how organizations can drive growth and remain relevant in a crowded, distracted landscape.
Understanding Physician/ Patient Conversations OnlineW2O Group
MDigitalLife's Managing Director & Founder, Greg Matthews led a webinar discussing the evolution of online interactions between patients and Healthcare Providers (HCPs) and what healthcare companies need to know to stay ahead of the curve.
Innovations in Healthcare - US Chamber of CommerceW2O Group
W2O Group's president and author of Storytizing, Bob Pearson spoke at the US Chamber of Commerce's #healthforward event. He shared key insights on innovations in healthcare.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Securing your Kubernetes cluster: a step-by-step guide to success!KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
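As one concrete flavor of the hardening steps such a guide walks through, the fragment below shows a container security context with root access and privilege escalation disabled. It is a generic sketch expressed as a Python dict mirroring the pod-spec YAML, not a step taken from this specific talk.

```python
# Pod-spec securityContext fragment: lock the container down by default.
security_context = {
    "runAsNonRoot": True,               # refuse to start if the image runs as root
    "allowPrivilegeEscalation": False,  # block setuid-style privilege gains
    "readOnlyRootFilesystem": True,     # container cannot modify its own filesystem
    "capabilities": {"drop": ["ALL"]},  # drop all Linux capabilities
}
print(security_context["runAsNonRoot"])  # True
```

Applied to every workload, defaults like these remove a large share of the easy container-escape paths before any cluster-level controls are even considered.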
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I wondered, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and take you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
2. The Apache Software Foundation
“The Apache Software Foundation provides support for
the Apache community of open-source software projects,
which provide software products for the public good.”
3. Apache Projects
Over 300 Open Source Projects
Over 25 different programming languages
Over 20 different categories from big data to e-mail
4. The Apache Incubator
“The Incubator project is the entry path into The Apache
Software Foundation for projects and codebases wishing
to become part of the Foundation’s efforts.”
5. Apache TinkerPop (incubating)
• A graph computing framework
– Defines and manages the Gremlin graph query language
– Provides abstracted server runtime for most major graph databases: Neo4j, TinkerGraph, Sail Graph, Titan
• Working on their first official Apache release
• Last pre-Apache release: 3.0.0M3
6. Apache NiFi (incubating)
• Distributed data processing system
– Guaranteed Delivery
– Data Buffering w/ Throttling
– Prioritized Queuing
– Configurable QoS optimizations
– Data Provenance
– Visual C2
– Security
• Latest Release 0.0.1-incubating
7. Apache Kylin (incubating)
• Distributed Analytics Engine
– Open sourced from eBay
– Provides a SQL interface
– Supports existing BI tools
– Sub-second response across large data sets
– Leverages Hadoop as a data store
• Working on their first official Apache release
• Last pre-Apache release: 0.6.6
8. Apache Zeppelin (incubating)
• Web-based Analytics Visualization System
– Integrates Apache Spark for data processing
– Multi-language backend and query engine
– Rich plugin and extension capability
– Native visualizations with embedding capability
– Organized into a sharable notebook for collaboration
• Working on their first official Apache release
9. Apache HTrace (incubating)
• Provides a mechanism for easily tracing processes in distributed systems
• Simple integration model by wrapping threads
• Modular tracing receivers allow for custom back-ends
• Supports Zipkin natively
• Latest release: 3.1.0-incubating
10. Apache Ignite (incubating)
• In-memory data fabric
– Distributed processing
– Supports multiple use cases:
• In-memory DBMS operations with persistent backend
• High performance, massively-parallel computing
• Distributed messaging
• Advanced clustering capabilities
• Distributed data structures
– Optimized for speed in all cases
• Latest release: 1.0.0-RC1
11. Groovy
• Popular dynamic programming language for the JVM
– Interoperability with any JVM libraries
– More permissive programming model than Java
– Allows for a wider set of use cases than Java:
• Dynamic scripting
• Programming console
• Proposed Apache Incubator project
• Latest pre-Apache release: 2.4
18. Who am I?
Director of Evangelism - StackEngine
Jesus, Jobs, Gates - Pick a religion
19. A Brief History of Virtualization
Is history repeating itself?
22. History from an Engineer's Perspective
First there were containers (1982, 1998, 2005), but they were hard.
Then there was the cloud (2009). It was easy.
Today there is Docker, and containers are ready for mere mortals.
26. What is a Container
A Virtual Machine (Cloud) is a full copy of an entire computer running in software via a hypervisor.
A Container is a slice of a computer with no hypervisor overhead.
Executive summary: the lack of extra stuff (no hypervisor layer) means big efficiency gains.
27. But Wait! Why do I care?
Typical Rockstar CTO: starts with, “Why?”
31. Why you care
In the cloud, a physical machine might practically be split into 16 VMs. With containers, the number is in the 100s for the same machine. (Density)
In the cloud, it can take minutes to get a new VM. Containers start in milliseconds. (Agility)
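The density claim can be turned into back-of-envelope arithmetic. The numbers below (16 VMs vs. roughly 200 containers per machine, 1,600 workloads) are illustrative guesses extrapolated from the slide, not benchmarks:

```python
workloads = 1600               # services you need to run (assumed)

vms_per_machine = 16           # rough cloud VM density, per the slide
containers_per_machine = 200   # "in the 100s", per the slide

# Ceiling division: you can't rent a fraction of a machine.
machines_with_vms = -(-workloads // vms_per_machine)
machines_with_containers = -(-workloads // containers_per_machine)

print(machines_with_vms)         # 100
print(machines_with_containers)  # 8
```

Since you pay for machines rather than containers, an order-of-magnitude density gain translates directly into an order-of-magnitude smaller fleet.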
35. Cost Reduction - Density
Today you have 100s or 1000s of machines. Tomorrow you have 10,000s of containers (and 10 to 100 machines).
You pay for machines, not containers.
39. Cost Reduction - Better Geek Efficiency
Geeks are expensive. Containerized development environments save developer time.
At W2O, using VMs, we recouped up to 8 hours per week (measured) of geek time: ~$250,000 per year. Containers can be better!
We did not measure the recovered opportunity costs (shame).
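The ~$250,000-per-year figure is easy to sanity-check under stated assumptions. The team size and loaded hourly cost below are our guesses, not numbers from the talk, and we read the slide as 8 hours saved per developer per week:

```python
hours_saved_per_dev_per_week = 8   # the slide's measured figure
working_weeks_per_year = 48        # assumption: allows for vacation/holidays
devs = 10                          # assumed team size
loaded_hourly_cost = 65.0          # assumed fully loaded cost in $/hour

annual_savings = (hours_saved_per_dev_per_week
                  * working_weeks_per_year
                  * devs
                  * loaded_hourly_cost)
print(round(annual_savings))  # 249600
```

With those assumed inputs the arithmetic lands right at the slide's ~$250k; swap in your own team size and rates to model your environment.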
40. OK … I like spending less money. But will it help me grow revenue?
49. Revenue Growth - Innovation
These containerized development environments are disposable, and geeks want to upgrade for the latest features.
Today the majority of developers have bespoke development environments. Containerized environments instead are easy to make, easy to throw away, easy to try something new with, and easy to roll back from if you don't like the result.
58. Amazon Lambda
Containers mean truly on-demand compute. In the same way the cloud abstracted away all the details of a machine, Lambda does the same for compute.
Tsunami #2. Don't be caught.
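Lambda's "compute without machines" model boils down to handing the platform a single function. A minimal Python handler might look like the sketch below; the event fields are invented for illustration:

```python
def handler(event, context):
    # Lambda invokes this function on demand; there is no server to manage.
    # `event` carries the invocation payload, `context` carries runtime metadata.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Locally we can call it like any function; in AWS, triggers invoke it for us.
print(handler({"name": "Docker"}, None)["body"])  # Hello, Docker!
```

You pay only for the milliseconds the function actually runs, which is the "truly on-demand compute" point the slide is making.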
63. StackEngine
We provide a way to manage containers in a production environment. Cattle not Pets -> Ants not Cattle.
Want some help understanding this potential? Look us up: http://stackengine.com
64. Tech Colophon
Containers vs. VMs at Pantheon - use case - goo.gl/u3ztxj and goo.gl/gRkKGN
Disposable Development Environments - Vagrant - goo.gl/whsRV3
Docker 101 - tech tutorial - goo.gl/cuXUU6
Amazon Lambda - announcement - goo.gl/sb1rLh
65. Reading Colophon
Bullets not Cannonballs, Creative Empiricism - Great by Choice - Jim Collins
Start with Why - Simon Sinek
Measured Learning - The Lean Startup - Eric Ries
Features = Revenue - The Goal, It's Not Luck - Eliyahu Goldratt
Change or Die (goo.gl/Y8cMNT) - The Three Horsemen of the Digital Apocalypse Considered - Michael Cote
84. Changing the Game
Defenders need to find hundreds of vulnerabilities and fix them all, while the attackers only need to find one.
Attackers need to complete a series of operations without being detected, while the defenders only need to detect them once.
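The defender's asymmetry can be made concrete with a back-of-envelope probability sketch. The numbers below (300 vulnerabilities, a 99% per-vulnerability patch rate) are hypothetical illustrations, not figures from the talk:

```python
# Back-of-envelope illustration of the attacker/defender asymmetry.
# Numbers are hypothetical, chosen only to show the shape of the problem.

def chance_fully_patched(n_vulns: int, p_patch: float) -> float:
    """Probability that every one of n_vulns is patched, assuming each
    is patched independently with probability p_patch."""
    return p_patch ** n_vulns

# Even at a 99% per-vulnerability patch rate, 300 vulnerabilities leave
# the defender fully covered only about 5% of the time (~0.049).
print(round(chance_fully_patched(300, 0.99), 3))
```

The attacker only needs the complement of that number: one unpatched hole, anywhere, is enough.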
86. • tk@lancope.com
• Twitter: @tkeanini
• Personal blog: tkonsecurity.com
• Professional blog: www.lancope.com/blog
• LinkedIn: www.linkedin.com/in/tkkeaninipub/
• Goodreads: TK Keanini
Because I’m noisy
90. Cyber Crime Market (CNET, 2013)
Page 90 | Click Security Confidential
Criminal Action       | Estimated Costs
Global Cyber Activity | $300 billion – $1 trillion
Drug Trafficking      | $600 billion
Piracy                | $1 billion – $16 billion
Globally, we spend $70 billion per year to stop the bad guys; the bad guys are making $300+ billion a year.
91. Why Security Systems are Failing
Attack Surfaces
Adversaries
Enterprise Defenses
92. Expanding Attack Surfaces
Humans: 78% of IT professionals consider employees the biggest security threat.
Software: 508 is the average number of applications in an enterprise.
Networks: 5.2 is the average number of devices per knowledge worker connecting to a network.
Sources: Citrix, 2013; Forbes, 2014; Ponemon Institute, 2015
93. Evolution of Adversaries
Malware Explosion: 383,000 new malware variants every day.
# Skilled Hackers: 400,000 hackers estimated in China alone, and growing daily.
Black Market: $1,300 is the average attacker payment for a banking Trojan.
Sources: AV-test.org, 2015; US Intelligence, infosecisland.com; darkreading.com, 2012
94. Overwhelmed Defenses
Point Products Insufficient: 8% of incidents are detected by endpoint, firewall & network solutions.
Workloads Increasing: 64% of US companies face 10,000+ alerts per month.
Budgets Underfunded: 1–3 is the average number of headcount devoted to IT security.
Sources: Verizon DBIR, 2013; FireEye, 2015
95. Impact on Your Enterprise
Escalating Costs: $8.9m is the cost of the average enterprise breach.
Slow to Discover: 173 is the average number of days from infiltration to discovery.
Long to Resolve: 32 is the average number of days to resolve & lock down an attack.
Sources: Verizon 2012 DBIR; Ponemon Institute, 2013; darkreading.com, 2012
97. Dave & Buster’s Restaurant
98. D&B – Slow and Methodical
Event | Date | Time | Kill Chain | Description of Actor’s Activities
Dave & Busters | Feb. 1 | 0 | 1 | Estonian and Ukrainian intruders scan/evaluate restaurant internet-facing connections
Dave & Busters | Mar. 1 | 28 | 2 | Estonian and Ukrainian intruders breached network security controls at a restaurant
Dave & Busters | Mar. 2 | 1 | 4 | Intruders breach a poorly secured retail system with internal network access, explore network
Dave & Busters | Mar. 15 | 13 | 3 | Yastremskiy and Suvorov contract Albert Gonzalez to customize sniffer for DB network
Dave & Busters | Apr. 1 | 17 | 4 | Intruders used network access to install packet sniffer designed to capture track 2 credit card data
Dave & Busters | Apr. 15 | 14 | 5 | The initial tests of the sniffer failed by crashing or failing to record data
Dave & Busters | Apr. 15 | 0 | 5 | Revised packet sniffer often failed to capture the intended information
Dave & Busters | Sept. 1 | 139 | 5 | Over 6 months intruders improved, tested and monitored their tools
Dave & Busters | Sept. 22 | 1 | 6 | Intruders establish reliable and persistent control of the restaurant environments
Dave & Busters | Sept. 3 | 1 | 6 | Intruders prepare for breaching the corporate network in Dallas
Dave & Busters | Sept. 15 | 12 | 5 | Corporate servers breached, and admin passwords allow access to network devices
Dave & Busters | Sept. 16 | 1 | 7 | Intruders install the refined tools at 11 locations without detection
Dave & Busters | Sept. 17 | 1 | 8 | Packet capture tools return over 130,000 credit cards' full track data
Dave & Busters | Sept. 30 | 13 | 10 | The intruders were eventually blocked and identified by financial records
99. New Model for Security
The bad guys are going to get in – how do you find them before they do damage?
100. Transformational Changes
Current Security Practices
• Blocking & preventing attacks will work
• Big data produces better results
• Monitoring events will find bad actors
• Canned rules in SIEMs are enough
Future Solutions Focus
• Detection, profiling & lockdown
• Adversary monitoring & investigation
• Actor kill-chain visualization & analysis
• User-created analytics & sharing
Focus on what they do, not what they use…
124. Metaio
Over 12 years of augmented reality and computer vision experience
Key Enabling Technologies
• AREngine
• Thermal Touch
Best in Class Software for Professionals
• Metaio Suite
• Metaio Creator
• Metaio SDK
• Metaio Cloud
Optimized for next gen devices
Largest Product AR Distribution
• Every I
• V
• Audi AR V
137. It all starts with your PURPOSE
• Your purpose – your reason for existing – is your key message… and it is not “to make a profit”.
• Why are you really in business?
• Your audience is made up of those who connect with that purpose, and whose values align with yours. Know your brand values!
138. From “Customer” to “Advocate”
• You don’t own your brand.
• Your customers’ thoughts and emotions about your brand are more important.
• Inspire and collaborate with your fans in co-creating brand stories and content, making them participants and leading stars!
• Empower them via your site and social media to create & foster a sense of community and belonging, both online and offline.
141. Weaving Physical & Digital
• Create events and moments = experiences that connect your fans with the rest of the community.
• Use your brand as a platform for this connectedness and foster the sense of belonging.
• Examples:
– Brand moments and milestones
– Instameets
– Store openings or events with influencers
– Music festivals
142. Diverse touchpoints
[Diagram: a map of touchpoints spanning Offline, Online and Mobile]
TOMS Stores, Retail Partners, Community Outreach, Campus Programs, Giving Trips, Experiential, One Day Without Shoes, Social Media, Responsive Website, Mobile Site, Mobile App, Ticket to Give, Sweepstakes, Customer Service
144. Beyond Touchpoints / Places… Deliver Memorable Moments
INSPIRE and MOTIVATE to TAKE ACTION:
• CREATE a dialog – in physical and digital contexts.
• INVITE fans to be active participants.
• CO-CREATE and deliver value.
• PROVIDE a sense of fun and entertainment.
To understand what is next for Apache, you need to understand WHAT Apache is.
It is really just an organization to support:
* Open source software development
* Legal and IP management independent of corporate entities and governments
* A global community
It is a volunteer organization, with only four paid contractors to support IT and executive admin.
Apache is a collection of major open source projects, covering almost every language, framework and programming style.
People have a perception of Apache that isn’t rooted in reality: projects are king, and governance is just oversight.
Major categories:
Data processing
Data storage
Data analytics
Security
Development frameworks
Languages
Why are you really in business?
TOMS is in business to help improve lives through business.
Apple is in business to “think differently” and challenge the status quo.
The idea is that the hashtag is aligned with our purpose and brand ethos, and users are empowered as advocates of the brand to share their experiences with TOMS.
This is a great example of how to tie “social” media and your social following back to your website. Leading with