This document provides an overview of Apache Kafka and Storm for distributed stream processing. It describes Kafka's architecture as a distributed commit log and covers topics, producers, consumers and clients. For Storm, it outlines the architecture including spouts, bolts and groupings. The document then provides guidance on coding topologies with spouts and bolts in Java and deploying locally or on a cluster.
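As a rough illustration of how spouts, bolts and groupings fit together, here is a minimal in-memory sketch in plain Python. It does not use the Storm API: `sentence_spout`, `split_bolt`, `CountBolt`, `fields_grouping` and `run_topology` are hypothetical names chosen for this example, and a real topology would run these components as distributed tasks rather than in one loop.

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of a stream. Here it emits a fixed list of tuples."""
    for sentence in ["the cow jumped", "the moon", "the cow"]:
        yield (sentence,)


def split_bolt(tup):
    """Bolt: consumes one tuple and emits zero or more new tuples."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)


class CountBolt:
    """Stateful bolt task that keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1


def fields_grouping(tup, num_tasks):
    """Route tuples with the same field value to the same task.

    A sum of code points is used instead of hash() so that the routing
    is identical on every run (str hashes are randomized per process).
    """
    return sum(ord(c) for c in tup[0]) % num_tasks


def run_topology(num_count_tasks=2):
    """Wire spout -> split bolt -> count bolt tasks and drain the spout."""
    tasks = [CountBolt() for _ in range(num_count_tasks)]
    for sentence_tuple in sentence_spout():
        for word_tuple in split_bolt(sentence_tuple):
            task_index = fields_grouping(word_tuple, num_count_tasks)
            tasks[task_index].execute(word_tuple)
    return tasks


tasks = run_topology(num_count_tasks=2)
total = sum((task.counts for task in tasks), Counter())
```

Because the grouping is keyed on the word, each word is counted by exactly one task, so the per-task counts can simply be added to get the overall totals.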
Storm-on-YARN: Convergence of Low-Latency and Big-Data - DataWorks Summit
Hadoop plays a central role for Yahoo! in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. To enable this convergence, we have developed Storm-on-YARN, which lets Storm streaming/microbatch applications and Hadoop batch applications be hosted in a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. In Storm-on-YARN, YARN is used to launch the Storm application master (Nimbus) and to enable Nimbus to request resources for Storm workers (Supervisors). The YARN resource manager and the Storm scheduler work together to support multi-tenancy and high availability. HDFS enables Storm to achieve higher availability of Nimbus itself. We are introducing Hadoop-style security into Storm through JAAS authentication (Kerberos and Digest). Storm servers (Nimbus and DRPC) will be configured with authorization plugins for access control and audit. The security context enables Storm applications to access authorized datasets only (including those created by Hadoop applications). Yahoo! is making our contribution on Storm and YARN available as open source. We will work with industry partners to foster the convergence of low-latency processing and big data.
Slides from talk given at the NYC Cassandra Meetup. Discussing how Storm works and how it integrates well with Apache Cassandra.
There is also a segue into an example project that uses Storm and Cassandra to implement a scalable reactive web crawler.
http://github.com/tjake/stormscraper
Apache Storm 0.9 basic training - Verisign - Michael Noll
Apache Storm 0.9 basic training (130 slides) covering:
1. Introducing Storm: history, Storm adoption in the industry, why Storm
2. Storm core concepts: topology, data model, spouts and bolts, groupings, parallelism
3. Operating Storm: architecture, hardware specs, deploying, monitoring
4. Developing Storm apps: Hello World, creating a bolt, creating a topology, running a topology, integrating Storm and Kafka, testing, data serialization in Storm, example apps, performance and scalability tuning
5. Playing with Storm using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/09/15/apache-storm-training-deck-and-tutorial/
Many thanks to the Twitter Engineering team (the creators of Storm) and the Apache Storm open source community!
Kafka and Storm - event processing in realtime - Guido Schmutz
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. It is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Storm is a distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. This session presents the main concepts of Kafka and Storm and then shows how a simple stream processing application is implemented using these two technologies.
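The commit-log idea can be sketched in a few lines of plain Python. This is an in-memory toy under stated assumptions, not the Kafka client API: `CommitLog` and `Consumer` are hypothetical names, there is a single partition, and nothing is persisted or replicated.

```python
class CommitLog:
    """Append-only sequence of records; each record is addressed by its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own position, so reads do not remove data."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

fast = Consumer(log)
slow = Consumer(log)
fast_batch = fast.poll()               # reads all three records
slow_batch = slow.poll(max_records=1)  # reads only the first record
```

Because the log keeps the records and each consumer only advances its own offset, several subscribers can read the same data at different speeds, which is the publish-subscribe behaviour described above.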
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013 - Sonal Raj
This talk briefly outlines the Storm framework and the Neo4J graph database, and how to use them together to perform computations on complex graphs in Python using the Petrel and Py2neo packages. This talk was given at PyCon India 2013.
Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
In this track we will introduce the Storm framework, explain some design concepts and considerations, and show some real-world examples of how to use it to process large amounts of data in real time in a distributed environment. We will describe how this solution can be scaled easily as more data needs to be processed.
We will explain all you need to know to get started with Storm and some tips on how to get your Spouts, Bolts and Topologies up and running in the cloud.
Learning Stream Processing with Apache Storm - Eugene Dvorkin
Over the last couple of years, Apache Storm has become a de facto standard for developing real-time analytics and complex event processing applications. Storm lets you tackle real-time data processing challenges the same way Hadoop enables batch processing of Big Data. Storm enables companies to have "Fast Data" alongside "Big Data". Some use cases where Storm can be used are Fraud Detection, Operation Intelligence, Machine Learning, ETL, Analytics, etc.
In this meetup, Eugene Dvorkin, Architect @WebMD and NYC Storm User Group organizer, will teach Apache Storm and Stream Processing fundamentals. While this meeting is geared toward new Storm users, experienced users may find something interesting as well.
The following topics will be covered:
• Why use Apache Storm?
• Common use cases
• Storm Architecture - components, concepts, topology
• Building simple Storm topology with Java and Groovy
• Trident and micro-batch processing
• Fault tolerance and guaranteed message delivery
• Running and monitoring Storm in production
• Kafka
• Storm at WebMD
• Resources
PHP Backends for Real-Time User Interaction using Apache Storm - DECK36
Engaging users in real-time is the topic of our times. Whether it’s a game, a shop, or a content-network, the aim remains the same: providing a personalized experience. In this workshop we will look under the hood of Apache Storm and lay a firm foundation on how to use it with PHP. That way, you can leverage your existing codebase and PHP expertise for an entirely new world: real-time analytics and business logic working on message streams. During the course of the workshop, we will introduce Apache Storm and take a look at all of its components. We will then skyrocket the applicability of Storm by showing you how to implement its components with PHP. All exercises will be conducted using an example project, the infamous and most exhilarating lolcat kitten game ever conceived: Plan 9 From Outer Kitten. In order to follow the hands-on exercises, you will need a development VM prepared by us with all relevant system components and our project repositories. To make the workshop experience as smooth as possible for all participants, please bring a prepared computer to the workshop, as there will be no time to deal with installation and setup issues. Please download all prerequisites and install them as described: VM, Plan 9 webapp, Plan 9 storm backend (Tutorial: https://github.com/DECK36/plan9_workshop_tutorial ).
Talk held at the FrOSCon 2013 on 24.08.2013 in Sankt Augustin, Germany
Agenda:
- Why Twitter Storm?
- What is Twitter Storm?
- What to do with Twitter Storm?
Real time big data analytics with Storm by Ron Bodkin of Think Big Analytics - Data Con LA
This talk provides an overview of the open source Storm system for processing Big Data in realtime. The talk starts with an overview of the technology, including key components: Nimbus, Zookeeper, Topology, Tuple, Trident. It looks at integration with Hadoop through YARN and recent improvements. The presentation then dives into the complex Big Data architecture into which Storm can be integrated. The result is a compelling stack of technologies including integrated Hadoop clusters, MPP, and NoSQL databases.
After this, we look at example use cases for Storm: real-time advertising statistics, updating a Machine Learned model for content popularity predictions, and financial compliance monitoring.
Real-Time Big Data at In-Memory Speed, Using Storm - Nati Shalom
Storm, a popular framework from Twitter, is used for real-time event processing. The challenge presented is how to manage the state of your real-time data processing at all times. In addition, you need Storm to integrate with your batch processing system (such as Hadoop) in a consistent manner.
This session will demonstrate how to integrate Storm with an in-memory database/grid, and explore various strategies for seamlessly integrating the data grid with Hadoop and Cassandra. By achieving smooth integration with consistent management, you will be able to manage all the tiers of your Big Data stack in a consistent and effective way.
- See more at: http://nosql2013.dataversity.net/sessionPop.cfm?confid=74&proposalid=5526#sthash.FWIdqRHh.dpuf
A technical explanation of Hadoop's infrastructure and the software that makes it work. This is cursory, but should help explain the inner workings of Hadoop.
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re... - Amazon Web Services
With AWS, companies now have the ability to develop and run their applications with speed and flexibility like never before. Working with an infrastructure that can be 100 percent API driven enables businesses to use lean methodologies and realize these benefits. This in turn leads to greater success for those who make use of these practices. In this session, we talk about some key concepts and design patterns for continuous deployment and continuous integration, two elements of lean development of applications and infrastructures.
Apache Storm vs. Spark Streaming - two stream processing platforms compared - Guido Schmutz
Storm and Spark Streaming are both open-source frameworks supporting distributed stream processing. Storm has been developed by Twitter and is a free and open source distributed real-time computation system that can be used with any programming language. It is written primarily in Clojure and supports Java by default. Spark is a fast and general engine for large-scale data processing and has been designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala. This presentation shows how you can implement stream processing solutions with the two frameworks, discusses how they compare and highlights the differences and similarities.
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on YARN for Yahoo (although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on YARN for Yahoo.
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry - confluent
Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. Originally based on years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel), it has now been donated to the OSS community at large.
Come learn about the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task. Beam provides a model that allows developers to focus on the four important questions that must be answered by any stream processing pipeline:
What results are being calculated?
Where in event time are they calculated?
When in processing time are they materialized?
How do refinements of results relate?
Furthermore, by cleanly separating these questions from runtime characteristics, Beam programs become portable across multiple runtime environments, both proprietary (e.g., Google Cloud Dataflow) and open-source (e.g., Flink, Spark, et al).
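To make the first two questions concrete, here is a small plain-Python sketch. It is not the Beam API: `fixed_window_sums` is a hypothetical helper, timestamps are plain integers in seconds, and the "when" and "how" questions (triggers and refinements) are left out entirely. It computes a sum (the "what") per fixed event-time window (the "where"), regardless of the order in which events arrive.

```python
from collections import defaultdict


def fixed_window_sums(events, window_size):
    """Sum values per fixed event-time window.

    events: iterable of (event_time, value) pairs, in arrival order.
    Returns a dict mapping each window's start time to the sum of its values.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        # The window is chosen from the event's own timestamp, not from
        # the moment it is processed, so late arrivals land correctly.
        window_start = (event_time // window_size) * window_size
        sums[window_start] += value
    return dict(sums)


# Events arrive out of order: the one stamped 59 shows up after the one stamped 61.
events = [(12, 5), (3, 1), (61, 7), (59, 2), (8, 4)]
result = fixed_window_sums(events, window_size=60)
# result == {0: 12, 60: 7}
```

The event stamped 59 still counts toward the first window even though it arrived after an event from the second window, which is the essence of out-of-order, event-time processing.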
Slides for the talk "Cassandra and Spark: Love at First Sight" given at Texas Linux Fest 2015. Gives an introduction to both Cassandra and Spark and how they work together.
Quickly re-publish CSV/TSV files from existing repositories as FAIR Data with just a few mouse clicks!
You select the columns to "project" as Linked Data, and the associated ontology terms. The FAIR Projector Builder will create a FAIR Projector for you: a Triple Pattern Fragment server to provide the Linked Data; a published DCAT Distribution containing metadata about those triples and their source; and an RML model (syntactic and semantic) of the triples, to aid in third-party discovery of this novel projection.
(current status - first prototype, not ready for public consumption)
-------
Thanks to the NBDC/DBCLS for sponsoring the hackathon series.
MDW also funded by Ministerio de Economía y Competitividad grant number TIN2014-55993-RM
Apache Kafka - A modern Stream Processing Platform - Guido Schmutz
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only transport events and messages reliably and scalably through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Mario Molina, Datio, Software Engineer
Kafka Streams is an open source JVM library for building event streaming applications on top of Apache Kafka. Its goal is to allow programmers to create efficient, real-time, streaming applications and perform analysis and operations on the incoming data.
In this presentation we’ll cover the main features of Kafka Streams and do a live demo!
This demo will be partially on Confluent Cloud. If you haven’t already signed up, you can try Confluent Cloud for free: get $200 every month for your first three months ($600 free usage in total). Get more information and claim it here: https://cnfl.io/cloud-meetup-free
https://www.meetup.com/Mexico-Kafka/events/271972045/
Machine Learning for Big Data Analytics: Scaling In with Containers while Sc... - Ian Lumb
Watch On Demand Anytime via http://www.univa.com/resources/webinar-machine-learning.php
Armed with nothing more than an Apache Spark toting laptop, you have all the trappings required to prototype the application of Machine Learning against your data-science needs. From programmability in Scala, Java or Python, to built-in support for Machine Learning via MLlib, Spark is an exceedingly effective enabler that allows you to rapidly produce results.
Of course, as soon as your prototyping proves successful, you'll want to scale out to embrace the volume, variety and velocity that characterizes today's Big Data demands... in production. Because Spark is as comfortable on an isolated laptop as it is in a distributed-computing environment, addressing Big Data requirements in production boils down to effectively and efficiently embracing containers and clusters for Big Data Analytics.
And this is where offerings from Univa shine - i.e., in making the transition from prototype to production completely seamless. For some use cases, it makes sense to scale-in Spark based applications within Docker containers via Univa Grid Engine Container Edition or Navops by Univa; whereas in others, Spark is interfaced (as a Mesos-compliant framework) with Univa Universal Resource Broker, to permit scaling out on a cluster. In both scenarios, your production Spark applications are scheduled alongside other classes of workload - without a need for dedicated resources.
Agenda:
• Overview of Apache Spark as a platform for Deep Learning - from Python-based Jupyter Notebooks to Spark's Machine Learning library MLlib
• Overview of prototyping Machine Learning via Apache Spark on a laptop - without and within Docker containers
• Introductions to Univa Grid Engine Container Edition and Univa Universal Resource Broker plus Navops by Univa
• Overview of production Big Data Analytics platforms for Machine Learning
• Docker-containerized Apache Spark and Univa Grid Engine Container Edition
• Docker-containerized Apache Spark and Navops by Univa
• Apache Spark plus Univa Universal Resource Broker
• Introducing support for GPUs without and within Docker containers
• Use case example - using Machine Learning to classify data from Twitter without and within Docker containers
• Summary and next steps
Stream Data Deduplication Powered by Kafka Streams | Philipp Schirmer, Bakdata - HostedbyConfluent
Representations of data, e.g., describing news, persons or places, differ. Therefore, we need to identify duplicates, for example, if we want to stream deduplicated news from different sources into a sentiment classifier.
We built a system that collects data from different sources in a streaming fashion, aligns them to a global schema and then detects duplicates within the data stream without time window constraints. The challenge is not only to process newly published data without significant delay, but also to reprocess hundreds of millions of existing messages, for example, after improving the similarity measure.
In this talk, we present our implementation for deduplication of data streams built on top of Kafka Streams. For this, we leverage Kafka APIs, namely state stores, and also use Kubernetes to auto-scale our application from 0 to a defined maximum. This allows us to process live data immediately and also reprocess all data from scratch within a reasonable amount of time.
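The role of a state store in deduplication can be sketched in plain Python. This is not the Kafka Streams API and not the speakers' implementation: `normalize` and `deduplicate` are hypothetical names, the store is an ordinary dict, and an exact match on a normalized title stands in for the similarity measure mentioned above.

```python
def normalize(record):
    """Reduce a record to a comparison key: lower-cased, whitespace-collapsed title."""
    return " ".join(record["title"].lower().split())


def deduplicate(stream, store=None):
    """Yield only records whose key has not been seen before.

    The store plays the part of a state store: it holds every key seen so
    far, so duplicates are caught no matter how far apart they arrive
    (there is no time window).
    """
    store = {} if store is None else store
    for record in stream:
        key = normalize(record)
        if key in store:
            continue  # duplicate of an earlier record: drop it
        store[key] = record
        yield record


news = [
    {"source": "feed-a", "title": "Storm hits coast"},
    {"source": "feed-b", "title": "  storm hits  COAST "},
    {"source": "feed-a", "title": "Markets rally"},
]
unique = list(deduplicate(news))
```

Passing in an existing `store` lets a later run continue from earlier state, while starting from an empty store corresponds to reprocessing all data from scratch.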
Failing to Cross the Streams – Lessons Learned the Hard Way | Philip Schmitt,... - HostedbyConfluent
This is a talk about debugging Stream–Table joins – based on my first-hand experience of stumbling into various pitfalls.
I will walk through the laughter and tears of the sharper edges of ksqlDB that I encountered along the way. We will witness the power and versatility of kafkacat and uncover the number one kafkacat pitfall. We will see Stream–Table join semantics in action.
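One common source of surprise with Stream–Table joins is timing: a stream event is matched against the table as it looks when the event is processed. The plain-Python sketch below illustrates that behaviour under simplifying assumptions. It is not ksqlDB: `stream_table_join` is a hypothetical function, both inputs are merged into one ordered list, and the join is an inner join.

```python
def stream_table_join(events):
    """Join stream events against the latest table value for the same key.

    events: ordered list of (kind, key, value), where kind is "table"
    (an upsert of the row for key) or "stream" (an event to enrich).
    Only stream events produce output; table updates never do.
    """
    table = {}
    output = []
    for kind, key, value in events:
        if kind == "table":
            table[key] = value  # later stream events will see this value
        elif key in table:
            output.append((key, value, table[key]))
        # Inner join: a stream event with no matching row is dropped.
    return output


events = [
    ("stream", "alice", "click-1"),  # arrives before any table row exists
    ("table", "alice", "Berlin"),
    ("stream", "alice", "click-2"),
    ("table", "alice", "Paris"),     # does not re-emit click-2
    ("stream", "alice", "click-3"),
]
joined = stream_table_join(events)
# joined == [("alice", "click-2", "Berlin"), ("alice", "click-3", "Paris")]
```

The first click produces nothing because the table row did not exist yet, and the later update to Paris does not revise the result already emitted for the second click.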
Scala.io 2013 - Scala and ZeroMQ: Events beyond the JVM - RUDDER
After a short presentation of the history of message-oriented middleware and the birth of ZeroMQ in reaction to AMQP drift, these slides show how ZeroMQ and Scala can be used, either with a direct binding or through an Akka layer, to make events propagate beyond the boundaries of a JVM system.
The final example
Describes some differences and similarities between Apache Flink and Apache Storm. Gives an introduction to Flink's compatibility layer, which allows running Storm topologies in Flink and embedding spouts and bolts in Flink streaming programs.
Covers using Kafka MirrorMaker for disaster recovery, scaling reads, and isolating mission-critical clusters. Starts with a description of MirrorMaker and how to use it, then walks through a thorough introduction and example, step by step.
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer... - MLconf
Spark and GraphX in the Netflix Recommender System: We at Netflix strive to deliver maximum enjoyment and entertainment to our millions of members across the world. We do so by having great content and by constantly innovating on our product. A key strategy to optimize both is to follow a data-driven method. Data allows us to find optimal approaches to applications such as content buying or our renowned personalization algorithms. But, in order to learn from this data, we need to be smart about the algorithms we use, how we apply them, and how we can scale them to our volume of data (over 50 million members and 5 billion hours streamed over three months). In this talk we describe how Spark and GraphX can be leveraged to address some of our scale challenges. In particular, we share insights and lessons learned on how to run large probabilistic clustering and graph diffusion algorithms on top of GraphX, making it possible to apply them at Netflix scale.
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka - Guido Schmutz
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out of the box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream processing functionality. With it, Kafka can not only transport events and messages reliably and scalably through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Tutorial Kafka-Storm
1. Scalable Distributed Systems
Tutorial
Miguel Cárcamo Vásquez
Daniel Wladdimiro Cottet
Professors: Erika Rosas Olivos
Nicolás Hidalgo Castillo
Departamento de Ingeniería Informática
Universidad de Santiago de Chile
November 2014
M. Cárcamo & D. Wladdimiro (USACH) Kafka & Storm November, 2014 1 / 31
2. Kafka
What is Kafka?
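Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
• Fast: a single broker can handle hundreds of megabytes of reads and writes per second
• Scalable: a cluster can be expanded elastically and transparently, without downtime
• Durable: messages are persisted on disk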
3. Kafka
Architecture
It is a distributed, partitioned, replicated commit log service. It provides
the functionality of a messaging system, but with a unique design.
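To make the commit-log idea concrete, here is a minimal in-memory sketch in plain Java. It is not Kafka's API: the class `CommitLogSketch` and its methods are hypothetical names chosen for illustration. It shows the property the slide relies on: records are only ever appended, each one gets a sequential offset, and a reader chooses the offset it reads from, so reading removes nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a single partition of a commit log.
// Real Kafka persists the log on disk and replicates it across brokers.
final class CommitLogSketch {
    private final List<String> records = new ArrayList<>();

    // Appends a record at the end of the log and returns its offset.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Returns a copy of all records from the given offset onward.
    // The log itself is left unchanged, so several readers can coexist.
    List<String> readFrom(long offset) {
        int start = (int) Math.min(Math.max(offset, 0), records.size());
        return new ArrayList<>(records.subList(start, records.size()));
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        log.append("a");
        log.append("b");
        long last = log.append("c");
        System.out.println("last offset = " + last);
        System.out.println("reader starting at 0: " + log.readFrom(0));
        System.out.println("reader starting at 2: " + log.readFrom(2));
    }
}
```

Two readers that start at different offsets see different suffixes of the same log, which is how a log can serve both slow and fast subscribers.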
4. Kafka
Architecture
A two server Kafka cluster hosting four partitions (P0-P3) with two
consumer groups. Consumer group A has two consumer instances and
group B has four.
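The caption can be checked with a small plain-Java sketch. It assumes a simple round-robin assignment of partitions to the consumers of one group. That strategy, the class name `GroupAssignmentSketch` and the consumer names are illustrative assumptions, and Kafka's real assignment logic lives in its client libraries. The point is that each partition is read by exactly one consumer within a group, while every group sees all four partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    // Assigns partitions 0..partitionCount-1 to the consumers of one group
    // in round-robin order (an assumed strategy, for illustration only).
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Group A: two consumers share the four partitions P0-P3.
        System.out.println("group A: " + assign(4, Arrays.asList("A1", "A2")));
        // Group B: four consumers, one partition each.
        System.out.println("group B: " + assign(4, Arrays.asList("B1", "B2", "B3", "B4")));
    }
}
```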
16. Storm
What is Storm?
• Computation platform for stream data processing
• Fault Tolerant
• Scalable
• Distributed
• Reliable
• Learn, code and run
17. Architecture
Fig. 1: Storm Cluster
20. Before coding
• Install Maven or Gradle
• Install Eclipse (only if you want to)
21. Coding a Spout
Structure
• import libraries
• public class SpoutName extends BaseRichSpout
• class variables
• public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector)
• public void nextTuple()
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
22. Coding a Bolt
Structure
• import libraries
• public class BoltName extends BaseRichBolt
• class variables
• public BoltName() (constructor)
• public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector)
• public void execute(Tuple input)
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
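The spout and bolt outlines share a lifecycle: a setup method that is called once (`open` for a spout, `prepare` for a bolt) and a per-tuple method that is called repeatedly (`nextTuple`, `execute`). The sketch below imitates that call order in plain Java without the Storm library. `MiniSpout`, `MiniBolt` and the driver loop are hypothetical stand-ins, not Storm classes; real spouts and bolts emit through the collector they receive in `open` or `prepare`, and a real topology keeps running instead of stopping when the source is drained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class LifecycleSketch {
    // Hypothetical stand-in for a spout: set up once, then polled for tuples.
    static final class MiniSpout {
        private Queue<String> pending;

        void open(List<String> source) {
            pending = new ArrayDeque<>(source);
        }

        // Returns the next tuple, or null when nothing is available.
        String nextTuple() {
            return pending.poll();
        }
    }

    // Hypothetical stand-in for a bolt: set up once, then handed one tuple at a time.
    static final class MiniBolt {
        private List<String> output;

        void prepare(List<String> output) {
            this.output = output;
        }

        // Processes one tuple; here it simply upper-cases the value.
        void execute(String input) {
            output.add(input.toUpperCase());
        }
    }

    // Drives the spout until it is drained and feeds every tuple to the bolt.
    static List<String> run(List<String> source) {
        MiniSpout spout = new MiniSpout();
        MiniBolt bolt = new MiniBolt();
        List<String> results = new ArrayList<>();
        spout.open(source);
        bolt.prepare(results);
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            bolt.execute(tuple);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("kafka", "storm")));
    }
}
```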
23. Coding a Topology
Structure
• import libraries
• public class Topology
• class variables
• public static void main(String[] args)
• Config config = new Config()
• TopologyBuilder b = new TopologyBuilder()
• b.setSpout("SpoutName", new SpoutName())
• b.setBolt("BoltName", new BoltName()).shuffleGrouping("SpoutName")
• final LocalCluster cluster = new LocalCluster()
• cluster.submitTopology("TopologyName", config, b.createTopology())
24. Compile & Run
• Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.
• cd myapp
• mvn package
• storm jar target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App
26. Grouping
• Shuffle grouping: Stream tuples are randomly distributed across the bolt's tasks such that each task is guaranteed to get an equal number of tuples.
• Fields grouping: Stream tuples are partitioned by the fields specified in the grouping, so tuples with the same values for those fields always go to the same task.
• All grouping: Stream tuples are replicated across all of the bolt's tasks.
• Global grouping: The entire stream goes to a single one of the bolt's tasks.
• Direct grouping: The producer of the tuple decides which task of the consumer will receive it.
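These groupings differ only in how the target task is chosen for each tuple. The plain-Java sketch below illustrates that choice for three of them. It is a simplification under stated assumptions: `GroupingSketch` and its methods are hypothetical names, fields grouping is modeled as a hash of the grouping values modulo the number of tasks, shuffle is modeled as a uniform random pick (Storm's own shuffle also balances the counts), and global grouping is modeled as always choosing index 0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class GroupingSketch {
    // Fields grouping (simplified): equal grouping values always map to the same task.
    static int fieldsGrouping(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Shuffle grouping (simplified): pick a task uniformly at random.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Global grouping (simplified): every tuple goes to one fixed task.
    static int globalGrouping() {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        Random random = new Random();
        for (String word : Arrays.asList("storm", "kafka", "storm")) {
            int byField = fieldsGrouping(Arrays.<Object>asList(word), numTasks);
            int byShuffle = shuffleGrouping(random, numTasks);
            System.out.println(word + " -> fields task " + byField
                    + ", shuffle task " + byShuffle
                    + ", global task " + globalGrouping());
        }
    }
}
```

Running it shows both occurrences of "storm" landing on the same task under fields grouping, which is what makes per-key state such as word counts possible, while the shuffle choice can change from tuple to tuple.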