This document summarizes Shuhsi Lin's presentation about Apache Kafka. The presentation introduced Kafka as a distributed streaming platform and message broker. It covered Kafka's core concepts: topics, partitions, producers, consumers, and brokers. It also discussed different Python clients for Kafka, such as PyKafka, kafka-python, and confluent-kafka, and their usage in applications such as log aggregation, metrics collection, and stream processing.
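The topic/partition concept covered in the talk can be illustrated without a broker: Kafka clients assign each keyed message to a partition deterministically, typically by hashing the key. The sketch below is illustrative only (real clients use murmur2 or CRC variants rather than MD5, and the function name is my own):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically, so all
    messages with the same key land on the same partition (and
    therefore stay ordered relative to each other)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition.
print(partition_for(b"user-42", 6), partition_for(b"user-42", 6))
print(partition_for(b"user-43", 6))
```

Because ordering in Kafka is only guaranteed within a partition, this key-to-partition stickiness is what lets per-key event streams stay ordered.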
Zebra is an open-source routing stack and a successor to the GNU Zebra and Quagga projects. Together with openconfigd, it works as a data-plane-agnostic network operations stack composed of pluggable protocol and functional modules.
Advanced Kurento Real Time Media Stream Processing (FIWARE)
Advanced Kurento Real Time Media Stream Processing presentation, by Juan Ángel Fuentes.
Stream Oriented GE. How-to sessions. 1st FIWARE Summit, Málaga, Dec. 13-15, 2016.
WPEWebKit, the WebKit port for embedded platforms (Linaro Connect San Diego 2019) (Igalia)
By Philippe Normand.
WPEWebKit[1] is a WebKit flavor (also known as a port) specially crafted for embedded platforms and use cases. During this talk I will present WPEWebKit's architecture, with special emphasis on its multimedia backend, which is based on GStreamer[2] and implements support for the MSE[3], EME[4], and MediaCapabilities specifications. I will also present a case study on how to successfully integrate WPEWebKit on i.MX6 and i.MX8M platforms, either with the Cog[5] standalone reference web-app container or within existing Qt5 applications, using the
WPEQt QML plugin.
[1] https://wpewebkit.org
[2] https://gstreamer.freedesktop.org
[3] https://www.w3.org/TR/media-source/
[4] https://www.w3.org/TR/encrypted-media/
[5] https://github.com/Igalia/cog
Linaro Connect San Diego 2019
September 23-27, 2019
https://connect.linaro.org/resources/san19/
Presentation at OpenStack Summit Boston. This talk covers various lessons from IPv6 Neutron deployments, such as address allocation, address configuration, router considerations, and more.
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox (Anne Nicolas)
Different upgrade and update strategies exist for embedded Linux systems. If none of them was chosen at development time, adding one afterwards can be a tedious task.
It gets even harder when the system is already deployed in the field and only accessible via a 3G connection.
This talk is a developer's account of putting exactly that in place: a report of experience on one way of doing it on a system running Barebox and a Yocto-based distribution.
Patrick Boettcher
From Big to Fast Data. How #kafka and #kafka-connect can redefine your ETL and... (Landoop Ltd)
Presentation on "Big Data and Kafka, Kafka-Connect and the modern days of stream processing" For @Argos - @Accenture Development Technology Conference - London Science Museum (IMAX)
Landoop presentation at the Athens Big Data meetup about streaming technologies on Apache Kafka: an introduction to the Lenses SQL engine, the Lenses platform, and our open-source projects.
Landoop presents how to simplify your ETL process using Kafka Connect for the (E) and (L). Introducing KCQL, the Kafka Connect Query Language, and how it can simplify fast-data (ingress & egress) pipelines. How KCQL can be used to set up Kafka connectors for popular in-memory and analytical systems, with live demos with Hazelcast, Redis and InfluxDB. How to get started with a fast-data Docker Kafka development environment. Enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.
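For flavour, KCQL statements read like SQL over topics. A rough, hypothetical example (topic, target, and field names are invented here, and exact syntax varies per connector, so consult the connector documentation):

```sql
-- sink selected fields of a Kafka topic into a target table/cache
INSERT INTO customer_cache
SELECT id, name, last_seen
FROM customer-topic
```

A statement like this is passed to a Kafka Connect sink connector as configuration, replacing what would otherwise be many connector-specific mapping properties.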
Kafka Tutorial: Streaming Data Architecture (Jean-Paul Azar)
Kafka tutorial covers Java examples for Producers and Consumers. Also covers why Kafka is important and what Kafka is. Takes a look at the whole ecosystem around Kafka. Discusses low-level details about Kafka needed for successful deploys and performance tuning like batching, compression, partitioning, and replication.
This tutorial covers advanced consumer topics: custom deserializers, using a ConsumerRebalanceListener to rewind to a certain offset, manual assignment of partitions to implement a "priority queue", Java consumer examples for "at least once", "at most once", and "exactly once" message delivery semantics, and a lot more.
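The delivery-semantics distinction above comes down to when the consumer commits offsets relative to processing. A broker-free sketch of the two orderings (the offset store and crash point are stand-ins for a real Kafka consumer and its committed offsets; names are my own):

```python
def consume(num_records, committed, commit_first, crash_at=None):
    """Replay records from the last committed offset; return
    (processed offsets, new committed offset). crash_at simulates
    a crash in the middle of handling that offset."""
    processed = []
    for offset in range(committed, num_records):
        if commit_first:
            committed = offset + 1              # at-most-once: commit, then process
            if offset == crash_at:
                return processed, committed     # crashed after commit -> record lost
            processed.append(offset)
        else:
            processed.append(offset)            # at-least-once: process, then commit
            if offset == crash_at:
                return processed, committed     # crashed before commit -> reprocessed
            committed = offset + 1
    return processed, committed

# At-most-once: commit before processing; a crash while handling offset 1 loses it.
run1, pos = consume(3, 0, commit_first=True, crash_at=1)
run2, _ = consume(3, pos, commit_first=True)
print(run1 + run2)   # [0, 2] -- offset 1 was never processed

# At-least-once: commit after processing; the same crash duplicates offset 1.
run1, pos = consume(3, 0, commit_first=False, crash_at=1)
run2, _ = consume(3, pos, commit_first=False)
print(run1 + run2)   # [0, 1, 1, 2] -- offset 1 processed twice
```

"Exactly once" then requires making the processing and the commit atomic (e.g. storing both in one transaction), which is why it is the hardest of the three.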
You’ve heard all of the hype, but how can SMACK work for you? In this all-star lineup, you will learn how to create a reactive, scaling, resilient and performant data-processing powerhouse. We will go through the basics of Akka, Kafka and Mesos and then deep-dive into putting them together in an end-to-end (and back again) distributed transaction. Distributed transactions mean producers waiting for one or more consumers to respond. On the backend, you will see how Apache Cassandra and Spark can be combined to add the incredible scaling storage and data analysis needed for fast data pipelines. With these technologies as a foundation, you have the assurance that scale is never a problem and uptime is the default.
In this slide deck we show how to implement a custom Kafka serializer for a producer. We then show how failover works when configuring the broker/topic setting min.insync.replicas and the producer setting acks (0, 1, -1 / none, leader, all).
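A Kafka serializer is, independent of client library, just a function from an in-memory object to bytes, with the deserializer as its inverse on the consumer side. A minimal JSON-based sketch (names are my own; the deck's own examples are in Java):

```python
import json

def serialize(value: dict) -> bytes:
    """Producer side: object -> bytes on the wire. Sorted keys and
    compact separators make the encoding deterministic."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True).encode("utf-8")

def deserialize(payload: bytes) -> dict:
    """Consumer side: bytes -> object; must mirror the serializer."""
    return json.loads(payload.decode("utf-8"))

event = {"user": "alice", "amount": 3}
wire = serialize(event)
print(wire)                          # b'{"amount":3,"user":"alice"}'
print(deserialize(wire) == event)    # True: the round trip is lossless
```

In a real client you would register these as the producer's value serializer and the consumer's value deserializer; the round-trip property is the invariant to test.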
The tutorial then shows how to implement Kafka producer batching and compression, and uses the producer metrics API to see how batching and compression improve throughput. It then covers using retries and timeouts, and tests that they work. It explains how the max in-flight messages and retry backoff settings work, and when to use (and not use) in-flight messaging.
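The throughput win from combining batching with compression is easy to see offline: similar records compress far better as one batch than one at a time, because the codec can exploit redundancy across records. Here gzip stands in for Kafka's per-batch codecs, and the record shape is invented for illustration:

```python
import gzip
import json

# 200 similar JSON records, as a producer might batch them
records = [json.dumps({"sensor": i % 4, "temp": 20.0 + i % 7}).encode()
           for i in range(200)]

# compress each record alone vs. compress the whole batch once
one_at_a_time = sum(len(gzip.compress(r)) for r in records)
batched = len(gzip.compress(b"\n".join(records)))

print(one_at_a_time, batched)  # the batch compresses several times smaller
```

This is why Kafka applies compression at the batch level: larger batches give the codec more cross-record redundancy to exploit, at the cost of a little latency while the batch fills.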
It goes on to show how to implement a ProducerInterceptor. Lastly, it shows how to implement a custom Kafka partitioner to implement a priority queue for important records. Through many of the step-by-step examples, this tutorial shows how to use some of the Kafka tools to do replication verification and to inspect topic partition leadership status.
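The "priority queue via a custom partitioner" trick works by reserving one partition for important records and hashing everything else across the rest; a dedicated consumer can then drain the reserved partition first. A client-agnostic sketch (the partition layout, hash choice, and names are my own):

```python
import hashlib

PRIORITY_PARTITION = 0  # reserved for important records

def priority_partitioner(key: bytes, is_important: bool, num_partitions: int) -> int:
    """Route important records to the reserved partition; spread
    ordinary records over the remaining partitions by key hash."""
    if is_important:
        return PRIORITY_PARTITION
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return 1 + h % (num_partitions - 1)

print(priority_partitioner(b"order-1", True, 6))   # always 0
print(priority_partitioner(b"order-2", False, 6))  # somewhere in 1..5
```

In the Java client this logic would live in a class implementing the Partitioner interface, configured on the producer; the point is that partition choice is entirely under the producer's control.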
Building Event-Driven Systems with Apache Kafka (Brian Ritchie)
Event-driven systems provide simplified integration, easy notifications, inherent scalability and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed at LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure, including C#, Spark, ELK and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
Today, many companies are faced with a huge quantity of data and a wide variety of tools with which to process it. This potentially allows for great opportunities to satisfy customers’ needs and bring user experience to the next level. However, in order to achieve this and provide a competitive solution, sophisticated and complex data processing is needed. Such processing can rarely be done with one tool or framework — a number of tools are often involved, each having prowess in a particular field of the processing pipeline.
In this session, we will see the latest endeavors of Apache Ignite to integrate with other big data platforms and provide its in-memory computing strengths for data processing pipelines. In particular we will have a closer look at how it can be integrated and used with Apache Kafka and/or Flume, and outline several use scenarios.
Overview of Apache Flink: The 4G of Big Data Analytics Frameworks (Slim Baltagi)
Slides of my talk at the Hadoop Summit Europe in Dublin, Ireland on April 13th, 2016. The talk introduces Apache Flink as both a multi-purpose Big Data analytics framework and real-world streaming analytics framework. It is focusing on Flink's key differentiators and suitability for streaming analytics use cases. It also shows how Flink enables novel use cases such as distributed CEP (Complex Event Processing) and querying the state by behaving like a key value data store.
Big Data Streams Architectures. Why? What? How? (Anton Nazaruk)
With the current zoo of technologies and the different ways they interact, it is a big challenge to architect a system (or adapt an existing one) that conforms to low-latency Big Data analysis requirements. Apache Kafka, and the Kappa Architecture in particular, are attracting more and more attention relative to the classic Hadoop-centric technology stack. The new Consumer API gave a significant boost in this direction. Microservices-based stream processing and the new Kafka Streams are proving to be a synergy in the Big Data world.
Apache Kafka - Scalable Message-Processing and more! (Guido Schmutz)
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker built for exchanging huge amounts of messages between a source and a target.
This session will start with an introduction to Apache Kafka, presenting its role in a modern data/information architecture and the advantages it brings to the table. Additionally, the Kafka ecosystem will be covered, as well as the integration of Kafka into the Oracle stack, with products such as GoldenGate, Service Bus and Oracle Stream Analytics all being able to act as a Kafka consumer or producer.
Stream Processing with Apache Kafka and .NET (Confluent)
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and Kafka (Timothy Spann)
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and Kafka
Apache NiFi, Apache Flink, Apache Kafka
Timothy Spann
Principal Developer Advocate
Cloudera
Data in Motion
https://budapestdata.hu/2023/en/speakers/timothy-spann/
LinkedIn · GitHub · datainmotion.dev
June 8 · Online · English talk
Building Modern Data Streaming Apps with NiFi, Flink and Kafka
In my session, I will show some best practices I have discovered over the last seven years of building data streaming applications, including IoT, CDC, logs, and more.
In my modern approach, we utilize several open-source frameworks to get the best features of each. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Kafka. From there we build streaming ETL with Apache Flink SQL, and we stream data into Apache Iceberg.
We use the best streaming tools for the current applications with FLaNK. flankstack.dev
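The Kafka-to-Flink-SQL step of the pipeline above can be sketched as a table declaration plus a continuous query. The table, field names, topic, and broker address below are placeholders; the connector options follow the Flink SQL Kafka connector's documented form:

```sql
-- declare a Kafka topic as a dynamic table (Flink SQL Kafka connector)
CREATE TABLE sensor_events (
  sensor_id STRING,
  temp      DOUBLE,
  ts        TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'sensor-events',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json'
);

-- a continuous streaming ETL query over the topic
SELECT sensor_id, AVG(temp) AS avg_temp
FROM sensor_events
GROUP BY sensor_id;
```

The query runs continuously, updating its aggregates as new records arrive on the topic; a sink table (e.g. Iceberg) would be declared the same way to complete the pipeline.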
BIO
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Event Hub (i.e. Kafka) in Modern Data Architecture (Guido Schmutz)
Today's modern data architectures and their implementations contain an Event Hub. What are the benefits of placing an Event Hub in a Modern Data (Analytics) Architecture? What exactly is an Event Hub, and what capabilities should it provide? Why is Apache Kafka the most popular realization of an Event Hub?
These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Hub.
Then the session will highlight the different architecture styles that can be supported using an Event Hub (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications, and how these can be combined into a unified architecture, making the Event Hub the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra (Joe Stein)
Slides for the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all developed in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening across your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build, and offer these solutions as a service.
OSSNA Building Modern Data Streaming Apps (Timothy Spann)
OSSNA
Building Modern Data Streaming Apps
https://ossna2023.sched.com/event/1Jt05/virtual-building-modern-data-streaming-apps-with-open-source-timothy-spann-streamnative
Timothy Spann
Cloudera
Principal Developer Advocate
Data in Motion
In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, logs, and more. In my modern approach, we utilize several open-source frameworks to maximize all the best features. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We make continuous queries against our topics with Flink SQL. We will stream data into various open-source data stores, including Apache Iceberg, Apache Pinot, and others. We use the best streaming tools for the current applications with the open source stack - FLiPN. https://www.flipn.app/
Updates: This will be in-person with live coding based on feedback from the crowd. It will also include new data stores, new sources, and data relevant to and from the Vancouver area, as well as updates to the platforms and the inclusion of Apache Iceberg, Apache Pinot, and some other new tech.
https://github.com/tspannhw/SpeakerProfile Tim Spann is a Principal Developer Advocate for Cloudera. He works with Apache Kafka, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Timothy J Spann
Cloudera
Principal Developer Advocate
Hightstown, NJ
Website: https://datainmotion.dev/
AI&BigData Lab 2016. Viktor Sarapin: Size Matters: On-Demand Analys... (GeeksLab Odessa)
4.6.16 AI&BigData Lab
Upcoming events: goo.gl/I2gJ4H
How to set up data analysis of 40 million people over 5 years so that it looks almost real-time.
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ... (DataStax Academy)
We will be talking about the solution we developed using Mesos, Docker, Kafka, Spark, Cassandra and Solr (DataStax Enterprise Edition), all developed in Go, for doing real-time log analysis at scale. Many organizations either need or want log analysis in real time, where you can see within a second what is happening across your entire infrastructure. Today, with the hardware available and the software systems we have in place, you can develop, build, and offer these solutions as a service.
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo... (Guido Schmutz)
Independent of the source of data, the integration and analysis of event streams gets more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. In this session we compare two popular Streaming Analytics solutions: Spark Streaming and Kafka Streams.
Spark is a fast and general engine for large-scale data processing that was designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala.
Kafka Streams is the stream processing solution that is part of Kafka. It is provided as a Java library and can therefore be easily integrated with any Java application.
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo (Joe Stein)
In this talk we will walk through how Apache Kafka and Apache Accumulo can be used together to orchestrate a decoupled, real-time distributed and reactive request/response system at massive scale. Multiple data pipelines can perform complex operations for each message in parallel at high volumes with low latencies. The final result will be in line with the initiating call. The architecture gains are immense: they allow the requesting system to receive a response without direct integration with the data pipeline(s) that messages must go through. By utilizing Apache Kafka and Apache Accumulo, these gains are sustained at scale and allow complex operations on different messages to be applied to each response in real time.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis (Globus)
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
A Comprehensive Look at Generative AI in Retail App Testing (kalichargn70th171)
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Quarkus Hidden and Forbidden Extensions (Max Andersen)
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
How to Position Your Globus Data Portal for Success: Ten Good Practices (Globus)
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
SOCRadar Research Team: Latest Activities of IntelBroker (SOCRadar)
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Why React Native as a Strategic Advantage for Startup Innovation.pdfayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
Data privacy is one of the most critical issues that businesses face. This presentation shares insights on the principles and best practices for ensuring the resilience and security of your workload.
Drawing on a real-life project from the HR industry, the various challenges will be demonstrated: data protection, self-healing, business continuity, security, and transparency of data processing. This systematized approach allowed to create a secure AWS cloud infrastructure that not only met strict compliance rules but also exceeded the client's expectations.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
2. About Me
Data Software Engineer of EAD
at the manufacturer Micron
- Currently working with data and people
- Lurking in PyHug, Taipei.py, and various meetups
Shuhsi Lin
sucitw gmail.com
5. Agenda
» Pipeline to streaming
» What is Apache Kafka
⋄ Overview
⋄ Architecture
⋄ Use cases
» Kafka API
⋄ Python clients
» Conclusion and More about Kafka
6. What we will not focus on
» Reliability and durability
⋄ Scaling, replication, guarantee
⋄ Zookeeper
» Log compaction
» Administration, Configuration, Operations
» Kafka connect
» Kafka Stream
» Apache Kafka vs XXX
⋄ RabbitMQ, AWS Kinesis, GCP Pub/Sub, ActiveMQ,
ZeroMQ, Redis, and ....
12. What is stream processing?
» Data arises as streams of events
(orders, sales, clicks, or trades)
» Databases are event streams
⋄ creating a backup or standby copy of a database
is, in effect,
⋄ publishing the stream of database changes
16. The name, “Kafka”, came from?
https://www.quora.com/What-is-the-relation-between-Kafka-the-writer-and-Apache-Kafka-the-distributed-messaging-system
http://slideplayer.com/slide/4221536/
https://en.wikipedia.org/wiki/Franz_Kafka
17. What is Apache Kafka?
Apache Kafka is a distributed system designed for streams. It is built to be
fault-tolerant, high-throughput, horizontally scalable, and allows geographically
distributing data streams and processing.
https://kafka.apache.org
22. What a streaming data platform can provide
» “Data integration” (ETL)
⋄ How to transport data between systems
⋄ Captures streams of events or data changes and
feeds these to other data systems
» “Stream processing” (messaging)
⋄ Continuous, real-time processing and
transformation of these streams and makes the
results available system-wide.
various systems in LinkedIn
https://www.confluent.io/blog/stream-data-platform-1/
Analytical data processing with very low latency
24. What Kafka Does
Publish & subscribe
● to streams of data like a messaging system
Process
● streams of data efficiently and in real time
Store
● streams of data safely in a distributed replicated cluster
https://kafka.apache.org/
27. A modern stream-centric data architecture built around Apache Kafka
https://www.confluent.io/blog/stream-data-platform-1/
500 billion events per day
28. The key abstraction in Kafka is a
structured commit log of updates
» Producers append records to this log, which can hold TBs of data
» Each data consumer has its own position in the log
and advances independently
» This allows a reliable, ordered stream of updates
to be distributed to each consumer
» Parallel, ordered consumption is important to a
change-capture system for database updates
» The log can be sharded and spread over a cluster of machines,
and each shard is replicated for fault-tolerance
https://www.confluent.io/blog/stream-data-platform-1/
29. Topics and Partitions
» Topics are split into partitions
» Partitions are strongly ordered & immutable
» Partitions can exist on different servers
» Partitions enable scalability
» Producers assign each message to a partition within the topic
⋄ either round-robin (simply to balance load)
⋄ or according to the message key
https://kafka.apache.org/documentation/#gettingStarted
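The two assignment strategies above can be sketched in plain Python. This is an illustration only: real clients hash keys with murmur2 and learn the partition count from the broker; `assign_partition` and `NUM_PARTITIONS` are made-up names.

```python
from itertools import count

NUM_PARTITIONS = 3  # hypothetical topic with 3 partitions

_round_robin = count()

def assign_partition(key=None):
    """Pick a partition the way a producer might: keyed messages hash to a
    stable partition; unkeyed messages rotate round-robin.
    (Simplified sketch -- real clients use murmur2, not a byte sum.)"""
    if key is None:
        return next(_round_robin) % NUM_PARTITIONS
    # stable hash, so the same key always lands on the same partition
    return sum(key.encode('utf-8')) % NUM_PARTITIONS

# Same key -> same partition, so per-key ordering is preserved
assert assign_partition('user-42') == assign_partition('user-42')

# Unkeyed messages spread evenly across partitions
spread = {assign_partition() for _ in range(NUM_PARTITIONS)}
print(sorted(spread))  # -> [0, 1, 2]
```

Keyed assignment is what gives Kafka per-key ordering: all messages for one key stay in one strongly ordered partition.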
30. Offsets
» Messages are assigned a sequential offset within their partition
» Consumers track their position as (topic, partition, offset)
https://kafka.apache.org/documentation/#gettingStarted
A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups
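The offset model can be illustrated with a toy in-memory partition log (not the Kafka client API; `produce` and `fetch` are invented names): each consumer is just an integer position, and reading never removes records.

```python
# Toy single-partition commit log: an append-only list where a record's
# offset is simply its index. Consumers track their own offsets and
# advance independently; reads never destroy records.
log = []

def produce(record):
    log.append(record)

def fetch(offset, max_records=10):
    return log[offset:offset + max_records]

for r in ['a', 'b', 'c', 'd']:
    produce(r)

slow_consumer = 1   # this consumer's committed (topic, partition, offset)
fast_consumer = 4

assert fetch(slow_consumer) == ['b', 'c', 'd']  # still catching up
assert fetch(fast_consumer) == []               # fully caught up
assert fetch(0) == ['a', 'b', 'c', 'd']         # re-consume from the start
```

Because offsets are per consumer, an offline consumer can catch up later and any consumer can rewind and re-consume history, exactly as slide 32 notes.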
31. Consumers and Partitions
» The consumers in a group divide a topic's partitions among themselves
» Within a group, a partition is always sent to the same consumer instance
https://kafka.apache.org/documentation/#gettingStarted
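The division of partitions over a group can be sketched as follows (a round-robin-style toy; real clients use pluggable assignors negotiated via the group protocol, and `assign_partitions` is a made-up helper):

```python
def assign_partitions(partitions, consumers):
    """Spread a topic's partitions over the consumers of one group so that
    each partition goes to exactly one consumer instance."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions (P0-P3), 2 consumers in the group -- as in the cluster diagram
a = assign_partitions(['P0', 'P1', 'P2', 'P3'], ['C1', 'C2'])
print(a)  # -> {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
```

Since each partition has exactly one owner per group, ordering within a partition is preserved while the group as a whole consumes in parallel.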
32. Consumer
● Messages are available to consumers only when they have been
committed
● Kafka does not push
○ Unlike JMS
● Reads do not destroy messages
○ Unlike JMS Topic
● (some) History available
○ Offline consumers can catch up
○ Consumers can re-consume from the past
● Delivery Guarantees
○ Ordering maintained
○ At-least-once (per consumer) by default; at-most-once and exactly-once can be
implemented
P11 at https://www.slideshare.net/lucasjellema/amis-sig-introducing-apache-kafka-scalable-reliable-event-bus-message-queue
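The at-least-once default above can be illustrated with a toy poll loop that commits the offset only after processing (pure-Python sketch; the names are invented, not a client API): a crash between processing and commit causes redelivery, so duplicates are possible but nothing is lost.

```python
# At-least-once sketch: process first, commit the offset second. If we
# crash in between, the uncommitted record is delivered again.
log = ['a', 'b', 'c']
committed = 0        # last committed offset for this consumer
processed = []       # side effects observed downstream

def poll_and_process(crash_before_commit=False):
    global committed
    record = log[committed]
    processed.append(record)        # side effect happens first
    if crash_before_commit:
        return                      # offset never committed -> redelivery
    committed += 1

poll_and_process()                          # 'a' processed and committed
poll_and_process(crash_before_commit=True)  # 'b' processed, commit lost
poll_and_process()                          # 'b' delivered a second time
print(processed)  # -> ['a', 'b', 'b'] : duplicate, but no loss
```

Committing *before* processing flips the guarantee to at-most-once (a crash loses the record instead of duplicating it).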
33. ZooKeeper: the coordination interface
between the Kafka broker and consumers
https://hortonworks.com/hadoop-tutorial/realtime-event-processing-nifi-kafka-storm/#section_3
» Stores configuration data for distributed services
» Used primarily by brokers
» Used by consumers in 0.8 but not 0.9
35. Apache Kafka timeline
» 2010: creation in LinkedIn
» 2011-Nov: enters the Apache Software Foundation incubator
» 2013-Nov: v0.8 (New Producer, reassign-partitions)
» 2014: Confluent founded
» 2015-Nov: v0.9 (Kafka Connect, Security, New Consumer)
» 2016-May: v0.10 (Kafka Stream, rack awareness)
» Next version: v0.10.2 (Single Message Transforms for Kafka Connect)
36. TLS connection
SSL is supported only for the new Kafka Producer and Consumer (Kafka versions 0.9.0 and higher)
http://kafka.apache.org/documentation.html#security_ssl
http://docs.confluent.io/current/kafka/ssl.html
http://maximilianchrist.com/blog/connect-to-apache-kafka-from-python-using-ssl
https://github.com/edenhill/librdkafka/wiki/Using-SSL-with-librdkafka
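For reference, here is a sketch of the SSL-related settings a kafka-python client accepts (parameter names from the kafka-python docs; the broker address and file paths are placeholders, and the broker must expose an SSL listener, Kafka 0.9+):

```python
# SSL configuration sketch for kafka-python (paths are placeholders).
ssl_config = dict(
    bootstrap_servers='broker:9093',          # the SSL listener, not PLAINTEXT
    security_protocol='SSL',
    ssl_cafile='/path/to/ca-cert.pem',        # CA that signed the broker cert
    ssl_certfile='/path/to/client-cert.pem',  # only if client auth is required
    ssl_keyfile='/path/to/client-key.pem',
)
# Would be passed to a client as: KafkaProducer(**ssl_config)
```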
37. Apache Kafka is considered as:
Stream data platform
» Commit log service
» Messaging system
» Circular buffer
38. Cons of Apache Kafka
» Consumer Complexity (smart, but poor client)
» Lack of tooling/monitoring (3rd party)
» Still pre 1.0 release
» Operationally, it’s more manual than desired
» Requires ZooKeeper
Sep 26, 2015http://www.slideshare.net/jimplush/introduction-to-apache-kafka-53225326
39. Use Cases
» Website Activity Tracking
» Log Aggregation
» Stream Processing
» Event Sourcing
» Commit logs
» Metrics (Performance index streaming)
⋄ CPU/IO/Memory usage
⋄ Application Specific:
⋄ Time taken to load a web-page
⋄ Time taken to build a web-page
⋄ No. of requests
⋄ No. of hits on a particular page/url
40. Event-driven Applications
» How Kafka is first adopted, and how its role
evolves over time in an architecture
https://aws.amazon.com/tw/kafka/
42. Conceptual Reference Architecture
for Real-Time Processing in HDP 2.2
https://hortonworks.com/blog/storm-kafka-together-real-time-data-refinery/ February 12, 2015
43. Event delivery system design in Spotify
43
https://labs.spotify.com/2016/03/03/spotifys-event-delivery-the-road-to-the-cloud-part-ii/
44. Case: Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spark Streaming
http://helenaedelson.com/?p=1186 (2016/03)
62. Producer API - Kafka-Python
Class kafka.KafkaProducer(**configs)
● close(timeout=None)
● flush(timeout=None)
● partitions_for(topic)
● send(topic, value=None, key=None,
partition=None, timestamp_ms=None)
value must be of type bytes, or be serializable
to bytes via the configured value_serializer
from kafka import KafkaProducer
from settings import BOOTSTRAP_SERVERS, TOPICS, MSG
p = KafkaProducer(bootstrap_servers=BOOTSTRAP_SERVERS)
p.send(TOPICS, MSG.encode('utf-8'))
p.flush()
https://kafka-python.readthedocs.io/en/master/_modules/kafka/producer/kafka.html#KafkaProducer
http://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html
63. Producer API - Confluent-Kafka-Python
from confluent_kafka import Producer
from settings import BOOTSTRAP_SERVERS, TOPICS, MSG
p = Producer({'bootstrap.servers': BOOTSTRAP_SERVERS})
p.produce(TOPICS, MSG.encode('utf-8'))
p.flush()
Class confluent_kafka.Producer(*kwargs)
● len()
● flush([timeout])
● poll([timeout])
● produce(topic[, value][, key][, partition][,
on_delivery][, timestamp])
http://docs.confluent.io/current/clients/confluent-kafka-python/#producer
67. Create a Kafka Topic
» Let's create a topic named "test" with a single partition and
only one replica:
⋄ kafka-topics.sh --create --zookeeper zhost:2181
--replication-factor 1 --partitions 1 --topic test
» See that topic
⋄ bin/kafka-topics.sh --list --zookeeper zhost:2181
bin/kafka-topics.sh
» Create, delete, describe, or change a topic.
74. More about Kafka
» Reliability and durability
⋄ Scaling, replication, guarantee, Zookeeper
» Log compaction
» Administration, Configuration, Operations, Monitoring
» Kafka connect
» Kafka Stream
» Schema Registry
» REST Proxy
» Apache Kafka vs XXX
⋄ RabbitMQ, AWS Kinesis, GCP Pub/Sub, ActiveMQ, ZeroMQ, Redis,
and ....
75. The Other Two APIs
» Connect API
○ JDBC, HDFS, S3, ...
» Streams API
○ map, filter, aggregate, join
76. More references
1. The Log: What every software engineer should know about real-time data's unifying abstraction,
Jay Kreps, 2013
2. Pykafka and Kafka-python? https://github.com/Parsely/pykafka/issues/559
3. Why I am not a fan of Apache Kafka (2015-2016 Sep)
4. Kafka vs RabbitMQ
a. What are the differences between Apache Kafka and RabbitMQ?
b. Understanding When to use RabbitMQ or Apache Kafka
5. Kafka summit (2016~)
6. Future features of Kafka (Kafka Improvement Proposals)
7. Kafka- The Definitive Guide