On 17 May, the monthly meetup of the Python Madrid group was held at the Paradigma Digital offices. Our colleague Álvaro León spoke to us about Kafka and Python.
Video of the presentation: https://www.youtube.com/watch?v=HPfNDL-jIGM
How we use Zabbix at BlaBlaCar. What we did to be able to deal with >25k items, >300 values per second & >6.5k triggers
- Trappers everywhere
- Low level discovery
- python-protobix
- jmx-zabbix
Reactive Programming, Traits and Principles. What is Reactive, where does it come from, and what is it good for? How does it differ from event-driven programming? Is it only functional?
"ReactiveX
ReactiveX is a library for composing asynchronous and event-based programs by using observable sequences.
It extends the observer pattern to support sequences of data and/or events and adds operators that allow you to compose sequences together declaratively while abstracting away concerns about things like low-level threading, synchronization, thread-safety, concurrent data structures, and non-blocking I/O."
http://reactivex.io/
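To make the quoted description more concrete, here is a minimal sketch of an observable sequence with two composable operators. It is written in plain Python and does not use any ReactiveX library; the `Observable` class, its `from_iterable`, `map` and `filter` methods, and the `even_squares` helper are all hypothetical names chosen for illustration, and the sketch is synchronous, so it leaves out the threading and scheduling concerns that real implementations handle.

```python
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Observable(Generic[T]):
    """Toy push-based sequence: subscribing pushes each item to the observer."""

    def __init__(self, subscribe: Callable[[Callable[[T], None]], None]) -> None:
        self._subscribe = subscribe

    @staticmethod
    def from_iterable(items: Iterable[T]) -> "Observable[T]":
        def subscribe(on_next: Callable[[T], None]) -> None:
            for item in items:
                on_next(item)
        return Observable(subscribe)

    def map(self, fn: Callable[[T], U]) -> "Observable[U]":
        # Returns a new observable that transforms each item before passing it on.
        def subscribe(on_next: Callable[[U], None]) -> None:
            self._subscribe(lambda item: on_next(fn(item)))
        return Observable(subscribe)

    def filter(self, predicate: Callable[[T], bool]) -> "Observable[T]":
        # Returns a new observable that forwards only items matching the predicate.
        def subscribe(on_next: Callable[[T], None]) -> None:
            def forward(item: T) -> None:
                if predicate(item):
                    on_next(item)
            self._subscribe(forward)
        return Observable(subscribe)

    def subscribe(self, on_next: Callable[[T], None]) -> None:
        self._subscribe(on_next)


def even_squares(values: Iterable[int]) -> List[int]:
    """Declaratively compose filter and map, then collect what the observer receives."""
    received: List[int] = []
    (
        Observable.from_iterable(values)
        .filter(lambda x: x % 2 == 0)
        .map(lambda x: x * x)
        .subscribe(received.append)
    )
    return received
```

Nothing runs until `subscribe` is called; the operators only describe the pipeline, which is the declarative composition the quote refers to.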
ApacheCon 2021 - Apache NiFi Deep Dive 300 - Timothy Spann
21-September-2021 - ApacheCon - Tuesday 17:10 UTC Apache NiFi Deep Dive 300
* https://github.com/tspannhw/EverythingApacheNiFi
* https://github.com/tspannhw/FLiP-ApacheCon2021
* https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
* https://github.com/tspannhw/FLiP-IoT
* https://github.com/tspannhw/FLiP-Energy
* https://github.com/tspannhw/FLiP-SOLR
* https://github.com/tspannhw/FLiP-EdgeAI
* https://github.com/tspannhw/FLiP-CloudQueries
* https://github.com/tspannhw/FLiP-Jetson
* https://www.linkedin.com/pulse/2021-schedule-tim-spann/
For Data Engineers who already have flows in production, I will dive deep into best practices, advanced use cases, performance optimizations, tips, tricks, edge cases, and interesting examples. This is a master class for those looking to quickly learn the things I have picked up after years in the field with Apache NiFi in production.
This will be interactive and I encourage questions and discussions.
You will take away examples and tips in slides, GitHub, and articles.
This talk will cover:
Load Balancing
Parameters and Parameter Contexts
Stateless vs Stateful NiFi
Reporting Tasks
NiFi CLI
NiFi REST Interface
DevOps
Advanced Record Processing
Schemas
RetryFlowFile
Lookup Services
RecordPath
Expression Language
Advanced Error Handling Techniques
Tim Spann is a Developer Advocate @ StreamNative where he works with Apache NiFi, Apache Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Rust has something unique to offer that languages in that space have never had before, and that is a degree of safety that languages like C and C++ have never had. Rust promises to deliver equivalent or better performance and greater productivity with guaranteed memory safety and data race freedom while allowing complete and direct control over memory.
This video will cover:
What is Rust?
Benefits of Rust
Rust Ecosystem
Popular Applications in Rust
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2PeX3wC.
Steve Klabnik gives an overview of Rust’s history, diving into the technical details of how the design has changed, and talks about the difficulties of adding a major new feature to a programming language. Filmed at qconnewyork.com.
Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language". He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.
What are some of the performance implications of using lambdas, and what strategies can be used to address them? When might we want an alternative to a lambda, and how can we design our APIs to be flexible in this regard? What are the principles of writing low latency code in Java? How do we tune and optimize our code for low latency? When don’t we optimize our code? Where does the JVM help and where does it get in our way? How does this apply to lambdas? How can we design our APIs to use lambdas and minimize garbage?
Jdd2014: High performance logging - Peter Lawrey - PROIDEA
In many applications, there is a tension between how much you can log without slowing down your application, and how much information you would like to have.
Chronicle provides a number of solutions which allow you to record millions of events per second, with microsecond latencies, in a persisted way and without contributing to your garbage.
How does this simplify the design and help you increase the determinism and vertical scalability of your application?
The Internet is asynchronous, people are asynchronous, the universe is asynchronous. They are now and they always will be. Writing applications which deal correctly with asynchronous data is difficult. Or at least it was. Microsoft open sourced ReactiveX in 2010 to make what used to be some of the hairiest kinds of coding almost easy.
The project was so well received that it has been ported to nearly every major programming language. Versions of ReactiveX exist for .NET, JavaScript, Java, Scala, Clojure, C++, Ruby, Python, Groovy, JRuby, Kotlin, and Swift. The project is open source and community maintained, with corporate backing from the likes of Microsoft and Netflix.
Microsoft created ReactiveX, then called Reactive Extensions, from the burnt-out remains of Project Volta. Project Volta's goal was to extend .NET to run both on the server and in the browser. A compiler would decide which parts were best to put where. It was essentially the Meteor framework in 2007.
In this talk we will take a deep look at ReactiveX. We will use code samples to show how things are done before and after ReactiveX. The code will be in C# and JavaScript. We will see how ReactiveX makes our lives as developers easier and our code more reactive.
Producer Performance Tuning for Apache Kafka - Jiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
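As a rough illustration of the trade-off described above, the sketch below groups a few standard Kafka producer properties (`acks`, `linger.ms`, `batch.size`, `compression.type`, `enable.idempotence`) into three profiles. The property names are real producer settings, but the specific values, the `producer_config` helper and the `localhost:9092` broker address are assumptions made for illustration; they are not the settings recommended in the talk and would need to be measured against a real workload.

```python
def producer_config(goal: str) -> dict:
    """Return an illustrative producer configuration for one tuning goal.

    The values are placeholders meant to show the direction of each knob,
    not benchmarked recommendations.
    """
    base = {"bootstrap.servers": "localhost:9092"}  # hypothetical broker address

    profiles = {
        # Send immediately with small batches: lower latency, more requests.
        "latency": {
            "acks": "1",
            "linger.ms": 0,
            "batch.size": 16384,
            "compression.type": "none",
        },
        # Wait a little to fill larger, compressed batches: higher throughput.
        "throughput": {
            "acks": "1",
            "linger.ms": 50,
            "batch.size": 262144,
            "compression.type": "lz4",
        },
        # Require acknowledgement from all in-sync replicas and avoid duplicates.
        "durability": {
            "acks": "all",
            "enable.idempotence": True,
            "linger.ms": 5,
            "batch.size": 65536,
            "compression.type": "lz4",
        },
    }

    if goal not in profiles:
        raise ValueError(f"unknown tuning goal: {goal!r}")
    return {**base, **profiles[goal]}
```

A dictionary like this can be passed to a client that accepts these property names; the point of the sketch is only that the same few settings are pulled in different directions depending on the goal.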
Prometheus’s simple and reliable operational model is one of its major selling points. However, beyond a certain scale, we have identified a few shortcomings it imposes. We are proud to present Thanos, an open source project by Improbable that bundles a set of components that seamlessly transform existing Prometheus deployments into a unified, global scale monitoring system.
Authors: Fabian Reinartz, Bartlomiej Plotka
Slides from January London Prometheus Meetup 2018.
Thanos: https://github.com/improbable-eng/thanos
Cloud Native Night June 2019, Munich: Talk by Moritz Kammerer (Software Architect at QAware)
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: Container startup times are becoming increasingly important for elastic scaling in the cloud. After all, who wants to wait 2 minutes for a new instance of a service to come up? This is where Spring Boot, the shooting star of Java-based microservices, fails to shine. By design, container startup time grows with the number of classes on the classpath. The memory consumption of a Spring Boot application is not exactly low either.
Micronaut, created by the makers of the Grails framework, claims to get these problems under control. In this talk we look at how microservices can be developed with Micronaut and check whether it keeps its promises.
Bonus: Since Micronaut uses hardly any reflection, it should be possible to compile it into a native image with the GraalVM AOT compiler. In the talk we will try this out and measure the performance of that native image.
Keeping Latency Low and Throughput High with Application-level Priority Manag... - ScyllaDB
Throughput and latency are in constant tension. ScyllaDB CTO and co-founder Avi Kivity will show how high throughput and low latency can both be achieved in a single application by using application-level priority scheduling.
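The general idea of scheduling work by priority inside the application can be sketched as follows. This is a deliberately simple strict-priority queue built on Python's `heapq`, not the mechanism presented in the talk; the `PriorityScheduler` class and the task names are hypothetical, and a production scheduler would also need some way to stop low-priority work from starving.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class PriorityScheduler:
    """Run queued tasks in priority order (lower number runs first)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Callable[[], None]]] = []
        # Monotonic counter keeps submission order among equal priorities.
        self._counter = itertools.count()

    def submit(self, priority: int, name: str, task: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name, task))

    def run(self) -> List[str]:
        """Execute every queued task and return the names in execution order."""
        order: List[str] = []
        while self._heap:
            _, _, name, task = heapq.heappop(self._heap)
            task()
            order.append(name)
        return order


def demo_order() -> List[str]:
    scheduler = PriorityScheduler()
    # Background maintenance is queued first but given a lower priority.
    scheduler.submit(10, "compaction", lambda: None)
    scheduler.submit(0, "user-read", lambda: None)
    scheduler.submit(10, "repair", lambda: None)
    return scheduler.run()
```

In this sketch the latency-sensitive `user-read` task runs ahead of the background tasks even though it was submitted later, while the background tasks keep their submission order.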
SignalFx engineer Rajiv Kurian's presentation on why we wrote our own Kafka consumer, the performance goals, and the performance gains achieved.
Download the slides to see animations showing hardware details. These slides were converted from Keynote to PowerPoint, so there may be some oddness with slide transitions!
Google Analytics is an analytics tool of which only part of the potential is widely known. Besides measuring audiences and their behaviour, Google Analytics makes it possible to prioritize online marketing investments, capture behaviour in Single Page Applications, and visualize offline data, for example from a CRM, and combine it with online visit data. It can also collect real-time sales data, for example from ecommerce, and data from physical devices such as Bluetooth beacons. The features of Google Analytics, combined with BigQuery and other Google Cloud Platform services, make it one of the most interesting analytics platforms for digital transformation.
To watch the video in which this presentation was used, click here: https://www.youtube.com/watch?v=2mfIU-NXGXI
To see the event announcement on our website, click here: https://www.paradigmadigital.com/eventos/usar-google-analytics/
For the event announcement on the Meetup.com group, click here: https://www.meetup.com/es-ES/Front-end-Developers-Madrid/events/231793469/
How do you define a digital transformation roadmap? At Paradigma we have spent more than 20 years helping large companies on their path to digitalization.
On 17 May, the monthly meetup of the Python Madrid group was held at the Paradigma Digital offices. Pablo González Fuente, from GMV, spoke to us about Python and Flink.
Video of the event: https://www.youtube.com/watch?v=HPfNDL-jIGM
How is Couchbase deployed and autoscaled in the cloud? Learn hands-on! - Paradigma Digital
At the last meetup we gave a general introduction to Couchbase, but the time has come to dig into the details of the product and get to know all its capabilities. This will put us in a better position to adopt it in our projects.
This time we will talk about Couchbase's operations and deployment layer, not with a traditional approach on physical machines but following current industry good practices. We will explain and carry out the deployment on Google Cloud with elastic, automatic horizontal scaling.
To do this we will use, among others, the following technologies: Google Cloud, Kubernetes, Python and, of course, Couchbase.
We will put our infrastructure to the test with a small application. If you want to see the results, you can't miss it!
We are a digital-native company, created for the Internet and with a different way of doing things. Over the last 10 years we have built a company without hierarchies, with a culture based on freedom and responsibility, which has allowed us to become the technology partner of some of the major Spanish companies. We tell you what our secret is. Do you want to get to know Paradigma's digital culture?
JavaScript keeps expanding and advancing. And now, with the arrival of ECMAScript 6, certain workflows and the way we write code are going to change. JavaScript developers will now have more tools in our hands. There are already popular frameworks, such as Angular, whose future versions will come in ECMAScript 6. Why not take a look at the future?
Talk given by Luis Calvo at the latest edition of Codemotion (Madrid, Spain - Nov 21-22)
How can we use open source tools to understand complex site graphs?
Web crawlers need well-connected websites. Large ecommerce/news websites and feed readers are graphs with hundreds of thousands of vertices (web pages) and edges (links between them). Understanding these graphs has a direct effect on usability and SEO.
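One simple way to inspect how well connected such a graph is can be sketched with a breadth-first search over an adjacency list. The example below uses only the Python standard library; the `click_depths` and `unreachable_pages` helpers and the tiny `SITE` graph are hypothetical and only illustrate the idea, they are not the tools discussed in the talk.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical site graph: each page maps to the pages it links to.
SITE: Dict[str, List[str]] = {
    "/": ["/news", "/shop"],
    "/news": ["/news/1"],
    "/shop": ["/"],
    "/old-promo": ["/shop"],  # links out, but nothing links to it
}


def click_depths(links: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def unreachable_pages(links: Dict[str, List[str]], start: str) -> Set[str]:
    """Pages that appear in the graph but cannot be reached by following links from start."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return all_pages - set(click_depths(links, start))
```

Pages with a large click depth, or pages that a crawler starting at the home page never reaches, are natural candidates for a closer look when reviewing internal linking.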
Greach is the leading event in Spain on technologies based on the Groovy language.
As part of this event, the talk 'Use Groovy & Grails in your Spring Boot projects' offers examples and possibilities for introducing this language, and some modules of the Grails framework (also based on Groovy), into projects implemented with Spring Boot, the solution recently launched by Spring.
More info:
http://buff.ly/1DXXQWU
This presentation shows what reactive programming is, what it consists of, what it lets us do, and why it is so fashionable. We can also see a concrete use of this style of programming with RxJava.
Author: Juan Pablo González de Gracia.
Video: https://www.youtube.com/edit?video_id=gQNcDyT2qnc
Description of the Google Analytics platform: how the tracking code works, metrics and dimensions, event hits, A/B tests, clientID and user_id.
At Paradigma we believe that the big digital dragons have displaced traditional companies. The key to fighting those dragons is digital transformation.
Kubernetes is an open source project from Google whose purpose is to act as a container orchestrator. This seminar aims to build a foundation starting from the most fundamental principles, so that anyone with a basic grasp of containers can understand how Kubernetes works and what utilities it offers for managing containers.
Speaker: Alfredo Espejel, systems engineer at Paradigma
Alfredo has almost 10 years of experience in systems administration, mainly Linux. He is also interested in networking, but above all in the latest trends and technologies.
Video of the talk: https://www.youtube.com/watch?v=zI16fatmnVQ
More information about the meetup: http://www.meetup.com/Cloud-Computing-Spain/events/226254765/
What lies beneath what we call "HTML5" is the conversion into native features of "frameworks" and/or technologies used every day. The browser thus becomes an increasingly powerful application as HTML5 itself grows more powerful. Web Components go in that direction, making native the "templating", the "custom tags", the "imports" (the "includes" of other languages) and the "shadow DOM". And here I am, not ready for any of it...
Talk given by Luis Calvo at the latest edition of Codemotion (Madrid, Spain - Nov 21-22)
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research - AI Frontiers
In this talk at the AI Frontiers conference, Jeff Dean discusses recent trends and developments in deep learning research. Jeff touches on the significant progress that this research has produced in a number of areas, including computer vision, language understanding, translation, healthcare, and robotics. These advances are driven both by new algorithmic approaches to some of these problems and by the ability to scale computation for training ever larger models on larger datasets. Finally, one of the reasons for the rapid spread of the ideas and techniques of deep learning has been the availability of open source libraries such as TensorFlow. He gives an overview of why these software libraries have an important role in making the benefits of machine learning available throughout the world.
Alex Smola at AI Frontiers: Scalable Deep Learning Using MXNet - AI Frontiers
In this talk at the AI Frontiers conference, Alex Smola gives a brief overview of the features used to scale deep learning using MXNet. It relies on a mix of declarative and imperative programming to achieve efficiency while also allowing for significant flexibility for the user. It relies on a distributed (key, value) store for synchronization between GPUs and between machines. It also relies on the separation between a highly efficient execution engine and language bindings to achieve a high degree of flexibility between different languages while offering a native feel in each of them. Alex also briefly discusses how Amazon AWS can help deploy deep learning models and outlines steps on the future roadmap.
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu... - StreamNative
Kafka-on-Pulsar has been one of the most anticipated features in the Pulsar ecosystem. The Kafka-on-Pulsar project was initiated by StreamNative and the OVHCloud team quickly joined the project to collaborate on its development. Kafka-on-Pulsar enables Kafka applications to leverage Pulsar’s powerful features, such as streamlined operations with enterprise-grade multi-tenancy, without modifying code.
In this webinar, Sijie Guo, from StreamNative, and Pierre Zemb, from OVHCloud, will introduce KoP and discuss the following:
1. What are the key benefits?
2. What is the protocol handler and how does it work?
3. How is KoP implemented?
4. What are the new use cases it unlocks?
5. Watch a Live Demo!
DevFest UK & Ireland 2022: Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp - Timothy Spann
As the Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache Streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit. Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed.
I will walk through how to get started, some use cases and demos and answer questions.
https://www.devfest-uki.com/schedule
https://linktr.ee/tspannhw
Build real time stream processing applications using Apache Kafka - Hotstar
This talk was presented at the Hotstar Scale Meetup in Bangalore by Jayesh Sidhwani.
In this talk, the presenter introduces Apache Kafka and the Apache Kafka Streams library. Starting from the need for building streaming applications and moving on to thinking of use cases as streaming jobs, this talk covers all the technicalities.
It ends with a short description of how Kafka is deployed and used at Hotstar.
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8 - Timothy Spann
https://github.com/tspannhw/FLiPN-Conf42-KubeNative-2022
https://www.conf42.com/Kube_Native_2022_Tim_Spann_realtime_pulsar_apps_k8
I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi.
Apache Pulsar
StreamNative
FLiPN Stack
Kafka on Pulsar: bringing native Kafka protocol support to Pulsar_Sijie&Pierre - StreamNative
Apache Pulsar is a distributed and open-source pub-sub messaging system. It offers many advantages over Kafka, such as multi-tenancy, geo-replication, decoupled storage, and even directly integrated SQL and FaaS. The only thing missing for wide adoption is support for the de-facto standard for streaming: Kafka. And this is how our story begins.
In this talk, Sijie Guo from StreamNative and Pierre Zemb from OVHcloud will share the journey of building Kafka-on-Pulsar (KoP) to bring native Kafka protocol support to Pulsar. Before joining forces on KoP, OVHcloud implemented a Kafka proxy in Rust capable of transforming the Kafka protocol to that of Pulsar on the fly, and encountered some challenges. After realizing that StreamNative was working on bringing the Kafka protocol natively to the Pulsar broker via a pluggable protocol handler mechanism, OVHcloud joined forces with StreamNative to work on bringing Kafka protocol support to Pulsar brokers.
At the end of this talk, you will know more about the inner workings of Kafka and Pulsar. You'll also get feedback from both companies from their initial proofs of concepts and the current implementation.
Hana Lee compares some popular options for data engineering work — including Python and Rust, Datafusion and Pandas — to explore the tradeoffs and help determine the ideal stack for your data engineering needs.
BKK16-408B Data Analytics and Machine Learning From Node to Cluster - Linaro
Linaro is building an OpenStack based Developer Cloud. Here we present what was required to bring OpenStack to 64-bit ARM, the pitfalls, successes and lessons learnt; what’s missing and what’s next.
ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar - Timothy Spann
In this session I will get you started with real-time cloud native streaming programming with Java, Golang, Python and Apache NiFi. If there’s a preferred language that the attendees pick, we will focus only on that one. I will start off with an introduction to Apache Pulsar and setting up your first easy standalone cluster in Docker. We will then go into terms and architecture so you have an idea of what is going on with your events. I will then show you how to produce and consume messages to and from Pulsar topics, as well as how to use some of the command line and REST interfaces to monitor, manage and do CRUD on things like tenants, namespaces and topics. We will discuss Functions, Sinks, Sources, Pulsar SQL, Flink SQL and Spark SQL interfaces. We will also discuss why you may want to add protocols such as MoP (MQTT), AoP (AMQP/RabbitMQ) or KoP (Kafka) to your cluster. We will also look at WebSockets as a producer and consumer. I will demonstrate a simple web page that sends and receives Pulsar messages with basic JavaScript. After this session you will be able to build simple real-time streaming and messaging applications with the language or tool of your choice.
apache pulsar
tim spann developer advocate
streamnative
datainmotion.dev
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
ScyllaDB
Numberly operates business-critical data pipelines and applications where failure and latency mean "lost money" in the best-case scenario. Most of those data pipelines and applications are deployed on Kubernetes and rely on Kafka and ScyllaDB, where Kafka acts as the message bus and ScyllaDB as the source of some data enrichment. The availability and latency of both systems are thus very important because they mix and match data in the early stage of their pipelines to be consumed by their platforms.
Most of their applications are developed using Python. But they always felt that they could benefit from a lower-level programming language to squeeze the performance of their hardware even further for some of the most demanding applications. So, when an important part of their data pipeline was to be adjusted to reflect some important changes in their platforms, they thought it was a great opportunity to rewrite it in Rust!
Moving to Rust was hard, not only because of the language itself, but because being at a lower level allowed them to see, test, and demonstrate things that they could not pinpoint or explain that well using Python. They spent a lot of time analyzing the latency impacts of code patterns and client driver settings and ended up contributing to Apache Avro as they went down the rabbit hole.
This session will share their experience transitioning from Python to Rust while meeting the expectations of a business-critical application mixing data from Confluent Kafka and ScyllaDB. There will be code snippets, graphs, numbers, tears, pull requests, grins, latency results, smiles, rants of frustration, and a lot of fun!
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
José Román Martín Gil
Apache Kafka is the data streaming broker most widely used by companies. It can easily manage millions of messages, and it is the foundation of many architectures based on events, microservices, orchestration and, now, cloud environments. OpenShift is the most widespread Platform as a Service (PaaS). It is based on Kubernetes and helps companies easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the foundation for many architectures based on stateless applications to build new cloud-native applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you manage and deploy Apache Kafka brokers in OpenShift environments.
These slides introduce Strimzi as a new component on OpenShift to manage your Apache Kafka clusters.
Slides used at OpenShift Meetup Spain:
- https://www.meetup.com/es-ES/openshift_spain/events/261284764/
NYC Dec 2022 Meetup_ Building Real-Time Requires a Team
Timothy Spann
https://www.meetup.com/new-york-city-apache-pulsar-meetup/events/289817171/
We are excited to invite you to an in-person meetup, that is all about streaming data!
If you are interested in Apache Pinot, Apache Pulsar, Apache Flink, and Apache NiFi, this event is for you!
AGENDA
6:00 - 6:30 PM EST: Food, Drink, and Networking!!!
6:30 - 7:15 PM EST: Introduction to Real-Time Analytics with Apache Pinot: David G. Simmons @ StarTree
7:15 - 8:00 PM EST: Building Real-Time Requires a Team: Tim Spann, Developer Advocate @ StreamNative
8:00 - 8:30 PM EST: Round Table
8:30 - 9:00 PM EST: Q&A + Networking
----
“Building Real-Time Requires a Team” - Tim Spann, Developer Advocate @ StreamNative
This talk will discuss building real-time streaming applications utilizing the best open-source systems. This FLiPPN Stack consisting of Apache Flink, Apache Pulsar, Apache Pinot, and Apache NiFi supercharges any real-time app or pipeline building. I will walk you through the what, why, how, and where to use these amazing tools. We will walk through some demos and dive into the source. At the end, we will have driven real-time events into Apache Pinot, setting the stage for your real-time user-facing insights.
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar™, Apache Flink®, Flink® SQL, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal, and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit, and many more.
Pronouns: He, They
https://github.com/tspannhw/SpeakerProfile
“Introduction to Real-Time Analytics with Apache Pinot” - David G. Simmons, Head of Developer Advocacy @ StarTree
https://notist.davidgs.com/
We will simulcast: https://streamnative.zoom.us/meeting/register/tZcpf-iurTojHtTJjwhi87e_iKJJYhrONpAG
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
Igalia
By Andy Wingo.
Snabb is an open-source toolkit for building fast, flexible network functions. Since its beginnings in 2012, Snabb has seen some modest deployment success ranging from simple one-off diagnosis tools to border routers that process all IPv4 traffic for entire countries. This talk will give an introduction to Snabb. After going over Snabb's fundamental components and how they combine, the talk will move on to examples of how network engineers are taking advantage of Snabb in practice, mentioning a few of the many open-source network functions built on Snabb.
(c) RIPE 77
15 - 19 October 2018
Amsterdam, Netherlands
https://ripe77.ripe.net
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Michael Noll
These are the slides of my Kafka talk at Apache: Big Data Europe in Budapest, Hungary. Enjoy! --Michael
Apache Kafka is a high-throughput distributed messaging system that has become a mission-critical infrastructure component for modern data platforms. Kafka is used across a wide range of industries by thousands of companies such as Twitter, Netflix, Cisco, PayPal, and many others.
After a brief introduction to Kafka, this talk will provide an update on the growth and status of the Kafka project community. The rest of the talk will focus on walking the audience through what's required to put Kafka in production. We’ll give an overview of the current ecosystem of Kafka, including: client libraries for creating your own apps; operational tools; peripheral components required for running Kafka in production and for integration with other systems like Hadoop. We will cover the upcoming project roadmap, which adds key features to make Kafka even more convenient to use and more robust in production.
Microservice architecture aims to maximize the adaptability of solutions by distributing the software's responsibilities across services with independent life cycles.
Achieving independence between microservices is key to reaping the benefits of the architecture. This requires a deep understanding of the functional domain, which is achieved through DDD.
Hexagonal architecture, in turn, lets us structure the software so that the layer of code related to the functional domain is not disturbed by technological concerns; that is, so that this layer expresses only the Ubiquitous Language, which is what DDD calls the language of the model.
This separation into layers, together with dependency inversion, also guarantees maximum portability of the code.
What are we going to see?
1. Benefits
2. Domain Driven Design.
- Concepts - Big Picture.
- Concepts - Code architecture.
- Event Storming.
3. Clean Code Architecture.
- Hexagonal Architecture.
- Onion Architecture.
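The layering and dependency inversion described above can be sketched in a few lines of Python. This is only an illustrative sketch, not material from the talk: the `Order`, `OrderRepository`, `PlaceOrder` and `InMemoryOrderRepository` names are hypothetical. The domain layer declares a port (an interface written in domain terms), and an adapter outside the domain implements it.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    """Domain entity, expressed only in the language of the model."""
    order_id: str
    total: float


class OrderRepository(Protocol):
    """Port: the domain states what it needs, not how it is stored."""

    def save(self, order: Order) -> None: ...

    def get(self, order_id: str) -> Order: ...


class PlaceOrder:
    """Domain use case: depends only on the port, never on a technology."""

    def __init__(self, repository: OrderRepository):
        self.repository = repository

    def execute(self, order_id: str, total: float) -> Order:
        if total <= 0:
            raise ValueError("an order must have a positive total")
        order = Order(order_id, total)
        self.repository.save(order)
        return order


class InMemoryOrderRepository:
    """Adapter: one interchangeable implementation of the port."""

    def __init__(self):
        self._orders = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]


repository = InMemoryOrderRepository()
use_case = PlaceOrder(repository)
use_case.execute("A-1", 42.0)
print(repository.get("A-1"))
```

Swapping `InMemoryOrderRepository` for a database-backed adapter would not require touching `PlaceOrder`, which is the portability benefit the abstract refers to.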
Bots 3.0: Leaving Conversational Bots Behind with Dialogflow.
Paradigma Digital
Personalized customer service and automation of operations with AI, made simple with Dialogflow. By the end of this talk you will be able to create a bot with Dialogflow that handles simple tasks.
In this talk we will cover:
- The business needs that this kind of solution addresses
- Alternatives on the market
- Meeting the need with Dialogflow
Speaker: Alex Asensio - Business Lead at Paradigma Digital
Pragmatic and always focused on business goals. In love with technology, but also with the way we deliver software to our clients, based on "empiricism". Tech + Biz hand in hand is the formula for success that we want to share with them.
In this new installment on service mesh we will look at what will probably become the reference product: Istio.
We will analyze the features it provides, its internal architecture, its integration with third-party products, and its impact on current architectures. We will work through some examples to show its functionality and configuration.
Speaker:
Abraham Rodríguez specializes in cloud-native solutions with microservice architectures, a stack he has worked with on various projects. A passionate advocate of everything related to cloud, agile methodologies, free software and DevOps.
In this presentation we talk about Linkerd, one of the pioneers in the field of "Service Mesh architectures". We will review the history of this product, get to know its main features, and have a hands-on part in which we show its integration into distributed architectures alongside Docker and Kubernetes.
How do I make my APIs usable?
Through an example developed in Spring, we will see how to carry out the whole design process, applying a set of good practices that help you make decisions when you face API design.
In this meetup we will analyze one of the basic pillars of companies' digital transformation: API Management. To do so, we will explain what this strategy consists of, and the different concepts and components involved in it. In addition, to round out this view with a practical case, we will show an example implementation using one of the most successful open source API Management products on the market: WSO2.
https://www.meetup.com/Microservicios
Solr is an open source search engine that provides very powerful tools for searching text fields. This talk will cover its basic characteristics and main features, both basic (indexing and search) and advanced (highlighting, spell checking and similar results).
Presented by Alejandro Marqués
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
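As a rough illustration of the computation these notes set out to parallelize, here is a plain sequential PageRank power iteration in Python. It is not OpenMP and not the report's code: the `pagerank` function, the damping factor of 0.85 and the tolerance are assumptions chosen for the sketch. The per-vertex loops are the kind of work a multithreaded version could distribute across cores.

```python
def pagerank(out_links, damping=0.85, tolerance=1e-10, max_iterations=500):
    """Sequential power iteration.

    out_links maps every vertex to the list of vertices it links to.
    Vertices without out-links (dangling) spread their rank uniformly.
    """
    vertices = sorted(out_links)
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}

    for _ in range(max_iterations):
        # Rank mass held by dangling vertices, shared by everyone.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}

        # Each vertex pushes an equal share of its rank along its out-links.
        for v in vertices:
            targets = out_links[v]
            if targets:
                share = damping * ranks[v] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        # L1 distance between successive iterations decides convergence.
        error = sum(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if error < tolerance:
            break
    return ranks


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
for vertex, rank in pagerank(graph).items():
    print(vertex, round(rank, 4))
```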
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence.
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Learn SQL from basic queries to Advance queries
manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performance
roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
5. Kafka and Python
Python Madrid · Python and Kafka
Kafka: What is it?
“If you think of Hadoop as long-term memory, the question then is how you get the memories in there to begin with. Apache Kafka is like the central nervous system, which collects all of these messages from the underlying systems and transmits them into the memory vault, or storage.”
- Eric Vishria
6. Kafka and Python
Kafka Motivation
To be able to act as a unified platform for handling all the real-time data feeds a large company might have.
[Diagram: data feeds such as event tracking, application logs, application messages and application monitoring data]
7. Kafka and Python
Kafka: How?
● Distributed, the essence
● Scalable
● Efficient
● Durable, fault tolerance
8. Kafka and Python
Kafka Basics
[Diagram: producers (P), Kafka cluster, consumers (C)]
● Producers
● Brokers
● Consumers
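To make the three roles concrete, here is a toy in-memory model in Python. It is a deliberately simplified sketch and does not use any Kafka client library: `ToyBroker`, `ToyProducer` and `ToyConsumer` are hypothetical names, the "broker" is a single process holding one append-only list per topic, and each consumer simply remembers the offset it has read up to.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def append(self, topic, message):
        """Append a message and return its offset in the topic log."""
        log = self.logs[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset, max_messages=10):
        """Return up to max_messages starting at the given offset."""
        return self.logs[topic][offset:offset + max_messages]


class ToyProducer:
    """Publishes messages to a topic on the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class ToyConsumer:
    """Reads a topic sequentially, remembering its own position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        batch = self.broker.read(self.topic, self.offset)
        self.offset += len(batch)
        return batch


broker = ToyBroker()
producer = ToyProducer(broker)
for event in ("login", "click", "logout"):
    producer.send("events", event)

fast = ToyConsumer(broker, "events")
slow = ToyConsumer(broker, "events")
print(fast.poll())  # the three events published so far
producer.send("events", "login")
print(fast.poll())  # only the newly published event
print(slow.poll())  # the whole log, read independently of the other consumer
```

Because messages stay in the log after being read, consumers can progress at different speeds without affecting each other, which is the property this sketch is meant to show.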
12. Kafka and Python
Kafka Consumers
● “Subscribe” to a feed
● Consumer groups
○ Queue
○ Publish-subscribe
● Order guarantees
[Diagram: Kafka cluster with Broker1 and Broker2 hosting Partition 0 and Partition 1, read by consumers (C)]
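The way consumer groups combine queue and publish-subscribe semantics can be simulated in a few lines of Python. This is a sketch under stated assumptions, not Kafka's actual assignment algorithm: `assign_partitions` and `deliver` are hypothetical helpers, partitions are handed out round-robin, and the group and consumer names are made up. Within a group each partition is read by exactly one consumer (queue behaviour), while every group receives every message (publish-subscribe behaviour). Order is preserved only within a partition.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return dict(assignment)


def deliver(topic, groups):
    """Return {group: {consumer: [messages]}} for a partitioned topic.

    topic maps a partition id to its ordered list of messages;
    groups maps a group name to the list of its consumers.
    """
    delivered = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(topic.keys(), consumers)
        delivered[group] = {
            consumer: [msg for p in parts for msg in topic[p]]
            for consumer, parts in assignment.items()
        }
    return delivered


topic = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"]}
groups = {"billing": ["c1", "c2"], "audit": ["c3"]}
result = deliver(topic, groups)
print(result["billing"])  # {'c1': ['a0', 'a1', 'a2'], 'c2': ['b0', 'b1']}
print(result["audit"])    # {'c3': ['a0', 'a1', 'a2', 'b0', 'b1']}
```

In the "billing" group the load is split between two consumers, while the single consumer in "audit" sees the full feed; in both cases the messages of a given partition arrive in their original order.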
13. Kafka and Python
Kafka Efficiency
● Small I/O problem
○ Message sets
● Message set compression
○ policies
● Standard binary message format
○ Transfer without modifications
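The benefit of grouping messages into sets and compressing the set as a whole can be illustrated with the Python standard library. This is a sketch only: Kafka uses its own binary format and configurable codecs, whereas the JSON encoding, `zlib`, and the `encode_message_set` and `decode_message_set` helpers here are assumptions made for the example.

```python
import json
import zlib


def encode_message_set(messages):
    """Serialize a batch of messages into one compressed payload."""
    raw = json.dumps(messages).encode("utf-8")
    return zlib.compress(raw)


def decode_message_set(payload):
    """Inverse of encode_message_set."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Typical event data is repetitive: same keys, few distinct values.
messages = [
    {"user": f"user-{i % 5}", "action": "page_view", "page": "/home"}
    for i in range(200)
]

# Option 1: compress and send each message on its own.
one_by_one = sum(
    len(zlib.compress(json.dumps(m).encode("utf-8"))) for m in messages
)

# Option 2: compress the whole message set in one go.
batched = len(encode_message_set(messages))

print(f"compressed individually: {one_by_one} bytes")
print(f"compressed as one set:   {batched} bytes")
```

Compressing the set lets the codec exploit redundancy across messages, and sending one payload instead of 200 also reduces the number of small I/O operations, which is the problem the slide names.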