These slides, co-produced by Angelo Corsaro and Thomas Bernhardt, introduce stream processing with OpenSplice and Esper. The slides also provide a basic introduction to OpenSplice DDS and the Esper CEP engine.
This presentation explains the foundations of stream processing and shows how elegant stream processing architectures can be built by using DDS and CEP in synergy.
Slides of my talk about the DashProfiler perl module, which enables lightweight always-on performance monitoring for critical sections of code. See
http://search.cpan.org/perldoc?DashProfiler
Abstract:
Many machine learning algorithms can be implemented to run parallel operations on graphics cards. Deeplearning4j is a Java-based machine learning library which includes implementations of many popular neural-network algorithms. Deeplearning4j uses a library called Nd4j to run matrix algebra operations on either CPUs or GPUs with NVIDIA’s CUDA API.
In this talk, I will show how to get a simple machine learning algorithm running on the GPU. I will also cover how to get started with CUDA development: how to get your code to run on the GPU, how to monitor the device, and how to write code to make effective use of parallelization.
Bio: Gary Sieling is a Lead Software Engineer at IQVIA, in Blue Bell, PA, with interests in database technologies, machine learning, and software engineering practices. He has been involved in curating talks for a company lunch-and-learn program and in the organizing committee for a tech conference. Building on these experiences, he built a search engine called FindLectures.com to help find great talks and speakers.
Aggregate Sharing for User-Defined Data Stream Windows (Paris Carbone)
Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows was extensively studied in the past through the use of aggregate sharing techniques such as Panes and Pairs, little to no work has been put into optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions, which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods, or is not optimized at all.
In this paper we present a technique to perform efficient aggregate sharing for data stream windows which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves on and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open source stream processing system, and performed experiments demonstrating orders-of-magnitude reductions in aggregation costs compared to the state of the art.
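As a point of reference for the aggregate-sharing techniques mentioned above, here is a minimal Python sketch of the classic Panes idea: partial aggregates are computed once per pane and shared by every overlapping window. It illustrates the baseline that Cutty improves upon, not Cutty itself; the window parameters are made up for the example.

```python
# Toy illustration of pane-based aggregate sharing (the "Panes" technique
# referenced above): a sliding window of range 4 and slide 2 is split into
# panes of size gcd(range, slide) = 2. Each pane's partial sum is computed
# once and reused by every window that covers it.
from math import gcd

def pane_sums(stream, pane_size):
    """Partial aggregate (sum) per non-overlapping pane."""
    return [sum(stream[i:i + pane_size])
            for i in range(0, len(stream), pane_size)]

def window_sums(stream, win_range, slide):
    pane = gcd(win_range, slide)
    partials = pane_sums(stream, pane)
    panes_per_win = win_range // pane
    step = slide // pane
    # Each window combines a few pane partials instead of re-scanning
    # all of its input elements.
    return [sum(partials[i:i + panes_per_win])
            for i in range(0, len(partials) - panes_per_win + 1, step)]

stream = [1, 2, 3, 4, 5, 6, 7, 8]
print(window_sums(stream, win_range=4, slide=2))  # [10, 18, 26]
```

Each window of four elements is answered from two pane partials, so the per-element work is shared across all windows that overlap it.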
Graphs as Streams: Rethinking Graph Processing in the Streaming Era (Vasia Kalavri)
Streaming is the latest hot topic in the big data world. We want to process data immediately and continuously. Modern stream processors have matured significantly and offer exceptional features, including sub-second latencies, high throughput, fault-tolerance, and seamless integration with various data sources and sinks.
Many sources of streaming data consist of related or connected events: user interactions in a social network, web page clicks, movie ratings, product purchases. These connected events can be naturally represented as edges in an evolving graph.
In this talk I will explain how we can leverage a powerful stream processor, such as Apache Flink, and academic research of the past two decades, to build graph streaming applications. I will describe how we can model graphs as streams and how we can compute graph properties without storing and managing the graph state. I will introduce useful graph summary data structures and show how they allow us to build graph algorithms in the streaming model, such as connected components, bipartiteness detection, and distance estimation.
Full Video: https://www.youtube.com/watch?v=cOShsisEsC0
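As a concrete flavour of the streaming model described in the abstract above, here is a minimal Python sketch of connected components over an edge stream: a union-find structure serves as the graph summary, so the edge set is never stored. This is an illustrative toy, not Flink code.

```python
# A minimal sketch of one of the streaming-graph algorithms mentioned above:
# connected components over an edge stream. A union-find summary keeps one
# entry per vertex, so we never materialise the edge set.

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

edge_stream = [("a", "b"), ("c", "d"), ("b", "c"), ("e", "f")]
uf = UnionFind()
for u, v in edge_stream:
    uf.union(u, v)          # each edge is processed once, then discarded

components = {uf.find(x) for x in uf.parent}
print(len(components))      # 2 components: {a,b,c,d} and {e,f}
```

The summary grows with the number of vertices, not edges, which is what makes this practical on unbounded edge streams.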
An overview of the relation and combination of three data processing paradigms that are becoming more relevant today. It introduces the essentials of graph, distributed and stream computing and beyond. Furthermore, it questions the fundamental problems that we want to solve with data analysis, and the potential of eventually saving humankind in the next millennium by improving the state of the art of computation technologies while being too busy answering first-world-problem questions. Crazy but possible.
Internet of Things and Complex event processing (CEP)/Data fusion (BAINIDA)
ปริญญา หิรัญปัณฑาพร
Data Analytics/Advanced Analytics at Allianz Ayudhya
M.Sc. (NIDA)
Presented at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the School of Applied Statistics and DATA SCIENCES THAILAND
Day 3 of the series, on reading assessment: what counts, what is measured, what is valued, and what informs our daily instruction. A sampling of instructional sequences with high ceilings and low floors.
Focus 2 - Principles of Applied Psychology in Software (Valentin Bora)
The second step, after lowering barriers, is influencing the user toward behavior that is desirable for the business served by the site/software product you develop.
Learn a few psychological principles that can be used to encourage/influence the behavior of your visitors in the direction you want.
Open Source Tools and Frameworks for M2M - Sierra Wireless Developer Days (Benjamin Cabé)
The first Sierra Wireless Developer Days took place on June 14, 2013. This is the presentation I gave about Sierra Wireless open-source activities and the technologies being delivered together with the Eclipse M2M Industry Working Group.
Join our developer community at http://developer.sierrawireless.com
The Data Distribution Service for Real-Time Systems (DDS) is an Object Management Group (OMG) standard for publish/subscribe designed to address the needs of a large class of mission- and business-critical distributed real-time systems and systems of systems. The DDS standard was formally adopted in 2004 and, in less than five years from its inception, experienced swift adoption in a wide variety of application domains. These application domains are characterized by the need to distribute high volumes of data with predictable low latencies, such as Radar Processors, Flying and Land Drones, Combat Management Systems, Air Traffic Management, High Performance Telemetry, Large Scale Supervisory Systems, and Automated Stocks and Options Trading. Along with wide commercial adoption, the DDS standard has been recommended and mandated as the technology for real-time data distribution by key administrations worldwide, such as the US Navy, the DoD Information-Technology Standards Registry (DISR), the UK MoD, and EUROCONTROL.
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101 (DataStax Academy)
Speaker(s): Patrick McFadin, Chief Evangelist for Apache Cassandra at DataStax
Relational systems have always been built on the premise of modeling relationships. As you will see, static schema, one-to-one, many-to-many still have a place in Cassandra. From the familiar, we’ll go into the specific differences in Cassandra and tricks to make your application fast and resilient.
Phil Day [Configured Things] | Policy-Driven Real-Time Data Filtering from IoT Sensors with Flux (InfluxData)
Policy-Driven Real-Time Data Filtering from IoT Sensors with Flux
Data is central to any smart city, and valuable to a range of different consumers. However, access to the data has to be balanced against privacy concerns to ensure that each recipient only receives the set and quality of data they are authorized to access. This talk describes a solution developed around InfluxDB and Flux which filters data in real time according to a declarative policy model and delivers it securely via WebSocket data streams.
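The talk implements the filtering in Flux; the declarative-policy idea itself can be sketched in a few lines of Python. The role names, fields and precision rules below are invented for illustration only.

```python
# Pure-Python sketch of the declarative-policy idea described above (the
# talk itself implements it in Flux). A policy states, per consumer role,
# which fields may pass and at what precision; the filter applies it to
# each sensor reading.

POLICIES = {
    # role:       allowed field -> decimal places kept
    "city_ops": {"lat": 5, "lon": 5, "pm25": 2},
    "public":   {"pm25": 0},            # no location data at all
}

def filter_reading(reading, role):
    policy = POLICIES.get(role, {})     # unknown roles receive nothing
    result = {}
    for field, value in reading.items():
        places = policy.get(field)
        if places is not None:
            result[field] = round(value, places)
    return result

reading = {"lat": 51.50735, "lon": -0.12776, "pm25": 12.347}
print(filter_reading(reading, "public"))    # {'pm25': 12.0}
print(filter_reading(reading, "city_ops"))  # full-precision view
```

The policy is pure data, so it can be audited and changed without touching the filtering code, which is the appeal of the declarative approach described in the talk.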
Introduction to data modeling with Apache Cassandra (Patrick McFadin)
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is a tour of how to get started, translating the knowledge you already have into the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
The Data Distribution Service (DDS) is a standard for efficient and ubiquitous data sharing built upon the concept of a strongly typed, distributed data space. Its ability to scale from resource-constrained embedded systems to ultra-large-scale distributed systems has made DDS the technology of choice for applications such as Power Generation, Large Scale SCADA, Air Traffic Control and Management, Smart Cities, Smart Grids, Vehicles, Medical Devices, Simulation, Aerospace, Defense and Financial Trading.
This two-part webcast provides an in-depth introduction to DDS – the universal data sharing technology. Specifically, we will introduce (1) the DDS conceptual model and data-centric design, (2) DDS data modeling fundamentals, (3) the complete set of C++ and Java APIs, (4) the most important programming, data modeling and QoS idioms, and (5) the integration between DDS and web applications.
After attending this webcast you will understand how to exploit DDS architectural features when designing your next system, how to write idiomatic DDS applications in C++ and Java, and which fundamental patterns you should adopt in your applications.
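Although the webcast's examples are in C++ and Java, two of the data-centric ideas it covers, strongly typed topics and a KEEP_LAST history cache for late-joining readers, can be sketched in-process. The toy below is a conceptual sketch only, not the OMG DDS API.

```python
# In-process toy of two DDS ideas discussed in the webcast: strongly typed
# topics and the HISTORY QoS (how many past samples a late-joining reader
# can see). This is a conceptual sketch, not the OMG DDS API.
from collections import deque
from dataclasses import dataclass

@dataclass
class Temperature:        # the topic's data type
    sensor_id: str
    celsius: float

class Topic:
    def __init__(self, name, dtype, history_depth=1):
        self.name, self.dtype = name, dtype
        self.cache = deque(maxlen=history_depth)   # KEEP_LAST(history_depth)
        self.readers = []

    def write(self, sample):
        assert isinstance(sample, self.dtype)      # strong typing
        self.cache.append(sample)
        for cb in self.readers:
            cb(sample)

    def subscribe(self, cb):
        self.readers.append(cb)
        for sample in self.cache:                  # late joiner gets history
            cb(sample)

topic = Topic("RoomTemp", Temperature, history_depth=2)
topic.write(Temperature("s1", 20.5))
topic.write(Temperature("s1", 21.0))

received = []
topic.subscribe(received.append)                   # joins after two writes
print([s.celsius for s in received])               # [20.5, 21.0]
```

In real DDS the same decoupling happens across machines and QoS is far richer (durability, reliability, deadlines), but the topic-plus-history mental model is the same.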
Introduced in 2004, the Data Distribution Service (DDS) has been steadily growing in popularity and adoption. Today, DDS is at the heart of a large number of mission and business critical systems, such as, Air Traffic Control and Management, Train Control Systems, Energy Production Systems, Medical Devices, Autonomous Vehicles, Smart Cities and NASA’s Kennedy Space Centre Launch System.
Considering the technological trends toward data-centricity and the rate of adoption, tomorrow DDS will be at the heart of an incredible number of Industrial IoT systems.
To help you become an expert in DDS and exploit your skills in the growing DDS market, we have designed the DDS in Action webcast series. This series is a learning journey through which you will (1) discover the essence of DDS, (2) understand how to effectively exploit DDS to architect and program distributed applications that perform and scale, (3) learn the key DDS programming idioms and architectural patterns, (4) understand how to characterise DDS performance and configure it for optimal latency/throughput, (5) grow your system to Internet scale, and (6) secure your DDS system.
This presentation provides a historical perspective on the development of the DDS-PSM-Cxx and its relationship with simd-cxx 0.x and simd-cxx v1.0.
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with... (Hakka Labs)
New DNA sequencing technologies are revolutionizing the life sciences by generating extremely large data sets. Traditional tools for processing this data will have difficulty scaling to the coming deluge of genomics data. We discuss how the innovations of Hadoop and Spark are solving core problems that enable scientists to address questions that were previously out of reach.
This was the opening presentation of the Zenoh Summit in June 2022. The presentation goes through the motivations that led to the design of the zenoh protocol and provides an introduction to its core concepts. This is the place to start to understand why you should care about zenoh and the way in which it disrupts existing technologies.
The recording for this presentation is available at https://bit.ly/3QOuC6i
Zenoh is a rapidly growing Eclipse project that unifies data in motion, data at rest and computations. It elegantly blends traditional pub/sub with geo-distributed storage, queries and computations, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks. This presentation will provide an introduction to Eclipse Zenoh along with a crisp explanation of the challenges that motivated the creation of this project. We will go through a series of real-world use cases that demonstrate the advantages brought by Zenoh in enabling and optimising typical edge scenarios and in simplifying the development of distributed applications at any scale.
Data Decentralisation: Efficiency, Privacy and Fair Monetisation (Angelo Corsaro)
A presentation given at the European H-Cloud Conference to motivate decentralisation as a means to improve energy efficiency, privacy, and the opportunities to monetise your digital footprint.
zenoh: zero overhead pub/sub store/query compute (Angelo Corsaro)
Zenoh unifies data in motion, data in use, data at rest and computations.
It carefully blends traditional pub/sub with distributed queries, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks.
It provides built-in support for geo-distributed storage and distributed computations.
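That blend of pub/sub with storage and queries can be illustrated with a small, self-contained sketch: a storage is just a subscriber that remembers samples, so the same keyed data can be consumed live or queried later. This is a pure-Python toy of the concept, not the zenoh API; key expressions and wildcard matching are simplified stand-ins.

```python
# Toy sketch of the unification described above: publishing a keyed value
# both notifies live subscribers (data in motion) and updates a store that
# can answer later queries (data at rest). Not the zenoh API.
import fnmatch

class Session:
    def __init__(self):
        self.subscribers = []   # (key_expr, callback) pairs
        self.storage = {}       # key -> latest value

    def subscribe(self, key_expr, cb):
        self.subscribers.append((key_expr, cb))

    def put(self, key, value):
        self.storage[key] = value                   # data at rest
        for expr, cb in self.subscribers:
            if fnmatch.fnmatch(key, expr):          # data in motion
                cb(key, value)

    def get(self, key_expr):
        return {k: v for k, v in self.storage.items()
                if fnmatch.fnmatch(k, key_expr)}

s = Session()
live = []
s.subscribe("demo/sensor/*", lambda k, v: live.append((k, v)))
s.put("demo/sensor/temp", 21.3)
s.put("demo/sensor/hum", 54)
print(live)                         # delivered as it was published
print(s.get("demo/sensor/*"))       # same data, answered from storage
```

The point of the sketch is that producers do not care whether consumers are live subscribers or later queriers; that symmetry is what the "unifies data in motion and data at rest" claim refers to.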
zenoh -- the ZEro Network OverHead protocol (Angelo Corsaro)
This presentation introduces the key ideas behind zenoh -- an Internet scale data-centric protocol that unifies data-sharing between any kind of device including those constrained with respect to the node resources, such as computational resources and power, as well as the network.
Fog computing aims at providing horizontal, system-level abstractions to distribute computing, storage, control and networking functions closer to the user along a cloud-to-thing continuum. Whilst fog computing is increasingly recognised as the key paradigm at the foundation of the Consumer and Industrial Internet of Things (IoT), most of the initiatives on fog computing focus on extending cloud infrastructure. As a consequence, these infrastructures fall short in addressing the heterogeneity and resource constraints characteristic of fog computing environments.
fog⌀5 (read as fog O-five or fog OS) is an Eclipse IoT project that is building a fog computing infrastructure from first principles. In other terms, fog⌀5 has been designed to address the challenges induced by fog computing in terms of heterogeneity, decentralisation, resource constraints, geographical scale and security.
This webcast will introduce fog⌀5, motivate its architecture and building blocks as well as provide a demonstration of fog⌀5 provisioning applications that span from the cloud to the things.
The video recording for this presentation is available at https://www.youtube.com/watch?v=Osl3O5DxHF8
Making the right data available at the right time, at the right place, securely, efficiently, whilst promoting interoperability, is a key need for virtually any IoT application. After all, IoT is about leveraging access to data – data that used to be unavailable – in order to improve the ability to react, manage, predict and preserve a cyber-physical system.
The Data Distribution Service (DDS) is a standard for interoperable, secure, and efficient data sharing, used at the foundation of some of the most challenging Consumer and Industrial IoT applications, such as Smart Cities, Autonomous Vehicles, Smart Grids, Smart Farming, Home Automation and Connected Medical Devices.
In this presentation we will (1) introduce the Eclipse Cyclone DDS project, (2) provide a quick intro that will get you started with Cyclone DDS, (3) present a few Cyclone DDS use cases, and (4) share the Cyclone DDS development road-map.
Fog computing is a paradigm that complements and extends cloud computing by providing an end-to-end virtualisation of computing, storage and communication resources. As such, fog computing allows applications to be transparently provisioned and managed end-to-end. This presentation first motivates the need for fog computing, then introduces fog05, the first and only open-source fog computing platform!
Data Sharing in Extremely Resource Constrained Environments (Angelo Corsaro)
This presentation introduces XRCE, a new protocol for very efficiently distributing data in resource-constrained (power, network, computation, and storage) environments. XRCE greatly improves the wire efficiency of existing protocols and in many cases provides higher-level abstractions.
RUSTing is not a tutorial on the Rust programming language.
I decided to create the RUSTing series as a way to document and share programming idioms and techniques.
From time to time I’ll draw parallels with Haskell and Scala, having some familiarity with one of them is useful but not indispensable.
Observability Concepts EVERY Developer Should Know - DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) 2022.
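The seed-trimming idea described above can be sketched in a few lines of Python: repeatedly drop a byte and keep the smaller seed whenever a coverage signature is unchanged. This toy is illustrative only; it is not the DIAR algorithm, and the parser is an invented stand-in for a real fuzzing target.

```python
# Toy version of the seed-trimming idea above: repeatedly drop bytes whose
# removal leaves the target's "coverage" unchanged, so later mutations are
# spent only on bytes that matter. (Illustrative; not the DIAR algorithm.)

def coverage(data: bytes) -> frozenset:
    """Stand-in for a fuzzing target: which branches a toy parser takes."""
    hits = set()
    if data.startswith(b"ELF"):
        hits.add("magic_ok")
        if b"\x01" in data[3:]:
            hits.add("version_1")
    return frozenset(hits)

def trim_seed(seed: bytes) -> bytes:
    baseline = coverage(seed)
    i = 0
    while i < len(seed):
        candidate = seed[:i] + seed[i + 1:]   # try dropping byte i
        if coverage(candidate) == baseline:
            seed = candidate                  # byte was uninteresting
        else:
            i += 1                            # byte matters; keep it
    return seed

seed = b"ELF\x00\x00\x01paddingpadding"
trimmed = trim_seed(seed)
print(trimmed, len(trimmed))  # only the bytes the parser cares about remain
```

On this toy target the 20-byte seed shrinks to the 4 bytes that actually drive branches, so every subsequent mutation hits a byte the parser looks at.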
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GridMate - End to end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Enhancing adoption of Open Source Libraries: a case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
3. DDS Overview
...from a Stream Processing Perspective
Angelo CORSARO, Ph.D.
Chief Technology Officer
OMG DDS SIG Co-Chair
PrismTech
angelo.corsaro@prismtech.com
4. Stream Processing [1/3]
! Stream Processing is an architectural style for building systems that operate over continuous (theoretically infinite) streams of data
! Stream Processing is often realized in one of its many variants, such as:
! Reactive Systems
! Signal Processing Systems
! Functional Stream Programming
! Data Flow Systems
5. Stream Processing [2/3]
! The Stream Processing Architecture very naturally models systems that react to streams of data and events produced by the external world, such as the data produced by sensors, a camera, or even the stock exchange
! Stream Processing Systems usually operate in real-time over streams and in turn generate other streams of data providing information on what is happening or suggesting actions to perform, such as "buy stock X", "raise alarm Y", or "spatial violation detected"
6. Stream Processing [3/3]
! Stream Processing Systems are typically modeled as collections of modules communicating via typed data channels, usually called streams
! Modules usually play one of the following roles:
! Sources: inject data into the system
! Filters/Actors: perform some computation over streams
! Sinks: consume the data produced by the system
(diagram: a stream source feeding a chain of Filters into a Sink)
8. Defining Streams
! In abstract terms, a stream is an infinite sequence of data samples of a given type T
! Streams can be further classified into continuous and discrete streams, sometimes referred to as Behaviors/Signals and Events
! In this presentation we'll refer to Continuous Streams as Data Streams and to Discrete Streams as Event Streams
9. Data Streams
! The value of a Data Stream is always defined, i.e. continuous
! Good examples of a Data Stream are the values assumed by a real-world entity, such as temperature, pressure, a price, etc.
(diagram: a continuous Temp curve plotted over time)
10. Event Streams
! The value of the stream is defined at precise points in time, i.e. it is discrete
! Good examples of Event Streams are events in the real world, such as a violation of a regulatory compliance, the temperature rising above a given value, etc.
(diagram: discrete OverheatAlarm events plotted over time)
11. [A Stream Perspective]
What is DDS?
! DDS is a high-performance, real-time, highly-available, fully-distributed messaging technology that allows you to define data/event streams and make them dynamically discoverable
! DDS is equipped with a rich set of QoS policies providing control over the key temporal and availability properties of data
13. DDS Topics [1/2]
“org.opensplice.demo.TTempSensor”
! A Topic defines a stream class/category
! A Topic has an associated user-defined type and QoS
! The Topic name, type and QoS define the key functional and non-functional invariants
! Topics can be discovered or locally defined

  struct TempSensor {
    long id;
    float temp;
    float hum;
  };
  #pragma keylist TempSensor id

Associated QoS: DURABILITY, DEADLINE, PRIORITY, …
14. DDS Topics [2/2]
“org.opensplice.demo.TTempSensor”
! DDS Topic types can have associated keys
! Each unique key-value identifies a unique sub-stream of values -- called a Topic Instance

  struct TempSensor {
    long id;
    float temp;
    float hum;
  };
  #pragma keylist TempSensor id

Associated QoS: DURABILITY, DEADLINE, PRIORITY, …
15. “Seeing” Streams
  struct TempSensor {
    @key long id;
    float temp;
    float hum;
  };

Each observed key value (e.g. id = 701, id = 809, id = 977) identifies a distinct Topic Instance.
17. Anatomy of a DDS Application
// Create a DomainParticipant -- gives access to a DDS Domain
val dp = DomainParticipant(0)

// Create a Publisher / Subscriber
Publisher p = dp.create_publisher();
Subscriber s = dp.create_subscriber();

// Create a Topic
Topic<TempSensor> t = dp.create_topic<TempSensor>("com.myco.TSTopic")

// Create a DataWriter/DataReader -- readers/writers for user-defined types
DataWriter<TempSensor> dw = pub.create_datawriter(t);
DataReader<TempSensor> dr = sub.create_datareader(t);

(entities: DomainParticipant, Publisher/Subscriber Session, Topic, DataWriter, DataReader)
18. Anatomy of a DDS Application
// Create a DomainParticipant -- gives access to a DDS Domain
val dp = DomainParticipant(0)

// Create a Publisher / Subscriber -- pub/sub abstractions
val pub = Publisher(dp)
val sub = Subscriber(dp)

// Create a Topic
val topic = Topic[TempSensor](dp, "org.opensplice.demo.TTempSensor")

// Create a DataWriter/DataReader -- readers/writers for user-defined types
DataWriter<TempSensor> dw = pub.create_datawriter(t);
DataReader<TempSensor> dr = sub.create_datareader(t);
19. Anatomy of a DDS Application
// Create a DomainParticipant -- gives access to a DDS Domain
val dp = DomainParticipant(0)

// Create a Publisher / Subscriber
val pub = Publisher(dp)
val sub = Subscriber(dp)

// Create a Topic
val topic = Topic[TempSensor](dp, "org.opensplice.demo.TTempSensor")

// Create a DataWriter/DataReader for the application-defined Topic types
val writer = DataWriter[TempSensor](pub, topic)
val reader = DataReader[TempSensor](sub, topic)

// Write data
val ts = new TempSensor(101, 25, 40)
writer write ts
20. Data & Event Streams
! DDS does not provide different types for Data/Event Streams. The difference between the two can be made through the DataReader API by properly using the read/take operations
! DataReader::read
! Reads the value of the stream w/o removing it from the stream. As a result, multiple reads can see the last known value of the stream
! DataReader::take
! Takes the value available on the stream (if any) and removes it from the stream
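The read/take distinction can be sketched with a small toy model in plain Java rather than the actual DDS API (the StreamCache class and its method names below are illustrative only): read observes the last known value without consuming it, while take removes what it returns.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (NOT the DDS API) of the read vs take semantics.
class StreamCache<T> {
    private final Deque<T> samples = new ArrayDeque<>();

    void write(T sample) {
        samples.addLast(sample);
    }

    // read: observe the most recent value without consuming it, so
    // repeated reads keep seeing the last known value (data-stream style).
    T read() {
        return samples.peekLast();
    }

    // take: consume the oldest available value (or null if none), removing
    // it so each sample is observed exactly once (event-stream style).
    T take() {
        return samples.pollFirst();
    }
}
```

With two samples written, read() keeps answering the latest one, while successive take() calls drain the cache in arrival order.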
22. Content Filtered Topics
! Content Filtered Topics provide a way of defining a filter over an incoming stream associated with a given topic
! Filters are expressed as the "WHERE" clause of an SQL statement
! Filters can operate on any attribute of the type associated with the topic

Example:
  // Create a Topic (on default domain)
  val topic = Topic[TempSensor]("TTempSensor")
  val ftopic = ContentFilteredTopic[TempSensor]("CFTempSensor",
                                                topic,
                                                filter,
                                                params)
  // - filter is a WHERE-like clause, such as:
  //     "temp > 20 AND hum > 50"
  //     "temp > %0"
  //     "temp > hum"
  //     "temp BETWEEN (%0 AND %1)"
  //
  // - params is the list of parameters to pass to the
  //   filter expression - if any
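Purely for intuition, the filter "temp > 20 AND hum > 50" can be rendered as an ordinary predicate over the sample type in plain Java (the TempSample class is hypothetical; DDS evaluates the filter inside the middleware, not in application code):

```java
import java.util.function.Predicate;

// Toy stand-in for the content filter "temp > 20 AND hum > 50".
class TempSample {
    final double temp;
    final double hum;

    TempSample(double temp, double hum) {
        this.temp = temp;
        this.hum = hum;
    }

    // Only samples matching this predicate would reach the reader.
    static final Predicate<TempSample> FILTER =
        s -> s.temp > 20 && s.hum > 50;
}
```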
23. Filter Expression Syntax
! DDS Filters are conditions over a topic type's attributes
! Temporal properties or causality cannot be captured via DDS filter expressions
24. History
! DDS provides a way of controlling data windows through the History QoS
! The window keeps the last n data samples: data older than "n samples ago" gets out of the window
(diagram: a sliding window over past samples, anchored at "now")
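A "keep last n" history window of this kind can be sketched in plain Java (HistoryWindow is a hypothetical class, not the DDS API, where History QoS is configured declaratively on readers and writers):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a "keep last n" history window: at most `depth` samples are
// retained; anything older than "n samples ago" falls out of the window.
class HistoryWindow {
    private final int depth;
    private final Deque<Double> window = new ArrayDeque<>();

    HistoryWindow(int depth) {
        this.depth = depth;
    }

    void add(double sample) {
        if (window.size() == depth) {
            window.removeFirst(); // evict the oldest sample
        }
        window.addLast(sample);
    }

    // Aggregate over the retained history, e.g. a moving average.
    double average() {
        if (window.isEmpty()) return 0;
        double sum = 0;
        for (double s : window) sum += s;
        return sum / window.size();
    }
}
```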
25. [Putting it All Together]
TempSensor Moving Average
object MovingAverageFilter {
  def main(args: Array[String]) {
    if (args.length < 2) {
      println("USAGE:\n\tMovingAverageFilter <window> <filter-expression>")
      return
    }
    val topic = Topic[TempSensor]("TTempSensor")
    val ftopic = ContentFilteredTopic[TempSensor]("CFTempSensor", topic, args(1))
    val rqos = DataReaderQos() <= KeepLastHistory(args(0).toInt)
    val reader = DataReader[TempSensor](ftopic, rqos)
    reader.reactions += {
      case e: DataAvailable[_] => {
        var average: Float = 0
        val window = e[TempSensor].reader.history
        window foreach (average += _.temp)
        average = average / window.length
        println("+--------------------------------------------------------")
        println("Moving Average: " + average)
      }
    }
  }
}
27. Product Organization
Commercial Edition
! No Cost Runtime Licenses
! Your choice of licensing: LGPL or Commercial; Subscription or Perpetual
! Complete DDS Implementation
! Comprehensive Developer and Deployment Support Options with a range of Service Level Agreements

Commercial Add-Ons
! Individually licensable technologies
! Rich ecosystem covering tools, integration, testing, etc.
28. Key Points So Far
! DDS provides key abstractions for building stream processing architectures
! DDS provides some event processing capabilities that facilitate the development of Stream Processing Filters
! What else can you use for "Stream Processing" in combination with DDS?
! Let's have Tom introduce us to the world of CEP and Esper!
43. EPL Select Statement Syntax
[insert into insert_into_def]
select select_list
from stream_def [as name] [, stream_def [as name]] [,...]
[where search_conditions]
[group by grouping_expression_list]
[having grouping_search_conditions]
[output output_specification]
[order by order_by_expression_list]
Example:
select acctId, sum(amount)
from Withdrawal.win:time(1 minute)
group by acctId
having sum(amount) > 1000
order by acctId asc
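As a rough plain-Java analogue of this query (the WithdrawalAggregator class is illustrative only; Esper evaluates the aggregation incrementally over a sliding one-minute window rather than over a finished batch of events):

```java
import java.util.Map;
import java.util.TreeMap;

// Sum withdrawal amounts per account, keep only accounts whose total
// exceeds the threshold (HAVING), with keys sorted like ORDER BY acctId.
class WithdrawalAggregator {
    static Map<String, Double> totalsOver(double threshold, Object[][] events) {
        Map<String, Double> totals = new TreeMap<>();
        for (Object[] e : events) { // each event is {acctId, amount}
            totals.merge((String) e[0], (Double) e[1], Double::sum);
        }
        // HAVING sum(amount) > threshold
        totals.values().removeIf(sum -> sum <= threshold);
        return totals;
    }
}
```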
55. EPL Pattern Examples

every login=LoginEvent -> logout=LogOutEvent where timer:within(2 sec)
every [2] login=LoginEvent
every [2..] login=LoginEvent until logout=LogoutEvent
77. OpenSplice + Esper
! Esper provides an EsperIO framework for plugging in new stream transports
! Plugging OpenSplice into Esper is trivial even w/o relying on the EsperIO framework
! Let’s have a look…
79. iShapes Application
! To explore and play with OpenSplice and Esper, we'll use the simd-cxx ishapes application
(In the iShapes GUI, spotted shapes represent subscriptions; pierced shapes represent publications.)
! Three Topics: Circle, Square, Triangle
! One Type:

  struct ShapeType {
    string color;
    long x;
    long y;
    long shapesize;
  };
  #pragma keylist ShapeType color
80. Esper Setup
Step 1: Register Topic Types

  val config = new Configuration
  val ddsConf = new ConfigurationEventTypeLegacy
  ddsConf.setAccessorStyle(ConfigurationEventTypeLegacy.AccessorStyle.PUBLIC)
  config.addEventType("ShapeType",
                      classOf[org.opensplice.demo.ShapeType].getName,
                      ddsConf)

  val cep: EPServiceProvider =
    EPServiceProviderManager.getDefaultProvider(config)
81. Esper Setup
Step 2: Register a Listener for receiving Esper Events

  val listener = new UpdateListener {
    def update(ne: Array[EventBean], oe: Array[EventBean]) {
      ne foreach (e => {
        // Handle the event
      })
    }
  }
83. iShapes FrameRate
! Let's suppose that we wanted to keep under control the iShapes frame rate for each given color
! In Esper this can be achieved with the following expression:
insert into ShapesxSec
select color, count(*) as cnt
from ShapeType.win:time_batch(1 second)
group by color
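The grouping step of this query can be mimicked in plain Java (FrameRateCounter is a hypothetical class; Esper's win:time_batch additionally takes care of releasing one batch per second, which is omitted here):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Count the shape samples observed in one batch window, grouped by color.
class FrameRateCounter {
    static Map<String, Long> countsPerBatch(List<String> colorsInBatch) {
        Map<String, Long> counts = new HashMap<>();
        for (String color : colorsInBatch) {
            counts.merge(color, 1L, Long::sum);
        }
        return counts;
    }
}
```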
84. iShapes Center of Mass
! Suppose that we wanted to compute the center of mass of all the shapes displayed over the last 10 seconds
! The Esper expression for this would be:

select ShapeFactory.createShape(color, cast(avg(x), int), cast(avg(y), int), shapesize) as NewShape
from ShapeType.win:time(10 sec)
85. References
OpenSplice DDS - #1 OMG DDS Implementation, Open Source - www.opensplice.org
Esper - #1 Java-Based CEP Engine, Open Source - www.espertech.com
Scala - Fastest growing JVM Language, Open Source - www.scala-lang.org
Escalier - Scala API for OpenSplice DDS, Open Source - code.google.com/p/escalier