Complex Event Processing (CEP) is a powerful technology for analyzing fast, distributed streams of data in real-time distributed environments and deriving conclusions from them. CEP permits defining complex events on top of the events produced by the incoming sources, in order to identify meaningful situations and respond to them as quickly as possible. However, in many situations the information that needs to be analyzed is not structured as a mere sequence of events, but as graphs of interconnected data that evolve over time. This paper proposes an extension of CEP systems that permits dealing with graph-structured information. Two case studies are used to validate the proposal and to compare its performance with traditional CEP systems. We discuss the benefits and limitations of the CEP extensions presented.
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz (Ververica)
Apache Flink is one of the first open source stream processors that was able to address the full spectrum of stream processing applications, ranging from applications with low latency requirements to applications that process millions of events per second. On top of this powerful processing engine, the Flink community built APIs for complex event processing and streaming analytics, namely the CEP library and support for streaming SQL.
The Flink community has recently been integrating both APIs by extending Flink SQL to support the MATCH_RECOGNIZE clause for row pattern matching, which was introduced with the SQL:2016 standard.
I will discuss the new MATCH RECOGNIZE feature and present use cases that benefit from pattern matching support in streaming SQL, such as process monitoring or anomaly detection. I will demonstrate the feature with a few example queries.
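As a hedged illustration of what such a query looks like (my own minimal sketch, not taken from the webinar), the following Java snippet runs a MATCH_RECOGNIZE statement through Flink's Table API; the Ticker table with symbol, price, and rowtime columns, and the price threshold of 10, are assumptions:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Sketch: detect a price drop followed by a recovery with MATCH_RECOGNIZE.
// Requires flink-table and flink-cep on the classpath; assumes a "Ticker"
// table (symbol, price, rowtime) has been declared elsewhere.
public class MatchRecognizeSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        tEnv.executeSql(
                "SELECT * FROM Ticker MATCH_RECOGNIZE (" +
                "  PARTITION BY symbol" +
                "  ORDER BY rowtime" +
                "  MEASURES FIRST(DOWN.rowtime) AS drop_start," +
                "           LAST(UP.rowtime)    AS recovery_end" +
                "  ONE ROW PER MATCH" +
                "  AFTER MATCH SKIP PAST LAST ROW" +
                "  PATTERN (DOWN+ UP)" +
                "  DEFINE DOWN AS DOWN.price < 10," +
                "         UP   AS UP.price   > 10" +
                ") AS T").print();
    }
}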
Real-Time Processing of Spatial Data Using Kafka Streams, Ian Feeney & Roman Kolesnev (HostedbyConfluent)
Real-Time Processing of Spatial Data Using Kafka Streams, Ian Feeney & Roman Kolesnev | Current 2022
Kafka Streams applications can process fast-moving, unbounded streams of data. This gives us the capability to process and react to events from many sources in near real time as they converge in Kafka. However, if the events in these data streams have a spatial component and their spatial relationships with each other determine how they should be processed or reacted to, this raises some fundamental challenges. Determining that, for example, a person is within an area or that routes are intersecting requires access to geospatial operations which are not readily available in Kafka Streams.
In this talk, we will first set the scene with a geospatial 101. Then, using a simplified taxi hailing use case, we will look at two approaches for processing spatial data with Kafka Streams. The first approach is a naive approach which uses Kafka Streams DSL, geohashing and the Java Spatial4j library. The second approach is a prototype which replaces the RocksDB statestore with Apache Lucene (an embedded storage engine with powerful indexing, search and geospatial capabilities), and implements a stateful spatial join with the Transformer API.
This talk will give you an appreciation of geospatial use cases and how Kafka Streams could enable them. You will see the role the state store plays in stateful processing and the implications for geospatial processing. It will also show you what is involved in integrating a custom state store with Kafka Streams. Overall, this talk will give you an understanding of how you might go about building custom processing capabilities on top of Kafka Streams for your own use cases.
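As a hedged sketch of the first (geohash re-keying) approach described above, not the speakers' actual code: key each position event by a geohash prefix so that spatially close events are co-partitioned and can be joined or aggregated locally. Topic names and the "lat,lon" string encoding of values are assumptions.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.locationtech.spatial4j.io.GeohashUtils;

import java.util.Properties;

public class GeohashRekey {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> positions = builder.stream("vehicle-positions");
        positions
                .selectKey((key, value) -> {
                    String[] p = value.split(",");             // value assumed "lat,lon"
                    double lat = Double.parseDouble(p[0]);
                    double lon = Double.parseDouble(p[1]);
                    // precision 5 gives roughly 5 km x 5 km cells
                    return GeohashUtils.encodeLatLon(lat, lon, 5);
                })
                .to("positions-by-geohash");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "geo-rekey");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}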
Using the new extended Berkeley Packet Filter capabilities in Linux to improve the performance of auditing security-relevant kernel events around network, file, and process actions.
Introducing the Apache Flink Kubernetes Operator (Flink Forward)
Flink Forward San Francisco 2022.
The Apache Flink Kubernetes Operator provides a consistent approach to manage Flink applications automatically, without any human interaction, by extending the Kubernetes API. Given the increasing adoption of Kubernetes-based Flink deployments, the community has been working on a Kubernetes-native solution as part of Flink that can benefit from the rich experience of community members and ultimately make Flink easier to adopt. In this talk we give a technical introduction to the Flink Kubernetes Operator and demonstrate the core features and use cases through in-depth examples.
by Thomas Weise
Video: https://data-artisans.com/flink-forward-berlin/resources/monitoring-flink-with-prometheus
Live Demo Code: https://github.com/mbode/flink-prometheus-example
Prometheus is a cloud-native monitoring system prioritizing reliability and simplicity – and Flink works really well with it! This session will show you how to leverage the Flink metrics system together with Prometheus to improve the observability of your jobs. There will be a live demo establishing how everything ties in together. The talk is aimed at people already building and running Flink jobs who would like to gain more insight into them. It is fine if you are not familiar with Prometheus yet, as the basic concepts will be introduced. If you have ever wondered how you could use modern monitoring tools to be alerted in the middle of the night in case your Flink job's 99th percentile end-to-end latency degraded for some reason, this might just be the talk you are looking for.
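As a small, hedged sketch of the Flink side of this (the Prometheus reporter itself is enabled in flink-conf.yaml, not shown here), a job can expose a custom counter through Flink's metric system, which a configured reporter then makes available for scraping:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Minimal sketch: register a custom counter with Flink's metric system.
// The metric name "eventsProcessed" is an arbitrary example.
public class CountingMapper extends RichMapFunction<String, String> {
    private transient Counter eventCounter;

    @Override
    public void open(Configuration parameters) {
        this.eventCounter = getRuntimeContext().getMetricGroup().counter("eventsProcessed");
    }

    @Override
    public String map(String value) {
        eventCounter.inc();   // incremented per record; scraped by the reporter
        return value;
    }
}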
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges (Altinity Ltd)
From webinars September 11 and September 17, 2019
ClickHouse is famous for speed. That said, you can almost always make it faster! This webinar uses examples to teach you how to deduce what queries are actually doing by reading the system log and system tables. We'll then explore standard ways to increase query speed: data types and encodings, filtering, join reordering, skip indexes, materialized views, session parameters, to name just a few. In each case we'll circle back to query plans and system metrics to demonstrate changes in ClickHouse behavior that explain the boost in performance. We hope you'll enjoy the first step to becoming a ClickHouse performance guru!
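For flavor, here is a minimal sketch (mine, not the webinar's) of the "read the system tables" technique it teaches: listing the slowest recent queries from system.query_log over plain JDBC. The connection URL is a placeholder, and a ClickHouse JDBC driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SlowQueries {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:clickhouse://localhost:8123/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT query, query_duration_ms, read_rows " +
                     "FROM system.query_log " +
                     "WHERE type = 'QueryFinish' " +
                     "ORDER BY query_duration_ms DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.printf("%6d ms  %12d rows  %s%n",
                        rs.getLong("query_duration_ms"),
                        rs.getLong("read_rows"),
                        rs.getString("query"));
            }
        }
    }
}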
Speaker Bio:
Robert Hodges is CEO of Altinity, which offers enterprise support for ClickHouse. He has over three decades of experience in data management spanning 20 different DBMS types. ClickHouse is his current favorite. ;)
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli... (Flink Forward)
Netflix’s playback data records every user interaction with video on the service, from trailers on the home page to full-length movies. This is a critical dataset with high volume that is used broadly across Netflix, powering product experiences, AB test metrics, and offline insights. In processing playback data, we depend heavily on event-time partitioning to handle a long tail of late arriving events. In this talk, I’ll provide an overview of our recent implementation of generic event-time partitioning on high volume streams using Apache Flink and Apache Iceberg (Incubating). Built as configurable Flink components that leverage Iceberg as a new output table format, we are now able to write playback data and other large scale datasets directly from a stream into a table partitioned on event time, replacing the common pattern of relying on a post-processing batch job that “puts the data in the right place”. We’ll talk through what it took to apply this to our playback data in practice, as well as challenges we hit along the way and tradeoffs with a streaming approach to event-time partitioning.
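The core idea, shown as a hypothetical sketch rather than Netflix's actual code: derive the partition from the event's own timestamp instead of the wall clock at write time, so late-arriving events still land in the correct partition.

import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class EventTimePartitioner {
    // hourly partitions keyed by the event's own timestamp (UTC)
    private static final DateTimeFormatter HOURLY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HH").withZone(ZoneOffset.UTC);

    static String partitionFor(long eventTimeMillis) {
        return HOURLY.format(Instant.ofEpochMilli(eventTimeMillis));
    }

    public static void main(String[] args) {
        // an event produced at 10:59 but arriving at 12:03 still lands in .../10
        long eventTime = Instant.parse("2024-05-01T10:59:00Z").toEpochMilli();
        System.out.println(partitionFor(eventTime));  // prints 2024-05-01/10
    }
}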
Building a Streaming Microservice Architecture: with Apache Spark Structured ... (Databricks)
As we continue to push the boundaries of what is possible with respect to pipeline throughput and data serving tiers, new methodologies and techniques continue to emerge to handle larger and larger workloads.
Intro to Kapacitor for Alerting and Anomaly Detection (InfluxData)
In this session you’ll get a detailed overview of Kapacitor, InfluxDB’s native data processing engine. The session will cover how to install, configure, and build custom TICKscripts to enable alerting and anomaly detection.
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... (Databricks)
Watch video at: http://youtu.be/Wg2boMqLjCg
Want to learn how to write faster and more efficient programs for Apache Spark? Two Spark experts from Databricks, Vida Ha and Holden Karau, provide some performance tuning and testing tips for your Spark applications.
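One classic shuffle tip from talks like this one, as a minimal hypothetical sketch: prefer reduceByKey, which combines values map-side before the shuffle, over groupByKey, which ships every record across the network. The word-count data is made up for illustration.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ShuffleTip {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("shuffle-tip").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> pairs = sc
                    .parallelize(Arrays.asList("a", "b", "a", "c", "b", "a"))
                    .mapToPair(w -> new Tuple2<>(w, 1));
            // map-side combine keeps the shuffle small;
            // the groupByKey().mapValues(...) equivalent would shuffle every record
            pairs.reduceByKey(Integer::sum)
                 .collect()
                 .forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        }
    }
}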
Speaker: Jay Runkel, Principal Solution Architect, MongoDB
Session Type: 40 minute main track session
Track: Operations
When architecting a MongoDB application, one of the most difficult questions to answer is how much hardware (number of shards, number of replicas, and server specifications) the application is going to need. Similarly, when deploying in the cloud, how do you estimate your monthly AWS, Azure, or GCP costs given a description of a new application? While there isn’t a precise formula for mapping application features (e.g., document structure, schema, query volumes) into servers, there are various strategies you can use to estimate the MongoDB cluster size. This presentation will cover the questions you need to ask and describe how to use this information to estimate the required cluster size or cloud deployment cost; a back-of-the-envelope sketch of this arithmetic follows the list below.
What You Will Learn:
- How to architect a sharded cluster that provides the required computing resources while minimizing hardware or cloud computing costs
- How to use this information to estimate the overall cluster requirements for IOPS, RAM, cores, disk space, etc.
- What you need to know about the application to estimate a cluster size
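As a purely hypothetical illustration of that sizing arithmetic (all numbers, and the 10% hot-data heuristic, are assumptions, not guidance from the session), a working-set estimate might look like:

public class ClusterSizing {
    public static void main(String[] args) {
        double dataSizeGb   = 2_000;  // raw document size (assumed)
        double indexSizeGb  = 200;    // total index size (assumed)
        double hotFraction  = 0.10;   // fraction of data accessed frequently (assumed)
        double ramPerNodeGb = 64;     // usable cache RAM per shard (assumed)

        // indexes plus the frequently accessed data should fit in RAM
        double workingSetGb = indexSizeGb + dataSizeGb * hotFraction;
        int shards = (int) Math.ceil(workingSetGb / ramPerNodeGb);
        System.out.printf("working set ~ %.0f GB -> at least %d shards%n",
                workingSetGb, shards);
    }
}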
Flink Forward San Francisco 2022.
Resource Elasticity is a frequently requested feature in Apache Flink: Users want to be able to easily adjust their clusters to changing workloads for resource efficiency and cost saving reasons. In Flink 1.13, the initial implementation of Reactive Mode was introduced, later releases added more improvements to make the feature production ready. In this talk, we’ll explain scenarios to deploy Reactive Mode to various environments to achieve autoscaling and resource elasticity. We’ll discuss the constraints to consider when planning to use this feature, and also potential improvements from the Flink roadmap. For those interested in the internals of Flink, we’ll also briefly explain how the feature is implemented, and if time permits, conclude with a short demo.
by Robert Metzger
All about ZooKeeper and ClickHouse Keeper (Altinity Ltd)
ClickHouse clusters depend on ZooKeeper to handle replication and distributed DDL commands. In this Altinity webinar, we’ll explain why ZooKeeper is necessary, how it works, and introduce the new built-in replacement named ClickHouse Keeper. You’ll learn practical tips to care for ZooKeeper in sickness and health. You’ll also learn how and when to use ClickHouse Keeper, and we will share our recommendations for keeping it happy as well.
DBA Fundamentals Group: Continuous SQL with Kafka and Flink (Timothy Spann)
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
20-Feb-2024
In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data.
We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this.
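As a hedged sketch of the kind of pipeline such a walkthrough covers (topic name, fields, and broker address are assumptions, not the talk's actual demo), a Kafka topic can be declared as a dynamic table in Flink SQL and queried continuously:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ContinuousSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Declare the Kafka topic as a dynamic table (Flink SQL Kafka connector)
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id STRING," +
                "  amount   DOUBLE," +
                "  ts       TIMESTAMP(3)," +
                "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'orders'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'scan.startup.mode' = 'earliest-offset'," +
                "  'format' = 'json')");

        // A continuous query: per-minute revenue, updated as events arrive
        tEnv.executeSql(
                "SELECT window_start, SUM(amount) AS revenue " +
                "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES)) " +
                "GROUP BY window_start, window_end").print();
    }
}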
Tim Spann
Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
eBPF is one of today's key technologies. Several eBPF-based tools already exist in the networking and observability fields, but not many in the storage space. This presentation tells my research story and tries to define some of the possibilities of the technology.
The Art of Database Experiments – PostgresConf Silicon Valley 2018 / San Jose (Nikolay Samokhvalov)
Future database administration will be highly automated. Until then, we still live in a world where extensive manual interactions are required from a skilled DBA. This will change soon as more "autonomous databases" reach maturity and enter the production environment.
Postgres-specific monitoring tools and systems continue to improve, detecting and analyzing performance issues and bottlenecks in production databases. However, while these tools can detect current issues, they still require highly experienced DBAs to analyze the findings and recommend mitigations.
In this session, the speaker will present the initial results of the POSTGRES.AI project – Nancy CLI, a unified way to manage automated database experiments. Nancy CLI is an automated database management framework based on well-known open-source projects and incorporating major open-source tools and Postgres modules: pgBadger, pg_stat_kcache, auto_explain, pgreplay, and others.
Originally developed with the goal of simulating various SQL query use cases in various environments and collecting data to train ML models, Nancy CLI turned out to be a universal framework that can play a crucial role in CI/CD pipelines in any company.
Using Nancy CLI, casual DBAs and any engineers can easily conduct automated experiments today, either on AWS EC2 Spot instances or on any other servers. All you need is to tell Nancy which database to use, specify the workload (synthetic or "real", generated based on the Postgres logs), and what you want to test – say, check how a new index will affect the most expensive query groups from pg_stat_statements, or compare various values of "default_statistics_target". The collected information will give you a high-confidence understanding of how various queries, and overall Postgres performance, will be affected when you apply the change to production.
Concurrent Programming Using the Disruptor (Trisha Gee)
Presented to the London Java Community at Skillsmatter on 1st March 2012.
Full presentation can be viewed here: http://skillsmatter.com/podcast/home/the-disruptor/js-3798
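For orientation, a minimal sketch of the Disruptor's publish/consume cycle using the modern DSL; the event type, ring size, and handler are made up for illustration:

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorSketch {
    // mutable event slot, pre-allocated in the ring buffer
    static class ValueEvent { long value; }

    public static void main(String[] args) {
        Disruptor<ValueEvent> disruptor = new Disruptor<>(
                ValueEvent::new, 1024, DaemonThreadFactory.INSTANCE);
        // consumer: runs on its own thread, reading events in sequence order
        disruptor.handleEventsWith(
                (event, sequence, endOfBatch) ->
                        System.out.println("got " + event.value));
        RingBuffer<ValueEvent> ring = disruptor.start();

        // producer: claim a slot, write into it, then publish
        for (long i = 0; i < 10; i++) {
            long seq = ring.next();
            try {
                ring.get(seq).value = i;
            } finally {
                ring.publish(seq);
            }
        }
    }
}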
Towards an Incremental Schema-level Index for Distributed Linked Open Data G... (Till Blume)
Semi-structured, schema-free data formats are used in many applications because their flexibility enables simple data exchange. Especially graph data formats like RDF have become well established in the Web of Data. For the Web of Data, it is known that data instances are not only added, changed, and removed regularly, but that their schemas are also subject to enormous changes over time. Unfortunately, the collection, indexing, and analysis of the evolution of data schemas on the web is still in its infancy. To enable a detailed analysis of the evolution of Linked Open Data, we lay the foundation for the implementation of incremental schema-level indices for the Web of Data. Unlike existing schema-level indices, incremental schema-level indices have an efficient update mechanism to avoid costly recomputations of the entire index. This enables us to monitor changes to data instances at schema-level, trace changes, and ultimately provide an always up-to-date schema-level index for the Web of Data. In this paper, we analyze in detail the challenges of updating arbitrary schema-level indices for the Web of Data. To this end, we extend our previously developed meta model FLuID. In addition, we outline an algorithm for performing the updates.
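To make the indexing idea concrete, here is a hypothetical sketch (not the FLuID model itself): group subjects by the set of properties they use, and update incrementally by re-homing a subject when its property set changes, instead of recomputing the whole index.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SchemaLevelIndex {
    // schema element (property set) -> subjects conforming to it
    private final Map<Set<String>, Set<String>> index = new HashMap<>();
    // subject -> its current property set
    private final Map<String, Set<String>> schemaOf = new HashMap<>();

    public void addTriple(String subject, String predicate) {
        Set<String> old = schemaOf.getOrDefault(subject, Set.of());
        Set<String> updated = new HashSet<>(old);
        updated.add(predicate);
        // incremental update: move the subject between schema elements
        index.getOrDefault(old, new HashSet<>()).remove(subject);
        index.computeIfAbsent(updated, k -> new HashSet<>()).add(subject);
        schemaOf.put(subject, updated);
    }

    public static void main(String[] args) {
        SchemaLevelIndex idx = new SchemaLevelIndex();
        idx.addTriple("ex:alice", "foaf:name");
        idx.addTriple("ex:alice", "foaf:knows");
        idx.addTriple("ex:bob", "foaf:name");
        System.out.println(idx.index);  // two schema elements: {name} and {name, knows}
    }
}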
Time Series Analysis… using an Event Streaming Platform (Confluent)
Time Series Analysis… using an Event Streaming Platform, Mirko Kämpf, Solutions Architect, Confluent
Meetup Link: https://www.meetup.com/Apache-Kafka-Germany-Munich/events/272827528/
Time Series Analysis Using an Event Streaming Platform (Dr. Mirko Kämpf)
Advanced time series analysis (TSA) requires very special data preparation procedures to convert raw data into useful and compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and for protecting safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation, we will look at typical data aggregation patterns and investigate how to apply analysis algorithms in the cloud. Finally, we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent Cloud.
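As a toy, self-contained sketch of the correlation-network step (the data and the 0.8 threshold are made up): compute pairwise Pearson correlations between series and keep the pairs above the threshold as graph edges.

public class CorrelationNetwork {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;
        double vx = sxx - sx * sx / n, vy = syy - sy * sy / n;
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        double[][] series = {
            {1, 2, 3, 4, 5}, {2, 4, 6, 8, 10}, {5, 3, 6, 2, 7}
        };
        double threshold = 0.8;   // keep only strongly correlated pairs
        for (int i = 0; i < series.length; i++)
            for (int j = i + 1; j < series.length; j++)
                if (Math.abs(pearson(series[i], series[j])) >= threshold)
                    System.out.println("edge " + i + " -- " + j);
    }
}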
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ... (Otávio Carvalho)
Work presented in partial fulfillment of the requirements for the degree of Bachelor in Computer Science at the Federal University of Rio Grande do Sul, Brazil.
Blending Supersonic, Subatomic Java with deep learning to perform object detection. Sounds interesting? Because it is! Then watch this session to learn how to create a microservice combining TensorFlow and Quarkus together into one executable using GraalVM native image, JNI, and Protobuf. With this, we detect objects in photos by returning labels, bounding boxes, and confidence scores. Additionally, we will touch on Open Data Hub, an AI/ML solution for OpenShift.
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...) (Confluent)
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC first architecture.
Things we’ll discuss about decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.
– Showing end to end the usage of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and the broadcast state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
Swift Parallel Scripting for High-Performance Workflow (Daniel S. Katz)
The Swift scripting language was created to provide a simple, compact way to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, reducing the need for complex parallel programming or arcane scripting to achieve this common high-level task. The result was a highly portable programming model based on implicitly parallel functional dataflow. The same Swift script runs on multi-core computers, clusters, grids, clouds, and supercomputers, and is thus a useful tool for moving workflow computations from laptop to distributed and/or high performance systems.
Swift has proven to be very general, and is in use in domains ranging from earth systems to bioinformatics to molecular modeling. It has more recently been adapted to serve as a programming model for much finer-grain in-memory workflow on extreme-scale systems, where it can sustain task rates in the millions to billions per second.
In this talk, we describe the state of Swift’s implementation, present several Swift applications, and discuss ideas for the future evolution of the programming model on which it’s based.
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...) (Raffaele Montella)
FACE-IT is an effort to develop a new IT infrastructure to accelerate existing disciplinary research and enable information transfer among traditionally separate fields. At present, finding data and processing it into usable form can dominate research efforts. By providing ready access not only to data but also to the software tools used to process it for specific uses (e.g., climate impact and economic model inputs), FACE-IT allows researchers to concentrate their efforts on analysis. Lowering barriers to data access allows researchers to stretch in new directions and to learn and respond to the needs of other fields. FACE-IT builds on the Globus Galaxies platform, which has been developed over the past several years at the University of Chicago. FACE-IT also benefits from substantial software development undertaken by the communities who have developed most of the domain-specific tools required to populate FACE-IT with useful capabilities. The FACE-IT Galaxy manages earth system datatypes (such as NetCDF), new tool parameters (dates, map, opendap), aggregated datatypes (RAFT), service providers, and cool map visualizers.
Flink Forward Berlin 2017: Stephan Ewen - The State of Flink and how to adopt... (Flink Forward)
Data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: With its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has been part of re-defining what stream processing can do. By now, Apache Flink powers some of the largest data stream processing pipelines in open source data stream processing. In this keynote, we will look at the evolution of Stream Processing and Apache Flink during the last year, and what we believe will be the next wave of stream processing applications. We show how the Flink community and users evolved, what use cases are coming up, and how new and upcoming features in Flink are making new types of applications possible. We will also discuss common challenges that companies are facing when adopting stream processing, and how we can help companies to rapidly adopt and roll out stream processing company-wide.
Data Stream Analytics - Why they are important (Paris Carbone)
Streaming is cool and it can help us do quick analytics and make a profit, but what about tsunamis? This is a motivation talk presented at the SeRC Big Data Workshop in Sweden during spring 2016. It motivates the streaming paradigm and provides examples based on Apache Flink.
ExtremeEarth: Hopsworks, a data-intensive AI platform for Deep Learning with ... (Big Data Value Association)
The main goal of the session is to showcase approaches that greatly simplify the work of a data analyst when performing data analytics, or when employing machine learning algorithms, over Big Data. The session will include presentations on
(a) How data analytics workflows can be easily and graphically composed, and then optimized for execution,
(b) How raw data with great variety can be easily queried using SQL interfaces, and
(c) How complex machine learning operations can be performed efficiently in distributed settings.
After these presentations, the speakers will participate in a discussion with the audience, in order to discuss further tools that could make the work of a data analyst more simple.
Similar to Extending Complex Event Processing to Graph-structured Information (20)
Modeling the behavior of complex systems that operate in real environments, deal with physical elements, or interact with humans is a challenging task. It involves the explicit representation of aspects of behavioral uncertainty that are inherent in the system but generally neglected in software models. In this paper, we focus on the explicit representation of the behavior of objects of complex systems, considering their motivations, randomness, and the different types of underlying uncertainty that affect their actions. We show how such uncertain behaviors can be effectively modeled in UML and OCL, and how the specifications produced can be used to simulate and analyze these systems.
Knowledge-based applications that deal with uncertainty usually represent it by means of a confidence score that expresses the probability that a given fact is true. However, different users may have distinct opinions about the same fact, something that is not considered in existing proposals. This is critical in a number of areas where individual opinions need to be taken into account when making informed decisions, particularly when these are to be made by consensus. This paper introduces Subjective Knowledge Graphs (SKG), an extension to Probabilistic Knowledge Graphs that considers the individual opinions of separate users about the same facts, and allows reasoning about them. We show how SKGs can be implemented using standard graph databases and how the results of the queries can be enriched with the associated degrees of uncertainty.
Using UML and OCL Models to realize High-Level Digital Twins (Antonio Vallecillo)
Digital twins constitute virtual representations of physically existing systems. However, their inherent complexity makes them difficult to develop and prove correct. In this paper, we explore the use of UML and OCL, complemented with an executable language, SOIL, to build and test digital twins at a high level of abstraction. We also show how to realize the bidirectional connection between the UML models of the digital twin in the USE tool with the physical twin, using an architectural framework centered on a data lake. We have built a prototype of the framework to demonstrate our ideas, and validated it by developing a digital twin of a Lego Mindstorms car. The results allow us to show some interesting advantages of using high-level UML models to specify virtual twins, such as simulation, property checking, and some other types of tests.
Modeling and Evaluating Quality in the Presence of Uncertainty (Antonio Vallecillo)
Slides of my keynote at QUATIC 2019.
Abstract: Uncertainty is the quality or state that involves lacking information or insufficient knowledge. Uncertainty can be due to different reasons, including incomplete or inaccurate information, inexact data or measurements, imprecise human judgments, or approximate estimations. The explicit representation of uncertainty is gaining attention among software engineers in order to provide more faithful systems representations, more accurate design methods, and better estimations of the development processes. However, incorporating uncertainty into our systems models is not enough. Uncertainty also affects many aspects related to the quality of systems, products, processes, and data, including how uncertainty is taken into account when designing our systems, measured when evaluating their quality, and perceived by customers and users. In fact, uncertainty – and, more specifically, the lack of knowledge about the system, our measuring tools, and our potential users – should be incorporated into our quality models, too. This talk identifies several kinds of uncertainties that have a direct impact on quality, and discusses some challenges on how quality needs to be planned, modeled, designed, measured and ensured in the presence of uncertainty.
This presentation discusses the representation of Belief Uncertainty in software models. This kind of uncertainty refers to the situation in which the modeler, or any other belief agent, is uncertain about the behavior of the system, or about the statements that the model expresses about it. In this work, we propose to assign a degree of belief to model statements (be they constraints or any other model expression), which is expressed by a probability (called credence, in statistical terms) that represents a quantification of such a subjective degree of belief. We discuss how it can be represented using current modeling notations, and how to operate with it in order to make informed decisions.
These slides correspond to the talk we gave at the MODEVVA'17 workshop. This work presents an extension of OCL that allows modellers to deal with random numbers and probability distributions in their OCL specifications. We show its implementation in the USE tool and discuss some advantages of this new feature for the validation and verification of models.
Towards a Body of Knowledge for Model-Based Software Engineering (Antonio Vallecillo)
Model-based Software Engineering (MBSE) is now accepted as a Software Engineering (SE) discipline and is being taught as part of more general SE curricula. However, an agreed core of concepts, mechanisms and practices — which constitutes the Body of Knowledge of a discipline — has not been captured anywhere, and is only partially covered by the SE Body of Knowledge (SWEBOK). With the goals of characterizing the contents of the MBSE discipline, promoting a consistent view of it worldwide, clarifying its scope with regard to other SE disciplines, and defining a foundation for curriculum development on MBSE, this paper provides a proposal for an extension of the contents of SWEBOK with the set of fundamental concepts, terms and mechanisms that should constitute the MBSE Body of Knowledge.
Computer Engineering Is Not a Science -- Reflections on Education... (Antonio Vallecillo)
Invited talk at Jenui 2017: In this talk we question the education we currently give our computer engineering students, which is more typical of a scientific discipline than of an engineering one. In fact, despite the efforts made in recent years in our Schools of Computer Engineering to improve the education given to students, society still does not perceive us as engineers, nor does it recognize the competencies of our discipline. Starting from the characteristics that the computer engineering profession should have, and from the premise that our mission as a University must be to train professionals, we analyze the strengths and weaknesses of our education and identify some aspects, both of content and of methodology, that would need to be addressed if we really want to train computer engineers and improve how society perceives us.
Ethics in Software Testing Engineering: The Need for a Code of Ethics (Antonio Vallecillo)
This talk analyzes the need for a code of ethics in professional practice in the field of software engineering and, consequently, in testing. (Given at the First Congress of the Spanish Committee of Software Testing Companies (SSTQB). Seville, 16/6/2016. http://www.sstqb.es/eventos/gira2016sstqbetapasevilla.html)
Digital teaching and MOOCs at the UMA. Presentation at the XV meeting of... (Antonio Vallecillo)
Presentation given at the XV Meeting of Rectors of the Tordesillas Group (http://www.grupotordesillas.net/), held in Lisbon in October 2014, at the Seminar on new digital learning instruments and massive open online courses (MOOCs).
The PhD in Computer Science: New wine in old bottles? (Talk at U. Sevil...) (Antonio Vallecillo)
ABSTRACT: The new Royal Decree 99/2011 has brought about a substantial change in the third cycle of university studies and in the practices that lead to the development of the thesis. These changes are especially significant in science and engineering doctorates, and in particular in Computer Science, with the emergence of new forms of social communication and of evaluation of research activity, publication databases and impact indices, the online reputation of researchers, and the professionalization of doctoral studies.
This talk presents and discusses what these developments mean for doctoral students in Computer Science, and suggests some aspects that are important to consider when planning the development of the thesis and building a professional career.
Accountable objects: Modeling Liability in Open Distributed Systems (Antonio Vallecillo)
As an increasing amount of commercial activity becomes automated, the importance of techniques for providing complete system specifications, checking the correctness of interactions and flagging incorrect behaviour increases. The aim throughout is to generate more complete information about the system and so to produce IT solutions that reflect the business requirements accurately. So far, most efforts have been placed on the appropriate specification of the system behaviour and then on the non-functional requirements that constitute the contract between a system and its users. But in fully-automated commercial systems, such as Cloud Computing or SOA systems, we should also consider the liability of the different parties, since we should be able to assign responsibility to objects and, more importantly, to know, in case of problems or contract violations, which one is to blame.
The consequence of these considerations is that we need the ability to express more directly the necessary obligations and other deontic concepts, such as permissions and prohibitions, giving the designer the tools for extending the behavioural information to make it clear where obligations apply and with what detailed properties. In this talk we describe current activities within the International Organization for Standardization (ISO) to extend the ODP family of standards for the expression of policies using deontic logic, and on how to improve support for deontic concepts based on their reification.
Why do we model? Apart from generating code, models can have (and should have) many different usages in the realm of Software Engineering including, e.g., understanding and reasoning about the system under study, simulating it, or analyzing its properties before the system is built. For these tasks we need to be able to ask questions about the model, and therefore count on languages for expressing both the models and the questions, at the right level of abstraction and using the appropriate notations. This talk discusses the need to count on different models to describe a system, using different languages, and how semantics can be assigned to them using model transformations. Such semantics define the "meanings" of models, making them amenable to interpretation and analysis.
Slides of the talk at ECMDA 2011, Birmingham, June 2011
ABSTRACT:
The package is one of the basic UML concepts. It is used both to group model elements and to provide a namescope for its members. However, combining these two tasks into a single UML concept can become not only too restrictive but also a source of subtle problems. This paper presents some improvements to the current UML naming and grouping schemata, using the ideas proposed in the reference model of Open Distributed Processing (ODP). The extensions try to maintain backwards compatibility with the existing UML concepts, while allowing more flexible grouping and naming mechanisms.
On the Combination of Domain Specific Modeling Languages (Antonio Vallecillo)
These are the slides of the presentation at ECMFA 2010 of the paper:
"On the Combination of Domain Specific Modeling Languages". LNCS 6138, pp. 301-316, Paris, June 16-18, 2010.
ABSTRACT: Domain Specific Modeling Languages (DSMLs) are essential elements in Model-based Engineering. Each DSML allows capturing certain properties of the system, while abstracting other properties away. Nowadays DSMLs are mostly used in silos to solve specific problems. However, there are many occasions when multiple DSMLs need to be combined to design systems in a modular way. In this paper we discuss some scenarios of use and several mechanisms for DSML combination. We propose a general framework for combining DSMLs that subsumes them, based on the concept of viewpoint unification, and its realization using model-driven techniques.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
SOCRadar Research Team: Latest Activities of IntelBroker
Extending Complex Event Processing to Graph-structured Information
1. Extending Complex Event Processing to Graph-structured Information
Gala Barquero1, Loli Burgueño2, Javier Troya3, Antonio Vallecillo1
1Universidad de Málaga, Spain
2Universitat Oberta de Catalunya, Spain
3Universidad de Sevilla, Spain
2. Complex Event Processing
1. CEP is a stream-processing method for analyzing and correlating streams of information about real-time events in order to derive conclusions from them.
2. CEP permits defining complex events on top of other events (primitive or complex).
3. CEP programs are composed of rules, which are in charge of processing the events.
3. Complex Event Processing
[Diagram: queries (patterns) are applied to the incoming data and produce results.]
4. Complex Event Processing
4. CEP programs define (size or temporal) windows on the stream of events.
5. Current CEP technologies
1. Efficient languages and technologies for processing huge streams of data
6.5 zettabytes (10^21 bytes) in 2016
15.3 zettabytes expected in 2020
2. Increasingly used (and useful) in applications for critical infrastructure monitoring, real-time market trend analysis, prediction of epidemics and natural disasters, ...
7. However, real information is normally structured in more complex ways
1. The data is not only structured as a sequence of timed events, but as graphs that combine transient (streams) and persistent (database) information
Queries about social trends based on Twitter feeds and shared Flickr photos
Monitoring trends via Twitter and Facebook posts
8. Our contribution
1. Extend CEP systems and languages to deal with graph-based information
Able to deal both with streams of timed events and with graphs of persistent data
Extend the concept of a CEP “sequential window” to a “spatial window”
Keep up with the stringent performance and scalability requirements of CEP systems
2. For this we decided to:
Generalize the structure of a CEP stream from a sequence of time-ordered events to a Model (i.e., a graph of interrelated elements, time being just one dimension)
Consider the behavior of a CEP system as a particular kind of in-place Model Transformation
Use the concept of “vicinity graphs” to define and implement spatial windows in models (a generalization of CEP’s sequential windows), as sketched below
Use recent graph-parallel computing technologies to provide the supporting storage and access infrastructure for the models, and graph-processing systems to implement the corresponding in-place model transformations
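To make the spatial-window idea concrete, the following is a minimal sketch (our illustration, not the authors' implementation) of extracting the vicinity graph within k hops of a given vertex using GraphX's Pregel operator; for simplicity it follows edges in their stored direction:

```scala
// Spatial window as a k-hop vicinity graph (illustrative sketch).
import scala.reflect.ClassTag
import org.apache.spark.graphx.{Graph, VertexId}

def vicinity[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED],
                                         center: VertexId, k: Int): Graph[(VD, Int), ED] = {
  // Attach a hop distance to every vertex: 0 at the center, "infinity" elsewhere.
  val init = g.mapVertices((id, attr) => (attr, if (id == center) 0 else Int.MaxValue))
  val withDist = init.pregel(Int.MaxValue, maxIterations = k)(
    (_, va, msg) => (va._1, math.min(va._2, msg)),   // keep the smallest distance seen
    t => if (t.srcAttr._2 < k && t.srcAttr._2 + 1 < t.dstAttr._2)
           Iterator((t.dstId, t.srcAttr._2 + 1))     // extend paths still within k hops
         else Iterator.empty,
    (a, b) => math.min(a, b))                        // merge competing distances
  // The spatial window keeps only the vertices within k hops of the center.
  withDist.subgraph(vpred = (_, va) => va._2 <= k)
}
```

A rule's window is then this subgraph rather than the last N events of a stream.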
10. Case study: Twitter and Flickr
Q1: A HotTopic event is generated every time a hashtag has been used by both Twitter and Flickr users at least 100 times in the last hour.
11. Case study: Twitter and Flickr
Q1: A HotTopic event is generated every time a hashtag has been used by both Twitter and Flickr users at least 100 times in the last hour.
Q2: A PopularTwitterPhoto element is created when the hashtag of a photo is mentioned in a tweet that receives more than 30 likes in the last hour.
Q3: A PopularFlickrPhoto element is created when a photo is marked as a favorite by more than 50 Flickr users who have more than 50 followers.
Q4: A NiceTwitterPhoto event is generated when a user with an h-index higher than 50 posts three tweets in a row in the last hour containing a hashtag that describes a photo.
Q5: An InfluencerTweeted event is generated, considering the 10K most recent tweets, when a user with an h-index higher than 70 and more than 50K followers sends a tweet.
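As a flavor of how such rules look over the Spark-based models, here is a minimal sketch of Q1 over plain RDDs; the event types and the reading of "used by both at least 100 times" (100 usages overall, with both sources represented) are our assumptions, not the paper's code:

```scala
// Minimal sketch of Q1 (illustrative; event types are assumptions).
import org.apache.spark.rdd.RDD

case class HashtagUse(tag: String, source: String, ts: Long) // source: "twitter" or "flickr"
case class HotTopic(tag: String)

def q1(uses: RDD[HashtagUse], now: Long): RDD[HotTopic] = {
  val hourMs = 60 * 60 * 1000L
  // Temporal window: keep only usages from the last hour.
  val recent = uses.filter(u => now - u.ts <= hourMs)
  recent.map(u => (u.tag, u.source)).groupByKey().collect {
    // At least 100 recent usages, and both sources are represented.
    case (tag, sources) if sources.size >= 100 && sources.toSet.size == 2 =>
      HotTopic(tag)
  }
}
```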
12. Current Implementation
1. Models implemented with Apache Spark
RDDs (resilient distributed datasets) used to store both model elements (graph vertices) and their relations (edges)
Models populated using the sources’ APIs to obtain the data
One thread for each stream of events in the case of streaming data
2. Model transformation rules (modeling the corresponding CEP rules) implemented in Scala
Implemented in terms of Spark and GraphX functions
One dedicated running thread for each rule
Produced events stored using RDDs too
3. Data lifecycle (sketched below)
Transient data (and their relationships) have an “expiration date” (ED)
The ED is determined by the largest window of the rules that deal with the event
Once the ED of an element has passed, the element is removed from the system
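A minimal sketch of the expiration-date mechanism just described (names and types are our assumptions):

```scala
// Expiration-date (ED) lifecycle for transient elements (illustrative sketch).
import org.apache.spark.rdd.RDD

case class Timed[A](value: A, ts: Long)

// The ED of an element is its timestamp plus the largest window among the
// rules that deal with that kind of event.
def expirationDate[A](e: Timed[A], largestWindowMs: Long): Long =
  e.ts + largestWindowMs

// Elements whose ED has passed are pruned from the model.
def prune[A](elements: RDD[Timed[A]], largestWindowMs: Long,
             now: Long): RDD[Timed[A]] =
  elements.filter(e => expirationDate(e, largestWindowMs) > now)
```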
14. Analyses
1. Performance
How fast are we?
Is the performance of our proposal acceptable for dealing with large systems?
How do we compare with CEP systems (when only one-dimensional streams are used)?
2. Expressiveness
Are we as expressive as CEP languages?
Can we write all CEP patterns with GraphX?
How easy is it to write rules with our proposal?
15. Performance analysis
1. Performance figures for the Twitter and Flickr case study (in milliseconds)
2. Comparison figures with other solutions (127K/6500K):
16. Performance analysis: comparison with streaming CEP systems
1. A different case study (Motorbike) implemented using both our solution and Esper
17. Expressiveness
1. We have been able to express all queries using Scala and GraphX
2. However, the expression of the queries is not simple
Scala code for the “DriverLeftSeat” rule: [shown as an image on the slide]
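The rule's code appears only as an image on the original slide. Purely as a stand-in, the following hypothetical sketch shows the shape such a rule could take over plain Spark RDDs; the event types, the assumed semantics (seat pressure dropping to zero while the bike is still moving), and all names are invented for illustration:

```scala
// Hypothetical reconstruction of a "DriverLeftSeat"-style rule (NOT the
// authors' code; semantics and names are assumed for illustration only).
import org.apache.spark.rdd.RDD

case class SeatPressure(bikeId: String, pressure: Double, ts: Long)
case class Speed(bikeId: String, kmh: Double, ts: Long)
case class DriverLeftSeat(bikeId: String, ts: Long)

def driverLeftSeat(pressures: RDD[SeatPressure], speeds: RDD[Speed],
                   windowMs: Long, now: Long): RDD[DriverLeftSeat] = {
  // Temporal window plus the per-event conditions.
  val noPressure = pressures
    .filter(e => now - e.ts <= windowMs && e.pressure == 0.0)
    .map(e => (e.bikeId, e.ts))
  val moving = speeds
    .filter(e => now - e.ts <= windowMs && e.kmh > 0.0)
    .map(e => (e.bikeId, e.ts))
  // Correlate both streams by bike: no seat pressure while the bike moves.
  noPressure.join(moving)
    .map { case (bike, (pTs, _)) => DriverLeftSeat(bike, pTs) }
    .distinct()
}
```

Even this toy version illustrates the slide's point: conditions that a CEP language states declaratively require explicit joins and bookkeeping here.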
19. Technology (and its rapid evolution) is an issue in this context

Neo4j (not in memory; query language: Cypher)
Pros: expressiveness and usability of Cypher!!!; easy to install and to use; scalability
Cons: disk access (R/W) very slow; no in-memory implementation available

Spark + GraphX (in memory; query language: Scala)
Pros: versatile and very expressive language; easy to install; implements cluster mode (distributed)
Cons: cumbersome as a query language for graphs; uses lazy evaluation; complex configuration in cluster mode

Viatra (in memory; query language: Viatra)
Pros: speed and general performance; good language for querying models; very expressive
Cons: difficult to install and configure; documentation is scarce

Tinkergraph (in memory; query language: Gremlin)
Pros: graph-native language and tools; in-memory implementation; easy to install and to use
Cons: learning curve of Gremlin

CrateDB (not in memory; query language: SQL)
Pros: uses disk but very efficiently (scalability); SQL is well known and widely used; implements cluster mode (distributed); easy to install and to use
Cons: writing graph queries in SQL is not easy (especially queries involving hops)
20. Conclusions and future work
Contribution: extension of CEP systems to deal with graph-structured information:
Able to deal both with streams of timed events and with graphs of persistent data
Represent the information to manage as a Model
Consider the behavior of a CEP system as an in-place Model Transformation
Extend the concept of CEP windows to spatial windows on models
Use graph-parallel computing technologies to provide the supporting storage and access infrastructure, and
Use graph-processing languages and systems to implement the corresponding model transformations
21. Future work
1. Performance:
Experiment with other technologies, beyond Spark+GraphX
Each one has pros and cons (expressiveness, performance, scalability, distribution)
Volatility is an issue… they change too rapidly!
2. Expressiveness
Compilers from query languages to storage technologies can be a solution
For example, from Cypher to Gremlin or to Scala+GraphX
3. Correctness/Accuracy
What is the error introduced by the use of spatial windows?
Here we need to trade accuracy for performance
Approximate queries and model transformations…
Q: A YoungInfluencer is a TwitterUser younger than 25 years old who has more than 30 followers older than 25 years old (see the sketch below).
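This backup query also connects to the compiler idea in the future-work slide: a Cypher formulation is compact, while its Scala+GraphX counterpart needs explicit message passing. Both versions below are our own sketches; the schema, labels, and edge direction are assumptions:

```scala
// Sketch only; the graph schema and all names are assumptions.
// A rough Cypher formulation:
//   MATCH (u:TwitterUser)<-[:FOLLOWS]-(f:TwitterUser)
//   WHERE u.age < 25 AND f.age > 25
//   WITH u, count(f) AS older WHERE older > 30
//   RETURN u
// The Scala+GraphX counterpart:
import org.apache.spark.graphx.Graph

case class TwitterUser(name: String, age: Int)

def youngInfluencers(g: Graph[TwitterUser, String]) = {
  // For every user, count followers older than 25; "follows" edges point
  // from the follower to the followed user.
  val olderFollowers = g.aggregateMessages[Int](
    ctx => if (ctx.attr == "follows" && ctx.srcAttr.age > 25) ctx.sendToDst(1),
    _ + _)
  g.vertices.join(olderFollowers).collect {
    case (_, (user, older)) if user.age < 25 && older > 30 => user
  }
}
```

A Cypher-to-GraphX compiler, as suggested in the future-work slide, would generate exactly this kind of boilerplate.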
22. Extending Complex Event Processing to Graph-structured Information
Gala Barquero1, Loli Burgueño2, Javier Troya3, Antonio Vallecillo1
1Universidad de Málaga, Spain
2Universitat Oberta de Catalunya, Spain
3Universidad de Sevilla, Spain