Palo Alto Networks processes terabytes of events each day. One of their many challenges is to understand which of those events (which may come from many different sensors) actually describe the same story, just from different viewpoints.
Traditionally, such a system would need some sort of database to store the events and a message queue to notify consumers about new events arriving in the system. Palo Alto Networks wanted to avoid the cost and operational overhead of deploying yet another stateful component, so they designed a solution that uses ScyllaDB as the database for the events *and* as a message queue that lets their consumers read the correct events each time. Join Daniel Belenky, Principal Software Engineer at Palo Alto Networks, as he walks you through their process.
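The talk describes the pattern rather than publishing code, but a minimal sketch of the idea is easy to picture. Assuming a hypothetical time-bucketed table (none of these names come from Palo Alto Networks), ScyllaDB can serve as both the event store and the "queue" that consumers poll for newly arrived events:

```python
# A minimal sketch of the pattern (hypothetical names, not Palo Alto Networks'
# actual schema): a time-bucketed events table in ScyllaDB doubles as the
# "queue" that consumers poll for newly arrived events.
from datetime import datetime, timezone
from cassandra.cluster import Cluster  # pip install cassandra-driver; works with ScyllaDB

cluster = Cluster(["127.0.0.1"])        # assumed local ScyllaDB node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (
        bucket      text,       -- e.g. sensor id + minute, keeps partitions bounded
        received_at timestamp,
        event_id    uuid,
        payload     text,
        PRIMARY KEY ((bucket), received_at, event_id)
    )""")

def poll(bucket: str, since: datetime):
    """A consumer reads only events newer than its last checkpoint."""
    return list(session.execute(
        "SELECT received_at, event_id, payload FROM demo.events "
        "WHERE bucket = %s AND received_at > %s",
        (bucket, since)))

events = poll("sensor-42:2022-01-01T00:00", datetime(2021, 12, 31, tzinfo=timezone.utc))
```

Because each consumer tracks its own checkpoint, the same table answers both "store this event" and "what arrived since I last looked", with no separate broker to operate.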
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Running Scylla on Kubernetes with Scylla Operator – ScyllaDB
Kubernetes has quickly become the de facto standard for managing software deployments. The Scylla team recently released the beta version of the Scylla Operator for Kubernetes.
Join us to learn how Kubernetes can be used to automate the deployment, scaling, and various operations of Scylla NoSQL. In this webinar we will:
- Present the mapping between Scylla and Kubernetes entities, and explain the reasoning behind it
- Give insight into the Operator Open Source project, and how you can get involved
- Explain and demo common procedures with Scylla Operator
- Discuss the challenges of running a high-performance, persistent application on Kubernetes, and the trade-offs we considered
- Share our plans for Scylla on Kubernetes
In this presentation we show best practices for backups, covering topics such as disaster recovery, point-in-time recovery, physical vs. logical backups, and incremental vs. differential backups.
Quieting noisy neighbor with Intel® Resource Director Technology – Michelle Holley
A typical cloud server hosts multiple VMs, each running an independent application. Operating a mixture of applications in the cloud requires proper resource management, which is critical to QoS. This session studies the impact of different neighbors on an application’s performance and shows how Intel® RDT can help detect and mitigate a noisy-neighbor situation.
About the authors: Sunil is a senior cloud performance engineer at Intel working on cloud performance and optimization for Oracle Cloud. Prior to this, he worked on service assurance and orchestration products for OpenStack clouds. Sunil has 10+ years of experience working on different software products for server management. He holds a Master's in Computer Science from IIT Chicago.
Khun Ban is a cloud performance engineering manager leading a team that optimizes cloud performance and TCO. He has over twenty years of enterprise software development experience. His current focus is on providing customers with the best cloud experience. He received his B.S. degree in Computer Science and Engineering from the University of Washington in 1995.
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa... – HostedbyConfluent
Managing Apache Kafka can sometimes be cumbersome, and that's something we would like to avoid, especially for developers and data engineers who need to build and develop data pipelines.
Luckily, the combination of Kubernetes and Kafka reduces everyday tasks tremendously, adding myriad capabilities that lessen the complexity of managing clusters.
Kafka Connect and KSQLDB are a fantastic combo to add to your streaming stack. These two soldiers can facilitate data acquisition and processing and also provide outstanding real-time ETL capabilities. But what if you need an OLAP datastore to answer complex queries with low-latency responses? That's where Apache Pinot comes into play.
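As a rough illustration of how these pieces talk to each other (the endpoint, topic, and stream names below are assumptions, not taken from the session), a stream can be registered in ksqlDB over a Kafka topic through its REST API:

```python
# A minimal sketch (assumed local ksqlDB endpoint, hypothetical stream/topic names):
# registering a KSQLDB stream over an existing Kafka topic via the /ksql REST API.
import requests

KSQLDB_URL = "http://localhost:8088/ksql"   # assumed ksqlDB server address

statement = """
    CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR, ts BIGINT)
    WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    KSQLDB_URL,
    headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())   # ksqlDB reports whether the stream was created
```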
In this session, you're going to learn:
- How to deploy Kafka effectively on Kubernetes
- How to properly configure Kafka Connect and KSQLDB
- How to integrate Apache Pinot to answer OLAP queries
Complex event processing (CEP) and stream analytics are commonly treated as distinct classes of stream processing applications. While CEP workloads identify patterns from event streams in near real-time, stream analytics queries ingest and aggregate high-volume streams. Both types of use cases have very different requirements which resulted in diverging system designs. CEP systems excel at low-latency processing whereas engines for stream analytics achieve high throughput. Recent advances in open source stream processing yielded systems that can process several millions of events per second at a sub-second latency. One of these systems is Apache Flink and it enables applications that include typical CEP features as well as heavy aggregations.
Guided by examples, I will demonstrate how Apache Flink enables the user to process CEP and stream analytics workloads alike. Starting from aggregations over streams, we will next detect temporal patterns in our data triggering alerts and finally aggregate these alerts to gain more insights from our data. As an outlook, I will present Flink's CEP-enriched StreamSQL interface providing a declarative way to specify temporal patterns in your SQL query.
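To make the CEP half of that concrete, here is a plain-Python illustration of the kind of temporal pattern such a job detects; it deliberately uses no Flink API, and the threshold, window, and event names are invented for the example:

```python
# Conceptual illustration only (plain Python, not Flink's CEP API): raise an
# alert when a key produces three "error" events within a 60-second window.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 3

recent_errors = defaultdict(deque)   # key -> timestamps of recent error events

def on_event(key: str, kind: str, ts: float):
    """Feed events in timestamp order; returns an alert dict when the pattern fires."""
    if kind != "error":
        return None
    window = recent_errors[key]
    window.append(ts)
    while window and ts - window[0] > WINDOW_SECONDS:
        window.popleft()             # expire events older than the window
    if len(window) >= THRESHOLD:
        return {"key": key, "count": len(window), "at": ts}
    return None

stream = [("svc-a", "error", 1.0), ("svc-a", "ok", 5.0),
          ("svc-a", "error", 20.0), ("svc-a", "error", 45.0)]
alerts = [a for a in (on_event(*e) for e in stream) if a]
print(alerts)   # one alert: three errors within 60 seconds
```

A system like Flink then adds what this toy version lacks: event-time handling, state that survives failures, and the throughput to run such patterns over millions of events per second.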
Re:invent 2016 Container Scheduling, Execution and AWS Integration – aspyker
Members from all over the world streamed over forty-two billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this session, Netflix presents a deep dive on the motivations and the technology powering container deployment on top of Amazon Web Services. The session covers our approach to resource management and scheduling with the open source Fenzo library, along with details of how we integrate Docker and Netflix container scheduling running on AWS. We cover the approach we have taken to deliver AWS platform features to containers such as IAM roles, VPCs, security groups, metadata proxies, and user data. We want to take advantage of native AWS container resource management using Amazon ECS to reduce operational responsibilities. We are delivering these integrations in collaboration with the Amazon ECS engineering team. The session also shares some of the results so far, and lessons learned throughout our implementation and operations.
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc... – Henning Jacobs
Talk given at JAX DevOps London on 2019-05-15.
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 90+ clusters to achieve cost-efficiency and reduce impact for latency-critical applications. All shown tools are open source and can be applied to most Kubernetes deployments. Topics covered in the talk include: understanding resource requests and limits, cgroups and CFS quota behavior, contributing factors to cluster costs (in public clouds), and best practices for managing Kubernetes resources.
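For a feel of what "requests" means in practice, the sketch below (using the Kubernetes Python client against whatever cluster your kubeconfig points at) sums the CPU requests that drive scheduling; comparing that figure with actual usage from a metrics source is what exposes slack:

```python
# A minimal sketch (assumes a configured kubeconfig; actual usage would come from
# a metrics source such as Prometheus, which is not shown): summing the CPU
# requests that the scheduler uses to place pods.
from kubernetes import client, config   # pip install kubernetes

def cpu_to_millicores(value: str) -> int:
    """Convert Kubernetes CPU quantities like '100m' or '0.5' to millicores."""
    return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

total_requested_m = 0
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    for container in pod.spec.containers:
        requests = container.resources.requests or {}
        if "cpu" in requests:
            total_requested_m += cpu_to_millicores(requests["cpu"])

print(f"CPU requested across the cluster: {total_requested_m} millicores")
# slack = requested CPU minus what the workloads actually consume
```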
We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
This talk will show how to build your own simple, cheap, and scalable CGN solution with stateful failover, using commodity servers with a decent NIC running Linux, nftables, and bird.
We needed to introduce NAT into the network, and a commercial solution would have required a six-figure investment, so we built it ourselves for <10% of that cost.
Two Dell servers with a recent CPU, two Mellanox NICs, and nftables together with bird do the trick and make for a simple, cheap, and scalable CGN box, supporting ECMP, simple draining, orchestration by your usual Linux tool chain, and stateful failover.
Video at: https://www.youtube.com/watch?v=qHsHkjhGibA
Hoodie (Hadoop Upsert Delete and Incremental) is an analytical, scan-optimized data storage abstraction which enables applying mutations to data in HDFS on the order of a few minutes and chaining incremental processing in Hadoop.
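A rough sketch of that upsert workflow, assuming a Spark session with the Hudi Spark bundle on the classpath and entirely hypothetical table, column, and path names:

```python
# A rough sketch (hypothetical table, columns, and path; requires the Hudi Spark
# bundle on the Spark classpath): upserting records into a Hudi table, the
# "apply mutations to HDFS data in minutes" idea described above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-demo").getOrCreate()

df = spark.createDataFrame(
    [("id-1", "us", "2017-01-01 10:00:00", 42)],
    ["record_key", "region", "ts", "value"],
)

hudi_options = {
    "hoodie.table.name": "demo_table",
    "hoodie.datasource.write.recordkey.field": "record_key",
    "hoodie.datasource.write.partitionpath.field": "region",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}

(df.write.format("hudi")            # older "Hoodie" releases used a different format name
   .options(**hudi_options)
   .mode("append")                  # append mode performs upserts keyed on record_key
   .save("hdfs:///tmp/demo_table"))
```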
Near real-time statistical modeling and anomaly detection using Flink! – Flink Forward
Flink Forward San Francisco 2022.
At ThousandEyes we receive billions of events every day that allow us to monitor the internet; the most important aspect of our platform is to detect outages and anomalies that have a potential to cause serious impact to customer applications and user experience. Automatic detection of such events at lowest latency and highest accuracy is extremely important for our customers and their business. After launching several resilient and low latency data pipelines in production using Flink we decided to take it up a notch; we leveraged Flink to build statistical models in near real-time and apply them on incoming stream of events to detect anomalies! In this session we will deep dive into the design as well as discuss pitfalls and learnings while developing our real-time platform that leverages Debezium, Kafka, Flink, ElastiCache and DynamoDB to process events at scale!
by Kunal Umrigar & Balint Kurnasz
Multiple Sites and Disaster Recovery with Ceph: Andrew Hatfield, Red Hat – OpenStack
Multiple Sites and Disaster Recovery with Ceph
Audience: Intermediate
Topic: Storage
Abstract: Ceph is the leading storage solution for OpenStack. As OpenStack deployments become more mission critical and widely deployed, multiple site requirements are increasing as is the need to ensure disaster recovery and business continuity. Learn about the new capabilities in Ceph that assist customers with meeting these requirements for block and object uses.
Speaker Bio: Andrew Hatfield, Red Hat
Andrew has over 20 years experience in the IT industry across APAC, specialising in Databases, Directory Systems, Groupware, Virtualisation and Storage for Enterprise and Government organisations. When not helping customers slash costs and increase agility by moving to the software-defined storage future, he’s enjoying the subtle tones of Islay Whisky and shredding pow pow on the world’s best snowboard resorts.
OpenStack Australia Day Government - Canberra 2016
https://events.aptira.com/openstack-australia-day-canberra-2016/
Operating PostgreSQL at Scale with Kubernetes – Jonathan Katz
The maturation of containerization platforms has changed how people think about creating development environments and has eliminated many inefficiencies in deploying applications. These concepts and technologies have made their way into the PostgreSQL ecosystem as well, and tools such as Docker and Kubernetes have enabled teams to run their own “database-as-a-service” on the infrastructure of their choosing.
All this sounds great, but if you are new to the world of containers, it can be very overwhelming to find a place to start. In this talk, which centers around demos, we will see how you can get PostgreSQL up and running in a containerized environment with some advanced sidecars in only a few steps! We will also see how it extends to a larger production environment with Kubernetes, and what the future holds for PostgreSQL in a containerized world.
We will cover the following:
* Why containers are important and what they mean for PostgreSQL
* How to create a development environment with PostgreSQL, pgadmin4, monitoring, and more
* How to use Kubernetes to create your own "database-as-a-service"-like PostgreSQL environment
* Trends in the container world and how they will affect PostgreSQL
At the conclusion of the talk, you will understand the fundamentals of how to use container technologies with PostgreSQL and be on your way to running a containerized PostgreSQL environment at scale!
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour... – Sandesh Rao
In this session, I will cover under-the-hood features that power Oracle Real Application Clusters (Oracle RAC) 19c, specifically around Cache Fusion and service management. Improvements in Oracle RAC help with integration with features such as Multitenant and Data Guard; in fact, these features benefit immensely when used with Oracle RAC. Finally, we will talk about changes to the broader Oracle RAC Family of Products stack, the algorithmic changes that help quickly detect sick/dead nodes/instances, and the reconfiguration improvements that ensure Oracle RAC databases continue to function without any disruption.
IBM MQ: An Introduction to Using and Developing with MQ Publish/Subscribe – David Ware
IBM MQ allows application programmers to use the publish/subscribe application model with ease. This session takes you through the fundamental publish/subscribe concepts and how they relate to IBM MQ. Covering aspects of system design, configuration and application programming, this session is essential for all users looking to adopt publish/subscribe with IBM MQ.
What is HyperLogLog and Why You Will Love It | PostgreSQL Conference Europe 2... – Citus Data
In applications, it’s typical to have an analytics dashboard highlighting the number of unique items, such as unique users or unique visits. While traditional COUNT(DISTINCT) queries work well most of the time for such use cases, they have drawbacks when working on large data sets, resulting in large memory requirements and/or slow execution times. It is also not easy to use traditional COUNT(DISTINCT) queries in a distributed environment.
In this talk, we will focus on the HyperLogLog (HLL) algorithm and its PostgreSQL extension postgresql-hll. HLL can provide approximate answers to COUNT(DISTINCT) queries within mathematically provable error bounds. It is not only fast and memory-efficient but also has very interesting properties which especially shine in a distributed environment. In this talk, Burak Y. from Citus Data (also maintainer of postgresql-hll extension) will talk about internals of HLL and how it estimates cardinality; Jarred from IronNet Cybersecurity will showcase HLL in PostgreSQL with real world examples and use cases from his experience of running HLL in production. We promise that at the end of this session, you will fall in love with this fun little data structure as the newest tool in your data science and analytics tool belt.
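As a taste of what that looks like in practice, here is a minimal sketch against the postgresql-hll extension; the page_views table and the connection details are hypothetical:

```python
# A minimal sketch (assumes a local Postgres with the postgresql-hll extension
# available and a hypothetical page_views(visitor_id int, day date) table):
# approximate distinct visitors per day, and merged across days, using HLL.
import psycopg2   # pip install psycopg2-binary

conn = psycopg2.connect("dbname=demo user=postgres host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS hll;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS daily_uniques AS
    SELECT day, hll_add_agg(hll_hash_integer(visitor_id)) AS visitors
    FROM page_views
    GROUP BY day;
""")

# Per-day approximate distinct counts.
cur.execute("SELECT day, hll_cardinality(visitors) FROM daily_uniques ORDER BY day;")
print(cur.fetchall())

# Unlike COUNT(DISTINCT), the per-day sketches can be merged after the fact,
# e.g. distinct visitors over the whole period.
cur.execute("SELECT hll_cardinality(hll_union_agg(visitors)) FROM daily_uniques;")
print(cur.fetchone())
conn.commit()
```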
Scylla on Kubernetes: Introducing the Scylla Operator – ScyllaDB
How can Kubernetes be best used to automate the deployment, scaling, and various operations of a Scylla database?
Enter Kubernetes Operators, the way to combine domain-specific knowledge about Scylla with the automation framework of Kubernetes.
In this presentation, we will quickly explore what Kubernetes is and why it works so well, highlight the pain points of running Scylla with just Kubernetes primitives, and show how we extended Kubernetes so that it can correctly operate a Scylla database.
Finally, we will show the Scylla Operator in action and show how easily you can spin up a Scylla cluster with just one command.
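The "one command" is typically a kubectl apply of a ScyllaCluster manifest; the sketch below does the same thing through the Kubernetes Python client. The spec fields shown are illustrative only and should be checked against the Scylla Operator documentation for your version:

```python
# A minimal sketch (spec fields are illustrative; consult the Scylla Operator
# CRD reference for your release): creating a ScyllaCluster custom resource,
# which the operator then turns into a running Scylla cluster.
from kubernetes import client, config   # pip install kubernetes

config.load_kube_config()
api = client.CustomObjectsApi()

scylla_cluster = {
    "apiVersion": "scylla.scylladb.com/v1",
    "kind": "ScyllaCluster",
    "metadata": {"name": "demo-cluster", "namespace": "scylla"},
    "spec": {   # assumed field layout for illustration purposes
        "version": "5.2.0",
        "datacenter": {
            "name": "dc1",
            "racks": [{
                "name": "rack1",
                "members": 3,
                "storage": {"capacity": "10Gi"},
            }],
        },
    },
}

api.create_namespaced_custom_object(
    group="scylla.scylladb.com",
    version="v1",
    namespace="scylla",
    plural="scyllaclusters",
    body=scylla_cluster,
)
```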
From 1000/day to 1000/sec: The Evolution of Incapsula's BIG DATA System [Surg... – Imperva Incapsula
Mondrian, MySQL, Mongo, Cassandra, Lucene. You name it, we tried it. As a startup looking for cost-efficient and scalable solutions to power our event processing and statistics backend, we gave almost every Big Data technology out there a go. What we learned from these experiences is that doing it yourself is better than using plug-and-play black box solutions.
This presentation details the building of Incapsula’s Big Data system as a case study, examining the requirements and the different evolutionary phases it went through before becoming what it is today.
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Be... – Codemotion
A vast volume of our processed data is time series data, and once you start working with distributed systems, you start tackling many scale and performance problems: How to handle missing data? Should I handle both serving and backend processing together or separate them out? How to get the best performance for the money? In the talk we will tell the tale of all the transformations we’ve made to our data model at Windward, some of the problems we’ve handled, and review the multiple data persistency layers we use: S3, MongoDB, Apache Cassandra, MySQL. And I’ll try my best NOT to answer the question “Which one of them is the best?”
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix uses the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms and analyses, how they set up & keep schemas in sync between Hive, Presto, Redshift & Spark and make access easy for their data scientists, etc. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
In this talk, we share our experience building up our data pipeline. We started with MongoDB, migrated to Cassandra, and now use Kafka and Spark to handle our data. We also talk about what problems we encountered, why we selected these solutions, and where we will go next.
In 2018 we saw a huge uptick in applications using Kubernetes as their deployment method. Many times your persistent data layer is a difficult decision: what will you store data in, how long will you need to access this data, and who will manage the lifecycle of this data? These are important questions many developers and ops teams have taken to heart. In this talk we'll review how the data layer is managed for high availability and reliability in modern application deployments. Attendees should leave with a better understanding of the options in front of them and their ability to build applications in any hosting environment.
Enterprise Cloud Databases are fully managed and clustered databases tailored for production needs.
OVH takes care of all the infrastructure setup; you end up with your SQL access and are able to focus on your business.
In many database applications we first log data and then, a few hours or days later, we start analyzing it. But in a world that’s moving faster and faster, we sometimes need to analyze what is happening NOW.
Azure Stream Analytics allows you to analyze streams of data via a new Azure service. In this session you will see how to get started using this new service. From Event Hubs on the input side to temporal SQL queries, the demos in this session will show you end to end how to get started with Azure Stream Analytics.
Headaches and Breakthroughs in Building Continuous Applications – Databricks
At SpotX, we have built and maintained a portfolio of Spark Streaming applications -- all of which process records in the millions per minute. From pure data ingestion, to ETL, to real-time reporting, to live customer-facing products and features, continuous applications are in our DNA. Come along with us as we outline our journey from square one to present in the world of Spark Streaming. We'll detail what we've learned about efficient processing and monitoring, reliability and stability, and long term support of a streaming app. Come learn from our mistakes, and leave with some handy settings and designs you can implement in your own streaming apps.
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap... – Landon Robinson
At SpotX, we have built and maintained a portfolio of Spark Streaming applications -- all of which process records in the millions per minute. From pure data ingestion, to ETL, to real-time reporting, to live customer-facing products and features, continuous applications are in our DNA. Come along with us as we outline our journey from square one to present in the world of Spark Streaming. We'll detail what we've learned about efficient processing and monitoring, reliability and stability, and long term support of a streaming app. Come learn from our mistakes, and leave with some handy settings and designs you can implement in your own streaming apps.
Presented by Landon Robinson and Jack Chapa
OSMC 2019 | How to improve database Observability by Charles Judith – NETWAYS
Delivering a database service is not a simple job, and to ensure that everything is working correctly your platform needs to be observable. In this talk, I’ll cover how we make our MySQL/MariaDB databases observable. We’ll talk about the RED and USE methods and the golden signals. You’ll discover how we dealt with the classic complaint “We think the database is slow”. This talk will show you how to make your databases observable with open source solutions.
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World – WSO2
Abundant data is all around. The most important aspect is how you as an organization can access the data, process it, and present information to the relevant authorities on time. To gain a competitive advantage, the means of accessing, processing, and presenting the data should be optimal, highly available, and scalable.
In this talk, we will discuss how you can leverage WSO2 Data Analytics Server, WSO2 IoT Server, WSO2 Enterprise Service Bus and other WSO2 products in order to analyze the data. We will also discuss different deployment patterns that can provide you with a suitable solution that lets you analyze relevant data historically, in real-time or interactively and predict future states to make better decisions for your organization’s success.
Lessons Learned Replatforming A Large Machine Learning Application To Apache ... – Databricks
Morningstar’s Risk Model project is created by stitching together statistical and machine learning models to produce risk and performance metrics for millions of financial securities. Previously, we were running a single version of this application, but needed to expand it to allow for customizations based on client demand. With the goal of running hundreds of custom Risk Model runs at once at an output size of around 1TB of data each, we had a challenging technical problem on our hands! In this presentation, we’ll talk about the challenges we faced replatforming this application to Spark, how we solved them, and the benefits we saw.
Some things we’ll touch on include how we created customized models, the architecture of our machine learning application, how we maintain an audit trail of data transformations (for rigorous third party audits), and how we validate the input data our model takes in and output data our model produces. We want the attendees to walk away with some key ideas of what worked for us when productizing a large scale machine learning platform.
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S... – Spark Summit
Streaming applications have often been complex to design and maintain because of the significant upfront infrastructure investment required. However, with the advent of Spark an easy transition to stream processing is now available, enabling personalization applications and experiments to consume near real-time data without massive development cycles.
Our decision to evaluate Spark as our stream processing engine was primarily led by the following considerations: 1) ease of development for the team (already familiar with Spark for batch), 2) the scope/requirements of our problem, 3) re-usability of code from Spark batch jobs, and 4) Spark support from infrastructure teams within the company.
In this session, we will present our experience using Spark for stream processing unbounded datasets in the personalization space. The datasets consisted of, but were not limited to, the stream of playback events that are used as feedback for all personalization algorithms. These plays are used to extract specific behaviors which are highly predictive of a customer’s enjoyment of our service. This dataset is massive and has to be further enriched by other online and offline Netflix data sources. These datasets, when consumed by our machine learning models, directly affect the customer’s personalized experience, which means that the impact is high and tolerance for failure is low. We’ll talk about the experiments we did to compare Spark with other streaming solutions like Apache Flink, the impact that we had on our customers, and most importantly, the challenges we faced.
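A minimal sketch of that consume-and-aggregate pattern, written with today's Structured Streaming API rather than the code used in the talk; the broker, topic, and field names are invented for the example, and running it also requires Spark's Kafka connector package on the classpath:

```python
# Illustrative Structured Streaming sketch (hypothetical broker/topic/fields;
# not Netflix's code): read playback events from Kafka and count plays per profile.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("plays-aggregation").getOrCreate()

plays = (spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "playback-events")
         .load())

# Kafka values arrive as bytes; pull fields out of the JSON payload.
parsed = plays.select(
    F.get_json_object(F.col("value").cast("string"), "$.profile_id").alias("profile_id"),
    F.get_json_object(F.col("value").cast("string"), "$.title_id").alias("title_id"),
)

counts = parsed.groupBy("profile_id").count()   # running count of plays per profile

query = (counts.writeStream
         .outputMode("complete")   # emit the full aggregate table each trigger
         .format("console")
         .start())
query.awaitTermination()
```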
Take-aways for the audience:
1) A great example of stream processing large, personalization datasets at scale.
2) An increased awareness of the costs/requirements for making the transition from batch to streaming successfully.
3) Exposure to some of the technical challenges that should be expected along the way.
PyConline AU 2021 - Things might go wrong in a data-intensive application – Hua Chu
We are going to go behind the scenes of building a data-intensive system. The story includes challenges I have faced and what I learned from those incidents.
https://2021.pycon.org.au/program/8hlvvs/
Optimizing NoSQL Performance Through Observability – ScyllaDB
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream Processing – ScyllaDB
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac... – ScyllaDB
We start by setting up a common ground introducing why relational databases fall short, addressing common EDA characteristics such as the need for real-time response times and schemaless approaches to address recurring changes to adapt and on-board new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... – ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQL – ScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
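One small example of the query-first principle this webinar covers, with an entirely hypothetical schema: instead of normalizing and joining as you would in Postgres, the table is keyed by the exact question the application asks.

```python
# A minimal sketch (hypothetical schema) of query-first design: the table is
# built around the query "latest posts for a given author", so reads hit a
# single partition instead of requiring a join.
import uuid
from cassandra.cluster import Cluster   # the same driver works for Cassandra and ScyllaDB

session = Cluster(["127.0.0.1"]).connect()   # assumed local node
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.posts_by_author (
        author_id  uuid,
        created_at timeuuid,
        title      text,
        body       text,
        PRIMARY KEY ((author_id), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC)""")

# The read is a single-partition query: "10 most recent posts by this author".
author_id = uuid.uuid4()   # illustrative; normally an id you already hold
rows = session.execute(
    "SELECT title, created_at FROM demo.posts_by_author "
    "WHERE author_id = %s LIMIT 10",
    (author_id,))
print(list(rows))
```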
What Developers Need to Unlearn for High Performance NoSQL – ScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & Pitfalls – ScyllaDB
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit MS latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience with distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDB – ScyllaDB
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe... – ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya... – ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna – ScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
Technical risks of putting a cache in front of your database – and what to do instead
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching is a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of ScyllaDB's specialized row-based cache
- Real-world examples of why and how teams eliminated their external cache with ScyllaDB
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability – ScyllaDB
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database.pptx – ScyllaDB
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching can be a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of specialized row-based caches
- Real-world examples of why and how teams eliminated their external cache
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration – ScyllaDB
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, covering Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
NoSQL Data Migration Masterclass - Session 1: Migration Strategies and Challenges – ScyllaDB
In this talk, Jon discusses practical strategies and issues to consider. He covers:
Reasons for Migrations
DB Functionality
Cost/Licensing
Outdated Technology
Scaling Problems
Technology Evolution
SQL to NoSQL
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdf – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features by simply following a framework's standard rules, assume that this is enough to launch the project successfully, and then watch the project fail. How do you prevent this, and what approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
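For readers who want to try the Python binding before the workshop, here is a minimal sketch (not taken from the webinar material) assuming the pypowsybl package is installed: it loads a bundled IEEE 14-bus test network, runs an AC power flow, and inspects the resulting bus voltages.

# Minimal pypowsybl sketch (assumption: pip install pypowsybl); not from the webinar
import pypowsybl as pp

network = pp.network.create_ieee14()          # bundled IEEE 14-bus example network
results = pp.loadflow.run_ac(network)         # run an AC power flow
for component_result in results:
    print(component_result.status)            # convergence status per connected component
print(network.get_buses()[["v_mag", "v_angle"]])  # computed bus voltages as a pandas DataFrame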
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
2. Daniel Belenky
■ Kubernetes & Virtualization
■ Distributed applications
■ Big data and stream processing
Principal Software Engineer
3. Agenda
■ A brief intro to the product and my team - 3 min
■ What was the challenge that we were facing - 5 min
■ What solutions were considered - 5 min
■ How we’ve managed to solve the problem with Scylla - 12 min
5. Our product
It is a security product that performs analytics, detection, and response.
■ Millions of records per second
■ Multiple data sources and schemas
■ Has to provide insights in a near real-time timeframe (security...)
6. About my team
We are responsible for the infrastructure that:
■ Handles stream processing of data that comes from multiple sources.
■ Cleans, normalizes, and processes the data - prepares it for further analysis.
■ Builds stories - multiple data sources emit different events and provide different views on the same network session, and we want to fuse those events that tell the same story from a different perspective.
■ We mostly develop with Go and Python
■ Deployment is on K8s
10. Problem description (part 1)
Various sensors see a network event
{event: dns-query, id: 6c92e}
{event: dns-query, id: 873a1}
...
10:00:01
11. Problem description (part 1)
Various sensors see a network event
{event: dns-query, id: 6c92e}
{event: dns-query, id: 873a1}
...
10:00:01
10:00:02
12. Problem description (part 1)
Various sensors see a network event
{event: dns-query, id: 6c92e}
{event: dns-query, id: 873a1}
...
10:00:01
10:00:02 {kind: login, id: 13}
{kind: signup, id: 17}
...
13. Problem description (part 1)
Various sensors see a network event
{kind: login, id: 13}
{kind: signup, id: 17}
...
{event: dns-query, id: 6c92e}
{event: dns-query, id: 873a1}
...
10:00:01
10:08:05
10:00:02
14. Problem description (part 1)
Various sensors see a network event
{kind: login, id: 13}
{kind: signup, id: 17}
...
{type: GET, id: CHJW}
{type: POST, id: KQJD}
...
{event: dns-query, id: 6c92e}
{event: dns-query, id: 873a1}
...
10:00:01
10:08:05
10:00:02
15. Problem description (part 2)
Data from different sensors comes in different forms and formats, at different times
16. Problem description (part 2)
Data from different sensors comes in different forms and formats, at different times
17. Problem description (part 2)
Data from different sensors comes in different forms and formats, at different times → Normalized data in a canonical form, ready for processing
18. Problem description (part 2)
Data from different sensors comes in different forms and formats, at different times → Normalized data in a canonical form, ready for processing → ?
Millions of normalized but unassociated entries per second from many different sources
19. The question is:
How to associate discrete entries that describe the same network session?
21. Why is it a challenge?
■ Clock skew across different sensors
Clocks across sensors might not be synchronized to the second
22. Why is it a challenge?
■ Clock skew across different sensors
Clocks across sensors might not be synchronized to the second
■ We have thousands of deployments to manage
Deployments also vary in size (from Bps to GBps)
23. Why is it a challenge?
■ Clock skew across different sensors
Clocks across sensors might not be synchronized to the second
■ We have thousands of deployments to manage
Deployments also vary in size (from Bps to GBps)
■ Sensor’s viewpoint on the session
Different sensors have different views on the same session
24. Why is it a challenge?
■ Clock skew across different sensors
Clocks across sensors might not be synchronized to the second
■ We have thousands of deployments to manage
Deployments also vary in size (from Bps to GBps)
■ Sensor’s viewpoint on the session
Different sensors have different views on the same session
■ Zero tolerance for data loss
Data is pushed to us and if we lose it, it’s lost for good
25. Why is it a challenge?
■ Clock skew across different sensors
Clocks across sensors might not be synchronized to the second
■ We have thousands of deployments to manage
Deployments also vary in size (from Bps to GBps)
■ Sensor’s viewpoint on the session
Different sensors have different views on the same session
■ Zero tolerance for data loss
Data is pushed to us and if we lose it, it’s lost for good
■ Continuous out-of-order stream
Sensors send data at different times, and event time != ingestion time != processing time
39. So… what do we need here?
■ Receive a stream of events
40. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
41. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
■ Decide which events are related to each other
42. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
■ Decide which events are related to each other
■ Publish the results
43. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
■ Decide which events are related to each other
■ Publish the results
44. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
■ Decide which events are related to each other
■ Publish the results
■ Single tenant deployment - we need isolation
45. So… what do we need here?
■ Receive a stream of events
■ Wait some amount of time to allow related events to arrive
■ Decide which events are related to each other
■ Publish the results
■ Single tenant deployment - we need isolation
■ Support rates from several KB per hour up to several GBs per second at a reasonable cost
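To make these requirements concrete, here is an illustrative Python sketch (not the team's actual code): it buffers incoming normalized events for a grace period so that late, out-of-order events can still join their group, and then publishes the groups whose grace period has elapsed. The session_key field used for grouping is a hypothetical stand-in for the real correlation logic.

# Illustrative sketch only; "session_key" is a hypothetical correlation field
import time
from collections import defaultdict

GRACE_PERIOD_SECONDS = 60          # how long we wait for related events to arrive

buffers = defaultdict(list)        # session_key -> events collected so far
opened_at = {}                     # session_key -> when the buffer was opened

def on_event(event):
    """Called for every normalized event received from the stream."""
    key = event["session_key"]
    buffers[key].append(event)
    opened_at.setdefault(key, time.time())

def flush_closed_groups(publish):
    """Publish every group whose grace period has elapsed, then drop its buffer."""
    now = time.time()
    expired = [k for k, t in opened_at.items() if now - t >= GRACE_PERIOD_SECONDS]
    for key in expired:
        publish({"story_key": key, "events": buffers.pop(key)})
        del opened_at[key]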
49. Proposed solution #1
Normalized data in a canonical form ready for processing → Store the records in a relational DB → Periodical tasks to compute stories
50. Proposed solution #1
Normalized data in a canonical form ready for processing → Store the records in a relational DB → Periodical tasks to compute stories → Publish stories for other components to consume
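A hypothetical sketch of what the "periodical tasks" could look like in this relational approach (table and column names are illustrative, not the actual schema): a job wakes up periodically and groups recently inserted events into stories with a single query.

# Hypothetical sketch of solution #1; sqlite3 stands in for any relational database
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id    TEXT PRIMARY KEY,
        session_key TEXT,            -- hypothetical correlation key
        payload     TEXT,
        insert_time REAL             -- epoch seconds at insertion
    )
""")

def compute_stories(since_ts):
    """Group every event inserted since the last run by its correlation key."""
    rows = conn.execute(
        """
        SELECT session_key, GROUP_CONCAT(event_id)
        FROM events
        WHERE insert_time >= ?
        GROUP BY session_key
        """,
        (since_ts,),
    ).fetchall()
    return [{"story": key, "event_ids": ids.split(",")} for key, ids in rows]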
52. Pros
■ Relatively simple implementation: we have to orchestrate the data and the queries, but we don’t have to write any complex logic ourselves
53. Pros
■ Relatively simple implementation: we have to orchestrate the data and the queries, but we don’t have to write any complex logic ourselves
Cons
54. Pros
■ Relatively simple implementation: we have to orchestrate the data and the queries, but we don’t have to write any complex logic ourselves
Cons
■ Operational overhead - we have to deploy, maintain and operate another database
55. Pros
■ Relatively simple implementation: we have to orchestrate the data and the queries, but we don’t have to write any complex logic ourselves
Cons
■ Operational overhead - we have to deploy, maintain and operate another database
■ Limited performance - relational database queries are slower when compared to queries on a NoSQL database (if the data model allows utilizing a NoSQL database)
56. Pros
■ Relatively simple implementation: we have to orchestrate the data and the queries, but we don’t have to write any complex logic ourselves
Cons
■ Operational overhead - we have to deploy, maintain and operate another database
■ Limited performance - relational database queries are slower when compared to queries on a NoSQL database (if the data model allows utilizing a NoSQL database)
■ Operational cost - complex queries require more CPU hence are more expensive
61. Proposed solution #2
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Publish keys to fetch the records
Records can’t be sent on Kafka because they are too big, so we send only the primary key to fetch from Scylla
62. Proposed solution #2
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Publish keys to fetch the records
Multiple consumers read data from a Kafka topic
63. Proposed solution #2
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Publish keys to fetch the records → Multiple consumers read data from a Kafka topic → Fetch records from Scylla
64. Proposed solution #2
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Publish keys to fetch the records → Multiple consumers read data from a Kafka topic → Fetch records from Scylla → Publish stories for other components to consume
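A hedged sketch of this "claim check" style pipeline (topic, keyspace and table names are illustrative, and the libraries shown - kafka-python and the Cassandra/Scylla Python driver - are assumptions rather than the team's actual stack): the full record is written to ScyllaDB and only its primary key travels through Kafka; consumers read keys from the topic and fetch the full records from Scylla.

# Illustrative only; names and libraries are assumptions, not Palo Alto Networks' code
import json
from kafka import KafkaProducer, KafkaConsumer      # pip install kafka-python
from cassandra.cluster import Cluster               # works with ScyllaDB as well

session = Cluster(["scylla-node1"]).connect("events_ks")
producer = KafkaProducer(bootstrap_servers=["kafka:9092"],
                         value_serializer=lambda v: json.dumps(v).encode())

def store_and_announce(event):
    # Write the (possibly large) record to ScyllaDB...
    session.execute(
        "INSERT INTO events (event_id, payload) VALUES (%s, %s)",
        (event["event_id"], json.dumps(event)),
    )
    # ...and publish only its key on Kafka.
    producer.send("event-keys", {"event_id": event["event_id"]})

def consume_and_fetch():
    consumer = KafkaConsumer("event-keys", bootstrap_servers=["kafka:9092"],
                             value_deserializer=lambda v: json.loads(v.decode()))
    for message in consumer:
        row = session.execute(
            "SELECT payload FROM events WHERE event_id = %s",
            (message.value["event_id"],),
        ).one()
        if row is not None:
            yield json.loads(row.payload)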
69. Pros
■ High throughput
■ One less database to maintain
Cons
■ We have to write our own logic to find correlations and build stories
70. Pros
■ High throughput
■ One less database to maintain
Cons
■ We have to write our own logic to find correlations and build stories
■ Complex architecture and deployment
71. Pros
■ High throughput
■ One less database to maintain
Cons
■ We have to write our own logic to find correlations and build stories
■ Complex architecture and deployment
■ We have to maintain thousands of Kafka deployments
73. Proposed solution #3
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Publish keys to fetch the records → Queue → Fetch records from Scylla using keys received from the queue → Process and compute stories → Publish stories for other components to consume
76. Pros
■ High throughput when compared to the relational database approach
■ One less database to maintain
77. Pros
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
78. Pros
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
Cons
79. Pros
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
Cons
■ Much slower performance when compared to Kafka
80. The solution that solved our use case
Using ScyllaDB - no message queue
81. Accepted solution - high level
Normalized data in a canonical form ready for processing
82. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB
83. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB
The data is sharded into hundreds of shards
84. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB
The data is sharded into hundreds of shards
Partition key is a tuple of (shard-number, insert_time); clustering key is (event id)
85. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Multiple consumers fetch records from Scylla using their assigned shard numbers and the time they want to consume
The data is sharded into hundreds of shards
Partition key is a tuple of (shard-number, insert_time); clustering key is (event id)
The step resolution is configurable
86. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Multiple consumers fetch records from Scylla using their assigned shard numbers and the time they want to consume → Process and compute stories
The data is sharded into hundreds of shards
Partition key is a tuple of (shard-number, insert_time); clustering key is (event id)
The step resolution is configurable
87. Accepted solution - high level
Normalized data in a canonical form ready for processing → Store the records in ScyllaDB → Multiple consumers fetch records from Scylla using their assigned shard numbers and the time they want to consume → Process and compute stories → Publish stories for other components to consume
The data is sharded into hundreds of shards
Partition key is a tuple of (shard-number, insert_time); clustering key is (event id)
The step resolution is configurable
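Based on the keys described above, here is a minimal sketch of the data model and the consumer's polling loop (keyspace, table, bucket size and shard count are illustrative values, not the production configuration): the partition key is (shard_number, insert_time bucket), the clustering key is the event id, and each consumer walks its assigned shards bucket by bucket at the configured step resolution.

# Minimal sketch of the accepted design; all names and numbers are illustrative
import time
import uuid
from datetime import datetime, timezone
from cassandra.cluster import Cluster               # works with ScyllaDB as well

BUCKET_SECONDS = 10     # the configurable "step resolution"
NUM_SHARDS = 256        # "hundreds of shards"

session = Cluster(["scylla-node1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS events_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS events_ks.events (
        shard_number int,
        insert_time  timestamp,
        event_id     uuid,
        payload      blob,
        PRIMARY KEY ((shard_number, insert_time), event_id)
    )
""")

insert = session.prepare(
    "INSERT INTO events_ks.events (shard_number, insert_time, event_id, payload) "
    "VALUES (?, ?, ?, ?)")
select = session.prepare(
    "SELECT event_id, payload FROM events_ks.events "
    "WHERE shard_number = ? AND insert_time = ?")

def bucket(ts):
    # Round wall-clock time down to the bucket boundary; producers and consumers
    # must agree on this, which is why their clocks need to be loosely synchronized.
    return datetime.fromtimestamp(int(ts // BUCKET_SECONDS) * BUCKET_SECONDS, tz=timezone.utc)

def write_event(payload):
    event_id = uuid.uuid4()
    shard_number = event_id.int % NUM_SHARDS     # spread writes across the shards
    session.execute(insert, (shard_number, bucket(time.time()), event_id, payload))

def consume(shard_number, ts):
    # Each consumer owns a subset of the shard numbers and reads one time bucket at a time.
    return list(session.execute(select, (shard_number, bucket(ts))))

In this sketch each (shard_number, time bucket) partition effectively behaves like a small queue segment, which is what lets ScyllaDB stand in for the message queue.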
89. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
90. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
91. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
■ One less database to maintain
92. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
93. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
Cons
94. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
Cons
■ Our code became more complex
95. Pros
■ Since we already have Scylla deployed for other parts, we don’t have to add any new creatures to the system
■ High throughput when compared to the relational database approach
■ One less database to maintain
■ No need to maintain Kafka deployments
Cons
■ Our code became more complex
■ Producers and consumers have to have synchronized clocks (up to a certain resolution)