We will share our practices for adopting Scylla to manage equipment sensor data in an MES, along with data modeling tips, a Scylla-based data architecture, configurations, and tuning.
Scylla Summit 2017: SMF: The Fastest RPC in the West - ScyllaDB
On a quest to build the fastest durable log broker in the west, we had to rethink all of the components needed to deliver on this promise. First, we began by building the fastest RPC system in the west, SMF. SMF is a new RPC mechanism, IDL-compiler, and libraries that make using Seastar easy. In this talk, I will cover SMF in detail and show a live demo on how you can get started using it to build your next application so you can live in the future.
Scylla Summit 2017: Planning Your Queries for Maximum Performance - ScyllaDB
What happens to a request that reaches Scylla, and why should one care? Understanding how Scylla executes your queries can help you make better architectural decisions and also better understand the performance of your application.
Are my rows too big? Should I make that other column a part of my partition key instead? This talk will cover the interaction between nodes, shards and the role of Scylla's internal components like memtables, cache and sstables. I will explain how different types of queries are executed and how to plan your queries for maximum performance.
Scylla Summit 2017: How to Run Cassandra/Scylla from a MySQL DBA's Point of View - ScyllaDB
Are you a MySQL DBA or DevOps individual being asked to run Cassandra or Scylla? Feeling overwhelmed? In this talk, I will present Cassandra/Scylla operations in terms that directly relate to MySQL. I will show you comparisons between the Information Schema and the Cassandra/Scylla System keyspace(s). I will also talk about metrics available in MySQL versus Cassandra/Scylla and how to retrieve them. Finally, I will talk about how MySQL replication compares with Cassandra replication. Hopefully, when I am done you will be able to relate to Cassandra operations in a practical and useful way.
Scylla Summit 2017: From Elasticsearch to Scylla at Zenly - ScyllaDB
Zenly (recently acquired by Snap) makes a social map app. Their team has been running Scylla in production for the past eight months. Get an overview of the reasons they chose Scylla, its deployment on Google Cloud, and the performance they achieved, and learn from the few hiccups they hit along the way.
Scylla Summit 2017: Stateful Streaming Applications with Apache Spark - ScyllaDB
When working with streaming data, stateful operations are a common use case. If you would like to perform data de-duplication, calculate aggregations over event-time windows, or track user activity over sessions, you are performing a stateful operation.
Apache Spark provides users with a high-level, simple-to-use DataFrame/Dataset API to work with both batch and streaming data. The funny thing about batch workloads is that people tend to run them over and over again. Structured Streaming allows users to run these same workloads, with the exact same business logic, in a streaming fashion, helping users answer questions at lower latencies.
In this talk, we will focus on stateful operations with Structured Streaming, and we will demonstrate through live demos how NoSQL stores can be plugged in as a fault-tolerant state store for intermediate state, as well as used as a streaming sink where the output data can be stored indefinitely for downstream applications.
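The two stateful operations named in this abstract, de-duplication and aggregation over event-time windows, can be illustrated with a minimal plain-Python sketch. This is not the Spark Structured Streaming API and not code from the talk; the function name `windowed_counts`, the `(event_id, key, event_time)` tuple layout, and the 60-second tumbling window are all assumptions chosen for illustration. The point is only that both operations need state that outlives a single event.

```python
from collections import defaultdict


def windowed_counts(events, window_seconds=60):
    """Count unique events per (window_start, key), skipping repeated event ids.

    `events` is an iterable of (event_id, key, event_time_in_seconds) tuples.
    This layout is hypothetical and chosen only for this sketch.
    """
    seen_ids = set()           # state for de-duplication
    counts = defaultdict(int)  # state for the running per-window aggregate
    for event_id, key, event_time in events:
        if event_id in seen_ids:
            continue  # duplicate delivery: already counted
        seen_ids.add(event_id)
        # Tumbling window: round the event time down to the window boundary.
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [
    ("e1", "alice", 5),
    ("e2", "alice", 42),
    ("e1", "alice", 5),   # same id delivered twice
    ("e3", "bob", 61),
]
result = windowed_counts(events)
```

In a real streaming job, `seen_ids` and `counts` are the intermediate state that has to survive failures, which is the role the abstract assigns to a fault-tolerant state store.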
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center - ScyllaDB
Frank will share the motivation behind 3D XPoint memory, describe the currently shipping Optane SSD product and its key advantages over NAND-based SSDs, and present a few open source database use cases for Optane SSDs.
Scylla Summit 2017: The Upcoming HPC Evolution - ScyllaDB
In this talk, I will explain how HPC is beginning to evolve and how we use supercomputers to monitor supercomputers. First we will look at how HPC is different from cloud computing in terms of infrastructure and application architecture. Then I will discuss how those things are changing and why. Finally, I will dive into a use case of monitoring supercomputers as an application area for Scylla.
Scylla Summit 2017: How We Got to 1 Millisecond Latency in 99% Under Repair, ... - ScyllaDB
Glauber Costa, a Principal Architect at ScyllaDB, discusses techniques for achieving low latency database operations. He identifies three main sources of latency: speed mismatch between disk and CPU, lack of respect for task quotas, and imperfect isolation. Glauber describes how ScyllaDB addresses these issues through techniques like the I/O scheduler, CPU scheduler, task quotas, block detector, and controllers that regulate operations like memtable flushes. The goal is to make high percentile latencies low and bounded by treating them as bugs rather than nice-to-haves. ScyllaDB users can already benefit from these latency improvements in many situations, with more fixes coming in future releases.
Scylla Summit 2017: Migrating to Scylla From Cassandra and Others With No Dow... - ScyllaDB
The session will cover the best practices to migrate existing data from Apache Cassandra to Scylla and how to do it while being online all of the time.
Scylla Summit 2017: Running a Soft Real-time Service at One Million QPS - ScyllaDB
AdGear runs an ad tech gateway at more than one million queries per second to Scylla and recently transitioned from Apache Cassandra. In this talk, we will highlight the tools and languages that we use (Erlang), how we do bulk imports, and how performance compares between the two database engines.
ScyllaDB CTO Avi Kivity gave a keynote on how Scylla has evolved. He discussed new features in Scylla 2.0—including Materialized Views and Heat-Weighted Load Balancing, changes in monitoring—and shared our product roadmap. He also talked about our recent acquisition of Seastar.io and how it will enable us to deliver a database-as-a-service offering.
Scylla Summit 2017 Keynote: NextGen NoSQL with CEO Dor Laor - ScyllaDB
ScyllaDB CEO and co-founder Dor Laor shares his vision for Scylla and announces Scylla 2.0, a big step towards the first autonomous NoSQL database—one that dynamically tunes itself to varying conditions while always maintaining a high level of performance.
Scylla Summit 2017: How to Optimize and Reduce Inter-DC Network Traffic and S... - ScyllaDB
This presentation covers optimizing inter-data center communication: what it involves, what it costs, and best practices for configuring snitches, keyspaces, client drivers, and query consistency levels to improve performance between data centers. It recommends using network topology replication strategies over simple strategies for multi-region deployments, setting load balancing and consistency levels appropriately in clients, and enabling internode compression to reduce the cost of communication between data centers. The presentation encourages reviewing client locations, data access patterns, and who is reading and writing data, and having conversations between operations and development teams to determine the best use cases.
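A short sketch of the standard quorum arithmetic may help show why the choice of consistency level affects inter-DC traffic. `QUORUM` needs a majority of all replicas across data centers, while `LOCAL_QUORUM` needs a majority of the replicas in the coordinator's own data center. The data center names and replication factors below are hypothetical and are not taken from the presentation.

```python
def quorum(replication_factors):
    """Replicas that must respond for QUORUM: a majority of all replicas."""
    total = sum(replication_factors.values())
    return total // 2 + 1


def local_quorum(replication_factors, local_dc):
    """Replicas that must respond for LOCAL_QUORUM: a majority within one DC."""
    return replication_factors[local_dc] // 2 + 1


# Hypothetical two-DC keyspace with three replicas in each data center.
rf = {"dc_east": 3, "dc_west": 3}

needed_quorum = quorum(rf)               # majority of 6 replicas
needed_local = local_quorum(rf, "dc_east")  # majority of 3 local replicas

# With QUORUM, local replicas alone cannot satisfy the request, so at least
# this many responses must cross the inter-DC link.
remote_acks_needed = max(0, needed_quorum - rf["dc_east"])
```

Under these assumed replication factors, `QUORUM` waits on at least one remote replica for every request, while `LOCAL_QUORUM` can be satisfied entirely within the local data center.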
Kubernetes is a declarative system for automatically deploying, managing, and scaling applications and their dependencies. In this short talk, I'll demonstrate a small Scylla cluster running in Google Compute Engine via Kubernetes and our publicly-published Docker images.
Scylla Summit 2017: Scylla on Samsung NVMe Z-SSDs - ScyllaDB
I will be giving a talk about performance characterization and tuning of Scylla on Samsung NVMe SSDs. We will characterize the performance of Scylla on Samsung high-performance NVMe SSDs and show how Z-SSD, the Samsung ultra-low-latency NVMe drive, can significantly shrink the performance gap between in-memory and in-storage with Scylla.
We will further evaluate the throughput-vs-latency profile of Scylla with NVMe devices and present end-to-end latencies (from the client's viewpoint) as well as the latencies of the software/hardware stack. We will show that a Z-SSD-backed Scylla cluster can provide competitive performance to an in-memory deployment while sharply reducing costs.
If You Care About Performance, Use User Defined Types - ScyllaDB
Shlomi Livne, VP of R&D at ScyllaDB, presented on the performance benefits of using user-defined types (UDTs) in ScyllaDB. He explained that with traditional columns, each column has overhead and flexibility comes at a price. However, with frozen UDTs, the columns are treated as a single unit, sharing metadata and improving performance. Livne showed results of a test where UDTs with many fields outperformed traditional columns with the same number of fields. However, he noted that Scylla's row cache and Java driver performance need improvement for UDTs.
Scylla Summit 2017: How to Use Gocql to Execute Queries and What the Driver D... - ScyllaDB
This document outlines a presentation on using the GoCQL driver to execute queries against Cassandra and Scylla databases. It discusses connecting to a Cassandra cluster, executing queries, iterating over results, and using asynchronous queries. It also mentions some additional Cassandra libraries built on top of GoCQL, including gocqlx for data binding and queries, and gocassa for queries and migrations. The presentation aims to explain how GoCQL works behind the scenes and how to get started with basic querying functionality.
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo... - ScyllaDB
JanusGraph, a highly scalable graph database solution, has historically supported Cassandra and HBase as database backends. We decided to put Scylla in the mix in search of the best-performing backend. We ran test scenarios that cover high-volume reads and writes. In this talk, we will show you the performance results of Scylla versus the others and share the lessons we learned during the performance evaluation.
Scylla Summit 2017: Snapfish's Journey Towards Scylla - ScyllaDB
Snapfish, a web-based photo and printing service, will walk through their evaluation process for a new database, discuss use cases, and explain how they plan to use Scylla in their production systems.
Duarte Nunes presented on distributed materialized views in ScyllaDB. He discussed the challenges of implementing materialized views in a distributed system without a single master, including propagating updates from base tables to views, handling consistency when tables can diverge, and managing concurrent updates safely. His proposed solution uses asynchronous replica-based propagation paired with repair mechanisms and locking or optimistic concurrency to address these issues. Materialized views provide powerful indexing capabilities but also introduce performance overhead that is difficult to avoid given Scylla's data model.
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W... - ScyllaDB
In my talk, I will present the different compaction strategies that Scylla provides, and demonstrate when it is appropriate and when it is inappropriate to use each one. I will then present a new compaction strategy that we designed as a lesson from the existing compaction strategies by picking the best features of the existing strategies while avoiding their problems.
Scylla Summit 2017: Scylla's Open Source Monitoring Solution - ScyllaDB
Scylla's monitoring capability has come a long way in the last year. We now have native support for Prometheus. Through scylla-grafana-monitoring, we have started providing default dashboards summarizing the most important aspects of Scylla for users. In this talk, I will cover what is currently available in our metrics, other non-standard metrics that are interesting but not available in our main dashboard, as well as our future plans for enhancement.
Scylla Summit 2017: A Deep Dive on Heat Weighted Load Balancing - ScyllaDB
This presentation discusses the "cold node problem" that occurs when a node restarts in a Cassandra cluster. When a node restarts, it loses its cached data and becomes a bottleneck. The presentation proposes a "heat weighted load balancing" solution where the cluster tracks each node's cache hit ratio and redistributes requests based on this ratio after a restart. Testing shows this solution significantly improves throughput after a node restart by distributing requests more evenly across nodes based on their "heat" or cache contents.
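The idea of routing by cache "heat" can be sketched as follows. This is a deliberately simplified illustration and not Scylla's actual algorithm: it assigns each node a share of requests proportional to its cache hit ratio. The `floor` parameter is an assumption added here so that a freshly restarted node still receives some traffic and can warm its cache. The node names and ratios are hypothetical.

```python
def heat_weights(hit_ratios, floor=0.05):
    """Return each node's share of requests, proportional to its cache hit ratio.

    `floor` (an assumption of this sketch) keeps a small trickle of traffic
    going to cold nodes so their caches can warm up.
    """
    floored = {node: max(ratio, floor) for node, ratio in hit_ratios.items()}
    total = sum(floored.values())
    return {node: value / total for node, value in floored.items()}


# Hypothetical cluster: node_c has just restarted and has an empty cache.
ratios = {"node_a": 0.9, "node_b": 0.9, "node_c": 0.0}
weights = heat_weights(ratios)
```

As `node_c` warms up and its hit ratio rises, recomputing the weights shifts traffic back toward an even split.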
Scylla Summit 2017: Cry in the Dojo, Laugh in the Battlefield: How We Constan... - ScyllaDB
Testing a complex system like Scylla is a challenge on its own. There are many environments, workloads, and problems. Simple problems become increasingly worse at scale. In this talk, we will explore the testing method that we employ in our QA lab and our plans to make it even better in years to come.
Scylla Summit 2017: Repair, Backup, Restore: Last Thing Before You Go to Prod... - ScyllaDB
Benchmarks are fun to do but when going to production, all sorts of things can happen: anything from hardware outages to human error bringing your database down. Even in a healthy database, a lot of maintenance operations have to periodically run. Do you have the tools necessary to make sure you are good to go?
Scylla Summit 2017: Welcome and Keynote - Nextgen NoSQL - ScyllaDB
Our CEO and co-founder Dor Laor and our chairman Benny Schnaider share their vision for Scylla. This was also our opportunity to announce Scylla 2.0. Our latest release is a big step toward the first autonomous NoSQL database—one that dynamically tunes itself to varying conditions while always maintaining a high level of performance.
Scylla Summit 2017: Saving Thousands by Running Scylla on EC2 Spot Instances - ScyllaDB
Scylla and Spotinst together provide a strong combination of extreme performance and cost reduction. In this talk, we will present how a Scylla cluster can be used on AWS’s EC2 Spot without losing consistency with the help of Spotinst prediction technology and advanced stateful features. We will show a live demo on how to run Scylla on the Spotinst platform.
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli... - ScyllaDB
This document discusses using ScyllaDB as the data store for machine learning workflow pipelines processing IoT device data on Kubernetes. It describes SmartDeployAI's goal of creating reusable AI/ML pipelines and the challenges of previous approaches using Cassandra. ScyllaDB allows building cloud native ML pipelines that can efficiently run multiple workflows on Kubernetes and store model metadata, hyperparameters, and inference results for real-time analysis of IoT sensor data. Examples of computer vision pipelines for object detection and scene parsing are provided.
This presentation is an attempt to demystify the practice of building reliable data processing pipelines. We go through the pieces needed to build a stable processing platform: data ingestion, processing engines, workflow management, schemas, and pipeline development processes. The presentation also includes component choice considerations and recommendations, as well as best practices and pitfalls to avoid, most learnt through expensive mistakes.
Torsten Steinbach and Chris Glew present IBM Cloud Query, a serverless analytics service that allows users to run ANSI SQL queries against data stored in cloud object storage. Some key points:
- IBM Cloud Query allows users to query data in various open formats like CSV, Parquet, and JSON stored in cloud object storage using SQL, with results also stored in object storage.
- It has a pay-per-query pricing model with no infrastructure to manage. Queries can be run via a web console, REST API, or Python client.
- The presentation outlines the architecture and provides examples of using Cloud Query for log analytics, data exploration, and building serverless data pipelines with Cloud Functions.
M|18 Understanding the Architecture of MariaDB ColumnStore - MariaDB plc
The document provides an overview of MariaDB ColumnStore, including its history, components, disk storage architecture, writing and querying data processes. It was presented by Andrew Hutchings, the lead software engineer for MariaDB ColumnStore, who has previous experience with MySQL, HP, and other companies. The presentation covers the technical use cases for ColumnStore, differences from row-oriented databases, and optimizations for ColumnStore.
Netflix's Transition to High-Availability Storage (QCon SF 2010)Sid Anand
This talk focuses on Netflix's transition from Oracle to SimpleDB -- a cloud-hosted, key-value store -- during Netflix's transition to the cloud (i.e. AWS). Stay tuned for future talks as Netflix evaluates more technologies, e.g. Cassandra.
Data Science Connect, July 22nd 2014 @IBM Innovation Center ZurichRomeo Kienzler
This document discusses data science tools and techniques for working with large datasets. It begins by introducing data science and the tools currently used by data scientists, such as SQL, R, and Python. It then discusses challenges with big data like data that exceeds memory limits. The document proposes solutions like Hadoop and its shared-nothing architecture to enable distributed, parallel processing across clusters of machines. It also presents SQL engines like Hive, Impala, and BigSQL that run on Hadoop and provide SQL interfaces. Finally, it demonstrates running SQL queries and R programs on a small Hadoop cluster to summarize and analyze large datasets that do not fit on a single machine.
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps MariaDB users understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads, and covers:
Use cases
Query processing
Bulk data insertion
Distributed partitions
Query optimization
RDBMS to NoSQL: Practical Advice from Successful MigrationsScyllaDB
When and how to migrate data from SQL to NoSQL are matters of much debate. It can certainly be a daunting task, but when your SQL systems hit architectural limits or your Aurora expenses skyrocket, it’s probably time to consider the move.
See a discussion of how best to migrate data from SQL to NoSQL, and how to get heterogenous data systems to communicate with each other effectively in real time. Get important architectural considerations, tips and tricks and several real-world use cases.
From this webinar you will learn:
Key differences between RDBMS and NoSQL, and how to know when it’s time to migrate
How to harness the greatest strengths out of both classes of databases, SQL and NoSQL
Migration techniques proven in the field
Modeling differences between RDBMS and NoSQL
Managing releases in NoSQL vs RDBMS
Scylla features and services that help with migrating from a relational database
Netflix migrated to the cloud to avoid single points of failure and to focus on their core competencies. They chose Amazon Web Services and migrated non-sensitive data and applications to the cloud. Netflix picked SimpleDB and S3 as their data stores in the cloud. Migrating from an RDBMS required translating relational concepts like normalization to key-value stores and working around issues with SimpleDB like lack of data types and transactions.
Building an Amazon Datawarehouse and Using Business Intelligence Analytics ToolsAmazon Web Services
Solving business problems and uncovering new opportunities with data on AWS has never been easier or more affordable. Now, businesses of all sizes and across all industries can take advantage of big data technologies and easily collect, store, process, analyze, and share their data. Gain a thorough understanding of what AWS offers across the big data lifecycle and learn architectural best practices for applying these technologies to your projects. We will also deep dive into how to use AWS services such as Kinesis, DynamoDB, Redshift, and Quicksight to optimize logging, build real-time applications, and analyze and visualize data at any scale.
NoSQL Database- cassandra column Base DBsadegh salehi
This document discusses column-oriented NoSQL databases and Cassandra. It begins with a review of NoSQL and an overview of key-value and column-oriented database models. It then provides details on Cassandra, including its data model, architecture, origins at Facebook, and suitability for large datasets. Examples are given of how Cassandra could be used for user profiles, shopping carts, and large datasets. Comparisons are made to MySQL and reasons for choosing Cassandra are outlined.
The document discusses modern big data trends and technologies. It covers topics like the role of data engineers, architectures like data mesh and lambda architectures, technologies like SQL, Apache Spark, and serverless computing, maturity of data governance and platforms, and innovations in areas like AI-driven analytics and data lake houses. The target audience is managers and engineers to provide an outlook on the latest developments in big data.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
The document discusses the role of a Chief Data Officer (CDO) and how data architecture can help support that role. It provides examples of what BT is doing with data in the absence of a CDO, including building a data platform called HAAS to centralize data and facilitate self-service analytics. The challenges of BT's complex legacy systems are also outlined. Data architecture helps by developing a vision, building the data infrastructure platform, and educating others on its use to support data-driven initiatives without a formal CDO.
Solving enterprise challenges through scale out storage & big compute finalAvere Systems
Google Cloud Platform, Avere Systems, and Cycle Computing experts will share best practices for advancing solutions to big challenges faced by enterprises with growing compute and storage needs. In this “best practices” webinar, you’ll hear how these companies are working to improve results that drive businesses forward through scalability, performance, and ease of management.
The slides were from a webinar presented January 24, 2017. The audience learned:
- How enterprises are using Google Cloud Platform to gain compute and storage capacity on-demand
- Best practices for efficient use of cloud compute and storage resources
- Overcoming the need for file systems within a hybrid cloud environment
- How to eliminate latency between cloud and data center architectures
- How best to manage simulation, analytics, and big data workloads in dynamic environments
- Which market dynamics are drawing companies to new storage models over the next several years
The presenters laid out a foundation for building infrastructure that supports ongoing demand growth.
Presentation to dm as november 2007 with dynamic provisioning informationxKinAnx
This document provides updates on the ST9900 product line and program from Ken Ow-Wing. It includes details on microcode releases, new features like dynamic provisioning support, and performance enhancements. It also discusses tools now available, pre-sales performance assistance, and competition from EMC and IBM storage arrays.
Even though there have been a large number of proposals to accelerate databases using specialized hardware, often the opinion of the community is pessimistic: the performance and energy efficiency benefits of specialization are seen to be outweighed by the limitations of the proposed solutions and the additional complexity of including specialized hardware, such as field programmable gate arrays (FPGAs), in servers. Recently, however, as an effect of stagnating CPU performance, server architectures started to incorporate various programmable hardware and the availability of such components brings opportunities to databases. In the light of a shifting hardware landscape and emerging analytics workloads, it is time to revisit our stance on hardware acceleration. In this talk we highlight several challenges that have traditionally hindered the deployment of hardware acceleration in databases and explain how they have been alleviated or removed altogether by recent research results and the changing hardware landscape. We also highlight a new set of questions that emerge around deep integration of heterogeneous programmable hardware in tomorrow’s databases.
Travelling in time with SQL Server 2016 - Damian WideraITCamp
SQL Server 2016 comes with a very exciting feature called temporal tables. This feature makes querying historical data a lot easier. The mechanism is very simple; however, you should know it in depth to make sure you can use it efficiently. And this is exactly what I am going to do during this session: show you how to create temporal tables and how to use and manage them.
IBM DB2 Analytics Accelerator Trends & Directions by Namik Hrle Surekha Parekh
IBM DB2 Analytics Accelerator has drawn lots of attention from DB2 for z/OS users. In many respects it presents itself as just another DB2 access path (but what a powerful one!) and its deep integration into DB2 as well as application transparency makes it one of the most exciting DB2 enhancements in years. The IBM DB2 Analytics Accelerator complements DB2 by adding industry leading data intensive complex query performance thanks to being powered by the Netezza engine and enhances DB2 to the ultimate database management system that delivers the best of both worlds: transactional as well as analytical workloads. This presentation brings the latest news from the IDAA development and shows the trends and directions in which this technology develops.
Similar to Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of MES (Manufacturing Execution System) (20)
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Optimizing NoSQL Performance Through ObservabilityScyllaDB
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
We start by establishing common ground: why relational databases fall short, and common EDA characteristics such as the need for real-time response times and schemaless approaches that accommodate recurring changes and onboard new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit ms latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Dénes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
This document discusses replacing external caching solutions with using the internal caching capabilities of ScyllaDB. It provides examples of companies that improved performance, reduced costs and complexity by moving from Redis or Elasticsearch with an external cache to using ScyllaDB's embedded cache instead. The document also outlines some of the advantages of ScyllaDB's cache like improved latency, coherency with the database and observability compared to external caching layers.
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
This document discusses the pros and cons of placing an external cache in front of a database. It introduces Tomasz Grabiec and Tzach Livyatan from ScyllaDB and describes ScyllaDB's optimized internal caching design. External caches can increase latency and costs while ignoring the database's context and workload knowledge. ScyllaDB embeds its cache to minimize overhead and ensure data and query awareness. The document shares customer examples that improved performance and reduced costs by moving from cached databases to ScyllaDB.
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, cover Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered:
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies of creative projects and software. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard and open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several events, migrations, and training sessions related to LibreOffice. She previously worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko she cultivates her curiosity about astronomy (the source of her nickname, deneb_alpha).
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Generating privacy-protected synthetic data using Secludy and Milvus
Scylla Summit 2017: Scylla for Mass Simultaneous Sensor Data Processing of MES (Manufacturing Execution System)
1. Scylla in Manufacturing
Kuyul Noh & Junghyun Park
Principal Engineers, Samsung SDS
2. Speakers
Kuyul Noh
• 25 years of experience in the ICT industry
• Principal Data Architect at Samsung SDS
• Planning & leading ScyllaDB projects for Samsung
Junghyun Park
• 10 years of experience in the ICT industry
• Senior Data Architect at Samsung SDS
• Leading ScyllaDB adoption projects for Samsung
3. Agenda
▪ Samsung SDS?
▪ Use Case in Manufacturing
▪ Lessons Learned
▪ Scylla Managed Service
4. Samsung SDS?
5. Samsung SDS (1/2)
▪ As an “IT Solution & Service Provider”, Samsung SDS plays a pivotal role in improving IT competitiveness across the Samsung Group to become a top-tier company in diverse industries
▪ IT Services: Consulting / SI (Systems Integration), Infrastructure Outsourcing, Application Outsourcing
▪ Business Solutions: Enterprise Applications, Enterprise Analytics, Enterprise Mobility
▪ Logistics BPO (Business Process Outsourcing): Supply Chain & Logistics
6. Samsung SDS (2/2)
Global Presence: 57 global offices in 31 countries
▪ Global HQ: Seoul, Korea
▪ SDS China: Beijing, China
▪ SDS Latin America: Sao Paulo, Brazil
▪ SDS Asia Pacific: Singapore
▪ SDS America: New Jersey, USA
▪ SDS India: New Delhi, India
▪ SDS Europe: Weybridge, UK
▪ SDS Middle East: Dubai, UAE
Global Footprint
▪ 4 SW Centers
▪ 29 Logistics Offices
▪ 7 Overseas Subsidiaries
▪ 11 Data Centers
7. Scylla in Samsung SDS
▪ In-depth technical validation of the Scylla solution
▪ Signed a Global Partnership Agreement
▪ Deploying Scylla in Samsung (Manufacturing, IoT Platform, Communication, Healthcare, etc.)
▪ Preparing Scylla Managed Service in Cloud
8. Use Case in Manufacturing
9. Use Case Overview
▪ FDC (Fault Detection & Classification) System
▪ Gather sensor data from equipment in real time (e.g. temperature, pressure)
▪ Stop production lines if specification data exceeds the pre-defined threshold
FDC functions
➔ Reference data setup for equipment & sensors
➔ Threshold setup for anomaly detection
➔ Dashboard
➔ Data / Trend Viewer
➔ Data Analysis
Storage
▪ Sensor data: ScyllaDB #1, ScyllaDB #2, ScyllaDB #3
▪ Metadata: RDBMS
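The stop-the-line rule of the FDC system can be sketched as a simple range check. This is a minimal illustration only: the `Threshold` class, the lower bound, and the sample numbers are assumptions, since the slides state just that a line stops when a value exceeds its pre-defined threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical per-sensor limits; the slides do not show the real structure."""
    low: float
    high: float


def should_stop_line(value: float, threshold: Threshold) -> bool:
    """Return True when a reading falls outside the pre-defined limits."""
    return value < threshold.low or value > threshold.high


# Illustrative limits and readings for a temperature sensor (made-up numbers).
temperature_limit = Threshold(low=20.0, high=80.0)
readings = [35.2, 79.9, 81.4]
alarms = [v for v in readings if should_stop_line(v, temperature_limit)]
```

In a setup like the one on this slide, the limits would presumably come from the threshold setup kept with the reference data, while the readings are the sensor values stored in Scylla.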
10. System Requirements
▪ High throughput (more than 200K events per second)
▪ Scalability for production facilities
▪ Lower cost than existing commercial RDBMS (e.g. Oracle Exadata)
▪ Easy deployment and maintenance (auto tuning, etc.)
▪ Easy to delete old data (Time To Live, etc.)
11. Performance Test (Cassandra vs. ScyllaDB)
▪ Scylla has 2.3 times higher throughput
▪ HW: 16 cores / 48 GB (3 nodes)
▪ SW: Scylla 1.5 / Cassandra 3.9
▪ Client: Java program, 110 threads max
[Chart: throughput (x 100 batch) over cumulative time (seconds); labeled averages 282,900, 159,400 and 124,600; annotation 2.3x]
12. Legacy Data Schema (Oracle)
▪ Data from each sensor is collected every second
▪ Sensor data occupies more than 80% of the disk
▪ About 19 additional columns (data types) are required
Column          Data Type
SensorId (PK)   NUMBER
Time (PK)       TIMESTAMP
Value           NUMBER
Col1            NUMBER      (first of the 19 additional columns)
Col2            NUMBER
Step_cd         VARCHAR2
…               …
13. 1st Design (Scylla)
▪ Added a partition key (daily partitioning)
▪ Added 19 metadata columns
▪ Default configuration
Column          Data type   Key type
PartitionKey    text        PARTITION KEY
SensorId        bigint      PARTITION KEY
Time            timestamp   CLUSTERING KEY
Value           double
Col1            text        (first of the 19 additional columns)
Col2            text
Step_cd         text
…               …
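How a reading maps onto this key layout can be sketched as follows. The slides do not show the format of the PartitionKey text value, so the `YYYYMMDD` day bucket and both function names are assumptions made for illustration.

```python
from datetime import datetime, timezone


def daily_partition_key(ts: datetime) -> str:
    """Bucket a reading into a one-day partition (the key format is an assumption)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def primary_key(sensor_id: int, ts: datetime) -> tuple:
    """(PartitionKey, SensorId) selects the partition; Time orders rows inside it."""
    return (daily_partition_key(ts), sensor_id), ts


reading_time = datetime(2017, 10, 24, 14, 31, 11, tzinfo=timezone.utc)
partition, clustering = primary_key(42, reading_time)
```

With one reading per sensor per second, a daily bucket like this holds up to 86,400 rows per sensor in a single partition.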
14. Challenge #1
▪ Adding the 19 additional columns resulted in an enormous amount of data
→ Defined a UDT (User Defined Type) for each group of columns that are looked up together
CREATE TYPE UDT1 (
step_cd text,
…
);
Column          Data type
PartitionKey    text
SensorId        bigint
Time            timestamp
Value           double
Col1            text
Col2            text
Detail1         UDT1 (12 columns)
Detail2         UDT2 (5 columns)
▪ Result: data size reduced by more than 50%
15. Challenge #2
▪ Some expired data was not deleted under the “DateTieredCompaction” policy, and compaction ran in a loop
→ Resolved through Scylla’s technical support and an urgent patch (#2260)
<< Loop Compaction >>
14:31:11 server02 scylla: [shard 0] compaction - Compacted 1
14:31:11 server02 scylla: [shard 0] compaction - Compacting
14:31:12 server02 scylla: [shard 0] compaction - Compacted 1
14:31:12 server02 scylla: [shard 0] compaction - Compacting
14:31:12 server02 scylla: [shard 0] compaction - Compacted 1
14:31:12 server02 scylla: [shard 0] compaction - Compacting
…
<< No Loop Compaction & Expired Data Deletion >>
▪ After the patch, expired data was deleted
16. Challenge #3
▪ Large partition size caused slow response
→ Changed daily key partitioning to hourly (34 MB ➔ 1.4 MB)
→ Used async queries to process multiple partitions simultaneously
▪ Result: 2x faster read latency
[Diagram: 24 asynchronous queries against ScyllaDB for one day of data; label: sorted partition]
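The hourly-partition read pattern can be sketched with Python's asyncio. This is an illustration under stated assumptions, not the code used in the project (the slides mention only a Java client): `fetch_partition` is a hypothetical stand-in for one single-partition SELECT issued through a driver's asynchronous API, and the `YYYYMMDDHH` key format is assumed.

```python
import asyncio
from datetime import datetime, timedelta, timezone


def hourly_partition_keys(day: datetime) -> list[str]:
    """Return the 24 hourly bucket keys for one day (key format is an assumption)."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [(start + timedelta(hours=h)).strftime("%Y%m%d%H") for h in range(24)]


async def fetch_partition(partition_key: str, sensor_id: int) -> list[tuple[str, float]]:
    """Hypothetical stand-in for one single-partition query; returns one dummy row."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [(f"{partition_key}:00:00", float(sensor_id))]


async def read_one_day(day: datetime, sensor_id: int) -> list[tuple[str, float]]:
    """Issue all 24 partition reads concurrently, then join the rows in time order."""
    keys = hourly_partition_keys(day)
    per_partition = await asyncio.gather(*(fetch_partition(k, sensor_id) for k in keys))
    # gather() returns results in request order, so the hours stay ascending.
    return [row for rows in per_partition for row in rows]


rows = asyncio.run(read_one_day(datetime(2017, 10, 24, tzinfo=timezone.utc), sensor_id=42))
```

Because each hourly partition is already sorted by its clustering key, concatenating the per-hour results in request order yields a day of data in time order without a separate sort.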
17. Challenge #4
▪ Increased memory usage due to the large size of the CompressionInfo file
→ Changed the chunk_length_kb value from 4 to 64 (4 KB ➔ 64 KB chunks)
▪ Result: non-LSA memory usage decreased, 13 GB ➔ 0.8 GB (total memory: 20 GB)
Size test (size of Data.db file: 1.8 TB)
Chunk length    CompressionInfo size (GB)
4k              13
64k             0.8
Scylla team’s guide
Use case            Recommendation
small single key    smaller chunks
large single key    larger chunks
range scans         larger chunks
mostly writes       larger chunks
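The measured drop is consistent with simple scaling. Assuming the CompressionInfo data is dominated by one offset entry per compressed chunk, its size should shrink in proportion to the chunk length. The helper name below is hypothetical; the inputs are the figures from the size test.

```python
def scaled_metadata_gb(measured_gb: float, old_chunk_kb: int, new_chunk_kb: int) -> float:
    """Scale chunk-offset metadata inversely with chunk length (one entry per chunk)."""
    return measured_gb * old_chunk_kb / new_chunk_kb


# 13 GB was measured with 4 KB chunks; predict the size with 64 KB chunks.
predicted_gb = scaled_metadata_gb(13, old_chunk_kb=4, new_chunk_kb=64)
print(f"{predicted_gb:.2f} GB")  # 0.81 GB, close to the measured 0.8 GB
```

A 16x larger chunk means 16x fewer chunks to index, which matches the measured reduction from 13 GB to 0.8 GB.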
18. Final Design
▪ Hourly partitioning + async query
▪ UDT (User Defined Type) columns
▪ chunk_length_kb = 64
Column          Data type           Key type
PartitionKey    text                PARTITION KEY
SensorId        bigint              PARTITION KEY
Time            timestamp           CLUSTERING KEY
Value           double
Col1            text
Col2            text
Detail1         UDT (12 columns)
Detail2         UDT (5 columns)
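One way the final design might be written as a table definition is sketched below; the Python helper just assembles the CQL text. Several details are assumptions rather than facts from the slides: the table name `sensor_data`, the `frozen<...>` wrapping of the UDT columns, the LZ4 compressor, and enforcing the 31-day retention through a table-level default TTL. The `chunk_length_kb` spelling follows the slides, and the exact option name may differ between versions.

```python
RETENTION_DAYS = 31  # retention period reported for production
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60

# Columns named in the final design; udt1 and udt2 are the UDTs from Challenge #1.
COLUMNS = [
    ("partitionkey", "text"),         # hourly bucket
    ("sensorid", "bigint"),
    ("time", "timestamp"),
    ("value", "double"),
    ("col1", "text"),
    ("col2", "text"),
    ("detail1", "frozen<udt1>"),      # frozen is an assumption
    ("detail2", "frozen<udt2>"),
]


def build_create_table(table: str = "sensor_data") -> str:
    """Assemble a CREATE TABLE statement for the final design (hypothetical table name)."""
    cols = ",\n".join(f"    {name} {cql_type}" for name, cql_type in COLUMNS)
    return (
        f"CREATE TABLE {table} (\n"
        f"{cols},\n"
        "    PRIMARY KEY ((partitionkey, sensorid), time)\n"
        ") WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64}\n"
        "  AND compaction = {'class': 'DateTieredCompactionStrategy'}\n"
        f"  AND default_time_to_live = {TTL_SECONDS};"
    )


ddl = build_create_table()
print(ddl)
```

The composite partition key keeps each sensor's hour of data together, and the clustering key keeps rows inside a partition ordered by time.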
19. Production (1/2)
▪ Read Request
▪ Write Request
✓ As of now: 3,000 TPS
✓ Near future: 10,000 TPS
✓ As of now: 300 TPS
20. Production (2/2)
▪ Reactor
▪ Disk Usage (GB)
✓ Data past the retention period (31 days) was confirmed to have been physically deleted
✓ As of now: 550 GB
✓ Near future: 3 TB
21. Lessons Learned
22. Lessons Learned
▪ Use the “DateTiered” compaction policy for time series data
▪ UDT is an alternative when a table has many columns
▪ The smaller the partition size, the better
▪ Consider the async API for faster range read latency
▪ Choose a chunk size that suits memory utilization
• Reference: http://www.scylladb.com/2017/08/01/compression-chunk-sizes-scylla/
23. Voices of the Customer
▪ Very satisfied with the simplicity of the architecture & high performance
▪ Still, some enhancements are required
▪ New storage format (like Cassandra 3.0) ➔ 2.x
▪ Allow row cache to store incomplete partitions ➔ 2.0
▪ Hinted handoff ➔ 2.1
▪ Materialized View ➔ Experimental 2.0, Production 2.2
▪ Secondary Index ➔ 2.2
▪ Time Window Compaction Strategy ➔ 2.1
24. Next Step
▪ Through close collaboration with the ScyllaDB team, we plan to deploy Scylla as a sensor data processing DBMS across the customer’s overseas production plants in the near future
25. Scylla Managed Service
26. Conceptual Architecture
▪ Preparing for Scylla Managed Service in Cloud
[Diagram: Managed DB Service, labels only]
Roles: Service Provider, Service User
Actors and systems: Admin, Developer, Applications, BSS, OSS
Interfaces: Management Interface, Consumer Interface, Client Interface, DB Management Interface
Service Management: Scylla Service Management
• Provisioning
• DB Operation
• Monitoring
• Metering, etc.
Infrastructure: Infrastructure Controller (IaaS), resource pool, DB image and configurations, DB instances
27. Service Features
▪ Completed Managed Service features
Categories: Managed Service, Enterprise Functionality, Optimization, DevOps
Features:
• Cluster Provisioning, DB Operations, DB Monitoring
• Configuration Management, Backup / Restore, Scale In / Out
• Data Migration, Backup Scheduler
• Threshold Management / Alarm, Cluster Diagnosis
• Schema Management, Query Execution
29. Kirke
▪ Joyent Site
✓ Trial service is now available at https://www.joyent.com/
30. Thank You
Any questions?
Contact: hanbada@samsung.com, infordb.park@samsung.com