Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla

Kenshoo is a leader in digital marketing with very heavy data usage. Learn about their big data challenges, the tools that they use, and their experience evaluating Scylla.

About Me
• Wrote BASIC code as a kid
• 17 years in the internet industry
• Big data fanatic for the last 6 years
• Big data team leader at Kenshoo
• Our team acts as the big data DBAs
• Programming: ETL & administration tools

Kenshoo
• 10-year-old, Tel Aviv-based startup
• Industry leader in digital marketing
• 500+ employees
• Heavy data shop

Cassandra at Kenshoo
• DataStax Community
• 5 clusters
• 70 nodes
• Biggest cluster: version 1.2.19
▪ 40 physical nodes
▪ 4TB compressed to 1TB per node
▪ 14 billion records
▪ 1,500-byte values; IOPS: 5K average, 30K burst
▪ Processes clicks and conversions to track user behavior

Cassandra Cost
• Cassandra is great at writes (at first)
• It postpones the “cost” to background processes (there is no free lunch):
▪ Compact
▪ Repair
▪ Cleanup
▪ Add / Remove node
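The “postponed cost” can be made concrete with a toy log-structured merge model. This is a minimal sketch, not Cassandra's or Scylla's code: it assumes each SSTable is an in-memory dict mapping a key to a `(timestamp, value)` cell, with `None` standing in for a tombstone. Writes only add new tables, so they are cheap; reads must consult every table; compaction is the background work that merges them back down.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a cell is (write timestamp, value); value None models a tombstone.
Cell = Tuple[int, Optional[str]]
SSTable = Dict[str, Cell]


def read(sstables: List[SSTable], key: str) -> Optional[str]:
    """Return the newest value for key; every SSTable has to be consulted."""
    newest: Optional[Cell] = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    return None if newest is None else newest[1]


def compact(sstables: List[SSTable], drop_tombstones: bool = False) -> SSTable:
    """Merge SSTables into one, keeping the newest cell per key (last write wins)."""
    merged: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    if drop_tombstones:
        # Purging deletes is only safe once all replicas have seen them
        # (in Cassandra, after gc_grace_seconds); this sketch skips that check.
        merged = {k: c for k, c in merged.items() if c[1] is not None}
    return dict(sorted(merged.items()))


# Three hypothetical memtable flushes: an insert, an overwrite, and a delete.
flushes: List[SSTable] = [
    {"user:1": (1, "click"), "user:2": (1, "view")},
    {"user:1": (5, "conversion")},
    {"user:2": (7, None)},
]
print(read(flushes, "user:1"))  # conversion
print(compact(flushes))
```

Every extra flush makes reads slower until compaction runs, which is why the merge work cannot be skipped, only deferred.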

Day to day
• Requires a lot of in-depth knowledge
• Lack of documentation
• Tuning per application
• Lots of custom maintenance scripts
• Don’t run repair, only rebuild

Migration to Leveled Compaction Strategy
• Compaction didn’t complete during the maintenance window
• Leveled CS is a cluster-wide ColumnFamily configuration
• Found a JMX “hack” for per-server configuration
• It took us a month to switch manually
• Needed to reconfigure each server after a restart
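To see why switching strategy is slow, the sketch below estimates the target layout under leveled compaction. It assumes a fan-out of 10 between levels and 160 MB SSTables (commonly cited LCS defaults; check `sstable_size_in_mb` for your version) and ignores L0. In this model, level Ln targets `sstable_mb * 10**n` megabytes.

```python
def level_capacity_mb(level: int, sstable_mb: int = 160, fanout: int = 10) -> int:
    """Target size of level Ln: fanout**n SSTables of sstable_mb each (L1 = 10 SSTables)."""
    if level < 1:
        raise ValueError("L0 has no fixed target in this model")
    return sstable_mb * fanout ** level


def levels_needed(data_mb: float, sstable_mb: int = 160, fanout: int = 10) -> int:
    """Smallest n such that levels L1..Ln together can hold data_mb."""
    total, level = 0, 0
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level, sstable_mb, fanout)
    return level


data_mb = 1_000_000  # ~1 TB on disk per node (decimal units)
print(levels_needed(data_mb))  # 4
print(data_mb // 160)          # 6250 SSTables of 160 MB
```

Under these assumptions, a node holding about 1 TB ends up with roughly 6,250 SSTables spread over four levels, and the switch means rewriting the whole dataset into that layout on every node: one plausible reason the manual migration took a month.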

GC Hell
• Lots of GC under load
• GC degrades 95th-percentile latency
▪ One problematic node degrades performance across the cluster
• Full cluster restart every week
• Changed rpc_server_type to hsha
• Tuning is black magic; it takes days to see the effects
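Two small calculations illustrate why one node stuck in GC shows up in cluster-wide p95 latency. The latency values below are made up for illustration, not measurements from the talk, and the fan-out formula assumes each replica is slow independently and that the request waits on every replica it contacts.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p_hits_slow_replica(p_slow: float, replicas_waited_on: int) -> float:
    """Chance that a request waits on at least one slow replica (independence assumed)."""
    return 1 - (1 - p_slow) ** replicas_waited_on


# Illustrative only: 90 fast requests and 10 that hit a node stuck in GC.
latencies_ms = [2.0] * 90 + [200.0] * 10
print(percentile(latencies_ms, 50))  # 2.0
print(percentile(latencies_ms, 95))  # 200.0
# One slow node out of 40, each request waiting on 2 replicas:
print(p_hits_slow_replica(1 / 40, 2))
```

The median stays untouched while the 95th percentile is set entirely by the pausing node, which matches the symptom on the slide.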

Maintenance is delicate
• Need to wait between adding and removing nodes for the cluster to rebalance
• Turn off a node’s Thrift service
• Run partial cleanups
• Tune parameters:
▪ compaction_throughput_mb_per_sec
▪ concurrent_compactors
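A back-of-the-envelope estimate shows why `compaction_throughput_mb_per_sec` matters for maintenance windows. This sketch assumes the setting caps total compaction I/O on the node, that the throttle (not disk or CPU) is the bottleneck, and that data is rewritten `rewrite_factor` times; the 1 TB figure is taken from the per-node size quoted earlier.

```python
def compaction_hours(data_mb: float, throttle_mb_per_sec: float,
                     rewrite_factor: float = 1.0) -> float:
    """Hours to push data_mb * rewrite_factor through a node-wide compaction throttle."""
    if throttle_mb_per_sec <= 0:
        # In cassandra.yaml, 0 disables throttling; time then depends on the hardware.
        raise ValueError("unthrottled: estimate from disk throughput instead")
    return data_mb * rewrite_factor / throttle_mb_per_sec / 3600


for throttle in (16, 64, 256):
    print(throttle, "MB/s:", round(compaction_hours(1_000_000, throttle), 1), "hours")
```

At the 16 MB/s default, a single rewrite of 1 TB takes over 17 hours, longer than a typical nightly window, before counting data that gets compacted more than once.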

Lab setup
• i2.8xlarge
• Cassandra 2.1.15
▪ compaction_throughput_mb_per_sec = 0 (16)
▪ stream_throughput_outbound_megabits_per_sec = 10000 (200)
▪ inter_dc_stream_throughput_outbound_megabits_per_sec = 10000 (200)
• Scylla 1.3
▪ out of the box
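Note the mixed units in the lab settings: the compaction throttle is in megabytes per second, while the stream throttles are in megabits per second. The sketch below converts between them and estimates how long streaming 72 GB (the per-node size in the repair test) takes if the stream cap is the only limit; it uses decimal units and ignores protocol overhead.

```python
def mbit_to_mbyte(mbit_per_sec: float) -> float:
    """Convert megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_sec / 8


def stream_minutes(data_gb: float, cap_mbit_per_sec: float) -> float:
    """Minutes to stream data_gb (decimal GB) when the stream cap is the only limit."""
    return data_gb * 1000 * 8 / cap_mbit_per_sec / 60


for cap in (200, 10_000):
    print(f"{cap} Mbit/s = {mbit_to_mbyte(cap):.0f} MB/s, 72 GB in {stream_minutes(72, cap):.1f} min")
```

With the 200 Mbit/s value the cap alone limits a 72 GB stream to about 48 minutes; raising it to 10000 Mbit/s removes the throttle as the likely bottleneck, so the comparison measures the databases rather than the throttle.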

Repair
• 3 x i2.8xlarge
• RF 3
• 72GB data per node
• 10M rows
▪ 5 columns
▪ 1,500-byte values
• Delete all the data from one node
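Cassandra's anti-entropy repair has each replica build a Merkle tree over its token ranges and streams only the ranges whose hashes differ. The sketch below illustrates that idea; it is not either database's implementation, and Scylla 1.3's repair may work differently. It assumes small integer tokens, opaque byte rows, and a power-of-two leaf count that divides the token space.

```python
import hashlib
from typing import Dict, List, Tuple

Rows = Dict[int, bytes]   # token -> serialized row (assumption)
Tree = List[List[bytes]]  # tree[0] = leaves, tree[-1] = [root]


def build_tree(rows: Rows, token_space: int, num_leaves: int) -> Tree:
    """Hash each contiguous token range into a leaf, then hash pairs up to a root."""
    width = token_space // num_leaves
    leaves = []
    for i in range(num_leaves):
        h = hashlib.sha256()
        for token in sorted(t for t in rows if i * width <= t < (i + 1) * width):
            h.update(token.to_bytes(8, "big") + rows[token])
        leaves.append(h.digest())
    tree = [leaves]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([hashlib.sha256(prev[j] + prev[j + 1]).digest()
                     for j in range(0, len(prev), 2)])
    return tree


def mismatched_ranges(a: Tree, b: Tree, token_space: int) -> List[Tuple[int, int]]:
    """Walk both trees from the root, descending only into subtrees whose hashes differ."""
    width = token_space // len(a[0])
    out: List[Tuple[int, int]] = []

    def walk(level: int, index: int) -> None:
        if a[level][index] == b[level][index]:
            return  # identical subtree: nothing in this range needs streaming
        if level == 0:
            out.append((index * width, (index + 1) * width))
            return
        walk(level - 1, 2 * index)
        walk(level - 1, 2 * index + 1)

    walk(len(a) - 1, 0)
    return out
```

In the test scenario, where all data is deleted from one node, every leaf differs, so repair has to stream every range back; the efficiency of that streaming is what the benchmark exercises.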

Cleanup at Scylla
• 4 x i2.8xlarge
• RF 3
• 270GB data per node
• 50M rows
▪ 5 columns
▪ 1,500-byte values
• Decommission a node and join it back
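Cleanup is needed because, after a node joins, existing replicas keep copies of ranges they no longer own until `nodetool cleanup` removes them. The sketch below computes which keys each node must clean, assuming one token per node (no vnodes) and SimpleStrategy-style placement: a key belongs to the first node whose token is greater than or equal to the key's token, plus the next RF-1 distinct nodes clockwise. Node names and tokens are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Set

Ring = Dict[int, str]  # node token -> node name (assumption: one token per node)


def replicas(key_token: int, ring: Ring, rf: int) -> List[str]:
    """Primary = first node token >= key_token (wrapping), then next nodes clockwise."""
    tokens = sorted(ring)
    needed = min(rf, len(set(ring.values())))
    i = bisect_left(tokens, key_token)
    owners: List[str] = []
    while len(owners) < needed:
        node = ring[tokens[i % len(tokens)]]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners


def keys_to_clean(node: str, key_tokens: Iterable[int],
                  before: Ring, after: Ring, rf: int) -> Set[int]:
    """Keys the node replicated before a topology change but no longer owns afterwards."""
    return {k for k in key_tokens
            if node in replicas(k, before, rf) and node not in replicas(k, after, rf)}


before = {0: "A", 25: "B", 50: "C", 75: "D"}
after = {**before, 90: "E"}  # hypothetical new node E joins at token 90
print(sorted(keys_to_clean("C", range(100), before, after, rf=3)))
```

With RF 3, a single join shifts replica duty away from three existing nodes, so cleanup work lands on several nodes, not just the neighbor of the new one.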

Compact at Scylla
• 3 x i2.8xlarge
• RF 3
• 30-minute stress test
• 10M rows
▪ 5 columns
▪ 1,500-byte values
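The slide does not say which compaction strategy the test used. For reference, Cassandra's default, size-tiered compaction (STCS), groups SSTables of similar size into buckets and compacts a bucket once it holds enough tables. The sketch below is a simplified version of that bucketing using the commonly documented defaults (`bucket_low` 0.5, `bucket_high` 1.5, `min_sstable_size` 50 MB, `min_threshold` 4, `max_threshold` 32); the real strategy also ranks buckets before choosing one.

```python
from typing import List


def stcs_buckets(sizes_mb: List[float], bucket_low: float = 0.5,
                 bucket_high: float = 1.5, min_sstable_mb: float = 50) -> List[List[float]]:
    """Group SSTable sizes into buckets of similar size (simplified STCS)."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_mb and avg < min_sstable_mb
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket: start a new one
    return buckets


def compaction_candidates(sizes_mb: List[float], min_threshold: int = 4,
                          max_threshold: int = 32) -> List[List[float]]:
    """Buckets big enough to compact, trimmed to the smallest max_threshold tables."""
    return [b[:max_threshold] for b in stcs_buckets(sizes_mb) if len(b) >= min_threshold]


sizes = [400, 10, 120, 12, 100, 20, 110, 30]  # hypothetical SSTable sizes in MB
print(stcs_buckets(sizes))
print(compaction_candidates(sizes))
```

Under a sustained write load, new buckets fill continuously, so how quickly a database drains them is what a 30-minute compaction test reveals.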

Latency & Throughput
• 3 x i2.8xlarge
• RF 3
• 72GB data per node
• 10M rows
▪ 5 columns
▪ 1,500-byte values
▪ Mixed workload: 3 writes to 2 reads
• 4 x cassandra-stress nodes
▪ 30 minutes
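cassandra-stress expresses a mix like this with its `mixed` command and a `ratio(write=3,read=2)` option. The sketch below shows what that ratio means for the operation stream and how throughput from several stress clients adds up; the seed and the per-client operation counts are hypothetical, not results from the talk.

```python
import random
from collections import Counter
from typing import Dict, List


def op_mix(n_ops: int, ratio: Dict[str, int], seed: int = 42) -> List[str]:
    """Draw n_ops operation names with probability proportional to their ratio weight."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    names = list(ratio)
    return rng.choices(names, weights=[ratio[n] for n in names], k=n_ops)


def aggregate_ops_per_sec(per_client_ops: List[int], duration_s: float) -> float:
    """Cluster throughput when several stress clients run for the same duration."""
    return sum(per_client_ops) / duration_s


ops = op_mix(10_000, {"write": 3, "read": 2})
print(Counter(ops))  # roughly 60% writes and 40% reads
# Hypothetical: 4 clients, each completing 1.8M ops in 30 minutes.
print(aggregate_ops_per_sec([1_800_000] * 4, 30 * 60))
```

Spreading the load over four client machines keeps the stress tool itself from becoming the bottleneck, so the measured throughput and latency reflect the cluster.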

Conclusion
• Scylla has lower cost
▪ Compaction, repair, and cleanup are much more efficient
▪ Consistent latency under much higher load
• Moving forward
▪ Integrate it into an internal production monitoring system

Thank You!
Contact: noam.hasson@kenshoo.com


Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...

smallerror

Twitter's operations team manages software performance, availability, capacity planning, and configuration management for Twitter. They use metrics, logs, and analysis to find weak points and take corrective action. Some techniques include caching everything possible, moving operations to asynchronous daemons, and optimizing databases to reduce replication delay and locks. The team also created several open source projects like CacheMoney for caching and Kestrel for asynchronous messaging.

Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...

xlight

Fixing Twitter and Finding your own Fail Whale document discusses Twitter operations. The operations team manages software performance, availability, capacity planning, and configuration management using metrics, logs, and data-driven analysis to find weak points and take corrective action. They use managed services for infrastructure to focus on computer science problems. The document outlines Twitter's rapid growth and challenges in maintaining performance as traffic increases. It provides recommendations around caching, databases, asynchronous processing, and other techniques Twitter uses to optimize performance under heavy load.

Fixing twitter

Roger Xia

Fixing Twitter and Finding your own Fail Whale document discusses Twitter operations. The Twitter operations team focuses on software performance, availability, capacity planning, and configuration management using metrics, logs, and science. They use a dedicated managed services team and run their own servers instead of cloud services. The document outlines Twitter's rapid growth and challenges in maintaining performance. It discusses strategies for monitoring, analyzing metrics to find weak points, deploying changes, and improving processes through configuration management and peer reviews.

Fixing_Twitter

liujianrong

Twitter's operations team manages software performance, availability, capacity planning, and configuration management. They use metrics, logs, and analysis to find weak points and take corrective action. Some techniques include caching everything possible, moving operations to asynchronous daemons, optimizing databases, and instrumenting all systems. Their goal is to process requests asynchronously when possible and avoid overloading relational databases.

Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla

1. Cassandra & Scylla at Kenshoo

2. About Me
• Wrote BASIC code when I was a kid
• 17 years in the internet industry
• Big data fanatic for the last 6 years
• Big data team leader at Kenshoo
• Our team is the big data DBA team
• Programming: ETL and administration tools

3. Kenshoo
• 10 years old, Tel Aviv-based startup
• Industry leader in digital marketing
• 500+ employees
• Heavy data shop

4. Kenshoo Legacy Architecture

5. Bigdata at Kenshoo

6. Cassandra at Kenshoo
• DataStax Community edition
• 5 clusters
• 70 nodes
• Biggest cluster, version 1.2.19
▪ 40 physical nodes
▪ 4TB compressed to 1TB per node
▪ 14 billion records
▪ 1500-byte values; IOPS: 5K average, 30K burst
▪ Processes clicks and conversions for user behavior

7. Cassandra Cost
• Cassandra is great at writes (at first)
• It postpones the “cost” to background processes (no free lunch)
▪ Compaction
▪ Repair
▪ Cleanup
▪ Adding / removing nodes

8. War & Peace: Knowing your Cassandra

9. Day to day
• Requires a lot of in-depth knowledge
• Lack of documentation
• Tuning per application
• Lots of custom maintenance scripts
• Don’t run repair, only rebuild

10. Migration to Leveled Compaction Strategy
• Compaction didn’t complete during the maintenance window
• Leveled compaction strategy (LCS) is set per ColumnFamily, cluster-wide
• Found a JMX “hack” to configure it per server
• Switching manually took us a month
• Each server had to be reconfigured after every restart

11. GC Hell
• Lots of GC under load
• GC pauses hurt 95th-percentile latency
▪ One problematic node degrades the whole cluster’s performance
• Full cluster restart every week
• Changed rpc_server_type to hsha
• Tuning is black magic; it takes days to see the effects

12. Maintenance is delicate
• Must wait between adding and removing nodes for the cluster to rebalance
• Turn off a node’s Thrift service
• Run partial cleanup
• Tune parameters
▪ compaction_throughput_mb_per_sec
▪ concurrent_compactors
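The custom maintenance scripts mentioned on the previous slides typically wrap nodetool. The Python sketch below shows one way to script the per-node steps listed here: turning off Thrift, throttling compaction at runtime (nodetool setcompactionthroughput is the live counterpart of compaction_throughput_mb_per_sec) and running cleanup one table at a time. The helper names, the keyspace and table arguments, and the order of the steps are assumptions for illustration, not Kenshoo’s actual scripts. Waiting for the cluster to rebalance and concurrent_compactors (which, as far as we know, is a cassandra.yaml setting in these versions) are left out.

```python
# Hypothetical per-node maintenance pass built on nodetool (Cassandra 1.2/2.x era).
# Commands are built first so they can be reviewed in a dry run before executing.
import subprocess
from typing import List, Sequence


def maintenance_commands(host: str, keyspace: str, tables: Sequence[str],
                         compaction_mb_per_sec: int) -> List[List[str]]:
    """Return the nodetool invocations for one node, in execution order."""
    base = ["nodetool", "-h", host]
    cmds = [
        # Stop accepting Thrift client connections on this node.
        base + ["disablethrift"],
        # Runtime equivalent of compaction_throughput_mb_per_sec (0 = unthrottled).
        base + ["setcompactionthroughput", str(compaction_mb_per_sec)],
    ]
    # "Partial" cleanup: one table at a time rather than the whole keyspace.
    cmds.extend(base + ["cleanup", keyspace, table] for table in tables)
    # Put the node back into client rotation.
    cmds.append(base + ["enablethrift"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical host, keyspace and table names.
    run(maintenance_commands("10.0.0.1", "my_keyspace", ["clicks", "conversions"], 16))
```

Keeping the dry run as the default makes it cheap to review a pass before touching a production node.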

13. Scylla Evaluation

14. Self-tuning features

15. Lab setup
• i2.8xlarge
• Cassandra 2.1.15
▪ compaction_throughput_mb_per_sec = 0 (default: 16; 0 disables throttling)
▪ stream_throughput_outbound_megabits_per_sec = 10000 (default: 200)
▪ inter_dc_stream_throughput_outbound_megabits_per_sec = 10000 (default: 200)
• Scylla 1.3
▪ Out of the box
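The Cassandra overrides used in the lab can be written out as a cassandra.yaml fragment. The Python sketch below pairs each lab value with the value shown in parentheses on the slide, read here as the stock default (an assumption, though 16 and 200 are the shipped defaults for compaction and stream throughput). LAB_OVERRIDES and yaml_fragment are hypothetical names.

```python
# Hypothetical helper: render the lab's Cassandra 2.1 overrides as cassandra.yaml lines,
# annotating each with the parenthesized value from the slide (read here as the default).
from typing import Dict, Tuple

LAB_OVERRIDES: Dict[str, Tuple[int, int]] = {
    # 0 disables compaction throttling entirely.
    "compaction_throughput_mb_per_sec": (0, 16),
    # 10000 Mbit/s (10 Gbit/s) is, in practice, a very loose cap on streaming.
    "stream_throughput_outbound_megabits_per_sec": (10000, 200),
    "inter_dc_stream_throughput_outbound_megabits_per_sec": (10000, 200),
}


def yaml_fragment(overrides: Dict[str, Tuple[int, int]]) -> str:
    """Return one 'key: value  # default: N' line per overridden setting."""
    return "\n".join(
        f"{key}: {lab}  # default: {default}"
        for key, (lab, default) in overrides.items()
    )


if __name__ == "__main__":
    print(yaml_fragment(LAB_OVERRIDES))
```

The point of the overrides is to remove Cassandra’s background-work throttles so it competes on equal footing with Scylla running out of the box.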

16. GC at Scylla

17. Repair
• 3 x i2.8xlarge
• RF 3
• 72GB data per node
• 10M rows
▪ 5 columns
▪ 1500-byte values
• Delete all the data from one node

18. Repair results

19. Cleanup at Scylla
• 4 x i2.8xlarge
• RF 3
• 270GB data per node
• 50M rows
▪ 5 columns
▪ 1500-byte values
• Decommission a node and join it back
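As a sanity check on the repair and cleanup setups, the per-node data size follows from rows × columns × value size × RF ÷ nodes. The Python sketch below assumes each of the 5 columns holds a 1500-byte value and that replicas are spread evenly; it ignores keys, per-cell overhead and compression, so it is a rough estimate, not the measured figures. per_node_bytes is a hypothetical name.

```python
def per_node_bytes(rows: int, columns: int, value_bytes: int, rf: int, nodes: int) -> float:
    """Rough raw data per node: every replica of every value, spread evenly over nodes.

    Assumes one value of value_bytes per column; ignores keys, cell metadata
    and compression.
    """
    return rows * columns * value_bytes * rf / nodes


# Repair test: 3 nodes with RF 3, so every node holds a full copy.
repair = per_node_bytes(10_000_000, 5, 1500, rf=3, nodes=3)
# Cleanup test: 4 nodes with RF 3, so each node holds 3/4 of the data.
cleanup = per_node_bytes(50_000_000, 5, 1500, rf=3, nodes=4)

print(f"repair:  {repair / 1e9:.2f} GB ({repair / 2**30:.2f} GiB) per node")
print(f"cleanup: {cleanup / 1e9:.2f} GB ({cleanup / 2**30:.2f} GiB) per node")
```

This gives 75 GB (about 70 GiB) and 281.25 GB (about 262 GiB) per node, in the same range as the 72GB and 270GB on the slides. If the 1500 bytes were instead the whole row, the estimates would drop fivefold, so the per-column reading seems the better fit.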

20. Cleanup results

21. Compaction at Scylla
• 3 x i2.8xlarge
• RF 3
• 30-minute stress run
• 10M rows
▪ 5 columns
▪ 1500-byte values

22. Compaction results

23. Latency & Throughput
• 3 x i2.8xlarge
• RF 3
• 72GB data per node
• 10M rows
▪ 5 columns
▪ 1.5K values
▪ Mixed workload: 3 writes to 2 reads
• 4 x cassandra-stress nodes
▪ 30 minutes
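The workload on this slide maps onto cassandra-stress options. The Python sketch below builds one possible invocation: a mixed run with a 3:2 write-to-read ratio, 5 columns of 1500 bytes each, 30 minutes, over the 10M-row key range. The exact flags the author used are not on the slide; this command, the helper name stress_command and the node addresses are assumptions, based on the cassandra-stress syntax shipped with Cassandra 2.1.

```python
# Hypothetical helper: build a cassandra-stress (2.1-style) command for the mixed test.
import shlex
from typing import List, Sequence


def stress_command(cluster: Sequence[str], rows: int = 10_000_000,
                   minutes: int = 30) -> List[str]:
    """Mixed 3 writes : 2 reads, 5 columns x 1500 bytes, over keys 1..rows."""
    return [
        "cassandra-stress",
        "mixed", "ratio(write=3,read=2)", f"duration={minutes}m",
        # Column count and per-column value size.
        "-col", "n=FIXED(5)", "size=FIXED(1500)",
        # Pick keys uniformly from the pre-loaded population.
        "-pop", f"dist=UNIFORM(1..{rows})",
        # Comma-separated list of database nodes to connect to.
        "-node", ",".join(cluster),
    ]


if __name__ == "__main__":
    cmd = stress_command(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    # shlex.join quotes the parentheses so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing the list directly to subprocess.run avoids the shell escaping the parentheses would otherwise need. Running the same command on each of the four loader machines would approximate the 4 x cassandra-stress setup, though how the author split the load across loaders is not stated.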

24-30. (Image-only slides; no text transcribed)

31. Conclusion
• Scylla has lower cost
▪ Compaction, repair and cleanup are much more efficient
▪ Consistent latency under much higher load
• Moving forward
▪ Integrate it into an internal production monitoring system

32. Q&A

33. Thank You! Contact: noam.hasson@kenshoo.com
