Cache & Concurrency considerations for a high-performance Cassandra deployment.
SriSatish Ambati
Cassandra has hit its stride as a distributed Java NoSQL database! It's fast, it's in-memory, it's scalable, and it's built on SEDA; its eventually consistent model makes it practical for the large & growing volume of unstructured-data use cases. It is also time to run it through the filters of performance analysis. For starters, it runs on the Java virtual machine and inherits the capabilities and culpabilities of that platform. This presentation reviews the runtime architecture, cache behavior & performance of a real-world workload on Cassandra. We blend existing system & JVM tools to get a quick overview & a breakdown of hotspots in the get, put & update operations. We highlight the role played by garbage collection & fragmentation due to long-lived objects; we investigate lock contention in the data structures under concurrent usage. Cassandra uses UDP for management & TCP for data: we look at the robustness of the communication patterns during high spikes and cluster-wide events. We review the Non-Blocking HashMap modifications to Cassandra that improve concurrency & amplify the performance of this frontrunner in the NoSQL space.
ApacheCon2010 NA
Wed, 03 November 2010 15:00
cassandra
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
1. SriSatish Ambati
Performance, Riptano, Cassandra
Azul Systems & OpenJDK
Twitter: @srisatish
srisatish.ambati@gmail.com
Cache & Concurrency considerations for a high performance Cassandra
2. Trail ahead
Elements of Cache Performance
Metrics, Monitors
JVM goes to BigData Land!
Examples
Lucandra, Twissandra
Cassandra Performance with JVM
Commentary
Runtime Views
Non Blocking HashMap
Locking: concurrency
Garbage Collection
3. A feather in the CAP
• Eventual Consistency
– Levels
– Doesn't mean data loss (journaled)
• SEDA
– Partitioning, Cluster & Failure detection, Storage engine mod
– Event driven & non-blocking io
– Pure Java
4. Count what is countable, measure what is measurable, and what is not measurable, make measurable
-Galileo
5. Elements of Cache Performance: Metrics
• Operations:
– Ops/s: Puts/sec, Gets/sec, updates/sec
– Latencies, percentiles
– Indexing
• # of nodes – scale, elasticity
• Replication
– Synchronous, Asynchronous (fast writes)
• Tuneable Consistency
• Durability/Persistence
• Size & Number of Objects, Size of Cache
• # of user clients
6. Elements of Cache Performance: "Think Locality"
• Hot or Not: The 80/20 rule.
– A small set of objects are very popular!
– What is the most RT tweet?
• Hit or Miss: Hit Ratio
– How effective is your cache?
– LRU, LFU, FIFO.. Expiration
• Long-lived objects lead to better locality.
• Spikes happen
– Cascading events
– Cache Thrash: full table scans
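The eviction policies named above (LRU in particular) are easy to see in plain Java. This is a minimal sketch using `LinkedHashMap`'s access-order mode; the class name and capacity parameter are ours, not Cassandra's cache implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch built on LinkedHashMap's access-order mode.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order follows most-recent access
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once capacity is exceeded
        return size() > capacity;
    }
}
```

A `get` counts as a "use", so recently read hot keys survive eviction — exactly the 80/20 behavior the slide describes.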
7. Real World Performance
• Facebook Inbox
– Writes:0.12ms, Reads:15ms @ 50GB data
• Twitter performance
– Twissandra (simulation)
• Cassandra for Search & Portals
– Lucandra, Solandra (simulation)
• YCSB/PNUTS benchmarks
– 5ms read/writes @ 5k ops/s (50/50 Update heavy)
– 8ms reads/5ms writes @ 5k ops/s (95/5 read heavy)
• Lab environment
– ~5k writes per sec per node, <5ms latencies
– ~10k reads per sec per node, <5ms latencies
• Performance has improved in newer versions
10. JVM in BigData Land!
Limits for scale
• Locks : synchronized
– Can’t use all my multi-cores!
– java.util.collections also hold locks
– Use non-blocking collections!
• (de)Serialization is expensive
– Hampers object portability
– Use avro, thrift!
• Object overhead
– average enterprise collection has 3 elements!
– Use byte[ ], primitives where possible!
• Garbage Collection
– Can’t throw memory at the problem!
– Mitigate, Monitor, Measure foot print
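The "use non-blocking collections" point can be sketched with `java.util.concurrent`. Here `ConcurrentHashMap` stands in for Cliff Click's NonBlockingHashMap (the high-scale-lib class behind the Cassandra modifications mentioned in this talk), and the class and method names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Counter updates on a concurrent map without synchronized(map){...}:
// no global lock in the hot path, so all cores can make progress.
class ColumnCounters {
    private final ConcurrentHashMap<String, AtomicLong> counts =
            new ConcurrentHashMap<>();

    long increment(String key) {
        // computeIfAbsent + atomic add instead of a coarse lock
        AtomicLong c = counts.computeIfAbsent(key, k -> new AtomicLong());
        return c.incrementAndGet();
    }

    long get(String key) {
        AtomicLong c = counts.get(key);
        return c == null ? 0L : c.get();
    }
}
```

The contrast with `Collections.synchronizedMap` is the point: the synchronized wrapper serializes every reader and writer on one monitor, while the concurrent version lets threads update disjoint keys in parallel.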
11. Tools
• What is the JVM doing:
– dtrace, hprof, introscope, jconsole,
visualvm, yourkit, azul zvision
• Invasive JVM observation tools
– bci, jvmti, jvmdi/pi agents, jmx, logging
• What is the OS doing:
– dtrace, oprofile, vtune
• What is the network disk doing:
– Ganglia, iostat, lsof, netstat, nagios
12. furiously fast writes
• Append only writes
– Sequential disk access
• No locks in critical path
• Key based atomicity
[Diagram: client issues write → partitioner finds node (n1, n2) → append to commit log → apply to memory]
13. furiously fast writes
• Use separate disks for commitlog
– Don’t forget to size them well
– Isolation is difficult in the cloud
• Memtable/SSTable sizes
– Delicately balanced with GC
• memtable_throughput_in_mb
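The write path from the slides above (commit log first, then memtable) can be sketched as follows; the record format and class names here are hypothetical, not Cassandra's:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentSkipListMap;

// Append-only write path: sequential commit-log append for durability,
// then an update to a sorted, concurrent in-memory "memtable".
public class WritePath {
    private final FileChannel commitLog;
    // stand-in for the memtable: sorted and safe for concurrent writers
    final ConcurrentSkipListMap<String, byte[]> memtable = new ConcurrentSkipListMap<>();

    public WritePath(Path logFile) throws IOException {
        commitLog = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void write(String key, byte[] value) throws IOException {
        // 1. durability: append a length-prefixed record to the commit log
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer rec = ByteBuffer.allocate(8 + k.length + value.length);
        rec.putInt(k.length).put(k).putInt(value.length).put(value);
        rec.flip();
        commitLog.write(rec);
        // 2. only after a successful log write, apply to the memtable
        memtable.put(key, value);
    }
}
```

All log writes are sequential appends, which is why a dedicated commit-log disk pays off.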
17. Compactions
[Diagram: three sorted SSTables — (K1, K2, K3, …), (K2, K10, K30, …),
(K4, K5, K10, …), each holding keys with serialized data — are loaded in
memory and MERGE-SORTED into one sorted data file
(K1, K2, K3, K4, K5, K10, K30); DELETED entries are dropped.
The new data file is accompanied by an index file of key offsets
(K1, K5, K30) and a Bloom filter.]
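The merge step pictured above can be sketched as a toy merge, assuming runs are passed newest-first and nulls mark tombstones; real compaction streams from disk rather than loading whole maps:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Compaction merge sketch: several sorted runs become one sorted run.
// When a key appears in several runs the newest version wins, and
// tombstoned keys (null values here) are dropped from the output.
public class Compaction {
    @SafeVarargs
    public static SortedMap<String, String> merge(SortedMap<String, String>... newestFirst) {
        SortedMap<String, String> out = new TreeMap<>();
        Set<String> seen = new HashSet<>();
        for (SortedMap<String, String> run : newestFirst) {
            for (Map.Entry<String, String> e : run.entrySet()) {
                // first run containing the key decides its fate
                if (seen.add(e.getKey()) && e.getValue() != null) {
                    out.put(e.getKey(), e.getValue());
                }
            }
        }
        return out;
    }
}
```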
18. Compactions
• Intense disk io & mem churn
• Triggers GC for tombstones
• Minor/Major Compactions
• Reduce priority for better reads
• Other Parameters:
– CompactionManager.minimumCompactionThreshold=xxxx
21. reads performance
• BloomFilter used to identify the right file
• Maintain column indices to look up columns
– Which can span different SSTables
• Less io than typical b-tree
• Cold read: two seeks
– One for the key lookup, another for the row
• Key Cache
– Optimized in latest cassandra
• Row Cache
– Improves read performance
– GC sensitive for large rows.
• Most (google) applications require single row
transactions*
*Sanjay G, BigTable Design, Google.
22. Client Performance
Marshal Arts:
Ser/Deserialization
• Clients dominated by Thrift, Avro
– Hector, Pelops
• Thrift: upgrade to latest: 0.5, 0.4
• No news: java.io.Serializable is S..L..O..W
• Use “transient”
• avro, thrift, proto-buf
• Common Patterns of Doom:
– Death by a million gets
24. Adding Nodes
• New nodes
– Add themselves to busiest node
– And then Split its Range
• Busy Node starts transmit to new node
• Bootstrap logic initiated from any node, cli, web
• Each node capable of ~40MB/s
– Multiple replicas to parallelize bootstrap
• UDP for control messages
• TCP for request routing
26. Bloom Filter: in full bloom
• “constant” time
• size:compact
• false positives
• Single lookup
for key in file
• Deletion
• Improve
– Counting BF
– Bloomier filters
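The properties on this slide can be illustrated with a minimal Bloom filter sketch; the hash mixing here is illustrative, not Cassandra's actual hash functions:

```java
import java.util.BitSet;

// Minimal Bloom filter: k hash probes into a bit array.
// Constant-time lookup, compact, allows false positives, no deletion.
public class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // cheap double hashing; real implementations use stronger hashes
    private int probe(String key, int i) {
        int h = key.hashCode() * 31 + i * 0x9E3779B9;
        return Math.floorMod(h, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(probe(key, i));
    }

    // false => definitely absent; true => "probably present"
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(probe(key, i))) return false;
        return true;
    }
}
```

A single lookup per file: if the filter says "absent", the SSTable is skipped entirely.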
27. Birthdays, Collisions &
Hashing functions
• Birthday Paradox
For the N=21 people in this room, the probability
that at least 2 of them share the same birthday is
~0.44
• Collisions are real!
• An unbalanced HashMap behaves like a list: O(n) retrieval
• Chaining & Linear probing
• Performance degrades at ~80% table density
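The quoted figure can be checked with the standard birthday-paradox product, P(n) = 1 − ∏ᵢ (365 − i)/365:

```java
// Probability that at least two of n people share a birthday,
// assuming 365 equally likely birthdays.
public class Birthday {
    public static double collisionProbability(int n) {
        double pNoCollision = 1.0;
        for (int i = 0; i < n; i++) {
            pNoCollision *= (365.0 - i) / 365.0;
        }
        return 1.0 - pNoCollision;
    }
}
```

For n = 21 this gives ≈ 0.44; the famous > 0.5 threshold is crossed at n = 23.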
36. U U I D
• java.util.UUID is slow
– static SecureRandom use leads to contention
• SecureRandom uses /dev/urandom for seed initialization
-Djava.security.egd=file:/dev/urandom
• PRNG without the file is at least 20%-40% faster.
• Use TimeUUIDs where possible – much faster
• JUG – Java UUID Generator
• http://github.com/cowtowncoder/java-uuid-generator
• http://jug.safehaus.org/
• http://johannburkard.de/blog/programming/java/Java-UUID-generators-compared.html
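As a hypothetical sketch of the "use time-based ids" advice, here is a time-ordered identifier that sidesteps SecureRandom contention; this is not an RFC 4122 version-1 TimeUUID and not what JUG produces:

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Toy time-ordered id: millisecond timestamp in the high bits,
// a per-JVM sequence number in the low bits to break ties.
// No SecureRandom involved, so no seeding contention.
public class TimeIds {
    private static final AtomicLong seq = new AtomicLong();

    public static UUID next() {
        return new UUID(System.currentTimeMillis(), seq.incrementAndGet());
    }
}
```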
37. synchronized
• Coarse grained locks
• io under lock
• Stop signal on a highway
• java.util.concurrent does not mean no locks
• Non-blocking, lock-free, wait-free collections
38. Scalable Lock-Free Coding Style
• Big Array to hold Data
• Concurrent writes via: CAS & Finite State
Machine
– No locks, no volatile
– Much faster than locking under heavy load
– Directly reach main data array in 1 step
• Resize as needed
– Copy Array to a larger Array on demand
– Use State Machine to help copy
– “ Mark” old Array words to avoid missing late
updates
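The CAS-and-array style above can be sketched with java.util.concurrent.atomic. This is a toy insert-only table; the resize/copy state machine from the slide is omitted:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Writers claim an array slot with compareAndSet instead of taking a
// lock; a failed CAS means another thread won that slot, so we probe on.
public class CasTable {
    private final AtomicReferenceArray<String> slots;

    public CasTable(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    // linear-probing insert; returns the slot index, or -1 if full
    public int insert(String value) {
        int start = Math.floorMod(value.hashCode(), slots.length());
        for (int i = 0; i < slots.length(); i++) {
            int idx = (start + i) % slots.length();
            if (slots.compareAndSet(idx, null, value)) return idx; // won the race
            if (value.equals(slots.get(idx))) return idx;          // already present
        }
        return -1;
    }

    public String get(int idx) {
        return slots.get(idx);
    }
}
```

Under heavy load this reaches the main data array in one step with no lock and no volatile writes beyond the CAS itself.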
40. Cassandra uses High Scale
Non-Blocking Hashmap
public class BinaryMemtable implements IFlushable
{
    …
    private final Map<DecoratedKey, byte[]> columnFamilies =
        new NonBlockingHashMap<DecoratedKey, byte[]>();
    /* Lock and Condition for notifying new clients about Memtable switches */
    private final Lock lock = new ReentrantLock();
    Condition condition;
    …
}

public class Table
{
    …
    private static final Map<String, Table> instances =
        new NonBlockingHashMap<String, Table>();
    …
}
41. GC-sensitive elements within
Cassandra
• Compaction triggers System.gc()
– Tombstones from files
• “GCInspector”
• Memtable Threshold, sizes
• SSTable sizes
• Low overhead collection choices
42. Garbage Collection
• Pause Times
– if stop_the_world_FullGC > ttl_of_node
=> failed requests; failure accrual & node repair
• Allocation Rate
– New object creation, insertion rate
• Live Objects (residency)
– if residency in heap > 50%
– GC overheads dominate.
• Overhead
– space, cpu cycles spent GC
• 64-bit not addressing pause times
– Bigger is not better!
43. Memory Fragmentation
• Fragmentation
– Performance degrades over time
– Inducing “Full GC” makes problem go away
– Free memory that cannot be used
• Reduce occurrence
– Use a compacting collector
– Promote less often
– Use uniform sized objects
• Solution – unsolved
– Use latest CMS with CR:6631166
– Azul’s Zing JVM & Pauseless GC
45. Best Practices:
Garbage Collection
• GC Logs are cheap even in
production
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintHeapAtGC
• Slightly expensive ones:
-XX:PrintFLSStatistics=2 -XX:CMSStatistics=1
-XX:CMSInitiationStatistics
46. Sizing: Young Generation
• Should we set –Xms == -Xmx ?
• Use –Xmn (fixed eden)
[Diagram: allocations (new Object()) go into eden; live objects are
copied between the survivor spaces (survivor ratio, tenuring threshold)
and eventually promoted to the old generation.]
47. Tuning CMS
• Don’t promote too often!
– Frequent promotion causes fragmentation
• Size the generations
– Min GC times are a function of Live Set
– Old Gen should host steady state comfortably
• Parallelize on multicores:
– -XX:ParallelCMSThreads=4
– -XX:ParallelGCThreads=4
• Avoid CMS Initiating heuristic
– -XX:+UseCMSInitiatingOccupancyOnly
• Use Concurrent for System.gc()
– -XX:+ExplicitGCInvokesConcurrent
48. Summary
The design & implementation of Cassandra take advantage
of the JVM's strengths while avoiding its common issues.
• Locks:
– Avoids locks in critical path
– Uses non-blocking collections, TimeUUIDs!
– Still Can’t use all my multi-cores..?
>> Other bottlenecks to find!
• De/Serialization:
– Uses avro, thrift!
• Object overhead
– Uses mostly byte[ ], primitives where possible!
• Garbage Collection
– Mitigate: Monitor, Measure foot print.
– Work in progress by all jvm vendors!
Cassandra starts from a great footing from a JVM standpoint
and will reap the benefits of the platform!
49. Q&A / References
• Werner Vogels, Eventually Consistent
http://www.allthingsdistributed.com/2008/12/eventually_consistent.htm
• Bloom, Burton H. (1970), "Space/time trade-offs in hash coding
with allowable errors"
• Avinash Lakshman, http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf
• Eric Brewer, CAP
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• Tony Printzeis, Charlie Hunt, Javaone Talk
http://www.scribd.com/doc/36090475/GC-Tuning-in-the-Java
• http://github.com/digitalreasoning/PyStratus/wiki/Documentation
• http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf
• Cassandra on Cloud, http://www.coreyhulen.org/?p=326
• Cliff Click’s, Non-blocking HashMap
http://sourceforge.net/projects/high-scale-lib/
• Brian F. Cooper., Yahoo Cloud Storage Benchmark,
http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
Editor's Notes
A typical write operation involves a write into a commit log for durability and recoverability, and an update into an in-memory data structure. The write into the in-memory data structure is performed only after a successful write into the commit log. We have a dedicated disk on each machine for the commit log, since all writes into the commit log are sequential and so we can maximize disk throughput. When the in-memory data structure crosses a certain threshold, calculated based on data size and number of objects, it dumps itself to disk. This write is performed on one of many commodity disks that machines are equipped with. All writes are sequential to disk and also generate an index for efficient lookup based on row key. These indices are also persisted along with the data file. Over time many such files could exist on disk, and a merge process runs in the background to collate the different files into one file. This process is very similar to the compaction process that happens in the Bigtable system.
“A typical read operation first queries the in-memory data structure before looking into the files on disk. The files are looked at in the order of newest to oldest. When a disk lookup occurs we could be looking up a key in multiple files on disk. In order to prevent lookups into files that do not contain the key, a bloom filter, summarizing the keys in the file, is also stored in each data file and also kept in memory. This bloom filter is first consulted to check if the key being looked up does indeed exist in the given file. A key in a column family could have many columns. Some special indexing is required to retrieve columns which are further away from the key. In order to prevent scanning of every column on disk we maintain column indices which allow us to jump to the right chunk on disk for column retrieval. As the columns for a given key are being serialized and written out to disk, we generate indices at every 256K chunk boundary. This boundary is configurable, but we have found 256K to work well for us in our production workloads.”
Description of Graph
Shows the average number of cache misses expected when inserting into a hash table with various collision resolution mechanisms; on modern machines, this is a good estimate of actual clock time required. This seems to confirm the common heuristic that performance begins to degrade at about 80% table density.
It is based on a simulated model of a hash table where the hash function chooses indexes for each insertion uniformly at random. The parameters of the model were:
You may be curious what happens in the case where no cache exists. In other words, how does the number of probes (number of reads, number of comparisons) rise as the table fills? The curve is similar in shape to the one above, but shifted left: it requires an average of 24 probes for an 80% full table, and you have to go down to a 50% full table for only 3 probes to be required on average. This suggests that in the absence of a cache, ideally your hash table should be about twice as large for probing as for chaining.