Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
Storing time series data with Apache Cassandra (Patrick McFadin)
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice, and now you can learn how to use it. We'll look at possible data models and the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical ones. Real-world use cases and associated data models.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real-life data models and those new features taking developer productivity to an all-new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big impact features like Light Weight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise? Let's put the spotlight on all the things! Changes in memory management, file handling and internals. Low hype but they pack a big punch. While we were at it, we also did a bit of house cleaning.
Time series with Apache Cassandra - Long version (Patrick McFadin)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world. Fast in. Fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems. Three main areas are covered:
Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind.
Process - Analyze volumes of data you receive in near real-time or in a batch. Be ready for fast serving in your application.
Store - Reliably store data in the data models to support your application. Never accept downtime or slow response times.
Apache Cassandra and Spark: you got the lighter, let's start the fire (Patrick McFadin)
Introduction to analyzing Apache Cassandra data using Apache Spark. This includes data models, operational topics and the internals of how Spark interfaces with Cassandra.
Nike Tech Talk: Double Down on Apache Cassandra and Spark (Patrick McFadin)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data at high velocity and high volume. This talk will give you an overview of the many ways you can be successful by introducing Apache Cassandra concepts. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models. There will also be examples of how you can use Apache Spark along with Apache Cassandra to create a real-time data analytics platform. It’s so easy, you will be shocked and ready to try it yourself.
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise (Patrick McFadin)
Wait! Back away from the Cassandra secondary index. It's OK for some use cases, but it's not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis... and I can't model that in C*, even after watching all of Patrick McFadin's videos. What do I do?" The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart's content. Take our hand. We will show you how.
Introduction to data modeling with Apache Cassandra (Patrick McFadin)
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is a starter tour. We translate from the knowledge you already have to the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It's almost too much to talk about, but we will dig into some of the most important, groundbreaking features that you'll want to use. Indexing changes that will make your applications faster and Spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer-focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made its arrival. There is more, but you'll just have to come see for yourself. Get your front row seat and don't miss it!
- Apache Cassandra is a fault-tolerant NoSQL database that scales linearly: throughput increases in proportion to the number of machines
- It is an AP system under the CAP theorem: eventually consistent, sacrificing consistency in favor of availability and partition tolerance
- Cassandra uses replication and consistency levels to control fault tolerance at the server and client levels respectively
- Its data model and use of SSTables allow for fast writes and queries along clustering columns
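The interplay of replication and consistency levels mentioned above comes down to simple arithmetic. The following is a minimal sketch in plain Python, not driver code; the function names are hypothetical and chosen only for illustration. It assumes the usual definition of a quorum as a majority of replicas.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_acks + write_acks > replication_factor


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """Number of replicas that can be down while a request can still be satisfied."""
    return replication_factor - required_acks


rf = 3
q = quorum(rf)
print(q, reads_see_latest_write(rf, q, q), tolerated_failures(rf, q))
```

With a replication factor of 3, quorum reads and writes overlap on at least one replica and still tolerate one replica being down, whereas reading and writing with a single acknowledgement each gives no such overlap.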
DataStax: An Introduction to DataStax Enterprise Search (DataStax Academy)
1) Why We Built DSE Search
2) Basics of the Read and Write Paths
3) Fault-tolerance and Adaptive Routing
4) Analytics with Search and Spark
5) Live Indexing
This document discusses Apache Spark and Cassandra. It provides an overview of Cassandra as a shared-nothing, masterless, peer-to-peer database with great scaling. It then discusses how Spark can be used to analyze large amounts of data stored in Cassandra in parallel across a cluster. The Spark Cassandra connector allows Spark to create partitions that align with the token ranges in Cassandra, enabling efficient distributed queries across the cluster.
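As a rough illustration of aligning work with token ranges, the sketch below splits a token ring into contiguous ranges and builds one range query per split. It is not the connector's implementation: it assumes the signed 64-bit token space of the Murmur3 partitioner, splits the ring evenly rather than by node ownership, and uses hypothetical keyspace, table, and column names.

```python
MIN_TOKEN = -2**63       # assumed lower bound of the token space
MAX_TOKEN = 2**63 - 1    # assumed upper bound of the token space


def split_ring(num_splits: int) -> list[tuple[int, int]]:
    """Split the full token space into contiguous, inclusive (start, end) ranges."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    # Each range ends one token before the next range starts, so nothing overlaps.
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_splits)]


def range_query(keyspace: str, table: str, partition_key: str, start: int, end: int) -> str:
    """Build a CQL statement that scans only the partitions whose token falls in the range."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= {start} AND token({partition_key}) <= {end}"
    )


for start, end in split_ring(4):
    # Hypothetical keyspace/table/column names, for illustration only.
    print(range_query("weather", "temperature", "station_id", start, end))
```

Each generated statement could be handed to a separate worker, so that every worker scans a disjoint slice of the ring.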
The data model is dead, long live the data model (Patrick McFadin)
The document discusses how data modeling concepts translate from relational databases to Cassandra. It begins with background on how Cassandra stores data using a row key and columns rather than tables and relations. Common patterns like one-to-many and many-to-many relationships are achieved without foreign keys by duplicating and denormalizing data. The document also covers concepts like UUIDs, transactions, and how some relational features like sequences are handled differently in Cassandra.
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent... (DataStax Academy)
Wait! Back away from the Cassandra secondary index. It's OK for some use cases, but it's not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis... and I can't model that in C*, even after watching all of Patrick McFadin's videos. What do I do?" The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart's content. Take our hand. We will show you how.
An Introduction to time series with Team Apache (Patrick McFadin)
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.
Patrick walks you through organizing a stream of data into an efficient queue using Apache Kafka, processing the data in flight using Apache Spark Streaming, storing the data in a highly scalable and fault-tolerant database using Apache Cassandra, and transforming and finding insights in volumes of stored data using Apache Spark.
Topics include:
- Understanding the right use case
- Considerations when deploying Apache Kafka
- Processing streams with Apache Spark Streaming
- A deep dive into how Apache Cassandra stores data
- Integration between Cassandra and Spark
- Data models for time series
- Postprocessing without ETL using Apache Spark on Cassandra
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling (DataStax Academy)
This document discusses using Cassandra to store and query time series data. It provides examples of modeling weather station data and financial trading data in Cassandra. The key points are:
- Cassandra is well-suited for storing and querying time series data due to its ability to scale out, its resilience, and efficient storage of sequential data.
- Example data models show how to store weather station temperature readings and stock trade events, with timestamps as the primary key to support queries on ranges of time.
- The on-disk layout sequentially stores data, allowing efficient slicing operations to retrieve ranges of records with a single disk seek.
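The points above can be sketched with a small in-memory model in plain Python. This is an illustration of the idea, not Cassandra's storage engine: it assumes the weather station ID selects a partition and the timestamp orders rows inside it, and the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict


class TimeSeriesTable:
    """Toy model: one sorted run of (timestamp, temperature) rows per weather station."""

    def __init__(self):
        # station_id -> list of (epoch_seconds, temperature), kept sorted by timestamp
        self._partitions = defaultdict(list)

    def insert(self, station_id, epoch_seconds, temperature):
        # Keep rows ordered on write so that reads become contiguous slices.
        bisect.insort(self._partitions[station_id], (epoch_seconds, temperature))

    def slice(self, station_id, start, end):
        """Return rows with start <= timestamp < end from a single partition."""
        rows = self._partitions.get(station_id, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


table = TimeSeriesTable()
table.insert("station-1", 300, 21.5)
table.insert("station-1", 100, 20.0)
table.insert("station-1", 200, 20.5)
print(table.slice("station-1", 100, 300))
```

Because rows are already ordered within a partition, a time-range read is one lookup for the start position followed by a sequential scan, which mirrors the slicing behavior described in the summary.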
Spark Cassandra Connector: Past, Present and Future (DataStax Academy)
The document discusses the past, present, and future of the Spark Cassandra Connector. In the past, integrating Hadoop and Cassandra required expertise and was difficult. The Spark Cassandra Connector was first released in 2014 and makes it easier to access Cassandra data from Spark applications. Currently, the connector can read and write Cassandra data into RDDs, push filters down to Cassandra, and support Java APIs. It also enables working with DataFrames/SQL for Cassandra data.
The document discusses data modeling techniques for Cassandra and provides examples for four use cases: shopping cart data, user activity tracking, log collection/aggregation, and user form versioning. For each use case, it describes the business needs, issues with a relational database approach, and proposes a Cassandra data model using CQL. It emphasizes the importance of proper data modeling and getting the model right for a given use case.
Cassandra Community Webinar | Become a Super Modeler (DataStax)
Sure you can do some time series modeling. Maybe some user profiles. What's going to make you a super modeler? Let's take a look at some great techniques taken from real-world applications where we exploit the Cassandra big table model to its fullest advantage. We'll cover some of the new features in CQL 3 as well as some tried and true methods. In particular, we will look at fast indexing techniques to get data faster at scale. You'll be jet setting through your data like a true super modeler in no time.
Speaker: Patrick McFadin, Principal Solutions Architect at DataStax
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ... (StampedeCon)
Learn how to model beyond traditional direct access in Apache Cassandra. Use the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
Enabling Search in your Cassandra Application with DataStax Enterprise (DataStax Academy)
This document provides an overview of using DataStax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.
Making money with open source and not losing your soul: A practical guide (Patrick McFadin)
We now live in a world where Open Source Software is as generally accepted as any commercial software. This doesn’t mean that there is a lack of commercial aspects for OSS, because I’m here to tell you, Open Source is a perfectly viable business model. Don't worry! You don't have to sell your soul to the suits on Wall Street and give up on the core values of open source to make it work. I'm employed by a company that (hopefully) embodies these values with a lot of success. I’ve also interviewed many business leaders in Open Source companies. Let me share some of what I’ve learned so you too can be successful. The topics I will be covering:
- Picking the right open source license
- Business models for monetizing open source
- Engaging the community in a mutually beneficial way
- Competing with commercial alternatives
- The selling process (yes, we have to talk about that)
Help! I want to contribute to an Open Source project but my boss says no. (Patrick McFadin)
You love using Open Source Software. It's done right by you and now you want to contribute back. You get your patch all ready and… the boss says no! Don't feel alone. Enterprises everywhere are trying to figure this out. I'll walk you through what risks actually exist for businesses and how you can help manage them. Maybe armed with some information your boss will say... yes!
Owning time series with Team Apache Strata San Jose 2015 (Patrick McFadin)
Break out your laptops: this hands-on tutorial is geared around understanding the basics of how Apache Cassandra stores and accesses time series data. We’ll start with an overview of how Cassandra works and how that can be a perfect fit for time series. Then we will add in Apache Spark as a perfect analytics companion. There will be coding as a part of the hands-on tutorial. The goal will be to take an example application and code through the different aspects of working with this unique data pattern. The final section will cover the building of an end-to-end data pipeline to ingest, process and store high-speed time series data.
Apache Cassandra is a popular choice for a wide variety of application persistence needs. There are many design choices that can affect uptime and performance. In this talk we'll look at some of the many things to consider from a single server to multiple data centers. Basic understanding of Cassandra features coupled with client driver features can be a very powerful combination. This talk will be an introduction but will deep dive into the technical details of how Cassandra works.
Cassandra by example - the path of read and write requests (grro)
This article describes how Cassandra handles and processes requests. It will help you get a better picture of Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
Apache Cassandra & Apache Spark for time series data (Patrick McFadin)
Apache Cassandra is a distributed database that stores time series data in a partitioned and ordered format. Apache Spark can efficiently query this Cassandra data using Resilient Distributed Datasets (RDDs) and perform analytics like aggregations. For example, weather station data stored sequentially in Cassandra by time can be aggregated into daily high and low temperatures with Spark and written back to a roll-up Cassandra table.
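The roll-up described above can be sketched as follows. This is plain Python standing in for the aggregation logic only; it does not use the Spark or Cassandra APIs, and the function name, the tuple layout of the readings, and the use of UTC day boundaries are assumptions made for illustration.

```python
from datetime import datetime, timezone


def daily_high_low(readings):
    """Aggregate (station_id, epoch_seconds, temperature) readings per station and UTC day.

    Returns a dict mapping (station_id, 'YYYY-MM-DD') to (high, low).
    """
    rollup = {}
    for station_id, epoch_seconds, temperature in readings:
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date().isoformat()
        key = (station_id, day)
        if key in rollup:
            high, low = rollup[key]
            rollup[key] = (max(high, temperature), min(low, temperature))
        else:
            rollup[key] = (temperature, temperature)
    return rollup


readings = [
    ("station-1", 0, 10.0),       # 1970-01-01 00:00 UTC
    ("station-1", 3600, 15.0),    # 1970-01-01 01:00 UTC
    ("station-1", 86400, 7.0),    # 1970-01-02 00:00 UTC
]
for key, (high, low) in sorted(daily_high_low(readings).items()):
    print(key, high, low)
```

Each entry of the result corresponds to one row that a job of this kind would write back to a roll-up table keyed by station and day.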
1) The book Os Maias tells the story of three generations of the Maia family over the course of the 19th century, from the liberal struggles to the political regeneration in Portugal.
2) The story begins in 1875 and describes the family's move to the Ramalhete house in Lisbon, as well as the characteristics of its members, including Afonso da Maia.
3) The chapter also covers the marriage and children of Pedro da Maia, grandson of Afonso, and the falling-out between the two
Cassandra Community Webinar | Data Model on Fire (DataStax)
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
1. #CASSANDRAEU
Data Model on Fire
Patrick McFadin | Chief Evangelist DataStax
@PatrickMcFadin
Friday, October 18, 13
2. Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test baby!
4. The race is on
T0, Process 1:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T1, Process 2:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
Got nothing! Good to go!
T2, Process 1:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
T3, Process 2 (this one wins):
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');
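The race above can be replayed with a toy in-memory model. This is only a sketch: the `users` dict and the helpers `select_user` and `insert_user` are hypothetical stand-ins, not a Cassandra client, and the model lets the later call overwrite the earlier one, whereas Cassandra resolves conflicting writes by write timestamp.

```python
# Toy in-memory stand-in for the users table; this is not a Cassandra client.
users = {}

def select_user(username):
    """Model of SELECT ... WHERE username = ?; None stands for (0 rows)."""
    return users.get(username)

def insert_user(username, row):
    """Model of a plain INSERT: an upsert that overwrites whatever is there."""
    users[username] = row

# T0 and T1: both processes check first, and both see no row.
p1_sees = select_user("pmcfadin")
p2_sees = select_user("pmcfadin")

# T2: process 1 decides the username is free and inserts.
if p1_sees is None:
    insert_user("pmcfadin", {"firstname": "Patrick", "email": ["patrick@datastax.com"]})

# T3: process 2 acts on its stale read and overwrites process 1's row.
if p2_sees is None:
    insert_user("pmcfadin", {"firstname": "Paul", "email": ["paul@oracle.com"]})

winner = users["pmcfadin"]["firstname"]
print(winner)
```

Neither process gets an error, which is the problem: the check and the write are two separate steps, so nothing stops the second write.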
5. Solution LWT
Process 1 (T0, T1)
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
 [applied]
-----------
      True
•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success
6. Solution LWT
Process 2 (T2, T3)
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin
•applied = false: Rejected
•No record stomping!
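The behaviour of the two IF NOT EXISTS inserts can be sketched as a single compare-and-set step. This is a toy model under stated assumptions: `UsersTable` and `insert_if_not_exists` are hypothetical names, and a local `threading.Lock` stands in for the exclusive access that Paxos provides across replicas. It is not how Cassandra implements it.

```python
import threading

class UsersTable:
    """Toy table whose conditional insert is atomic (a lock stands in for Paxos)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, username, row):
        """Return (applied, existing_row), loosely mirroring the [applied] column."""
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                # Rejected: hand back the row that is already there.
                return False, dict(existing)
            self._rows[username] = dict(row)
            return True, None

table = UsersTable()
first = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"})
second = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Paul", "lastname": "McFadin"})
print(first)
print(second)
```

Because the check and the write happen inside one guarded step, the second caller is told it lost and sees the surviving row instead of silently replacing it.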
7. LWT Fine Print
•Light Weight Transactions solve edge conditions
•They have latency cost.
• Be aware
• Load test
• Consider in your data model
•Now go shut down that ZooKeeper mess you have!
9. Form Versioning Pt 1
•From “Next top data model”
•Great idea, but edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form
10. Form Versioning Pt 1
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
2. Lock for one user
Danger Zone
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
11. Form Versioning Pt 2
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;
Exclusive lock
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';
Accepted
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
Rejected
(sorry dude)
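The accepted and rejected updates above follow one rule: the write only happens if locked_by still holds the expected value. A minimal Python sketch of that rule follows. The row dict and the helper name are hypothetical, and the sketch is single-threaded, so it simply assumes the check and the write happen as one step (the part LWT takes care of in Cassandra).

```python
# Hypothetical copy of one working_version row, already locked by pmcfadin.
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}


def update_attribute_if_locked_by(row, user, key, value):
    """Apply the change only when `user` holds the lock; return applied flag."""
    if row["locked_by"] != user:
        return False  # condition failed, nothing is written
    row["form_attributes"][key] = value
    return True


accepted = update_attribute_if_locked_by(
    row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: "
)
rejected = update_attribute_if_locked_by(
    row, "dude", "EmailAddress<text>", "Email Adx: "
)
print(accepted, rejected)  # True False
print(row["form_attributes"]["EmailAddress<text>"])  # Primary Email Address: 
```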
12. Form Versioning Pt 2
•Old way: Edge cases with problems
• Use external locking?
• Take your chances?
•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
14. Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
Single pass compaction
Hints to reduce SSTable reads
Faster index reads from off-heap
18. Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!
Avg Access Time*   Rotation Speed
12ms               7200 RPM
7ms                10k RPM
5ms                15k RPM
.04ms              SSD
* Source: www.tomshardware.com
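The 60ms figure is just seeks multiplied by average access time. A small Python sketch using the numbers from the table above (the function name and the choice of 5 seeks per read are illustrative only):

```python
# Average access times in milliseconds, copied from the table above.
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}


def disk_time_ms(seeks, access_time_ms):
    """Time spent purely on disk access for a read that needs `seeks` seeks."""
    return seeks * access_time_ms


# Cost of a read that touches 5 SSTables, one seek each.
costs = {media: disk_time_ms(5, ms) for media, ms in ACCESS_TIME_MS.items()}
for media, ms in costs.items():
    print(f"{media}: {ms:.2f} ms")
```

Every SSTable read avoided removes one of those multiples, which is why the gain is much larger on spinning disks than on SSD.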
Shared storage == Great sadness
20. Quick Diversion
•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>
28. Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?
Test
Measure
Repeat
29. Real World Example
•Real Customer
•Needed very tight SLA on reads
Problem
•Read response highly variable
•Loading data increases latency
32. Partition Size
•Tuning is an option based on size in bytes
•All about the reads
•index_interval
• How many samples taken
• Lower for faster access but more memory usage
•column_index_size_in_kb
• Add column indexes to a row when the data reaches this size
• Partial row reads? Maybe smaller.
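A rough way to reason about the two settings is to count what each one produces. The Python sketch below is a back-of-the-envelope model, not Cassandra's actual accounting: it assumes one in-memory index sample per index_interval partition keys and one column index entry per column_index_size_in_kb of partition data. The function names and the example sizes are made up, and the values 128 and 64 are assumed to be the shipped defaults.

```python
import math


def index_samples(partition_keys, index_interval):
    """Approximate samples held in memory: one per `index_interval` keys."""
    return math.ceil(partition_keys / index_interval)


def column_index_entries(partition_bytes, column_index_size_in_kb):
    """Approximate column index entries: one per full block of that size.

    Partitions smaller than one block get no column index in this model.
    """
    block_bytes = column_index_size_in_kb * 1024
    return partition_bytes // block_bytes


# Lowering index_interval: more samples, so more memory but shorter scans.
print(index_samples(1_000_000, 128))  # 7813
print(index_samples(1_000_000, 64))   # 15625

# Lowering column_index_size_in_kb: finer-grained index inside a 1 MiB partition.
one_mib = 1024 * 1024
print(column_index_entries(one_mib, 64))  # 16
print(column_index_entries(one_mib, 16))  # 64
```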
33. Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
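Measuring at the application means collecting your own latency samples and computing the percentile yourself. A minimal Python sketch using the nearest-rank method follows. The sample latencies are made up for illustration and are not the customer's data. The last lines only turn the daily volume quoted above into an average rate, to show how far the average sits below the quoted peak.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Made-up read latencies in milliseconds, as timed by the client.
latencies_ms = [3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 40]

p95 = percentile_nearest_rank(latencies_ms, 95)
worst = percentile_nearest_rank(latencies_ms, 100)
print(p95, worst)

# Average rate implied by 220 million operations per day.
avg_ops_per_sec = 220_000_000 / 86_400
print(round(avg_ops_per_sec))
```

A percentile hides the single slow outlier that a maximum would report and that an average would smear across every request, which is why SLAs are usually stated this way.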
35. Disk + Data Model
•Understand the internals
• Size of partition
• Compaction
•Learn how to measure
•Load test
36. Thank you! Time for questions...
*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model