This document contains a presentation on Cassandra and how it can be used. It discusses Cassandra's architecture based on Dynamo and BigTable, as well as how it provides availability, scalability, and performance. It covers data modeling techniques in Cassandra like column families, static vs dynamic columns, and using timestamps for time series data. Examples are provided for modeling user login data and social network activity. Anti-patterns like super columns and read-before-write are also discussed. The document concludes with information on an eBay use case involving social signals and recommendations.
Percona Live 4/15/15: Transparent sharding database virtualization engine (DVE), Tesora
Amrith Kumar of Tesora and Peter Boros of Percona present an in-depth exploration of transparent database scale-out using the Tesora DVE framework for MySQL.
PGDAY FR 2014: A presentation of PostgreSQL at leboncoin.fr, jlb666
This document discusses the use of PostgreSQL in Schibsted Classified Media's platform. Some key points:
- SCM uses PostgreSQL across 30+ countries, with over 100 servers storing 8TB of data and handling 50 million classified ads.
- Leboncoin.fr, the French classifieds site, is powered by this PostgreSQL-based platform. It receives 250 million page views per day from 5 million unique visitors.
- The database infrastructure includes high-performance servers with 2TB of RAM storing the 6TB production database. The read workload is offloaded to multiple read-only slaves.
- Despite caching, the master database still handles 600 transactions per second. Future scalability improvements may include sharding.
Use Your MySQL Knowledge to Become a MongoDB Guru, Tim Callaghan
Leverage all of your MySQL knowledge and experience to get up to speed quickly with MongoDB.
Presented at Percona Live London 2013 with Robert Hodges of Continuent.
This document provides a summary of different data storage systems and structures. It discusses B-trees, LSM-trees, hash indices, R-trees, and the Block Range Index. It describes their uses, properties, and tradeoffs for operations like reads, writes, and range queries. Overall, the document analyzes various indexing techniques and how they are applied in different databases.
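To make the write/read tradeoff between these index structures concrete, here is a toy LSM-tree sketch in Python (hypothetical and heavily simplified; real LSM engines add write-ahead logs, bloom filters, and compaction): writes land in an in-memory memtable, which is flushed as an immutable sorted run when full, and reads check the memtable first, then runs from newest to oldest.

```python
# Toy LSM-tree: fast writes (append to memory), reads may scan several runs.
class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}       # mutable in-memory buffer
        self.sstables = []       # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: sort the buffer and append it as an immutable run.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # The newest run wins, so scan from last-flushed to first.
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None

lsm = ToyLSM(memtable_limit=2)
lsm.put("a", 1)
lsm.put("b", 2)      # memtable full -> flushed to a sorted run
lsm.put("a", 99)     # newer value shadows the flushed one
print(lsm.get("a"))  # 99
print(lsm.get("b"))  # 2
```

The sketch shows why LSM-trees favor write-heavy workloads: a put never touches disk-resident runs, while a get may probe every run, which is the read amplification the document's tradeoff discussion refers to.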
The document provides an introduction to Cassandra presented by Nick Bailey. It discusses key Cassandra concepts like cluster architecture, data modeling using CQL, and best practices. Examples are provided to illustrate how to model time-series data and denormalize schemas to support different queries. Tools for testing Cassandra implementations like CCM and client drivers are also mentioned.
This document summarizes an introduction to Cassandra and Linux. It discusses Cassandra's origins from Amazon's Dynamo and Google's BigTable systems. It also outlines key features of Cassandra like availability, scalability, performance and data center support. The document provides hardware recommendations for Cassandra including memory, CPU, disks and configuration guidance. It concludes by listing resources for learning more about Cassandra.
The data model is dead, long live the data model, Patrick McFadin
The document discusses how data modeling concepts translate from relational databases to Cassandra. It begins with background on how Cassandra stores data using a row key and columns rather than tables and relations. Common patterns like one-to-many and many-to-many relationships are achieved without foreign keys by duplicating and denormalizing data. The document also covers concepts like UUIDs, transactions, and how some relational features like sequences are handled differently in Cassandra.
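The duplicate-and-denormalize pattern described above can be sketched with plain Python dicts standing in for Cassandra tables (the table and field names here are illustrative, not from the talk): instead of joining at read time, one logical insert fans out to every query-shaped structure up front.

```python
# Two "tables", each shaped for exactly one query; no foreign keys, no joins.
videos_by_id = {}      # query: look up a video by its id
videos_by_user = {}    # query: list all videos uploaded by a user

def add_video(video_id, username, title):
    # One logical insert writes to every table that must answer a query.
    videos_by_id[video_id] = {"user": username, "title": title}
    videos_by_user.setdefault(username, []).append(
        {"id": video_id, "title": title}
    )

add_video("v1", "alice", "Intro to CQL")
add_video("v2", "alice", "Data modeling")

# Both lookups are now single-key reads.
print(videos_by_id["v1"]["title"])                 # Intro to CQL
print([v["id"] for v in videos_by_user["alice"]])  # ['v1', 'v2']
```

The cost is that the application, not the database, keeps the copies consistent on update, which is the central tradeoff of Cassandra-style denormalization.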
Synchronise your data between MySQL and MongoDB, Giuseppe Maxia
The document discusses synchronizing data between MySQL and MongoDB using Tungsten Replicator. Tungsten Replicator allows data to be replicated from a MySQL database to MongoDB in near real-time. The document provides examples of data being inserted and updated in MySQL, and then appearing in MongoDB through the replication process. It also discusses security features and how to install Tungsten Replicator for basic master-slave replication between MySQL and MongoDB databases.
The document provides an overview of integrating the Cassandra database including:
- Cassandra is a key-value store that evolved to support tables but lacks SQL features like joins and aggregation.
- It offers predictable performance as data grows and no single point of failure through replication across nodes.
- To write and read from Cassandra, clients connect to nodes and operations are distributed based on partitioning keys, with tunable consistency levels.
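The partition-key routing described in the last point can be illustrated with a minimal consistent-hashing sketch (illustrative only; Cassandra actually uses the Murmur3 partitioner, and md5 here is just a stand-in hash): each node owns a token on a ring, a row's partition key hashes to a token, and the next N nodes clockwise hold its replicas.

```python
import hashlib

def token(value):
    # Stable 32-bit token from a string (md5 stands in for Murmur3 here).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

def replicas_for(key, nodes, rf=3):
    # Sort nodes by their token to form the ring.
    ring = sorted((token(n), n) for n in nodes)
    t = token(key)
    # First node whose token is >= the key's token, wrapping around to 0.
    start = next((i for i, (nt, _) in enumerate(ring) if nt >= t), 0)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(replicas_for("user:42", nodes, rf=3))
```

Because any client can compute the same replica set from the key alone, any node can coordinate a request, which is what removes the single point of failure mentioned above.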
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch, DataStax Academy
Do you love some Cassandra, but that relational brain is still on? You aren't alone. Let's take that OLAP data model and get it OLTP. This will be an updated talk with some of the new features brought to you by Cassandra 3.0. Real techniques to translate application patterns into effective models. Common pitfalls that can slow you down and send you running back to RDBMS land. Don't do it! Finally, if you didn't get it right the first time, I'll show you how to fix that data model without any downtime. Turn a hot cup of fail into a tall glass of awesome!
The document discusses Ivan Zoratti's presentation on using MySQL for big data. It defines big data and distinguishes between structured and unstructured data. It then outlines various technologies that can be used with MySQL like storage engines, partitioning, columnar databases, and the MariaDB optimizer. The presentation provides an overview of how these technologies can help manage large and complex data sets with MySQL.
The document provides an overview of a presentation on Apache Cassandra and Spark. It introduces the speaker and their background with Cassandra. The presentation will cover a recap of Cassandra, replication, fault tolerance, data modeling, and Spark integration. It will also look at a potential use case with KillrWeather. Common Cassandra use cases include ordered data like time series for events, financial transactions, and sensor data.
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many times just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
This document discusses MySQL and SQL. It provides information on installing and downloading MySQL, how to connect to a MySQL database using the command line, how to create, select, insert, update, and delete data from MySQL databases and tables using SQL statements. It also includes SQL statements for creating sample tables to demonstrate MySQL and SQL commands.
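The create/select/insert/update/delete statements the document walks through can be demonstrated end to end with Python's built-in sqlite3 module (used here so the demo is self-contained; the SQL shown is essentially the same in MySQL, and the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE a sample table, then exercise each statement the document covers.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
cur.execute("INSERT INTO users (name) VALUES (?)", ("bob",))
cur.execute("UPDATE users SET name = ? WHERE name = ?", ("carol", "bob"))
cur.execute("DELETE FROM users WHERE name = ?", ("alice",))

cur.execute("SELECT name FROM users")
print(cur.fetchall())   # [('carol',)]
```

Note the `?` placeholders: parameterized statements are the idiomatic way to pass values in both sqlite3 and the MySQL client libraries, rather than string concatenation.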
This document summarizes a presentation about Apache Cassandra given by Christopher Batey, a technical evangelist for Cassandra. The presentation provides an overview of Cassandra, including its use cases, data modeling approach, and replication strategy. It also discusses Cassandra's roots in Amazon's Dynamo paper and Google's BigTable system. Examples are given around modeling time series data and building a customer event store in Cassandra.
A 30-minute talk I did at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!
Clojure at DataStax: The Long Road From Python to Clojure, nickmbailey
This talk will detail our experience using Clojure at DataStax for our OpsCenter product. The main focus of the talk will be our desire to move from Python to Clojure and how the process is going. As part of that I’ll discuss why we decided to introduce Clojure to begin with, how we first integrated Clojure into a single component of our application, and how we are now working towards migrating our entire application to Clojure. From a technical perspective, I’ll cover our approach to migrating to Clojure by using Jython as an intermediate step between Python and Clojure. I’ll also touch on the experience of choosing Clojure and then scaling a development team from a single team of 3 developers to multiple teams and over 15 developers in a span of five years.
The document provides an overview of Apache Cassandra's architecture and design. It was created to address the needs of building reliable, high-performing, and always-available distributed databases. Cassandra is based on Dynamo and BigTable and uses a distributed hashing technique to partition and replicate data across nodes. It supports configurable replication across multiple data centers for high availability. Writes are sent to the local node and replicated to other nodes based on consistency level, while reads can be served from any replica.
This document provides an overview of Cassandra and Spark and how they can be used together. It first introduces Cassandra as a linearly scalable and fault tolerant distributed database. It then discusses key Cassandra concepts like data distribution, consistency levels, and the Cassandra query language (CQL). The document next introduces Spark as a distributed computing framework similar to Hadoop MapReduce. It describes the Spark architecture and programming model using Resilient Distributed Datasets (RDDs). Finally, it explains how Cassandra data can be accessed as RDDs in Spark, allowing for analytics and transformations on Cassandra data using Spark's API in a simple and workload isolated way. Code examples are provided to demonstrate connecting Spark to Cassandra, reading and writing data.
Lightning fast analytics with Spark and Cassandra, nickmbailey
Spark is a fast and general engine for large-scale data processing. It provides APIs for Java, Scala, and Python that allow users to load data into a distributed cluster as resilient distributed datasets (RDDs) and then perform operations like map, filter, reduce, join and save. The Cassandra Spark driver allows accessing Cassandra tables as RDDs to perform analytics and run Spark SQL queries across Cassandra data. It provides server-side data selection and mapping of rows to Scala case classes or other objects.
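The map/filter/reduce programming model named above can be mimicked with plain Python over a local list, no Spark cluster required (this is just the shape of the API, not the actual distributed execution; in PySpark the same chain would run partition-by-partition across the cluster):

```python
from functools import reduce

data = [1, 2, 3, 4, 5, 6]

squared = map(lambda x: x * x, data)             # rdd.map(...)
evens = filter(lambda x: x % 2 == 0, squared)    # rdd.filter(...)
total = reduce(lambda a, b: a + b, evens)        # rdd.reduce(...)

print(total)  # 4 + 16 + 36 = 56
```

As with RDDs, `map` and `filter` here are lazy (they build iterators), and nothing is computed until the terminal `reduce` consumes the chain, which mirrors Spark's transformation/action distinction.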
An introduction to Cassandra as well as an example of accessing Cassandra from Clojure.
Includes an introduction to cluster architecture and data model in Cassandra. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo
CFS: Cassandra backed storage for Hadoop, nickmbailey
This document describes CFS, a Cassandra-backed storage system for Hadoop. It discusses the motivations for building such a system given Cassandra's strengths in scalability and real-time data but lack of support for ad-hoc queries. The solution presented uses Cassandra to store Hadoop file metadata and block data, allowing tasks to run locally for data locality. It describes how files are written by storing metadata in an "inode" column family and block contents in an "sblocks" column family. Reads then retrieve the necessary blocks from Cassandra via Thrift.
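The inode/sblocks split described above can be sketched with two Python dicts standing in for the two column families (the block-id scheme and 4-byte block size here are made up for illustration; real CFS blocks are much larger): file metadata maps a name to an ordered list of block ids, and block contents live in a separate map so a read fetches exactly the blocks it needs.

```python
BLOCK_SIZE = 4  # bytes; illustrative only

inode = {}      # "inode" CF: filename -> ordered list of block ids
sblocks = {}    # "sblocks" CF: block id -> raw bytes

def write_file(name, data):
    block_ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = f"{name}:{i // BLOCK_SIZE}"
        sblocks[block_id] = data[i:i + BLOCK_SIZE]
        block_ids.append(block_id)
    inode[name] = block_ids    # metadata written after the blocks

def read_file(name):
    # Look up the metadata, then reassemble the blocks in order.
    return b"".join(sblocks[b] for b in inode[name])

write_file("hello.txt", b"hello world!")
print(read_file("hello.txt"))  # b'hello world!'
```

Keeping metadata and block data in separate keyspaces is what lets a Hadoop task read only the blocks local to its node, which is the data-locality point the document makes.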
Come to the Summit!
Ask me for a discount code
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
Thursday, May 30, 2013