Cassandra By Example: Data Modelling with CQL3 (Eric Evans)
CQL is the query language for Apache Cassandra that provides an SQL-like interface. The document discusses the evolution from the older Thrift RPC interface to CQL and provides examples of modeling tweet data in Cassandra using tables like users, tweets, following, followers, userline, and timeline. It also covers techniques like denormalization, materialized views, and batch loading of related data to optimize for common queries.
The document discusses different database architectures including master-slave, master-master, and MySQL cluster. Master-slave involves one master node that handles writes and multiple read-only slave nodes. Master-master allows writes and reads on all nodes but has weaker consistency. MySQL cluster provides high availability, no single point of failure, and automatic sharding but has some limitations. The author has compiled pros and cons of each and decided MySQL cluster is best for their use case.
Running MariaDB in multiple data centers (MariaDB plc)
The document discusses running MariaDB across multiple data centers. It begins by outlining the need for multi-datacenter database architectures to provide high availability, disaster recovery, and continuous operation. It then describes topology choices for different use cases, including traditional disaster recovery, geo-synchronous distributed architectures, and how technologies like MariaDB Master/Slave and Galera Cluster work. The rest of the document discusses answering key questions when designing a multi-datacenter topology, trade-offs to consider, architecture technologies, and pros and cons of different approaches.
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay (Altinity Ltd)
The document summarizes a real-time data ingestion solution from Kafka to ClickHouse using a block aggregator to ensure exactly-once message delivery. The block aggregator aggregates Kafka messages into large blocks before loading to ClickHouse. It uses Kafka metadata and ClickHouse's block duplication detection to replay messages deterministically after failures. The talk outlines the block aggregator's design for multi-DC deployments, deterministic replay protocol, runtime monitoring with a verifier, implementation experiences, and production deployment metrics.
ClickHouse Monitoring 101: What to monitor and how (Altinity Ltd)
Webinar. Presented by Robert Hodges and Ned McClain, April 1, 2020
You are about to deploy ClickHouse into production. Congratulations! But what about monitoring? In this webinar we will introduce how to track the health of individual ClickHouse nodes as well as clusters. We'll describe available monitoring data, how to collect and store measurements, and graphical display using Grafana. We'll demo techniques and share sample Grafana dashboards that you can use for your own clusters.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust (Altinity Ltd)
This document discusses shipping data from PostgreSQL to ClickHouse using logical replication. It explains that logical replication in PostgreSQL replicates only DML statements, while physical replication replicates the entire database. It describes using pg2ch to replicate changes from PostgreSQL to ClickHouse tables using CollapsingMergeTree, ReplacingMergeTree, or MergeTree engines. Pg2ch accumulates changes in buffers and handles updating/deleting rows when using CollapsingMergeTree or ReplacingMergeTree engines.
Gartner 2021 Magic Quadrant for Cloud Database Management Systems.pdf (momirlan)
By 2025, cloud preference for data management will substantially reduce the vendor landscape while the growth in multicloud will increase complexity for data governance and integration. By 2022, cloud database management system (DBMS) revenue will account for 50% of the total DBMS market revenue. This Magic Quadrant evaluates vendors in the cloud DBMS market and assesses their strengths and weaknesses. It provides an overview of 14 vendors in the market including leaders like AWS, Databricks, and Alibaba Cloud as well as niche players like Cockroach Labs and Exasol.
Migration to ClickHouse. Practical guide, by Alexander Zaitsev (Altinity Ltd)
This document provides a summary of migrating to ClickHouse for analytics use cases. It discusses the author's background and company's requirements, including ingesting 10 billion events per day and retaining data for 3 months. It evaluates ClickHouse limitations and provides recommendations on schema design, data ingestion, sharding, and SQL. Example queries demonstrate ClickHouse performance on large datasets. The document outlines the company's migration timeline and challenges addressed. It concludes with potential future integrations between ClickHouse and MySQL.
The document provides an introduction and overview of Apache Kafka presented by Jeff Holoman. It begins with an agenda and background on the presenter. It then covers basic Kafka concepts like topics, partitions, producers, consumers and consumer groups. It discusses efficiency and delivery guarantees. Finally, it presents some use cases for Kafka and positioning around when it may or may not be a good fit compared to other technologies.
All About JSON and ClickHouse - Tips, Tricks and New Features-2022-07-26-FINA... (Altinity Ltd)
JSON is the king of data formats and ClickHouse has a plethora of features to handle it. This webinar covers JSON features from A to Z starting with traditional ways to load and represent JSON data in ClickHouse. Next, we’ll jump into the JSON data type: how it works, how to query data from it, and what works and doesn’t work. JSON data type is one of the most awaited features in the 2022 ClickHouse roadmap, so you won’t want to miss out. Finally, we’ll talk about Jedi master techniques like adding bloom filter indexing on JSON data.
A brief history of Instagram's adoption cycle of the open source distributed database Apache Cassandra, in addition to details about its use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
Understanding the architecture of MariaDB ColumnStore (MariaDB plc)
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps MariaDB users understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads, and covers:
Use cases
Query processing
Bulk data insertion
Distributed partitions
Query optimization
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare (Altinity Ltd)
Presented on December ClickHouse Meetup. Dec 3, 2019
Concrete findings and "best practices" from building a cluster sized for 150 analytic queries per second on 100TB of http logs. Topics covered: hardware, clients (http vs native), partitioning, indexing, SELECT vs INSERT performance, replication, sharding, quotas, and benchmarking.
The document discusses the Domain Name System (DNS) and how it works. DNS is an internet directory service that maps hostnames to IP addresses, allowing users to use names instead of numbers. It uses a distributed, hierarchical system of name servers to perform this name resolution in a scalable way. DNS caches mappings for performance, starting queries at the highest level domains and following delegations between servers until the answer is found. DNS has become a major attack vector, so protection of DNS infrastructure and traffic is important.
Introduction to Docker Compose | Docker Intermediate Workshop (Ajeet Singh Raina)
Docker Compose allows users to define and run multi-container Docker applications. With Docker Compose, a YAML file is used to configure an application's services, and with a single command all the services can be started from the configuration. Docker Compose is a three-step process: services are defined in a Dockerfile, composed in a Docker Compose file, and then run with docker-compose up. It supports volumes, networks, and environment variables. Docker Compose can be used for development, testing, and production environments across different platforms.
MySQL Database Architectures - InnoDB ReplicaSet & Cluster (Kenny Gryp)
This document provides an overview and comparison of MySQL InnoDB Cluster and MySQL InnoDB ReplicaSet. It discusses the components, goals, and features of each solution. MySQL InnoDB Cluster uses Group Replication to provide high availability, automatic failover, and data consistency. MySQL InnoDB ReplicaSet uses asynchronous replication and provides availability and read scaling through manual primary/secondary configuration and failover. Both solutions integrate MySQL Shell, Router, and automatic member provisioning for easy management.
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ... (Severalnines)
This presentation by Krzysztof Książek at Percona Live 2017 in Santa Clara, California gives detailed descriptions and comparisons of the leading open source database load balancing technologies
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and... (Altinity Ltd)
Presented at the webinar, July 31, 2019
Built-in replication is a powerful ClickHouse feature that helps scale data warehouse performance as well as ensure high availability. This webinar will introduce how replication works internally, explain configuration of clusters with replicas, and show you how to set up and manage ZooKeeper, which is necessary for replication to function. We'll finish off by showing useful replication tricks, such as utilizing replication to migrate data between hosts. Join us to become an expert in this important subject!
MySQL InnoDB Cluster - A complete High Availability solution for MySQL (Olivier DASINI)
MySQL InnoDB Cluster provides a complete high availability solution for MySQL. It uses MySQL Group Replication, which allows for multiple read-write replicas of a database to exist with synchronous replication. MySQL InnoDB Cluster also includes MySQL Shell for setup, management and orchestration of the cluster, and MySQL Router for intelligent connection routing. It allows databases to scale out writes across replicas in a fault-tolerant and self-healing manner.
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su... (DataStax)
Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTLs.
About the Speaker
Eric Stevens Principal Architect, ProtectWise, Inc.
Eric is the principal architect, and day one employee of ProtectWise, Inc., specializing in massive real time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.
Maria DB Galera Cluster for High Availability (OSSCube)
Want to understand how to set up high availability solutions for MySQL using MariaDB Galera Cluster? Join this webinar and learn from experts. During this webinar, you will also get guidance on how to implement MariaDB Galera Cluster.
Highly Scalable Java Programming for Multi-Core System (James Gan)
This document discusses best practices for highly scalable Java programming on multi-core systems. It begins by outlining software challenges like parallelism, memory management, and storage management. It then introduces profiling tools like the Java Lock Monitor (JLM) and Multi-core SDK (MSDK) to analyze parallel applications. The document provides techniques like reducing lock scope and granularity, using lock stripping and striping, splitting hot points, and alternatives to exclusive locks. It also recommends reducing memory allocation and using immutable/thread local data. The document concludes by discussing lock-free programming and its advantages for scalability over locking.
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
This is a full immersion in MongoDB. I talk about the MongoDB data model, when to reference and when to embed, how to denormalize, and how to use MongoDB's aggregation framework.
Cassandra Summit 2014: Fuzzy Entity Matching at Scale (DataStax Academy)
Presenter: Ken Krugler, President of Scale Unlimited
Early Warning has information on hundreds of millions of people and companies. When a person wants to open a new bank account, they need to be able to accurately find similar entities in this large dataset, to provide a risk assessment. Using the combination of Cassandra & Solr via DSE, they can quickly find and evaluate all reasonable candidates.
Low Latency “OLAP” with HBase - HBaseCon 2012 (Cosmin Lehene)
The document discusses building an OLAP system with low latency ingestion and querying capabilities using HBase. It describes using dimensions, metrics and aggregations to model OLAP data and support OLAP queries like rollups, slicing, dicing, sorting. It also discusses tuning ingestion throughput vs latency and using preprocessing to reduce data sizes through aggregation before querying.
Use Your MySQL Knowledge to Become an Instant Cassandra Guru (Tim Callaghan)
This document discusses how to leverage MySQL knowledge when working with Cassandra. It explains that while CQL is similar to SQL, Cassandra does not support features like joins, foreign keys, and complex secondary indexes. It recommends focusing on schema design and modeling data around query patterns. The document provides examples of creating tables, handling relationships, and time series data. It also covers topics like transactions, replication, partitioning, and conflict resolution in Cassandra. The overall recommendation is to rethink data modeling for Cassandra and use this knowledge as a starting point rather than trying to port MySQL applications directly.
Detail behind the Apache Cassandra 2.0 release and what is new in it, including Lightweight Transactions (compare and swap), eager retries, improved compaction, triggers (experimental), CQL cursors, and more!
Building a Scalable XML-based Dynamic Delivery Architecture: Standards and Be... (Jerry SILVER)
The document discusses challenges with traditional and dynamic content delivery and solutions using XML standards and a native XML database. It provides examples of using XQuery, XSLT, XForms, XProc and other XML standards to dynamically assemble and deliver personalized content at scale from an XML repository. It also presents two case studies of companies that implemented such standards-based dynamic XML content delivery solutions.
The document discusses scalable web architectures and common patterns for scaling web applications. It covers key topics like load balancing, caching, database replication and sharding, and asynchronous queuing to distribute workloads across multiple servers. The goal of these patterns is to scale traffic, data size, and maintainability through horizontal expansion rather than just vertical upgrades.
On April 30, 1995, companies no longer needed permission to connect to the Internet. ARPANET started the Internet in 1969, using the TCP/IP protocol. By 2009 there were around 1,669 million Internet users.
This document discusses how to build scalable applications using Scala. It defines scalability as a system's ability to handle growing workloads through methods like supporting more users, adding new features easily, or maintaining performance across different locations. The document then covers how Scala supports scalability through features like concurrency/parallelism, immutability, and functional programming patterns. It provides examples of how Scala's collections library allows parallel operations and how futures can be composed to perform asynchronous and parallel work efficiently.
Hadoop World 2011: Advanced HBase Schema Design (Cloudera, Inc.)
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second.
This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be implemented on top of HBase, using schemas that optimize for the given access patterns.
Presented by Andrew Erlichson, Vice President, Engineering, Developer Experience, MongoDB
Audience level: Beginner
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB. You will learn:
- How to work with documents
- How to evolve your schema
- Common schema design patterns
This document provides an introduction to Cassandra including:
- Datastax is a company that contributes to Apache Cassandra and sells Datastax Enterprise.
- Cassandra was created at Facebook and is now open source software with the current version being 3.2.
- Cassandra's key features include linear scalability, continuous availability, multi-datacenter support, operational simplicity, and Spark integration.
1) Apache Cassandra in terms of the CAP Theorem
2) What makes Apache Cassandra "Available"?
3) How Apache Cassandra ensures data consistency
4) Cassandra advantages and disadvantages
5) Frameworks/libraries to access Apache Cassandra + performance comparison
This document discusses Apache Cassandra and its features and use cases. It provides an overview of Cassandra's key characteristics like massive scalability, extreme availability, and rich data modeling. Example use cases mentioned include messaging, collections/playlists, fraud detection, recommendations, and IoT sensor data. New features introduced in Cassandra in 2016 are also summarized, such as delete by range, materialized views, atomic UDT updates, a new SASI index, and support for GROUP BY queries.
Cassandra uses a distributed architecture with consistent hashing to distribute data across nodes in a cluster. It provides high availability and partition tolerance by replicating data across multiple nodes and data centers. The coordinator node handles read and write requests from clients by interacting with the necessary replica nodes to satisfy the requested consistency level. Cassandra stores data both in memory and on disk for high performance and durability. It uses commit logs, memtables, and SSTables to manage data writes, caches for efficient reads, and a gossip protocol to detect node failures.
The document provides instructions on how to create tables, insert data, and write queries for a database with tables for students, library memberships, books, and book issue records. It includes examples of creating the tables with primary and foreign keys, inserting sample data, and queries to list student names and issued books, count books issued per student, and create views of issue records and daily issues.
This document provides an overview of C# programming and SQL. It introduces C#, its syntax derived from C/C++, and what can be done with C# including Windows and web applications. It also covers SQL for manipulating database records, including INSERT, UPDATE, DELETE statements. Sample SQL code is provided to insert, update, modify records in a phonebook table. Key things to remember are enclosing string data in single quotes, separating values with commas, ending statements with semicolons, and using Pascal casing.
Apache Cassandra Data Modeling with Travis Price (DataStax Academy)
This document provides an overview of data modeling in Cassandra, including:
- The components of the Cassandra schema including columns, column families, keyspaces, and clusters.
- Best practices for designing Cassandra data models including working backwards from application requirements and de-normalizing data.
- Features like compound primary keys, secondary indexes, collections, and partitioning that help optimize data models for Cassandra.
- Examples of different types of column families and modeling patterns for user profiles, activity logs, and other common use cases.
This document provides instructions and examples for using the MySQL database system. It discusses MySQL concepts like database, tables, rows, and columns. It also demonstrates common SQL commands like CREATE, SELECT, INSERT, UPDATE, DROP. Examples show how to create databases and tables, insert data, query data, and more. Installation and configuration steps are also covered.
This document provides instructions and examples for using the MySQL database system. It discusses MySQL concepts like database, tables, rows, and columns. It also demonstrates common SQL commands like CREATE, SELECT, INSERT, UPDATE, DROP. Examples show how to create databases and tables, insert and query data, use functions, conditions and wildcards. Script files demonstrate populating tables with sample data.
This document introduces Apache Cassandra, a distributed column-oriented NoSQL database. It discusses Cassandra's architecture, data model, query language (CQL), and how to install and run Cassandra. Key points covered include Cassandra's linear scalability, high availability and fault tolerance. The document also demonstrates how to use the nodetool utility and provides guidance on backing up and restoring Cassandra data.
This document provides instructions for experiments in a Database Management System laboratory course. It includes a list of 12 experiments covering topics like Data Definition Language commands, Data Manipulation Language commands, database design using ER modeling and normalization, and implementation of various database applications. It also provides details on the hardware and software requirements for the course, as well as the internal assessment structure including marks distribution.
This document provides details on connecting to a Cassandra instance in EC2 cloud using Putty and performing various operations like listing keyspaces, working in a keyspace, describing and creating tables, inserting and retrieving records, deleting records, truncating tables, dropping tables and keyspaces, and using indexes. It also includes examples of creating a sample Facebook project with a UserProfiles table and updating records.
DDL (Data Definition Language) Using Oracle (Farhan Aslam)
The document discusses DDL and DCL commands in Oracle including naming rules for objects, data types, creating tables, constraints, defining constraints, updating and violating constraints, creating tables using subqueries, altering tables, views, sequences, granting and revoking privileges, and dropping tables. It also discusses the Oracle data dictionary.
The document summarizes new features and improvements in Apache Cassandra 2.1, including enhanced performance, lightweight transactions, collection indexing, improved counters, incremental repair, and a new row cache. It also discusses Cassandra's use at eBay to power mission-critical features for hundreds of millions of users daily.
The document summarizes new features and improvements in Cassandra 2.0, including enhanced performance, scalability, and ease of use. Key updates include improved cursors for paging through large result sets, batching of prepared statements, simplified parameterized queries, additional CQL3 functionality, and lightweight transactions. Future plans outlined are secondary indexes on collections, more efficient repairs, custom data types, and aggregate functions in CQL. The document provides examples and explanations of new capabilities such as tracing, tombstone handling, and rapid read protection.
This document provides an introduction and overview of C# programming and SQL. It discusses key aspects of C#, its uses in Windows, web, and web service applications. It also covers SQL fundamentals like retrieving, inserting, updating, and deleting records. The document includes examples of SQL statements like INSERT, UPDATE, DELETE, and SELECT and highlights best practices like enclosing string values in single quotes and ending statements with semicolons.
Cassandra v3.0 at Rakuten meet-up on 12/2/2015 (datastaxjp)
Cassandra v3.0 sessions at the Cassandra Meet-up at Rakuten Tokyo, Fall 2015. New functionality (support of JSON, new storage engine, materialized views, UDF, UDA, etc.).
The document contains 5 questions related to C++ programs for matrix and array operations:
1) Finding the average of numbers in an array
2) Finding the smallest number in a 1D array
3) Adding two matrices
4) Finding the transpose of a matrix
5) Multiplying two matrices (for 2x2 matrices and matrices up to 10x10)
Each question provides the full C++ code to implement the operation and sample input/output.
This document discusses using Python to interact with SQLite databases. It provides examples of how to connect to an SQLite database, create tables, insert/update/delete records, and query the database. Key points covered include using the sqlite3 module to connect to a database, getting a cursor object to execute SQL statements, and various cursor methods like execute(), executemany(), fetchone(), fetchall(), commit(), and close(). Example code is given for common SQLite operations like creating a table, inserting records, updating records, deleting records, and selecting records.
C*ollege Credit: Data Modeling for Apache Cassandra (DataStax)
Cassandra stores data differently than traditional RDBMSs. It is these differences that allow for improvements in performance, availability and scalability. Aaron Morton, DataStax MVP for Apache Cassandra, will present the basics of the data model and outline the differences clearly. This webinar is 101 level and is suitable for people who are coming from a relational background and just starting to get into Apache Cassandra.
This document provides instructions for using Cassandra with Docker and examples of Cassandra queries for creating and interacting with keyspaces, tables, rows, columns and different data types including sets, lists, and maps. It demonstrates how to create and query tables with a single primary key or composite primary keys, add and modify columns, insert, update, select and delete data. The document concludes with an activity to design and implement an enrollment example using Cassandra.
Introduction to Data Modeling with Apache Cassandra (Luke Tillman)
Relational systems have always been built on the premise of modeling relationships. As you will see, static schema, one-to-one, many-to-many still have a place in Cassandra. From the familiar, we’ll go into the specific differences in Cassandra and tricks to make your application fast and resilient.
5. Column family (Table)
partition key → columns ...
101 → email: ab@c.to | name: otto | tel: 12345
103 → email: karl@a.b | name: karl | tel: 6789 | tel2: 12233
104 → name: linda
6. Table with standard PRIMARY KEY
CREATE TABLE messages (
msg_id timeuuid PRIMARY KEY,
author text,
body text
);
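Since msg_id is a timeuuid, a new row can be inserted by letting CQL generate the id with the now() function; a minimal sketch (the values are made up for illustration):
cqlsh> INSERT INTO messages (msg_id, author, body)
VALUES (now(), 'otto', 'Hello World!');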
9. “Wide-row” Table: Timeline
PRIMARY KEY = user_id (partition key) + msg_id (clustering column)
103 → msg_id: 9990 | author: otto | body: Hello World!
103 → msg_id: 9991 | author: linda | body: Hi @otto
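In CQL3 terms, such a timeline table could be declared as below; this is a sketch, and the column types are assumptions based on slide 6:
CREATE TABLE timeline (
  user_id int,      -- partition key: groups all messages of one user
  msg_id timeuuid,  -- clustering column: orders messages within the partition
  author text,
  body text,
  PRIMARY KEY (user_id, msg_id)
);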
10. Timeline Table is partitioned by user and locally clustered by msg
Node A: user 103 → msg_id 9990, 9994, ... | user 104 → msg_id 9881, 9999, ...
Node B: user 211 → msg_id 8090, 8555, 9678, ... | user 212 → msg_id 9877, ...
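Because one user's messages live in a single partition and are stored sorted by msg_id, a timeline page is a cheap single-partition read; for example (assuming the timeline table sketched above):
cqlsh> SELECT msg_id, author, body FROM timeline
WHERE user_id = 103
ORDER BY msg_id DESC LIMIT 10;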
11. Comparison: RDBMS vs. Cassandra
RDBMS Data Design
Users Table: user_id=101 | name=otto | email=o@t.to
Tweets Table: tweet_id=9990 | author_id=101 | body=Hello!
Followers Table: id=4321 | follows_id=104 | followed_id=101
Cassandra Data Design
Users Table: user_id=101 | name=otto | email=o@t.to
Tweets Table: tweet_id=9990 | author_id=101 | name=otto | body=Hello! (author name denormalized)
Follows Table: id=104 | follows_list=[101,117]
Followed Table: id=101 | followed_list=[104,109]
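A sketch of the Cassandra side in CQL3 (the types are assumptions; note the denormalized author name in tweets and the list collection replacing the follower join table):
CREATE TABLE tweets (
  tweet_id bigint PRIMARY KEY,
  author_id int,
  name text,  -- author name copied in so reads need no join
  body text
);
CREATE TABLE follows (
  id int PRIMARY KEY,
  follows_list list<int>  -- collection instead of a join table
);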
12. Exercise: Launch 1 Cassandra Node
Data Center DC1 (Germany): launch server A
A~$ start service cassandra
/etc/conf/cassandra.yaml:
seeds: A
rpc_address: 0.0.0.0
14. Intro: CLI, CQL2, CQL3
● CQL is “SQL for Cassandra”
● The Cassandra CLI and CQL2 are deprecated
● CQL3 is the default since Cassandra 1.2
$ cqlsh
● Pipe scripts into cqlsh
$ cat cql_script | cqlsh
● Source files inside cqlsh
cqlsh> SOURCE '~/cassandra_training/cql3/01_create_keyspaces';
15. CQL3
● Create a keyspace
● Create a column family
● Insert data
● Alter schema
● Update data
● Delete data
● Apply batch operation
● Read data
● Secondary Index
● Compound Primary Key
● Collections
● Consistency level
● Time-To-Live (TTL)
● Counter columns
● sstable2json utility tool
16. Create a SimpleStrategy keyspace
● Create a keyspace with SimpleStrategy and a "replication_factor" option with value "3" like this:
cqlsh> CREATE KEYSPACE <ksname>
WITH REPLICATION =
{'class':'SimpleStrategy',
'replication_factor':3};
17. Exercise: Create a SimpleStrategy keyspace
● Create a keyspace "simpledb" with SimpleStrategy and replication factor 1.
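One possible solution, following the pattern from slide 16:
cqlsh> CREATE KEYSPACE simpledb
WITH REPLICATION =
{'class':'SimpleStrategy',
'replication_factor':1};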
19. Create a NetworkTopologyStrategy keyspace
Create a keyspace with NetworkTopologyStrategy and strategy option "DC1" with a value of "1" and "DC2" with a value of "2" like this:
cqlsh> CREATE KEYSPACE <ksname>
WITH REPLICATION = {
'class':'NetworkTopologyStrategy',
'DC1':1,
'DC2':2
};
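The following exercises assume a keyspace named "twotter" already exists. The deck does not show its definition, so this is just one plausible way to create it for a single-node setup:
cqlsh> CREATE KEYSPACE twotter
WITH REPLICATION =
{'class':'SimpleStrategy',
'replication_factor':1};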
20. Exercise: Create Table “users”
● Connect to the "twotter" keyspace.
cqlsh> USE twotter;
● Create new column family (Table) named "users".
cqlsh:twotter> CREATE TABLE users (
id int PRIMARY KEY,
name text,
email text
);
cqlsh:twotter> DESCRIBE TABLES;
cqlsh:twotter> DESCRIBE TABLE users;
*we use int instead of uuid in the exercises for the sake of readability
21. Exercise: Create Table "messages"
● Create a new Table named "messages" with the attributes "posted_on", "user_id", "user_name", "body", and a primary key that consists of "user_id" and "posted_on".
cqlsh:twotter> CREATE TABLE messages (
posted_on bigint,
user_id int,
user_name text,
body text,
PRIMARY KEY (user_id, posted_on)
);
*we use bigint instead of timeuuid in the exercises for the sake of readability
22. Exercise: Insert data into Table "users" of keyspace "twotter"
cqlsh:twotter>
INSERT INTO users(id, name, email)
VALUES (101, 'otto', 'otto@abc.de');
cqlsh:twotter> ... insert more records ...
cqlsh> SOURCE '~/cassandra_training/cql3/03_insert';
23. Exercise: Insert message records
cqlsh:twotter>
INSERT INTO messages (user_id, posted_on,
user_name, body)
VALUES (101, 1384895178, 'otto', 'Hello!');
cqlsh:twotter> SELECT * FROM messages;
24. Read data
cqlsh:twotter> SELECT * FROM users;
 id  | email        | name
-----+--------------+-------
 105 | g@rd.de      | gerd
 104 | linda@abc.de | linda
 102 | null         | jane
 106 | heinz@xyz.de | heinz
 101 | otto@abc.de  | otto
 103 | null         | karl
25. Update data
cqlsh:twotter> UPDATE users
SET email = 'jane@smith.org'
WHERE id = 102;
 id  | email          | name
-----+----------------+-------
 105 | g@rd.de        | gerd
 104 | linda@abc.de   | linda
 102 | jane@smith.org | jane
 106 | heinz@xyz.de   | heinz
 101 | otto@abc.de    | otto
 103 | null           | karl
26. Delete data
● Delete columns
cqlsh:twotter> DELETE email
FROM users
WHERE id = 105;
● Delete an entire row
cqlsh:twotter> DELETE FROM users
WHERE id = 106;
27. Delete data
 id  | email          | name
-----+----------------+-------
 105 | null           | gerd
 104 | linda@abc.de   | linda
 102 | jane@smith.org | jane
 101 | otto@abc.de    | otto
 103 | null           | karl
28. Batch operation
● Execute multiple mutations with a single operation
cqlsh:twotter>
BEGIN BATCH
  INSERT INTO users(id, name, email)
    VALUES(107, 'john', 'j@doe.net');
  INSERT INTO users(id, name)
    VALUES(108, 'michael');
  UPDATE users
    SET email = 'michael@abc.de'
    WHERE id = 108;
  DELETE FROM users WHERE id = 105;
APPLY BATCH;
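Note that a logged batch like this guarantees the mutations are eventually applied together (atomicity); it is not a performance optimization.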
29. Batch operation
 id  | email          | name
-----+----------------+---------
 107 | j@doe.net      | john
 108 | michael@abc.de | michael
 104 | linda@abc.de   | linda
 102 | jane@smith.org | jane
 101 | otto@abc.de    | otto
 103 | null           | karl
30. Secondary Index
cqlsh:twotter>
CREATE INDEX name_index ON users(name);
cqlsh:twotter>
CREATE INDEX email_index ON users(email);
cqlsh:twotter> SELECT name, email FROM users
               WHERE name = 'otto';
cqlsh:twotter> SELECT name, email FROM users
               WHERE email = 'michael@abc.de';
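Not shown on the slide: an index can be removed again with DROP INDEX, e.g.:
cqlsh:twotter> DROP INDEX name_index;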
31. Alter Table Schema
cqlsh:twotter>
ALTER TABLE users ADD password text;
cqlsh:twotter>
ALTER TABLE users
ADD password_reset_token text;
* Thanks to Cassandra's flexible schema, a CQL ALTER completes much more quickly than an RDBMS SQL ALTER, which has to update every existing record.
32. Alter Table Schema
 id  | email          | name    | password | password_reset_token
-----+----------------+---------+----------+----------------------
 107 | j@doe.net      | john    |     null |                 null
 108 | michael@abc.de | michael |     null |                 null
 104 | linda@abc.de   | linda   |     null |                 null
 102 | jane@smith.org | jane    |     null |                 null
 101 | otto@abc.de    | otto    |     null |                 null
 103 | null           | karl    |     null |                 null
33. Collections - Set
CQL3 introduces collections for storing complex data structures, namely: set, list, and map. This is the CQL way of modelling one-to-many relationships.
1. Let us add a set of "hobbies" to the Table "users".
cqlsh:twotter> ALTER TABLE users ADD hobbies set<text>;
cqlsh:twotter> UPDATE users
               SET hobbies = hobbies + {'badminton','jazz'}
               WHERE id = 101;
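Elements can be removed from a set with the same syntax (a small sketch, not on the slide):
cqlsh:twotter> UPDATE users
               SET hobbies = hobbies - {'jazz'}
               WHERE id = 101;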
34. Collections - List
2. Now create a Table "followers" with a list of followers.
cqlsh:twotter> CREATE TABLE followers (
user_id int PRIMARY KEY,
followers list<text>);
cqlsh:twotter> INSERT INTO followers (
user_id, followers)
VALUES (101, ['willi','heinz']);
cqlsh:twotter> SELECT * FROM followers;
 user_id | followers
---------+--------------------
     101 | ['willi', 'heinz']
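Lists additionally support appending and prepending (a sketch, not on the slide):
cqlsh:twotter> UPDATE followers
               SET followers = followers + ['gerd']   -- append
               WHERE user_id = 101;
cqlsh:twotter> UPDATE followers
               SET followers = ['timo'] + followers   -- prepend
               WHERE user_id = 101;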
35. Collections - Map
3. Add a map to the Table "messages".
cqlsh:twotter>
ALTER TABLE messages
ADD comments map<text, text>;
cqlsh:twotter>
UPDATE messages
SET comments = comments + {'otto':'thx!'}
WHERE user_id = 103
AND posted_on = 1384895223;
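A single map entry can also be deleted by its key (a sketch, not on the slide):
cqlsh:twotter> DELETE comments['otto']
               FROM messages
               WHERE user_id = 103
               AND posted_on = 1384895223;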
36. Consistency Level
● Set the consistency level for all subsequent requests:
cqlsh:twotter> CONSISTENCY ONE;
cqlsh:twotter> CONSISTENCY QUORUM;
cqlsh:twotter> CONSISTENCY ALL;
● Show the current consistency level:
cqlsh:twotter> CONSISTENCY;
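As a worked example: with a replication_factor of 3 (as in the keyspace created earlier), QUORUM requires ⌊3/2⌋ + 1 = 2 replicas to respond, ALL requires all 3, and ONE requires just a single replica.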
38. Exercise: Consistency Level
● Set the consistency level to ANY and execute a SELECT statement.
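A minimal sketch of the two commands (any SELECT will do):
cqlsh:twotter> CONSISTENCY ANY;
cqlsh:twotter> SELECT * FROM users;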
Bad Request: ANY ConsistencyLevel is only supported for writes
39. Exercise: Time-To-Live (TTL)
● Insert a user record with a password reset
token with a 77 second TTL value.
cqlsh:twotter>
INSERT INTO users (id, name,
password_reset_token)
VALUES (109, 'timo', 'abc-xyz-123')
USING TTL 77;
40. Exercise: Time-To-Live (TTL)
● The INSERT statement above will expire the entire user record after 77 seconds, because the TTL applies to every column the statement writes.
● This is what we actually wanted to do:
cqlsh:twotter> INSERT INTO users (id, name)
VALUES(110, 'anna');
cqlsh:twotter> UPDATE users USING TTL 77
SET password_reset_token =
'abc-xyz-123'
WHERE id = 110;
41. Time-To-Live (TTL)
● Check the TTL expiration time in seconds.
cqlsh:twotter> SELECT TTL (password_reset_token)
               FROM users
               WHERE id = 110;
47. CQL v3.1.0
(New in Cassandra 2.0)
● IF keyword
● Lightweight transactions (“Compare-And-Set”)
● Triggers (experimental!!)
● CQL paging support
● Drop column support
● SELECT column aliases
● Conditional DDL
● Index enhancements
● cqlsh COPY
48. IF Keyword
cqlsh> DROP KEYSPACE twotter;
cqlsh> DROP KEYSPACE twotter;
Bad Request: Cannot drop non existing keyspace 'twotter'.
cqlsh> DROP KEYSPACE IF EXISTS twotter;
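The keyword also works in the other direction, for conditional creation (a sketch, not on the slide):
cqlsh> CREATE KEYSPACE IF NOT EXISTS twotter
       WITH REPLICATION =
       {'class':'SimpleStrategy', 'replication_factor':1};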
49. Lightweight Transactions
● Compare And Set (CAS)
● Example: without CAS, two users attempting to create the same unique user account in the same cluster could overwrite each other's work without either of them knowing about it.
Source: http://www.datastax.com/documentation/cassandra/2.0/webhelp/cassandra/dml/dml_about_transactions_c.html
50. Lightweight Transactions
1. Register a new user
cqlsh:twotter>
INSERT INTO users (id, name, email)
VALUES (110, 'franz', 'fr@nz.de')
IF NOT EXISTS;
2. Perform a CAS reset of Karl’s email.
cqlsh:twotter>
UPDATE users
SET email = 'franz@gmail.com'
WHERE id = 110
IF email = 'fr@nz.de';
52. Exercise: Lightweight Transactions
● Perform a failing CAS e-mail reset:
cqlsh:twotter>
UPDATE users
SET email = 'franz@ABC.de'
WHERE id = 110
IF email = 'fr@nz.de';
 [applied] | email
-----------+-----------------
     False | franz@gmail.com
53. Exercise: Lightweight Transactions
● Write a password reset method by using an
expiring password_reset_token column and
a CAS password update query.
54. Exercise: Lightweight Transactions
cqlsh:twotter>
UPDATE users USING TTL 77
SET password_reset_token = 'abc-xyz-123'
WHERE id = 110;
cqlsh:twotter>
UPDATE users
SET password = 'geheim!'
WHERE id = 110
IF password_reset_token = 'abc-xyz-123';
55. Create a Trigger (experimental feature)
● Triggers are written in Java.
● Triggers are currently an experimental feature
in Cassandra 2.0. Use with caution!
cqlsh:twotter>
CREATE TRIGGER myTrigger ON users
USING 'org.apache.cassandra.triggers.InvertedIndex';
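If your version supports it, the trigger can be removed again (a sketch, not on the slide):
cqlsh:twotter> DROP TRIGGER myTrigger ON users;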