This document provides an overview of data modeling in Cassandra, including:
- The components of the Cassandra schema including columns, column families, keyspaces, and clusters.
- Best practices for designing Cassandra data models including working backwards from application requirements and de-normalizing data.
- Features like compound primary keys, secondary indexes, collections, and partitioning that help optimize data models for Cassandra.
- Examples of different types of column families and modeling patterns for user profiles, activity logs, and other common use cases.
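To make the compound-primary-key and query-driven ideas above concrete, here is a minimal CQL sketch (table and column names are illustrative, not taken from the slides):

```sql
-- A user activity log, modeled query-first: "get a user's most recent events".
-- user_id is the partition key; event_time is a clustering column, so one
-- user's events are stored together on disk, sorted newest-first.
CREATE TABLE user_activity (
    user_id    uuid,
    event_time timestamp,
    event_type text,
    details    text,
    PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- The single query the table was designed to answer:
SELECT event_type, details
FROM user_activity
WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204
LIMIT 10;
```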
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical. Real-world use cases and associated data models.
Cassandra Community Webinar | Become a Super ModelerDataStax
Sure, you can do some time series modeling. Maybe some user profiles. What's going to make you a super modeler? Let's take a look at some great techniques taken from real-world applications where we exploit the Cassandra big table model to its fullest advantage. We'll cover some of the new features in CQL 3 as well as some tried-and-true methods. In particular, we will look at fast indexing techniques to get data faster at scale. You'll be jet-setting through your data like a true super modeler in no time.
Speaker: Patrick McFadin, Principal Solutions Architect at DataStax
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin (DataStax Academy)
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game-changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see that by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C... (DataStax)
The 3.0 storage engine rewrite is the biggest and most exciting change to ever happen in Apache Cassandra. The new storage engine can efficiently store and read data from disk using the same concepts present in the CQL 3 language. This has delivered large space savings and creates new performance characteristics.
In this talk Aaron Morton, Co-Founder at The Last Pickle and Apache Cassandra Committer, will discuss the 3.0 storage engine, its layout, and its performance characteristics.
About the Speaker
Aaron Morton CEO, The Last Pickle
Aaron Morton is the Co-Founder & CEO at The Last Pickle (thelastpickle.com), a professional services company that works with clients to deliver and improve Apache Cassandra based solutions. He's based in New Zealand and is an Apache Cassandra Committer and a DataStax MVP for Apache Cassandra.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
Cassandra 3.0 - JSON at scale - StampedeCon 2015 (StampedeCon)
This session will explore the new features in Cassandra 3.0, starting with JSON support. Cassandra now allows storing JSON directly to Cassandra rows and vice versa, making it trivial to deploy Cassandra as a component in modern service-oriented architectures.
Cassandra 3.0 also delivers other enhancements to developer productivity: user defined functions let developers deploy custom application logic server side with any language conforming to the Java scripting API, including JavaScript. Global indexes allow indexed queries to scale linearly with the size of the cluster, a first for open-source NoSQL databases.
Finally, we will cover the performance improvements in Cassandra 3.0 as well.
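The JSON support described above looks roughly like this in CQL (a minimal sketch; the `users` table and its columns are illustrative, not from the talk):

```sql
CREATE TABLE users (
    id    uuid PRIMARY KEY,
    name  text,
    email text
);

-- Insert a whole row from a JSON document:
INSERT INTO users JSON
  '{"id": "62c36092-82a1-3a00-93d1-46196ee77204",
    "name": "Alice",
    "email": "alice@example.com"}';

-- Read a row back as a JSON document:
SELECT JSON * FROM users
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;
```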
Introduction to Data Modeling with Apache Cassandra (Patrick McFadin)
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is a tour of how to get started, translating from the knowledge you already have to the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. The slides explain the details of data modeling in Cassandra.
Introduction to CQL and Data Modeling with Apache Cassandra (Johnny Miller)
Cassandra Meetup, Helsinki February 2014. Introduction to CQL and Data Modeling with Apache Cassandra. You can find the video here: http://bit.ly/jpm_004
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher)
This presentation describes how you can use Spark as an ETL tool to get data from a relational database into Cassandra. I go through the concept in general and then talk about some specific issues you might run into and how to fix them.
Time series with Apache Cassandra - Long version (Patrick McFadin)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
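A typical shape for such a time-series table, sketched in CQL (the weather-station example and column names are assumptions, not taken from the talk):

```sql
-- One station's readings per partition, stored in time order,
-- so a time-range read is a single sequential slice of one partition.
CREATE TABLE temperature (
    station_id text,
    event_time timestamp,
    value      double,
    PRIMARY KEY (station_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Range scan within one partition:
SELECT event_time, value
FROM temperature
WHERE station_id = 'station-1'
  AND event_time >= '2016-01-01'
  AND event_time <  '2016-02-01';
```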
Cassandra Community Webinar: Back to Basics with CQL3 (DataStax)
Cassandra is a distributed, massively scalable, fault tolerant, columnar data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data, and the area of big data that Cassandra is designed to fill. We will cover Cassandra's unique, every-node-the-same architecture. We will reveal Cassandra's internal data structure and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard protocol: CQL (Cassandra Query Language).
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 (DataStax)
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the data model of the app will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages, or events. The growth in the number of rows in this table is not linear as the traffic to the app increases; it's typically exponential.
Over time, the state table will come to represent the bulk of the data volume in Postgres (think terabytes) and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
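A hedged sketch of what such an events table might look like once moved to Cassandra (the column names and the per-user-per-day partitioning are illustrative assumptions, not Heroku's actual schema):

```sql
-- Events partitioned by (user, day), so no single partition grows without
-- bound as traffic increases; timeuuid gives unique, time-ordered rows.
CREATE TABLE events (
    user_id    uuid,
    day        date,
    event_time timeuuid,
    payload    text,
    PRIMARY KEY ((user_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```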
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis, but the common thread throughout his career is data. From data analysis to building data warehouses and ultimately data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling (DataStax Academy)
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game-changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see that by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
I don't think it's hyperbole when I say that Facebook, Instagram, Twitter & Netflix now define the dimensions of our social & entertainment universe. But what kind of technology engines purr under the hoods of these social media machines?
Here is a tech student's perspective on making the paradigm shift to "Big Data" using innovative models: alphabet blocks, nesting dolls, & LEGOs!
Get info on:
- What is Cassandra (C*)?
- Installing C* Community Version on Amazon Web Services EC2
- Data Modelling & Database Design in C* using CQL3
- Industry Use Cases
Symantec: Cassandra Data Modelling techniques in action (DataStax Academy)
Our product presents an aggregated view of metadata collected for billions of objects (files, emails, SharePoint objects, etc.). We used Cassandra to store those billions of objects along with an aggregated view of that metadata. Customers can analyse the corpus of data in real time by searching in a completely flexible way, i.e. be able to get summary aggregates for many billions of objects, and then be able to further drill down to items by filtering using various facets of the metadata. We achieve this using a combination of Cassandra and ElasticSearch. This presentation will talk about various data modelling techniques we use to aggregate and then further summarise all that metadata and be able to search the summary in real time.
How We Use MongoDB in Our Advertising System (MongoDB)
This talk will go over why we chose to use MongoDB for storing billions of documents with only 3 replica set nodes, and why we chose MongoDB for our report data store instead of MySQL.
This course is designed to be a “fast start” on the basics of data modeling with Cassandra. We will cover some basic Administration information upfront that is important to understand as you choose your data model. It is still important to take a proper Admin class if you are responsible for a production instance. This course focuses on CQL3, but Thrift shall not be ignored.
This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Cassandra Community Webinar | The World's Next Top Data Model (DataStax)
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game-changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see that by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016 (DataStax)
Large partitions shall no longer be a nightmare. That is the goal of CASSANDRA-11206.
100 MB and 100,000 cells per partition are the recommended limits for a single partition in Cassandra versions up to 3.5. Exceeding these limits can cause a lot of trouble: repairs and compactions can fail, and reads can cause out-of-memory failures.
This talk provides a deep dive into the reasons for the previous limitations, why exceeding them caused trouble, how the improvements in Cassandra 3.6 help with big partitions, and why you should not blindly let your partitions get huge.
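One common way to keep partitions bounded, regardless of Cassandra version, is to fold a time bucket into the partition key. A small Python sketch of the idea (the per-day bucketing scheme and the sensor example are illustrative assumptions, not from the talk):

```python
from datetime import datetime

def partition_bucket(sensor_id: str, ts: datetime) -> str:
    """Derive a bounded partition key: one partition per sensor per day.

    Instead of letting a single sensor's partition grow forever, the write
    path computes (sensor_id, day) and uses that as the composite partition
    key, capping each partition at one day's worth of rows.
    """
    return f"{sensor_id}:{ts.strftime('%Y-%m-%d')}"

# All readings from the same day land in the same partition...
assert partition_bucket("s1", datetime(2016, 6, 1, 9, 30)) == "s1:2016-06-01"
# ...and the next day starts a fresh partition.
assert partition_bucket("s1", datetime(2016, 6, 2, 0, 0)) == "s1:2016-06-02"
```

In CQL this corresponds to declaring `PRIMARY KEY ((sensor_id, day), event_time)` so the bucket column becomes part of the partition key.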
About the Speaker
Robert Stupp Solution Architect, DataStax
Robert works as a Solutions Architect at DataStax and is also a Committer to Apache Cassandra. Before joining DataStax he worked with his customers to architect and build distributed systems using Cassandra, and he has long experience building distributed backend systems, mostly in Java.
Primary and Clustering Keys should be among the very first things you learn about when modeling Cassandra data. Most people coming from a relational background automatically think, "Yeah, I know what a Primary Key is", and gloss right over it. Because of this, there always seems to be a lot of confusion around the topic of Primary Keys in Cassandra. This presentation will demystify that confusion. I will cover what the different types of Keys are, how they can be used, what their purpose is, and how they affect your queries.
For this presentation, I will be using CrossFit gym locations as my subject matter. I will explain the differences between Primary Keys, Compound Keys, Clustering Keys, & Composite Keys. I will also show how the data behind each type differs as stored on disk. Lastly, I will show what queries each type of key will support.
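Using the CrossFit-gym subject matter, the key variations the talk covers look roughly like this in CQL (schemas are illustrative sketches, not the speaker's actual tables):

```sql
-- Simple primary key: one column, which is also the partition key.
CREATE TABLE gyms_by_id (
    gym_id uuid PRIMARY KEY,
    name   text
);

-- Compound key: a partition key (city) plus a clustering column (name),
-- so all gyms in a city live in one partition, sorted by name.
CREATE TABLE gyms_by_city (
    city   text,
    name   text,
    gym_id uuid,
    PRIMARY KEY (city, name)
);

-- Composite partition key: (country, city) together choose the partition,
-- with name clustering the rows inside it.
CREATE TABLE gyms_by_country_city (
    country text,
    city    text,
    name    text,
    PRIMARY KEY ((country, city), name)
);
```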
About the Speaker
Adam Hutson Data Architect, DataScale
Adam is Data Architect for DataScale, Inc. He is a seasoned data professional with experience designing & developing large-scale, high-volume database systems. Adam previously spent four years as Senior Data Engineer for Expedia building a distributed Hotel Search using Cassandra 1.1 in AWS. Having worked with Cassandra since version 0.8, he was early to recognize the value Cassandra adds to Enterprise data storage. Adam is also a DataStax Certified Cassandra Developer.
Solr & Cassandra: Searching Cassandra with DataStax Enterprise (DataStax Academy)
Wait! Back away from the Cassandra secondary index. It’s ok for some use cases, but it’s not an easy button. “But I need to search through a bunch of columns to look for the data… and I can’t model that in C*, even after watching all of Patrick McFadin's data modeling videos. What do I do?” The answer, dear developer, is in DSE Search. With its easy Solr API and Lucene indexes (and fault tolerance) you can search data stored in your Cassandra database to your heart’s content. Take my hand. I will show you how.
Trivadis TechEvent 2016: Big Data Cassandra, why do I need that? by Jan Ott (Trivadis)
First steps of an Oracle expert in the Big Data world. Everyone speaks about Big Data. But what does it mean? This talk focuses on one animal of the Big Data zoo - Cassandra - and answers the following questions:
- Why another database?
- There is Impala and Spark. Why would I need Cassandra?
- New database - do I need to learn a new language?
- How do I get the data in?
- Can I use SQL?
- Is it part of a distribution, for example Cloudera?
Demos will explain the theory.
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987
Apache Cassandra, part 1 – principles, data model (Andrey Lomakin)
The aim of this presentation is to provide enough information for an enterprise architect to decide whether Cassandra will be the project data store. The presentation describes each nuance of Cassandra architecture and ways to design data and work with it.
Introduction to Data Modeling with Apache Cassandra (Luke Tillman)
Relational systems have always been built on the premise of modeling relationships. As you will see, static schema, one-to-one, many-to-many still have a place in Cassandra. From the familiar, we’ll go into the specific differences in Cassandra and tricks to make your application fast and resilient.
Introduction to DataStax Enterprise Graph Database (DataStax Academy)
DataStax Enterprise (DSE) Graph is built to manage, analyze, and search highly connected data. DSE Graph, built on Apache Cassandra, delivers continuous uptime along with predictable performance and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra (DataStax Academy)
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data Modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
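The query-driven methodology can be illustrated in CQL: start from the query the application needs, then design the table backwards from it (the ratings schema below is an illustrative sketch, not the demoed MovieLens tool's actual output):

```sql
-- Query to support: "latest ratings for a given movie".
-- The table is derived from that query, not from a normalized entity model.
CREATE TABLE ratings_by_movie (
    movie_id int,
    rated_at timestamp,
    user_id  int,
    rating   float,
    PRIMARY KEY (movie_id, rated_at, user_id)
) WITH CLUSTERING ORDER BY (rated_at DESC, user_id ASC);

SELECT user_id, rating
FROM ratings_by_movie
WHERE movie_id = 603
LIMIT 20;
```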
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into in practice.
In the second part of this talk, we'll dive into how best to effectively use the DataStax Java drivers. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly part 1 (DataStax Academy)
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2 (DataStax Academy)
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
5. System and hardware failures can and do happen
Peer-to-peer, distributed system
All nodes identical – Gossip status and state
Data partitioned among nodes
Data replication ensures fault tolerance
Read/write anywhere design
Cassandra Architecture
(Diagram: a ring of eight nodes owning tokens 10–80)
6. Schema is a row-oriented, column structure
A keyspace is akin to a database
Tables are more flexible/dynamic
A row in a table is indexed by its key
Other columns may be indexed as well
Keyspaces and Tables
(Diagram: the cluster ring holding a Keyspace, which contains a Table with columns Key, Name, SSN, DOB)
7. Writing Data
Data is partitioned by token
Nodes own a token and the range back to the previous token
Nodes may own multiple tokens (RF, vnodes)
Client connects to any node
Node takes on the role of coordinator during the transaction
hash(key) => token, rd/wr node(s)
Cluster owns all data – typically, each DC does, too
Nodes own slices of data – one, or more
Replicas can be rack-aware
(Diagram: a client sends "Write - key"; hash(key)=43 routes the write to the node owning that token range)
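The token-ring routing above can be sketched in a few lines. This is an illustration only: the tokens 10–80 and node names are made up to match the slide's diagram, and Cassandra really uses Murmur3/MD5 hashes, not this toy lookup.

```python
from bisect import bisect_left

# Hypothetical ring matching the slide's diagram: eight nodes, tokens 10-80.
RING = {10: "node-A", 20: "node-B", 30: "node-C", 40: "node-D",
        50: "node-E", 60: "node-F", 70: "node-G", 80: "node-H"}

def owner(token):
    """Each node owns its token and the range back to the previous token,
    so a hashed key belongs to the first node token >= the key's token
    (wrapping around the ring)."""
    tokens = sorted(RING)
    i = bisect_left(tokens, token)
    return RING[tokens[i % len(tokens)]]

# hash(key)=43 falls between tokens 40 and 50, so the node at 50 owns it.
print(owner(43))  # node-E
```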
8. Replication and DCs
(Diagram: a client sends "Write - key" with hash(key)=43; the write is replicated across two rings, Data Center 1 with rf=3 and Data Center 2 with rf=3)
9. Tunable Consistency
Choose between strong and eventual consistency (one to all nodes acknowledging) depending upon the need
Can be configured on a per-operation basis, and for both reads and writes
Handles multi-data center transparently
CL(write) + CL(read) > RF yields strong consistency
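The consistency rule above (write consistency level plus read consistency level greater than the replication factor) reduces to a one-line check. The function name is illustrative:

```python
# Sketch of the tunable-consistency rule: CL(write) + CL(read) > RF
# guarantees every read quorum overlaps at least one replica that
# acknowledged the latest write.
def is_strongly_consistent(cl_write: int, cl_read: int, rf: int) -> bool:
    """True when read and write replica sets must intersect."""
    return cl_write + cl_read > rf

# QUORUM writes + QUORUM reads with RF=3: 2 + 2 > 3 -> strongly consistent
print(is_strongly_consistent(2, 2, 3))  # True
# ONE + ONE with RF=3: 1 + 1 > 3 fails -> eventual consistency
print(is_strongly_consistent(1, 1, 3))  # False
```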
10. CQL Language
Familiar SQL syntax without joins, group by, etc.
Create objects via DDL (e.g. CREATE...)
Core DML commands supported (e.g. INSERT, UPDATE, DELETE)
Query data with SELECT
11. Read / Write Path
(Diagram. Write path: a write goes to the commit log (one per node) on disk and to an in-heap memtable (one per column family) holding rows of the form key; col1=val1,..coln=valn; memtables are flushed to SSTables (many per column family) on disk. Read path: a read consults the bloom filters, key cache, row cache, and OS cache before reading SSTables from disk.)
14. Components of the Column
The column is the fundamental data type in Cassandra and includes:
• Column name
• Column value
• Timestamp
• TTL (Optional)
18. Column Names and Values
• The data type for a column value is called a validator.
• The data type for a column name is called a comparator.
• Cassandra validates the data type of row keys.
• Columns are sorted, and stored in sorted order on disk, so you have to specify a comparator for columns. This can be reversed… more on this later.
23. Counters
• Allows for addition / subtraction
• 64-bit value
• No timestamp
update recommendation_summary
set num_products = num_products + 1
where recommendation = 'highly recommend';
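The counter semantics above can be sketched with per-replica shards. This is an illustration of the idea (each replica tracks its own increments and the value is the merged sum), not Cassandra's actual implementation; the names are made up:

```python
from collections import defaultdict

# Each replica keeps its own shard of increments; the counter's value
# is the sum across shards, which is why no timestamp is needed.
shards = defaultdict(int)  # replica name -> local increment total

def increment(replica, delta=1):
    shards[replica] += delta  # the client only sends a delta

def counter_value():
    return sum(shards.values())  # merged view across replicas

increment("replica-1")
increment("replica-2")
increment("replica-1", 3)
print(counter_value())  # 5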
24. Cassandra feature - Collections
• Collections give you three types:
- Set
- List
- Map
• Each allows for dynamic updates
• Fully supported in CQL 3
• Requires serialization
CREATE TABLE collections_example (
id int PRIMARY KEY,
set_example set<text>,
list_example list<text>,
map_example map<int,text>
);
25. Cassandra Collections - Set
• Set is sorted by CQL type comparator
INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});
(Annotated: set_example is the collection name, set is the collection type, text is the CQL type)
26. Cassandra Collections - Set Operations
• Adding an element to the set:
UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;
• After adding this element, it will sort to the beginning:
UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;
• Removing an element from the set:
UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;
27. Cassandra Collections - List
• Ordered by insertion
INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);
(Annotated: list_example is the collection name, list is the collection type, text is the CQL type)
28. Cassandra Collections - List Operations
• Adding an element to the end of a list:
UPDATE collections_example
SET list_example = list_example + ['3-three'] WHERE id = 1;
• Adding an element to the beginning of a list:
UPDATE collections_example
SET list_example = ['0-zero'] + list_example WHERE id = 1;
• Deleting an element from a list:
UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
29. Cassandra Collections - Map
• Key and value
• Key is sorted by CQL type comparator
INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });
(Annotated: map_example is the collection name, map is the collection type, int is the key CQL type, text is the value CQL type)
30. Cassandra Collections - Map Operations
• Add an element to the map:
UPDATE collections_example
SET map_example[3] = 'three' WHERE id = 1;
• Update an existing element in the map:
UPDATE collections_example
SET map_example[3] = 'tres' WHERE id = 1;
• Delete an element in the map:
DELETE map_example[3]
FROM collections_example WHERE id = 1;
31. User model
• Enhance a simple user table
• Great for static + some dynamic
• Takes advantage of row level isolation
CREATE TABLE user_with_location (
username text PRIMARY KEY,
first_name text,
last_name text,
address1 text,
city text,
postal_code text,
last_login timestamp,
location_by_date map<timeuuid,text>
);
32. User profile - Operations
• Adding new login locations to the map (appending with +, so existing entries are kept):
UPDATE user_with_location
SET last_login = now(), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
• Adding new login locations to the map + TTL!
UPDATE user_with_location
USING TTL 2592000 // 30 Days
SET last_login = now(), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
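The TTL used above is the number of seconds in 30 days; a quick check of the arithmetic:

```python
# Verify that USING TTL 2592000 really is a 30-day expiry.
SECONDS_PER_DAY = 24 * 60 * 60
ttl_30_days = 30 * SECONDS_PER_DAY
print(ttl_30_days)  # 2592000
```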
34. Column Families / Tables
• Similar to tables
• Groupings of Rows
• Tunable Consistency
• De-Normalization
• To avoid I/O
• Simplify the Read Path
• Static or Dynamic
35. Static Column Families
• Are the most similar to a relational table
• Most rows have the same column names
• Columns in rows can be different
Row Key  | Columns
jbellis  | Name: Jonathan | Email: jb@ds.com | Address: 123 main | State: TX
dhutch   | Name: Daria | Email: dh@ds.com | Address: 45 2nd St. | State: CA
egilmore | Name: eric | Email: eg@ds.com
36. Dynamic Column Families
• Also called “wide rows”
• Structured so a query into the row will answer a question
Row Key  | Columns
jbellis  | dhutch, egilmore, datastax, mzcassie
dhutch   | egilmore
egilmore | datastax, mzcassie
46. Partitioners – paging through all rows
• SELECT * FROM test WHERE token(k) > token(42);
CQL 3 forbids a query such as WHERE k > 42 unless the partitioner in use is ordered. Even when using the random partitioner or the murmur3 partitioner, it can sometimes be useful to page through all rows. For this purpose, CQL 3 includes the token function. The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. The token function makes it possible to page through unordered partitioner results: the comparison above is made on tokens, not on raw key values (token(k) > token(42), not k > 42).
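Token-order paging can be sketched without a cluster. The hash function and data below are illustrative stand-ins, not Murmur3; the point is that rows come back in token order, not key order, yet every row is visited exactly once:

```python
# Keys hash to tokens in no key order; we page by token, the way
# SELECT ... WHERE token(k) > token(last_k) pages through a cluster.
def toy_token(key: int) -> int:
    return (key * 2654435761) % 2**32  # a toy hash, not Murmur3

def page_all_rows(keys, page_size):
    """Yield pages of keys ordered by token."""
    by_token = sorted(keys, key=toy_token)
    for i in range(0, len(by_token), page_size):
        yield by_token[i:i + page_size]

pages = list(page_all_rows(range(10), page_size=4))
# All 10 rows are visited exactly once, in token order rather than key order.
print(sum(len(p) for p in pages))  # 10
```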
47. Primary Index Overview
• Index for all of your row keys
• Per-node index
• Partitioner + placement manages which node
• Keys are just kept in ordered buckets
• Partitioner determines how a key maps to a token
48. Natural Keys
Examples:
• An email address
• A user id
• Easy to make the relationship
• Less de-normalization
• More risk of an 'UPSERT'
• Changing the key requires a bulk copy operation
49. Surrogate Keys
• Example: UUID
• Independently generated
• Allows you to store multiple versions of a user
• Relationship is now indirect
• Changing the key requires the creation of a new row, or column
54. Secondary Indexes
• Need for an easy way to do limited ad-hoc queries
• Supports multiple per row
• Single clause can support multiple selectors
• Implemented as a hash map, not a B-Tree
• Low cardinality ONLY
59. The Basics of C* Modeling
• Work backwards
• What does your application do?
• What are the access patterns?
• Now design your data model
60. Procedures
Consider use case requirements
• What data?
• Ordering?
• Filtering?
• Grouping?
• Events in chronological order?
• Does the data expire?
61. De-Normalization
• De-Normalize in C*
• Forget third normal form
• Data Duplication is OK
• Concerns
• Resource contention
• Latency
• Client-side joins
• Avoid them in your C* code
63. Table Design
• Ideally each query will be one row
• Compared to other resources, disk space is cheap
• Reduce disk seeks
• Reduce network traffic
64. Workload Preference
• High level of de-normalization means you may have to write the same data many times
• Cassandra handles large numbers of writes well
• If given the choice:
• Read once
• Write many
65. Concurrent Writes
• A row is always referenced by a key
• Keys are just bytes
• They must be unique within a CF
• Primary keys are unique, but Cassandra will not enforce uniqueness
• If you are not careful you will accidentally UPSERT the whole thing
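The accidental-upsert behaviour works like assigning into a dictionary: a second write to the same key silently overwrites the first, with no error. A minimal sketch (data values are illustrative):

```python
# In C*, writing a row with an existing key replaces it silently,
# because uniqueness is not enforced at write time - just like a dict.
users = {}
users['pmcfadin'] = {'firstname': 'Patrick'}
users['pmcfadin'] = {'firstname': 'Paul'}  # second writer wins, no error
print(users['pmcfadin']['firstname'])  # Paul
```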
66. The 5 C* Commandments for Developers
1. Start with queries. Don't data model for data modeling's sake.
2. It's ok to duplicate data.
3. C* is designed to read and write sequentially. Great for rotational disk, awesome for SSDs, awful for NAS. If your disk has an Ethernet port, it's not good for C*.
4. Use secondary indexes strategically and cautiously.
5. Embrace wide rows and de-normalization.
68. Shopping cart use case
• Store shopping cart data reliably
• Minimize (or eliminate) downtime. Multi-DC
• Scale for the “Cyber Monday” problem
The bad:
• Every minute off-line is lost $$
• Online shoppers want speed!
69. Shopping cart data model
• Each customer can have one or more shopping carts
• De-normalize data for fast access
• Shopping cart == one partition (row level isolation)
• Each item a new column
70. Shopping cart data model
CREATE TABLE user (
username varchar,
firstname varchar,
lastname varchar,
shopping_carts set<varchar>,
PRIMARY KEY (username)
);
CREATE TABLE shopping_cart (
username varchar,
cart_name text,
item_id int,
item_name varchar,
description varchar,
price float,
item_detail map<varchar,varchar>,
PRIMARY KEY ((username,cart_name),item_id) -- partition row key for one user's cart
);
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99,
{'Related':'Timex sports watch',
'Volume Discount':'10'}); -- creates the partition row key
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00,
{'Related':'Timex foot pod',
'Volume Discount':'25'});
One partition (storage row) of data per cart; item details are flexible for any use.
72. User activity use case
• React to user input in real time
• Support for multiple application pods
• Scale for speed
The bad:
• Losing interactions is costly
• Waiting for batch (Hadoop) is too long
73. User activity data model
• Interaction points stored per user in short table
• Long term interaction stored in similar table with date partition
• Process long term later using batch
• Reverse time series to get last N items
74. User activity data model
CREATE TABLE user_activity (
username varchar,
interaction_time timeuuid,
activity_code varchar,
detail varchar,
PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC); -- reverse order based on timestamp
CREATE TABLE user_activity_history (
username varchar,
interaction_date varchar,
interaction_time timeuuid,
activity_code varchar,
detail varchar,
PRIMARY KEY ((username,interaction_date),interaction_time)
);
INSERT INTO user_activity
(username,interaction_time,activity_code,detail)
VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')
USING TTL 2592000; -- expire after 30 days
INSERT INTO user_activity_history
(username,interaction_date,interaction_time,activity_code,detail)
VALUES ('pmcfadin','20130605',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login');
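The "reverse time series to get last N items" idea can be sketched in memory: with a DESC clustering order, a plain LIMIT N query returns the newest rows first. The events below are illustrative:

```python
# Simulate a DESC-clustered partition: newest-first slice == LIMIT n.
events = [(1, 'Normal login'), (5, 'Opened cart'), (9, 'Entered Jewelry')]

def last_n(partition, n):
    """Newest-first slice, like SELECT ... LIMIT n on a DESC-clustered table."""
    return sorted(partition, reverse=True)[:n]

print(last_n(events, 2))  # [(9, 'Entered Jewelry'), (5, 'Opened cart')]
```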
75. Data model usage
username | interaction_time | detail | activity_code
----------+--------------------------------------+------------------------------------------+------------------
pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry | 301
pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want | 205
pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want | 201
pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login | 100
select * from user_activity limit 5;
Maybe put a sale item for flowers too?
77. Log collection use case
• Collect log data at high speed
• Cassandra near where logs are generated. Multi-datacenter
• Dice data for various uses. Dashboard. Lookup. Etc.
The bad:
• The scale needed for RDBMS is cost prohibitive
• Batch analysis of logs is too late for some use cases
78. Log collection data model
• Collect and fan out data to various tables
• Tables for lookup based on source and time
• Tables for dashboard with aggregation and summation
79. Log collection data model
CREATE TABLE log_lookup (
source varchar,
date_to_minute varchar,
timestamp timeuuid,
raw_log blob,
PRIMARY KEY ((source,date_to_minute),timestamp)
);
CREATE TABLE login_success (
source varchar,
date_to_minute varchar,
successful_logins counter,
PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
CREATE TABLE login_failure (
source varchar,
date_to_minute varchar,
failed_logins counter,
PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
Consider storing raw logs as GZIP.
82. Form versioning use case
• Store every possible version efficiently
• Scale to any number of users
• Commit/rollback functionality on a form
The bad:
• In RDBMS, many relations that need complicated joins
• Needs to be in cloud and local data center
83. Form version data model
• Each user has a form
• Each form needs versioning
• Separate table to store live version
• Exclusive lock on a form
84. Form version data model
CREATE TABLE working_version (
username varchar,
form_id int,
version_number int,
locked_by varchar,
form_attributes map<varchar,varchar>,
PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
-- 1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
-- 2. Lock for one user
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
-- 3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
86. The race is on
T0, Process 1: SELECT firstName, lastName FROM users WHERE username = 'pmcfadin'; returns (0 rows). Got nothing! Good to go!
T1, Process 2: the same SELECT also returns (0 rows). Got nothing! Good to go!
T2, Process 1: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Patrick','McFadin', ['patrick@datastax.com'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00');
T3, Process 2: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Paul','McFadin', ['paul@oracle.com'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00'); This one wins.
93. LWT Fine Print
• Lightweight Transactions solve edge conditions
• They have a latency cost
• Be aware
• Load test
• Consider in your data model
94. Indexing with Tables
• Indexing expresses application intent
• Fast access to specific queries
• Secondary indexes != relational indexes
• Use information you have. No pre-reads.
Goals:
1. Create row key for speed
2. Use wide rows for efficiency
95. Keyword index
• Use a word as a key
• Columns are the occurrence
• Ex: Index of tag words about videos
CREATE TABLE tag_index (
tag varchar,
videoid uuid,
timestamp timestamp,
PRIMARY KEY (tag, videoid)
);
(One wide row per tag: tag -> VideoId1 .. VideoIdN. Fast and efficient.)
96. Partial word index
• Where row size will be large
• Take one part for the key, rest for column names
CREATE TABLE email_index (
domain varchar,
user varchar,
username varchar,
PRIMARY KEY (domain, user)
);
INSERT INTO email_index (domain, user, username)
VALUES ('@relational.com','tcodd', 'tcodd');
User: tcodd, Email: tcodd@relational.com
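Building the partial-word index key is just splitting the value. A sketch of deriving the (domain, user) pair from an email; the helper name is illustrative:

```python
# Split an email on '@': the domain becomes the partition key and the
# local part becomes the clustering column, matching the email_index table.
def email_index_key(email: str):
    user, _, domain = email.partition('@')
    return ('@' + domain, user)  # matches the '@relational.com' key style

print(email_index_key('tcodd@relational.com'))  # ('@relational.com', 'tcodd')
```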
97. Partial word index
• Create partitions + partial indexes
CREATE TABLE product_index (
store int,
part_number0_3 int,
part_number4_9 int,
count int,
PRIMARY KEY ((store,part_number0_3), part_number4_9) -- compound row key!
);
INSERT INTO product_index (store,part_number0_3,part_number4_9,count)
VALUES (8675309,7079,48575,3);
SELECT count
FROM product_index
WHERE store = 8675309
AND part_number0_3 = 7079
AND part_number4_9 = 48575;
• Store #8675309 has 3 of part# 7079748575 (fast and efficient!)
98. Bit map index – supports ad hoc queries
• Multiple parts to a key
• Create a truth table of the different combinations
• Inserts == the number of combinations
- 3 fields? 7 options (not going to use the null choice)
- 4 fields? 15 options
- 2^n – 1, where n = # of dynamic query fields
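The 2^n – 1 count can be checked by enumerating every non-empty subset of the query fields; the helper name below is illustrative:

```python
from itertools import combinations

# One bit map index insert per non-empty combination of query fields.
def index_combinations(fields):
    """All non-empty subsets of the queryable fields."""
    combos = []
    for r in range(1, len(fields) + 1):
        combos.extend(combinations(fields, r))
    return combos

print(len(index_combinations(['make', 'model', 'color'])))          # 7
print(len(index_combinations(['make', 'model', 'color', 'year'])))  # 15
```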
99. Bit map index
• Find a car in a lot by variable combinations
Make | Model | Color | Combination
     |       |   x   | Color
     |   x   |       | Model
     |   x   |   x   | Model+Color
  x  |       |       | Make
  x  |       |   x   | Make+Color
  x  |   x   |       | Make+Model
  x  |   x   |   x   | Make+Model+Color
100. Bit map index - Table create
• Make a table with three different key combos
CREATE TABLE car_location_index (
make varchar,
model varchar,
color varchar,
vehical_id int,
lot_id int,
PRIMARY KEY ((make,model,color),vehical_id)
);
Compound row key with three different options
101. Bit map index - Adding records
• Pre-optimize for 7 possible questions on insert
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','','Blue',1234,8675309);
102. Bit map index - Selecting records
• Different combinations now possible
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
8765 | 5551212
You can define the validator and comparator when you create your table schema (which is recommended), but Cassandra does not require it. Internally, Cassandra stores column names and values as hex byte arrays (BytesType). This is the default client encoding used if data types are not defined in the table schema (or if not specified by the client request).
Cassandra comes with the following built-in data types, which can be used as both validators (row key and column value data types) or comparators (column name data types). One exception is CounterColumnType, which is only allowed as a column value (not allowed for row keys or column names).
SELECT WRITETIME (title) FROM songs WHERE id = 8a172618-b121-4136-bb10-f665cfc469eb;
 writetime(title)
------------------
 1353890782373000
A counter is a special kind of column used to store a number that incrementally counts the occurrences of a particular event or process. For example, you might use a counter column to count the number of times a page is viewed. Counter column tables must use the counter data type, and counters may only be stored in dedicated tables.
After a counter is defined, the client application updates the counter column value by incrementing (or decrementing) it. A client update to a counter column passes the name of the counter and the increment (or decrement) value; no timestamp is required. Internally, the structure of a counter column is a bit more complex: Cassandra tracks the distributed state of the counter as well as a server-generated timestamp upon deletion of a counter column. For this reason, it is important that all nodes in your cluster have their clocks synchronized using a source such as network time protocol (NTP).
Unlike normal columns, a write to a counter requires a read in the background to ensure that distributed counter values remain consistent across replicas. Typically, you use a consistency level of ONE with counters because during a write operation, the implicit read does not impact write latency.
Using clustering order: you can order query results to make use of the on-disk sorting of columns, in ascending or descending order. Ascending order is more efficient than descending. If you need results in descending order, specify a clustering order to store columns on disk in the reverse of the default; descending queries will then be faster than ascending ones. The following table definition changes the clustering order to descending by insertion time:
CREATE TABLE timeseries (
event_type text,
insertion_time timestamp,
event blob,
PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);
What is it? The index for all of your row keys. A key is mapped to a token, and from that token we can find the data in our ring. It is a per-node index: the partitioner plus replica placement manage which node owns which keys, keys are kept in ordered buckets, and the partitioner determines how a key maps to a token.
In relational database design, a primary key is the unique key used to identify each row in a table. A primary key index, like any index, speeds up random access to data in the table. The primary key also ensures record uniqueness, and may control the order in which records are physically clustered, or stored by the database.
In Cassandra, the primary index for a column family is the index of its row keys. Each node maintains this index for the data it manages. Rows are assigned to nodes by the cluster-configured partitioner and the keyspace-configured replica placement strategy. The primary index in Cassandra allows looking up rows by their row key. Since each node knows what ranges of keys each node manages, requested rows can be efficiently located by scanning the row indexes only on the relevant replicas.
With randomly partitioned row keys (the default in Cassandra), row keys are partitioned by their MD5 hash and cannot be scanned in order like traditional B-tree indexes. Using an ordered partitioner does allow range queries over rows, but is not recommended because of the difficulty of maintaining even data distribution across nodes. See About Data Partitioning in Cassandra for more information.
Cassandra uses the first column name in the primary key definition as the partition key. For example, in the playlists table, id is the partition key. The remaining column, or columns that are not partition keys in the primary key definition are the clustering columns. In the case of the playlists table, the song_id is the clustering column. The data for each partition is clustered by the remaining column or columns of the primary key definition. On a physical node, when rows for a partition key are stored in order based on the clustering columns, retrieval of rows is very efficient. For example, because the id in the playlists table is the partition key, all the songs for a playlist are clustered in the order of the remaining song_id column. Insertion, update, and deletion operations on rows sharing the same partition key for a table are performed atomically and in isolation.
CREATE TABLE playlists (
id uuid,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_id)
);
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key. Ordering can be done in ascending or descending order, default ascending, and specified with the ASC or DESC keywords. Basically we are saying that you get sorting for free with Cassandra, but it's not always as straightforward as you would expect.
Secondary Indexes came about in Cassandra because of a need for an easy way to query data.
An advantage of secondary indexes is the operational ease of populating and maintaining the index. Secondary indexes are built in the background automatically, without blocking reads or writes. Client-maintained "column families as indexes" must be created manually; for example, if the state column had been indexed by creating a column family such as users_by_state, your client application would have to populate the column family with data from the users column family.
Using CQL, you can create a secondary index on a column after defining a column family. Using CQL 3, for example, define a static column family, and then create a secondary index on two of its named columns. Secondary index names, in this case users_state, must be unique within the keyspace. Because of the secondary index created for the state column, after inserting values into the column family, you can query Cassandra directly for users who live in a given state.
Cassandra supports these conditional operators: =, >, >=, <, or <=, but not all in every situation.
• A filter based on a non-equals condition on a partition key is supported only if the partitioner is an ordered one.
• WHERE clauses can include greater-than and less-than comparisons, but for a given partition key, the conditions on the clustering column are restricted to filters that allow Cassandra to select a contiguous ordering of rows. For example, a query can construct a filter that selects data about stewards whose reign started by 2450 and ended before 2500. If king were not a component of the primary key, you would need to create a secondary index on king to use this query. To allow Cassandra to select a contiguous ordering of rows, you need to include the king component of the primary key in the filter using an equality condition; the ALLOW FILTERING clause is also required.