The document provides an overview and discussion of Cassandra including its architecture, data model, and real world applications. It discusses Cassandra's distributed architecture based on BigTable and Dynamo, as well as key concepts like nodes, clusters, consistency levels, and tunable consistency. The document also covers data modeling techniques in Cassandra like compound primary keys, materialized views, secondary indexes, counters, and using time to live for expiring data. Real world examples are provided for many of these techniques.
1. OVERVIEW AND REAL WORLD APPLICATIONS
Cassandra
Jersey Shore Tech Meetup
Nov 13, 2014
2. You Are Not Here…
*** http://njhalloffame.org/
3. Agenda
Some Basic Concepts/Overview
New Developments In Cassandra
Basic Data Modeling Concepts
Materialized Views
Secondary Indexes
Counters
Time Series Data
Expiring Data
4. Cassandra High Level
Cassandra's architecture is based on the combination of two technologies:
Google BigTable – Data Model
Amazon Dynamo – Distributed Architecture
BTW – these mean the same thing -> Cassandra = C*
5. Architecture Basics & Terminology
Nodes are single instances of C*
Cluster is a group of nodes
Data is organized by keys (tokens) which are distributed across the cluster
Replication Factor (rf) determines how many copies of each key are kept
Data Center Aware – works well in multi-DC/EC2 etc.
Consistency Level – powerful feature to tune consistency vs. speed vs. availability.
7. More Architecture
Information on who has what data and who is available is transferred using gossip.
No single point of failure (SPOF), every node can service requests.
Handles Replication and Downed Nodes (within reason)
8. CAP Theorem
Distributed Systems Law:
Consistency
Availability
Partition Tolerance
(you can only really have two in a distributed system)
Cassandra is AP with Eventual Consistency
9. Consistency
Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
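As a rough illustration of tunable consistency, the cqlsh session below changes the consistency level between statements. The keyspace and table (demo.users, keyed by emailaddress) are hypothetical, and a replication factor of 3 is assumed. Under that assumption QUORUM means 2 of 3 replicas, so a QUORUM write followed by a QUORUM read overlaps on at least one replica.

```cql
-- CONSISTENCY is a cqlsh command; it applies to the statements that follow
CONSISTENCY QUORUM;
INSERT INTO demo.users (emailaddress, firstname)
VALUES ('tpeters@example.com', 'tom');
SELECT firstname FROM demo.users WHERE emailaddress = 'tpeters@example.com';

-- Trade consistency for latency and availability: one replica is enough
CONSISTENCY ONE;
SELECT firstname FROM demo.users WHERE emailaddress = 'tpeters@example.com';
```

Application drivers generally expose the same choice per statement, so different queries in one application can use different levels.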
13. Data Model Architecture
Keyspace – container of column families (tables). Defines RF among others.
Table – column family. Contains definition of schema.
Row – a “record” identified by a key
Column - a key and a value
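The hierarchy above might look like this in CQL. This is only a sketch: the keyspace name demo, the SimpleStrategy choice, the replication factor of 3, and the sample UUID are all assumptions made for illustration.

```cql
-- Keyspace: the container, and the place where the replication factor is set
CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE demo;

-- Table (column family): holds the schema definition
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  firstname text,
  lastname text
);

-- Row: a record identified by its key; each column is a name and a value
INSERT INTO users (user_id, firstname, lastname)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'tom', 'peters');
```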
15. Deletions
Distributed systems present a unique problem for deletes. If Cassandra actually removed the data and a node was down and didn’t receive the delete notice, that node would try to recreate the record when it came back online. So…
Tombstone - The data is replaced with a special value called a Tombstone, which works within the distributed architecture
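A short sketch of what this looks like from CQL, assuming a hypothetical users table keyed by user_id. Each delete below writes a tombstone instead of removing data in place, and tombstones are kept for the table's gc_grace_seconds (864000 seconds, i.e. 10 days, by default) before compaction may purge them.

```cql
-- Row delete: writes a tombstone covering the whole row
DELETE FROM users WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- Column delete: writes a tombstone for a single column
DELETE lastname FROM users WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- How long tombstones are retained; the value shown is the default
ALTER TABLE users WITH gc_grace_seconds = 864000;
```

The retention window gives a downed node time to come back and learn about the delete before the tombstone disappears.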
16. Keys
Primary Key
Partition Key – identifies a row
Cluster Key – sorting within a row
Using CQL these are defined together as a compound (composite) key
Compound keys are how you implement “wide rows”, the COOL FEATURE!
17. Single Primary Key
create table users (
  user_id UUID PRIMARY KEY,
  firstname text,
  lastname text,
  emailaddress text
);
** Cassandra Data Types
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
18. Compound Key
create table users (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);
Partition Key plus Cluster Key
emailaddress is partition key
department is cluster key
19. Compound Key
create table users (
  emailaddress text,
  department text,
  country text,
  firstname text,
  lastname text,
  PRIMARY KEY ((emailaddress, department), country)
);
Partition Key plus Cluster Key
emailaddress and department together form the partition key
country is cluster key
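To make the effect of the two-column partition key concrete, here is a sketch of queries against the table above. The literal values are made up for illustration.

```cql
-- Works: every partition key column is restricted by equality
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering';

-- Works: full partition key plus the cluster key
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering'
  AND country = 'DE';

-- Rejected as written: only part of the partition key is given, so
-- Cassandra cannot compute the token that locates the partition
-- SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';
```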
20. New Rules
Writes Are Cheap
Denormalize All You Need
Model Your Queries, Not Data (understand access patterns)
Application Worries About Joins
21. What’s New In 2.0
Conditional DDL
IF EXISTS or IF NOT EXISTS
Drop Column Support
ALTER TABLE users DROP lastname;
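A brief sketch of conditional DDL, using hypothetical keyspace and table names. Each statement is a no-op instead of an error when its condition is not met, which makes schema scripts safe to re-run.

```cql
-- Created only if it is not already there
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text
);

-- Dropped only if present
DROP TABLE IF EXISTS demo.old_users;
```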
22. More New Stuff
Triggers
CREATE TRIGGER myTrigger
  ON myTable
  USING 'com.thejavaexperts.cassandra.updateevt';
Lightweight Transactions (CAS)
UPDATE users
  SET firstname = 'tim'
  WHERE emailaddress = 'tpeters@example.com'
  IF firstname = 'tom';
** Not like an ACID Transaction!!
23. CAS & Transactions
CAS - compare-and-set operations. A single, atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
Consider the performance hit. CAS is (was) considered an anti-pattern.
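Besides the conditional UPDATE, the other common compare-and-set form is a conditional INSERT. The sketch below assumes a hypothetical users table whose primary key is emailaddress alone.

```cql
-- Written only if no row with this key exists yet
INSERT INTO users (emailaddress, firstname, lastname)
VALUES ('tpeters@example.com', 'tom', 'peters')
IF NOT EXISTS;

-- Applied only if the current value still matches the expectation
UPDATE users
  SET firstname = 'tim'
  WHERE emailaddress = 'tpeters@example.com'
  IF firstname = 'tom';
```

In cqlsh each of these returns a result row with an [applied] column, so the caller can tell whether the condition held. The extra coordination between replicas is where the performance hit comes from.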
24. Data Modeling… The Basics
With CQL, Cassandra now feels very familiar to RDBMS/SQL users.
CQL very nicely hides the underlying data storage model.
You still have all the power of Cassandra; it is all in the key definition.
RDBMS = model data
Cassandra = model access (queries)
25. Side-Note On Querying
Create table with compound key
Select using ALLOW FILTERING
Counts
Select using IN or =
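The points above could be walked through roughly as follows. The staff table and its values are hypothetical, and the exact rules for ALLOW FILTERING vary between Cassandra versions, so treat this as a sketch and not a reference.

```cql
-- Table with a compound key: emailaddress is the partition key,
-- department is the cluster key
CREATE TABLE staff (
  emailaddress text,
  department text,
  firstname text,
  PRIMARY KEY (emailaddress, department)
);

-- Equality on the partition key
SELECT * FROM staff WHERE emailaddress = 'tpeters@example.com';

-- IN on the partition key: one partition is read per listed value
SELECT * FROM staff
WHERE emailaddress IN ('tpeters@example.com', 'jsmith@example.com');

-- Restricting only the cluster key means scanning across partitions,
-- so Cassandra asks for an explicit opt-in
SELECT * FROM staff WHERE department = 'engineering' ALLOW FILTERING;

-- Counts read the matching rows: modest within one partition,
-- expensive across a whole table
SELECT COUNT(*) FROM staff WHERE emailaddress = 'tpeters@example.com';
```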
26. Batch Operations
Saves Network Roundtrips
Can contain INSERT, UPDATE, DELETE
Atomic by default (all or nothing)
Can use timestamp for specific ordering
27. Batch Operation Example
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
    values ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    values ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    values ('jsmith@example.com', 'jim', 'smith', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    values ('arogers@example.com', 'alan', 'rogers', 'USA');
  DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
select in cqlsh
List in cassandra-cli with timestamp
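Using a timestamp for ordering can be sketched as below. The timestamps 1000 and 2000 are artificial values chosen for readability (real write timestamps are in microseconds), and the sketch assumes a hypothetical users table keyed by emailaddress that holds no earlier data for this key.

```cql
BEGIN BATCH
  -- Listed first, but carries the later timestamp
  INSERT INTO users (emailaddress, firstname)
    VALUES ('tpeters@example.com', 'tim') USING TIMESTAMP 2000;
  -- Listed second, but carries the earlier timestamp
  INSERT INTO users (emailaddress, firstname)
    VALUES ('tpeters@example.com', 'tom') USING TIMESTAMP 1000;
APPLY BATCH;

-- WRITETIME shows which timestamp the stored value carries
SELECT firstname, WRITETIME(firstname) FROM users
WHERE emailaddress = 'tpeters@example.com';
```

Conflicts are resolved by timestamp, not by statement order, so the value written with the higher timestamp is the one that survives.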
28. More Data Modeling…
No Joins
No Foreign Keys
No Third (or any other) Normal Form Concerns
Redundant Data Encouraged. Apps maintain consistency.
29. Secondary Indexes
Allow defining indexes to support access by columns other than the partition key.
Each node has a local index for its data.
They have uses, but shouldn’t be used all the time without consideration.
We will look at alternatives.
30. Secondary Index Example
Create a table
Try to select with column not in PK
Add Secondary Index
Try select again. (maybe need to reinsert)
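The steps above might look like this. The employees table, the index name, and the data are hypothetical.

```cql
CREATE TABLE employees (
  emailaddress text PRIMARY KEY,
  firstname text,
  country text
);

INSERT INTO employees (emailaddress, firstname, country)
VALUES ('tpeters@example.com', 'tom', 'DE');

-- Rejected before the index exists: country is not part of the primary key
-- SELECT * FROM employees WHERE country = 'DE';

CREATE INDEX employees_country_idx ON employees (country);

-- Accepted once the index is built: each node consults its local index
SELECT * FROM employees WHERE country = 'DE';
```

The index is built in the background over existing rows, so a select issued immediately after CREATE INDEX may not see everything yet.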
31. When to use?
Low Cardinality – small number of unique values
High Cardinality – high number of distinct values
Secondary Indexes are good for Low Cardinality. So country codes, department codes etc. Not email addresses.
32. Materialized View
If you want full distribution, you can use what is called a Materialized View pattern.
Remember redundant data is fine.
Model the queries
33. Materialized View Example
Show normal table with compound key and querying limitations
Create Materialized View table with a different compound key to support alternate access.
Selects use partition key.
Secondary indexes are local, not distributed
ALLOW FILTERING can cause performance issues
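A minimal sketch of the pattern, with hypothetical table names: the same data is written to two tables, each keyed for one query, and the application keeps them in step. This is the hand-maintained pattern described on the slide, not a built-in server feature.

```cql
-- Base table: look up a user by email address
CREATE TABLE users_by_email (
  emailaddress text PRIMARY KEY,
  country text,
  firstname text
);

-- "Materialized view" table: the same data, keyed for lookup by country
CREATE TABLE users_by_country (
  country text,
  emailaddress text,
  firstname text,
  PRIMARY KEY (country, emailaddress)
);

-- The application writes both copies together
BEGIN BATCH
  INSERT INTO users_by_email (emailaddress, country, firstname)
    VALUES ('tpeters@example.com', 'DE', 'tom');
  INSERT INTO users_by_country (country, emailaddress, firstname)
    VALUES ('DE', 'tpeters@example.com', 'tom');
APPLY BATCH;

-- Both selects use a partition key; no index or filtering is needed
SELECT * FROM users_by_email WHERE emailaddress = 'tpeters@example.com';
SELECT * FROM users_by_country WHERE country = 'DE';
```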
34. Counters
Updated in 2.1 and now work in a more distributed and accurate manner.
Table organization, example
How to update, view etc.
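A counter table could be organized roughly like this. The page_views table is hypothetical. In a counter table every non-key column must be a counter, and values are changed with UPDATE, not INSERT.

```cql
CREATE TABLE page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counters are only ever incremented or decremented
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';

-- Viewed like any other column
SELECT views FROM page_views WHERE page = '/home';
```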
35. Time Series Example….
Time series table model.
Need to consider interval for event frequency and wide row size.
Make what is tracked, together with the time interval unit, the partition key.
36. Time Series Data
Due to its quick writing model Cassandra is suited for storing time series data.
The Cassandra wide row is a perfect fit for modeling time series / time based events.
Let’s look at an example….
37. Event Data
Notice primary key and cluster key.
Insert some data
View in CQL, then in CLI as wide row
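One possible shape for such an event table is sketched below. The table, column names, and one-day bucket are assumptions for illustration; the bucket size should follow from event frequency so that a single wide row does not grow without bound.

```cql
CREATE TABLE sensor_events (
  sensor_id text,
  bucket_day text,
  event_time timestamp,
  reading double,
  -- Partition key: what is tracked plus the interval; cluster key: event time
  PRIMARY KEY ((sensor_id, bucket_day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, bucket_day, event_time, reading)
VALUES ('sensor-1', '2014-11-13', '2014-11-13 09:00:00+0000', 20.5);
INSERT INTO sensor_events (sensor_id, bucket_day, event_time, reading)
VALUES ('sensor-1', '2014-11-13', '2014-11-13 09:01:00+0000', 20.7);

-- Newest events first, read from a single partition
SELECT event_time, reading FROM sensor_events
WHERE sensor_id = 'sensor-1' AND bucket_day = '2014-11-13'
  AND event_time >= '2014-11-13 09:00:00+0000'
LIMIT 10;
```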
38. TTL – Self Expiring Data
Another technique is data that has a defined lifespan. For instance session identifiers, temporary passwords etc.
For this Cassandra provides a Time To Live (TTL) mechanism.
39. TTL Example…
Create table
Insert data using TTL
Can update a specific column with a TTL
Show using selects.
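These steps could be sketched as follows. The sessions table, its columns, and the TTL values are hypothetical; TTLs are given in seconds.

```cql
CREATE TABLE sessions (
  session_id text PRIMARY KEY,
  owner text,
  temp_code text
);

-- The inserted columns expire one hour after the write
INSERT INTO sessions (session_id, owner, temp_code)
VALUES ('abc123', 'tpeters', 's3cret') USING TTL 3600;

-- Give a single column its own, shorter lifespan
UPDATE sessions USING TTL 60
  SET temp_code = 'n3w'
  WHERE session_id = 'abc123';

-- TTL() reports the seconds remaining for a column
SELECT owner, TTL(owner), TTL(temp_code) FROM sessions
WHERE session_id = 'abc123';
```

Once a TTL runs out, the expired value is treated like a deleted one, so no application-side cleanup job is needed.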