This presentation is part of one of my webinar in clean code webinar series. The contents are slightly edited to share the information in public domain. In this presentation, I tried to provide detailed introduction to code refactoring practice.
This presentation will be useful for software architects/Managers,developers and QAs. Do share your feedback in comments.
Refactoring: Improve the design of existing codeValerio Maggio
Refactoring: Improve the design of existing code
Software Engineering class on main refactoring techniques and bad smells reported in the famous Fawler's book on this topic!
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConJérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, butts-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and likelinesses) with existing container systems.
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceScyllaDB
In this talk I will walk you through the performance tuning steps that I took to serve 1.2M JSON requests per second from a 4 vCPU c5 instance, using a simple API server written in C.
At the start of the journey the server is capable of a very respectable 224k req/s with the default configuration. Along the way I made extensive use of tools like FlameGraph and bpftrace to measure, analyze, and optimize the entire stack, from the application framework, to the network driver, all the way down to the kernel.
I began this wild adventure without any prior low-level performance optimization experience; but once I started going down the performance tuning rabbit-hole, there was no turning back. Fueled by my curiosity, willingness to learn, and relentless persistence, I was able to boost performance by over 400% and reduce p99 latency by almost 80%.
This presentation is part of one of my webinar in clean code webinar series. The contents are slightly edited to share the information in public domain. In this presentation, I tried to provide detailed introduction to code refactoring practice.
This presentation will be useful for software architects/Managers,developers and QAs. Do share your feedback in comments.
Refactoring: Improve the design of existing codeValerio Maggio
Refactoring: Improve the design of existing code
Software Engineering class on main refactoring techniques and bad smells reported in the famous Fawler's book on this topic!
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxConJérôme Petazzoni
Containers are everywhere. But what exactly is a container? What are they made from? What's the difference between LXC, butts-nspawn, Docker, and the other container systems out there? And why should we bother about specific filesystems?
In this talk, Jérôme will show the individual roles and behaviors of the components making up a container: namespaces, control groups, and copy-on-write systems. Then, he will use them to assemble a container from scratch, and highlight the differences (and likelinesses) with existing container systems.
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceScyllaDB
In this talk I will walk you through the performance tuning steps that I took to serve 1.2M JSON requests per second from a 4 vCPU c5 instance, using a simple API server written in C.
At the start of the journey the server is capable of a very respectable 224k req/s with the default configuration. Along the way I made extensive use of tools like FlameGraph and bpftrace to measure, analyze, and optimize the entire stack, from the application framework, to the network driver, all the way down to the kernel.
I began this wild adventure without any prior low-level performance optimization experience; but once I started going down the performance tuning rabbit-hole, there was no turning back. Fueled by my curiosity, willingness to learn, and relentless persistence, I was able to boost performance by over 400% and reduce p99 latency by almost 80%.
Optimizing Servers for High-Throughput and Low-Latency at DropboxScyllaDB
I'm going to discuss the efficiency/performance optimizations of different layers of the system. Starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we’ll move to Linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally, we’ll discuss library and application-level tunings, which are mostly applicable to HTTP servers in general and nginx/envoy specifically.
For each potential area of optimization I’ll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads.
Also, I'll cover more theoretical approaches to performance analysis and the newly developed tooling like `bpftrace` and new `perf` features.
This presentation introduces the principles of high-quality programming code construction during the software development process. The quality of the code is discussed in its most important characteristics – correctness, readability and maintainability. The principles of construction of high-quality class hierarchies, classes and methods are explained. Two fundamental concepts – “loose coupling” and “strong cohesion” are defined and their effect on the construction of classes and subroutines is discussed. Some advices for correctly dealing with the variables and data are given, as well as directions for correct naming of the variables and the rest elements of the program. Best practices for organization of the logical programming constructs are explained. Attention is given also to the “refactoring” as a technique for improving the quality of the existing code. The principles of good formatting of the code are defined and explained. The concept of “self-documenting code” as a programming style is introduced.
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use work load management, tune your queries, and use Amazon Redshift's interleaved sorting features. Finally, learn how TripAdvisor uses these best practices to give their entire organization access to analytic insights at scale.
Refactoring of code for more readable, scalable modules based on The book titled "Refactoring - Improving the Design of Existing Code" written by Martin Fowler
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023Steve Pember
In this presentation we will present the general philosophy of Clean Architecture, Hexagonal Architecture, and Ports & Adapters: discussing why these approaches are useful and general guidelines for introducing them to your code. Chiefly, we will show how to implement these patterns within your Spring (Boot) Applications. Through a publicly available reference app, we will demonstrate what these concepts can look like within Spring and walkthrough a handful of scenarios: isolating core business logic, ease of testing, and adding a new feature or two.
Memory Management: What You Need to Know When Moving to Java 8AppDynamics
This presentation will compare and contrast application behavior in Java 7 with Java 8, particularly focusing on memory management and usage. Several code examples are presented to show how to recognize and respond to common pitfalls.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
How to add an interactive shell (remote, too) to a C++ application by using my open-source C++14 library:
https://github.com/daniele77/cli
In the slide deck you can learn how to use it, how does it work, and find some thoughts about C++ design and patterns used by the library.
How to improve your code quality? The answer is continuous refactoring. Learn more about refactoring. Know the most frequent code smells (antipatterns), telling when to refactor. Go through the catalog of well-known refactorings, telling how to improve your code.
Optimizing Servers for High-Throughput and Low-Latency at DropboxScyllaDB
I'm going to discuss the efficiency/performance optimizations of different layers of the system. Starting from the lowest levels like hardware and drivers: these tunings can be applied to pretty much any high-load server. Then we’ll move to Linux kernel and its TCP/IP stack: these are the knobs you want to try on any of your TCP-heavy boxes. Finally, we’ll discuss library and application-level tunings, which are mostly applicable to HTTP servers in general and nginx/envoy specifically.
For each potential area of optimization I’ll try to give some background on latency/throughput tradeoffs (if any), monitoring guidelines, and, finally, suggest tunings for different workloads.
Also, I'll cover more theoretical approaches to performance analysis and the newly developed tooling like `bpftrace` and new `perf` features.
This presentation introduces the principles of high-quality programming code construction during the software development process. The quality of the code is discussed in its most important characteristics – correctness, readability and maintainability. The principles of construction of high-quality class hierarchies, classes and methods are explained. Two fundamental concepts – “loose coupling” and “strong cohesion” are defined and their effect on the construction of classes and subroutines is discussed. Some advices for correctly dealing with the variables and data are given, as well as directions for correct naming of the variables and the rest elements of the program. Best practices for organization of the logical programming constructs are explained. Attention is given also to the “refactoring” as a technique for improving the quality of the existing code. The principles of good formatting of the code are defined and explained. The concept of “self-documenting code” as a programming style is introduced.
(BDT401) Amazon Redshift Deep Dive: Tuning and Best PracticesAmazon Web Services
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use work load management, tune your queries, and use Amazon Redshift's interleaved sorting features. Finally, learn how TripAdvisor uses these best practices to give their entire organization access to analytic insights at scale.
Refactoring of code for more readable, scalable modules based on The book titled "Refactoring - Improving the Design of Existing Code" written by Martin Fowler
Anatomy of a Spring Boot App with Clean Architecture - Spring I/O 2023Steve Pember
In this presentation we will present the general philosophy of Clean Architecture, Hexagonal Architecture, and Ports & Adapters: discussing why these approaches are useful and general guidelines for introducing them to your code. Chiefly, we will show how to implement these patterns within your Spring (Boot) Applications. Through a publicly available reference app, we will demonstrate what these concepts can look like within Spring and walkthrough a handful of scenarios: isolating core business logic, ease of testing, and adding a new feature or two.
Memory Management: What You Need to Know When Moving to Java 8AppDynamics
This presentation will compare and contrast application behavior in Java 7 with Java 8, particularly focusing on memory management and usage. Several code examples are presented to show how to recognize and respond to common pitfalls.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
How to add an interactive shell (remote, too) to a C++ application by using my open-source C++14 library:
https://github.com/daniele77/cli
In the slide deck you can learn how to use it, how does it work, and find some thoughts about C++ design and patterns used by the library.
How to improve your code quality? The answer is continuous refactoring. Learn more about refactoring. Know the most frequent code smells (antipatterns), telling when to refactor. Go through the catalog of well-known refactorings, telling how to improve your code.
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the solutions architecture team, I will share common mistakes observed as well as tips and tricks to avoiding them.
An overview and discussion on indexing data in Redis to facilitate fast and efficient data retrieval. Presented on September 22nd, 2014 to the Redis Tel Aviv Meetup.
In this talk most of the C++11 features will be uncovered and examples from real world use will be presented from my personal experience in writing software systems using C++11 using GCC compiler. Also I will compare open source implementation with proprietary implementation of C++11. This is basically a C++11 talk to give audience a glimpse on what is going on in the C++ world.
AMC Squarelearning Bangalore is the best training institute for a career development. it had students from various parts of the country and even few were from West African countries.
Rails is a great Ruby-based framework for producing web sites quickly and effectively. Here are a bunch of tips and best practices aimed at the Ruby newbie.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
2. Agenda
• Background
• Basics of Indexes
• Native Secondary Indexes
• "Wide rows" and CF-based Indexes
• Inverted-indexes Using SuperColumns
• Inverted-indexes Using Composite Columns
• Q&A
3. Background
This presentation is based on:
• What we learned over the last year in building a
highly indexed system on Cassandra
• Participation in the Hector client project
• Common questions and issues posted to the
Cassandra mailing lists
4. Brief History - Cassandra 0.6
• No built-in secondary indexes
• All indexes were custom-built, usually using
super-columns
• Pros
– Forced you to learn how Cassandra works
• Cons
– Lots of work
– Super-columns proved a dead-end
5. Brief History - Cassandra 0.7
• Built-in secondary indexes
• New users flocked to these
• Pros
– Easy to use, out of the box
• Cons
– Deceptively similar to SQL indexes but not the same
– Reinforce data modeling that plays against Cassandra’s
strengths
6. Present Day
• New users can now get started with Cassandra
without really understanding it (CQL, etc.)
• Veteran users are using advanced techniques
(Composites, etc.) that aren’t really documented
anywhere*
• New user panic mode when they try to go to the
next level and suddenly find themselves in the
deep end
*Actually, they are, Google is your friend…
8. A Quick Review
There are essentially two ways of finding rows:
The Primary Index
(row keys)
Alternate Indexes
(everything else)
9. The “Primary Index”
• The “primary index” is your row key*
• Sometimes it’s meaningful (“natural id”):
Users = {
"edanuff" : {
email: "ed@anuff.com"
}
}
*Yeah. No. But if it helps, yes.
10. The “Primary Index”
• The “primary index” is your row key*
• But usually it’s not:
Users = {
"4e3c0423-aa84-11e0-a743-58b0356a4c0a" : {
username: "edanuff",
email: "ed@anuff.com"
}
}
*Yeah. No. But if it helps, yes.
11. Get vs. Find
• Using the row key is the best way to retrieve
something if you’ve got a precise and
immutable 1:1 mapping
• If you find yourself ever planning to iterate
over keys, you’re probably doing something
wrong
– i.e. avoid the Order Preserving Partitioner*
• Use alternate indexes to find (search) things
*Feel free to disregard, but don’t complain later
12. Alternate Indexes
Anything other than using the row key:
• Native secondary indexes
• Wide rows as lookup and grouping tables
• Custom secondary indexes
Remember, there is no magic here…
13. Native Secondary Indexes
• Easy to use
• Look (deceptively) like SQL indexes, especially
when used with CQL
CREATE INDEX ON Users(username);
SELECT * FROM Users WHERE
username="edanuff";
14. Under The Hood
• Every index is stored as its own "hidden" CF
• Nodes index the rows they store
• When you issue a query, it gets sent to all
nodes
• Currently does equality operations, the range
operations get performed by in memory by
coordinator node
15. Some Limitations
• Not recommended for high cardinality values (i.e.
timestamps, birthdates, keywords, etc.)
• Requires at least one equality comparison in a
query – not great for less-than/greater-than/range
queries
• Unsorted - results are in token order, not query
value order
• Limited to search on data types Cassandra
natively understands
16. I can’t live with those limitations,
what are my options?
Complain on the mailing list
Switch to Mongo
Build my indexes in my application
18. Wide Rows
“Why would a row need 2B columns”?
• Basis of all indexing, organizing, and
relationships in Cassandra
• If your data model has no rows with over a
hundred columns, you’re either doing
something wrong or shouldn’t be using
Cassandra*
*IMHO J
22. Column Families As Indexes
• CF column operations very fast
• Column slices can be retrieved by range, are
always sorted, can be reversed, etc.
• If target key a TimeUUID you get both grouping
and sort by timestamp
– Good for inboxes, feeds, logs, etc. (Twissandra)
• Best option when you need to combine groups,
sort, and search
– Search friends list, inbox, etc.
23. But, only works for 1:1
What happens when I’ve got one to many?
Indexes = {
"User_Keys_By_Last_Name" : {
"adams" : "e5d61f2b-…",
"alden" : "e80a17ba-…",
"anderson" : "e5d61f2b-…",
✖! "anderson" : "e719962b-…",
Not Allowed
"doe" : "e78ece0f-…",
"franks" : "e66afd40-…",
… : …,
}
}
25. Use with caution
• Not officially deprecated, but not highly
recommended either
• Sorts only on the supercolumn, not subcolumn
• Some performance issues
• What if I want more nesting?
– Can subcolumns have subcolumns? NO!
• Anecdotally, many projects have moved away
from using supercolumns
26. So, let’s revisit regular CF’s
What happens when I’ve got one to many?
Indexes = {
"User_Keys_By_Last_Name" : {
"adams" : "e5d61f2b-…",
"alden" : "e80a17ba-…",
"anderson" : "e5d61f2b-…",
✖! "anderson" : "e719962b-…",
Not Allowed
"doe" : "e78ece0f-…",
"franks" : "e66afd40-…",
… : …,
}
}
27. So, let’s revisit regular CF’s
What if we could turn it back to one to one?
Indexes = {
"User_Keys_By_Last_Name" : {
{"adams", 1} : "e5d…",
{"alden", 1} : "e80…",
{"anderson", 1} : "e5f…",
✔! {"anderson", 2} : "e71…",
Allowed!!!
{"doe", 1} : "e78…",
{"franks", 1} : "e66…",
… : …,
}
}
28. Composite Column Names
Comparator = “CompositeType” or “DynamicCompositeType”
{"anderson", 1, 1, 1, …} : "e5f…"
1..N Components
Build your column name out of one or more
“component” values which can be of any of the
columns types Cassandra supports
(i.e. UTF8, IntegerType, UUIDType, etc.)
29. Composite Column Names
Comparator = “CompositeType” or “DynamicCompositeType”
{"anderson", 1, 1, 1, …} : "e5f…"
{"anderson", 1, 1, 2, …} : "e5f…"
{"anderson", 1, 2, 1, …} : "e5f…"
{"anderson", 1, 2, 2, …} : "e5f…"
Sorts by component values using each
component type’s sort order
Retrieve using normal column slice technique
30. Two Types Of Composites
- column_families:
- name: My_Composite_Index_CF
- compare_with:
CompositeType(UTF8Type, UUIDType)
- name: My_Dynamic_Composite_Index_CF
- compare_with:
DynamicCompositeType(s=>UTF8Type, u=>UUIDType)
Main difference for use in indexes is whether you
need to create one CF per index vs one CF for
all indexes with one row per index
31. Static Composites
- column_families:
- name: My_Composite_Index_CF
- compare_with:
CompositeType(UTF8Type, UUIDType)
{"anuff", "e5f…"} : "e5f…"
Fixed # and order
defined in column
configuration
32. Dynamic Composites
- column_families:
- name: My_Dynamic_Composite_Index_CF
- compare_with:
DynamicCompositeType(s=>UTF8Type, u=>UUIDType)
{"anuff", "e5f…", "e5f…"} : "…"
Any # and
order of
types at
runtime
33. Typical Composite Index Entry
{<term 1>, …, <term N>, <key>, <ts>}
<term 1…N> - terms to query on (i.e. last_name, first_name)
<key> - target row key
<ts> - unique timestamp, usually time-based UUID
34. How does this work?
• Queries are easy
– regular column slice operations
• Updates are harder
– Need to remove old value and insert the new
value
– Uh oh, read before write??!!!
35. Example – Users By Location
• We need 3 Column Families (not 2)
• First 2 CF’s are obvious:
– Users
– Indexes
• We also need a third CF:
– Users_Index_Entries
41. Updating The Index
• We read previous index values from the
Users_Index_Entries CF rather than the
Users CF to deal with concurrency
• Columns in Index CF and
Users_Index_Entries CF are timestamped so
no locking is needed for concurrent updates
42. Get Old Values For Column
SELECT {"location"}..{"location",*}
FROM Users_Index_Entries WHERE KEY = <user_key>;
BEGIN BATCH
DELETE {"location", ts1}, {"location", ts2}, …
FROM Users_Index_Entries WHERE KEY = <user_key>;
DELETE {<value1>, <user_key>, ts1}, {<value2>, <user_key>, ts2}, …
FROM Users_By_Location WHERE KEY = <user_key>;
UPDATE Users_Index_Entries SET {"location", ts3} = <value3>
WHERE KEY = <user_key>;
UPDATE Indexes SET {<value3>, <user_key>, ts3) = null
WHERE KEY = "Users_By_Location";
UPDATE Users SET location = <value3>
WHERE KEY = <user_key>;
APPLY BATCH
43. Remove Old Column Values
SELECT {"location"}..{"location",*}
FROM Users_Index_Entries WHERE KEY = <user_key>;
BEGIN BATCH
DELETE {"location", ts1}, {"location", ts2}, …
FROM Users_Index_Entries WHERE KEY = <user_key>;
DELETE {<value1>, <user_key>, ts1}, {<value2>, <user_key>, ts2}, …
FROM Users_By_Location WHERE KEY = <user_key>;
UPDATE Users_Index_Entries SET {"location", ts3} = <value3>
WHERE KEY = <user_key>;
UPDATE Indexes SET {<value3>, <user_key>, ts3) = null
WHERE KEY = "Users_By_Location";
UPDATE Users SET location = <value3>
WHERE KEY = <user_key>;
APPLY BATCH
44. Insert New Column Values In Index
SELECT {"location"}..{"location",*}
FROM Users_Index_Entries WHERE KEY = <user_key>;
BEGIN BATCH
DELETE {"location", ts1}, {"location", ts2}, …
FROM Users_Index_Entries WHERE KEY = <user_key>;
DELETE {<value1>, <user_key>, ts1}, {<value2>, <user_key>, ts2}, …
FROM Users_By_Location WHERE KEY = <user_key>;
UPDATE Users_Index_Entries SET {"location", ts3} = <value3>
WHERE KEY = <user_key>;
UPDATE Indexes SET {<value3>, <user_key>, ts3) = null
WHERE KEY = "Users_By_Location";
UPDATE Users SET location = <value3>
WHERE KEY = <user_key>;
APPLY BATCH
45. Set New Value For User
SELECT {"location"}..{"location",*}
FROM Users_Index_Entries WHERE KEY = <user_key>;
BEGIN BATCH
DELETE {"location", ts1}, {"location", ts2}, …
FROM Users_Index_Entries WHERE KEY = <user_key>;
DELETE {<value1>, <user_key>, ts1}, {<value2>, <user_key>, ts2}, …
FROM Users_By_Location WHERE KEY = <user_key>;
UPDATE Users_Index_Entries SET {"location", ts3} = <value3>
WHERE KEY = <user_key>;
UPDATE Indexes SET {<value3>, <user_key>, ts3) = null
WHERE KEY = "Users_By_Location";
UPDATE Users SET location = <value3>
WHERE KEY = <user_key>;
APPLY BATCH
46. Frequently Asked Questions
• Do I need locking for concurrency?
– No, the index will always be eventually consistent
• What if something goes wrong?
– You will have to have provisions to repeat the batch operation
until it completes, but its idempotent, so it’s ok
• Can’t I get a false positive?
– Depending on updates being in-flight, you might get a false
positive, if this is a problem, filter on read
• Who else is using this approach?
– Actually very common with lots of variations, this isn’t the only
way to do this but at least composite format is now standard
47. Some Things To Think About
• Indexes can be derived from column values
– Create a “last_name, first_name” index from a
“fullname” column
– Unroll a JSON object to construct deep indexes of
serialized JSON structures
• Include additional denormalized values in the
index for faster lookups
• Use composites for column values too not just
column names
48. How can I learn more?
Sample implementation using Hector:
https://github.com/edanuff/CassandraIndexedCollections
JPA implementation using this for Hector:
https://github.com/riptano/hector-jpa
Jira entry on native composites for Cassandra:
https://issues.apache.org/jira/browse/CASSANDRA-2231
Background blog posts:
http://www.anuff.com/2011/02/indexing-in-cassandra.html
http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
49. What is Usergrid?
• Cloud PaaS backend for mobile and rich-client applications
• Powered by Cassandra
• In private Beta now
• Entire stack to be open-sourced in August so people can
run their own
• Sign up to get admitted to Beta and to be notified when
about source code
http://www.usergrid.com