The aim of this presentation is to give an enterprise architect enough information to decide whether Cassandra should be the project's data store. The presentation describes the nuances of Cassandra's architecture and the ways to design data and work with it.
Apache Cassandra, part 2 – data model example, machinery
3. Twissandra Use Cases
- Get the friends of a username
- Get the followers of a username
- Get a timeline of a specific user's tweets
- Create a tweet
- Create a user
- Add friends to a user
13. Cassandra QL – User creation (BATCH)
BEGIN BATCH
INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
APPLY BATCH
14. Cassandra QL – Following a friend (BATCH)
BEGIN BATCH
INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')
INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid')
APPLY BATCH
15. Cassandra QL – Tweet creation (BATCH)
BEGIN BATCH
INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
……..
INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
……
APPLY BATCH
16. Cassandra QL – Getting user tweets
SELECT * FROM Userline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')
17. Cassandra QL – Getting user timeline
SELECT * FROM Timeline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')
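The tweet-creation batch and the two-step timeline read can be sketched in plain Python. This is only an in-memory illustration of the data flow, not a Cassandra client: the dictionaries stand in for the Tweet, Userline, Timeline and Followers column families, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the column families on the slides.
tweets = {}                    # Tweet: tweetid -> {"userid", "body", "timestamp"}
userline = defaultdict(dict)   # Userline: userid -> {timestamp: tweetid}
timeline = defaultdict(dict)   # Timeline: userid -> {timestamp: tweetid}
followers = defaultdict(set)   # Followers: userid -> ids of users following them


def follow(userid, friendid):
    """Record that userid follows friendid (only the Followers side is modelled)."""
    followers[friendid].add(userid)


def create_tweet(tweetid, userid, body, timestamp):
    """Store the tweet once, then fan its id out to every timeline that shows it."""
    tweets[tweetid] = {"userid": userid, "body": body, "timestamp": timestamp}
    userline[userid][timestamp] = tweetid
    timeline[userid][timestamp] = tweetid
    for follower in followers[userid]:
        timeline[follower][timestamp] = tweetid


def get_timeline(userid, limit=10):
    """Two-step read: tweet ids from Timeline, then tweet bodies from Tweet."""
    newest_first = sorted(timeline[userid].items(), reverse=True)[:limit]
    return [tweets[tweetid]["body"] for _, tweetid in newest_first]
```

The write does the expensive work (one Timeline entry per follower) so that a timeline read needs only one row lookup followed by one multi-key lookup.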
18. Design patterns
- Materialized View: create a second column family to represent additional queries.
- Valueless Column: use column names for values.
- Aggregate Key: if you need to find a sub-item, use a composite key.
21. Problem with eventual consistency
When we update a value, we must add the new value to the index and remove the old one. However, eventual consistency and the lack of transactions make it impossible to do this reliably.
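The problem can be reproduced with a small single-process simulation. This is a sketch under simplifying assumptions: `rows` and `index` are hypothetical dictionaries standing in for a data column family and a hand-built index, and the interleaving of two clients is written out by hand rather than produced by real replicas.

```python
from collections import defaultdict

rows = {"user1": "NY"}                       # row key -> indexed column value
index = defaultdict(set, {"NY": {"user1"}})  # value -> row keys (hand-built index)


def read_old(key):
    """Step 1 of an update: read the current value so its index entry can be removed."""
    return rows[key]


def apply_update(key, old_value, new_value):
    """Remaining steps: drop the old index entry, add the new one, write the row."""
    index[old_value].discard(key)
    index[new_value].add(key)
    rows[key] = new_value


# Two clients interleave: both read before either writes.
old_seen_by_a = read_old("user1")
old_seen_by_b = read_old("user1")
apply_update("user1", old_seen_by_a, "LA")
apply_update("user1", old_seen_by_b, "SF")

# Index entries that still point at user1 although the row holds another value.
stale = [value for value, keys in index.items()
         if "user1" in keys and rows["user1"] != value]
```

After the run the row holds "SF", yet the index still lists user1 under "LA": neither client knew about the other's entry, so nobody removed it.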
26. Replication Replication controlled by the replication_factor setting in the keyspace definition The actual placement of replicas in the cluster is determined by the Replica Placement Strategies.
30. Snitches: Give Cassandra information about the network topology of the cluster. Endpoint snitch: gives information about network topology. Dynamic snitch: monitors read latencies.
31. Endpoint Snitch Implementations: SimpleSnitch (default): efficient for locating nodes in clusters limited to a single data center.
32. Endpoint Snitch Implementations: RackInferringSnitch: extrapolates the topology of the network by analyzing IP addresses. 192.168.191.71 and 192.168.191.21 are in the same rack; 192.168.191.71 and 192.168.171.21 are in the same datacenter; 192.78.19.71 and 192.18.11.21 are in different datacenters.
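The examples on this slide follow from treating the second octet of an IPv4 address as the datacenter and the third octet as the rack. A minimal sketch of that rule; the function names are illustrative, not Cassandra's API.

```python
def infer_location(ip):
    """Return (datacenter, rack) inferred from the 2nd and 3rd octets."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return octets[1], octets[2]


def relation(ip_a, ip_b):
    """Describe how two nodes relate under the octet-based rule."""
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different datacenters"
    if rack_a != rack_b:
        return "same datacenter"
    return "same rack"
```

The rule only works if the address plan was laid out to match the physical topology, which is why the file-based snitch exists as an alternative.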
33. Endpoint Snitch Implementations: PropertyFileSnitch: determines the location of nodes from a user-defined description of the network stored in the property file cassandra-topology.properties.
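Entries in cassandra-topology.properties take the form `ip=datacenter:rack`, with an optional `default` entry for unlisted nodes. The sketch below parses such a file and looks a node up; the sample addresses and the DC/rack names are made up for illustration, and the parser is a simplification rather than Cassandra's own loader.

```python
# Sample file content (addresses and names are assumptions for illustration).
SAMPLE = """\
# ip=datacenter:rack
192.168.191.71=DC1:RAC1
192.168.171.21=DC1:RAC2
192.78.19.71=DC2:RAC1
default=DC1:RAC1
"""


def parse_topology(text):
    """Parse 'node=DC:RACK' lines into {node: (dc, rack)}."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        node, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        topology[node.strip()] = (dc.strip(), rack.strip())
    return topology


def locate(topology, ip):
    """Return (dc, rack) for ip, falling back to the 'default' entry."""
    return topology.get(ip, topology.get("default"))


topology = parse_topology(SAMPLE)
```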
37. Write properties: No reads. No seeks. Fast. Atomic within a ColumnFamily. Always writable.
38. Read properties: Reads multiple SSTables. Slower than writes (but still fast). Seeks can be mitigated with more RAM. Scales to billions of rows.
39. Commit Log durability: The durability settings mirror those of PostgreSQL. Periodic sync of the commit log: carries some risk of data loss. Batch sync of the commit log: a write is acknowledged only after the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
40. Gossip protocol: Intra-ring communication. Runs periodically. Used for failure detection, hinted handoffs, and exchange of node state.
41. Gossip protocol: org.apache.cassandra.gms.Gossiper. Keeps the list of nodes that are alive and dead. Chooses a random node and starts a "chat" with it. One gossip round requires three messages. Failure detection uses a suspicion level to decide whether a node is alive or dead.
44. Tombstones: Data is not deleted immediately. Deleted values are marked with a tombstone. Tombstones are removed during a later compaction. GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone.
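The tombstone lifecycle can be sketched as follows: a delete writes a marker, and compaction drops the marker only once the grace period has passed. The cell layout and helper names are hypothetical; 864000 seconds (10 days) is assumed as the grace period.

```python
GC_GRACE_SECONDS = 864000  # 10 days, assumed here as the grace period


def delete(row, column, now):
    """Mark a column as deleted instead of removing it."""
    row[column] = {"tombstone": True, "deleted_at": now}


def compact(row, now, gc_grace=GC_GRACE_SECONDS):
    """Return the row without tombstones older than the grace period."""
    return {
        name: cell
        for name, cell in row.items()
        if not (cell.get("tombstone") and now - cell["deleted_at"] >= gc_grace)
    }
```

Keeping the marker for a while gives replicas that missed the delete time to learn about it before the evidence is discarded.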
45. Compaction: Merging SSTables into one: merging keys, combining columns, creating a new index. Main aims: free up space; reduce the number of required seeks.
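A rough sketch of the merge step, assuming each SSTable is a mapping of row key to columns and each column holds a `(value, timestamp)` pair; when the same column appears in several SSTables, the newest timestamp wins. This illustrates the idea only and does not reflect the on-disk format.

```python
def merge_sstables(sstables):
    """Merge SSTables into one, keeping the newest version of each column."""
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            row = merged.setdefault(key, {})  # merging keys
            for name, (value, ts) in columns.items():
                # combining columns: the highest timestamp wins
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)
    # SSTables are sorted by key, so return the rows in key order.
    return dict(sorted(merged.items()))
```

After the merge a read touches one file instead of several, which is where the saving in seeks comes from.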
46. Compaction: Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size. Major: merges all SSTables; done manually through nodetool compact; discards tombstones.
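One way to picture the minor-compaction trigger is to bucket SSTables by size and compact any bucket that reaches the threshold. The sketch below assumes "similar size" means within 0.5x to 1.5x of a bucket's average; those bounds and the function names are assumptions for illustration.

```python
MIN_THRESHOLD = 4  # the slide's default N


def buckets_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if low * avg <= size <= high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found, start a new one
    return buckets


def minor_compaction_candidates(sizes, threshold=MIN_THRESHOLD):
    """Return the buckets that hold enough SSTables to be compacted."""
    return [b for b in buckets_by_size(sizes) if len(b) >= threshold]
```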
48. Anti-entropy: During a major compaction the node exchanges Merkle trees (hashes of its data) with other nodes. If the trees don't match, the differing data is repaired. Nodes maintain a timestamp index and exchange only the most recent updates.
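A Merkle tree lets two nodes compare a single root hash and only look further when the roots differ. The sketch below builds a tree over a list of data ranges and reports which ranges disagree; SHA-256 and a power-of-two number of ranges are assumptions made to keep the example short.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_tree(ranges):
    """Return tree levels, leaf hashes first and the root level last."""
    n = len(ranges)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of ranges")
    level = [_h(r) for r in ranges]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_ranges(tree_a, tree_b):
    """Return indexes of ranges that differ; empty if the roots match."""
    if tree_a[-1] == tree_b[-1]:
        return []  # identical roots: nothing to repair
    return [i for i, (a, b) in enumerate(zip(tree_a[0], tree_b[0])) if a != b]
```

A fuller version would descend from the root and skip matching subtrees; comparing the leaves directly keeps the sketch compact.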
49. Read repair: During a read operation, replicas with stale values are brought up to date. Weak consistency level (ONE): repair happens after the data is returned. Strong consistency level (QUORUM, ALL): repair happens before the data is returned.
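The idea behind read repair can be sketched with replicas modeled as dictionaries of `key -> (value, timestamp)`: the read picks the newest version and writes it back to any replica that holds an older one or none. The model and the function name are hypothetical, and the sketch ignores whether the repair runs before or after the response is sent.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and update stale replicas."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    newest = max(present, key=lambda cell: cell[1])  # highest timestamp wins
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest  # bring the stale replica up to date
    return newest[0]
```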
50. Bloom filters: A bit array. Tests whether a value is a member of a set. Reduces disk access (improves performance).
51. Bloom filters: On write: several hashes are generated per key; the bit for each hash is set. On read: hashes are generated for the key; if all bits for these hashes are set, the key may exist in the SSTable; if at least one bit is not set, the key has never been written to the SSTable.
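The write and read steps above map directly onto a small class. This is a minimal sketch, assuming SHA-256 salted with an index as the hash family and a fixed 1024-bit array; a real filter would size both from the expected key count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """On write: set the bit for each hash of the key."""
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        """On read: False means definitely absent, True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))
```

A `False` answer lets a read skip the SSTable entirely; a `True` answer still requires checking the SSTable, because different keys can set the same bits.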
53. Resources: Home of the Apache Cassandra project: http://cassandra.apache.org/; Apache Cassandra Wiki: http://wiki.apache.org/cassandra/; Documentation provided by DataStax: http://www.datastax.com/docs/0.8/; Good explanation of creating secondary indexes: http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html; Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010, ISBN: 978-1-449-39041-9
An endpoint snitch can be wrapped with a dynamic snitch, which monitors read latencies and avoids reading from hosts that have slowed down (due to compaction, for instance).