Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
4. Comparing Cassandra with X
“Can someone quickly explain the
differences between the two? Other than
the fact that MongoDB supports ad-hoc
querying I don't know whats different. It also
appears (using google trends) that MongoDB
seems to be growing while Cassandra is
dying off. Is this the case?”
27th July 2010
http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
5. Comparing Cassandra with X
“They have approximately nothing in
common. And, no, Cassandra is
definitely not dying off.”
28th July 2010
http://comments.gmane.org/gmane.comp.db.cassandra.user/7773
6. Top Tip #1
To use a NoSQL solution effectively, we
need to identify its sweet spot.
7. Top Tip #1
To use a NoSQL solution effectively, we
need to identify its sweet spot.
This means learning about each solution;
how is it designed? what algorithms
does it use?
http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
8. Comparing Cassandra with X
“they say … I can’t decide between this project
and this project even though they look nothing
like each other. And the fact that you can’t
decide indicates that you don’t actually have a
problem that requires them.”
Benjamin Black – NoSQL Tapes (at 30:15)
http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
9. Headline features
1. Elastic
Read and write throughput increases
linearly as new machines are added
http://cassandra.apache.org/
10. Headline features
2. Decentralised
Fault tolerant with no single point of
failure; no “master” node
http://cassandra.apache.org/
11. The dynamo paper
• Consistent hashing
• Vector clocks
• Gossip protocol
• Hinted handoff
• Read repair
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
12. The dynamo paper
[Diagram: a ring of six nodes (#1 to #6) with RF = 3, showing a client and a coordinator node]
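The ring on this slide can be sketched in a few lines of Python. This is a minimal illustration of consistent hashing with a replication factor, not Cassandra's actual partitioner: the `Ring` class, the node names and the use of MD5 as the hash are all assumptions made for the example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring; names are not from Cassandra."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sort nodes by token so the ring can be searched with bisect.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key: str):
        """Return the RF nodes responsible for row_key, walking clockwise."""
        size = len(self.entries)
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(row_key)) % size
        count = min(self.rf, size)
        return [self.entries[(start + i) % size][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4", "node5", "node6"]
ring = Ring(nodes, replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes, always the same ones for this key
```

Adding a seventh node only moves the keys that fall between it and its neighbour on the ring, which is what makes the design elastic.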
13. Headline features
3. Rich data model
Column based, range slices, column slices,
secondary indexes, counters, expiring columns
http://cassandra.apache.org/
14. The big table paper
• Sparse "columnar" data model
• SSTable disk storage
• Append-only commit log
• Memtable (buffer and sort)
• Immutable SSTable files
• Compaction
http://labs.google.com/papers/bigtable-osdi06.pdf
http://www.slideshare.net/geminimobile/bigtable-4820829
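The storage ideas listed on this slide fit together roughly as follows. The sketch below is a toy in-memory model, assuming a hypothetical `TinyStore` class and a tiny memtable limit; it shows only the flow (commit log, memtable, flush to immutable sorted tables, compaction) and is not how Cassandra implements any of it.

```python
class TinyStore:
    """Toy log-structured store; every name here is illustrative."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable sorted snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then buffer in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted, immutable table."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """Check the memtable, then tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all tables into one; newer values overwrite older ones."""
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


store = TinyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
store.compact()
print(store.read("a"), len(store.sstables))
```

Writes never modify an existing table, so the write path is sequential appends only; the cost is paid later, at read time and during compaction.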
15. The big table paper
[Diagram: a column family; each row key maps to a set of columns, and each column has a name and a value]
16. Headline features
4. You're in control
Tunable consistency, per operation
http://cassandra.apache.org/
18. Consistency levels: write operations
Level          Description
ANY            One node, including hinted handoff
ONE            One node
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas
http://wiki.apache.org/cassandra/API#Write
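As a quick worked example of the N/2 + 1 rule, here is a small Python sketch. It assumes N is the replication factor and that the division is integer division; the function names are made up for illustration, and only the single-data-centre levels are covered.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1, reading the division as integer division."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a write (hypothetical helper)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


for rf in (3, 5, 6):
    print(rf, quorum(rf))
```

With a replication factor of 3, a QUORUM write waits for 2 replicas, so it still succeeds when one replica is down.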
19. Consistency levels: read operations
Level          Description
ONE            1st Response
QUORUM         N/2 + 1 replicas
LOCAL_QUORUM   N/2 + 1 replicas in local data centre
EACH_QUORUM    N/2 + 1 replicas in each data centre
ALL            All replicas
http://wiki.apache.org/cassandra/API#Read
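Read and write levels are chosen together. A common rule of thumb, stated here as background rather than something from the slides, is that a read is guaranteed to touch at least one replica holding the latest write when the replicas written plus the replicas read exceed the replication factor. A minimal sketch, with hypothetical function names:

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1, reading the division as integer division."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_replicas: int, read_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(read_overlaps_write(quorum(rf), quorum(rf), rf))  # QUORUM + QUORUM
print(read_overlaps_write(1, 1, rf))                    # ONE + ONE
print(read_overlaps_write(rf, 1, rf))                   # ALL + ONE
```

This is the trade-off behind "tunable consistency": ONE and ONE is fastest but may return stale data, while QUORUM on both sides keeps the overlap and tolerates a failed replica.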
20. Headline features
5. Performant
Well known for high write performance
http://www.datastax.com/docs/1.0/introduction/index#core-strengths-of-cassandra
21. Benchmark*
http://blog.cubrid.org/dev-platform/nosql-benchmarking/
* Add a pinch of salt
22. Recap: headline features
1. Elastic
2. Decentralised
3. Rich data model
4. You’re in control (tunable consistency)
5. Performant
23. A simple ad-targeting application
Some ads
Choose which
ad to show
Our user knowledge
24. A simple ad-targeting application
Allow us to capture user behaviour/data
via “pixels” - placing users into segments
(different buckets)
http://pixel.wehaveyourkidneys.com/add.php?add=foo
25. A simple ad-targeting application
Record clicks and impressions of each
ad; storing data per-ad and per-segment
http://pixel.wehaveyourkidneys.com/adImpression.php?ad=1
http://pixel.wehaveyourkidneys.com/adClick.php?ad=1
26. A simple ad-targeting application
Real-time ad performance
analytics, broken down by segment
(which segments are performing well?)
http://www.wehaveyourkidneys.com/adPerformance.php?ad=1
27. A simple ad-targeting application
Recommendations based on best-
performing ads
(this is left as an exercise for the reader)
28. Additional requirements
• Large number of users
• High volume of impressions
• Highly available – downtime is money
29. A good fit for Cassandra?
Yes!
Big data, high availability and lots of
writes are all good signs that Cassandra
will fit well.
http://www.nosqldatabases.com/main/2010/10/19/what-is-cassandra-good-for.html
30. A good fit for Cassandra?
That said, people use Cassandra for many
other things too, for example:
Highly available HTTP request routing
(tiny data!)
http://blip.tv/datastax/highly-available-http-request-routing-dns-using-cassandra-5501901
31. Top Tip #2
Cassandra is an excellent fit where
availability matters, where there is a lot
of data or where you have a large
number of write operations.
33. Data modeling
Start from your queries, work backwards
http://www.slideshare.net/mattdennis/cassandra-data-modeling
http://blip.tv/datastax/data-modeling-workshop-5496906
34. Data model basics: conflict resolution
Per-column timestamp-based conflict
resolution: the version with the higher
timestamp wins
{ column: foo, value: bar,  timestamp: 1000 }
{ column: foo, value: zing, timestamp: 1001 }
http://cassandra.apache.org/
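A minimal sketch of timestamp-based resolution for the two versions of column foo shown above. The `Column` type and `resolve` function are hypothetical names; how exact timestamp ties are broken is not covered by the slide, so this sketch simply returns the second argument on a tie.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def resolve(a: Column, b: Column) -> Column:
    """Keep the version with the higher timestamp (tie: b, an assumption)."""
    return a if a.timestamp > b.timestamp else b


older = Column("foo", "bar", 1000)
newer = Column("foo", "zing", 1001)
winner = resolve(older, newer)
```

Because resolution is per column, two clients can update different columns of the same row without conflicting at all.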
36. Data model basics: column ordering
Columns ordered at time of
writing, according to Column Family
schema
{ column: zebra,  value: foo, timestamp: 1000 }
{ column: badger, value: foo, timestamp: 1001 }
http://cassandra.apache.org/
37. Data model basics: column ordering
Columns ordered at time of
writing, according to Column Family
schema
{
  badger: foo,
  zebra: foo
}
(with an AsciiType column schema)
http://cassandra.apache.org/
38. Data modeling: user segments
Add user to bucket X, with expiry time Y
Which buckets is user X in?
["users"][<uuid>][<bucketId>] = 1
[CF]     [rowKey][columnName] = value
39. Data modeling: user segments
users Column Family:
[f97be9cc-5255-4578-8813-76701c0945bd][bar] = 1
[f97be9cc-5255-4578-8813-76701c0945bd][foo] = 1
[06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz] = 1
[06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo] = 1
[503778bc-246f-4041-ac5a-fd944176b26d][aaa] = 1
Q: Is user in segment X?
A: Single column fetch
40. Data modeling: user segments
users Column Family:
[f97be9cc-5255-4578-8813-76701c0945bd][bar] = 1
[f97be9cc-5255-4578-8813-76701c0945bd][foo] = 1
[06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][baz] = 1
[06a6f1b0-fcf2-41d9-8949-fe2d416bde8e][zoo] = 1
[503778bc-246f-4041-ac5a-fd944176b26d][aaa] = 1
Q: Which segments is user X in?
A: Column slice fetch
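The two queries can be mimicked with a plain dictionary of rows, purely to show the shape of the data. This is a sketch, not a client call: the `in_segment` and `segments_for` helpers are hypothetical, and sorting the names stands in for the ordering a column slice returns.

```python
# Row key -> {column name: value}, mirroring the sample rows above.
users = {
    "f97be9cc-5255-4578-8813-76701c0945bd": {"foo": 1, "bar": 1},
    "06a6f1b0-fcf2-41d9-8949-fe2d416bde8e": {"zoo": 1, "baz": 1},
    "503778bc-246f-4041-ac5a-fd944176b26d": {"aaa": 1},
}


def in_segment(user_uuid: str, segment: str) -> bool:
    """Single column fetch: does this one column exist in the row?"""
    return segment in users.get(user_uuid, {})


def segments_for(user_uuid: str) -> list:
    """Column slice: every column name in the row, in sorted order."""
    return sorted(users.get(user_uuid, {}))


first_user = "f97be9cc-5255-4578-8813-76701c0945bd"
membership = in_segment(first_user, "foo")
all_segments = segments_for(first_user)
```

Both questions are answered from a single row, so each is one lookup by row key.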
41. Top Tip #3
With column slices, we get the columns
back ordered, according to our schema
We cannot do the same for rows
however, unless we use the Order
Preserving Partitioner
42. Top Tip #4
Don’t use the Order Preserving
Partitioner unless you absolutely have to
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
44. Expiring columns
An expiring column will be automatically
deleted after n seconds
http://cassandra.apache.org/
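A toy model of an expiring column, assuming each column stores its own expiry time and is hidden from reads once that time has passed. The `ExpiringRow` class is hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
class ExpiringRow:
    """One row whose columns each carry their own time-to-live."""

    def __init__(self):
        self._columns = {}  # column name -> (value, expires_at)

    def insert(self, name, value, ttl, now):
        """Store a column that stops being visible ttl seconds after now."""
        self._columns[name] = (value, now + ttl)

    def live_columns(self, now):
        """Return only the columns that have not yet expired."""
        return {
            name: value
            for name, (value, expires_at) in sorted(self._columns.items())
            if expires_at > now
        }


row = ExpiringRow()
row.insert("foo", 1, ttl=3600, now=0)
row.insert("bar", 1, ttl=60, now=0)
after_a_minute = row.live_columns(now=60)
after_an_hour = row.live_columns(now=3600)
```

For the segments use case this means a user simply drops out of a bucket when the column expires, with no clean-up job to write.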
45. Data modeling: user segments
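// Connect to the 'whyk' keyspace on a local node
$pool = new ConnectionPool(
    'whyk', array('localhost')
);
$users = new ColumnFamily($pool, 'users');
// Add the user to a segment; the column expires after $expires seconds
$users->insert(
    $userUuid,            // row key
    array($segment => 1), // column name => value
    NULL,                 // default timestamp
    $expires              // TTL in seconds
);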
Using phpcassa client: https://github.com/thobbs/phpcassa
46. Data modeling: user segments
UPDATE users
USING TTL 3600
SET 'foo' = 1
WHERE KEY = 'f97be9cc-5255-4578-8813-76701c0945bd'
Using CQL
http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language
http://www.datastax.com/docs/1.0/references/cql
47. Top Tip #5
Try to exploit Cassandra’s columnar data
model; avoid read-before-write and
locking by safely mutating individual
columns
48. Data modeling: ad performance
Track overall ad performance; how many
clicks/impressions per ad?
["ads"][<adId>][<stamp>]["click"] = #
["ads"][<adId>][<stamp>]["impression"] = #
[CF] [Row] [S.Col] [Col] = value
Using super columns
49. Top Tip #6
Friends don’t let friends use Super
Columns.
http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
50. Data modeling: ad performance
Try again using regular columns:
["ads"][<adId>][<stamp>-"click"] = #
["ads"][<adId>][<stamp>-"impression"] = #
[CF] [Row] [Col] = value
51. Data modeling: ad performance
ads Column Family:
[1][2011103015-click] = 1
[1][2011103015-impression] = 3434
[1][2011103016-click] = 12
[1][2011103016-impression] = 5411
[1][2011103017-click] = 2
[1][2011103017-impression] = 345
Q: Get performance of ad X between two date/times
A: Column slice against single row specifying a start
stamp and end stamp + 1
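A sketch of why "end stamp + 1" works as the upper bound of the slice. It assumes column names sort as plain strings and uses a half-open range, which gives the same result here because no column is named exactly like the bare end bound. The `column_slice` helper is hypothetical.

```python
import bisect

# One row of the ads column family, as in the sample data above.
row = {
    "2011103015-click": 1,
    "2011103015-impression": 3434,
    "2011103016-click": 12,
    "2011103016-impression": 5411,
    "2011103017-click": 2,
    "2011103017-impression": 345,
}


def column_slice(columns: dict, start: str, finish: str) -> list:
    """Return (name, value) pairs with start <= name < finish, in order."""
    names = sorted(columns)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_left(names, finish)
    return [(name, columns[name]) for name in names[lo:hi]]


# Hours 15 and 16 inclusive: start at stamp 15, finish at stamp 16 + 1.
hours = column_slice(row, "2011103015", "2011103017")
impressions = sum(v for name, v in hours if name.endswith("-impression"))
clicks = sum(v for name, v in hours if name.endswith("-click"))
```

Every column for hour 16 starts with "2011103016-", which sorts after "2011103016" but before "2011103017", so bumping the end stamp by one captures the whole final hour.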
52. Think carefully about your data
This scheme works because I’m
assuming each ad has a relatively short
lifespan. This means that there are lots
of rows and hence the load is spread.
Other options:
http://rubyscale.com/2011/basic-time-series-with-cassandra/
53. Counters
• Distributed atomic counters
• Easy to use
• Not idempotent
http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
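"Not idempotent" can be illustrated with a retry. The sketch below assumes a client that resends a request after a timeout even though the first attempt was actually applied; the two dictionaries and helper functions are hypothetical stand-ins for a regular column and a counter column.

```python
regular_columns = {}
counter_columns = {"impressions": 0}


def set_column(name, value):
    """Writing the same value twice leaves the same result (idempotent)."""
    regular_columns[name] = value


def increment(name, delta=1):
    """Applying the same increment twice counts twice (not idempotent)."""
    counter_columns[name] += delta


# The first attempt succeeds but its acknowledgement is lost,
# so the client retries the same operations once more.
for _attempt in range(2):
    set_column("foo", 1)
    increment("impressions")
```

After the retry the regular column still holds 1, but the counter has moved by 2 for a single real impression, so blind retries of counter updates can over-count.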
54. Data modeling: ad performance
$stamp = date('YmdH');
$ads->add(
$adId, // row key
"$stamp-impression", // column
1 // increment
);
We’ll store performance metrics in hour buckets for graphing.
55. Data modeling: ad performance
UPDATE ads
SET '2011103015-impression'
= '2011103015-impression' + 1
WHERE KEY = '1'
56. Data modeling: performance/segment
We can add another dimension to our
stats so we can break them down by segment.
["ads"][<adId>][<stamp>-<segment>-"click"] = #
[CF]   [Row]   [Col]                       = value
57. Data modeling: performance/segment
ads Column Family:
[1][2011103015-bar-click] = 1
[1][2011103015-bar-impression] = 3434
[1][2011103015-foo-click] = 12
[1][2011103015-foo-impression] = 5411
[1][2011103016-bar-click] = 2
Q: Get performance of ad X between two date/times,
split by segment
A: Column slice against single row specifying a start
stamp and end stamp + 1
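A sketch of turning such a slice into per-segment totals on the client side. It assumes segment names contain no hyphen, so a column name splits cleanly into stamp, segment and metric; the `by_segment` helper is hypothetical.

```python
from collections import defaultdict

# One row of per-segment counters, as in the sample data above.
row = {
    "2011103015-bar-click": 1,
    "2011103015-bar-impression": 3434,
    "2011103015-foo-click": 12,
    "2011103015-foo-impression": 5411,
    "2011103016-bar-click": 2,
}


def by_segment(columns: dict, start: str, finish: str) -> dict:
    """Sum clicks and impressions per segment for start <= name < finish."""
    totals = defaultdict(lambda: {"click": 0, "impression": 0})
    for name in sorted(columns):
        if start <= name < finish:
            _stamp, segment, metric = name.split("-")
            totals[segment][metric] += columns[name]
    return dict(totals)


# Hour 15 only: start at stamp 15, finish at stamp 15 + 1.
hour_15 = by_segment(row, "2011103015", "2011103016")
```

Because the stamp comes first in the column name, the time range is still a single contiguous slice; the segment breakdown is done after the fetch.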
58. Data modeling: performance/segment
$stamp = date('YmdH');
$ads->add(
"$adId-segments", // row key
"$stamp-$segment-impression", // column
1 // incr
);
We’ll store performance metrics in hour buckets for graphing.
59. Data modeling: segment stats
Track overall clicks/impressions per
bucket; which buckets are most clicky?
["segments"][<adId>-"segments"][<stamp>-<segment>-"click"] = #
[CF]        [Row]              [Col]                       = value
60. Recap: Data modeling
• Think about the queries, work
backwards
• Don’t overuse single rows; try to
spread the load
• Don’t use super columns
• Ask on IRC! #cassandra
61. Recap: Common data modeling patterns
1. Using column names with no value
[cf][rowKey][columnName] = 1
62. Recap: Common data modeling patterns
2. Counters
[cf][rowKey][columnName]++
63. And also…
3. Serialising a whole object
[cf][rowKey][columnName] = {
foo: 3,
bar: 11
}
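A sketch of the serialised-object pattern, assuming JSON as the encoding (the slide does not name one). The row is modelled as a plain dictionary and the column name "profile" is hypothetical.

```python
import json

row = {}

# Write: the whole object is stored as the value of a single column.
row["profile"] = json.dumps({"foo": 3, "bar": 11}, sort_keys=True)

# Read: fetch the one column and decode it.
restored = json.loads(row["profile"])

# Update: changing one field means decoding, editing and rewriting the blob.
restored["foo"] += 1
row["profile"] = json.dumps(restored, sort_keys=True)
```

The trade-off is that updating a single field becomes a read-before-write of the whole value, unlike the per-column patterns above.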
64. There’s more: Brisk
Integrated Hadoop distribution (without
HDFS installed). Run Hive and Pig queries
directly against Cassandra
DataStax now offer this functionality in
their “Enterprise” product
http://www.datastax.com/products/enterprise
65. Hive
CREATE EXTERNAL TABLE tempUsers
(userUuid string, segmentId string, value string)
STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
"cassandra.columns.mapping" = ":key,:column,:value",
"cassandra.cf.name" = "users"
);
SELECT segmentId, count(1) AS total
FROM tempUsers
GROUP BY segmentId
ORDER BY total DESC;
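To show what the query computes, here is a rough Python equivalent over a small in-memory copy of the users data. It assumes the ":key,:column,:value" mapping yields one record per (row key, column name, value); the sample rows and short row keys are made up for illustration.

```python
from collections import Counter

# Row key -> {segment: value}; illustrative data only.
users = {
    "user-a": {"foo": 1, "bar": 1},
    "user-b": {"foo": 1},
}

# Flatten each column into a (userUuid, segmentId, value) record.
temp_users = [
    (user_uuid, segment_id, value)
    for user_uuid, columns in users.items()
    for segment_id, value in columns.items()
]

# GROUP BY segmentId, count(1), ORDER BY total DESC.
totals = Counter(segment_id for _, segment_id, _ in temp_users).most_common()
```

The result lists each segment with the number of users currently in it, largest first.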
66. There’s more: Supercharged Cassandra
Acunu have reengineered the entire Unix
storage stack, optimised specifically for
Big Data workloads
Includes instant snapshot of CFs
http://www.acunu.com/products/choosing-cassandra/
70. In conclusion
Hadoop integration means we can
analyse data directly from a Cassandra
cluster
71. In conclusion
Cassandra’s sweet spot is highly
available “big data” (especially time-
series) with large numbers of writes
72. Thanks
Learn more about Cassandra
meetup.com/Cassandra-London
Check out the code https://github.com/davegardnerisme/we-have-your-kidneys
Watch videos from Cassandra SF 2011
http://www.datastax.com/events/cassandrasf2011/presentations