A Comprehensive Introduction to Apache Cassandra.
Agenda:
- What is NoSQL?
- What is Cassandra?
- Architecture
- Data Model
- Key Features and Benefits
- Cassandra Tools
-- CQL
-- Nodetool
-- DataStax OpsCenter
- Who’s using Cassandra?
2. Agenda
● What is NoSQL?
● What is Cassandra?
● Architecture
● Data Model
● Key Features and Benefits
● Hardware
● Directories and Files
● Cassandra Tools
○ CQL
○ Nodetool
○ DataStax OpsCenter
● Backup and Restore
● Who’s using Cassandra?
3. What is NoSQL?
● NoSQL (Not Only SQL)
● Simplicity of Design
● Horizontal Scaling (Scale Out)
○ Add as many nodes to the cluster as you need
○ Not supported by all NoSQL databases
● Finer Control over availability
● Data Structure
○ Key-Value
○ Column-Oriented
○ Graph
○ Document-Oriented
○ And others
4. What is Cassandra?
● Since 2008 - Current stable version 2.1.2 (Nov 2014)
● NoSQL
● Distributed
● Open source
● Written in Java
● High performance
● Extremely scalable
● Fault tolerant (i.e., no single point of failure, SPOF)
5. Architecture Highlights
● Scale out, not up
● Peer-to-Peer, distributed system
○ All nodes the same - masterless with no SPOF
● Online load balancing, cluster growth
● Understanding System/Hardware failures
● Custom data replication to ensure fault tolerance
● CAP theorem (Consistency, Availability, Partition tolerance)
○ You cannot have all three at the same time
○ The tradeoff between consistency and latency is tunable
○ Strong Consistency = Increased Latency
● Nodes communicate with each other through the gossip protocol
7. Architecture of a write
1. A write is first appended to an on-disk commit log (sequential I/O).
2. After it is written to the commit log, the write is sent to the appropriate nodes.
3. Each node that receives the write first records it in its local log, then updates the appropriate Memtables (one per column family).
○ A Memtable is the in-memory representation of the data (before it is flushed to disk as an SSTable).
○ Memtables are flushed to disk when:
■ Out of space
■ Too many keys (128 is default)
■ Time duration (Client provided - no cluster clock)
4. When a Memtable is flushed, two files are written:
○ Data File (SSTable).
○ Index File (SSTable Index)
5. When a commit log has had all its column families pushed to disk, it is deleted.
6. Compaction
○ Periodically, data files are merge-sorted into a new file.
○ Merge keys
○ Combine columns
○ Discard tombstones
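The write path above can be sketched as a toy single-node model. This is only an illustration under simplifying assumptions: `ToyNode`, `flush_threshold` and `TOMBSTONE` are hypothetical names, everything lives in memory, and tombstones are dropped at the first compaction, whereas a real cluster keeps them for a grace period.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # hypothetical marker stored in place of a deleted value


@dataclass
class ToyNode:
    """Toy write path: commit log -> Memtable -> SSTables -> compaction."""
    flush_threshold: int = 3  # hypothetical: flush once this many keys are buffered
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)  # oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value            # in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.write(key, TOMBSTONE)  # a delete is a write of a tombstone

    def flush(self):
        if not self.memtable:
            return
        # The flushed table is sorted by key and never modified afterwards.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        merged = {}
        for table in self.sstables:  # newer tables overwrite older ones
            merged.update(table)
        live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [live] if live else []

    def read(self, key):
        # Newest data wins: Memtable first, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None
```

With `flush_threshold=2`, writing `a` and `b`, then overwriting `a` and deleting `b`, leaves two SSTables; `compact()` merges them into one table that contains only the latest value of `a`.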
8. Data Model
● [Keyspace][ColumnFamily][Key][Column]
● A keyspace is akin to a database in RDBMS
● The keyspace is a row-oriented, column structure
● A column family is similar to an RDBMS table
○ More flexible/dynamic
● A row in a column family is indexed by its key (Primary Key).
○ Cassandra supports up to 2 billion columns per (physical) row.
● Sample code to create keyspace and column family:
○ CREATE KEYSPACE logs WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1} ;
○ CREATE TABLE logs.samples (
node_id text,
metric text,
collection_ts timestamp,
value bigint,
PRIMARY KEY ((node_id, metric), collection_ts)
) WITH CLUSTERING ORDER BY (collection_ts DESC);
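To make the effect of this primary key concrete, here is a minimal in-memory sketch of how `logs.samples` groups and orders rows. `SamplesTable` is a hypothetical class written for illustration; it is not how Cassandra stores data on disk.

```python
from collections import defaultdict


class SamplesTable:
    """Toy model of PRIMARY KEY ((node_id, metric), collection_ts)."""

    def __init__(self):
        # partition key (node_id, metric) -> {clustering key: value}
        self.partitions = defaultdict(dict)

    def insert(self, node_id, metric, collection_ts, value):
        # Rows sharing (node_id, metric) land in the same partition. Writing the
        # same full primary key again overwrites the earlier value (an upsert).
        self.partitions[(node_id, metric)][collection_ts] = value

    def select(self, node_id, metric, limit=None):
        rows = self.partitions.get((node_id, metric), {})
        # CLUSTERING ORDER BY (collection_ts DESC): newest sample first.
        ordered = sorted(rows.items(), reverse=True)
        return ordered[:limit]
```

Because all samples for one node and metric sit in one partition in descending time order, "the latest N samples" is a read of the first N rows of a single partition.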
9. Data Model - Primary Keys
● Primary Keys are unique.
● Single Primary Key
○ PRIMARY KEY(keyColumn)
● Composite Primary Key
○ PRIMARY KEY (myPartitionKey, my1stClusteringKey, my2ndClusteringKey)
● Composite Partitioning Key
○ PRIMARY KEY ((my1stPartitionKey, my2ndPartitionKey), myClusteringKey)
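The three forms differ only in which columns make up the partition key and which are clustering columns. The helper below is a small sketch of that rule; `split_primary_key` is a hypothetical function that takes the text inside `PRIMARY KEY (...)` and assumes plain, unquoted column names.

```python
def split_primary_key(definition):
    """Return (partition_columns, clustering_columns) for a key definition."""
    text = definition.strip()
    if text.startswith("("):
        # Composite partition key: everything inside the inner parentheses.
        close = text.index(")")
        partition = [c.strip() for c in text[1:close].split(",")]
        rest = text[close + 1:]
    else:
        # Otherwise the first column alone is the partition key.
        first, _, rest = text.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering
```

For example, `split_primary_key("(node_id, metric), collection_ts")` yields `(["node_id", "metric"], ["collection_ts"])`.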
10. Data Model - Time-To-Live (TTL)
● Set a TTL on a row
○ INSERT INTO users (id, first, last) VALUES ('abc123', 'saeid', 'zeb')
USING TTL 3600; // expires the data in one hour
● Set a TTL on a column
○ UPDATE users USING TTL 30 SET last = 'zebardast' WHERE id = 'abc123';
● TTL is in seconds.
● A default TTL can also be set at the table level.
● Expired columns/rows are deleted automatically.
● With no TTL specified, columns/values never expire.
● TTL is useful for automatic deletion.
● Re-inserting the same row before it expires overwrites the TTL.
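The TTL rules above can be sketched with a toy store. `TtlStore` is a hypothetical class; the current time is passed in explicitly as `now` (in seconds) so the behaviour is easy to follow, and `None` stands for "no TTL".

```python
class TtlStore:
    """Toy key-value store with per-write TTLs in seconds."""

    def __init__(self, default_ttl=None):
        self.default_ttl = default_ttl  # table-level default; None = never expire
        self.rows = {}                  # key -> (value, expires_at or None)

    def insert(self, key, value, now, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        expires_at = None if ttl is None else now + ttl
        # Re-inserting a key replaces both the value and the old expiry time.
        self.rows[key] = (value, expires_at)

    def get(self, key, now):
        value, expires_at = self.rows.get(key, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # expired data is no longer visible
        return value
```

A row inserted at `now=0` with `ttl=3600` is readable at 3599 and gone at 3600; re-inserting it at 3000 with the same TTL moves the expiry to 6600.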
11. Partitioners - Consistent hashing
● A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
● A partitioner is a function for deriving a token representing a row from its partition key (typically by hashing).
name     email               gender
Saeid    saeid@domain.com    M
Kamyar   kamyar@domain.com   M
Nazanin  nazanin@domain.com  F
Masoud   masoud@domain.com   M

Cassandra assigns a hash value to each partition key:

partition key  Murmur3 hash value
Saeid          -2245462676723223822
Kamyar         7723358927203680754
Nazanin        -6723372854036780875
Masoud         1168604627387940318

Cassandra places the data on each node according to the value of the partition key and the range that the node is responsible for:

Node  Start range           End range             Partition key  Hash value
A     -9223372036854775808  -4611686018427387904  Nazanin        -6723372854036780875
B     -4611686018427387903  -1                    Saeid          -2245462676723223822
C     0                     4611686018427387903   Masoud         1168604627387940318
D     4611686018427387904   9223372036854775807   Kamyar         7723358927203680754
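The range lookup can be sketched with a sorted list of range ends and a binary search. This is a minimal illustration for a four-node ring split into equal ranges; `RING` and `node_for_token` are hypothetical names, and the tokens are passed in directly because Murmur3 is not part of the Python standard library.

```python
from bisect import bisect_left

# (inclusive end of the token range, node) for a four-node ring
RING = [
    (-4611686018427387904, "A"),
    (-1, "B"),
    (4611686018427387903, "C"),
    (9223372036854775807, "D"),
]
RANGE_ENDS = [end for end, _ in RING]


def node_for_token(token):
    """Return the node whose range contains the given 64-bit token."""
    # bisect_left finds the first range whose end is >= token.
    return RING[bisect_left(RANGE_ENDS, token)][1]
```

Feeding in the four hash values listed for Saeid, Kamyar, Nazanin and Masoud gives nodes B, D, A and C respectively.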
12. Key Features and Benefits
● Gigabyte to Petabyte scalability
● Linear performance
● No SPOF
● Easy replication / data distribution
● Multi datacenter and cloud capable
● No need for separate caching layer
● Tunable data consistency
● Flexible schema design
● Data compression
● CQL Language (like SQL)
● Support for key languages and platforms
● No need for special hardware or software
13. Big Data Scalability
● Capable of comfortably scaling to petabytes
● New nodes = linear performance increase
● Add new nodes online
14. No Single Point of Failure
● All nodes the same
○ Peer-to-Peer - masterless
● Customizable replication affords tunable data redundancy
● Read/Write from any node
● Can replicate data among different physical data center racks
15. Easy Replication / Data Distribution
● Transparently handled by Cassandra
● Multi-data center capable
● Exploits all the benefits of Cloud computing
● Able to do Hybrid Cloud/On-Premise setup
16. No Need for Caching Software
● Peer-to-Peer architecture
○ removes need for special caching layer
● The database cluster uses the memory of all participating nodes to cache the data assigned to each node.
● No inconsistencies arise between a memory cache and the database.
17. Tunable Data Consistency
● Choose between strong and eventual consistency
○ Depends on the need
● Can be set on a per-operation basis, for both reads and writes.
● Handles multi-data center operations
● Consistency Level (CL)
○ ALL = all replicas ack
○ QUORUM = a majority of replicas ack (floor(RF / 2) + 1)
○ ONE = only one replica acks
○ Plus more… (see docs)
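A short sketch of how these levels translate into required acknowledgements, and of the usual rule of thumb that reads see the latest write when read acks plus write acks exceed the replication factor (R + W > RF). The function names are hypothetical and only the three levels listed above are covered.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(level, replication_factor):
    """Number of replica acknowledgements needed for a consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(read_level, write_level, replication_factor):
    """True when every read set overlaps every write set: R + W > RF."""
    r = acks_required(read_level, replication_factor)
    w = acks_required(write_level, replication_factor)
    return r + w > replication_factor
```

With RF = 3, QUORUM needs 2 acks, so QUORUM reads combined with QUORUM writes are strongly consistent (2 + 2 > 3), while ONE reads with ONE writes are only eventually consistent (1 + 1 is not greater than 3).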
18. Flexible Schema
● Dynamic schema design
● Handles structured, semi-structured, and unstructured data.
● Counters are supported
● No offline/downtime for schema changes
● Supports primary and secondary indexes
○ Secondary indexes != relational indexes (they are for convenience, not speed)
19. Data Compression
● Uses Google’s Snappy data compression algorithm
● Compresses data on a per column family level
● Internal tests at DataStax show up to 80%+ compression on raw data
● No performance penalty
○ Some increases in overall performance due to less physical I/O
20. Locally Distributed
● Client reads or writes to any node
● Node coordinates with others
● Data read or replicated in parallel
● Replication info
○ Replication Factor (RF): how many copies of your data are kept
○ Each node stores (RF / cluster size) of the cluster's total data, e.g. RF = 3 on 6 nodes means 50% per node.
○ Handy Calculator: http://www.ecyrd.com/cassandracalculator/
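The ownership rule is simple arithmetic; a minimal sketch, assuming tokens are spread evenly across the nodes (`share_per_node` and `data_per_node_gb` are hypothetical helper names):

```python
def share_per_node(replication_factor, cluster_size):
    """Fraction of the cluster's unique data that each node stores."""
    # A node can never hold more than one full copy of the data.
    return min(1.0, replication_factor / cluster_size)


def data_per_node_gb(unique_data_gb, replication_factor, cluster_size):
    """Approximate on-disk data per node, before compression."""
    return unique_data_gb * share_per_node(replication_factor, cluster_size)
```

For 600 GB of unique data with RF = 3 on a 6-node cluster, each node holds half of the data, about 300 GB.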
21. Rack Aware
● Cassandra is aware of which rack (or availability zone) each node resides in.
● It will attempt to place each data copy in a different rack.
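One way to picture rack-aware placement is to walk the ring from the primary replica and prefer nodes in racks that do not hold a copy yet. The sketch below is a simplified illustration of that idea, not Cassandra's actual replication strategy code; `place_replicas` and the ring layout are hypothetical.

```python
def place_replicas(ring, start, replication_factor):
    """Pick replica nodes, spreading copies across racks where possible.

    ring: list of (node, rack) tuples in token order.
    start: index of the node that owns the token (the primary replica).
    """
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen, racks_used = [], set()
    # First pass: take the next node on the ring from each unused rack.
    for node, rack in order:
        if len(chosen) < replication_factor and rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)
    # Second pass: with fewer racks than copies, fall back to reusing racks.
    for node, _ in order:
        if len(chosen) < replication_factor and node not in chosen:
            chosen.append(node)
    return chosen
```

With five nodes spread over racks r1, r1, r2, r2, r3 and RF = 3, the copies land on one node per rack rather than on the first three nodes of the ring.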
22. Data Center Aware
● Active Everywhere - reads/writes in multiple data centers
● Clients write to their local data center
● Data syncs across WAN
● Replication Factor per DC
● Different number of nodes per data center
23. Node Failure
● A single node failure shouldn’t cause requests to fail.
● Replication Factor + Consistency Level = Success
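The relationship between replication factor, consistency level and node failures comes down to a subtraction. A minimal sketch with hypothetical function names, where `required_acks` is the number of replica acknowledgements the chosen consistency level demands:

```python
def tolerated_failures(replication_factor, required_acks):
    """How many replicas of a row may be down while requests still succeed."""
    return replication_factor - required_acks


def operation_succeeds(replication_factor, required_acks, replicas_down):
    """True if enough replicas remain to satisfy the consistency level."""
    return replicas_down <= tolerated_failures(replication_factor, required_acks)
```

With RF = 3, a request that needs 2 acks survives one failed replica, a request that needs all 3 acks does not, and a request that needs 1 ack survives two failed replicas.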
24. Node Recovery
● When a write is performed and a replica node for the row is unavailable, the coordinator stores a hint locally.
● When the node recovers, the coordinator replays the missed writes.
● Note: a hinted write does not count towards the consistency level.
● Note: you should still run repairs across your cluster.
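Hinted handoff as described above can be sketched with a toy coordinator. `Coordinator` is a hypothetical class, replicas are plain dictionaries, and hint expiry is left out.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas  # replica name -> dict acting as its storage
        self.up = set(replicas)   # names of replicas currently reachable
        self.hints = []           # (replica name, key, value) kept locally

    def mark_down(self, name):
        self.up.discard(name)

    def write(self, key, value):
        """Apply the write and return the number of replica acknowledgements."""
        acks = 0
        for name, store in self.replicas.items():
            if name in self.up:
                store[key] = value
                acks += 1
            else:
                # The hint is stored locally and does not count as an ack.
                self.hints.append((name, key, value))
        return acks

    def recover(self, name):
        """Mark a replica as up again and replay the writes it missed."""
        self.up.add(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[name][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining
```

With three replicas and one of them down, a write returns 2 acks and leaves one hint behind; calling `recover` on the failed replica replays the write and clears the hint.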
25. Security in Cassandra
● Internal Authentication
○ Manages login IDs and passwords inside the database.
● Object Permission Management
○ Controls who has access to what and who can do what in the database
○ Uses familiar GRANT/REVOKE from relational systems.
● Client to Node Encryption
○ Protects data in flight to and from a database
26. Hardware
● RAM
○ The more memory a Cassandra node has, the better its read performance.
■ For dedicated hardware, the optimal price-performance sweet spot is 16GB to 64GB; the minimum is 8GB.
■ For virtual environments, the optimal range may be 8GB to 16GB; the minimum is 4GB.
● CPU
○ More cores are better. Cassandra is built with concurrency in mind.
■ For dedicated hardware, 8-core CPU processors are the current price-performance sweet spot.
■ For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace.
● Disk
○ Cassandra tries to minimize random I/O. Use a minimum of 2 disks. Keep the CommitLog and Data (SSTable) on separate spindles. RAID10 or RAID0 as you see fit.
○ XFS or ext4.
● Network
○ Be sure that your network can handle traffic between nodes without bottlenecks.
■ Recommended bandwidth is 1000 Mbit/s (gigabit) or greater.
● More info: Selecting hardware for enterprise implementations...
27. Directories and Files
● Configs
○ The main configuration file for Cassandra
■ /etc/cassandra/cassandra.yaml
○ Java Virtual Machine (JVM) configuration settings
■ /etc/cassandra/cassandra-env.sh
● Data directories
○ /var/lib/cassandra
● Log directory
○ /var/log/cassandra
● Environment settings
○ /usr/share/cassandra
● Cassandra user limits
○ /etc/security/limits.d/cassandra.conf
● More info: Package installation directories...
28. CQL Language
● Very similar to RDBMS SQL syntax
● Create objects via DDL (e.g. CREATE)
● Core DML commands supported: INSERT, UPDATE, DELETE
● Query data with SELECT
● cqlsh, the Python-based command-line client
○ CASSANDRA_PATH/bin/cqlsh
● More info: https://cassandra.apache.org/doc/cql/CQL.html
29. Nodetool
● A command line interface for managing a cluster.
○ CASSANDRA_PATH/bin/nodetool
● Useful commands:
○ nodetool info - Display node info (uptime, load, etc.).
○ nodetool status [keyspace] - Display cluster info (state, load, etc.).
○ nodetool cfstats [keyspace] - Display statistics of column families.
○ nodetool tpstats - Display usage statistics of thread pool.
○ nodetool netstats - Display network information.
○ nodetool repair - Repair one or more column families.
○ nodetool rebuild - Rebuild data by streaming from other nodes (similar to bootstrap).
○ nodetool drain - Flush Memtables to SSTables on disk and stop accepting writes. Useful before a restart to make startup quick.
○ nodetool flush [keyspace [columnfamily]] - Flushes one or more column families from the memtable.
○ nodetool cfhistograms keyspace columnfamily - Display statistic histograms for a given column family.
○ nodetool proxyhistograms - Display statistic histograms for network operations.
○ nodetool help - Display help information!
30. Backup and Restore
● Take Snapshot
○ nodetool snapshot
■ /var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name
○ nodetool clearsnapshot
● Restore Procedure
○ Shut down the node.
○ Clear all files in the commitlog directory (/var/lib/cassandra/commitlog)
○ Delete all *.db files in data_directory_location/keyspace_name/table_name-UUID directory.
○ Locate the most recent snapshot folder in this directory:
■ data_directory_location/keyspace_name/table_name-UUID/snapshots/snapshot_name
○ Copy its contents into this directory:
■ data_directory_location/keyspace_name/table_name-UUID
○ Start the node
■ Restarting causes a temporary burst of I/O activity and consumes a large amount of CPU resources.
○ Run nodetool repair
● More info: Restoring from a Snapshot...
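The file-level steps of the restore procedure (everything between shutting the node down and starting it again) can be sketched as follows. `restore_table` is a hypothetical helper, the paths are supplied by the caller, and it must only be run while the node is stopped; starting the node and running nodetool repair remain manual steps.

```python
import shutil
from pathlib import Path


def restore_table(table_dir, snapshot_name, commitlog_dir):
    """Restore one table directory from a named snapshot.

    Returns the sorted names of the files copied back into table_dir.
    """
    table_dir, commitlog_dir = Path(table_dir), Path(commitlog_dir)
    snapshot_dir = table_dir / "snapshots" / snapshot_name
    if not snapshot_dir.is_dir():
        # Check before deleting anything.
        raise FileNotFoundError(snapshot_dir)

    # Clear all files in the commitlog directory.
    for entry in list(commitlog_dir.iterdir()):
        if entry.is_file():
            entry.unlink()

    # Delete all *.db files in the table directory. The glob is not
    # recursive, so the snapshots/ subdirectory is left untouched.
    for db_file in list(table_dir.glob("*.db")):
        db_file.unlink()

    # Copy the snapshot contents into the table directory.
    restored = []
    for src in snapshot_dir.iterdir():
        if src.is_file():
            shutil.copy2(src, table_dir / src.name)
            restored.append(src.name)
    return sorted(restored)
```

The snapshot itself is copied, not moved, so it stays available if the restore has to be repeated.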
31. DataStax OpsCenter
● Visually create new clusters with a few mouse clicks either on premise or in the cloud
● Add, edit, and remove nodes
● Automatically rebalance a cluster
● Control automatic management services including transparent repair
● Manage and schedule backup and restore operations
● Perform capacity planning with historical trend analysis and forecasting capabilities
● Proactively manage all clusters with threshold and timing-based alerts
● Generate reports and diagnostic reports with the push of a button
● Integrate with other enterprise tools via developer API
● More info: http://www.datastax.com/datastax-opscenter