Apache Cassandra is a massively scalable open-source NoSQL database. The amount of data we create and copy annually doubles every two years and is expected to reach 44 zettabytes (44 trillion gigabytes), so sooner or later a DBA will likely be handling a Cassandra database in their shop. This beginner/intermediate-level session takes you through my journey as an Oracle DBA during my first 100 days of administering a Cassandra cluster, with several demos and all the roadblocks and successes I had along the way.
I don't think it's hyperbole when I say that Facebook, Instagram, Twitter & Netflix now define the dimensions of our social & entertainment universe. But what kind of technology engines purr under the hoods of these social media machines?
Here is a tech student's perspective on making the paradigm shift to "Big Data" using innovative models: alphabet blocks, nesting dolls, & LEGOs!
Get info on:
- What is Cassandra (C*)?
- Installing C* Community Version on Amazon Web Services EC2
- Data Modelling & Database Design in C* using CQL3
- Industry Use Cases
At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly, which creates a lot of challenging data needs. We use Cassandra heavily as a general-purpose key-value store. In this presentation, I will talk about how we use Cassandra to serve our critical use cases, the improvements and patches we made to ensure Cassandra can meet our low-latency, high-scalability requirements, and some of the pain points we still have.
About the Speaker
Dikang Gu, Software Engineer, Facebook
I'm a software engineer on the Instagram core infra team, working on scaling Instagram infrastructure, especially on building a generic key-value store based on Cassandra. Prior to this, I worked on the development of HDFS at Facebook. I received a master's degree in Computer Science from Shanghai Jiao Tong University in China.
Cassandra Summit 2014: Apache Cassandra Best Practices at eBay (DataStax Academy)
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Apache Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
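The "Dynamo" half of that description refers to spreading keys across a ring of nodes with consistent hashing. The sketch below illustrates the idea under simplifying assumptions: one token per node (no virtual nodes), MD5 as a stand-in for Cassandra's default Murmur3 hash, and SimpleStrategy-style placement where the extra replicas are simply the next nodes clockwise. `TokenRing`, `token` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Place a key on a 64-bit ring using the first 8 bytes of its MD5 digest.

    Cassandra's default Murmur3Partitioner uses a different hash; MD5 is a stand-in here.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy ring: one token per node (no vnodes), replicas chosen clockwise."""

    def __init__(self, nodes: List[str]):
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf: int) -> List[str]:
        """Return rf distinct nodes, starting with the first node whose token is >= the key's token."""
        if not 1 <= rf <= len(self.ring):
            raise ValueError("replication factor must be between 1 and the node count")
        # bisect_left finds the first token >= the key's token; running off the end wraps to index 0.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42", rf=3))
```

Because a key's position depends only on its hash, adding a node moves only the keys in the range that node takes over, which is the property that lets Dynamo-style clusters grow one node at a time.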
This presentation, given at FOSDEM in 2010, provides a brief summary of Cassandra's history, a high-level overview of the architecture and data model, and showcases some real-life use cases.
Slides for the talk "Cassandra and Spark: Love at First Sight", given at Texas Linux Fest 2015. The talk introduces both Cassandra and Spark and explains how they work together.
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar (DataStax Academy)
We have seen rapid adoption of C* at eBay in the past two years. We have made tremendous efforts to integrate C* into existing database platforms, including Oracle, MySQL, Postgres, MongoDB, XMP, etc. We have also scaled C* to meet business requirements and encountered technical challenges you only see at eBay scale: 100TB of data on hundreds of nodes. We will share our experience with deployment automation, management, monitoring and reporting for both Apache Cassandra and DataStax Enterprise.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ... (DataStax)
Lessons learned from a year spent building a Cassandra cluster across multiple regions, data centers and providers. We will discuss our successes and lessons learned in replication, operations and application development.
About the Speaker
Aaron Ploetz, Lead Technical Architect, Target
Aaron is a Lead Technical Architect at Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on Stack Overflow and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater and an M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
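The grouping step of micro-batching can be sketched as follows. This is a minimal illustration, assuming pending writes are held in memory as `(partition_key, values)` pairs and that each resulting group would then be sent as one single-partition `UNLOGGED BATCH` through a driver (the driver call is omitted). `micro_batches`, the `Write` shape and the cap of 50 are hypothetical choices, not the talk's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical write shape: (partition_key, column_values).
Write = Tuple[Hashable, dict]


def micro_batches(writes: Iterable[Write], max_batch_size: int = 50) -> List[List[Write]]:
    """Group pending writes by partition key, then cap each batch at max_batch_size writes.

    Every returned batch touches exactly one partition, so it could be sent as a
    single-partition UNLOGGED BATCH in one network request.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    by_partition: Dict[Hashable, List[Write]] = defaultdict(list)
    for write in writes:
        by_partition[write[0]].append(write)
    batches: List[List[Write]] = []
    for group in by_partition.values():  # dicts keep first-seen partition order
        for i in range(0, len(group), max_batch_size):
            batches.append(group[i:i + max_batch_size])
    return batches


pending = [("sensor-1", {"v": 1}), ("sensor-2", {"v": 2}), ("sensor-1", {"v": 3})]
for batch in micro_batches(pending):
    print(batch)
```

Mixing partitions in one batch would force the coordinator to fan the request out to several replica sets, which is what grouping by partition key avoids. The size cap keeps any single request small; Cassandra also logs a warning for batches above `batch_size_warn_threshold_in_kb` in `cassandra.yaml`.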
About the Speaker
Adam Zegelin, Technical Co-founder, Instaclustr
As Instaclustr's founding software engineer, Adam provides the foundational knowledge of our capabilities and engineering environment. He delivers business-focused value to our codebase and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
At Signal we've been running Apache Cassandra in production since late 2011. We use a multi-region Cassandra deployment to make our data available globally to our customers. While Cassandra does much of the heavy lifting for us, we've run into interesting challenges during periods of rapid growth. In this presentation we'll focus on one of those scenarios, including our before-and-after data model, the methodology and tools we used to recover, and the lessons learned along the way.
This is a presentation on the popular NoSQL database Apache Cassandra, created by our team for the module "Business Intelligence and Big Data Analysis".
Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show multi-datacenter operations essentials in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his three years of experience managing a multi-datacenter cluster running Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations on a multi-datacenter cluster.
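One piece of background for these multi-datacenter concepts is the quorum arithmetic behind consistency levels. Replication per datacenter is set on the keyspace with `NetworkTopologyStrategy`, for example `{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}` (the DC names are hypothetical). The sketch below computes how many replica acknowledgements each level needs; it is general background, not necessarily the configuration recommended in the talk, and `required_acks` is a hypothetical helper.

```python
from typing import Dict


def quorum(rf: int) -> int:
    """Acknowledgements needed for a quorum of rf replicas: floor(rf / 2) + 1."""
    if rf < 1:
        raise ValueError("replication factor must be >= 1")
    return rf // 2 + 1


def required_acks(rf_per_dc: Dict[str, int], local_dc: str) -> Dict[str, int]:
    """Replica acknowledgements needed per consistency level for one request."""
    return {
        # QUORUM counts replicas across every datacenter together.
        "QUORUM": quorum(sum(rf_per_dc.values())),
        # LOCAL_QUORUM only counts replicas in the coordinator's datacenter.
        "LOCAL_QUORUM": quorum(rf_per_dc[local_dc]),
        # EACH_QUORUM needs a quorum inside every datacenter (total shown).
        "EACH_QUORUM": sum(quorum(rf) for rf in rf_per_dc.values()),
    }


acks = required_acks({"dc1": 3, "dc2": 3}, local_dc="dc1")
print(acks)  # {'QUORUM': 4, 'LOCAL_QUORUM': 2, 'EACH_QUORUM': 4}
```

With RF 3 in each of two datacenters, QUORUM needs 4 acknowledgements but only 3 replicas are local, so every QUORUM request waits on at least one remote replica and pays the cross-datacenter latency. LOCAL_QUORUM needs only 2 local acknowledgements, which is why it is a common choice on high-latency multi-datacenter clusters.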
About the Speaker
Julien Anguenot, VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long-time open source software advocate, contributor and speaker: he is a Zope, ZODB and Nuxeo contributor and a Zope and OpenStack foundations member, and his talks include ApacheCon, Cassandra Summit, OpenStack Summit, The WWW Conference and EuroPython.
Webinar: Getting Started with Apache Cassandra (DataStax)
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at DataStax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
Migration Best Practices: From RDBMS to Cassandra without a Hitch (DataStax Academy)
Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat and voicemail services, regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contact records. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latencies below 10 milliseconds.
Security is often an afterthought, configured and applied at the last minute before rolling out a new system. Instaclustr has deployed Cassandra for customers with many different requirements.
From deployments in Heroku requiring total public access through to private data centres, we will walk you through securing Cassandra the right way.
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra (aaronmorton)
Slides from my talk at Cassandra Summit 2015
http://cassandrasummit-datastax.com/agenda/repeatable-scalable-reliable-observable-cassandra/
thelastpickle.com
Hardening Cassandra for compliance or paranoia (zznate)
How to secure a Cassandra cluster. Includes details on configuring SSL, setting up a certificate authority, and creating certificates and trust chains for the JVM.
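Certificates and trust chains only help if authentication, authorization and encryption are actually switched on. The sketch below is a hypothetical pre-flight check over a `cassandra.yaml` that has already been parsed into a dict (for example with the third-party PyYAML library). It assumes the setting names and defaults of roughly the 2.x to 4.0 era files; newer releases may spell some options differently. `audit_security` is an invented helper, and it does not validate keystores or certificate chains.

```python
from typing import Any, Dict, List


def _short(class_name: str) -> str:
    """Accept both 'PasswordAuthenticator' and 'org.apache.cassandra.auth.PasswordAuthenticator'."""
    return class_name.rsplit(".", 1)[-1]


def audit_security(conf: Dict[str, Any]) -> List[str]:
    """Flag common insecure settings in a parsed cassandra.yaml; an empty list means these checks pass."""
    findings: List[str] = []
    if _short(conf.get("authenticator", "AllowAllAuthenticator")) == "AllowAllAuthenticator":
        findings.append("authenticator: AllowAllAuthenticator (no login required)")
    if _short(conf.get("authorizer", "AllowAllAuthorizer")) == "AllowAllAuthorizer":
        findings.append("authorizer: AllowAllAuthorizer (no permission checks)")
    client = conf.get("client_encryption_options") or {}
    if not client.get("enabled", False):
        findings.append("client_encryption_options.enabled is false (client traffic in clear text)")
    server = conf.get("server_encryption_options") or {}
    if server.get("internode_encryption", "none") == "none":
        findings.append("server_encryption_options.internode_encryption is none (node traffic in clear text)")
    return findings


hardened = {
    "authenticator": "PasswordAuthenticator",
    "authorizer": "CassandraAuthorizer",
    "client_encryption_options": {"enabled": True},
    "server_encryption_options": {"internode_encryption": "all"},
}
print(audit_security({}))        # four findings: the assumed defaults are all open
print(audit_security(hardened))  # []
```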
The Economics of Scaling Cassandra - By Alex Bordei, Techie Product Manager at Bigstep
This presentation was given at the Cassandra Summit 2014 event in London.
We benchmarked Cassandra on a number of configurations and show the resulting scaling profile. We also test Cassandra on Docker, as well as Cassandra's in-memory feature.
Follow Alex on Twitter: @alexandrubordei
Bigstep on Twitter: @BigStepInc
If you have any questions, let us know at hello@bigstep.com and we'll do our best to answer.
Stay informed: http://blog.bigstep.com/
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap... (DataStax Academy)
At Hulu, we deal with scaling our web services to meet the demands of an ever-growing number of users. During this talk, we will discuss our initial use case for Cassandra at Hulu: the video progress tracking service known as hugetop. While Cassandra provides a fantastic platform on which to build scalable applications, it has some dark corners to be cautious of. We will provide a walkthrough of hugetop and some of the design decisions that went into the hugetop keyspace, our hardware choices, and our experiences operating Cassandra in a high-traffic environment.
In the rush to release a new product, a new version or simply to get things working, security can sometimes be an afterthought. In this talk, Ben Bromhead, CTO of Instaclustr, will explore the various ways in which you can set up and secure Cassandra appropriately for your threat environment.
Global Netflix - HPTS Workshop - Scaling Cassandra benchmark to over 1M write... (Adrian Cockcroft)
Presentation given in October 2011 at the High Performance Transaction Systems Workshop http://hpts.ws - describes how Netflix used AWS to run a set of highly scalable Cassandra benchmarks on hundreds of instances in only a few hours.
Cassandra Summit 2015: Real World DTCS For Operators (Jeff Jirsa)
Real World DTCS For Operators
The introduction of DateTieredCompactionStrategy in late 2014 was a significant step forward in providing a viable compaction strategy for time series data, especially time series data that will be TTL'd out. DateTieredCompactionStrategy's introduction was met with genuine excitement, and its rapid adoption is testament to developers' and operators' desire to have data compacted in a way that better matches their write patterns.
However, DateTieredCompactionStrategy's features come with significant limitations. This talk will use our real-world benchmarking and use cases for DTCS as a vehicle to discuss the implications of DateTieredCompactionStrategy for operational tasks such as repair, read-repair, bootstrapping and especially DR recovery scenarios. It will also discuss how those limitations led us to propose an operations-friendly alternative to DateTieredCompactionStrategy.
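Strategies in this family group SSTables by the age of the data they hold rather than by file size. The sketch below shows only the core idea of fixed time windows, which is closer in spirit to TimeWindowCompactionStrategy than to DTCS's tiered windows; it is not the real algorithm of either strategy. `SSTable`, its fields and the helper functions are hypothetical; `min_threshold` mirrors the compaction subproperty of the same name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SSTable:
    name: str
    max_timestamp_s: int  # newest write in the file, seconds since the epoch (hypothetical field)


def window_start(ts: int, window_s: int) -> int:
    """Align a timestamp to the start of its fixed-size window."""
    return ts - ts % window_s


def bucket_by_window(sstables: List[SSTable], window_s: int) -> Dict[int, List[SSTable]]:
    """Group SSTables by the window containing their newest write."""
    buckets: Dict[int, List[SSTable]] = defaultdict(list)
    for s in sstables:
        buckets[window_start(s.max_timestamp_s, window_s)].append(s)
    return dict(buckets)


def compaction_candidates(sstables: List[SSTable], window_s: int, min_threshold: int = 4) -> List[List[SSTable]]:
    """Windows holding at least min_threshold SSTables, oldest first.

    Files are only ever compacted with files from the same window, so data from
    different time ranges never ends up mixed in one SSTable.
    """
    buckets = bucket_by_window(sstables, window_s)
    return [buckets[start] for start in sorted(buckets) if len(buckets[start]) >= min_threshold]


DAY = 86_400
tables = [SSTable(f"day0-{i}", i * 3600) for i in range(4)] + [SSTable("day1-0", DAY + 60)]
print([[s.name for s in group] for group in compaction_candidates(tables, DAY)])
```

Keeping windows separate is what makes TTL'd time series cheap to expire: once every cell in a window's files has expired, those files can in principle be dropped whole instead of being rewritten.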
Cassandra Summit 2015 - A Change of Seasons (Eiti Kimura)
A CHANGE OF SEASONS: A big move to Apache Cassandra!
This is an extended version of the material presented at Cassandra Summit 2015 - Santa Clara - California - USA.
In this presentation I will show you three moves (use cases) that constitute our Big Move to Apache Cassandra at Movile.
We walk through the move from a relational model to a NoSQL solution, hybrid platforms, and a staggering cost reduction and throughput increase.
AddThis: Scaling Cassandra up and down into containers with ZFS (DataStax Academy)
ZFS is an advanced file system, RAID and volume manager originally developed by Sun Microsystems. Billed as 'The Last Word in File Systems', it was unavailable on Linux until recently. AddThis uses ZFS to scale up dedicated hardware more effectively, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
Ficstar Software: Cassandra Installation to Optimization (DataStax Academy)
A rules-of-thumb talk aimed at late bloomers, managers, directors and architects who have yet to adopt Cassandra.
Covers:
- what not to do.
- operational setup
- data modeling
- performance tuning
- capacity planning
- advanced use cases
What should you do in the First 90 Days as a Sales Manager or VP? Brett Wallace, VP of Sales for Zoominfo, gives 10 high-impact things to focus on to ramp up quickly. A must read for newly promoted Sales VPs and Managers...or aspiring ones!
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S... (Helena Edelson)
Regardless of the meaning we are searching for in our vast amounts of data, whether we are in science, finance, technology, energy or health care, we all share the same problems that must be solved: How do we achieve that? What technologies best support the requirements? This talk is about how to leverage fast access to historical data together with real-time streaming data for predictive modeling in a lambda architecture built with Spark Streaming, Kafka, Cassandra, Akka and Scala. Topics include efficient stream computation, composable data pipelines, data locality, the Cassandra data model and low latency, and Kafka producers and HTTP endpoints as Akka actors.
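The core of a lambda architecture is a serving step that combines a periodically recomputed batch view with an incremental real-time view. In the stack described above, Spark batch jobs would produce the first, Spark Streaming reading from Kafka the second, and Cassandra would store both; the sketch replaces all of that with in-memory `Counter` stand-ins. The page-view counting example and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def batch_view(history: Iterable[str]) -> Counter:
    """Batch layer: recompute page-view counts from the complete historical log."""
    return Counter(history)


def speed_view(recent: Iterable[str]) -> Counter:
    """Speed layer: incremental counts for events that arrived after the last batch run."""
    return Counter(recent)


def serve(batch: Mapping[str, int], realtime: Mapping[str, int]) -> Counter:
    """Serving layer: combine the two views to answer a query."""
    merged = Counter(batch)
    merged.update(realtime)  # Counter.update adds counts rather than replacing them
    return merged


result = serve(batch_view(["home", "about", "home"]), speed_view(["home", "pricing"]))
print(result["home"], result["pricing"])
```

When the next batch run absorbs the recent events, the speed view for that period is discarded, so any error in the incremental path is eventually corrected by recomputation.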
Transitions are a critical time for leaders at all levels. Missteps made during the crucial first three months in a new role can jeopardize your success.
In this updated and expanded version of the international bestseller, Michael D. Watkins offers proven strategies for conquering the challenges of taking on a new role — no matter where you are in your career. Watkins, a noted expert on leadership transitions, also addresses today’s increasingly demanding professional landscape, where managers face more frequent changes and steeper expectations when they start their new jobs.
Whether you’re starting a new job, being promoted from within, or embarking on an overseas assignment, this is the guide you’ll need to succeed in your first 90 days — and beyond.
Introduction to Real Application Cluster
RAC - Savior of DBA
Oracle Clusterware (Platform on Platform)
RAC Startup sequence
RAC Architecture
RAC Components
Single Instance on RAC
Node Eviction
Important Log directories in RAC.
Tips to monitor and improve the RAC environment.
Leveraging Cassandra for real-time multi-datacenter public cloud analytics (Julien Anguenot)
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter... (DataStax Academy)
Streaming Analytics with Spark, Kafka, Cassandra and Akka (Helena Edelson)
This talk will address how a new architecture is emerging for analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK). Popular architectures like Lambda separate layers of computation and delivery and require many technologies with overlapping functionality. This can result in duplicated code, untyped processes, or high operational overhead, not to mention the cost (i.e. ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture, application design and code to begin leveraging simpler data flows. We will cover how this particular set of technologies addresses common requirements and how the pieces work together to enrich and reinforce each other.
One of our presentations, given on the Cassandra database. Aruman implements big-data projects for multiple clients, and RDBMS-to-Cassandra conversion is one of the tasks Aruman takes on.
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat... (DataStax Academy)
- Quick review of Cassandra functionality that applies to this use case
- Common Data Center and application architectures for highly available inventory applications, and why they were designed that way
- Cassandra implementations vis-a-vis infrastructure capabilities
- The impedance mismatch: compromises made to fit into IT infrastructures designed and implemented with an old mindset
SMACK Stack 1.0 has been Spark, Mesos, Akka, Cassandra and Kafka working into different cohesive systems delivering different solutions for different use cases. Haven't heard about it before? Oh man! Where have you been? https://www.google.com/search?q=smack+stack+1.0
With SMACK Stack 1.1 we go a step further: Streaming, Mesos, Analytics, Cassandra and Kafka. Joe Stein will walk through in detail some of the viable options for streaming and analytics with Mesos, Kafka and Cassandra.
Data Lake and the rise of the microservices (Bigstep)
By simply looking at structured and unstructured data, Data Lakes enable companies to understand correlations between existing and new external data - such as social media - in ways traditional Business Intelligence tools cannot.
For this you need to find out the most efficient way to store and access structured or unstructured petabyte-sized data across your entire infrastructure.
In this meetup we’ll answer the following questions:
1. Why would someone use a Data Lake?
2. Is it hard to build a Data Lake?
3. What are the main features that a Data Lake should bring in?
4. What’s the role of the microservices in the big data world?
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft (DataStax Academy)
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of the competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with customer experience strategies.
Introduction to DataStax Enterprise Graph Database (DataStax Academy)
DataStax Enterprise (DSE) Graph is built to manage, analyze and search highly connected data. Built on Apache Cassandra, DSE Graph delivers continuous uptime along with predictable performance, and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra (DataStax Academy)
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
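A hub-and-spoke design that tolerates network downtime relies on spokes buffering writes while the link is down and forwarding them in order once it returns. The store-and-forward sketch below illustrates only that general idea; it is not DSE Advanced Replication's implementation or API. `EdgeOutbox`, `send_to_hub` and the write shape are hypothetical, and a real spoke would persist its queue to disk rather than hold it in memory.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Item = Tuple[str, dict]  # (key, values); hypothetical write shape


class EdgeOutbox:
    """Spoke-side buffer: writes queue locally and flow one way to the hub, in order."""

    def __init__(self, send: Callable[[Item], bool]):
        self.send = send  # returns True once the hub has acknowledged the item
        self.pending: Deque[Item] = deque()

    def write(self, key: str, values: dict) -> None:
        self.pending.append((key, values))

    def flush(self) -> int:
        """Forward queued writes until the hub stops acknowledging; return how many were delivered."""
        delivered = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # link down: keep the item at the head and retry on the next flush
            self.pending.popleft()
            delivered += 1
        return delivered


hub: list = []
link_up = False


def send_to_hub(item: Item) -> bool:
    if link_up:
        hub.append(item)
    return link_up


outbox = EdgeOutbox(send_to_hub)
outbox.write("store-7", {"sales": 12})
print(outbox.flush(), len(outbox.pending))  # link down: nothing delivered yet
link_up = True
print(outbox.flush(), hub)
```

Removing an item only after the hub acknowledges it means a dropped connection can at worst cause a resend, never a silent loss, so the hub side should treat writes as idempotent.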
Data modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
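Query-driven modeling starts from the query and designs one table to answer it. For IoT time series, a common pattern is to add a time bucket to the partition key so no partition grows without bound. The sketch below assumes daily buckets in UTC; the table, columns and `partition_key` helper are hypothetical and not necessarily the design shown in the talk. The CQL is kept as a string for illustration, and the Python helper computes which partition a reading lands in.

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical table designed for one query:
# "all readings for one sensor on one day, newest first".
SCHEMA = """
CREATE TABLE readings_by_sensor_day (
    sensor_id text,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Bucket a reading into its (sensor, UTC day) partition.

    Adding the day to the partition key keeps any single partition from growing
    without bound as a sensor keeps reporting.
    """
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware timestamps")
    return sensor_id, ts.astimezone(timezone.utc).date().isoformat()


print(partition_key("sensor-17", datetime(2016, 7, 1, 23, 59, tzinfo=timezone.utc)))
```

The trade-off of this design is that a query spanning several days must read several partitions, so the bucket size is usually chosen from the expected write rate and the typical query range.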
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might also run into in practice.
In the second part of this talk, we'll dive into how to use the DataStax Java driver most effectively. We'll dig into how the driver is architected and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly part 1 (DataStax Academy)
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
This talk covers scaling Cassandra to a fast growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the playstation community.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
3. About Pythian
• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 400 DBAs in 35 countries
• Top 5% of DBA work force: 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL and DataStax partners; Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle Apps
4. Where does René come from?
– Oracle DBA
• Started with version 9.2 in 2004
– Speaker at Oracle OpenWorld, Developers Day and Collaborate
– Apress, Q1 2016: "Practical Data Refresh"
– Movie fanatic & music lover
– Bringing the best from México (Mexihtli) to the rest of the world, and in the process photographing it :)
– rene-ace.com
– @rene_ace
5. Where does Carlos come from?
• Cassandra consultant
• First contact was version 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Water polo is my sport
• @cjrolo
7. 6th Happiest Job of 2014!
http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/
• Work-life balance
• Relationship with boss and co-workers
• Daily tasks
• Job resources
• Field will grow by 15% between 2012 and 2022
• DBA can be the key driver of success
8. Happiest Job of 2034?
Oxford University: THE FUTURE OF EMPLOYMENT: HOW SUSCEPTIBLE ARE JOBS TO COMPUTERISATION?
• 47 percent of American jobs are at high risk of being taken by computers within the next two decades.
– 1st wave
• Computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support.
– 2nd wave
• Dependent upon the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.
9. What is Cassandra?
• NoSQL database, developed in Java
• Fully distributed DB
– Meaning that there is no master DB, unlike Oracle or MySQL.
• Linearly scalable
• Based on 2 core technologies: Google's Bigtable and Amazon's Dynamo
• 2 versions of Cassandra
– Community Edition: distributed under the Apache™ License
– Enterprise Edition: distributed by DataStax
10. CAP Theorem
• In a distributed system you can only have two out of the following three guarantees across a write/read pair:
– Consistency: a read is guaranteed to return the most recent write for a given client.
– Availability: a non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
– Partition tolerance: the system will continue to function when network partitions occur.
11. What is Cassandra?
• Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system
• Not an ACID (Atomicity, Consistency, Isolation, Durability) type system
12. It can be as easy as…
• Start your machine and install the following:
– ntp (packages are normally ntp, ntpdate and ntp-doc)
– wget (unless you have your packages copied over via other means)
– vim (or your favorite text editor)
• Yum package management
• Root or sudo access to the install machine
• Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7
• Python 2.6+ (needed if installing OpsCenter)
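Before installing, the prerequisite list above can be checked from a script. This is only an illustrative sketch: it assumes the usual executable name for each item (ntpd for the ntp package, java, python and so on), which may differ on your distribution, and it only tests that each command is on the PATH, not that Java or Python is the right version. The names `PREREQ_COMMANDS` and `check_prereqs` are hypothetical.

```python
import shutil

# Assumed mapping from each prerequisite on the slide to the executable it
# usually provides on a yum-based Linux host; adjust for your distribution.
PREREQ_COMMANDS = {
    "ntp": "ntpd",
    "wget": "wget",
    "vim": "vim",
    "yum": "yum",
    "java": "java",
    "python": "python",
}


def check_prereqs(commands=PREREQ_COMMANDS):
    """Return {prerequisite: True/False} depending on whether its command is on PATH."""
    return {name: shutil.which(cmd) is not None for name, cmd in commands.items()}


for name, present in check_prereqs().items():
    print(f"{name:8} {'ok' if present else 'MISSING'}")
```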
13. It can be as easy as…
• Install Cassandra.
~$ sudo yum install dsc21-2.1.5-1 cassandra21-2.1.5-1
• Install optional utilities.
~$ sudo yum install cassandra21-tools-2.1.5-1
• Make sure the Cassandra service is stopped and clear the default system data
~$ sudo service cassandra stop
~$ sudo rm -rf /var/lib/cassandra/data/system/*
• In the cassandra-rackdc.properties file (read by GossipingPropertyFileSnitch), set the data center and rack for this node:
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1
• Start the Cassandra service
~$ sudo service cassandra start
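A typo in cassandra-rackdc.properties puts the node in the wrong data center or rack, so it can be worth validating the file before starting the service. This minimal sketch handles only simple `key=value` lines and `#` comments; the full Java properties syntax also allows `:` separators and line continuations, which it does not support. The function names are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_rackdc(text):
    """Return (dc, rack), or raise if either key is missing or empty."""
    props = parse_properties(text)
    missing = [k for k in ("dc", "rack") if not props.get(k)]
    if missing:
        raise ValueError(f"cassandra-rackdc.properties is missing: {', '.join(missing)}")
    return props["dc"], props["rack"]


SAMPLE = """\
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1
"""
print(check_rackdc(SAMPLE))  # ('Pythian', 'RAC1')
```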
14. Where is everything in Cassandra?
Directory | Description
/var/lib/cassandra | Data directories
/var/log/cassandra | Log directory
/var/run/cassandra | Runtime files
/usr/share/cassandra | Environment settings
/usr/share/cassandra/lib | JAR files
/usr/bin | Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit
/usr/bin, /usr/sbin | Binary files
/etc/cassandra | Configuration files
/etc/init.d | Service startup script
/etc/security/limits.d | Cassandra user limits
/etc/default
/usr/share/doc/cassandra/examples | Sample cassandra.yaml files for stress testing
15. I come from this world…
[Diagram: Oracle 12c version architecture]
16. I come from this world…
Oracle…
[Diagram: Oracle storage structures. Logical: database, tablespace, schema, segment, extent, Oracle data block. Physical: data files, control files, online redo log, OS block.]
17. I come from this world…
RAC: for a node as a single point of failure
[Diagram: three-node RAC cluster (Node1, Node2, Node3) with ASM and database instances on shared ASM disks, connected by public, storage, ASM and CSS networks]
Global Data Services – service failover / load balancing
18. I come from this world…
Data Guard: for failover
[Diagram: primary database shipping redo SYNC to a Far Sync instance, which forwards it ASYNC to the standby, for zero data loss failover]
20. One Ring to Rule them All
• The total amount of data managed by the cluster is represented as a ring
• Each node is assigned a part of the database to hold, based on the token of each row's partition key (the first part of the table's primary key).
• To guarantee both availability and durability, multiple nodes will be assigned the same data.
• There is no master node; all nodes can perform all operations
[Diagram: a ring of four nodes; each node holds three of the four key ranges A-F, G-L, M-S and T-Z (A-F/T-Z/M-S, G-L/A-F/T-Z, M-S/G-L/A-F, T-Z/M-S/G-L), so every range is stored on three nodes]
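The four range sets in the ring diagram are what you get with three copies of each range placed SimpleStrategy-style: the first copy on the node that owns the range, the other two on the next nodes clockwise. The sketch below reproduces them, assuming (hypothetically) that node 1 owns A-F, node 2 owns G-L, and so on. Letter ranges stand in for real token ranges, and the function names are illustrative only.

```python
# Illustrative key ranges in ring order (real Cassandra uses token ranges, not letters).
# Assumption: node 1 is the primary owner of A-F, node 2 of G-L, and so on.
RANGES = ["A-F", "G-L", "M-S", "T-Z"]


def replicas(range_index, num_nodes, rf):
    """Primary owner of a range plus the next rf - 1 nodes clockwise (nodes are 1-based)."""
    return [(range_index + step) % num_nodes + 1 for step in range(rf)]


def ranges_held(rf=3):
    """Map each node to the key ranges it stores copies of."""
    num_nodes = len(RANGES)
    held = {node: [] for node in range(1, num_nodes + 1)}
    for i, key_range in enumerate(RANGES):
        for node in replicas(i, num_nodes, rf):
            held[node].append(key_range)
    return held


def range_index_for(key):
    """Find which letter range a key's first character falls into."""
    first = key[0].upper()
    for i, key_range in enumerate(RANGES):
        low, high = key_range.split("-")
        if low <= first <= high:
            return i
    raise ValueError(f"no range for key {key!r}")


for node, held in ranges_held().items():
    print(node, held)
print("Smith is stored on nodes", replicas(range_index_for("Smith"), len(RANGES), 3))
# Smith is stored on nodes [3, 4, 1]
```

Because every node also holds the two ranges preceding its own, any single node can fail and each range still has two live copies.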
21. Gossip
• Peer-to-peer communication protocol in which nodes periodically exchange state information
• Runs every second and exchanges state messages with up to three other nodes in the cluster
• Failure detection
– It determines locally from gossip state and history if another node in the system is down or has come back up.
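Gossip and local failure detection can be illustrated with a small round-based simulation. This is a simplified model, not Cassandra's implementation: each node keeps a heartbeat version per peer, each round it does a push-pull exchange with up to three random live peers, and a peer is suspected down once its heartbeat has not advanced for a few rounds. Cassandra's actual failure detector is the phi accrual detector, which works from heartbeat inter-arrival times rather than a fixed timeout. All class and function names here are hypothetical.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.heartbeat = 0
        self.view = {name: 0}         # peer -> highest heartbeat version seen
        self.last_change = {name: 0}  # peer -> round in which that version last advanced

    def beat(self, now):
        self.heartbeat += 1
        self.view[self.name] = self.heartbeat
        self.last_change[self.name] = now

    def merge(self, remote_view, now):
        # Keep whichever heartbeat version of each peer is newer.
        for peer, version in remote_view.items():
            if version > self.view.get(peer, -1):
                self.view[peer] = version
                self.last_change[peer] = now

    def suspects(self, now, timeout):
        """Peers whose heartbeat has not advanced for more than `timeout` rounds."""
        return sorted(peer for peer, seen in self.last_change.items()
                      if peer != self.name and now - seen > timeout)


def gossip_round(nodes, rng, now, fanout=3):
    live = [n for n in nodes if n.alive]
    for node in live:
        node.beat(now)
    for node in live:
        peers = [n for n in live if n is not node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            peer.merge(node.view, now)  # push our state to the peer
            node.merge(peer.view, now)  # pull the peer's state back


def simulate(num_nodes=10, seed=7):
    rng = random.Random(seed)
    nodes = [Node(f"node{i}") for i in range(num_nodes)]
    now = 0
    for _ in range(5):
        now += 1
        gossip_round(nodes, rng, now)
    converged = all(len(n.view) == num_nodes for n in nodes)
    nodes[-1].alive = False  # the last node goes down and stops gossiping
    for _ in range(8):
        now += 1
        gossip_round(nodes, rng, now)
    return nodes, now, converged


nodes, now, converged = simulate()
print("all nodes learned about each other:", converged)
print("node0 suspects:", nodes[0].suspects(now, timeout=3))
```

No node asks a master who is up: node0 reaches its verdict about node9 purely from the gossip state it holds locally.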
22. Consistent Hashing
• A hash function applies one or more arithmetic operations to a piece of data to produce a fixed-size value
• Common way of load balancing across several nodes
• The hash function's output must have an upper and lower bound, so the range can wrap around into a circle onto which both objects and nodes are mapped
• Common hash algorithms
– Simple checksums
– Message Digest (MD5)
– Secure Hash Algorithm (SHA-1/2)
– MurmurHash
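A minimal consistent hashing ring shows why this approach balances load well when nodes come and go: adding a node only takes keys from one neighbour and leaves every other key where it was. The sketch uses MD5 only because it is in Python's standard library (Cassandra's default partitioner uses MurmurHash3), gives each node a single token, and uses hypothetical class and node names.

```python
import bisect
import hashlib


def token(key):
    """Hash a key onto the ring: MD5 gives a 128-bit value in [0, 2**128)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes=()):
        self._tokens = []  # sorted node tokens around the circle
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owner[t] = node

    def node_for(self, key):
        # A key belongs to the first node token at or after its own token,
        # wrapping around to the lowest token past the top of the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


keys = [f"user{i}" for i in range(1000)]
before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved, all to node4:",
      all(after.node_for(k) == "node4" for k in moved))
```

With a plain `hash(key) % num_nodes` scheme, by contrast, most keys would change owner when the node count goes from three to four.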
23. Partitioners
• Determines how data is distributed across the nodes in the cluster
• Function for deriving a token representing a row from its partition key
Cassandra offers:
– Murmur3Partitioner (the default since Cassandra 1.2)
– RandomPartitioner
– ByteOrderedPartitioner
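Each partitioner has its own token range, and with one token per node you had to work out evenly spaced tokens by hand and set each node's `initial_token` in cassandra.yaml. This sketch computes that layout for the two hash-based partitioners; it is an illustration of the arithmetic, and the function name is hypothetical.

```python
from typing import List

MURMUR3_MIN = -(2 ** 63)  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1
RANDOM_SPAN = 2 ** 127     # RandomPartitioner tokens span 0 .. 2**127 - 1


def initial_tokens(num_nodes: int, partitioner: str = "Murmur3Partitioner") -> List[int]:
    """Evenly spaced single tokens, one per node (the manual, pre-vnodes layout)."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    if partitioner == "Murmur3Partitioner":
        step = 2 ** 64 // num_nodes
        return [MURMUR3_MIN + i * step for i in range(num_nodes)]
    if partitioner == "RandomPartitioner":
        step = RANDOM_SPAN // num_nodes
        return [i * step for i in range(num_nodes)]
    # ByteOrderedPartitioner orders raw key bytes, so a balanced layout depends on the data.
    raise ValueError(f"no evenly spaced layout for {partitioner}")


for i, t in enumerate(initial_tokens(4), start=1):
    print(f"node{i}: initial_token: {t}")
```

Growing the cluster means recomputing these values and moving tokens, which is the chore virtual nodes remove.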
24. Virtual Nodes
• Solution for avoiding calculating node tokens and thinking about the cluster size beforehand
• Each node has multiple virtual nodes
• Each virtual node owns a much smaller subset of data
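The benefit of many small ranges can be seen by giving each node random tokens and measuring how much of the ring it owns. The sketch compares one token per node with 256 per node (the `num_tokens` default in cassandra.yaml in the 2.1 era). It assumes purely random token selection, as Cassandra 2.1 did, and the function name is hypothetical.

```python
import random

RING = 2 ** 64  # token space size, as for Murmur3Partitioner (shifted to start at 0)


def ownership(num_nodes, tokens_per_node, seed=1):
    """Fraction of the ring each node owns when every node picks random tokens."""
    rng = random.Random(seed)
    ring = sorted((rng.randrange(RING), node)
                  for node in range(num_nodes)
                  for _ in range(tokens_per_node))
    if len(ring) == 1:
        return [1.0]
    owned = [0] * num_nodes
    for i, (tok, node) in enumerate(ring):
        prev = ring[i - 1][0]               # for i == 0 this wraps to the highest token
        owned[node] += (tok - prev) % RING  # a token owns the range (prev, tok]
    return [share / RING for share in owned]


for tokens in (1, 256):
    shares = ownership(num_nodes=4, tokens_per_node=tokens)
    print(tokens, "token(s) per node:", [f"{s:.1%}" for s in shares])
```

With a single random token per node the shares can be badly skewed; with 256 each node ends up close to 25%, and a new node takes small slices from every existing node instead of half of one neighbour's range.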
25. Coordinators
• Acts as a proxy between the client application and the nodes that own the data being requested
• Any client request can be sent to any node; the node that receives it becomes the coordinator for that request.
26. Snitch
• It interprets the network topology: it determines which data center and rack each node belongs to, so Cassandra can route requests efficiently and spread replicas across racks and data centers (node state such as down or bootstrapping is propagated by gossip)
The most popular are:
– Gossiping property file snitch (GossipingPropertyFileSnitch)
– EC2 snitch (Ec2Snitch)
– EC2 multi-region snitch (Ec2MultiRegionSnitch)
– Dynamic snitch
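A simplified illustration of how topology and latency information can be combined to choose which replica to read from: data center membership as a snitch such as GossipingPropertyFileSnitch reports it, plus a recent-latency score in the spirit of the dynamic snitch. The real dynamic snitch's scoring is more involved; the class, function, addresses and latencies here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    address: str
    dc: str                   # data center, e.g. as set in cassandra-rackdc.properties
    recent_latency_ms: float  # hypothetical moving average of recent read latency


def order_replicas(replicas, local_dc):
    """Prefer replicas in the local data center, then the ones that answered fastest lately."""
    return sorted(replicas, key=lambda r: (r.dc != local_dc, r.recent_latency_ms))


replicas = [
    Replica("10.0.2.1", "DC2", 1.0),
    Replica("10.0.1.2", "Pythian", 5.0),
    Replica("10.0.1.3", "Pythian", 2.0),
]
print([r.address for r in order_replicas(replicas, local_dc="Pythian")])
# ['10.0.1.3', '10.0.1.2', '10.0.2.1']
```

Even though the DC2 replica is fastest in this made-up data, it is ranked last because crossing data centers is usually the more expensive path.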
29. A CASSANDRA TABLE OR COLUMN FAMILY
[Diagram: components inside a Cassandra node: coordinator, snitch, commit log writer, memtable writer, memtables, flush (SSTable writer), reader, bloom filters, commit log and SSTables on disk]
30. A CASSANDRA TABLE OR COLUMN FAMILY
• Consists of memtables in memory (one active memtable at a time) and zero or more SSTables on disk
• SSTable stands for Sorted String Table, i.e. all of the columns in the SSTable are sorted in order by key.
• Each SSTable consists of the data file, a bloom filter, an index and some other minor files.
• SSTables are immutable. Once written they are never altered, only read and eventually deleted.
SSTables on disk (/var/lib/cassandra):
videogames-events-data-jb-1.db
videogames-events-filters-jb-1.db
videogames-events-index-jb-1.db
videogames-events-data-jb-2.db
videogames-events-filters-jb-2.db
videogames-events-index-jb-2.db
videogames-events-data-jb-3.db
videogames-events-filters-jb-3.db
videogames-events-index-jb-3.db
videogames-events-data-jb-4.db
videogames-events-filters-jb-4.db
videogames-events-index-jb-4.db
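The write and read path for one table can be modelled in a few lines: append to a commit log, update the memtable, and flush it into a new immutable, key-sorted SSTable. Simplifications (assumptions of this sketch, not Cassandra behaviour): a flush happens after a fixed number of keys rather than by memory size, the newest SSTable wins instead of comparing per-cell timestamps, and there are no bloom filters, which real SSTables use to skip files that cannot contain a key. The class name is hypothetical.

```python
import bisect


class ToyTable:
    """Commit log + one active memtable + immutable, key-sorted SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # stands in for the append-only commit log on disk
        self.memtable = {}    # mutable, in memory
        self.sstables = []    # (sorted keys, values) tuples, oldest first; never modified
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # append to the commit log first for durability
        self.memtable[key] = value            # then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        keys = tuple(k for k, _ in items)
        values = tuple(v for _, v in items)
        self.sstables.append((keys, values))  # a new immutable "SSTable"
        self.memtable = {}
        self.commit_log.clear()  # real Cassandra recycles segments once their data is flushed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):  # newest SSTable first
            i = bisect.bisect_left(keys, key)         # binary search works because keys are sorted
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


t = ToyTable(flush_threshold=2)
t.write("a", 1)
t.write("b", 2)   # memtable reaches the threshold and is flushed
t.write("a", 10)  # an update is just a new write
print(t.read("a"), t.read("b"), len(t.sstables))  # 10 2 1
```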
31. REPLICATION FACTOR (RF) AND CONSISTENCY
• Replication factor is the number of copies of columns stored in the ring
• Replication factor should not exceed the number of nodes in the cluster
– RF=1 is one copy; this means that the data for each column is stored only once in the ring.
– RF=3 (a common production choice) means every column stored in the database is stored three times.
– Quorum: the read and write must be acked/returned from a quorum of nodes.
32. REPLICATION FACTOR (RF) AND CONSISTENCY
• Consistency
– When a write or read is performed, the application can choose to wait for n copies of the data to be written or read; this is referred to as a consistency of n.
– There is a special consistency value called QUORUM, which means a response from a majority of replicas, RF/2 + 1 nodes using integer division (2 nodes when RF=3), is required.
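The QUORUM arithmetic can be checked with a few lines of Python. This sketch assumes a single data center (with several data centers, QUORUM is computed over the sum of their replication factors). It also shows the commonly used rule that a read is guaranteed to see the latest acknowledged write when write replicas plus read replicas exceed RF, and a hypothetical `coordinator_write` that, as in Cassandra, sends the write to every replica but reports success once enough acknowledgements arrive.

```python
def quorum(rf):
    """Replicas needed for QUORUM in a single data center: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reads_see_latest_write(rf, write_replicas, read_replicas):
    """True when every read replica set must overlap every write replica set (W + R > RF)."""
    return write_replicas + read_replicas > rf


def coordinator_write(replica_up, required_acks):
    """Hypothetical coordinator: the write goes to every replica, success needs enough acks."""
    acks = sum(1 for up in replica_up if up)
    return acks >= required_acks


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}")
print(reads_see_latest_write(3, quorum(3), quorum(3)))    # True: 2 + 2 > 3
print(reads_see_latest_write(3, 1, 1))                    # False: ONE/ONE can miss a write
print(coordinator_write([True, True, False], quorum(3)))  # True: one replica down is tolerated
```

With RF=3, QUORUM reads and writes therefore tolerate one replica being down while still giving read-your-writes behaviour.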
33. HOW TO MAKE SURE WE DON'T LOSE DATA
• Three anti-entropy mechanisms in Cassandra
1) Hinted handoff
2) Read repair
3) Repair, a.k.a. anti-entropy repair
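Read repair, the second mechanism, can be sketched as follows: the coordinator compares the copies returned by the replicas, returns the one with the newest write timestamp, and writes it back to replicas that returned an older copy or nothing. This is a simplified model: real Cassandra compares digests first and only involves the replicas contacted for the read (plus, in 2.x, extra replicas according to `read_repair_chance`); hinted handoff and `nodetool repair` are not shown. The function name and data are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def read_with_repair(replicas: Dict[str, Dict[str, Cell]], key: str) -> Optional[str]:
    """Return the newest value for key and push it to any stale replica."""
    responses = {name: data.get(key) for name, data in replicas.items()}
    found = [cell for cell in responses.values() if cell is not None]
    if not found:
        return None
    newest = max(found)  # highest timestamp wins (last-write-wins)
    for name, cell in responses.items():
        if cell != newest:
            replicas[name][key] = newest  # repair the stale or missing copy
    return newest[1]


replicas = {
    "n1": {"k": (100, "old")},
    "n2": {"k": (200, "new")},
    "n3": {},  # missed the write entirely
}
print(read_with_repair(replicas, "k"))  # new
print(replicas["n1"], replicas["n3"])   # both now hold (200, 'new')
```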
35. COMPACTIONS
• SSTables are immutable.
• Deletes and updates are just new writes
• SSTables are merged together by partition key. Old, obsolete data is discarded.
• Lots of SSTables become a few.
• Compaction can require a lot of disk space (size-tiered compaction can temporarily need as much free space as the SSTables being compacted). DO NOT LET your disks get more than 50% full.
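The merge step of compaction can be sketched with toy SSTables: for each key keep only the cell with the newest timestamp, and drop deletes (tombstones) once they are no longer needed. This is an illustration under stated assumptions: a value of `None` stands for a tombstone, and `drop_tombstones` stands in for Cassandra's real condition that a tombstone is only purged after `gc_grace_seconds` (10 days by default), so a replica that missed the delete cannot resurrect the data. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A toy SSTable: partition key -> (write timestamp, value); a value of None is a tombstone.
SSTable = Dict[str, Tuple[int, Optional[str]]]


def compact(sstables: List[SSTable], drop_tombstones: bool = True) -> SSTable:
    """Merge SSTables into one, keeping the newest cell per key."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    if drop_tombstones:
        merged = {k: cell for k, cell in merged.items() if cell[1] is not None}
    return dict(sorted(merged.items()))  # the output is again sorted by key


s1 = {"a": (1, "x"), "b": (1, "y")}
s2 = {"a": (2, "x2"), "c": (2, "z")}  # an update to "a" is just a newer write
s3 = {"b": (3, None)}                 # a delete of "b" is just a newer write too
print(compact([s1, s2, s3]))  # {'a': (2, 'x2'), 'c': (2, 'z')}
```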
36. CQL - Cassandra Query Language
CQL is not SQL
• Default and primary interface into the Cassandra database (since 2.0)
• Cassandra does not support joins or subqueries
• The only way to create users and user-based permissions
• Very similar to SQL:
cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;
38.
Feature/Function | DSE/Cassandra | Oracle RDBMS
Core architecture | "Masterless"; peer-to-peer with all nodes being the same | Traditional standalone
High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Data Guard (for failover), Oracle RAC (node SPOF) and GoldenGate
Data model | Google Bigtable | Relational/tabular
Data consistency model | Tunable consistency (CAP theorem consistency per operation) | Traditional ACID
Storage model | Targeted directories with separation | Tablespaces
Logical database container | Keyspace | Database
Backup/recovery | Online, point-in-time restore | Online, point-in-time restore
Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager
39. LESSONS LEARNED
• Understand the data model differences
• Hardware setup does matter
• Grep the logs for errors and warnings
• Make sure each node is created properly
• Know your tools
– nodetool utility
– Cassandra bulk loader (sstableloader)
– JConsole / Java VisualVM
– cassandra-stress
– OpsCenter
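"Grep the logs for errors and warnings" can be scripted so it is easy to run on every node. This sketch assumes the Cassandra 2.x default logback layout, where each line starts with the level (for example `WARN  [main] 2015-06-01 10:00:00,000 SomeClass.java:123 - message`), and the log file /var/log/cassandra/system.log in the log directory listed earlier. Stack-trace continuation lines are not matched. The function names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout (Cassandra 2.x logback default): the level comes first.
LEVEL = re.compile(r"^(ERROR|WARN)\b")


def scan_log(lines):
    """Return the ERROR/WARN lines and a count per level."""
    hits = [line.rstrip("\n") for line in lines if LEVEL.match(line)]
    counts = Counter(LEVEL.match(line).group(1) for line in hits)
    return hits, counts


def scan_file(path="/var/log/cassandra/system.log"):
    with open(path, encoding="utf-8", errors="replace") as log:
        return scan_log(log)


sample = [
    "INFO  [main] 2015-06-01 10:00:00,000 CassandraDaemon.java:100 - Starting\n",
    "WARN  [main] 2015-06-01 10:00:01,000 StartupChecks.java:200 - Low disk\n",
    "ERROR [CompactionExecutor:1] 2015-06-01 10:00:02,000 CassandraDaemon.java:300 - boom\n",
    "\tat org.apache.cassandra.db.Example.method(Example.java:42)\n",
]
hits, counts = scan_log(sample)
print(dict(counts))  # {'WARN': 1, 'ERROR': 1}
```

On a node, `hits, counts = scan_file()` does roughly what `grep -E '^(ERROR|WARN)' /var/log/cassandra/system.log` does, with a per-level summary added.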
41. FIT-ACER
• F – Focus (SLOW DOWN! Are you ready?)
• I – Identify server/DB name, time, authorization
• T – Type the command (do not hit enter yet)
• A – Assess the command (SPEND TIME HERE!)
• C – Check the server / database name again
• E – Execute the command
• R – Review and document the results
43. Thank you – Q&A
To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian