Apache Cassandra is a scalable, fault-tolerant database that has found its way into more than 25% of the Fortune 100 and continues to enjoy significant adoption in the marketplace. In this talk we'll introduce you to Cassandra, explore some of its internals, and discuss CQL (the SQL-like query language for Cassandra). We'll finish by talking about how some companies are using it for services you probably interact with in your daily life. You'll leave with all the tools you need to start exploring Cassandra on your own.
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLab, by CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Use cases
13) When Not to Use ZooKeeper
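As a rough illustration of topic 7 (why a majority is needed), the quorum arithmetic can be sketched as follows. This is a minimal sketch in Python, not ZooKeeper code and not taken from the tutorial; the function names `majority` and `tolerated_failures` are hypothetical.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Hypothetical ensemble sizes, chosen only for illustration.
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Because two disjoint groups of servers can never both hold a strict majority, at most one side of a network partition can keep making decisions. The arithmetic also shows why an even-sized ensemble buys little: four servers tolerate one failure, the same as three.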
Apache Cassandra operations have a reputation for being quite simple on single-datacenter and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters: basic operations such as repair, compaction, or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in detail: bootstrapping new nodes and/or datacenters, repair strategies, compaction strategies, GC tuning, OS tuning, removal of large batches of data, and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques for anticipating issues inherent to multi-datacenter clusters: how and what to monitor, hardware and network considerations, and data model and application-level design anti-patterns that can affect your multi-datacenter cluster's performance.
Apache Cassandra, part 3 – machinery, work with Cassandra, by Andrey Lomakin
The aim of this presentation is to provide enough information for an enterprise architect to decide whether Cassandra should be the project's data store. The presentation describes the nuances of Cassandra's architecture and ways to design data and work with it.
Introduction to Cassandra: Replication and Consistency, by Benjamin Black
A short introduction to replication and consistency in the Cassandra distributed database. Delivered April 28th, 2010 at the Seattle Scalability Meetup.
11 April 2016 - ION Bangladesh - Jan Zorz set up DNSSEC, DANE, and TLS in his go6lab and then tested the implementations in the top one million Alexa domains. Jan will share his experiences deploying, testing, and evaluating DNSSEC, DANE, and TLS in his own lab and explain the process he used.
Cassandra by example - the path of read and write requests, by grro
This article describes how Cassandra handles and processes requests. It will help you get a better picture of Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world. Fast in. Fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems. Three main areas are covered:
Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind.
Process - Analyze volumes of data you receive in near real-time or in a batch. Be ready for fast serving in your application.
Store - Reliably store data in the data models to support your application. Never accept downtime or slow response times.
Cassandra Community Webinar: Back to Basics with CQL3, by DataStax
Cassandra is a distributed, massively scalable, fault tolerant, columnar data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data, and the area of big data that Cassandra is designed to fill. We will cover Cassandra's unique, every-node-the-same architecture. We will reveal Cassandra's internal data structure and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard protocol: CQL (Cassandra Query Language).
Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions, or hints delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show the essentials of multi-datacenter operations in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, Apache Cassandra configuration, and monitoring.
Based on his three years of experience managing a multi-datacenter cluster on Apache Cassandra 2.0, 2.1, 2.2, and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations on a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long-time open source software advocate, contributor, and speaker: he is a Zope, ZODB, and Nuxeo contributor and a member of the Zope and OpenStack foundations, and his talks include ApacheCon, Cassandra Summit, OpenStack Summit, The WWW Conference, and EuroPython.
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly known as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
Basically everything you need to get started with your ZooKeeper training and to set up Apache Hadoop high availability using QJM with automatic failover.
Real-time, Exactly-once Data Ingestion from Kafka to ClickHouse at eBay, by Altinity Ltd
LIVE WEBINAR: October 21, 2021 | 10 am PT
SPEAKERS: Jun Li, Principal Architect, eBay & Robert Hodges, CEO, Altinity
eBay depends on Kafka to solve the impedance mismatch between rapidly arriving messages in event streams and efficient block inserts into ClickHouse clusters. Naïve loading procedures from Kafka to ClickHouse generate non-deterministic blocks, which can lead to data loss and incorrect results in applications. The eBay team solved this problem with a block aggregator that leverages Kafka to store message processing metadata, as well as ClickHouse deduplication, to ensure that blocks are loaded into ClickHouse exactly once. The block aggregator allows eBay to support a sharded ClickHouse architecture across multiple data centers that can tolerate failures in any individual part of the system. Join us to learn how eBay developed this unique architecture and how they use it to deliver low-latency analytics to users.
From Monolith to Microservices with Cassandra, gRPC, and Falcor (from Cassand..., by Luke Tillman
Transitioning a legacy monolithic application to microservices is a daunting task by itself and it only gets more complicated as you start to dig through all the libraries and frameworks out there meant to help. In this talk, we'll cover the transition of a real Cassandra-based application to a microservices architecture using gRPC from Google and Falcor from Netflix. (Yes, Falcor is more than just a magical luck dragon from an awesome 80s movie.) We'll talk about why these technologies were a good fit for the project as well as why Cassandra is often a great choice once you go down the path of microservices. And since all the code for the project is open source, you'll have plenty to dig into afterwards.
Introduction to Data Modeling with Apache Cassandra, by Luke Tillman
Relational systems have always been built on the premise of modeling relationships. As you will see, static schema, one-to-one, many-to-many still have a place in Cassandra. From the familiar, we’ll go into the specific differences in Cassandra and tricks to make your application fast and resilient.
Building your First Application with Cassandra, by Luke Tillman
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
Getting started with DataStax .NET Driver for Cassandra, by Luke Tillman
Video of this presentation from Cassandra Day Seattle is here: https://www.youtube.com/watch?v=sbs6YExxYqc&index=6&list=PLqcm6qE9lgKIgRKG0d-NEvYw9qYOztbci
So you’ve grabbed the latest 2.0 version of the DataStax C# driver from NuGet. Now what? In this talk, Luke will walk you through some of the basics of the C# driver--how to bootstrap the driver and connect to a cluster, execute CQL, and retrieve the results. Wondering what the difference between a PreparedStatement and a SimpleStatement is? Not sure what the appropriate lifetime is for a Cluster or a Session object? What about ADO.NET and LINQ support? We’ll cover this and more, so that you can get on with building applications on top of Cassandra. Even if you’re not a C# developer (or think that C# is the handiwork of the devil), many of the concepts we’ll cover will help you get started with the other DataStax drivers as well (Python, Java, and C++).
Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016), by Luke Tillman
In this talk, we'll dive into the event sourcing API used to persist actor state in Akka and talk about how we build a data model to support it in Cassandra. At first, the data model seems pretty straightforward, but the more we dig in, the more we see that a couple of classic Cassandra anti-patterns are pushing us close to the Pit of Despair. We'll come up with a way to avoid these problems so we can go on building distributed systems happily ever after with Akka and Cassandra.
Avoiding the Pit of Despair - Event Sourcing with Akka and Cassandra, by Luke Tillman
With Akka you take a complicated system and break it down into lots of smaller units (actors) that communicate by passing messages. A single actor system can easily scale to millions or tens of millions of actors running on many machines. As actors process messages, they build up internal state, and many times we want that state persisted somewhere. In this talk, we'll dive into the event sourcing API used to persist actor state in Akka and talk about how we build a data model to support it in Cassandra. At first, the data model seems pretty straightforward, but the more we dig in, the more we see that a couple of classic Cassandra anti-patterns are pushing us close to the Pit of Despair. We'll come up with a way to avoid these problems so we can go on building distributed systems happily ever after with Akka and Cassandra.
A Deep Dive into Apache Cassandra for .NET Developers, by Luke Tillman
.NET developers have a lot of options when it comes to databases these days. Apache Cassandra is a scalable, fault-tolerant database that has already found its way into more than 25% of the Fortune 100 and continues to grow in popularity. But what makes it different from the myriad of other options available? In this talk, we’ll take a deep dive into Cassandra and learn about:
- Cassandra’s internals and how it works
- CQL (the SQL-like query language for Cassandra)
- Data Modeling like a pro
- Tools available for developers
- Writing .NET code that talks to Cassandra
If there’s time and interest, we’ll finish up with how some companies are already using Cassandra to power services you probably interact with in your daily life. You’ll leave with all the tools you need to start building highly available .NET applications and services on top of Cassandra.
Relational Scaling and the Temple of Gloom (from Cassandra Summit 2015), by Luke Tillman
You're building the next big thing. It will attract hundreds of thousands of users and make so much cash, Gordon Gekko would blush. You just know that if you build it, they will come. But what happens when all those users do show up? Will you spend your time adding the new features they're clamoring for, or will you be scrambling to make sure your relational database doesn't die hard? In this talk, we'll take a look at some of the risky business we undertake to try and scale our relational databases and the problems we run into. Then we'll talk about how Cassandra is different and some of the knobs you control to turn things up to 11. If you're new to Cassandra and are looking for an introduction, come from a relational database background, or you just want to see how many 80s movie references we can cover in 40 minutes, then don your favorite fedora and come for an excellent adventure.
Slides for the talk "Cassandra and Spark: Love at First Sight" given at Texas Linux Fest 2015. Gives an introduction to both Cassandra and Spark and how they work together.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression, and recommendation systems (collaborative filtering).
The talk was recorded and is available on YouTube: https://www.youtube.com/watch?v=_gFgU3phogQ
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO..., by ScyllaDB
Outbrain is the world's largest content discovery program. Learn about their use case with Scylla where they lowered latency while doing 20X IOPS of Cassandra.
Pythian: My First 100 days with a Cassandra Cluster, by DataStax Academy
Apache Cassandra is a massively scalable open source NoSQL database, and the amount of data that we create and copy annually is doubling in size every two years and is expected to reach 44 zettabytes, or 44 trillion gigabytes, so we can assume that sooner or later a DBA will be handling a Cassandra database in their shop. This beginner/intermediate-level session will take you through my journey as an Oracle DBA during my first 100 days administering a Cassandra cluster, with several demos and all the roadblocks and successes I had along the way.
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..., by Dataconomy Media
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias Johansson, Lead Developer at Valo.io
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
About the Author:
Tobias is technical lead developer for Valo.io in London. He has a background in the financial sector as a front-office developer but changed track in 2013 to be part of a team building a new real-time analytics platform from the ground up. His goal is to outlive the JVM and his tea addiction. This is his first appearance on the conference scene as a speaker.
Always On: Building Highly Available Applications on Cassandra, by Robbie Strickland
Cassandra was built from the ground up to enable linearly scalable, always-on applications. But the path to high availability has many land mines that can mean failure for the inexperienced user. In this talk, I will offer practical advice on how to achieve 100% uptime on millions of transactions per second. I'll address all aspects of the topic, including deployment, configuration, application design, and operations.
Things You Should Be Doing When Using Cassandra Drivers, by Rebecca Mills
Did you know there are some things you should be doing when writing your data-driven application using Apache Cassandra? Let’s talk about that. There are features in almost every driver that will keep your application online and running fast. You should know what they are. Even if you know these features, I’m going to tell you why they work and why they’re a good idea. This will not be language specific; I will be using multiple languages and drivers. This talk should appeal to programmers in general.
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa..., by DataStax
We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra.
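The idea behind tunable consistency can be sketched with simple replica arithmetic: a read is guaranteed to overlap the most recent acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The following is a minimal sketch in Python and is not taken from the talk; the function names and the two-data-center layout with three replicas each are hypothetical, chosen only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a strict majority of replication_factor."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    # Hypothetical layout: two data centers with 3 replicas each.
    per_dc_rf = 3
    total_rf = 2 * per_dc_rf

    # Majority of all replicas, counted across both data centers.
    global_quorum = quorum(total_rf)
    # Majority of the replicas in a single data center only.
    local_quorum = quorum(per_dc_rf)

    # Quorum reads and writes across all replicas always overlap.
    print(read_sees_latest_write(global_quorum, global_quorum, total_rf))
    # A local-quorum write in one data center and a local-quorum read
    # in the other are not guaranteed to share a replica.
    print(read_sees_latest_write(local_quorum, local_quorum, total_rf))
```

Under these assumptions, the second case is where only eventual consistency holds between sites, while the first case buys cross-site consistency at the cost of waiting on replicas over the WAN.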
About the Speaker
Randy Fradin Vice President, BlackRock
Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
C* Summit 2013: Netflix Open Source Tools and Benchmarks for Cassandra by Adr..., by DataStax Academy
Netflix has updated and added new tools and benchmarks for Cassandra in the last year. In this talk we will cover the latest additions and recipes for the Astyanax Java client, updates to Priam to support Cassandra 1.2 Vnodes, plus newly released and upcoming tools that are all part of the NetflixOSS platform. Following on from the Cassandra on SSD on AWS benchmark that was run live during the 2012 Summit, we've been benchmarking a large write intensive multi-region cluster to see how far we can push it. Cassandra is the data storage and global replication foundation for the Cloud Native architecture that runs Netflix streaming for 36 Million users. Netflix is also offering a Cloud Prize for open source contributions to NetflixOSS, and there are ten categories including Best Datastore Integration and Best Contribution to Performance Improvements, with $10K cash and $5K of AWS credits for each winner. We'd like to pay you to use our free software!
Using Apache Cassandra: What is this thing, and how do I use it?jeremiahdjordan
This is the presentation I gave at the Reflections | Projections conference at UIUC. http://www.acm.uiuc.edu/conference/2013/ It is an introduction to some of the basics of Apache Cassandra, followed by actually getting it up and running. This presentation goes over what Apache Cassandra is and how to get it up and running on your development machine. It then goes over using the DataStax Python Driver and the Cassandra Query Language (CQL) to create tables, write data to them, and then read it back out.
Introduction to Apache Cassandra
1. Introduction to Apache Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
2. Who are you?!
•Evangelist with a focus on the .NET Community
•Long-time Developer
•Recently presented at Cassandra Summit 2014 with Microsoft
•Very Recent Denver Transplant
2
3. DataStax and Cassandra
•DataStax Enterprise
–Apache Cassandra, now with more QA!
–Easy integrations with Solr, Apache Spark, Hadoop
•Dev and Ops Tooling
–DevCenter IDE, OpsCenter
•Open source drivers
–Java, C#, Python, C++, Ruby, NodeJS
3
4. •Unlimited, free use of DataStax Enterprise
•No limit on number of nodes or other hidden restrictions
•If you’re a startup, it’s free.
•Requirements:
–< $2M annual revenue, < $20M capital raised
4
www.datastax.com/startups
5. Agenda
•1 – What is Cassandra?
•2 – How does it work?
•3 – Cassandra Query Language (CQL)
•4 – Who’s using it?
•5 – Questions
7. What is Cassandra?
•A Linearly Scaling and Fault Tolerant Distributed Database
•Fully Distributed
–Data spread over many nodes
–All nodes participate in a cluster
–All nodes are equal
–No SPOF (shared nothing)
7
8. What is Cassandra?
•Linearly Scaling
–Have More Data? Add more nodes.
–Need More Throughput? Add more nodes.
8
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
9. What is Cassandra?
•Fault Tolerant
–Nodes Down != Database Down
–Datacenter Down != Database Down
9
10. What is Cassandra?
•Fully Replicated
•Clients write local
•Data syncs across WAN
•Replication Factor per DC
10
US
Europe
Client
11. Cassandra and the CAP Theorem
•The CAP Theorem limits what distributed systems can do
•Consistency
•Availability
•Partition Tolerance
•Limits? “Pick 2 out of 3”
11
12. Cassandra and the CAP Theorem
Consistency
•When I ask the same question to any part of the system, I should get the same answer
12
Is he guilty yet?
No.
No.
No.
Consistent
13. Cassandra and the CAP Theorem
Consistency
•When I ask the same question to any part of the system, I should get the same answer
13
Is he guilty yet?
No.
Yes.
Yes.
Not Consistent
14. Cassandra and the CAP Theorem
Availability
•When I ask a question, I will get an answer
14
Is he guilty yet?
Yes.
Available
15. Cassandra and the CAP Theorem
Availability
•When I ask a question, I will get an answer
15
Is he guilty yet?
I don’t know, we have to wait for Dreamy to wake up.
Not Available
16. Cassandra and the CAP Theorem
Partition Tolerance
•I can ask questions even when the system is having intra-system communication problems.
16
Is he guilty yet?
Tolerant
No.
Team Tyrion
Team Cersei
17. Cassandra and the CAP Theorem
Partition Tolerance
•I can ask questions even when the system is having intra-system communication problems.
17
Is he guilty yet?
Not Tolerant
I’m not sure without asking them and we’re not speaking (I’m pretty sure that one helped kill my sister).
Team Tyrion
Team Cersei
18. Cassandra and the CAP Theorem
•Cassandra is an AP system that is Eventually Consistent
18
Is he guilty yet?
No.
Wait, he’s going to take the black. Yes.
No.
Eventually Consistent
19. Cassandra and the CAP Theorem
•Cassandra is an AP system that is Eventually Consistent
19
Is he guilty yet?
Yes.
Yes.
Eventually Consistent
Yes.
21. Two knobs control Cassandra fault tolerance
•Replication Factor (server side)
–How many copies of the data should exist?
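[Figure: four-node ring (A, B, C, D). Each node also stores replicas of two other nodes' data: B holds A and D, C holds A and B, A holds C and D, D holds B and C. The client sends Write A with RF=3.]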
22. Two knobs control Cassandra fault tolerance
•Consistency Level (client side)
–How many replicas do we need to hear from before we acknowledge?
[Figure: the same four-node ring shown twice. First: the client sends Write A at CL=QUORUM. Second: the client sends Write A at CL=ONE.]
23. Consistency Levels
•Applies to both Reads and Writes (i.e. is set on each query)
•ONE – one replica from any DC
•LOCAL_ONE – one replica from local DC
•QUORUM – a majority of replicas (floor(RF / 2) + 1) across all DCs
•LOCAL_QUORUM – a majority of replicas (floor(RF / 2) + 1) in the local DC
•ALL – all replicas
•TWO – two replicas from any DC
23
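As a rough illustration of the levels listed above, the sketch below computes how many replica acknowledgements each level needs for a given replication factor. The function name replicas_required is hypothetical and not part of any Cassandra driver; the LOCAL_ variants follow the same arithmetic but use the replication factor of the local DC only.

```python
def replicas_required(consistency_level: str, rf: int) -> int:
    """Return how many replicas must respond for one request.

    rf is the replication factor that applies to the level: the total
    across DCs for QUORUM, or the local DC's RF for LOCAL_QUORUM.
    """
    quorum = rf // 2 + 1  # strict majority of the replicas
    required = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "TWO": 2,
        "QUORUM": quorum,
        "LOCAL_QUORUM": quorum,
        "ALL": rf,
    }
    return required[consistency_level]


for rf in (3, 5):
    levels = ("ONE", "TWO", "QUORUM", "ALL")
    print(rf, {cl: replicas_required(cl, rf) for cl in levels})
```

With RF=3 a quorum is 2 replicas, and with RF=5 it is 3, which is why "51%" is only an approximation.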
24. Consistency Level and Speed
•How many replicas we need to hear from can affect how quickly we can read and write data in Cassandra
[Figure: four-node ring. The client sends Read A at CL=QUORUM; the acknowledgement times shown in the diagram are 5 μs, 300 μs, 12 μs and 12 μs.]
25. Consistency Level and Availability
•Consistency Level choice affects availability
•For example, QUORUM can tolerate one replica being down and still be available (in RF=3)
[Figure: four-node ring. The client sends Read A at CL=QUORUM; the diagram labels the replicas A=2, A=2, A=2.]
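The availability trade-off above can be checked with a few lines of arithmetic. This is a minimal sketch that only counts live replicas; the function name is_available is an assumption for illustration, and real clusters also depend on timeouts and on which nodes are reachable from the coordinator.

```python
def is_available(rf: int, required_acks: int, replicas_down: int) -> bool:
    """A request can succeed if enough replicas are still up to acknowledge it."""
    live_replicas = rf - replicas_down
    return live_replicas >= required_acks


rf = 3
quorum = rf // 2 + 1  # 2 when RF=3

for down in range(rf + 1):
    print(
        f"replicas down={down}",
        f"ONE={is_available(rf, 1, down)}",
        f"QUORUM={is_available(rf, quorum, down)}",
        f"ALL={is_available(rf, rf, down)}",
    )
```

At RF=3, QUORUM keeps working with one replica down, ONE keeps working with two down, and ALL fails as soon as any replica is down.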
26. Consistency Level and Eventual Consistency
•Cassandra is an AP system that is Eventually Consistent so replicas may disagree
•Column values are timestamped
•In Cassandra, Last Write Wins (LWW)
[Figure: four-node ring. The client sends Read A at CL=QUORUM; two replicas hold A=2 (newer) and one holds A=1 (older).]
Christos from Netflix: “Eventual Consistency != Hopeful Consistency” https://www.youtube.com/watch?v=lwIA8tsDXXE
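Last Write Wins can be sketched as picking the response with the highest timestamp. The Cell type and resolve function below are hypothetical names for illustration, the timestamps are made-up values, and tie handling between equal timestamps is not modelled.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: int
    timestamp: int  # write time; any monotonically increasing unit works here


def resolve(responses):
    """Return the cell with the newest timestamp (Last Write Wins)."""
    return max(responses, key=lambda cell: cell.timestamp)


# Two replicas answered a QUORUM read and they disagree.
responses = [
    Cell(value=1, timestamp=100),  # older write
    Cell(value=2, timestamp=200),  # newer write
]
winner = resolve(responses)
print(winner.value)
```

The coordinator returns the newer value to the client, which is why the disagreement between replicas is not visible to a QUORUM reader in this example.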
27. Writes in the cluster
•Fully distributed, no SPOF
•Node that receives a request is the Coordinator for request
•Any node can act as Coordinator
[Figure: four-node ring. The client sends Write A at CL=ONE to one node, which acts as the Coordinator Node for the request.]
28. Writes in the cluster – Data Distribution
•Partition Key determines node placement
Partition Key: id
id='pmcfadin', lastname='McFadin'
id='jhaddad', firstname='Jon', lastname='Haddad'
id='ltillman', firstname='Luke', lastname='Tillman'
CREATE TABLE users ( id text, firstname text, lastname text, PRIMARY KEY (id) );
29. Writes in the cluster – Data Distribution
•The Partition Key is hashed using a consistent hashing function (Murmur 3) and the output is used to place the data on a node
•The data is also replicated to RF-1 other nodes
[Figure: the row with Partition Key id='ltillman' (firstname='Luke', lastname='Tillman') is hashed with Murmur3; the result for id: ltillman is A, and the row is placed on the four-node ring with RF=3.]
30. Hashing – Back to Reality
•Back in reality, Murmur3 produces a 128 bit hash of the Partition Key (the default Murmur3Partitioner keeps a 64 bit token from it)
•Nodes in Cassandra own token ranges (i.e. hash ranges)
[Figure: four-node ring (A, B, C, D) with the token range owned by each node.]
Range   Start          End
A       0xC000000..1   0x0000000..0
B       0x0000000..1   0x4000000..0
C       0x4000000..1   0x8000000..0
D       0x8000000..1   0xC000000..0
Partition Key id='ltillman' → Murmur3 → 0xadb95e99da887a8a4cb474db86eb5769
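The token-range lookup described above can be sketched with a sorted list of tokens and a binary search. Several things here are assumptions for illustration: Python's standard library has no Murmur3, so MD5 truncated to 64 bits stands in for the hash; the Ring class, its evenly spaced tokens, and the "next nodes clockwise" replica placement are simplifications, not Cassandra's actual implementation.

```python
import bisect
import hashlib

RING_BITS = 64  # size of the token space in this sketch


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_names):
        # Evenly spaced tokens; each node owns the range that ENDS at its token,
        # so the first node's range wraps around the top of the token space.
        step = 2**RING_BITS // len(node_names)
        self.nodes = list(node_names)
        self.tokens = [i * step for i in range(len(self.nodes))]

    def replicas(self, partition_key: str, rf: int):
        """Return the owning node plus the next rf-1 nodes clockwise."""
        if rf > len(self.nodes):
            raise ValueError("rf cannot exceed the number of nodes")
        token = token_for(partition_key)
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(rf)]


ring = Ring(["A", "B", "C", "D"])
print(ring.replicas("ltillman", rf=3))
```

Because the stand-in hash differs from Murmur3, the node chosen for 'ltillman' here will not necessarily match the slides; the point is only that the same key always maps to the same RF nodes.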
31. Writes on a single node
•Client makes a write request
Client
UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
Disk
Memory
32. Writes on a single node
•Data is appended to the Commit Log
•Cassandra writes are FAST because the Commit Log is append-only (sequential writes, no seeks)
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
33. Writes on a single node
•Data is written to Memtable
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
Memtable for Users
Some Other Memtable
id='ltillman'
firstname='Luke'
lastname='Tillman'
34. Writes on a single node
•Server acknowledges to client
Client
UPDATE users
SET firstname = 'Luke'
WHERE id = 'ltillman'
Commit Log
id='ltillman', firstname='Luke'
…
…
Disk
Memory
Memtable for Users
Some Other Memtable
id='ltillman'
firstname='Luke'
lastname='Tillman'
35. Writes on a single node
•Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table)
Client
UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
Data Directory
Disk
Memory
Memtable for Users
Some Other Memtable
id='ltillman'
firstname='Luke'
lastname='Tillman'
Some Other SSTable
SSTable #1 for Users
SSTable #2 for Users
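The single-node write steps above (append to the Commit Log, update the Memtable, acknowledge, flush when full) can be sketched as a toy class. Everything here is an in-memory stand-in: the Node class, its memtable_limit parameter, and the use of plain lists and dicts for the log and SSTables are assumptions for illustration, and the flush is done inline whereas a real node flushes in the background.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only; stands in for the on-disk Commit Log
        self.memtable = {}    # in-memory rows keyed by partition key
        self.sstables = []    # immutable, sorted snapshots; stand in for files
        self.memtable_limit = memtable_limit  # rows held before a flush

    def write(self, key, columns, timestamp):
        # 1. Append the mutation to the Commit Log for durability.
        self.commit_log.append((key, columns, timestamp))
        # 2. Apply the mutation to the Memtable, keeping a timestamp per column.
        row = self.memtable.setdefault(key, {})
        for name, value in columns.items():
            row[name] = (value, timestamp)
        # 3. Flush once the Memtable is "full" (done inline in this sketch).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        # 4. Acknowledge the write to the client.
        return "ack"

    def flush(self):
        # Write the Memtable out sorted by key, then start a fresh Memtable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("ltillman", {"firstname": "Luke"}, timestamp=1)
node.write("jhaddad", {"firstname": "Jon"}, timestamp=2)
print(len(node.commit_log), len(node.sstables), node.memtable)
```

Note that nothing in the write path reads existing data first, which is a large part of why writes are cheap.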
36. Compaction
•Compactions merge and unify data in our SSTables
•SSTables are immutable, so this is when we consolidate rows
SSTable #1 for Users: id='ltillman', firstname='Lucas' (timestamp=Older), lastname='Tillman'
SSTable #2 for Users: id='ltillman', firstname='Luke' (timestamp=Newer)
SSTable #3 for Users (result of compaction): id='ltillman', firstname='Luke', lastname='Tillman'
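Compaction of the example rows above can be sketched as a column-by-column merge that keeps the newest timestamp. The compact function is a hypothetical name, SSTables are modelled as plain dicts, integer timestamps 1 and 2 stand in for "Older" and "Newer", and deletes (tombstones) are not modelled.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest value per column."""
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            row = merged.setdefault(key, {})
            for name, (value, timestamp) in columns.items():
                # Keep this column only if it is new or newer than what we have.
                if name not in row or timestamp > row[name][1]:
                    row[name] = (value, timestamp)
    # The output SSTable is written sorted by key, like its inputs.
    return dict(sorted(merged.items()))


sstable_1 = {"ltillman": {"firstname": ("Lucas", 1), "lastname": ("Tillman", 1)}}
sstable_2 = {"ltillman": {"firstname": ("Luke", 2)}}

sstable_3 = compact([sstable_1, sstable_2])
print(sstable_3)
```

The inputs are never modified, which mirrors the immutability of SSTables: compaction writes a new file and the old ones are dropped afterwards.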
37. Reads in the cluster
•Same as writes in the cluster, reads are coordinated
•Any node can be the Coordinator Node
[Figure: four-node ring. The client sends Read A at CL=QUORUM to one node, which acts as the Coordinator Node for the request.]
38. Reads on a single node
•Client makes a read request
38
Client
SELECT firstname, lastname FROM users WHERE id = 'ltillman'
Disk
Memory
39. Reads on a single node
•Data is read from (possibly multiple) SSTables and merged
•Reads in Cassandra are also FAST but are limited by Disk IO
39
Client
SELECT firstname, lastname FROM users WHERE id = 'ltillman'
Disk
Memory
SSTable #1 for Users
id='ltillman'
firstname='Lucas' (timestamp=Older)
lastname='Tillman'
SSTable #2 for Users
id='ltillman'
firstname='Luke'
(timestamp=Newer)
firstname='Luke'
lastname='Tillman'
40. Reads on a single node
•Any unflushed Memtable data is also merged
40
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
firstname='Luke'
lastname='Tillman'
Memtable for Users
41. Reads on a single node
•Client gets acknowledgement with the data
41
Client
SELECT firstname, lastname
FROM users
WHERE id = 'ltillman'
Disk
Memory
firstname='Luke'
lastname='Tillman'
42. Compaction - Revisited
•Compactions merge and unify data in our SSTables, making them important to reads (fewer SSTables = less to read/merge)
SSTable #1 for Users: id='ltillman', firstname='Lucas' (timestamp=Older), lastname='Tillman'
SSTable #2 for Users: id='ltillman', firstname='Luke' (timestamp=Newer)
SSTable #3 for Users (result of compaction): id='ltillman', firstname='Luke', lastname='Tillman'
44. Data Structures
•Keyspace is like RDBMS Database or Schema
•Like RDBMS, Cassandra uses Tables to store data
•Partitions can have one row (narrow) or multiple rows (wide)
44
Keyspace
Tables
Partitions
Rows
45. Schema Definition (DDL)
•Easy to define tables for storing data
•First part of Primary Key is the Partition Key
CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
47. Clustering Columns
•Second part of Primary Key is Clustering Columns
•Clustering columns affect ordering of data (on disk)
•Multiple rows per partition
47
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
48. Clustering Columns – Wide Rows (Partitions)
•Use of Clustering Columns is where the term “Wide Rows” comes from
Partition videoid='0fe6a...':
commentid='82be1...' (10/1/2014 9:36AM): userid='ac346...', comment='Awesome!'
commentid='765ac...' (9/17/2014 7:55AM): userid='f89d3...', comment='Garbage!'
CREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid) ) WITH CLUSTERING ORDER BY (commentid DESC);
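One way to picture a wide partition is a map from partition key to a set of rows ordered by the clustering column. The sketch below is an in-memory model only: the CommentsByVideo class is hypothetical, integer comment ids stand in for timeuuid values, and rows are sorted at read time whereas Cassandra keeps them ordered on disk.

```python
class CommentsByVideo:
    """Toy model of comments_by_video with CLUSTERING ORDER BY (commentid DESC)."""

    def __init__(self):
        # videoid (partition key) -> {commentid (clustering column): row}
        self.partitions = {}

    def insert(self, videoid, commentid, userid, comment):
        # Writing the same (videoid, commentid) again replaces the earlier row.
        rows = self.partitions.setdefault(videoid, {})
        rows[commentid] = {"userid": userid, "comment": comment}

    def select(self, videoid, limit=None):
        # All rows of one partition, newest comment first.
        rows = self.partitions.get(videoid, {})
        ordered = sorted(rows.items(), key=lambda item: item[0], reverse=True)
        return ordered[:limit]


table = CommentsByVideo()
table.insert("video-1", commentid=1, userid="user-b", comment="Garbage!")
table.insert("video-1", commentid=2, userid="user-a", comment="Awesome!")

for commentid, row in table.select("video-1", limit=10):
    print(commentid, row["comment"])
```

Every comment for a video lives in the same partition, so fetching the latest comments for one video touches a single partition.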
49. Inserts and Updates
•Use INSERT or UPDATE to add and modify data
•Both will overwrite data (no constraints like RDBMS)
•INSERT and UPDATE are functionally equivalent (both behave as an "upsert")
49
INSERT INTO comments_by_video (
videoid, commentid, userid, comment)
VALUES (
'0fe6a...', '82be1...', 'ac346...', 'Awesome!');
UPDATE comments_by_video SET userid = 'ac346...', comment = 'Awesome!' WHERE videoid = '0fe6a...' AND commentid = '82be1...';
50. TTL and Deletes
•Can specify a Time to Live (TTL) in seconds when doing an INSERT or UPDATE
•Use DELETE statement to remove data
•Can optionally specify columns to remove part of a row
50
INSERT INTO comments_by_video ( ... )
VALUES ( ... )
USING TTL 86400;
DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';
51. Querying
•Use SELECT to get data from your tables
•Always include Partition Key and optionally Clustering Columns
•Can use ORDER BY and LIMIT
•Use range queries (for example, by date) to slice partitions
51
SELECT * FROM comments_by_video
WHERE videoid = 'a67cd...'
LIMIT 10;
52. Cassandra Data Modeling
•Requires a different mindset than RDBMS modeling
•Know your data and your queries up front
•Queries drive a lot of the modeling decisions (i.e. “table per query” pattern)
•Denormalize/Duplicate data at write time to do as few queries as possible come read time
•Remember, disk is cheap and writes in Cassandra are FAST
52
53. Cassandra Data Modeling – A Quick Example
•Users are normally looked up by a unique id, but at login they must be looked up by email address
•Some data is duplicated (email, userid) but that’s OK
53
CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, PRIMARY KEY (userid) );
CREATE TABLE users_by_email (
email text,
password text,
userid uuid,
PRIMARY KEY (email)
);
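The "table per query" idea in this example can be sketched with two dictionaries, one per table, that are both written when a user is created. The function names create_user and login and the sample email address are hypothetical; the password is compared in plain text only to mirror the password text column, and a real system would store and compare a hash.

```python
import uuid

users = {}           # stands in for table users, keyed by userid
users_by_email = {}  # stands in for table users_by_email, keyed by email


def create_user(firstname, lastname, email, password):
    """Write the same user to both tables (denormalize at write time)."""
    userid = uuid.uuid4()
    users[userid] = {"firstname": firstname, "lastname": lastname, "email": email}
    users_by_email[email] = {"password": password, "userid": userid}
    return userid


def login(email, password):
    """Look the user up by email, then fetch the profile by userid."""
    entry = users_by_email.get(email)
    if entry is None or entry["password"] != password:
        return None
    return users[entry["userid"]]


new_id = create_user("Luke", "Tillman", "luke@example.com", "secret")
print(login("luke@example.com", "secret"))
```

Each lookup is a single-key read against the table built for that query, at the cost of writing the email and userid twice.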
56. Some Common Use Case Categories
•Product Catalogs and Playlists
•Internet of Things (IoT) and Sensor Data
•Messaging (emails, IMs, alerts, comments)
•Recommendation and Personalization
•Fraud Detection
•Time series and temporal ordered data
http://planetcassandra.org/apache-cassandra-use-cases/
57. The “Slide Heard Round the World”
•From Cassandra Summit 2014, got a lot of attention
•75,000+ nodes
•10s of PBs of data
•Millions ops/s
•One of the largest known Cassandra deployments
57
58. Spotify
•Streaming music web service
•> 24,000,000 music tracks
•> 50TB of data in Cassandra
Why Cassandra?
•Was PostgreSQL, but hit scaling problems
•Multi Datacenter Availability
•Integration with Spark for data processing and analytics
Usage
•Catalog
•User playlists
•Artists following
•Radio Stations
•Event notifications
58
http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/
59. eBay
•Online auction site
•> 250TB of data, dozens of nodes, multiple data centers
•> 6 billion writes, > 5 billion reads per day
Why Cassandra?
•Low latency, high scale, multiple data centers
•Suited for graph structures using wide rows
Usage
•Building next generation of recommendation engine
•Storing user activity data
•Updating models of user interests in real time
59
http://planetcassandra.org/blog/5-minute-c-interview-ebay/
60. FullContact
•Contact management: from multiple sources, sync, de-dupe, APIs available
•2 clusters, dozens of nodes, running in AWS
•Based here in Denver
Why Cassandra?
•Migrated from MongoDB after running into scaling issues
•Operational simplicity
•Resilience and Availability
Usage
•Person API (search by email, Twitter handle, Facebook, or phone)
•Searched data from multiple sources (ingested by Hadoop M/R jobs)
•Resolved profiles
60
http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/
61. Instagram
•Photo-sharing, video-sharing and social networking service
•Originally AWS (Now Facebook data centers?)
•> 20k writes/second, >15k reads/second
Why Cassandra?
•Migrated from Redis (problems keeping everything in memory)
•No painful “sharding” process
•75% reduction in costs
Usage
•Auditing information – security, integrity, spam detection
•News feed (“inboxes” or activity feed)
–Likes, Follows, etc.
61
http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/ Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY
62. Netflix
•TV and Movie streaming service
•> 2,700 nodes across more than 90 clusters
•4 Datacenters
•> 1 Trillion operations per day
Why Cassandra?
•Migrated from Oracle
•Massive amounts of data
•Multi datacenter, No SPOF
•No downtime for schema changes
Usage
•Everything! (Almost – 95% of DB use)
•Example: Personalization
–What titles do you play?
–What do you play before/after?
–Where did you pause?
–What did you abandon watching after 5 minutes?
62
http://planetcassandra.org/blog/case-study-netflix/ Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA