Scaling Twitter
with Cassandra
Ryan King
Storage Team
bit.ly/chirpcassandra

ryan@twitter.com

@rk
Legacy
•   vertically & horizontally partitioned mysql

•   memcached (rows, indexes and fragments)

•   application managed
Legacy Drawbacks
•   many single-points-of-failure

•   hardware-intensive

•   manpower-intensive

•   tight coupling
Apache Cassandra
•   Apache top level project

•   originally developed at Facebook

•   Rackspace, Digg, SimpleGeo, Twitter, etc.
Why Cassandra?
•   highly available

•   consistent, eventually

•   decentralized

•   fault tolerant

•   elastic

•   flexible schema

•   high write throughput
What is Cassandra?
•   distributed database

•   data model from Google's BigTable

•   infrastructure from Amazon's Dynamo
Cassandra Data Model
•   keyspaces

•   column families

•   columns

•   super columns
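The speaker notes later map these onto relational terms (keyspace ≈ database, column family ≈ table, column ≈ attribute, super column ≈ collection of attributes). As a rough Python sketch of that nesting, with made-up keyspace, row, and column names rather than Twitter's real schema:

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {
    "Statuses": {                                  # column family (roughly a table)
        "status:12345": {                          # row key
            "text": ("chirp chirp", 1271179260),   # column: name -> (value, timestamp)
            "user_id": ("42", 1271179260),
        }
    },
    "UserTimeline": {                              # CF laid out with super columns
        "user:42": {
            "status:12345": {                      # super column: a named group of columns
                "text": ("chirp chirp", 1271179260),
            }
        }
    },
}

# Looking up one column, the way a get(keyspace, cf, key, column) call would:
value, ts = keyspace["Statuses"]["status:12345"]["text"]
print(value, ts)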
Cassandra Infrastructure
•   partitioners

•   storage

•   querying
Partitioners
•   order-preserving

•   random

•   custom
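The notes describe how partitioning works: nodes sit on a ring, the partitioner maps each key to a point on the ring, and the key is written to the next N nodes. A toy sketch of that mapping, with invented node names, token values, and a 0-99 token space:

import hashlib
from bisect import bisect_right

# nodes and their tokens on the ring (values invented)
RING = sorted([("node-a", 10), ("node-b", 40), ("node-c", 70), ("node-d", 95)],
              key=lambda t: t[1])

def random_partitioner(key):
    # hash the key onto the ring: load spreads evenly, but key order is lost
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

def order_preserving_partitioner(key):
    # keep keys in sorted order on the ring: range scans work, hot spots possible
    return ord(key[0]) % 100   # toy stand-in for a real OPP

def replicas(token, n=3):
    # the key is written to the next N nodes clockwise from its token
    tokens = [t for _, t in RING]
    start = bisect_right(tokens, token) % len(RING)
    return [RING[(start + i) % len(RING)][0] for i in range(n)]

print(replicas(random_partitioner("user:42")))
print(replicas(order_preserving_partitioner("user:42")))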
Storage
•   commit log

•   memtables

•   sstables

•   compaction

•   bloom filters

•   indexes

•   key cache

•   row cache
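A condensed sketch of how these pieces combine on the write and read paths; this is only the shape of the design, not Cassandra's actual code:

class ToyStore:
    def __init__(self):
        self.commit_log = []      # sequential, durable append
        self.memtable = {}        # in-memory; sorted when flushed
        self.sstables = []        # immutable, sorted files on disk

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to commit log
        self.memtable[key] = value             # 2. update memtable
        if len(self.memtable) > 2:             # 3. flush when full -> new sstable
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # merge sstables so each key lives in one place again
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first; a bloom filter would
            if key in table:                   # let us skip tables that can't hold the key
                return table[key]
        return None

store = ToyStore()
for i in range(5):
    store.write("k%d" % i, i)
print(store.read("k1"))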
Querying
•   get

•   multiget

•   range

•   slice
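Roughly what the four query shapes return, shown against a toy ordered store with invented rows; as the notes point out, range queries over rows assume the order-preserving partitioner, while columns within a row are always ordered:

rows = {
    "user:1": {"name": "ada", "bio": "math"},
    "user:2": {"name": "alan", "bio": "crypto"},
    "user:3": {"name": "grace", "bio": "navy"},
}

def get(key, column):                 # one column of one row
    return rows[key][column]

def multiget(keys, column):           # the same column across several rows
    return {k: rows[k][column] for k in keys}

def range_query(start, end):          # contiguous rows; needs the order-preserving partitioner
    return {k: v for k, v in sorted(rows.items()) if start <= k <= end}

def slice_query(key, start, end):     # contiguous columns within one row
    return {c: v for c, v in sorted(rows[key].items()) if start <= c <= end}

print(get("user:1", "name"))
print(multiget(["user:1", "user:3"], "name"))
print(range_query("user:1", "user:2"))
print(slice_query("user:2", "bio", "name"))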
Consistency
•   N, R, W

•   N = number of replicas

•   R = read replicas

•   W = write replicas

•   send request, wait for specified number

•   wait for others in background and perform read-repair
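A schematic coordinator, as a sketch: fan the request out to all N replicas, unblock the caller once the requested number (R for reads, W for writes) have answered, and let the stragglers finish in the background. Toy code with simulated latency, not real networking:

import threading, queue, random, time

def ask_replica(i, results):
    time.sleep(random.uniform(0.01, 0.1))     # pretend network latency
    results.put((i, "value-from-replica-%d" % i))

def coordinate(n=3, wait_for=2):
    results = queue.Queue()
    for i in range(n):
        threading.Thread(target=ask_replica, args=(i, results), daemon=True).start()
    answers = [results.get() for _ in range(wait_for)]   # block for R (or W) replies
    return answers                                        # the rest arrive later; read repair
                                                          # can compare them in the background

print(coordinate(n=3, wait_for=2))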
Consistency Levels
•   ZERO

•   ONE

•   QUORUM

•   ALL
Strong Consistency
•   If W + R > N, you will have consistency

    •   W=1, R=N

    •   W=N, R=1

    •   W=Q, R=Q where Q = N/2 + 1
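The arithmetic behind the rule, spelled out: any read set of size R must overlap any write set of size W in at least one replica when R + W > N, so a read always touches a replica holding the latest write. A two-line check:

def is_strongly_consistent(n, r, w):
    return r + w > n    # read and write sets must overlap in at least one replica

quorum = lambda n: n // 2 + 1
for (n, r, w) in [(3, 3, 1), (3, 1, 3), (3, quorum(3), quorum(3)), (3, 1, 1)]:
    print(n, r, w, is_strongly_consistent(n, r, w))   # the last one can return stale data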
Eventuality
•   Hinted Handoff

•   Read Repair

•   Proactive Repair (Merkle trees)
Potential Consistency
•   causes

    •   write-through caching

    •   master-slave replication failures
Example
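The example in the speaker notes walks through a write-through cache plus asynchronous replication. A runnable sketch of that sequence, with invented store names and values:

# After step 4 the cache update and the replication both fail,
# so the three copies disagree.
mysql_master, mysql_slave, memcache = {}, {}, {}

mysql_master["status:1"] = "v1"      # 1. insert into mysql
memcache["status:1"] = "v1"          # 2. insert into memcache
mysql_slave["status:1"] = "v1"       # 3. replicate to slave

mysql_master["status:1"] = "v2"      # 4. update mysql
# 5. insert into memcache fails      (cache still holds v1)
# 6. replication to slave fails      (slave still holds v1)

print(mysql_master["status:1"], memcache["status:1"], mysql_slave["status:1"])  # v2 v1 v1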
Read Repair
•   send read to all replicas

•   if they differ, resolve conflicts and update (in
    background)
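A sketch of the resolution step: compare the (value, timestamp) pairs the replicas returned, keep the newest, and write it back to any replica that disagreed. Replica names and contents are invented:

replicas = {
    "A": ("old bio", 100),
    "B": ("new bio", 200),
    "C": ("old bio", 100),
}

winner = max(replicas.values(), key=lambda vt: vt[1])   # newest timestamp wins
for name, (value, ts) in replicas.items():
    if (value, ts) != winner:
        replicas[name] = winner    # this write-back happens in the background
print(replicas)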
Hinted Handoff
•   A wants to write to B

•   B is down

•   A tells C, "when B is back, send them this
    update"
Proactive Repair
•   use Merkle trees to find inconsistencies

•   resolve conflicts

•   send repaired data

•   triggered manually
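A toy Merkle comparison: each replica hashes its key ranges into a small tree, the trees are compared top-down, and only ranges whose hashes differ are shipped for repair. Keys and values are invented, and a real tree would be much deeper:

import hashlib

def h(s):
    return hashlib.sha1(s.encode()).hexdigest()

def merkle(rows):
    keys = sorted(rows)
    leaves = [h(k + rows[k]) for k in keys]
    mid = len(leaves) // 2
    left, right = h("".join(leaves[:mid])), h("".join(leaves[mid:]))
    return {"root": h(left + right), "left": left, "right": right,
            "left_keys": keys[:mid], "right_keys": keys[mid:]}

a = merkle({"k1": "x", "k2": "y", "k3": "z", "k4": "w"})
b = merkle({"k1": "x", "k2": "y", "k3": "STALE", "k4": "w"})

if a["root"] != b["root"]:                       # the replicas disagree somewhere
    for side in ("left", "right"):
        if a[side] != b[side]:                   # only this range needs repair
            print("repair keys:", a[side + "_keys"])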
Parallel Deployment
How we're moving
•   parallel deployments

•   incremental traffic shifting
Parallel Deployment
1. build new implementation
2. integrate it alongside existing
3. ...with switches to dynamically move/mirror traffic
4. turn up traffic
5. break something
6. fix it
7. GOTO 4
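One way the switches in step 3 could look, as a sketch: a dial that mirrors writes to the new store and shifts a fraction of reads onto it, so traffic can be turned up (or back down) without a hard cutover. Percentages and store names are invented:

import random

DARKMODE_WRITE_PCT = 100    # mirror all writes to the new store
READ_FROM_NEW_PCT = 10      # serve 10% of reads from the new store

def handle_write(key, value, legacy_store, new_store):
    legacy_store[key] = value                                  # old path stays authoritative
    if random.randint(1, 100) <= DARKMODE_WRITE_PCT:
        new_store[key] = value                                 # mirrored write

def handle_read(key, legacy_store, new_store):
    if random.randint(1, 100) <= READ_FROM_NEW_PCT:
        return new_store.get(key, legacy_store.get(key))       # fall back if missing
    return legacy_store.get(key)

mysql, cassandra = {}, {}
handle_write("status:1", "chirp", mysql, cassandra)
print(handle_read("status:1", mysql, cassandra))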
?
Speaker Notes
  • * storage team
    * personal background

  • * began working on this problem last june
    * complexity had grown unmanageable
    * multiple internal customers
    * error domain grows as data size and complexity grow
  • * every master db is a SPOF (failover is hard to pull off without strong coordination)
    * SPOFs lead to expensive hardware
    * app-managed hosts mean tight coupling

  • * our application is already tolerant of eventual consistency (actually more tolerant...)
    * in addition to scale, we want more flexibility than relational data models give us


  • keyspace: database
    CF: table
    column: attribute
    SC: collection of attributes

  • [insert diagrams of ring + tokens]

    nodes are arranged on a ring
    keys are mapped to the ring and written to the next N machines
    partitioners map keys to the ring
  • [flow chart of how updates happen]

  • if using the order-preserving partitioner (OPP), rows are ordered
    columns are ordered

    [diagram of range and slice]
  • example of potential inconsistency:
    1. insert into mysql
    2. insert into memcache
    3. replicate to slave
    4. update mysql
    5. insert into memcache fails (cache keeps the old value)
    6. replication to slave fails (slave keeps the old value)





  • Launching is shifting from roll back to roll forward

