Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | DataStax
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
My Learnings on Setting up a Kubernetes Cluster on AWS using Kubernetes Operations (Sathyajith Bhat)
I recently set up a Kubernetes cluster on Amazon Web Services (AWS) using Kubernetes Operations (kops). Here are some of my findings and learnings from an out-of-the-box experience of setting up a Kubernetes cluster.
Alexander Ignatyev, "MapReduce infrastructure" (Yandex)
Seminar "Using modern information technologies to solve modern problems in particle physics" at the Yandex Moscow office, July 3, 2012.
Alexander Ignatyev, MapReduce developer, Yandex
(BDT307) Running NoSQL on Amazon EC2 | AWS re:Invent 2014 (Amazon Web Services)
Deploying self-managed NoSQL databases on Amazon Web Services (AWS) is more straightforward than you would think. This session focuses on three popular self-managed NoSQL systems on Amazon EC2: MongoDB, Cassandra, and Couchbase. We start with an overview of each of these popular NoSQL databases, discuss their origins and characteristics, and demonstrate the use of the AWS ecosystem to deploy these NoSQL databases quickly. Later in the session, we dive deep on use cases, design patterns, and discuss creating highly-available and high-performance architectures with careful consideration for failure and recovery. Whether you're a NoSQL developer, architect, or administrator, join us for a comprehensive session on looking at three different NoSQL systems from a uniform perspective.
Early benchmarks on pre-release Gnocchi v4. Includes a benchmark comparison between the all-Ceph v3.x driver and the all-Ceph v4 driver, and also shows a benchmark using a Redis+Ceph deployment.
Presentation from the Boulder/Denver Big Data Meetup on 2/20/2020 in Boulder, CO. Topics covered: troubleshooting Spark jobs (groupBy, shuffle) for big data, tuning AWS EMR Spark clusters, EMR cluster resource utilization, and writing scalable Scala for scanning S3 metadata.
Exploring Parallel Merging in GPU-Based Systems Using CUDA C (Rakib Hossain)
We present a program implemented to execute the adaptive merge sort algorithm in parallel on a GPU-based system. The parallel implementation is used to obtain better runtime performance than the serial implementation by executing independent operations in parallel across the large number of cores in a GPU-based system. Results from the parallel implementation of the algorithm are given and compared with the serial implementation on a runtime basis. The parallel version is implemented on the CUDA platform on a system with an NVIDIA GPU (GTX 650).
Talk given on September 21 to the Bay Area R User Group. The talk walks a stochastic project SVD algorithm through the steps from initial implementation in R to a proposed implementation using map-reduce that integrates cleanly with R via NFS export of the distributed file system. Not surprisingly, this algorithm is essentially the same as the one used by Mahout.
Cassandra training presentation for R&D:
Training objectives links:
http://www.datastax.com/what-we-offer/products-services/training/objectives-developer
http://www.datastax.com/what-we-offer/products-services/training/objectives-administrator
Talk given in September 2011 to the Bay Area R User Group. The talk walks a stochastic project SVD algorithm through the steps from initial implementation in R to a proposed implementation using map-reduce that integrates cleanly with R via NFS export of the distributed file system. Not surprisingly, this algorithm is essentially the same as the one used by Mahout.
Benchmark results for running bioinformatics platform Galaxy on the Amazon Web Services cloud. Results include info about disks, instance types, sizes, and variable data size.
Flat and traditional website design: a comparative evaluation of effectiveness (Ivan Burmistrov)
Talk at the conference "Main Trends in the Development of Work Psychology and Organizational Psychology 2015" (Moscow, October 15-16, 2015).
Article: http://www.interux.ru/publications/Burmistrov_Zlokazova_Izmalkova_Leonova-Flat_vs_Traditional_Webdesign-2015.pdf
Amazon Aurora is a cloud-optimized relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The recently announced PostgreSQL compatibility, together with the original MySQL compatibility, is perfect for new application development and for migrations from overpriced, restrictive commercial databases. In this session, we'll do a deep dive into the new architectural model and distributed systems techniques behind Amazon Aurora, discuss best practices and configurations, look at migration options, and share customer experience from the field.
[Globant Summer Take Over] Empowering Big Data with Cassandra (Globant)
Mar del Plata Summer Take Over Presentation 2016 - By Renato Carelli
DevOps + Infra @ Big Data
Hardening Enthusiast
Cloud evangelist
Bitcoin speculator
Announcing Amazon Aurora with PostgreSQL Compatibility - January 2017 AWS Online Tech Talks (Amazon Web Services)
Amazon Aurora is now PostgreSQL compatible. With Amazon Aurora’s new PostgreSQL support, customers can get several times better performance than the typical PostgreSQL database and take advantage of the scalability, durability, and security capabilities of Amazon Aurora – all for one-tenth the cost of commercial grade databases. Amazon Aurora is a fully managed relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is built on a cloud native architecture that is designed to offer greater than 99.99 percent availability and automatic failover with no loss of data.
Learning Objectives:
• Learn about the capabilities and features of Amazon Aurora with PostgreSQL Compatibility
• Learn about the benefits and different use cases
• Learn how to get started using Amazon Aurora with PostgreSQL Compatibility
It's been an exciting year for Amazon Aurora, the MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we'll discuss best practices and explore new features, including high availability options and new integrations with AWS services. We'll also discuss the recently announced Aurora with PostgreSQL compatibility.
Deep Dive on the Amazon Aurora MySQL-compatible Edition - DAT301 - re:Invent (Amazon Web Services)
The Amazon Aurora MySQL-compatible Edition is a fully managed relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. It is purpose-built for the cloud using a new architectural model and distributed systems techniques. It provides far higher performance, availability, and durability than previously possible using conventional monolithic database architectures. Amazon Aurora packs a lot of innovations in the engine and storage layers. In this session, we do a deep-dive into some key innovations behind Amazon Aurora MySQL-compatible edition. We explore new improvements to the service and discuss best practices and optimal configurations.
by Joyjeet Banerjee, Enterprise Solutions Architect, AWS
Amazon Aurora is a MySQL- and PostgreSQL-compatible database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. In this deep dive session, we’ll discuss best practices and explore new features in areas like high availability, security, performance management and database cloning. Level 300
This presentation was used by Blair during his talk on Aurora and PostgreSQL compatibility for Aurora at pgDay Asia 2017. The talk was part of the dedicated PostgreSQL track at FOSSASIA 2017.
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database... (Amazon Web Services)
Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is a disruptive technology in the database space, bringing a new architectural model and distributed system techniques to provide far higher performance, availability and durability than previously available using conventional monolithic database techniques. In this session, we will do a deep-dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share customer experiences from the field.
Learning Objectives:
Learn how Amazon Aurora delivers 5x the performance and 1/10th the cost
Learn best practices for using Amazon Aurora
C* Summit 2013: Cassandra at Instagram by Rick Branson (DataStax Academy)
Speaker: Rick Branson, Infrastructure Engineer at Instagram
Cassandra is a critical part of Instagram's large scale site infrastructure that supports more than 100 million active users. This talk is a practical deep dive into data models, systems architecture, and challenges encountered during the implementation process.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
AWS January 2016 Webinar Series - Amazon Aurora for Enterprise Database Applications (Amazon Web Services)
Relational databases are a cornerstone of the enterprise IT landscape, powering business-critical applications of many kinds. Though they have been around for a while, current commercial relational databases have lagged behind in innovation. Amazon Aurora, a managed database service built for the cloud, is intended to change that. It targets the high-performance needs of business-critical applications with an emphasis on cost-effectiveness.
In this session, we will look into how Aurora fits the needs of applications built and bought by enterprises to power their business.
Learning Objectives:
Learn about the overall architecture, capabilities, and cost-effectiveness of Aurora, comparing it to current commercial database offerings
Explore best practices for enterprises adopting Aurora for existing and new applications, as well as strategies, tools, and techniques for migrating existing databases to Aurora
Who Should Attend:
IT Managers, DBAs, Enterprise and Solution Architects, DevOps Engineers, and Developers
Forrester CXNYC 2017 - Delivering great real-time CX is a true craft (DataStax Academy)
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of the competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with customer experience strategies.
Introduction to DataStax Enterprise Graph Database (DataStax Academy)
DataStax Enterprise (DSE) Graph is built to manage, analyze, and search highly connected data. DSE Graph, built on the NoSQL database Apache Cassandra, delivers continuous uptime along with predictable performance and scales for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra (DataStax Academy)
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs – https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data Modeling is the one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data Modeling for relational databases is more than a touch different than the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into in practice.
In the second part of this talk, we'll dive into how best to effectively use the DataStax Java drivers. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly, part 1 (DataStax Academy)
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly, part 2 (DataStax Academy)
This talk covers scaling Cassandra to a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GridMate - End-to-end testing is a critical piece to ensure quality and avoid regressions (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transformation (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Transcript: Selling digital books in 2024: Insights from industry leaders (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
5. Cassandra in EC2 at Talkbits
NetworkTopologyStrategy + EC2MultiRegionSnitch.
1 DC, 3 racks (availability zones within a single region), N nodes per rack; 3N nodes total.
Data stored in 3 local copies, 1 per zone (a hedged keyspace sketch follows this slide).
Write with LOCAL_QUORUM, read with consistency 1 or 2.
m1.large nodes (2 cores, 4 ECU, 7.5 GB RAM).
Transaction log and data files are both on a RAID0-ed ephemeral drive (2 drives in the array). Works for SSD or EC2 disks only!
Other typical setup options for EC2:
m1.xlarge (16 GB) / m2.4xlarge (64 GB) / hi1.4xlarge (SSD) nodes
EBS-backed data volumes (not recommended; use for development only).
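As an illustrative aside (not part of the original slides), a minimal sketch of a keyspace definition matching the "3 copies, 1 per zone" layout above, using the DataStax Python driver; the keyspace name, contact point, and data-center name are assumptions.

    from cassandra.cluster import Cluster

    # Hedged sketch: keyspace name, contact point, and DC name are illustrative only.
    cluster = Cluster(contact_points=["10.0.0.1"])  # assumed seed node address
    session = cluster.connect()

    # With NetworkTopologyStrategy + EC2MultiRegionSnitch, each availability zone is
    # treated as a rack, so a replication factor of 3 in one DC puts one replica per zone.
    # 'us-east' stands in for the data-center name reported by the EC2 snitch (assumption).
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS talkbits_data
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'us-east': 3
        }
    """)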
6. Cassandra consistency options
Definitions
N, R, W settings from Amazon Dynamo.
N – replication factor. Set per keyspace on keyspace creation.
Quorum: floor(N / 2) + 1.
RW consistency options:
ANY, ONE, TWO, THREE, QUORUM, LOCAL_QUORUM & EACH_QUORUM (multi-DC), ALL.
Set per query (see the driver sketch after this slide).
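To make "set per query" concrete, a small hedged sketch with the DataStax Python driver (not from the original deck); the table, keyspace, and contact point are hypothetical.

    import uuid

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])              # assumed contact point
    session = cluster.connect("talkbits_data")   # keyspace from the earlier sketch

    event_id = uuid.uuid4()

    # Consistency level is attached to each statement, not to the keyspace.
    write = SimpleStatement(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",   # hypothetical table
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    session.execute(write, (event_id, "example payload"))

    read = SimpleStatement(
        "SELECT payload FROM events WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,   # "read with 1" from slide 5
    )
    row = session.execute(read, (event_id,)).one()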
7. Cassandra consistency semantics
W + R > N
Ensures strong consistency: a read will always reflect the most recent write.
R = W = [LOCAL_]QUORUM
Strong consistency. See the quorum definition and formula above. For example, with N = 3 (as in the Talkbits setup), QUORUM = 2 and W + R = 4 > 3.
W + R <= N
Eventual consistency.
W = 1
Good for fire-and-forget writes: logs, traces, metrics, page views, etc. (A small helper illustrating these rules follows this slide.)
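As an illustrative aside (not in the original slides), the rules above can be checked mechanically; a tiny sketch:

    def quorum(n: int) -> int:
        # Cassandra quorum for replication factor n: floor(n / 2) + 1.
        return n // 2 + 1

    def consistency(w: int, r: int, n: int) -> str:
        # Classify a write/read consistency-level combination against W + R > N.
        return "strong (W + R > N)" if w + r > n else "eventual (W + R <= N)"

    # With N = 3, QUORUM is 2, so quorum writes plus quorum reads are strongly consistent.
    assert quorum(3) == 2
    print(consistency(w=quorum(3), r=quorum(3), n=3))   # strong: 2 + 2 > 3
    print(consistency(w=1, r=1, n=3))                   # eventual: fire-and-forget writes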
8. Cassandra backups to S3
Full backups
• Periodic snapshots (daily, weekly)
• Remove from local disk after upload to S3 to prevent disk overflow
Incremental backups
• SSTables are compressed and copied to S3
• Happens on IN_MOVED_TO and IN_CLOSE_WRITE inotify events
• Don't turn on with leveled compaction (huge network traffic to S3)
Continuous backups
• Compress and copy the transaction log to S3 at short time intervals (for example 5, 30, or 60 minutes)
(A minimal snapshot-and-upload sketch follows this slide.)
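As an illustrative aside (not from the original deck), a minimal sketch of the full-backup flow above: take a snapshot with nodetool, push the snapshot files to S3, and delete them locally. The bucket name, keyspace, paths, and snapshot tag are assumptions; this is roughly the flow that the tools on the next slide automate, with far more care around errors and restores.

    import os
    import subprocess

    import boto3

    KEYSPACE = "talkbits_data"               # assumed keyspace
    DATA_DIR = "/var/lib/cassandra/data"     # default Cassandra data directory
    BUCKET = "my-cassandra-backups"          # assumed S3 bucket
    SNAPSHOT_TAG = "daily-backup"            # illustrative snapshot tag

    # 1. Take a snapshot: nodetool hard-links the current SSTables under
    #    <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    subprocess.check_call(["nodetool", "snapshot", "-t", SNAPSHOT_TAG, KEYSPACE])

    # 2. Upload every snapshot file to S3, then remove it locally so old
    #    snapshots do not fill up the disk.
    s3 = boto3.client("s3")
    for root, _dirs, files in os.walk(os.path.join(DATA_DIR, KEYSPACE)):
        if "snapshots" not in root or os.path.basename(root) != SNAPSHOT_TAG:
            continue
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, DATA_DIR)   # keep keyspace/table layout in S3 keys
            s3.upload_file(path, BUCKET, key)
            os.remove(path)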
9. Cassandra backups to S3 - tools
tablesnap from SimpleGeo
https://github.com/Instagram/tablesnap (most up-to-date fork)
The whole tool is 3 simple Python scripts (tablesnap, tableslurp, tablechop). It uploads SSTables to S3 in real time, restores them, and removes old backup uploads from S3.
Priam from Netflix
https://github.com/Netflix/Priam
A full-blown web application. Requires a servlet container to run and depends on the Amazon SimpleDB service for distributed token management.