C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix) | Cassandra Summit 2016


Netflix stores 98 percent of the data related to its streaming services in Cassandra, from bookmarks and viewing history to billing and payment information. These services and applications need a highly available and scalable persistence solution to keep running efficiently in both normal and disaster situations. How does Netflix plan capacity for its new as well as existing services? In this talk, Arun Agrawal, Senior Software Engineer, and Ajay Upadhyay, Cloud Data Architect @ Netflix, will talk about capacity planning and capacity forecasting in the Cassandra world. We will take you through the science behind forecasting short- and long-term usage and auto-scaling adequate capacity well before C* clusters reach their limit. This guarantees a highly scalable and available persistence solution that meets our SLAs @ Netflix.

About the Speakers

Ajay Upadhyay, Senior Database Engineer, Netflix. Responsible for the persistence layer at Netflix as part of the CDE [Cloud Database Engineering] team. Works with application teams, suggesting and guiding them on best practices for the various persistence layers provided by the CDE team.

Arun Agrawal, Senior Software Engineer, Netflix. Arun Agrawal is part of Cloud Database Engineering, which provides CAAS (Cassandra as a service). Arun ensures smooth operation of the service and finds innovative ways to reduce the management overhead of running CAAS.

Capacity Forecast @ Scale
CDE, Cloud Database Engineering
Netflix.

Who are we?
● CDE, Cloud Database Engineering
● Providing data stores as a service
○ Cassandra
○ Dynomite
○ Elasticsearch and RDS
Ajay Upadhyay
Cloud Data Architect @ Netflix
Arun Agrawal
Sr. Software Engineer @ Netflix

Agenda
● Cassandra @ Netflix
● Cassandra footprint
● Capacity planning lifecycle
● Forecasting the capacity
● Q and A

Cassandra @ Netflix
• 98% of streaming data is stored in Cassandra
• Data ranges from customer details to viewing history / streaming bookmarks to billing and payment

Capacity Planning
• Able to predict
– Current usage and available capacity
– Resources needing upgrade
– Life cycle of current configuration
– Appropriate configuration for new and existing App/Service
• Optimize
– Under- or over-utilized resources
– Increased business productivity

Capacity Planning
Avoid:
• Impact on business
• Service or SLA disruption
• Unplanned maintenance
• Firefighting

Life Cycle
• Capture Requirement
• Requirement Analysis / feasibility
• Proxy or Simulate Requirement
• Monitoring / Trending
• New / Increased traffic
• Optimization

Capture Requirement
– IOPs and SLA
– Maintenance overhead
– Failover
– Access pattern

IOPs and SLA
Questions                        Response
Read OPS/sec [avg, peak]         5k - 10k
Read latency requirement         95th - 20ms, 99th - 100ms
Write OPS/sec [avg, peak]        1k - 2k
Write latency requirement        95th - 20ms, 99th - 100ms
Num columns / row                100
Avg col size / or avg row size   64k
Num of rows                      100 Mil
TTL [life cycle of data]         365 days
[Diagram: Gutenberg publisher service, Read and Write paths to the C* data store]
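
The sample responses above (100 Mil rows, 64k average row size) are enough for a first back-of-the-envelope size estimate. The sketch below is only an illustration: it reads "64k" as 64 KiB per row, and the replication factor of 3 is an assumption that does not appear in the slides. It ignores compression, compaction headroom, and TTL expiry.

```python
TIB = 1024 ** 4  # bytes per tebibyte


def raw_data_bytes(num_rows: int, avg_row_bytes: int) -> int:
    """Un-replicated data size: row count times average row size."""
    return num_rows * avg_row_bytes


def replicated_bytes(raw_bytes: int, replication_factor: int) -> int:
    """Total bytes stored across the cluster once every row is replicated."""
    return raw_bytes * replication_factor


rows = 100_000_000        # "Num of rows: 100 Mil" from the requirements table
avg_row = 64 * 1024       # assumption: "64k" means 64 KiB per row
rf = 3                    # assumption: replication factor, not stated in the slides

raw = raw_data_bytes(rows, avg_row)
total = replicated_bytes(raw, rf)
print(f"raw: {raw / TIB:.2f} TiB, replicated x{rf}: {total / TIB:.2f} TiB")
```

If "64k" instead describes a single column, the estimate changes by the number of columns per row, so the interpretation should be confirmed with the application team before sizing.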

Maintenance Overhead
Type                             Response
Repairs / Compactions            Y/N
Node replacement                 Y
Backup - Full / Incrementals     Y/N

Failover
Questions                        Response
Region failover                  Y/N
SLA in case of region failover   Y/N

Access Pattern
Questions   Response
Read        Point read
            All row readers
            Column slices
Write       Part existing row
            New rows

Proxy/Simulate Traffic
– Proxy existing traffic
– Simulate traffic
– NDBench
– Generate actual / synthetic traffic before final deployment using app

Optimization
• Cache
- Application level
- Fronting cache engine before C*
- Stagger R - W operations if possible
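
As a rough illustration of how a cache in front of C* reduces read load, here is a minimal read-through cache sketch. The class name and the `loader` callable are hypothetical; `loader` stands in for whatever function actually reads from Cassandra. There is no eviction, TTL, or invalidation here, all of which a real fronting cache would need.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Serve repeated reads from memory; call the backing store only on a miss."""

    def __init__(self, loader: Callable[[Hashable], Any]) -> None:
        self._loader = loader          # hypothetical: performs the real C* read
        self._store: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Miss: read from the backing store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value


backend_calls = []


def fake_cassandra_read(key):
    """Stand-in for a database read; records each call so offload is visible."""
    backend_calls.append(key)
    return f"row-for-{key}"


cache = ReadThroughCache(fake_cassandra_read)
cache.get("user:1")
cache.get("user:1")   # served from the cache, no second backend read
print(len(backend_calls), cache.hits, cache.misses)
```

The hit and miss counters give the hit ratio, which is the number that matters for capacity: only misses reach the cluster.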

Trend Analysis
Continuous monitoring / trending on usage pattern

New / Increased Traffic
Capacity planning cycle begins
[Life cycle diagram repeated: Capture Requirement, Requirement Analysis / feasibility, Proxy or Simulate Requirement, Monitoring / Trending, New / Increased traffic, Optimization]

Pain Points
• No support for complex relationships
• Hardware failures could lead to false positives

Winston
• Bridge between Atlas and on-call
• Complex relationship modeling between metrics
• Reduce false positives
• Auto-remediation platform

Lesson Learnt
• It might already be too late to fix the system.
• Reactive rather than proactive

Requirements
• Show us the trend for the clusters.
• Warn us of what is coming if the trend continues.
• Give us time to scale the cluster.
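
One simple way to turn these requirements into a number is to fit a straight line to recent daily usage and ask how many days remain before it crosses a limit. The sketch below is an illustration only: the function name, the units, and the choice of a plain least-squares line are assumptions, and it is deliberately simpler than the ARIMA model covered later in the deck.

```python
import math
from typing import Optional, Sequence


def days_until_threshold(daily_usage: Sequence[float], threshold: float) -> Optional[int]:
    """Days from the last sample until a least-squares trend line reaches threshold.

    Returns 0 if the latest sample is already at or above the threshold,
    and None if the trend is flat or decreasing.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    if daily_usage[-1] >= threshold:
        return 0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    slope /= sum((x - mean_x) ** 2 for x in range(n))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope   # day index on the fitted line
    return max(0, math.ceil(crossing_day - (n - 1)))


# Hypothetical disk usage (percent) for five consecutive days, 70% alert limit.
print(days_until_threshold([50, 52, 54, 56, 58], 70))
```

The returned lead time is what gives operators "time to scale": an alert can fire when it drops below however long a scale-up takes.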

Aggregation
• Daily
• Instance Level
• Cluster Level
• Instance Failures
• Adding capacity over days
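
The daily, instance-level, and cluster-level roll-up can be sketched as below. The tuple layout, the metric (disk used in GB), and the choice of "daily peak per instance, summed per cluster" are all assumptions made for illustration; the slides do not say which statistic is used. The sketch also does nothing special for instance failures or capacity added mid-period, which the slide lists as concerns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, str, str, float]  # (day, cluster, instance, disk_used_gb)


def aggregate_daily(samples: Iterable[Sample]) -> Dict[Tuple[str, str], float]:
    """Roll raw samples up to one value per (day, cluster)."""
    # Instance level: keep the daily peak for each instance.
    instance_peak: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for day, cluster, instance, used in samples:
        key = (day, cluster, instance)
        instance_peak[key] = max(instance_peak[key], used)

    # Cluster level: sum the instance peaks for each day.
    cluster_total: Dict[Tuple[str, str], float] = defaultdict(float)
    for (day, cluster, _instance), peak in instance_peak.items():
        cluster_total[(day, cluster)] += peak
    return dict(cluster_total)


# Hypothetical samples: two readings for one instance, one for another.
samples = [
    ("2016-09-01", "cluster_a", "i-1", 100.0),
    ("2016-09-01", "cluster_a", "i-1", 120.0),
    ("2016-09-01", "cluster_a", "i-2", 80.0),
]
print(aggregate_daily(samples))
```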

Growth Criteria
f(x) of
– Subscriber
– Netflix content
– # Viewing Sessions

ARIMA
– AR (autoregressive)
• Regression on prior values
– I (integrated)
• Data values are replaced with their differences (x(i) - x(i-1))
– MA (moving average)
• Linear combination of past error terms
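
To make the AR and I pieces concrete, here is a hand-rolled sketch of an ARIMA(1,1,0)-style forecast: difference the series once, fit an AR(1) model to the differences by least squares, then add the predicted differences back. It leaves out the MA term entirely and is not the model the speakers used; function names and the sample data are hypothetical. A production forecast would normally rely on a statistics library instead.

```python
from typing import List, Sequence, Tuple


def difference(series: Sequence[float]) -> List[float]:
    """The 'I' step: replace each value with x(i) - x(i-1)."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]


def fit_ar1(diffs: Sequence[float]) -> Tuple[float, float]:
    """The 'AR' step: least-squares fit of d(t) = c + phi * d(t-1)."""
    x, y = diffs[:-1], diffs[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((v - mean_x) ** 2 for v in x)
    if var_x == 0:
        # Perfectly constant growth: no autoregressive signal to fit.
        return mean_y, 0.0
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / var_x
    return mean_y - phi * mean_x, phi


def forecast(series: Sequence[float], steps: int) -> List[float]:
    """Forecast future values by predicting differences and integrating them."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # next predicted difference
        value += last_diff                # undo the differencing
        predictions.append(value)
    return predictions


# Hypothetical daily cluster usage growing by 10 units per day.
print(forecast([100, 110, 120, 130, 140], 3))
```

Differencing is what lets the autoregression work on a steadily growing metric: the raw usage trends upward, while its day-over-day change is roughly stable.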

Future
• Vector Auto Regression
• Automate manual judgement

Resources
– https://www.otexts.org/fpp/8

You may not control all the events that happen to you,
but you CAN decide not to be reduced by them.
- Maya Angelou

During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE) which is based on Kubernetes and Docker. About the Speakers Ben Lackey Partner Architect, DataStax I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal. Ravi Madasu Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform providing a developer friendly deployment experience. He has 15+ years of experience, working in variety of roles such as software engineer, project manager and product manager. Ravi received a Masters degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.

Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...

DataStax

DataStax provides modern, feature-rich, and highly tunable client libraries for C/C++, C#, Java, Node.js, Python, PHP, and Ruby that work with any cluster size no matter if deployed across multiple on premise or cloud datacenters. Come learn right from the source about the DataStax drivers for Apache Cassandra and DSE and how they can help you build continuously available, fault tolerant, and instantly responsive applications. About the Speakers Alex Popescu Senior Product Manager, DataStax I'm a developer turned product manager building developer tools for Apache Cassandra and DSE. With an eye for simplicity, I focus on creating friendly developer solutions that enable building high-performance, scalable, and fault tolerant applications. I'm passionate about open source and over years I made numerous contributions to major projects like TestNG and Groovy. Bulat Shakirzyanov Architect, DataStax Bulat Shakirzyanov, a.k.a. avalance123, is a software alchemist who holds a black belt in test-fu. Open source enthusiast, author of and contributor to several popular open source projects, he also loves talking about clean code, open source, unix, distributed systems, consensus algorithms and himself in third person.

Azure + DataStax Enterprise Powers Office 365 Per User Store

DataStax Academy

Tsinghua University: Two Exemplary Applications in China

DataStax Academy

In this talk, we will share the experiences of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading company of Machinery manufacturing, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved the online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for valued-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed in the China Meteorological Administration to manage the Meteorological data. We design a hybrid schema to support both slice query and time window based query efficiently. Also, we explored the optimized compaction and deletion strategy for meteorological data in this case.

Data Pipelines with Spark & DataStax Enterprise

DataStax

Cassandra Summit 2014: Apache Cassandra Best Practices at Ebay

DataStax Academy

Presenter: Feng Qu, Principal DBA at eBay Cassandra has been adopted widely at eBay in recent years and used by many end-user facing applications. I will introduce best practices we have built over the time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store fitting into eBay infrastructure.

Workshop - How to benchmark your database

ScyllaDB

Why you need benchmarks Finding the right database solution for your use case can be an arduous journey. The database deployment touches aspects of throughput performance, latency control, high availability and data resilience. You will need to decide on the infrastructure to use: Cloud, on-premise or a hybrid solution. Data models also have an impact on finding the right fit for the use case. Once you establish a requirements set, the next step is to test your use case against the databases of choice. In this workshop, we will discuss the different data points you need to collect in order to get the most realistic testing environment. We will cover: Data model impact on performance and latency Client behavior related to database capabilities Failover and high availability testing Hardware selection and cluster configuration impact We will show 2 benchmarking tools you can use to test and benchmark your clusters to identify the optimal deployment scenario for your use case. Attend this virtual workshop if you are: Looking to minimize the cost of your database deployment Making a database decision based on performance and scale data Planning to emulate your workload on a pre-production system where you can test, fail fast and learn.

Productizing a Cassandra-Based Solution (Brij Bhushan Ravat, Ericsson) | C* S...

DataStax

Cassandra is a better alternative to RDBMS for a scalable solution which requires a distributed DB but it is more popular in clustered solutions which are targeted for a single installation. Key reason is maintainability & life-cycle management. Ericsson has re-engineered its voucher management solution for prepaid billing by replacing RDBMS with Cassandra. It facilitates clusters with large set of nodes which can easily scale up & scale down, so that one doesn't have to deal with multiple clusters. However, skills for its administration are sparse, unlke RDBMS. Activities like nodetool repair, compaction & scale up/down become challenging. Moreover, frequency of new Cassandra releases is high and rolling them out to several deployments is challenging Key technical challenges were consistency of denormalized data, performance of full-table scan & porting the product from Thrift to CQL. Challenges with large scale global deployments are with anti-entropy & size-tiered compaction. About the Speaker Brij Bhushan Ravat Chief Architect, Ericsson Brij is Chief Architect for prepaid billing product in Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led Centre of Excellence for Network Applications, which tracks emerging trends in the application development in the area of telecom. This includes telecom services, OSS & leveraging big data technologies for innovative new age solutions His focus is on application of big data in telecom. This includes analytics using Spark & NoSQL

Cassandra is a distributed database with features included but not limited to Secundary Indexes, UDF, Materialized Views, etc. and not so strict hardware requirements. It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible. I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way. Learning Objective #1: Learn that Cassandra hardware requirements exist (and why) and the shortcomings in some of features(Secundary Indexes, Compaction Strategies, etc). Learning Objective #2: The most misused features and common hardware errors. How they might seem harmeless at first (either small cluster or even single node). Learning Objective #3: How to correctly use Cassandra and it's features and go for perfect operation. About the Speaker Carlos Rolo Cassandra Consultant, Pythian Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things.. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying the his local community.

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

DataStax Academy

The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options. About Robbie Strickland, Software Development Manager at The Weather Channel Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...

DataStax

Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development. About the Speaker Aaron Ploetz Lead Technical Architect, Target Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.

Lambda Architecture with Spark

Knoldus Inc.

IEEE International Conference on Data Engineering 2015

Yousun Jeong

Stsg17 speaker yousunjeong

Yousun Jeong

Spark Summit EU talk by Kaarthik Sivashanmugam

Spark Summit

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Data Con LA

Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator. With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster. Speaker bio Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.

Webinar: How to Shrink Your Datacenter Footprint by 50%

ScyllaDB

Are you running separate database clusters for operational and analytical workloads? If your company is like most, you're dedicating too much time and effort maintaining infrastructure to support both OLTP and OLAP. To make life easier, Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. We call it Workload Prioritization, and it could make a big difference to your team. Join our webinar to learn about the vision behind developing this feature. We’ll show you: - The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter - How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities - The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization Plus we’ll share test results of how it performs in real-world settings.

Instaclustr webinar 2017 feb 08 japan

Hiromitsu Komatsu

The True Cost of NoSQL DBaaS Options

ScyllaDB

Many NoSQL DBaaS vendors limit what cloud platform you can run on, the size of the data you can run and require you to over-provision cloud infrastructure resources while failing to deliver performance and low latency at scale. In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service. You will learn: - The true cost of ownership for selected NoSQL DBaaS offerings - The 8 essentials for selecting a NoSQL DBaaS - Migration options from Apache Cassandra, DynamoDB and other databases

Lambda architecture

Szilveszter Molnár

Data Stores @ Netflix

Vinay Kumar Chella

How to Build a Scylla Database Cluster that Fits Your Needs

ScyllaDB

Sizing a database cluster makes or breaks your application. Too small and you could sustain spikes in usage and recover from a node loss or an operational slowdown. Too big and your cluster will cost more and waste valuable human resources. Since different workloads have different requirements, successful sizing of your application should be optimized for both throughput and latency performance. However, in many cases, the requirements for each contradicts each other. In this webinar, we explain how to remediate the contradicting forces and build a sustainable cluster to meet both performance and resiliency requirements.

Case Study: Troubleshooting Cassandra performance issues as a developer

Carlos Alonso Pérez

Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Big Data Spain

Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...

DataStax

Designing & Optimizing micro batch processing system to handle multi-billion events using 100+ nodes of Cassandra , spark and Kafka - Lessons learned from the trenches Designing and Optimizing 20+ billion operations a day presents a set of complex challenges especially when the SLA is near real-time. In this presentation we will walk through our experience in building large scale event processing pipeline using Cassandra , spark streaming and kafka using 100+ nodes. We will present the Design patterns, development steps and diagnostics setups at the technology level and application level that are needed to manage the application of this scale. We also aim to present some unique problems we encountered in optimizing and operationalizing these environments. About the Speakers Ananth Ram Senior Principal / Senior Manager, Accenture Ananth Ram is a Solution Architect with over 17 years of experience in Oracle database Architecture and designing large scale applications. He was with Oracle Corp for nine years before joining Accenture as Senior Principal . As a part of Accenture, Ananth has been working on many large scale Oracle and big data initiatives in the last four years. Rich Rein Solution Architect, DataStax Rich Rein is a Solutions Architect from DataStax on Accenture team with over 30+ years as an architect, manager, and consultant in Silicon Valley's computing industry. Rumeel Kazi, Accenture Federal Rumeel Kazi is a Senior Manager in the Accenture Health & Public Service (H&PS) practice. He has over 17 years of Systems Integration implementation experience involving Oracle, J2EE platforms, Enterprise Application Integration, Supply Chain, ETL and Business Rules Management Systems. Rumeel has been working on large scale Oracle and big data application solutions since the last 5 years.

Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...

DataStax

The Strong Consistency provided by QUORUM reads in Cassandra can still lead to read-write-modify problems when applications want to do things such as guarantee uniqueness or sell exactly 300 cinema tickets. Fortunately Light Weight Transactions (LWT) are designed to solve the problems Strong Consistency can not. In this talk Christopher Batey, Consultant at The Last Pickle, will discuss: - Syntax and semantics: Theoretical use cases - How they work under the covers Then we will go through LWTs in practice: - How do the number of nodes/replicas/data centres affect performance? - How does contention (multiple concurrent queries using LWTs) affect availability and performance? - What consistency guarantees do you get with other LWTs and non-LWTs? - How does LWT timeout differ from normal write timeout? - Use case: LWTs as a distributed lock and how it went wrong 5 times. About the Speaker Christopher Batey Consultant / Software Engineer, The Last Pickle Christopher (@chbatey) is a part time consultant at The Last Pickle where he works with clients to help them succeed with Apache Cassandra as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can checkout his blog at: http://www.batey.info

What's hot

Spark Summit EU talk by Mike Percy

Spark Summit

Cisco: Cassandra adoption on Cisco UCS & OpenStack

DataStax Academy

Cassandra CLuster Management by Japan Cassandra Community

Hiromitsu Komatsu

DIscover Spark and Spark streaming

Maturin BADO

Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...

DataStax

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

DataStax Academy

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...

DataStax

Lambda Architecture with Spark

Knoldus Inc.

IEEE International Conference on Data Engineering 2015

Yousun Jeong

Stsg17 speaker yousunjeong

Yousun Jeong

Spark Summit EU talk by Kaarthik Sivashanmugam

Spark Summit

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Data Con LA

Webinar: How to Shrink Your Datacenter Footprint by 50%

ScyllaDB

Instaclustr webinar 2017 feb 08 japan

Hiromitsu Komatsu

The True Cost of NoSQL DBaaS Options

ScyllaDB

Lambda architecture

Szilveszter Molnár

Data Stores @ Netflix

Vinay Kumar Chella

How to Build a Scylla Database Cluster that Fits Your Needs

ScyllaDB

Case Study: Troubleshooting Cassandra performance issues as a developer

Carlos Alonso Pérez

Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Big Data Spain

What's hot (20)

Spark Summit EU talk by Mike Percy

Cisco: Cassandra adoption on Cisco UCS & OpenStack

Cassandra CLuster Management by Japan Cassandra Community

DIscover Spark and Spark streaming

Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...

Lambda Architecture with Spark

IEEE International Conference on Data Engineering 2015

Stsg17 speaker yousunjeong

Spark Summit EU talk by Kaarthik Sivashanmugam

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Webinar: How to Shrink Your Datacenter Footprint by 50%

Instaclustr webinar 2017 feb 08 japan

The True Cost of NoSQL DBaaS Options

Lambda architecture

Data Stores @ Netflix

How to Build a Scylla Database Cluster that Fits Your Needs

Case Study: Troubleshooting Cassandra performance issues as a developer

Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017

Viewers also liked

Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...

DataStax

Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...

DataStax

KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...

DataStax

In 2012 I presented my first version of the KillrVideo video sharing site and a data model was born! Many things have happened to Cassandra since then and as a result, the data model for KillrVideo has evolved. The transition from Thrift to CQL was the first big shift. From Cassandra 2 to 3 we have seen some major usability enhancements to CQL that have reduced the complexity on the application developer. Indexing changes. Denormalization help. Syntax changes in the select queries. Storage engine changes that has eliminated anti-patterns. A lot to talk about in a constantly evolving project like Apache Cassandra. Don't get left behind! About the Speaker Patrick McFadin Chief Evangelist, DataStax Patrick McFadin is one of the leading experts of Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and consultant for DataStax, he has helped build some of the largest and exciting deployments in production. Previous to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.

Optimizing Cassandra in AWS

greggulrich

Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...

DataStax

Advanced Apache Cassandra operations depends on an understanding of what features are available via the JMX interface. While nodetool exposes many of these, the most useful are still waiting to be discovered. The JMX interface allows the code base to expose functions that operate directly on internal structures, making real time changes to the way the process runs. With this skill in your toolkit there is no limit to the changes you can make. In this talk Nate McCall, CTO at The Last Pickle, will explain how to explore, secure, and invoke the JMX interface exposed by Cassandra. He'll then move on to what you can do with it such as compacting specific SSTables, changing compaction on a single node, managing repairs, diagnosing latency, viewing cross node timeouts, and others. Whether you are a developer or operator, new or experienced, you will be given a thorough understanding of what all is available via JMX without having to consult the code on your own. About the Speaker Nate McCall CTO, The Last Pickle Nate McCall has 16 years of server-side systems and software development experience. He started his involvement in the Cassandra community in the late fall of 2009 when he became one of the original developers on the Hector Java client. He has contributed a number of patches over the years to the Apache Cassandra code base and continues to be actively involved on the mail lists, issue system and IRC. He has been a DataStax MVP every year since the inception of the program.

Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...

DataStax

Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTL's. About the Speaker Eric Stevens Principal Architect, ProtectWise, Inc. Eric is the principal architect, and day one employee of ProtectWise, Inc., specializing in massive real time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.

Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...

DataStax

There are many aspects of tuning Cassandra for production and a lot can go wrong: network splits and latency, hardware issues and failure, data corruption, etc. Most are mitigated with Cassandra's architecture but there are use cases where we need to dig deep and tune all layers to get the result we need to achieve specific business goals. We will explore such case where we had to tune Cassandra for performance but also have consistent results on 99.999% of the queries. Getting even to 99 percent was relatively easy, but pushing those extra nines involved a lot of work. There are many nuts and bolts to turn and tune in order to get consistent results. We will cover biggest latency-inducing factors and see how to set up metrics and tackle inevitable issues when doing cloud-based deployments. We will get into one of the major "sins" regarding AWS deployment by demystifying EBS based storage and talk about how we can leverage OS properties while tuning for high read performance. About the Speaker Matija Gobec CTO, SmartCat Experienced software engineer interested in distributed streaming systems and real time analytics. In love with Cassandra since early versions.

A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...

DataStax

Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...

DataStax

Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble free mutli DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database. In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi DC cluster. He will also look at how multiple DC's are supported through all areas of the Cassandra, how it impacts your application and operations, and how you can always blame the network. About the Speaker Alexander DEJANOVSKI Consultant, The Last Pickle Alexander has been working as a software developer for the last 18 years, mainly for the french leader of express shipments. He's been leading there the effort to build a Cassandra based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers to get the best out of it.

Similar to C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix) | Cassandra Summit 2016

Software Architecture for Cloud Infrastructure

Tapio Rautonen

Pulsar - Real-time Analytics at Scale

Tony Ng

Enterprises are increasingly demanding realtime analytics and insights to power use cases like personalization, monitoring and marketing. We will present Pulsar, a realtime streaming system used at eBay which can scale to millions of events per second with high availability and SQL-like language support, enabling realtime data enrichment, filtering and multi-dimensional metrics aggregation. We will discuss how Pulsar integrates with a number of open source Apache technologies like Kafka, Hadoop and Kylin (Apache incubator) to achieve high scalability, availability and flexibility. We use Kafka to replay unprocessed events to avoid data loss and to stream realtime events into Hadoop, enabling reconciliation of data between realtime and batch. We use Kylin to provide multi-dimensional OLAP capabilities.

WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

WSO2

The Marketplace data team at Uber has built a scalable complex event processing platform to solve many challenging real-time data needs for various Uber products. This platform has been in production for more than a year and supports over 100 real-time data use cases with a team of 3. In this talk, we will share the details of the design and our experience, and how we employ Siddhi, Kafka and Samza at scale.

Using VisualSim Architect for Semiconductor System Analysis

Deepak Shankar

Mirabilis Design provides architecture exploration software for semiconductors, electronics and embedded software. Using this modeling and simulation solution, designers can trade off power against performance, partition a design into hardware and software, optimize for timing, minimize power consumption, perform functional analysis and evaluate the quality of the system in the event of a failure. The outcome of this early exploration is a highly validated specification, a reference design for prospective customers to evaluate and data for certification purposes. VisualSim has a large library of components (stochastic, hardware, software, network and RTOS) that is used to assemble models of the entire system extremely fast and to handle levels of abstraction from stochastic to timing-accurate. These models are simulated against workloads and use cases, and the generated reports are used to make architecture decisions.

A sdn based application aware and network provisioning

Stanley Wang

Tech Talk: Leverage the combined power of CA Unified Infrastructure Managemen...

CA Technologies

Take the guesswork out of your infrastructure environment by combining CA Unified Infrastructure Management, CA Network Flow Analysis and CA Application Delivery Analysis. Learn how to optimize your infrastructure by combining IT monitoring, network traffic monitoring and application response time monitoring solutions to give you enhanced end-to-end visibility into your infrastructure. This session will review the power of the three solutions and explain how you can easily combine them to give you the information you need. For more information, please visit http://cainc.to/Nv2VOe

Big Data Berlin v8.0 Stream Processing with Apache Apex

Apache Apex

Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...

Dataconomy Media

CSense: A Stream-Processing Toolkit for Robust and High-Rate Mobile Sensing A...

Farley Lai

Creating a Centralized Consumer Profile Management Service with WebSphere Dat...

Prolifics

In this presentation we will talk about how one of the world's leading financial institutions leveraged WebSphere DataPower to provide a set of centralized consumer profile management services. This central service would be leveraged by internal and external applications, and would align with enterprise marketing capabilities. The solution included a complex security model built on the following products: Tivoli Directory Server, Tivoli Access Manager and Tivoli Federated Identity Manager. We will describe how to build complex orchestrations in WebSphere DataPower, and also go through some of the performance tuning options we implemented to achieve a high degree of efficiency.

Transform Your Organization with Real Real-Time Monitoring

Amazon Web Services

Acquia, a Drupal web experience provider, faced a common growing pain: with its expanding customer base and AWS workloads came numerous monitoring systems and scattered data from disparate sources and teams. The company knew it needed better insight into its customers’ resources and quicker access to data it could trust. Join our webinar to see why Acquia turned to SignalFx for real real-time monitoring for its AWS environment, enabling its entire organization with operational insights, from development all the way through sales. Learn how Acquia consolidated the number of monitoring services used, improved the quality of its customer services, and saved more than half a million dollars per year in costs.

Guide to Application Performance: Planning to Continued Optimization

MuleSoft

Improving Traffic Prediction Using Weather Data with Ramya Raghavendra

Spark Summit

As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly impact how drivers plan their day by alerting users before they travel, finding the best times to travel, and, over time, learning from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitan areas in the US, 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with the Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over 100 cities. The audience will learn about our experience scaling using Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques to work with geospatial and time-series data.

Cloud Migration

Jolyne Marie

Databus - LinkedIn's Change Data Capture Pipeline

Sunil Nagaraj

COLLABORATE 18 Presentation: Demand Planning in Cloud R13

Jade Global

Understanding Demand Planning in Cloud R13 Through an Early Adopter Case Study. Session Abstract: Oracle has released Demand Planning in Cloud with Release 13. We will share the experiences of an early adopter customer with their demand management and SNOP processes in cloud. We will also compare the cloud offering with that of Demantra and provide guidance on its readiness for different industries. In the end we will explore the coexistence possibilities and prerequisites for Demand Management Cloud.

Autoscaling Confluent Cloud: Should We? How Would We?

HostedbyConfluent

"Although cloud-based, managed Kafka offerings abstract away most administrative responsibilities, a few admin-related concerns remain––like cluster scaling. When is scaling your cloud-based Kafka appropriate? And how should you set it up to auto-scale? Gone are the days of over-provisioning resources to meet expected demand. Technologies like kubernetes make it relatively simple to implement strategies around both horizontal and vertical scaling. Cloud providers give users the ability to track their resource utilization and set up autoscaling groups and policies. Cloud administrators use these tools (and others) to guarantee their applications can handle the demands placed on them. With Kafka being a central pillar of our cloud-native data pipelines it requires administrators to determine if, when and how to scale Kafka as their workloads ebb and flow. In this session, we’ll explore the topic of auto-scaling by implementing a strategy for Confluent Cloud resources. We’ll first discuss common use cases that dictate a need to create a scaling strategy for Confluent Cloud and introduce the approaches best suited for each use case. With a nod to both where we came from and where we are going, we will discuss the architecture of Confluent Cloud and how it affects the way we scale Kafka. Attendees will learn how to deal with ephemeral workloads, what to monitor for when creating an auto-scaling policy, and the “gotchas” of auto-scaling in Confluent Cloud. We will also discuss best practices for scaling Kafka clients, because Kafka is only as scalable as the client applications that connect to it. We will dive into code that examines these approaches and by the end of the session, you’ll have the tools needed to design and implement your own scaling strategy for your Confluent Cloud workloads."

Tuning Java Driver for Apache Cassandra

Nenad Bozic

Apache Cassandra is a distributed, masterless column-store database which is becoming mainstream for analytics and IoT data. Many use cases where Cassandra is a natural fit require latency tuning in order to serve requests really fast. The DataStax driver has many options, some less familiar, which can greatly influence performance. This talk will focus on Java applications and the options at your disposal in the DataStax Java driver, which has become the standard when you want to use this database. We will concentrate on both the monitoring and tuning aspects, and we will provide different options for different use cases. There is no silver-bullet solution, and having many options requires a deep dive when you want to figure out the right decision. This talk will narrow down the options and give you a push in the right direction.

AWS Migration Planning Roadmap

Amazon Web Services

The pathway to the cloud has many different options and levers that customers can pull. This webinar walks customers through actual steps from creating a cloud adoption vision to actually building a migration roadmap with actionable guidance. We’ll go through proven migration patterns, methods and tooling that AWS has leveraged successfully with hundreds of Enterprise customers around the globe. Learn what challenges customers face when planning the migrations to cloud, and how they overcome them to minimize risk and accelerate the adoption.

The Need for Complex Analytics from Forwarding Pipelines

Netronome

More from DataStax

Is Your Enterprise Ready to Shine This Holiday Season?

DataStax

Be a holiday hero—not a sorry statistic. View this on-demand webinar to learn how to drive revenue, business growth, customer satisfaction, and loyalty during the holiday season, and achieve operational excellence (and sanity!) at the same time. You’ll also hear real-world stories of companies that have experienced Black Friday nightmares—and learn how they turned things back around. View webinar: https://pages.datastax.com/20191003-NAM-Webinar-IsYourEnterpriseReadytoShinethisHolidaySeason_1-Registration-LP.html Explore all DataStax webinars: www.datastax.com/webinars

Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...

DataStax

Data resiliency and availability are mission-critical for enterprises today—yet we live in a world where outages are an everyday occurrence. Whether the problem is a single server failure or losing connectivity to an entire data center, if your applications aren’t designed to be fault tolerant, recovery from an outage can be painful and slow. Watch this on-demand webinar to look at best practices for developing fault-tolerant applications with DataStax Drivers for Apache Cassandra and DataStax Enterprise (DSE). View recording: https://youtu.be/NT2-i3u5wo0 Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Running DataStax Enterprise in VMware Cloud and Hybrid Environments

DataStax

To simplify deploying and managing modern applications, enterprises have been combining the benefits of hyperconverged infrastructure (HCI) with the performance and scale of a NoSQL database — and the results have been remarkable. With this combination, IT organizations have experienced more agility, improved reliability, and better application performance. Watch this on-demand webinar where you’ll learn specifically how VMware HCI with DataStax Enterprise (DSE) and Apache Cassandra™ are transforming the enterprise. View recording: https://youtu.be/FCLGHMIB0L4 Explore all DataStax Webinars: https://www.datastax.com/resources/webinars

Best Practices for Getting to Production with DataStax Enterprise Graph

DataStax

A distributed graph database is the most powerful means of discovering and leveraging the relationships in your data. With the right techniques combined with the right enterprise graph features, you can build modern applications at scale for real-time use-cases. But how exactly should you manage and model your data for a distributed graph database? And how can you leverage the relationships in that data? Watch this on-demand webinar as our graph expert answers those questions and shares tips and insights into creating production apps with distributed graph data. View recording: https://youtu.be/TSs_qPnhOas

Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey

DataStax

Data management may be the hardest part of making the transition to the cloud, but enterprises including Intuit and Macy’s have figured out how to do it right. So what do they know that you might not? Join Robin Schumacher, Chief Product Officer at DataStax as he explores best practices for defining and implementing data management strategies for the cloud. He outlines a four-step journey that will take you from your first deployment in the cloud through to a true intercloud implementation and walk through a real-world use case where a major retailer has evolved through the four phases over a period of four years and is now benefiting from a highly resilient multi-cloud deployment. View webinar: https://youtu.be/RrTxQ2BAxjg

Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...

DataStax

In this webinar, you will leverage free and open source tools as well as enterprise-grade utilities developed by DataStax to get a solid grasp on the performance of a masterless distributed database like Cassandra. You’ll also get the opportunity to walk through DataStax Enterprise Insights dashboards and see exactly how to identify performance bottlenecks. View Recording: https://youtu.be/McZg_MMzVjI

Webinar | Better Together: Apache Cassandra and Apache Kafka

DataStax

In this webinar, you’ll also be introduced to DataStax Apache Kafka Connector, and get a brief demonstration of this groundbreaking technology. You’ll directly experience how this tool can help you stream data from Kafka topics into DataStax Enterprise versions of Cassandra. The future of your organization won’t wait. Register now to reserve your spot in this exciting new webinar. Youtube: https://youtu.be/HmkNb8twUNk

Top 10 Best Practices for Apache Cassandra and DataStax Enterprise

DataStax

No matter how diligent your organization is at driving toward efficiency, databases are complex and it’s easy to make mistakes on your way to production. The good news is, these mistakes are completely avoidable. In this webinar, Jeff Carpenter shares with you exactly how to get started in the right direction — and stay on the path to a successful database launch. View recording: https://youtu.be/K9Zj3bhjdQg Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Introduction to Apache Cassandra™ + What’s New in 4.0

DataStax

Apache Cassandra has been a driving force for applications that scale for over 10 years. This open-source database now powers 30% of the Fortune 100. Now is your chance to get an inside look, guided by the company that’s responsible for 85% of the code commits. You won’t want to miss this deep dive into the database that has become the power behind the moment — the force behind game-changing, scalable cloud applications. Patrick McFadin, VP Developer Relations at DataStax, is going behind the Cassandra curtain in an exclusive webinar. View recording: https://youtu.be/z8fLn8GL5as Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...

DataStax

In this webinar, we’ll discuss how an Active Everywhere database—a masterless architecture where multiple servers (or nodes) are grouped together in a cluster—provides a consistent data fabric between on-premises data centers and public clouds, enabling enterprises to effortlessly scale their hybrid cloud deployments and easily transition to the new hybrid cloud world, without changes to existing applications. View recording: https://youtu.be/ob6tr-9YiF4

Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities

DataStax

The European Union’s General Data Protection Regulation (GDPR) has sweeping effects on how enterprises manage their data. Without the right policies and safeguards in place, a tiny data mishap could end up turning into a catastrophic mistake. Join DataStax and our partner Thales eSecurity for a live webinar to learn how GDPR impacts data management and the various ways enterprises can both comply and thrive in a hybrid cloud environment. View recording: https://youtu.be/QZ48_qkK9PU Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Designing a Distributed Cloud Database for Dummies

DataStax

Join Designing a Distributed Cloud Database for Dummies—the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years at Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn: why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy and much more. View the recording: https://youtu.be/azC7lB0QU7E To explore all DataStax webinars: https://www.datastax.com/resources/webinars

How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud

DataStax

Most enterprises understand the value of hybrid cloud. In fact, your enterprise is already working in a multi-cloud or hybrid cloud environment, whether you know it or not. View this SlideShare to gain a greater understanding of the requirements of a geo-distributed cloud database in hybrid and multi-cloud environments. View recording: https://youtu.be/tHukS-p6lUI Explore all DataStax webinars: https://www.datastax.com/resources/webinars

How to Evaluate Cloud Databases for eCommerce

DataStax

View these slides to discover the advantages of a distributed cloud database designed for hybrid cloud along with examples of how companies are delivering innovative and personalized ecommerce experiences. We'll discuss the sources of common data challenges and the hidden impact they have on business, the database requirements for improved customer experiences and innovative application delivery, and how leading organizations such as eBay, Sony, Macy’s, and Comcast are transforming the eCommerce experience with DataStax Enterprise 6. View recording: https://youtu.be/4UXrJ3xtmGg Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...

DataStax

Today’s customers want experiences that are contextual, always on, and above all — delightful. To be able to provide this, enterprises need a distributed, hybrid cloud-ready database that can easily crunch massive volumes of data from disparate sources while offering data autonomy and operational simplicity. Don’t miss this webinar, where you’ll learn how DataStax Enterprise 6 maintains hybrid cloud flexibility with all the benefits of a distributed cloud database, delivers all the advantages of Apache Cassandra with none of the complexities, doubles performance, and provides additional capabilities around robust transactional analytics, graph, search, and more. View recording: https://youtu.be/tuiWAt2jwBw Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...

DataStax

Today’s Right-Now Economy means employees and customers alike expect applications to be always on, real time, and contextual. But how do you manage applications that collect data from a variety of sources, at cloud scale, and provide instant insights? And, can you embrace the public cloud while still retaining control of your data? Join us to hear from Microsoft Cloud Architect and Azure Global Black Belt Ron Abellera to learn how an enterprise-ready hybrid cloud data layer can help to accelerate time to market and scale linearly, ensure continuous availability, and achieve data autonomy with a hybrid cloud strategy. View webinar recording: https://youtu.be/_-GqmAk5C_I Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...

DataStax

Welcome to the Right-Now Economy. To win in the Right-Now Economy, your enterprise needs to be able to provide delightful, always-on, instantaneously responsive applications via a data layer that can handle data rapidly, in real time, and at cloud scale. Don’t miss our upcoming webinar in which Forrester Principal Analyst Brendan Witcher will discuss why a singular, contextual, 360-degree view of the customer in real-time is critical to CX success and how companies are using data to deliver real-time personalization and recommendations. View recording: https://youtu.be/e6prezfIGMY Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Datastax - The Architect's guide to customer experience (CX)

DataStax

An Operational Data Layer is Critical for Transformative Banking Applications

DataStax

Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. Join this webinar, to hear leading experts from DataStax, discuss how DataStax Enterprise, the data management platform trusted by 9 out of the top 15 global banks, enables innovation and industry transformation. They’ll cover how the right data management platform can help break down data silos and modernize old systems of record as an operational data layer that scales to meet the distributed, real-time, always available demands of the enterprise. Register now to learn how the right data management platform allows you to power innovative banking applications, gain instant insight into comprehensive customer interactions, and beat fraud before it happens. Video: https://youtu.be/319NnKEKJzI Explore all DataStax webinars: https://www.datastax.com/resources/webinars

Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking

DataStax

Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. How can you contextualize and analyze all this customer data in real time to meet increasingly demanding customer expectations? Join Mike Rowland, Director and National Practice Leader for CX Strategy at West Monroe Partners, and Kartavya Jain, Product Marketing Manager at DataStax, for an in-depth conversation about how customer experience frameworks, driven by Design Thinking, can help enterprises: understand their customers and their needs, define their strategy for real-time CX, create value from contextual and instant insights.

Recently uploaded

Large Language Models and the End of Programming

Matt Welsh

Understanding Globus Data Transfers with NetSage

Globus

NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?

A Sighting of filterA in Typelevel Rite of Passage

Philip Schwarz

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR

Tier1 app

Even though on the surface ‘java.lang.OutOfMemoryError’ appears to be one single error, underneath there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.

First Steps with Globus Compute Multi-User Endpoints

Globus

In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researchers' workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user Globus Compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges, and we share an update on our progress here.

OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam

takuyayamamoto1800

GlobusWorld 2024 Opening Keynote session

Globus

Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better

XfilesPro

Developing Distributed High-performance Computing Capabilities of an Open Sci...

Globus

COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.

Into the Box 2024 - Keynote Day 2 Slides.pdf

Ortus Solutions, Corp

Enhancing Research Orchestration Capabilities at ORNL.pdf

Globus

Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.

Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...

Mind IT Systems

Cyaniclab : Software Development Agency Portfolio.pdf

Cyanic lab

CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.

BoxLang: Review our Visionary Licenses of 2024

Ortus Solutions, Corp

How to Position Your Globus Data Portal for Success Ten Good Practices

Globus

Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.

Cracking the code review at SpringIO 2024

Paco van Beckhoven

Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production. Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process? In this session we will cover: - The Art of Effective Code Reviews - Streamlining the Review Process - Elevating Reviews with Automated Tools By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces

Globus Connect Server Deep Dive - GlobusWorld 2024

Globus

Globus Compute Introduction - GlobusWorld 2024

Globus

Providing Globus Services to Users of JASMIN for Environmental Data Analysis

Globus

JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.

Graphic Design Crash Course for beginners

e20449

Recently uploaded (20)

Large Language Models and the End of Programming

Understanding Globus Data Transfers with NetSage

A Sighting of filterA in Typelevel Rite of Passage

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR

First Steps with Globus Compute Multi-User Endpoints

OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam

GlobusWorld 2024 Opening Keynote session

Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better

Developing Distributed High-performance Computing Capabilities of an Open Sci...

Into the Box 2024 - Keynote Day 2 Slides.pdf

Enhancing Research Orchestration Capabilities at ORNL.pdf

Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...

Cyaniclab : Software Development Agency Portfolio.pdf

BoxLang: Review our Visionary Licenses of 2024

How to Position Your Globus Data Portal for Success Ten Good Practices

Cracking the code review at SpringIO 2024

Globus Connect Server Deep Dive - GlobusWorld 2024

Globus Compute Introduction - GlobusWorld 2024

Providing Globus Services to Users of JASMIN for Environmental Data Analysis

Graphic Design Crash Course for beginners

C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix) | Cassandra Summit 2016

1. Capacity Forecast @ Scale CDE, Cloud Database Engineering Netflix.

2. ● CDE, Cloud Database Engineering ● Providing data stores as a service ○Cassandra, ○ Dynomite, ○ Elasticsearch and RDS Ajay Upadhyay Cloud Data Architect @ Netflix Arun Agrawal Sr. Software Engineer @ Netflix Who are we?

3. ●Cassandra @ Netflix ●Cassandra footprint ●Capacity planning lifecycle ●Forecasting the capacity ●Q and A Agenda

4. • 98% of streaming data is stored in Cassandra • Data ranges from customer details to Viewing history / streaming bookmarks to billing and payment Cassandra @ Netflix

5. Cassandra Footprint Hundreds C*

6. Cassandra Footprint Thousands

7. Capacity Planning • Able to predict – Current usage and available capacity – Resources needing upgrade – Life cycle of current configuration – Appropriate configuration for new and existing App/Service • Optimize – Under or over utilized resource – Increased business productivity

8. Capacity Planning Avoid: • Impact on Business • Service or SLA disruption • Un-planned maintenance • Firefighting

9. Life Cycle Capture Requirement Requirement Analysis/feasibility Proxy or Simulate Requirement Monitoring / Trending New / Increased traffic Optimization

10. Capture Requirement – IOPs and SLA – Maintenance overhead – Failover – Access pattern

11. IOPs and SLA (Questions: Response)
– Read OPS/sec [avg, peak]: 5k - 10k
– Read Latency requirement: 95th - 20ms, 99th - 100ms
– Write OPS/sec [avg, peak]: 1k - 2k
– Write Latency requirement: 95th - 20ms, 99th - 100ms
– Num Columns / Row: 100
– Avg col size / or avg row size: 64k
– Num of rows: 100 Mil
– TTL [life Cycle of data]: 365 Days
– Data store: C*
[Diagram labels: Gutenberg publisher service (twice), Read, Write]
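The figures captured on this slide can be turned into a first storage estimate. The following is a minimal sketch, not the speakers' method: it assumes the 64k value is an average row size in KiB (the slide leaves open whether it is per column or per row), and the replication factor of 3 is an assumption that does not appear in the deck.

```python
# Rough storage estimate from the requirement sheet on the slide.
# Assumptions (not from the deck): 64k means 64 KiB per row,
# and the cluster keeps 3 replicas of every row.

ROWS = 100_000_000          # "Num of rows: 100 Mil"
AVG_ROW_BYTES = 64 * 1024   # "64k", read here as 64 KiB per row
REPLICATION_FACTOR = 3      # hypothetical value for illustration


def raw_bytes(rows: int, avg_row_bytes: int) -> int:
    """Bytes needed for a single copy of the data set."""
    return rows * avg_row_bytes


def replicated_bytes(rows: int, avg_row_bytes: int, replicas: int) -> int:
    """Bytes needed across all replicas, before compaction or backup overhead."""
    return raw_bytes(rows, avg_row_bytes) * replicas


def to_tib(num_bytes: int) -> float:
    """Convert bytes to tebibytes."""
    return num_bytes / 1024 ** 4


single_copy_tib = to_tib(raw_bytes(ROWS, AVG_ROW_BYTES))
all_replicas_tib = to_tib(replicated_bytes(ROWS, AVG_ROW_BYTES, REPLICATION_FACTOR))
```

Because the TTL is 365 days, this is a steady-state figure: once a year of data has accumulated, expired rows make room for new ones. Compaction headroom and backups would come on top of it.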

12. Maintenance Overhead (Type: Response)
– Repairs / Compactions: Y/N
– Node replacement: Y
– Backup - Full / Incrementals: Y/N

13. Failover (Questions: Response)
– Region Failover: Y/N
– SLA in case of region failover: Y/N

14. Access Pattern (Questions: Response)
– Read: Point read, All row readers, Column slices
– Write: Part existing row, New rows

15. Proxy/Simulate Traffic – Proxy existing traffic – Simulate traffic –NDBench – Generate actual / synthetic traffic before final deployment using app

16. Optimization • Cache - Application level - Fronting cache engine before C* - Stagger R - W operations if possible

17. Cluster Sharding

18. Trend Analysis Continuous monitoring / trending on usage pattern

19. New / Increased Traffic Capacity planning cycle begins Capture Requirement Requirement Analysis/feasibility Proxy or Simulate Requirement Monitoring / Trending New / Increased traffic Optimization

20. Capacity Forecasting

21. Arun Agrawal Sr. Software Engineer

22. Demo

23.

24.

25. Metrics Atlas Previous Architecture

26. Pain Points • No support for complex relationships • Hardware failures could lead to false positives

27. Winston • Bridge between atlas and oncall • Complex relationship modeling between metrics • Reduce false positives • Auto remediation platform
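To make "complex relationship modeling between metrics" concrete, here is a minimal sketch of the kind of rule such a middle layer could evaluate before paging anyone. It is purely illustrative and is not Winston's or Atlas's API: the function name, the metric keys (`p99_read_ms`, `disk_used_pct`), the thresholds, and the action names are all hypothetical.

```python
# Hypothetical alert rule: page only when several metrics agree across
# most of the reporting nodes, and treat silent nodes separately so a
# failed machine does not look like a cluster-wide problem.

def evaluate(nodes, latency_limit_ms=100.0, disk_limit_pct=85.0):
    """nodes maps node name -> metrics dict, or None if the node stopped reporting."""
    reporting = {name: m for name, m in nodes.items() if m is not None}
    silent = sorted(name for name, m in nodes.items() if m is None)

    # A node is "breaching" only if BOTH metrics are over their limits.
    breaching = sorted(
        name
        for name, m in reporting.items()
        if m["p99_read_ms"] > latency_limit_ms and m["disk_used_pct"] > disk_limit_pct
    )

    if reporting and len(breaching) * 2 > len(reporting):
        action = "page"            # a majority of live nodes are in trouble
    elif silent:
        action = "replace_node"    # likely hardware failure, candidate for auto-remediation
    else:
        action = "ok"
    return {"action": action, "breaching": breaching, "silent": silent}
```

The design choice in this sketch is that a single tripped metric or a single silent node never pages on its own, which is one way to cut the false positives the previous slide describes.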

28. Lesson Learnt • It might be already too late to fix the system. • More reactive than proactive

29. Requirements • Show us the trend for the clusters. • Warn us of what is coming if the trend continues. • Give us time to scale the cluster
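The second and third requirements amount to asking how many days remain before a usage metric crosses a limit if the current trend holds. A minimal sketch of that idea, assuming one usage reading per day and a straight-line trend (the deck itself goes on to use ARIMA rather than a plain line fit); the function names and the example limit are hypothetical:

```python
# Fit a least-squares line to daily usage and estimate how many days
# remain until it crosses a capacity limit. Illustrative only.

def fit_line(values):
    """Return (slope, intercept) of the least-squares line through (day index, value)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(values, limit):
    """Days from the last reading until the trend line reaches `limit`.

    Returns 0.0 if the limit is already reached, and None if usage is
    flat or shrinking (the trend never reaches the limit).
    """
    if values[-1] >= limit:
        return 0.0
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - (len(values) - 1))
```

A warning could then be raised whenever the result drops below the lead time needed to add capacity, which is what "give us time to scale" asks for.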

30.

31. Automic (UC4) Architecture

32. Aggregation • Daily • Instance Level • Cluster Level •Instance Failures •Adding capacity over days
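A minimal sketch of the two aggregation levels named on this slide, assuming raw samples arrive as `(day, cluster, instance, disk_used_gb)` tuples; that record shape and the field names are assumptions, not the deck's schema. Averaging over the instances that actually reported on a given day is one way to read the last two bullets: a failed instance does not drag the cluster figure down, and nodes added over time show up as a change in the node count.

```python
from collections import defaultdict

# Daily roll-up: first per instance, then per cluster.

def aggregate_daily(samples):
    """samples: iterable of (day, cluster, instance, disk_used_gb) tuples."""
    # Instance level: keep the peak reading per instance per day.
    instance_daily = {}
    for day, cluster, instance, used in samples:
        key = (day, cluster, instance)
        instance_daily[key] = max(instance_daily.get(key, used), used)

    # Cluster level: summarise over the instances that reported that day.
    per_cluster = defaultdict(list)
    for (day, cluster, _instance), used in instance_daily.items():
        per_cluster[(day, cluster)].append(used)

    return {
        key: {
            "nodes": len(readings),
            "avg_used_gb": sum(readings) / len(readings),
            "max_used_gb": max(readings),
        }
        for key, readings in per_cluster.items()
    }
```

The resulting one-row-per-cluster-per-day series is the kind of input a forecasting step could consume.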

33. Growth Criteria f(x) of – Subscriber – Netflix content – # Viewing Sessions

34.

35.

36. ARIMA – AR •Regression on prior values –I •Data values are replaced with (x(i) - x(i-1)) –MA •Linear combination of error terms
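The AR and I parts of this slide can be sketched in a few lines of plain Python. This is a simplified illustration, not the model used in the talk: it differences the series once (the I step, x(i) - x(i-1)), fits a first-order autoregression with a constant to the differences (the AR step), and leaves out the MA term entirely. The function names are hypothetical, and a real deployment would more likely rely on a statistics library.

```python
# Simplified ARIMA(1,1,0)-style forecast: difference once, fit AR(1)
# with a constant on the differences, then integrate the forecasts back.
# The MA (moving-average of error terms) component is omitted.

def difference(series):
    """I step: replace each value with x(i) - x(i-1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """AR step: least-squares fit of v(t) = c + phi * v(t-1). Returns (c, phi)."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # Constant differences: fall back to a pure drift model.
        return mean_y, 0.0
    phi = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / var_x
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Forecast `steps` future values; needs at least 3 observations."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # next predicted difference
        level += last_diff                # undo the differencing
        predictions.append(level)
    return predictions
```

Differencing removes the steady upward drift typical of storage usage, so the regression only has to model how the day-to-day change evolves.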

37.

38.

39.

40.

41. Future •Vector Auto Regression •Automate manual judgement

42. Resources – https://www.otexts.org/fpp/8

43. Q & A

44. You may not control all the events that happen to you, but you CAN decide not to be reduced by them. - Maya Angelou

Editor's Notes

For the business to deliver quality service that meets and exceeds customers' expectations, it needs the right capacity and resources
Work with app team -
Cluster / ring size: 9 nodes to 300 nodes; 10k instances, ranging from ms to i2 to d2 instances
Current usage and available capacity. Resources needing upgrade. Cost-effective configuration: just a vertical upgrade, no need to add nodes or increase ring size. Life cycle of current configuration: when the cluster will run out of resources. Appropriate configuration for new and existing App/Service.
Analysis – In the analysis phase, data collected in the monitoring phase is analyzed to find problems and evaluate the quality of the deployment. Optimization - stagger reads and writes.
Repair overheads - amount of writes and data size; entropy in the system. No repair - quorum R and W; aggressive TTL data. Compactions - implicit; compaction-threshold - 2; GC grace period more aggressive. Node replacement - replace early if node is still healthy; bootstrap from neighboring nodes. Backup overheads - throttle if it creates a big bottleneck on the network.
Read - full row or column slices. Write - full row or few columns at a time. STCS - size-tiered compaction strategy. LCS - leveled compaction strategy. Aggressive TTLs - few hours to few days. Variable payloads - 1k - 1m range.
Model/simulate traffic using NDBench for a new requirement.
Cache for aggressive latencies. Cluster sharding for data with high and low latency requirements.
Continuous monitoring to keep track of usage patterns. Useful for predicting a cluster's life. Useful for proxying traffic similar to that captured here.
New requirement or change in existing traffic: the capacity planning cycle begins.
Let’s see how we really manage CAAS at Netflix. A short video where we get notified on Slack about a cluster which may reach its capacity, and then we do some investigation, talk with app teams, warn them, and take proper steps (if required) or increase the capacity of the cluster.
Pretty cool and neat stuff, right? So how do we do this? Is it actually some human doing the analysis and posting on Slack? No. Let’s see the science behind this, but before we get there we need to understand the Netflix ecosystem.
Every instance in Netflix uploads all its telemetry information to Atlas. Atlas is a very useful tool, as it combines all the raw inputs from multiple instances based on availability zone, region, application etc. It is a really handy tool to find performance issues, debug, triage and get an aggregated view of an app. One of the many features of Atlas is the ability to set a threshold and duration for a metric which, when tripped, can page on-call.
But let’s face it, if you page a person based on a single metric being tripped, is that right? It is NEVER a single metric which can tell you about the cluster. There are hooks in Atlas where you can define basic relationships between metrics, but again, it is always complex relationships which we are after. In addition to that, false positives could be reported because, let’s face it, we are hosted in AWS and failure of machines is not the exception but the norm. When machines fail they don’t always report metrics, which puts us in the false positive zone.
So to reduce on-call pain, we needed a middle layer of logic which could sit between Atlas and on-call, where we could define complex relationships between multiple metrics, add context, do basic triage and remove those false positives. We brought in “Winston”, which is based on StackStorm, does all this and provides a great UI to work with. Winston has native integration with Atlas, so you can write some Python code which will be triggered when Atlas fires the event. This combination of Atlas and Winston greatly reduces false positives for on-call.
But wait, now the pages we were getting were accurate, but how could we save the cluster? It might be already too late. It is more reactive than proactive.
How can we build a system which tells us that a cluster might come under pressure if the trend continues? If we have such a system, then we are better prepared for what is about to come. The chances of us getting paged in the middle of the night for degraded performance or latency alerts can be reduced drastically, if not avoided completely. This is where we started to think about a system which could predict the future of a cluster.
With such a system, we can take a break and have more confidence that our system will be able to handle what is about to come. Again, this holds only if the “trend” continues; if you do something unexpected, we might still end up increasing the capacity of the cluster at the very last moment. So we set the expectation that this is not a magic ball which will solve all the problems, but it will surely help you find the problem areas before they happen. Don’t expect the clusters to auto-scale when you suddenly add another 100m subscribers!
In Netflix, Atlas metrics are pushed to the big data platform, which is Netflix’s data warehouse. All the metrics are stored there and all analysis can run there.
RSS - residual sum of squares. RMSE - root mean squared error.
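Both measures compare a model's predictions with the observed values, and lower is better. A minimal sketch using the standard definitions; the function names are illustrative, and the notes do not say how the team computed or applied these measures:

```python
import math

# Standard definitions:
#   RSS  = sum over i of (actual_i - predicted_i)^2
#   RMSE = sqrt(RSS / n)

def rss(actual, predicted):
    """Residual sum of squares between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(rss(actual, predicted) / len(actual))
```

Because RMSE is expressed in the units of the metric itself (for example GB of disk), it is often easier to interpret than RSS when comparing candidate forecasts on the same held-out days.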