Netflix stores 98 percent of the data related to its streaming services, from bookmarks and viewing history to billing and payment information. These services and applications need a highly available and scalable persistence solution to keep running efficiently in both normal and disaster situations. How does Netflix plan capacity for its new and existing services?
In this talk, Arun Agrawal, Senior Software Engineer, and Ajay Upadhyay, Cloud Data Architect @Netflix, will talk about capacity planning and capacity forecasting in the Cassandra world.
We will take you through the science behind forecasting short- and long-term usage and auto-scaling adequate capacity well before C* clusters reach their limits. This guarantees a highly scalable and available persistence solution that meets our SLAs @ Netflix.
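One simple way to picture the forecasting idea is a straight-line fit over recent disk usage, extrapolated to the point where a cluster would cross a utilization threshold. The sketch below is only an illustration under assumed numbers: the function name `days_until_full`, the 80 percent threshold, and the sample data are hypothetical, and the abstract does not say which forecasting model Netflix actually uses.

```python
def days_until_full(daily_usage_gb, capacity_gb, threshold=0.8):
    """Estimate how many days remain until usage crosses threshold * capacity.

    Fits an ordinary least-squares line through one usage sample per day
    and extrapolates it. Returns None when usage is flat or shrinking.
    """
    n = len(daily_usage_gb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(daily_usage_gb) / n

    # Least-squares slope and intercept of usage versus day index.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, daily_usage_gb))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no growth, so the threshold is never reached

    limit_gb = capacity_gb * threshold
    day_at_limit = (limit_gb - intercept) / slope
    # Measure from the most recent sample; clamp if already over the limit.
    return max(0.0, day_at_limit - (n - 1))


# Hypothetical cluster: 1000 GB usable, growing 10 GB per day.
usage = [500, 510, 520, 530, 540]
print(days_until_full(usage, capacity_gb=1000, threshold=0.8))
```

A real forecast would likely account for seasonality, compaction overhead, and per-node skew; the linear fit here only shows the basic idea of triggering a scale-up well before the limit.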
About the Speakers
Ajay Upadhyay Senior Database Engineer, Netflix
Responsible for the persistence layer at Netflix as part of the CDE (Cloud Database Engineering) team. Works with application teams, guiding them on best practices for the various persistence layers provided by the CDE team.
Arun Agrawal Senior Software Engineer, Netflix
Arun Agrawal is part of Cloud Database Engineering, which provides CAAS (Cassandra as a service). He ensures smooth operation of the service and finds innovative ways to reduce the management overhead of CAAS.
Cassandra on Google Cloud Platform (Ravi Madasu, Google / Ben Lackey, DataSta...DataStax
During this session Ben Lackey (DataStax) and Ravi Madasu (Google) will cover best practices for quickly setting up a cluster on Google Cloud Platform (GCP) using both Google Compute Engine (GCE) and Google Container Engine (GKE) which is based on Kubernetes and Docker.
About the Speakers
Ben Lackey Partner Architect, DataStax
I work in the Cloud Strategy group at DataStax where I concentrate on improving the integration between DataStax Enterprise and cloud platforms including Azure, GCP and Pivotal.
Ravi Madasu
Ravi Madasu is a program manager at Google, primarily focused on Google Cloud Launcher. He works closely with ISV partners to make their products and services available on the Google Cloud Platform, providing a developer-friendly deployment experience. He has 15+ years of experience, working in a variety of roles such as software engineer, project manager and product manager. Ravi received a Master's degree in Information Systems from Northeastern University and an MBA from Carnegie Mellon University.
Develop Scalable Applications with DataStax Drivers (Alex Popescu, Bulat Shak...DataStax
DataStax provides modern, feature-rich, and highly tunable client libraries for C/C++, C#, Java, Node.js, Python, PHP, and Ruby that work with any cluster size, whether deployed across multiple on-premises or cloud datacenters.
Come learn right from the source about the DataStax drivers for Apache Cassandra and DSE and how they can help you build continuously available, fault tolerant, and instantly responsive applications.
About the Speakers
Alex Popescu Senior Product Manager, DataStax
I'm a developer turned product manager building developer tools for Apache Cassandra and DSE. With an eye for simplicity, I focus on creating friendly developer solutions that enable building high-performance, scalable, and fault-tolerant applications. I'm passionate about open source, and over the years I have made numerous contributions to major projects like TestNG and Groovy.
Bulat Shakirzyanov Architect, DataStax
Bulat Shakirzyanov, a.k.a. avalance123, is a software alchemist who holds a black belt in test-fu. Open source enthusiast, author of and contributor to several popular open source projects, he also loves talking about clean code, open source, unix, distributed systems, consensus algorithms and himself in third person.
Azure + DataStax Enterprise Powers Office 365 Per User StoreDataStax Academy
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on Azure.
Tsinghua University: Two Exemplary Applications in ChinaDataStax Academy
In this talk, we will share our experience of applying Cassandra with two real customers in China. In the first use case, we deployed Cassandra at Sany Group, a leading machinery manufacturing company, to manage the sensor data generated by construction machinery. By designing a specific schema and optimizing the write process, we successfully managed over 1.5 billion historical data records and achieved an online write throughput of 10k write operations per second with 5 servers. MapReduce is also used on Cassandra for value-added services, e.g. operations management, machine failure prediction, and abnormal behavior mining. In the second use case, Cassandra is deployed at the China Meteorological Administration to manage meteorological data. We designed a hybrid schema to support both slice queries and time-window-based queries efficiently. We also explored an optimized compaction and deletion strategy for meteorological data in this case.
Cassandra Summit 2014: Apache Cassandra Best Practices at EbayDataStax Academy
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Why you need benchmarks
Finding the right database solution for your use case can be an arduous journey. The database deployment touches aspects of throughput performance, latency control, high availability and data resilience.
You will need to decide on the infrastructure to use: Cloud, on-premise or a hybrid solution.
Data models also have an impact on finding the right fit for the use case. Once you establish a requirements set, the next step is to test your use case against the databases of choice.
In this workshop, we will discuss the different data points you need to collect in order to get the most realistic testing environment.
We will cover:
Data model impact on performance and latency
Client behavior related to database capabilities
Failover and high availability testing
Hardware selection and cluster configuration impact
We will show two benchmarking tools you can use to test and benchmark your clusters to identify the optimal deployment scenario for your use case.
Attend this virtual workshop if you are:
Looking to minimize the cost of your database deployment
Making a database decision based on performance and scale data
Planning to emulate your workload on a pre-production system where you can test, fail fast and learn.
Cassandra is a better alternative to RDBMS for a scalable solution that requires a distributed DB, but it is more popular in clustered solutions targeted at a single installation. The key reason is maintainability & life-cycle management.
Ericsson has re-engineered its voucher management solution for prepaid billing by replacing RDBMS with Cassandra. Cassandra supports clusters with a large number of nodes that can easily scale up & scale down, so that one doesn't have to deal with multiple clusters. However, skills for its administration are scarce, unlike for RDBMS. Activities like nodetool repair, compaction & scale up/down become challenging. Moreover, the frequency of new Cassandra releases is high, and rolling them out to several deployments is challenging.
Key technical challenges were consistency of denormalized data, performance of full-table scan & porting the product from Thrift to CQL. Challenges with large scale global deployments are with anti-entropy & size-tiered compaction.
About the Speaker
Brij Bhushan Ravat Chief Architect, Ericsson
Brij is Chief Architect for the prepaid billing product at Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led the Centre of Excellence for Network Applications, which tracks emerging trends in application development in the area of telecom. This includes telecom services, OSS & leveraging big data technologies for innovative new-age solutions. His focus is on the application of big data in telecom. This includes analytics using Spark & NoSQL.
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
In this talk we will address how we developed our Cassandra environments utilizing the Cisco UCS Open Stack Platform with the DataStax Enterprise Edition software. In addition, we are utilizing open source Ceph storage in our infrastructure to optimize performance and reduce costs.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features including, but not limited to, Secondary Indexes, UDFs and Materialized Views, and with not-so-strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings of some of its features (Secondary Indexes, Compaction Strategies, etc.).
Learning Objective #2:
The most misused features and common hardware errors, and how they might seem harmless at first (on either a small cluster or even a single node).
Learning Objective #3:
How to correctly use Cassandra and its features and aim for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP and has deep expertise with distributed architecture technologies. Carlos is driven by challenge and enjoys the opportunity to discover new things. He has become known and trusted by customers and colleagues for his ability to understand complex problems and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying his local community.
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater and an M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
The presentation covers the lambda architecture and its implementation with Spark. We will discuss the components of the lambda architecture (the batch layer, speed layer and serving layer) and the advantages of implementing it with Spark.
Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...Data Con LA
Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator.
With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster.
Speaker bio
Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
Are you running separate database clusters for operational and analytical workloads? If your company is like most, you're dedicating too much time and effort maintaining infrastructure to support both OLTP and OLAP. To make life easier, Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. We call it Workload Prioritization, and it could make a big difference to your team.
Join our webinar to learn about the vision behind developing this feature. We’ll show you:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus we’ll share test results of how it performs in real-world settings.
Many NoSQL DBaaS vendors limit which cloud platform you can run on and the size of the data you can store, and require you to over-provision cloud infrastructure resources, while failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
Sizing a database cluster makes or breaks your application. Too small and you cannot sustain spikes in usage or recover from a node loss or an operational slowdown. Too big and your cluster will cost more and waste valuable human resources.
Since different workloads have different requirements, successful sizing should optimize for both throughput and latency. However, in many cases, the requirements for the two contradict each other.
In this webinar, we explain how to reconcile these competing forces and build a sustainable cluster that meets both performance and resiliency requirements.
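As a rough illustration of the sizing trade-off, the sketch below takes the larger of two node counts, one driven by throughput and one by storage, and adds spare capacity so the cluster can survive a node loss. All names and numbers are hypothetical, and the per-node throughput figure is assumed to come from your own benchmarks with replication already factored in; this is not ScyllaDB's sizing method.

```python
import math


def nodes_needed(peak_ops_per_sec, ops_per_node, data_tb, usable_tb_per_node,
                 replication_factor=3, spare_nodes=1):
    """Return a node count that covers throughput, storage, and headroom."""
    # Nodes required to serve peak load at the benchmarked per-node rate.
    for_throughput = math.ceil(peak_ops_per_sec / ops_per_node)
    # Every logical terabyte is stored replication_factor times.
    for_storage = math.ceil(data_tb * replication_factor / usable_tb_per_node)
    # A cluster cannot hold more distinct replicas than it has nodes.
    baseline = max(for_throughput, for_storage, replication_factor)
    # Spare nodes keep the cluster within limits after losing a node.
    return baseline + spare_nodes


# Hypothetical workload: 200k ops/s at peak, 10 TB of data before replication.
print(nodes_needed(peak_ops_per_sec=200_000, ops_per_node=30_000,
                   data_tb=10, usable_tb_per_node=2))
```

In this example storage, not throughput, drives the node count, which is the kind of imbalance a sizing exercise is meant to surface.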
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Big Data Spain
Apache Cassandra is a distributed, masterless column store database that is becoming mainstream for analytics and IoT data.
https://www.bigdataspain.org/2017/talk/tuning-java-driver-for-apache-cassandra
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...DataStax
Designing & optimizing a micro-batch processing system to handle multi-billion events using 100+ nodes of Cassandra, Spark and Kafka: lessons learned from the trenches
Designing and optimizing for 20+ billion operations a day presents a set of complex challenges, especially when the SLA is near real-time. In this presentation we will walk through our experience in building a large-scale event processing pipeline using Cassandra, Spark Streaming and Kafka on 100+ nodes. We will present the design patterns, development steps and diagnostic setups, at both the technology level and the application level, that are needed to manage an application of this scale. We also aim to present some unique problems we encountered in optimizing and operationalizing these environments.
About the Speakers
Ananth Ram Senior Principal / Senior Manager, Accenture
Ananth Ram is a Solution Architect with over 17 years of experience in Oracle database architecture and designing large-scale applications. He was with Oracle Corp for nine years before joining Accenture as Senior Principal. At Accenture, Ananth has been working on many large-scale Oracle and big data initiatives over the last four years.
Rich Rein Solution Architect, DataStax
Rich Rein is a Solutions Architect from DataStax on the Accenture team, with over 30 years as an architect, manager, and consultant in Silicon Valley's computing industry.
Rumeel Kazi, Accenture Federal
Rumeel Kazi is a Senior Manager in the Accenture Health & Public Service (H&PS) practice. He has over 17 years of Systems Integration implementation experience involving Oracle, J2EE platforms, Enterprise Application Integration, Supply Chain, ETL and Business Rules Management Systems. Rumeel has been working on large-scale Oracle and big data application solutions for the last 5 years.
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...DataStax
The Strong Consistency provided by QUORUM reads in Cassandra can still lead to read-modify-write problems when applications want to do things such as guarantee uniqueness or sell exactly 300 cinema tickets. Fortunately, Light Weight Transactions (LWT) are designed to solve the problems Strong Consistency cannot.
In this talk Christopher Batey, Consultant at The Last Pickle, will discuss:
- Syntax and semantics: Theoretical use cases
- How they work under the covers
Then we will go through LWTs in practice:
- How do the number of nodes/replicas/data centres affect performance?
- How does contention (multiple concurrent queries using LWTs) affect availability and performance?
- What consistency guarantees do you get with other LWTs and non-LWTs?
- How does LWT timeout differ from normal write timeout?
- Use case: LWTs as a distributed lock and how it went wrong 5 times.
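The ticket-selling problem described above can be illustrated with a small in-memory simulation. This is not Cassandra code: `TicketStore` and its methods are hypothetical stand-ins, with `compare_and_set` playing the role of a conditional CQL update of the form `UPDATE ... SET remaining = ? WHERE ... IF remaining = ?`. Two clients read the same counter and then both try to sell the last ticket.

```python
class TicketStore:
    """Hypothetical single-row store holding the number of tickets left."""

    def __init__(self, remaining):
        self.remaining = remaining

    def read(self):
        return self.remaining

    def write(self, value):
        # Unconditional write: the last writer wins.
        self.remaining = value

    def compare_and_set(self, expected, new):
        # Conditional write: applied only if the stored value is unchanged.
        if self.remaining != expected:
            return False
        self.remaining = new
        return True


def sell_unconditional(store, seen):
    store.write(seen - 1)
    return True  # the client believes the sale succeeded


def sell_conditional(store, seen):
    return store.compare_and_set(seen, seen - 1)


def simulate(sell):
    """Two clients read before either writes, then both try to sell."""
    store = TicketStore(remaining=1)
    seen_a = store.read()
    seen_b = store.read()
    sold = sum([sell(store, seen_a), sell(store, seen_b)])
    return sold, store.read()


print(simulate(sell_unconditional))  # two sales recorded for one ticket
print(simulate(sell_conditional))    # the second sale is rejected
```

With unconditional writes both clients report a sale even though only one ticket existed; with the conditional write the second client is rejected and must re-read and retry, which is also where contention costs come from.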
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part-time consultant at The Last Pickle, where he works with clients to help them succeed with Apache Cassandra, as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can check out his blog at: http://www.batey.info
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
n this talk we will address how we developed our Cassandra environments utilizing Cisco UCS Open Stack Platform with the DataStax Enterprise Edition software. In addition we are utilizing OpenSource CEPH storage in our Infrastructure to optimize the Performance and reduce the costs.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features included but not limited to Secundary Indexes, UDF, Materialized Views, etc. and not so strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings in some of features(Secundary Indexes, Compaction Strategies, etc).
Learning Objective #2:
The most misused features and common hardware errors. How they might seem harmeless at first (either small cluster or even single node).
Learning Objective #3:
How to correctly use Cassandra and it's features and go for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things.. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying the his local community.
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options.
About Robbie Strickland, Software Development Manager at The Weather Channel
Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
The presentation covers lambda architecture and implementation with spark. In the presentation we will discuss about components of lambda architecture like batch layer, speed layer and serving layer. We will also discuss its advantages and benefits with spark.
Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...Data Con LA
Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator.
With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster.
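The one-engine-per-core idea described above can be sketched as follows. This is an illustration of the shared-nothing pattern only: the CRC32-based routing and the `Node`, `Shard`, and `shard_for` names are assumptions for the example and do not reflect Scylla's actual partitioning code.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to the shard (one per core) that exclusively owns it."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class Shard:
    """One engine with private memory; no other shard ever touches it."""

    def __init__(self):
        self.data = {}


class Node:
    """A node that routes every request to the single owning shard."""

    def __init__(self, num_cores: int):
        self.shards = [Shard() for _ in range(num_cores)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))].data[key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].data.get(key)


node = Node(num_cores=4)
node.put("user:42", {"name": "Ada"})
print(node.get("user:42"))
```

Because each key has exactly one owner, a shard needs no locks around its own data; the cost is that requests must be routed to the right shard first.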
Speaker bio
Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
Are you running separate database clusters for operational and analytical workloads? If your company is like most, you're dedicating too much time and effort maintaining infrastructure to support both OLTP and OLAP. To make life easier, Scylla now has the ability to handle multiple workloads from a single cluster--without performance degradation to either. We call it Workload Prioritization, and it could make a big difference to your team.
Join our webinar to learn about the vision behind developing this feature. We’ll show you:
- The evolving requirements for operational (OLTP) and analytics (OLAP) workloads in the modern datacenter
- How Scylla provides built-in control over workload priority and makes it easy for administrators to configure workload priorities
- The TCO impact of minimizing integrations and maintenance tasks, while also shrinking the datacenter footprint and maximizing utilization
Plus we’ll share test results of how it performs in real-world settings.
Many NoSQL DBaaS vendors limit which cloud platform you can run on and how much data you can store, and require you to over-provision cloud infrastructure resources while failing to deliver performance and low latency at scale.
In this session, we will compare the performance and Total Cost of Ownership (TCO) of competing NoSQL DBaaS offerings. We will also review how to migrate to Scylla Cloud, our fully managed database service.
You will learn:
- The true cost of ownership for selected NoSQL DBaaS offerings
- The 8 essentials for selecting a NoSQL DBaaS
- Migration options from Apache Cassandra, DynamoDB and other databases
How to Build a Scylla Database Cluster that Fits Your NeedsScyllaDB
Sizing a database cluster makes or breaks your application. Too small and you may be unable to sustain spikes in usage or recover from a node loss or an operational slowdown. Too big and your cluster will cost more and waste valuable human resources.
Since different workloads have different requirements, successful sizing of your application should be optimized for both throughput and latency. However, in many cases the requirements for the two contradict each other.
In this webinar, we explain how to reconcile these competing forces and build a sustainable cluster that meets both performance and resiliency requirements.
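As a rough illustration of the trade-off, the sketch below estimates a node count from peak throughput, leaving headroom for latency and a spare node for resiliency. The formula, the 50% utilization target, and all numbers are assumptions chosen for the example, not sizing guidance from the webinar.

```python
import math


def nodes_needed(peak_ops_per_sec, ops_per_node, target_utilization=0.5, spare_nodes=1):
    """Estimate cluster size.

    target_utilization keeps each node below saturation so latency stays low;
    spare_nodes lets the cluster absorb the loss of that many nodes at peak.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_node = ops_per_node * target_utilization
    return math.ceil(peak_ops_per_sec / usable_per_node) + spare_nodes


# Hypothetical workload: 100,000 ops/s at peak, nodes benchmarked at 20,000 ops/s.
print(nodes_needed(100_000, 20_000))  # 11
```

Raising `target_utilization` shrinks the cluster but leaves less room for spikes, which is the tension between cost and resiliency that the abstract describes.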
Case Study: Troubleshooting Cassandra performance issues as a developerCarlos Alonso Pérez
This talk will be a step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive, from the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Big Data Spain
Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data.
https://www.bigdataspain.org/2017/talk/tuning-java-driver-for-apache-cassandra
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
Designing & Optimizing Micro Batching Systems Using 100+ Nodes (Ananth Ram, R...DataStax
Designing and optimizing a micro-batch processing system to handle multi-billion events using 100+ nodes of Cassandra, Spark and Kafka: lessons learned from the trenches
Designing and optimizing for 20+ billion operations a day presents a set of complex challenges, especially when the SLA is near real-time. In this presentation we will walk through our experience in building a large-scale event processing pipeline using Cassandra, Spark Streaming and Kafka on 100+ nodes. We will present the design patterns, development steps and diagnostic setups, at both the technology level and the application level, that are needed to manage an application of this scale. We also aim to present some unique problems we encountered in optimizing and operationalizing these environments.
About the Speakers
Ananth Ram Senior Principal / Senior Manager, Accenture
Ananth Ram is a Solution Architect with over 17 years of experience in Oracle database architecture and designing large-scale applications. He was with Oracle Corp for nine years before joining Accenture as Senior Principal. As a part of Accenture, Ananth has been working on many large-scale Oracle and big data initiatives over the last four years.
Rich Rein Solution Architect, DataStax
Rich Rein is a Solutions Architect from DataStax on the Accenture team, with over 30 years as an architect, manager, and consultant in Silicon Valley's computing industry.
Rumeel Kazi, Accenture Federal
Rumeel Kazi is a Senior Manager in the Accenture Health & Public Service (H&PS) practice. He has over 17 years of Systems Integration implementation experience involving Oracle, J2EE platforms, Enterprise Application Integration, Supply Chain, ETL and Business Rules Management Systems. Rumeel has been working on large-scale Oracle and big data application solutions for the last 5 years.
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...DataStax
The Strong Consistency provided by QUORUM reads in Cassandra can still lead to read-modify-write problems when applications want to do things such as guarantee uniqueness or sell exactly 300 cinema tickets. Fortunately, Light Weight Transactions (LWT) are designed to solve the problems Strong Consistency cannot.
In this talk Christopher Batey, Consultant at The Last Pickle, will discuss:
- Syntax and semantics: Theoretical use cases
- How they work under the covers
Then we will go through LWTs in practice:
- How do the number of nodes/replicas/data centres affect performance?
- How does contention (multiple concurrent queries using LWTs) affect availability and performance?
- What consistency guarantees do you get with other LWTs and non-LWTs?
- How does LWT timeout differ from normal write timeout?
- Use case: LWTs as a distributed lock and how it went wrong 5 times.
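The ticket-selling problem mentioned above can be illustrated with a small in-memory model. This is a sketch of compare-and-set semantics only, not the Cassandra driver API or the Paxos protocol behind LWTs; `TicketStore` and its methods are hypothetical names. In CQL, the conditional write corresponds to an `UPDATE ... IF column = value` statement.

```python
class TicketStore:
    """In-memory stand-in for a single row holding a ticket counter."""

    def __init__(self, remaining):
        self.remaining = remaining

    def read(self):
        return self.remaining

    def blind_write(self, value):
        # Plain write: last writer wins, whatever the writer read earlier.
        self.remaining = value

    def write_if(self, expected, value):
        # Conditional write: applied only if the value is still what was read.
        if self.remaining != expected:
            return False
        self.remaining = value
        return True


def sell_with_cas(store):
    """Sell one ticket, retrying when another buyer changed the counter first."""
    while True:
        seen = store.read()
        if seen == 0:
            return False  # sold out
        if store.write_if(seen, seen - 1):
            return True


# Two buyers read the last ticket at the same time, then both write blindly.
store = TicketStore(1)
a, b = store.read(), store.read()
store.blind_write(a - 1)
store.blind_write(b - 1)  # both sales "succeed": one ticket was sold twice

# With conditional writes, the second buyer's stale read is rejected.
store = TicketStore(1)
a, b = store.read(), store.read()
print(store.write_if(a, a - 1))  # True
print(store.write_if(b, b - 1))  # False
```

The retry loop in `sell_with_cas` is also where contention shows up: the more concurrent buyers there are, the more conditional writes fail and have to be retried.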
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part-time consultant at The Last Pickle, where he works with clients to help them succeed with Apache Cassandra, as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can check out his blog at: http://www.batey.info
KillrVideo: Data Modeling Evolved (Patrick McFadin, Datastax) | Cassandra Sum...DataStax
In 2012 I presented my first version of the KillrVideo video sharing site and a data model was born! Many things have happened to Cassandra since then and, as a result, the data model for KillrVideo has evolved. The transition from Thrift to CQL was the first big shift. From Cassandra 2 to 3 we have seen some major usability enhancements to CQL that have reduced complexity for the application developer. Indexing changes. Denormalization help. Syntax changes in the select queries. Storage engine changes that have eliminated anti-patterns. A lot to talk about in a constantly evolving project like Apache Cassandra. Don't get left behind!
About the Speaker
Patrick McFadin Chief Evangelist, DataStax
Patrick McFadin is one of the leading experts on Apache Cassandra and data modeling techniques. As the Chief Evangelist for Apache Cassandra and consultant for DataStax, he has helped build some of the largest and most exciting deployments in production. Prior to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...DataStax
Advanced Apache Cassandra operations depend on an understanding of what features are available via the JMX interface. While nodetool exposes many of these, the most useful are still waiting to be discovered. The JMX interface allows the code base to expose functions that operate directly on internal structures, making real-time changes to the way the process runs. With this skill in your toolkit there is no limit to the changes you can make.
In this talk Nate McCall, CTO at The Last Pickle, will explain how to explore, secure, and invoke the JMX interface exposed by Cassandra. He'll then move on to what you can do with it such as compacting specific SSTables, changing compaction on a single node, managing repairs, diagnosing latency, viewing cross node timeouts, and others. Whether you are a developer or operator, new or experienced, you will be given a thorough understanding of what all is available via JMX without having to consult the code on your own.
About the Speaker
Nate McCall CTO, The Last Pickle
Nate McCall has 16 years of server-side systems and software development experience. He started his involvement in the Cassandra community in the late fall of 2009 when he became one of the original developers on the Hector Java client. He has contributed a number of patches over the years to the Apache Cassandra code base and continues to be actively involved on the mail lists, issue system and IRC. He has been a DataStax MVP every year since the inception of the program.
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
Deleting data from Cassandra has several challenges, and existing solutions (tombstones or TTLs) have limitations that make them unusable or untenable in certain circumstances. We'll explore the cases where existing deletion options fail or are inadequate, then describe a solution we developed which deletes data from Cassandra during standard or user-defined compaction, but without resorting to tombstones or TTLs.
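The general idea of dropping data while compaction rewrites it can be sketched as below. This is not the speakers' implementation and it does not use Cassandra's compaction classes: the row layout `(key, timestamp, value)`, the `compact` function, and the `should_delete` predicate are all assumptions made for illustration.

```python
import heapq


def compact(sorted_runs, should_delete):
    """Merge sorted runs of (key, timestamp, value) rows into one run.

    Keeps only the newest version of each key and silently drops any key
    for which should_delete(key, value) is true, so no tombstone is written.
    """
    # Order by key, newest timestamp first, so the first row seen per key wins.
    merged = heapq.merge(*sorted_runs, key=lambda row: (row[0], -row[1]))
    output = []
    last_key = object()  # sentinel that equals no real key
    for key, timestamp, value in merged:
        if key == last_key:
            continue  # older version of a key already handled
        last_key = key
        if should_delete(key, value):
            continue  # deletion rule matched: omit the row from the new run
        output.append((key, timestamp, value))
    return output


# Hypothetical runs, each already sorted by key.
older = [("a", 1, "x"), ("b", 1, "y"), ("c", 1, "z")]
newer = [("a", 2, "x2"), ("d", 2, "w")]

print(compact([older, newer], lambda key, value: key == "b"))
# [('a', 2, 'x2'), ('c', 1, 'z'), ('d', 2, 'w')]
```

In this sketch, rows in runs that are not part of the merge are untouched, so a matching row only disappears once the run containing it is compacted.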
About the Speaker
Eric Stevens Principal Architect, ProtectWise, Inc.
Eric is the principal architect and a day-one employee of ProtectWise, Inc., specializing in massive real-time processing and scalability problems. The team at ProtectWise processes, analyzes, optimizes, indexes, and stores billions of network packets each second. They look for threats in real time, but also store full-fidelity network data (including PCAP), and when new security intelligence is received, automatically replay existing network history through that new intelligence.
There are many aspects of tuning Cassandra for production and a lot can go wrong: network splits and latency, hardware issues and failure, data corruption, etc. Most are mitigated with Cassandra's architecture but there are use cases where we need to dig deep and tune all layers to get the result we need to achieve specific business goals.
We will explore one such case, where we had to tune Cassandra for performance but also get consistent results on 99.999% of the queries. Getting to 99 percent was relatively easy, but pushing for those extra nines involved a lot of work. There are many nuts and bolts to turn and tune in order to get consistent results.
We will cover the biggest latency-inducing factors and see how to set up metrics and tackle the inevitable issues that arise in cloud-based deployments. We will get into one of the major "sins" of AWS deployment by demystifying EBS-based storage, and talk about how we can leverage OS properties while tuning for high read performance.
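To make the "extra nines" concrete, the sketch below computes latency percentiles with the nearest-rank method. The function name and the sample data are assumptions for illustration; production metrics libraries typically use histograms or sketches instead of sorting raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 99))  # 99
```

With this definition, the 99.999th percentile only becomes distinct from the maximum once there are at least 100,000 samples, which is one reason measuring five nines needs far more data than measuring p99.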
About the Speaker
Matija Gobec CTO, SmartCat
Experienced software engineer interested in distributed streaming systems and real time analytics. In love with Cassandra since early versions.
A look at the CQL changes in 3.x (Benjamin Lerer, Datastax) | Cassandra Summi...DataStax
Several CQL changes have occurred since Cassandra 2.2. In this talk, I will explain some of the most important ones.
About the Speaker
Benjamin Lerer Software engineer, Datastax
Benjamin Lerer is an Apache Cassandra committer and a software engineer at Datastax. Prior to that, he worked 7 years for a High Frequency Trading Company.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, but it can also bring painful operational lessons. While there is no recipe for trouble-free multi-DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi-DC cluster. He will also look at how multiple DCs are supported through all areas of Cassandra, how this impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments. There he has been leading the effort to build a Cassandra-based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of it.
Software Architecture for Cloud InfrastructureTapio Rautonen
Distributed systems are hard to build. Software architecture must be carefully crafted to suit cloud infrastructure.
Design for failure. Learn from failure. Adopt new cloud compatible design patterns and follow the guidelines during the journey of building cloud native applications.
Enterprises are increasingly demanding realtime analytics and insights to power use cases like personalization, monitoring and marketing. We will present Pulsar, a realtime streaming system used at eBay which can scale to millions of events per second with high availability and SQL-like language support, enabling realtime data enrichment, filtering and multi-dimensional metrics aggregation.
We will discuss how Pulsar integrates with a number of open source Apache technologies like Kafka, Hadoop and Kylin (Apache incubator) to achieve high scalability, availability and flexibility. We use Kafka to replay unprocessed events to avoid data loss and to stream realtime events into Hadoop, enabling reconciliation of data between realtime and batch. We use Kylin to provide multi-dimensional OLAP capabilities.
WSO2Con USA 2017: Scalable Real-time Complex Event Processing at UberWSO2
The Marketplace data team at Uber has built a scalable complex event processing platform to solve many challenging real-time data needs for various Uber products. This platform has been in production for more than a year and supports over 100 real-time data use cases with a team of 3. In this talk, we will share the detail of the design and our experience, and how we employ Siddhi, Kafka and Samza at scale.
Using VisualSim Architect for Semiconductor System AnalysisDeepak Shankar
Mirabilis Design provides architecture exploration software for semiconductors, electronics and embedded software. Using this modeling and simulation solution, designers can trade off power against performance, partition the design into hardware and software, optimize for timing, minimize power consumption, perform functional analysis, and evaluate the quality of the system in the event of a failure. The outcome of this early exploration is a highly validated specification, a reference design for prospective customers to evaluate, and data for certification purposes.
VisualSim has a large library of components (stochastic, hardware, software, network and RTOS) that are used to assemble models of the entire system extremely quickly and that handle levels of abstraction from stochastic to timing-accurate. These models are simulated against workloads and use cases, and the generated reports are used to make architecture decisions.
Tech Talk: Leverage the combined power of CA Unified Infrastructure Managemen...CA Technologies
Take the guesswork out of your infrastructure environment by combining CA Unified Infrastructure Management, CA Network Flow Analysis and CA Application Delivery Analysis. Learn how to optimize your infrastructure by combining IT monitoring, network traffic monitoring and application response time monitoring solutions to give you enhanced end-to-end visibility into your infrastructure. This session will review the power of the three solutions and explain how you can easily combine them to give you the information you need.
For more information, please visit http://cainc.to/Nv2VOe
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder of DataTorrent presented "Streaming Analytics with Apache Apex" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
Creating a Centralized Consumer Profile Management Service with WebSphere Dat...Prolifics
In this presentation we will talk about how one of the world's leading financial institutions leveraged WebSphere DataPower to provide a set of centralized consumer profile management services. This central service would be leveraged by internal and external applications, and would align with enterprise marketing capabilities. The solution included a complex security model built on the following products: Tivoli Directory Server, Tivoli Access Manager and Tivoli Federated Identity Manager. We will describe how to build complex orchestrations in WebSphere DataPower, and also go through some of the performance tuning options we implemented to achieve a high degree of efficiency.
Transform Your Organization with Real Real-Time MonitoringAmazon Web Services
Acquia, a Drupal web experience provider, faced a common growing pain: with its expanding customer base and AWS workloads came numerous monitoring systems and scattered data from disparate sources and teams. The company knew it needed better insight into its customers’ resources and quicker access to data it could trust. Join our webinar to see why Acquia turned to SignalFx for real real-time monitoring for its AWS environment, enabling its entire organization with operational insights, from development all the way through sales. Learn how Acquia consolidated the number of monitoring services used, improved the quality of its customer services, and saved more than half a million dollars per year in costs.
Guide to Application Performance: Planning to Continued OptimizationMuleSoft
Supporting everything from mobile apps with thousands of concurrent users to global deployments processing millions of requests daily, Anypoint Platform has been put to test. In this session, MuleSoft experts will talk through case studies from our most demanding deployments and provide a best practice approach to designing and tuning applications for optimal performance.
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
As common sense would suggest, weather has a definite impact on traffic. But how much? And under what circumstances? Can we improve traffic (congestion) prediction given weather data? Predictive traffic is envisioned to significantly change how drivers plan their day by alerting users before they travel, finding the best times to travel, and, over time, learning from new IoT data such as road conditions, incidents, etc. This talk will cover the traffic prediction work conducted jointly by IBM and the traffic data provider. As a part of this work, we conducted a case study over five large metropolitan areas in the US, covering 2.58 billion traffic records and 262 million weather records, to quantify the boost in accuracy of traffic prediction using weather data. We will provide an overview of our lambda architecture, with Apache Spark being used to build prediction models with weather and traffic data, and Spark Streaming used to score the model and provide real-time traffic predictions. This talk will also cover a suite of extensions to Spark to analyze geospatial and temporal patterns in traffic and weather data, as well as the suite of machine learning algorithms that were used with the Spark framework. Initial results of this work were presented at the National Association of Broadcasters meeting in Las Vegas in April 2017, and there is work to scale the system to provide predictions in over 100 cities. The audience will learn about our experience scaling with Spark in offline and streaming mode, building statistical and deep-learning pipelines with Spark, and techniques for working with geospatial and time-series data.
COLLABORATE 18 Presentation: Demand Planning in Cloud R13Jade Global
Understanding Demand Planning in Cloud R13 Through an Early Adopter Case Study
Session Abstract:
Oracle has released Demand Planning in Cloud with Release 13. We will share the experiences of an early adopter customer with their demand management and SNOP processes in cloud. We will also compare the cloud offering with that of Demantra and provide guidance on its readiness for different industries. Finally, we will explore the coexistence possibilities and prerequisites for Demand Management Cloud.
Autoscaling Confluent Cloud: Should We? How Would We?HostedbyConfluent
"Although cloud-based, managed Kafka offerings abstract away most administrative responsibilities, a few admin-related concerns remain––like cluster scaling. When is scaling your cloud-based Kafka appropriate? And how should you set it up to auto-scale?
Gone are the days of over-provisioning resources to meet expected demand. Technologies like Kubernetes make it relatively simple to implement strategies around both horizontal and vertical scaling. Cloud providers give users the ability to track their resource utilization and set up autoscaling groups and policies. Cloud administrators use these tools (and others) to guarantee their applications can handle the demands placed on them. Because Kafka is a central pillar of our cloud-native data pipelines, administrators must determine if, when and how to scale Kafka as their workloads ebb and flow.
In this session, we’ll explore the topic of auto-scaling by implementing a strategy for Confluent Cloud resources. We’ll first discuss common use cases that dictate a need to create a scaling strategy for Confluent Cloud and introduce the approaches best suited for each use case. With a nod to both where we came from and where we are going, we will discuss the architecture of Confluent Cloud and how it affects the way we scale Kafka. Attendees will learn how to deal with ephemeral workloads, what to monitor for when creating an auto-scaling policy, and the “gotchas” of auto-scaling in Confluent Cloud. We will also discuss best practices for scaling Kafka clients, because Kafka is only as scalable as the client applications that connect to it.
We will dive into code that examines these approaches and by the end of the session, you’ll have the tools needed to design and implement your own scaling strategy for your Confluent Cloud workloads."
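A threshold-based policy of the kind discussed above might look like the following sketch. It is not the session's code and it calls no Confluent Cloud API: the function name, the notion of a generic capacity "unit", and every threshold value are assumptions chosen for illustration.

```python
def scaling_decision(utilization, current_units, min_units=1, max_units=8,
                     scale_up_at=0.8, scale_down_at=0.3):
    """Return the desired number of capacity units for the next interval.

    utilization is the observed load as a fraction of current capacity.
    The gap between the two thresholds avoids flapping between sizes.
    """
    if utilization >= scale_up_at and current_units < max_units:
        return current_units + 1
    if utilization <= scale_down_at and current_units > min_units:
        return current_units - 1
    return current_units


print(scaling_decision(0.9, current_units=2))  # 3
print(scaling_decision(0.5, current_units=2))  # 2
print(scaling_decision(0.1, current_units=2))  # 1
```

Which metric feeds `utilization` (throughput, partition count, connection count, or something else) is the part that depends on the workload, and is left open here.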
Tuning Java Driver for Apache CassandraNenad Bozic
Apache Cassandra is a distributed, masterless column-store database that is becoming mainstream for analytics and IoT data. Many use cases where Cassandra is a natural fit require latency tuning in order to serve requests really fast. The DataStax driver has many options, some less familiar, which can greatly influence performance. This talk will focus on Java applications and the options at your disposal in the DataStax Java driver, which has become the standard choice for using this database. We will concentrate on both monitoring and tuning, and we will provide different options for different use cases. There is no silver bullet, and having many options requires a deep dive to arrive at the right decision. This talk will narrow down the options and give you a push in the right direction.
The pathway to the cloud has many different options and levers that customers can pull. This webinar walks customers through actual steps from creating a cloud adoption vision to actually building a migration roadmap with actionable guidance. We’ll go through proven migration patterns, methods and tooling that AWS has leveraged successfully with hundreds of Enterprise customers around the globe. Learn what challenges customers face when planning the migrations to cloud, and how they overcome them to minimize risk and accelerate the adoption.
The Need for Complex Analytics from Forwarding Pipelines Netronome
Nic Viljoen, Research Engineer, (including Tom Tofigh and Bryan Sullivan from AT&T) presentation from ONS 2016 at Santa Clara Convention Center in Santa Clara, CA.
Similar to C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix) | Cassandra Summit 2016 (20)
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
Be a holiday hero—not a sorry statistic. View this on-demand webinar to learn how to drive revenue, business growth, customer satisfaction, and loyalty during the holiday season, and achieve operational excellence (and sanity!) at the same time. You’ll also hear real-world stories of companies that have experienced Black Friday nightmares—and learn how they turned things back around.
View webinar: https://pages.datastax.com/20191003-NAM-Webinar-IsYourEnterpriseReadytoShinethisHolidaySeason_1-Registration-LP.html
Explore all DataStax webinars: www.datastax.com/webinars
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
Data resiliency and availability are mission-critical for enterprises today—yet we live in a world where outages are an everyday occurrence. Whether the problem is a single server failure or losing connectivity to an entire data center, if your applications aren’t designed to be fault tolerant, recovery from an outage can be painful and slow. Watch this on-demand webinar to look at best practices for developing fault-tolerant applications with DataStax Drivers for Apache Cassandra and DataStax Enterprise (DSE).
View recording: https://youtu.be/NT2-i3u5wo0
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
To simplify deploying and managing modern applications, enterprises have been combining the benefits of hyperconverged infrastructure (HCI) with the performance and scale of a NoSQL database — and the results have been remarkable. With this combination, IT organizations have experienced more agility, improved reliability, and better application performance. Watch this on-demand webinar where you’ll learn specifically how VMware HCI with DataStax Enterprise (DSE) and Apache Cassandra™ are transforming the enterprise.
View recording: https://youtu.be/FCLGHMIB0L4
Explore all DataStax Webinars: https://www.datastax.com/resources/webinars
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
A distributed graph database is the most powerful means of discovering and leveraging the relationships in your data. With the right techniques combined with the right enterprise graph features, you can build modern applications at scale for real-time use-cases. But how exactly should you manage and model your data for a distributed graph database? And how can you leverage the relationships in that data? Watch this on-demand webinar as our graph expert answers those questions and shares tips and insights into creating production apps with distributed graph data.
View recording: https://youtu.be/TSs_qPnhOas
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
Data management may be the hardest part of making the transition to the cloud, but enterprises including Intuit and Macy’s have figured out how to do it right. So what do they know that you might not? Join Robin Schumacher, Chief Product Officer at DataStax as he explores best practices for defining and implementing data management strategies for the cloud. He outlines a four-step journey that will take you from your first deployment in the cloud through to a true intercloud implementation and walk through a real-world use case where a major retailer has evolved through the four phases over a period of four years and is now benefiting from a highly resilient multi-cloud deployment.
View webinar: https://youtu.be/RrTxQ2BAxjg
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
In this webinar, you will leverage free and open source tools as well as enterprise-grade utilities developed by DataStax to get a solid grasp on the performance of a masterless distributed database like Cassandra. You’ll also get the opportunity to walk through DataStax Enterprise Insights dashboards and see exactly how to identify performance bottlenecks.
View Recording: https://youtu.be/McZg_MMzVjI
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
In this webinar, you’ll also be introduced to DataStax Apache Kafka Connector, and get a brief demonstration of this groundbreaking technology. You’ll directly experience how this tool can help you stream data from Kafka topics into DataStax Enterprise versions of Cassandra. The future of your organization won’t wait. Register now to reserve your spot in this exciting new webinar.
Youtube: https://youtu.be/HmkNb8twUNk
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
No matter how diligent your organization is at driving toward efficiency, databases are complex and it’s easy to make mistakes on your way to production. The good news is, these mistakes are completely avoidable. In this webinar, Jeff Carpenter shares with you exactly how to get started in the right direction — and stay on the path to a successful database launch.
View recording: https://youtu.be/K9Zj3bhjdQg
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
Apache Cassandra has been a driving force for applications that scale for over 10 years. This open-source database now powers 30% of the Fortune 100. Now is your chance to get an inside look, guided by the company that’s responsible for 85% of the code commits. You won’t want to miss this deep dive into the database that has become the power behind the moment, the force behind game-changing, scalable cloud applications. Patrick McFadin, VP Developer Relations at DataStax, is going behind the Cassandra curtain in an exclusive webinar.
View recording: https://youtu.be/z8fLn8GL5as
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
In this webinar, we’ll discuss how an Active Everywhere database—a masterless architecture where multiple servers (or nodes) are grouped together in a cluster—provides a consistent data fabric between on-premises data centers and public clouds, enabling enterprises to effortlessly scale their hybrid cloud deployments and easily transition to the new hybrid cloud world, without changes to existing applications.
View recording: https://youtu.be/ob6tr-9YiF4
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
The European Union’s General Data Protection Regulation (GDPR) has sweeping effects on how enterprises manage their data. Without the right policies and safeguards in place, a tiny data mishap could end up turning into a catastrophic mistake. Join Datastax and our partner Thales eSecurity for a live webinar to learn how GDPR effects impact data management and the various ways enterprises can both comply and thrive in a hybrid cloud environment.
View recording: https://youtu.be/QZ48_qkK9PU
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Designing a Distributed Cloud Database for DummiesDataStax
Join Designing a Distributed Cloud Database for Dummies—the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years at Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn: why you need distributed cloud databases, the technology you need to create the best used experience, the benefits of data autonomy and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
Most enterprises understand the value of hybrid cloud. In fact, your enterprise is already working in a multi-cloud or hybrid cloud environment, whether you know it or not. View this SlideShare to gain a greater understanding of the requirements of a geo-distributed cloud database in hybrid and multi-cloud environments.
View recording: https://youtu.be/tHukS-p6lUI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
How to Evaluate Cloud Databases for eCommerceDataStax
View these slides to discover the advantages of a distributed cloud database designed for hybrid cloud along with examples of how companies are delivering innovative and personalized ecommerce experiences. We'll discuss the sources of common data challenges and the hidden impact they have on business, the database requirements for improved customer experiences and innovative application delivery, and how leading organizations such as eBay, Sony, Macy’s, and Comcast are transforming the eCommerce experience with DataStax Enterprise 6.
View recording: https://youtu.be/4UXrJ3xtmGg
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
Today’s customers want experiences that are contextual, always on, and above all — delightful. To be able to provide this, enterprises need a distributed, hybrid cloud-ready database that can easily crunch massive volumes of data from disparate sources while offering data autonomy and operational simplicity. Don’t miss this webinar, where you’ll learn how DataStax Enterprise 6 maintains hybrid cloud flexibility with all the benefits of a distributed cloud database, delivers all the advantages of Apache Cassandra with none of the complexities, doubles performance, and provides additional capabilities around robust transactional analytics, graph, search, and more.
View recording: https://youtu.be/tuiWAt2jwBw
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
Today’s Right-Now Economy means employees and customers alike expect applications to be always on, real time, and contextual. But how do you manage applications that collect data from a variety of sources, at cloud scale, and provide instant insights? And, can you embrace the public cloud while still retaining control of your data? Join us to hear from Microsoft Cloud Architect and Azure Global Black Belt Ron Abellera to learn how an enterprise-ready hybrid cloud data layer can help to accelerate time to market and scale linearly, ensure continuous availability, and achieve data autonomy with a hybrid cloud strategy.
View webinar recording: https://youtu.be/_-GqmAk5C_I
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
Welcome to the Right-Now Economy. To win in the Right-Now Economy, your enterprise needs to be able to provide delightful, always-on, instantaneously responsive applications via a data layer that can handle data rapidly, in real time, and at cloud scale. Don’t miss our upcoming webinar in which Forrester Principal Analyst Brendan Witcher will discuss why a singular, contextual, 360-degree view of the customer in real-time is critical to CX success and how companies are using data to deliver real-time personalization and recommendations.
View recording: https://youtu.be/e6prezfIGMY
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Datastax - The Architect's guide to customer experience (CX)DataStax
From scalability to data access to data governance, learn the specific performance and data requirements of a customer experience-ready data management platform.
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. Join this webinar, to hear leading experts from DataStax, discuss how DataStax Enterprise, the data management platform trusted by 9 out of the top 15 global banks, enables innovation and industry transformation. They’ll cover how the right data management platform can help break down data silos and modernize old systems of record as an operational data layer that scales to meet the distributed, real-time, always available demands of the enterprise. Register now to learn how the right data management platform allows you to power innovative banking applications, gain instant insight into comprehensive customer interactions, and beat fraud before it happens.
Video: https://youtu.be/319NnKEKJzI
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
Customer expectations are changing fast, while customer-related data is pouring in at an unprecedented rate and volume. How can you contextualize and analyze all this customer data in real time to meet increasingly demanding customer expectations? Join Mike Rowland, Director and National Practice Leader for CX Strategy at West Monroe Partners, and Kartavya Jain, Product Marketing Manager at DataStax, for an in-depth conversation about how customer experience frameworks, driven by Design Thinking, can help enterprises: understand their customers and their needs, define their strategy for real-time CX, create value from contextual and instant insights.
2. Who are we?
● CDE, Cloud Database Engineering
● Providing data stores as a service
○ Cassandra
○ Dynomite
○ Elasticsearch and RDS
Ajay Upadhyay, Cloud Data Architect @ Netflix
Arun Agrawal, Sr. Software Engineer @ Netflix
4. Cassandra @ Netflix
• 98% of streaming data is stored in Cassandra
• Data ranges from customer details to viewing history / streaming bookmarks to billing and payment
7. Capacity Planning
• Able to predict
– Current usage and available capacity
– Resources needing upgrade
– Life cycle of current configuration
– Appropriate configuration for new and existing App/Service
• Optimize
– Under- or over-utilized resources
– Increased business productivity
44. You may not control all the events that happen to you,
but you CAN decide not to be reduced by them.
- Maya Angelou
Editor's Notes
For the business to deliver quality service that meets and exceeds customers' expectations, we need the right capacity and resources.
Work with app team -
Cluster / ring size: 9 nodes to 300 nodes
10k instances, ranging from ms to i2 to d2 instances
Current usage and available capacity
Resources needing upgrade
Cost-effective configuration - just vertical upgrade - no need to add nodes or increase ring size
Life cycle of current configuration - when cluster will run out of resources
Appropriate configuration for new and existing App/Service
Analysis – In the analysis phase, we take the data collected in the monitoring phase and analyze it to find problems and evaluate the quality of the deployment.
Optimization - stagger R - W
Repair overheads - amount of writes and data size - Entropy in the system
No repair - quorum R and W - aggressive ttl data
Compactions - implicit - compaction-threshold - 2 - GC grace period more aggressive
Node replacement - replace early if node is still healthy - bootstrap from neighboring nodes
Backup overheads - throttle if creates a big bottleneck on network
Read - full row or column slices
Write - full row or few columns at a time
STCS - size tiered
LCS - Leveled compaction strategy
Aggressive TTLs - few hours to few days
Variable Payloads - 1k - 1m range
Model/Simulate traffic using NDBench for new requirement
Cache for aggressive latencies
Cluster sharding for high and low latency required data
Continuous monitoring to keep track of usage pattern
Useful for predicting the cluster's life
Useful as a proxy for traffic similar to the traffic captured here
New requirement or change in existing traffic
capacity planning cycle begins
Let’s see how we really manage CAAS at Netflix. Here is a short video in which we get notified on Slack about a cluster that may reach its capacity; we then investigate, talk with the app teams, warn them, and either take corrective steps (if required) or increase the capacity of the cluster.
Pretty cool and neat stuff, right? So how do we do this? Is it actually some human doing the analysis and posting on Slack? No. Let’s look at the science behind this, but before we get there we need to understand the Netflix ecosystem.
Every instance at Netflix uploads all of its telemetry to Atlas. Atlas is a very useful tool because it combines the raw inputs from multiple instances by availability zone, region, application, etc. It is really handy for finding performance issues, debugging, triaging, and getting an aggregated view of an app. One of Atlas's many features is the ability to set a threshold and duration for a metric which, when tripped, can page the on-call.
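The threshold-and-duration idea can be illustrated with a few lines of Python. This is only a sketch of the general technique and not Atlas's actual API or expression language: the function name, the per-minute sampling, and the sample values are all assumptions made for illustration.

```python
from typing import Sequence


def threshold_tripped(samples: Sequence[float], threshold: float, duration: int) -> bool:
    """Return True if the most recent `duration` samples all exceed `threshold`."""
    if duration <= 0 or len(samples) < duration:
        return False
    return all(value > threshold for value in samples[-duration:])


# Hypothetical per-minute disk-usage percentages for one cluster.
disk_used_pct = [71.0, 78.5, 81.2, 83.0, 84.4, 86.1]

# Trip only if usage stayed above 80% for the last 4 samples.
should_page = threshold_tripped(disk_used_pct, threshold=80.0, duration=4)
```

Requiring the breach to persist for a duration, rather than firing on a single sample, is what keeps short spikes from paging anyone.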
But let’s face it: if you page a person based on a single metric being tripped, is that right? It is NEVER a single metric that can tell you about the cluster. There are hooks in Atlas where you can define basic relationships between metrics, but it is the complex relationships that we are after. In addition, false positives can be reported because we are hosted in AWS, where failure of machines is not the exception but the norm. When machines fail they don’t always report metrics, which puts us in false-positive territory.
So to reduce on-call pain, we needed a middle layer of logic that could sit between Atlas and the on-call, where we could express complex relationships between multiple metrics, add context, do basic triage, and remove those false positives. We brought in “Winston”, which is based on StackStorm; it does all this and provides a great UI to work with. Winston has native integration with Atlas, so you can write Python code that is triggered when Atlas fires an event. This combination of Atlas and Winston greatly reduces false positives for the on-call.
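A rough sketch of what such middle-layer logic might look like is below. It is not Winston or StackStorm code: the metric names, thresholds, and the rule itself (page only when several metrics are breached together on most reporting nodes, and treat missing telemetry as a gap rather than a breach) are hypothetical choices made to illustrate the idea.

```python
from typing import Dict, Optional

# Hypothetical thresholds; a real rule set would be tuned per cluster.
THRESHOLDS = {"disk_used_pct": 80.0, "read_latency_ms": 50.0}


def should_page(nodes: Dict[str, Dict[str, Optional[float]]],
                min_reporting_fraction: float = 0.5) -> bool:
    """Page only when every metric in THRESHOLDS is breached on most reporting nodes."""
    # Keep only nodes that reported every metric we care about.
    reporting = {
        name: metrics for name, metrics in nodes.items()
        if all(metrics.get(key) is not None for key in THRESHOLDS)
    }
    # Too few nodes reporting: assume a telemetry gap, not a capacity problem.
    if not nodes or len(reporting) / len(nodes) < min_reporting_fraction:
        return False
    breached = [
        name for name, metrics in reporting.items()
        if all(metrics[key] > limit for key, limit in THRESHOLDS.items())
    ]
    return len(breached) > len(reporting) / 2


# Hypothetical snapshot: node-c failed and reported nothing.
cluster = {
    "node-a": {"disk_used_pct": 85.0, "read_latency_ms": 70.0},
    "node-b": {"disk_used_pct": 88.0, "read_latency_ms": 65.0},
    "node-c": {"disk_used_pct": None, "read_latency_ms": None},
}
page = should_page(cluster)
```

Here the silent node is simply excluded from the decision, so a failed machine neither triggers nor suppresses a page on its own.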
But wait: now the pages we get are accurate, but how can we save the cluster? It might already be too late. This is more reactive than proactive.
How can we build a system that tells us a cluster might come under pressure if the trend continues? With such a system, we are better prepared for what is about to come. The chances of getting paged in the middle of the night for degraded performance or latency alerts can be reduced drastically, if not avoided completely. This is where we started to think about a system that could predict the future of a cluster.
With such a system, we can take a break and have more confidence that our clusters will be able to handle what is about to come. Again, this holds only if the “trend” continues; if you do something unexpected, we might still have issues and end up increasing the capacity of the cluster at the very last moment. So we set the expectation that this is not a magic ball that will solve all problems, but it will surely help you find the problem areas before they happen. Don’t expect the clusters to auto-scale when you suddenly add another 100m subscribers!
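To make "if the trend continues" concrete, here is a minimal sketch that fits a straight line to recent usage and estimates how long until a limit is reached. The notes do not say which forecasting model is used, so the linear trend, the function names, and the sample data are all assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_t


def days_until_limit(ys: Sequence[float], limit: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches `limit`.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(ys) - 1))


# Hypothetical daily disk usage (percent), growing steadily.
daily_disk_pct = [50.0, 52.0, 54.0, 56.0, 58.0]
remaining = days_until_limit(daily_disk_pct, limit=80.0)
```

As the notes warn, an estimate like this is only as good as the trend it extrapolates; a sudden change in traffic invalidates it.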
At Netflix, Atlas metrics are pushed to the big data platform, which is Netflix’s data warehouse. All the metrics are stored there, and all analysis can run there.
RSS - RESIDUAL SUM OF SQUARES
RMSE - ROOT MEAN SQUARED ERROR
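Both quantities measure how far a forecast is from what was actually observed. The sketch below computes them using the standard definitions; the sample values are made up for illustration and are not taken from the talk.

```python
import math
from typing import Sequence


def rss(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Residual sum of squares: sum of squared differences between actual and predicted."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: square root of RSS divided by the number of samples."""
    return math.sqrt(rss(actual, predicted) / len(actual))


# Hypothetical observed usage versus a model's forecast for the same days.
observed = [50.0, 53.0, 54.0, 55.0]
forecast = [50.0, 52.0, 54.0, 56.0]

fit_rss = rss(observed, forecast)
fit_rmse = rmse(observed, forecast)
```

RMSE is in the same units as the metric itself, which makes it easier to judge than RSS when comparing candidate models.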