1. If it’s not SQL, it’s not a database.
2. It takes 5+ years to build a database.
3. Listen to your users.
4. Too much magic is a bad thing.
5. It’s the cloud, stupid.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote - even if nobody else wrote after you. One way this can happen is if the time on the machines in your cluster is not synchronized closely enough. This is called clock skew, and is just one of the ways you'll see that this anomaly can occur.
In this talk we'll dive in to how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
About the Speaker
Donny Nadolny Senior Developer, PagerDuty
Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems and has extensive experience managing and resolving major system incidents.
We run multiple DataStax Enterprise clusters in Azure each holding 300 TB+ data to deeply understand Office 365 users. In this talk, we will deep dive into some of the key challenges and takeaways faced in running these clusters reliably over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building big data platform using Cassandra, Spark and Azure to generate per-user insights of Office 365 users.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote - even if nobody else wrote after you. One way this can happen is if the time on the machines in your cluster is not synchronized closely enough. This is called clock skew, and is just one of the ways you'll see that this anomaly can occur.
In this talk we'll dive in to how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
About the Speaker
Donny Nadolny Senior Developer, PagerDuty
Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems and has extensive experience managing and resolving major system incidents.
We run multiple DataStax Enterprise clusters in Azure each holding 300 TB+ data to deeply understand Office 365 users. In this talk, we will deep dive into some of the key challenges and takeaways faced in running these clusters reliably over a year. To name a few: process crashes, ephemeral SSDs contributing to data loss, slow streaming between nodes, mutation drops, compaction strategy choices, schema updates when nodes are down and backup/restore. We will briefly talk about our contributions back to Cassandra, and our path forward using network attached disks offered via Azure premium storage.
About the Speaker
Anubhav Kale Sr. Software Engineer, Microsoft
Anubhav is a senior software engineer at Microsoft. His team is responsible for building big data platform using Cassandra, Spark and Azure to generate per-user insights of Office 365 users.
Cassandra is a better alternative to RDBMS for a scalable solution which requires a distributed DB but it is more popular in clustered solutions which are targeted for a single installation. Key reason is maintainability & life-cycle management.
Ericsson has re-engineered its voucher management solution for prepaid billing by replacing RDBMS with Cassandra. It facilitates clusters with large set of nodes which can easily scale up & scale down, so that one doesn't have to deal with multiple clusters. However, skills for its administration are sparse, unlke RDBMS. Activities like nodetool repair, compaction & scale up/down become challenging. Moreover, frequency of new Cassandra releases is high and rolling them out to several deployments is challenging
Key technical challenges were consistency of denormalized data, performance of full-table scan & porting the product from Thrift to CQL. Challenges with large scale global deployments are with anti-entropy & size-tiered compaction.
About the Speaker
Brij Bhushan Ravat Chief Architect, Ericsson
Brij is Chief Architect for prepaid billing product in Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led Centre of Excellence for Network Applications, which tracks emerging trends in the application development in the area of telecom. This includes telecom services, OSS & leveraging big data technologies for innovative new age solutions His focus is on application of big data in telecom. This includes analytics using Spark & NoSQL
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how create truly elastic Cassandra deployments on AWS allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS backed disks, JBOD, token pinning and our previous work on bootstrapping from backups you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Presenter: Chris Lohfink, Engineer at Pythian
This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble free mutli DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi DC cluster. He will also look at how multiple DC's are supported through all areas of the Cassandra, how it impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the french leader of express shipments. He's been leading there the effort to build a Cassandra based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers to get the best out of it.
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features: Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can detect bottlenecks, improve performance and even lower its cost.
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
In last few years, technology has seen a major drift in the dominance of traditional / RDMBS databases across different domains. Expeditious adoption of NoSQL databases especially Cassandra in the industry opens up a lot more discussions on what are the major challenges that are faced during implementation of Cassandra and how to mitigate it. Many a times we conclude that migration or POC (proof of concept) is not successful; however the real flaw might be in the data modeling, identifying the right hardware configurations, database parameters, right consistency level and so on. There's no one good model or configuration which fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delve into different performance tuning considerations and anti-patterns that need to be considered during Cassandra migration / implementation to make sure we are able to reap the benefits of Cassandra, what makes it a ‘Visionary’ in 2014 Gartner’s Magic Quadrant for Operational Database Management Systems.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
About the Speaker
Adam Zegelin Technical Co-founder, Instaclustr
As Instaclustrs founding software engineer, Adam provides the foundation knowledge of our capability and engineering environment. He delivers business-focused value to our code-base and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
Cassandra Community Webinar | Data Model on FireDataStax
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps and understand the tradeoffs. Many time's just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra.
About the Speaker
Randy Fradin Vice President, BlackRock
Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.
Performance Testing: Scylla vs. Cassandra vs. DatastaxScyllaDB
Ticketmaster is part of Live Nation Entertainment, the world's leading live entertainment company. Learn why they went with Scylla after conducting performance testing between Scylla, Apache Cassandra and DataStax Enterprise.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features included but not limited to Secundary Indexes, UDF, Materialized Views, etc. and not so strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings in some of features(Secundary Indexes, Compaction Strategies, etc).
Learning Objective #2:
The most misused features and common hardware errors. How they might seem harmeless at first (either small cluster or even single node).
Learning Objective #3:
How to correctly use Cassandra and it's features and go for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things.. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying the his local community.
Adam Zegelin is Instaclustr's founding software engineer. This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load. Micro-batching combines writes for the same partition key into a single network request and ensures they hit the “fast path” for writes on a Cassandra node.
This presentation will show how create truly elastic Cassandra deployments on AWS allowing you to scale and shrink your large Cassandra deployments multiple times a day.
Leveraging a combination of EBS backed disks, JBOD, token pinning and our previous work on bootstrapping from backups you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Warning: This presentation will probably contain some references to late 2000's pop group LMFAO
About the Speaker
Ben Bromhead CTO, Instaclustr
Ben Bromhead is the CTO of Instaclustr where he is responsible for working closely with his engineering team and customers to build highly available, scalable applications on top of Cassandra. Instaclustr is the only multi-cloud, self service Cassandra as a Service provider in the world and is dedicated to provider world class support.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps. (https://www.pollfish.com/). At pollfish we use Cassandra for difference use cases, eg. for application data store to maximize write throughput when appropriate and for our analytics project to find insights in application generated data. As a medium to accomplish our success so far, we use the Datastax's DSE 4.6 environment which integrates Appache Cassadra, Spark and a hadoop compatible file system (CFS). We will discuss how we started, how the journey was and the impressions gained so far along with some tips learned the hard way. This is a result of joint work of an excellent team here at Pollfish.
Cassandra is a better alternative to RDBMS for a scalable solution which requires a distributed DB but it is more popular in clustered solutions which are targeted for a single installation. Key reason is maintainability & life-cycle management.
Ericsson has re-engineered its voucher management solution for prepaid billing by replacing RDBMS with Cassandra. It facilitates clusters with large set of nodes which can easily scale up & scale down, so that one doesn't have to deal with multiple clusters. However, skills for its administration are sparse, unlke RDBMS. Activities like nodetool repair, compaction & scale up/down become challenging. Moreover, frequency of new Cassandra releases is high and rolling them out to several deployments is challenging
Key technical challenges were consistency of denormalized data, performance of full-table scan & porting the product from Thrift to CQL. Challenges with large scale global deployments are with anti-entropy & size-tiered compaction.
About the Speaker
Brij Bhushan Ravat Chief Architect, Ericsson
Brij is Chief Architect for prepaid billing product in Ericsson. The product uses Cassandra in business support systems for telecom service providers. He has also led Centre of Excellence for Network Applications, which tracks emerging trends in the application development in the area of telecom. This includes telecom services, OSS & leveraging big data technologies for innovative new age solutions His focus is on application of big data in telecom. This includes analytics using Spark & NoSQL
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how create truly elastic Cassandra deployments on AWS allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS backed disks, JBOD, token pinning and our previous work on bootstrapping from backups you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Presenter: Chris Lohfink, Engineer at Pythian
This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble free mutli DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi DC cluster. He will also look at how multiple DC's are supported through all areas of the Cassandra, how it impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the french leader of express shipments. He's been leading there the effort to build a Cassandra based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers to get the best out of it.
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
Getting started with AWS DynamoDB™ is famously easy, but as an application grows and evolves it often starts to struggle with DynamoDB’s limitations. We introduce Scylla’s Alternator, which provides the same API as DynamoDB but aims to empower the application developer. In this presentation we will survey some of Alternator’s developer-centered features: Alternator lets you test and eventually deploy your application anywhere, on any public cloud or private cluster. It efficiently supports multiple tables so it does not require difficult single-table design. Finally, Alternator provides the developer with strong observability tools. The insights provided by these tools can detect bottlenecks, improve performance and even lower its cost.
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
In last few years, technology has seen a major drift in the dominance of traditional / RDMBS databases across different domains. Expeditious adoption of NoSQL databases especially Cassandra in the industry opens up a lot more discussions on what are the major challenges that are faced during implementation of Cassandra and how to mitigate it. Many a times we conclude that migration or POC (proof of concept) is not successful; however the real flaw might be in the data modeling, identifying the right hardware configurations, database parameters, right consistency level and so on. There's no one good model or configuration which fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delve into different performance tuning considerations and anti-patterns that need to be considered during Cassandra migration / implementation to make sure we are able to reap the benefits of Cassandra, what makes it a ‘Visionary’ in 2014 Gartner’s Magic Quadrant for Operational Database Management Systems.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
About the Speaker
Adam Zegelin Technical Co-founder, Instaclustr
As Instaclustrs founding software engineer, Adam provides the foundation knowledge of our capability and engineering environment. He delivers business-focused value to our code-base and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
Cassandra Community Webinar | Data Model on FireDataStax
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps and understand the tradeoffs. Many time's just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra.
About the Speaker
Randy Fradin Vice President, BlackRock
Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.
Performance Testing: Scylla vs. Cassandra vs. DatastaxScyllaDB
Ticketmaster is part of Live Nation Entertainment, the world's leading live entertainment company. Learn why they went with Scylla after conducting performance testing between Scylla, Apache Cassandra and DataStax Enterprise.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features included but not limited to Secundary Indexes, UDF, Materialized Views, etc. and not so strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings in some of features(Secundary Indexes, Compaction Strategies, etc).
Learning Objective #2:
The most misused features and common hardware errors. How they might seem harmeless at first (either small cluster or even single node).
Learning Objective #3:
How to correctly use Cassandra and it's features and go for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things.. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying the his local community.
Adam Zegelin is Instaclustr's founding software engineer. This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load. Micro-batching combines writes for the same partition key into a single network request and ensures they hit the “fast path” for writes on a Cassandra node.
This presentation will show how create truly elastic Cassandra deployments on AWS allowing you to scale and shrink your large Cassandra deployments multiple times a day.
Leveraging a combination of EBS backed disks, JBOD, token pinning and our previous work on bootstrapping from backups you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Warning: This presentation will probably contain some references to late 2000's pop group LMFAO
About the Speaker
Ben Bromhead CTO, Instaclustr
Ben Bromhead is the CTO of Instaclustr where he is responsible for working closely with his engineering team and customers to build highly available, scalable applications on top of Cassandra. Instaclustr is the only multi-cloud, self service Cassandra as a Service provider in the world and is dedicated to provider world class support.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps. (https://www.pollfish.com/). At pollfish we use Cassandra for difference use cases, eg. for application data store to maximize write throughput when appropriate and for our analytics project to find insights in application generated data. As a medium to accomplish our success so far, we use the Datastax's DSE 4.6 environment which integrates Appache Cassadra, Spark and a hadoop compatible file system (CFS). We will discuss how we started, how the journey was and the impressions gained so far along with some tips learned the hard way. This is a result of joint work of an excellent team here at Pollfish.
Galaxy Big Data with MariaDB 10 by Bernard Garros, Sandrine Chirokoff and Stéphane Varoqui.
Presented 26.6.2014 at the MariaDB Roadshow in Paris, France.
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationDataStax Academy
In last few years, technology has seen a major drift in the dominance of traditional / RDMBS databases across different domains. Expeditious adoption of NoSQL databases especially Cassandra in the industry opens up a lot more discussions on what are the major challenges that are faced during implementation of Cassandra and how to mitigate it. Many a times we conclude that migration or POC (proof of concept) is not successful; however the real flaw might be in the data modeling, identifying the right hardware configurations, database parameters, right consistency level and so on. There's no one good model or configuration which fits all use cases and all applications. Performance tuning an application is truly an art and requires perseverance. This paper delve into different performance tuning considerations and anti-patterns that need to be considered during Cassandra migration / implementation to make sure we are able to reap the benefits of Cassandra, what makes it a ‘Visionary’ in 2014 Gartner’s Magic Quadrant for Operational Database Management Systems.
Healthcare Claim Reimbursement using Apache SparkDatabricks
Optum Inc helps hospitals accurately calculate the claim reimbursement, detect underpayment from the Insurance company. Optum receives millions of claims per day which needs to be evaluated in less than 8 hours and the results need to be sent back to the hospitals for revenue recovery purposes.
DataStax Enterprise (DSE) already offers a plethora of solid capabilities to make your distributed database dreams become more real than The NeverEnding Story. But are you aware of all of the crazy, quality of life updates and new features added in DataStax Enterprise 6? These include: significantly improved performance; anti-entropy improvements with DSE NodeSync; quality updates for DSE Search, Graph, Analytics, OpsCenter, Advanced Security, and Studio; metrics collection; and Kafka and Docker integrations. We’ll take a look at all of it, plus give you a sneak peek at some of the foundational changes coming in DataStax Enterprise 6.8 that will rock your world.
MySQL Cluster (NDB) - Best Practices Percona Live 2017Severalnines
This presentation by Johan Andersson at Percona Live 2017 in Santa Clara, California gives detailed information on all you need to know to effectively deploy and manage MySQL Cluster technology in your environment.
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
In this webinar, you will leverage free and open source tools as well as enterprise-grade utilities developed by DataStax to get a solid grasp on the performance of a masterless distributed database like Cassandra. You’ll also get the opportunity to walk through DataStax Enterprise Insights dashboards and see exactly how to identify performance bottlenecks.
View Recording: https://youtu.be/McZg_MMzVjI
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
NativeX (formerly W3i) recently transitioned a large portion of their backend infrastructure from MS SQL Server to Apache Cassandra. Today, its Cassandra cluster backs its mobile advertising network supporting over 10 million daily active users producing over 10,000 transactions per second with an average database request latency of under 2 milliseconds. Going from relational to noSQL required NativeX's engineers to re-train, re-tool and re-think the way it architects applications and infrastructure. Learn why Cassandra was selected as a replacement, what challenges were encountered along the way, and what architecture and infrastructure were involved in the implementation.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.