This document summarizes a presentation about monitoring Cassandra systems. It discusses gathering metrics from Cassandra using JMX and nodetool, including thread pool statistics, latency histograms, and metric types. It also provides an overview of the Cassandra read/write process involving memtables and SSTables.
Cassandra Summit 2014: Cyanite — Better Graphite Storage with Apache CassandraDataStax Academy
Presenter: Pierre-Yves Ritschard, CTO at Exoscale
Graphite is the go-to tool of sysadmins everywhere to store and retrieve timeseries data. Cyanite is an alternative graphite compatible daemon which uses Cassandra as its main storage engine. The talk will focus on how to build efficient time-series data models in Cassandra, how the ecosystem of tools around Cassandra can help in processing timeseries in batches and will provide architectural insight in how to build truly scalable time series pipelines.
Presenter: Chris Lohfink, Engineer at Pythian
This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
Cassandra Community Webinar: Apache Cassandra InternalsDataStax
Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years.
In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation.
Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
Apache Cassandra operations have the reputation to be quite simple against single datacenter clusters and / or low volume clusters but they become way more complex against high latency multi-datacenter clusters: basic operations such as repair, compaction or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in details: bootstrapping new nodes and / or datacenter, repair strategies, compaction strategies, GC tuning, OS tuning, large batch of data removal and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques on how to anticipate issues inherent to multi-datacenter cluster: how and what to monitor, hardware and network considerations as well as data model and application level bad design / anti-patterns that can affect your multi-datacenter cluster performances.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
Go90 is a mobile entertainment platform offering access to live and on demand videos. We built the web services platform and social features like activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain by using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on java driver usage, performance, monitoring, cluster maintenance, version upgrade, 2-way ssl and many more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
Cassandra Summit 2014: Cyanite — Better Graphite Storage with Apache CassandraDataStax Academy
Presenter: Pierre-Yves Ritschard, CTO at Exoscale
Graphite is the go-to tool of sysadmins everywhere to store and retrieve timeseries data. Cyanite is an alternative graphite compatible daemon which uses Cassandra as its main storage engine. The talk will focus on how to build efficient time-series data models in Cassandra, how the ecosystem of tools around Cassandra can help in processing timeseries in batches and will provide architectural insight in how to build truly scalable time series pipelines.
Presenter: Chris Lohfink, Engineer at Pythian
This session will cover a walk-through to provide an understanding of key metrics critical to operating a Cassandra cluster effectively. Without context to the metrics, we just have pretty graphs. With context, we have a powerful tool to determine problems before they happen and to debug production issues more quickly.
Cassandra Community Webinar: Apache Cassandra InternalsDataStax
Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years.
In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation.
Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
Apache Cassandra operations have the reputation to be quite simple against single datacenter clusters and / or low volume clusters but they become way more complex against high latency multi-datacenter clusters: basic operations such as repair, compaction or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in details: bootstrapping new nodes and / or datacenter, repair strategies, compaction strategies, GC tuning, OS tuning, large batch of data removal and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques on how to anticipate issues inherent to multi-datacenter cluster: how and what to monitor, hardware and network considerations as well as data model and application level bad design / anti-patterns that can affect your multi-datacenter cluster performances.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
Go90 is a mobile entertainment platform offering access to live and on demand videos. We built the web services platform and social features like activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain by using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on java driver usage, performance, monitoring, cluster maintenance, version upgrade, 2-way ssl and many more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
About the Speaker
Adam Zegelin Technical Co-founder, Instaclustr
As Instaclustrs founding software engineer, Adam provides the foundation knowledge of our capability and engineering environment. He delivers business-focused value to our code-base and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
Start to finish overview of tools, tips and techniques for developing software for Apache Cassandra. Includes code and configuration examples, build systems and container support.
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
Time series data has long been a natural use case for Cassandra with plenty of write ups showing you how to store mock data ""at scale"". Unfortunately warnings of wide rows and examples of storing numeric-only data aren't sufficient to guide your organization through the realities of running these workloads. Instead you find yourself implementing anti patterns like rotating clusters, only to be taken in by the siren song of DTCS, your hopes dashed across the rocks of ever expanding disk utilization.
We will be taking the lessons learned at Threat Stack - a continuous security monitoring platform - about how to scale a large volume of bulky transactions totaling terabytes and petabytes on AWS, but while holding yourself to a sane budget and DBA-free operational life.
Specifics to include ""break the glass"" operational maneuvers, making DTCS function properly, data modeling, and living in a polyglot data platform.
About the Speaker
Sam Bisbee CTO, Threat Stack
As the CTO at Threat Stack, Sam is responsible for leading the Company's strategic technology roadmap for its continuous security monitoring service, purpose-built for cloud environments. Sam brings highly-relevant experience in distributed systems in public, private, and hybrid cloud environments, as well as proven success scaling SaaS startups. Sam was most recently the CXO at Cloudant (acquired by IBM in Feb. 2014), a leader in the Database-as-a-Service space.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax
With the addition of vnodes (Virtual Nodes), Cassandra users were able to gain a few benefits as a result of streaming when it came to bootstrapping and decommissioning nodes. On the flip side, having to route requests on larger clusters became a lot more intensive of a workload for all nodes that were then forced to act coordinator nodes. By setting up a tier of proxy nodes, we were able to have our cluster of 50 nodes perform with a 300% improvement on average in a mixed workload environment. This is an explanation of what we did, how we did it, and why it works.
About the Speaker
Eric Lubow CTO, SimpleReach
Eric Lubow is CTO of SimpleReach, where he builds highly-scalable distributed systems for processing analytics data. Eric is also a DataStax MVP for Cassandra, and co-author of Practical Cassandra. In his spare time, Eric is a skydiver, motorcycle rider, mixed martial artist, and dog dad.
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra.
About the Speaker
Randy Fradin Vice President, BlackRock
Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.
Capturing live traffic in a typical micro-services setup is one of the industry standard strategies for various needs including testing, canary, and dark launches, but when it comes to databases, capturing and replaying live traffic is quite challenging but equally critical. Cassandra is one of the few databases which supports a live traffic capture and replay feature out of the box starting with 4.0. As part of this talk, Vinay Chella will dive deeper into the query logging framework that is introduced in Cassandra 4.0, and the features built on top of this framework, including full query logging, audit logging, and traffic replay. At end of this talk, you will learn how to use audit logging, live traffic capture and replay features in the upcoming release of Cassandra.
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote - even if nobody else wrote after you. One way this can happen is if the time on the machines in your cluster is not synchronized closely enough. This is called clock skew, and is just one of the ways you'll see that this anomaly can occur.
In this talk we'll dive in to how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
About the Speaker
Donny Nadolny Senior Developer, PagerDuty
Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
At Integral, we process heavy volumes of click-stream traffic. 50K QPS of ad impressions at peak and close to 200K QPS of all browser calls. We build analytics on this streams of data. There are two applications which require quite significant computational effort: 'sessionization' and fraud detection.
Sessionization implies linking a series of requests from same browser into single record. There can be 5 or more total requests spread over 15-30 minutes which we need to link to each other.
Fraud detection is a process looking at various signals in browser requests and at substantial historical evidence data classifying ad impression either as legitimate or as fraudulent.
We've been doing both (as well as all other analytics) in batch mode once an hour at best. Both processes, and, in particular, fraud detection, are time sensitive and much more meaningful if done in near-real-time.
This talk would be about our experience migrating a once-per-day offline batch processing of impression data using hadoop to in-memory stream processing using Kafka, Storm and Cassandra. We will touch upon our choices and our reasoning for selecting the products used for this solution.
Hadoop is no longer the only or always preferred option in Big Data space. In-memory stream processing may be more effective for time series data preparation and aggregation. Ability to scale at a significantly lower cost means more customers, better accuracy and better business practices: since only in-stream processing allows for low-latency data and insight delivery it opens entirely new opportunities. However, transitioning of non-trivial data pipelines raises a number of questions hidden previously within the offline nature of batch processing. How will you join several data feeds? How will you implement failure recovery? In addition to handling terabytes of data per day our streaming system has to be guided by the following considerations:
• Recovery time
• Time relativity and continuity
• Geographical distribution of data sources
• Limit on data loss
• Maintainability
The system produces complex cross-correlational analysis of several data feeds and aggregation for client analytics with input feed frequency of up to 100K msg/sec.
This presentation will benefit anyone interested in learning an alternate approach for big data analytics, especially the process of joining multiple streams in memory using Cassandra. Presentation will also highlight certain optimization patterns used those can be useful in similar situations.
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long time Open Source software advocate, contributor and speaker: Zope, ZODB, Nuxeo contributor, Zope and OpenStack foundations member, his talks includes Apache Con, Cassandra summit, OpenStack summit, The WWW Conference or still EuroPython.
Adam Zegelin is Instaclustr's founding software engineer. This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load. Micro-batching combines writes for the same partition key into a single network request and ensures they hit the “fast path” for writes on a Cassandra node.
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Open Source Monitoring for Java with JMX and Graphite (GeeCON 2013)Cyrille Le Clerc
Fast feedback from monitoring is a key of Continuous Delivery. JMX is the right Java API to do so but it unfortunately stayed underused and underappreciated as it was difficult to connect to monitoring and graphing systems.
Throw in the sin bin the poor solutions based on log files and weakly secured web interfaces! A new generation of Open Source tooling makes it easy to graph java application metrics and integrate them to traditional monitoring systems like Nagios.
Following the logic of DevOps, we will look together how best to integrate the monitoring dimension in a project: from design to development, to QA and finally to production on both traditional deployment and in the Cloud.
Come and discover how the JmxTrans-Graphite ticket can make your life easier.
This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load.
Micro-batching combines writes for the same partition key into a single network request and ensures they hit the "fast path" for writes on a Cassandra node.
About the Speaker
Adam Zegelin Technical Co-founder, Instaclustr
As Instaclustrs founding software engineer, Adam provides the foundation knowledge of our capability and engineering environment. He delivers business-focused value to our code-base and overall capability architecture. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and services rely, including Apache Cassandra, Apache Spark and other technologies such as CoreOS and Docker.
Start to finish overview of tools, tips and techniques for developing software for Apache Cassandra. Includes code and configuration examples, build systems and container support.
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
Time series data has long been a natural use case for Cassandra with plenty of write ups showing you how to store mock data ""at scale"". Unfortunately warnings of wide rows and examples of storing numeric-only data aren't sufficient to guide your organization through the realities of running these workloads. Instead you find yourself implementing anti patterns like rotating clusters, only to be taken in by the siren song of DTCS, your hopes dashed across the rocks of ever expanding disk utilization.
We will be taking the lessons learned at Threat Stack - a continuous security monitoring platform - about how to scale a large volume of bulky transactions totaling terabytes and petabytes on AWS, but while holding yourself to a sane budget and DBA-free operational life.
Specifics to include ""break the glass"" operational maneuvers, making DTCS function properly, data modeling, and living in a polyglot data platform.
About the Speaker
Sam Bisbee CTO, Threat Stack
As the CTO at Threat Stack, Sam is responsible for leading the Company's strategic technology roadmap for its continuous security monitoring service, purpose-built for cloud environments. Sam brings highly-relevant experience in distributed systems in public, private, and hybrid cloud environments, as well as proven success scaling SaaS startups. Sam was most recently the CXO at Cloudant (acquired by IBM in Feb. 2014), a leader in the Database-as-a-Service space.
Building a Multi-Region Cluster at Target (Aaron Ploetz, Target) | Cassandra ...DataStax
Lessons learned from a year spent building a Cassandra cluster over multiple regions, data centers, and providers. Will discuss our successes and learnings on replication, operations, and application development.
About the Speaker
Aaron Ploetz Lead Technical Architect, Target
Aaron is a Lead Technical Architect for Target, where he coaches development teams on modeling and building applications for Cassandra. He is active in the Cassandra tags on StackOverflow, and has also contributed patches to cqlsh. Aaron holds a B.S. in Management/Computer Systems from the University of Wisconsin-Whitewater, a M.S. in Software Engineering and Database Technologies from Regis University, and is a 2x DataStax MVP for Apache Cassandra.
Optimizing Your Cluster with Coordinator Nodes (Eric Lubow, SimpleReach) | Ca...DataStax
With the addition of vnodes (Virtual Nodes), Cassandra users were able to gain a few benefits as a result of streaming when it came to bootstrapping and decommissioning nodes. On the flip side, having to route requests on larger clusters became a lot more intensive of a workload for all nodes that were then forced to act coordinator nodes. By setting up a tier of proxy nodes, we were able to have our cluster of 50 nodes perform with a 300% improvement on average in a mixed workload environment. This is an explanation of what we did, how we did it, and why it works.
About the Speaker
Eric Lubow CTO, SimpleReach
Eric Lubow is CTO of SimpleReach, where he builds highly-scalable distributed systems for processing analytics data. Eric is also a DataStax MVP for Cassandra, and co-author of Practical Cassandra. In his spare time, Eric is a skydiver, motorcycle rider, mixed martial artist, and dog dad.
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
We use Apache Cassandra at BlackRock to help power our Aladdin investment management platform. Like most users, we love Cassandra’s scalability and fault tolerance. One challenge we’ve faced is keeping data consistent between data centers. Cassandra is great at replicating data to multiple data centers, and many users take advantage of this feature to achieve eventual consistency in multi-region clusters. At BlackRock, we have several use cases where eventual consistency is not good enough; sometimes we need to guarantee that the most recent data is available from all locations. Cassandra’s tunable consistency makes it possible to achieve this extreme level of resiliency. In this talk we’ll discuss our experience from the past several years using Cassandra for cross-WAN consistency, some of the novel ways we’ve dealt with the performance implications, and our ideas for improving support for this usage model in future versions of Cassandra.
About the Speaker
Randy Fradin Vice President, BlackRock
Randy Fradin is part of BlackRock’s Aladdin Product Group. His team is responsible for developing the core software infrastructure in BlackRock’s Aladdin platform, including scalable storage, compute, and messaging services. Previously he spent time developing the market data, risk reporting, and core trading functions in Aladdin. He has been an enthusiastic Cassandra user since 2011.
Capturing live traffic in a typical micro-services setup is one of the industry standard strategies for various needs including testing, canary, and dark launches, but when it comes to databases, capturing and replaying live traffic is quite challenging but equally critical. Cassandra is one of the few databases which supports a live traffic capture and replay feature out of the box starting with 4.0. As part of this talk, Vinay Chella will dive deeper into the query logging framework that is introduced in Cassandra 4.0, and the features built on top of this framework, including full query logging, audit logging, and traffic replay. At end of this talk, you will learn how to use audit logging, live traffic capture and replay features in the upcoming release of Cassandra.
Webinar: Getting Started with Apache CassandraDataStax
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at Datastax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote - even if nobody else wrote after you. One way this can happen is if the time on the machines in your cluster is not synchronized closely enough. This is called clock skew, and is just one of the ways you'll see that this anomaly can occur.
In this talk we'll dive in to how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
About the Speaker
Donny Nadolny Senior Developer, PagerDuty
Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
At Integral, we process heavy volumes of click-stream traffic. 50K QPS of ad impressions at peak and close to 200K QPS of all browser calls. We build analytics on this streams of data. There are two applications which require quite significant computational effort: 'sessionization' and fraud detection.
Sessionization implies linking a series of requests from same browser into single record. There can be 5 or more total requests spread over 15-30 minutes which we need to link to each other.
Fraud detection is a process looking at various signals in browser requests and at substantial historical evidence data classifying ad impression either as legitimate or as fraudulent.
We've been doing both (as well as all other analytics) in batch mode once an hour at best. Both processes, and, in particular, fraud detection, are time sensitive and much more meaningful if done in near-real-time.
This talk would be about our experience migrating a once-per-day offline batch processing of impression data using hadoop to in-memory stream processing using Kafka, Storm and Cassandra. We will touch upon our choices and our reasoning for selecting the products used for this solution.
Hadoop is no longer the only or always preferred option in Big Data space. In-memory stream processing may be more effective for time series data preparation and aggregation. Ability to scale at a significantly lower cost means more customers, better accuracy and better business practices: since only in-stream processing allows for low-latency data and insight delivery it opens entirely new opportunities. However, transitioning of non-trivial data pipelines raises a number of questions hidden previously within the offline nature of batch processing. How will you join several data feeds? How will you implement failure recovery? In addition to handling terabytes of data per day our streaming system has to be guided by the following considerations:
• Recovery time
• Time relativity and continuity
• Geographical distribution of data sources
• Limit on data loss
• Maintainability
The system produces complex cross-correlational analysis of several data feeds and aggregation for client analytics with input feed frequency of up to 100K msg/sec.
This presentation will benefit anyone interested in learning an alternate approach for big data analytics, especially the process of joining multiple streams in memory using Cassandra. Presentation will also highlight certain optimization patterns used those can be useful in similar situations.
This sessions covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course to basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it not required to have experience running it in production.
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long time Open Source software advocate, contributor and speaker: Zope, ZODB, Nuxeo contributor, Zope and OpenStack foundations member, his talks includes Apache Con, Cassandra summit, OpenStack summit, The WWW Conference or still EuroPython.
Adam Zegelin is Instaclustr's founding software engineer. This presentation will investigate how using micro-batching for submitting writes to Cassandra can improve throughput and reduce client application CPU load. Micro-batching combines writes for the same partition key into a single network request and ensures they hit the “fast path” for writes on a Cassandra node.
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
iland has built a global data warehouse across multiple data centers, collecting and aggregating data from core cloud services including compute, storage and network as well as chargeback and compliance. iland's warehouse brings actionable intelligence that customers can use to manipulate resources, analyze trends, define alerts and share information.
In this session, we would like to present the lessons learned around Cassandra, both at the development and operations level, but also the technology and architecture we put in action on top of Cassandra such as Redis, syslog-ng, RabbitMQ, Java EE, etc.
Finally, we would like to share insights on how we are currently extending our platform with Spark and Kafka and what our motivations are.
Open Source Monitoring for Java with JMX and Graphite (GeeCON 2013)Cyrille Le Clerc
Fast feedback from monitoring is a key of Continuous Delivery. JMX is the right Java API to do so but it unfortunately stayed underused and underappreciated as it was difficult to connect to monitoring and graphing systems.
Throw in the sin bin the poor solutions based on log files and weakly secured web interfaces! A new generation of Open Source tooling makes it easy to graph java application metrics and integrate them to traditional monitoring systems like Nagios.
Following the logic of DevOps, we will look together how best to integrate the monitoring dimension in a project: from design to development, to QA and finally to production on both traditional deployment and in the Cloud.
Come and discover how the JmxTrans-Graphite ticket can make your life easier.
Slides from my talk at Cassandra Summit 2016 on troubleshooting Cassandra. This is a reprise of my popular talk from last summit, reorganized, expanded, and updated for Cassandra 3.0. In it I share the secrets I've learned in four years of supporting hundreds of customers using Apache Cassandra and DataStax Enterprise. Be sure to check out presenter notes for additional tips and links to further resources.
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
The Last Pickle: Hardening Apache Cassandra for Compliance (or Paranoia).DataStax Academy
Security is always at odds with usability, particularly in the context of operations and development. More so when dealing with a distributed system such as Apache Cassandra. In this presentation, we'll walk through the steps required to completely secure a Cassandra cluster to meet most regulatory and compliance guidelines.
Topics will include:
- Encrypting cross-DC traffic
- Different types of at-rest disk encryption options available (and how to tune them)
- Configuring SSL for inter-cluster communication
- Configuring SSL between clients and the API
- Configuring and managing client authentication
Attendees will leave this presentation with the knowledge required to harden Cassandra to meet most guidelines imposed by regulations and compliance.
Some vignettes and advice based on prior experience with Cassandra clusters in live environments. Includes some material from other operational slides.
Webinar | Building Apps with the Cassandra Python DriverDataStax Academy
With the new Python driver for Cassandra it is easy to build integrations and apps that use Cassandra seamlessly as a back in. This session will explore what it takes to build the app and the features available with the new Python drivers.
Cassandra Troubleshooting (for 2.0 and earlier)J.B. Langston
I’ll give a general lay of the land for troubleshooting Cassandra. Then I’ll take you on a deep dive through nodetool and system.log and give you a guided tour of the useful information they provide for troubleshooting. I’ll devote special attention to monitoring the various processes that Cassandra uses to do its work and how to effectively search for information about specific error messages online.
This is the old version of this presentation for Cassandra 2.0 and earlier. Check out the updated slide deck for Cassandra 2.1.
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...DataStax
Data security is an absolute requirement for any organization – large or small – that handles debit, credit and pre-paid cards. But navigating, understanding and complying with PCI-DSS (Payment Card Industry – Data Security Standards) regulations can be tough. In this webinar, we’ll examine the guidelines for securing payment card data and show you how a combined solution from DataStax and Gazzang can put you on course for compliance.
Monitoring Cassandra: Don't Miss a Thing (Alain Rodriguez, The Last Pickle) |...DataStax
Monitoring is critical to successfully running Apache Cassandra in production. Creating a comprehensive and insightful set of dashboards requires a deep knowledge of Cassandra internals that can be intimidating. Everyone however can benefit from knowing where to start looking and why. So that the next time there is a problem you have the right metrics and knowing which dashboards to look at.
In this talk Alain Rodriguez, Consultant at The Last Pickle, will discuss what to monitor in Apache Cassandra, how, and why. He will present examples from commercial products such as DataDog, and open source systems like Grafana.
About the Speaker
Alain Rodriguez Consultant, The Last Pickle
Alain has been working with Apache Cassandra since version 0.8. He was the first Engineer at teads.tv which had grown to 400+ employees by the time he left. During his time at Teads Alain managed and scaled Cassandra clusters across multiple AWS Regions, fully on his own, taking care of the data modeling as well as the troubleshooting and tuning. Alain frequently contributes to the Apache Cassandra users mailing list.
Advanced Cassandra Operations via JMX (Nate McCall, The Last Pickle) | C* Sum...DataStax
Advanced Apache Cassandra operations depends on an understanding of what features are available via the JMX interface. While nodetool exposes many of these, the most useful are still waiting to be discovered. The JMX interface allows the code base to expose functions that operate directly on internal structures, making real time changes to the way the process runs. With this skill in your toolkit there is no limit to the changes you can make.
In this talk Nate McCall, CTO at The Last Pickle, will explain how to explore, secure, and invoke the JMX interface exposed by Cassandra. He'll then move on to what you can do with it such as compacting specific SSTables, changing compaction on a single node, managing repairs, diagnosing latency, viewing cross node timeouts, and others. Whether you are a developer or operator, new or experienced, you will be given a thorough understanding of what all is available via JMX without having to consult the code on your own.
About the Speaker
Nate McCall CTO, The Last Pickle
Nate McCall has 16 years of server-side systems and software development experience. He started his involvement in the Cassandra community in the late fall of 2009 when he became one of the original developers on the Hector Java client. He has contributed a number of patches over the years to the Apache Cassandra code base and continues to be actively involved on the mail lists, issue system and IRC. He has been a DataStax MVP every year since the inception of the program.
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
In this presentation, we will look into JIRAs, JavaDocs and system log entries to gain a deeper understanding on how LCS works under the hood. We will explain what scenarios don't work well for LCS and (more importantly) why. We will leverage legacy TRACE/DEBUG level log for compaction related objects as well as some newer compaction logging information introduced in C* 3.6 (CASSANDRA-10805) to gain better insights.
About the Speakers
Wei Deng Solutions Architect, DataStax
Solutions Architect for DataStax. I have a strong interest in big data, cloud application and distributed computing practices.
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
Large partitions shall no longer be a nightmare. That is the goal of CASSANDRA-11206.
100MB and 100,000 cells per partition is the recommended limit for a single partition in Cassandra up to 3.5. Exceeding these limits can cause a lot of trouble. Repairs and compactions could fail and reads cause out-of-memory failures.
This talk provides a deep-dive of the reasons for the previous limitations, why exceeding these limitations caused trouble, how the improvements in Cassandra 3.6 helps with big partitions and why you should not blindly let your partitions get huge.
About the Speaker
Robert Stupp Solution Architect, DataStax
Robert is working as a Solutions Architect at DataStax and is also a Committer to Apache Cassandra. Before joining DataStax he worked with his customers to architect and build distributed systems using Cassandra and has a long experience in building distributed backend systems mostly using Java as the preferred language of choice.
Running Cassandra in a docker environment to give you a flexible development environment that uses only a very small set of resources, both locally and with your favorite cloud provider. Lessons learned running Cassandra with a very small set of resources are applicable to both your local development environment and larger, less constrained production deployments.
From http://www.meetup.com/Docker-Santa-Clara/events/232789407/
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
At Knewton we operate across five different VPCs a total of 29 clusters, each ranging from 3 nodes to 24 nodes. For a team of three to maintain this is not herculean, however good tools to diagnose issues and gather information in a distributed manner are vital to moving quickly and minimizing engineering time spent.
The database team at Knewton has been successfully using a combination of Ansible and custom open sourced tools to maintain and improve the Cassandra deployment at Knewton. I will be talking about several of these tools and giving examples of how we are using them. Specifically I will discuss the cassandra-tracing tool, which analyzes the contents of the system_traces keyspace, and the cassandra-stat tool, which gives real-time output of the operations of a cassandra cluster. Distributed administration with ad-hoc Ansible will also be covered and I will walk through examples of using these commands to identify and remediate clusterwide issues.
About the Speaker
Jeffrey Berger Lead Database Engineer, Knewton
Dr. Jeffrey Berger is currently the lead database engineer at Knewton, an education tech startup in NYC. He joined the tech scene in NYC in 2013 and spent two years working with MongoDB, becoming a certified MongoDB administrator and a MongoDB Master. He received his Cassandra Administrator certification at Cassandra Summit 2015. He holds a Ph.D. in Theoretical Physics from Penn State and spent several years working on high energy nuclear interactions.
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
This presentation will introduce usage of Apache Apex for Time Series & Data Ingestion Service by General Electric Internet of things Predix platform. Apache Apex is a native Hadoop data in motion platform that is being used by customers for both streaming as well as batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database off-loads, alerts and monitoring, machine model scoring, etc.
Abstract: Predix is an General Electric platform for Internet of Things. It helps users develop applications that connect industrial machines with people through data and analytics for better business outcomes. Predix offers a catalog of services that provide core capabilities required by industrial internet applications. We will deep dive into Predix Time Series and Data Ingestion services leveraging fast, scalable, highly performant, and fault tolerant capabilities of Apache Apex.
Speakers:
- Venkatesh Sivasubramanian, Sr Staff Software Engineer, GE Predix & Committer of Apache Apex
- Pramod Immaneni, PPMC member of Apache Apex, and DataTorrent Architect
This talk was given at Cassandra London meetup: https://www.meetup.com/Cassandra-London/events/267271963/ . The talk is about orchestration of Cassandra with our Kubernetes Operator and Yelp PaaSTA. We also outline some of the opportunities and challenges associated with this architecture.
Youtube link: https://www.youtube.com/watch?v=JqAILFkkibA
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps. (https://www.pollfish.com/). At pollfish we use Cassandra for difference use cases, eg. for application data store to maximize write throughput when appropriate and for our analytics project to find insights in application generated data. As a medium to accomplish our success so far, we use the Datastax's DSE 4.6 environment which integrates Appache Cassadra, Spark and a hadoop compatible file system (CFS). We will discuss how we started, how the journey was and the impressions gained so far along with some tips learned the hard way. This is a result of joint work of an excellent team here at Pollfish.
These slides show how to reduce latency on websites and reduce bandwidth for improved user experience.
Covering network, compression, caching, etags, application optimisation, sphinxsearch, memcache, db optimisation
Orchestrating Cassandra with Kubernetes Operator and PaaSTARaghavendra Prabhu
Video URL: https://youtu.be/GjI6MUz7AyE
This is the slide deck of the Percona Live Online 2020 talk given by me in May 2020: https://www.percona.com/resources/videos/orchestrating-cassandra-kubernetes-operator-and-yelp-paasta-percona-live-online
The talk delves into the architecture of our Cassandra Kubernetes Operator and the multi-region multi-AZ clusters it manages, and strategies we have in place for safe rollouts and zero-downtime migration.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesRaghavendra Prabhu
This is a talk about orchestration of Cassandra with cassandra operator, kubernetes and Yelp PaaSTA (https://github.com/Yelp/paasta).
The talk was presented at Computer Laboratory, University of Cambridge as part of the Engineering, Science and Technology Event (https://www.careers.cam.ac.uk/recruiting/event2Tech.asp) in November 2019.
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
This is a slide deck that was used for our 11/19/15 Nike Tech Talk to give a detailed overview of the SnappyData technology vision. The slides were presented by Jags Ramnarayan, Co-Founder & CTO of SnappyData
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Amazon Web Services
“Infrastructure as Code” has changed not only how we think about configuring infrastructure, but about the infrastructure itself. AWS has been at the core of this movement, enabling your infrastructure teams to benefit from software engineering best practices such as CI/CD, automated testing, and repeatable deployments. Now that you have mastered the art of managing your infrastructure as code, it’s time to leverage these same lessons for monitoring and metrics. In this session, we dive into how you can leverage tooling such as AWS, Terraform, and Datadog to programmatically define your monitoring so that you that you can scale your organizational observability along with your infrastructure, and attain consistency from local development all the way through production.
Session sponsored by Datadog, Inc.
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
Opera chose Scylla over Cassandra to sync the data of millions of browsers to a back-end data repository. The results of the migration and further optimizations they made in their stack helped Opera to gain better latency/throughput and lower resources usage beyond their expectations.
Attend this session to learn how to
Migrate your data in a sane way, without any downtime
Connect a Python+Django web app to Scylla, how to use intranode sharding to improve your application
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
2. About Me
● Sr. Engineer at Pythian
o Lead of Cassandra Practice
● Remote in Minnesota
● DataStax MVP for Cassandra ‘14
● Interests
o Java, Clojure, Python dev
o Data science
o Hobbyist electronics
#CassandraSummit 2014
3. About Pythian
Pythian is a global data outsourcing and consulting company that
specializes in optimizing and managing mission-critical data systems.
Pythian blends the world’s leading data experts with advanced, secure
service delivery processes to create the industry’s best standard of care
for its clients.
Since its inception, Pythian has managed some of the world’s largest,
most business-critical data infrastructures.
#CassandraSummit 2014
10,000
Pythian currently manages more than 10,000
systems.
350
Pythian currently employs more than 350 people
in 25 countries worldwide.
1997
Pythian was founded in 1997
4. About Cassandra
● No Single Point of Failure
● Fault Tolerant
● Awesome properties for an operations team who does
not want to get up at 3am
#CassandraSummit 2014
5. About Cassandra
● Nothing should be set up and forgotten about
● Easy to do with Cassandra though
o Fault tolerance on properly configured setup handles
single node being down or having temp performance
issues
o No back pressure on writes until there is a lot of
#CassandraSummit 2014
trouble
6. Utilize the fault tolerance buffer
● Need to observe and react to current issues
● Predict future issues
● Divide this into two approaches
#CassandraSummit 2014
o Proactive
o Reactive
7. Proactive
● Daily & Weekly checkups to prevent, and
predict problems
o Capacity
o Performance bottlenecks
o Data Modeling issues
#CassandraSummit 2014
8. Reactive
● Something about best laid plans…
o Hardware failures
o Bugs
o Malicious or Non-Malicious users
● Alarms, Pager Duty
#CassandraSummit 2014
12. Metrics
but of course…
Without context, the data is just pretty graphs
13. JMX
● Java Management Extensions
● Complex… very engineered
● Resources represented as objects with
attributes and operations
● Used for monitoring or as input
#CassandraSummit 2014
14. JMX
● The annoying gateway to metrics
○ Poor tooling - requires java
○ Slow, Memory Leaks
○ Historically and currently frustrating for ops (pre 2.0.8)
Cassandra
Init connection to port
7199 Reply with hostname:port for
1024-65535
#CassandraSummit 2014
RMI connection
Client (You)
Gets new hostname:port,
drops old connection and
attempts to connect
7199
7199
Connected!
15. JMX
#CassandraSummit 2014
● Visual
o jconsole
o visualvm
● Command line
o jmxterm
o jmxsh
● MX4J
● Jolokia
24. Metrics
● Toolkit called metrics for metrics
o By Coda Hale @ Yammer
● Easy to use
● Easy to read (if you know java)
● Popular
#CassandraSummit 2014
25. Types of Metrics
#CassandraSummit 2014
● Gauge
o instantaneous value
● Counter
o number that can be incremented & decremented
● Meter
o rate of events over time (1/5/15 min moving avg)
● Histogram
o representation of statistical distribution
50, 75, 95, 98, 99, 99.9 percentile
average, median, min, max, standard deviation
● Timer
o rate of events (meter)
o histogram of duration
26. JMX
#CassandraSummit 2014
75th percentile is 683 MICROSECONDS
(75% took 683us or less)
One minute rate is 13,915 calls per SECOND
27. JMX
● Overwhelming at first
● Hard to tell what they mean without the source
● Moves around a lot
● Fortunately there is nodetool
#CassandraSummit 2014
28. Nodetool
● JMX command line wrapper
● Many options
● Operations and diagnostic procedures
● For reactive analysis
o ad hoc, spot checks
#CassandraSummit 2014
30. Staged Event Driven Architecture
● Decomposes complex event system
● Set of stages (thread pools)
● Queue between each
● Shares a lot of pros cons as SOA
#CassandraSummit 2014
32. Staged Event Driven Architecture
● Possible to overrun the processing capabilities of a
stage that is not in the requests feedback loop (i.e.
ReadRepairStage)
#CassandraSummit 2014
68. Reporting Interface
● Configurable with yaml
o console, csv, ganglia, graphite, riemann
● Create reporter with premain agent
o compiling new jar with manifest
o add to classpath
o add javaagent in cassandra-env.sh
#CassandraSummit 2014
69. Garbage Collection
● Death, Taxes, and a stop the world GC
● Common issue to all JVM based applications
#CassandraSummit 2014
70. Garbage Collection
Enable gc logging
● Virtually no overhead
● Can be very helpful in diagnosing
performance issues
#CassandraSummit 2014
73. Garbage Collection
Could be its own talk
Honorable mentions:
● https://github.com/chewiebug/GCViewer
● http://jworks.idv.tw/GcWeb/
● Python, R, Octave
#CassandraSummit 2014
74. Logging
/var/log/cassandra/system.log
o provides a rolling log
o log4j
/var/log/cassandra/output.log
o captured standard error and standard out
o truncated on restart
#CassandraSummit 2014
System Logs
o syslog, dmesg, etc