Learning Objectives - This module covers advanced HBase concepts. You will also learn what ZooKeeper is, how it helps in monitoring a cluster, why HBase uses ZooKeeper, and how to build applications with ZooKeeper.
The Cassandra architecture shines at ensuring a very high availability of data even while nodes are failing or are overloaded. On the other hand, query latency will often rise during these events, especially on the higher percentiles. Many improvements have been made to reduce this effect over the past years. This talk will focus on one in particular: Speculative Retries. Introduced in Cassandra 2.0 on the server side and in the Java Driver 3.0 on the client side, this strategy remains complex to fully understand and to finely tune. This talk will deep dive into theoretical and practical aspects of Speculative Retries, showing the effect of tuning strategies with ad-hoc benchmarks.
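To make the idea behind speculative retries concrete, here is a minimal sketch of a simplified latency model. It is an illustration only, not Cassandra's or the Java Driver's implementation: the function name `speculative_latency`, its parameters, and the sample numbers are all hypothetical. The model assumes a second request is sent to another replica once the first has been outstanding for `retry_after_ms`, and that whichever response arrives first is used.

```python
def speculative_latency(primary_ms, backup_ms, retry_after_ms):
    """Latency seen by the caller under a simplified speculative-retry model.

    primary_ms     -- response time of the replica queried first (hypothetical)
    backup_ms      -- response time of the replica queried speculatively
    retry_after_ms -- how long to wait before sending the second request
    """
    if primary_ms <= retry_after_ms:
        # The first replica answered before the speculative request was sent.
        return primary_ms
    # The second request starts at retry_after_ms; the first response wins.
    return min(primary_ms, retry_after_ms + backup_ms)


# Made-up (primary, backup) response times in milliseconds.
samples = [(5.0, 7.0), (200.0, 8.0), (12.0, 50.0)]
print([speculative_latency(p, b, 10.0) for p, b in samples])
# prints [5.0, 18.0, 12.0]
```

In this toy model the slow 200 ms read is cut to 18 ms, while the 12 ms read triggers an extra request that brings no benefit. That trade-off between tail latency and added load is why the threshold needs tuning.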
About the Speakers
Michael Figuiere Cloud Platform Engineer, Netflix
Michael is a senior software engineer at Netflix where he works on improving the cloud storage infrastructure. He previously worked at Apple and DataStax where he worked for several years on creating Drivers and Developer Tools for Cassandra. At ease with both enterprise applications and lower level technologies, he specializes in distributed architectures and topics such as databases, search engines, and cloud.
Minh Do Senior Distributed Engineer, Netflix
Minh Do has been working at Netflix for the last several years to run, patch, and troubleshoot Cassandra on both server and client sides, and is also a co-creator of the Dynomite project. Prior to Netflix, at Tango, he spearheaded its Big Data pipeline system from the ground up using Spark/Hadoop. Before that, at Qualys, he built a distributed queue system that bridged traffic between all major components. He has a passion for distributed systems, machine learning/deep learning, and data storage.
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...DataStax
Successfully running Apache Cassandra in production often means knowing what configuration settings to change and which ones to leave as default. Over the years the cassandra.yaml file has grown to provide a number of settings that can improve stability and performance. While the file contains plenty of helpful comments, there is more to be said about the settings and when to change them.
In this talk Edward Capriolo, Consultant at The Last Pickle, will break down the parameters in the configuration files, looking at those that are essential to getting started, those that impact performance, those that improve availability, the exotic ones, and the ones that should not be played with. This talk is ideal for anyone from someone setting up Cassandra for the first time to people with deployments in production who wonder what the more exotic configuration options do.
About the Speaker
Edward Capriolo Consultant, The Last Pickle
Long-time Apache Cassandra user, big data enthusiast.
.NET developers have a lot of options when it comes to databases these days. Apache Cassandra is a scalable, fault-tolerant database that has already found its way into more than 25% of the Fortune 100 and continues to grow in popularity. But what makes it different from the myriad of other options available? In this talk, we’ll take a deep dive into Cassandra and learn about:
- Cassandra’s internals and how it works
- CQL (the SQL-like query language for Cassandra)
- Data Modeling like a pro
- Tools available for developers
- Writing .NET code that talks to Cassandra
If there’s time and interest, we’ll finish up with how some companies are already using Cassandra to power services you probably interact with in your daily life. You’ll leave with all the tools you need to start building highly available .NET applications and services on top of Cassandra.
Historically, sharing a Linux server entailed all kinds of untenable compromises. In addition to the security concerns, there was simply no good way to keep one application from hogging resources and messing with the others. The classic “noisy neighbor” problem made shared systems the bargain-basement slums of the Internet, suitable only for small or throwaway projects.
Serious use-cases traditionally demanded dedicated systems. Over the past decade virtualization (in conjunction with Moore’s law) has democratized the availability of what amount to dedicated systems, and the result is hundreds of thousands of websites and applications deployed into VPS or cloud instances. It’s a step in the right direction, but still has glaring flaws.
Most of these websites are just piles of code sitting on a server somewhere. How did that code get there? How can it be scaled? Secured? Maintained? It’s anybody’s guess. There simply isn’t enough SysAdmin talent in the world to meet the demands of managing all these apps with anything close to best practices without a better model.
Containers are a whole new ballgame. Unlike VMs, you skip the overhead of running an entire OS for every application environment. There’s also no need to provision a whole new machine to have a place to deploy, meaning you can spin up or scale your application with orders of magnitude more speed and accuracy.
These slides show how to reduce latency on websites and reduce bandwidth for improved user experience.
Covering network, compression, caching, etags, application optimisation, sphinxsearch, memcache, db optimisation
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
Time series data has long been a natural use case for Cassandra, with plenty of write-ups showing you how to store mock data "at scale". Unfortunately, warnings of wide rows and examples of storing numeric-only data aren't sufficient to guide your organization through the realities of running these workloads. Instead you find yourself implementing anti-patterns like rotating clusters, only to be taken in by the siren song of DTCS, your hopes dashed across the rocks of ever-expanding disk utilization.
We will share the lessons learned at Threat Stack - a continuous security monitoring platform - about how to scale a large volume of bulky transactions totaling terabytes and petabytes on AWS, while holding yourself to a sane budget and a DBA-free operational life.
Specifics include "break the glass" operational maneuvers, making DTCS function properly, data modeling, and living in a polyglot data platform.
About the Speaker
Sam Bisbee CTO, Threat Stack
As the CTO at Threat Stack, Sam is responsible for leading the company's strategic technology roadmap for its continuous security monitoring service, purpose-built for cloud environments. Sam brings highly relevant experience in distributed systems in public, private, and hybrid cloud environments, as well as proven success scaling SaaS startups. Sam was most recently the CXO at Cloudant (acquired by IBM in Feb. 2014), a leader in the Database-as-a-Service space.
Discussion about the evolution of metrics in Cassandra from 1.0 to 3.0, how the metric changes impact operational tooling, pros and cons for different metric representations, and how and why DataStax OpsCenter collects and stores metrics. Includes a deep dive on how DataStax OpsCenter represents and stores the different kinds of metrics to provide visibility beyond simple cluster averages both behind the scenes and in the rendering.
About the Speaker
Chris Lohfink Software Engineer, DataStax
I am a Java, Python, and Clojure developer who has been using Cassandra in an application development and operational context for the last five years. For nearly the last two years I have been working with the OpsCenter Monitoring team at DataStax to improve the accuracy and breadth of the visualization tooling available.
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...DataStax
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
A meticulous presentation on the authorization, encryption, and authentication security features in MySQL 8.0 by Vignesh Prabhu, Database Reliability Engineer, Mydbops.
A brief introduction to the Hadoop Distributed File System (HDFS): how a file is broken into blocks, written, and replicated on HDFS; how missing replicas are taken care of; how a job is launched and its status is checked; and some advantages and disadvantages of HDFS-1.x.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. Leveraging a combination of EBS-backed disks, JBOD, token pinning, and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization; however, it can also bring painful operational lessons. While there is no recipe for trouble-free multi-DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi-DC cluster. He will also look at how multiple DCs are supported through all areas of Cassandra, how this impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader of express shipments. There he has been leading the effort to build a Cassandra-based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of it.
Cassandra is pretty awesome, sure I am biased, but it rocks. Always on, tuneable consistency and multi-master architecture? Let’s get our web scale on and build a highly available app that never goes down!
Hold on a second. There is one key piece of the puzzle that has a massive impact on your application's availability: the client driver.
In this talk we will go through how best to configure your clients to make the most of failure handling and tuneable consistency in Cassandra.
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax
Making sure your data model will work on the production cluster after 6 months as well as it does on your laptop is an important skill. It's one that we use every day with our clients at The Last Pickle, and one that relies on tools like cassandra-stress. Knowing how the data model will perform under stress once it has been loaded with data can prevent expensive rewrites late in the project.
In this talk Christopher Batey, Consultant at The Last Pickle, will shed some light on how to use the cassandra-stress tool to test your own schema, graph the results, and even extend the tool for your own use cases. While this may be called premature optimisation for an RDBMS, a successful Cassandra project depends on its data model.
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part-time consultant at The Last Pickle, where he works with clients to help them succeed with Apache Cassandra, as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can check out his blog at: http://www.batey.info
MongoDb scalability and high availability with Replica-SetVivek Parihar
One of the most awaited features in MongoDB 1.6 is replica sets, MongoDB's replication solution providing automatic failover and recovery.
MongoDB High Availability with Replica Sets
This talk will cover -
• What is a replica set?
• Replication process
• Advantages of replica sets vs master/slave
• How to set up a replica set in production (demo)
This video is a tutorial for setting up a MongoDB replica set in a production environment. I used 3 instances that already have MongoDB installed and running. The tutorial consists of:
1. Set up each instance of the replica set
2. Modify mongodb.conf to include the replica set information
3. Configure the servers to include in the replica set
4. Cross-check that a secondary becomes primary when the primary is killed
Training Slides: 203 - Backup & RecoveryContinuent
Watch this 36-minute training to learn about planning for backups, some of the available methods and tools, how to restore backups, and more.
TOPICS COVERED
- How to develop a backup plan
- Methods and tools for taking a backup
- Verifying the backup contains the last binary position, and the importance of this
- Restore backups into the cluster
- Provision a replica from an existing datasource
Introduction to Apache ZooKeeper | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2kvXlPd
This CloudxLab Introduction to Apache ZooKeeper tutorial helps you to understand ZooKeeper in detail. Below are the topics covered in this tutorial:
1) Data Model
2) Znode Types
3) Persistent Znode
4) Sequential Znode
5) Architecture
6) Election & Majority Demo
7) Why Do We Need Majority?
8) Guarantees - Sequential consistency, Atomicity, Single system image, Durability, Timeliness
9) ZooKeeper APIs
10) Watches & Triggers
11) ACLs - Access Control Lists
12) Use Cases
13) When Not to Use ZooKeeper
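The "Election & Majority" and "Why Do We Need Majority?" topics come down to simple arithmetic, sketched below. This is a minimal illustration that assumes a plain majority quorum (more than half of the ensemble must agree). The function names `quorum_size` and `tolerated_failures` are hypothetical helpers, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```

Under this assumption, going from 3 to 4 servers (or from 5 to 6) does not increase the number of failures that can be tolerated, which is one reason odd-sized ensembles are commonly chosen. Requiring a strict majority also means two disjoint groups of servers can never both form a quorum at the same time.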
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
In the first part of the Galera Cluster best practices series, we will discuss the following topics:
* ongoing monitoring of the cluster and detection of bottlenecks;
* fine-tuning the configuration based on the actual database workload;
* selecting the optimal State Snapshot Transfer (SST) method;
* backup strategies
(video: http://galeracluster.com/videos/2159/)
This presentation is from Gophercon-India, where we talked about how to design a concurrent, high-performance database client in the Go language. We talked about how we use goroutines and channels to our advantage. We also talked about how to use pools for efficient memory utilization.
10 Ways to Scale Your Website Silicon Valley Code Camp 2019Dave Nielsen
Redis has 10 different data structures (String, Hash, List, Set, Sorted Set, Bit Array, Bit Field, HyperLogLog, Geospatial Index, Streams) plus Pub/Sub and many Redis Modules. In this talk, Dave will give 10 examples of how to use these data structures to scale your website. I will start with the basics, such as a cache and user session management. Then I will demonstrate user-generated tags, leaderboards, and counting things with HyperLogLog. I will finish with a demo of Redis Pub/Sub vs Redis Streams, which can be used to scale your microservices-based architecture.
Maria DB Galera Cluster for High AvailabilityOSSCube
Want to understand how to set high availability solutions for MySQL using MariaDB Galera Cluster? Join this webinar, and learn from experts. During this webinar, you will also get guidance on how to implement MariaDB Galera Cluster.
10 Ways to Scale with Redis - LA Redis Meetup 2019Dave Nielsen
Redis has 10 different data structures (String, Hash, List, Set, Sorted Set, Bit Array, Bit Field, Hyperloglog, Geospatial Index, Streams) plus Pub/Sub and many Redis Modules. In this talk, Dave will give 10 examples of how to use these data structures to scale your website. I will start with the basics, such as a cache and User session management. Then I demonstrate user generated tags, leaderboards and counting things with hyberloglog. I will with a demo of Redis Pub/Sub vs Redis Streams which can be used to scale your Microservices-based architecture.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
Go90 is a mobile entertainment platform offering access to live and on demand videos. We built the web services platform and social features like activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain by using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on java driver usage, performance, monitoring, cluster maintenance, version upgrade, 2-way ssl and many more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly knows as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest volume online booking sites in the world. He saw first hand the consequences of trying to solve real world scalability problems at the limit of what traditional relational databases are capable of.
This talk is a followup to Deploying systemd at scale that was presented at systemd.conf 2016, and covers the aftermath of the migration of our fleet to CentOS 7. Now that systemd is available everywhere, we found more and more services that started adopting it for their deployment, leveraging its features and occasionally exposing interesting behaviors. At the same time, we've been able to hone our process for integrating and rolling out new versions of systemd on the fleet, and started building tooling to manage and monitor it at scale.
5. Example: Mail Inbox
<userId> : <colfam> : <messageId> : <timestamp> : <email-message>
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
OR
12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..."
12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."
Same Storage Requirements
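The two layouts above can be compared with a minimal sketch. It models a cell as a plain tuple of (row key, column family, qualifier, timestamp, value); the helper names `wide_cell`, `tall_cell`, and `coordinate_bytes` are hypothetical and not part of any HBase API. It only illustrates why moving the message ID from the qualifier into the row key leaves the stored coordinates almost unchanged (the tall layout adds one separator byte per cell).

```python
def wide_cell(user_id, message_id, ts, body, family="data"):
    """Flat-wide layout: one row per user, message ID as the column qualifier."""
    return (user_id, family, message_id, ts, body)


def tall_cell(user_id, message_id, ts, body, family="data"):
    """Tall-narrow layout: message ID folded into the row key, empty qualifier."""
    return (f"{user_id}-{message_id}", family, "", ts, body)


def coordinate_bytes(cell):
    """Bytes used by row key + family + qualifier, which are stored with every cell."""
    row, family, qualifier, _ts, _body = cell
    return len(row.encode()) + len(family.encode()) + len(qualifier.encode())


# Sample values copied from the slide (the IDs are shown as they appear there).
wide = wide_cell("12345", "5fc38314-e290-ae5da5fc375d", 1307097848, "Hi Lars, ...")
tall = tall_cell("12345", "5fc38314-e290-ae5da5fc375d", 1307097848, "Hi Lars, ...")

print("wide coordinates:", coordinate_bytes(wide), "bytes")
print("tall coordinates:", coordinate_bytes(tall), "bytes")
```

The practical difference is in access patterns rather than size: the wide layout keeps a whole inbox in one row, while the tall layout lets a scan over the `12345-` prefix page through messages row by row.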
6. Secondary Indexes
Although HBase has no native support for secondary indexes, there are
use cases that need them. The usual requirement is that a client can
look up a cell using not just the primary coordinates (the row key,
column family name, and qualifier) but also an alternative coordinate.
In addition, it should be possible to scan a range of rows from the
main table, ordered by the secondary index. Common approaches:
• Client-managed
• Indexed-Transactional HBase
• Indexed HBase
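As an illustration of the client-managed approach, here is a minimal in-memory sketch. It does not use any HBase client library; the class `ClientManagedIndex` and its methods are hypothetical. The main "table" is a dict, and the index "table" is a sorted list of keys of the form `<indexed value>\x00<row key>`, mimicking how an index table's row keys would sort.

```python
import bisect


class ClientManagedIndex:
    """Toy model: the client writes to a main table and an index table itself."""

    def __init__(self):
        self.main = {}        # row key -> indexed column value
        self.index_keys = []  # sorted index row keys: "<value>\x00<row key>"

    def put(self, row_key, value):
        old = self.main.get(row_key)
        if old is not None:
            self._remove(f"{old}\x00{row_key}")  # drop the stale index entry
        self.main[row_key] = value
        bisect.insort(self.index_keys, f"{value}\x00{row_key}")

    def _remove(self, key):
        i = bisect.bisect_left(self.index_keys, key)
        if i < len(self.index_keys) and self.index_keys[i] == key:
            del self.index_keys[i]

    def lookup(self, value):
        """Return main-table row keys whose indexed column equals value."""
        prefix = f"{value}\x00"
        start = bisect.bisect_left(self.index_keys, prefix)
        rows = []
        for key in self.index_keys[start:]:
            if not key.startswith(prefix):
                break
            rows.append(key[len(prefix):])
        return rows


idx = ClientManagedIndex()
idx.put("row1", "alice@example.com")
idx.put("row2", "bob@example.com")
idx.put("row3", "alice@example.com")
print(idx.lookup("alice@example.com"))
```

In a real deployment the two writes go to different tables and are not atomic, so the client has to tolerate or repair a main table and index that temporarily disagree.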
7. Coprocessors
• Think of this as a small MapReduce framework that distributes
work across the entire cluster.
• A coprocessor lets you run arbitrary code directly on each
region server.
• It executes the code on a per-region basis, giving trigger-like
functionality.
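The two ideas above (trigger-like hooks and work distributed per region) can be sketched with a toy model. This is not the HBase coprocessor API; `Region`, `pre_put_hooks`, `exec_endpoint`, and `route` are hypothetical names chosen for illustration.

```python
class Region:
    """Toy region holding the rows in the key range [start_key, end_key)."""

    def __init__(self, start_key, end_key):
        self.start_key, self.end_key = start_key, end_key
        self.rows = {}
        self.pre_put_hooks = []  # observer-style callbacks, run before each put

    def put(self, row_key, value):
        for hook in self.pre_put_hooks:  # trigger-like behavior
            value = hook(row_key, value)
        self.rows[row_key] = value

    def exec_endpoint(self, fn):
        """Run fn next to the data and return only its (small) result."""
        return fn(self.rows)


def route(regions, row_key):
    """Find the region whose key range contains row_key."""
    for region in regions:
        if region.start_key <= row_key < region.end_key:
            return region
    raise KeyError(row_key)


regions = [Region("", "m"), Region("m", "\uffff")]
for region in regions:
    region.pre_put_hooks.append(lambda key, value: value.strip())

for key, value in [("apple", " red "), ("melon", "green"), ("zebra", " striped")]:
    route(regions, key).put(key, value)

# Each region counts its own rows; the client only sums the partial results.
total_rows = sum(region.exec_endpoint(len) for region in regions)
print(total_rows)
```

The point of the design is that only the per-region partial results cross the network, not the rows themselves.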
8. ZooKeeper
• An open source server that reliably coordinates distributed
processes.
• Apache ZooKeeper provides operational services for a Hadoop
cluster.
• ZooKeeper provides a distributed configuration service, a
synchronization service and a naming registry for distributed
systems.
• Distributed applications use ZooKeeper to store and mediate
updates to important configuration information.
9. ZooKeeper Service: Data Model
• Znode
– An in-memory data node in the ZooKeeper data tree
– Organized in a hierarchical namespace
– Addressed with UNIX-like path notation
• Types of Znode
– Persistent
– Ephemeral
• Flags of Znode
– Sequential: ZooKeeper appends an increasing sequence number to the name
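The znode types and the sequential flag can be illustrated with a small in-memory model. This is a sketch only, not a ZooKeeper client: `ZNodeTree` and its methods are hypothetical, and watches, versions, and ACLs are left out.

```python
class ZNodeTree:
    """Toy znode namespace with persistent, ephemeral, and sequential nodes."""

    def __init__(self):
        self.nodes = {"/": {"data": b"", "owner": None}}
        self.counters = {}  # parent path -> next sequence number

    def create(self, path, data=b"", ephemeral_owner=None, sequential=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            # The sequence number is appended as a zero-padded 10-digit suffix.
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.nodes:
            raise ValueError(f"{path} already exists")
        self.nodes[path] = {"data": data, "owner": ephemeral_owner}
        return path

    def close_session(self, session_id):
        """Ephemeral znodes disappear when the session that owns them ends."""
        owned = [p for p, n in self.nodes.items() if n["owner"] == session_id]
        for p in owned:
            del self.nodes[p]


tree = ZNodeTree()
tree.create("/workers")  # persistent
first = tree.create("/workers/w-", ephemeral_owner=1, sequential=True)
second = tree.create("/workers/w-", ephemeral_owner=1, sequential=True)
print(first, second)
tree.close_session(1)
print(sorted(tree.nodes))
```

Ephemeral sequential znodes under a common parent are the usual building block for group membership and leader election recipes.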
11. The ZooKeeper service can run in two modes.
• In standalone mode, there is a single ZooKeeper server, which is
useful for testing due to its simplicity (it can even be embedded in
unit tests), but provides no guarantees of high availability or
resilience.
• In production, ZooKeeper runs in replicated mode, on a cluster of
machines called an ensemble. ZooKeeper achieves high availability
through replication, and can provide a service as long as a majority of
the machines in the ensemble are up.
ZooKeeper Service: Implementation
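The majority rule for replicated mode is simple arithmetic, sketched below. The function names are hypothetical; the only assumption is the rule stated on the slide, that the service stays available while more than half of the ensemble is up.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


def is_available(ensemble_size, servers_up):
    return servers_up >= quorum_size(ensemble_size)


for size in (3, 4, 5, 6):
    print(size, quorum_size(size), tolerated_failures(size))
```

A 6-server ensemble tolerates the same two failures as a 5-server one, which is why ensembles are normally sized with an odd number of machines.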
13. ZooKeeper Service: Sessions
• A ZooKeeper client is configured with the list of servers in the ensemble.
On startup, it tries to connect to one of the servers in the list.
• Once a connection has been made with a ZooKeeper server, the server
creates a new session for the client.
• Sessions are kept alive by the client sending ping requests (also known as
heartbeats) whenever the session is idle for longer than a certain period.
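The keep-alive behavior described above can be sketched with explicit clock values instead of real timers. This is a simplified model, not ZooKeeper client code: `Session` and `client_should_ping` are hypothetical, and pinging after one third of the timeout is an assumption made for the example.

```python
class Session:
    """Server-side view of a session: expires if nothing is heard in time."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout
        self.last_heard = now
        self.expired = False

    def check(self, now):
        """Return True while the session is still alive at time `now`."""
        if now - self.last_heard > self.timeout:
            self.expired = True
        return not self.expired

    def touch(self, now):
        """Record a request or ping; an expired session cannot be revived."""
        if self.check(now):
            self.last_heard = now


def client_should_ping(last_sent, now, timeout):
    # Assumption for this sketch: ping once idle for a third of the timeout.
    return now - last_sent >= timeout / 3


session = Session(timeout=6.0)
session.touch(2.0)          # client sent a ping at t=2
print(session.check(7.0))   # 5 seconds of silence, still within the timeout
print(session.check(8.5))   # 6.5 seconds of silence, session expires
```

Pinging well before the timeout leaves the client time to notice a dead server and reconnect to another member of the ensemble before the session expires.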