Cassandra advanced course, in the spirit of most third-year undergraduate Computer Science courses. This presentation explores some of the newer features of Apache (DataStax) Cassandra, such as the Paxos consensus algorithm and the new data types Tuples and User Defined Types.
5. Planning your Data Model
@VictorFAnjos
@Cassandra
@FicstarSoftware
@BrightlaneInc
Start with Queries
Denormalize to Optimize
Planning for Concurrent Writes
Who is jbellis a subscriber of?
Blog entry will have a body, user and category.
10. Enter Paxos
Lightweight Transactions
Proposer:
Prepares a proposal and sends it to a number of Acceptors.
Waits for an acknowledgement (in the form of a promise) from the Acceptors.
Sends an accept message to a quorum of Acceptors with the new value to commit.
Returns success (completion) to the client.
Acceptor:
Determines whether the proposal is newer than any it has seen.
Acknowledges with the highest proposal value it has seen AND the current value (of what is to be set).
Receives the message to commit the new value.
Accepts and returns on successful commit of the value.
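The steps listed above can be sketched as a small in-memory simulation. This is a textbook single-decree Paxos round, not Cassandra's actual lightweight transaction code; the names `Acceptor`, `propose` and `ballot`, the three-node cluster and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1          # highest ballot promised so far
    accepted_ballot: int = -1   # ballot of the accepted value, if any
    accepted_value: Any = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Any]]:
        """Promise only if the proposal is newer than any seen so far."""
        if ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot: int, value: Any) -> bool:
        """Accept unless a newer ballot has been promised in the meantime."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None on failure."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: prepare, then wait for promises from a quorum.
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None

    # If any acceptor already accepted a value, adopt the one with the
    # highest ballot instead of the proposer's own value.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value

    # Phase 2: ask the acceptors to accept (commit) the value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= quorum else None


cluster = [Acceptor() for _ in range(3)]
first = propose(cluster, ballot=1, value="withdraw via card A")
second = propose(cluster, ballot=2, value="withdraw via card B")
stale = propose(cluster, ballot=1, value="late retry")
print(first, second, stale)
```

In this sketch the second proposer learns that a value was already accepted and re-proposes it rather than its own, while the stale ballot is refused outright. That is the property that lets only one of two competing writes win.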
21. Did I mention…
We’re HIRING!
Editor's Notes
Consistency - All nodes see the same data at the same time.
A read returns the value of the most recent write, so every node returns the same data.
Availability - Every request gets a response indicating success or failure.
Every client gets a response, regardless of the state of any individual node in the system.
Partition Tolerance - The system continues to work despite message loss or partial failure.
It can sustain any amount of network failure that doesn't result in a failure of the entire network.
Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
How do you want to access the data?
All of the use cases your application needs to support.
All lookups your application needs to do.
Note any ordering, filtering or grouping requirements.
Relational world → normalize to minimize redundancy
Smaller, well-structured tables with relationships (foreign keys)
Joins.
Cannot join multiple column families to satisfy a given query request.
Plan for one or more rows in a single column family for each query.
This sacrifices disk space in order to reduce the number of disk seeks.
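A rough way to picture "one column family per query" is to keep one lookup structure per question the application asks and to write each fact into all of them. The sketch below uses plain Python dictionaries in place of column families; the structure names and the users alice, bob and carol are hypothetical.

```python
from collections import defaultdict

# One in-memory "column family" per query (names are illustrative only).
subscriptions_by_user = defaultdict(set)  # answers: who is X a subscriber of?
subscribers_by_user = defaultdict(set)    # answers: who subscribes to X?


def subscribe(follower: str, followee: str) -> None:
    """Store the same fact twice so that each query reads a single row."""
    subscriptions_by_user[follower].add(followee)
    subscribers_by_user[followee].add(follower)


subscribe("jbellis", "alice")
subscribe("jbellis", "bob")
subscribe("carol", "jbellis")

# Each question is now one key lookup, with no join across structures.
print(sorted(subscriptions_by_user["jbellis"]))
print(sorted(subscribers_by_user["jbellis"]))
```

The duplicated write is the disk-space cost; the single-key read is the benefit.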
Row key, a string of virtually unbounded length.
Cassandra does not enforce uniqueness.
Inserting a duplicate row key will upsert the columns contained in the insert statement rather than return a unique constraint violation.
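The upsert behaviour can be illustrated with a dictionary standing in for a column family. This is only a sketch of the semantics, assuming a hypothetical insert helper and made-up row data; it does not call Cassandra.

```python
def insert(table: dict, row_key: str, columns: dict) -> None:
    """Insert-or-update: merge the columns into the row, creating it if absent."""
    table.setdefault(row_key, {}).update(columns)


users = {}
insert(users, "user42", {"name": "Ann", "city": "Austin"})
# Same row key again: no constraint violation, only the given column changes.
insert(users, "user42", {"city": "Toronto"})
print(users)
```

The second insert overwrites city and leaves name untouched, which is why a duplicate key never surfaces as an error.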
Scenario: You have one bank account, with $100 left in it, and two bank cards.
When you try to withdraw money with the two cards (you and your wife) at the same time at 2 different ATMs, you might get 2 times $100…
PROBLEM!!!
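The two-ATM scenario is a read-then-write race. One way to see the fix is a compare-and-set: a withdrawal only succeeds if the balance is still what the ATM read. The sketch below is an in-process illustration with a hypothetical Account class; in Cassandra the comparable idea is a conditional (IF) write carried out through lightweight transactions.

```python
import threading


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def compare_and_set(self, expected: int, new: int) -> bool:
        """Atomically set the balance only if it still equals `expected`."""
        with self._lock:
            if self.balance != expected:
                return False
            self.balance = new
            return True


def withdraw(account: Account, seen_balance: int, amount: int) -> bool:
    """Withdraw only if the balance is unchanged since the ATM read it."""
    if seen_balance < amount:
        return False
    return account.compare_and_set(seen_balance, seen_balance - amount)


account = Account(100)
seen_by_atm_a = account.balance  # both ATMs read $100 before either writes
seen_by_atm_b = account.balance
first = withdraw(account, seen_by_atm_a, 100)
second = withdraw(account, seen_by_atm_b, 100)
print(first, second, account.balance)
```

Without the condition, both ATMs would write 0 and each hand out $100; with it, the second withdrawal is refused because the balance it read is stale.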
One node acts as a proposer (initiates the protocol).
Only one node can act as proposer at a time, but if two or more choose to, then the protocol will (typically) fail to terminate until only one node continues to act as proposer.
Sacrificing termination for correctness.
The other nodes (which conspire to make a decision about the value being proposed) are called ‘acceptors’.
Acceptors respond to proposals from the proposer either by rejecting them for some reason, or agreeing to them in principle and making promises in return about the proposals they will accept in the future.
These promises guarantee that proposals that may come from other proposers will not be erroneously accepted, and in particular they ensure that only the latest of the proposals sent by the proposer is accepted.
Proposer
Acceptors
‘Accept’ here means that an acceptor commits to a proposal as the one it considers definitive.
Once a majority of acceptors have accepted the same proposal, the Paxos protocol can terminate and the proposed value may be disseminated to nodes which are interested in it (these are called ‘listeners’).
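The majority requirement is simple arithmetic: with n acceptors, a majority is more than half of them, and any two majorities must share at least one acceptor. The helper names below are illustrative.

```python
def majority(n: int) -> int:
    """Smallest number of acceptors that is more than half of n."""
    return n // 2 + 1


def is_decided(n: int, accepted_count: int) -> bool:
    """True once a majority of the n acceptors accepted the same proposal."""
    return accepted_count >= majority(n)


for n in (3, 5, 7):
    # Two majorities together exceed n, so they overlap in at least one node.
    print(n, majority(n), 2 * majority(n) > n)
```

That overlap is what lets a later proposer learn about a value that an earlier majority already accepted.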
Prepare/promise is the core of the algorithm.
Any node may propose a value; we call that node the leader.
The leader picks a ballot and sends it to the participating replicas.
If the ballot is the highest a replica has seen, it promises to not accept any proposals associated with any earlier ballot.
Along with that promise, it includes the most recent proposal it has already received.
If a majority of the nodes promise to accept the leader’s proposal, it may proceed to the actual proposal, but with the wrinkle that if a majority of replicas included an earlier proposal with their promise, then that is the value the leader must propose.
Conceptually, if a leader interrupts an earlier leader, it must first finish that leader’s proposal before proceeding with its own, thus giving us our desired linearizable behavior.
Thus, at the cost of four round trips, we can provide linearizability.
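The "finish the earlier leader's proposal first" rule can be sketched as a pure function over the promises a leader receives. This follows the textbook Paxos rule (adopt the value of the highest-ballot proposal reported in any promise), which is an assumption made here and may differ in detail from the wording above; the names Proposal and value_to_propose are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Proposal = Tuple[int, Any]  # (ballot, value)


def value_to_propose(promises: List[Optional[Proposal]], own_value: Any) -> Any:
    """Pick the value a leader must propose after collecting promises.

    Each promise carries the most recent proposal that replica has already
    received, or None if it has not received any.
    """
    in_progress = [p for p in promises if p is not None]
    if not in_progress:
        return own_value  # nothing to finish, the leader's own value goes ahead
    _, value = max(in_progress, key=lambda p: p[0])
    return value          # finish the interrupted proposal first


print(value_to_propose([None, None], "B"))
print(value_to_propose([None, (3, "A"), (1, "X")], "B"))
```

In the second call the leader wanted to propose "B" but must carry "A" forward, because a replica reported it under the highest earlier ballot.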