This is the Expert Q&A from 2600hz and Cloudant on Databases in Telecom. If you are a service provider, an MSP, or anyone running a VoIP switch, you should definitely check this out.
The Serverless experience is revolutionary and will grow to dominate the future of Cloud. Function-as-a-Service (FaaS), however, with its ephemeral, stateless, and short-lived functions, is only the first step. FaaS is great for processing-intensive, parallelizable workloads, moving data from A to B and providing enrichment and transformation along the way. But it is quite limited in which use cases it addresses well, which makes it hard and inefficient to implement general-purpose applications and distributed systems protocols.
What’s needed is a next-generation Serverless platform and programming model for general-purpose application development in the new world of real-time data and event-driven systems. What is missing are ways to manage distributed state in a scalable and available fashion, support for long-lived virtual stateful services, ways to physically co-locate data and processing, and options for choosing the right data consistency model for the job.
This talk will discuss the challenges, requirements, and introduce you to our proposed solution: Cloudstate—an Open Source project building the next generation Stateful Serverless and leveraging state models such as Event Sourcing, CQRS, and CRDTs, running on Akka, gRPC, Knative, Kubernetes, and GraalVM, in a polyglot fashion with support for Go, JavaScript, Java, Swift, Scala, Python, Kotlin, and more.
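One of the state models listed above, CRDTs, can be illustrated with a minimal grow-only counter (G-Counter). This is a sketch of the general technique, not Cloudstate's actual API; the class and method names are hypothetical.

```python
# Minimal G-Counter CRDT sketch: each node increments only its own slot,
# and merge takes the per-node maximum, so replicas converge regardless
# of the order in which updates arrive.
class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Commutative, associative, idempotent: safe to apply repeatedly.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Because merge is idempotent and order-independent, replicas can gossip their state at any time and still agree, which is what makes CRDTs attractive for available, partition-tolerant state.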
The Economics of Scale: Promises and Perils of Going Distributed - Tyler Treat
What does it take to scale a system? We'll learn how going distributed can pay dividends in areas like availability and fault tolerance by examining a real-world case study. However, we will also look at the inherent pitfalls. When it comes to distributed systems, for every promise there is a peril.
This document summarizes a talk given by Tyler Treat about using simple solutions for complex distributed systems problems. Some key points:
- Distributed systems are inherently asynchronous and unreliable, but many try to build them as if they are synchronous.
- Exact delivery guarantees are expensive and impossible at scale. Replayable and idempotent delivery are better alternatives.
- NATS is a simple, high-performance, highly available messaging system that embraces asynchronous communication.
- Workiva uses NATS as a messaging backplane between microservices for pub/sub, RPC, and load balancing. Running a local NATS daemon per VM improves performance.
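The replayable-and-idempotent point above can be sketched as a consumer that deduplicates on message IDs, making at-least-once redelivery safe. This is a generic illustration, not the NATS API; all names are made up.

```python
# Idempotent consumer sketch: with at-least-once delivery a message may
# arrive more than once, so the handler records processed IDs and skips
# duplicates instead of relying on an (impossible) exactly-once transport.
class IdempotentConsumer:
    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, msg_id, amount):
        if msg_id in self.processed_ids:
            return False  # duplicate redelivery: ignore
        self.processed_ids.add(msg_id)
        self.total += amount
        return True

consumer = IdempotentConsumer()
for msg_id, amount in [("m1", 10), ("m2", 5), ("m1", 10)]:  # "m1" replayed
    consumer.handle(msg_id, amount)
assert consumer.total == 15  # the replay of "m1" was not double-counted
```

In a real system the processed-ID set would live in durable storage and be pruned; the principle is the same: make the handler idempotent and replay freely.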
NoSQL databases, the CAP theorem, and the theory of relativity - Lars Marius Garshol
The document discusses NoSQL databases and the CAP theorem. It begins by providing an overview of NoSQL databases, their key features like being schemaless and supporting eventual consistency over ACID transactions. It then explains the CAP theorem - that a distributed system can only provide two of consistency, availability, and partition tolerance. It also discusses how Google's Spanner database achieves consistency and scalability using ideas from Lamport's Paxos algorithm and a new time service called TrueTime.
The document discusses the CAP theorem which states that it is impossible for a distributed computer system to simultaneously provide consistency, availability, and partition tolerance. It defines these terms and explores how different systems address the tradeoffs. Consistency means all nodes see the same data at the same time. Availability means every request results in a response. Partition tolerance means the system continues operating despite network failures. The CAP theorem says a system can only choose two of these properties. The document discusses how different types of systems, like CP and AP systems, handle partitions and trade off consistency and availability. It also notes the CAP theorem is more nuanced in reality with choices made at fine granularity within systems.
The document discusses various database consistency models and proposes a new taxonomy called PACELC.
It notes limitations with the CAP theorem, such as inconsistencies between availability and consistency when there are no network partitions. It introduces PACELC which classifies databases based on their behavior during partitions (P) and without partitions.
The document argues for the viability of P*/EC systems (those that choose consistency in the normal, non-partitioned case), noting they can provide stronger guarantees than most NoSQL databases through determinism, which enables easier replication and reduces the cost of consistency.
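The PACELC idea lends itself to a small table: under a Partition a system picks Availability or Consistency; Else (normal operation) it picks Latency or Consistency. The sketch below encodes classifications as given in Abadi's article; treat the specific assignments as illustrative.

```python
# PACELC sketch: (partition behavior, else behavior) per system.
# Assignments follow Abadi's PACELC article; illustrative, not exhaustive.
PACELC = {
    "Dynamo":    ("PA", "EL"),  # favors availability and low latency
    "Cassandra": ("PA", "EL"),
    "Riak":      ("PA", "EL"),
    "MongoDB":   ("PA", "EC"),
    "PNUTS":     ("PC", "EL"),
    "VoltDB":    ("PC", "EC"),  # the deterministic P*/EC camp
}

def describe(system):
    p, e = PACELC[system]
    partition = "availability" if p == "PA" else "consistency"
    else_ = "latency" if e == "EL" else "consistency"
    return f"{system}: under partition favors {partition}; else favors {else_}"

assert describe("Cassandra") == (
    "Cassandra: under partition favors availability; else favors latency")
```

The point of the taxonomy is that the second letter pair matters even when the network is healthy, which plain CAP cannot express.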
Dan Kaminsky gave a keynote talk at DEFCON China thanking the organizers. He discussed how bugs are not random and connected concepts that may seem unrelated. He explained how 60 frames per second for video originated from 1890s power grid technology running at 60Hz for induction motors, and how the human brain also operates around this frequency range. Spectre and Meltdown CPU bugs occurred because security boundaries were based on assumptions that timing variations did not carry information, but they can be exploited to leak bits of data. Kaminsky argued that development and testing teams should be more integrated to avoid such issues through a more holistic "engineering" approach rather than distinguishing "forward" from "reverse".
The document discusses common mistakes in multidimensional cube design and how to avoid them. It covers issues like overusing parent-child hierarchies, relying too heavily on MDX calculations, unused aggregations, unprocessed aggregations, incorrect use of the NON_EMPTY_BEHAVIOR property, and overusing cell security. The key recommendations are to follow best practices in cube design, avoid unnecessary complexity, ensure the right aggregations are defined and processed, and use dimension security instead of cell security where possible. Proper cube design leads to better performance, easier development and maintenance.
An overview of how recent changes in technology have changed priorities for databases to distributed systems, and how you can preserve consistency in distributed data stores like Riak.
This paper discusses challenges in diagnosing errors when deploying Hadoop ecosystems. It provides 15 examples of specific errors that can occur with HBase/Hadoop deployment on Amazon EC2, along with potential root causes. The paper also classifies errors as operational, configuration, software, or resource-related. It identifies inconsistencies across component logs, low signal-to-noise ratios, and uncertainty in correlating events as difficulties for error diagnosis. The paper contributes examples to a repository for mapping deployment symptoms to fault trees to determine root causes.
It's harder than ever to predict the load your application will need to handle in advance, so how do you design your architecture so you can implement as you go and be ready for whatever comes your way?
It's easy to focus on optimizing each part of your application, but it's your application architecture that determines the options you have for making big leaps in scalability.
In this talk we'll cover practical patterns you can build today to meet the needs of rapid development while still creating systems that can scale up and out. Specific code examples will focus on .NET but the principles apply across many technologies. Real world systems will be discussed based on our experience helping customers around the world optimize their enterprise applications.
The document discusses the CAP theorem and related concepts like PACELC, ACID, and BASE. It analyzes how different database systems like PostgreSQL, MongoDB, and a hybrid PostgreSQL/Salesforce/Heroku Connect system fit within these models. While CAP classifications can be imprecise, the key aspects to understand are the consistency, availability, and partition tolerance tradeoffs that distributed systems must make.
A Technical Dive into Defensive Trickery - Dan Kaminsky
This document discusses various techniques for improving security and making it easier to deploy. It begins by introducing Dan Kaminsky and the goal of challenging assumptions. It then discusses how security is often hard to implement due to challenges like DDoS attacks being hard to remediate, TLS being difficult to deploy properly, and preventing data loss during attacks. The document proposes several solutions to these challenges, including Overflowd to help trace DDoS attacks, JFE to automatically provision TLS for all network services, and Ratelock to enforce access policies like rate limits in the cloud even if servers are compromised. It argues that moving enforcement to the cloud can improve security. The document concludes by noting that running code safely through sandboxing is also difficult.
Architecting for the Cloud: Elasticity and Security - Len Bass
Concurrency and state management are important considerations for achieving elasticity in cloud systems. There are three types of state: session state kept by clients, server-side state kept in processes, and persistent state stored externally. Server-side state makes scaling difficult, while stateless servers allow elasticity. Memcached provides a way to synchronize small amounts of in-memory state across servers to support stateless services running elastically in the cloud.
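The stateless-server pattern described above can be sketched with a shared cache standing in for memcached (here a plain dict; a real deployment would use a memcached client library). Names are illustrative:

```python
# Stateless-server sketch: no server keeps session state in-process;
# all of them read and write a shared cache (a dict standing in for
# memcached), so any server can handle any request and instances can
# be added or removed elastically.
shared_cache = {}  # stand-in for a memcached cluster

class StatelessServer:
    def __init__(self, name, cache):
        self.name = name
        self.cache = cache  # the only place session state lives

    def handle(self, session_id, item):
        cart = self.cache.get(session_id, [])
        cart.append(item)
        self.cache[session_id] = cart
        return cart

a = StatelessServer("web-1", shared_cache)
b = StatelessServer("web-2", shared_cache)
a.handle("sess-42", "book")
cart = b.handle("sess-42", "pen")  # different server, same session
assert cart == ["book", "pen"]
```

Because the servers hold no state of their own, a load balancer can route any request anywhere, which is exactly what makes elastic scale-out possible.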
Designing Events-First Microservices For A Cloud Native World - Lightbend
In this talk by Jonas Bonér, Lightbend CTO/Co-Founder and creator of Akka, we will explore the nature of events, what it means to be event-driven, and how we can unleash the power of events and commands by applying an events first, domain-driven design to microservices-based architectures.
For more information, head over to lightbend.com/blog!
This document discusses the evolution of DevOps practices and platforms. It describes how organizations like Amazon and Netflix built platforms to enable continuous delivery of software through automation. These platforms allowed for high velocity software development while keeping promises around availability, reliability and security. The document advocates that organizations adopt cloud native principles of using simple, automated patterns and tooling to build platforms that help teams keep promises around delivering features quickly at scale.
2016 Mastering SAP Tech - 2 Speed IT and lessons from an Agile Waterfall eCom... - Eneko Jon Bilbao
A recent clash of worlds occurred when a local client asked to deliver their Hybris eCommerce portal on top of their global template SAP system. The backend SAP team jogged along at the traditional waterfall pace while the frontend Hybris team sought to sprint in agile fashion. This is the story of how we managed the different worlds, the skills required, and the lessons learned from both teams.
The CAP Theorem states that it is impossible for a distributed computer system to simultaneously provide consistency, availability, and partition tolerance. A system can guarantee at most two of these three properties. Consistency means all nodes see the same data at the same time. Availability means every request receives a response without fail. Partition tolerance means the system continues operating despite network failures. Most distributed databases, like Cassandra, choose availability and partition tolerance over consistency and implement eventual consistency.
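The consistency dial in systems like Cassandra is usually expressed through quorums: with N replicas, a read of R replicas is guaranteed to overlap a write of W replicas exactly when R + W > N. A small sketch, checking the condition against brute-force enumeration:

```python
from itertools import combinations

# Quorum sketch: a read of R replicas always intersects a write of W
# replicas iff R + W > N. Cassandra-style tunable consistency picks
# R and W per operation.
def strongly_consistent(n, r, w):
    return r + w > n

def always_overlaps(n, r, w):
    # Brute force: every size-r read set must intersect every size-w write set.
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))

# N=3: QUORUM reads + QUORUM writes overlap; ONE + ONE does not.
assert strongly_consistent(3, 2, 2) == always_overlaps(3, 2, 2) == True
assert strongly_consistent(3, 1, 1) == always_overlaps(3, 1, 1) == False
```

Choosing R = W = 1 buys availability and latency at the price of eventual consistency, which is the trade-off the paragraph above describes.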
This document discusses patterns for scaling systems incrementally. It introduces the ACD/C approach of making systems async, caching results, distributing work, and compromising on consistency as needed. Specific architectures like map reduce and distributed queues are presented. The challenges of partial failures, upgrades, and changing topologies are discussed. Testing is emphasized as critical for managing scaled systems.
Just like you can't defeat the laws of physics there are natural laws that ultimately decide software performance. Even the latest technology beta is still bound by Newton's laws, and you can't change the speed of light, even in the cloud!
From Divided to United - Aligning Technical and Business Teams - Dominica DeGrandis
This is a true story of one SaaS company's journey to gain alignment across business and technical teams by changing how four important factors were viewed: customer demand, work prioritization, team metrics, and communication etiquette.
Without Resilience, Nothing Else Matters - Jonas Bonér
It doesn’t matter how beautiful, loosely coupled, scalable, highly concurrent, non-blocking, responsive and performant your application is—if it isn't running, then it's 100% useless. Without resilience, nothing else matters.
Most developers understand what the word resilience means, at least superficially, but way too many lack a deeper understanding of what it really means in the context of the system that they are working on now. I find it really sad to see, since understanding and managing failure is more important today than ever. Outages are incredibly costly—for many definitions of cost—and can sometimes take down whole businesses.
In this talk we will explore the essence of resilience. What does it really mean? What are its mechanics and characterizing traits? How do other sciences and industries manage it, and what can we learn from that? We will see that everything hints at the same conclusion: that failure is inevitable and needs to be embraced, and that resilience is by design.
Architectural Tactics for Large Scale Systems - Len Bass
The document discusses several challenges for large-scale systems operating in cloud environments, including failure, inconsistency, continuous deployment, and installation errors. It describes tactics used by companies like Google and Netflix to address issues like fault tolerance, eventual consistency, rolling upgrades of loosely coupled services, and error diagnosis across distributed systems. The document also outlines research at NICTA to build process models for installation, analyze configuration errors, and develop new tactics for managing changes in complex cloud applications.
Cassandra Core Concepts - Cassandra Day Toronto - Jon Haddad
- Traditional relational databases do not scale well for large datasets due to limitations in replication, sharding, and consistency.
- Lessons from using relational databases for big data problems include that consistency is impractical, manual sharding is difficult, and additional components increase complexity.
- Apache Cassandra addresses these issues with a distributed architecture that sacrifices consistency for availability and scalability, automates replication and sharding, and uses a simplified design.
An introduction to core concepts in Apache Cassandra. We cover the evolution of database architecture as you try to scale a relational database to solve big data problems, and explain how Cassandra handles these problems efficiently.
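Cassandra's automated sharding rests on consistent hashing: nodes and keys hash onto the same ring, and each key is owned by the first node clockwise from its position. A minimal single-token sketch (real Cassandra uses virtual nodes and its own partitioner; the names here are illustrative):

```python
import bisect
import hashlib

# Consistent-hashing ring sketch: adding or removing a node only moves
# the keys adjacent to it on the ring, rather than reshuffling everything
# as naive hash-mod-N sharding would.
def ring_hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def owner(self, key):
        h = ring_hash(key)
        points = [p for p, _ in self.ring]
        idx = bisect.bisect_right(points, h) % len(self.ring)  # wrap around
        return self.ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
owner = ring.owner("user:1234")
assert owner in {"node-a", "node-b", "node-c"}
assert ring.owner("user:1234") == owner  # placement is deterministic
```

Replication falls out of the same structure: store each key on the owner plus the next few nodes around the ring.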
This is the keynote talk I delivered at GeekCamp.SG 2014.
The main purpose of the talk is to raise awareness in the community about what it takes to choose to build, and to actually build, a distributed system.
This presentation is not meant to be a survey of distributed computing through the ages, but hopefully it serves as a good starting point from which the journeyman can begin.
I want to thank Jonas, CTO of Typesafe, as his work on Akka strongly influenced my own, and I hope it helps you the way his work helped me.
Modern Cloud Fundamentals: Misconceptions and Industry Trends - Christopher Bennage
A discussion of misconceptions, problems, and industry trends that hinder adoption of cloud technology; with an emphasis on scenarios that appear to work but fail at critical moments.
Be sure to read the notes!
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...) - confluent
We use machine learning to delve deep into the internals of how systems like Kafka work. In this talk I'll dive into what variables affect performance and reliability, including previously unknown leading indicators of major performance problems, failure conditions and how to tune for specific use cases. I'll cover some of the specific methodology we use, including Bayesian optimization, and reinforcement learning. I'll also talk about our own internal infrastructure that makes heavy use of Kafka and Kubernetes to deliver real-time predictions to our customers.
Scaling a High Traffic Web Application: Our Journey from Java to PHP - 120bi
What makes an application scale? What should you worry about early on and what can wait?
Over the last 3 years, Achievers has learned many lessons and gained fundamental knowledge on scaling our SaaS platform. CTO Dr. Aris Zakinthinos will present and discuss the decisions we’ve made including language choice, server architecture, and much more; join us while we share tips, tricks, and things to absolutely avoid.
Throughout the evening you will have the opportunity to talk to the development team behind the Achievers Platform and ask questions on scaling best practices.
The Power of Determinism in Database SystemsDaniel Abadi
Slides for Daniel Abadi talk at UC Berkeley on 10/22/2014. Discusses the problems with traditional database systems, especially around modularity and horizontal scalability, and shows how deterministic database systems can help.
This document discusses relational and non-relational databases. It begins by introducing NoSQL databases and some of their key characteristics like not requiring a fixed schema and avoiding joins. It then discusses why NoSQL databases became popular for companies dealing with huge data volumes due to limitations of scaling relational databases. The document covers different types of NoSQL databases like key-value, column-oriented, graph and document-oriented databases. It also discusses concepts like eventual consistency, ACID properties, and the CAP theorem in relation to NoSQL databases.
This document outlines a general product direction for connected clouds middleware and is intended for informational purposes only. It may not be incorporated into any contracts and does not commit Oracle to deliver any functionality. The document discusses making globally distributed stateful applications appear and operate as a single application across multiple cloud regions, providers and data centers. It also provides an agenda on challenges of multi-site deployments and introduces Oracle Coherence as a solution.
Haytham ElFadeel presented on next-generation storage systems and key-value stores. He began with an overview of scalable systems and the need for both vertical and horizontal scalability. He discussed the limitations of traditional databases in scaling, including complexity, wasted features, and multi-step query processing. Key-value stores were presented as an alternative, offering simple interfaces and designs optimized for scaling across hundreds of machines. Performance comparisons showed key-value stores significantly outperforming databases. Systems discussed included Amazon Dynamo, Facebook Cassandra, and Redis.
This is a ppt from Open Source Bridge that Thomas used for his session. This basically educates on why redundant power and back up power is so critical, and why you should always back up your info.
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn. This was a presentation made at QCon 2009 and is embedded on LinkedIn's blog - http://blog.linkedin.com/
Stig: Social Graphs & Discovery at ScaleDATAVERSITY
Stig is a distributed graph database built from scratch at Tagged. It handles our scale of data and user demands, which means 100M+ users and 6B+ page views per month. It is particularly suited to graph-based applications involving large volumes of data, transactional updates, and inference-driven queries.
The goal of the stig project is to increase the productivity of web programmers. To this end, the system hides the details of its distributed architecture and provides the application programmer a single, consistent, and reliable path to data. The query language is highly expressive and composable, but also easy to use and stocked with helpful libraries.
Slides from talk given on Java/Scala Lab 2014 in Odessa, Ukraine. Describes of how Java can be used as platform for latency-restricted applications such as High Frequency Trading and demonstrates how latencies 15-30µsec can be achieved on vanilla Oracle JDK.
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
Cassandra is a distributed database with features included but not limited to Secundary Indexes, UDF, Materialized Views, etc. and not so strict hardware requirements.
It is important to use those features and select hardware correctly to make sure the use of Cassandra in your business can be as painless as possible.
I will address how these features are used in the wrong way, how hardware should be selected, and how to make Cassandra work in the best possible way.
Learning Objective #1:
Learn that Cassandra hardware requirements exist (and why) and the shortcomings in some of features(Secundary Indexes, Compaction Strategies, etc).
Learning Objective #2:
The most misused features and common hardware errors. How they might seem harmeless at first (either small cluster or even single node).
Learning Objective #3:
How to correctly use Cassandra and it's features and go for perfect operation.
About the Speaker
Carlos Rolo Cassandra Consultant, Pythian
Carlos Rolo is a Cassandra MVP, and has deep expertise with distributed architecture technologies. Carlos is driven by challenge, and enjoys the opportunities to discover new things.. He has become known and trusted by customers and colleagues for his ability to understand complex problems, and to work well under pressure. When Carlos isn't working he can be found playing water polo or enjoying the his local community.
The document discusses different data storage options for small, medium, and large datasets. It argues that relational databases do not scale well for large datasets due to limitations with replication, normalization, sharding, and high availability. The document then introduces Apache Cassandra as a fast, distributed, highly available, and linearly scalable database that addresses these limitations through its use of a hash ring architecture and tunable consistency levels. It describes Cassandra's key features including replication, compaction, and multi-datacenter support.
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
Twitter's operations team manages software performance, availability, capacity planning, and configuration management for Twitter. They use metrics, logs, and analysis to find weak points and take corrective action. Some techniques include caching everything possible, moving operations to asynchronous daemons, and optimizing databases to reduce replication delay and locks. The team also created several open source projects like CacheMoney for caching and Kestrel for asynchronous messaging.
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
Fixing Twitter and Finding your own Fail Whale document discusses Twitter operations. The operations team manages software performance, availability, capacity planning, and configuration management using metrics, logs, and data-driven analysis to find weak points and take corrective action. They use managed services for infrastructure to focus on computer science problems. The document outlines Twitter's rapid growth and challenges in maintaining performance as traffic increases. It provides recommendations around caching, databases, asynchronous processing, and other techniques Twitter uses to optimize performance under heavy load.
Fixing Twitter and Finding your own Fail Whale document discusses Twitter operations. The Twitter operations team focuses on software performance, availability, capacity planning, and configuration management using metrics, logs, and science. They use a dedicated managed services team and run their own servers instead of cloud services. The document outlines Twitter's rapid growth and challenges in maintaining performance. It discusses strategies for monitoring, analyzing metrics to find weak points, deploying changes, and improving processes through configuration management and peer reviews.
6. What is a Database?
• A Record of things Remembered or Forgotten
• Used to be Unbelievably Hard; now it's just hard sometimes
• Modern Databases are amazingly resilient
• Failure Modes still require lots of attention
• In Distributed Environments…
• The Database is inexorably linked to the network
• The network is always unreliable if it's public
7. Masters and Slaves
• Databases have to Replicate
• Most Databases use a form of Master-Slave Relationship to manage replication and dedupe
• Masters are where new data is entered
• Then it's mirrored out to the Slaves for storage (Durability)
• If you lose access to the original Master, you can convert a Slave into a Master and restore operation
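The failover path described above can be sketched in a few lines. This is a minimal illustration, not 2600hz's or any real database's implementation; the class and node names are invented for the example.

```python
# Minimal sketch of master-slave replication with slave promotion.
# All names ("Cluster", "db1", etc.) are illustrative only.

class Cluster:
    def __init__(self, nodes):
        self.master = nodes[0]          # writes are entered here
        self.slaves = list(nodes[1:])   # read-only replicas
        self.data = {node: {} for node in nodes}

    def write(self, key, value):
        # New data is entered on the master...
        self.data[self.master][key] = value
        # ...then mirrored out to the slaves for durability.
        for slave in self.slaves:
            self.data[slave][key] = value

    def fail_master(self):
        # Lost access to the master: convert a slave into the
        # new master and restore operation.
        self.data.pop(self.master)
        self.master = self.slaves.pop(0)

cluster = Cluster(["db1", "db2", "db3"])
cluster.write("user:42", {"name": "alice"})
cluster.fail_master()                    # db2 takes over
cluster.write("user:43", {"name": "bob"})
print(cluster.master)                    # → db2
```

Real systems must also handle the replication lag between master and slaves at the moment of failure, which this sketch ignores.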
8. Other Replication Strategies
• Other strategies exist, such as…
• Master-Master (What 2600hz Uses)
• Tokenized Exchange
• Time-delimited
• The most popular methods tend to be Master-Slave or Master-Master
Each Database has its advantages and tradeoffs. Once again, there is no Magic Bullet.
9. Failure and Quorum
• When a Database needs to elect a new master…
• There are many different strategies
• Most involve the concept of quorum (figuring out where the greatest number of copies reside)
• Once Quorum is established, a new master is elected and (hopefully) operation can resume
• Quorum works differently in Master-Master setups
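The majority-quorum rule behind most of these election strategies is simple arithmetic. A hedged sketch (the function names are mine, not from any particular database):

```python
# The usual "majority quorum" rule: a side of a partition may elect a
# new master only if it can see a strict majority of the cluster.

def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1

def can_elect_master(visible_nodes: int, cluster_size: int) -> bool:
    return visible_nodes >= quorum_size(cluster_size)

# In a 5-node cluster split 3/2, only the 3-node side has quorum:
print(can_elect_master(3, 5))  # → True
print(can_elect_master(2, 5))  # → False
```

This is also why odd cluster sizes are popular: in a 4-node cluster a clean 2/2 split leaves neither side with a majority, and no master can be elected.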
10. CAP Theorem
Databases can have (at most) 2 out of 3 of the following:
• Consistency
• Availability
• Partition Tolerance
Modern Database Management is a balancing act between Consistency and Availability, because all modern networks are unreliable.
12. What is Important in a Database?
• Reliable Storage of Data?
• Fast Retrieval of Data?
• Fast Saving of Data?
• Resilience during failures?
• <other>
13. Examples
• Buying tickets from Ticketmaster
• What's important and why?
• Withdrawing money from a bank?
• Storing Call Forwarding Settings?
• Storing a List of Favorite Stocks?
Each Scenario has a different set of requirements and constraints. There is no silver bullet; if you could write one database for all these scenarios, you'd be rich.
14. Which Database is Better?
• STUPID QUESTION
• But I thought there were no stupid questions?
• This is the only stupid question.
• The fight over which database is better is almost always silly
• Databases are a tool to get a job done
• Like the previous examples, each job is different
• Each database stresses different pros/cons
16. Trouble With Databases
• HUGE TOPIC (we're only going to cover a little)
• Network Partitions
• Layer 1 disasters
• Flapping Internet (a special class of Network Partition)
17. Network Partitions
• Common in Distributed Databases
• When Databases lose contact with each other, they can partition
• Caused by unreliable or faulty network connections
• Databases can behave very weirdly when partitioned
Arguably, most of what a database admin does is prepare for network partitions and how to resolve them.
20. Split-Brain
• During a partition, some databases will elect N masters, one for each partition in the network
• When the partition is fixed, unless there is a pre-defined restoral procedure, there will be conflicts
• Databases have all kinds of strategies for handling WAN Split-Brain failure, but you should understand them
Key Takeaway: No Database is perfect. Understand the automation, but also understand the manual intervention procedure.
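One well-known "pre-defined restoral procedure" is the CouchDB family's deterministic winner pick: when the partition heals and two sides hold different versions of the same document, every node picks the same winner without coordination by comparing revisions. A simplified sketch (the tuple layout and helper name are mine; real CouchDB keeps richer revision trees):

```python
# Illustrative split-brain merge: choose a winner deterministically,
# CouchDB-style -- highest revision number wins, ties broken by
# comparing revision hashes, so every node resolves identically.

def resolve_conflict(doc_a, doc_b):
    """Each doc is a (rev_number, rev_hash, body) tuple."""
    # Sorting on (rev_number, rev_hash) is deterministic everywhere.
    return max(doc_a, doc_b, key=lambda d: (d[0], d[1]))

side_a = (3, "9f1c", {"forward_to": "+15551234"})
side_b = (3, "a02b", {"forward_to": "+15559999"})
winner = resolve_conflict(side_a, side_b)
print(winner[2])  # → {'forward_to': '+15559999'}
```

Note that in CouchDB/BigCouch the "losing" revision is not thrown away; it is kept as a conflict revision the application can inspect, which is exactly the manual intervention procedure the slide says you should understand.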
22. Layer 1 Failures
• Rut Roh
• Actual Physical Disaster
• No easy way out except…
• Don't be in a Datacenter that's hit by a disaster, OR
• Be Nimble enough to Evade Disaster
23. Evading Disaster
• We're not Magicians; we can't simply predict disasters
• The next best thing is being able to move, and move fast
• Kazoo requires one line of code to move
• Kazoo moves fast
• Moving the Database fast is awesome (Thanks, BigCouch!)
During Hurricane Sandy, we cut our Datacenters away from Downtown New York over to a Datacenter above the 100-year flood plain on the East Coast. Result: No Downtime.
24. No Silver Bullets
• Layer 1 disasters are a humbling experience
• Don't rely on Datacenters in the Path of a Storm
• Flooding will brick datacenters that have generators below ground
• To avoid being powerless in a disaster…
• Plan, Test, Analyze, Repeat
• Check out the Netflix Simian Army for examples of tests
25. Flapping
• Is it up? Is it down? Around and around it goes; where it stops, nobody knows…
• Flapping Internet is a special case of network partition or lost connectivity
• Flapping connections lose contact with other servers, then appear to come back online before dropping off again
Why is this bad?
26. Fixing Flapping
• I'm trying to fix a partition
• The Network keeps going up and down
• As I repair my cluster, it keeps starting to repair and failing (by attempting to reintegrate the unreliable nodes)
Flapping nodes make everything awful.
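A common defense, in the spirit of BGP route flap damping, is to count a node's up/down transitions inside a sliding window and refuse to reintegrate it until it has been quiet for a while. This is a sketch of that assumed policy, not anything BigCouch or Kazoo ships; the class and thresholds are invented for illustration:

```python
import time
from collections import deque

class FlapDetector:
    """Suppress reintegration of nodes that flap too often."""

    def __init__(self, max_flaps=3, window=60.0):
        self.max_flaps = max_flaps      # flaps tolerated per window
        self.window = window            # seconds
        self.transitions = deque()      # timestamps of up/down changes

    def record_transition(self, now=None):
        now = time.monotonic() if now is None else now
        self.transitions.append(now)

    def is_stable(self, now=None):
        now = time.monotonic() if now is None else now
        # Forget transitions older than the window.
        while self.transitions and now - self.transitions[0] > self.window:
            self.transitions.popleft()
        return len(self.transitions) < self.max_flaps

d = FlapDetector(max_flaps=3, window=60.0)
for t in (0, 10, 20):           # three flaps in twenty seconds
    d.record_transition(now=t)
print(d.is_stable(now=21))      # → False: hold reintegration
print(d.is_stable(now=120))     # → True: quiet long enough
```

Keeping a flapping node out until it proves itself stable is what stops the repair loop described above from restarting on every bounce.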
27. Why is the Network Difficult?
"Detecting network failures is hard. Since our only knowledge of the other nodes passes through the network, delays are indistinguishable from failure. This is the fundamental problem of the network partition: latency high enough to be considered a failure. When partitions arise, we have no way to determine what happened on the other nodes: are they alive? Dead? Did they receive our message? Did they try to respond? Literally no one knows. When the network finally heals, we'll have to re-establish the connection and try to work out what happened–perhaps recovering from an inconsistent state."
–Kyle Kingsbury, Aphyr.com
30. What does 2600hz use?
• Cloudant BigCouch
• NoSQL Database
• Master-Master
• Very sensibly designed for our use case
31. Why BigCouch?
DEMANDS
1. On-the-Fly Schema Changes
2. Scale in a distributed fashion
3. Configuration changes will happen as we grow
4. Has to be equipment agnostic
5. Accessible Raw Data View
6. Simple to Install and Keep up
7. It can't fail, ergo Fault-Tolerance
8. Multi-Master writes
9. Simple (to cluster, to
TRADEOFFS
1. Eventual Consistency is OK
2. Nodes going offline randomly
3. Multi-server only
Why are we ok with these tradeoffs? They suit our use case.
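The "Eventual Consistency is OK" tradeoff comes down to quorum arithmetic. BigCouch-style databases store N copies of each document, read from R of them, and wait for W write acknowledgements; if R + W > N every read quorum overlaps the latest write quorum, otherwise you accept eventual consistency. A hedged sketch of just that check (the function name is mine):

```python
# N/R/W quorum arithmetic behind tunable consistency:
#   N = replicas stored, R = copies consulted per read,
#   W = acknowledgements required per write.

def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum."""
    return r + w > n

# Typical BigCouch-style settings: N=3 with R=W=2 overlap...
print(read_overlaps_write(3, 2, 2))  # → True
# ...while acking a single copy each way is fast but eventual.
print(read_overlaps_write(3, 1, 1))  # → False
```

Dialing R and W down buys latency and availability at the cost of possibly reading stale data, which is exactly the tradeoff 2600hz says suits their use case.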
32. Let's take some time to pontificate about Database at scale…
What are the first things you think of when you get errors reported from the Database?
What's your Thought Process?
33. Recap
• Database is where you put stuff
• You want your Database not to die
• 2600hz uses BigCouch because it's really awesome technology
• Great for our Use Case
• Easy to Administrate
• Resilient and quick-to-restore
When do we come in and provide the support? Possible examples?
Sponsored features? …they have access to current and future features for free.
Yealink stuff: make sure you send the right firmware and then the right config file. If you send the wrong config file, or send the file too early, you can brick the phone. 50 handsets is the threshold for DHCP option 66.