The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offering significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great but unlike the traditional relational database you can’t use JDBC and must partition your data.
In this presentation you will learn about popular NoSQL databases – MongoDB, and Cassandra – as well at VoltDB. We will compare and contrast each database’s data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013odnoklassniki.ru
Odnoklassniki uses cassandra for its business data, which doesn't fit into RAM. This data is typically fast growing, frequently accessed by our users and must be always available, because it constitute our primary business as a social network. The way we use cassandra is somewhat unusual - we don't use thrift or netty based native protocol to communicate with cassandra nodes remotely. Instead, we co-locate cassandra nodes in the same JVM with business service logic, exposing not generic data manipulation, but business level interface remotely. This way, we avoid extra network roundtrips within a single business transaction and use internal calls to Cassandra classes to get information faster. Also, this helps us to create many small hacks on Cassandra's internals, making huge gains on efficiency and ease of distributed server development.
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)Ontico
NoSQL databases have emerged as a response to some perceived problems in the RDBMSs: agile/dynamic schemas; and transparent, horizontal scaling of the database. The former has been promptly targeted with the introduction of unstructured data types, but scaling a relational databases is still a very hard problem.
As a consequence, all NoSQL databases have been built from scratch: their storage engines, replication techniques, journaling, ACID support (if any). They haven't leveraged the previously existing state-of-the-art of RDBMSs, effectively re-inventing the wheel. Isn't this sub-optimal? Wouldn't it be possible to construct a NoSQL database by layering it on top of a relational database?
Enter ToroDB. ToroDB is an open source project that behaves as a NoSQL database but runs on top of PostgreSQL, one of the most respected and reliable relational databases. ToroDB offers a document (JSON-like) interface, and implements the MongoDB wire protocol, hence being compatible with existing MongoDB drivers and applications. Rather than using PostgreSQL's jsonb data type, ToroDB explored an innovative approach by transforming JSON documents to a fully relational representation, in an automated way. This brings to the table many advantages like lower disk footprint and automatic data-partitioning, leading to significantly faster queries.As ToroDB speaks the MongoDB protocol, it also implements MongoDB replication and sharding techniques, enabling it to scale and offer HA like Mongo. Being based on PostgreSQL, ToroDB is effectively scaling PostgreSQL much in the same way MongoDB scales.
Hippo meetup: enterprise search with Solr and elasticsearchLuca Cavanna
Presentation used at the Hippo meetup about enterprise search which took place in Amsterdam. The talk started with a general introduction about search with lucene, scaling with Solr and the distributed problems that elasticsearch successfully addresses.
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013odnoklassniki.ru
Odnoklassniki uses cassandra for its business data, which doesn't fit into RAM. This data is typically fast growing, frequently accessed by our users and must be always available, because it constitute our primary business as a social network. The way we use cassandra is somewhat unusual - we don't use thrift or netty based native protocol to communicate with cassandra nodes remotely. Instead, we co-locate cassandra nodes in the same JVM with business service logic, exposing not generic data manipulation, but business level interface remotely. This way, we avoid extra network roundtrips within a single business transaction and use internal calls to Cassandra classes to get information faster. Also, this helps us to create many small hacks on Cassandra's internals, making huge gains on efficiency and ease of distributed server development.
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)Ontico
NoSQL databases have emerged as a response to some perceived problems in the RDBMSs: agile/dynamic schemas; and transparent, horizontal scaling of the database. The former has been promptly targeted with the introduction of unstructured data types, but scaling a relational databases is still a very hard problem.
As a consequence, all NoSQL databases have been built from scratch: their storage engines, replication techniques, journaling, ACID support (if any). They haven't leveraged the previously existing state-of-the-art of RDBMSs, effectively re-inventing the wheel. Isn't this sub-optimal? Wouldn't it be possible to construct a NoSQL database by layering it on top of a relational database?
Enter ToroDB. ToroDB is an open source project that behaves as a NoSQL database but runs on top of PostgreSQL, one of the most respected and reliable relational databases. ToroDB offers a document (JSON-like) interface, and implements the MongoDB wire protocol, hence being compatible with existing MongoDB drivers and applications. Rather than using PostgreSQL's jsonb data type, ToroDB explored an innovative approach by transforming JSON documents to a fully relational representation, in an automated way. This brings to the table many advantages like lower disk footprint and automatic data-partitioning, leading to significantly faster queries.As ToroDB speaks the MongoDB protocol, it also implements MongoDB replication and sharding techniques, enabling it to scale and offer HA like Mongo. Being based on PostgreSQL, ToroDB is effectively scaling PostgreSQL much in the same way MongoDB scales.
Hippo meetup: enterprise search with Solr and elasticsearchLuca Cavanna
Presentation used at the Hippo meetup about enterprise search which took place in Amsterdam. The talk started with a general introduction about search with lucene, scaling with Solr and the distributed problems that elasticsearch successfully addresses.
Francesc Alted (UberResearch GmbH), “New Trends In Storing And Analyzing Large Data Silos With Python”.
Bio: Teacher, developer and consultant in a wide variety of business applications. Particularly interested in the field of very large databases, with special emphasis in squeezing the last drop of performance out of computer as whole, i.e. not only the CPU, but the memory and I/O subsystems.
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014odnoklassniki.ru
OK.ru is one of the largest social networks for Russian-speaking audiences with 80+ million unique user’s visits monthly. ok.ru uses Cassandra since 2010 and made a number of improvements to C* 2.0 and 2.1 codebase. Until recent time more than 50 TB of data at Ok.ru OLTP systems was managed by Microsoft SQL Sever. It’s very expensive, hard to scale and cannot save us from outage if one of our data centers fail. We wanted a new, fast scalable and reliable storage for these data. These data has requirements to support ACID transactions, so we don’t have to rewrite all application code from scratch. С* does not support these transactions, only lightweight, so we implemented a new storage with ACID and selected features of SQL world by ourselves. Still, it has C* at its heart. We’ll discuss the internals of the new storage, what features of C* we had to alter and which to rewrite from scratch. We’ll also talk about its operational experience in production.
Cassandra by example - the path of read and write requestsgrro
This article describes how Cassandra handles and processes requests. It will help you to get a better impression about Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
Elasticsearch is well known as a highly scalable search engine that stores data in a structure optimized for language based searches but its capabilities and use cases don't stop there. In this tutorial, I'll give you a hands-on introduction to Elasticsearch and give you a glimpse at some of the fundamental concepts.
Database administration is challenging, and Elasticsearch is not an exception to that rule. In this tutorial, we will cover various administrative topics like Installation and Configuration, Cluster/Node management, Indexes management and Monitoring Cluster Health which will help you. Building applications on top of an Elasticsearch are also challenging and raise concerns about schema design. In this tutorial, we will cover developer-oriented topics like Mappings and Analysis, Aggregations and Schema Design that will help you build a robust application on top of Elasticsearch.
There will be lab sessions at the end of some chapters so please have your laptops with you.
Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/memory-efficient-applications/francesc-alted
Lucene 4.0 is on its way to deliver a tremendous amount of new features and improvements. Beside Real-Time Search & Flexible Indexing DocValues aka. Column Stride Fields is one of the "next generation" features
You may all know that JSON is a subset of JavaScript, but... Did you know that HTML5 implements NoSQL databases? Did you know that JavaScript was recommended for REST by Roy T. Fielding himself? Did you know that map & reduce are part of the native JavaScript API? Did you know that most NoSQL solutions integrate a JavaScript engine? CouchDB, MongoDB, WakandaDB, ArangoDB, OrientDB, Riak.... And when they don't, they have a shell client which does...
The story of NoSQL and JavaScript goes beyond your expectations and open more opportunities than you might imagine... What better match could you find than a flexible and dynamic language for schemaless databases? Isn't, an event-driven language what you were waiting for to manage eventually consistency? When NoSQL doesn't come to JavaScript, JavaScript comes to NoSQL, and does it very well...
Groovy speech I held last year for introducing a new JVM language as substitute of Java. Easy and intuitive, it offers new features unknow to its parent yet.
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
Since a couple of years, the NoSQL movement has developed a variety of open-source document stores. Most of them focus on high availability, horizontal scalability, and are designed to run on commodity hardware. These products have gained great traction in the industry to store large amounts of flexible data (mostly JSON). In the meantime, XQuery has evolved to a standardized, full-fledged programming language for XML with native support for complex queries, indexes, updates, full-text search, and scripting. Moreover, JSON has recently been added as a first-level datatype into the language. As of today, it is without doubt the most robust and productive technology to process flexible data.
The aim of this talk is to showcase the benefits that can be achieved by integrating the Zorba XQuery Processor with MongoDB. We will introduce the 28msec platform that seamlessly stores, indexes, and manages flexible data entirely in XQuery. The data itself is stored in MongoDB. The platform leverages MongoDB’s indexes, sharding, and consistency guarantees to scale-out horizontally. The talk will conclude by showing a benchmark of the platform and discuss perspectives of the outlined approach.
Что делает JVM? Компилирует код и выполняет сборку мусора, — скажете вы и будете совершенно правы. Тем не менее, Java приложения могут работать даже при полном отсутсвии JIT и GC.
Виртуальная машина состоит из большого числа компонентов, благодаря которым исполнение Java программ становится возможным. Из доклада вы узнаете, что представляет собой байткод, где лежат переменные, что содержится в class-файлах, кто ловит исключения, насколько дороги JNI методы, как работает синхронизация и многое другое.
Francesc Alted (UberResearch GmbH), “New Trends In Storing And Analyzing Large Data Silos With Python”.
Bio: Teacher, developer and consultant in a wide variety of business applications. Particularly interested in the field of very large databases, with special emphasis in squeezing the last drop of performance out of computer as whole, i.e. not only the CPU, but the memory and I/O subsystems.
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014odnoklassniki.ru
OK.ru is one of the largest social networks for Russian-speaking audiences with 80+ million unique user’s visits monthly. ok.ru uses Cassandra since 2010 and made a number of improvements to C* 2.0 and 2.1 codebase. Until recent time more than 50 TB of data at Ok.ru OLTP systems was managed by Microsoft SQL Sever. It’s very expensive, hard to scale and cannot save us from outage if one of our data centers fail. We wanted a new, fast scalable and reliable storage for these data. These data has requirements to support ACID transactions, so we don’t have to rewrite all application code from scratch. С* does not support these transactions, only lightweight, so we implemented a new storage with ACID and selected features of SQL world by ourselves. Still, it has C* at its heart. We’ll discuss the internals of the new storage, what features of C* we had to alter and which to rewrite from scratch. We’ll also talk about its operational experience in production.
Cassandra by example - the path of read and write requestsgrro
This article describes how Cassandra handles and processes requests. It will help you to get a better impression about Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
Elasticsearch is well known as a highly scalable search engine that stores data in a structure optimized for language based searches but its capabilities and use cases don't stop there. In this tutorial, I'll give you a hands-on introduction to Elasticsearch and give you a glimpse at some of the fundamental concepts.
Database administration is challenging, and Elasticsearch is not an exception to that rule. In this tutorial, we will cover various administrative topics like Installation and Configuration, Cluster/Node management, Indexes management and Monitoring Cluster Health which will help you. Building applications on top of an Elasticsearch are also challenging and raise concerns about schema design. In this tutorial, we will cover developer-oriented topics like Mappings and Analysis, Aggregations and Schema Design that will help you build a robust application on top of Elasticsearch.
There will be lab sessions at the end of some chapters so please have your laptops with you.
Tokyo Cabinet is a library of routines for managing a database. The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012Big Data Spain
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/memory-efficient-applications/francesc-alted
Lucene 4.0 is on its way to deliver a tremendous amount of new features and improvements. Beside Real-Time Search & Flexible Indexing DocValues aka. Column Stride Fields is one of the "next generation" features
You may all know that JSON is a subset of JavaScript, but... Did you know that HTML5 implements NoSQL databases? Did you know that JavaScript was recommended for REST by Roy T. Fielding himself? Did you know that map & reduce are part of the native JavaScript API? Did you know that most NoSQL solutions integrate a JavaScript engine? CouchDB, MongoDB, WakandaDB, ArangoDB, OrientDB, Riak.... And when they don't, they have a shell client which does...
The story of NoSQL and JavaScript goes beyond your expectations and open more opportunities than you might imagine... What better match could you find than a flexible and dynamic language for schemaless databases? Isn't, an event-driven language what you were waiting for to manage eventually consistency? When NoSQL doesn't come to JavaScript, JavaScript comes to NoSQL, and does it very well...
Groovy speech I held last year for introducing a new JVM language as substitute of Java. Easy and intuitive, it offers new features unknow to its parent yet.
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
Since a couple of years, the NoSQL movement has developed a variety of open-source document stores. Most of them focus on high availability, horizontal scalability, and are designed to run on commodity hardware. These products have gained great traction in the industry to store large amounts of flexible data (mostly JSON). In the meantime, XQuery has evolved to a standardized, full-fledged programming language for XML with native support for complex queries, indexes, updates, full-text search, and scripting. Moreover, JSON has recently been added as a first-level datatype into the language. As of today, it is without doubt the most robust and productive technology to process flexible data.
The aim of this talk is to showcase the benefits that can be achieved by integrating the Zorba XQuery Processor with MongoDB. We will introduce the 28msec platform that seamlessly stores, indexes, and manages flexible data entirely in XQuery. The data itself is stored in MongoDB. The platform leverages MongoDB’s indexes, sharding, and consistency guarantees to scale-out horizontally. The talk will conclude by showing a benchmark of the platform and discuss perspectives of the outlined approach.
Что делает JVM? Компилирует код и выполняет сборку мусора, — скажете вы и будете совершенно правы. Тем не менее, Java приложения могут работать даже при полном отсутсвии JIT и GC.
Виртуальная машина состоит из большого числа компонентов, благодаря которым исполнение Java программ становится возможным. Из доклада вы узнаете, что представляет собой байткод, где лежат переменные, что содержится в class-файлах, кто ловит исключения, насколько дороги JNI методы, как работает синхронизация и многое другое.
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
MongoDB is easy to download and run locally but requires some thought and further understanding when deploying to production. At scale, schema design, indexes and query patterns really matter. So does data structure on disk, sharding, replication and data centre awareness. This talk will examine these factors in the context of analytics, and more generally, to help you optimise MongoDB for any scale.
Presented at MongoDB Days London 2013 by David Mytton.
MongoDB, Hadoop and humongous data - MongoSV 2012Steven Francia
Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using tools like MapReduce, Pig and Streaming you will learn how to do analytics and ETL on large datasets with the ability to load and save data against MongoDB. With Hadoop MapReduce, Java and Scala programmers will find a native solution for using MapReduce to process their data with MongoDB. Programmers of all kinds will find a new way to work with ETL using Pig to extract and analyze large datasets and persist the results to MongoDB. Python and Ruby Programmers can rejoice as well in a new way to write native Mongo MapReduce using the Hadoop Streaming interfaces.
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
Tips, tricks, and gotchas learned at Localytics for optimizing MongoDB installs. Includes information about document design, indexes, fragmentation, migration, AWS EC2/EBS, and more.
19th February 2013, AWS User Group UK, Meetup #3, Managing your apps on AWS: ...AWS User Group UK
Agenda entry: Managing your apps on AWS: Real life lessons with GigaSpaces, Ron Zanver. We’ve all learned Murphy’s inevitable law the hard way – if it can go wrong, it often will! But that doesn’t mean we can’t be ready for such scenarios in the cloud. In this talk, GigaSpaces will focus on the AWS environment, which is dynamic and volatile by nature, and how to maximise your utilisation and minimise downtime. This session will show you how you can architect your cloud-hosted systems to sustain such outages, delving into how to choose the right PaaS for the job, addressing data centre failures, how to avoid single points of failure, and more.
Organiser's commentary: Ron Zanver from GigaSpaces came to talk about the inherent instability of life in the cloud, and what you can do to protect yourself - it's all about good design and architecture. He also introduced us to GigaSaces' new Cloudify product, for abstracting estate management across multiple clouds and cloud vendors.
How to protect your application from outages and failures of cloud infrastructures. Planning disaster recovery architecture and use Cloudify for cloud abstraction and monitoring.
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBCody Ray
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Many startups collect and display stats and other time-series data for their users. A supposedly-simple NoSQL option such as MongoDB is often chosen to get started... which soon becomes 50 distributed replica sets as volume increases. This talk describes how we designed a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and lack of atomic upsert operations which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. In this deep-dive talk, we explore how we've used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
In this second part, we'll continue the Spark's review and introducing SparkSQL which allows to use data frames in Python, Java, and Scala; read and write data in a variety of structured formats; and query Big Data with SQL.
A common microservice architecture anti-pattern is more the merrier. It occurs when an organization team builds an excessively fine-grained architecture, e.g. one service-per-developer. In this talk, you will learn about the criteria that you should consider when deciding service granularity. I'll discuss the downsides of a fine-grained microservice architecture. You will learn how sometimes the solution to a design problem is simply a JAR file.
YOW London - Considering Migrating a Monolith to Microservices? A Dark Energy...Chris Richardson
This is a talk I gave at YOW! London 2022.
Let's imagine that you are responsible for an aging monolithic application that's critical to your business. Sadly, getting changes into production is a painful ordeal that regularly causes outages. And to make matters worse, the application's technology stack is growing increasingly obsolete. Neither the business nor the developers are happy. You need to modernize your application and have read about the benefits of microservices. But is the microservice architecture a good choice for your application?
In this presentation, I describe the dark energy and dark matter forces (a.k.a. concerns) that you must consider when deciding between the monolithic and microservice architectural styles. You will learn about how well each architectural style resolves each of these forces. I describe how to evaluate the relative importance of each of these forces to your application. You will learn how to use the results of this evaluation to decide whether to migrate to the microservice architecture.
Dark Energy, Dark Matter and the Microservices Patterns?!Chris Richardson
Dark matter and dark energy are mysterious concepts from astrophysics that are used to explain observations of distant stars and galaxies. The Microservices pattern language - a collection of patterns that solve architecture, design, development, and operational problems — enables software developers to use the microservice architecture effectively. But how could there possibly be a connection between microservices and these esoteric concepts from astrophysics?
In this presentation, I describe how dark energy and dark matter are excellent metaphors for the competing forces (a.k.a. concerns) that must be resolved by the microservices pattern language. You will learn that dark energy, which is an anti-gravity, is a metaphor for the repulsive forces that encourage decomposition into services. I describe how dark matter, which is an invisible matter that has a gravitational effect, is a metaphor for the attractive forces that resist decomposition and encourage the use of a monolithic architecture. You will learn how to use the dark energy and dark matter forces as guide when designing services and operations.
Dark energy, dark matter and microservice architecture collaboration patternsChris Richardson
Dark energy and dark matter are useful metaphors for the repulsive forces, which encourage decomposition into services, and the attractive forces, which resist decomposition. You must balance these conflicting forces when defining a microservice architecture including when designing system operations (a.k.a. requests) that span services.
In this talk, I describe the dark energy and dark matter forces. You will learn how to design system operations that span services using microservice architecture collaboration patterns: Saga, Command-side replica, API composition, and CQRS patterns. I describe how each of these patterns resolve the dark energy and dark matter forces differently.
It sounds dull but good architecture documentation is essential. Especially when you are actively trying to improve your architecture.
For example, I spend a lot time helping clients modernize their software architecture. More often than I like, I’m presented with a vague and lifeless collection of boxes and lines. As a result, it’s sometimes difficult to discuss the architecture in a meaningful and productive way. In this presentation, I’ll describe techniques for creating minimal yet effective documentation for your application’s microservice architecture. In particular, you will learn how documenting scenarios can bring your architecture to life.
Using patterns and pattern languages to make better architectural decisions Chris Richardson
This is a presentation that gave at the O'Reilly Software Architecture Superstream: Software Architecture Patterns.
The talk's focus is the microservices pattern language.
However, it also shows how thinking with the pattern mindset - context/problem/forces/solution/consequences - leads to better technically decisions.
The microservices architecture offers tremendous benefits, but it’s not a silver bullet. It also has some significant drawbacks. The microservices pattern language—a collection of patterns that solve architecture, design, development, and operational problems—enables software developers to apply the microservices architecture effectively. I provide an overview of the microservices architecture and examines the motivations for the pattern language, then takes you through the key patterns in the pattern language.
Rapid, reliable, frequent and sustainable software development requires an architecture that is loosely coupled and modular.
Teams need to be able complete their work with minimal coordination and communication with other teams.
They also need to be able keep the software’s technology stack up to date.
However, the microservice architecture isn’t always the only way to satisfy these requirements.
Yet, neither is the monolithic architecture.
In this talk, I describe loose coupling and modularity and why they are is essential.
You will learn about three architectural patterns: traditional monolith, modular monolith and microservices.
I describe the benefits, drawbacks and issues of each pattern and how well it supports rapid, reliable, frequent and sustainable development.
You will learn some heuristics for selecting the appropriate pattern for your application.
Events to the rescue: solving distributed data problems in a microservice arc...Chris Richardson
To deliver a large complex application rapidly, frequently and reliably, you often must use the microservice architecture.
The microservice architecture is an architectural style that structures the application as a collection of loosely coupled services.
One challenge with using microservices is that in order to be loosely coupled each service has its own private database.
As a result, implementing transactions and queries that span services is no longer straightforward.
In this presentation, you will learn how event-driven microservices address this challenge.
I describe how to use sagas, which is an asynchronous messaging-based pattern, to implement transactions that span services.
You will learn how to implement queries that span services using the CQRS pattern, which maintain easily queryable replicas using events.
A pattern language for microservices - June 2021 Chris Richardson
The microservice architecture is growing in popularity. It is an architectural style that structures an application as a set of loosely coupled services that are organized around business capabilities. Its goal is to enable the continuous delivery of large, complex applications. However, the microservice architecture is not a silver bullet and it has some significant drawbacks.
The goal of the microservices pattern language is to enable software developers to apply the microservice architecture effectively. It is a collection of patterns that solve architecture, design, development and operational problems. In this talk, I’ll provide an overview of the microservice architecture and describe the motivations for the pattern language. You will learn about the key patterns in the pattern language.
QConPlus 2021: Minimizing Design Time Coupling in a Microservice ArchitectureChris Richardson
Delivering large, complex software rapidly, frequently and reliably requires a loosely coupled organization. DevOps teams should rarely need to communicate and coordinate in order to get work done. Conway's law states that an organization and the architecture that it develops mirror one another. Hence, a loosely coupled organization requires a loosely coupled architecture.
In this presentation, you will learn about design-time coupling in a microservice architecture and why it's essential to minimize it. I describe how to design service APIs to reduce coupling. You will learn how to minimize design-time coupling by applying a version of the DRY principle. I describe how key microservices patterns potentially result in tight design time coupling and how to avoid it.
Mucon 2021 - Dark energy, dark matter: imperfect metaphors for designing micr...Chris Richardson
In order to explain certain astronomical observations, physicists created the mysterious concepts of dark energy and dark matter.
Dark energy is a repulsive force.
It’s an anti-gravity that is forcing matter apart and accelerating the expansion of the universe.
Dark matter has the opposite attraction effect.
Although it’s invisible, dark matter has a gravitational effect on stars and galaxies.
In this presentation, you will learn how these metaphors apply to the microservice architecture.
I describe how there are multiple repulsive forces that drive the decomposition of your application into services.
You will learn, however, that there are also multiple attractive forces that resist decomposition and bind software elements together.
I describe how as an architect you must find a way to balance these opposing forces.
Skillsmatter CloudNative eXchange 2020
The microservice architecture is a key part of cloud native.
An essential principle of the microservice architecture is loose coupling.
If you ignore this principle and develop tightly coupled services the result will mostly likely be yet another "microservices failure story”.
Your application will be brittle and have all of disadvantages of both the monolithic and microservice architectures.
In this talk you will learn about the different kinds of coupling and how to design loosely coupled microservices.
I describe how to minimize design time and increase the productivity of your DevOps teams.
You will learn how how to reduce runtime coupling and improve availability.
I describe how to improve availability by minimizing the coupling caused by your infrastructure.
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...Chris Richardson
This is a talk I gave at DDD SoCal.
1. Make the most of your monolith
2. Adopt microservices for the right reasons
3. It’s not just architecture
4. Get the support of the business
5. Migrate incrementally
6. Know your starting point
7. Begin with the end in mind
8. Migrate high-value modules first
9. Success is improved velocity and reliability
10. If it hurts, don’t do it
Decompose your monolith: Six principles for refactoring a monolith to microse...Chris Richardson
This was a talk I gave at the CTO virtual summit on July 28th. It describes 6 principles for refactoring to a microservice architecture.
1. Make the most of your monolith
2. Adopt microservices for the right reasons
3. Migrate incrementally
4. Begin with the end in mind
5. Migrate high-value modules first
6. Success is improved velocity and reliability
The microservice architecture is becoming increasingly important. But what is it exactly? Why should you care about microservices? And, what do you need to do to ensure that your organization uses the microservice architecture successfully? In this talk, I’ll answer these and other questions. You will learn about the motivations for the microservice architecture and why simply adopting microservices is insufficient. I describe essential characteristics of microservices, You will learn how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Epistemic Interaction - tuning interfaces to provide information for AI support
SQL? NoSQL? NewSQL?!? What's a Java developer to do? - PhillyETE 2012
1. SQL, NoSQL, NewSQL?
What's a developer to do?
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
chris.richardson@springsource.com
@crichardson
Blog: http://plainoldobjects.com
Copyright (c) 2012 Chris Richardson. All rights reserved.
10. Food to Go
Take-out food delivery
service
“Launched” in 2006
Used a relational
database (naturally)
10
11. Success Growth challenges
Increasing traffic
Increasing data volume
Distribute across a few data centers
Increasing domain model complexity
12. Limitations of relational databases
Scaling
Distribution
Updating schema
O/R impedance mismatch
Handling semi-structured data
12
13. Solution: Spend Money
Buy SSD and RAM
Buy Oracle
Buy high-end servers
…
OR
http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
Hire more DevOps
Use application-level sharding
Build your own middleware
…
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
13
14. Solution: Use NoSQL
Benefits
Higher Limited
performance transactions
Higher scalability Relaxed
Richer data- consistency
model Unconstrained
data
Drawbacks
Schema-less
14
24. Tunable reads and writes
Higher availability
Lower response time
Less consistency
Any node (for writes)
One replica
Quorum of replicas
Local quorum
Each quorum
All replicas
Lower availability
Higher response time
More consistency
Slide 24
26. Cassandra use cases
Use cases
Big data
Multiple Data Center distributed
database
(Write intensive) Logging
High-availability (writes)
Used by: Netflix, Facebook, Digg, etc.
Slide 26
28. Solution: Use NewSQL
Relational databases with SQL and
ACID transactions
AND
New and improved architecture
Radically better scalability and
performance
NewSQL vendors: ScaleDB,
NimbusDB, …, VoltDB
Slide 28
29. Stonebraker’s motivations
“…Current databases are designed for
1970s hardware …”
Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar
Significant overhead in “…logging, latching,
locking, B-tree, and buffer management
operations…”
SIGMOD 08: Though the looking glass: http://dl.acm.org/citation.cfm?id=1376713
29
30. About VoltDB
Open-source
In-memory relational database
Durability thru replication; snapshots
and logging
Transparent partitioning
Fast and scalable
…VoltDB is very scalable; it should scale to 120
partitions, 39 servers, and 1.6 million complex
transactions per second at over 300 CPU cores…
http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/
Slide 30
31. The future is polyglot persistence
e.g. Netflix
• RDBMS
• SimpleDB
• Cassandra
• Hadoop/Hbase
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
31
32. Spring Data is here to help
For
NoSQL databases
http://www.springsource.org/spring-data
32
34. Food to Go – Place Order use case
1. Customer enters delivery address and
delivery time
2. System displays available restaurants
3. Customer picks restaurant
4. System displays menu
5. Customer selects menu items
6. Customer places order
34
35. Food to Go – Domain model (partial)
class Restaurant { class TimeRange {
long id; long id;
String name; int dayOfWeek;
Set<String> serviceArea; int openingTime;
Set<TimeRange> openingHours;
int closingTime;
List<MenuItem> menuItems;
}
}
class MenuItem {
String name;
double price;
}
35
44. Option #1: Use a column per attribute
Column Name = path/expression to access property value
Column Family: RestaurantDetails
openingHours[0].dayOfWeek Monday
name Ajanta serviceArea[0] 94619
1 openingHours[0].open 1130
type indian serviceArea[1] 94707
openingHours[0].close 1430
Egg openingHours[0].dayOfWeek Monday
name serviceArea[0] 94611
shop
2 Break openingHours[0].open 0830
type serviceArea[1] 94619
Fast
openingHours[0].close 1430
45. Option #2: Use a single column
Column value = serialized object graph, e.g. JSON
Column Family: RestaurantDetails
2 attributes: { name: “Montclair Eggshop”, … }
1 attributes { name: “Ajanta”, …}
2 attributes { name: “Eggshop”, …}
Can’t use secondary indexes but they aren’t
helpful for these use cases ✔
45
46. Cassandra code: wrapper around Hector
public class AvailableRestaurantRepositoryCassandraKeyImpl
implements AvailableRestaurantRepository {
@Autowired Home grown
private final CassandraTemplate cassandraTemplate;
wrapper class
public void add(Restaurant restaurant) {
cassandraTemplate.insertEntity(keyspace,
RESTAURANT_DETAILS_CF,
restaurant);
}
public Restaurant findDetailsById(int id) {
String key = Integer.toString(id);
return cassandraTemplate.findEntity(Restaurant.class,
keyspace, key, RESTAURANT_DETAILS_CF);
…
}
… http://en.wikipedia.org/wiki/Hector
46
48. Using VoltDB
Use the original schema
Standard SQL statements
BUT YOU MUST
Write stored procedures and invoke
them using proprietary interface
Partition your data
Slide 48
49. About VoltDB stored procedures
Key part of VoltDB
Replication = executing stored
procedure on replica
Logging = log stored procedure
invocation
Stored procedure invocation =
transaction
Slide 49
50. About partitioning
Partition column
RESTAURANT table
ID Name …
1 Ajanta
2 Eggshop
…
Partition 1
Partition 2
ID Name … ID Name …
1 Ajanta 2 Eggshop
… …
Slide 50
51. Example VoltDB cluster
(3 servers) x (2 partitions per server)
÷ 3 partitions
(2 replicas per partition)
Partition 1a Partition 2a Partition 3a
ID Name … ID Name … ID Name …
1 Ajanta 2 Eggshop … ..
… … …
Partition 3b Partition 1b Partition 2b
ID Name … ID Name … ID Name …
… .. 1 Ajanta 2 Eggshop
… … …
VoltDB Server A VoltDB Server B VoltDB Server C
Slide 51
52. Single partition procedure: FAST
SELECT * FROM RESTAURANT WHERE ID = 1
High-performance lock free code Stored
Partition
procedure
column
parameter
ID Name … ID Name … ID Name …
1 Ajanta 1 Eggshop … ..
… … …
… … …
VoltDB Server A VoltDB Server B VoltDB Server C
Slide 52
53. Multi-partition procedure: SLOWER
SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’
Communication/Coordination overhead
ID Name … ID Name … ID Name …
1 Ajanta 1 Eggshop … ..
… … …
… … …
VoltDB Server A VoltDB Server B VoltDB Server C
Slide 53
54. Chosen partitioning scheme
<partitions>
<partition table="restaurant" column="id"/>
<partition table="service_area" column="restaurant_id"/>
<partition table="menu_item" column="restaurant_id"/>
<partition table="time_range" column="restaurant_id"/>
<partition table="available_time_range" column="restaurant_id"/>
</partitions>
Performance is excellent: much
faster than MySQL
54
55. Stored procedure – AddRestaurant
@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)
public class AddRestaurant extends VoltProcedure {
public final SQLStmt insertRestaurant =
new SQLStmt("INSERT INTO Restaurant VALUES (?,?);");
public final SQLStmt insertServiceArea =
new SQLStmt("INSERT INTO service_area VALUES (?,?);");
public final SQLStmt insertOpeningTimes =
new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);");
public final SQLStmt insertMenuItem =
new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");
public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes,
long[] closingTimes, String[] names, double[] prices) {
voltQueueSQL(insertRestaurant, id, name);
for (String zipCode : serviceArea)
voltQueueSQL(insertServiceArea, id, zipCode);
for (int i = 0; i < daysOfWeek.length ; i++)
voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]);
for (int i = 0; i < names.length ; i++)
voltQueueSQL(insertMenuItem, id, names[i], prices[i]);
voltExecuteSQL(true);
return 0;
}
}
55
60. Finding available restaurants
Available restaurants =
Serve the zip code of the delivery address
AND
Are open at the delivery time
public interface AvailableRestaurantRepository {
List<AvailableRestaurant>
findAvailableRestaurants(Address deliveryAddress,
Date deliveryTime); …
}
60
61. Finding available restaurants on Monday,
6.15pm for 94619 zip
select r.* Straightforward
from restaurant r three-way join
inner join restaurant_time_range tr
on r.id =tr.restaurant_id
inner join restaurant_zipcode sa
on r.id = sa.restaurant_id
Where ’94619’ = sa.zip_code
and tr.day_of_week=’monday’
and tr.openingtime <= 1815
and 1815 <= tr.closingtime
61
63. MongoDB = easy to query
{
serviceArea:"94619", Find a
openingHours: {
$elemMatch : { restaurant
"dayOfWeek" : "Monday",
"open": {$lte: 1815}, that serves
}
"close": {$gte: 1815}
the 94619 zip
}
} code and is
open at
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) { 6.15pm on a
DBObject o = cursor.next();
… Monday
}
db.availableRestaurants.ensureIndex({serviceArea: 1})
63
64. MongoTemplate-based code
@Repository
public class AvailableRestaurantRepositoryMongoDbImpl
implements AvailableRestaurantRepository {
@Autowired private final MongoTemplate mongoTemplate;
@Override
public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress,
Date deliveryTime) {
int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
.and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek)
.and("openingTime").lte(timeOfDay)
.and("closingTime").gte(timeOfDay)));
return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query,
AvailableRestaurant.class);
}
mongoTemplate.ensureIndex(“availableRestaurants”,
new Index().on("serviceArea", Order.ASCENDING));
64
66. Slice is equivalent to a simple SELECT
keyVal colName colValue select key, colName, colValue
X Y Z from columnFamily
… where key = keyVal
and colName >= startVal
and colName <= endVal
Column Family
columnFamily.slice(key=keyVal,
startColumn=startVal,
endColumn=endVal)
X Y Z
67. We need to implement an index
that can be queried using a slice
Queries instead of data
model drives NoSQL
database design
Slide 67
68. Simplification #1: Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’ Simpler query:
AND zip_code = 94619 No joins
Two = and two <
AND 1815 < close_time
AND open_time < 1815
68
69. Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = ‘Monday’ Even simpler query
AND zip_code = 94619 • No joins
AND 1815 < close_time • Two = and one <
AND open_time < 1815
69
70. Simplification #3: Eliminate multiple =’s with
concatenation
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE zip_code_day_of_week = ‘94619:Monday’
AND 1815 < close_time
Row
key
range
70
71. Column family as an index
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
Column Family: AvailableRestaurants
JSON FOR JSON FOR
(1430,0700,2) (2130,1730,1)
94619:Monday EGG AJANTA
JSON FOR
(1430,1130,1)
AJANTA
72. Querying with a slice
Column Family: AvailableRestaurants
JSON FOR JSON FOR
(1430,0700,2) (2130,1730,1)
EGG AJANTA
94619:Monday
JSON FOR
(1430,1130,1)
AJANTA
slice(key= 94619:Monday, sliceStart = (1815, *, *), sliceEnd = (2359, *, *))
JSON FOR
(2130,1730,1)
94619:Monday AJANTA
18:15 is after 17:30 {Ajanta}
72
73. Needs a few pages of code
private void insertAvailability(Restaurant restaurant) {
for (String zipCode : (Set<String>) restaurant.getServiceArea()) {
@Override for (TimeRange tr : (Set<TimeRange>) restaurant.getOpeningHours()) {
public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
String dayOfWeek = format2(tr.getDayOfWeek());
int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
String openingTime = format4(tr.getOpeningTime());
int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
String closingTime = format4(tr.getClosingTime());
String zipCode = deliveryAddress.getZip();
String key = formatKey(zipCode, format2(dayOfWeek));
String restaurantId = format8(restaurant.getId());
HSlicePredicate<Composite> predicate = new HSlicePredicate<Composite>(new CompositeSerializer());
String key = formatKey(zipCode, dayOfWeek);
Composite start = new Composite();
Composite finish = new Composite();
String columnValue = toJson(restaurant);
start.addComponent(0, format4(timeOfDay), ComponentEquality.GREATER_THAN_EQUAL);
finish.addComponent(0, format4(2359), ComponentEquality.GREATER_THAN_EQUAL);
Composite columnName = new Composite();
predicate.setRange(start, finish, false, 100);
columnName.add(0, closingTime);
final List<AvailableRestaurantIndexEntry> closingAfter = new ArrayList<AvailableRestaurantIndexEntry>();
columnName.add(1, openingTime);
columnName.add(2, restaurantId);
ColumnFamilyRowMapper<String, Composite, Object> mapper = new ColumnFamilyRowMapper<String, Composite, Object>() {
@Override
ColumnFamilyUpdater<String, Composite> updater
public Object mapRow(ColumnFamilyResult<String, Composite> results) {
= compositeCloseTemplate.createUpdater(key);
for (Composite columnName : results.getColumnNames()) {
String openTime = columnName.get(1, new StringSerializer());
String restaurantId = columnName.get(2, new StringSerializer());
updater.setString(columnName, columnValue);
closingAfter.add(new AvailableRestaurantIndexEntry(openTime, restaurantId, results.getString(columnName)));
}
return null;
}
};
compositeCloseTemplate.update(updater);
}
compositeCloseTemplate.queryColumns(key, predicate, mapper);
}
List<AvailableRestaurant> result = new LinkedList<AvailableRestaurant>();
}
for (AvailableRestaurantIndexEntry trIdAndAvailableRestaurant : closingAfter) {
if (trIdAndAvailableRestaurant.isOpenBefore(timeOfDay))
result.add(trIdAndAvailableRestaurant.getAvailableRestaurant());
}
return result;
} 73
74. What did I just do to query the data?
Wrote code to maintain an index
Reduced performance due to extra
writes
74
75. But what would you rather implement?
“Complex” query logic
OR
Multi-datacenter, multi-
master database
infrastructure
Slide 75
77. VoltDB - attempt #0
@ProcInfo( singlePartition = false)
public class FindAvailableRestaurants extends VoltProcedure { ... }
SELECT r.*
FROM restaurant r,time_range tr, service_area sa
WHERE ? = sa.zip_code and sa.restaurant_id= tr.restaurant_id
AND r.restaurant_id = sa.restaurant_id and tr.day_of_week=?
AND tr.open_time <= ?
AND ? <= tr.close_time
Slow
Slide 77
78. VoltDB - attempt #1
@ProcInfo( singlePartition = false)
public class FindAvailableRestaurants extends VoltProcedure { ... }
create index idx_service_area_zip_code on service_area(zip_code);
..
ERROR 10:12:03,251 [main] COMPILER: Failed to plan for statement
type(findAvailableRestaurants_with_join) select r.* from restaurant
r,time_range tr, service_area sa Where ? = sa.zip_code and
sa.restaurant_id= tr.restaurant_id and r.restaurant_id =
sa.restaurant_id and tr.day_of_week=? and tr.open_time <= ? and ?
<= tr.close_time Error: "Unable to plan for statement. Possibly joining
partitioned tables in a multi-partition procedure using a column that is
not the partitioning attribute or a non-equality operator. This is
statement not supported at this time."
2012-03-19 08:19:31,743 ERROR [main] COMPILER: Catalog
compilation failed.
#fail
Slide 78
79. VoltDB - attempt #2
@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)
public class AddRestaurant extends VoltProcedure {
public final SQLStmt insertAvailable=
new SQLStmt("INSERT INTO available_time_range VALUES (?,?,?, ?, ?, ?);");
public long run(....) {
...
for (int i = 0; i < daysOfWeek.length ; i++) {
voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]);
for (String zipCode : serviceArea) {
voltQueueSQL(insertAvailable, id, daysOfWeek[i], openingTimes[i],
closingTimes[i], zipCode, name);
}
}
... public final SQLStmt findAvailableRestaurants_denorm = new SQLStmt(
voltExecuteSQL(true); "select restaurant_id, name from available_time_range tr " +
return 0; "where ? = tr.zip_code " +
} "and tr.day_of_week=? " +
} "and tr.open_time <= ? " +
" and ? <= tr.close_time ");
Works but queries are only slightly
faster than MySQL!
Slide 79
80. VoltDB - attempt #3
<partitions>
...
<partition table="available_time_range" column="zip_code"/>
</partitions>
@ProcInfo( singlePartition = false, ...)
public class AddRestaurant extends VoltProcedure { ... }
@ProcInfo( singlePartition = true,
partitionInfo = "available_time_range.zip_code: 0")
public class FindAvailableRestaurants extends VoltProcedure { ... }
Queries are really fast
But inserts are not
80
81. VoltDB – key lesson
A given partitioning scheme
can be good for some use
cases but bad for others
Slide 81
83. … Summary…
Benefits Drawbacks
RDBMS • SQL • Lack of performance and scalability
• ACID • Difficult to distribute
• Familiar • Schema updates
• Handling semi-structured data
NoSQL • Scalability • Lack of ACID
• Performance • Limited querying (some)
• Schema-less
NewSQL • ACID • Proprietary API
• SQL • Schema updates
• Familiar • Handling semi-structured data
• Scalability • Not all use cases are performant
• Performance
Slide 83
84. … Summary
Very carefully pick the NewSQL/
NoSQL DB for your application
Consider a polyglot persistence
architecture
Encapsulate your data access code so
you can switch
Startups = avoid NewSQL/NoSQL for
shorter time to market?
Slide 84
85. Thank you!
My contact info:
chris.richardson@springsource.com
@crichardson
Blog: http://plainoldobjects.com