Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But NoSQL databases are very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. In this presentation, you will learn about our experience implementing a use case from POJOs in Action using popular NoSQL databases: Redis, MongoDB, and Cassandra. We will compare and contrast each database’s data model and Java API. You will learn about the benefits and drawbacks of using NoSQL.
Enterprise applications are complex, which makes it difficult to fit everything into one model. NoSQL is taking a leading role in next-generation database technologies, and polyglot persistence is a good option for leveraging the strengths of multiple data stores. This talk will introduce the Spring Data project, an umbrella project that provides a familiar and consistent Spring-based programming model for a wide range of data access technologies such as Redis, MongoDB, HBase, and Neo4j, while retaining store-specific features and capabilities.
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo - Taro L. Saito
Silk is a framework for building dataflows in Scala. In Silk users write data processing code with collection operators (e.g., map, filter, reduce, join, etc.). Silk uses Scala Macros to construct a DAG of dataflows, nodes of which are annotated with variable names in the program. By using these variable names as markers in the DAG, Silk can support interruption and resume of dataflows and querying the intermediate data. By separating dataflow descriptions from its computation, Silk enables us to switch executors, called weavers, for in-memory or cluster computing without modifying the code. In this talk, we will show how Silk helps you run data-processing pipelines as you write the code.
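The separation of a dataflow description from its execution can be illustrated with a minimal Python sketch. This is not Silk's API: the `Node`, `source`, `step`, and `InMemoryExecutor` names are hypothetical, and the node names are passed in by hand, whereas Silk is described as capturing variable names with Scala macros.

```python
from dataclasses import dataclass
from functools import reduce as fold
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """One step of a dataflow description; creating it runs nothing."""
    name: str                          # marker, playing the role of a variable name
    op: str                            # "source", "map", "filter" or "reduce"
    fn: Optional[Callable] = None
    parent: Optional["Node"] = None
    data: tuple = ()


def source(name, data):
    return Node(name, "source", data=tuple(data))


def step(name, op, fn, parent):
    return Node(name, op, fn=fn, parent=parent)


class InMemoryExecutor:
    """Hypothetical stand-in for a 'weaver': walks the description and computes it."""

    def __init__(self):
        self.cache = {}                # intermediate results keyed by node name

    def run(self, node):
        if node.name in self.cache:    # resume from an already computed node
            return self.cache[node.name]
        if node.op == "source":
            result = list(node.data)
        else:
            upstream = self.run(node.parent)
            if node.op == "map":
                result = [node.fn(x) for x in upstream]
            elif node.op == "filter":
                result = [x for x in upstream if node.fn(x)]
            elif node.op == "reduce":
                result = fold(node.fn, upstream)
            else:
                raise ValueError(f"unknown operator: {node.op}")
        self.cache[node.name] = result
        return result


# Describe the dataflow first; no computation happens in these four lines.
numbers = source("numbers", range(1, 6))
squares = step("squares", "map", lambda x: x * x, numbers)
evens = step("evens", "filter", lambda x: x % 2 == 0, squares)
total = step("total", "reduce", lambda a, b: a + b, evens)

executor = InMemoryExecutor()
total_value = executor.run(total)      # the executor decides how to compute
```

Because the description is plain data, a different executor (for example one that ships work to a cluster) could run the same four nodes unchanged, and `executor.cache["evens"]` shows how named nodes make intermediate data queryable.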
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed datasets (RDDs). You will also learn how the Spark concepts resemble those well known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
Being a LISP hacker for several years, and a Java-guy for some more, Chris turned to Clojure for production code in 2011. He’s been Project Lead, Software Architect, and VP Tech in the meantime, interested in AI and data-visualization.
Now, working at the heart of data-driven marketing for Performance Media in Hamburg, he has turned to Apache Spark for some Big Data jobs. Chris released the API wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
This slide deck is used as an introduction to the internals of Apache Spark, as part of the Distributed Systems and Cloud Computing course I teach at Eurecom.
Course website:
http://michiard.github.io/DISC-CLOUD-COURSE/
Sources available here:
https://github.com/michiard/DISC-CLOUD-COURSE
Scalable and Flexible Machine Learning With Scala @ LinkedIn - Vitaly Gordon
The presentation given by Chris Severs and myself at the Bay Area Scala Enthusiasts meetup. http://www.meetup.com/Bay-Area-Scala-Enthusiasts/events/105409962/
Spark after Dark by Chris Fregly of Databricks - Data Con LA
Spark After Dark is a mock dating site that uses the latest Spark libraries, AWS Kinesis, Lambda Architecture, and Probabilistic Data Structures to generate dating recommendations.
There will be 5+ demos covering everything from basic data ETL to advanced data processing including Alternating Least Squares Machine Learning/Collaborative Filtering and PageRank Graph Processing.
There is heavy emphasis on Spark Streaming and AWS Kinesis.
Watch the video here
https://www.youtube.com/watch?v=g0i_d8YT-Bs
OrientDB vs Neo4j - and an introduction to NoSQL databases - Curtis Mosters
NoSQL databases are a good alternative to common SQL technologies. Here you get an introduction to NoSQL and a comparison of SQL vs NoSQL. Furthermore, we take a look at graph databases, especially OrientDB vs Neo4j.
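OCF.tw's talk about "Introduction to spark" - Giivee The
A talk on Spark, given at the invitation of OCF and OSSF.
If you are interested in the Open Culture Foundation (財團法人開放文化基金會, OCF) or the Open Source Software Foundry (自由軟體鑄造場, OSSF),
please check http://ocf.tw/ or http://www.openfoundry.org/
Thanks also to CLBC for providing the venue.
If you would like to work in a good working environment,
feel free to contact CLBC: http://clbc.tw/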
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab - CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you understand the basics of RDDs in detail. The topics covered in this tutorial are:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
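Topics 12 and 14 above can be sketched without a cluster. Taking the average of averages gives wrong results because averaging is not associative, so a common approach is to reduce over (sum, sum of squares, count) triples and divide at the end. The sketch below uses Python's built-in `map` and `functools.reduce` as stand-ins for the RDD operations of the same names; it is an illustration of the idea, not the tutorial's own code, and the sample values are made up.

```python
import math
from functools import reduce

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Map every value to a (sum, sum of squares, count) triple.
triples = map(lambda x: (x, x * x, 1), values)


def combine(a, b):
    """Associative and commutative, so partial results can be merged in any order."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


total, total_sq, count = reduce(combine, triples)

mean = total / count
variance = total_sq / count - mean ** 2   # population variance: E[x^2] - E[x]^2
std_dev = math.sqrt(variance)
```

Only the final division is non-associative, which is why it is kept outside the reduce step.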
Slides of my "Rapid JCR applications development with Sling" at ApacheCon EU 2009. Starts like the US 2008 version but uses a different example for the second part.
NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together.
In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.
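One simple consistency mechanism for this kind of design is to commit to the system of record first and update the query-side copy only afterwards, with a fallback that rebuilds the copy on a miss. The sketch below is an assumption-laden illustration, not the talk's example application: an in-memory `sqlite3` database stands in for MySQL, a plain dictionary stands in for Redis, and the `RestaurantStore` class and its schema are hypothetical.

```python
import json
import sqlite3


class RestaurantStore:
    """SQL database as system of record, key-value view for fast reads."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")   # stand-in for MySQL
        self.db.execute(
            "CREATE TABLE restaurant ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, zip TEXT NOT NULL)"
        )
        self.cache = {}                          # stand-in for Redis

    @staticmethod
    def _key(rid):
        return f"restaurant:{rid}"

    def save(self, rid, name, zip_code):
        # 1. Commit to the system of record (rolls back on error).
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO restaurant (id, name, zip) VALUES (?, ?, ?)",
                (rid, name, zip_code),
            )
        # 2. Only after a successful commit, refresh the denormalized copy.
        self.cache[self._key(rid)] = json.dumps({"name": name, "zip": zip_code})

    def find(self, rid):
        cached = self.cache.get(self._key(rid))
        if cached is not None:
            return json.loads(cached)
        # Cache miss: rebuild the view from the system of record.
        row = self.db.execute(
            "SELECT name, zip FROM restaurant WHERE id = ?", (rid,)
        ).fetchone()
        if row is None:
            return None
        value = {"name": row[0], "zip": row[1]}
        self.cache[self._key(rid)] = json.dumps(value)
        return value


store = RestaurantStore()
store.save(1, "Ajanta", "94707")
from_cache = store.find(1)
store.cache.clear()                  # simulate losing the key-value store
from_sql = store.find(1)             # falls back to SQL and repopulates the view
```

If step 2 fails after step 1 succeeds, the two stores diverge until the next miss or rewrite; a real system would need an additional mechanism for that window, which this sketch does not attempt.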
Developing applications with Cloud Services (Devnexus 2013) - Chris Richardson
Cloud computing isn’t just about application deployment. There are also a growing number of cloud-based web services that you can use to develop your application. One of the most well known is Amazon’s Simple Storage Service. But there are many others including web services for messaging, relational and NoSQL databases, email and telephony. Using these services allows you to build highly scalable applications without the pain and cost of having to develop and operate your own infrastructure.
In this presentation, you will learn about the benefits and drawbacks of these web services, their typical use cases, and how to use them. We will describe a location-aware telephony application that is built using cloud services. You will learn about strategies for building resilient, fault-tolerant applications that consume cloud services.
Developing applications with Cloud Services #javaone 2012 - Chris Richardson
Cloud computing isn't just about application deployment. There are also a growing number of cloud-based web services that you can use to develop your application. One of the most well known is Amazon's Simple Storage Service. But there are many others including web services for messaging, relational and NoSQL databases, email and telephony. Using these services allows you to build highly scalable applications without the pain and cost of having to develop and operate your own infrastructure.
In this presentation, you will learn about the benefits and drawbacks of these web services, their typical use cases, and how to use them. We will describe a location-aware telephony application that is built using cloud services. You will learn about strategies for building resilient, fault-tolerant applications that consume cloud services.
Developing polyglot applications on Cloud Foundry (#oredev 2012) - Chris Richardson
Developing web applications used to be simple. Your single war-file web application served up HTML to a desktop browser and used a relational database. Today, however, web applications are much more complex: the front-end uses HTML5 and NodeJS, the middle tier is decomposed into multiple services, and the back-end uses a mix of SQL and NoSQL databases. Developing this kind of application can be challenging since there are so many moving parts that need to be correctly installed and configured. Deployment is even more difficult.
In this talk, you will learn why we need to build applications with this architectural style and how Cloud Foundry, which is a modern, multi-lingual, multi-service, extensible open-source PaaS, can help. We will talk about how to develop modern applications that run on Cloud Foundry and cover what’s new and different about the cloud environment. You will learn how your application can consume the various services that are provided by Cloud Foundry. We will discuss the various ways of using Cloud Foundry, including the Micro Cloud that runs on a laptop as well as the hosted CloudFoundry.com.
Decomposing applications for scalability and deployability - svcc sv_code_ca... - Chris Richardson
Today, there are several trends that are forcing application architectures to evolve. Users expect a rich, interactive and dynamic user experience on a wide variety of clients including mobile devices. Applications must be highly scalable, highly available and run on cloud environments. Organizations often want to frequently roll out updates, even multiple times a day. Consequently, it’s no longer adequate to develop simple, monolithic web applications that serve up HTML to desktop browsers.
In this talk we describe the limitations of a monolithic architecture. You will learn how to use the scale cube to decompose your application into a set of narrowly focused, independently deployable back-end services and an HTML 5 client. We will also discuss the role of technologies such as NodeJS and AMQP brokers. You will learn how a modern PaaS such as Cloud Foundry simplifies the development and deployment of this style of application.
Decomposing applications for scalability and deployability (devnexus 2013) - Chris Richardson
Today, there are several trends that are forcing application architectures to evolve. Users expect a rich, interactive and dynamic user experience on a wide variety of clients including mobile devices. Applications must be highly scalable, highly available and run on cloud environments. Organizations often want to frequently roll out updates, even multiple times a day. Consequently, it’s no longer adequate to develop simple, monolithic web applications that serve up HTML to desktop browsers.
In this talk we describe the limitations of a monolithic architecture. You will learn how to use the scale cube to decompose your application into a set of narrowly focused, independently deployable back-end services and an HTML 5 client. We will also discuss the role of technologies such as NodeJS and AMQP brokers. You will learn how a modern PaaS such as Cloud Foundry simplifies the development and deployment of this style of application.
Improving application design with a rich domain model (springone 2007) - Chris Richardson
A classic from 2007. This is a presentation that I gave at SpringOne in Antwerp, Belgium. It describes how to improve application design by using a rich domain model.
This is the 30-minute GlueCon 2013 version of a much longer talk. See http://plainoldobjects.com/presentations/developing-polyglot-persistence-applications/ for other versions and the example code.
NoSQL databases such as Redis, MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. However, using a NoSQL database means giving up the benefits of the relational model such as SQL, constraints and ACID transactions. For some applications, the solution is polyglot persistence: using SQL and NoSQL databases together.
In this talk, you will learn about the benefits and drawbacks of polyglot persistence and how to design applications that use this approach. We will explore the architecture and implementation of an example application that uses MySQL as the system of record and Redis as a very high-performance database that handles queries from the front-end. You will learn about mechanisms for maintaining consistency across the various databases.
NodeJS: the good parts? A skeptic’s view (jax jax2013) - Chris Richardson
JavaScript used to be confined to the browser. But these days, it's becoming increasingly popular in server-side applications in the form of Node.js. Node.js provides an event-driven, non-blocking I/O model that supposedly makes it easy to build scalable network applications. In this talk you will learn about the consequences of combining the event-driven programming model with a prototype-based, weakly typed, dynamic language. We will share our perspective as a server-side Java developer who wasn’t entirely happy about JavaScript in the browser, let alone on the server. You will learn how to use Node.js effectively in modern, polyglot applications.
Watch the video: http://www.youtube.com/watch?v=CN0jTnSROsk&feature=youtu.be
Microservices pattern language (microxchg microxchg2016) - Chris Richardson
My talk from http://microxchg.io/2016/index.html.
Here is the video - https://www.youtube.com/watch?v=1mcVQhbkA2U
When architecting an enterprise Java application, you need to choose between the traditional monolithic architecture consisting of a single large WAR file, or the more fashionable microservices architecture consisting of many smaller services. But rather than blindly picking the familiar or the fashionable, it’s important to remember what Fred Brooks said almost 30 years ago: there are no silver bullets in software. Every architectural decision has both benefits and drawbacks. Whether the benefits of one approach outweigh the drawbacks greatly depends upon the context of your particular project. Moreover, even if you adopt the microservices architecture, you must still make numerous other design decisions, each with its own trade-offs.
A software pattern is an ideal way of describing a solution to a problem in a given context along with its tradeoffs. In this presentation, we describe a pattern language for microservices. You will learn about patterns that will help you decide when and how to use microservices vs. a monolithic architecture. We will also describe patterns that solve various problems in a microservice architecture including inter-service communication, service registration and service discovery.
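Service registration and discovery can be sketched in a few lines. The following is a minimal in-process illustration of the idea, assuming a client-side lookup with round-robin selection; the `ServiceRegistry` class, the service name, and the addresses are hypothetical and do not come from the talk or from any particular registry product.

```python
class ServiceRegistry:
    """Maps a service name to the network locations of its running instances."""

    def __init__(self):
        self._instances = {}   # service name -> list of "host:port" strings
        self._cursors = {}     # service name -> next round-robin position

    def register(self, service, address):
        """Called by a service instance when it starts up."""
        instances = self._instances.setdefault(service, [])
        if address not in instances:
            instances.append(address)

    def deregister(self, service, address):
        """Called on shutdown, or by a health checker when an instance dies."""
        instances = self._instances.get(service, [])
        if address in instances:
            instances.remove(address)

    def lookup(self, service):
        """Called by a client to pick one instance, rotating between them."""
        instances = self._instances.get(service, [])
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = index + 1
        return instances[index]


registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")

picks = [registry.lookup("orders") for _ in range(3)]

registry.deregister("orders", "10.0.0.1:8080")
after_failure = registry.lookup("orders")
```

Clients never hard-code an address; they ask the registry on each call, so instances can come and go without reconfiguring their consumers.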
Map, Flatmap and Reduce are Your New Best Friends: Simpler Collections, Concu... - Chris Richardson
Higher-order functions such as map(), flatmap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.
In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.
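As a small illustration of the four functions named above, here is a sketch in Python rather than the Java and Scala used in the talk. Python has no built-in `flatMap`, so the `flat_map` helper below is an assumption built from `itertools.chain`; the sample lines are made up.

```python
from functools import reduce
from itertools import chain


def flat_map(fn, items):
    """Apply fn to every item and flatten the resulting sequences by one level."""
    return chain.from_iterable(map(fn, items))


lines = ["map filter reduce", "flat map"]

words = list(flat_map(str.split, lines))           # one list of words from many lines
lengths = list(map(len, words))                    # transform each element
long_lengths = list(filter(lambda n: n > 3, lengths))  # keep elements matching a predicate
total = reduce(lambda a, b: a + b, long_lengths)   # fold the elements into one value
```

Each step takes a function as an argument and returns a new collection or value, so the pipeline reads top to bottom without explicit loops or mutable accumulators.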
Map(), flatmap() and reduce() are your new best friends: simpler collections,... - Chris Richardson
Higher-order functions such as map(), flatmap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.
In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.
Polyglot persistence for Java developers - moving out of the relational comfo... - Chris Richardson
Relational databases have long been considered the one true way to persist enterprise data. But today, NoSQL databases are emerging as a viable alternative for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But using NoSQL databases is very different from the ACID/SQL/JDBC/JPA world that we have become accustomed to. They have different and unfamiliar APIs and a very different and usually limited transaction model. In this presentation, we describe some popular NoSQL databases: Redis, MongoDB, and Cassandra. You will learn about each database’s data model and Java API. We describe the benefits and drawbacks of using NoSQL databases. Finally, you will learn how the Spring Data project simplifies the development of Java applications that use NoSQL databases.
SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt - Chris Richardson
The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great, but unlike with a traditional relational database, you can’t use JDBC and you must partition your data.
In this presentation you will learn about popular NoSQL databases (MongoDB and Cassandra) as well as VoltDB. We will compare and contrast each database’s data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
In pursuit of expressivity: Groovy and Scala compared - Chris Richardson
Thirty years ago the famous computer scientist Tony Hoare said “I have regarded it as the highest goal of programming language design to enable good ideas to be elegantly expressed”. His comment sums up what has been driving the evolution of programming languages: the desire to create increasingly expressive languages. In this talk you will learn about two modern JVM languages that strive to be expressive in very different ways. We will talk about Groovy, which is a dynamically-typed language, and Scala, which is a statically-typed language. You will learn about the key features of each language as well as their benefits and the drawbacks.
ABSTRACT: The ongoing big data revolution has transformed the way in which technology is used to empower new business segments like social networking and to reshape old business segments like traditional retail. However, the DNA that is used to build data processing platforms is evolving quite rapidly. There is a plethora of competing tools, technologies, and “religion” for how to build state-of-the-art data analysis frameworks. In this talk, I will go over five ways to build scalable, high-performance, long-lasting data analysis frameworks in the wrong way. Surprisingly, the industry is full of examples of organizations building frameworks in this “wrong” way. Since the “right” way to build a technology framework is dependent on the key business drivers, it is my hope that this talk will spur a discussion on what is the “right” way for Pinterest. The talk will focus on technologies including “data plumbing” (e.g. tools in the Hadoop ecosystem), and statistical modeling methods (e.g. R and Python). In this talk, I’ll try to connect to platform builders, data scientists, and business decision makers.
BIO: Jignesh Patel is a Professor in Computer Sciences at the University of Wisconsin-Madison, where he also earned his Ph.D. He has worked in the area of databases (now fashionably called “big data”) for over two decades. He has won several best paper awards, and industry research awards. He is the recipient of the Wisconsin COW teaching award, and the U. Michigan College of Engineering Education Excellence Award. He has a strong interest in seeing research ideas transition to actual products. His Ph.D. thesis work was acquired by NCR/Teradata in 1997, and he also co-founded Locomatix -- a startup that built a platform to power real-time data-driven mobile services. Locomatix became part of Twitter in 2013. He is an ACM Distinguished Scientist and an IEEE Senior Member. He also serves on the board of Lands’ End, and advises a number of startups.
NativeX (formerly W3i) recently transitioned a large portion of its backend infrastructure from MS SQL Server to Apache Cassandra. Today, its Cassandra cluster backs its mobile advertising network, supporting over 10 million daily active users and producing over 10,000 transactions per second with an average database request latency of under 2 milliseconds. Going from relational to NoSQL required NativeX's engineers to re-train, re-tool and re-think the way they architect applications and infrastructure. Learn why Cassandra was selected as a replacement, what challenges were encountered along the way, and what architecture and infrastructure were involved in the implementation.
Following classical software architecture patterns, we tend to design software applications as large monoliths.
These monoliths are typically quite difficult to scale as they often require powerful machines, making the option to scale out very expensive.
In most cases these monoliths of software are designed to run on a single machine only, hence scaling out is complicated or even impossible without refactoring large portions of the application.
Therefore a new design pattern called microservices arose.
The microservices pattern keeps the need for a clustered server setup in mind and helps to keep the application very modular.
This simplifies scaling out your application and even lets you scale only its bottlenecks, reducing the total cost of a scale-out approach.
In this talk I will introduce the concept of microservices, how they are defined and how to design an application with them.
Furthermore I will show how to scale the application properly and why this is only possible due to the use of microservices.
We will also have a look at Node.js and why it is a perfect, though not the only, fit for this design strategy.
However, scaling is not the only purpose of microservices. They also increase the flexibility and maintainability of applications, which will also be discussed in the talk.
Synthesis in VLSI is the process of converting your code (program) into a circuit. In terms of logic gates, synthesis translates an abstract design into a properly implemented chip. Hardware Description Languages (HDLs) are specialized languages used to describe the hardware of a circuit, and the synthesis tool then builds the circuit from the programme you provide. The result of synthesis is a “Gate Level Netlist”, which shows what your circuit consists of and how everything is interconnected. You can alter it if you like; the tool simply synthesizes the netlist based on its best judgement. Synthesizers produce better netlists as designers become more proficient at writing HDL programmes.
Apache Jackrabbit Oak is a new JCR implementation with a completely new architecture. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.
ScyllaDB Open Source 5.0 is the latest evolution of our monstrously fast and scalable NoSQL database – powering instantaneous experiences with massive distributed datasets.
Join us to learn about ScyllaDB Open Source 5.0, which represents the first milestone in ScyllaDB V. ScyllaDB 5.0 introduces a host of functional, performance and stability improvements that resolve longstanding challenges of legacy NoSQL databases.
We’ll cover:
- New capabilities including a new IO model and scheduler, Raft-based schema updates, automated tombstone garbage collection, optimized reverse queries, and support for the latest AWS EC2 instances
- How ScyllaDB 5.0 fits into the evolution of ScyllaDB – and what to expect next
- The first look at benchmarks that quantify the impact of ScyllaDB 5.0's numerous optimizations
This will be an interactive session with ample time for Q & A – bring us your questions and feedback!
ScyllaDB V Developer Deep Dive Series: Resiliency and Strong Consistency via ...ScyllaDB
ScyllaDB’s implementation of the Raft consensus protocol translates to strong, immediately consistent schema updates, topology changes, tables and indexes, and more. This eliminates schema and data conflicts, enables rapid and safe increases in cluster capacity, and provides a leap forward in manageability. Join this webinar to learn how the Raft consensus algorithm has been implemented, what you can do with it today, and what radical new capabilities it will enable in the days ahead.
Understand what NoSQL is and what it is not. Why would you want to use NoSQL within your project, and which NoSQL database would you use? Explore the relationships between NoSQL and RDBMS. Understand how to select between an RDBMS (MySQL and PostgreSQL), a document database (MongoDB), a key-value store, a graph database, and columnar databases, or combinations of the above.
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
After a short introduction to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks in this presentation: sequential tests, random tests, and above all a comparison of results before and after optimization. The configuration parameters changed and the optimizations applied (large page numbers, Omap data on a separate disk, ...) yield at least a 2x performance gain.
A common microservice architecture anti-pattern is "more the merrier". It occurs when an organization builds an excessively fine-grained architecture, e.g. one service per developer. In this talk, you will learn about the criteria that you should consider when deciding service granularity. I'll discuss the downsides of a fine-grained microservice architecture. You will learn how sometimes the solution to a design problem is simply a JAR file.
YOW London - Considering Migrating a Monolith to Microservices? A Dark Energy...Chris Richardson
This is a talk I gave at YOW! London 2022.
Let's imagine that you are responsible for an aging monolithic application that's critical to your business. Sadly, getting changes into production is a painful ordeal that regularly causes outages. And to make matters worse, the application's technology stack is growing increasingly obsolete. Neither the business nor the developers are happy. You need to modernize your application and have read about the benefits of microservices. But is the microservice architecture a good choice for your application?
In this presentation, I describe the dark energy and dark matter forces (a.k.a. concerns) that you must consider when deciding between the monolithic and microservice architectural styles. You will learn about how well each architectural style resolves each of these forces. I describe how to evaluate the relative importance of each of these forces to your application. You will learn how to use the results of this evaluation to decide whether to migrate to the microservice architecture.
Dark Energy, Dark Matter and the Microservices Patterns?!Chris Richardson
Dark matter and dark energy are mysterious concepts from astrophysics that are used to explain observations of distant stars and galaxies. The Microservices pattern language - a collection of patterns that solve architecture, design, development, and operational problems — enables software developers to use the microservice architecture effectively. But how could there possibly be a connection between microservices and these esoteric concepts from astrophysics?
In this presentation, I describe how dark energy and dark matter are excellent metaphors for the competing forces (a.k.a. concerns) that must be resolved by the microservices pattern language. You will learn that dark energy, which is an anti-gravity, is a metaphor for the repulsive forces that encourage decomposition into services. I describe how dark matter, which is an invisible matter that has a gravitational effect, is a metaphor for the attractive forces that resist decomposition and encourage the use of a monolithic architecture. You will learn how to use the dark energy and dark matter forces as a guide when designing services and operations.
Dark energy, dark matter and microservice architecture collaboration patternsChris Richardson
Dark energy and dark matter are useful metaphors for the repulsive forces, which encourage decomposition into services, and the attractive forces, which resist decomposition. You must balance these conflicting forces when defining a microservice architecture including when designing system operations (a.k.a. requests) that span services.
In this talk, I describe the dark energy and dark matter forces. You will learn how to design system operations that span services using microservice architecture collaboration patterns: Saga, Command-side replica, API composition, and CQRS patterns. I describe how each of these patterns resolves the dark energy and dark matter forces differently.
It sounds dull but good architecture documentation is essential. Especially when you are actively trying to improve your architecture.
For example, I spend a lot of time helping clients modernize their software architecture. More often than I like, I’m presented with a vague and lifeless collection of boxes and lines. As a result, it’s sometimes difficult to discuss the architecture in a meaningful and productive way. In this presentation, I’ll describe techniques for creating minimal yet effective documentation for your application’s microservice architecture. In particular, you will learn how documenting scenarios can bring your architecture to life.
Using patterns and pattern languages to make better architectural decisions Chris Richardson
This is a presentation that I gave at the O'Reilly Software Architecture Superstream: Software Architecture Patterns.
The talk's focus is the microservices pattern language.
However, it also shows how thinking with the pattern mindset - context/problem/forces/solution/consequences - leads to better technical decisions.
The microservices architecture offers tremendous benefits, but it’s not a silver bullet. It also has some significant drawbacks. The microservices pattern language—a collection of patterns that solve architecture, design, development, and operational problems—enables software developers to apply the microservices architecture effectively. I provide an overview of the microservices architecture, examine the motivations for the pattern language, then take you through the key patterns in the pattern language.
Rapid, reliable, frequent and sustainable software development requires an architecture that is loosely coupled and modular.
Teams need to be able to complete their work with minimal coordination and communication with other teams.
They also need to be able to keep the software’s technology stack up to date.
However, the microservice architecture isn’t always the only way to satisfy these requirements.
Yet, neither is the monolithic architecture.
In this talk, I describe loose coupling and modularity and why they are essential.
You will learn about three architectural patterns: traditional monolith, modular monolith and microservices.
I describe the benefits, drawbacks and issues of each pattern and how well it supports rapid, reliable, frequent and sustainable development.
You will learn some heuristics for selecting the appropriate pattern for your application.
Events to the rescue: solving distributed data problems in a microservice arc...Chris Richardson
To deliver a large complex application rapidly, frequently and reliably, you often must use the microservice architecture.
The microservice architecture is an architectural style that structures the application as a collection of loosely coupled services.
One challenge with using microservices is that in order to be loosely coupled each service has its own private database.
As a result, implementing transactions and queries that span services is no longer straightforward.
In this presentation, you will learn how event-driven microservices address this challenge.
I describe how to use sagas, which is an asynchronous messaging-based pattern, to implement transactions that span services.
You will learn how to implement queries that span services using the CQRS pattern, which maintains easily queryable replicas using events.
A pattern language for microservices - June 2021 Chris Richardson
The microservice architecture is growing in popularity. It is an architectural style that structures an application as a set of loosely coupled services that are organized around business capabilities. Its goal is to enable the continuous delivery of large, complex applications. However, the microservice architecture is not a silver bullet and it has some significant drawbacks.
The goal of the microservices pattern language is to enable software developers to apply the microservice architecture effectively. It is a collection of patterns that solve architecture, design, development and operational problems. In this talk, I’ll provide an overview of the microservice architecture and describe the motivations for the pattern language. You will learn about the key patterns in the pattern language.
QConPlus 2021: Minimizing Design Time Coupling in a Microservice ArchitectureChris Richardson
Delivering large, complex software rapidly, frequently and reliably requires a loosely coupled organization. DevOps teams should rarely need to communicate and coordinate in order to get work done. Conway's law states that an organization and the architecture that it develops mirror one another. Hence, a loosely coupled organization requires a loosely coupled architecture.
In this presentation, you will learn about design-time coupling in a microservice architecture and why it's essential to minimize it. I describe how to design service APIs to reduce coupling. You will learn how to minimize design-time coupling by applying a version of the DRY principle. I describe how key microservices patterns potentially result in tight design time coupling and how to avoid it.
Mucon 2021 - Dark energy, dark matter: imperfect metaphors for designing micr...Chris Richardson
In order to explain certain astronomical observations, physicists created the mysterious concepts of dark energy and dark matter.
Dark energy is a repulsive force.
It’s an anti-gravity that is forcing matter apart and accelerating the expansion of the universe.
Dark matter has the opposite attraction effect.
Although it’s invisible, dark matter has a gravitational effect on stars and galaxies.
In this presentation, you will learn how these metaphors apply to the microservice architecture.
I describe how there are multiple repulsive forces that drive the decomposition of your application into services.
You will learn, however, that there are also multiple attractive forces that resist decomposition and bind software elements together.
I describe how as an architect you must find a way to balance these opposing forces.
Skillsmatter CloudNative eXchange 2020
The microservice architecture is a key part of cloud native.
An essential principle of the microservice architecture is loose coupling.
If you ignore this principle and develop tightly coupled services, the result will most likely be yet another “microservices failure story”.
Your application will be brittle and have all of the disadvantages of both the monolithic and microservice architectures.
In this talk you will learn about the different kinds of coupling and how to design loosely coupled microservices.
I describe how to minimize design-time coupling and increase the productivity of your DevOps teams.
You will learn how to reduce runtime coupling and improve availability.
I describe how to improve availability by minimizing the coupling caused by your infrastructure.
DDD SoCal: Decompose your monolith: Ten principles for refactoring a monolith...Chris Richardson
This is a talk I gave at DDD SoCal.
1. Make the most of your monolith
2. Adopt microservices for the right reasons
3. It’s not just architecture
4. Get the support of the business
5. Migrate incrementally
6. Know your starting point
7. Begin with the end in mind
8. Migrate high-value modules first
9. Success is improved velocity and reliability
10. If it hurts, don’t do it
Decompose your monolith: Six principles for refactoring a monolith to microse...Chris Richardson
This was a talk I gave at the CTO virtual summit on July 28th. It describes 6 principles for refactoring to a microservice architecture.
1. Make the most of your monolith
2. Adopt microservices for the right reasons
3. Migrate incrementally
4. Begin with the end in mind
5. Migrate high-value modules first
6. Success is improved velocity and reliability
The microservice architecture is becoming increasingly important. But what is it exactly? Why should you care about microservices? And, what do you need to do to ensure that your organization uses the microservice architecture successfully? In this talk, I’ll answer these and other questions. You will learn about the motivations for the microservice architecture and why simply adopting microservices is insufficient. I describe essential characteristics of microservices. You will learn how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that lead to closing the deal.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Polygot persistence for Java Developers - August 2011 / @Oakjug
1. Polyglot persistence for Java
developers - moving out of the
relational comfort zone
Chris Richardson
Author of POJOs in Action
Founder of CloudFoundry.com
chris@chrisrichardson.net
@crichardson
2. Overall presentation goal
The joy and pain of
building Java
applications that
use NoSQL
8/19/11 Copyright (c) 2011 Chris Richardson. All rights reserved.
Slide 2
3. About Chris
• Grew up in England and live in Oakland,
CA
• 25+ years of software development
experience including 14+ years of Java
• Speaker at JavaOne, SpringOne,
PhillyETE, Devoxx, etc.
• Organize the Oakland JUG and the
Groovy Grails meetup
http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
4. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
5. Relational databases are great
o SQL = Rich, declarative query language
o Database enforces referential integrity
o ACID semantics
o Well understood by developers
o Well supported by frameworks and tools, e.g. Spring
JDBC, Hibernate, JPA
o Well understood by operations
n Configuration
n Care and feeding
n Backups
n Tuning
n Failure and recovery
n Performance characteristics
o But….
6. Problem: Complex object graphs
o Object/relational
impedance
mismatch
o Complicated to
map rich domain
model to relational
schema
o Performance issues
n Many rows in many
tables
n Many joins
7. Problem: Semi-structured data
o Relational schema doesn’t easily handle
semi-structured data:
n Varying attributes
n Custom attributes on a customer record
o Common solution = Name/value table
n Poor performance
n E.g. Finding specific attributes for customers
satisfying some criteria = multi-way outer
JOIN
n Lack of constraints
o Another solution = Serialize as blob
n Fewer joins
n BUT can’t be queried
8. Problem: Schema evolution
o For example:
n Add attributes to an object → add
columns to table
o Schema changes =
n Holding locks for a long time →
application downtime
n $$
9. Problem: Scaling
o Scaling reads:
n Master/slave
n But beware of consistency issues
o Scaling writes
n Extremely difficult/impossible/expensive
n Vertical scaling is limited and requires $$
n Horizontal scaling is limited/requires $$
10. Solution: Buy high end technology
http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
11. Solution: Hire more developers
o Application-level sharding
o Build your own middleware
o …
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
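As a rough idea of what application-level sharding asks of developers, here is a minimal sketch of hash-based shard routing in plain Java. The `ShardRouter` class, the shard URLs, and the modulo scheme are assumptions made for illustration; the talk does not prescribe a particular routing strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical router that maps a key to one of several database shards.
class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = new ArrayList<>(shardUrls);
    }

    // Picks a shard by hashing the key; floorMod keeps the index
    // non-negative even when hashCode() is negative.
    String shardFor(String key) {
        int index = Math.floorMod(key.hashCode(), shardUrls.size());
        return shardUrls.get(index);
    }

    public static void main(String[] args) {
        // The JDBC URLs below are placeholders, not real servers.
        ShardRouter router = new ShardRouter(Arrays.asList(
                "jdbc:mysql://shard0/app", "jdbc:mysql://shard1/app", "jdbc:mysql://shard2/app"));
        System.out.println(router.shardFor("customer-42"));
    }
}
```

Every data access path in the application has to go through a routing step like this, and queries that span keys on different shards must be stitched together by hand. With simple modulo routing, changing the number of shards remaps most keys, so resharding means moving data.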
12. Solution: Use NewSQL
o Led by Stonebraker
n Current databases are designed for 1970s
hardware and for both OLTP and data
warehouses
n http://www.slideshare.net/VoltDB/sql-myths-webinar
o NewSQL
n Next generation SQL databases, e.g. VoltDB
n Leverage multi-core, commodity hardware
n In-memory
n Horizontally scalable
n Transparently shardable
n ACID
13. NoSQL databases are emerging…
Each one offers some combination of:
o Higher performance
o Higher scalability
o Richer data-model
o Schema-less
In return for:
o Limited transactions
o Relaxed consistency
o Unconstrained data
o …
14. … but there are few commonalities
o Everyone and their dog has written one
o Different data models
n Key-value
n Column
n Document
n Graph
o Different APIs
o No JDBC, Hibernate, JPA (generally)
“Same sorry state as the database market in the 1970s before SQL was invented”
http://queue.acm.org/detail.cfm?id=1961297
15. Future = multi-paradigm data storage
for enterprise applications
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
16. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
17. Redis
o Advanced key-value store
n Values can be binary strings, Lists, Sets, Sorted Sets, Hashes, …
n Data-type specific operations
o Very fast
n ~100K operations/second on entry-level hardware
n In-memory operations
o Persistent
n Periodic snapshots of memory OR append commands to log file
o Transactions within a single server
n Atomic execution of batched commands
n Optimistic locking
[Diagram: key/value table with K1 → V1, K2 → V2, K3 → V2]
18. Redis CLI
Sorted set member = value + score
redis> zadd mysortedset 5.0 a
(integer) 1
redis> zadd mysortedset 10.0 b
(integer) 1
redis> zadd mysortedset 1.0 c
(integer) 1
redis> zrange mysortedset 0 1
1) "c"
2) "a"
redis> zrangebyscore mysortedset 1 6
1) "c"
2) "a"
19. Scaling Redis
o Master/slave replication
n Tree of Redis servers
n Non-persistent master can replicate to a persistent slave
n Use slaves for read-only queries
o Sharding
n Client-side only – consistent hashing based on key
n Server-side sharding – coming one day
o Run multiple servers per physical host
n Server is single threaded => Leverage multiple CPUs
n 32-bit builds are more memory efficient than 64-bit builds
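The client-side sharding bullet can be illustrated with a minimal consistent-hashing sketch. This is not taken from any Redis client library: the class name `ConsistentHashRing`, the server names, the MD5-based hash and the count of 100 virtual nodes per server are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical client-side shard picker; not part of any Redis client library. */
class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100; // assumed number of ring positions per server
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers) {
        for (String server : servers) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    /** Returns the server that owns the first ring position at or after the key's hash. */
    String serverFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    /** First 8 bytes of the MD5 digest, packed into a long. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xffL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the ring is that removing one server only remaps the keys that server owned; keys on the remaining servers stay where they were.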
20. Downsides of Redis
o Low-level API compared to SQL
o Single threaded:
n Multiple cores → multiple Redis servers
o Master/slave failover is manual
o Partitioning is done by the client
o Dataset has to fit in memory
21. Redis use cases
o Drop-in replacement for Memcached
n Session state
n Cache of data retrieved from SOR
o Replica of SOR for queries needing high performance
o Miscellaneous yet important
n Counting using INCR command, e.g. hit counts
n Most recent N items - LPUSH and LTRIM
n Randomly selecting an item – SRANDMEMBER
n Queuing – Lists with LPOP, RPUSH, ….
n High score tables – Sorted sets and ZINCRBY
n …
o Notable users: github, guardian.co.uk, ….
22. Cassandra
o An Apache open-source project
o Developed by Facebook for inbox search
o Column-oriented database/Extensible row store
n The data model will hurt your brain
n Row = map or map of maps
o Fast writes = append to a log
o Extremely scalable
n Transparent and dynamic clustering
n Rack and datacenter aware data replication
o Tunable read/write consistency per operation
n Writes: any, one replica, quorum of replicas, …, all
n Read: one, quorum, …, all
o CQL = “SQL”-like DDL and DML
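The arithmetic behind tunable consistency can be sketched in a few lines. The helper below is hypothetical and is not a Cassandra API; it only encodes the usual rule of thumb that a quorum is a majority of the replicas, and that a read overlaps the most recent acknowledged write when the read and write replica counts add up to more than the replication factor.

```java
/** Hypothetical helper illustrating replica-count arithmetic; not a Cassandra API. */
class ConsistencyMath {
    /** A quorum is a majority of the replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must share at least one replica with every write set (R + W > N). */
    static boolean readOverlapsLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With three replicas, quorum writes plus quorum reads overlap, and so do write-all plus read-one; write-one plus read-one does not, which is the trade made for lower latency.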
23. Cassandra data model
My Column family (within a key space)
Keys Columns
a colA: value1 colB: value2 colC: value3
b colA: value colD: value colE: value
A column has a timestamp to …
o 4-D map: keySpace x key x columnFamily x column → value
o Arbitrary number of columns
o Column names are dynamic; can contain data
o Columns for a row are stored on disk in order determined by comparator
o One CF row = one DDD aggregate
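One way to picture the 4-D map is as nested Java maps. The sketch below is an in-memory stand-in, not Cassandra code: the class name `ColumnStoreSketch` is made up, natural `String` ordering stands in for the comparator, and the nesting order (key space, then column family, then row key) is a modelling choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for the keySpace x key x columnFamily x column -> value map. */
class ColumnStoreSketch {
    // keySpace -> columnFamily -> rowKey -> (column name -> value); columns stay sorted by name
    private final Map<String, Map<String, Map<String, SortedMap<String, String>>>> data = new HashMap<>();

    /** Inserting the same column twice just overwrites it, so repeating an insert is harmless. */
    void insert(String keySpace, String columnFamily, String rowKey, String column, String value) {
        row(keySpace, columnFamily, rowKey).put(column, value);
    }

    /** Returns the (possibly empty) sorted column map for one row, creating it on first use. */
    SortedMap<String, String> row(String keySpace, String columnFamily, String rowKey) {
        return data.computeIfAbsent(keySpace, k -> new HashMap<>())
                   .computeIfAbsent(columnFamily, k -> new HashMap<>())
                   .computeIfAbsent(rowKey, k -> new TreeMap<>());
    }
}
```

Each row carries its own set of column names, which is what "arbitrary number of columns" and "column names are dynamic" mean in practice.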
24. Cassandra data model – insert/update
My Column family (within a key space)
Keys Columns
a colA: value1 colB: value2 colC: value3
b colA: value colD: value colE: value
Transaction = updates to a row within a ColumnFamily
Insert(key=a, columnName=colZ, value=foo) (idempotent)
Keys Columns
a colA: value1 colB: value2 colC: value3 colZ: foo
b colA: value colD: value colE: value
25. Cassandra query example – slice
Keys Columns
a colA: value1 colB: value2 colC: value3 colZ: foo
b colA: value colD: value colE: value
slice(key=a, startColumn=colA, endColumnName=colC)
Keys Columns
a colA: value1 colB: value2
You can also do a rangeSlice which returns a range of keys – less efficient
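Because the columns of a row are kept in sorted order, a slice amounts to a range read over a sorted map. The sketch below uses a plain `TreeMap` rather than any Cassandra client; the end bound is treated as exclusive only so that the result matches the one shown on the slide, which is an assumption about the illustration and not a statement about the real API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory illustration of a column slice; not Cassandra client code. */
class SliceSketch {
    /** Returns the columns whose names fall in [start, end), read from an already sorted row. */
    static SortedMap<String, String> slice(TreeMap<String, String> row, String start, String end) {
        return new TreeMap<>(row.subMap(start, true, end, false));
    }
}
```

A slice touches one contiguous run of columns within one row, which is why it is cheaper than a rangeSlice across many keys.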
26. Super Column Families – one more dimension
My Column family (within a key space)
Keys Super columns
a ScA: {colA: value1, colB: value2} ScB: {colC: value3}
b colA: value colD: value colE: value
Insert(key=a, superColumn=scB, columnName=colZ, value=foo)
keySpace x key x columnFamily x superColumn x column -> value
Keys Super columns
a ScA: {colA: value1, colB: value2} ScB: {colC: value3, colZ: foo}
b colA: value colD: value colE: value
27. Getting data with super slice
My Column family (within a key space)
Keys Super columns
a ScA: {colA: value1, colB: value2} ScB: {colC: value3}
b colA: value colD: value colE: value
superSlice(key=a, startColumn=scB, endColumnName=scC)
Keys Super columns
a ScB: {colC: value3}
28. Cassandra CLI
$ bin/cassandra-cli -h localhost
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] list restaurantDetails;
Using default limit of 100
-------------------
RowKey: 1
=> (super_column=attributes,
(column=json, value={"id":1,"name":"Ajanta","menuItems"....
[default@Keyspace1] get restaurantDetails['1']['attributes'];
=> (column=json, value={"id":1,"name":"Ajanta","menuItems"....
29. Scaling Cassandra
• Client connects to any node
• Dynamically add/remove nodes
• Reads/Writes specify how many nodes
• Configurable # of replicas
  • adjacent nodes
  • rack and data center aware
[Diagram: ring of four nodes, each replicating to the next node on the ring]
Node 1: Token = A, Keys = [D, A]
Node 2: Token = B, Keys = [A, B]
Node 3: Token = C, Keys = [B, C]
Node 4: Token = D, Keys = [C, D]
30. Downsides of Cassandra
o Learning curve
o Still maturing, currently v0.8.4
o Limited queries, i.e. KV lookup
o Transactions limited to a column family row
o Lacks an easy to use API
31. Cassandra use cases
o Use cases
• Big data
• Multiple Data Center distributed database
• Persistent cache
• (Write intensive) Logging
• High-availability (writes)
o Who is using it
n Digg, Facebook, Twitter, Reddit, Rackspace
n Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX
n The largest production cluster has over 100 TB of data in over 150 machines. – Cassandra web site
32. MongoDB
o Document-oriented database
n JSON-style documents: Lists, Maps, primitives
n Documents organized into collections (~table)
n Schema-less
o Rich query language for dynamic queries
o Asynchronous, configurable writes:
n No wait
n Wait for replication
n Wait for write to disk
o Very fast
o Highly scalable and available:
n Replica sets (generalized master/slave)
n Sharding
n Transparent to client
33. Data Model = Binary JSON documents
{
  "name" : "Sahn Maru",
  "type" : "Korean",
  "serviceArea" : [
    "94619",
    "94618"
  ],
  "openingHours" : [
    {
      "dayOfWeek" : "Wednesday",
      "open" : 1730,
      "close" : 2230
    }
  ],
  "_id" : ObjectId("4bddc2f49d1505567c6220a0")
}
One document = one DDD aggregate
DBObject o = new BasicDBObject();
o.put("name", "Sahn Maru");
DBObject mi = new BasicDBObject();
mi.put("name", "Daeji Bulgogi");
…
List<DBObject> mis = Collections.singletonList(mi);
o.put("menuItems", mis);
o Sequence of bytes on disk = fast I/O
n No joins/seeks
n In-place updates when possible → no index updates
o Transaction = update of single document
35. MongoDB query by example
Find a restaurant that serves the 94619 zip code and is open at 6pm on a Monday
{
  serviceArea: "94619",
  openingHours: {
    $elemMatch: {
      "dayOfWeek": "Monday",
      "open": {$lte: 1800},
      "close": {$gte: 1800}
    }
  }
}
DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
  DBObject o = cursor.next();
  …
}
36. Scaling MongoDB
[Diagram: client → mongos → Shard 1 and Shard 2; each shard is one mongod master plus two mongod replicas; config server mongod processes sit alongside mongos]
A shard consists of a replica set = generalization of master slave
Collections spread over multiple shards
37. Mongo Downsides
o Server has a global write lock
n Single writer OR multiple readers
→ Long running queries block writers
o Great that writes are not synchronous
n BUT perhaps an asynchronous response would be better than a synchronous getLastError()
Interesting story: http://www.slideshare.net/eonnen/from-100s-to-100s-of-millions
38. MongoDB use cases
o Use cases
n High volume writes
n Complex data
n Semi-structured data
o Who is using it?
n Shutterfly, Foursquare
n Bit.ly, Intuit
n SourceForge, NY Times
n GILT Groupe, Evite
n SugarCRM
39. Other NoSQL databases
Type                                  Examples
Extensible columns/Column-oriented    HBase, SimpleDB
Graph                                 Neo4j
Key-value                             Membase
Document                              CouchDB
http://nosql-database.org/ lists 122+ NoSQL databases
40. Picking a database
Application requirement       Solution
Complex transactions/ACID     Relational database
Scaling                       NoSQL
Social data                   Graph database
Multiple datacenters          Cassandra
Highly-available writes       Cassandra
Flexible data                 Document store
High write volumes            Mongo, Cassandra
Super fast cache              Redis
Adhoc queries                 Relational or Mongo
…
http://highscalability.com/blog/2011/6/20/35-use-cases-for-choosing-your-next-nosql-database.html
41. Proceed with caution
o Don’t commit to a NoSQL DB until you have done a significant POC
o Encapsulate your data access code so you can switch
o Hope that one day you won’t need ACID
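Encapsulating data access usually means putting a store-neutral interface between the application and the database. The sketch below is a simplified, hypothetical version of that idea: `RestaurantFinder`, its method signature and the in-memory implementation are invented for illustration and are not the repository classes shown later in the deck.

```java
import java.util.ArrayList;
import java.util.List;

/** Store-neutral contract; simplified, hypothetical signature. */
interface RestaurantFinder {
    List<String> findAvailable(String zipCode, int dayOfWeek, int timeOfDay);
}

/** In-memory implementation, usable in tests or a proof of concept. */
class InMemoryRestaurantFinder implements RestaurantFinder {
    /** One entry per (restaurant, zip code, day, time range). */
    static final class Row {
        final String name;
        final String zip;
        final int day;
        final int open;
        final int close;

        Row(String name, String zip, int day, int open, int close) {
            this.name = name;
            this.zip = zip;
            this.day = day;
            this.open = open;
            this.close = close;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    void add(Row row) {
        rows.add(row);
    }

    @Override
    public List<String> findAvailable(String zipCode, int dayOfWeek, int timeOfDay) {
        List<String> names = new ArrayList<>();
        for (Row r : rows) {
            boolean matches = r.zip.equals(zipCode) && r.day == dayOfWeek
                    && r.open <= timeOfDay && timeOfDay <= r.close;
            if (matches && !names.contains(r.name)) {
                names.add(r.name);
            }
        }
        return names;
    }
}
```

Callers depend only on `RestaurantFinder`, so a Redis, Cassandra or MongoDB implementation could be swapped in without touching them.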
42. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
43. NoSQL Java APIs
Database     Libraries
Redis        Jedis, JRedis, JDBC-Redis, RJC
Cassandra    Raw Thrift if you are a masochist; Hector, …
MongoDB      MongoDB provides a Java driver
Some are not so easy to use
Stylistic differences
Boilerplate code
…
44. Spring Data Project Goals
Bring classic Spring value propositions to a wide range of NoSQL databases
→
n Productivity
n Programming model consistency: E.g. <NoSQL>Template classes
n “Portability”
http://www.springsource.org/spring-data
45. Spring Data sub-projects
§ Commons: Polyglot persistence
§ Key-Value: Redis, Riak
§ Document: MongoDB, CouchDB
§ Graph: Neo4j
§ GORM for NoSQL
§ Various milestone releases
§ Redis 1.0.0.M4 (July 20th, 2011)
§ Document 1.0.0.M2 (April 9, 2011)
§ Graph - Neo4j Support 1.0.0 (April 19, 2011)
§ …
47. Richer mapping
Annotations define mapping: @Document, @Id, @Indexed, @PersistenceConstructor, @CompoundIndex, @DBRef, @GeoSpatialIndexed, @Value
Map fields instead of properties → no getters or setters required
Non-default constructor
Index generation
@Document
public class Person {
  @Id
  private ObjectId id;
  private String firstname;
  @Indexed
  private String lastname;
  @PersistenceConstructor
  public Person(String firstname, String lastname) {
    this.firstname = firstname;
    this.lastname = lastname;
  }
  ….
}
48. Generic Mongo Repositories
interface PersonRepository extends MongoRepository<Person, ObjectId> {
List<Person> findByLastname(String lastName);
}
<beans>
<mongo:repositories
base-package="net.chrisrichardson.mongodb.example.mongorepository"
mongo-template-ref="mongoTemplate" />
</beans>
Person p = new Person("John", "Doe");
personRepository.save(p);
Person p2 = personRepository.findOne(p.getId());
List<Person> johnDoes = personRepository.findByLastname("Doe");
assertEquals(1, johnDoes.size());
49. Support for the QueryDSL project
QPerson: generated from domain model class
Type-safe composable queries
QPerson person = QPerson.person;
Predicate predicate =
    person.homeAddress.street1.eq("1 High Street")
        .and(person.firstname.eq("John"));
List<Person> people = personRepository.findAll(predicate);
assertEquals(1, people.size());
assertPersonEquals(p, people.get(0));
50. Cross-store/polyglot persistence
@Entity
public class Person {
  // In Database
  @Id private Long id;
  private String firstname;
  private String lastname;
  // In MongoDB
  @RelatedDocument private Address address;
}
Person person = new Person(…);
entityManager.persist(person);
Person p2 = entityManager.find(…)
{ "_id" : ObjectId("….."),
  "_entity_id" : NumberLong(1),
  "_entity_class" : "net.. Person",
  "_entity_field_name" : "address",
  "zip" : "94611", "street1" : "1 High Street", …}
51. Agenda
o Why NoSQL?
o Overview of NoSQL databases
o Introduction to Spring Data
o Case study: POJOs in Action & NoSQL
52. Food to Go – placing a takeout order
o Customer enters delivery address and delivery time
o System displays available restaurants = restaurants that serve the zip code of the delivery address AND are open at the delivery time
class Restaurant {
  long id;
  String name;
  Set<String> serviceArea;
  Set<TimeRange> openingHours;
  List<MenuItem> menuItems;
}
class TimeRange {
  long id;
  int dayOfWeek;
  int openingTime;
  int closingTime;
}
class MenuItem {
  String name;
  double price;
}
54. Finding available restaurants on Monday, 7:30pm for 94619 zip
Straightforward three-way join
select r.*
from restaurant r
inner join restaurant_time_range tr
  on r.id = tr.restaurant_id
inner join restaurant_zipcode sa
  on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1930
  and 1930 <= tr.closingtime
55. Redis - Persisting restaurants is “easy”
Multiple key-value pairs
rest:1:details [ name: “Ajanta”, … ]
rest:1:serviceArea [ “94619”, “94611”, …]
rest:1:openingHours [10, 11]
timerange:10 [“dayOfWeek”: “Monday”, ..]
timerange:11 [“dayOfWeek”: “Tuesday”, ..]
OR
Single KV hash
rest:1 [ name: “Ajanta”,
“serviceArea:0” : “94611”, “serviceArea:1” : “94619”,
“menuItem:0:name” : “Chicken Vindaloo”,
…]
OR
Single KV String
rest:1 { .. A BIG STRING/BYTE ARRAY, E.G. JSON }
56. BUT…
o … we can only retrieve them via primary key
→ We need to implement indexes
→ Queries, instead of the data model, drive NoSQL database design
o But how can a key-value store support a query that has:
n A 3-way join
n Multiple =
n > and <
57. Simplification #1: Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
AND zip_code = 94619
AND 1815 < close_time
AND open_time < 1815
Simpler query:
§ No joins
§ Two = and two <
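Denormalizing means writing one row for every combination of opening-hours entry and zip code. A minimal sketch of that expansion follows; the class `Denormalizer`, its nested `TimeRange` type and the space-separated row format are hypothetical and chosen only to mirror the table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that expands a restaurant into denormalized rows. */
class Denormalizer {
    /** Simplified opening-hours entry; field names are illustrative. */
    static final class TimeRange {
        final String dayOfWeek;
        final int open;
        final int close;

        TimeRange(String dayOfWeek, int open, int close) {
            this.dayOfWeek = dayOfWeek;
            this.open = open;
            this.close = close;
        }
    }

    /** Emits one "id day open close zip" row per (time range, zip code) pair. */
    static List<String> rows(long restaurantId, List<TimeRange> openingHours, List<String> serviceArea) {
        List<String> rows = new ArrayList<>();
        for (TimeRange tr : openingHours) {
            for (String zip : serviceArea) {
                rows.add(String.format(Locale.ROOT, "%d %s %04d %04d %s",
                        restaurantId, tr.dayOfWeek, tr.open, tr.close, zip));
            }
        }
        return rows;
    }
}
```

A restaurant with two opening periods and two zip codes produces four rows, so the write cost grows with the product of the two collections.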
58. Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
AND zip_code = 94619
AND 1815 < close_time
-- open_time < 1815 is checked by the application on the returned rows
Even simpler query:
• No joins
• Two = and one <
59. Simplification #3: Eliminate multiple
=’s with concatenation
Restaurant_id Zip_dow Open_time Close_time
1 94707:Monday 1130 1430
1 94619:Monday 1130 1430
1 94707:Monday 1730 2130
1 94619:Monday 1730 2130
2 94619:Monday 0700 1430
…
SELECT …
FROM time_range_zip_code
WHERE zip_code_day_of_week = '94619:Monday' (key)
AND 1815 < close_time (range)
60. Sorted sets support range queries
Key Sorted Set [ Entry:Score, …]
94707:Monday [1130_1:1430, 1730_1:2130]
94619:Monday [0700_2:1430, 1130_1:1430, 1730_1:2130]
Key: zipCode:dayOfWeek
Member: OpeningTime_RestaurantId
Score: ClosingTime
ZRANGEBYSCORE 94619:Monday 1815 2359
→ {1730_1}
1730 is before 1815 → Ajanta is open
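The index and the two-step lookup can be simulated without a Redis server. The sketch below is an in-memory stand-in: `AvailabilityIndex` is a made-up class, `add` plays the role of ZADD and the `subMap` call plays the role of ZRANGEBYSCORE. Unlike a real sorted set, it does not enforce that a member has only one score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for the sorted-set index; not a Redis client. */
class AvailabilityIndex {
    // "zip:day" -> closing time (score) -> members "openingTime_restaurantId"
    private final Map<String, TreeMap<Integer, TreeSet<String>>> sets = new HashMap<>();

    /** Plays the role of ZADD zip:day closingTime openingTime_restaurantId. */
    void add(String zip, String day, int open, int close, long restaurantId) {
        String member = String.format(Locale.ROOT, "%04d_%d", open, restaurantId);
        sets.computeIfAbsent(zip + ":" + day, k -> new TreeMap<>())
            .computeIfAbsent(close, k -> new TreeSet<>())
            .add(member);
    }

    /** Range query on closing time, then an application-side filter on opening time. */
    List<String> openRestaurantIds(String zip, String day, int time) {
        List<String> ids = new ArrayList<>();
        TreeMap<Integer, TreeSet<String>> set = sets.get(zip + ":" + day);
        if (set == null) {
            return ids;
        }
        String paddedTime = String.format(Locale.ROOT, "%04d", time);
        for (TreeSet<String> members : set.subMap(time, true, 2359, true).values()) {
            for (String member : members) {
                // The first four characters are the zero-padded opening time
                if (member.substring(0, 4).compareTo(paddedTime) <= 0) {
                    ids.add(member.substring(member.lastIndexOf('_') + 1));
                }
            }
        }
        return ids;
    }
}
```

The range query handles the closing-time condition inside the store, while the opening-time condition is applied to the handful of members that come back.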
61. What did I just do to query the data?
62. What did I just do to query the data?
o Wrote code to maintain an index
o Reduced performance due to extra writes
63. RedisTemplate-based code
@Repository
public class AvailableRestaurantRepositoryRedisImpl implements AvailableRestaurantRepository {

  // Field injection cannot assign a final field, so the field is not final
  @Autowired private StringRedisTemplate redisTemplate;

  // Sorted set for one zipCode:dayOfWeek key; the score is the closing time
  private BoundZSetOperations<String, String> closingTimes(int dayOfWeek, String zipCode) {
    return redisTemplate.boundZSetOps(AvailableRestaurantKeys.closingTimesKey(dayOfWeek, zipCode));
  }

  public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
    String zipCode = deliveryAddress.getZip();
    int timeOfDay = timeOfDay(deliveryTime);
    int dayOfWeek = dayOfWeek(deliveryTime);

    // Time ranges that close at or after the delivery time
    Set<String> closingTrs = closingTimes(dayOfWeek, zipCode).rangeByScore(timeOfDay, 2359);

    // Application-side filter: keep the ranges that opened at or before the delivery time
    Set<String> restaurantIds = new HashSet<String>();
    String paddedTimeOfDay = FormattingUtil.format4(timeOfDay);
    for (String trId : closingTrs) {
      if (trId.substring(0, 4).compareTo(paddedTimeOfDay) <= 0)
        restaurantIds.add(StringUtils.substringAfterLast(trId, "_"));
    }

    // Fetch the JSON for the matching restaurants in one round trip
    Collection<String> jsonForRestaurants =
        redisTemplate.opsForValue().multiGet(AvailableRestaurantKeys.timeRangeRestaurantInfoKeys(restaurantIds));
    List<AvailableRestaurant> restaurants = new ArrayList<AvailableRestaurant>();
    for (String json : jsonForRestaurants) {
      restaurants.add(AvailableRestaurant.fromJson(json));
    }
    return restaurants;
  }
}
64. Redis – Spring configuration
@Configuration
public class RedisConfiguration extends AbstractDatabaseConfig {
@Bean
public RedisConnectionFactory jedisConnectionFactory() {
JedisConnectionFactory factory = new JedisConnectionFactory();
factory.setHostName(databaseHostName);
factory.setPort(6379);
factory.setUsePool(true);
JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxActive(1000);
factory.setPoolConfig(poolConfig);
return factory;
}
@Bean
public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
StringRedisTemplate template = new StringRedisTemplate();
template.setConnectionFactory(factory);
return template;
}
}
65. Cassandra: Easy to store
restaurants
Column Family: RestaurantDetails
Keys Columns
1 name: Ajanta type: Indian …
2 name: Montclair Egg Shop type: Breakfast …
OR
Column Family: RestaurantDetails
Keys Columns
1 details: { JSON DOCUMENT }
2 details: { JSON DOCUMENT }
66. Querying using Cassandra
o Similar challenges to using Redis
o Limited querying options
n Row key – exact or range
n Column name – exact or range
o Use composite/concatenated keys
n Prefix - equality match
n Suffix - can be range scan
o No joins → denormalize
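The composite-key idea, equality on the prefix and a range scan on the suffix, can be shown with an ordinary sorted map. The sketch below is not Cassandra code; `CompositeKeyScan` and the `zip:day:closingTime` key layout are assumptions, and the range only behaves correctly because the suffix is fixed-width and zero-padded.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory illustration of prefix-equality plus suffix-range lookups. */
class CompositeKeyScan {
    /** Returns the entries whose key lies in [prefix:from, prefix:to], both ends included. */
    static SortedMap<String, String> scan(TreeMap<String, String> sorted,
                                          String prefix, String from, String to) {
        return sorted.subMap(prefix + ":" + from, true, prefix + ":" + to, true);
    }
}
```

Every entry that shares the prefix sits in one contiguous run of the sorted keys, so the store never has to examine entries for other zip codes or days.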
67. Cassandra: Find restaurants that close after the delivery time and then filter
Keys Super Columns
94619:Mon 1430: {0700_2: JSON FOR EGG} 1430: {1130_1: JSON FOR AJANTA} 2130: {1730_1: JSON FOR AJANTA}
SuperSlice
key = 94619:Mon
SliceStart = 1815
SliceEnd = 2359
Keys Super Columns
94619:Mon 2130: {1730_1: JSON FOR AJANTA}
18:15 is after 17:30 => {Ajanta}
68. Cassandra/Hector code
import me.prettyprint.hector.api.Cluster;

public class CassandraHelper {

  // Field injection cannot assign a final field, so the field is not final
  @Autowired private Cluster cluster;

  public <T> List<T> getSuperSlice(String keyspace, String columnFamily,
      String key, String sliceStart, String sliceEnd,
      SuperSliceResultMapper<T> resultMapper) {
    SuperSliceQuery<String, String, String, String> q =
        HFactory.createSuperSliceQuery(HFactory.createKeyspace(keyspace, cluster),
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    q.setColumnFamily(columnFamily);
    q.setKey(key);
    // Range of super column names, not reversed, at most 10000 results
    q.setRange(sliceStart, sliceEnd, false, 10000);
    QueryResult<SuperSlice<String, String, String>> qr = q.execute();
    SuperColumnRowProcessor<T> rowProcessor = new SuperColumnRowProcessor<T>(resultMapper);
    for (HSuperColumn<String, String, String> superColumn : qr.get().getSuperColumns()) {
      List<HColumn<String, String>> columns = superColumn.getColumns();
      rowProcessor.processRow(key, superColumn.getName(), columns);
    }
    return rowProcessor.getResult();
  }
}
69. MongoDB = easy to store
{
"_id": "1234",
"name": "Ajanta",
"serviceArea": ["94619", "99999"],
"openingHours": [
{
"dayOfWeek": 1,
"open": 1130,
"close": 1430
},
{
"dayOfWeek": 2,
"open": 1130,
"close": 1430
},
…
]
}
70. MongoDB = easy to query
{
  "serviceArea": "94619",
  "openingHours": {
    "$elemMatch": {
      "open": { "$lte": 1815 },
      "dayOfWeek": 4,
      "close": { "$gte": 1815 }
    }
  }
}
db.availableRestaurants.ensureIndex({serviceArea: 1})
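The reason for `$elemMatch` is that all three conditions must hold for the same array element, not for different elements of `openingHours`. The plain-Java sketch below mimics that semantics in memory; `ElemMatchSketch` is a made-up class, and it assumes every entry carries `dayOfWeek`, `open` and `close` keys.

```java
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of element-wise matching; not MongoDB driver code. */
class ElemMatchSketch {
    /** True when one single opening-hours entry satisfies all three conditions at once. */
    static boolean isOpen(List<Map<String, Integer>> openingHours, int dayOfWeek, int time) {
        for (Map<String, Integer> tr : openingHours) {
            if (tr.get("dayOfWeek") == dayOfWeek
                    && tr.get("open") <= time
                    && tr.get("close") >= time) {
                return true;
            }
        }
        return false;
    }
}
```

A restaurant open on day 4 at lunchtime and on day 5 in the evening is not open on day 4 at 18:15, even though each condition is satisfied by some entry.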
71. MongoTemplate-based code
@Repository
public class AvailableRestaurantRepositoryMongoDbImpl
    implements AvailableRestaurantRepository {

  // Field injection cannot assign a final field, so the field is not final
  @Autowired private MongoTemplate mongoTemplate;

  @Override
  public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress,
      Date deliveryTime) {
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
    int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
    // One opening-hours entry must match the day and bracket the delivery time
    Query query = new Query(where("serviceArea").is(deliveryAddress.getZip())
        .and("openingHours").elemMatch(where("dayOfWeek").is(dayOfWeek)
            .and("openingTime").lte(timeOfDay)
            .and("closingTime").gte(timeOfDay)));
    return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query,
        AvailableRestaurant.class);
  }
}
mongoTemplate.ensureIndex("availableRestaurants",
    new Index().on("serviceArea", Order.ASCENDING));
72. MongoDB – Spring Configuration
@Configuration
public class MongoConfig extends AbstractDatabaseConfig {
private @Value("#{mongoDbProperties.databaseName}")
String mongoDbDatabase;
public @Bean MongoFactoryBean mongo() {
MongoFactoryBean factory = new MongoFactoryBean();
factory.setHost(databaseHostName);
MongoOptions options = new MongoOptions();
options.connectionsPerHost = 500;
factory.setMongoOptions(options);
return factory;
}
public @Bean
MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
mongoTemplate.setWriteConcern(WriteConcern.SAFE);
mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
return mongoTemplate;
}
}
73. Summary
o Relational databases are great but
n Object/relational impedance mismatch
n Relational schema is rigid
n Extremely difficult/impossible to scale writes
n Performance can be suboptimal
o Each NoSQL database can solve some combination of those problems BUT
n Limited transactions
n One day needing ACID → major rewrite
n Query-driven, denormalized database design
n …
→
o Carefully pick the NoSQL DB for your application
o Consider a polyglot persistence architecture
74. Thank you!
My contact info:
chris@chrisrichardson.net
@crichardson