This document provides an introduction to CouchDB on Rails. It discusses CouchDB's support for high availability, easy replication, clustering, and short recovery times. CouchDB uses JSON documents and JavaScript for querying and aggregating data. Views allow indexing and querying documents. SimplyStored provides an ActiveRecord-like interface for CouchDB. RockingChair provides an in-memory version of CouchDB for testing. The document also covers CouchDB replication, load balancing, caching, sharding, and other features.
MongoDB + Java - Everything you need to know (Norberto Leite)
Learn everything you need to know to get started building a MongoDB-based app in Java. We'll explore the relationship between MongoDB and various languages on the Java Virtual Machine such as Java, Scala, and Clojure. From there, we'll examine the popular frameworks and integration points between MongoDB and the JVM including Spring Data and object-document mappers like Morphia.
This talk focuses on tuning, analysing, and optimizing MongoDB queries and indexes using the Database Profiler and the explain() function.
Database performance can also be improved by configuring the underlying (Linux) OS with recommended settings that are not applied by default.
EWD 3 Training Course Part 21: Persistent JavaScript Objects (Rob Tweed)
This presentation is Part 21 of the EWD 3 Training Course. It explains how Document Node objects and their $() function allow Persistent JavaScript Objects to be abstracted from Global Storage.
EWD 3 Training Course Part 24: Traversing a Document's Leaf Nodes (Rob Tweed)
This presentation is Part 24 of the EWD 3 Training Course. It examines another way to iterate through Global Storage, via its leaf nodes. In some situations this can be a faster and more efficient technique.
After a short introduction to the Java driver for MongoDB, we'll have a look at the more abstract persistence frameworks like Morphia, Spring Data, Jongo and Hibernate OGM.
For developers new to MongoDB and Node.js, however, some of the common design patterns are very different from those of an RDBMS and traditional synchronous languages. Developers learning these technologies together may find it a bit bewildering. In reality, however, these tools fit perfectly together and enable a high degree of developer productivity and application performance.
This webinar will walk developers through common MongoDB development patterns in Node.js, such as efficiently loading data into MongoDB using MongoDB's bulk API, iterating through query results, and managing simultaneous asynchronous MongoDB queries to provide the best possible application performance. Working Node.js and MongoDB examples will be used throughout the presentation.
Mythbusting: Understanding How We Measure the Performance of MongoDB (MongoDB)
Benchmarking, benchmarking, benchmarking. We all do it; mostly it tells us what we want to hear, but it often hides a mountain of misinformation. In this talk we will walk through the pitfalls that you might find yourself in by looking at some examples where things go wrong. We will then walk through how MongoDB performance is measured: the processes, the methodology, and ways to present and look at the information.
MongoDB Europe 2016 - Debugging MongoDB Performance (MongoDB)
Asya is back, and so is Sherlock Holmes and his techniques to gather and analyze data from your poorly performing MongoDB clusters. In this advanced talk we take a deep look at all the diagnostic data that lives inside MongoDB - how to interrogate and interpret it to help you solve those frustrating performance bottlenecks that we all face occasionally.
This presentation will demonstrate how you can use the aggregation pipeline with MongoDB, similar to how you would use GROUP BY in SQL, along with the new stage operators coming in 3.4. MongoDB’s Aggregation Framework has many operators that give you the ability to get more value out of your data, discover usage patterns within your data, or use the Aggregation Framework to power your application. Considerations regarding version, indexing, operators, and saving the output will be reviewed.
Just a few years ago all software systems were designed to be monoliths running on a single big and powerful machine. But nowadays most companies desire to scale out instead of scaling up, because it is much easier to buy or rent a large cluster of commodity hardware than to get a single machine that is powerful enough. In the database area, scaling out is realized by a combination of polyglot persistence and sharding of data. On the application level, scaling out is realized by microservices. In this talk I will briefly introduce the concepts and ideas of microservices and discuss their benefits and drawbacks. Afterwards I will focus on the point of intersection of a microservice-based application talking to one or many NoSQL databases. We will try to find answers to these questions: What are the differences to a monolithic application? How do we scale the whole system properly? What about polyglot persistence? Is there a data-centric way to split microservices?
Map/Confused? A practical approach to Map/Reduce with MongoDB (Uwe Printz)
Talk given at MongoDB Munich on 16.10.2012 about the different approaches in MongoDB to using the Map/Reduce algorithm. The talk compares the performance of built-in MongoDB Map/Reduce, group(), aggregate(), find() and the MongoDB-Hadoop Adapter using a practical use case.
CouchDB Mobile - From Couch to 5K in 1 Hour (Peter Friese)
In this talk, I explain how to use CouchDB Mobile to connect your iPhone or Android phone with a remote CouchDB to build a RunKeeper clone. The code for this talk is available at https://github.com/peterfriese/CouchTo5K
These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems.
Mike O’Brian from 10gen introduced the syntax and usage patterns for a new aggregation system in MongoDB and gave some demonstrations of aggregation using the new system. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities.
For more information, visit our website at http://casertaconcepts.com/ or email us at info@casertaconcepts.com.
Apache Spark for Library Developers with William Benton and Erik Erlandson (Databricks)
As a developer, data engineer, or data scientist, you’ve seen how Apache Spark is expressive enough to let you solve problems elegantly and efficient enough to let you scale out to handle more data. However, if you’re solving the same problems again and again, you probably want to capture and distribute your solutions so that you can focus on new problems and so other people can reuse and remix them: you want to develop a library that extends Spark.
You faced a learning curve when you first started using Spark, and you'll face a different learning curve as you start to develop reusable abstractions atop Spark. In this talk, two experienced Spark library developers will give you the background and context you'll need to turn your code into a library that you can share with the world. We'll cover: issues to consider when developing parallel algorithms with Spark; designing generic, robust functions that operate on data frames and datasets; extending data frames with user-defined functions (UDFs) and user-defined aggregates (UDAFs); best practices around caching and broadcasting, and why these are especially important for library developers; integrating with ML pipelines; exposing key functionality in both Python and Scala; and how to test, build, and publish your library for the community.
We’ll back up our advice with concrete examples from real packages built atop Spark. You’ll leave this talk informed and inspired to take your Spark proficiency to the next level and develop and publish an awesome library of your own.
The CAP theorem is widely known for distributed systems, but it's not the only tradeoff you should be aware of. For datastores, there is also the FAB theory and just like with the CAP theorem you can only pick two:
Fast: Results are real-time or near real-time instead of batch-oriented.
Accurate: Answers are exact and don't have a margin of error.
Big: You require horizontal scaling and need to distribute your data.
While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of accuracy tradeoffs Elasticsearch can take for terms aggregations, cardinality aggregations with HyperLogLog++, and the IDF part of the full-text search. Or how to trade some speed or the distribution for more accuracy.
Demo presentation given at the Semantic Web Applications and Tools for Life Science (SWAT4LS) 2014 meeting in Berlin, Dec 10, 2014. http://www.swat4ls.org/workshops/berlin2014/scientific-programme/
Webinar: Data Processing and Aggregation Options (MongoDB)
MongoDB scales easily to store mass volumes of data. However, when it comes to making sense of it all, what options do you have? In this talk, we'll take a look at three different ways of aggregating your data with MongoDB and determine the reasons why you might choose one way over another. No matter what your big data needs are, you will find out how MongoDB, the big data store, is evolving to help make sense of your data.
Solutions for bi-directional Integration between Oracle RDBMS & Apache Kafka (Guido Schmutz)
A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. Today’s enterprises often have their core systems implemented on top of relational databases, such as the Oracle RDBMS. Implementing a new solution supporting the digital strategy using Kafka and its ecosystem cannot always be done completely separately from the traditional legacy solutions. Often streaming data has to be enriched with state data which is held in the RDBMS of a legacy application. It’s important to cache this data in the stream processing solution, so that it can be efficiently joined to the data stream. But how do we make sure that the cache is kept up-to-date if the source data changes? We can either poll for changes from Kafka using Kafka Connect or let the RDBMS push the data changes to Kafka. But what about writing data back to the legacy application, i.e. when an anomaly detected inside the stream processing solution should trigger an action inside the legacy application? Using Kafka Connect we can write to a database table or view, which could trigger the action. But this is not always the best option. If you have an Oracle RDBMS, there are many other ways to integrate the database with Kafka, such as Advanced Queueing (a message broker in the database), CDC through GoldenGate or Debezium, Oracle REST Data Services (ORDS) and more. In this session, we present various blueprints for integrating an Oracle RDBMS with Apache Kafka in both directions and discuss how these blueprints can be implemented using the products mentioned before.
Basic Concepts. Webinar 4: Advanced Indexing, Text and Geospatial Indexes (MongoDB)
This is the fourth webinar in the Basic Concepts series, which provides an introduction to the MongoDB database. This webinar looks at support for free-text and geospatial indexes.
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka (Guido Schmutz)
Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics. It is the architectural backbone for integrating streaming data with a Data Lake, Microservices and Stream Processing. Data sources flowing into Kafka are often native data streams such as social media streams, telemetry data, financial transactions and many others. But these data streams contain only part of the information. A lot of data necessary in stream processing is stored in traditional systems backed by relational databases. To implement new and modern, real-time solutions, an up-to-date view of that information is needed. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka as soon as possible, in near real-time? This session will present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate and bridging Kafka with Oracle Advanced Queuing (AQ).
AWS OpsWorks & Chef at the Hamburg Chef User Group 2014 (Jonathan Weiss)
An introduction to AWS OpsWorks and how it uses Chef. Differences between OpsWorks and Chef server.
Presented by Jonathan Weiss on January 14th 2014 at the Hamburg Chef User Group.
An introduction to Backbone.js – a lightweight MVC framework. Backbone supplies structure to JavaScript-heavy applications by providing models with key-value binding and custom events, collections with a rich API of enumerable functions, and views with declarative event handling, and connects it all to your existing application over a RESTful JSON interface.
Build your own clouds with Chef and MCollective (Jonathan Weiss)
One important part of the DevOps movement is infrastructure automation, especially if you are running your application on top of services like Amazon EC2.
Everybody's dream is to be able to bootstrap and deploy hundreds or even thousands of machines with a few simple commands. This talk will tell you how you can do this using open-source tools like Chef and MCollective. Chef manages your servers' configuration using a nice Ruby DSL, while MCollective orchestrates and commands all your nodes.
Platforms like Amazon EC2 promise scalable and redundant systems for a couple of pennies. As soon as you start to build complex systems or migrate existing apps there are many knobs to set. This talk will explain how you can create and deploy reliable and redundant applications to EC2 and will point out all the little things you need to know, like how to automatically provision new servers with tools like Chef.
Presented by Jonathan Weiss at PHP UK Conference 2011 in London.
Overview of how to manage deployments and clusters in the Amazon cloud. Introduction into Chef. Presented by Jonathan Weiss at RailsCamp DE in Cologne.
Rails in the Cloud - Experiences from running on EC2 (Jonathan Weiss)
Overview of architectures in EC2 and services like EBS, ELB, RDS, and ElasticIPs. How to get your app on EC2. Configuration and deployment with Chef. Presented by Jonathan Weiss at RailsWayCon 2010 in Berlin
Jonathan Weiss gives an overview of the NoSQL databases. Why would you consider one, what are the tradeoffs? Given at BarCampRuhr3 2010 in Essen, Germany (Slides are English).
Ruby on CouchDB - SimplyStored and RockingChair (Jonathan Weiss)
Presentation by Jonathan Weiss about Ruby on CouchDB at the Ruby User Group Berlin in March 2010. It presents SimplyStored, a nice Ruby object wrapper for CouchDB, and RockingChair, an in-memory CouchDB for speeding up your tests.
2. Who am I?
Working for Peritor in Berlin, Germany
Wrote, maintain, or am involved in
Webistrano
Capistrano
SimplyStored
Happening
The great fire of London
http://github.com/jweiss
@jweiss
30. Example Map Result
Map functions are similar to SQL indices
ID          KEY       VALUE
51ABFA211   Cap       1
ABC123456   Cappy     1
BCCD12CBB   Helmet    1
BCCD12CBB   Sombrero  1
Sorted by the key
Key can also be an array
Value can be complex JSON
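A map function for a view like this is written in JavaScript and stored in a design document. As a minimal, hedged sketch (not code from the talk), here is how such a design document could be created with Ruby's standard library; the database name "dbname", the doc.name field, and a CouchDB listening on 127.0.0.1:5984 are assumptions:

# Hedged sketch: PUT a design document whose JavaScript map function
# emits one row per hat name, producing a result like the table above.
require 'net/http'
require 'json'
require 'uri'

design_doc = {
  "_id"   => "_design/hats",
  "views" => {
    "all" => {
      # The map function runs inside CouchDB and is written in JavaScript.
      "map" => "function(doc) { if (doc.name) { emit(doc.name, 1); } }"
    }
  }
}

uri = URI("http://127.0.0.1:5984/dbname/_design/hats")
request = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
request.body = design_doc.to_json
response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
puts response.body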
31. Query a view
GET /dbname/_design/hats/_view/all
{"total_rows":348,"offset":0,"rows”:[
{"id":"A","key":"A","value":1},
{"id":"B","key":"B","value":1},
]}
32. Query a view
GET /dbname/_design/hats/_view/all?include_docs=true
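Both queries are plain HTTP GETs, so any HTTP client works. A minimal sketch using only Ruby's standard library (host, port and database name are assumptions):

# Hedged sketch: query the view from the two slides above.
require 'net/http'
require 'json'
require 'uri'

base = "http://127.0.0.1:5984/dbname/_design/hats/_view/all"

# Plain view query: each row carries id, key and value.
JSON.parse(Net::HTTP.get(URI(base)))["rows"].each do |row|
  puts "#{row['key']} => #{row['value']}"
end

# With include_docs=true: each row also carries the full document.
JSON.parse(Net::HTTP.get(URI("#{base}?include_docs=true")))["rows"].each do |row|
  puts row["doc"].inspect
end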
41. RockingChair
In-memory CouchDB
Just a big Hash
Understands all SimplyStored generated views
Speeds up tests
Tests can run in parallel
Nice for debugging
BSD-licensed, at http://github.com/jweiss/rocking_chair
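To make the speed-up concrete, here is a hedged sketch of what a test using RockingChair could look like. The enable!/disable! toggle and the Hat model are assumed names for illustration, not confirmed API; check the README at the URL above:

# Hedged sketch: run a test against the in-memory CouchDB.
# RockingChair.enable!/disable! and the SimplyStored-style Hat model
# are assumptions - consult http://github.com/jweiss/rocking_chair.
require 'rocking_chair'
require 'test/unit'

class HatTest < Test::Unit::TestCase
  def setup
    RockingChair.enable!   # assumed: route CouchDB calls to the in-memory hash
  end

  def teardown
    RockingChair.disable!  # assumed: restore the real HTTP layer
  end

  def test_round_trip
    hat = Hat.create(:name => "Sombrero")
    assert_equal "Sombrero", Hat.find(hat.id).name
  end
end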