The document introduces MongoDB, an open-source document database that provides high performance, high availability, and easy scalability. MongoDB keeps data as JSON-like documents which allows for flexible schemas and is well-suited for applications that work with unstructured or semi-structured data. The document also discusses how MongoDB can be used in conjunction with Hadoop for large-scale data processing and analytics workloads that require more than just a document database.
- Problems with traditional data centers.
- Cloud computing definition, deployment, and services models.
- Essential characteristics of cloud services.
- IaaS examples.
- PaaS examples.
- SaaS examples.
- Cloud enabling technologies such as grid computing, utility computing, service oriented architecture (SOA), The Internet, Multi-tenancy, Web 2.0, Automation and Virtualization.
Presentation from Embedded Linux Conference 2015 in Dublin, where Tieto presented a concept of an Intelligent Home IoT Gateway. The session received very good feedback. Authors: Andrzej Wieczorek and Bartosz Markowski
www.tieto.com
Internet of Things with Cloud Computing and M2M Communication, by Sherin C Abraham
The IoT is the network of physical objects with intelligence. It can be made more secure with the MQTT protocol for machine-to-machine communication, and more storage capability can be achieved by using cloud computing.
Internet of Things, Innovation and India, by Syam Madanapalli
The presentation defines the Internet of Things for a layperson, who should be able to relate the IoT to daily life. It also covers how to deploy IoT services and how to innovate to develop new use cases and applications. Additionally, it provides a few considerations for building a new IoT product or solution.
Introduction to Big Data & Hadoop Architecture - Module 1, by Rohit Agrawal
Learning Objectives - In this module, you will understand what Big Data is, the limitations of existing solutions to the Big Data problem, how Hadoop solves the Big Data problem, the common Hadoop ecosystem components, the Hadoop architecture, HDFS and the MapReduce framework, and the anatomy of a file write and read.
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. This presentation covers edge computing from its definition, needs, and terminology to its merits, demerits, and application use cases.
A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
Edge computing allows data produced by internet of things (IoT) devices to be processed closer to where it is created instead of sending it across long routes to data centers or clouds.
Doing this computing closer to the edge of the network lets organizations analyze important data in near real-time – a need of organizations across many industries, including manufacturing, health care, telecommunications and finance. Edge computing deployments are ideal in a variety of circumstances. One is when IoT devices have poor connectivity and it’s not efficient for IoT devices to be constantly connected to a central cloud.
Other use cases have to do with latency-sensitive processing of information. Edge computing reduces latency because data does not have to traverse over a network to a data center or cloud for processing. This is ideal for situations where latencies of milliseconds can be untenable, such as in financial services or manufacturing.
MongoDB, Hadoop and humongous data - MongoSV 2012, by Steven Francia
Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Using tools like MapReduce, Pig and Streaming you will learn how to do analytics and ETL on large datasets with the ability to load and save data against MongoDB. With Hadoop MapReduce, Java and Scala programmers will find a native solution for using MapReduce to process their data with MongoDB. Programmers of all kinds will find a new way to work with ETL using Pig to extract and analyze large datasets and persist the results to MongoDB. Python and Ruby Programmers can rejoice as well in a new way to write native Mongo MapReduce using the Hadoop Streaming interfaces.
This presentation was given at the LDS Tech SORT Conference 2011 in Salt Lake City. The slides are quite comprehensive, covering many topics on MongoDB. Rather than a traditional presentation, this was presented as more of a Q & A session. Topics covered include: introduction to MongoDB, use cases, schema design, high availability (replication), and horizontal scaling (sharding).
This tutorial will introduce the features of MongoDB by building a simple location-based application using MongoDB. The tutorial will cover the basics of MongoDB’s document model, query language, map-reduce framework and deployment architecture.
The tutorial will be divided into 5 sections:
- Data modeling with MongoDB: documents, collections and databases
- Querying your data: simple queries, geospatial queries, and text-searching
- Writes and updates: using MongoDB’s atomic update modifiers
- Trending and analytics: using mapreduce and MongoDB’s aggregation framework
- Deploying the sample application
Besides the knowledge to start building their own applications with MongoDB, attendees will finish the session with a working application they use to check into locations around Portland from any HTML5 enabled phone!
TUTORIAL PREREQUISITES
Each attendee should have a running version of MongoDB, preferably the latest unstable release 2.1.x, but any install after 2.0 should be fine. You can download MongoDB at http://www.mongodb.org/downloads.
Instructions for installing MongoDB are at http://docs.mongodb.org/manual/installation/.
Additionally, we will be building an app in Ruby. Ruby 1.9.3+ is required for this. The current latest version of Ruby is 1.9.3-p194.
For Windows, download from http://rubyinstaller.org/
For OS X, download from http://unfiniti.com/software/mac/jewelrybox/
For Linux, most users should know how to install Ruby for their own distributions.
We will be using the following GEMs and they MUST BE installed ahead of time so you can be ahead of the game and safe in the event that the Internet isn’t accommodating.
bson (1.6.4)
bson_ext (1.6.4)
haml (3.1.4)
mongo (1.6.4)
rack (1.4.1)
rack-protection (1.2.0)
rack shotgun (0.9)
sinatra (1.3.2)
tilt (1.3.3)
Prior ruby experience isn’t required for this. We will NOT be using rails for this app.
MongoDB presentation for NYC Python's June meetup. Brief discussion on non-relational databases in general followed by an example of using MongoDB as a blog's backend
Presentation on MongoDB given at the Hadoop DC meetup in October 2009. Some of the slides at the end are extra examples that didn't appear in the talk, but might be of interest.
State of the Gopher Nation - Golang - August 2017, by Steven Francia
This talk is an overview of the Go project. It covers “what we’ve done”, “why we did it” and “where we are going” as a project.
It highlights our accomplishments, challenges and how the Go Project is working on our challenges.
The Future of the Operating System - Keynote LinuxCon 2015, by Steven Francia
Linux has become the foundation for infrastructure everywhere as it defined application portability from the desktop to the phone and from the data center to the cloud. As applications become increasingly distributed in nature, the Docker platform serves as the cornerstone of Linux’s evolution, solidifying the dominance of Linux today and into tomorrow.
Given as a Keynote at LinuxCon 2015 in Tokyo
Given at GopherFest 2015. This is an updated version of the talk I gave in NYC Nov 14 at GothamGo.
“We need to think about failure differently. Most people think mistakes are a necessary evil. Mistakes aren't a necessary evil, they aren't evil at all. They are an inevitable consequence of doing something new and as such should be seen as valuable.” - Ed Catmull
As Go is a "new" programming language we are all experimenting and learning how to write better Go. While most presentations focus on the destination, this presentation focuses on the journey of learning Go and the mistakes I personally made while developing Hugo, Cobra, Viper, Afero & Docker.
What every successful open source project needs, by Steven Francia
In the last few years open source has transformed the software industry. From Android to Wikipedia, open source is everywhere, but how does one succeed in it? While open source projects come in all shapes and sizes and all forms of governance, no matter what kind of project you’re a part of, there are a set of fundamentals that lead to success. I’d like to share some of the lessons I’ve learned from running two of the largest commercial open source projects, Docker and MongoDB, as well as some very successful community projects.
This presentation was delivered at sinfo.org in Feb 2015.
7 Common mistakes in Go and when to avoid them, by Steven Francia
I've spent the past two years developing some of the most popular libraries and applications written in Go. I've also made a lot of mistakes along the way. Recognizing that "the only real mistake is the one from which we learn nothing" (John Powell), I would like to share with you the mistakes that I have made over my journey with Go and how you can avoid them.
Go for Object Oriented Programmers or Object Oriented Programming without Obj..., by Steven Francia
Object Oriented (OO) programming has dominated software engineering for the last two decades. The paradigm, built on powerful concepts such as Encapsulation, Inheritance, and Polymorphism, has been internalized by the majority of software engineers. Although Go is not OO in the strict sense, we can continue to leverage the skills we’ve honed as OO engineers to come up with simple and solid designs.
Gopher Steve Francia, Author of
[Hugo](http://hugo.spf13.com), [Cobra](http://github.com/spf13/cobra), and many
other popular Go packages makes these difficult concepts accessible for everyone.
If you’re an OO programmer, especially one with a background in dynamic languages, and are curious about Go, then this talk is for you. We will cover everything you need to know to leverage your existing skills and quickly start coding in Go, including:
- How to use our Object Oriented programming fundamentals in Go
- Static and pseudo dynamic typing in Go
- Building fluent interfaces in Go
- Using Go interfaces and duck typing to simplify architecture
- Common mistakes made by those coming to Go from other OO languages (Ruby, Python, Javascript, etc.)
- Principles of good design in Go
This presentation will give developers an introduction and practical experience
of using MongoDB with the Go language. MongoDB Chief Developer Advocate &
Gopher Steve Francia presents plainly what you need to know about using MongoDB
with Go.
As an emerging language, Go is able to start fresh without years of relational database dependencies. Application and library developers are able to build applications using the excellent Mgo MongoDB driver and the reliable go sql package for relational databases. Find out why some people claim Go and MongoDB are a “pair made in heaven” and call Mgo “the best database driver they’ve ever used” in this talk by Gustavo Niemeyer, the author of the mgo driver, and Steve Francia, the drivers team lead at MongoDB Inc.
We will cover:
Connecting to MongoDB in various configurations
Performing basic operations in Mgo
Marshaling data to and from MongoDB
Asynchronous & Concurrent operations
Pre-fetching batches for seamless performance
Using GridFS
How MongoDB uses Mgo internally
This presentation was given as a Workshop at OSCON 2014.
New to Go? This tutorial will give developers an introduction and practical
experience in building applications with the Go language. Gopher Steve Francia,
Author of [Hugo](http://hugo.spf13.com),
[Cobra](http://github.com/spf13/cobra), and many other popular Go packages
breaks it down step by step as you build your own full featured Go application.
He starts with an introduction to the Go language, then reviews the fantastic
go tools available. With our environment ready we will learn by doing. The
remainder of the time will be dedicated to building a working go web and cli
application. Through our application development experience we will introduce
key features, libraries and best practices of using Go.
This tutorial is designed with developers in mind. Prior experience with any of the
following languages: ruby, perl, java, c#, javascript, php, node.js, or python
is preferred. We will be using the MongoDB database as a backend for our
application.
We will be using/learning a variety of libraries including:
* bytes and strings
* templates
* net/http
* io, fmt, errors
* cobra
* mgo
* Gin
* Go.Rice
* Viper
Discover & identify ideal storage solution for our needs by examining the history of data storage & the modern database systems including Key Value, Relational, Graph and Document databases.
This presentation was given at RootsTech 2013 in March
While Hadoop is the most well-known technology in big data, it’s not always the most approachable or appropriate solution for data storage and processing. In this session you’ll learn about enterprise NoSQL architectures, with examples drawn from real-world deployments, as well as how to apply big data regardless of the size of your own enterprise.
Replication, Durability, and Disaster Recovery, by Steven Francia
This session introduces the basic components of high availability before going into a deep dive on MongoDB replication. We'll explore some of the advanced capabilities with MongoDB replication and best practices to ensure data durability and redundancy. We'll also look at various deployment scenarios and disaster recovery configurations.
Strategies for multi-data center deployment. Diving into the details of deploying of MongoDB across multiple data centers.
Covers the advantages of a multi data center deployment for read/write locality, the various deployment strategies, and disaster preparedness and recovery.
In addition, we’ll look at the MongoDB roadmap and planned enhancements around data center awareness.
This presentation was given at MongoNYC 2012. The animations didn’t survive the transformation to the web, so not all the meaning carries over perfectly.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Building your first application w/mongoDB MongoSV2011, by Steven Francia
This talk will introduce the features of MongoDB by walking through how one can build a simple location-based application using MongoDB. The talk will cover the basics of MongoDB's document model, query language, map-reduce framework and deployment architecture.
2. Talking about
MongoDB Intro & Fundamentals
Why MongoDB & Hadoop
Getting Started
Using MongoDB & Hadoop
Future of Big Data
3. Steve @sp
15+ years building the internet
Father, husband, skateboarder
Chief Solutions Architect @
responsible for drivers, integrations, web & docs
4. Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Mgt: Google/DoubleClick, Oracle, Apple, NetApp, Mark Logic
Well Funded: Sequoia, Union Square, Flybridge
6. MongoDB
Application
Document Oriented
{ author : “steve”,
  date : new Date(),
  text : “About MongoDB...”,
  tags : [“tech”, “database”]}
High Performance
Fully Consistent
Horizontally Scalable
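As a rough illustration of the document-oriented model on this slide, the same record can be sketched as a plain Python dictionary. The field names come from the slide; the `has_tag` helper is hypothetical and only shows how array-valued fields can be inspected in application code. No database or driver is involved.

```python
from datetime import datetime, timezone

# The slide's example document as a Python dict (field names from the slide).
post = {
    "author": "steve",
    "date": datetime.now(timezone.utc),  # stands in for JavaScript's new Date()
    "text": "About MongoDB...",
    "tags": ["tech", "database"],        # arrays are stored inside the document
}


def has_tag(document, tag):
    """Hypothetical helper: True when the document's tag array contains tag."""
    return tag in document.get("tags", [])


print(has_tag(post, "database"))
```

Because the tags live inside the document, no join table is needed to answer "which tags does this post have".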
7. MongoDB philosophy
Keep functionality when we can (key/value stores are great, but we need more)
Non-relational (no joins) makes scaling horizontally practical
Document data models are good
Database technology should run anywhere: virtualized, cloud, metal, etc
8. Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files, i.e. read-through write-through memory caching
10. “MongoDB has the best features of key/value stores, document databases and relational databases in one.”
John Nunemaker
11. Relational made normalized data look like this
Category
• Name
• Url
Article
• Name
• Slug
• Publish date
• Text
User
• Name
• Email Address
Tag
• Name
• Url
Comment
• Comment
• Date
• Author
12. Document databases make normalized data look like this
Article
• Name
• Slug
• Publish date
• Text
• Author
Comment[]
• Comment
• Date
• Author
Tag[]
• Value
Category[]
• Value
User
• Name
• Email Address
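The document shape on this slide can be sketched as one nested Python structure, with comments, tags, and categories embedded as arrays inside the article. All values below are made up for illustration, the snake_case field names are an adaptation of the slide's labels, and placing the three arrays inside the article is an assumption about the diagram.

```python
# Hypothetical article document: the arrays that would be separate tables in a
# relational schema are embedded directly in the article.
article = {
    "name": "Intro to documents",
    "slug": "intro-to-documents",
    "publish_date": "2012-12-04",
    "text": "Documents keep related data together.",
    "author": "steve",
    "comments": [
        {"comment": "Nice post", "date": "2012-12-05", "author": "ana"},
        {"comment": "Thanks", "date": "2012-12-06", "author": "steve"},
    ],
    "tags": ["tech", "database"],
    "categories": ["tutorials"],
}

# Users stay in their own collection, as on the slide.
user = {"name": "steve", "email_address": "steve@example.com"}


def comment_authors(doc):
    """Return the authors of all embedded comments, in order."""
    return [c["author"] for c in doc["comments"]]


print(comment_authors(article))
```

Reading an article together with its comments is then a single lookup on one document rather than a join across several tables.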
14. CMS / Blog
Needs:
• Business needed modern data store for rapid development and
scale
Solution:
• Use PHP & MongoDB
Results:
• Real time statistics
• All data, images, etc stored together
easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
15. Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver
Solution:
• Use MongoDB instead of Oracle
Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
16. Customer Analytics
Problem:
• Deal with massive data volume across all customer sites
Solution:
• Use MongoDB to replace Google Analytics / Omniture options
Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
17. Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100mil)
• MongoDB for archive (2+ billion)
Results:
• No more alter table statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB
18. Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
19. E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in
RDBMS
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Rapidly build, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL
20. Tons more
MongoDB casts a wide net
people keep coming up with
new and brilliant ways to use it
23. Applications have complex needs
Use the best tool for the job
Often more than one tool is needed
MongoDB ideal operational database
MongoDB ideal for BIG data
Not a data processing engine
For heavy processing needs use tool designed for that job ... Hadoop
24. MongoDB Map Reduce
MongoDB map reduce quite capable... but with limits
- Javascript not best language for processing map reduce
- Javascript limited in external data processing libraries
- Adds load to data store
- Sharded environments do parallel processing
25. MongoDB Aggregation
Most uses of MongoDB Map Reduce were for aggregation
Aggregation Framework optimized for aggregate queries
Fixes some of limits of MongoDB MR
- Can do realtime aggregation similar to SQL GroupBy
- parallel processing on sharded clusters
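To make the "similar to SQL GroupBy" point concrete, here is a small sketch. The `pipeline` list shows what a count-per-tag aggregation could look like, written as plain Python data using the `$unwind`, `$group`, and `$sort` stages. The sample documents and the `count_by_tag` function are hypothetical: the function reproduces the same grouping in pure Python so the example runs without a database.

```python
from collections import Counter

# Hypothetical sample documents.
posts = [
    {"author": "steve", "tags": ["tech", "database"]},
    {"author": "steve", "tags": ["tech"]},
    {"author": "ana", "tags": ["database", "nosql"]},
]

# The aggregation expressed as data: one row per tag, grouped and counted,
# roughly SELECT tag, COUNT(*) ... GROUP BY tag ORDER BY count DESC, tag.
pipeline = [
    {"$unwind": "$tags"},
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
    {"$sort": {"count": -1, "_id": 1}},
]


def count_by_tag(documents):
    """Pure-Python equivalent of the pipeline above, for illustration only."""
    counts = Counter(tag for doc in documents for tag in doc["tags"])
    rows = [{"_id": tag, "count": n} for tag, n in counts.items()]
    return sorted(rows, key=lambda row: (-row["count"], row["_id"]))


for row in count_by_tag(posts):
    print(row)
```

The pipeline is declarative data rather than JavaScript functions, which is part of why the slide contrasts it with map reduce.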
26. MongoDB Map Reduce
MongoDB Data
Map(): emit(k,v)
  map iterates on documents, 1 at time per shard
  Document is $this
Group(k)
Sort(k)
Reduce(k,values): k,v
  Input matches output
  Can run multiple times
Finalize(k,v): k,v
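The stages on this slide (map with emit, group, sort, reduce, finalize) can be imitated in a few lines of Python. This is a single-process sketch, not MongoDB's implementation; the function names and sample documents are hypothetical. The reduce step is deliberately applied in two passes to show why its output must have the same shape as its input values.

```python
from collections import defaultdict


def map_post(doc):
    """Map: emit one (tag, 1) pair per tag on the document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_counts(key, values):
    """Reduce: the result has the same shape as each input value."""
    return sum(values)


def finalize(key, value):
    """Finalize: runs once per key on the fully reduced value."""
    return {"tag": key, "count": value}


def map_reduce(docs, map_fn, reduce_fn, finalize_fn):
    grouped = defaultdict(list)          # Group(k)
    for doc in docs:
        for k, v in map_fn(doc):         # emit(k, v)
            grouped[k].append(v)
    results = []
    for k in sorted(grouped):            # Sort(k)
        values = grouped[k]
        mid = len(values) // 2
        if mid:
            # Reduce partial lists first, then reduce the partial results,
            # mimicking reduce being run multiple times for one key.
            values = [reduce_fn(k, values[:mid]), reduce_fn(k, values[mid:])]
        results.append(finalize_fn(k, reduce_fn(k, values)))
    return results


docs = [
    {"tags": ["tech", "database"]},
    {"tags": ["tech"]},
    {"tags": ["database", "nosql"]},
]
print(map_reduce(docs, map_post, reduce_counts, finalize))
```

A reduce function whose output differed in shape from its inputs (for example, returning a formatted string) would break on the second pass, which is the practical meaning of "input matches output".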
27. Hadoop Map Reduce
InputFormat
Map(k1, v1, ctx): ctx.write(k2,v2)
  Many map operations, 1 at time per input split
  ctx.write is same as Mongo's emit
Combiner(k2,values2): k2,v3
  Runs on same thread as map
  similar to Mongo's reducer
Partitioner(k2)
  similar to Mongo's group
Sort(keys2)
Reduce(k3,values4)
  Reducer threads
  Runs once per key
  similar to Mongo's Finalize
Output Format: kf,vf
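The Hadoop flow on this slide adds two stages that the MongoDB version lacks: a combiner that pre-aggregates each map task's output, and a partitioner that decides which reducer receives each key. The sketch below simulates that flow in one Python process with a word count. It does not use Hadoop, and every name in it (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`) is hypothetical.

```python
import zlib
from collections import defaultdict


def mapper(key, value):
    """Map(k1, v1): emit (word, 1) for each word in a line of text."""
    for word in value.split():
        yield word.lower(), 1


def combiner(key, values):
    """Combiner(k2, values2): pre-aggregate within one input split."""
    return sum(values)


def partitioner(key, num_reducers):
    """Partitioner(k2): pick a reducer with a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce: runs once per key over the combined values."""
    return sum(values)


def run_job(splits, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                       # one map task per input split
        local = defaultdict(list)
        for k1, v1 in split:
            for k2, v2 in mapper(k1, v1):
                local[k2].append(v2)
        for k2, values2 in local.items():      # combine before the shuffle
            v3 = combiner(k2, values2)
            partitions[partitioner(k2, num_reducers)][k2].append(v3)
    output = {}
    for part in partitions:                    # each partition is one reducer
        for key in sorted(part):               # Sort(keys2)
            output[key] = reducer(key, part[key])
    return output


splits = [
    [(0, "big data big"), (1, "mongo data")],
    [(0, "hadoop mongo big")],
]
print(run_job(splits))
```

Changing `num_reducers` changes which simulated reducer handles each key but not the final counts, since every occurrence of a key is routed to the same partition.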
28. MongoDB & Hadoop
MongoDB (single server or sharded cluster)
InputFormat
  Creates a list of Input Splits
  same as Mongo's shard chunks (64mb)
RecordReader (each split)
Map(k1, v1, ctx): ctx.write(k2,v2)
  Many map operations, 1 at time per input split
Combiner(k2,values2): k2,v3
  Runs on same thread as map
Partitioner(k2)
Sort(k2)
Reduce(k2,values3)
  Reducer threads
  Runs once per key
Output Format: kf,vf
MongoDB
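The idea that input splits line up with shard chunks can be illustrated with key ranges. The sketch below is an assumption-laden toy, not the connector's actual logic: it treats chunk boundaries as a sorted list of shard-key values, turns neighbouring boundaries into half-open ranges, and lets each range act as one input split. The function names and sample data are hypothetical.

```python
def make_input_splits(chunk_boundaries):
    """Turn sorted chunk boundaries into half-open (min_key, max_key) ranges."""
    return list(zip(chunk_boundaries, chunk_boundaries[1:]))


def read_split(docs, split, key="_id"):
    """Toy record reader: yield only the documents whose key falls in the split."""
    low, high = split
    return [doc for doc in docs if low <= doc[key] < high]


# Hypothetical collection and chunk boundaries on the _id shard key.
docs = [{"_id": 5}, {"_id": 150}, {"_id": 250}, {"_id": 299}]
boundaries = [0, 100, 200, 300]

splits = make_input_splits(boundaries)
for split in splits:
    print(split, read_split(docs, split))
```

Because the ranges are half-open and contiguous, each document is read by exactly one map task, so map tasks can work on separate splits in parallel without duplicating records.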
51. Google 2000
Google Inc, today announced it has released the largest search engine on the Internet.
Google’s new index, comprising more than 1 billion URLs
52. Google 2008
Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
53. BIG 2012 & Beyond
MongoDB enables us to scale with the redefinition of BIG.
New processing tools like Hadoop & Storm are enabling us to process the new BIG.
55. MongoDB is committed to working with best data tools including Storm, Spark, & more
56. http://spf13.com
http://github.com/s
@spf13
Question
download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
Editor's Notes
By reducing transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.