The document discusses how SNCF deployed MongoDB replica sets using Docker and MongoDB Ops Manager. It describes combining these tools to create replica sets faster and more reliably. Specific steps included using Ops Manager's API to create groups and policies for backups and alerts. Docker was used to standardize MongoDB deployments across containers and to automate operations such as starting containers and mounting volumes. The overall goal was to make MongoDB deployments cheaper, faster, and more resilient through automation.
Real-time Streaming Pipelines with FLaNK - Data Con LA
Introducing the FLaNK stack, which combines Apache Flink, Apache NiFi and Apache Kafka to build fast applications for IoT, AI and rapid ingest, and to deploy them anywhere. I will walk through live demos and show how to do this yourself.
FLaNK provides a quick set of tools to build applications at any scale for any streaming and IoT use cases.
We will discuss a use case - Smart Stocks with FLaNK (NiFi, Kafka, Flink SQL)
Bio -
Tim Spann is an avid blogger and the Big Data Zone Leader for DZone (https://dzone.com/users/297029/bunkertor.html). He runs the successful Future of Data Princeton meetup with over 1200 members at http://www.meetup.com/futureofdata-princeton/. He is currently a Senior Solutions Engineer at Cloudera in the Princeton, New Jersey area. You can find all the source and material behind his talks at his GitHub and community blog:
https://github.com/tspannhw/ApacheDeepLearning201
https://community.hortonworks.com/users/9304/tspann.html
Deep Dive Into How To Monitor MySQL or MariaDB Galera Cluster / Percona XtraD... - Severalnines
MySQL provides hundreds of status counters, but how do you make sense of all that monitoring data?
If you’re in Operations and your job is to monitor the health of MySQL/MariaDB Galera Cluster or Percona XtraDB Cluster, then this webinar is for you. Setting up a Galera Cluster is fairly straightforward, but keeping it in a good shape and knowing what to look for when it’s having production issues can be a challenge.
Status counters can be tricky to read …
Which of them are more important than others?
How do you find your way in a labyrinth of different variables?
Which of them can make a significant difference?
How might a host’s health impact MySQL performance?
How to identify problematic nodes in your cluster?
To find out more, read these webinar slides (or watch the replay).
Our colleague Krzysztof Książek provided a deep-dive session on what to monitor in Galera Cluster for MySQL & MariaDB. Krzysztof is a MySQL DBA with experience in managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
Amongst other things, Krzysztof discussed why having a good monitoring system is a must, covering the following topics:
Galera monitoring
• cluster status
• flow control
Host metrics and their impact on MySQL
• CPU
• memory
• I/O
InnoDB metrics
• CPU-related
• I/O-related
Lessons learned while taking Presto from alpha to production at Twitter. Presented at the Presto meetup at Facebook on 2015.03.22.
Video: https://www.facebook.com/prestodb/videos/531276353732033/
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive DataSumit Rangwala
The “People You May Know” (PYMK) recommendation service helps LinkedIn’s members identify other members that they might want to connect to and is the major driver for growing LinkedIn's social network. The principal challenge in developing a service like PYMK is dealing with the sheer scale of computation needed to make precise recommendations with a high recall. PYMK service at LinkedIn has been operational for over a decade, during which it has evolved from an Oracle-backed system that took weeks to compute recommendations to a Hadoop backed system that took a few days to compute recommendations to its most modern embodiment where it can compute recommendations in near real time.
This talk will present the evolution of PYMK to its current architecture. We will focus on various systems we built along the way, with an emphasis on systems we built for our most recent architecture, namely Gaia, our real-time graph computing capability, and Venice our online feature store with scoring capability, and how we integrate these individual systems to generate recommendations in a timely and agile manner, while still being cost-efficient. We will briefly talk about the lessons learned about scalability limits of our past and current design choices and how we plan to tackle the scalability challenges for the next phase of growth.
https://qcon.ai/qconai2019/presentation/people-you-may-know-fast-recommendations-over-massive-data
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
In this talk, we will discuss how we use Spark as part of a hybrid RDBMS architecture that includes Hadoop and HBase. The optimizer evaluates each query and sends OLTP traffic (including CRUD queries) to HBase and OLAP traffic to Spark. We will focus on the challenges of handling the tradeoffs inherent in an integrated architecture that simultaneously handles real-time and batch traffic. Lessons learned include: - Embedding Spark into a RDBMS - Running Spark on Yarn and isolating OLTP traffic from OLAP traffic - Accelerating the generation of Spark RDDs from HBase - Customizing the Spark UI The lessons learned can also be applied to other hybrid systems, such as Lambda architectures.
Bio:-
John Leach is the CTO and Co-Founder of Splice Machine. With over 15 years of software experience under his belt, John’s expertise in analytics and BI drives his role as Chief Technology Officer. Prior to Splice Machine, John founded Incite Retail in June 2008 and led the company’s strategy and development efforts. At Incite Retail, he built custom Big Data systems (leveraging HBase and Hadoop) for Fortune 500 companies. Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners. John was a key subject matter expert for Blue Martini Software in many strategic implementations across the world. His focus at Blue Martini was helping clients incorporate decision support knowledge into their current business processes utilizing advanced algorithms and machine learning. John received dual bachelor’s degrees in biomedical and mechanical engineering from Washington University in Saint Louis. Leach is the organizer emeritus for the Saint Louis Hadoop Users Group and is active in the Washington University Elliot Society.
Apache Kafka is a simple, high-performance, distributed, fault-tolerant messaging system. It was initially developed at LinkedIn and is now used at many companies, including Twitter, Square, Mozilla, Foursquare, and Tumblr. This talk will cover the architecture of Kafka and how LinkedIn uses Kafka to build a distributed low-latency pipeline that handles all messaging, tracking, logging, and metrics data. This unified pipeline provides data feeds into Hadoop and a diverse set of user-facing real-time stream processing applications. We will describe the lessons learned scaling this service to thousands of data feeds and many terabytes of messages per day.
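For readers who want to see the pipeline model described above in miniature, a small sketch with the kafka-python client; the broker address, topic name and payload are illustrative assumptions, not details from the talk.

from kafka import KafkaProducer, KafkaConsumer

# Publish a tracking event to a topic (broker address is an assumption).
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", b'{"page": "/home", "user": "42"}')
producer.flush()

# Consume the same feed; real pipelines fan this out to Hadoop and stream processors.
consumer = KafkaConsumer("page-views",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)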
Rolling Out Apache HBase for Mobile Offerings at Visa - HBaseCon
Partha Saha and CW Chung (Visa)
Visa has embarked on an ambitious multi-year redesign of its entire data platform that powers its business. As part of this plan, the Apache Hadoop ecosystem, including HBase, will now become a staple in many of its solutions. Here, we will describe our journey in rolling out a high-availability NoSQL solution based on HBase behind some of our prominent mobile offerings.
Andrew Ryan describes how Facebook operates Hadoop to provide access as a shared resource between groups.
More information and video at:
http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hug-feb-2011-recap/
Big data: Hadoop, Flume, Spark, Cloudera, Oracle Big Data Appliance, Apache, Oracle Loader for Hadoop, big data copy, Exadata to Big Data Appliance. Bilginc IT Academy.
ApacheCon 2020 - Flink SQL in 2020: Time to show off! - Timo Walther
Four years ago, the Apache Flink community started adding SQL support to ease and unify the processing of static and streaming data. Today, Flink runs business critical batch and streaming SQL queries at Alibaba, Huawei, Lyft, Uber, Yelp, and many others. Although the community made significant progress in the past years, there are still many things on the roadmap and the development is still speeding up. In the past months, several significant improvements and extensions were added including support for DDL statements, refactorings of the type system and the catalog interface, as well as Apache Hive integration. Since it is difficult to follow all development efforts that happen around Flink SQL and its ecosystem, it is time for an update. This session will focus on a comprehensive demo of what is possible with Flink SQL in 2020. Based on a realistic use case scenario, we'll show how to define tables which are backed by various storage systems and how to solve common tasks with streaming SQL queries. We will demonstrate Flink's Hive integration and show how to define and use user-defined functions. We'll close the session with an outlook of upcoming features.
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
Mm.. FLaNK Stack (MiNiFi MXNet Flink NiFi Kudu Kafka) - Timothy Spann
Mm.. FLaNK Stack (MiNiFi MXNet Flink NiFi Kudu Kafka)
A quick discussion and demo of the FLaNK stack.
Streaming development with Apache NiFi, Apache Kafka, Apache Flink and friends.
Dec 2019, Timothy Spann, Field Engineer, Data in Motion
Princeton Meetup 10-dec-2019
https://www.meetup.com/futureofdata-princeton/events/266496424/
Hosted By PGA Fund at:
https://pga.fund/coworking-space/
Princeton Growth Accelerator
5 Independence Way, 4th Floor, Princeton, NJ
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS... - DataStax
If you are interested in Big Data, you surely already know Apache Spark or Apache Cassandra, but do you know Apache Zeppelin? Do you know that it is possible to draw beautiful graphs, using a user-friendly interface, out of your Spark RDDs and Cassandra queries?
In this session, I will introduce Zeppelin by live coding example and highlight its modular architecture which allows you to plug-in any interpreter for the back-end of your choice.
Then we'll dig into the Apache Cassandra interpreter and show how to use it as a default front-end to display your Cassandra data
About the Speaker
DuyHai DOAN Apache Cassandra Evangelist, DataStax
DuyHai DOAN is an Apache Cassandra Evangelist at DataStax. He spends his time between technical presentations/meetups on Cassandra, coding on open source projects like Achilles or Apache Zeppelin to support the community and helping all companies using Cassandra to make their project successful. Previously he was working as a freelance Java/Cassandra consultant.
Apache Phoenix: Use Cases and New Features - HBaseCon
James Taylor (Salesforce) and Maryann Xue (Intel)
This talk will be broken into two parts: Phoenix use cases and new Phoenix features. Three use cases will be presented as lightning talks by individuals from 1) Sony about its social media NewsSuite app, 2) eHarmony on its matching service, and 3) Salesforce.com on its time-series metrics engine. Two new features will be discussed in detail by the engineers who developed them: ACID transactions in Phoenix through Apache Tephra, and cost-based query optimization through Apache Calcite. The focus will be on helping end users more easily develop scalable applications on top of Phoenix.
This talk will explain best practices and techniques for upgrading MySQL. In a deep dive, we will go over how to upgrade successfully to MySQL 8.0, explain MySQL 8.0-specific upgrade challenges, cover gotchas and best practices, and review the latest MySQL 8.0 release and bug reports.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, a replica set, or tens of sharded clusters, you probably share the same challenges in trying to size that deployment.
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo... - MongoDB
MongoDB can be used in the Nuxeo Platform as a replacement for more traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source enterprise content management platform, integrates completely with MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will emphasize the different implementation choices driven by the very nature of a NoSQL datastore like MongoDB. Learn how Nuxeo integrated MongoDB into the platform which resulted in increased performance (including actual benchmarks) and better response to some use cases.
MongoDB Europe 2016 - Star in a Reasonably Priced Car - Which Driver is Best? - MongoDB
MongoDB's unique Idiomatic Drivers let you work natively with database objects in your favourite language, removing the need to explicitly convert your data and queries to text formats such as SQL, Javascript or XML. Drivers do all the hard work of translating to serialised BSON objects on the wire, removing the need for server-side parsing and ensuring security against injection attacks. Server load and hardware requirements are reduced at the expense of additional client side CPU cycles. In this presentation we compare the performance of drivers in a number of languages to see what impact your language choice can have on your hosting costs and throughput.
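To make the "work natively with database objects" point concrete, a small PyMongo sketch; the connection string, collection and field names are assumptions. The Python dict goes to the server as BSON, with no SQL or JSON string building on the client.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cars = client.topgear.cars

# The document is an ordinary Python dict; the driver serialises it to BSON directly.
cars.insert_one({"driver": "The Stig", "lap_seconds": 102.4, "wet": False})

# Queries are also plain dicts, so there is nothing to escape or parse server-side.
fastest = cars.find_one({"wet": False}, sort=[("lap_seconds", 1)])
print(fastest)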
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's... - MongoDB
Proximus is one of the biggest telecom companies in the Belgian market. This year the company began developing a new IoT network using LoRaWAN technology. The talk will detail our development team's search for a database suited to meet the needs of our IoT project, the selection and implementation of MongoDB as a database, as well as how we built a system for storing a variety of sensor data with high throughput by leveraging sleepy.mongoose. The talk will also discuss how different decisions around data storage impact applications in regards to both performance and total cost.
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines - MongoDB
We will do a deep dive into the powerful query capabilities of MongoDB's Aggregation Framework, and show you how you can use MongoDB's built-in features to inspect the execution and tune the performance of your queries. And, last but not least, we will also give you a brief outlook into MongoDB 3.4's awesome new Aggregation Framework additions.
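As a taste of the Aggregation Framework queries the session covers, a minimal PyMongo sketch; the database, collection and field names are assumptions, and the explain form of the aggregate command is used to inspect execution.

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").shop

# Total revenue per customer for completed orders, largest first.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]
for doc in db.orders.aggregate(pipeline):
    print(doc)

# Inspect how the server executes the pipeline (index usage, stage order, ...).
plan = db.command("aggregate", "orders", pipeline=pipeline, explain=True)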
MongoDB Europe 2016 - Debugging MongoDB Performance - MongoDB
Asya is back, and so is Sherlock Holmes and his techniques to gather and analyze data from your poorly performing MongoDB clusters. In this advanced talk we take a deep look at all the diagnostic data that lives inside MongoDB - how to interrogate and interpret it to help you solve those frustrating performance bottlenecks that we all face occasionally.
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage - MongoDB
Customer and business requirements are shifting constantly. Today’s powerful programming languages can keep up—but what about your database? NetApp® MongoDB solutions offer a flexible, scalable answer. Learn how NetApp storage solutions will accelerate your MongoDB Performance, reduce operational Costs and provide the highest levels of Availability and Security. These solutions provide advanced fault-recovery features and easy, in-service growth capabilities to accommodate your unpredictable, ever-changing business demands. NetApp storage is designed to help you build a high-performance, cost-efficient, and highly available analytics solution. So you can focus on adding real business value.
MongoDB Europe 2016 - Graph Operations with MongoDB - MongoDB
The popularity of dedicated graph technologies has risen greatly in recent years, at least partly fuelled by the explosion in social media and similar systems, where a friend network or recommendation engine is often a critical component when delivering a successful application. MongoDB 3.4 introduces a new Aggregation Framework graph operator, $graphLookup, to enable some of these types of use cases to be built easily on top of MongoDB. We will see how semantic relationships can be modelled inside MongoDB today, how the new $graphLookup operator can help simplify this in 3.4, and how $graphLookup can be used to leverage these relationships and build a commercially focused news article recommendation system.
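A minimal PyMongo sketch of the $graphLookup stage introduced in 3.4, recursively expanding a "follows" relationship; the collection and field names are illustrative assumptions.

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").social

# For each user, walk the "follows" edges up to two hops to build a friend network.
pipeline = [
    {"$graphLookup": {
        "from": "users",
        "startWith": "$follows",
        "connectFromField": "follows",
        "connectToField": "_id",
        "maxDepth": 2,
        "as": "network",
    }}
]
for user in db.users.aggregate(pipeline):
    print(user["_id"], [u["_id"] for u in user["network"]])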
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S... - MongoDB
Travellers are demanding more exhaustive, accurate, and relevant results when they search for flights, and they want these results instantly – even when there can be 100 Billion travel options for a single trip. Amadeus’s “Instant Search” feature was built to meet those requirements. These searches are not trivial. Several terabytes of constantly evolving data is needed to reply instantly to questions like, “I live in Frankfurt, where can I go this weekend for €200?” “What’s the cheapest and most convenient flight for a MongoDB Europe attendee?” This technical session will show you how Amadeus integrated MongoDB into its system, and how it allowed us to handle huge numbers of updates and searches in a high-volume system, to deliver the next generation of flight search products. It will cover topics such as how we discovered which extra indexes were needed, how we were able to get the balancer to meet our needs, and how we modelled our data for optimal performance.
James Tan (MongoDB) - Automate Production-Ready MongoDB Deployments - Outlyer
Getting from dev to a robust, performant, and scalable production environment takes a fair bit of work. Doing this manually is time-consuming and error-prone, so let's look at the various ways to automate this with Vagrant + Chef, as well as MMS Automation (free up to 8 servers).
Video: https://www.youtube.com/watch?v=vOZqwPEQzDM
Join DevOps Exchange London here: http://www.meetup.com/DevOps-Exchange-London
Follow DOXLON on twitter http://www.twitter.com/doxlon
MongoDB-as-a-Service on Pivotal Cloud Foundry - VMware Tanzu
SpringOne Platform 2016'
Speakers: Mallika Iyer; Principal Software Engineer, Pivotal & Sam Weaver; Product Manager, MongoDB
The ability to provide your organization with multiple data services on a platform like Pivotal Cloud Foundry is very powerful, and increases the agility of the organization as a whole, when developers are able to provision data services on demand, and all of this is completely transparent to the system operators. This session will cover a very brief overview of Pivotal Cloud Foundry, and will then deep dive into running MongoDB as a managed service on this platform. The MongoDB service for Pivotal Cloud Foundry leverages the capabilities of Bosh 2.0 for on-demand-dynamic provisioning for services while maintaining an integration with MongoDB's Cloud Ops Manager, to provide the best of both - Pivotal Cloud Foundry and MongoDB.
Hidden inside MongoDB is the WiredTiger data engine, an Open Source, pluggable storage engine that became the database's default in 3.2. Written in C, WiredTiger uses a variety of techniques to provide unmatched performance, low latency and scalability. This talk will explore data structures and techniques C/C++ programmers can use to support heavily threaded applications on modern hardware, using examples from the WiredTiger code base. Data structures and techniques to be covered include hazard pointers, skiplists, ticket locks, atomic instructions and memory barriers.
Webinar: Data Streaming with Apache Kafka & MongoDB - MongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
A lot of people use Docker/rkt, but very often we do not have time to actually understand how they work. So today, in half an hour, I will show you in a nutshell how that works. My hope is that even after you know how to build a container engine, I can still convince you that the existing tools are worth spending $MM to create and use.
Trivadis TechEvent 2016 Microservices, Containers, CQRS, Actors in .NET for t... - Trivadis
In this session we look at modern software architectures and how they find their place in the .NET world. We examine the architectural requirements of modern reactive systems that are structured into microservices and have the ability to provide feedback to users in near real-time. Architectural patterns include microservices, messaging, CQRS and the actor model. We take a look at .NET based frameworks that support these concepts, like "Azure Service Fabric", "Project Orleans", "Akka.NET" and NServiceBus" and tools to communicate back to the client like "SignalR" or "XSockets".
Meeting (Software Engineering Thailand) about Internet of Things, Internet of Everything, Physical Web, Smart home & Co in Bangkok on Feb. 29. I talked about the Big Picture, i.e. Industry 4.0 and some technological concepts. Then we took a look at IoT standards and products, e.g. The Kaa Project, IBM Bluemix, Carriots, Relayr etc. Later we discussed different aspects of IoT and Cloud Computing:
Micro Manchester Meetup: "The Seven (More) Deadly Sins of Microservices" - Daniel Bryant
There is trouble brewing in the land of microservices – today’s shiny technology is tomorrow’s legacy, and there is concern that we will all be dealing with spaghetti services in 2018…
It is often a sign of an architectural approach's maturity that, in addition to the emergence of well-established principles and practices, anti-patterns also begin to be identified and classified. In this talk we introduce the 2016 edition of the seven deadly sins that, if left unchecked, could easily ruin your next microservices project…
(Updated 26th of April 2014)
TYPO3 Neos - the compendium with more than 270 pages
Thanks a lot to ROLAND SCHENKE for the translation! You rock a lot!!!!
Just in time for the release of TYPO3 Neos 1.0.2 I have released a compendium with more than 270 pages on the subject of TYPO3 Neos.
As an early adopter and technology leader I have to (and want to) deal with new technology early.
But the results should go back to the community as fast as possible, because if the TYPO3 Neos community grows and we can feed it, everyone will profit.
This compilation is unique on the market and covers all aspects of TYPO3 Neos in a detailed, clear and didactic manner.
As soon as a new TYPO3 Neos version is released, the compendium will be updated too.
Have much fun with it!
Patrick Lobacher
With the advent of large-scale malware in recent years, OS X systems can be vectors of attack using Mach-O binaries. This presentation will illustrate the dissection of something malicious, as well as identification, analysis and some possibilities for mitigation.
Patterns for Asynchronous Microservices with NATS - Apcera
Presentation from a talk by Raul Perez (@repejota) of R3Labs on asynchronous microservices patterns using NATS (@nats_io), the lightweight, high performance open source messaging system written in Go.
You can learn more about NATS at http://www.nats.io
Seán Labastille gives us an overview of key tasks a developer should undertake to be successful and productive. Further material:
-Video of the presentation can be seen here: http://bit.ly/developer_toolbox.
-Transcript of speech is here http://bit.ly/transcript_toolbox
From desktop applications to the web and to multi-tier architectures. From component-based development to service-oriented architectures… Will microservices be the winning solution?
Enabling Microservice @ Orbitz - GOTO Chicago 2016 - Steve Hoffman
In this talk we will discuss how we enabled decomposition of one of our 500+ system components into a continuously deployed microservice cluster. Our platform is comprised of Apache Mesos/Marathon, Docker, and a number of local services including Consul for service discovery, Logstash for diskless logging, and a custom metrics forwarder to Graphite. Building on this, we'll detail our CI pipeline using Jenkins workflows to build and publish microservices as Docker images, test and deploy via Marathon/Mesos, and automated change tickets. Finally, we'll discuss lessons learned from building our own enterprise PaaS and scaling it out to a large organization.
Similar to MongoDB Europe 2016 - MongoDB, Ops Manager & Docker at SNCF (20)
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas - MongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... - MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... - MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
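One commonly used schema design for such workloads is the bucket pattern; a small PyMongo sketch, where the sensor id, bucket granularity and field names are illustrative assumptions rather than recommendations from the talk.

from datetime import datetime, timezone
from pymongo import MongoClient

readings = MongoClient("mongodb://localhost:27017").iot.readings

def record(sensor_id, value):
    """Append a measurement to an hourly bucket document for this sensor."""
    now = datetime.now(timezone.utc)
    bucket_start = now.replace(minute=0, second=0, microsecond=0)
    readings.update_one(
        {"sensor_id": sensor_id, "bucket_start": bucket_start},
        {"$push": {"samples": {"ts": now, "value": value}},
         "$inc": {"count": 1}},
        upsert=True,
    )

record("pump-17", 42.3)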
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] - MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 - MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... - MongoDB
MongoDB Kubernetes operator is ready for prime time. Learn about how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset - MongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart - MongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... - MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
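A short PyMongo sketch of a compound index that follows the equality, sort, range ordering mentioned above; field names and values are assumptions.

from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017").shop.orders

# Equality (status), then sort (created desc), then range (amount): the usual
# ordering guideline for compound indexes.
orders.create_index([("status", 1), ("created", -1), ("amount", 1)])

cursor = (orders
          .find({"status": "complete", "amount": {"$gte": 100}})
          .sort("created", -1))

# Check that the query is served by the index rather than a collection scan.
print(cursor.explain()["queryPlanner"]["winningPlan"])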
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ - MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
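A brief PyMongo sketch of outputting pipeline results to an existing collection with the 4.2 $merge stage, for example to maintain a materialized roll-up; collection and field names are assumptions.

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").shop

# Roll daily order totals up into a separate collection that dashboards can read.
db.orders.aggregate([
    {"$group": {"_id": "$order_date", "total": {"$sum": "$amount"}}},
    {"$merge": {"into": "daily_totals", "whenMatched": "replace"}},
])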
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive - MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang - MongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... - MongoDB
…to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning... - MongoDB
It has never been easier to order online and be delivered in less than 48 hours, very often free of charge. This simplicity of use hides a complex market worth more than $8,000 billion.
Data is well known in the supply chain world (routes, information on goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise and data science, Upply is redefining the fundamentals of the supply chain by enabling each player to overcome the volatility and inefficiency of the market.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
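For reference, a plain power-iteration PageRank in Python; the optimizations described above (skipping converged or in-identical vertices, short-circuiting chains, processing strongly connected components in topological order) are refinements of this baseline loop.

def pagerank(adj, damping=0.85, tol=1e-6, max_iter=100):
    """adj maps each vertex to the list of vertices it links to."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # Rank mass from dangling vertices is spread uniformly over the graph.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in adj}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
        if sum(abs(new[v] - rank[v]) for v in adj) < tol:
            return new
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))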
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
MongoDB Europe 2016 - MongoDB, Ops Manager & Docker at SNCF
1. MONGODB, OPS MANAGER & DOCKER @ SNCF
Christophe TRINCAL / David TSANG-HIN-SUN / Sylvain CHAMBON
MONGODB EUROPE 2016 - NOV 15th - LONDON
FEEDBACK ON HOW TO DEPLOY A MONGODB REPLICA SET IN LESS THAN 5 MINUTES! (PROOF OF CONCEPT)
2. AGENDA
01. MONGODB, PRESENTATION OF THE CONTEXT @ SNCF: Needs & Goals
02. INGREDIENTS / PRE-REQUISITES
03. RECIPES & ORCHESTRATION: how to combine Docker & Ops Manager for a MongoDB replica set?
04. CONCLUSION: Next steps & Improvements
3. ABOUT SNCF - BUSINESS PROFILE
Transport group offering B to B and B to C services in several fields:
4. ABOUT SNCF - IT @ SNCF
"Production IT" is the operator serving the IT System Departments of the group.
[Diagram: four data center rooms (DC #1a, DC #1b, DC #2, DC #3) hosting physical servers, plus smartphones, workstations, fixed telephone lines and operated applications.]
5. SERVICE OFFER - NEEDS AND GOALS
Needs / wants: high availability, delivery time, industrialized operations, simplification, shared services, affordable price.
Goals: resilient, faster, cheaper.
6. TRANSLATING THE REQUIREMENTS
CHEAPER → SHARED INFRASTRUCTURE
FASTER → AUTOMATED DEPLOYMENT
MORE RESILIENT → BACKUP, ALERTING POLICY
7. SELECTING INGREDIENTS
Shared infrastructure (CHEAPER) - hardware cluster: 3 large servers (256 GB / 6 TB) across 3 data center rooms.
Docker (FASTER, AUTOMATION): lightweight containers, full autonomy, very fast startup.
8. DOCKER
Container types: Small / Medium / XLarge.
[Diagram: server details]
9. SELECTING INGREDIENTS
Ops Manager, MongoDB's management solution (FASTER, MORE RESILIENT): automation, backup, alerts.
10. OPS MANAGER - MANAGEMENT
• MONITORING
• BACKUP
• AUTOMATION
• FULLY API-DRIVEABLE
11. INGREDIENTS - 1. OPS MANAGER (FASTER AND MORE RELIABLE DEPLOYMENTS)
3 Ops Manager nodes for HA behind a load-balanced VIP; 2 MongoDB Enterprise replica sets (application database and backup database).
(Diagram: OPS 1, OPS 2 and OPS 3 each run the Ops Manager HTTP service and a backup daemon with its head DBs; the application database primary sits on OPS 1, the backup database primary on OPS 2, and the secondaries on the remaining nodes.)
12. INGREDIENTS - 2. HARDWARE CLUSTER (CHEAPER DEPLOYMENT OF MONGODB)
Linux Ubuntu 16.04 64-bit, 256 GB RAM
Pool of 30 IPs pre-allocated per server (10 Gb network interface)
Local SSD storage (6 TB) per server
Plus some basic orchestration: bash and Python scripts.
13. INGREDIENTS - 3. DOCKER ENGINE (v1.11) (CHEAPER, FASTER DEPLOYMENTS)
Standardised MongoDB deployment: 4 containers per MongoDB replica set
Plus some basic orchestration: bash and Python scripts.
14. PUTTING IT ALL TOGETHER
(Diagram: combining the three ingredients - Ops Manager (1), hardware cluster (2) and Docker (3).)
15. RECIPE & ORCHESTRATION
Our guideline: simple, simple, simple = only 2 commands (maybe 3!)
• create_replicaset.py -group sncf-test -size small -name sncf1 -nb 3 -file passwdfile.csv -backup default -alerting default -env prod -version 3.4 -dryrun
• remove_replicaset.py -name replicaset-name
In the future:
• upgrade_replicaset.py -name replicaset-name -size large -nb nb-replicaset-member -env prod
16. RECIPE & ORCHESTRATION
create_replicaset.py -help
-group     group_name
-size      container size: S/M/XL
-name      replicaset-name
-nb        nodes (3/3a/5/5a)
-file      passwordfile.csv
-backup    policy
-alerting  policy
-env       prod/preprod
-version   3.2.10
-dryrun    (reporting only)
-help      this help message
Steps: Ops Manager API - Create Group | Docker - Create Image | Capacity Planning
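For readers who want to reproduce this, here is a minimal sketch of how such a command line could be declared with Python's argparse. The option names follow the help text above, but the defaults, choices and required flags are assumptions, not SNCF's actual script.
import argparse

# Hypothetical reconstruction of the create_replicaset.py options (not SNCF's original code).
parser = argparse.ArgumentParser(prog="create_replicaset.py",
                                 description="Deploy a MongoDB replica set with Docker and Ops Manager")
parser.add_argument("-group", required=True, help="Ops Manager group name")
parser.add_argument("-size", default="small", help="container size (S/M/XL)")
parser.add_argument("-name", required=True, help="replica set name")
parser.add_argument("-nb", default="3", choices=["3", "3a", "5", "5a"], help="number of nodes")
parser.add_argument("-file", help="password file (CSV)")
parser.add_argument("-backup", default="default", help="backup policy")
parser.add_argument("-alerting", default="default", help="alerting policy")
parser.add_argument("-env", choices=["prod", "preprod"], default="preprod", help="target environment")
parser.add_argument("-version", default="3.2.10", help="MongoDB version to deploy")
parser.add_argument("-dryrun", action="store_true", help="reporting only, no changes")
args = parser.parse_args()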
17. RECIPE & ORCHESTRATION - Ops Manager API: Create Group
import json
import requests
from requests.auth import HTTPDigestAuth

# Create a new Ops Manager group; host, user, key, headers and payload
# are provided by the orchestration script.
r = requests.post(host + "/api/public/v1.0/groups",
                  auth=HTTPDigestAuth(user, key),
                  headers=headers,
                  data=json.dumps(payload))
j = r.json()
group_id = j["id"]                # identifier of the new group
agent_api_key = j["agentApiKey"]  # API key used by the automation agent
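The call above assumes host, user, key, headers and payload variables that the slide does not show. A minimal sketch of what they could look like, assuming only the mandatory group name in the payload and site-specific credentials, is:
# Hypothetical values; the Ops Manager base URL and credentials are site-specific.
host = "https://opsmanager.example.sncf:8080"
user = "api-user@example.sncf"
key = "OPS-MANAGER-API-KEY"
headers = {"Content-Type": "application/json"}
payload = {"name": "sncf-test"}   # name of the group (project) to create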
18. RECIPE & ORCHESTRATION - Docker: Create Image
Dockerfile:
# Base image with the Ops Manager automation agent pre-installed
FROM ubuntu-sncf:16.04
MAINTAINER ext.osmozium.david.tsang-hin-sun@sncf.fr
RUN apt-get update
RUN apt-get install -y net-tools
RUN apt-get install -y vim-tiny
RUN apt-get install -y aptitude
RUN apt-get install -y libsasl2-2
RUN apt-get install -y ssl-cert ca-certificates openssl
RUN apt-get install -y munin-node
# Install the MongoDB automation agent and its configuration
ADD mongodb-mms-automation-agent-manager_2.5.18.1647-1_amd64.deb /tmp/mongodb-mms-automation-agent-manager_2.5.18.1647-1_amd64.deb
RUN dpkg -i /tmp/mongodb-mms-automation-agent-manager_2.5.18.1647-1_amd64.deb
ADD automation-agent.config /etc/mongodb-mms/automation-agent.config
RUN chown mongodb:mongodb /etc/mongodb-mms/automation-agent.config
RUN chmod 600 /etc/mongodb-mms/automation-agent.config
RUN mkdir /data && chown -R mongodb:mongodb /data
ADD runautomationagent.sh /runautomationagent.sh
ENTRYPOINT /runautomationagent.sh && bash
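The automation-agent.config baked into the image has to point each container at the group created earlier. A minimal sketch of how the orchestration script could render that file from the API response, assuming the agent's standard mmsGroupId / mmsApiKey / mmsBaseUrl keys (the actual SNCF template is not shown), is:
# Hypothetical helper: write the automation agent configuration for a new group.
def write_agent_config(path, group_id, agent_api_key, base_url):
    lines = [
        "mmsGroupId=" + group_id,
        "mmsApiKey=" + agent_api_key,
        "mmsBaseUrl=" + base_url,
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_agent_config("automation-agent.config", group_id, agent_api_key,
                   "https://opsmanager.example.sncf:8080")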
19-21. RECIPE & ORCHESTRATION - Provisioning the Docker hosts
Three steps, one slide each: reserve IP addresses, create and mount LVM volumes, start containers.
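The slides do not show the host-side commands, so here is a minimal sketch of the "create and mount LVM volumes" step, assuming a vg_docker volume group, ext4 and the /data/<replicaset>/<ip> layout used by the docker run command that follows (all assumptions, not SNCF's script). The docker run command below then performs the "start containers" step.
import os
import subprocess

# Hypothetical sketch: create and mount one data volume per replica set node.
def create_data_volume(replicaset, node_ip, size_gb=50):
    lv_name = "%s-%s" % (replicaset, node_ip.replace(".", "-"))
    mount_point = "/data/%s/%s" % (replicaset, node_ip)
    subprocess.check_call(["lvcreate", "-L", "%dG" % size_gb, "-n", lv_name, "vg_docker"])
    subprocess.check_call(["mkfs.ext4", "/dev/vg_docker/" + lv_name])
    os.makedirs(mount_point, exist_ok=True)
    subprocess.check_call(["mount", "/dev/vg_docker/" + lv_name, mount_point])
    return mount_point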
docker run --name $replica-node-$n --restart=always -d -h AAA.BBB.CCC.DDD -m 2G \
  --cpu-shares 1024 --blkio-weight=300 -p AAA.BBB.CCC.DDD:27017:27017 -ti \
  -v /data/replicaset1/AAA.BBB.CCC.DDD:/data:rw \
  -v /home/docker/vol/replicaset1/AAA.BBB.CCC.DDD:/var/lib/mongodb-mms-automation/:rw \
  sncf/ubuntu-replicaset1:16.04 /bin/bash
23. RECIPE & ORCHESTRATION - Ops Manager Backup API
(Also used: Ops Manager Alerts API, Ops Manager Automation API.)
{
  "clusterId": "CLUSTER-ID",
  "dailySnapshotRetentionDays": 7,
  "groupId": "BACKUP-ID",
  "monthlySnapshotRetentionMonths": 13,
  "pointInTimeWindowHours": 24,
  "snapshotIntervalHours": 6,
  "snapshotRetentionDays": 2,
  "weeklySnapshotRetentionWeeks": 4
}
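A minimal sketch of how this snapshot schedule could be applied through the API, assuming the snapshotSchedule resource under backupConfigs and reusing the host/user/key variables from the group-creation step (cluster_id would come from the group's clusters resource), is:
import json
import requests
from requests.auth import HTTPDigestAuth

# Hypothetical: apply the snapshot schedule shown above to a backed-up cluster.
schedule = {
    "dailySnapshotRetentionDays": 7,
    "monthlySnapshotRetentionMonths": 13,
    "pointInTimeWindowHours": 24,
    "snapshotIntervalHours": 6,
    "snapshotRetentionDays": 2,
    "weeklySnapshotRetentionWeeks": 4,
}
url = host + "/api/public/v1.0/groups/" + group_id + "/backupConfigs/" + cluster_id + "/snapshotSchedule"
r = requests.patch(url,
                   auth=HTTPDigestAuth(user, key),
                   headers={"Content-Type": "application/json"},
                   data=json.dumps(schedule))
r.raise_for_status()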
24. RECIPE & ORCHESTRATION - Ops Manager Alerts API
{
  "eventTypeName": "MONITORING_AGENT_DOWN",
  "groupId": "GROUP-ID",
  "notifications": [
    {
      "delayMin": 0,
      "emailEnabled": true,
      "intervalMin": 60,
      "smsEnabled": false,
      "typeName": "GROUP"
    }
  ],
  "typeName": "AGENT"
}
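A similar sketch for registering that alert configuration, assuming the group's alertConfigs resource and the same credentials as before:
import json
import requests
from requests.auth import HTTPDigestAuth

# Hypothetical: create the "monitoring agent down" alert for the new group.
alert = {
    "eventTypeName": "MONITORING_AGENT_DOWN",
    "groupId": group_id,
    "typeName": "AGENT",
    "notifications": [
        {"typeName": "GROUP", "delayMin": 0, "intervalMin": 60,
         "emailEnabled": True, "smsEnabled": False}
    ],
}
r = requests.post(host + "/api/public/v1.0/groups/" + group_id + "/alertConfigs",
                  auth=HTTPDigestAuth(user, key),
                  headers={"Content-Type": "application/json"},
                  data=json.dumps(alert))
r.raise_for_status()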
25. RECIPE & ORCHESTRATION
Notify user
26. IMPROVEMENTS TO THE RECIPE
Scalability? - Add more physical hosts
Security? - Limit use of the Global Owner role in the Ops Manager API
Availability? - Use of 3 distinct data centers
Software upgrades? - Docker 1.12 with Swarm mode? Docker API?
- Graphical interface for a true "as a service" experience?
27. CONCLUSION
CHEAPER: several applications hosted on shared services
FASTER: end-to-end industrialization, streamlined architectures
MORE RESILIENT: availability rate, recovery point objective of less than one hour of data loss
28. NEXT STEPS & IMPROVEMENTS
Integrate into a software factory
Integrate into the internal cloud
Create a disaster recovery plan