The document discusses various WiredTiger configuration variables that can be used to tune MongoDB's performance. It details variables such as journalCompressor, blockCompressor, prefixCompression, concurrent transaction limits, eviction triggers and targets, dirty triggers and targets, and eviction threads. Benchmarks compare the default settings with alternative configurations, showing the impact on load time and transactions per second.
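For orientation, these knobs map onto a mongod configuration file roughly as follows. This is a hedged sketch: the values are illustrative placeholders, not the settings used in the benchmarks, and the eviction parameters are passed through WiredTiger's `configString` escape hatch.

```yaml
storage:
  wiredTiger:
    engineConfig:
      journalCompressor: snappy   # e.g. none, snappy, zlib
      # Eviction threads, eviction targets/triggers, dirty targets/triggers
      # go through WiredTiger's raw configuration string:
      configString: "eviction=(threads_min=4,threads_max=8),eviction_target=80,eviction_trigger=95,eviction_dirty_target=5,eviction_dirty_trigger=20"
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
setParameter:
  # Concurrent transaction (read/write ticket) limits
  wiredTigerConcurrentReadTransactions: 128
  wiredTigerConcurrentWriteTransactions: 128
```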
In this deck from ATPESC 2019, Jack Dongarra from UT Knoxville presents: Adaptive Linear Solvers and Eigensolvers.
"Success in large-scale scientific computations often depends on algorithm design. Even the fastest machine may prove to be inadequate if insufficient attention is paid to the way in which the computation is organized. We have used several problems from computational physics to illustrate the importance of good algorithms, and we offer some very general principles for designing algorithms. Two subthemes are, first, the strong connection between the algorithm and the architecture of the target machine; and second, the importance of non-numerical methods in scientific computations."
Watch the video: https://wp.me/p3RLHQ-lq3
Learn more: https://extremecomputingtraining.anl.gov/archive/atpesc-2019/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Parallel computing in bioinformatics (Balti Bioinformatics) - Torsten Seemann
I describe the three levels of parallelism that can be exploited in bioinformatics software (1) clusters of multiple computers; (2) multiple cores on each computer; and (3) vector machine code instructions.
Review: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN - Bruno Castelucci
Reading and presenting the paper: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN.
http://conferences.sigcomm.org/sigcomm/2013/program.php
http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p99.pdf
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe (Anne Nicolas)
Storage keeps moving forward, and so does the Linux IO stack. This talk will detail some of the recent additions and changes that have gone into the Linux kernel storage stack, helping Linux get the most out of industry innovations in that space.
Jens Axboe, Facebook
An introduction to and evaluation of a wide-area distributed storage system - Hiroki Kashiwazaki
A presentation given at the Storage Developer Conference (SDC) 2014 in Santa Clara, California: a general overview of distcloud to date and its future.
Crimson: Ceph for the Age of NVMe and Persistent Memory - ScyllaDB
Ceph is a mature open source software-defined storage solution that was created over a decade ago.
During that time new faster storage technologies have emerged including NVMe and Persistent memory.
The Crimson project aims to create a Ceph OSD better suited to these faster devices. The Crimson OSD is built on the Seastar C++ framework and leverages these devices by minimizing latency, CPU overhead, and cross-core communication. This talk will discuss the project's design, its current status, and future plans.
CRIU: Time and Space Travel for Linux Containers - Kirill Kolyshkin
This talk describes CRIU (Checkpoint/Restore In Userspace), software used to checkpoint, restore, and live-migrate Linux containers and processes. It describes live migration, compares it with VM live migration, and shows other uses for checkpoint/restore.
Shared Memory Performance: Beyond TCP/IP with Ben Cotton, JPMorgan - Hazelcast
In this webinar you'll learn about the importance of advanced data locality and IPC data transports for Java distributed cache data grids, topics that are crucial to the HPC Linux supercomputing community. The presenter will show how using native /dev/shm as an IPC transport can achieve latencies up to 1,000x lower than TCP/IP.
We’ll cover the following topics:
- Why going off-heap is fundamental to meeting real-time SLAs
- Why traditional grid transports (TCP/IP) are not good enough for the HPC Linux supercomputing community
- Why we need something more than TCP/IP
- Live Q&A session
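A minimal sketch of the core idea, assuming a Linux tmpfs mount at /dev/shm (with a plain temp-directory fallback elsewhere): two cooperating processes map the same file and exchange bytes through memory, never crossing the TCP/IP stack. The segment path and the length-prefix framing here are illustrative, not from the webinar.

```python
import mmap
import os
import struct
import tempfile

# /dev/shm is a RAM-backed tmpfs on Linux; data written there is visible to
# any process mapping the same file, with no network or disk in the path.
SEG_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
SEG_PATH = os.path.join(SEG_DIR, "demo_ipc_segment")
SEG_SIZE = 4096

def writer(payload: bytes) -> None:
    # Create a fixed-size segment, then write a 4-byte length header
    # followed by the payload.
    with open(SEG_PATH, "wb") as f:
        f.truncate(SEG_SIZE)
    with open(SEG_PATH, "r+b") as f:
        with mmap.mmap(f.fileno(), SEG_SIZE) as m:
            m[0:4] = struct.pack("<I", len(payload))
            m[4:4 + len(payload)] = payload

def reader() -> bytes:
    # A second process would run this against the same path.
    with open(SEG_PATH, "r+b") as f:
        with mmap.mmap(f.fileno(), SEG_SIZE) as m:
            (n,) = struct.unpack("<I", m[0:4])
            return bytes(m[4:4 + n])

writer(b"hello over shared memory")
print(reader().decode())  # hello over shared memory
```

In a real grid transport this single slot would be replaced by a ring buffer with proper synchronization (e.g. futexes or atomic sequence counters); the sketch only shows why the data path itself is so cheap.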
Presenter:
Ben Cotton, Consultant at JPMorgan
Ben has been an IT consultant to the financial services industry for nearly 20 years. He specializes in open source, transactions, caching, data grids, and fixed income & derivatives trading systems.
Ben is active in the following communities:
Java Community Process Member
JSR-156 expert group: Java XML Transactions API
JSR-107 expert group: Java Caching API
JSR-347 expert group: Java Data Grids API
Red Hat community member (Fedora, Infinispan, JBoss XTS) and code contributor
AppOS: PostgreSQL Extension for Scalable File I/O @ PGConf.Asia 2019 - Sangwook Kim
General-purpose operating systems sacrifice database performance in order to preserve generality. This performance impact has become particularly salient with modern hardware such as high-performance SSDs. On the other hand, substantial effort is required to customize or construct operating-system-like functionality, such as user-level CPU and disk I/O scheduling, inside a database engine to meet performance needs. In this talk, I will present AppOS, a library OS implemented as a PostgreSQL extension for scalable file I/O. AppOS improves PostgreSQL throughput by up to 13x and 99th-percentile response time by up to 32x for a write-intensive workload, without code changes or hardware upgrades. Specifically, I will explain why the Linux I/O stack is problematic for PostgreSQL performance. Next, I will introduce the core concepts and internals of AppOS. Finally, I will share several use cases and performance results of AppOS for PostgreSQL.
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood) - Red Hat Developers
Just like a spoon full of sugar will cure your hiccups, running your JVM with -XX:+UseShenandoahGC will cure your Java garbage collection hiccups. Shenandoah GC is a new garbage collector algorithm developed for OpenJDK at Red Hat, which will produce much better pause times than the currently-available algorithms without a significant decrease in throughput. In this session, we'll explain how Shenandoah works and compare it to the currently-available OpenJDK garbage collectors.
Slides from #PromCon2018 Munich.
https://promcon.io/2018-munich/talks/thanos-prometheus-at-scale/
Bartłomiej Płotka
Fabian Reinartz
The Prometheus monitoring system has been thriving for several years. Along with its powerful data model, operational simplicity and reliability have been key factors in its success. However, some questions remain largely unaddressed. How can we store historical data on the order of petabytes in a reliable and cost-efficient way? Can we do so without sacrificing responsive query times? And what about a global view of all our metrics and transparent handling of HA setups?
Thanos takes Prometheus' strong foundations and extends them into a clustered, yet coordination-free, globally scalable metric system. It retains Prometheus's simple operational model and even simplifies deployments further. Under the hood, Thanos uses highly cost-efficient object storage that's available in virtually all environments today. By building directly on top of the storage format introduced with Prometheus 2.0, Thanos achieves near real-time responsiveness even for cold queries against historical data, all while having virtually no cost overhead beyond that of the underlying object storage.
We will show the theoretical concepts behind Thanos and demonstrate how it seamlessly integrates into existing Prometheus setups.
This talk shares solutions for common Java memory problems, including:
1. java.lang.OutOfMemoryError
2. Frequent full GCs
3. CMS GC errors: promotion failed or concurrent mode failure
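For reference, a few commonly used HotSpot flags when diagnosing these three cases. This is an illustrative sketch, not from the talk: the paths, heap size, and the hypothetical app.jar are placeholders, and the GC-logging and CMS flags shown are the pre-JDK 9 forms.

```shell
# 1. Capture a heap dump when an OutOfMemoryError is thrown
# 2. Log GC activity to investigate frequent full GCs (JDK 8 syntax)
# 3. Start CMS cycles earlier to reduce "promotion failed" /
#    "concurrent mode failure" errors
java -Xmx4g \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/app/heap.hprof \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log \
     -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar app.jar
```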
In this deck from the DDN User Group at SC14, Tommy Minyard from TACC presents: Site-wide Storage Use Case and Early User Experience with Infinite Memory Engine.
"IME unleashes a new I/O provisioning paradigm. This breakthrough, software defined storage application introduces a whole new new tier of transparent, extendable, non-volatile memory (NVM), that provides game-changing latency reduction and greater bandwidth and IOPS performance for the next generation of performance hungry scientific, analytic and big data applications – all while offering significantly greater economic and operational efficiency than today’s traditional disk-based and all flash array storage approaches that are currently used to scale performance."
Watch the video presentation: http://insidehpc.com/2014/12/site-wide-storage-use-case-early-user-experience-infinite-memory-engine/
Technologies for working with disk storage and file systems in Windows Server - Vitaly Starodubtsev
- What Storage Replica is
- Architecture and scenarios
- Synchronous and asynchronous replication
- Disk-to-disk, server-to-server, intra-cluster, and cluster-to-cluster replication
- Designing and planning Storage Replica
- What's new in Windows Server 2016 TP5
- The management GUI and other capabilities: demo and development plans
- Storage Replica integration with Storage Spaces Direct
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten - MongoDB
Find out more about our journey of migrating to MongoDB after using Oracle for our hotel search database for over ten years.
- How did we solve the synchronization problem with the Master Database?
- How to get fast search results (even with massive write operations)?
- How other issues were solved
Building an Amazon Datawarehouse and Using Business Intelligence Analytics Tools - Amazon Web Services
Using AWS has never been easier or more affordable to solve business problems and uncover new opportunities using data. Now, businesses of all sizes and across all industries can take advantage of big data technologies and easily collect, store, process, analyze, and share their data. Gain a thorough understanding of what AWS offers across the big data lifecycle and learn architectural best practices for applying these technologies to your projects. We will also deep dive into how to use AWS services such as Kinesis, DynamoDB, Redshift, and Quicksight to optimize logging, build real-time applications, and analyze and visualize data at any scale.
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase - HBaseCon
In this presentation, we will introduce Hotspot's Garbage First collector (G1GC) as the most suitable collector for latency-sensitive applications running with large memory environments. We will first discuss G1GC internal operations and tuning opportunities, and also cover tuning flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We will provide several HBase case studies using Java heaps as large as 100GB that show how to best tune applications to remove unpredicted, protracted GC pauses.
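As a hedged illustration of the flag families the talk covers (the values below are placeholders to tune per workload, not recommendations from the presentation), G1 settings for a large-heap RegionServer are typically passed via HBASE_REGIONSERVER_OPTS in hbase-env.sh:

```shell
# Illustrative G1 flags for a very large heap:
#   MaxGCPauseMillis               - desired GC pause target
#   InitiatingHeapOccupancyPercent - adaptive threshold for starting
#                                    concurrent marking
#   G1HeapRegionSize               - larger regions for 100GB-class heaps
export HBASE_REGIONSERVER_OPTS="-Xms100g -Xmx100g -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -XX:G1HeapRegionSize=32m \
  -XX:ParallelGCThreads=16 -XX:ConcGCThreads=4"
```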
A Front-Row Seat to Ticketmaster’s Use of MongoDB - MongoDB
Ticketmaster is the world leader in selling tickets. After more than a decade of developing applications extensively on Oracle and MySQL, Ticketmaster made the move to MongoDB. The reasons for the move are generally in line with those of other companies – increased flexibility and performance, and decreased costs and time-to-market. In this session we’ll discuss how the conversion to MongoDB went at Ticketmaster and we’ll take a deeper dive into some of the successes and set-backs that we faced. We’ll give an overview of the MongoDB topology at Ticketmaster, discuss exactly what data we’re writing to MongoDB and comment on the MongoDB support model that we’re using. We’ll also touch on the transition from relational DBA to NoSQL DBA at Ticketmaster.
Speedrunning the Open Street Map osm2pgsql Loader - GregSmith458515
The Open Street Map project provides invaluable data that keeps driving users toward the PostGIS and PostgreSQL stacks. Loading today’s full Planet data set takes a 120GB XML file and unrolls it into over a terabyte of database data. Crunchy’s benchmark labs have followed the expansion of that Planet data over the last six database releases, as the re-ignition of the CPU wars combined with parallel execution features landing in the database. We’ll take a look at that data evolution, which server configurations worked, and which metrics techniques still matter in the all-SSD era.
MongoDB .local Bengaluru 2019: The Journey of Migration from Oracle to MongoDB at Rakuten - MongoDB
Find out more about our journey of migrating to MongoDB after using Oracle for our hotel search database for over ten years.
- How did we solve the synchronization problem with the Master Database?
- How to get fast search results (even with massive write operations)?
- How other issues were solved
Flink Forward Berlin 2017: Robert Metzger - Keep it going - How to reliably a... - Flink Forward
Let’s be honest: running a distributed stateful stream processor that handles terabytes of state and tens of gigabytes of data per second, while being highly available and correct (in an exactly-once sense), does not work without planning, configuration, and monitoring. While the Flink developer community tries to make everything as simple as possible, it is still important to be aware of all the requirements and implications. In this talk, we will provide some insights into the greatest operations mysteries of Flink from a high-level perspective:
- Capacity and resource planning: understand the theoretical limits.
- Memory and CPU configuration: distribute resources according to your needs.
- Setting up high availability: planning for failures.
- Checkpointing and state backends: ensure correctness and fast recovery.
For each of these topics, we will introduce the relevant Flink concepts and share best practices we have learned over the past years supporting Flink users in production.
Comparing Geospatial Implementation in MongoDB, Postgres, and Elastic - Antonios Giannopoulos
For a considerable set of applications, querying geographical data is a critical operation. Fast responses combined with a high level of accuracy are often the requirements when an application user interacts with operations of the type “give me what's near me” or “find me in area XYZ”. Additional complexity is usually added when the points of interest are constantly on the move, like a public transportation vehicle or a taxi.
For applications that frequently access geographical data and rely on both speed and accuracy, both application and database design are crucial. In this presentation, we are going to focus on the database side. More specifically, we are going to evaluate three of the most popular open-source databases, MongoDB, Postgres, and Elastic, against geospatial workloads. For each of these databases, we are going to examine the implementation and the performance of geo-queries. We are going to discuss best practices and design patterns for each database and try to find a winner among the three.
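As a concrete point of comparison, the MongoDB side of a “near me” query combines a 2dsphere index with the $near operator. This is an illustrative sketch (the places collection and location field are hypothetical), not taken from the presentation:

```javascript
// Hypothetical collection of points of interest with GeoJSON locations
db.places.createIndex({ location: "2dsphere" });

// "Give me what's near me": points within 500 meters of a coordinate,
// returned nearest-first by $near
db.places.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.9667, 40.78] },
      $maxDistance: 500  // meters
    }
  }
});
```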
Kafka is a distributed event streaming platform that has become very popular in the past couple of years.
In a MongoDB-Kafka architecture, MongoDB may be configured as both a sink and a source. As a sink, it ingests events from your Kafka topics directly into MongoDB collections, exposing the data to your services for efficient querying, enrichment, and analytics. As a source, it publishes data changes from MongoDB into Kafka topics for streaming to consuming apps.
We are going to cover both scenarios (using MongoDB as sink and source) by demonstrating ways to efficiently connect each datastore to the other. At the same time, we will cover use cases and best practices for using Kafka and MongoDB together.
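Both directions are typically wired up through Kafka Connect using the MongoDB Kafka Connector. A minimal sketch follows; the topic, database, collection names, and URI are hypothetical, and in practice each connector gets its own configuration (they are shown together here only for comparison):

```properties
# Sink: ingest events from a Kafka topic into a MongoDB collection
name=mongo-sink
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics=orders
connection.uri=mongodb://localhost:27017
database=shop
collection=orders

# Source: publish MongoDB change events into a Kafka topic
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://localhost:27017
database=shop
collection=orders
```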
MongoDB 4.2 will soon be generally available, delivering some amazing new features in multiple areas. In this talk, we will focus on changes related to sharded clusters. We are going to cover distributed transactions and mutable shard keys, providing examples that reveal the internals of those new features. We will provide best practices around the new sharding features and cover other minor related changes.
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2 - Antonios Giannopoulos
MongoDB 4.2 will soon be generally available, delivering some amazing new features in multiple areas. In this talk, we will focus on the new capabilities of the aggregation framework. We are going to cover the new operators and expressions. At the same time, we will explore how update commands can now use aggregation framework operators. We will also present aggregation framework improvements, focusing on on-demand materialized views. Finally, we will explore the wildcard indexes introduced in MongoDB 4.2 and how they change the way we design documents and build queries and aggregations, and we will touch on the new index build system.
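Two of these 4.2 features can be sketched briefly (collection and field names below are hypothetical, not from the talk): $merge writes aggregation output back to a collection, which is the basis of on-demand materialized views, and a wildcard index covers arbitrary subfields of a flexible document:

```javascript
// On-demand materialized view: refresh a summary collection with $merge
db.sales.aggregate([
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  { $merge: { into: "sales_by_region", whenMatched: "replace" } }
]);

// Wildcard index: index every subfield under a free-form attributes object
db.products.createIndex({ "attributes.$**": 1 });
```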
Every new version of MongoDB comes with exciting new features and many improvements, and version 4.0 is no exception. An upgrade from previous versions will unlock long-awaited features like transactions, but without proper planning it could be catastrophic for your organization.
This presentation will guide you through the steps for planning and implementing an upgrade to MongoDB 4.0. We will examine how MongoDB 4.0 affects your organization's ecosystem and what changes might be necessary prior to the upgrade. We will demonstrate the upgrade steps with a detailed rollback plan. Finally, we will cover some post-upgrade considerations that will let you unleash the power of MongoDB 4.0.
Elasticsearch is well known as a highly scalable search engine that stores data in a structure optimized for language-based searches, but its capabilities and use cases don't stop there. In this tutorial, I'll give you a hands-on introduction to Elasticsearch and a glimpse of some of its fundamental concepts.
Database administration is challenging, and Elasticsearch is no exception to that rule. In this tutorial, we will cover administrative topics like installation and configuration, cluster/node management, index management, and monitoring cluster health. Building applications on top of Elasticsearch is also challenging and raises concerns about schema design. We will therefore also cover developer-oriented topics like mappings and analysis, aggregations, and schema design that will help you build a robust application on top of Elasticsearch.
There will be lab sessions at the end of some chapters so please have your laptops with you.
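A small example of the kind of mapping the developer-oriented chapters deal with, in the Kibana console request style (the articles index and its fields are hypothetical):

```json
PUT /articles
{
  "mappings": {
    "properties": {
      "title":     { "type": "text", "analyzer": "english" },
      "tags":      { "type": "keyword" },
      "published": { "type": "date" }
    }
  }
}
```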
A database trigger is a stored procedure that is executed when specific actions occur within a database. Triggers fit perfectly into a relational schema (foreign keys) and are implemented as built-in functionality in popular relational databases like MySQL.
MongoDB does not have any support for triggers, mainly due to the lack of support for foreign keys. Even if it is usually considered an antipattern, there are use cases in MongoDB that benefit from a partially relational schema. The lack of triggers is an obstacle for a partially relational schema, but there are workarounds for simulating trigger behavior.
This presentation will guide you through different ways to implement triggers in MongoDB. We will cover change streams, tailable cursors, and hooks. We will demonstrate coding examples for each topic and explain the pros and cons of each implementation.
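The change-stream approach boils down to consuming events and dispatching them to registered handlers, much like trigger firing. The sketch below is a hypothetical illustration of that dispatch logic only: in a live deployment the event dicts would come from PyMongo's `collection.watch()`, but here two hand-crafted events (shaped like real change stream documents) keep the example runnable without a server.

```python
from typing import Callable, Dict

# Registry mapping a change stream operationType to a "trigger" handler.
handlers: Dict[str, Callable[[dict], str]] = {}

def trigger(op: str):
    """Register a handler for a change stream operationType."""
    def register(fn):
        handlers[op] = fn
        return fn
    return register

@trigger("insert")
def on_insert(event):
    doc = event["fullDocument"]
    return f"insert trigger fired for _id={doc['_id']}"

@trigger("update")
def on_update(event):
    changed = event["updateDescription"]["updatedFields"]
    return f"update trigger fired, changed fields: {sorted(changed)}"

def dispatch(event: dict) -> str:
    """What a change stream consumer loop would call for each event."""
    fn = handlers.get(event["operationType"])
    return fn(event) if fn else "no trigger registered"

# Hand-crafted events mimicking change stream documents:
events = [
    {"operationType": "insert", "fullDocument": {"_id": 1, "name": "alice"}},
    {"operationType": "update",
     "updateDescription": {"updatedFields": {"name": "bob"}}},
]
results = [dispatch(e) for e in events]
```

With a real deployment, `for event in collection.watch(): dispatch(event)` replaces the hand-crafted list; the handler registry is unchanged.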
This tutorial will guide you through the many considerations when deploying a sharded cluster. We will cover the services that make up a sharded cluster, configuration recommendations for these services, shard key selection, use cases, and how data is managed within a sharded cluster. Maintaining a sharded cluster also has its challenges. We will review these challenges and how you can prevent them with proper design or ways to resolve them if they exist today. Additional topics like tag aware sharding (Zones), disaster recovery, and data streaming will also be covered.
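Shard key selection and tag-aware sharding (Zones) come together in a handful of mongo shell commands; the sketch below is illustrative (the app database, users collection, shard name, and EU zone are hypothetical):

```javascript
// Enable sharding and pick a compound shard key
sh.enableSharding("app");
sh.shardCollection("app.users", { region: 1, _id: 1 });

// Tag-aware sharding (Zones): pin a shard key range to specific shards
sh.addShardToZone("shard0000", "EU");
sh.updateZoneKeyRange(
  "app.users",
  { region: "EU", _id: MinKey },
  { region: "EU", _id: MaxKey },
  "EU"
);
```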
How Sitecore depends on MongoDB for scalability and performance, and what it can teach you - Antonios Giannopoulos
Percona Live 2017 - How Sitecore depends on MongoDB for scalability and performance, and what it can teach you, by Antonios Giannopoulos and Grant Killian