This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
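As a taste of the monitoring side, here is a minimal sketch (not from the presentation) of the last two tasks using Python and psycopg2; the connection string is a placeholder and assumes a role allowed to read the statistics views:

```python
import psycopg2

conn = psycopg2.connect("dbname=postgres user=postgres")  # placeholder DSN
with conn.cursor() as cur:
    # One row per backend: process id, database, state, and current query.
    cur.execute("SELECT pid, datname, state, query FROM pg_stat_activity;")
    for pid, datname, state, query in cur.fetchall():
        print(pid, datname, state, query)

    # Human-readable on-disk size of every database in the cluster.
    cur.execute("SELECT datname, pg_size_pretty(pg_database_size(datname)) "
                "FROM pg_database;")
    for datname, size in cur.fetchall():
        print(datname, size)
conn.close()
```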
This document provides instructions for setting up a MongoDB replica set across multiple virtual machines. It describes configuring the yum repository, installing the MongoDB packages on each VM, and creating the directories that store data. It then explains how to initialize and configure a local 3-node replica set, add members, and check the replica set status. Finally, it briefly discusses connecting to primary and secondary members, performing CRUD operations, and setting up MongoDB Management Service (MMS) for monitoring and backups.
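For illustration, a hedged sketch of the initialization step with PyMongo; the hostnames and replica-set name are placeholders, and the three mongod processes are assumed to already be running with --replSet rs0:

```python
from pymongo import MongoClient

# Connect directly to the node that will receive replSetInitiate.
client = MongoClient("vm1.example.com", 27017, directConnection=True)

client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "vm1.example.com:27017"},
        {"_id": 1, "host": "vm2.example.com:27017"},
        {"_id": 2, "host": "vm3.example.com:27017"},
    ],
})

# Equivalent of rs.status() in the shell: which member is primary?
status = client.admin.command("replSetGetStatus")
for member in status["members"]:
    print(member["name"], member["stateStr"])
```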
MongoDB: Advantages of an Open Source NoSQL Database (FITC)
Save 10% off ANY FITC event with discount code 'slideshare'
See our upcoming events at www.fitc.ca
OVERVIEW
The presentation gives an overview of the MongoDB NoSQL database, its history, and its current status as the leading NoSQL database. It focuses on how NoSQL, and MongoDB in particular, benefits developers building big data or web-scale applications, discusses the community around MongoDB, and compares it to commercial alternatives. An introduction to installing, configuring, and maintaining standalone instances and replica sets is also provided.
Presented live at FITC's Spotlight:MEAN Stack on March 28th, 2014.
More info at FITC.ca
Elasticsearch allows users to group related data into logical units called indices. An index is defined using the create index API, and documents are indexed into it. Indices are partitioned into shards, which can be distributed across multiple nodes for scaling; each shard is a standalone Lucene index. Documents must be in JSON format with a unique ID and can contain any text or numeric data to be searched or analyzed.
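A small sketch of that flow with a recent (8.x) official Python client; the index name, shard count, and document are illustrative assumptions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create an index partitioned into three shards, one replica per shard.
es.indices.create(
    index="articles",
    settings={"number_of_shards": 3, "number_of_replicas": 1},
)

# Index a JSON document under a unique ID, then read it back.
es.index(index="articles", id="1", document={"title": "Hello", "views": 42})
print(es.get(index="articles", id="1")["_source"])
```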
MySQL async message subscription platform (Louis liu)
The document discusses Roma, an asynchronous MySQL message system that subscribes to messages from the Canal system. Roma stores these messages in its own storage (MetaQ) since Canal cannot reliably store messages. Clients can then subscribe to the messages from Roma. Roma acts as an intermediary between Canal and clients to ensure messages are reliably stored.
This document discusses streaming replication in PostgreSQL. It covers how streaming replication works, including the write-ahead log and replication processes. It also discusses setting up replication between a primary and standby server, including configuring the servers and verifying replication is working properly. Monitoring replication is discussed along with views and functions for checking replication status. Maintenance tasks like adding or removing standbys and pausing replication are also mentioned.
The paperback version is available on lulu.com: http://goo.gl/fraa8o
This is the first volume of the PostgreSQL database administration book. It covers the steps for installing, configuring, and administering PostgreSQL 9.3 on Debian Linux, addressing both the logical and physical aspects of PostgreSQL. Two chapters are dedicated to backup and restore.
This document summarizes benchmark tests of NoSQL document databases using MongoDB. It compares the performance of MongoDB's MapReduce and Aggregation Framework on single-node and sharded cluster configurations. The tests measured query response times for common aggregation operations like counting the most frequently mentioned users or hashtags. The results showed that the Aggregation Framework was roughly two times faster than MapReduce. Scaling out to a sharded cluster with multiple nodes initially did not improve performance. However, partitioning the data across multiple shards in a modest three-node cluster showed better performance than a single node, with query times decreasing as more shards were added, up to an optimal number.
The document provides configuration instructions and guidelines for setting up streaming replication between a PostgreSQL master and standby server, including setting parameter values for wal_level, max_wal_senders, wal_keep_segments, creating a dedicated replication role, using pg_basebackup to initialize the standby, and various recovery target options to control the standby's behavior. It also discusses synchronous replication using replication slots and monitoring the replication process on both the master and standby servers.
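As an illustration of the monitoring piece, a sketch using psycopg2; hostnames and the monitoring role are placeholders, and the WAL-position column names vary by PostgreSQL version (sent_location/replay_location before version 10, sent_lsn/replay_lsn afterwards):

```python
import psycopg2

# On the master: one row per connected standby with its WAL positions.
with psycopg2.connect("host=master.example.com dbname=postgres user=replmon") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT application_name, state, sent_lsn, replay_lsn "
                    "FROM pg_stat_replication;")
        for row in cur.fetchall():
            print(row)

# On the standby: pg_is_in_recovery() returns true while it follows the master.
with psycopg2.connect("host=standby.example.com dbname=postgres user=replmon") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT pg_is_in_recovery();")
        print(cur.fetchone()[0])
```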
12cR2 Single-Tenant: Multitenant Features for All Editions (Franck Pachot)
Multitenant architecture is available even without Oracle's multitenant option. In this session, take a look at the overhead and the 12.2 new features so that you can choose between single-tenant and non-container databases. These features include agility in data movement, easy flashback, and fast upgrade.
This document summarizes a presentation about the MySQL backup and monitoring tools mysqldump and ZRM Community. It discusses how mysqldump is still a valid backup tool and gives five reasons to use ZRM, including scheduled backups, backup monitoring, and email notifications. It also covers configuring ZRM with backup sets and parameters, scheduling, monitoring backup files, indexes, and logs, and restoring backups. The presentation concludes by mentioning possibilities for further development, like backing up remote databases.
GOTO 2013: Why Zalando trusts in PostgreSQL (Henning Jacobs)
NoSQL is on the rise, but sadly when people compare the usual NoSQL candidates (Redis, MongoDB, Riak, Cassandra, HBase, ...) to relational databases, they often only mention MySQL. In our presentation we tried to explain the power of the world’s most advanced open-source database, PostgreSQL. In our session we showed various examples of why we at Zalando trust PostgreSQL to reliably handle all our data. We make use of it in various scenarios, from less complex CRUD applications on a single node to highly critical and more complex scenarios. This involves customer and order data with strong constraints for high performance and availability, sharded across multiple nodes. We believe that PostgreSQL is massively underrated and that you should have very good reasons to ignore its great features.
This document summarizes new features and upcoming releases for Ceph. In the Jewel release in April 2016, CephFS became more stable with improvements to repair and disaster recovery tools. The BlueStore backend was introduced experimentally to replace Filestore. Future releases Kraken and Luminous will include multi-active MDS support for CephFS, erasure code overwrites for RBD, management tools, and continued optimizations for performance and scalability.
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013 (odnoklassniki.ru)
Odnoklassniki uses Cassandra for its business data, which doesn't fit into RAM. This data is typically fast-growing, frequently accessed by our users, and must always be available, because it constitutes our primary business as a social network. The way we use Cassandra is somewhat unusual - we don't use Thrift or the Netty-based native protocol to communicate with Cassandra nodes remotely. Instead, we co-locate Cassandra nodes in the same JVM with the business service logic, exposing a business-level interface remotely rather than generic data manipulation. This way, we avoid extra network round trips within a single business transaction and use internal calls to Cassandra classes to get information faster. This also lets us apply many small hacks to Cassandra's internals, yielding huge gains in efficiency and ease of distributed server development.
This document introduces dbdeployer, a tool written in Go that allows users to easily deploy and manage MySQL sandboxes. It summarizes the key features and capabilities of dbdeployer, including installing single or replicated sandboxes, customizing configurations, finding free ports, and exposing various MySQL 8 data dictionary tables. The document provides instructions on downloading, unpacking, and using dbdeployer to deploy different types of MySQL configurations for testing or development purposes.
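For flavor, a hedged sketch of driving dbdeployer from Python; it assumes the dbdeployer binary is on PATH and that a MySQL 8.0.11 tarball has already been unpacked into dbdeployer's binary directory:

```python
import subprocess

def run(*args):
    print("$", " ".join(args))
    subprocess.run(args, check=True)

run("dbdeployer", "deploy", "single", "8.0.11")       # one standalone sandbox
run("dbdeployer", "deploy", "replication", "8.0.11")  # master + two slaves
run("dbdeployer", "sandboxes")                        # list what is deployed
```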
The document discusses Varnish, an open source HTTP accelerator. It provides an overview of caching and why it is useful, explaining how Varnish can be used as a reverse proxy and caching layer in front of a web server. The document then covers setup instructions for Varnish, configuring backends, and using Varnishstat to monitor caching statistics and performance metrics.
Caching and tuning fun for high scalability @ phpBenelux 2011 (Wim Godden)
This document summarizes Wim Godden's presentation on caching and tuning for high scalability. It discusses various caching techniques including caching entire pages, parts of pages, SQL queries, and complex PHP results. It also covers different caching storage options like Memcache and APC. The presentation aims to increase performance, reliability, and scalability through proper caching and tuning techniques.
The document provides details about a presentation on Varnish caching proxy given by Thijs Feryn. It includes an agenda for the presentation covering topics like how browsers cache content, problems with browser caching, and how Varnish can be used as a caching proxy to improve website performance. Setup instructions are provided for installing and configuring Varnish on an Ubuntu server. The presentation also demonstrates how to use the Varnishstat tool to monitor hit rates and cache statistics.
This presentation provides an overview of the Dell PowerEdge R730xd server performance results with Red Hat Ceph Storage. It covers the advantages of using Red Hat Ceph Storage on Dell servers with their proven hardware components that provide high scalability, enhanced ROI cost benefits, and support of unstructured data.
Control your service resources with systemd (Marian Marinov)
This document discusses using systemd to manage control groups (cgroups) and set resource limits for processes and services. It describes how systemd simplified cgroup management by creating a cgroup for each service and allowing configuration via service files and drop-in files. Specific configuration options like memory and CPU limits can be set directly in the service file, via a slice file that multiple services reference, or using systemctl commands. systemd provides unified management of cgroups and services.
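A short sketch of the systemctl route; the unit name is hypothetical, and on older cgroup-v1 hosts the memory property is MemoryLimit rather than MemoryMax:

```python
import subprocess

service = "myapp.service"  # hypothetical unit

# Persist limits as a drop-in file; they apply to the running service too.
subprocess.run(["systemctl", "set-property", service,
                "MemoryMax=512M", "CPUQuota=50%"], check=True)

# Inspect what the service manager now enforces.
subprocess.run(["systemctl", "show", service,
                "-p", "MemoryMax", "-p", "CPUQuotaPerSecUSec"], check=True)
```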
The document discusses tools for troubleshooting database performance issues. It describes operating system tools like ps, vmstat, iostat that can help identify hardware and resource bottlenecks. It also covers PostgreSQL-specific tools like the pg_stat views and logs that provide insight into query performance and activity. Benchmarks like pgbench, bonnie++, and the more complex DBT2 are presented as options for reproducing and analyzing problems in a controlled way. The overall approach presented is to start with less invasive tools and progress to more targeted benchmarks if needed to pinpoint severe issues.
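In that spirit, a minimal sketch of a "less invasive" first check with psycopg2: pg_stat_user_tables reveals tables that are mostly sequentially scanned, a common early clue before reaching for a full benchmark (the connection parameters are placeholders):

```python
import psycopg2

with psycopg2.connect("dbname=mydb user=postgres") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute("SELECT relname, seq_scan, idx_scan "
                    "FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10;")
        for relname, seq_scan, idx_scan in cur.fetchall():
            print(relname, seq_scan, idx_scan)
```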
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016 (Tomas Vondra)
The document provides an overview of different file systems for PostgreSQL including EXT3/4, XFS, BTRFS and ZFS. It discusses the evolution and improvements made to EXT3/4 and XFS over time to address scalability, bugs and new storage technologies like SSDs. BTRFS and ZFS are described as designed for large data volumes and built-in features like snapshots and checksums but BTRFS is still considered experimental. Benchmark results show ZFS and optimized EXT4/XFS performing best and BTRFS performance significantly reduced due to copy-on-write. The conclusion recommends EXT4/XFS for traditional needs and ZFS for advanced features, avoiding BTRFS.
CephFS performance testing was conducted on a Jewel deployment. Key findings include:
- Single MDS performance is limited by its single-threaded design; operations reached CPU limits
- Improper client behavior can cause MDS OOM issues by exceeding inode caching limits
- Metadata operations like create, open, update showed similar performance, reaching 4-5k ops/sec maximum
- Caching had a large impact on performance when the working set exceeded cache size
Varnish is an HTTP accelerator that can be used as a reverse proxy, caching proxy, or load balancer to speed up websites. It works by caching computed data and storing responses for faster serving of future requests. Using a caching proxy like Varnish solves problems with browser caches, provides a single cache, and protects servers from overload by serving cached content. The document provides instructions on setting up Varnish, configuring a backend server, and using the varnishstat tool to view caching statistics and hit rates.
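A small sketch of reading those statistics programmatically; it assumes varnishstat is installed locally, and that the JSON layout may nest counters under a "counters" key depending on the Varnish version:

```python
import json
import subprocess

out = subprocess.run(["varnishstat", "-1", "-j"],
                     capture_output=True, check=True)
stats = json.loads(out.stdout)
counters = stats.get("counters", stats)  # newer versions nest the counters

hits = counters["MAIN.cache_hit"]["value"]
misses = counters["MAIN.cache_miss"]["value"]
print(f"hit rate: {hits / (hits + misses):.1%}")
```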
This document provides a high-level summary of GemStone/S, a multi-user Smalltalk database. It discusses installation, architecture, tools, backup/restore, and other topics. The architecture utilizes a repository to store persistent objects, gem processes to run the Smalltalk virtual machines, a stone process to manage concurrency, and a shared page cache to improve performance. Installation requires configuring the operating system, installing the software in the recommended directory structure, setting environment variables, and obtaining a keyfile.
This document provides a summary of a presentation on becoming an accidental PostgreSQL database administrator (DBA). It covers topics like installation, configuration, connections, backups, monitoring, slow queries, and getting help. The presentation aims to help those suddenly tasked with DBA responsibilities to not panic and provides practical advice on managing a PostgreSQL database.
Facebook has 1.28 billion active users.
But how big is that, really?
This fun and informative presentation gives you a fuller picture of what 1.28 billion looks like, and tells you how to take advantage of those numbers. We compare Facebook to 4 other top social media sites and give you an idea of how useful 1.28 billion people can be.
So, how big IS Facebook, really?
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook (BigDataCloud)
At Facebook, we use various types of databases and storage systems to satisfy the needs of different applications. The solutions built around these data store systems share a common set of requirements: they have to be highly scalable, maintenance costs should be low, and they have to perform efficiently. We use a sharded MySQL+memcache solution to support real-time access to tens of petabytes of data, and we use TAO to provide consistency for this web-scale database across geographical distances. We use the Haystack datastore for storing the 3 billion new photos we host every week. We use Apache Hadoop to mine intelligence from 100 petabytes of click logs and combine it with the power of Apache HBase to store all Facebook Messages.
This talk describes the reasons why each of these databases is appropriate for its workload and the design decisions and tradeoffs that were made while implementing these solutions. We touch upon the consistency, availability, and partitioning tolerance of each of these solutions, and upon the reasons why some of these systems need ACID semantics while others do not. We also briefly touch upon plans for future big-data deployments across geographical locations and our requirements for a new breed of pure-memory and pure-SSD transactional databases.
Facebook uses HBase running on HDFS to store messaging data and metadata. Key reasons for choosing HBase include high write throughput, horizontal scalability, and integration with HDFS. Typical clusters have multiple regions and racks for redundancy. Facebook stores small messages, metadata, and attachments in HBase, while larger messages and attachments are stored separately. The system processes billions of read and write operations daily and continues to optimize performance and reliability.
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks (Anne Nicolas)
Flash devices introduced a sudden shift in the performance profile of direct attached storage. With IOPS rates orders of magnitude higher than rotating storage, it became clear that Linux needed a redesign of its storage stack to properly support and get the most out of these new devices.
This talk will detail the architecture of blk-mq, the redesign of the core of the Linux storage stack, and the later set of changes made to adapt the SCSI stack to this new queuing model. Early results of running Facebook infrastructure production workloads on top of the new stack will also be shared.
Jens Axboe, Facebook
Facebook's TAO & Unicorn data storage and search platforms (Nitish Upreti)
Unicorn is Facebook's in-memory, distributed graph search system that allows users to perform complex queries over the social graph. It supports operators like Apply and Extract that enable multi-step graph traversals to find socially relevant results. Unicorn stores adjacency lists in a sharded architecture and uses techniques like weak AND to balance social proximity and result diversity. It also attaches lineage metadata to results to allow privacy-aware rendering of results by Facebook's frontend services.
Extract business value by analyzing large volumes of multi-structured data from various sources such as databases, websites, blogs, social media, smart sensors...
Facebook uses a variety of open source and proprietary technologies to power its massive social network. It relies on technologies like Linux, Apache, PHP, Memcache, Cassandra and more to handle over 500 million users and massive amounts of content shared every day. Facebook has also developed its own technologies like BigPipe, Haystack, HipHop and others to optimize performance and scalability as it continues to grow.
The document summarizes the challenges of scaling social networking services like Facebook at a massive scale. It discusses how the interconnected social graph makes operations like querying expensive. It then outlines the evolution of Facebook's software architecture, including the web tier using PHP and optimizations like HipHop, various storage systems like MySQL and HBase, the memcache caching tier and its replacement Tao, and specialized services like News Feed that operate at massive scale. Complex infrastructure and a rapidly evolving product also pose challenges to scaling.
Big Data: The 4 Layers Everyone Must Know (Bernard Marr)
The document discusses the 4 key layers of a big data system:
1. The data source layer where data arrives from various sources like sales records, social media, etc.
2. The data storage layer where big data is stored using systems like Hadoop or Google File System. It also requires a database system.
3. The data processing/analysis layer where tools like MapReduce are used to select, analyze, and format the data to glean insights.
4. The data output layer, where the insights are communicated to decision makers through reports, charts, and recommendations to take action.
Facebook uses a distributed systems architecture with services like Memcache, Scribe, Thrift, and HipHop to handle large data volumes and high concurrency. Key components include the Haystack photo storage system, BigPipe for faster page loading, and a PHP front-end optimized using HipHop. Data is partitioned horizontally and services communicate using lightweight protocols like Thrift.
This document discusses Facebook and social media. It provides key facts about Facebook's history, growth, users, and features. Facebook was founded in 2004 by Mark Zuckerberg and other Harvard students. It has grown exponentially since expanding publicly in 2006, reaching over 42 million active users as of 2007 and becoming the second largest social network after MySpace. The document examines Facebook's international growth and plans to translate the site to other languages to continue expanding globally.
Big Data - The 5 Vs Everyone Must Know (Bernard Marr)
This slide deck, by Big Data guru Bernard Marr, outlines the 5 Vs of big data. It describes in simple language what big data is, in terms of Volume, Velocity, Variety, Veracity and Value.
Many believe Big Data is a brand new phenomenon. It isn't; it is part of an evolution that reaches far back in history. Here are some of the key milestones in this development.
Big data architectures and the data lake (James Serra)
The document provides an overview of big data architectures and the data lake concept. It discusses why organizations are adopting data lakes to handle increasing data volumes and varieties. The key aspects covered include:
- Defining top-down and bottom-up approaches to data management
- Explaining what a data lake is and how Hadoop can function as the data lake
- Describing how a modern data warehouse combines features of a traditional data warehouse and data lake
- Discussing how federated querying allows data to be accessed across multiple sources
- Highlighting benefits of implementing big data solutions in the cloud
- Comparing shared-nothing, massively parallel processing (MPP) architectures to symmetric multi-processing (SMP) architectures
Big data refers to the massive amounts of unstructured data that are growing exponentially. Hadoop is an open-source framework that allows processing and storing large data sets across clusters of commodity hardware. It provides reliability and scalability through its distributed file system HDFS and MapReduce programming model. The Hadoop ecosystem includes components like Hive, Pig, HBase, Flume, Oozie, and Mahout that provide SQL-like queries, data flows, NoSQL capabilities, data ingestion, workflows, and machine learning. Microsoft integrates Hadoop with its BI and analytics tools to enable insights from diverse data sources.
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
20 Facebook, Twitter, Linkedin & Pinterest Features You Didn't Know Existed (... (HubSpot)
The document outlines 20 hidden or lesser-known features of popular social media platforms like Facebook, Twitter, LinkedIn, and Pinterest. Some of the featured tips include saving links and articles to access later on Facebook, replacing ads with baby animal pictures on Facebook using a Chrome extension, embedding SlideShare presentations directly into tweets, and creating a photo collage directly in a single tweet on Twitter. The document also provides tips for adding hidden relationship notes on LinkedIn profiles and using trackable links on Pinterest that won't get flagged as spam.
This document summarizes key insights from a McKinsey presentation on customer journey analytics and big data. It finds that companies are storing large amounts of data but few know how to extract value from it. Analyzing customer journeys rather than individual touchpoints provides more predictive insights into customer satisfaction and churn. Mapping important customer journeys in an industry reveals opportunities to improve the customer experience and reduce costs. The presentation provides an example of a retail bank that identified ways to decrease service costs and improve customer satisfaction by analyzing its customer journey data.
The document discusses memory hierarchy and virtual memory. It describes how memory is organized in a hierarchy with registers and caches providing the fastest access but smallest capacity, and disks providing the largest capacity but slowest access. It explains how caches improve performance by exploiting locality of reference. Direct, associative, and set-associative mapping functions are described for placing cache blocks. Virtual memory allows programs to access memory using virtual addresses that are translated to physical addresses, enabling programs to appear larger than actual memory.
Cache memory is used to improve processor performance by making main memory access appear faster. It works based on the principle of locality of reference, where programs tend to access the same data/instructions repeatedly. A cache hit provides faster access than main memory, while a miss requires retrieving data from main memory. Caches use mapping functions like direct, associative, or set-associative mapping to determine where to place blocks of data from main memory.
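To make the direct-mapped case concrete, a toy example (not from the document) that splits an address into tag, line index, and byte offset for an assumed 16 KiB cache with 64-byte lines:

```python
BLOCK_SIZE = 64   # bytes per cache line
NUM_LINES = 256   # 256 lines x 64 bytes = 16 KiB cache
OFFSET_BITS = 6   # log2(BLOCK_SIZE)
INDEX_BITS = 8    # log2(NUM_LINES)

def split_address(addr: int):
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Two addresses exactly one cache-size (16 KiB) apart land on the same line
# with different tags, so in a direct-mapped cache they evict each other.
print(split_address(0x12340))                           # (4, 141, 0)
print(split_address(0x12340 + NUM_LINES * BLOCK_SIZE))  # (5, 141, 0)
```

Associative and set-associative mappings relax exactly this constraint: the index selects a set of several lines (or, in the fully associative case, the whole cache), so conflicting addresses can coexist.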
SSD Caching: Device-Mapper- and Hardware-based solutions compared (Werner Fischer)
The document provides an overview comparing SSD caching solutions that use device-mapper-based (FlashCache) and hardware-based (CacheCade, MaxCache) approaches. It discusses technologies, performance results from tests, and lessons learned. The presentation contained 45 slides covering topics like introduction, caching basics, feature comparisons of solutions, deep dives on FlashCache and CacheCade, and conclusions.
ZFS is a filesystem that provides end-to-end data integrity and reliability through the use of checksums, copy-on-write transactions, and pooled storage. Key features include detecting and correcting silent data corruption, eliminating volumes in favor of pooled storage, and providing a transactional design with consistent data. Administration is simplified with only two commands needed to manage the entire storage configuration.
In the past few years, the bar for exploitation has been raised significantly, and given the current state of software security it is harder and harder to exploit the newest operating systems successfully.
But as some systems continue to evolve and introduce new mitigations, others freeze a few years behind. In our talk we will focus on rooting Android through two race condition vulnerabilities. We will show the differences in the level of exploitation required, and how some mobile vendors are killing the security features they offer.
Fine Tuning and Enhancing Performance of Apache Spark Jobs (Databricks)
Apache Spark's defaults provide decent performance for large data sets but leave room for significant performance gains if you tune parameters to the available resources and the job at hand.
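As a hedged illustration of that kind of tuning (the values are placeholders, not recommendations), a PySpark sketch:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-demo")
    # Match shuffle parallelism to data and cluster size (default is 200).
    .config("spark.sql.shuffle.partitions", "64")
    # Size executors to the machines that run the job.
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    .getOrCreate()
)

df = spark.range(1_000_000)
print(df.groupBy((df.id % 10).alias("bucket")).count().collect())
```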
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens (Matthew Ahrens)
Guest lecture at Brown University's Computer Science Operating Systems class, CS167, by Matt Ahrens, co-creator of ZFS. Introduction by professor Tom Doeppner. Recording, March 2017: https://youtu.be/uJGkyMxdNFE
Topics:
- Data structures and algorithms used by ZFS snapshots
- Overview of ZFS on-disk structure
- Data structures used for ZFS space allocation
- RAID-Z compared with traditional RAID-4/5/6
Class website: http://cs.brown.edu/courses/cs167/
Database performance tuning for SSD based storage (Angelo Rajadurai)
Databases are a key part of any application, and the storage subsystem contributes most to database performance. In recent years, new storage technologies like Solid State Storage (SSD) and high-performance drives have become cheaper and more accessible, but it takes a lot of planning to use these technologies cost-effectively for the best price-performance.
Modern processors are faster than main memory, so the processor may waste time waiting on memory accesses. The purpose of cache memory is to make main memory appear to the processor to be much faster than it actually is.
The Oracle Database Smart Flash Cache is a read-only cache that stores clean database blocks from the buffer cache on flash memory devices to improve performance. As blocks age out of the buffer cache, they can be moved to the Flash Cache for quick retrieval back into the buffer cache without needing slower I/O from conventional storage. The Flash Cache provides a cost-effective way to expand the buffer cache beyond memory limits by caching data on faster flash storage.
David Jiang presented InnoSQL, a new branch of MySQL that features a flash cache for the InnoDB storage engine. The flash cache provides higher performance than using an SSD as the durable storage alone. It caches both reads and writes using SSDs and employs techniques like merge writes and sequential writes to optimize for SSD performance. Benchmark results showed the flash cache improved throughput for the TPC-C workload by around 2x compared to using SSDs as the durable storage directly.
This document provides an overview of Real Application Clusters (RAC) for beginners. It defines what RAC is, explains the key components and terminology of RAC including cache fusion, global locking, interconnects, and virtual IPs. It describes how data is stored and replicated across multiple database instances and nodes in a RAC configuration. The document also discusses important RAC concepts such as instance recovery, undo tablespaces, storage layout, TNS entries, and virtual IPs.
Speaker: Vladimir Rodionov (bigbase.org)
This talk introduces a totally new implementation of multilayer caching in HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of its ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of an L3-level cache on fast SSDs. The talk will show that different types of caches in BigBase work best for different types of workloads, and that a combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.
An intro to RAIDZ, and how an investigation into why a bhyve VM ran out of space led us to find surprising things about how space-hungry RAIDZ is, along with the theory of RAIDZ space accounting.
This document discusses using logical volume management (LVM) with MySQL for enterprise storage. Some key points include:
- LVM allows virtualizing storage into logical volumes mapped to physical disks, enabling features like snapshots and clones for backups and replication.
- Snapshots provide point-in-time copies while clones are writable copies, both using copy-on-write to avoid duplicating data.
- Backing up MySQL with snapshots is fast, and recovery involves reverting the logical volumes rather than restoring files. Replication slaves can be quickly recloned from snapshots (see the sketch after this list).
- With shared storage, replication slaves maintain the same blocks as the master, enabling efficient scaling and resyncing of slaves by recloning.
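The snapshot-backup flow mentioned above might look like the following sketch; the volume group and volume names (vg0, mysql) are hypothetical, and quiescing MySQL first (FLUSH TABLES WITH READ LOCK or a clean shutdown) is omitted for brevity:

```python
import subprocess

def run(*args):
    print("$", " ".join(args))
    subprocess.run(args, check=True)

# Copy-on-write snapshot: only blocks changed afterwards consume space.
run("lvcreate", "--snapshot", "--size", "2G",
    "--name", "mysql-backup", "/dev/vg0/mysql")

# Mount the point-in-time copy read-only and archive it.
run("mount", "-o", "ro", "/dev/vg0/mysql-backup", "/mnt/mysql-backup")
run("tar", "-czf", "/backups/mysql.tar.gz", "-C", "/mnt/mysql-backup", ".")

# Clean up: unmount and drop the snapshot.
run("umount", "/mnt/mysql-backup")
run("lvremove", "-f", "/dev/vg0/mysql-backup")
```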
In this talk, we'll walk through RocksDB technology and look into areas where MyRocks is a good fit by comparison to other engines such as InnoDB. We will go over internals, benchmarks, and tuning of MyRocks engine. We also aim to explore the benefits of using MyRocks within the MySQL ecosystem. Attendees will be able to conclude with the latest development of tools and integration within MySQL.
This document discusses new capabilities in CFEngine 3, an advanced configuration management system. Key points include:
- CFEngine 3 is declarative, ensures desired state is reached through convergence, is lightweight using 3-6MB of memory, and can run continuously to check configurations every 5 minutes.
- It supports both new platforms like ARM boards and older systems like Solaris.
- Recent additions allow managing resources like SQL databases, XML files, and virtual machines in a code-free manner using the Design Center.
- CFEngine treats all resources like files, processes, and VMs as maintainable and ensures they self-correct through convergence to the desired state.
Kuyper Hoffmann's presentation from the #lspe "Private Clouds" event: http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/48901162/
The document discusses MongoDB's new aggregation framework, which provides a declarative pipeline for performing data aggregation operations on complex documents. The framework allows users to describe a chain of operations without writing JavaScript. It will offer high-performance operators like $match, $project, $unwind, $group, $sort, and computed expressions to reshape and analyze document data without the overhead of JavaScript. The aggregation framework is nearing release and will support sharding by forwarding pipeline operations to shards and combining results.
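A minimal PyMongo sketch of such a pipeline; the tweets collection and its fields are illustrative, not from the document:

```python
from pymongo import MongoClient

tweets = MongoClient()["demo"]["tweets"]  # placeholder database/collection

# Top ten most-mentioned users, with no JavaScript involved.
pipeline = [
    {"$match": {"lang": "en"}},                            # filter early
    {"$project": {"mentions": "$entities.user_mentions"}}, # keep what we need
    {"$unwind": "$mentions"},                              # one doc per mention
    {"$group": {"_id": "$mentions.screen_name", "n": {"$sum": 1}}},
    {"$sort": {"n": -1}},
    {"$limit": 10},
]
for doc in tweets.aggregate(pipeline):
    print(doc["_id"], doc["n"])
```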
Replication in MongoDB allows for high availability and scaling of reads. A replica set consists of at least three mongod servers, with one primary and one or more secondaries that replicate from the primary. Writes go to the primary while reads can be distributed to secondaries for scaling. Replica sets are configured and managed through shell helpers, and maintain consistency through an oplog and elections when the primary is unavailable.
Replication in MongoDB allows for high availability and scaling of reads. A replica set consists of at least three mongod servers, with one primary and one or more secondaries that replicate from the primary. The primary applies all write operations to its oplog, which is then replicated to the secondaries. If the primary fails, a new primary is elected from the remaining secondaries. Administrative commands help monitor and manage the replica set configuration.
Vladimir Vuksan's presentation on Ganglia at the "Not Nagios" episode of The Bay Area Large-Scale Production Engineering meetup: http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/15481164/
This document discusses MongoDB's new aggregation framework, which provides a more performant and declarative way to perform data aggregation tasks compared to MapReduce. The framework includes pipeline operations like $match, $project, and $group that allow filtering, reshaping, and grouping documents. It also features an expression language for computed fields. The initial release will support aggregation pipelines and sharding, with future plans to add more operations and expressions.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leverage that data for RAG and other GenAI use cases, and finally chart your course to productionalization.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx (SitimaJohn)
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Project Management Semester Long Project - Acuity (jpupo2018)
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative projects and software. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname deneb_alpha).
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
3. FlashCache at Facebook
▪ What
▪ We want to use some Flash storage on existing servers
▪ We want something that is simple to deploy and use
▪ Our IO access patterns benefit from a cache
▪ Who
▪ Mohan Srinivasan – design and implementation
▪ Paul Saab – platform and MySQL integration
▪ Michael Jiang – testing, performance and capacity planning
▪ Mark Callaghan - benchmarketing
4. Introduction
▪ Block cache for Linux - write back and write through modes
▪ Layered below the filesystem at the top of the storage stack
▪ Cache Disk Blocks on fast persistent storage (Flash, SSD)
▪ Loadable Linux Kernel module, built using the Device Mapper (DM)
▪ Primary use case InnoDB, but general purpose
▪ Based on dm-cache by Prof. Ming Zhao
5. Caching Modes
Write Back
▪ Lazy writing to disk
▪ Persistent across reboot
▪ Persistent across device removal
Write Through, Write Around
▪ Non-persistent
▪ Are you a pessimist?
6. Cache Structure
▪ Set associative hash
▪ Hash with fixed-size buckets (sets) and linear probing within a set
▪ 512-way set associative by default
▪ dbn: Disk Block Number, address of block on disk
▪ Set = (dbn / block size / set size) mod (number of sets)
▪ A sequential range of dbns maps onto a single set
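The set computation is simple enough to write out directly. A minimal sketch, assuming sector-granularity dbns, 4 KB cache blocks, and the default 512-way sets; the constants and the function name are illustrative, not FlashCache's actual symbols:

```c
#include <stdint.h>

/* Illustrative constants: a 4 KB cache block is 8 x 512-byte
 * sectors, and sets are 512-way set associative by default.       */
#define BLOCK_SECTORS 8
#define SET_SIZE      512

/* Slide formula: Set = (dbn / block size / set size) mod (number
 * of sets). dbn is assumed to be in sectors, so dividing by the
 * block size first yields the cache-block index; a sequential run
 * of dbns therefore lands in the same set.                        */
uint64_t dbn_to_set(uint64_t dbn, uint64_t num_sets)
{
    return (dbn / BLOCK_SECTORS / SET_SIZE) % num_sets;
}
```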
7. Cache Structure
▪ [Diagram: the cache is an array of sets, Set 0 … Set N-1; each cache set holds 512 blocks (Block 0 … Block 511), so the cache contains N sets' worth of blocks]
8. Replacement and Memory Footprint
▪ Replacement policy is FIFO (default) or LRU within a set
▪ Switch on the fly between FIFO/LRU (sysctl)
▪ Metadata per cache block: 16 bytes in memory, 16 bytes on ssd
▪ On ssd metadata per-slot
▪ <dbn, block state>
▪ In memory metadata per-slot:
▪ <dbn, block state, LRU chain pointers, misc>
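For orientation, the two per-slot records described above might look roughly like this. A sketch only, with invented field names, sized to match the 16-byte figures from the slides:

```c
#include <stdint.h>

/* On-ssd metadata, one record per cache slot: just enough to
 * rebuild the cache after a restart (16 bytes per the slides).   */
struct ssd_slot_md {
    uint64_t dbn;       /* disk block number this slot caches     */
    uint32_t state;     /* INVALID / VALID / DIRTY                */
    uint32_t pad;       /* padding out to the 16-byte record      */
};

/* In-memory metadata, one record per cache slot: the same fields
 * plus LRU chain linkage for the within-set replacement policy.  */
struct mem_slot_md {
    uint64_t dbn;
    uint16_t state;
    uint16_t lru_prev, lru_next;  /* within-set LRU chain indices */
    uint16_t misc;                /* flags and other bookkeeping  */
};
```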
9. Reads
▪ Compute cache set for dbn
▪ Cache Hit
▪ Verify checksums if configured
▪ Serve read out of cache
▪ Cache Miss
▪ Find free block or reclaim block based on replacement policy
▪ Read block from disk and populate cache
▪ Update block checksum if configured
▪ Return data to user
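Putting the steps together, the read path amounts to the control flow below. This is a hedged sketch in which every helper is a hypothetical stand-in for the real kernel-module code:

```c
#include <stdint.h>
#include <stdbool.h>

extern uint64_t num_sets;
extern uint64_t dbn_to_set(uint64_t dbn, uint64_t num_sets);
extern bool checksums_enabled;
extern int  cache_lookup(uint64_t set, uint64_t dbn); /* -1 on miss */
extern int  find_free_or_reclaim(uint64_t set);       /* FIFO/LRU   */
extern void verify_checksum(int slot);
extern void update_checksum(int slot, const void *buf);
extern void ssd_read(int slot, void *buf);
extern void ssd_write(int slot, const void *buf);
extern void disk_read(uint64_t dbn, void *buf);

/* Read path as described on the slide; helper names are
 * hypothetical, not FlashCache's actual symbols.                  */
void flashcache_read_sketch(uint64_t dbn, void *buf)
{
    uint64_t set = dbn_to_set(dbn, num_sets);
    int slot = cache_lookup(set, dbn);     /* probe within the set  */

    if (slot >= 0) {                       /* cache hit             */
        if (checksums_enabled)
            verify_checksum(slot);         /* verify if configured  */
        ssd_read(slot, buf);               /* serve read from cache */
    } else {                               /* cache miss            */
        slot = find_free_or_reclaim(set);  /* replacement policy    */
        disk_read(dbn, buf);               /* read block from disk  */
        ssd_write(slot, buf);              /* populate the cache    */
        if (checksums_enabled)
            update_checksum(slot, buf);    /* if configured         */
    }
}
```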
10. Write Through - writes
▪ Compute cache set for dbn
▪ Cache hit
▪ Get cached block
▪ Cache miss
▪ Find free block or reclaim block
▪ Write data block to disk
▪ Write data block to cache
▪ Update block checksum
11. Write Back - writes
▪ Compute cache set for dbn
▪ Cache Hit
▪ Write data block into cache
▪ If data block not DIRTY, synchronously update on-ssd cache metadata to mark block DIRTY
▪ Cache miss
▪ Find free block or reclaim block based on replacement policy
▪ Write data block to cache
▪ Synchronously update on-ssd cache metadata to mark block DIRTY
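The write-back write path is the same set lookup followed by a synchronous metadata update. Again a sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Slot states as used in the slides; the values are illustrative. */
enum slot_state_e { SLOT_INVALID, SLOT_VALID, SLOT_DIRTY };

extern uint64_t num_sets;
extern uint64_t dbn_to_set(uint64_t dbn, uint64_t num_sets);
extern int  cache_lookup(uint64_t set, uint64_t dbn); /* -1 on miss */
extern int  find_free_or_reclaim(uint64_t set);
extern void ssd_write(int slot, const void *buf);
extern enum slot_state_e slot_state(int slot);
extern void ssd_metadata_mark_dirty(int slot);  /* synchronous      */

/* Write Back write path as described above.                        */
void flashcache_wb_write_sketch(uint64_t dbn, const void *buf)
{
    uint64_t set = dbn_to_set(dbn, num_sets);
    int slot = cache_lookup(set, dbn);

    if (slot < 0)                      /* miss: allocate or reclaim  */
        slot = find_free_or_reclaim(set);

    ssd_write(slot, buf);              /* write data block to cache  */

    /* Persist DIRTY before completing the write, so un-flushed
     * blocks survive an unclean shutdown (only DIRTY blocks are
     * reloaded after a crash, per the metadata-updates slide).      */
    if (slot_state(slot) != SLOT_DIRTY)
        ssd_metadata_mark_dirty(slot);
}
```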
12. Small or uncacheable requests
▪ First invalidate cache blocks that overlap the request
▪ There are at most 2 such blocks
▪ For Write Back, if the overlapping blocks are DIRTY they are cleaned first, then invalidated
▪ Uncacheable full-block reads are served from the cache in case of a cache hit
▪ Perform the disk IO
▪ Repeat the invalidation to close races which might have caused the block to be cached while the disk IO was in progress (see the sketch below)
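The invalidate/IO/invalidate dance might be sketched as follows; the second invalidation pass is what closes the race described above (helper names hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

extern void invalidate_overlapping(uint64_t dbn, uint64_t nr_sectors,
                                   bool clean_dirty_first);
extern void disk_io(uint64_t dbn, void *buf, uint64_t nr_sectors,
                    bool is_write);

/* Small or uncacheable request handling.                           */
void flashcache_uncached_io_sketch(uint64_t dbn, void *buf,
                                   uint64_t nr_sectors, bool is_write)
{
    /* At most 2 cache blocks overlap the request; in Write Back
     * mode DIRTY overlaps are cleaned before being invalidated.    */
    invalidate_overlapping(dbn, nr_sectors, true);

    disk_io(dbn, buf, nr_sectors, is_write);

    /* A racing cached IO may have repopulated an overlapping block
     * while the disk IO was in flight, so invalidate once more.    */
    invalidate_overlapping(dbn, nr_sectors, true);
}
```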
13. Write Back policy
▪ Dirty blocks not recently accessed
▪ A clock-like algorithm picks off DIRTY blocks not accessed in the last 15 minutes (configurable) for cleaning
▪ When the number of dirty blocks in a set exceeds a configurable threshold, clean some blocks
▪ Blocks are selected for writeback based on the replacement policy
▪ The default dirty threshold is 20%; set it higher for write-heavy workloads
▪ Sort the selected blocks and pick up any other blocks in the set that can be contiguously merged with them
▪ Writes merged by the IO scheduler
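The set-level cleaning trigger could be sketched like this, using the 20% default threshold; all names and the exact victim count are illustrative:

```c
#include <stdint.h>

#define SET_SIZE          512  /* blocks per set (default)          */
#define DIRTY_THRESH_PCT   20  /* default dirty threshold per set   */

extern int  count_dirty(uint64_t set);
extern int  pick_victims(uint64_t set, int want, int *slots); /* FIFO/LRU */
extern void sort_by_dbn(int *slots, int n);
extern int  merge_contiguous(uint64_t set, int *slots, int n);
extern void writeback_blocks(const int *slots, int n);

/* Threshold-driven cleaning for one set.                           */
void maybe_clean_set_sketch(uint64_t set)
{
    int limit = SET_SIZE * DIRTY_THRESH_PCT / 100;  /* 20% of 512   */
    int dirty = count_dirty(set);

    if (dirty <= limit)
        return;                     /* below threshold: do nothing  */

    int slots[SET_SIZE];
    int n = pick_victims(set, dirty - limit, slots);

    /* Sort by dbn and fold in neighbouring blocks so the IO
     * scheduler can merge the writebacks into contiguous disk IO.  */
    sort_by_dbn(slots, n);
    n = merge_contiguous(set, slots, n);
    writeback_blocks(slots, n);
}
```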
14. Write Back – overheads
▪ In-Memory cache metadata memory footprint
▪ 300GB/4KB cache -> ~1.2GB
▪ 160GB/4KB cache -> ~640MB
▪ Cache metadata writes/file system write
▪ Worst case is 2 cache metadata updates per write
▪ (VALID->DIRTY, DIRTY->VALID)
▪ Average case is much lower because of cache write hits and batching of cache metadata updates
15. Write Through/Around – cache overheads
▪ In-Memory Cache metadata footprint
▪ 300GB/4KB cache -> ~1.2GB
▪ 160GB/4KB cache -> ~640MB
▪ Cache metadata writes per file system write
▪ 1 cache data write per file system write (Write Through)
▪ No overhead (for Write Around)
16. Write Back – metadata updates
▪ Cache (on-ssd) metadata only updated on writes and block cleanings (VALID->DIRTY or DIRTY->VALID)
▪ Cache (on-ssd) metadata not updated on cache population for reads
▪ Reload after an unclean shutdown only loads DIRTY blocks
▪ Fast and Slow cache shutdowns
▪ Only metadata is written on fast shutdown. Reload loads both dirty and clean blocks
▪ Slow shutdown writes all dirty blocks to disk first, then writes out metadata to the ssd. Reload only loads clean blocks.
▪ Metadata updates to multiple blocks in same sector are batched
17. Torn Page Problem
▪ Handle partial block writes caused by power failure or other faults
▪ Problem exists for Flashcache in Write Back mode
▪ Detected via block checksums
▪ Checksums are disabled by default
▪ Pages with bad checksums are not used
▪ Checksums increase cache metadata writes and memory footprint
▪ Update cache metadata checksums on DIRTY->DIRTY block transitions for Write Back
▪ Each per-cache slot grows by 8 bytes to hold the checksum (a 50% increase from 16 bytes to 24 bytes for the Write Back case)
18. Cache controls for Write Back
▪ Work best with O_DIRECT file access
▪ Global modes – Cache All or Cache Nothing
▪ Cache All has a blacklist of pids and tgids
▪ Cache Nothing has a whitelist of pids and tgids
▪ tgids can be used to tag all pthreads in the group as cacheable
▪ Exceptions for threads within a group are supported
▪ List changes done via FlashCache ioctls
▪ Cache can be read but is not written for non-cacheable tgids and pids
▪ We modified MySQL and scp to use this support
19. Cache Nothing policy
▪ If the thread id is whitelisted, cache all IOs for this thread
▪ If the tgid is whitelisted, cache all IOs for the threads in that group
▪ If the thread id is blacklisted, do not cache its IOs
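The three rules collapse into a small decision function. A sketch assuming hypothetical whitelist/blacklist lookups; the blacklist check is run first here, which is one way the per-thread exceptions inside a whitelisted group (mentioned on the previous slide) can be honoured:

```c
#include <stdbool.h>
#include <sys/types.h>

extern bool on_whitelist_tid(pid_t tid);    /* hypothetical lookups */
extern bool on_whitelist_tgid(pid_t tgid);
extern bool on_blacklist_tid(pid_t tid);

/* Cache Nothing policy: cache only IOs from whitelisted threads or
 * thread groups, with per-thread blacklist exceptions.             */
bool should_cache_io(pid_t tid, pid_t tgid)
{
    if (on_blacklist_tid(tid))     /* per-thread exception wins     */
        return false;
    if (on_whitelist_tid(tid))     /* thread explicitly cacheable   */
        return true;
    if (on_whitelist_tgid(tgid))   /* whole thread group cacheable  */
        return true;
    return false;                  /* default: cache nothing        */
}
```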
20. Cache control example
▪ We use Cache Nothing mode for MySQL servers
▪ The mysqld tgid is added to the whitelist
▪ All IO done by it is cacheable
▪ Writes done by other processes do not update the cache
▪ Full table scans done by mysqldump use a hint that directs mysqld to add the query's thread id to the blacklist, to avoid wiping FlashCache
▪ select /* SQL_NO_FCACHE */ pk, col1, col2 from foobar
25. Future Work
▪ Cache mirroring
▪ SW RAID 0 block device as a cache
▪ Online cache resize
▪ No shutdown and recreate
▪ Support for ATA trim
▪ Discard blocks no longer in use
▪ Fix the torn page problem
▪ Use shadow pages