This document summarizes optimizations made to Memcached to improve its scalability. The baseline Memcached implementation did not scale well beyond 3 CPU cores due to coarse-grained global locks. The optimized version removed these locks and used non-blocking operations, striped locks, and a "Bag" LRU scheme to allow linear scaling up to 16 cores. These changes improved performance by 900% and efficiency by 3.4x compared to the baseline.
2. Content
• What is Memcached
• Usage model
• Measuring performance
• Baseline performance & scalability
• Performance root cause
• Base transaction flow
• Optimization goals, design considerations
• Optimized transaction flow
• Optimization details
• Optimized version performance
• Summary
3. What is Memcached?
• Open Source distributed memory caching system
− Typically serves as a cache for persistent databases
− In-memory key-value store for quick data access
− For a particular “key” a “value” is stored/deleted/retrieved etc.
− Provides a networked data caching API that is simple to use and set up
• Used by many companies with web centric businesses
• Most common usage model - web data caching
− Original data resides in persistent database
− Database queries are expensive
− Memcached caches the data to provide low latency access
− Helps reduce the load on the database
• Computational cache
• Temporary object store
4. Web data caching usage model
• Memcached tier acts as a cache for the database tier
− Cache is spread over several memcached servers
• Client requests the “value” associated with a “key”
• A “GET” request for “key” sent to memcached
• If “key” found
− Memcached returns “value” for “key”
• If “key” not found
− Persistent database is queried for “key”
− “value” from database is returned to client
− “SET” request sent to MC with “key” & “value”
• Key-value pair stays in cache unless
− It is evicted because of cache LRU policies
− Explicitly removed by a “DELETE” request
• Typical operations
− GET, SET, DELETE, STATS, REPLACE, etc.
• Most frequent transaction is “GET”
− Impacts perf of most common use cases
5. Measuring performance
• Measure perf of most important transaction - “get”
• Best perf = max “get” Requests Per Sec (RPS) under SLA
− SLA (Service Level Agreement) : Average “get” latency <= 1 ms
• Measurement configuration is “client-server”
− Run memcached on one or more servers
− Run load generator/s on “client/s” to send requests to MC servers
− Load generator keeps track of transactions and reports results
• Process
− Load gen sends “set” requests to prime cache with key-value pairs
− For increasing RPS levels in a range, do the following until average latency exceeds 1 ms:
− Send random-key “gets” for 60 secs, calculate average latency
• S/W and H/W configuration
− Open Source Memcached V 1.6 base and optimized
− Open Source Mcblaster load generator
− Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory
6. Baseline performance & core scalability
• Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory
• Intel® Turbo Boost Technology ON, Intel® Hyper-Threading Technology OFF
No scalability beyond 3 cores, degrades beyond 4
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. Configuration: Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory, Intel® Turbo Boost Technology ON, Intel® Hyper-Threading Technology OFF
7. Performance root cause
• Profile during “gets” shows lots of time spent in locks
• Drill down into code shows coarse grained global cache locks
− Held for most of a thread’s execution time
• Removing the global locks & measuring “gets” showed substantial improvement
− Unsafe, done only as a proof of concept
• “Top” shows unbalanced CPU core utilization, possibilities are:
− Sub-optimal network packet handling and distribution
− Thread migration between cores
8. Transaction flow
• Incoming requests from clients
• Libevent distributes them to MC threads
− # of MC threads = # of cores
− No thread affinity
• Threads do key hashing in parallel
• Hash table processing to
− Find place for new item (key-value pair)
− Find location of existing item
• LRU processing to maintain cache policy
− Move item to front of list, indicating most recently accessed
• A global cache lock around hash table and LRU processing
− Serializes all transactions on all threads
− This is the key bottleneck to scalability
• Final responses handled in parallel
9. The hash table
• Hash table is arranged as an array of buckets
• Each bucket has a singly linked list as a hash chain
• The hashed key is used to find the bucket it belongs in
• Item (key-value pair) is then inserted into or retrieved from the hash chain of that bucket
10. The LRU
• LRU - Least Recently Used cache management scheme
− Cache capacity is finite - evict old items to make room for new ones
− LRU policy determines eviction order of cache items
− Oldest active cache item is evicted first
• Uses a doubly linked list for quick manipulation
− Head has most recently used item
− GET for an item removes it from its current position & moves it to the head
− On eviction the tail is checked for oldest item
11. Why the global lock
• Linked lists are used in both the hash table & the LRU
• Corruption can occur if the lock is removed
− Example below of two nearby items being removed
− Higher chance of corruption in the LRU because of doubly linked list
12. Optimization goals, design considerations
• Goals
− Must scale well with larger core counts
− Hash distribution should have little effect on perf
− Same performance accessing 1 unique key or 100k unique keys
− Changes to LRU must maintain/increase hit rates
− ~90% with test data set
• Implementation considerations
− Any lock removal or reduction should be safe
− No additional data should be used for cache items
− Millions to billions of cache items in a fully populated instance
− A single 64-bit field would reduce usable memory considerably, leading to a reduced hit rate
− Focus on GETs for best performance
− Most memcached instances are read dominated
− New design should account for this and optimize for read traffic
− Transaction ordering not guaranteed – just like the original implementation
13. Optimized transaction flow
Original:
• Global lock serializes hash table and LRU operations
Optimized:
• Non-blocking GETs using a “Bag” LRU scheme
• Better parallelization for SET/DELETE with striped locks
14. SET/DEL optimization - parallel hash table
• Uses striped locks instead of a global lock
− Fine grain collection of locks instead of a single global lock
• Makes use of a fixed-size, shared collection of locks for the entire hash table
− Allows for a highly scalable hash table solution
− Fixed-overhead
• Number of locks is a power of 2 so the lock can be determined quickly
− Bitwise AND of the bucket index with (number of locks - 1) selects the lock
• Not used for GETs
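The lock-selection trick can be sketched in a few lines of C. This is an illustrative sketch, not memcached's actual code; `N_LOCKS` and `lock_for_bucket` are assumed names.

```c
#include <pthread.h>
#include <stdint.h>

/* Striped locking sketch: a fixed, power-of-2 sized array of locks is
 * shared by all hash buckets. Because N_LOCKS is a power of 2, the
 * lock index is a cheap bitwise AND instead of a modulo. */
#define N_LOCKS 64u                         /* must be a power of 2 */
static pthread_mutex_t stripe[N_LOCKS];

static pthread_mutex_t *lock_for_bucket(uint32_t bucket) {
    return &stripe[bucket & (N_LOCKS - 1)]; /* mask, not modulo */
}
```

A SET or DELETE takes only the one stripe covering its bucket, so operations on buckets under different stripes proceed in parallel, while the total lock overhead stays fixed regardless of table size.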
15. SET/DEL optimization - parallel hash table ..
• Each lock services a fixed number of buckets
• The number of locks is chosen to balance parallelism against
lock maintenance overhead
• Multiple buckets can be manipulated in parallel
16. GET optimization – removing the global lock
• No global lock during hash table processing for GET
• With no global lock, two situations must be handled
− Expansion of hash table during a GET
− The hash table expands when the number of items far exceeds the bucket count
− SET/DEL of an item during a GET
• Handling hash table expansion during GET
− If expanding then wait for it to finish before looking up hash chain
− If not expanding then find data in hash chain and return it
• Handling SET/DEL during a GET
− If hash table expanding, wait to finish before modifying hash chain
− Modify pointers in the right order using atomic operations to ensure correct
hash chain traversal for GETs
• A GET may still happen while the item is being modified
(SET/DEL/REPLACE)
− Is that a problem?
− No, as long as traversal is correct, because operation order is not
guaranteed anyway
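The pointer-ordering rule can be sketched with C11 atomics. The names here are illustrative, not memcached's actual structures; the point is that a new item is fully linked *before* it is published, so a concurrent lockless GET sees either the old chain or the complete new one, never a half-built link.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One hash bucket's chain, readable without locks (illustrative). */
typedef struct hitem {
    int key;
    struct hitem *next;
} hitem;

static _Atomic(hitem *) bucket_head;        /* starts NULL */

/* SET path: link first, then publish with compare-and-swap. */
static void chain_insert(hitem *it) {
    hitem *old = atomic_load(&bucket_head);
    do {
        it->next = old;                     /* step 1: link privately */
    } while (!atomic_compare_exchange_weak(&bucket_head, &old, it));
}

/* Lockless GET traversal: always walks a consistent chain. */
static hitem *chain_find(int key) {
    for (hitem *p = atomic_load(&bucket_head); p; p = p->next)
        if (p->key == key) return p;
    return NULL;
}
```

Deletion follows the same idea in reverse: the predecessor's `next` pointer is swung atomically past the victim, so a reader in mid-traversal still reaches the rest of the chain.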
17. GET optimization – Parallel Bag LRU
• Replaces the original doubly linked list LRU
• Basic concept is to group items with similar time stamps
into “bags”
− As before, no ordering is guaranteed
• Has all the functionality of the original LRU
• Re-uses original item data structure – no additions
• SET to a bag uses an atomic compare-and-swap operation
• GET from a bag is lockless
• DEL requests do nothing to the Bag LRU
• LRU cleanup is delegated to a “cleaner thread”
− Acts like “garbage collection/cleanup”
− Evicts expired items quickly
− Handles item cleanup from deletes
− Reorders cache items based on update time
− Adds additional Bags as needed
18. Parallel Bag LRU details – Bag Array
(Diagram: original doubly linked list LRU vs. Bag LRU bag array)
• A list of bags in chronological order
• Bags have list of items
• Newest bag has recently allocated or accessed items
• Alternate bag used by cleaner thread to avoid lock contention on
inserts to newest bag
• Bag head has pointers to oldest and newest bags for quick access
19. Parallel Bag LRU details – Bags
• Each bag has a singly linked list of cache items
• SET causes new item to be inserted into “newest bag”
• GET updates the item’s timestamp & bag pointer to point to the
“newest bag”
• Evictions handled by cleaner thread
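The reason a Bag-LRU GET can be lockless is visible in a small sketch: the GET path only writes two fields on the item and touches no list at all; the cleaner thread later migrates the item based on those fields. Names and layout here are illustrative assumptions, not memcached's actual item struct.

```c
#include <stdatomic.h>
#include <time.h>

typedef struct bag { int id; } bag;         /* stand-in for a real bag */

typedef struct bitem {
    _Atomic long last_used;                 /* coarse access timestamp */
    _Atomic(bag *) my_bag;                  /* which bag "owns" the item */
} bitem;

static bag *newest_bag;                     /* maintained by the cleaner */

/* GET hit: plain atomic stores, no locks, no list manipulation. */
static void get_touch(bitem *it) {
    atomic_store(&it->last_used, (long)time(NULL));
    atomic_store(&it->my_bag, newest_bag);
}
```

Contrast this with the original LRU's move-to-head, which rewrites up to four shared pointers per GET under a lock; here the expensive relinking is deferred to the cleaner thread.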
20. Parallel Bag LRU – Cleaner Thread
• Periodically does housekeeping on the Bag LRU
− Currently every 5 seconds
• Starts cleaning from the oldest bag’s oldest item
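One pass of the sweep-from-oldest idea can be sketched as a single-threaded simplification (hypothetical structures; the real cleaner also migrates recently used items to newer bags and allocates new bags):

```c
#include <stddef.h>

/* Items in a bag, singly linked, oldest first (illustrative). */
typedef struct citem {
    long expires_at;                         /* absolute expiry time */
    struct citem *next;
} citem;

/* Sweep one bag's list: unlink expired items, return eviction count.
 * Uses a pointer-to-pointer so head and interior unlinks look alike. */
static int clean_bag(citem **head, long now) {
    int evicted = 0;
    citem **pp = head;
    while (*pp) {
        if ((*pp)->expires_at <= now) {      /* expired: unlink it */
            *pp = (*pp)->next;
            evicted++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return evicted;
}
```

Because this work happens on a background thread, expired items are reclaimed quickly without adding any cost to the GET path.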
21. Optimizations - Misc
• Used thread affinity to bind 1 memcached thread per core
• Configured NIC driver to evenly distribute incoming
packets over CPUs
− 1 NIC queue per logical CPU, affinitized to a logical CPU
• irqbalance and iptables services turned off
22. Optimized performance & core scaling
• Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory
• Intel® Turbo Boost Technology ON, Intel® Hyper-Threading Technology OFF
Linear scaling with optimizations
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance
tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions.
Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in
fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
information go to http://www.intel.com/performance. Configuration: Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory,
Intel® Turbo Boost Technology ON, Intel® Hyper-Threading Technology OFF
23. Server capacity
• Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory
Overall 900% gains vs. baseline
Turbo and HT boost performance by 31%
Configuration: Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory,
Intel® Turbo Boost Technology OFF/ON, Intel® Hyper-Threading Technology OFF/ON
24. Efficiency and hit rate
• Hit rate, measured with a synthetic benchmark,
increased slightly
− At ~90% - similar to that of the original version
• Efficiency (transactions per watt) increased by 3.4x
− Mostly due to much higher RPS for little increase in power
− Power draw would be less in a production environment
• Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory
Configuration: Intel® Xeon® E5-2660 2.2 GHz, 10GB NIC, 64 GB memory,
Intel® Turbo Boost Technology ON, Intel® Hyper-Threading Technology ON
25. Summary
• Base core/thread scalability hampered by locks
− No throughput scaling beyond 3 cores, degradation beyond 4
• Lockless “GETs” with Bag LRU improves scalability
− Linear up to the 16 cores measured
− No increase in average latency
− No loss in hit rate (~90%)
− Same performance for random and hot/repeated keys
• Striped locks parallelize hash table access for SET/DEL
• Bag LRU source code available on GitHub
− https://github.com/rajiv-kapoor/memcached/tree/bagLRU
28. Intel's compilers may or may not optimize to the same degree for non-Intel
microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with
Intel microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804
29. Legal Disclaimer
• Built-In Security: No computer system can provide absolute security under all conditions. Built-in security features
available on select Intel® Core™ processors may require additional software, hardware, services and/or an Internet
connection. Results may vary depending upon configuration. Consult your PC manufacturer for more details.
• Enhanced Intel SpeedStep® Technology - See the Processor Spec Finder at http://ark.intel.com or contact your Intel
representative for more information.
• Intel® Hyper-Threading Technology (Intel® HT Technology) is available on select Intel® Core™ processors. Requires
an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the
specific hardware and software used. For more information including details on which processors support Intel HT
Technology, visit http://www.intel.com/info/hyperthreading.
• Intel® 64 architecture requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance
will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more
information. For more information, visit http://www.intel.com/info/em64t
• Intel® Turbo Boost Technology requires a system with Intel Turbo Boost Technology. Intel Turbo Boost Technology
and Intel Turbo Boost Technology 2.0 are only available on select Intel® processors. Consult your PC manufacturer.
Performance varies depending on hardware, software, and system configuration. For more information, visit
http://www.intel.com/go/turbo
• Other Software Code Disclaimer
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies
or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
30. Risk Factors
The above statements and any others in this document that refer to plans and expectations for the second quarter, the year and the
future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,”
“intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements.
Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements.
Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause
actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following
to be the important factors that could cause actual results to differ materially from the company’s expectations. Demand could be
different from Intel's expectations due to factors including changes in business and economic conditions, including supply constraints
and other disruptions affecting customers; customer acceptance of Intel’s and competitors’ products; changes in customer order
patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and
financial conditions poses a risk that consumers and businesses may defer purchases in response to negative financial events, which
could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are
characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly
variable and difficult to forecast. Revenue and the gross margin percentage are affected by the timing of Intel product introductions
and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and
introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly
to technological developments and to incorporate new features into its products. Intel is in the process of transitioning to its next
generation of products on 22nm process technology, and there could be execution and timing issues associated with these changes,
including products defects and errata and lower than anticipated manufacturing yields. The gross margin percentage could vary
significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the
timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the
manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions
in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including
manufacturing, assembly/test and intangible assets. The majority of Intel’s non-marketable equity investment portfolio balance is
concentrated in companies in the flash memory market segment, and declines in this market segment or changes in management’s
plans with respect to Intel’s investments in this market segment could result in significant impairment charges, impacting
restructuring charges as well as gains/losses on equity investments and interest and other. Intel's results could be affected by
adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers
operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and
fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as
restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and
profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. Intel's results could be affected by
adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory
matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and
regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction
prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s
ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion
of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent Form
10-Q, Form 10-K and earnings release.
Rev. 5/4/12
31. Summary
Memcached is a popular key-value caching service used by web
service delivery companies to reduce the latency of serving data
to consumers and reduce load on back-end database servers.
It has a scale-out architecture that easily supports increasing
throughput by simply adding more memcached servers, but at
the individual server level scaling up to higher core counts is
less rewarding. In this talk we introduce optimizations that
break through such scalability barriers and allow all cores in a
server to be used effectively. We explain new algorithms
implemented to achieve an almost 6x increase in throughput
while maintaining a 1ms average latency SLA by utilizing
concurrent data structures, a new cache replacement policy and
network optimizations.