SlideShare a Scribd company logo
1 of 28
Michael Kehoe
Staff Site Reliability Engineer
LinkedIn
Going all in:
From single use-case to many
2
Overview
• The LinkedIn Story
• Couchbase Use-Cases
• Development & Operations
• Conclusions
• Questions
$ whoami
3
Michael Kehoe
• Staff Site Reliability Engineer (SRE)
• Production-SRE team
• Funny accent = Australian
• Contact
• linkedin.com/in/michaelkkehoe
• @matrixtek
$ whatis SRE
4
Michael Kehoe
• Site Reliability Engineering
• Operations for the production application environment
• Responsibilities include
• Architecture design
• Capacity planning
• Operations
• Tooling
$ whatis CBVT
5
Michael Kehoe
• Couchbase Virtual Team
• ~10 SRE’s
• 2 Software Engineers
• Sponsored by SRE Director
• 5-90% of their time to support Couchbase
• Encourage as many people to contribute as possible
• What do we do?
• Operational work on Couchbase clusters
• Evangelize the use of Couchbase within LinkedIn
• Develop tools for the Couchbase Ecosystem
6
The LinkedIn Story
• Founded in 2002, LinkedIn has grown into the world’s largest professional social
media network
• 30 offices in 24 countries, Available in 24 languages
• More than 450+ million members worldwide
7
The LinkedIn Story
• Growth in Products
• Profiles
• Groups
• Recruiter
• Sales Navigator
• Growth in Internet Traffic
• Billions of page-hits per day
• 100k+ QPS to production services
In-Memory Storage Needs
8
The LinkedIn Story
• LinkedIn started as an Oracle shop
• Hyper-growth = Scaling challenges
• Read-Scaling becomes important
• Applicable use-cases
• Simple cache store
• Pre-warmed
• Read through
• Potential for Source of Truth (SoT) store
Enter Couchbase
9
The LinkedIn Story
• Until 2012, we were only using Memcache as a non SoT In-Memory store
• Drawbacks
• Difficult to pre-warm
• No partitioning/sharding (had to write our own)
• Cold-cache restarts
• Difficult to move data across hosts/clusters data-centers
Enter Couchbase
10
The LinkedIn Story
• Evaluated replacement systems for Memcached: Mongo, Redis, and others
• Couchbase had distinct advantages:
• Simple replacement for Memcached
• Built-in replication and cluster expansion
• Automatic partitioning
• Low latency
• Async writes to disk
• Building tooling is simple
Enter Couchbase
11
The LinkedIn Story
• Today we run Couchbase in our Corporate, Staging and Production environments
• Production/ Staging statistics:
• 148 buckets
• 2821 hosts
• 10M+ QPS
• Largest Clusters:
• By Hosts: 72 Hosts
• By Documents: 1.4B Documents
• By QPS: 2.5M QPS
Summary
12
Use-Cases
Today’s use-cases:
• Simple read-through cache
• Ephemeral Counter Store
• Temporary de-duping store
• SoT data-store for internal tooling
Simple read-through cache
13
Use-Cases
• Drop-in replacement for memcache
• Read-scaling
• Protecting backend database from large amounts of traffic
• E.g. 3rd party ingestion credential cache
Counter Store
14
Use-Cases
• In certain places, we simply need to increment counters from multiple systems and
store them
• E.g. Anti-abuse/Anti-scraping systems (Fuse)
Temporary De-duping store
15
Use-Cases
• Need to de-dup data over a large application cluster
• E.g. Email systems – Ensure we don’t send the same email twice
SoT Store for Internal Tools
16
Use-Cases
• For Non-Member facing tools, we use Couchbase as a SoT store.
• Benefits:
• Schema-less
• Short setup time
• Couchbase Python Client works easily in our environment
• Use views for simple map-reduce
• Example Uses:
• Nurse – Autoremediation system
• TrafficshiftIn – Global traffic automation system
• Availability – Storing and tracking Linkedin availability data
Couchbase Ecosystem
17
The LinkedIn Story
18
Developing around Couchbase
• Java – li-couchbase-client
• Wrapper around standard Java Couchbase Client
• Custom metrics emission
• Using Spring interface
• Storing data as Java serialized objects
• Python – couchbase-python-client
19
Operational Tooling
In order to efficiently use Couchbase as SRE’s, we need the following:
• Provisioning
• Installation
• Monitoring & Alerting
• Infrastructure Visibility
Provisioning
20
Operational Tooling
• Provisioning Flow
• Seek estimated usage statistics for cluster
• Size of data to be stored
• QPS
• Redundancy Needs
• Calculate cluster sizing
• Currently done with a template
• Couchbase has a simple calculator available online: http://docs.couchbase.com/prebuilt/calculators/sizing-
calc.html
• Request hardware for cluster(s)
Installation
21
Operational Tooling
• Process
• Enter cluster metadata into our management system (Range)
• Use Salt States to install and configure cluster
• See Issa Fattah’s post for more information:
• https://engineering.linkedin.com/blog/2016/04/leveraging-saltstack-to-scale-couchbase
• Benefits
• Ability to perform ‘state enforcement’
• Using Salt Pillar’s to encrypt cluster/ bucket passwords end-to-end
Monitoring & Alerting
22
Operational Tooling
• We run a daemon on each Couchbase Server that collects metrics every minute via
Couchbase API’s
• Use cluster metadata from range to build dashboards with our own system
InGraphs
• See: ‘Monitoring production deployments’: 4pm - Great America 1
Monitoring & Alerting
23
Operational Tooling
Management
24
Operational Tooling
• We want to see a world-view of all the clusters we run
• Having bucket cluster/server level statistics is useful
• Having a global view of who owns and operates each cluster/ bucket is useful
Management
25
Operational Tooling
26
Conclusions
• Couchbase was a natural fit into our existing infrastructure
• Building an ecosystem around Couchbase was important to us and has helped
Couchbase be successful at LinkedIn
• Expanding use of Couchbase
• In the past year we’ve grown the number of buckets over 50%
• Starting to use Views in production
• Moving Couchbase into LinkedIn standard deployment infrastructure
27
Thank You
Questions?
©2014 LinkedIn Corporation. All Rights Reserved.©2014 LinkedIn Corporation. All Rights Reserved.

More Related Content

What's hot

Introducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricIntroducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricAlexander Dean
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Christian Horsdal
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know confluent
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the CloudSite24x7
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringSite24x7
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaAlexander Dean
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...HostedbyConfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max
 
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...Gene Kim
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Toolsbotsplash.com
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logAlexander Dean
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EnginePraveen Thirukonda
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...SPC Adriatics
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applicationsconfluent
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian PerformanceAtlassian
 

What's hot (20)

Introducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabricIntroducing Tupilak, Snowplow's unified log fabric
Introducing Tupilak, Snowplow's unified log fabric
 
Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017Consolidating services with middleware - NDC London 2017
Consolidating services with middleware - NDC London 2017
 
URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know URP? Excuse You! The Three Metrics You Have to Know
URP? Excuse You! The Three Metrics You Have to Know
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Server Monitoring from the Cloud
Server Monitoring from the CloudServer Monitoring from the Cloud
Server Monitoring from the Cloud
 
Microsoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoringMicrosoft Azure and Windows Application monitoring
Microsoft Azure and Windows Application monitoring
 
Scala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in ScalaScala eXchange: Building robust data pipelines in Scala
Scala eXchange: Building robust data pipelines in Scala
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Intro to.net core 20170111
Intro to.net core   20170111Intro to.net core   20170111
Intro to.net core 20170111
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in Practice
 
David Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to EspressoDavid Max SATURN 2018 - Migrating from Oracle to Espresso
David Max SATURN 2018 - Migrating from Oracle to Espresso
 
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
DOES SFO 2016 - Rich Jackson & Rosalind Radcliffe - The Mainframe DevOps Team...
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
Span Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified logSpan Conference: Why your company needs a unified log
Span Conference: Why your company needs a unified log
 
Azkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation EngineAzkaban - WorkFlow Scheduler/Automation Engine
Azkaban - WorkFlow Scheduler/Automation Engine
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...Office Online Server 2016 - a must for on-premises installation for SharePoin...
Office Online Server 2016 - a must for on-premises installation for SharePoin...
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance10 Tips to Pump Up Your Atlassian Performance
10 Tips to Pump Up Your Atlassian Performance
 

Viewers also liked

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Michael Kehoe
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentMichael Kehoe
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialPooja Tangi
 
CouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidCouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidRicardo Monagas Medina
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Issa Fattah
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errorsHimanshu
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the dayPooja Tangi
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Diego Pacheco
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineeringguest90cec6
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Dmitri Zimine
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth modelHimanshu
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE wayShawn Zhu
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflowsDmitri Zimine
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring MicroservicesWeaveworks
 

Viewers also liked (15)

Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016Couchbase Meetup Jan 2016
Couchbase Meetup Jan 2016
 
SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
CouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones AndroidCouchDB y el desarrollo de aplicaciones Android
CouchDB y el desarrollo de aplicaciones Android
 
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
Couchbase Orchestration and Scaling a Caching Infrastructure At LinkedIn.
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015Event Driven Automation Meetup May 14/2015
Event Driven Automation Meetup May 14/2015
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
Event driven-automation and workflows
Event driven-automation and workflowsEvent driven-automation and workflows
Event driven-automation and workflows
 
Monitoring Microservices
Monitoring MicroservicesMonitoring Microservices
Monitoring Microservices
 

Similar to Couchbase Connect 2016

Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithMarkus Eisele
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksDatabricks
 
Stay productive while slicing up the monolith
Stay productive while slicing up the monolith Stay productive while slicing up the monolith
Stay productive while slicing up the monolith Markus Eisele
 
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureSql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureMarco Obinu
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Fwdays
 
The Need of Cloud-Native Application
The Need of Cloud-Native ApplicationThe Need of Cloud-Native Application
The Need of Cloud-Native ApplicationEmiliano Pecis
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
Sitecore - what to look forward to
Sitecore - what to look forward toSitecore - what to look forward to
Sitecore - what to look forward tojinto77
 
Cloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia DavisCloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia DavisVMware Tanzu
 
Criteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkCriteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkPierre Mavro
 
High performance web sites with multilevel caching
High performance web sites with multilevel cachingHigh performance web sites with multilevel caching
High performance web sites with multilevel cachingDotnet Open Group
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Lucas Jellema
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesVishal Biyani
 
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginnersKoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginnersTobias Koprowski
 
Meet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis LukssMeet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis LukssAmasty
 
Urbanesia - Development History
Urbanesia - Development HistoryUrbanesia - Development History
Urbanesia - Development HistoryBatista Harahap
 

Similar to Couchbase Connect 2016 (20)

Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Building a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with DatabricksBuilding a Turbo-fast Data Warehousing Platform with Databricks
Building a Turbo-fast Data Warehousing Platform with Databricks
 
Stay productive while slicing up the monolith
Stay productive while slicing up the monolith Stay productive while slicing up the monolith
Stay productive while slicing up the monolith
 
Sql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su AzureSql Start! 2020 - SQL Server Lift & Shift su Azure
Sql Start! 2020 - SQL Server Lift & Shift su Azure
 
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
Sergey Dzyuban "To Build My Own Cloud with Blackjack…"
 
The Need of Cloud-Native Application
The Need of Cloud-Native ApplicationThe Need of Cloud-Native Application
The Need of Cloud-Native Application
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Sitecore - what to look forward to
Sitecore - what to look forward toSitecore - what to look forward to
Sitecore - what to look forward to
 
Cloud-native Data
Cloud-native DataCloud-native Data
Cloud-native Data
 
Cloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia DavisCloud-Native-Data with Cornelia Davis
Cloud-Native-Data with Cornelia Davis
 
Criteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech TalkCriteo meetup - S.R.E Tech Talk
Criteo meetup - S.R.E Tech Talk
 
TechBeats #2
TechBeats #2TechBeats #2
TechBeats #2
 
High performance web sites with multilevel caching
High performance web sites with multilevel cachingHigh performance web sites with multilevel caching
High performance web sites with multilevel caching
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
Dutch Oracle Architects Platform - Reviewing Oracle OpenWorld 2017 and New Tr...
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher Kubernetes
 
Exploring sql server 2016
Exploring sql server 2016Exploring sql server 2016
Exploring sql server 2016
 
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginnersKoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
 
Meet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis LukssMeet Magento Belarus 2015: Jurģis Lukss
Meet Magento Belarus 2015: Jurģis Lukss
 
Urbanesia - Development History
Urbanesia - Development HistoryUrbanesia - Development History
Urbanesia - Development History
 

More from Michael Kehoe

Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayMichael Kehoe
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container BasicsMichael Kehoe
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsMichael Kehoe
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsMichael Kehoe
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsMichael Kehoe
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayMichael Kehoe
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringMichael Kehoe
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFMichael Kehoe
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...Michael Kehoe
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsMichael Kehoe
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleMichael Kehoe
 

More from Michael Kehoe (17)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Couchbase Connect 2016

  • 1. Michael Kehoe Staff Site Reliability Engineer LinkedIn Going all in: From single use-case to many
  • 2. 2 Overview • The LinkedIn Story • Couchbase Use-Cases • Development & Operations • Conclusions • Questions
  • 3. $ whoami 3 Michael Kehoe • Staff Site Reliability Engineer (SRE) • Production-SRE team • Funny accent = Australian • Contact • linkedin.com/in/michaelkkehoe • @matrixtek
  • 4. $ whatis SRE 4 Michael Kehoe • Site Reliability Engineering • Operations for the production application environment • Responsibilities include • Architecture design • Capacity planning • Operations • Tooling
  • 5. $ whatis CBVT 5 Michael Kehoe • Couchbase Virtual Team • ~10 SRE’s • 2 Software Engineers • Sponsored by SRE Director • 5-90% of their time to support Couchbase • Encourage as many people to contribute as possible • What do we do? • Operational work on Couchbase clusters • Evangelize the use of Couchbase within LinkedIn • Develop tools for the Couchbase Ecosystem
  • 6. 6 The LinkedIn Story • Founded in 2002, LinkedIn has grown into the world’s largest professional social media network • 30 offices in 24 countries, Available in 24 languages • More than 450+ million members worldwide
  • 7. 7 The LinkedIn Story • Growth in Products • Profiles • Groups • Recruiter • Sales Navigator • Growth in Internet Traffic • Billions of page-hits per day • 100k+ QPS to production services
  • 8. In-Memory Storage Needs 8 The LinkedIn Story • LinkedIn started as an Oracle shop • Hyper-growth = Scaling challenges • Read-Scaling becomes important • Applicable use-cases • Simple cache store • Pre-warmed • Read through • Potential for Source of Truth (SoT) store
  • 9. Enter Couchbase 9 The LinkedIn Story • Until 2012, we were only using Memcache as a non SoT In-Memory store • Drawbacks • Difficult to pre-warm • No partitioning/sharding (had to write our own) • Cold-cache restarts • Difficult to move data across hosts/clusters data-centers
  • 10. Enter Couchbase 10 The LinkedIn Story • Evaluated replacement systems for Memcached: Mongo, Redis, and others • Couchbase had distinct advantages: • Simple replacement for Memcached • Built-in replication and cluster expansion • Automatic partitioning • Low latency • Async writes to disk • Building tooling is simple
  • 11. Enter Couchbase 11 The LinkedIn Story • Today we run Couchbase in our Corporate, Staging and Production environments • Production/ Staging statistics: • 148 buckets • 2821 hosts • 10M+ QPS • Largest Clusters: • By Hosts: 72 Hosts • By Documents: 1.4B Documents • By QPS: 2.5M QPS
  • 12. Summary 12 Use-Cases Today’s use-cases: • Simple read-through cache • Ephemeral Counter Store • Temporary de-duping store • SoT data-store for internal tooling
  • 13. Simple read-through cache 13 Use-Cases • Drop-in replacement for memcache • Read-scaling • Protecting backend database from large amounts of traffic • E.g. 3rd party ingestion credential cache
  • 14. Counter Store 14 Use-Cases • In certain places, we simply need to increment counters from multiple systems and store them • E.g. Anti-abuse/Anti-scraping systems (Fuse)
  • 15. Temporary De-duping store 15 Use-Cases • Need to de-dup data over a large application cluster • E.g. Email systems – Ensure we don’t send the same email twice
  • 16. SoT Store for Internal Tools 16 Use-Cases • For Non-Member facing tools, we use Couchbase as a SoT store. • Benefits: • Schema-less • Short setup time • Couchbase Python Client works easily in our environment • Use views for simple map-reduce • Example Uses: • Nurse – Autoremediation system • TrafficshiftIn – Global traffic automation system • Availability – Storing and tracking Linkedin availability data
  • 18. 18 Developing around Couchbase • Java – li-couchbase-client • Wrapper around standard Java Couchbase Client • Custom metrics emission • Using Spring interface • Storing data as Java serialized objects • Python – couchbase-python-client
  • 19. 19 Operational Tooling In order to efficiently use Couchbase as SRE’s, we need the following: • Provisioning • Installation • Monitoring & Alerting • Infrastructure Visibility
  • 20. Provisioning 20 Operational Tooling • Provisioning Flow • Seek estimated usage statistics for cluster • Size of data to be stored • QPS • Redundancy Needs • Calculate cluster sizing • Currently done with a template • Couchbase has a simple calculator available online: http://docs.couchbase.com/prebuilt/calculators/sizing- calc.html • Request hardware for cluster(s)
  • 21. Installation 21 Operational Tooling • Process • Enter cluster metadata into our management system (Range) • Use Salt States to install and configure cluster • See Issa Fattah’s post for more information: • https://engineering.linkedin.com/blog/2016/04/leveraging-saltstack-to-scale-couchbase • Benefits • Ability to perform ‘state enforcement’ • Using Salt Pillar’s to encrypt cluster/ bucket passwords end-to-end
  • 22. Monitoring & Alerting 22 Operational Tooling • We run a daemon on each Couchbase Server that collects metrics every minute via Couchbase API’s • Use cluster metadata from range to build dashboards with our own system InGraphs • See: ‘Monitoring production deployments’: 4pm - Great America 1
  • 24. Management 24 Operational Tooling • We want to see a world-view of all the clusters we run • Having bucket cluster/server level statistics is useful • Having a global view of who owns and operates each cluster/ bucket is useful
  • 26. 26 Conclusions • Couchbase was a natural fit into our existing infrastructure • Building an ecosystem around Couchbase was important to us and has helped Couchbase be successful at LinkedIn • Expanding use of Couchbase • In the past year we’ve grown the number of buckets over 50% • Starting to use Views in production • Moving Couchbase into LinkedIn standard deployment infrastructure
  • 28. ©2014 LinkedIn Corporation. All Rights Reserved.©2014 LinkedIn Corporation. All Rights Reserved.

Editor's Notes

  1. The LinkedIn Story Couchbase Use-Cases Development & Operations Conclusions Questions
  2. Site Reliability Engineering A term coined by Ben Treynor from Google Hybrid of: Sysadmin Network Engineer Architect Troubleshooter Software Engineer Ninja’s – Digital economy
  3. 10 SRE’s, with a tech-lead Sponsored by a SRE Director Input from Software Engineers on development
  4. Founded in 2002, LinkedIn has grown into the world’s largest professional social media network 30 offices in 24 countries, Available in 24 languages More than 450+ million members worldwide
  5. LinkedIn started as an Oracle shop To-date, we still run a significant number of Oracle databases Oracle is fine for writes, scaling reads becomes challenging HyperGrowth == Scaling challenges Scaling writes isn’t a common problem in most cases Scaling reads to 100k+ QPS, is challenging Failures in read-scaling infra can take down back-end systems Applicable use-cases Simple cache store Pre-warmed Read-through SoT Store
  6. Until 2012, we were only using Memcache as a non SoT In-Memory store Drawbacks of memcache: Difficult to pre-warm, not easy to copy-data No native sharding for clusters, had to write our own Restarting memcache servers caused problems Couldn’t copy data across for new DC’s, expanding clusters etc Mid-2012, started testing Couchbase
  7. Evaluated replacement systems for Memcached: Mongo, Redis, and others Couchbase had distinct advantages Simple replacement for memcache  JAVA Spring made this simpler Built-in replication and cluster expansion, significantly reduces ops-workload Automatic partitioning, doesn’t become a concern anymore Low-latency, reads from disk are still very fast Async write to disk, can write a low of data at once without it being a problem Lots of API’s that make tooling relatively simple
  8. Insert fuse architecture
  9. We have a deduplication filter in stork that you can take advantage of to make sure we don't send duplicates of your email. This is highly recommended for any email using kafka (kafka can potentially deliver your email to our system twice)
  10. Don’t use as SoT store as Espresso is our primary key-value store