MySQL DevOps at Outbrain

Tools and techniques used at Outbrain to promote good DevOps culture.

Presentation Transcript

  • MySQL DevOps @ Outbrain: tools bridging the gap between MySQL engineering, ops & DBAs (Shlomi Noach)
  • MySQL DevOps @ Outbrain Shlomi Noach Percona Live 2014 Copyright © 2014, Outbrain
  • About me
      ● Engineer, DBA
      ● Working with MySQL since 2000
      ● Formerly a consultant and instructor
      ● Author of common_schema, openark-kit, propagator
      ● Write at http://openark.org
      ● Work on the infrastructure team at Outbrain
  • About Outbrain
      ● The leading content discovery platform on the web
      ● Embedded in over 90,000 websites
      ● Serves over 150 million unique US visitors, 15 billion pages and 100 billion recommendations per month
      ● You may not know us by name, but you have probably met us many times.
      ● We aim to provide our users with reliable content.
  • [Figure: paid discovery and internal discovery]
  • About Outbrain
      ● Managing a total of over 2,000 servers (Hadoop, Cassandra, MySQL, web services, …)
      ● Processing about 1 petabyte of information
      ● Over 70 engineers
      ● Doing continuous deployments
      ● Fans and supporters of open source
      ● An "ownership" culture: "You build it, you run it!"
          ○ Must be supported by technology
  • What's DevOps?
      ● Or, DevDbaOps?
  • What's DevOps?
      ● Often described as developers doing ops work, or ops doing engineering work
      ● I see it more as the integration between the groups
      ● Avoiding the scenario where parties have no control over parts of their domain
          ○ Tools
          ○ Techniques
          ○ Culture
  • What's DevOps?
      ● With good DevOps, you get:
          ○ Ownership
          ○ Visibility
          ○ Action-ability (the word has just been invented and will be used as an axiom)
      ● Allowing engineers to own, and be responsible for, their apps
          ○ No need for ops to tell them something is wrong
          ○ No need to sit with ops to understand what is wrong
          ○ No need to ask ops to deploy changes
      ● All the while giving ops visibility into engineers' actions
  • Tribute: automation
      ● We use Chef for automation
      ● Some data bag leftovers; we are moving to attributes
      ● Everything is under version control
          ○ Allows ops/DBAs to easily add/remove packages
          ○ Different treatment for masters
          ○ Different my.cnf settings based on MySQL role
          ○ Different my.cnf settings based on hardware
          ○ Setting up backup servers
          ○ More...
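Role- and hardware-dependent my.cnf settings boil down to a small rendering function. The following is a minimal shell sketch, not Outbrain's Chef recipe: the function name render_mycnf, the two role names and the 70%-of-RAM buffer pool ratio are all assumptions chosen for illustration.

```bash
#!/bin/bash
# Hypothetical sketch: render my.cnf settings from a MySQL role and the host's RAM (MB).
# The role names ("master", "slave") and the 70% buffer pool ratio are assumptions.
render_mycnf() {
  local role=$1 mem_mb=$2
  # Reject unknown roles before printing anything
  case $role in
    master|slave) ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  printf '[mysqld]\n'
  # Hardware-based: size the InnoDB buffer pool from the available memory
  printf 'innodb_buffer_pool_size = %dM\n' $(( mem_mb * 70 / 100 ))
  # Role-based: masters keep durable binary logs, slaves are read-only
  if [ "$role" = master ]; then
    printf 'log_bin = mysql-bin\nsync_binlog = 1\n'
  else
    printf 'read_only = 1\nrelay_log = mysqld-relay-bin\n'
  fi
}
```

On Linux, for example: render_mycnf slave "$(awk '/^MemTotal/ {print int($2 / 1024)}' /proc/meminfo)" > /etc/mysql/conf.d/role.cnf (the target path is hypothetical). Keeping such a template under version control is what lets ops and DBAs change it safely.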
  • Automation: One Ring to Rule Them All
      ● Outbrain's onering is an orchestration solution
      ● Provisioning servers: from the operating system, through packages (via Chef integration), to application deployment (via glu integration)
          ○ Allows for a one-click "I want a host with MyService tomcat service", or "I want a host with MySQL server"
      ● Then acting as an inventory service
          ○ "Give me all MySQL servers in the LA data center"
          ○ "Which disks do our OLAP servers use?"
      ● https://github.com/outbrain/onering
  • Onering & pmysql: on-demand, semi-automated actions
      ● pmysql is a parallel MySQL client (originally developed by Domas Mituzas)
      ● Using onering's API, we can stop replication on the OLAP MySQL servers:
        curl "https://my.onering.service/api/devices/list/name/where/chef.run_list/mysql/name/olap?format=txt" \
          | pmysql -pmypass "stop slave"
      ● ...or only on those whose version string mentions tokudb:
        curl "https://my.onering.service/api/devices/list/name/where/chef.run_list/mysql/name/olap?format=txt" \
          | pmysql -pmypass "select @@version" | grep tokudb | awk '{print $1}' \
          | pmysql -pmypass "stop slave"
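The pattern above is "host list in, one command per host, in parallel, host-prefixed output out". A minimal stand-in built on xargs is sketched below; it is not pmysql, and the function name run_per_host is hypothetical. Every {} in the command is replaced by the host name, and each output line is prefixed with that host, so a later grep/awk stage can pick hosts out the way the slide does.

```bash
#!/bin/bash
# Hypothetical stand-in for a parallel client such as pmysql (this is not pmysql):
# read host names on stdin and run the given command once per host, with up to
# $1 hosts in flight. xargs replaces every "{}" in the command with the host name;
# the inner bash prefixes each output line with that host.
run_per_host() {
  local parallelism=$1
  shift
  xargs -P "$parallelism" -I{} \
    bash -c 'host=$1; shift; "$@" | sed "s/^/$host /"' _ {} "$@"
}
```

For example, the TokuDB pipeline could become: ... | run_per_host 8 mysql -h {} -pmypass -N -e "select @@version" | grep tokudb | awk '{print $1}' | run_per_host 8 mysql -h {} -pmypass -e "stop slave". Output from different hosts arrives in completion order, not input order.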
  • Visibility
      ● A classic developers-ops collision: slow queries
          ○ Ops notice increased I/O and slave lag
          ○ What do they know of the problem's domain?
          ○ Developers see long response times
          ○ What visibility do they get?
  • [Screenshot: Box Anemometer]
  • What makes Anemometer such a good DevOps tool?
      ● It provides visibility to everyone
      ● The engineer doesn't need to know what slow logs are, where they are located, or how to interpret them
      ● It promotes ownership by drilling down per query, per host and per service
      ● The permalink: such a small thing, yet it makes all the difference
      ● Ops can hand over what they think is the "guilty" query
  • Anemometer @ Outbrain, behind the scenes
      [Diagram: logstash ships the slow log of each MySQL host to the Anemometer host, where pt-query-digest processes the logs for the web interface.]
  • Multiple services, multiple MySQL hosts: who makes it slow?
      [Diagram: several services querying several MySQL hosts, each host with its own slow log]
      ● What is our analysis granularity?
      ● Are slow log entries caused by a particular query?
      ● By a loaded MySQL host?
      ● By a loaded service?
  • Anemometer, collecting the slow logs
        input {
          tcp {
            port => 23306
            type => "mysql-slow"
            mode => "server"
          }
        }
        filter {
          dns {
            reverse => [ "@source_host", "source_host_name" ]
            action => "replace"
          }
        }
        output {
          file {
            type => "mysql-slow"
            message_format => "%{@message}"
            path => "/path/to/slow_logs/logstash/%{@source_host}-mysql-slow.log"
          }
        }
  • Anemometer, rotating the slow logs
        /outbrain/slow_logs/logstash/*.log {
          daily
          nocompress
          size 1
          missingok
          ifempty
          copytruncate
          prerotate
            /bin/bash /var/www/html/anemometer/outbrain/pre_rotate.sh $1
          endscript
          nosharedscripts
          rotate 100
        }
      ● logstash streams the logs onto the Anemometer machine
      ● We choose not to aggregate them into one file; the target file name indicates the source host name
  • Anemometer, processing the slow logs
        #!/bin/bash
        rotated_slow_log_file=$1
        rotated_slow_log_file_path=$(dirname "$rotated_slow_log_file")
        rotated_slow_log_file_name=$(basename "$rotated_slow_log_file")
        # logstash names each file <source host>-mysql-slow.log; recover the host
        hostname=${rotated_slow_log_file_name%%-mysql-slow.log*}
        # ${clustername} is set elsewhere (not shown on the slide)

        # Drop empty lines, then digest; \$event is escaped so bash passes it to Perl intact
        /bin/grep -v "^$" "$rotated_slow_log_file" | /usr/bin/pt-query-digest \
          --user=... --password=... \
          --review u=,p=,h=localhost,D=...,t=global_query_review \
          --history u=,p=,h=localhost,D=...,t=global_query_review_history \
          --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}=\"${hostname}\" and \$event->{clustername}=\"${clustername}\"" \
          --no-report --group-by-extra=host
      ● Reading files per MySQL host, adding host & cluster
      ● Secondary grouping by client host
  • Anemomaster: visibility into master DML
      ● One of those "How did we ever live without it?" tools
      ● Provides near real-time (10-minute granularity) visibility into queries issued on the master
      ● Got an unexpected burst of INSERTs? Anemomaster provides quick and accurate access to the specific "guilty" query
      ● And ops take a permalink to the owner
      ● "Anemomaster" is a nickname: this is Anemometer on top of binary log analysis instead of slow log analysis, counting executions instead of total run time
      ● Also writing all DML counts to Graphite
  • Anemomaster
      ● Pinpointing the execution count of a specific UPDATE query
      ● This query is owned by a known team
  • Anemomaster @ Outbrain, behind the scenes
      [Diagram: MySQL masters write binary logs; on their slaves, pt-query-digest reads the relay logs and reports to the Anemomaster host, which serves the web interface.]
  • Anemomaster, processing the binary logs
        # Close the current relay log so that the previous one is complete
        /usr/bin/mysql -umy_user -pmy_password -e 'flush relay logs'
        sleep 1
        # Second-newest relay log by modification time: the one just closed
        binlog_file=$(ls -tr /path/to/mysql/mysqld-relay-bin.[0-9]* | tail -n 2 | head -n 1)
        mysqlbinlog "$binlog_file" | /usr/bin/pt-query-digest --type binlog \
          --order-by Query_time:cnt --group-by fingerprint --limit 100 \
          --review h=myhost,D=anemomaster,t=global_query_review \
          --history h=myhost,D=anemomaster,t=global_query_review_history \
          --filter=" \$event->{Bytes} = length(\$event->{arg}) and \$event->{hostname}=\"$(hostname)\" and \$event->{clustername}=\"${clustername}\" and \$event->{host}=\"n/a\" " \
          --no-report
      ● Actually processing the relay logs on the slaves
      ● Assumes SBR (statement-based replication); RBR (row-based) support is a work in progress
  • Anemomaster, writing to Graphite
        query=" select ... "
        # Each row: cluster <TAB> execution count <TAB> query fingerprint
        # ($unixtime is set elsewhere; not shown on the slide)
        mysql anemomaster --silent --silent --raw -e "$query" | while IFS=$'\t' read -r -a result_values
        do
          fingerprint_cluster=${result_values[0]} ;
          fingerprint_count=${result_values[1]} ;
          fingerprint_query=${result_values[2]} ;
          fingerprint_query=$(echo $fingerprint_query | sed -r -e "s/^(-- .*)]//g")
          fingerprint_query=$(echo $fingerprint_query | tr '\n' ' ' | tr '\r' ' ' | tr '\t' ' ')
          # Keep only the statement head: cut at "(", "," or the first clause keyword
          fingerprint_query=${fingerprint_query%%(*}
          fingerprint_query=${fingerprint_query%%,*}
          fingerprint_query=${fingerprint_query%% set *}
          fingerprint_query=${fingerprint_query%% SET *}
          fingerprint_query=${fingerprint_query%% where *}
          fingerprint_query=${fingerprint_query%% WHERE *}
          fingerprint_query=${fingerprint_query%% join *}
          fingerprint_query=${fingerprint_query%% JOIN *}
          fingerprint_query=${fingerprint_query%% using *}
          fingerprint_query=${fingerprint_query%% USING *}
          fingerprint_query=${fingerprint_query%% select *}
          fingerprint_query=${fingerprint_query%% SELECT *}
          # Make the head Graphite-safe (note: tr maps "." to a single "_")
          fingerprint_query=$(echo $fingerprint_query | tr -d "\`")
          fingerprint_query=$(echo $fingerprint_query | tr -d "*")
          fingerprint_query=$(echo $fingerprint_query | tr " " "_")
          fingerprint_query=$(echo $fingerprint_query | tr "." "__")
          echo "data.mysql.${fingerprint_cluster}.mysql_dml.${fingerprint_query}.count ${fingerprint_count} $unixtime" | nc -w 1 graphite 3003
        done
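The fingerprint-to-metric logic above can also be packaged as two small, testable functions. This is a hedged rewrite, not the script Outbrain runs: the function names dml_metric_token and graphite_dml_line are hypothetical, lowercasing the fingerprint first replaces the paired upper/lower-case cuts, and the keyword list mirrors the one on the slide. The emitted line follows Graphite's plaintext protocol, "<metric path> <value> <timestamp>".

```bash
#!/bin/bash
# Hypothetical helper: turn a DML fingerprint into a Graphite-safe metric token,
# e.g. "UPDATE `stats`.`clicks` SET ..." -> "update_stats_clicks".
dml_metric_token() {
  local q kw
  # Flatten whitespace and lowercase, so one pass handles SET/set alike
  q=$(printf '%s' "$1" | tr '\n\r\t' '   ' | tr '[:upper:]' '[:lower:]')
  # Keep only the statement head: cut at the first "(" or ","
  q=${q%%(*}
  q=${q%%,*}
  # ...and at the first clause keyword
  for kw in ' set ' ' where ' ' join ' ' using ' ' select '; do
    q=${q%%"$kw"*}
  done
  # Drop backticks/asterisks, squeeze spaces, map spaces and dots to "_",
  # then trim leading/trailing underscores
  printf '%s' "$q" | tr -d '`*' | tr -s ' ' | tr ' .' '__' | sed -e 's/^_*//' -e 's/_*$//'
}

# Format one Graphite plaintext line: cluster, count, fingerprint [, timestamp]
graphite_dml_line() {
  local cluster=$1 count=$2 fingerprint=$3 ts=${4:-$(date +%s)}
  printf 'data.mysql.%s.mysql_dml.%s.count %s %s\n' \
    "$cluster" "$(dml_metric_token "$fingerprint")" "$count" "$ts"
}
```

Inside the loop this would be used as graphite_dml_line "$fingerprint_cluster" "$fingerprint_count" "$fingerprint_query" "$unixtime" | nc -w 1 graphite 3003. Port 3003 is taken from the slide; carbon's default plaintext port is 2003.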
  • audit_login: a login auditing plugin
      ● Auditing every single login to our databases
          ○ Keeping track of connects per minute, to find problems
          ○ Detecting unused accounts
          ○ Detecting failed connects, taking action
          ○ Detecting naughty scripts executed by developers (haha, got your IP!)
          ○ And, well, auditing for the record
  • audit_login, output
        {"ts":"2013-09-11 09:11:47","type":"successful_login","myhost":"gromit03","thread":"74153868","user":"web_user","priv_user":"web_user","host":"web-87.localdomain","ip":"10.0.0.87"}
        {"ts":"2013-09-11 09:11:55","type":"failed_login","myhost":"gromit03","thread":"74153869","user":"backup_user","priv_user":"","host":"web-32","ip":"10.0.0.32"}
        {"ts":"2013-09-11 09:11:57","type":"failed_login","myhost":"gromit03","thread":"74153870","user":"backup_user","priv_user":"","host":"web-32","ip":"10.0.0.32"}
        {"ts":"2013-09-11 09:12:48","type":"successful_login","myhost":"gromit03","thread":"74153871","user":"root","priv_user":"root","host":"localhost","ip":"10.0.0.111"}
        {"ts":"2013-09-11 09:13:26","type":"successful_login","myhost":"gromit03","thread":"74153872","user":"web_user","priv_user":"web_user","host":"web-11.localdomain","ip":"10.0.0.11"}
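Because each audit_login record is a single line of JSON, plain grep/sed is enough for quick questions such as "who keeps failing to log in, and from where?". A minimal sketch, assuming the one-record-per-line layout and the key names shown in the sample output; the function name is hypothetical, and a JSON-aware tool such as jq would be more robust against reordered keys.

```bash
#!/bin/bash
# Hypothetical helper: count failed logins per user@ip in an audit_login-style log.
# Reads the files given as arguments, or stdin when none are given.
# Assumes one JSON object per line with "user" appearing before "ip".
failed_logins_by_user_ip() {
  grep '"type":"failed_login"' "$@" |
    sed -n -e 's/.*"user":"\([^"]*\)".*"ip":"\([^"]*\)".*/\1@\2/p' |
    sort | uniq -c | sort -rn
}
```

For example, failed_logins_by_user_ip /path/to/audit_login.log | head lists the noisiest user/address pairs first.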
  • audit_login @ Outbrain, behind the scenes
      [Diagram: each MySQL master writes an audit log; logstash reads, transforms and writes the events to Kibana (searchable via Lucene) and to an aggregated audit meta log, "grep-able like mama used to make".]
  • audit_login, logstash
        input {
          file {
            type => "mysql_audit_login"
            format => "json"
            sincedb_path => "/var/cache/logstash/.since_audit_login_log"
            sincedb_write_interval => 1
            path => [ "/path/to/audit_login.log" ]
          }
        }
        filter {
          # Drop logins by the monitoring and heartbeat accounts
          grep {
            type => "mysql_audit_login"
            match => [ "user", "monitoring_user" ]
            negate => true
          }
          grep {
            type => "mysql_audit_login"
            match => [ "user", "heartbeat_user" ]
            negate => true
          }
        }
  • audit_login, logstash
        output {
          rabbitmq {
            host => "my.rmq.host"
            user => "logstash_user"
            password => "logstash_password"
            exchange => "logstash.out"
            exchange_type => "fanout"
            type => "mysql_audit_login"
          }
        }
        output {
          tcp {
            type => "mysql_audit_login"
            mode => "client"
            host => "my.logstash.aggregator"
            port => "23307"
            message_format => "%{timestamp},%{type},%{myhost},%{thread},%{user},%{priv_user},%{host},%{ip}"
          }
        }
  • audit_login Kibana @ Outbrain
      [Screenshot: Kibana search for failed logins]
      user:webapp AND myhost:east1 AND type:failed_login
  • Actionability
      ● Can developers actually perform controlled/automated actions on the database?
      ● Such that everyone, including DBAs/ops, has visibility into them?
      ● Solving the above gives developers greater ownership over their domain, even within the database server.
  • Schema & data deployments
      ● Who controls the database schema design?
          ○ Ops? Is schema design within their domain?
          ○ The DBA? An expert on schema design, but is the DBA an expert on the business domain?
          ○ Developers? Do they understand indexing?
      ● With many dozens of engineers, we can't have the DBA be the single mutex for every schema change.
      ● But the DBA must know what's going on.
  • MySQL & Hive servers @ Outbrain
      [Diagram: MySQL servers and their slaves (including DWH and Meta slaves), Hive servers with Hive metastores, MySQL build servers, a MySQL unit-test server and MySQL dev/sim servers]
  • Database servers @ Outbrain
      ● Multiple servers
      ● Multiple roles (OLTP, OLAP, Meta, Hive, others)
      ● Multiple environments (dev, QA, build, production)
      ● Multiple types (MySQL, Hive)
      ● Multiple engineers who want to deploy to them all
      How and where does a developer issue a CREATE TABLE?
  • Where & how to CREATE TABLE?
      ● Not on all servers, since the table is irrelevant to some (e.g. relevant to OLTP, not to DWH). Who keeps track?
      ● Should the developer work through them one by one? Maybe with a shell script?
          ○ Does the developer know all the credentials on all the servers?
      ● What if some deployment goes wrong? (Table already there; server cannot be accessed)
          ○ Who keeps track and retries/fixes?
          ○ Do you know who did what, when & where?
  • Are all databases equal?
      ● We use different schema names on our test servers than on production
          ○ Who keeps a record?
      ● We have services that use multiple schemas, all with the exact same structure. Changes must apply to all schemas.
          ○ We've just multiplied the number of deployments for our CREATE TABLE statement.
      ● Different ports, different credentials, different FEDERATED/CONNECT targets...
  • Existing schema deployment tools
      ● Some excellent open source solutions, notably Liquibase & Flyway
      ● However, we found them unsuitable for our needs
          ○ Both are linear
          ○ Multitenancy is not easy to achieve
          ○ Mathematically sound, but reality isn't mathematically sound
          ○ Require a lot of management to achieve visibility and ownership
      ● Some Windows desktop apps around. Ahem.
  • Propagator
      ● Eventually we developed our own "multi-everything" solution
      ● Propagator provides ownership, action-ability and visibility
      ● Developers specify what they want to execute, and for which database role
      ● Propagator infers the hosts and the schema/query transformations, and awaits your approval
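To make "infers the hosts and the schema transformations" concrete, here is a toy expansion of one submitted script into per-host, per-schema targets. It is a hypothetical bash 4 sketch, not Propagator's code: the inventory, the environment and role names and the schema mapping are invented for illustration. It covers two cases from the slides: schema names that differ between environments, and one logical schema backed by several identically structured schemas.

```bash
#!/bin/bash
# Hypothetical inventory: which hosts serve each role in each environment
declare -A ROLE_HOSTS=(
  [production:oltp]="oltp-db-1 oltp-db-2"
  [production:dwh]="dwh-db-1"
  [dev:oltp]="dev-db-1"
)
# Hypothetical schema mapping: a logical schema may map to a differently named
# schema, or to several identically structured ones; unmapped names pass through
declare -A SCHEMA_MAP=(
  [production:stats]="stats_eu stats_us"
  [dev:stats]="stats_dev"
)

# Print one "host schema" deployment target per line
deployment_targets() {
  local env=$1 role=$2 schema=$3 host s
  for host in ${ROLE_HOSTS[$env:$role]-}; do
    for s in ${SCHEMA_MAP[$env:$schema]:-$schema}; do
      printf '%s %s\n' "$host" "$s"
    done
  done
}
```

A deployment tool would then run the submitted script against each printed target and record per-target success or failure, which is what makes retries, partial reruns and auditing possible.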
  • Propagator: submit a script for deployment
  • Propagator: action-ability, visibility & ownership
      ● Deployments are fully audited. Any failure is accounted for.
      ● Propagator tells you who did what, when and on which host. It also encourages stating "why".
      ● Engineers do most of the work with no intervention by the DBA or ops
      ● The DBA has control over deployments: can retry, restart, selectively skip or issue partial queries...
  • Propagator: history visibility
  • Propagator: visibility & ownership
      ● The DBA may review the deployment history
      ● Has immediate feedback on anything that went wrong
      ● Can usually figure out by herself why it went wrong, and rerun the deployment
      ● Otherwise knows whom to contact
      ● Commenting and tagging enhance visibility
      ● Typical scenario: the developer is new and unsure what went wrong (this could actually be considered a bug)
      ● Next typical scenario: the developer is experienced. Everything works.
  • Propagator: still much TODO
      ● Propagator has been in production at Outbrain for a few months now, and it gets the job done.
      ● But still TODO:
          ○ More feedback automation
          ○ Email alerts
          ○ Two-phase approval
          ○ Online schema changes integration
          ○ SVN integration
          ○ Maven integration
          ○ Cassandra
  • Data retention
      ● If disk space runs out, who gets the alert?
          ○ Ops? Sure, they can add some disk space (volume group free space; spare disks on the shelf). But only up to a point.
      ● Time for data retention. Ideally, we would store data forever. Reality is not ideal.
          ○ Who owns retention? If I want to drop a partition, whom do I clear it with?
          ○ Can this be more visible?
      ● Are you doing data retention via shell/Perl scripts? Are these tested, audited, controlled?
  • Data retention automation: Gardien
      ● An Outbrain internal service automating data retention
          ○ Currently works on Hive/HDFS; MySQL in the works
      ● Every partitioned table is owned by a person or group
      ● Gardien knows the business demands:
          ○ Rolls new partitions, knows the partition scope
          ○ Drops old partitions, has a retention policy
      ● Has a web interface, controlled by the business/engineers
      ● Provides visibility to all, actionability to owners
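At its core, a retention rule says "drop partitions older than N days". A minimal MySQL-flavoured sketch of that rule follows, assuming daily partitions named pYYYYMMDD (a hypothetical naming convention). It only prints the ALTER TABLE ... DROP PARTITION statement, leaving approval and execution to the table's owner, and it is not Gardien's implementation.

```bash
#!/bin/bash
# Hypothetical convention: daily partitions named pYYYYMMDD.
# Reads partition names on stdin and prints those strictly older than the
# cutoff date (YYYYMMDD); names that don't follow the convention (e.g. pmax) are skipped.
partitions_older_than() {
  local cutoff=$1 name
  while read -r name; do
    if [[ $name =~ ^p([0-9]{8})$ ]] && (( 10#${BASH_REMATCH[1]} < 10#$cutoff )); then
      printf '%s\n' "$name"
    fi
  done
}

# Turn the partition names on stdin into a single DROP PARTITION statement
drop_partition_sql() {
  local table=$1 list
  list=$(paste -sd, -)   # join the names with commas
  if [ -n "$list" ]; then
    printf 'ALTER TABLE %s DROP PARTITION %s;\n' "$table" "$list"
  fi
}
```

With GNU date, a 90-day policy on a hypothetical stats.daily table could be previewed as: mysql -N -e "SELECT PARTITION_NAME FROM information_schema.PARTITIONS WHERE TABLE_SCHEMA='stats' AND TABLE_NAME='daily'" | partitions_older_than "$(date -d '90 days ago' +%Y%m%d)" | drop_partition_sql stats.daily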
  • Gardien dashboard
      ● Create rules (partitions), edit, remove
      ● Visible and audited
  • It's nighttime. Beep
  • Being 'nice' to the database (work in progress)
      ● Are you happy now that you've made your engineers all-powerful?
      ● Can you sleep well at night?
      ● No, really. What haunts your dreams?
      ● Darn. It's a PagerDuty alert. Beep
      ● Apparently all the slaves are lagging.
      ● Some engineer issued too many INSERTs
  • Being 'nice' to the database (work in progress)
      ● How do you protect your database against malfunctioning/abusive services?
      ● How do you define/detect/respond to an event where your master is flooded with DMLs, and the slaves just can't keep up?
  • What's your slaves' serving capacity? Per DC? Per service?
      [Diagram: a MySQL master replicating to several slaves, some of them lagging]
  • Visibility: serving capacity
      ● We measure the current serving capacity and make this value visible
      ● Not only to Graphite/alerts, but also to any of our services
      ● Our services can be nice to the database by self-throttling access or postponing tasks
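One simple way to express serving capacity is the percentage of slaves whose replication lag is within a threshold; a service can then postpone batch work when that number drops. The sketch below is hypothetical: the function names, the threshold semantics and the "<host> <lag>" input format are assumptions, and Outbrain's detector publishes its status through ZooKeeper rather than via a shell pipe. A lag of NULL mirrors Seconds_Behind_Master in SHOW SLAVE STATUS, which is NULL when replication is not running, and counts as unavailable.

```bash
#!/bin/bash
# Hypothetical sketch: serving capacity = percentage of slaves whose lag is at
# most max_lag seconds. Input lines: "<host> <lag_seconds>"; NULL = unavailable.
serving_capacity() {
  local max_lag=$1
  awk -v max="$max_lag" '
    { total++ }
    $2 != "NULL" && $2 + 0 <= max + 0 { healthy++ }
    END { if (total == 0) print 0; else printf "%d\n", 100 * healthy / total }'
}

# Self-throttling decision: succeed (exit status 0) when a task should be
# postponed because capacity is below min_pct percent.
should_postpone() {
  local max_lag=$1 min_pct=$2
  [ "$(serving_capacity "$max_lag")" -lt "$min_pct" ]
}
```

Treating "no slaves reported" as zero capacity is a deliberately conservative choice: a service that cannot see any healthy slave backs off rather than piling on.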
  • Visibility: serving capacity, flow
      [Diagram: an availability detector service reads the status of the slaves (some of them lagging) and writes a summary status to ZooKeeper; an Outbrain service consults that status, then connects to the DB.]
  • Forcing services to be 'nice' (work in progress)
      ● A connection pool proxy
      ● The proxy consults the availability status
      ● Throttles connections based on availability
      [Diagram: an Outbrain service attempts to get a connection through the proxy; the proxy consults the availability status in ZooKeeper and approves or throttles the connection to the MySQL cluster.]
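The proxy's "approve or throttle" step can be modelled as an admission check against a pool size that shrinks with serving capacity. This is a hypothetical policy sketch, not Outbrain's proxy: proportional scaling is an assumption, and the capacity value that the real proxy would read from ZooKeeper is passed in as a plain percentage.

```bash
#!/bin/bash
# Hypothetical policy: the effective pool size shrinks in proportion to the
# serving capacity (0-100 percent).
effective_pool_size() {
  local max_pool=$1 capacity_pct=$2
  echo $(( max_pool * capacity_pct / 100 ))
}

# Approve a new connection (exit status 0) only while fewer connections are in
# use than the effective pool size allows.
approve_connection() {
  local in_use=$1 max_pool=$2 capacity_pct=$3
  [ "$in_use" -lt "$(effective_pool_size "$max_pool" "$capacity_pct")" ]
}
```

When approve_connection fails, a proxy would typically queue the attempt or return a retry-later error, so the service slows down instead of adding load to lagging slaves.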
  • Contributions
      ● We love open source and rely heavily on open source solutions.
      ● We try to contribute back in the form of patches and bug reports, and by subscribing to commercial support for open source projects.
      ● Some code we have open sourced:
          ○ Onering: https://github.com/outbrain/onering
          ○ Graphitus: https://github.com/ezbz/graphitus
          ○ Propagator: https://github.com/outbrain/propagator
          ○ audit_login: https://github.com/outbrain/audit_login
  • Thank you! Questions?