SlideShare a Scribd company logo
1 of 43
[object Object],[object Object],[object Object]
Riak is... ,[object Object],[object Object],[object Object],[object Object]
Riak Data Model ,[object Object],[object Object],[object Object]
Basic Operations ,[object Object],[object Object],[object Object]
Extras ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
When things go wrong ,[object Object]
Situation ,[object Object],[object Object],[object Object]
Solution ,[object Object]
Hostnames ,[object Object],Aston IPA Highball Gin Framboise ESB
riak-admin join ,[object Object],[object Object],[object Object]
This can’t be good...
Quick, what do you do? ,[object Object],[object Object],[object Object]
Control the situation ,[object Object],[object Object],[object Object],[object Object]
Monitor
...for signs of...
Stabilization
Now what? ,[object Object],[object Object],[object Object]
But first ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
So what happened?! ,[object Object],[object Object],[object Object],[object Object],[object Object]
Member Status First let’s peek under the hood. $ riak-admin member_status ================================= Membership ================================Status  Ring  Pending  Node-----------------------------------------------------------------------------valid  4.3%  16.8%  riak@astonvalid  18.8%  16.8%  riak@esbvalid  19.1%  16.8%  riak@framboisevalid  19.5%  16.8%  riak@ginvalid  19.1%  16.4%  riak@highballvalid  19.1%  16.4%  riak@ipa-----------------------------------------------------------------------------Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
Relief ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Relief ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Relief ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Eureka! ,[object Object],[object Object]
What’s the solution? ,[object Object],[object Object]
Hot Patch We patched their live, production system  while still under load. (on all nodes)  riak attachl(riak_kv_bitcask_backend).m(riak_kv_bitcask_backend).Module riak_kv_bitcask_backend compiled: Date: November 12 2011, Time: 04.18Compiler options:  [{outdir,"ebin"},  debug_info,warnings_as_errors,  {parse_transform,lager_transform},  {i,"include"}]Object file: /usr/lib/riak/lib/riak_kv-1.0.1/ebin/riak_kv_bitcask_backend.beamExports: api_version/0  is_empty/1callback/3  key_counts/0delete/4  key_counts/1drop/1  module_info/0fold_buckets/4  module_info/1fold_keys/4  put/5fold_objects/4  start/2get/3  status/1...
Bingo! And the new code did what we expected. {ok, R} = riak_core_ring_manager:get_my_ring().[riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].(riak@gin)19> [riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].22:48:07.423 [notice]  Unused data directories exist for partition &quot;11417981541647679048466287755595961091061972992 &quot;: &quot;/data/riak/bitcask/11417981541647679048466287755595961091061972992&quot;22:48:07.785 [notice]  Unused data directories exist for partition &quot;582317058624031631471780675535394015644160622592 &quot;: &quot;/data/riak/bitcask/582317058624031631471780675535394015644160622592&quot;22:48:07.829 [notice]  Unused data directories exist for partition &quot;782131735602866014819940711258323334737745149952 &quot;: &quot;/data/riak/bitcask/782131735602866014819940711258323334737745149952&quot;[{ok,<0.30093.11>},...
Manual Cleanup So we backed up those vnodes with unused data on Gin to another system and manually removed them. gin:/data/riak/bitcask$ ls manual_cleanup/ 11417981541647679048466287755595961091061972992  782131735602866014819940711258323334737745149952582317058624031631471780675535394015644160622592 gin:/data/riak/bitcask$ rm -rf manual_cleanup
Gin’s Status Improves
Bedtime ,[object Object],[object Object]
Next Day’s Plan ,[object Object],[object Object],[object Object],[object Object]
Let’s Get Started ,[object Object],[object Object],[object Object],[object Object],[object Object]
Gin Moves Data to IPA
Highball’s Turn Highball was next lowest now that Gin was handing data off, time to restart it too. on highball application:unset_env(riak_core, forced_ownership_handoff).application:set_env(riak_core, vnode_inactivity_timeout, 60000).application:set_env(riak_core, handoff_concurrency, 1). on gin application:set_env(riak_core, handoff_concurrency, 4). % the default setting riak_core_vnode_manager:force_handoffs().
Rebalance Starts
and keeps going...
and going...
and going...
Rebalanced
Minimal Impact ,[object Object],[object Object]
Moral of the Story ,[object Object],[object Object],[object Object]
Things break, Riak  bends . .
Thank You ,[object Object],[object Object],[object Object]

More Related Content

Similar to Riak a successful failure

Saga pattern and event sourcing with kafka
Saga pattern and event sourcing with kafkaSaga pattern and event sourcing with kafka
Saga pattern and event sourcing with kafkaRoan Brasil Monteiro
 
Cassandra Troubleshooting (for 2.0 and earlier)
Cassandra Troubleshooting (for 2.0 and earlier)Cassandra Troubleshooting (for 2.0 and earlier)
Cassandra Troubleshooting (for 2.0 and earlier)J.B. Langston
 
Oracle Active Data Guard 12c New Features
Oracle Active Data Guard 12c New FeaturesOracle Active Data Guard 12c New Features
Oracle Active Data Guard 12c New FeaturesEmre Baransel
 
Capacity Management for Web Operations
Capacity Management for Web OperationsCapacity Management for Web Operations
Capacity Management for Web OperationsJohn Allspaw
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com confluent
 
J An Gutierrez Credentials
J An Gutierrez CredentialsJ An Gutierrez Credentials
J An Gutierrez CredentialsJ. An Gutierrez
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11gfcamachob
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01Karam Abuataya
 
Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?ScyllaDB
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixDataWorks Summit
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ NetflixMichelle Ufford
 
B35 all you wanna know about rman by francisco alvarez
B35 all you wanna know about rman by francisco alvarezB35 all you wanna know about rman by francisco alvarez
B35 all you wanna know about rman by francisco alvarezInsight Technology, Inc.
 
Oracle10g New Features I
Oracle10g New Features IOracle10g New Features I
Oracle10g New Features IDenish Patel
 
Patterns in the cloud
Patterns in the cloudPatterns in the cloud
Patterns in the cloudDavid Manning
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016Dave Stokes
 

Similar to Riak a successful failure (20)

Saga pattern and event sourcing with kafka
Saga pattern and event sourcing with kafkaSaga pattern and event sourcing with kafka
Saga pattern and event sourcing with kafka
 
Cassandra Troubleshooting (for 2.0 and earlier)
Cassandra Troubleshooting (for 2.0 and earlier)Cassandra Troubleshooting (for 2.0 and earlier)
Cassandra Troubleshooting (for 2.0 and earlier)
 
Oracle Active Data Guard 12c New Features
Oracle Active Data Guard 12c New FeaturesOracle Active Data Guard 12c New Features
Oracle Active Data Guard 12c New Features
 
Capacity Management for Web Operations
Capacity Management for Web OperationsCapacity Management for Web Operations
Capacity Management for Web Operations
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
 
MySQL Monitoring 101
MySQL Monitoring 101MySQL Monitoring 101
MySQL Monitoring 101
 
Backups
BackupsBackups
Backups
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
J An Gutierrez Credentials
J An Gutierrez CredentialsJ An Gutierrez Credentials
J An Gutierrez Credentials
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
 
Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?Scylla Summit 2018: What's New in Scylla Manager?
Scylla Summit 2018: What's New in Scylla Manager?
 
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ NetflixWhoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
Whoops, The Numbers Are Wrong! Scaling Data Quality @ Netflix
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ Netflix
 
B35 all you wanna know about rman by francisco alvarez
B35 all you wanna know about rman by francisco alvarezB35 all you wanna know about rman by francisco alvarez
B35 all you wanna know about rman by francisco alvarez
 
Oracle10g New Features I
Oracle10g New Features IOracle10g New Features I
Oracle10g New Features I
 
Patterns in the cloud
Patterns in the cloudPatterns in the cloud
Patterns in the cloud
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
MySQL Utilities -- Cool Tools For You: PHP World Nov 16 2016
 

More from GiltTech

Real World Cassandra
Real World CassandraReal World Cassandra
Real World CassandraGiltTech
 
Gotszling mogo db-membase
Gotszling mogo db-membaseGotszling mogo db-membase
Gotszling mogo db-membaseGiltTech
 
Couchdb at AMEX
Couchdb at AMEXCouchdb at AMEX
Couchdb at AMEXGiltTech
 
Scala for the web Lightning Talk
Scala for the web Lightning TalkScala for the web Lightning Talk
Scala for the web Lightning TalkGiltTech
 
Clojure Lightning Talk
Clojure Lightning TalkClojure Lightning Talk
Clojure Lightning TalkGiltTech
 
CoffeeScript Lightning Talk
CoffeeScript Lightning TalkCoffeeScript Lightning Talk
CoffeeScript Lightning TalkGiltTech
 
Erlang Lightning Talk
Erlang Lightning TalkErlang Lightning Talk
Erlang Lightning TalkGiltTech
 
Groovy and Grails
Groovy and GrailsGroovy and Grails
Groovy and GrailsGiltTech
 
Java to scala
Java to scalaJava to scala
Java to scalaGiltTech
 

More from GiltTech (9)

Real World Cassandra
Real World CassandraReal World Cassandra
Real World Cassandra
 
Gotszling mogo db-membase
Gotszling mogo db-membaseGotszling mogo db-membase
Gotszling mogo db-membase
 
Couchdb at AMEX
Couchdb at AMEXCouchdb at AMEX
Couchdb at AMEX
 
Scala for the web Lightning Talk
Scala for the web Lightning TalkScala for the web Lightning Talk
Scala for the web Lightning Talk
 
Clojure Lightning Talk
Clojure Lightning TalkClojure Lightning Talk
Clojure Lightning Talk
 
CoffeeScript Lightning Talk
CoffeeScript Lightning TalkCoffeeScript Lightning Talk
CoffeeScript Lightning Talk
 
Erlang Lightning Talk
Erlang Lightning TalkErlang Lightning Talk
Erlang Lightning Talk
 
Groovy and Grails
Groovy and GrailsGroovy and Grails
Groovy and Grails
 
Java to scala
Java to scalaJava to scala
Java to scala
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Riak a successful failure

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. This can’t be good...
  • 12.
  • 13.
  • 17.
  • 18.
  • 19.
  • 20. Member Status First let’s peek under the hood. $ riak-admin member_status ================================= Membership ================================Status Ring Pending Node-----------------------------------------------------------------------------valid 4.3% 16.8% riak@astonvalid 18.8% 16.8% riak@esbvalid 19.1% 16.8% riak@framboisevalid 19.5% 16.8% riak@ginvalid 19.1% 16.4% riak@highballvalid 19.1% 16.4% riak@ipa-----------------------------------------------------------------------------Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. Hot Patch We patched their live, production system while still under load. (on all nodes) riak attachl(riak_kv_bitcask_backend).m(riak_kv_bitcask_backend).Module riak_kv_bitcask_backend compiled: Date: November 12 2011, Time: 04.18Compiler options: [{outdir,&quot;ebin&quot;}, debug_info,warnings_as_errors, {parse_transform,lager_transform}, {i,&quot;include&quot;}]Object file: /usr/lib/riak/lib/riak_kv-1.0.1/ebin/riak_kv_bitcask_backend.beamExports: api_version/0 is_empty/1callback/3 key_counts/0delete/4 key_counts/1drop/1 module_info/0fold_buckets/4 module_info/1fold_keys/4 put/5fold_objects/4 start/2get/3 status/1...
  • 27. Bingo! And the new code did what we expected. {ok, R} = riak_core_ring_manager:get_my_ring().[riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].(riak@gin)19> [riak_core_vnode_master:get_vnode_pid(Partition, riak_kv_vnode) || {Partition,_} <- riak_core_ring:all_owners(R)].22:48:07.423 [notice] Unused data directories exist for partition &quot;11417981541647679048466287755595961091061972992 &quot;: &quot;/data/riak/bitcask/11417981541647679048466287755595961091061972992&quot;22:48:07.785 [notice] Unused data directories exist for partition &quot;582317058624031631471780675535394015644160622592 &quot;: &quot;/data/riak/bitcask/582317058624031631471780675535394015644160622592&quot;22:48:07.829 [notice] Unused data directories exist for partition &quot;782131735602866014819940711258323334737745149952 &quot;: &quot;/data/riak/bitcask/782131735602866014819940711258323334737745149952&quot;[{ok,<0.30093.11>},...
  • 28. Manual Cleanup So we backed up those vnodes with unused data on Gin to another system and manually removed them. gin:/data/riak/bitcask$ ls manual_cleanup/ 11417981541647679048466287755595961091061972992 782131735602866014819940711258323334737745149952582317058624031631471780675535394015644160622592 gin:/data/riak/bitcask$ rm -rf manual_cleanup
  • 30.
  • 31.
  • 32.
  • 33. Gin Moves Data to IPA
  • 34. Highball’s Turn Highball was next lowest now that Gin was handing data off, time to restart it too. on highball application:unset_env(riak_core, forced_ownership_handoff).application:set_env(riak_core, vnode_inactivity_timeout, 60000).application:set_env(riak_core, handoff_concurrency, 1). on gin application:set_env(riak_core, handoff_concurrency, 4). % the default setting riak_core_vnode_manager:force_handoffs().
  • 40.
  • 41.
  • 42. Things break, Riak bends . .
  • 43.

Editor's Notes

  1. Thank you to Greg Burd for most of these slides. He was going to give the presentation, but did not feel well enough to be here tonight.
  2. Gin had not removed that vnode’s data directory after sending it to Aston.We had confirmation, data was not being removed after transfers finished.This would have eventually eaten all space on all nodes and halted the cluster.
  3. We already had a solution ready for 1.0.2 which would properly identify any orphaned vnodes, why not simply use that?So we tested that on our laptops, creating a close approximation of the customer’s environment.
  4. At this point it was late at night, the cluster was servicing requests as always and customers had no idea anything was wrong.We all went to bed.And didn’t reconvene for 12 hours.
  5. On Gin only, we reset things we’d changed to default values and then re-enabled handoffs.