Mashing the data

A proposal for replicating relational databases to non-relational ones. We chose Google Cloud Datastore, but MongoDB would work as well.

  1. Mashing the Data: Real-Time replication from MySQL to Google Cloud Datastore
  2. Ingredients
     ● MySQL
     ● NodeJS
     ● ZongJi
     ● Google Cloud Datastore
  3. There are two types of DBAs: 1) DBAs that do backups, 2) DBAs that will do backups
  4. MySQL
     ● Most used open source DB, second place overall after Oracle (but almost equal)*
     ● Since 1995
     ● Currently at version 5.7 (5.7.16 in Oct’16)
     ● Several forks: MariaDB, Percona
     ● Several storage engines; the most used is InnoDB
     ● NDB Cluster and Master-Master Replication for HA
     * According to
  5. A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
  6. MySQL replication
     ● Master - Slave(s)
     ● Slaves can be Masters in turn (Master->Slave->Slave->...->Slave)
       ○ log_slave_updates
     ● Only data-modifying queries are logged (Create, Update, Delete; not Reads)
     ● 2 ½ types of replication
       ○ Statement Based (SBR) -> the binary log records queries (UPDATE … SET ..), which are then replayed on the slave
       ○ Row Based (RBR) -> the binary log records the values of the affected row directly, before and after the change is applied
       ○ Mixed -> the binary log records a mix of SBR and RBR (the default is SBR, but for certain statements and storage engines the log automatically switches to row-based)
  7. Q: Why do you never ask SQL people to help you move your furniture? A: They sometimes drop the table.
  8. MySQL replication (cont’d)
     ● SBR is good when changes affect lots of rows (e.g. for 1k modified rows, only a few bytes are sent across the wire)
     ● SBR has problems when there are inconsistencies between master and slave or when queries are not deterministic (e.g. UPDATE … SET … LIMIT 100)
     ● RBR maintains better consistency (as every changed row is replicated)
     ● RBR can be problematic when many rows are changed with a single statement (lots of traffic over the network)
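
The SBR/RBR trade-off above can be illustrated with a toy model. This is only a sketch, not MySQL's actual replication logic: tables are plain arrays, array order stands in for storage order (assumed here to differ between master and slave), and the helper names `updateWithLimit` and `applyRowImages` are invented for the illustration.

```javascript
// Toy model: a "table" is an array of rows; array order stands in for the
// storage order, which is assumed to differ between master and slave.
function updateWithLimit(table, limit, newStatus) {
  // Mimics UPDATE t SET status = ? LIMIT n without ORDER BY:
  // the first `limit` rows in storage order are changed.
  const changed = [];
  for (const row of table.slice(0, limit)) {
    const before = { ...row };
    row.status = newStatus;
    changed.push({ before, after: { ...row } });
  }
  return changed; // before/after row images, as a row-based log would carry
}

function applyRowImages(table, images) {
  // Row-based replay: find each row by primary key and copy the after image.
  for (const { after } of images) {
    const target = table.find((r) => r.id === after.id);
    if (target) Object.assign(target, after);
  }
}

const byId = (t) => [...t].sort((a, b) => a.id - b.id);
const same = (a, b) => JSON.stringify(byId(a)) === JSON.stringify(byId(b));

const makeMaster = () => [1, 2, 3].map((id) => ({ id, status: "new" }));
const makeSlave = () => [3, 2, 1].map((id) => ({ id, status: "new" }));

// Statement-based: the same statement is re-executed on the slave.
const master = makeMaster();
const sbrSlave = makeSlave();
const images = updateWithLimit(master, 1, "done");
updateWithLimit(sbrSlave, 1, "done");
const sbrDiverged = !same(master, sbrSlave);

// Row-based: the slave applies the row images recorded on the master.
const rbrSlave = makeSlave();
applyRowImages(rbrSlave, images);
const rbrConsistent = same(master, rbrSlave);

console.log({ sbrDiverged, rbrConsistent }); // { sbrDiverged: true, rbrConsistent: true }
```

The statement replay touches row 3 on the slave while the master changed row 1, whereas the row images pin the change to the same primary key on both sides. The price is one logged image per changed row.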
  9. Google Cloud Datastore
  10. What is GCD
      ● NoSQL document database
      ● Automatic scaling
      ● High performance
      ● Flexible storage
  11. GCD (cont’d)
      ● Balance of strong and eventual consistency
        ○ Entity lookups by key and ancestor queries always receive strongly consistent data
        ○ Other queries are eventually consistent
      ● Encryption at rest
        ○ Encrypts all data before it is written to disk
      ● Querying of data through GQL
        ○ Similar to “classic” SQL; e.g. SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200 or SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
      ● By default all properties are indexed; composite indexes are supported (a bit more work to enable them though)
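
To make the two GQL examples above concrete, here is a rough in-memory equivalent in plain JavaScript. It only mirrors what the queries select; it does not talk to Datastore, and the sample entities of kind `myKind` are invented for the illustration.

```javascript
// Hypothetical entities of kind "myKind"; ids and values are made up.
const entities = [
  { id: 1, myProp: 50 },
  { id: 2, myProp: 100 },
  { id: 3, myProp: 150 },
  { id: 4, myProp: 200 },
  { id: 5, myProp: 250 },
];

// SELECT * FROM myKind WHERE myProp >= 100 AND myProp < 200
const inRange = entities.filter((e) => e.myProp >= 100 && e.myProp < 200);

// SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
const topByProp = [...entities]
  .sort((a, b) => b.myProp - a.myProp)
  .slice(0, 100);

console.log(inRange.map((e) => e.id)); // [ 2, 3 ]
console.log(topByProp.map((e) => e.id)); // [ 5, 4, 3, 2, 1 ]
```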
  12. Our Setup
  13. Setup (diagram). Components: MySQL Master, MySQL Slave, NodeJS App, Google Cloud Datastore. Link labels: SBR, RBR, Google Cloud Node modules
  14. Details about NodeJS App
      ● Uses ZongJi (a MySQL binlog listener)

          var ZongJi = require('zongji');
          var zongji = new ZongJi(config.database);

          zongji.on('binlog', function (evt) { doSomething('binlog', evt) })
          zongji.on('query', function (evt) { doSomething('query', evt) })
          zongji.on('writerows', function (evt) { doSomething('insert', evt) })
          zongji.on('updaterows', function (evt) { doSomething('update', evt) })
          zongji.on('deleterows', function (evt) { doSomething('delete', evt) })
  15. NodeJS (cont’d)

          zongji.start({
            startAtEnd: true,
            includeSchema: { "yourDBhere": true, "yourOtherDBHere": true }, //config.monitor,
            includeEvents: ['tablemap', 'writerows', 'updaterows', 'deleterows', 'query', 'rotate']
          });

          var doSomething = function (type, event) {
            //event has a rows attribute containing every modified row
            //it also has a tableMap containing table metadata (most important - table name)
          }
  16. NodeJS (last one, I promise)

          var sendToDataStore = function (namespace, idfldname, row) {
            // Build the entity key from the kind name and the row's id column
            var k = datastore.key([namespace, row[idfldname]]);
            datastore.save({ key: k, data: row }, function (err, res) {
              if (err) console.log("ERROR", err)
              else console.log("OK", JSON.stringify(res))
            });
          }
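
The three NodeJS slides leave the body of `doSomething` open. One possible way to fill it in is sketched below, under several assumptions: row events are assumed to expose `tableId`, `tableMap[tableId].tableName` and `rows`, with update rows shaped as `{ before, after }`; the `idFields` config and the `handleRowEvent` name are hypothetical; and `makeFakeDatastore` is a stand-in that records calls so the sketch runs without a Google Cloud project. It is not the real Datastore client.

```javascript
// Stand-in for the Datastore client: records calls instead of doing network I/O.
function makeFakeDatastore() {
  const calls = [];
  return {
    calls,
    key: (path) => ({ path }),
    save: (entity, cb) => { calls.push({ op: "save", entity }); cb(null, {}); },
    delete: (key, cb) => { calls.push({ op: "delete", key }); cb(null, {}); },
  };
}

// Hypothetical per-table config: which column holds the primary key.
const idFields = { users: "id" };

function handleRowEvent(datastore, type, evt) {
  // Assumed event shape: table metadata is looked up through tableId.
  const tableName = evt.tableMap[evt.tableId].tableName;
  const idField = idFields[tableName];
  if (!idField) return; // table is not configured for replication

  const done = (err) => { if (err) console.log("ERROR", err); };
  for (const r of evt.rows) {
    // Update rows are assumed to carry { before, after }; keep the new image.
    const row = type === "update" ? r.after : r;
    const key = datastore.key([tableName, row[idField]]);
    if (type === "delete") datastore.delete(key, done);
    else datastore.save({ key, data: row }, done); // save acts as an upsert here
  }
}

// Usage with hand-built events (values invented for the example).
const fake = makeFakeDatastore();
const tableMap = { 7: { tableName: "users" } };
handleRowEvent(fake, "insert", { tableId: 7, tableMap, rows: [{ id: 1, name: "Ann" }] });
handleRowEvent(fake, "update", {
  tableId: 7,
  tableMap,
  rows: [{ before: { id: 1, name: "Ann" }, after: { id: 1, name: "Anna" } }],
});
handleRowEvent(fake, "delete", { tableId: 7, tableMap, rows: [{ id: 1, name: "Anna" }] });

console.log(fake.calls.map((c) => c.op)); // [ 'save', 'save', 'delete' ]
```

Keying each entity on the table name plus primary key means a replayed insert or update simply overwrites the same entity, so reprocessing a binlog segment does not create duplicates.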
  17. Demo Time
  18. In case the demo does not work
  19. Thank you!