Big Data with MySQL

I presented these slides for the first time at the Percona Live Conference 2013 in Santa Clara.

  1. Ivan Zoratti, Big Data with MySQL
     Percona Live Santa Clara 2013
     V1304.01
     Friday, 3 May 13
  2. Who is Ivan?
  3. SkySQL
     • Leading provider of open source databases, services and solutions
     • Home for the founders and the original developers of the core of MySQL
     • The creators of MariaDB, the drop-in, innovative replacement for MySQL
  4. What is Big Data?
     Image: http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg
  5. Big Data!
     Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
     Image: http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg
  6. Big Data By Structure
     Unstructured
     • Store everything you have/find
     • In any format and shape
     • You do not know how to use it, but it may come in handy
     • Storing unstructured data is usually cheaper than storing it in a more structured datastore
     • Does not fit well in a relational database
     • Examples:
       • Text: plain text, documents, web content, messages
       • Bitmap: image, audio, video
     • Typical approach:
       • Mining, pattern recognition, tagging
       • Usually batch analysis
     Structured
     • Store only what you need
     • In a good format, ready to be used
     • You should already know how to use it, or at least what it means
     • Storing structured data is quite expensive
       • Raw data, indexing, denormalisation, aggregation
     • A relational database is still the best choice
     • Examples:
       • Machine-Generated Data (MGD)
       • Tags, counters, sales
     • Typical approach:
       • BI tools, reporting
       • Real-time analysis, change data capture
  8. How “Big” is Big Data?
     • Data factors
       • Size
       • Speed to collect/generate
       • Variety
     • Resources
       • Administrators
       • Developers
       • Infrastructure
     • Growth
       • Collection
       • Processing
     • Availability
       • To whom?
       • For how long?
       • In which format?
         • Aggregated
         • Detailed
  9. How to manage Big Data
     • Collection - Storage - Archive
     • Load - Transform - Analyze
     • Access - Explore - Utilize
     Source: http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it
  10. Big Data with MySQL
      Image: http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg
  11. Technologies to Use / Consider / Watch
      • MyISAM and MyISAM compression
      • InnoDB compression
      • MySQL 5.6 Partitioning
      • MariaDB Optimizer
      • MariaDB Virtual & Dynamic Columns
      • Cassandra Storage Engine
      • Connect Storage Engine
      • Columnar Databases
        • InfiniDB
        • Infobright
      • TokuDB Storage Engine
  12. Columnar Databases
      • Automatic compression
      • Automatic column storage
      • Data distribution
      • Map/Reduce approach
      • MPP / parallel loading
      • No indexes
      • On public clouds, as HW or SW appliances
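To make "automatic column storage" and "automatic compression" concrete, here is a toy Python sketch. It is not how InfiniDB or Infobright actually store data: it pivots a few hypothetical rows into one list per column and uses run-length encoding as a stand-in for a real columnar compression scheme. Because similar values sit next to each other within a column, even this naive encoding shrinks the low-cardinality columns, and a query that only needs `amount` never reads the other columns.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column


# Hypothetical sample rows (country, year, amount), as a row store would keep them.
rows = [
    ("IT", 2012, 10), ("IT", 2012, 25), ("IT", 2013, 5),
    ("UK", 2013, 40), ("UK", 2013, 15), ("US", 2013, 30),
]
column_names = ("country", "year", "amount")

# Column store: one list per column, so equal values end up next to each other.
column_store = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

# Each column is compressed independently of the others.
encoded = {name: run_length_encode(values) for name, values in column_store.items()}


def total_amount():
    """SUM(amount): only the encoded amount column is decoded and read."""
    return sum(run_length_decode(encoded["amount"]))


if __name__ == "__main__":
    for name in column_names:
        print(f"{name}: {len(column_store[name])} values -> {len(encoded[name])} runs")
    print("SUM(amount) =", total_amount())
```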
  13. TokuDB
      • Increased performance
      • Increased compression
      • Online administration
      • No index rebuild
  14. MyISAM
      • Static, dynamic and compressed format
      • Multiple key caches, CACHE INDEX and LOAD INDEX
      • Compressed tables
      • Horizontal partitioning (manual)
      • External locking
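"Horizontal partitioning (manual)" means the application, not the server, decides which table a row belongs to. A minimal Python sketch, assuming a hypothetical one-table-per-year scheme (`sales_2012`, `sales_2013`, ...) with hypothetical `sale_date` and `amount` columns; it only builds SQL text and does not connect to a server.

```python
from datetime import date


def table_for(base: str, day: date) -> str:
    """Name of the (hypothetical) per-year table that holds rows dated `day`."""
    return f"{base}_{day.year}"


def tables_for_range(base: str, start: date, end: date) -> list:
    """All per-year tables a query over [start, end] has to touch."""
    return [f"{base}_{year}" for year in range(start.year, end.year + 1)]


def insert_sql(base: str, day: date) -> str:
    """Parameterised INSERT routed to the right table; the driver binds the values."""
    return f"INSERT INTO {table_for(base, day)} (sale_date, amount) VALUES (%s, %s)"


def range_query_sql(base: str, start: date, end: date) -> str:
    """SUM over a date range, stitched together from the per-year tables."""
    parts = [
        f"SELECT SUM(amount) AS s FROM {t} WHERE sale_date BETWEEN %s AND %s"
        for t in tables_for_range(base, start, end)
    ]
    return "SELECT SUM(s) FROM (" + " UNION ALL ".join(parts) + ") AS per_year"
```

Each `UNION ALL` branch carries its own pair of placeholders, so the caller passes `(start, end)` once per table touched.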
  15. InnoDB/XtraDB
      • Data load
        • Pre-order data
        • Split data into chunks
        • unique_checks = 0;
        • foreign_key_checks = 0;
        • sql_log_bin = 0;
        • innodb_autoinc_lock_mode = 2; (startup option, cannot be changed at runtime)
      • Compression and block size
      • Persistent optimizer stats
        • innodb_stats_persistent
        • innodb_stats_auto_recalc

        SET GLOBAL innodb_file_per_table = 1;
        SET GLOBAL innodb_file_format = Barracuda;

        CREATE TABLE t1 (
          c1 INT PRIMARY KEY,
          c2 VARCHAR(255)
        ) ROW_FORMAT = COMPRESSED
          KEY_BLOCK_SIZE = 8;

        LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
        Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
        Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0

        LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
        Query OK, 134217728 rows affected (25 min 20.75 sec)
        Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
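The data-load advice (pre-order the data, split it into chunks, relax checks for the loading session) can be scripted. The Python sketch below is one possible approach under stated assumptions: the rows fit in memory, `key_index` points at the primary-key column, no value contains a tab, newline or backslash (so the default `LOAD DATA` field and line format needs no escaping), and the chunk file names are hypothetical. It only writes files and returns the SQL to run in a single session.

```python
import os

# Session settings from the slide. innodb_autoinc_lock_mode is left out because it
# can only be set at server startup. SET sql_log_bin needs a privileged account
# (SUPER in MySQL 5.6), and rows loaded with it off are not written to the binary log.
SESSION_SETUP = [
    "SET unique_checks = 0;",
    "SET foreign_key_checks = 0;",
    "SET sql_log_bin = 0;",
]


def write_sorted_chunks(rows, key_index, chunk_size, out_dir, prefix="t1_chunk"):
    """Pre-order rows by primary key, then split them into tab-separated chunk files."""
    ordered = sorted(rows, key=lambda row: row[key_index])
    paths = []
    for n, start in enumerate(range(0, len(ordered), chunk_size)):
        path = os.path.join(out_dir, f"{prefix}_{n:04d}.tsv")
        with open(path, "w", encoding="utf-8", newline="") as fh:
            for row in ordered[start:start + chunk_size]:
                # Tab between fields, newline after each row: LOAD DATA's defaults.
                fh.write("\t".join(str(v) for v in row) + "\n")
        paths.append(path)
    return paths


def load_script(paths, table="t1"):
    """Statements for one session: settings first, then one LOAD DATA per chunk."""
    loads = [f"LOAD DATA LOCAL INFILE '{p}' INTO TABLE {table};" for p in paths]
    return SESSION_SETUP + loads
```

Loading chunks in primary-key order lets InnoDB append to the clustered index instead of splitting pages at random, and smaller chunks keep each transaction, and any rollback, short.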
  16. Partitioning (MySQL 5.6)
      • Partitioning types
        • RANGE, LIST, RANGE COLUMNS, HASH, LINEAR HASH, KEY, LINEAR KEY, sub-partitions
      • Partition and lock pruning
      • Use of INDEX DIRECTORY and DATA DIRECTORY
      • PARTITION ADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR

        CREATE TABLE t1 ( c1 INT, c2 DATE )
        PARTITION BY RANGE( YEAR( c2 ) )
        SUBPARTITION BY HASH( TO_DAYS( c2 ) )
        ( PARTITION p0 VALUES LESS THAN (1990)
          ( SUBPARTITION s0
              DATA DIRECTORY = '/disk0/data'
              INDEX DIRECTORY = '/disk0/idx',
            SUBPARTITION s1
              DATA DIRECTORY = '/disk1/data'
              INDEX DIRECTORY = '/disk1/idx' ),
        ...

        ALTER TABLE t1
          EXCHANGE PARTITION p3 WITH TABLE t2;

        -- Range and List partitions
        ALTER TABLE t1 REORGANIZE PARTITION p0, p1, p2, p3 INTO (
          PARTITION m0 VALUES LESS THAN (1980),
          PARTITION m1 VALUES LESS THAN (2000) );

        -- Hash and Key partitions
        ALTER TABLE t1 COALESCE PARTITION 10;
        ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
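The row routing and partition pruning that the partitioned `t1` relies on can be mimicked in a few lines of Python. This is a sketch, not MySQL's implementation: `p1` and `p2` (and their bounds) are hypothetical fills for the elided part of the DDL, and `to_days()` assumes MySQL's `TO_DAYS()` equals Python's `date.toordinal()` plus 365, which agrees with the documented value `TO_DAYS('1995-05-01') = 728779`. For plain (non-LINEAR) HASH subpartitioning, the subpartition is `MOD(expr, number_of_subpartitions)`.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds of the RANGE partitions on YEAR(c2).
# p0 < 1990 is from the slide; p1 and p2 are hypothetical stand-ins for the elided "...".
BOUNDS = [1990, 2000, float("inf")]  # float("inf") plays the role of MAXVALUE
PARTITIONS = ["p0", "p1", "p2"]
SUBPARTITIONS = 2  # two HASH subpartitions per partition, like s0 and s1 in p0


def to_days(d: date) -> int:
    """Approximation of MySQL's TO_DAYS(): Python counts days from year 1, MySQL from year 0."""
    return d.toordinal() + 365


def route(c2: date):
    """(partition name, subpartition index) for a row with this c2 value."""
    partition = PARTITIONS[bisect_right(BOUNDS, c2.year)]  # VALUES LESS THAN is exclusive
    subpartition = to_days(c2) % SUBPARTITIONS  # HASH(expr) -> MOD(expr, count)
    return partition, subpartition


def prune(first_year: int, last_year: int):
    """Partitions a query with YEAR(c2) BETWEEN first_year AND last_year must read."""
    lo = bisect_right(BOUNDS, first_year)
    hi = bisect_right(BOUNDS, last_year)
    return PARTITIONS[lo:hi + 1]
```

For example, `prune(1985, 1995)` keeps only `p0` and `p1`; every other partition, and its directories on other disks, is skipped entirely.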
  17. MariaDB Optimizer
      • Multi-Range Read (MRR)*
      • Index Merge / Sort intersection
      • Batched Key Access*
      • Block hash join
      • Cost-based choice of range vs. index_merge
      • ORDER BY ... LIMIT <limit>*
      • MariaDB 10
      • Subqueries
        • Semi-join*
        • Materialization*
        • Subquery cache
      • LIMIT ... ROWS EXAMINED <limit>
      (*) Available in MySQL 5.6
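Block hash join is the easiest of these strategies to picture in code. The Python sketch below is a simplified model, not MariaDB's join cache: it fills a join buffer with a block of outer rows, builds a hash table on the join key, then scans the inner table once per block. Table contents, key names and the `buffer_rows` size are hypothetical; with `buffer_rows=1` it degenerates into a plain nested-loop join that scans the inner table once per outer row.

```python
from collections import defaultdict


def block_hash_join(outer, inner, outer_key, inner_key, buffer_rows):
    """Equi-join two lists of dicts the way a block hash join does.

    Returns (joined_rows, inner_scans) so the number of inner-table passes is visible.
    """
    joined, inner_scans = [], 0
    for start in range(0, len(outer), buffer_rows):
        # Fill the join buffer and hash it on the join key.
        buffer = defaultdict(list)
        for row in outer[start:start + buffer_rows]:
            buffer[row[outer_key]].append(row)
        # One full pass over the inner table per buffer fill.
        inner_scans += 1
        for irow in inner:
            for orow in buffer.get(irow[inner_key], ()):
                joined.append({**orow, **irow})
    return joined, inner_scans


# Hypothetical data: three orders joined to their customers.
orders = [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 20}, {"order_id": 3, "cust": 10}]
customers = [{"cust_id": 10, "name": "a"}, {"cust_id": 20, "name": "b"}, {"cust_id": 30, "name": "c"}]
```

A bigger buffer means fewer passes over the inner table, which is why join buffer size matters so much on large tables.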
  18. Virtual & Dynamic Columns
      VIRTUAL COLUMNS
      • For InnoDB, MyISAM and Aria
      • PERSISTENT (stored) or VIRTUAL (generated)

        CREATE TABLE t1 (
          c1 INT NOT NULL,
          c2 VARCHAR(32),
          c3 INT AS ( c1 MOD 10 ) VIRTUAL,
          c4 VARCHAR(5) AS ( LEFT(c2, 5) ) PERSISTENT );

      DYNAMIC COLUMNS
      • Implement a schemaless document store
      • COLUMN_CREATE, COLUMN_ADD, COLUMN_GET, COLUMN_LIST, COLUMN_JSON, COLUMN_EXISTS, COLUMN_CHECK, COLUMN_DELETE
      • Nested columns are allowed
      • Main datatypes are allowed
      • Max 1GB documents

        CREATE TABLE assets (
          item_name VARCHAR(32) PRIMARY KEY,
          dynamic_cols BLOB );

        INSERT INTO assets VALUES ('MariaDB T-shirt',
          COLUMN_CREATE( 'color', 'blue', 'size', 'XL' ) );
        INSERT INTO assets VALUES ('Thinkpad Laptop',
          COLUMN_CREATE( 'color', 'black', 'price', 500 ) );
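A rough Python model of the semantics on this slide, assuming nothing about MariaDB's on-disk formats: `c4` (PERSISTENT) is computed once when the row is written, `c3` (VIRTUAL) is recomputed on every read, and the dynamic-column helpers keep name/value pairs in one blob per row. The JSON encoding is only a stand-in (MariaDB uses its own binary format), and `column_create`, `column_get` and `column_list` are hypothetical Python analogues of `COLUMN_CREATE`, `COLUMN_GET` and `COLUMN_LIST`.

```python
import json


class T1Row:
    """One row of t1 from the slide: c1 and c2 are ordinary columns."""

    def __init__(self, c1: int, c2):
        self.c1 = c1
        self.c2 = c2
        # PERSISTENT: evaluated once, when the row is written, and stored with it.
        self.c4 = None if c2 is None else c2[:5]

    @property
    def c3(self) -> int:
        # VIRTUAL: evaluated on every read. SQL MOD keeps the sign of the dividend.
        remainder = abs(self.c1) % 10
        return remainder if self.c1 >= 0 else -remainder


def column_create(*pairs) -> bytes:
    """Hypothetical analogue of COLUMN_CREATE(name1, value1, name2, value2, ...)."""
    if len(pairs) % 2:
        raise ValueError("expected name/value pairs")
    return json.dumps(dict(zip(pairs[0::2], pairs[1::2])), sort_keys=True).encode()


def column_get(blob: bytes, name: str):
    """Hypothetical analogue of COLUMN_GET: None (NULL) when the column is absent."""
    return json.loads(blob).get(name)


def column_list(blob: bytes) -> list:
    """Hypothetical analogue of COLUMN_LIST: the names stored in this row's blob."""
    return sorted(json.loads(blob))


# The two rows inserted into `assets` on the slide.
assets = {
    "MariaDB T-shirt": column_create("color", "blue", "size", "XL"),
    "Thinkpad Laptop": column_create("color", "black", "price", 500),
}
```

The two asset rows carry different column sets (`size` vs `price`) in the same table, which is the "schemaless" part: no `ALTER TABLE` is needed to add an attribute to one item.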
  19. Cassandra Storage Engine
      • Column Family == Table
      • Rowkey, static and dynamic columns allowed
      • Batched Key Access support

        SET cassandra_default_thrift_host = '192.168.0.10';

        CREATE TABLE cassandra_tbl (
          rowkey INT PRIMARY KEY,
          col1 VARCHAR(25),
          col2 BIGINT,
          dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
        ENGINE = cassandra
        KEYSPACE = 'cassandra_key_space'
        COLUMN_FAMILY = 'column_family_name';
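To illustrate how `cassandra_tbl` lines up with a column family, here is a small Python sketch. It assumes, as the slide's `dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes` suggests, that Cassandra columns not declared in the table end up in the dynamic-columns blob; a Python dict stands in for that blob and the sample data is hypothetical. `batched()` only shows the general idea behind Batched Key Access (sending keys in groups instead of one lookup per key), not the engine's actual protocol.

```python
# Columns declared in cassandra_tbl on the slide, besides rowkey and dyn_cols.
STATIC_COLUMNS = ("col1", "col2")


def to_sql_row(rowkey, cassandra_columns):
    """Map one Cassandra row (column name -> value) onto cassandra_tbl's layout."""
    row = {"rowkey": rowkey}
    for name in STATIC_COLUMNS:
        row[name] = cassandra_columns.get(name)  # missing in Cassandra -> NULL in SQL
    # Everything that is not a declared column goes into the dynamic-columns blob.
    row["dyn_cols"] = {
        name: value
        for name, value in cassandra_columns.items()
        if name not in STATIC_COLUMNS
    }
    return row


def batched(keys, batch_size):
    """Group row keys so they can be fetched in a few round trips instead of one each."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
```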
  20. Connect Storage Engine
      • Any file format as a MySQL table:
        • ODBC
        • Text, XML, *ML
        • Excel, Access, etc.
      • MariaDB CREATE TABLE options
      • Multi-file tables
      • Table autocreation
      • Condition pushdown
      • Read/write and multi-storage-engine joins
      • CREATE INDEX

        CREATE TABLE handout
        ENGINE = CONNECT
        TABLE_TYPE = XML
        FILE_NAME = 'handout.htm'
        HEADER = yes
        OPTION_LIST = 'name=TABLE,coltype=HTML,attribute=(border=1;cellpadding=5)';
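The `handout` table asks CONNECT to read an HTML table as rows and columns, with the first row supplying the column names (`HEADER = yes`). A minimal Python sketch of that mapping, using only the standard library; it assumes `handout.htm` is well-formed XHTML (ElementTree cannot parse arbitrary HTML), the sample content is hypothetical, and it is not CONNECT's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of handout.htm; must be well-formed for ElementTree.
HANDOUT = """<html><body>
<table border="1" cellpadding="5">
  <tr><th>item</th><th>qty</th></tr>
  <tr><td>T-shirt</td><td>10</td></tr>
  <tr><td>Laptop</td><td>2</td></tr>
</table>
</body></html>"""


def read_html_table(markup, header=True):
    """Return the first <table> as rows; with header=True the first row names the columns."""
    root = ET.fromstring(markup)
    # find(".//table") only searches descendants, so handle a bare <table> root too.
    table = root if root.tag == "table" else root.find(".//table")
    if table is None:
        raise ValueError("no <table> element found")
    rows = [
        [(cell.text or "").strip() for cell in tr if cell.tag in ("td", "th")]
        for tr in table.iter("tr")
    ]
    if not header:
        return rows
    names, body = rows[0], rows[1:]
    return [dict(zip(names, values)) for values in body]
```

All values come back as strings here; CONNECT would convert them to the column types declared (or inferred) for the table.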
  21. Starting Your Big Data Project
  22. Why would you use MySQL?
      • Time
      • Knowledge
      • Infrastructure
      • Costs
      • Simplified integration
      • Not so “big” data
  23. Apache Hadoop & Friends
      HDFS, MapReduce, Pig, Hive, HCatalog, HBase, ZooKeeper
      • Mahout
      • Ambari, Ganglia, Nagios
      • Sqoop
      • Cascading
      • Oozie
      • Flume
      • Protobuf, Avro, Thrift
      • Fuse-DFS
      • Chukwa
      • Cassandra
  24. MySQL & Friends
      MySQL/MariaDB/Storage Engines, SQL Optimizer, Scripts, Stored Procedures, DML, DB Schema / DDL, MySQL/MariaDB, SkySQL DS
      • Mahout
      • SDS, Ganglia, Nagios
      • mysqlimport
      • Cascading
      • Talend, Pentaho
      • Connect
  25. Join us at the Solutions Day
      • Cassandra and Connect Storage Engine
      • Map/Reduce approach - proxy optimisation
      • Multiple protocols and more
  26. Thank You!
      ivan@skysql.com
      izoratti.blogspot.com
      www.slideshare.net/izoratti
      www.skysql.com
