
Big Data with MySQL

I presented these slides for the first time at the Percona Live Conference 2013 in Santa Clara.



  1. Big Data with MySQL
      Ivan Zoratti
      Percona Live Santa Clara 2013
      V1304.01
      Friday, 3 May 13
  2. Who is Ivan?
  3. SkySQL
      • Leading provider of open source databases, services and solutions
      • Home for the founders and the original developers of the core of MySQL
      • The creators of MariaDB, the drop-in, innovative replacement of MySQL
  4. What is Big Data?
      (Image: http://marketingblogged.marketingmagazine.co.uk/files/Big-Data-3.jpg)
  5. Big Data!
      Big data is a collection of datasets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
      (Image: http://readwrite.com/files/styles/800_450sc/public/files/fields/shutterstock_bigdata.jpg)
  6. Big Data By Structure
      Unstructured
      • Store everything you have/you find
      • In any format and shape
      • You do not know how to use it, but it may come in handy
      • Storing unstructured data is usually cheaper than storing it in a more structured datastore
      • Does not fit well in a relational database
      • Examples:
        • Text: plain text, documents, web content, messages
        • Bitmap: image, audio, video
      • Typical approach:
        • Mining, pattern recognition, tagging
        • Usually batch analysis
      Structured
      • Store only what you need
      • In a good format, ready to be used
      • You should already know how to use it, or at least what it means
      • Storing structured data is quite expensive
        • Raw data, indexing, denormalisation, aggregation
      • A relational database is still the best choice
      • Examples:
        • Machine-Generated Data (MGD)
        • Tags, counters, sales
      • Typical approach:
        • BI tools, reporting
        • Real-time analysis, change data capture
  7. Big Data By Structure (repeat of slide 6)
  8. How "Big" is Big Data?
      • Data factors
        • Size
        • Speed to collect/generate
        • Variety
      • Resources
        • Administrators
        • Developers
        • Infrastructure
      • Growth
        • Collection
        • Processing
      • Availability
        • To whom?
        • For how long?
        • In which format?
          • Aggregated
          • Detailed
  9. How to manage Big Data
      • Collection - Storage - Archive
      • Load - Transform - Analyze
      • Access - Explore - Utilize
      (Source: http://www.futuresmag.com/2012/07/01/big-data-manage-it-dont-drown-in-it)
  10. Big Data with MySQL
      (Image: http://news.mydosti.com/newsphotos/tech/BigDataV1Dec22012.jpg)
  11. Technologies to Use / Consider / Watch
      • MyISAM and MyISAM compression
      • InnoDB compression
      • MySQL 5.6 Partitioning
      • MariaDB Optimizer
      • MariaDB Virtual & Dynamic Columns
      • Cassandra Storage Engine
      • Connect Storage Engine
      • Columnar Databases
        • InfiniDB
        • Infobright
      • TokuDB Storage Engine
  12. Columnar Databases
      • Automatic compression
      • Automatic column storage
      • Data distribution
      • Map/Reduce approach
      • MPP / Parallel loading
      • No indexes
      • On public clouds, HW or SW appliances
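Why does column storage compress so well? A minimal Python sketch, not how InfiniDB or Infobright actually store data: it pivots row tuples into one list per column and run-length encodes each column. Sorted or low-cardinality columns collapse into a few (value, count) runs. The helper names and the sample rows are hypothetical.

```python
from itertools import groupby
from typing import Any, List, Sequence, Tuple


def rows_to_columns(rows: Sequence[Tuple[Any, ...]]) -> List[List[Any]]:
    """Pivot row-oriented tuples into one list per column (column storage)."""
    return [list(col) for col in zip(*rows)]


def rle_encode(values: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs: Sequence[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[Any] = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical sales rows (country, year, amount), pre-sorted by country and year.
rows = [
    ("IT", 2012, 10), ("IT", 2012, 15), ("IT", 2013, 7),
    ("UK", 2012, 3), ("UK", 2012, 9), ("UK", 2013, 12),
]
columns = rows_to_columns(rows)
encoded = [rle_encode(col) for col in columns]
print(encoded[0])  # [('IT', 3), ('UK', 3)]
print(encoded[1])  # [(2012, 2), (2013, 1), (2012, 2), (2013, 1)]
```

The country column shrinks from six values to two runs, while the unsorted amount column gains nothing; this is why columnar engines care about load order and per-column encoding.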
  13. TokuDB
      • Increased performance
      • Increased compression
      • Online administration
      • No index rebuild
  14. MyISAM
      • Static, dynamic and compressed format
      • Multiple key caches, CACHE INDEX and LOAD INDEX
      • Compressed tables
      • Horizontal partitioning (manual)
      • External locking
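With MyISAM, horizontal partitioning is manual: the application decides which physical table each row goes to, and which tables a query must touch. A minimal Python sketch, assuming a hypothetical one-table-per-month naming scheme (`sales_YYYY_MM`) that is not from the slides:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, date, float]  # hypothetical layout: (id, sale_date, amount)


def table_for(sale_date: date, prefix: str = "sales") -> str:
    """Return the physical table name for a row, e.g. sales_2013_05."""
    return f"{prefix}_{sale_date.year:04d}_{sale_date.month:02d}"


def route_rows(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Group rows by the monthly table they belong to."""
    buckets: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[table_for(row[1])].append(row)
    return dict(buckets)


def tables_for_range(start: date, end: date, prefix: str = "sales") -> List[str]:
    """List the monthly tables a query on [start, end] must read (manual pruning)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


rows = [(1, date(2013, 4, 30), 9.5), (2, date(2013, 5, 1), 20.0), (3, date(2013, 5, 3), 4.0)]
print(sorted(route_rows(rows)))  # ['sales_2013_04', 'sales_2013_05']
print(tables_for_range(date(2012, 11, 15), date(2013, 2, 1)))
# ['sales_2012_11', 'sales_2012_12', 'sales_2013_01', 'sales_2013_02']
```

Native partitioning (slide 16) moves exactly this routing and pruning logic into the server.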
  15. InnoDB/XtraDB
      • Data load
        • Pre-order data
        • Split data into chunks
        • unique_checks = 0;
        • foreign_key_checks = 0;
        • sql_log_bin = 0;
        • innodb_autoinc_lock_mode = 2; (startup option, not settable at runtime)
      • Compression and block size
      • Persistent optimizer stats
        • innodb_stats_persistent
        • innodb_stats_auto_recalc

          SET GLOBAL innodb_file_per_table = 1;
          SET GLOBAL innodb_file_format = Barracuda;
          CREATE TABLE t1 (
            c1 INT PRIMARY KEY,
            c2 VARCHAR(255) )
          ROW_FORMAT = COMPRESSED
          KEY_BLOCK_SIZE = 8;

          LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t1;
          Query OK, 134217728 rows affected (1 hour 34 min 7.49 sec)
          Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0

          LOAD DATA LOCAL INFILE '/usr2/t1_01_simple' INTO TABLE t2;
          Query OK, 134217728 rows affected (25 min 20.75 sec)
          Records: 134217728  Deleted: 0  Skipped: 0  Warnings: 0
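The "pre-order data" and "split data into chunks" advice can be sketched in Python, under stated assumptions: rows are sorted by primary key so InnoDB appends to the clustered index in order, then cut into fixed-size chunks rendered in LOAD DATA's default tab-separated format. The chunk size, the chunk file names (`t1_chunk_NNN.tsv`) and the helper names are hypothetical. `innodb_autoinc_lock_mode` is left out because it can only be set at server startup.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[int, str]  # (c1, c2) as in table t1 on the slide; c1 is the primary key


def sorted_chunks(rows: Iterable[Row], chunk_size: int) -> List[List[Row]]:
    """Pre-order rows by primary key, then split them into chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ordered = sorted(rows, key=lambda r: r[0])
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def to_tsv(chunk: Sequence[Row]) -> str:
    """Render a chunk in LOAD DATA's default format: tab-separated fields, one row per line."""
    # Assumes c2 holds no tabs, newlines or backslashes, which would need escaping.
    return "".join(f"{c1}\t{c2}\n" for c1, c2 in chunk)


def load_statements(n_chunks: int, table: str = "t1") -> List[str]:
    """Session settings from the slide, one LOAD DATA per chunk file, then restore them."""
    stmts = [
        "SET unique_checks = 0;",
        "SET foreign_key_checks = 0;",
        "SET sql_log_bin = 0;",  # keeps the load out of the binary log, so replicas never see it
    ]
    for i in range(n_chunks):
        # Hypothetical chunk file names, e.g. t1_chunk_000.tsv
        stmts.append(f"LOAD DATA LOCAL INFILE '{table}_chunk_{i:03d}.tsv' INTO TABLE {table};")
    stmts += ["SET unique_checks = 1;", "SET foreign_key_checks = 1;", "SET sql_log_bin = 1;"]
    return stmts


chunks = sorted_chunks([(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")], chunk_size=2)
print([[r[0] for r in c] for c in chunks])  # [[1, 2], [3, 4], [5]]
print(repr(to_tsv(chunks[0])))  # '1\ta\n2\tb\n'
```

Smaller chunks keep each transaction, and its undo log, bounded, and a failed chunk can be retried on its own.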
  16. Partitioning (MySQL 5.6)
      • Partitioning types
        • RANGE, LIST, RANGE COLUMNS, HASH, LINEAR HASH, KEY, LINEAR KEY, sub-partitions
      • Partition and lock pruning
      • Use of INDEX DIRECTORY and DATA DIRECTORY
      • Partition maintenance: ADD, DROP, REORGANIZE, COALESCE, TRUNCATE, EXCHANGE, REBUILD, OPTIMIZE, CHECK, ANALYZE, REPAIR

          CREATE TABLE t1 ( c1 INT, c2 DATE )
          PARTITION BY RANGE( YEAR( c2 ) )
          SUBPARTITION BY HASH ( TO_DAYS( c2 ) )
          ( PARTITION p0 VALUES LESS THAN (1990) (
              SUBPARTITION s0
                DATA DIRECTORY = '/disk0/data'
                INDEX DIRECTORY = '/disk0/idx',
              SUBPARTITION s1
                DATA DIRECTORY = '/disk1/data'
                INDEX DIRECTORY = '/disk1/idx' ),
          ...

          ALTER TABLE t1
            EXCHANGE PARTITION p3 WITH TABLE t2;

          -- Range and List partitions
          ALTER TABLE t1 REORGANIZE PARTITION p0,p1,p2,p3 INTO (
            PARTITION m0 VALUES LESS THAN (1980),
            PARTITION m1 VALUES LESS THAN (2000));

          -- Hash and Key partitions
          ALTER TABLE t1 COALESCE PARTITION 10;
          ALTER TABLE t1 ADD PARTITION PARTITIONS 5;
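To show where rows land in the table above, here is a Python model of MySQL's documented routing rules. RANGE picks the first partition whose VALUES LESS THAN bound exceeds the value, HASH uses MOD(expr, num), and LINEAR HASH follows the powers-of-two algorithm described in the MySQL manual. The bounds after p0 and the two subpartitions per partition are assumptions, because the slide elides them. `to_days` assumes MySQL's TO_DAYS() equals Python's `date.toordinal() + 365` for Gregorian dates.

```python
import math
from datetime import date
from typing import Sequence, Tuple


def to_days(d: date) -> int:
    """Model of MySQL TO_DAYS(): day number counted from year 0 (Gregorian dates only)."""
    return d.toordinal() + 365


def range_partition(value: int, bounds: Sequence[float]) -> int:
    """Index of the first partition whose VALUES LESS THAN bound exceeds value."""
    for i, bound in enumerate(bounds):
        if value < bound:
            return i
    raise ValueError(f"no partition for value {value}")  # MySQL rejects such rows too


def hash_partition(value: int, num: int) -> int:
    """PARTITION BY HASH: the partition number is MOD(expr, num)."""
    return value % num  # non-negative expression values assumed


def linear_hash_partition(value: int, num: int) -> int:
    """PARTITION BY LINEAR HASH, per the powers-of-two algorithm in the MySQL manual."""
    v = 2 ** math.ceil(math.log2(num))
    n = value & (v - 1)
    while n >= num:
        v //= 2
        n = n & (v - 1)
    return n


# Hypothetical bounds: p0 < 1990 (from the slide), p1 < 2000, p2 < MAXVALUE.
BOUNDS = [1990, 2000, math.inf]
SUBPARTITIONS = 2  # s0 and s1, as shown for p0 on the slide


def locate(c2: date) -> Tuple[int, int]:
    """Return (partition, subpartition) indexes for a row with DATE column c2."""
    return range_partition(c2.year, BOUNDS), hash_partition(to_days(c2), SUBPARTITIONS)


print(to_days(date(2007, 10, 7)))  # 733321
print(linear_hash_partition(2003, 6))  # 3
print(linear_hash_partition(1998, 6))  # 2
```

LINEAR HASH spreads rows less evenly than plain HASH, but ADD, DROP and COALESCE PARTITION only have to move the rows of the partitions being split or merged.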
  17. MariaDB Optimizer
      • Multi-Range Read (MRR)*
      • Index Merge / Sort intersection
      • Batched Key Access (BKA)*
      • Block hash join
      • Cost-based choice of range vs. index_merge
      • ORDER BY ... LIMIT <limit>*
      • MariaDB 10
      • Subqueries
        • Semi-join*
        • Materialization*
        • Subquery cache
      • LIMIT ... ROWS EXAMINED <limit>
      (*) Available in MySQL 5.6
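The block hash join idea can be sketched in Python. The outer table is read in blocks (the join buffer), a hash table on the join key is built for each block, and the inner table is scanned once per block to probe it. This is a conceptual model of the technique, not MariaDB's implementation; the block size, row layout and sample data are assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Row = Tuple[Any, ...]


def block_hash_join(
    outer: Sequence[Row],
    inner: Iterable[Row],
    outer_key: Callable[[Row], Any],
    inner_key: Callable[[Row], Any],
    block_size: int,
) -> List[Tuple[Row, Row]]:
    """Equi-join outer and inner, building one hash table per block of outer rows."""
    inner_rows = list(inner)  # stands in for a table that is re-scanned once per block
    result: List[Tuple[Row, Row]] = []
    for start in range(0, len(outer), block_size):
        # Build phase: hash the current block of outer rows (the join buffer).
        table: Dict[Any, List[Row]] = defaultdict(list)
        for o in outer[start:start + block_size]:
            table[outer_key(o)].append(o)
        # Probe phase: a single pass over the inner table for this block.
        for i in inner_rows:
            for o in table.get(inner_key(i), ()):
                result.append((o, i))
    return result


customers = [(1, "Ada"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1, 99.0), (11, 3, 5.0), (12, 1, 20.0), (13, 4, 7.0)]
pairs = block_hash_join(customers, orders, lambda c: c[0], lambda o: o[1], block_size=2)
print(sorted((c[1], o[0]) for c, o in pairs))  # [('Ada', 10), ('Ada', 12), ('Cy', 11)]
```

Compared with a plain nested loop, the inner table is scanned once per block rather than once per outer row, and each probe is a hash lookup rather than a comparison against every buffered row.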
  18. Virtual & Dynamic Columns
      VIRTUAL COLUMNS
      • For InnoDB, MyISAM and Aria
      • PERSISTENT (stored) or VIRTUAL (computed on read)

          CREATE TABLE t1 (
            c1 INT NOT NULL,
            c2 VARCHAR(32),
            c3 INT AS ( c1 MOD 10 ) VIRTUAL,
            c4 VARCHAR(5) AS ( LEFT(c2,5) ) PERSISTENT );

      DYNAMIC COLUMNS
      • Implement a schemaless document store
      • COLUMN_CREATE, COLUMN_ADD, COLUMN_GET, COLUMN_LIST, COLUMN_JSON, COLUMN_EXISTS, COLUMN_CHECK, COLUMN_DELETE
      • Nested columns are allowed
      • Main datatypes are allowed
      • Max 1GB documents

          CREATE TABLE assets (
            item_name VARCHAR(32) PRIMARY KEY,
            dynamic_cols BLOB );
          INSERT INTO assets VALUES
            ('MariaDB T-shirt', COLUMN_CREATE('color', 'blue', 'size', 'XL'));
          INSERT INTO assets VALUES
            ('Thinkpad Laptop', COLUMN_CREATE('color', 'black', 'price', 500));
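A small Python model of the column kinds on this slide, meant only to illustrate their semantics. `c4` (PERSISTENT) is computed when the row is written and stored with it, while `c3` (VIRTUAL) is recomputed every time it is read. Dynamic columns are modeled as a per-row dict. MariaDB actually packs them into a binary BLOB, and the real COLUMN_GET also takes an `AS type` clause; `column_create` and `column_get` here are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def sql_mod(a: int, b: int) -> int:
    """MOD() as in MySQL/MariaDB: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


@dataclass
class T1Row:
    """Models table t1: c3 is VIRTUAL (computed on read), c4 is PERSISTENT (stored)."""
    c1: int
    c2: Optional[str]
    c4: Optional[str] = field(init=False)

    def __post_init__(self) -> None:
        # PERSISTENT: evaluated once when the row is written (rows are write-once here).
        self.c4 = None if self.c2 is None else self.c2[:5]  # LEFT(c2, 5); NULL stays NULL

    @property
    def c3(self) -> int:
        # VIRTUAL: evaluated on every read and never stored.
        return sql_mod(self.c1, 10)


def column_create(*pairs: Any) -> Dict[str, Any]:
    """Stand-in for COLUMN_CREATE(name1, value1, name2, value2, ...)."""
    if len(pairs) % 2:
        raise ValueError("COLUMN_CREATE needs name/value pairs")
    return dict(zip(pairs[0::2], pairs[1::2]))


def column_get(dyn: Dict[str, Any], name: str) -> Any:
    """Stand-in for COLUMN_GET: a dynamic column the row lacks reads as NULL (None)."""
    return dyn.get(name)


row = T1Row(c1=1234, c2="MariaDB rocks")
print(row.c3, row.c4)  # 4 Maria
assets = {
    "MariaDB T-shirt": column_create("color", "blue", "size", "XL"),
    "Thinkpad Laptop": column_create("color", "black", "price", 500),
}
print(column_get(assets["Thinkpad Laptop"], "size"))  # None
```

The trade-off is the usual one: PERSISTENT costs storage but can be indexed and is free to read, while VIRTUAL costs CPU on every read.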
  19. Cassandra Storage Engine
      • Column Family == Table
      • Rowkey, static and dynamic columns allowed
      • Batched Key Access support

          SET GLOBAL cassandra_default_thrift_host = '192.168.0.10';
          CREATE TABLE cassandra_tbl (
            rowkey INT PRIMARY KEY,
            col1 VARCHAR(25),
            col2 BIGINT,
            dyn_cols BLOB DYNAMIC_COLUMN_STORAGE = yes )
          ENGINE = cassandra
          KEYSPACE = 'cassandra_key_space'
          COLUMN_FAMILY = 'column_family_name';
  20. Connect Storage Engine
      • Any file format as a MySQL TABLE:
        • ODBC
        • Text, XML, *ML
        • Excel, Access etc.
      • MariaDB CREATE TABLE options
      • Multi-file tables
      • Table autocreation
      • Condition push down
      • Read/write and multi-storage-engine joins
      • CREATE INDEX

          CREATE TABLE handout
          ENGINE = CONNECT
          TABLE_TYPE = XML
          FILE_NAME = 'handout.htm'
          HEADER = yes
          OPTION_LIST = 'name=TABLE,coltype=HTML,attribute=(border=1;cellpadding=5)';
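To make the CONNECT example concrete, this Python sketch does by hand roughly what the `handout` table exposes to SQL. It reads the first HTML `<table>`, takes the first row as column names (as HEADER = yes implies), and returns the remaining rows as records. It assumes well-formed XHTML so the standard `xml.etree` parser can read it; the sample markup and the helper name are hypothetical, and CONNECT itself does this inside the server.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def html_table_rows(markup: str) -> List[Dict[str, str]]:
    """Read the first <table> of a well-formed (X)HTML document; its first row is the header."""
    root = ET.fromstring(markup)
    table = root if root.tag == "table" else root.find(".//table")
    if table is None:
        raise ValueError("no <table> element found")
    rows = table.findall(".//tr")
    if not rows:
        return []
    header = ["".join(cell.itertext()).strip() for cell in rows[0]]
    records = []
    for tr in rows[1:]:
        cells = ["".join(cell.itertext()).strip() for cell in tr]
        records.append(dict(zip(header, cells)))
    return records


SAMPLE = """<html><body>
<table border="1" cellpadding="5">
  <tr><th>Item</th><th>Color</th></tr>
  <tr><td>MariaDB T-shirt</td><td>blue</td></tr>
</table>
</body></html>"""

print(html_table_rows(SAMPLE))  # [{'Item': 'MariaDB T-shirt', 'Color': 'blue'}]
```

With CONNECT, the same file becomes queryable with plain SELECT statements and can be joined with tables from other storage engines.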
  21. Starting Your Big Data Project
  22. Why would you use MySQL?
      • Time
      • Knowledge
      • Infrastructure
      • Costs
      • Simplified integration
      • Not so "big" data
  23. Apache Hadoop & Friends
      HDFS, MapReduce, Pig, Hive, HCatalog, HBase, ZooKeeper
      • Mahout
      • Ambari, Ganglia, Nagios
      • Sqoop
      • Cascading
      • Oozie
      • Flume
      • Protobuf, Avro, Thrift
      • Fuse-DFS
      • Chukwa
      • Cassandra
  24. MySQL & Friends
      MySQL/MariaDB/Storage Engines, SQL Optimizer, Scripts, Stored Procedures, DML, DB Schema / DDL, MySQL/MariaDB, SkySQL DS
      • Mahout
      • SDS, Ganglia, Nagios
      • mysqlimport
      • Cascading
      • Talend, Pentaho
      • Connect
  25. Join us at the Solutions Day
      • Cassandra and Connect Storage Engine
      • Map/Reduce approach - Proxy optimisation
      • Multiple protocols and more
  26. Thank You!
      ivan@skysql.com
      izoratti.blogspot.com
      www.slideshare.net/izoratti
      www.skysql.com
