Big Data For OLTP: Handling Massive Writes. Liran Zelkha, Tona Consulting, Liran.zelkha@gmail.com
Intro• Israel’s Big Data Meetup• By Developers, For Developers• We talk code, architecture, solutions  – No products  – No...
About Me• Liran Zelkha, from Tona Consulting• Formerly co-founder at ScaleBase  – NewSQL solution for scaling RDBMS• Works...
Agenda• Terminology• Massive Reads vs. Massive Writes• Solutions  – RDBMS  – NoSQL  – Code
TERMINOLOGY
Terminology• OLTP  – a class of systems that facilitate and manage    transaction-oriented applications, typically for dat...
Terminology• OLAP  – is an approach to swiftly answer multi-dimensional    analytical (MDA) queries. OLAP is part of the br...
MASSIVE READS VS. MASSIVE WRITES
Reads vs. WritesReads                        Writes• Caching helps              • Caching sucks• No need for availability ...
Massive Reads Solutions• Memory, Memory, Memory  – For Caching, Caching, Caching• Column stores (?)
Caching• Tons of solutions.• To name a few:  – java.util.Map  – Hazelcast  – Infinispan  – Coherence
When Caching Consider•   Time To Live•   Memory Leaks•   Updates (?)•   What is cached and where
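To make the time-to-live and update concerns concrete, here is a minimal sketch of a TTL cache built on a plain java.util.Map. The class name TtlCache and its methods are hypothetical, not part of any library named in this deck, and the clock is injected only so that expiry can be exercised without sleeping. Expired entries are evicted only when they are read, so a real cache would still need a size bound or a background sweep to avoid the memory leak the slide warns about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical TTL cache sketch: entries expire ttlMillis after they were written. */
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // assumption: injected clock, e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null if the key is absent or its entry has expired. */
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAt) {
            map.remove(key, e); // evict on access; entries never read again stay until swept
            return null;
        }
        return e.value;
    }

    /** Call after writing to the backing store so readers do not see the old value. */
    void invalidate(K key) {
        map.remove(key);
    }

    int size() {
        return map.size();
    }
}
```

In production code the clock would typically be System::currentTimeMillis, and invalidate would be called from the same code path that performs the database write.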
Column Store Databases• Store data in columns vs. rows• Faster reads, slower writes• Compression helps store more data in ...
Massive Write Solutions• Memory?  – Only with fast networks• Fast disks
Memory• Memory can fail, machines can fail• Distributed memory• Size of memory• See Nati Shalom’s points at  http://natish...
Memory Is The New Disk
Fast Disks• When massive writes translates to massive  disk writes  – Fast disks are a must• Can offer  – HA  – Scalabilit...
2 Words On Storage Technologies•   RAID•   SSD•   SAN•   NAS
Example• NMS System• Each device interaction is built from 5-10  request/response pair• Each request causes up to 3 databa...
JDBC Profiling – Standard Disks
JDBC Profiling – Fast SAN• Spec  – HP P2000 with 16 drives configured as:     • 14 300G 10K SAS Drives in RAID 10 (Data)  ...
JDBC Profiling – Fast SAN
RDBMS
MySQL Scaling Options•   Tuning•   Hardware upscale•   Partitioning•   New MySQL distribution•   Read/Write Split
Database Tuning• There are many ways to tune your database• A lot of data online, check out this post  – http://forge.mysq...
Database Tuning – Some Examples• innodb_buffer_pool_size   –   Holds the data and indexes of tables in memory.   –   Bigge...
Database Tuning – Pros and Cons  Pros                          Cons  May result in major           Doesn’t scale. No matte...
SQL Tuning• If you write lousy SQL code, you’ll get lousy  performance  – Java gurus are not SQL gurus  – Your ORM code do...
SQL Tuning – Some Examples• Here are just some examples:  – Use EXPLAIN to profile the query execution plan  – Use DISTINC...
SQL Tuning – Pros and Cons  Pros                       Cons  May result in major        Requires code modifications.  perf...
Scaling Up Hardware  • Usually DB gets the strongest servers  • However – there is a limit to how much performance    gain...
Scaling Up Hardware – Pros and ConsPros                                           ConsMay result in major performance impr...
SSD• Solid State Drive   – Better latency and access time than regular HDD   – Cost more per GB (but prices are dropping)•...
SSD – Pros and ConsPros                                           ConsMay result in major performance improvements   Expen...
Partitioning• Partitioning was introduced to MySQL at  version 5.1.• It is a way to split tables across multiple files, a ...
Partitioning Performance • See excellent presentation by Giuseppe Maxia   from 2010      – http://www.slideshare.net/datac...
PartitioningPros                                    ConsMay result in major performance         MySQL server itself introd...
New MySQL Distributions• There are many MySQL drop-in replacements• Are MySQL, but tuned differently, different  extension...
New MySQL Distributions – Pros and             Cons    Pros                  Cons    Provide performance   Still limited s...
Other Storage Engines• InnoDB better than MyISAM  – Oh really?  – As always, it depends.  – InnoDB will cause less corrupt...
Read/Write Splitting• Write to MySQL master, read from 1 (or more)  slaves• Excellent read scaling• Many issues:  – Since ...
Read/Write Splitting – Pros and ConsPros                     ConsProvides performance     Requires code changesimprovement...
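The routing decision behind read/write splitting can be sketched in a few lines. This is an illustration under stated assumptions, not a real driver or proxy: ReadWriteRouter is a hypothetical class, servers are represented by plain name strings, and any open transaction is pinned to the master (a simplification of the stickiness issue, which strictly only requires this once the transaction has written).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: writes and in-transaction statements go to the master, other reads to replicas. */
class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = new ArrayList<>(replicas);
    }

    /**
     * @param isWrite       true for INSERT/UPDATE/DELETE statements
     * @param inTransaction true if the caller has an open transaction (stickiness)
     * @return the name of the server that should execute the statement
     */
    String route(boolean isWrite, boolean inTransaction) {
        if (isWrite || inTransaction || replicas.isEmpty()) {
            return master;
        }
        // Round-robin over replicas; floorMod keeps the index valid even after int overflow.
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }
}
```

Note that every write still lands on the single master, which is why this technique scales reads but not writes.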
Sharding (diagram: App connected to DB1 and DB2)
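One common way for the application to pick a shard is to hash a shard key and take it modulo the number of databases. The sketch below assumes that scheme; the deck itself does not say how rows are assigned to DB1 and DB2, and ShardRouter is a hypothetical name. CRC32 from the JDK is used only because it is stable across JVMs, unlike String.hashCode in spirit if not in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: maps a shard key to one of a fixed set of databases by hashing. */
class ShardRouter {
    private final String[] shards;

    ShardRouter(String... shards) {
        if (shards.length == 0) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = shards.clone();
    }

    /** The same key always maps to the same shard as long as the shard list is unchanged. */
    String shardFor(String shardKey) {
        CRC32 crc = new CRC32();
        crc.update(shardKey.getBytes(StandardCharsets.UTF_8));
        int i = (int) (crc.getValue() % shards.length); // getValue() is non-negative
        return shards[i];
    }
}
```

With plain modulo hashing, adding a shard remaps most keys, so changing the shard count implies a data migration. Queries that span shard keys must be sent to every shard and merged in the application.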
MySQL + NoSQL - HandlerSocket• Fascinating post -  http://yoshinorimatsunobu.blogspot.com/2010/1  0/using-mysql-as-nosql-s...
HandlerSocket - Architecture
Code Sample#!/usr/bin/perluse strict;use warnings;use Net::HandlerSocket;#1. establishing a connectionmy $args = { host =>...
Code Sample – Cont’#3. main logic #fetching rows by id #execute_single (index id, cond, cond value, max rows, offset)$res ...
Bashing Some NewSQL Solutions• Xeround   – Limited Database size   – Only on the cloud• VoltDB   – Rewrite your entire app...
NOSQL
NoSQL Is Here To Stay
NoSQL• A term used to designate databases which  differ from classic relational databases in  some way. These data stores ...
NoSQL Types• Key/Value   – A big hash table   – Examples: Voldemort, Amazon Dynamo• Big Table   – Big table, column famili...
NO-SQL  http://browsertoolkit.com/fault-tolerance.png
MongoDB• I use the slides of Roger Bodamer from 10gen• Find them here:  – http://assets.en.oreilly.com/1/event/61/Building...
MongoDB• Document Oriented Database   – Data is stored in documents, not tables / relations• MongoDB is Implemented in C++...
Design• Want to build an app where users can check in  to a location• Leave notes or comments about that location• Iterati...
Requirements• Locations  – Need to store locations (Offices, Restaurants etc)     • Want to be able to store name, address...
Requirements• Locations  – Need to store locations (Offices, Restaurants etc)     • Want to be able to store name, address...
TerminologyRDBMS                 MongoTable, View           CollectionRow(s)                JSON DocumentIndex            ...
Collectionsloc1, loc2, loc3                 User1, User2      Location                                      Users          s
JSON Sample Doc { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),   author : "roger",   date : "Sat Jul 24 2010 19:47:11 GMT-0...
BSON• JSON has powerful, but limited set of datatypes  – Mongo extends datatypes with Date, Int types, Id, …• MongoDB stores...
Locations v1location1= {      name: "10gen East Coast”,      address: ”134 5th Avenue 3rd Floor”,      city: "New York”,  ...
Places v1location1= {        name: "10gen East Coast”,        address: ”134 5th Avenue 3rd Floor”,        city: "New York”...
Places v2location1 = {      name: "10gen East Coast”,      address: "17 West 18th Street 8th Floor”,      city: "New York”...
Places v2location1 = {        name: "10gen East Coast”,        address: "17 West 18th Street 8th Floor”,        city: "New...
Places v3location1 = {      name: "10gen East Coast”,      address: "17 West 18th Street 8th Floor”,      city: "New York”...
Places v3location1 = {      name: "10gen East Coast”,      address: "17 West 18th Street 8th Floor”,      city: "New York”...
Places v3location1 = {      name: "10gen HQ”,      address: "17 West 18th Street 8th Floor”,      city: "New York”,      z...
Places v4location1 = {        name: "10gen HQ”,        address: "17 West 18th Street 8th Floor”,        city: "New York”, ...
Querying your PlacesCreating your indexesdb.locations.ensureIndex({tags:1})db.locations.ensureIndex({name:1})db.locations....
Inserting and updating locationsInitial data load:db.locations.insert(place1)Using update to Add tips:db.locations.update(...
Requirements• Locations  – Need to store locations (Offices, Restaurants etc)     • Want to be able to store name, address...
Usersuser1 = {      name: “nosh”      email: “nosh@10gen.com”,      .      .      .      checkins: [{ location: “10gen HQ”...
Simple Stats db.users.find({'checkins.location': "10gen HQ"}) db.checkins.find({'checkins.location': "10gen HQ"})            ...
Alternativeuser1 = {      name: “nosh”      email: “nosh@10gen.com”,      .      .      .      checkins: [4b97e62bf1d8c715...
User Check inCheck-in = 2 ops        read location to obtain location id        Update ($push) location id to user objectQ...
Unsharded Deployment                •Configure as a replica set forPrimary                automated failover              ...
Sharded Deployment                             MongoS                                                                confi...
Cassandra• Slides used from eben hewitt• See original slides here:  – http://assets.en.oreilly.com/1/event/51/Scaling%2   ...
cassandra properties•   tuneably consistent•   very fast writes•   highly available•   fault tolerant•   linear, elastic s...
write op
Staged Event-Driven Architecture• A general-purpose framework for high  concurrency & load conditioning• Decomposes applic...
instrumentation
data replication• configurable replication factor• replica placement strategy   rack unaware  Simple Strategy  rack aware...
partitioner smack-downRandom Preserving                Order Preserving• system will use MD5(key) to    • key distribution...
agenda•   context•   features•   data model•   api
structurekeyspace               column family settings    (eg,partitioner)   settings (eg,                                ...
keyspace• ~= database• typically one per application• some settings are configurable only per  keyspace
column family• group records of similar kind• not same kind, because CFs are sparse tables• ex:  – User  – Address  – Twee...
think of cassandra as row-oriented• each row is uniquely identifiable by key• rows group columns and super columns
column familykey                 nickname=      user=eben         The123                  Situationkey                    ...
json-like notationUser {  123 : { email: alison@foo.com,          icon:          },    456 : { email: eben@bar.com,       ...
example
$ cassandra -f
$ bin/cassandra-cli
cassandra> connect localhost/9160
cassandra> set Keyspace1.Standard1['eben']['age']=...
a column has 3 parts1. name  –   byte[]  –   determines sort order  –   used in queries  –   indexed2. value  –   byte[]  ...
column comparators•   byte•   utf8•   long•   timeuuid•   lexicaluuid•   <pluggable>    – ex: lat/long
super columnsuper columns group columns under a common name
super column family             <<SCF>>PointOfInterest           <<SC>>Central          <<SC>>               Park         ...
super column family                     super column familyPointOfInterest {  key: 85255 {                                ...
about super column families• sub-column names in a SCF are not indexed  – top level columns (SCF Name) are always indexed•...
slice predicate• data structure describing columns to return  – SliceRange     •   start column name     •   finish column...
• get() : Column    – get the Col or SC at given ColPath                                                                re...
client.insert(userKeyBytes, parent,      write       api    new Column(“band".getBytes(UTF8),    “Funkadelic".getBytes(), ...
//create param                                                            batch_mutateMap<byte[], Map<String, List<Mutatio...
what about…SELECT WHERE    ORDER BY      JOIN ON       GROUP
rdbms: domain-based model             what answers do I have?cassandra: query-based model             what questions do I ...
SELECT WHERE               cassandra is an index factory<<cf>>USERKey: UserIDCols: username, email, birth date, city, stat...
SELECT WHERE pt 2• Use an aggregate key  state:city: { user1, user2}• Get rows between AZ: & AZ;  for all Arizona users• G...
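The aggregate-key trick works because ';' is the character immediately after ':' in ASCII, so every key that starts with "AZ:" sorts between "AZ:" and "AZ;". The sketch below demonstrates this with a java.util.TreeMap standing in for rows kept in key order. It is not Cassandra client code, and AggregateKeyIndex is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: a sorted map stands in for rows stored in key order. */
class AggregateKeyIndex {
    private final TreeMap<String, List<String>> rows = new TreeMap<>();

    /** Stores the user under the aggregate row key "state:city". */
    void add(String state, String city, String user) {
        rows.computeIfAbsent(state + ":" + city, k -> new ArrayList<>()).add(user);
    }

    /** All users in a state: a range scan over keys in ["ST:", "ST;"). */
    List<String> usersInState(String state) {
        List<String> out = new ArrayList<>();
        for (List<String> users : rows.subMap(state + ":", state + ";").values()) {
            out.addAll(users);
        }
        return out;
    }
}
```

The range scan only makes sense when rows are stored in key order, which in Cassandra means an order-preserving partitioner rather than the random one.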
ORDER BYColumns                   Rowsare sorted according to   are placed according to their Partitioner:CompareWith orCo...
rdbms
cassandra
When To Use NoSQL• No schema – No SQL   – Don’t do this KV DB design      • Terrible performance, impossible to maintain• ...
Real Life NoSQL Usages• MongoDB is great for CMS  – Try MTV…• Cassandra is great for low latency writes• Use the right too...
SpringData• In your face – there are frameworks that use  NoSQL
And To Finish• A long time ago people told me Java doesn’t  perform nor scale…
CODE
ORM• Check the queries created  – For instance, in Hibernate     • setLast,setFirst – kill your performance     • Lazy fet...
Hibernate With Batch Updatessession.doWork(new Work() {  public void execute(Connection connection) throws SQLException { ...
JDBC Code• Always use prepared statements• Use your Database performance extensions  – For instance, massive inserts for C...
Transactions• One transaction per hit. At most.• Reads can hurt your writes
Database Tuning• We talked about it, but:  – Indexes – good for read, bad for write  – Multi column indexes – good only if...
Final Note• NoSQL, SQL – it doesn’t matter if your code  sucks!
2 Words On Cloud• Storage sucks• Network sucks• So  – Cache locally  – Write a-sync  – Write to files, not DB  – Don’t use...
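"Write a-sync" can be sketched as a bounded write-behind buffer: request threads enqueue and return immediately, and a separate worker drains the queue in batches to a file or database. Everything here is an assumption for illustration: WriteBehindBuffer is a hypothetical class, and the sink is a plain Consumer so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: callers enqueue cheaply, a worker flushes batches to a slow sink. */
class WriteBehindBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;
    private final Consumer<List<T>> sink; // assumption: wraps a file append or a JDBC batch insert

    WriteBehindBuffer(int capacity, int batchSize, Consumer<List<T>> sink) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Non-blocking enqueue; returns false when the buffer is full so the caller can back off. */
    boolean submit(T item) {
        return queue.offer(item);
    }

    /** Drains up to batchSize items, passes them to the sink in one call, and returns the count. */
    int flushOnce() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            sink.accept(batch);
        }
        return batch.size();
    }
}
```

In a real deployment a background thread would call flushOnce in a loop. Items still in the queue are lost if the process dies, which is the same trade-off the deck raises for memory-based solutions.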

Handling Massive Writes

Published in: Technology
  • There are many ways to scale MySQL, but we’ll try to follow some guidelines. Plain vanilla MySQL: we’re not going to go into the different storage engine stories. They require a lengthy data migration process, and as such are outside the scope of this webinar. MySQL only: we’re not comparing MySQL to PostgreSQL or any other database, open source or not. After laying down these guidelines we’re left with the following options. Tuning, be it SQL tuning or MySQL tuning. Scaling up the hardware used to run MySQL: more memory, CPU or even SSD. Partitioning, a cool scaling feature introduced in MySQL 5.1. Another distribution of MySQL, which uses the same storage engines but has some improvements in the MySQL core that can improve performance. Read/write splitting, a technique to send reads and writes to different servers in order to decrease the load on a specific server. Sharding, the most popular technique to scale out MySQL; however, this technique comes at a price, which we’ll discuss. And I promise: no sales pitch. I will only mention the ScaleBase solution at the very end of the webinar, and I promise to be quick about it.
  • The first step of any serious tuning process is tuning the MySQL database. This can be divided into two types of tuning. Operating system tuning: setting socket timeouts, freeing up memory, using dedicated cores, etc. This is specific to each operating system you run MySQL on, and most likely you can Google some good data on how to tune your operating system for MySQL. MySQL process tuning: this usually does not depend on your operating system, but rather on your hardware and application usage. So let’s dive in and see what kind of options MySQL exposes for tuning.
  • Now, for the sake of this webinar, we’ll dive into 2 important and very popular parameters. Both can be set in the my.cnf file. The first, innodb_buffer_pool_size, holds the data and indexes of tables in memory. The database uses indexes to find data more quickly. Usually, you add indexes to columns used in the WHERE clause of your SQL statement. If the index is in RAM, the search will perform much faster. If the data is in RAM, the fetching of data will be faster. The bigger this value is the better, but of course it can’t exceed system memory, and you must take into account other applications running on the database server. Note that this parameter applies only to InnoDB, which is the default storage engine nowadays anyway. The second, the query cache, is a cache of query results, stored until an update invalidates the query result. The bigger query_cache_size is, the more query results it can store, which makes big queries on data that is not updated often perform much faster. MySQL only caches results up to a maximum size, which is what query_cache_limit controls.
  • The Pros of database tuning are pretty obvious – it will probably result in major performance improvement, and is transparent to the application.The downside is that once completing the initial tuning process, additional tuning will deliver very little if any performance improvement.
  • The next must-have scaling strategy is tuning your SQL commands. It sounds pretty minor – how much damage can an SQL command do? You’ll be surprised. A SELECT * FROM a table, with no filter that can use an index, actually performs a FULL TABLE SCAN, meaning that the database has to run through the entire table without the use of indexes. This is almost never a desired behavior, as it results in lousy performance. Tuning such a query is simple: name only the columns you actually need and filter on indexed columns, so that MySQL can use an index instead of reading every row. You’ll see substantial performance improvements, and the bigger the table, the bigger the improvement. This might seem negligible, but if the database is under heavy load, long running queries will take even longer to execute. The reason is transaction isolation, a topic we’ll not discuss in this webinar, but you can find more info about it in the ScaleBase blog section. So always make sure you fine tune your SQL commands. Note that this task should be done by experts, as it’s not a skill developers usually have.
  • Here are some very general tips on SQL command tuning. MySQL offers an EXPLAIN command, which shows how MySQL executes a query: which tables are accessed first, which indexes are used, etc. Always use EXPLAIN when tuning your SQL commands. DISTINCT in MySQL is faster than GROUP BY. The reason is that while MySQL executes both commands the same way (which means a temporary table is created for the results), MySQL sorts the results when performing a GROUP BY, thus taking an additional step when executing the query. If you use an indexed column as a function parameter in a WHERE clause, the index will not be used, and a full table scan might be performed. Using non-deterministic functions in the WHERE clause, like CURRENT_DATE(), will keep the query results out of the query cache, which is a major performance hit.
  • So the pros of SQL tuning are clear – it will probably result in major performance improvements. The downside is that it requires code modifications, can take a long time, and at the end of the day is still limited, since under heavy load even the best queries will perform poorly. A point to remember when talking about code modifications: when using ORM tools, changing queries can be really difficult, as not all ORM tools give the option of tuning the SQL query.
  • Scaling up the hardware is the oldest trick in the book. Buy a new machine, and Moore’s law will make sure it’s much stronger than the one you already have. If that was the case – none of us would have been here. Scaling up MySQL is limited. And those graphs, built by Baron Schwartz of Percona fame show just that. Performance improves as hardware becomes more powerful – but at some point, performance starts to degrade. MySQL has a sweet spot – meaning an optimal hardware configuration. Anything more powerful than this configuration will not improve performance, but in fact might even degrade it.
  • I think the pros and cons of scaling up are pretty clear – the easiest way to go – but it’s limited and might be very expensive, for high end hardware.
  • SSD is another form of scaling up. SSD stands for Solid State Drive, and SSDs are basically the new generation of hard disks – they work much faster than regular hard drives. Without going into the technicalities involved, SSDs use microchips to store data, so unlike hard drives, no moving parts are involved, and access time is much faster. However, they are more expensive. And while improving performance, SSD doesn’t scale indefinitely. As Vadim Tkachenko of Percona showed in his MySQL Conference presentation, you can expect up to a sevenfold performance improvement by moving to SSD, but that’s that.
  • The pros and cons for the SSD solution are similar to those of scaling up hardware.
  • Partitioning is a great feature introduced to MySQL 5.1. To understand it, we need to understand how MySQL stores data in tables. Each table is mapped to a file on the file system. Partitioning lets you store the table in multiple files. The result is improved performance – queries and inserts run much faster, since indexes are smaller and can fit in RAM.
  • The table in this slide can show just how well database partitioning can perform. Those results are based on specific use cases, and incorrect partitioning configuration can hurt performance – but if you know what you’re doing, partitioning is a great solution.
  • While partitioning is not completely transparent to the application (as some SQL limitations apply), it’s a great solution for performance improvements. However, it’s not perfect. The reason is that although it improves I/O, CPU-bound work such as user concurrency or transaction isolation is still a bottleneck that hurts performance under heavy load.
  • Many companies rose up to release new MySQL distributions. We mention them here since they usually show major performance improvements, while still being a 100% drop-in replacement for MySQL, and can use the same storage engine MySQL uses. Percona Server and MariaDB are the leading examples, but you can find other distributions as well.
  • And again – while performance improvements might be gained with new MySQL distributions – scalability is still limited at the machine level.
  • We now reach the first solution that can truly scale out MySQL. With read/write splitting, the application directs all writes to a single master server. That server replicates all its data to any number of slaves. Read operations from the application can be executed against any one of the slaves. This solution requires code modifications in the application, and it also raises data consistency issues: since replication in MySQL is asynchronous, it is possible that the application will read data from a slave that is not up to date. This is an acceptable situation for many applications, since the data is not lost and will appear in a later query. However, for some applications this is unacceptable. Another point to remember is that if the application started a transaction, database stickiness must be implemented. If the transaction executes a write operation, then even read operations in that transaction must be executed on the master server, so the code implementation is quite complex.
  • Read/write splitting greatly improves read performance, and its read scaling is practically unlimited, since it allows you to scale out the database by adding slaves. However, it requires code changes, and might create consistency problems in the application. But the biggest con of read/write splitting is that write operations are still executed on one server. Since many applications use caching layers, many databases see a lot of write traffic, and read/write splitting doesn’t help these applications one bit.
  • Memory mapped files, BSON, indexes, multiple data types, binary files, etc. Main datasets: places and checkins. Use cases: given the current location, find places nearby; add notes to locations; record checkins; generate stats about checkins.
  • Documents go into collections. Today’s app: users, places, checkins.
  • Latlong are actually real lat / long points. $near gives you the closest 100.

    1. 1. Big Data For OLTP: Handling Massive Writes. Liran Zelkha, Tona Consulting, Liran.zelkha@gmail.com
    2. 2. Intro• Israel’s Big Data Meetup• By Developers, For Developers• We talk code, architecture, solutions – No products – No sales pitches• Suggest ideas for next meetups• Suggest yourselves as speakers for next meetups
    3. 3. About Me• Liran Zelkha, from Tona Consulting• Formerly co-founder at ScaleBase – NewSQL solution for scaling RDBMS• Works (a lot) with applications that need to scale, both reads and writes
    4. 4. Agenda• Terminology• Massive Reads vs. Massive Writes• Solutions – RDBMS – NoSQL – Code
    5. 5. TERMINOLOGY
    6. 6. Terminology• OLTP – a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions. OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automatic teller machine (ATM) for a bank is an example of a commercial transaction processing application. http://en.wikipedia.org/wiki/Online_transaction_processing
    7. 7. Terminology• OLAP – is an approach to swiftly answer multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing). http://en.wikipedia.org/wiki/Online_analytical_processing
    8. 8. MASSIVE READS VS. MASSIVE WRITES
    9. 9. Reads vs. Writes
       Reads                         Writes
       • Caching helps               • Caching sucks
       • No need for availability    • Availability is a must (?)
       • No transactions             • Transactions are a must (?)
       • No locking                  • Locking is a must (?)
    10. 10. Massive Reads Solutions• Memory, Memory, Memory – For Caching, Caching, Caching• Column stores (?)
    11. 11. Caching• Tons of solutions.• To name a few: – java.util.Map – Hazelcast – Infinispan – Coherence
    12. 12. When Caching Consider• Time To Live• Memory Leaks• Updates (?)• What is cached and where
    13. 13. Column Store Databases• Store data in columns vs. rows• Faster reads, slower writes• Compression helps store more data in less disk space
    14. 14. Massive Write Solutions• Memory? – Only with fast networks• Fast disks
    15. 15. Memory• Memory can fail, machines can fail• Distributed memory• Size of memory• See Nati Shalom’s points at http://natishalom.typepad.com/nati_shaloms_blog/2010/03/memory-is-the-new-disk-for-the-enterprise.html
    16. 16. Memory Is The New Disk
    17. 17. Fast Disks• When massive writes translate to massive disk writes – Fast disks are a must• Can offer – HA – Scalability – Low latency
    18. 18. 2 Words On Storage Technologies• RAID• SSD• SAN• NAS
    19. 19. Example• NMS System• Each device interaction is built from 5-10 request/response pairs• Each request causes up to 3 database inserts/updates – And multiple reads• Support up to 5M devices• Technology stack – JBoss 6 – EJB3, JPA – Oracle Database
    20. 20. JDBC Profiling – Standard Disks
    21. 21. JDBC Profiling – Fast SAN• Spec – HP P2000 with 16 drives configured as: • 14 300G 10K SAS Drives in RAID 10 (Data) • 2 300G 10K SAS Drives in RAID 1 (Redo) • Write-back is enabled – Disk enclosure is connected via FC Switch and 8GB Qlogic HBA on the server side.
    22. 22. JDBC Profiling – Fast SAN
    23. 23. RDBMS
    24. 24. MySQL Scaling Options• Tuning• Hardware upscale• Partitioning• New MySQL distribution• Read/Write Split
    25. 25. Database Tuning• There are many ways to tune your database• A lot of data online, check out this post – http://forge.mysql.com/wiki/Top10SQLPerformanceTips
    26. 26. Database Tuning – Some Examples• innodb_buffer_pool_size – Holds the data and indexes of tables in memory. – Bigger buffer results in faster row lookups. – The bigger the better. – Default – 8M• Query Cache – Keeps the result of queries in memory until they are invalidated by writes. – query_cache_size • total size of memory available to query caching – query_cache_limit • the maximum size a single query result may be in order to be cached. – query_cache_size = 128MB – query_cache_limit = 4MB
    27. 27. Database Tuning – Pros and Cons
        Pros:
        • May result in major performance improvements
        • Doesn’t require application changes
        Cons:
        • Doesn’t scale. No matter how well the tuning is performed, it will reach a limit defined by machine capabilities.
    28. 28. SQL Tuning• If you write lousy SQL code, you’ll get lousy performance – Java gurus are not SQL gurus – Your ORM code does not know how to write good SQL code• What will happen when executing – SELECT * FROM• Tuning your SQL commands is tedious but very rewarding
    29. 29. SQL Tuning – Some Examples• Here are just some examples: – Use EXPLAIN to profile the query execution plan – Use DISTINCT – not GROUP BY – Don’t use an indexed column with a function – Try not to use non-deterministic functions in the WHERE clause
    30. 30. SQL Tuning – Pros and Cons
        Pros:
        • May result in major performance improvements
        Cons:
        • Requires code modifications.
        • Doesn’t scale. No matter how well the tuning is performed, it will reach a limit defined by machine capabilities.
    31. 31. Scaling Up Hardware • Usually DB gets the strongest servers • However – there is a limit to how much performance gain you can get from increasing hardware • Some data: http://www.mysqlperformanceblog.com/2011/01/26/modeling-innodb-scalability-on-multi-core-servers/
    32. 32. Scaling Up Hardware – Pros and Cons
        Pros:
        • May result in major performance improvements
        • Transparent
        • Easy
        Cons:
        • Scaling is limited
        • Might be expensive
    33. 33. SSD• Solid State Drive – Better latency and access time than regular HDD – Cost more per GB (but prices are dropping)• Vadim Tkachenko from Percona gave a great lecture on SSD at MySQL Conf 2011 – (see slides at http://en.oreilly.com/mysql2011/public/schedule/detail/17117) – Claims you can expect up to X7 performance from SSD
    34. 34. SSD – Pros and Cons
        Pros:
        • May result in major performance improvements
        • Transparent
        Cons:
        • Expensive
        • Still limited scalability
    35. 35. Partitioning• Partitioning was introduced to MySQL at version 5.1.• It is a way to split tables across multiple files, a technique that proved very useful for improving database performance.• Benefits: – Helps fit indexes in RAM – Faster query/insert – Instant delete
    36. 36. Partitioning Performance
        • See excellent presentation by Giuseppe Maxia from 2010
          – http://www.slideshare.net/datacharmer/partitions-performance-with-mysql-51-and-55
        Engine               6 month range query
        InnoDB               4min 30s
        MyISAM               25.03s
        InnoDB partitions    13.19s
        MyISAM partitions    4.45s
    37. 37. Partitioning
        Pros:
        • May result in major performance improvements
        • Mostly transparent to the application
        Cons:
        • MySQL server itself introduces limits. User concurrency, transaction chains and isolation are still bottlenecked by the single MySQL that owns all partitions
    38. 38. New MySQL Distributions• There are many MySQL drop-in replacements• Are MySQL, but tuned differently, different extensions• Leading examples – Percona Server – MariaDB
    39. 39. New MySQL Distributions – Pros and Cons • Pros: Provide performance improvements. Transparent. • Cons: Still limited scalability.
    40. 40. Other Storage Engines • InnoDB better than MyISAM – Oh really? – As always, it depends. – InnoDB will cause fewer corruptions, and is probably better for most high traffic applications. • MEMORY can be great – However, no persistence
    41. 41. Read/Write Splitting • Write to MySQL master, read from 1 (or more) slaves • Excellent read scaling • Many issues: – Since replication is asynchronous, a read might not be up to date – Transactions create stickiness – Code changes
    42. 42. Read/Write Splitting – Pros and Cons • Pros: Provides performance improvements. Scales out the database. • Cons: Requires code changes. Good for scaling reads, not writes. Since replication is asynchronous, some reads might get data that is not up to date.
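The routing rule behind read/write splitting can be sketched in a few lines. This is a minimal illustration, not a real driver: the `ReadWriteRouter` class, the connection names and the "starts with SELECT" test are all assumptions made for the example. It shows writes going to the master, reads rotating over the slaves, and the "stickiness" that transactions create.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the master and reads to the slaves (round-robin)."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)
        self.in_transaction = False

    def begin(self):
        # Transactions create stickiness: every statement inside one
        # must see the same data, so all of them go to the master.
        self.in_transaction = True

    def commit(self):
        self.in_transaction = False

    def route(self, sql):
        # Naive read detection, good enough for a sketch.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not self.in_transaction:
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("SELECT * FROM users"),
    router.route("INSERT INTO users VALUES (1)"),
]
router.begin()
sticky_target = router.route("SELECT * FROM users")
router.commit()
```

Because replication is asynchronous, a read routed to a slave right after a write may still return the old row; code that needs read-your-writes has to read from the master, which is part of the "code changes" cost.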
    43. 43. Sharding • Diagram: App connected to DB1 and DB2
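One common way for the application to pick between DB1 and DB2 is to hash the sharding key. The sketch below assumes hash-based routing, which the slide does not specify; `shard_for` and the `SHARDS` list are hypothetical names used only for illustration.

```python
import hashlib

SHARDS = ["DB1", "DB2"]


def shard_for(key, shards=SHARDS):
    """Map a sharding key to one database, deterministically."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the modulo.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


placement = {user_id: shard_for(user_id) for user_id in range(100)}
```

Every write for a given key lands on exactly one database, so write load is split across servers. The trade-off of plain modulo hashing is that adding a shard moves most keys.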
    44. 44. MySQL + NoSQL - HandlerSocket • Fascinating post - http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html • MySQL spends a huge amount of time on SQL statement parsing • Using InnoDB API directly • MySQL Plugin • Comes built in with Percona Server 5.5
    45. 45. HandlerSocket - Architecture
    46. 46. Code Sample
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Net::HandlerSocket;

        #1. establishing a connection
        my $args = { host => 'ip_to_remote_host', port => 9998 };
        my $hs = new Net::HandlerSocket($args);

        #2. initializing an index so that we can use in main logics.
        # MySQL tables will be opened here (if not opened)
        my $res = $hs->open_index(0, 'test', 'user', 'PRIMARY', 'user_name,user_email,created');
        die $hs->get_error() if $res != 0;
    47. 47. Code Sample – Cont’
        #3. main logic
        #fetching rows by id
        #execute_single (index id, cond, cond value, max rows, offset)
        $res = $hs->execute_single(0, '=', [ '101' ], 1, 0);
        die $hs->get_error() if $res->[0] != 0;
        shift(@$res);
        for (my $row = 0; $row < 1; ++$row) {
          my $user_name = $res->[$row + 0];
          my $user_email = $res->[$row + 1];
          my $created = $res->[$row + 2];
          print "$user_name\t$user_email\t$created\n";
        }

        #4. closing the connection
        $hs->close();
    48. 48. Bashing Some NewSQL Solutions• Xeround – Limited Database size – Only on the cloud• VoltDB – Rewrite your entire app to use stored procedures• NimbusDB – Still in Beta• Clustrix – Insanely expensive – NoSQL that looks like MySQL• Schooner – Fast MySQL on SSD• And no word on ScaleBase
    49. 49. NOSQL
    50. 50. NoSQL Is Here To Stay
    51. 51. NoSQL• A term used to designate databases which differ from classic relational databases in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage, a term which would include classic relational databases as a subset. http://en.wikipedia.org/wiki/NoSQL
    52. 52. NoSQL Types • Key/Value – A big hash table – Examples: Voldemort, Amazon Dynamo • Big Table – Big table, column families – Examples: HBase, Cassandra • Document based – Collections of collections – Examples: CouchDB, MongoDB • Graph databases – Based on graph theory – Examples: Neo4j • Each solves a different problem
    53. 53. NO-SQL http://browsertoolkit.com/fault-tolerance.png
    54. 54. MongoDB • I use the slides of Roger Bodamer from 10gen • Find them here: – http://assets.en.oreilly.com/1/event/61/Building%20Web%20Applications%20with%20MongoDB%20Presentation.ppt • In my book – Mongo doesn’t fit the massive write story.
    55. 55. MongoDB • Document Oriented Database – Data is stored in documents, not tables / relations • MongoDB is implemented in C++ for best performance • Platforms: 32/64 bit Windows, Linux, Mac OS-X, FreeBSD, Solaris • Language drivers for: – Ruby / Ruby-on-Rails – Java – C# – JavaScript – C / C++ – Erlang – Python – Perl – others
    56. 56. Design• Want to build an app where users can check in to a location• Leave notes or comments about that location• Iterative Approach: – Decide requirements – Design documents – Rinse, repeat :-)
    57. 57. Requirements• Locations – Need to store locations (Offices, Restaurants etc) • Want to be able to store name, address and tags • Maybe User Generated Content, i.e. tips / small notes ? – Want to be able to find other locations nearby
    58. 58. Requirements• Locations – Need to store locations (Offices, Restaurants etc) • Want to be able to store name, address and tags • Maybe User Generated Content, i.e. tips / small notes ? – Want to be able to find other locations nearby• Checkins – User should be able to ‘check in’ to a location – Want to be able to generate statistics
    59. 59. Terminology
        RDBMS            Mongo
        Table, View      Collection
        Row(s)           JSON Document
        Index            Index
        Join             Embedded Document
        Partition        Shard
        Partition Key    Shard Key
    60. 60. Collections • Locations: loc1, loc2, loc3 • Users: User1, User2
    61. 61. JSON Sample Doc { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "roger", date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)", text : "MongoSF", tags : [ "San Francisco", "MongoDB" ] } Notes: - _id is unique, but can be anything you’d like
    62. 62. BSON • JSON has a powerful, but limited set of datatypes – Mongo extends datatypes with Date, Int types, Id, … • MongoDB stores data in BSON • BSON is a binary representation of JSON – Optimized for performance and navigational abilities – Also compression – See bsonspec.org
    63. 63. Locations v1 location1 = { name: "10gen East Coast", address: "134 5th Avenue 3rd Floor", city: "New York", zip: "10011" }
    64. 64. Places v1 location1 = { name: "10gen East Coast", address: "134 5th Avenue 3rd Floor", city: "New York", zip: "10011" } db.locations.find({zip: "10011"}).limit(10)
    65. 65. Places v2 location1 = { name: "10gen East Coast", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", tags: ["business", "mongodb"] }
    66. 66. Places v2 location1 = { name: "10gen East Coast", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", tags: ["business", "mongodb"] } db.locations.find({zip: "10011", tags: "business"})
    67. 67. Places v3 location1 = { name: "10gen East Coast", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", tags: ["business", "mongodb"], latlong: [40.0, 72.0] }
    68. 68. Places v3 location1 = { name: "10gen East Coast", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", tags: ["business", "cool place"], latlong: [40.0, 72.0] } db.locations.ensureIndex({latlong: "2d"})
    69. 69. Places v3 location1 = { name: "10gen HQ", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", tags: ["business", "cool place"], latlong: [40.0, 72.0] } db.locations.ensureIndex({latlong: "2d"}) db.locations.find({latlong: {$near: [40, 70]}})
    70. 70. Places v4 location1 = { name: "10gen HQ", address: "17 West 18th Street 8th Floor", city: "New York", zip: "10011", latlong: [40.0, 72.0], tags: ["business", "cool place"], tips: [ {user: "nosh", time: "6/26/2010", tip: "stop by for office hours on Wednesdays from 4-6pm"}, {.....}, ] }
    71. 71. Querying your Places • Creating your indexes: db.locations.ensureIndex({tags: 1}) db.locations.ensureIndex({name: 1}) db.locations.ensureIndex({latlong: "2d"}) • Finding places: db.locations.find({latlong: {$near: [40, 70]}}) • With regular expressions: db.locations.find({name: /^typeaheadstring/}) • By tag: db.locations.find({tags: "business"})
    72. 72. Inserting and updating locations • Initial data load: db.locations.insert(place1) • Using update to add tips: db.locations.update({name: "10gen HQ"}, {$push: {tips: {user: "nosh", time: "6/26/2010", tip: "stop by for office hours on Wednesdays from 4-6"}}})
    73. 73. Requirements• Locations – Need to store locations (Offices, Restaurants etc) • Want to be able to store name, address and tags • Maybe User Generated Content, i.e. tips / small notes ? – Want to be able to find other locations nearby• Checkins – User should be able to ‘check in’ to a location – Want to be able to generate statistics
    74. 74. Users user1 = { name: "nosh", email: "nosh@10gen.com", . . . checkins: [{ location: "10gen HQ", ts: "9/20/2010 10:12:00", …}, … ] }
    75. 75. Simple Stats db.users.find({"checkins.location": "10gen HQ"}) db.checkins.find({"checkins.location": "10gen HQ"}).sort({ts: -1}).limit(10) db.checkins.find({"checkins.location": "10gen HQ", ts: {$gt: midnight}}).count()
    76. 76. Alternative user1 = { name: "nosh", email: "nosh@10gen.com", . . . checkins: [4b97e62bf1d8c7152c9ccb74, 5a20e62bf1d8c736ab] } checkins [] = ObjectId reference to locations collection
    77. 77. User Check in • Check-in = 2 ops: – read location to obtain location id – update ($push) location id to user object • Queries: find all locations where a user checked in: checkin_array = db.users.findOne({..}, {checkins: true}).checkins db.locations.find({_id: {$in: checkin_array}})
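The two-operation check-in and the follow-up `$in` query can be walked through without a database. The sketch below replaces the two collections with plain dictionaries; the ids, the sample data and the helper names `check_in` and `locations_visited` are assumptions for illustration, not MongoDB API.

```python
# In-memory stand-ins for the locations and users collections.
locations = {
    "loc-1": {"name": "10gen HQ"},
    "loc-2": {"name": "Central Park"},
}
users = {"nosh": {"name": "nosh", "checkins": []}}


def check_in(user_name, location_name):
    # Op 1: read the location to obtain its id.
    location_id = next(
        _id for _id, doc in locations.items() if doc["name"] == location_name
    )
    # Op 2: push the location id onto the user's checkins array.
    users[user_name]["checkins"].append(location_id)
    return location_id


def locations_visited(user_name):
    # Same idea as find({_id: {$in: checkin_array}}): match ids against the array.
    wanted = set(users[user_name]["checkins"])
    return [doc["name"] for _id, doc in locations.items() if _id in wanted]


check_in("nosh", "10gen HQ")
check_in("nosh", "10gen HQ")
visited = locations_visited("nosh")
```

Storing only ids keeps the user document small, at the price of a second lookup whenever location details are needed.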
    78. 78. Unsharded Deployment • Diagram: Primary, Secondary, Secondary • Configure as a replica set for automated failover • Async replication between nodes • Add more secondaries to scale reads
    79. 79. Sharded Deployment • Diagram: MongoS, config, Primary, Secondary • Autosharding distributes data among two or more replica sets • Mongo Config Server(s) handles distribution & balancing • Transparent to applications
    80. 80. Cassandra • Slides used from Eben Hewitt • See original slides here: – http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
    81. 81. cassandra properties • tuneably consistent • very fast writes • highly available • fault tolerant • linear, elastic scalability • decentralized/symmetric • ~12 client languages – Thrift RPC API • ~automatic provisioning of new nodes • O(1) DHT • big data
    82. 82. write op
    83. 83. Staged Event-Driven Architecture• A general-purpose framework for high concurrency & load conditioning• Decomposes applications into stages separated by queues• Adopt a structured approach to event-driven concurrency
    84. 84. instrumentation
    85. 85. data replication • configurable replication factor • replica placement strategy – rack unaware → Simple Strategy – rack aware → Old Network Topology Strategy – data center shard → Network Topology Strategy
    86. 86. partitioner smack-down • Random – system will use MD5(key) to distribute data across nodes – even distribution of keys from one CF across ranges/nodes • Order Preserving – key distribution determined by token – lexicographical ordering – required for range queries (scan over rows like cursor in index) – can specify the token for this node to use – ‘scrabble’ distribution
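How MD5 tokens and a rack-unaware placement strategy fit together can be sketched as a token ring. This is a simplified model under stated assumptions: the `Ring` class, the evenly spaced node tokens and the 128-bit token space are illustrative, and the real partitioner's token range and the rack-aware strategies are not modeled.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 128  # size of an MD5 digest, used here as the ring size


def token(key):
    """Random-partitioner style token: the MD5 of the key as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps node name -> the token that node owns.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, key, replication_factor):
        # The first replica is the first node whose token is >= the key's
        # token, wrapping around the ring; the rest follow clockwise.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


nodes = {"node%d" % i: i * TOKEN_SPACE // 4 for i in range(4)}
ring = Ring(nodes)
owners = ring.replicas("eben", 3)
```

Because MD5 output is close to uniform, keys spread evenly over the nodes, but neighbouring keys land on unrelated nodes, which is why range scans need the order-preserving partitioner.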
    87. 87. agenda• context• features• data model• api
    88. 88. structure • keyspace – settings (eg, partitioner) • column family – settings (eg, column comparator, type [Std]) • column – name – value – clock
    89. 89. keyspace• ~= database• typically one per application• some settings are configurable only per keyspace
    90. 90. column family• group records of similar kind• not same kind, because CFs are sparse tables• ex: – User – Address – Tweet – PointOfInterest – HotelRoom
    91. 91. think of cassandra as row-oriented• each row is uniquely identifiable by key• rows group columns and super columns
    92. 92. column family • key 123: user=eben, nickname=The Situation • key 456: user=alison, icon=, n=42
    93. 93. json-like notationUser { 123 : { email: alison@foo.com, icon: }, 456 : { email: eben@bar.com, location: The Danger Zone}}
    94. 94. example
        $ cassandra -f
        $ bin/cassandra-cli
        cassandra> connect localhost/9160
        cassandra> set Keyspace1.Standard1['eben']['age']='29'
        cassandra> set Keyspace1.Standard1['eben']['email']='e@e.com'
        cassandra> get Keyspace1.Standard1['eben']['age']
        => (column=6e616d65, value=39, timestamp=1282170655390000)
    95. 95. a column has 3 parts • 1. name – byte[] – determines sort order – used in queries – indexed • 2. value – byte[] – you don’t query on column values • 3. timestamp – long (clock) – last write wins conflict resolution
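"Last write wins" means that when two replicas hold different versions of the same column, the one with the higher timestamp is kept. A minimal sketch follows; the `Column` tuple, the helper names and the tie-breaking choice (the second argument wins on equal timestamps) are assumptions for illustration.

```python
from collections import namedtuple

Column = namedtuple("Column", ["name", "value", "timestamp"])


def resolve(a, b):
    """Pick the winning version of one column: the later timestamp wins."""
    return a if a.timestamp > b.timestamp else b


def merge_rows(replica_a, replica_b):
    """Merge two replicas of a row (dicts of column name -> Column)."""
    merged = dict(replica_a)
    for name, col in replica_b.items():
        merged[name] = resolve(merged[name], col) if name in merged else col
    return merged


replica_a = {
    "email": Column("email", "old@e.com", 100),
    "age": Column("age", "29", 120),
}
replica_b = {"email": Column("email", "e@e.com", 150)}
merged = merge_rows(replica_a, replica_b)
```

Since the timestamp is supplied by the client, clients with skewed clocks can make an older write win.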
    96. 96. column comparators• byte• utf8• long• timeuuid• lexicaluuid• <pluggable> – ex: lat/long
    97. 97. super column • super columns group columns under a common name
    98. 98. super column family • <<SCF>> PointOfInterest • key 10017: <<SC>> Central Park (desc=Fun to walk in.), <<SC>> Empire State Bldg (phone=212.555.11212, desc=Great view from 102nd floor!) • key 85255: <<SC>> Phoenix Zoo
    99. 99. super column family
        PointOfInterest {                                  // super column family
          key: 85255 {                                     // key
            Phoenix Zoo {                                  // super column
              phone: 480-555-5555,                         // column
              desc: They have animals here. },
            Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
          }, //end phx
          key: 10019 {
            Central Park { desc: Walk around. It's pretty. },   // flexible schema
            Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. }
          } //end nyc
        }
    100. 100. about super column families• sub-column names in a SCF are not indexed – top level columns (SCF Name) are always indexed• often used for denormalizing data from standard CFs
    101. 101. slice predicate• data structure describing columns to return – SliceRange • start column name • finish column name (can be empty to stop on count) • reverse • count (like LIMIT)
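The four fields of a SliceRange can be read as a window over a row's columns, which are kept sorted by name. The sketch below is a simplified model: `get_slice` is a hypothetical helper rather than the Thrift call, and it assumes inclusive bounds, an empty string meaning "unbounded", and plain string ordering as the comparator.

```python
import bisect


def get_slice(row, start="", finish="", reverse=False, count=100):
    """Return up to `count` (name, value) pairs from a row (dict of columns)."""
    names = sorted(row)  # columns are stored sorted by name
    # When reversed, start is the upper bound and finish the lower bound.
    low, high = (finish, start) if reverse else (start, finish)
    begin = bisect.bisect_left(names, low) if low else 0
    end = bisect.bisect_right(names, high) if high else len(names)
    selected = names[begin:end]
    if reverse:
        selected.reverse()
    return [(name, row[name]) for name in selected[:count]]


row = {"age": "29", "email": "e@e.com", "name": "eben", "zip": "85255"}
first_two = get_slice(row, count=2)
middle = get_slice(row, start="b", finish="nz")
last_one = get_slice(row, reverse=True, count=1)
```

Leaving `finish` empty and setting `count` is how a slice behaves like `LIMIT`: the scan simply stops after `count` columns.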
    102. 102. read api
        • get() : Column – get the Col or SC at given ColPath
            COSC cosc = client.get(key, path, CL);
        • get_slice() : List<ColumnOrSuperColumn> – get Cols in one row, specified by SlicePredicate:
            List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, CL);
        • multiget_slice() : Map<key, List<CoSC>> – get slices for list of keys, based on SlicePredicate
            Map<byte[],List<ColumnOrSuperColumn>> results = client.multiget_slice(rowKeys, parent, predicate, CL);
        • get_range_slices() : List<KeySlice> – returns multiple Cols according to a range – range is startkey, endkey, starttoken, endtoken:
            List<KeySlice> slices = client.get_range_slices(parent, predicate, keyRange, CL);
    103. 103. write api
        • insert
            client.insert(userKeyBytes, parent, new Column("band".getBytes(UTF8), "Funkadelic".getBytes(), clock), CL);
        • batch_mutate
            void batch_mutate(map<byte[], map<String, List<Mutation>>>, CL)
        • remove
            void remove(byte[], ColumnPath column_path, Clock, CL)
    104. 104. batch_mutate
        //create param
        Map<byte[], Map<String, List<Mutation>>> mutationMap =
            new HashMap<byte[], Map<String, List<Mutation>>>();
        //create Cols for Muts
        Column nameCol = new Column("name".getBytes(UTF8),
            "Funkadelic".getBytes("UTF-8"), new Clock(System.nanoTime()));
        //wrap the Column so it can be attached to a Mutation
        ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
        nameCosc.column = nameCol;
        Mutation nameMut = new Mutation();
        nameMut.column_or_supercolumn = nameCosc;
        //also phone, etc (phoneMut is built the same way)
        Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>();
        List<Mutation> cols = new ArrayList<Mutation>();
        cols.add(nameMut);
        cols.add(phoneMut);
        muts.put(CF, cols);
        //outer map key is a row key; inner map key is the CF name
        mutationMap.put(rowKey.getBytes(), muts);
        //send to server
        client.batch_mutate(mutationMap, CL);
    105. 105. what about… • SELECT WHERE • ORDER BY • JOIN ON • GROUP
    106. 106. • rdbms: domain-based model – what answers do I have? • cassandra: query-based model – what questions do I have?
    107. 107. SELECT WHERE • cassandra is an index factory • <<cf>>USER – Key: UserID – Cols: username, email, birth date, city, state • How to support this query? SELECT * FROM User WHERE city = ‘Scottsdale’ • Create a new CF called UserCity: <<cf>>USERCITY – Key: city – Cols: IDs of the users in that city • Also uses the Valueless Column pattern
    108. 108. SELECT WHERE pt 2• Use an aggregate key state:city: { user1, user2}• Get rows between AZ: & AZ; for all Arizona users• Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users
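Both SELECT WHERE slides describe hand-built indexes, and they can be imitated with dictionaries. The sketch below builds an index "column family" keyed by `state:city` whose column names are user ids with empty values (the valueless column pattern), then answers queries with key-range scans. The sample users and helper names are assumptions, and the range scan presumes rows are ordered by key, as with the order-preserving partitioner.

```python
import bisect

# Base column family: row key is the user id.
users = {
    "user1": {"city": "Scottsdale", "state": "AZ"},
    "user2": {"city": "Scottsdale", "state": "AZ"},
    "user3": {"city": "Phoenix", "state": "AZ"},
    "user4": {"city": "Austin", "state": "TX"},
}

# Index column family: row key is the aggregate key state:city,
# column names are user ids, column values stay empty.
user_city = {}
for user_id, cols in users.items():
    key = "%s:%s" % (cols["state"], cols["city"])
    user_city.setdefault(key, {})[user_id] = b""


def rows_between(cf, start_key, end_key):
    """Row keys in [start_key, end_key], assuming rows are sorted by key."""
    keys = sorted(cf)
    return keys[bisect.bisect_left(keys, start_key):bisect.bisect_right(keys, end_key)]


def users_in(cf, start_key, end_key):
    return sorted(u for k in rows_between(cf, start_key, end_key) for u in cf[k])


arizona = users_in(user_city, "AZ:", "AZ;")
scottsdale = users_in(user_city, "AZ:Scottsdale", "AZ:Scottsdale1")
```

The `AZ:` to `AZ;` trick works because `;` is the character right after `:` in ASCII, so the range covers every key that starts with `AZ:`. The cost is an extra write to the index on every user insert or city change.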
    109. 109. ORDER BY • Columns are sorted according to CompareWith or CompareSubcolumnsWith • Rows are placed according to their Partitioner: – Random: MD5 of key – Order-Preserving: actual key • Rows are sorted by key, regardless of partitioner
    110. 110. rdbms
    111. 111. cassandra
    112. 112. When To Use NoSQL • No schema – No SQL – Don’t do this KV DB design • Terrible performance, impossible to maintain • No persistence – No SQL – Heck – use a distributed cache for that • Low write latency – And fast storage is not an option • Simple queries – You can always ETL to a DB later • Cool factor – Good luck with that
    113. 113. Real Life NoSQL Usages • MongoDB is great for CMS – Try MTV… • Cassandra is great for low latency writes • Use the right tool for the job – or you’ll get worse performance than a DB • Expect a very steep learning curve to implement
    114. 114. SpringData• In your face – there are frameworks that use NoSQL
    115. 115. And To Finish• A long time ago people told me Java doesn’t perform nor scale…
    116. 116. CODE
    117. 117. ORM• Check the queries created – For instance, in Hibernate • setLast,setFirst – kill your performance • Lazy fetching • N+1• Use batch updates
    118. 118. Hibernate With Batch Updates
        session.doWork(new Work() {
          public void execute(Connection connection) throws SQLException {
            PreparedStatement stmt = connection.prepareStatement("insert into test values(?,?)");
            stmt.setLong(1, id);
            stmt.setLong(2, transId);
            stmt.execute();
          }
        });
    119. 119. JDBC Code• Always use prepared statements• Use your Database performance extensions – For instance, massive inserts for Column Store DBs
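The prepared-statement advice combines naturally with batching: one parameterized statement, many rows, one commit. The sketch below uses SQLite from the Python standard library as a stand-in for whatever database sits behind JDBC; the table layout and the `insert_batch` helper are assumptions for illustration.

```python
import sqlite3


def insert_batch(conn, rows):
    # The statement text is fixed and only the parameters change, so the
    # database parses it once. The `with` block commits all rows together.
    with conn:
        conn.executemany("INSERT INTO test (id, trans_id) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, trans_id INTEGER)")
insert_batch(conn, [(i, i * 10) for i in range(1000)])
row_count = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
```

In JDBC the same shape is `PreparedStatement.addBatch()` per row followed by one `executeBatch()`, with auto-commit turned off so the batch shares a transaction.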
    120. 120. Transactions• One transaction per hit. At most.• Reads can hurt your writes
    121. 121. Database Tuning• We talked about it, but: – Indexes – good for read, bad for write – Multi column indexes – good only if query reads using the same order of columns – Indexes and views – EXPLAIN is your friend• Keep your DB small – BI against read-replica – Delete history – Ensure you only save what you actually need
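The point about multi-column indexes and column order is easy to check with EXPLAIN. The sketch below again uses SQLite from the standard library as a stand-in (MySQL's `EXPLAIN` output looks different); the `events` table and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (device_id INTEGER, ts INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_device_ts ON events (device_id, ts)")


def plan(sql):
    """Return the human-readable query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filters on the leading column of the index: the index can be used.
plan_leading = plan("SELECT * FROM events WHERE device_id = 7")
# Filters only on the second column: the index does not help.
plan_trailing = plan("SELECT * FROM events WHERE ts = 1000")
```

Printing the two plans shows an index search for the first query and a full table scan for the second. For a write-heavy table that means a second index on `ts` would be needed for such queries, and every extra index slows inserts.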
    122. 122. Final Note• NoSQL, SQL – it doesn’t matter if your code sucks!
    123. 123. 2 Words On Cloud• Storage sucks• Network sucks• So – Cache locally – Write a-sync – Write to files, not DB – Don’t use RDS – Check Cloud providers that offer SSD
