Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi
Senior MySQL DBA
Fotolog, Inc.

Fotolog.Com.Mashraqi Scaling

Fotolog.com photo website: experience and architecture using MySQL

  1. Scaling the World’s Largest Photo Blogging Community
     Farhan “Frank” Mashraqi, Senior MySQL DBA, Fotolog, Inc. (fmashraqi@fotolog.com)
     Credits: Warren L. Habib (CTO), Olu King (Senior Systems Administrator)
  2. Introduction
     - Farhan Mashraqi
       - Senior MySQL DBA, Fotolog, Inc.
       - Known on PlanetMySQL as Frank Mash
       - Author of the upcoming “Pro Ruby on Rails” (Apress)
     - Contact
       - fmashraqi@fotolog.com
       - softwareengineer99@yahoo.com
       - Blogs: http://mysqldatabaseadministration.blogspot.com, http://mashraqi.com
  3. What is Fotolog?
     - Social networking
       - Guestbook comments
       - Friend/Favorite lists
       - Members create “Social Capital”
     - “One photo a day”
     - Currently the 25th most visited website on the Internet (Alexa)
     - History: http://blog.fotolog.com/
  4. Fotolog (screenshot of the home page)
  5. Fotolog (screenshot of a member’s fotolog page)
  6. Fotolog Growth
     - 228 million member photos
     - 2.47 billion guestbook comments
     - 20% of members visit the site daily
     - An average user spends 24 minutes a day on the site
     - 10 guestbook comments per photo
     - On average, a photo is seen by 1,000 or more people
     - 7 million members and counting
     - “Explosive growth in Europe”
     - Italy and Spain are among the fastest-growing countries
     - Recently broke the record of 500K photos uploaded in a day
     - 90 million page views
     [Chart comparing Fotolog and Flickr]
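The growth figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers on the slide and assumes uploads are spread evenly over 24 hours. Real peak upload rates will be higher than the average it prints.

```python
# Back-of-the-envelope checks on the "Fotolog Growth" figures.
PHOTOS = 228_000_000          # member photos
COMMENTS = 2_470_000_000      # guestbook comments
UPLOADS_PER_DAY = 500_000     # record number of uploads in one day
SECONDS_PER_DAY = 24 * 60 * 60


def comments_per_photo(comments: int, photos: int) -> float:
    """Average number of guestbook comments per photo."""
    return comments / photos


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


print(f"{comments_per_photo(COMMENTS, PHOTOS):.1f} comments per photo")
print(f"{average_rate_per_second(UPLOADS_PER_DAY):.1f} uploads per second on average")
```

The first result agrees with the slide’s “10 guestbook comments per photo”.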
  7. Technology
     - Sun
     - Solaris 10
     - MySQL
     - Apache
     - Java / Hibernate
     - PHP
     - Memcached
     - 3Par
     - IBRIX
     - StrongMail
  8. MySQL at Fotolog
     - 32 servers
     - Specification of servers
     - Four “clusters”
       - User
       - GB
       - PH
       - FF
     - Non-persistent connections (PHP); connection pooling (Java)
     - Mostly MyISAM initially; later mostly converted to InnoDB
     - Application-side table partitioning
     - Memcache
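The slide lists Memcache alongside MySQL but does not say how the two interact. A common pattern is cache-aside: read from the cache, fall back to the database on a miss, then populate the cache. The sketch below assumes that pattern. `InMemoryCache` is a hypothetical dictionary-backed stand-in for a memcached client, and `load_member` is a hypothetical loader standing in for a real SQL query.

```python
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get/set only, no expiry or eviction."""

    def __init__(self) -> None:
        self._data: Dict[str, object] = {}

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def set(self, key: str, value: object) -> None:
        self._data[key] = value


def cached_lookup(cache: InMemoryCache, key: str, load_from_db: Callable[[], object]) -> object:
    """Cache-aside read: try the cache, fall back to MySQL on a miss, then fill the cache."""
    value = cache.get(key)
    if value is None:  # a miss (a cached None would also be treated as a miss here)
        value = load_from_db()
        cache.set(key, value)
    return value


# Usage: the second lookup is served from the cache, so the loader runs only once.
db_calls: List[str] = []


def load_member() -> Dict[str, str]:
    db_calls.append("SELECT ...")  # hypothetical query, not Fotolog's actual SQL
    return {"user_name": "example"}


cache = InMemoryCache()
first = cached_lookup(cache, "member:example", load_member)
second = cached_lookup(cache, "member:example", load_member)
print(len(db_calls))
```

A production version also has to invalidate or update the cache entry when the underlying row changes. The sketch omits that step.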
  9. Image Storage / Delivery
     - MySQL is used to store image metadata only
     - 3Par (utility storage)
       - Thin provisioning
       - (dedicate on allocation vs. dedicate on write)
     - How fast is it growing each day?
     - Frequently accessed vs. infrequently accessed media
     - Third-party CDN: Akamai/Panther
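The difference between dedicate-on-allocation and dedicate-on-write (thin provisioning) comes down to when physical capacity is consumed. The toy model below counts capacity in whole gigabytes. It ignores the allocation granularity and metadata overhead that a real array such as 3Par adds, and the volume sizes are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    allocated_gb: int  # capacity presented to the host
    written_gb: int    # data actually written so far


def physical_gb_dedicate_on_allocation(volumes: List[Volume]) -> int:
    """Traditional provisioning: the full allocated size is reserved when the volume is created."""
    return sum(v.allocated_gb for v in volumes)


def physical_gb_dedicate_on_write(volumes: List[Volume]) -> int:
    """Thin provisioning: physical capacity is consumed only as data is written."""
    return sum(v.written_gb for v in volumes)


# Hypothetical volumes; the sizes are illustrative, not Fotolog's.
volumes = [Volume(1000, 200), Volume(1000, 350), Volume(500, 50)]
print(physical_gb_dedicate_on_allocation(volumes), "GB reserved vs.",
      physical_gb_dedicate_on_write(volumes), "GB consumed")
```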
  10. Important Scalability Considerations
      - Do you really need five nines (99.999%) availability?
      - Budget
      - Time to deploy
      - Testing
      - Can we afford:
        - A single point of failure (SPF)?
        - Not having read redundancy? (User, PH, GB, FF)
        - Not having write redundancy? (User, PH, GB, FF)
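The first question on the slide is easier to answer when you know the downtime budget. The sketch below converts a number of nines into allowed downtime per year, assuming a 365-day year. Five nines leaves only about five minutes a year, which is hard to meet if any of the four clusters has a single point of failure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for `nines` nines of availability (5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


for n in range(2, 6):
    availability = 100 * (1 - 10.0 ** (-n))
    print(f"{availability:.3f}% available -> "
          f"{allowed_downtime_minutes(n):,.1f} minutes of downtime per year")
```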
  11. Partitioning
      [Diagram: SHARD 1, SHARD 2, SHARD 3; tables Table_v1, Table_v2, Table_v3, Table_v4]
  12. Partitioning thoughts
      [Chart: “Load distribution across shards”, share of load per shard from 0% to 10%; the horizontal axis is labeled with single characters (letters, digits, and _)]
  13. Ideal distribution
      [Chart: “proposed shard for load distribution”, share of load from 0% to 12% for db4, db18, db19, db22, db23, db24, db25, db28, db30, db32]
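The horizontal axis of the load chart appears to use single leading characters as shard keys, and the load across those keys is uneven. The sketch below shows how that skew can arise. It measures each shard’s share of a hypothetical sample of user names when sharding on the first character, and compares that with hashing names into 100 two-digit buckets. Key count stands in for load. Both `by_first_character` and the CRC32-based `by_hash_bucket` are illustrative choices, not Fotolog’s documented scheme.

```python
import zlib
from collections import Counter
from typing import Callable, Dict, Iterable


def load_share(keys: Iterable[str], shard_of: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of keys (used here as a proxy for load) that lands on each shard."""
    counts = Counter(shard_of(key) for key in keys)
    total = sum(counts.values())
    return {shard: n / total for shard, n in counts.items()}


def by_first_character(user_name: str) -> str:
    """Illustrative scheme: shard on the leading character of the user name."""
    return user_name[0].upper()


def by_hash_bucket(user_name: str, buckets: int = 100) -> str:
    """Illustrative alternative: CRC32 of the name mapped to a two-digit bucket 00-99."""
    return f"{zlib.crc32(user_name.encode('utf-8')) % buckets:02d}"


# Hypothetical sample: common name prefixes pile onto one character.
sample = ["maria", "marco", "martina", "matteo", "ana", "bruno", "zoe"]
first_char_share = load_share(sample, by_first_character)
hashed_share = load_share(sample, by_hash_bucket)

print(max(first_char_share, key=first_char_share.get), max(first_char_share.values()))
```

With only seven keys, the hashed result is also noisy. The difference is that hashing spreads keys regardless of how names are spelled, while first-character sharding inherits the skew of the alphabet.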
  14. GB current
      [Diagram: application servers read from and write to the GB database servers db4, db18, db22, db23, db24, db25, db26, db27, db28, db30, and db32; labeled “Single Point of Failure”]
  15. GB Scalability
      [Diagram: application servers read from and write to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30, and db32; the key space is split into the ranges 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, and 90-99; labels “Slave” and “Master/DRBD”]
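When the key space is split into two-digit ranges, as this slide shows, routing becomes a simple table lookup. The sketch below makes two assumptions for illustration. The bucket is the last two decimal digits of the key, and the ranges pair with the servers in the order the slide lists them.

```python
from typing import List, Tuple

# The eleven bucket ranges from the slide, paired in listed order with the eleven GB
# servers. This pairing, and using the last two digits as the bucket, are assumptions.
GB_RANGES: List[Tuple[int, int, str]] = [
    (0, 8, "db4"), (9, 17, "db18"), (18, 26, "db22"), (27, 35, "db23"),
    (36, 44, "db24"), (45, 53, "db25"), (54, 62, "db26"), (63, 71, "db27"),
    (72, 80, "db28"), (81, 89, "db30"), (90, 99, "db32"),
]


def bucket_of(key: int) -> int:
    """Hypothetical bucket function: the last two decimal digits of the key."""
    return key % 100


def server_for(key: int) -> str:
    """Return the server whose range contains the key's bucket."""
    bucket = bucket_of(key)
    for low, high, server in GB_RANGES:
        if low <= bucket <= high:
            return server
    raise ValueError(f"no range covers bucket {bucket:02d}")


print(server_for(1234599))
```

With this layout, you can relieve a hot server by moving or splitting one range and editing one row of the table. You do not have to rehash every key.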
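  16. Current Scheme for fl_db1 Replication (PH)
      [Diagram: application servers read from and write to fl_db1; a slave and servers DB1, DB2, DB3, DB8, and DB12 appear with replication links; application servers issuing PH queries go to DB7, DB9, DB10, DB11, DB13, DB14, DB15, and DB16, which are labeled with groups of leading characters (RTX, FSW, 05DHN, AEK, 16JOQUZ, 28IP_, 39B, 4C, 7GLVY, M); 29 and FF are also shown with replication links]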
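  17. Proposed Scheme for PH (Write & Read)
      [Diagram: application servers read from and write to PH servers 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 29; the key space is split into the ranges 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, and 90-99; an arrow points “TO USER CLUSTER”]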
  18. AUTO-INC table lock contention
      [Diagram: SELECT threads running inside MySQL; axis label “Thread concurrency”]
      - SELECTs do very well with increased concurrency.
      - QPS: 500+
      - GOOD TIMES
  19. AUTO-INC table lock contention
      [Diagram: five SELECT and two INSERT threads running inside MySQL, with more SELECTs arriving; axis label “Thread concurrency”]
      - As more SELECTs come in, AUTO-INC lock contention starts causing problems.
      - WARNING
  20. AUTO-INC table lock contention
      [Diagram: mostly INSERT threads (eight INSERTs, two SELECTs) running inside MySQL, with more SELECTs and INSERTs queued; axis label “Thread concurrency”]
      - PROBLEM
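InnoDB’s traditional AUTO-INC locking was the only behavior before MySQL 5.1.22, and it is `innodb_autoinc_lock_mode = 0` afterward. In this mode, an INSERT into a table with an AUTO_INCREMENT column takes a table-level AUTO-INC lock and normally holds it until the end of the statement. The toy model below uses Python threads rather than MySQL and shows only the serialization: however many INSERTs arrive at once, only one holds the lock at a time. The class name and timings are hypothetical.

```python
import threading
import time
from typing import List


class TraditionalAutoIncTable:
    """Toy model: a table-level AUTO-INC lock held until the INSERT statement ends."""

    def __init__(self) -> None:
        self._autoinc_lock = threading.Lock()
        self._next_id = 1
        self._active = 0       # statements currently holding the lock
        self.max_active = 0    # highest number ever observed holding it at once
        self.rows: List[int] = []

    def insert(self, statement_seconds: float) -> int:
        with self._autoinc_lock:  # every INSERT queues here
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            row_id = self._next_id
            self._next_id += 1
            time.sleep(statement_seconds)  # rest of the statement runs with the lock held
            self.rows.append(row_id)
            self._active -= 1
        return row_id


table = TraditionalAutoIncTable()
threads = [threading.Thread(target=table.insert, args=(0.01,)) for _ in range(8)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"8 inserts took {elapsed:.2f}s; max concurrent lock holders = {table.max_active}")
```

Total time grows with the number of concurrent INSERTs instead of staying flat. That queueing is what the slides show crowding out SELECTs.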
  21. InnoDB Tablespace Structure (Simplified)
      [Diagram: PK / clustered index and secondary index; secondary index entries hold the PK (clustered index key)]
      - 6-byte header: links together consecutive records and is used in row-level locking
      - The clustered index contains fields for all user-defined columns, plus:
        - 6-byte transaction ID
        - 7-byte roll pointer
        - 6-byte row ID, if no PRIMARY KEY or UNIQUE NOT NULL index is defined
      - Record directory: array of pointers to each field of the record
        - 1 byte per pointer if the total length of the fields in the record is less than 128 bytes
        - 2 bytes otherwise
      - Data part of the record
  22. InnoDB Index Structure (Simplified)
      [Diagram: the PK index / clustered index leads to data pages holding the row data; secondary index entries store the PK]
  23. Old Schema
      CREATE TABLE `guestbook_v3` (
        `identifier` bigint(20) unsigned NOT NULL auto_increment,
        `user_name` varchar(16) NOT NULL default '',
        `photo_identifier` bigint(20) unsigned NOT NULL default '0',
        `posted` datetime NOT NULL default '0000-00-00 00:00:00',
        …
        PRIMARY KEY (`identifier`),
        KEY `guestbook_photo_id_posted_idx` (`photo_identifier`,`posted`)
      ) ENGINE=MyISAM
  24. Reads
      [Diagram: data pages]
      - Data ordered by identifier (PK)
      - Looked up by secondary key
  25. New Schema
      CREATE TABLE `guestbook_v4` (
        `identifier` int(9) unsigned NOT NULL auto_increment,
        `user_name` varchar(16) NOT NULL default '',
        `photo_identifier` int(9) unsigned NOT NULL default '0',
        `posted` timestamp NOT NULL default '0000-00-00 00:00:00',
        …
        PRIMARY KEY (`photo_identifier`,`posted`,`identifier`),
        KEY `identifier` (`identifier`)
      ) ENGINE=InnoDB
      1 row in set (7.64 sec)
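Part of what the new schema gains is smaller column size. The arithmetic below assumes MySQL 5.x storage sizes before 5.6.4 (BIGINT 8 bytes, INT 4, DATETIME 8, TIMESTAMP 4). It counts only the three changed columns and applies the 2.47 billion comment figure from the growth slide. It also checks headroom, assuming identifiers are assigned densely. `int(9) unsigned` still ranges up to 4,294,967,295, because the display width does not limit the range.

```python
# Storage sizes in bytes, assuming MySQL 5.x before 5.6.4 (when temporal types changed).
SIZE_BYTES = {"bigint": 8, "int": 4, "datetime": 8, "timestamp": 4}

OLD_COLUMNS = {"identifier": "bigint", "photo_identifier": "bigint", "posted": "datetime"}
NEW_COLUMNS = {"identifier": "int", "photo_identifier": "int", "posted": "timestamp"}

COMMENTS = 2_470_000_000      # guestbook comments, from the growth slide
UNSIGNED_INT_MAX = 2**32 - 1  # upper bound of INT UNSIGNED


def row_bytes(columns: dict) -> int:
    """Bytes used by the listed columns in one row."""
    return sum(SIZE_BYTES[col_type] for col_type in columns.values())


saved_per_row = row_bytes(OLD_COLUMNS) - row_bytes(NEW_COLUMNS)
headroom = UNSIGNED_INT_MAX - COMMENTS

print(f"{saved_per_row} bytes saved per row")
print(f"~{saved_per_row * COMMENTS / 1e9:.1f} GB saved across all comments")
print(f"{headroom:,} identifiers left before INT UNSIGNED overflows")
```

In InnoDB, every secondary index entry also stores the primary key columns. Keeping the new 12-byte primary key narrow therefore also keeps the `identifier` index small.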
  26. Pending preads (Optimizing Disk Usage)
      [Diagram: data pages]
      - Data ordered by the composite key (photo_identifier (FK), posted, identifier)
      - Looked up by primary key
      - Very low read requests per second
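You can model the effect of reordering the clustered index by counting data pages. The sketch below uses a hypothetical workload of 10 photos with 100 comments each, arriving round-robin, and assumes 10 rows per page. It stores rows in primary-key order and counts the distinct pages that hold one photo’s comments. For the old layout, it assumes rows sit in identifier order. Clustering on `identifier` produces that order, and an append-only MyISAM data file produces roughly the same order.

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (identifier, photo_identifier, posted)


def pages_touched(rows_in_pk_order: List[Row], photo_id: int, rows_per_page: int) -> int:
    """Distinct data pages holding one photo's comments when rows are stored in PK order."""
    pages = {i // rows_per_page for i, row in enumerate(rows_in_pk_order) if row[1] == photo_id}
    return len(pages)


# Hypothetical workload: comments arrive round-robin across photos.
PHOTOS, PER_PHOTO, ROWS_PER_PAGE = 10, 100, 10
rows = [(ident, ident % PHOTOS, ident // PHOTOS) for ident in range(PHOTOS * PER_PHOTO)]

old_layout = sorted(rows, key=lambda r: r[0])                # PRIMARY KEY (identifier)
new_layout = sorted(rows, key=lambda r: (r[1], r[2], r[0]))  # (photo_identifier, posted, identifier)

print(pages_touched(old_layout, 0, ROWS_PER_PAGE), "pages with the old key")
print(pages_touched(new_layout, 0, ROWS_PER_PAGE), "pages with the new key")
```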
  27. Pending reads / writes / Proposed
      - Throughput is not as important as the number of requests
  28. Pending reads / writes / Proposed
  29. Pending reads
  30. MySQL Performance Challenges
      - Finding the source of the problem
      - Mostly disk-bound in mature systems
      - Is the query cache hurting you?
      - Adding RAM helps dodge the bullet
      - Disk striping
      - Restructuring tables for optimal performance
      - LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so (preload the Solaris libumem memory allocator into the 64-bit process)
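One way to answer “Is the query cache hurting you?” is to compare cache hits with executed SELECTs from `SHOW GLOBAL STATUS`. In MySQL, a SELECT answered from the query cache increments `Qcache_hits` rather than `Com_select`. The two counters can therefore be added together to get total SELECTs. The counter values below are hypothetical. A low ratio on write-heavy tables is a warning sign, because any write to a table invalidates all cached results that use that table.

```python
def query_cache_hit_ratio(qcache_hits: int, com_select: int) -> float:
    """Share of SELECTs answered from the query cache.

    Com_select counts SELECTs that were actually executed (cache misses);
    cache hits increment Qcache_hits instead, so the two are disjoint.
    """
    total = qcache_hits + com_select
    return qcache_hits / total if total else 0.0


# Hypothetical counters, as read from SHOW GLOBAL STATUS.
print(f"{query_cache_hit_ratio(qcache_hits=300, com_select=100):.0%}")
```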
  31. Considerations for future growth
      - SQLite?
      - File system?
      - PostgreSQL?
      - Improve the application and optimize tables?
  32. Things to remember
      - Know the problem
      - Know your application
      - Know your storage engine
      - Know your requirements
      - Know your budget
  33. Questions?
