Feed Burner Scalability
- 2. What is FeedBurner? 2
• Market-leading feed management provider
• 170,000 bloggers, podcasters and commercial
publishers including Reuters, USA TODAY,
Newsweek, Ars Technica, BoingBoing…
• 11 million subscribers in 190 countries.
• Web-based services help publishers expand
their reach online, attract subscribers and
make money from their content
• The largest advertising network for feeds
© 2006 FeedBurner
- 3. Scaling history 3
• July 2004
– 300Kbps, 5,600 feeds
– 3 app servers, 3 web servers, 2 DB servers
• April 2005
– 5Mbps, 47,700 feeds
– My first MySQL Users Conference
– 6 app servers, 6 web servers (same machines)
• September 2005
– 20Mbps, 109,200 feeds
• Currently
– 115 Mbps, 270,000 feeds, 100 Million hits per day
- 4. Scalability Problem 1: Plain old reliability 4
• August 2004
• 3 web servers, 3 app servers, 2 DB servers.
Round Robin DNS
• Single-server failure, seen by 1/3 of all users
- 5. Solution: Load Balancers, Monitoring 5
• Health Check pages
– Round trip all the way back to the database
– Same page monitored by load balancers
and monitoring
• Monitoring
– Cacti (http://www.cacti.net/)
– Nagios (http://www.nagios.org)
- 6. Health Check 6
UserComponent uc = UserComponentFactory.getUserComponent();
User user = uc.getUser("monitor-user");
// If first load, mark as down.
// Let FeedServlet mark things as up in its init method (load-on-startup).
String healthcheck = (String) application.getAttribute("healthcheck");
if (healthcheck == null || healthcheck.length() < 1) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}
// getUser() returns null in case of a problem, or if the user doesn't exist
if (user == null) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}
System.out.print(healthcheck);
- 7. Cacti 7
- 8. Start/Stop scripts 8
#!/bin/bash
# Source the environment
. ${HOME}/fb.env
# Start Tomcat
cd ${FB_APPHOME}
# Remove stale temp files
find ~/rsspp/catalina/temp/ -type f -exec rm -f {} \;
# Remove the work directory
#rm -rf ~/rsspp/catalina/work/*
${CATALINA_HOME}/bin/startup.sh
- 9. Start/Stop scripts 9
#!/bin/bash
FB_APPHOME=/opt/fb/fb-app
JAVA_HOME=/usr
CATALINA_HOME=/opt/tomcat
CATALINA_BASE=${FB_APPHOME}/catalina
CATALINA_OPTS="-Xmx768m -Xms768m -Dnetworkaddress.cache.ttl=0"
WEBROOT=/opt/fb/webroot
export JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_OPTS WEBROOT
- 10. Scalability Problem 2: Stats recording/mgmt 10
• Every hit is recorded
• Certain hits mean more than others
• Flight recorder
• Any table-management operation locks the table
• Inserts slow way down (90GB table)
- 11. Solution: Executor Pool 11
• Executor Pool
– Doug Lea’s concurrency library
– Use a PooledExecutor so stats inserts happen in a
separate thread
– Spring bean definition:
<bean id="StatsExecutor"
      class="EDU.oswego.cs.dl.util.concurrent.PooledExecutor">
  <constructor-arg>
    <bean class="EDU.oswego.cs.dl.util.concurrent.LinkedQueue"/>
  </constructor-arg>
  <property name="minimumPoolSize" value="10" />
  <property name="keepAliveTime" value="5000" />
</bean>
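The deck wires up Doug Lea's PooledExecutor via Spring so stats inserts leave the request thread. A minimal sketch of that hand-off pattern, using java.util.concurrent (the JDK descendant of Lea's library) and an AtomicInteger standing in for the real stats DAO insert — class and variable names here are illustrative, not from the deck:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StatsExecutorSketch {
    // Stand-in for the real stats DAO insert; here we just count calls.
    static final AtomicInteger inserted = new AtomicInteger();

    public static void main(String[] args) throws Exception {
        // Mirrors the bean above: a 10-thread pool fed by an unbounded queue.
        ExecutorService statsExecutor = Executors.newFixedThreadPool(10);

        // The request thread hands each insert off and returns immediately.
        for (int i = 0; i < 100; i++) {
            statsExecutor.execute(inserted::incrementAndGet);
        }

        statsExecutor.shutdown();
        statsExecutor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(inserted.get()); // 100
    }
}
```

The queue absorbs bursts of hits, so a slow insert delays stats recording rather than the page response.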
- 12. Solution: Lazy rollup 12
• Only today’s detailed stats need to go against
real-time table
• Roll up previous days into sparse summary
tables on-demand
• First access to a day's stats is slow;
subsequent requests are fast
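The lazy roll-up amounts to compute-on-first-read with a cached summary row. A minimal sketch, with in-memory maps standing in for the real-time detail table and the sparse summary table (all names hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class LazyRollupSketch {
    // Stand-ins for the real-time detail table and the summary table.
    static final Map<String, int[]> realtimeHits = new HashMap<>();
    static final Map<String, Integer> dailySummary = new HashMap<>(); // key: feed|day

    static int statsFor(String feed, String day) {
        // First access: roll the detail rows up and store the summary row.
        return dailySummary.computeIfAbsent(feed + "|" + day, k -> {
            int total = 0;
            for (int hit : realtimeHits.getOrDefault(k, new int[0])) {
                total += hit;
            }
            return total; // subsequent requests read this summary directly
        });
    }

    public static void main(String[] args) {
        realtimeHits.put("myfeed|2006-04-24", new int[]{3, 4, 5});
        System.out.println(statsFor("myfeed", "2006-04-24")); // 12 (slow path)
        System.out.println(statsFor("myfeed", "2006-04-24")); // 12 (summary hit)
    }
}
```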
- 13. Scalability Problem 3: Primary DB overload 13
• Mostly used master DB server for everything
• Read vs. read/write load didn't matter in the
beginning
• Slow inserts would block reads, when using
MyISAM
- 14. Solution: Balance read and read/write load 14
• Looked at workload
– Found where we could break up read vs. read/write
– Created Spring ExtendedDaoObjects
– Tomcat-managed DataSources
• Balanced master vs. slave load (Duh)
– Slave becomes perfect place for snapshot backups
• Watch for replication problems
– Merge table problems (later)
– Slow queries slow down replication
- 16. ExtendedDaoObject 16
• Application code extends this class and uses
getHibernateTemplate() or getReadOnlyHibernateTemplate()
depending upon requirements
• Similar class for JDBC
public class ExtendedHibernateDaoSupport extends HibernateDaoSupport {
    private HibernateTemplate readOnlyHibernateTemplate;

    public void setReadOnlySessionFactory(SessionFactory sessionFactory) {
        this.readOnlyHibernateTemplate = new HibernateTemplate(sessionFactory);
        readOnlyHibernateTemplate.setFlushMode(HibernateTemplate.FLUSH_NEVER);
    }

    protected HibernateTemplate getReadOnlyHibernateTemplate() {
        return (readOnlyHibernateTemplate == null) ? getHibernateTemplate()
                                                   : readOnlyHibernateTemplate;
    }
}
- 17. Scalability Problem 4: Total DB overload 17
• Everything slowing down
• Using DB as cache
• Database is the ‘shared’ part of all app servers
• Ran into table size limit defaults on MyISAM
(4GB). We were lazy.
– Had to use Merge tables as a bridge to newer
larger tables
- 18. Solution: Stop using the database 18
• Where possible :)
• Multi-level caching
– Local VM caching (EHCache, memory only)
– Memcached (http://www.danga.com/memcached/)
– And finally, database.
• Memcached
– Fault-tolerant, but client handles that.
– Shared nothing
– Data is transient, can be recreated
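The multi-level lookup reads the closest layer first and repopulates the layers above on a miss. A minimal sketch with plain maps standing in for EHCache, memcached, and the database (keys and values are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class MultiLevelCacheSketch {
    // Level 1: per-VM cache (EHCache in the deck). Level 2: memcached,
    // shared by all app servers. Level 3: the database. Maps stand in.
    static final Map<String, String> localVm = new HashMap<>();
    static final Map<String, String> memcached = new HashMap<>();
    static final Map<String, String> database = new HashMap<>();

    static String get(String key) {
        String v = localVm.get(key);
        if (v != null) return v;                  // fastest: in-process
        v = memcached.get(key);
        if (v == null) {
            v = database.get(key);                // last resort: the DB
            if (v != null) memcached.put(key, v); // repopulate shared cache
        }
        if (v != null) localVm.put(key, v);       // repopulate local cache
        return v;
    }

    public static void main(String[] args) {
        database.put("feed:42:title", "Burning Questions");
        System.out.println(get("feed:42:title"));              // misses both caches, hits DB
        System.out.println(localVm.containsKey("feed:42:title")); // now cached locally
    }
}
```

Because the cached data can always be recreated from the database, losing a memcached node costs latency, not correctness.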
- 19. Scalability Problem 5: Lazy initialization 19
• Our stats get rolled up on demand
– Popular feeds slowed down the whole system
• FeedCount chicklet calculation
– Every feed gets its circulation calculated at the
same time
– Contention on the table
- 20. Solution: BATCH PROCESSING 20
• For FeedCount, we staggered the calculation
– Still would run into contention
– Stats stuff again slowed down at 1AM Chicago time.
• We now process the rolled-up data every night
– Delay showing the previous circulation in the
FeedCount until roll-up is done.
• Still wasn’t enough
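The deck doesn't spell out how the FeedCount calculation was staggered; one plausible scheme (an assumption, not the deck's) is a deterministic per-feed offset within a processing window, so each feed always lands in the same slot and the load spreads evenly:

```java
public class StaggerSketch {
    // Hypothetical: spread each feed's circulation calculation across a
    // window instead of firing them all at the same instant (e.g. 1 AM).
    static final int WINDOW_MINUTES = 6 * 60; // assumed 6-hour window

    static int offsetMinutesFor(long feedId) {
        // Deterministic per-feed slot; floorMod guards against negative ids.
        return (int) Math.floorMod(feedId, WINDOW_MINUTES);
    }

    public static void main(String[] args) {
        System.out.println(offsetMinutesFor(42L));   // 42
        System.out.println(offsetMinutesFor(1042L)); // 322
    }
}
```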
- 21. Scalability Problem 6: Stats writes, again 21
• Too much writing to master DB
• More and more data stored associated with
each feed
• More stats tracking
– Ad Stats
– Item Stats
– Circulation Stats
- 22. Solution: Merge Tables 22
• After the nightly rollup, we truncate the
subtable from 2 days ago
• Gotcha with truncating a subtable:
– FLUSH TABLES; TRUNCATE TABLE ad_stats0;
– Could succeed on master, but fail on slave
• The right way to truncate a subtable:
– ALTER TABLE ad_stats TYPE=MERGE
UNION=(ad_stats1,ad_stats2);
– TRUNCATE TABLE ad_stats0;
– ALTER TABLE ad_stats TYPE=MERGE
UNION=(ad_stats0,ad_stats1,ad_stats2);
- 23. Solution: Horizontal Partitioning 23
• Constantly identifying hot spots in the
database
– Ad serving
– Flare serving
– Circulation (constant writes, occasional reads)
• Move hottest tables/queries off to own clusters
– Hibernate and certain lazy patterns allow this
– Keeps the driving tables from slowing down
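At its simplest, horizontal partitioning like this is a lookup from hot table group to cluster, with everything else falling through to the main cluster. A sketch of that routing idea — the cluster URLs and table names here are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionRouterSketch {
    // Hypothetical mapping from hot table groups to their own DB clusters.
    static final Map<String, String> clusterByTable = new HashMap<>();
    static {
        clusterByTable.put("ad_stats", "jdbc:mysql://ads-db/fb");
        clusterByTable.put("flare", "jdbc:mysql://flare-db/fb");
        clusterByTable.put("circulation", "jdbc:mysql://circ-db/fb");
    }

    static String urlFor(String table) {
        // Tables without a dedicated cluster stay on the main cluster.
        return clusterByTable.getOrDefault(table, "jdbc:mysql://main-db/fb");
    }

    public static void main(String[] args) {
        System.out.println(urlFor("ad_stats")); // dedicated ads cluster
        System.out.println(urlFor("users"));    // main cluster
    }
}
```

In practice each entry would be a DataSource rather than a URL string; keeping the mapping in one place makes it cheap to peel off the next hot spot.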
- 24. Scalability Problem 7: Master DB Failure 24
• Still using just a primary and slave
• Master crash: Single point of failure
• No easy way to promote a slave to a master
- 25. Solution: No easy answer 25
• Still using auto_increment
– Multi-master replication is out
• Tried DRBD + HeartBeat
– Disk is replicated block-by-block
– Hot primary, cold secondary
• Didn’t work as we hoped
– Myisamchk takes too long after failure
– I/O + CPU overhead
• InnoDB is supposedly better
- 26. Our multi-master solution 26
• Low-volume master cluster
– Uses DRBD + HeartBeat
– Works well under smaller load
– Does mapping to feed data clusters
• Feed Data Cluster
– Standard Master + Slave(s) structure
– Can be added as needed
- 28. Scalability Problem 8: Power Failure 28
• Chicago has ‘questionable’ infrastructure.
• Battery backup, generators can be problematic
• Colo techs have been known to hit the Big
Red Switch
• Needed a disaster recovery/secondary site
– Active/Active not possible for us. Yet.
– Would have to keep fast connection to redundant
site
– Would require 100% of current hardware, but it
would sit idle
- 29. Code Name: Panic App 29
• Product Name: Feed Insurance
• Elegant, simple solution
• Not Java (sorry)
• Perl-based feed fetcher
– Downloads copies of feeds, saved as flat XML files
– Synchronized out to local and remote servers
– Special rules for click tracking, dynamic GIFs, etc
- 30. General guidelines 30
• Know your DB workload
– Cacti really helps with this
• ‘EXPLAIN’ all of your queries
– Helps keep crushing queries out of the system
• Cache everything that you can
• Profile your code
– Usually only needed on hard-to-find leaks
- 31. Our settings / what we use 31
• Don’t always need the latest and greatest
– Hibernate 2.1
– Spring
– DBCP
– MySQL 4.1
– Tomcat 5.0.x
• Let the container manage DataSources
- 32. JDBC 32
• Hibernate/iBatis/Name-Your-ORM-Here
– Use ORM when appropriate
– Watch the queries that your ORM generates
– Don't be afraid to drop to JDBC
• Driver parameters we use:
# For Internationalization of Ads, multi-byte characters in general
useUnicode=true
characterEncoding=UTF-8
# Biggest performance bits
cacheServerConfiguration=true
useLocalSessionState=true
# Some other settings that we've needed as things have evolved
useServerPrepStmts=false
jdbcCompliantTruncation=false
- 33. Thank You 33
Questions?
joek@feedburner.com