HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data


Published on

The AOL Mail Team will discuss our implementation of HBase for two large scale applications: an anti-abuse mechanism and a user-visible API. We will provide an overview of how and why HBase and Hadoop were incorporated into the massive and diverse technology stack that is the nearly 20-year-old AOL Mail system and the history of how we took our HBase/Hadoop apps through our traditional process of design, to development, through QA, and into production. We will explain how our practical approach to HBase has evolved over time, and we will discuss our lessons learned and some of our techniques and tools developed via our iterative dev/qa and operational processes. We will explain the pain-points we have experienced with erratic usage and edge-cases, and how we address problems when we run across them.

Published in: Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Introduce myself: I am Chris Niemira, a Systems Administrator with AOL. I run a number of Hadoop and HBase clusters, along with numerous other components of the AOL Mail system. I spend my days doing work that ranges from system patches, code installs and troubleshooting, to capacity planning, performance and bottleneck analysis, and kernel tuning. I do a little engineering, a little design work, an on-call rotation, and every once in a while I get to play with Hadoop/HBase.
  • The AOL Mail System has been around for a long time, and went through a major re-architecture between 2010 – 2011. It’s not a 15 year old code base, and we evolve it constantly. We service over 70 million mailboxes in the AOL Mail environment today. That includes supporting our paying members, in addition to free accounts. Of course, member experience is our #1 priority. We have all kinds of tools in our proverbial utility belt, as we believe in trying to use the right thing for the right job.
  • It means we’re reasonably large. But we’ve also been operating “at scale” for a long time now. While we have been doing “Big Data” for a lot of years now, we got to our current size by operating a certain way: Rigid quality and change controls, lots of documentation, emphasis on uptime. As we have shifted toward being more agile, we have had to be careful with unproven technologies. HBase, for all the buzz, is still pretty young and error-prone. Some of the realities for dealing with a production Hadoop/HBase system would seemingly require a departure from our traditional mentality. Like everyone, we require stability and robustness of our production applications, but our way of getting there has had to change. Above all, however, we must still take care of our customers, so it’s a balancing act for us.
  • So HBase is one of the tools we’ve added to kit in the last few years that’s still proving itself. We’ve got two applications running and we’ve identified a few other places where it’s a good candidate to utilize. This isn’t to say that we are not using it for important things, but it’s not at the core of our system. We’ve managed to build a relatively stable platform over time. There’s a lot of scripted recovery, and a lot of proactive monitoring in our environment, and for the most part when there are problems, they are mitigated or resolved without even the involvement of an admin.
  • AOL Mail first stared looking into Hadoop and HBase back in mid 2010. Other business units in our company had been working with Hadoop for a while before then, and a little of intra-company consulting convinced us to give HBase a try. This system is one component our our anti-abuse strategy. I can’t reveal exactly what it does, but I can tell you a bit about how the HBase stuff happens. In addition to the 60 node cluster and the application servers there’s the ancillary junk which includes NameNodes (2x), HMasters (2x), Zookeepers (3x). The app hosts and Zookeepers, which are currently physicals, are being switched to virtual devices in our internal cloud.
  • This is what the application looks like. The “Service Layer” comprises various components within the AOL Mail system. They speak their own protocols and send messages to an “Event Catcher” which decodes the stream, and writes a log to local disk. That log is imported in Hadoop (and can optionally be sampled to a development sandbox at the same time) and then further cooked via MapRed which ultimately outputs rows into HBase, and can send triggers to external applications. One thing we can do at this point (not illustrated) is populate a memcache which may be used by client apps to reduce load on some HBase queries.
  • The real answer is that when we first started, we couldn’t make streaming a million and a half rows a minute work out with the Hbase we had two years ago. At the time, it was easier for us to build the batch loader, which has proven to have a few interesting advantages. Our next-generation model will rely on HBase itself being more stable, and will heavily leverage coprocessors to do a lot of what we’re doing now with MapReduce.
  • A big obstacle for us is getting MapReduce and HBase to play nicely together. From what I’ve seen, bigger hardware is starting to become more popular for running HBase, and we believe it’s essential. We’ve floated between an 8 – 16 GB heap for the RegionServer. For this application, I believe we’re currently using 16. Getting GC tuning and the IPC timeouts in HBase/Zookeeper correct are critically important. System tuning is also very important. Depending on which flavor of Linux you’re running, the stock configuration may be completely inappropriate for the needs of an HBase/Hadoop complex. In particular, look at the kenel’s IO scheduler, and VM settings.
  • This application was built a short while after we started our trial-by-fire with HBase on the previous application. It was a different development team with input from the engineers working on the previously discussed application. This application has the same “event catcher” layer for the same reasons, but it has always written directly to HBase. We import data into a “raw” table and then process that table with MapReduce writing the output into a “cooked” table. There’s a much lower number of events here, but it spikes up significantly during the MapReduce phase. It’s exactly the same class of hardware with the same ancillary junk as the previous app. Most of the query load is actually farmed out of memcache.
  • Yes, this is a relatively straight-forward design.
  • Exploding tables might be a better name for this, since it’s an across-the-board sort of thing. Backups, of course, are obvious. We’ve run into three catastrophic data loss events, actually once each with three different clusters. The first was during a burn-in phase for the Contact History application I described earlier. At that time the data it had accumulated over the week or so that it had been running wasn’t considered important essential so we were able to truncate and move along. Another time, for a separate plain Hadoop cluster, an unintentionally malicious user actually managed to delete my backups and corrupt the namenode’s event log. Luckily that data was restorable from another source. The last time was with the Activity Profiler application. Basically, having data backups saved the day.
  • This is our working model for a next-generation HBase system It is currently being prototyped with the cooperation of our Engineering and Operations teams The key design concept is to allow for a great deal of flexibility and re-use, and it centers around this idea of installing a fairly dynamic rules-engine at both the event collection and event storage layers. Hopefully will be presenting it soon
  • HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data

    1. 1. You’ve got HBaseHow AOL Mail handles Big DataPresented at HBaseConMay 22, 2012
    2. 2. The AOL Mail SystemOver 15 years oldConstantly evolving10,000+ hosts70+ Million mailboxes50+ Billion emailsA technology stack that runs the gamut Presented at HBaseCon 2012 Page 2
    3. 3. What that means…Lots of dataLots of moving partsTight SLAsMature system + Young software = Tough marriage We don’t buy “commodity” hardware Engrained Dev/QA/Prod product lifecycle Somewhat “version locked” to tried-and-true platforms Expect service outages to be quickly mitigated by our NOC w/out waiting for an on-call Presented at HBaseCon 2012 Page 3
    4. 4. So where does HBase fit?It’s a component, not the foundationCurrently used in two placesBeing evaluated for more It will remain a tool in our diverse Big Data arsenal Presented at HBaseCon 2012 Page 4
    5. 5. An Activity Profiler
    6. 6. An “Activity Profiler”Watches for particular behaviorsDesigned and built in 6/2010Originally “vanilla” Hadoop 0.20.2 + HBase 0.90.2Currently CDH31.4+ Million Events/min60x 24TB (raw) DataNodes w/ local RegionServers15x application hostsIs an internal-only tool Used by automated anti-abuse systems Leveraged by data analysts for adhoc queries/MapRed Presented at HBaseCon 2012 Page 6
    7. 7. An “Activity Profiler” Presented at HBaseCon 2012 Page 7
    8. 8. Why the “Event Catcher” layer?Has to “speak the language” of our existing systems Easy to plug an HBase translator in to existing data feeds Hard to modify the infrastructure to speak HBaseFlume was too young at the time Presented at HBaseCon 2012 Page 8
    9. 9. Why batch load via MapRed?Real time is not currently a requirementAllows filtering at different pointsAllows us to “trigger” events Designed before coprocessorsEarly data integrity issues necessitated “replaying” Missing append support early on Holes in the Meta table Long splits and GC pauses caused client timeoutsCan sample data into a “sandbox” for job developmentMakes pig, hive, and other MapRed easy and stable We keep the raw data around as well Presented at HBaseCon 2012 Page 9
    10. 10. HBase and MapRed can live in harmonyBigger than “average” hardware 36+GB RAM 8+ coresProper system tuning is essential Good information on tuning Hadoop is prolific, but… XFS > EXT JBOD > RAID As far as HBase is concerned… Just go buy Lars’ bookCareful job development, optimization is key! Presented at HBaseCon 2012 Page 10
    11. 11. Contact History API
    12. 12. Contact History APIServices a member-facing APIDesigned and built in 10/2010Modeled after the previous application Built by a different Engineering team Used to solve a very different problem250K+ Inserts/min3+ Million Inserts/min during MapRed20x 24TB (raw) DataNodes w/ local RegionServers14x application hostsLeverages Memcached to reduce query load on HBase Presented at HBaseCon 2012 Page 12
    13. 13. Contact History API Presented at HBaseCon 2012 Page 13
    14. 14. Where we go from here
    15. 15. Amusing mistakes to learn fromExploding regions Batch inserts via MapRed result in fast, symmetrical key space growth Attempting to split every region at the same time is a bad idea Turning off region splitting and using a custom “rolling region splitter” is a good idea Take time and load into consideration when selecting regions to splitBackups, backups, backups! You can never have to manyLarge, non-splitable regions tell you things Our key space maps to accounts Excessively large keys equal excessively “active” accounts Presented at HBaseCon 2012 Page 15
    16. 16. Next-generation model Presented at HBaseCon 2012 Page 16
    17. 17. Thanks! Presented at HBaseCon 2012 Page 17