Berlin Buzzwords preso

Talk given at Berlin Buzzwords 2011


     1. Analyzing the internet using Hadoop and HBase. Friso van Vollenhoven, fvanvollenhoven@xebia.com
     2. Friso van Vollenhoven, fvanvollenhoven@xebia.com
     3. One of five Regional Internet Registries. Allocates IP address ranges to organizations (e.g. ISPs, network operators, etc.) and provides near real-time network information based on measurements.
     4. (Diagram: a PowerBook G4 connecting to a "WEB 5.0 SERVICE" across the Internet.)
     5. (Diagram: the same picture with the Internet opened up into autonomous systems, such as AS T-Mobile, AS KPN, AS XS4All, AS Level3, AS DT, AS BT and AS AT&T, interconnected by BGP sessions and the AMS-IX exchange.)
     6. Our Data. A raw BGP update:
        BGP4MP|980099497|A|193.148.15.68|3333|192.37.0.0/16|3333 5378 286 1836|IGP|193.148.15.140|0|0||NAG||
        Meaning: "I can take traffic for anything in the range 192.37.0.0 - 192.37.255.255 and deliver it through the network route 3333 -> 5378 -> 286 -> 1836."
        (Diagram: peer routers in AS3333 and other ASes announce routes to a RIPE NCC collector, which is not really a router.)
     7. Our Data
        • 15 active data collection points (the "not really routers"), collecting updates from ~600 peers
        • BGP updates every 5 minutes, ~80M / day
        • Routing table dumps every 8 hours, ~10M entries / dump (160M index updates)
     8. FAQ
        • What are all updates that are somehow related to AS3333 for the past two weeks?
        • How many updates per second were announcing prefixes originated by AS286 between 11:00 and 12:00 on February 2nd, 2005?
        • Which AS adjacencies have existed for AS3356 in the past year?
        • What was the size of the routing table of the peer router at 193.148.15.140 two years ago?
     9. Not so frequently asked: what happened in Egypt? http://stat.ripe.net/egypt
     10. (Architecture diagram: data collectors across the Internet push raw data to a central server; the raw data repository (file-system query based) is copied to HDFS; on the cluster, MapReduce jobs create intermediate datasets on HDFS and insert them into HBase, which backs the online query system serving RIS data on RIPE.net.)
     11. Import. (Pipeline diagram: source update files and source table dump files are copied to HDFS; MapReduce import jobs parse them; further MapReduce jobs create derived sets (adjacencies, peers); MapReduce HBase insert jobs load the updates, table dumps and derived sets into HBase.)
     12. Import
        • Hadoop MapReduce for parallelization and fault tolerance
        • Chain jobs to create derived datasets
        • Database store operations are idempotent
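Idempotent store operations are what make chained, retry-prone MapReduce jobs safe: if a failed task is re-run, it must leave the table in the same state as a single run. One way to get that, sketched below under assumptions (the class and field choices are hypothetical, not the actual RIPE NCC code), is to make the HBase row key a pure function of the source record, so a replayed insert overwrites the same cell rather than adding a duplicate.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: deterministic row keys for idempotent inserts.
// Same source record in, same key out, so re-running an import
// job rewrites identical cells instead of duplicating data.
public class IdempotentKey {

    /** Deterministic key for a BGP update record. */
    public static byte[] rowKey(String prefix, long timestamp, String peer) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(prefix.getBytes(StandardCharsets.UTF_8));
            md.update(Long.toString(timestamp).getBytes(StandardCharsets.UTF_8));
            md.update(peer.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always present in the JDK
        }
    }
}
```

With keys like this, an HBase `Put` built from a source record can be replayed by a retried task without changing the final table contents.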
     13. From a raw update to a stored record:
        BGP4MP|980099497|A|193.148.15.68|3333|192.37.0.0/16|3333 5378 286 1836|IGP|193.148.15.140|0|0||NAG||
        becomes a RECORD (prefix: 193.92.140/21, origin IP: 193.148.15.68, AS path: 3333 5378 286 1836) plus a TIMELINE (T1, T2, T3, T4, T5, T6, T7, etc.).
        Inserting new data points means read-modify-write:
        • find record
        • fetch timeline
        • merge timeline with new data
        • write merged timeline
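The merge step of that read-modify-write cycle can be sketched as follows, assuming (as the simpler of the two cases the deck mentions) that a timeline is a sorted set of timestamps; the interval-pair variant would merge analogously. The class and method names are illustrative only.

```java
import java.util.TreeSet;

// Sketch of the timeline merge in the read-modify-write cycle:
// fetch the stored timeline, merge in the new data points, write back.
public class TimelineMerge {

    /** Merge new data points into an existing timeline; duplicates collapse. */
    public static long[] merge(long[] existing, long[] incoming) {
        TreeSet<Long> merged = new TreeSet<>();          // sorted, deduplicated
        for (long t : existing) merged.add(t);
        for (long t : incoming) merged.add(t);
        long[] out = new long[merged.size()];
        int i = 0;
        for (long t : merged) out[i++] = t;
        return out;
    }
}
```

Because merging the same data points twice yields the same timeline, this step is also what keeps the store operations of the previous slide idempotent under job retries.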
     14. Schema and representation. (Diagram: row keys are [index bytes]; the DATA column family holds [record]:[timeline] cells; the META column family holds an "exists = Y" marker per row.)
     15. Schema and representation
        • Multiple indexes; data is duplicated for each index
        • Protobuf-encoded records and timelines
        • Timelines are timestamps, or pairs of timestamps denoting validity intervals
        • Data is partitioned over time segments (the index contains a time part)
        • Indexes have an additional discriminator byte to avoid extremely wide rows
        • Keep an in-memory representation of the key space for backwards traversal
        • Keep the data hot spot (most recent two to three weeks) in the HBase block cache
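One possible index key layout matching those bullets is sketched below; the field order, widths, and the way the discriminator is derived are assumptions for illustration, not the actual schema from the talk.

```java
import java.nio.ByteBuffer;

// Sketch of an index row key: [time segment][discriminator byte][resource].
// HBase compares row keys as unsigned bytes, so a big-endian time segment
// up front keeps rows grouped by time partition for range scans.
public class IndexKey {

    /**
     * The discriminator byte is derived from the record (here: the low byte
     * of its id's hash), so one logical index entry per time segment fans
     * out over up to 256 physical rows instead of one extremely wide row.
     */
    public static byte[] indexKey(int timeSegment, byte[] resource, long recordId) {
        byte discriminator = (byte) (Long.hashCode(recordId) & 0xFF);
        ByteBuffer key = ByteBuffer.allocate(4 + 1 + resource.length);
        key.putInt(timeSegment);   // coarse time partition, e.g. one per day
        key.put(discriminator);    // spreads a hot index over many rows
        key.put(resource);         // e.g. encoded prefix or AS number
        return key.array();
    }
}
```

A query for one time window then scans all 256 discriminator values within the matching time segments and merges the results, trading a wider scan for narrower rows.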
     16. Schema and representation. Strategy:
        • Make sure the data hotspot fits in the HBase block cache
        • Scale out seek operations for (higher-latency) access to old data
        Result:
        • 10x 10K RPM disks per worker node
        • 16GB heap space for region servers
     17. Query server. Clients HTTP POST a query; the server responds with Protobuf or JSON:
        {
          "select": ["META_DATA", "BLOB", "TIMELINE"],
          "data_class": "RIS_UPDATE",
          "where": {
            "composite": {
              "primary": {
                "time_index": { "from": "08:00:00.000 05-02-2008", "to": "18:00:00.000 05-05-2008" }
              },
              "secondary": {
                "ipv4": {
                  "match": "ALL_LEVELS_MORE_SPECIFIC",
                  "exact_match": "INCLUDE_EXACT",
                  "resource": "193.0.0.0/8"
                }
              }
            }
          },
          "combine_timeline": true
        }
        • In-memory index of all resources (IPs and AS numbers)
        • Internally generates lists of ranges to scan that match the query predicates
        • Combines time partitions into a single timeline before returning to the client
        • Runs on every region server behind a load balancer
        • Knows which data exists
     18. Hadoop and HBase: some experiences. Blocking writes because of memstore flushing and compactions. Tuned these (amongst others):
        • hbase.regionserver.global.memstore.upperLimit
        • hbase.regionserver.global.memstore.lowerLimit
        • hbase.hregion.memstore.flush.size
        • hbase.hregion.max.filesize
        • hbase.hstore.compactionThreshold
        • hbase.hstore.blockingStoreFiles
        • hbase.hstore.compaction.max
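These settings live in hbase-site.xml. A fragment might look like the following; the values shown are the defaults of roughly that HBase era, for illustration only, since the slide does not give the tuned production values:

```xml
<!-- Illustrative hbase-site.xml fragment. Values are era defaults,
     not the tuned RIPE NCC settings (which the slide does not show). -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>   <!-- block writes when all memstores reach 40% of heap -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>   <!-- flush a memstore at 64MB -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>7</value>   <!-- block flushes when a store has this many files -->
</property>
```

Raising the blocking limits trades occasional longer compactions for fewer hard write stalls, which is the balance the slide describes tuning.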
     19. Hadoop and HBase: some experiences.
        Dev machines: dual quad-core, 32GB RAM, 4x 750GB 5.4K RPM data disks, 150GB OS disk.
        Prod machines: dual quad-core, 64GB RAM, 10x 600GB 10K RPM data disks, 150GB OS disk.
        We saw a previously IO-bound job become CPU-bound because of the expanded IO capacity.
     20. Hadoop and HBase: some experiences. Garbage collecting a 16GB heap can take quite a long time. Using G1*: full collections still take seconds, but no missed heartbeats.
        * Newer HBase versions have a fix / feature that works with the CMS collector. RIPE NCC is looking into upgrading.
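Enabling G1 for the region server JVM is done via the JVM options in hbase-env.sh; a minimal sketch, with illustrative flags rather than the exact production settings from the talk:

```shell
# hbase-env.sh fragment: switch the region server JVM to G1.
# On the JDK 6 builds current in 2011, G1 was experimental and
# required the unlock flag; the pause target is an example value.
export HBASE_OPTS="$HBASE_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
```

The point of the pause target is exactly the heartbeat problem above: shorter, more frequent collections instead of rare multi-second stop-the-world pauses that cause the region server to miss ZooKeeper heartbeats.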
     21. Hadoop and HBase: some experiences. Estimating required capacity and machine profiles is hard. Pick a starting point. Monitor. Tune. Scale. In that order.
     22. Hadoop and HBase: some experiences. Other useful tools:
        • Ganglia
        • CFEngine / Puppet
        • Splunk
        • Bash
     23. Hadoop and HBase: some experiences. Awesome community! (Mailing lists are a big help when in trouble.)
     24. Q&A. Friso van Vollenhoven, fvanvollenhoven@xebia.com
