Hypercubes In Hbase

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    2 Favorites

    Hypercubes In Hbase - Presentation Transcript

    1. Hypercubes in HBase Fredrik Möllerstrand <fredrik@last.fm> Hadoop User Group UK, April 14 2009
    2. Hello. • Per Andersson, Fredrik Möllerstrand • Chalmers University of Technology, Sweden • Master thesis at last.fm
    3. stats.last.fm • Web statistics for in-house use. • Served out of mysql.
    4. stats.last.fm • y-axis: pageviews. • x-axis: time. • also: countries.
    5. stats.last.fm SELECT pageviews, country FROM webstatistics GROUP BY country;
    6. SQL: Star schema • Facts table & dimension tables. • Joins!
    7. Data Cube Foundations • n-dimensional cube • attribute => dimension • attribute value => measurement
    8. Data Cube Foundations • dimensionality reductions • projections
    9. Data Cube Foundations • Aggregation: sum, count, average, &c. • Data cubes modeled as: in RDBMSs, modeled as star schemas. in HBase, modeled with column-families.
    10. Data Cubes in HBase • Store projections, not distinct dimensions. • Pre-compute *everything*.
    11. Data Cubes in HBase • Rowkey: unit + time i.e. ‘pageviews-20090414’ • One column-family for every projection. i.e. ‘country-useragent’ • One qualifier per point in n-space. i.e. ‘US-safari’, ‘NO-opera’, &c.
    12. The SQL-DB Problem • Too much data to keep in memory. • Plenty of joins makes queries complex. • Can’t serve at mouse click rate.
    13. The Solution; A data store that is: • Distributed • Multi-dimensional • Magnetic(!) • Just general enough
    14. Enter: Zohmg.
    15. Zohmg; A data store that is: • Distributed • Multi-dimensional • Time-series-based • Magnetic(!) • Just general enough
    16. Tech • Rides on the back of Dumbo. • Stores aggregates in HBase. • Serves JSON.
    17. Zohmg • $> setup.py # create hbase database. • $> import.py --mapper weblogs.py # run dumbo job. • $> serve.py # start web server.
    18. Developers, developers. • Configuration - yaml. • Mapper - python.
    19. User’s configuration. project_name: webmetrics dimensions: - country - domain - useragent - usertype units: - pageviews projections: country: - country domain-usertype: - domain - usertype country-domain-useragent-usertype: - country - domain - useragent - usertype
    20. User’s mapper. def map(key, value): from lfm.data.parse import web log = web.parse(value) dimensions = {'country' : geoip(log.host), 'domain' : log.domain, 'useragent' : classify(log.useragent), 'usertype' : ("user", "anon")[log.userid == None] } values = {'pageviews' : 1} yield log.timestamp, dimensions, values
    21. Example.
    22. Dimensions in HBase • Column-family: country-useragent-domain • Qualifier: US-firefox-last.fm
    23. Questions?

    + George AngGeorge Ang, 3 months ago

    custom

    460 views, 2 favs, 1 embeds more stats

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 460
      • 431 on SlideShare
      • 29 from embeds
    • Comments 0
    • Favorites 2
    • Downloads 8
    Most viewed embeds
    • 29 views on http://nosql.mypopescu.com

    more

    All embeds
    • 29 views on http://nosql.mypopescu.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories