Brisk: More Powerful Hadoop, Powered by Cassandra
jbellis@datastax.com
Monday, July 25, 2011
Brisk: more powerful Hadoop powered by Cassandra

  1. Brisk: More Powerful Hadoop, Powered by Cassandra. jbellis@datastax.com. Monday, July 25, 2011
  2. The evolution of Analytics: Analytics + Realtime
  3. The evolution of Analytics (diagram labels: replication, Analytics, Realtime)
  4. The evolution of Analytics: ETL
  5. Brisk re-unifies realtime and analytics
  6. The Traditional Hadoop Stack
      Slave Nodes: Data Node, Task Tracker, Region Server
      Master Nodes: Name Node, Secondary Name Node, Job Tracker, HBase Master
      Client Nodes: Pig, Hive
      Also shown: ZooKeeper, MetaStore
  8. Brisk Architecture
  9. Brisk Highlights
      ✤ Easy to deploy and operate
      ✤ No single points of failure
      ✤ Scale and change nodes with no downtime
      ✤ Cross-DC, multi-master clusters
      ✤ Allocate resources for OLAP vs OLTP
      ✤ With no ETL
  10. Cassandra data model
      ✤ ColumnFamilies contain rows + columns
      ✤ (Not really schemaless for a while now)

                 password  name              site
        zznate   *         Nate McCall
        driftx   *         Brandon Williams
        jbellis  *         Jonathan Ellis    datastax.com

  11. Sparse

        zznate   password: *   name: Nate McCall
        driftx   password: *   name: Brandon Williams
        jbellis  password: *   name: Jonathan Ellis   site: datastax.com

  12. Rows as containers / materialized views

        circle1  driftx, thobbs, pcmanus, jbellis, zznate
        circle2  xedin, mdennis
        circle3  xedin, pcmanus, ymorishita
  14. CassandraFS
      ✤ Data stored as ByteBuffer internally: an excellent fit for blocks
      ✤ Local reads mmap data directly (no RPC)
      ✤ Blocks are compressed with Google Snappy
      ✤ hadoop distcp hdfs:///mydata cfs:///mydata
  15. Hive support
      ✤ Hive MetaStore in Cassandra
          ✤ Unified schema view from any node, with no external systems and no SPOF
      ✤ Automatically maps Cassandra column families to Hive tables
      ✤ Supports static and dynamic column families (and supercolumns)
  16. Hive: CFS and ColumnFamilies

        CREATE TABLE users (name STRING, zip INT);
        LOAD DATA LOCAL INPATH 'kv2.txt' OVERWRITE INTO TABLE users;

        CREATE EXTERNAL TABLE Keyspace1.Users (name STRING, zip INT)
          STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';

        CREATE EXTERNAL TABLE Keyspace1.Users (row_key STRING, column_name STRING, value STRING)
          STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';
  17. Pig Support
      ✤ With standard Cassandra:
          $ export PIG_HOME=/path/to/pig
          $ export PIG_INITIAL_ADDRESS=localhost
          $ export PIG_RPC_PORT=9160
          $ export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
          $ contrib/pig/bin/pig_cassandra
          grunt>
      ✤ With Brisk:
          $ bin/brisk pig
          grunt>
  18. Pig: CFS and ColumnFamilies

        grunt> data = LOAD 'cfs:///example.txt' USING PigStorage()
                      AS (name:chararray, value:long);

        data = LOAD 'cassandra://Demo1/Scores' USING CassandraStorage()
               AS (key, columns: {T: tuple(name, value)});

        data = LOAD 'cassandra://Demo1/Scores&slice_start=M&slice_end=S' USING CassandraStorage()
               AS (key, columns: {T: tuple(name, value)});
  20. Data model: Realtime

      LiveStocks
                last
        GOOG    $95.52
        AAPL    $186.10
        AMZN    $112.98

      Portfolios
                     GOOG  LNKD  P   AMZN  AAPL
        Portfolio1   80    20    40  100   20

      StockHist
                2011-01-01  2011-01-02  2011-01-03
        GOOG    $79.85      $75.23      $82.11
  21. Data model: Analytics

      HistLoss
                     worst_date   loss
        Portfolio1   2011-07-23   -$34.81
        Portfolio2   2011-03-11   -$11432.24
        Portfolio3   2011-05-21   -$1476.93

  22. Data model: Analytics

      10dayreturns
        ticker  rdate        return
        GOOG    2011-07-25   $8.23
        GOOG    2011-07-24   $6.14
        GOOG    2011-07-23   $7.78
        AAPL    2011-07-25   $15.32
        AAPL    2011-07-24   $12.68

        INSERT OVERWRITE TABLE 10dayreturns
        SELECT a.row_key ticker, b.column_name rdate, b.value - a.value
        FROM StockHist a
        JOIN StockHist b
          ON (a.row_key = b.row_key AND date_add(a.column_name, 10) = b.column_name);
  23. The same row, as stored and as a Hive table

                2011-01-01  2011-01-02  2011-01-03
        GOOG    $79.85      $75.23      $82.11

        row_key  column_name  value
        GOOG     2011-01-01   $8.23
        GOOG     2011-01-02   $6.14
        GOOG     2011-01-03   $7.78
  24. Data model: Analytics

      portfolio_returns
        portfolio    rdate        preturn
        Portfolio1   2011-07-25   $118.21
        Portfolio1   2011-07-24   $60.78
        Portfolio1   2011-07-23   -$34.81
        Portfolio2   2011-07-25   $2143.92
        Portfolio3   2011-07-24   -$10.19

        INSERT OVERWRITE TABLE portfolio_returns
        SELECT row_key portfolio, rdate, SUM(b.return)
        FROM portfolios a
        JOIN 10dayreturns b ON (a.column_name = b.ticker)
        GROUP BY row_key, rdate;

  25. Data model: Analytics

      HistLoss
                     worst_date   loss
        Portfolio1   2011-07-23   -$34.81
        Portfolio2   2011-03-11   -$11432.24
        Portfolio3   2011-05-21   -$1476.93

        INSERT OVERWRITE TABLE HistLoss
        SELECT a.portfolio, rdate, minp
        FROM (
          SELECT portfolio, min(preturn) AS minp
          FROM portfolio_returns
          GROUP BY portfolio
        ) a
        JOIN portfolio_returns b
          ON (a.portfolio = b.portfolio AND a.minp = b.preturn);
  26. Portfolio Demo dataflow (diagram labels: Portfolios, Web-based Portfolios, Historical Prices, Live Prices for today, Intermediate Results, Largest loss)
  27. OpsCenter
  29. Where to get it
      ✤ http://www.datastax.com/brisk
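
The sparse rows on slides 10 to 12 and the column slice on slide 18 can be pictured with plain Python dictionaries. This is only a sketch under stated assumptions: `get_slice` is a hypothetical helper (not a Cassandra, Pig, or Brisk API), bounds are treated as inclusive, column names are compared as plain strings, and the lowercase bounds `"m"` and `"s"` are chosen to suit the lowercase names in the example rows.

```python
from bisect import bisect_left, bisect_right

# A column family pictured as: row key -> {column name -> value}.
# Rows are sparse: jbellis has a "site" column, the others do not.
users = {
    "zznate": {"password": "*", "name": "Nate McCall"},
    "driftx": {"password": "*", "name": "Brandon Williams"},
    "jbellis": {"password": "*", "name": "Jonathan Ellis", "site": "datastax.com"},
}

# Rows as containers: the column names themselves carry the data,
# so the values are left empty in this sketch.
circles = {
    "circle1": {n: "" for n in ["driftx", "thobbs", "pcmanus", "jbellis", "zznate"]},
    "circle2": {n: "" for n in ["xedin", "mdennis"]},
    "circle3": {n: "" for n in ["xedin", "pcmanus", "ymorishita"]},
}


def get_slice(row, start, end):
    """Hypothetical helper: return (name, value) pairs whose column
    names fall in [start, end], in sorted column order."""
    names = sorted(row)               # columns within a row are kept sorted
    lo = bisect_left(names, start)    # first name >= start
    hi = bisect_right(names, end)     # one past the last name <= end
    return [(n, row[n]) for n in names[lo:hi]]
```

Calling `get_slice(circles["circle1"], "m", "s")` returns only the columns of that one row whose names sort between the two bounds, which is the idea behind `slice_start` and `slice_end` in the Pig example.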
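
Slides 22 to 25 chain three Hive queries: 10-day returns per ticker, summed returns per portfolio, and the worst return per portfolio. The sketch below mirrors that dataflow in plain Python so the joins are easier to follow. It is not the demo's code: the prices and holdings are made-up sample data, the function names are hypothetical, and, like the query on slide 24, the portfolio sum ignores share counts. Where the Hive join would emit one row per tied minimum, this version keeps only the first one it meets.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up sample data (NOT from the slides), shaped like StockHist:
# ticker -> {date -> closing price}.
stock_hist = {
    "GOOG": {date(2011, 1, 1): 80.0, date(2011, 1, 2): 75.0,
             date(2011, 1, 11): 82.0, date(2011, 1, 12): 65.0},
    "AAPL": {date(2011, 1, 1): 180.0, date(2011, 1, 2): 185.0,
             date(2011, 1, 11): 186.0, date(2011, 1, 12): 190.0},
}

# Shaped like Portfolios: portfolio -> {ticker -> share count}.
portfolios = {
    "Portfolio1": {"GOOG": 80, "AAPL": 20},
    "Portfolio2": {"AAPL": 100},
}


def ten_day_returns(hist):
    """Self-join on date + 10 days: (ticker, rdate) -> price change."""
    out = {}
    for ticker, prices in hist.items():
        for day, price in prices.items():
            later = day + timedelta(days=10)
            if later in prices:
                out[(ticker, later)] = prices[later] - price
    return out


def portfolio_returns(holdings_by_portfolio, returns):
    """Join holdings to returns on ticker, then sum per (portfolio, rdate)."""
    totals = defaultdict(float)
    for name, holdings in holdings_by_portfolio.items():
        for (ticker, rdate), ret in returns.items():
            if ticker in holdings:
                totals[(name, rdate)] += ret
    return dict(totals)


def hist_loss(preturns):
    """Worst (minimum) return per portfolio: portfolio -> (worst_date, loss)."""
    worst = {}
    for (name, rdate), ret in preturns.items():
        if name not in worst or ret < worst[name][1]:
            worst[name] = (rdate, ret)
    return worst


returns = ten_day_returns(stock_hist)
preturns = portfolio_returns(portfolios, returns)
losses = hist_loss(preturns)
```

Each function takes the previous step's output, matching the way each `INSERT OVERWRITE TABLE` on the slides feeds the next query.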