Brisk: More Powerful Hadoop
    Powered by Cassandra
    jbellis@datastax.com




Monday, July 25, 2011
The evolution of Analytics




                        Analytics + Realtime


The evolution of Analytics




                                    replication




                        Analytics                 Realtime



The evolution of Analytics




                         ETL




Brisk re-unifies realtime and analytics




The Traditional Hadoop Stack

          ✤    Master Nodes: Name Node, Secondary Name Node, Job Tracker, HBase Master, ZooKeeper, MetaStore
          ✤    Slave Nodes: Data Node, Task Tracker, Region Server
          ✤    Client Nodes: Pig, Hive


Brisk Architecture




Brisk Highlights

          ✤    Easy to deploy and operate
          ✤    No single points of failure
          ✤    Scale and change nodes with no downtime
          ✤    Cross-DC, multi-master clusters
          ✤    Allocate resources for OLAP vs OLTP
                ✤       With no ETL




Cassandra data model

          ✤    ColumnFamilies contain rows + columns
          ✤    (Not really schemaless for a while now)


                                  password              name             site
                        zznate           *       Nate McCall
                        driftx           *   Brandon Williams
                        jbellis          *      Jonathan Ellis   datastax.com




Sparse

                                  password         name
                        zznate
                                     *          Nate McCall

                                  password         name
                        driftx
                                     *       Brandon Williams

                                  password       name             site
                        jbellis
                                     *       Jonathan Ellis   datastax.com
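A minimal Python sketch of the sparse layout above, assuming a column family can be modeled as a dict of per-row dicts. This is an illustration only, not Cassandra's storage format; the helper names are hypothetical. Each row stores only the columns it actually has, so a row without `site` simply lacks that key.

```python
# Illustrative model only: a column family as {row_key: {column_name: value}}.
users = {
    "zznate": {"password": "*", "name": "Nate McCall"},
    "driftx": {"password": "*", "name": "Brandon Williams"},
    "jbellis": {"password": "*", "name": "Jonathan Ellis", "site": "datastax.com"},
}


def get_column(column_family, row_key, column_name, default=None):
    """Return one column value, or `default` if the row has no such column."""
    return column_family.get(row_key, {}).get(column_name, default)


def columns_present(column_family):
    """Map each row key to the sorted column names that row actually stores."""
    return {key: sorted(row) for key, row in column_family.items()}
```

Calling `get_column(users, "zznate", "site")` returns `None` because that row never stored a `site` column.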




Rows as containers / materialized views

                                  driftx   thobbs pcmanus jbellis zznate
                        circle1

                                  xedin    mdennis
                        circle2

                                  xedin     pcmanus    ymorishita
                        circle3
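The "row as container" idea can be sketched in Python, assuming each row is modeled as a set of column names (the members) with no meaningful values. The function names are hypothetical and the model ignores how Cassandra orders or stores columns.

```python
# Illustrative model only: each row is a container whose column *names* are the data.
circles = {
    "circle1": {"driftx", "thobbs", "pcmanus", "jbellis", "zznate"},
    "circle2": {"xedin", "mdennis"},
    "circle3": {"xedin", "pcmanus", "ymorishita"},
}


def members(circle):
    """Read one row: the whole answer comes from a single row lookup."""
    return sorted(circles.get(circle, ()))


def circles_containing(user):
    """Reverse lookup by scanning every row.

    A second column family keyed by user would avoid the scan; it is
    left out to keep the sketch short.
    """
    return sorted(name for name, row in circles.items() if user in row)
```

The point of the layout is that `members("circle1")` touches exactly one row, which is what makes it behave like a materialized view of the query "who is in this circle?".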




CassandraFS

          ✤    data stored as ByteBuffer internally -- excellent fit for blocks
          ✤    local reads mmap data directly (no RPC)
          ✤    blocks are compressed with Google Snappy
          ✤    hadoop distcp hdfs:///mydata cfs:///mydata
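A rough Python sketch of the block idea: a file's bytes are split into fixed-size blocks, and each block is compressed on its own. Assumptions to note: `zlib` stands in for Snappy (which is not in the Python standard library), the block size is a tiny hypothetical value, and the function names are invented for illustration. This is not how CassandraFS is implemented.

```python
import zlib

BLOCK_SIZE = 4  # hypothetical and deliberately tiny, so the example is easy to follow


def write_blocks(data, block_size=BLOCK_SIZE):
    """Split `data` (bytes) into blocks and compress each block independently."""
    return [
        zlib.compress(data[offset:offset + block_size])  # zlib is a stand-in for Snappy
        for offset in range(0, len(data), block_size)
    ]


def read_blocks(blocks):
    """Decompress the blocks in order and concatenate them back into the file."""
    return b"".join(zlib.decompress(block) for block in blocks)
```

Compressing per block means a reader can decompress only the blocks it needs instead of the whole file.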




Hive support

          ✤    Hive MetaStore in Cassandra
                ✤       Unified schema view from any node, with no external systems
                        and no SPOF
                ✤       Automatically maps Cassandra column families to Hive tables
          ✤    Supports static and dynamic column families (and supercolumns)




Hive: CFS and ColumnFamilies

         -- Hive table backed by CFS, loaded from a local file
         CREATE TABLE users (name STRING, zip INT);
         LOAD DATA LOCAL INPATH 'kv2.txt' OVERWRITE INTO TABLE users;


         -- Static column family: named columns map to Hive columns
         CREATE EXTERNAL TABLE Keyspace1.Users(name STRING, zip INT)
         STORED BY
         'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';


         -- Dynamic column family: each column becomes a (row_key, column_name, value) row
         CREATE EXTERNAL TABLE Keyspace1.Users
         (row_key STRING, column_name STRING, value STRING)
         STORED BY
         'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';




Pig Support

    ✤    With standard Cassandra:
         $ export PIG_HOME=/path/to/pig
         $ export PIG_INITIAL_ADDRESS=localhost
         $ export PIG_RPC_PORT=9160
         $ export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
         $ contrib/pig/bin/pig_cassandra

         grunt>

    ✤    With Brisk:
         $ bin/brisk pig
         grunt>


Pig: CFS and ColumnFamilies

         grunt> data = LOAD 'cfs:///example.txt' using PigStorage() as
         (name:chararray, value:long);


         data = LOAD 'cassandra://Demo1/Scores' using CassandraStorage()
         AS (key, columns: {T: tuple(name, value)});


         data = LOAD 'cassandra://Demo1/Scores&slice_start=M&slice_end=S'
         using CassandraStorage() AS (key, columns: {T: tuple(name,
         value)});
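What a column slice such as `slice_start=M&slice_end=S` selects can be sketched in Python. The sketch assumes columns are kept sorted by name and that both bounds are inclusive; the sample scores are hypothetical, and the comparison here is plain string ordering rather than a Cassandra comparator.

```python
from bisect import bisect_left, bisect_right


def slice_columns(columns, slice_start, slice_end):
    """Return the (name, value) pairs with slice_start <= name <= slice_end.

    `columns` must already be sorted by column name.
    """
    names = [name for name, _ in columns]
    lo = bisect_left(names, slice_start)   # first name >= slice_start
    hi = bisect_right(names, slice_end)    # one past the last name <= slice_end
    return columns[lo:hi]


# Hypothetical row from a Scores column family, sorted by column name.
scores = sorted([("Alice", 10), ("Mallory", 7), ("Nate", 12),
                 ("S", 1), ("Sam", 3), ("Zed", 9)])
```

With this data, `slice_columns(scores, "M", "S")` keeps `Mallory`, `Nate`, and `S`, but not `Sam`, because `"Sam"` sorts after `"S"`.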




Data model: Realtime
               LiveStocks
                                      last
                         GOOG        $95.52
                          AAPL       $186.10
                         AMZN        $112.98


                 Portfolios
                                     GOOG      LNKD       P        AMZN    AAPL
                        Portfolio1
                                      80        20       40        100       20


                 StockHist
                                     2011-01-01       2011-01-02     2011-01-03
                         GOOG
                                       $79.85          $75.23            $82.11



Data model: Analytics
               HistLoss
                                     worst_date    loss
                        Portfolio1   2011-07-23   -$34.81
                        Portfolio2   2011-03-11 -$11432.24
                        Portfolio3   2011-05-21 -$1476.93




Data model: Analytics
               10dayreturns
                   ticker      rdate     return
                   GOOG     2011-07-25   $8.23
                   GOOG     2011-07-24   $6.14
                   GOOG     2011-07-23   $7.78
                   AAPL     2011-07-25   $15.32
                   AAPL     2011-07-24   $12.68


              INSERT OVERWRITE TABLE 10dayreturns
              SELECT a.row_key ticker,
                     b.column_name rdate,
                     b.value - a.value
              FROM StockHist a
              JOIN StockHist b
              ON (a.row_key = b.row_key
                  AND date_add(a.column_name,10) = b.column_name);
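The self-join in this query can be sketched in Python, assuming StockHist is modeled as `{ticker: {iso_date: price}}` with prices as floats. The function name and the sample prices are hypothetical. For each start date it looks up the price 10 days later and, as in an inner join, emits a row only when that later date exists.

```python
from datetime import date, timedelta


def ten_day_returns(stock_hist, days=10):
    """Return sorted (ticker, rdate, return) rows, where rdate is start date + `days`."""
    rows = []
    for ticker, prices in stock_hist.items():
        for start, start_price in prices.items():
            end = (date.fromisoformat(start) + timedelta(days=days)).isoformat()
            if end in prices:  # inner join: skip start dates with no match
                rows.append((ticker, end, prices[end] - start_price))
    return sorted(rows)


# Hypothetical prices, chosen so the differences are exact in floating point.
sample_hist = {
    "GOOG": {
        "2011-01-01": 100.0,
        "2011-01-02": 101.0,
        "2011-01-11": 110.5,
        "2011-01-12": 99.0,
    }
}
```

Here `ten_day_returns(sample_hist)` yields one row for 2011-01-11 and one for 2011-01-12; the two later dates have no price 10 days after them, so they produce nothing.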



2011-01-01     2011-01-02   2011-01-03
                GOOG
                           $79.85         $75.23       $82.11




             row_key column_name      value
              GOOG    2011-01-01      $8.23
              GOOG    2011-01-02      $6.14
              GOOG    2011-01-03      $7.78
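The mapping from one wide row to (row_key, column_name, value) rows can be sketched in Python. This only illustrates the shape of the transformation and is not the storage handler's code; the function name is hypothetical, and the prices are those of the wide GOOG row, stored as floats for simplicity.

```python
def flatten_wide_rows(column_family):
    """Yield one (row_key, column_name, value) tuple per stored column."""
    for row_key in sorted(column_family):
        row = column_family[row_key]
        for column_name in sorted(row):
            yield (row_key, column_name, row[column_name])


# The wide StockHist row: one row per ticker, one column per date.
stock_hist = {
    "GOOG": {"2011-01-01": 79.85, "2011-01-02": 75.23, "2011-01-03": 82.11},
}
```

`list(flatten_wide_rows(stock_hist))` gives three tuples for the single GOOG row, one per date column, which is the tall form that Hive queries can join on.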




Data model: Analytics
               portfolio_returns
                    portfolio       rdate      preturn
                    Portfolio1   2011-07-25    $118.21
                    Portfolio1   2011-07-24     $60.78
                    Portfolio1   2011-07-23    -$34.81
                    Portfolio2   2011-07-25   $2143.92
                    Portfolio3   2011-07-24    -$10.19


               INSERT OVERWRITE TABLE portfolio_returns
               SELECT row_key portfolio,
                      rdate,
                      SUM(b.return)
               FROM portfolios a JOIN 10dayreturns b
               ON (a.column_name = b.ticker)
               GROUP BY row_key, rdate;
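A Python sketch of the same join and aggregation, assuming portfolios are modeled as `{portfolio: {ticker: shares}}` and the 10-day returns as a list of `(ticker, rdate, return)` tuples. Names and sample values are hypothetical. Like the query as written, it sums the per-ticker returns and does not multiply by the number of shares held.

```python
from collections import defaultdict


def portfolio_returns(portfolios, returns):
    """Sum returns per (portfolio, rdate) over the tickers each portfolio holds.

    `returns` must be a list (it is scanned once per portfolio).
    """
    totals = defaultdict(float)
    for portfolio, holdings in portfolios.items():
        for ticker, rdate, ret in returns:
            if ticker in holdings:  # join condition: column name == ticker
                totals[(portfolio, rdate)] += ret
    return dict(sorted(totals.items()))


# Hypothetical inputs, with values that are exact in floating point.
sample_portfolios = {"Portfolio1": {"GOOG": 80, "AAPL": 20}, "Portfolio2": {"AAPL": 5}}
sample_returns = [
    ("GOOG", "2011-07-25", 8.25),
    ("AAPL", "2011-07-25", 15.5),
    ("AAPL", "2011-07-24", 12.75),
]
```

For `Portfolio1` on 2011-07-25 the result is 8.25 + 15.5 = 23.75, regardless of the 80 and 20 share counts.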




Data model: Analytics
               HistLoss
                                     worst_date    loss
                        Portfolio1   2011-07-23   -$34.81
                        Portfolio2   2011-03-11 -$11432.24
                        Portfolio3   2011-05-21 -$1476.93



               INSERT OVERWRITE TABLE HistLoss
               SELECT a.portfolio, rdate, minp
               FROM (
                 SELECT portfolio, min(preturn) as minp
                 FROM portfolio_returns
                 GROUP BY portfolio
               ) a
               JOIN portfolio_returns b
               ON (a.portfolio = b.portfolio and a.minp = b.preturn);
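The "worst day per portfolio" step can be sketched in Python, assuming portfolio_returns is a list of `(portfolio, rdate, preturn)` tuples; the rows below are the ones shown on the portfolio_returns slide, with dollar amounts as floats. One difference to be aware of: on a tie the query returns every date that matches the minimum, while this sketch keeps only the first one it sees.

```python
def hist_loss(rows):
    """Return {portfolio: (worst_date, loss)} using the minimum preturn per portfolio."""
    worst = {}
    for portfolio, rdate, preturn in rows:
        if portfolio not in worst or preturn < worst[portfolio][1]:
            worst[portfolio] = (rdate, preturn)
    return worst


# Rows from the portfolio_returns slide.
sample_rows = [
    ("Portfolio1", "2011-07-25", 118.21),
    ("Portfolio1", "2011-07-24", 60.78),
    ("Portfolio1", "2011-07-23", -34.81),
    ("Portfolio2", "2011-07-25", 2143.92),
    ("Portfolio3", "2011-07-24", -10.19),
]
```

For `Portfolio1` this picks 2011-07-23 with a return of -34.81, matching the first HistLoss row on the slide.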



Portfolio Demo dataflow


     Portfolios               Web-based Portfolios
     Historical Prices        Live Prices for today
     Intermediate Results
     Largest loss             Largest loss




OpsCenter




Where to get it

    ✤    http://www.datastax.com/brisk




Monday, July 25, 2011
Monday, July 25, 2011

More Related Content

Similar to Brisk: more powerful Hadoop powered by Cassandra

Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)jbellis
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기
형우 안
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPsmueller_sandsmedia
 
DBXTalk: Smalltalk Relational Database Suite
DBXTalk: Smalltalk Relational Database SuiteDBXTalk: Smalltalk Relational Database Suite
DBXTalk: Smalltalk Relational Database Suite
Mariano Martínez Peck
 
Solving performance problems in MySQL without denormalization
Solving performance problems in MySQL without denormalizationSolving performance problems in MySQL without denormalization
Solving performance problems in MySQL without denormalization
dmcfarlane
 
Akiban Technologies: Renormalize
Akiban Technologies: RenormalizeAkiban Technologies: Renormalize
Akiban Technologies: Renormalize
Ariel Weil
 
Using object dependencies in sql server 2008 tech republic
Using object dependencies in sql server 2008   tech republicUsing object dependencies in sql server 2008   tech republic
Using object dependencies in sql server 2008 tech republicKaing Menglieng
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerliqiang xu
 
Advanced WAL File Management With OmniPITR
Advanced WAL File Management With OmniPITRAdvanced WAL File Management With OmniPITR
Advanced WAL File Management With OmniPITR
Robert Treat
 
Ext GWT 3.0 Theming and Appearances
Ext GWT 3.0 Theming and AppearancesExt GWT 3.0 Theming and Appearances
Ext GWT 3.0 Theming and Appearances
Sencha
 
Introducing Ext GWT 3.0
Introducing Ext GWT 3.0Introducing Ext GWT 3.0
Introducing Ext GWT 3.0
Sencha
 
Zookeeper In Simple Words
Zookeeper In Simple WordsZookeeper In Simple Words
Zookeeper In Simple Words
Fuqiang Wang
 
2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore
ikailan
 
Writing a Crawler with Python and TDD
Writing a Crawler with Python and TDDWriting a Crawler with Python and TDD
Writing a Crawler with Python and TDDAndrea Francia
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHPConFoo
 
Pets etiseo nice100505
Pets etiseo nice100505Pets etiseo nice100505
Pets etiseo nice100505
Shiva Kumar
 
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Cyrille Le Clerc
 
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Publicis Sapient Engineering
 

Similar to Brisk: more powerful Hadoop powered by Cassandra (20)

Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기
 
Coding Potpourri: MySQL
Coding Potpourri: MySQLCoding Potpourri: MySQL
Coding Potpourri: MySQL
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHP
 
DBXTalk: Smalltalk Relational Database Suite
DBXTalk: Smalltalk Relational Database SuiteDBXTalk: Smalltalk Relational Database Suite
DBXTalk: Smalltalk Relational Database Suite
 
Solving performance problems in MySQL without denormalization
Solving performance problems in MySQL without denormalizationSolving performance problems in MySQL without denormalization
Solving performance problems in MySQL without denormalization
 
Akiban Technologies: Renormalize
Akiban Technologies: RenormalizeAkiban Technologies: Renormalize
Akiban Technologies: Renormalize
 
Using object dependencies in sql server 2008 tech republic
Using object dependencies in sql server 2008   tech republicUsing object dependencies in sql server 2008   tech republic
Using object dependencies in sql server 2008 tech republic
 
Xldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastnerXldb2011 tue 1055_tom_fastner
Xldb2011 tue 1055_tom_fastner
 
Advanced WAL File Management With OmniPITR
Advanced WAL File Management With OmniPITRAdvanced WAL File Management With OmniPITR
Advanced WAL File Management With OmniPITR
 
Ext GWT 3.0 Theming and Appearances
Ext GWT 3.0 Theming and AppearancesExt GWT 3.0 Theming and Appearances
Ext GWT 3.0 Theming and Appearances
 
Introducing Ext GWT 3.0
Introducing Ext GWT 3.0Introducing Ext GWT 3.0
Introducing Ext GWT 3.0
 
Zookeeper In Simple Words
Zookeeper In Simple WordsZookeeper In Simple Words
Zookeeper In Simple Words
 
2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore
 
Writing a Crawler with Python and TDD
Writing a Crawler with Python and TDDWriting a Crawler with Python and TDD
Writing a Crawler with Python and TDD
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHP
 
Pets etiseo nice100505
Pets etiseo nice100505Pets etiseo nice100505
Pets etiseo nice100505
 
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
 
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
Paris NoSQL User Group - In Memory Data Grids in Action (without transactions...
 
Caridy patino - node-js
Caridy patino - node-jsCaridy patino - node-js
Caridy patino - node-js
 

More from jbellis

Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
jbellis
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
jbellis
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
jbellis
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
jbellis
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1jbellis
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionjbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Javajbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from javajbellis
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011jbellis
 

More from jbellis (20)

Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 

Recently uploaded

Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 

Recently uploaded (20)

Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !

Brisk: more powerful Hadoop powered by Cassandra

  • 1. Brisk: More Powerful Hadoop Powered by Cassandra jbellis@datastax.com Monday, July 25, 2011
  • 2. The evolution of Analytics: Analytics + Realtime
  • 3. The evolution of Analytics: replication between Analytics and Realtime
  • 4. The evolution of Analytics: ETL
  • 5. Brisk re-unifies realtime and analytics
  • 6. The Traditional Hadoop Stack
      Master Nodes: Name Node, Secondary Name Node, Job Tracker, HBase Master, ZooKeeper, MetaStore
      Slave Nodes: Data Node, Task Tracker, Region Server
      Client Nodes: Pig, Hive, Region Server
  • 9. Brisk Highlights
      ✤ Easy to deploy and operate
      ✤ No single points of failure
      ✤ Scale and change nodes with no downtime
      ✤ Cross-DC, multi-master clusters
      ✤ Allocate resources for OLAP vs OLTP
          ✤ With no ETL
  • 10. Cassandra data model
      ✤ ColumnFamilies contain rows + columns
      ✤ (Not really schemaless for a while now)
                 password   name               site
      zznate     *          Nate McCall
      driftx     *          Brandon Williams
      jbellis    *          Jonathan Ellis     datastax.com
  • 11. Sparse
      zznate:   password = *, name = Nate McCall
      driftx:   password = *, name = Brandon Williams
      jbellis:  password = *, name = Jonathan Ellis, site = datastax.com
  • 12. Rows as containers / materialized views
      circle1: driftx, thobbs, pcmanus, jbellis, zznate
      circle2: xedin, mdennis
      circle3: xedin, pcmanus, ymorishita
  • 14. CassandraFS
      ✤ Data stored as ByteBuffer internally -- excellent fit for blocks
      ✤ Local reads mmap data directly (no RPC)
      ✤ Blocks are compressed with Google Snappy
      ✤ hadoop distcp hdfs:///mydata cfs:///mydata
  • 15. Hive support
      ✤ Hive MetaStore in Cassandra
      ✤ Unified schema view from any node, with no external systems and no SPOF
      ✤ Automatically maps Cassandra column families to Hive tables
      ✤ Supports static and dynamic column families (and supercolumns)
  • 16. Hive: CFS and ColumnFamilies
      CREATE TABLE users (name STRING, zip INT);
      LOAD DATA LOCAL INPATH 'kv2.txt' OVERWRITE INTO TABLE users;

      CREATE EXTERNAL TABLE Keyspace1.Users (name STRING, zip INT)
        STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';

      CREATE EXTERNAL TABLE Keyspace1.Users (row_key STRING, column_name STRING, value STRING)
        STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler';
  • 17. Pig Support
      ✤ With standard Cassandra:
          $ export PIG_HOME=/path/to/pig
          $ export PIG_INITIAL_ADDRESS=localhost
          $ export PIG_RPC_PORT=9160
          $ export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
          $ contrib/pig/bin/pig_cassandra
          grunt>
      ✤ With Brisk:
          $ bin/brisk pig
          grunt>
  • 18. Pig: CFS and ColumnFamilies
      grunt> data = LOAD 'cfs:///example.txt' using PigStorage()
               as (name:chararray, value:long);

      data = LOAD 'cassandra://Demo1/Scores' using CassandraStorage()
               AS (key, columns: {T: tuple(name, value)});

      data = LOAD 'cassandra://Demo1/Scores&slice_start=M&slice_end=S' using CassandraStorage()
               AS (key, columns: {T: tuple(name, value)});
  • 20. Data model: Realtime
      LiveStocks
                    last
        GOOG        $95.52
        AAPL        $186.10
        AMZN        $112.98
      Portfolios
                    GOOG   LNKD   P     AMZN   AAPL
        Portfolio1  80     20     40    100    20
      StockHist
                    2011-01-01   2011-01-02   2011-01-03
        GOOG        $79.85       $75.23       $82.11
  • 21. Data model: Analytics
      HistLoss
                    worst_date   loss
        Portfolio1  2011-07-23   -$34.81
        Portfolio2  2011-03-11   -$11432.24
        Portfolio3  2011-05-21   -$1476.93
  • 22. Data model: Analytics
      10dayreturns
        ticker   rdate        return
        GOOG     2011-07-25   $8.23
        GOOG     2011-07-24   $6.14
        GOOG     2011-07-23   $7.78
        AAPL     2011-07-25   $15.32
        AAPL     2011-07-24   $12.68

      INSERT OVERWRITE TABLE 10dayreturns
      SELECT a.row_key ticker, b.column_name rdate, b.value - a.value
      FROM StockHist a JOIN StockHist b
        ON (a.row_key = b.row_key AND date_add(a.column_name, 10) = b.column_name);
  • 23. The StockHist row as Cassandra stores it:
                    2011-01-01   2011-01-02   2011-01-03
        GOOG        $79.85       $75.23       $82.11
      The same data as rows of (row_key, column_name, value):
        row_key   column_name   value
        GOOG      2011-01-01    $8.23
        GOOG      2011-01-02    $6.14
        GOOG      2011-01-03    $7.78
  • 24. Data model: Analytics
      portfolio_returns
        portfolio    rdate        preturn
        Portfolio1   2011-07-25   $118.21
        Portfolio1   2011-07-24   $60.78
        Portfolio1   2011-07-23   -$34.81
        Portfolio2   2011-07-25   $2143.92
        Portfolio3   2011-07-24   -$10.19

      INSERT OVERWRITE TABLE portfolio_returns
      SELECT row_key portfolio, rdate, SUM(b.return)
      FROM portfolios a JOIN 10dayreturns b
        ON (a.column_name = b.ticker)
      GROUP BY row_key, rdate;
  • 25. Data model: Analytics
      HistLoss
                    worst_date   loss
        Portfolio1  2011-07-23   -$34.81
        Portfolio2  2011-03-11   -$11432.24
        Portfolio3  2011-05-21   -$1476.93

      INSERT OVERWRITE TABLE HistLoss
      SELECT a.portfolio, rdate, minp
      FROM (
        SELECT portfolio, MIN(preturn) AS minp
        FROM portfolio_returns
        GROUP BY portfolio
      ) a JOIN portfolio_returns b
        ON (a.portfolio = b.portfolio AND a.minp = b.preturn);
  • 26. Portfolio Demo dataflow: Portfolios, Web-based Portfolios, Historical Prices, Live Prices for today, Intermediate Results, Largest loss
  • 29. Where to get it
      ✤ http://www.datastax.com/brisk
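
The three Hive queries on slides 22, 24, and 25 form a small pipeline: a self-join on StockHist yields 10-day returns, a join with Portfolios sums them per portfolio and date, and a final aggregation keeps each portfolio's worst date. The following is a minimal Python sketch of that logic on in-memory dictionaries, not Brisk or Hive code. The function names and the sample prices and holdings are hypothetical, chosen only for illustration. Like the query on slide 24, it sums per-ticker returns without weighting by the number of shares held.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample data shaped like the column families on slide 20:
# row key -> {column name -> value}.
STOCK_HIST = {
    "GOOG": {date(2011, 1, 1): 100.0, date(2011, 1, 2): 102.0,
             date(2011, 1, 11): 95.0, date(2011, 1, 12): 108.0},
    "AAPL": {date(2011, 1, 1): 200.0, date(2011, 1, 2): 210.0,
             date(2011, 1, 11): 204.0, date(2011, 1, 12): 205.0},
}
PORTFOLIOS = {
    "Portfolio1": {"GOOG": 80, "AAPL": 20},
    "Portfolio2": {"AAPL": 100},
}


def ten_day_returns(stock_hist):
    """Self-join each ticker's prices on (date, date + 10 days), as on slide 22."""
    rows = []
    for ticker, prices in stock_hist.items():
        for day, price in prices.items():
            later = day + timedelta(days=10)
            if later in prices:
                rows.append((ticker, later, prices[later] - price))
    return rows


def portfolio_returns(portfolios, returns):
    """Sum per-ticker returns for each (portfolio, rdate), as on slide 24."""
    totals = defaultdict(float)
    for name, holdings in portfolios.items():
        for ticker, rdate, ret in returns:
            if ticker in holdings:
                totals[(name, rdate)] += ret
    return dict(totals)


def hist_loss(preturns):
    """Keep the minimum return and its date per portfolio, as on slide 25.

    On ties this keeps the first date seen, whereas the Hive join would
    emit one row per tied date.
    """
    worst = {}
    for (name, rdate), preturn in preturns.items():
        if name not in worst or preturn < worst[name][1]:
            worst[name] = (rdate, preturn)
    return worst
```

Calling `hist_loss(portfolio_returns(PORTFOLIOS, ten_day_returns(STOCK_HIST)))` chains the three steps in the same order as the slides. In Brisk the intermediate results would be written back to Hive tables rather than held in memory.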