SlideShare a Scribd company logo
1 of 43
Download to read offline
Brisk: Truly peer­to­peer Hadoop
       

      srisatish.ambati AT gmail.com
      DataStax/OpenJDK
      @srisatish


                      
Brisk: Hive + Hadoop + Cassandra




                    
                                  @srisatish
Map Reduce




         
                 @srisatish
Have large sets of data & you can 
    work on small pieces in parallel. 




                     
                                    @srisatish
        
    Map Reduce
                 @srisatish
Multi­core map reduce framework, 
    Kunle, et al

                       
                                        @srisatish
                   
    Parallel Execution View   @srisatish
     
        @srisatish
     
        @srisatish
JobTracker
    NameNode
      HDFS




         
                 @srisatish
Write­once­read­many!
    File once created, written & closed need change




                            
                                                @srisatish
Move computation, not data




                 
                                 @srisatish
     
        @srisatish
DataNodes: Read, Write Blocks




                   
                                    @srisatish
NameNode: 
Single Master node
Single Machine Address space
Single Point of failure




                       
When “it” does not fit in a single node!
    … Enter the distributed dragon!



                  Enter the Cassandra:
                       High Scale
                      Peer­to­peer




                              
                                               @srisatish
NameNode




    DataNodes
           
One­kind­of­node!




        
Cassandra:
     High Scale
    Peer­to­peer




          
                   @srisatish
Portfolio Demo
Low latency
        Live tick prices for stocks.
Batch Analytics
        Historical EOD prices.
        Value at Risk.



    http://www.datastax.com/docs/0.8/brisk/brisk_demo
                                  
Demo URLs (good for this demo only)


http://ec2­50­19­4­143.compute­1.amazonaws.com:8888/opscenter/index.html
http://ec2­67­202­12­176.compute­1.amazonaws.com:50030/jobdetails.jsp?job
http://ec2­50­19­4­143.compute­1.amazonaws.com:8983/portfolio/




                                           
Dynamo, 2007
Bigtable, 2006




                        OSS, 2008




      Incubator, 2009      TLP, 2010
Y
                                       Key “C”
                           A
        W
            Cassandra:
             High Scale
    U
            Peer­to­peer           F
             No SPOF

        T
                               L
              P

                   
                                                 @srisatish
     
     
Brisk




         
            @srisatish
Brisk
    HowStuffWorks version




               
                            @srisatish
YDH security edition (soon to be Apache)
Apache Hive – Access via SQL like
  CassandraHandler
Cassandra 0.8




                         
Use ColumnFamilies
inode
sblock  




                 
                     @srisatish
 
     String keyspace = “cfs”;
     CfDef cf = new CfDef();
        cf.setName(inodeDefaultCf);
        cf.setComparator_type("BytesType");
     …
          
         cf.setName(sblockDefaultCf);
         cf.setKey_cache_size(1M);
         cf.setComment( 
    "Stores blocks of information associated with a inode");

    cf.setKeyspace(keyspace);
                                  
                                                           @srisatish
Consistency: R + W > N

"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";




                         
                                            @srisatish
Hadoop: 
job tracker, task tracker




                 
                            @srisatish
BriskSnitch: 
brisk nodes, cassandra nodes




                 
                               @srisatish
BriskSimpleSnitch.java

if(TrackerInitializer.isTrackerNode)
     {
           myDC = BRISK_DC;
          logger.info("Detected Hadoop trackers 
are enabled, setting my DC to " + myDC);
      }
 else
      {
            myDC = CASSANDRA_DC;
            logger.info("Looks like Vanilla Cassandra 
nodes, setting my DC to " + myDC);
      }
                               
                                                   @srisatish
Hive: SQL­like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)




                        
                                @srisatish
hive>  CREATE TABLE invites (foo INT, bar 
STRING)PARTITIONED BY (ds STRING);


hive>  LOAD DATA LOCAL INPATH 
'$BRISK_HOME/resources/hive/examples/files
/kv2.txt' OVERWRITE INTO TABLE invites 
PARTITION (ds='2008­08­15');


hive>  SELECT count(*), ds FROM invites 
GROUP BY ds;


                                              
    http://www.datastax.com/docs/0.8/brisk/about_hive   @srisatish
ETL
      Real­time
    Cassandra CFs
     DataCenters
        Scale




           
                    @srisatish
     
        @srisatish
No me in team!
    ●   Ben Coverston                ●   Michael Allen
    ●   Ben Werther                  ●   Mike Bulman
    ●   Brandon Williams             ●   Michael Weir
    ●   Cathy Daw                    ●   Nate McCall
    ●   Daria Hutchinson             ●   Nick M Bailey
    ●   Jackson Chung                ●   Patricio Echague
    ●   Jake Luciani                 ●   Tyler Hobbs
    ●   Joaquin Casares              ●   SriSatish Ambati
    ●   Jonathan Ellis               ●   Yewei Zhang



                                  
                                                            @srisatish
                          
    100­node Brisk Cluster on Opscenter
                                          @srisatish
Dynamo, 2007
Bigtable, 2006              +

                               OSS, 2008


                 Incubator 2009
                                                    TLP, 2010


                                                  Cassandra
                           +          +


                                               Brisk
                                            
git clone git@github.com:riptano/brisk.git
http://www.datastax.com/product/brisk
Getting  Started via Brisk AMI.
Mahalo. Thank You. 




                          
                                             @srisatish
References
    ●   MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and 
        Sanjay Ghemawat, http://bit.ly/googmr_pdf
    ●   Multi­core MapReduce, Kunle, et al. http://bit.ly/iRJd1n




                                                  
                                                                                  @srisatish

More Related Content

What's hot

Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
DataStax
 
Cloud Friendly Hadoop and Hive
Cloud Friendly Hadoop and HiveCloud Friendly Hadoop and Hive
Cloud Friendly Hadoop and Hive
DataWorks Summit
 

What's hot (20)

SSTable Reader Cassandra Day Denver 2014
SSTable Reader Cassandra Day Denver 2014SSTable Reader Cassandra Day Denver 2014
SSTable Reader Cassandra Day Denver 2014
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data Locality
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Spark application on ec2 cluster
Spark application on ec2 clusterSpark application on ec2 cluster
Spark application on ec2 cluster
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 
Zero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and CassandraZero to Streaming: Spark and Cassandra
Zero to Streaming: Spark and Cassandra
 
Cloud Friendly Hadoop and Hive
Cloud Friendly Hadoop and HiveCloud Friendly Hadoop and Hive
Cloud Friendly Hadoop and Hive
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
 

Similar to Brisk hadoop june2011

Similar to Brisk hadoop june2011 (20)

Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjava
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Big Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David GiardBig Data Solutions in Azure - David Giard
Big Data Solutions in Azure - David Giard
 
Dask: Scaling Python
Dask: Scaling PythonDask: Scaling Python
Dask: Scaling Python
 
Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?Can we run the Whole Web on Apache Sling?
Can we run the Whole Web on Apache Sling?
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
 
JS App Architecture
JS App ArchitectureJS App Architecture
JS App Architecture
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Riak add presentation
Riak add presentationRiak add presentation
Riak add presentation
 

More from srisatish ambati

How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in JavaHow to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Java
srisatish ambati
 

More from srisatish ambati (11)

H2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business TransformationH2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business Transformation
 
Digital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open SourceDigital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open Source
 
Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
Jvm goes big_data_sfjava
Jvm goes big_data_sfjavaJvm goes big_data_sfjava
Jvm goes big_data_sfjava
 
jvm goes to big data
jvm goes to big datajvm goes to big data
jvm goes to big data
 
Svccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandraSvccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandra
 
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
 
How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in JavaHow to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Java
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
 
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Brisk hadoop june2011