Brisk: Truly peer-to-peer Hadoop
       

      srisatish.ambati AT gmail.com
      Apache Cassandra/OpenJDK
      @srisatish


                      
Brisk: Hive + Hadoop + Cassandra




                    
Map Reduce




         
You have large data sets and can work on small pieces of them
    in parallel.
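To make the idea concrete, here is a minimal, single-process Java sketch of the MapReduce pattern using word count: the map step turns each input split into (word, 1) pairs, a shuffle groups the pairs by key, and the reduce step sums each group. The class and method names are hypothetical and this is not Hadoop's API; the strings simply stand in for the "small pieces" of a large data set.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, in-memory illustration of map -> shuffle -> reduce (not Hadoop's API).
class WordCountSketch {

    // Map: turn one input split into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce: sum all counts emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> splits) {
        // Map phase: splits are independent, so they can be processed in parallel.
        List<Map.Entry<String, Integer>> pairs = splits.parallelStream()
                .flatMap(s -> map(s).stream())
                .collect(Collectors.toList());

        // Shuffle phase: group the emitted values by key (sorted for readable output).
        Map<String, List<Integer>> grouped = pairs.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Reduce phase: one reduce call per key.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }
}
```

For example, `WordCountSketch.run(Arrays.asList("a b a", "b c"))` returns `{a=2, b=2, c=1}`.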




                     
    Map Reduce
Multi-core MapReduce framework,
    Kunle et al.

                       
    Parallel Execution View
JobTracker
    NameNode
      HDFS




         
Write-once-read-many!
    A file, once created, written, and closed, need not be changed.
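The write-once-read-many rule can be illustrated with a small, hypothetical Java class: bytes may be written only while the file is open, closing it freezes the contents, and any number of readers can then read them. This sketches the semantics only; it is not the HDFS client API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration of write-once-read-many semantics.
class WriteOnceFile {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private byte[] frozen; // set once the file is closed

    // Writes are allowed only before close().
    void write(byte[] data) {
        if (frozen != null) {
            throw new IllegalStateException("file is closed; contents cannot change");
        }
        buffer.write(data, 0, data.length);
    }

    // Closing freezes the contents for good; calling it twice is harmless.
    void close() {
        if (frozen == null) {
            frozen = buffer.toByteArray();
        }
    }

    // Reads are allowed only after close(); each reader gets its own copy.
    byte[] read() {
        if (frozen == null) {
            throw new IllegalStateException("file is still being written");
        }
        return Arrays.copyOf(frozen, frozen.length);
    }
}
```

Because closed contents never change, readers need no coordination with writers, which is what makes aggressive replication and caching of blocks simple.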




                            
Move computation, not data
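One way to read "move computation, not data" is as a scheduling rule: when a worker has a free slot, prefer a task whose input block already has a replica on that worker, and fall back to a remote task only if none exists. The sketch below is hypothetical (names, data structures, and the simple local-then-any policy are assumptions); Hadoop's own scheduler also weighs rack locality.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical locality-aware task picker: run the code where the data already is.
class LocalityScheduler {

    // A pending task reads one block; replicaHosts lists the nodes storing that block.
    static final class Task {
        final String blockId;
        final Set<String> replicaHosts;

        Task(String blockId, Set<String> replicaHosts) {
            this.blockId = blockId;
            this.replicaHosts = replicaHosts;
        }
    }

    private final List<Task> pending = new ArrayList<>();

    void submit(Task t) {
        pending.add(t);
    }

    // Called when a worker has a free slot; returns null if nothing is pending.
    Task assign(String workerHost) {
        // First choice: a data-local task, so the block is read from local disk.
        for (int i = 0; i < pending.size(); i++) {
            if (pending.get(i).replicaHosts.contains(workerHost)) {
                return pending.remove(i);
            }
        }
        // Fallback: any task; its block must then travel over the network.
        return pending.isEmpty() ? null : pending.remove(0);
    }
}
```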




                 
DataNodes: Read, Write Blocks
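DataNodes store each file as a sequence of fixed-size blocks, and each block is replicated on several DataNodes. The hypothetical helper below splits a byte array into blocks and picks replica hosts round-robin. The tiny block size and round-robin placement are illustrative assumptions; HDFS uses much larger blocks (64 MB by default in Hadoop releases of this period) and a rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split a file into blocks and place replicas on DataNodes.
class BlockSplitter {

    // Split data into chunks of at most blockSize bytes; the last block may be shorter.
    static List<byte[]> split(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int end = Math.min(off + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, off, end));
        }
        return blocks;
    }

    // Pick up to `replication` distinct DataNodes for block number blockIndex, round-robin.
    static List<String> placeReplicas(int blockIndex, List<String> dataNodes, int replication) {
        List<String> hosts = new ArrayList<>();
        int n = Math.min(replication, dataNodes.size());
        for (int r = 0; r < n; r++) {
            hosts.add(dataNodes.get((blockIndex + r) % dataNodes.size()));
        }
        return hosts;
    }
}
```

For example, `split(new byte[10], 4)` yields blocks of 4, 4, and 2 bytes.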




                   
NameNode: 
Single Master node
Single Machine Address space
Single Point of failure




                       
When “it” does not fit in a single node!
    … Enter the distributed dragon!



                  Enter the Cassandra:
                       High Scale
                      Peer-to-peer




                              
NameNode




    DataNodes
           
One-kind-of-node!




        
Cassandra:
     High Scale
    Peer-to-peer




          
Portfolio Demo
Low latency
        Live tick prices for stocks.
Batch Analytics
        Historical EOD prices.
        Value at Risk.
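As a generic illustration of the batch side (not necessarily how the demo computes it), one-day historical-simulation Value at Risk for a single position takes the daily returns implied by historical EOD prices, sorts them, and reads off the loss at the (1 - confidence) quantile. The class name and the simple lowest-order-statistic quantile convention are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch of one-day historical-simulation VaR for a single position.
class HistoricalVaR {

    // Simple daily returns from consecutive end-of-day prices (needs at least 2 prices).
    static double[] dailyReturns(double[] eodPrices) {
        double[] r = new double[eodPrices.length - 1];
        for (int i = 1; i < eodPrices.length; i++) {
            r[i - 1] = eodPrices[i] / eodPrices[i - 1] - 1.0;
        }
        return r;
    }

    // VaR at the given confidence (e.g. 0.95): the loss exceeded on roughly
    // (1 - confidence) of historical days, scaled by the position value.
    static double valueAtRisk(double[] eodPrices, double positionValue, double confidence) {
        double[] r = dailyReturns(eodPrices);
        Arrays.sort(r); // worst returns first
        int idx = (int) Math.floor((1.0 - confidence) * r.length);
        idx = Math.min(idx, r.length - 1);
        return Math.max(0.0, -r[idx] * positionValue); // report a loss as a positive number
    }
}
```

Over years of EOD prices for many tickers this is an embarrassingly parallel batch job, which is why it fits the Hadoop side of the demo rather than the low-latency side.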



    http://www.datastax.com/docs/0.8/brisk/brisk_demo
                                  
Demo URLs (good for this demo only)


http://ec2-50-19-4-143.compute-1.amazonaws.com:8888/opscenter/index.html
http://ec2-67-202-12-176.compute-1.amazonaws.com:50030/jobdetails.jsp?job
http://ec2-50-19-4-143.compute-1.amazonaws.com:8983/portfolio/




                                           
[Figure: Cassandra timeline: Dynamo, 2007 and Bigtable, 2006 as inspirations; open-sourced (OSS), 2008; Apache Incubator, 2009; top-level project (TLP), 2010]
[Figure: token ring of nodes A, F, L, P, T, U, W and Y, with key “C” placed on the ring. Cassandra: High Scale, Peer-to-peer, No SPOF]
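The ring can be sketched as consistent hashing: each node owns a token, a key is hashed to a token, and the key's replicas are the first nodes found walking clockwise from it, wrapping past the highest token. This Java sketch is hypothetical and simplified (unsigned MD5 tokens in the spirit of RandomPartitioner, a SimpleStrategy-like replica walk, one token per node); it is not Cassandra's code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a token ring (consistent hashing); not Cassandra's own code.
class TokenRing {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // Hash a key to a non-negative 128-bit token with MD5.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Assumes each node is added once, with a single token.
    void addNode(String node, BigInteger nodeToken) {
        ring.put(nodeToken, node);
    }

    // First node clockwise from the token is the primary replica; the following
    // nodes hold the extra copies. No node is special, so there is no master to lose.
    List<String> replicasFor(BigInteger t, int replicationFactor) {
        List<String> out = new ArrayList<>();
        int n = Math.min(replicationFactor, ring.size());
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(t);
        for (int i = 0; i < n; i++) {
            if (e == null) {
                e = ring.firstEntry(); // wrap around past the highest token
            }
            out.add(e.getValue());
            e = ring.higherEntry(e.getKey());
        }
        return out;
    }

    List<String> replicasForKey(String key, int replicationFactor) {
        return replicasFor(token(key), replicationFactor);
    }
}
```

Adding or removing a node only moves the keys in that node's arc of the ring, which is what lets every node be the same kind of node.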

                   
“Dynamic” column families

Following (row key, then column names)
    zznate    driftx:   thobbs:
    driftx
    thobbs    zznate:
    jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
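In a "dynamic" column family each row can hold a different, open-ended set of columns, kept sorted by the column comparator; in the Following example the column names themselves carry the data (who is followed) and the values are empty. The hypothetical in-memory model below illustrates that shape with nested sorted maps; it is not Cassandra's storage engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a "dynamic" column family:
// row key -> (column name -> value), columns kept sorted by name.
class DynamicColumnFamily {
    private final SortedMap<String, SortedMap<String, byte[]>> rows = new TreeMap<>();

    // Rows need not share columns, and a row can hold any number of them.
    void insert(String rowKey, String column, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Column names of one row, in comparator (here: String) order.
    List<String> columnNames(String rowKey) {
        SortedMap<String, byte[]> row = rows.get(rowKey);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }
}
```

For instance, inserting `("zznate", "thobbs")` and then `("zznate", "driftx")` with empty values makes `columnNames("zznate")` return `[driftx, thobbs]`, matching the first row above.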
     
     
Brisk




         
Brisk
    HowStuffWorks version




               
YDH security edition (soon to be Apache)
Apache Hive: SQL-like access
Cassandra 0.8
CQL Interface
Apache Thrift




                         
Use ColumnFamilies
inode
sblock  




                 
 
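import org.apache.cassandra.thrift.CfDef;

String keyspace = "cfs";

// inode column family: file metadata, one row per file
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…

// sblock column family: the blocks of data associated with each inode
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000);   // "1M" on the slide: cache up to 1,000,000 keys
cf.setComment("Stores blocks of information associated with an inode");

cf.setKeyspace(keyspace);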
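The two column families can be pictured as follows: an `inode` row per file path records the ordered list of the file's block IDs, and an `sblock` row per block holds that block's bytes split into sub-block columns. The Java below is a hypothetical in-memory model of that layout; the field names, UUID block IDs, and sizes are assumptions for illustration, not Brisk's actual CFS schema.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of an inode / sblock layout (not Brisk's real schema).
class CfsModel {
    // "inode": path -> ordered list of block IDs (other metadata omitted).
    final Map<String, List<UUID>> inode = new LinkedHashMap<>();
    // "sblock": block ID -> (sub-block index -> bytes), insertion-ordered.
    final Map<UUID, Map<Integer, byte[]>> sblock = new LinkedHashMap<>();

    // blockSize and subBlockSize must be positive.
    void writeFile(String path, byte[] data, int blockSize, int subBlockSize) {
        List<UUID> blockIds = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            byte[] block = Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length));
            Map<Integer, byte[]> subBlocks = new LinkedHashMap<>();
            for (int s = 0, i = 0; s < block.length; s += subBlockSize, i++) {
                subBlocks.put(i, Arrays.copyOfRange(block, s, Math.min(s + subBlockSize, block.length)));
            }
            UUID id = UUID.randomUUID();
            sblock.put(id, subBlocks);
            blockIds.add(id);
        }
        inode.put(path, blockIds);
    }

    // Read back by walking the inode's block list, then each block's sub-blocks in order.
    byte[] readFile(String path) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (UUID id : inode.get(path)) {
            for (byte[] part : sblock.get(id).values()) {
                out.write(part, 0, part.length);
            }
        }
        return out.toByteArray();
    }
}
```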
                                  
Consistency: R + W > N

"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";

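The arithmetic behind the slide: with N replicas, QUORUM means ⌊N/2⌋ + 1 replicas, so QUORUM reads plus QUORUM writes always give R + W > N, and any read set must then share at least max(0, R + W - N) ≥ 1 replica with the last write set. The hypothetical helper below just encodes that reasoning.

```java
// Hypothetical helper illustrating the R + W > N overlap rule.
class QuorumMath {

    // QUORUM is a majority of the N replicas: floor(N / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Minimum number of replicas any read set of size r must share with
    // any write set of size w, out of n replicas (pigeonhole principle).
    static int minOverlap(int r, int w, int n) {
        return Math.max(0, r + w - n);
    }

    // If R + W > N the sets always intersect, so a read sees the latest acknowledged write.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return minOverlap(r, w, n) > 0;
    }
}
```

With N = 3, QUORUM is 2, and 2 + 2 > 3 guarantees at least one overlapping replica; ONE/ONE (1 + 1 ≤ 3) does not.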



                         
Hadoop: 
job tracker, task tracker




                 
BriskSnitch: 
brisk nodes, cassandra nodes




                 
BriskSimpleSnitch.java

// Nodes running Hadoop trackers report the Brisk DC; all others report the Cassandra DC.
if (TrackerInitializer.isTrackerNode)
{
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
}
else
{
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
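The effect of the snitch can be sketched outside Cassandra: each node reports a data center based on whether it runs Hadoop trackers, splitting one cluster into an analytics group and a real-time group. Combined with a per-data-center replication strategy such as NetworkTopologyStrategy, each group can presumably hold its own replicas, so batch jobs need not compete with low-latency traffic. The DC name strings and the grouping helper are assumptions; the slide does not show the real `BRISK_DC` and `CASSANDRA_DC` values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the snitch idea: tracker nodes form one logical
// data center, plain Cassandra nodes another.
class WorkloadSnitchSketch {
    // Assumed DC names, for illustration only.
    static final String BRISK_DC = "Brisk";
    static final String CASSANDRA_DC = "Cassandra";

    static String dataCenterFor(boolean isTrackerNode) {
        return isTrackerNode ? BRISK_DC : CASSANDRA_DC;
    }

    // Group hosts by the DC each one would report.
    static Map<String, List<String>> groupByDc(Map<String, Boolean> trackerEnabledByHost) {
        Map<String, List<String>> byDc = new TreeMap<>();
        trackerEnabledByHost.forEach((host, tracker) ->
                byDc.computeIfAbsent(dataCenterFor(tracker), k -> new ArrayList<>()).add(host));
        return byDc;
    }
}
```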
                               
Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)




                        
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

hive> SELECT count(*), ds FROM invites GROUP BY ds;
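Semantically, the last query is a grouped count, which Hive compiles into MapReduce jobs. The plain-Java sketch below computes the same result over an in-memory list of rows, using a hypothetical `Invite` class that mirrors the table's columns (foo, bar, ds); it shows what the query means, not how Hive executes it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of: SELECT count(*), ds FROM invites GROUP BY ds;
class InvitesCount {

    // One row of the invites table; ds is the partition column.
    static final class Invite {
        final int foo;
        final String bar;
        final String ds;

        Invite(int foo, String bar, String ds) {
            this.foo = foo;
            this.bar = bar;
            this.ds = ds;
        }
    }

    // GROUP BY ds, then count(*) per group (sorted by ds for stable output).
    static Map<String, Long> countByDs(List<Invite> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(r -> r.ds, TreeMap::new, Collectors.counting()));
    }
}
```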


                                              
    http://www.datastax.com/docs/0.8/brisk/about_hive
ETL
      Real-time
    Cassandra CFs
     DataCenters
        Scale




           
No me in team!
    ●   Ben Coverston                ●   Jonathan Ellis
    ●   Ben Werther                  ●   Michael Allen
    ●   Brandon Williams             ●   Mike Bulman
    ●   Cathy Daw                    ●   Nate McCall
    ●   Daria Hutchinson             ●   Nick M Bailey
    ●   Eric Gilmore                 ●   Patricio Echague
    ●   Jackson Chung                ●   Tyler Hobbs
    ●   Jake Luciani                 ●   SriSatish Ambati
    ●   Joaquin Casares              ●   Yewei Zhang



                                  
    100-node Brisk Cluster on OpsCenter
[Figure: Dynamo, 2007 + Bigtable, 2006 → Cassandra (OSS, 2008; Incubator, 2009; TLP, 2010); Cassandra + (two further logos, not recoverable) → Brisk]
                                            
Git started:
git clone git@github.com:riptano/brisk.git
http://www.datastax.com/product/brisk
Getting started via the Brisk AMI.
Thank You. 




                          
References
    ●   MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and 
        Sanjay Ghemawat, http://bit.ly/googmr_pdf
    ●   Multi-core MapReduce, Kunle et al., http://bit.ly/iRJd1n




                                                  
Brisk hadoop june2011_sfjava