SlideShare a Scribd company logo
End-to-end Analytics
          with Apache Cassandra




C*                                #cassandra12
Basics
     • CFIF/CFOF/CFRR/CFRW
     • BulkOutputFormat
     • Input locality (identical to HDFS)
     • Wide row, 2I, composite support
     • Pig, Hive, Mahout, Sqoop, Oozie, DSE
       Analytics

C*                                  #cassandra12
Why Cassandra?

     • Excellent Hadoop capabilities built-in
     • Multi-datacenter support, load isolation
     • Operationally, order of magnitude simpler
     • DSE Analytics is all-in-one, simpler still
     • Review your requirements, do homework
C*                                       #cassandra12
Use Cases

     • Trends, recommendations, reporting, etc.
     • Detect and fix problems in data
     • New realtime (or analytic) query pattern?
      • Backpopulate new CF with historical data

C*                                    #cassandra12
Data Model
     • Consider growth patterns and analytic
       query patterns for your data
      • One-off inquiry or regular processing?
      • Fast growing, want only small slices?
     • Consider active/archive CFs
     • Secondary indexes for small inputs
C*                                      #cassandra12
Miscellaneous Tips


     • Don’t forget about tombstones
     • BOP to enable range slices (Rows with
       keys ‘A*’ to ‘F*’)




C*                                     #cassandra12
Cassandra + Oozie
     • Workflows: cohesive, nestable, scheduled
     • Web UI, CLI, web service
     • Cassandra properties in oozie job
       properties, and workflow.xml
     • Writing out to Cassandra:
       mapreduce.fileoutputcommitter.marksuccessfuljobs to false


     • DSE Analytics works with Oozie 3.2.1+
C*                                                            #cassandra12
Cassandra + Pig
     • Data with validators
      • Pig tuples (address.name, address.value)
     • Data with default validator
      • Bag of key, value pairs (tuples)
      • Unmarshal with Pygmalion
        • Select by regex (eg ‘1369*’, ‘link*’)
C*                                      #cassandra12
Cassandra + Pig
     • Output to Cassandra
      • Can output directly with (key, (name,
         value), (name, value)...) format
      • For tabular data, format output with
         Pygmalion’s ToCassandraBag
      • Use BulkOutputFormat (C* 1.1)
C*                                          #cassandra12
Cassandra + Pig
     • Composite column support (C* 1.0.9+)
     • Counter support (C* 1.0.9+)
     • Secondary Index support for relatively
       small slices (C* 1.1+)
     • Wide row support (C* 1.1+)
     • Composite key support (C* 1.1.3+)
C*                                     #cassandra12
Cassandra + Hadoop X
     • Example: Cassandra + CDH3
      • Start with Cassandra ring
      • Add NN, JT, Oozie server
      • TaskTracker, DataNode on each node
      • Jobs/launching point have Cassandra info
      • Segregate out analytics with virtual DC
C*                                     #cassandra12
Current/Future Work


     • Cassandra core Hive support (outside of
       Brisk) (CASSANDRA-4131)




C*                                     #cassandra12
For More Information


     • Follow @CassandraHadoop on Twitter
     • http://wiki.apache.org/cassandra/
       HadoopSupport




C*                                  #cassandra12
Questions?




C*                #cassandra12

More Related Content

What's hot

Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
Joydeep Sen Sarma
 
מיכאל
מיכאלמיכאל
מיכאל
sqlserver.co.il
 
Hadoop sqoop
Hadoop sqoop Hadoop sqoop
Hadoop sqoop
Wei-Yu Chen
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
Siva Pandeti
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
Joydeep Sen Sarma
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
liuknag
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
Toshihiro Suzuki
 
Apache drill
Apache drillApache drill
Apache drill
Jakub Pieprzyk
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
Databricks
 
Partners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with HadoopPartners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with Hadoop
Stu Hood
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
Bouquet
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Adam Kawa
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
Nathan Bijnens
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
Joydeep Sen Sarma
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
 
Hadoop
HadoopHadoop
Hadoop
Cassell Hsu
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
Ricardo Varela
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
MapR Technologies
 

What's hot (20)

Nextag talk
Nextag talkNextag talk
Nextag talk
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
מיכאל
מיכאלמיכאל
מיכאל
 
Hadoop sqoop
Hadoop sqoop Hadoop sqoop
Hadoop sqoop
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
 
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
Apache drill
Apache drillApache drill
Apache drill
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Partners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with HadoopPartners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with Hadoop
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015Qubole @ AWS Meetup Bangalore - July 2015
Qubole @ AWS Meetup Bangalore - July 2015
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Hadoop
HadoopHadoop
Hadoop
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Real Time and Big Data – It’s About Time
Real Time and Big Data – It’s About TimeReal Time and Big Data – It’s About Time
Real Time and Big Data – It’s About Time
 

Viewers also liked

Plagger はじめよう
Plagger はじめようPlagger はじめよう
Plagger はじめよう
Seacolor
 
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping StudyFriends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
Ethan Hein
 
Build Your Brand and Build Your Business
Build Your Brand and Build Your BusinessBuild Your Brand and Build Your Business
Build Your Brand and Build Your Business
KeySplash Creative, Inc.
 
ISUCONの勝ち方 YAPC::Asia Tokyo 2015
ISUCONの勝ち方 YAPC::Asia Tokyo 2015ISUCONの勝ち方 YAPC::Asia Tokyo 2015
ISUCONの勝ち方 YAPC::Asia Tokyo 2015
Masahiro Nagano
 
Design process
Design processDesign process
Design process
Divonis.com
 
El Lenguaje Fotografico
El Lenguaje FotograficoEl Lenguaje Fotografico
El Lenguaje Fotografico
Rosa Fernández
 
Enduring CSS
Enduring CSSEnduring CSS
Enduring CSS
Takazudo
 
120 Awesome Marketing Stats, Charts and Graphs
120 Awesome Marketing Stats, Charts and Graphs120 Awesome Marketing Stats, Charts and Graphs
120 Awesome Marketing Stats, Charts and Graphs
HubSpot
 
DISPLAY LUMAscape
DISPLAY LUMAscapeDISPLAY LUMAscape
DISPLAY LUMAscape
LUMA Partners
 

Viewers also liked (9)

Plagger はじめよう
Plagger はじめようPlagger はじめよう
Plagger はじめよう
 
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping StudyFriends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
Friends Don't Let Friends Clap on One and Three: a Backbeat Clapping Study
 
Build Your Brand and Build Your Business
Build Your Brand and Build Your BusinessBuild Your Brand and Build Your Business
Build Your Brand and Build Your Business
 
ISUCONの勝ち方 YAPC::Asia Tokyo 2015
ISUCONの勝ち方 YAPC::Asia Tokyo 2015ISUCONの勝ち方 YAPC::Asia Tokyo 2015
ISUCONの勝ち方 YAPC::Asia Tokyo 2015
 
Design process
Design processDesign process
Design process
 
El Lenguaje Fotografico
El Lenguaje FotograficoEl Lenguaje Fotografico
El Lenguaje Fotografico
 
Enduring CSS
Enduring CSSEnduring CSS
Enduring CSS
 
120 Awesome Marketing Stats, Charts and Graphs
120 Awesome Marketing Stats, Charts and Graphs120 Awesome Marketing Stats, Charts and Graphs
120 Awesome Marketing Stats, Charts and Graphs
 
DISPLAY LUMAscape
DISPLAY LUMAscapeDISPLAY LUMAscape
DISPLAY LUMAscape
 

Similar to End-to-end Analytics with Apache Cassandra

PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
Frens Jan Rumph
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
Patricia Gorla
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
Matthias Niehoff
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
Victor Coustenoble
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
Brent Theisen
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
Brian Enochson
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
Victor Coustenoble
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
bigdatagurus_meetup
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
fardinjamshidi
 
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-..."Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
hamidsamadi
 
Spark Introduction
Spark IntroductionSpark Introduction
Spark Introduction
DataStax Academy
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
Roger Xia
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
Duyhai Doan
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
DataStax Academy
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
Michael Kjellman
 
Munich March 2015 - Cassandra + Spark Overview
Munich March 2015 -  Cassandra + Spark OverviewMunich March 2015 -  Cassandra + Spark Overview
Munich March 2015 - Cassandra + Spark Overview
Christopher Batey
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
T Jake Luciani
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Victor Coustenoble
 
Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with CassandraSperasoft
 

Similar to End-to-end Analytics with Apache Cassandra (20)

PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)Cassandra 2.0 (Introduction)
Cassandra 2.0 (Introduction)
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-..."Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...
 
Spark Introduction
Spark IntroductionSpark Introduction
Spark Introduction
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
 
Munich March 2015 - Cassandra + Spark Overview
Munich March 2015 -  Cassandra + Spark OverviewMunich March 2015 -  Cassandra + Spark Overview
Munich March 2015 - Cassandra + Spark Overview
 
Cassandra integrations
Cassandra integrationsCassandra integrations
Cassandra integrations
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Developing with Cassandra
Developing with CassandraDeveloping with Cassandra
Developing with Cassandra
 

More from Jeremy Hanna

Göteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache CassandraGöteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache Cassandra
Jeremy Hanna
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
Jeremy Hanna
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
Jeremy Hanna
 
Modern Cassandra for Developers
Modern Cassandra for DevelopersModern Cassandra for Developers
Modern Cassandra for DevelopersJeremy Hanna
 
Troubleshooting Cassandra
Troubleshooting CassandraTroubleshooting Cassandra
Troubleshooting Cassandra
Jeremy Hanna
 
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraCassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraJeremy Hanna
 
Cassandra eu
Cassandra euCassandra eu
Cassandra eu
Jeremy Hanna
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Jeremy Hanna
 
Cassandra+Hadoop
Cassandra+HadoopCassandra+Hadoop
Cassandra+Hadoop
Jeremy Hanna
 

More from Jeremy Hanna (9)

Göteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache CassandraGöteborg Distributed: Eventual Consistency in Apache Cassandra
Göteborg Distributed: Eventual Consistency in Apache Cassandra
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Modern Cassandra for Developers
Modern Cassandra for DevelopersModern Cassandra for Developers
Modern Cassandra for Developers
 
Troubleshooting Cassandra
Troubleshooting CassandraTroubleshooting Cassandra
Troubleshooting Cassandra
 
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache CassandraCassandra + Hadoop: Analisi Batch con Apache Cassandra
Cassandra + Hadoop: Analisi Batch con Apache Cassandra
 
Cassandra eu
Cassandra euCassandra eu
Cassandra eu
 
Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon Cassandra + Hadoop @ApacheCon
Cassandra + Hadoop @ApacheCon
 
Cassandra+Hadoop
Cassandra+HadoopCassandra+Hadoop
Cassandra+Hadoop
 

Recently uploaded

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 

Recently uploaded (20)

State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

End-to-end Analytics with Apache Cassandra

  • 1. End-to-end Analytics with Apache Cassandra C* #cassandra12
  • 2. Basics • CFIF/CFOF/CFRR/CFRW • BulkOutputFormat • Input locality (identical to HDFS) • Wide row, 2I, composite support • Pig, Hive, Mahout, Sqoop, Oozie, DSE Analytics C* #cassandra12
  • 3. Why Cassandra? • Excellent Hadoop capabilities built-in • Multi-datacenter support, load isolation • Operationally, order of magnitude simpler • DSE Analytics is all-in-one, simpler still • Review your requirements, do homework C* #cassandra12
  • 4. Use Cases • Trends, recommendations, reporting, etc. • Detect and fix problems in data • New realtime (or analytic) query pattern? • Backpopulate new CF with historical data C* #cassandra12
  • 5. Data Model • Consider growth patterns and analytic query patterns for your data • One-off inquiry or regular processing? • Fast growing, want only small slices? • Consider active/archive CFs • Secondary indexes for small inputs C* #cassandra12
  • 6. Miscellaneous Tips • Don’t forget about tombstones • BOP to enable range slices (Rows with keys ‘A*’ to ‘F*’) C* #cassandra12
  • 7. Cassandra + Oozie • Workflows: cohesive, nestable, scheduled • Web UI, CLI, web service • Cassandra properties in oozie job properties, and workflow.xml • Writing out to Cassandra: mapreduce.fileoutputcommitter.marksuccessfuljobs to false • DSE Analytics works with Oozie 3.2.1+ C* #cassandra12
  • 8. Cassandra + Pig • Data with validators • Pig tuples (address.name, address.value) • Data with default validator • Bag of key, value pairs (tuples) • Unmarshal with Pygmalion • Select by regex (eg ‘1369*’, ‘link*’) C* #cassandra12
  • 9. Cassandra + Pig • Output to Cassandra • Can output directly with (key, (name, value), (name, value)...) format • For tabular data, format output with Pygmalion’s ToCassandraBag • Use BulkOutputFormat (C* 1.1) C* #cassandra12
  • 10. Cassandra + Pig • Composite column support (C* 1.0.9+) • Counter support (C* 1.0.9+) • Secondary Index support for relatively small slices (C* 1.1+) • Wide row support (C* 1.1+) • Composite key support (C* 1.1.3+) C* #cassandra12
  • 11. Cassandra + Hadoop X • Example: Cassandra + CDH3 • Start with Cassandra ring • Add NN, JT, Oozie server • TaskTracker, DataNode on each node • Jobs/launching point have Cassandra info • Segregate out analytics with virtual DC C* #cassandra12
  • 12. Current/Future Work • Cassandra core Hive support (outside of Brisk) (CASSANDRA-4131) C* #cassandra12
  • 13. For More Information • Follow @CassandraHadoop on Twitter • http://wiki.apache.org/cassandra/ HadoopSupport C* #cassandra12
  • 14. Questions? C* #cassandra12

Editor's Notes

  1. \n
  2. wide row, 2I and composite key support in 1.1.\ncomposite column support in 1.0.9+\nwide row support is in both pig and hive\n
  3. \n
  4. So I have all this data...\nAll the normal use cases for Hadoop, plus some interesting use cases that are specific to things like Cassandra.\n
  5. If you are going to seriously use Cassandra with Hadoop, your analytic query patterns also have to be considered.\nFor active/archive CFs, denormalize by query - same as with RT query patterns\nYou can have lots of CFs, so no worries there\n
  6. \n
  7. OOZIE-477: hardcoded goodness if don’t want to use Oozie 3.2.1 (tiny patch)\n
  8. \n
  9. \n
  10. \n
  11. Dachis Group: in production with Cassandra + CDH3 going on 18 months.\nLaunching point may be a server from which you test or submit jobs. On that node, just put the Cassandra information in the default Hadoop conf (mapred-site.xml).\nFor tuning, just realize that Cassandra and Hadoop will share each node’s resources, but that you can scale one with the other. Both require memory, CPU, and IO.\n
  12. \n
  13. \n
  14. \n