Cassandra Hadoop Integration at HUG France by Piotr Kołaczkowski
Transcript

  1. Handling realtime and analytic workloads in a single cluster with Hadoop and Cassandra
      Piotr Kołaczkowski
      pkolaczk@datastax.com
      @pkolaczk
  2. Basic Cassandra + Hadoop Integration
      [Diagram: a Cassandra cluster of C* nodes runs alongside a Hadoop cluster with a NameNode & JobTracker
      and several DataNodes; data flows between the two clusters through CFIF (ColumnFamilyInputFormat)
      and CFOF (ColumnFamilyOutputFormat).]
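      A minimal sketch of how such a job is typically wired up with Cassandra's Hadoop classes. The keyspace,
      column family names, contact host, and class name below are illustrative assumptions, not taken from
      the slides:

          import java.nio.ByteBuffer;

          import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
          import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
          import org.apache.cassandra.hadoop.ConfigHelper;
          import org.apache.cassandra.thrift.SlicePredicate;
          import org.apache.cassandra.thrift.SliceRange;
          import org.apache.cassandra.utils.ByteBufferUtil;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapreduce.Job;

          public class CassandraJobSetup {
              public static Job configure(Configuration conf) throws Exception {
                  Job job = Job.getInstance(conf, "cassandra-mr-example");
                  job.setJarByClass(CassandraJobSetup.class);

                  // Read from Cassandra through ColumnFamilyInputFormat (CFIF).
                  job.setInputFormatClass(ColumnFamilyInputFormat.class);
                  ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");   // assumed contact point
                  ConfigHelper.setInputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.Murmur3Partitioner");
                  ConfigHelper.setInputColumnFamily(job.getConfiguration(), "demo", "users"); // assumed keyspace/CF

                  // CFIF needs a slice predicate; this one asks for every column of every row.
                  SlicePredicate predicate = new SlicePredicate().setSlice_range(
                          new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER, ByteBufferUtil.EMPTY_BYTE_BUFFER,
                                         false, Integer.MAX_VALUE));
                  ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

                  // Write back to Cassandra through ColumnFamilyOutputFormat (CFOF).
                  job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
                  ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
                  ConfigHelper.setOutputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.Murmur3Partitioner");
                  ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "demo", "user_stats"); // assumed output CF
                  return job;
              }
          }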
  3. ColumnFamilyInputFormat
      Key:   ByteBuffer (the row key)
      Value: SortedMap<ByteBuffer, IColumn>, keyed by column name; each IColumn carries (column name, value, timestamp)
      Example column family:
        jim    | age: 36 | car: camaro | gender: M
        carol  | age: 37 | car: subaru
        johnny | age: 12 | gender: M
        suzy   | age: 10 | gender: F
  4. ColumnFamilyInputFormat
      Input Key:   jim
      Input Value: age: 36 | car: camaro | gender: M
  5. ColumnFamilyInputFormat
      Input Key:   carol
      Input Value: age: 37 | car: subaru
  6. ColumnFamilyInputFormat
      Input Key:   johnny
      Input Value: age: 12 | gender: M
  7. ColumnFamilyInputFormat
      Input Key:   suzy
      Input Value: age: 10 | gender: F
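      To make the record types concrete, here is a sketch of a mapper that consumes these CFIF records and
      counts people per gender. The class name and the counting logic are illustrative assumptions, not part
      of the original deck:

          import java.nio.ByteBuffer;
          import java.util.SortedMap;

          import org.apache.cassandra.db.IColumn;
          import org.apache.cassandra.utils.ByteBufferUtil;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;

          // Each input record is one Cassandra row: the row key plus a sorted map of its columns.
          public class GenderCountMapper
                  extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable> {

              private static final IntWritable ONE = new IntWritable(1);

              @Override
              protected void map(ByteBuffer rowKey, SortedMap<ByteBuffer, IColumn> columns, Context context)
                      throws java.io.IOException, InterruptedException {
                  IColumn gender = columns.get(ByteBufferUtil.bytes("gender"));
                  if (gender != null) {
                      // Column values are ByteBuffers; decode this one as a UTF-8 string ("M" / "F").
                      context.write(new Text(ByteBufferUtil.string(gender.value())), ONE);
                  }
              }
          }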
  8. CFIF – Wide Row Support
      With wide-row support enabled, each input record carries a single column of a row rather than the whole row:
      Input Key:   jim
      Input Value: age: 36
  9. CFIF – Wide Row Support
      Input Key:   jim
      Input Value: car: camaro
  10. CFIF – Wide Row Support
      Input Key:   jim
      Input Value: gender: M
  11. CFIF – Wide Row Support
      Input Key:   carol
      Input Value: age: 37
  12. CFIF – Wide Row Support
      Input Key:   carol
      Input Value: car: subaru
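      Enabling this mode is a configuration change on the input side. A minimal sketch, assuming the same
      illustrative keyspace and column family as in the job setup above:

          import org.apache.cassandra.hadoop.ConfigHelper;
          import org.apache.hadoop.mapreduce.Job;

          public class WideRowSetup {
              // The extra boolean flag asks CFIF to hand rows to the mapper column by column
              // instead of materializing each whole row in memory at once.
              public static void enableWideRows(Job job) {
                  // "demo" and "users" are assumed keyspace and column family names.
                  ConfigHelper.setInputColumnFamily(job.getConfiguration(), "demo", "users", true);
              }
          }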
  13. CFIF – Cassandra Secondary Index Support
      Only rows matching the index expression (here: carol, the only row with car: subaru) are fed to the mappers.

      IndexExpression expr = new IndexExpression(
              ByteBufferUtil.bytes("car"),
              IndexOperator.EQ,
              ByteBufferUtil.bytes("subaru"));

      ConfigHelper.setInputRange(job.getConfiguration(), Arrays.asList(expr));
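      Several index expressions can be combined; they are ANDed together, and Cassandra requires at least one
      equality comparison on an indexed column. A sketch with an assumed additional filter on the age column
      (the column names, values, and their encodings are illustrative):

          import java.util.Arrays;

          import org.apache.cassandra.hadoop.ConfigHelper;
          import org.apache.cassandra.thrift.IndexExpression;
          import org.apache.cassandra.thrift.IndexOperator;
          import org.apache.cassandra.utils.ByteBufferUtil;
          import org.apache.hadoop.mapreduce.Job;

          public class IndexFilterSetup {
              // Restrict CFIF input to rows where car = "subaru" AND age > 18.
              public static void restrictInput(Job job) {
                  IndexExpression carIsSubaru = new IndexExpression(
                          ByteBufferUtil.bytes("car"), IndexOperator.EQ, ByteBufferUtil.bytes("subaru"));
                  IndexExpression isAdult = new IndexExpression(
                          ByteBufferUtil.bytes("age"), IndexOperator.GT, ByteBufferUtil.bytes(18));

                  ConfigHelper.setInputRange(job.getConfiguration(), Arrays.asList(carIsSubaru, isAdult));
              }
          }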
  14. ColumnFamilyOutputFormat
      Key:   ByteBuffer (row key)
      Value: List<Mutation>
        – Mutation: insert or delete a column
      [Diagram: the ColumnFamilyRecordWriter queues writes and a Thrift client sends them to the Cassandra cluster.]
  15. CFOF – Creating Mutations

      ByteBuffer rowkey = ByteBufferUtil.bytes("carol");

      Column column = new Column();
      column.name = ByteBufferUtil.bytes("age");
      column.value = ByteBufferUtil.bytes(37);
      column.setTimestamp(System.currentTimeMillis() * 1000); // a write timestamp is required

      Mutation mutation = new Mutation();
      mutation.column_or_supercolumn = new ColumnOrSuperColumn();
      mutation.column_or_supercolumn.column = column;

      List<Mutation> mutations = new ArrayList<Mutation>();
      mutations.add(mutation);

      context.write(rowkey, mutations);
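      For context, a sketch of a full reducer built around this pattern, continuing the illustrative
      gender-count example from the mapper sketch above. The class name and the "count" column are assumptions:

          import java.io.IOException;
          import java.nio.ByteBuffer;
          import java.util.ArrayList;
          import java.util.List;

          import org.apache.cassandra.thrift.Column;
          import org.apache.cassandra.thrift.ColumnOrSuperColumn;
          import org.apache.cassandra.thrift.Mutation;
          import org.apache.cassandra.utils.ByteBufferUtil;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Reducer;

          // CFOF expects (ByteBuffer row key, List<Mutation>) records, so the reducer's output types must match.
          public class GenderCountReducer
                  extends Reducer<Text, IntWritable, ByteBuffer, List<Mutation>> {

              @Override
              protected void reduce(Text gender, Iterable<IntWritable> counts, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable c : counts)
                      sum += c.get();

                  Column column = new Column();
                  column.setName(ByteBufferUtil.bytes("count"));            // assumed column name
                  column.setValue(ByteBufferUtil.bytes(sum));
                  column.setTimestamp(System.currentTimeMillis() * 1000);   // microsecond timestamps

                  Mutation mutation = new Mutation();
                  mutation.column_or_supercolumn = new ColumnOrSuperColumn();
                  mutation.column_or_supercolumn.column = column;

                  List<Mutation> mutations = new ArrayList<Mutation>();
                  mutations.add(mutation);

                  // The gender value becomes the row key; CFOF routes the mutation to the right replicas.
                  context.write(ByteBufferUtil.bytes(gender.toString()), mutations);
              }
          }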
  16. BulkOutputFormat
      [Diagram: the BulkRecordWriter collects writes in a memory buffer and flushes them as SSTables
      (SSTable 1, SSTable 2, ..., SSTable N) into a Hadoop temporary directory.]
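      Switching a job to bulk output is mostly configuration, since BulkOutputFormat consumes the same
      (ByteBuffer, List<Mutation>) records as CFOF. A sketch, reusing the same assumed keyspace, column
      family, and contact point as earlier:

          import org.apache.cassandra.hadoop.BulkOutputFormat;
          import org.apache.cassandra.hadoop.ConfigHelper;
          import org.apache.hadoop.mapreduce.Job;

          public class BulkOutputSetup {
              // Replace CFOF with BulkOutputFormat; the reducer code does not need to change.
              public static void useBulkOutput(Job job) {
                  job.setOutputFormatClass(BulkOutputFormat.class);
                  ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");        // assumed contact point
                  ConfigHelper.setOutputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.Murmur3Partitioner");
                  ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "demo", "user_stats"); // assumed keyspace/CF
              }
          }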
  17. DataStax Enterprise: Cassandra and Hadoop in a Single Cluster
  18. Basic Features
      ● Single, simplified component
      ● Workload separation
      ● No SPOF
      ● Peer to peer
      ● JobTracker failover
      ● No additional Cassandra config
  19. System Administrators View
      Easy monitoring of your nodes, regardless of their workload type:

      Address          DC         Rack   Workload       Status  State   Load      Owns    Token
                                                                                          148873535527910577765226390751398592512
      101.202.204.101  Analytics  rack1  Analytics(JT)  Up      Normal  78,96 GB  12,50%  0
      101.202.204.102  Analytics  rack1  Analytics(TT)  Up      Normal  82,65 GB  12,50%  21267647932558653966460912964485513216
      101.202.204.103  Analytics  rack1  Analytics(TT)  Up      Normal  74,96 GB  12,50%  42535295865117307932921825928971026432
      101.202.204.104  Analytics  rack1  Analytics(TT)  Up      Normal  78,79 GB  12,50%  63802943797675961899382738893456539648
      101.202.204.105  Cassandra  rack1  Cassandra      Up      Normal  67,42 GB  12,50%  85070591730234615865843651857942052864
      101.202.204.106  Cassandra  rack1  Cassandra      Up      Normal  60,86 GB  12,50%  106338239662793269832304564822427566080
      101.202.204.107  Cassandra  rack1  Cassandra      Up      Normal  81,27 GB  12,50%  127605887595351923798765477786913079296
      101.202.204.108  Cassandra  rack1  Cassandra      Up      Normal  77,17 GB  12,50%  148873535527910577765226390751398592512
  20. Wait, but where are my files?
      [Diagram: a standard stack runs Hadoop M/R on top of HDFS; in DSE, Hadoop M/R runs on top of CFS,
      the Cassandra File System, served by the Cassandra server itself.]
  21. Cassandra File System Properties
      ● Decentralized
      ● Replicated
      ● HDFS compatible
        – compatible with Hadoop filesystem utilities
        – allows for running M/R programs on DSE without any change
      ● Compressed
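      Because CFS exposes the standard Hadoop FileSystem interface, existing code can talk to it through the
      usual API. A minimal sketch, assuming a typical DSE setup where CFS is addressed with a cfs:// URI and
      the host and paths below are placeholders:

          import java.io.BufferedReader;
          import java.io.InputStreamReader;
          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class CfsExample {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // CFS registers itself as just another Hadoop FileSystem implementation,
                  // addressed with a cfs:// URI instead of hdfs:// (host is an assumption).
                  FileSystem fs = FileSystem.get(URI.create("cfs://127.0.0.1/"), conf);

                  Path file = new Path("/tmp/hello.txt");
                  try (FSDataOutputStream out = fs.create(file, true)) {
                      out.writeBytes("hello from CFS\n");
                  }

                  try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
                      System.out.println(in.readLine());
                  }
              }
          }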
  22. CFS Architecture
  23. CFS Compaction
      ● Keeps track of deleted rows (blocks)
      ● When all blocks in an SSTable have been removed, deletes the whole SSTable
      [Diagram: blocks 1-10 written to Cassandra storage across SSTables at timestamps ts 1 to ts 4;
      once every block in an SSTable has been deleted, the SSTable itself is dropped.]
  24. Hive Integration
      ● CassandraHiveMetaStore
        – stores Hive database metadata in Cassandra
        – no need to run a separate RDBMS
      ● CassandraStorageHandler
        – allows for direct access to C* tables with CFIF and CFOF
  25. Hive Integration – Example

      CREATE EXTERNAL TABLE MyHiveTable
        (row_key string, col1 string, col2 string)
      STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
      TBLPROPERTIES ("cassandra.ks.name" = "MyCassandraKS");

      SELECT count(*) FROM MyHiveTable;

      Total MapReduce jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks determined at compile time: 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapred.reduce.tasks=<number>
      Starting Job = job_201306041030_0001, Tracking URL = http://192.168.123.10:50030/jobdetails.jsp?jobid=job_201306041030_0001
      Kill Command = /usr/bin/dse hadoop job -Dmapred.job.tracker=192.168.123.10:8012 -kill job_201306041030_0001
      Hadoop job information for Stage-1: number of mappers: 9; number of reducers: 1
      2013-06-04 15:11:54,573 Stage-1 map = 0%, reduce = 0%
      2013-06-04 15:11:58,622 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
      2013-06-04 15:11:59,691 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 1.04 sec
      ...
      2013-06-04 15:12:28,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:29,304 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:30,330 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      2013-06-04 15:12:31,339 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 31.91 sec
      MapReduce Total cumulative CPU time: 31 seconds 910 msec
      Ended Job = job_201306041030_0001
      MapReduce Jobs Launched:
      Job 0: Map: 9  Reduce: 1  Cumulative CPU: 31.91 sec  HDFS Read: 0  HDFS Write: 0  SUCCESS
      Total MapReduce CPU Time Spent: 31 seconds 910 msec
      OK
      1000000
      Time taken: 46.246 seconds
  26. Custom Column Mapping

      CREATE EXTERNAL TABLE Users
        (userid string, name string, email string, phone string)
      STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
      WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,user_name,primary_email,home_phone");

      Cassandra:  row key  user_name  primary_email  home_phone
      Hive:       userid   name       email          phone
