From PoCs to Production
- 2. © 2016 DataStax, All Rights Reserved.
Agenda
• Pilots/PoCs - best practices
• Infrastructure
• Testing
• Coding
• Optimising for Production
- 3.
PoCs / Pilots
• Always start with the question: what are your queries?
• About 5 to 8 queries is common for a pilot.
• Build your REST API first.
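One way to act on "build your REST API first" is to fix the endpoint contract before the schema exists and serve it from an in-memory stub. Clients and tests can then be written while the queries are still being modelled. The following is a minimal sketch that assumes the JDK's built-in `com.sun.net.httpserver` server. The `/events/{date}` route, the class name and the stub data are all hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical "API first" stub: the REST contract exists before any Cassandra code does.
final class EventApiStub {
    // In-memory stand-in for the future query "events for a given date".
    private static final Map<String, List<String>> EVENTS_BY_DATE =
            Map.of("20160301", List.of("login", "logout"));

    // Routing logic kept apart from the HTTP server so it can be unit tested.
    // Returns a JSON array for /events/{date}, or null for an unknown route.
    static String route(String path) {
        String prefix = "/events/";
        if (!path.startsWith(prefix)) {
            return null;
        }
        String date = path.substring(prefix.length());
        return EVENTS_BY_DATE.getOrDefault(date, List.of()).stream()
                .map(e -> "\"" + e + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 8080), 0);
        server.createContext("/", exchange -> {
            String body = route(exchange.getRequestURI().getPath());
            int status = (body == null) ? 404 : 200;
            byte[] bytes = (body == null ? "{\"error\":\"not found\"}" : body)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start(); // try: curl http://localhost:8080/events/20160301
    }
}
```

When the Cassandra-backed implementation is ready, only the body of `route` has to change. The URL contract that clients rely on stays the same.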
- 4.
Non Functional Requirements
• Requests per second (writes/reads and batch loading/reporting).
• Throughput at peak
• 99.9th percentile latency SLAs (if any)
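A 99.9th-percentile target can be checked against recorded latencies with a nearest-rank calculation. The sketch below uses a hypothetical `percentilePerMille` helper. It takes the percentile in parts per thousand (999 means p99.9) so that the rank is computed with exact integer arithmetic rather than floating point.

```java
import java.util.Arrays;

// Nearest-rank percentile over recorded request latencies.
// The percentile is given in parts per thousand (999 = p99.9) so the rank is
// computed with exact integer arithmetic instead of floating point.
final class LatencyPercentile {
    static long percentilePerMille(long[] latencies, int perMille) {
        if (latencies.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (perMille < 1 || perMille > 1000) {
            throw new IllegalArgumentException("perMille must be in 1..1000");
        }
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // rank = ceil(perMille * n / 1000): smallest rank covering perMille/1000 of the samples.
        long rank = ((long) perMille * sorted.length + 999) / 1000;
        return sorted[(int) rank - 1];
    }

    public static void main(String[] args) {
        long[] micros = new long[1000];
        for (int i = 0; i < micros.length; i++) {
            micros[i] = i + 1; // synthetic samples: 1..1000 microseconds
        }
        System.out.println("p99.9 = " + percentilePerMille(micros, 999) + " us");
    }
}
```

Under the nearest-rank method, p99.9 over fewer than 1,000 samples is simply the maximum. A test run therefore needs well over 1,000 requests before the figure tells you much.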
- 5.
What’s your problem?
• Is it speed, throughput or volume?
• Is there any searching that needs to be done?
• Is there any analytics or BI involved?
- 6.
Event capture and replay
CREATE TABLE IF NOT EXISTS eventsource (
    date text,
    bucket int,
    id uuid,
    aggregatetype text,
    host text,
    loglevel text,
    data text,
    time timestamp,
    eventtype text,
    PRIMARY KEY ((date, bucket), time, id)
);
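The slide does not say how `bucket` is chosen. The following is a minimal sketch built on three assumptions: the date is a UTC `yyyyMMdd` string, the bucket count is fixed, and the bucket is derived from the event `id`. Under these assumptions, one day's writes spread across `BUCKETS` partitions, and replaying that day means reading every `(date, bucket)` partition. `EventPartitioner` and `BUCKETS = 16` are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical partition-key derivation for the eventsource table: (date, bucket).
final class EventPartitioner {
    static final int BUCKETS = 16; // assumption: tune so each partition stays a manageable size
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Day component of the partition key, e.g. "20160301" (UTC).
    static String dateOf(Instant time) {
        return DAY.format(time);
    }

    // Spread one day's writes across BUCKETS partitions using the event id.
    static int bucketOf(UUID id) {
        return Math.floorMod(id.hashCode(), BUCKETS);
    }

    // A reader replaying one day must query every bucket for that date.
    static List<String> partitionsForDay(String date) {
        List<String> keys = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            keys.add("(" + date + ", " + b + ")");
        }
        return keys;
    }
}
```

A time-based bucket, such as the hour of `time`, is another common choice. It keeps each partition in replay order, but it sends all current writes to a single partition.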
- 8.
Cassandra Data Modelling
• Searching (multiple views and secondary indexes can help)
• Joining (none: Cassandra does not support joins)
• Model for queries, not entities
• Make use of collections; they can be Cassandra’s secret weapon
• Make use of counters
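"Model for queries, not entities" can be illustrated without a cluster. The sketch below is a toy in-memory analogy, not driver code. Each map plays the role of one table, keyed by the partition key that its query filters on, and every event is written once per query. The two queries, the `QueryTables` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of "one table per query": partition key -> rows sorted by clustering key (time).
// Assumes times are unique within a partition (a real table would also cluster on id).
final class QueryTables {
    // Query 1: "latest events for a host" -> partitioned by host.
    final Map<String, TreeMap<Long, String>> eventsByHost = new HashMap<>();
    // Query 2: "events of a type on a day" -> partitioned by (eventtype, date).
    final Map<String, TreeMap<Long, String>> eventsByTypeAndDay = new HashMap<>();

    // Denormalise on write: the same event is stored once per query it must answer.
    void write(String host, String eventType, String date, long time, String data) {
        eventsByHost.computeIfAbsent(host, k -> new TreeMap<>()).put(time, data);
        eventsByTypeAndDay.computeIfAbsent(eventType + "|" + date, k -> new TreeMap<>()).put(time, data);
    }

    // Each read touches exactly one partition; no join is needed.
    List<String> latestForHost(String host, int limit) {
        List<String> out = new ArrayList<>();
        for (String data : eventsByHost.getOrDefault(host, new TreeMap<>()).descendingMap().values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(data);
        }
        return out;
    }

    List<String> forTypeOnDay(String eventType, String date) {
        return new ArrayList<>(eventsByTypeAndDay
                .getOrDefault(eventType + "|" + date, new TreeMap<>()).values());
    }
}
```

The cost is extra writes and storage per event. In return, each query becomes a single-partition read, which is the trade Cassandra is built for.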
- 9.
Advanced Cassandra Data Modelling
• How long do I want to keep my data?
• How often will I access it?
• Billions of rows with a small number of columns vs. millions of rows with many columns
• Row cache, key cache, on-heap vs. off-heap
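Retention and row width both feed directly into partition size. The sketch below is a back-of-envelope estimate using a hypothetical `PartitionSizing` helper. Every input is an assumption to replace with measured numbers: the event rate, the window length, the average row size and the target partition size.

```java
// Back-of-envelope partition sizing; all inputs are assumptions, not recommendations.
final class PartitionSizing {
    // Estimated bytes per partition: rows arriving over the window times average row size,
    // split evenly across the buckets.
    static long partitionBytes(long rowsPerSecond, long windowSeconds, long bytesPerRow, int buckets) {
        return rowsPerSecond * windowSeconds * bytesPerRow / buckets;
    }

    // Smallest bucket count that keeps each partition at or under targetBytes.
    static int bucketsNeeded(long rowsPerSecond, long windowSeconds, long bytesPerRow, long targetBytes) {
        long total = rowsPerSecond * windowSeconds * bytesPerRow;
        return (int) Math.max(1, (total + targetBytes - 1) / targetBytes); // ceiling division
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60;
        long target = 100L * 1024 * 1024; // assumed target: 100 MB per partition
        int buckets = bucketsNeeded(1_000, day, 200, target);
        System.out.println(buckets + " buckets per day, about "
                + partitionBytes(1_000, day, 200, buckets) / (1024 * 1024) + " MB each");
    }
}
```

The example inputs are 1,000 events/s, one-day partitions, about 200 bytes per row and a 100 MB target. With these, the estimate comes out at 165 buckets per day. Doubling the row width or the retention window doubles that figure.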
- 10.
DataStax Data Modelling
• OLTP - Cassandra
• Search - modelling data for search
• Analytics - modelling data with Spark; I can now do joins (and a lot more)!
• In-memory (off-heap) capabilities mean no more separate caching layer.
• And soon Graph.
- 12.
Coding
• Always ensure you can run multi-threaded (or at least multiple instances)
• Think async first.
• Build your REST API first.
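The "async first, multi-threaded" advice can be shown with `CompletableFuture` on a fixed thread pool. In this minimal sketch, `lookupAsync` is a hypothetical stand-in for a non-blocking database call. The point is to start all independent requests before waiting on any of them.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// "Async first": start every independent request, then wait once for all of them.
final class AsyncFirst {
    // Hypothetical stand-in for a non-blocking database read.
    static CompletableFuture<String> lookupAsync(String key, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "value-of-" + key, pool);
    }

    static List<String> fetchAll(List<String> keys, ExecutorService pool) {
        // Materialise the futures first so every request is in flight before any join.
        List<CompletableFuture<String>> futures = keys.stream()
                .map(k -> lookupAsync(k, pool))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join) // results come back in request order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8); // several worker threads, not one
        try {
            System.out.println(fetchAll(List.of("a", "b", "c"), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures into a list before calling `join` matters. Streams are lazy, so mapping to `lookupAsync` and then to `join` in a single pipeline would wait for each request to finish before starting the next.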
- 13.
Testing
• Real-life data (or as close to it as you can get)
• Profiling
• Monitoring
• cqlsh tracing
• Stress testing
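Cassandra ships the `cassandra-stress` tool for cluster-level load testing. To stress the application's own code path instead, a small multi-threaded load generator is enough to start with. The `MiniStress` class in this minimal sketch is hypothetical and reports only throughput. Recording each call's latency as well would allow a percentile check against the latency SLA.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

// Minimal closed-loop load generator: `threads` workers each run the operation
// `opsPerThread` times; prints a rough throughput figure and returns the op count.
final class MiniStress {
    static long run(Runnable operation, int threads, int opsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        LongAdder completed = new LongAdder();
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    for (int i = 0; i < opsPerThread; i++) {
                        operation.run(); // e.g. one read or write against a test cluster
                        completed.increment();
                    }
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        double seconds = Math.max((System.nanoTime() - start) / 1e9, 1e-9);
        long ops = completed.sum();
        System.out.printf("%d ops in %.3f s (%.0f ops/s)%n", ops, seconds, ops / seconds);
        return ops;
    }

    public static void main(String[] args) {
        // Hypothetical operation; replace with a real request to the system under test.
        run(() -> Math.sqrt(System.nanoTime()), 4, 100_000);
    }
}
```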
- 14.
Infrastructure
• Don’t use SAN / NAS
• Try to mimic production as much as possible
• SSDs are always preferred
• Cores / RAM / VMs / Docker
- 16.
Application Optimisation
• Always - run multi-threaded
• Always - use executeAsync unless synchronous execution is absolutely necessary
• Always - think about the difference between short-term and long-term storage; it affects a lot more than you think.
• Always - use bucketing to increase efficiency. Think before storing billions of
partitions per node.
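Bucketing by time is one common way to bound partition size and to separate short-term data from long-term data. This minimal sketch assumes a hypothetical partition key of `(symbol, month)` for the `TimeSeries` (symbol, date, value) data written in the loops below. A range read must visit each month bucket that the range spans.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical time bucketing for a (symbol, date, value) series: partition key = (symbol, month).
final class TimeBuckets {
    // Month bucket for one data point, e.g. "2016-03".
    static String bucketFor(LocalDate date) {
        return YearMonth.from(date).toString();
    }

    // Every month bucket a read for [from, to] has to touch, oldest first.
    static List<String> bucketsBetween(LocalDate from, LocalDate to) {
        List<String> buckets = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            buckets.add(m.toString());
        }
        return buckets;
    }
}
```

Coarser buckets mean fewer but larger partitions per node, and finer buckets mean the opposite. Keeping time in the partition key also lets older buckets be given a TTL or archived without touching current partitions.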
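- 17.
// Synchronous writes: each execute() blocks until the write is acknowledged,
// so the loop sends one request at a time.
for (TimeSeries timeSeries : timeSeriesEvents) {
    BoundStatement bs = new BoundStatement(preparedStatement);
    bs.bind(timeSeries.getSymbol(), timeSeries.getDate(), timeSeries.getValue());
    session.execute(bs);
}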
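- 19.
// Asynchronous writes: executeAsync() returns immediately, so all writes are in
// flight together; keep every future so completion can be checked afterwards.
List<ResultSetFuture> results = new ArrayList<ResultSetFuture>();
for (TimeSeries timeSeries : timeSeriesEvents) {
    BoundStatement bs = new BoundStatement(preparedStatement);
    bs.bind(timeSeries.getSymbol(), timeSeries.getDate(), timeSeries.getValue());
    results.add(session.executeAsync(bs));
}
// Wait for every write; getUninterruptibly() surfaces any failed write as an exception.
for (ResultSetFuture result : results) {
    result.getUninterruptibly();
}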
AsyncWriterWrapper.java - http://bit.ly/1OwHQ3m
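The `executeAsync` loop above starts every write at once, which can overload a cluster when `timeSeriesEvents` is large. This minimal sketch caps the number of requests in flight with a `Semaphore`. It uses `CompletableFuture` as a stand-in, because the driver's `ResultSetFuture` is a Guava `ListenableFuture` and would need an adapter. `BoundedAsyncWriter` and `maxInFlight` are hypothetical, and the sketch is not necessarily how the linked `AsyncWriterWrapper.java` works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Async writes with a cap on requests in flight, so a large batch cannot flood the cluster.
final class BoundedAsyncWriter {
    static <T> int writeAll(List<T> items, Function<T, CompletableFuture<?>> writeAsync, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<?>> pending = new ArrayList<>();
        for (T item : items) {
            permits.acquireUninterruptibly();            // blocks only when maxInFlight writes are outstanding
            CompletableFuture<?> f = writeAsync.apply(item);
            f.whenComplete((r, e) -> permits.release()); // free a slot on success or failure
            pending.add(f);
        }
        // Wait for the remaining writes; join() rethrows if any write failed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        return pending.size();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Hypothetical stand-in for session.executeAsync(boundStatement).
        Function<Integer, CompletableFuture<?>> fakeWrite = i -> CompletableFuture.runAsync(() -> { }, pool);
        System.out.println(writeAll(List.of(1, 2, 3, 4, 5), fakeWrite, 2) + " writes completed");
        pool.shutdown();
    }
}
```

A permit is released only after a write completes, so at most `maxInFlight` writes are ever outstanding. Once the limit is reached, the loop runs at whatever rate the cluster can absorb.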
- 22.
And
• http://academy.datastax.com
• http://www.github.com/PatrickCallaghan
• http://www.github.com/DataStaxCodeSamples
• https://academy.datastax.com/units/how-visualize-event-data-using-banana-and-solr-datastax-enterprise