Performance Management in ‘Big Data’ Applications

Do applications using NoSQL still require performance management? Is it always the best option to throw more hardware at a MapReduce job? In both cases, performance management is still about the application, but "Big Data" technologies have added a new wrinkle.

Speaker Notes

  • Map/Reduce problem patterns: uneven distribution; optimal splitting; optimizing design; the choke point (between map and reduce); complex jobs, i.e. Hive queries; data locality (a customer of ours has the problem that while the job itself is distributed, the data comes from only 3 HBase/data nodes); too many HBase calls; wasteful job code, i.e. adding more hardware instead of fixing hotspots; premature flushing (see http://blog.dynatrace.com/2012/01/25/about-the-performance-of-map-reduce-jobs/). Cassandra/NoSQL: the theme here will be that from an app point of view the problem patterns haven't really changed, but we actually have additional ones: too many calls; too much data read; non-optimal data access; data-driven locking issues; slow queries; uneven distribution; using the wrong consistency level; slower nodes; I/O issues; GC?
  • PurePath is the only solution spanning client and server (or edge and cloud). Keynote has no real-user monitoring. AppDynamics? New Relic?
  • Done by Mike: explain NoSQL at a high level; explain key benefits and key challenges.
  • Done by Mike: explain MapReduce at a high level; explain key benefits and key challenges.
  • Ed: Describe how Hadoop works at a high level. Describe an m6d use case as an example? Typical performance issues and why they are hard (different jobs, different options); point towards the developer and the Hive query: complicated, but the most potential. Mike: We are now starting to do things a little differently. When you look at the typical Map/Reduce flow you'll see the major parts, and we can now monitor these areas for each job. Therefore we can decide on a job-per-job basis whether we have one of the typical Hadoop problems, or whether it is worth our while to optimize at the core, at the code level, where we get pretty decent hotspots. After all, cutting map code down from 60% to 20% helps a lot; after that it might be good enough, or, now that most of the time is spent in the framework, it is time to look at Hadoop itself again. The message, however, is the same as APM has always been: first identify, on a per-job basis, if and what the problem is, and then go after it. Don't just tune away; you'll need an expert like Ed to get anywhere that way.

Performance Management in ‘Big Data’ Applications: Presentation Transcript

  • Performance Management in ‘Big Data’ Applications: It’s still about the Application. Michael Kopp, Technology Strategist – michael.kopp@compuware.com, @mikopp, blog.dynatrace.com; Edward Capriolo – edward@m6d.com, @edwardcapriolo, m6d.com/blog
  • Big Data: High-Volume/Low-Latency DBs (Web – Java – NoSQL). Key Challenges: 1) Even Distribution 2) Correct Schema and Access Patterns 3) Understanding Application Impact. Key Benefits: 1) Fast Read/Write 2) Horizontal Scalability 3) Redundancy and High Availability
  • Big Data: Large Parallel Batch Processing – a high-level Hive query triggers batches of map/reduce jobs via the Hive Server. Key Challenges: 1) Optimal Distribution 2) Unwieldy Configuration 3) Can easily waste your resources. Key Benefits: 1) Massive Horizontal Batch Jobs 2) Split big Problems into smaller ones
  • What is m6d?
  • Impressions look like…
  • MapReduce Performance
  • Typical MapReduce Job at m6d
  • Hadoop at m6d • Critical piece of infrastructure • Long-term data storage – raw logs – aggregations – reports – generated data (feedback loops) • Numerous ETL (Extract Transform Load) processes • Scheduled and ad-hoc processes • Used directly by the Tech Team, Ad Ops, Data Science
  • Hadoop at m6d • Two deployments: production and research – ~500 TB, 40+ nodes – ~350 TB, 20+ nodes • Thousands of jobs – from <5-minute jobs to 12-hour job flows – mostly Hive jobs – some custom-code and streaming jobs
  • Hadoop Design Tenets • Linear scalability by adding more hardware • HDFS distributed file system – user-space file system – blocks are replicated across nodes – limited semantics • MapReduce – paradigm that models computation as map and reduce steps – data locality – splits a job into tasks by data – retries on failure (see the sketch below)
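The map/reduce paradigm above is easiest to see in code. Below is a minimal, hypothetical Hadoop 2.x job in Java (not from the deck): it counts records per key, and reuses writables in the mapper to avoid the kind of per-record allocation hotspot the talk warns about. The class names and tab-separated field layout are assumptions for illustration only.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative job, not from the deck: counts log records per key.
public class ImpressionCount {

  public static class CountMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    // Reuse writables instead of allocating per record -- a common
    // "wasteful map code" fix of the kind the talk alludes to.
    private final Text outKey = new Text();
    private final LongWritable one = new LongWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      outKey.set(line.toString().split("\t")[0]); // key = first field
      ctx.write(outKey, one);
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable c : counts) sum += c.get();
      ctx.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "impression-count");
    job.setJarByClass(ImpressionCount.class);
    job.setMapperClass(CountMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Splitting the input by HDFS block is what gives the data locality and per-task retry described in the tenets.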
  • Schema Design Challenges • Partition data for good distribution – by time interval (optionally a second level) • Partition pruning with WHERE (see the example below) • Clustering (aka bucketing) – optimized sampling and joins • Columnar – column-oriented storage • Raw data growth • Data features change (more distinct X)
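Since most m6d jobs are Hive jobs, partition pruning usually happens in the query itself. A sketch of what that looks like from Java, assuming HiveServer2 and its JDBC driver; the impressions table, its dt partition column, and the host name are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PartitionPruningExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://hive-server:10000/default");
         Statement stmt = conn.createStatement()) {
      // Filtering on the partition column (dt) lets Hive prune to one
      // day's partitions instead of scanning all raw-log history.
      ResultSet rs = stmt.executeQuery(
          "SELECT campaign, COUNT(*) FROM impressions "
              + "WHERE dt >= '2012-06-01' AND dt < '2012-06-02' "
              + "GROUP BY campaign");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
      }
    }
  }
}
```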
  • Key Performance Challenges • Intermediate I/O – compression codec – block size – splittable formats • Contention between jobs • Data and map/reduce distribution • Data skew • Non-uniform computation (long-running tasks) • Cost of a new feature – is it justified? • Tuning variables (spills, buffers, etc.)
  • How to handle Performance Issues? • Profile the job/query? – Who should do this? (DBA, Dev, Ops, DevOps, NoOps, Big Data guru) – How should we do this? • Look at job run times day over day? • Look at code and micro-benchmark? • Collect job counters? (see the sketch below) • Upgrade often for the latest performance features? • Investigate/purchase newer, better hardware – more cores? RAM? 10G Ethernet? SSDs? • Read blogs? – Test data is not like real data
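One cheap starting point from the list above is collecting job counters. A sketch using Hadoop 2.x's built-in TaskCounter values (the counter names differ in Hadoop 1); call it after the job completes:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public final class JobCounterReport {
  // Call after job.waitForCompletion(true).
  public static void print(Job job) throws Exception {
    Counters c = job.getCounters();
    long mapOut  = c.findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
    long spilled = c.findCounter(TaskCounter.SPILLED_RECORDS).getValue();
    long shuffle = c.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();
    // Spilled records far above map output records hint at premature
    // flushing / small sort buffers; huge shuffle volume points at a
    // missing combiner.
    System.out.printf("mapOut=%d spilled=%d shuffleBytes=%d%n",
        mapOut, spilled, shuffle);
  }
}
```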
  • But how to optimize the job itself?
  • Understanding Map/Reduce Performance: Attention, data volume! Maximum mapping parallelism vs. actual parallelism – also your own code, millions of executions! Attention, potential choke point between map and reduce! Maximum reduce parallelism vs. actual reduce parallelism – also your own code
  • Understanding Map/Reduce Performance
  • Map/Reduce Performance
  • Map/Reduce behind the scenes: serialize, then de-serialize and serialize again – potentially inefficient. Too many files, the same key spread all over. De-serialize and serialize again – expensive synchronous combine
  • Map/Reduce Combine and Spill Performance: 1) Pre-combine in the mapping step 2) Avoid many intermediate files and combines (see the sketch below)
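Both points map to two lines of job setup. A sketch, continuing the hypothetical job from earlier: it registers the reducer as a combiner (valid here because the sum is associative) and enlarges the map-side sort buffer so records are pre-aggregated and spilled to disk far less often.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public final class CombineAndSpillTuning {
  public static Job configure() throws Exception {
    Configuration conf = new Configuration();
    // Hadoop 2.x property name (io.sort.mb in Hadoop 1): a larger
    // buffer means fewer spill files and merge passes -- this is the
    // "premature flushing" issue from the dynaTrace blog post.
    conf.setInt("mapreduce.task.io.sort.mb", 256);
    Job job = Job.getInstance(conf, "impression-count");
    // Pre-combine in the mapping step: partial sums travel over the
    // network instead of one record per input line.
    job.setCombinerClass(ImpressionCount.SumReducer.class);
    return job;
  }
}
```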
  • Map/Reduce “Map” Performance: avoid brute force. Focus on big hotspots first, then optimize Hadoop – and save a lot of hardware
  • Map/Reduce to the Max! • Ensure data locality • Optimize map/reduce hotspots • Reduce intermediate data and “overhead” • Ensure optimal data and compute distribution • Tune the Hadoop environment
  • Cassandra and Application Performance
  • A High-Level Look at RTB: 1) Browsers visit publishers and create impressions. 2) Publishers sell impressions via exchanges. 3) Exchanges serve as auction houses for the impressions. 4) On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.
  • Cassandra at m6d for Real-Time Bidding • Only limited data is provided from the exchange • A system to store information on users – frequency capping – visit history – segments (product/service affinity) • Low-latency requirements – less than 100 ms – requires fast read/write on discrete data
  • Cassandra design
  • Key Cassandra Design Tenets • Swap/paging not possible • Mostly schema-less • Writes do not read – read-before-write is an anti-pattern • Optimize around put and get – not for scan and query • De-normalize data – attempt to get all data in a single read* (see the sketch below)
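A sketch of what "writes do not read" and "all data in a single read" look like in practice, using the DataStax Java driver in its 3.x style. The rtb keyspace and user_profile table are hypothetical, and the deck does not say which client stack m6d actually used:

```java
import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class UserProfileStore {
  public static void main(String[] args) {
    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect("rtb")) {
      // De-normalized row: everything a bid decision needs, keyed by
      // user. The insert never reads first -- it just overwrites.
      session.execute(
          "INSERT INTO user_profile (user_id, freq_cap, segments, last_seen) "
              + "VALUES (?, ?, ?, ?)",
          "u123", 3, "auto,travel", new Date());
      // One get returns the whole profile: a single read on the bid path.
      Row row = session.execute(
          "SELECT * FROM user_profile WHERE user_id = ?", "u123").one();
      System.out.println(row.getInt("freq_cap"));
    }
  }
}
```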
  • Cassandra Design Challenges • De-normalize – store data to optimize reads – composite (multi-column) keys • Multi-column-family and multi-tenant scenarios • Compression settings – disk and cache savings – CPU and JVM costs • Data/compaction settings – size-tiered vs. leveled (LevelDB-style) compaction • Caching, memtable and other tuning
  • How to handle performance issues? • Monitor standard vitals (CPU, disk)? • Read blogs and documentation? • Use Cassandra JMX to track requests/sec (see the sketch below) • Use Cassandra JMX to track the size of column families, rows and columns • Upgrade often to get the latest performance enhancements? * What about the application?
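Tracking requests/sec over JMX needs nothing but the JDK. A sketch, assuming Cassandra's default JMX port 7199 and the metrics-style MBean names introduced around Cassandra 1.2 (an assumption -- older releases expose differently named beans):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraJmxPoll {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://cassandra-node:7199/jmxrmi");
    try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
      // Read-request latency timer; its rate attributes double as a
      // requests-per-second gauge.
      ObjectName reads = new ObjectName(
          "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
      System.out.println("reads/sec (1m rate): "
          + mbsc.getAttribute(reads, "OneMinuteRate"));
    }
  }
}
```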
  • APM for Cassandra
  • NoSQL APM is not so different after all… (Web – Java – Database) Key APM problems identified: 1) Response time contribution 2) Data access patterns 3) Transaction-to-query relationship (transaction flow)
  • Response Time Contribution: contribution to the business transaction; connection pool; access pattern
  • Statement Analysis: executions per business transaction and in total; average and total execution time; contribution to the business transaction
  • Where, Why, How and Which Transaction… Which business transaction? Which web service? Where and why in my transaction? Single-statement performance
  • How does this apply to NoSQL databases? (Web – Java) Key APM problems identified: 1) Response time contribution 2) Data access patterns 3) Transaction-to-query relationship (transaction flow); plus: 1) Data access distribution 2) End-to-end monitoring 3) Storage (I/O, GC) bottlenecks 4) Consistency level (see the sketch below)
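Consistency level is the one item on this list with no SQL equivalent, and it can be set per statement. A sketch using the DataStax Java driver (3.x style; the statements and user_profile table continue the hypothetical example above):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ConsistencyExample {
  public static void main(String[] args) {
    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect("rtb")) {
      // Latency-critical bid path: ONE is the cheapest level -- a
      // single replica answers.
      SimpleStatement bidRead = new SimpleStatement(
          "SELECT freq_cap FROM user_profile WHERE user_id = ?", "u123");
      bidRead.setConsistencyLevel(ConsistencyLevel.ONE);
      session.execute(bidRead);

      // Bookkeeping write where stronger guarantees are worth the
      // extra round trips.
      SimpleStatement capUpdate = new SimpleStatement(
          "UPDATE user_profile SET freq_cap = ? WHERE user_id = ?", 2, "u123");
      capUpdate.setConsistencyLevel(ConsistencyLevel.QUORUM);
      session.execute(capUpdate);
    }
  }
}
```

Using a stronger level than a transaction needs shows up directly as response time contribution, which is why an APM view should surface the level per statement.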
  • Real End-to-End Application Performance: our application, third-party/external services, end user – end-user response time contribution
  • Understanding Cassandra’s Contribution: Which statements did the transaction execute? Which node were they executed against? What was the contribution of each statement? Too many calls? Data access patterns? Which consistency level was used?
  • Understand Response Time Contribution: 5 calls with ~50–80 ms contribution vs. 4 calls with ~15 ms contribution? Access and data distribution
  • Why and how was a statement executed? 45 ms latency? 60 ms waiting on the server?
  • Any hotspots on the Cassandra nodes? Much more load on Node3? Which transactions are responsible?
  • Specific Cassandra Health Metrics
  • General Health of Cassandra: memory issues? Too many requests? Too much GC suspension?
  • Conclusion
  • Extend the Performance Focus to the Application (Web – Java): a fast database doesn’t make a fast application
  • Intelligent MapReduce APM: a high-level Hive query triggers map/reduce batch jobs via the Hive Server, running on the master node and data/task nodes. Simple optimizations with big impact
  • Big Data is about solving application problems. APM is about application performance and efficiency
  • THANK YOU – Michael Kopp, Technology Strategist, michael.kopp@compuware.com, @mikopp, blog.dynatrace.com; Edward Capriolo, edward@m6d.com, @edwardcapriolo, m6d.com/blog