Instrumenting your Instruments


Published in: Technology

  1. INSTRUMENTING YOUR INSTRUMENTS
     Premal Shah, Co-Founder @ 6sense
     Hadoop Summit 2016
  2. AGENDA
     • What does 6sense do?
     • How do we do it?
     • What does the pipeline look like?
     • Where do we do it?
     • What are the challenges?
     • How are we planning to solve them?
  3. WHAT DOES 6SENSE DO?
     • We find prospects that are in market to buy
     • We empower marketing and sales teams
  4. SAMPLE OUTPUT
     Account Name      Buying Stage    Profile Fit
     ACME Corporation  Purchase        Strong
     ABC Corp          Decision        Strong
     XYZ Systems       Consideration   Medium
     Doe Inc           Awareness       Strong
     Stage labels: PURCHASE, DECISION, CONSIDERATION, AWARENESS
  5. HOW DO WE DO IT?
     1st Party
       • Web
       • CRM
       • Marketing Automation
     3rd Party
       • Web
       • Search
       • Ad Impressions
     Modelling & Scoring
     Actionable Data for the Customer
  6. WHAT DOES THE PIPELINE LOOK LIKE?
     Customer Systems → Ingest → Process → Export → Customer Systems
  7. THE DAILY PROCESS GRAPH (DAG)
  8. THE REAL WORLD
  9. THE REAL WORLD * N
  10. PIPELINE COMPONENTS
      Hadoop Ecosystem
        • YARN
        • Hive
        • Presto
      Mesos World
        • Mesos
        • Chronos
        • Marathon
  11. WORKFLOW
      Chronos
      Queue
      Marathon
      Jobs
        • Hadoop
        • Hive
        • Presto
        • Python
  12. WHERE DO WE DO IT?
      • AWS
        ─ Elastic
        ─ Easy to experiment
        ─ No CAPEX
      • Hadoop
        ─ Data Nodes are run separately from Node Managers
        ─ Most of the data sits in S3
  13. PROJECT RAVEN
  14. WHAT AFFECTS PERFORMANCE
      • Hive
        ─ Joins
        ─ Non-Partitioned tables
        ─ Filters
        ─ Bucketing
      • Hadoop
        ─ File format
        ─ Compression
        ─ Data Locality
  15. METRICS THAT MATTER
      • # of Mappers
      • # of Input Files
      • # of Input Records
      • # of Records passed on to the next stage
      • Time taken in
        ─ Mappers
        ─ Copy
        ─ Shuffle
        ─ Reducers
      • # of Reducers
      • # of compressed vs uncompressed files
      • File formats
      • Etc.
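
The counters on this slide could be held in a small record with a few derived ratios. The sketch below is only an illustration: the `JobMetrics` class, its field names, and the sample numbers are hypothetical and do not come from the talk. Summing the four phase times is also an assumption about how the timings would be combined.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    """Hypothetical container for the per-job counters named on the slide."""
    mappers: int
    reducers: int
    input_files: int
    input_records: int
    output_records: int      # records passed on to the next stage
    map_seconds: float
    copy_seconds: float
    shuffle_seconds: float
    reduce_seconds: float

    def selectivity(self) -> float:
        """Fraction of input records that survive to the next stage."""
        if self.input_records == 0:
            return 0.0
        return self.output_records / self.input_records

    def total_seconds(self) -> float:
        """Sum of the four phase timings (an assumed aggregation)."""
        return (self.map_seconds + self.copy_seconds
                + self.shuffle_seconds + self.reduce_seconds)

    def files_per_mapper(self) -> float:
        """Average number of input files handled by each mapper."""
        if self.mappers == 0:
            return 0.0
        return self.input_files / self.mappers


# Made-up sample values, for illustration only.
sample = JobMetrics(
    mappers=200, reducers=10, input_files=4000,
    input_records=1_000_000, output_records=50_000,
    map_seconds=300.0, copy_seconds=40.0,
    shuffle_seconds=60.0, reduce_seconds=100.0,
)
```

A low `selectivity()` would suggest that most of the data read is filtered out, and a high `files_per_mapper()` would point at many small input files.
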
  16. WHAT DO WE STORE?
      • Job Name 1
        ─ Date 1
          o YARN Job #1
            ▪ Metrics
          o YARN Job #2
            ▪ Metrics
        ─ Date 2
          o Repeat as above
      • Job Name 2
        ─ Repeat as above
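
The hierarchy on this slide (job name, then date, then YARN job, then metrics) maps naturally onto nested dictionaries. The following is a minimal in-memory sketch, not the storage used in the talk: the helper names, job names, dates, YARN job IDs, and metric values are all hypothetical.

```python
from collections import defaultdict


def make_store():
    """Create an empty store: job name -> date -> YARN job id -> metrics."""
    return defaultdict(lambda: defaultdict(dict))


def record(store, job_name, run_date, yarn_job_id, metrics):
    """Save a copy of the metrics for one YARN job of one daily run."""
    store[job_name][run_date][yarn_job_id] = dict(metrics)


def daily_total(store, job_name, run_date, metric):
    """Sum one metric over all YARN jobs of a pipeline job on a given date."""
    # .get() is used so that lookups of unknown keys do not create entries.
    runs = store.get(job_name, {}).get(run_date, {})
    return sum(m.get(metric, 0) for m in runs.values())


# Made-up entries, for illustration only.
store = make_store()
record(store, "ingest_web", "2016-01-01", "yarn_job_1",
       {"mappers": 120, "input_files": 900})
record(store, "ingest_web", "2016-01-01", "yarn_job_2",
       {"mappers": 40, "input_files": 300})
record(store, "ingest_web", "2016-01-02", "yarn_job_3",
       {"mappers": 130, "input_files": 950})
```

Keying by date under each job name keeps one job's history together, which is the shape needed to compare a job against its own past runs.
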
  17. WHAT DO WE USE THEM FOR?
      • Finding the job that
        ─ Is the slowest
        ─ Processes the most files
        ─ Filters out the most data
        ─ Uses the most memory
      • Observing trends over time in the above metrics
      • Getting alerted on changes in the trends, both up and down
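
Two of the uses above, finding the slowest job and alerting on trend changes in either direction, can be sketched in a few lines. This is an illustrative approach rather than the one described in the talk: the function names, the 7-run window, and the 25% threshold are assumptions chosen for the example.

```python
from statistics import mean


def slowest_job(runtime_by_job):
    """Return the job name with the largest runtime in the given mapping."""
    return max(runtime_by_job, key=runtime_by_job.get)


def trend_alert(history, window=7, threshold=0.25):
    """Compare the latest value with the mean of the preceding `window` values.

    Returns "up" or "down" when the relative change exceeds `threshold`,
    otherwise None. Also returns None when there is too little history
    or the baseline is zero.
    """
    if len(history) < window + 1:
        return None
    baseline = mean(history[-(window + 1):-1])
    if baseline == 0:
        return None
    change = (history[-1] - baseline) / baseline
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return None
```

Alerting on drops as well as spikes matters here: a job that suddenly reads far fewer files or finishes far faster may indicate missing input rather than an improvement.
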
  18. RECOMMENDATIONS
      • Storage Format
      • Compression Type
      • Partition Columns
      • Bucketing
      • Etc.
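
One way such recommendations could be derived from the collected metrics is a small set of rules. The slide does not say how the recommendations are produced, so everything below is an assumption for illustration: the `recommend` function, the dictionary keys, the 10% selectivity cutoff, and the rules themselves, which follow common Hive tuning practice.

```python
def recommend(table):
    """Return a list of tuning suggestions for a table description.

    `table` is a hypothetical dict with optional keys: file_format,
    compressed, partitioned, bucketed, selectivity, large_joins.
    """
    recs = []
    # Row-oriented text formats are usually slower to scan than columnar ones.
    if table.get("file_format") in {"text", "csv", "json"}:
        recs.append("storage format: consider a columnar format such as ORC or Parquet")
    if not table.get("compressed", False):
        recs.append("compression type: consider enabling compression")
    # If most rows are filtered out of an unpartitioned table,
    # partitioning may avoid reading them at all.
    if not table.get("partitioned", False) and table.get("selectivity", 1.0) < 0.1:
        recs.append("partition columns: consider partitioning on the common filter column")
    if table.get("large_joins", False) and not table.get("bucketed", False):
        recs.append("bucketing: consider bucketing on the join key")
    return recs


# Made-up table descriptions, for illustration only.
untuned = {"file_format": "text", "compressed": False, "partitioned": False,
           "bucketed": False, "selectivity": 0.02, "large_joins": True}
tuned = {"file_format": "orc", "compressed": True, "partitioned": True,
         "bucketed": True, "selectivity": 0.02, "large_joins": True}
```

Calling `recommend(untuned)` yields one suggestion per slide bullet, while `recommend(tuned)` yields none.
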
  19. OPTIMIZATIONS
      • Which job is causing the bottleneck?
      • How many errors can we tolerate?
      • Which job is the biggest offender?
      • Which job fails the most?
      • What did the latest release do?
  20. SCALING
      • Can we scale the number of customers?
      • What does it cost to add a customer?
      • What does it cost to add a job to each customer’s pipeline?
  21. VENDOR SHOUT OUT
      • ClusterK (now AWS Spot Fleet)
        ─ Allows us to use different instance types to load balance and reduce costs
      • Sumo Logic
        ─ Detects variances in behavior over a custom time period
      • OpsClarity
        ─ Collects, monitors and alerts on the following metrics
          o AWS CloudWatch metrics (queue length, S3 bucket size, etc.)
          o Host metrics (CPU, memory, disk space, etc.)
          o Service metrics (YARN, HBase, Mesos, etc.)
          o Container metrics (Docker)
          o Custom metrics: anything else you want to send
  22. THANK YOU
      • premal at 6sense.com
      • https://www.linkedin.com/in/premaljshah
