Measure() or die()

In this meetup, Arik Lerner, LivePerson Team Lead of Java Automation, Performance & Resilience, will talk about how we measure our services with end-to-end (E2E) testing, which has become one of the most critical monitoring tools at LivePerson (LP).

Over 200K test runs per day provide statistics and insights into problems as they happen.

Arik will go through the different topics and stages of the journey and share the details that led to the current results.

Some of the topics on the menu: The Awakening of the End2End Insights
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How we debug our services. Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future: correlating application logs with End2End data
• Our tools: Selenium, Jenkins, and cutting-edge technologies such as Kafka and ELK (Elasticsearch, Logstash, and Kibana)

In this meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about how he uses E2E in his day-to-day work.

Published in: Technology

Measure() or die()

  1. 1. Measure or Die: Measure() OR Die(); by Arik Lerner, Team Lead, Automation & Performance/Resilience
  2. 2. Bio: 3.5 years at LivePerson; 2 years on the Reporting Platform; 1.5 years as Team Lead, Automation & Performance/Resilience. Interests: private pilot on a Cessna 172.
  3. 3. Meetup Agenda ➔ How we monitor with E2E testing ➔ E2E products & personas ➔ The Awakening of the End2End Data ➔ Architecture & life cycle
  4. 4. About LivePerson: LivePerson transforms the connection between brands and consumers.
  5. 5. Our Scale: 3BN visits/month; 200BN API calls/month; 2 PB of data a year; 1.5M concurrent visits
  6. 6. Our Engineering: ~200 people in R&D; constant innovation; multiple technologies; fast release cycle
  7. 7. We monitor LivePerson services with E2E tests that simulate real business scenarios ➔ Indicate real business problems ➔ Show service availability through the consumer's eyes ➔ Alert and require immediate action ➔ Give insight into our business services
  8. 8. E2E Scenario Example: Agent login (enters the system) ➔ Visitor enters the site ➔ Visitor initiates chat ➔ Agent chats
  9. 9. E2E customers' expectations ➔ Stability == TRUST ➔ Investigable ➔ Service coverage ➔ Scale
  10. 10. E2E Dashboard Statistics
  11. 11. Real Time Dashboard
  12. 12. Kibana - HAR statistics & Aggregation
  13. 13. E2E Personas: Production specialist, PMO, Management
  14. 14. Production Specialist User Story: This is Yossi. When Yossi gets up in the morning, Yossi looks at the E2E RT dashboard. Yossi recognizes a failure. Yossi opens the E2E debug center tools. Yossi is smart! Be like Yossi.
  15. 15. PMO User Story: This is Michal. Before any software deployment, when the dashboard failure rate is below 3%, Michal has a GO for deployment. Michal is smart! Be like Michal.
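
The go/no-go rule in the PMO story can be sketched as a simple threshold check. This is a minimal sketch, not the team's actual gate: only the 3% threshold comes from the slide, while the class name `DeploymentGate`, the method names, and the idea of computing the rate from run counts are assumptions made for illustration.

```java
/** Hypothetical go/no-go gate; only the 3% threshold comes from the slide. */
final class DeploymentGate {
    static final double MAX_FAILURE_RATE = 0.03;

    /** Fraction of failed runs; requires totalRuns > 0 and 0 <= failedRuns <= totalRuns. */
    static double failureRate(long totalRuns, long failedRuns) {
        if (totalRuns <= 0 || failedRuns < 0 || failedRuns > totalRuns) {
            throw new IllegalArgumentException("invalid run counts");
        }
        return (double) failedRuns / totalRuns;
    }

    /** GO only when the failure rate is strictly below the threshold. */
    static boolean isGo(long totalRuns, long failedRuns) {
        return failureRate(totalRuns, failedRuns) < MAX_FAILURE_RATE;
    }

    public static void main(String[] args) {
        // Made-up counts: 12 failures in 1000 runs, then 45 failures in 1000 runs.
        System.out.println(isGo(1000, 12) ? "GO" : "NO-GO");
        System.out.println(isGo(1000, 45) ? "GO" : "NO-GO");
    }
}
```

Rejecting an empty window instead of returning a 0% failure rate is a deliberate choice in this sketch: with no test runs there is no evidence that a deployment is safe.
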
  16. 16. Management Story: This is Eli. When Eli gets up in the morning, Eli looks at the dashboard statistics. Eli can see the health and availability of each data center. Eli is smart! Be like Eli.
  17. 17. What KPIs do I need to measure? KPIs ➔ Total failure rate ◆ Filter by data center ◆ Filter by business flow. Widgets ➔ Trend to understand service stability
  18. 18. What KPIs do I need to measure? KPIs ➔ Total chat failure rate ➔ Total missing engagements ➔ Total login failures ➔ Average login response time. Widgets ➔ Failure cause breakdown ➔ Client location root cause ➔ Test scenario failures
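
A few of the KPIs on slides 17 and 18 (total failure rate, a per-data-center filter, failure cause breakdown, average login response time) can be computed from per-run records roughly as follows. This is a sketch under assumptions: the `Run` record, its field names, and the sample values are invented for illustration and are not the schema used in the talk.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustrative KPI calculations; the Run fields are assumptions, not the talk's schema. */
final class E2eKpis {
    /** One e2e run. failureCause must be non-null; use "none" for passed runs. */
    record Run(String flow, String dataCenter, boolean passed,
               String failureCause, long loginMillis) {}

    /** Share of failed runs, in [0, 1]; 0 for an empty list. */
    static double failureRate(List<Run> runs) {
        if (runs.isEmpty()) {
            return 0.0;
        }
        long failed = runs.stream().filter(r -> !r.passed()).count();
        return (double) failed / runs.size();
    }

    /** Failure cause breakdown: cause -> number of failed runs, sorted by cause. */
    static Map<String, Long> failuresByCause(List<Run> runs) {
        return runs.stream()
                .filter(r -> !r.passed())
                .collect(Collectors.groupingBy(
                        Run::failureCause, TreeMap::new, Collectors.counting()));
    }

    /** Average login response time in milliseconds; 0 for an empty list. */
    static double averageLoginMillis(List<Run> runs) {
        return runs.stream().mapToLong(Run::loginMillis).average().orElse(0.0);
    }

    /** Dashboard-style filter: keep only the runs against one data center. */
    static List<Run> forDataCenter(List<Run> runs, String dataCenter) {
        return runs.stream()
                .filter(r -> r.dataCenter().equals(dataCenter))
                .collect(Collectors.toList());
    }

    /** Made-up sample data for the demo below. */
    static List<Run> sample() {
        return List.of(
                new Run("ChatFlow", "UK", true, "none", 800),
                new Run("ChatFlow", "UK", false, "login timeout", 2000),
                new Run("LoginFlow", "US", true, "none", 600),
                new Run("LoginFlow", "US", false, "missing engagement", 1000));
    }

    public static void main(String[] args) {
        List<Run> runs = sample();
        System.out.println("failure rate: " + failureRate(runs));
        System.out.println("by cause: " + failuresByCause(runs));
        System.out.println("avg login ms: " + averageLoginMillis(runs));
        System.out.println("UK failure rate: " + failureRate(forDataCenter(runs, "UK")));
    }
}
```
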
  19. 19. Dashboard Demo
  20. 20. The Awakening of the End2End Data
  21. 21. Start collecting the data! (@Test ➔ Raw Data Output) ➔ Get build failures/successes ➔ Get failure cause ➔ Business flows ➔ Test duration ➔ Client location ➔ Data center location ➔ Account
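
One way to capture the raw data listed on slide 21 is to wrap each scenario in a small measuring helper that records the outcome, the failure cause, and the duration alongside the run's metadata. The sketch below assumes this approach; `MeasuredRun`, `RawResult`, and `measure` are hypothetical names, and the talk does not show how its own tests collect these fields.

```java
/** Hypothetical wrapper that turns one scenario run into a raw data record. */
final class MeasuredRun {
    /** Fields mirror the slide: flow, account, locations, outcome, cause, duration. */
    record RawResult(String testName, String account, String clientLocation,
                     String dataCenter, boolean success, String failureCause,
                     long durationMillis) {}

    /** Runs the scenario, timing it and recording why it failed, if it did. */
    static RawResult measure(String testName, String account, String clientLocation,
                             String dataCenter, Runnable scenario) {
        long start = System.nanoTime();
        boolean success = true;
        String cause = "";
        try {
            scenario.run();
        } catch (RuntimeException | AssertionError e) {
            // A failed assertion or an unexpected exception both count as a failed run.
            success = false;
            cause = e.getClass().getSimpleName() + ": " + e.getMessage();
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        return new RawResult(testName, account, clientLocation, dataCenter,
                success, cause, millis);
    }

    public static void main(String[] args) {
        // Made-up scenario that always fails, to show the captured cause.
        RawResult result = measure("ChatFlow", "qa12345", "US", "UK", () -> {
            throw new IllegalStateException("agent login failed");
        });
        System.out.println(result);
    }
}
```

In a real suite the same idea would more likely live in a test framework listener or rule than in an explicit wrapper call; the wrapper just keeps the sketch free of dependencies.
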
  22. 22. HAR (HTTP Archive) ➔ Logging web browser traffic. The HTTP Archive format, or HAR, is a JSON-formatted archive file format for logging a web browser's interaction with a site. The common extension for these files is .har. The specification defines an archival format for HTTP transactions that a web browser can use to export detailed performance data about the web pages it loads. The specification is produced by the Web Performance Working Group of the World Wide Web Consortium (W3C); it is in draft form and is a work in progress.
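
To make the HAR description concrete: each HAR entry carries, among other things, the request URL, the response status, and a total `time` in milliseconds. The sketch below models just that subset and summarises one capture. It skips JSON parsing entirely, and the `HarSummary` class, the `Entry` record, and the sample URLs and values are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Summary over a tiny subset of HAR entry fields; parsing is out of scope here. */
final class HarSummary {
    /** Subset of a HAR entry: request.url, response.status, and time (total ms). */
    record Entry(String url, int status, double timeMillis) {}

    /** Sum of per-request times. Requests overlap, so this is not page load time. */
    static double totalMillis(List<Entry> entries) {
        return entries.stream().mapToDouble(Entry::timeMillis).sum();
    }

    /** The slowest request, if there are any entries. */
    static Optional<Entry> slowest(List<Entry> entries) {
        return entries.stream().max(Comparator.comparingDouble(Entry::timeMillis));
    }

    /** Number of responses with a 4xx or 5xx status. */
    static long errorCount(List<Entry> entries) {
        return entries.stream().filter(e -> e.status() >= 400).count();
    }

    /** Made-up capture; the paths are hypothetical. */
    static List<Entry> sample() {
        return List.of(
                new Entry("https://www.liveperson.com/", 200, 120.0),
                new Entry("https://www.liveperson.com/app.js", 200, 80.5),
                new Entry("https://www.liveperson.com/api/chat", 503, 300.0));
    }

    public static void main(String[] args) {
        List<Entry> entries = sample();
        System.out.println("total ms: " + totalMillis(entries));
        System.out.println("errors: " + errorCount(entries));
        slowest(entries).ifPresent(e -> System.out.println("slowest: " + e.url()));
    }
}
```
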
  23. 23. HAR proxy diagram: Selenium WebDriver ➔ proxy on port XXX (based on the BrowserMob embedded proxy server; each request passes through the proxy and is recorded as HAR) ➔ www.Liveperson.com. Code snippet: adding the proxy to Selenium
  24. 24. Pop quiz: • N scenarios • Running from M locations • Running against X data centers • Yields HAR data. Question: how do we investigate the data for an entire farm, location, scenario, etc.? Answer: aggregation.
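
The "aggregation" answer on slide 24 amounts to grouping HAR-derived measurements by one dimension (scenario, client location, or data center) and reducing each group to a statistic. In the talk's stack this grouping is presumably done with Elasticsearch and Kibana; the in-memory Java below only illustrates the idea, and the `Sample` record and its values are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** In-memory illustration of grouping response times by a chosen dimension. */
final class HarAggregation {
    /** One measurement taken from a HAR file plus the run's metadata (assumed fields). */
    record Sample(String scenario, String clientLocation, String dataCenter,
                  double responseMillis) {}

    /** Average response time per value of the given dimension, sorted by key. */
    static Map<String, Double> averageBy(List<Sample> samples,
                                         Function<Sample, String> dimension) {
        return samples.stream().collect(Collectors.groupingBy(
                dimension, TreeMap::new,
                Collectors.averagingDouble(Sample::responseMillis)));
    }

    /** Made-up measurements for the demo below. */
    static List<Sample> sample() {
        return List.of(
                new Sample("ChatFlow", "US", "UK", 100.0),
                new Sample("ChatFlow", "US", "UK", 300.0),
                new Sample("ChatFlow", "EU", "US", 200.0),
                new Sample("LoginFlow", "EU", "US", 400.0));
    }

    public static void main(String[] args) {
        List<Sample> samples = sample();
        System.out.println("by data center: " + averageBy(samples, Sample::dataCenter));
        System.out.println("by location:    " + averageBy(samples, Sample::clientLocation));
        System.out.println("by scenario:    " + averageBy(samples, Sample::scenario));
    }
}
```

Passing the dimension as a function keeps one aggregation routine for all three questions (farm, location, scenario) instead of one method per grouping.
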
  25. 25. Start with collecting the data! @Test ➔ Raw Data Output = Metadata + HAR: { "metaData": { "Testname": "ChatFlow", "Account": "qa12345", "ClientLocation": "US", "DataCenter": "UK" } }
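
The metadata-plus-HAR output on slide 25 could be assembled as a single JSON message along these lines. This is a dependency-free sketch: the metadata keys and values are the ones on the slide, but the `E2eMessage` class, the hand-rolled string quoting, and the top-level `"har"` key are assumptions. A real implementation would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Builds a metadata + HAR envelope as a JSON string, using only the JDK. */
final class E2eMessage {
    /** Quotes a string as a JSON string literal. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c)); // control characters
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serialises flat string metadata as a JSON object, keeping insertion order. */
    static String metaDataJson(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    /** harJson is assumed to be valid JSON already; the "har" key is an assumption. */
    static String envelope(Map<String, String> metaData, String harJson) {
        return "{\"metaData\":" + metaDataJson(metaData) + ",\"har\":" + harJson + "}";
    }

    /** The metadata values shown on the slide. */
    static Map<String, String> slideExample() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("Testname", "ChatFlow");
        m.put("Account", "qa12345");
        m.put("ClientLocation", "US");
        m.put("DataCenter", "UK");
        return m;
    }

    public static void main(String[] args) {
        // Placeholder HAR body; a real one would come from the proxy's export.
        System.out.println(envelope(slideExample(), "{\"log\":{}}"));
    }
}
```
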
  26. 26. Jenkins slaves (@Test) ➔ output files (HAR files) ➔ HAR Processor (get JSON, send data) ➔ Kafka (topic e2e) ➔ Logstash + Elasticsearch ➔ Kibana dashboard. Code snippet: sending a message to Kafka
  27. 27. ELK & HAR. Our benefits ➔ Data retention: 30 days ➔ Ability to query and aggregate the data for investigation ➔ Ability to build dashboards ➔ Access to the data through Elasticsearch APIs. Downsides ➔ Complicated queries in Kibana ➔ ELK setup & maintenance ➔ On a response timeout, the HAR shows an enormous number (needs to be handled in code)
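
The last downside on slide 27, enormous timing values after a response timeout, can be handled by filtering implausible timings before they reach the aggregates. The slide does not say how the team handles it, so the sketch below is only one possible approach: the `HarTimingFilter` class and the two-minute cut-off are assumptions. Negative values are dropped as well, because HAR uses -1 for timings that do not apply.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Drops implausible HAR timings so that one timeout does not skew an average. */
final class HarTimingFilter {
    /** Hypothetical cut-off; pick one that matches your own request timeout. */
    static final double MAX_PLAUSIBLE_MILLIS = 120_000;

    /** True for timings in [0, MAX_PLAUSIBLE_MILLIS]; NaN is rejected too. */
    static boolean isPlausible(double millis) {
        return millis >= 0 && millis <= MAX_PLAUSIBLE_MILLIS;
    }

    /** Average of the plausible timings; empty when none are plausible. */
    static OptionalDouble averagePlausible(List<Double> timings) {
        return timings.stream()
                .mapToDouble(Double::doubleValue)
                .filter(HarTimingFilter::isPlausible)
                .average();
    }

    /** How many timings were rejected; worth reporting as a KPI of its own. */
    static long countDiscarded(List<Double> timings) {
        return timings.stream().filter(t -> !isPlausible(t)).count();
    }

    public static void main(String[] args) {
        // Made-up values: two normal timings, one "not applicable", one timeout artifact.
        List<Double> timings = List.of(120.0, 80.0, -1.0, 1.7e12);
        System.out.println("average: " + averagePlausible(timings));
        System.out.println("discarded: " + countDiscarded(timings));
    }
}
```

Counting the discarded values rather than silently dropping them keeps the timeouts visible, since a timeout is itself a failure signal.
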
  28. 28. What other E2E outputs do we have? @Test ➔ More output: BDD reports, video, logs, browser console logs
  29. 29. BDD (Behaviour-Driven Development): code snippet
  30. 30. Architecture diagram. Components: Jenkins Master and Jenkins Master DR; Jenkins slaves running @Test in DC-1, DC-2, DC-N; data flows: E2E reports, HAR data, e2e data, production metrics; stores and tools: MySQL DB, Kafka + ELK, Kibana service, Graphite, Zabbix, Grafana, RT Dashboard
  31. 31. E2E Test Lifecycle: DEV ➔ QA ➔ Staging ➔ Production
  32. 32. E2E @ Scale
  33. 33. E2E @ Scale ➔ 1.5M HTTP traffic records per day ➔ 200K runs per day ➔ 60 Jenkins slave machines ➔ 28 scenarios ➔ 6 client locations ➔ 6 regions
  34. 34. What to take home? ➔ Monitor your data centers from the consumer's point of view ➔ Collect data ➔ Give the data business meaning
  35. 35. THANK YOU! We are hiring
  36. 36. YouTube.com/LivePersonDev Twitter.com/LivePersonDev Facebook.com/LivePersonDev Slideshare.net/LivePersonDev