Big Data, Big Trouble: Getting into the Flow of Hadoop Testing

Big Data, one of the latest buzzwords in our industry, involves working with petabytes of data captured by various systems and making sense of that data in some way. Maryam Umar has found that testing systems like Hadoop is very challenging because of the frequency with which data arrives in the system, the number of jobs that run to process that data, and the interdependency of the data. Maryam describes some of the projects at Hotels.com that involve identifying multiple users and using that data to recommend hotels. Testing this is difficult because the jobs executed in the Hadoop ecosystem need to be represented in an appropriate test tool. Maryam presents a few examples of how she has overcome this challenge by using the Oozie workflow coordinator as a test tool that works with the Hadoop Distributed File System (HDFS). She demonstrates how test code can be written in a non-testing tool to help gain confidence in the data produced by running a job processor.

Published in: Software

  1. 1. T19 Big Data 10/6/16 15:00 Big Data, Big Trouble: Getting into the Flow of Hadoop Testing Presented by: Maryam Umar, Hotels.com Brought to you by: 350 Corporate Way, Suite 400, Orange Park, FL 32073 888-268-8770 · 904-278-0524 - info@techwell.com - http://www.starwest.techwell.com/
  2. 2. Maryam Umar: Maryam Umar works in London at Hotels.com, an Expedia, Inc. company. She started her career nine years ago as a QA test engineer in the finance and mobile industry. After transitioning to the eCommerce sector, Maryam performed QA in various capacities for online restaurant and travel services. She continues to work in QA since she strongly feels that software testing is critical to getting products to meet the customer's desire. In the past few years, Maryam has been passionately promoting and developing testing automation techniques, which can streamline and fortify the QA processes. Her mantra is to reduce any repetitive tasks which can be automated for testing.
  3. 3. BIG DATA BIG TROUBLE: GETTING INTO THE FLOW OF HADOOP TESTING
  4. 4. @maryamumar
  5. 5. HELLO! I AM MARYAM UMAR YOU CAN FIND ME AT: @MARYAMUMAR
  8. 8. BIG DATA IS NOT HEAVY DATA BUT JUST LOTS AND LOTS OF DATA!
  9. 9. SHOPPING DATA
  10. 10. ORDER DATA
  11. 11. CUSTOMER DATA
  12. 12. WHAT DO WE DO WITH ALL THIS DATA?
  13. 13. Destination Recommendations!
  14. 14. SOME TECHNICAL DETAILS… • Order data • Graph per user • User searches
  18. 18. PHILOSOPHY FOR TESTING • Generate test data • Run test data against code-in-question • Verify outcome against expected results
  19. 19. PROCESS WORKFLOW IN OOZIE • Test Data Generator • Graph Generator • Test Data Asserter
  21. 21. TESTDATAGENERATOR
  22. 22. TESTDATAGENERATOR (CONTD.)
  23. 23. TESTDATAASSERTER
  24. 24. TESTDATAASSERTER (CONTD.)
  27. 27. Thank you!!!
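
The generator and asserter slides were images, so their code is not in this transcript. As a rough illustration only, the testing philosophy (generate test data, run it against the code in question, verify the outcome) and the three workflow stages might be sketched in plain Python as below. Every name, record shape, and the graph layout here is a hypothetical stand-in, not the Hotels.com implementation; in the talk the stages run as steps of an Oozie workflow against HDFS, whereas this sketch keeps everything in memory.

```python
from collections import defaultdict


def generate_test_data():
    """Stage 1 (hypothetical): small, deterministic order and search records."""
    orders = [
        {"user": "u1", "destination": "Paris"},
        {"user": "u2", "destination": "Rome"},
    ]
    searches = [
        {"user": "u1", "destination": "Rome"},
        {"user": "u1", "destination": "Paris"},
        {"user": "u2", "destination": "Rome"},
    ]
    return orders, searches


def build_user_graphs(orders, searches):
    """Stage 2 (stand-in for the code under test): one graph per user.

    Each graph maps a destination to the set of ways the user touched it.
    """
    graphs = defaultdict(lambda: defaultdict(set))
    for record in orders:
        graphs[record["user"]][record["destination"]].add("ordered")
    for record in searches:
        graphs[record["user"]][record["destination"]].add("searched")
    # Convert nested defaultdicts to plain dicts so comparisons are simple.
    return {user: dict(edges) for user, edges in graphs.items()}


def assert_test_data(graphs, expected):
    """Stage 3 (hypothetical): return a list of mismatches, empty on success."""
    failures = []
    for user in sorted(set(graphs) | set(expected)):
        if graphs.get(user) != expected.get(user):
            failures.append(
                f"{user}: expected {expected.get(user)!r}, got {graphs.get(user)!r}"
            )
    return failures


def run_workflow():
    """Chain the three stages in the order the workflow slide lists them."""
    orders, searches = generate_test_data()
    graphs = build_user_graphs(orders, searches)
    expected = {
        "u1": {"Paris": {"ordered", "searched"}, "Rome": {"searched"}},
        "u2": {"Rome": {"ordered", "searched"}},
    }
    return assert_test_data(graphs, expected)


if __name__ == "__main__":
    failures = run_workflow()
    print("PASS" if not failures else "\n".join(failures))
```

Returning a list of mismatches instead of raising on the first one is a design choice for this sketch: a batch job that processes many users is easier to debug when a single run reports every user whose graph is wrong.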
