Iasi Code Camp, 20 April 2013: Testing Big Data - Anca Sfecla - Embarcadero

  1. Testing Big Data
     Prepared by: Anca Andreea Sfecla, Quality Assurance Manager, Embarcadero Technologies Romania
     @ CODECAMP 2013, 20th April 2013
  2. Prepared by Anca Sfecla, QAM - Embarcadero Technologies
  3. What is Big Data?
     • "Big Data is the frontier of a firm's ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers." - Forrester Research
     • "Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending." - Gartner Research
  4-8. Big Data Characteristics: Volume, Variety, Velocity, Value
  9. Big Data Success Stories
     • Detecting infections in premature infants up to 24 hours before they exhibit symptoms
     • Reducing the cost of sequencing a genome from $10,000 to less than $100
     • Predicting flu outbreaks by analyzing massive numbers of Google searches related to flu symptoms
  10-11. EDW versus Big Data
     EDW                               Big Data
     Clean data                        Unclean data
     Gigabytes to terabytes (1000 GB)  Petabytes (1000 TB) to exabytes (1000 PB)
     Simplified, structured            Complex, semi-structured or unstructured
     Data from relational database     Data from non-relational flat file storage
     Centralized data                  Distributed data
     Structured database schema        Customized, instantly generated schema
  12. Big Data Solutions: Microsoft Big Data Solution
  13-14. Big Data Solutions
  16. Big Data Processing Using the Hadoop Framework
  17. Big Data Architecture
     [Diagram components: data sources (Web Logs, Streaming Data, Social Data, Transactional Data (RDBMS)); ETL Process; Hadoop (HDFS (Hadoop Distributed File System), HBase (NoSQL DB), MapReduce (Job Execution), Hive, Pig); Processed Data; Data Load using Sqoop; Enterprise Data Warehouse; Big Data Analytics]
  18. Big Data Architecture: (1) Pre-Hadoop Processing
  19. Possible problems
     • incorrect data captured from source systems
     • incorrect storage of data
     • incomplete or incorrect replications
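A common way to catch these pre-Hadoop problems is to reconcile what the source system produced with what actually landed in HDFS. The sketch below is a minimal illustration, assuming both sides have already been read into memory as text lines (reading from a real source system or from HDFS is out of scope); the function names are hypothetical and not part of any Hadoop API. It compares record multisets by SHA-256 fingerprint, so reordering is tolerated while lost or duplicated records are reported, and it checks that replicas of the same data hold identical records.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List


def fingerprint(record: str) -> str:
    """SHA-256 digest of one record (a text line, e.g. a CSV row)."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def fingerprints(records: Iterable[str]) -> Counter:
    """Multiset of record fingerprints: order is ignored, duplicates are counted."""
    return Counter(fingerprint(r) for r in records)


def compare_source_and_landed(source: Iterable[str], landed: Iterable[str]) -> Dict[str, object]:
    """Reconcile records captured from a source system with records landed in HDFS."""
    src, dst = fingerprints(source), fingerprints(landed)
    missing = src - dst       # produced at the source but absent (or too few copies) after landing
    unexpected = dst - src    # present after landing but never produced by the source
    return {
        "source_count": sum(src.values()),
        "landed_count": sum(dst.values()),
        "missing": sum(missing.values()),
        "unexpected": sum(unexpected.values()),
        "ok": not missing and not unexpected,
    }


def replicas_consistent(replicas: List[List[str]]) -> bool:
    """True if every replica of a data set holds exactly the same records."""
    first = fingerprints(replicas[0])
    return all(fingerprints(r) == first for r in replicas[1:])


source = ["1,alice,10", "2,bob,20", "3,carol,30"]
landed = ["2,bob,20", "1,alice,10"]   # record 3 was lost on the way in
print(compare_source_and_landed(source, landed))
```

Fingerprinting whole records keeps the check independent of record layout; a real test would usually add field-level checks for the records that do arrive.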
  20. Big Data Architecture: (2) Map-Reduce Process Validation
  21. Possible problems
     • coding issues in map-reduce jobs
     • jobs that work correctly when run on a single standalone node but incorrectly when run on multiple nodes
     • incorrect aggregations, node configurations and output formats
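The "works on one node, fails on many" problem often comes from aggregation logic that is not associative, for example averaging partial averages in a combiner. The sketch below is a hypothetical in-memory MapReduce simulator, not Hadoop itself: it spreads the input round-robin over a configurable number of simulated nodes, applies an optional per-node combiner, and lets a test compare the results for 1, 2 and 4 nodes. The sales records and all helper names are made up for illustration.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Mapper = Callable[[str], Iterable[Tuple[str, object]]]
Reducer = Callable[[str, List[object]], object]


def run_mapreduce(records: List[str], mapper: Mapper, reducer: Reducer,
                  num_nodes: int = 1, combiner: Optional[Reducer] = None) -> Dict[str, object]:
    """Simulate a MapReduce job in memory with the input spread over num_nodes."""
    shuffled: Dict[str, List[object]] = defaultdict(list)
    for node in range(num_nodes):
        local: Dict[str, List[object]] = defaultdict(list)
        for record in records[node::num_nodes]:      # round-robin input split
            for key, value in mapper(record):
                local[key].append(value)
        for key, values in local.items():
            # The combiner pre-aggregates on each node before the shuffle.
            shuffled[key].extend([combiner(key, values)] if combiner else values)
    return {key: reducer(key, values) for key, values in sorted(shuffled.items())}


def consistent_across_nodes(records, mapper, reducer, combiner=None, node_counts=(1, 2, 4)) -> bool:
    """A job's result must not depend on how many nodes process the input."""
    baseline = run_mapreduce(records, mapper, reducer, 1, combiner)
    for n in node_counts:
        result = run_mapreduce(records, mapper, reducer, n, combiner)
        if result.keys() != baseline.keys():
            return False
        if not all(math.isclose(result[k], baseline[k]) for k in baseline):
            return False
    return True


# Hypothetical job: average sale amount per region.
sales = ["east,10", "east,20", "east,60", "west,5"]

def amount_mapper(line):
    region, amount = line.split(",")
    yield region, float(amount)

def mean(key, values):
    return sum(values) / len(values)

# Buggy: averaging partial averages is wrong whenever nodes see different counts for a key.
buggy_ok = consistent_across_nodes(sales, amount_mapper, mean, combiner=mean)

# Fixed: carry (sum, count) pairs through the combiner and divide only in the reducer.
def pair_mapper(line):
    region, amount = line.split(",")
    yield region, (float(amount), 1)

def sum_pairs(key, pairs):
    return (sum(p[0] for p in pairs), sum(p[1] for p in pairs))

def mean_of_pairs(key, pairs):
    total, count = sum_pairs(key, pairs)
    return total / count

fixed_ok = consistent_across_nodes(sales, pair_mapper, mean_of_pairs, combiner=sum_pairs)
print(buggy_ok, fixed_ok)
```

With these made-up records the averaging combiner yields 27.5 for `east` on two nodes instead of 30.0, so the check fails, while the (sum, count) version agrees for every node count tried.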
  22. Big Data Architecture: (3) Data Extract and Load Process
  23. Possible problems
     • incorrectly applied transformation rules
     • incomplete data extract from HDFS
     • incorrect load of HDFS files into analysis tools
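Transformation and extract problems can be surfaced by re-implementing the transformation rule independently and reconciling its output, key by key, against what was actually loaded. A minimal sketch, assuming the processed HDFS output and the loaded target rows are available as lists of dictionaries that share a key column; `to_warehouse` is a hypothetical transformation rule (cents to units, upper-case country code), not one from the talk.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def reconcile(processed: Iterable[Row], loaded: Iterable[Row], key: str,
              transform: Callable[[Row], Row]) -> Dict[str, List[object]]:
    """Compare expected target rows (processed data + re-implemented rule) with loaded rows."""
    loaded = list(loaded)
    expected = {row[key]: transform(row) for row in processed}
    actual = {row[key]: row for row in loaded}
    key_counts = Counter(row[key] for row in loaded)
    return {
        "missing": sorted(k for k in expected if k not in actual),        # incomplete extract/load
        "unexpected": sorted(k for k in actual if k not in expected),     # rows with no source
        "mismatched": sorted(k for k in expected if k in actual and expected[k] != actual[k]),
        "duplicates": sorted(k for k, n in key_counts.items() if n > 1),  # loaded more than once
    }


def to_warehouse(row: Row) -> Row:
    """Hypothetical transformation rule: cents to units, upper-case country code."""
    return {"id": row["id"], "country": str(row["country"]).upper(),
            "amount": row["amount_cents"] / 100}


processed = [
    {"id": 1, "country": "ro", "amount_cents": 1250},
    {"id": 2, "country": "de", "amount_cents": 300},
    {"id": 3, "country": "fr", "amount_cents": 99},
]
loaded = [
    {"id": 1, "country": "RO", "amount": 12.5},
    {"id": 2, "country": "de", "amount": 3.0},   # rule not applied to the country code
]
print(reconcile(processed, loaded, "id", to_warehouse))
```

Here row 3 was never loaded (an incomplete extract) and row 2 kept a lower-case country code (a transformation rule that was not applied).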
  24. Big Data Architecture: Reports Testing
  25. Possible problems
     • report definitions not set according to requirements
     • report data issues
     • layout and format issues
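Parts of report testing can be automated by comparing the report against its requirement and against totals recomputed from the underlying data. The sketch assumes a hypothetical report shape: one row per group, with the total rendered as text such as `1,250.50`. The required columns, the display-format pattern and the 0.005 tolerance are illustrative assumptions.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List

# Assumed display format for totals: thousands separators and exactly two decimals.
AMOUNT_FORMAT = re.compile(r"\d{1,3}(,\d{3})*\.\d{2}")


def check_report(report_rows: List[Dict[str, object]], required_columns: List[str],
                 source_rows: List[Dict[str, object]], group_by: str, measure: str,
                 total_column: str = "total") -> List[str]:
    """Return the problems found in a grouped-totals report (empty list = pass)."""
    problems: List[str] = []
    # Report definition: every column named in the requirement must be present.
    for i, row in enumerate(report_rows):
        missing = [c for c in required_columns if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    # Report data: recompute the expected totals independently from the underlying rows.
    expected: Dict[object, float] = defaultdict(float)
    for row in source_rows:
        expected[row[group_by]] += row[measure]
    for row in report_rows:
        group, text = row.get(group_by), row.get(total_column)
        # Layout/format: the total must be rendered in the agreed display format.
        if not isinstance(text, str) or not AMOUNT_FORMAT.fullmatch(text):
            problems.append(f"{group}: badly formatted total {text!r}")
            continue
        value = float(text.replace(",", ""))
        if not math.isclose(value, expected.get(group, 0.0), abs_tol=0.005):
            problems.append(f"{group}: reported {value}, expected {expected.get(group, 0.0)}")
    # Groups present in the data but absent from the report.
    reported_groups = {row.get(group_by) for row in report_rows}
    problems += [f"{g}: missing from report" for g in expected if g not in reported_groups]
    return problems


source = [
    {"region": "east", "amount": 1000.0},
    {"region": "east", "amount": 250.5},
    {"region": "west", "amount": 99.0},
    {"region": "north", "amount": 1.0},
]
report = [{"region": "east", "total": "1,250.50"}, {"region": "west", "total": "99"}]
print(check_report(report, ["region", "total"], source, "region", "amount"))
```

Recomputing totals from the processed data, rather than from the report itself, is what lets the check tell a data issue apart from a layout issue.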
  26. Big Data Architecture: Non-Functional Testing
  27. Possible problems
     • imbalance in input splits
     • redundant sorts
     • moving most of the aggregation computations to the Reduce process
     • node failures
     • data corruption
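Input-split imbalance and key skew can be quantified before looking at job timings. The sketch below computes a simple skew ratio (largest load divided by mean load) for input split sizes or per-reducer record counts. The 1.5 threshold is an arbitrary example, and `reducer_loads` uses CRC-32 of the key as a stand-in partitioner; that is an assumption for illustration, not Hadoop's own partitioning function.

```python
import zlib
from typing import List, Sequence


def skew_ratio(loads: Sequence[int]) -> float:
    """Largest load divided by the mean load; 1.0 means perfectly balanced."""
    if not loads:
        raise ValueError("no loads given")
    avg = sum(loads) / len(loads)
    return max(loads) / avg if avg else 1.0


def overloaded(loads: Sequence[int], threshold: float = 1.5) -> List[int]:
    """Indexes of splits (or reducers) whose load exceeds threshold x the mean."""
    avg = sum(loads) / len(loads) if loads else 0
    return [i for i, load in enumerate(loads) if avg and load / avg > threshold]


def reducer_loads(keys: Sequence[str], num_reducers: int) -> List[int]:
    """Records per reducer, using CRC-32 of the key as a stand-in partitioner."""
    loads = [0] * num_reducers
    for key in keys:
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return loads


split_sizes = [100, 100, 100, 500]                    # e.g. records per input split
print(skew_ratio(split_sizes), overloaded(split_sizes))

keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]    # one key dominates the data
print(skew_ratio(reducer_loads(keys, 4)))
```

A single dominant key sends at least 90 of the 100 records to one reducer, so no partitioner can balance it; the usual remedies are changing the key design or pre-aggregating on the map side.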
  28. New to the tester
     • Semi-structured and unstructured data
     • Immense volumes of dynamic, complex data
     • Test environment
     • Big Data ecosystem
     • Pure programming tools
     • Non-SQL queries
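With semi-structured data a tester often cannot rely on a fixed table schema, but can still check a minimal contract: which fields must be present and what type each known field has, while tolerating extra fields. A sketch for JSON Lines input, assuming a hypothetical contract (`user_id` as an integer, `event` as a string, an optional `tags` list) that is not taken from the talk:

```python
import json
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimal contract for click-stream events (an assumption for illustration).
REQUIRED: Dict[str, type] = {"user_id": int, "event": str}
OPTIONAL: Dict[str, type] = {"tags": list}


def has_type(value: object, expected: type) -> bool:
    """isinstance check that does not accept True/False where an int is expected."""
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate_json_lines(lines: Iterable[str], required: Dict[str, type],
                        optional: Dict[str, type]) -> Dict[str, object]:
    valid, unparseable = 0, 0
    invalid: List[Tuple[int, List[str]]] = []
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            unparseable += 1
            continue
        if not isinstance(record, dict):
            invalid.append((line_no, ["not a JSON object"]))
            continue
        errors = [f"missing {field}" for field in required if field not in record]
        errors += [f"{field} should be {t.__name__}"
                   for field, t in {**required, **optional}.items()
                   if field in record and not has_type(record[field], t)]
        # Fields outside the contract are tolerated: semi-structured records may vary.
        if errors:
            invalid.append((line_no, errors))
        else:
            valid += 1
    return {"valid": valid, "unparseable": unparseable, "invalid": invalid}


lines = [
    '{"user_id": 1, "event": "click"}',
    '{"user_id": "2", "event": "view", "tags": ["a"]}',
    '{"event": "view", "referrer": "x"}',
    'not json',
]
print(validate_json_lines(lines, REQUIRED, OPTIONAL))
```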
  29. Testing Big Data
     • Big
     • Fast
     • Complex
     • Rewarding
  30. Q&A
  31. Thank you! Please fill in your evaluation form.
     anca.sfecla@embarcadero.com
