
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero


  1. Testing Big Data. Prepared by: Anca Andreea Sfecla, Quality Assurance Manager, Embarcadero Technologies Romania @ CODECAMP 2013, 20th April 2013
  2. Prepared by Anca Sfecla, QAM - Embarcadero Technologies
  3. What is Big Data?
     • “Big Data is the frontier of a firm’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” - Forrester Research
     • “Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending” - Gartner Research
  4. Big Data Characteristics: Volume, Variety, Velocity, Value
  9. Big Data Success Stories
     • Detecting infections in premature infants up to 24 hours before they exhibit symptoms
     • Reducing the cost of sequencing a genome from $10,000 to less than $100
     • Predicting flu outbreaks by analyzing a massive number of Google searches related to flu symptoms
  10. EDW versus Big Data
  11. EDW versus Big Data

      EDW                                  Big Data
      Clean data                           Unclean data
      Gigabytes to terabytes (1000 GB)     Petabytes (1000 TB) to exabytes (1000 PB)
      Simplified, structured               Complex, semi-structured or unstructured
      Data from relational database        Data from non-relational flat file storage
      Centralized data                     Distributed data
      Structured database schema           Customized-instant schema, generated
  12. Big Data Solutions: Microsoft Big Data Solution
  13. Big Data Solutions
  14. Big Data Solutions
  16. Big Data Processing using Hadoop Framework
  17. Big Data Architecture (diagram)
      • Data sources: Web Logs, Streaming Data, Social Data, Transactional Data (RDBMS)
      • Hadoop: Hive, Pig, MapReduce (Job Execution), HBase (NoSQL DB), HDFS (Hadoop Distributed File System)
      • Other labels: Processed Data, Data Load using Sqoop, ETL Process, Enterprise Data Warehouse, Big Data Analytics
  18. Big Data Architecture (same diagram), with testing stage marked: 1 Pre-Hadoop Processing
  19. Possible problems
      • incorrect data captured from source systems
      • incorrect storage of data
      • incomplete or incorrect replications
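One way to catch the pre-Hadoop problems listed on slide 19 is to reconcile what the source system holds against what actually landed in storage. The sketch below is illustrative only and is not taken from the talk: the function names (`record_digest`, `reconcile`) and the sample records are hypothetical, and it assumes both sides can be read as sequences of field tuples.

```python
from collections import Counter
import hashlib


def record_digest(record):
    """Return a stable SHA-256 digest for one record (a tuple of field values)."""
    joined = "\x1f".join(str(field) for field in record)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_records, loaded_records):
    """Compare two record collections, ignoring order but respecting duplicates.

    Returns (missing_count, unexpected_count): how many source records are
    absent from the loaded copy, and how many loaded records have no
    counterpart in the source.
    """
    source = Counter(record_digest(r) for r in source_records)
    loaded = Counter(record_digest(r) for r in loaded_records)
    missing = source - loaded      # in the source, not in the loaded copy
    unexpected = loaded - source   # in the loaded copy, not in the source
    return sum(missing.values()), sum(unexpected.values())


# Hypothetical sample data: record 2 was dropped and record 3 was stored twice.
source = [(1, "alice", 10.0), (2, "bob", 20.5), (3, "carol", 7.25)]
loaded = [(1, "alice", 10.0), (3, "carol", 7.25), (3, "carol", 7.25)]

missing_count, unexpected_count = reconcile(source, loaded)
print(missing_count, unexpected_count)
```

Comparing digests rather than whole records keeps the comparison cheap when records are wide; on real volumes the same idea would normally run as a distributed job rather than in memory.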
  20. Big Data Architecture (same diagram), with testing stages marked: 1 Pre-Hadoop Processing, 2 Map-Reduce process validation
  21. Possible problems
      • coding issues in map-reduce jobs
      • jobs working correctly when run in standalone node, but working incorrectly when run on multiple nodes
      • incorrect aggregations, node configurations and incorrect output format
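The "works on one node, fails on many" problem from slide 21 can be partly guarded against before a job reaches a cluster by checking that its result does not depend on how the input is split. The following is a minimal in-memory simulation, not Hadoop code: `map_phase`, `reduce_phase` and `run_job` are hypothetical names, and word counting stands in for whatever aggregation the real job performs.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_phase(key, values):
    """Sum all counts emitted for one key."""
    return key, sum(values)


def run_job(splits):
    """Simulate a job: map each split independently, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for split in splits:
        for line in split:
            for key, value in map_phase(line):
                shuffled[key].append(value)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())


lines = ["big data is big", "data is fast", "fast and complex"]

# Same input, once as a single split and once as one split per line.
single_node = run_job([lines])
multi_node = run_job([[lines[0]], [lines[1]], [lines[2]]])

assert single_node == multi_node
print(single_node["big"], single_node["data"])
```

A check like this only exercises the job logic. It does not reproduce node configuration or output-format issues, which still need a run on a real multi-node environment.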
  22. Big Data Architecture (same diagram), with testing stages marked: 1 Pre-Hadoop Processing, 2 Map-Reduce process validation, 3 Data Extract and Load Process
  23. Possible problems
      • incorrectly applied transformation rules
      • incomplete data extract from HDFS
      • incorrect load of HDFS files into analysis tools
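The extract-and-load problems on slide 23 lend themselves to a test that re-applies the transformation rules independently and compares the outcome with what the target holds. This is a sketch under assumptions: the rule in `transform` (trim and upper-case a country code, convert cents to currency units), the field names and the sample rows are all hypothetical and are not from the talk.

```python
def transform(row):
    """Hypothetical rule: trim and upper-case the country code, convert cents to units."""
    return {
        "id": row["id"],
        "country": row["country"].strip().upper(),
        "amount": row["amount_cents"] / 100,
    }


def verify_load(extracted, loaded):
    """Return a list of human-readable problems found in the loaded rows."""
    problems = []
    if len(extracted) != len(loaded):
        problems.append(
            f"row count mismatch: extracted {len(extracted)}, loaded {len(loaded)}"
        )
    loaded_by_id = {row["id"]: row for row in loaded}
    for row in extracted:
        expected = transform(row)
        actual = loaded_by_id.get(row["id"])
        if actual is None:
            problems.append(f"id {row['id']} missing from target")
        elif actual != expected:
            problems.append(f"id {row['id']} transformed incorrectly")
    return problems


# Hypothetical sample: row 2 reached the target without the upper-case rule applied.
extracted = [
    {"id": 1, "country": " ro ", "amount_cents": 1250},
    {"id": 2, "country": "us", "amount_cents": 300},
]
loaded = [
    {"id": 1, "country": "RO", "amount": 12.5},
    {"id": 2, "country": "us", "amount": 3.0},
]

print(verify_load(extracted, loaded))
```

Keeping the test's copy of the rule separate from the ETL code is deliberate: if both shared one implementation, a wrong rule would pass its own check.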
  24. Big Data Architecture (same diagram), with testing stages marked: 1 Pre-Hadoop Processing, 2 Map-Reduce process validation, 3 Data Extract and Load Process, Reports testing
  25. Possible problems
      • report definitions not set as per requirement
      • report data issues
      • layout and format issues
  26. Big Data Architecture (same diagram), with testing stages marked: 1 Pre-Hadoop Processing, 2 Map-Reduce process validation, 3 Data Extract and Load Process, Reports testing, Non Functional Testing
  27. Possible problems
      • imbalance in input splits
      • redundant sorts
      • moving most of the aggregation computations to the Reduce process
      • node failures
      • data corruption
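Two of the non-functional problems on slide 27 can be made measurable with simple metrics. The sketch below is an assumption-laden illustration, not the speaker's method: `split_skew` reports how much larger the biggest input split is than the average, and `shuffled_record_count` estimates how many records cross from map to reduce with and without map-side pre-aggregation (the role a combiner plays in MapReduce). Both function names and all sample values are hypothetical.

```python
from collections import Counter


def split_skew(split_sizes):
    """Ratio of the largest split to the mean split size (1.0 means perfectly balanced)."""
    mean = sum(split_sizes) / len(split_sizes)
    return max(split_sizes) / mean


def shuffled_record_count(splits, use_combiner):
    """Count records sent to the reduce side for a word-count style job.

    Without a combiner every word occurrence is shuffled; with one, each
    split sends a single pre-aggregated record per distinct word.
    """
    total = 0
    for split in splits:
        words = [word for line in split for word in line.split()]
        total += len(Counter(words)) if use_combiner else len(words)
    return total


# Hypothetical split sizes in MB: one balanced layout, one skewed layout.
print(split_skew([128, 128, 128, 128]))
print(split_skew([64, 64, 64, 320]))

# Hypothetical input: two splits with heavily repeated keys.
splits = [["a a a b"], ["a b b b"]]
print(shuffled_record_count(splits, use_combiner=False))
print(shuffled_record_count(splits, use_combiner=True))
```

Thresholds for what counts as too much skew or too much shuffle traffic depend on the cluster and the job, so a test would compare these numbers against limits agreed for the specific system.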
  28. New to the tester
      • Semi-structured and unstructured data
      • Immense volumes of dynamic, complex data
      • Test environment
      • Big Data ecosystem
      • Pure programming tools
      • Non-SQL interrogations
  29. Testing Big Data
      • Big
      • Fast
      • Complex
      • Rewarding
  30. Q&A
  31. Thank you! Please fill in your evaluation form. anca.sfecla@embarcadero.com