Iasi CodeCamp, 20 April 2013: Testing Big Data - Anca Sfecla, Embarcadero

    Presentation Transcript

    • Testing Big Data. Prepared by: Anca Andreea Sfecla, Quality Assurance Manager, Embarcadero Technologies Romania @ CODECAMP 2013, 20th April 2013
    • Prepared by Anca Sfecla, QAM - Embarcadero Technologies
    • What is Big Data?
        • “Big Data is the frontier of a firm’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” - Forrester Research
        • “Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending” - Gartner Research
    • Big Data Characteristics: Volume, Variety, Velocity, Value
    • Big Data Success Stories
        • Detecting infections in premature infants up to 24 hours before they exhibit symptoms
        • Reducing the cost of sequencing a genome from $10,000 to less than $100
        • Predicting flu outbreaks by analyzing massive numbers of Google searches related to flu symptoms
    • EDW versus Big Data
        • Data quality: EDW: clean data. Big Data: unclean data.
        • Volume: EDW: gigabytes to terabytes (1000 GB). Big Data: petabytes (1000 TB) to exabytes (1000 PB).
        • Structure: EDW: simplified, structured. Big Data: complex, semi-structured or unstructured.
        • Source: EDW: data from relational databases. Big Data: data from non-relational flat-file storage.
        • Location: EDW: centralized data. Big Data: distributed data.
        • Schema: EDW: structured database schema. Big Data: customized-instant schema, generated.
    • Big Data Solutions: Microsoft Big Data Solution
    • Big Data Solutions
    • Big Data Solutions
    • Big Data Processing using Hadoop Framework
    • Big Data Architecture (diagram). Components shown: Web Logs, Streaming Data, Social Data, Transactional Data (RDBMS); Data Load using Sqoop; HADOOP: HDFS (Hadoop Distributed File System), HBase (NoSQL DB), MapReduce (Job Execution), Pig, Hive; Processed Data; ETL Process; Enterprise Data Warehouse; Big Data Analytics
    • Big Data Architecture (diagram), testing stage highlighted: 1. Pre-Hadoop Processing
    • Possible problems
        • incorrect data captured from source systems
        • incorrect storage of data
        • incomplete or incorrect replications
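
One way to catch the pre-Hadoop problems listed above is to compare what the source system holds with what actually landed in storage. The following is a minimal sketch, not the presenter's method: the record layout, the sample rows, and the helper names `record_digest` and `compare_loads` are all hypothetical, and a real check would read from the source system and from HDFS rather than from in-memory lists.

```python
import hashlib


def record_digest(record):
    """Return a stable SHA-256 digest for one record (a tuple of field values)."""
    joined = "\x1f".join(str(field) for field in record)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_loads(source_records, landed_records):
    """Compare two record collections by count and by content digest.

    Returns a dict with both counts plus the digests that are missing
    from, or unexpected in, the landed copy.
    """
    source_digests = {record_digest(r) for r in source_records}
    landed_digests = {record_digest(r) for r in landed_records}
    return {
        "source_count": len(source_records),
        "landed_count": len(landed_records),
        "missing": source_digests - landed_digests,
        "unexpected": landed_digests - source_digests,
    }


# Hypothetical sample data: one row was dropped and one was altered in transit.
source = [(1, "alice", 10.0), (2, "bob", 20.5), (3, "carol", 7.25)]
landed = [(1, "alice", 10.0), (2, "bob", 25.0)]

report = compare_loads(source, landed)
print(report["source_count"], report["landed_count"])  # 3 2
print(len(report["missing"]), len(report["unexpected"]))  # 2 1
```

Because the digests are held in sets, exact duplicate records collapse into one entry; the separate count comparison is what would reveal a duplicated or dropped copy of an otherwise identical row.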
    • Big Data Architecture (diagram), testing stages highlighted: 1. Pre-Hadoop Processing; 2. Map-Reduce process validation
    • Possible problems
        • coding issues in map-reduce jobs
        • jobs working correctly when run on a standalone node, but working incorrectly when run on multiple nodes
        • incorrect aggregations, incorrect node configurations, and incorrect output format
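
The "works on one node, fails on many" problem above can be illustrated without a cluster. The sketch below simulates a word-count job in plain Python (it does not use the Hadoop API) and checks that the result is the same whether the input is processed as one split or several. The function names and sample lines are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(lines):
    """Map step: count words within one input split."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_phase(partials):
    """Reduce step: merge the per-split counts into one total."""
    return reduce(lambda left, right: left + right, partials, Counter())


def run_job(lines, num_splits):
    """Run the map step over num_splits splits, then reduce the results."""
    splits = [lines[i::num_splits] for i in range(num_splits)]
    return reduce_phase(map_phase(split) for split in splits)


lines = ["big data is big", "data is fast", "fast and complex data"]
single_node = run_job(lines, num_splits=1)
multi_node = run_job(lines, num_splits=3)

print(single_node == multi_node)  # True
print(single_node["data"])  # 3
```

A job whose reduce step is not associative and commutative (for example, one that averages per-split averages) would fail this comparison, which is the kind of defect that only appears once the data is spread across nodes.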
    • Big Data Architecture (diagram), testing stages highlighted: 1. Pre-Hadoop Processing; 2. Map-Reduce process validation; 3. Data Extract and Load Process
    • Possible problems
        • incorrectly applied transformation rules
        • incomplete data extract from HDFS
        • incorrect load of HDFS files into analysis tools
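
A common way to test the extract-and-load problems above is to re-apply the transformation rule independently in the test and compare the outcome with what was loaded. The sketch below assumes a hypothetical rule (trim and upper-case a country code, convert cents to currency units) and hypothetical row layouts; none of these come from the presentation.

```python
def expected_transform(row):
    """Hypothetical transformation rule: trim and upper-case the country code,
    and convert the amount from cents to whole currency units."""
    return {
        "id": row["id"],
        "country": row["country"].strip().upper(),
        "amount": row["amount_cents"] / 100,
    }


def validate_load(extracted_rows, loaded_rows):
    """Return ids that are missing from the load and ids whose values differ."""
    loaded_by_id = {row["id"]: row for row in loaded_rows}
    missing, mismatched = [], []
    for row in extracted_rows:
        expected = expected_transform(row)
        actual = loaded_by_id.get(row["id"])
        if actual is None:
            missing.append(row["id"])
        elif actual != expected:
            mismatched.append(row["id"])
    return missing, mismatched


extracted = [
    {"id": 1, "country": " ro ", "amount_cents": 1250},
    {"id": 2, "country": "us", "amount_cents": 300},
    {"id": 3, "country": "de", "amount_cents": 999},
]
loaded = [
    {"id": 1, "country": "RO", "amount": 12.5},
    {"id": 2, "country": "us", "amount": 3.0},  # rule not applied to country
]

missing, mismatched = validate_load(extracted, loaded)
print(missing, mismatched)  # [3] [2]
```

Row 3 is reported as an incomplete extract and row 2 as an incorrectly applied rule, which maps onto the first two problems in the list.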
    • Big Data Architecture (diagram), testing stages highlighted: 1. Pre-Hadoop Processing; 2. Map-Reduce process validation; 3. Data Extract and Load Process; Reports testing
    • Possible problems
        • report definitions not set as per requirement
        • report data issues
        • layout and format issues
    • Big Data Architecture (diagram), testing stages highlighted: 1. Pre-Hadoop Processing; 2. Map-Reduce process validation; 3. Data Extract and Load Process; Reports testing; Non-Functional Testing
    • Possible problems
        • imbalance in input splits
        • redundant sorts
        • moving most of the aggregation computations to the Reduce process
        • node failures
        • data corruption
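
The first item above, imbalance in input splits, lends itself to a simple numeric check. The sketch below flags a job when its largest split is much bigger than the average split. The split sizes (in MB), the helper names, and the threshold of 1.5 are illustrative assumptions, not values from the presentation or from Hadoop.

```python
def split_skew(split_sizes):
    """Return the ratio of the largest split to the mean split size."""
    if not split_sizes:
        raise ValueError("split_sizes must not be empty")
    mean_size = sum(split_sizes) / len(split_sizes)
    return max(split_sizes) / mean_size


def is_imbalanced(split_sizes, threshold=1.5):
    """Flag the job when the largest split exceeds threshold times the mean."""
    return split_skew(split_sizes) > threshold


# Hypothetical split sizes in MB for two jobs over the same total volume.
balanced = [128, 128, 120, 136]
skewed = [128, 16, 16, 352]

print(is_imbalanced(balanced), is_imbalanced(skewed))  # False True
```

The slowest task bounds the job's run time, so a ratio well above 1 suggests one mapper is doing a disproportionate share of the work.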
    • New to the tester
        • Semi-structured and unstructured data
        • Immense volumes of dynamic, complex data
        • Test environment
        • Big Data ecosystem
        • Pure programming tools
        • Non-SQL interrogations
    • Testing Big Data
        • Big
        • Fast
        • Complex
        • Rewarding
    • Q&A
    • Thank you! Please fill in your evaluation form. anca.sfecla@embarcadero.com