3. Prepared by Anca Sfecla, QAM - Embarcadero Technologies
What is Big Data?
• “Big Data is the frontier of a firm’s ability to store,
process, and access all the data it needs to
operate effectively, make decisions, reduce risks,
and serve customers.” - Forrester Research
• “Big data creates a new layer in the economy
which is all about information, turning
information, or data, into revenue. In 2013, big
data is forecast to drive $34 billion of IT spending”
– Gartner Research
4. Big Data Characteristics
Big Data: Volume, Variety, Velocity, Value
9. Big Data Success Stories
• Detecting infections in premature infants up
to 24 hours before they exhibit symptoms
• Reducing the cost of sequencing a genome
from $10,000 to less than $100
• Predicting flu outbreaks by analyzing massive
numbers of Google searches related to flu
symptoms
11. EDW versus Big Data
EDW | Big Data
Clean data | Unclean data
Gigabytes to terabytes (1 TB = 1000 GB) | Petabytes (1 PB = 1000 TB) to exabytes (1 EB = 1000 PB)
Simplified, structured | Complex, semi-structured or unstructured
Data from relational databases | Data from non-relational flat-file storage
Centralized data | Distributed data
Structured database schema | Customized schema, generated instantly
12. Big Data Solutions
Microsoft Big Data Solution
16. Big Data Processing using Hadoop Framework
17. Big Data Architecture
[Diagram: data sources (web logs, streaming data, social data, transactional data from an RDBMS) feed Hadoop, with data load using Sqoop. Hadoop components shown: HDFS (Hadoop Distributed File System), HBase (NoSQL DB), MapReduce (job execution), Hive, Pig. Processed data passes through an ETL process into the Enterprise Data Warehouse and Big Data analytics.]
18. Big Data Architecture
[Diagram: the Big Data architecture from slide 17, with testing stage 1 (Pre-Hadoop Processing) highlighted.]
19. Possible problems
• incorrect data captured from source systems
• incorrect storage of data
• incomplete or incorrect replications
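One way to catch the problems above is to reconcile what was loaded against the source. The following is a minimal sketch in plain Python, assuming both sides can be read as lists of dictionaries. The function names `record_digest` and `reconcile` and the sample records are hypothetical; a real check would read from the source system and from HDFS.

```python
import hashlib
import json


def record_digest(record):
    """Return a stable hash of one record (keys sorted so field order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def reconcile(source_records, loaded_records):
    """Compare source and loaded data by record count and by per-record digest."""
    source_digests = {record_digest(r) for r in source_records}
    loaded_digests = {record_digest(r) for r in loaded_records}
    return {
        "source_count": len(source_records),
        "loaded_count": len(loaded_records),
        "missing_in_load": len(source_digests - loaded_digests),
        "unexpected_in_load": len(loaded_digests - source_digests),
    }


# Hypothetical sample data: one record was altered during the load.
source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.5}]
loaded = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 2.55}]
result = reconcile(source, loaded)
print(result)
```

Matching counts alone would miss the altered record here, which is why the sketch also compares digests.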
20. Big Data Architecture
[Diagram: the Big Data architecture from slide 17, with testing stages 1 (Pre-Hadoop Processing) and 2 (Map-Reduce process validation) highlighted.]
21. Possible problems
• coding issues in map-reduce jobs
• jobs that work correctly on a standalone
node but incorrectly on multiple nodes
• incorrect aggregations, node
configurations, and output formats
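The standalone-versus-multi-node problem can be illustrated without a cluster. The sketch below simulates a word-count job in plain Python and checks that the result does not depend on how the input is split. It is not Hadoop code; `map_phase`, `reduce_phase`, `run_job`, and the sample lines are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) pairs, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines, num_splits):
    """Run the job over num_splits input splits, then merge the partial results."""
    splits = [lines[i::num_splits] for i in range(num_splits)]
    partials = [reduce_phase(map_phase(split)) for split in splits]
    # Merging the partial totals stands in for the shuffle and final reduce.
    return reduce_phase(pair for partial in partials for pair in partial.items())


lines = ["big data is big", "data is fast", "fast and complex"]
single_node = run_job(lines, num_splits=1)
multi_node = run_job(lines, num_splits=3)
print(single_node == multi_node)
```

A job whose reduce step is not associative (for example, one that averages partial averages) would fail this comparison, which is the kind of defect that only appears on multiple nodes.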
22. Big Data Architecture
[Diagram: the Big Data architecture from slide 17, with testing stages 1 (Pre-Hadoop Processing), 2 (Map-Reduce process validation), and 3 (Data Extract and Load Process) highlighted.]
23. Possible problems
• incorrectly applied transformation
rules
• incomplete data extract from HDFS
• incorrect load of HDFS files into
analysis tools
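A check for these problems can recompute the expected target rows from the extract and compare them with what was actually loaded. This is a minimal sketch under stated assumptions: the transformation rule in `apply_rule` (cents to dollars, country code upper-cased), the field names, and the sample rows are all hypothetical.

```python
def apply_rule(row):
    """Hypothetical transformation rule: cents to dollars, country code upper-cased."""
    return {
        "id": row["id"],
        "amount_usd": round(row["amount_cents"] / 100, 2),
        "country": row["country"].strip().upper(),
    }


def validate_load(extracted_rows, target_rows):
    """Return ids that are missing from the target and ids whose values differ."""
    target_by_id = {row["id"]: row for row in target_rows}
    missing, mismatched = [], []
    for row in extracted_rows:
        expected = apply_rule(row)
        actual = target_by_id.get(row["id"])
        if actual is None:
            missing.append(row["id"])
        elif actual != expected:
            mismatched.append(row["id"])
    return missing, mismatched


# Hypothetical rows extracted from HDFS and rows found in the target system.
extracted = [
    {"id": 1, "amount_cents": 1999, "country": " ro "},
    {"id": 2, "amount_cents": 500, "country": "us"},
    {"id": 3, "amount_cents": 120, "country": "de"},
]
target = [
    {"id": 1, "amount_usd": 19.99, "country": "RO"},
    {"id": 2, "amount_usd": 5.0, "country": "us"},  # rule not applied to country
]
missing, mismatched = validate_load(extracted, target)
print(missing, mismatched)
```

Here row 3 was never loaded (an incomplete extract) and row 2 was loaded without the rule applied.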
24. Big Data Architecture
[Diagram: the Big Data architecture from slide 17, with testing stages 1 (Pre-Hadoop Processing), 2 (Map-Reduce process validation), and 3 (Data Extract and Load Process) highlighted, plus Reports testing.]
25. Possible problems
• report definitions that do not match the requirements
• report data issues
• layout and format issues
26. Big Data Architecture
[Diagram: the Big Data architecture from slide 17, with testing stages 1 (Pre-Hadoop Processing), 2 (Map-Reduce process validation), and 3 (Data Extract and Load Process) highlighted, plus Reports testing and Non-Functional Testing.]
27. Possible problems
• imbalance in input splits
• redundant sorts
• moving most of the aggregation computations to the
Reduce process
• node failures
• data corruption
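Two of the problems above, imbalanced input splits and leaving most aggregation to the Reduce process, can be quantified with simple counts. The sketch below is a plain-Python illustration, not Hadoop code: it compares how many key-value pairs would be shuffled with and without map-side pre-aggregation (a combiner) and computes a skew ratio for the splits. All function names and the sample input are hypothetical.

```python
from collections import Counter


def shuffled_pairs_without_combiner(splits):
    """Each mapper emits one (word, 1) pair per word; every pair is shuffled."""
    return sum(len(split) for split in splits)


def shuffled_pairs_with_combiner(splits):
    """Each mapper pre-aggregates locally, so one pair per distinct word is shuffled."""
    return sum(len(Counter(split)) for split in splits)


def split_skew(splits):
    """Ratio of the largest split to the mean split size; 1.0 means perfectly balanced."""
    sizes = [len(split) for split in splits]
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical input: each inner list holds the words seen by one mapper.
splits = [
    ["error", "error", "warn", "error", "info", "error"],
    ["info", "info"],
    ["warn"],
]
without = shuffled_pairs_without_combiner(splits)
with_combiner = shuffled_pairs_with_combiner(splits)
skew = split_skew(splits)
print(without, with_combiner, skew)
```

In a performance test, a high skew ratio or a large gap between the two pair counts points to work that could be rebalanced or moved to the map side.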
28. New to the tester
• Semi-structured and unstructured data
• Immense volumes of dynamic, complex data
• Test environment
• Big Data ecosystem
• Pure programming tools
• Non-SQL queries
29. Testing Big Data
• Big
• Fast
• Complex
• Rewarding