Big Data
Storage POC
ASHA CAMPER SINGH
SOFTWARE ENGINEER INTERN - SUMMER 2015
GOAL
Store HBase Dedupe HBase Write SequenceFile
GOAL 1: STORE
CSV
POJO
HBase
TEST_RESULT_KEY
STATUS
CLUSTER_KEY
KNOWN_ISSUE_KEY
BASELINE_OUTCOME
UPGRADED_OUTCOME
BASELINE_MESSAGE
UPGRADED_MESSAGE
IS_DATA_SILO
IS_COMPILE_FAILURE
NAMESPACE
......
GOAL 1: STORE
CSV
POJO
HBase
Entity
“TEST_RESULT_KEY” → “foo”
“STATUS” → “foo”
“CLUSTER_KEY” → “foo”
“KNOWN_ISSUE_KEY” → “foo”
“BASELINE_OUTCOME” → “foo”
“UPGRADED_OUTCOME” → “foo”
“BASELINE_MESSAGE” → “foo”
“UPGRADED_MESSAGE” → “foo”
“IS_DATA_SILO” → “foo”
“IS_COMPILE_FAILURE” → “foo”
“NAMESPACE” → “foo”
...... → .....
UPSERT INTO
TEST_RESULT ...
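The CSV → Entity → UPSERT flow above can be sketched roughly as follows. This is a minimal illustration, not the code behind the slides: the helper names `rows_to_entities` and `build_upsert` are hypothetical, only three of the listed columns are used, and the statement is only built as a string with `?` placeholders rather than executed against Phoenix.

```python
import csv
import io


def rows_to_entities(csv_text):
    """Parse CSV text with a header row into one dict (entity) per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def build_upsert(table, entity):
    """Return a parameterized UPSERT statement plus its values for one entity.

    Assumption: one UPSERT per row, with columns named after the CSV header.
    """
    columns = list(entity)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"UPSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [entity[c] for c in columns]


# Sample data mirrors the "foo" placeholders on the slide.
sample = "TEST_RESULT_KEY,STATUS,CLUSTER_KEY\nfoo,foo,foo\n"
entities = rows_to_entities(sample)
sql, params = build_upsert("TEST_RESULT", entities[0])
print(sql)
print(params)
```

Keeping the values separate from the statement text means the same statement shape can be reused for every row of a file.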
GOAL 1: STORE
CSV
Map Reduce
/usr/asha/CSVs
Map
/usr/asha/CSVs/1.csv
/usr/asha/CSVs/2.csv
/usr/asha/CSVs/3.csv
/usr/asha/CSVs/4.csv
/usr/asha/CSVs/5.csv
………
/usr/…/1.csv 34223
/usr/…/2.csv 98702
/usr/…/3.csv 982
/usr/…/4.csv 12570
/usr/…/5.csv 8787
………
Reduce
Input.txt Output.txt
absolutePath()
Entity
UPSERT INTO
TEST_RESULT ...
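The map and reduce steps above might look like the following in plain Python. The slides do not say what the number next to each path means, so this sketch assumes it is the count of rows read from that file; `map_file`, `reduce_counts`, and `run` are hypothetical names, and a real Hadoop job would upsert each row inside the map step instead of only counting it.

```python
import csv
import os
import tempfile
from collections import defaultdict


def map_file(path):
    """Map step: read one CSV file and emit (path, number of data rows)."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        count = sum(1 for _ in reader)  # assumption: a real job upserts each row here
    return path, count


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return dict(totals)


def run(directory):
    """Apply the map step to every .csv file in a directory, then reduce."""
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".csv")
    )
    return reduce_counts(map_file(p) for p in paths)


# Small local stand-in for the CSV directory shown on the slide.
with tempfile.TemporaryDirectory() as tmp:
    for name, n in (("1.csv", 3), ("2.csv", 1)):
        with open(os.path.join(tmp, name), "w", newline="") as handle:
            handle.write("TEST_RESULT_KEY,STATUS\n" + "foo,foo\n" * n)
    totals = run(tmp)
    by_name = {os.path.basename(p): c for p, c in totals.items()}
print(by_name)
```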
GOAL 2: DEDUPE
F-P: “NOT_A_BUG”
F-F: “DUPLICATE_OF_BASELINE”
Perm Regex: “DUPLICATE_PERM_REGEX”
GOAL 2: DEDUPE F-P
ANALYSIS, CORE_INFO, …
RESULT_KEY | STATUS | BASELINE_OUTCOME | UPGRADED_OUTCOME | …
foo1       |        | TEST             | TEST             |
foo2       |        | COMPILE          |                  |
foo3       |        | TEST             |                  |
foo4       |        |                  |                  |
GOAL 2: DEDUPE F-P
ANALYSIS, CORE_INFO, …
RESULT_KEY | STATUS    | BASELINE_OUTCOME | UPGRADED_OUTCOME | …
foo1       |           | TEST             | TEST             |
foo2       | NOT_A_BUG | COMPILE          |                  |
foo3       | NOT_A_BUG | TEST             |                  |
foo4       |           |                  |                  |
GOAL 2: DEDUPE F-P
UPSERT INTO TEST_RESULT (RESULT_KEY, STATUS)
SELECT RESULT_KEY, 'NOT_A_BUG'
FROM TEST_RESULT
WHERE (BASELINE_OUTCOME = 'TEST' OR
BASELINE_OUTCOME = 'COMPILE') AND
UPGRADED_OUTCOME IS NULL
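The same F-P rule can be checked against the four sample rows with a small in-memory sketch. The function name `mark_f_p` is hypothetical, and `None` stands in for SQL NULL; this only mirrors the WHERE clause above and does not talk to HBase or Phoenix.

```python
def mark_f_p(rows):
    """Set STATUS to NOT_A_BUG where the baseline has an outcome and the upgrade has none."""
    for row in rows:
        baseline = row.get("BASELINE_OUTCOME")
        upgraded = row.get("UPGRADED_OUTCOME")
        if baseline in ("TEST", "COMPILE") and upgraded is None:
            row["STATUS"] = "NOT_A_BUG"
    return rows


# Sample rows copied from the slide; None marks an empty cell.
f_p_rows = mark_f_p([
    {"RESULT_KEY": "foo1", "BASELINE_OUTCOME": "TEST", "UPGRADED_OUTCOME": "TEST"},
    {"RESULT_KEY": "foo2", "BASELINE_OUTCOME": "COMPILE", "UPGRADED_OUTCOME": None},
    {"RESULT_KEY": "foo3", "BASELINE_OUTCOME": "TEST", "UPGRADED_OUTCOME": None},
    {"RESULT_KEY": "foo4", "BASELINE_OUTCOME": None, "UPGRADED_OUTCOME": None},
])
f_p_marked = [r["RESULT_KEY"] for r in f_p_rows if r.get("STATUS") == "NOT_A_BUG"]
print(f_p_marked)
```

Only foo2 and foo3 are marked, matching the table on the previous slide.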
GOAL 2: DEDUPE F-F
ANALYSIS, CORE_INFO, …
RESULT_KEY | STATUS | BASELINE_OUTCOME | UPGRADED_OUTCOME | BASELINE_MESSAGE | UPGRADED_MESSAGE | …
foo1       |        | TEST             | TEST             | ABC              | ABC              |
foo2       |        | COMPILE          |                  | JKL              |                  |
foo3       |        | TEST             |                  | ABC              |                  |
foo4       |        | TEST             | TEST             | ABC              | XYZ              |
GOAL 2: DEDUPE F-F
ANALYSIS, CORE_INFO, …
RESULT_KEY | STATUS          | BASELINE_OUTCOME | UPGRADED_OUTCOME | BASELINE_MESSAGE | UPGRADED_MESSAGE | …
foo1       | DUP_OF_BASELINE | TEST             | TEST             | ABC              | ABC              |
foo2       |                 | COMPILE          |                  | JKL              |                  |
foo3       |                 | TEST             |                  | ABC              |                  |
foo4       |                 | TEST             | TEST             | ABC              | XYZ              |
GOAL 2: DEDUPE F-F
UPSERT INTO TEST_RESULT (RESULT_KEY, STATUS)
SELECT RESULT_KEY, 'DUPLICATE_OF_BASELINE'
FROM TEST_RESULT
WHERE (BASELINE_OUTCOME = 'TEST' AND
UPGRADED_OUTCOME = 'TEST') AND
BASELINE_MESSAGE = UPGRADED_MESSAGE
* TODO: implement pattern matching between the messages
instead of only strict equality
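A small in-memory sketch of the F-F rule, using strict equality only (the pattern matching from the TODO is not attempted). The function name `mark_f_f` is hypothetical and `None` stands in for SQL NULL, so a missing message never counts as equal.

```python
def mark_f_f(rows):
    """Set STATUS to DUPLICATE_OF_BASELINE when both outcomes are TEST and messages match."""
    for row in rows:
        base_msg = row.get("BASELINE_MESSAGE")
        up_msg = row.get("UPGRADED_MESSAGE")
        same_message = base_msg is not None and base_msg == up_msg
        both_test = (
            row.get("BASELINE_OUTCOME") == "TEST"
            and row.get("UPGRADED_OUTCOME") == "TEST"
        )
        if both_test and same_message:
            row["STATUS"] = "DUPLICATE_OF_BASELINE"
    return rows


# Sample rows copied from the slide; None marks an empty cell.
f_f_rows = mark_f_f([
    {"RESULT_KEY": "foo1", "BASELINE_OUTCOME": "TEST", "UPGRADED_OUTCOME": "TEST",
     "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": "ABC"},
    {"RESULT_KEY": "foo2", "BASELINE_OUTCOME": "COMPILE", "UPGRADED_OUTCOME": None,
     "BASELINE_MESSAGE": "JKL", "UPGRADED_MESSAGE": None},
    {"RESULT_KEY": "foo3", "BASELINE_OUTCOME": "TEST", "UPGRADED_OUTCOME": None,
     "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": None},
    {"RESULT_KEY": "foo4", "BASELINE_OUTCOME": "TEST", "UPGRADED_OUTCOME": "TEST",
     "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": "XYZ"},
])
f_f_marked = [r["RESULT_KEY"] for r in f_f_rows if r.get("STATUS") == "DUPLICATE_OF_BASELINE"]
print(f_f_marked)
```

Only foo1 is marked; foo4 fails in both runs but with different messages, so it stays unmarked.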
GOAL 2: DEDUPE
Perm
Regex
base_regex1, up_regex1
base_regex2, up_regex2
base_regex3, up_regex3
base_regex4, up_regex4
base_regex5, up_regex5
base_regex6, up_regex6
………
<base_regex1, up_regex1>
<base_regex2, up_regex2>
<base_regex3, up_regex3> …
KNOWN_ISSUE__C
SCOPE = ‘PERMANENT’
GOAL 2: DEDUPE
Perm
Regex
<base_regex1, up_regex1>
<base_regex2, up_regex2>
<base_regex3, up_regex3>
<base_regex4, up_regex4>
<base_regex4, up_regex4> …
GOAL 2: DEDUPE
Perm
Regex
<base_regex1, up_regex1> …
Column Filter <base_regex1>
Column Filter <up_regex1>
Scanner
Scanner
GOAL 2: DEDUPE
Perm
Regex
Scanner
ANALYSIS, CORE_INFO, …
RESULT_KEY | STATUS | BASELINE_OUTCOME | UPGRADED_OUTCOME | BASELINE_MESSAGE | UPGRADED_MESSAGE | …
foo1       |        | TEST             | TEST             | ABC              | ABC              |
foo2       |        | COMPILE          |                  | JKL              |                  |
foo3       |        | TEST             |                  | ABC              |                  |
foo4       |        | TEST             | TEST             | ABC              | XYZ              |
……
Results STATUS = ‘DUPLICATE_PERM_REGEX’
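The Perm Regex pass can be approximated in memory as below. This is a sketch under several assumptions: the HBase scanner and column filters are replaced by a plain loop, a row counts as a hit when one pair matches both messages, `re.search` semantics are used, and the regex pair shown is made up for illustration (the real pairs come from KNOWN_ISSUE__C). The name `mark_perm_regex` is hypothetical.

```python
import re


def mark_perm_regex(rows, regex_pairs):
    """Set STATUS to DUPLICATE_PERM_REGEX when some <base, up> pair matches both messages."""
    compiled = [(re.compile(base), re.compile(up)) for base, up in regex_pairs]
    for row in rows:
        base_msg = row.get("BASELINE_MESSAGE")
        up_msg = row.get("UPGRADED_MESSAGE")
        if base_msg is None or up_msg is None:
            continue  # nothing to match against
        if any(b.search(base_msg) and u.search(up_msg) for b, u in compiled):
            row["STATUS"] = "DUPLICATE_PERM_REGEX"
    return rows


# Hypothetical pair; not taken from the slides.
example_pairs = [(r"^ABC$", r"^X.Z$")]

perm_rows = mark_perm_regex(
    [
        {"RESULT_KEY": "foo1", "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": "ABC"},
        {"RESULT_KEY": "foo2", "BASELINE_MESSAGE": "JKL", "UPGRADED_MESSAGE": None},
        {"RESULT_KEY": "foo3", "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": None},
        {"RESULT_KEY": "foo4", "BASELINE_MESSAGE": "ABC", "UPGRADED_MESSAGE": "XYZ"},
    ],
    example_pairs,
)
perm_marked = [r["RESULT_KEY"] for r in perm_rows if r.get("STATUS") == "DUPLICATE_PERM_REGEX"]
print(perm_marked)
```

Because every row is tested against every pair, the cost grows with rows times pairs.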
ANALYSIS
Pseudo-distributed mode
~1,000,000 entries
HAMMER
Total (EU5) 8-14 hours
THOR/SCREWDRIVER/JAGHAMMER/VOLDEMORT/GROND/SHARUR
Total < 1 hour
ANALYSIS
THOR/SCREWDRIVER/JAGHAMMER/VOLDEMORT/GROND/SHARUR
Type of Deduping | Time (seconds)
F - P            | 26.678
F - F            | 26.742
PermRegex        | 2369.15
Pseudo-distributed mode
~1,000,000 entries
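As a quick check, the three timings can be added up. This assumes the passes run one after another and that the "Total < 1 hour" figure refers to their sum, which the slides do not state explicitly.

```python
# Timings in seconds, copied from the table above.
timings_seconds = {"F - P": 26.678, "F - F": 26.742, "PermRegex": 2369.15}

total_seconds = sum(timings_seconds.values())
total_minutes = total_seconds / 60  # assumption: passes run sequentially

print(f"{total_seconds:.2f} s = {total_minutes:.1f} min")  # 2422.57 s = 40.4 min
```

The PermRegex pass accounts for almost all of the time.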
GOAL 3: WRITE
HBase Write SequenceFile
NOT_A_BUG ✗
DUPLICATE_OF_BASELINE ✗
DUPLICATE_PERM_REGEX ✗
IS_COMPILE_FAILURE ✗
IS_DATA_SILO ✔
<RESULT_KEY, UPGRADED_MESSAGE>
GOAL 3: WRITE
NOT_A_BUG ✗
DUPLICATE_OF_BASELINE ✗
DUPLICATE_PERM_REGEX ✗
IS_COMPILE_FAILURE ✗
IS_DATA_SILO ✔
SELECT * FROM TEST_RESULT WHERE
Results
GOAL 3: WRITE
Results
<RESULT_KEY, UPGRADED_MESSAGE>
write
SequenceFile
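The selection step before the write can be sketched as a filter over rows. The reading of the marks is an assumption: ✗ is taken to mean the row must not have that status or flag, and ✔ that IS_DATA_SILO must be set. `select_pairs` is a hypothetical name, the sample rows are made up, and the SequenceFile write itself is left out, so the function only returns the <RESULT_KEY, UPGRADED_MESSAGE> pairs.

```python
EXCLUDED_STATUSES = {"NOT_A_BUG", "DUPLICATE_OF_BASELINE", "DUPLICATE_PERM_REGEX"}


def select_pairs(rows):
    """Return (RESULT_KEY, UPGRADED_MESSAGE) for rows that pass every check."""
    pairs = []
    for row in rows:
        if row.get("STATUS") in EXCLUDED_STATUSES:
            continue  # already explained by one of the dedupe passes
        if row.get("IS_COMPILE_FAILURE"):
            continue
        if not row.get("IS_DATA_SILO"):
            continue
        pairs.append((row["RESULT_KEY"], row.get("UPGRADED_MESSAGE")))
    return pairs


# Made-up rows for illustration only.
write_pairs = select_pairs([
    {"RESULT_KEY": "foo1", "STATUS": "DUPLICATE_OF_BASELINE",
     "IS_COMPILE_FAILURE": False, "IS_DATA_SILO": True, "UPGRADED_MESSAGE": "ABC"},
    {"RESULT_KEY": "foo4", "STATUS": None,
     "IS_COMPILE_FAILURE": False, "IS_DATA_SILO": True, "UPGRADED_MESSAGE": "XYZ"},
    {"RESULT_KEY": "foo5", "STATUS": None,
     "IS_COMPILE_FAILURE": True, "IS_DATA_SILO": True, "UPGRADED_MESSAGE": "QRS"},
])
print(write_pairs)
```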
Store HBase Dedupe HBase Write SequenceFile
✔ ✔✔
Thank you

Final Presentation