The impact of supercomputers on MSR

Yasutaka Kamei, Associate Professor
Y. Kamei, A. Osaka, C. Huang, N. Ubayashi
MSR Next Generation 2014 @ HKUST
Who am I? 
❖ Yasutaka Kamei 
http://posl.ait.kyushu-u.ac.jp/~kamei/ 
❖ My research interests are
• Understanding OSS Collaboration
• Improving Software Quality
• Scaling up MSR Analysis
[Photos: Summer, Winter]
Today... 
❖ Deliver messages from the HPC community to the MSR community.
• Make use of High Performance Computing (HPC) in MSR.
HPC → MSR
2014: A Space Odyssey
❖ MSR researchers will soon explore treasure in the Universe.
• Diversity in software engineering research @ FSE 2013: 20,028 projects as the Universe
• Challenges in Mining Whole Software Universe
2004 → 2014
One solution is
❖ Supercomputer
❖ In the case of FX10, each node has
• CPU: 16 cores
• Memory: 32 GB
and there are 4,800 nodes.
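The per-node figures above can be multiplied out to get a feel for the aggregate capacity. The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical and chosen for illustration.

```java
// Hypothetical helper (not from the talk): multiplies out the FX10 figures quoted on the slide.
class Fx10Capacity {
    // Per-node figures and node count as quoted on the slide.
    static final int CORES_PER_NODE = 16;
    static final int MEMORY_GB_PER_NODE = 32;
    static final int NODES = 4_800;

    // Total number of cores across all nodes.
    static long totalCores(int coresPerNode, int nodes) {
        return (long) coresPerNode * nodes;
    }

    // Total memory in GB across all nodes.
    static long totalMemoryGb(int memoryGbPerNode, int nodes) {
        return (long) memoryGbPerNode * nodes;
    }

    public static void main(String[] args) {
        System.out.println("Total cores: " + totalCores(CORES_PER_NODE, NODES));
        System.out.println("Total memory [GB]: " + totalMemoryGb(MEMORY_GB_PER_NODE, NODES));
    }
}
```

That is 16 × 4,800 = 76,800 cores and 32 × 4,800 = 153,600 GB of memory in total.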
However…
❖ The adoption rate for HPC is still low.
• "Domain-specific techniques for using HPC?"
• "Only Fortran and C?"
• "My tool is implemented by [image not preserved]"
Prof. Chiba says
❖ Via collaboration in the CREST project:
"We can use Java, Ruby and Python on FX10!"
Case Study 
❖ Evaluate the impact that HPC can have on MSR analyses.
❖ Apply HPC (FX10) to code clone detection.
Code Clone
❖ A code fragment that has identical or similar code fragments (Hotta et al. CSMR 2012)
[Figure: copy-and-paste creates clone fragments, which together form a code clone]
Type-3 Clones
❖ Programmers often make some changes to code fragments after copy-and-paste (Zhang et al. ICSM 2012).

Original fragment:

    final public void daload() {
        countLabels = 0;
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }

Fragment after copy-and-paste, with an added code fragment (the gap):

    final public void daload() {
        countLabels = 0;
        stackDepth += 2;
        if (stackDepth > stackMax)
            stackMax = stackDepth;
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }

The two fragments are Type-3 clones.
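The idea of a gapped clone can be illustrated with a minimal sketch. This is not how Scorpio works (Scorpio is PDG-based); it swaps in a much simpler statement-level longest-common-subsequence comparison. The class name, the method names, and the 0.6 threshold are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: flags two statement lists as a gapped (Type-3) clone candidate.
class GappedCloneSketch {

    // Length of the longest common subsequence of two statement lists.
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    // Share of the longer fragment that is covered by common statements.
    static double similarity(List<String> a, List<String> b) {
        int longer = Math.max(a.size(), b.size());
        return longer == 0 ? 1.0 : (double) lcsLength(a, b) / longer;
    }

    // Similar but not identical: a candidate gapped clone under this simplified model.
    static boolean isGappedClone(List<String> a, List<String> b, double threshold) {
        return !a.equals(b) && similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        // Statements taken from the two daload() fragments on the slide.
        List<String> original = Arrays.asList(
                "countLabels = 0;",
                "position++;",
                "bCodeStream[i++] = OPC_daload;",
                "resizeByteArray(OPC_daload);");
        List<String> pasted = Arrays.asList(
                "countLabels = 0;",
                "stackDepth += 2;",
                "if (stackDepth > stackMax) stackMax = stackDepth;",
                "position++;",
                "bCodeStream[i++] = OPC_daload;",
                "resizeByteArray(OPC_daload);");
        System.out.println("Common statements: " + lcsLength(original, pasted));
        System.out.println("Gapped clone candidate: " + isGappedClone(original, pasted, 0.6));
    }
}
```

All four statements of the original fragment survive in the pasted one, so the pair is flagged as similar but not identical. A PDG-based tool compares dependence structure rather than statement text, which this sketch does not attempt.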
Our collaborator
❖ Dr. Keisuke Hotta
• Postdoc, Osaka University, Japan
• Visiting Researcher, Bremen University, Germany
❖ He helped our group use Scorpio (a jar file), a Type-3 clone detection tool based on program dependence graphs (PDGs).
Case Study Setting
❖ Environment

             CPU                Memory [GB] per node   Cores × Nodes
  Desktop 1  Intel® Core™ i7    16                     12×1
  Desktop 2  Xeon E5-2630 v2    144                    12×1
  FX10       SPARC64™ IXfx      32                     16×190

❖ Dataset
• Apache CXF
• LOC: 830K
• SIZE: 150MB
FX10 is much faster!
[Bar chart: execution time for Desktop 1, Desktop 2 and FX10; time labels shown: 127h28m, 42s, 2h15m, 16m58s]
How to run Scorpio on FX10
❖ Only 20-30 lines of (bash) code are needed to run Scorpio on FX10.

    #!/bin/bash
    #PJM -L "rscgrp=debug"
    #PJM -L "node=190"
    #PJM -L "elapse=30:00"
    #PJM -j
    #PJM -S
    module load Java
    …
    java -jar scorpio.jar

• node=190: How many nodes do we use?
• elapse=30:00: How long do we use FX10?
• -j, -S: What are the output options?
Our current challenges
• Done: Apache CXF (6,000 files)
• Doing: All Apache projects (770,000 files)
• ToDo: UCI Dataset (390,000,000 files)