Makar: A Framework for Multi-source Studies Based on Unstructured Data

Mathias Birrer, Pooja Rani, Sebastiano Panichella, Oscar Nierstrasz
University of Bern, Switzerland
/** Code comments */
Do developers discuss code comments?
/**
 * TODO
 */
public void log(String s) {
    System.out.println(s);
}
Developers use various discussion sources
Planning
Implementation
Releasing
Maintenance
Testing
Challenges
Extracting, Processing, Querying, Exploring
Makar
Makar: A tool for Multi-source Studies
https://github.com/maethub/makar
Features
Extract data from different sources
e.g., Stack Overflow, GitHub, mailing lists

Support mapping and processing of the data

Explore and perform ad-hoc searches

Extend the dataset easily
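The idea of mapping data from different sources into one dataset that supports ad-hoc searches can be illustrated with a small sketch. It assumes a schema-less record model (a source name plus a map of named text fields); the `DataRecord` type, the field names `title` and `text`, and the `fromStackOverflow`, `fromMailingList`, and `search` helpers are hypothetical illustrations, not Makar's actual data model or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class UniformRecords {

    // Hypothetical schema-less record: every source is mapped to named text fields.
    record DataRecord(String source, Map<String, String> fields) {
        String get(String field) {
            return fields.getOrDefault(field, "");
        }
    }

    // Hypothetical mapping of a Stack Overflow question onto the common record shape.
    static DataRecord fromStackOverflow(String title, String body) {
        return new DataRecord("stackoverflow", Map.of("title", title, "text", body));
    }

    // Hypothetical mapping of a mailing-list message: the subject plays the role of the title.
    static DataRecord fromMailingList(String subject, String message) {
        return new DataRecord("mailinglist", Map.of("title", subject, "text", message));
    }

    // Ad-hoc search: an arbitrary predicate evaluated over records from all sources at once.
    static List<DataRecord> search(List<DataRecord> dataset, Predicate<DataRecord> query) {
        List<DataRecord> hits = new ArrayList<>();
        for (DataRecord r : dataset) {
            if (query.test(r)) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<DataRecord> dataset = List.of(
                fromStackOverflow("How to write Javadoc comments?", "<p>...</p>"),
                fromMailingList("[DISCUSS] release schedule", "Let's plan..."),
                fromMailingList("Outdated comment in Foo.java", "The comment says..."));

        // Find every record, regardless of source, whose title mentions "comment".
        List<DataRecord> hits = search(dataset,
                r -> r.get("title").toLowerCase().contains("comment"));
        hits.forEach(r -> System.out.println(r.source() + ": " + r.get("title")));
    }
}
```

Because each source is reduced to the same field map, adding a new source only requires a new mapping function; the search code stays unchanged.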
Makar Architecture
https://github.com/maethub/makar
Do developers ask questions about comments?
Do comment-related questions contain code snippets?
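The two demo questions can be approximated over exported Stack Overflow questions with a simple filter. This is a minimal sketch under stated assumptions: the `Question` shape is hypothetical, the keyword rule ("comment" in the title or a tag) is an illustration rather than the study's selection criteria, and a question counts as containing a code snippet if its HTML body has a `<pre>` element, which is how Stack Overflow renders code blocks (inline `<code>` spans are ignored here).

```java
import java.util.List;
import java.util.Locale;

public class CommentQuestions {

    // Hypothetical shape of an exported Stack Overflow question: title, tags, HTML body.
    record Question(String title, List<String> tags, String bodyHtml) {}

    // Hypothetical keyword rule: the title or any tag mentions "comment".
    static boolean isCommentRelated(Question q) {
        String title = q.title().toLowerCase(Locale.ROOT);
        return title.contains("comment")
                || q.tags().stream().anyMatch(t -> t.toLowerCase(Locale.ROOT).contains("comment"));
    }

    // Code blocks appear as <pre><code>...</code></pre> (possibly with attributes on <pre>),
    // so the presence of a <pre element is used as the code-snippet signal.
    static boolean hasCodeBlock(Question q) {
        return q.bodyHtml().contains("<pre");
    }

    public static void main(String[] args) {
        List<Question> questions = List.of(
                new Question("Should I comment private methods?", List.of("java", "comments"),
                        "<p>Is it worth it?</p>"),
                new Question("Javadoc comment not shown in IDE", List.of("java", "javadoc"),
                        "<pre><code>/** Docs */ void f() {}</code></pre>"),
                new Question("NullPointerException in loop", List.of("java"),
                        "<pre><code>for (;;) {}</code></pre>"));

        List<Question> related = questions.stream().filter(CommentQuestions::isCommentRelated).toList();
        long withCode = related.stream().filter(CommentQuestions::hasCodeBlock).count();
        System.out.printf("%d of %d questions are comment-related; %d of those contain a code block%n",
                related.size(), questions.size(), withCode);
    }
}
```

A keyword filter like this over-approximates (a title about "commented-out code" also matches), so its output would still need manual validation.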
Demo
Case Study
Import adapters: Stack Overflow, CSV, Apache mailing lists

Preprocess the data:

Export adapters: CSV
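The case-study flow (import adapters, a preprocessing step, an export adapter) can be sketched as a small pipeline. This is an illustration under assumptions, not Makar's implementation: the `ImportAdapter` and `ExportAdapter` interfaces, the `stripHtml` preprocessing step, and `run` are hypothetical names. HTML is stripped with a naive regex, which is enough for a sketch but not robust for real markup, and the CSV output quotes every field and doubles inner quotes, as RFC 4180 permits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class AdapterPipeline {

    // Hypothetical import adapter: turns one source into a list of field maps.
    interface ImportAdapter {
        List<Map<String, String>> load();
    }

    // Hypothetical export adapter: serialises the processed records.
    interface ExportAdapter {
        String export(List<String> columns, List<Map<String, String>> rows);
    }

    // Naive preprocessing: replace HTML tags with spaces, collapse whitespace, trim.
    static UnaryOperator<Map<String, String>> stripHtml(String field) {
        return row -> {
            Map<String, String> copy = new LinkedHashMap<>(row);
            String text = copy.getOrDefault(field, "");
            copy.put(field, text.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").strip());
            return copy;
        };
    }

    // CSV export: every field is quoted, embedded quotes are doubled, lines end with CRLF.
    static class CsvExport implements ExportAdapter {
        private static String quote(String value) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }

        @Override
        public String export(List<String> columns, List<Map<String, String>> rows) {
            StringBuilder out = new StringBuilder();
            out.append(String.join(",", columns.stream().map(CsvExport::quote).toList())).append("\r\n");
            for (Map<String, String> row : rows) {
                List<String> cells = new ArrayList<>();
                for (String column : columns) {
                    cells.add(quote(row.getOrDefault(column, "")));
                }
                out.append(String.join(",", cells)).append("\r\n");
            }
            return out.toString();
        }
    }

    // Import, then preprocess each record, then export.
    static String run(ImportAdapter source, UnaryOperator<Map<String, String>> step,
                      ExportAdapter sink, List<String> columns) {
        List<Map<String, String>> processed = source.load().stream().map(step).toList();
        return sink.export(columns, processed);
    }

    public static void main(String[] args) {
        // Stand-in for a Stack Overflow import adapter; real data would come from a dump or an API.
        ImportAdapter stackOverflow = () -> List.of(
                Map.of("id", "1", "body", "<p>Is this \"comment\" <code>// TODO</code> useful?</p>"));

        String csv = run(stackOverflow, stripHtml("body"), new CsvExport(), List.of("id", "body"));
        System.out.print(csv);
    }
}
```

Keeping import, preprocessing, and export behind separate interfaces means a new source (for example, another mailing list) only needs its own `ImportAdapter`, while the rest of the pipeline is reused.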
Features: Import
A complete demo is available on YouTube.
Features: Search
Features: Collection
Features: Transform
Tool Comparison
Tool Extract Process Manage
Octoparse
KNIME Extension
RapidMiner Limited Limited
ELKI
KEEL
WEKA
Trifacta Wrangler
Boa Limited
OpenRefine
Makar
Future Work
Extension of data source adapters

Building a UI for the study pipeline

Development of analysis and visualisation components

Facilitation of more multi-source studies
Hosted on GitHub

https://github.com/maethub/makar

Demo on YouTube

https://youtu.be/Yqj1b4Bv-58

Replication Package at Zenodo

https://doi.org/10.5281/zenodo.4434822

Contact us

https://twitter.com/poojaruhal http://scg.unibe.ch/staff/Pooja-Rani