A Bot Identification Model and Tool Based on Activity Sequences in GitHub
Natarajan Chidambaram, Alexandre Decan, Tom Mens
Software Engineering Lab, UMONS, Belgium
DOI: https://doi.org/10.1016/j.jss.2024.112287
Funding: Service public de Wallonie – Recherche, Grant 2010235 “ARIAC BY DIGITALWALLONIA4.AI”; Fonds de la Recherche Scientifique – FNRS, under grant numbers J.0147.24, T.0149.22, and F.4515.23
Bot accounts and Apps frequently engage in GitHub repositories
Examples of bot activity: pushing commits, commenting on PRs, merging PRs.
Repository contributors can be involved in a wide range of activity types:
• Pushing commits
• Opening/closing/reopening/commenting issues or PRs
• Creating/deleting tags
• Creating/deleting branches
• Reviewing code
• …
Existing Bot Identification Approaches
• BotHunter (Abdellatif et al., MSR 2022; ahmad-abdellatif/BotHunter): BoDeGHa + BIMAN + profile + #events + ..
• BoDeGHa (Golzadeh et al., JSS 2021; mehdigolzadeh/BoDeGHa): PR and issue comments
• BoDeGiC (Golzadeh et al., BENEVOL 2020; mehdigolzadeh/BoDeGiC): git commit messages
• BIMAN (Dey et al., MSR 2020; ssc-oscar/BIMAN_bot_detection): git commit messages, files changed in commits, ‘bot’ in name
Limitations: cannot run on “live data”; consider a limited set of activity types; use many API queries; high processing time.
Identifying activities from events
GitHub REST API events endpoint: https://api.github.com/users/zorro-bot[bot]/events?
Can retrieve the latest 300 events of a contributor in the last 90 days.
N. Chidambaram, A. Decan, and T. Mens, “A dataset of bot and human activities in GitHub”, MSR, 2023.
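A minimal sketch of how the 300-event cap translates into paginated requests. It assumes the standard `per_page` and `page` query parameters of the GitHub REST API with a page size of 100, so three pages cover the cap; the helper name `event_page_urls` is hypothetical and no request is actually sent.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com"
MAX_EVENTS = 300  # cap stated on the slide
PER_PAGE = 100    # assumed page size


def event_page_urls(login: str) -> list[str]:
    """Build the URLs needed to page through a contributor's events."""
    pages = -(-MAX_EVENTS // PER_PAGE)  # ceiling division
    # Percent-encode the login, since bot logins contain "[" and "]".
    base = f"{API_ROOT}/users/{quote(login, safe='')}/events"
    return [
        f"{base}?{urlencode({'per_page': PER_PAGE, 'page': page})}"
        for page in range(1, pages + 1)
    ]


urls = event_page_urls("zorro-bot[bot]")
for url in urls:
    print(url)
```

Each URL would then be fetched with an HTTP client of choice; the sketch stops before that step so it stays runnable offline.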
[Figure: mapping from event types (CreateEvent, IssuesEvent, IssueCommentEvent) to activity types (Creating tag, Creating branch, Creating repository, Opening issue, Closing issue, Reopening issue); the figure labels also show the payload values branch, created, and reopened.]
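The event-to-activity mapping can be sketched as a lookup on the event type plus one payload field. This is an illustration, not the authors' implementation: it assumes the usual GitHub event payload fields `ref_type` (for CreateEvent) and `action` (for IssuesEvent), and the function name `activity_from_event` is hypothetical. IssueCommentEvent is left out because the slide text does not show which activity name it maps to.

```python
from typing import Optional

# Activity names are taken from the slide; payload keys are assumptions
# based on the usual shape of GitHub event payloads.
CREATE_ACTIVITIES = {
    "branch": "Creating branch",
    "tag": "Creating tag",
    "repository": "Creating repository",
}
ISSUE_ACTIVITIES = {
    "opened": "Opening issue",
    "closed": "Closing issue",
    "reopened": "Reopening issue",
}


def activity_from_event(event: dict) -> Optional[str]:
    """Return the activity type for one event, or None if it is not covered."""
    payload = event.get("payload", {})
    if event.get("type") == "CreateEvent":
        return CREATE_ACTIVITIES.get(payload.get("ref_type"))
    if event.get("type") == "IssuesEvent":
        return ISSUE_ACTIVITIES.get(payload.get("action"))
    return None


events = [
    {"type": "CreateEvent", "payload": {"ref_type": "branch"}},
    {"type": "IssuesEvent", "payload": {"action": "reopened"}},
    {"type": "WatchEvent", "payload": {"action": "started"}},
]
activities = [a for a in map(activity_from_event, events) if a is not None]
print(activities)
```

One event type can yield several activity types, which is why the lookup needs the payload and not only the event type.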
Dataset
New ground-truth dataset of GitHub contributors
• Training set of 1,290 contributors (60%), of which 621 bots and 669 humans
• Test set of 860 contributors (40%), of which 414 bots and 446 humans
N. Chidambaram, A. Decan, and T. Mens, “A Bot Identification Model and Tool Based on GitHub Activity Sequences”, JSS, 2025.

          # contributors   # activities   median # activities   median # activity types
bots      1,035            182,218        194                   3.0
humans    1,115            155,028        147                   9.0
total     2,150            337,246        171                   6.0
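The reported set sizes are consistent with a 60/40 split applied separately to bots and humans. The sketch below assumes such a stratified split (the slides do not say how the split was drawn); the function name, the seed, and the synthetic contributor names are all hypothetical.

```python
import random


def stratified_split(labels: dict[str, str], train_pct: int = 60, seed: int = 0):
    """Split contributor names into train/test, keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels.values())):
        members = sorted(name for name, lab in labels.items() if lab == cls)
        rng.shuffle(members)
        cut = len(members) * train_pct // 100  # integer arithmetic, no rounding drift
        train += members[:cut]
        test += members[cut:]
    return train, test


# Synthetic stand-ins for the 1,035 bots and 1,115 humans in the dataset.
labels = {f"bot-{i}": "bot" for i in range(1035)}
labels.update({f"human-{i}": "human" for i in range(1115)})

train, test = stratified_split(labels)
print(len(train), len(test))
```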
Limitations of Existing Bot Identification Approaches
Evaluated on the test set of 860 contributors (40%), of which 414 bots and 446 humans.

approach    P      R      F1     #unknown   data downloaded   time       # API queries
NBH         0.77   0.76   0.76   0          -                 0.01 sec   -
BoDeGiC     0.81   0.27   0.41   627        23.3 GB           23.1 h     -
BoDeGHa     0.92   0.51   0.66   392        3.83 GB           7.7 h      10,222
BotHunter   0.97   0.93   0.95   1          0.261 GB          20.8 h     37,240
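As a sanity check on the table, F1 can be recomputed from the precision and recall columns. The sketch assumes F1 is the harmonic mean of the P and R values shown; if the reported scores are averaged over classes, or computed from unrounded values, small deviations are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) as listed in the table above.
reported = {
    "NBH": (0.77, 0.76, 0.76),
    "BoDeGiC": (0.81, 0.27, 0.41),
    "BoDeGHa": (0.92, 0.51, 0.66),
    "BotHunter": (0.97, 0.93, 0.95),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {f1:.2f}")
```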
Goal: Improved Bot Identification Model Based on Activity Sequences (BIMBAS)
• Good performance
• Download less data
• Fewer API queries
• Fast enough to apply to thousands of contributors
• Independent of text (future-proof for LLM-based bots)
Features of BIMBAS
Counting metrics (# features = 5):
• # activities
• # activity types
• # repositories
• # owners
• # owners / # repositories
Aggregate metrics (mean, std, median, IQR, Gini; # features = 8*5 = 40):
• # activities per repository
• # activities per activity type
• # consecutive activities in a repository
• # activity types per repository
• Time between consecutive activities
• Time spent in a repository
• Time to switch repository
• Time to switch activity type
Feature categories: type, repository, temporal
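To make the two feature families concrete, here is a small sketch that computes the five counting metrics and the five aggregates for one of the eight series (# activities per repository). The record layout (activity type, "owner/repository") is a hypothetical simplification, and the choices of population standard deviation and of the default quartile method are assumptions; the slides do not specify them.

```python
from collections import Counter
from statistics import mean, median, pstdev, quantiles


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def aggregate(values):
    """The five aggregates named on the slide, for one series of values."""
    if len(values) >= 2:
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
    else:
        iqr = 0.0
    return {"mean": mean(values), "std": pstdev(values),
            "median": median(values), "IQR": iqr, "Gini": gini(values)}


def counting_features(activities):
    """The five counting metrics for a list of (activity type, repository)."""
    repos = {repo for _, repo in activities}
    owners = {repo.split("/")[0] for repo in repos}
    return {
        "# activities": len(activities),
        "# activity types": len({kind for kind, _ in activities}),
        "# repositories": len(repos),
        "# owners": len(owners),
        "# owners / # repositories": len(owners) / len(repos) if repos else 0.0,
    }


# Hypothetical activity sequence of one contributor.
activities = [
    ("Pushing commits", "octo/app"),
    ("Opening issue", "octo/app"),
    ("Pushing commits", "octo/docs"),
    ("Pushing commits", "acme/tool"),
]
counts = counting_features(activities)
per_repo = list(Counter(repo for _, repo in activities).values())
stats = aggregate(per_repo)
print(counts)
print(stats)
```

Applying `aggregate` to each of the eight series gives the 8*5 = 40 aggregate features; the temporal series would be built from activity timestamps, which this sketch omits.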
Selecting a classifier for BIMBAS
Steps:
1. Grid search on training set
2. Removing unimportant features
3. Model evaluation on test set
Binary classifiers (hyperparameter tuning; 7 classifiers, 13,021 models):
• Gradient Boosting
• Random Forest
• Decision Tree
• XGBoost
• Linear Discriminant Analysis
• Support Vector Machines
• Gaussian Naïve Bayes
Precision 0.93, Recall 0.93, AUC-ROC 0.97
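Grid search evaluates every combination in a hyperparameter grid and keeps the best-scoring one, which is how a handful of classifiers can expand into thousands of candidate models. The sketch below shows only the enumeration; the grid values and the `toy_score` function are hypothetical stand-ins for a real cross-validated score on the training set.

```python
from itertools import product


def grid_search(param_grid, score):
    """Exhaustively score every parameter combination; return the best one."""
    names = list(param_grid)
    candidates = [
        dict(zip(names, combo))
        for combo in product(*(param_grid[name] for name in names))
    ]
    best = max(candidates, key=score)
    return best, len(candidates)


# Hypothetical grid for one classifier.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 4, 8],
    "learning_rate": [0.01, 0.1],
}


def toy_score(params):
    """Stand-in for a cross-validated score; peaks at one combination."""
    return (-abs(params["max_depth"] - 4)
            - abs(params["learning_rate"] - 0.1)
            - abs(params["n_estimators"] - 100) / 100)


best, n_models = grid_search(param_grid, toy_score)
print(n_models, best)
```

The number of candidates is the product of the grid sizes, so adding one more parameter with a few values multiplies the model count.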
Feature analysis
Recursive Feature Elimination: removed 7 features
Permutation Importance Analysis:
1. Number of activity types
2. Number of repository owners
3. Median time between activities of different types
4. Median number of activities per type
5. Mean number of activities per type
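Permutation importance measures how much a model's score drops when the values of one feature are shuffled across samples. The sketch below is a generic illustration with a hypothetical threshold model and toy data, not the BIMBAS classifier; accuracy is used as the score for simplicity.

```python
import random


def accuracy(model, rows, labels):
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + (value,) + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy rows: (number of activity types, an unrelated noise feature).
rows = [(2, 7), (3, 1), (1, 5), (9, 2), (8, 6), (10, 3)]
labels = ["bot", "bot", "bot", "human", "human", "human"]


def toy_model(row):
    """Hypothetical rule: few activity types suggests a bot."""
    return "bot" if row[0] <= 4 else "human"


importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

A feature the model never reads, like the noise column here, gets an importance of exactly zero because shuffling it cannot change any prediction.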
Evaluating performance of BIMBAS

approach                  P      R      F1
NBH (‘bot’ in the name)   0.77   0.76   0.76
BoDeGiC                   0.81   0.27   0.41
BoDeGHa                   0.92   0.51   0.66
BotHunter                 0.97   0.93   0.95
BotHunter without NBH     0.85   0.80   0.82
BIMBAS                    0.90   0.90   0.90
RABBIT: A CLI-based tool implementing BIMBAS
https://github.com/natarajan-chidambaram/RABBIT
Demo of RABBIT
Evaluating RABBIT’s Efficiency
(X: factor relative to RABBIT)

approach    data downloaded   X      time       X     # API queries   X
NBH         -                 -      0.01 sec         -
BoDeGiC     23.3 GB           208X   23.1 h     60X   -               -
BoDeGHa     3.83 GB           34X    7.7 h      21X   10,222          4X
BotHunter   0.261 GB          2.3X   20.8 h     57X   37,240          15X
RABBIT      0.112 GB                 22 m             2,426

An order of magnitude faster, with fewer queries
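The X columns can be recomputed by dividing each tool's figure by RABBIT's. The sketch below uses the rounded values from the table, so small deviations from the reported factors are possible; for example, BoDeGiC's time ratio comes out near 63 from these figures, while the slide reports 60X.

```python
# Figures copied from the efficiency table; times converted to minutes.
rabbit = {"data_gb": 0.112, "time_min": 22, "api_queries": 2426}
others = {
    "BoDeGiC": {"data_gb": 23.3, "time_min": 23.1 * 60, "api_queries": None},
    "BoDeGHa": {"data_gb": 3.83, "time_min": 7.7 * 60, "api_queries": 10222},
    "BotHunter": {"data_gb": 0.261, "time_min": 20.8 * 60, "api_queries": 37240},
}


def factors(tool):
    """Ratio of a tool's cost to RABBIT's for each available metric."""
    return {key: tool[key] / rabbit[key] for key in rabbit if tool[key] is not None}


for name, tool in others.items():
    print(name, {key: round(value, 1) for key, value in factors(tool).items()})
```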
