A Dataset of Bot and Human Activities in GitHub
Natarajan Chidambaram, Alexandre Decan, Tom Mens
Software Engineering Lab, University of Mons, Belgium
Supported by Service public de Wallonie – Recherche under grant n°2010235 “ARIAC BY DIGITALWALLONIA4.AI” and
Fonds de la Recherche Scientifique – FNRS under grant numbers F.4515.23, O.0157.18F-RG43 and T.0149.22
SECO-ASSIST
https://zenodo.org/record/7740521
24 Activity types: Opening issue, Opening pull request, Publishing a release, …
GitHub Events API: retrieves at most the latest 300 events of a contributor, limited to the last 90 days
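A minimal sketch of how the 300-event cap plays out when paging through the API. It assumes the public REST endpoint `GET /users/{username}/events` with a page size of 100; the function names and the injectable `get_json` parameter are illustrative choices, not part of the slides.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
PER_PAGE = 100    # largest page size the REST API accepts
MAX_EVENTS = 300  # cap stated on the slide


def event_page_urls(username):
    """Return the page URLs that together cover at most MAX_EVENTS events."""
    pages = MAX_EVENTS // PER_PAGE
    return [
        f"{API_ROOT}/users/{username}/events?per_page={PER_PAGE}&page={page}"
        for page in range(1, pages + 1)
    ]


def http_get_json(url):
    """Fetch one URL and decode the JSON body (unauthenticated request)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_recent_events(username, get_json=http_get_json):
    """Collect the events the API still exposes for one contributor."""
    events = []
    for url in event_page_urls(username):
        page = get_json(url)
        if not page:  # an empty page means there is nothing older to fetch
            break
        events.extend(page)
    return events
```

Passing a stub as `get_json` keeps the paging logic testable without network access; unauthenticated calls are also subject to GitHub's rate limits.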
Mapping from GitHub event types to activity types (diagram):
• IssuesEvent (opened, closed, reopened): Opening issue, Closing issue, Reopening issue
• CreateEvent (repository, branch, tag): Creating repository, Creating branch, Creating tag
• IssueCommentEvent (created): appears in the diagram next to Closing issue
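The event-to-activity mapping on this slide can be sketched as a pair of lookup tables. This is an illustration under assumptions: it covers only the six activity types named here (the dataset defines 24), it reads the `action` and `ref_type` fields of the GitHub event payload, and it ignores cases where several events combine into one activity. The function name `activity_for` is hypothetical.

```python
# Lookup tables for the six activity types named on the slide.
ISSUE_ACTIONS = {
    "opened": "Opening issue",
    "closed": "Closing issue",
    "reopened": "Reopening issue",
}
CREATE_REF_TYPES = {
    "repository": "Creating repository",
    "branch": "Creating branch",
    "tag": "Creating tag",
}


def activity_for(event):
    """Return the activity name for one event dict, or None if unmapped."""
    payload = event.get("payload", {})
    if event.get("type") == "IssuesEvent":
        return ISSUE_ACTIONS.get(payload.get("action"))
    if event.get("type") == "CreateEvent":
        return CREATE_REF_TYPES.get(payload.get("ref_type"))
    return None  # event types outside this sketch


example = {"type": "CreateEvent", "payload": {"ref_type": "tag"}}
print(activity_for(example))
```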
Dataset          # contributors   # activities
Bot dataset                 385        649,755
Human dataset               616        184,056
Total                     1,001        833,811
• 834K activities obtained from 1M+ events
• 24 activity types
• 1K contributors
• 105 days (25 Nov 2022 to 9 Mar 2023)
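The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below recomputes the totals, derives the mean number of activities per contributor (a derived quantity not stated on the slide), and confirms that the observation window spans 105 days when both end dates are counted.

```python
from datetime import date

counts = {
    "bot": {"contributors": 385, "activities": 649_755},
    "human": {"contributors": 616, "activities": 184_056},
}

total_contributors = sum(row["contributors"] for row in counts.values())
total_activities = sum(row["activities"] for row in counts.values())

# Mean activities per contributor, per dataset.
mean_per_contributor = {
    kind: row["activities"] / row["contributors"] for kind, row in counts.items()
}

# Observation window, counting both 25 Nov 2022 and 9 Mar 2023.
window_days = (date(2023, 3, 9) - date(2022, 11, 25)).days + 1

print(total_contributors, total_activities, window_days)
print({kind: round(value) for kind, value in mean_per_contributor.items()})
```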
{
  "date": "2022-11-26T14:13:19+00:00",
  "activity": "Commenting issue",
  "contributor": "kubevirt-bot",
  "repository": "kubevirt/kubevirt",
  "comment": {
    "length": 255,
    "GH_node": "IC_kwDOBJIk985PKH4s"
  },
  "issue": {
    "id": 8294,
    "title": "SRIOV VF interface not found in VM",
    "created_at": "2022-08-13T11:10:06+00:00",
    "status": "open",
    "closed_at": null,
    "resolved": false,
    "GH_node": "I_kwDOBJIk985Pvz5k"
  },
  "conversation": {
    "comments": 9
  }
}
JSON format
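A short sketch of reading one such activity record in Python. The record is a trimmed copy of the example on this slide; how records are grouped into files in the published dataset is not shown here, so the sketch parses a single object only.

```python
import json
from datetime import datetime

# Trimmed copy of the example record (some fields omitted for brevity).
record_text = """
{
  "date": "2022-11-26T14:13:19+00:00",
  "activity": "Commenting issue",
  "contributor": "kubevirt-bot",
  "repository": "kubevirt/kubevirt",
  "issue": {
    "id": 8294,
    "created_at": "2022-08-13T11:10:06+00:00",
    "status": "open"
  }
}
"""

record = json.loads(record_text)

# Timestamps are ISO 8601 with an explicit UTC offset.
commented_at = datetime.fromisoformat(record["date"])
issue_created_at = datetime.fromisoformat(record["issue"]["created_at"])
issue_age = commented_at - issue_created_at

print(record["activity"], "by", record["contributor"])
print("issue age in days at comment time:", issue_age.days)
```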
Usefulness of the Dataset
• Analyse most frequent activities
• Find frequent patterns in activities
• Find behavioural differences between bots and humans
• Forecast future contributor activities
• Detect which tasks are performed by which bots
• Classify contributors based on activities
• Develop a new bot detection technique
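As an illustration of the first use case, the most frequent activity types can be counted with `collections.Counter`. The records below are toy data invented for this sketch, not taken from the dataset; only the `activity` field name follows the record format of the dataset.

```python
from collections import Counter

# Toy records invented for illustration; they are not taken from the dataset.
activities = [
    {"contributor": "bot-a", "activity": "Commenting issue"},
    {"contributor": "bot-a", "activity": "Commenting issue"},
    {"contributor": "bot-a", "activity": "Closing issue"},
    {"contributor": "dev-b", "activity": "Opening issue"},
]


def most_frequent(records, n=3):
    """Return the n most common activity types with their counts."""
    counts = Counter(record["activity"] for record in records)
    return counts.most_common(n)


print(most_frequent(activities))
```

Grouping the records by contributor before counting would give the per-contributor profiles needed for the classification and bot-detection use cases.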
Some Distinguishing Features
• Dispersion of activity types across repositories
• Time to shift between repositories
• Number of activity types
• Variation in activity frequency
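A rough sketch of how features of this kind could be computed for one contributor. The exact definitions used by the authors are not given on the slide, so the three measures below (distinct activity types, distinct repositories, median time between consecutive activities in different repositories) are assumptions, and the records are toy data invented for illustration.

```python
from datetime import datetime
from statistics import median


def parse(timestamp):
    """Parse an ISO 8601 timestamp such as 2023-01-01T10:00:00+00:00."""
    return datetime.fromisoformat(timestamp)


def contributor_features(records):
    """Compute simple features from the activity records of one contributor."""
    ordered = sorted(records, key=lambda record: parse(record["date"]))
    # Gaps between consecutive activities that land in different repositories.
    switch_gaps = [
        (parse(later["date"]) - parse(earlier["date"])).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
        if earlier["repository"] != later["repository"]
    ]
    return {
        "activity_types": len({record["activity"] for record in ordered}),
        "repositories": len({record["repository"] for record in ordered}),
        "median_switch_seconds": median(switch_gaps) if switch_gaps else None,
    }


# Toy records invented for illustration; they are not taken from the dataset.
toy = [
    {"date": "2023-01-01T10:00:00+00:00", "activity": "Opening issue", "repository": "org/a"},
    {"date": "2023-01-01T10:05:00+00:00", "activity": "Commenting issue", "repository": "org/a"},
    {"date": "2023-01-01T11:05:00+00:00", "activity": "Commenting issue", "repository": "org/b"},
]
print(contributor_features(toy))
```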
Dataset Construction Process
1. Curating contributors
• Combine all contributors from Golzadeh et al., Abdellatif et al., Wang et al. and Chidambaram et al.
• Remove no longer existing contributors
• Result: set of contributors
2. Querying events (every 6 hours)
• Get contributor events from the GitHub event stream
• Missed event? Yes: drop contributor events. No: keep contributor events
3. Generating activities
• Identify contributor activities
• Human activity? Yes: anonymise activities, giving human activities. No: bot activities
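The last step of the process, routing activities into a bot set and an anonymised human set, might look like the sketch below. The slide does not say how anonymisation is done; replacing the contributor name with a salted SHA-256 pseudonym is an assumption made for illustration, and `pseudonym`, `split_and_anonymise` and the salt value are hypothetical names. A real pipeline would likely also need to mask other identifying fields.

```python
import hashlib


def pseudonym(name, salt):
    """Map a contributor name to a stable pseudonym (illustrative scheme)."""
    digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
    return "human-" + digest[:12]


def split_and_anonymise(activities, bots, salt="example-salt"):
    """Separate bot and human activities, masking human contributor names."""
    bot_activities, human_activities = [], []
    for activity in activities:
        if activity["contributor"] in bots:
            bot_activities.append(activity)
        else:
            # Copy the record so the input list is left untouched.
            masked = dict(activity, contributor=pseudonym(activity["contributor"], salt))
            human_activities.append(masked)
    return bot_activities, human_activities


# Toy input invented for illustration; "dev-b" is not a real account in the dataset.
sample = [
    {"contributor": "kubevirt-bot", "activity": "Commenting issue"},
    {"contributor": "dev-b", "activity": "Opening issue"},
    {"contributor": "dev-b", "activity": "Closing issue"},
]
bot_part, human_part = split_and_anonymise(sample, bots={"kubevirt-bot"})
print(len(bot_part), len(human_part))
```

Using the same salt for every record keeps one pseudonym per human contributor, so per-contributor analyses remain possible after anonymisation.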
