Exploring Language
Communities on GitHub
Antigoni M. Founta
Introduction
This study explores underlying patterns and detects communities among the
programming languages used by GitHub users, via network analysis.
Two graphs are derived from the whole dataset and two are location-specific,
in order to study both the general GitHub audience and the trends in
some sample locations.
Goal: Understand how languages are grouped in practice, based on the way
developers use them, and discover trends both worldwide and in specific
locations.
Nodes → Languages
Edges → Language co-occurrence in User Profiles (based on the user repositories)
GitHub
● GitHub is a web-based Git repository hosting service
● It offers distributed revision control and source code
management (SCM)
● It is the largest host of source code in the world! [1]
Why GitHub?
“The introduction of social features in a code hosting site has drawn
particular attention from researchers while the integrated social
features, and the availability of metadata through an accessible api
have made GitHub very attractive for software engineering
researchers” [3]
Top Image Source: https://goo.gl/CWBMqb
Bottom Image Source: https://github.com/logos
Pros
● Developers get a hint of which languages are used together
and thus perhaps serve the same purpose.
● Language creators get a hint of what their audience prefers
and trusts.
● Language communities may be another way to explore
developer communities.
Challenges
● Ambiguity in categorizing programming languages
● GitHub's bias toward web development
● Locations and users follow a power-law distribution: a few
locations (such as California and London) have numerous
developers, while a large number of locations have only a
few users
Fundamentals
Dataset Features
➔ ID, Username, Location, Followers, Public Repos, Languages & Bytes of code
Network Structure
➔ Nodes: Languages
◆ Attribute: Total Bytes of Code
➔ Edges: Pairs of Languages that co-occurred in at least one user profile
◆ Weight: Number of users that use both languages
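The network structure above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the author's code: the `profiles` mapping, the function name `build_language_graph`, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_language_graph(profiles):
    """Build the language co-occurrence graph from user profiles.

    profiles maps a username to a dict of {language: bytes of code}.
    Returns (node_bytes, edge_weights):
      node_bytes[lang]     -> total bytes of code (node attribute)
      edge_weights[(a, b)] -> number of users who use both a and b
    """
    node_bytes = Counter()
    edge_weights = Counter()
    for languages in profiles.values():
        # Adding a dict to a Counter sums the byte counts per language.
        node_bytes.update(languages)
        # Sort so each unordered pair has a single canonical key.
        for pair in combinations(sorted(languages), 2):
            edge_weights[pair] += 1
    return node_bytes, edge_weights


# Made-up sample data, for illustration only.
profiles = {
    "alice": {"JavaScript": 1200, "CSS": 300},
    "bob": {"JavaScript": 500, "CSS": 100, "C": 9000},
    "carol": {"C": 4000},
}
node_bytes, edge_weights = build_language_graph(profiles)
```

Calling `edge_weights.most_common()` then lists the language pairs ordered by the number of users that use both, which is the kind of ranking the co-occurrence slides report.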
Challenges upon Data
➔ Only public repositories are accessible (users mainly work in private ones!)
➔ Languages are added by the user (empty, not real, not written in the same way)
PyGithub [2]
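A rough sketch of how the "Languages & Bytes of code" feature could be assembled per user. The slides do not show the collection code, so this is an assumption: `languages_for_user` and `FakeRepo` are hypothetical names, and the function only relies on each repository object offering a `get_languages()` method that returns a `{language: bytes}` dict, as PyGithub's `Repository.get_languages()` does.

```python
from collections import Counter


def languages_for_user(repos):
    """Sum bytes of code per language over a user's public repositories.

    repos is any iterable of objects with a get_languages() method that
    returns {language: bytes}.
    """
    totals = Counter()
    for repo in repos:
        totals.update(repo.get_languages())
    return dict(totals)


# Hypothetical usage with PyGithub (needs network access and the PyGithub
# package, so it is left commented out):
#
#   from github import Github
#   user = Github().get_user("octocat")
#   profile = {
#       "id": user.id, "username": user.login, "location": user.location,
#       "followers": user.followers, "public_repos": user.public_repos,
#       "languages": languages_for_user(user.get_repos()),
#   }


class FakeRepo:
    """Stand-in for a repository object, used only for this offline demo."""

    def __init__(self, languages):
        self._languages = languages

    def get_languages(self):
        return self._languages


demo = languages_for_user(
    [FakeRepo({"Python": 100, "C": 50}), FakeRepo({"Python": 25})]
)
```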
Final Datasets
❏ 4,000 users since GitHub's founding + 150,000 from 2012
❏ Filter: keep only users with a location!
❏ Final: 2,300 users since GitHub's founding + 37,000 from 2012
Descriptive Statistics
Data Distribution
Methodology
Create graph (as described):
● Filters: Degree Range
● Layout: Force Atlas 2
● Node size: “Bytes of Code” Range
● Label size: Degree Range
Compute Modularity & get communities:
● Sometimes using edge weights, sometimes not
Visualize pairs of languages and the number of developers that use both
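To make the modularity step concrete, here is a small sketch that evaluates the standard Newman modularity score Q of a given partition, with or without edge weights. It does not search for communities (the slides do not name the detection algorithm), and the function name, the toy graph, and its weights are made up for illustration.

```python
from collections import defaultdict


def modularity(edge_weights, community_of, use_weights=True):
    """Newman modularity Q of a partition of an undirected graph.

    edge_weights: {(u, v): weight}, each undirected edge listed once.
    community_of: {node: community label}.
    use_weights=False treats every edge as having weight 1.
    """
    total = 0.0                    # m: total edge weight
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed node strength per community
    for (u, v), w in edge_weights.items():
        w = float(w) if use_weights else 1.0
        total += w
        degree[community_of[u]] += w
        degree[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    if total == 0:
        return 0.0
    # Q = sum over communities of (internal / m) - (degree / 2m)^2
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Toy graph with made-up weights: two triangles joined by one weak edge.
edges = {
    ("JavaScript", "CSS"): 5, ("JavaScript", "HTML"): 4, ("CSS", "HTML"): 3,
    ("C", "C++"): 3, ("C", "Assembly"): 2, ("C++", "Assembly"): 2,
    ("HTML", "C"): 1,
}
partition = {
    "JavaScript": "web", "CSS": "web", "HTML": "web",
    "C": "desktop", "C++": "desktop", "Assembly": "desktop",
}
q_weighted = modularity(edges, partition, use_weights=True)
q_unweighted = modularity(edges, partition, use_weights=False)
```

Comparing `q_weighted` with `q_unweighted` shows how the choice in the bullet above (using edge weights or not) changes the score for the same partition.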
Results: All Data - All Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Results: All Data - Top Languages
User-based
Language Graph
Language Co-occurrences
on User Profiles
Note: the top languages had only minor differences, so they are not reported
Results: California - Top 3 Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Results: Greece - Top 3 Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Repo-based
Language Graph
Communities
(modularity: 0.23)
Blue: Web-oriented
Pink: Desktop-oriented
Yellow: Other
Conclusions
Language-Oriented
➔ “Web-oriented” is the most robust category of languages used on GitHub
➔ “JavaScript - CSS” is the leading pair of languages, always outnumbering all other pairs
➔ Even though JavaScript almost always dominates the pairs of languages, C is always the
most used language in terms of bytes of code [perhaps C users are not language-extroverts…]
Scheme-Oriented
➔ With a user-based scheme we can understand the general preferences of developers and the
patterns between languages. [difficult when dataset is big!]
➔ With a repo-based scheme we can understand hidden (or at least not widely known)
patterns of languages that are used for the same purposes.
➔ General purpose: repo-based scheme
Location-specific purpose: user-based scheme
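The difference between the two schemes can be illustrated with a short sketch. It assumes that in the repo-based scheme an edge counts the repositories containing both languages (the slides do not define that weight), and the data and the name `cooccurrence` are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(language_sets):
    """Count how many of the given language sets contain each language pair."""
    counts = Counter()
    for languages in language_sets:
        counts.update(combinations(sorted(languages), 2))
    return counts


# Made-up data: each user maps to a list of repositories,
# and each repository is the set of languages it contains.
users = {
    "alice": [{"JavaScript", "CSS"}, {"Python"}],
    "bob": [{"JavaScript", "CSS", "HTML"}],
}

# User-based scheme: one language set per user (union over repositories).
user_based = cooccurrence(set().union(*repos) for repos in users.values())

# Repo-based scheme: one language set per repository.
repo_based = cooccurrence(repo for repos in users.values() for repo in repos)
```

In this toy data the user-based graph links CSS and Python because one developer uses both, while the repo-based graph does not, since the two never appear in the same repository.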
Future Work
● More data!
● More Locations and Comparisons
● Language Graphs based on Top/Most influential Users [using followers or stars]
● Association Rules on Languages for community detection
● User Graph to detect user communities per Location (e.g. web developers, game
developers) and compare with Language Graph of Location
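As a hint of what the association-rules idea could look like, the sketch below computes the support and confidence of a single rule "users of A also use B". It is only an illustration of the two basic measures, not a full rule-mining algorithm; the function name and the sample profiles are hypothetical.

```python
def rule_metrics(profiles, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    profiles is a list of language sets, one per user.
    support    = share of users that use both languages
    confidence = share of antecedent users that also use the consequent
    """
    n = len(profiles)
    has_a = sum(1 for langs in profiles if antecedent in langs)
    has_both = sum(
        1 for langs in profiles if antecedent in langs and consequent in langs
    )
    support = has_both / n if n else 0.0
    confidence = has_both / has_a if has_a else 0.0
    return support, confidence


# Made-up sample profiles, for illustration only.
profiles = [
    {"JavaScript", "CSS"},
    {"JavaScript", "CSS", "HTML"},
    {"JavaScript", "Python"},
    {"C"},
]
support, confidence = rule_metrics(profiles, "CSS", "JavaScript")
reverse_support, reverse_confidence = rule_metrics(profiles, "JavaScript", "CSS")
```

Unlike the symmetric edge weights of the language graph, confidence is directional, so the rule CSS → JavaScript can score differently from JavaScript → CSS.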
References
1. GitHub on Wikipedia: https://en.wikipedia.org/wiki/GitHub
2. PyGithub Library: https://github.com/PyGithub/PyGithub
3. Kalliamvakou, Eirini, et al. "The promises and perils of mining GitHub." Proceedings of the
11th Working Conference on Mining Software Repositories. ACM, 2014.
4. Thung, Ferdian, et al. "Network structure of social coding in GitHub." 2013 17th European
Conference on Software Maintenance and Reengineering (CSMR). IEEE, 2013.
5. Takhteyev, Yuri, and Andrew Hilts. "Investigating the geography of open source software
through GitHub." (2010).
6. Figueira Filho, Fernando, et al. "A study on the geographical distribution of Brazil’s
prestigious software developers." Journal of Internet Services and Applications 6.1 (2015): 1.
Image Source: http://wifflegif.com/tags/58347-octocat-gifs
Thank you for your attention! Any questions?
Image Source: https://octodex.github.com/images/heisencat.png
