Exploring Language Communities on GitHub
Antigoni M. Founta
Introduction
This study uses network analysis to explore underlying patterns and detect communities
among the programming languages used by GitHub users.
Two graphs are derived from the whole dataset and two from specific locations,
in order to study both the general GitHub audience and the trends in
some sample locations.
Goal: Understand how languages are grouped in practice, based on the way
developers use them, and discover trends either worldwide or in specific
locations.
Nodes → Languages
Edges → Language co-occurrence in User Profiles (based on the user repositories)
GitHub
● GitHub is a web-based Git repository hosting service
● It offers distributed revision control and source code
management (SCM)
● It is the largest host of source code in the world! [1]
Why GitHub?
“The introduction of social features in a code hosting site has drawn
particular attention from researchers while the integrated social
features, and the availability of metadata through an accessible api
have made GitHub very attractive for software engineering
researchers” [3]
Top Image Source: https://goo.gl/CWBMqb
Bottom Image Source: https://github.com/logos
Pros
● Developers will get a hint of
which languages are used jointly,
and thus perhaps serve the same
purpose.
● Language creators will get a hint
of what their audience prefers and
trusts.
● Language communities might
actually be another way to
explore developer communities.
Challenges
● Programming language
categorization is ambiguous
● GitHub is biased towards Web Development
● Locations and users follow a
power-law distribution: numerous
developers come from a few
locations (such as California or
London), while a significant
number of locations have
only a few users
Fundamentals
Dataset Features
➔ ID, Username, Location, Followers, Public Repos, Languages & Bytes of code
Network Structure
➔ Nodes: Languages
◆ Attribute: Total Bytes of Code
➔ Edges: Pairs of Languages that co-occur in at least one user profile
◆ Weight: Number of users that use both languages
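The network structure above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the `profiles` dictionary, the user names, and the byte counts are made-up sample data, and `build_language_graph` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def build_language_graph(profiles):
    """Build the language graph from user profiles.

    profiles: dict mapping user -> dict mapping language -> bytes of code.
    Returns (node_bytes, edge_users):
      node_bytes[language]   = total bytes of code (node attribute)
      edge_users[(a, b)]     = number of users using both a and b (edge weight)
    """
    node_bytes = Counter()
    edge_users = Counter()
    for languages in profiles.values():
        for lang, nbytes in languages.items():
            node_bytes[lang] += nbytes
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(languages), 2):
            edge_users[(a, b)] += 1
    return node_bytes, edge_users


# Made-up sample profiles (assumption, for illustration only).
profiles = {
    "alice": {"JavaScript": 1200, "CSS": 300},
    "bob": {"JavaScript": 800, "CSS": 100, "Python": 500},
    "carol": {"C": 9000},
}
node_bytes, edge_users = build_language_graph(profiles)
print(node_bytes.most_common())
print(edge_users.most_common())
```

A user with a single language (like "carol" here) contributes a node but no edges, which is why isolated languages can appear in the graph.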
Challenges upon Data
➔ Only public repositories are accessible (users mainly work on private ones!)
➔ Languages are added by the user (they may be empty, not real, or not written in the same way)
PyGithub[2]
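As a rough sketch of how the dataset features listed above could be gathered, the function below flattens one user object into a record. It assumes an object shaped like a PyGithub user (attributes `id`, `login`, `location`, `followers`, `public_repos`, and a `get_repos()` method whose repositories expose `get_languages()`). The name `profile_record` and the stand-in classes are hypothetical; the stand-ins exist only so the example runs without network access.

```python
from collections import Counter


def profile_record(user):
    """Flatten one PyGithub-style user into the dataset features."""
    languages = Counter()
    for repo in user.get_repos():
        # get_languages() maps language name -> bytes of code for one repo.
        languages.update(repo.get_languages())
    return {
        "id": user.id,
        "username": user.login,
        "location": user.location,
        "followers": user.followers,
        "public_repos": user.public_repos,
        "languages": dict(languages),
    }


# Offline stand-ins for real API objects (assumption, illustration only).
class FakeRepo:
    def __init__(self, languages):
        self._languages = languages

    def get_languages(self):
        return self._languages


class FakeUser:
    id = 1
    login = "example-user"
    location = "Greece"
    followers = 3
    public_repos = 2

    def get_repos(self):
        return [FakeRepo({"C": 10, "Python": 5}), FakeRepo({"C": 1})]


record = profile_record(FakeUser())
print(record)
```

With PyGithub installed, the same function could be called on a user returned by an authenticated `github.Github` client's `get_user(...)`. Only public repositories would be visible, as noted above.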
Final Datasets
❏ 4,000 users since GitHub's foundation + 150,000 from 2012
❏ Filter: Keep only users with locations!
❏ Final: 2,300 users since GitHub's foundation + 37,000 from 2012
Descriptive Statistics
Data Distribution
Methodology
Create graph (as described):
● Filters: Degree Range
● Layout: Force Atlas 2
● Node size: “Bytes of Code” Range
● Label size: Degree Range
Compute Modularity & get communities:
● Sometimes using edge weights, sometimes not
Visualize pairs of languages and the number of developers that use both
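To make the modularity step concrete, here is a minimal sketch that scores a given partition with the standard Newman modularity Q, with or without edge weights. It only evaluates Q for a partition that is already known; the community-detection step itself searches for a partition with high Q, which this sketch does not do. The function name, the toy graph, and the partition are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of, use_weights=True):
    """Newman modularity Q of a partition of an undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once,
           no self-loops.
    community_of: dict mapping node -> community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))**2
    where m is the total edge weight, L_c the weight inside c,
    and d_c the summed (weighted) degree of the nodes in c.
    """
    total = 0.0
    degree = defaultdict(float)
    internal = defaultdict(float)
    for (u, v), w in edges.items():
        w = float(w) if use_weights else 1.0
        total += w
        degree[u] += w
        degree[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    if total == 0:
        return 0.0
    comm_degree = defaultdict(float)
    for node, d in degree.items():
        comm_degree[community_of[node]] += d
    return sum(
        internal[c] / total - (comm_degree[c] / (2 * total)) ** 2
        for c in comm_degree
    )


# Toy graph (assumption): two triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Putting every node in one community always gives Q = 0, so a positive value indicates more within-community edge weight than expected by chance.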
Results: All Data - All Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Results: All Data - Top Languages
User-based
Language Graph
Language Co-occurrences
on User Profiles
Note: the top languages showed only minor differences, and thus are not reported
Results: California - Top 3 Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Results: Greece - Top 3 Languages
User-based
Language Graph
Language Co-occurrences on User Profiles &
Top Languages based on Bytes of Code written
Repo-based
Language Graph
Communities
(modularity: 0.23)
Blue: Web-oriented
Pink: Desktop-oriented
Yellow: Other
Conclusions
Language-Oriented
➔ “Web-oriented” is the most robust category of languages used on GitHub
➔ “JavaScript - CSS” is the leading pair of languages, always outnumbering all other pairs
➔ Even though JavaScript almost always dominates the Pairs of Languages, C is always the
most used language in terms of Bytes of Code [perhaps C users are not language-extroverts…]
Scheme-Oriented
➔ With a user-based scheme we can understand the general preferences of developers and the
patterns between languages. [difficult when the dataset is big!]
➔ With a repo-based scheme we can understand hidden (or at least not widely known)
patterns of languages that are used for the same purposes.
➔ General purpose: repo-based scheme
Location purpose: user-based scheme
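The difference between the two schemes can be illustrated with a small sketch. It assumes that in the repo-based scheme two languages are linked when they appear in the same repository, while in the user-based scheme they are linked when they appear anywhere in the same user's profile. The data and the helper name `cooccurrence` are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(language_sets):
    """Count how many sets each unordered language pair appears in."""
    counts = Counter()
    for langs in language_sets:
        counts.update(combinations(sorted(set(langs)), 2))
    return counts


# Made-up data (assumption): user -> list of repos, each a set of languages.
users = {
    "alice": [{"JavaScript", "CSS"}, {"Python"}],
    "bob": [{"JavaScript", "CSS", "HTML"}, {"C"}],
}

# Repo-based: one language set per repository.
repo_based = cooccurrence(repo for repos in users.values() for repo in repos)
# User-based: one language set per user (union over all of their repos).
user_based = cooccurrence(set().union(*repos) for repos in users.values())

print(repo_based)
print(user_based)
```

In this toy data the user-based graph links C with JavaScript because one developer uses both, while the repo-based graph does not, since the two never share a repository.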
Future Work
● More Data!
● More Locations and Comparisons
● Language Graphs based on Top/Most influential Users [using followers or stars]
● Association Rules on Languages for community detection
● User Graph to detect user communities per Location (e.g. web developers, game
developers) and compare with Language Graph of Location
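As an illustration of the association-rule idea mentioned above, the sketch below computes the two standard measures, support and confidence, for a rule such as "JavaScript → CSS" over a list of user language sets. The sample profiles and the function name `rule_metrics` are hypothetical.

```python
def rule_metrics(profiles, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    profiles: list of sets of languages, one set per user.
    support    = fraction of users that use both languages
    confidence = fraction of antecedent users that also use the consequent
    """
    n = len(profiles)
    has_a = sum(1 for langs in profiles if antecedent in langs)
    has_both = sum(
        1 for langs in profiles if antecedent in langs and consequent in langs
    )
    support = has_both / n if n else 0.0
    confidence = has_both / has_a if has_a else 0.0
    return support, confidence


# Made-up sample profiles (assumption, for illustration only).
profiles = [
    {"JavaScript", "CSS"},
    {"JavaScript", "CSS", "Python"},
    {"JavaScript"},
    {"C"},
]
support, confidence = rule_metrics(profiles, "JavaScript", "CSS")
print(support, confidence)
```

Unlike the symmetric edge weights of the language graph, confidence is directional: the rule "CSS → JavaScript" can score differently from "JavaScript → CSS".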
References
1. GitHub on Wikipedia: https://en.wikipedia.org/wiki/GitHub
2. PyGithub Library: https://github.com/PyGithub/PyGithub
3. Kalliamvakou, Eirini, et al. "The promises and perils of mining GitHub." Proceedings of the
11th Working Conference on Mining Software Repositories. ACM, 2014.
4. Thung, Ferdian, et al. "Network structure of social coding in GitHub." 2013 17th European
Conference on Software Maintenance and Reengineering (CSMR). IEEE, 2013.
5. Takhteyev, Yuri, and Andrew Hilts. "Investigating the geography of open source software
through GitHub." (2010).
6. Figueira Filho, Fernando, et al. "A study on the geographical distribution of Brazil’s
prestigious software developers." Journal of Internet Services and Applications 6.1 (2015): 1.
Image Source: http://wifflegif.com/tags/58347-octocat-gifs
Thank you for your attention! Any questions?
Image Source: https://octodex.github.com/images/heisencat.png
