A DATA-DRIVEN JOURNEY THROUGH
RESEARCH ON SOFTWARE ENGINEERING
           Mario Sangiorgio
MOTIVATION

Getting a better idea of what’s going on in the
software engineering research community
     through a quantitative approach
RELATED WORK
•C. Ghezzi - Keynote at ICSE 2008
 Reflections on 40+ years of software engineering
 research and beyond

•L. Briand - Keynote at ICSM 2011
 Useful software engineering research: leading a double
 agent life

•D. Rosenblum - Keynote at ASE 2012
 Whither software engineering research?
SUBJECTS OF OUR STUDY


 researchers    research topics




 affiliations   geographical areas
DATA
ACADEMIC LITERATURE
SELECTED PUBLICATIONS


           REPRESENTATIVENESS



           AUTHORITATIVENESS
DATA SOURCES

   Articles published and their authors
       COMPLETE XML DATABASE

  Citations, authors and affiliation details
                     APIs
COLLECTED DATA
  Venue    Number of papers   From    To
    TSE        3043           1975   2012
 TOSEM          295           1992   2012
   ICSE        2907           1976   2012
    ASE        1116           1997   2012
ESEC/FSE        416           1987   2012
  TOTAL        7777           1975   2012


9865 researchers      278794 citations
ANALYSIS
AUTHOR ANALYSIS

  Who published the most?

 Are there sub-communities?
MOST PROLIFIC AUTHORS
 Number of papers in parentheses.

 Software Engineering   ICSE             ASE            ESEC/FSE          TSE            TOSEM
 Basili (60)            Boehm (28)       Xie (24)       Clarke (8)        Basili (33)    Notkin (13)
 Notkin (56)            Basili (26)      Grundy (18)    D. Jackson (8)    Briand (26)    Rothermel (8)
 Kramer (49)            Osterweil (23)   Hosking (16)   Ernst (7)         Weyuker (18)   Roman (6)
 Harrold (46)           Kramer (21)      Egyed (16)     Notkin (7)        Knight (17)    Wolf (6)
 Xie (46)               Notkin (21)      Lo (16)        Uchitel (7)       Kramer (16)    Harrold (6)
SUB-COMMUNITY DETECTION
 For each venue we consider the set of its most prolific authors.

 We compute the set similarity (Jaccard index) between every pair of venues:

     J(A, B) = \frac{|A \cap B|}{|A \cup B|}
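
The similarity step can be sketched as follows. This is a minimal illustration, not the study's actual code: the function names are hypothetical, and using the five names per venue shown on the MOST PROLIFIC AUTHORS slide is an assumption, since the deck does not say how many top authors were compared.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| (taken as 0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_similarity(top_authors: dict) -> dict:
    """Jaccard index for every unordered pair of venues, keyed by (venue, venue)."""
    return {
        (u, v): jaccard(top_authors[u], top_authors[v])
        for u, v in combinations(sorted(top_authors), 2)
    }


# Assumption: top-5 lists copied from the MOST PROLIFIC AUTHORS slide.
top_authors = {
    "ICSE": {"Boehm", "Basili", "Osterweil", "Kramer", "Notkin"},
    "ASE": {"Xie", "Grundy", "Hosking", "Egyed", "Lo"},
    "ESEC/FSE": {"Clarke", "D. Jackson", "Ernst", "Notkin", "Uchitel"},
    "TSE": {"Basili", "Briand", "Weyuker", "Knight", "Kramer"},
    "TOSEM": {"Notkin", "Rothermel", "Roman", "Wolf", "Harrold"},
}

similarity = pairwise_similarity(top_authors)
for pair, value in sorted(similarity.items()):
    print(pair, round(value, 3))
```

A distance such as 1 - J(A, B) could then be fed to a multidimensional scaling routine to obtain a two-dimensional layout of the venues; that step is not shown here.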
SUB-COMMUNITIES
 [Figure: two-dimensional scatter plot of the venues FSE, TOSEM, ASE, TSE and ICSE; axes mds[,1] and mds[,2]]
TOPIC ANALYSIS
What is the topic of a paper?

 What are the hot topics in
  software engineering?

  How have they evolved?
CITATION NETWORK




     Papers in the
       dataset
CITATION NETWORK




      Internal
      citations
CITATION NETWORK
    Citations from
    specific venues




      Complete
       citations
EXAMPLE




What is the topic of the yellow paper?
EXAMPLE
What is the topic of the yellow paper?

  Topic     Direct citations
 Topic A           2
 Topic B           0
 General           1

What is the topic of the general paper?
EXAMPLE
What is the topic of the yellow paper?

  Topic     Direct citations
 Topic A           2
 Topic B           1
 General           1

 Topic profile
 Topic A         66%
 Topic B         33%
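
A minimal sketch of how such a topic profile could be computed from direct citation counts. It assumes that citations to "General" papers are left out of the normalization, which is consistent with the 66% / 33% split shown in the example; how the deck resolves the topic of a general paper is not spelled out and is not modeled here. The function name and the `general_label` parameter are hypothetical.

```python
def topic_profile(direct_citations: dict, general_label: str = "General") -> dict:
    """Normalize citation counts into topic shares, ignoring the general bucket."""
    specific = {
        topic: count
        for topic, count in direct_citations.items()
        if topic != general_label and count > 0
    }
    total = sum(specific.values())
    if total == 0:
        return {}  # no topic-specific citations: profile undefined
    return {topic: count / total for topic, count in specific.items()}


profile = topic_profile({"Topic A": 2, "Topic B": 1, "General": 1})
for topic, share in profile.items():
    print(f"{topic}: {share:.0%}")
```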
SOFTWARE ENGINEERING TOPICS
              Topic         Fraction of papers
    Programming Languages         9.34%
        Formal Methods            8.49%
       Software Reliability       6.13%
      Distributed Systems         5.96%
     Software Maintenance         5.92%
             Testing              4.64%
        Software Quality          4.53%
             Models               4.36%
     Software Architectures       4.36%
TOPICS IN THE ‘70S
          Topic            Fraction of papers
  Programming Languages         16.71%
       Performance               7.95%
    Operating Systems            7.29%
    Database Systems             6.84%
     Formal Methods              6.65%
  Software Architectures         6.14%
  Knowledge Engineering          5.69%
   Distributed Systems           4.94%
  Software Maintenance           4.18%

 Slide annotations:
 • By far the most represented (Programming Languages)
 • Topics from other fields
TOPICS IN THE ‘80S
          Topic            Fraction of papers
  Programming Languages         10.48%
   Distributed Systems           9.30%
  Knowledge Engineering          8.47%
   Software Reliability          6.68%
     Formal Methods              6.51%
   Information Systems           5.55%
  Software Maintenance           5.04%
         Models                  4.35%
 Artificial Intelligence         3.74%

 Slide annotations:
 • Significant rise
 • Other fields, related to distributed systems
 • Not only code
TOPICS IN THE ‘90S
          Topic            Fraction of papers
     Formal Methods              8.29%
  Programming Languages          8.13%
   Distributed Systems           6.80%
  Software Maintenance           6.55%
  Software Architectures         5.34%
    Software Quality             4.80%
  Knowledge Engineering          4.67%
         Models                  4.65%
   Information Systems           4.40%

 Slide annotations:
 • Change of the most published topic
 • Focus on software quality
TOPICS IN THE 2000S
          Topic            Fraction of papers
     Formal Methods              9.93%
  Programming Languages          8.37%
         Testing                 6.86%
  Software Maintenance           6.58%
   Software Reliability          6.22%
    Software Quality             5.72%
         Models                  4.80%
    Empirical Studies            4.76%
  Software Architectures         4.38%

 Slide annotations:
 • Still a lot of emphasis on software quality
 • Analysis of open source repositories
NEED FOR A FINER ANALYSIS


 Topics change constantly, not once in a decade




          SOLUTION: sliding window
          instead of fixed subdivision
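
The sliding-window idea can be sketched as below. This is only an illustration under stated assumptions: the window length of 5 years is arbitrary (the deck does not give one), each paper is reduced to a single (year, topic) pair rather than a weighted topic profile, and the function name is hypothetical.

```python
from collections import Counter


def sliding_topic_share(papers: list, topic: str, window: int = 5) -> dict:
    """Share of `topic` among all papers in each window of `window` consecutive years.

    `papers` is a list of (year, topic) pairs; the result maps the first year
    of each window to the topic's share within that window.
    """
    totals = Counter(year for year, _ in papers)
    hits = Counter(year for year, t in papers if t == topic)
    if not totals:
        return {}
    first, last = min(totals), max(totals)
    shares = {}
    for start in range(first, last - window + 2):
        years = range(start, start + window)
        n = sum(totals[y] for y in years)
        if n:  # skip windows that contain no papers at all
            shares[start] = sum(hits[y] for y in years) / n
    return shares


# Toy data, invented purely for illustration.
papers = [
    (2000, "Testing"), (2000, "Models"),
    (2001, "Testing"),
    (2002, "Models"),
    (2003, "Testing"), (2003, "Testing"),
]
shares = sliding_topic_share(papers, "Testing", window=2)
print(shares)
```

Because consecutive windows overlap, the resulting curve moves gradually from year to year instead of jumping at decade boundaries.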
TESTING
 [Plot: x-axis years 1975 to 2005, y-axis 0 to 0.18]
EMPIRICAL STUDIES
 [Plot: x-axis years 1975 to 2005, y-axis 0 to 0.18]
SERVICES
 [Plot: x-axis years 1975 to 2005, y-axis 0 to 0.18]
DISTRIBUTED SYSTEMS
 [Plot: x-axis years 1975 to 2005, y-axis 0 to 0.18]
PROGRAMMING LANGUAGES
 [Plot: x-axis years 1975 to 2005, y-axis 0 to 0.18]
PER-VENUE INSIGHTS
 Venue       Peculiarities
 TSE         Biased towards empirical works
 TOSEM       More focused on formal aspects
 ICSE        Balanced with respect to other venues
 ESEC/FSE    Formal, with interests in testing, modeling and requirements engineering
 ASE         Interests in program analysis and automated reasoning
AFFILIATION ANALYSIS

Where do the most prolific authors work?

 How much research is done in industry?
AFFILIATION PROFILE


  Author     Affiliation
 Author A         1
 Author B         2
 Author C         2

 Affiliation profile
 Affiliation 1    33%
 Affiliation 2    66%
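
A minimal sketch of the affiliation profile of a single paper, assuming each author counts equally and has exactly one affiliation (the deck does not say how multiple affiliations are handled). The function name is hypothetical.

```python
from collections import Counter


def affiliation_profile(author_affiliations: list) -> dict:
    """Share of a paper's authors belonging to each affiliation."""
    counts = Counter(author_affiliations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {affiliation: n / total for affiliation, n in counts.items()}


# Three authors: one at Affiliation 1, two at Affiliation 2, as in the slide.
paper_profile = affiliation_profile(["Affiliation 1", "Affiliation 2", "Affiliation 2"])
for affiliation, share in paper_profile.items():
    print(f"{affiliation}: {share:.0%}")
```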
MOST PROLIFIC AFFILIATIONS
                Affiliation                 Papers
                   IBM                     186.32
       Carnegie Mellon University          166.52
        University of Texas, Austin        122.62
         University of Maryland            106.83
                Microsoft                  101.63
         AT&T Bell Laboratories            101.37
      University of California, Irvine      98.17
     Georgia Institute of Technology        94.75
   Massachusetts Institute of Technology    93.24
          University of Virginia            81.55
               ALL FROM THE USA
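
The non-integer paper counts in this table would be consistent with summing per-paper affiliation profiles, so that a paper shared by two institutions contributes a fraction to each. That reading is an assumption, not something the slides state; the sketch below only illustrates the idea, with a hypothetical function name and invented toy data.

```python
from collections import defaultdict


def fractional_paper_counts(paper_profiles: list) -> dict:
    """Sum affiliation profiles over papers to get fractional paper counts."""
    totals = defaultdict(float)
    for profile in paper_profiles:
        for affiliation, share in profile.items():
            totals[affiliation] += share
    return dict(totals)


# Toy profiles, invented for illustration only.
counts = fractional_paper_counts([
    {"X": 0.5, "Y": 0.5},  # paper co-authored across two affiliations
    {"X": 1.0},            # paper written entirely at X
])
print(counts)
```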
PER-VENUE INSIGHTS
 Venue       Peculiarities
 TSE         The venue with the most industrial contribution
 TOSEM       European universities among the top contributors
 ICSE        Balanced mix of the contributors seen in the other venues
 ESEC/FSE    Despite ESEC, there is no bias towards Europe
 ASE         Industrial contribution is less relevant. Some affiliations appear only in its top list.

 Slide annotations:
 • TSE: Is it linked to the presence of empirical works?
 • TOSEM: Is Europe more formal?
 • ESEC/FSE: It is representative
INDUSTRY VS ACADEMIA
 [Plot: x-axis years 1970 to 2005, y-axis 0 to 1.00; series: Industry, Academia]
GEOGRAPHICAL ANALYSIS


 Where does the contribution come from?
GEOGRAPHICAL AREAS
 North America
 South America
 Europe
 Africa
 Asia & Oceania
LOCATION OF A PAPER
 Affiliation profile       Locations
 Affiliation 1    20%      North America
 Affiliation 2    30%      Europe
 Affiliation 3    50%      Europe

 Location profile
 North America    20%
 Europe           80%
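
The mapping from an affiliation profile to a location profile can be sketched as follows: the shares of affiliations located in the same area are added together. The function name and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def location_profile(affiliation_profile: dict, locations: dict) -> dict:
    """Add up affiliation shares per geographical area."""
    profile = defaultdict(float)
    for affiliation, share in affiliation_profile.items():
        profile[locations[affiliation]] += share
    return dict(profile)


# Values taken from the LOCATION OF A PAPER example.
paper_affiliations = {"Affiliation 1": 0.20, "Affiliation 2": 0.30, "Affiliation 3": 0.50}
locations = {
    "Affiliation 1": "North America",
    "Affiliation 2": "Europe",
    "Affiliation 3": "Europe",
}

paper_locations = location_profile(paper_affiliations, locations)
for area, share in paper_locations.items():
    print(f"{area}: {share:.0%}")
```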
GEOGRAPHICAL DISTRIBUTION
 [Plot: x-axis years 1970 to 2005, y-axis 0 to 1.00; series: Europe, North America, South America, Asia & Oceania, Africa]
CONCLUSION

    Academic literature contains a lot of
 information about a scientific community

With data mining techniques we can unveil it
     and get some interesting insights
QUESTIONS?
