Can Better Identifier Splitting Techniques Help Feature Location?
Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol




     SEMERU

19th IEEE International Conference on Program Comprehension
             (ICPC’11) – Kingston, Ontario, Canada
Textual information embeds domain knowledge

About 70% of source code consists of identifiers*

Identifiers are an important source of information for maintenance tasks:
• traceability link recovery
• feature location

* Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software Quality Journal, vol. 14, no. 3, 2006, pp. 261-282
Cumulative Number of Feature Location Papers Based on Textual Information
[Figure]
Related Work on Identifiers
• Takang et al. (JPL’96)
  – programs with full-word identifiers are more understandable than those with abbreviated ones
• Lawrie et al. (ICPC’06)
  – full words and recognizable abbreviations lead to better comprehension
• Binkley et al. (ICPC’09)
  – the CamelCase style is easier to recognize than the underscore style
Related Work on Identifiers
• Enslen et al. (MSR’09)
  – Samurai: algorithm for splitting identifiers (using tables of identifier frequencies)
• Guerrouj et al. (JSME’11)
  – TIDIER: algorithm for splitting identifiers (using contextual information)
• Other related work
  – Deissenboeck and Pizka (SQJ’06), Antoniol et al. (ICSM’07), Haiduc and Marcus (ICPC’08), etc.
Splitting Identifiers Correctly is Challenging
Identifier Splitting Algorithms
Original Identifier    CamelCase
userId                 user Id
setGID                 set GID
print_file2device      print file 2 device
SSLCertificate         SSL Certificate
MINstring              MI Nstring
USERID                 USERID
currentsize            currentsize
readadapterobject      readadapterobject
tolocale               tolocale
imitating              imitating
DEFMASKBit             DEFMASK Bit

CamelCase splitting:
• handles underscores and digits (e.g., print_file2device)
• fails on mixed-case identifiers (e.g., MINstring)
• fails on same-case identifiers (e.g., USERID, currentsize, readadapterobject)
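The CamelCase column in the table above can be approximated with a few regular-expression rules. The sketch below is an illustrative assumption, not the splitter used in the study, and the class name CamelCaseSplitter is hypothetical. It reproduces the CamelCase column of the table, including the failure cases. Purely case-based rules cannot split an identifier when case information is missing (USERID, currentsize) or misleading (MINstring).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical CamelCase-style identifier splitter (illustrative sketch). */
final class CamelCaseSplitter {

    private CamelCaseSplitter() {}

    static List<String> split(String identifier) {
        String spaced = identifier
                // Treat underscores and other non-alphanumeric characters as separators.
                .replaceAll("[^A-Za-z0-9]+", " ")
                // Split between a letter and a digit, and between a digit and a letter.
                .replaceAll("(?<=[A-Za-z])(?=[0-9])", " ")
                .replaceAll("(?<=[0-9])(?=[A-Za-z])", " ")
                // Split lowercase -> Uppercase (userId -> user Id).
                .replaceAll("(?<=[a-z])(?=[A-Z])", " ")
                // Split UPPER -> Upper+lower (SSLCertificate -> SSL Certificate).
                .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ");
        return Arrays.stream(spaced.trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] ids = {"userId", "setGID", "print_file2device", "SSLCertificate",
                "MINstring", "USERID", "currentsize", "DEFMASKBit"};
        for (String id : ids) {
            System.out.println(id + " -> " + String.join(" ", split(id)));
        }
    }
}
```

The last rule is what produces "MI Nstring": an uppercase letter followed by lowercase text is always treated as the start of a new word.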
Identifier Splitting Algorithms
Original Identifier    CamelCase              Samurai
userId                 user Id                user Id
setGID                 set GID                set GID
print_file2device      print file 2 device    print file 2 device
SSLCertificate         SSL Certificate        SSL Certificate
MINstring              MI Nstring             MIN string
USERID                 USERID                 USER ID
currentsize            currentsize            current size
readadapterobject      readadapterobject      read adapter object
tolocale               tolocale               tol ocal e
imitating              imitating              imi ta ting
DEFMASKBit             DEFMASK Bit            DEF MASK Bit

Samurai:
• splits some cases that CamelCase cannot (e.g., MINstring, USERID, currentsize, readadapterobject)
• oversplits some identifiers (e.g., tolocale, imitating)
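Samurai splits same-case tokens such as currentsize by consulting identifier frequency tables. The sketch below shows the general idea of frequency-driven splitting using a plain unigram segmentation solved by dynamic programming. The cost function, the unknown-word penalty, the class name FrequencySplitter, and the toy frequency table are all assumptions for illustration. This is not Samurai's published scoring function. The sketch also suggests why such splitters can oversplit: whenever a table contains short fragments whose combined cost is lower than the cost of the whole word, the fragments win.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical frequency-based splitter for same-case tokens (not Samurai's scoring). */
final class FrequencySplitter {
    private final Map<String, Integer> freq;
    private final double total;
    private final double unknownCostPerChar;

    FrequencySplitter(Map<String, Integer> freq, double unknownCostPerChar) {
        this.freq = freq;
        this.total = freq.values().stream().mapToInt(Integer::intValue).sum();
        this.unknownCostPerChar = unknownCostPerChar;
    }

    /** Negative log-probability for known words, a length-based penalty otherwise. */
    private double cost(String token) {
        Integer f = freq.get(token);
        if (f != null && f > 0) {
            return -Math.log(f / total);
        }
        return unknownCostPerChar * token.length();
    }

    /** Dynamic programming: best[i] is the cheapest segmentation of the first i characters. */
    List<String> split(String word) {
        String s = word.toLowerCase();
        int n = s.length();
        double[] best = new double[n + 1];
        int[] back = new int[n + 1];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                double c = best[start] + cost(s.substring(start, end));
                if (c < best[end]) {
                    best[end] = c;
                    back[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end to recover the chosen tokens.
        List<String> tokens = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            tokens.add(s.substring(back[end], end));
        }
        Collections.reverse(tokens);
        return tokens;
    }

    /** Toy frequency table; the counts are made up for illustration. */
    static Map<String, Integer> demoTable() {
        Map<String, Integer> m = new HashMap<>();
        m.put("current", 50);
        m.put("size", 40);
        m.put("read", 60);
        m.put("adapter", 10);
        m.put("object", 30);
        m.put("cur", 5);
        m.put("rent", 3);
        return m;
    }

    public static void main(String[] args) {
        FrequencySplitter splitter = new FrequencySplitter(demoTable(), 10.0);
        System.out.println(splitter.split("currentsize"));       // [current, size]
        System.out.println(splitter.split("readadapterobject")); // [read, adapter, object]
    }
}
```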
Cumulative Number of Feature Location Papers Based on Textual Information
[Figure]
Existing feature location techniques use CamelCase splitting
Information Retrieval FLT
• Generate corpus
• Preprocessing
   –   Remove non-literals
   –   Remove stop words
   –   Split identifiers
   –   Stemming
• Indexing
   – Term-by-document matrix
   – Singular Value Decomposition
• User formulates query
• Generate results
• Ranked list

Example: preprocessing one method

Original method:
    synchronized void print(TestResult result, long runTime) throws IOException {
        printHeader(runTime);
        printErrors(result);
        printFailures(result);
        printFooter(result);
    }

After removing non-literals:
    synchronized void print TestResult result long runTime throws IOException printHeader runTime printErrors result printFailures result printFooter result

After removing stop words:
    print TestResult result runTime IOException printHeader runTime printErrors result printFailures result printFooter result

After splitting identifiers:
    print Test Result result run Time IO Exception print Header run Time print Errors result print Failures result print Footer result

After stemming:
    print test result result run time io exception print head run time print error result print fail result print foot result

Term-by-document matrix:
          print   test   result   ...
    m1    5       1      3        ...
    m2    ...     ...    ...      ...
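The first three preprocessing steps of the example above can be sketched as follows. This pipeline is hypothetical: the class name Preprocessor and its small stop list of Java keywords are assumptions, and it also assumes that "remove non-literals" means keeping only identifier-like tokens. Stemming is left out. It would normally be delegated to an existing stemmer such as Porter's, whose output may differ from the stemmed line shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical corpus preprocessing pipeline (stemming omitted). */
final class Preprocessor {
    // Assumed stop list: a few Java keywords, enough for the example above.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "synchronized", "void", "long", "throws", "public", "private",
            "static", "int", "return", "new", "if", "else", "for", "while"));

    /** Step 1: keep identifier-like tokens, dropping operators, braces, and punctuation. */
    static List<String> removeNonLiterals(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.split("[^A-Za-z0-9_]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Step 2: drop stop words. */
    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) kept.add(t);
        }
        return kept;
    }

    /** Step 3: CamelCase/underscore splitting, then lowercase every term. */
    static List<String> splitIdentifiers(List<String> tokens) {
        List<String> terms = new ArrayList<>();
        for (String t : tokens) {
            String spaced = t.replace('_', ' ')
                    .replaceAll("(?<=[a-z])(?=[A-Z])", " ")
                    .replaceAll("(?<=[A-Z])(?=[A-Z][a-z])", " ");
            for (String part : spaced.trim().split("\\s+")) {
                if (!part.isEmpty()) terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    static List<String> preprocess(String source) {
        return splitIdentifiers(removeStopWords(removeNonLiterals(source)));
    }

    public static void main(String[] args) {
        String method = "synchronized void print(TestResult result, long runTime) throws IOException {"
                + " printHeader(runTime); printErrors(result); printFailures(result); printFooter(result); }";
        System.out.println(String.join(" ", preprocess(method)));
    }
}
```

On the print method, this yields "print test result result run time io exception print header run time print errors result print failures result print footer result". That output matches the split step above, lowercased and not yet stemmed.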
IR and Dynamic Information FLT
• Generate corpus
• Preprocessing
   –   Remove non-literals
   –   Remove stop words
   –   Split identifiers
   –   Stemming
• Indexing
   – Term-by-document matrix
   – Singular Value Decomposition
• Collect execution trace
• User formulates query
• Generate results
• Ranked list of executed methods
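The ranking step, and the IRDyn variant that keeps only executed methods, can be sketched as follows. The sketch is a simplified vector space model that uses raw term-frequency weights and cosine similarity. The LSI used by these FLTs additionally applies a truncated Singular Value Decomposition to the term-by-document matrix, and the sketch omits that step. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical ranker: term-frequency vectors + cosine similarity (no SVD). */
final class Ranker {

    static Map<String, Integer> termCounts(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : terms) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int v : b.values()) nb += (double) v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** IR: rank every method in the corpus by similarity to the query, highest first. */
    static List<String> rank(Map<String, List<String>> corpus, List<String> query) {
        Map<String, Integer> q = termCounts(query);
        Map<String, Double> scores = new LinkedHashMap<>();
        corpus.forEach((method, terms) -> scores.put(method, cosine(q, termCounts(terms))));
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** IRDyn: keep only methods that appear in the execution trace, preserving IR order. */
    static List<String> filterByTrace(List<String> ranked, Set<String> executed) {
        List<String> kept = new ArrayList<>();
        for (String m : ranked) {
            if (executed.contains(m)) kept.add(m);
        }
        return kept;
    }
}
```

Filtering after ranking, rather than ranking a reduced corpus, keeps the relative IR order of the executed methods. This ordering is what the effectiveness comparison between IR and IRDyn relies on.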
Research Goal
Evaluate how advanced splitting techniques impact the performance of feature location techniques
Information Retrieval FLT
• Generate corpus
• Preprocessing
   –   Remove non-literals
   –   Remove stop words
   –   Split identifiers: replace CamelCase with
         • Samurai
         • “Perfect” splitting algorithm (Oracle)
   –   Stemming
• Indexing
   – Term-by-document matrix
   – Singular Value Decomposition
• User formulates query
• Generate results
• Ranked list
[Figure: scale from Worst to Better]
Building the Oracle
• Extract all identifiers
• Do CamelCase, Samurai, and TIDIER produce the same split?
   – YES: concordant split identifiers
         • Assumed to be correct
         • A sample was manually verified
         • Threat to validity
   – NO: discordant split identifiers, which are split manually to obtain the manually split identifiers
         • Consensus between authors
         • Checked source code
• Identifiers that could not be split (e.g., DT, i3, P754, zzz) were left unchanged
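The concordance check used to build the Oracle can be sketched as follows. The sketch assumes each splitter (CamelCase, Samurai, TIDIER) is available as a function from an identifier to a token list. Those splitter implementations, the class OracleBuilder, and its Partition type are hypothetical. The manual splitting of discordant identifiers by the authors is not automated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical concordant/discordant partition used when building the Oracle. */
final class OracleBuilder {

    /** Identifiers on which all splitters agree, plus the ones needing a manual split. */
    static final class Partition {
        final Map<String, List<String>> concordant = new LinkedHashMap<>();
        final List<String> discordant = new ArrayList<>();
    }

    static Partition partition(List<String> identifiers,
                               List<Function<String, List<String>>> splitters) {
        if (splitters.isEmpty()) {
            throw new IllegalArgumentException("need at least one splitter");
        }
        Partition p = new Partition();
        for (String id : identifiers) {
            List<String> first = splitters.get(0).apply(id);
            boolean same = true;
            for (Function<String, List<String>> s : splitters) {
                if (!s.apply(id).equals(first)) {
                    same = false;
                    break;
                }
            }
            if (same) {
                p.concordant.put(id, first);   // assumed correct; a sample is checked by hand
            } else {
                p.discordant.add(id);          // goes to manual splitting
            }
        }
        return p;
    }
}
```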
Design of the Case Study
• RQ: Does an FLT with an advanced splitting algorithm produce better results than the same FLT using the CamelCase splitting algorithm?
How to Compare Two FLTs?
• Effectiveness measure for each feature

IR
    Method   LSI score
    M121     0.92
    M64      0.89
    M15      0.86
    M39      0.80
    M7       0.74   (gold set method)
    M152     0.65
    M234     0.56
    M12      0.54
    M78      0.52
    Effectiveness = 5

IRDyn (only methods executed in the trace)
    Method   LSI score
    M15      0.86
    M7       0.74   (gold set method)
    M234     0.56
    M12      0.54
    Effectiveness = 2
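The effectiveness measure can be sketched as the 1-based position of the first gold set method in a ranked list. The class name Effectiveness is hypothetical, and so is the choice to return -1 when no gold set method is ranked. The main method replays the IR and IRDyn example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical effectiveness measure: rank of the best-ranked gold set method. */
final class Effectiveness {

    /** Returns the 1-based rank, or -1 if no gold set method appears in the list. */
    static int of(List<String> ranked, Set<String> goldSet) {
        for (int i = 0; i < ranked.size(); i++) {
            if (goldSet.contains(ranked.get(i))) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> ir = Arrays.asList("M121", "M64", "M15", "M39", "M7", "M152", "M234", "M12", "M78");
        Set<String> trace = new HashSet<>(Arrays.asList("M15", "M7", "M234", "M12"));
        Set<String> gold = new HashSet<>(Arrays.asList("M7"));
        // IRDyn keeps the IR order but drops methods that were not executed.
        List<String> irDyn = new ArrayList<>();
        for (String m : ir) {
            if (trace.contains(m)) irDyn.add(m);
        }
        System.out.println("IR effectiveness = " + of(ir, gold));       // 5
        System.out.println("IRDyn effectiveness = " + of(irDyn, gold)); // 2
    }
}
```

Lower values are better, because a developer following the list reaches the relevant method sooner.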
Which FLTs Are We Comparing?
Software Systems
  •   Rhino 1.6R5
  •   138 classes, 1,870 methods, 32K LOC
  •   Eaddy et al.’s data*
  •   2 datasets

 Dataset         Size   Queries                                 Gold Sets             Execution Information
 RhinoFeatures   241    Sections of ECMAScript documentation    Eaddy et al.*         Full execution traces (from unit tests)
 RhinoBugs       143    Bug title and description               Eaddy et al.* (CVS)   N/A

 * http://www.cs.columbia.edu/~eaddy/concerntagger/
Software Systems
  •   jEdit 4.3
  •   483 classes, 6.4K methods, 109K LOC
  •   2 datasets

 Dataset         Size   Queries                                    Gold Sets   Execution Information
 jEditFeatures   64     Feature (or patch) title and description   SVN         Marked execution traces
 jEditBugs       86     Bug title and description                  SVN         Marked execution traces

 Datasets available at: http://www.cs.wm.edu/semeru/data/icpc11-identifier-splitting/
Generating the jEdit Datasets
• SVN commits between v4.2 and v4.3
• SVN commit message: Title + Description = Query
• Changed files: the previous version and the current version are compared using the Eclipse AST to obtain the modified methods (gold set)
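The gold set extraction can be sketched under a simplifying assumption. Each version is given as a map from method signature to normalized method body text, and the sketch does not show how those maps are extracted. It is a stand-in for the Eclipse AST comparison, not a reimplementation of it, and the class GoldSetBuilder is hypothetical. Only methods present in both versions with different bodies are reported, which matches the "modified methods" wording. Treatment of added or deleted methods is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the AST-based gold set extraction. */
final class GoldSetBuilder {

    /** Signatures present in both versions whose bodies differ. */
    static List<String> modifiedMethods(Map<String, String> previous, Map<String, String> current) {
        List<String> modified = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String oldBody = previous.get(e.getKey());
            // Added methods (no previous body) and unchanged methods are skipped.
            if (oldBody != null && !oldBody.equals(e.getValue())) {
                modified.add(e.getKey());
            }
        }
        return modified;
    }

    /** Query for a commit: its title followed by its description. */
    static String query(String title, String description) {
        return title + " " + description;
    }
}
```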
Presenting the Results
• Box plot of all effectiveness measures in each dataset (e.g., 241 data points for RhinoFeatures)
[Figure: example box plot with the average and the median labeled]
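The two statistics labeled on each box plot can be computed as shown below. This is a minimal sketch, and the class name EffectivenessSummary is hypothetical. Effectiveness is a rank, so a lower mean or median indicates a better FLT.

```java
import java.util.Arrays;

/** Hypothetical helper: mean and median of effectiveness values. */
final class EffectivenessSummary {

    private EffectivenessSummary() {}

    static double mean(int[] values) {
        requireNonEmpty(values);
        long sum = 0;                      // long avoids overflow on large datasets
        for (int v : values) sum += v;
        return (double) sum / values.length;
    }

    static double median(int[] values) {
        requireNonEmpty(values);
        int[] sorted = values.clone();     // do not reorder the caller's array
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    private static void requireNonEmpty(int[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no effectiveness values");
    }
}
```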
IR FLTs
[Figure: box plots of effectiveness; panels: RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs]
• Similar median and average across the splitting techniques
• Datasets with features have better results than datasets with bugs
IRDyn FLTs
[Figure: box plots of effectiveness; panels: RhinoFeatures, RhinoBugs (N/A, no execution information), jEditFeatures, jEditBugs]
• Similar median and average across the splitting techniques
• Datasets with features have better results than datasets with bugs
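Compare FLTs by Percentages
IROracle   IRCamelCase (Baseline)
  10            17
  20            15
  18            18
   5             9
   4            16
  19             7
  12            28
  14            15

• 5/8: IROracle ranks the gold set method higher (lower effectiveness value) than the baseline
• 2/8: the baseline ranks the gold set method higher
• 1/8: tie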
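The 5/8 and 2/8 figures come from comparing the two effectiveness values feature by feature, where a lower value means the gold set method was found earlier. A minimal sketch follows, with a hypothetical class name:

```java
/** Hypothetical per-feature comparison of a technique against a baseline. */
final class PercentageComparison {
    final int better;
    final int worse;
    final int equal;

    private PercentageComparison(int better, int worse, int equal) {
        this.better = better;
        this.worse = worse;
        this.equal = equal;
    }

    static PercentageComparison compare(int[] technique, int[] baseline) {
        if (technique.length != baseline.length || technique.length == 0) {
            throw new IllegalArgumentException("need paired, non-empty effectiveness values");
        }
        int better = 0, worse = 0, equal = 0;
        for (int i = 0; i < technique.length; i++) {
            if (technique[i] < baseline[i]) better++;       // gold set method found earlier
            else if (technique[i] > baseline[i]) worse++;   // found later
            else equal++;
        }
        return new PercentageComparison(better, worse, equal);
    }

    double percentBetter() { return 100.0 * better / (better + worse + equal); }

    double percentWorse() { return 100.0 * worse / (better + worse + equal); }

    public static void main(String[] args) {
        int[] irOracle = {10, 20, 18, 5, 4, 19, 12, 14};
        int[] irCamelCase = {17, 15, 18, 9, 16, 7, 28, 15};
        PercentageComparison c = compare(irOracle, irCamelCase);
        // prints 5/8 better, 2/8 worse, 1/8 equal
        System.out.println(c.better + "/8 better, " + c.worse + "/8 worse, " + c.equal + "/8 equal");
    }
}
```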
IR
[Figure: results for IR FLTs; panels: RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs]
• Datasets with features vs. datasets with bugs
IRDyn
[Figure: results for IRDyn FLTs; panels: RhinoFeatures, RhinoBugs (N/A), jEditFeatures, jEditBugs]
• Similar trend to IR
Statistical Results
• Wilcoxon signed-rank test
• Null hypothesis
  – There is no statistically significant difference in terms of effectiveness between IRSamurai/IROracle and IRCamelCase
• Alternative hypothesis
  – IRSamurai/IROracle has statistically significantly higher effectiveness than IRCamelCase (i.e., it ranks gold set methods higher)
• alpha = 0.05
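The one-sided test above can be sketched as a Wilcoxon signed-rank test with an exact p-value. The sketch enumerates all 2^n sign assignments, so it only suits small samples, and it refuses more than 20 non-zero differences. For datasets such as RhinoFeatures (241 data points), a normal approximation or an established statistics library would normally be used instead. The class name and its Result type are hypothetical. Differences are computed as baseline minus technique, so a positive difference means the technique ranked the gold set method higher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical one-sided Wilcoxon signed-rank test with an exact p-value (small n only). */
final class WilcoxonSignedRank {

    static final class Result {
        final double wPlus;
        final double pValue;

        Result(double wPlus, double pValue) {
            this.wPlus = wPlus;
            this.pValue = pValue;
        }
    }

    static Result test(int[] technique, int[] baseline) {
        if (technique.length != baseline.length) {
            throw new IllegalArgumentException("samples must be paired");
        }
        List<Integer> diffs = new ArrayList<>();
        for (int i = 0; i < technique.length; i++) {
            int d = baseline[i] - technique[i];
            if (d != 0) diffs.add(d);                  // zero differences are discarded
        }
        int n = diffs.size();
        if (n == 0 || n > 20) {
            throw new IllegalArgumentException("exact enumeration needs 1..20 non-zero differences");
        }

        // Rank |d| in ascending order; tied values share the average of their ranks.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(Math.abs(diffs.get(a)), Math.abs(diffs.get(b))));
        double[] ranks = new double[n];
        for (int i = 0; i < n; ) {
            int j = i;
            while (j + 1 < n && Math.abs(diffs.get(order[j + 1])) == Math.abs(diffs.get(order[i]))) j++;
            double averageRank = (i + j + 2) / 2.0;    // positions i..j hold ranks i+1..j+1
            for (int k = i; k <= j; k++) ranks[order[k]] = averageRank;
            i = j + 1;
        }

        // W+ is the sum of ranks of the positive differences.
        double wPlus = 0;
        for (int i = 0; i < n; i++) {
            if (diffs.get(i) > 0) wPlus += ranks[i];
        }

        // Under H0 each difference is positive or negative with probability 1/2,
        // so count the sign assignments whose W+ is at least the observed value.
        long total = 1L << n;
        long atLeast = 0;
        for (long mask = 0; mask < total; mask++) {
            double s = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1L << i)) != 0) s += ranks[i];
            }
            if (s >= wPlus - 1e-9) atLeast++;
        }
        return new Result(wPlus, (double) atLeast / total);
    }
}
```

As a usage example only, applying it to the eight illustrative values from the percentages example (IROracle vs. IRCamelCase) gives W+ = 19.5 and an exact one-sided p = 26/128 ≈ 0.203. That figure is a toy computation, not a result of the study.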
IR
[Figure: statistical test results for IR FLTs; panels: RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs]
• The only statistically significant result (p = 0.05)
Qualitative Results
• Vocabulary mismatch between queries and code:
  – Names of developers (e.g., Slava, Carlos)
  – Words specific to communication (e.g., thanks, greetings, annoying)
Qualitative Results
• Features are more “descriptive” than bugs
[Figure: example query]
• Words “join” and “line” are not mentioned
Threats to Validity
• External
  – 2 Java applications (different domains)
  – More systems needed
• Construct
  – Errors may be present in the Oracle and gold sets
  – We used data produced by other researchers
• Internal
  – Subjectivity and bias in building the Oracle
• Conclusion
  – Non-parametric test: Wilcoxon signed-rank
Research Questions
• RQ1: Does IRSamurai outperform IRCamelCase in terms of effectiveness? NO
• RQ2: Does IRSamuraiDyn outperform IRCamelCaseDyn in terms of effectiveness? NO
• RQ3: Does IROracle outperform IRCamelCase in terms of effectiveness? In some cases (Rhino)
• RQ4: Does IROracleDyn outperform IRCamelCaseDyn in terms of effectiveness? NO
Future Work
• More systems and datasets
• Different maintenance tasks
  – Traceability link recovery
• Consider other splitting algorithms
Conclusions
• Advanced splitting techniques could improve FLTs
  – We found some empirical evidence
• Splitting has more impact on IR FLTs than on IRDyn FLTs
• If execution information is available, it is not necessary to use an advanced splitting technique
Thank you! Questions?
SEMERU @ William and Mary
http://www.cs.wm.edu/semeru/
bdit@cs.wm.edu
References
• Takang et al. (1996) Takang, A., Grubb, P., and Macredie, R., "The Effects of Comments and Identifier Names on Program Comprehensibility: An Experimental Investigation", Journal of Programming Languages, vol. 4, no. 3, 1996, pp. 143-167
• Lawrie et al. (2006) Lawrie, D., Morrell, C., Feild, H., and Binkley, D., "What's in a Name? A Study of Identifiers", in Proc. of IEEE ICPC'06, June 14-16 2006, pp. 3-12
• Binkley et al. (2009) Binkley, D., Davis, M., Lawrie, D., and Morrell, C., "To CamelCase or Under_score", in Proc. of IEEE ICPC'09, May 17-19 2009, pp. 158-167
• Enslen et al. (2009) Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K., "Mining Source Code to Automatically Split Identifiers for Software Analysis", in Proc. of IEEE MSR'09, May 16-17 2009, pp. 71-80
• Guerrouj et al. (2011) Guerrouj, L., Di Penta, M., Antoniol, G., and Guéhéneuc, Y.-G., "TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques", JSME, to appear, 2011

Emoji Encryption Using AES Algorithm
 
ISSA: Next Generation Tokenization for Compliance and Cloud Data Protection
ISSA: Next Generation Tokenization for Compliance and Cloud Data ProtectionISSA: Next Generation Tokenization for Compliance and Cloud Data Protection
ISSA: Next Generation Tokenization for Compliance and Cloud Data Protection
 
Categorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk pythonCategorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk python
 
BLE as Active RFID
BLE as Active RFIDBLE as Active RFID
BLE as Active RFID
 
iPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhoneiPhonical and model-driven software development for the iPhone
iPhonical and model-driven software development for the iPhone
 
NanoSec Conference 2019: Code Execution Analysis in Mobile Apps - Abdullah Jo...
NanoSec Conference 2019: Code Execution Analysis in Mobile Apps - Abdullah Jo...NanoSec Conference 2019: Code Execution Analysis in Mobile Apps - Abdullah Jo...
NanoSec Conference 2019: Code Execution Analysis in Mobile Apps - Abdullah Jo...
 
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
 
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
BDD - Buzzword Driven Development - Build the next cool app for fun and for.....
 
Categorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk pythonCategorizing and pos tagging with nltk python
Categorizing and pos tagging with nltk python
 
The Belgian E Id Hacker Vs Developer
The Belgian E Id Hacker Vs DeveloperThe Belgian E Id Hacker Vs Developer
The Belgian E Id Hacker Vs Developer
 
Perceptual Computing
Perceptual ComputingPerceptual Computing
Perceptual Computing
 
Bypass Security Checking with Frida
Bypass Security Checking with FridaBypass Security Checking with Frida
Bypass Security Checking with Frida
 
internet
internetinternet
internet
 
BCS_PKI_part1.ppt
BCS_PKI_part1.pptBCS_PKI_part1.ppt
BCS_PKI_part1.ppt
 
Chado-XML
Chado-XMLChado-XML
Chado-XML
 
Rights Technologies for E-Publishing
Rights Technologies for E-PublishingRights Technologies for E-Publishing
Rights Technologies for E-Publishing
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
Ptidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
Ptidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
Ptidej Team
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
Ptidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
Ptidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
Ptidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
Ptidej Team
 
MIPA
MIPAMIPA
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
Ptidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
Ptidej Team
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
Ptidej Team
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
Ptidej Team
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
Ptidej Team
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
Ptidej Team
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
Ptidej Team
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
Ptidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Recently uploaded

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

ICPC11b.ppt

  • 1. Can Better Identifier Splitting Techniques Help Feature Location? Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol. SEMERU. 19th IEEE International Conference on Program Comprehension (ICPC’11), Kingston, Ontario, Canada
  • 2.
  • 3. Textual information embeds domain knowledge
  • 4. Textual information embeds domain knowledge. About 70% of source code consists of identifiers.* (* Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software Quality Journal, vol. 14, no. 3, 2006, pp. 261-282)
  • 5. Textual information embeds domain knowledge. About 70% of source code consists of identifiers.* Identifiers are an important source of information for maintenance tasks: traceability link recovery and feature location. (* Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software Quality Journal, vol. 14, no. 3, 2006, pp. 261-282)
  • 6. Cumulative number of feature location papers based on textual information
  • 7. Related Work on Identifiers. Takang et al. (JPL’96): programs with full-word identifiers are more understandable than those with abbreviated ones. Lawrie et al. (ICPC’06): full words and recognizable abbreviations lead to better comprehension. Binkley et al. (ICPC’09): the CamelCase style is easier to recognize than underscores.
  • 8. Related Work on Identifiers. Enslen et al. (MSR’09): Samurai, an algorithm for splitting identifiers using tables of identifier frequencies. Guerrouj et al. (JSME’11): TIDIER, an algorithm for splitting identifiers using contextual information. Other related work: Deissenboeck and Pizka (SQJ’06), Antoniol et al. (ICSM’07), Haiduc and Marcus (ICPC’08), etc.
  • 9. Splitting Identifiers Correctly is Challenging
  • 10. Identifier Splitting Algorithms. Original identifiers: userId, setGID, print_file2device, SSLCertificate, MINstring, USERID, currentsize, readadapterobject, tolocale, imitating, DEFMASKBit
  • 11. Identifier Splitting Algorithms (original identifier | CamelCase split):
      userId | user Id
      setGID | set GID
      print_file2device | print file 2 device
      SSLCertificate | SSL Certificate
      MINstring | MI Nstring
      USERID | USERID
      currentsize | currentsize
      readadapterobject | readadapterobject
      tolocale | tolocale
      imitating | imitating
      DEFMASKBit | DEFMASK Bit
  • 12. Identifier Splitting Algorithms (same table). CamelCase handles underscores and digits (print_file2device). It fails at mixed-case identifiers (e.g., MINstring → MI Nstring) and at same-case identifiers (USERID, currentsize, readadapterobject, tolocale, imitating).
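The CamelCase column can be approximated with a few regular-expression rules. The sketch below is a minimal illustration, not the splitter used in the study. The two case-transition rules are our assumption, chosen because they reproduce every CamelCase split listed on slide 11, including the MINstring → MI Nstring error and the unsplit same-case identifiers.

```python
import re


def camel_case_split(identifier):
    """Split an identifier on underscores, digit runs, and case transitions."""
    tokens = []
    for part in re.split(r"[_\W]+", identifier):  # underscores and other separators
        for chunk in re.findall(r"\d+|\D+", part):  # isolate digit runs
            if chunk.isdigit():
                tokens.append(chunk)
                continue
            # lowercase -> uppercase boundary: userId -> user Id
            chunk = re.sub(r"([a-z])([A-Z])", r"\1 \2", chunk)
            # uppercase run followed by a capitalized word: SSLCertificate -> SSL Certificate
            # (the same rule produces the MINstring -> MI Nstring error)
            chunk = re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", chunk)
            tokens.extend(chunk.split())
    return tokens


if __name__ == "__main__":
    for name in ["userId", "setGID", "print_file2device", "SSLCertificate",
                 "MINstring", "USERID", "currentsize", "DEFMASKBit"]:
        print(name, "->", " ".join(camel_case_split(name)))
```

Same-case identifiers such as currentsize come back unsplit, because nothing in the string marks a word boundary.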
  • 13. Identifier Splitting Algorithms (same CamelCase table as slide 11)
  • 14. Identifier Splitting Algorithms (original identifier | CamelCase | Samurai):
      userId | user Id | user Id
      setGID | set GID | set GID
      print_file2device | print file 2 device | print file 2 device
      SSLCertificate | SSL Certificate | SSL Certificate
      MINstring | MI Nstring | MIN string
      USERID | USERID | USER ID
      currentsize | currentsize | current size
      readadapterobject | readadapterobject | read adapter object
      tolocale | tolocale | tol ocal e
      imitating | imitating | imi ta ting
      DEFMASKBit | DEFMASK Bit | DEF MASK Bit
  • 15. Identifier Splitting Algorithms (same table). Samurai splits some cases where CamelCase cannot (e.g., MINstring, USERID, currentsize, readadapterobject). It also oversplits others (tolocale → tol ocal e, imitating → imi ta ting).
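Same-case identifiers such as currentsize can only be split with some word knowledge. Samurai uses tables of identifier frequencies. The sketch below substitutes a simpler technique: unigram segmentation by dynamic programming over log-probabilities. The frequency table is entirely hypothetical. The sketch only shows how frequency information lets a splitter handle cases that CamelCase cannot. It is not Samurai's scoring function.

```python
import math
from functools import lru_cache

# Hypothetical token frequencies; Samurai mines its tables from real source code.
FREQUENCIES = {"read": 50, "adapter": 10, "object": 40, "current": 30,
               "size": 35, "to": 60, "locale": 5}


def split_same_case(token, freq=FREQUENCIES):
    """Segment a same-case token into known words, maximizing summed log-probability."""
    total = sum(freq.values())
    word = token.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Best (score, words) for word[i:], or None when no segmentation exists.
        if i == len(word):
            return 0.0, ()
        result = None
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            rest = best(j) if piece in freq else None
            if rest is not None:
                score = math.log(freq[piece] / total) + rest[0]
                if result is None or score > result[0]:
                    result = (score, (piece,) + rest[1])
        return result

    found = best(0)
    return list(found[1]) if found else [token]  # unknown tokens are left whole
```

Each extra piece adds a negative log-probability, so this scoring favours fewer, more frequent words. A table containing short, very frequent fragments can still push it toward oversplitting.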
  • 16. Cumulative number of feature location papers based on textual information
  • 17. Cumulative number of feature location papers based on textual information. Existing feature location techniques use CamelCase splitting.
  • 18. Information Retrieval FLT. Steps: generate corpus; preprocessing (remove non-literals, remove stop words, split identifiers, stemming); indexing (term-by-document matrix, Singular Value Decomposition); user formulates query; generate results (ranked list). Example method from the corpus:
        synchronized void print(TestResult result, long runTime) throws IOException {
            printHeader(runTime);
            printErrors(result);
            printFailures(result);
            printFooter(result);
        }
    After removing non-literals: synchronized void print TestResult result long runTime throws IOException printHeader runTime printErrors result printFailures result printFooter result
    After removing stop words: print TestResult result runTime IOException printHeader runTime printErrors result printFailures result printFooter result
  • 19. Information Retrieval FLT (same steps). After splitting identifiers: print Test Result result run Time IO Exception print Header run Time print Errors result print Failures result print Footer result. After stemming: print test result result run time io exception print head run time print error result print fail result print foot result. Indexing builds a term-by-document matrix with one row per method and one column per term (print, test, result, ...): row m1 reads 5, 1, 3, ...; row m2 ...
  • 20. Information Retrieval FLT (same steps), highlighting the term-by-document matrix: row m1 = print 5, test 1, result 3, ...; row m2 = ...
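The preprocessing and indexing steps from slides 18 and 19 can be sketched as follows, under several assumptions:

- The stop list is a hypothetical set of Java keywords.
- The splitter uses only CamelCase rules.
- `toy_stem` is a crude suffix stripper, not the Porter stemmer. It is chosen because it reproduces the stems shown on slide 19.

LSI would then apply Singular Value Decomposition to the resulting matrix (for example with `numpy.linalg.svd`). That step is omitted here.

```python
import re
from collections import Counter

# Hypothetical stop list: a few Java keywords (a real list would be much larger).
STOP_WORDS = {"synchronized", "void", "long", "throws", "public", "private", "return"}


def split_identifier(identifier):
    """CamelCase-only split, e.g. IOException -> IO Exception."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", identifier)
    spaced = re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", spaced)
    return [t for t in re.split(r"[\s_\d]+", spaced) if t]


def toy_stem(term):
    """Crude suffix stripping (NOT the Porter stemmer)."""
    for suffix in ("ures", "ers", "er", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def preprocess(source):
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)    # drop punctuation/operators
    identifiers = [t for t in identifiers if t not in STOP_WORDS]  # drop stop words
    terms = [t for ident in identifiers for t in split_identifier(ident)]
    return [toy_stem(t.lower()) for t in terms]


def term_by_document(methods):
    """Map each method name to its term counts (one matrix row per method)."""
    return {name: Counter(preprocess(body)) for name, body in methods.items()}


M1 = """synchronized void print(TestResult result, long runTime) throws IOException {
    printHeader(runTime);
    printErrors(result);
    printFailures(result);
    printFooter(result);
}"""
```

For `M1`, `preprocess` returns the same stemmed sequence shown on slide 19 (print test result result run time io exception print head ...).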
  • 21. IR and Dynamic Information FLT. The same IR steps apply: generate corpus, preprocessing, indexing (term-by-document matrix, Singular Value Decomposition), and the user's query. In addition, an execution trace is collected, and the result is a ranked list of executed methods only.
  • 22. Research Goal: evaluate how advanced splitting techniques impact the performance of feature location techniques.
  • 23. Information Retrieval FLT (same steps as slide 18). In the "split identifiers" step, CamelCase is replaced with Samurai or with a "perfect" splitting algorithm (the Oracle). The slide marks a scale from worst to better.
  • 24. Building the Oracle: extract all identifiers.
  • 25. Building the Oracle: for each identifier, check whether CamelCase, Samurai, and TIDIER produce the same split. If YES, the identifier joins the concordant split identifiers.
  • 26. Building the Oracle: concordant splits are assumed to be correct, and a sample was verified manually. This assumption is a threat to validity.
  • 27. Building the Oracle: if the splits differ (NO), the identifier is discordant and is split manually, yielding the manually split identifiers.
  • 28. Building the Oracle: manual splits were reached by consensus between the authors, with the source code checked.
  • 29. Building the Oracle: identifiers that could not be split (e.g., DT, i3, P754, zzz) were left unchanged.
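The concordance step of the Oracle construction can be sketched as a simple partition. The function and data layout are illustrative assumptions. The sample data uses only the CamelCase and Samurai outputs from slide 14, because TIDIER's outputs are not shown on the slides.

```python
def partition_by_concordance(splits):
    """Separate identifiers on which all splitters agree from those they dispute.

    `splits` maps identifier -> {splitter name: list of tokens}.
    """
    concordant, discordant = {}, []
    for identifier, by_splitter in splits.items():
        outputs = {tuple(tokens) for tokens in by_splitter.values()}
        if len(outputs) == 1:
            concordant[identifier] = list(next(iter(outputs)))  # assumed correct
        else:
            discordant.append(identifier)  # sent to manual splitting
    return concordant, discordant


# CamelCase and Samurai outputs from the slide-14 table (TIDIER omitted).
SPLITS = {
    "userId": {"CamelCase": ["user", "Id"], "Samurai": ["user", "Id"]},
    "SSLCertificate": {"CamelCase": ["SSL", "Certificate"], "Samurai": ["SSL", "Certificate"]},
    "MINstring": {"CamelCase": ["MI", "Nstring"], "Samurai": ["MIN", "string"]},
    "tolocale": {"CamelCase": ["tolocale"], "Samurai": ["tol", "ocal", "e"]},
}
```

With this data, userId and SSLCertificate are concordant, while MINstring and tolocale go to manual splitting.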
  • 30. Design of the Case Study
  • 31. Design of the Case Study. RQ: Does an FLT with an advanced splitting algorithm produce better results than the same FLT using the CamelCase splitting algorithm?
  • 32. How to Compare two FLTs?
  • 33. How to Compare two FLTs? The effectiveness measure, computed for each feature, is the position of the first gold-set method in the ranked list. IR results (method, LSI score): M121 0.92, M64 0.89, M15 0.86, M39 0.80, M7 0.74, M152 0.65, M234 0.56, M12 0.54, M78 0.52. The gold-set method appears at position 5, so effectiveness = 5.
  • 34. How to Compare two FLTs? IRDyn keeps only the methods executed in the trace: M15 0.86, M7 0.74, M234 0.56, M12 0.54. The gold-set method now appears at position 2, so effectiveness = 2.
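The effectiveness measure from slides 33 and 34 can be sketched as follows. The sketch assumes that lower values are better and that the gold-set method in the example is M7, the only choice consistent with both effectiveness values shown. For IRDyn, methods absent from the execution trace are skipped before counting.

```python
def effectiveness(ranked, gold, executed=None):
    """1-based rank of the first gold-set method; None if none is ranked.

    If `executed` is given (IRDyn), methods not in the trace are skipped.
    """
    position = 0
    for method in ranked:
        if executed is not None and method not in executed:
            continue  # filtered out by the execution trace
        position += 1
        if method in gold:
            return position
    return None


RANKED = ["M121", "M64", "M15", "M39", "M7", "M152", "M234", "M12", "M78"]
GOLD = {"M7"}                                  # assumed gold-set method
EXECUTED = {"M15", "M7", "M234", "M12"}        # methods seen in the trace
```

`effectiveness(RANKED, GOLD)` returns 5 (IR). `effectiveness(RANKED, GOLD, EXECUTED)` returns 2 (IRDyn).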
  • 35. Which FLTs are we Comparing?
  • 36. Software Systems: Rhino 1.6R5 (138 classes, 1,870 methods, 32K LOC); Eaddy et al.'s data*; 2 datasets:
      Dataset | Size | Queries | Gold sets | Execution information
      RhinoFeatures | 241 | Sections of ECMAScript documentation | Eaddy et al.* | Full execution traces (from unit tests)
      RhinoBugs | 143 | Bug title and description | Eaddy et al.* (CVS) | N/A
    * http://www.cs.columbia.edu/~eaddy/concerntagger/
  • 37. Software Systems: jEdit 4.3 (483 classes, 6.4K methods, 109K LOC); 2 datasets:
      Dataset | Size | Queries | Gold sets | Execution information
      jEditFeatures | 64 | Feature (or patch) title and description | SVN | Marked execution traces
      jEditBugs | 86 | Bug title and description | SVN | Marked execution traces
    Datasets available at: http://www.cs.wm.edu/semeru/data/icpc11-identifier-splitting/
  • 38. Generating the jEdit Datasets: SVN commits between v4.2 and v4.3
  • 39. Generating the jEdit Datasets: from each SVN commit message, title + description = query
  • 40. Generating the jEdit Datasets: the changed files in the previous and current versions are compared using the Eclipse AST, and the modified methods form the gold set.
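The gold-set extraction on slide 40 can be sketched, under the following assumptions, as a comparison of per-method bodies between the two versions:

- A hypothetical upstream extractor has already produced maps from method signature to body. The study itself used the Eclipse AST.
- Whitespace normalization stands in, crudely, for AST comparison.
- Only methods present in both versions are reported, since the slide does not say whether added or deleted methods were included.

The method names below are invented.

```python
def normalize(body):
    """Collapse whitespace so formatting-only edits do not count as changes."""
    return " ".join(body.split())


def modified_methods(previous, current):
    """Signatures present in both versions whose normalized bodies differ."""
    return sorted(sig for sig in previous.keys() & current.keys()
                  if normalize(previous[sig]) != normalize(current[sig]))


PREVIOUS = {
    "A.insert(int,String)": "{ a();\n}",
    "A.remove(int,int)": "{ b(); }",
    "A.clear()": "{ }",                    # deleted later, not reported
}
CURRENT = {
    "A.insert(int,String)": "{  a(); }",   # formatting change only
    "A.remove(int,int)": "{ b(); c(); }",  # body changed
    "A.undo()": "{ }",                     # added method, not reported
}
```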
  • 42. Presenting the Results: box plot of all effectiveness measures in each dataset (e.g., 241 data points for RhinoFeatures), marking the average and the median
  • 44. IR FLTs: similar median and average (RhinoFeatures)
  • 45. IR FLTs: similar median and average (RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs)
  • 46. IR FLTs: similar median and average. Datasets with features have better results than datasets with bugs.
  • 47. IRDyn FLTs: similar median and average (RhinoFeatures, jEditFeatures, jEditBugs; N/A for RhinoBugs)
  • 48. IRDyn FLTs: similar median and average. Datasets with features have better results than datasets with bugs (N/A for RhinoBugs).
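  • 49. Compare FLTs by Percentages: effectiveness of IROracle vs. IRCamelCase (baseline) for eight features:
      IROracle | IRCamelCase (baseline)
      10 | 17
      20 | 15
      18 | 18
      5 | 9
      4 | 16
      19 | 7
      12 | 28
      14 | 15
  • 50. Compare FLTs by Percentages: since lower effectiveness is better, IROracle is better in 5/8 cases and the baseline is better in 2/8 (the remaining case, 18 vs. 18, is a tie).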
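The percentage comparison on slides 49 and 50 can be sketched as a per-feature tally using the eight value pairs from slide 49. For each feature, it checks whether the candidate technique ranks the first gold-set method better (lower) or worse than the baseline. The function name is illustrative.

```python
def compare_by_percentages(candidate, baseline):
    """Return (better, worse, ties, n): how often candidate beats the baseline.

    Lower effectiveness (an earlier rank) is better.
    """
    pairs = list(zip(candidate, baseline))
    better = sum(c < b for c, b in pairs)
    worse = sum(c > b for c, b in pairs)
    return better, worse, len(pairs) - better - worse, len(pairs)


ORACLE = [10, 20, 18, 5, 4, 19, 12, 14]
CAMEL_CASE = [17, 15, 18, 9, 16, 7, 28, 15]
```

`compare_by_percentages(ORACLE, CAMEL_CASE)` returns `(5, 2, 1, 8)`, matching the 5/8 and 2/8 on slide 50.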
  • 51. IR: comparison by percentages per dataset (RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs)
  • 52. IR: datasets with features (RhinoFeatures, jEditFeatures) vs. datasets with bugs (RhinoBugs, jEditBugs)
  • 53. IRDyn: similar trend (RhinoFeatures, jEditFeatures, jEditBugs; N/A for RhinoBugs)
  • 54. Statistical Results: Wilcoxon signed-rank test. Null hypothesis: there is no statistically significant difference in effectiveness between IRSamurai/IROracle and IRCamelCase. Alternative hypothesis: IRSamurai/IROracle has statistically significantly better (i.e., lower-rank) effectiveness than IRCamelCase. alpha = 0.05
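For small samples, the one-sided Wilcoxon signed-rank test can be computed exactly by enumerating sign patterns. The sketch below drops zero differences and gives tied magnitudes average ranks. It is an illustration only, not the authors' analysis script. For hundreds of data points, a library routine such as `scipy.stats.wilcoxon` with `alternative="less"` would be the practical choice. Here it is applied to the eight illustrative pairs from slide 49, not to the study's full data.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_less(candidate, baseline):
    """Exact one-sided test; H1: candidate values tend to be lower. Returns (W+, p)."""
    diffs = [c - b for c, b in zip(candidate, baseline) if c != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 every sign pattern is equally likely: enumerate all 2**n of them.
    hits = 0
    for signs in product((False, True), repeat=len(ranks)):
        if sum(r for r, positive in zip(ranks, signs) if positive) <= w_plus + 1e-9:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


ORACLE = [10, 20, 18, 5, 4, 19, 12, 14]
CAMEL_CASE = [17, 15, 18, 9, 16, 7, 28, 15]
```

For these eight pairs the sketch gives W+ = 8.5 and an exact one-sided p of 26/128 ≈ 0.20. On its own, this small sample would therefore not reject the null hypothesis at alpha = 0.05. Exhaustive enumeration is only feasible for small n (roughly 20 or fewer non-zero differences).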
  • 55. IR (RhinoFeatures, RhinoBugs, jEditFeatures, jEditBugs): the only statistically significant result (p = 0.05) is on a Rhino dataset.
  • 56. Qualitative Results: vocabulary mismatch between queries and code. Queries contain names of developers (e.g., Slava, Carlos) and terms specific to communication (e.g., thanks, greetings, annoying).
  • 57. Qualitative Results: features are more "descriptive" than bugs.
  • 58. Qualitative Results: features are more "descriptive" than bugs; for example, the words "join" and "line" are not mentioned.
  • 59. Threats to Validity. External: 2 Java applications (different domains); more systems are needed. Construct: errors may be present in the Oracle and gold sets; we used data produced by other researchers. Internal: subjectivity and bias in building the Oracle. Conclusion: non-parametric test (Wilcoxon signed-rank).
  • 60. Research Questions:
    RQ1: Does IRSamurai outperform IRCamelCase in terms of effectiveness? NO
    RQ2: Does IRSamuraiDyn outperform IRCamelCaseDyn in terms of effectiveness? NO
    RQ3: Does IROracle outperform IRCamelCase in terms of effectiveness? In some cases (Rhino)
    RQ4: Does IROracleDyn outperform IRCamelCaseDyn in terms of effectiveness? NO
  • 61. Future Work: more systems and datasets; different maintenance tasks (e.g., traceability link recovery); other splitting algorithms.
  • 62. Conclusions: advanced splitting techniques could improve FLTs, and we found some empirical evidence for this. Splitting has more impact on the IR FLT. If execution information is available, it is not necessary to use an advanced splitting technique.
  • 63. Thank you! Questions? SEMERU @ William and Mary, http://www.cs.wm.edu/semeru/, bdit@cs.wm.edu
  • 64. References
    Takang et al. (1996): Takang, A., Grubb, P., and Macredie, R., "The Effects of Comments and Identifier Names on Program Comprehensibility: An Experimental Investigation", Journal of Programming Languages, vol. 4, no. 3, 1996, pp. 143-167.
    Lawrie et al. (2006): Lawrie, D., Morrell, C., Feild, H., and Binkley, D., "What's in a Name? A Study of Identifiers", in Proc. of IEEE ICPC'06, June 14-16, 2006, pp. 3-12.
    Binkley et al. (2009): Binkley, D., Davis, M., Lawrie, D., and Morrell, C., "To CamelCase or Under_score", in Proc. of IEEE ICPC'09, May 17-19, 2009, pp. 158-167.
    Enslen et al. (2009): Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K., "Mining Source Code to Automatically Split Identifiers for Software Analysis", in Proc. of IEEE MSR'09, May 16-17, 2009, pp. 71-80.
    Guerrouj et al. (2011): Guerrouj, L., Di Penta, M., Antoniol, G., and Guéhéneuc, Y.-G., "TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques", JSME, vol. to appear, 2011.