Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)

Jennifer Bevan, Sunghun Kim, E. James Whitehead Jr.
University of California, Santa Cruz
{jbevan, hunkim, ejw}@cs.ucsc.edu

Lijie Zou, Mike Godfrey
University of Waterloo
{lzou, migod}@uwaterloo.edu
Motivation

• Static analysis-based software evolution research has several common technical issues to manage:
  – Extracting meaningful configurations from an SCM repository.
  – Calculating static relations and metrics.
    – These augment the data from commit log messages.
  – Saving the extracted facts.
    – For later time-based analysis, data mining, and incremental data addition.
Ongoing Static Evolution Research

• Instability Analysis (J. Bevan)
  – Refines the work of Zimmermann and of Ying/Murphy by using static dependence to remove temporal dependencies.
• Entity Mapping/Origin Analysis (L. Zou, M. Godfrey)
  – Uses static metrics to identify moved/split/merged procedures and files.
• Code clone evolution (M. Kim)
  – Identifies clones and follows their evolution.
More Static Evolution Research

• Association rule mining
  – For predicting changes [Ying et al., IEEE TSE, v30 n9, Sept. 2004]
  – For architectural justification [Zimmermann, Diehl, and Zeller, Proc. IWPSE 2003]
• Identifying code “chunks” for future modularization [Mockus and Weiss, IEEE Software, v18 n2, 2001]
• “Feature” identification [Fischer, Pinzger, and Gall, Proc. WCRE 2003]

• …and the ongoing research related to these.
Problem

• Despite their similar approaches, these systems make several choices that limit the sharing of technology and results:
  – Usually choosing a single SCM system (CVS) as the data source.
  – Usually creating a proprietary database schema.
  – Usually not easily integrated with other research projects for sharing results.
• The cost of computationally expensive analysis techniques is not amortized across multiple research directions.
Solution: Kenyon

• Kenyon is designed to facilitate static software evolution research by providing common solutions to these common problems:
  – Phase 1: Automatic configuration extraction from SCM
  – Phase 2: Invoking static analysis tool(s)
  – Phase 3: Storing the results from these preprocessing steps
  – Providing asynchronous access to previously processed and stored data
Kenyon Processing

[Figure: Kenyon processing pipeline. SCM Repository → Phase 1: Configuration Extraction → Filesystem → Phases 2 & 3: Fact Extraction (Static Analysis) and Persist Gathered Facts → Kenyon Repository (RDBMS/Hibernate). Client tools perform queries and add new facts; example client software: IVA.]
Phase 1: Extract Configurations

• Kenyon provides transaction recovery and logical configuration extraction for multiple SCM systems.
  – Configurations are specified by time + branch identifier.
  – A sliding window algorithm is used for transaction recovery.
  – Only changes from completed transactions are extracted for a “logical configuration”.
  – Only changes from transactions that completed between two specifications are considered for a “configuration delta”.
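The sliding-window idea can be sketched as follows for a file-based SCM such as CVS, where each file is versioned separately and transactions must be reconstructed. This is a minimal illustration under stated assumptions, not Kenyon's implementation: it assumes each per-file revision carries an author, a log message, and a commit time, and treats revisions with the same author and message as one transaction as long as each falls within a fixed window of that transaction's latest revision. All class, record, and method names, and the 200-second window in the example, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of sliding-window transaction recovery for a
// file-based SCM such as CVS; not Kenyon's actual implementation.
public class TransactionRecovery {

    // One per-file revision as reported by the SCM log (time in seconds).
    record Revision(String file, String author, String message, long time) {}

    // A recovered transaction: revisions sharing an author and log message.
    record Transaction(String author, String message, List<Revision> revisions) {
        long endTime() {
            return revisions.get(revisions.size() - 1).time();
        }
    }

    // A revision joins an existing transaction only if author and message match
    // and it falls within windowSeconds of that transaction's latest revision,
    // so the window slides forward each time a revision is added.
    static List<Transaction> recover(List<Revision> log, long windowSeconds) {
        List<Revision> sorted = new ArrayList<>(log);
        sorted.sort(Comparator.comparingLong(Revision::time));
        List<Transaction> transactions = new ArrayList<>();
        for (Revision r : sorted) {
            Transaction match = null;
            for (Transaction t : transactions) {
                if (t.author().equals(r.author())
                        && t.message().equals(r.message())
                        && r.time() - t.endTime() <= windowSeconds) {
                    match = t;
                    break;
                }
            }
            if (match == null) {
                match = new Transaction(r.author(), r.message(), new ArrayList<>());
                transactions.add(match);
            }
            match.revisions().add(r);
        }
        // Order transactions by completion time (time of their last revision).
        transactions.sort(Comparator.comparingLong(Transaction::endTime));
        return transactions;
    }

    public static void main(String[] args) {
        List<Revision> log = List.of(
                new Revision("a.c", "alice", "fix parser", 0),
                new Revision("b.c", "alice", "fix parser", 120),
                new Revision("c.c", "bob", "add option", 150),
                new Revision("d.c", "alice", "fix parser", 900));
        for (Transaction t : recover(log, 200)) {
            System.out.println(t.author() + " @" + t.endTime() + ": " + t.revisions().size() + " file(s)");
        }
    }
}
```

With a 200-second window, the example recovers three transactions: alice's first two revisions form one, bob's forms another, and alice's revision at 900 s starts a new one because it falls outside the window even though the message matches.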
Configuration Specification

• Kenyon’s logical configuration extraction and delta calculations allow researchers to consider software “as it existed at time T on branch B”.
  – Most SCM systems archive data along a timeline, with varying support for parallel development.
  – Kenyon uses this commonality as the basis for its SCM interface and configuration specification.
  – Nothing so far indicates that change-set-based SCM systems cannot be supported by Kenyon.
Logical Configuration

• At any given point in time, one or more transactions may have just completed, and one or more may be ongoing.
• Ongoing transactions are shown in red.
• Completed transactions are shown in green.
[Figure: timeline cut at time T1 across transactions that change files F1 to F4.]
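As a sketch of how a logical configuration at time T could be assembled once transactions are recovered: only transactions that had completed by T contribute, so a transaction still in progress at T is ignored entirely, even if some of its files were already committed. The record types, the `at` method, and the revision strings are hypothetical, and file deletions are not modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: building a logical configuration "as of time t"
// from recovered transactions. File deletions are not modeled.
public class LogicalConfiguration {

    record Change(String file, String revision) {}

    // A recovered transaction and the time at which it completed.
    record Transaction(long endTime, List<Change> changes) {}

    // Maps each file to its newest revision among transactions that had
    // completed by time t. A transaction still ongoing at t (endTime > t)
    // contributes nothing, even if some of its files were committed before t.
    static Map<String, String> at(List<Transaction> history, long t) {
        Map<String, String> config = new TreeMap<>();
        history.stream()
                .filter(tx -> tx.endTime() <= t)
                .sorted(Comparator.comparingLong(Transaction::endTime))
                .forEach(tx -> tx.changes().forEach(c -> config.put(c.file(), c.revision())));
        return config;
    }

    public static void main(String[] args) {
        List<Transaction> history = List.of(
                new Transaction(10, List.of(new Change("F1", "1.1"), new Change("F2", "1.1"))),
                new Transaction(20, List.of(new Change("F1", "1.2"))),
                new Transaction(30, List.of(new Change("F3", "1.1"))));
        System.out.println(at(history, 25)); // prints {F1=1.2, F2=1.1}
    }
}
```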
Configuration Deltas

• Configuration deltas are calculated as C(T2) − C(T1).
• Only changes from transactions completing between T1 (exclusive) and T2 (inclusive) are considered.
[Figure: timeline with cut points T1 and T2 across transactions that change files F1 to F4.]
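The interval rule above (exclusive at T1, inclusive at T2) can be sketched directly as a filter on transaction completion times. The types and the `delta` method are hypothetical; the sketch reports each changed file with the newest revision produced inside the interval and does not model deletions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a configuration delta C(T2) - C(T1).
public class ConfigurationDelta {

    record Change(String file, String revision) {}

    record Transaction(long endTime, List<Change> changes) {}

    // Files changed by transactions that completed in the half-open interval
    // (t1, t2]: exclusive at t1, inclusive at t2. Each file maps to the newest
    // revision produced inside that interval.
    static Map<String, String> delta(List<Transaction> history, long t1, long t2) {
        if (t2 < t1) throw new IllegalArgumentException("t2 must not precede t1");
        Map<String, String> changed = new TreeMap<>();
        history.stream()
                .filter(tx -> tx.endTime() > t1 && tx.endTime() <= t2)
                .sorted(Comparator.comparingLong(Transaction::endTime))
                .forEach(tx -> tx.changes().forEach(c -> changed.put(c.file(), c.revision())));
        return changed;
    }

    public static void main(String[] args) {
        List<Transaction> history = List.of(
                new Transaction(10, List.of(new Change("F1", "1.1"), new Change("F2", "1.1"))),
                new Transaction(20, List.of(new Change("F1", "1.2"))),
                new Transaction(30, List.of(new Change("F3", "1.1"), new Change("F4", "1.4"))));
        // The transaction completing exactly at T1 = 10 is excluded; T2 = 30 is included.
        System.out.println(delta(history, 10, 30)); // prints {F1=1.2, F3=1.1, F4=1.4}
    }
}
```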
Data from Phase 1

• Kenyon creates valid configuration specifications for extraction, one per timestamp at which a transaction completed.
• For each configuration extracted:
  – Author and log message of each transaction completing at that specification.
  – The configuration is placed on the filesystem.
• A configuration delta for each consecutive pair of processed configurations can also be stored.
Phase 2: Invoke Fact Extractors

• Kenyon provides an abstract class that is used to invoke third-party fact extractors on the configuration extracted to the filesystem.
  – Kenyon users subclass this class to invoke their own fact extractor.
  – Support for CodeSurfer (line-level analysis) and SWAGKIT (procedure-level analysis) is provided with Kenyon. [www.grammatech.com, swag.uwaterloo.ca]
  – FactExtractor subclasses have a tri-modal return status: “failure”, “new data to store”, or “no new data to store”.
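A minimal sketch of the subclassing pattern and the tri-modal status described above. The names `ExtractionStatus`, `extract`, `facts`, and `LineCountExtractor` are hypothetical, chosen for illustration; Kenyon's actual FactExtractor API may differ. The toy subclass just counts lines in `.java` files instead of invoking CodeSurfer or SWAGKIT, and reports "no new data" when its results match the previous run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch of the FactExtractor subclassing pattern; the names
// below are illustrative, not Kenyon's actual API.
public class FactExtractorSketch {

    // The three outcomes a fact extractor can report.
    enum ExtractionStatus { FAILURE, NEW_DATA, NO_NEW_DATA }

    // Base class: subclasses analyze a configuration that Phase 1
    // has already written to the filesystem.
    abstract static class FactExtractor {
        protected final Map<String, Long> facts = new HashMap<>();

        abstract ExtractionStatus extract(Path configurationRoot);

        Map<String, Long> facts() {
            return facts;
        }
    }

    // Toy extractor standing in for CodeSurfer/SWAGKIT: line counts of *.java files.
    static class LineCountExtractor extends FactExtractor {
        @Override
        ExtractionStatus extract(Path root) {
            Map<String, Long> counts = new HashMap<>();
            try (Stream<Path> walk = Files.walk(root)) {
                for (Path p : walk.filter(f -> f.toString().endsWith(".java")).toList()) {
                    counts.put(root.relativize(p).toString(), (long) Files.readAllLines(p).size());
                }
            } catch (IOException e) {
                return ExtractionStatus.FAILURE; // e.g. the configuration directory is missing
            }
            if (counts.equals(facts)) {
                return ExtractionStatus.NO_NEW_DATA; // same facts as the previous run
            }
            facts.clear();
            facts.putAll(counts);
            return ExtractionStatus.NEW_DATA;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("config");
        Files.writeString(dir.resolve("Main.java"), "class Main {\n}\n");
        FactExtractor extractor = new LineCountExtractor();
        System.out.println(extractor.extract(dir));                    // NEW_DATA
        System.out.println(extractor.extract(dir));                    // NO_NEW_DATA
        System.out.println(extractor.extract(dir.resolve("missing"))); // FAILURE
    }
}
```

Separating "no new data" from "failure" lets a caller skip the storage step without treating an unchanged configuration as an error.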
Data from Phase 2

• FactExtractor subclasses provide:
  – A ConfigGraph that maps software elements to nodes and static relationships to edges.
  – The graph, any node, and any edge may be attributed with static metrics.
• Multiple fact extractors may be invoked on a single configuration: each ConfigGraph created is saved with a reference to the fact extractor that created it.
• If a configuration has already been processed by a given fact extractor, it will not be processed again unless new metrics are to be calculated.
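A minimal attributed graph in the spirit of ConfigGraph: software elements become nodes, static relationships become edges, and the graph, each node, and each edge carry a metric map. The class and method names are hypothetical, since the slides do not show Kenyon's ConfigGraph API; the `extractorName` field mirrors the point that each graph records which fact extractor created it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an attributed configuration graph; class and
// method names are illustrative, not Kenyon's ConfigGraph API.
public class ConfigGraphSketch {

    // Anything that can carry static metrics: the graph, a node, or an edge.
    static class Attributed {
        final Map<String, Double> metrics = new HashMap<>();
    }

    // A software element such as a file or procedure.
    static class Node extends Attributed {
        final String element;
        Node(String element) { this.element = element; }
    }

    // A static relationship between two elements, e.g. "calls".
    static class Edge extends Attributed {
        final Node from;
        final Node to;
        final String relation;
        Edge(Node from, String relation, Node to) {
            this.from = from;
            this.relation = relation;
            this.to = to;
        }
    }

    static class ConfigGraph extends Attributed {
        final String extractorName; // which fact extractor produced this graph
        final Map<String, Node> nodes = new LinkedHashMap<>();
        final List<Edge> edges = new ArrayList<>();

        ConfigGraph(String extractorName) { this.extractorName = extractorName; }

        // Returns the node for an element, creating it on first use.
        Node node(String element) {
            return nodes.computeIfAbsent(element, Node::new);
        }

        Edge relate(String from, String relation, String to) {
            Edge edge = new Edge(node(from), relation, node(to));
            edges.add(edge);
            return edge;
        }

        // Number of outgoing edges: a simple metric derived from the graph.
        long fanOut(String element) {
            return edges.stream().filter(e -> e.from.element.equals(element)).count();
        }
    }

    public static void main(String[] args) {
        ConfigGraph g = new ConfigGraph("ToyCallGraphExtractor");
        g.relate("main", "calls", "parse");
        g.relate("main", "calls", "report");
        g.node("main").metrics.put("fanOut", (double) g.fanOut("main"));
        g.metrics.put("nodeCount", (double) g.nodes.size());
        System.out.println(g.node("main").metrics + " " + g.metrics); // prints {fanOut=2.0} {nodeCount=3.0}
    }
}
```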
Phase 3: Data Storage

• Kenyon uses Hibernate to persist data classes.
  – Hibernate is an “object/relational persistence and query service for Java” [www.hibernate.org].
  – This allows research tools implemented in Java to reuse Kenyon classes.
  – Each configuration processed by Kenyon is assigned to a Project, the top-level data class persisted by Kenyon.
Persisted Kenyon Data

• Projects contain one set of data for each configuration specification processed.
• Each such data set contains one or more ConfigGraphs, each produced by a different FactExtractor.
• FactExtractors specify which GraphSchema subclass they use (not restrictive).
[Figure: class diagram of persisted data. A Project has N ConfigData; a ConfigData has N ConfigGraphs; each ConfigGraph has one FactExtractor, which uses one GraphSchema; ConfigData is linked to ConfigSpec; a ConfigDelta references two ConfigSpecs.]
Data Access

• Hibernate allows access to preprocessed data using SQL or the Hibernate query methods (HQL, QBE/QBC), which support class/field-based queries.
  – A Hibernate query returns a List of Objects, each of the type originally persisted.
  – Data fields in the returned objects are populated unless specified as lazily loaded.
• Kenyon provides several convenience queries for commonly anticipated questions, such as “what configurations are available for this project?”
Kenyon Usage

• Kenyon processes data based on specifications in a configuration file:
  – Start time, stop time, and how often to process
  – Fact extractors and their assigned metric calculators
  – SCM parameters, filesystem parameters, and some control over what Hibernate persists
• A “processing run” will reuse any previously processed data if available.
  – For example, if a ConfigGraph already exists and new metrics are required, those metrics are calculated and added to the existing ConfigGraph.
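The slides do not show Kenyon's configuration-file format. As a hypothetical illustration only, the sketch below reads invented keys (`start`, `stop`, `intervalMinutes`, `extractors`) with `java.util.Properties` and expands the start time, stop time, and processing frequency into the specification times a run would visit. The real key names and syntax are almost certainly different, and SCM and filesystem parameters are omitted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: the keys below are invented for illustration and are
// not Kenyon's actual configuration-file format.
public class RunConfigSketch {

    record RunConfig(Instant start, Instant stop, Duration interval, List<String> extractors) {

        // Specification times from start to stop (inclusive), one per interval.
        List<Instant> specificationTimes() {
            if (interval.isZero() || interval.isNegative()) {
                throw new IllegalStateException("interval must be positive");
            }
            List<Instant> times = new ArrayList<>();
            for (Instant t = start; !t.isAfter(stop); t = t.plus(interval)) {
                times.add(t);
            }
            return times;
        }
    }

    static RunConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new RunConfig(
                Instant.parse(p.getProperty("start")),
                Instant.parse(p.getProperty("stop")),
                Duration.ofMinutes(Long.parseLong(p.getProperty("intervalMinutes"))),
                Arrays.asList(p.getProperty("extractors").trim().split("\\s*,\\s*")));
    }

    public static void main(String[] args) {
        String text = """
                start=2005-01-01T00:00:00Z
                stop=2005-01-03T00:00:00Z
                intervalMinutes=1440
                extractors=CallGraphExtractor, LineCountExtractor
                """;
        RunConfig cfg = parse(text);
        // prints "3 specifications, extractors [CallGraphExtractor, LineCountExtractor]"
        System.out.println(cfg.specificationTimes().size() + " specifications, extractors " + cfg.extractors());
    }
}
```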
Iterative Refinement Example

• When looking for “interesting” timeframes of evolution, a multiple-pass process is recommended.
  – A user can configure Kenyon to process the changes in a system once per day.
  – Days with high activity, or with other metrics exceeding a threshold, can be flagged as “interesting”.
  – The user can then configure Kenyon to process those days (via multiple processing runs) at a frequency of “every 20 minutes”.
  – This process can repeat down to the “every second” level.
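The first refinement step can be sketched as: count completed transactions per day, flag days above a threshold, and generate 20-minute specification times for each flagged day. Using transaction counts as the activity measure, the threshold value, and all method names are assumptions for illustration; any metric gathered in the daily pass could drive the flagging instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the first refinement pass: find busy days, then
// generate finer-grained specification times for them.
public class RefinementSketch {

    // Number of completed transactions per (UTC) day.
    static Map<LocalDate, Integer> dailyActivity(List<Instant> completions) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (Instant t : completions) {
            counts.merge(LocalDate.ofInstant(t, ZoneOffset.UTC), 1, Integer::sum);
        }
        return counts;
    }

    // Days whose activity exceeds the threshold are flagged as "interesting".
    static List<LocalDate> interestingDays(Map<LocalDate, Integer> activity, int threshold) {
        List<LocalDate> days = new ArrayList<>();
        activity.forEach((day, count) -> {
            if (count > threshold) days.add(day);
        });
        return days;
    }

    // Specification times covering one day at the given step, e.g. 20 minutes.
    static List<Instant> refine(LocalDate day, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        Instant start = day.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant end = start.plus(Duration.ofDays(1));
        List<Instant> times = new ArrayList<>();
        for (Instant t = start; t.isBefore(end); t = t.plus(step)) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        List<Instant> completions = List.of(
                Instant.parse("2005-03-01T09:00:00Z"),
                Instant.parse("2005-03-01T10:30:00Z"),
                Instant.parse("2005-03-01T16:45:00Z"),
                Instant.parse("2005-03-02T11:00:00Z"));
        List<LocalDate> busy = interestingDays(dailyActivity(completions), 2);
        // prints "[2005-03-01] -> 72 specifications"
        System.out.println(busy + " -> " + refine(busy.get(0), Duration.ofMinutes(20)).size() + " specifications");
    }
}
```

Repeating `refine` with a one-second step would yield 86,400 specifications per flagged day, which is why narrowing the window in earlier passes matters.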
Parallel Preprocessing

• Kenyon is a single-threaded process, but Hibernate supports multiple connections to a single Kenyon database.
• A 10-year history can therefore be processed in chunks by any number of computers, even if the processing configurations have overlapping times or different intervals.
• Kenyon does not integrate the deltas between different processing runs, so a small overlap between processing chunks is suggested.
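One possible way to plan such chunks, with the suggested small overlap so that each run also covers a margin at its start: split the span into equal parts and start every chunk except the first slightly early. The equal-split scheme, the one-day overlap, and the names are assumptions for illustration; each chunk would become the start and stop times of a separate processing run.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splitting a long history into overlapping chunks,
// each of which could be handed to a separate Kenyon processing run.
public class ChunkPlanner {

    record Chunk(Instant start, Instant stop) {}

    // Splits the span from start to stop into 'parts' roughly equal chunks.
    // Every chunk except the first starts 'overlap' earlier than its nominal
    // boundary (but never before 'start'), so consecutive chunks share a margin.
    static List<Chunk> plan(Instant start, Instant stop, int parts, Duration overlap) {
        if (parts < 1 || !stop.isAfter(start)) {
            throw new IllegalArgumentException("need parts >= 1 and stop after start");
        }
        Duration size = Duration.between(start, stop).dividedBy(parts);
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            Instant nominalStart = start.plus(size.multipliedBy(i));
            Instant chunkStart = (i == 0) ? start : nominalStart.minus(overlap);
            if (chunkStart.isBefore(start)) chunkStart = start;
            Instant chunkStop = (i == parts - 1) ? stop : nominalStart.plus(size);
            chunks.add(new Chunk(chunkStart, chunkStop));
        }
        return chunks;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("1995-01-01T00:00:00Z");
        Instant stop = Instant.parse("2005-01-01T00:00:00Z");
        for (Chunk c : plan(start, stop, 4, Duration.ofDays(1))) {
            System.out.println(c.start() + " .. " + c.stop());
        }
    }
}
```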
Kenyon Architecture


[Figure: Kenyon architecture. Components: ConfigData, Project, ConfigGraph, MetricLoader, Fact Extractor, DataManager, SCMInterface, Hibernate/DBMS, Filesystem, and SCM Repository, connected by «calls» dependencies.]
Current Status

• Kenyon 1.2 is available at http://kenyon.dforge.cse.ucsc.edu
• Supports CVS, Subversion, and ClearCase
• Students in 290G are performing projects using Kenyon this quarter
• Actively working with Samsung to analyze some of their source code
Future Work (1/3)

• Continue working with M. Kim
  – Evaluate the usefulness of the SCM-only module.
  – If she decides to use Kenyon, assist with full integration.
• Finish integration of Beagle/Kenyon and IVA/Kenyon.
• Work with G. Murphy on using Kenyon at UBC.
• Evaluate Kenyon’s ability to reduce the time-to-results for static software evolution research by analyzing the seminar class projects.
Future Work (2/3)

• Support branch path traversal
  – Allow users to see the branch points in a system and specify a path for processing instead of a single branch.
  – Will reuse existing visualizations; a specification mechanism must be added.
• Incorporate full language-specific containment models for better inter-language graph traversal and mapping.
  – Use M. Godfrey’s Java fact extractor and containment model.
Future Work (3/3)

• Support more of the Standard Exchange Formats for ConfigGraph export.
  – TA is already supported, but only the Fact sections. Schema sections should be improved to use the language-specific containment models.
• Encourage other researchers to use Kenyon, and improve results sharing, capabilities, etc. based on their feedback.
Open Issues (1/3)

• The exact mechanism for sharing data between researchers is not entirely controllable by Kenyon.
  – Database setup and administration can effectively override many of Kenyon’s preferences.
  – By default, Kenyon-created tables are not mutable by processes other than Kenyon.
Open Issues (2/3)

• Kenyon provides a public class, EvolutionPath, that links a subgraph in one ConfigGraph to a subgraph in another ConfigGraph.
  – Directed and attributable.
  – The basic building block for evolution data.
• It is currently persisted by Kenyon, but will likely not be after 1.1, due to database mutability issues.
  – Other research projects can subclass it and, if they want to share their results easily, persist their subclasses to a Hibernate database using the provided Hibernate mapping examples.
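A hypothetical sketch of the idea behind EvolutionPath: a directed, attributable link from a subgraph of one ConfigGraph to a subgraph of another, which a research project then subclasses for its own kind of evolution data. Apart from the name EvolutionPath, all fields, constructors, and the `SplitPath` subclass are assumptions, and this is not Kenyon's actual class definition; subgraphs are simplified to sets of node names, and the example records an origin-analysis style procedure split.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an EvolutionPath-style link; fields and methods are
// illustrative and not Kenyon's actual class definition.
public class EvolutionPathSketch {

    // A directed, attributable link from a subgraph of one ConfigGraph
    // to a subgraph of another (here, subgraphs are sets of node names).
    static class EvolutionPath {
        final String fromGraph;
        final Set<String> fromNodes;
        final String toGraph;
        final Set<String> toNodes;
        final Map<String, String> attributes = new HashMap<>();

        EvolutionPath(String fromGraph, Set<String> fromNodes, String toGraph, Set<String> toNodes) {
            this.fromGraph = fromGraph;
            this.fromNodes = Set.copyOf(fromNodes);
            this.toGraph = toGraph;
            this.toNodes = Set.copyOf(toNodes);
        }
    }

    // A research project's own subclass, e.g. origin analysis recording a split.
    static class SplitPath extends EvolutionPath {
        SplitPath(String fromGraph, String original, String toGraph, Set<String> parts) {
            super(fromGraph, Set.of(original), toGraph, parts);
            attributes.put("kind", "split");
        }
    }

    public static void main(String[] args) {
        EvolutionPath p = new SplitPath("config@T1", "parseArgs", "config@T2",
                Set.of("parseFlags", "parseFiles"));
        // prints "split: [parseArgs] -> 2 procedures"
        System.out.println(p.attributes.get("kind") + ": " + p.fromNodes + " -> " + p.toNodes.size() + " procedures");
    }
}
```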
Open Issues (3/3)

• Kenyon can be invoked automatically via a post-commit script or a cron job.
• Should Kenyon also be invocable automatically from an IDE?
• What support should Kenyon provide for better integration with, for example, Eclipse?
Conclusions (1/2)

• Kenyon is an engineering solution, designed to amortize the cost of the computationally expensive preprocessing steps that can benefit static software evolution research.
• Research projects using Kenyon will not have to independently create solutions for these common problems.
  – 18% code reduction in Beagle without really trying.
  – Kenyon is expected to reduce the lag between beginning system implementation and producing research results.
Conclusions (2/2)

• Kenyon is not intended to be a lightweight data mining system for software evolution research.
  – The tradeoff of speed vs. precision is still controllable via the choice of fact extractors.
  – Configuration extraction time and the associated network lag already put the per-configuration time at O(seconds).
• Instead, it allows the cost of time-consuming, computationally expensive preprocessing to be amortized among researchers.
Questions?

• Kenyon was created primarily from code that existed in IVA, which is being funded by NSF grant CCR-01234603. Kenyon also contains code from Beagle, the origin analysis project overseen by Mike Godfrey.

• Email jbevan@cs.ucsc.edu with future questions.

   http://www.cse.ucsc.edu/research/labs/grase/kenyon/
1 of 31

Recommended

Introduction to Allmon (0.1.0) - a generic performance and availability monit... by
Introduction to Allmon (0.1.0) - a generic performance and availability monit...Introduction to Allmon (0.1.0) - a generic performance and availability monit...
Introduction to Allmon (0.1.0) - a generic performance and availability monit...Tomasz Sikora
1.8K views11 slides
3rd 3DDRESD: OSyRIS by
3rd 3DDRESD: OSyRIS3rd 3DDRESD: OSyRIS
3rd 3DDRESD: OSyRISMarco Santambrogio
380 views51 slides
3 design by
3 design3 design
3 designhanmya
254 views12 slides
MCSoC'13 Keynote Talk "Taming Big Data Streams" by
MCSoC'13 Keynote Talk "Taming Big Data Streams"MCSoC'13 Keynote Talk "Taming Big Data Streams"
MCSoC'13 Keynote Talk "Taming Big Data Streams"Hideyuki Kawashima
824 views63 slides
UIC Thesis Beretta by
UIC Thesis BerettaUIC Thesis Beretta
UIC Thesis BerettaMarco Santambrogio
571 views52 slides
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS by
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSVTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERS
VTU 5TH SEM CSE OPERATING SYSTEMS SOLVED PAPERSvtunotesbysree
20.8K views106 slides

More Related Content

Similar to Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)

Net framework session03 by
Net framework session03Net framework session03
Net framework session03Niit Care
495 views38 slides
Application scenarios in streaming oriented embedded-system design by
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system designMr. Chanuwan
187 views4 slides
ResumeJagannath by
ResumeJagannathResumeJagannath
ResumeJagannathJagannath Timma
205 views2 slides
Microx - A Unix like kernel for Embedded Systems written from scratch. by
Microx - A Unix like kernel for Embedded Systems written from scratch.Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.Waqar Sheikh
957 views20 slides
Pt2520 Unit 4.5 Assignment 1 by
Pt2520 Unit 4.5 Assignment 1Pt2520 Unit 4.5 Assignment 1
Pt2520 Unit 4.5 Assignment 1Kimberly High
2 views48 slides
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor by
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen HypervisorMatteo Ferroni
87 views30 slides

Similar to Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)(20)

Net framework session03 by Niit Care
Net framework session03Net framework session03
Net framework session03
Niit Care495 views
Application scenarios in streaming oriented embedded-system design by Mr. Chanuwan
Application scenarios in streaming oriented embedded-system designApplication scenarios in streaming oriented embedded-system design
Application scenarios in streaming oriented embedded-system design
Mr. Chanuwan187 views
Microx - A Unix like kernel for Embedded Systems written from scratch. by Waqar Sheikh
Microx - A Unix like kernel for Embedded Systems written from scratch.Microx - A Unix like kernel for Embedded Systems written from scratch.
Microx - A Unix like kernel for Embedded Systems written from scratch.
Waqar Sheikh957 views
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor by Matteo Ferroni
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
Matteo Ferroni87 views
Eclipse DemoCamp Bucharest 2014 - Continuous Integration Jenkins/Hudson by VladLica
Eclipse DemoCamp Bucharest 2014 - Continuous Integration Jenkins/HudsonEclipse DemoCamp Bucharest 2014 - Continuous Integration Jenkins/Hudson
Eclipse DemoCamp Bucharest 2014 - Continuous Integration Jenkins/Hudson
VladLica679 views
SELF LEARNING REAL TIME EXPERT SYSTEM by cscpconf
SELF LEARNING REAL TIME EXPERT SYSTEMSELF LEARNING REAL TIME EXPERT SYSTEM
SELF LEARNING REAL TIME EXPERT SYSTEM
cscpconf34 views
Agile & Iconix sdlc by Ahmed Nehad
Agile & Iconix sdlcAgile & Iconix sdlc
Agile & Iconix sdlc
Ahmed Nehad1.6K views
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf... by RUDDER
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...
Configuration management benefits for everyone - Rudder @ FLOSSUK Spring Conf...
RUDDER3.8K views
Libckpt transparent checkpointing under unix by ZongYing Lyu
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
ZongYing Lyu288 views
cloud computing preservity by chennuruvishnu
cloud computing preservitycloud computing preservity
cloud computing preservity
chennuruvishnu4.8K views
Synchronization by misra121
SynchronizationSynchronization
Synchronization
misra1211.2K views
A tale of Disaster Recovery (Cfengine everyday, practices and tools) by Jonathan Clarke
A tale of Disaster Recovery (Cfengine everyday, practices and tools)A tale of Disaster Recovery (Cfengine everyday, practices and tools)
A tale of Disaster Recovery (Cfengine everyday, practices and tools)
Jonathan Clarke217 views
A tale of Disaster Recovery (Cfengine everyday, practices and tools) by RUDDER
A tale of Disaster Recovery (Cfengine everyday, practices and tools)A tale of Disaster Recovery (Cfengine everyday, practices and tools)
A tale of Disaster Recovery (Cfengine everyday, practices and tools)
RUDDER1.3K views
Centralizing sequence analysis by Denis C. Bauer
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysis
Denis C. Bauer584 views
Application cloudification with liberty and urban code deploy - UCD by Davide Veronese
Application cloudification with liberty and urban code deploy - UCDApplication cloudification with liberty and urban code deploy - UCD
Application cloudification with liberty and urban code deploy - UCD
Davide Veronese273 views

More from Sung Kim

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning by
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningSung Kim
1.3K views23 slides
Deep API Learning (FSE 2016) by
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)Sung Kim
1.4K views25 slides
Time series classification by
Time series classificationTime series classification
Time series classificationSung Kim
5.7K views29 slides
Tensor board by
Tensor boardTensor board
Tensor boardSung Kim
8.4K views17 slides
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria... by
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...Sung Kim
2.5K views16 slides
Heterogeneous Defect Prediction (

ESEC/FSE 2015) by
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Sung Kim
2.2K views28 slides

More from Sung Kim(20)

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning by Sung Kim
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningDeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning
Sung Kim1.3K views
Deep API Learning (FSE 2016) by Sung Kim
Deep API Learning (FSE 2016)Deep API Learning (FSE 2016)
Deep API Learning (FSE 2016)
Sung Kim1.4K views
Time series classification by Sung Kim
Time series classificationTime series classification
Time series classification
Sung Kim5.7K views
Tensor board by Sung Kim
Tensor boardTensor board
Tensor board
Sung Kim8.4K views
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria... by Sung Kim
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
REMI: Defect Prediction for Efficient API Testing (

ESEC/FSE 2015, Industria...
Sung Kim2.5K views
Heterogeneous Defect Prediction (

ESEC/FSE 2015) by Sung Kim
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Sung Kim2.2K views
A Survey on Automatic Software Evolution Techniques by Sung Kim
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution Techniques
Sung Kim1.1K views
Crowd debugging (FSE 2015) by Sung Kim
Crowd debugging (FSE 2015)Crowd debugging (FSE 2015)
Crowd debugging (FSE 2015)
Sung Kim1.9K views
Software Defect Prediction on Unlabeled Datasets by Sung Kim
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
Sung Kim16.7K views
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015) by Sung Kim
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Partitioning Composite Code Changes to Facilitate Code Review (MSR2015)
Sung Kim1.6K views
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014) by Sung Kim
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)
Sung Kim1.9K views
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2... by Sung Kim
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
How We Get There: A Context-Guided Search Strategy in Concolic Testing (FSE 2...
Sung Kim2.2K views
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014) by Sung Kim
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
CrashLocator: Locating Crashing Faults Based on Crash Stacks (ISSTA 2014)
Sung Kim6.4K views
Source code comprehension on evolving software by Sung Kim
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
Sung Kim1.6K views
A Survey on Dynamic Symbolic Execution for Automatic Test Generation by Sung Kim
A Survey on  Dynamic Symbolic Execution  for Automatic Test GenerationA Survey on  Dynamic Symbolic Execution  for Automatic Test Generation
A Survey on Dynamic Symbolic Execution for Automatic Test Generation
Sung Kim3.1K views
Survey on Software Defect Prediction by Sung Kim
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
Sung Kim14.1K views
MSR2014 opening by Sung Kim
MSR2014 openingMSR2014 opening
MSR2014 opening
Sung Kim17K views
Personalized Defect Prediction by Sung Kim
Personalized Defect PredictionPersonalized Defect Prediction
Personalized Defect Prediction
Sung Kim3.7K views
STAR: Stack Trace based Automatic Crash Reproduction by Sung Kim
STAR: Stack Trace based Automatic Crash ReproductionSTAR: Stack Trace based Automatic Crash Reproduction
STAR: Stack Trace based Automatic Crash Reproduction
Sung Kim7K views
Transfer defect learning by Sung Kim
Transfer defect learningTransfer defect learning
Transfer defect learning
Sung Kim3.2K views

Recently uploaded

MVP and prioritization.pdf by
MVP and prioritization.pdfMVP and prioritization.pdf
MVP and prioritization.pdfrahuldharwal141
39 views8 slides
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc
176 views29 slides
LLMs in Production: Tooling, Process, and Team Structure by
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureAggregage
57 views77 slides
"Running students' code in isolation. The hard way", Yurii Holiuk by
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
36 views34 slides
Cencora Executive Symposium by
Cencora Executive SymposiumCencora Executive Symposium
Cencora Executive Symposiummarketingcommunicati21
160 views14 slides
Evaluation of Quality of Experience of ABR Schemes in Gaming Stream by
Evaluation of Quality of Experience of ABR Schemes in Gaming StreamEvaluation of Quality of Experience of ABR Schemes in Gaming Stream
Evaluation of Quality of Experience of ABR Schemes in Gaming StreamAlpen-Adria-Universität
38 views34 slides

Recently uploaded(20)

TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc176 views
LLMs in Production: Tooling, Process, and Team Structure by Aggregage
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
Aggregage57 views
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays36 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue224 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue199 views
Initiating and Advancing Your Strategic GIS Governance Strategy by Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software184 views

Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)

  • 1. Kenyon: A Software Stratigraphy Platform
    - Jennifer Bevan, Sunghun Kim, E. James Whitehead Jr., University of California, Santa Cruz ({jbevan, hunkim, ejw}@cs.ucsc.edu)
    - Lijie Zou, Mike Godfrey, University of Waterloo ({lzou, migod}@uwaterloo.edu)
  • 2. Motivation  Static analysis-based software evolution research has several common technical issues to manage.  Extracting meaningful configurations from an SCM repository.  Calculating static relations, metrics.  Augments data from commit log messages.  Saving the extracted facts.  For later time-based analysis, data mining, incremental data addition.
  • 3. Ongoing Static Evolution Research
    - Instability Analysis (J. Bevan)
        - Refines the Zimmermann/Ying/Murphy approaches, using static dependence to remove temporal dependencies.
    - Entity Mapping/Origin Analysis (L. Zou, M. Godfrey)
        - Uses static metrics to identify moved, split, or merged procedures and files.
    - Code clone evolution (M. Kim)
        - Identifies clones and follows their evolution.
  • 4. More Static Evolution Research
    - Association rule mining
        - For predicting changes [Ying et al., IEEE TSE, v30 n9, Sept. 2004]
        - For architectural justification [Zimmermann, Diehl, and Zeller, Proc. IWPSE 2003]
    - Identifying code “chunks” for future modularization [Mockus and Weiss, IEEE Software, v18 n2, 2001]
    - “Feature” identification [Fischer, Pinzger, and Gall, Proc. WCRE 2003]
    - …and the ongoing research related to these.
  • 5. Problem  Despite similarity of approach, systems make several choices that limit sharing of technology and results:  Usually choosing a single SCM system (CVS) for data.  Usually creating a proprietary database schema.  Usually not easily integratable with other research projects for result sharing.  The cost of computationally expensive analysis techniques are not amortized across multiple research directions.
  • 6. Solution: Kenyon
    - Kenyon is designed to facilitate static software evolution research by providing common solutions to these common problems:
        - Phase 1: Automatic configuration extraction from SCM.
        - Phase 2: Invoking static analysis tool(s).
        - Phase 3: Storing the results from these preprocessing steps.
    - Kenyon asynchronously provides access to previously processed and stored data.
  • 7. Kenyon Processing
    [Figure: Phase 1, Configuration Extraction, reads from the SCM repository and writes to the filesystem; Phases 2 & 3, Fact Extraction (Static Analysis) and Persist Gathered Facts, store results in the Kenyon repository (RDBMS/Hibernate); client software (e.g., IVA) and client tools perform queries and add new facts.]
  • 8. Phase 1: Extract Configurations
    - Kenyon provides transaction recovery and logical configuration extraction for multiple SCM systems.
        - Configurations are specified by time + branch identifier.
        - A sliding window algorithm is used for transaction recovery.
    - Only changes from completed transactions are extracted for a “logical configuration”.
    - Only changes from transactions that completed between two specifications are considered for a “configuration delta”.
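The slide names a sliding window algorithm for transaction recovery but does not give its details. The Java sketch below assumes the heuristic commonly used for CVS, which only records per-file commits: revisions with the same author and log message belong to one transaction as long as each arrives within a fixed window of the previous one in that group. `FileRevision`, `Transaction`, `TransactionRecovery`, and `windowSeconds` are hypothetical names, not Kenyon's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical per-file commit record, as a CVS log might yield it.
record FileRevision(String path, String author, String message, long timeSeconds) {}

// Hypothetical recovered transaction: file revisions judged to be committed together.
record Transaction(String author, String message, long start, long end, List<FileRevision> revisions) {}

final class TransactionRecovery {
    private TransactionRecovery() {}

    /**
     * Sliding-window grouping (assumed heuristic): a revision joins an open group if
     * author and message match and it is within windowSeconds of that group's latest
     * revision; otherwise it starts a new group.
     */
    static List<Transaction> recover(List<FileRevision> revisions, long windowSeconds) {
        List<FileRevision> sorted = new ArrayList<>(revisions);
        sorted.sort(Comparator.comparingLong(FileRevision::timeSeconds));
        List<List<FileRevision>> groups = new ArrayList<>();
        for (FileRevision r : sorted) {
            List<FileRevision> target = null;
            for (List<FileRevision> g : groups) {
                FileRevision last = g.get(g.size() - 1);
                if (last.author().equals(r.author()) && last.message().equals(r.message())
                        && r.timeSeconds() - last.timeSeconds() <= windowSeconds) {
                    target = g;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(r);
        }
        List<Transaction> result = new ArrayList<>();
        for (List<FileRevision> g : groups) {
            FileRevision first = g.get(0);
            FileRevision last = g.get(g.size() - 1);
            result.add(new Transaction(first.author(), first.message(),
                    first.timeSeconds(), last.timeSeconds(), List.copyOf(g)));
        }
        return result;
    }
}
```

With a 200-second window, two commits by the same author with the same message 50 seconds apart form one transaction, while a third such commit 850 seconds later starts a new one. The window size is a tuning choice, not a value taken from Kenyon.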
  • 9. Configuration Specification
    - Kenyon’s logical configuration extraction and delta calculations allow researchers to consider software “as it existed at time T on branch B”.
    - Most SCM systems archive data along a timeline, with varying support for parallel development.
        - Kenyon uses this commonality as the basis for its SCM interface and configuration specification.
    - Nothing so far suggests that change-set-based SCM systems cannot be supported by Kenyon.
  • 10. Logical Configuration
    - At any given point in time, one or more transactions may have just completed, and one or more may be ongoing.
    - Ongoing transactions are shown in red.
    - Completed transactions are shown in green.
    [Figure: transactions over files F1 to F4, cut at time T1.]
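A logical configuration at time T on branch B can be sketched as follows, assuming transactions have already been recovered with start and end times. Only transactions that completed by T contribute; ongoing ones (started but not yet finished at T) are skipped, and for each file the revision from the most recently completed transaction wins. `Txn` and `LogicalConfiguration` are hypothetical names, and file deletions are ignored for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recovered transaction: times in seconds; changes maps file path -> revision id.
record Txn(String branch, long start, long end, Map<String, String> changes) {}

final class LogicalConfiguration {
    private LogicalConfiguration() {}

    /** Returns path -> revision for the software "as it existed at time t on branch b". */
    static Map<String, String> at(List<Txn> txns, String branch, long t) {
        Map<String, String> config = new HashMap<>();
        Map<String, Long> lastEnd = new HashMap<>();
        for (Txn tx : txns) {
            // Skip other branches and transactions not completed by t (ongoing or future).
            if (!tx.branch().equals(branch) || tx.end() > t) continue;
            for (Map.Entry<String, String> e : tx.changes().entrySet()) {
                Long seen = lastEnd.get(e.getKey());
                // Keep the revision from the most recently completed transaction.
                if (seen == null || tx.end() > seen) {
                    config.put(e.getKey(), e.getValue());
                    lastEnd.put(e.getKey(), tx.end());
                }
            }
        }
        return config;
    }
}
```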
  • 11. Configuration Deltas
    - Configuration deltas are calculated as C(T2) − C(T1).
    - Only changes from transactions completing between T1 (exclusive) and T2 (inclusive) are considered.
    [Figure: transactions over files F1 to F4 between times T1 and T2.]
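The half-open window (T1, T2] can be sketched directly: a transaction contributes to the delta only if it completed after T1 and no later than T2. This minimal Java version assumes a hypothetical `Txn` record carrying a completion time and a path-to-revision map, and applies changes in completion order so that a later change to the same file overrides an earlier one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recovered transaction; changes maps file path -> new revision id.
record Txn(String id, long end, Map<String, String> changes) {}

final class ConfigurationDelta {
    private ConfigurationDelta() {}

    /** Changes from transactions completing in (t1, t2], i.e. the delta C(t2) - C(t1). */
    static Map<String, String> between(List<Txn> txns, long t1, long t2) {
        List<Txn> sorted = new ArrayList<>(txns);
        // Apply in completion order so a later change to the same file wins.
        sorted.sort(Comparator.comparingLong(Txn::end));
        Map<String, String> delta = new LinkedHashMap<>();
        for (Txn tx : sorted) {
            // T1 is exclusive, T2 is inclusive.
            if (tx.end() > t1 && tx.end() <= t2) delta.putAll(tx.changes());
        }
        return delta;
    }
}
```

A transaction that completes exactly at T1 is already part of C(T1), which is why it is excluded here.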
  • 12. Data from Phase 1
    - Valid configuration specifications for extraction are created by Kenyon, one per timestamp at which a transaction completed.
    - For each configuration extracted:
        - The author and log message of each transaction completing at that specification.
        - The configuration is placed on the filesystem.
    - A configuration delta for each consecutive pair of processed configurations can also be stored.
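Creating one specification per completion timestamp amounts to grouping transactions by their end time. The sketch below uses a sorted map so that each key is a valid specification time whose value holds the transactions (with author and log message) that completed then, and it derives the consecutive pairs for which a delta could be stored. `Txn` and `SpecificationPlanner` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical recovered transaction with the metadata recorded per configuration.
record Txn(String author, String logMessage, long end) {}

final class SpecificationPlanner {
    private SpecificationPlanner() {}

    /** One entry per completion timestamp, in time order, with the transactions that completed then. */
    static TreeMap<Long, List<Txn>> specifications(List<Txn> txns) {
        TreeMap<Long, List<Txn>> specs = new TreeMap<>();
        for (Txn tx : txns) {
            specs.computeIfAbsent(tx.end(), k -> new ArrayList<>()).add(tx);
        }
        return specs;
    }

    /** Consecutive (earlier, later) timestamp pairs, one per configuration delta. */
    static List<long[]> deltaPairs(TreeMap<Long, List<Txn>> specs) {
        List<long[]> pairs = new ArrayList<>();
        Long prev = null;
        for (Long t : specs.keySet()) {
            if (prev != null) pairs.add(new long[] {prev, t});
            prev = t;
        }
        return pairs;
    }
}
```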
  • 13. Phase 2: Invoke Fact Extractors
    - Kenyon provides an abstract class that is used to invoke third-party fact extractors on the configuration extracted to the filesystem.
        - Kenyon users subclass this class to invoke their own fact extractor.
        - Support for CodeSurfer (line-level analysis) and SWAGKIT (procedure-level analysis) is provided with Kenyon. [www.grammatech.com, swag.uwaterloo.ca]
    - FactExtractor subclasses have a tri-modal return status: “failure”, “new data to store”, or “no new data to store”.
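The subclassing pattern and the tri-modal status might look roughly like this. It is a sketch only: Kenyon's real FactExtractor signature is not shown on the slides, so `FactExtractorSketch`, `ExtractionStatus`, `analyze`, and `SourceFileCounter` are hypothetical, and the example subclass counts `.java` files instead of invoking CodeSurfer or SWAGKIT.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

// The three outcomes described on the slide.
enum ExtractionStatus { FAILURE, NEW_DATA, NO_NEW_DATA }

// Hypothetical base class; not Kenyon's actual FactExtractor API.
abstract class FactExtractorSketch {
    private final Map<String, Long> facts = new HashMap<>();

    /** Subclasses analyze the configuration that Phase 1 placed at configRoot. */
    protected abstract long analyze(Path configRoot) throws IOException;

    final ExtractionStatus run(String configId, Path configRoot) {
        if (facts.containsKey(configId)) {
            return ExtractionStatus.NO_NEW_DATA; // already processed by this extractor
        }
        try {
            facts.put(configId, analyze(configRoot));
            return ExtractionStatus.NEW_DATA;
        } catch (IOException | RuntimeException e) {
            return ExtractionStatus.FAILURE; // tool or filesystem error
        }
    }

    Long factFor(String configId) {
        return facts.get(configId);
    }
}

// Hypothetical subclass standing in for a real third-party tool invocation.
final class SourceFileCounter extends FactExtractorSketch {
    @Override
    protected long analyze(Path configRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(configRoot)) {
            return paths.filter(p -> p.toString().endsWith(".java")).count();
        }
    }
}
```

Keeping `run` final and letting subclasses override only `analyze` keeps the status logic in one place; a subclass for an external tool would launch it (for example with `ProcessBuilder`) inside `analyze`.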
  • 14. Data from Phase 2
    - FactExtractor subclasses provide:
        - A ConfigGraph that maps software elements to nodes and static relationships to edges.
        - The graph, any node, and any edge may be attributed with static metrics.
    - Multiple fact extractors may be invoked on a single configuration: each ConfigGraph created is saved with a reference to the fact extractor that created it.
    - If a configuration has already been processed by a given fact extractor, it is not processed again unless new metrics are to be calculated.
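An attributed graph of this kind can be sketched as nodes and edges that each carry a metric map, plus metrics on the graph itself and the name of the extractor that produced it. `AttributedGraph` and its members are hypothetical and only illustrate the shape of the data, not the actual ConfigGraph class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a ConfigGraph: elements as nodes, static relations as edges.
final class AttributedGraph {
    record Edge(String from, String to, String relation, Map<String, Double> metrics) {}

    final String extractorName; // which fact extractor produced this graph
    final Map<String, Double> graphMetrics = new HashMap<>();
    final Map<String, Map<String, Double>> nodes = new LinkedHashMap<>(); // element -> metrics
    final List<Edge> edges = new ArrayList<>();

    AttributedGraph(String extractorName) {
        this.extractorName = extractorName;
    }

    /** Adds the element if missing and returns its (mutable) metric map. */
    Map<String, Double> addNode(String element) {
        return nodes.computeIfAbsent(element, k -> new HashMap<>());
    }

    /** Adds a directed, attributable relation between two elements. */
    Edge addEdge(String from, String to, String relation) {
        addNode(from);
        addNode(to);
        Edge e = new Edge(from, to, relation, new HashMap<>());
        edges.add(e);
        return e;
    }

    long outDegree(String element) {
        return edges.stream().filter(e -> e.from().equals(element)).count();
    }
}
```

Because each graph records its extractor name, several graphs for the same configuration (for example one line-level and one procedure-level) can coexist without ambiguity.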
  • 15. Phase 3: Data Storage
    - Kenyon uses Hibernate to persist data classes.
        - Hibernate is an “object/relational persistence and query service for Java” [www.hibernate.org].
        - This allows research tools implemented in Java to reuse Kenyon classes.
    - Each configuration processed by Kenyon is assigned to a Project, the top-level data class persisted by Kenyon.
  • 16. Persisted Kenyon Data
    - Projects contain one set of data for each configuration specification processed.
    - Each such data set contains one or more ConfigGraphs, each produced by a different FactExtractor.
    - FactExtractors specify what GraphSchema subclass they use (not restrictive).
    [Figure: class diagram with multiplicities relating Project, ConfigData, ConfigSpec, ConfigDelta, ConfigGraph, FactExtractor, and GraphSchema.]
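The containment described above (a project holds one data set per specification, and each data set holds at most one graph per fact extractor) can be sketched with nested maps. `ProjectSketch`, `Spec`, and `GraphSummary` are hypothetical simplifications; the exact multiplicities in the original diagram are not recoverable here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the persisted containment hierarchy.
final class ProjectSketch {
    record Spec(String branch, long time) {}
    record GraphSummary(int nodes, int edges) {}

    final String name;
    // One data set per configuration specification; each maps extractor name -> graph.
    private final Map<Spec, Map<String, GraphSummary>> data = new LinkedHashMap<>();

    ProjectSketch(String name) {
        this.name = name;
    }

    /** Returns false if this extractor already produced a graph for this specification. */
    boolean addGraph(Spec spec, String extractor, GraphSummary graph) {
        return data.computeIfAbsent(spec, k -> new LinkedHashMap<>())
                   .putIfAbsent(extractor, graph) == null;
    }

    int configurationCount() {
        return data.size();
    }

    int graphCount(Spec spec) {
        return data.getOrDefault(spec, Map.of()).size();
    }
}
```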
  • 17. Data Access  Hibernate allows access to preprocessed data using SQL or the Hibernate query methods (HQL, QBE/ QBC), which support class/field-based queries.  A Hibernate query returns a List of Objects, each of which is of the type originally persisted.  Data fields in the returned class are populated unless specified as lazily loaded.  Kenyon provides several convenience queries for common anticipated queries, such as “what configurations are available for this project”.
  • 18. Kenyon Usage
    - Kenyon processes data based on specifications in a configuration file:
        - Start time, stop time, and how often to process.
        - Fact extractors and their assigned metric calculators.
        - SCM parameters, filesystem parameters, and some control over what Hibernate persists.
    - A “processing run” reuses any previously processed data, if available.
        - For example, if a ConfigGraph has already been created and new metrics are needed, they are calculated and added to the existing ConfigGraph.
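A processing run driven by start, stop, and interval settings, with reuse of already stored results, can be sketched as follows. The in-memory map stands in for the database, and a stored metric name stands in for a metric already attached to an existing ConfigGraph; `ProcessingRunSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a processing run that reuses previously computed results.
final class ProcessingRunSketch {
    // Stand-in for the database: "extractor@time" -> names of metrics already stored.
    private final Map<String, Set<String>> stored = new HashMap<>();
    private int computed = 0;

    /** Specification times from the start/stop/interval settings (stop inclusive). */
    static List<Long> schedule(long start, long stop, long interval) {
        if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
        List<Long> times = new ArrayList<>();
        for (long t = start; t <= stop; t += interval) times.add(t);
        return times;
    }

    /** Runs one extractor over the schedule, computing only metrics not stored yet. */
    void run(long start, long stop, long interval, String extractor, Set<String> metrics) {
        for (long t : schedule(start, stop, interval)) {
            Set<String> done = stored.computeIfAbsent(extractor + "@" + t, k -> new HashSet<>());
            for (String metric : metrics) {
                if (done.add(metric)) computed++; // Set.add returns false if already present
            }
        }
    }

    int computedCount() {
        return computed;
    }
}
```

Running the same schedule twice, the second time with one extra metric, computes only that extra metric for each configuration.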
  • 19. Iterative Refinement Example
    - When looking for “interesting” timeframes of evolution, a multiple-pass process is recommended.
        - A user can configure Kenyon to process the changes in a system once per day.
        - Days with high activity, or with other metrics exceeding a threshold, can be flagged as “interesting”.
        - The user can then configure Kenyon to process those days (via multiple processing runs) at a frequency of “every 20 minutes”.
        - This process can repeat down to the “every second” level.
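The two passes can be sketched as: count activity per day, then emit 20-minute specification times only inside days above a threshold. Here "activity" is assumed to be the number of completed transactions per day, times are in seconds, and `IterativeRefinement` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-pass refinement: coarse daily pass, then finer passes on busy days.
final class IterativeRefinement {
    static final long DAY = 86_400;
    static final long TWENTY_MINUTES = 1_200;

    private IterativeRefinement() {}

    /** Pass 1: completed transactions per day (day index -> count). */
    static TreeMap<Long, Integer> activityPerDay(List<Long> completionTimes) {
        TreeMap<Long, Integer> perDay = new TreeMap<>();
        for (long t : completionTimes) perDay.merge(t / DAY, 1, Integer::sum);
        return perDay;
    }

    /** Pass 2: specification times every `step` seconds inside each day above the threshold. */
    static List<Long> refine(Map<Long, Integer> perDay, int threshold, long step) {
        List<Long> times = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : perDay.entrySet()) {
            if (e.getValue() <= threshold) continue; // not "interesting"
            long dayStart = e.getKey() * DAY;
            for (long t = dayStart; t < dayStart + DAY; t += step) times.add(t);
        }
        return times;
    }
}
```

Repeating `refine` with smaller steps on the flagged intervals gives the descent toward the "every second" level.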
  • 20. Parallel Preprocessing
    - Kenyon is a single-threaded process, but Hibernate supports multiple connections to a single Kenyon database.
        - A 10-year history can be processed in chunks by any number of computers, even if the processing configurations have overlapping times or different intervals.
    - Kenyon does not integrate the deltas between different processing runs, so a small overlap between processing chunks is suggested.
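Splitting a long history into overlapping chunks for separate machines can be sketched as below. Each chunk after the first is extended backwards by a small overlap, so the configuration just before a chunk boundary is also processed by the next run and the delta across the boundary is not lost. `Chunk`, `ChunkPlanner`, and the overlap parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical processing chunk: a time range handed to one Kenyon run.
record Chunk(long start, long stop) {}

final class ChunkPlanner {
    private ChunkPlanner() {}

    /** Splits [start, stop] into n roughly equal chunks, each extended backwards by overlap. */
    static List<Chunk> plan(long start, long stop, int n, long overlap) {
        if (n <= 0 || stop < start) throw new IllegalArgumentException("bad range or chunk count");
        List<Chunk> chunks = new ArrayList<>();
        long span = stop - start;
        for (int i = 0; i < n; i++) {
            long s = start + span * i / n;
            long e = start + span * (i + 1) / n;
            // Never extend before the overall start time.
            chunks.add(new Chunk(Math.max(start, s - overlap), e));
        }
        return chunks;
    }
}
```

For example, splitting [0, 100] into 4 chunks with an overlap of 5 yields (0, 25), (20, 50), (45, 75), and (70, 100).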
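  • 21. Kenyon Architecture
    [Figure: architecture diagram with components DataManager, MetricLoader, Fact Extractor, and SCMInterface linked by <<calls>> dependencies; data classes Project, ConfigData, and ConfigGraph persisted via Hibernate/DBMS; external stores: SCM repository and filesystem.]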
  • 22. Current Status
    - Kenyon 1.2 is available at http://kenyon.dforge.cse.ucsc.edu
        - Supports CVS, Subversion, and ClearCase.
    - Students in 290G are performing projects using Kenyon this quarter.
    - Actively working with Samsung to analyze some of their source code.
  • 23. Future Work (1/3)
    - Continue working with M. Kim.
        - Evaluate usefulness of the SCM-only module.
        - If she decides to use Kenyon, assist with full integration.
    - Finish integration of Beagle/Kenyon and IVA/Kenyon.
    - Work with G. Murphy on using Kenyon at UBC.
    - Evaluate Kenyon’s ability to reduce the time-to-results for static software evolution research by analyzing the seminar class projects.
  • 24. Future Work (2/3)
    - Support branch path traversal.
        - Allow users to see the branch points in a system and specify a path for processing instead of a single branch.
        - Will reuse existing visualizations; a specification mechanism must be added.
    - Incorporate full language-specific containment models for better inter-language graph traversal and mapping.
        - Use M. Godfrey’s Java fact extractor and containment model.
  • 25. Future Work (3/3)
    - Support more of the Standard Exchange Formats for ConfigGraph export.
        - TA is already supported, but only the Fact sections. Schema sections should be improved to use the language-specific containment models.
    - Encourage other researchers to use Kenyon, and improve results sharing, capabilities, etc., based on their feedback.
  • 26. Open Issues (1/3)
    - The exact mechanism for sharing data between researchers is not entirely controllable by Kenyon.
        - Database setup and administration can effectively override many of Kenyon’s preferences.
        - By default, Kenyon-created tables cannot be modified by processes other than Kenyon.
  • 27. Open Issues (2/3)
    - Kenyon provides a public class, EvolutionPath, that links a subgraph in one ConfigGraph to a subgraph in another ConfigGraph.
        - Directed and attributable.
        - Basic building block for evolution data.
    - EvolutionPath is currently persisted by Kenyon, but will likely not be after 1.1, due to database mutability issues.
        - Other research projects can subclass it and, if they want to share their results easily, persist them to a Hibernate database using the provided Hibernate mapping examples.
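A directed, attributable link between subgraphs of two configurations might be modeled as below, for instance to record that one procedure was split into two (as in origin analysis). `SubgraphRef` and `EvolutionLink` are hypothetical stand-ins and do not reproduce the actual EvolutionPath class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical reference to a subgraph: the ConfigGraph it lives in plus its node ids.
record SubgraphRef(String configGraphId, Set<String> nodeIds) {}

// Hypothetical EvolutionPath-like link; not Kenyon's actual class.
final class EvolutionLink {
    final SubgraphRef from; // subgraph in the older configuration
    final SubgraphRef to;   // subgraph in the newer configuration
    final Map<String, String> attributes = new HashMap<>();

    EvolutionLink(SubgraphRef from, SubgraphRef to) {
        this.from = from;
        this.to = to;
    }

    /** One element became several: a candidate split. */
    boolean looksLikeSplit() {
        return from.nodeIds().size() == 1 && to.nodeIds().size() > 1;
    }

    /** Several elements became one: a candidate merge. */
    boolean looksLikeMerge() {
        return from.nodeIds().size() > 1 && to.nodeIds().size() == 1;
    }
}
```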
  • 28. Open Issues (3/3)
    - Kenyon can be invoked automatically via a post-commit script or a cron job.
    - Should Kenyon be able to be invoked automatically from an IDE?
    - What sort of support should Kenyon provide for better integration with, for example, Eclipse?
  • 29. Conclusions (1/2)
    - Kenyon is an engineering solution, designed to amortize the cost of the computationally expensive preprocessing steps that can benefit static software evolution research.
    - Research projects using Kenyon will not have to independently create solutions for these common problems.
        - 18% code reduction in Beagle without really trying.
    - Kenyon is expected to reduce the lag between beginning system implementation and producing research results.
  • 30. Conclusions (2/2)
    - Kenyon is not intended to be a lightweight data mining system for software evolution research.
        - The tradeoff of speed vs. precision is still controllable via the choice of fact extractors.
        - The configuration extraction time and associated network lag already put the per-configuration time at O(seconds).
    - Instead, it allows the cost of time-consuming, computationally expensive preprocessing to be amortized among researchers.
  • 31. Questions?
    - Kenyon was created primarily from code that existed in IVA, which is being funded by NSF grant CCR-01234603. Kenyon also contains code from Beagle, the origin analysis project overseen by Mike Godfrey.
    - Email jbevan@cs.ucsc.edu with future questions.
    - http://www.cse.ucsc.edu/research/labs/grase/kenyon/