Kenyon: A Software Stratigraphy Platform (ESEC/FSE 2005)
1. Kenyon: A Software Stratigraphy Platform
Jennifer Bevan, Sunghun Kim, E. James Whitehead Jr.
University of California, Santa Cruz
{jbevan, hunkim, ejw}@cs.ucsc.edu
Lijie Zou, Mike Godfrey
University of Waterloo
{lzou, migod}@uwaterloo.edu
2. Motivation
Static analysis-based software evolution research has several common technical issues to manage:
Extracting meaningful configurations from an SCM repository.
Calculating static relations, metrics.
Augmenting data from commit log messages.
Saving the extracted facts for later time-based analysis, data mining, and incremental data addition.
3. Ongoing Static Evolution Research
Instability Analysis (J. Bevan)
Refines Zimmermann/Ying/Murphy using static dependence to remove temporal dependencies.
Entity Mapping/Origin Analysis (L. Zou, M. Godfrey)
Uses static metrics to identify moved/split/merged procedures, files.
Code clone evolution (M. Kim)
Identifies clones and follows their evolution.
4. More Static Evolution Research
Association rule mining
For predicting changes [Ying et al., IEEE TSE, v30 n9, Sept. 2004]
For architectural justification [Zimmermann, Diehl, and Zeller, Proc. IWPSE 2003]
Identifying code “chunks” for future modularization [Mockus and Weiss, IEEE Software, v18 n2, 2001]
“Feature” identification [Fischer, Pinzger, and Gall, Proc. WCRE 2003]
…and the ongoing research related to these.
5. Problem
Despite similarity of approach, systems make several choices that limit sharing of technology and results:
Usually choosing a single SCM system (CVS) for data.
Usually creating a proprietary database schema.
Usually not easily integrated with other research projects for result sharing.
The cost of computationally expensive analysis techniques is not amortized across multiple research directions.
6. Solution: Kenyon
Kenyon is designed to facilitate static software evolution research by providing common solutions to these common problems:
Phase 1: Automatic configuration extraction from SCM
Phase 2: Invoking static analysis tool(s)
Phase 3: Storing the results from these preprocessing steps.
Asynchronously provides access to previously processed and stored data.
8. Phase 1: Extract Configurations
Kenyon provides transaction recovery and logical configuration extraction for multiple SCM systems.
Configurations specified by time + branch identifier.
Sliding window algorithm for transaction recovery (see the sketch below).
Only changes from completed transactions are extracted for a “logical configuration”.
Only changes from transactions that completed between two specifications are considered for a “configuration delta”.
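The sliding window algorithm mentioned above is a common way to rebuild logical commits from per-file SCM revisions (e.g., CVS): revisions by the same author with the same log message are grouped together as long as consecutive revisions fall within a time window. Kenyon's own implementation is not shown in these slides; the Java sketch below only illustrates the idea, and the Revision class, its field names, and the window length are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of sliding-window transaction recovery for per-file SCM
// revisions (e.g., CVS). All class and field names are illustrative, not
// Kenyon's actual API.
public class TransactionRecovery {

    public static class Revision {
        final String file, author, logMessage;
        final long commitTimeMillis;
        Revision(String file, String author, String logMessage, long commitTimeMillis) {
            this.file = file;
            this.author = author;
            this.logMessage = logMessage;
            this.commitTimeMillis = commitTimeMillis;
        }
    }

    /** Groups revisions into logical transactions: same author and log
     *  message, with consecutive revisions no more than windowMillis apart. */
    public static List<List<Revision>> recover(List<Revision> revisions, long windowMillis) {
        List<Revision> sorted = new ArrayList<>(revisions);
        sorted.sort(Comparator.comparingLong((Revision r) -> r.commitTimeMillis));

        List<List<Revision>> transactions = new ArrayList<>();
        for (Revision r : sorted) {
            List<Revision> current = transactions.isEmpty()
                    ? null : transactions.get(transactions.size() - 1);
            Revision last = (current == null) ? null : current.get(current.size() - 1);
            boolean sameTransaction = last != null
                    && last.author.equals(r.author)
                    && last.logMessage.equals(r.logMessage)
                    && (r.commitTimeMillis - last.commitTimeMillis) <= windowMillis;
            if (sameTransaction) {
                current.add(r);               // the window slides with each added revision
            } else {
                List<Revision> fresh = new ArrayList<>();
                fresh.add(r);
                transactions.add(fresh);      // start a new logical transaction
            }
        }
        return transactions;
    }
}
```

Each recovered transaction's completion time (the timestamp of its last revision) is then the kind of point on the timeline that a configuration specification's time component refers to.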
9. Configuration Specification
Kenyon’s logical configuration extraction and delta calculations allow researchers to consider software “as it existed at time T on branch B”.
Most SCM systems archive data along a timeline with varying support for parallel development.
Kenyon uses this commonality as the basis for its SCM interface and configuration specification.
Nothing indicates that change-set based SCM systems cannot also be supported by Kenyon.
10. Logical Configuration
• At any given point in time, one or more transactions may have just completed, and one or more may be ongoing.
• Ongoing transactions are shown in red.
• Completed transactions are shown in green.
[Figure: timeline of files F1–F4 with transaction T1]
11. Configuration Deltas
• Configuration deltas are calculated as C(T2) – C(T1) (see the sketch below).
• Only changes from transactions completing between T1 (exclusive) and T2 (inclusive) are considered.
[Figure: timeline of files F1–F4 with transactions T1 and T2]
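Read as code, the delta rule above is simply a filter over completed transactions: keep only those whose completion time lies in the half-open interval (T1, T2]. A minimal sketch follows, with a hypothetical Transaction stand-in, since the real Kenyon classes are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the C(T2) - C(T1) rule: a configuration delta contains only the
// transactions that completed after T1 (exclusive) and up to T2 (inclusive).
// Transaction is a hypothetical stand-in for a completed logical commit; only
// its completion timestamp matters here.
public class ConfigurationDelta {

    public static class Transaction {
        final long completionTimeMillis;
        Transaction(long completionTimeMillis) { this.completionTimeMillis = completionTimeMillis; }
    }

    public static List<Transaction> delta(List<Transaction> all, long t1Millis, long t2Millis) {
        List<Transaction> result = new ArrayList<>();
        for (Transaction t : all) {
            if (t.completionTimeMillis > t1Millis && t.completionTimeMillis <= t2Millis) {
                result.add(t);
            }
        }
        return result;
    }
}
```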
12. Data from Phase 1
Valid configuration specifications for extraction are created by Kenyon, one per timestamp where a transaction completed.
For each configuration extracted:
Author and log message of each transaction completing at that specification.
The configuration is placed on the filesystem.
A configuration delta for each consecutive pair of configurations processed can also be stored.
13. Phase 2: Invoke Fact Extractors
Kenyon provides an abstract class that is used to invoke third-party fact extractors on the configuration extracted to the filesystem.
Kenyon users would subclass this class to invoke their own fact extractor.
Support for Codesurfer (line-level analysis) and SWAGKIT (procedure-level analysis) is provided with Kenyon. [www.grammatech.com, swag.uwaterloo.ca]
FactExtractor subclasses have a tri-modal return status: “failure”, “new data to store”, or “no new data to store”.
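The slides describe FactExtractor only by its role: an abstract class that users subclass to wrap their own analysis tool, returning one of three outcomes. The sketch below shows what such an abstraction could look like; the method signature, the Status enum, and the Map stand-in for extracted facts are assumptions, not Kenyon's published API.

```java
import java.io.File;
import java.util.Map;

// Sketch of a FactExtractor-style abstraction: subclasses wrap a third-party
// analysis tool (e.g., CodeSurfer or SWAGKIT) and report one of three
// outcomes. All names here are illustrative, not Kenyon's published API.
public abstract class FactExtractorSketch {

    /** The tri-modal return status described on the slide. */
    public enum Status { FAILURE, NEW_DATA_TO_STORE, NO_NEW_DATA_TO_STORE }

    /**
     * Run the wrapped analysis tool over a configuration that Phase 1 has
     * already checked out to the filesystem, writing extracted facts
     * (elements, relations, metrics) into the supplied map, which stands in
     * for whatever graph structure the platform persists.
     */
    public abstract Status extract(File configurationRoot, Map<String, Object> extractedFacts);

    /** Example subclass: a do-nothing extractor that always reports no new data. */
    public static class NullExtractor extends FactExtractorSketch {
        @Override
        public Status extract(File configurationRoot, Map<String, Object> extractedFacts) {
            // A real subclass would invoke its tool here and populate extractedFacts.
            return Status.NO_NEW_DATA_TO_STORE;
        }
    }
}
```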
14. Data from Phase 2
FactExtractor subclasses provide:
A ConfigGraph that maps software elements to nodes and static relationships to edges (see the sketch below).
The graph, any node, and any edge may be attributed with static metrics.
Multiple fact extractors may be invoked on a single configuration: each created ConfigGraph is saved with a reference to the fact extractor that created it.
If a configuration has already been processed by a given fact extractor, it will not be processed again unless new metrics are to be calculated.
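A ConfigGraph, as characterized above, is an attributed graph: software elements as nodes, static relationships as edges, metric attributes allowed on the graph, on any node, and on any edge, plus a reference to the extractor that produced it. The following is a minimal data-shape sketch; all names are illustrative, not Kenyon's actual classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of an attributed configuration graph. Names are illustrative.
public class ConfigGraphSketch {

    public static class Node {
        final String element;                         // e.g., a file or procedure name
        final Map<String, Double> metrics = new HashMap<>();
        Node(String element) { this.element = element; }
    }

    public static class Edge {
        final Node from, to;
        final String relation;                        // e.g., "calls", "includes"
        final Map<String, Double> metrics = new HashMap<>();
        Edge(Node from, Node to, String relation) {
            this.from = from; this.to = to; this.relation = relation;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();
    final Map<String, Double> graphMetrics = new HashMap<>();
    final String createdByExtractor;                  // reference to the fact extractor that built it

    public ConfigGraphSketch(String createdByExtractor) {
        this.createdByExtractor = createdByExtractor;
    }

    public Node addNode(String element) {
        Node n = new Node(element);
        nodes.add(n);
        return n;
    }

    public Edge addEdge(Node from, Node to, String relation) {
        Edge e = new Edge(from, to, relation);
        edges.add(e);
        return e;
    }
}
```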
15. Phase 3: Data Storage
Kenyon uses Hibernate to persist data classes.
Hibernate is an “object/relational persistence and query service for Java” [www.hibernate.org].
Allows reuse of Kenyon classes by research tools implemented in Java.
Each configuration processed by Kenyon is assigned to a Project, the top-level data class persisted by Kenyon.
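The slides name Hibernate but show no persistence code. Assuming a hibernate.cfg.xml and a mapped Project class exist (both assumptions here), a SessionFactory would typically be built with new Configuration().configure().buildSessionFactory(), and saving a top-level object looks roughly like the sketch below. This is plain Hibernate usage of that era, not Kenyon's internal code.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Sketch of saving a mapped Kenyon-style data object (e.g., a Project). The
// object passed in is assumed to be an instance of a Hibernate-mapped class.
public class PersistenceSketch {

    /** Persists one mapped object in its own transaction. */
    public static void saveMappedObject(SessionFactory factory, Object mappedObject) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(mappedObject);   // Hibernate schedules the INSERT and assigns an identifier
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }
}
```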
16. Persisted Kenyon Data
• Projects contain one set of data for each configuration specification processed.
• Each such data set contains one or more ConfigGraphs, each produced by a different FactExtractor.
• FactExtractors specify what GraphSchema subclass they use (not restrictive).
[Class diagram: Project, ConfigData, ConfigSpec, ConfigDelta, ConfigGraph, FactExtractor, and GraphSchema, linked by 1-to-1 and 1-to-N associations]
17. Data Access
Hibernate allows access to preprocessed data using SQL or the Hibernate query methods (HQL, QBE/QBC), which support class/field-based queries.
A Hibernate query returns a List of Objects, each of which is of the type originally persisted.
Data fields in the returned class are populated unless specified as lazily loaded.
Kenyon provides several convenience queries for common anticipated queries, such as “what configurations are available for this project”.
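For example, the convenience query quoted above, “what configurations are available for this project”, could be expressed in plain HQL roughly as sketched below. The entity name ConfigData comes from slide 16, but its "project" property and the query text are assumptions made for illustration, not Kenyon's actual query code.

```java
import java.util.List;

import org.hibernate.Query;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

// Sketch of an HQL convenience query against Kenyon-style persisted classes.
public class QuerySketch {

    /** Returns the ConfigData objects recorded for one project. */
    @SuppressWarnings("unchecked")
    public static List<Object> configurationsForProject(SessionFactory factory, Object project) {
        Session session = factory.openSession();
        try {
            Query query = session.createQuery("from ConfigData cd where cd.project = :project");
            query.setParameter("project", project);
            return (List<Object>) query.list();   // each element has the originally persisted type
        } finally {
            session.close();
        }
    }
}
```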
18. Kenyon Usage
Kenyon processes data based on specifications in a configuration file:
Start time, stop time, how often to process.
Fact extractors and their assigned metric calculators.
SCM parameters, filesystem parameters, some control over what Hibernate persists.
A “processing run” will reuse any previously processed data if available.
For example, if a ConfigGraph has already been created and new metrics are necessary, they are calculated and added to the existing ConfigGraph.
19. Iterative Refinement Example
When looking for “interesting” timeframes of evolution, a multiple-pass process is recommended.
A user can configure Kenyon to process the changes in a system once per day.
Days with high activity or other metrics exceeding a threshold can be flagged as “interesting”.
The user can then configure Kenyon to process those days (via multiple processing runs) at the frequency of “every 20 minutes”.
This process can repeat down to the “every second” level.
20. Parallel Preprocessing
Kenyon is a single-threaded process, but Hibernate supports multiple connections to a single Kenyon database.
A 10-year history can be processed in chunks by any number of computers, even if the processing configurations have overlapping times or different intervals.
Kenyon does not integrate the deltas between different processing runs, so a small overlap in processing chunks is suggested.
22. Current Status
Kenyon 1.2 available at http://kenyon.dforge.cse.ucsc.edu
Supports CVS, Subversion, and ClearCase.
Students in 290G are performing projects using Kenyon this quarter.
Actively working with Samsung to analyze some of their source code.
23. Future Work (1/3)
Continue working with M. Kim:
Evaluate usefulness of SCM-only module.
If she decides to use Kenyon, assist with full integration.
Finish integration of Beagle/Kenyon and IVA/Kenyon.
Work with G. Murphy on using Kenyon at UBC.
Evaluate Kenyon’s ability to reduce the time-to-results for static software evolution research by analyzing the seminar class projects.
24. Future Work (2/3)
Support branch path traversal:
Allow users to see the branch points in a system and specify a path for processing instead of a single branch.
Will reuse existing visualizations, must add specification mechanism.
Incorporate full language-specific containment models for better inter-language graph traversal and mapping.
Use M. Godfrey’s Java fact extractor and containment model.
25. Future Work (3/3)
Support more of the Standard Exchange Formats for ConfigGraph export.
TA is already supported, but only the Fact sections. Schema sections should be improved to use the language-specific containment models.
Encourage other researchers to use Kenyon, and improve results-sharing, capabilities, etc. based on their feedback.
26. Open Issues (1/3)
The exact mechanism for allowing data sharing between researchers is not entirely controllable by Kenyon.
Database setup and administration can effectively override much of Kenyon’s preferences.
By default, Kenyon-created tables are not mutable by processes other than Kenyon.
27. Open Issues (2/3)
Kenyon provides a public class, EvolutionPath, that links a subgraph in one ConfigGraph to one in another ConfigGraph.
Directed and attributable.
Basic building block for evolution data.
Is currently persisted by Kenyon, will likely not be after 1.1, due to database mutability issues.
Other research projects can subclass and, if they want to share their results easily, persist them to a Hibernate database using the provided Hibernate mapping examples.
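Since the slide only states that EvolutionPath is directed, attributable, and meant to be subclassed, the sketch below shows one way a research project might shape such a subclass; the base-class fields, the SubgraphRef helper, and the clone-genealogy example are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an EvolutionPath-style building block: a directed, attributable
// link from a subgraph in one ConfigGraph to a subgraph in another. The base
// class shape and the subclass below are hypothetical illustrations.
public class EvolutionPathSketch {

    /** Identifies a subgraph within a particular ConfigGraph (both fields assumed). */
    public static class SubgraphRef {
        final long configGraphId;
        final String subgraphLabel;
        SubgraphRef(long configGraphId, String subgraphLabel) {
            this.configGraphId = configGraphId;
            this.subgraphLabel = subgraphLabel;
        }
    }

    protected final SubgraphRef from;   // earlier configuration
    protected final SubgraphRef to;     // later configuration
    protected final Map<String, String> attributes = new HashMap<>();

    public EvolutionPathSketch(SubgraphRef from, SubgraphRef to) {
        this.from = from;
        this.to = to;
    }

    /** Example research-specific subclass: tracking one code clone across configurations. */
    public static class CloneEvolutionPath extends EvolutionPathSketch {
        public CloneEvolutionPath(SubgraphRef from, SubgraphRef to, double similarity) {
            super(from, to);
            attributes.put("similarity", Double.toString(similarity));
        }
    }
}
```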
28. Open Issues (3/3)
Kenyon can be invoked automatically via a post-commit script or a cron job.
Should Kenyon also be automatically invokable from an IDE?
What sort of support should Kenyon provide for better integration with, for example, Eclipse?
29. Conclusions (1/2)
Kenyon is an engineering solution, designed to amortize the cost of the computationally expensive preprocessing steps that can benefit static software evolution research.
Research projects using Kenyon will not have to independently create solutions for these common problems.
18% code reduction in Beagle without really trying.
Is expected to reduce the lag between beginning system implementation and producing research results.
30. Conclusions (2/2)
Kenyon is not intended to be a lightweight data mining system for software evolution research.
Tradeoff of speed vs. precision is still controllable via the choice of fact extractors.
The configuration extraction time and associated network lag already put the per-configuration time at O(seconds).
Instead, it allows the cost of time-consuming, computationally expensive preprocessing to be amortized among researchers.
31. Questions?
Kenyon was created primarily from code that existed in IVA, which is being funded by NSF grant CCR-01234603.
Kenyon also contains code from Beagle, the origin analysis project overseen by Mike Godfrey.
Email jbevan@cs.ucsc.edu with future questions.
http://www.cse.ucsc.edu/research/labs/grase/kenyon/