Information Retrieval methods have been widely adopted to identify traceability links based on the textual similarity of software artifacts. However, noise due to word usage in software artifacts might negatively affect recovery accuracy. We propose the use of smoothing filters to reduce the effect of noise in software artifacts and improve the performance of traceability recovery methods. An empirical evaluation performed on two repositories indicates that a smoothing filter can significantly improve the performance of the Vector Space Model and Latent Semantic Indexing. This result suggests that, beyond traceability recovery, the proposed filter could be used to improve the performance of various other software engineering approaches based on textual analysis.
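The abstract does not say how the filter is computed. A minimal sketch, assuming the filter subtracts the mean term vector of each artifact collection (so terms shared by every artifact of one type stop contributing to similarity), followed by Vector Space Model cosine similarity over raw term frequencies. The artifact texts and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs, vocab):
    """Raw term-frequency vector for each document over a fixed vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    return [[c[term] for term in vocab] for c in counts]


def smooth(vectors):
    """Assumed smoothing filter: subtract the collection's mean vector."""
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    return [[x - m for x, m in zip(vec, mean)] for vec in vectors]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_scores(sources, targets, use_filter=True):
    """Similarity matrix: rows are source artifacts, columns are target artifacts."""
    vocab = sorted({w for doc in sources + targets for w in doc.lower().split()})
    src, tgt = tf_vectors(sources, vocab), tf_vectors(targets, vocab)
    if use_filter:
        # Each artifact type is smoothed against its own collection (assumption).
        src, tgt = smooth(src), smooth(tgt)
    return [[cosine(s, t) for t in tgt] for s in src]


use_cases = ["user login password system", "user report export system"]
classes = ["login password check system", "report export pdf system"]
scores = link_scores(use_cases, classes)
print(scores)
```

Here "system" occurs in every artifact of both collections, so the filter drives its weight to zero and the intended pairs (login with login, report with report) receive the highest scores.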
This document presents a lesson on how to describe oneself and others in Spanish. The lesson covers adjectives for describing hair, eyes, height, and personality. Students complete exercises to practice using adjectives in short sentences and to identify grammar rules such as word order and gender. The goal is for students to be able to write brief descriptions in Spanish.
This document contains summaries of different SONA presentations. It includes statistics on internship placements and projections for various AIESEC locations in India. Inferences are drawn about growth trends, contributions to national targets, and factors affecting performance like capitalization of alumni networks, sales and delivery management, and sustainability of talent and finances. The document emphasizes drawing inferences from statistics and learning from the detailed SONA reports.
This document provides statistics on the performance of various AIESEC local committees in India from 2011 to 2013. It shows that in 2011, AIESEC Chandigarh was ranked first nationally in terms of realizations. In 2012, AIESEC Delhi University was ranked first and AIESEC Hyderabad was ranked third. In 2013, AIESEC Delhi University was again ranked first while AIESEC Pune and AIESEC Chandigarh were ranked third and fourth respectively. The document emphasizes that AIESEC Hyderabad's success in a given year depends on the experience of its membership. It then provides projections for various AIESEC programmes in 2013.
The document discusses Net Promoter Score (NPS), which measures customer satisfaction and loyalty on a scale from -100 to 100. It works by asking customers how likely they are to recommend a company or product to others. An NPS between 0 and 50 is considered good, between 50 and 100 outstanding, between -50 and 0 okay, and between -100 and -50 alarming. The document encourages paying attention to every customer experience.
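The standard NPS calculation collects a 0 to 10 rating, counts 9 and 10 as promoters and 0 to 6 as detractors, and subtracts the percentage of detractors from the percentage of promoters. A small sketch follows; `band` applies the thresholds quoted above, and assigning scores that fall exactly on a threshold to the higher band is an assumption.

```python
def nps(ratings):
    """Net Promoter Score from 0-10 'how likely are you to recommend us?' ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)   # 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # 0-6; 7-8 are passives
    return 100 * (promoters - detractors) / len(ratings)


def band(score):
    """Bands quoted in the summary; boundary values go to the higher band (assumption)."""
    if score >= 50:
        return "outstanding"
    if score >= 0:
        return "good"
    if score >= -50:
        return "okay"
    return "alarming"


survey = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]  # hypothetical responses
print(nps(survey), band(nps(survey)))      # 5 promoters, 3 detractors -> 20.0 good
```

Passives (7 and 8) count toward the total but not toward either group, which is why the score can never exceed the share of promoters.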
The document provides instructions and examples for an English class focusing on describing physical appearance, nationality, and where someone is from using vocabulary like hair color, eye color, country of origin, and nationality. It also includes examples of student work and assessments to test listening and speaking skills related to these topics at different proficiency levels.
The document lists various initiatives and statistics related to an organization's partnerships, recruitment efforts, fundraising campaigns, and international exchanges. It includes sections on living diversity, realizations in different countries, enjoying participation and raising initiatives, matching initiatives, realization initiatives, acting sustainably, demonstrating integrity, activating leadership, striving for excellence, and growth statistics from 2012 to 2013.
The document summarizes financial activities and updates for July, including paying the first installment of 50,000 INR for an unstated purpose. It provides details on expenses like office rent, quality checks, and alumni events. Receivables are listed for different programs. The financial statement as of July 26 notes a bank balance of 2.94 lakhs and a fixed deposit of 8.04 lakhs. Plans for August include focusing on enrollments, alumni relations, and recruitment.
The document summarizes the activities and results of two quarters for an organization's network promotions, IM development tools, and corporate communications efforts. In quarter 1, they conducted various promotional activities, developed tools like brochures and booklets, and published newsletters. In quarter 2, they continued promotions and added initiatives like brand launches and case studies, while also creating additional tools. Things that worked well included specific brand launches, newsletters, and booklets. Things that did not work as well included some promotional activities and partnerships. Going forward, they plan to focus on initiatives like geographic information systems, branding, knowledge management, and membership development.
The document summarizes a faculty's performance review, covering what worked (Q1 MLM and portfolio meetings), what didn't work (Q2 MLM and member commitment), member retention responsibilities, targets achieved for raises, matches, and realizations, and a plan of action to improve department culture, hold matching nights, synergize with other departments on projects, and register members for breaking barriers events.
The document discusses adapting organizational strategies like a chameleon. It instructs participants to split into groups and design fictional organizations. Case studies on Pantene, Pringles, and PARCEL demonstrate how changes in business strategy unlocked new revenue opportunities. The key lessons are that strategic change is necessary, not incidental, for seizing future opportunities, and that it requires foresight to recognize when existing strategies are ill-suited and discipline to enact fundamental shifts. The document concludes by stating that AIESEC needs to adapt the way it does things while maintaining its interdependent nature.
Lili was born in Ambato, Ecuador, to parents Angel and Elvira. She has three siblings and studied at Ambato High School before starting university. During her studies, Lili participated in teaching contests and worked at Eugenio Mera School while pursuing her degree in education, which she has now completed.
An Empirical Investigation on Documentation Usage Patterns in Maintenance Tasks (Sebastiano Panichella)
When developers perform a software maintenance task, they need to identify the artifacts (e.g., classes or, more specifically, methods) that need to be modified. To this aim, they can browse various kinds of artifacts, for example use case descriptions, UML diagrams, or source code. This paper reports the results of a study, conducted with 33 participants, aimed at investigating (i) to what extent developers use different kinds of documentation when identifying artifacts to be changed, and (ii) whether they follow specific navigation patterns among different kinds of artifacts. Results indicate that, although developers spend a considerable proportion of the available time focusing on source code, they browse back and forth between source code and either static (class) or dynamic (sequence) diagrams. Less frequently, developers, especially more experienced ones, follow an “integrated” approach by using different kinds of artifacts.
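The abstract does not describe how navigation patterns were extracted. As a hedged illustration only, the sketch below counts transitions between artifact kinds in a hypothetical per-participant visit log (the log format and artifact labels are assumptions) and counts the A→B→A "back and forth" moves the results mention, such as code → class diagram → code.

```python
from collections import Counter


def collapse(visits):
    """Merge consecutive visits to the same kind of artifact."""
    seq = []
    for kind in visits:
        if not seq or seq[-1] != kind:
            seq.append(kind)
    return seq


def transitions(visits):
    """Count moves from one artifact kind to a different one."""
    seq = collapse(visits)
    return Counter(zip(seq, seq[1:]))


def back_and_forth(visits):
    """Count A -> B -> A moves, e.g. code -> class diagram -> code."""
    seq = collapse(visits)
    return sum(1 for a, b, c in zip(seq, seq[1:], seq[2:]) if a == c)


# Hypothetical log of one participant's session (labels are illustrative).
session = ["code", "class", "code", "sequence", "code", "code", "usecase"]
print(transitions(session))
print(back_and_forth(session))  # -> 2
```

Collapsing repeated visits first means that staying on the same kind of artifact is not counted as a transition.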
The document outlines the achievements and non-achievements of talent management, including successful MB training in Q1 and Q3, completion of the TMP/TLP and LEAD programs, and filling of national roles based on job descriptions. It also notes challenges with retention, late MB training, induction programs, and March recruitment. Metrics are provided showing targets and actuals for TMP/TLP programs. Next steps discussed include September recruitment and induction, expanding TMP/TLP programs, continuing LEAD, iXPs, TnT, and establishing MB sub-teams.
The document presents a lesson on how to give and follow directions in Spanish. It includes key vocabulary such as "izquierda" (left), "derecha" (right), and "todo recto" (straight ahead), and verbs such as "toma" (take), "sigue" (continue), and "cruza" (cross). Students watch a video showing examples of how to give directions and complete interactive exercises to practice giving and following directions. At the end, they take a multiple-choice exam to assess their understanding.
How the Evolution of Emerging Collaborations Relates to Code Changes: An Empi... (Sebastiano Panichella)
Developers contributing to open source projects spontaneously group into "emerging" teams, reflected by messages exchanged over mailing lists, issue trackers, and other communication means. Previous studies suggested that such teams somewhat mirror the software modularity. This paper empirically investigates how, as a project evolves, emerging teams reorganize themselves, e.g., by splitting or merging. We relate the evolution of teams to the files they change, to investigate whether teams split to work on cohesive groups of files. Results of this study, conducted on the evolution history of four open source projects (Apache HTTPD, Eclipse JDT, NetBeans, and Samba), provide indications of what happens in a project when teams reorganize. Specifically, we found that emerging team splits imply working on more cohesive groups of files, and emerging team merges imply working on groups of files that are cohesive from a structural perspective. Such indications help to better understand the evolution of software projects. More importantly, observing how emerging teams change can help suggest software remodularization actions.
This document outlines the goals and activities for an organization called "The Fraternity" over the first two quarters of the year. It includes growing membership through recruitment events, onboarding new associates through induction programs, tracking member performance, and continuing education activities to develop talent within the organization. The way forward focuses on refining local education processes, holding membership recruitment and development events, and instituting systems for mentorship, reviews, and ensuring team minimums are met.
Quarter 1 sales were below target but Quarter 2 sales exceeded the target. For the year, funds raised totaled 6,72,875 rupees of the 18,00,000 rupee target. Three new board appointments were made. A new partnership called '24*7 Leadership Factory' was created but not yet delivered. Successful strategies included events and annual partnerships. Opportunities that were not capitalized on included the CSR period, February recruitment, and building the OC and BDT teams. The way forward includes better delivery of programs, board contracts, client activities, expansion events, product packaging, and focusing on impact.
This document contains graphs showing the targets and achievements of several local associations over July, August, and September. Each graph tracks a different association (The Palladium, Executive Authority, The Esprit, The Trajectory) and shows whether it met its monthly targets. The final graphs track targets and achievements for matches, raises, and realizations at GCDP ICX.
This document describes a social justice program for middle school students. It has them identify partner organizations working on social issues and engage with those groups through letter writing, interviews, and actions. Students document their work on a social media site and teach workshops about the issues. The program aims to narrow the distance between the classroom and real-world issues of social justice and open possibilities for lifelong engagement on these topics. It is presented as having institutional impact, pushing students out of comfort zones, and being innovative while dependent on school support.
This document summarizes the finances and investments of AIESEC in Hyderabad for the year. It details that the finance reserves increased to Rs. 7,74,000 with fixed deposits of Rs. 7,07,000. Various investments led to positive results, including an office renovation, stakeholder summit that generated 50 recruits, hoardings, and Diwali gifts for clients that led to 5 direct recruits. Successful investments included events, expansions, conferences for members, national exchange programs, and training programs. Key aspects that worked well included financial policies, return on relationship programs, investments in onboarding programs, proper documentation, and cost cutting.
To add people to GIS, go to your profile picture and click My Committee. Then click Teams and add members by clicking the numbers 1 and 2. To assign members to teams, click Add Team, enter team details, and add members to the team by clicking the green plus icon in the TLP (team leader position) box only. Do not click the plus icon in the TMP (team member position) box.
The document describes the process of designing a 3D post-apocalyptic city environment over multiple dates. It begins with designing the roads and first buildings, then adds more buildings and details like street lamps, painted road lines, and vehicles obtained from online images. Interior details like rooms, doors and windows are later added to buildings. Color is eventually applied to objects to aid texturing for a fly-through video of the virtual environment.
The document contains updates on appointments, targets achieved, and matches/positives for various companies from January to July. It also notes upcoming projects and meetings at the department, including membership development and industry capitalization. Additionally, it promotes believing in greatness and provides a contact for any questions.
The document discusses the negative outlook of a lost generation who believes happiness comes only from money, prioritizes work over family, and expects high divorce rates and environmental destruction in the future. It also mentions the need for strategic thinking around problem solving, vision, collaboration, and identifying strategies by considering relations, patterns, trends, and tradeoffs to think beyond traditional approaches.
The document reports on program performance metrics with targets and achievements for 2014-2015. It shows the number of current matches, realizations, and achieved percentages for different programs, as well as targets and achievements for bank fixed deposits. Overall program performance was below most targets set for the period.
This document contains charts showing pipeline, matches and projections for different groups (GIP ICX, GIP OGX, GCDP OGX, GCDP ICX) from January 2014 to October 2014. For each group, inferences are listed below the charts based on factors like national standing, internal/external markets, conversion ratios, and ways forward.
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications (Forward Gradient)
The document outlines an AI and NLP seminar in three parts: an introduction, natural language processing, and speech. Part II on NLP covers topics such as word representations, sentence representations, NLP benchmarks, multilingual representations, and applications of text and graph embeddings. Part III on speech discusses speech recognition approaches and multimodal speech and text for emotion recognition.
Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Ext... (hblanca)
We present two terminology extraction tools to compare a knowledge-poor and a knowledge-rich approach. Both tools process single-word terms (SWT) and multi-word terms (MWT) and are designed to handle multilingualism. We run an evaluation on 6 languages and 2 different domains using crawled comparable corpora and hand-crafted reference term lists (RTL). We discuss the three main results achieved for terminology extraction. The first two evaluation scenarios concern the knowledge-rich framework. Scenario 1 (S1) compares performance for each language depending on the ranking applied: specificity score vs. number of occurrences. Scenario 2 (S2) examines whether term variant identification improves ranking precision for any of the languages. Scenario 3 (S3) compares both tools and demonstrates that a probabilistic term extraction approach, developed with minimal effort, achieves satisfactory results when compared to a rule-based method.
Conference: CICLing 2013, Samos
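Scenario 1 contrasts ranking by a specificity score with ranking by raw occurrences, but the abstract does not define the score. The sketch below assumes one common formulation: relative frequency in the domain corpus divided by add-one-smoothed relative frequency in a general corpus. It handles single tokens only (no multi-word terms), and the corpora are toy examples.

```python
from collections import Counter


def rank_by_occurrences(domain_tokens):
    """Rank candidate terms by raw frequency in the domain corpus."""
    counts = Counter(domain_tokens)
    return sorted(counts, key=lambda t: (-counts[t], t))


def rank_by_specificity(domain_tokens, general_tokens):
    """Rank by an assumed specificity score: relative frequency in the domain
    corpus divided by add-one-smoothed relative frequency in a general corpus."""
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    n_dom, n_gen = len(domain_tokens), len(general_tokens)
    vocab_size = len(set(dom) | set(gen))

    def specificity(term):
        return (dom[term] / n_dom) / ((gen[term] + 1) / (n_gen + vocab_size))

    return sorted(dom, key=lambda t: (-specificity(t), t))


domain = "the turbine blade the rotor the turbine the".split()
general = "the cat the dog the house a blade".split()
print(rank_by_occurrences(domain))           # ['the', 'turbine', 'blade', 'rotor']
print(rank_by_specificity(domain, general))  # ['turbine', 'rotor', 'the', 'blade']
```

Frequency alone puts the function word "the" first; dividing by its general-corpus frequency demotes it below the domain terms that rarely appear elsewhere.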
Scalability in Software Systems Engineering: The Good, the Bad, and the Ugly ... (David Rosenblum)
1) Scalability is an important requirement for modern software systems as hardware capabilities and user demands continue to grow rapidly.
2) There are various definitions of scalability relating to performance, complexity, and abstraction. It can be characterized as how resource consumption grows with problem size.
3) Techniques for achieving scalability include abstraction, execution analysis, coarse-grained analysis, distribution, and approximation, each with associated costs and tradeoffs.
4) True scalability engineering is needed to systematically apply scalability techniques, evaluate designs, and compare alternatives to build systems that can demonstrably scale from the start.
This document summarizes AT&T Research's participation in the 2009 TREC Video Retrieval Evaluation content-based copy detection task. It describes their approaches to shot boundary detection, transformation detection and normalization of query keyframes, indexing reference videos using locality sensitive hashing, and refining matches through RANSAC and scoring techniques. Their system was evaluated on a dataset containing over 1,000 query videos and 800 reference videos, extracting over 5 million features. Results showed their technique could accurately detect copied video segments in the dataset.
Functionality testing involves developing test cases to test new code based on software function specifications, marketing requirements, and developer code. Test cases are the foundation of quality assurance and should cover equivalence classes, boundary values, decision tables, state transitions, and all pairs to ensure thorough coverage. Quality functionality testing requires understanding the purpose of new features, communicating with developers, thoroughly designing test cases, carefully executing tests, and reviewing results.
The document discusses preprocessing techniques for historical Sanskrit text documents before optical character recognition (OCR). It describes the basic steps of preprocessing which include scanning, noise removal through filtering techniques like mean, median and Wiener filters, and binarization. Newer techniques discussed are non-local means and total variation methods for noise removal, which help preserve details and edges while removing noise. The document evaluates the effect of different preprocessing filters and binarization on sample text images.
Changes and Bugs: Mining and Predicting Development Activities
Thomas Zimmermann
The document discusses mining software archives and bug databases to predict software development activities and defects. It presents two main contributions: 1) fine-grained analysis of version archives to identify usage patterns and cross-cutting changes, and 2) mining bug databases to predict defects based on dependencies and the increased likelihood of defects when depending on defect-prone code.
NoSQL addresses issues related to large volumes of data, including poorly structured data, simplicity of data management, frequent reads and writes, big data streams, huge data storage needs, fast data filtering, complex relationships, and real-time processing and analysis. It works by chopping data into smaller, manageable pieces, separating reads from writes, using techniques like caching, and designing for unlimited data growth. Key aspects include minimizing relations, parallelizing and distributing operations, and avoiding single points of failure.
Generating super resolution images using transformers
NEERAJ BAGHEL
The document summarizes a research paper on using transformers for the task of natural language processing. Some key points:
- Transformers use attention mechanisms to draw global dependencies between input and output without regard to sequence length, addressing limitations of RNNs and CNNs for NLP tasks.
- The proposed transformer architecture contains self-attention layers in the encoder and decoder, as well as an attention mechanism between the encoder and decoder.
- The transformer uses scaled dot-product attention and multi-head attention. Self-attention allows relating different positions of a single sequence to compute representations.
- Other components include feedforward layers and positional encoding to inject information about the relative or absolute positions of the tokens in the sequence.
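The scaled dot-product attention named in the summary computes softmax(QK^T / sqrt(d_k)) V. Below is a minimal pure-Python sketch of that formula. It assumes small matrices stored as lists of rows. The function names are illustrative and are not taken from the summarized paper's code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows.
    Q: one row per query; K: one row per key; V: one row per value (same count as K)."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output column at a time.
        output.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return output

# Toy example: the query is closer to the first key, so the first value row gets more weight.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

When all keys are identical, the weights are uniform and each output row is the plain average of the value rows.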
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Sease
If you want to expand your queries or documents with synonyms in Apache Lucene, you need a predefined file listing the terms that share the same meaning. It is not always easy to find a list of basic synonyms for a language, and even if you find one, it may not match your domain.
The term “daemon” in the domain of operating system articles is not a synonym of “devil” but it’s closer to the term “process”.
Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in the vocabulary. Words with similar meanings are mapped to vectors that lie close to each other.
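The synonym idea can be sketched roughly as follows. The sketch assumes word vectors have already been trained. The three-dimensional vectors below are made up for illustration, and this is not the Lucene plugin's API. Candidate synonyms are the terms whose vectors have the highest cosine similarity to the query term.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_terms(word, vectors, top_n=1):
    """Other vocabulary terms ranked by cosine similarity to `word`, best first."""
    scored = [(other, cosine(vectors[word], vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical 3-dimensional vectors; a trained Word2Vec model would supply real ones.
vectors = {
    "daemon": [0.9, 0.1, 0.0],
    "process": [0.8, 0.2, 0.1],
    "devil": [0.1, 0.0, 0.9],
}
print(nearest_terms("daemon", vectors))
```

With vectors learned from operating-system articles, "process" would rank above "devil" as a neighbour of "daemon", which is the behaviour described above.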
Software Systems as Cities: a Controlled Experiment
Richard Wettel
This document describes a controlled experiment to evaluate the Software Systems as Cities visualization tool CodeCity. The experiment involves participants completing program comprehension and design quality assessment tasks on medium and large software systems using either CodeCity or traditional tools like Eclipse. The main research questions are whether CodeCity increases task correctness and reduces time compared to traditional tools, regardless of system size. Key variables that are measured include task correctness, completion time, tool used, system size, participant experience level and background.
On how to change the utility curve of deep learning to make deep learning projects deliver an ROI no matter how accurate the machine learning system is - presented at the Nasscom Analytics Summit 2018.
- The document discusses Hi-Lite, a French research project that aims to combine unit testing and formal verification in Ada programs.
- It presents the translation from Ada to the Why3 verification language, including handling of types, contracts, loops, and other language features.
- The goal is to allow gradual adoption of formal verification while still leveraging existing tests, and apply verification to both new and legacy Ada code bases.
This paper aims to develop an effective sentence model using a dynamic convolutional neural network (DCNN) architecture. The DCNN applies 1D convolutions and dynamic k-max pooling to capture syntactic and semantic information from sentences with varying lengths. This allows the model to relate phrases far apart in the input sentence and draw together important features. Experiments show the DCNN approach achieves strong performance on tasks like sentiment analysis of movie reviews and question type classification.
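Below is a minimal sketch of the two pooling ideas named above. k-max pooling keeps the k largest activations of a sequence in their original order. The dynamic variant chooses k per layer. The formula for k is assumed here to be k_l = max(k_top, ⌈(L − l)/L · s⌉) for layer l of L, with sentence length s. This follows the original DCNN paper, but it should be checked against that paper. The function names are hypothetical.

```python
import math

def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order in the sequence."""
    top_positions = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top_positions)]

def dynamic_k(layer, total_layers, sentence_length, k_top):
    """Assumed pooling size for convolutional layer `layer` (1-based) of `total_layers`:
    k_l = max(k_top, ceil((L - l) / L * s)). Multiplying before dividing avoids float surprises."""
    return max(k_top, math.ceil((total_layers - layer) * sentence_length / total_layers))

print(k_max_pooling([3, 1, 5, 2, 4], 3))
print([dynamic_k(layer, 3, 18, 3) for layer in (1, 2, 3)])
```

Keeping the original order, rather than sorting the selected values, is what lets the pooled features still reflect where in the sentence they occurred.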
Nanometer chip testing faces new challenges due to increasing process variations, new failure mechanisms, and higher costs. Solutions include new fault models like bridge fault testing, delay fault testing to check timing at high speeds, scan compression to reduce test data volume, and scan-based diagnostics to improve yield learning. Effective solutions require close collaboration between test, technology, and automated test equipment experts.
1) Testing at the nanometer scale presents new challenges due to increasing process variations, complex signal integrity issues, and new defect mechanisms.
2) New test techniques are needed to detect failures such as small delay defects and high-resistance bridges. Approaches such as bridge fault testing and delay fault testing generate significantly more test patterns.
3) Solutions to reduce cost and power consumption during testing include scan compression techniques, preventing unnecessary switching during scan shifts, and developing power-aware test patterns.
The presentation was given to the Rivier Scala / Clojure User Group meeting on 10.6.2013. It is a half-baked presentation; the final version will be uploaded when ready.
The first part is about DSLs in general, complexities in software engineering, and abstraction. The second part presents a quick overview of DSLs in Scala and touches on some of the technologies used for deep embedding.
Maliheh (Mali) Izadi, PhD, Andrea Di Sorbo, and Sebastiano Panichella co-chaired the 3rd Intl. Workshop on NL-based Software Engineering
April 20 2024, Lisbon, Portugal.
Diversity-guided Search Exploration for Self-driving Cars Test Generation thr...
Sebastiano Panichella
Timo Blattner, Christian Birchler, Timo Kehrer, Sebastiano Panichella: Diversity-guided Search Exploration for Self-driving Cars Test Generation through Frenet Space Encoding. Intl. Workshop on Search-Based and Fuzz Testing (SBFT). 2024
SBFT Tool Competition 2024 -- Python Test Case Generation Track
Sebastiano Panichella
Nicolas Erni, Al-Ameen, Mohammed, Christian Birchler, Pouria Derakhshanfar, Stephan Lukasczyk, Sebastiano Panichella: SBFT Tool Competition 2024 -- Python Test Case Generation Track. 17th International Workshop on Search-Based and Fuzz Testing.
SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track
Sebastiano Panichella
Sajad Khatiri, Prasun Saurabh, Timothy Zimmermann, Charith Munasinghe, Christian Birchler, Sebastiano Panichella: SBFT Tool Competition 2024 - CPS-UAV Test Case Generation Track. 17th International Workshop on Search-Based and Fuzz Testing.
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Sebastiano Panichella
Sajad Khatiri, Sebastiano Panichella, Paolo Tonella: Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist. International Conference on Software Engineering. 2024
Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective ...
Sebastiano Panichella
Lecture entitled "Testing with Fewer Resources: Toward Adaptive Approaches for Cost-effective Test Generation and Selection" at the International Summer School on Search- and Machine Learning-based Software Engineering
June 22-24, 2022 - Córdoba, Spain
Sebastiano Panichella and Christian Birchler
COSMOS: DevOps for Complex Cyber-physical Systems
Sebastiano Panichella
Zurich University of Applied Sciences (ZHAW)
Workshop on Adaptive CPSoS (WASOS) 2023
Testing and Development Challenges for Complex Cyber-Physical Systems: Insigh...
Sebastiano Panichella
Keynote presentation at ICST (AIST workshop) entitled "Testing and Development Challenges for Complex Cyber-Physical Systems: Insights from the COSMOS H2020 Project"
An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical ...
Sebastiano Panichella
Presentation at the 16th IEEE International Conference on Software Testing, Verification and Validation (ICST): An Empirical Characterization of Software Bugs in Open-Source Cyber-Physical Systems. Journal of Systems & Software (JSS).
Automated Identification and Qualitative Characterization of Safety Concerns ...
Sebastiano Panichella
Presentation at the IEEE/ACM International Conference on Automated Software Engineering (ASE 2023): “Automated Identification and Qualitative Characterization of Safety Concerns Reported in UAV Software Platforms”, Transactions on Software Engineering and Methodology
This document provides information about the NL-based Software Engineering (NLBSE) '23 workshop to be held on May 20th, 2023. The workshop will have two keynote speakers, two paper presentation sessions, a tool competition, and will be held in a hybrid format with both in-person and remote participation. It outlines the schedule, participating speakers and chairs, instructions for remote participants, and plans for recording and publishing the workshop proceedings.
Simulation-based Test Case Generation for Unmanned Aerial Vehicles in the Nei...
Sebastiano Panichella
This document proposes a method called SURREALIST to generate realistic simulated test cases for unmanned aerial vehicles (UAVs) using real flight logs. It aims to address limitations of field testing such as lack of reproducibility and limited test scenarios. SURREALIST works in two steps: 1) It systematically replicates real flights in simulation by finding optimal drone and environment configurations that minimize differences between real and simulated flight trajectories. 2) It generates new challenging test cases by manipulating drone and environment configurations according to a difficulty measure, such as violating safety distances to obstacles. The approach is evaluated on examples of replicating and modifying an existing flight to evaluate its ability to find bugs. SURREALIST aims to generate tests that can discover non
Exposed! A case study on the vulnerability-proneness of Google Play Apps
Sebastiano Panichella
This study analyzed the vulnerability levels of 1000 mobile apps from Google Play across 23 categories. The key findings were:
1) Medical apps had significantly fewer vulnerabilities than other categories like Finance and Shopping.
2) An app's vulnerability level did not affect its rating, but apps with more downloads tended to have higher vulnerability levels.
3) Contextual information like app description, metadata, and static code features could predict an app's vulnerability level with over 75% accuracy, with market data providing complementary insights to code analysis. Addressing app security is important as users may not be aware of risks when installing apps.
Search-based Software Testing (SBST) '22
Workshop Co-Chairs:
Giovani Guizzo
UNIVERSITY COLLEGE LONDON, UNITED KINGDOM
Sebastiano Panichella
ZURICH UNIVERSITY OF APPLIED SCIENCE, SWITZERLAND
Competition Co-Chairs:
Alessio Gambi
UNIVERSITY OF PASSAU, GERMANY
Gunel Jahangirova
UNIVERSITÀ DELLA SVIZZERA ITALIANA, SWITZERLAND
Vincenzo Riccio
UNIVERSITÀ DELLA SVIZZERA ITALIANA, SWITZERLAND
Fiorella Zampetti
UNIVERSITY OF SANNIO, ITALY
Website Chair:
Rebecca Moussa
UNIVERSITY COLLEGE LONDON, UNITED KINGDOM
Program Committee:
Nazareno Aguirre, Universidad Nacional de Río Cuarto - CONICET, Argentina
Aldeida Aleti, Monash University, Australia
Giuliano Antoniol, Ecole Polytechnique de Montréal, Canada
Kate Bowers, Oakland University, USA
Jose Campos, University of Washington, USA
Thelma E. Colanzi, State University of Maringá, Brazil
Byron DeVries, Grand Valley State University, USA
Gordon Fraser, University of Passau, Germany
Erik Fredericks, Oakland University, USA
Gregory Gay, Chalmers and the University of Gothenburg, Sweden
Alessandra Gorla, IMDEA Software Institute, Spain
Gregory Kapfhammer, Allegheny College, USA
Yiling Lou, Peking University, China
Mitchell Olsthoorn, Delft University of Technology, Netherlands
Justyna Petke, University College London, UK
Silvia R. Vergilio, Universidade Federal do Paraná, Brazil
Simone do Rocio Senger de Souza, University of São Paulo, Brazil
Thomas Vogel, Humboldt-Universität zu Berlin, Germany
Jie Zhang, University College London, UK
Tool Competition
Introduction
NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products, by automatically processing natural language artifacts (issues, emails, commits, etc.).
We believe that the availability of accurate tools is becoming increasingly necessary to improve Software Engineering (SE) processes. One important process is issue management and prioritization where developers have to understand, classify, prioritize, assign, etc. incoming issues reported by end-users and developers.
This year, we are pleased to announce the first edition of the NLBSE’22 tool competition on issue report classification, an important task in issue management and prioritization.
For the competition, we provide a dataset encompassing more than 800k labeled issue reports (as bugs, enhancements, and questions) extracted from real open-source projects. You are invited to leverage this dataset for evaluating your classification approaches and compare the achieved results against a proposed baseline approach (based on FastText).
Competition overview
We created a Colab notebook with detailed information about the competition (provided data, baseline approach, paper submission, paper format, etc.).
If you want to participate, you must:
Train and tune a multi-class classifier using the provided training set. The classifier should assign exactly one label to each issue.
Evaluate your classifier on the provided test set
Write a paper (4 pages max.) describing:
The architecture and details of the classifier
The procedure used to pre-process the data
The procedure used to tune the classifier on the training set
The results of your classifier on the test set
Additional info.: provide a link to your code/tool with proper documentation on how to run it
Submit the paper by emailing the tool competition organizers (see below)
Submissions will be evaluated and accepted based on correctness and reproducibility, defined by the following criteria:
Clarity and detail of the paper content
Availability of the code/tool, released as open-source
Correct training/tuning/evaluation of your code/tool on the provided data
Clarity of the code documentation
The accepted submissions will be published at the workshop proceedings.
The submissions will be ranked based on the F1 score achieved by the proposed classifiers on the test set, as indicated in the papers.
The submission with the highest F1 score will be the winner of the competition.
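The call does not say which F1 average is used for the ranking. The sketch below therefore computes per-class F1, the macro average and the micro average; picking between them is an assumption to confirm with the organizers. The labels follow the dataset description (bug, enhancement, question), and the predictions are hypothetical.

```python
def f1_scores(y_true, y_pred, labels):
    """Per-class F1, macro-averaged F1 and micro-averaged F1 for single-label predictions."""
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN) to avoid dividing by zero precision.
        per_class[label] = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    macro = sum(per_class.values()) / len(labels)
    # With exactly one true and one predicted label per issue, micro-F1 equals accuracy.
    micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return per_class, macro, micro

# Hypothetical gold labels and predictions for four issues.
y_true = ["bug", "bug", "enhancement", "question"]
y_pred = ["bug", "enhancement", "enhancement", "bug"]
print(f1_scores(y_true, y_pred, ["bug", "enhancement", "question"]))
```

Macro-F1 weights rare classes such as questions as much as bugs, while micro-F1 is dominated by the most frequent class. This is why the choice can change a ranking.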
How to participate?
Email your paper to Oscar Chaparro (oscarch@wm.edu) and Rafael Kallis (rk@rafaelkallis.com) by the submission deadline.
ICPC 2011 - Improving IR-based Traceability Recovery Using Smoothing Filters
1. Improving IR-based Traceability Recovery Using Smoothing Filters
Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto, Annibale Panichella, Sebastiano Panichella
2. Software traceability
“The degree to which a relationship can be established between two products of a software development process”
[IEEE Glossary for Software Terminology]
[Figure: traceability links among use cases, source code, and test cases]
Important for:
program comprehension
requirement tracing
impact analysis
software reuse
…
Up-to-date traceability links rarely exist → need to recover them
3. IR-based traceability recovery
Antoniol et al., 2002 (VSM+Probabilistic model)
Marcus and Maletic, 2003 (LSI)
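VSM-based recovery can be illustrated roughly as follows. This is not the exact weighting or tooling used by Antoniol et al. or in this study. Each artifact is represented as a tf-idf vector, and every source/target pair is ranked by cosine similarity. The highest-ranked pairs are the candidate links. The artifact identifiers and word lists below are hypothetical, and the sketch assumes source and target identifiers are distinct.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_candidate_links(sources, targets):
    """Rank every (source, target) pair by cosine similarity, highest first."""
    ids = list(sources) + list(targets)
    docs = list(sources.values()) + list(targets.values())
    vectors = dict(zip(ids, tfidf_vectors(docs)))
    pairs = [(s, t, cosine(vectors[s], vectors[t])) for s in sources for t in targets]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Hypothetical, already preprocessed artifacts.
sources = {"UC1": "modify visit patient date".split(), "UC2": "login patient password".split()}
targets = {"TC1": "change date visit".split(), "TC2": "login password wrong".split()}
for source, target, score in rank_candidate_links(sources, targets):
    print(source, target, round(score, 3))
```

LSI follows the same ranking step but first projects the term-by-document matrix onto a reduced space obtained by singular value decomposition.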
4. Traditional IR vs. IR applied to Software Engineering
Traditional IR:
Deals with heterogeneous documents with respect to linguistic choices, syntax, and semantics
We just live with those differences
IR applied to SE:
We have sets of homogeneous documents with respect to syntax and linguistic choices
Examples: use cases, test documents, and design documents follow a common template and contain recurrent words
5. Problem
Different kinds of software artifacts require specific preprocessing
Test case C51: Change the date for a visit
Version: 0 02 000
Test description
Input: Select a visit: 26/09/2003 11:00 First visit; Change: 03/10/2003 11:00
Oracle: Invalid sequence: The system does not allow to change a booking
Coverage: Valid classes: CE1 CE8 CE14 CE19 CE21; Invalid classes: None
Use case UcModVis: Satisfies the request to modify a visit for a patient
Priority: High
....
6. Problem
Different kinds of software artifacts require specific preprocessing
Test case C51 and use case UcModVis (as in slide 5)
Artifact-specific words do not bring useful information
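Words that come from the shared template can be surfaced roughly as follows. Count, within one category of artifacts, the terms that occur in (nearly) every artifact. This is an illustrative check, not the preprocessing used in the study. The second test case and the function name are hypothetical.

```python
from collections import Counter

def recurring_terms(artifacts, min_fraction=1.0):
    """Terms that occur in at least `min_fraction` of the given artifacts.
    Tokenization is a naive lowercase whitespace split."""
    df = Counter(term for text in artifacts for term in set(text.lower().split()))
    threshold = min_fraction * len(artifacts)
    return sorted(term for term, count in df.items() if count >= threshold)

test_cases = [
    # Abridged text of test case C51, with punctuation removed.
    "Version 0 02 000 Input Select a visit Oracle Invalid sequence Coverage Valid classes CE1",
    # A hypothetical second test case written with the same template.
    "Version 0 01 000 Input Insert a patient Oracle Patient stored Coverage Valid classes CE3",
]
print(recurring_terms(test_cases))
```

Only template words such as "version", "input", "oracle" and "coverage" survive. Content words like "visit" or "patient" drop out, and these are the words that actually distinguish the two artifacts.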
8. Noisy images
Noise: pixels with peaks of low color intensity and pixels with peaks of high color intensity
9. Reducing noise using smoothing filters
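Mean filter:
$$ g(x, y) = \frac{1}{M} \sum_{f(n,m) \in S} f(n, m) $$
where S is the set of pixels in the neighborhood of (x, y) and M is the number of pixels in S.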
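Below is a minimal sketch of the mean filter for a grayscale image stored as a list of rows. Two choices are assumptions, because the slide does not fix them: a 3×3 window (radius 1), and averaging only the in-bounds neighbours at the borders.

```python
def mean_filter(image, radius=1):
    """g(x, y) = (1/M) * sum of f(n, m) over the window S around (x, y).
    Border pixels average only the neighbours inside the image, so M is smaller there."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            window = [image[n][m]
                      for n in range(max(0, x - radius), min(h, x + radius + 1))
                      for m in range(max(0, y - radius), min(w, y + radius + 1))]
            out[x][y] = sum(window) / len(window)
    return out

# A flat image with one high-intensity noise peak in the centre.
image = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
smoothed = mean_filter(image)
print(smoothed)
```

The peak of 100 is spread over its neighbourhood and drops to 20.0. This is the smoothing effect the slides later transfer to term-by-document matrices.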
10. Image vs. traceability noise
Image noise:
Pixels with high or low color intensity
Pixels are position dependent
Traceability noise:
Terms and linguistic patterns occurring in many artifacts of a given category (use cases, test cases, ...)
Artifacts (columns) are position independent
[Figure: document columns d1, d2 reordered as d2, d1]
11. Representing the noise
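Term-by-document matrix: rows are the n terms; columns are the k source documents $s_1, \ldots, s_k$ followed by the z target documents $t_1, \ldots, t_z$ (column indices $k+1, \ldots, m$, with $m = k + z$):
$$
\begin{array}{c|cccc|cccc}
 & s_1 & s_2 & \cdots & s_k & t_1 & t_2 & \cdots & t_z \\ \hline
\text{word}_1 & v_{1,1} & v_{1,2} & \cdots & v_{1,k} & v_{1,k+1} & v_{1,k+2} & \cdots & v_{1,m} \\
\text{word}_2 & v_{2,1} & v_{2,2} & \cdots & v_{2,k} & v_{2,k+1} & v_{2,k+2} & \cdots & v_{2,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\text{word}_n & v_{n,1} & v_{n,2} & \cdots & v_{n,k} & v_{n,k+1} & v_{n,k+2} & \cdots & v_{n,m}
\end{array}
$$
Source documents: linguistic information strictly belonging to source documents, plus common information for source documents
Target documents: linguistic information strictly belonging to target documents, plus common information for target documents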
12. Representing the noise
Same term-by-document matrix as in slide 11 ($m = k + z$).
Common information for source documents, the mean source vector:
$$ \bar{S} = \frac{1}{k} \left( \sum_{j=1}^{k} v_{1,j},\ \sum_{j=1}^{k} v_{2,j},\ \ldots,\ \sum_{j=1}^{k} v_{n,j} \right)^{T} $$
Common information for target documents, the mean target vector:
$$ \bar{T} = \frac{1}{z} \left( \sum_{j=k+1}^{m} v_{1,j},\ \sum_{j=k+1}^{m} v_{2,j},\ \ldots,\ \sum_{j=k+1}^{m} v_{n,j} \right)^{T} $$
The mean vectors are like the continuous component of a signal…
13. Representing the noise
Same term-by-document matrix as in slide 11. The mean source vector $\bar{S}$ is subtracted from every source document column, and the mean target vector $\bar{T}$ from every target document column ($\bar{S}_i$ and $\bar{T}_i$ denote their i-th components):
$$ \tilde{v}_{i,j} = \begin{cases} v_{i,j} - \bar{S}_i, & 1 \le j \le k \\ v_{i,j} - \bar{T}_i, & k+1 \le j \le m \end{cases} \qquad i = 1, \ldots, n $$
Result: filtered source set and filtered target set
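The filter can be sketched on a term-by-document matrix stored as a list of rows, with the first k columns being source documents. The slides do not say whether negative weights produced by the subtraction are clipped, so this sketch leaves them as they are. The matrix values and the function name are hypothetical.

```python
def smoothing_filter(matrix, k):
    """matrix[i][j] is the weight of term i in document j; columns 0..k-1 are source
    documents and columns k..m-1 are target documents (assumes 0 < k < m).
    Subtracts the mean source vector from every source column and the mean target
    vector from every target column; negative results are kept as-is."""
    m = len(matrix[0])
    z = m - k
    filtered = []
    for row in matrix:
        s_mean = sum(row[:k]) / k   # i-th component of the mean source vector
        t_mean = sum(row[k:]) / z   # i-th component of the mean target vector
        filtered.append([v - s_mean for v in row[:k]] + [v - t_mean for v in row[k:]])
    return filtered

# Hypothetical matrix: 2 source documents, 2 target documents.
matrix = [
    [1, 1, 3, 3],  # template-like term: same weight in every artifact of each category
    [2, 0, 1, 0],  # "visit"
    [0, 2, 0, 1],  # "login"
]
print(smoothing_filter(matrix, k=2))
```

The template-like row becomes all zeros, so that term no longer contributes to the cosine similarity between a source and a target document. Rows that distinguish documents keep their contrast.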
14. Empirical Study
Goal: analyze the effect of the smoothing filter
Purpose: investigate how the filter affects traceability recovery
Quality focus: traceability recovery performance (precision and recall)
Perspective:
Researchers: evaluating the novel technique
Project managers: adopting a better traceability recovery technique
Context: artifacts from two systems, EasyClinic and Pine
15. Context
EasyClinic: medical doctor office management; Java; 37 files/classes; 20 KLOC; 113 documents; artifacts in Italian; use cases, interaction diagrams, source code, test cases
Pine: text-based email client; C; 31 files/classes; 130 KLOC; 100 documents; artifacts in English; requirements, use cases
16. Research Questions and Factors
RQ1: Does the smoothing filter improve the recovery performance of VSM-based traceability recovery?
RQ2: Does the smoothing filter improve the recovery performance of LSI-based traceability recovery?
RQ3: How does the performance vary for different types of artifacts?
Factors:
Use of filter: YES, NO
Technique: VSM, LSI
Artifact: requirements, use cases, interaction diagrams, code, test cases
System: EasyClinic, Pine
17. Analysis Method
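Performance is evaluated by precision and recall:
$$ \text{precision} = \frac{|\text{correct} \cap \text{retrieved}|}{|\text{retrieved}|} \qquad \text{recall} = \frac{|\text{correct} \cap \text{retrieved}|}{|\text{correct}|} $$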
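The two formulas translate directly into set operations over (source, target) links. The link identifiers below are hypothetical.

```python
def precision_recall(retrieved, correct):
    """Precision and recall of a set of retrieved links against the oracle of correct links."""
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)  # |correct ∩ retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall

# Hypothetical links: two of the four retrieved links are in the oracle of three.
retrieved = {("UC1", "TC1"), ("UC1", "TC2"), ("UC2", "TC2"), ("UC2", "TC3")}
correct = {("UC1", "TC1"), ("UC2", "TC2"), ("UC3", "TC4")}
print(precision_recall(retrieved, correct))
```

Moving the similarity threshold down the ranked list trades precision for recall. This is why results are reported as precision/recall curves rather than as single numbers.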
We statistically compare the number of false positives that different methods (M1, M2) retrieve for each correct link identified
[Figure: example counts of false positives per correct link for methods M1 and M2]
Wilcoxon Rank Sum test
Cliff’s delta effect size
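Cliff's delta can be computed directly from its definition: over all pairs (x, y), count how often x > y minus how often x < y, and divide by the number of pairs. The false-positive counts below are hypothetical. For the Wilcoxon rank-sum test itself, a library routine such as SciPy's `scipy.stats.mannwhitneyu` could be used instead of hand-written code.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta in [-1, 1]: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical false positives retrieved before each correct link.
not_filtered = [3, 5, 4, 6]
filtered = [1, 2, 4, 2]
print(cliffs_delta(not_filtered, filtered))
```

A positive delta here means the unfiltered method tends to retrieve more false positives per correct link than the filtered one. Being rank-based, the measure makes no normality assumption, which fits the non-parametric test it accompanies.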
21. EasyClinic: Test cases into source (LSI)
Test cases are:
Short documents
Limited vocabulary
Mostly consistent with
source code
[Figure: precision vs. recall curves, Filtered vs. Not Filtered]
22. Pine: Use cases into requirements (LSI)
[Figure: precision vs. recall curves, Filtered vs. Not Filtered]
24. Link precision improvement
Login Patient vs. Person
Poor vocabulary overlap (10%)
25. Threats to validity
Construct validity
Mainly related to our oracle
Provided by developers and, for EasyClinic, also peer-reviewed
Internal validity
Improvements could be due to other reasons…
However, we compared different techniques (VSM, LSI)
The approach works well regardless of stop word removal/stemming and use of tf-idf
Conclusion validity
Conclusions based on proper (non-parametric) statistics
External validity
We considered systems with different characteristics and artifacts
… but further studies are desirable
26. Conclusions
We proposed the use of a smoothing filter to improve the performance of IR-based traceability recovery
Idea inspired by digital signal processing
The filter significantly improves IR-based traceability recovery based on VSM (RQ1) and LSI (RQ2)
The filter is particularly suitable for artifacts with higher verbosity (RQ3)
e.g., requirements and use cases
It is less useful for artifacts composed of short sentences and using a limited vocabulary
e.g., test cases
27. Work-in-progress
Study replication
Different systems and artifacts
Use of relevance feedback
More sophisticated smoothing techniques
Non-linear filters
Use in other applications of IR to software engineering
e.g., impact analysis or feature location