The document discusses challenges in using off-the-shelf techniques to analyze mailing list archives. It finds that up to 98% of messages contain noise and need additional processing and cleaning. Issues include resolving multiple sender identities in up to 21% of addresses, reconstructing discussion threads from the linear archives, and extracting attachments that make up around 10% of messages.
This document summarizes the findings of an empirical study on the risks of using off-the-shelf techniques for processing mailing list data. The study found that up to 98% of messages in mailing list archives contain noise and require additional processing and cleaning. It also found that resolving multiple sender identities, reconstructing discussion threads, and extracting attachments are challenging due to limitations of current techniques.
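Off-the-shelf thread reconstruction usually relies on the `In-Reply-To` and `References` headers. The minimal Python sketch below makes that assumption, and the function name `reconstruct_threads` is hypothetical. It does not repair the missing or broken headers that make real archives hard to thread, which is where such naive techniques fall short.

```python
from email import message_from_string
from email.message import Message


def reconstruct_threads(raw_messages: list[str]) -> dict[str, list[str]]:
    """Map each message's Message-ID to the Message-IDs of its direct replies.

    Simplified sketch: it trusts In-Reply-To (falling back to the last
    References entry); replies whose parent is not in the archive are
    left unattached.
    """
    msgs: list[Message] = [message_from_string(raw) for raw in raw_messages]
    ids = {m["Message-ID"].strip() for m in msgs if m["Message-ID"]}
    children: dict[str, list[str]] = {mid: [] for mid in ids}
    for m in msgs:
        mid = m["Message-ID"]
        if mid is None:
            continue
        parent = m["In-Reply-To"]
        if parent is None and m["References"]:
            parent = m["References"].split()[-1]  # nearest ancestor
        if parent is not None:
            parent = parent.strip()
            if parent in children:
                children[parent].append(mid.strip())
    return children
```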
This document discusses using Python and the Paver build tool to script common project tasks like configuration, documentation building, packaging, and more. It argues that Python build files are preferable to separate scripts because Python is a powerful, well-defined language that developers already know. The document provides examples of using Paver to define configuration options, namespaces, dynamic values, and tasks.
How DNS wildcards really work & how to prevent that DNS wildcard bite!
Tailored for DNS administrators on Unix and Windows who operate authoritative DNS servers with one or more zone files, as well as anyone interested in the topic.
The document discusses Redis, an open source in-memory data structure store. It highlights Redis' support for fundamental data structures like lists, sets, sorted sets and hashes through a simple yet powerful API. Redis stores data in memory for high performance but also supports persistence to disk for durability through snapshotting or append-only file writes. The document promotes Redis as a data structure server rather than just a simple key-value store.
The document describes the MUD 2010 workshop on mining unstructured data. It provides examples of unstructured data such as websites, diagrams, documents, social media, documentation, help files, source code, bug reports, commit logs, emails, and system logs. Unstructured data is characterized as complex, diverse, and imperfect, because it lacks an explicit structure or format, uses natural language with rich semantics, and has no authoritative representation.
An Empirical Study on Inconsistent Changes to Code Clones at Release Level | Nicolas Bettenburg
This is a talk I gave at the 2009 Working Conference on Reverse Engineering in Lille, France, about our work on the effects that inconsistent changes to code clones have on software quality when observed at the release level.
A Lightweight Approach to Uncover Technical Information in Unstructured Data | Nicolas Bettenburg
This document summarizes a technical paper that presents a lightweight approach to uncover technical information in unstructured data such as bug reports and email discussions. The approach builds on spell checkers and adds heuristics, such as identifying camel-case terms and programming-language keywords, to classify lines of text as technical or not. Evaluation on annotated bug reports and emails shows that the approach achieves a precision of 86-89% and a recall of 68-86% in line classification, outperforming previous state-of-the-art techniques.
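To illustrate the idea, here is a minimal Python sketch of such a line classifier. The word list, keyword list, threshold, and function name are hypothetical stand-ins for a real spell-checker dictionary and for the paper's actual heuristics. The sketch only shows how camel-case detection, keyword matching, and the share of unrecognized words can be combined.

```python
import re

# Hypothetical stand-ins: a real system would use a spell checker's
# dictionary and a full keyword list for the target language.
ENGLISH_WORDS = {"the", "a", "when", "i", "click", "button", "crashes",
                 "app", "this", "is", "bug", "please", "fix", "it"}
JAVA_KEYWORDS = {"public", "private", "static", "void", "null",
                 "import", "throws"}
# lowerUpper... (getFoo) or UpperLowerUpper... (NullPointerException)
CAMEL_CASE = re.compile(
    r"\b[a-z]+[A-Z][A-Za-z0-9]*\b|\b[A-Z][a-z0-9]+[A-Z][A-Za-z0-9]*\b")
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_technical(line: str, unknown_ratio: float = 0.5) -> bool:
    """Classify one line as technical (code, traces, ...) or natural-language prose."""
    if CAMEL_CASE.search(line):
        return True
    tokens = TOKEN.findall(line)
    if not tokens:
        # No word-like tokens at all, e.g. "});": technical unless blank.
        return bool(line.strip())
    if any(t in JAVA_KEYWORDS for t in tokens):
        return True
    # Too many words the "spell checker" does not know -> likely technical.
    unknown = sum(1 for t in tokens if t.lower() not in ENGLISH_WORDS)
    return unknown / len(tokens) > unknown_ratio
```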
Mining Development Repositories to Study the Impact of Collaboration on Softw... | Nicolas Bettenburg
This document proposes an approach to study the impact of collaboration on software systems through mining development repositories. The approach involves:
I. Extracting communication data such as source code comments, emails, and issue discussions from version control systems, mailing lists, and issue tracking systems.
II. Studying the impact of collaboration on software quality by computing social metrics from the extracted communication data and measuring their relationship to post-release defects.
III. Studying the impact of collaboration on the development community by analyzing data on how code contributions are managed, such as feedback and reviews, to understand how contributors, reviewers, and the software are affected by communication.
The document describes a new algorithm that automatically identifies bug-introducing changes by linking bug reports in an error reporting system to code changes in a version control system. It improves on the existing SZZ algorithm by using annotation graphs, ignoring non-code changes, and removing outlier revisions. An evaluation shows that the new algorithm reduces false positives by 36-51% and false negatives by 14% compared to the original SZZ approach.
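The core SZZ step can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm. A flat, hypothetical `blame` map from (file, line) to the last-modifying revision stands in for the annotation graph. The comment filter and the `max_files` outlier threshold are also assumptions chosen for the example.

```python
from collections.abc import Iterable

# (file path, line number in the pre-fix revision, line text)
ChangedLine = tuple[str, int, str]


def is_code_line(text: str) -> bool:
    """Crude non-code filter: drop blank lines and Java-style comment lines."""
    stripped = text.strip()
    return bool(stripped) and not stripped.startswith(("//", "/*", "*/"))


def bug_introducing_revisions(
    changed_lines: Iterable[ChangedLine],
    blame: dict[tuple[str, int], str],
    files_per_revision: dict[str, int],
    max_files: int = 100,
) -> set[str]:
    """Trace lines changed by a bug fix back to the revisions that last touched them."""
    candidates: set[str] = set()
    for path, line_no, text in changed_lines:
        if not is_code_line(text):
            continue  # ignore whitespace- and comment-only changes
        rev = blame.get((path, line_no))
        if rev is None:
            continue  # no annotation for this line
        # Treat very large revisions (e.g. mass reformatting) as outliers.
        if files_per_revision.get(rev, 0) > max_files:
            continue
        candidates.add(rev)
    return candidates
```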
The document discusses different types of code cloning, including intentional cloning through copy-paste and unintentional cloning due to language idioms. It notes that 10-15% of code may be cloned and that cloning can increase maintenance effort. However, cloning may also be used for experimentation without risking existing code or to address bugs through workarounds. The document outlines eight common cloning patterns and suggests that the reasons for duplication should be understood before deciding if refactoring is needed, as cloning is not always harmful.
Finding Paths in Large Spaces - A* and Hierarchical A* | Nicolas Bettenburg
A* search is an informed search algorithm that finds the shortest path between a starting node and a goal node. It uses a heuristic function to estimate the distance from each node to the goal, guiding the search towards the most promising nodes first. A* is optimal if the heuristic is admissible (never overestimates the actual cost to reach the goal). It is also complete and optimally efficient. The algorithm maintains two lists: an open list of nodes still to explore and a closed list of nodes already explored. It repeatedly removes from the open list the node with the lowest estimated total cost, f(n) = g(n) + h(n), where g(n) is the cost of the cheapest known path from the start to n and h(n) is the heuristic estimate from n to the goal, and expands it until the goal is reached.
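A minimal Python sketch on a 4-connected grid with unit move costs and the Manhattan-distance heuristic. These choices are assumptions for illustration, and the hierarchical variant is not shown. The heap plays the role of the open list and `closed` the closed list. Nodes are expanded in order of f(n) = g(n) + h(n).

```python
import heapq
from typing import Optional

Cell = tuple[int, int]


def astar(grid: list[str], start: Cell, goal: Cell) -> Optional[int]:
    """Return the length of the shortest 4-connected path on a grid, or None.

    '#' cells are walls; every move costs 1. The Manhattan distance is
    admissible here, so the first time the goal is popped its cost is optimal.
    """
    def h(cell: Cell) -> int:
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_list: list[tuple[int, int, Cell]] = [(h(start), 0, start)]  # (f, g, cell)
    best_g: dict[Cell, int] = {start: 0}
    closed: set[Cell] = set()
    while open_list:
        _, g, cell = heapq.heappop(open_list)  # lowest f = g + h first
        if cell == goal:
            return g
        if cell in closed:
            continue  # stale entry for an already-expanded node
        closed.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_list, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

For example, on the grid `["....", ".##.", "...."]` the shortest path from `(0, 0)` to `(2, 3)` has length 5.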
Ph.D. Dissertation - Studying the Impact of Developer Communication on the Qu... | Nicolas Bettenburg
Software development is a largely collaborative effort, of which the actual encoding of program logic in source code is a relatively small part. Software developers have to collaborate effectively and communicate with their peers in order to avoid coordination problems. To date, little is known about how developer communication during software development activities impacts the quality and evolution of software.
In this thesis, we present and evaluate tools and techniques to recover communication data from traces of software development activities. With this data, we study the impact of developer communication on the quality and evolution of the software through an in-depth investigation of the role that developer communication plays during software development activities. Through multiple case studies on a broad spectrum of open-source software projects, we find that communication between developers stands in a direct relationship to the quality of the software. Our findings demonstrate that models based on developer communication explain software defects as well as state-of-the-art models based on technical information such as code and process metrics, and that social information metrics are orthogonal to these traditional metrics, leading to a more complete and integrated view of software defects. In addition, we find that communication between developers plays an important role in maintaining a healthy contribution management process, which is one of the key factors in the successful evolution of the software. Source code contributors who are part of the community surrounding open-source projects are available only for limited periods, and long communication times can lead to the loss of valuable contributions.
Our thesis illustrates that software development is an intricate and complex process that is strongly influenced by the social interactions between the stakeholders involved in the development activities. A traditional view based solely on technical aspects of software development, such as source code size and complexity, while valuable, limits our understanding of software development activities. The research presented in this thesis constitutes a first step towards a more holistic view of software development activities.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like anxiety and depression.
Talk given at the ICSM 2008 conference in Beijing, China.
Duplicate bug reports are commonly thought to pollute bug reporting systems and to have negative effects on a development team's productivity. Therefore, duplicate bug reports are ignored once identified. The findings of this research show that duplicate reports actually contain extra information that is not present in the original bug reports, and that developers can potentially benefit from this information. We conduct experiments and a case study on ECLIPSE to quantify the amount of extra information. We show that this extra information can be used to enhance techniques related to bug fixing, such as triaging.
Accuracy measures the percentage of correct predictions out of the total number of predictions. Precision measures the percentage of positive predictions that were actually correct. Recall measures the percentage of positive cases that were correctly identified.
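Writing the three measures in terms of true and false positives and negatives (TP, FP, TN, FN) makes the difference explicit: accuracy = (TP + TN) / (TP + FP + TN + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The small Python helper below returns 0.0 when a denominator is zero, a convention chosen here and not a universal standard.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision and recall (as fractions) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of everything predicted positive, how much really was positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything that really was positive, how much did we find?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For TP=6, FP=2, TN=90, FN=2, accuracy is 0.96 while precision and recall are both 0.75. This is why accuracy alone can look good on imbalanced data.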
This document summarizes MongoDB usage at MapMyFitness from a DevOps perspective. It describes how MongoDB is used for storing routes, sessions, live tracking data and API logging. It provides examples of implementation patterns like replica sets, sharding and automated provisioning. It also covers topics like monitoring, security, maintenance and lessons learned.
Crunching data with Go: Tips, tricks, use cases | Sergii Khomenko
Talk given at the first meetup of the Munich Golang User Group. It describes use cases from real-world Go development, covering fetching data from an SQL database, connecting to Google services such as Google Analytics and Google BigQuery, and other aspects of building a geolocation application.
The document summarizes a presentation given by Jeff Hammerbacher on Hadoop and Cloudera. The presentation covered an introduction to Hadoop including HDFS, MapReduce and other subprojects. It discussed how Hadoop is used at large companies like Facebook and Yahoo to manage and analyze large amounts of data. It also provided an overview of Cloudera's Hadoop distribution and services to support enterprises using Hadoop.
The document contains log information from web server requests and access logs. It includes details like IP addresses, dates, request URLs, response codes, user agents, and referer URLs from multiple requests. The logs show information about requests for book information and search redirects.
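The summary does not name the log format. If the logs use the common Apache/Nginx "combined" format, which is an assumption here, each line can be split into the fields listed above with a single regular expression. A minimal Python sketch, with a hypothetical `parse_line` helper:

```python
import re
from typing import Optional

# Apache/Nginx "combined" log format (assumed, not stated in the source).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict[str, str]]:
    """Split one access-log line into named fields, or return None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```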
This document summarizes the BRASIL distributed computing framework. It discusses the motivation for BRASIL to support large-scale data-intensive computing problems. It then provides overviews of the key techniques used in BRASIL, including its PUSH dataflow model, file system interfaces, command line tools, and microbenchmarks evaluating performance.
Follow a Firefox crash from its genesis in a collapsing browser process through the dizzying array of collection, storage, and reporting systems that make up Socorro, our open-source crash collector. Enjoy war stories of weird, interlocking failures, and see how we nevertheless continue to fulfill our mandate: “Never lose a crash.” Observe some patterns that emerged from this system which can be useful in yours.
Oracle 10.2.0.1 executed the full table scan of table T2 by reading blocks sequentially one by one due to an empty buffer cache, while Oracle 11.2.0.3 was able to read multiple blocks together using multiblock reads to populate the buffer cache more quickly. The full table scan performance was similar between the two versions, but Oracle 11.2.0.3 optimized the physical I/O by reading blocks in larger sets through multiblock reads.
Cloudflare jgc bigo meetup rolling hashes | Cloudflare
Rolling and extendable hashes allow hash values to be updated efficiently when a string or other data is extended. Cyclic redundancy checks (CRCs) are commonly used as rolling hashes because they are fast to compute. Rabin-Karp hashes also allow efficient rolling updates through arithmetic modulo a prime. Both CRCs and Rabin-Karp hashes enable fast substring search by sliding the hash along the string and comparing values. Rsync uses a similar rolling hash approach, along with MD5 hashes, to synchronize files between systems efficiently.
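The Rabin-Karp variant can be sketched in a few lines of Python. The window is treated as a base-256 number modulo a large prime, and both constants are chosen here for illustration. Sliding the window by one character removes the leading character's contribution and appends the next one in constant time. Matching hashes are confirmed with a direct comparison because different strings can collide.

```python
BASE = 256
MOD = 1_000_000_007  # a large prime


def rabin_karp(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text, using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the character leaving the window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        w_hash = (w_hash * BASE + ord(text[i])) % MOD
    hits = []
    for start in range(n - m + 1):
        # Equal hashes may collide, so confirm with a direct comparison.
        if w_hash == p_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start + m < n:
            # Roll: drop text[start], shift, append text[start + m] in O(1).
            w_hash = (w_hash - ord(text[start]) * high) % MOD
            w_hash = (w_hash * BASE + ord(text[start + m])) % MOD
    return hits
```

For example, `rabin_karp("abracadabra", "abra")` finds matches at indices 0 and 7.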
Дмитрий Щадей, "What Helps Us Write High-Quality JavaScript Code?" | Yandex
Why is bad code at Yandex easier said than written? A talk about how we keep a high level of code ownership on large projects without homework assignments to read other people's files, and about which third-party tools we use to do it. Perhaps our experience will help you too.
The document discusses new features in Groovy 2.0, including alignment with JDK 7 such as the Project Coin language changes and invokedynamic support. It also discusses continued runtime performance improvements, static type checking, static compilation, and modularity. Additionally, the document provides an overview of improvements and enhancements to command chains, closures, and parallel programming support in Groovy 1.8.
This document summarizes the Go programming language. It was created by Google in 2007 and announced in 2009. Go is intended for systems programming and features garbage collection, static typing, and built-in concurrency with goroutines and channels. It aims to have high compilation speed and a syntax familiar to C/C++/Java programmers. Concurrency in Go is based on lightweight goroutines and channel-based communication between them.
The document is a presentation by Jeff Hammerbacher from Cloudera about Hadoop and how it is used at large companies like Facebook and Yahoo to manage massive amounts of data. It provides an overview of what Hadoop is, how it works, and some of the major components and subprojects. It also discusses how Hadoop is used at Facebook and Yahoo to process petabytes of data every day and support thousands of jobs. Finally, it describes Cloudera's distribution of Hadoop and the services and support they provide to enterprises.
Spark After Dark - LA Apache Spark Users Group - Feb 2015 | Chris Fregly
Spark After Dark is a mock dating site that uses the latest Spark libraries including Spark SQL, BlinkDB, Tachyon, Spark Streaming, MLlib, and GraphX to generate high-quality dating recommendations for its members and blazing fast analytics for its operators.
We begin with a brief overview of Spark, the Spark libraries, and Spark use cases. In addition, we'll discuss the modern-day Lambda Architecture, which combines real-time and batch processing into a single system. Lastly, we present best practices for monitoring and tuning a highly available Spark and Spark Streaming cluster.
There will be many live demos covering everything from basic topics such as ETL and data ingestion to advanced topics such as streaming, sampling, approximations, machine learning, textual analysis, and graph processing.
Spark after Dark by Chris Fregly of Databricks | Data Con LA
Spark After Dark is a mock dating site that uses the latest Spark libraries, AWS Kinesis, Lambda Architecture, and Probabilistic Data Structures to generate dating recommendations.
There will be 5+ demos covering everything from basic data ETL to advanced data processing including Alternating Least Squares Machine Learning/Collaborative Filtering and PageRank Graph Processing.
There is heavy emphasis on Spark Streaming and AWS Kinesis.
Watch the video here
https://www.youtube.com/watch?v=g0i_d8YT-Bs
The document discusses using the command line as a productivity tool. It presents bash as a powerful tool for automating tasks and introduces many useful commands and concepts, including redirection, pipes, variables, conditionals, loops, and scripting. It also summarizes tools for developers such as Homebrew, Git, Xcode, xcpretty and xctool.
Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It provides reliable storage through its distributed file system and scalable processing through its MapReduce programming model. Yahoo! uses Hadoop extensively for applications like log analysis, content optimization, and computational advertising, processing over 6 petabytes of data across 40,000 machines daily.
London Spark Meetup Project Tungsten Oct 12 2015 | Chris Fregly
Building on a previous talk about how Spark beat Hadoop at the 100TB Daytona GraySort, we present low-level details of Project Tungsten, which includes many CPU and memory optimizations.
NGS Informatics and Interpretation - Hardware Considerations by Michael McManus | Knome_Inc
View this webinar at: http://www.knome.com/webinar-ngs-informatics-and-interpretation-hardware-considerations. In this presentation, Knome’s Senior Vice President of Operations, Michael McManus, PhD, will review the k100 and k25 hardware models of the knoSYS including servers, storage, networks, and power components. While doing so, he will answer:
- Why would someone purchase hardware when they can process NGS data on the cloud?
- For an organization not interested in using the cloud, what sort of hardware should be considered?
- What hardware specifications are needed for conducting align + call (FASTQ and/or BAM files) versus interpretation (VCF files)?
- Is all hardware alike? How does someone compare systems apples-to-apples?
This document provides an overview of PowerDNS, an open-source authoritative DNS server. It discusses PowerDNS' backends for storing zone data like BIND and SQL databases, its DNSSEC signing capabilities, and its support for remote and Lua scripting backends. It also demonstrates configuring PowerDNS to use BIND, MySQL, and remote backends. Finally, it summarizes Men & Mice's generic PowerDNS controller and upcoming training courses.
Similar to An Empirical Study on the Risks of Using Off-the-Shelf Techniques for Processing Mailing List Data (20)
10 Year Impact Award Presentation - Duplicate Bug Reports Considered Harmful ...Nicolas Bettenburg
This document describes a new automated method called SEVERIS that assists NASA test engineers in assigning severity levels to defect reports. SEVERIS uses text mining and machine learning techniques on NASA's Project and Issue Tracking System (PITS) database to predict issue severities. A case study found that SEVERIS accurately predicts severities and provides probability estimates, helping guide decision making during the severity assessment process.
Using Fuzzy Code Search to Link Code Fragments in Discussions to Source CodeNicolas Bettenburg
Talk on Using Fuzzy Code Search to Link Code Fragments in Discussions to Source Code, given at the 16th European Conference on Software Maintenance and Reengineering (CSMR'12) in Hungary.
Managing Community Contributions: Lessons Learned from a Case Study on Andro...Nicolas Bettenburg
This document summarizes a case study comparing community contribution processes for Android and Linux. It finds that Android actively worked to provide faster feedback on contributions, typically responding within days rather than weeks as with Linux. Android also centralized the contribution process within a web application rather than using email lists. The study also found that most contributions targeted major subsystems, with acceptance rates between 50-91%, while some Android subsystems had very low acceptance due to being more sensitive. The goal for Android was to keep users engaged by providing rapid feedback on contributions.
The document discusses approximation algorithms for NP-complete problems. It introduces the idea of finding near-optimal solutions in polynomial time for problems where optimal solutions cannot be found efficiently. It provides examples of the vertex cover problem and set cover problem, describing greedy approximation algorithms that provide performance guarantees for finding near-optimal solutions for these problems. The document also discusses some open questions around whether these approximation ratios can be improved.
This document discusses models for predicting customer perceptions of software quality based on factors collected within the first three months of installation. Logistic regression is used to model rare, high-impact software failures based on variables like system size, software upgrades, operating system, etc. Linear regression is used to model frequent, low-impact customer interactions like calls based on similar predictor variables. The models found most predictors to be statistically significant due to the large sample size.
The bug report describes an issue where entering an invalid value for a BigDecimal property would cause the editor to lock up until restoring the default value. A patch was proposed to handle exceptions thrown by the BigDecimal constructor better by checking for a null error message and returning an alternative message or stack trace. The patch was committed to fix the problem. The document contains the bug report details, code snippets, and discussion between the reporter and assignee.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help relax the body and lift the mood.
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...TechSoup
Whether you're new to SEO or looking to refine your existing strategies, this webinar will provide you with actionable insights and practical tips to elevate your nonprofit's online presence.
Chapter wise All Notes of First year Basic Civil Engineering.pptxDenish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to objective, scope and outcome the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...EduSkills OECD
Andreas Schleicher, Director of Education and Skills at the OECD presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder. Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...indexPub
The recent surge in pro-Palestine student activism has prompted significant responses from universities, ranging from negotiations and divestment commitments to increased transparency about investments in companies supporting the war on Gaza. This activism has led to the cessation of student encampments but also highlighted the substantial sacrifices made by students, including academic disruptions and personal risks. The primary drivers of these protests are poor university administration, lack of transparency, and inadequate communication between officials and students. This study examines the profound emotional, psychological, and professional impacts on students engaged in pro-Palestine protests, focusing on Generation Z's (Gen-Z) activism dynamics. This paper explores the significant sacrifices made by these students and even the professors supporting the pro-Palestine movement, with a focus on recent global movements. Through an in-depth analysis of printed and electronic media, the study examines the impacts of these sacrifices on the academic and personal lives of those involved. The paper highlights examples from various universities, demonstrating student activism's long-term and short-term effects, including disciplinary actions, social backlash, and career implications. The researchers also explore the broader implications of student sacrifices. The findings reveal that these sacrifices are driven by a profound commitment to justice and human rights, and are influenced by the increasing availability of information, peer interactions, and personal convictions. The study also discusses the broader implications of this activism, comparing it to historical precedents and assessing its potential to influence policy and public opinion. The emotional and psychological toll on student activists is significant, but their sense of purpose and community support mitigates some of these challenges. 
However, the researchers call for acknowledging the broader Impact of these sacrifices on the future global movement of FreePalestine.
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
An Empirical Study on the Risks of Using Off-the-Shelf Techniques for Processing Mailing List Data
1. An Empirical Study on the Risks of Using Off-the-Shelf Techniques for Processing Mailing List Data
Nicolas Bettenburg, Emad Shihab, Ahmed E. Hassan
Queen’s University, Canada
4. The Importance of Mailing List Archives
• Email popular form of communication
• Mailing lists to distribute messages
• Messages contain valuable information
  • Discussions of source code
  • Development decisions
  • Error reports
  • User support requests
5. Mining the Mailing Lists of 23 Open-Source Projects
• Summarizing developer mailing lists
• Using off-the-shelf tools
• Data from around 500,000 emails
• Unexpected results from experiments
8. While Mining the Mailing Lists of 23 Open-Source Projects
• Don't treat mail archives as textual data
• Changing technologies
• Up to 98% of messages contain noise
Additional processing and cleaning needed!
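The warning above against treating mail archives as plain text can be made concrete. The slides do not name a parser; a minimal sketch, assuming the archive is a standard mbox file, uses Python's standard `mailbox` module so that each message's headers and MIME structure stay accessible instead of being flattened into text. The sample messages and the `load_messages` helper are hypothetical.

```python
import mailbox
import os
import tempfile

# Two hypothetical messages in mbox format; each starts with a "From " separator line.
SAMPLE_MBOX = """From alice@example.org Mon Jan 27 12:50:44 1997
From: Alice <alice@example.org>
Subject: configure fails on m88k
Message-ID: <1@example.org>

Has anyone seen this?

From bob@example.org Mon Jan 27 13:05:10 1997
From: Bob <bob@example.org>
Subject: Re: configure fails on m88k
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>

> Has anyone seen this?
Yes, send me the output.
"""


def load_messages(path):
    """Parse an mbox file into structured records rather than raw text."""
    box = mailbox.mbox(path)
    try:
        return [
            {
                "from": msg["From"],
                "subject": msg["Subject"],
                "message_id": msg["Message-ID"],
                "multipart": msg.is_multipart(),  # True when MIME parts (e.g. attachments) exist
            }
            for msg in box
        ]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w") as fh:
            fh.write(SAMPLE_MBOX)
        for record in load_messages(path):
            print(record)
```

Parsing into records first means later cleaning steps (threading, attachment extraction, signature removal) can operate on specific headers and parts rather than on pattern matches over raw text.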
10. From geek+@cmu.edu Wed Jan 21 08:11:26 1998
Date: Mon, 27 Jan 1997 12:50:44 -0500 (EST)
From: "Brian E. Gallew" <geek+@cmu.edu>
Subject: Re: [HACKERS] configure
- ---559023410-851401618-854387445=:824
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII
> If you can grab a copy and run it on your machine, and send me
> the output, that would help alot.
Here is a gzip'ed tar of the results.
=====================================================================
| Please do not shoot at the thermonuclear weapons! -- Deacon |
=====================================================================
| Finger geek@andrew.cmu.edu for my public key. |
=====================================================================
- ---559023410-851401618-854387445=:824
Content-Type: APPLICATION/x-gzip
Content-Transfer-Encoding: BASE64
Content-Description: m88k-dg-dgux5.4R3.10.tar.gz
H4sIAHDq7DICA+xba3vaSLLOV/MrepzsGHiQuNomOJ4MdnDMrC8csB17HQ8W
UgM9FpJWFxsmyX8/Vd0tIYGwye5kP+w5fp4EaHW9XV1dXbdu6bY1ZCNV1/Qx
ffWD/sql0k6tRl4RUq6IT1KWn4R/L5UI2ansVirVSrlWhpZqqVZ5RUqv/gN/
gedrLiGvHNvzRy71VvUbUYu6mvnqv+zvNbkYM48MmUkJfGrEG1PTJJ7uMscn
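The example message already shows why structure matters: its mbox separator line (`From geek+@cmu.edu Wed Jan 21 08:11:26 1998`) carries a different timestamp than the message's own `Date` header (27 January 1997). A minimal sketch using Python's standard `email` package, assuming only the header lines shown in the example, keeps the two apart and splits the sender's display name from the address.

```python
from datetime import datetime
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# The mbox separator line from the example: archive metadata, not a message header.
ENVELOPE = "From geek+@cmu.edu Wed Jan 21 08:11:26 1998"

# The example's header block; the blank line marks the end of the headers.
HEADERS = (
    "Date: Mon, 27 Jan 1997 12:50:44 -0500 (EST)\n"
    'From: "Brian E. Gallew" <geek+@cmu.edu>\n'
    "Subject: Re: [HACKERS] configure\n"
    "\n"
)

msg = message_from_string(HEADERS)
name, address = parseaddr(msg["From"])     # display name and bare address
sent = parsedate_to_datetime(msg["Date"])  # timezone-aware datetime from the Date header

# After "From <address>", the separator line holds a ctime-style timestamp.
envelope_sent = datetime.strptime(" ".join(ENVELOPE.split()[2:]), "%a %b %d %H:%M:%S %Y")

if __name__ == "__main__":
    print(name, address)
    print("Date header:   ", sent.isoformat())
    print("mbox separator:", envelope_sent.isoformat())
```

Which of the two timestamps an analysis uses is a choice worth making explicitly, since a text-level tool would see both as just another date string.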
12. Resolving Multiple Sender Identities
• Participants send mail from different addresses
• Up to 21% of addresses are aliases
• Such aliases bias identity-based analyses
• Manual inspection and correction tedious
• No fully automated approach to resolve identities
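The slide notes that no fully automated approach resolves identities. As an illustration only, and not the authors' method, a naive heuristic groups sender addresses whose display names normalize to the same key. The helper names and sample addresses are hypothetical.

```python
import re
from collections import defaultdict
from email.utils import parseaddr


def normalize_name(display_name):
    """Lowercase, keep ASCII letter runs, drop single-letter initials, sort tokens.

    Sorting makes "Doe, Jane" and "Jane Doe" produce the same key.
    """
    tokens = re.findall(r"[a-z]+", display_name.lower())
    return " ".join(sorted(t for t in tokens if len(t) > 1))


def group_identities(from_headers):
    """Group sender addresses whose display names normalize to the same key."""
    groups = defaultdict(set)
    for header in from_headers:
        name, address = parseaddr(header)
        address = address.lower()
        key = normalize_name(name) or address  # no usable name: the address stands alone
        groups[key].add(address)
    return dict(groups)


if __name__ == "__main__":
    headers = [
        '"Jane Q. Doe" <jdoe@example.org>',
        '"Doe, Jane" <jane@example.com>',
        "Jane Doe <jane.doe@example.net>",
        "bob@example.org",
    ]
    for key, addresses in group_identities(headers).items():
        print(key, sorted(addresses))
```

This heuristic both over-merges (two different people named Jane Doe) and under-merges (an alias sent without a display name, or a name containing non-ASCII letters), so its output still needs the manual inspection the slide describes.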
13. Reconstructing Discussion Threads
• Mail stored sequentially in archives
• Logical grouping: discussion topics
• Required information erroneous or missing
• Essential for social network and topic analysis
[Figure: messages A, B, C and D shown as a linear sequence (as stored in the archive) and as a thread hierarchy]
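A minimal sketch of thread reconstruction, assuming messages carry the standard `Message-ID`, `In-Reply-To` and `References` headers: use those headers when present and fall back to matching the subject (with "Re:"-style prefixes stripped) when they are missing. Because the slide notes that this information is often erroneous or missing, the fallback is only a guess. This is not the authors' method; the helper names are hypothetical.

```python
import re
from email import message_from_string

# Leading reply/forward prefixes such as "Re:", "RE: Re:", "Fwd:" (a heuristic list).
REPLY_PREFIX = re.compile(r"^\s*((re|aw|fwd?)\s*:\s*)+", re.IGNORECASE)
# Message identifiers are written in angle brackets, e.g. <1@example.org>.
MSG_ID = re.compile(r"<[^<>\s]+>")


def base_subject(subject):
    """Normalize a subject by stripping reply/forward prefixes and case."""
    return REPLY_PREFIX.sub("", subject or "").strip().lower()


def find_parents(raw_messages):
    """Map each Message-ID to its guessed parent ID (None for thread roots).

    Messages must be supplied in archive (chronological) order.
    """
    parents = {}
    first_with_subject = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = (msg["Message-ID"] or "").strip()
        in_reply_to = MSG_ID.findall(msg["In-Reply-To"] or "")
        references = MSG_ID.findall(msg["References"] or "")
        subject = base_subject(msg["Subject"])

        if in_reply_to:
            parent = in_reply_to[0]  # explicit reply header
        elif references:
            parent = references[-1]  # last reference is the direct parent
        else:
            parent = first_with_subject.get(subject)  # fallback guess by subject

        parents[mid] = parent
        first_with_subject.setdefault(subject, mid)
    return parents


if __name__ == "__main__":
    archive = [
        "Message-ID: <1@x>\nSubject: configure\n\nbody\n",
        "Message-ID: <2@x>\nIn-Reply-To: <1@x>\nSubject: Re: configure\n\nbody\n",
        "Message-ID: <3@x>\nReferences: <1@x> <2@x>\nSubject: Re: configure\n\nbody\n",
        "Message-ID: <4@x>\nSubject: RE: Configure\n\nbody\n",
        "Message-ID: <5@x>\nSubject: unrelated\n\nbody\n",
    ]
    for child, parent in find_parents(archive).items():
        print(child, "->", parent)
```

The subject fallback can wrongly join unrelated discussions that happen to share a subject line, which is one reason reconstructed threads can bias social network and topic analyses.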
16. Attachments
• MIME standard defines extensions to email
• Binary data encoded as text
• Around 10% of messages have attachments
• Extract attachments and store separately
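The slide recommends extracting attachments and storing them separately. A minimal sketch using Python's standard `email` package separates the text body from MIME attachments; the sample message, modeled on the example's gzip attachment, and the helper names are hypothetical. Real archives may need cleanup first: in the example message, the boundary lines appear with a leading "- ", which a standard MIME parser would probably not recognize as a boundary.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def split_attachments(raw_bytes):
    """Return (body_text, [(filename, content_type, decoded_bytes), ...])."""
    msg = message_from_bytes(raw_bytes, policy=default)  # yields an EmailMessage
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        (part.get_filename(), part.get_content_type(), part.get_payload(decode=True))
        for part in msg.iter_attachments()  # sub-parts that are not the body
    ]
    return body, attachments


def build_sample():
    """Build a hypothetical multipart message with one base64-encoded attachment."""
    msg = EmailMessage()
    msg["From"] = "dev@example.org"
    msg["Subject"] = "Re: configure"
    msg.set_content("Here is a gzip'ed tar of the results.\n")
    msg.add_attachment(
        b"\x1f\x8b fake gzip bytes",
        maintype="application",
        subtype="x-gzip",
        filename="results.tar.gz",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    body, attachments = split_attachments(build_sample())
    print(body.strip())
    for filename, ctype, data in attachments:
        print(filename, ctype, len(data), "bytes")
```

Decoding the payload before storing it keeps base64 text out of the message body, so text-mining tools do not treat encoded binary data as words.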
19. Quotes and Signatures
• Duplicate information
• Unrelated to actual message
• Removing signatures is challenging
• Quoted text may or may not be desirable
• Signatures impact text mining approaches
• No perfect method for signature removal
[Figure: the signature block from the example message, shown distorted]
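The slide notes that there is no perfect method for signature removal. A minimal heuristic sketch, not the authors' method, drops lines quoted with `>` and cuts everything after the conventional `-- ` signature delimiter. Run on the example message's body, it removes the quoted text but keeps the boxed `=====` signature, which illustrates the slide's point. The function name is hypothetical.

```python
def strip_quotes_and_signature(body, keep_quotes=False):
    """Remove '>'-quoted lines (unless keep_quotes) and any '-- ' signature.

    Only the conventional delimiter is recognized: a line that is exactly
    "--" once trailing whitespace is removed. Boxed or free-form signatures
    pass through untouched.
    """
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--":
            break  # everything after the delimiter is treated as signature
        if not keep_quotes and line.lstrip().startswith(">"):
            continue  # quoted text from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


# Body text of the example message (the attachment part is omitted).
EXAMPLE_BODY = (
    "> If you can grab a copy and run it on your machine, and send me\n"
    "> the output, that would help alot.\n"
    "Here is a gzip'ed tar of the results.\n"
    "=====================================================================\n"
    "| Please do not shoot at the thermonuclear weapons! -- Deacon |\n"
    "=====================================================================\n"
    "| Finger geek@andrew.cmu.edu for my public key. |\n"
    "=====================================================================\n"
)

if __name__ == "__main__":
    print(strip_quotes_and_signature(EXAMPLE_BODY))
```

The `keep_quotes` flag reflects the slide's point that quoted text may or may not be desirable, depending on whether an analysis wants each message to stand alone or to keep its conversational context.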
21. (1) Mailing lists contain valuable information on a project.
(2) Data needs pre-processing before applying traditional tools.
(3) Manual data processing is often not feasible or requires much effort.
(4) Off-the-shelf tools were not designed to prepare data for mining.