Finding Help with Programming Errors: An Exploratory Study of Novice Software...
Preetha Chatterjee
Each month, 50 million users visit Stack Overflow, a popular Q&A forum for software developers, to share knowledge and seek help with coding problems. Although Q&A forums are a good resource for seeking help from developers beyond the local team, the abundance of information can cause developers, especially novice software engineers, to spend considerable time identifying relevant answers and suitable suggested fixes.
This exploratory study aims to understand how novice software engineers direct their efforts and what kinds of information they focus on within a post selected from the results returned in response to a search query on Stack Overflow. The results can be leveraged to improve the Q&A forum interface, guide tools for mining forums, and potentially improve granularity of traceability mappings involving forum posts. We qualitatively analyze the novice software engineers’ perceptions from a survey as well as their annotations of a set of Stack Overflow posts. Our results indicate that novice software engineers pay attention to only 27% of code and 15-21% of text in a Stack Overflow post to understand and determine how to apply the relevant information to their context. Our results also discern the kinds of information prominent in that focus.
Extracting Archival-Quality Information from Software-Related Chats
Preetha Chatterjee
Software developers are increasingly having conversations about software development via online chat services. Many of those chat communications contain valuable information, such as code descriptions, good programming practices, and causes of common errors/exceptions. However, the nature of chat community content is transient, as opposed to the archival nature of other developer communications such as email, bug reports and Q&A forums. As a result, important information and advice are lost over time.
The focus of this dissertation is Extracting Archival Information from Software-Related Chats, specifically to (1) automatically identify conversations that contain archival-quality information, (2) accurately reduce the granularity of the information reported as archival, and (3) conduct a case study to investigate how archival-quality information extracted from chats compares to related posts in Q&A forums. Knowledge archived from developer chats could potentially be used in several applications, such as creating a new archival mechanism for a given chat community, augmenting Q&A forums, or facilitating the mining of specific information to improve software maintenance tools.
Proactive Empirical Assessment of New Language Feature Adoption via Automated...
Raffi Khatchadourian
Programming languages and platforms improve over time, sometimes resulting in new language features that offer many benefits. However, despite these benefits, developers may not always be willing to adopt them in their projects for various reasons. In this paper, we describe an empirical study where we assess the adoption of a particular new language feature. Studying how developers use (or do not use) new language features is important in programming language research and engineering because it gives designers insight into the usability of the language for creating meaningful programs in that language. This knowledge, in turn, can drive future innovations in the area. Here, we explore Java 8 default methods, which allow interfaces to contain (instance) method implementations.
Default methods can ease interface evolution, make certain ubiquitous design patterns redundant, and improve both modularity and maintainability. A focus of this work is to discover, through a scientific approach and a novel technique, situations where developers found these constructs useful and where they did not, and the reasons for each. Although several studies center around assessing new language features, to the best of our knowledge, this kind of construct has not been previously considered.
Despite their benefits, we found that developers did not adopt default methods in all situations. Our study consisted of submitting pull requests introducing the language feature to 19 real-world, open source Java projects without altering original program semantics. This novel assessment technique is proactive in that the adoption was driven by an automatic refactoring approach rather than waiting for developers to discover and integrate the feature themselves. In this way, we set forth best practices and patterns of using the language feature effectively earlier rather than later and are able to possibly guide (near) future language evolution. We foresee this technique to be useful in assessing other new language features, design patterns, and other programming idioms.
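To make the construct under study concrete, here is a minimal sketch of a Java 8 default method; the `Greeter` interface and its implementer are invented for illustration. Adding `greetLoudly` as a default method evolves the interface without breaking existing implementers, which is the interface-evolution benefit the abstract describes.

```java
// A hypothetical interface evolved with a Java 8 default method.
interface Greeter {
    String greet(String name);

    // Default method: existing implementers such as EnglishGreeter
    // inherit this behavior without any change to their source.
    default String greetLoudly(String name) {
        return greet(name).toUpperCase();
    }
}

class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        Greeter g = new EnglishGreeter();
        System.out.println(g.greet("Ada"));       // Hello, Ada
        System.out.println(g.greetLoudly("Ada")); // HELLO, ADA
    }
}
```

Before default methods, adding `greetLoudly` to `Greeter` would have forced every implementing class to supply a body, which is why such additions often went instead into ubiquitous "skeletal implementation" classes.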
GPT-2: Language Models are Unsupervised Multitask Learners
Young Seok Kim
Review of the paper "Language Models are Unsupervised Multitask Learners" (GPT-2) by Alec Radford et al.
Paper link: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
YouTube presentation: https://youtu.be/f5zULULWUwM
(Slides are in English, but the presentation is in Korean)
This presentation talks about Natural Language Processing using Java. At Museaic, a music intelligence platform, we spent time figuring out how to extract central themes from song lyrics. In this talk, I will cover some of the tasks involved in natural language processing, such as named entity recognition, word sense disambiguation, and concept/theme extraction. I will also cover libraries available in Java such as stanford-nlp and dbpedia-spotlight, and graph approaches using WordNet and semantic databases. This talk will help people understand text processing beyond simple keyword approaches and provide them with some of the best techniques/libraries for it in the Java world.
Building a Dynamic Bidding system for a location based Display advertising Pl...
Ekta Grover
Experimentation to Productization: Building a Dynamic Bidding System for a location-aware ecosystem. Slides from my Fifth Elephant talk, Bangalore, 2014.
Illustrated Code: Building Software in a Literate Way
Andreas Zeller, CISPA Helmholtz Center for Information Security
Notebooks – rich, interactive documents that join together code, documentation, and outputs – are all the rage with data scientists. But can they be used for actual software development? In this talk, I share experiences from authoring two interactive textbooks – fuzzingbook.org and debuggingbook.org – and show how notebooks not only serve for exploring and explaining code and data, but also how they can be used as software modules, integrating self-checking documentation, tests, and tutorials all in one place. The resulting software focuses on the essential, is well-documented, highly maintainable, easily extensible, and has a much higher shelf life than the "duct tape and wire" prototypes frequently found in research and beyond.
Trend detection and analysis on Twitter
Lukas Masuch
By Henning Muszynski, Benjamin Räthlein & Lukas Masuch
The popularity of social media services has increased exponentially in the last few years. The combination of big social data and powerful analytical technologies makes it possible to gain highly valuable insights that otherwise might not be accessible. The Twitter Analyzer comprises several components to collect, analyze, and visualize Twitter data, and we explored various related technologies to implement this tool. We collected about 38 million English tweets on various topics and analyzed those data with machine learning techniques to compute the respective sentiment and detect common topics. Furthermore, we visualized the results using varying visualization techniques to emphasize different aspects, such as a word cloud, several chart types, and geospatial visualizations. Technologies used: MongoDB, Python, Twython, Python NLTK, wordcloud2.js, wordfreq, amCharts, Google BigQuery, Google Cloud Storage, CartoDB, EtcML.
Can Deep Learning solve the Sentiment Analysis Problem?
Mark Cieliebak
Sentiment analysis appears to be one of the easier tasks in the realm of text analytics: given a text like a tweet or product review, decide whether it contains positive or negative opinion. This task is almost trivial for humans, but it turns out to be a true challenge for automated systems. In fact, state-of-the-art sentiment analysis tools are wrong on approx. 4 out of 10 documents.
Current sentiment analysis tools are rule-based, feature-based, or combinations of both. However, recent research uses deep learning on very large sets of documents.
In this talk, we will explain the intrinsic difficulties of automated sentiment analysis; present existing solution approaches and their performance; describe an architecture for a deep learning system; and explore whether deep learning can improve sentiment analysis accuracy.
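To make the rule-based baseline mentioned above concrete, here is a minimal lexicon-based scorer; the word lists are toy examples invented for illustration, not a real sentiment lexicon, and real tools add negation handling, intensifiers, and many more rules.

```java
import java.util.*;

// Minimal lexicon-based sentiment scorer: count positive and negative
// lexicon hits in the text and report the sign of the difference.
public class LexiconSentiment {
    private static final Set<String> POSITIVE =
        new HashSet<>(Arrays.asList("good", "great", "love", "excellent"));
    private static final Set<String> NEGATIVE =
        new HashSet<>(Arrays.asList("bad", "terrible", "hate", "poor"));

    // Returns +1 (positive), -1 (negative), or 0 (neutral/mixed).
    public static int score(String text) {
        int score = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        return Integer.signum(score);
    }

    public static void main(String[] args) {
        System.out.println(score("I love this excellent product")); // 1
        System.out.println(score("Terrible battery, I hate it"));   // -1
    }
}
```

A scorer like this fails on exactly the cases the talk highlights, e.g. "not good at all" still counts one positive hit, which is one intuition for why roughly 4 out of 10 documents are misclassified by simple approaches.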
The task of keyword extraction is to automatically identify a set of terms that best describe the document. Automatic keyword extraction establishes a foundation for various natural language processing applications: information retrieval, the automatic indexing and classification of documents, automatic summarization and high-level semantic description, etc. Although the keyword extraction applications usually work on single documents (document-oriented task), keyword extraction is also applicable to a more demanding task, i.e. the keyword extraction from a whole collection of documents or from an entire web site, or from tweets from Twitter. In the era of big-data, obtaining an effective and efficient method for automatic keyword extraction from huge amounts of multi-topic textual sources is of high importance.
We proposed a novel Selectivity-Based Keyword Extraction (SBKE) method, which extracts keywords from source text represented as a network. The node selectivity value is calculated from a weighted network as the average weight distributed on the links of a single node, and is used in the procedure of keyword candidate ranking and extraction. Selectivity slightly outperforms extraction based on standard centrality measures; therefore, selectivity and its modification, generalized selectivity, are included in the SBKE method as node centrality measures. Selectivity-based extraction does not require linguistic knowledge, as it is derived purely from statistical and structural information of the network, so it can be easily ported to new languages and used in a multilingual scenario. The true potential of the proposed SBKE method lies in its generality, portability, and low computation costs, which position it as a strong candidate for preparing collections that lack human annotations for keyword extraction. The portability of SBKE was tested on Croatian, Serbian, and English texts: more precisely, it was developed on Croatian news and ported for extraction from parallel abstracts of scientific publications in the Serbian and English languages.
The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, since we have a controlled experimental environment and data. The achieved keyword extraction results, measured with an F1 score, are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. If we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can be easily ported to a new language, domain, and type of text structure. Still, there are drawbacks: the method can only extract words that appear in the text.
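The selectivity measure at the core of SBKE can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it builds a weighted co-occurrence network over adjacent words and scores each node by its selectivity, i.e. the average weight on its links (node strength divided by degree), as defined in the abstract.

```java
import java.util.*;

// Sketch of selectivity-based keyword candidate ranking: build a weighted
// co-occurrence network from adjacent words, then score each node by
// selectivity = strength / degree (average weight per link).
public class SelectivityKeywords {
    public static Map<String, Double> selectivity(String text) {
        String[] tokens = text.toLowerCase().split("\\W+");
        // Edge weights: co-occurrence counts of adjacent word pairs.
        Map<String, Map<String, Integer>> graph = new HashMap<>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            addEdge(graph, tokens[i], tokens[i + 1]);
            addEdge(graph, tokens[i + 1], tokens[i]);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
            int strength = e.getValue().values().stream()
                            .mapToInt(Integer::intValue).sum();
            result.put(e.getKey(), (double) strength / e.getValue().size());
        }
        return result;
    }

    private static void addEdge(Map<String, Map<String, Integer>> g,
                                String a, String b) {
        g.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // "stack" repeatedly co-occurs with "overflow", so it gets a
        // selectivity above 1.0, while one-off neighbours score 1.0.
        Map<String, Double> s =
            selectivity("stack overflow users ask stack overflow questions");
        s.entrySet().stream()
         .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
         .forEach(e -> System.out.println(e.getKey() + " " + e.getValue()));
    }
}
```

Note how the score requires no linguistic knowledge, which is the property the authors credit for the method's portability across Croatian, Serbian, and English.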
Research seminar slides at URJC, June 6. Briefly: social analysis; in more detail: static analysis and co-evolution (joint work with Landman, Vinju, Muske; Businge).
In recent times, research activities in the areas of opinion and sentiment analysis in natural language texts and other media have been gaining ground under the umbrella of subjectivity analysis. The reason may be the huge amount of text data available on the Social Web in the form of news, reviews, blogs, chats, and even Twitter. Though sentiment analysis of natural language text is a multifaceted and multidisciplinary problem, in general, the term "sentiment" is used in reference to the automatic analysis of evaluative text.
Presented at: The 29th Annual International Conference on Computer Science and Software Engineering (CASCON 2019)
Date of Conference: November 4, 2019 - November 6, 2019
Conference Location: Markham, Ontario, Canada
DOI: https://dl.acm.org/doi/abs/10.5555/3370272.3370293
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
How to Ask for Technical Help? Evidence-based Guidelines for Writing Question...
Fabio Calefato
Slides presenting results from our IST paper (https://arxiv.org/abs/1710.04692) / IEEE Software blog post (http://blog.ieeesoftware.org/2017/11/can-we-trust-stack-overflow-netiquette.html) investigating whether we can trust Stack Overflow netiquette for writing better questions.
Automatic Identification of Informative Code in Stack Overflow Posts
Preetha Chatterjee
Despite Stack Overflow’s popularity as a resource for solving coding problems, identifying relevant information from an individual post remains a challenge. The overload of information in a post can make it difficult for developers to identify specific and targeted code fixes. In this paper, we aim to help users identify informative code segments, once they have narrowed down their search to a post relevant to their task. Specifically, we explore natural language-based approaches to extract problematic and suggested code pairs from a post. The goal of the study is to investigate the potential of designing a browser extension to draw the readers’ attention to relevant code segments, and thus improve the experience of software engineers seeking help on Stack Overflow.
Automatically Identifying the Quality of Developer Chats for Post Hoc Use
Preetha Chatterjee
Software engineers are crowdsourcing answers to their everyday challenges on Q&A forums (e.g., Stack Overflow) and, more recently, in public chat communities such as Slack, IRC, and Gitter. Many software-related chat conversations contain valuable expert knowledge that is useful both for mining to improve programming support tools and for readers who did not participate in the original chat conversations. However, most chat platforms and communities do not contain built-in quality indicators (e.g., accepted answers, vote counts). Therefore, it is difficult to identify conversations that contain useful information for mining or reading, i.e., conversations of post hoc quality. In this paper, we investigate automatically detecting developer conversations of post hoc quality from public chat channels. We first describe an analysis of 400 developer conversations that indicates potential characteristics of post hoc quality, followed by a machine learning-based approach for automatically identifying conversations of post hoc quality. Our evaluation of 2,000 annotated Slack conversations in four programming communities (python, clojure, elm, and racket) indicates that our approach can achieve precision of 0.82, recall of 0.90, F-measure of 0.86, and MCC of 0.57. To our knowledge, this is the first automated technique for detecting developer conversations of post hoc quality.
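For readers unfamiliar with the reported evaluation metrics, here is a sketch of how precision, recall, F-measure, and MCC follow from a binary confusion matrix. The counts below are made-up illustrative numbers, not the paper's data.

```java
// Standard binary-classification metrics from confusion-matrix counts:
// tp/fp/fn/tn = true positives, false positives, false negatives, true negatives.
public class ChatQualityMetrics {
    public static double precision(int tp, int fp) { return (double) tp / (tp + fp); }
    public static double recall(int tp, int fn)    { return (double) tp / (tp + fn); }
    public static double f1(double p, double r)    { return 2 * p * r / (p + r); }

    // Matthews correlation coefficient: stays informative even when the
    // two classes are imbalanced, unlike accuracy.
    public static double mcc(int tp, int tn, int fp, int fn) {
        double num = (double) tp * tn - (double) fp * fn;
        double den = Math.sqrt((double) (tp + fp) * (tp + fn)
                             * (tn + fp) * (tn + fn));
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        int tp = 90, fp = 20, fn = 10, tn = 40; // hypothetical counts
        double p = precision(tp, fp), r = recall(tp, fn);
        System.out.printf("precision=%.2f recall=%.2f f1=%.2f mcc=%.2f%n",
                p, r, f1(p, r), mcc(tp, tn, fp, fn));
    }
}
```

With these hypothetical counts, precision is about 0.82 and recall 0.90, while MCC lands noticeably lower, which illustrates why the paper reports MCC alongside F-measure: it penalizes errors on the minority class that F-measure largely ignores.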
Analysis of Stack Overflow posts and user data: trend analysis and predicting time-to-answer (classification) using Weka. CSCI 599 final project on social media data analytics.
Analyzing Big Data's Weakest Link (hint: it might be you)
HPCC Systems
Tim Menzies, NC State University, presents at the 2015 HPCC Systems Engineering Summit Community Day.
For Big Data applications, there is a lack of any gold standard for "good analysis" or of methods to assess our certification programs. Hence, we are still in the dark about whether or not our human analysts are making the best use possible of the tools of Big Data. While much progress has been made in the systems aspects of Big Data, certain critical human-centered aspects remain an open issue. Regardless of the sophistication of the analysis tools and environment, all that architecture can still be used incorrectly by users. If this issue were confined to a small number of inexperienced users, then it could be addressed via process improvements such as better training. But is it? What do we know about our analysts? Where are the studies that mine the people doing the data mining?
This presentation offers some preliminary results on tools that combine ECL with other methods that recognize the code generated by experienced or inexperienced developers. While the results are preliminary, they do raise the possibility that we can better characterize what it means to be experienced (or inexperienced) at Big Data applications.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
LLMs in Production: Tooling, Process, and Team StructureAggregage
Join Dr. Greg Loughnane and Chris Alexiuk in this exciting webinar to learn all about the tooling, processes, and team structure you need to build and operate performant, reliable, and scalable production-grade LLM applications!
Microsoft and Revolution Analytics -- what's the add-value? 20150629Mark Tabladillo
Microsoft has been a leader in the enterprise analytics space for years. In 2014, Microsoft had already created R language functionality within Azure Machine Learning. On April 6, 2015, Microsoft and closed on a deal to acquire Revolution Analytics, a company focusing on scalable processing solutions initiated by the well-known R language. Many data science projects and initial demos do not need high-volume solutions: however, having a high-volume answer for the R language allows for planning or working toward the largest data science solutions.
This presentation describes the add-value for the Revolution Analytics acquisition. The talk covers 1) an overview of current data science technologies from Microsoft; 2) a description of the R language; 3) a brief review of the add-value for R with Azure Machine Learning, and 4) a description of the performance architecture and demo of the language constructs developed by Revolution Analytics. Most of the presentation will be focused on sections two and four. It is anticipated that these technologies will be partially if not fully integrated into SQL Server 2016.
The success of developer forums like Stack Overflow (SO) depends on the participation of users and the quality of shared knowledge. SO allows its users to suggest edits to improve the quality of the posts (e.g., questions and answers). Such posts can be rolled back to an earlier version when the current version of the post with the suggested edit does not satisfy the user. However, subjectivity bias in deciding either an edit is satisfactory or not could introduce inconsistencies in the rollback edits. For example, while a user may accept the formatting of a method name (e.g., getActivity()) as a code term, another user may reject it. Such bias in rollback edits could be detrimental and demotivating to the users whose suggested edits were rolled back. This problem is compounded due to the absence of specific guidelines and tools to support consistency across users on their rollback actions. To mitigate this problem, we investigate the inconsistencies in the rollback editing process of SO and make three contributions. First, we identify eight inconsistency types in rollback edits through a qualitative analysis of 777 rollback edits in 382 questions and 395 answers. Second, we determine the impact of the eight rollback inconsistencies by surveying 44 software developers. More than 80% of the study participants find our produced catalogue of rollback inconsistencies to be detrimental to the post quality. Third, we develop a suite of algorithms to detect the eight rollback inconsistencies. The algorithms offer more than 95% accuracy and thus can be used to automatically but reliably inform users in SO of the prevalence of inconsistencies in their suggested edits and rollback actions.
Presentation given by the Proffer team during their hackathon launch ceremony at IIT Delhi on November 10.
In partnership with NITI Aayog, Microsoft, IBM, Accel Partners, AWS, and Coinbase/Toshi. $17K+ in prizes for your Ethereum/Hyperledger projects.
Word Cloud Plus with Will and Ray PoynterRay Poynter
Ray and Will Poynter have created Word Cloud Plus, a web app for producing word clouds.
The key benefit of Word Cloud Plus is that it leverages the human brain to find and display better word clouds. Using Word Cloud Plus allows you to choose the right algorithm, choose the appropriate parameters, and configure the output.
Check out these three examples below – or try it yourself by visiting the Word Cloud Plus website: www.wordcloudplus.com
- "What is a word cloud, what are they are good for, and what they not good for?" – a discussion of what Word Clouds can and can’t do.
Visit the website here: https://wordcloudplus.com/blog/what-is-a-word-cloud-what-are-they-are-good-for-and-what-they-not-good-for
- "A letter to Santa from five countries analyzed using World Cloud Plus – a guest blog from Fastuna", using Word Cloud Plus to analyse some seasonal data. Visit the site here: https://wordcloudplus.com/blog/a-letter-to-santa-from-five-countries-analyzed-using-world-cloud-plus
- "What makes a great presenter?", Analysis via Word Cloud Plus – an example of how to use Word Cloud Plus. Read more here: https://wordcloudplus.com/blog/what-makes-a-great-presenter-analysis-via-word-cloud-plus
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
1. Exploratory Study of Slack Q&A Chats
as a Mining Source for
Software Engineering Tools
Preetha Chatterjee Kostadin Damevski Lori Pollock Vinay Augustine Nicholas A. Kraft
3. 8 million daily active users
Given Slack’s increased use, are Slack Q&A chats a good mining source for Software Engineering tools?
https://www.statista.com/statistics/652779/worldwide-slack-users-total-vs-paid/
[Bar chart: number of Slack users in thousands, rising from 16 to 10,000 over the years]
4. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
5. Data Sets
Community (Slack Channels)   #Conversations             Community (SO Tags)   #Posts
                             Slackauto    Slackmanual                         SOauto    SOmanual
clojurians#clojure           5,013        80             clojure              13,920    80
elmlang#beginners            7,627        80             elm                  1,019     160
elmlang#general              5,906        80             -                    -         -
pythondev#help               3,768        80             python               806,763   80
racket#general               1,579        80             racket               3,592     80
Total                        23,893       400            Total                825,294   400
Data Preparation:
• Chat Disentanglement [Elsner and Charniak 2008]
• LDA topic model
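Elsner and Charniak's disentanglement uses a trained classifier over time, speaker, and lexical features. As a minimal illustrative stand-in (not the study's actual pipeline), a time-gap heuristic can split an interleaved channel log into conversations:

```python
from datetime import datetime, timedelta

def disentangle(messages, max_gap_minutes=10):
    """Group (timestamp, author, text) messages into conversations,
    starting a new one whenever the gap since the previous message
    exceeds max_gap_minutes. A real disentangler also uses speaker
    and lexical cues; this gap heuristic is only a rough sketch."""
    conversations, current, prev_ts = [], [], None
    for ts, author, text in messages:
        if prev_ts is not None and ts - prev_ts > timedelta(minutes=max_gap_minutes):
            conversations.append(current)
            current = []
        current.append((ts, author, text))
        prev_ts = ts
    if current:
        conversations.append(current)
    return conversations
```

Topic filtering (the LDA step) would then run over the text of each recovered conversation.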
6. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
7. How has Stack Overflow been used as a mining resource?
Code:
• IDE code recommendation [DeSouza‘14, Rahman‘14, Cordeiro’12, Ponzanelli‘14,
Bacchelli‘12, Amintaber‘15]
• Automatic generation of comments [Wong’13, Rahman‘15]
API:
• Learning and recommendation of APIs [Chen’16, Rahman’16, Wang’13]
• Augmenting API documentation [Treude‘16, Subramanian ‘14, Chen’14]
Other:
• Building thesaurus of software-specific terms [Tian’14, Chen’17]
• Gender bias and emotions [Novielli’14, Morgan ’17, Ford’16]
RQ1: Prevalence of information
8. Study Measures
Measure
Document length
Code snippet count
Code snippet length
Bad code snippets
Gist links
Stack Overflow links
API mentions in code snippets
API mentions in text
RQ1: Prevalence of information
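Several of the measures above can be computed mechanically once conversations are disentangled. A hedged sketch, assuming Slack-style triple-backtick code fences and classifying links by hostname substring (API-mention detection is harder and omitted here):

```python
import re

CODE_BLOCK = re.compile(r"```(.*?)```", re.DOTALL)
URL = re.compile(r"https?://\S+")

def rq1_measures(conversation_text):
    """Compute a few of the RQ1 measures for one conversation."""
    snippets = CODE_BLOCK.findall(conversation_text)
    links = URL.findall(conversation_text)
    return {
        "document_length": len(conversation_text.split()),  # in words
        "code_snippet_count": len(snippets),
        "code_snippet_length": [len(s.split("\n")) for s in snippets],  # in lines
        "gist_links": sum("gist.github.com" in u for u in links),
        "stack_overflow_links": sum("stackoverflow.com" in u for u in links),
    }
```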
10.
Much of the information mined from Stack Overflow is also available on Slack Q&A channels.
API mentions are available in larger quantities on Slack Q&A channels.
Links are rarely available on both Slack and Stack Overflow Q&A.
Study Results
RQ1: Prevalence of information
11. Research Questions
RQ1. How prevalent in developer Q&A chats such as Slack are the kinds of information that have been successfully mined from the Stack Overflow Q&A forum to support software engineering tools?

RQ2. Do Slack Q&A chats have characteristics that might inhibit automatic mining of information to support software engineering tools?
12.
Measure
Participant count
Questions with no answer
Answer count
Indicators of accepted answers
Questions with no accepted answer
NL text context per code snippet
Incomplete sentences
Noise in document
Knowledge construction process *
* A. Zagalsky, D. M. German, M.-A. Storey, C. G. Teshima, and G. Poo-Caamaño, “How the R community creates and
curates knowledge: An extended study of Stack Overflow and mailing lists,” Empirical Software Engineering, 2017.
RQ2: Challenges of Mining Slack
Study Measures
13.
Words/Phrases: good find; Thanks for your help; cool; this works; that’s it, thanks
a bunch for the swift and adequate pointers; Ah, ya that works; thx for the info;
alright, thx; awesome; that would work; your suggestion is what I landed on; will
have a look thank you; checking it out now thanks; that what i thought; Ok; okay;
kk; maybe this is what i am searching for; handy trick; I see, I’ll give it a whirl;
thanks for the insight!; thanks for the quick response @user, that was extremely
helpful!; That’s a good idea! ; gotcha; oh, I see; Ah fair; that really helps; ah, I
think this is falling into place; that seems reasonable; Thanks for taking the time to
elaborate; Yeah, that did it; why didn’t I try that?
Emojis:
Accepted Answer Indicators
RQ2: Challenges of Mining Slack
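A mining tool could seed an acceptance detector with this lexicon. A minimal keyword pass (the patterns below cover only a subset of the phrases above; a real detector would add sentiment analysis and emoji reactions, as the study notes):

```python
import re

# A subset of the acceptance phrases observed in the study.
ACCEPT_PATTERNS = [
    r"\bthanks?\b", r"\bthank you\b", r"\bthat work(s|ed)\b",
    r"\bgot ?it\b", r"\bawesome\b", r"\bgotcha\b", r"\bthx\b",
]
ACCEPT_RE = re.compile("|".join(ACCEPT_PATTERNS), re.IGNORECASE)

def is_acceptance(utterance):
    """Flag an utterance that likely marks an accepted answer."""
    return ACCEPT_RE.search(utterance) is not None
```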
14.
Measure                              Results
Participant frequency                1 < 2 < 34
Questions with no answer             15.75%
Answer frequency                     0 < 1 < 5
Questions with no accepted answer    52.25%
NL text context per code snippet     0 < 2 < 13
Incomplete sentences                 12.63%
Noise in document                    10.5%
Knowledge construction               61.5% crowd; 38.5% participatory
RQ2: Challenges of Mining Slack
Study Results
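Count-valued measures in this table are reported as minimum < median < maximum, while percentages are reported directly. A small helper, hypothetical and purely for illustration, that renders per-conversation counts in that format:

```python
from statistics import median

def min_median_max(counts):
    """Format a list of per-conversation counts as 'min < median < max'."""
    return f"{min(counts)} < {median(counts)} < {max(counts)}"

print(min_median_max([1, 2, 2, 3, 34]))  # → 1 < 2 < 34
```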
15. Study Results
Accepted answers are available in chat conversations, but require more effort to discern.
Participatory conversations provide additional value but require deeper analysis of conversational context.
Percentages of incomplete sentences and noise are low.
RQ2: Challenges of Mining Slack
Measure                              Results
Participant frequency                1 < 2 < 34
Questions with no answer             15.75%
Answer frequency                     0 < 1 < 5
Questions with no accepted answer    52.25%
NL text context per code snippet     0 < 2 < 13
Incomplete sentences                 12.63%
Noise in document                    10.5%
Knowledge construction               61.5% crowd; 38.5% participatory
16.
P. Chatterjee, M. A. Nishi, K. Damevski, V. Augustine, L. Pollock and N. A. Kraft, "What information about code
snippets is available in different software-related documents? An exploratory study," 2017 IEEE 24th International
Conference on Software Analysis, Evolution and Reengineering (SANER), Klagenfurt, 2017, pp. 382-386.
The largest proportion of Slack Q&A conversations discuss software design.
Analyzing Types of Information in Chats
17. Related Work on Analyzing Chats
• Learn developer behaviors [Elliot’03, Shihab’09, Yu’11, Lin’16]
• Filter out off-topic discussion [Chowdhury and Hindle’15]
• Extraction of rationale [Alkadhi’17, ‘18]
• Chatbots [Lebeuf’17, Paikari’18]
18. Conclusions
Q&A chats provide, in lesser quantities, the same information as can be found in Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in disentangling Slack conversations.
It is feasible to apply automated mining approaches to chat conversations from Slack. However, identifying an accepted answer is non-trivial.
Future Work
Investigate linking between public Slack channels to Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
19.
preethac@udel.edu
@PreethaChatterj
Exploratory Study of Slack Q&A Chats as a Mining Source for
Software Engineering Tools
Q&A chats provide, in lesser quantities, the same information as can be found in Q&A posts on Stack Overflow.
Adapting techniques and training sets can achieve high accuracy in disentangling Slack conversations.
It is feasible to apply automated mining approaches to chat conversations from Slack. However, identifying an accepted answer is non-trivial.
Investigate linking between public Slack channels to Stack Overflow.
Mine conversations for software development insights.
Mine opinion statements available in public Slack channels.
Conclusions
Future Work
Supported by :
• NSF grant nos. 1812968 and 1813253
• DARPA MUSE program, Air Force Research Lab contract no. FA8750-16-2-0288.
Preprint:
https://tinyurl.com/yxmown4x
Editor's Notes
Thank you. I’m Preetha Chatterjee, a PhD student at University of Delaware. Today, I will describe our work on “Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools.”
My coauthors are: Kostadin Damevski, Lori Pollock, Vinay Augustine and Nicholas Kraft.
With increased online sharing, developers are having conversations about software via online chat services. (click)
Developers use these communities to ask and answer specific development questions, with the aim of improving their own skills and helping others. Slack is currently the most popular platform which hosts many active public channels focused on software development technologies.
Over 8 million active users participate daily on Slack, and this graph shows how the number of users increased on Slack over the past few years.
Through this study we investigate given Slack’s increased use, are Slack Q&A chats a good mining source for Software Engineering tools?
For RQ1, we compare the content in Q&A-focused public chat communities (e.g., Slack) with Q&A-based discussion forums (e.g., Stack Overflow).
We explore the availability and prevalence of information in Slack that are mined from SO, which provides us with the first insight into the prospect of chat communities as a source of mining.
As a part of RQ2, we investigate the feasibility of applying automatic information extraction techniques on chat messages.
We curated a comparison data set on Slack and SO by using LDA and a modified chat disentanglement technique which was initially proposed by Elsner and Charniak. We gathered around 24k Slack conversations and 800k SO posts. Since all the measures for this study could not be computed automatically with high accuracy, we created smaller subsets of data each containing 400 conversations and posts for manual analysis.
I will first present the methodology and results of RQ1.
This slide shows a pair of examples, a conversation on Slack and a Stack Overflow post on a similar topic, to highlight their differences in form and structure. Chat conversations are transient, and as a result important information and advice are lost over time. SO is an archival resource, so developers can easily refer back to the information later. Chat is an informal communication platform where developers exchange a lot of information in a short time, while SO has more in-depth questions with well-thought-out answers. As opposed to SO, chat conversations lack a formal structure and are often interleaved.
I DON’T THINK WE HAVE TIME TO SHOW THIS SLIDE
Literature shows that code and NL text from SO have been mined by researchers for several software engineering tasks such as IDE recommendation, augmenting API documentation, and building thesauri of software-specific terms. Collectively, these prior works suggest that specific types of information embedded in software-related documents could be used in building or improving software engineering tools.
To answer RQ1, we focused on similar information that has been commonly mined in SO. Specifically, we analyzed code snippets, links to external resources, and API mentions.
We display the results primarily as box plots. Read takeaways and add:
However, most of this information is available in larger quantities on Stack Overflow.
Specifically for API mentions in text, both sources had a fairly low median occurrence, but Slack had a higher value and more variance.
Before the study, we anticipated that developers on Slack would often use links to answer questions, saving time by pointing askers to an existing information source, such as Stack Overflow. Alternatively, we expected askers to use Gist to post code prior to asking questions, in order to benefit from the clean formatting that enables the display of a larger block of code. While both of these behaviors did occur, they were fairly infrequent.
Next I will discuss the methodology and results of RQ2.
To answer RQ2, we focused on measures that could provide some insights into the form of Slack Q&A conversations (participant count, questions with no answer, answer count) and measures that could indicate challenges in automation (how participants indicate accepted answers, questions with no accepted answer, natural language text describing code snippets, incomplete sentences, noise within a document, and knowledge construction process) that suggest a need to filter. Since RQ2 investigates challenges in mining information in developer chat communications to support software engineering tools, we only computed the measures on Slack.
We observed the common words/phrases that indicate answer acceptance in Slack conversations. The most prevalent indicator is “Thanks/thank you”, followed by phrases acknowledging the participant’s help such as “okay”, ”got it”, and other positive sentiment indicators such as “this worked”, “cool”, and “great”.
Accepted answers were also commonly indicated using emojis as listed in the table.
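To make the idea concrete, the kind of lexical matching a mining tool might use to flag these acceptance cues can be sketched as follows. This is a minimal illustration, not our actual tool; the phrase and emoji lists are assumptions based on the indicators observed in the study.

```python
import re

# Hypothetical acceptance indicators; the phrases follow the study's
# observations, while the emoji codes are illustrative assumptions.
ACCEPTANCE_PHRASES = [
    "thanks", "thank you", "okay", "got it",
    "this worked", "cool", "great",
]
ACCEPTANCE_EMOJIS = [":thumbsup:", ":+1:", ":tada:"]

def is_acceptance(utterance: str) -> bool:
    """Return True if the utterance contains an acceptance indicator."""
    text = utterance.lower()
    if any(emoji in text for emoji in ACCEPTANCE_EMOJIS):
        return True
    # Word boundaries avoid matching phrases inside larger words.
    return any(re.search(r"\b" + re.escape(phrase) + r"\b", text)
               for phrase in ACCEPTANCE_PHRASES)

print(is_acceptance("Thanks, this worked!"))    # True
print(is_acceptance("Still seeing the error"))  # False
```

In practice a real classifier would need more than keyword matching (e.g., handling sarcasm or negation), which is why we point to NLP and sentiment analysis techniques next.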
Results represented as percentages are reported directly, while other results, computed as simple counts, are reported as minimum < median < maximum.
The results indicate that the proportion of incomplete sentences describing code is low (13%), and similarly that the noise within a conversation is limited (at most 11%).
2) There is a significant proportion of accepted answers available in Slack. However, an automatic mining tool needs to automatically identify the sentence in a conversation that is an answer to a question and which question it is answering. This implies that NLP techniques and sentiment analysis will most likely be needed to automatically identify and match answers with questions.
3) Nearly 40% of conversations on Slack Q&A channels were participatory, with multiple individuals working together to produce an answer to the initial question. These conversations present an additional mining challenge, since utterances form a complex dependence graph as answers are contributed and debated concurrently.
To gain insight into the semantic information, we analyzed the kinds of information provided in the conversations. Using the labels defined in our previous work, we observed that the most prevalent type of information on Slack is “Design”, which includes information on the programming language, framework, and time/space complexity of a code snippet. This aligns with the fact that the main purpose of developer Q&A chats is to ask and answer questions about alternatives for a particular task, specific to a particular language or technology.
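A first-cut way to tag utterances with such information types is a keyword heuristic, sketched below. The label names follow our coding scheme, but the keyword lists and the helper function are illustrative assumptions, not the annotation procedure actually used.

```python
# Hypothetical keyword lists per information-type label; a real scheme
# would rely on trained classifiers or manual coding rather than keywords.
LABEL_KEYWORDS = {
    "Design": ["framework", "complexity", "architecture", "pattern"],
    "API usage": ["api", "method", "parameter"],
    "Errors": ["exception", "error", "traceback"],
}

def label_utterance(text: str) -> list:
    """Return all labels whose keywords appear in the utterance."""
    text = text.lower()
    return [label for label, keywords in LABEL_KEYWORDS.items()
            if any(kw in text for kw in keywords)]

print(label_utterance("Which framework has lower time complexity here?"))
# prints: ['Design']
```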
Often the focal point of conversations are APIs, where a developer is asking experts on the channel for suggestions on API or proper idioms for API usage.
Other researchers have conducted studies analyzing chats; however, they have focused on learning developer behaviors. Chowdhury and Hindle proposed an approach to automatically filter out off-topic IRC discussions by exploiting Stack Overflow programming discussions and YouTube video comments. Alkadhi et al. examined the frequency and completeness of available rationale in chat messages, the contribution of rationale by developers, and the potential of automatic techniques for rationale extraction. Researchers have also investigated the role of chatbots in software development activities.
In summary, Q&A chats provide information similar to what can be found on Q&A forums such as Stack Overflow. Adapting existing techniques and training sets can achieve high accuracy in disentangling Slack conversations. Finally, the low percentages of noise and incomplete sentences show the feasibility of applying automatic mining approaches to extract information from Slack chats.
1) While there were few explicit links to Stack Overflow and GitHub Gists in our dataset, we believe that information is often duplicated on these platforms, and that answers on one platform can be used to complement the other. Future work includes further investigating this linking between public Slack channels and Stack Overflow.
2) Participatory Q&A conversations are available on Slack in large quantities. These conversations often provide interesting insights about various technologies and their use, incorporating various design choices. As future work, we intend to investigate mining such conversations for software development insights.
3) We also observed that developers use Slack to share opinions on best practices, APIs, or tools (e.g., API X has better design or usability than API Y). Stack Overflow explicitly forbids opinion-based content on its site. Opinions are valuable to software developers, and they could also lead to new mining opportunities for software tools. Hence, we plan to investigate the mining of opinion statements available in public Slack channels.
This concludes my talk. I will be happy to answer questions now.