Abstract--Learning in organizations is important for success and survival. Recent research into open source software developers has primarily suggested a social constructivist view in which knowledge is constructed through social relationships within the organizational culture. I report results from a case study that investigated the presence of situated learning among open source developers during the early period of a project. Thirty-eight developers were systematically selected and examined on their performance, experience, and roles during ten months of maintenance work. I followed a model of learning curve effects that associates the improvement in average resolving time with accumulated experience. I found a strong relationship between the two variables, confirming the presence of learning. The evidence that knowledge depreciates among open source software developers was less convincing: the depreciation factor was estimated at 94 percent, compared with estimates between 65 and 85 percent in other studies. An additional investigation examined the organizational structure to understand whether core and peripheral members have different average resolving times; the finding was inconclusive, so no claim can be made that the two groups differ in mean resolution time. The consistency between this thesis and several related research efforts regarding the existence of learning suggests that learning is likely an intrinsic characteristic of open source software development rather than a speculative belief.
Situated learning among open source software developers
1. A Master Thesis Presentation
Situated Learning in Open Source Software Developers: The Case of the Google Chrome Project
(Cover image: Dartington Pottery Training Workshop, 1978)
Author: Josef Hardi, European Master in Software Engineering
Supervisors: Prof. Barbara Russo, Dr. Richard Torkar
Thursday, August 4, 2011
2. Introduction
• Situated learning is the learning that occurs in workplaces [Brown et al., 1989].
• There is no separation between ‘knowing’ and ‘doing’.
• Situated learning is primarily practiced by the community of practitioners.
3. Existing Findings
• Learning curve effect: “the more times a task has been performed, the less time will be required on each subsequent iteration.” [T. P. Wright, 1936]
• [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache.
• [Au et al., 2009]: Learning is universally present in OSS projects.
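Wright's law itself is not spelled out on the slide; for reference, a minimal statement of the standard power-law form (textbook form, not quoted from the thesis):

```latex
% T_n : time required for the n-th repetition of a task
% T_1 : time for the first repetition
% b   : learning exponent (b >= 0)
T_n = T_1\, n^{-b}, \qquad \text{learning rate} = 2^{-b}
```

For example, an "80 percent" learning curve (b ≈ 0.322) means each doubling of cumulative repetitions cuts the task time to 80 percent of its previous value.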
4. Distinctions in this Thesis
• Data are taken from each individual instead of from an aggregation of individuals.
• This gives more insight into individual characteristics, i.e., knowledge depreciation and team roles as factors that affect the learning process.
5. Research Question 1: Is learning present in OSS developers?
• Hypothesis 1: There is a relation between the accumulated experience and the performance.
Research Question 2: What are the factors that affect learning?
• Hypothesis 2: Knowledge depreciates over time among the OSS developers.
• Hypothesis 3: Core developers resolve issues faster.
6. Case Study
• Google Chrome Project.
• Duration: 10 months, covering 10 releases (December 2008 to October 2009).
7. Research Methodology
[Process diagram, reconstructed as a list:]
Step 1. Data collection: issue report data and review interaction data.
Step 2. Data exploration: performance, experience, and team role.
Step 3. Construct input data.
Step 4. Identification of learning curve: models and data fitting.
8. Research Methodology, Step 1: Data Collection
Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]
Entries were filtered out when they had:
1. unrelated project areas,
2. invalid issue status, or
3. an empty owner name.
Result: issue report data (5,160 entries).
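A minimal sketch of that filtering step, assuming a CSV export and hypothetical field names mirroring the schema above (the transcript does not show the actual file format or the concrete area/status values):

```python
import csv

RELATED_AREAS = {"Area-UI", "Area-WebKit"}   # assumption: areas treated as related
VALID_STATUS = {"Fixed", "Verified"}         # assumption: statuses treated as valid

def load_issue_reports(path):
    """Load issue reports, dropping entries that match the three exclusion rules."""
    kept = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Area"] not in RELATED_AREAS:    # rule 1: unrelated project area
                continue
            if row["Status"] not in VALID_STATUS:   # rule 2: invalid issue status
                continue
            if not row["Owner"].strip():            # rule 3: empty owner name
                continue
            kept.append(row)
    return kept
```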
9. Research Methodology, Step 1 (continued): Review Interaction Data
Sample of the raw data:
"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
...
Interaction = [Owner, Reviewer, Comment date]
Result: review interaction data (12,037 entries).
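A small parsing sketch for triples in that shape. The third field appears to be a Unix timestamp (1226700214 falls in mid-November 2008, which matches the study window), though the transcript does not state this explicitly:

```python
import csv
from datetime import datetime, timezone

def load_interactions(path):
    """Parse (owner, reviewer, comment-date) triples like the sample above."""
    interactions = []
    with open(path, newline="") as f:
        for owner, reviewer, ts in csv.reader(f):
            when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
            interactions.append((owner, reviewer, when))
    return interactions
```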
10. Research Methodology, Step 2: Data Exploration
[Diagram, reconstructed:]
• Performance: for each developer and release, the average issue resolution time (from the issue report data).
• Experience: for each developer and release, the number of resolved issues (from the issue report data).
Sample = 274 developers.
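Both measures can be computed in one pass; a sketch with pandas, again using hypothetical column names (owner, release, open_date, close_date):

```python
import pandas as pd

def measure(issues: pd.DataFrame) -> pd.DataFrame:
    """Per developer and release: average resolution time and resolved-issue count."""
    issues = issues.assign(
        resolve_days=(issues["close_date"] - issues["open_date"]).dt.days
    )
    return (
        issues.groupby(["owner", "release"])
        .agg(
            performance=("resolve_days", "mean"),  # average resolving time, in days
            experience=("resolve_days", "size"),   # number of resolved issues
        )
        .reset_index()
    )
```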
11. Research Methodology, Step 2 (continued): Team Role
• Team role is estimated from the review interaction data using the core/periphery structure model [Borgatti, 1999].
• Core entails a dense, cohesive structure; periphery entails a sparse, loose structure.
• The estimation is performed using UCINET.
Sample = 274 developers.
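UCINET implements the Borgatti-Everett fit directly. For readers without it, here is a minimal illustrative sketch of one discrete variant: score a 0/1 coreness vector by correlating the observed adjacency matrix with an ideal pattern in which any tie involving a core member is expected and periphery-periphery ties are absent, then hill-climb. Variants of the model treat the core-periphery blocks differently, so this is one plausible reading, not the thesis's exact procedure:

```python
import numpy as np

def fit_core_periphery(adj: np.ndarray, sweeps: int = 20) -> np.ndarray:
    """Greedy discrete core/periphery fit in the spirit of Borgatti & Everett.
    Returns a 0/1 vector (1 = core, 0 = periphery)."""
    n = adj.shape[0]
    off_diag = ~np.eye(n, dtype=bool)            # ignore self-ties

    def score(c):
        ideal = np.maximum.outer(c, c)           # 1 iff node i or node j is core
        return np.corrcoef(adj[off_diag], ideal[off_diag])[0, 1]

    rng = np.random.default_rng(0)
    c = rng.integers(0, 2, size=n)
    best = score(c)
    for _ in range(sweeps):                      # hill-climb: flip one node at a time
        improved = False
        for i in range(n):
            c[i] ^= 1
            s = score(c)
            if s > best:
                best, improved = s, True
            else:
                c[i] ^= 1                        # revert the flip
        if not improved:
            break
    return c
```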
12. Research Methodology, Step 3: Construct Input Data
• The 274 developers are refined to 38 long-term contributors: those who participated for at least 8 releases.
• Not all developers contribute over the long term; the refinement yields new longitudinal data sets.
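Continuing the earlier sketch (same hypothetical columns), the refinement might look like:

```python
import pandas as pd

def long_term_contributors(metrics: pd.DataFrame, min_releases: int = 8) -> pd.DataFrame:
    """Keep developers who appear in at least `min_releases` releases."""
    release_counts = metrics.groupby("owner")["release"].nunique()
    keep = release_counts[release_counts >= min_releases].index
    return metrics[metrics["owner"].isin(keep)]
```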
13. Input Data Set: Performance
[Figure: distribution of the average issue-resolving time (log days) per release for the long-term developer group.]
14. Input Data Set: Experience
[Figure: distribution of the number of resolved issues (N) per release for the long-term developer group.]
15. Input Data Set: Team Role
[Figure: team composition of the long-term group per release, R1 through R10. Each release shows a two-way core/periphery split; the extracted values pair up as 46/54, 39/61, 39/61, 45/55, 47/53, 47/53, 47/53, 42/58, 42/58, and 39/61 percent, but the extraction does not preserve which share is the core or the exact mapping of values to releases.]
16. Research Methodology, Step 4: Identification of Learning Curve Models and Data Fitting
[The two model equations (Model 1 and Model 2) were rendered as images and did not survive extraction; a hedged reconstruction follows below.]
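Since the original formulas are lost, here is a reconstruction following the standard learning-curve-with-knowledge-depreciation specification from the organizational-learning literature, chosen to be consistent with the variable names and coefficients on the result slide (KnowledgeStock, Lambda, TeamRole) and with performance being measured in log days. This is an assumption, not the thesis's verbatim equations:

```latex
% Knowledge stock with depreciation: lambda is the retention factor
% (the abstract's "depreciation factor" of 94 percent reads as 94 percent
% of the stock carrying over per release); q_{i,t} is the number of issues
% developer i resolved in release t.
K_{i,t} = \lambda\, K_{i,t-1} + q_{i,t}

% Model 1: log average resolving time against the accumulated knowledge stock.
\ln y_{i,t} = \beta_0 + \beta_1 K_{i,t} + \varepsilon_{i,t}

% Model 2: Model 1 plus a core/periphery indicator.
\ln y_{i,t} = \beta_0 + \beta_1 K_{i,t} + \beta_2\, \mathrm{TeamRole}_{i,t} + \varepsilon_{i,t}
```

Under this reading, the reported KnowledgeStock coefficient of -0.01 (resolving time falls as the stock grows) and lambda of 0.94 match the signs and magnitudes on the result slide.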
17. Result Summary
Hypothesis | Variable       | Model 1  | Model 2  | Supported?
H1         | KnowledgeStock | -0.01*** | -0.01*** | Yes
H2         | Lambda         |  0.94*** |  0.94*** | Yes
H3         | TeamRole       |  NA      |  0.18    | No
*** Statistically significant at p < 0.001.
18. Threats to Validity
Internal validity:
• The improvement in resolving issues might be caused by improvements in the system design rather than by learning.
• Some of the issue data are incomplete.
External validity:
• Both models have very low statistical prediction power (less than 5 percent).
Construct validity:
• The estimated core/periphery structure might not reflect the real situation; however, the communication pattern is the best available indicator.
19. Conclusion
• I affirmed that learning is present in open source software developers.
• Knowledge does not depreciate significantly in the Google Chrome team.
• The evidence is inconclusive on whether core developers work faster than those in the periphery.
• Methodological contribution: a method to harvest and analyze data from code review.