Keynote presentation from the ECBS conference on using machine learning and AI to improve software engineering. Experiences from our project in Software Center (www.software-center.se).
AI vs Machine Learning vs Deep Learning | Machine Learning Training with Pyth... | Edureka!
Machine Learning Training with Python: https://www.edureka.co/python
This Edureka Machine Learning tutorial (Machine Learning Tutorial with Python Blog: https://goo.gl/fe7ykh ) on "AI vs Machine Learning vs Deep Learning" talks about the differences and relationship between AI, Machine Learning and Deep Learning. Below are the topics covered in this tutorial:
1. AI vs Machine Learning vs Deep Learning
2. What is Artificial Intelligence?
3. Example of Artificial Intelligence
4. What is Machine Learning?
5. Example of Machine Learning
6. What is Deep Learning?
7. Example of Deep Learning
8. Machine Learning vs Deep Learning
Machine Learning Tutorial Playlist: https://goo.gl/UxjTxm
Presented at All Things Open RTP Meetup
Presented by Karthik Uppuluri, Fidelity
Title: Generative AI
Abstract: In this session, let us embark on a journey into the fascinating world of generative artificial intelligence. As an emergent and captivating branch of machine learning, generative AI has become instrumental in a myriad of sectors, ranging from the visual arts to software development. This session requires no prior expertise in machine learning or AI. It aims to build a robust understanding of the fundamental concepts and principles of generative AI and its diverse applications. Join us as we delve into the mechanics of this transformative technology and unpack its potential.
From Amazon to Google, top technology firms have embraced data science and machine learning to improve business outcomes. Yet AI adoption beyond these firms has been slow due to obstacles such as hiring talent, heterogeneous data, and compute infrastructure. Larger firms have built teams to tackle these issues with some success, but small- and mid-tier firms are at a distinct disadvantage. AI as a Service is a paradigm that levels the playing field and empowers businesses across the spectrum.
My presentation today about ChatGPT, OpenAI, conversational AI, and the Future of Work. Includes survey data from the audience. Presented at our Constellation Research Execution Network monthly Office Hours for CIOs, CDOs, and other CXOs.
Generative AI: Responsible Path Forward, a presentation conducted during the DataHour webinar series by Analytics Vidhya and attended by more than a hundred data scientists and AI experts from around the world. The presentation addresses the importance of AI ethics and the development of responsible AI governance at tech firms to help mitigate AI risks and ethical issues.
Analysis of TensorFlow
- What is TensorFlow?
- Background
- DistBelief
- Tutorial - Logistic regression
- TensorFlow - internals
- Tutorial - CNN, RNN
- Benchmarks
- Other open-source frameworks
- If you are considering TensorFlow
- Installation
- References
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck | SlideTeam
AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck is loaded with easy-to-follow content, and intuitive design. Introduce the types and levels of artificial intelligence using the highly-effective visuals featured in this PPT slide deck. Showcase the AI-subfield of machine learning, as well as deep learning through our comprehensive PowerPoint theme. Represent the differences, and interrelationship between AI, ML, and DL. Elaborate on the scope and use case of machine intelligence in healthcare, HR, banking, supply chain, or any other industry. Take advantage of the infographic-style layout to describe why AI is flourishing in today’s day and age. Elucidate AI trends such as robotic process automation, advanced cybersecurity, AI-powered chatbots, and more. Cover all the essentials of machine learning and deep learning with the help of this PPT slideshow. Outline the application, algorithms, use cases, significance, and selection criteria for machine learning. Highlight the deep learning process, types, limitations, and significance. Describe reinforcement training, neural network classifications, and a lot more. Hit download and begin personalization. Our AI Vs ML Vs DL PowerPoint Presentation Slide Templates Complete Deck are topically designed to provide an attractive backdrop to any subject. Use them to look like a presentation pro. https://bit.ly/3ngJCKf
The Future of AI is Generative, not Discriminative | 5/26/2021 | Steve Omohundro
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
Give a background of Data Science and Artificial Intelligence, to better understand the current state of the art (SOTA) for Large Language Models (LLMs) and Generative AI. Then start a discussion on the direction things are going in the future.
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY | Andre Muscat
Discuss the impact and opportunity of using Generative AI to support your development and creative teams
* Explore business challenges in content creation
* Cost-per-unit of different types of content
* Use AI to reduce cost-per-unit
* New partnerships being formed that will have a material impact on the way we search and engage with content
Part 4 of a 9 Part Research Series named "What matters in AI" published on www.andremuscat.com
The quality of software systems may be expressed as a collection of Software Quality Attributes. When the system requirements are defined, it is essential also to define what is expected regarding these quality attributes, since these expectations will guide the planning of the system architecture and design.
Software quality attributes may be classified into two main categories: static and dynamic. Static quality attributes are the ones that reflect the system’s structure and organization. Examples of static attributes are coupling, cohesion, complexity, maintainability and extensibility. Dynamic attributes are the ones that reflect the behavior of the system during its execution. Examples of dynamic attributes are memory usage, latency, throughput, scalability, robustness and fault-tolerance.
Following the definition of expectations regarding the quality attributes, it is essential to devise ways to measure them and verify that the implemented system satisfies the requirements. Some static attributes may be measured with static code analysis tools, while others require effective design and code reviews. Measuring and verifying dynamic attributes requires special non-functional testing tools such as profilers and simulators.
In this talk I will discuss the main Software Quality attributes, both static and dynamic, examples of requirements, and practical guidelines on how to measure and verify these attributes.
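One static attribute mentioned above, complexity, can be approximated with exactly the kind of static-analysis pass the talk describes. The sketch below uses Python's standard `ast` module; the set of branch nodes and the sample source are illustrative choices, not a standardized metric:

```python
import ast

# Nodes treated as branch points (an illustrative choice, not a standard)
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)

def cyclomatic_estimate(source: str) -> dict:
    """Rough per-function complexity: 1 + number of branch points."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            scores[node.name] = 1 + branches
    return scores

sample = """
def trivial(x):
    return x

def branchy(x):
    if x > 0:
        for i in range(x):
            x += i
    return x
"""
print(cyclomatic_estimate(sample))  # {'trivial': 1, 'branchy': 3}
```

A real review process would track such scores over time and flag functions whose complexity trend worsens.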
Presenting the landscape of AI/ML in 2023: a quick summary of the last 10 years of progress, the current situation, and a look at things happening behind the scenes.
Automating Machine Learning, Artificial Intelligence, and Data Science Processes... | Ali Alkan
Automating Machine Learning, Artificial Intelligence, and Data Science | Guided Analytics
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E... | DATAVERSITY
Many data scientists are well grounded in delivering results in the enterprise, but many come from outside: from academia, from PhD programs and research. They have the necessary technical skills, but those skills don't count until their product gets into production and use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
MLOps and Data Quality: Deploying Reliable ML Models in Production | Provectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
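As a minimal illustration of the Data Quality components the agenda describes, the sketch below implements a toy validation gate; the schema and the example records are invented for demonstration, not part of the Provectus stack:

```python
# Expected column types for incoming records (hypothetical schema)
SCHEMA = {"user_id": int, "amount": float}

def validate(records):
    """Return (row_index, column) pairs that violate type or null expectations."""
    bad = []
    for i, rec in enumerate(records):
        for col, typ in SCHEMA.items():
            val = rec.get(col)
            if val is None or not isinstance(val, typ):
                bad.append((i, col))
    return bad

rows = [{"user_id": 1, "amount": 9.99},
        {"user_id": "x", "amount": None}]
print(validate(rows))  # [(1, 'user_id'), (1, 'amount')]
```

In an MLOps pipeline such a gate would run before training and before serving, so that bad data fails fast instead of silently degrading the model.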
How to analyze text data for AI and ML with Named Entity Recognition | Skyl.ai
About the webinar
The Internet is a rich source of data, mainly textual data. But making use of huge quantities of data is a complex and time-consuming task. NLP can help with this problem through the use of Named Entity Recognition systems. Named entities are terms that refer to names, organizations, locations, values, etc. NER annotates text, marking where and what type of named entities occur in it. This step significantly simplifies further use of such data, allowing for easy categorization of documents, sentiment analysis, improved automatically generated summaries, and more.
Further, in many industries the vocabulary keeps changing and growing with new research, abbreviations, and long, complex constructions, making it difficult to get accurate results or to use rule-based methods. Named Entity Recognition and Classification can help to effectively extract, tag, index, and manage this fast- and ever-growing knowledge.
Through this webinar, we will understand how NER can be used to extract key entities from large volumes of text data
What you will learn
- How organizations are leveraging Named Entity Recognition across various industries
- Live demo: identify & classify complex terms with NERC (Named Entity Recognition & Categorization)
- Best practice to automate machine learning models in hours not months
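To make the NER idea above concrete, here is a deliberately naive gazetteer-based tagger in pure Python. Real systems like the one demoed learn entity categories from annotated corpora; the hand-built dictionary and the sentence below are hypothetical examples:

```python
import re

# Hypothetical hand-built gazetteer; trained NER models infer these
# labels from context instead of looking them up.
GAZETTEER = {
    "Google": "ORG",
    "Fidelity": "ORG",
    "Karthik Uppuluri": "PERSON",
    "Mountain View": "LOC",
}

def tag_entities(text: str):
    """Return sorted (entity, label) pairs for every gazetteer hit in text."""
    hits = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            hits.append((m.group(), label))
    return sorted(hits)

print(tag_entities("Karthik Uppuluri of Fidelity spoke in Mountain View."))
```

The limitation is exactly the one the webinar raises: a fixed list cannot keep up with changing vocabulary, which is why learned NER models are preferred.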
Design Patterns for Machine Learning in Production - Sergei Izrailev, Chief D... | Sri Ambati
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/-rGRHrED94Y.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Abstract:
Most machine learning systems enable two essential processes: creating a model and applying the model in a repeatable and controlled fashion. These two processes are interrelated and pose technological and organizational challenges as they evolve from research to prototype to production. This presentation outlines common design patterns for tackling such challenges while implementing machine learning in a production environment.
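The two interrelated processes the abstract names, creating a model and applying it repeatably, are often decoupled by persisting the model as an artifact that the serving process reloads. A minimal sketch, assuming nothing from the talk itself (the "model", a stored mean, is a placeholder):

```python
import os
import pickle
import tempfile

def train(data):
    # "Training": fit a trivial model (the mean of the data)
    return {"mean": sum(data) / len(data)}

def apply_model(model, x):
    # "Serving": apply the model in a repeatable, controlled way
    return x > model["mean"]

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:          # training process persists the artifact
    pickle.dump(train([1.0, 2.0, 3.0]), f)
with open(path, "rb") as f:          # serving process reloads it
    model = pickle.load(f)
print(apply_model(model, 5.0))  # True
```

The persisted artifact is what lets the two processes evolve on different schedules, one design pattern for moving from prototype to production.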
Sergei's Bio:
Dr. Sergei Izrailev is Chief Data Scientist at BeeswaxIO, where he is responsible for data strategy and building AI applications powering the next generation of real-time bidding technology. Before Beeswax, Sergei led data science teams at Integral Ad Science and Collective, where he focused on architecture, development and scaling of data science based advertising technology products. Prior to advertising, Sergei was a quant/trader and developed trading strategies and portfolio optimization methodologies. Previously, he worked as a senior scientist at Johnson & Johnson, where he developed intelligent tools for structure-based drug discovery. Sergei holds a Ph.D. in Physics and Master of Computer Science degrees from the University of Illinois at Urbana-Champaign.
The field of machine programming — the automation of the development of software — is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In today’s technological landscape, software is integrated into almost everything we do, but maintaining software is a time-consuming and error-prone process. When fully realized, machine programming will enable everyone to express their creativity and develop their own software without writing a single line of code. Intel realizes the pioneering promise of machine programming, which is why it created the Machine Programming Research (MPR) team in Intel Labs. The MPR team’s goal is to create a society where everyone can create software, but machines will handle the “programming” part.
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15 | MLconf
Incorporating the Real Time Component into Analytics and Machine Learning: Many industries and organizations today want to harness the power of big data analytics and machine learning for their potential to improve margins, enhance discoveries, give insight into the business, and enable fast data-driven decisions. The challenges include difficulty in using available systems, not knowing where to start or which tools make sense for a particular problem, and dealing with data sets that are too big, too fast, or too complicated to handle with traditional systems.
RTDS Inc. has developed SymetryML™, a technology for zero-latency machine learning and analytics/exploration of very large datasets in real time, with a focus on speed, accuracy and simplicity. Our goal has been to cut the memory footprint required to learn large data sets, provide "reducer" functionality to automatically select the best attributes for model creation, and build models on the fly. SymetryML™ is also designed for easy integration into existing business processes via either an easy-to-use Web UI or RESTful APIs.
This talk will explore some of the functionality of these systems including real time exploration of data, fast multi-variate model prototyping, and our use of GPUs and parallelization. An example of brain related data and the complexities of analytics will be discussed as well as a brief overview of other verticals we are exploring. Our work is geared towards making big data make sense in real time and enable users to gain insights faster than traditional methods.
Arocom is a consulting and solution engineering company with expertise in providing engineering services for AI & Machine Learning, Data Operations & Analytics, MLOps and Cloud Computing.
Our clients include companies within biotech, drug discovery, therapeutics, manufacturing, retail and startups. Our consultants are best in their skills and offer hands-on talent to our clients in achieving their goals.
Building machine learning muscle in your team and transitioning them to doing machine learning at scale. We also discuss Spark and other relevant technologies.
AI improves software testing by Kari Kakkonen at TQS | Kari Kakkonen
AI (Artificial Intelligence) can make software testing better, and it is already happening. My presentation at the Test & Quality Summit online on 16.9.2020 covers a bit of Artificial Intelligence / Machine Learning theory, then shows through a NASA code-quality case that AI can be very precise in spotting problems. Finally, I take a look at the software testing industry, which already has many AI-powered tools and projects. Thanks to the team at Knowit and all the references in the content. I hope all of us start accelerating towards reaping the benefits of AI.
AI for Customer Service: How to Improve Contact Center Efficiency with Machin... | Skyl.ai
About the webinar
It only takes one bad interaction for a customer to abandon a service or product. Businesses are no longer just competing with other companies' products; they're competing with a customer's last service experience. Contact centers worldwide are looking for new and strategic ways to increase operational performance, reduce cost, and still provide high-touch customer experiences that improve customer loyalty and highlight ways to increase revenue and productivity.
Through this webinar, we will understand how AI can augment the effort, focus, and problem-solving abilities of human agents so that they can tackle more complex or creative tasks. With an abundance of data from logs, emails, chat, and voice recordings, contact centers can ingest this data to provide contextual customer service at the right time and in the right way, delivering satisfactory customer service and retaining brand value.
What you will learn
- How organizations are building engaging interactions that deliver value to customers
- Best practices to automate AI/ML models
- Demo: How to route customer queries to the right department or professional
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... | Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large number of small workload submissions, and is expected to be a non-issue when the computation is performed on massive graphs.
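For reference, the Monolithic baseline the report compares against is ordinary power-iteration PageRank. Below is a pure-Python sketch (not the report's CSR-based code) including the uniform dead-end redistribution that the Levelwise precondition is about; the toy 3-cycle graph is illustrative:

```python
def pagerank(adj, d=0.85, tol=1e-10):
    """Power-iteration PageRank with uniform dead-end redistribution."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    while True:
        nxt = {v: (1.0 - d) / n for v in adj}
        for v, outs in adj.items():
            if outs:  # spread rank along out-edges
                share = d * rank[v] / len(outs)
                for u in outs:
                    nxt[u] += share
            else:     # dead end: redistribute rank uniformly
                for u in nxt:
                    nxt[u] += d * rank[v] / n
        if sum(abs(nxt[v] - rank[v]) for v in adj) < tol:
            return nxt
        rank = nxt

g = {1: [2], 2: [3], 3: [1]}  # simple 3-cycle: all ranks should be equal
r = pagerank(g)
print(round(r[1], 6))  # 0.333333
```

Levelwise PageRank would instead run this iteration per strongly connected component, in topological order, which is why dead ends (handled by the global redistribution above) must be eliminated first.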
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... | John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph : SHORT REPORT / NOTES | Subhajit Sahu
Graph algorithms such as PageRank are commonly implemented over Compressed Sparse Row (CSR), an adjacency-list-based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
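The float-vs-bfloat16 storage experiment above can be reproduced in miniature: storing each element as bfloat16 (the top 16 bits of a float32) discards mantissa precision, and the loss accumulates over a long sum. The truncation helper and the data are illustrative, not the report's code:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Simulate bfloat16 storage: keep only the top 16 bits of float32."""
    bits, = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

values = [1e-3] * 100_000
exact = sum(values)                              # float64 accumulator
bf16_stored = sum(to_bfloat16(v) for v in values)
print(round(exact, 3))                           # 100.0
print(abs(bf16_stored - exact) > 1e-3)           # True: storage precision lost
```

This is the trade-off the benchmark measures: bfloat16 halves memory traffic per element at the cost of roughly 8 mantissa bits of precision.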
2. Everything software!
• Software is eating the world, in all sectors
• "In the future, all companies will be software companies" (Marc Andreessen, founder of Netscape)
3.
4. Take-aways from this talk
• Big data is the most important enabler in AI4SE
• AI4SE is closer than we think
• We will still be needed to teach ML/AI
5. Who am I?
• Professor in Software Engineering at Chalmers | University of Gothenburg
• Specialization in software measurement
  – Machine learning in software engineering
  – Autonomous artificial-intelligence-based measurement
  – Measurement knowledge discovery
  – Simulation of outcome before decision formulation
  – Metrological foundations of measurement reference etalons
• Actively working with the standards
  – ISO/IEC 15939 - Software and Systems Engineering - Measurement Processes
  – ISO/IEC 25000 (series) - Software Quality Requirements and Evaluation (SQuaRE)
  – ISO/IEC 14598 - Information Technology - Software Product Evaluation
• Software Center: a collaboration between 13 companies and 5 universities
6. Challenges of modern SE
• Need for speed
  – New releases are expected by the market almost on a daily basis
  – Years -> Months -> Weeks
• Data-driven development
  – Development decisions are taken based on data from software development
• Empowerment
  – The teams who have the data should make the decisions
• Ecosystems
  – Services grow around products
  – Products grow around platforms
7. Why AI and ML is a paradigm shift…
Given an image of the digit 5, a conventional program states: "This is number 5."
A machine learning model instead states that there is:
- 60 % probability that this is number 5
- 30 % probability that this is number 3
- 10 % probability that this is number 1
8. AI4SE is already here, we just did not know it yet
• Intelligent software development environments [1]
  – Visual Studio (IntelliCode), Kite (Python), Codota
• Requirements engineering [2]
  – Algorithms for natural language processing, Hill Climbing for requirement evolution
• Automated testing [3]
  – Test automation, test identification, test orchestration

[1] https://livablesoftware.com/smart-intelligent-ide-programming/
[2] Groen, E.C., Harrison, R., Murukannaiah, P.K., et al. Autom Softw Eng (2019).
[3] T. M. King, J. Arbon, D. Santiago, D. Adamo, W. Chin and R. Shanmugam, "AI for Testing Today and Tomorrow: Industry Perspectives," 2019 IEEE International Conference on Artificial Intelligence Testing (AITest).
10. Application areas: data sources, ML methods, difficulty and impact

Application | Data sources | ML methods [tools] | Difficulty level | ROI / Impact
Defect prediction | JIRA, ClearQuest, BugZilla | Regression [Excel, R, Weka, Python]; Classification [R, Weka, Python] | Low | High / decision support
CCFlex ML metrics | Git, SVN, ClearCase | Decision trees [CCFlex, R, Weka, Python] | Medium | Medium / data collection
Test optimization | Test tools, Portals, Test DBs | Classification; Cluster analysis; Reinforcement learning [R, Weka, Python] | High | High / development practices
Customer data analysis | Field data DB | Classification; Cluster analysis; Decision trees [R, Weka, Python] | High | High / decision support
KPI trend analysis | Metrics DB | Classification; Regression [R, Weka, Python] | Medium | Medium / dissemination
Requirements quality assessment | Requirements DB, ReqPro, DOORS | Classification; Clustering [R, Weka, Python] | Low | Medium / development practices
Dashboard support | Metrics DB | Classification; Time series [R, Weka, Python] | Low | Medium / decision support
Defect classification | JIRA, ClearQuest, BugZilla | Decision trees; Clustering [R, Weka, Python] | Medium | Medium / development practices
Speed / CI | Gerrit, Jenkins | Deep learning; Decision trees [R, Weka, Python] | High | Medium / development practices

(The "Examples of visualization" column contained screenshots on the original slide.)
11. Typical application of AI in SE
Data mining pipeline: Raw data exports -> Feature acquisition -> Scaling, cleaning, wrangling -> Machine learning -> Decision support / AI
Image by Gerd Altmann from Pixabay
12. Machine learning / AI is just a small part of the whole pipeline
• Production ML systems needed for software engineering are still some way off
– Lack of high-quality, labelled data
– Limited analysis capabilities due to non-obfuscated data sets
– Non-standardized feature extraction
– Manual configuration of data workflows
Source: https://developers.google.com/machine-learning/crash-course/production-ml-systems
13. One of the fundamental challenges of applying ML in software engineering – feature extraction
How we see the number: 5
How the AI sees the number (pixel intensity grid):
0 1 1 1 1 0
0 1 0 0 0 0
0 1 0 0 0 0
0 1 1 1 0.5 0
0 1 0 0.5 1 0
0 1 0 0 1 0
0 1 0 0.5 1 0
0 0.5 1 1 0.5 0
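Before a classifier can use the grid above, it is typically flattened into a single feature vector. A minimal sketch using the exact intensities from the slide:

```python
# The pixel-intensity grid of the digit "5" from the slide
grid = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0.5, 0],
    [0, 1, 0, 0.5, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0.5, 1, 0],
    [0, 0.5, 1, 1, 0.5, 0],
]

# What the learner actually consumes: one flat vector, row by row
features = [value for row in grid for value in row]
```

The 8x6 image becomes 48 numbers; all spatial structure the human eye uses is gone unless the features (or the model architecture) put it back.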
14. One of the fundamental challenges of applying ML in software engineering – feature extraction (requirement)
How we see the requirement:
When ContainerType changes to "not available" then ContainerCapacity should be set to the last value as long as ContainerReset is requested.
How the AI sees the requirement:
Keyword: system | Keyword: should | Keyword: can | Keyword: and | Has_reference
0 | 1 | 0 | 0 | 0
The AI's ability to distinguish two requirements strongly depends on which features we extract.
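A minimal sketch of how such a keyword-presence encoding might look. The feature set follows the slide; the tokenization details are an assumption:

```python
import re

KEYWORDS = ("system", "should", "can", "and")

def extract_features(requirement):
    """Encode a requirement as the keyword-presence vector from the slide."""
    words = re.findall(r"[a-z]+", requirement.lower())
    features = {f"keyword_{kw}": int(kw in words) for kw in KEYWORDS}
    features["has_reference"] = int("reference" in words)
    return features

req = ('When ContainerType changes to "not available" then ContainerCapacity '
       'should be set to the last value as long as ContainerReset is requested.')
vector = extract_features(req)   # reproduces the slide's row: 0 1 0 0 0
```

With only five binary features, many very different requirements collapse onto the same vector, which is exactly the distinguishability problem the slide points at.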
15. Another fundamental challenge – lack of high-quality labelled data
Example of a good requirement:
When ContainerType changes to "not available" then ContainerCapacity should be set to the last value as long as ContainerReset is requested.
Example of a "bad" requirement:
The xxxxx concept shall allow changes in the configuration of the yyyyy modules after the software has been built. For detailed specification of which modules and parameters are changeable see reference zzzzz configuration specification.
To train an ANN we need 100,000+ data points, which we need to label manually.
16. Lack of high-quality labelled data – human inconsistency
Example of a good requirement:
When ContainerType changes to "not available" then ContainerCapacity should be set to the last value as long as ContainerReset is requested.
Example of a "bad" requirement:
The xxxxx concept shall allow changes in the configuration of the yyyyy modules after the software has been built. For detailed specification of which modules and parameters are changeable see reference zzzzz configuration specification.

Tool score | Reviewer 1 | Reviewer 2
78 | 4 | 4
67 | 5 | 3
62 | 4 | 4
62 | 5 | 4
62 | 4 | 4
60 | 4 | 4
60 | 4 | 4
58 | 4 | 3
55 | 4 | 5
53 | 4 | 4
49 | 3 | 4
49 | 4 | 3
47 | 1 | 1
46 | 4 | 3
42 | 3 | 3

Tool score | Reviewer 1 | Reviewer 2
-65 | 4 | 2
-15 | 2 | 2
-14 | 1 | 2
-13 | 2 | 2
-5 | 2 | 3
0 | 4 | 2
1 | Not req | 1
1 | 2 | 1
1 | 5 | 3
2 | 3 | 2
7 | 3 | 2
8 | 3 | 3
9 | 4 | 3
10 | 2 | 1
11 | 4 | 1

Green => good requirement, Red => "bad" requirement
18. Modern SW architecture: computer on wheels
• Industry (practice)
– Automotive software architectures are moving from federated (distributed) to integrated (centralized, virtualized) -> execution of more computationally demanding algorithms
– Modern automotive software combines stochastic and probabilistic algorithms -> new methods for safety assurance, fault detection/correction and diagnostics are needed
• Academia (theory)
– Data quality measures (consistency) are not related to the quality of AI algorithms (precision/recall) -> novel data quality measures are needed to assess how well our data sets reflect the entire solution space
– ML and AI are difficult to test (development) and diagnose (runtime) -> new methods for testing and diagnostics are needed
19. [Image: four driving scenes with classifier outputs]
• "No other cars" (35%) - false negative
• "There is snow" (66.7%) - false positive
• "There is an animal" (99%) - true positive
• "You can drive here" (99%) - true positive
???
20. Way forward with ML/AI and automotive software
• We need new ways to create/develop sustainable architectural designs.
– Automotive software architectures are moving from federated to integrated.
– Execution is computationally demanding.
– Automotive software development is moving to Agile (post-deployment, Adaptive AUTOSAR).
• We need new ways to assure the quality of such systems.
– Existing data quality measures are not related to the quality of AI algorithms.
– ML and AI are difficult to test (development) and diagnose (runtime).
– Traditional assertions do not accommodate the stochastic nature of modern algorithms.
– There are no systematic ways of handling training/test datasets for QA.
22. Research questions for AI/ML-based measurement
• How to quantify entities without predefined patterns?
• How to flexibly define measurement instruments based on machine learning?
• How to discover the patterns of countable attributes using machine learning?
• How to discover new data patterns (e.g. anomalies)?
• How to define the measurement functions using machine learning algorithms?
• How to discover new patterns in data which can be communicated to the stakeholders?
• How to use machine learning to describe the patterns?
• How to use machine learning in visual analytics?
• How can we use machine learning to mine for standard models?
• How can we generate new decision criteria using machine learning?
We study the use of machine learning to
– identify the behavior of SW code by finding where the relevant code is
– classify which defects are important, based on their description, to save analysis time
– identify bottlenecks in continuous integration, based on integration stop patterns
– identify which KPIs should be removed because they do not provide any value
23. OUR EXPERIENCES FROM USING MACHINE LEARNING IN SE
WHICH DEFECT SHOULD WE FIX FIRST?
25. Defects database
• Product: large, > 10 MLOC
• Period: 2010-17
• Total records: ~14K
• Different filters applied
Main tools: [shown on slide]
26. Problem formulation
• How can we predict the severity of the defect?
– Imagine we discover a bug
– We need to quickly assess if this bug should be fixed in this release or not
– We need to assess if this is going to be a lot of work
• Today’s solution
– Architect and quality engineer make the assessment
• We can do better!
27. Mining association rules for defect prioritization
Rule 1 (supp = 0.0016, confidence = 0.83, lift = 9.95):
{phaseFound = PRODUCT VALIDATION TESTING, answerCode = B2 - To be corrected in this release, Importance = 30} => {Severity = A}
Rule 2 (supp = 0.0011, confidence = 0.88, lift = 10.45):
{phaseFound = Customer, answerCode = B2 - To be corrected in this release, submittedOnSystemPart = VERY IMPORTANT PART} => {Severity = A}
Rule 3 (supp = 0.0013, confidence = 0.80, lift = 9.55):
{phaseFound = PRODUCT VALIDATION TESTING, answerCode = B2 - To be corrected in this release, FollowUpOn=, ClonedToAllReleases = YES, submittedOnSystemPart = LI} => {Severity = A}
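The three metrics attached to each rule have simple definitions: support is the fraction of records containing both sides, confidence is how often the consequent holds given the antecedent, and lift is confidence divided by the consequent's base rate. A minimal sketch over toy defect records (the records and values are made up for illustration; they do not reproduce the rules above):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule: antecedent => consequent."""
    n = len(transactions)
    n_ant = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ant
    lift = confidence / (n_cons / n)
    return support, confidence, lift

# Toy defect records as sets of attribute=value items (hypothetical)
defects = [
    {"phaseFound=Customer", "answerCode=B2", "Severity=A"},
    {"phaseFound=Customer", "answerCode=B2", "Severity=A"},
    {"phaseFound=Customer", "answerCode=B2", "Severity=A"},
    {"phaseFound=Customer", "answerCode=B2", "Severity=B"},
    {"phaseFound=FT", "Severity=B"},
    {"phaseFound=FT", "Severity=A"},
    {"phaseFound=ST", "Severity=B"},
    {"phaseFound=ST", "Severity=B"},
    {"phaseFound=DT", "Severity=B"},
    {"phaseFound=DT", "Severity=B"},
]

s, c, l = rule_metrics(defects,
                       {"phaseFound=Customer", "answerCode=B2"},
                       {"Severity=A"})
# s = 0.3, c = 0.75, l = 1.875: the rule fires rarely but is far
# better than guessing the base rate of Severity A (0.4)
```

A lift around 10, as in the mined rules above, means the antecedent makes Severity A roughly ten times more likely than its base rate.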
28. Can we distinguish Severity A defects from others?
Decision tree: J48 (Weka) + ClassBalancer
J48 pruned tree (example)
------------------
VerificationLevelRequired =
| phaseFound = : A (1.62)
| phaseFound = Customer: A (60.88/12.3)
| phaseFound = Design Test (DT): Other (38.48/8.1)
| phaseFound = Document review (CPI): Other (11.75/1.62)
| phaseFound = FOA: A (28.66/10.85)
| phaseFound = Function Test (FT): Other (228.56/40.48)
| phaseFound = PRODUCT VALIDATION TEST: Other (6.86/3.24)
| phaseFound = INTERNAL TEST: Other (5.79)
| phaseFound = Requirement Review: Other (5.06)
| phaseFound = System Test (ST): Other (148.34/61.53)
VerificationLevelRequired = Customer: A (3.24)
VerificationLevelRequired = Design Test (DT): A (22.67)
VerificationLevelRequired = Function Test (FT): A (66.39)
VerificationLevelRequired = PRODUCT VALIDATION TEST: A (6.48)
VerificationLevelRequired = Requirement Review: A (4.86)
VerificationLevelRequired = System Test (ST): A (66.39)
Number of Leaves : 16
Size of the tree : 18
Accuracy = 77.70 %
True Positive(A) = 0.642
False Positive(A) = 0.088
F-Score(A) = 0.742
True Positive (Other) = 0.912
False Positive (Other) = 0.358
F-Score(Other) = 0.804
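The evaluation numbers above follow the standard confusion-matrix definitions. A minimal sketch of how they are computed, with a tiny made-up label set (not the study's data):

```python
def metrics(y_true, y_pred, positive="A"):
    """Accuracy, TP rate, FP rate and F-score for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    tp_rate = tp / (tp + fn)          # recall for the positive class
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * tp_rate / (precision + tp_rate)
    return accuracy, tp_rate, fp_rate, f_score

# Hypothetical predictions for six defects
y_true = ["A", "A", "A", "Other", "Other", "Other"]
y_pred = ["A", "A", "Other", "Other", "Other", "A"]
accuracy, tp_rate, fp_rate, f1 = metrics(y_true, y_pred)
```

Reporting per-class TP/FP rates alongside accuracy, as the slide does, matters because the classes are imbalanced (hence the ClassBalancer).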
29. Can we distinguish Severity A defects from others?
• Potentially valuable features (using a filter):
– phaseFound
– Keywords in headline: branch, test case, underscore
– Keywords in description: descr_info, descr_requirement, descr_test, descr_debug, descr_log, …
– DaysUntilAssigned
• Records = 6342
• Features = 49
– Directly available
– Time periods between changes of states
– Keyword appearance in description and header
30. How many parameters do we need to make good classifications?
Observations: 14K; supp = 0.001, conf = 0.8 => 263 rules, 37 pruned
32. Practical implications
• We can get much faster with ML
– Human assessment is deferred to later phases
• We need to learn how to work with probabilities
– We can no longer treat outcomes as binary
• Machine programming
– In the next few years we will see programs that repair and even write themselves using ML approaches
33. EXAMPLE OF OUR RESEARCH
SPEED UP SOFTWARE DEVELOPMENT
USING MACHINE LEARNING
IN COLLABORATION WITH M. OCHODEK (POZNAN UNIV. OF TECHNOLOGY), R. HEBIG (CHALMERS | UNIV. OF GOTHENBURG), W. MEDING (ERICSSON), G. FROST (GRUNDFOS)
34. Initial diagnosis: recognizing coding violations
Research questions addressed:
• How to quantify entities without predefined patterns?
• How to flexibly define measurement instruments based on machine learning?
• How to discover the patterns of countable attributes using machine learning?
• Problem
– How can we measure the quality of source code based on arbitrary coding guidelines?
• Solutions
– Manual code reviews
– Static analysis
– Manual coding of new rules for static analysis
– Machine learning of arbitrary coding guidelines
35. Measuring code quality – cycle 1: manual examples
• Problem
– How can we detect violations of coding styles in a dynamic way? Dynamic == the rules can change without the need for tool reconfiguration
• Solution at a glance
– Teach the code counter to recognize coding standards (e.g. use the examples from the company's coding standard tutorials)
– Use machine learning as the tool's engine to define the formal rules
– Apply the tool to the code base to find violations
• Results
– 95%-99% accuracy of violation detection on open-source projects
Pipeline: coding standard examples -> machine learning -> product code base -> violations
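The cycle-1 idea can be illustrated with a deliberately tiny learner: count which tokens appear in the "violation" versus "ok" examples from a coding-standard tutorial, then score new lines by token overlap. This is a hypothetical sketch (the guideline, examples and scoring are made up; the real tool uses far richer features and models):

```python
from collections import Counter

def train(examples):
    """Count token frequencies per class from (line, label) example pairs."""
    counts = {"violation": Counter(), "ok": Counter()}
    for line, label in examples:
        counts[label].update(line.split())
    return counts

def classify(counts, line):
    """Pick the class whose training tokens overlap the line the most."""
    scores = {label: sum(c[tok] for tok in line.split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

# Made-up guideline: "do not use goto", with examples as a tutorial
# might list them
examples = [
    ("goto error ;", "violation"),
    ("goto cleanup ;", "violation"),
    ("return rc ;", "ok"),
    ("break ;", "ok"),
]
model = train(examples)
```

After training, `classify(model, "goto fail ;")` flags the unseen line because it shares the tell-tale token with the violation examples, which is the "teach by example" effect the slide describes.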
36. Feature acquisition
Source code (training set) -> feature engineering and extraction engine -> ML-encoded training set, e.g.:
File type | #Characters | If | … | Decision class
java | 25 | TRUE | … | Violation
… | … | … | … | …
37. Example features
• Plain text (F01-F04):
– File extension
– Full and trimmed length (characters)
– Tokens
• Programming language (F05-F19):
– Assignment
– Brackets
– Class
– Comment
– Semicolons
– …
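A minimal sketch of what encoding one source line into such features might look like. The feature names and the exact detection rules are assumptions; only the feature categories come from the slide:

```python
def encode_line(path, line):
    """Encode one source line into a subset of F01-F19-style features."""
    return {
        "file_extension": path.rsplit(".", 1)[-1],                # F01
        "length_full": len(line),                                 # F02
        "length_trimmed": len(line.strip()),                      # F03
        "n_tokens": len(line.split()),                            # F04
        # "==" is removed first so comparisons do not count as assignments
        "has_assignment": int("=" in line.replace("==", "")),
        "has_brackets": int(any(b in line for b in "{}()[]")),
        "has_class": int("class" in line.split()),
        "has_comment": int("//" in line or "/*" in line),
        "has_semicolon": int(";" in line),
    }

row = encode_line("Main.java", "  int counter = 25;  // loop counter")
```

Each source line becomes one row of the feature table on the previous slide; the decision class (Violation or not) is then attached from the labelled examples.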
38. Company 1: proprietary code (pilot)
• Set-up:
– Code base of ca. 7 MLOC
– One guideline
• Top diagram:
– The size of the training set (examples) is one of two major factors determining accuracy
– The other factor is the algorithm (not shown in the diagram)
• Bottom diagram:
– The first trials did not find anything
– Trial #5 resulted in finding all violations, plus some false positives (non-violations)
39. Results in the context of evolving code and guidelines
Company 2's guideline: a preprocessor directive should start at the beginning of the line
40. Recognizing more rules on a larger code base
Company 1 (again): 7 different violations
[Bar chart: F1-score, recall and precision per violation type; values range from 0.21 to 1.00]
41. What did we learn?
• Providing the examples is "boring"
• Training is "boring"
• Conclusion: faster than human reviewers, but still time-consuming
• Solution #2: Gerrit!
– Gerrit is a Google-developed code review tool for Git
42. Measuring code quality – cycle 2: automated examples
• Problem
– How can we detect violations of coding styles in a dynamic way? Dynamic == the rules can change over time based on the team's programming style
• Solution at a glance
– Teach the code counter to recognize coding standards by analyzing code reviews
– Use machine learning as the tool's engine to define the formal rules
– Apply the tool to the code base to find violations
• Results
– 75% accuracy
Pipeline: Gerrit reviews -> machine learning -> product code base -> violations
43. Feature acquisition
Source code (training set) -> feature engineering and extraction engine -> ML-encoded training set
File type | #Characters | If | … | Decision class
java | 25 | TRUE | … | Violation
… | … | … | … | …
Data set expansion: ca. 1,000 LOC -> 180,000 LOC
45. Neural network architecture
Encoded lines -> input layer -> convolution and recurrent layers -> output layer
• Convolution layer: recognizes low-level patterns (e.g. a non-standard "for")
• Recurrent layer: recognizes high-level patterns (e.g. non-compiled code)
• Output layer, e.g.: 90% probability of violation, 9.9% probability of non-violation, 0.1% probability of undecided
Technical challenges (examples):
• How many layers?
• How many neurons per layer?
• Convolution first vs recurrent first?
• Convolution parameters: window, stride, filters
• Recurrent parameters: forget function
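The convolution layer's job of spotting low-level patterns reduces to a 1-D convolution over the encoded line. A minimal pure-Python sketch of that operation (single filter, stride 1, no padding; the line encoding is hypothetical):

```python
def conv1d(sequence, kernel):
    """One filter of a 1-D convolution: slide the kernel with stride 1."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]

# Hypothetical encoding of a line: 1 where a token matches some motif
encoded_line = [0, 1, 1, 0, 1, 1, 1, 0]
detector = [1, 1, 1]     # fires strongest on three matches in a row

activations = conv1d(encoded_line, detector)
```

The window size and stride listed as open challenges on the slide are exactly the `len(kernel)` and step size in this loop; a trained network learns the kernel weights rather than using a fixed detector.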
46. The NN understands the programming language
• Word embeddings provide the context
• We use the Linux kernel as the vocabulary source
• The larger the code base, the better the results from the neural network
– Ca. 20,000 words in the vocabulary
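Building that vocabulary amounts to tokenizing a large code corpus and keeping the frequent identifiers and keywords, each mapped to an integer id that the embedding layer looks up. A minimal sketch (the snippet is invented; the real pipeline tokenizes the whole Linux kernel to reach ca. 20,000 words):

```python
import re
from collections import Counter

def build_vocab(corpus, min_count=2):
    """Map each sufficiently frequent identifier/keyword to an integer id."""
    tokens = re.findall(r"[A-Za-z_]\w*", corpus)
    frequent = [t for t, n in Counter(tokens).most_common() if n >= min_count]
    return {tok: i for i, tok in enumerate(frequent)}

# Tiny made-up stand-in for the kernel sources
kernel_snippet = """
static int init_module(void) { int err = 0; return err; }
static void exit_module(void) { return; }
"""
vocab = build_vocab(kernel_snippet, min_count=2)
```

Rare tokens fall below `min_count` and are dropped (or mapped to an out-of-vocabulary id), which is why a larger code base yields a richer vocabulary and, as the slide notes, better network results.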
48. Conclusions and take-aways
• Big data is the most important enabler in AI4SE
• AI4SE is closer than we think
• We will still be needed to teach ML/AI