The slides from my presentation at the International Conference on Software Reuse (ICSR'22).
The paper: https://link.springer.com/chapter/10.1007/978-3-031-08129-3_2
Free preprint: https://hal.archives-ouvertes.fr/hal-03647706/
Data Mining-based Tools to Support Library Update. PhD Defence of Oleksandr Z... (Oleksandr Zaitsev)
Oleksandr ZAITSEV obtaining PhD in Informatics from the University of Lille.
Title: Data Mining-based Tools to Support Library Update
Date: October 28, 2022
Location: Inria Lille - Nord Europe. Park Plaza, Parc scientifique de la Haute-Borne, 6 Rue Heloïse Bât B, 59650 Villeneuve-d'Ascq, France
Composition of the jury:
Supervisor: Stéphane DUCASSE
Co-supervisor: Nicolas ANQUETIL
Industrial advisor: Arnaud THIEFAINE
Reviewers: Romain ROBBES, Coen de ROOVER
Examiner: Olga KOUCHNARENKO
This was a Cifre PhD between Inria research institute and Arolla software company. Oleksandr ZAITSEV is grateful to Arolla for sponsoring his research.
This document provides instructions for performing a variant calling analysis on genomic data from the 1000 Genomes Project. The steps include:
1. Running GATK's UnifiedGenotyper on the IGB biocluster to call variants from an aligned BAM file.
2. Hard filtering the called variants to remove low quality calls.
3. Annotating the filtered variants with SnpEff to add gene information.
4. Visualizing the results in the Integrative Genomics Viewer (IGV) desktop tool to inspect called variants and coverage.
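Step 2 above (hard filtering) can be sketched in Python as follows. This is only an illustration of the idea, not the GATK workflow itself: the QUAL cutoff of 30 and the sample records are assumptions made up for the example, and a real analysis would normally apply GATK's own filtering tools to the VCF file.

```python
# Minimal sketch of hard filtering on VCF-style records.
# The threshold and the sample lines are illustrative assumptions.

MIN_QUAL = 30.0  # hypothetical cutoff, not taken from the original document


def hard_filter(vcf_lines, min_qual=MIN_QUAL):
    """Keep header lines and variant records whose QUAL field meets the cutoff."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


sample = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\t.\tG\tA\t29.0\t.\tDP=14",
    "20\t17330\t.\tT\tA\t67.5\t.\tDP=11",
]
filtered = hard_filter(sample)
```

In this sketch the two header lines and the record with QUAL 67.5 are kept, while the record with QUAL 29.0 is dropped.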
This document provides information about Javier Eguiluz, a programmer and trainer from Spain who specializes in Symfony and Twig. It outlines his experience with Symfony and as the author of a popular Symfony book. The agenda covers tips, tricks, advanced features, best practices and new noteworthy features of the Twig templating engine.
The document discusses renaming features in Visual Studio 2015. It describes how VS2015 provides renaming assistance through suggestions from the light bulb icon and previews changes before renaming. The renaming window allows renaming variables, methods, properties, classes, parameters and strings. It can also rename code comments and detect conflicts if the new name already exists. Renaming occurs inline and on the fly. The examples demonstrate renaming a variable, method, and parameter across multiple files. VS2015 helps optimize code through intelligent and automated renaming.
The document discusses designing good object-oriented classes. It provides guidance on choosing appropriate classes, maintaining cohesion so that classes represent single concepts, minimizing dependencies between classes, and reducing side effects from method calls. Examples are given for common patterns when designing classes, such as keeping a running total, counting events, collecting object values, managing object properties, modeling object states, and describing an object's position. The reader is taught how to apply these design principles and patterns when modeling real-world problems as classes, methods, and objects.
The document outlines the five steps of the Theory of Constraints for improving a system: 1) Identify the system's constraints, 2) Decide how to exploit the constraints, 3) Subordinate all other decisions to exploiting the constraints, 4) Elevate the constraints, and 5) If elevating breaks a constraint, return to step 1. It provides examples of applying these steps to identify and address bottlenecks in a manufacturing system to maximize throughput.
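The first and fourth steps can be illustrated with a small Python sketch. It assumes a simple serial production line in which throughput is capped by the slowest stage; the stage names and capacities are invented for the example and do not come from the summarized document.

```python
# Sketch: in a serial production line, throughput is capped by the slowest stage.
# Stage names and capacities (units per hour) are made-up illustration values.


def find_constraint(capacities):
    """Step 1: return (stage, capacity) for the stage with the lowest capacity."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


def system_throughput(capacities):
    """A serial line cannot produce faster than its bottleneck."""
    return min(capacities.values())


line = {"cutting": 120, "welding": 45, "painting": 80}

bottleneck, rate = find_constraint(line)  # welding limits the line
before = system_throughput(line)

# Step 4 (elevate): add capacity at the constraint, then return to step 1,
# because a different stage may now be the bottleneck.
line[bottleneck] = 100
new_bottleneck, _ = find_constraint(line)
after = system_throughput(line)
```

After welding is elevated, painting becomes the new constraint, which is why the process loops back to step 1 rather than stopping.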
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1ZW7TDL.
Richard Dallaway shows an example of what Scala looks like when using pattern matching over classes, how to encode an idea into types and use advanced features of Scala without complicating the code. Filmed at qconlondon.com.
Richard Dallaway is a partner at Underscore -- a consultancy specializing in Scala, especially the type-driven and functional aspects of Scala. He works on client projects writing software and helping teams deliver software with Scala. His focus is on the web, machine learning, and code review. He's the co-author of "Essential Slick" (Underscore), and author of the "Lift Cookbook" (O'Reilly).
This document discusses cloning Twitter using Redis by storing user, follower, and post data in Redis keys and data structures. It provides examples of how to store:
1) User profiles as Hashes with fields like username and ID.
2) Follower and following relationships as Sorted Sets with user IDs and timestamps.
3) User posts and timelines as Lists by pushing new post IDs.
It explains that while Redis lacks tables, its keys and data structures like Hashes, Sets and Lists allow building the same data model without secondary indexes. The document also notes that the system can scale horizontally by sharding the data across multiple Redis servers.
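The data model described above can be sketched with plain Python containers standing in for the Redis structures, so that the example runs without a Redis server. The key names ("user:<id>", "followers:<id>", "timeline:<id>") and the choice to copy each new post ID to every follower's timeline are assumptions made for illustration, not details taken from the summarized slides.

```python
# In-memory stand-in for the three Redis structures named above.
# Plain Python containers are used instead of a Redis client;
# all key names and helper functions here are hypothetical.

store = {}


def create_user(user_id, username):
    # Hash: one field/value mapping per user profile
    store[f"user:{user_id}"] = {"id": user_id, "username": username}
    store[f"followers:{user_id}"] = {}  # sorted-set stand-in: member -> score
    store[f"timeline:{user_id}"] = []   # list of post IDs, newest first


def follow(follower_id, target_id, timestamp):
    # Sorted set: follower ID as member, follow time as score
    store[f"followers:{target_id}"][follower_id] = timestamp


def post(author_id, post_id):
    # Push the new post ID onto the author's and each follower's timeline
    store[f"timeline:{author_id}"].insert(0, post_id)
    for follower_id in store[f"followers:{author_id}"]:
        store[f"timeline:{follower_id}"].insert(0, post_id)


def followers_by_time(user_id):
    # Similar in spirit to reading a sorted set in score order
    members = store[f"followers:{user_id}"]
    return sorted(members, key=members.get)


create_user(1, "alice")
create_user(2, "bob")
create_user(3, "carol")
follow(3, 1, timestamp=200)
follow(2, 1, timestamp=100)
post(1, post_id=501)
```

Every lookup goes through a key built from an ID, which mirrors the point that the model works without tables or secondary indexes.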
This chapter discusses arrays, loops, and conditional statements in JavaScript. It covers how to store data in arrays, access and modify array elements, and determine an array's length. The chapter also explains while, do/while, and for loops for repeatedly executing code. It describes if, if/else, and switch conditional statements for making decisions, including nesting statements and else if constructions.
The document provides information about ServiceNow concepts including variables, variable sets, UI policies, user criteria, catalog client scripts, script includes, and search. It defines variables and variable sets, and explains how to create UI policies and user criteria. It also describes how to create catalog client scripts and script includes, and discusses using script includes to store reusable JavaScript functions. The document concludes with an overview of searching for different record types like requests, request items, and tasks.
.NET December 2017 updates - Tamir Dresher
This document summarizes updates for application developers and DevOps from 2017, including new features in Visual Studio 2017, Visual Studio App Center, Live Share, and connectivity to Azure Kubernetes Service. It also covers upcoming features for C# 7.2 and 8.0 like private protected access modifiers, readonly structs and arguments, nullable reference types, and Span<T> to reduce memory allocations and improve performance.
Fyber implemented XGBoost models for two main use cases: Audience Vault Reach prediction and CTR prediction for their offer wall. For Audience Vault Reach, XGBoost with Spark was used to predict audience size over the next 14 days using historical user activity data. For CTR prediction, XGBoost ranked offers based on attributes to better estimate performance compared to old manual configurations. Both models involved data preprocessing, feature engineering, training XGBoost pipelines on Spark, and integrating the models into products.
This document provides an overview of data science and how it can be done in Node.js. It defines data science as combining software engineering and statistical analysis. It discusses regression modeling and recommender systems as examples. Regression modeling predicts future values like number of users based on past data. Recommender systems predict what other products a customer may buy based on their preferences. Node.js is recommended for data science due to its event-driven asynchronous nature and packages like NPM and D3.js. Code examples are provided for both techniques.
DataScienceLab2017_Patient similarity: cleaning out duplicates and predicting mis... (GeeksLab Odessa)
DataScience Lab, May 13, 2017
Patient similarity: cleaning out duplicates and predicting missed diagnoses
Viktor Sarapin (CEO at V.I.Tech)
How to efficiently detect duplicates among tens of millions of patients, and how to identify missed diagnoses and treatment actions.
All materials are available at: http://datascience.in.ua/report2017
Chapter 5: Understanding Variable Scope and Class Construction (It Academy)
Exam Objective 4.2 Given an algorithm as pseudo-code, determine the correct scope for a variable used in the algorithm and develop code to declare variables in any of the following scopes: instance variable, method parameter, and local variable.
This document provides an overview of performance tuning the MySQL server. It discusses where to find server configuration and status information, how to analyze what the database is doing using status variables, and which configuration variables can be tuned for optimization, including global, per-session, and storage engine variables. Key areas covered include memory usage, query analysis, indexing strategies, and tuning storage engines like InnoDB and MyISAM.
This document summarizes a lecture on inheritance in Java. It discusses adding new methods like toString() to existing classes like ArrayIntList. It also covers defining multiple constructors, using the this and super keywords, and overriding and calling parent methods. The key aspects are:
1. The toString() method allows an object to return a String representation and avoids directly printing objects.
2. Multiple constructors can be defined and older ones can call newer ones using this().
3. Subclasses inherit fields and methods from parent classes and can override methods while still calling the parent version using super().
4. Constructors are not inherited so subclasses must define their own, but can call parent constructors using super().
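The four points above concern Java, but the same ideas can be sketched as a rough Python analog. The subclass name SortedIntList is hypothetical, and two differences from Java are worth noting: Python has no constructor overloading (a default argument stands in for a constructor that delegates with this()), and Python subclasses do inherit `__init__`, so the explicit constructor below is only there to show the super() call.

```python
# Python analog of the Java concepts summarized above.
# SortedIntList is a hypothetical subclass added for illustration.


class ArrayIntList:
    def __init__(self, capacity=10):
        # A default argument plays the role of a no-argument constructor
        # that would delegate to another one via this(...) in Java.
        self.capacity = capacity
        self.items = []

    def add(self, value):
        self.items.append(value)

    def __str__(self):
        # Counterpart of Java's toString(): return text instead of printing
        return "[" + ", ".join(str(v) for v in self.items) + "]"


class SortedIntList(ArrayIntList):
    def __init__(self, capacity=10):
        super().__init__(capacity)  # call the parent constructor

    def add(self, value):
        super().add(value)  # reuse the parent behavior...
        self.items.sort()   # ...then extend it

    def __str__(self):
        return "sorted " + super().__str__()


numbers = SortedIntList()
for n in (3, 1, 2):
    numbers.add(n)
text = str(numbers)
```

Here `text` holds the string "sorted [1, 2, 3]", built by the overriding method calling the parent version.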
Optimization for iterative queries on Mapreduce (makoto onizuka)
This document discusses optimization techniques for iterative queries with convergence properties. It presents OptIQ, a framework that uses view materialization and incrementalization to remove redundant computations from iterative queries. View materialization reuses operations on unmodified attributes by decomposing tables into invariant and variant views. Incrementalization reuses operations on unmodified tuples by processing delta tables between iterations. The document evaluates OptIQ on Hive and Spark, showing it can improve performance of iterative algorithms like PageRank and k-means clustering by up to 5 times.
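The idea behind incrementalization can be illustrated with a small Python sketch. This is not OptIQ and does not use Hive or Spark; it is a hand-written example, under the assumption that an iterative computation (here, connected components by minimum-label propagation) only needs to revisit the tuples that changed in the previous iteration.

```python
# Sketch of delta-based iteration: only nodes whose label changed in the
# previous round are processed again. Function and variable names are
# hypothetical and chosen for this example.


def connected_components(nodes, edges):
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    label = {n: n for n in nodes}  # every node starts in its own component
    delta = set(nodes)             # tuples modified in the previous iteration
    iterations = 0
    while delta:                   # converged once nothing changed
        next_delta = set()
        for n in delta:
            for m in neighbors[n]:
                if label[n] < label[m]:
                    label[m] = label[n]  # propagate the smaller label
                    next_delta.add(m)    # m must be revisited next round
        delta = next_delta
        iterations += 1
    return label, iterations


labels, rounds = connected_components(
    nodes=[1, 2, 3, 4, 5],
    edges=[(1, 2), (2, 3), (4, 5)],
)
```

Nodes whose labels are already stable drop out of `delta`, so later rounds touch less data than a version that rescans every node each time.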
Update Statistics provides concise summaries of document changes in 3 sentences:
The document discusses changes to statistics collection and use in Informix versions 11.10, 11.50, and 11.70, including "Smart Statistics" which only updates statistics if data changes exceed a threshold. It also describes the "Auto Update Statistics" scheduler tasks which automatically determine and run appropriate UPDATE STATISTICS commands based on guidelines. The document provides examples showing how statistics are updated and not updated depending on whether the UPDATE STATISTICS command is run or data change thresholds are exceeded.
Billion Goods in Few Categories: How Histograms Save a Life? (Sveta Smirnova)
We store data with an intention to use it: search, retrieve, group, sort... To do it effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is uneven.
Last year I worked on several support tickets where data follows the same pattern: millions of popular products fit into a couple of categories, and the remaining products are spread across the remaining categories. We had a hard time finding a solution for retrieving goods fast. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature - histograms - would work better, cleaner and faster. The idea of the talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss
- how index statistics are physically stored by the storage engine
- which data is exchanged with the Optimizer
- why this is not enough to make the correct index choice
- when histograms can help and when they cannot
- differences between MySQL and MariaDB histograms
Talk for Percona Live 2019 Austin: https://www.percona.com/live/19/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
The McDonald's dataset used here contains details about food items and their nutritional value.
The data analysis was done in Python using Python libraries and tools. The report is written to be simple, crisp, and easy to read, with the reviewer of the article in mind.
Special attention has been given to spacing and colouring to make the article more interesting. All insights appear right below the code.
CAiSE 2014 An adapter-based approach for M2T transformations (Jokin García Pérez)
An adapter-based approach is presented to synchronize code generated by model-to-text (M2T) transformations with changes to the underlying platform. The approach uses adapters to modify generated SQL statements based on differences detected between old and new platform schemas. The process iterates over change records, checks for statement impacts, and calls adaptation functions that output updated statements without deleted columns or other unsupported features. An evaluation shows the approach reduces manual effort compared to propagating changes directly in the transformation code.
Object Oriented Analysis and Design with UML2 part2 (Haitham Raik)
The document discusses object-oriented analysis and design principles. It covers object-oriented analysis, which involves identifying core concepts or domain classes from requirements. It then discusses object-oriented design principles like SOLID - single responsibility principle, open/closed principle, Liskov substitution principle, interface segregation principle, and dependency inversion principle. Design patterns are also mentioned.
The document discusses research principles from Jennifer Widom including choosing research topics by dropping fundamental assumptions, thoroughly developing the data model, query language, and system, and promptly disseminating results through publications and software. It provides examples of tricky semantics in new data models and emphasizes reusing relational semantics when possible and not being secretive with research work.
This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
This session will cover:
1) The process that led to our decision to use Cassandra
2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment
3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs.
4) Our lessons learned and next steps
This document discusses concurrency patterns for MongoDB, including optimistic concurrency control. It provides examples of using findAndModify to perform consistent updates in MongoDB, even when updating subdocuments or performing independent updates with upserts. While operators can reduce the need for concurrency control, findAndModify allows atomic updates along with returning the previous or updated document, enabling patterns like optimistic concurrency control to ensure consistency when updates could conflict.
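The optimistic concurrency pattern mentioned above can be sketched in Python with a small in-memory stand-in. The class below is not the MongoDB API: `InMemoryCollection`, its `find_and_modify` method, and the `version` field are hypothetical names used to show the idea of a conditional update that succeeds only if the document has not changed since it was read. In a real database the check-and-update would be performed atomically by the server; this single-threaded sketch only mimics that behavior.

```python
import copy


class InMemoryCollection:
    """Tiny stand-in for a document collection; not the MongoDB API."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, fields):
        self._docs[doc_id] = {"version": 1, **fields}

    def find(self, doc_id):
        return copy.deepcopy(self._docs[doc_id])

    def find_and_modify(self, doc_id, expected_version, changes):
        """Apply changes only if the stored version matches.

        Returns the updated document, or None if another writer got there first.
        """
        doc = self._docs[doc_id]
        if doc["version"] != expected_version:
            return None
        doc.update(changes)
        doc["version"] += 1
        return copy.deepcopy(doc)


def add_to_balance(collection, doc_id, amount, max_retries=5):
    """Read, compute, and write back; retry if the version moved in between."""
    for _ in range(max_retries):
        current = collection.find(doc_id)
        updated = collection.find_and_modify(
            doc_id, current["version"], {"balance": current["balance"] + amount}
        )
        if updated is not None:
            return updated
    raise RuntimeError("too many conflicting updates")


accounts = InMemoryCollection()
accounts.insert("a1", {"balance": 100})

stale = accounts.find("a1")         # a second writer reads version 1
add_to_balance(accounts, "a1", 50)  # the first writer commits, version becomes 2
conflict = accounts.find_and_modify("a1", stale["version"], {"balance": 0})
```

The stale write is rejected because its expected version no longer matches, so the first writer's update is not silently overwritten.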
The document discusses refactoring tips provided by Martin Fowler. It defines refactoring as improving the internal structure of software without changing its external behavior. It provides examples of common refactoring techniques like extract method, inline method, move method, and consolidate conditional expressions. The goal of refactoring is to improve code quality by making software easier to understand and modify over time.
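As an illustration of the "extract method" technique named above, here is a minimal before-and-after sketch in Python. The invoice example and all function names are invented for this purpose; the point is only that the internal structure changes while the external behavior stays the same.

```python
# Before: one function mixes calculation and formatting.
def invoice_before(items):
    total = 0
    for price, quantity in items:
        total += price * quantity
    return "Total due: " + format(total, ".2f")


# After "extract method": each step has a name and can be tested on its own.
def order_total(items):
    return sum(price * quantity for price, quantity in items)


def format_total(total):
    return "Total due: " + format(total, ".2f")


def invoice_after(items):
    return format_total(order_total(items))


items = [(9.99, 2), (4.50, 1)]
same_result = invoice_before(items) == invoice_after(items)
```

Checking that both versions return the same string for the same input is the kind of test that makes a refactoring safe to apply.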
Cormas: Modelling for Citizens with Citizens. Building accessible and reliabl... (Oleksandr Zaitsev)
This document discusses participatory agent-based modeling (ABM) and the Cormas modeling platform. It addresses three main topics:
1) Software engineering principles apply to ABM, requiring testing to ensure models remain valid over time as environments change. Object-centric and time-travel debugging tools could help validate ABMs.
2) Tangible interactions using sensors or computer vision could improve accessibility by allowing physical control of models.
3) Collaborative modeling allowing multiple users simultaneous control of the same model from different perspectives could help different stakeholders understand each other.
Oleksandr Zaitsev works at Cirad in Montpellier, France as a research scientist focusing on modeling, software engineering, and machine learning. He develops the agent-based modeling platform Cormas in Pharo and teaches agent-based modeling. He also conducts missions in Senegal involving teaching, advising students, and modeling for pastoralism using Pharo. Currently he is supervising two interns in Dakar working on big data management and a water quality monitoring system using Pharo IoT. Additionally, he is helping a student build a smart game board using RFID sensors and Pharo. The document then discusses central questions of agent-based modeling and provides examples of applications including an
This document discusses cloning Twitter using Redis by storing user, follower, and post data in Redis keys and data structures. It provides examples of how to store:
1) User profiles as Hashes with fields like username and ID.
2) Follower and following relationships as Sorted Sets with user IDs and timestamps.
3) User posts and timelines as Lists by pushing new post IDs.
It explains that while Redis lacks tables, its keys and data structures like Hashes, Sets and Lists allow building the same data model without secondary indexes. The document also notes that the system can scale horizontally by sharding the data across multiple Redis servers.
This chapter discusses arrays, loops, and conditional statements in JavaScript. It covers how to store data in arrays, access and modify array elements, and determine an array's length. The chapter also explains while, do/while, and for loops for repeatedly executing code. It describes if, if/else, and switch conditional statements for making decisions, including nesting statements and else if constructions.
The document provides information about ServiceNow concepts including variables, variable sets, UI policies, user criteria, catalog client scripts, script includes, and search. It defines variables and variable sets, explains how to create UI policies and user criteria. It also describes how to create catalog client scripts and script includes, and discusses using scripts includes to store reusable JavaScript functions. The document concludes with an overview of searching for different record types like requests, request items, and tasks.
.Net december 2017 updates - Tamir DresherTamir Dresher
This document summarizes updates for application developers and DevOps from 2017, including new features in Visual Studio 2017, Visual Studio App Center, Live Share, and connectivity to Azure Kubernetes Service. It also covers upcoming features for C# 7.2 and 8.0 like private protected access modifiers, readonly structs and arguments, nullable reference types, and Span<T> to reduce memory allocations and improve performance.
Fyber implemented XGBoost models for two main use cases: Audience Vault Reach prediction and CTR prediction for their offer wall. For Audience Vault Reach, XGBoost with Spark was used to predict audience size over the next 14 days using historical user activity data. For CTR prediction, XGBoost ranked offers based on attributes to better estimate performance compared to old manual configurations. Both models involved data preprocessing, feature engineering, training XGBoost pipelines on Spark, and integrating the models into products.
This document provides an overview of data science and how it can be done in Node.js. It defines data science as combining software engineering and statistical analysis. It discusses regression modeling and recommender systems as examples. Regression modeling predicts future values like number of users based on past data. Recommender systems predict what other products a customer may buy based on their preferences. Node.js is recommended for data science due to its event-driven asynchronous nature and packages like NPM and D3.js. Code examples are provided for both techniques.
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...GeeksLab Odessa
DataScience Lab, 13 мая 2017
Сходство пациентов: вычистка дубликатов и предсказание пропущенных диагнозов
Виктор Сарапин (CEO at V.I.Tech)
Как эффективно определять дубликаты на десятках миллионов пациентов, и как определять пропущенные диагнозы и лечебные действия.
Все материалы доступны по ссылке: http://datascience.in.ua/report2017
Chapter 5:Understanding Variable Scope and Class ConstructionIt Academy
Exam Objective 4.2 Given an algorithm as pseudo-code, determine the correct scope for a variable used in the algorithm and develop code to declare variables in any of the following scopes: instance variable, method parameter, and local variable.
This document provides an overview of performance tuning the MySQL server. It discusses where to find server configuration and status information, how to analyze what the database is doing using status variables, and which configuration variables can be tuned for optimization, including global, per-session, and storage engine variables. Key areas covered include memory usage, query analysis, indexing strategies, and tuning storage engines like InnoDB and MyISAM.
This document summarizes a lecture on inheritance in Java. It discusses adding new methods like toString() to existing classes like ArrayIntList. It also covers defining multiple constructors, using the this and super keywords, and overriding and calling parent methods. The key aspects are:
1. The toString() method allows an object to return a String representation and avoids directly printing objects.
2. Multiple constructors can be defined and older ones can call newer ones using this().
3. Subclasses inherit fields and methods from parent classes and can override methods while still calling the parent version using super().
4. Constructors are not inherited so subclasses must define their own, but can call parent constructors using super().
Optimization for iterative queries on Mapreducemakoto onizuka
This document discusses optimization techniques for iterative queries with convergence properties. It presents OptIQ, a framework that uses view materialization and incrementalization to remove redundant computations from iterative queries. View materialization reuses operations on unmodified attributes by decomposing tables into invariant and variant views. Incrementalization reuses operations on unmodified tuples by processing delta tables between iterations. The document evaluates OptIQ on Hive and Spark, showing it can improve performance of iterative algorithms like PageRank and k-means clustering by up to 5 times.
Update Statistics provides concise summaries of document changes in 3 sentences:
The document discusses changes to statistics collection and use in Informix versions 11.10, 11.50, and 11.70, including "Smart Statistics" which only updates statistics if data changes exceed a threshold. It also describes the "Auto Update Statistics" scheduler tasks which automatically determine and run appropriate UPDATE STATISTICS commands based on guidelines. The document provides examples showing how statistics are updated and not updated depending on whether the UPDATE STATISTICS command is run or data change thresholds are exceeded.
Billion Goods in Few Categories: How Histograms Save a Life?Sveta Smirnova
We store data with an intention to use it: search, retrieve, group, sort... To do it effectively, the MySQL Optimizer uses index statistics when it compiles the query execution plan. This approach works excellently unless your data distribution is not even.
Last year I worked on several support tickets where data follows the same pattern: millions of popular products fit into a couple of categories and the rest used the rest. We had a hard time finding a solution for retrieving goods fast. We offered workarounds for version 5.7. However, a new MariaDB and MySQL 8.0 feature - histograms - would work better, cleaner and faster. The idea of the talk was born.
Of course, histograms are not a panacea and do not help in all situations.
I will discuss
- how index statistics physically stored by the storage engine
- which data exchanged with the Optimizer
- why it is not enough to make correct index choice
- when histograms can help and when they cannot
- differences between MySQL and MariaDB histograms
Talk for Percona Live 2019 Austin: https://www.percona.com/live/19/sessions/billion-goods-in-few-categories-how-histograms-save-a-life
The McDonalds Dataset was taken which had details about the Food items and their Nutritional value.
The Data Analysis of the dataset was done in Python using Python Libraries and tools. The report has been prepared in a simple crisp and easy to read manner, keeping in mind the reviewer of the article.
Special attention has been given to spacing and colouring to make the article more interesting. All insights are present right below the codes.
CAiSE 2014 An adapter-based approach for M2T transformationsJokin García Pérez
DepMiner: Automatic Recommendation of Transformation Rules for Method Deprecation
1. DepMiner: Automatic Recommendation of Transformation Rules for Method Deprecation
Oleksandr ZAITSEV (1,2), Stéphane DUCASSE (2), Nicolas ANQUETIL (2), Arnaud THIEFAINE (1)
(1) Arolla, Paris
(2) Inria, Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL
The 20th International Conference on Software and Systems Reuse
oleksandr.zaitsev@arolla.fr
8. Library Update Problem
If a client system depends on version N of a given library, what must be changed in the client code to use version N+K of that same library?
9. Tools for Client Developers
Objective: Help client developers update their system.
[Diagram: Client Developer (knows the client system and what parts of API it uses); Library Developer ("Support?"); Tools: Code Analysis, Commit History]
11. Support from Library Developers
How can library developers support their clients:
A. Documentation (release notes, change logs, etc.)
B. Deprecation
C. Automation (update script, rewriting rules)
D. Communication (forums, chats, mailing lists)
[Diagram: Library Developer, Support, Client Developer]
12. The Scope of Our Work
How can library developers support their clients:
A. Documentation (release notes, change logs, etc.)
B. Deprecation
C. Automation (update script, rewriting rules)
D. Communication (forums, chats, mailing lists)
Of these, our work covers B (Deprecation) and C (Automation).
Tools based on the commit history:
- Mine frequent method call replacements from the commit history.
- Generate deprecations with rules that can fix client code.
[Diagram: Library Developer, Support, Client Developer]
14. Pharo Programming Language
Pharo is a pure object-oriented programming language designed in the tradition of Smalltalk. It is also an IDE developed entirely in itself.
We focus on Pharo because:
1. We have access to its core developers
2. Pharo is convenient for manipulating source code
19. Transformation Rule
isSpecial
    self
        deprecated: 'Renamed to #needsFullDefinition'
        transformWith: '`@receiver isSpecial' -> '`@receiver needsFullDefinition'.
    ^ self needsFullDefinition
Antecedent (left hand side): matches the method calls that should be replaced.
Consequent (right hand side): defines the replacement.
20. Part 3: Why do we need to support library developers?
21. Replacement Messages
[Pie charts: deprecations with helpful replacement messages versus deprecations without helpful replacement messages. Java: 33 % and 67 % [Brito et al., 2018]. C#: 22 % and 78 % [Brito et al., 2018]. JS: 33 % and 67 % [Nascimento et al., 2020].]
23. Analysis of Deprecations in Pharo 8
[Pie chart of 367 deprecations with segments of 9 %, 32 %, and 59 %. Legend: rewriting deprecations (contain a transformation rule); non-rewriting deprecations (no transformation rule); missed opportunity.]
24. Analysis of Deprecations in Pharo 8
[Bar chart: rewriting deprecations (contain a transformation rule) versus non-rewriting deprecations (no transformation rule) per type of change: Rename method, Split method, Complex replacement, Remove argument(-s), Add argument(-s), Change receiver, Delete method, Deprecate class, Push down. Values shown on the negative side: -4, -13, -52, -43, -5, -7, -1, -24. Values shown on the positive side: 5, 28, 2, 1, 3, 179.]
25. Analysis of Deprecations in Pharo 8
[The same bar chart, with the label "Missed opportunity" added.]
26. Challenge 1: Absence of method visibility.
There are no public/private keywords in Pharo. Every method is public but not every method is meant to be used.
Challenge 2: Absence of static type information.
Pharo is a dynamically-typed language. Just by looking at source code, we do not know the class from which the function is called, the return type, or the argument types.
28. Step 1. Collecting the Data
Line-based diffs. Q: Which lines of code were added or removed?
High-level commits. Q: Which methods, classes, or packages were added, removed, or modified?
{
    Id: ef4fdd35fb05e74aa12aad4d22a37e17a8d87b5b,
    Removed methods: […],
    Added methods: […],
    Modified methods: [
        {
            Old source code: …,
            New source code: …,
            Removed method calls: [smartDescription],
            Added method calls: [description],
        }],
    Added classes: […],
    Removed classes: […],
    …
}
29. Method Change
Method change: one method modified by one commit.
  public static LinkedList insert(LinkedList list, int data)
  {
      Node new_node = new Node(data);
-     new_node.setNext(null);
+     new_node.setNextNode(null);
      if (list.head() == null) {
          list.setHead(new_node);
      }
      else {
          Node last = list.head;
-         while (last.next() != null) {
-             last = last.next();
+         while (last.nextNode() != null) {
+             last = last.nextNode();
          }
          last.next = new_node;
      }
      return list;
  }
30. Method Change as Transaction
Transaction: set of added and removed method calls in a method change. For the change to insert() shown on the previous slide:
{
    remove(setNext),
    add(setNextNode),
    remove(next),
    remove(next),
    add(nextNode),
    add(nextNode)
}
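A minimal sketch of how such a transaction could be derived, assuming the method calls of the old and new source code have already been extracted into lists. The class name `TransactionSketch`, the method `transaction`, and the hand-written call lists are illustrative assumptions, not DepMiner's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not DepMiner's API): builds a transaction from one method change. */
class TransactionSketch {

    /**
     * Compares the method calls of the old and the new source of one method.
     * Calls present on both sides cancel out one occurrence at a time, so a
     * call replaced twice yields two remove(...) and two add(...) items.
     */
    static List<String> transaction(List<String> oldCalls, List<String> newCalls) {
        List<String> added = new ArrayList<>(newCalls);
        List<String> result = new ArrayList<>();
        for (String call : oldCalls) {
            // List.remove(Object) deletes only the first matching occurrence.
            if (!added.remove(call)) {
                result.add("remove(" + call + ")");
            }
        }
        for (String call : added) {
            result.add("add(" + call + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // Calls of insert() before and after the commit, extracted by hand (assumption).
        List<String> oldCalls = List.of("setNext", "head", "setHead", "next", "next");
        List<String> newCalls = List.of("setNextNode", "head", "setHead", "nextNode", "nextNode");
        System.out.println(transaction(oldCalls, newCalls));
    }
}
```

The sketch lists all removals before all additions, so the item order differs from the slide, but the items are the same. Order does not matter for the mining step, which treats a transaction as an unordered collection.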
32. Step 2. Detecting Breaking Changes
Missing methods: public methods that were present in the old version and no longer exist in the new version.
[Diagram: old API and new API]
Challenge 1: Which methods are “public”?
We address this challenge by defining language-specific heuristics. For example, in Pharo, test, example, baseline methods, etc. can be considered “private”.
The complete list of heuristics and the tool to deduce method visibility in Pharo can be found in our repository: https://github.com/olekscode/VisibilityDeductor
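As a rough illustration of this step, the sketch below computes missing methods as a set difference restricted to methods that a visibility heuristic treats as public. The prefix-based heuristic, the class name `MissingMethodsSketch`, and the sample selectors are simplifying assumptions. The actual heuristics are the ones listed in the VisibilityDeductor repository.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: finds public methods of the old API that the new API lacks. */
class MissingMethodsSketch {

    /** Toy visibility heuristic (assumption): test and example methods count as private. */
    static boolean looksPublic(String selector) {
        return !selector.startsWith("test") && !selector.startsWith("example");
    }

    /** Returns the selectors that look public, exist in oldApi, and are absent from newApi. */
    static Set<String> missingMethods(Set<String> oldApi, Set<String> newApi) {
        Set<String> missing = new TreeSet<>();
        for (String selector : oldApi) {
            if (looksPublic(selector) && !newApi.contains(selector)) {
                missing.add(selector);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> oldApi = Set.of("next", "setNext", "testNext");
        Set<String> newApi = Set.of("nextNode", "setNextNode");
        System.out.println(missingMethods(oldApi, newApi)); // prints [next, setNext]
    }
}
```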
33. Step 3. Market Basket Analysis
Transactions:
Customer 1: { bread, butter, avocado }
Customer 2: { bread, butter, bananas }
Customer 3: { bread, butter, milk, cereal }
Customer 4: { bread, milk, cereal }
Customer 5: { butter, milk, cereal }
Q1: What are the products that are frequently purchased together? (frequent itemsets)
Q2: What can we recommend to people who buy bread? (association rules)
34. Step 3. Market Basket Analysis
Q1: What are the products that are frequently purchased together?
{ bread, butter } Support: 60%
{ milk, cereal } Support: 60%
Q2: What can we recommend to people who buy bread?
{ bread } → { butter } Confidence: 75%
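The support and confidence values above can be checked with a short sketch. It uses the standard definitions: support is the fraction of transactions that contain an itemset, and confidence of A → B is the number of transactions containing both A and B divided by the number containing A. The class and method names are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of support and confidence on the five example purchases. */
class MarketBasketSketch {

    static final List<Set<String>> PURCHASES = List.of(
        Set.of("bread", "butter", "avocado"),
        Set.of("bread", "butter", "bananas"),
        Set.of("bread", "butter", "milk", "cereal"),
        Set.of("bread", "milk", "cereal"),
        Set.of("butter", "milk", "cereal"));

    /** Number of transactions that contain every item of the itemset. */
    static long count(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    /** Fraction of all transactions that contain the itemset. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        return (double) count(transactions, items) / transactions.size();
    }

    /** Of the transactions containing the antecedent, the fraction that also contain the consequent. */
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> union = new HashSet<>(antecedent);
        union.addAll(consequent);
        return (double) count(transactions, union) / count(transactions, antecedent);
    }

    public static void main(String[] args) {
        System.out.println(support(PURCHASES, Set.of("bread", "butter")));            // 0.6
        System.out.println(support(PURCHASES, Set.of("milk", "cereal")));             // 0.6
        System.out.println(confidence(PURCHASES, Set.of("bread"), Set.of("butter"))); // 0.75
    }
}
```

Bread appears in four of the five purchases and bread with butter in three, which gives the 75% confidence. The sketch does not guard against an antecedent that never occurs.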
35. Frequent Method Call Replacements
Q1: What are the operations that frequently appear together in method changes?
{ remove(next), add(nextNode) } Support: 60%
Q2: What can we recommend as a replacement for next() ?
{ next } → { nextNode } Confidence: 75%
37. Step 4. Generating Deprecations
Missing Method: Node >> next
Association Rule: {next} → {nextNode}, Support: 60%, Confidence: 75%
Generated Deprecation:
Node >> next
    self
        deprecated: 'Use #nextNode instead.'
        transformWith: '`@receiver next' -> '`@receiver nextNode'.
    ^ self nextNode
Challenge 2: Is “nextNode” called from the same class as “next”?
To address this challenge, we retain only those association rules in which the methods in the antecedent and the consequent are defined in the same class (i.e. the new version of Node must define the nextNode method, otherwise the association rule is discarded).
This is also a limitation of our approach.
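The filtering and generation described on this slide can be sketched as follows. This is an illustrative approximation, not the tool's code: it handles only unary selectors such as next, represents the new API as a hypothetical map from class names to selectors, and emits the deprecation as plain text in the Pharo format shown above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: turns an association rule into the source of a Pharo deprecation. */
class DeprecationGeneratorSketch {

    /**
     * Returns the deprecation source for className >> oldSelector, or an empty
     * Optional when the new version of the class does not define newSelector
     * (the rule is then discarded, as described for Challenge 2).
     */
    static Optional<String> generate(String className, String oldSelector, String newSelector,
                                     Map<String, Set<String>> newApi) {
        if (!newApi.getOrDefault(className, Set.of()).contains(newSelector)) {
            return Optional.empty();
        }
        String source = String.join("\n",
            className + " >> " + oldSelector,
            "    self",
            "        deprecated: 'Use #" + newSelector + " instead.'",
            "        transformWith: '`@receiver " + oldSelector + "' -> '`@receiver " + newSelector + "'.",
            "    ^ self " + newSelector);
        return Optional.of(source);
    }

    public static void main(String[] args) {
        // New API of the library: class name -> selectors it defines (assumption).
        Map<String, Set<String>> newApi = Map.of("Node", Set.of("nextNode", "setNextNode"));
        System.out.println(generate("Node", "next", "nextNode", newApi).orElse("rule discarded"));
        System.out.println(generate("Node", "next", "following", newApi).orElse("rule discarded"));
    }
}
```

The first call prints the deprecation for Node >> next. The second prints "rule discarded" because the hypothetical selector following is not defined in Node.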
38. [Diagram: overview of the four steps for a software library evolving from v1.0 to v2.0]
Step 1: Collect data (commit history, pull requests, high-level changes)
Step 2: Detect breaking changes (diff of old API and new API, removed public methods)
Step 3: Mine frequent method call replacements (for each missing method, A-Priori mines association rules from the history)
Step 4: Generate deprecations (for the library developer), for example:
oldMethod
    self
        deprecated: 'Use newMethod'
        transformWith: '`@rec oldMethod' -> '`@rec newMethod'.
    ^ self newMethod
43. Limitations & Future Work
Unused / untested methods: Our approach detects the changes in how the library uses its own API. It is ineffective for methods that are not tested and only called by clients.
Unordered set of method calls: Our approach ignores the order of method calls as well as the distance between method calls in the source code.
Search entire commit history: When a method is removed, its effects are most likely to appear in the same commit or in the next several commits. We search the whole history.
Reflective operations: Methods that are invoked programmatically (through reflective operations) will not be detected.
44. Summary
‣ We proposed an approach to help library developers by generating deprecations with transformation rules.
‣ Those deprecations can be used to rewrite client code.
‣ Our approach is based on mining frequent method call replacements from the commit history.
‣ We implemented our approach as a prototype tool for Pharo and evaluated it on 5 open-source projects.
‣ 134 deprecations generated by our tool were accepted and merged into the projects.
Get in touch: oleksandr.zaitsev@arolla.fr