Starting from the JDK itself, a wave of migrations to module systems is bound to propagate throughout the Java landscape. In this session, expand your mental toolbox by learning what modularity is, why it is important, and how to divide your monolithic application into well-designed functional modules. First you will gain an intimate understanding of modularity by hearing about several of its mind-bending paradoxes. Then you will learn how popular design principles apply to creating modules and their APIs. Finally you will learn how common monolithic software architectures exhibit various degrees of modularization of functional features and what that means for your forthcoming modularization efforts.
Modularization compass - Navigating white waters of feature-oriented modularity
1. Modularization Compass: Navigating the White Waters of Feature-Oriented Modularity
Andrzej Olszak and Bo Nørregaard Jørgensen
The Maersk Mc-Kinney Moller Institute
University of Southern Denmark
2. Agenda
1. Motivation
2. Features
3. Evolution of feature modularity
4. Drift of modularity
5. Evaluation
6. Results
3. Motivation
• Software systems have to evolve to accommodate new user expectations:
– Add and enhance features
– Fix existing features
• Working with existing code is difficult:
– Where is this feature implemented?
– Why is this logic here?
– Can this change break another feature?
• It only gets more difficult over time!
4. Features – user’s perspective on code
• Feature – unit of user-identifiable functionality of software
– Scattering increases change scope and delocalization effects
– Tangling increases change propagation and interleaving effects
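The scattering and tangling notions above can be made concrete with simple counting metrics. The deck does not spell out the exact FSCA/FTANG formulas, so the following is only a hypothetical sketch of the idea: scattering counts how many classes implement each feature, tangling counts how many features each class participates in, and averaging gives system-level values.

```python
# Hypothetical sketch of feature scattering/tangling metrics (not the
# deck's exact FSCA/FTANG definitions).
# feature_map: feature name -> set of classes implementing it.

def scattering(feature_map):
    """Average number of classes each feature is scattered over."""
    return sum(len(classes) for classes in feature_map.values()) / len(feature_map)

def tangling(feature_map):
    """Average number of features tangled in each class."""
    classes = {}
    for feature, cls_set in feature_map.items():
        for cls in cls_set:
            classes.setdefault(cls, set()).add(feature)
    return sum(len(feats) for feats in classes.values()) / len(classes)

# Illustrative (made-up) feature-to-class map:
features = {
    "Zoom": {"DrawingView", "ZoomTool"},
    "Undo redo": {"DrawingView", "UndoManager"},
}
print(scattering(features))  # 2.0
print(tangling(features))    # 4/3: DrawingView carries two tangled features
```

A change to "Zoom" may now touch two classes (scattering), and a change to DrawingView may break "Undo redo" (tangling), which is exactly the change-scope and change-propagation effect the slide names.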
5. Tracking the evolution of features
[Figure: feature evolution example; "Add feature" gives FSCA=2, "Enhance feature" changes it from (2+2)/2 to (3+2)/2]
• Are these values low or high?
– Should we aim for FSCA=1? What is the ‘optimal’ value?
• How much can be improved by refactoring?
– When to refactor?
– What/how to refactor?
6. Drift of modularity as distance to ‘optimum’
• Drift is the distance between the actual and the ‘optimal’ values of a metric
– ‘optimal’ = after applying optimal refactorings*
[Figure: absolute metric value of the actual vs. optimal modularization across refactorings r1-r4; the gap between the two curves is the drift]
• Drift:
– Distance from optimal design
– Potential for refactoring
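The drift defined on this slide is just the gap between a measured metric and its value after the best refactoring the optimizer could find. A minimal sketch, assuming both per-release series are already available (the numbers below are made up for illustration):

```python
# Minimal sketch: drift = actual metric value minus its 'optimal' value,
# i.e. the value reachable by the best refactoring found for that release.

def drift(actual, optimal):
    """Distance from the optimal design; also the potential for refactoring."""
    return actual - optimal

# Hypothetical per-release FSCA series:
actual_fsca  = [2.0, 2.5, 3.0]
optimal_fsca = [1.5, 1.5, 1.75]
drifts = [drift(a, o) for a, o in zip(actual_fsca, optimal_fsca)]
print(drifts)  # [0.5, 1.0, 1.25]
```

Note how the absolute value and the drift can move independently: here the absolute FSCA grows every release, and the growing drift shows that an increasing share of it is removable by refactoring.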
7. Calculating drift
• Obtaining ‘optimal’ modularizations
– Criteria for detecting ‘optimality’
– Efficient traversal of design space
– Automated refactorings
9. Calculating drift by optimizing program structure
• Multi-objective optimization of package structures to modularize features
– Based on the move-class refactoring
• MOGGA: genetic algorithm on a population of designs
– Pareto-optimality
– Grouping operators
• Resulting designs established using automated code transformations
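The search described above can be sketched in a few lines. This is not the MOGGA implementation, only a heavily simplified, hypothetical illustration of its shape: a population of package assignments, the move-class refactoring as the mutation operator, and Pareto dominance over two objectives (stand-ins here; the real tool would score feature scattering and tangling).

```python
# Hypothetical sketch of a multi-objective grouping GA over package
# structures; NOT the MOGGA implementation from the talk.
import random

CLASSES = ["A", "B", "C", "D", "E", "F"]
N_PACKAGES = 3

def random_design():
    # A design is a grouping: class name -> package index.
    return {cls: random.randrange(N_PACKAGES) for cls in CLASSES}

def objectives(design):
    # Stand-in objectives (package size spread, non-empty package count);
    # a real tool would compute scattering/tangling from a feature map.
    sizes = [sum(1 for p in design.values() if p == i) for i in range(N_PACKAGES)]
    return (max(sizes) - min(sizes), sum(1 for s in sizes if s > 0))

def dominates(a, b):
    # Pareto dominance: no worse in every objective, strictly better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def move_class(design):
    # The move-class refactoring used as the mutation operator.
    mutated = dict(design)
    mutated[random.choice(CLASSES)] = random.randrange(N_PACKAGES)
    return mutated

def pareto_front(population):
    scored = [(objectives(d), d) for d in population]
    return [d for (o, d) in scored
            if not any(dominates(o2, o) for (o2, _) in scored)]

random.seed(0)
population = [random_design() for _ in range(30)]
for _ in range(100):  # the deck's config: 300 designs over 500 iterations
    population = pareto_front(population)
    population += [move_class(random.choice(population))
                   for _ in range(30 - len(population))]
```

The surviving Pareto front holds the non-dominated designs; in the talk's setting those are then realized with automated code transformations.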
10. Example optimization results
• MOGGA config: population of 300 designs over 500 iterations with 5% mutation prob.
[Figure: bar charts for JHotDraw SVG comparing the original and automated designs; feature scattering (fsca) reduced by 55%, package tangling (ftang) reduced by 60%]
11. Evaluation
• Goal: to evaluate whether drift-based metrics bring new information, as compared to their absolute counterparts
• Procedure:
1. Locate features in code
2. Measure absolute Scattering and Tangling
3. Measure drifts
• Target: release histories of 3 OSS applications
12. Feature location
• Recovered feature specifications from documentation and UIs
• Located features in classes and methods using execution tracing and feature-entry annotations
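The feature-entry annotations mentioned here mark the code entry point of each feature so that execution tracing can attribute the code reached during its run to that feature. The study's tooling targeted Java; the following is only a hypothetical Python sketch of the same idea, using decorators in place of annotations:

```python
# Hypothetical sketch of feature-entry annotations plus tracing: code that
# runs while a feature's entry point is active gets attributed to it.
import functools

current_feature = None   # feature whose entry point is currently executing
feature_trace = {}       # feature name -> set of traced function names

def feature_entry(name):
    """Decorator marking a function as the entry point of a feature."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            global current_feature
            current_feature = name
            try:
                return fn(*args, **kwargs)
            finally:
                current_feature = None
        return wrapper
    return decorate

def traced(fn):
    """Record which feature was active when this function ran."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if current_feature is not None:
            feature_trace.setdefault(current_feature, set()).add(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@traced
def render(): pass

@feature_entry("Zoom")
def zoom_in():
    render()

zoom_in()
print(feature_trace)  # {'Zoom': {'render'}}
```

The resulting feature-to-code map is exactly the input the scattering/tangling metrics need.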
Application releases / Identified features:
– JHotDraw Pert (11 releases: 5.2; 5.3; 5.4b1; 6.0b1; 7.0.7; 7.0.8; 7.0.9; 7.1; 7.2; 7.3; 7.3.1): Align, Dependency tool, Edit basic, Edit figure, Exit program, Export drawing (7.0.7), Group figures, Init program, Line tool (removed in 6.0b1), Modify figure, Multiple windows (7.0.7), New drawing, Open drawing, Order figures, Save as drawing, Selection tool, Snap to grid, Task tool, Text tool, Undo redo (5.3), Zoom (7.0.7)
– RText (17 releases): …
– FreeMind (13 releases): …
13. Results – JHotDraw Pert
• Tangling: FTANG drift > FSCA drift
• Architectural refactoring in v7.0.7:
– Improved separation of features
– FSCA: drift decreases while absolute value increases
• Scattering: FSCA drift remains constant while absolute value increases
[Figures: "Scattering drift of Pert" (FSCA per release) and "Tangling drift of Pert" (FTANG per release), each plotting the drift of the metric against its absolute value]
14. Some trends in remaining results
• “Breakaway” points
• “Oscillations” of drift? (Antón and Potts, 2003)
• Tangling drift > scattering drift (potential for separating)
[Figures: "Tangling drift of FreeMind", "Tangling drift of RText", "Scattering drift of FreeMind", and "Scattering drift of RText": per-release line charts plotting the drift of each metric against its absolute value]
15. Conclusion & Next steps
• Drift:
– Distance from ‘optimum’
– Potential for refactoring
– Not limited to feature-oriented metrics!
• Interesting observations from the 3 systems
– Demonstrated usefulness; more data needed to fully understand and generalize the observations
• MOGGA config & performance can be optimized
• Method-level refactorings can be explored