The document summarizes IBM's experience migrating a large codebase from CVS to Git. It involved migrating over 40 active committers and around 600 bundles built daily across 4 active development streams. The migration process took several steps including converting the CVS repositories to Git, adding .gitignore files, and optimizing the repositories. Quotes from IBM employees discuss advantages of Git like thinking in terms of branches instead of patches, and challenges like a learning curve for developers.
Has it really been 10 years?
EclipseCon Europe, November 3, 2011
John Kellerman and Kim Moir
Live recording is available on FOSSLC
http://www.fosslc.org/drupal/content/has-it-really-been-10-years
A talk given at JCConf 2015 on 2015/12/05.
In programming, "immutable objects" are an important design pattern. Likewise, in the era of virtualization and the cloud, "immutable infrastructure" has become the new prominent approach. With the right resources and processes in place, it can greatly reduce system complexity and significantly improve stability.
This talk starts from the underlying concepts and adds some practical implementation advice, so the audience has enough information to evaluate the benefits of this architecture.
Video: https://youtu.be/9j008nd6-A4
Maven is the most popular Java Dependency Management Tool.
In this hands-on course, you will understand how Maven makes the life of a Java developer easy. We will use a step by step approach with 20 steps.
During the course, you will automate these using Maven.
Compiling Java Code
Running Unit Tests
Building JARs and WARs
Running web applications in Tomcat
Setting up new projects
You will learn the following features of Maven with 5 example projects on GitHub.
Dependency Management - including Transitive Dependencies
Maven Project Object Model
Maven Build Life Cycle
Maven Plugins
Maven Archetypes - Generate Projects
Maven Best Practices
Multi Module Maven Projects
Cloud Foundry Summit 2015: 10 common errors when pushing apps to Cloud Foundry (Jack-Junjie Cai)
You may experience errors when you push your application to Cloud Foundry. Some are easy to figure out, while others may be mysterious and harder to diagnose. This session will examine 10 common errors that may happen during an application push, including their symptoms, the tools and techniques to diagnose them, and possible solutions. The session will mostly focus on Java and Node.js applications, but some of the tips apply to all runtimes.
Java REST API Framework Comparison - UberConf 2021 (Matt Raible)
Use Spring Boot! No, use Micronaut!! Nooooo, Quarkus is the best!!!
There are a lot of developers praising the hottest, and fastest, Java REST frameworks: Micronaut, Quarkus, and Spring Boot. In this session, you'll learn how to do the following with each framework:
✅ Build a REST API
✅ Secure your API with OAuth 2.0
✅ Optimize for production with Docker and GraalVM
I'll also share some performance numbers and pretty graphs to compare community metrics.
Related blog post: https://developer.okta.com/blog/2021/06/18/native-java-framework-comparison
Maven is a build automation tool used primarily for Java projects. This presentation covers the basics of Maven and its usage while developing Java applications. It is for anyone interested in learning Maven, especially Java developers.
Ingress? That’s So 2020! Introducing the Kubernetes Gateway API (VMware Tanzu)
SpringOne 2021:
Session Title: Ingress? That’s So 2020! Introducing the Kubernetes Gateway API
Speakers: Abhinav Rau, Principal Architect at Google; Madhav Sathe, Cloud Customer Engineer at Google
Jenkins is an open source automation server written in Java. Jenkins helps to automate the non-human part of the software development process, with continuous integration and facilitating technical aspects of continuous delivery. It is a server-based system that runs in servlet containers such as Apache Tomcat.
"Will Git Be Around Forever? A List of Possible Successors" at UtrechtJUG (🎤 Hanno Embregts 🎸)
What source control software did you use in 2010? Possibly Git, if you were an early adopter or a Linux kernel committer. But chances are you were using Subversion, as this was the product of choice for the majority of software developers. Ten years later, Git is the most popular product. Which makes me wonder: what will we use another ten years from now?
In this talk we will think about what features we want from our source control software in 2030. More speed? Better collaboration support? No merge conflicts ever?
I’ll also discuss a few products that have been published after Git emerged, including Plastic, Fossil and Pijul. I’ll talk about the extent to which they contain the features we so dearly desire and I’ll demonstrate a few typical use cases. To conclude, I’ll try to predict which one will be ‘the top dog’ in 2030 (all information is provided “as is”, no guarantees etc. etc.).
So attend this session if you’re excited about the future of version control and if you want to have a shot at beating even (!) the early adopters. Now if it turns out I was right, remember that you heard it here first.
Will Git Be Around Forever? A List of Possible Successors (🎤 Hanno Embregts 🎸)
What source control software did you use in 2008? Possibly Git, if you were an early adopter or a Linux kernel committer. But chances are you were using Subversion, as this was the product of choice for the majority of software developers. Ten years later, Git is the most popular product. Which makes me wonder: what will we use another ten years from now?
In this talk we will think about what features we want from our source control software in 2028. More speed? Better collaboration support? No merge conflicts ever?
I’ll also discuss a few products that have been published after Git emerged, including Fossil, Veracity and Pijul. I’ll talk about the extent to which they contain the features we so dearly desire and I’ll demonstrate a few typical use cases. To conclude, I’ll try to predict which one will be ‘the top dog’ in 2028 (all information is provided “as is”, no guarantees etc. etc.).
So attend this session if you’re excited about the future of version control and if you want to have a shot at beating even (!) the early adopters. Now if it turns out I was right, remember that you heard it here first.
Information technology has led us into an era where the production, sharing, and use of information are part of everyday life, and where we are often unwitting actors: it is now almost impossible not to leave a digital trail of many of the actions we do every day, for example through digital content such as photos, videos, and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this, with the "Internet of Things" we see an increase in devices such as watches, bracelets, thermostats, and many other items that connect to the network and therefore generate large data streams. This explosion of data explains the rise of the term Big Data: data produced in large quantities, at remarkable speed, and in varied formats, which requires processing technologies and resources that go far beyond conventional systems for managing and storing data. It is immediately clear that (1) data storage models based on the relational model and (2) processing systems based on stored procedures and grid computations are not applicable in these contexts. Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when faced with big data, variability (the lack of a fixed structure) is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
These databases are distributed, open source, horizontally scalable, free of a predetermined schema (key-value, column-oriented, document-based, and graph-based), easily replicable, not bound by the ACID guarantees, and able to handle large amounts of data. They are typically integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open-source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications that use NoSQL databases to store Big Data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Contact me to download the slides
YouTube Link: https://youtu.be/8Xo3l1zv41I
DevOps Certification Courses - https://www.edureka.co/devops-certification-training
This Edureka PPT on ‘Git Interview Questions’ will discuss the most frequently asked questions that you might face in an interview.
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
Buying a Ferrari for your teenager? You may want to think twice (Al Zindiq)
Data science teams have different levels of maturity, and they need to be equipped with the right tools and infrastructure to make them more agile and ready. Here, I will discuss a combination of open-source tools and cloud managed services that can go hand in hand and grow with your data science teams' needs as they mature.
Open up your platform with Open Source and GitHub (Scott Graham)
Use GitHub and open source to get your users involved in projects within your company. This presentation gives a quick rundown of what you need to know to get started.
Code for Startup MVP (Ruby on Rails) Session 1 (Henry S)
First session on learning to code for startup MVPs using Ruby on Rails.
This session covers web architecture and Git/GitHub, and builds a real Rails app that is deployed to Heroku at the end.
Thanks,
Henry
Gitlab for PHP developers (Brisbane PHP meetup, 2019-Jan-29) (Vladimir Roudakov)
GitLab is not only a code-management service and an open-source platform; it is also "A full DevOps tool", as stated on their home page.
In this talk we will look at the features available in the free version of GitLab and why it is more than a source-control tool.
GitHub is the repository for the vast majority of today’s open-source software. And that is why many interviewers look at applicants’ public GitHub.com accounts to assess their interests, popularity, helpfulness, and consistency. To collaborate with developers, today’s testers need git and a GitHub account. Unfortunately, esoteric command lines often confuse those new to the tool. Join Wilson Mar as he provides advice on how to be immediately productive. He begins with a review of top projects testers need to know; the etiquette to starting projects and following people; pull requests; and raising issues. Wilson includes demonstrations on mastering git, with tricks to markup text that gets converted into web pages, adding graphics to markup, creating branches, and merging branches. Based on his work on several projects on GitHub, Wilson provides keys to understanding the logic of different deployment workflows and explains even the most confusing words and concepts.
At some point, the code you write today will be deleted and replaced with something new. This talk will discuss the life cycle of a large code base, and how to manage it over time to accommodate rewrites, giving examples from a major rewrite of the Firefox build and release pipeline over the last two years. You'll learn how to replace components of a running distributed system while keeping it operational, the proverbial replacing the wing of an airplane in flight.
Distributed Systems at Scale: Reducing the Fail (Kim Moir)
This talk looks at the major problems in Mozilla's continuous integration farm and the plans we have to fix these issues. This talk was given at the USENIX Release Engineering Summit in Washington, DC on November 13, 2015.
Scaling mobile testing on AWS: Emulators all the way down (Kim Moir)
This talk will explore the evolution of Mozilla's continuous integration infrastructure for Firefox for Android. From our early device lab, to running tests on reference cards in custom racks, to our current implementation running on emulators in AWS. In addition, I'll discuss how we reduced the cost of running our tests in AWS by the use of spot instances, and fine tuning the selection of instance types. Finally, I'll discuss how we analyzed regression data to prune the number of tests we run to extend the capacity of our test pools and reduce costs. To give you some scope, our continuous integration farm consists of 6700 machines, 150,000 combined daily build and test jobs that are triggered by an average 300 pushes. This talk was given at USENIX release engineering summit in Washington, DC on November 13, 2015.
Eclipse Top Ten: Important lessons I've learned working on Eclipse (Kim Moir)
An insightful, candid and funny look at the top ten things I've learned while working on Eclipse for 8+ years. Community, contributors, committers, comics, this talk will have it all.
See http://relengofthenerds.blogspot.com for the text associated with the slides.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher overall coverage. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
These are the slides of a talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) 2022.
Enhancing adoption of Open Source Libraries: A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that lead to closing the deal.
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! (SOFTTECHHUB)
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Observability Concepts EVERY Developer Should Know - DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silos continue to crumble, many organizations still relegate monitoring and observability to ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on:
THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
Editor's Notes
Hi, I’m Kim Moir and I’m a release engineer for the Eclipse and RT Equinox projects. Since last June, our team has been working on migrating our ten-year-old CVS repository to Git. I’m going to talk about the process that we used to migrate, how our development processes changed to accommodate it, the challenges we faced, and advice for other teams that are migrating. Along the way, I'm going to include some quotes from other committers with their thoughts on our Git migration.
In honour of the fact that our Git migration is almost complete, a more appropriate name for my talk might be Git happens. Questions for Audience Show of hands, how many of you use Git on a daily basis? How many use CVS or SVN?
Why Git? The Eclipse Foundation is in the process of phasing out support for CVS (December 2012) to reduce support costs. In theory, the summer months are our lowest-activity period in terms of development, due to the release we have every year in June. However, the summer of 2011 was really busy for us as we began to plan our migration to Git. We wanted to minimize the disruption to the team, so we wanted to migrate as many projects as possible before fall, when people returned from vacation. As well, we wanted to be able to migrate the bulk of the components before Indigo SR1 shipped at the end of September.
300 bundles, 62 features, 84 fragments; JDT, PDE, Platform and Equinox projects. Some committers had exposure to Git (Orion and OSGi Alliance experience). 16 GB in the Eclipse repo, 8 GB in the Equinox repo.
One of the first discussions as we planned for our migration: what would the granularity of our Git repositories be? The Eclipse project has several subprojects: Platform, JDT, PDE and Equinox. Our commit rights are quite specific. If you are a committer on jdt.core, this doesn’t mean that you have rights on jdt.ui. With CVS, you can just check out the bundles you want into your workspace, you don’t have to clone the entire repository to your machine. Thus we wanted to ensure that our repositories weren’t too big so that a contributor wasn’t synchronizing a large repo to their machine with a lot of content that they would never use. How should our Git repositories be organized? We had a discussion with the PMC and decided that repositories should be organized by Unix group id.
With CVS, we had two repositories, /cvsroot/eclipse and /cvsroot/rt/. There is also currently a limitation in Git where you cannot assign multiple ACLs to the same repo. In order to preserve our project structure, we needed to have a repo for each Unix group. We couldn't have built larger repos without reorganizing our project structure and commit rights. That being said, we would recommend minimizing the number of repos you create as working with multiple Git repositories can be painful.
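A practical mitigation for the clone-size concern above is Git's sparse checkout, which lets a contributor materialize only the bundles they work on even though the whole history is cloned. This is a minimal sketch using the classic `core.sparseCheckout` mechanism available at the time; the repository URL and directory names are illustrative, not the project's actual layout.

```shell
# Sketch: clone without checking out, then populate only selected bundles.
# URL and paths are illustrative assumptions, not the real Eclipse repo layout.
git clone --no-checkout https://example.org/eclipse.platform.ui.git
cd eclipse.platform.ui
git config core.sparseCheckout true
echo "bundles/org.eclipse.ui.workbench/" >> .git/info/sparse-checkout
git read-tree -mu HEAD   # populate the working tree per the sparse patterns
```

Note that the full object database is still downloaded; sparse checkout only limits what appears in the working tree, which addresses workspace clutter rather than transfer size.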
Due to our CVS repository size and our desire to preserve our history, we decided that this would be a gradual migration over several months instead of a migration over a few days. We ran test migrations on a component basis, letting the owners look at them and determine whether there were issues. Many teams took the opportunity to reorganize their repos, for instance separating features, bundles and test bundles into separate directories. The Platform UI team was the first to migrate to Git (July). Paul Webster spent about a month testing the Git migration of the Platform UI bundles and writing scripts to assist with the process. One of the issues that we ran into is that, when you tag or branch a repo in Git, the entire repo is tagged or branched; you can’t tag or branch a single project. In an effort to be good Eclipse citizens, during a release cycle we only tag bundles that have changed, so only new bundles get downloaded as needed. For this reason, when we first ran the migration tool on our CVS repos, a maintenance branch would only include bundles that had been branched for that release, and all the bundles that were not branched would be missing. Not good! To fix this, Paul wrote some scripts to precondition the repositories so that maintenance branches would include all the bundles in that release. I know that other projects didn’t have this problem; for instance, CDT tags their bundles every time.
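The whole-repo nature of Git tags and branches can be seen directly: a tag names a single commit of the entire repository, so there is no CVS-style per-module tagging. A short sketch (the tag and branch names below are illustrative, not actual Eclipse build labels):

```shell
# In Git, a tag or branch spans the whole repository, never one bundle's directory.
# Names are illustrative.
git tag -a maintenance-baseline -m "snapshot of every bundle in this repo"
git branch R3_7_maintenance maintenance-baseline   # the branch likewise covers the repo
```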
Another issue that we looked at during testing was that we had some rather large test repositories due to our binary files. Some background: our build just compiles Java code, but the SWT and Equinox launcher teams have C code that must be compiled on native hardware for the 13 platforms we support and stored in the repository in binary form. Thus our initial test Git repositories were bloated with binaries, many of which had tags associated with old builds that we were never going to build again. So we decided to: 1) have binary-only repositories for these projects; 2) clean the binary repositories of non-release binaries to reduce their size (run a git filter-branch operation to remove the binaries); 3) update the build scripts to fetch artifacts from the binary repos.
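A minimal sketch of step 2, with made-up file names, uses git filter-branch with an index filter to strip a binary path out of history. This rewrites every commit, which is why these operations took hours on our real repos.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo"
mkdir -p bundles/launcher
printf 'fake native binary' > bundles/launcher/eclipse.exe
echo 'class Main {}' > Main.java
git add .
git commit -qm "commit with an unwanted binary"
# Rewrite history, dropping the binary path from every commit.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'git rm -r --cached -q --ignore-unmatch bundles/launcher' \
    --prune-empty HEAD >/dev/null 2>&1
remaining=$(git ls-tree -r HEAD --name-only)
```

After the rewrite, only the source file survives in history; the space is reclaimed once the old objects are pruned and the repo is repacked.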
-Preconditioned the repos back to the 3.0 release
git-move-refs: removes unneeded fix-up branches after the conversion.
Challenges during migration:
-Massaging tags that didn’t meet Git standards. For instance, some JDT committers had tags with “*” in them; we applied regexp foo to modify them.
-Long-running git filter-branch operations, ranging from 20 minutes to 16 hours. The Eclipse webmasters created a local partition for me on the filesystem to avoid NFS timeout issues on the shared Eclipse filesystem. Otherwise, git filter-branch operations would time out after a few hours due to stale NFS file handles.
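The talk doesn’t show the actual regexp (the notes just say “regexp foo”), but a hypothetical sanitizer along these lines illustrates the idea: replace characters that Git ref names forbid with underscores before creating the tags.

```shell
# Hypothetical tag sanitizer; the real migration scripts are not shown
# in the talk, so treat this as an illustration only.
sanitize_tag() {
    # Replace characters that are illegal in Git ref names
    # ('*', '~', '^', ':', '?', '[' and spaces) with underscores.
    printf '%s' "$1" | sed -e 's/[*~^:?[ ]/_/g'
}
clean=$(sanitize_tag 'JDT_*_TEST BUILD')
# 'JDT_*_TEST BUILD' becomes 'JDT___TEST_BUILD'
```

`git check-ref-format` can be used to verify that the resulting names are acceptable before running the migration for real.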
How long did the migration take? It depends on the size of the repo and the history associated with it: JDT Core took 24 hours, 8 of them for the filter-branch. Also, since we ran each migration twice, (1) as a test and (2) for real, the migrations consumed a lot of both machine time and people time.
Our committers had a number of problems when first using Git. If you delete a project from your workspace, it’s easy to push that change to the master repository as a delete by mistake. In addition, since we work in multiple branches, we have had cases where people switched branches for one bundle and inadvertently committed changes in another bundle to the wrong stream. While switching streams, committers also inadvertently deleted changes in their local workspaces.
Our developers experienced quite a learning curve when switching to Git. For many it was a surprise that they couldn’t do everything in EGit like they had done in the CVS tooling. Several people reverted to using the command line or gitk, which they found ironic because we are in the tooling business: reverting to command-line operations to manage your code contributions seemed like a step backward.
Another challenge was the switch in focus to branches as opposed to patches. Traditionally, many teams created patches for every change and attached them to the bugzillas that documented the change. With Git, instead of creating a patch, you commit and then add a link to the change in Bugzilla. So we had to adjust our mindset: commit to a branch, instead of making a patch.
Branches > Patches. Letting go of the patch mentality was a hurdle for many people. Several teams had submitted every change as a patch to bugzilla, and had done so for years. New committers were traditionally taught to write and refine patches as part of the process of becoming a committer. So it felt unnatural to commit changes on local branches.
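The two habits can be contrasted in a toy repo (the branch name and bug number below are invented): the old habit exports a patch file to attach to Bugzilla, while the Git habit is simply to keep the commit on a topic branch and share a link to it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo"
echo base > file.txt
git add file.txt
git commit -qm "base"
# Work on a fix on a topic branch (bug number is made up).
git checkout -qb bug-12345
echo fix >> file.txt
git commit -qam "Bug 12345: fix the thing"
# Old habit: export the change as a patch file for bugzilla.
git format-patch -1 -o patches >/dev/null
patch_count=$(ls patches | wc -l)
# New habit: the commit already lives on a branch; just link to it.
branch=$(git rev-parse --abbrev-ref HEAD)
```

Both end states carry the same change; the difference is whether the unit of review is a file attachment or a branch.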
I missed a bundle during one of the migrations and spent a day trying to integrate CVS content into a Git repo while preserving history. I tried git-stitch and git-merge, but to no avail; the history didn’t look right. In the end, I reran the CVS migration because it was too much work to fix all the tags to look right.
Friday afternoon, the 21st of October, before milestone week, Brian de Alwis was using a bzr-git client. He pushed some changes to the master branch, and it wiped all but two of the active branches in the repo. It also triggered a gc, which cleaned up the recently deleted branches. Initially, other committers tried to push the branches back but were not allowed to because of server-side commit hooks. There was then a mad scramble to find a committer with an up-to-date copy of the repo that could be restored to eclipse.org; Paul found one on his home machine. From comment 37 on the bug: “I'll just add the final fix. We took a cloned repo that was up to date from Thursday and pulled Friday's 7 commits into R4_development and R3_6_maintenance only. Denis disabled the commit hooks. Then we pushed all tags and pushed refs/remotes/origin/*:refs/heads/*. Pushing the refs also pushed back the GCed commits. We should get that restored repo from the ISP and compare it with the public repo now, to confirm we've completely restored the repo. PW” And from the follow-up discussion: “We recently ran into a problem where a push inadvertently removed most of the branches and tags from our public repo, eclipse.platform.ui.git, and GCed the orphaned commits, leaving us in a bad state. This was done through normal git operations, and can be easily replicated from the command line or a little script. We'd like to discuss ways of preventing or limiting the damage to our public repos from this kind of situation in the future. Please add your comments or insights to https://bugs.eclipse.org/bugs/show_bug.cgi?id=362076”
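A hedged reconstruction of that recovery in a sandbox (all paths and branch names below are made up, and the real incident also involved server-side hooks and a gc): delete a branch on a shared bare repo from one clone, then restore it from another clone’s surviving remote-tracking refs using the same refspec quoted above.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"           # stand-in for the public repo
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git config user.email demo@example.com
git config user.name "Demo"
echo a > f.txt
git add f.txt
git commit -qm "initial"
git branch -M master
git push -q origin master:master master:R3_6_maintenance
git fetch -q origin                  # tracking refs now mirror the server
# Disaster, from a different clone so our tracking refs stay intact:
git clone -q "$tmp/origin.git" "$tmp/other" 2>/dev/null
git -C "$tmp/other" push -q origin :R3_6_maintenance
# Recovery: push the surviving remote-tracking refs back as branches.
git remote set-head origin -d 2>/dev/null || true   # drop symbolic origin/HEAD
git push -q origin 'refs/remotes/origin/*:refs/heads/*'
restored=$(git ls-remote --heads origin R3_6_maintenance | wc -l)
```

The key point is that any reasonably fresh clone carries enough refs to repopulate the server, which is exactly why Paul’s home machine could save the day.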
Easier to branch. Rolling back a commit is easier. Seeing the Eclipse project move to Git made Wayne Beaton happy, and if Wayne is happy, everyone is happy. Cool graphs on GitHub. The EGit team got lots of feedback; Bugzilla feedback is love. For instance, Dani and Markus opened over 110 EGit bugs.
“ Fork you” is now a valid bugzilla resolution.
-We build with a mixture of PDE, p2 and Ant, as well as the Eclipse compiler.
-In order to build against Git repositories, we added the EGit fetch factory bundle to the subset of bundles that we use to build Eclipse.
-Modified our map files to point to Git repositories.
-Builder changes: fetch maps from Git repos, compare tags, create a tag for the build id.
-Changes to build scripts so binaries are fetched from the appropriate repos.
-Ran several test builds. Surprisingly low on the release engineering pain point scale.
-Backported the changes to all four active development streams.
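For reference, a map file entry pointing at a Git repo looks roughly like the following. Both lines are illustrative only: the exact key set depends on the fetch factory version, and the version tag shown is invented.

```properties
# Old CVS-style entry (illustrative):
plugin@org.eclipse.ui=v20111001,:pserver:anonymous@dev.eclipse.org:/cvsroot/eclipse,,org.eclipse.ui
# Git-style entry using the EGit fetch factory (illustrative):
plugin@org.eclipse.ui=GIT,tag=v20111001,repo=git://git.eclipse.org/gitroot/platform/eclipse.platform.ui.git,path=bundles/org.eclipse.ui
```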
The migration has also made us rethink our development and build processes. Today, we usually build from tags: everyone releases to a branch and tags their contribution to the build. But with Git, you should be thinking in terms of branches. For instance, we will be moving to a git-flow model where day-to-day development occurs in a single “develop” branch and we merge changes into the master branch for the build. We will also change the builder to tag the branch automatically.
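A sketch of that branch model in a throwaway repo (the build id and commit messages are invented): develop carries day-to-day work, the build merges it into master, and the builder tags the result.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo"
echo v1 > f.txt
git add f.txt
git commit -qm "initial"
git branch -M master
git checkout -qb develop            # day-to-day development happens here
echo change >> f.txt
git commit -qam "feature work"
# At build time: merge develop into master and tag the build input.
git checkout -q master
git merge -q --no-ff -m "merge develop for build" develop
git tag -a -m "build input" I20111103
build_tag=$(git describe --tags)
```

The tag is created by the build machinery rather than by each committer, which replaces the old per-bundle manual tagging step.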
The Git migration consumed a lot of time for us. However, if you look at it from an accounting perspective, it’s a sunk cost. Every year, we make a plan of major items for the release. Migrating to Git was a major item for us, which meant that other items had to be deferred.
Advice for other projects contemplating their Git migration:
-Relax: you don’t have as many bundles or as much history as we do. It won’t be so painful or costly for you, and it won’t take months. Unless you’re WTP; then it might take a while.
-Run test migrations and builds before the actual migration date, and get feedback from your community to see if you need to modify your strategy for the actual migration.
-[email_address] is helpful for questions related to the Git migration. Other projects have been very helpful.
-Paul Webster wrote a document, “Git workflows for CVS users”, which has been very useful. Inevitably, when people have their repositories migrated to Git they have similar questions, so it’s good to have the answers in a document you can point them to.
-Minimize the number of repos you create. We have too many repos, and cloning so many repos is not the most efficient way to work.
The benefits from the Git migration are not yet realized. Proponents of distributed version control systems suggest that it makes it easier to fork and contribute.
I recently watched a talk by David Eaves, who has been helping open source and open data communities prepare metrics on bug fix rates, how long patches wait, and so on. One of his points was that people think open source is all about collaboration and working together. But really, if we empower people to go off and work on a problem by themselves, without having to interact with someone, because of the reduced transaction costs, that is a huge benefit. It would be an interesting academic study to analyze contributions to Eclipse projects using SVN and CVS, and contributions after the same projects convert to Git, and see if there is a statistically significant increase in contributions. The most important thing is that you want to reduce the barriers to contribution in your community.