The document discusses parallelizing garbage collection (GC) in CRuby. It describes the current single-threaded GC approach and argues for a parallel marking GC to utilize multiple CPU cores. Key points covered include explaining GC concepts like dead objects and roots, an overview of CRuby's mark-and-sweep algorithm, and the motivation to parallelize marking to improve performance. The author has implemented several GC techniques for CRuby through "RubyKaigi Driven Development" including Lazy Sweep GC and a proposed Parallel Marking GC.
9. Ice-cream factory
✓ I worked in an assembly line
✓ For example, I made many
cardboard boxes.
✓ I was a professional cardboard box
maker :)
8/207
Parallel worlds of CRuby's GC Powered by Rabbit 0.9.3
10. Ice-cream factory
✓ I made 150 boxes per hour
(ZOMG)
11. I was like a machine!!
http://www.flickr.com/photos/kevincollins123/5887984753/
14. Working with Java
✓ I worked in a big company.
✓ This work was similar to
assembly line work.
✓ I made a part of a product. I didn't
understand the whole product.
15. I was still like a
machine!!
http://www.flickr.com/photos/kevincollins123/5887984753/
18. My current work
✓ Currently, I work at NaCl.
✓ matz and shyouhei and takaokouji
are my co-workers.
✓ shugo is my boss.
✓ They are CRuby committers.
19. When I started Ruby
programming
✓ I felt free.
✓ This work wasn't similar to
assembly line work.
✓ I could make the whole product.
20. I was no longer
a machine!!
http://www.flickr.com/photos/danzden/121379782/
22. Garbage Collection for me
✓ GC technology is very interesting
to me.
✓ GC is a garbage collecting
machine.
✓ I've been creating it since then.
It's great fun!!
31. My RDD history
✓ LazySweepGC - RubyKaigi2008
✓ LonglifeGC - 2009
✓ LazySweepGC - 2010
✓ ParallelMarkingGC - 2011
33. LonglifeGC
✓ It treats long-life objects as a
special case.
✓ It is similar to generational GC.
✓ LonglifeGC was rejected in
CRuby 1.9.2 for some reason.
✓ :'(
34. But, LonglifeGC has
been
used in Kiji :-)
http://www.flickr.com/photos/conifer/2389654222/
35. Kiji
✓ Kiji is an optimized version of
REE by Twitter developers.
✓ The Twitter team substantially
extended LonglifeGC.
✓ It's cool!!
37. My RDD history
✓ LazySweepGC - RubyKaigi2008
✓ LonglifeGC - 2009
✓ LazySweepGC - 2010
✓ ParallelMarkingGC - 2011
38. LazySweepGC
✓ Traditional M&S GC executes
mark and sweep atomically.
✓ Ruby application stops during GC
(stop-the-world).
✓ With lazy sweeping, the sweep phase is
deferred and done a little at a time.
39. LazySweepGC
✓ Each object allocation sweeps
Ruby's heap
✓ only until it finds an appropriate free object.
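The idea of sweeping only as far as the next allocation needs can be sketched with a toy heap. This is a minimal model, not CRuby's implementation: `Slot`, `ToyHeap`, and `allocate` are hypothetical names, and the heap is assumed to be in the state left by a finished mark phase.

```ruby
# Toy model of lazy sweeping. All names are hypothetical.
Slot = Struct.new(:marked)

class ToyHeap
  attr_reader :slots

  def initialize(slots)
    @slots = slots
    @sweep_pos = 0 # where the previous allocation stopped sweeping
  end

  # Sweep only until one reusable slot is found, then stop.
  # Returns the slot index, or nil once the whole heap has been swept.
  def allocate
    while @sweep_pos < @slots.size
      index = @sweep_pos
      slot = @slots[index]
      @sweep_pos += 1
      if slot.marked
        slot.marked = false # live object: clear the mark for the next GC
      else
        return index        # dead (unmarked) object: reuse its slot
      end
    end
    nil
  end
end

# State right after a mark phase: slots 0 and 2 are live, 1 and 3 are dead.
heap = ToyHeap.new([Slot.new(true), Slot.new(false),
                    Slot.new(true), Slot.new(false)])
first  = heap.allocate # sweeps slots 0..1 only
second = heap.allocate # resumes at slot 2 and sweeps 2..3
puts "allocated slots: #{first}, #{second}"
```

Each call does a small piece of the sweep, so no single allocation pays for sweeping the whole heap.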
40. Improvements
✓ This improves the response time
of GC
✓ I.e., the worst-case pause time of GC
decreases.
41. LazySweepGC
✓ LazySweepGC has been available since
Ruby 1.9.3
42. My RDD history
✓ LazySweepGC - RubyKaigi2008
✓ LonglifeGC - 2009
✓ LazySweepGC - 2010
✓ ParallelMarkingGC - 2011
44. Today's topics
✓ Why do we need Parallel
Marking?
✓ What to consider?
✓ How to implement?
✓ How much did performance
improve?
48. Current CRuby's GC
✓ GC operates on only 1 core.
✓ In a multi-core environment, other
cores don't help GC.
49. GC:"I'm alone,
it's so hard."
http://www.flickr.com/photos/hortont/2698261070/
50. We should run GC in
parallel!!
http://www.flickr.com/photos/knallaerbse/2863161933/
52. What is GC?
✓ GC collects all dead objects.
53. What is a dead object?
✓ A dead object is an object that is
never referenced by the program.
✓ In GC terms, we say that a dead
object is unreachable from Roots.
54. What are Roots?
✓ Roots are the set of pointers that
directly reference objects in the
program.
✓ e.g. Ruby's local variables, etc.
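Reachability from Roots can be illustrated with a small hand-written object graph. This is only a sketch: the graph, the object names, and `reachable_from` are made up for illustration and are not part of CRuby.

```ruby
require "set"

# Hypothetical object graph: each key references the objects in its array.
references = {
  a: [:b],
  b: [:c],
  c: [],
  d: [:e], # d and e reference each other, but nothing in Roots reaches them
  e: [:d]
}
roots = [:a] # e.g. a local variable that points at object a

# Collect every object that can be reached by following references from roots.
def reachable_from(roots, references)
  seen = Set.new
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next unless seen.add?(obj) # add? returns nil if obj was already seen
    stack.concat(references.fetch(obj))
  end
  seen
end

live = reachable_from(roots, references)
dead = references.keys - live.to_a
puts "live: #{live.to_a.inspect}, dead: #{dead.inspect}"
```

Objects d and e are still referenced by each other, yet both count as dead because no path from Roots leads to them.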
55. For example
56. Please remember that
✓ GC collects objects that are
unreachable from Roots.
57. Next, let me explain the
current CRuby GC
algorithm.
58. CRuby's GC algorithm
summary
✓ CRuby adopts the Mark & Sweep
algorithm
✓ The collector works in separate Mark
and Sweep phases.
59. In the Mark phase
✓ The collector marks live objects that
are reachable from Roots.
60. For example
61. Mark phase with GC.start
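For reference, a collection can be requested from Ruby code with `GC.start`, and `GC.count` reports how many collections have run in the process so far. The snippet assumes CRuby and only shows that a GC cycle was triggered.

```ruby
before = GC.count # number of GC runs so far in this process
GC.start          # request a full collection (mark, then sweep)
after = GC.count
puts "GC runs: #{before} -> #{after}"
```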
62. Ruby Heap after marking
63. In the Sweep phase
✓ The collector sweeps "dead" objects
✓ "dead" means unmarked
✓ "dead" means unreachable from Roots
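The two phases can be sketched on a toy heap in plain Ruby. This is an illustration under simplifying assumptions, not CRuby's C implementation: `ToyObject`, `mark`, and `sweep` are hypothetical names, and the heap is just an array.

```ruby
# Toy mark-and-sweep over a hand-built heap. All names are hypothetical.
ToyObject = Struct.new(:name, :children, :marked)

# Mark phase: set the mark bit on everything reachable from obj.
def mark(obj)
  return if obj.marked # already visited; this also stops on cycles
  obj.marked = true
  obj.children.each { |child| mark(child) }
end

# Sweep phase: unmarked objects are dead; marks are cleared for the next cycle.
def sweep(heap)
  live, dead = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }
  [live, dead]
end

c = ToyObject.new("c", [], false)
b = ToyObject.new("b", [c], false)
a = ToyObject.new("a", [b], false)
d = ToyObject.new("d", [], false) # nothing references d

heap  = [a, b, c, d]
roots = [a]

roots.each { |root| mark(root) } # Mark phase
live, dead = sweep(heap)         # Sweep phase
puts "live: #{live.map(&:name).inspect}, dead: #{dead.map(&:name).inspect}"
```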
64. Sweep phase
66. Characteristics
✓ Stop-the-world algorithm
✓ Single-threaded execution
67. These days, PCs have multi-core
processors. But,
✓ GC executes on a single thread.
✓ Other cores don't work during GC.
✓ What a waste!!
71. What is Parallel Marking?
✓ The collector runs several marking
processes in parallel
✓ by using native threads.
✓ We will be happy on a multi-core
machine.
72. Flow diagram for Parallel
Marking
74. Why not perform sweeping in
parallel?
✓ Sweeping is much faster than
marking.
✓ You can see ko1's research:
✓ <URL:http://www.atdot.net/~ko1/diary/201011.html#d4>
75. Why not perform sweeping in
parallel?
✓ So, Mark phase improvement =
GC improvement
✓ And, we already have lazy
sweeping.
76. Today's topics
✓ Why do we need Parallel
Marking?
✓ What to consider?
✓ How to implement?
✓ How much did performance
improve?
85. This means..
✓ Tasks are distributed to multiple
threads.
✓ The task of marking the entire
heap is divided into several tasks,
each marking a single branch.
100. What does "wait-free" mean?
✓ A wait-free program does non-
blocking execution.
✓ It guarantees per-thread progress.
103. Amdahl's law
is used to find the
maximum expected
improvement to an
overall system when
only part of the system
is improved.
[cited from `Amdahl's law - Wikipedia']
104. Amdahl's law is used in
parallel computing
✓ If the parallel portion of the system is
X%
✓ And the number of processors is Y,
✓ How much speedup can we
expect?
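Amdahl's law answers this with speedup = 1 / ((1 - P) + P / N), where P is the parallel fraction (X% written as a fraction) and N is the number of processors (Y). A minimal sketch follows; the method name `amdahl_speedup` is made up for this example.

```ruby
# Expected overall speedup when a fraction of the work runs in parallel.
# parallel_fraction: 0.0..1.0 (X% as a fraction), processors: Y.
def amdahl_speedup(parallel_fraction, processors)
  serial = 1.0 - parallel_fraction
  1.0 / (serial + parallel_fraction.to_f / processors)
end

half_on_two   = amdahl_speedup(0.5, 2)    # about 1.33x
half_on_many  = amdahl_speedup(0.5, 1000) # approaches, but never reaches, 2x
all_on_four   = amdahl_speedup(1.0, 4)    # 4x only when nothing is serial
```

The serial part caps the speedup no matter how many processors are added, which is why the non-parallel parts matter so much.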
110. The conclusion so far
✓ We should consider how we can
efficiently balance workloads.
✓ So, we use Task Stealing.
✓ We should eliminate non-parallel
parts
✓ by using a wait-free algorithm.
111. Today's topics
✓ Why do we need Parallel
Marking?
✓ What to consider?
✓ How to implement?
✓ How much did performance
improve?
113. Task Stealing
✓ In Task Stealing, threads steal
tasks from each other
✓ Task Stealing is achieved with
Arora's Deque
114. Arora's Deque
✓ Deque stands for Double-
Ended Queue.
✓ In Arora's Deque, the deque
contains tasks as elements.
✓ It's a wait-free data structure.
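The interface of such a deque can be sketched as below. The owner pushes and pops at one end, while thieves shift from the other end. The class name `TaskDeque` is hypothetical, and the Mutex is only a stand-in to keep the sketch short; the deque described in these slides avoids locks.

```ruby
# Hypothetical sketch of the deque interface used for Task Stealing.
# NOTE: a Mutex is used here for brevity; it is NOT the wait-free scheme.
class TaskDeque
  def initialize
    @tasks = []
    @lock = Mutex.new
  end

  # Owner: add a task at the bottom.
  def push(task)
    @lock.synchronize { @tasks.push(task) }
    self
  end

  # Owner: take the most recently pushed task (LIFO end).
  def pop
    @lock.synchronize { @tasks.pop }
  end

  # Thief: take the oldest task from the opposite end (FIFO end).
  def shift
    @lock.synchronize { @tasks.shift }
  end
end

deque = TaskDeque.new
deque.push(:a).push(:b).push(:c)
stolen = deque.shift  # a thief gets the oldest task
own    = deque.pop    # the owner gets the newest task
```

Because the owner and the thieves work on opposite ends, they only meet when the deque is nearly empty.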
122. "Hey wait a minute,
doesn't shift() have
contention problems?"
123. In what ways could shift()
cause contention problems?
e.g...
✓ Multiple threads (workers) may call
shift() on the same deque at the same
time.
124. In what ways could shift()
cause contention problems?
e.g...
✓ shift() and pop() could be called
at the same time
✓ when deque has only one element.
126. Serialization
✓ shift() is serialized by using CAS.
✓ CAS = Compare And Swap
✓ And, this serialization doesn't use
a lock.
✓ It's wait-free!!
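The idea of serializing with CAS can be sketched as follows. Ruby's standard library has no CAS primitive, so the hypothetical `AtomicIndex` class below emulates one with a Mutex purely to show the semantics; a real implementation would use a hardware instruction. `try_shift` is also a made-up name, and it ignores the pop() race on a one-element deque.

```ruby
# Hypothetical emulation of an atomic integer with compare-and-swap.
# The Mutex stands in for the single atomic hardware instruction.
class AtomicIndex
  def initialize(value)
    @value = value
    @lock = Mutex.new
  end

  def get
    @lock.synchronize { @value }
  end

  # Store new_value only if the current value still equals expected.
  def compare_and_swap(expected, new_value)
    @lock.synchronize do
      if @value == expected
        @value = new_value
        true
      else
        false
      end
    end
  end
end

# One steal attempt: no retry loop, so it finishes in a bounded number of steps.
# Returns nil when the deque looks empty or another thief won the race.
def try_shift(tasks, top)
  seen = top.get
  return nil if seen >= tasks.size
  task = tasks[seen]
  top.compare_and_swap(seen, seen + 1) ? task : nil
end

tasks = [:t0, :t1]
top = AtomicIndex.new(0)
seen = top.get                                  # two thieves both read index 0
first  = top.compare_and_swap(seen, seen + 1)   # the first thief wins
second = top.compare_and_swap(seen, seen + 1)   # the second sees a stale index and fails
```

The losing thief does not wait; it simply gives up on this deque and can try another one.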
127. I will omit the details of
how the serialization is
implemented.
128. For the sake of this
presentation, let's assume
that Arora's Deque avoids
contention problems.
129. Summary for Arora's Deque
✓ A simple data structure for Task
Stealing.
✓ Each worker has a single deque.
✓ Stealing (shift operation) is wait-
free!
130. How to use Arora's Deque
in Parallel Marking?
141. Summary
✓ Marker uses Arora's Deque as a
marking stack.
✓ A "task" means an object.
✓ The granularity of the task is very fine.
✓ This is a naive implementation.
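The naive scheme can be sketched like this: every worker owns a deque, each task is a single object, and an idle worker steals from the others. This is an assumption-laden illustration, not the author's C code. The names `WorkerDeque` and `parallel_mark` are hypothetical, the heap is a toy Hash, and a Mutex replaces the wait-free deque and the real termination logic.

```ruby
require "set"

# Hypothetical per-worker deque (Mutex used only to keep the sketch short).
class WorkerDeque
  def initialize
    @items = []
    @lock = Mutex.new
  end

  def push(task)
    @lock.synchronize { @items.push(task) }
  end

  def pop
    @lock.synchronize { @items.pop }
  end

  def shift
    @lock.synchronize { @items.shift }
  end
end

# graph: object id => child ids (toy heap). Returns the set of marked ids.
def parallel_mark(graph, roots, worker_count)
  deques  = Array.new(worker_count) { WorkerDeque.new }
  marked  = Set.new
  pending = 0          # tasks scheduled but not yet fully processed
  state   = Mutex.new  # guards marked and pending

  # Mark obj and queue it as a task, unless it was already marked.
  schedule = lambda do |deque, obj|
    fresh = state.synchronize do
      next false unless marked.add?(obj)
      pending += 1
      true
    end
    deque.push(obj) if fresh
  end

  roots.each_with_index { |root, i| schedule.call(deques[i % worker_count], root) }

  workers = deques.each_index.map do |me|
    Thread.new do
      loop do
        task = deques[me].pop
        if task.nil?                       # own deque is empty: try to steal
          deques.each_with_index do |other, i|
            next if i == me
            task = other.shift
            break if task
          end
        end
        if task.nil?
          break if state.synchronize { pending.zero? }  # nothing left anywhere
          Thread.pass
          next
        end
        graph.fetch(task, []).each { |child| schedule.call(deques[me], child) }
        state.synchronize { pending -= 1 }
      end
    end
  end
  workers.each(&:join)
  marked
end

graph = { 1 => [2, 3], 2 => [4], 3 => [4], 4 => [], 5 => [6], 6 => [5] }
live = parallel_mark(graph, [1], 2)
```

Every object costs at least one push() and one pop() or shift(), which is the fine granularity the summary above refers to.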
148. Why slow?
✓ pop(), push(), and shift() are called
frequently.
✓ Because the deque holds fine-grained tasks.
✓ Their overhead is too high.
156. Good point & Bad point
✓ Number of calls to Deque's
operations was reduced.
✓ Marking speed of the worker is
improved.
✓ However, coarse-grained tasks
decrease parallelism.
159. If an object in B's branch has many child
objects..
160. .. then A can't steal it while B is marking
the large branch.
161. So, the worker needs to
treat large branches as
special cases.
162. Almost all large branches
hold large Array objects
and/or large Hash objects.
163. Treatment for large Array
objects and Hash objects
✓ Each marker has a special deque
to manage them.
✓ A marker divides them into fixed-
size tasks.
✓ e.g. elements 0-9 of the Array,
elements 10-19 of the Array...
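Dividing a large Array into fixed-size tasks can be sketched as index ranges. The method name `split_into_tasks` is made up, and the chunk size of 10 only mirrors the example above; the size actually used by the implementation is not stated here.

```ruby
# Split the indices 0...length into ranges of at most chunk_size elements.
# Each range is one task that any worker can take or steal.
def split_into_tasks(length, chunk_size)
  (0...length).step(chunk_size).map do |start|
    last = [start + chunk_size, length].min - 1
    start..last
  end
end

tasks = split_into_tasks(25, 10)  # three tasks; the last one is shorter
```

A worker that receives one range marks only those elements, so the rest of the Array stays available for other workers.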
164. Treatment for Large Array
and Hash
✓ By doing this, other workers can
steal divided tasks.
✓ This improves parallelism.
165. Summary
✓ The naive implementation was
slow.
✓ The granularity of the task was too fine.
✓ Now, a "task" means a branch from Roots.
✓ The granularity of the task is coarse.
✓ It's faster!!
166. Today's topics
✓ Why do we need Parallel
Marking?
✓ What to consider?
✓ How to implement?
✓ How much did performance
improve?
170. The first benchmark program is
✓ make benchmark
✓ This is the benchmark suite used in
CRuby development
171.
172. Why does this seem so slow?
✓ I think it's affected by Parallel
Marking's preparation.
✓ e.g. creating marking threads,
allocation of deques.
173. Why does this seem so slow?
✓ In most of the benchmarks, there
are few objects to mark.
✓ In this case, the cost of Parallel Marking is
high.
174. The next benchmark program is
✓ make rdoc
✓ make rdoc generates the Ruby
documentation.
✓ This benchmark measures execution
time and the GC execution time of
make rdoc.
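One way to take this kind of measurement from Ruby itself is `GC::Profiler`, which reports the total GC time in seconds. The sketch below is only an assumption about how such a measurement could look, not the method used for these slides; `gc_share` is a made-up helper, and the allocation loop is a placeholder workload rather than make rdoc.

```ruby
# Run the block and report wall-clock time, GC time, and GC's share of the total.
def gc_share
  GC::Profiler.enable
  GC::Profiler.clear
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  gc_time = GC::Profiler.total_time   # seconds spent in GC while enabled
  share = elapsed.zero? ? 0.0 : gc_time / elapsed
  { elapsed: elapsed, gc_time: gc_time, share: share }
ensure
  GC::Profiler.disable
end

# Placeholder workload that allocates many short strings.
result = gc_share { 200_000.times.map { |i| "object #{i}" } }
```

The `share` value corresponds to the "percentage of time spent on GC" figure; the numbers depend entirely on the machine and the Ruby version.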
175. make rdoc
✓ It takes about 80 seconds on my
machine.
✓ In fact, 30% of that time is spent
on GC!!
✓ How much did performance
improve?
179. In a many-core environment
✓ I expect we would get a large
improvement.
✓ e.g. 8 cores, 16 cores...
✓ But my machine has just 2 cores.
✓ I can't see it :(
180. Best case for Parallel GC
✓ If there are many objects.
✓ In this case, there are also many mark targets.
✓ If the objects are long-lived.
✓ Server-side application?
182. Demonstration
✓ I want to show the performance
improvement with Parallel GC.
✓ This demonstration is in the style of
a video game.
188. Other characteristics of
SUPER NARIO GC
✓ GC runs at fixed intervals.
✓ A lot of objects are generated to
increase GC's burden.
✓ Burden = Game Level
189. Let's compare the Original GC
and the Parallel GC
✓ Original GC pause time is long.
✓ This game will be difficult.
✓ Parallel GC pause time is short.
✓ This game will be easy.
199. Windows is not supported
✓ The Mark Worker uses pthreads as
native threads.
✓ And it uses some gcc built-in
functions.
✓ But I'll support Windows
eventually.
200. Increased memory usage.
✓ Size of 1 Deque is roughly 32KB.
✓ But generally, multi-core machines
have plenty of memory.
✓ So, I think it's OK :P
202. Conclusion
✓ I implemented Parallel Marking
GC
✓ GC was improved!
✓ I'll report to ruby-core soon.
203. Conclusion
✓ But, Parallel Marking has some
problems.
✓ I'll fix these.
204. source code
✓ Parallel Marking GC
✓ <URL:https://github.com/authorNari/ruby/tree/pmark_div_root2>
✓ SUPER NARIO GC
✓ <URL:https://github.com/authorNari/nario/>
205. Acknowledgments
✓ The following people helped me
make this presentation!!
✓ Tor-san!!
✓ matz, shugo, yhara, sada, takaokouji,
other co-workers!!
207. Do you have any
questions?
Short and simple
questions, please :)
208. Sorry
✓ It's too difficult for me to
understand/answer the question.
✓ Could you send the question on
Twitter (@nari_en)?