Note: When you view the slide deck in a web browser, the screenshots may be blurred. You can download the deck and view it offline, where the screenshots are clear.
malloc & vmalloc in Linux
1. * Based on kernel 5.11 (x86_64) – QEMU
* 2-socket CPUs (4 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
malloc & vmalloc in Linux
Adrian Huang | Dec, 2022
2. Agenda
• Memory Allocation in Linux
• malloc -> brk() implementation in Linux Kernel
o Will *NOT* focus on the glibc malloc implementation; see the link: malloc internal
• vmalloc: Non-contiguous memory allocation
• [Note] kmalloc has been discussed here: Slide #88 of Slab Allocator in Linux Kernel
3. Memory Allocation in Linux
Buddy System
alloc_page(s), __get_free_page(s)
Slab Allocator
kmalloc/kfree
glibc: malloc/free
brk/mmap
. . .
vmalloc
User Space
Kernel Space
Hardware
• Balance between brk() and mmap()
• Use brk() if request size < DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The heap can be trimmed only if memory is freed at the top end.
o sbrk() is implemented as a library function that uses the brk() system call.
o When the heap is used up, glibc grows it by a chunk larger than 128 KB in a single brk() call.
▪ This saves the overhead of frequent brk() system calls
• Use mmap() if request size >= DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The allocated memory blocks can be independently released back to the system.
o Deallocated space is not placed on the free list for reuse by later allocations.
o Memory may be wasted because mmap allocations must be page-aligned, and the kernel must perform the expensive task of zeroing out the allocated memory.
o Note: glibc uses a dynamic mmap threshold
o Details: `man mallopt`
[glibc] malloc
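The brk()/mmap() split described above can be sketched as a small decision function. This is a simplified model, not glibc's code: it assumes a fixed 128 KB threshold (the slide notes that glibc actually adjusts the threshold dynamically), a 4 KB page size, and it ignores chunk headers, arenas, and free-list reuse. The names `backing_call` and `mmap_footprint` are hypothetical.

```python
# Simplified model of the rule on this slide; not glibc's actual logic.
MMAP_THRESHOLD = 128 * 1024  # DEFAULT_MMAP_THRESHOLD_MIN (128 KB) from the slide
PAGE_SIZE = 4096             # assumed x86_64 base page size


def backing_call(request_size, threshold=MMAP_THRESHOLD):
    """Return which system call would back a request of this size."""
    return "mmap" if request_size >= threshold else "brk"


def mmap_footprint(request_size, page_size=PAGE_SIZE):
    """Round a request up to whole pages, as a page-aligned mapping requires."""
    return (request_size + page_size - 1) // page_size * page_size


if __name__ == "__main__":
    for size in (1024, 128 * 1024 - 1, 128 * 1024, 128 * 1024 + 1):
        print(size, backing_call(size), mmap_footprint(size))
```

The last case shows the waste the slide mentions: a request one byte above 128 KB occupies a full extra page once it is page-aligned.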
• kmalloc: Contiguous memory allocation
• vmalloc: Non-contiguous memory allocation
o Scenario: memory allocation size > PAGE_SIZE (4KB)
o Allocate virtually contiguous memory
▪ Physical memory might NOT be contiguous
kmalloc & vmalloc
5. malloc() -> brk() implementation in Linux Kernel
• Quick view: Process Address Space – Heap
• sys_brk – Call path
• [From scratch] Launch a program: load_elf_binary() in Linux kernel
o VMA change observation
o Heap (brk or program break) configuration
• [Program Launch] strace observation: heap – brk()
• strace observation: allocate space via malloc()
o If the heap space is used up, how much memory does malloc() request via brk()?
• glibc: malloc implementation for memory request size
6. Quick view: Process Address Space - Heap
Process Virtual Address (high to low):
0x7FFF_FFFF_FFFF
STACK_TOP_MAX = 0x7FFF_FFFF_F000
Stack (Default size: 8MB): mm->stack
Stack Guard Gap
128MB gap (STACK_TOP_MAX down to mm->mmap_base)
mm->mmap_base = 0x7FFF_F7FF_F000
mmap
HEAP: mm->brk (top), mm->start_brk (bottom)
BSS
Data: mm->end_data (top), mm->start_data (bottom)
Text: mm->start_code = 0x40_0000
0
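As a quick check of the "128MB gap" label, the two addresses on this slide can be subtracted directly. This is only arithmetic on the slide's values; the helper name `gap_in_mib` is hypothetical.

```python
# Addresses copied from the slide (nokaslr, ASLR disabled).
STACK_TOP_MAX = 0x7FFF_FFFF_F000
MMAP_BASE = 0x7FFF_F7FF_F000
MIB = 1024 * 1024


def gap_in_mib(top, base):
    """Distance between two addresses, in MiB."""
    return (top - base) // MIB


if __name__ == "__main__":
    print(hex(STACK_TOP_MAX - MMAP_BASE), gap_in_mib(STACK_TOP_MAX, MMAP_BASE))
```

The difference is 0x8000000 bytes, which matches the 128MB distance between STACK_TOP_MAX and mm->mmap_base.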
7. Quick view: Process Address Space - Heap
Process Virtual Address (high to low):
0x7FFF_FFFF_FFFF
STACK_TOP_MAX = 0x7FFF_FFFF_F000
Stack (Default size: 8MB): mm->stack
Stack Guard Gap
128MB gap (STACK_TOP_MAX down to mm->mmap_base)
mm->mmap_base = 0x7FFF_F7FF_F000
mmap
HEAP: mm->brk (top), mm->start_brk (bottom)
BSS
Data: mm->end_data (top), mm->start_data (bottom)
Text: mm->start_code = 0x40_0000
0
Why are they different?
8. sys_brk – Call path
sys_brk
  return mm->brk if brk < mm->start_brk
  newbrk = PAGE_ALIGN(brk)
  oldbrk = PAGE_ALIGN(mm->brk)
  __do_munmap: shrink brk if brk <= mm->brk
  do_brk_flags
    vma_merge: can expand the existing anonymous mapping
    vm_area_alloc: cannot expand the existing anonymous mapping
  mm->brk = brk
  mm_populate: only if (mm->def_flags & VM_LOCKED) != 0
    __mm_populate
      populate_vma_page_range
        __get_user_pages
          follow_page_mask: Find if the page is populated
          faultin_page: The page is NOT populated yet
            handle_mm_fault
  return newbrk
[By default] Heap (or brk) space is demand-paged
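The decision logic in this call path can be modelled in a few lines. This is a sketch under assumptions, not the kernel code: `sys_brk_model` and the `mm` dictionary are hypothetical, the rlimit and mapping-overlap checks are omitted, and the "same page" shortcut comes from my reading of the kernel source rather than from the slide.

```python
PAGE_SIZE = 4096  # assumed x86_64 base page size


def page_align(addr):
    """Round addr up to the next page boundary, like PAGE_ALIGN()."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def sys_brk_model(mm, brk):
    """Model of the brk decision. mm is a dict with 'start_brk' and 'brk'.

    Returns (resulting program break, action taken).
    """
    if brk < mm["start_brk"]:
        return mm["brk"], "reject"     # request below the heap start
    newbrk = page_align(brk)
    oldbrk = page_align(mm["brk"])
    if newbrk == oldbrk:
        mm["brk"] = brk                # assumption: no VMA change is needed
        return brk, "same-page"
    if brk <= mm["brk"]:
        mm["brk"] = brk                # kernel: __do_munmap(newbrk, oldbrk - newbrk)
        return brk, "shrink"
    mm["brk"] = brk                    # kernel: do_brk_flags(oldbrk, newbrk - oldbrk)
    return brk, "grow"


if __name__ == "__main__":
    mm = {"start_brk": 0x4C5000, "brk": 0x4C5000}
    print(sys_brk_model(mm, 0x4C5000 + 0x21000))
```

Note that the model only moves the break. As the slide says, pages are faulted in on first access unless VM_LOCKED is set.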
9. [From scratch] Launch a program: load_elf_binary() in Linux kernel
# ./free_and_sbrk 1 1
load_elf_binary()
Kernel
vma (R): vm_start = 0x400000, vm_end = 0x401000
vma (R, E): vm_start = 0x401000, vm_end = 0x496000
vma (R): vm_start = 0x496000, vm_end = 0x4be000
GAP
vma (R, W): vm_start = 0x4be000, vm_end = 0x4c4000
GAP
vma (vvar): vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000
vma (vdso): vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000
vma (stack): vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000
GAP
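The same VMA layout can be observed from userspace through /proc/<pid>/maps. The sketch below parses one line of that format and finds the unmapped ranges between the VMAs listed on this slide. The helper names are hypothetical, and the sample line in the usage section is constructed from the slide's addresses rather than captured from a real run.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return int(start_s, 16), int(end_s, 16), fields[1], name


def find_gaps(vmas):
    """Return (end, next_start) pairs where consecutive VMAs are not adjacent."""
    ordered = sorted(vmas)
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
            if b_start > a_end]


# (vm_start, vm_end) pairs copied from the slide.
SLIDE_VMAS = [
    (0x400000, 0x401000), (0x401000, 0x496000), (0x496000, 0x4BE000),
    (0x4BE000, 0x4C4000), (0x7FFFF7FFA000, 0x7FFFF7FFE000),
    (0x7FFFF7FFE000, 0x7FFFF7FFF000), (0x7FFFFF85D000, 0x7FFFFFFFF000),
]

if __name__ == "__main__":
    # Illustrative line built from the slide's addresses, not real output.
    print(parse_maps_line("00400000-00401000 r--p 00000000 00:00 0"))
    for lo, hi in find_gaps(SLIDE_VMAS):
        print(f"gap: {lo:#x} - {hi:#x}")
```

With the slide's addresses, the first four VMAs are adjacent to each other, and the unmapped ranges sit between 0x4c4000 and the vvar VMA and between the vdso and stack VMAs.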
10. After launching a program: Question
vma (R): vm_start = 0x400000, vm_end = 0x401000
vma (R, E): vm_start = 0x401000, vm_end = 0x496000
vma (R): vm_start = 0x496000, vm_end = 0x4be000
GAP
vma (R, W): vm_start = 0x4be000, vm_end = 0x4c4000
GAP
vma (vvar): vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000
vma (vdso): vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000
vma (stack): vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000
GAP
Why?
11. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
# ./free_and_sbrk 1 1
vma (R): vm_start = 0x400000, vm_end = 0x401000
vma (R, E): vm_start = 0x401000, vm_end = 0x496000
vma (R): vm_start = 0x496000, vm_end = 0x4be000
GAP
vma (R, W): vm_start = 0x4be000, vm_end = 0x4c4000
GAP
vma (vvar): vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000
vma (vdso): vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000
vma (stack): vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000
GAP
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        vma_merge: can expand the existing anonymous mapping
        vm_area_alloc: cannot expand the existing anonymous mapping
    mm->{start_brk, brk} = end
12. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
# ./free_and_sbrk 1 1
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c5000]
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        can expand the existing anonymous mapping: vma_merge
        cannot expand the existing anonymous mapping: vm_area_alloc
    mm->{start_brk, brk} = end
13. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c5000]
mm->brk = mm->start_brk = 0x4c5000
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        can expand the existing anonymous mapping: vma_merge
        cannot expand the existing anonymous mapping: vm_area_alloc
    mm->{start_brk, brk} = end
14. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c5000]
mm->brk = mm->start_brk = 0x4c5000
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        can expand the existing anonymous mapping: vma_merge
        cannot expand the existing anonymous mapping: vm_area_alloc
    mm->{start_brk, brk} = end
Why?
15. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c5000]
mm->brk = mm->start_brk = 0x4c5000
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        can expand the existing anonymous mapping: vma_merge
        cannot expand the existing anonymous mapping: vm_area_alloc
    mm->{start_brk, brk} = end
elf_bss
elf_brk
16. [From scratch] Launch a program: load_elf_binary() – Heap Configuration
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c5000]
mm->brk = mm->start_brk = 0x4c5000
load_elf_binary
  set_brk
    vm_brk_flags
      do_brk_flags
        can expand the existing anonymous mapping: vma_merge
        cannot expand the existing anonymous mapping: vm_area_alloc
    mm->{start_brk, brk} = end
elf_bss
elf_brk
range(elf_bss, elf_brk): bss space
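One way to see why the heap vma starts at 0x4c4000 while mm->start_brk is 0x4c5000 is to model set_brk() as page-aligning both ends of the bss range. The sketch below is a simplified model under assumptions: the function name `set_brk_model` is hypothetical, 4 KB pages are assumed, and the elf_bss and elf_brk values are invented for illustration (the slides do not give them); they are only chosen to fall in the pages the slide implies.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def page_align(addr: int) -> int:
    """Round addr up to the next page boundary."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


def set_brk_model(elf_bss: int, elf_brk: int) -> dict:
    """Simplified model: map the page-aligned bss range, then set start_brk = brk = end."""
    start = page_align(elf_bss)
    end = page_align(elf_brk)
    # An anonymous mapping is created only if the aligned range is non-empty.
    anon_vma = (start, end) if end > start else None
    return {"anon_vma": anon_vma, "start_brk": end, "brk": end}


# Hypothetical values: elf_bss lies in the last file-backed data page,
# elf_brk lies in the following page.
layout = set_brk_model(elf_bss=0x4C3D30, elf_brk=0x4C4A38)
print({k: (tuple(hex(x) for x in v) if isinstance(v, tuple) else hex(v))
       for k, v in layout.items()})
```

Under these assumed inputs the anonymous mapping covers 0x4c4000 to 0x4c5000 (the bss page), and the program break starts right after it at 0x4c5000, which is consistent with the values on the slide.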
17. [Program Launch] strace observation: heap – brk()
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c7000]
mm->brk = 0x4c61c0
mm->start_brk = 0x4c5000
Demand paging: Allocate a physical page when a page fault occurs
sys_brk
  if brk < mm->start_brk: return mm->brk
  newbrk = PAGE_ALIGN(brk)
  oldbrk = PAGE_ALIGN(mm->brk)
  shrink brk if brk <= mm->brk: __do_munmap
  do_brk_flags
    can expand the existing anonymous mapping: vma_merge
    cannot expand the existing anonymous mapping: vm_area_alloc
  mm->brk = brk
  if (mm->def_flags & VM_LOCKED) != 0: mm_populate
18. [Program Launch] strace observation: heap – brk()
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4c7000]
mm->brk = 0x4c61c0
mm->start_brk = 0x4c5000
Demand paging: Allocate a physical page when a page fault occurs
19. [Program Launch] strace observation: heap – brk()
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4e8000]
mm->brk = 0x4e8000
mm->start_brk = 0x4c5000
Demand paging: Allocate a physical page when a page fault occurs
sys_brk
  if brk < mm->start_brk: return mm->brk
  newbrk = PAGE_ALIGN(brk)
  oldbrk = PAGE_ALIGN(mm->brk)
  shrink brk if brk <= mm->brk: __do_munmap
  do_brk_flags
    can expand the existing anonymous mapping: vma_merge
    cannot expand the existing anonymous mapping: vm_area_alloc
  mm->brk = brk
  if (mm->def_flags & VM_LOCKED) != 0: mm_populate
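Demand paging on the heap can be illustrated with a toy model: growing the heap only extends the vma, and a physical page is attached the first time an address in that page is touched. This is a sketch for intuition only; the `HeapModel` class and its methods are hypothetical and do not mirror kernel data structures.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class HeapModel:
    """Toy model of a heap vma whose pages are populated on first access."""

    def __init__(self, vm_start: int, vm_end: int):
        self.vm_start = vm_start
        self.vm_end = vm_end
        self.resident = set()  # page-aligned addresses that have a physical page

    def touch(self, addr: int) -> bool:
        """Access addr; return True if this access caused a (simulated) page fault."""
        if not (self.vm_start <= addr < self.vm_end):
            raise MemoryError("address is outside the vma")
        page = addr & ~(PAGE_SIZE - 1)
        faulted = page not in self.resident
        self.resident.add(page)  # the fault handler would allocate the page here
        return faulted


heap = HeapModel(0x4C4000, 0x4E8000)          # heap vma after brk(0x4e8000)
print(len(heap.resident))                      # no page is populated by brk() alone
print(heap.touch(0x4C5010), heap.touch(0x4C5FF8))  # second access hits the same page
```

The point of the model is that the vma spans (0x4e8000 - 0x4c4000) / 4096 = 36 pages, yet the resident set stays empty until the program actually reads or writes the memory.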
20. Recap
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R, W [vm_start = 0x4be000, vm_end = 0x4c4000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4e8000]
mm->brk = 0x4e8000
mm->start_brk = 0x4c5000
Still not equal
24. [Program Launch] strace observation: mprotect()
vma: R [vm_start = 0x400000, vm_end = 0x401000]
vma: R, E [vm_start = 0x401000, vm_end = 0x496000]
vma: R [vm_start = 0x496000, vm_end = 0x4be000]
GAP
vma: R [vm_start = 0x4be000, vm_end = 0x4c1000]
GAP
vma (vvar) [vm_start = 0x7ffff7ffa000, vm_end = 0x7ffff7ffe000]
vma (vdso) [vm_start = 0x7ffff7ffe000, vm_end = 0x7ffff7fff000]
vma (stack) [vm_start = 0x7fffff85d000, vm_end = 0x7ffffffff000]
GAP
vma (heap) [vm_start = 0x4c4000, vm_end = 0x4e8000]
mm->brk = 0x4e8000
mm->start_brk = 0x4c5000
vma: R, W [vm_start = 0x4c1000, vm_end = 0x4c4000]
match
25. strace observation: allocate space via malloc() #1
[Init stage]
0x4e8000 – 0x4c7000 = 0x21000
(132KB: 33 pages)
• Balance between brk() and mmap()
• Use brk() if request size < DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The heap can be trimmed only if memory is freed at the top end.
o sbrk() is implemented as a library function that uses the brk() system call.
o When the heap is used up, allocate a memory chunk > 128KB via brk().
▪ This avoids the overhead of frequent brk() system calls.
• Use mmap() if request size >= DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The allocated memory blocks can be independently released back to the system.
o Deallocated space is not placed on the free list for reuse by later allocations.
o Memory may be wasted because mmap allocations must be page-aligned, and the kernel must perform the expensive task of zeroing out the allocated memory.
o Note: glibc uses a dynamic mmap threshold
o Detail: `man mallopt`
[glibc] malloc
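The brk()/mmap() split described in the bullets, and the initial heap size seen in strace, can be checked with a few lines. This is a deliberately simplified sketch: `backend_for` is a hypothetical helper, and it ignores the dynamic threshold, free-list reuse, arenas, and tcache that real glibc applies.

```python
DEFAULT_MMAP_THRESHOLD_MIN = 128 * 1024  # 128 KB, as stated on the slide
PAGE_SIZE = 4096                         # assumption: 4 KB pages


def backend_for(request_size: int,
                mmap_threshold: int = DEFAULT_MMAP_THRESHOLD_MIN) -> str:
    """Pick the system call a fresh allocation of request_size would use (simplified)."""
    return "mmap" if request_size >= mmap_threshold else "brk"


# Initial heap growth observed in strace: 0x4e8000 - 0x4c7000.
init_growth = 0x4E8000 - 0x4C7000
print(hex(init_growth), init_growth // 1024, init_growth // PAGE_SIZE)

print(backend_for(64 * 1024))    # below the threshold
print(backend_for(128 * 1024))   # at the threshold
```

The arithmetic confirms the slide's figures: 0x21000 bytes is 132 KB, or 33 pages.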
26. strace observation: allocate space via malloc() #2
[Init stage] 0x21000 (132KB: 33 pages)
Current program break is used up: allocate another 132KB
malloc.c
Heap space allocation from malloc(): Allocate memory chunk > 128KB via brk()
27. Memory Allocation in Linux – brk() detail
Buddy System
alloc_page(s), __get_free_page(s)
Slab Allocator
kmalloc/kfree
brk or mmap
. . .
vmalloc
User Space
Kernel Space
Hardware
[glibc] malloc: check sysmalloc() for implementation
User application
glibc: malloc implementation
Allocated heap space enough?
Y: Return available address from the allocated heap space
N: if size < 128KB, then allocate “memory chunk > 128KB” by calling brk()
VMA Configuration & program break adjustment
Page fault handler
malloc
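Why a small request grows the heap by 132 KB rather than by the requested size can be sketched as follows. This is a model under stated assumptions, not glibc's code: `brk_growth` is a hypothetical function, TOP_PAD assumes the 128 KB default documented for M_TOP_PAD in `man mallopt`, and MINSIZE assumes the 32-byte minimum chunk size of 64-bit glibc.

```python
PAGE_SIZE = 4096        # assumption: 4 KB pages
TOP_PAD = 128 * 1024    # assumption: default M_TOP_PAD (see `man mallopt`)
MINSIZE = 32            # assumption: minimum chunk size on 64-bit glibc


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def brk_growth(chunk_size: int, old_top_size: int = 0) -> int:
    """Bytes requested via brk() when the heap top cannot satisfy chunk_size (simplified)."""
    return align_up(chunk_size + TOP_PAD + MINSIZE - old_top_size, PAGE_SIZE)


print(hex(brk_growth(1024)))   # a 1 KB chunk still grows the heap by 0x21000
```

Under these assumptions, any chunk of up to about 4 KB requested on an empty heap rounds to the same 0x21000 (132 KB) growth, so many later small allocations are served without another brk() call.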
30. malloc.c
Heap is expanded for 0x21000 (33 pages): 0x555555559000 -> 0x55555557a000
glibc: malloc implementation for memory request size
Detail Reference
• [glibc] malloc internals
o Concept: Chunk, arenas, heaps, and thread local cache (tcache)
31. vmalloc: Non-contiguous memory allocation
• 64-bit Virtual Address in x86_64
• Call path
• vmap_area & guard page
• Example: vmalloc size = 8MB
o Kernel data structure
o qemu + gdb observation
• vmalloc users/scenario
32. Kernel Space
0x0000_7FFF_FFFF_FFFF
0xFFFF_8000_0000_0000
128TB
Page frame direct
mapping (64TB)
page_offset_base
64-bit Virtual Address
Kernel Virtual Address
0
0xFFFF_FFFF_FFFF_FFFF
Guard hole (8TB)
LDT remap for PTI (0.5TB)
Unused hole (0.5TB)
vmalloc/ioremap (32TB)
vmalloc_base
Unused hole (1TB)
Virtual memory map – 1TB
(store page frame descriptor)
…
vmemmap_base
page_offset_base = 0xFFFF_8880_0000_0000
vmalloc_base = 0xFFFF_C900_0000_0000
vmemmap_base = 0xFFFF_EA00_0000_0000
* Can be dynamically configured by KASLR (Kernel Address Space Layout Randomization - "arch/x86/mm/kaslr.c")
Default Configuration
Kernel text mapping from
physical address 0
Kernel code [.text, .data…]
Modules
__START_KERNEL_map = 0xFFFF_FFFF_8000_0000
__START_KERNEL = 0xFFFF_FFFF_8100_0000
MODULES_VADDR
0xFFFF_8000_0000_0000
Empty Space
User Space
128TB
1GB or 512MB
1GB or 1.5GB Fix-mapped address space
(Expanded to 4MB: 05ab1d8a4b36) FIXADDR_START
Unused hole (2MB)
VMALLOC_START = 0xFFFF_C900_0000_0000
VMALLOC_END = 0xFFFF_E8FF_FFFF_FFFF
FIXADDR_TOP = 0xFFFF_FFFF_FF7F_F000
Reference: Documentation/x86/x86_64/mm.rst
64-bit Virtual Address in x86_64
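The region sizes in the layout can be cross-checked from the default base addresses listed on the slide. The sketch below only does the arithmetic; the constant names mirror the slide, while `size_tb` is a hypothetical helper, and the values are the defaults (KASLR can move the bases).

```python
TB = 1 << 40

USER_END = 0x0000_7FFF_FFFF_FFFF
KERNEL_START = 0xFFFF_8000_0000_0000
PAGE_OFFSET_BASE = 0xFFFF_8880_0000_0000
VMALLOC_START = 0xFFFF_C900_0000_0000
VMALLOC_END = 0xFFFF_E8FF_FFFF_FFFF
VMEMMAP_BASE = 0xFFFF_EA00_0000_0000


def size_tb(start: int, end_inclusive: int) -> float:
    """Size in TB of the inclusive address range [start, end_inclusive]."""
    return (end_inclusive - start + 1) / TB


print(size_tb(0, USER_END))                       # user space
print(size_tb(VMALLOC_START, VMALLOC_END))        # vmalloc/ioremap
print((PAGE_OFFSET_BASE - KERNEL_START) / TB)     # guard hole + LDT remap
print((VMALLOC_START - PAGE_OFFSET_BASE) / TB)    # direct mapping + unused hole
print((VMEMMAP_BASE - (VMALLOC_END + 1)) / TB)    # unused hole before vmemmap
```

The results line up with the figure: 128 TB of user space, 32 TB for vmalloc/ioremap, 8 TB + 0.5 TB below page_offset_base, 64 TB + 0.5 TB between page_offset_base and vmalloc_base, and a 1 TB hole before vmemmap_base.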
33. vmalloc – call path
vmalloc
  __vmalloc_node
    __vmalloc_node_range (Range: VMALLOC_START-VMALLOC_END)
      __get_vm_area_node
        kzalloc_node: Allocate a vm_struct from kmalloc (slub allocator)
        alloc_vmap_area
          1. Allocate a vmap_area struct from kmem_cache (slub allocator)
          2. Get virtual address from vmalloc RB-tree (vmap_area RB-tree)
        setup_vmalloc_vm
      __vmalloc_area_node
        Memory allocation for storing pointers of page descriptors: area->pages[]
        for (i = 0; i < area->nr_pages; i++)
          page = alloc_page(gfp_mask)
          area->pages[i] = page
        map_kernel_range: page table population
Page table is populated immediately upon the request: No page fault
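The core idea of the call path, physically scattered pages mapped into one contiguous virtual range up front, can be shown with a toy model. Everything here is illustrative: `vmalloc_model`, the frame numbers, and the dictionary standing in for the page table are hypothetical and do not correspond to kernel structures.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def vmalloc_model(size: int, va_start: int, free_frames: list):
    """Toy model: take one frame per page and map each at consecutive virtual addresses."""
    nr_pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
    # One alloc_page()-style allocation per page; frames need not be adjacent.
    pages = [free_frames.pop() for _ in range(nr_pages)]
    # Populate the mapping immediately, so no page fault is needed later.
    page_table = {va_start + i * PAGE_SIZE: pfn for i, pfn in enumerate(pages)}
    return pages, page_table


frames = [7, 3, 42, 19]  # hypothetical, non-contiguous page frame numbers
pages, table = vmalloc_model(3 * PAGE_SIZE, 0xFFFFC90001A4D000, frames)
for va, pfn in table.items():
    print(hex(va), "->", pfn)
```

The virtual addresses advance by exactly one page per entry while the frame numbers do not, which is the property that distinguishes vmalloc from kmalloc-style physically contiguous allocation.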
36. Example: vmalloc size = 8MB: alloc_vmap_area()
vmap_area
va_start = 0xffffc90001a4d000
va_end = 0xffffc9000224e000
rb_node
list
subtree_max_size
vm
union
__get_vm_area_node
Allocate a vm_struct from kmalloc (slub allocator)
__vmalloc_node_range kzalloc_node
setup_vmalloc_vm
alloc_vmap_area
Allocate a vmap_area struct from
kmem_cache (slub allocator)
__vmalloc_area_node
Get virtual address from vmalloc RB-tree
(vmap_area RB-tree)
find_vmap_lowest_match(): Get a VA from RB-tree
insert_vmap_area()
free_vmap_area_root: init by vmalloc_init()
vmap_area_root
list_head: vmap_area_list vmap_area vmap_area vmap_area
vmalloc: 8MB
vmalloc-test.ko
vmalloc subsystem
buddy system
alloc_pages()
Example
37. Example: vmalloc size = 8MB: setup_vmalloc_vm()
vmap_area
va_start = 0xffffc90001a4d000
va_end = 0xffffc9000224e000
rb_node
list
subtree_max_size
vm
union
__get_vm_area_node
Allocate a vm_struct from kmalloc (slub allocator)
__vmalloc_node_range kzalloc_node
setup_vmalloc_vm
alloc_vmap_area
Allocate a vmap_area struct from
kmem_cache (slub allocator)
__vmalloc_area_node
Get virtual address from vmalloc RB-tree
(vmap_area RB-tree)
find_vmap_lowest_match(): Get a VA from RB-tree
insert_vmap_area()
free_vmap_area_root: init by vmalloc_init()
vmap_area_root
list_head: vmap_area_list vmap_area vmap_area vmap_area
vmalloc: 8MB
vmalloc-test.ko
vmalloc subsystem
buddy system
alloc_pages()
Example
vm_struct
next
addr = 0xffffc90001a4d000
size = 0x801000 (w/ guard page)
flags = 0x22
**pages = NULL
nr_pages = 0
phys_addr
caller
38. Example: vmalloc size = 8MB: __vmalloc_area_node()
vmap_area
va_start = 0xffffc90001a4d000
va_end = 0xffffc9000224e000
rb_node
list
subtree_max_size
vm
union
__get_vm_area_node
__vmalloc_node_range
__vmalloc_area_node
find_vmap_lowest_match(): Get a VA from RB-tree
free_vmap_area_root: init by vmalloc_init()
vmap_area_root
list_head: vmap_area_list vmap_area vmap_area vmap_area
vmalloc: 8MB
vmalloc-test.ko
vmalloc subsystem
buddy system
alloc_pages()
Example
vm_struct
next
addr = 0xffffc90001a4d000
size = 0x801000 (w/ guard page)
flags = 0x22
**pages = 0xffffc900019b9000
nr_pages = 0x800 (2048)
phys_addr
caller
Memory allocation for storing pointers
of page descriptors: area->pages[]
area->pages[i] = page
page = alloc_page(gfp_mask)
for (i = 0; i < area->nr_pages; i++)
page table population
map_kernel_range
Page
Descriptor
Page
Descriptor
...
Memory allocation for page descriptor pointers
• size: (8MB / 4KB) pages * 8 bytes per pointer = 16384 bytes
• Allocated from vmalloc (> 4KB) or kmalloc (<= 4KB)
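The numbers in the 8MB example can be verified directly from the values shown in vmap_area and vm_struct. This is only an arithmetic check; the variable names are chosen for illustration, and 4 KB pages with 8-byte pointers are assumed.

```python
PAGE_SIZE = 4096      # assumption: 4 KB pages
POINTER_SIZE = 8      # assumption: 64-bit pointers

request = 8 * 1024 * 1024
nr_pages = request // PAGE_SIZE                # pages backing the allocation
area_size = request + PAGE_SIZE                # one extra guard page
pages_array_bytes = nr_pages * POINTER_SIZE    # size of area->pages[]

va_start = 0xFFFFC90001A4D000
va_end = 0xFFFFC9000224E000

print(hex(nr_pages), hex(area_size), pages_array_bytes)
print(hex(va_end - va_start))
print("vmalloc" if pages_array_bytes > PAGE_SIZE else "kmalloc")
```

The checks agree with the slide: nr_pages is 0x800, the area size with the guard page is 0x801000 (which equals va_end - va_start), and the 16384-byte pointer array exceeds 4 KB, so it comes from vmalloc itself.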