Introduction to Unix, Part 2




CS 167                III–1   Copyright © 2006 Thomas W. Doeppner. All rights reserved.




File-System I/O

                    • Concerns
                      – getting data to and from the device
                          - disk architecture
                      – caching in primary storage
                      – simultaneous I/O and computation








  To understand how the file system operates, we certainly need to know a bit about disk
architectures and how we can most effectively utilize them. We cover this at length later on
in the course, but for now we’ll discuss the general concerns. What’s also important is how
the contents of files are cached in primary storage, and how this cache is used so that I/O
and computation can proceed at the same time.




Disk Architecture
                                  Sector


                                                                             Track




         Disk heads
         (on top and bottom
         of each platter)                                       Cylinder

The Buffer Cache


                            Buffer

                       User Process




                        Buffer Cache





  File I/O in Unix, and in most operating systems, is not done directly to the disk drive, but
through an intermediary, the buffer cache.
  The buffer cache has two primary functions. The first, and most important, is to make
possible concurrent I/O and computation within a Unix process. The second is to insulate
the user from physical block boundaries.
  From a user thread’s point of view, I/O is synchronous. By this we mean that when the I/O
system call returns, the system no longer needs the user-supplied buffer. For example, after
a write system call, the data in the user buffer has either been transmitted to the device or
copied to a kernel buffer—the user can now scribble over the buffer without affecting the
data transfer. Because of this synchronization, from a user thread’s point of view, no more
than one I/O operation can be in progress at a time. Thus user-implemented multibuffered
I/O is not possible (in a single-threaded process).
  The buffer cache provides a kernel implementation of multibuffered I/O, and thus
concurrent I/O and computation are possible even for single-threaded processes.
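The synchronization guarantee described above can be demonstrated directly. The sketch below is our own illustration, not part of the lecture: the file name /tmp/sync_demo and the demo function are hypothetical. It writes a buffer, immediately scribbles over it, and checks that the file still holds the original data, since write returned only after the kernel had copied (or transmitted) the bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a buffer, then immediately reuse it: because write() is
   synchronous from the caller's point of view, the kernel no longer
   needs the user buffer once the call returns. */
int demo(void) {
    char buf[6];
    int fd = open("/tmp/sync_demo", O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0) return -1;

    memcpy(buf, "first", 6);
    if (write(fd, buf, 6) != 6) return -1;

    memcpy(buf, "xxxxx", 6);          /* scribble over the user buffer */

    char check[6];
    if (pread(fd, check, 6, 0) != 6) return -1;
    close(fd);
    unlink("/tmp/sync_demo");
    return strcmp(check, "first");    /* 0: file still holds "first" */
}
```

Note that the same guarantee is what rules out user-level multibuffering: the process cannot start a second write on the same buffer while the first is still in flight.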




Multi-Buffered I/O


                    Process

                      read( … )



                                           i-1           i                      i+1
                                       previous     current                probable
                                        block        block                next block








  The use of read-aheads and write-behinds makes simultaneous I/O and computation
possible: if the block currently being fetched is block i and the previous block fetched
was block i-1, then block i+1 is fetched as well (read-ahead). Modified blocks are
normally not written out synchronously; instead they are written back asynchronously,
sometime after they were modified (write-behind).
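A single-threaded process cannot implement read-ahead itself, but on POSIX systems it can hint that the kernel should. A minimal sketch, assuming a POSIX system that provides posix_fadvise; the function name advise_sequential is ours, and the kernel is free to ignore the advice:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hint that the file will be read sequentially, which typically
   enlarges the kernel's read-ahead window for this descriptor. */
int advise_sequential(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    close(fd);
    return rc;   /* 0 on success */
}
```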




Maintaining the Cache

                                                            buffer requests


                              Aged        probably free buffers

                                                            returns of no-longer-
                                                            active buffers
                     oldest


                              LRU         probably active buffers

                                                            returns of active
                 youngest
                                                            buffers





   In the typical Unix system, active buffers are maintained in least-recently-used (LRU) order
in the system-wide LRU list. Thus after a buffer has been used (as part of a read or write
system call), it is returned to the end of the LRU list. The system also maintains a separate
list of “free” buffers called the aged list. Included in this list are buffers holding no-longer-
needed blocks, such as blocks from files that have been deleted.
   Fresh buffers are taken from the aged list. If this list is empty, then a buffer is obtained
from the LRU list as follows. If the first buffer (least recently used) in this list is clean (i.e.,
contains a block that is identical to its copy on disk), then this buffer is taken. Otherwise
(i.e., if the buffer is dirty), it is written out to disk asynchronously and, when written, is
placed at the end of the aged list. The search for a fresh buffer continues on to the next
buffer in the LRU list, etc.
   When a file is deleted, any buffers containing its blocks are placed at the head of the aged
list. Also, when I/O into a buffer results in an I/O error, the buffer is placed at the head of
the aged list.
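The allocation policy just described can be sketched in a few lines. Everything below is our own simplification, not kernel code: the list operations are naive, and the asynchronous write-back of a dirty buffer is modeled as an instantaneous "cleaning" followed by a move to the aged list.

```c
#include <assert.h>
#include <stddef.h>

struct buf {
    int dirty;                /* 1 if block differs from its disk copy */
    struct buf *next;
};

struct buf *aged_head, *lru_head;   /* lru_head = least recently used */

static struct buf *pop(struct buf **head) {
    struct buf *b = *head;
    if (b) *head = b->next;
    return b;
}

static void push(struct buf **head, struct buf *b) {  /* append at end */
    b->next = NULL;
    while (*head) head = &(*head)->next;
    *head = b;
}

/* Take a buffer from the aged list if possible; otherwise scan the LRU
   list from its least-recently-used end, moving dirty buffers to the
   aged list (after "writing" them), until a clean one is found. */
struct buf *get_fresh_buffer(void) {
    struct buf *b = pop(&aged_head);
    if (b) return b;
    while ((b = pop(&lru_head)) != NULL) {
        if (!b->dirty) return b;
        b->dirty = 0;              /* stand-in for an async write-back */
        push(&aged_head, b);       /* reusable once written to disk */
    }
    return pop(&aged_head);        /* fall back to a cleaned buffer */
}

/* Exercise the policy: a dirty buffer is skipped (and cleaned), a
   clean one is taken first. */
int demo(void) {
    static struct buf a = {1, NULL}, c = {0, NULL};
    push(&lru_head, &a);
    push(&lru_head, &c);
    if (get_fresh_buffer() != &c) return -1;  /* dirty a skipped */
    if (get_fresh_buffer() != &a) return -1;  /* cleaned a reused */
    return 0;
}
```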




Address Translation
                     Virtual
                     Memory
                                                                          Real
                                                                         Memory




                                                                                                 page
           pages                                                                                 frames


                                          address map








  References to virtual memory are mapped to real memory. In this lecture we assume that
address translation is done via paging. We use the terminology that virtual memory is divided
into fixed-size pieces called pages, while real memory is divided into identically sized pieces
called page frames.
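Translation itself is simple arithmetic: split the virtual address into a page number and an offset within the page, then substitute the page-frame number for the page number. A toy sketch, assuming 4 KB pages and a flat page-to-frame array (both assumptions are ours, for illustration only):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

uint32_t page_number(uint32_t va) { return va / PAGE_SIZE; }
uint32_t page_offset(uint32_t va) { return va % PAGE_SIZE; }

/* Translate a virtual address using a toy address map (page -> frame):
   the page number is replaced, the offset is carried over unchanged. */
uint32_t translate(uint32_t va, const uint32_t *map) {
    return map[page_number(va)] * PAGE_SIZE + page_offset(va);
}
```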




Multiprocess
                            Address Translation
                         Process 1’s
           Process 2’s     Virtual
             Virtual       Memory
             Memory                                                   Real
                                                                     Memory








  Multiple processes compete for real memory. While many page frames appear in only one
process’s address space, others are shared by multiple processes.




Can we reimplement read and write so
         that data isn’t copied from the kernel
         buffer cache to the user buffer, but
         remapped to it?




No …

                       (But we can do something almost as good.)








  Why not? It’s because user buffers aren’t necessarily either multiples of a page size in
length or aligned on page boundaries.
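That constraint is easy to state in code. The following sketch (the function name remappable is ours) is the test a kernel would have to apply before it could hand a user buffer over by remapping rather than copying; an arbitrary read buffer usually fails both conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* A user buffer could only be shared with the kernel by remapping if
   it starts on a page boundary and spans whole pages. */
int remappable(const void *buf, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    return ((uintptr_t)buf % page == 0) && (len % page == 0);
}
```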




Mapped Files

                    • Traditional File I/O
                       char buf[BigEnough];
                       fd = open(file, O_RDWR);
                       for (i=0; i<n_recs; i++) {
                          read(fd, buf, sizeof(buf));
                          use(buf);
                       }
                    • Mapped File I/O
                       char *MappedFile;
                       fd = open(file, O_RDWR);
                       MappedFile = mmap(... , fd, ...);
                       for (i=0; i<n_recs; i++)
                          use(MappedFile[i]);





  Traditional I/O involves explicit calls to read and write, which in turn means that data is
accessed via a buffer; in fact, two buffers are usually employed: data is transferred
between a user buffer and a kernel buffer, and between the kernel buffer and the I/O
device.
  An alternative approach is to map a file into a process’s address space: the file provides
the data for a portion of the address space and the kernel’s virtual-memory system is
responsible for the I/O. A major benefit of this approach is that data is transferred directly
from the device to where the user needs it; there is no need for an extra system buffer.




Traditional File I/O


         user buffer



                           kernel buffer



                                                                          Disk


Mapped File I/O


virtual memory


                    address map



                            real memory



                                                                     Disk

Mmap System Call
                     void *mmap(
                       void *addr,
                         // where to map file (0 if don’t care)
                       size_t len,
                         // how much to map
                       int prot,
                         // memory protection (read, write, exec.)
                       int flags,
                         // shared vs. private, plus more
                       int fd,
                         // which file
                       off_t off
                         // starting from where
                       )






   Mmap maps the file given by fd, starting at position off, for len bytes, into the caller’s
address space starting at location addr
         len is rounded up to a multiple of the page size
         off must be page-aligned
         if addr is zero, the kernel assigns an address
         if addr is nonzero, it is a suggestion to the kernel as to where the mapped file
         should be located (the kernel usually rounds it to a page boundary). However, if
         flags includes MAP_FIXED, then the file is mapped at exactly addr (and if addr is
         not page-aligned or otherwise unusable, the call fails)
         the call returns the address of the beginning of the mapped file
   The flags argument must include either MAP_SHARED or MAP_PRIVATE (but not both). If
it’s MAP_SHARED, then the mapped portion of the caller’s address space contains the
current contents of the file; when the mapped portion of the address space is modified by the
process, the corresponding portion of the file is modified.
   However, if flags includes MAP_PRIVATE, then the idea is that the mapped portion of the
address space is initialized with the contents of the file, but that changes made to the
mapped portion of the address space by the process are private and not written back to the
file. The details are a bit complicated: as long as the mapping process does not modify any of
the mapped portion of the address space, the pages contained in it contain the current
contents of the corresponding pages of the file. However, if the process modifies a page, then
that particular page no longer contains the current contents of the corresponding file page,
but contains whatever modifications are made to it by the process. These changes are not
written back to the file and not shared with any other process that has mapped the file. It’s
unspecified what the situation is for other pages in the mapped region after one of them is
modified. Depending on the implementation, they might continue to contain the current
contents of the corresponding pages of the file until they, themselves, are modified. Or they
might also be treated as if they’d just been written to and thus no longer be shared with
others.
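The MAP_PRIVATE behavior can be observed directly: modify a privately mapped page and confirm that the file itself is untouched. The sketch below is our own; the file name /tmp/private_demo and the demo function are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Store into a MAP_PRIVATE mapping; the change stays private to this
   process and is not written back to the file. */
int demo(void) {
    const char *path = "/tmp/private_demo";
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (write(fd, "hello", 5) != 5) return -1;

    char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) return -1;
    p[0] = 'H';                        /* private copy-on-write change */
    munmap(p, 5);

    char check[5];
    if (pread(fd, check, 5, 0) != 5) return -1;
    close(fd);
    unlink(path);
    return memcmp(check, "hello", 5);  /* 0: file is unchanged */
}
```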



Share-Mapped File I/O


virtual memory


                     address map



                             real memory
    virtual memory


                                                                      Disk

Example
            int main( ) {
              int fd;
              char* buffer;
              off_t size;

              fd = open("file", O_RDWR);
              size = lseek(fd, 0, SEEK_END);
              buffer = (char *)mmap(0, size,
                  PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
              if (buffer == MAP_FAILED) {
                perror("mmap");
                exit(1);
              }

                // buffer points to region of memory containing the
                // contents of the file

                ...

            }




  Here we map the entire contents of a file into the caller’s address space, allowing it both
read and write access. Note that mapping the file into memory does not cause any immediate
I/O to take place; the operating system performs the I/O when necessary, according to its
own rules.
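With a MAP_SHARED mapping the operating system also writes modified pages back to the file on its own schedule; a process that needs them on disk sooner can call msync. A sketch of that pattern (the file name /tmp/shared_demo and the demo function are ours):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Store into a MAP_SHARED mapping, then use msync() to push the
   modified page to the file now rather than at the kernel's leisure. */
int demo(void) {
    const char *path = "/tmp/shared_demo";
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (write(fd, "hello", 5) != 5) return -1;

    char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return -1;
    p[0] = 'H';                               /* updates the file's data */
    if (msync(p, 5, MS_SYNC) != 0) return -1;
    munmap(p, 5);

    char check[5];
    if (pread(fd, check, 5, 0) != 5) return -1;
    close(fd);
    unlink(path);
    return memcmp(check, "Hello", 5);         /* 0: file was updated */
}
```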




What if you do I/O both traditionally and
     via mmap to the same file simultaneously?




Integrated VM and Buffer Cache
                          Process 1’s
            Process 2’s     Virtual
                                          Kernel’s
              Virtual       Memory                                                               Disk
                                        Buffer Cache
              Memory
                                                                                Real
                                                                               Memory








  To integrate virtual memory (and hence I/O via mmap) with traditional I/O, we must make
certain that there is at most one copy of any page from a file in primary storage. This copy (in
a page frame) is mapped both into the buffer cache and into all processes that have mmaped
that page of the file.





More Related Content

What's hot

Board support package_on_linux
Board support package_on_linuxBoard support package_on_linux
Board support package_on_linuxVandana Salve
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecturePiyush Mittal
 
FUSE (Filesystem in Userspace) on OpenSolaris
FUSE (Filesystem in Userspace) on OpenSolarisFUSE (Filesystem in Userspace) on OpenSolaris
FUSE (Filesystem in Userspace) on OpenSolariselliando dias
 
Lec10. Memory and storage
Lec10.      Memory    and      storageLec10.      Memory    and      storage
Lec10. Memory and storageAnzaDar3
 
Unit 1 four part pocessor and memory
Unit 1 four part pocessor and memoryUnit 1 four part pocessor and memory
Unit 1 four part pocessor and memoryNeha Kurale
 
Boot process -test
Boot process -testBoot process -test
Boot process -testHari Shankar
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiAnkit Raj
 
Computer Hardware
Computer HardwareComputer Hardware
Computer HardwareDeepa Rani
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom BoardPatrick Bellasi
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its TypesNimrah Shahbaz
 
Ov psim demo_slides_power_pc
Ov psim demo_slides_power_pcOv psim demo_slides_power_pc
Ov psim demo_slides_power_pcsimon56
 

What's hot (20)

Board support package_on_linux
Board support package_on_linuxBoard support package_on_linux
Board support package_on_linux
 
Multi core-architecture
Multi core-architectureMulti core-architecture
Multi core-architecture
 
Multi processor
Multi processorMulti processor
Multi processor
 
linux kernel overview 2013
linux kernel overview 2013linux kernel overview 2013
linux kernel overview 2013
 
Linux kernel
Linux kernelLinux kernel
Linux kernel
 
FUSE (Filesystem in Userspace) on OpenSolaris
FUSE (Filesystem in Userspace) on OpenSolarisFUSE (Filesystem in Userspace) on OpenSolaris
FUSE (Filesystem in Userspace) on OpenSolaris
 
Lec10. Memory and storage
Lec10.      Memory    and      storageLec10.      Memory    and      storage
Lec10. Memory and storage
 
Linux Booting Steps
Linux Booting StepsLinux Booting Steps
Linux Booting Steps
 
Windows xp
Windows xpWindows xp
Windows xp
 
Unit 1 four part pocessor and memory
Unit 1 four part pocessor and memoryUnit 1 four part pocessor and memory
Unit 1 four part pocessor and memory
 
virtual memory - Computer operating system
virtual memory - Computer operating systemvirtual memory - Computer operating system
virtual memory - Computer operating system
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Boot process -test
Boot process -testBoot process -test
Boot process -test
 
Zos1.13 migration
Zos1.13 migrationZos1.13 migration
Zos1.13 migration
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
 
Computer Hardware
Computer HardwareComputer Hardware
Computer Hardware
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
 
Multicore computers
Multicore computersMulticore computers
Multicore computers
 
Processors and its Types
Processors and its TypesProcessors and its Types
Processors and its Types
 
Ov psim demo_slides_power_pc
Ov psim demo_slides_power_pcOv psim demo_slides_power_pc
Ov psim demo_slides_power_pc
 

Similar to 03unixintro2

Similar to 03unixintro2 (20)

02unixintro
02unixintro02unixintro
02unixintro
 
Buffer cache unix ppt Mrs.Sowmya Jyothi
Buffer cache unix ppt Mrs.Sowmya JyothiBuffer cache unix ppt Mrs.Sowmya Jyothi
Buffer cache unix ppt Mrs.Sowmya Jyothi
 
File Management in Operating Systems
File Management in Operating SystemsFile Management in Operating Systems
File Management in Operating Systems
 
Spring 2013 coordinated ims and db2 recovery
Spring 2013 coordinated ims and db2 recoverySpring 2013 coordinated ims and db2 recovery
Spring 2013 coordinated ims and db2 recovery
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
 
Unixfs
UnixfsUnixfs
Unixfs
 
File
FileFile
File
 
PythonFuse (PyCon4)
PythonFuse (PyCon4)PythonFuse (PyCon4)
PythonFuse (PyCon4)
 
Os organization
Os organizationOs organization
Os organization
 
Chapter - 1
Chapter - 1Chapter - 1
Chapter - 1
 
ch17
ch17ch17
ch17
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Introduction To operating System:
Introduction To operating System:Introduction To operating System:
Introduction To operating System:
 
Wish list from PostgreSQL - Linux Kernel Summit 2009
Wish list from PostgreSQL - Linux Kernel Summit 2009Wish list from PostgreSQL - Linux Kernel Summit 2009
Wish list from PostgreSQL - Linux Kernel Summit 2009
 
First compailer written
First compailer writtenFirst compailer written
First compailer written
 
Unix case-study
Unix case-studyUnix case-study
Unix case-study
 
Lecture1 Introduction
Lecture1  IntroductionLecture1  Introduction
Lecture1 Introduction
 
Unix Operating System
Unix Operating SystemUnix Operating System
Unix Operating System
 
P05-slides
P05-slidesP05-slides
P05-slides
 
Vmreport
VmreportVmreport
Vmreport
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

03unixintro2

  • 1. Introduction to Unix, Part 2 CS 167 III–1 Copyright © 2006 Thomas W. Doeppner. All rights reserved. III–1
  • 2. File-System I/O • Concerns – getting data to and from the device - disk architecture – caching in primary storage – simultaneous I/O and computation CS 167 III–2 Copyright © 2006 Thomas W. Doeppner. All rights reserved. To understand how the file system operates, we certainly need to know a bit about disk architectures and how we can most effectively utilize them. We cover this at length later on in the course, but for now we’ll discuss the general concerns. What’s also important is how the contents of files are cached in primary storage, and how this cache is used so that I/O and computation can proceed at the same time. III–2
  • 3. Disk Architecture Sector Track Disk heads (on top and bottom of each platter) Cylinder CS 167 III–3 Copyright © 2006 Thomas W. Doeppner. All rights reserved. III–3
  • 4. The Buffer Cache. [Diagram: a user process's buffer connected to the kernel's buffer cache.]
  Notes: File I/O in Unix, and in most operating systems, is not done directly to the disk drive but through an intermediary, the buffer cache. The buffer cache has two primary functions. The first, and most important, is to make concurrent I/O and computation possible within a Unix process. The second is to insulate the user from physical block boundaries. From a user thread's point of view, I/O is synchronous: when the I/O system call returns, the system no longer needs the user-supplied buffer. For example, after a write system call, the data in the user buffer has either been transmitted to the device or copied to a kernel buffer; the user can then scribble over the buffer without affecting the data transfer. Because of this synchronization, from a user thread's point of view no more than one I/O operation can be in progress at a time, so user-implemented multibuffered I/O is not possible in a single-threaded process. The buffer cache provides a kernel implementation of multibuffered I/O, making concurrent I/O and computation possible even for single-threaded processes.
  • 5. Multi-Buffered I/O. [Diagram: a process calls read(); block i is the current block, block i-1 the previous block, and block i+1 the probable next block.]
  Notes: The use of read-aheads and write-behinds makes simultaneous I/O and computation possible: if the block currently being fetched is block i and the previous block fetched was block i-1, then block i+1 is also fetched. Modified blocks are normally written out not synchronously, but asynchronously, sometime after they were modified.
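  The read-ahead policy just described can be sketched at user level. This is an illustration only, not kernel code: the names readahead_candidate and last_block are hypothetical, and a real kernel keeps such state per open file and issues the prefetch as an asynchronous read into a cache buffer.

```c
#include <assert.h>

/* Sketch of the sequential read-ahead decision: if the block just
 * requested immediately follows the previously requested one, the
 * probable next block is worth prefetching. */
static long last_block = -1;   /* block number of the previous fetch */

/* Returns the block to prefetch asynchronously, or -1 if the access
 * pattern does not look sequential. */
long readahead_candidate(long requested_block)
{
    long prefetch = -1;
    if (requested_block == last_block + 1)
        prefetch = requested_block + 1;   /* probable next block */
    last_block = requested_block;
    return prefetch;
}
```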
  • 6. Maintaining the Cache. [Diagram: buffer requests are satisfied from an aged list of probably free buffers and, failing that, from an LRU list of probably active buffers ordered oldest to youngest; no-longer-active buffers are returned to the aged list, active buffers to the end of the LRU list.]
  Notes: In the typical Unix system, active buffers are maintained in least-recently-used (LRU) order in the system-wide LRU list: after a buffer has been used (as part of a read or write system call), it is returned to the end of the LRU list. The system also maintains a separate list of "free" buffers called the aged list; it includes buffers holding no-longer-needed blocks, such as blocks from files that have been deleted. Fresh buffers are taken from the aged list. If that list is empty, a buffer is obtained from the LRU list as follows. If the first (least recently used) buffer in the list is clean (i.e., it contains a block identical to its copy on disk), that buffer is taken. Otherwise (i.e., if the buffer is dirty), it is written out to disk asynchronously and, when written, placed at the end of the aged list; the search for a fresh buffer then continues with the next buffer in the LRU list, and so on. When a file is deleted, any buffers containing its blocks are placed at the head of the aged list. Likewise, when I/O into a buffer results in an I/O error, the buffer is placed at the head of the aged list.
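  The buffer-acquisition policy above can be modeled compactly. The sketch below is a simplification under stated assumptions: the aged list is an array used as a stack, the LRU list an array ordered oldest first, and the asynchronous write-out of a dirty buffer is modeled by simply clearing its dirty bit; the names (struct buffer, get_fresh_buffer) are hypothetical, not from any real kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer {
    int  block;   /* disk block cached here */
    bool dirty;   /* differs from its copy on disk? */
};

/* Pick a fresh buffer: prefer the aged (free) list; otherwise scan the
 * LRU list from oldest to youngest, taking the first clean buffer and
 * scheduling dirty ones for (modeled) asynchronous write-out. */
int get_fresh_buffer(struct buffer bufs[],
                     int aged[], size_t *n_aged,
                     const int lru[], size_t n_lru)
{
    if (*n_aged > 0)
        return aged[--*n_aged];       /* aged list non-empty: take one */

    for (size_t i = 0; i < n_lru; i++) {
        int b = lru[i];               /* lru[0] is least recently used */
        if (!bufs[b].dirty)
            return b;                 /* first clean buffer wins */
        bufs[b].dirty = false;        /* "write it out" asynchronously */
    }
    return -1;                        /* all dirty: caller must wait */
}
```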
  • 7. Address Translation. [Diagram: pages of virtual memory mapped through an address map to page frames of real memory.]
  Notes: References to virtual memory are mapped to real memory. In this lecture we assume that address translation is done via paging. We use the terminology that virtual memory is divided into fixed-size pieces called pages, while real memory is divided into identically sized pieces called page frames.
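  Paging translation is just arithmetic on the page number. A minimal sketch, assuming a single-level page table and a 4 KB page size (both assumptions for illustration; translate is a hypothetical helper, and real hardware does this in the MMU):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size, for illustration */

/* Map a virtual address to a real (physical) address, given a page
 * table that maps virtual page numbers to page-frame numbers. */
uint32_t translate(uint32_t vaddr, const uint32_t page_table[])
{
    uint32_t page   = vaddr / PAGE_SIZE;   /* which virtual page   */
    uint32_t offset = vaddr % PAGE_SIZE;   /* offset within a page */
    return page_table[page] * PAGE_SIZE + offset;
}
```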
  • 8. Multiprocess Address Translation. [Diagram: process 1's and process 2's virtual memories both mapped into real memory.]
  Notes: Multiple processes compete for real memory. While many page frames appear in only one process, others are shared by multiple processes.
  • 9. Can we reimplement read and write so that data isn't copied from the kernel buffer cache to the user buffer, but remapped to it?
  • 10. No ... (But we can do something almost as good.)
  Notes: Why not? Because user buffers aren't necessarily either a multiple of the page size in length or aligned on page boundaries.
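  The alignment constraint is easy to state in code. A sketch: remappable is a hypothetical predicate, and a user program would obtain the page size via, e.g., sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A user buffer could be remapped (rather than copied into) only if it
 * starts on a page boundary and covers a whole number of pages; typical
 * read/write buffers satisfy neither condition. */
bool remappable(const void *buf, size_t len, size_t page_size)
{
    return (uintptr_t)buf % page_size == 0 && len % page_size == 0;
}
```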
  • 11. Mapped Files.
  Traditional file I/O:

      char buf[BigEnough];
      fd = open(file, O_RDWR);
      for (i = 0; i < n_recs; i++) {
          read(fd, buf, sizeof(buf));
          use(buf);
      }

  Mapped file I/O:

      char *MappedFile;
      fd = open(file, O_RDWR);
      MappedFile = mmap(... , fd, ...);
      for (i = 0; i < n_recs; i++)
          use(MappedFile[i]);

  Notes: Traditional I/O involves explicit calls to read and write, which means data is accessed via a buffer; in fact, two buffers are usually employed: data is transferred between a user buffer and a kernel buffer, and between the kernel buffer and the I/O device. An alternative approach is to map a file into a process's address space: the file provides the data for a portion of the address space, and the kernel's virtual-memory system is responsible for the I/O. A major benefit of this approach is that data is transferred directly from the device to where the user needs it; there is no need for an extra system buffer.
  • 12. Traditional File I/O. [Diagram: data flows between a user buffer and a kernel buffer, and between the kernel buffer and the disk.]
  • 13. Mapped File I/O. [Diagram: virtual memory mapped through the address map to real memory, which exchanges data directly with the disk.]
  • 14. Mmap System Call.

      void *mmap(
          void   *addr,   // where to map file (0 if don't care)
          size_t  len,    // how much to map
          int     prot,   // memory protection (read, write, exec.)
          int     flags,  // shared vs. private, plus more
          int     fd,     // which file
          off_t   off     // starting from where
      );

  Notes: Mmap maps the file given by fd, starting at position off, for len bytes, into the caller's address space starting at location addr. Len is rounded up to a multiple of the page size, and off must be page-aligned. If addr is zero, the kernel assigns an address; if addr is positive, it is a suggestion to the kernel as to where the mapped file should be located (it will usually be aligned to a page). However, if flags includes MAP_FIXED, addr is not modified by the kernel (and if its value is not reasonable, the call fails). The call returns the address of the beginning of the mapped file.
  The flags argument must include either MAP_SHARED or MAP_PRIVATE (but not both). With MAP_SHARED, the mapped portion of the caller's address space contains the current contents of the file; when the mapped portion of the address space is modified by the process, the corresponding portion of the file is modified. With MAP_PRIVATE, the idea is that the mapped portion of the address space is initialized with the contents of the file, but changes made to it by the process are private and not written back to the file. The details are a bit complicated: as long as the mapping process does not modify any of the mapped portion of the address space, its pages contain the current contents of the corresponding pages of the file. However, once the process modifies a page, that particular page no longer tracks the corresponding file page; it contains whatever modifications the process makes to it. These changes are not written back to the file and are not shared with any other process that has mapped the file. What happens to the other pages in the mapped region after one of them is modified is unspecified: depending on the implementation, they might continue to contain the current contents of the corresponding file pages until they themselves are modified, or they might all be treated as if they had just been written to and thus no longer be shared with others.
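  The MAP_PRIVATE behavior described above can be observed directly: a store through a private mapping never reaches the underlying file. Below is a sketch for POSIX systems; private_write_is_local is a hypothetical helper, and error handling is minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file MAP_PRIVATE, modify the first byte through the mapping,
 * and report whether the file itself was left unchanged (it should be:
 * the modified page becomes a private, copy-on-write copy).
 * Returns 1 if the file was unaffected, 0 if not, -1 on error. */
int private_write_is_local(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    char *priv = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED) { close(fd); return -1; }

    char before = priv[0];
    priv[0] = '!';                    /* modifies the private copy only */

    char on_disk;
    pread(fd, &on_disk, 1, 0);        /* read the file itself */
    int unchanged = (on_disk == before);

    munmap(priv, (size_t)size);
    close(fd);
    return unchanged;
}
```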
  • 15. Share-Mapped File I/O. [Diagram: two processes' virtual memories mapped through address maps to the same real memory, which exchanges data directly with the disk.]
  • 16. Example.

      int main()
      {
          int fd;
          char *buffer;
          off_t size;

          fd = open("file", O_RDWR);
          size = lseek(fd, 0, SEEK_END);
          buffer = mmap(0, size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
          if (buffer == MAP_FAILED) {
              perror("mmap");
              exit(1);
          }
          // buffer points to a region of memory containing the
          // contents of the file
          ...
      }

  Notes: Here we map the entire contents of a file into the caller's address space, allowing both read and write access. Note that mapping the file into memory does not cause any immediate I/O to take place; the operating system performs the I/O when necessary, according to its own rules.
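  Because the example maps with MAP_SHARED, a store through the mapping updates the file. The sketch below shows that pattern end to end (modify_via_mmap is a hypothetical helper, not from the slides); msync is used only to force the write-back at a known time, since the OS would otherwise do it on its own schedule.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file MAP_SHARED, store through the mapping, and force the
 * change out to the file.  Returns 0 on success, -1 on failure. */
int modify_via_mmap(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    off_t size = lseek(fd, 0, SEEK_END);
    char *buf = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { close(fd); return -1; }

    buf[0] = 'X';                        /* store through the mapping */
    msync(buf, (size_t)size, MS_SYNC);   /* force it out to the file  */

    munmap(buf, (size_t)size);
    close(fd);
    return 0;
}
```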
  • 17. What if you do I/O both traditionally and via mmap to the same file simultaneously?
  • 18. Integrated VM and Buffer Cache. [Diagram: process 1's and process 2's virtual memories and the kernel's buffer cache all mapped to the same real memory, which exchanges data with the disk.]
  Notes: To integrate virtual memory (and hence I/O via mmap) with traditional I/O, we must make certain that there is at most one copy of any page of a file in primary storage. This copy (in a page frame) is mapped both into the buffer cache and into all processes that have mmapped that page of the file.