Sysprog 16

Transcript of "Sysprog 16"

  1. C/C++ Linux System Programming <ul><ul><li>Session 16 </li></ul></ul><ul><ul><li>User-space System Programming </li></ul></ul><ul><ul><li> – session 6 </li></ul></ul>
  2. Outline <ul><li>Filesystem concepts </li></ul><ul><li>File I/O operations </li></ul>
  3. Filesystem <ul><li>Traditionally: an abstraction for storage device access </li></ul><ul><li>Why? </li></ul><ul><ul><li>A common, sensible organization of data </li></ul></ul><ul><ul><li>Encapsulates OS – HW interaction, e.g. performance considerations </li></ul></ul>
  4. VFS <ul><li>Virtual File System: a wider-range abstraction covering </li></ul><ul><ul><li>special FS, different types of disk FS, network FS </li></ul></ul><ul><ul><li>Common user interface </li></ul></ul><ul><ul><li>Multiple FSs </li></ul></ul><ul><ul><li>Common handling </li></ul></ul>
  5. Mounts <ul><li>Superblocks – filesystem control block </li></ul><ul><li>Mount point </li></ul><ul><li>Syscalls </li></ul><ul><ul><li>int mount(const char *source, const char *target, const char *filesystemtype, unsigned long mountflags, const void *data); </li></ul></ul><ul><ul><li>int umount(const char *target); </li></ul></ul>
  6. FS Objects and Metadata <ul><li>Inode – file control block </li></ul><ul><ul><li>A unique ID (inode number, unique within its filesystem) </li></ul></ul><ul><ul><li>Access/Owner info </li></ul></ul><ul><ul><li>Memory maps </li></ul></ul><ul><ul><li>Block device info </li></ul></ul><ul><li>Dirent – a name in a directory that refers to an inode (in-memory object, not stored as such) </li></ul><ul><li>File – an open file: current offset, flags, and a hook to the inode metadata (in-memory object) </li></ul>
  7. Journaling <ul><li>Problem: </li></ul><ul><ul><li>a metadata update takes several separate disk writes, so it is not atomic; a power loss in between can leave the filesystem inconsistent </li></ul></ul><ul><li>Physical vs. logical journals </li></ul><ul><li>Metadata-only journals </li></ul>
  8. Disk Cache <ul><li>Buffers </li></ul><ul><li>Page cache </li></ul><ul><li>Writeback – pdflush (replaced by per-device flusher threads in Linux 2.6.32) </li></ul><ul><li>Read-ahead </li></ul>
  9. File Descriptors <ul><li>Descriptors – index into the process file table </li></ul><ul><li>int open(const char *pathname, int flags); </li></ul><ul><li>int open(const char *pathname, int flags, mode_t mode); // mode is required with O_CREAT </li></ul><ul><li>int creat(const char *pathname, mode_t mode); </li></ul><ul><ul><li>Same as open() with O_CREAT | O_WRONLY | O_TRUNC </li></ul></ul><ul><li>int close(int fd); /* check the return status! */ </li></ul>
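A minimal sketch of the open/close life cycle, assuming a POSIX system; the helper name `write_file` is hypothetical. It follows the slide's warning about `close()`: an error from an earlier `write()` may only be reported when the descriptor is closed (NFS is the classic case), so the return value of `close()` is checked like any other.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create or truncate `path` (mode 0644), write `len`
 * bytes from `buf`, and report any failure, including one that only
 * surfaces at close(). Returns 0 on success, -1 with errno set. */
int write_file(const char *path, const void *buf, size_t len)
{
    /* Equivalent to creat(path, 0644). */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        /* Simplification: a short write is reported as EIO. */
        int saved = (n < 0) ? errno : EIO;
        close(fd);
        errno = saved;
        return -1;
    }

    /* A deferred write error may be reported here: check it. */
    if (close(fd) < 0)
        return -1;
    return 0;
}
```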
  10. File I/O modes <ul><li>int fcntl(int fd, int cmd, ... /* arg */); // F_GETFL / F_SETFL </li></ul><ul><li>Nonblocking: if not ready, fail with EAGAIN - O_NONBLOCK </li></ul><ul><li>Synchronized: write() returns only once the data is on the hardware - O_SYNC </li></ul><ul><ul><li>int fsync(int fd); </li></ul></ul><ul><li>Asynchronous: signal when ready - O_ASYNC </li></ul><ul><ul><li>SIGIO handler </li></ul></ul><ul><ul><li>fcntl: F_GETSIG / F_SETSIG (which signal), F_SETOWN / F_GETOWN (which process gets the signal) </li></ul></ul><ul><li>Direct: transfer directly from/to the user buffer, bypassing the page cache - O_DIRECT </li></ul>
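The usual way to switch a descriptor into non-blocking mode is a read-modify-write of its status flags with `F_GETFL`/`F_SETFL`, so that other flags such as `O_APPEND` are preserved. A small sketch; `set_nonblocking` and `read_empty_pipe_errno` are hypothetical helpers, and a pipe stands in for any descriptor that may not be ready.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the status flags of `fd`,
 * keeping the others. The flags belong to the open file description,
 * so descriptors dup()ed from it see the change too.
 * Returns 0, or -1 with errno set. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: read() from an empty non-blocking pipe fails at once
 * instead of sleeping. Returns the errno it fails with (EAGAIN), 0 if it
 * unexpectedly succeeds, or -1 if the setup fails. */
int read_empty_pipe_errno(void)
{
    int p[2];
    char c;
    int err = -1;
    if (pipe(p) < 0)
        return -1;
    if (set_nonblocking(p[0]) == 0) {
        ssize_t n = read(p[0], &c, 1);   /* nothing has been written */
        err = (n < 0) ? errno : 0;
    }
    close(p[0]);
    close(p[1]);
    return err;
}
```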
  11. More File control <ul><li>int unlink(const char *pathname); </li></ul><ul><li>int truncate(const char *path, off_t length); </li></ul><ul><ul><li>int ftruncate(int fd, off_t length); </li></ul></ul><ul><ul><li>O_TRUNC on open </li></ul></ul>
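`unlink()` removes a name, not the file: the inode lives on while any descriptor still refers to it. The sketch below relies on that to create a scratch file that disappears automatically, and uses `ftruncate()` to size it. The helper name `make_anonymous_scratch` and the `/tmp` template are assumptions for illustration.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file of `size` bytes whose name
 * is unlinked immediately. The data stays reachable through the returned
 * descriptor; the inode is freed when the last descriptor is closed.
 * Returns the descriptor or -1. */
int make_anonymous_scratch(off_t size)
{
    char path[] = "/tmp/scratch_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);                /* opens O_RDWR, mode 0600 */
    if (fd < 0)
        return -1;
    /* ftruncate() extends the file with zero bytes (or shrinks it). */
    if (unlink(path) < 0 || ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```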
  12. Descriptor I/O <ul><li>ssize_t read(int fd, void *buf, size_t count); </li></ul><ul><li>ssize_t write(int fd, const void *buf, size_t count); </li></ul><ul><li>off_t lseek(int fd, off_t offset, int whence); </li></ul><ul><ul><li>SEEK_SET, SEEK_CUR, SEEK_END </li></ul></ul><ul><li>EOF: read() returns 0 </li></ul>
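`read()` and `write()` may transfer fewer bytes than requested, and `read()` signals EOF by returning 0. A common pattern, sketched here with the hypothetical helpers `read_full` and `write_full`, is to loop until the request is satisfied, retrying on `EINTR`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read until `count` bytes or EOF.
 * Returns the bytes read (fewer than count only at EOF) or -1. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical helper: write all `count` bytes. Returns 0 or -1. */
int write_full(int fd, const void *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        ssize_t n = write(fd, (const char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

Seeking with `lseek(fd, -2, SEEK_END)` before `read_full()` reads the last two bytes of a file; a further call then returns 0 for EOF.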
  13. IO Vectors <ul><li>ssize_t readv(int fd, const struct iovec *iov, int iovcnt); </li></ul><ul><li>ssize_t writev(int fd, const struct iovec *iov, int iovcnt); </li></ul>struct iovec { void *iov_base; /* Starting address */ size_t iov_len; /* Number of bytes to transfer */ };
  14. int echo_main(int argc, char **argv) { struct iovec io[argc]; struct iovec *cur_io = io; char *arg; char *p; ... while (1) { int c; cur_io->iov_base = p = arg; ... while ((c = *arg++)) { if (c == eflag) { /* Check for escape seq. */ if (*arg == 'c') { /* 'c' means cancel newline and ignore all subsequent chars. */ cur_io->iov_len = p - (char*)cur_io->iov_base; cur_io++; goto ret; } ... c = bb_process_escape_sequence( (void*) &arg); } *p++ = c; } arg = *++argv; if (arg) *p++ = ' '; cur_io->iov_len = p - (char*)cur_io->iov_base; cur_io++; if (!arg) break; } ret: return writev(1, io, (cur_io - io)) < 0; /* exit status: 0 on success */ }
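The `echo_main` excerpt above builds one `iovec` per argument. A smaller, self-contained sketch of the same gather-write idea: `write_kv` (a hypothetical helper) emits `key=value\n` from four separate pieces in a single `writev()` call, without first copying them into one buffer.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write "key=value\n" from four separate pieces
 * with one writev() call. Returns writev()'s result (bytes or -1). */
ssize_t write_kv(int fd, const char *key, const char *value)
{
    struct iovec iov[4];
    /* iov_base is a plain void *, hence the casts away from const;
       writev() only reads from these buffers. */
    iov[0].iov_base = (void *)key;
    iov[0].iov_len  = strlen(key);
    iov[1].iov_base = (void *)"=";
    iov[1].iov_len  = 1;
    iov[2].iov_base = (void *)value;
    iov[2].iov_len  = strlen(value);
    iov[3].iov_base = (void *)"\n";
    iov[3].iov_len  = 1;
    return writev(fd, iov, 4);
}
```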
  15. Memory Mapped file <ul><li>void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset); </li></ul><ul><li>int munmap(void *start, size_t length); </li></ul><ul><li>Important flags: </li></ul><ul><ul><li>MAP_SHARED (not MAP_ANONYMOUS, which maps no file), MAP_FIXED, MAP_POPULATE ( | MAP_NONBLOCK) </li></ul></ul><ul><li>int msync(void *start, size_t length, int flags); // MS_SYNC or MS_ASYNC </li></ul><ul><li>void *mremap(void *old_address, size_t old_size, size_t new_size, int flags); // Linux-specific </li></ul>
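A minimal sketch of writing a file through a shared mapping, assuming the descriptor is open read-write and the file already spans the mapped range (touching pages past end of file raises `SIGBUS`). The helper name `mmap_overwrite` is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes of `data` to the start of the
 * file via a MAP_SHARED mapping. Stores land in the page cache, so read()
 * on the file sees them even before msync(); MS_SYNC also waits for
 * writeback. Returns 0 or -1. */
int mmap_overwrite(int fd, const void *data, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, data, len);
    int rc = msync(p, len, MS_SYNC);
    if (munmap(p, len) < 0)
        rc = -1;
    return rc;
}
```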
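  16. Locking <ul><li>Mandatory Locking (System V) </li></ul><ul><ul><li>Enabled per file: group-execute bit off and set-group-ID bit on (~S_IXGRP | S_ISGID), plus the MS_MANDLOCK mount flag </li></ul></ul><ul><ul><li>Racy and unreliable on Linux (e.g. with mmap); removed in Linux 5.15 </li></ul></ul><ul><li>Advisory Locking </li></ul><ul><ul><li>Only works if every process cooperates and checks the lock </li></ul></ul>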
  17. Advisory Locking <ul><li>int flock(int fd, int operation); // LOCK_SH, LOCK_EX, LOCK_UN (| LOCK_NB) </li></ul><ul><li>int lockf(int fd, int cmd, off_t len); // F_LOCK, F_TLOCK, F_ULOCK, F_TEST </li></ul><ul><li>fcntl: F_GETLK, F_SETLK, F_SETLKW (waits) </li></ul><ul><ul><li>Fine-grained control: byte ranges given by offset and length, down to a single byte </li></ul></ul>struct flock { ... short l_type; /* Type of lock: F_RDLCK, F_WRLCK, F_UNLCK */ short l_whence; /* How to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END */ off_t l_start; /* Starting offset for lock */ off_t l_len; /* Number of bytes to lock */ pid_t l_pid; /* PID of process blocking our lock (F_GETLK only) */ ... };
  18. #ifdef F_SETLK #ifndef SEEK_SET #define SEEK_SET 0 #endif struct flock lock_data; lock_data.l_type = F_WRLCK; lock_data.l_whence = SEEK_SET; lock_data.l_start = lock_data.l_len = 0; if (fcntl(pidFd, F_SETLK, &lock_data) == -1) { if (errno == EAGAIN || errno == EACCES) /* POSIX allows either */ return oldpid; else return -1; } #else #ifdef LOCK_EX if (flock (pidFd, LOCK_EX|LOCK_NB) == -1) { if (errno == EWOULDBLOCK) return oldpid; else return -1; } #else if (lockf (pidFd, F_TLOCK, 0) == -1) { if (errno == EACCES || errno == EAGAIN) return oldpid; else return -1; } #endif #endif }
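The excerpt above only takes a lock. The sketch below also shows `F_GETLK` and the byte ranges of `fcntl()` locks. Because a process never conflicts with its own locks, the query runs in a forked child. `try_write_lock` and `range_locked_by_other` are hypothetical names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write-lock bytes [start, start + len) without
 * waiting (F_SETLK, not F_SETLKW). Returns 0, or -1 with errno EAGAIN or
 * EACCES if another process holds a conflicting lock.
 * len == 0 would mean "up to end of file and beyond". */
int try_write_lock(int fd, off_t start, off_t len)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: would a write lock on [start, start + len) conflict
 * with a lock held by another process? fcntl() locks are not inherited
 * across fork(), so the child sees the parent's locks as foreign.
 * Returns 1 if locked, 0 if free, -1 on error. */
int range_locked_by_other(int fd, off_t start, off_t len)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = start;
        fl.l_len = len;
        if (fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        /* F_GETLK sets l_type to F_UNLCK if the lock could be placed;
           otherwise it describes the blocking lock (including l_pid). */
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Record locks belong to the process and the inode: closing any descriptor for the file releases all of that process's locks on it, which is a frequent surprise.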
  19. Buffered I/O <ul><li>Streams: buffer I/O in user space and pass it to the kernel in larger chunks </li></ul><ul><ul><li>Better alignment </li></ul></ul><ul><ul><li>Fewer system calls </li></ul></ul><ul><ul><li>Yet another “cache”!! </li></ul></ul><ul><ul><li>FILE * </li></ul></ul><ul><ul><li>Formatting </li></ul></ul><ul><li>FILE *fopen(const char *path, const char *mode); </li></ul><ul><li>FILE *fdopen(int fd, const char *mode); </li></ul><ul><li>int fclose(FILE *fp); </li></ul><ul><li>int fileno(FILE *stream); </li></ul>
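A sketch of moving between the two worlds, assuming a POSIX system: `fdopen()` wraps an existing descriptor (here the write end of a pipe) in a `FILE *`, output sits in the stdio buffer until `fclose()` flushes it, and `fclose()` also closes the descriptor. `format_through_pipe` is a hypothetical helper; `outsz` must be greater than zero.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format `value` into a pipe through a stdio stream
 * and read the bytes back with read(). `outsz` must be > 0.
 * Returns the number of bytes read (excluding the added NUL) or -1. */
ssize_t format_through_pipe(char *out, size_t outsz, int value)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    FILE *w = fdopen(p[1], "w");       /* the stream now owns p[1] */
    if (w == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    /* fileno(w) == p[1]: the stream wraps the same descriptor. */
    fprintf(w, "value=%d\n", value);   /* lands in the stdio buffer ... */
    if (fclose(w) != 0) {              /* ... flushed here; p[1] closed too */
        close(p[0]);
        return -1;
    }
    ssize_t n = read(p[0], out, outsz - 1);
    if (n >= 0)
        out[n] = '\0';
    close(p[0]);
    return n;
}
```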
  20. I/O <ul><li>size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream); </li></ul><ul><li>size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream); </li></ul><ul><li>Formatted </li></ul><ul><ul><li>int fprintf(FILE *stream, const char *format, ...); </li></ul></ul><ul><ul><li>int fscanf(FILE *stream, const char *format, ...); </li></ul></ul><ul><li>Char </li></ul><ul><ul><li>int fputc(int c, FILE *stream); </li></ul></ul><ul><ul><li>int fgetc(FILE *stream); -- int ungetc(int c, FILE *stream); </li></ul></ul><ul><li>String </li></ul><ul><ul><li>int fputs(const char *s, FILE *stream); </li></ul></ul><ul><ul><li>char *fgets(char *s, int size, FILE *stream); </li></ul></ul>
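A short sketch combining the formatted, character and string calls on one stream: `fscanf()` parses a number, `fgetc()`/`ungetc()` skip blanks by reading one character too many and pushing it back, and `fgets()` reads the rest of the line. The helper names and the "count name" line format are made up for the example.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Skip white space with fgetc()/ungetc(): read one character too many,
 * then push it back so the next read sees it again. Only one character
 * of push-back is guaranteed. */
static void skip_spaces(FILE *f)
{
    int c;
    while ((c = fgetc(f)) != EOF && isspace(c))
        ;
    if (c != EOF)
        ungetc(c, f);
}

/* Hypothetical helper: parse a line of the assumed form "<count> <name>\n".
 * fscanf("%d") skips leading white space by itself; fgets() does not,
 * hence skip_spaces() before it. Returns 0 or -1. */
int read_count_and_name(FILE *f, int *count, char *name, int namesz)
{
    if (fscanf(f, "%d", count) != 1)
        return -1;
    skip_spaces(f);
    if (fgets(name, namesz, f) == NULL)   /* keeps the '\n', if it fits */
        return -1;
    name[strcspn(name, "\n")] = '\0';     /* strip it */
    return 0;
}
```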
  21. Behind the Scenes <ul><li>Thread-safe by default: each call locks the stream internally </li></ul><ul><li>To do your own locking (of the stream, not the file) </li></ul><ul><ul><li>void flockfile(FILE *filehandle); </li></ul></ul><ul><ul><li>int ftrylockfile(FILE *filehandle); </li></ul></ul><ul><ul><li>void funlockfile(FILE *filehandle); </li></ul></ul><ul><ul><li>xxx_unlocked versions (e.g. fread_unlocked) </li></ul></ul><ul><li>Flushing the stream buffer to the kernel (not the page cache to disk; that is fsync()) </li></ul><ul><ul><li>int fflush(FILE *stream); </li></ul></ul>
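Because every stdio call takes the stream lock, a single `fputs()` of a whole line is not torn apart by other threads, but a record assembled from several calls can be. A sketch, with the hypothetical helper `write_record`, of holding the lock across the whole record and using the cheaper `putc_unlocked()` inside it.

```c
#define _GNU_SOURCE
#include <stdio.h>

/* Hypothetical helper: write `n` fields as one comma-separated line.
 * flockfile() holds the stream lock for the whole record, so output
 * from other threads cannot land between the fields; inside the locked
 * region putc_unlocked() skips the per-call locking. Returns 0 or -1. */
int write_record(FILE *f, const char *fields[], int n)
{
    flockfile(f);
    for (int i = 0; i < n; i++) {
        if (i > 0)
            putc_unlocked(',', f);
        for (const char *s = fields[i]; *s != '\0'; s++)
            putc_unlocked(*s, f);
    }
    putc_unlocked('\n', f);
    int failed = ferror(f);   /* the stream lock is recursive, so this is safe */
    funlockfile(f);
    return failed ? -1 : 0;
}
```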
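  22. Errors <ul><li>int feof(FILE *stream); </li></ul><ul><li>int ferror(FILE *stream); </li></ul><ul><li>void clearerr(FILE *stream); </li></ul><ul><li>Stream reads (fgetc, fread, ...) return the same value for end-of-file and for an error; feof()/ferror() tell them apart (read() itself returns 0 at EOF and -1 on error) </li></ul>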
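A sketch of the usual stream pattern: loop until `fgetc()` returns `EOF`, then ask `ferror()` why it stopped. `count_bytes` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <stdio.h>

/* Hypothetical helper: count the bytes left in a stream. fgetc() returns
 * EOF both at end of file and on a read error, so the loop ends the same
 * way in both cases; ferror() tells them apart afterwards.
 * Returns the count or -1. */
long count_bytes(FILE *f)
{
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    if (ferror(f)) {
        clearerr(f);   /* reset both indicators so the stream can be retried */
        return -1;
    }
    /* Here feof(f) is true. */
    return n;
}
```

On glibc, reading from a stream opened only for writing, e.g. `fopen("/dev/null", "w")`, sets the error indicator, so `count_bytes()` returns -1 there.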
  23. Positioning <ul><li>int fseek(FILE *stream, long offset, int whence); </li></ul><ul><li>long ftell(FILE *stream); </li></ul><ul><li>int fgetpos(FILE *stream, fpos_t *pos); </li></ul><ul><li>int fsetpos(FILE *stream, const fpos_t *pos); </li></ul>
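A sketch of save/seek/restore: `fgetpos()`/`fsetpos()` keep an opaque `fpos_t` (which can include multibyte conversion state), while `fseek()`/`ftell()` work with plain `long` offsets. `peek_at` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <stdio.h>

/* Hypothetical helper: return the byte at `offset` without disturbing the
 * current position: save it with fgetpos(), seek and read, then restore
 * it with fsetpos(). Returns the byte or EOF. */
int peek_at(FILE *f, long offset)
{
    fpos_t saved;
    if (fgetpos(f, &saved) != 0)
        return EOF;
    int c = EOF;
    if (fseek(f, offset, SEEK_SET) == 0)
        c = fgetc(f);
    if (fsetpos(f, &saved) != 0)
        return EOF;
    return c;
}
```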
  24. Metadata <ul><li>int fstat(int fd, struct stat *buf); </li></ul><ul><li>int stat(const char *path, struct stat *buf); </li></ul><ul><ul><li>lstat: like stat(), but reports on a symbolic link itself instead of following it </li></ul></ul><ul><ul><li>stat() needs search (execute) permission on every directory in the path </li></ul></ul>struct stat { dev_t st_dev; /* ID of device containing file */ ino_t st_ino; /* inode number */ mode_t st_mode; /* file type and mode */ nlink_t st_nlink; /* number of hard links */ uid_t st_uid; /* user ID of owner */ gid_t st_gid; /* group ID of owner */ dev_t st_rdev; /* device ID (if special file) */ off_t st_size; /* total size, in bytes */ blksize_t st_blksize; /* blocksize for filesystem I/O */ blkcnt_t st_blocks; /* number of 512B blocks allocated */ time_t st_atime; /* time of last access */ time_t st_mtime; /* time of last modification */ time_t st_ctime; /* time of last status change */ };
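A minimal sketch contrasting `stat()` and `lstat()` on a symbolic link, using the `S_IS*` macros to decode `st_mode`. The helper `file_kind` and its string labels are illustrative only.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* Hypothetical helper: classify `path`. With follow == 0 it uses lstat(),
 * which describes a symbolic link itself; otherwise stat(), which follows
 * the link to its target. Returns a static label. */
const char *file_kind(const char *path, int follow)
{
    struct stat st;
    int rc = follow ? stat(path, &st) : lstat(path, &st);
    if (rc < 0)
        return "error";
    if (S_ISREG(st.st_mode))
        return "regular";
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISLNK(st.st_mode))
        return "symlink";
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        return "device";
    if (S_ISFIFO(st.st_mode))
        return "fifo";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    return "other";
}
```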
  25. Directory Streams <ul><li>A directory is a file whose entries map names to inode numbers </li></ul><ul><li>DIR *opendir(const char *name); </li></ul><ul><li>int closedir(DIR *dir); </li></ul><ul><li>struct dirent *readdir(DIR *dir); // NULL at end of directory or on error </li></ul>struct dirent { ino_t d_ino; /* inode number */ off_t d_off; /* offset to the next dirent */ unsigned short d_reclen; /* length of this record */ unsigned char d_type; /* type of file */ char d_name[256]; /* filename */ };
  26. static pid_list *scan_proc_pids(inode_list *ilist) { DIR *d; struct dirent *de; pid_t pid; pid_list *plist; xchdir("/proc"); d = opendir("/proc"); if (!d) return NULL; plist = NULL; while ((de = readdir(d)) != NULL) { pid = (pid_t)bb_strtou(de->d_name, NULL, 10); if (errno) continue; if (chdir(de->d_name) < 0) continue; plist = scan_link("cwd", pid, ilist, plist); plist = scan_link("exe", pid, ilist, plist); plist = scan_link("root", pid, ilist, plist); .... } closedir(d); return plist; } static pid_list *scan_link(const char *lname, pid_t pid, inode_list *ilist, pid_list *plist) { ino_t inode; dev_t dev; if (!file_to_dev_inode(lname, &dev, &inode)) return plist; if (search_dev_inode(ilist, dev, inode)) plist = add_pid(plist, pid); return plist; } static int file_to_dev_inode(const char *filename, dev_t *dev, ino_t *inode) { struct stat f_stat; if (stat(filename, &f_stat)) return 0; *inode = f_stat.st_ino; *dev = f_stat.st_dev; return 1; } static int search_dev_inode(inode_list *ilist, dev_t dev, ino_t inode) { while (ilist) { if (ilist->dev == dev) { if (option_mask32 & OPT_MOUNT) return 1; if (ilist->inode == inode) return 1; } ilist = ilist->next; } return 0; }
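The excerpt above walks `/proc` with `readdir()`. A smaller sketch of the same loop that counts directory entries, skipping `.` and `..`; `count_entries` is a hypothetical helper. Since `readdir()` returns `NULL` both at the end and on error, `errno` is cleared before each call to tell them apart.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: count the entries of directory `path`, skipping
 * "." and "..". Returns the count or -1. */
int count_entries(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int n = 0;
    struct dirent *de;
    errno = 0;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
        errno = 0;            /* so a NULL below can be classified */
    }
    int err = errno;          /* nonzero: readdir() failed */
    closedir(d);
    return err ? -1 : n;
}
```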
  27. I/O Multiplexing <ul><li>int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout); </li></ul><ul><li>int pselect(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, const struct timespec *timeout, const sigset_t *sigmask); </li></ul><ul><li>int poll(struct pollfd *fds, nfds_t nfds, int timeout); </li></ul><ul><li>int ppoll(struct pollfd *fds, nfds_t nfds, const struct timespec *timeout, const sigset_t *sigmask); </li></ul><ul><ul><li>POLLIN/POLLOUT/POLLPRI/POLLERR </li></ul></ul>void FD_CLR(int fd, fd_set *set); int FD_ISSET(int fd, fd_set *set); void FD_SET(int fd, fd_set *set); void FD_ZERO(fd_set *set); struct pollfd { int fd; /* file descriptor */ short events; /* requested events */ short revents; /* returned events */ };
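A sketch of waiting on a single descriptor with `poll()`; `wait_readable` is a hypothetical helper. `POLLHUP` and `POLLERR` are reported in `revents` even when not requested, so a pipe whose writer has gone away also wakes the caller.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `timeout_ms` for `fd` to become
 * readable. Returns 1 if readable or hung up, 0 on timeout, -1 on error
 * (including POLLERR/POLLNVAL). */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;              /* 0: timeout, -1: error (errno set) */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```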
  28. Epoll <ul><li>Decouples registering the interest set from waiting on it </li></ul><ul><ul><li>+: epoll_wait() cost depends on the number of ready descriptors, not on the size of the set </li></ul></ul><ul><ul><li>+: Edge-triggered mode (EPOLLET) </li></ul></ul><ul><ul><li>-: one extra system call (epoll_ctl) per change to the set </li></ul></ul><ul><li>int epoll_create(int size); // returns a descriptor, must be closed; size is ignored since Linux 2.6.8 but must be > 0 </li></ul><ul><li>int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); </li></ul><ul><li>int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); </li></ul>typedef union epoll_data { void *ptr; int fd; uint32_t u32; uint64_t u64; } epoll_data_t; struct epoll_event { uint32_t events; /* Epoll events */ epoll_data_t data; /* User data variable */ };
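A one-shot sketch of the epoll sequence: create an instance, register each descriptor once with `epoll_ctl()`, then wait. The descriptor is stored in `data.fd` and comes back with the event. `epoll_first_ready` is a hypothetical helper; a real event loop would create the instance once and keep it across many `epoll_wait()` calls.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register `nfds` descriptors for EPOLLIN and wait
 * for the first readable one. Returns that descriptor, -2 on timeout,
 * -1 on error. */
int epoll_first_ready(const int *fds, int nfds, int timeout_ms)
{
    int ep = epoll_create1(EPOLL_CLOEXEC);   /* newer form of epoll_create() */
    if (ep < 0)
        return -1;
    int ok = 1;
    for (int i = 0; i < nfds && ok; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        ok = epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) == 0;
    }
    int ret = -1;
    if (ok) {
        struct epoll_event ready;
        int n = epoll_wait(ep, &ready, 1, timeout_ms);
        if (n == 1)
            ret = ready.data.fd;             /* the value stored above */
        else if (n == 0)
            ret = -2;
    }
    close(ep);   /* the epoll instance is a descriptor too */
    return ret;
}
```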
  29. IOCTL <ul><li>Device / special file control </li></ul><ul><li>int ioctl(int d, int request, ...); // glibc declares request as unsigned long </li></ul><ul><li>Requests are specific to the device being controlled and may carry an argument (see ioctl_list) </li></ul>
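`FIONREAD` is a request that is not tied to one device type: it works on pipes, sockets and terminals and stores the number of readable bytes in an `int`. A minimal sketch, with the hypothetical helper `bytes_pending`; the same request appears in the `inotifyd_main` excerpt further on.

```c
#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes that can be read from `fd`
 * without blocking, or -1 if the request is not supported. */
int bytes_pending(int fd)
{
    int n = 0;                         /* FIONREAD writes an int */
    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```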
  30. Filesystem events <ul><li>int inotify_init(void); // returns a descriptor, must be closed </li></ul><ul><li>int inotify_add_watch(int fd, const char *pathname, uint32_t mask); // returns a watch descriptor </li></ul><ul><li>int inotify_rm_watch(int fd, int wd); </li></ul><ul><li>FIONREAD ioctl: number of bytes of pending events </li></ul><ul><li>fcntl: F_NOTIFY (older dnotify interface) </li></ul>struct inotify_event { int wd; /* watch descriptor */ uint32_t mask; /* mask of events */ uint32_t cookie; /* unique cookie */ uint32_t len; /* size of 'name' field */ char name[]; /* null-terminated name */ };
  31. int inotifyd_main(int argc UNUSED_PARAM, char **argv) { unsigned mask = IN_ALL_EVENTS; // assume we want all events struct pollfd pfd; char **watched = ++argv; // watched name list const char *args[] = { *argv, NULL, NULL, NULL, NULL }; // open inotify pfd.fd = inotify_init(); if (pfd.fd < 0) bb_perror_msg_and_die("no kernel support"); // setup watched while (*++argv) { char *path = *argv; char *masks = strchr(path, ':'); int wd; // watch descriptor // if mask is specified -> if (masks) { *masks = '\0'; // split path and mask // convert mask names to mask bitset mask = 0; while (*++masks) { const char *found = strchr(mask_names, *masks); if (found) { mask |= (1 << (found - mask_names)); } } } // add watch wd = inotify_add_watch(pfd.fd, path, mask); if (wd < 0) { bb_perror_msg_and_die("add watch (%s) failed", path); } } static const char mask_names[] ALIGN1 = "a" // 0x00000001 File was accessed "c" // 0x00000002 File was modified "e" // 0x00000004 Metadata changed "w" // 0x00000008 Writable file was closed "0" // 0x00000010 Unwritable file was closed "r" // 0x00000020 File was opened "m" // 0x00000040 File was moved from X "y" // 0x00000080 File was moved to Y "n" // 0x00000100 Subfile was created "d" // 0x00000200 Subfile was deleted "D" // 0x00000400 Self was deleted "M" // 0x00000800 Self was moved ; pfd.events = POLLIN; while (!signalled && poll(&pfd, 1, -1) > 0) { int len; // FIONREAD stores an int void *buf; struct inotify_event *ie; // read out all pending events xioctl(pfd.fd, FIONREAD, &len); #define eventbuf bb_common_bufsiz1 ie = buf = (len <= sizeof(eventbuf)) ? eventbuf : xmalloc(len); len = full_read(pfd.fd, buf, len); // process events. N.B. events may vary in length while (len > 0) { int i; char events[sizeof(mask_names)]; // 12 letters + NUL char *s = events; unsigned m = ie->mask; for (i = 0; i < 12; ++i, m >>= 1) { if (m & 1) { *s++ = mask_names[i]; } } *s = '\0'; args[1] = events; args[2] = watched[ie->wd]; args[3] = ie->len ? ie->name : NULL; xspawn((char **)args); // next event i = sizeof(struct inotify_event) + ie->len; len -= i; ie = (void*)((char*)ie + i); } if (eventbuf != buf) free(buf); } return EXIT_SUCCESS; }
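A smaller, self-contained sketch of the record walk in the `inotifyd_main` loop: read one batch of events and step through the variable-length records by `sizeof(struct inotify_event) + len`. `batch_has_create` is a hypothetical helper; the buffer alignment follows the inotify(7) man page example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: read one batch of events from inotify descriptor
 * `ifd` (blocking until at least one is available) and report whether it
 * contains IN_CREATE for `name`. Returns 1 if found, 0 if not, -1 on error. */
int batch_has_create(int ifd, const char *name)
{
    /* Aligned for struct inotify_event and large enough for at least one
       event carrying a maximum-length name. */
    _Alignas(struct inotify_event) char buf[4096];
    ssize_t len = read(ifd, buf, sizeof buf);
    if (len <= 0)
        return -1;
    for (char *p = buf; p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if ((ev->mask & IN_CREATE) && ev->len > 0 && strcmp(ev->name, name) == 0)
            return 1;
        p += sizeof(struct inotify_event) + ev->len;   /* next record */
    }
    return 0;
}
```

A typical setup is `inotify_init1(IN_CLOEXEC)` followed by `inotify_add_watch(ifd, dir, IN_CREATE)`; creating a file in `dir` then produces one such record.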
  32. Asynchronous I/O <ul><li>Linux native AIO (io_submit) is truly asynchronous only with O_DIRECT; glibc's POSIX aio_* calls below work on any descriptor by using helper threads </li></ul>struct aiocb { int aio_fildes; /* file descriptor */ off_t aio_offset; /* file offset */ int aio_lio_opcode; /* operation to perform (lio_listio) */ int aio_reqprio; /* request priority offset */ volatile void *aio_buf; /* pointer to buffer */ size_t aio_nbytes; /* length of operation */ struct sigevent aio_sigevent; /* notification: signal number and value */ /* internal, private members follow... */ }; int aio_read (struct aiocb *aiocbp); int aio_write (struct aiocb *aiocbp); int aio_error (const struct aiocb *aiocbp); ssize_t aio_return (struct aiocb *aiocbp); int aio_cancel (int fd, struct aiocb *aiocbp); int aio_fsync (int op, struct aiocb *aiocbp); int aio_suspend (const struct aiocb * const cblist[], int n, const struct timespec *timeout);
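A sketch of one POSIX AIO read: fill in an `aiocb`, queue it with `aio_read()`, wait with `aio_suspend()` until `aio_error()` no longer reports `EINPROGRESS`, then collect the result with `aio_return()` exactly once. It assumes glibc; with glibc older than 2.34 the program must be linked with `-lrt`. `aio_read_wait` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: asynchronously read `len` bytes at `offset`,
 * then wait for completion. Returns bytes read (0 at EOF) or -1. */
ssize_t aio_read_wait(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no signal on completion */
    if (aio_read(&cb) < 0)
        return -1;

    /* ... the caller could overlap other work here ... */

    const struct aiocb *list[1] = { &cb };
    int err;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        aio_suspend(list, 1, NULL);              /* sleep until done or signal */
    ssize_t n = aio_return(&cb);                 /* exactly once per request */
    if (err != 0) {
        if (err > 0)
            errno = err;
        return -1;
    }
    return n;
}
```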