File systems

Basic

A filesystem is the methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk.

  • File content stored in blocks on storage device
  • Files are organized into directories (folders)

Detail about file

File descriptor

A file descriptor is a number that uniquely identifies an open file in a computer’s operating system. It describes a data resource, and how that resource may be accessed.
简单的说,根据Linux一切皆文件的概念来看,当进程打开或者创建文件的时候,内核会向进程返回一个数字,这个数字就是文件描述符,所有执行I/O操作的系统调用都通过文件描述符来进行。

A hard link is essentially a synced carbon copy of a file that refers directly to the inode of a file. Symbolic links on the other hand refer directly to the file which refers to the inode, a shortcut.

1
2
3
$ ln a.txt b.txt    # 创建硬链接 - 副本
$ ln -s a.txt c.txt # 创建软链接 - 替身(快捷方式)
$ unlink c.txt # 取消链接

File permission mode

3种模式 r:read w:write x:execute
3种身份: user group others
3*3 = 27 种访问权限
rw-r–r– => 110 (owner permission) 100 (group) 100 (others)
可以使用chomod 来进行权限的修改

Inode

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. 更多inode细节

  • Stores metadata/attributes about the file ( use stat file to check the metadata )
  • Also stores locations of blocks holding the content of the file
metadata
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
More details about the file metadata

struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of (hard) links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special device file, e.g., /etc/tty) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for filesystem I/O */
blkcnt_t st_blocks; /* number of blocks allocated */
time_t st_atime; /* last time file content was accessed */
time_t st_mtime; /* last time file content was modified */
time_t st_ctime; /* last time inode was changed */
};

time_t st_atime; /* last time file content was accessed */ 关于这一项的思考:
这一项表明了不论对文件是什么操作,都会进行inode的修改,有的时候为了提升I/O性能,会在挂载文件系统的时候指定“noatime,nodiratime”参数,意味着当访问一个文件和目录的时候,access time都不会更新
可以通过 cat /etc/fstab 查看具体信息

locations of data

An inode has:

  • A number of direct pointers,each points to a data block
    example: a inode has 8 pointers, each pointer points to a 4K block, then this inode is enogh for 8*4K = 32KB size of file
  • Also has a slot for indirect pointer : a pointer points to a data block storing direct pointers
    example: 1 block’s size: 4KB, pointer size: 4 bytes, then a block can hold 1024 pointers => Now file can have (8 + 1024) blocks , enough for 4MB size of file

How to store larger files? - Multi-level index

  • Pointers may be organized into multiple levels (double,triple,…n times)


1 direct pointer => 1 block (4KB)
1 indirect pointer => 2^10 direct pointer = 2^10*4KB = 4MB
1 double indirect pointer => 2^10 indirect pointers => 2^20 direct pointers => 2^20*4KB = 4GB

Organization of blocks

Assumption:(便于理解,做些假设)

  1. Disk consists of a list of blocks and they are array-based.(other forms:Tree-based, e.g., SGI XFS -Blocks are organized into variable-length extents)
  2. a disk with 64 blocks
    • 4KB/block =>文件系统的最小单元
    • 512B/sector =>存储系统的最小单元
    • so there are 2^12/2^9 = 2^3 = 8 sectors/block and capacity of disk = 64 * 4KB = 256KB

Structure about these blocks:

  1. Data region (56 blocks (#8-63))
  2. Inode table (5 blocks #3 – #7) 元数据+文件内容地址
    • assume 256 bytes/inode 5 blocks, 4KB/block
    • => 80 inodes total (4KB/256B * 5)
    • => File system can store at most 80 files
  3. Bitmaps (#1, #2) a vector of bits, 0 for free (inode/block), 1 for in-use
    • Inode bitmap (imap):keep track of which inodes in the inode table are available (这里4KB一个block 1byte=8bit,所以一个block的话最多以存储80K个inode的状态)
    • Data bitmap (dmap):keep track of which blocks in data region are available
  4. Superblock(#0): Track where i/d blocks and inode table are; Indicate type of FS & inumber of its root dir; Will be read first when file system is mounted

Inumber

Each inode is identified by a number: Low-level number of file name
Can figure out location of inode from inumber 通过inumber可以寻址到其代表的inode所在的sector
Location: ⌊(inodeStartAddress + inumber ∗ inode size)/sector size⌋

example: inumber = 32
Address:12K+32*256=20K
Sector #: 20K/512 = 40

总结inumber,inode,data一句话来说就是inumber => inode => data

Detail about directory

Basic

  • Directory itself stored as a file
  • For each file in the directory, it stores:
    • name, inumber, record length, string length
  • If file is deleted (using rm command) or a name is unlinked (using unlink command),then inumber in its directory entry set to 0 (reserved for empty entry)
    • File is finally deleted when its last (hard) link is removed

Storing a directory

  • Also as a file with its own inode + data block
  • inode:
    • file type: directory (instead of regular file)
    • pointer to block(s) in data region storing directory entries

Operations on file

  • Create: open(), write()
  • Read: open(),read(), lseek()
  • Update: write(), lseek()
  • Delete: unlink()

Note: 这些函数都是操作系统自带提供的系统调用函数

Read

  • User interface via GUI or touch command in Linux
  • Implementation, e.g., via a C program with a system call: open()
  • open() returns a file descriptor. Reserved fds: stdin 0, stdout, 1, stderr 2
  • After getting the file descriptor of file, start read()
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    fd = open("/foo/bar", O_RDONLY)
    - Need to locate inode of the file "/foo/bar"
    - Assume inumber of root, say 2, is known (e.g., when the file system is mounted)

    step1: read inode and content of / (2 reads)
    - Look for "foo" in / -> foo's inumber

    step2: read inode and content of /foo (2 reads)
    - Look for "bar" in /foo -> bar's inumber

    step3: read inode of /foo/bar (1 read)
    – Permission check + allocate file descriptor

    open-file table per process(维护这些进程,系统有一个 open-file table)


    read(fd, buffer, size)
    Note fd is maintained in per-process open-file table
    – Table translates fd -> inumber of file

    step1: consult bar's inode to locate a block
    step2: read the block
    step3: update inode with newest file access time
    step4: update open-file table with new offset
    step5: repeat above steps until done(with reading data of given size)

I/O cost for open(): 5 reads
I/O cost for reading a block: 2 reads + 1 write

Write

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
int fd = open("/foo/bar", O_WRONLY)
– Or int fd = create(("/foo/bar")
– Assume bar is a new file under foo

step1: read'/'inode & content -> obtain foo's inumber
step2: read'/foo'inode & content -> check if bar exists
step3: read imap,to find a free inode for bar
step4: update imap,setting 1 for allocated inode
step5: write bar's inode
step6: update foo's content block - adding an entry for bar
step7: update foo's inode -update its modification time

write(fd, buffer, size)
step1: read inode of bar(by looking up its inumber in the open-file table)
step2: allocate new data blockread and write bmap
step3: write to data block of bar
step4: update bar node - new modification time, add pointer to block


本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!