File systems - BIN'S BLOG

Basic

A filesystem is the methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk.

File content stored in blocks on storage device
Files are organized into directories (folders)

Detail about file

File descriptor

A file descriptor is a number that uniquely identifies an open file in a computer’s operating system. It describes a data resource, and how that resource may be accessed.
简单的说，根据Linux一切皆文件的概念来看，当进程打开或者创建文件的时候，内核会向进程返回一个数字，这个数字就是文件描述符，所有执行I/O操作的系统调用都通过文件描述符来进行。

Hard link and symbolic link/soft link

A hard link is essentially a synced carbon copy of a file that refers directly to the inode of a file. Symbolic links on the other hand refer directly to the file which refers to the inode, a shortcut.

1
2
3

$ ln a.txt b.txt    # 创建硬链接 - 副本
$ ln -s a.txt c.txt # 创建软链接 - 替身（快捷方式）
$ unlink c.txt      # 取消链接

File permission mode

3种模式 r:read w:write x:execute
3种身份: user group others
3*3 = 27 种访问权限
rw-r–r– => 110 (owner permission) 100 (group) 100 (others)
可以使用chomod 来进行权限的修改

Inode

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. 更多inode细节

Stores metadata/attributes about the file ( use stat file to check the metadata )
Also stores locations of blocks holding the content of the file

metadata

More details about the file metadata

struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of (hard) links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special device file, e.g., /etc/tty) */ 
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for filesystem I/O */
blkcnt_t st_blocks; /* number of blocks allocated */
time_t st_atime; /* last time file content was accessed */
time_t st_mtime; /* last time file content was modified */ 
time_t st_ctime; /* last time inode was changed */
};

time_t st_atime; /* last time file content was accessed */ 关于这一项的思考：
这一项表明了不论对文件是什么操作，都会进行inode的修改，有的时候为了提升I/O性能，会在挂载文件系统的时候指定“noatime,nodiratime”参数，意味着当访问一个文件和目录的时候，access time都不会更新
可以通过 cat /etc/fstab 查看具体信息

locations of data

An inode has:

A number of direct pointers,each points to a data block
example: a inode has 8 pointers, each pointer points to a 4K block, then this inode is enogh for 8*4K = 32KB size of file
Also has a slot for indirect pointer : a pointer points to a data block storing direct pointers
example: 1 block’s size: 4KB, pointer size: 4 bytes, then a block can hold 1024 pointers => Now file can have (8 + 1024) blocks , enough for 4MB size of file

How to store larger files? - Multi-level index

Pointers may be organized into multiple levels (double,triple,…n times)

1 direct pointer => 1 block (4KB)
1 indirect pointer => 2^10 direct pointer = 2^10*4KB = 4MB
1 double indirect pointer => 2^10 indirect pointers => 2^20 direct pointers => 2^20*4KB = 4GB

Organization of blocks

Assumption:(便于理解，做些假设)

Disk consists of a list of blocks and they are array-based.(other forms:Tree-based, e.g., SGI XFS -Blocks are organized into variable-length extents)
a disk with 64 blocks
- 4KB/block =>文件系统的最小单元
- 512B/sector =>存储系统的最小单元
- so there are 2^12/2^9 = 2^3 = 8 sectors/block and capacity of disk = 64 * 4KB = 256KB

Structure about these blocks:

Data region (56 blocks (#8-63))
Inode table (5 blocks #3 – #7) 元数据+文件内容地址
- assume 256 bytes/inode 5 blocks, 4KB/block
- => 80 inodes total (4KB/256B * 5)
- => File system can store at most 80 files
Bitmaps (#1, #2) a vector of bits, 0 for free (inode/block), 1 for in-use
- Inode bitmap (imap):keep track of which inodes in the inode table are available (这里4KB一个block 1byte=8bit，所以一个block的话最多以存储80K个inode的状态)
- Data bitmap (dmap):keep track of which blocks in data region are available
Superblock(#0): Track where i/d blocks and inode table are; Indicate type of FS & inumber of its root dir; Will be read first when file system is mounted

Inumber

Each inode is identified by a number: Low-level number of file name
Can figure out location of inode from inumber 通过inumber可以寻址到其代表的inode所在的sector
Location: ⌊(inodeStartAddress + inumber ∗ inode size)/sector size⌋

example: inumber = 32
Address:12K+32*256=20K
Sector #: 20K/512 = 40

总结inumber,inode,data一句话来说就是inumber => inode => data

Detail about directory

Basic

Directory itself stored as a file
For each file in the directory, it stores:
- name, inumber, record length, string length
If file is deleted (using rm command) or a name is unlinked (using unlink command),then inumber in its directory entry set to 0 (reserved for empty entry)
- File is finally deleted when its last (hard) link is removed

Storing a directory

Also as a file with its own inode + data block
inode:
- file type: directory (instead of regular file)
- pointer to block(s) in data region storing directory entries

Operations on file

Create: open(), write()
Read: open(),read(), lseek()
Update: write(), lseek()
Delete: unlink()

Note: 这些函数都是操作系统自带提供的系统调用函数

Read

User interface via GUI or touch command in Linux
Implementation, e.g., via a C program with a system call: open()
open() returns a file descriptor. Reserved fds: stdin 0, stdout, 1, stderr 2

After getting the file descriptor of file, start read()

fd = open("/foo/bar", O_RDONLY)
- Need to locate inode of the file "/foo/bar"
- Assume inumber of root, say 2, is known (e.g., when the file system is mounted)

step1: read inode and content of /  (2 reads)
- Look for "foo" in / -> foo's inumber

step2: read inode and content of /foo (2 reads)
- Look for "bar" in /foo -> bar's inumber

step3: read inode of /foo/bar (1 read)
– Permission check + allocate file descriptor

open-file table per process(维护这些进程，系统有一个 open-file table)


read(fd, buffer, size)
– Note fd is maintained in per-process open-file table
– Table translates fd -> inumber of file

step1: consult bar's inode to locate a block
step2: read the block
step3: update inode with newest file access time
step4: update open-file table with new offset
step5: repeat above steps until done(with reading data of given size)

I/O cost for open(): 5 reads
I/O cost for reading a block: 2 reads + 1 write

Write

int fd = open("/foo/bar", O_WRONLY)
– Or int fd = create(("/foo/bar")
– Assume bar is a new file under foo

step1: read'/'inode & content -> obtain foo's inumber
step2: read'/foo'inode & content -> check if bar exists
step3: read imap,to find a free inode for bar
step4: update imap,setting 1 for allocated inode
step5: write bar's inode
step6: update foo's content block  - adding an entry for bar
step7: update foo's inode  -update its modification time

write(fd, buffer, size)
step1: read inode of bar(by looking up its inumber in the open-file table)
step2: allocate new data block – read and write bmap
step3: write to data block of bar
step4: update bar node - new modification time, add pointer to block

Operating System

本博客所有文章除特别声明外，均采用 CC BY-SA 4.0 协议，转载请注明出处！

Network File System Previous

spaCy学习 Next