Storage systems

Storage Device

Hierarchy

Characteristics

  • Capacity(bytes) - How much data it can hold
  • Cost($) - Price per byte of storage
  • Bandwidth(bytes/sec) - Number of bytes that can be transferred per second; read and write bandwidth may be different
  • Latency(sec) - Time elapsed, waiting for response/delivery of data

Basic Function : CRUD

  • C(reate)/write
  • R(ead)
  • U(pdate)/overwrite
  • D(elete)

Some terms

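  • Access time: Time taken before the drive is ready to transfer data. In general, a physical device (hard disk, memory, ...) must first locate the target position before it can transfer data. Typical order of magnitude: memory - nanoseconds, SSD - microseconds, HDD - milliseconds
  • Access pattern: How the storage device reads/writes data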
    • Sequential: Data to be accessed are located next to each other or sequentially on the device
    • Random: Access data located randomly on storage device
  • Completion time: Time to complete a read/write operation
    • CompletionTime = Latency + Size/Bandwidth
    • Depends on lots of factors(device, operation type, access pattern…)
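
The completion-time formula above can be sketched in a few lines of Python. The device numbers below are hypothetical placeholders chosen only to match the rough orders of magnitude mentioned earlier (milliseconds for HDD, microseconds for SSD); they are not measurements.

```python
def completion_time(latency_s: float, size_bytes: float, bandwidth_bps: float) -> float:
    """CompletionTime = Latency + Size / Bandwidth (seconds, bytes, bytes/sec)."""
    return latency_s + size_bytes / bandwidth_bps


# Hypothetical devices: (latency in seconds, bandwidth in bytes/sec).
devices = {
    "hdd": (10e-3, 100e6),   # assumed: ~10 ms latency, 100 MB/s
    "ssd": (100e-6, 500e6),  # assumed: ~100 us latency, 500 MB/s
}

for name, (latency, bandwidth) in devices.items():
    small = completion_time(latency, 4e3, bandwidth)    # 4 KB request
    large = completion_time(latency, 100e6, bandwidth)  # 100 MB request
    print(f"{name}: 4KB -> {small * 1000:.3f} ms, 100MB -> {large * 1000:.1f} ms")
```

For small requests the latency term dominates; for large requests the Size/Bandwidth term dominates.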

Note: the rest of this post mainly discusses HDDs and SSDs

Hard Disk Drive

Organization

  • One or more spinning magnetic platters, typically two surfaces per platter
  • Data stored in tracks
  • Disk arm positions the head over a radial position - it swings across tracks but doesn’t extend
  • Data is read/written by disk head as platter spins

Hard disk head movement while copying files between two folders:
https://www.youtube.com/watch?v=BlB49F6ExkQ

Physical characteristics

  • 3.5” (diameter, common in desktops), 2.5” (common in laptops)
  • Rotational Speed: 4800/5400/7200/10000 RPM (rotations per minute)
  • Between 5-7 platters
  • Current capacity up to 10TB

Data Storage

  • 1 platter is divided into a number of tracks
  • 1 track is divided into N fixed-size sectors
    • sector size: 4KB
    • Entire sector is written “atomically” -> the sector is the smallest unit of operation, so every read or write starts by addressing a sector

Address Method - CHS(cylinder-head-sector)

overview

CHS is an early way to address a sector. (LBA(Logical Block Addressing) is more common now.)
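
For reference, here is a minimal sketch of the conventional mapping between a CHS triple and an LBA number, LBA = (C * heads + H) * sectors_per_track + (S - 1). It assumes the usual convention that cylinders and heads count from 0 while sectors count from 1; the function names and the geometry (16 heads, 64 sectors/track) are only illustrative.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Map a (cylinder, head, sector) triple to a logical block address."""
    if c < 0 or not (0 <= h < heads) or not (1 <= s <= sectors_per_track):
        raise ValueError("CHS address out of range for this geometry")
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    """Inverse mapping: logical block address back to (cylinder, head, sector)."""
    c, rest = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rest, sectors_per_track)
    return c, h, s0 + 1  # sectors are 1-based in CHS


# Illustrative geometry: 16 heads, 64 sectors per track.
print(chs_to_lba(1, 0, 1, heads=16, sectors_per_track=64))
print(lba_to_chs(1024, heads=16, sectors_per_track=64))
```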

For example:
#cylinders: 256
#heads: 16 (i.e., 8 platters, 2 heads/platter)
#sectors/track: 64
sector size = 4KB
=> capacity of the drive: 2^8 cylinders * 2^4 heads * 2^6 sectors/track * 2^12 bytes/sector = 2^30 bytes = 1GB
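
The same capacity calculation as a quick Python check. The function name is made up for illustration; the geometry is the one from the example above.

```python
def chs_capacity_bytes(cylinders: int, heads: int, sectors_per_track: int,
                       sector_size_bytes: int) -> int:
    """Total capacity = cylinders * heads * sectors per track * bytes per sector."""
    return cylinders * heads * sectors_per_track * sector_size_bytes


capacity = chs_capacity_bytes(cylinders=256, heads=16, sectors_per_track=64,
                              sector_size_bytes=4 * 1024)
print(capacity, "bytes =", capacity // 2**30, "GB")
```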

Addressing steps

With CHS, the target sector must be located before any data can be transferred:

  1. Wait for the disk head to reach the right track - seek time
    1. On average, seek time is about 1⁄3 of the maximum seek time
  2. Wait for the right sector to rotate under the head - rotational latency
    1. On average: about 1⁄2 of the time of a full rotation
    2. Example: assume 10,000 RPM (rotations per minute): 60000 ms / 10000 rotations = 6 ms per rotation, so the average rotational latency is about 3 ms
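
The rotation arithmetic can be repeated for the other spindle speeds listed earlier. This is a small sketch with made-up function names; the average latency is taken as half a rotation, as stated above.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full rotation in milliseconds (60,000 ms per minute)."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a rotation away."""
    return rotation_time_ms(rpm) / 2


for rpm in (4800, 5400, 7200, 10000):
    print(f"{rpm:>5} RPM: {rotation_time_ms(rpm):.2f} ms/rotation, "
          f"avg latency {avg_rotational_latency_ms(rpm):.2f} ms")
```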

Data Operation

T = T_seek + T_rotation + T_transfer
T_seek : Time to get the disk head on right track
T_rotation :Time to wait for the right sector to rotate under the head
T_transfer: Time to actually transfer data

T_transfer

Assume 512KB of data is to be transferred at a transfer bandwidth of 128 MB/sec
Transfer time: 512KB / (128MB/sec) * 1000 ms/sec = 4ms (taking 1MB = 1000KB)

Actual Bandwidth

Actual bandwidth = data size / actual completion time, so in general the actual bandwidth is lower than the device's rated transfer bandwidth
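
To make the gap concrete, here is a sketch that combines T = T_seek + T_rotation + T_transfer with the actual-bandwidth definition for the 512KB transfer at 128 MB/sec. The 7 ms seek time and 10,000 RPM are assumptions added for illustration, and 1MB is taken as 1000KB.

```python
def access_time_ms(seek_ms: float, rpm: float, size_kb: float,
                   bandwidth_mb_s: float) -> float:
    """T = T_seek + T_rotation + T_transfer, all in milliseconds."""
    rotation_ms = 60_000 / rpm / 2                           # average: half a rotation
    transfer_ms = size_kb / (bandwidth_mb_s * 1000) * 1000   # 1 MB = 1000 KB
    return seek_ms + rotation_ms + transfer_ms


def actual_bandwidth_mb_s(size_kb: float, total_ms: float) -> float:
    """Actual bandwidth = data size / actual completion time."""
    return (size_kb / 1000) / (total_ms / 1000)


# Assumed: 7 ms seek, 10,000 RPM; transfer of 512 KB at 128 MB/s rated bandwidth.
total = access_time_ms(seek_ms=7, rpm=10_000, size_kb=512, bandwidth_mb_s=128)
print(f"total time: {total:.2f} ms")
print(f"actual bandwidth: {actual_bandwidth_mb_s(512, total):.2f} MB/s (rated: 128 MB/s)")
```

Under these assumptions, seek and rotation take longer than the transfer itself, so the actual bandwidth is well below the rated 128 MB/sec.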

Block vs. sector in data transfer

  • Sector is the basic unit of hard disk drive
  • Block is the basic unit of file system
  • Block has 1 or more sectors (in this course, assuming one block = one sector)

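The hard disk itself has no notion of a block; the block is a file system concept. The file system reads data block by block, because reading sector by sector would be too slow, which is why the block was introduced as a logical unit.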

Effect of access pattern on reads and writes

  • Sequential operation:
    • May assume all sectors involved are on the same track
      – need to seek to the right track and rotate to the first sector (a single seek)
      – But no rotation/seeking needed afterward
  • Random operation:
    • May assume all sectors are on different tracks and sectors (one seek per access)

Example: 7 ms average seek, 10,000 RPM, 50 MB/sec transfer rate, 4KB/block

Sequential access of 10 MB:
– Completion time = 7 ms + (60 * 1000 / 10000 / 2) ms + (10 / 50 * 1000) ms = 7 + 3 + 200 = 210 ms
– Actual bandwidth = 10 MB / 210 ms = 47.62 MB/s

Random access of 10 MB:
– Number of blocks: 10 * 1000 / 4 = 2500 (assume 1 block = 1 sector)
– Completion time = 2500 * (7 + 3 + 4/50) ms = 25,200 ms = 25.2 s
– Actual bandwidth = 10 MB / 25.2 s = 0.397 MB/s
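
The two cases can be reproduced with a short script. This is a sketch under the same simplifying assumptions as the example (one seek plus one average rotation for the whole sequential run, one seek plus one average rotation per block for random access, 1MB = 1000KB); the function names are made up.

```python
def sequential_time_ms(size_mb: float, seek_ms: float, rpm: float,
                       rate_mb_s: float) -> float:
    """One seek + one average rotation, then a continuous transfer."""
    rotation_ms = 60_000 / rpm / 2
    return seek_ms + rotation_ms + size_mb / rate_mb_s * 1000


def random_time_ms(size_mb: float, block_kb: float, seek_ms: float, rpm: float,
                   rate_mb_s: float) -> float:
    """Every block pays its own seek, average rotation, and block transfer."""
    blocks = size_mb * 1000 / block_kb
    rotation_ms = 60_000 / rpm / 2
    transfer_ms = block_kb / 1000 / rate_mb_s * 1000
    return blocks * (seek_ms + rotation_ms + transfer_ms)


seq = sequential_time_ms(10, seek_ms=7, rpm=10_000, rate_mb_s=50)
rnd = random_time_ms(10, block_kb=4, seek_ms=7, rpm=10_000, rate_mb_s=50)
for label, t in (("sequential", seq), ("random", rnd)):
    print(f"{label}: {t:.1f} ms, {10 / (t / 1000):.3f} MB/s")
```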

Solid State Drive

Organization

SSD contains a number of flash memory chips

chip -> dies -> planes -> blocks -> pages (rows) -> cells

Characteristics

  • All electronic, made from flash memory
  • Limited lifetime: each cell can only be written a limited number of times
  • More expensive with less capacity than an HDD - 3 times or more expensive
  • Significantly better latency: no seek or rotational delay
  • Much better performance on random access (however, writes have much higher latency than reads)

Data Storage

  • Cells are made of floating-gate transistors: by applying a high positive/negative voltage to the control gate, electrons can be attracted to or repelled from the floating gate
    • State = 1, if no electrons in the floating gate
    • State = 0, if there are electrons (negative charges)
      – Electrons stuck there even when power is off
      – So state is retained
  • Data in an SSD is stored as a sequence of bits (‘01010…’), where each bit is given by whether the cell's floating gate holds electrons

Operations

Read

  • Electrons on the floating gate affect the threshold voltage for the floating gate transistor to conduct
  • Higher voltage needed when gate has electrons

Steps:

  1. Apply Vint (intermediate voltage)
  2. If current is detected, the gate has no electrons => bit = 1
  3. If no current, gate must have electrons => bit = 0
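
The read steps can be mimicked with a toy model. This is a deliberately simplified sketch: the three voltage values are hypothetical numbers chosen only so that the intermediate voltage sits between the two thresholds, and a real cell is an analog device rather than a boolean.

```python
# Hypothetical voltages (arbitrary units), for illustration only.
V_TH_EMPTY = 1.0    # threshold when the floating gate has no electrons
V_TH_CHARGED = 3.0  # higher threshold when the floating gate holds electrons
V_INT = 2.0         # intermediate read voltage, between the two thresholds


def read_cell(has_electrons: bool, v_applied: float = V_INT) -> int:
    """Apply the read voltage; current flows only if it exceeds the threshold."""
    threshold = V_TH_CHARGED if has_electrons else V_TH_EMPTY
    conducts = v_applied > threshold
    return 1 if conducts else 0  # current => no electrons => bit 1


def read_page(cells: list) -> list:
    """A page is the smallest readable unit, so all of its cells are read together."""
    return [read_cell(c) for c in cells]


print(read_page([False, True, True, False]))
```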

Page is the smallest unit that can be read

Write and Erase

  • Write: 1 => 0 (add electrons to the floating gate)
    • Apply high positive voltage (>> voltage for read) to the control gate
    • Attract electrons from channel to floating gate (through quantum tunneling)
    • Page is the smallest unit for write
  • Erase: 0 => 1 (remove the electrons from the floating gate)
    • Need to apply much higher negative voltage to the control gate
    • Get rid of electrons from floating gate
    • May stress surrounding cells (dangerous to do on individual pages)
    • Block is the smallest unit for erase

P/E cycle

P/E cycle: Data is written to cells (P) and then erased (E)
Every write and erase damages the oxide layer surrounding the floating gate to some extent

  • Page is the smallest unit for read and write (write is also called program, 1->0)
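  • Block is the smallest unit for erase (0->1) – i.e., make cells “empty” (i.e., no electrons). Why the block is the smallest erase unit: it follows from the SSD's physical structure, since the erase operation applies the high voltage to the whole block to pull the electrons out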
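
The page/block asymmetry can be illustrated with a toy model. This is a sketch, not how a real flash controller works: the class name, the tiny page and block sizes, and the erase counter used as a stand-in for P/E wear are all made up for illustration.

```python
class FlashBlock:
    """Toy flash block: program per page (1 -> 0 only), erase per block (all -> 1)."""

    def __init__(self, pages_per_block: int = 4, bits_per_page: int = 8):
        # An erased cell holds no electrons and reads as 1.
        self.pages = [[1] * bits_per_page for _ in range(pages_per_block)]
        self.erase_count = 0  # rough stand-in for P/E wear

    def read_page(self, page_no: int) -> list:
        return list(self.pages[page_no])

    def program_page(self, page_no: int, bits: list) -> None:
        """Write a whole page; programming can only flip bits from 1 to 0."""
        page = self.pages[page_no]
        if len(bits) != len(page):
            raise ValueError("must program a whole page")
        for old, new in zip(page, bits):
            if old == 0 and new == 1:
                raise ValueError("0 -> 1 requires erasing the whole block first")
        self.pages[page_no] = list(bits)

    def erase(self) -> None:
        """Erase resets every page in the block to all 1s."""
        for page in self.pages:
            page[:] = [1] * len(page)
        self.erase_count += 1


block = FlashBlock(pages_per_block=2, bits_per_page=4)
block.program_page(0, [1, 0, 1, 0])
block.erase()  # the only way to turn the 0s back into 1s
print(block.read_page(0), block.erase_count)
```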

Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!