This article introduces the concepts behind hard drive partitioning. I find this topic hard to understand because part of it is hardware and part of it is software (the file system), and most material introduces the pieces without putting them side by side, so the reader is left with only a vague picture of several concepts. For example, hard disk partitions have types and file systems have types; what is the difference between the two? What is the difference between a hard drive's sector size and a file system's block size? This article tries to go a little deeper: starting from basic principles, it introduces each concept, what it does, and why it exists.

Understanding hard disks

A hard drive in Linux is a block device, which is used to store data. You enter data into the hard drive, and the hard drive stores it in a location for you. The next time you need it, you read it out from that location.

So given a location, how does a hard drive go about finding data in that location?

This starts with the structure of the hard drive. (Most machines now use SSDs, but much of the available material is still based on mechanical hard drives, so CHS addressing is used as the example here.) A hard drive is made up of several platters, each of which can hold data on both its top and bottom surfaces.

So the question becomes: how do we identify a position on several circular surfaces? First, how many parameters does it take to identify a position on a disc? Obviously two: the distance from the center, which selects a circle, plus an “angle”, which selects a point on that circle. In a hard disk, one more parameter is added to select which surface is meant.

These parameters are called:

  • Cylinder/Track: the distance from the center. A track is one circle on one surface; the tracks at the same radius on all surfaces together form a cylinder.
  • Head: selects the surface. The head is the physical device that reads and writes data, and there is one per surface. While the drive is running, the platters spin and the arm moves the heads to the cylinder being read.
  • Sector: the two parameters above determine one circle, and the sector number determines which segment of that circle is meant.

This is how CHS addressing works.
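
To make the three parameters concrete, here is a small sketch of how a CHS triple maps to a single linear sector number. The geometry of 255 heads and 63 sectors per track is an assumption chosen for illustration (it is a common conventional geometry, not something read from the disk in this article), and the function names are made up for this example.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder=255, sectors_per_track=63):
    """Map a (cylinder, head, sector) triple to a linear sector number.

    Cylinders and heads count from 0, sectors within a track count from 1.
    The default geometry is an assumption for illustration only.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range for this geometry")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder=255, sectors_per_track=63):
    """Inverse of chs_to_lba for the same geometry."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1
```

With this geometry the very first sector of the disk is cylinder 0, head 0, sector 1, and moving to the next head skips one whole track of 63 sectors.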

It is also clear from this that the unit of hard drive storage is the sector. In fact, partitioning software also works in terms of “from which sector to which sector” and does not involve you in tracks and heads, which the hardware itself uses for addressing.

Using fdisk we can see how many sectors the drive has and how big a sector is. For decades almost all hard drives used 512-byte sectors; newer “Advanced Format” drives use 4096-byte physical sectors but often still present 512-byte logical sectors to the operating system.

$ fdisk -l
Disk /dev/sda: 64 GiB, 68719476736 bytes, 134217728 sectors
Disk model: VBOX HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x86ae6277
 
Device     Boot   Start      End  Sectors  Size Id Type
/dev/sda1          2048  4542463  4540416  2.2G 82 Linux swap / Solaris
/dev/sda2  *    4542464 35999743 31457280   15G 83 Linux
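
The numbers in this output can be cross-checked with a little arithmetic. The sketch below simply re-derives the sizes from the sector counts printed above; the constant and function names are invented for this example.

```python
SECTOR_SIZE = 512            # bytes, from the "Sector size" line
TOTAL_SECTORS = 134217728    # from the "Disk /dev/sda" line


def sectors_to_bytes(sectors, sector_size=SECTOR_SIZE):
    """Size in bytes of a run of sectors."""
    return sectors * sector_size


def partition_sectors(start, end):
    """Number of sectors in a partition; fdisk's End column is inclusive."""
    return end - start + 1


disk_bytes = sectors_to_bytes(TOTAL_SECTORS)
sda1_sectors = partition_sectors(2048, 4542463)
sda2_sectors = partition_sectors(4542464, 35999743)

print(disk_bytes, disk_bytes / 2**30)                 # bytes and GiB for the whole disk
print(sda1_sectors, sectors_to_bytes(sda1_sectors) / 2**30)
print(sda2_sectors, sectors_to_bytes(sda2_sectors) / 2**30)
```

The Size column of fdisk is just the Sectors column multiplied by the sector size and rounded for display.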

Why do we need partitions?

Now we know that, given a sector location, the drive can read or write data. So why do we need partitions? For the following reasons:

  • Isolating file system corruption (not putting all the eggs in one basket). A file system has to be created on the device before the OS can read and write files on it. The file system is a map of the storage device that records what is stored in each block (inodes, blocks). If the file system's metadata is damaged, none of the data in that file system may be readable, but other partitions are unaffected.

  • Improving storage utilization. A file occupies at least one block, so if the block is too large a lot of space is wasted. For example, if the block size is 4k and all the stored files are 1k, then 3/4 of the space is wasted. If the block is too small, performance suffers, because the kernel moves data in units of blocks. We can therefore set aside a partition and build a file system with a small block size on it (1k is the minimum for ext2/ext3/ext4) to store these small files exclusively.

  • Limiting file growth. A cron job that writes too many logs should not be able to fill the disk and cause every other process to hang. A file cannot grow beyond its file system into another partition, so partitioning lets us cap the space available to specific processes.
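
The storage-utilization point in the list above can be checked with a few lines of arithmetic. This is a simplified model that only counts data blocks and ignores inodes and other metadata; the function names are made up for the example.

```python
def allocated_bytes(file_size, block_size):
    """Space taken by a file that must occupy whole blocks (at least one)."""
    blocks = max(1, -(-file_size // block_size))   # integer ceiling division
    return blocks * block_size


def wasted_fraction(file_sizes, block_size):
    """Fraction of the allocated space that holds no file data."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    return (allocated - used) / allocated


small_files = [1024] * 100                 # one hundred 1k files
print(wasted_fraction(small_files, 4096))  # waste with 4k blocks
print(wasted_fraction(small_files, 1024))  # waste with 1k blocks
```

With 1k files, 4k blocks leave three quarters of the allocated space empty, while 1k blocks leave none.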

Based on this, we can give different kinds of content their own partitions, for example a separate partition for user programs under /usr and another for /home.

What is the nature of partitioning?

Different partitions still live on the same drive; partitioning simply manages its sectors in groups. So where is this grouping information stored?

The answer is the first sector of the drive. This sector is also the first place the system reads when booting (in the BIOS-based boot process). As mentioned earlier, a sector is 512 bytes, so what is in these 512 bytes?

In Linux everything is a file, and the hard disk is a file too, named /dev/sda (this is the naming for SCSI and SATA disks; an IDE disk would be /dev/hda, see here for naming and numbering (https://www.tldp.org/HOWTO/Partition/devices.html)). This means we can simply copy the first 512 bytes of that “file” (reading the device normally requires root).

$ dd if=/dev/sda of=mbr.bin bs=512 count=1
1+0 records in
1+0 records out
512 bytes copied, 0.000308004 s, 1.7 MB/s

You can then look at the contents of this file using the xxd command that comes with Vim.

$ xxd mbr.bin
00000000: eb63 9010 8ed0 bc00 b0b8 0000 8ed8 8ec0  .c..............
00000010: fbbe 007c bf00 06b9 0002 f3a4 ea21 0600  ...|.........!..
00000020: 00be be07 3804 750b 83c6 1081 fefe 0775  ....8.u........u
00000030: f3eb 16b4 02b0 01bb 007c b280 8a74 018b  .........|...t..
00000040: 4c02 cd13 ea00 7c00 00eb fe00 0000 0000  L.....|.........
00000050: 0000 0000 0000 0000 0000 0080 0100 0000  ................
00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270  ...........t...p
00000070: 7402 b280 ea79 7c00 0031 c08e d88e d0bc  t....y|..1......
00000080: 0020 fba0 647c 3cff 7402 88c2 52be 057c  . ..d|<.t...R..|
00000090: b441 bbaa 55cd 135a 5272 3d81 fb55 aa75  .A..U..ZRr=..U.u
000000a0: 3783 e101 7432 31c0 8944 0440 8844 ff89  7...t21..D.@.D..
000000b0: 4402 c704 1000 668b 1e5c 7c66 895c 0866  D.....f..\|f.\.f
000000c0: 8b1e 607c 6689 5c0c c744 0600 70b4 42cd  ..`|f.\..D..p.B.
000000d0: 1372 05bb 0070 eb76 b408 cd13 730d 5a84  .r...p.v....s.Z.
000000e0: d20f 83de 00be 857d e982 0066 0fb6 c688  .......}...f....
000000f0: 64ff 4066 8944 040f b6d1 c1e2 0288 e888  d.@f.D..........
00000100: f440 8944 080f b6c2 c0e8 0266 8904 66a1  .@.D.......f..f.
00000110: 607c 6609 c075 4e66 a15c 7c66 31d2 66f7  `|f..uNf.\|f1.f.
00000120: 3488 d131 d266 f774 043b 4408 7d37 fec1  4..1.f.t.;D.}7..
00000130: 88c5 30c0 c1e8 0208 c188 d05a 88c6 bb00  ..0........Z....
00000140: 708e c331 dbb8 0102 cd13 721e 8cc3 601e  p..1......r...`.
00000150: b900 018e db31 f6bf 0080 8ec6 fcf3 a51f  .....1..........
00000160: 61ff 265a 7cbe 807d eb03 be8f 7de8 3400  a.&Z|..}....}.4.
00000170: be94 7de8 2e00 cd18 ebfe 4752 5542 2000  ..}.......GRUB .
00000180: 4765 6f6d 0048 6172 6420 4469 736b 0052  Geom.Hard Disk.R
00000190: 6561 6400 2045 7272 6f72 0d0a 00bb 0100  ead. Error......
000001a0: b40e cd10 ac3c 0075 f4c3 0000 0000 0000  .....<.u........
000001b0: 0000 0000 0000 0000 7762 ae86 0000 0004  ........wb......
000001c0: 0104 82fe c2ff 0008 0000 0048 4500 80fe  ...........HE...
000001d0: c2ff 83bb c1bb 0050 4500 0000 e001 0000  .......PE.......
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.

The contents can be divided into 4 parts:

  1. Bytes 1-440 (440 bytes in total): the boot code that the BIOS loads and executes. This is actually very interesting, and those who are curious can disassemble it and have a look. To start the system, code has to be loaded into memory, but loading code needs a running system. That is why this process is called booting, from “pulling oneself up by one’s bootstraps”.
  2. Bytes 441-446 (6 bytes in total): the MBR disk signature. The first 4 bytes, 7762 ae86, are the identifier that fdisk prints as 0x86ae6277 (the value is stored little-endian); the remaining 2 bytes are normally zero.
  3. Bytes 447-510 (64 bytes in total): the partition table, 4 entries of 16 bytes each.
  4. Bytes 511-512 (2 bytes in total): fixed to 55 aa, the boot signature that marks the sector as a valid MBR.
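
The four parts listed above can be cut out of the dumped sector with plain slicing. The sketch below uses zero-based byte offsets (so the boot code is bytes 0 to 439), and it splits the 6-byte signature area into the 4-byte disk identifier and the 2 bytes that follow it. The function name and dictionary keys are invented for this example.

```python
def split_mbr(sector):
    """Split a 512-byte MBR sector into its parts (zero-based offsets)."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    return {
        "boot_code": sector[0:440],
        "disk_signature": sector[440:444],
        "reserved": sector[444:446],
        # four 16-byte partition entries, starting at offset 446 (0x1be)
        "partition_entries": [sector[446 + 16 * i: 462 + 16 * i] for i in range(4)],
        "boot_signature": sector[510:512],
    }


def disk_identifier(parts):
    """The identifier fdisk prints, decoded from the little-endian signature."""
    return int.from_bytes(parts["disk_signature"], "little")
```

To try it on the file created with dd earlier, read mbr.bin in binary mode, pass the 512 bytes to split_mbr, and print hex(disk_identifier(parts)); it should agree with the "Disk identifier" line of fdisk -l.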

The partition table occupies offsets 0x1be through 0x1fd of the dump:

000001b0: .... .... .... .... .... .... .... 0004  ................
000001c0: 0104 82fe c2ff 0008 0000 0048 4500 80fe  ...........HE...
000001d0: c2ff 83bb c1bb 0050 4500 0000 e001 0000  .......PE.......
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 ....  ................

  • Partition 1: 0004 0104 82fe c2ff 0008 0000 0048 4500
  • Partition 2: 80fe c2ff 83bb c1bb 0050 4500 0000 e001
  • Partition 3: 0000 0000 0000 0000 0000 0000 0000 0000
  • Partition 4: 0000 0000 0000 0000 0000 0000 0000 0000

According to fdisk -l above, this machine has only two partitions, so entries 3 and 4 are empty. What is recorded in these 16 bytes? Let’s take the 2nd entry as an example.

80fe c2ff 83bb c1bb 0050 4500 0000 e001

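Byte 0, 80, is the boot flag:

  • 80 This partition can be used for system booting.
  • 00 This partition cannot be used for system booting.

Bytes 1-3, fe c2 ff, are the CHS address (described above) at which this partition starts. The three bytes are packed as follows:

  • fe Head: the first byte is the head number, 0xfe = 254;
  • c2 Sector: the low 6 bits of the second byte are the sector number, 0xc2 & 0x3f = 2; its top 2 bits are the high bits of the cylinder number;
  • ff Cylinder: the third byte holds the low 8 bits of the cylinder number, which together with the 2 bits above gives 0x3ff = 1023.

Bytes 5-7, bb c1 bb, are the CHS address at which this partition ends, packed the same way.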
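
The CHS fields are packed rather tightly, so a short decoding sketch may help. In the standard MBR layout the first of the three bytes is the head, the low 6 bits of the second byte are the sector, and the cylinder is 10 bits wide: its top 2 bits sit in the top of the second byte and its low 8 bits are the third byte. The function name is made up for this example.

```python
def decode_chs(raw):
    """Decode the 3-byte packed CHS field of an MBR partition entry.

    Returns (cylinder, head, sector).
    """
    if len(raw) != 3:
        raise ValueError("a packed CHS field is exactly 3 bytes")
    head = raw[0]
    sector = raw[1] & 0x3F                       # low 6 bits
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]   # 2 high bits + 8 low bits
    return cylinder, head, sector


print(decode_chs(bytes.fromhex("fec2ff")))   # start of partition 2
print(decode_chs(bytes.fromhex("bbc1bb")))   # end of partition 2
```

Because the cylinder has only 10 bits, these fields cannot describe positions on a modern disk accurately, which is one reason tools rely on the sector-number fields of the entry instead.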

Byte 4, which sits between the start and end addresses, is the partition type. In this case it is 83, which means the type is Linux.

All types can be listed in fdisk with the l command.

Command (m for help): l
 
 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden or  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi ea  Rufus alignment
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         eb  BeOS fs
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ee  GPT
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        ef  EFI (FAT-12/16/
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f0  Linux/PA-RISC b
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f1  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f4  SpeedStor
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      f2  DOS secondary
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fb  VMware VMFS
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fc  VMware VMKCORE
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fd  Linux raid auto
1c  Hidden W95 FAT3 75  PC/IX           bc  Acronis FAT32 L fe  LANstep
1e  Hidden W95 FAT1 80  Old Minix       be  Solaris boot    ff  BBT

In practice, this partition type is not very useful on Linux: whether the partition holds ext2, ext3 or another Linux file system, the type is 83. Different operating systems interpret this byte differently. Windows, for example, uses it to distinguish between partition types, which is why FAT32 and NTFS, the common Windows file systems, each have their own values in this table. In the end the type byte is just a shared label whose interpretation is up to the operating system, and different operating systems installed on the same hard drive may even read it differently. Take 0x07: OS/2 treats it as an HPFS partition, while Windows treats it as an NTFS partition.

It is important to note that the type byte has no intrinsic connection to the file system. Since Linux doesn’t care about it, I can build any file system on a partition regardless of its type. I can even overwrite the type byte while the system is running. For example, changing the type of the current partition to FAT12 causes no problem at all:

Device     Boot    Start      End  Sectors  Size Id Type
/dev/sda1           2048  4542463  4540416  2.2G 82 Linux swap / Solaris
/dev/sda2  *     4542464 35999743 31457280   15G  1 FAT12
/dev/sda3       35999744 38096895  2097152    1G 83 Linux

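Bytes 8-11: 0050 4500. The logical block address (LBA), that is, the absolute sector number, of the first sector of the partition. The value is stored little-endian, so it reads 0x00455000 = 4542464, matching the Start column of fdisk.

Bytes 12-15: 0000 e001. The total number of sectors in this partition, again little-endian: 0x01e00000 = 31457280, matching the Sectors column.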
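
Putting the fields together, a whole 16-byte entry can be unpacked in one call. This is a minimal sketch using Python's struct module; the leading "<" selects little-endian byte order without padding, which is how the two 4-byte numbers are stored. The function name and dictionary keys are invented for this example.

```python
import struct


def parse_entry(entry):
    """Unpack one 16-byte MBR partition entry into named fields."""
    if len(entry) != 16:
        raise ValueError("a partition entry is exactly 16 bytes")
    # boot flag, CHS start (3 bytes), type, CHS end (3 bytes), first LBA, sector count
    boot, chs_start, ptype, chs_end, first_lba, count = struct.unpack("<B3sB3sII", entry)
    return {
        "bootable": boot == 0x80,
        "type": ptype,
        "chs_start": chs_start,
        "chs_end": chs_end,
        "first_lba": first_lba,
        "sector_count": count,
    }


entry2 = bytes.fromhex("80fec2ff83bbc1bb005045000000e001")
print(parse_entry(entry2))
```

The first_lba and sector_count values printed for this entry should match the Start and Sectors columns that fdisk shows for /dev/sda2.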

Limit of MBR partitioning: as we can see, 4 bytes hold the absolute address of a partition's first sector and 4 bytes hold its number of sectors, so the maximum disk size that an MBR partition table can address is:

512 bytes   * 0xff ff ff ff                + 512 bytes   * 0xff ff ff ff
sector size   largest first-sector address   sector size   largest sector count of the last partition

That is, 512 * (2^32 - 1) * 2, which is 4 TiB - 1 KiB.

However, to actually reach 4 TiB under this scheme, the last partition must start just below the 2 TiB mark and be about 2 TiB long. If a user has a 4 TiB drive and wants to divide it evenly into 4 partitions of 1 TiB each, it won’t work, because some of the partitions would have to start at or beyond the 2 TiB mark, which the 4-byte start address cannot express. This would confuse many users, so MBR is usually simply described as supporting up to 2 TiB.
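
The limit is easy to reproduce. The sketch below computes it from the two 4-byte fields and then checks the "four partitions of 1 TiB" example; as a simplification it ignores the sector taken by the MBR itself, and the function names are made up for illustration.

```python
TIB = 2**40
MAX_U32 = 2**32 - 1   # largest value a 4-byte field can hold


def mbr_limits(sector_size=512):
    """Return (largest start offset, largest partition length, total) in bytes."""
    largest_start = sector_size * MAX_U32
    largest_length = sector_size * MAX_U32
    return largest_start, largest_length, largest_start + largest_length


def fits_in_mbr(start_bytes, length_bytes, sector_size=512):
    """Can a partition with this start and length be written into an MBR entry?"""
    return (start_bytes // sector_size <= MAX_U32
            and length_bytes // sector_size <= MAX_U32)


print(mbr_limits())
# Four 1 TiB partitions laid out back to back on a 4 TiB disk.
print([fits_in_mbr(i * TIB, TIB) for i in range(4)])
```

The first two partitions fit, but the third would start at exactly 2 TiB, which is sector number 2^32, one more than the field can hold.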

Primary and extended partitions

As you can also see here, the partition table is 64 bytes in total and each partition entry needs 16 bytes, so there can be 4 partitions at most. When I first used a computer it ran Windows, and I never understood why the “local disks” were always C, D, E and F. It turns out that the quick-partition mode of the partitioning software splits the hard disk evenly into 4 partitions by default.

The partition table limits us to 4 partitions. What if we want more?

Remember what a file system does when a file needs more blocks than the inode has room to address directly? The inode stores the address of a block whose content is itself a list of addresses of the real data blocks. The same principle of indirection is used here. We can create a primary partition of type Extended (type 5), and each logical partition inside it is preceded by a small record, laid out like the MBR's partition table, that describes the logical partition and holds the address of the next such record.

Logical partitions must be contiguous (obviously, since they all live inside the one extended partition), but primary partitions need not be. Other than that, logical and primary partitions are used in the same way. Logical partitions can also be used to boot the system.
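
The chain of logical partitions is essentially a linked list on disk, and walking it takes only a short loop. The sketch below works on a disk image held in memory and assumes the conventional layout: each link is a sector shaped like an MBR, whose first entry gives the logical partition (start relative to that sector) and whose second entry points to the next link (start relative to the beginning of the extended partition). All names are invented for this example, and real tools handle many more corner cases.

```python
import struct

SECTOR = 512
EXTENDED_TYPES = {0x05, 0x0F, 0x85}   # type bytes that mean "extended"


def read_entries(image, lba):
    """Return the four (type, start, count) entries of the sector at this LBA."""
    sector = image[lba * SECTOR:(lba + 1) * SECTOR]
    if sector[510:512] != b"\x55\xaa":
        raise ValueError(f"no boot signature at LBA {lba}")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i: 462 + 16 * i]
        _, _, ptype, _, start, count = struct.unpack("<B3sB3sII", raw)
        entries.append((ptype, start, count))
    return entries


def logical_partitions(image, extended_start):
    """Yield (absolute first LBA, sector count) for each logical partition."""
    link_lba = extended_start
    seen = set()                      # guard against a corrupted, looping chain
    while link_lba not in seen:
        seen.add(link_lba)
        first, second = read_entries(image, link_lba)[:2]
        ptype, rel_start, count = first
        if ptype != 0:
            yield link_lba + rel_start, count
        next_type, next_rel, _ = second
        if next_type not in EXTENDED_TYPES or next_rel == 0:
            break                     # no further link: end of the list
        link_lba = extended_start + next_rel
```

To use it on a real disk you would first find the entry of type Extended in the MBR and pass its first sector number as extended_start.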

This introduction should answer most readers’ questions (at least it answered many of mine). For anything deeper, you will have to search for more detailed material using what is covered here as a starting point.

Sector Size and Block Size

You should already know something about this from the earlier sections. The sector is a hardware concept of the disk; traditionally sectors are 512 bytes, although newer drives may use 4096-byte physical sectors. The block, by contrast, is a logical concept. The two are still easy to confuse in some scenarios. I did some research on the subject, and I am writing it down here to save time for anyone who has the same questions in the future.

The concept of sector size means much the same thing everywhere, but block has different meanings in different contexts.

The first is the file system block, which determines the unit of space used to store a file. The reason is simple: the file system addresses storage in blocks, so if the block size is 4k, a file containing only 1k of data still takes up 4k.

When a file system is created, the block size and the number of inodes are chosen automatically unless specified:

$ mkfs -t ext2 /dev/sda5
mke2fs 1.45.3 (14-Jul-2019)
Creating filesystem with 131072 1k blocks and 32768 inodes
Filesystem UUID: eac2bf41-3564-4f66-8740-e574593247fa
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729
 
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

Block in IO: IO is also done in blocks, but this block is not necessarily the block size of the file system, nor the size of a sector. It can be smaller than a sector, but that is wasteful: the hard disk writes a whole 512-byte sector at a time, so with an IO block of 256 bytes, filling one sector can take two physical write operations to the same sector. In addition, writing to disk goes through system calls, which copy data between user space and kernel space, also in blocks, so smaller blocks mean more system calls. We can use the madvise and posix_fadvise system calls to give the kernel hints about our access pattern.

Here is the effect on throughput when I use dd to read 1000 blocks from the same hard disk with different block sizes:

$ dd if=/dev/sda of=dump.bin bs=200 count=1000
1000+0 records in
1000+0 records out
200000 bytes (200 kB, 195 KiB) copied, 0.0111036 s, 18.0 MB/s
$ dd if=/dev/sda of=dump.bin bs=512 count=1000
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.00905283 s, 56.6 MB/s
$ dd if=/dev/sda of=dump.bin bs=1024 count=1000
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0061073 s, 168 MB/s
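
A similar experiment can be run without root on an ordinary file. The sketch below reads a whole file with one read system call per block and reports the time taken; the function name is made up, and because a regular file is usually served from the page cache, the timings mostly reflect system call overhead rather than the disk itself.

```python
import os
import time


def timed_read(path, block_size):
    """Read the file in block_size chunks; return (bytes read, seconds taken)."""
    total = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)   # one read(2) call per block
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    return total, time.perf_counter() - start
```

Calling timed_read on the same file with block sizes of 200, 512 and 1024 reads the same number of bytes each time, so only the number of system calls changes, which makes the comparison a little fairer than fixing count=1000 in dd.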

But IO is a very complex subject that cannot be covered in a few words. I recommend the book Linux System Programming, which devotes four chapters to IO-related topics.

In addition, whenever you see the word block, pay attention to the context. For example, the ls -s command counts blocks of 1024 bytes, while the Blocks field of stat counts blocks of 512 bytes.

$ ls -s a.txt
4 a.txt
$ stat a.txt
  File: a.txt
  Size: 6               Blocks: 8          IO Block: 4096   regular file
Device: 802h/2050d      Inode: 656578      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2019-12-01 09:49:27.651029032 +0000
Modify: 2019-12-01 09:51:26.508134896 +0000
Change: 2019-12-01 09:51:26.508134896 +0000
 Birth: 2019-12-01 09:49:27.651029032 +0000
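
The same numbers are available programmatically, which makes the different units easy to compare. This sketch uses os.stat on a Unix-like system, where st_blocks counts 512-byte units and st_blksize is the preferred IO block size; the function names and dictionary keys are invented for the example.

```python
import os


def ls_blocks(st_blocks):
    """Convert 512-byte units to the 1024-byte units shown by ls -s, rounding up."""
    return (st_blocks + 1) // 2


def block_report(path):
    """Summarise a file's size and allocation in the units discussed above."""
    st = os.stat(path)
    return {
        "size": st.st_size,                 # bytes of file content
        "blocks_512": st.st_blocks,         # what stat shows as "Blocks"
        "allocated": st.st_blocks * 512,    # bytes actually allocated on disk
        "ls_s": ls_blocks(st.st_blocks),    # what ls -s would show
        "io_block": st.st_blksize,          # what stat shows as "IO Block"
    }
```

For the a.txt above, 8 blocks of 512 bytes and 4 blocks of 1024 bytes both describe the same 4096 bytes on disk, which is one file system block holding a 6-byte file.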

I recommend practicing partitioning with the relevant tools, preferably inside a virtual machine so that you don't have to worry about messing up the host. Commands to play with:

  • xxd (provided by vim)
  • fdisk
  • mount
  • grub
  • ss
  • dd