Recently I wrote a program that recursively collects folder statistics, and it got stuck while measuring the size of one folder.

I used find -type l to search for soft links and found two links that reference each other recursively: a points to b’s folder while b points to a’s folder.

Looking at the source code, I realized that std::fs::metadata calls the stat() system call underneath, and stat() follows soft links.

I switched to symlink_metadata, which calls lstat() underneath and does not follow links. Skipping the soft links fixed the hang.
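The fix described above can be sketched roughly as follows. This is only a minimal illustration, not the original program: the function name dir_size is made up, and it assumes that "size" means the sum of len() over regular files.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the apparent size (in bytes) of every regular file under `root`.
/// `dir_size` is a hypothetical name used only for this sketch.
pub fn dir_size(root: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        // symlink_metadata (lstat) describes the link itself instead of following it.
        let meta = fs::symlink_metadata(&path)?;
        let file_type = meta.file_type();
        if file_type.is_symlink() {
            // Never descend through a link, so two links that point at
            // each other's folder cannot trap the walk.
            continue;
        } else if file_type.is_dir() {
            total += dir_size(&path)?;
        } else if file_type.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}
```

Calling dir_size(Path::new(".")) on a tree that contains mutually referencing links should return normally, because the links are skipped rather than followed.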

errno 40 ELOOP

On Linux, errno 40 (ELOOP) means “Too many levels of symbolic links”, i.e. too many soft links were encountered while resolving a path.

I thought the standard library would report an ELOOP error, but then Mr. Wildcat said that std::fs made a lot of sacrifices to be cross-platform.

(It seems that the std::fs API does not handle the ELOOP error code specially? I have never actually run into errno ELOOP myself, so I won’t speculate.)
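One way to check this is a small experiment with a link that points at itself. The sketch below assumes Linux, where ELOOP is 40, and relies on io::Error::raw_os_error(), which exposes the errno of a failed system call. The constant and function names are made up for illustration.

```rust
use std::fs;
use std::path::Path;

/// Linux value of ELOOP ("Too many levels of symbolic links").
/// Hard-coded here as an assumption, to avoid depending on the libc crate.
pub const ELOOP_LINUX: i32 = 40;

/// Returns true if following `path` with fs::metadata (stat) fails with ELOOP.
/// `fails_with_eloop` is a hypothetical helper name.
pub fn fails_with_eloop(path: &Path) -> bool {
    match fs::metadata(path) {
        Ok(_) => false,
        // raw_os_error() carries the errno of the failed system call, if any.
        Err(err) => err.raw_os_error() == Some(ELOOP_LINUX),
    }
}
```

For a link created with ln -s self_loop self_loop, fs::metadata should fail this way, while fs::symlink_metadata on the same path still succeeds because it never resolves the link.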

The standard library makes trade-offs for cross-platform purposes. One example is that is_symlink() on the result of metadata() always returns false, which is obviously not a great design for Linux systems.

This is because metadata()/stat() “eats” the soft link: it resolves the link and describes the target as if it were a normal file, so at that level the abstraction of a soft link does not exist.

So metadata().is_symlink() always returns false on Linux.

That’s why the Rust documentation is kind enough to emphasize that is_symlink() must be used together with symlink_metadata() to be meaningful.

The only way to know whether a file is a soft link is to use symlink_metadata()/lstat(), which does not follow soft links.

Of course, the man pages must hint at this too, but when I first read them it was a case of TL;DR and I didn’t read carefully…

In the value returned by symlink_metadata()/lstat(), is_symlink() and is_dir() are mutually exclusive: if one of them is true, the other is false.
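The difference between the two calls can be observed on a soft link that points to a folder. The following is a minimal sketch with made-up names (LinkView, link_view). It goes through file_type() so that it does not depend on Metadata::is_symlink() being available.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// (is_symlink, is_dir) for the same path, seen through stat and through lstat.
/// `LinkView` and `link_view` are hypothetical names for this sketch.
#[derive(Debug, PartialEq)]
pub struct LinkView {
    /// Result of fs::metadata (stat): the link is followed.
    pub followed: (bool, bool),
    /// Result of fs::symlink_metadata (lstat): the link is not followed.
    pub not_followed: (bool, bool),
}

pub fn link_view(path: &Path) -> io::Result<LinkView> {
    let followed = fs::metadata(path)?.file_type();
    let not_followed = fs::symlink_metadata(path)?.file_type();
    Ok(LinkView {
        followed: (followed.is_symlink(), followed.is_dir()),
        not_followed: (not_followed.is_symlink(), not_followed.is_dir()),
    })
}
```

For a link to a folder, followed should come out as (false, true) and not_followed as (true, false): through stat the link has disappeared, and through lstat the entry is a link and therefore not a folder.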

I have to say that the standard library lacks a lot of support for the various Linux extensions, and Metadata::is_symlink() is not expected to be stabilized until early 2022.

Since Metadata’s fields are all private, you can only transmute it or look for a Unix/Linux extension trait (something like MetadataExt).

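// Method 1: linux::fs::MetadataExt
let st_mode = std::os::linux::fs::MetadataExt::st_mode(&metadata);

// Method 2: transmute (relies on FileType's private layout, so it is not guaranteed to keep working)
let st_mode = unsafe { std::mem::transmute::<_, libc::mode_t>(metadata.file_type()) };

// Method 3: skip the standard library and call libc::stat or libc::lstat directly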
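Once st_mode is available, the file type can be decoded by hand. The sketch below takes the extension-trait route, using std::os::unix::fs::MetadataExt::mode() instead of the Linux-only st_mode(). The S_IF* constants are written out with their Linux values as an assumption, to avoid pulling in libc, and the helper names are made up.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

// File-type bits of st_mode on Linux (see inode(7)), written out by hand.
const S_IFMT: u32 = 0o170000; // mask that selects the file-type bits
const S_IFDIR: u32 = 0o040000; // directory
const S_IFLNK: u32 = 0o120000; // symbolic link

/// Raw st_mode from lstat, so a link is reported as a link.
/// All three function names are hypothetical helpers for this sketch.
pub fn raw_mode(path: &Path) -> io::Result<u32> {
    Ok(fs::symlink_metadata(path)?.mode())
}

pub fn mode_is_symlink(mode: u32) -> bool {
    (mode & S_IFMT) == S_IFLNK
}

pub fn mode_is_dir(mode: u32) -> bool {
    (mode & S_IFMT) == S_IFDIR
}
```

Because the file-type bits hold exactly one value at a time, mode_is_symlink and mode_is_dir can never both be true for the same mode.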

Why does the du command disagree with Metadata::len()?

Hard disk 4k alignment

For example, if a.txt contains only one character, both the stat command and fs::Metadata::len() report a size of 1.

However, du reports 4k, because the block size of a Linux ext4 file system is usually 4k.

You can think of 4k as the file system’s minimum allocation unit: every file occupies an integer multiple of 4k on disk, which seems to be called 4k alignment.

It is a bit like the memory layout of a struct: its size is padded up to a multiple of its alignment, which on a 64-bit CPU is often the 8-byte register size.

If you pass the block-size option to du, as in du --apparent-size --block-size 1, the result matches the stat command.

du --bytes (or du -b) is shorthand for du --apparent-size --block-size 1.
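Both numbers can also be read from Rust. The sketch below assumes a Unix target and uses std::os::unix::fs::MetadataExt::blocks(), which returns st_blocks counted in 512-byte units. The function name sizes is made up.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns (apparent size, allocated size), both in bytes.
/// `sizes` is a hypothetical name for this sketch.
pub fn sizes(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::symlink_metadata(path)?;
    // len() is st_size, the number that stat and du --apparent-size report.
    let apparent = meta.len();
    // st_blocks counts 512-byte units, whatever the file system block size is.
    // This is the space actually allocated, which plain du reports.
    let allocated = meta.blocks() * 512;
    Ok((apparent, allocated))
}
```

For a one-character a.txt on a typical ext4 setup, the first value is 1 and the second is usually 4096, although the allocated size depends on the file system and its options.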

Is /proc really zero size?

The du command is not lying: the three virtual file systems /dev, /proc and /sys really do take zero space on the hard disk (because they don’t exist on the hard disk at all).

Most files in these three folders show a size of zero in the stat command, but some, for example “/proc/bus/pci/00/01.2”, still report a size.

Another example is “/proc/config.gz”, which stores the compile-time parameters of the Linux kernel.

Many of the parameters are strings, so the length of the zcat output is also “variable”, i.e. not fixed.

So stat just reports what the length of /proc/config.gz would be if it were read at the current moment.

[w@ww repos]$ stat /proc/config.gz 
  File: /proc/config.gz
  Size: 58526           Blocks: 0          IO Block: 1024   regular file
Device: 0,21    Inode: 4026532079  Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-11-08 21:11:50.742480314 +0800
Modify: 2021-11-08 21:11:50.742480314 +0800
Change: 2021-11-08 21:11:50.742480314 +0800
 Birth: -
[w@ww repos]$ du /proc/config.gz 
0       /proc/config.gz
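To compare the size that stat reports with what a read actually returns, something like the following sketch could be used. The function name is made up, and it reads the whole file into memory, so it is only meant for small files.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

/// Returns (size reported by stat, number of bytes actually read).
/// `reported_vs_read` is a hypothetical name for this sketch.
pub fn reported_vs_read(path: &Path) -> io::Result<(u64, u64)> {
    let reported = fs::metadata(path)?.len();
    let mut buf = Vec::new();
    // read_to_end keeps reading until EOF, regardless of what st_size claimed.
    let read = fs::File::open(path)?.read_to_end(&mut buf)? as u64;
    Ok((reported, read))
}
```

For a regular file on disk the two numbers agree. For many files under /proc the reported size is 0 while the read still returns data, so the reported size should not be used to pre-size a buffer there.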