Anyone who has followed Bare Metal related projects will have seen how the boot process and rapid physical-server provisioning are implemented, usually by running a LiveOS to carry out the required actions. The Tinkerbell project uses LinuxKit as its LiveOS, while the Plunder project uses BOOTy. A few days ago @thebsdbox extracted part of BOOTy and published its core implementation as ginit, which makes the installation details much easier to understand. Let's take a look at this project today.

When you install CentOS, the installer usually boots with a kernel plus initramfs.img. The initramfs.img contains systemd, Anaconda, dracut, and some other components; systemd then uses its Target relationships (dependencies and ordering) to eventually invoke Anaconda.

Anaconda decides how to install itself by parsing the KickStart parameter in /proc/cmdline.

The ginit project shows the following.

  • Making initramfs.img
  • Making a RAW image from a Container image
  • Running a virtual machine with a RAW image and the Linux Kernel via QEMU
  • ginit automatically runs the entrypoint command in Container

Demonstration of the process

Creating a RAW image from a Container image

The RAW image will not contain a kernel, so let's take the Nginx Container as an example: extract the Entrypoint from the nginx:latest image, create a RAW image via dd, format it as ext4, mount it locally as a loop device, copy the Nginx image's filesystem to the mount point via docker export, and unmount the mount point. The resulting RAW image contains the full contents of the Nginx Container. Since it holds no kernel, it cannot boot on its own and is only used as input to the later steps.

The default Entrypoint for the Nginx Container is docker-entrypoint.sh, which is a script that does some parameter checking.

#!/bin/bash

echo "Lets build you a disk image!"
docker pull $1
ENTRYPOINT=$(docker inspect -f '{{.Config.Entrypoint}}' $1 | sed 's/[][]//g')
echo "Creating a 200MB Disk"
dd if=/dev/zero of=disk.img bs=1024k count=200
mkfs.ext4 -F disk.img
mkdir -p /tmp/disk
mount -t ext4 -o loop disk.img /tmp/disk/
echo "Converting $1 to disk image"
docker create --name exporter $1 null
docker export exporter | tar xv -C /tmp/disk
docker rm exporter
umount /tmp/disk
echo The command $ENTRYPOINT will start this container

Use ginit to make initramfs.img

Statically compile ginit; download and compile BusyBox, place the compiled ginit binary as init under the / path of the BusyBox install tree, archive it all via cpio, and compress it with gzip. Once all the steps are complete, copy the resulting initramfs.cpio.gz to the project path. The final initramfs contains BusyBox plus ginit.

# syntax=docker/dockerfile:experimental


# Build ginit as an init
FROM golang:1.17-alpine as dev
RUN apk add --no-cache git ca-certificates gcc linux-headers musl-dev
COPY . /go/src/github.com/thebsdbox/ginit/
WORKDIR /go/src/github.com/thebsdbox/ginit
ENV GO111MODULE=on
RUN --mount=type=cache,sharing=locked,id=gomod,target=/go/pkg/mod/cache \
    --mount=type=cache,sharing=locked,id=goroot,target=/root/.cache/go-build \
    CGO_ENABLED=1 GOOS=linux go build -a -ldflags "-linkmode external -extldflags '-static' -s -w" -o init
    

# Build Busybox
FROM gcc:10.1.0 as Busybox
RUN apt-get update; apt-get install -y cpio
RUN curl -O https://busybox.net/downloads/busybox-1.31.1.tar.bz2
RUN tar -xf busybox*bz2
WORKDIR busybox-1.31.1
RUN make defconfig; make LDFLAGS=-static CONFIG_PREFIX=./initramfs install

WORKDIR initramfs 
COPY --from=dev /go/src/github.com/thebsdbox/ginit/init .

# Package initramfs
RUN find . -print0 | cpio --null -ov --format=newc > ../initramfs.cpio 
RUN gzip ../initramfs.cpio
RUN mv ../initramfs.cpio.gz /

FROM scratch
COPY --from=Busybox /initramfs.cpio.gz .

Running the EntryPoint command in the Container through QEMU

At this point we have initramfs.img and the RAW image, but we are still missing a Linux kernel. A bootable bzImage can be downloaded directly from Ubuntu's netboot archive.

Now that all the preparations are done, we can run the virtual machine directly from QEMU, with Nginx in the RAW Image and ginit in the initramfs.

As mentioned earlier, the default Entrypoint for the Nginx Container is docker-entrypoint.sh, which does some parameter wrapping, so here I changed the entrypoint parameter to /usr/sbin/nginx.

qemu-system-x86_64 -nographic \
  -kernel ./linux \
  -append "entrypoint=/usr/sbin/nginx root=/dev/sda console=ttyS0" \
  -initrd ./initramfs.cpio.gz \
  -hda ./disk.img \
  -m 1G

The virtual machine's console is ttyS0, so the boot log can be viewed directly in the terminal that launched QEMU.

...
[    1.469920] rtc_cmos 00:00: setting system clock to 2022-03-05T06:36:19 UTC (1646462179)
[    1.525397] ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[    1.525579] ata1.00: 409600 sectors, multi 16: LBA48 
[    1.532980] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[    1.540741] scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[    1.545673] sd 0:0:0:0: [sda] 409600 512-byte logical blocks: (210 MB/200 MiB)
[    1.547063] sd 0:0:0:0: [sda] Write Protect is off
[    1.547515] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.548188] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.550227] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[    1.568178] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.578345] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    1.578736] cdrom: Uniform CD-ROM driver Revision: 3.20
[    1.582611] sr 1:0:0:0: Attached scsi generic sg1 type 5
[    1.595655] Freeing unused decrypted memory: 2040K
[    1.666044] Freeing unused kernel image memory: 2712K
[    1.666482] Write protecting the kernel read-only data: 22528k
[    1.669246] Freeing unused kernel image memory: 2008K
[    1.670507] Freeing unused kernel image memory: 1192K
[    1.742691] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    1.743002] Run /init as init process
INFO[0000] Folder created [dev] -> [/dev]          
INFO[0000] Folder created [proc] -> [/proc]        
INFO[0000] Folder created [sys] -> [/sys]          
INFO[0000] Folder created [tmp] -> [/tmp]          
INFO[0000] Mounted [dev] -> [/dev]                 
INFO[0000] Mounted [proc] -> [/proc]               
INFO[0000] Mounted [sys] -> [/sys]                 
INFO[0000] Mounted [tmp] -> [/tmp]                 
INFO[0000] Starting DHCP client                    
INFO[0000] Starting ginit                          
ERRO[0000] Error finding adapter [Link not found]  
[    2.209227] tsc: Refined TSC clocksource calibration: 2893.182 MHz
[    2.209573] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29b41aa25d4, max_idle_ns: 440795325238 ns
[    2.209984] clocksource: Switched to clocksource tsc
INFO[0002] Beginning provisioning process          
ERRO[0002] route ip+net: no such network interface 
INFO[0002] Folder created [root] -> [/mnt]         
[    3.902861] random: fast init done
[    3.912319] EXT4-fs (sda): recovery complete
[    3.913757] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
[    3.914463] ext4 filesystem being mounted at /mnt supports timestamps until 2038 (0x7fffffff)
INFO[0002] Mounted [root] -> [/mnt]                
INFO[0002] Mounted [dev] -> [/mnt/dev]             
INFO[0002] Mounted [proc] -> [/mnt/proc]           
INFO[0002] Starting Shell                          
INFO[0002] Waiting for command to finish...        
/ #

The /init in [ 1.743002] Run /init as init process is the ginit binary we compiled above; the INFO/ERRO lines interleaved with the kernel messages in the boot log are ginit's own output.

ginit does several things: it creates the necessary paths and the corresponding devices, starts a DHCP Client to obtain an IP address, mounts the RAW image at /mnt, runs the program given by the entrypoint parameter via chroot (in this case /usr/sbin/nginx), and finally drops the user into a shell. We can see what processes are currently running with the ps command.

/ # ps -ef |grep -v '\['
PID   USER     TIME  COMMAND
    1 0         0:01 /init
  178 0         0:00 nginx: master process /usr/sbin/nginx
  179 0         0:00 /bin/sh
  180 101       0:00 nginx: worker process
  193 0         0:00 ps -ef
/ # df 
Filesystem           1K-blocks      Used Available Use% Mounted on
devtmpfs                497020         4    497016   0% /dev
tmpfs                   502392         0    502392   0% /tmp
/dev/sda                181984    150940     16708  90% /mnt
devtmpfs                497020         4    497016   0% /mnt/dev
/ # ls /mnt/docker-entrypoint.sh 
/mnt/docker-entrypoint.sh
/ # ls /mnt/usr/sbin/nginx
/mnt/usr/sbin/nginx
/ # ls -hl /init
-rwxr-xr-x    1 0        0           3.4M Mar  5 04:20 /init

Now that we can run a Container image's entrypoint on the Linux kernel with just an initramfs, consider a Bare Metal scenario: instead of Nginx, build Docker (or containerd) into the image and expose it, so that the physical server acts as a Docker server. A standby machine then acts as the Docker client, connects to the physical server, runs a specified container, and that container completes the OS installation on the physical server.

ginit implementation

Create system device and mount it

The DefaultMounts and DefaultDevices define some mandatory devices such as /dev/null, /dev/random, and /dev/urandom, and mount points such as /dev, /proc, /tmp, and /sys.

urandom := Device{
	CreateDevice: false,
	Name:  "urandom",
	Path:  "/dev/urandom",
	Mode:  syscall.S_IFCHR,
	Major: 1,
	Minor: 9,
}
dev := Mount{
	CreateMount: false,
	EnableMount: false,
	Name:        "dev",
	Source:      "devtmpfs",
	Path:        "/dev",
	FSType:      "devtmpfs",
	Flags:       syscall.MS_MGC_VAL,
	Mode:        0777,
}
m.Mount = append(m.Mount, dev)
//cmd.Execute()
m := realm.DefaultMounts()
d := realm.DefaultDevices()
dev := m.GetMount("dev")
dev.CreateMount = true
dev.EnableMount = true

proc := m.GetMount("proc")
proc.CreateMount = true
proc.EnableMount = true

tmp := m.GetMount("tmp")
tmp.CreateMount = true
tmp.EnableMount = true

sys := m.GetMount("sys")
sys.CreateMount = true
sys.EnableMount = true

// Create all folders
m.CreateFolder()
// Ensure that /dev is mounted (first)
m.MountNamed("dev", true)

// Create all devices
d.CreateDevice()

// Mount any additional mounts
m.MountAll()
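
Each Device entry carries the major/minor numbers that identify it to the kernel, and creating the node ultimately comes down to mknod(2). The following is a hedged sketch of that idea (not ginit's actual CreateDevice code), turning a definition like urandom's 1:9 into a real device node:

```go
package main

import (
	"fmt"
	"syscall"
)

// mkdev packs major/minor into the legacy Linux device-number
// encoding (major<<8 | minor), which is valid for small numbers
// like urandom's 1:9.
func mkdev(major, minor uint32) int {
	return int(major<<8 | minor)
}

// createCharDevice creates a character device node, e.g.
// createCharDevice("/dev/urandom", 1, 9).
// Requires root privileges and a Linux build.
func createCharDevice(path string, major, minor uint32) error {
	return syscall.Mknod(path, syscall.S_IFCHR|0666, mkdev(major, minor))
}

func main() {
	// Print the packed device number for urandom (1:9).
	fmt.Printf("urandom dev number: %#x\n", mkdev(1, 9))
}
```

Inside an initramfs this is exactly why /dev must be populated (or devtmpfs mounted) before anything else: programs like the DHCP client expect /dev/urandom and friends to already exist.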

After the basic environment is prepared, start the DHCP Client and obtain an IP address.

log.Println("Starting DHCP client")
go realm.DHCPClient()

// HERE IS WHERE THE MAIN CODE GOES
log.Infoln("Starting ginit")
time.Sleep(time.Second * 2)

log.Infoln("Beginning provisioning process")

mac, err := realm.GetMAC()
if err != nil {
	log.Errorln(err)
	//realm.Shell()
}
fmt.Print(mac)

Now that the system environment and the network are ready, it is time to run the actual command. The command is obtained by parsing /proc/cmdline, which carries the arguments we passed via -append when creating the VM.

After parsing the root and entrypoint parameter values, root is mounted to the corresponding mount point via Mount, and entrypoint is run via chroot.
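
The project's actual ParseCmdLine implementation is not shown here, but a minimal sketch of such a parser (a hypothetical stand-in, not ginit's real code) could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// parseCmdLine splits a kernel command line such as
// "entrypoint=/usr/sbin/nginx root=/dev/sda console=ttyS0"
// into a key/value map; bare flags (no '=') map to an empty string.
func parseCmdLine(cmdline string) map[string]string {
	params := map[string]string{}
	for _, field := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(field, "=")
		params[key] = value
	}
	return params
}

func main() {
	p := parseCmdLine("entrypoint=/usr/sbin/nginx root=/dev/sda console=ttyS0")
	fmt.Println(p["root"], p["entrypoint"])
}
```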

stuffs, err := ParseCmdLine(CmdlinePath)
if err != nil {
	log.Errorln(err)
}
_, err = realm.MountRootVolume(stuffs["root"])
if err != nil {
	log.Errorf("Disk Error: [%v]", err)
}

cmd := exec.Command("/usr/sbin/chroot", []string{"/mnt", stuffs["entrypoint"]}...)
cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

err = cmd.Start()
if err != nil {
	log.Errorf("command error [%v]", err)
}
err = cmd.Wait()
if err != nil {
	log.Errorf("error [%v]", err)
}

realm.Shell()
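
ginit shells out to the chroot binary that BusyBox installed. An alternative sketch (an assumption on my part, not what ginit does) is to call the chroot(2) syscall directly from Go:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// runInRoot enters newRoot via the chroot(2) syscall and then runs the
// entrypoint there. Requires root privileges on Linux. Note that unlike
// exec'ing /usr/sbin/chroot, this confines the whole calling process.
func runInRoot(newRoot, entrypoint string) error {
	if err := syscall.Chroot(newRoot); err != nil {
		return fmt.Errorf("chroot: %w", err)
	}
	if err := os.Chdir("/"); err != nil {
		return err
	}
	cmd := exec.Command(entrypoint)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	// Demonstration only: with a bogus path this just reports the error
	// instead of actually confining the process.
	if err := runInRoot("/no/such/root", "/bin/sh"); err != nil {
		fmt.Println(err)
	}
}
```

Execing the external chroot binary, as ginit does, keeps the init process itself outside the new root, which is why it can still offer a BusyBox shell afterwards.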

After all programs are run, a Shell environment is provided to the user.

// Shell will Start a userland shell
func Shell() {
	// Shell stuff
	log.Println("Starting Shell")

	// TTY hack to support ctrl+c
	cmd := exec.Command("/usr/bin/setsid", "cttyhack", "/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr

	err := cmd.Start()
	if err != nil {
		log.Errorf("Shell error [%v]", err)
	}
	log.Printf("Waiting for command to finish...")
	err = cmd.Wait()
	if err != nil {
		log.Errorf("Shell error [%v]", err)
	}
}

Summary

ginit is a minimal implementation that makes it easy to quickly understand what an init does; replacing ginit with systemd achieves the same thing, but it's easy to get lost in systemd's piles of Target dependencies. While searching for information I also found that https://github.com/QuentinPerez/busygox does something similar and can serve as a reference.