Why IPv6 is a difficult replacement for IPv4

The network layer is responsible for the two major functions of packet forwarding and routing, freeing the upper layers from the details of transporting packets between different hosts. As a network layer protocol, IP plays an extremely critical role on today's Internet, even though it provides only connectionless, unreliable service. In general, when we want to access services provided by another host, we reach the target host through its IP address, which is the only way a host can be reached by others on the Internet.
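
As a minimal sketch in Go, the snippet below resolves a hostname to the IPv4 and IPv6 addresses we ultimately use to reach it; the hostname here is just a placeholder:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Look up the IP addresses of a host; "example.com" is a placeholder.
	ips, err := net.LookupIP("example.com")
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		if ip.To4() != nil {
			fmt.Println("IPv4:", ip)
		} else {
			fmt.Println("IPv6:", ip)
		}
	}
}
```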

Why MAC addresses do not need to be globally unique

A MAC address (media access control address) is a unique identifier assigned to a network interface controller (NIC) and used as a network address within a network segment; every host with a network card has its own MAC address. The address is 48 bits long and is normally written as six bytes of two hexadecimal digits each, for example 6e:77:0f:b8:8b:6b. Because MAC addresses need to be unique, the IEEE assigns address blocks to device manufacturers.
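
As a small illustration, Go's standard library can parse such an address; for universally administered addresses, the first three bytes form the OUI block the IEEE assigned to the manufacturer (the address below is the example above, not a real vendor's):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Parse the 48-bit MAC address used as the example above.
	mac, err := net.ParseMAC("6e:77:0f:b8:8b:6b")
	if err != nil {
		panic(err)
	}
	fmt.Println("length in bits:", len(mac)*8) // 48

	// For universally administered addresses, the first three bytes are
	// the OUI block the IEEE assigned to the device's manufacturer.
	fmt.Printf("OUI: %02x:%02x:%02x\n", mac[0], mac[1], mac[2])
}
```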

Why Clusters Need Overlay Networks

Engineers who know a little about computer networks or Kubernetes networking have probably heard of the Overlay Network. It is not a new technology but a computer network built on top of another network, a form of network virtualization that has been promoted in recent years by the evolution of cloud computing and virtualization. Because an overlay network is a virtual network built on top of another computer network, it cannot stand alone; the network an overlay depends on is called the underlay network, and the two concepts often appear in pairs.
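
VXLAN is one common way to build such an overlay: the original frame from the virtual network is wrapped in a small header and carried across the underlay inside a UDP datagram. The Go sketch below builds the 8-byte VXLAN header from RFC 7348; choosing VXLAN here is an illustrative assumption, since the text does not name a specific encapsulation:

```go
package main

import "fmt"

// vxlanHeader builds the 8-byte VXLAN header (RFC 7348) that an overlay
// prepends to the original Ethernet frame before shipping it across the
// underlay network inside a UDP datagram.
func vxlanHeader(vni uint32) []byte {
	h := make([]byte, 8)
	h[0] = 0x08 // "I" flag: the 24-bit VNI field is valid
	h[4] = byte(vni >> 16)
	h[5] = byte(vni >> 8)
	h[6] = byte(vni)
	return h
}

func main() {
	inner := []byte("stand-in for the original Ethernet frame")
	packet := append(vxlanHeader(42), inner...) // VNI 42 identifies the virtual network
	fmt.Printf("encapsulated payload: %d bytes\n", len(packet))
}
```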

Why system calls consume more resources

A system call is the way a computer program requests a service from the operating system kernel during execution, including hardware-related services, the creation and execution of new processes, and process scheduling. Anyone with a little knowledge of operating systems knows that system calls provide user programs with an interface to the operating system. The famous C library glibc wraps the system calls provided by the operating system in a well-defined interface, allowing engineers to develop upper-level applications directly with the functions the library exposes.
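
Although glibc is a C library, the same layering shows up elsewhere; the Go sketch below contrasts a raw Linux write(2) system call with the standard library's wrapped interface (Linux-only, since syscall numbers differ across platforms):

```go
package main

import (
	"os"
	"syscall"
	"unsafe"
)

func main() {
	msg := []byte("hello via raw syscall\n")

	// Raw system call: trap directly into the kernel with write(2).
	syscall.Syscall(
		syscall.SYS_WRITE,
		uintptr(1), // file descriptor 1: stdout
		uintptr(unsafe.Pointer(&msg[0])),
		uintptr(len(msg)),
	)

	// Wrapped interface: the standard library hides the syscall details,
	// much as glibc's wrappers do for C programs.
	os.Stdout.Write([]byte("hello via the library wrapper\n"))
}
```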

Why is the default page size for Linux 4KB?

We all know that Linux manages memory in pages: whether loading data from disk into memory or writing data from memory back to disk, the operating system operates in units of pages. Even if we write only one byte to disk, the entire page it belongs to has to be flushed. Linux supports both normal-sized memory pages and huge pages (Huge Pages). The default page size on most processors is 4KB; although some processors use 8KB, 16KB, or 64KB as the default, 4KB pages remain the mainstream default configuration, and besides the normal page size, different processors also support huge pages of various sizes.
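
A quick way to check this on your own machine, as a small Go sketch:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Report the memory page size the kernel exposes to user space;
	// on most Linux machines this prints 4096.
	fmt.Println("page size:", os.Getpagesize(), "bytes")
}
```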

Why databases should not use foreign keys

When we want to store data persistently, a relational database is often the safest choice, not only because today's relational databases are feature-rich and stable, but also because community support for them is very mature. In this article we will analyze an important concept in relational databases: the foreign key. In a relational database, a foreign key, also known as a relational key, is a set of columns that links two relational tables.
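
As a minimal sketch of what a foreign key enforces, the Go snippet below uses an in-memory SQLite database (the driver choice is an assumption for illustration); the insert fails because the referenced author does not exist:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption for this sketch
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(1) // one connection, so the in-memory DB and pragma persist

	// SQLite only enforces foreign keys when this pragma is on.
	db.Exec(`PRAGMA foreign_keys = ON`)
	db.Exec(`CREATE TABLE authors (id INTEGER PRIMARY KEY)`)
	db.Exec(`CREATE TABLE posts (
		id        INTEGER PRIMARY KEY,
		author_id INTEGER REFERENCES authors(id) -- the foreign key
	)`)

	// The insert is rejected: author 1 does not exist, so the database
	// refuses to create a dangling reference between the two tables.
	if _, err := db.Exec(`INSERT INTO posts (id, author_id) VALUES (1, 1)`); err != nil {
		fmt.Println("rejected:", err)
	}
}
```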

Why CPU access to hard disk is slow

Mechanical hard disk drives (HDD) and solid state drives (SSD) are the two most common types of hard drives used as external storage for computers, and it takes the CPU a long time to access the data they store: reading a random 4KB block from an SSD takes roughly 1,500 times as long as accessing main memory, and a seek on a mechanical disk takes roughly 100,000 times as long.
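
To make those ratios concrete, the sketch below plugs in order-of-magnitude latency figures; the absolute values are assumptions consistent with the ratios above, not measurements from the article:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Order-of-magnitude figures chosen to match the ratios above;
	// the absolute values are assumptions, not data from the article.
	memRef := 100 * time.Nanosecond   // main memory reference
	ssdRead := 150 * time.Microsecond // random 4KB read from an SSD
	hddSeek := 10 * time.Millisecond  // seek on a mechanical disk

	fmt.Println("SSD vs memory:", int64(ssdRead/memRef), "x") // 1500 x
	fmt.Println("HDD vs memory:", int64(hddSeek/memRef), "x") // 100000 x
}
```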

Key-Value Storage KVell on NVMe Solid State Drives - SOSP '19

This article presents a paper from SOSP 2019 – KVell: the Design and Implementation of a Fast Persistent Key-Value Store, which implements KVell, a key-value store developed for modern SSDs. Unlike mainstream key-value stores built on LSM trees (log-structured merge-trees) or B-trees, KVell uses a completely different design in order to take full advantage of the new devices' performance and reduce CPU overhead.

Why NUMA affects program latency

Non-Uniform Memory Access (NUMA) is a computer memory design, in contrast to the Uniform Memory Access (UMA) used by early computers, also known as the Symmetric Multi-Processing (SMP) architecture; most modern computers use NUMA to manage CPU and memory resources. As application developers we rarely need direct access to the hardware, because the operating system shields us from most hardware-level implementation details. But because NUMA affects applications, it is something we must understand and be familiar with if we want to write high-performance, low-latency services, and this article covers its impact from two angles.

Why HugePages can improve database performance

Memory is an important computer resource. While most services today do not need that much memory, databases and Hadoop-family services are big consumers, using GBs or even TBs of memory to speed up computation in production environments. The Linux operating system has introduced a number of strategies to manage this memory better and faster and to reduce overhead, and today we introduce one of them: HugePages, or large pages.
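
On Linux you can inspect the configured HugePages pool directly; below is a small Go sketch that filters the relevant fields from /proc/meminfo:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// /proc/meminfo reports the configured HugePages pool;
	// Hugepagesize is typically 2048 kB on x86-64.
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		if strings.HasPrefix(line, "HugePages_") || strings.HasPrefix(line, "Hugepagesize") {
			fmt.Println(line)
		}
	}
}
```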

Why Linux needs Swapping

Anyone who knows a little about Linux knows that it divides physical Random Access Memory (RAM) into 4KB pages. The swapping mechanism we introduce today is closely related to memory: it is the process by which the operating system copies the contents of physical memory pages to swap space on the hard disk in order to free memory. Physical memory together with the swap partitions on disk makes up the virtual memory available to the operating system, and these swap spaces are configured in advance by the system administrator.
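
On Linux, the sysinfo(2) system call reports total and free swap alongside RAM; a minimal Go sketch (Linux-only):

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// sysinfo(2) reports RAM and swap sizes in units of info.Unit bytes.
	var info syscall.Sysinfo_t
	if err := syscall.Sysinfo(&info); err != nil {
		panic(err)
	}
	unit := uint64(info.Unit)
	fmt.Println("total RAM :", uint64(info.Totalram)*unit/1024/1024, "MiB")
	fmt.Println("total swap:", uint64(info.Totalswap)*unit/1024/1024, "MiB")
	fmt.Println("free swap :", uint64(info.Freeswap)*unit/1024/1024, "MiB")
}
```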

Facebook Cluster Scheduling Management System - OSDI '20

This article is about a paper from OSDI 2020 – Twine: A Unified Cluster Management System for Shared Infrastructure, which describes Twine, the cluster management system Facebook has run in production for the past decade. Before this system, Facebook's clusters consisted of separate resource pools customized for individual businesses; they could not be shared with other businesses because the machines in these pools might have distinct versions or configurations.

Nanosecond High Performance Logging System - ATC '18

In this article we present a paper from USENIX ATC 2018 – NanoLog: A Nanosecond Scale Logging System, which implements NanoLog, a high-performance logging system that performs 1 to 2 orders of magnitude better than other logging systems in the C++ community, such as spdlog, glog, and Boost Log. We will briefly analyze NanoLog's design and implementation principles. Logging is an important part of system observability, and I'm sure many engineers have had the experience of adding logs on the fly to track down problems, a process the author has been through recently as well.

Why OLAP Needs Columnar Storage

ClickHouse is one of the more popular online analytical processing (OLAP) data stores of late. Compared with the traditional relational databases we are familiar with, such as MySQL and PostgreSQL, stores aimed at OLAP scenarios, such as ClickHouse, Hive, and HBase, tend to use columnar storage. Readers who know a little about databases know that Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are the two most common database scenarios, though not the only ones; Hybrid Transactional/Analytical Processing (HTAP) is a concept derived from them.
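
A rough sketch of the difference: with a row layout an aggregation strides over fields it does not need, while a column layout scans one contiguous slice. The Go below is illustrative only, not how any particular store is implemented:

```go
package main

import "fmt"

// Row layout: all fields of one record sit together (good for OLTP,
// which reads or updates whole rows).
type Row struct {
	ID     int64
	City   string
	Amount float64
}

// Column layout: one contiguous slice per column (good for OLAP, which
// aggregates a few columns over many rows).
type Columns struct {
	ID     []int64
	City   []string
	Amount []float64
}

func sumRows(rows []Row) (total float64) {
	for _, r := range rows {
		total += r.Amount // each step also strides over ID and City
	}
	return
}

func sumColumn(c Columns) (total float64) {
	for _, a := range c.Amount { // dense, cache-friendly scan
		total += a
	}
	return
}

func main() {
	rows := []Row{{1, "Berlin", 9.5}, {2, "Tokyo", 3.2}}
	cols := Columns{
		ID:     []int64{1, 2},
		City:   []string{"Berlin", "Tokyo"},
		Amount: []float64{9.5, 3.2},
	}
	fmt.Println(sumRows(rows), sumColumn(cols)) // both print 12.7
}
```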

Memory management design essentials

Disks for persistent storage are not a scarce resource today, but CPU and memory are still relatively expensive, and this article describes how memory, a scarce resource in computers, is managed. Memory management systems and modules play an important role in operating systems and programming languages. Using any resource involves the two actions of requesting and releasing, and the two corresponding processes in memory management are memory allocation and garbage collection; the core goal of a memory management system is to use limited memory resources to serve as many programs or modules as possible.
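
As a toy illustration of that request/release pairing, the Go sketch below implements a fixed-size free-list allocator; real allocators and garbage collectors are far more involved:

```go
package main

import "fmt"

// Pool is a toy fixed-size-block allocator: Alloc pops a free block and
// Free pushes it back. This only shows the request/release pairing.
type Pool struct {
	free      [][]byte
	blockSize int
}

func NewPool(blockSize, blocks int) *Pool {
	p := &Pool{blockSize: blockSize}
	for i := 0; i < blocks; i++ {
		p.free = append(p.free, make([]byte, blockSize))
	}
	return p
}

// Alloc requests a block; nil means the pool is exhausted, where a real
// system would grow the heap or trigger garbage collection.
func (p *Pool) Alloc() []byte {
	if len(p.free) == 0 {
		return nil
	}
	b := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return b
}

// Free releases a block back to the pool for reuse.
func (p *Pool) Free(b []byte) { p.free = append(p.free, b) }

func main() {
	pool := NewPool(4096, 2)
	b := pool.Alloc()
	fmt.Println("after alloc:", len(pool.free), "free blocks")
	pool.Free(b)
	fmt.Println("after free: ", len(pool.free), "free blocks")
}
```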

Why Kubernetes is replacing Docker

Kubernetes is the de facto standard in container orchestration today, and Docker has played a pivotal role in containers from its inception, serving as the default container engine in Kubernetes. In December 2020, however, the Kubernetes community decided to remove the Dockershim-related code from its repository, a significant step for both the Kubernetes and Docker communities. Most developers have heard of Kubernetes and Docker and know that Kubernetes can manage Docker containers, but they may not have heard of Dockershim, the Docker shim.

Design Principles for Cluster Management System Mesos - NSDI '11

This article presents a paper from NSDI 2011 – Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, which implements Mesos to manage different computing frameworks, such as Hadoop and MPI, within one cluster. Although Mesos was released more than 10 years ago and has gradually been displaced by the more mainstream, general-purpose container orchestration system Kubernetes, it did solve some real cluster management problems.

How you should customize features for Kubernetes

Kubernetes is a highly complex cluster orchestration system, yet even with its rich functionality and features it cannot meet the needs of every scenario, because container scheduling and management are inherently complex. While Kubernetes solves the common problems in most scenarios well, implementing more flexible strategies requires using the extensibility Kubernetes provides for exactly this purpose. Every project focuses on different features at different stages of its life cycle, and we can roughly divide a project's evolution into three distinct phases.

Serverless and Lightweight Virtualization Firecracker - NSDI '20

This article is about a paper from NSDI 2020 – Firecracker: Lightweight Virtualization for Serverless Applications, which implements Firecracker to provide lightweight virtualization support on the host. Many developers today choose Serverless containers and functions to reduce system overhead, improve hardware utilization, and scale quickly, but the Serverless scenario places higher demands on container isolation, security, and performance. When the same hardware serves multiple tenants, we want the different workloads to be isolated for security and performance at minimal extra cost.

Why early Windows needed defragmentation

I remember a dozen years ago, when I was still using early versions of Windows, the system would become very laggy from time to time; you then had to run the disk defragmentation program that came with the system, and once defragmentation finished the machine felt somewhat smoother. In a file system, defragmentation is the process of reducing fragmentation: it rearranges the contents of each file so they are stored contiguously on disk and closes the gaps between files, somewhat like the mark-compact algorithm in garbage collection.