Index design principles for relational databases

2022-08-20

tutorials

2609 words 13 min read

Inappropriate indexes are the most common cause of poor performance in relational database systems. Typical problems include too few indexes, SELECT statements that have no usable index, and index columns in the wrong order. Some developers believe that any SQL statement that uses an index will run much faster, and that professional index design is the DBA's job. However, we can design efficient indexes ourselves as long as we know how the database handles the task internally.

Why is the maximum length of VARCHAR in MySQL 65535?

2022-08-19

tutorials

1659 words 4 min read

CHAR and VARCHAR are commonly used data types for storing strings in MySQL. The official documentation gives the maximum length of CHAR as 255 and the maximum length of VARCHAR as 65535. In practice, however, the largest ’length’ that can actually be declared for a VARCHAR column is not a fixed value. This article analyzes why. When we enter the table creation statement CREATE TABLE test( a VARCHAR(65535) NOT NULL)CHARSET=latin1;

Why does floating-point arithmetic have precision problems

2022-08-19

tutorials

1582 words 4 min read

The IEEE standard for binary floating-point arithmetic (IEEE 754) has been the most widely used standard for floating-point arithmetic since the 1980s and is implemented by many CPUs and floating-point units. However, this floating-point representation also has certain accuracy problems, which we will discuss. IEEE 754 provides four precision specifications, of which single-precision floating-point and double-precision floating-point are the most commonly used types, and most programming languages today such as C,

Solutions when Kubernetes or MicroK8s encounters Docker Hub download limitations

2022-08-19

tutorials

1027 words 5 min read

I’ve been running a five-day Kubernetes training course at an enterprise for the past few weeks, and our lab environment is based entirely on lightweight MicroK8s. But the course didn’t start well: there were about 20 people in the class, all on the company’s network and all sharing the same outbound IP address, so they ran into the annoying Docker Hub rate limit.

What's new in Python 3.10

2022-08-19

tutorials

4497 words 22 min read

The Python language is designed to make complex tasks simple, so it iterates relatively quickly and we need to keep up with it! Each new version brings new features and functions, and we should understand and summarize them before upgrading so that we can use them flexibly in our programming later. Can’t wait? Let’s start now!

ZooKeeper and Zab Protocol

2022-08-18

tutorials

3469 words 17 min read

ZooKeeper is a typical distributed data consistency solution dedicated to providing a high-performance, highly available, distributed coordination service with strict sequential access control. In the previous article, Principles and Implementation of etcd, a Distributed Key-Value Store, we looked at how the key modules of the distributed coordination service etcd are implemented; in this article we take a look at the ideas behind ZooKeeper’s implementation. ZooKeeper was created by Yahoo, an Internet company, and uses the Zab protocol, a consensus algorithm designed specifically for the service.

Principles and Implementation of etcd, a Distributed Key-Value Store

2022-08-18

tutorials

11198 words 53 min read

etcd is an open source project initiated by CoreOS to build a highly available distributed key-value storage system. etcd can be used to store critical data and implement distributed orchestration and configuration services, playing a key role in modern cluster operations. etcd is a distributed key-value storage service based on the Raft consensus algorithm. The project structure is modular, with the Raft module for distributed consensus, the WAL module for data persistence, and the MVCC module for state machine storage.

Use Wireshark to analyze TCP throughput bottlenecks

2022-08-18

tutorials

1520 words 8 min read

When debugging network quality, we generally focus on two factors: latency and throughput (bandwidth). Latency is the easier one to verify, with a simple ping or mtr. This article shares a way to debug throughput. Throughput generally matters on so-called Long Fat Networks (LFN, RFC 7323), for example when downloading large files. When throughput does not reach the upper limit of the network, three main factors can be responsible.

Kubernetes Network Model

2022-08-18

tutorials

1698 words 8 min read

In this article, we explore and analyze the various network models in Kubernetes. Underlay Network Model: What is an underlay network? As the name suggests, an underlay network is the physical network topology formed by linking network infrastructure such as switches, routers, and DWDM equipment over network media, and it is responsible for transmitting packets between networks. Source: https://community.cisco.com/t5/data-center-switches/understanding-underlay-and-overlay-networks/td-p/4295870 The underlay network can be either layer 2 or layer 3; a typical example of a layer 2 underlay network is Ethernet, and a typical example of a layer 3 underlay network is the Internet.

Talking about distributed consensus algorithms and data consistency

2022-08-17

tutorials

5968 words 29 min read

One of the most important abstractions in distributed systems is consensus: all the nodes agree on a certain proposal. One or more nodes in a distributed system can propose values, and a consensus algorithm decides the final value. The core idea of consensus is that the decision is unanimous and, once made, cannot be changed. This article summarizes some common consensus algorithms and theories in the distributed domain, in the hope of giving a more comprehensive picture.

How to play Dapr without Kubernetes?

2022-08-17

tutorials

2737 words 13 min read

Dapr is designed as an enterprise-class microservices programming platform for developers that is independent of specific technology platforms and can run “anywhere”. Dapr itself does not provide an “infrastructure”, but rather uses its own extensions to adapt to specific deployment environments. In its current state, a truly native Dapr application can only be deployed in a K8S environment if you wish to take it into production. While Dapr also provides support for HashiCorp Consul, there does not appear to be a stable version available.

Snapshot in Postgres

2022-08-17

news

732 words 4 min read

I recently wanted to learn something about the Postgres ecosystem, and I hadn’t quite understood its MVCC mechanism before, so I came back to try to understand it again. Here we ignore the concurrency control and cleanup parts of MVCC and just look at the Snapshot part first. Tuple: Postgres doesn’t have a MySQL-style UNDO log; the multi-version data (tuples) is stored directly in the tablespace, with meta information to distinguish the versions.

GitOps Tool Argo CD Hands-On Tutorial

2022-08-17

tutorials

5281 words 11 min read

Argo CD is a continuous deployment tool for Kubernetes that follows the declarative GitOps philosophy. Argo CD automatically synchronizes and deploys applications when Git repositories change. Argo CD follows the GitOps model, using Git repositories as the source of truth for defining the required application state, and it supports multiple kinds of Kubernetes manifests: kustomize, helm charts, ksonnet applications, jsonnet files, plain directories of YAML/JSON manifests, and any custom config management tool configured

CPU Cache Coherence and Memory Barrier

2022-08-16

tutorials

4320 words 21 min read

On most modern CPUs, all memory accesses go through layers of cache, and understanding CPU cache coherence issues can be of great help in designing and debugging our programs. This article introduces the CPU cache system and how to use memory barriers for cache synchronization. The memory hierarchy of early computer systems had only three levels: CPU registers, DRAM main memory, and disk storage.

Principle and Implementation of LSM-Tree and LevelDB

2022-08-16

tutorials

7212 words 15 min read

LSM-Tree is a data structure for write-heavy, read-light application scenarios, and it is adopted by powerful NoSQL databases such as HBase and RocksDB as the underlying file organization method. In this article, we will introduce the design idea of LSM-Tree, and analyze how LevelDB, which uses LSM-Tree, is implemented and optimized for performance. Before I came to LSM-Tree, the storage systems I had studied, such as MySQL and etcd, are all oriented

Talk about the principle and optimization of std::uniform_int_distribution

2022-08-16

tutorials

1653 words 8 min read

Normally, a pseudo-random number generator produces integers in the interval [0, 2^N), and each integer in this interval has an equal chance of being generated. However, the random numbers we need usually belong to a smaller interval (e.g., a dice simulator needs random numbers in the interval [1, 6]), so we must use std::uniform_int_distribution to map [0, 2^N) to the interval [min, max]. This article follows Daniel Lemire’s paper Fast Random Integer Generation in an Interval and introduces several different mapping methods in order.
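To make the mapping problem concrete, here is a minimal Python sketch of one classic approach, modulo with rejection. It is only an illustration: it is not necessarily how std::uniform_int_distribution is implemented, nor is it the faster method from Lemire’s paper. The function name uniform_int and its n_bits and rng parameters are hypothetical names chosen for this example.

```python
import random


def uniform_int(lo, hi, n_bits=32, rng=random.getrandbits):
    """Return an integer in [lo, hi], each value equally likely.

    rng(n_bits) is assumed to return a uniform integer in [0, 2**n_bits).
    """
    span = hi - lo + 1            # number of values in the target interval
    total = 1 << n_bits           # number of values the generator can emit
    if not 0 < span <= total:
        raise ValueError("interval is empty or wider than the generator range")
    # Largest multiple of span that fits in the generator range. Raw values
    # at or above it would make some results slightly more likely (modulo
    # bias), so they are discarded and a new value is drawn.
    limit = total - (total % span)
    while True:
        x = rng(n_bits)
        if x < limit:
            return lo + x % span


if __name__ == "__main__":
    # Ten simulated dice rolls.
    print([uniform_int(1, 6) for _ in range(10)])
```

With a 3-bit generator and the interval [1, 6], the raw values 6 and 7 are rejected, so each face is backed by exactly one raw value. The cost is an occasional redraw and a modulo operation per call.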

Developing Kubernetes Operators with Go: Basic Architecture

2022-08-16

tutorials

6071 words 29 min read

A few years ago, I called Kubernetes the de facto standard for service orchestration and container scheduling, and today K8s is the unchallenged “kingpin” of the space. However, while Kubernetes has become very complex, its original data model, application model and scaling approach are still valid. And application models and scaling methods like Operator are becoming increasingly popular with developers and operators.

Virtual memory in Linux systems

2022-08-15

tutorials

2582 words 13 min read

Modern operating systems use virtual memory for memory management, and this article summarizes the principles of virtual memory and some application scenarios. The main memory of a computer system is organized as an array of M contiguous bytes, each of which has a unique Physical Address (PA). Under these physical conditions, the most natural way for the CPU to access memory is to use physical addressing, as shown in the figure below, where the CPU executes a load instruction to read the data at physical address 5.

Linux Virtual File System

2022-08-15

tutorials

1839 words 9 min read

A file system is a mechanism for organizing data and metadata on a storage device, and with such a broad definition, implementations vary greatly from file system to file system, including ext4, NFS, /proc, etc. Linux uses a layered architecture that separates the user interface layer, the file system implementation, and the drivers for the storage device, and is thus compatible with different file systems. The Virtual File System (VFS) is a software layer in the Linux kernel that provides a standard, abstract set of file operations in the kernel, allows different file systems to coexist, and provides a unified file system interface to user space programs.

Google C++ Style Excerpt Notes

2022-08-15

tutorials

1432 words 7 min read

Why this excerpt? When I was learning C++, I found that the language is extremely powerful and places few strong restrictions on how code is written. After decades of iteration, C++ also carries a lot of historical baggage, so people from different backgrounds and different eras write it with very different habits. I wanted to learn as much as possible from how good teams write C++ code, so I wrote up these excerpts.