CHAR and VARCHAR are commonly used data types for storing strings in MySQL. The official documentation describes the maximum length of CHAR as 255 and the maximum length of VARCHAR as 65535.
However, after operation, we found that the actual maximum ’length’ that VARCHAR can create is an indefinite value. This article will analyze this issue.
When we enter the table build statement
CREATE TABLE test( a VARCHAR(65535) NOT NULL)CHARSET=latin1; in MySQL, MySQL returns us
Row size too large error, which means MySQL does not support creating 65535 bytes long VARCHAR type. VARCHAR type of 65535 bytes.
However, we finally found out through testing that the maximum length that can be created is 65533. In order to better explain this phenomenon, we need to understand the internal storage method of MySQL.
In computer, character encoding is to encode the characters in a character set into a certain object in a specified set according to the bit pattern, so that the text can be stored in the computer and passed through the network. The same character may have different sizes in different encoding methods, for example, the UTF-8 encoding of Chinese character “我” is
E68891, which occupies three bytes, while the GBK encoding is
CED2, which occupies only two bytes. In the variable length byte encoding, different characters occupy different lengths, for example, “1” in UTF-8 encoding is
35, which occupies only one byte.
The way characters are stored in the database depends on the specific character encoding, since MySQL version 5.0, N in VARCHAR(N) or CHAR(N) refers to the maximum number of characters stored , not the maximum number of bytes. If we set the encoding of the table to
utf8mb4, then the column will take up N~4N bytes of space, and MySQL will estimate the worst-case input space when creating the table, at which point the maximum number of characters we can create is 65533/4 = 16383.
Row Record Format
Records are stored as rows in MySQL, and each 16KB-sized page can hold 2 to 200 row records. MySQL 5.0 introduced the row record format
Compact as the default storage format. After that,
Dynamic row record formats were introduced, but they are actually the same storage format.
- the header is a list of length records of a non-NULL variable-length column and is stored in reverse order.
- If the length of the column is less than or equal to 255 bytes, it is represented by 1 byte.
- If the length of the column is greater than 255 bytes, it is represented by 2 bytes.
- the second part is the NULL flag, indicating whether a column in the row is a NULL value, with 1, the NULL flag is also indefinite, if all the columns are set to NOT NULL when the table is built, then the NULL flag will not exist.
- followed by Record Header, which is fixed to occupy 5 bytes.
- if the primary key is not identified when the table is created, then MySQL automatically generates a RowID, which is 6 bytes long.
- Finally, there are two hidden columns, TransactionID, which records the transaction ID of the last time the row was modified and is 6 bytes long, and Roll Pointer, which is a rollback pointer and is 7 bytes long.
The maximum length of a variable-length column record in the header is two bytes. 65535 is the maximum value that can be represented by a 16-bit binary number and the maximum value that can be stored in a variable-length field, so MySQL theoretically supports VARCHAR type columns of this length to store data. However, the official document Limits on Table Column Count and Row Size states that MySQL rows have a 65,535-byte and that this feature is independent of the storage engine, even if we choose a storage engine that can support larger rows. That is, the maximum row length limits the length of columns .
For this reason, we can assume that the formula for calculating the length of rows in MySQL is: the sum of the lengths of all columns + the length of the record length list + the length of the NULL flag bit, and the space occupied by the fixed-length Record Header and hidden columns is not counted. Suppose there is a table with encoding
latin1, the length of a field is defined as 65530, and its length token occupies two bytes, the length of b field is defined as 2, and its length token occupies one byte, the total is exactly 65535 bytes. If the b field length is defined as 3, which exceeds the maximum row length limit, then the table cannot be created successfully.
Regarding the length of the NULL flag bit, if a row has nine columns, the first column is NULL and the other columns are not empty, then the NULL flag bit stores 0x01, which uses only one byte, and the ninth column is truncated in the flag bit to save some space. If all columns of the row data are not empty, MySQL will store 0x00 in the flag bit, which also takes up one byte, but will not be counted in the total length of the row data. The formula for calculating the length of the NULL flag bit can be derived from this: the position of the last NUUL column is N, and size = (N + 7)/8.
CHAR type storage
It is commonly understood that VARCHAR is the type of character that stores variable length and CHAR is the type of character that stores fixed length. In MySQL, the length of a fixed-length column is not stored in each row of data, but is defined at the table level. However, in multi-byte encoded character sets, the data stored inside a CHAR type column may not be of a fixed length.
We inserted two rows of data into the utf8mb4 encoded table and used the hexdump utility to see how the row records are stored in the
test_char_size.ibd binary storage file.
The storage of row data starts at address 00010079. The contents have been organized for ease of understanding.
From the above diagram, you can see more clearly the format of MySQL storing row data. Although CHAR(N) is a fixed-length data type, it may be stored internally as a variable-length column, and InnoDB records the true length of the column, and completes it with spaces (0x20) if it is less than N. You need to pay extra attention to this when calculating the total length of row data.
The maximum length of rows in MySQL and the maximum length of VARCHAR columns in InnoDB are both 65535, but there are also some uncertain data in the rows, and the space occupied by these data needs to be calculated in the row length, which leads to the VARCHAR columns cannot be filled. Besides, the non-single-byte encoded VHAR columns also need extra space to store the column length, and although the problem of overflow beyond the row length is rarely encountered in practical situations, extra attention should be paid to the way the data is stored.