PHP is simple to pick up but not easy to master. Beyond knowing how to use it, we also need to know how it works under the hood.
Why study PHP's underlying implementation? To use a dynamic language well, we must first understand it. Its memory management and framework model are worth learning from, and once we understand the internals we can optimize our programs and write extensions that add more powerful features.
PHP is a dynamic language for web development. More specifically, it is a software framework made up of a large number of component modules implemented in C. Viewed narrowly, it can be thought of as a powerful UI framework.
In short, PHP runs a script in two stages: the source is translated into individual instructions (opcodes) through lexical and syntactic parsing, and the Zend virtual machine then executes those instructions in order. Because PHP itself is implemented in C, every call ultimately ends up in a C function, so in effect PHP can be regarded as a piece of software written in C.
PHP directory structure
The PHP source tree also contains several files generated during development, and several components that are maintained in their own upstream locations. (Note: this article is based on PHP 7.4.13.)
|TSRM||Thread-safety implementation. PHP's thread safety is built on the TSRM (Thread Safe Resource Manager) library, and the *G macros that are common in the PHP source are usually wrappers around TSRM.|
|Zend||Core of the PHP language engine: lexical and syntactic parsing of scripts, execution of opcodes, and the extension mechanism.|
|build||Files used by the build system when compiling on Linux.|
|ext||PHP extensions. Most PHP functions are defined and implemented here, including the array, pdo, and spl families. Extensions you write yourself can also be placed here for easier testing and debugging.|
|main||PHP's main code. It provides PHP's basic infrastructure, in contrast to the Zend engine, which implements the core language runtime.|
|netware||Platform-specific support code for Novell NetWare.|
|pear||PEAR stands for PHP Extension and Application Repository, a code repository for PHP extensions and applications. Simply put, PEAR is to PHP what CPAN (Comprehensive Perl Archive Network) is to Perl.|
|sapi||Code for the server abstraction layers through which PHP is exposed, such as Apache's mod_php, cli, cgi, embed, and fpm.|
|scripts||Helper scripts used on Linux.|
|tests||Test scripts, containing test files for various PHP functions.|
|travis||Scripts used for continuous-integration builds; not specific to PHP itself.|
|win32||Code and scripts for building PHP on Windows. For example, the socket implementation differs between Windows and *nix platforms.|
Although there are many source directories, the core ones are sapi, main, Zend, ext, and TSRM, which are described in turn below.
The input to a PHP program can be standard input from the command line or a network request carried over the cgi/fastcgi protocol. PHP can even be embedded in a microcontroller for C++ programs to call. These correspond to the cli, fpm/cgi, and embed modes; in addition there are the apache2handler and litespeed modes.
- apache2handler: PHP runs inside Apache as mod_php. This is the most widely used mode today.
- cgi: the other way for a web server and PHP to interact directly, namely the well-known fastcgi protocol, which is an extension of cgi. In recent years fastcgi + PHP has seen more and more use, and it is the only mode supported by asynchronous web servers.
- cli: invocation from the command line.
The sapi directory abstracts the input and output layers and is the specification through which PHP provides services to the outside world.
Output works the same way as input: it can be written to the standard output of the command line or returned to the client as a network response over the cgi/fastcgi protocol.
SAPI stands for Server API. It defines the structure sapi_module_struct, which holds function pointers for hooks such as module startup, shutdown, request activation, and request deactivation. Each mode points these hooks at its own functions, which makes it easy to add new ways of exposing PHP. The modes listed above are all implementations of sapi_module_struct, and together they cover PHP's different deployment scenarios.
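From userland, the SAPI that started the current process can be inspected with php_sapi_name() or the PHP_SAPI constant. The snippet below is only a small illustration; the value it prints depends on how the script is launched, so no output is shown.

```php
<?php
// Name of the SAPI module that started this process,
// for example "cli", "fpm-fcgi", "cgi-fcgi" or "apache2handler".
$sapi = php_sapi_name();

if ($sapi === 'cli') {
    echo "Running from the command line\n";
} else {
    echo "Serving requests through the {$sapi} SAPI\n";
}
```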
- The Web Server loads the FastCGI process manager (IIS ISAPI or Apache Module) at startup.
- The FastCGI process manager initializes itself, starts multiple CGI interpreter processes (visible as multiple php-cgi processes), and waits for a connection from the Web Server.
- When a client request reaches the Web Server, the FastCGI process manager selects and connects to a CGI interpreter. The Web Server sends the CGI environment variables and standard input to the FastCGI subprocess.
- The FastCGI subprocess finishes processing and returns the standard output and error messages to the Web Server over the same connection. When the FastCGI subprocess closes the connection, the request is complete. The FastCGI subprocess then waits for and processes the next connection from the FastCGI process manager (running in the Web Server). In plain CGI mode, php-cgi exits at this point.
- From the above you can imagine how slow CGI usually is. For every Web request, PHP has to re-parse php.ini, reload all the extensions, and re-initialize all its data structures. With FastCGI, all of this happens only once, when the process starts. An additional benefit is that persistent database connections work.
The main directory is the glue between the SAPI layer and the Zend layer.
Its role is to take requests from SAPI, work out which script file to execute and with which parameters, and initialize the environment and configuration: initializing variables and constants, registering functions, parsing configuration files, loading extensions, and so on.
The Zend engine is the kernel of PHP. It translates PHP code into executable opcodes and implements their handlers, the basic data structures, memory allocation and management, and so on. It consists of two parts: the compiler and the executor.
The compiler performs lexical and syntactic analysis of the PHP code and generates an abstract syntax tree, which is then compiled further into opcodes. An opcode is an instruction recognized by the Zend virtual machine. The engine defines a fixed set of opcodes, and all PHP syntax is expressed as combinations of them. The executor is responsible for executing the opcodes output by the compiler.
ext (extension) is the mechanism for extending the functionality of the PHP kernel. Extensions are divided into PHP extensions and Zend extensions. Both kinds can be developed by users and both are common; array, for example, is a PHP extension.
TSRM (Thread Safe Resource Manager) is a thread-safe resource manager.
A global variable is a variable defined outside any function, which makes it a shared resource. In a multi-threaded environment, access to shared resources may cause conflicts, and TSRM exists to solve this problem.
The main purpose of TSRM is to keep shared resources safe. PHP's thread-safety mechanism is simple and intuitive: in a multi-threaded environment, each thread is given its own copy of the global variables. TSRM does this by allocating each thread a unique, auto-incrementing ID (taking a lock before the allocation). The ID serves as an index into that thread's global-variable memory area, so later accesses to global variables are completely independent between threads.
Most PHP SAPIs are single-threaded, so thread safety rarely needs attention. It does have to be considered when PHP runs inside a threaded Apache or inside a user's own multi-threaded embedding of PHP.
PHP design philosophy and features
- multi-process model: because PHP uses a multi-process model, requests do not interfere with one another, so one hanging request cannot bring down the whole service. PHP has also long supported a multi-threaded model.
- weakly typed language: unlike C/C++, Java, C#, and similar languages, PHP is weakly typed. A variable's type is not fixed when it is declared; it is determined at run time, where implicit or explicit type conversions may occur. This flexibility is convenient and efficient in web development. The details are covered in the discussion of PHP variables later.
- engine (Zend) + component (ext) model, which reduces internal coupling.
- middle layer (sapi): SAPI, the Server Application Programming Interface, isolates the web server from PHP.
- the syntax is simple and flexible, with few prescribed conventions. The downside is that coding styles end up mixed.
php execution flow & opcode
PHP executes a script as follows: after receiving a piece of code, it translates the source into individual instructions (opcodes) through lexical and syntactic parsing, and the Zend virtual machine then executes these instructions sequentially. PHP itself is implemented in C, so every call ultimately reaches a C function; in effect, PHP can be thought of as a piece of software developed in C.
The core of PHP execution is the translated instructions (opcodes), which are the basic unit of execution of a PHP program.
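The first stage, lexical analysis, can be observed from PHP itself through the tokenizer extension's token_get_all() function. Opcodes are not visible from userland without additional tooling, so the sketch below illustrates only the lexing step; the sample source string is made up for the example.

```php
<?php
// Lex a small snippet and print the token stream the parser would consume.
$source = '<?php $a = 1 + 2;';
$names = [];

foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens have the form [token id, matched text, line number].
        $names[] = token_name($token[0]);
        printf("%-14s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Single-character tokens such as "=" or ";" come back as plain strings.
        $names[] = $token;
        printf("%-14s %s\n", 'char', var_export($token, true));
    }
}
```

The compiler then turns this token stream into an abstract syntax tree and finally into opcodes.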
There are several common processing functions.
HashTable - the core data structure
HashTable is the core data structure of Zend and is used to implement almost all of PHP's common features. The PHP array is its typical application; inside Zend, the function symbol table, global variables, and so on are also built on the hash table. It has the following features.
- supports typical key->value lookups
- can be used as an array
- O(1) complexity for adding and deleting nodes
- key supports mixed types: associative and indexed keys can coexist in one array
- value supports mixed types: array("string", 2332)
- linear traversal support: such as foreach
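The features listed above can all be seen on an ordinary PHP array. The following is a minimal sketch with made-up keys and values.

```php
<?php
// One array mixing string and integer keys, and values of different types.
$table = ['name' => 'zend', 0 => 2332, 'nested' => [1, 2]];

$table[] = 'pushed';        // appended under the next free integer key (1)
unset($table['nested']);    // delete a node by key

$hasName = isset($table['name']);   // key -> value lookup

// Linear traversal visits elements in insertion order.
$order = [];
foreach ($table as $key => $value) {
    $order[] = $key;
}
```

After this runs, the keys are visited in the order 'name', 0, 1.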
The Zend hash table implements the typical hash structure and, by attaching a doubly linked list, also provides forward and reverse traversal of arrays. (This is the layout used in PHP 5; PHP 7 replaced the linked list with a contiguous array of buckets, while keeping the same ordered behavior.) The structure is shown in the following figure.
As you can see, the hash table contains both a hash structure in key->value form and a doubly linked list, which makes it convenient to support both fast lookup and linear traversal.
Hash structure: Zend's hash structure is a typical hash table model that resolves collisions with linked lists. Note that Zend's hash table grows automatically: when it is full, it doubles in size and rehashes its elements. Zend has also made some optimizations that trade space for time to speed up key->value lookup. For example, each element stores the key length in a field nKeyLength, so that mismatching keys can be rejected quickly.
Doubly linked list: the Zend hash table implements linear traversal of elements through a linked list. In theory a singly linked list would be enough for traversal; the main reason for using a doubly linked list is that an element can be deleted quickly without traversing the list.
The Zend hash table is a composite structure that can be used as an array: it supports the usual associative keys as well as sequential integer indexes, and even allows a mixture of the two. PHP associative arrays: associative arrays are the typical application of hash_table. A query goes through the following steps (as you can see from the code, this is an ordinary hash lookup with some quick checks to speed it up).
- PHP Indexed Arrays
Indexed arrays are the ordinary arrays we access by subscript. The Zend hashtable normalizes them internally: an index key is also given a hash value, and its nKeyLength is set to 0. The internal member nNextFreeElement holds the next free integer index and is incremented after each push. It is this normalization that allows PHP to mix associative and non-associative keys in one array. Because of how push works, the order of index keys in a PHP array is determined by insertion order, not by the size of the subscript. For example, if key 2 is assigned before key 1, traversal visits key 2 first. A key of type double is treated by the Zend hashtable as an index key.
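The ordering rule and the handling of double keys can be checked with a few assignments. This is a small sketch; the values are arbitrary.

```php
<?php
$arr = [];
$arr[2] = 'first';
$arr[1] = 'second';
$arr[]  = 'third';     // push: stored under key 3, one past the largest integer key
$arr[5.0] = 'fourth';  // a float key is converted to the integer key 5

// Keys come back in insertion order, not in numeric order.
$keys = array_keys($arr);
print_r($keys);
```

The resulting key order is 2, 1, 3, 5.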
PHP is a weakly typed language that does not itself strictly distinguish between variable types.
PHP does not require a type to be specified when a variable is declared.
PHP may convert variable types implicitly while the program runs. As in strongly typed languages, explicit type conversions can also be written in the program.
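A few lines are enough to show both kinds of conversion. The values below are arbitrary examples.

```php
<?php
$sum   = '10' + 5;        // numeric string converted implicitly to int
$ratio = '1.5' + 1;       // numeric string converted implicitly to float
$text  = 'items: ' . 3;   // int converted implicitly to string
$cast  = (int) '42';      // explicit conversion

$value = 7;               // the variable holds an int now
$value = 'seven';         // and may later hold a string

var_dump($sum, $ratio, $text, $cast, $value);
```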
PHP variables can be classified as simple types (int, string, bool), compound types (array, resource, object), and constants (const). Underneath, all of them share the same structure: the zval.
Zval consists of three main parts.
- type: specifies the type of variable stated (integer, string, array, etc.)
- refcount&is_ref: used to implement reference counting (described later in detail)
- value: the core part, which stores the actual data of the variable
Zvalue stores the actual data of a variable. Because it has to hold several different types, zvalue is a union, and this is how weak typing is implemented.
The correspondence between PHP variable types and their actual storage is as follows.
Reference counting is widely used in memory reclamation, string handling, and elsewhere, and PHP variables are a typical application of it. A Zval's reference counting is implemented by the members is_ref and refcount, which allow multiple variables to share a single copy of the data and so avoid frequent copying. On assignment, Zend points the variable at the same Zval and increments refcount; on unset, it decrements refcount. The data is actually destroyed only when refcount drops to 0. For a reference assignment, Zend sets is_ref to 1.
PHP variables share data through reference counting, so what happens when one of them is modified? When a write is attempted, Zend checks whether the Zval the variable points to is shared by several variables. If it is, Zend makes a copy with a refcount of 1 and decrements the refcount of the original Zval. This process is called 'Zval separation'. Because the copy is made only when a write occurs, the technique is also known as copy-on-write.
Integers and floating point numbers are basic, simple types in PHP. Their values are stored directly in the Zvalue, as a long and a double respectively.
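The copy-on-write behavior described above can be observed indirectly by watching memory usage. The sketch below uses a hypothetical helper, measure_copy_on_write(), written for this illustration; exact byte counts vary by PHP version and build, so only the relative difference matters.

```php
<?php
// Hypothetical helper for this example: compares the memory cost of an
// assignment with the cost of the first write to the assigned copy.
function measure_copy_on_write(): array
{
    $a = range(1, 100000);

    $before = memory_get_usage();
    $b = $a;                          // assignment: both names share one array
    $afterAssign = memory_get_usage();

    $b[] = 0;                         // first write: the array is separated (copied)
    $afterWrite = memory_get_usage();

    return [
        'assign_cost' => $afterAssign - $before,
        'write_cost'  => $afterWrite - $afterAssign,
        'a_count'     => count($a),
        'b_count'     => count($b),
    ];
}

$r = measure_copy_on_write();
printf("assignment: %d bytes, first write: %d bytes\n", $r['assign_cost'], $r['write_cost']);

// Reference assignment is different: both names stay bound to the same value.
$x = 1;
$y = &$x;
$y = 2;   // $x is now 2 as well
```

The assignment costs almost nothing, while the first write pays for a full copy of the array, and the original array keeps its 100000 elements.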
The Zvalue structure shows that, unlike strongly typed languages such as C, PHP does not distinguish between int, unsigned int, long, and so on. It has a single integer type, long, so the range of integers in PHP is not fixed but depends on the word size of the platform PHP was compiled for.
Floating point numbers are handled similarly: PHP does not distinguish between float and double and uses double throughout. What happens when an integer goes out of range? It is automatically converted to a double, so be careful, because many tricks and pitfalls arise from this.
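The overflow behavior is easy to check with the PHP_INT_MAX and PHP_INT_SIZE constants. A minimal sketch:

```php
<?php
$max  = PHP_INT_MAX;   // largest int for this build (platform dependent)
$next = $max + 1;      // overflow: the result is promoted to float

printf("PHP_INT_SIZE = %d bytes\n", PHP_INT_SIZE);
var_dump(is_int($max));     // bool(true)
var_dump(is_float($next));  // bool(true)
```

Because the promoted value is a double, it no longer has integer precision, which is where many of the surprises come from.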
Like integers, strings are basic, simple types in PHP. The Zvalue structure shows that a PHP string consists of a pointer to the actual data plus a length field, which resembles string in C++. Because the length is stored explicitly, a PHP string, unlike a C string, can hold binary data (including \0), and getting the length with strlen is an O(1) operation. When a string is created, modified, or appended to, PHP reallocates memory to produce a new string. Finally, for safety, PHP still appends a \0 to the end of every string it generates.
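Binary safety can be verified with a string that contains a NUL byte. A small sketch with an arbitrary sample string:

```php
<?php
$s = "ab\0cd";        // embedded NUL byte inside a double-quoted literal
$len = strlen($s);    // the stored length is returned; nothing scans for \0
$t = $s . 'ef';       // concatenation produces a new string value

var_dump($len);        // int(5)
var_dump(strlen($t));  // int(7)
```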
Common string concatenation methods and their relative speed. Suppose there are 4 variables as follows.
The string concatenation methods above are compared and explained below.
PHP arrays are implemented directly on top of the Zend hashtable. How is the foreach operation implemented?
For an array, foreach works by walking the doubly linked list inside the hashtable. For indexed arrays, foreach is more efficient than for because it avoids a key->value lookup on each step. count reads HashTable->nNumOfElements, which is O(1). For strings such as '123', Zend converts the key to an integer, so $arr['123'] and $arr[123] are equivalent.
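The key conversion applies only to strings that are canonical decimal integers. The following sketch, with arbitrary values, shows which keys are merged and which are kept as strings.

```php
<?php
$arr = [];
$arr['123']  = 'a';   // decimal integer string: stored under the int key 123
$arr[123]    = 'b';   // overwrites the same slot
$arr[' 123'] = 'c';   // leading space: stays a string key
$arr['0123'] = 'd';   // leading zero: stays a string key

$keys = array_keys($arr);
var_dump($keys);
```

Only three elements remain, under the keys 123, ' 123', and '0123'.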
Resource types are the most complex variables in PHP and are a kind of compound structure. PHP's Zval can represent a wide range of data types, but it cannot adequately describe custom data types. Since there is no meaningful way to represent these composite structures, there is also no way to use the traditional operators on them. The solution is simply to refer to the underlying pointer through an essentially arbitrary identifier (label), known as a resource.
In the Zval, a resource uses lval as the handle that leads to where the resource actually lives. A resource can be any composite structure; the familiar mysqli, fsock, memcached, and so on are all resources.
How to use resources
- Registration: a custom data type that is to be used as a resource must be registered first, and Zend assigns it a globally unique label.
- Getting a resource variable: for resources, Zend maintains a hash_table that maps id -> actual data. For a given resource, only its id is recorded in the Zval. A fetch finds the actual value in the hash_table by id and returns it.
- Resource destruction: resources come in many data types, and Zend itself has no way of knowing how to destroy them. A destructor function therefore has to be supplied when the resource type is registered. When the resource is released, Zend calls that function to destroy it and removes the entry from the global resource table.
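From userland, this lifecycle is visible on any stream handle. The sketch below uses the built-in php://memory stream so that it touches no files; it is an illustration of the id-based handle, not of the internal registration API.

```php
<?php
$fh = fopen('php://memory', 'r+');     // returns a resource handle

$isResource = is_resource($fh);
$type       = get_resource_type($fh);  // "stream"
$id         = (int) $fh;               // casting to int exposes the resource id

fwrite($fh, 'hello');
fclose($fh);                           // runs the destructor registered for streams

$stillResource = is_resource($fh);     // false once the resource has been closed
printf("resource #%d of type %s\n", $id, $type);
```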
A resource can persist for a long time: not only after every variable referencing it has gone out of scope, but even after one request has ended and a new one has begun. Such resources are called persistent resources because they live through the whole SAPI lifecycle unless they are destroyed deliberately. In many cases persistent resources improve performance to some extent, the common example being mysql_pconnect. Persistent resources are allocated with pemalloc so that they are not freed at the end of the request. To Zend itself, there is no inherent difference between the two kinds of resource.
How local and global variables are implemented in PHP
- For a request, PHP can see two symbol tables at any given moment: symbol_table and active_symbol_table. The former holds global variables; the latter is a pointer to the symbol table that is currently active. When execution enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.
- Getting a variable's value: PHP's symbol table is implemented with a hash_table. Each variable is assigned a unique identifier, and the corresponding Zval is looked up in the table when the variable is fetched.
- Using global variables in functions: inside a function, a global variable can be used after declaring it explicitly with global. This creates a reference to a variable with the same name in
active_symbol_table, or first if there is no variable with the same name in