Recently, for work, I studied the European General Data Protection Regulation (GDPR). One very important point in it, which is also common sense, is that users' personal data should be encrypted before it is stored, so that private data never leaks in plaintext.

Solution Analysis

A good way to think about encrypting users' private data is to start from a typical data read/write path.

_________          query           _________           read           _________
|       |     ---------------->    |       |     ---------------->    |       |
|  App  |                          |  DB   |                          | Disk  |
|       |     <================    |       |     <================    |       |
---------           rows           ---------         data page        ---------

Based on this path, data encryption solutions fall into four types, according to where encryption takes place.

  • Application layer encryption and decryption: the application itself is responsible for encrypting and decrypting data. This is the most flexible solution, but also the most cumbersome.
  • DB pre-processing: the encryption logic sits in front of the database server, typically in a database proxy service.
  • Disk access stage: the basic idea is to go around behind the database and inject hooks into the file system, so that the encryption logic runs whenever data is read from or written to disk.
  • DB post-processing: the encryption logic is embedded behind the database service, relying on triggers, user-defined functions, and similar features provided by the database.

Each type is analyzed below.

Application layer encryption and decryption scheme

With this scheme, encryption and decryption are database agnostic: the application encrypts data before storing it and decrypts it after reading it. The advantages of this scheme are as follows.

  • Good portability: since it does not depend on any database or operating system features, deploying the code is all that is needed.
  • Flexible implementation: the logic lives in the application layer, so customizations and extensions are easy to carry out, and per-table or per-column encrypted storage is easy to implement.

At the same time, the disadvantages are also obvious.

  • Limits the use of advanced database features, such as indexes and execution plans.
  • Significantly degrades query performance: for example, LIKE prefix queries and range conditions in WHERE clauses can only be answered by scanning the whole table, because the stored values are ciphertext.
  • High development and maintenance costs: every time new data needs to be encrypted, development and testing have to be done again, and developers have to keep track of encryption and decryption logic in addition to the core business logic.

To implement encryption and decryption in the application layer, consider using the callback mechanisms of ORMs, such as the hooks and callbacks provided by GORM, a popular ORM framework for Go, or Active Record callbacks in Ruby on Rails. They help keep business code and control code (here, encryption and decryption) isolated from each other.
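The hook idea can be sketched in Go using only the standard library. This is a minimal illustration, not a definitive implementation: the `User` model, the `fieldCipher` type, and the `beforeSave`/`afterFind` methods are hypothetical names, AES-GCM is an assumed choice of algorithm, and the all-zero key is for demonstration only. In a GORM project the same logic would presumably be wired into the model's `BeforeSave` and `AfterFind` hooks instead of being called by hand.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// fieldCipher encrypts single column values with AES-GCM.
type fieldCipher struct {
	aead cipher.AEAD
}

// newFieldCipher builds a cipher from a 16-, 24-, or 32-byte key.
func newFieldCipher(key []byte) (*fieldCipher, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return &fieldCipher{aead: aead}, nil
}

// encrypt returns base64(nonce || ciphertext || tag), suitable for a text column.
func (c *fieldCipher) encrypt(plain string) (string, error) {
	nonce := make([]byte, c.aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := c.aead.Seal(nonce, nonce, []byte(plain), nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// decrypt reverses encrypt and fails if the stored value was modified.
func (c *fieldCipher) decrypt(stored string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(stored)
	if err != nil {
		return "", err
	}
	n := c.aead.NonceSize()
	if len(raw) < n {
		return "", errors.New("stored value too short")
	}
	plain, err := c.aead.Open(nil, raw[:n], raw[n:], nil)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

// User is a hypothetical model: Email is used by business code,
// EmailEnc is what would be written to the database column.
type User struct {
	Email    string
	EmailEnc string
}

// beforeSave plays the role of an ORM "before save" hook.
func (u *User) beforeSave(c *fieldCipher) (err error) {
	u.EmailEnc, err = c.encrypt(u.Email)
	return err
}

// afterFind plays the role of an ORM "after find" hook.
func (u *User) afterFind(c *fieldCipher) (err error) {
	u.Email, err = c.decrypt(u.EmailEnc)
	return err
}

func main() {
	key := make([]byte, 32) // demo only: all-zero key; real keys belong in a secret store
	c, err := newFieldCipher(key)
	if err != nil {
		panic(err)
	}
	u := &User{Email: "alice@example.com"}
	if err := u.beforeSave(c); err != nil {
		panic(err)
	}
	loaded := &User{EmailEnc: u.EmailEnc} // simulate a row read back from the database
	if err := loaded.afterFind(c); err != nil {
		panic(err)
	}
	fmt.Println("stored:", loaded.EmailEnc)
	fmt.Println("decrypted:", loaded.Email)
}
```

Because a fresh random nonce is used for every value, the same email encrypts to a different stored string each time. That is good for confidentiality, and it is also why the database cannot use an index for equality, prefix, or range conditions on such a column.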

Database preprocessing solution

The database pre-processing scheme is an optimization of the application layer scheme. The essence of the idea is to separate the concern of encryption and decryption from the application and run it as an independent service, which also makes the encryption and decryption logic reusable.

Database pre-processing keeps some advantages of the application layer scheme while solving some of its problems.

  1. Flexibility: a database proxy is just as flexible and can implement column-level encryption and decryption.
  2. Better reusability: multiple applications or systems no longer need to build the same logic repeatedly; a database proxy maintained as an independent service gives them quick access to encryption and decryption.
  3. Higher transparency: application developers do not need to care about encryption and decryption logic in their own code. However, because a database proxy reduces SQL compatibility, developers still have to stay aware of which SQL the proxy supports.

The database pre-processing solution also has some drawbacks of its own.

  1. Decreased stability and increased operation and maintenance burden: the architecture gains an extra service node, which adds to the overall service cost and complicates troubleshooting. When data processing problems occur, the maintainers of the database proxy service have to be involved in the investigation.
  2. Inability to use advanced database features: as with the application layer scheme, the data is stored and indexed by the database in encrypted form, so queries involving range conditions lead to full table scans.
  3. Consistency problem: the database proxy has to maintain metadata about the business tables and columns, so the proxy configuration and the business database design can drift out of sync.
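The consistency problem can be made concrete with a small check that compares the proxy's column metadata against the actual schema. The sketch below is only an illustration under stated assumptions: the `columnSet` type, the `findDrift` function, and the sample tables are hypothetical, and a real proxy would load both sides from its configuration and from the database catalog.

```go
package main

import (
	"fmt"
	"sort"
)

// columnSet maps a table name to a set of column names.
type columnSet map[string]map[string]bool

// findDrift returns "table.column" entries that the proxy is configured to
// encrypt but that are missing from the actual database schema.
func findDrift(proxyConfig, schema columnSet) []string {
	var missing []string
	for table, cols := range proxyConfig {
		for col := range cols {
			if !schema[table][col] {
				missing = append(missing, table+"."+col)
			}
		}
	}
	sort.Strings(missing) // map iteration order is random; sort for stable output
	return missing
}

func main() {
	// Hypothetical proxy configuration: columns the proxy should encrypt.
	proxyConfig := columnSet{
		"users": {"email": true, "phone": true},
	}
	// Hypothetical schema after a migration renamed users.phone to users.mobile.
	schema := columnSet{
		"users": {"id": true, "email": true, "mobile": true},
	}
	fmt.Println(findDrift(proxyConfig, schema)) // prints [users.phone]
}
```

In this example the renamed column would silently be stored in plaintext unless the proxy configuration is updated together with the migration, which is the kind of drift such a check is meant to catch.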

Disk access stage: Transparent Data Encryption

Most cloud vendors now offer this scheme as a service, usually called Transparent Data Encryption (TDE). It works at the operating system level and operates on the database data files. Through I/O hooks, it embeds encryption and decryption logic where the database storage engine reads and writes data on the file system: data is encrypted before it is written, and decrypted after it is read and before it is returned to the storage engine process. The whole process is completely transparent to the database storage engine.
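The I/O hook idea can be imitated in a few lines of Go by wrapping the reader and writer that sit between an "engine" and its "disk". This is a toy model under loud assumptions, not how any vendor's TDE is implemented: a `bytes.Buffer` stands in for the data file, the function names are hypothetical, AES-CTR is an assumed algorithm choice, and the fixed all-zero key and IV are for demonstration only.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"io"
)

// encryptingWriter wraps the "disk" so that every write is encrypted first.
func encryptingWriter(disk io.Writer, key, iv []byte) (io.Writer, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.StreamWriter{S: cipher.NewCTR(block, iv), W: disk}, nil
}

// decryptingReader wraps the "disk" so that every read is decrypted
// before the caller sees it.
func decryptingReader(disk io.Reader, key, iv []byte) (io.Reader, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.StreamReader{S: cipher.NewCTR(block, iv), R: disk}, nil
}

// writeThenRead plays the storage engine: it writes a plaintext page through
// the hook and reads it back, and also returns what actually reached the disk.
func writeThenRead(key, iv, page []byte) (onDisk, readBack []byte, err error) {
	var disk bytes.Buffer // stands in for the database data file
	w, err := encryptingWriter(&disk, key, iv)
	if err != nil {
		return nil, nil, err
	}
	if _, err = w.Write(page); err != nil {
		return nil, nil, err
	}
	onDisk = append([]byte(nil), disk.Bytes()...) // snapshot of the stored bytes
	r, err := decryptingReader(&disk, key, iv)
	if err != nil {
		return nil, nil, err
	}
	readBack, err = io.ReadAll(r)
	return onDisk, readBack, err
}

func main() {
	key := make([]byte, 32)           // demo only: all-zero key
	iv := make([]byte, aes.BlockSize) // demo only: a real system needs a unique IV per file or page
	page := []byte("id=1;email=alice@example.com")
	onDisk, readBack, err := writeThenRead(key, iv, page)
	if err != nil {
		panic(err)
	}
	fmt.Printf("on disk:   %x\n", onDisk)
	fmt.Printf("to engine: %s\n", readBack)
}
```

The engine-side code in `writeThenRead` only ever handles plaintext, while the buffer only ever holds ciphertext, which mirrors the transparency described above.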

The advantages of this solution are very attractive.

  • Transparency: since the solution works on the database server, it is completely transparent to the application, which still connects directly to the database. There are no SQL compatibility issues to worry about.
  • Support for all advanced database features: the solution is also completely transparent to the database engine. From the database's perspective, data retrieval, execution plan optimization, transaction management, and so on behave the same whether or not the files are encrypted, so the advanced features of the database are fully preserved.

Of course, there is no perfect solution, and this solution has certain limitations.

  • Table-level encryption only: since this scheme encrypts and decrypts data files without any awareness of their content, its biggest limitation is that it cannot encrypt only part of a table, such as selected columns. Of course, if your business can be structured so that all or almost all columns of the affected tables need encryption, this limitation is not a problem at all.
  • Platform compatibility issues: since it relies on operating system level design, this solution may only run on specific platforms. In a cloud scenario, you also need to consider the differences between the solutions offered by different public clouds, especially in the details of how they are used, which may cause incompatibilities when the same system is migrated between cloud vendors.

In the design of our own business system, we introduced the concept of a user privacy domain. The design separates user privacy data from business process data, which naturally centralizes the storage of user privacy data, so whole-table encryption is the most suitable choice for us.

DB post-processing

DB post-processing relies on database triggers, user-defined functions, and similar features, which run custom encryption and decryption logic while the database server executes queries. This solution arguably solves the problems of the previous solutions.

  • Transparency: DB post-processing is completely transparent to the application.
  • Advanced database features: fully compatible.
  • Flexibility: column-level data encryption and similar fine-grained schemes can be implemented flexibly.

Although this solution also looks attractive, it is still not perfect.

  • Anti-pattern: I personally avoid managing functions, triggers, and the like on the database server, because they are hard to manage, let alone to put under version control.
  • Database compatibility issues: because it depends on features provided by a specific database, this solution makes fast migration practically impossible when the database has to be replaced.

Comparison

The following table compares the options.

| No. | Encryption stage | Introduction | Cases | Advantages | Restrictions |
| --- | --- | --- | --- | --- | --- |
| 1 | App | Encrypted by the application before storage and decrypted by the application after retrieval | ORM + various hook implementations | Flexible; fine granularity; compatible | Not transparent to the application, increases development burden; additional secret key management burden |
| 2 | Database pre-proxy | A proxy service is added between the application and the database, and the proxy service is responsible for encryption and decryption | Tencent Cloud CASB | Highly transparent to the application, no development burden; supports column-level encryption | SQL syntax compatibility decreases, not completely transparent to development; database features such as optimization, transaction processing, and concurrency processing are not available; introduces an additional link, with performance and potential troubleshooting burdens; data bloat is huge, ~4.5x space bloat |
| 3 | Database data files | Works at the file system level: encryption is completed before the engine persists data, data files are decrypted before they are loaded into memory, and plaintext data is retained in memory | Tencent Cloud Transparent Data Encryption (TDE) | Completely transparent to applications; built-in support in cloud databases, compatible with advanced database features | Tablespace encryption, no support for field-level encryption; no way to avoid the risks associated with SQL injection |
| 4 | Database post-processing | Uses “view” + “trigger” + “extended index” + “external call” to achieve data encryption while ensuring full application transparency | - | Completely transparent to the application; compatible with advanced database features; supports column-level encryption | |

Problems common to all options

Some problems have not been discussed in the analysis above, and they cannot be avoided whichever scheme is used.

  • Performance overhead: encryption and decryption require a large amount of computation, which adds a performance overhead that cannot be ignored. When choosing an algorithm, pick the most efficient one available that still guarantees data security.
  • Space expansion: besides performance, the choice of algorithm also affects storage size, because the ciphertext may be longer than the plaintext. If it is, the lengths of the affected database columns must be designed with the expanded data in mind. I learned that the CTR mode of the AES algorithm produces ciphertext of the same length as the plaintext.
  • Key management: whichever encryption scheme is used, we need to think about storing keys securely, rotating them regularly, and so on. Otherwise, once a key leaks, the encrypted data can be decrypted as well.
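The space expansion point can be checked with a short Go program that compares AES-CTR with AES-GCM. It is a sketch under assumptions: the function names and the sample value are hypothetical, the all-zero key is for demonstration only, and storing base64 text with the IV or nonce in front is just one possible column layout. Note that the equal-length property of CTR applies to the ciphertext itself; the IV still has to be stored somewhere so that the value can be decrypted.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// encryptCTR encrypts with AES-CTR. The random IV must be kept with the
// ciphertext, because decryption needs it.
func encryptCTR(key, plain []byte) (iv, ct []byte, err error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, nil, err
	}
	iv = make([]byte, aes.BlockSize)
	if _, err = rand.Read(iv); err != nil {
		return nil, nil, err
	}
	ct = make([]byte, len(plain))
	cipher.NewCTR(block, iv).XORKeyStream(ct, plain)
	return iv, ct, nil
}

// encryptGCM encrypts with AES-GCM, which appends an authentication tag
// to the ciphertext.
func encryptGCM(key, plain []byte) (nonce, sealed []byte, err error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, aead.NonceSize())
	if _, err = rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	sealed = aead.Seal(nil, nonce, plain, nil)
	return nonce, sealed, nil
}

// storedSize is the column width needed when the IV or nonce and the
// ciphertext are concatenated and base64-encoded into a text column.
func storedSize(prefix, body []byte) int {
	return base64.StdEncoding.EncodedLen(len(prefix) + len(body))
}

func main() {
	key := make([]byte, 32)        // demo only: all-zero key
	plain := []byte("13800138000") // an 11-byte sample value
	iv, ct, err := encryptCTR(key, plain)
	if err != nil {
		panic(err)
	}
	nonce, sealed, err := encryptGCM(key, plain)
	if err != nil {
		panic(err)
	}
	fmt.Println("plaintext bytes:", len(plain))
	fmt.Println("CTR ciphertext bytes:", len(ct), "base64 with IV:", storedSize(iv, ct))
	fmt.Println("GCM ciphertext bytes:", len(sealed), "base64 with nonce:", storedSize(nonce, sealed))
}
```

For the 11-byte sample, the CTR ciphertext is 11 bytes and the GCM output is 27 bytes (11 plus a 16-byte tag). Once the 16-byte IV or 12-byte nonce is included and the result is base64-encoded, the stored values are 36 and 52 characters, so column lengths should be sized from the stored form rather than from the plaintext.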

Summary

Data storage security is an increasingly important issue that must be considered well in advance, during the application design phase, because it involves a complex balance between quality of service and cost. Data encryption is gradually becoming common sense in business compliance and privacy protection. Just as everyone now knows that passwords must not be stored in clear text, encrypting all kinds of personal data should become a standard part of system design in the future.