Archiving, compressing, and decompressing files are common tasks, usually handled with tools like tar and gzip. In Go, the standard library packages archive and compress provide these capabilities, and the example in this article shows how easy it is to create and process compressed archives in idiomatic Go.
Archiving and Compression
Before we start the code, we need to clarify the concepts of archiving and compression.
- Archiving means storing a collection of files or directories in a single file.
- Compression means using an algorithm to make a file smaller while preserving as much of its information as possible.
Take the archiving tool tar as an example: the files it produces are usually called tarballs, and their names usually end in .tar. A tarball can then be compressed with another tool, such as gzip, to produce a compressed file whose name usually ends in .tar.gz (the -z argument of tar invokes gzip for you).
A tarball is a sequence of data segments, one per archived file. Each segment consists of a header (meta information describing the file) followed by the contents of the file.
The archive library: archiving and unarchiving
The archive library is used for archiving and unarchiving. It provides two formats, tar and zip, which are imported as archive/tar and archive/zip.
Let’s take tar as an example to show how to archive and unarchive files.
First, create the target archive file out.tar, then prepare some file data (readme.txt, gopher.txt and todo.txt) to archive.
Then, for each file in turn, construct the header, specifying the file name, permissions and size (more header fields can be set), and call the WriteHeader and Write methods of tw, a variable of type *tar.Writer, to write the data segment (header + file content) to out.tar.
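The steps above can be sketched roughly as follows. This is a minimal illustration, not the article's original listing: the entry type, the buildTar helper and the file contents are placeholders chosen for the example. To keep it easy to check, the sketch builds the archive in an in-memory buffer and then saves it as out.tar; since tar.NewWriter accepts any io.Writer, passing the file returned by os.Create directly works the same way.

```go
package main

import (
	"archive/tar"
	"bytes"
	"log"
	"os"
)

// entry is a hypothetical helper type: one in-memory file to archive.
type entry struct {
	Name, Body string
}

// buildTar archives the given files into an in-memory tarball.
func buildTar(files []entry) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf) // tw has type *tar.Writer
	for _, f := range files {
		hdr := &tar.Header{
			Name: f.Name,
			Mode: 0600,
			Size: int64(len(f.Body)),
		}
		// Each data segment is a header followed by the file content.
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f.Body)); err != nil {
			return nil, err
		}
	}
	// Close flushes pending data and writes the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Placeholder contents; any text works.
	files := []entry{
		{"readme.txt", "This archive contains some text files."},
		{"gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
		{"todo.txt", "Get animal handling license."},
	}
	data, err := buildTar(files)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("out.tar", data, 0644); err != nil {
		log.Fatal(err)
	}
}
```

Forgetting to call Close on the tar writer leaves the archive without its end marker, so the error from Close is worth checking rather than deferring and ignoring.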
Running the code produces the archive out.tar, whose contents can be listed with the tar utility by passing it the -tvf parameter.
As you can see, the file information we specified (file name, permissions and size) is as expected, but the meta information we left unspecified is not meaningful, such as the date (which is simply left at its default value).
With the tar utility, extracting the files in out.tar takes a single command.
But how do we do the same thing in a program?
First, open out.tar and construct tr, a variable of type *tar.Reader. Then call tr.Next repeatedly to advance through the data segments in turn, copying the contents of each file to standard output with io.Copy. When tr.Next returns io.EOF, the end of the archive has been reached and the extraction loop exits.
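A rough sketch of that reading loop is shown below. The helper names sampleTar and dumpTar are made up for this example. So that the sketch runs on its own, it first builds a one-file archive in memory; in the flow described above, the reader would instead be the file returned by os.Open("out.tar"). The entries are collected into a string to make the result easy to inspect, but copying straight to os.Stdout with io.Copy works the same way.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// sampleTar builds a one-file archive in memory (a stand-in for out.tar).
func sampleTar() (io.Reader, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := "Get animal handling license."
	hdr := &tar.Header{Name: "todo.txt", Mode: 0600, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

// dumpTar walks the archive in r and returns every entry as printable text.
func dumpTar(r io.Reader) (string, error) {
	var sb strings.Builder
	tr := tar.NewReader(r) // tr has type *tar.Reader
	for {
		hdr, err := tr.Next() // advance to the next data segment
		if err == io.EOF {
			return sb.String(), nil // end of archive
		}
		if err != nil {
			return "", err
		}
		fmt.Fprintf(&sb, "Contents of %s:\n", hdr.Name)
		// tr now reads the content of the current entry only.
		if _, err := io.Copy(&sb, tr); err != nil {
			return "", err
		}
		sb.WriteString("\n")
	}
}

func main() {
	r, err := sampleTar()
	if err != nil {
		log.Fatal(err)
	}
	out, err := dumpTar(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```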
The compress library: compression and decompression
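The compress library supports several compression formats, including bzip2, flate, gzip, lzw and zlib, which are imported as compress/bzip2, compress/flate, compress/gzip, compress/lzw and compress/zlib. Note that compress/bzip2 only implements decompression.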
Let’s take the commonly used gzip as an example to show the compression and decompression code.
Suppose we take the same file data as above (readme.txt, gopher.txt and todo.txt) and want to produce a tar archive that is also compressed, out.tar.gz. How should we do it?
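Very simple! Just change the argument of tar.NewWriter from the output file to gz, where gz is a *gzip.Writer obtained by calling gzip.NewWriter on the output file.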
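As a hedged sketch of that change, the example below layers a tar writer on top of a gzip writer. The entry type, the buildTarGz helper and the file contents are illustrative placeholders, and the archive is built in memory before being saved as out.tar.gz; wrapping the file returned by os.Create with gzip.NewWriter is equivalent.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"log"
	"os"
)

// entry is a hypothetical helper type: one in-memory file to archive.
type entry struct {
	Name, Body string
}

// buildTarGz archives the files and gzip-compresses the tar stream.
func buildTarGz(files []entry) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf) // compression layer over the raw output
	tw := tar.NewWriter(gz)    // tar writes into gz instead of the output
	for _, f := range files {
		hdr := &tar.Header{
			Name: f.Name,
			Mode: 0600,
			Size: int64(len(f.Body)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f.Body)); err != nil {
			return nil, err
		}
	}
	// Close in reverse order of wrapping: the tar layer first, then gzip.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Placeholder contents; any text works.
	files := []entry{
		{"readme.txt", "This archive contains some text files."},
		{"gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
		{"todo.txt", "Get animal handling license."},
	}
	data, err := buildTarGz(files)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("out.tar.gz", data, 0644); err != nil {
		log.Fatal(err)
	}
}
```

The order of the two Close calls matters: closing the tar writer flushes the end-of-archive marker into the gzip stream, and closing the gzip writer then flushes the compressed data and its trailer.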
Comparing the size of the tarball with and without compression, we can see that the file shrinks from 4.0K to 224B.
Similarly, if you want to uncompress and unarchive the out.tar.gz file, how should you do it?
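It’s still very simple! Just change the argument of tar.NewReader from the opened file to gz, where gz is a *gzip.Reader obtained by calling gzip.NewReader on the opened file.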
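The reading side can be sketched the same way. The helpers sampleTarGz and dumpTarGz are hypothetical names for this example, and the sample archive is built in memory so the sketch is runnable by itself; in practice the reader would be the file returned by os.Open("out.tar.gz").

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"strings"
)

// sampleTarGz builds a one-file .tar.gz in memory (a stand-in for out.tar.gz).
func sampleTarGz() (io.Reader, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	body := "Get animal handling license."
	hdr := &tar.Header{Name: "todo.txt", Mode: 0600, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

// dumpTarGz decompresses r, then walks the tar stream inside it.
func dumpTarGz(r io.Reader) (string, error) {
	gz, err := gzip.NewReader(r) // decompression layer over the raw input
	if err != nil {
		return "", err
	}
	defer gz.Close()
	tr := tar.NewReader(gz) // tar reads from gz instead of the input
	var sb strings.Builder
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return sb.String(), nil // end of archive
		}
		if err != nil {
			return "", err
		}
		fmt.Fprintf(&sb, "Contents of %s:\n", hdr.Name)
		if _, err := io.Copy(&sb, tr); err != nil {
			return "", err
		}
		sb.WriteString("\n")
	}
}

func main() {
	r, err := sampleTarGz()
	if err != nil {
		log.Fatal(err)
	}
	out, err := dumpTarGz(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```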
This article shows how to archive and unarchive files with the archive/tar package, and how to further compress and decompress a tarball with the compress/gzip package. When using compress/gzip, we wrap an additional Writer/Reader layer around the output or input to add compression and decompression to the tar archive. Even better, if you want to switch to a different archiving or compression scheme, you can simply replace the corresponding Writer/Reader. This convenience comes from Go’s excellent streaming IO design.
Of course, reading about it only goes so far; you have to try it yourself. If you haven’t used the archive and compress libraries before, try handling compressed archives with a scheme not used in this article.