How to work with archive files in Go

Go’s standard library has excellent support for working with various compression formats such as zip or gzip. It’s easy to get Go programs to seamlessly write and read files with gzip compression if they end with a .gz suffix. Also, the library has packages that allow us to write and read .zip files and tarballs. This Go developer tutorial explores the ideas for working with compressed archive files in Go.

Archive files in the Go programming language

Typically, an archive file is a collection of one or more files along with metadata information for storage or portability as a single file unit. Although the main purpose is long-term storage, in practice it can be used for any purpose. The metadata information often defines the storage quality of files as an archive. For example, the files can be compressed to reduce disk space, store directory structure, error detection and correction information, and so on. Sometimes the files are encrypted and archived for security reasons.

We often see archive files that are used in packaging software distribution. For example, the deb files for Debian, JAR in Java files, and APK in Android are nothing more than archived software packages. Also the zip or gzip files we see are compressed formats of one or more files for archiving purposes. File compression before archiving is common. In addition to using multiple files as a unit, the compression saves space. This is particularly useful for file transactions over networks or for other portable reasons.

Read: How to use strings in Go

Go’s support for file compression

The standard Go library provides necessary APIs for compressing files into various formats. For example, it’s fairly easy to seamlessly read and write to gzip compression files or files with a .gz extension. There is also API support for reading and writing zip files or tarballs or bz2 files. Here is a quick example of using Go APIs to create zipped archives.

How to create zip archives in Go

The easiest way to create a zipped archive is to use the zip package. The idea is to open a file so we can write to it and then use zip.Writer to write the file to the zip archive. To put this in perspective, here is the order of things we need to do:

Create an archive file. This is nothing more than a simple file that we create in Go with the os.Create function from the os package. This function creates a file if it doesn’t exist, otherwise it truncates the named file.
Now when the archive file is created we need to create a writer to write other files and directories to the archive. This can be done using the zip.NewWriter function.
After we have created a writer, we can use the zip.Writer.Create function to add files and directories to the archive.
We can either use the copy function from io.Copy or use io.Writer.Write. The copy function copies from source to destination until EOF is found in the source file or an error occurs.

Finally we need to close the zip archive.

With the following sample code we show how to create zip archives in Go. Notice that we’ve left out error handling to focus only on the points needed. Regular and full-length codes must handle errors appropriately.

Package main import (“archive / zip” “io” “os”) func main () {zipArchive, _: = os.Create (“”) defer zipArchive.Close () Writer: = zip.NewWriter (zipArchive) f1, _: = os.Open (“pom.xml”) defer f1.Close () w1, _: = Writer.Create (“pom.xml”) io.Copy (w1, f1) f2, _: = os .Open (“”) defer f2.Close () w2, _: = writer.Create (“src / main / java /”) io.Copy (w2, f2) writer.Close ()}

Read: Understanding Mathematical Operators in Go

Unpack zip archive in Golang

Unpacking – or unpacking – is just as easy as zipping, except that we have to recreate the directory structure here if the filenames used in the archive contain paths. Here is a quick example of how to unzip zip archives in Go:

func main () {// … src: = “” dest: = “myproject_unpacked” var filenames []string reader, _: = zip.OpenReader (src) defer reader.Close () for _, f: = range reader.File {fp: = filepath.Join (dest, f.Name) strings.HasPrefix (fp, filepath. Clean (dest) + string (os.PathSeparator)) filenames = append (filenames, fp) if f.FileInfo (). IsDir () {os.MkdirAll (fp, os.ModePerm) continue} os.MkdirAll (filepath.Dir (fp), os.ModePerm) outFile, _: = os.OpenFile (fp, os.O_WRONLY | os.O_CREATE | os.O_TRUNC, f.Mode ()) rc, _: = f.Open () io.Copy (outFile, rc) outFile.Close () rc.Close ()}}

Here we open the zip file for reading and call zip.OpenReader instead of the os.Open function. This function opens a zip file and returns zip.ReadCloser.

The important aspect of the ReadCloser function is that it contains a section of a pointer that points to each displaying file in the ZIP archive, which helps us to iterate over each of them. If the archive contains a directory, we will recreate the structure. The os.MkdirAll function offers the possibility of creating intermediate directories according to the structure.

Tarballs and Gzipping in Go

The creation of tarballs or tape archives is another type of archive that is frequently used. It was introduced in the late 1970s to write data to tape drives, which were a common storage device at the time. They were used to secure and transport files on Unix platforms. The technique is still in vogue today and is quite a popular format for storing multiple files as a single unit and is widely used for sending files over the Internet or for distributing / downloading software. This can be seen on Linux and Unix systems with file extensions ending in “.tar”. Note, however, that the purpose of a tarball is to consolidate files, not compress them.

Gzip, on the other hand, is a compression technique that greatly reduces the file size. This algorithm was developed by Jean-loup Gailly and Mark Adler as a free software utility for file compression in the early Unix system. The algorithm is based on the deflate technique (lossless data compression), which is a combination of LZ77 and Huffman coding. In short, gzip is great for compressing text-based files.

We often find files on numerous platforms that have been archived with the combination of tarball and compressed with gzip. The reason for this is that tarballs is a great algorithm for archiving files and gzip is for file compression, especially for text files. A combination of them corresponds to an efficient technique and is often used together. The archive file usually has an extension such as “.tgz” or “tar.gz”.

Read: Dealing with bugs in Go

Combining tar and gz in go

Here is an example to illustrate how to apply or create tar.gz files in Go. The example is similar to the one we created with zip. Here we supply a list of the files to be archived to a function called createTarAndGz, which takes files that are named in one piece of a string. As in the previous example, we first create the output file and name it with the function os.Create mytestarchive.tar.gz:

Main import package (“archive / tar” “compress / gzip” “fmt” “io” “os”) func main () {listOfFilesNames: = []string {“main.tex”, “main.toc”, “util / file1.txt”, “util / file1.txt”} outFile, err: = os.Create (“mytestarchive.tar.gz”) if err! = nil {panic (“Error creating the archive file”)} defer outFile.Close () err = createTarAndGz (listOfFilesNames, outFile) if err! = nil {panic (“Error creating the archive file.”)} fmt.Println ( “Archiving and file compression completed.”)} Func createTarAndGz (fileNames []string, buffer io.Writer) error {gzipWriter: = gzip.NewWriter (buffer) defer gzipWriter.Close () tarWriter: = tar.NewWriter (gzipWriter) defer tarWriter.Close () for _, f: = range fileNames {err: = addToTar (tarWriter, f) if err! = nil {return err}} return nil} func addToTar (tarWriter * tar.Writer, filename string) error {file, err: = os.Open (filename) if err! = nil {return err} defer file.Close () info, err: = file.Stat () if err! = nil {return err} header, err: = tar.FileInfoHeader (info, info.Name ()) if err! = nil {return err} header.Name = filename err = tarWriter.WriteHeader (header) if err! = nil {return err} _, err = io.Copy (tarWriter, file) if err! = nil {return err} return nil }

In the createTarAndGz function, we create both writers for tar and gz, which implement the io.Writer interface. The write processes created in each case form a chain of command. We then iterate over the slice of filenames and add them to the tar archive using the addToTar function.

The addToTar function opens the respective file, retrieves its metadata information, file headers to generate a tar header from it, writes the header and finally uses the io.Copy function to include the file in the archive. That’s it.

After the archive file has been created, it is extracted using any untar and gzip decompression program, e.g. B. tar xzvf mytestarchive.tar.gz.

Read: Introduction to database programming in Go

Final thought on archiving files in Go

Go’s standard library provides the functionality required to work with archive files. It supports various compression techniques like zip, gzip, bz2 etc. all of them can be put together with the tarball. The single compressed unit of multiple files can then be used for storage or transport on any media or network. We can also customize the archiving process according to our requirements. For example we can encrypt, compress and then tarball the files. The possibilities with Go’s standard library API are endless to serve our purpose.

Related posts

String objects in Java


How to Use Structures in Go


Introduction to Rational Unified Process (RUP)


Leave a Comment