
Exploring the Linux Tar Command: Compress and Extract Files

I have been asked to write an article that explains the tar command in Linux using simple terms.

And I totally understand why…

The tar command is definitely not one of the easiest Linux commands to learn and all these flags can be quite confusing!

The Linux tar command is used to create .tar, .tar.gz, .tgz or .tar.bz2 archives, often called “tarballs”. The extensions .tar.gz and .tgz identify archives compressed with gzip to reduce their size. Archives with the extension .tar.bz2 are compressed with bzip2.

On most Linux distributions gzip is installed together with tar, so gzip compression works out of the box. The same might not apply to other types of compression, as we will see in this article.

Let’s start with three tar command examples to get familiar with the most common flags.

How Do You Create a Tar Archive File?

Here is a basic example of the tar command syntax.

Open the Linux command line and follow all the commands while we go through them.

Use the following command from the current directory. In this example, we are not using compression:

tar -cf archive.tar testfile1 testfile2

This command allows you to create an archive file called archive.tar that contains two files: testfile1 and testfile2. We create the archive file using two flags: c, and f.

Here is the meaning of the two flags:

-c option (same as --create): create a new archive

-f option: it allows specifying an archive file (in this case called archive.tar)

The Linux file command confirms that archive.tar is an archive:

[myuser@localhost]$ file archive.tar 
archive.tar: POSIX tar archive (GNU)

Another useful flag is the -v flag which provides a verbose output of the files processed when you execute tar.

Let’s see how the output changes if we also pass the -v flag when creating the archive:

[myuser@localhost]$ tar -cfv archive.tar testfile1 testfile2
tar: archive.tar: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Weird, for some reason we get an error back…

That’s because the tar command creates an archive with a name based on what follows the -f flag, and in this case after the -f flag there is the flag v.

The result is an archive called v as you can see from the ls output below:

[myuser@localhost]$ ls -al
total 20
drwxrwxr-x. 2 myuser mygroup  4096 Jul 17 09:42 .
drwxrwxrwt. 6 root     root      4096 Jul 17 09:38 ..
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile1
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile2
-rw-rw-r--. 1 myuser mygroup 10240 Jul 17 09:42 v

[myuser@localhost]$ file v
v: POSIX tar archive (GNU)

The “No such file or directory” error is caused by the fact that tar creates an archive called v that contains three files: archive.tar, testfile1, and testfile2. But archive.tar doesn’t exist and hence the error.

This shows how important the order of the flags passed to the tar command is.

Let’s swap the -f and -v flags in the tar command and try again:

[myuser@localhost]$ tar -cvf archive.tar testfile1 testfile2
testfile1
testfile2

All is good this time: the verbose flag shows the names of the two files being added to the archive we are creating.
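
If you want to try both orderings yourself, here is a small sketch you can paste into a terminal. It works in a scratch directory created with mktemp (my own choice, so nothing in your current directory is touched), and the names archive1.tar, archive2.tar and archive3.tar are only for illustration:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# Wrong order: -f takes "v" as the archive name, so tar writes a file
# called "v" and then fails because archive.tar is treated as an input file
if tar -cfv archive.tar testfile1 testfile2 2>/dev/null; then
    echo "unexpected: tar succeeded"
else
    echo "tar failed, files now in the directory:"
    ls
fi
rm -f v

# Safe orderings: -f is the last flag in the group, or it stands on its own,
# so the word right after it is always the archive name
tar -cvf archive1.tar testfile1 testfile2
tar -vcf archive2.tar testfile1 testfile2
tar -c -v -f archive3.tar testfile1 testfile2
```

The rule of thumb I take from this: keep -f as the last flag, immediately followed by the archive name.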

We have seen how to add multiple files to a tar archive.

Makes sense?

How Do You View a List of All the Files in a Tar Archive?

To list all the files in a tar archive without extracting its content we will introduce a fourth flag: -t.

We can now put together three flags: -t, -v, and -f to see the files in the archive we have previously created:

[myuser@localhost]$ tar -tvf archive.tar 
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2
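
The listing is also handy in scripts. The sketch below drops the -v flag so that tar prints one name per line, then counts the entries and checks whether a given file is in the archive. The scratch directory and the variable name entries are my own assumptions for the example:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# Without -v, tar -t prints only the names, one per line
# (tr removes the padding some versions of wc add around the number)
entries=$(tar -tf archive.tar | wc -l | tr -d ' ')
echo "archive.tar contains $entries entries"

# grep -x matches whole lines only, so "testfile1" would not match "testfile10"
if tar -tf archive.tar | grep -qx 'testfile1'; then
    echo "testfile1 is in the archive"
fi
```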

One of the things I noticed when I started using the tar command is that different people run it in slightly different ways.

I will explain what I mean in the next section…

Does the Tar Command Need a Dash?

I noticed that in some cases the dash before the flags was present, but that wasn’t always the case.

So, let’s see if passing the dash or not makes any difference.

First of all, let’s try to run the previous command without using the dash before the flags:

[myuser@localhost]$ tar tvf archive.tar 
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2

The output is the same, which means the dash is not necessary.

Just to give you an idea, you can run the tar command in the following ways and obtain the same output:

tar -t -v -f archive.tar 
tar -tvf archive.tar
tar tvf archive.tar
tar --list --verbose --file archive.tar

The last command is using the long-option style for the flags passed to Linux commands.

You can see how it’s a lot easier to use the short version of the flags.
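
You don’t have to take my word for it. The following sketch saves the listing produced by each style to a file and compares them with cmp. The scratch directory and the .txt file names are just assumptions for this example:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# Capture the listing produced by each style
tar -t -v -f archive.tar > separate.txt
tar -tvf archive.tar > grouped.txt
tar tvf archive.tar > nodash.txt
tar --list --verbose --file archive.tar > long.txt

# cmp -s is silent and succeeds only when two files are byte-for-byte identical
if cmp -s separate.txt grouped.txt &&
   cmp -s grouped.txt nodash.txt &&
   cmp -s nodash.txt long.txt; then
    echo "all four styles produced the same listing"
fi
```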

How to Extract All the Files From a Tar Archive

Let’s introduce an additional flag you can use to extract an archive. It’s the -x flag.

Let’s extract the tar file we have created before (you might also hear the term “untar”).

Use the Linux tar command below:

[myuser@localhost]$ tar -xvf archive.tar
testfile1
testfile2
[myuser@localhost]$ ls -al
total 20
drwxrwxr-x 2 myuser mygroup    59 Feb 10 21:21 .
drwxr-xr-x 3 myuser mygroup    55 Feb 10 21:21 ..
-rw-rw-r-- 1 myuser mygroup 10240 Feb 10 21:17 archive.tar
-rw-rw-r-- 1 myuser mygroup    54 Feb 10 21:17 testfile1
-rw-rw-r-- 1 myuser mygroup    78 Feb 10 21:17 testfile2 

As you can see we have used the -x flag to extract the content of the archive, the -v flag to do it verbosely, and the -f flag to refer to the archive file specified after the flags (archive.tar).

NOTE: As mentioned before we are only typing the dash character once before all the flags. We could have specified the dash character before each flag instead and the output would have been the same.

tar -x -v -f archive.tar
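
To convince yourself that the files really come out of the archive, you can run a quick round trip like the one sketched below. The file contents and the scratch directory are made up for the example:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
echo "first file" > testfile1
echo "second file" > testfile2
tar -cf archive.tar testfile1 testfile2

# Remove the originals so we know the files below come from the archive
rm testfile1 testfile2

# Extract: the archive itself is left in place
tar -xvf archive.tar

cat testfile1 testfile2
```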

How To Extract Only One File From a Tar Archive

There is also a way to extract a single file from your archive.

In this scenario, it doesn’t make much difference considering that there are only two files inside our archive. But it can make a huge difference if you have an archive containing thousands of files and you only need one.

This is very common if you have a backup script that creates an archive of the log files for the last 30 days and you only want to see the content of the log file for a specific day.

To extract only testfile1 from archive.tar, you can use the following generic syntax:

tar -xvf {archive_file} {path_to_file_to_extract}

And in our specific case:

tar -xvf archive.tar testfile1

Let’s see what changes if I create a tar archive that contains two directories:

[myuser@localhost]$ ls -ltr
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir1
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir2
[myuser@localhost]$ tar -cvf archive.tar dir*
dir1/
dir1/testfile1
dir2/
dir2/testfile2

Note: I have used the wildcard * to include in the archive any files or directories whose name starts with “dir”.

If I want to just extract testfile1 the command will be:

tar -xvf archive.tar dir1/testfile1

After the extraction, the original directory structure is preserved, so we will end up with testfile1 inside dir1:

[myuser@localhost]$ ls -al dir1/
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:36 .
drwxrwxr-x. 3 myuser mygroup 4096 Jul 17 10:36 ..
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:34 testfile1
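
Going back to the backup scenario mentioned earlier, here is a rough sketch of how pulling a single day out of a log archive could look. The logs directory, the app-2020-07-NN.log file names and logs.tar are all hypothetical, invented only for this example:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical log files, one per day
mkdir logs
for day in 01 02 03 04 05; do
    echo "log entries for day $day" > "logs/app-2020-07-$day.log"
done
tar -cf logs.tar logs
rm -r logs

# Extract a single member: the path must match what "tar -tf logs.tar" shows
tar -xvf logs.tar logs/app-2020-07-03.log

# Only the requested file is restored, inside its original directory
ls logs
cat logs/app-2020-07-03.log
```

If you are not sure about the exact path of a file inside the archive, list the archive first with the -t flag and copy the path from there.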

Is everything clear?

How Can You Compress A Tar Archive to Reduce Its Size?

All the tar files we have created so far were uncompressed. Let’s say we want to also compress our archives.

How do we do it?

Gzip and Bzip2 compression formats can be used to reduce the size of a tar archive. The additional tar flags to enable compression are:

  • -z for Gzip: long flag is --gzip
  • -j for Bzip2: long flag is --bzip2

To create a gzipped tar archive called archive.tar.gz with verbose output run the following command (also one of the most common commands used when creating tar archives):

tar -czvf archive.tar.gz testfile1 testfile2

And to extract its content we will use:

tar -xzvf archive.tar.gz

This is also one of the essential commands when working with tar.

Notice how in the name of the file created, we append the .gz extension to the previous tar extension.

We could have also used the .tgz extension instead of .tar.gz and the result would have been the same.
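
A quick way to check this is to create the same archive with both extensions and compare what they contain. In this sketch the scratch directory and the list_*.txt file names are my own assumptions:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# Same flags, same input files, only the extension changes
tar -czf archive.tar.gz testfile1 testfile2
tar -czf archive.tgz testfile1 testfile2

# List both archives and compare the listings
tar -tzf archive.tar.gz > list_targz.txt
tar -tzf archive.tgz > list_tgz.txt
if cmp -s list_targz.txt list_tgz.txt; then
    echo "both archives contain the same files"
fi
```

The extension is only a naming convention: what makes the archive gzipped is the -z flag.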

Now, let’s create an archive that uses bzip2:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
/bin/sh: bzip2: command not found
tar: Child returned status 127
tar: Error is not recoverable: exiting now

The error “bzip2: command not found” shows that the tar command is trying to use the bzip2 command for the compression but the command cannot be found on our Linux system.

The solution is to install bzip2. The procedure depends on the Linux distribution you are using. In my case, it’s CentOS, which uses YUM as a package manager.

Let’s install bzip2 using the following yum command:

yum install bzip2

I can confirm that the bzip2 binary exists using the which command:

[myuser@localhost]$ which bzip2
/usr/bin/bzip2
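
In a script you may want to perform this check before calling tar. The sketch below is one possible approach, not the only one: it uses command -v to look for bzip2 and falls back to gzip when bzip2 is missing. The variable name created and the fallback itself are my own assumptions:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# command -v prints the path of a command and fails if it is not installed
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 testfile1 testfile2
    created=archive.tar.bz2
else
    echo "bzip2 not found, falling back to gzip" >&2
    tar -czf archive.tar.gz testfile1 testfile2
    created=archive.tar.gz
fi
echo "created $created"
```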

And the command to create compressed archives using bzip2 is the following:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
[myuser@localhost]$ ls -al
total 16
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:45 .
drwxrwxrwt. 6 root     root     4096 Jul 17 10:53 ..
-rw-rw-r--. 1 myuser mygroup  136 Jul 17 10:54 archive.tar.bz2
-rw-rw-r--. 1 myuser mygroup  128 Jul 17 10:45 archive.tar.gz
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile1
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile2

Everything works!

Also, since I’m very curious, I want to see how the Linux file command describes the two archives (.tar.gz and .tar.bz2):

[myuser@localhost]$ file archive.tar.gz 
archive.tar.gz: gzip compressed data, last modified: Fri Jul 17 10:45:04 2020, from Unix, original size 10240
[myuser@localhost]$ file archive.tar.bz2 
archive.tar.bz2: bzip2 compressed data, block size = 900k

As you can see, the file command can distinguish between archives generated using the two different compression algorithms.
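
The test files used so far are empty, so they don’t tell us much about how effective compression is. Here is a sketch that compares archive sizes using a file with some repetitive content. The file sample.txt, its content and the archive names are invented for the example, and the results will vary depending on your data:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"

# Build a compressible sample file: the same line repeated many times
for i in $(seq 1 5000); do
    echo "this line repeats and compresses well"
done > sample.txt

tar -cf plain.tar sample.txt
tar -czf gzip.tar.gz sample.txt

# bzip2 might not be installed, so only use it when it is available
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf bzip2.tar.bz2 sample.txt
fi

# wc -c prints the size in bytes of each archive
wc -c plain.tar gzip.tar.gz
if [ -f bzip2.tar.bz2 ]; then
    wc -c bzip2.tar.bz2
fi
```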

How Do You Compress a Directory Using Tar?

Let’s see how to tar an entire directory and not just the files inside it.

Create a directory called tar_example and inside it create two files:

[myuser@localhost]$ mkdir tar_example
[myuser@localhost]$ touch tar_example/testfile1
[myuser@localhost]$ touch tar_example/testfile2

Now, execute the following command to generate the tar file of this specific directory:

[myuser@localhost]$ tar -cvzf tar_example_dir.tar.gz tar_example 
a tar_example
a tar_example/testfile1
a tar_example/testfile2

Confirm that the tar file has been created correctly using the tar command with the -t option. This command shows the content of the tar.gz file:

[myuser@localhost]$ tar -tvzf tar_example_dir.tar.gz 
drwxr-xr-x  0 myuser mygroup       0 Jul 17 11:04 tar_example/
-rw-r--r--  0 myuser mygroup       0 Jul 17 11:04 tar_example/testfile1
-rw-r--r--  0 myuser mygroup       0 Jul 17 11:04 tar_example/testfile2
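
If you want an extra check, you can extract the archive somewhere else and compare the result with the original directory. This is only a sketch: the restore directory and the file contents are assumptions I made for the example:

```shell
# Work in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
mkdir tar_example
echo "one" > tar_example/testfile1
echo "two" > tar_example/testfile2
tar -czf tar_example_dir.tar.gz tar_example

# Extract inside a separate directory so the copy does not overwrite the original
mkdir restore
(cd restore && tar -xzf ../tar_example_dir.tar.gz)

# diff -r compares two directory trees and prints nothing when they match
if diff -r tar_example restore/tar_example; then
    echo "restored copy matches the original"
fi
```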

Everything looks good!

Conclusion

In this article, you have learned the most common flags used with the tar command, how to create and extract a tar archive and how to create and extract a gzipped tar archive.

Let’s recap all the tar command options again:

  • -c: create a new archive
  • -f: allows specifying the file name of the archive
  • -t: list the contents of an archive
  • -v: verbosely list files processed
  • -x: extract files from an archive
  • -z: create compressed archive files using gzip
  • -j: used to compress with bzip2

And you? What are you using the tar command for?

Bonus read: now that you know how to use the tar command, let’s learn how to use another important Linux command: the tail command.
