
Tar Command in Linux: Compress and Extract Files

I have been asked to write an article that explains the Linux tar command in simple terms.

And I totally understand why…

The tar command is definitely not one of the easiest Linux commands to learn, and all its flags can be quite confusing!

How does the Linux tar command work?

The tar command is used to create .tar, .tar.gz, .tgz or .tar.bz2 archives, often called “tarballs”. The extensions .tar.gz and .tgz identify archives compressed with gzip to reduce their size. Archives with the extension .tar.bz2 are compressed with bzip2.

Most Linux distributions install gzip by default, so tar can create gzip-compressed archives out of the box. The same might not apply to other types of compression, as we will see in this article.

Let’s start with three examples of tar commands to get familiar with the most common flags.

Create an archive that contains two files

Here is a basic example of the tar command. In this case we are not using compression:

tar -cf archive.tar testfile1 testfile2

This command creates an archive file called archive.tar that contains two files: testfile1 and testfile2.

Here is the meaning of the two flags:

-c (same as --create): creates a new archive

-f (same as --file): specifies the name of the archive file (in this case archive.tar)

The file command confirms that archive.tar is an archive:

[myuser@localhost]$ file archive.tar 
archive.tar: POSIX tar archive (GNU)
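If you want to run the same steps from a script, you can reproduce them in a throwaway directory and check the exit status of tar. This is only a sketch: the temporary directory and the status variable are assumptions made for illustration, and it presumes GNU tar as shipped by most Linux distributions.

```shell
# Work in a scratch directory so nothing in the current directory is touched
workdir=$(mktemp -d)
cd "$workdir"

# Two empty files to archive (same names as in the example above)
touch testfile1 testfile2

# -c creates the archive, -f names it
tar -cf archive.tar testfile1 testfile2
status=$?

# tar exits with 0 on success, so a script can test the result
if [ "$status" -eq 0 ] && [ -f archive.tar ]; then
    echo "archive.tar created"
fi

# The original files are left in place next to the archive
ls
```

When you are done, the scratch directory can be removed with rm -rf "$workdir".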

Another useful flag is the -v flag that provides a verbose output of the files processed during the execution of the tar command on Linux.

Let’s see how the output changes if we also pass the -v flag when creating the archive:

[myuser@localhost]$ tar -cfv archive.tar testfile1 testfile2
tar: archive.tar: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Weird, for some reason we get an error back…

That’s because the tar command creates an archive with a name based on what follows the -f flag, and in this case after the -f flag there is v.

The result is an archive called v as you can see from the ls output below:

[myuser@localhost]$ ls -al
total 20
drwxrwxr-x. 2 myuser mygroup  4096 Jul 17 09:42 .
drwxrwxrwt. 6 root     root      4096 Jul 17 09:38 ..
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile1
-rw-rw-r--. 1 myuser mygroup     0 Jul 17 09:38 testfile2
-rw-rw-r--. 1 myuser mygroup 10240 Jul 17 09:42 v

[myuser@localhost]$ file v
v: POSIX tar archive (GNU)

The “No such file or directory” error is caused by the fact that tar tries to create an archive called v that contains three files: archive.tar, testfile1 and testfile2.

But archive.tar doesn’t exist, hence the error.

This shows how important the order of the flags is for tar: the -f flag must be immediately followed by the name of the archive.

Let’s swap the -f and -v flags in the tar command and try again:

[myuser@localhost]$ tar -cvf archive.tar testfile1 testfile2
testfile1
testfile2

All good this time: the verbose flag shows the names of the two files being added to the archive we are creating.
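One way to avoid this mistake is to keep -f as the last letter in a group of flags, or to give it its own dash. The sketch below assumes GNU tar and uses a scratch directory created only for illustration. It reproduces the failing command, records its exit status and then runs a safe equivalent.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# Wrong order: "v" is taken as the archive name and archive.tar as a file to add
bad_status=0
tar -cfv archive.tar testfile1 testfile2 2>/dev/null || bad_status=$?
echo "exit status with -cfv: $bad_status"   # non-zero, because archive.tar does not exist

# Remove the stray archive named "v"
rm -f v

# Safe form: -f has its own dash and is followed directly by the archive name
good_status=0
tar -c -v -f archive.tar testfile1 testfile2 || good_status=$?
echo "exit status with -c -v -f: $good_status"
```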

Makes sense?

List all files in a tar archive verbosely

To list all the files in a tar archive without extracting its content we will introduce a fourth flag:

-t: list the contents of an archive

We can now put together three flags: -t, -v and -f to see the files in the archive we have previously created:

[myuser@localhost]$ tar -tvf archive.tar 
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2
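When -v is left out, -t prints only the member names, one per line, which makes the output easy to pass to other commands. The following sketch checks whether a given file is in the archive without extracting anything. The scratch directory and the member_count variable are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# Without -v, tar -t prints only the member names, one per line
tar -tf archive.tar

# grep: -x matches whole lines only, -q suppresses output, -F treats the name literally
if tar -tf archive.tar | grep -qxF 'testfile1'; then
    echo "testfile1 is in the archive"
fi

# Count the members of the archive
member_count=$(tar -tf archive.tar | wc -l)
echo "members: $member_count"
```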

One of the things I noticed when I started using the tar command is that different people were running it in a slightly different way.

I will explain what I mean in the next section…

Shall I Use the Dash or Not with Tar?

I noticed that in some cases the dash before the flags was present, but this wasn’t always the case.

So, let’s see if passing the dash or not makes any difference.

First of all, let’s try to run the same command without using the dash before the flags:

[myuser@localhost]$ tar tvf archive.tar 
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile1
-rw-rw-r-- myuser/mygroup 0 2020-07-17 09:38 testfile2

The output is the same, which means the dash is not necessary.

Just to give you an idea, you can run the tar command in the following way and obtain the same output:

tar -t -v -f archive.tar 
tar -tvf archive.tar
tar tvf archive.tar
tar --list --verbose --file archive.tar

The last command uses the long-option style for the flags passed to Linux commands.

You can see how it’s a lot easier to use the short version of the flag.
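A quick way to convince yourself that these styles are equivalent is to capture the output of each one and compare them. This is a minimal sketch that assumes GNU tar (the long options may not exist in other tar implementations). The scratch directory and variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2
tar -cf archive.tar testfile1 testfile2

# The same listing requested in four different styles
separate=$(tar -t -v -f archive.tar)
grouped=$(tar -tvf archive.tar)
no_dash=$(tar tvf archive.tar)
long_opts=$(tar --list --verbose --file archive.tar)

if [ "$separate" = "$grouped" ] && [ "$grouped" = "$no_dash" ] && [ "$no_dash" = "$long_opts" ]; then
    echo "all four styles produce the same listing"
fi
```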

Extract all files from an archive

Let’s introduce an additional flag that extracts the content of a tar archive: the -x flag.

To extract the content of the file we have created before we can use the following command:

[myuser@localhost]$ tar -xvf archive.tar
testfile1
testfile2
[myuser@localhost]$ ls -al
total 20
drwxrwxr-x 2 myuser mygroup    59 Feb 10 21:21 .
drwxr-xr-x 3 myuser mygroup    55 Feb 10 21:21 ..
-rw-rw-r-- 1 myuser mygroup 10240 Feb 10 21:17 archive.tar
-rw-rw-r-- 1 myuser mygroup    54 Feb 10 21:17 testfile1
-rw-rw-r-- 1 myuser mygroup    78 Feb 10 21:17 testfile2

As you can see we have used the -x flag to extract the content of the archive, the -v flag to do it verbosely and the -f flag to refer to the archive file specified after the flags (archive.tar).

NOTE: As mentioned before we are only typing the dash character once before all the flags. We could have specified the dash sign before each flag instead and the output would have been the same.

tar -x -v -f archive.tar

There is also a way to extract a single file from your archive.

In this scenario it doesn’t make much difference considering that there are only two files inside our archive. But it can make a huge difference if you have an archive that contains thousands of files and you only need one of them.

This is very common if you have a backup script that creates an archive of the log files for the last 30 days and you only want to see the content of the log file for a specific day.

To extract only testfile1 from archive.tar, you can use the following generic syntax:

tar -xvf {archive_file} {path_to_file_to_extract}

And in our specific case:

tar -xvf archive.tar testfile1
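The log backup scenario mentioned above can be sketched as follows. The log file names, their content and the restore directory are hypothetical, invented only for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Three hypothetical daily log files
echo "entries for 15 July" > app-2020-07-15.log
echo "entries for 16 July" > app-2020-07-16.log
echo "entries for 17 July" > app-2020-07-17.log
tar -cf logs.tar app-2020-07-15.log app-2020-07-16.log app-2020-07-17.log

# Extract a single day into an empty directory
mkdir restore
cd restore
tar -xvf ../logs.tar app-2020-07-16.log

# Only the requested file has been extracted
ls
cat app-2020-07-16.log
```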

Let’s see what changes if I create a tar archive that contains two directories:

[myuser@localhost]$ ls -ltr
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir1
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:34 dir2

[myuser@localhost]$ tar -cvf archive.tar dir*
dir1/
dir1/testfile1
dir2/
dir2/testfile2

Note: I have used the wildcard * to include in the archive any files or directories whose name starts with “dir”.

If I want to just extract testfile1 the command will be:

tar -xvf archive.tar dir1/testfile1

After the extraction the original directory structure is preserved, so I will end up with testfile1 inside dir1:

[myuser@localhost]$ ls -al dir1/
total 8
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:36 .
drwxrwxr-x. 3 myuser mygroup 4096 Jul 17 10:36 ..
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:34 testfile1
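The path given on the command line has to match the member name exactly as tar -t prints it. Here is a small sketch of that behaviour, assuming GNU tar and a scratch directory laid out like the example above (the extracted directory and the status variable are made up for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir1 dir2
touch dir1/testfile1 dir2/testfile2
tar -cf archive.tar dir1 dir2

# Move to an empty directory before extracting
mkdir extracted
cd extracted

# List the archive first to find the exact path of the file
tar -tf ../archive.tar

# The bare file name is not a member of the archive, so this fails
wrong_path_status=0
tar -xf ../archive.tar testfile1 2>/dev/null || wrong_path_status=$?
echo "exit status with bare file name: $wrong_path_status"

# The full path as listed works and recreates dir1/
tar -xf ../archive.tar dir1/testfile1
ls dir1
```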

Everything clear?

Reducing the size of a tar archive

Gzip and Bzip2 compression can be used to reduce the size of a tar archive.

The additional tar flags to enable compression are:

  • -z for Gzip compression: long flag is --gzip
  • -j for Bzip2 compression: long flag is --bzip2

To create a gzipped tar archive called archive.tar.gz with verbose output we will use the following command (also one of the most common commands used when creating tar archives):

tar -czvf archive.tar.gz testfile1 testfile2

And to extract its content we will use:

tar -xzvf archive.tar.gz

We could have also used the .tgz extension instead of .tar.gz and the result would have been the same.
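With empty test files there is very little to compress, so the effect of -z is easier to see on a file with repetitive content. The sketch below generates such a file (its content is made up for the example) and compares the size of the plain and gzipped archives using wc -c.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Generate a highly compressible file: the same line repeated 10000 times
yes "sample log line" | head -n 10000 > testfile1

# Same content, archived without and with gzip compression
tar -cf archive.tar testfile1
tar -czf archive.tar.gz testfile1

# wc -c prints the size in bytes
plain_size=$(wc -c < archive.tar)
gzip_size=$(wc -c < archive.tar.gz)
echo "archive.tar:    $plain_size bytes"
echo "archive.tar.gz: $gzip_size bytes"
```

The exact numbers depend on your system, but the gzipped archive should be much smaller than the plain one.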

Now, let’s create an archive that uses bzip2 compression:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
/bin/sh: bzip2: command not found
tar: Child returned status 127
tar: Error is not recoverable: exiting now

The error “bzip2: command not found” shows that the tar command is trying to use the bzip2 command for the compression but the command cannot be found on our Linux system.

The solution is to install bzip2. The procedure depends on the Linux distribution you are using. In my case it’s CentOS, which uses yum as its package manager.

Let’s install bzip2 using the following yum command:

yum install bzip2

I can confirm that the bzip2 binary exists using the which command:

[myuser@localhost]$ which bzip2
/usr/bin/bzip2

And now if I run the tar command with bzip2 compression again:

[myuser@localhost]$ tar -cvjf archive.tar.bz2 testfile*
testfile1
testfile2
[myuser@localhost]$ ls -al
total 16
drwxrwxr-x. 2 myuser mygroup 4096 Jul 17 10:45 .
drwxrwxrwt. 6 root     root     4096 Jul 17 10:53 ..
-rw-rw-r--. 1 myuser mygroup  136 Jul 17 10:54 archive.tar.bz2
-rw-rw-r--. 1 myuser mygroup  128 Jul 17 10:45 archive.tar.gz
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile1
-rw-rw-r--. 1 myuser mygroup    0 Jul 17 10:44 testfile2

Everything works!
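A script that has to run on systems with and without bzip2 can check for the binary before choosing the compression flag. This is only a sketch: falling back to gzip is an assumption about what is acceptable in your case, and the scratch directory and the archive variable are used purely for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch testfile1 testfile2

# command -v succeeds only if the bzip2 binary is found in the PATH
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 testfile1 testfile2
    archive=archive.tar.bz2
else
    echo "bzip2 not found, falling back to gzip" >&2
    tar -czf archive.tar.gz testfile1 testfile2
    archive=archive.tar.gz
fi

echo "created $archive"
```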

Out of curiosity, let’s see how the Linux file command describes the two archives (.tar.gz and .tar.bz2):

[myuser@localhost]$ file archive.tar.gz 
archive.tar.gz: gzip compressed data, last modified: Fri Jul 17 10:45:04 2020, from Unix, original size 10240
[myuser@localhost]$ file archive.tar.bz2 
archive.tar.bz2: bzip2 compressed data, block size = 900k

As you can see, the file command can distinguish between archives generated using the two different compression algorithms.

Conclusion

In this article you have learned the most common flags used with the tar command, how to create and extract a tar archive, and how to create archives compressed with gzip or bzip2.

Let’s recap all the flags again:

  • -c: create a new archive
  • -f: specify the filename of the archive
  • -t: list the contents of an archive
  • -v: verbosely list files processed
  • -x: extract files from an archive
  • -z: use gzip compression
  • -j: use bzip2 compression
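Putting these flags together, a complete round trip might look like the sketch below: create a gzipped archive, list it, extract it somewhere else and compare the result with the original files. The file content and the directory names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "first file" > testfile1
echo "second file" > testfile2

# -c create, -z gzip, -v verbose, -f archive name
tar -czvf archive.tar.gz testfile1 testfile2

# -t list the contents without extracting
tar -tzvf archive.tar.gz

# -x extract, here from inside a separate empty directory
mkdir restored
cd restored
tar -xzvf ../archive.tar.gz

# cmp is silent and returns 0 when two files are identical
cmp testfile1 ../testfile1 && cmp testfile2 ../testfile2 && echo "round trip OK"
```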

And you? What are you using the tar command for?

Let me know in the comments below 😉
