2020-294

Linux Softlinks vs. Hardlinks

Linux and other Unix-like systems can create links (shortcuts) to files and directories. These come in two flavors that work quite differently from one another, and I have found that new users are often confused about the subtle differences between them.

Quickstart

The command to create links, regardless of type, is ln. To create a softlink, use it like this:

# Create a softlink called `system_passwd` to `/etc/passwd`
ln -s /etc/passwd ~/system_passwd

When you run ls -l ~/system_passwd, the softlink is obvious in several ways:

lrwxrwxrwx 1 root root 11 Oct 20 00:03 /root/system_passwd -> /etc/passwd

The file name is followed by an arrow pointing at the original file path.
The very first character is l, which stands for link.
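The same two clues can also be checked from a script. The following is a minimal sketch, assuming bash and GNU coreutils. It works in a throwaway directory created with mktemp instead of touching /etc/passwd, and the file names original and softlink are just placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory, removed automatically when the script exits
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

echo "hello" > "$dir/original"
ln -s "$dir/original" "$dir/softlink"

# -L is true for a symbolic link, without following it
if [ -L "$dir/softlink" ]; then
    echo "softlink is a symbolic link"
fi

# readlink prints the path stored inside the link
echo "points to: $(readlink "$dir/softlink")"

# Reading through the link returns the original's content
cat "$dir/softlink"
```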

Hardlinks can be created like this:

# Create a hardlink called `system_passwd` to `/etc/passwd`
# (delete the softlink from the previous example first, or ln reports that the file exists)
ln /etc/passwd ~/system_passwd

Checking the new link with ls -l ~/system_passwd gives no obvious clue that it is a link. Only the link count of 2 in the second column hints at it:

-rw-r--r-- 2 root root 1441 Oct  8 05:25 /root/system_passwd

Yet both names are linked as expected: changes made through one are reflected in the other, because a hardlink is simply the exact same file under a different name. See below for more details.
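A quick way to convince yourself of this is to append to a file through one name and read it back through the other. This is a minimal sketch, assuming bash and GNU coreutils, with placeholder file names in a temporary directory. The test operator -ef used at the end checks whether two paths refer to the same device and inode.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

echo "first line" > "$dir/original"
ln "$dir/original" "$dir/hardlink"

# Modify the file through one name...
echo "appended via hardlink" >> "$dir/hardlink"

# ...and the change shows up through the other name
cat "$dir/original"

# -ef: both paths refer to the same device and inode
if [ "$dir/original" -ef "$dir/hardlink" ]; then
    echo "original and hardlink are the same file"
fi
```

The output lists both lines of the file, followed by the confirmation that the two names refer to the same file.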

Softlinks

Softlinks (also known as symbolic links or symlinks) are special files that store the path of the original file. The original is unaware of the link, and each of the two has its own inode (a number that is unique within its file system).

This type of link is very similar to a Windows shortcut, which is likewise nothing but a file containing information about the original.

Links of this type work regardless of where the original is or what kind of file it is. You can create softlinks to directories, to files on other file systems (e.g. a different partition or hard drive), or even to files on network storage.

The main drawback is that once the original is no longer at the stored path (e.g. it has been renamed, moved, or removed), every link pointing to it stops working. Such a link is called dangling or broken.
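The following sketch renames the original and shows the resulting broken link. It assumes bash, GNU coreutils, and GNU findutils, with placeholder names in a temporary directory. -L checks the link itself, -e follows it, and GNU find's -xtype l lists softlinks whose target no longer exists.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

echo "data" > "$dir/original"
ln -s "$dir/original" "$dir/softlink"

# Rename the original: the link still stores the old path
mv "$dir/original" "$dir/renamed"

# -L tests the link itself, -e follows it to the (now missing) target
if [ -L "$dir/softlink" ] && [ ! -e "$dir/softlink" ]; then
    echo "softlink is dangling"
fi

# The stored path is unchanged; it simply no longer resolves
readlink "$dir/softlink"

# GNU find: -xtype l matches softlinks whose target does not exist
find "$dir" -xtype l
```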

Hardlinks

This type of link works very differently from a softlink. There is no longer an original and a link: both names refer to the exact same file with the exact same inode.

Let's look at the following example:

root@training:~# touch original
root@training:~# ln original hardlink
root@training:~# ls -li original hardlink
787495 -rw-r--r-- 2 root root 0 Oct 20 00:13 hardlink
787495 -rw-r--r-- 2 root root 0 Oct 20 00:13 original

The -i flag makes ls display each file's inode number, which reveals that these files are related: both have the same inode, 787495.
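Instead of eyeballing ls -li, you can compare inode numbers with GNU stat. This sketch assumes bash and GNU coreutils and uses placeholder names in a temporary directory. The format %d:%i prints the device number together with the inode number, since an inode number on its own only identifies a file within one file system.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

touch "$dir/original" "$dir/other"
ln "$dir/original" "$dir/hardlink"

# GNU stat: %d is the device number, %i the inode number
id_original=$(stat -c '%d:%i' "$dir/original")
id_hardlink=$(stat -c '%d:%i' "$dir/hardlink")
id_other=$(stat -c '%d:%i' "$dir/other")

echo "original: $id_original"
echo "hardlink: $id_hardlink"
echo "other:    $id_other"

if [ "$id_original" = "$id_hardlink" ]; then
    echo "original and hardlink share an inode"
fi
```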

Take a closer look with stat:

root@training:~# stat original
  File: original
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd11h/64785d    Inode: 787495      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-10-20 00:13:05.966686463 +0000
Modify: 2020-10-20 00:13:05.966686463 +0000
Change: 2020-10-20 00:13:15.398693873 +0000
 Birth: -
root@training:~# stat hardlink
  File: hardlink
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: fd11h/64785d    Inode: 787495      Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-10-20 00:13:05.966686463 +0000
Modify: 2020-10-20 00:13:05.966686463 +0000
Change: 2020-10-20 00:13:15.398693873 +0000
 Birth: -

Looking at the details, we can see that the Links attribute is set to 2. Apart from the file name, everything else is identical.
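The Links value changes as names are added and removed. Here is a minimal sketch, assuming bash and GNU stat (where %h prints the link count), again using placeholder names in a temporary directory:

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

touch "$dir/original"
# GNU stat: %h prints the number of hard links
count_start=$(stat -c %h "$dir/original")        # 1

ln "$dir/original" "$dir/link1"
ln "$dir/original" "$dir/link2"
count_added=$(stat -c %h "$dir/original")        # 3

rm "$dir/link1"
count_removed=$(stat -c %h "$dir/original")      # 2

echo "$count_start $count_added $count_removed"  # prints "1 3 2"
```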

Because they share the same inode, these files are in fact just one file with two labels. Indeed, every file name is just a link to an inode. Look at a regular file without additional links:

root@training:~# stat /etc/passwd
  File: /etc/passwd
  Size: 1441            Blocks: 8          IO Block: 4096   regular file
Device: fd11h/64785d    Inode: 140246      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-10-20 00:03:41.998244637 +0000
Modify: 2020-10-08 05:25:34.640900174 +0000
Change: 2020-10-20 00:03:37.246240926 +0000
 Birth: -

Here the Links attribute is 1: the name /etc/passwd is the only link to inode 140246.

Names in the file system are nothing but links, and creating new links with ln just adds more of these labels to the same data. No copy of the actual data is made: even a huge file occupies its space only once, no matter how many links to it you add throughout your file system tree.
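You can check the space claim with du. In this sketch (bash and GNU coreutils assumed, placeholder names in a temporary directory), a 1 MiB file of random data gets two extra hardlinks. GNU du counts a hardlinked file only once per invocation, so the first total stays around 1 MiB, while -l (--count-links) counts it once per link and roughly triples the figure. Random data is used so that file system compression does not distort the numbers.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# 1 MiB of random data, so compression or sparse files don't skew the result
head -c 1048576 /dev/urandom > "$dir/big"
ln "$dir/big" "$dir/big_link1"
ln "$dir/big" "$dir/big_link2"

# GNU du counts each hardlinked inode only once per invocation
once=$(du -sk "$dir" | cut -f1)
# -l (--count-links) counts the file once per link instead
per_link=$(du -skl "$dir" | cut -f1)

echo "space actually used: ${once} KiB"
echo "if every link were a copy: ${per_link} KiB"
```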

However, hardlinks have some limitations. Because they depend on the inode, they are confined to a single file system: if your system has multiple file systems (e.g. partitions), you can't create hardlinks across them, and ln fails with "Invalid cross-device link". The reason for this is simple. Each file system has its own table of inodes, and inode numbers are plain integers that are only unique within that file system, not globally unique IDs like UUIDs.
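Before attempting a hardlink, a script can compare the device numbers of the source file and the target directory. The helper same_device below is hypothetical, not a standard command, and the sketch assumes bash and GNU stat. A matching device number is necessary for a hardlink but not always sufficient: Linux also refuses links across different mount points of the same file system, e.g. bind mounts. /proc is used only as an example of a path that is always on a different file system.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds if both paths report the same device number.
# Necessary for a hardlink, but not always sufficient (e.g. bind mounts).
same_device() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/file"

if same_device "$dir/file" "$dir"; then
    echo "$dir: same file system, a hardlink is possible"
fi

# /proc is a separate (virtual) file system on Linux
if ! same_device "$dir/file" /proc; then
    echo "/proc: different file system, ln would fail with a cross-device error"
fi
```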

Another limitation is that they (usually) don't work with directories. GNU ln has a -d flag that attempts to hardlink directories. However, it is limited to the superuser, and even then it normally fails with "Operation not permitted".
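Here is a minimal sketch of what happens in practice, assuming bash and a plain ln call without -d. The exact error text depends on your ln implementation, so the script just captures and prints it. A softlink to the same directory works without complaint.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/somedir"

# Attempt a hardlink to a directory; expected to fail
if ln "$dir/somedir" "$dir/dirlink" 2> "$dir/error.txt"; then
    echo "unexpected: hardlink to a directory succeeded"
else
    echo "ln refused: $(cat "$dir/error.txt")"
fi

# A softlink to the same directory is no problem
ln -s "$dir/somedir" "$dir/dirsoftlink"
ls -ld "$dir/dirsoftlink"
```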

Finding all hardlinks for a file

As the examples above show, ls gives us very little to identify hardlinks. To find every link to a specific file, you can use the find command:

root@training:~# find / -inum 787495
/root/original
/root/hardlink

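In this case find searches the entire tree. If you have multiple file systems, it is wiser to start the search at the mount point of the file system that holds the file and to add -xdev, so that find does not descend into other file systems. Otherwise, unrelated files there that happen to have the same inode number show up as false positives.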

The key here is the -inum test, which lets you search by inode number rather than by name or other, more commonly used file attributes.
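Here is a self-contained version of the search, limited to a temporary directory so it runs quickly. It assumes bash, GNU coreutils, and GNU findutils. -samefile is a GNU find extension that matches by device and inode, which saves you from looking up the inode number first.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

touch "$dir/original" "$dir/unrelated"
mkdir "$dir/sub"
ln "$dir/original" "$dir/hardlink"
ln "$dir/original" "$dir/sub/another_name"

# Look up the inode number, then search by it; -xdev stays on this file system
inode=$(stat -c %i "$dir/original")
find "$dir" -xdev -inum "$inode"

# GNU find shortcut: -samefile matches every name of the given file
find "$dir" -xdev -samefile "$dir/original"
```

Both find calls list the same three names (original, hardlink, and sub/another_name) and skip the unrelated file.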

Use case for hardlinks

A major benefit of hardlinks is that every link is equal to every other link, and each one works independently. Deleting the original has no effect on the other links: they still point at the same data, and the file's contents are only freed once the last link is removed (and no process still has the file open).
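Here is a minimal sketch of that behavior, assuming bash and GNU coreutils, with placeholder names in a temporary directory:

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

echo "important data" > "$dir/original"
ln "$dir/original" "$dir/hardlink"

# Remove the "original" name
rm "$dir/original"

# The data is still reachable through the remaining link
cat "$dir/hardlink"            # prints "important data"
stat -c %h "$dir/hardlink"     # prints "1": only one name is left
```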

This offers a form of insurance against user error. Hardlinking important files into a secured backup directory that users can't access lets you keep files even when a user deletes them by accident (the backup directory has to be on the same file system, of course). These backups take up no extra space, and because they are the same file, every change is reflected instantly, so there is no need to sync.

This is of course not a real backup, just a quick way to keep files from being truly deleted. Because both names share the same data, an accidental overwrite or truncation hits the "backup" as well. Users may use the rm command on the file, but as long as one hardlink remains, the data is still there. To recover it, all you have to do is create another hardlink to the backup file, and your user owes you one ;)
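The whole trick can be sketched in a few lines. The directory names home and backup are hypothetical stand-ins, everything lives under one temporary directory (so both are on the same file system), and bash plus GNU coreutils are assumed. In a real setup the backup directory would belong to root and be inaccessible to the user; the chmod here only hints at that.

```shell
#!/usr/bin/env bash
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical layout: "home" is the user's directory, "backup" the secured one.
# Both live under the same temporary directory, i.e. on the same file system.
mkdir "$dir/home" "$dir/backup"
chmod 700 "$dir/backup"   # in practice: owned by root, inaccessible to the user
echo "thesis draft" > "$dir/home/thesis.txt"

# Protect the file: add a second name in the backup directory (no data is copied)
ln "$dir/home/thesis.txt" "$dir/backup/thesis.txt"

# The user deletes the file by accident
rm "$dir/home/thesis.txt"

# Recovery: create another hardlink from the backup back into place
ln "$dir/backup/thesis.txt" "$dir/home/thesis.txt"
cat "$dir/home/thesis.txt"     # prints "thesis draft"
```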

Bottom line

Each type of link has its own benefits, and there's no right or wrong. In 99% of cases a softlink is more than enough to do the job, or even the only way (e.g. for directories or across file systems). Softlinks are also easy to work with.

Hardlinks make sense in some fringe cases and are much closer to how the file system actually works. Their limitations restrict their use in many modern environments featuring a multitude of different file systems and storage media, but they can still be used in fun and creative ways.

I hope this article gives you a better understanding of softlinks vs. hardlinks in Linux and makes choosing one over the other easier.