User Guide and
Reference Manual


January 2000

Next : The /proc filesystem
Previous : Organization of the file tree


Chapter 4 : The Linux filesystem: ext2fs (EXTended 2 FileSystem)

The User Guide will have introduced the concepts of file ownership and access permissions, but really understanding the Linux filesystem requires that we redefine the concept of a file itself. One reason is that:

Everything is a file

Here, "everything" really means everything. A hard disk, a partition on a hard disk, a parallel port, a connection to a web site, an Ethernet card, all these are files. Even directories are files. Linux recognizes many types of files in addition to the standard files and directories. Note that by file type here, we don't mean the type of the contents of a file: for Linux and any Unix system, a file, whether it be a GIF image, a binary file or whatever, is just a stream of bytes. Differentiating files according to their contents is left to applications.

As you may remember, when you run ls -l, the character before the access rights identifies the type of a file. We have already seen two types of files: regular files (-) and directories (d). If you wander through the file tree and list the contents of directories, you may also come across these: character mode files (c), block mode files (b), symbolic links (l), named pipes (p) and sockets (s).

Here is a sample of each:

$ ls -l /dev/null /dev/sda /etc/rc.d/rc3.d/S20random /proc/554/maps \
  /tmp/ssh-fg/ssh-510-agent
crw-rw-rw-    1 root     root       1,   3 May  5  1998 /dev/null
brw-rw----    1 root     disk       8,   0 May  5  1998 /dev/sda
lrwxrwxrwx    1 root     root           16 Dec  9 19:12 /etc/rc.d/rc3.d/S20random ->
pr--r--r--    1 fg       fg              0 Dec 10 20:23 /proc/554/maps|
srwx------    1 fg       fg              0 Dec 10 20:08 /tmp/ssh-fg/ssh-510-agent=
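
The type of a file can also be checked from a script rather than read off the ls -l output. The following is a minimal sketch using the standard test operators; filetype is a hypothetical helper name of ours, not a system command, and the two paths at the end are only examples:

```shell
# Report the type of a path using the standard test operators.
# "filetype" is a hypothetical helper name, not an existing command.
filetype() {
    # -L must be checked first: the other operators follow symbolic links.
    if   [ -L "$1" ]; then echo "symbolic link"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -c "$1" ]; then echo "character mode file"
    elif [ -b "$1" ]; then echo "block mode file"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "unknown or nonexistent"
    fi
}

filetype /dev/null
filetype /
```

As in the listing above, the answer says nothing about the contents of the file, only about what kind of object the inode describes.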

We should add that ext2fs, like all other Unix filesystems, stores files, whatever their type, in an inode table. One particularity is that a file is not identified by its name, but by an inode number. In fact, not every file has a name. Names are just a consequence of a wider notion: links.


The best way to understand what's behind this notion of link is to take an example. Let's create a (regular) file:

$ pwd
/home/fg/example
$ ls
$ touch a
$ ls -il a
  32555 -rw-rw-r--    1 fg       fg              0 Dec 10 08:12 a

The -i option of the ls command prints the inode number, which is the first field of the output. As you can see, before we created file a, there were no files in the directory. The other field of interest is the third one, which is the link counter of the file.

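In fact, the command touch a can be separated into two distinct actions: the creation of an inode, to which the system has given the number 32555 and whose type is that of a regular file, and the creation of a link to this inode, named a, in the current directory.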

But now, if we type:

$ ln a b
$ ls -il a b
  32555 -rw-rw-r--    2 fg       fg              0 Dec 10 08:12 a
  32555 -rw-rw-r--    2 fg       fg              0 Dec 10 08:12 b

we have created another link to the same inode. As you can see, we have not created a new file named b; we have just added another link, named b, to inode 32555 in the same directory. You can see in the ls -il output that the link counter for the inode is now 2, and no longer 1.

Now, if we do:

$ rm a
$ ls -il b
  32555 -rw-rw-r--    1 fg       fg              0 Dec 10 08:12 b

we see that even though we have deleted the "original file", the inode still exists. But now the only link to it is the file named /home/fg/example/b.
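
The same experiment can be replayed as a script. This is a minimal sketch, assuming mktemp and awk are available; it works in a scratch directory, and the variable names are ours:

```shell
# Replay the a/b example in a scratch directory.
dir=$(mktemp -d)
touch "$dir/a"            # creates an inode and a first link to it
ln "$dir/a" "$dir/b"      # adds a second link to the same inode

# First field of "ls -i" is the inode number.
inode_a=$(ls -i "$dir/a" | awk '{print $1}')
inode_b=$(ls -i "$dir/b" | awk '{print $1}')
# Second field of "ls -l" is the link counter.
links_before=$(ls -l "$dir/b" | awk '{print $2}')

rm "$dir/a"               # removes one link, not the inode
links_after=$(ls -l "$dir/b" | awk '{print $2}')

echo "inodes: $inode_a $inode_b"
echo "link count: $links_before, then $links_after after rm a"
rm -r "$dir"
```

The two inode numbers are identical, and the counter goes from 2 back to 1 once a is removed.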

Therefore, an inode is linked if and only if it is referenced by a name at least once in any directory[13]. Directories themselves are also stored in inodes, but their link count works differently from that of other file types: on ext2fs it is two plus the number of subdirectories they contain. Every directory has at least two links, namely its name in its parent directory and its own . entry, and each subdirectory adds one more through its .. entry.

Typical examples of files which are not linked (i.e., have no name) are network connections: you will never see the file corresponding to a network connection in your file tree, whichever directory you try. Similarly, when you use a pipe in the shell, the file corresponding to the pipe does exist, but it is not linked.
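
A regular file can end up without a name too. The following minimal sketch (assuming a POSIX shell and mktemp; the file name scratch and the use of descriptor 3 are arbitrary choices of ours) opens a file, removes its only link, and still reads its contents through the open descriptor:

```shell
dir=$(mktemp -d)
echo "still here" > "$dir/scratch"

exec 3< "$dir/scratch"    # open the file for reading on descriptor 3
rm "$dir/scratch"         # remove its only name: the link count drops to 0

IFS= read -r line <&3     # the inode survives as long as it is open
exec 3<&-                 # once closed, the system can reclaim the inode

echo "$line"
rmdir "$dir"
```

Between the rm and the close, the inode exists but no directory references it.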

"Anonymous" pipes and named pipes

Let's get back to the example of pipes, as it's quite interesting and is also a good illustration of the notion of links. When you use a pipe in a command line, the shell creates the pipe for you and arranges for the command before the pipe to write to it and for the command after the pipe to read from it. All pipes, whether they be anonymous (like the ones used by the shells) or named (see below), act like FIFOs (First In, First Out). We have already seen examples of how to use pipes in the shell, but let's take one for the sake of our demonstration:

$ ls -d /proc/[0-9] | head -6

One thing that you won't notice in this example (because it happens too fast to see) is that writes on pipes are blocking: once the ls command has filled the pipe, it is blocked until a process at the other end reads from it. To visualize the effect, you can create named pipes, which, as opposed to the pipes used by shells, have names (i.e., they are linked, whereas shell pipes are not). With a named pipe the blocking is visible even earlier, because opening the pipe for writing already waits until some process opens it for reading. The command to create such pipes is mkfifo:

$ mkfifo a_pipe
$ ls -il
total 0
    169 prw-rw-r--    1 fg       fg              0 Dec 10 14:12 a_pipe|
  # You can see that the link counter is 1, and that the output shows
  # that the file is a pipe ('p').
  # You can also use ln here:
$ ln a_pipe the_same_pipe
$ ls -il
total 0
    169 prw-rw-r--    2 fg       fg              0 Dec 10 15:37 a_pipe|
    169 prw-rw-r--    2 fg       fg              0 Dec 10 15:37 the_same_pipe|
$ ls -d /proc/[0-9] >a_pipe
  # The process is blocked, as there is no reader at the other end.
  # Type C-z to suspend the process...
zsh: 3452 suspended  ls -d /proc/[0-9] > a_pipe
  # ...then put it into the background:
$ bg
[1]  + continued  ls -d /proc/[0-9] > a_pipe
  # now read from the pipe...
$ head -6 <the_same_pipe
  # ...the writing process terminates
[1]  + 3452 done       ls -d /proc/[0-9] > a_pipe

Similarly, reads are also blocking. If we execute the above commands in the reverse order, we observe that head blocks, waiting for some process to give it something to read:

$ head -6 <a_pipe
  # Program blocks, suspend it: C-z
zsh: 741 suspended  head -6 < a_pipe
  # Put it into the background...
$ bg
[1]  + continued  head -6 < a_pipe
  # ...And give it some food :)
$ ls -d /proc/[0-9] >the_same_pipe
$ /proc/1/
[1]  + 741 done       head -6 < a_pipe

You can also see an undesired effect in the previous example: the ls command terminated before the head command took over. As a consequence, you got the prompt back immediately, and head printed its output only afterwards, on the prompt line :)
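
In a script, the interleaving with the prompt can be avoided by keeping the reader in the foreground. The following is a minimal sketch, assuming a POSIX shell with mktemp; the directory, the three lines of sample text and the variable names are ours:

```shell
dir=$(mktemp -d)
mkfifo "$dir/a_pipe"

# The writer runs in the background: it blocks until a reader
# opens the other end of the named pipe.
printf 'one\ntwo\nthree\n' > "$dir/a_pipe" &
writer=$!

# The reader runs in the foreground, so the script waits for it.
first=$(head -n 1 < "$dir/a_pipe")
wait "$writer"            # make sure the writer has finished too

echo "$first"
rm -r "$dir"
```

Here wait plays the role of the job control messages in the interactive session: the script does not go on until both ends of the pipe are done.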

"Special" files: character mode and block mode files

As already stated, such files are either files created by the system or peripherals on your machine. We have also mentioned that the contents of block mode files are buffered whereas those of character mode files are not. To illustrate this, insert a floppy into the drive and type the following command twice:

$ dd if=/dev/fd0 of=/dev/null

You will observe the following: the first time the command was launched, the whole contents of the floppy were read, whereas the second time there was no access to the floppy drive at all. This is simply because the contents of the floppy were buffered when you first launched the command, and you didn't change the floppy in the meantime.

But now, if you want to print a big file this way (yes it will work):

$ cat /a/big/printable/file/somewhere >/dev/lp0

the command will take just as long whether you launch it once, twice or fifty times. This is because /dev/lp0 is a character mode file, and its contents are not buffered.

The fact that block mode files are buffered has a nice side effect: not only are reads buffered, but writes are buffered too. This allows writes to disks to be asynchronous: when you write a file to disk, the write operation itself is not immediate. It only occurs when Linux decides to carry it out.

Finally, each special file has a major and a minor number. In the ls -l output, they appear in place of the size, as the size of such files is irrelevant:

$ ls -l /dev/hda /dev/lp0
brw-rw----    1 root     disk       3,   0 May  5  1998 /dev/hda
crw-rw----    1 root     daemon     6,   0 May  5  1998 /dev/lp0

Here, the major and minor numbers of /dev/hda are 3 and 0 respectively, whereas for /dev/lp0 they are 6 and 0. Note that these numbers are unique per file category, which means that there can be a character mode file with major 3 and minor 0 (this file actually exists: /dev/ttyp0), and similarly there can be a block mode file with major 6 and minor 0. These numbers exist for a simple reason: they allow Linux to associate the right operations with these files (that is, with the peripherals these files refer to). You don't handle a floppy drive the same way as, say, a SCSI hard drive.
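
If you need the two numbers in a script, they can be picked out of the ls -l output. This is a minimal sketch, assuming awk is available and using /dev/null (shown earlier in this chapter with major 1 and minor 3) because it exists on every Linux system; the variable names are ours:

```shell
# Field 5 of "ls -l" holds the major number followed by a comma,
# field 6 holds the minor number.
major=$(ls -l /dev/null | awk '{sub(",", "", $5); print $5}')
minor=$(ls -l /dev/null | awk '{print $6}')

echo "/dev/null: major $major, minor $minor"
```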

Symbolic links and the limitation of "hard" links

Here we have to face a very common misconception, even among Unix users: the belief that links as we have seen them so far (wrongly called "hard" links) apply only to regular files. We have seen that this is not the case; even symbolic links are "linked". But explaining this requires that we first describe what symbolic links ("soft" links, or more often "symlinks") are.

Symbolic links are files of a particular type whose sole content is an arbitrary string, which may or may not point to an actual filename. When you mention a symbolic link on the command line or in a program, you in fact access the file it points to, if it exists. For example:

$ echo Hello >myfile
$ ln -s myfile mylink
$ ls -il
total 4
    169 -rw-rw-r--    1 fg       fg              6 Dec 10 21:30 myfile
    416 lrwxrwxrwx    1 fg       fg              6 Dec 10 21:30 mylink -> myfile
$ cat myfile
Hello
$ cat mylink
Hello

You can see that the file type for mylink is 'l', for symbolic Link. The access rights for a symbolic link are not significant: they will always be rwxrwxrwx. You can also see that it is a different file from myfile, as its inode number is different. But it refers to it symbolically, so when you type cat mylink, you will in fact print the contents of the file myfile. To demonstrate that a symbolic link contains an arbitrary string, we can do the following:

$ ln -s "I'm no existing file" anotherlink
$ ls -il anotherlink
    418 lrwxrwxrwx    1 fg       fg             20 Dec 10 21:43 anotherlink -> I'm no existing file
$ cat anotherlink
cat: anotherlink: No such file or directory

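But symbolic links exist because they overcome several limitations encountered by normal ("hard") links: a normal link cannot refer to an inode located on a different filesystem, and a directory cannot be given an additional normal link.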

Symbolic links are therefore very useful in several circumstances, and very often, people tend to use them to link files together even when a normal link could be used instead. One advantage of normal linking, though, is that you don't lose the file if you delete "the original one" :)
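
One such limitation is easy to observe: a normal link to a directory is refused, whereas a symbolic link to it works. The following is a minimal sketch, assuming a POSIX shell and mktemp; the names real, hard and soft and the two result variables are ours:

```shell
dir=$(mktemp -d)
mkdir "$dir/real"

# Try a normal ("hard") link to a directory; ln is expected to refuse.
if ln "$dir/real" "$dir/hard" 2>/dev/null; then
    hard_result="created"
else
    hard_result="refused"
fi

# A symbolic link only stores the string "real", resolved relative
# to the directory that contains the link.
ln -s real "$dir/soft"
if [ -d "$dir/soft" ]; then
    soft_result="works"
else
    soft_result="broken"
fi

echo "hard link to a directory: $hard_result"
echo "symbolic link to a directory: $soft_result"
rm -r "$dir"
```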

Lastly, if you have observed carefully, you know what the size of a symbolic link is: it is simply the size of the string.
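
This can be checked directly. The following is a minimal sketch, assuming readlink, mktemp and awk are available; it recreates the dangling link from the example above in a scratch directory and compares the stored string and the reported size with the string we gave to ln -s:

```shell
dir=$(mktemp -d)
target="I'm no existing file"
ln -s "$target" "$dir/anotherlink"

# readlink prints the string stored in the link without following it.
stored=$(readlink "$dir/anotherlink")
# Field 5 of "ls -l" is the size of the link itself.
size=$(ls -l "$dir/anotherlink" | awk '{print $5}')

echo "stored string: $stored"
echo "string length: ${#target}, link size: $size"
rm -r "$dir"
```

The link is created without complaint even though no file of that name exists; only following it fails.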

File attributes

Just as FAT has file attributes (archive, system file, invisible), ext2fs has its own, but they are different. We mention them here for the sake of completeness, as they are very seldom used. However, if you really want a secure system, read on.

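There are two commands for manipulating file attributes: lsattr(1) and chattr(1). As you will probably have guessed, lsattr LiSts attributes, whereas chattr CHanges them. These attributes can only be set on directories and regular files. Among them are 'A', which stops the access time from being updated when the file is read, and 'i', which makes the file immutable; chattr(1) documents the full list.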

You may want, for example, to set the 'i' attribute on essential system files in order to avoid bad surprises. Also consider the 'A' attribute on man pages for example: this prevents a lot of disk operations and, in particular, it saves some battery life on laptops.


Copyright © 2000 MandrakeSoft