Monday, July 25, 2011

On File Names

Many programs have input languages that they accept. Compilers and interpreters accept programs, web browsers accept web pages, database systems accept queries, and file systems accept file names. An input language can be a great way to get input from a user.

But programs should not be using these input languages when they communicate with each other. Why? Because if program A is manipulating strings in the input language of program B, then A is reimplementing at least some of the syntactic and semantic logic that is already in program B. These reimplementations, besides being wasted effort, tend to be brittle and buggy.
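
As a small illustration of that brittleness, here is a sketch in Python. The helpers `naive_join` and `naive_parent` are hypothetical stand-ins for the kind of string handling program A might write; `posixpath` is the standard-library module that already encodes the POSIX rules.

```python
import posixpath

def naive_join(directory, name):
    # Hand-rolled rule: a path is a directory, a slash, and a name.
    return directory + "/" + name

def naive_parent(path):
    # Hand-rolled rule: the parent is everything before the last slash.
    return path.rsplit("/", 1)[0]

# A trailing slash on the directory yields a doubled separator.
print(naive_join("/tmp/", "a.txt"))           # /tmp//a.txt
print(posixpath.join("/tmp/", "a.txt"))       # /tmp/a.txt

# An absolute second argument should replace the first, not be appended.
print(naive_join("/tmp", "/etc/passwd"))      # /tmp//etc/passwd
print(posixpath.join("/tmp", "/etc/passwd"))  # /etc/passwd

# The parent of a file in the root directory is the root, not the empty string.
print(repr(naive_parent("/file")))            # ''
print(posixpath.dirname("/file"))             # /
```

The doubled slash happens to be harmless on most POSIX systems, which is part of the trouble: the hand-rolled version looks correct until it meets an input its author did not anticipate.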

In some areas, we've made great progress in dealing with this problem. For example, there are many libraries that parse and manipulate program source code, so that other programs need not treat it as raw strings. But there is one input language in particular whose semantics are at least partially reimplemented by nearly every substantial program in existence: the language of file names.

Lest you think that the language of file names is trivial, here are the syntactic and semantic details that a robust program must handle:

  • Relative and absolute paths
  • Trailing slashes
  • Parent and self references
  • Special files (devices, pipes, directories, and symbolic links)
  • Hard links
  • Permissions (having a file name doesn't mean you can open it)
  • Existence (having a file name doesn't mean it exists)
  • Platform-specific issues, such as:
    • Supported characters in names
    • Special characters (like the path separator)
    • Whether or not particular kinds of links are supported
    • What permissions are supported
    • Oddities like Windows drives
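
A few of the purely syntactic items above can be demonstrated with Python's standard library. This is only a sketch: the function name `path_facts` and the example paths are made up for illustration, and the semantic items (links, permissions, existence) are left out because they depend on the state of a real file system.

```python
import ntpath
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

def path_facts():
    """Collect a few lexical facts about example paths (no disk access)."""
    return {
        # Relative and absolute paths
        "relative_is_abs": posixpath.isabs("docs/a.txt"),
        "absolute_is_abs": posixpath.isabs("/docs/a.txt"),
        # Trailing slashes change what counts as the final component
        "basename_plain": posixpath.basename("/srv/data"),
        "basename_trailing": posixpath.basename("/srv/data/"),
        # Parent and self references; normpath is purely lexical and can
        # change a path's meaning when symbolic links are involved
        "normalized": posixpath.normpath("/a/b/../c/./d"),
        # Windows drives: "C:notes.txt" has a drive but is still relative
        "drive_split": ntpath.splitdrive("C:\\Users\\me"),
        "drive_relative_is_abs": PureWindowsPath("C:notes.txt").is_absolute(),
        # A backslash is a separator on Windows but an ordinary character on POSIX
        "posix_name": PurePosixPath("a\\b").name,
        "windows_name": PureWindowsPath("a\\b").name,
    }

for key, value in path_facts().items():
    print(f"{key}: {value!r}")
```

Each entry is a rule that a program handling file names as plain strings would have to rediscover on its own.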

I will only sketch solutions here and save the details for a future post. Two options come to mind:

  1. Provide a wrapper interface for existing file systems that understands and enforces all the relevant syntactic and semantic details.
  2. Provide a fundamentally different file system. This option is more powerful, and I have some ideas about how this power can be used which I will discuss in a future post, hopefully with a prototype implementation of such a file system.
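
To make the first option a little more concrete, here is one possible shape for such a wrapper, sketched in Python. Everything in it is an assumption made for illustration: the names `Directory`, `InvalidName`, and `check_component` are hypothetical, and the sketch covers only a few of the details listed earlier. Callers pass a list of name components instead of a path string, so the separator, parent references, and self references never appear in their code. Symbolic links, permissions, and reserved names on particular platforms are not handled.

```python
import os
import tempfile

class InvalidName(ValueError):
    """Raised when a single name component is not acceptable."""

def check_component(name):
    # Reject names that carry syntactic meaning for the file system.
    if name in ("", ".", ".."):
        raise InvalidName(f"reserved component: {name!r}")
    # Conservatively reject both separators, the drive colon, and NUL.
    if any(ch in name for ch in "/\\:\0"):
        raise InvalidName(f"special character in component: {name!r}")
    return name

class Directory:
    """Hypothetical handle to a directory; callers never build path strings."""

    def __init__(self, root):
        self._root = os.path.realpath(root)

    def _resolve(self, components):
        parts = [check_component(c) for c in components]
        if not parts:
            raise InvalidName("at least one component is required")
        return os.path.join(self._root, *parts)

    def exists(self, components):
        return os.path.exists(self._resolve(components))

    def write_text(self, components, text):
        target = self._resolve(components)
        # Create intermediate directories so callers need not think about them.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(text)

    def read_text(self, components):
        with open(self._resolve(components), encoding="utf-8") as handle:
            return handle.read()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes = Directory(tmp)
        notes.write_text(["notes", "todo.txt"], "write the follow-up post")
        print(notes.read_text(["notes", "todo.txt"]))
        try:
            notes.read_text(["..", "secret"])
        except InvalidName as error:
            print("rejected:", error)
```

The design choice worth noting is that the path syntax lives in exactly one place, `_resolve`, rather than being scattered across every caller.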
