The Portable OS API

The Basis library provides a portable API for operating system-level operations. It covers processes, file systems, time, dates and device-specific I/O attributes, and it offers largely system-independent models for manipulating these resources. In the Basis documentation this API is described under the heading of "System".

The OS structure contains part of the API in the substructures FileSys, Path, Process and IO, together with some common exception and error handling functions. The OS.FileSys structure provides an API for scanning directories, altering directories by deleting or renaming files, and checking access permissions. You can get and change the current directory here too.
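As a small sketch of getting and changing the current directory, the following runs a function with the parent directory as the current directory and then restores the original. The name with_parent_dir is made up for illustration; getDir, chDir and OS.Path.parentArc come from the Basis library.

```sml
(* Hypothetical helper: run f with the parent directory as the
   current directory, then restore the original directory. *)
fun with_parent_dir f =
let
    val saved  = OS.FileSys.getDir()
    val _      = OS.FileSys.chDir OS.Path.parentArc

    (* Restore the directory even if f raises an exception. *)
    val result = f() handle e => (OS.FileSys.chDir saved; raise e)
in
    OS.FileSys.chDir saved;
    result
end

(* The absolute path of the parent directory. *)
val parent = with_parent_dir OS.FileSys.getDir
```

Restoring the saved directory in the handler as well keeps the caller's view of the current directory unchanged whether or not f succeeds.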

The OS.Path structure provides an abstract view of file paths. The OS.Process structure provides a few process-oriented functions that are still rather Unix-specific. The OS.IO structure provides an interface to the POSIX poll system call. Presumably it can be implemented in terms of other system calls on other operating systems. I'll ignore it.

The signatures for the OS API can be found in the boot/OS directory of the compiler. Most of the implementation can be found in the boot/Unix directory for POSIX-based Unix systems.

OS.FileSys

The file system API is straightforward enough. I'll illustrate it with a program that scans a directory tree looking for writable executable files, on the grounds that these might be a security hazard. Symbolic links won't be followed. I start with some useful declarations.

structure FS = OS.FileSys
structure OP = OS.Path

fun toErr msg = TextIO.output(TextIO.stdErr, msg)

exception Error of string

The structure declarations just provide convenient shorthand names. I've added my own Error exception so that I can report more meaningful context-sensitive errors.

Here is the main function.

fun main(arg0, argv) =
let
in
    case argv of
      [] => scan_dir OP.currentArc

    | (file::_) => scan_dir file;

    OS.Process.success
end
handle
  OS.SysErr (msg, _) =>
    (
        toErr(concat["System Error: ", msg, "\n"]);
        OS.Process.failure
    )

| Error msg => (toErr msg; OS.Process.failure)

| x => (toErr(concat["Uncaught exception: ", exnMessage x,"\n"]);
        OS.Process.failure)

The program takes a directory name on the command line; if it is omitted then the current directory is used. The OS.Path.currentArc value provides a portable way to represent the current directory. Errors that I expect from user input are reported via the Error handler. I also catch any other OS.SysErr exceptions, just in case.

I open a directory with the following function. It catches the OS.SysErr exception and turns it into a more meaningful error message. (The optional syserror code in the exception is redundant.)

fun open_dir dir =
(
    FS.openDir dir
)
handle 
  OS.SysErr (msg, _) => raise Error (concat[
            "Cannot open directory ", dir, ": ", msg, "\n"])

Finally here is the directory scanning function.

fun scan_dir dir =
let
    (* val _ = print(concat["scan_dir ", dir, "\n"]) *)
    val strm = open_dir dir

    fun get_files files =
    (
        case FS.readDir strm of
          "" => rev files           (* done *)

        | f => 
            let
                val file = OP.joinDirFile
                            {dir = dir, file = f}
            in
                if FS.isLink file
                then
                    get_files files
                else
                    get_files (file::files)
            end
    )

    val files = get_files []
    val _     = FS.closeDir strm

    fun show_wx file =
    (
        if FS.access(file, [FS.A_WRITE, FS.A_EXEC])
        then
            (print file; print "\n")
        else
            ()
    )

    fun scan_subdir file =
    (
        if FS.isDir file
        then
            scan_dir file
        else
            ()
    )
in
    app show_wx files;
    app scan_subdir files
end

The first line is just some tracing I used while debugging the function.

The val declarations are executed in the order in which they appear. This is important since they may have side effects. The get_files function reads the stream to build a list of the files in the directory. It comes after the declaration of strm because it refers to strm as a free variable from the enclosing scope. The directory stream is updated imperatively by the readDir function. The OS.Path.joinDirFile function is a portable way to add a path element. It will use the right kind of slash.

I want to avoid accumulating open directories while walking the directory tree, as I might run out of open files if the tree is too deep. So instead I extract the files and close the directory stream. This means that I will be retaining in memory all of the files in the directories along a path through the tree. File paths will be discarded on the way back up the tree so the garbage collector can reclaim them if needed. There is far more memory available than there are file descriptors.

The show_wx function prints the file name if it is writable and executable. It is iterated over the list of files using the built-in app function (see the section called Lists). I recurse in scan_subdir by scanning each file that is a directory.
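Later revisions of the Basis library changed readDir to return a string option, with NONE marking the end of the directory instead of the empty string. If your compiler follows that revision, the reading loop might look like the sketch below. The function name list_dir is made up for illustration, and the option-returning readDir is an assumption about your library version.

```sml
structure FS = OS.FileSys
structure OP = OS.Path

(* Hypothetical helper: return the non-link entries of dir as paths.
   Assumes readDir : dirstream -> string option. *)
fun list_dir dir =
let
    val strm = FS.openDir dir

    fun get_files files =
        case FS.readDir strm of
          NONE   => rev files           (* done *)
        | SOME f =>
            let
                val file = OP.joinDirFile {dir = dir, file = f}
            in
                if FS.isLink file
                then get_files files
                else get_files (file::files)
            end

    (* Close the stream even if reading fails part-way. *)
    val files = get_files [] handle e => (FS.closeDir strm; raise e)
in
    FS.closeDir strm;
    files
end
```

As before, the stream is closed before any recursion into subdirectories would start, so only one directory is open at a time.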

OS.Path

The functions in OS.Path model a file path as a list of names called arcs. There is also provision for a volume name for Microsoft Windows. File names can have an extension marked by a "." character. There are functions for splitting and joining all of these kinds of parts.
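The following sketch shows a few of the splitting and joining functions working together. The file names are made up for illustration.

```sml
structure OP = OS.Path

(* An example relative path; the names are made up. *)
val path = OP.joinDirFile {dir = "reports", file = "summary.txt"}

(* Split off the directory and the file name. *)
val {dir, file} = OP.splitDirFile path

(* Split the file name into a base and an optional extension. *)
val {base, ext} = OP.splitBaseExt file

(* Swap the extension and rebuild the full path. *)
val backup = OP.joinDirFile
                {dir  = dir,
                 file = OP.joinBaseExt {base = base, ext = SOME "bak"}}

(* The arcs view: a volume, an absolute flag and a list of names. *)
val {isAbs, vol, arcs} = OP.fromString backup
```

Because every step goes through OS.Path, the code never mentions a separator character itself.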

OS.Process

Your main interest in OS.Process will be the success and failure values, which are needed as exit codes for your main program. The system function for running shell commands could be useful, but if you want to capture the output see the functions in the Unix structure.

You can abort your program early with the exit or terminate functions but I prefer to use an exception for fatal errors. It leaves open the possibility of higher-level functions trapping the errors.

The getEnv function gets environment variables like the C library's getenv() function.
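Here is a rough sketch of getEnv and system in use. The helper names env_or and run_ok are made up for illustration. The isSuccess function is assumed from later revisions of the Basis library, and the command strings assume a Unix shell.

```sml
(* Hypothetical helper: look up an environment variable, falling
   back to a default when it is not set. *)
fun env_or (name, default) =
    case OS.Process.getEnv name of
      SOME value => value
    | NONE       => default

val home = env_or ("HOME", OS.Path.currentArc)

(* Hypothetical helper: run a shell command and report whether it
   succeeded. isSuccess is assumed to be available. *)
fun run_ok cmd = OS.Process.isSuccess (OS.Process.system cmd)

val ok = run_ok "exit 0"
```

Note that getEnv returns a string option, so a missing variable shows up as NONE and not as an empty string.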

Time and Date

Time and Date values are provided in the Time and Date structures respectively. They are documented in the Basis library under the "System" heading. Their implementations appear in the boot/ directory of the compiler.

Time values in SML/NJ are stored as a pair of seconds and microseconds.

The main trick to remember with the time type is that the top-level operators are not overloaded on them. So if you want to subtract two time values you need to write something like Time.-(t1, t2). Similarly for the other symbolic functions in Time.

Also, conversion to and from integer values uses the LargeInt structure, which on 32-bit systems is the same as Int32, i.e. boxed 32-bit integers. (See the section called Integers.)
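A short sketch of both points follows: the qualified operators applied to tuples, and conversion between time values and the default int type by way of LargeInt. The function name seconds is made up for illustration.

```sml
val t1 = Time.fromSeconds 90
val t2 = Time.fromSeconds 30

(* Qualified names are never infix, so they take a tuple. *)
val diff    = Time.-(t1, t2)
val total   = Time.+(t1, t2)
val earlier = Time.<(t2, t1)

(* toSeconds yields a LargeInt.int; convert it to a default int. *)
val secs : int = LargeInt.toInt (Time.toSeconds diff)

(* Hypothetical helper: going the other way needs Int.toLarge. *)
fun seconds (n : int) = Time.fromSeconds (Int.toLarge n)
```

LargeInt.toInt raises Overflow if the value does not fit in a default int, so it is only safe for durations that are known to be small.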

You can convert a date represented as the number of seconds since the Unix epoch to a year/month/day representation using the Time and Date structures. Write something like this:

val date = Date.fromTimeLocal(Time.fromSeconds 99999999)
val year = Date.year date

The month and the day of the week are represented by the enumeration types month and weekday in Date. There is no mechanism for converting these values to integers or strings. You'll have to write your own. What you can do, though, is use the Date.fmt function to format a date any way you like. It uses the strftime() C library function underneath so the result should be properly locale-dependent.
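A conversion function of your own can be as simple as the sketch below, shown alongside Date.fmt. The name month_to_int is made up for illustration. I use Date.fromTimeUniv here so that the result does not depend on the local time zone.

```sml
(* Hypothetical helper: month number in the range 1 to 12. *)
fun month_to_int m =
    case m of
      Date.Jan => 1  | Date.Feb => 2  | Date.Mar => 3
    | Date.Apr => 4  | Date.May => 5  | Date.Jun => 6
    | Date.Jul => 7  | Date.Aug => 8  | Date.Sep => 9
    | Date.Oct => 10 | Date.Nov => 11 | Date.Dec => 12

(* Interpret the time as UTC, not as local time. *)
val date  = Date.fromTimeUniv (Time.fromSeconds 99999999)
val month = month_to_int (Date.month date)

(* The format string uses strftime()-style escapes. *)
val iso = Date.fmt "%Y-%m-%d" date
```

A similar case expression over Date.Mon to Date.Sun would handle the weekday type.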

At the time of writing, SML/NJ has not implemented the Date.fromString and Date.scan functions.

Unix

The "System" documentation for the Basis library describes a Unix structure. It's not really a portable OS thing. It just provides some simple utility functions for spawning a sub-process, talking to it through stdin and stdout and killing it afterwards. The source for it can be found in the boot/Unix directory of the compiler.
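As a minimal sketch of capturing the output of a sub-process, the following uses execute, streamsOf and reap from the Unix structure. The function name run_capture is made up for illustration, and the path /bin/echo is an assumption about your system.

```sml
(* Hypothetical helper: run a program and return everything it
   writes to its standard output. cmd should be a full path. *)
fun run_capture (cmd, args) =
let
    val proc     = Unix.execute (cmd, args)
    val (ins, _) = Unix.streamsOf proc
    val output   = TextIO.inputAll ins      (* read until end of file *)
    val _        = Unix.reap proc           (* wait for the child *)
in
    output
end

val greeting = run_capture ("/bin/echo", ["hello"])
```

Reading all of the output before calling reap matters, since reap waits for the child to exit and a child blocked on a full pipe would never do so.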