The Config Layer

This layer implements modules that parse the server's configuration files. This includes the MIME types file. It does not include the authorisation files for user names and passwords. See the section called The NodeAuth Module for more information on those.

The configuration file is complex enough that I've used an ML-Yacc parser. The ML-Yacc system is simple enough that it can be used comfortably even for small jobs like this. The ConfigTypes module defines the types for the parse tree for the main configuration file.

Most of the code is in the Config module, which I'll describe first. Much of it is just string handling and checking of parameters for correctness, so I'll skim lightly over that.

The Config Module - Interface

First here are the types that describe the server's configuration.

(*  This is a simplified form of URLPath with just the parts.
    These paths are case-sensitive and so are stored in the
    original case.
*)
type NodePath = string list


(*  Required parameters are stored as strings with "" for an undefined
    value.  Optional ones as string option.
*)
datatype ServerConfig = ServerConfig of {
            server_root:    string,
            config_dir:     string,
            var_dir:        string,
            tmp_dir:        string,
            doc_dir:        string,
            cgi_dir:        string,
            mime_file:      string,
            error_log:      string,
            dir_index:      string,

            log_level:      Log.Level,

            run_user:       string option,
            run_group:      string option,

            conn_timeout:   int option,
            max_clients:    int option,
            max_tmp_space:  int option,
            max_req_size:   int option,

            listen_host:    string option,
            listen_port:    int,
            server_name:    string
            }

The server configuration is described by the ServerConfig type. It is held in a static variable in the Config module and fetched by the getServerConfig function. It is just a big record of all of the parameters that apply to the server as a whole, as described in the section called The Server Parameters in Chapter 8.
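For orientation, here is a sketch of what a main configuration file might look like. The parameter names are invented for illustration; the real set is described in the section called The Server Parameters in Chapter 8. The syntax, with keywords, braces, semicolon-terminated "word = value" parts and # comments, is defined by the grammar later in this chapter.

```
# A hypothetical configuration file.  The parameter names are
# illustrative only; see Chapter 8 for the real ones.
Server {
    server_root = /home/swerve;
    listen_port = 2020;
    server_name = "www.example.com";
}

Node /icons {
    directory = /home/swerve/icons;
}
```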

A node configuration is described by the NodeConfig type. See the section called The Node Parameters in Chapter 8 for more information on these configuration parameters.

datatype NodeConfig = NodeConfig of {
            path:       NodePath,
            kind:       NodeKind,
            options:    NodeOptionFormula list,
            auth:       NodeAuth
            }

and NodeKind = 
        NodeIsDir of {
            path:   string          (* directory path *)
            }

    |   NodeIsBuiltin of {
            name:   string
            }

    |   NodeIsScript of {
            path:   string
            }

(*  This is a subset of NodeConfig for .swerve files. *)
and SwerveConfig = SwerveConfig of {
            options:    NodeOptionFormula list,
            auth:       NodeAuth
            }

and NodeOptionFormula =
        NOFInherit
    |   NOFAll
    |   NOFNone
    |   NOFAdd of NodeOption
    |   NOFSub of NodeOption


and NodeOption =
        NodeExecCGI
    |   NodeFollowSymLinks
    |   NodeWithSubDirs


and NodeAuth = 
        NodeBasic of {
            realm:      string,
            user_file:  string,     (* path to the user file *)
            group_file: string,     (* path to the group file *)
            users:      string list,(* users to allow *)
            groups:     string list (* groups to allow *)
            }

    |   NodeNoAuth

The options for a node are described by a formula. This neatly captures how the options of a node can be computed from those of its parent by interpreting the formula. (The NodeExecCGI option is a left-over from an earlier design and is no longer used.)
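Interpreting a formula might look like the following sketch, with the options kept as simple lists. The function name and representation are illustrative; the server's real option handling is not shown here.

```sml
(*  A sketch of interpreting an option formula against the parent
    node's options.  Formulas are read left to right; NOFInherit
    replaces whatever has been built so far with the parent's
    options, so it normally comes first.
*)
fun interpretOptions (parent: NodeOption list) formulas =
let
    fun remove opt opts = List.filter (fn x => x <> opt) opts

    fun step (NOFInherit, _)    = parent
    |   step (NOFAll, _)        = [NodeExecCGI, NodeFollowSymLinks,
                                   NodeWithSubDirs]
    |   step (NOFNone, _)       = []
    |   step (NOFAdd opt, opts) = opt :: remove opt opts
    |   step (NOFSub opt, opts) = remove opt opts
in
    foldl step [] formulas
end
```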

The NodeAuth type describes the authorisation parameters in a straight-forward way. Extra kinds of authorisation can be added to this type.

Each directory that implements a node can have a .swerve file that provides more parameters, mainly for authorisation. The contents of this file are described by the SwerveConfig type. Having a separate type allows parameters unique to this file to be added in the future; besides, the path and kind parameters are not relevant.
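A .swerve file might look like the following sketch. The parameter names are invented for illustration; the real ones are covered in Chapter 8.

```
# A hypothetical .swerve file.  The parameter names are
# illustrative only.
realm = "Staff Only";
allow_users = fred mary;
```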

Here are the main functions of the module interface for the configuration parameters.

val processConfig:      string -> unit

val haveServerConfig:   unit -> bool

(*  This is not defined if processConfig has not succeeded. *)
val getServerConfig:    unit -> ServerConfig

(*  Return a list of all of the node configurations. 
*)
val getNodeConfigs:     unit -> NodeConfig list

(*  This reads a .swerve file and returns the configuration or
    NONE if it wasn't parsable.
    The Io exception will be raised if the file cannot be read.
*)
val processNodeFile:    string -> SwerveConfig option

(*  This returns the node configuration from the main configuration
    file if the node path appeared exactly in the file.
*)
val findNodeConfig:    NodePath -> NodeConfig option

The main entry point is processConfig which is called from the Main module as soon as the -f command line option is read. This reads the configuration files and saves the information in the static variables. If there are any errors then these are reported on standard error and the server configuration will not be saved.

The haveServerConfig function tests that the parsing was successful. If so then some of the parameters are poked into global variables, for example the logging file and level. The getServerConfig function can then be called from anywhere in the server to get the configuration. Since this information is immutable it can be safely called from any thread.

The node configurations are stored separately as a list in no particular order. The getNodeConfigs function fetches the list. Alternatively the findNodeConfig function can be used to get a particular node if its configuration path is known. But the main use for these configuration records is to build the resource store tree (see the section called The Store Module) and this uses the whole list.

The processNodeFile function is used to parse a .swerve file. Errors are logged in the usual way. The result is returned if there was no error. Nothing is saved in static variables.

The static variables are managed like this.

val cf_server_config: ServerConfig option ref = ref NONE

val cf_nodes: NodeConfig list ref = ref []

fun getServerConfig()  = valOf(!cf_server_config)
fun haveServerConfig() = isSome(!cf_server_config)
fun getNodeConfigs()   = !cf_nodes


fun getServerRoot() =
let
    val ServerConfig{server_root, ...} = getServerConfig()
in
    server_root
end

The processConfig function looks like this.

fun processConfig file : unit =
let
    (* show the warnings while processing *)
    val _ = Log.lowerLevel Log.Warn
    val sections = parse_config false file
in
    (* dump_sections sections; *)

    (*  Ensure that the server node is processed first to
        get the server root for the nodes' files.
    *)
    process_server_section sections;
    process_node_sections sections;

    process_mime_file();
    init_globals();
    ()
end
handle _ => (Log.flush(); raise FatalX)



(*  This pokes some config parameters into various modules in
    common. 
*)
and init_globals() =
let
    val ServerConfig {error_log, log_level, max_tmp_space, ...} =
            getServerConfig()
in
    (*  Don't change the error stream until the config has been processed. *)
    Log.flush();
    Log.setLogFile error_log;
    Log.setLevel   log_level;

    case max_tmp_space of
      NONE   => ()
    | SOME l => TmpFile.setDiskLimit l;

    ()
end

The log level needs to be lowered to make sure that warnings from the configuration checking get through. They will appear on standard error. (The level could have been set higher by a command line option). Only after the configuration has been successfully read are errors redirected to the log file. Any exceptions are fatal.

The next sections describe the use of ML-Yacc and ML-Lex in detail for parsing the configuration.

The Configuration Grammar

The modules of an ML-Yacc parser are quite complex to describe as they use multiple functors to assemble the parser from parts. But it's easy to copy from an example. For the full details see the ML-Yacc documentation that comes with the source package (see Appendix C).

The parts of the parser are: the grammar file that is processed by ML-Yacc, the lexer file that is processed by ML-Lex, and the driver code in the Config module that joins the generated modules together and runs them. Each is described in its own section below.

There is a lot of superficial similarity with a grammar file for the standard Unix C yacc. One big difference is that it is even more strongly recommended that the parsing be side-effect free. I've seen many people write yacc grammars with action code that updates data structures or prints error messages during the parsing operation. This immediately clashes with any kind of back-tracking such as error recovery. The well-known problems of putting action code in the middle of a production are an example of this.

I always just use a parser to build a parse tree. I don't report semantic errors until a later pass over the tree. The ML-Yacc parser here does the same. The action code attached to each production is just an expression that builds a node (or fragment of a node) in the parse tree. The ML-Yacc parser attempts recovery from syntax errors to continue parsing for as long as possible. This works best if the action code has no side-effects.

Here is the top part of the grammar file. It is structured like a C yacc grammar file.

open Common
open ConfigTypes

%%

%eop EOF 

(* %pos declares the type of positions for terminals.
   Each symbol has an associated left and right position. *)

%pos Common.SrcPos
%arg (file): string

%term   
          KW_SERVER
        | KW_NODE

        | SYM_SEMICOLON
        | SYM_COMMA
        | SYM_LBRACE
        | SYM_RBRACE
        | SYM_EQUALS
        | SYM_SWERVE

        | TOK_WORD of string
        | TOK_STRING of string 
        | TOK_INT of int 

        | EOF

%nonterm 
          start of Section list
        | section_list of Section list
        | section of Section 
        | part_list of SectionPart list
        | part of SectionPart 
        | literal_list of Literal list
        | literal of Literal 

%name Config

%noshift EOF
%pure
%verbose

%%

It starts off with a header containing any SML declarations you may need for the rest of the grammar. This is delimited by the %% characters.

Then come the SML type declarations for the terminals and non-terminals. These must look like an SML datatype declaration. The terminals become the tokens for the lexer. The non-terminals give the types for the production rules of the grammar. The action code of the rules must be expressions of these types.

The terminal declarations can carry data. Here for example both a word and a string carry the text of the word or string. The difference between the two is that strings are quoted, since they may contain special characters, while unquoted words appear in special places in the grammar.

The KW_ terminals are keywords. They are reserved words that are recognised specially by the lexer. They mark the beginning of major syntactic constructs. Using reserved words helps to eliminate ambiguity in the grammar. The SYM_ terminals are punctuation symbols.

The %eop directive indicates which terminal marks the end of the input. It will be the same as the token returned by the eof function in the lexer. It also needs to be repeated in the %noshift directive.

The %arg directive declares an argument that will be passed into the parser. I use this to pass in the file name for error messages. The %name directive provides a prefix for the names of the parser modules. The %pure directive declares that all of the action code is side-effect free. If you don't include this then the parser will work harder to compensate which will slow it down. The %verbose directive tells ML-Yacc to dump the grammar rules to a .desc file. This can be useful for figuring out ambiguity problems but you need to be fairly familiar with LALR parsers.

I've actually combined two grammars together, one for the server configuration file and one for the .swerve files. They share a lot of syntax. Unfortunately ML-Yacc doesn't support more than one start symbol. I have to fake it by prepending a special symbol, called SYM_SWERVE, to the terminals from a .swerve file. The parser driver in the Config module will push the string "\001\001\001" onto the front of a .swerve file. The lexer will recognise this string as the SYM_SWERVE symbol. The parser will then switch to the swerve grammar.

Before looking at the grammar here are the types for the parse tree.

structure ConfigTypes =
struct

    datatype Section = 
            SectServer of {
                parts:  SectionPart list,
                pos:    Common.SrcPos
                }

        |   SectNode of {
                path:   string,
                parts:  SectionPart list,
                pos:    Common.SrcPos
                }

        (*  This is for the contents of a .swerve file *)
        |   SectSwerve of {
                parts:  SectionPart list
                }

    and     SectionPart = SectionPart of {
                left:   string,
                right:  Literal list,
                pos:    Common.SrcPos
                }

    and     Literal =
                LitIsString of string * Common.SrcPos
            |   LitIsInt of int * Common.SrcPos
end

The result of parsing will be a list of sections. Each section contains a list of parts and a part is a "word = value" pair. Every node in the parse tree is annotated with a source position. This gives the file name, line number and column where the characters corresponding to the node start. Positions are used in error messages.
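The Common.SrcPos type itself is not shown in this chapter. From the way it is built in the lexer's get_pos function below, it must be a record along these lines (a reconstruction, not the actual Common module source):

```sml
(*  Reconstructed from its uses; the real declaration is in the
    Common module.
*)
datatype SrcPos = SrcPos of {
            file:   string,     (* source file name *)
            line:   int,        (* line number, starting at 1 *)
            col:    int         (* column number, starting at 1 *)
            }
```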

Here is the top part of the production section.

start:
        section_list                    (section_list)

    |   SYM_SWERVE
        part_list                       ([SectSwerve {
                                            parts = part_list
                                            }])

section_list:   
        section                         ([section])

    |   section_list 
        section                         (section_list @ [section])


section:
        KW_SERVER 
        SYM_LBRACE
        part_list
        SYM_RBRACE                      (SectServer {
                                            parts = part_list,
                                            pos   = KW_SERVERleft
                                            })
    |   KW_NODE 
        TOK_WORD
        SYM_LBRACE
        part_list
        SYM_RBRACE                      (SectNode {
                                            path  = TOK_WORD,
                                            parts = part_list,
                                            pos   = KW_NODEleft
                                            })

The parsing starts at the first non-terminal which I've called start. If it's parsing a server configuration file then the syntax is a list of sections. For a .swerve file it is a list of parts.

The action code computes a value for the non-terminal on the left-hand side of a production. A non-terminal on the right-hand side of a production can be used in the action code and it represents the value of the non-terminal. The notation KW_SERVERleft refers to the source position of the first character of the terminal KW_SERVER. The notation TOK_WORD represents the value carried by the terminal so it's a string variable. For more information on all of this see the ML-Yacc documentation.

Since the start non-terminal has the type (Section list) its action code must be of this type. So a .swerve file will produce a list containing the single SectSwerve section. The production for section_list is a standard pattern for a list of one or more things. It's a simple bit of recursion. Note that since ML-Yacc does LALR grammars it must be left-recursion. This means the recursive call to section_list appears at the beginning of the second branch of the production. The section production just computes a tree node with type Section.
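For contrast, the same list could be written with right recursion. ML-Yacc would accept it, but every section would be shifted onto the parser stack before the first reduction, so the stack depth grows with the length of the list. This is an illustrative alternative, not part of the real grammar.

```
section_list:
        section                         ([section])

    |   section
        section_list                    (section :: section_list)
```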

Here's the rest of the grammar. It's quite straight-forward.

part_list:      
        part                            ([part])

    |   part_list 
        part                            (part_list @ [part])


part:
        TOK_WORD SYM_EQUALS
        literal_list
        SYM_SEMICOLON                   (SectionPart {
                                            left  = TOK_WORD,
                                            right = literal_list,
                                            pos   = TOK_WORDleft
                                            })


literal_list:   
        literal                         ([literal])

    |   literal_list 
        literal                         (literal_list @ [literal])


literal:
        TOK_STRING                      (LitIsString (TOK_STRING, TOK_STRINGleft))
    |   TOK_INT                         (LitIsInt (TOK_INT, TOK_INTleft))
    |   TOK_WORD                        (LitIsString (TOK_WORD, TOK_WORDleft))

The Configuration Lexer

The lexer splits the configuration files up into tokens which are words, strings, symbols and integers. The main difference between words and strings is that strings can contain any special character, so they must be quoted. Words are allowed to contain just enough special characters to form most of the file paths you're likely to want. The symbols include punctuation and some reserved words. The layout of the files is free format with any amount of white space between the tokens.

The lexer is generated using ML-Lex. Starting in the middle of the config.lex file are some declarations that are required to interface with the parser.

    (*  These definitions are required by the parser.
        The lexer types are supplied by the grammar.
    *)

    type    pos = Common.SrcPos
    type    arg = string                (* type from %arg below *)

    type svalue = Tokens.svalue
    type ('a,'b) token = ('a,'b) Tokens.token
    type lexresult= (svalue,pos) token

    fun eof file = Tokens.EOF(get_pos file 0, get_pos file 0)

%%
%header (functor ConfigLexFun(structure Tokens: Config_TOKENS));

ML-Yacc will generate a structure which defines all of the tokens that are passed from the lexer to the parser. These are the terminals of the grammar. The words terminal and token are synonymous. You use %header to declare the lexer as a functor that takes the structure as an argument, here called Tokens. Here is the signature for the structure, from the config.grm.sig file.

signature Config_TOKENS =
sig
type ('a,'b) token
type svalue
val EOF:  'a * 'a -> (svalue,'a) token
val TOK_INT: (int) *  'a * 'a -> (svalue,'a) token
val TOK_STRING: (string) *  'a * 'a -> (svalue,'a) token
val TOK_WORD: (string) *  'a * 'a -> (svalue,'a) token
val SYM_SWERVE:  'a * 'a -> (svalue,'a) token
val SYM_EQUALS:  'a * 'a -> (svalue,'a) token
val SYM_RBRACE:  'a * 'a -> (svalue,'a) token
val SYM_LBRACE:  'a * 'a -> (svalue,'a) token
val SYM_COMMA:  'a * 'a -> (svalue,'a) token
val SYM_SEMICOLON:  'a * 'a -> (svalue,'a) token
val KW_NODE:  'a * 'a -> (svalue,'a) token
val KW_SERVER:  'a * 'a -> (svalue,'a) token
end

All of the tokens are defined as functions that map from a pair of source positions and possibly some contained value to the token type. There are two source positions so that you can point to the first and last characters of the token in the source file. I just point to the first character and set the second position to be the same as the first. For example to generate the EOF token I just call the Tokens.EOF function with some dummy source positions.

In the signature the 'a type variable represents whatever type you choose for the source position. The svalue name means "semantic value". It's whatever data will be carried along with the tokens. When used with an ML-Yacc parser it will also include the types for the non-terminals. All you have to do is ensure that there are definitions for the svalue and token types in the lexer which are equated to the types supplied in the Tokens structure. Also you must equate the lexresult type to be the same as the parser's token type.

Here is the bottom half of the config.lex file which defines the regular expressions for the tokens.

%%
%header (functor ConfigLexFun(structure Tokens: Config_TOKENS));
%full
%arg (file: string);

wrd1=[A-Za-z_/\\$:.%+-];
wrd=[A-Za-z0-9_/\\$:.%+-];
word={wrd1}{wrd}*;
str=([^"\n]|\\\n|\\\"|\\\\);
digit=[0-9];
int=[+-]?{digit}+;

ws=[\ \t\013];
%%

"\n"            => (new_line yypos; continue());
{ws}+           => (continue());
#.*\n           => (new_line yypos; continue());


{word}          => (check_reserved yytext file yypos);

{int}           => (fix_integer yytext file yypos);

\"{str}*\"      => (fix_str yytext file yypos);

";"             => (sym Tokens.SYM_SEMICOLON file yypos);
","             => (sym Tokens.SYM_COMMA file yypos);
"{"             => (sym Tokens.SYM_LBRACE file yypos);
"}"             => (sym Tokens.SYM_RBRACE file yypos);
"="             => (sym Tokens.SYM_EQUALS file yypos);
"\001\001\001"  => (sym Tokens.SYM_SWERVE file yypos);


.               => (Log.errorP (get_pos file yypos)
                ["Unrecognised characters in the configuration file."];
                    eof file);

The wrd definition defines the characters that can appear in a word. The wrd1 definition is the subset that can be the first character of a word. This excludes digits to avoid confusion with integers. Strings can contain backslash escapes. Within the regular expression a few of these escapes have to be handled separately. The \\\n combination ensures that new-lines are only allowed if they are immediately preceded by a backslash. Similarly a double-quote is allowed inside a string if it is preceded by a backslash. The last term ensures that the sequence "foo\\" is correctly recognised as a backslash at the end of a string and not a backslash followed by an internal double-quote.

The "\001\001\001" is the three CTRL-A character marker that is inserted at the beginning of a .swerve file as described in the section called The Configuration Grammar.

The lexer uses a few helper functions in the top section of the config.lex file to build a token. For example the yytext variable contains the complete matched text which for strings will include the double quote characters. The fix_str function strips them off and also translates the backslash escapes.

fun fix_str yytext file yypos =
let
    val pos = get_pos file yypos
    val chars = explode(substring(yytext, 1, size yytext - 2))

    fun count_nl [] pp = ()
    |   count_nl (#"\n"::rest) pp = (new_line pp; count_nl rest (pp+1))
    |   count_nl (c::rest) pp     = count_nl rest (pp+1)

    fun xlate [] rslt = implode(rev rslt)
    |   xlate (#"\\"::c::rest) rslt =
    let
        val nc =
            case c of
              #"n" => #"\n"
            | #"t" => #"\t"
            | _    => c
    in
        xlate rest (nc::rslt)
    end
    |   xlate (c::rest) rslt = xlate rest (c::rslt)
in
    count_nl chars (yypos+1);
    Tokens.TOK_STRING(xlate chars [], pos, pos)
end

According to the Config_TOKENS signature above, the TOK_STRING function takes the text of the string as its first argument. The type of this argument comes from the %term declaration in the grammar file.

What's a little tricky is keeping track of the line and column positions. I can count lines by being careful to call my new_line function (below) for each new-line character in a matched expression. I've made the new-line separate from the white space expression (ws) to make it easier to count. ML-Lex generates code to provide the position of a matched regular expression as the character offset from the beginning of the source. This is available in the yypos variable. If I save the offset of each new-line then I can work out the column number of a character by subtracting the offset of the character from that of the most recent new-line. This is taken care of in the following code.

val     line = ref 1                (* current line *)
val     line_pos = ref 0            (* char position of preceding \n *)

fun get_pos file yypos =
let
    val col = Int.max(yypos - !line_pos, 1) (* see eof *)
in
    Common.SrcPos {file=file, line= (!line), col=col}
end


fun new_line yypos =
(
    line := !line + 1;
    line_pos := yypos
)

The count_nl function in fix_str above is needed to account for new-line characters embedded in strings. It has to track the source position within the string to keep the positions right.

Integers as strings are converted to numeric values in the fix_integer function. The sym function just adds source positions to the symbols. I won't show these here as they are simple enough. Reserved words are filtered out in the check_reserved function.

val reserved_words = [
    ("SERVER",      Tokens.KW_SERVER),
    ("NODE",        Tokens.KW_NODE)
    ]


fun check_reserved yytext file yypos =
let
    val uword = Common.upperCase yytext
    val pos = get_pos file yypos
in
    case List.find (fn (w, _) => w = uword) reserved_words of
      NONE          => Tokens.TOK_WORD(yytext, pos, pos)
    | SOME (_, tok) => tok(pos, pos)
end

Since there are only a few reserved words, a search through a list is fine. The word matching is case-insensitive.

The Parser Driver

This section completes the description of the parser by showing how it is used in the Config module. The various structures and functors are assembled to make a complete parser as follows.

(*  Assemble the pieces to make a parser. *)

structure ConfigLrVals = ConfigLrValsFun(structure Token = LrParser.Token)
structure ConfigLex    = ConfigLexFun(structure Tokens = ConfigLrVals.Tokens)
structure ConfigParser = JoinWithArg(structure LrParser = LrParser
                        structure ParserData = ConfigLrVals.ParserData
                        structure Lex = ConfigLex)

(*  Max number of tokens to lookahead when correcting errors. *)
val max_lookahead = 15;     

(*  The syntax error messages use the token names. This is for editing
    them to something more readable.
*)
val syntax_edits = [
    ("KW_SERVER",       "Server"),
    ("KW_NODE",         "Node"),
    ("SYM_SEMICOLON",   "semicolon"),
    ("SYM_COMMA",       "comma"),
    ("SYM_LBRACE",      "'{'"),
    ("SYM_RBRACE",      "'}'"),
    ("SYM_EQUALS",      "'='"),
    ("TOK_WORD",        "word"),
    ("TOK_STRING",      "string"),
    ("TOK_INT",         "number")
    ]

The ConfigLrVals structure contains the tables of parsing operations for the grammar. The ConfigLex structure contains the complete lexer specialised with the types needed to communicate with the parser. The JoinWithArg functor is part of the ML-Yacc library. It joins all the pieces together. The "WithArg" part of the name indicates that it supports an argument being passed in at parsing time. I use this to carry the file name. You can see it in the %arg declarations in the grammar and lexer files. (The file argument isn't used in the parser but the way the joining works it must be there if the lexer has one).

The result is a complete parser in the ConfigParser structure which will be used below.

A parser generated by ML-Yacc will do syntax correction. This means that if there is a syntax error it will attempt to insert or delete tokens to change the input into something parsable and then continue. It will produce an error message showing what change it made which should give the user an idea of what the original syntax error was. This sounds clever but it can be confusing. It means that if you omit a semicolon for example you will get an error message saying that one was inserted. This isn't very user-oriented. What's worse is that the tokens in the messages are described using the names that appear in the grammar file. I like to have distinctive terminal names in the grammar file, in uppercase. So I process the error messages to convert the terminal names to something more readable. The syntax_edits list in the above code is used for this processing. The max_lookahead parameter controls the error correction. The value 15 is recommended in the ML-Yacc documentation for most purposes.

Here is the parse_config function which drives the parser.

and parse_config swerve file: Section list =
let
    fun parse_error(msg, pos1, pos2) = Log.errorP pos1 [edit_errors msg]

    val swerve_done = ref false

    fun input rstrm n =
    (
        if swerve andalso not (!swerve_done)
        then
            (swerve_done := true; "\^A\^A\^A")
        else
            TextIO.inputN(rstrm, n)
    )

    fun do_parser holder =
    let
        val rstrm = TextIOReader.get holder

        fun do_parse lexstream =
            ConfigParser.parse(max_lookahead, lexstream, parse_error, file)

        val in_stream = ConfigParser.makeLexer (input rstrm) file

        val (result, new_stream) = do_parse in_stream
                    handle ParseError => ([], in_stream)
    in
        TextIOReader.closeIt holder;
        result
    end
in
    case TextIOReader.openIt' file of
      NONE   => []
    | SOME h => do_parser h
end

The TextIOReader.openIt' function opens the file without worrying about connection time-outs. The swerve argument indicates that a .swerve file is being parsed. The .swerve files are actually read while the server is processing connections so I should be worrying about time-outs for them but the files are small so I'll get away with it.

In the do_parser function the parsing is run by a call to the ConfigParser.parse function. The second argument to ConfigParser.parse is the lexer. The third is a call-back function for error messages and the fourth is the %arg argument value. The lexer is made with the ConfigParser.makeLexer function which takes a source reading function and a %arg argument value. The input function delivers the contents of the file in chunks of size n. If it is a .swerve file then the first chunk is forced to be the triple CTRL-A marker.

I've omitted a description of the syntax error editing. It's just some straight-forward string manipulation. The Substring.position function does the job of finding the string to replace.
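The editing could be done with a sketch like the following, assuming the syntax_edits list above. Substring.position splits the message at the first occurrence of a token name; only that occurrence is replaced, which is enough for ML-Yacc's short messages. This is an illustration of the idea, not the module's actual code.

```sml
(*  A sketch of the omitted message editing.  Each token name in
    the syntax_edits list is replaced by its readable form.
*)
fun edit_errors msg =
let
    fun edit ((name, readable), text) =
    let
        val (prefix, suffix) =
                Substring.position name (Substring.full text)
    in
        if Substring.isEmpty suffix
        then text
        else concat [Substring.string prefix, readable,
                     Substring.string
                        (Substring.triml (size name) suffix)]
    end
in
    foldl edit msg syntax_edits
end
```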

The end-result of all of this is a parse tree whose type is ConfigTypes.Section list, which is the type of the start non-terminal in the grammar.

Processing the Parse Tree

The output from the parser is a list of sections of the type ConfigTypes.Section. The two main processing steps that follow are for the server and node sections. Refer to the processConfig function in the section called The Config Module - Interface.

The process_server_section function looks through the sections for the server configuration section.

and process_server_section sections =
let
    fun match (SectServer _) = true
    |   match _              = false

    val sects = List.filter match sections
in
    case sects of
      [SectServer {parts, ...}] => process_server_parts parts

    | [] => (Log.error
               ["A server configuration section must be supplied."];
             raise Bad)

    | _  => (Log.error
               ["There are multiple server configuration sections."];
             raise Bad)
end

The process_server_parts function saves each parameter into the static variables. (This design neatly allows the server section to be anywhere in the file).

The process_node_sections function finds each node section and adds it to the list of node configurations in a static variable. The static variables are described in the section called The Config Module - Interface.

and process_node_sections sections =
let
    fun process sect =
    (
        case process_node_section sect of
          NONE        => ()
        | SOME config => cf_nodes := config :: (!cf_nodes)
    )

    fun match (SectNode _) = true
    |   match _            = false

    val sects = List.filter match sections
in
    case sects of
      [] => (Log.log Log.Warn (TF.S
               "There are no node configuration sections."))

    | _  => app process sects
end

I won't describe the parameter processing in detail. It's a lot of long-winded checking of values for the correct format and legality. I'll just describe the general idea.

The process_server_parts function takes a list of ConfigTypes.SectionPart values which contain one parameter each. It has two dispatch tables, for string-valued and integer-valued parts. These dispatch to functions that check the value and return it if it is legal. If the value is illegal then an error message is logged and the Bad exception is raised. This is caught at the top in processConfig which aborts the whole server with a FatalX exception.

Each dispatch step reduces the list of parts by removing those that are recognised. If there are any parts left over then they must be unrecognised parameters and error messages are logged. (See the unrec_param function).

The server configuration record is built up from the successful values returned from the dispatch functions. Utility functions are used to extract particular parameters. For example the reqstr function finds a required parameter that has a string value. Defaulting is performed at this stage.
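For illustration, a reqstr-style utility might look like this, assuming the checked parameters are held as (name, value) pairs. The representation is an assumption; the real module's code is not shown here.

```sml
(*  Illustrative only.  An empty string stands for an undefined
    value, matching the convention in the ServerConfig record.
*)
fun reqstr (params: (string * string) list) name : string =
    case List.find (fn (n, _) => n = name) params of
      SOME (_, value) => value
    | NONE            => ""
```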

Node section processing is similar but since there are fewer parameters and their types are more complex there isn't a formal dispatch table. There isn't as much checking for legality as there should be. For example I don't check that the name of a built-in handler is legal at this stage, nor if a CGI script exists.

MIME Type Configuration

The server configuration contains the path to the MIME types file. This has the same format as is used by Apache. The contents of this file are read in by the process_mime_file function and saved into a table. The table maps from a file extension to a pair of major/minor MIME type names. For case-insensitivity the extensions are saved in upper case.
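The file has one map per line in the Apache mime.types format: a major/minor type followed by the extensions it applies to. For example:

```
# Comments start with '#'.
text/html       html htm
image/png       png
application/pdf pdf
```

With these lines the extension html would be stored under the key HTML, mapping to the pair ("text", "html").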

(*  The mime information is just a map from an extension to
    the pair.
*)
val mime_table: (string * string) STRT.hash_table =
                                    STRT.mkTable(101, NotFound)

There is only one external API function, lookupMime.

and process_mime_file() =
let
    val ServerConfig {mime_file, ...} = getServerConfig()
in
    if Files.readableReg mime_file
    then
        FileIO.withTextIn (Abort.never())
            mime_file () (read_mime mime_file)
    else
        Log.error ["The MIME types file is not readable: ", mime_file]
end



and read_mime mime_file stream : unit =
let
    fun loop lnum =
    (
        case TextIO.inputLine stream of
          "" => ()
        | line => (do_line lnum line; loop (lnum+1))
    )
... omitted material ...
in
    loop 1
end


and lookupMime ext = STRT.find mime_table (upperCase ext)
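A caller might use it like this sketch, with a hypothetical default for unknown extensions:

```sml
(*  Illustrative use of lookupMime.  The default type is an
    arbitrary choice for this sketch.
*)
fun mimeTypeFor ext =
    case lookupMime ext of
      SOME (major, minor) => major ^ "/" ^ minor
    | NONE                => "text/plain"
```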