op/3 declaration, or, the ability to introduce new syntax into the language so that I may model the problem more naturally. This document covers extending the language to include the op/3 declaration in its full breadth of functionality.
Alternatives, and Raison d'être
op/3 directive in a module and thereafter, within only that module, parse the operator declared with the specification and priority given in the directive. This may seem like a drastic measure, so we must consider the alternatives before choosing this course of action. There are basically four viable, albeit inferior, alternatives:
- The Mercury programming language provides the grave syntactic construct, which converts the standard prefixed call to an infix one:
X `fn` Y
See, for example, the pprint module, as it used to make extensive use of this style (until it recently deprecated this approach to use one of the builtin operators instead). Just as the Mercury language developers have discovered, this approach has at least two drawbacks:
- One could construct a specialized instance of the op_table typeclass from the ops module and thereafter use the term_io module to parse strings at runtime. See samples/calculator2.m provided with the distribution for an example program that demonstrates this approach. This approach also has its own set of associated problems:
- Constructing one's own op_table is excessive when using only a few operators and tedious when introducing many operators. This manual process steals precious time away from program development that addresses the problem itself.
- Until now, there was no "cookbook" approach addressing the problem of how to create a mutable syntax. The ops module and the sample calculator program are well-documented and provide good examples of how to implement static syntax, but provide no guidance for constructing dynamic, mutable syntax. For this, one had to design such a framework from first principles.[2]
- A user-defined op_table instance may only be used at runtime. The Mercury compiler, as implemented, does not allow such tables during module compilation.
- Third, use one of the available scanners (such as samples/lex/) or parser generators (such as samples/moose/) to create a language syntax-aware preprocessor that substitutes operators and their arguments with the well-formed term replacement. Problems:
- This is a highly redundant and fruitless exercise, as the compiler has its own parse phase that does the same work; and with the language itself in flux (as is the case for any living language), changes to the syntax quickly render a system created by these means obsolete. Parser generators for other programming languages provide complete grammars for every version of the host programming language; Mercury has no parser generator with such grammars, so this task is left to a user of these kinds of systems.
- Furthermore, although the domain-specific languages for these tools closely follow the Mercury programming language to do their work, they do have their own languages that require time and effort to master. When presented with powerful parsing facilities built right into logic programming languages (I'm referring specifically to Definite Clause Grammars (DCGs)), one must weigh the costs of learning these languages before embarking on such an endeavor.
- Worst for last: as with C/C++, create a specific preprocessor that parses the source file, converting annotated operators to equivalent Mercury terms by following the preprocessing directives. This approach requires so much work (the C preprocessor is a compiler-sized program) and has so many known pitfalls (such as replacing elements inappropriately (in a quoted string, for example) and causing an unacceptable disjunction between the generated executable and the original source base (confusing debugging and error reporting efforts)) that it should not receive serious consideration;[3] it does not in this document, at any rate.
op/3 syntax into the compiler, the changes we make are hygienic, in that they are part of the language syntax, not external and blindly unaware of it, as is the case with the C preprocessor, and immediate, so that they may be used at compile time in the module in which they are declared. This implementation also limits the lexical scope of the operator to the module in which it is declared,[4] preventing these declarations from corrupting modules that eventually use modules with specialized syntax.
op/3 declaration fully into the language, so that, e.g., facts may be stated in their vernacular and still be compiled into executable content in the Mercury idiom, as in this real-world example:
for the open weekly timecard ending date(2006, 1, 6):
    employee cgi_emp_001 billed
        [ 3 hours on sunday - date(2006, 1, 1),
          16.5 hours on monday - date(2006, 1, 2),
          5 hours on tuesday - date(2006, 1, 3),
          5 hours on wednesday - date(2006, 1, 4) ]
    against contract lt_2005_001.
op/3 defined syntax, both prefix ('employee cgi_emp_001') and postfix ('3 hours'). The above fact is certainly "only" a data term (in fact, as well as being a data term, the above fact also contains op/3-based data terms), but fully actualized operators exist as well; the Prolog syntax module is rife with such examples. These uses of op/3-declared syntax (describing entity relationships clearly and as activated syntax) are in no way limited to the rather straightforward problems of accounting, but are also used in production expert systems handling over 1,000,000 transactions per day; the use of these extensions is tied directly to rule findings satisfying customer requirements.
op/3-declared syntax is used extensively in production systems built using Prolog serving real-world requirements under heavy demands. With the preexisting extensions for purity, typing, and functional programming, imagine the utility and expressivity that could be obtained with Mercury so extended!
ops module uses a discriminator (the type category) to choose among different uses for an operator (e.g. unary '-' versus binary '-'). This discriminator is internal and, as we need the same functionality when defining new operators, we externalize that type in library/ops.m by moving the type declaration from the implementation section to the interface.
category to the syntax declaration, and then make this new type an op_table (typeclass) instance ... we add this type to the interface of compiler/prog_io_util.m:
:- type op_map == map(pair(string, ops.category), op_info).
:- type mercury_op_map ---> mercury_op_map(ops.table, op_map).
:- instance ops.op_table(mercury_op_map).
:- type op_info ---> op_info(ops.specifier, ops.priority).
:- func op_specifier_from_string(string) = ops.specifier.
:- func op_category_from_specifier(ops.specifier) = ops.category.
op_specifier_from_string function simply takes an input string, e.g. "xfx", and converts it to the equivalent specifier representation, e.g. the functor
op_category_from_specifier function follows the (implied) convention of the ops module, which is that all prefix specifier types (including binary prefix) fall into one category and all other specifier types (one of several different infix and postfix possibilities) fall into the other. The complete set of changes is enumerated explicitly in the email on the implementation.
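The two conversions described above are small enough to sketch in full. What follows is not the code from compiler/prog_io_util.m, only a minimal, self-contained approximation: the types assoc, specifier and category are local stand-ins that I assume mirror the ones in the ops module, the constructor names before and after are assumptions, and raising an error on an unrecognized string is my own choice.

```mercury
:- module op_spec_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module require.
:- import_module string.

    % Local stand-ins (assumed shapes) for the ops module's types.
:- type assoc ---> x ; y.
:- type specifier
    --->    infix(assoc, assoc)
    ;       prefix(assoc)
    ;       binary_prefix(assoc, assoc)
    ;       postfix(assoc).
:- type category ---> before ; after.   % constructor names are assumptions

    % Convert the Prolog-style specifier text to a specifier term.
:- func op_specifier_from_string(string) = specifier.

op_specifier_from_string(Str) = Spec :-
    ( if Str = "xfx" then
        Spec = infix(x, x)
    else if Str = "xfy" then
        Spec = infix(x, y)
    else if Str = "yfx" then
        Spec = infix(y, x)
    else if Str = "fx" then
        Spec = prefix(x)
    else if Str = "fy" then
        Spec = prefix(y)
    else if Str = "xf" then
        Spec = postfix(x)
    else if Str = "yf" then
        Spec = postfix(y)
    else
        % Assumed behaviour: reject anything that is not a known specifier.
        error("op_specifier_from_string: unknown specifier " ++ Str)
    ).

    % Prefix-style specifiers go to one category, everything else to the other.
:- func op_category_from_specifier(specifier) = category.

op_category_from_specifier(prefix(_)) = before.
op_category_from_specifier(binary_prefix(_, _)) = before.
op_category_from_specifier(infix(_, _)) = after.
op_category_from_specifier(postfix(_)) = after.

main(!IO) :-
    Spec = op_specifier_from_string("xfy"),
    io.write(Spec, !IO),
    io.nl(!IO),
    io.write(op_category_from_specifier(Spec), !IO),
    io.nl(!IO).
```

Keeping the category function as a four-clause switch means the compiler will complain if a new specifier constructor is ever added without deciding which category it belongs to.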
ops module, we need to integrate this into the compiler's parser module (which is actually called prog_io). The efficacious point is where the parser works at the module level;[6] this occurs, after some initialization, in read_all_items/7. We initialize the op map here (with a call to init_mercury_op_map), and then pass along that nascent syntax map to the calls that parse the items in the module (by modifying the signatures of read_first_item/9 and the recursive calls
read_items_loop(ModuleName, SourceFileName, !Msgs, !Items, !Error, Syn0, !IO)
read_items_loop/10 with the goal:
read_item/7) which normally scans and parses the items in the module. When it encounters an op/3 declaration, however, it eventually resolves to process_decl/8 back in the prog_io module, which reads the declaration and then adds the syntax declaration to the op map, enhancing the syntax for the current module.
When read_all_items/7 completes its iteration on a module's items, it exits, discarding the op_map instance and any syntax it accumulated from op/3 declarations in that module, returning the compiler to the base, Mercury-defined, syntax, so that the "next" module starts fresh without syntax from other modules polluting the compilation.
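The per-module lifetime of the op map can be pictured with a small model. This is a sketch of the idea only, not the prog_io code: the item type and the functions ops_declared_in and add_item are hypothetical names, and the map here records just a priority per operator name rather than a full op_info.

```mercury
:- module op_scope_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module map.

    % Hypothetical, simplified view of a module's items.
:- type item
    --->    op_decl(int, string, string)    % priority, specifier, name
    ;       other_item(string).

:- type op_map == map(string, int).         % operator name -> priority

    % Stands in for the role read_all_items plays: start from an empty map,
    % thread it through the module's items, and hand back what accumulated.
    % Nothing survives from one call to the next.
:- func ops_declared_in(list(item)) = op_map.

ops_declared_in(Items) = list.foldl(add_item, Items, map.init).

    % An op/3 declaration extends the map; any other item leaves it alone.
:- func add_item(item, op_map) = op_map.

add_item(op_decl(Priority, _Specifier, Name), Ops) =
    map.set(Ops, Name, Priority).
add_item(other_item(_), Ops) = Ops.

main(!IO) :-
    ModuleA = [op_decl(300, "xfx", "plays"),
        other_item("jimmy plays football")],
    ModuleB = [other_item("f(x)")],
    % Each module is processed with its own fresh map.
    io.write_int(map.count(ops_declared_in(ModuleA)), !IO),
    io.nl(!IO),
    io.write_int(map.count(ops_declared_in(ModuleB)), !IO),
    io.nl(!IO).
```

Because every call begins from map.init, the operator declared in the first item list is invisible when the second list is processed: the first count is 1 and the second is 0.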
ltq' is equivalent to `mmc --make --infer-all'. For modules with op/3 declarations in the implementation, `ltq' first parses the module and writes out all terms canonically. After this translation, the system compiles the modules into the resulting executable or library.
mmc -M<file>' discovers file's dependencies and stores these in a makefile variable $(
ltq simply builds a makefile with the enumerated dependencies and then calls the system that manages the dynamic syntax, which then writes out syntactically-enhanced modules in their canonical form (called `
dopp are available, along with samples, as dynamic_ops.tgz.
op/3 declarations, as well as at least two aborted implementation attempts to do so. In the ensuing process, where I did implement this solution, quite a discussion emerged on the mailing list on the aesthetic of allowing the user to introduce or to change syntax, and how to go about doing it properly. This implementation is one approach, and is offered to assist those who wish to add syntactic extensions to their Mercury systems.
The normal infix operators do not have this grave branding, and for good reason. Imagine writing algebraic statements, such as the following:
while shackled to the grave syntax:
Note the extra parentheses -- these are now necessary, as the grave syntax does not communicate operator precedence. Also note that the single-character operators are now three times their original size. Given the above, it's tempting to avoid infix syntax altogether...
...but I have no desire to write out the parsed internal representation by hand (it may look like Lisp, circa 1965, because the syntax of most Lisps (with one notable exception) is also its parsed internal representation), so the Mercury prefix code is therefore presented:
There! Isn't the canonical tree syntax so much better than the cons syntax? Drek!
This is not all that bad, given the documentation and the calculator2.m sample. In calc4.m we provide a straightforward example using the
This pronouncement in no way prevented this author from submitting such a proposal to the Mercury team. Ah, the blessed ignorance of youth! All was not in vain, however, for every misstep hides the seeds of greatness: one of the responses showed that samples/expand_term.m (the responder was, in fact, the author of that module) provides the functionality of Prolog's
In ISO Prolog
:- op(300, xfx, plays).
:- op(200, xfy, and).
Term1 = jimmy plays football and squash.
Term2 = susan plays tennis and basketball and volleyball.
...but then the textbook quickly redeems itself -- it is still my preferred Prolog textbook -- with a meatier problem, which I adapt for your enjoyment:
ruth was the executive director at wncog.
sally was the executive administrative_assistant at wncog.
diane was the director of the human_resources department at wncog.
juan was the administrative_assistant of the human_resources department at wncog.
sunny was the director of the finance department at wncog.
stuart was the director of the operations department at wncog.
joe was the system_administrator of the operations department at wncog.

?- Who was the director of the What department at wncog.
Who = diane, What = human_resources ;
Who = sunny, What = finance ;
Who = stuart, What = operations ;
no
I leave the
[Bratko2001] Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Reading, Massachusetts, 2001.
(article originally posted January 3, 2006)