CLOPS tutorial

This tutorial first illustrates the basic usage scenario and then presents a few advanced features. However, it barely scratches the surface of what can be done with CLOPS.

Basic usage

Typically, CLOPS is called by the build system to generate files before the actual compilation. Before seeing how to integrate it with ANT, let's first use it from the command line to get a better feeling for what goes on under the hood. In this section you will be guided step by step to recreate the example that comes with the CLOPS distribution in the sample/wc directory. So, if you get stuck at any point, you may look in that directory for help. Or, if you feel brave enough, you may analyze that example yourself and skip this section.

A first example

The wc tool computes statistics of a text, such as the word count. Take a (short) look at its man page:

man wc

Create a fresh directory. All the commands in this tutorial are written under the assumption that the current directory is this fresh directory. Create a subdirectory lib and copy into it the file clops-runtime.jar from the CLOPS distribution. This file is needed to compile your code and to run your application. Then create another subdirectory src and in it create a file wc.clo with the content:

NAME::
  Wc

ARGS::
  Bytes: {"-c", "--bytes"} // option name, followed by aliases
  Chars: {"-m", "--chars"}
  Words: {"-w", "--words"}
  Lines: {"-l", "--lines"}
  LineLength: {"-L", "--max-line-length"}
  FilesFrom: {"--files0-from"}: {file}  // "file" is the option type
  Help: {"--help"}         /* the default type is "boolean" */
  Ver: {"--version"}
  Files: {}: {file-list}: [between="", allowMultiple="false"]

FORMAT::
  (Option | Files)*;  // this is a regular expression

WHERE::
  Option:   // shorthand to make the FORMAT easier to read
    Bytes | Chars | Words | Lines | LineLength |
    FilesFrom | Help | Ver;

The properties between and allowMultiple will be explained in the Dealing with -- section.

Create a subdirectory generated under src. Generate the parser with:

java -jar CLOPSPATH/lib/clops.jar\
  src/wc.clo -o src/generated -p generated\
  -d cli.html -b html

(If CLOPSPATH is added to your PATH and you are on Linux then you can simply say clops to run the tool.) You should see three files in src/generated, each containing a class. (The option -p is followed by a package name.) You should also see the file cli.html, which gives an easy-to-read description of your tool's command line interface. Next, create the file Main.java under src with the content:

import java.io.File;
import generated.WcParser;
import generated.WcOptionsInterface;
import ie.ucd.clops.runtime.errors.ParseResult;

public class Main {
  public static void main(String[] args) throws Exception {
    WcParser parser = new WcParser();
    ParseResult argsParseResult = parser.parse(args);
    if (!argsParseResult.successfulParse()) {
      argsParseResult.printErrorsAndWarnings(System.err);
      System.out.println("Usage: java Main [OPTIONS] file...");
      System.exit(1);
    }
    WcOptionsInterface opt = parser.getOptionStore();
    if (opt.isWordsSet()) 
      System.out.println("I should print a word count.");
    if (opt.isBytesSet()) 
      System.out.println("I should print a byte count.");
    for (File f : opt.getFiles()) checkFile(f);
  }

  public static void checkFile(File f) {
    System.out.print("The file " + f.getPath());
    if (f.exists())
      System.out.println(" exists.");
    else
      System.out.println(" does not exist.");
  }
}

Create a subdirectory classes. To compile, say:

javac -cp lib/clops-runtime.jar\
  -sourcepath src -d classes src/Main.java src/generated/*

Warning: Errors during compilation might be caused by an incorrect .clo file, because CLOPS only detects obvious mistakes. More subtle ones are detected down the road by the Java compiler.

You can now run your program with:

java -cp classes:lib/clops-runtime.jar Main ARGUMENTS

Try replacing ARGUMENTS with various values and see what happens. In particular, try "--files0-from=.", "--files0-from .", and only "--files0-from". (On Windows, use ; instead of : as the classpath separator.)

Congratulations: You have now finished the first part of the tutorial and have seen how to use CLOPS instead of a hand-written parser.

Integrating with ANT

The commands in the previous section are a little long but hopefully you use a good build system that can handle the dirty details for you. In this section you'll see how ANT can help. Create the build.xml file:

<project name="Wc" default="compile" basedir=".">
  <!-- to be filled later -->
</project>

The compile target looks as usual, except that it depends on a code generation phase and that javac needs to have the file clops-runtime.jar in the classpath.

<target name="compile" 
        description="compiles Java files" 
        depends="clops-generate">
  <mkdir dir="classes"/>
  <javac destdir="classes" 
         srcdir="src" 
         classpath="lib/clops-runtime.jar"/>
</target>

The code generation phase is simply an invocation of CLOPS. You may need to change the path to clops.jar to fit your CLOPS installation.

<target name="clops-generate" 
        description="use CLOPS to generate files">
  <mkdir dir="src/generated"/>
  <java fork="yes" dir="." jar="../../lib/clops.jar">
    <arg value="src/wc.clo"/>
    <arg value="-o=src/generated"/>
    <arg value="-p=generated"/>
  </java>
</target>

Finally, when packaging your application for a release you must include the clops-runtime.jar as a dependency.

<target name="dist" 
        description="build a distribution" 
        depends="compile">
  <mkdir dir="dist"/>
  <jar destfile="dist/wc.jar" basedir="classes">
    <manifest>
      <attribute name="Main-Class" value="Main"/>
      <attribute name="Class-Path" value="clops-runtime.jar"/>
    </manifest>
  </jar>
  <copy file="lib/clops-runtime.jar" todir="dist"/>
</target>

You are now ready to use CLOPS. But if you feel that you need more flexibility and power then read on.

Advanced features

This section illustrates a few advanced features of CLOPS by showing how you can deal with tricky command line conventions.

Dealing with --

The wc command line implementation above is almost OK. One problem is that it doesn't allow the user to examine a file whose name is "-w" (or any other name that conflicts with an option). The usual convention to handle such situations is to use a special option (--) and treat everything that follows as operands (not options). We can handle this by adding two options to the ARGS section of wc.clo and by replacing the FORMAT.

ARGS::
  DashFiles: {}: {file-list}: 
    [between="", allowMultiple="false", allowDash="true"]
  DashDash: {"--"}

FORMAT::
  (Option | Files)* (DashDash DashFiles*)?;

Also, add the following line at the end of your main method:

    for (File f : opt.getDashFiles()) checkFile(f);

Recompile and run.

ant dist
java -jar dist/wc.jar ARGUMENTS

As before, spend some time trying different ARGUMENTS.

Each option type, like file-list, has a number of properties that customize its behavior, like between, allowMultiple, and allowDash. The default behavior for file-list is to recognize command lines like -f nameA,nameB and like -f=nameA,nameB. If the set of aliases is left empty then it will recognize command lines like =nameA,nameB or "" nameA,nameB. In other words, it still expects = or an argument delimiter between the empty prefix and the file names. If we want command lines like nameA,nameB to be recognized then we must say between="". The second property, allowMultiple, defaults to true and means that a comma (,) can be used to separate multiple file names. But wc considers the command line x,y as being one file name, not two, so we set allowMultiple to false to get this behavior. Finally, because it is by far the most common behavior, the default is not to allow file names that begin with a dash.

The other secret ingredient in getting the same behavior as wc is the format. It says that a -- might appear (?) followed by any number (*) of file names that may start with a dash.
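
For comparison, here is a minimal hand-written sketch of the convention that this format expresses. It is not the code that CLOPS generates. The class and method names are hypothetical, and for simplicity every argument that begins with a dash before -- is treated as an option without an argument.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written counterpart of (Option | Files)* (DashDash DashFiles*)?. */
final class DashDashSketch {
  /** Returns the file names found in args; options are skipped. */
  static List<String> operands(String... args) {
    List<String> files = new ArrayList<>();
    boolean onlyOperands = false;   // becomes true once "--" has been seen
    for (String a : args) {
      if (onlyOperands) {
        files.add(a);               // after "--": everything is a file name
      } else if (a.equals("--")) {
        onlyOperands = true;
      } else if (!a.startsWith("-")) {
        files.add(a);               // before "--": file names have no leading dash
      }
      // Otherwise the argument is an option such as -w; it is ignored here.
    }
    return files;
  }

  public static void main(String[] args) {
    System.out.println(operands("-w", "a.txt", "--", "-w"));
  }
}
```

With the arguments -w a.txt -- -w, the first -w is taken as an option and the second one as a file name.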

If you wonder if the zoo of properties is big enough to handle all your needs then the answer is that underneath there is an even more general mechanism, and most properties are just syntactic sugar to make your life easier.

GZIP's -0, -1, ..., -9

GZIP can be told what compression level to use by saying -0, -1, ..., or -9. From the program we'd like to be able to simply call getCompressionLevel() on the option store. How can this be achieved? To understand the answer it helps if you know a little more about how CLOPS works. Each option has a regular expression that is built by default from aliases and from the option type. This regular expression is used to split the command line into meaningful bits. Let's say we declared the following option:

  Foo: {"-f", "--foo"}: {int} 

Then, its default regular expression is

  "((?:-f)|(?:--foo))(?:[=\0]([0-9]+))?\0"

Here \0 matches the end of one command line argument and (?:X) is Java's syntax for a non-capturing group, that is, a group that is not numbered. In general, any option type that expects one argument has a default regular expression that consists of a prefix and a suffix. The prefix (in this case ((?:-f)|(?:--foo))) is an alternation between aliases. The suffix is made out of the "between" part and the "argument shape" part. The "between" part is by default [=\0] and may be changed using the between property; the "argument shape" part defaults to something dependent on the option type (([0-9]+) for the option type int) and can be changed using the argumentShape property. The whole suffix part can be changed at once using the suffixregexp property. Notice that if you don't provide any alias and yet override the suffix you effectively specify the whole regular expression yourself.

The groups in the regular expression are important. Group 1 usually captures the prefix and is used in error messages; group 2 captures the string that is then parsed to build the value of the option.
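
To see the two groups at work, the default expression of Foo can be tried with java.util.regex. This is only a sketch and not the matching code of CLOPS: the class and method names are hypothetical, and the way the arguments are joined (each one terminated by a NUL character) is an assumption based on the description of \0 above. Because java.util.regex has no bare \0 escape, the separator is written \x00.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the default regular expression of Foo. */
final class FooRegexSketch {
  // The expression from the text, with \0 written as \x00.
  static final Pattern FOO =
      Pattern.compile("((?:-f)|(?:--foo))(?:[=\\x00]([0-9]+))?\\x00");

  /** Assumption: every command line argument is terminated by NUL. */
  static String commandLine(String... args) {
    StringBuilder sb = new StringBuilder();
    for (String a : args) sb.append(a).append('\0');
    return sb.toString();
  }

  /** Returns "alias -> value" if Foo matches at the start of the line, else null. */
  static String match(String... args) {
    Matcher m = FOO.matcher(commandLine(args));
    if (!m.lookingAt()) return null;
    // Group 1 is the alias that was used; group 2 is the text of the value.
    return m.group(1) + " -> " + m.group(2);
  }

  public static void main(String[] args) {
    System.out.println(match("-f", "42"));
    System.out.println(match("--foo=7"));
    System.out.println(match("-f"));
  }
}
```

The three calls print "-f -> 42", "--foo -> 7", and "-f -> null". In the last case the optional part of the suffix is skipped, so group 2 captures nothing.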

We can now describe the solution.

  CompressionLevel: {"-"}: {int}: [suffixregexp="([0-9])\0"]

That's all.
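
The effect of this declaration can be checked in the same spirit with java.util.regex. The sketch assumes that the complete expression is the prefix built from the alias "-" followed by the overridden suffix. The class and method names are hypothetical, and \0 is again written \x00.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the expression behind CompressionLevel. */
final class CompressionLevelSketch {
  // Assumed full expression: prefix ((?:-)) followed by the suffix ([0-9])\0.
  static final Pattern LEVEL = Pattern.compile("((?:-))([0-9])\\x00");

  /** Returns the level selected by one argument, or -1 if it does not match. */
  static int level(String arg) {
    Matcher m = LEVEL.matcher(arg + '\0');  // the argument ends with NUL
    return m.matches() ? Integer.parseInt(m.group(2)) : -1;
  }

  public static void main(String[] args) {
    System.out.println(level("-9"));
    System.out.println(level("-10"));
  }
}
```

Because the suffix requires \0 right after a single digit, -9 is accepted and -10 is rejected.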

Exercise: How would you handle TAIL? (You can say tail -50 foo to display the last 50 lines of foo.)

TAR's Old Style Options

The special character \0 in the regular expressions above might look weird. But it is powerful. It allows you to have options that do not necessarily correspond to what your shell considers to be one argument. In particular, it allows you to handle old style options such as those supported by TAR.

To extract a compressed archive one typically says

tar xzf foo.tar.gz

and to create a compressed archive

tar cjvf foo.tar.bz2 foo/

The option x stands for eXtract, option c for Create, option z for gZip, option j for bzip2, option v for Verbose, and option f says that the file name follows.

But it is also possible to say

tar -c -z -f foo.tar.gz foo/

or even

tar fcz foo.tgz foo/

Let's see how this can be handled with CLOPS. We begin with the easy bits.

NAME::
  Tar

ARGS::
  Create: {"-c"}:
    "create a new archive"
  Extract: {"-x","--extract","--get"}:
    "extract files from an archive"
  BzipTwo: {"-j","--bzip2"}:
    "filter archive thru bzip2"
  Gzip: {"-z","--gzip","--gunzip","--ungzip"}:
    "filter archive thru gzip"
  Archive: {"-f","--file"}: {file}:
    "use the given file"
  File: {}: {file-list}: [between="", allowMultiple="false"]:
    "files to archive"

To recognize the old-style options we introduce dummy options.

  ShortCreate: {"c"}: [suffixregexp=""]
  ShortExtract: {"x"}: [suffixregexp=""]
  ShortBzipTwo: {"j"}: [suffixregexp=""]
  ShortGzip: {"z"}: [suffixregexp=""]
  ShortArchive: {"f"}: [suffixregexp=""]
  ShortArchiveValue: {}: {file}: [between="\0"]

These are "dummy" in the sense that we don't plan to ever ask about their value in our implementation of TAR. Instead, we tell CLOPS to set the corresponding long versions on the fly.

FLY::
  ShortArchiveValue -> Archive := {$(ShortArchiveValue)};
  ShortCreate -> Create := {true};
  ShortExtract -> Extract := {true};
  ShortGzip -> Gzip := {true};
  ShortBzipTwo -> BzipTwo := {true};

The mysterious regular expressions used by the old-style options can be understood only after you see the format.

FORMAT::
  (ShortOption* ShortArchive ShortOption* ShortArchiveValue 
      (LongOption | File)*
  | (ShortOption* LongOption* Archive (LongOption | File)*));

WHERE::
  LongOption: 
    Create | Extract | BzipTwo | Gzip;
  ShortOption: 
    ShortCreate | ShortExtract | ShortBzipTwo | ShortGzip;

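By setting the suffixregexp to the empty string we make sure that no separator is required after the option string, unlike in the usual case. This is what allows several old-style options to be packed into a single argument such as xzf.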
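
The following sketch shows why the empty suffix matters. It is not the matching algorithm of CLOPS: it ignores the FORMAT and simply tries the old-style options from left to right. The class and method names are hypothetical, and the argument shape used for the type file is an assumption. As in the text, every argument is taken to end with \0, written \x00 in java.util.regex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of how old-style options share one argument. */
final class OldStyleTarSketch {
  // Option name -> regular expression, in the order in which they are tried.
  static final Map<String, Pattern> OPTIONS = new LinkedHashMap<>();
  static {
    // Empty suffix: the alias alone is the whole expression.
    OPTIONS.put("ShortCreate", Pattern.compile("c"));
    OPTIONS.put("ShortExtract", Pattern.compile("x"));
    OPTIONS.put("ShortBzipTwo", Pattern.compile("j"));
    OPTIONS.put("ShortGzip", Pattern.compile("z"));
    OPTIONS.put("ShortArchive", Pattern.compile("f"));
    // between="\0", then an ASSUMED argument shape for file, then the final \0.
    OPTIONS.put("ShortArchiveValue", Pattern.compile("\\x00([^\\x00]+)\\x00"));
  }

  /** Names the options matched from left to right, stopping at the first failure. */
  static List<String> split(String... args) {
    StringBuilder line = new StringBuilder();
    for (String a : args) line.append(a).append('\0');
    List<String> found = new ArrayList<>();
    int pos = 0;
    boolean progress = true;
    while (progress && pos < line.length()) {
      progress = false;
      for (Map.Entry<String, Pattern> e : OPTIONS.entrySet()) {
        Matcher m = e.getValue().matcher(line).region(pos, line.length());
        if (m.lookingAt()) {        // match exactly at the current position
          found.add(e.getKey());
          pos = m.end();
          progress = true;
          break;
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(split("xzf", "foo.tar.gz"));
  }
}
```

For xzf foo.tar.gz the sketch reports ShortExtract, ShortGzip, ShortArchive, and ShortArchiveValue. The \0 that ends the argument xzf is consumed by the between part of ShortArchiveValue, which is why that option sets between to "\0".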

Finally, we can instruct CLOPS to issue error messages if certain conditions hold.

VALIDITY::
  {$(Create?) && $(Extract?)}: 
    "You can't create and extract at the same time";
  {$(Gzip?) && $(BzipTwo?)}: 
    "You can't use gzip and bzip2 at the same time";
  {!$(Create?) && !$(Extract?)}: 
    "Should I create an archive or extract from one?";
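
Written by hand, these rules would amount to something like the following sketch. It assumes that $(X?) means "option X was set"; the class and method names are hypothetical, and this is not the code that CLOPS generates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written counterpart of the VALIDITY section. */
final class TarValiditySketch {
  /** Returns the messages of all rules whose condition holds. */
  static List<String> errors(boolean create, boolean extract,
                             boolean gzip, boolean bzipTwo) {
    List<String> errors = new ArrayList<>();
    if (create && extract)
      errors.add("You can't create and extract at the same time");
    if (gzip && bzipTwo)
      errors.add("You can't use gzip and bzip2 at the same time");
    if (!create && !extract)
      errors.add("Should I create an archive or extract from one?");
    return errors;
  }

  public static void main(String[] args) {
    // Neither create nor extract was requested, and both filters were.
    for (String e : errors(false, false, true, true)) System.err.println(e);
  }
}
```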

With relatively little effort we obtained a parser for the insanely complicated conventions of TAR!

Exercise: Find the remaining differences between TAR and the parser generated from the description above.