XML catalogs and the Catalog Resolver

About OASIS ERTC XML Catalogs

The resolvers described in the previous chapter take a string from an XML document and attempt to open an input stream for the parser or application to read from. This works well as long as the string concerned can easily be mapped to a local resource.

As soon as you look beyond the local machine, this approach shows its limitations. For instance, if a SYSTEM id refers to a network resource, you do not have many options. If you have a local copy of the resource, you could edit the source document and change the SYSTEM id, but that may mean a lot of editing, and it gets complicated if you want to distribute the document. Alternatively, you might have a resolver available that handles the HTTP protocol, but if you lose your network connection, or the remote server goes down, you are in trouble.

Using a string-based resolver can relieve some of these problems, but having to code, or configure in code, a dedicated class is not very flexible. A good answer to these problems is the catalog resolver.

The Catalog Resolver

XM_CATALOG_RESOLVER is a resolver that implements a two-stage resolution process. In the first stage, it uses XML catalogs (as defined by the OASIS Entity Resolution Technical Committee's 1.1 specification of 7th October 2005) to look up a SYSTEM or PUBLIC id, or a URI reference from the source document, and fetches another URI reference to feed into the second stage.

In the second stage, an XM_URI_EXTERNAL_RESOLVER is used to open a stream to the mapped URI reference.
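
The two stages can be pictured with a small sketch. The Python below is not the Gobo implementation and does not use its API; it is a simplified illustration that assumes a catalog containing only system and public entries, with no delegation, re-writing or normalization. The catalog contents, the function names load_catalog, map_identifier and open_stream, and the prefer_system flag are all hypothetical, chosen only for this example.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalog elements.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A tiny illustrative catalog; the identifiers and paths are made up.
CATALOG_XML = f"""<catalog xmlns="{CATALOG_NS}">
  <system systemId="http://www.example.com/dtd/book.dtd"
          uri="file:///usr/local/share/dtd/book.dtd"/>
  <public publicId="-//EXAMPLE//DTD Book 1.0//EN"
          uri="file:///usr/local/share/dtd/book-1.0.dtd"/>
</catalog>"""


def load_catalog(text):
    """Read the system and public entries into two lookup tables."""
    root = ET.fromstring(text)
    system = {e.get("systemId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}system")}
    public = {e.get("publicId"): e.get("uri")
              for e in root.iter(f"{{{CATALOG_NS}}}public")}
    return system, public


def map_identifier(system, public, system_id=None, public_id=None,
                   prefer_system=False):
    """Stage 1: map an identifier to another URI reference, or None."""
    # System entries are always tried first.
    if system_id is not None and system_id in system:
        return system[system_id]
    # With prefer="system", public entries are ignored whenever a
    # system identifier was supplied.
    public_allowed = system_id is None or not prefer_system
    if public_allowed and public_id is not None and public_id in public:
        return public[public_id]
    return None


def open_stream(uri, opener):
    """Stage 2: hand the mapped URI reference to a URI resolver.

    The opener callback stands in for a real URI resolver.
    """
    return opener(uri)


system, public = load_catalog(CATALOG_XML)
mapped = map_identifier(system, public,
                        system_id="http://www.example.com/dtd/book.dtd")
# A stub opener, so the sketch runs without touching the file system.
stream = open_stream(mapped, lambda uri: io.StringIO("contents of " + uri))
print(stream.read())
```

What should happen when the first stage finds no mapping is left to the caller in this sketch; it simply returns None.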

The actual format of an OASIS ERTC XML Catalog is quite complex, and allows delegation and URI re-writing. Read the specification to understand the full power of these catalogs.

Configuring the Catalog Resolver

A number of options control the way the catalog resolver finds catalogs, and how the resolution itself is performed.

System catalog list
The initial list of system catalogs searched by the resolver is taken from the environment variable XML_CATALOG_FILES, which must be a list of file names separated by colons or semi-colons. If this list is empty, then the file /etc/xml/catalog is used, unless suppress_default_system_catalog_file is called on XM_SHARED_CATALOG_MANAGER.
Document control of catalog files
Additional catalogs are searched for a particular document if one or more oasis-xml-catalog processing instructions appear within that document (see the specification for restrictions). This behaviour can be suppressed by calling suppress_processing_instructions.
Public/System preference
If the prefer attribute is not coded for a particular catalog, then the default is prefer="public". This can be changed to prefer="system" by calling set_prefer_system.
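
As a rough illustration of how the catalog lists described above might be assembled, here is a minimal Python sketch. It is not the resolver's own code. The function names and the use_default and honour_pis parameters are hypothetical; they only mimic the effect of suppress_default_system_catalog_file and suppress_processing_instructions. The sketch assumes the processing instruction carries a catalog pseudo-attribute and is only honoured before the document type declaration or root element, which is how the specification's restriction is understood here. The naive split on colons and semi-colons makes no attempt to handle Windows drive letters.

```python
import os
import re
from xml.dom import minidom, Node

DEFAULT_SYSTEM_CATALOG = "/etc/xml/catalog"

# Matches catalog="..." or catalog='...' inside the instruction data.
_CATALOG_ATTR = re.compile(r"""catalog\s*=\s*(["'])(.*?)\1""")


def system_catalog_files(environ=None, use_default=True):
    """Return the initial system catalog list.

    Names come from XML_CATALOG_FILES, separated by colons or
    semi-colons. If the list is empty, the default file is used
    unless use_default is False (hypothetical stand-in for
    suppress_default_system_catalog_file).
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("XML_CATALOG_FILES", "")
    names = [n.strip() for n in re.split(r"[:;]", raw) if n.strip()]
    if not names and use_default:
        names = [DEFAULT_SYSTEM_CATALOG]
    return names


def document_catalogs(xml_text, honour_pis=True):
    """Return catalogs named by oasis-xml-catalog instructions.

    honour_pis=False is a hypothetical stand-in for
    suppress_processing_instructions.
    """
    if not honour_pis:
        return []
    found = []
    for node in minidom.parseString(xml_text).childNodes:
        # Stop at the doctype or root element: later instructions
        # are ignored in this sketch.
        if node.nodeType in (Node.ELEMENT_NODE, Node.DOCUMENT_TYPE_NODE):
            break
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "oasis-xml-catalog"):
            match = _CATALOG_ATTR.search(node.data)
            if match:
                found.append(match.group(2))
    return found


# Example with made-up file names.
env = {"XML_CATALOG_FILES": "/a/one.xml:/b/two.xml;/c/three.xml"}
print(system_catalog_files(env))
print(document_catalogs('<?oasis-xml-catalog catalog="local.xml"?><doc/>'))
```

Passing the environment as a dictionary keeps the sketch easy to try out without changing the real XML_CATALOG_FILES variable.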
To assist in creating catalogs, or to find possible bugs in the resolver, you can have debugging messages written to the standard error stream by calling set_debug_level (a_level: INTEGER). Level 0 is the default, and level 10 gives the most voluminous output. It is a good idea to turn on at least level 1, as it will tell you if there are serious errors in your catalogs.

Debug level settings

The level parameter controls which classes of debugging messages are generated thus:

  1. Any errors that cause a catalog to fail parsing.
  2. Near-errors, loading catalogs, or switching to a delegated catalog.
  3. Catalogs that do not exist, traces of calls to resolve routines, and setting options on the catalog manager.
  4. Parsing a named catalog, and entries encountered within it.
  5. Resolution results. Setting xml:base.
  6. Preference status of found public entries. Duplicates.
  7. Catalog's base URI. Identity of retrieved catalogs, and whether they fail parsing.
  8. Normalization messages. Number of system catalogs.
  9. Checking for delegates, next catalogs and re-write rules. Matches found/not-found.
  10. Prefix strings. Candidate matches.

Copyright 2005-2016, Eric Bezault
Last Updated: 27 December 2016