External entity and URI reference resolvers

Resolution of external entities

By default, the parsers do not resolve external entities and produce an error if an external entity or an external DTD is used. These external references are loosely defined as a URI reference (although no fragment identifier is allowed) in the XML specification, and thus can be virtually anything, so a customisable resolving facility is provided.

To use external entities, an external resolver must be set using the parser's set_resolver routine. This sets a single resolver for both external DTDs and entities. There are also routines to set each of these separately: set_entity_resolver and set_dtd_resolver.
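As a minimal sketch of the setup described above: the class name PARSER_RESOLVER_SKETCH is hypothetical, and the sketch assumes that XM_EIFFEL_PARSER and XM_URI_EXTERNAL_RESOLVER each offer a creation procedure named make. Only set_resolver, set_dtd_resolver and set_entity_resolver are taken from the text.

```eiffel
class PARSER_RESOLVER_SKETCH
		-- Hypothetical class, for illustration only.

create
	make

feature {NONE} -- Initialization

	make
			-- Attach one resolver to a parser for both
			-- external DTDs and external entities.
		local
			l_parser: XM_EIFFEL_PARSER
			l_resolver: XM_URI_EXTERNAL_RESOLVER
		do
				-- Assumption: both classes have a creation procedure `make'.
			create l_parser.make
			create l_resolver.make
				-- A single resolver for DTDs and entities alike.
			l_parser.set_resolver (l_resolver)
				-- To use different resolvers for each, call instead:
				--   l_parser.set_dtd_resolver (...)
				--   l_parser.set_entity_resolver (...)
		end

end
```

A resolver created this way will probably still need scheme resolvers registered on it before it can open anything.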

An external resolver is a class that opens a KI_CHARACTER_INPUT_STREAM given a SYSTEM identifier (a string) or a PUBLIC identifier (another string - but not a URI). An error is produced if no corresponding stream can be found. It is the responsibility of the client to close the stream.

All external entity resolvers are descendants of XM_EXTERNAL_RESOLVER.

Resolution of URI references

As well as external entities, XML applications (but not the parser) may encounter other URI references that they need to process. A typical example is the document() function in XPath/XSLT, whose argument is a URI reference. This too will need to be opened for processing.

All URI reference resolvers are descendants of XM_URI_REFERENCE_RESOLVER. To resolve a URI reference, create such a resolver, then call its resolve_uri routine, passing it the URI reference to open. You will get back a KI_CHARACTER_INPUT_STREAM.
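The call sequence might look like the sketch below. Only resolve_uri is named in the text. The query names has_uri_reference_error and last_uri_reference_stream, and the is_closable and close features of the stream, are assumptions about the library's interface and should be checked against the class texts. The class URI_REFERENCE_SKETCH is hypothetical.

```eiffel
class URI_REFERENCE_SKETCH
		-- Hypothetical class, for illustration only.

feature -- Basic operations

	process_reference (a_resolver: XM_URI_REFERENCE_RESOLVER; a_uri_reference: STRING)
			-- Open `a_uri_reference' using `a_resolver',
			-- process the stream, then close it.
		require
			a_resolver_not_void: a_resolver /= Void
			a_uri_reference_not_void: a_uri_reference /= Void
		local
			l_stream: KI_CHARACTER_INPUT_STREAM
		do
			a_resolver.resolve_uri (a_uri_reference)
				-- Assumption: the outcome is reported through these queries.
			if not a_resolver.has_uri_reference_error then
				l_stream := a_resolver.last_uri_reference_stream
					-- ... read characters from `l_stream' here ...
					-- Assumption: the client closes the stream, as it
					-- must for external entity resolvers.
				if l_stream.is_closable then
					l_stream.close
				end
			end
		end

end
```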

Concrete resolvers

Here is a list of some available external entity resolvers supplied with the library, in order of increasing power. If none of these meets your needs, then you may write your own.

XM_NULL_EXTERNAL_RESOLVER does nothing. It is the default used by the XML parser if you do not call set_resolver. Suitable for stand-alone documents only.
XM_FILE_EXTERNAL_RESOLVER takes the supplied SYSTEM id and treats it as a file in the current working directory. It is therefore of very limited use and should be considered obsolete. Use XM_URI_EXTERNAL_RESOLVER instead.
XM_STRING_EXTERNAL_RESOLVER resolves by mapping SYSTEM ids to in-memory strings. It must be configured by the programmer for every SYSTEM id that will be encountered.
XM_URI_EXTERNAL_RESOLVER handles any SYSTEM id by examining its URI scheme and delegating it to a descendant of XM_URI_RESOLVER, whose purpose is to open URIs for that particular scheme.

The library provides XM_FILE_URI_RESOLVER and XM_DATA_URI_RESOLVER to handle the file and data schemes, and XM_STDIN_URI_RESOLVER to resolve stdin: to the standard input stream. If you need to handle other schemes, such as http or ftp, you may be able to find resolvers for them in other libraries (such as ePOSIX for http, https and ftp), or you can write one yourself. After creating such a resolver, you need to register it by calling register_scheme on the XM_URI_EXTERNAL_RESOLVER. You can inherit from XM_RESOLVER_FACTORY and use its new_resolver* routines to create these resolvers.
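Registration could be sketched as follows. The class SCHEME_REGISTRATION_SKETCH and its function are hypothetical, and the sketch assumes that each of the three library classes has a creation procedure named make. Only register_scheme comes from the text.

```eiffel
class SCHEME_REGISTRATION_SKETCH
		-- Hypothetical class, for illustration only.

feature -- Access

	new_file_and_data_resolver: XM_URI_EXTERNAL_RESOLVER
			-- New resolver able to open URIs in the
			-- file and data schemes
		local
			l_file: XM_FILE_URI_RESOLVER
			l_data: XM_DATA_URI_RESOLVER
		do
				-- Assumption: all three classes have a creation procedure `make'.
			create Result.make
			create l_file.make
			Result.register_scheme (l_file)
			create l_data.make
			Result.register_scheme (l_data)
		end

end
```

A resolver for another scheme, whether taken from another library or written by hand, would be registered in the same way.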

There is also XM_STRING_URI_RESOLVER, which resolves URIs in the string scheme for naming STRINGs. This is designed to work with the XM_CATALOG_BOOTSTRAP_RESOLVER (see below).

XM_CATALOG_BOOTSTRAP_RESOLVER combines the features of XM_STRING_EXTERNAL_RESOLVER and XM_URI_EXTERNAL_RESOLVER, and extends them further in that it also deals with the resolution of PUBLIC ids and URI references.

The resolver comes pre-configured with strings for resolving DTDs and schemas associated with OASIS ETRC XML Catalog files. Hence its name: it is designed for use by XM_CATALOG_RESOLVER, giving it a way of resolving references within the catalog files that it uses to carry out the resolution process. You can also configure it to handle additional SYSTEM and PUBLIC ids, and URI references (there are separate lists for each type).

There is also an XM_STRING_URI_RESOLVER, which maps URIs in a string scheme to Eiffel STRINGs. To use this, you must add additional SYSTEM ids to the bootstrap resolver's list of well-known SYSTEM ids, along with the contents of an Eiffel STRING. Then you must register the resolver with the bootstrap resolver's uri_scheme_resolver. Once this has been done, the catalog resolver will be able to resolve your string URIs.

XM_CATALOG_RESOLVER was mentioned briefly above. It is such a powerful and complex resolver that it gets a chapter all of its own.

Copyright 2005, Eric Bezault
Last Updated: 7 July 2005