Using XML based Configurations

This section explains how to use hierarchical and structured XML datasets.

Hierarchical properties

Because of their tree-like nature, XML documents can represent data that is structured in many ways. This section explains how to deal with such structured documents and demonstrates the enhanced query facilities supported by the XMLConfiguration class.

Accessing properties defined in XML documents

We will start with a simple XML document to show some basics about accessing properties. The following file named gui.xml is used as the example document:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<gui-definition>
  <colors>
    <background>#808080</background>
    <text>#000000</text>
    <header>#008000</header>
    <link normal="#000080" visited="#800080"/>
    <default>${colors.header}</default>
  </colors>
  <rowsPerPage>15</rowsPerPage>
  <buttons>
    <name>OK,Cancel,Help</name>
  </buttons>
  <numberFormat pattern="###\,###.##"/>
</gui-definition>

(As becomes obvious, this tutorial does not bother with good design of XML documents; the example file is only meant to demonstrate the different ways of accessing properties.) To access the data stored in this document, it must be loaded by XMLConfiguration. Like other file-based configuration classes, XMLConfiguration supports many ways of specifying the file to process. One way is to pass the file name to the constructor, as shown in the following code fragment:

try
{
    XMLConfiguration config = new XMLConfiguration("gui.xml");
    // do something with config
}
catch(ConfigurationException cex)
{
    // something went wrong, e.g. the file was not found
}

If no exception was thrown, the properties defined in the XML document are now available in the configuration object. The following fragment shows how the properties can be accessed:

String backColor = config.getString("colors.background");
String textColor = config.getString("colors.text");
String linkNormal = config.getString("colors.link[@normal]");
String defColor = config.getString("colors.default");
int rowsPerPage = config.getInt("rowsPerPage");
List buttons = config.getList("buttons.name");

This listing demonstrates some important points about constructing keys for accessing properties loaded from XML documents and about features of XMLConfiguration in general:

  • Nested elements are accessed using a dot notation. In the example document there is an element <text> in the body of the <colors> element. The corresponding key is colors.text.
  • The root element is ignored when constructing keys. In the example you do not write gui-definition.colors.text, but only colors.text.
  • Attributes of XML elements are accessed in an XPath-like notation, e.g. colors.link[@normal].
  • Interpolation can be used as in PropertiesConfiguration. Here the <default> element in the colors section refers to another color.
  • Lists of properties can be defined in a short form using the delimiter character (the comma by default). In this example the buttons.name property has the three values OK, Cancel, and Help, so it is queried using the getList() method. This works in attributes, too. Using the static setDelimiter() method of AbstractConfiguration you can globally define a different delimiter character or - by setting the delimiter to 0 - disable this mechanism completely. Placing a backslash before a delimiter character escapes it, as demonstrated in the pattern attribute of the numberFormat element.
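
The delimiter and escaping rules from the last point can be illustrated with a small stand-alone sketch. This is not the library's own splitting code; the class DelimiterSketch and its split() method are hypothetical names, and details such as whitespace handling are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: splits a raw value on a delimiter, honoring backslash escapes. */
final class DelimiterSketch {

    static List<String> split(String raw, char delimiter) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : raw.toCharArray()) {
            if (escaped) {
                // A backslash only escapes the delimiter; before any other
                // character it is kept literally.
                if (c != delimiter) {
                    current.append('\\');
                }
                current.append(c);
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if (c == delimiter) {
                values.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            current.append('\\'); // a trailing backslash is kept as-is
        }
        values.add(current.toString());
        return values;
    }

    public static void main(String[] args) {
        System.out.println(split("OK,Cancel,Help", ','));  // three values
        System.out.println(split("###\\,###.##", ','));    // one value containing a comma
    }
}
```

With the values from gui.xml, the first call yields the three button names, while the escaped comma in the number format pattern stays part of a single value.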

The next section shows how data in a more complex XML document can be processed.

Structured XML

Consider the following scenario: An application operates on database tables and wants to load a definition of the database schema from its configuration. An XML document provides this information. It could look as follows:

<?xml version="1.0" encoding="ISO-8859-1" ?>

<database>
  <tables>
    <table tableType="system">
      <name>users</name>
      <fields>
        <field>
          <name>uid</name>
          <type>long</type>
        </field>
        <field>
          <name>uname</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>firstName</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>lastName</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>email</name>
          <type>java.lang.String</type>
        </field>
      </fields>
    </table>
    <table tableType="application">
      <name>documents</name>
      <fields>
        <field>
          <name>docid</name>
          <type>long</type>
        </field>
        <field>
          <name>name</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>creationDate</name>
          <type>java.util.Date</type>
        </field>
        <field>
          <name>authorID</name>
          <type>long</type>
        </field>
        <field>
          <name>version</name>
          <type>int</type>
        </field>
      </fields>
    </table>
  </tables>
</database>

This XML is quite self-explanatory: there is an arbitrary number of table elements, each of which has a name and a list of fields. A field in turn consists of a name and a data type. This XML document (let's call it tables.xml) can be loaded in exactly the same way as the simple document in the previous section.

When we now want to access some of the properties we face a problem: the syntax for constructing configuration keys we learned so far is not powerful enough to access all of the data stored in the tables document.

Because the document contains a list of tables some properties are defined more than once. E.g. the configuration key tables.table.name refers to a name element inside a table element inside a tables element. This constellation happens to occur twice in the tables document.

Multiple definitions of a property do not cause problems and are supported by all classes of Configuration. If such a property is queried using getProperty(), the method recognizes that there are multiple values for that property and returns a collection with all these values. So we could write

Object prop = config.getProperty("tables.table.name");
if(prop instanceof Collection)
{
    System.out.println("Number of tables: " + ((Collection) prop).size());
}

An alternative to this code would be the getList() method of Configuration. If a property is known to have multiple values (as is the table name property in this example), getList() retrieves all values at once. Note: it is legal to call getString() or one of the other getter methods on a property with multiple values; it returns the first element of the list.

Accessing structured properties

Okay, we can obtain a list with the names of all defined tables. In the same way we can retrieve a list with the names of all table fields: just pass the key tables.table.fields.field.name to the getList() method. In our example this list would contain 10 elements, the names of all fields of all tables. This is fine, but how do we know which field belongs to which table?

When working with such hierarchical structures, the configuration keys used to query properties can have an extended syntax. Each component of a key can be followed by a numerical value in parentheses that determines the index of the affected property. So if we have two table elements, we can specify exactly which one we want to address by appending the corresponding index. This is best explained by some examples:

We will now provide some configuration keys and show the results of a getProperty() call with these keys as arguments.

tables.table(0).name
Returns the name of the first table (all indices are 0 based), in this example the string users.
tables.table(0)[@tableType]
Returns the value of the tableType attribute of the first table (system).
tables.table(1).name
Analogous to the first example, this returns the name of the second table (documents).
tables.table(2).name
Here the name of a third table is queried, but because there are only two tables the result is null. The fact that a null value is returned for invalid indices can be used to find out how many values are defined for a certain property: just increment the index in a loop as long as valid objects are returned.
tables.table(1).fields.field.name
Returns a collection with the names of all fields that belong to the second table. With keys of this kind it is now possible to find out which fields belong to which table.
tables.table(1).fields.field(2).name
The additional index after field selects a certain field. This expression represents the name of the third field in the second table (creationDate).
tables.table.fields.field(0).type
This key may be a bit unusual but nevertheless completely valid. It selects the data types of the first fields in all tables. So here a collection would be returned with the values [long, long].

These examples should make the usage of indices quite clear. Because each configuration key can contain an arbitrary number of indices it is possible to navigate through complex structures of XML documents; each XML element can be uniquely identified.
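
To make the index semantics concrete, the following sketch resolves such keys against a DOM tree using only the JDK's XML parser. It is an illustration of the key syntax under simplifying assumptions, not how XMLConfiguration is implemented: the class IndexedKeySketch, its methods, and the shortened SAMPLE document (two fields per table) are hypothetical, attribute keys are not handled, and an empty list is returned where getProperty() would return null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustration only: resolves dotted keys with optional (n) indices against a DOM tree. */
final class IndexedKeySketch {

    // Shortened variant of tables.xml with two fields per table.
    static final String SAMPLE =
          "<database><tables>"
        + "<table><name>users</name><fields>"
        + "<field><name>uid</name><type>long</type></field>"
        + "<field><name>uname</name><type>java.lang.String</type></field>"
        + "</fields></table>"
        + "<table><name>documents</name><fields>"
        + "<field><name>docid</name><type>long</type></field>"
        + "<field><name>name</name><type>java.lang.String</type></field>"
        + "</fields></table>"
        + "</tables></database>";

    /** Returns the text of all elements matched by the key (empty list if none match). */
    static List<String> resolve(Element root, String key) {
        List<Element> current = new ArrayList<>();
        current.add(root); // the root element itself is not part of the key
        for (String part : key.split("\\.")) {
            int paren = part.indexOf('(');
            String name = paren < 0 ? part : part.substring(0, paren);
            // -1 stands for "no index given": keep every matching child
            int index = paren < 0 ? -1
                    : Integer.parseInt(part.substring(paren + 1, part.length() - 1));
            List<Element> next = new ArrayList<>();
            for (Element parent : current) {
                List<Element> matches = childrenNamed(parent, name);
                if (index < 0) {
                    next.addAll(matches);
                } else if (index < matches.size()) {
                    next.add(matches.get(index));
                }
            }
            current = next;
        }
        List<String> values = new ArrayList<>();
        for (Element e : current) {
            values.add(e.getTextContent().trim());
        }
        return values;
    }

    private static List<Element> childrenNamed(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse(SAMPLE);
        System.out.println(resolve(root, "tables.table(0).name"));
        System.out.println(resolve(root, "tables.table(1).fields.field.name"));
        System.out.println(resolve(root, "tables.table.fields.field(0).type"));
        System.out.println(resolve(root, "tables.table(2).name")); // no third table
    }
}
```

Note that an index is applied separately below each parent element, which is why the key tables.table.fields.field(0).type selects the first field of every table.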

Adding new properties

So far we have learned how to use indices to avoid ambiguities when querying properties. The same problem occurs when adding new properties to a structured configuration. As an example, let's assume we want to add a new field to the second table. New properties can be added to a configuration using the addProperty() method. Of course, we have to specify exactly where in the tree-like structure the new data is to be inserted. A statement like

// Warning: This might cause trouble!
config.addProperty("tables.table.fields.field.name", "size");

would not be sufficient because it does not contain all needed information. How is such a statement processed by the addProperty() method?

addProperty() splits the provided key into its single parts and navigates through the properties tree along the corresponding element names. In this example it will start at the root element and then find the tables element. The next key part to be processed is table, but here a problem occurs: the configuration contains two table properties below the tables element. To get rid of this ambiguity, an index can be specified at this position in the key to make clear which of the two properties should be followed. For example, tables.table(1).fields.field.name would select the second table property. If an index is missing, addProperty() always follows the last available element. In our example this would be the second table, too.

The following parts of the key are processed in exactly the same manner. Under the selected table property there is exactly one fields property, so this step is not problematic at all. In the next step the field part has to be processed. At the current position in the properties tree there are multiple field (sub) properties, so we have the same situation as for the table part. Because no explicit index is defined, the last field property is selected. The last part of the key passed to addProperty() (name in this example) will always be added as a new property at the position that has been reached in the preceding processing steps. So in our example the last field property of the second table would be given a new name sub property, and the resulting structure would look like the following listing:

    ...
    <table tableType="application">
      <name>documents</name>
      <fields>
        <field>
          <name>docid</name>
          <type>long</type>
        </field>
        <field>
          <name>name</name>
          <type>java.lang.String</type>
        </field>
        <field>
          <name>creationDate</name>
          <type>java.util.Date</type>
        </field>
        <field>
          <name>authorID</name>
          <type>long</type>
        </field>
        <field>
          <name>version</name>
          <name>size</name>    <== Newly added property
          <type>int</type>
        </field>
      </fields>
    </table>
  </tables>
</database>

This result is obviously not what was desired, but it demonstrates how addProperty() works: the method follows an existing branch in the properties tree and adds new leaves to it. (If the passed in key does not match a branch in the existing tree, a new branch will be added. E.g. if we pass the key tables.table.data.first.test, the existing tree can be navigated until the data part of the key. From here a new branch is started with the remaining parts data, first and test.)

If we want a different behavior, we must explicitly tell addProperty() what to do. In our example with the new field, our intention was to create a new branch for the field part in the key, so that a new field property is added to the structure rather than adding sub properties to the last existing field property. This can be achieved by specifying the special index (-1) at the corresponding position in the key as shown below:

config.addProperty("tables.table(1).fields.field(-1).name", "size");
config.addProperty("tables.table(1).fields.field.type", "int");

The first line in this fragment specifies that a new branch is to be created for the field property (index -1). In the second line no index is specified for the field, so the last one is used - which happens to be the field that has just been created. So these two statements add a fully defined field to the second table. This is the default pattern for adding new properties or whole hierarchies of properties: first create a new branch in the properties tree and then populate its sub properties. As an additional example, let's add a completely new table definition to our example configuration:

// Add a new table element and define the name
config.addProperty("tables.table(-1).name", "versions");

// Add a new field to the new table
// (an index for the table is not necessary because the latest is used)
config.addProperty("tables.table.fields.field(-1).name", "id");
config.addProperty("tables.table.fields.field.type", "int");

// Add another field to the new table
config.addProperty("tables.table.fields.field(-1).name", "date");
config.addProperty("tables.table.fields.field.type", "java.sql.Date");
...

For more information about adding properties to a hierarchical configuration also have a look at the javadocs for HierarchicalConfiguration.
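
The navigation rules described in this section (follow the last element when no index is given, follow a specific element when an index is given, start a new branch for index -1 or when no matching element exists, and always add the last key part as a new property) can be modeled with a minimal tree. The sketch below is a simplified stand-in and not the code of HierarchicalConfiguration; the class AddPropertySketch and its Node type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a tiny tree that mimics the described addProperty() navigation. */
final class AddPropertySketch {

    /** Minimal tree node: a name, an optional value, and ordered children. */
    static final class Node {
        final String name;
        String value;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        List<Node> childrenNamed(String childName) {
            List<Node> result = new ArrayList<>();
            for (Node c : children) {
                if (c.name.equals(childName)) {
                    result.add(c);
                }
            }
            return result;
        }

        Node addChild(String childName) {
            Node child = new Node(childName);
            children.add(child);
            return child;
        }
    }

    static void addProperty(Node root, String key, String value) {
        String[] parts = key.split("\\.");
        Node current = root;
        // Navigate along all key parts except the last one.
        for (int i = 0; i < parts.length - 1; i++) {
            String part = parts[i];
            int paren = part.indexOf('(');
            String name = paren < 0 ? part : part.substring(0, paren);
            Integer index = paren < 0 ? null
                    : Integer.valueOf(part.substring(paren + 1, part.length() - 1));
            List<Node> matches = current.childrenNamed(name);
            if (matches.isEmpty() || (index != null && index == -1)) {
                current = current.addChild(name);           // start a new branch
            } else if (index == null) {
                current = matches.get(matches.size() - 1);  // follow the last element
            } else {
                current = matches.get(index);               // follow the indexed element
            }
        }
        // The last key part is always added as a new property.
        current.addChild(parts[parts.length - 1]).value = value;
    }

    public static void main(String[] args) {
        Node root = new Node("database");
        addProperty(root, "tables.table(-1).name", "versions");
        addProperty(root, "tables.table.fields.field(-1).name", "id");
        addProperty(root, "tables.table.fields.field.type", "int");
        addProperty(root, "tables.table.fields.field(-1).name", "date");
        addProperty(root, "tables.table.fields.field.type", "java.sql.Date");
        Node table = root.childrenNamed("tables").get(0).childrenNamed("table").get(0);
        Node fields = table.childrenNamed("fields").get(0);
        System.out.println("fields in new table: " + fields.childrenNamed("field").size());
    }
}
```

Calling addProperty() with a key that omits the (-1) index on field reproduces the unwanted result shown earlier: the last existing field receives a second name child instead of a new field being created.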

Escaping dot characters in XML tags

In XML the dot character used as delimiter by most configuration classes is a legal character that can occur in any tag. So the following XML document is completely valid:

<?xml version="1.0" encoding="ISO-8859-1" ?>

<configuration>
  <test.value>42</test.value>
  <test.complex>
    <test.sub.element>many dots</test.sub.element>
  </test.complex>
</configuration>

This XML document can be loaded by XMLConfiguration without trouble, but when we want to access certain properties we face a problem: The configuration claims that it does not store any values for the properties with the keys test.value or test.complex.test.sub.element!

Of course, it is the dot character contained in the property names that causes this problem. A dot is always interpreted as a delimiter between elements. So given the property key test.value, the configuration would look for an element named test and then for a sub element with the name value. To change this behavior it is possible to escape a dot character, thus telling the configuration that it is really part of an element name. This is simply done by duplicating the dot. So the following statements will return the desired property values:

int testVal = config.getInt("test..value");
String complex = config.getString("test..complex.test..sub..element");

Note the duplicated dots wherever the dot does not act as delimiter. This way it is possible to access properties containing dots in arbitrary combination. However, as you can see, the escaping can be confusing sometimes. So if you have a choice, you should avoid dots in the tag names of your XML configuration files.
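
The escaping rule can be summarized in a few lines of code. The following sketch splits a key into element names, treating a single dot as a delimiter and a doubled dot as a literal dot. It only illustrates the rule; the class DotEscapeSketch and its splitKey() method are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: splits a configuration key, treating ".." as an escaped dot. */
final class DotEscapeSketch {

    static List<String> splitKey(String key) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c != '.') {
                current.append(c);
            } else if (i + 1 < key.length() && key.charAt(i + 1) == '.') {
                current.append('.'); // doubled dot: literal dot in the element name
                i++;                 // skip the second dot
            } else {
                parts.add(current.toString()); // single dot: delimiter
                current.setLength(0);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitKey("colors.text"));
        System.out.println(splitKey("test..value"));
        System.out.println(splitKey("test..complex.test..sub..element"));
    }
}
```

For the last key this produces the two element names test.complex and test.sub.element, which match the nesting in the example document.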

Validation of XML configuration files

XML parsers provide support for validation of XML documents to ensure that they conform to a certain DTD. This feature can be useful for configuration files, too. XMLConfiguration allows validation to be enabled for the files it loads.

The easiest way to turn on validation is to simply set the validating property to true as shown in the following example:

XMLConfiguration config = new XMLConfiguration();
config.setFileName("myconfig.xml");
config.setValidating(true);

// This will throw a ConfigurationException if the XML document does not
// conform to its DTD.
config.load();

Setting the validating flag to true causes XMLConfiguration to use a validating XML parser. A custom ErrorHandler is registered with this parser, which throws exceptions on simple and fatal parsing errors.
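
What such a parser setup looks like can be shown with the JDK's own XML API. The sketch below creates a validating DocumentBuilder and registers an ErrorHandler that rethrows simple and fatal errors. It approximates the described behavior and is not the code used inside XMLConfiguration; the class ValidatingParserSketch, its isValid() method, and the inline DTD are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Illustration only: DTD validation with an ErrorHandler that throws on errors. */
final class ValidatingParserSketch {

    // Hypothetical inline DTD: a config element containing exactly one rows element.
    static final String DTD =
        "<!DOCTYPE config [<!ELEMENT config (rows)><!ELEMENT rows (#PCDATA)>]>";

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validation errors abort parsing
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // well-formedness errors abort parsing
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // document is invalid or not well-formed
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DTD + "<config><rows>15</rows></config>"));
        System.out.println(isValid(DTD + "<config><cols>15</cols></config>"));
    }
}
```

Without a throwing ErrorHandler, the JDK parser typically only reports validation errors and continues, which is why registering the handler matters here.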

While using the validating flag is a simple means of enabling validation, it cannot fulfill more complex requirements, e.g. schema validation. To be able to deal with such requirements, XMLConfiguration provides a generic way of setting up the XML parser to use: a preconfigured DocumentBuilder object can be passed to the setDocumentBuilder() method.

So an application can create a DocumentBuilder object and initialize it according to its special needs. Then this object must be passed to the XMLConfiguration instance before invocation of the load() method. When loading a configuration file, the passed in DocumentBuilder will be used instead of the default one.
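
As a sketch of such a preconfigured parser, the following code builds a DocumentBuilder that validates against an XML Schema, using only JDK classes. The class SchemaBuilderSketch, its method names, and the small schema are hypothetical. Handing the builder to the configuration via setDocumentBuilder() before load() is indicated in a comment only, so that the example stays runnable without the library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Illustration only: a DocumentBuilder preconfigured for XML Schema validation. */
final class SchemaBuilderSketch {

    // Hypothetical schema: a config element with a single integer rows element.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='config'><xs:complexType><xs:sequence>"
        + "<xs:element name='rows' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static DocumentBuilder schemaValidatingBuilder(String xsd) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for schema validation
            factory.setSchema(schema);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return builder;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot set up schema validation", e);
        }
    }

    /** Returns true if the builder parses the document without validation errors. */
    static boolean accepts(DocumentBuilder builder, String xml) {
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In an application the builder would be passed to the configuration
        // before loading, e.g. config.setDocumentBuilder(builder); config.load();
        System.out.println(accepts(schemaValidatingBuilder(XSD), "<config><rows>15</rows></config>"));
        System.out.println(accepts(schemaValidatingBuilder(XSD), "<config><rows>many</rows></config>"));
    }
}
```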