Websydian v6.1 online documentation

Schema Validation



Introduction

This document describes how you can use a W3C schema to validate an XML document in TransacXML.

In this document, the term W3C schema refers to an XML schema written according to the W3C XML Schema recommendation.

Schema validation was introduced in TransacXML version 6.1.3.

What is a W3C schema

A W3C schema is an XML document that specifies the data model and syntax for other XML documents. These other XML documents can be validated using the schemas that describe the content of the documents.

It is not the aim of this document to describe the W3C schema syntax. The syntax is quite complex, and you do not need to understand it to understand how to validate documents. However, to understand the errors returned from the validation, you will probably need at least a basic understanding of it.

A web-based introduction can be found here: http://www.w3schools.com/schema/.

If you want a thorough explanation, we recommend the book "XML Schema" by Eric van der Vlist (ISBN: 978-0-596-00252-7).

When to validate a document using a W3C schema

You can validate a document at any time, but the following describes the most likely scenarios.

In most cases, you will want to validate an XML document when somebody else has generated it. You will normally validate the document before you start reading its content.

By validating the document before you process it, you can often simplify your processing as you do not have to code the error handling of all the possible errors in the document structure.

Validating a document that you have created yourself can help during development. If you have agreed on a schema for a document with an external party, validate one of the documents you create against this schema before you start sending them. This removes a number of possible errors before the external party receives the documents.

However, this is not recommended in production. Once you have the functions in place to create a document that is valid according to the schema, validating the documents you generate yourself is in most cases unnecessary overhead.

Performance consideration

Note that validating a document with a schema has a performance overhead. The overhead depends on the size of the schema (it is parsed before it is used to validate the document) and on the complexity and size of the document being validated.

Validating documents in TransacXML

The validation is performed using the function DomServices.SchemaValidation.

The function must be called after the XML document you want to validate has been loaded. Specify the references to the document you want to validate in the input parameters:

DocumentToValidate<ObjectStoreReference>, DocumentToValidate<ObjectDocument>.

Specify the path and filename of the file containing the schema you want to use for the validation in the input parameter: Schema<SchemaFile>.

When using the WinC variant, you will have to specify the target namespace of the schema file in the input parameter WinCSpecificInput<namespaceURI> (see more below). When using the Java variant, any value specified for this parameter is disregarded - just specify <namespaceURI.NULL> when calling the function.

The function returns *Successful in Environment<*Returned status> if the validation is successful.

If the validation results in an error, a non-successful value will be returned in Environment<*Returned status>. You can use the function DomServices.ErrorPop to retrieve more information about the error (see more below).

Retrieving validation errors

For both Java and WinC, the most important thing is to check Environment<*Returned status> after calling the DomServices.SchemaValidation function. This indicates whether the validation was successful.

For the purpose of controlling the further functionality of the program calling the validation, checking the status will often be sufficient. However, in most cases you will also want to be able to report what the problem was - either to a user or as a message in a log.

The validation process writes all the validation errors the parser reports to an error stack. You can retrieve the entries in this stack using the function DomServices.ErrorPop.

The number of errors reported differs between the two variants: for Java, several error messages can be reported, while the WinC variant normally reports only the first error found.

However, if you implement the following error checking, you will be certain to retrieve all errors reported by the validation:

 

Call WSYDOM/DomServices.SchemaValidation

Map with

Variable DocumentToValidate - References to the XML document

Variable Schema - The path and filename of the file containing the schema

Variable WinCSpecificInput - WinC: The target namespace of the schema - Java: NULL

 

If Environment<*Returned status> == <*Returned status.*Successful>

    * The document is valid - proceed with reading the document

Else

    Call WSYDOM/DomServices.ErrorPop

    While DomServices.ErrorPop/Output<ExceptionCode2> IS <State: WSYDOM/ExceptionCode2.NO_EXCEPTION>

        * Handle each error here (e.g. create a response document containing all the errors, or write to a log)

        * DomServices.ErrorPop/Output<ErrorDescription> contains the text of the error reported by the parser

        Call WSYDOM/DomServices.ErrorPop

In this way, you will retrieve all errors and you will be able to use the error messages to find information about the error.

Note that calling the ErrorPop function removes the errors from the error stack.
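The control flow of the loop above can be modelled outside Plex to make it concrete. The following Python sketch is only an illustration: ErrorStack, pop and drain_errors are hypothetical names and not part of TransacXML, and an empty stack is represented here by None rather than by an ExceptionCode2 state.

```python
from typing import List, Optional


class ErrorStack:
    """Hypothetical stand-in for the parser's error stack (not a TransacXML API)."""

    def __init__(self, messages: List[str]) -> None:
        self._messages = list(messages)

    def pop(self) -> Optional[str]:
        """Remove and return the most recent error, or None when the stack is empty."""
        return self._messages.pop() if self._messages else None


def drain_errors(stack: ErrorStack) -> List[str]:
    """Pop once before the loop, then keep popping until nothing is left."""
    collected = []
    message = stack.pop()
    while message is not None:
        collected.append(message)  # e.g. write to a log or a response document
        message = stack.pop()
    return collected


# Made-up error texts, used only for illustration.
stack = ErrorStack(["element 'a' is not allowed", "attribute 'id' is missing"])
errors = drain_errors(stack)
```

Because each pop removes the entry, a second pass over the same stack returns nothing. Store the messages if you need them more than once.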

WinC: Specifying target namespace

The MSXML parser used in the WinC variant of TransacXML demands that you specify the target namespace of the schema when calling the validation function.

The target namespace of a schema is declared by the targetNamespace attribute in the top element of the schema (the schema element itself). If, for instance, the top element of your schema has the attribute targetNamespace="http://FirstNamespace.com", the target namespace is http://FirstNamespace.com.

If the schema tag does not contain a targetNamespace, you must specify NULL for the parameter.

If you specify a targetNamespace that is not the one specified in the schema, the validation function will return an error.
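If you are unsure which value to pass, you can read the target namespace from the schema file before calling the validation. The following Python sketch uses only the standard library; the function name target_namespace and the two sample schemas are assumptions made for this illustration and are not part of TransacXML.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def target_namespace(schema_text: str) -> Optional[str]:
    """Return the targetNamespace of a W3C schema, or None if it declares none."""
    root = ET.fromstring(schema_text)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("top element is not a W3C schema element")
    return root.get("targetNamespace")


# Hypothetical schemas used only for illustration.
WITH_NS = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://FirstNamespace.com"/>'
)
WITHOUT_NS = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'

print(target_namespace(WITH_NS))
print(target_namespace(WITHOUT_NS))
```

A result of None corresponds to the case where NULL must be specified for the parameter.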

If you specify a value in the Java variant, it will always be disregarded. This means that if you want to make a function that can be moved effortlessly from WinC to Java at a later date, just specify the namespace in the parameter. This will work in both variants.

ValidateOnParse (WinC only)

The WinC version offers another way to perform schema validation for an XML document.

This can only be done when the document to validate contains a reference to the schema to validate it with. This reference is in the form of a schemaLocation attribute in the schema instance namespace (http://www.w3.org/2001/XMLSchema-instance, normally prefixed as xsi).

Where the validation described above is performed after you have loaded the document, this validation is performed as the document is loaded.

Note that if you use this way to validate the document, you are accepting that the party that produces the document also determines the schema that will be used for the validation.

This means that you risk having the situation where the schema used for the validation has been changed without your knowledge.

 

Another issue to be aware of is that the reference in the schemaLocation attribute will normally point to a resource on an external web server. This means that when the document is loaded, the schema is retrieved from the external server using an HTTP request. This can be a performance problem and, in the worst case, a security issue.
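To see which schemas a document asks the parser to fetch, you can inspect its schemaLocation attribute before loading it with ValidateOnParse. The following Python sketch uses only the standard library and looks at the top element only. The function names and the sample document are assumptions made for this illustration and are not part of TransacXML.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlparse

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(document_text: str) -> List[Tuple[str, str]]:
    """Return (namespace, location) pairs from the root's xsi:schemaLocation."""
    root = ET.fromstring(document_text)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # The attribute value is a whitespace-separated list of namespace/location pairs.
    return list(zip(tokens[0::2], tokens[1::2]))


def remote_locations(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the locations that would be fetched over HTTP or HTTPS."""
    return [loc for _, loc in pairs if urlparse(loc).scheme in ("http", "https")]


# Hypothetical document used only for illustration.
DOCUMENT = (
    '<order xmlns="http://FirstNamespace.com" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://FirstNamespace.com http://example.com/order.xsd"/>'
)

pairs = schema_locations(DOCUMENT)
remote = remote_locations(pairs)
```

A non-empty list from remote_locations means that loading the document with ValidateOnParse would depend on a schema served from another machine.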

 

These issues mean that our general recommendation is to use the SchemaValidation function described above for the schema validation.

Performing the validation

An option has to be specified for the XML document before it is loaded. You can do this either by calling the SetOption function yourself or, if you want to validate the input document in an ImportXmlDocument function, by setting a function option.

Calling the SetOption function

To activate this validation, you can call the function XmlParsers.MSXML.SetOption for the XML document before loading it.

Map the ObjectStoreReference with the ObjectStoreReference for the document you want to validate when loading.

Map the ObjectReference with the ObjectDocument for the document you want to validate when loading.

Map the ParserOption with the value ParserOption.ValidateOnParse.

Map the OptionValue with True.

ImportXmlDocument - using the ValidateOnParse option

For functions inheriting from the abstract ImportXmlDocument, you can do this by setting the function option ValidateOnParse to Yes - this automatically sets this option for the input document.

Loading the document

At this point the option is set and the normal call to DomServices.LoadDocumentFromFile can be performed.

If the document is valid, Environment<*Returned status> will be set to *Successful. Otherwise, either the load or the validation has failed.

As when calling the SchemaValidation function, you can retrieve the errors by calling the ErrorPop function.