Web services
XML documents contain the information being exchanged between two parties. XML is used to organize documents and business data, and XML files can be stored or transmitted between two applications on a network. SOAP provides a packaging and routing standard for exchanging XML documents over a network. A SOAP message is just an XML document, but SOAP is specially designed to contain and transmit other XML documents as well as information related to routing, processing, security, transactions, and other qualities of service. WSDL allows an organization to describe the types of XML documents and SOAP messages that must be used to interact with its Web services. UDDI allows organizations to register their Web services in a uniform manner within a common directory, so clients can locate those Web services and learn how to access them.
A SOAP Message That Contains Address Information
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:addr="http://www.Monson-Haefel.com/jwsbook/ADDR" >
  <soap:Body>
    <addr:address>
      <addr:name>Amazon.com</addr:name>
      <addr:street>1516 2nd Ave</addr:street>
      <addr:city>Seattle</addr:city>
      <addr:state>WA</addr:state>
      <addr:zip>90952</addr:zip>
    </addr:address>
  </soap:Body>
</soap:Envelope>
SOAP takes advantage of advanced XML features like XML namespaces (similar to Java package names) and XML schemas (used to type data). SOAP messages serve as a network envelope for exchanging XML documents and data.
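As a rough illustration of how the namespaces in the message above matter to a program, the sketch below reads one address field from a trimmed copy of that SOAP message using a namespace-aware DOM parser obtained through JAXP. The class name SoapAddressReader and its helper method are hypothetical names chosen for this example, not part of any API mentioned in these notes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class SoapAddressReader {
    // Namespace URI taken from the listing above.
    static final String ADDR_NS = "http://www.Monson-Haefel.com/jwsbook/ADDR";

    // A shortened copy of the SOAP message shown above.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\""
      + " xmlns:addr=\"" + ADDR_NS + "\">"
      + "<soap:Body><addr:address>"
      + "<addr:name>Amazon.com</addr:name>"
      + "<addr:city>Seattle</addr:city>"
      + "</addr:address></soap:Body></soap:Envelope>";

    /** Returns the text of the first element with this local name in the address namespace, or null. */
    static String readAddressField(String soapXml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this flag the parser ignores namespace URIs.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(soapXml)));
            // Match on namespace URI plus local name, not on the "addr:" prefix.
            NodeList matches = doc.getElementsByTagNameNS(ADDR_NS, localName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SOAP message", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAddressField(SAMPLE, "city")); // prints "Seattle"
    }
}
```

Looking elements up by namespace URI rather than by prefix means the code keeps working if a sender picks a different prefix for the same namespace.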
There are actually two versions of SOAP today, versions 1.1 and 1.2.
SOAP Messages with Attachments (SwA): SwA defines a message format for attaching binary data (images, sound files, documents, and so on) to SOAP messages.
WSDL (Web Services Description Language) is a standard for describing the structure of the XML data exchanged between two systems using SOAP. When you create a new Web service, you can also create a WSDL document that describes the type of data you're exchanging. There are two versions of WSDL today, versions 1.1 and 1.2. Although WSDL provides an excellent format for describing the types of SOAP messages used by a Web service, it provides no guidance on where to store the WSDL documents or how to find them. In other words, WSDL doesn't describe where the WSDL documents should be kept so that others can find them easily and use them to communicate with your Web services.
UDDI (Universal Description, Discovery, and Integration) defines a standard set of Web service operations (methods) that are used to store and look up information about other Web service applications. In other words, UDDI defines a standard SOAP-based interface for a Web services registry. You can use a UDDI registry to find a particular type of Web service, or to find out about the Web services hosted by a specific organization. A UDDI registry is often referred to as a "Yellow Pages" for Web services. When you look up information about a Web service in a UDDI registry, you can narrow your search using various categories (technologies used, business types, industry, and so on). Each entry in a UDDI registry provides information on where the Web service is located and how to communicate with it. The UDDI registry also provides information about the organization that hosts a particular Web service. UDDI can also store data about other types of services, such as a Web site or a phone service. There are three versions of UDDI at this time, versions 1.0, 2.0, and 3.0.
J2EE Web Service APIs: JAX-RPC, SAAJ, JAXR, and JAXP. These are the APIs you will need to understand if you want to implement Web service applications using the J2EE platform.
JAX-RPC (Java API for XML-based Remote Procedure Calls) is perhaps the most important Web service API. It is used to implement J2EE Web service clients and endpoints (services). JAX-RPC is divided into two parts: 1) a set of client-side APIs and 2) a set of server-side components, called endpoints.
The client-side APIs allow you to communicate with Web service endpoints hosted on some other platform. For example, you can use one of the client-side APIs to send SOAP messages to a VB.NET or an Apache Axis Web service. The client-side APIs can be used from standalone Java applications or from J2EE components like servlets, JSPs, or EJBs. There are three client-side APIs: 1) generated stub, 2) dynamic proxy, and 3) DII (Dynamic Invocation Interface). The generated stub is the one you will use the most, and its semantics closely resemble those of Java RMI. The dynamic proxy API also follows many of the Java RMI semantics, but is used less often. The DII is a very low-level API used primarily by vendor tools, but can also be employed by Web services developers if necessary.
The server-side components include the JAX-RPC service endpoint (JSE) and the EJB endpoint. The JSE component is actually a type of servlet that has been adapted for use as a Web services component. It's very easy to implement, yet it has access to the full array of services and interfaces common to servlets.
The EJB endpoint is simply a type of stateless session EJB that has been adapted for use as a Web service endpoint. The EJB endpoint provides all the transactional and security features of a normal stateless session bean, but it's specifically designed to process SOAP requests. There are currently two versions of JAX-RPC, 1.0 and 1.1. Version 1.0 is used with J2EE 1.3, and version 1.1 is used with J2EE 1.4.
SAAJ (SOAP with Attachments API for Java) is a low-level SOAP API that complies with SOAP 1.1 and the SOAP Messages with Attachments specification. SAAJ allows you to build SOAP messages from scratch as well as read and manipulate SOAP messages. You can use it alone to create, transmit, and process SOAP messages, but you're more likely to use it in conjunction with JAX-RPC. In JAX-RPC, SAAJ is used primarily to process SOAP header blocks (the SOAP message metadata).
JAXR (Java API for XML Registries) provides an API for accessing UDDI registries. It simplifies the process of publishing and searching for Web service endpoints. JAXR was originally intended for ebXML registries, a standard that competes with UDDI, but was adapted for UDDI and works pretty well in most cases. JAXR has a set of business-domain types like Organization, Postal Address, and Contact as well as technical-domain types like Service Binding, External Link, and Classification. These domain models map nicely to UDDI data types. JAXR also defines APIs for publishing and searching for information in a UDDI registry. There is only one version of JAXR, version 1.0.
JAXP (Java API for XML Processing) provides a framework for using DOM 2 and SAX2, standard Java APIs that read, write, and modify XML documents. DOM 2 (Document Object Model, Level 2) is a Java API that models XML documents as trees of objects. It contains objects that represent elements, attributes, values, and so on.
DOM 2 is used a lot in situations where speed and memory are not factors, but complex manipulation of XML documents is required. DOM 2 is also the basis of SAAJ 1.1.
SAX2 (Simple API for XML, version 2) is very different in functionality from DOM 2. When a SAX parser reads an XML document, it fires events as it encounters start and end tags, attributes, values, etc. You can register listeners for these events, and they will be notified as the SAX2 parser encounters each construct in the XML document it is reading.
JAXP comes in several versions including 1.1, 1.2, and 1.3. Version 1.3 is very new and is not supported by J2EE 1.4 Web Services.
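The event model described above can be sketched with the SAX classes that JAXP exposes. This is a minimal example under stated assumptions: SaxEventLogger is a hypothetical class name, and the "listener" is an anonymous subclass of org.xml.sax.helpers.DefaultHandler that simply records each event as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxEventLogger {
    /** Parses the XML and returns the SAX events in the order they were fired. */
    static List<String> logEvents(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one run of text in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : logEvents("<address><city>Seattle</city></address>")) {
            System.out.println(event);
        }
    }
}
```

Because the handler sees the document one event at a time, nothing forces the whole document to be held in memory, which is the main contrast with DOM 2.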
XML BASICS
XML Primer
An XML markup language defines a set of tags that are used to organize and describe text. Tags are usually paired; together, a start tag, an end tag, and everything between them are called an element. For example, you could save the addresses of your friends, family members, and business associates in a text file using XML.
XML Address Document
<?xml version="1.0" encoding="UTF-8" ?>
<addresses>
  <address category="friend">
    <name>Bill Frankenfiller</name>
    <street>3243 West 1st Ave.</street>
    <city>Madison</city>
    <state>WI</state>
    <zip>53591</zip>
  </address>
  <address category="business">
    <name>Amazon.com</name>
    <street>1516 2nd Ave</street>
    <city>Seattle</city>
    <state>WA</state>
    <zip>90952</zip>
  </address>
</addresses>
XML documents are composed of Unicode text (usually UTF-8), so people as well as software can read them. The ability to create an unlimited number of new markup languages is why XML is called eXtensible.
XML only defines the syntax of elements used in text; it is not software and isn't compiled, interpreted, or executed. It's just plain text. XSLT (Extensible Stylesheet Language Transformations) is a programming language based on XML.
XML is used for two different purposes: document-oriented and data-oriented applications. Document-oriented markup languages like XHTML and DocBook are focused on the format and presentation of literature. Data-oriented markup languages focus on how data is organized and typed; they define a schema for storing and exchanging data between software applications. Some XML markup languages are industry standards, like SOAP and XHTML, while most are designed to serve a single application, organization, or individual. The XML markup languages used in this book, both custom and standard, are decidedly data-oriented.
Regardless of the source of a markup language, if it's based on XML it must follow the same syntax and rules defined by the XML specification, which makes XML documents portable. Portability means you can use any standard XML parsers, editors, and other utilities to process most, if not all, of the XML documents you will encounter. An XML parser is a utility that can read and analyze an XML document. In most cases an XML parser is combined with a parser API (such as SAX2 or DOM 2) that allows a developer to interact with the XML document while it's being parsed, or after.
An XML document can be saved or transferred over a network. A Web page written in XHTML (a variant of HTML), which is a text file, is an XML document. Similarly, a SOAP message, which is generated and exchanged over a network, is an XML document. A business might choose to store address information as an XML document. In this case the text file might look like Listing 2-2.
Listing 2-2 An XML Address Document Instance
<?xml version="1.0" encoding="UTF-8" ?>
<address>
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
The above example is called an XML document instance, which means it represents one possible set of data for a particular markup language. It might be saved as a file or sent over the Internet as the payload of a SOAP message. If you were to create another XML document with the same tags but different contents (like a different street or Zip code) it would be considered a different XML document instance.
Anatomy of an XML Document
An XML document is made up of declarations, elements, attributes, text data, comments, and other components. This section examines an XML document instance in detail and explains its most important components.
XML Declaration
An XML document may start with an XML declaration, but it's not required. An XML declaration declares the version of XML used to define the document (there is only one version at this time, version 1.0). It may also indicate the character encoding used to store or transfer the document, and whether the document is standalone or not (the standalone attribute is not used in this book).
<?xml version="1.0" encoding="UTF-8" ?>
<address>
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
Elements
XML markup languages organize data hierarchically, in a tree structure, where each branch of the tree is called an element and is delimited by a pair of tags. All elements are named and have a start tag and an end tag. A start tag looks like <tagname> and an end tag looks like </tagname>. The tagname is a label that usually describes the information contained by the element. Between the start and end tags, an element may contain text or other elements, which themselves may contain text or more elements. There are six elements in this example (address, name, street, city, state, and zip). The address element uses the start tag <address> and the end tag </address>, and contains the other five elements. The address element, because it contains all the other elements, is referred to as the root element. Each XML document must have one root element, and that element must contain all the other elements and text, except the XML declaration, comments, and certain processing instructions.
The other elements (name, street, city, state, zip) all contain text.
According to the WS-I Basic Profile 1.0, XML documents used in Web services must use either UTF-8 or UTF-16 encoding. This limitation simplifies things for Web service vendors and makes interoperability easier, because there is only one character encoding standard to worry about, Unicode. UTF-8 and UTF-16 encoding allows you to use characters from English, Chinese, French, German, Japanese, and many other languages.
An element name must always begin with a letter or underscore, but can contain pretty much any Unicode character you like, including underscores, letters, digits, hyphens, and periods. Some characters may not be used: /, <, >, ?, ", @, &, and others. Also, an element name must never start with the string xml, as this is reserved by the XML 1.0 specification. As long as you follow XML's rules you may name elements anything and your elements may contain any combination of valid text and other elements.
Elements do not have to contain any data at all. It's perfectly acceptable to use an empty-element tag, a single tag of the form <tagname/>, which is interpreted as a pair of start and end tags with no content (<tagname></tagname>). Empty-element tags are typically used when an element has no data, when it acts like a flag, or when its pertinent data is contained in its attributes (attributes are described in the next section).
Attributes
An element may have one or more attributes. You use an attribute to supplement the data contained by an element, to provide information about it not captured by its contents. For example, we could describe the kind of address in an XML address document by declaring a category attribute, as in Listing 2-3.
Listing 2-3 Using Attributes in XML
<?xml version="1.0" encoding="UTF-8" ?>
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
Each attribute is a name-value pair. The value must be in single or double quotes. You can define any number of attributes for an element, but a particular attribute may occur only once in a single element. Attributes cannot be nested like elements. Attribute names have the same restrictions as element names. Attributes must be declared in the start tag and never the end tag of an element. In many cases, empty-element tags are used when the attributes contain all the data.
Listing 2-4 Using the Empty-Element Tag in XML
<?xml version="1.0" encoding="UTF-8" ?>
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
  <phone countrycode="01" areacode="715" number="55529482" ext="341" />
</address>
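To show how a program sees attributes and empty-element tags, here is a small sketch that parses a shortened copy of Listing 2-4 with a DOM parser and reads the phone element's attributes. PhoneAttributeReader and firstElement are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class PhoneAttributeReader {
    // A shortened copy of Listing 2-4.
    static final String SAMPLE =
        "<address category=\"business\">"
      + "<name>Amazon.com</name>"
      + "<phone countrycode=\"01\" areacode=\"715\" number=\"55529482\" ext=\"341\"/>"
      + "</address>";

    /** Parses the XML and returns the first element with the given tag name. */
    static Element firstElement(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName(tagName).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        Element phone = firstElement(SAMPLE, "phone");
        // Each attribute is a name-value pair on the start tag.
        System.out.println(phone.getAttribute("areacode"));
        // An empty-element tag has no child nodes at all.
        System.out.println(phone.hasChildNodes());
        System.out.println(firstElement(SAMPLE, "address").getAttribute("category"));
    }
}
```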
Using attributes instead of nested elements is considered a matter of style, rather than convention. There are no "standard" design conventions for using attributes or elements.
Comments
You can add comments to an XML document just as you can add comments to a Java program. A comment is considered documentation about the XML document and is not part of the data it describes. Comments are placed between a <!-- designator and a --> designator, as in HTML:
<!-- comment goes here -->.
Using Comments in XML
<?xml version="1.0" encoding="UTF-8" ?>
<!-- This document contains address information -->
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
CDATA Section
An element may contain other elements, text, or a mixture of both. When an element contains text, you have to be careful about which characters you use because certain characters have special meaning in XML. Using quotes (single or double), less-than and greater-than signs (< and >), the ampersand (&), and other special characters in the contents of an element will confuse parsers, which consider these characters to be special parsing symbols. To avoid parsing problems you can use escape sequences like &gt; for greater-than or &amp; for ampersand, but this technique can become cumbersome. A CDATA section allows you to mark a section of text as literal so that it will not be parsed for tags and symbols, but will instead be considered just a string of characters. For example, if you want to put HTML in an XML document, but you don't want it parsed, you can embed it in a CDATA section. In the following listing, the address document contains a note in HTML format.
Using a CDATA Section in XML
<?xml version="1.0" encoding="UTF-8" ?>
<!-- This document contains address information -->
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
  <note>
    <![CDATA[
      <html>
        <body>
          <p>
            Last time I contacted <b>Amazon.com</b> I spoke to ...
        </body>
      </html>
    ]]>
  </note>
</address>
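What a parser hands back for a CDATA section can be checked with a short sketch like the one below. It assumes a JAXP DOM parser and uses a hypothetical class name, CdataDemo; the note content is a shortened stand-in for the HTML in the listing above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class CdataDemo {
    /** Returns the text content of the document's root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Markup inside CDATA is reported as literal characters, not as elements.
        String withCdata = "<note><![CDATA[<b>Amazon.com</b> & more]]></note>";
        // The same text written with escape sequences instead of CDATA.
        String escaped = "<note>&lt;b&gt;Amazon.com&lt;/b&gt; &amp; more</note>";

        System.out.println(rootText(withCdata));
        System.out.println(rootText(withCdata).equals(rootText(escaped))); // prints "true"
    }
}
```

Both forms yield the same string to the application; CDATA only spares the author from escaping every special character by hand.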
CDATA sections take the form <![CDATA[ text goes here ]]>. If we include the HTML in the note element without embedding it in a CDATA section, XML processors will parse it as Address Markup, instead of treating it as ordinary text, causing two kinds of problems. First, HTML's syntax isn't as strict as XML's, so parsing problems are likely. Second, the HTML is not actually part of Address Markup; it's simply a part of the text contained by the note element, and we want it treated as literal text.
Processing XML Documents
Although XML is just plain text, and can be accessed using a common text editor, it's usually read and manipulated by software applications and not by people using text editors. A software application that reads and manipulates XML documents will use an XML parser. In general, parsers read a stream of data (usually a file or network stream) and break it down into functional units that can then be processed by a software application. An XML parser can read an XML document and parse its contents according to the XML syntax. Parsers usually provide a programming API that allows developers to access elements, attributes, text, and other constructs in XML documents. There are basically two standard kinds of XML parser APIs: SAX and DOM.
SAX (Simple API for XML) was the first standard XML parser API and is very popular. Although several individuals created it, David Brownell currently maintains SAX2, the latest version, as an open development project at SourceForge.org. SAX2 parsers are available in many programming languages including Java.
SAX2 is based on an event model. As the SAX2 parser reads an XML document, starting at the beginning, it fires off events every time it encounters a new element, attribute, piece of text, or other component. SAX2 parsers are generally very fast because they read an XML document sequentially and report on the markup as it's encountered.
DOM (Document Object Model) was developed after SAX2 and is maintained by the W3C. DOM level 2 (DOM 2) is the current version, but there is a DOM level 3 in the works. DOM 2 parsers are also available for many programming languages, including Java. DOM 2 presents the programmer with a generic, object-oriented model of an XML document. Elements, attributes, and text values are represented as objects organized into a hierarchical tree structure that reflects the hierarchy of the XML document being processed. DOM 2 allows an application to navigate the tree structure, modify elements and attributes, and generate new XML documents in memory. It's a very powerful and flexible programming model, but it's also slow compared to SAX2, and consumes a lot more memory.
In addition to providing a programming model for reading and manipulating XML documents, the parser's primary responsibility is checking that documents are well formed; that is, that their elements, attributes, and other constructs conform to the syntax prescribed by the XML 1.0 specification. For example, an element without an end tag, or with an attribute name that contains invalid characters, will result in a syntax error. A parser may also, optionally, enforce validity of an XML document. An XML document may be well formed, but invalid because it is not organized according to its schema.
Two popular Java parser libraries are 1) Crimson and 2) Xerces-J. Both include SAX2 and DOM 2, so you can pick the API that better meets your needs. Crimson is a part of the Java 2 platform (JDK 1.4), which means it's available to you automatically.
Xerces, which some people feel is better, is maintained by the Apache Software Foundation. You must download it as a JAR file and place it in your class path (or ext directory) before you can use it. Either parser library is fine for most cases, but Xerces supports W3C XML Schema validation while Crimson doesn't. JAXP (Java API for XML Processing), which is part of the J2EE platform, is not a parser. It's a set of factory classes and wrappers for DOM 2 and SAX2 parsers. Java-based DOM 2 and SAX2 parsers, while conforming to standard DOM 2 or SAX2 programming models, are instantiated and configured differently, which inhibits their portability.
JAXP eliminates this portability problem by providing a consistent programming model for instantiating and configuring DOM 2 and SAX2 parsers. JAXP can be used with Crimson or Xerces-J. JAXP is a standard Java extension library, so using it will help keep your J2EE applications portable. Other non-standard XML APIs are also available to Java developers, including JDOM, dom4j, and XOM. These APIs are tree-based like DOM 2, and although they are non-standard, they tend to provide simpler programming models than DOM 2. JDOM and dom4j are actually built on top of DOM 2 implementations, wrapping DOM 2 with their own object-oriented programming model. JDOM and dom4j can both be used with either Xerces-J or Crimson. If ease of use is important, you may want to use one of these non-standard parser libraries, but if J2EE portability is more important, stick with JAXP, DOM 2, and SAX2.
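The factory-based model described above, together with the parser's well-formedness check, can be sketched as follows. This is a minimal example under stated assumptions: WellFormedCheck is a hypothetical class name, and the code relies only on the JAXP factory, so it does not care which parser implementation sits underneath.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {
    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            // JAXP locates and instantiates whatever DOM parser is configured.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Syntax errors such as a missing end tag end up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<address><city>Seattle</city></address>")); // prints "true"
        System.out.println(isWellFormed("<address><city>Seattle</address>"));        // prints "false"
    }
}
```

Note that this checks well-formedness only; validating a document against a schema is a separate, optional step.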
XML Namespaces
An XML namespace provides a qualified name for an XML element or attribute, the same way that a Java package provides a qualified name for a Java class. In most Java programs, classes are imported from other packages (java.io, javax.xml, and the rest). When the Java program is compiled, every operation performed on every object or class is validated against the class definition in the appropriate package. If Java didn't have package names, the classes in the Java core libraries (I/O, AWT, JDBC, etc.) would all be lumped together with developer-defined classes. Java package names allow us to separate Java classes into distinct namespaces, which improves organization and access control, and helps us avoid name conflicts (collisions). XML namespaces are similar to Java packages, and serve the same purposes; an XML namespace provides a kind of package name for individual elements and attributes.
An Example of Using Namespaces
Creating XML documents based on multiple markup languages is often desirable. For example, suppose we are building a billing and inventory control system for a company called Monson-Haefel Books. We can define a standard markup language for address information, the Address Markup Language, to be used whenever an XML document needs to contain address information. An instance of Address Markup is shown in Listing 2-7.
Listing 2-7 An Instance of the Address Markup Language
<?xml version="1.0" encoding="UTF-8" ?>
<address category="business" >
  <name>Amazon.com</name>
  <street>1516 2nd Ave</street>
  <city>Seattle</city>
  <state>WA</state>
  <zip>90952</zip>
</address>
Address Markup has its own schema, defined using either DTD (Document Type Definition) or the W3C XML Schema Language, which dictates how its elements are organized. Every time we use address information in an XML document, it should be validated against Address Markup's schema. For example, in Listing 2-8 the address information is included in the PurchaseOrder XML document.
Listing 2-8 The PurchaseOrder Document Using the Address Markup Language
<?xml version="1.0" encoding="UTF-8" ?>
<purchaseOrder orderDate="2003-09-22" >
  <accountName>Amazon.com</accountName>
  <accountNumber>923</accountNumber>
  <address>
    <name>AMAZON.COM</name>
    <street>1850 Mercer Drive</street>
    <city>Lexington</city>
    <state>KY</state>
    <zip>40511</zip>
  </address>
  <book>
The types element uses the XML schema language to declare complex data types and elements that are used elsewhere in the WSDL document.
The import element is similar to an import element in an XML schema document; it's used to import WSDL definitions from other WSDL documents.
The message element describes the message's payload using XML schema built-in types, complex types, or elements that are defined in the WSDL document's types element, or defined in an external WSDL document the import element refers to.
The portType and operation elements describe a Web service's interface and define its methods. A portType and its operation elements are analogous to a Java interface and its method declarations. An operation element uses one or more message types to define its input and output payloads.
The binding element assigns a portType and its operation elements to a particular protocol (for instance, SOAP 1.1) and encoding style.
The service element is responsible for assigning an Internet address to a specific binding.
The documentation element explains some aspect of the WSDL document to human readers. Any of the other WSDL elements may contain documentation elements. The documentation element is not critical, so it will not be mentioned again in this chapter.
Address Markup is used in the Address Book Markup (nested in the addresses element) defined at the start of this chapter, but it will also be reused in about half of Monson-Haefel Books' other XML markup languages (types of XML documents): Invoice, Purchase Order, Shipping, Marketing, and others.
Invocation
There are invocation mechanisms on both the server side and the client side.
Server-Side Invocation
On the server side, the invocation mechanism is responsible for:
1. Receiving a SOAP message from a transport (e.g., from an HTTP or JMS endpoint).
2. Invoking handlers that preprocess the message (e.g., to persist the message for reliability purposes, or process SOAP headers).
3. Determining the message's target service; in other words, which WSDL operation the message is intended to invoke.
4. Given the target WSDL operation, determining which Java class/method to invoke. I call this the Java target. Determining the Java target is referred to as dispatching.
5. Handing off the SOAP message to the Serialization subsystem to deserialize it into Java objects that can be passed to the Java target as parameters.
6. Invoking the Java target using the parameters generated by the Serialization subsystem and getting the Java object returned by the target method.
7. Handing off the returned object to the Serialization subsystem to serialize it into an XML element conformant with the return message specified by the target WSDL operation.
8. Wrapping the returned XML element as a SOAP message response conforming to the target WSDL operation.
9. Handing the SOAP response back to the transport for delivery.
At each stage in this process, the invocation subsystem must also handle exceptions. When an exception occurs, the invocation subsystem often must package it as a SOAP fault message to be returned to the client. In practice, the invocation process is more nuanced and complex than this. However, the steps outlined here offer a good starting point for our discussion of Java Web Services architecture. Later chapters go into greater detail, particularly Chapters 6 and 7 where I examine JAX-WS, and Chapter 11 where the SOA-J invocation mechanism is described. As you can see, the invocation process is nontrivial. Part of its complexity results from having to support SOAP.
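The dispatching and invocation steps on the server side (roughly steps 3 through 8) can be sketched with plain Java reflection. This is a toy model under loud assumptions: DispatchSketch and AddressService are hypothetical names, the "messages" are bare strings rather than real SOAP envelopes, serialization is reduced to string concatenation, and the fault format is invented for illustration. It is not how JAX-WS or any particular toolkit implements dispatching.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy dispatcher, for illustration only.
class DispatchSketch {
    /** Hypothetical Java target; a real one would be bound to WSDL operations. */
    static class AddressService {
        public String lookupCity(String zip) {
            return "90952".equals(zip) ? "Seattle" : "unknown";
        }
    }

    // Maps an operation name to the Java method that implements it.
    private final Map<String, Method> targets = new HashMap<>();
    private final Object service;

    DispatchSketch(Object service) {
        this.service = service;
        for (Method m : service.getClass().getDeclaredMethods()) {
            targets.put(m.getName(), m);
        }
    }

    /** Dispatches the operation, invokes the target, and wraps the result. */
    String invoke(String operation, String argument) {
        Method target = targets.get(operation);            // dispatching
        if (target == null) {
            return fault("unknown operation: " + operation);
        }
        try {
            Object result = target.invoke(service, argument); // invoke the Java target
            // Stand-in for serialization and response wrapping.
            return "<" + operation + "Response>" + result + "</" + operation + "Response>";
        } catch (InvocationTargetException e) {
            return fault(String.valueOf(e.getCause()));       // exception thrown by the target
        } catch (IllegalAccessException | IllegalArgumentException e) {
            return fault(e.toString());
        }
    }

    // Invented, simplified fault format; a real SOAP fault has more structure.
    private static String fault(String reason) {
        return "<Fault>" + reason + "</Fault>";
    }

    public static void main(String[] args) {
        DispatchSketch dispatcher = new DispatchSketch(new AddressService());
        System.out.println(dispatcher.invoke("lookupCity", "90952"));
        System.out.println(dispatcher.invoke("noSuchOperation", "x"));
    }
}
```

Even this toy version has to decide what to do when the operation is unknown or the target throws, which hints at why exception handling runs through every stage of the real process.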
We'll look at a simpler alternative, known as REST (Representational State Transfer), in Chapter 3. Even with REST, however, invocation is complicated. It's just not that easy to solve the generalized problem of mapping an XML description of a Web service to a Java target and invoking that target with an XML message.
On the client side, the invocation process is similar if you want to invoke a Web service using a Java interface. This approach may not always be the most appropriate way to invoke a Web service; a lot depends on the problem you are solving. If your client is working with XML, it might be easier to just construct a SOAP message from XML and pass it to the Web service. On the other hand, if your client is working with Java objects, as JWS assumes, the client-side invocation subsystem is responsible for:
Client-Side Invocation
1. Creating an instance of the Web service endpoint implementing a Java interface referred to (JWS terminology) as the service endpoint interface (SEI). The invocation subsystem has one or more factories for creating SEI instances. These instances are either created on the fly, or accessed using JNDI. Typically, SEI instances are implemented using Java proxies and invocation handlers. I cover this fascinating topic in depth in Chapter 6.
2. Handling an invocation of the SEI instance.
3. Taking the parameters passed to the SEI and passing them to the Serialization subsystem to be serialized into XML elements that conform to the XML Schema specified by the target service's WSDL.
4. Based on the target service's WSDL, wrapping the parameter elements in a SOAP message.
5. Invoking handlers that post-process the message (e.g., to persist the message for reliability purposes, or set SOAP headers) based on Quality of Service (QoS) or other requirements.
6. Handing off the message to the transport for delivery to the target Web service.
7. Receiving the SOAP message response from the transport.
8. Handing off the SOAP message to the Serialization subsystem to deserialize it into a Java object that is an instance of the class specified by the SEI's return type.
9. Completing the invocation of the SEI by returning the deserialized SOAP response.
Again, for simplicity of presentation, I have left out a description of the exception handling process. In general, client-side invocation is the inverse of server-side invocation. On the server side, the invocation subsystem front-ends a Java method with a proxy SOAP operation defined by the WSDL. It executes the WSDL operation by invoking a Java method. Conversely, on the client side, the invocation subsystem front-ends the WSDL-defined SOAP operation with a proxy Java interface. It handles a Java method call by executing a WSDL operation. Figure 1–1 illustrates this mirror image behavior.
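Since SEI instances are typically implemented with Java proxies and invocation handlers, the client-side steps can be sketched with java.lang.reflect.Proxy. Everything specific here is an assumption made for illustration: SeiProxySketch and AddressLookup are hypothetical names, the transport is just a function from request text to response text, and the XML written is a simplified stand-in rather than a real SOAP envelope.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Hypothetical toy client-side invoker, for illustration only.
class SeiProxySketch {
    /** Hypothetical service endpoint interface (SEI). */
    interface AddressLookup {
        String lookupCity(String zip);
    }

    /**
     * Creates an SEI instance on the fly. The transport parameter stands in
     * for HTTP or JMS delivery: it takes request text and returns response text.
     */
    static <T> T createProxy(Class<T> sei, Function<String, String> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Stand-in for serialization: wrap the parameters in XML-like text.
            StringBuilder request = new StringBuilder("<" + method.getName() + ">");
            if (args != null) {
                for (Object arg : args) {
                    request.append("<arg>").append(arg).append("</arg>");
                }
            }
            request.append("</").append(method.getName()).append(">");
            // Hand the message to the transport and wait for the response.
            String response = transport.apply(request.toString());
            // Stand-in for deserialization: the response text is the return value.
            return response;
        };
        return sei.cast(Proxy.newProxyInstance(
            sei.getClassLoader(), new Class<?>[] {sei}, handler));
    }

    public static void main(String[] args) {
        // A fake transport that answers locally instead of calling a service.
        AddressLookup client = createProxy(AddressLookup.class,
            request -> request.contains("90952") ? "Seattle" : "unknown");
        System.out.println(client.lookupCity("90952")); // prints "Seattle"
    }
}
```

The caller only ever sees the Java interface; the message format and transport stay hidden behind the invocation handler, which is the mirror image of the server-side dispatching described earlier.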
One interesting point to make here is that only the middle part of Figure 1–1, the SOAP request/response, is specified by the WSDL. The Java method invocations at either end are completely arbitrary from a Web Services perspective. In fact, you have one Java method signature on the client side and a completely different method signature on the server side. In most cases, the method signatures are different, and the programming languages used are different, because if both sides were working with the same Java class libraries, this invocation could occur via Java RMI.
Also keep in mind that Figure 1–1 simply illustrates the mirror-image nature of invocation on the client and server sides. In practice, one side of this diagram or the other is probably not doing a Java method invocation. For example, Web Services enable us to have a Java client invoking a CICS transaction over SOAP/HTTP. In that scenario, you have a Java invocation subsystem only on the client side and something else that converts SOAP to CICS on the server side.
XML Schema: Understanding Structures
The W3C XML Schema definition (WXS) represents the Abstract Data Model of W3C XML Schema in the XML language. By defining an Abstract Data Model of the schema, the W3C Schema becomes agnostic about the language used to represent that model. The XML representation is the formal representation specified by WXS, but you are free to represent the Abstract Data Model any way you want and use it for validation. For example, you can directly create an in-memory schema using any data structure that adheres to the Abstract Data Model. This encourages the vendors that develop W3C Schema validators to provide an API that you can use to create an in-memory schema directly.
There are numerous grammars available for validating XML instance documents. Some became obsolete immediately, while others, such as DTD, which is part of the W3C XML 1.0 Recommendation, have passed the test of time. Of the extant grammars, XML Schema is the most popular among XML developers because:
1. It uses XML as the language to define the schema.
2. It has more than 44 built-in data types, and each of these data types can be further refined for fine-grained validation of the character data in XML.
3. The cardinality of the elements can be defined in a fine-grained manner using the minOccurs and maxOccurs attributes.
4. It supports modularity and re-usability through the extension, restriction, import, include, and redefine constructs.
5. It supports identity constraints to ensure uniqueness of a value in an XML document, within a specified set.
6. It has an Abstract Data Model and therefore is not bound to the XML representation only.
Here's an example of how you would validate an XML instance against an externally specified schema:
import java.io.FileInputStream;
import oracle.xml.parser.v2.XMLError;
import oracle.xml.parser.schema.XMLSchema;
import oracle.xml.parser.schema.XSDBuilder;
import oracle.xml.schemavalidator.XSDValidator;
...
// load the XML Schema
XSDBuilder schemaBuilder = new XSDBuilder();
XMLSchema schema = schemaBuilder.build(new FileInputStream("myschema.xsd"), null);
// set the loaded XML Schema on the XSDValidator
XSDValidator validator = new XSDValidator();
validator.setSchema(schema);
// validate the XML instance against the supplied XML Schema
validator.validate(new FileInputStream("data.xml"));
// check for errors
XMLError error = validator.getError();
if (error.getNumMessages() > 0) {
    System.out.println("XML-instance is invalid.");
    error.flushErrors();
} else {
    System.out.println("XML-instance is valid.");
}
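For readers without the Oracle XDK, roughly the same load-then-validate flow can be sketched with the standard javax.xml.validation API that ships with the JDK. This is a swapped-in alternative, not the XDK API shown above; the class name JaxpValidationSketch and the inline sample documents are assumptions made for illustration, and strings are used instead of files so the sketch is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class JaxpValidationSketch {
    // Compiles the schema text, then validates the instance text against it.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself is broken; that is a different problem from an invalid instance.
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first validation error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                   + "<xsd:element name='TextOnly' type='xsd:integer'/></xsd:schema>";
        System.out.println(isValid(xsd, "<TextOnly>42</TextOnly>"));
        System.out.println(isValid(xsd, "<TextOnly>abc</TextOnly>"));
    }
}
```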
Of course, XML Schema has limitations as well:
1. It doesn't support rule-based validation. An example of rule-based validation would be: if the value of attribute "score" is greater than 80, then the element "distinction" must exist in the XML instance, otherwise not.
2. The Unique Particle Attribution (UPA) constraint is too strict to allow a grammar to be defined for all types of XML documents. (See the "UPA Constraint" section for details.)
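Because a rule like the "score"/"distinction" example cannot be expressed in the schema, it is typically checked in application code after schema validation. The following is a minimal sketch of that idea using the JDK's DOM parser; the class name ScoreRuleSketch and the assumption that "score" sits on the root element are hypothetical choices for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ScoreRuleSketch {
    // Rule: "distinction" must be present exactly when score > 80.
    static boolean satisfiesRule(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            double score = Double.parseDouble(root.getAttribute("score"));
            // getElementsByTagName searches all descendants of the root element.
            boolean hasDistinction = root.getElementsByTagName("distinction").getLength() > 0;
            return (score > 80) == hasDistinction;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or evaluate the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(satisfiesRule("<result score='90'><distinction/></result>"));
        System.out.println(satisfiesRule("<result score='90'/>"));
    }
}
```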
Oracle XML Developer's Kit (XDK) includes a W3C-compliant XML Schema processor, as well as several utilities, such as for creating schema datatypes and restricting them programmatically using the APIs, parsing and validating the XML Schema structure itself, traversing the Abstract Data Model of an XML Schema, and so on. Check out the oracle.xml.parser.schema and oracle.xml.schemavalidator packages.
The Content and Model
Element Content
In an XML document, the content of an element is whatever is enclosed between its <opening> and </closing> tags. An element can have only four types of content: TextOnly, ElementOnly, Mixed, and Empty. Attributes declared on an element are not considered to contribute to the content of an element. They are just part of the element on which they are declared, and contribute to the structure of XML.
TextOnly
The content of an element is said to be TextOnly when that element has only character data (also called text data) between its <opening> and </closing> tags, or in other words, when that element has no child elements. For example:
<TextOnly>some character data</TextOnly>
ElementOnly
The content of an element is said to be ElementOnly when that element has only child elements between its <opening> and </closing> tags, optionally separated by whitespace (space, tab, newline, carriage return). This whitespace is called ignorable whitespace, and is often used for indenting the XML. Therefore the following:
ElementOnly content without whitespace
<ElementOnly><child1 .../><child2 .../></ElementOnly>
is the same as:
ElementOnly content with whitespaces
<ElementOnly> <child1 .../> <child2 .../> </ElementOnly>
Mixed
The content of an element is said to be Mixed when that element has character data interspersed with child elements between its <opening> and </closing> tag. (In other words, its content has both character data as well as child elements.) When the content is mixed, then so-called ignorable whitespaces are not ignorable anymore. Therefore, the following:
<Mixed><child1.../>some character data<child1.../></Mixed>
is different from:
<Mixed> <child1 .../> some character data <child1 .../> </Mixed>
Empty
The content of an element is said to be Empty when that element has absolutely nothing between the <opening> and </closing> tag, not even whitespaces. For example:
<Empty></Empty>
For ease of use and clarity, an element with empty content can also be written as a single empty tag, as follows:
<Empty />
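The four kinds of element content can be told apart mechanically by looking at an element's child nodes. The sketch below does this with the JDK's DOM parser; the class name ElementContentSketch is hypothetical, and the classification is only a heuristic, because without a schema a parser cannot know whether whitespace between child elements is ignorable. Here, whitespace-only text next to child elements is treated as ignorable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementContentSketch {
    // Classifies the content of the document's root element.
    static String classify(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        boolean hasElement = false;
        boolean hasText = false;
        boolean hasNonBlankText = false;
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElement = true;
            } else if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                hasText = true;
                if (!child.getNodeValue().trim().isEmpty()) {
                    hasNonBlankText = true;
                }
            }
        }
        if (!hasElement && !hasText) return "Empty";       // nothing at all, not even whitespace
        if (hasElement && hasNonBlankText) return "Mixed";  // character data interspersed with children
        if (hasElement) return "ElementOnly";               // children plus (assumed ignorable) whitespace
        return "TextOnly";                                  // character data only
    }

    public static void main(String[] args) {
        System.out.println(classify("<Mixed><child1/>some character data<child1/></Mixed>"));
    }
}
```

Note that attributes play no part in the classification, which matches the point that attributes do not contribute to element content.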
Content Models
In an XML grammar, one declares the content model of an element to specify the type of element content in the corresponding XML instance document. Therefore, a content model is the definition of the element content. The figure below illustrates how to declare the content models in an XML Schema. Trace the paths in this figure starting from <schema>, to understand how to declare the content model for the four types of element content, with and without attribute declarations. Let's examine each one briefly.
Figure 1. Declare the content models in an XML Schema
TextOnly
In the illustration above, trace the path to simpleType-1 to declare an element with the TextOnly content model:
<xsd:element name="TextOnly">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string" />
  </xsd:simpleType>
</xsd:element>
OR equivalent
<xsd:element name="TextOnly" type="xsd:string" />
The above schema declares an element named "TextOnly" (can be anything) with the TextOnly content model, whose content must be a string in the corresponding XML instance. When the content model of an element is TextOnly there is always a simpleType associated with it that indicates the datatype of that element. For example, in this case the datatype for element TextOnly is string. See the corresponding XML instance for this schema in the previous section.
As mentioned previously, attributes don't contribute to the element content; therefore, another example of an XML instance with a TextOnly content, and with attributes, is:
<TextOnly att="val">some character data</TextOnly>
Now trace the path in Figure 1 until simpleContent-3 to declare an element with TextOnly content model, and with attributes:
<xsd:element name="TextOnly"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="att" type="xsd:string" use="required" /> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>
The above schema declares an element named "TextOnly" with the TextOnly content model, whose content must be a string and which must have an attribute named "att" in the corresponding XML instance.
ElementOnly
Trace the path in Figure 1 to any one of sequence-5, choice-6, or all-7 to declare an element with the ElementOnly content model:
<xsd:element name="ElementOnly">
  <xsd:complexType>
    <xsd:sequence> <!-- could have used choice or all instead -->
      <xsd:element name="child1" type="xsd:string" />
      <xsd:element name="child2" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The above schema declares an element named "ElementOnly" with ElementOnly content model. The element "ElementOnly" must have the child elements "child1" and "child2" in the corresponding XML instance document. See the corresponding XML instance for this schema in the previous section.
Another XML instance with ElementOnly element content and with attributes looks like:
<ElementOnly att="val"> <child1 .../>
<child2 .../> </ElementOnly>
To declare an element with the ElementOnly content model and with attributes, the path in Figure 1 is the same as that for declaring the ElementOnly content model. The attributes are then declared within the complexType as follows:
<xsd:element name="ElementOnly">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="child1" type="xsd:string" />
      <xsd:element name="child2" type="xsd:string" />
    </xsd:sequence>
    <xsd:attribute name="att" type="xsd:string" use="required" />
  </xsd:complexType>
</xsd:element>
Mixed
Trace the path in Figure 1 to any one of sequence-5, choice-6, or all-7 to declare an element with the Mixed content model (which is identical to declaring the ElementOnly content model), but this time set the mixed attribute on the complexType to true, as follows:
<xsd:element name="Mixed">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="child1" type="xsd:string" />
      <xsd:element name="child2" type="xsd:string" />
    </xsd:sequence>
    <xsd:attribute name="att" type="xsd:string" use="required" />
  </xsd:complexType>
</xsd:element>
The corresponding XML instance for the above schema looks like:
<Mixed att="val"> <child1 .../> some character data <child2 .../> </Mixed>
Empty
Trace the path until complexType-2 to declare an element with Empty content model, with or without attributes:
<xsd:element name="EmptyContentModels">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Empty1">
        <xsd:complexType />
      </xsd:element>
      <xsd:element name="Empty2">
        <xsd:complexType>
          <xsd:attribute name="att" type="xsd:string" use="required" />
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The corresponding XML instance for the above schema looks like
<EmptyContentModels> <Empty1 /> <Empty2 att="val" /> </EmptyContentModels>
Model Groups
When the content model of an element is declared to be ElementOnly (or Mixed), which means that the element has child elements, you can specify the order and occurrence of the child elements in more detail using model groups. A model group consists of particles; a particle can be an element declaration or yet another model group. A model group itself can have a cardinality, which can be refined using the minOccurs and maxOccurs attributes. These characteristics make model groups quite powerful. The three model groups supported by XML Schema are:
• Sequence - (a , b)* - means that the child elements declared within the sequence model group must occur in the corresponding XML instance in the same order as defined in the schema. The cardinality of a sequence model group can range from 0 to unbounded. A sequence model group can further contain a sequence or a choice model group recursively.
• Choice - (a | b)* - means that from the set of child elements declared within the choice model group exactly one element must occur in the corresponding XML instance. The cardinality of a choice model group can range from 0 to unbounded. A choice model group can further contain a sequence or a choice model group recursively.
• All - {a , b}? - means that the entire set of child elements declared within the all model group must occur in the corresponding XML instance, but unlike the sequence model group, the order is not important. The child elements can therefore occur in any order. The cardinality of an all model group can only be either 0 or 1. An all model group can only contain element declarations and not any other model group.
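The shorthand notation used for the three model groups reads much like a regular expression over child element names. As a rough illustration (not how a schema processor is implemented), the sketch below treats the list of child names as a string of single letters and checks it against each group; the class name ModelGroupSketch and the single-letter encoding are assumptions made for brevity.

```java
class ModelGroupSketch {
    // Sequence (a , b)*: "a" then "b", in that order, repeated zero or more times.
    static boolean matchesSequence(String children) {
        return children.matches("(ab)*");
    }

    // Choice (a | b)*: one of "a" or "b" per occurrence, repeated zero or more times.
    static boolean matchesChoice(String children) {
        return children.matches("(a|b)*");
    }

    // All {a , b}?: both "a" and "b" exactly once in any order, or nothing at all.
    static boolean matchesAll(String children) {
        return children.isEmpty() || children.equals("ab") || children.equals("ba");
    }

    public static void main(String[] args) {
        System.out.println(matchesSequence("abab")); // order preserved, two occurrences
        System.out.println(matchesSequence("ba"));   // wrong order for a sequence
        System.out.println(matchesAll("ba"));        // order does not matter for all
    }
}
```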
These model groups can either be declared in-line or as a global declaration (immediate child of <schema> construct with a name for re-usability). A global model group must be declared within the <group> construct, which you can later refer to by its name. But unlike the in-line model groups, the minOccurs/maxOccurs attributes cannot be declared on the globally declared model groups. When required, you can use the minOccurs/maxOccurs attributes when referencing the globally declared model group. For example:
<xsd:group name="globalDecl">
  <xsd:sequence>
    <xsd:element name="child1" type="xsd:string" />
  </xsd:sequence>
</xsd:group>
Subsequently, you can reference the globally declared model group using the group construct along with the minOccurs/maxOccurs attributes, if required, as follows:
<xsd:group ref="globalDecl" maxOccurs="unbounded" />
Here is a more complex example for a better understanding of model groups. It declares the content model ((a | b)* , c+)?:
<xsd:element name="complexModelGroup">
  <xsd:complexType>
    <xsd:sequence minOccurs="0" maxOccurs="1">
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="a" type="xsd:string" />
        <xsd:element name="b" type="xsd:string" />
      </xsd:choice>
      <xsd:element name="c" type="xsd:string" minOccurs="1" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The complexType story
You now have enough information to write a simple schema for an XML document. But many advanced concepts in XML Schema remain to be addressed. The complexType is one of the most powerful constructs in XML Schema. Apart from allowing you to declare all four content models with or without attributes, it lets you derive a new complexType by inheriting from an already declared complexType. The derived complexType can then either add more declarations to the ones inherited from the base complexType (using extension) or restrict the declarations from the base complexType (using restriction).
A complexType can be extended or restricted using either simpleContent or complexContent. A complexType with simpleContent declares a TextOnly content model, with or without attributes. A complexType with complexContent can be used to declare the remaining three content models (ElementOnly, Mixed, or Empty), with or without attributes.
Extending a complexType
simpleContent
Figure 2. A complexType with simpleContent can only be extended to add attributes.
A complexType with simpleContent can extend either a simpleType or a complexType with simpleContent. As illustrated in Figure 2, in the derived complexType, then, the only thing you are allowed to do is add attributes. For example:
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://inheritance-ext-res" xmlns:tns="http://inheritance-ext-res" xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:complexType name="DerivedType1">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="att1" type="xsd:string" use="required" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
  <xsd:complexType name="DerivedType2">
    <xsd:simpleContent>
      <xsd:extension base="tns:DerivedType1">
        <xsd:attribute name="att2" type="xsd:string" use="required" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
  <xsd:element name="SCExtension">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Derived1" type="tns:DerivedType1" />
        <xsd:element name="Derived2" type="tns:DerivedType2" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
In the above schema:
1. DerivedType1 extends the built-in simpleType string, and adds an attribute att1.
2. DerivedType2 inherits attribute att1 from the base DerivedType1, which is a "complexType with simpleContent," and adds an attribute att2.
An XML instance corresponding to the above schema looks like:
<SCExtension xmlns="http://inheritance-ext-res" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://inheritance-ext-res CTSCExt.xsd"> <Derived1 att1="val">abc</Derived1>
Figure 3. A complexType with complexContent can be used to extend the model group as well as add attributes.
A complexType with complexContent can extend either a complexType or a complexType with complexContent. As illustrated in Figure 3, in the derived complexType, then, you are allowed to add attributes, as well as extend the model group. For example:
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://inheritance-ext-res" xmlns:tns="http://inheritance-ext-res" xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <!-- (child1)+ -->
  <xsd:complexType name="BaseType">
    <xsd:sequence maxOccurs="unbounded">
      <xsd:element name="child1" type="xsd:string" />
    </xsd:sequence>
    <xsd:attribute name="att1" type="xsd:string" use="required" />
  </xsd:complexType>
  <!-- ((child1)+ , (child2 | child3)) -->
  <xsd:complexType name="DerivedType">
    <xsd:complexContent>
      <xsd:extension base="tns:BaseType">
        <xsd:choice>
          <xsd:element name="child2" type="xsd:string" />
          <xsd:element name="child3" type="xsd:string" />
        </xsd:choice>
        <xsd:attribute name="att2" type="xsd:string" use="required" />
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="CCExtension">
In the above schema:
1. The DerivedType inherits the sequence model group from the base complexType, and adds a choice model group, thereby making the final content model of the derived complexType ((child1)+ , (child2 | child3)).
2. The DerivedType inherits attribute att1 from the BaseType, and adds attribute att2.
An XML instance corresponding to the above schema looks like:
<CCExtension xmlns="http://inheritance-ext-res" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://inheritance-ext-res CTCCExt.xsd">
  <Base att1="val">
    <child1>This is base</child1>
    <child1>This is base</child1>
  </Base>
  <Derived att1="val" att2="val">
    <child1>This is inherited from base</child1>
    <child1>This is inherited from base</child1>
    <child1>This is inherited from base</child1>
    <child3>This is added in the derived</child3>
  </Derived>
</CCExtension>
Restricting a complexType
simpleContent
Figure 4. A complexType with simpleContent can be used to restrict the datatype and attributes.
A complexType with simpleContent can only restrict a complexType with simpleContent. As illustrated in Figure 4, in the derived complexType you can restrict the simpleType of the base, as well as restrict the type and use (optional, mandatory, etc.) of the attributes from the base. For example:
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://inheritance-ext-res" xmlns:tns="http://inheritance-ext-res" xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:complexType name="BaseType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="att1" type="xsd:string" use="optional" />
        <xsd:attribute name="att2" type="xsd:integer" use="optional" />
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
  <xsd:complexType name="DerivedType">
    <xsd:simpleContent>
      <xsd:restriction base="tns:BaseType">
        <xsd:maxLength value="35" />
        <xsd:attribute name="att1" use="prohibited" />
        <xsd:attribute name="att2" use="required">
          <xsd:simpleType>
            <xsd:restriction base="xsd:integer">
              <xsd:totalDigits value="2" />
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:attribute>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
  <xsd:element name="SCRestriction">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Base" type="tns:BaseType" />
        <xsd:element name="Derived" type="tns:DerivedType" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
In the above schema:
1. You restricted the simpleType content of the base (of type string) to a string of maximum length 35 in the derived type.
2. You blocked the attribute att1 from being inherited from the base.
3. You restricted the type of the attribute att2 to an integer of at most 2 digits, and changed it from optional to mandatory.
An XML instance corresponding to the above schema looks like:
<SCRestriction xmlns="http://inheritance-ext-res" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://inheritance-ext-res CTSCRes.xsd">
  <Base att1="val">This is base type</Base>
  <Derived att2="12">This is restricted in the derived</Derived>
</SCRestriction>
complexContent
Figure 5. A complexType with complexContent can be used to restrict the model group as well as the attributes.
A complexType with complexContent can either restrict a complexType or a complexType with complexContent. As illustrated in Figure 5, in the derived complexType, then, you must repeat the entire content model from the base and restrict them as desired, if required. You can restrict the attributes the same way as you would do while restricting a simpleContent. For example:
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://inheritance-ext-res" xmlns:tns="http://inheritance-ext-res" xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:complexType name="BaseType">
    <xsd:sequence>
      <xsd:element name="child1" type="xsd:string" maxOccurs="unbounded" />
      <xsd:element name="child2" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="att1" type="xsd:string" use="optional" />
  </xsd:complexType>
  <xsd:complexType name="DerivedType">
    <xsd:complexContent>
In the above schema:
1. You restricted the cardinality of child1 in the DerivedType, inherited from the BaseType, from unbounded to 4.
2. You restricted the type of child2 in the DerivedType, inherited from the BaseType, to a string of length 35.
3. You prohibited the attribute att1 from being inherited from the BaseType.
An XML instance corresponding to the above schema looks like:
<CCRestriction xmlns="http://inheritance-ext-res" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://inheritance-ext-res CTCCRes.xsd">
  <Base att1="val">
    <child1>This is base type</child1>
    <child2>This is base type</child2>
  </Base>
  <Derived>
    <child1>This is restricted in the derived</child1>
    <child2>This is restricted in the derived</child2>
  </Derived>
</CCRestriction>
Assembling Schemas
Imports, includes, and chameleon effects
Many Java projects involve multiple classes and packages instead of a single, huge Java file, because modularization makes the code easy to re-use, read, and maintain. As a consequence, you have to add the necessary import statements to a class before it can use the others. Similarly, in XML Schema, you have to manage multiple schemas from various namespaces, and you need to add the necessary imports to a schema before you use them. XML Schemas can be assembled using the <import/> and <include/> schema constructs, which must appear in the schema before any other declarations:
<schema> <import namespace="foo" schemaLocation="bar.xsd" /> <include schemaLocation="baz.xsd" /> ... </schema>
Usually <import /> is used when the schema being imported has a targetNamespace, while <include /> is used when the schema being included has no targetNamespace declared.
Let's look at an example involving two schemas, A and B, with A referring to items declared in B.
Case I
When both schemas have a targetNamespace and the targetNamespace of schema A (tnsA) is different from the targetNamespace of schema B (tnsB), then A must import B.
<import namespace="tnsB" schemaLocation="B.xsd" />
It is, however, an error for A to import B without specifying the namespace, as well as for A to include B.
Case II
When both schemas have a targetNamespace and the targetNamespace of schema A (tnsAB) is the same as the targetNamespace of schema B (tnsAB), then A must include B.
<include schemaLocation="B.xsd" />
It is an error for A to import B.
Case III
When neither schema A nor schema B has a targetNamespace, A must include B.
<include schemaLocation="B.xsd" />
Case IV
When schema A has no targetNamespace and schema B has a targetNamespace (tnsB), then A must import B.
<import namespace="tnsB" schemaLocation="B.xsd" />
It is an error for A to include B because B has a targetNamespace.
Case V
When schema A has a targetNamespace (tnsA) and schema B has no targetNamespace, then...? Loudly please! A should include B. But what if I say that in this case A should import B? Actually, in this case A can either import or include B, and both are legal, though the effects are different. When A includes B, all the included items from B get the namespace of A. Such an include is known as a chameleon include. When you don't want such a chameleon effect to take place, you must use an import without specifying the namespace. An import without the namespace attribute allows unqualified reference to components with no target namespace.
<import schemaLocation="B.xsd" />
Importing or including a schema multiple times is not an error, because the schema processors can detect such a scenario and not load an already loaded schema. Therefore, it is not an error if A.xsd imports B.xsd and C.xsd; and both B.xsd and C.xsd individually import A.xsd. Circular references are not errors either but are strongly discouraged.
By the way, a bare import like <import /> is legal as well. This approach simply allows unqualified reference to foreign components with no target namespace without giving any hints as to where to find them. It is up to the Schema processor either to throw an error or to look up unknown items using some mechanism, and this behaviour may vary from one Schema processor to another. A bare <include /> is, however, illegal.
Rules of thumb:
1. <include/> is as good as saying that the <include/>d schema is defined inline in the including schema.
2. <import/> is always used when the <import/>ed schema has a targetNamespace that is different from the targetNamespace of the importing schema.
Redefining Schemas
You may not always want to assemble schemas in their original forms. For example, you may want to modify the components being brought in from another schema. To redefine a declaration without changing its name, use the redefine component. The constraint is that the schema to be redefined must either have (a) the same targetNamespace as the <redefine>ing schema document, or (b) no targetNamespace at all, in which case the <redefine>d schema document is converted to the <redefine>ing schema document's targetNamespace. For example:
actual.xsd
In the above schema:
1. You redefined the DerivedType complexType by adding one more element to the content model, without changing its name.
2. By not redefining the BaseType in the redefine schema, it is inherited as is.
Note that the name of a type is not changed when redefining it. Therefore, redefined types use themselves as their base types. In the above example, we redefine a complexType named DerivedType without changing its name. While redefining DerivedType, any reference to "DerivedType" (for example base="tns:DerivedType") is supposed to refer to the actual DerivedType. After the type is redefined, any reference to the DerivedType is supposed to refer to the redefined type. An XML instance corresponding to the above-redefined schema looks like:
<Redefine xmlns="http://inheritance-ext-res" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://inheritance-ext-res redefine.xsd">
  <Base att1="val">
    <child1>This is base type</child1>
  </Base>
  <Derived att1="val" att2="val">
    <child1>This is inherited from the base as is</child1>
    <child2>This is added in the derived</child2>
    <child4>This is added when redefining</child4>
  </Derived>
</Redefine>
Constraints
Identity constraint
XML Schema allows you to enforce uniqueness constraints on the content of elements and attributes, which guarantees that in the instance document the values of the specified elements or attributes are unique. When uniqueness is enforced, there must be an item whose value is to be checked for uniqueness (an ISBN number, for example). When you have identified the item, you must then identify the set in which the values of those selected items should be checked for uniqueness (a set of books, for example). XML Schema provides two constructs, unique and key, to enforce uniqueness constraints. Unique ensures that if the specified values are not null, then they must be unique in the defined set; key ensures that the specified values are never null and are unique in the defined set.
There is one more construct, keyref, which points to some key already defined. Keyref ensures that the value of the specified item within keyref exists in the set of keys the keyref is pointing to. All three constructs have the same syntax (all of them use a selector and fields) but different meanings. The selector is used to define the set in which uniqueness is to be enforced, and field (multiple fields are used to define a composite item) is used to define the item whose value is to be checked for uniqueness. The values of both selector and field are XPath expressions. XPath expressions do not respect default namespaces; therefore, if the elements or attributes are in a namespace, it is essential to make the XPath expressions namespace aware by explicitly using prefixes bound to the appropriate namespace. For example:
<?xml version="1.0" ?>
<xsd:schema targetNamespace="http://identity-constraint" xmlns:tns="http://identity-constraint" xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:complexType name="BookType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string" />
      <xsd:element name="half-isbn" type="xsd:string" />
      <xsd:element name="other-half-isbn" type="xsd:float" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Books">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Book" type="tns:BookType" maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
    <xsd:key name="isbn">
      <xsd:selector xpath=".//tns:Book" />
      <xsd:field xpath="tns:half-isbn" />
      <xsd:field xpath="tns:other-half-isbn" />
    </xsd:key>
  </xsd:element>
</xsd:schema>
In the above schema, we declared a key named "isbn" that says, "The composite value (half-isbn + other-half-isbn) specified by field must be not null and unique in the set of books, as specified by the selector."
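To see the "isbn" key in action, the sketch below validates two small instances against the schema above using the JDK's standard javax.xml.validation API (not the Oracle XDK). The class name IsbnKeySketch, the helper methods, and the sample book values are assumptions made for illustration; the schema text is the one shown above, written compactly with single-quoted attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IsbnKeySketch {
    static final String NS = "http://identity-constraint";
    static final String XSD =
        "<xsd:schema targetNamespace='" + NS + "' xmlns:tns='" + NS + "'"
      + " xmlns:xsd='http://www.w3.org/2001/XMLSchema' elementFormDefault='qualified'>"
      + "<xsd:complexType name='BookType'><xsd:sequence>"
      + "<xsd:element name='title' type='xsd:string'/>"
      + "<xsd:element name='half-isbn' type='xsd:string'/>"
      + "<xsd:element name='other-half-isbn' type='xsd:float'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:element name='Books'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='Book' type='tns:BookType' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:key name='isbn'><xsd:selector xpath='.//tns:Book'/>"
      + "<xsd:field xpath='tns:half-isbn'/><xsd:field xpath='tns:other-half-isbn'/>"
      + "</xsd:key></xsd:element></xsd:schema>";

    // Builds one <Book> element; the values are sample data only.
    static String book(String title, String half, String otherHalf) {
        return "<Book><title>" + title + "</title><half-isbn>" + half
             + "</half-isbn><other-half-isbn>" + otherHalf + "</other-half-isbn></Book>";
    }

    // Wraps the books in a <Books> root that puts everything in the target namespace.
    static String books(String... bookElements) {
        return "<Books xmlns='" + NS + "'>" + String.join("", bookElements) + "</Books>";
    }

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema could not be compiled", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // includes duplicate-key violations
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(books(book("A", "123", "1.0"), book("B", "123", "2.0"))));
        System.out.println(isValid(books(book("A", "123", "1.0"), book("B", "123", "1.0"))));
    }
}
```

The first instance differs in one field of the composite key and so remains unique; the second repeats both fields and should be rejected.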
Unique Particle Attribution (UPA) Constraint
The UPA constraint ensures that the content model of every element is specified in such a way that, while validating an XML instance, there is no ambiguity and the correct element declaration can be determined deterministically. For example, the following schema violates the UPA constraint:
<xsd:element name="upa">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="a" minOccurs="0"/>
      <xsd:element name="b" minOccurs="0"/>
      <xsd:element name="a" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
...because in the corresponding XML instance for the above schema:
<upa> <a/> </upa>
it cannot be determined which element declaration the element "a" in the XML instance corresponds to: the element declaration for "a" that comes before the element declaration for "b", or the element declaration for "a" that comes after it. This restriction prevents you from writing an XML Schema for the type of XML instance you just saw. In this case, however, if you set the minOccurs of element "b" to anything greater than 0, then the UPA is not violated.
The following, then, is a valid schema:
<xsd:element name="upa">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="a" minOccurs="0"/>
      <xsd:element name="b" minOccurs="1"/>
      <xsd:element name="a" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
...because in the corresponding XML instance for the above schema:
<upa> <a/> <b/> </upa>
it is quite clear that the element "a" in the XML instance is an instance of the element declaration for "a" that comes before the element declaration for "b" in the schema.
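The ambiguity can be made visible with a small Python sketch that counts how many ways the children of an instance can be attributed to the particles of a sequence. This is only an illustration under simplifying assumptions: every particle has maxOccurs of 1, the function name count_assignments is made up here, and a real schema processor checks UPA statically on the content model rather than by counting matches per instance.

```python
def count_assignments(particles, children):
    """Count the ways `children` can be matched against `particles`.

    particles: list of (element_name, min_occurs) tuples, maxOccurs assumed to be 1.
    children:  list of element names as they appear in the instance.
    """
    if not particles:
        # All particles used up: a match only if no children are left over.
        return 1 if not children else 0
    (name, min_occurs), rest = particles[0], particles[1:]
    total = 0
    # Option 1: this particle consumes the next child, if the names agree.
    if children and children[0] == name:
        total += count_assignments(rest, children[1:])
    # Option 2: this particle is skipped, which is allowed only if it is optional.
    if min_occurs == 0:
        total += count_assignments(rest, children)
    return total

# a?, b?, a? against <upa><a/></upa>: the single "a" fits either declaration.
ambiguous = [("a", 0), ("b", 0), ("a", 0)]
# a?, b, a? against <upa><a/><b/></upa>: "a" can only be the first declaration.
unambiguous = [("a", 0), ("b", 1), ("a", 0)]

print(count_assignments(ambiguous, ["a"]))
print(count_assignments(unambiguous, ["a", "b"]))
```

More than one assignment for the same instance is the symptom that the UPA constraint is meant to rule out.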
Conclusion
Now that you have completed this series, you should understand:
1. The concept of namespaces in XML and XML Schema
2. The scalar datatypes supported in XML Schema, and how to further restrict them using simpleType
3. The element content, content model, model groups, particles, extending and restricting a complexType, assembling schemas, identity constraint, and UPA, which allow you to define and constrain the structure of XML.
XML Schema: Understanding Namespaces
An XML namespace is a collection of XML elements and attributes identified by an Internationalized Resource Identifier (IRI). One of the primary motivations for defining an XML namespace is to avoid naming conflicts when using and re-using multiple vocabularies. XML Schema is used to create a vocabulary for an XML instance, and uses namespaces heavily.
Namespaces are similar to packages in Java in several ways:
• A package in Java can have many reusable classes and interfaces. Similarly, a namespace in XML can have many reusable elements and attributes.
• To use a class or interface in a package, you must fully qualify that class or interface with the package name. Similarly, to use an element or attribute in a namespace, you must fully qualify that element or attribute with the namespace.
• A Java package may have an inner class that is not directly inside the package, but rather "belongs" to it by virtue of its enclosing class. The same is true for namespaces: there could be elements or attributes that are not directly in a namespace, but belong to the namespace by virtue of their parent or enclosing element. This is a transitive relationship. If a book is on the table, and the table is on the floor, then transitively, the book is on the floor, although the book is not directly on the floor.
Thus, we see that the namespaces in XML concept is not very different from packages in Java. This correlation is intended to simplify the understanding of namespaces in XML and to help you visualize the namespaces concept. In this article, you will learn:
• The role of namespaces in XML
• How to declare and use namespaces
• The difference between default-namespace and no-namespace
• How to create namespaces using XML Schema, and
• The difference between qualified and unqualified elements/attributes in a namespace.
Declaring and Applying Namespaces
Namespaces are declared as an attribute of an element. A namespace need not be declared at the root element; it can be declared at any element in the XML document.
The scope of a declared namespace begins at the element where it is declared and applies to the entire content of that element, unless overridden by another namespace declaration with the same prefix name. Here, the content of an element is everything between the <opening-tag> and </closing-tag> of that element. A namespace is declared as follows:
<someElement xmlns:pfx="http://www.foo.com" />
In the attribute xmlns:pfx, xmlns is like a reserved word, which is used only to declare a namespace. In other words, xmlns is used for binding namespaces, and is not itself bound to any namespace. Therefore, the above example is read as binding the prefix "pfx" with the namespace "http://www.foo.com".
It is a convention to use XSD or XS as a prefix for the XML Schema namespace, but that decision is purely personal. One can choose to use a prefix ABC for the XML Schema namespace, which is legal, but doesn't make much sense. Using meaningful namespace prefixes adds clarity to the XML document. Note that prefixes are used only as placeholders and must be expanded by the namespace-aware XML parser to the actual namespace bound to the prefix.
In a Java analogy, a namespace binding can be correlated to declaring a variable: wherever the variable is referenced, it is replaced by the value it was assigned. In our previous namespace declaration example, wherever the prefix "pfx" is referenced within the scope of the namespace declaration, it is expanded to the actual namespace (http://www.foo.com) to which it was bound:
In Java: String pfx = "http://www.foo.com";
In XML: <someElement xmlns:pfx="http://www.foo.com" />
Although a namespace usually looks like a URL, that doesn't mean that one must be connected to the Internet to declare and use namespaces. Rather, the namespace is intended to serve as a virtual "container" for vocabulary and un-displayed content that can be shared in the Internet space. In the Internet space URLs are unique, which is why you would usually choose URLs to uniquely identify namespaces. Typing the namespace URL in a browser doesn't mean it would show all the elements and attributes in that namespace; it's just a concept.
But here's a twist: although the W3C Namespaces in XML Recommendation declares that the namespace name should be an IRI, it enforces no such constraint. Therefore, I could also use something like:
<someElement xmlns:pfx="foo" />
which is perfectly legal. By now it should be clear that to use a namespace, we first bind it with a prefix and then use that prefix wherever required. But why can't we use the namespaces to qualify the elements or attributes from the start?
First, because namespaces—being IRIs—are quite long and thus would hopelessly clutter the XML document. Second and most important, because it might have a severe impact on the syntax, or to be specific, on the production rules of XML—the reason being that an IRI might have characters that are not allowed in XML tags per the W3C XML 1.0 Recommendation.
Invalid) <http://www.library.com:Book />
Valid) <lib:Book xmlns:lib="http://www.library.com" />
Below the elements Title and Author are associated with the Namespace http://www.library.com:
<?xml version="1.0"?> <Book xmlns:lib="http://www.library.com"> <lib:Title>Sherlock Holmes</lib:Title> <lib:Author>Arthur Conan Doyle</lib:Author> </Book>
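The idea that a prefix is only a placeholder can be checked with a namespace-aware parser. The short Python sketch below uses the standard xml.etree.ElementTree module, which reports each element name in the expanded form {namespace}local-name. The second document and the prefix abc are assumptions added for comparison; they are not part of the example above.

```python
import xml.etree.ElementTree as ET

# The same content written with two different prefixes for one namespace.
DOC_LIB = ('<Book xmlns:lib="http://www.library.com">'
           '<lib:Title>Sherlock Holmes</lib:Title></Book>')
DOC_ABC = ('<Book xmlns:abc="http://www.library.com">'
           '<abc:Title>Sherlock Holmes</abc:Title></Book>')

def expanded_names(xml_text):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

print(expanded_names(DOC_LIB))
print(expanded_names(DOC_ABC))
```

Both documents yield the same expanded names, and the unprefixed element Book carries no namespace at all.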
In the example below, the elements Title and Author of Sherlock Holmes - III and Sherlock Holmes - I are associated with the namespace http://www.library.com and the elements Title and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.
<?xml version="1.0"?> <Book xmlns:lib="http://www.library.com"> <lib:Title>Sherlock Holmes - I</lib:Title> <lib:Author>Arthur Conan Doyle</lib:Author> <purchase xmlns:lib="http://www.otherlibrary.com"> <lib:Title>Sherlock Holmes - II</lib:Title> <lib:Author>Arthur Conan Doyle</lib:Author> </purchase> <lib:Title>Sherlock Holmes - III</lib:Title> <lib:Author>Arthur Conan Doyle</lib:Author> </Book>
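As a quick check of the scoping rule, this Python sketch parses a shortened version of the document (Author elements omitted for brevity) and records which namespace the prefix lib resolved to for each Title. The helper name title_namespaces is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<Book xmlns:lib="http://www.library.com">
  <lib:Title>Sherlock Holmes - I</lib:Title>
  <purchase xmlns:lib="http://www.otherlibrary.com">
    <lib:Title>Sherlock Holmes - II</lib:Title>
  </purchase>
  <lib:Title>Sherlock Holmes - III</lib:Title>
</Book>"""

def title_namespaces(xml_text):
    """Map the text of each Title element to the namespace its prefix resolved to."""
    result = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.endswith("}Title"):
            # Expanded names look like "{namespace}local-name".
            result[element.text] = element.tag[1:].split("}")[0]
    return result

for title, namespace in title_namespaces(DOC).items():
    print(title, "->", namespace)
```

The override made on purchase applies only inside purchase; the third Title falls back to the binding declared on Book.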
The W3C Namespaces in XML Recommendation enforces some namespace constraints:
1. Prefixes beginning with the three-letter sequence x, m, and l, in any case combination, are reserved for use by XML and XML-related specifications. Although not a fatal error, it is inadvisable to bind such prefixes. The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace.
2. A prefix cannot be used unless it is declared and bound to a namespace. (Ever tried to use a variable in Java without declaring it?)
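The second constraint is easy to observe with a namespace-aware parser. In this Python sketch, the helper parses and the two sample documents are assumptions made for illustration; xml.etree.ElementTree raises ParseError when a prefix is used without being bound.

```python
import xml.etree.ElementTree as ET

# The prefix lib is declared before use.
BOUND = '<Book xmlns:lib="http://www.library.com"><lib:Title/></Book>'
# The prefix lib is used but never declared.
UNBOUND = '<Book><lib:Title/></Book>'

def parses(xml_text):
    """Return True if a namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses(BOUND), parses(UNBOUND))
```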
The following violates both these constraints:
<?xml version="1.0"?> <Book xmlns:XmlLibrary="http://www.library.com">
[Error]: prefix lib not bound to a namespace.
[Inadvisable]: prefix XmlLibrary begins with 'Xml'.
Default Namespace (Not Default Namespaces)
It would be painful to repeatedly qualify every element or attribute you wish to use from a namespace. In such cases, you can declare a {default namespace} instead. Remember, at any point in time, there can be only one {default namespace} in existence. Therefore, the term "Default Namespaces" is inherently incorrect.
Declaring a {default namespace} means that any element within the scope of the {default namespace} declaration will be qualified implicitly, if it is not already qualified explicitly using a prefix. As with prefixed namespaces, a {default namespace} can be overridden too. A {default namespace} is declared as follows:
<someElement xmlns="http://www.foo.com" />
For example:
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
In this case the elements Book, Title, and Author are associated with the Namespace http://www.library.com. Remember, the scope of a namespace begins at the element where it is declared. Therefore, the element Book is also associated with the {default namespace}, as it has no prefix.
<?xml version="1.0"?> <Book xmlns="http://www.library.com"> <Title>Sherlock Holmes - I</Title> <Author>Arthur Conan Doyle</Author> <purchase xmlns="http://www.otherlibrary.com"> <Title>Sherlock Holmes - II</Title> <Author>Arthur Conan Doyle</Author> </purchase> <Title>Sherlock Holmes - III</Title> <Author>Arthur Conan Doyle</Author> </Book>
In the above, the elements Book, Title, and Author of Sherlock Holmes - III and Sherlock Holmes - I are associated with the namespace http://www.library.com and the elements purchase, Title, and Author of Sherlock Holmes - II are associated with the namespace http://www.otherlibrary.com.
Default Namespace and Attributes
Default namespaces do not apply to attributes; therefore, to apply a namespace to an attribute, the attribute must be explicitly qualified. Here the attribute isbn has {no namespace} whereas the attribute cover is associated with the namespace http://www.library.com.
<?xml version="1.0"?> <Book isbn="1234" pfx:cover="hard" xmlns="http://www.library.com" xmlns:pfx="http://www.library.com"> <Title>Sherlock Holmes</Title> <Author>Arthur Conan Doyle</Author> </Book>
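A namespace-aware parser shows the difference directly. The Python sketch below parses the Book element from the example with xml.etree.ElementTree, which reports names in the expanded form {namespace}local-name; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = ('<Book isbn="1234" pfx:cover="hard" '
       'xmlns="http://www.library.com" xmlns:pfx="http://www.library.com">'
       '<Title>Sherlock Holmes</Title><Author>Arthur Conan Doyle</Author></Book>')

book = ET.fromstring(DOC)

# The element picks up the default namespace...
print(book.tag)
# ...but the unprefixed attribute isbn does not; only pfx:cover is qualified.
print(sorted(book.attrib))
```

Even though the default namespace and the prefix pfx name the same namespace here, isbn stays in {no namespace} because it carries no prefix.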
Undeclaring Namespace
Unbinding an already-bound prefix is not allowed per the W3C Namespaces in XML 1.0 Recommendation, but is allowed per the W3C Namespaces in XML 1.1 Recommendation. There was no reason why this should not have been allowed in 1.0, but the mistake has been rectified in 1.1. It is necessary to know this difference because not many XML parsers yet support Namespaces in XML 1.1.
Although the two versions differ on unbinding prefixed namespaces, both allow you to unbind or remove the already declared {default namespace} by overriding it with another {default namespace} declaration, where the namespace in the overriding declaration is empty:
<someElement xmlns="" />
Unbinding a namespace is as good as the namespace not being declared at all. Here the elements Book, Title, and Author of Sherlock Holmes - III and Sherlock Holmes - I are associated with the namespace http://www.library.com and the elements purchase, Title, and Author of Sherlock Holmes - II have {no namespace}:
<?xml version="1.0"?>
<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <Author>Arthur Conan Doyle</Author>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
    <Author>Arthur Conan Doyle</Author>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
  <Author>Arthur Conan Doyle</Author>
</Book>
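The effect of undeclaring the {default namespace} can be confirmed with a short Python sketch. It parses a shortened version of the document (Author elements omitted for brevity); the helper name tags_in_document_order is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<Book xmlns="http://www.library.com">
  <Title>Sherlock Holmes - I</Title>
  <purchase xmlns="">
    <Title>Sherlock Holmes - II</Title>
  </purchase>
  <Title>Sherlock Holmes - III</Title>
</Book>"""

def tags_in_document_order(xml_text):
    """Return the expanded name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

for tag in tags_in_document_order(DOC):
    print(tag)
```

Inside purchase the names come back bare, with no {namespace} part, which is what {no namespace} looks like to a parser.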
Here's an invalid example of unbinding a prefix per Namespaces in XML 1.0 spec, but a valid example per Namespaces in XML 1.1:
<purchase xmlns:lib="">
From this point on, the prefix lib cannot be used within the scope of the element purchase, because it is now undeclared there. Of course, you can re-declare it.
No Namespace
No namespace exists when there is no default namespace in scope. A {default namespace} is one that is declared explicitly using xmlns. When a {default namespace} has not been declared at all using xmlns, it is incorrect to say that the elements are in {default namespace}. In such cases, we say that the elements are in {no namespace}. {no namespace} also applies when an already declared {default namespace} is undeclared.
In summary:
• The scope of a declared namespace begins at the element where it is declared and applies to all the elements within the content of that element, unless overridden by another namespace declaration with the same prefix name.
• Both prefixed and {default namespace} can be overridden.
• Both prefixed and {default namespace} can be undeclared (prefixed namespaces only per Namespaces in XML 1.1).
• {default namespace} does not apply to attributes directly.
• A {default namespace} exists only when you have declared it explicitly. It is incorrect to use the term {default namespace} when you have not declared it.
• No namespace exists when there is no default namespace in scope.
Namespaces and XML Schema
Thus far we have seen how to declare and use an existing namespace. Now let's examine how to create a new namespace and add elements and attributes to it using XML Schema.
An XML Schema is an XML document before it's anything else. In other words, like any other XML document, an XML Schema is built with elements and attributes. This "building material" must come from the namespace http://www.w3.org/2001/XMLSchema, which is a declared and reserved namespace that contains elements and attributes as defined in the W3C XML Schema Structures Specification and the W3C XML Schema Datatypes Specification. You should not add elements or attributes to this namespace.
Using these building blocks we can create new elements and attributes as required, enforce the required constraints on them, and keep them in some namespace. (See Figure 1.) XML Schema calls this particular namespace the {target namespace}: the namespace where the newly created elements and attributes will reside.
Figure 1: Elements and attributes in XML Schema namespace are used to write an XML Schema document, which generates elements and attributes as defined by user and puts them in {target namespace}. This {target namespace} is then used to validate the XML instance.
This {target namespace} is referenced from the XML instance to ensure the validity of the instance document. (See Figure 2.) During validation, the Validator verifies that the elements/attributes used in the instance exist in the declared namespace, and also checks any other constraints on their structure and datatype.
Figure 2: From XML Schema to XML Schema instance
Qualified or Unqualified
In XML Schema we can specify whether the instance document must qualify all the elements and attributes, or only the globally declared elements and attributes. Regardless of what we choose, the entire instance is validated. So why do we have two choices? The answer is "manageability."
When we choose qualified, we are specifying that all the elements and attributes in the instance must have a namespace, which in turn adds namespace complexity to the instance. If the schema is later modified by making some local declarations global and/or some global declarations local, the instance documents are not affected at all.
In contrast, when we choose unqualified, we are specifying that only the globally declared elements and attributes in the instance must have a namespace, which hides the namespace complexity from the instance. But in this case, if the schema is modified by making some local declarations global and/or some global declarations local, all instance documents are affected and are no longer valid. The XML Schema Validator would report validation errors if we tried to validate such an instance against the modified XML Schema. The namespaces in the instance must then be fixed to match the modification made in the XML Schema to make the instance valid again.
<?xml version="1.0" encoding="US-ASCII"?> <schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.library.com" targetNamespace="http://www.library.com" elementFormDefault="qualified" attributeFormDefault="unqualified"> <element name="Book" type="tns:BookType" /> <complexType name="BookType"> <sequence> <element name="Title" type="string" /> <element name="Author" type="string" /> </sequence> </complexType> </schema>
The declarations that are the immediate children of the element <schema> are the global declarations, and the rest are local declarations. In the above example, Book and BookType are declared globally whereas Title and Author are local declarations. We can express the choice between qualified and unqualified by setting the schema element attributes elementFormDefault and attributeFormDefault to either qualified or unqualified.
elementFormDefault = (qualified | unqualified) : unqualified
attributeFormDefault = (qualified | unqualified) : unqualified
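As a rough illustration of what the two values of elementFormDefault mean for an instance of the Book schema above, the Python sketch below lists which elements of an instance are in the target namespace. It does not validate anything (the Python standard library has no XML Schema validator); the two instance documents and the helper name qualified_elements are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

TARGET_NS = "http://www.library.com"

# Instance shape for elementFormDefault="qualified":
# Book, Title, and Author are all in the target namespace.
QUALIFIED_INSTANCE = (
    '<Book xmlns="http://www.library.com">'
    '<Title>Sherlock Holmes</Title><Author>Arthur Conan Doyle</Author></Book>')

# Instance shape for elementFormDefault="unqualified":
# only the globally declared Book is qualified; Title and Author are not.
UNQUALIFIED_INSTANCE = (
    '<lib:Book xmlns:lib="http://www.library.com">'
    '<Title>Sherlock Holmes</Title><Author>Arthur Conan Doyle</Author></lib:Book>')

def qualified_elements(xml_text):
    """Return the local names of the elements that are in the target namespace."""
    marker = "{" + TARGET_NS + "}"
    return [element.tag[len(marker):]
            for element in ET.fromstring(xml_text).iter()
            if element.tag.startswith(marker)]

print(qualified_elements(QUALIFIED_INSTANCE))
print(qualified_elements(UNQUALIFIED_INSTANCE))
```

Note that the unqualified instance uses a prefix rather than a {default namespace}; a default declaration on Book would also qualify the local elements Title and Author.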
When elementFormDefault is set to qualified, it implies that in the instance of this grammar all the elements must be explicitly qualified, either by using a prefix or by setting a {default namespace}. An unqualified setting means that only the globally declared elements must be explicitly qualified, and the locally declared elements must not be qualified. Qualifying a local declaration in this case is an error.
Similarly, when attributeFormDefault is set to qualified, all attributes in the instance document must be explicitly qualified using a prefix. Remember, {default namespace} doesn't apply to attributes; hence, we can't use a {default namespace} declaration to qualify attributes. Unqualified seems to imply being in the namespace by virtue of the containing element. This is interesting, isn't it?
In the following diagrams, the concept of a symbol space is similar to the non-normative concept of a namespace partition. For example, if a namespace is like a refrigerator, then the symbol spaces are the shelves in the refrigerator. Just as shelves partition the entire space in a refrigerator, the symbol spaces partition the namespace.
There are three primary partitions in a namespace: one for global element declarations, one for global attribute declarations, and one for global type declarations (complexType/simpleType). This arrangement implies that a global element, a global attribute, and a global type can all have the same name and still co-exist in a {target namespace} without any name collisions. Further, every global element and every global complexType has its own symbol space to contain its local declarations.
Let's examine the four possible combinations of values for the pair of attributes elementFormDefault and attributeFormDefault.
Case 1: elementFormDefault=qualified, attributeFormDefault=qualified
Here the {target namespace} directly contains all the elements and attributes; therefore, in the instance, all the elements and attributes must be qualified.
Case 2: elementFormDefault=qualified, attributeFormDefault=unqualified
Here the {target namespace} directly contains all the elements, and the corresponding attributes for these elements are contained in the symbol space of the respective elements. Therefore, in the instance, only the elements must be qualified and the attributes must not be qualified, unless the attribute is declared globally.
Case 3: elementFormDefault=unqualified, attributeFormDefault=qualified
Here the {target namespace} directly contains all the attributes and only the globally declared elements, each of which in turn contains its child elements in its symbol space. Therefore, in the instance, only the globally declared elements and all the attributes must be qualified.
Case 4: elementFormDefault=unqualified, attributeFormDefault=unqualified
Here the {target namespace} directly contains only the globally declared elements, each of which in turn contains its child elements in its symbol space. Every element contains the corresponding attributes in its symbol space; therefore, in the instance, only the globally declared elements and attributes must be qualified.
The above diagrams are intended as a visual representation of what is directly contained in a namespace and what is transitively contained in a namespace, depending on the value of elementFormDefault/attributeFormDefault. The implication of this setting is that the elements/attributes directly in the {target namespace} must have a namespace associated with them in the corresponding XML instance, and the elements/attributes that are not directly (that is, only transitively) in the {target namespace} must not have a namespace associated with them in the corresponding XML instance.
Target Namespace and No Target Namespace
Now we know that XML Schema creates the new elements and attributes and puts them in a namespace called the {target namespace}. But what if we don't specify a {target namespace} in the schema? When we don't specify the attribute targetNamespace at all, no {target namespace} exists, which is legal; specifying an empty URI in the targetNamespace attribute, however, is "illegal." For example, the following is invalid. We can't specify an empty URI for the {target namespace}:
<schema targetNamespace="" . . .>
In this case, when no {target namespace} exists, we say, as described earlier, that the newly created elements and attributes are kept in {no namespace}. (It would be incorrect to use the term {default namespace}.) To be validated, the corresponding XML instance must use the noNamespaceSchemaLocation attribute from the http://www.w3.org/2001/XMLSchema-instance namespace to refer to the XML Schema with no target namespace.
Conclusion
This overview of namespaces should help you move to XML Schema more easily. The Oracle XML Developer Kit (XDK) supports the W3C Namespaces in XML 1.0 Recommendation; you can turn the namespace check on or off through the JAXP APIs in the Oracle XDK, using the setNamespaceAware(boolean) method of the SAXParserFactory and DocumentBuilderFactory classes.
XML Schema: Understanding Datatypes
Learn which datatypes are supported in XML Schema version 1.0, and how to use them.
The W3C XML Schema Datatype Specification defines numerous datatypes for validating the element content and the attribute value. These datatypes can be used to validate only the scalar content of elements, and not the non-scalar or mixed content. The text enclosed between the <opening> and </closing> element tags, and the value of the attributes are often referred to as scalar data, but it can also be a list of scalar data. These datatypes are intended for use in XML Schema definition and other XML-related documents. Initially, Document Type Definition (DTD) was the only grammar available for validating XML instances. But DTD has only a handful of datatypes, ensuring coarse validation of the scalar data in XML via the familiar PCDATA, CDATA, and so on. XML Schema, in contrast, overcomes this limitation by providing 44 built-in datatypes. Each of these datatypes can be further customized to ensure fine validation of the scalar data. For example, the built-in datatype string can be customized to successfully validate strings and ensure they are of length 4. In this article, you will learn:
• The difference between the value space, lexical space, and canonical lexical representation of the supported datatypes
• The datatypes supported in XML Schema, their classifications, and their relationships to each other
• Creation of new datatypes from the built-in datatypes using restriction, list, and union constructs
• Various constraining facets available for restricting a datatype
• How to use Oracle XDK to programmatically create and use XML Schema datatypes.
Datatype Fundamentals
Before we dive into the various types of datatypes, their usage, and the relationships between them, we need to understand datatypes as a general concept. Although the XML Schema specification explains the following fundamentals about datatypes, these fundamentals are not specific to XML Schema. Rather, they are general mathematical concepts. Let's examine them in more detail.
Value Space and Lexical Space
A value space contains the maximum allowed set of values for a given datatype. Each value in the value space of a datatype is denoted by one or more literals in the lexical space of that datatype. A lexical space is the set of valid literals for a datatype. Consider this metaphor: In the English language (and in fact in all languages), we have various words that share the same meaning. A value can be correlated to a word's meaning, and the corresponding literals can then be correlated to various different words, all having the same meaning. For example, 100.0, 200.0, and so on are values in the value space of datatype float. The value 100.0 can be represented using multiple literals such as 10.0E+1, 1.0E2, 1.0E+2, and so on. Similarly, the value 200.0 can be represented using multiple literals such as 2.0E2, 2.0E+2, and so on. All such literals for every value in the value space of float belong to the lexical space of datatype float. (See Figure 1.)
Figure 1: A value in the value space can map to many literals in the lexical space.
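The many-to-one relationship between literals and values can be tried out in a few lines of Python. This is only an analogy: Python's float is a 64-bit double rather than the 32-bit xsd:float, and the helper name value_space_of is made up for this sketch.

```python
def value_space_of(literals):
    """Collapse a list of literals (lexical space) into the set of values they denote."""
    return {float(literal) for literal in literals}

# Several literals from the lexical space of float...
LITERALS_FOR_100 = ["10.0E+1", "1.0E2", "1.0E+2", "100.0"]
LITERALS_FOR_200 = ["2.0E2", "2.0E+2"]

# ...denote a single value each in the value space.
print(value_space_of(LITERALS_FOR_100))
print(value_space_of(LITERALS_FOR_200))
```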
Canonical Lexical Representation
A canonical lexical representation is a set of literals from among the valid set of literals for a datatype such that there is a one-to-one mapping between literals in the canonical lexical representation and values in the value space. (See Figures 2 and 3.)
Figure 2: Many literals in the lexical space map to exactly one literal in the canonical lexical representation.
Figure 3: There is always a one-to-one mapping from the value space to the canonical lexical representation.
Canonical representations serve no purpose within XML Schema itself but are useful in other specifications that use XML Schema datatypes. For example, the XQuery/XPath data model uses XML Schema types as well as the canonical lexical representation to serialize a value. Therefore, when serializing a value such as 100.0, the corresponding canonical lexical representation is used: in this case, 1.0E2.
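A canonical form of this shape (one non-zero digit before the decimal point, at least one digit after it, and an exponent without a plus sign or leading zeros) can be sketched in Python as follows. The function name canonical_float is an assumption, the sketch handles finite values only (not INF or NaN), and it relies on Python's 64-bit float rather than the 32-bit xsd:float.

```python
from decimal import Decimal

def canonical_float(literal):
    """Map any finite float literal to a mantissa-E-exponent form such as 1.0E2."""
    value = float(literal)
    # repr() gives the shortest literal that round-trips; Decimal exposes its digits.
    sign, digits, exponent = Decimal(repr(value)).normalize().as_tuple()
    # Exponent once the decimal point is moved to just after the first digit.
    scientific_exponent = exponent + len(digits) - 1
    fraction = "".join(str(digit) for digit in digits[1:]) or "0"
    return f"{'-' if sign else ''}{digits[0]}.{fraction}E{scientific_exponent}"

# Different literals for the same value collapse to one canonical literal.
for literal in ["100.0", "10.0E+1", "1.0E+2", "0.5"]:
    print(literal, "->", canonical_float(literal))
```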
Datatypes in XML Schema
Now that we understand the fundamental concepts of datatypes in general, let's explore the datatypes available in XML Schema. Broadly speaking, the datatypes in XML Schema can be categorized as ur-Type, built-in, and user-derived (see Table 1 below) and are related to each other as shown in Figure 4.
ur-Type            Built-in (Atomic)    User-Derived
anyType            Primitive            Restriction
anySimpleType      Derived              List
                                        Union
Table 1: XML Schema Datatype Classification
Figure 4: Relationships between datatypes supported by XML Schema
Now, let's examine the major classifications (ur-Type, built-in, and user-derived) more closely.
ur-Type
An ur-Type is a classification that says there exists a base or root of the entire type system hierarchy in XML Schema datatypes. Every datatype in XML Schema has the ur-Type as its parent or ancestor. The ur-Type has a role similar to that of java.lang.Object in Java, which is the base class of all built-in and user-defined classes in that language. Similarly, the ur-type is the base of all datatypes in XML Schema. anyType and anySimpleType are the two ur-types available in XML Schema.
anyType
The anyType datatype is a concrete ur-Type that can serve either as a complex type (non-scalar data, meaning elements) or as a simple type (scalar data), depending on the context. For example, here is an XML Schema using the anyType datatype:
Here is the corresponding valid instance using scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://mydatatypes.edu ex2.xsd" xmlns="http://mydatatypes.edu">USD</Currency>
And here is the corresponding valid instance using non-scalar data:
anySimpleType
The anySimpleType datatype is also a concrete ur-Type, and is the parent of all built-in datatypes and the ancestor of all user-derived scalar datatypes. It differs from anyType in that it can hold only scalar data corresponding to any scalar datatype, whereas anyType can hold scalar as well as non-scalar data. For example, here is an XML Schema using the anySimpleType datatype:
<?xml version="1.0" encoding="US-ASCII"?> <schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://mydatatypes.edu" elementFormDefault="qualified" attributeFormDefault="unqualified"> <element name="Currency" type="anySimpleType" /> </schema>
Here is the corresponding valid instance using scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://mydatatypes.edu ex3.xsd" xmlns="http://mydatatypes.edu">USD</Currency>
And here is the corresponding invalid instance using non-scalar data:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://mydatatypes.edu ex3.xsd" xmlns="http://mydatatypes.edu"> <dollars>100</dollars> </Currency>
In fact, if you don't specify any type for an element declaration, its type defaults to anyType, and if you don't specify any type for an attribute declaration, its type defaults to anySimpleType. In the example below, the type of element Currency defaults to anyType and the type of attribute MoreCurrency defaults to anySimpleType.
<?xml version="1.0" encoding="US-ASCII"?> <schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://mydatatypes.edu" elementFormDefault="qualified" attributeFormDefault="unqualified"> <element name="Currency" /> <attribute name="MoreCurrency" /> </schema>
Built-in Datatypes
Built-in datatypes, which are defined in the W3C XML Schema Datatype Specification, must be supported by all W3C XML Schema-compliant parsers. There are two classifications of built-in datatypes: primitive and derived. The differences between the two have little relevance for the user, but we will examine them here anyway to demonstrate the mechanics and utility of datatype generation. (See the built-in datatype inheritance diagram in the W3C specification.)
Built-in Primitive Datatypes
Primitive datatypes are indivisible. They are not defined in terms of other datatypes; they exist independently. For example, decimal is a well-defined mathematical concept that cannot be defined in terms of any other datatypes. There are 19 built-in primitive datatypes supported by the XML Schema Datatypes Specification:
string boolean decimal float double duration dateTime time date gYearMonth gYear gMonthDay gDay gMonth hexBinary base64Binary anyURI QName NOTATION
For details, see Section 3.2 of the XML Schema Part 2.
Built-in Derived Datatypes
Derived datatypes, in contrast, are divisible because they are derived from the built-in primitive datatypes; in other words, derived datatypes are defined in terms of other datatypes. For example, an integer is a well-defined mathematical concept that can be defined in terms of decimal with the restriction of not using the decimal point. There are 25 built-in derived datatypes supported by XML Schema Datatypes:
normalizedString token language NMTOKEN NMTOKENS Name NCName ID IDREF IDREFS ENTITY ENTITIES integer nonPositiveInteger negativeInteger long int short byte nonNegativeInteger unsignedLong unsignedInt unsignedShort unsignedByte positiveInteger
For details, see Section 3.3 of Part 2 of the XML Schema spec.
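The relationship between a primitive datatype and the types derived from it can be probed with the JDK's standard JAXP validation API. The following is a minimal sketch, not part of the original article: the class name BuiltinTypeCheck, the method isValidAs, and the throwaway element name v are hypothetical, and the text passed in is assumed to contain no XML markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class BuiltinTypeCheck {
    // Builds a one-element schema whose element "v" (hypothetical name) has the
    // given built-in type, then validates <v>text</v> against it.
    static boolean isValidAs(String typeName, String text) {
        String xsd = "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
                   + "<element name='v' type='" + typeName + "'/></schema>";
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself is wrong (for example an unknown type name).
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(
                    new StreamSource(new StringReader("<v>" + text + "</v>")));
            return true;
        } catch (SAXException e) {
            return false; // the value is not in the lexical/value space of the type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("12.5 as decimal: " + isValidAs("decimal", "12.5"));
        System.out.println("12.5 as integer: " + isValidAs("integer", "12.5"));
        System.out.println("0 as positiveInteger: " + isValidAs("positiveInteger", "0"));
    }
}
```

A value such as 12.5 should be accepted as decimal but rejected as integer, because integer is decimal restricted to values without a fractional part.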
User-Derived Datatypes
User-derived datatypes are the ones specified by the user in an XML Schema definition, and are created by restriction, list, or union. The XML Schema construct <simpleType> is used to create user-derived datatypes. Such a datatype can be named if it is to be re-used, or left anonymous if it is to be used only once. There has been some confusion because the specification currently categorizes list and union as user-derived datatypes. For clarity they should rather be categorized as user-defined datatypes. This confusion may be addressed in the next version of XML Schema.
User-Derived Datatype by Restriction
Every built-in datatype has a set of allowed constraining facets, which can be used to constrain or restrict that datatype, leading to the creation of a new datatype categorized as a user-derived datatype. A constraining facet is an optional property that can be applied to a datatype to constrain its "value space." Constraining the "value space" consequently constrains the "lexical space." Remember, the value space of a datatype can only be restricted and not extended. The XML Schema construct <restriction> is used to create user-derived datatypes by restricting an existing datatype with the allowed constraining facets. For example, a string of length 3 can be expressed as:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency">
    <simpleType>
      <restriction base="string">
        <length value="3" />
      </restriction>
    </simpleType>
  </element>
</schema>
In the above example, an anonymous user-derived datatype—the base datatype being string—is defined along with the constraining facet, length. The same example can be written using a named user-derived datatype for re-usability:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        xmlns:tns="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency" type="tns:currency_type" />
  <element name="MoreCurrency" type="tns:currency_type" />
  <simpleType name="currency_type">
    <restriction base="string">
      <length value="3" />
    </restriction>
  </simpleType>
</schema>
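To see the length facet in action, the named-type schema can be compiled and exercised with the JDK's standard JAXP validation API. This is a minimal sketch under stated assumptions: the class name LengthFacetCheck and its methods are hypothetical, the schema is embedded as a string rather than loaded from a file, and the text passed to isValid is assumed to contain no XML markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LengthFacetCheck {
    // The named currency_type schema, reduced to the Currency element.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://mydatatypes.edu' xmlns:tns='http://mydatatypes.edu'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<element name='Currency' type='tns:currency_type'/>"
        + "<simpleType name='currency_type'><restriction base='string'>"
        + "<length value='3'/></restriction></simpleType></schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when <Currency>text</Currency> validates against the schema.
    static boolean isValid(String text) {
        String doc = "<Currency xmlns='http://mydatatypes.edu'>" + text + "</Currency>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // a facet (here: length) or the base type was violated
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"USD", "US", "EURO"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Only the three-character value should validate; the shorter and longer strings violate the length facet.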
The following are the 12 constraining facets in XML Schema, which can be used to create a user-derived datatype from the available built-in datatypes. Which facets apply, however, depends on the base datatype:
length minLength maxLength pattern enumeration whiteSpace maxInclusive maxExclusive minInclusive minExclusive totalDigits fractionDigits
User-Defined List Datatype
In XML Schema a list is a sequence of items separated by whitespace (spaces, tabs, carriage returns, new lines), where all the items in the list have the same datatype. It is similar to an array in Java, which is self-describing. The XML Schema construct <list> is used to create a list datatype. For example, a list of float can be created as follows:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency">
    <simpleType>
      <list itemType="float" />
    </simpleType>
  </element>
</schema>
A list need not always be of a built-in datatype; it can also be a list of a user-derived datatype. For example, a list whose item type is derived from float, with values restricted to the range 10.0 to 20.0, can be expressed as:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency">
    <simpleType>
      <list>
        <simpleType>
          <restriction base="float">
            <minInclusive value="10.0" />
            <maxInclusive value="20.0" />
          </restriction>
        </simpleType>
      </list>
    </simpleType>
  </element>
</schema>
To re-use the above defined list datatype, we must name the list datatype as follows:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        xmlns:tns="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency" type="tns:listOfFloat" />
  <simpleType name="listOfFloat">
    <list>
      <simpleType>
        <restriction base="float">
          <minInclusive value="10.0" />
          <maxInclusive value="20.0" />
        </restriction>
      </simpleType>
    </list>
  </simpleType>
</schema>
A valid instance adhering to the above schema can hold a list of float values in the range 10.0 to 20.0, both inclusive:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://mydatatypes.edu ex5.xsd" xmlns="http://mydatatypes.edu">10.0 12.4 15.0</Currency>
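The named list datatype can be checked the same way with the JDK's standard JAXP validation API. The sketch below is illustrative only: the class name ListTypeCheck and its methods are hypothetical, the listOfFloat schema is embedded as a string, and the text passed to isValid is assumed to contain no XML markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ListTypeCheck {
    // The named listOfFloat schema: a list whose items are floats in [10.0, 20.0].
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://mydatatypes.edu' xmlns:tns='http://mydatatypes.edu'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<element name='Currency' type='tns:listOfFloat'/>"
        + "<simpleType name='listOfFloat'><list><simpleType>"
        + "<restriction base='float'>"
        + "<minInclusive value='10.0'/><maxInclusive value='20.0'/>"
        + "</restriction></simpleType></list></simpleType></schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when <Currency>text</Currency> validates against the schema.
    static boolean isValid(String text) {
        String doc = "<Currency xmlns='http://mydatatypes.edu'>" + text + "</Currency>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // some item is not a float or lies outside [10.0, 20.0]
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("10.0 12.4 15.0")); // every item in range
        System.out.println(isValid("10.0 25.0"));      // second item out of range
        System.out.println(isValid("10.0\n12.4\t15.0")); // any whitespace separates items
    }
}
```

Each whitespace-separated item is validated against the item type on its own, so a single out-of-range or non-numeric item invalidates the whole list.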
In the above example the items in the list are restricted to values from 10.0 to 20.0, but there is no restriction on the number of items in the list. To restrict the number of items in the list to, say, 3, we can do the following:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        xmlns:tns="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency">
    <simpleType>
      <restriction base="tns:listOfFloat">
        <length value="3" />
      </restriction>
    </simpleType>
  </element>
  <simpleType name="listOfFloat">
    <list>
      <simpleType>
        <restriction base="float">
Here we used the length facet to restrict the number of items in the list. For datatypes derived from a list datatype, regardless of the datatype of the list's itemType, only the following facets are allowed:
length minLength maxLength pattern enumeration whiteSpace
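Applied to a list, the length facet counts items rather than characters. The following sketch checks this with the JDK's standard JAXP validation API. It rests on assumptions: the class name ListLengthCheck and its methods are hypothetical, and the embedded schema is our own reconstruction in which listOfFloat is taken to be the same 10.0 to 20.0 list defined earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ListLengthCheck {
    // Assumed schema: Currency restricts tns:listOfFloat to exactly 3 items.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://mydatatypes.edu' xmlns:tns='http://mydatatypes.edu'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<element name='Currency'><simpleType>"
        + "<restriction base='tns:listOfFloat'><length value='3'/></restriction>"
        + "</simpleType></element>"
        + "<simpleType name='listOfFloat'><list><simpleType>"
        + "<restriction base='float'>"
        + "<minInclusive value='10.0'/><maxInclusive value='20.0'/>"
        + "</restriction></simpleType></list></simpleType></schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when <Currency>text</Currency> validates against the schema.
    static boolean isValid(String text) {
        String doc = "<Currency xmlns='http://mydatatypes.edu'>" + text + "</Currency>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // wrong item count, or an item outside [10.0, 20.0]
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("10.0 12.4 15.0")); // exactly three items
        System.out.println(isValid("10.0 12.4"));      // too few items
    }
}
```

Both constraints apply together: a list must have exactly three items, and each item must still satisfy the range restriction of the item type.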
User-Derived Union Datatype
A union datatype is created by taking a union of one or more other datatypes. The XML Schema construct <union> is used to create union datatypes. For example, a union of int and float datatypes can be expressed as:
<?xml version="1.0" encoding="US-ASCII"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://mydatatypes.edu"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="Currency">
    <simpleType>
      <union memberTypes="int float" />
    </simpleType>
  </element>
</schema>
When validating the value of Currency in the instance, it is first matched against datatype int. If it is not a valid int, then it is matched against datatype float. If it is not a valid float either, then an error is raised. As you can see, the order in which memberTypes are declared is indeed significant, but only from a datatype validator's perspective. From the user's perspective, the order of memberTypes is not significant at all.
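From the user's side, a union simply accepts anything that one of its member types accepts. A minimal sketch using the JDK's standard JAXP validation API illustrates this. The class name UnionTypeCheck and its methods are hypothetical, the int/float union schema is embedded as a string, and the text passed to isValid is assumed to contain no XML markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UnionTypeCheck {
    // Currency is an anonymous union of the built-in types int and float.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://mydatatypes.edu'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<element name='Currency'><simpleType>"
        + "<union memberTypes='int float'/>"
        + "</simpleType></element></schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when <Currency>text</Currency> validates against the schema.
    static boolean isValid(String text) {
        String doc = "<Currency xmlns='http://mydatatypes.edu'>" + text + "</Currency>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // the value matched neither int nor float
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("42"));  // matches the first member, int
        System.out.println(isValid("4.2")); // not an int, matches float
        System.out.println(isValid("abc")); // matches neither member
    }
}
```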
Similar to a list, a union can be of built-in datatypes as well as user-derived datatypes. For example, a union of user-derived datatypes from int and float can be expressed as follows:
<?xml version="1.0" encoding="US-ASCII"?> <schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://mydatatypes.edu" xmlns:tns="http://mydatatypes.edu"
A valid instance adhering to the above schema can hold either a single int between the range 10 and 20 or a single float between the range 30.0 and 40.0, both inclusive:
<Currency xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://mydatatypes.edu ex7.xsd" xmlns="http://mydatatypes.edu">35.0</Currency>
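The behavior described for this union (an int from 10 to 20, or a float from 30.0 to 40.0) can be tried out with the JDK's standard JAXP validation API. The sketch below does not reproduce the article's schema: the embedded schema is our own assumption, written with two anonymous member types to match the description, and the class name RestrictedUnionCheck and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RestrictedUnionCheck {
    // Assumed schema: union of (int in [10, 20]) and (float in [30.0, 40.0]).
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://mydatatypes.edu'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<element name='Currency'><simpleType><union>"
        + "<simpleType><restriction base='int'>"
        + "<minInclusive value='10'/><maxInclusive value='20'/>"
        + "</restriction></simpleType>"
        + "<simpleType><restriction base='float'>"
        + "<minInclusive value='30.0'/><maxInclusive value='40.0'/>"
        + "</restriction></simpleType>"
        + "</union></simpleType></element></schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true when <Currency>text</Currency> validates against the schema.
    static boolean isValid(String text) {
        String doc = "<Currency xmlns='http://mydatatypes.edu'>" + text + "</Currency>";
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // the value satisfied neither restricted member type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"15", "35.0", "35", "25"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Note the value 35: it fails the restricted int member because it is out of range, but it is still accepted because it is a valid float between 30.0 and 40.0. A value such as 25 satisfies neither member and is rejected.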
When restricting a union datatype, regardless of the datatype of the individual memberTypes, only the following facets are allowed:
pattern enumeration
It is possible to mix and match list, union, and atomic datatypes with restrictions to define a datatype per specific requirements. For more details about constraining facets, see Section 4.1.5 of XML Schema Part 2 and Appendix B of XML Schema Part 0.
Datatype Namespaces
The datatypes that we have seen thus far are associated with the XML Schema namespace http://www.w3.org/2001/XMLSchema, which has other XML Schema constructs as well, like complexType, complexContent, group, and so on.
Because the W3C XML Schema Datatypes spec was written to be used not only within the XML Schema definition language but also by other XML-related languages, it provides a subset namespace of http://www.w3.org/2001/XMLSchema, namely http://www.w3.org/2001/XMLSchema-datatypes, which contains only the built-in datatypes, constraining facets, and so on needed to facilitate the use of XML Schema datatypes in other languages. The advantage of this separation shows up in datatype validator implementations: a standalone implementation of XML Schema datatypes is possible, without having to implement the entire XML Schema Structures specification as well.
Using Oracle XDK
Apart from validating an XML instance against the XML Schema grammar, the Oracle XML Developer's Kit (XDK) provides APIs to programmatically use the built-in datatypes, restrict them using the constraining facets, and validate a value against the schema. For example:
import oracle.xml.parser.schema.*;
. . .
XSDSimpleType st = XSDSimpleType.getPrimitiveType(XSDSimpleType.iSTRING);
try {
    // set a constraining facet on the simpleType
    st.setFacet(XSDSimpleType.LENGTH, "5");
} catch (XSDException ex1) {
    System.out.println("[ERROR] Facet not supported. " + ex1.getMessage());
}
try {
    // validate value
    st.validateValue("hello");
    System.out.println("[SUCCESS] The value is valid.");
} catch (XSDException ex2) {
    System.out.println("[ERROR] Invalid Value. " + ex2.getMessage());
}
The code above creates an anonymous datatype based on string and restricts it so that only strings of length 5 validate successfully. You can use the XDK Schema APIs to create datatypes and restrict them programmatically. See the XDK javadoc for more details.
Conclusion
Now that you understand datatypes in XML Schema and their usage, moving to other constructs of XML Schema, which define complex element content, should be much easier.