
To make it easier to work with XML content from object-oriented programming languages, the W3C has
defined an object model for XML, known as the Document Object Model, or DOM. The
DOM treats the XML document as a tree.
The W3C DOM specification consists of three levels. Levels 1 and 2 are both Recommendations. At the time of
writing, Level 3 is still under development. An XML parser can claim DOM compliance if it exposes the parsed
content through the standard DOM interfaces, in the expected tree structure. The DOM is available for HTML
documents as well. In this chapter we will investigate using the Document Object Model to manipulate the XML
tree. We will see how to add, modify, delete, and replace elements and attributes. In addition, we will see
how to traverse the XML tree.
What is the Document Object Model?
• The DOM is a programming API for XML and HTML documents.
• Utilizing the DOM, programmers may build documents, navigate them, and add, modify, or delete
elements and content.
• The essence of the DOM is to treat an XML (or HTML) hierarchical document as a tree, or more
accurately a “grove”.
- The term grove is used to describe a document that may contain more than one tree.
• As a goal of the W3C, the DOM should be accessible from a wide variety of programming languages,
environments, and applications.
• The W3C has therefore put forth a set of definitions to be implemented by any parser claiming DOM
compliance:
- A set of interfaces and objects used to represent and manipulate a document
- The semantics of these objects – including attributes and behaviors
- The relationships and collaborations between these objects
Document Object Model Specifications
• The W3C has three DOM specifications: Level 1, Level 2 and Level 3.
• The Level 1 document has reached Recommendation status. It defined the core of the DOM.
• Level 2 is also a Recommendation. It specifies modifications to the core interfaces defined in Level 1, and
adds support for newer XML features such as stylesheets and namespaces.
• At the time of writing, Level 3 is not yet a Recommendation.
- It will support XML Schema.
- It may also resolve the loading and saving issue. Today each language and parser implements its own
mechanism for loading and saving documents, and these differences have caused some confusion.
- We won’t be covering any Level 3 features; they are too new.
• For an object model to be considered DOM compliant, it must expose particular properties, methods
and events.
• There are several freely available parsers for working with the DOM. Two of the more popular are
Xerces and Crimson.
- The Xerces parser is an open source parser with DOM support
- The Crimson parser, formerly Sun's parser, also has DOM support.
DOM Interfaces
• There are fundamental interfaces specified by the DOM.
- DOMImplementation: Provides a way to query the version of DOM implemented by a particular
parser instance.
- DOMException: This provides a construct for reporting errors when working with the DOM.
- DocumentFragment: This is a lightweight, minimal document object, used for working with subtrees
of an XML document without all the overhead of a full document.
- Document: This interface represents the entire HTML or XML document. Conceptually, it is the root
of the document tree, and provides the primary access to the data. Extends Node.
- Node: This is the primary datatype for the entire DOM.
- NodeList: This is the abstraction of an ordered collection of nodes. This is a live collection.
- NamedNodeMap: This provides a collection of nodes accessible by name. They are not maintained in
any particular order. This is also a live collection.
- CharacterData: This extends the Node interface with a set of attributes and methods for accessing
character data in the DOM.
- Attr: Extends the Node interface. This represents an XML attribute in an Element object. Attr nodes are
not considered part of the document tree, because they are not child nodes of the element they describe.
- Element: Extends the Node interface. This is an element in an XML or HTML document.
- Text: This interface inherits from CharacterData and represents the textual content of an Element or
Attr.
- Comment: Inherits from CharacterData. This represents the content of a comment, for example <!-- blah -->.
• There are several extended interfaces. They are not mandatory: an implementation that deals only with HTML
does not need to implement them.
- CDATASection: Extends the Text interface. This represents a CDATA section, a block used to escape text
containing characters that would otherwise be treated as markup.
- DocumentType: Extends the Node interface. This interface provides access to the list of entities
defined for a document in a DTD. In Level 1, editing of this node is not supported.
- Notation: Extends the Node interface. This represents a notation declared in the DTD. Also read-only.
- Entity: Extends the Node interface. The entity may be parsed or unparsed. This interface models the entity
itself, not the entity reference. Read-only.
- EntityReference: Extends the Node interface. This represents a reference, in an XML document, to an
entity defined in the DTD. Read-only.
- ProcessingInstruction: Extends the Node interface. This represents a processing instruction (PI) in an XML
document.
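• In addition, there is one basic native datatype, the DOMString.
- It is defined as a sequence of 16-bit units, using the UTF-16 encoding.
- All names and values in the DOM, including element content and attribute values, are represented as
DOMStrings. The DOM defines no other data types for document content.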
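• To see several of these interfaces working together, here is a minimal sketch that builds a small document from scratch with DOMImplementation.createDocument and then visits its children as generic Node objects. It assumes the JAXP parser bundled with the JDK; the class name BuildFromScratch and the element names are only illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.CharacterData;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class BuildFromScratch {
    // Builds <inventory><item>Widget</item><!-- restock --></inventory> entirely in memory
    static Document build() {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            // createDocument(namespaceURI, qualifiedName, doctype) also creates the root element
            Document doc = impl.createDocument(null, "inventory", null);
            Element item = doc.createElement("item");
            item.appendChild(doc.createTextNode("Widget"));    // Text extends CharacterData
            Element root = doc.getDocumentElement();
            root.appendChild(item);
            root.appendChild(doc.createComment(" restock "));  // Comment extends CharacterData
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        Element root = build().getDocumentElement();
        // Elements and comments are both Nodes, so one loop can visit them all
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName() + " is CharacterData: " + (n instanceof CharacterData));
        }
    }
}
```

Running main prints one line for the item element (not CharacterData) and one for the comment (CharacterData), which mirrors the inheritance relationships listed above.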
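• The inheritance hierarchy of the node interfaces looks like this:
Node
  Attr
  CharacterData
    Comment
    Text
      CDATASection
  Document
  DocumentFragment
  DocumentType
  Element
  Entity
  EntityReference
  Notation
  ProcessingInstruction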
Begin Working with DOM
• The typical first step to working with the DOM is to load an existing document into a DOM tree.
- parse: Parse an existing document from a file, stream, or other input source.
• The Java API for XML Processing, JAXP, uses the Factory design pattern.
- A DocumentBuilderFactory is created first.
- The DocumentBuilderFactory is used to create a DocumentBuilder.
- The DocumentBuilder parses an existing file, stream, etc. and loads the DOM Document.
import org.w3c.dom.*; // Include the DOM interfaces
import javax.xml.parsers.*; // JAXP parser package
import java.io.*;
// … snip
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("C:\\inventory.xml");
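• The parse method is not limited to file names. The sketch below, assuming the same JAXP setup, reads a document from an in-memory string (wrapped in a SAX InputSource) and from an InputStream. The ParseSources class and its helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseSources {
    // DocumentBuilder.parse is overloaded: it accepts a File, a URI string,
    // an InputStream, or a SAX InputSource
    static Document fromStream(InputStream in) {
        try {
            return newBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse stream", e);
        }
    }

    // Wraps an in-memory string in an InputSource, which is handy for quick tests
    static Document fromString(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse string", e);
        }
    }

    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public static void main(String[] args) {
        Document a = fromString("<inventory><item/></inventory>");
        byte[] bytes = "<catalog/>".getBytes(StandardCharsets.UTF_8);
        Document b = fromStream(new ByteArrayInputStream(bytes));
        System.out.println(a.getDocumentElement().getNodeName()); // inventory
        System.out.println(b.getDocumentElement().getNodeName()); // catalog
    }
}
```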
• Also, in the Java language binding, the DOM attributes (properties) are not exposed as public fields.
- Each attribute is read through a method whose name begins with “get”.
- Attributes that can be written also have a method whose name begins with “set”.
Document Interface
• The Document interface provides factory methods for creating new nodes, as well as methods for locating
elements and importing nodes from other documents.
Method | Description
Attr createAttribute(name) | Creates a new attribute with the specified name.
Attr createAttributeNS(uri, qname) | Creates an attribute with the given namespace URI and qualified name.
CDATASection createCDATASection(data) | Creates a CDATA section node that contains the supplied data.
Comment createComment(data) | Creates a comment node that contains the supplied data.
DocumentFragment createDocumentFragment() | Creates an empty DocumentFragment object.
Element createElement(tagName) | Creates an element with the given tag name.
Element createElementNS(uri, qname) | Creates an element with the given namespace URI and qualified name.
EntityReference createEntityReference(name) | Creates a new EntityReference object.
ProcessingInstruction createProcessingInstruction(target, data) | Creates a PI node that contains the supplied target and data.
Text createTextNode(data) | Creates a text node that contains the supplied data.
Element getElementById(id) | Returns the element whose ID-typed attribute has the given value, or null if there is none.
NodeList getElementsByTagName(tagname) | Returns a NodeList of all elements in the document with the specified tag name, in document order.
NodeList getElementsByTagNameNS(uri, lname) | Returns a NodeList of all elements with the specified namespace URI and local name.
Node importNode(importedNode, deep) | Imports a copy of a node from another document. If deep is true, the copy includes all descendants.
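• Here is a minimal sketch of several of these factory methods in use, assuming the JDK's bundled JAXP parser. It builds new items inside a DocumentFragment, counts them with getElementsByTagName, and copies a subtree into a second document with importNode. The class, method, and element names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DocumentFactoryDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds one <item> per name inside a DocumentFragment, then attaches them all at once.
    // Returns the total number of <item> elements in the document afterwards.
    static int addItems(Document doc, String... names) {
        DocumentFragment frag = doc.createDocumentFragment();
        for (String name : names) {
            Element item = doc.createElement("item");
            Attr nameAttr = doc.createAttribute("name");
            nameAttr.setValue(name);
            item.setAttributeNode(nameAttr);
            // A CDATA section lets the "<" through without it being treated as markup
            item.appendChild(doc.createCDATASection("<note> for " + name));
            frag.appendChild(item);
        }
        // Appending a fragment moves its children into the tree; the fragment itself is not inserted
        doc.getDocumentElement().appendChild(frag);
        return doc.getElementsByTagName("item").getLength();
    }

    // A node owned by one document cannot be appended to another directly,
    // so importNode makes a copy owned by dest (deep = true copies descendants too)
    static Node copyRootInto(Document src, Document dest) {
        Node copy = dest.importNode(src.getDocumentElement(), true);
        dest.getDocumentElement().appendChild(copy);
        return copy;
    }

    public static void main(String[] args) {
        Document inventory = parse("<inventory><item name=\"bolt\"/></inventory>");
        System.out.println(addItems(inventory, "nut", "washer")); // 3
        Document archive = parse("<archive/>");
        copyRootInto(inventory, archive);
        System.out.println(archive.getElementsByTagName("item").getLength()); // 3
    }
}
```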
• The Document interface also defines several attributes.
Attribute | Description | Read/Write?
doctype | The document type declaration (a DocumentType), or null if the document has none. | Read-only
documentElement | Convenience attribute for retrieving the root element of the document. | Read-only
implementation | The DOMImplementation object handling this document. | Read-only
• In Java, these attributes are read through getter methods rather than accessed directly.
Method | Description
Element getDocumentElement() | Returns the root element.
DocumentType getDoctype() | Returns the document type declaration, or null.
DOMImplementation getImplementation() | Returns the DOMImplementation object.
import org.w3c.dom.*; // Include the DOM interfaces
import javax.xml.parsers.*; // JAXP parser package
import java.io.*;
// … snip
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("C:\\inventory.xml");
Element root = doc.getDocumentElement();
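• The sketch below reads the Document attributes through their getters. It assumes the JDK's JAXP parser; the DocumentInfo class, the describe helper, and the inline DOCTYPE are hypothetical. Note that getDoctype returns null when the document has no DOCTYPE declaration, so code should check for that.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DocumentInfo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Summarizes the root element and the (optional) document type declaration
    static String describe(Document doc) {
        DocumentType dt = doc.getDoctype();             // null when there is no <!DOCTYPE>
        String doctype = (dt == null) ? "none" : dt.getName();
        String root = doc.getDocumentElement().getTagName();
        return "root=" + root + ", doctype=" + doctype;
    }

    public static void main(String[] args) {
        Document plain = parse("<inventory/>");
        Document typed = parse("<!DOCTYPE inventory [<!ELEMENT inventory ANY>]><inventory/>");
        System.out.println(describe(plain)); // root=inventory, doctype=none
        System.out.println(describe(typed)); // root=inventory, doctype=inventory
        // getImplementation() gives access to feature queries such as hasFeature
        System.out.println(plain.getImplementation().hasFeature("XML", "2.0"));
    }
}
```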
Node Interface
• The Node interface defines constants for the possible node types.
- Keep in mind that the type of node dictates whether it may have children, attributes, etc.
- Also remember that in the DOM, nearly everything is a node! This can be confusing.
Name | Value
ELEMENT_NODE | 1
ATTRIBUTE_NODE | 2
TEXT_NODE | 3
CDATA_SECTION_NODE | 4
ENTITY_REFERENCE_NODE | 5
ENTITY_NODE | 6
PROCESSING_INSTRUCTION_NODE | 7
COMMENT_NODE | 8
DOCUMENT_NODE | 9
DOCUMENT_TYPE_NODE | 10
DOCUMENT_FRAGMENT_NODE | 11
NOTATION_NODE | 12
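• In Java these values are the short constants on org.w3c.dom.Node, so code that must treat node types differently typically switches on getNodeType(). Here is a minimal sketch assuming the JDK's JAXP parser with default settings (which keep comments and CDATA sections as separate nodes); NodeTypes and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeTypes {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Maps the numeric nodeType constant to a readable label
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:                return "element";
            case Node.TEXT_NODE:                   return "text";
            case Node.CDATA_SECTION_NODE:          return "cdata";
            case Node.COMMENT_NODE:                return "comment";
            case Node.PROCESSING_INSTRUCTION_NODE: return "pi";
            default:                               return "other(" + n.getNodeType() + ")";
        }
    }

    // Lists the kind of each direct child of the given node, in document order
    static List<String> childKinds(Node parent) {
        List<String> kinds = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            kinds.add(kind(c));
        }
        return kinds;
    }

    public static void main(String[] args) {
        Document doc = parse("<r>text<!--c--><?pi data?><![CDATA[x]]><e/></r>");
        System.out.println(childKinds(doc.getDocumentElement())); // [text, comment, pi, cdata, element]
        System.out.println(kind(doc));                            // other(9), the DOCUMENT_NODE value
    }
}
```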
• The Node interface defines the following attributes.
Attribute | Type | Description | Read/Write?
attributes | NamedNodeMap | The attributes of this node if it is an Element; null otherwise. | Read-only
childNodes | NodeList | The children of this node. If there are none, the NodeList is empty. | Read-only
firstChild | Node | The first child of this node, or null if there is none. | Read-only
lastChild | Node | The last child of this node, or null if there is none. | Read-only
localName | DOMString | The local part of the qualified name of this node. Always null for nodes created with Level 1 methods such as createElement. | Read-only
namespaceURI | DOMString | The namespace URI of this node, or null if it is unspecified. | Read-only
nextSibling | Node | The node immediately following this node in its parent's child list, or null if there is none. | Read-only
nodeName | DOMString | The qualified name of the element, attribute, or entity reference, or a fixed string for other node types. | Read-only
nodeType | unsigned short | The node type, which determines valid values and whether the node can have children. One of the constants listed above. | Read-only
nodeValue | DOMString | The value of the node; its meaning depends on the node type. Setting it has no effect when the value is defined to be null. | Read/write
ownerDocument | Document | The Document object associated with this node. | Read-only
parentNode | Node | The parent of this node, or null if the node has no parent (for example, a Document or an Attr). | Read-only
prefix | DOMString | The namespace prefix of this node, or null if it is unspecified. | Read/write
previousSibling | Node | The node immediately preceding this node in its parent's child list, or null if there is none. | Read-only
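• The navigation attributes are enough to traverse the whole tree. The sketch below, assuming the JDK's JAXP parser, walks down with firstChild and nextSibling and walks up with parentNode. TreeWalk, collect, and pathOf are hypothetical names, and the "/a/b" path format is only an illustration (it is not XPath evaluation).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TreeWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Depth-first walk using only firstChild and nextSibling; records "depth:name" for each element
    static void collect(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(depth + ":" + node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, depth + 1, out);
        }
    }

    // Walks upward with parentNode, stopping at the Document, to build a path like /inventory/item/name
    static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<inventory><item><name>A</name></item><item/></inventory>");
        List<String> out = new ArrayList<>();
        collect(doc.getDocumentElement(), 0, out);
        System.out.println(out);                                              // [0:inventory, 1:item, 2:name, 1:item]
        System.out.println(pathOf(doc.getElementsByTagName("name").item(0))); // /inventory/item/name
    }
}
```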
• The values of nodeName, nodeValue, and attributes vary according to the type of node.
Interface | nodeName | nodeValue | attributes
Attr | name of attribute | value of attribute | null
CDATASection | #cdata-section | content of the CDATASection | null
Comment | #comment | content of the comment | null
Document | #document | null | null
DocumentFragment | #document-fragment | null | null
DocumentType | document type name | null | null
Element | tag name | null | NamedNodeMap
Entity | entity name | null | null
EntityReference | name of entity referenced | null | null
Notation | notation name | null | null
ProcessingInstruction | target | entire content, excluding the target | null
Text | #text | content of the text node | null
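• A quick way to check this table is to print nodeName and nodeValue for a few node types. This is a minimal sketch assuming the JDK's JAXP parser; NodeValues and describe are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeValues {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Formats nodeName and nodeValue as "name=value"; a null nodeValue prints as "null"
    static String describe(Node n) {
        return n.getNodeName() + "=" + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<item id=\"7\">Widget<!-- note --></item>");
        Element item = doc.getDocumentElement();
        System.out.println(describe(doc));                         // #document=null
        System.out.println(describe(item));                        // item=null
        System.out.println(describe(item.getAttributeNode("id"))); // id=7
        System.out.println(describe(item.getFirstChild()));        // #text=Widget
        System.out.println(describe(item.getLastChild()));         // #comment= note
        // Only Element nodes have a NamedNodeMap of attributes
        System.out.println(item.getFirstChild().getAttributes());  // null
    }
}
```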
• There are several methods available for working with the Node.
Method | Description
Node appendChild(newChild) | Adds newChild to the end of this node's list of children and returns it. If newChild is already in the tree, it is first removed.
Node cloneNode(deep) | Returns a duplicate of this node that has no parent. Cloning an Element copies all of its attributes and their values. If deep is true, the subtree under the node is cloned recursively.
boolean hasAttributes() | Returns true if this node has any attributes.
boolean hasChildNodes() | Returns true if this node has children.
Node insertBefore(newChild, refChild) | Inserts newChild before refChild and returns it. If refChild is null, newChild is added at the end of the list of children.
boolean isSupported(feature, version) | Returns true if the specified feature is supported on this node.
void normalize() | Puts all Text nodes in the subtree into normal form, in which there are no adjacent Text nodes and no empty Text nodes. This is useful for operations, such as XPointer lookups, that depend on a particular tree structure.
Node removeChild(oldChild) | Removes oldChild from the list of children and returns it.
Node replaceChild(newChild, oldChild) | Replaces oldChild with newChild in this node's list of children, and returns the node that was replaced.
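• The following sketch exercises these methods one after another and records the element's children after each step. It assumes the JDK's JAXP parser; EditNodes, steps, and childNames are hypothetical names, and the single-letter element names are arbitrary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EditNodes {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Comma-separated nodeNames of the direct children
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (i > 0) sb.append(',');
            sb.append(kids.item(i).getNodeName());
        }
        return sb.toString();
    }

    static List<String> steps() {
        Document doc = parse("<list><b/><d/></list>");
        Element root = doc.getDocumentElement();
        List<String> log = new ArrayList<>();
        root.insertBefore(doc.createElement("a"), root.getFirstChild());
        log.add(childNames(root));                                    // a,b,d
        root.insertBefore(doc.createElement("e"), null);              // null refChild appends at the end
        log.add(childNames(root));                                    // a,b,d,e
        Node old = root.replaceChild(doc.createElement("c"), root.getElementsByTagName("b").item(0));
        log.add(childNames(root) + " replaced " + old.getNodeName()); // a,c,d,e replaced b
        root.removeChild(root.getLastChild());
        log.add(childNames(root));                                    // a,c,d
        Node copy = root.cloneNode(true);                             // deep copy, not attached anywhere
        log.add("clone parent=" + copy.getParentNode() + ", children=" + copy.getChildNodes().getLength());
        root.appendChild(doc.createTextNode("x"));
        root.appendChild(doc.createTextNode("y"));
        log.add(childNames(root));                                    // a,c,d,#text,#text
        root.normalize();                                             // merges the adjacent text nodes
        log.add(childNames(root) + " text=" + root.getLastChild().getNodeValue()); // a,c,d,#text text=xy
        return log;
    }

    public static void main(String[] args) {
        steps().forEach(System.out::println);
    }
}
```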
• In Java, the Node attributes are read, and in some cases written, through accessor methods rather than
accessed directly.
Method | Attribute
NamedNodeMap getAttributes() | attributes
NodeList getChildNodes() | childNodes
Node getFirstChild() | firstChild
Node getLastChild() | lastChild
String getLocalName() | localName
String getNamespaceURI() | namespaceURI
Node getNextSibling() | nextSibling
String getNodeName() | nodeName
short getNodeType() | nodeType
String getNodeValue() | nodeValue
Document getOwnerDocument() | ownerDocument
Node getParentNode() | parentNode
String getPrefix() | prefix
Node getPreviousSibling() | previousSibling
void setNodeValue(String) | nodeValue
void setPrefix(String) | prefix
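• Because nodeValue means different things for different node types, setNodeValue behaves differently too. This minimal sketch, assuming the JDK's JAXP parser, changes a Text node's content and an Attr's value, and shows that setting it on an Element has no effect, since an Element's nodeValue is defined to be null. Accessors and reprice are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class Accessors {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element reprice() {
        Element item = parse("<item price=\"10\">Widget</item>").getDocumentElement();
        Node text = item.getFirstChild();
        text.setNodeValue("Gadget");                       // Text: nodeValue is the character content
        item.getAttributeNode("price").setNodeValue("12"); // Attr: nodeValue is the attribute value
        item.setNodeValue("ignored");                      // Element: nodeValue is null, so no effect
        return item;
    }

    public static void main(String[] args) {
        Element item = reprice();
        System.out.println(item.getAttribute("price"));            // 12
        System.out.println(item.getFirstChild().getNodeValue());   // Gadget
        System.out.println(item.getNodeValue());                   // null
        System.out.println(item.getNodeType() == Node.ELEMENT_NODE); // true
    }
}
```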
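• The typical steps in editing a DOM document, then, are:
- Create new nodes with the Document interface's create methods (createElement, createTextNode, etc.).
- Attach them to the tree with Node methods such as appendChild, insertBefore, and replaceChild.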
• Here is an example of working with the Node interface to add new nodes to a document.
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("C:\\inventory.xml");
Element root = doc.getDocumentElement();
Element newChild = doc.createElement("NewChild");
Text newText = doc.createTextNode("New Text");
// Note that at this point the text node has no home. We must assign it to a location.
newChild.appendChild(newText);
// Note that at this point the element node has no home. We must assign it to a location.
root.appendChild(newChild);
}
catch (Exception e) {
System.out.println("An error was encountered " + e.getMessage());
}
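• The example above changes only the in-memory tree; nothing is written back to inventory.xml. As noted earlier, saving is not standardized in Levels 1 and 2, so the approach varies. In Java, one common approach, assumed here, is a JAXP identity Transformer (a Transformer created without a stylesheet), which copies the DOM tree to an output stream unchanged. The SaveSketch class and its helpers are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SaveSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a node with an identity Transformer, which copies the tree as-is
    static String toXml(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize", e);
        }
    }

    // Repeats the edit from the example above on a small document, then serializes it
    static String addAndSave() {
        Document doc = parse("<inventory/>");
        Element newChild = doc.createElement("NewChild");
        newChild.appendChild(doc.createTextNode("New Text"));
        doc.getDocumentElement().appendChild(newChild);
        return toXml(doc);
    }

    public static void main(String[] args) {
        System.out.println(addAndSave());
    }
}
```

To write to a file instead of a string, the same transform call can target new StreamResult(new java.io.File(...)).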
NodeList Interface
• The NodeList interface provides the abstraction of an ordered collection of nodes, without defining or
constraining how this collection is implemented.
• It is critical to understand that this is a live collection.
- As changes are made to the tree, they are automatically reflected in the NodeList.
• There is one attribute, length, which is read-only. It gives the number of nodes in the list.
- Valid indices therefore run from 0 to length-1 inclusive.
• There is one method, item. It accepts an index as a parameter and returns the node at that position in the
collection, or null if the index is out of range.
• An example using the NodeList.
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse("C:\\inventory.xml");
Element root = doc.getDocumentElement();
NodeList allNodes = root.getChildNodes();
for (int i = 0; i<allNodes.getLength(); i++) {
System.out.println("node name " +
allNodes.item(i).getNodeName());
}
Element newChild = doc.createElement("NewChild");
Text newText = doc.createTextNode("New Text");
newChild.appendChild(newText);
root.appendChild(newChild);
for (int i = 0; i<allNodes.getLength(); i++) {
System.out.println("node name " +
allNodes.item(i).getNodeName());
}
}
catch (Exception e) {
System.out.println("An error was encountered " + e.getMessage());
}
• Notice that the NodeList, allNodes, will reflect the changes made by adding new child nodes to the tree!
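• Liveness has a practical consequence when removing nodes in a loop. With a forward loop, each removal shifts the remaining nodes down one position, so every other node is skipped; looping backward avoids this. A minimal sketch, assuming the JDK's JAXP parser (LiveListPitfall and its methods are hypothetical names):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LiveListPitfall {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Forward loop over a live list: each removal shifts later nodes down,
    // so every other child is skipped. Returns the number of children left over.
    static int removeForward(Element parent) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            parent.removeChild(kids.item(i));
        }
        return parent.getChildNodes().getLength();
    }

    // Backward loop: removing the last item never shifts the indices still to be visited
    static int removeBackward(Element parent) {
        NodeList kids = parent.getChildNodes();
        for (int i = kids.getLength() - 1; i >= 0; i--) {
            parent.removeChild(kids.item(i));
        }
        return parent.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        String xml = "<r><x/><x/><x/><x/></r>";
        System.out.println(removeForward(parse(xml).getDocumentElement()));  // 2
        System.out.println(removeBackward(parse(xml).getDocumentElement())); // 0
    }
}
```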
NamedNodeMap Interface
• The NamedNodeMap interface is for representing collections of nodes that can be accessed by name.
- Commonly, this interface is for working with attributes.
• Note that NamedNodeMap does not inherit from NodeList, and its nodes are not maintained in any particular
order.
- The nodes may also be accessed by an ordinal index, but this is only a convenience for enumerating the
contents.
• There is one attribute, length, which is read-only. It gives the number of nodes in the map.
- Valid indices therefore run from 0 to length-1 inclusive.
• There are several methods.
Method | Description
Node getNamedItem(name) | Retrieves the node with the specified name, or null if there is none.
Node getNamedItemNS(uri, localName) | Retrieves the node with the specified namespace URI and local name.
Node item(index) | Returns the indexth item in the map, or null if the index is out of range.
Node removeNamedItem(name) | Removes the node with the specified name and returns it.
Node removeNamedItemNS(uri, localName) | Removes the node with the specified namespace URI and local name, and returns it.
Node setNamedItem(node) | Adds a node using its nodeName. If a node with that name is already in the map, it is replaced and the replaced node is returned; otherwise null is returned.
Node setNamedItemNS(node) | Adds a node using its namespaceURI and localName. If a node with that namespace URI and local name is already in the map, it is replaced and the replaced node is returned; otherwise null is returned.
• An example of working with the NamedNodeMap.
Element newChild = doc.createElement("NewChild");
Text newText = doc.createTextNode("New Text");
newChild.appendChild(newText);
root.appendChild(newChild);
newChild.setAttribute("att1", "the first attribute");
newChild.setAttribute("att2", "the second attribute");
newChild.setAttribute("att3", "the third attribute");
NamedNodeMap nnm = newChild.getAttributes();
System.out.println("The first attribute is " +
nnm.getNamedItem("att1").getNodeValue());
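• Because the order of a NamedNodeMap is not defined, enumerating it with item(index) is best treated as unordered. The sketch below, assuming the JDK's JAXP parser, lists the attributes (sorted for stable output), then removes one and adds another through the element's live map. AttributeMapDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeMapDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element sample() {
        return parse("<NewChild att1=\"first\" att2=\"second\" att3=\"third\"/>").getDocumentElement();
    }

    // Enumerates by index; the map has no guaranteed order, so the result is sorted
    static List<String> list(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.add(a.getNodeName() + "=" + a.getNodeValue());
        }
        Collections.sort(out);
        return out;
    }

    // Removes att2 and adds att4 through the live map; returns the removed value
    static String edit(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        Node removed = attrs.removeNamedItem("att2");
        Attr att4 = e.getOwnerDocument().createAttribute("att4");
        att4.setValue("fourth");
        attrs.setNamedItem(att4); // keyed by its nodeName, "att4"
        return removed.getNodeValue();
    }

    public static void main(String[] args) {
        Element e = sample();
        System.out.println(list(e)); // [att1=first, att2=second, att3=third]
        System.out.println(edit(e)); // second
        System.out.println(list(e)); // [att1=first, att3=third, att4=fourth]
    }
}
```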