
Now that we have seen the big picture, it is time to get into the specifics. We will learn the syntax for
creating XML documents, as well as the rules that make a document well-formed. The XML syntax is
not difficult; what trips people up is how strictly its requirements are enforced.
A solid understanding of this chapter is critical, because it forms the basis for
everything in the remainder of this course. As a practical application of XML, we will also take a
look at an existing database and see how to map it to a well-formed XML document.
XML Syntax: The Bricks
• Every XML document should begin with the XML declaration.
• When present, the XML declaration must be the very first thing in the document. Not even whitespace or a
comment may come before it.
• The declaration has the following attributes, which must appear in this order.
- Version: describes the XML version used in the document. Almost all documents use “1.0”. XML 1.1 exists
but is rarely used. Required.
- Encoding: tells which type of character encoding is used. Optional.
- Standalone: specifies whether there are external markup declarations. Valid values are “yes” and “no”.
Optional.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
• The only encodings that every XML processor is required to support are “UTF-8” and “UTF-16”.
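The placement rule for the declaration is easy to check with a parser. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser (the course itself does not prescribe a parser). The <note> document is a made-up example.

```python
import xml.etree.ElementTree as ET

# Declaration as the very first bytes of the document: parses cleanly.
good = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    b'<note>hello</note>'
)
root = ET.fromstring(good)
print(root.tag, root.text)

# The same kind of declaration preceded by a blank line is rejected.
bad = b'\n<?xml version="1.0" encoding="UTF-8"?>\n<note>hello</note>'
try:
    ET.fromstring(bad)
    declaration_error = None
except ET.ParseError as err:
    declaration_error = str(err)

print("Parser complaint:", declaration_error)
```

The exact wording of the error message depends on the parser, so the sketch only reports it rather than relying on it.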
Elements
• An element consists of its XML tags together with the content between them.
- A tag begins with a “<” and ends with a “>”.
- An element has a start tag, content, and an end tag.
- An element may be empty and contain no content. An empty element has a special syntax: a single tag
in which the closing “>” is preceded by the “/” character.
<elementname>element content</elementname> <!-- an element with content -->
<anotherelementname /> <!-- an element without content -->
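• Element names must begin with a letter or an underscore. The remaining characters may be letters, digits,
hyphens, underscores, and periods. Spaces are not allowed, the colon is reserved for namespaces, and names
beginning with “xml” (in any mix of upper and lower case) are reserved.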
• XML is case sensitive.
• An element may contain text only.
<?xml version="1.0"?>
<ClassMaterial>Complete XML</ClassMaterial>
• An element may contain child elements only.
<?xml version="1.0"?>
<ClassMaterial>
<Title>Complete XML</Title>
</ClassMaterial>
• An element may contain both text and children. Such an element is said to have “mixed content”.
- Mixed content is generally avoided.
<?xml version="1.0"?>
<ClassMaterial>Complete XML
<Chapter>A History Of XML
<Objective>There are many objectives for this chapter
<Point>To see the purpose of XML</Point>
<Point>To learn the history of XML</Point>
<Point>To see the building blocks of XML syntax</Point>
<Point>To learn the rules for well-formed XML documents.</Point>
<Point>To see what some of the surrounding technologies are</Point>
</Objective>
</Chapter>
</ClassMaterial>
• The first element in a document is known as the root element. It is the outermost element, and every
other element is nested inside it.
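The element rules above can be observed from a program. This is a small sketch, assuming Python's standard-library xml.etree.ElementTree. The <Notes /> element is a hypothetical addition used only to show an empty element.

```python
import xml.etree.ElementTree as ET

doc = """<ClassMaterial>Complete XML
  <Chapter>A History Of XML</Chapter>
  <Notes />
</ClassMaterial>"""

root = ET.fromstring(doc)

# The root element and its children, in document order.
child_tags = [child.tag for child in root]

# Mixed content: the root holds text of its own as well as child elements.
has_mixed_content = bool(root.text and root.text.strip()) and len(root) > 0

# An empty element has neither text nor children.
notes = root.find("Notes")
notes_is_empty = notes.text is None and len(notes) == 0

# XML is case sensitive: <Title> is not closed by </title>.
try:
    ET.fromstring("<Title>Complete XML</title>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True

print(root.tag, child_tags, has_mixed_content, notes_is_empty, case_mismatch_rejected)
```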
Attributes
• Elements may have attributes. Attributes provide more information about the element and are located
within the start tag of an element.
- Attribute values must be enclosed in quotes, either single or double.
- Attribute names beginning with “xml” are reserved.
<elementname attributename="attribute value">element content</elementname>
• Attribute names follow the same rules as element names.
• Attributes will be familiar from HTML. The HTML body tag, for example, has a bgcolor attribute.
<HTML><HEAD></HEAD>
<BODY BGCOLOR="white">
</BODY>
</HTML>
• Attributes are typically used to provide more information about the element itself, and are usually
simple data values. A good example would be an id.
<?xml version="1.0"?>
<classListing>
<student id="555-53-3242"/>
<student id="443-34-2344"/>
<student id="325-22-3445"/>
</classListing>
• Attributes do not affect whether an element is text only, child only, or mixed.
• Elements may have many attributes.
• An element may not have two attributes with the same name.
• Returning to our class material example from before, we could use attributes to assign a title to our
chapter and class material.
<?xml version="1.0"?>
<ClassMaterial Title="Complete XML" Author="Gina McGhee">
<Chapter Title="The XML Saga" PageCount="3">
<Objective>
<Point>To see the purpose of XML</Point>
<Point>To learn the history of XML</Point>
<Point>To see the building blocks of XML syntax</Point>
<Point>To learn the rules for well-formed XML documents.</Point>
<Point>To see what some of the surrounding technologies are</Point>
</Objective>
<Overview>XML has a long history. Its development came about as a result of weaknesses and
deficiencies of other markup languages. We will be examining some of these weaknesses, and indeed looking
to the past to get motivation for the use of XML in the future. Also, we will get our first glimpse into the
building blocks of XML and see some of the surrounding technologies.</Overview>
<Section Title="Goals of XML">
<Point>Straightforwardly usable over the Internet.</Point>
<Point>Support a wide variety of applications.</Point>
<Point>Designs may be prepared quickly.</Point>
<Point>Documents should be easy to create.</Point>
<Point>Terseness of minimal importance.</Point>
</Section>
<Section Title="What is XML?">
<Point>Text formatting
<Subpoint>Formatting markup</Subpoint>
<Subpoint>Generalized markup</Subpoint>
</Point>
<Point>The goal: to digitally represent documents.</Point>
</Section>
<Summary>
<Point>XML is a technology important to learn.</Point>
</Summary>
</Chapter>
</ClassMaterial>
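To see how an application reads attributes, and what happens when the attribute rules are broken, here is a brief sketch. It assumes Python's standard-library xml.etree.ElementTree and reuses a shortened form of the class listing from earlier in this section.

```python
import xml.etree.ElementTree as ET

listing = ET.fromstring(
    '<classListing>'
    '<student id="555-53-3242"/>'
    '<student id="443-34-2344"/>'
    '</classListing>'
)

# Attributes are read by name from the element that carries them.
ids = [student.get("id") for student in listing.findall("student")]

# Single quotes are as acceptable as double quotes.
single_quoted_id = ET.fromstring("<student id='325-22-3445'/>").get("id")

# Repeating an attribute name on one element is not well-formed.
try:
    ET.fromstring('<student id="1" id="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(ids, single_quoted_id, duplicate_rejected)
```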
Comments
• XML comments begin with “<!--” and end with “-->”.
<!-- This is an XML comment -->
Character Entities
• XML character entities are similar to HTML entities, and are used to include reserved characters, such
as “<”, within a document.
• These characters are reserved because the parser interprets them as the start of markup rather than as text.
<?xml version="1.0"?>
<Math>
<Statement>6 < 7 </Statement> <!-- 6 is less than 7 -->
</Math>
• In the above example the parser treats the “<” after the 6 as the start of a tag, fails to find a valid
tag name after it, and reports an error.
• There are some built-in XML character entities.
Entity    Represents           Character representation
&quot;    quotation mark       "
&apos;    apostrophe           '
&lt;      less-than sign       <
&gt;      greater-than sign    >
&amp;     ampersand            &
<?xml version="1.0"?>
<Math>
<Statement>6 &lt; 7 </Statement> <!-- 6 is less than 7 -->
</Math>
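When XML is generated by a program, the reserved characters are usually escaped automatically rather than by hand. The sketch below assumes Python and uses escape from the standard-library xml.sax.saxutils module, which replaces “&”, “<” and “>” with their entities. The sample text is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "6 < 7 & 7 > 6"

# escape() turns the reserved characters into entity references.
escaped = escape(raw)
print(escaped)  # 6 &lt; 7 &amp; 7 &gt; 6

# The escaped text is safe to place inside an element...
statement = ET.fromstring("<Statement>" + escaped + "</Statement>")

# ...and the parser hands the original characters back to the application.
round_trip_ok = statement.text == raw

# The unescaped text is not well-formed.
try:
    ET.fromstring("<Statement>" + raw + "</Statement>")
    unescaped_rejected = False
except ET.ParseError:
    unescaped_rejected = True

print(round_trip_ok, unescaped_rejected)
```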
CDATA Sections
• A significant chunk of data may contain many reserved characters. A CDATA section marks such a chunk so
that the parser treats everything inside it as literal text rather than markup.
• CDATA sections are typically used to enclose large amounts of text containing reserved markup characters
that should not be parsed as XML, or variable data over which there is no control.
- Images
- Script blocks
- User input
• A CDATA section begins with the special characters “<![CDATA[” and ends with “]]>”.
<![CDATA[
<script language="javascript">
var myVar;
function myFunction()
{
return true;
}
</script>
]]>
• CDATA sections may occur anywhere character data may occur in an XML document.
- They may not appear outside the root element.
<?xml version="1.0"?>
<ClassMaterial Title="Gardening for Beginners">
<CoverPhoto><![CDATA[saji j3<<<k34<>>>>>>uxf934f9u5""/98r]]></CoverPhoto>
<Chapter Number="1">We all love to garden. It makes us feel closer to Mother Earth.
</Chapter>
</ClassMaterial>
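From the application's point of view, a CDATA section is simply text. The following sketch, assuming Python's standard-library xml.etree.ElementTree, shows the reserved characters arriving untouched. The <Snippet> element and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = "<Snippet><![CDATA[if (a < b && b > c) { return true; }]]></Snippet>"
snippet = ET.fromstring(doc)

# The parser delivers the CDATA content as ordinary text, markers removed.
print(snippet.text)

# When this library writes the element back out, it uses entities instead
# of a CDATA section. Both forms carry the same character data.
serialized = ET.tostring(snippet, encoding="unicode")
print(serialized)
```

One consequence worth noting: because “]]>” ends the section, that sequence cannot appear inside CDATA content.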
Processing Instructions
• Processing instructions are used to pass information to the application that is processing the XML
document.
<?PITarget instruction:parameters ?>
- A processing instruction begins with “<?” and ends with “?>”.
- The PITarget is a name identifying the application for which the instruction is intended.
- Everything after the target is data passed to that application. Its format is up to the application; in
this example it names the method to be executed.
<?myWordProcessor COUNT:pages?>
• The Notation mechanism may be used to formally declare a PITarget.
• PIs are application specific. The parser passes them through to the application, and applications that
do not recognize a target are free to ignore it.
• One widely used PI associates a stylesheet with an XML document.
<?xml-stylesheet type="text/xsl" href="myStylesheet.xsl"?>
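An application can read the target and data of a processing instruction once the document is parsed. This is a rough sketch, assuming Python's standard-library xml.dom.minidom, which keeps processing instructions as nodes of the document. The empty <ClassMaterial/> root is a placeholder.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="myStylesheet.xsl"?>'
    '<ClassMaterial/>'
)

# Collect (target, data) for every PI that appears before the root element.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE
]

for target, data in instructions:
    print("target:", target)
    print("data:  ", data)
```

Note that the XML declaration itself does not show up in the list. Although it looks like a PI, it is a separate construct.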
All Together Now!
<?xml version="1.0" standalone="yes" ?>
<patients>
<![CDATA[
<script language="javascript">
function displayRecord(admissionID)
{
document.text1.value=admissionID;
document.text2.value=patientRecord(admissionID).name.value;
}
</script>
]]>
<!-- Patient records are to be stored here -->
<patientRecord admissionID="553534234">Family History of heart disease
<name>John Smith</name>
<insurance>Jones &amp; Barnes</insurance>
<condition>High blood pressure</condition>
<condition>Heart disease</condition>
<medication>Aspirin</medication>
</patientRecord>
<patientRecord admissionID="533453455"> Admitted with stomach cramps
<name>Jill Toomey</name>
<insurance>Jones &amp; Miller</insurance>
<condition>Diabetes</condition>
<medication>Insulin</medication>
</patientRecord>
</patients>
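To show how the pieces fit together from the consuming side, here is a short sketch that reads a cut-down version of the patient document. It assumes Python's standard-library xml.etree.ElementTree. The dictionary layout is an illustrative choice, not part of the course material.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" standalone="yes"?>
<patients>
  <!-- Patient records are to be stored here -->
  <patientRecord admissionID="553534234">Family History of heart disease
    <name>John Smith</name>
    <insurance>Jones &amp; Barnes</insurance>
    <condition>High blood pressure</condition>
    <condition>Heart disease</condition>
    <medication>Aspirin</medication>
  </patientRecord>
</patients>"""

root = ET.fromstring(doc)

records = []
for rec in root.findall("patientRecord"):
    records.append({
        "admissionID": rec.get("admissionID"),           # attribute
        "note": (rec.text or "").strip(),                # mixed-content text
        "name": rec.findtext("name"),                    # child element
        "insurance": rec.findtext("insurance"),          # entity resolved
        "conditions": [c.text for c in rec.findall("condition")],
    })

for record in records:
    print(record)
```

The comment is skipped by the parser, the entity comes back as a plain ampersand, and the text sitting directly inside patientRecord is available separately from its children.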
XML: The Rules
• The rules for XML:
- Documents must be well-formed.
- Documents may be valid. We will cover validity in the XML Schema and DTD chapters.
• Well-formed documents must meet a specified set of criteria.
- There must be at least one element. The first element in the document is known as the “root element”.
- Every start tag must have a matching end tag, and all tags must be properly nested.
- XML is case sensitive.
- There may be only one root element.
- Attribute values must be in quotation marks.
- Elements may not have duplicate attribute names.
- Element and attribute names must obey XML naming rules: begin with a letter or an underscore, contain
no spaces, and not begin with the reserved prefix “xml”.
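Each of the criteria above can be tried against a parser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree. The helper is_well_formed and the sample strings are made up for illustration. A non-validating parser such as this one checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "well-formed": "<a><b>text</b></a>",
    "no element": "just text",
    "bad nesting": "<a><b></a></b>",
    "case mismatch": "<a></A>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<a id=1/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "name starts with digit": "<1a/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(f"{label:24} {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted. Every other one breaks exactly one rule from the list.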
Copyright (c) 2008. Intertech, Inc. All Rights Reserved. This information is to be used exclusively as an
online learning aid. Any attempts to copy, reproduce, or use for training is strictly prohibited.