XML Tutorial

•        To see the purpose of XML.
•        To learn the history of XML.
•        To see what some of the surrounding technologies are.
•        To get a brief introduction to the W3C and their role in the development of the specifications.
•        To learn about the basic tools required for working with XML, and where they may be obtained.
•        To learn the basic syntax for XML.
•        To learn the rules for well-formed XML documents.
•        To see how to test an XML document for well-formedness.
•        To learn the basics of the XML Document Object Model.
•        To learn the uses for the DOM, as well as strengths and weaknesses.
•        To see how to build and manipulate XML documents using the DOM.
•        To learn how to traverse XML content.




XML has a long history.  It was developed in response to the weaknesses and deficiencies of earlier
markup languages.  We will examine some of those weaknesses, looking to the past to understand the
motivation for using XML in the future.  This chapter provides a broad overview of XML itself and its
related specifications, and serves as a road map for the remainder of the course.
We will begin with the history of XML and the goals for XML as laid out by the W3C, the World Wide
Web Consortium.  We will also look at the role of the W3C and the stages a specification
must pass through in order to become a standard.
Finally, there are many related technologies, some finalized and some still under development.  We will
look at what these are and where each currently stands in the W3C standardization process.


An XML History                                                                        
•        1969:  IBM developed Generalized Markup Language (GML)
-        Goldfarb
-        Mosher
-        Lorie
•        Three principles
-        A single standard.
-        Allows for defining arbitrary extensions.
-        Document types should be formally defined so that documents may be checked for validity.
•        The three principles of GML were carried forward into its successor, SGML.
-        In fact, these three principles are still among the goals for XML.
•        1978-1986:  Goldfarb and company developed Standard Generalized Markup Language (SGML).
-        SGML was a refinement of GML.
•        SGML is a widely used markup language today.  It is indeed a direct ancestor of XML.  
•        1989-1991:  Tim Berners-Lee proposed the World Wide Web and developed Hypertext Markup Language (HTML).
-        Much simpler than SGML.
-        Uses the same angle bracket notation.
-        Only a fixed set of element types.
-        Not rigorously defined at first.
•        HTML, Hypertext Markup Language, is a widely accepted Internet standard.
-        It consists of a set of tags that describe how data should appear on the screen.
•        HTML is not without its problems.
-        It is not very printer-friendly, and it is limited because it is tied to a fixed set of tags.
-        Extensions had to be invented to give HTML more interactivity and to separate the presentation from
the data.
-        It is a formatting markup language rather than a generalized one.  That means the presentation rules are
intermingled with the data, which makes the data hard to reuse or process independently.

•        To address these problems, companion HTML tools were created.
-        Scripting
-        Java applets and ActiveX controls
-        Cascading Style Sheets (CSS) and DHTML
-        Plug-ins
•        It seemed that a simpler subset of SGML should be defined that is more extensible than HTML.  
Hence, XML – the Extensible Markup Language.
-        In other words, rather than trying to “fix” HTML, it seemed more appropriate to revisit SGML and
derive a simpler markup language from it.
World-Wide Web Consortium                                                        
•        Enter the W3C, the World-Wide Web Consortium.  The W3C is a standards body responsible for
Web technologies and protocols.
•        Companies and individuals submit proposals to the W3C for technology standards.  The W3C then
forms a working group, which is responsible for the development of the standards.
•        The levels of maturity of a specification document are as follows:
-        Working Draft:  This represents a work in progress, and a general commitment to pursue work in that
particular area.
-        Last Call Working Draft:  A work in progress that has reached a level where the authors are requesting
review from other groups.  If approved, after this point it enters the Recommendation phases.
-        Candidate Recommendation:  A stable Working Draft that the Director has proposed to the community
for implementation experience and feedback.
-        Proposed Recommendation:  A Candidate Recommendation that has benefited from implementation and
has been sent to an Advisory Committee for review.
-        Recommendation:  This represents consensus within the W3C.  This is considered, then, a standard
ready for widespread deployment.
•        The W3C undertook the task of rolling out a new markup language.
-        Simpler than SGML.
-        Extensible, unlike HTML.
•        The XML Specification reached Recommendation status in February 1998.
What is XML?                                                                                
•        It combines the flexibility and power of SGML with the widespread acceptance of HTML.
•        XML is a simple, standard way to delimit text data.  It has been described as the “ASCII of the web”.
-        It allows for creating an arbitrary data structure, and then sharing it with any other programming
language on any computing platform.
•        SGML is the starting point.  Indeed, XML is defined as a subset of SGML.
-        Due to its many optional features, SGML is complex, and it is difficult to write generic parsers for it.
XML, in contrast, is much simpler, so parsers are easier to write and more readily available.
-        XML retains backwards compatibility with existing SGML-oriented systems.  This allows data
marked up in XML to be used by SGML systems, saving conversion costs while leveraging the greater
accessibility provided by the web.
•        XML leverages existing Internet protocols and software for easy data processing and transmission.
-        As we will see, HTTP is a powerful tool for transmitting not only HTML but XML as well.
-        Because of its simplicity, plain text editors can be used to create and edit XML documents.
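
The idea that an arbitrary data structure can be written out as XML text and read back by another program can be sketched as follows.  This is only a minimal illustration using Python's standard xml.etree.ElementTree module; the element names (order, item) and the function names are invented for this example and are not part of any specification.

```python
import xml.etree.ElementTree as ET


def order_to_xml(order_id, items):
    """Serialize a simple order (a hypothetical structure) to XML text."""
    root = ET.Element("order", attrib={"id": str(order_id)})
    for name, quantity in items:
        item = ET.SubElement(root, "item", attrib={"quantity": str(quantity)})
        item.text = name
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


def xml_to_order(xml_text):
    """Parse the XML text back into (order_id, list of (name, quantity))."""
    root = ET.fromstring(xml_text)
    items = [(item.text, int(item.get("quantity"))) for item in root.findall("item")]
    return int(root.get("id")), items


xml_text = order_to_xml(42, [("widget", 3), ("gadget", 1)])
print(xml_text)                # plain text that any XML-aware program can read
print(xml_to_order(xml_text))  # the same data, recovered from the text
```

Because the intermediate form is plain text, the receiving side could just as well be written in Java, C++, or any other language with an XML parser.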

•        XML is a method for putting structured data in a text file.
-        XML is a set of rules for designing text formats for data that is independent of the software, platform,
etc. producing it.
•        XML is text, with strict rules.
-        With HTML, forgotten tags or attributes without quotes are tolerated, and browsers are very
forgiving.  XML is far stricter by design: such errors require the parser to stop processing and
report an error.
•        XML is a family of technologies.
-        As we will soon see, XML has many related specifications which are optional modules providing
special abilities and complementary functions for XML data.
•        XML is verbose, but that’s okay.
-        Since XML is text and uses tags to delimit data, files are almost always larger than a
comparable binary format.  This becomes less of an issue as time passes, since storage is cheap and
many good compression algorithms work well on XML.
•        XML is license-free, platform-independent, and well supported.
-        In choosing XML, you gain access to a large and growing set of tools to help on your
projects.  XML is a W3C technology and is free to use without paying license fees.  And
since it is freely available, it is not tied to a single vendor.
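
The strictness described above can be observed with almost any XML parser.  The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are invented for illustration.  It checks only well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # The parser stops at the first error instead of guessing, unlike a browser reading HTML
        return False, str(err)


good = '<note priority="high"><to>Ann</to></note>'
missing_close = '<note priority="high"><to>Ann</note>'      # <to> is never closed
unquoted_attr = '<note priority=high><to>Ann</to></note>'   # attribute value lacks quotes

for sample in (good, missing_close, unquoted_attr):
    ok, error = is_well_formed(sample)
    print(ok, error)
```

Both faulty samples would typically be displayed without complaint if they were HTML; as XML they are rejected outright.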
XML and Its Core Technologies                                                
•        XML 1.0, Second Edition: contains only errata corrections for XML 1.0, Feb 1998.
-        Recommendation
•        XML Infoset 1.0: defines the atomic units comprising a well-formed XML document.
-        Recommendation
•        XInclude 1.0: merge multiple XML documents by creating a composite Infoset.
-        Candidate Recommendation
•        XML Base:  Establish a base URI for an XML document.
-        Recommendation
•        XML 1.1:  Update XML to use Unicode 3, to support checking of normalization, and follow Unicode's
line endings more closely.  Formerly known as Blueberry.
-        Candidate Recommendation
Validation                                                                                        
•        XML Schema: a specification in three parts describing a common syntax for representing the structure and
datatypes in an XML document.  Provides a replacement for the older DTD.
-        Recommendation
•        Document Type Definition, DTD: a non-XML syntax for describing the hierarchical structure of an
XML document.  Formerly the only standard way to provide XML validation.
-        Largely superseded by XML Schema, though still part of the XML 1.0 specification.
XML Programming Tools                                                                
•        Document Object Model, DOM: platform and language neutral interface allowing programs and scripts
to dynamically access and update the content, structure, and style of documents. Operates on an XML
document as a tree.
-        Level 1 and 2 – Recommendation
-        Level 3 – Working Draft
•        Simple API for XML, SAX: an event-based API to the parser itself; programmers write code to respond to
parser-generated events.  SAX is not under the auspices of the W3C; it was developed by members of the
XML-DEV mailing list and is widely implemented, including by the Apache Xerces parsers.  Commonly used.
Treats XML data as a forward-only, read-only stream.
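
The difference between the two programming models can be sketched as follows.  This is an illustrative example only, using the DOM and SAX implementations that ship with Python (xml.dom.minidom and xml.sax); the catalog document and all function and class names are invented here.  The DOM functions load the whole document as a tree that can be read and modified, while the SAX handler only reacts to events as the parser streams through the text.

```python
import xml.sax
from xml.dom import minidom

CATALOG = (
    "<catalog>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>DOM in Depth</title></book>"
    "</catalog>"
)


def titles_with_dom(xml_text):
    """DOM: parse the whole document into a tree, then query the tree."""
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("title")]


def add_book_with_dom(xml_text, book_id, title_text):
    """DOM: the tree is read/write, so new nodes can be created and attached."""
    document = minidom.parseString(xml_text)
    book = document.createElement("book")
    book.setAttribute("id", book_id)
    title = document.createElement("title")
    title.appendChild(document.createTextNode(title_text))
    book.appendChild(title)
    document.documentElement.appendChild(book)
    return document.documentElement.toxml()


class TitleHandler(xml.sax.ContentHandler):
    """SAX: collect title text from parser events; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_with_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


print(titles_with_dom(CATALOG))
print(titles_with_sax(CATALOG))
print(add_book_with_dom(CATALOG, "b3", "SAX Events"))
```

The SAX version cannot go back or change the document, but it never holds more than the current event in memory, which is the usual reason for choosing it over the DOM for large inputs.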
Style and Structure                                                                        
•        Cascading Style Sheets: add style to web documents. Used for both XML and HTML.
-        Several levels/versions of CSS are available.
-        Does not support structural transformations.
-        Recommendation.
•        XSL 1.0: language for expressing stylesheets.
-        Recommendation
-        Comprised of three parts: XSLT, XSL-FO (XSL Formatting Objects), and XPath.

•        XSLT 1.0: language for transforming XML documents.
-        Recommendation
•        XPath 1.0: expression language for referencing parts of an XML document.  Used by XSLT, XPointer,
XQuery and XLink.
-        Recommendation
•        XSL-FO: vocabulary for expressing formatting semantics.
-        Recommendation


•        XSLT 2.0: integrate newer XML features, like Schema.
-        Working Draft
•        XPath 2.0: integrate newer XML features, like Schema.
-        Working Draft
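
To give a feel for how XPath expressions reference parts of a document, here is a small sketch using Python's standard xml.etree.ElementTree module.  Note the assumptions: ElementTree supports only a limited subset of XPath 1.0 (simple paths and attribute predicates), and the catalog document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <book id="b1" year="2001"><title>XML Basics</title></book>
  <book id="b2" year="2002"><title>DOM in Depth</title></book>
  <magazine id="m1"><title>Markup Monthly</title></magazine>
</catalog>
""".strip()

root = ET.fromstring(CATALOG)

# ".//title" selects every title element at any depth below the root
all_titles = [t.text for t in root.findall(".//title")]

# "./book/title" selects only titles whose parent is a book child of the root
book_titles = [t.text for t in root.findall("./book/title")]

# "[@id='b2']" is a predicate that filters on an attribute value
b2_title = root.find("./book[@id='b2']/title").text

print(all_titles)
print(book_titles)
print(b2_title)
```

A full XPath 1.0 engine, such as the one built into an XSLT processor, accepts the same path syntax plus functions, axes, and arithmetic that this subset omits.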

XML Helper Specifications                                                        
•        URI, URL, URN: short strings identifying resources on the web.  URI is a superset of URL and URN.
URL schemes include gopher, ftp, http, etc.  A URN is a URI with an institutional commitment to persistence
and availability.
•        Namespaces in XML 1.0: provides a mechanism for disambiguating the elements and attributes in an
XML document by associating them with a “namespace” using prefixes.  A namespace is a collection of
names, identified by a URI reference.
-        Recommendation
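
The disambiguation that namespaces provide can be sketched as follows.  In this illustrative example (Python's standard xml.etree.ElementTree module; the namespace URIs and element names are made up), two elements share the local name "address" but belong to different namespaces, so a program can tell them apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; a URI here is only an identifier, nothing is fetched
INVENTORY_NS = "http://example.com/ns/inventory"
SHIPPING_NS = "http://example.com/ns/shipping"

DOC = (
    '<order xmlns:inv="http://example.com/ns/inventory" '
    'xmlns:ship="http://example.com/ns/shipping">'
    "<inv:address>Warehouse 7</inv:address>"
    "<ship:address>12 Main Street</ship:address>"
    "</order>"
)

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its URI, written as {uri}localname
tags = [child.tag for child in root]

# The prefixes used in a query are local to the program; only the URIs must match
prefixes = {"i": INVENTORY_NS, "s": SHIPPING_NS}
inventory_address = root.find("i:address", prefixes).text
shipping_address = root.find("s:address", prefixes).text

print(tags)
print(inventory_address)
print(shipping_address)
```

The query uses the prefixes "i" and "s" while the document uses "inv" and "ship"; the match still succeeds because the namespace is identified by the URI, not by the prefix.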
XHTML
•        XHTML 1.0: reformulate HTML based on XML. Apply the rigors of XML to an HTML document.  
There are three flavors: Strict (use CSS for all styling, no tags associated with layout), Transitional (use CSS
and use some HTML layout concessions for older browser compatibility), and Frameset (work with frames)
-        Recommendation
•        Modularization of XHTML: decomposition of XHTML into a collection of abstract modules providing
specific types of functionality. This should facilitate the use of HTML across various platforms.  Expressed
in DTD, with Schema coming soon.
-        Recommendation
•        XHTML 1.1: merging of XHTML 1.0 and modularization
-        Recommendation
Locate, Search and Query                                                        
•        XLink 1.0: insert elements into XML documents to create and describe links between resources. More
sophisticated than HTML linking.
-        Recommendation
•        XML Query, XQuery 1.0: provides a more intelligent means of querying XML content from both a
database and document perspective.
-        Several documents are produced in support of the XML Query language.
-        None of them are Recommendations yet; in fact, most are Working Drafts.
XML Related Protocols                                                                
•        Hypertext Transfer Protocol, HTTP 1.1: request/response protocol providing an open-ended set of
methods and headers indicating the purpose of the request.
-        An IETF standard (RFC 2616) rather than a W3C Recommendation
•        XML Protocol: the W3C work on B2B XML messaging.
-        Includes the SOAP Note submitted previously.
-        Provides mechanisms for sending XML over HTTP.
-        SOAP 1.2 currently a Candidate Recommendation.
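
Sending XML over HTTP amounts to placing the XML text in the body of a request and labeling it with an XML content type.  The sketch below builds such a request with Python's standard urllib.request module without sending it.  The URL, the element names, and the function name build_xml_post are placeholders invented for illustration, and the body is a plain XML document rather than a SOAP envelope.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_xml_post(url, root_element):
    """Build (but do not send) an HTTP POST whose body is an XML document."""
    body = ET.tostring(root_element, encoding="utf-8")  # bytes, ready for the wire
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


order = ET.Element("order", attrib={"id": "42"})
ET.SubElement(order, "item").text = "widget"

# example.invalid is a reserved placeholder host; substitute a real endpoint
request = build_xml_post("http://example.invalid/orders", order)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually transmit the document: urllib.request.urlopen(request)
```

A SOAP message would follow the same pattern, with the payload wrapped in the envelope structure that the SOAP specification defines.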



Tools of the Trade                                                                        
•        In order to work with XML there are a few simple tools that must be acquired.  Most are available for
download.
•        Internet Explorer has had XML support since version 5.0.
-        The most current version of IE is 6.0, which includes MSXML version 3.0.
•        MSXML 3.0 has support for a number of XML specifications:
-        XML, and XML Validation against DTD
-        DOM
-        XSLT and XPath
-        CSS
-        SAX2
•        There are other XML tools for download at the MSDN online site.
-        A tool for validating XML and viewing XSLT output.  You will want to download this for validating
XML documents and testing XSLT.
-        An XSL style sheet for XML schemas, which displays schemas color-coded and nicely laid out in the
browser.
•        General parsers, for programmatic access, are available throughout the Internet for free download.  Here
are some of the more commonly used and widely accepted ones.

•        Xml.apache.org is a site providing commercial-quality, standards-based XML solutions for Java and
C++ developers.  There are several parsers available on this site.
-        Xerces parsers in Java and C++ (with Perl and COM bindings).  This parser supports XML Schema,
DOM level 2 version 1, and SAX 2.0.
-        Xalan in Java and C++, providing XSLT stylesheet processors.  It may be used in conjunction with the
Xerces SAX parser.
-        FOP for XSL formatting object support in Java.  
-        A couple of Apache implementations of the SOAP specification submitted to the W3C for
consideration.
•        Additionally, other vendors, such as IBM, provide their own validating parsers.  If
interested, visit IBM’s alphaWorks site.
•        Special XML editors may become necessary if you plan to publish many XML documents.  If, however,
you are a programmer who will mainly be dealing with XML as a data transport language, a plain text editor
such as Notepad or vi may suit your purposes just fine.
Copyright (c) 2008.  Intertech, Inc. All Rights Reserved.  This information is to be used exclusively as an
online learning aid.  Any copying is strictly prohibited.