
XML Parsers

Parsing
• XML parsing is required so that an application can
inspect, retrieve, and modify a document's contents.
An XML parser is a program that sits between the XML
document and the application. To standardize how
parsers work, two specifications define the
interfaces that an application can expect from a
parser:
• SAX, the Simple API for XML: processes the XML
document sequentially and generates an event for each
start tag, end tag, and run of text as it is read.
• DOM, the Document Object Model: describes the
document as a tree data structure. The parser first
loads the entire XML document into memory as a tree.
The application can then traverse and edit any node.
SAX Vs. DOM
• When it comes to fast, efficient reading of XML data,
SAX is hard to beat. It requires little memory, because
it does not construct an internal representation (tree
structure) of the XML data. Instead, it simply sends
data to the application as it is read, and your application
can then do whatever it wants with the data it
sees. But you can't go back to an earlier position or
leap ahead to a different position.
• In general, SAX works well when you simply want
to read data and have the application act on it.
• DOM is not suitable for that kind of task, since it has to
read the entire document before the application can act
on it. It also requires more memory.
• But when you need to modify an XML structure,
especially when you need to modify it interactively,
an in-memory structure like the Document Object
Model (DOM) is the better fit.
JAXP API
•The Java API for XML Processing (JAXP) is for processing
XML data using applications written in the Java programming
language.
•JAXP leverages the parser standards SAX (Simple API for
XML) and DOM (Document Object Model) so that you
can choose to parse your data as a stream of events or to build
an object representation of it.
•JAXP also supports the XSLT (Extensible Stylesheet Language
Transformations) standard, giving you control over the
presentation of the data and enabling you to convert the data to
other XML documents or to other formats, such as HTML.
•JAXP also provides namespace support, allowing you to work
with DTDs that might otherwise have naming conflicts.
•JAXP comes with the standard Java SDK.
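The XSLT support mentioned above can be sketched with the javax.xml.transform classes that ship with the JDK. This is a minimal illustration, not a recommended design: the class name XsltSketch, the inline stylesheet, and the sample note are all assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Hypothetical stylesheet: renders the <to> child of <note> as an HTML paragraph.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/note'><p>To: <xsl:value-of select='to'/></p></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the HTML result. */
    static String toHtml(String xml) {
        try {
            // The factory compiles the stylesheet into a Transformer.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<note><to>you</to></note>"));
    }
}
```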
Steps to write application
1. Obtain a parser object
2. Obtain a source of XML data
3. Give that source to the parser to parse.
• JAXP itself provides only the interfaces for SAX
and DOM, plus abstract classes with factory
methods for obtaining instances of a parser
and an XML data source.
• 4 packages:
• org.xml.sax: SAX distribution
• org.xml.sax.helpers: SAX distribution
• org.w3c.dom: DOM in Java
• javax.xml.parsers: JAXP distribution
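The three steps listed above can be sketched as follows. The class name ThreeSteps and the isWellFormed helper are hypothetical, and the XML is read from an in-memory string rather than a file to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ThreeSteps {
    /** Returns true if the XML text parses without a fatal error. */
    static boolean isWellFormed(String xml) {
        try {
            // 1. Obtain a parser object from the JAXP factory.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // 2. Obtain a source of XML data (here an in-memory string).
            InputSource source = new InputSource(new StringReader(xml));
            // 3. Give the source to the parser; a plain DefaultHandler ignores every event.
            parser.parse(source, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a fatal (well-formedness) error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><to>you</to></note>"));
        System.out.println(isWellFormed("<note><to>you</note>"));
    }
}
```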
SAX Programming model
•Not a W3C standard, but widely adopted, including by
IBM and Sun.
•The standard SAX distribution for Java contains 2
packages:
• org.xml.sax
• org.xml.sax.helpers
•They contain 11 classes and interfaces.
Classes
• Classes related to the parser:
• org.xml.sax.XMLReader is the interface that
an XML parser's SAX2 driver must implement. It
reads an XML document and reports its contents
through callbacks.
• javax.xml.parsers.SAXParser defines the
API that wraps an XMLReader implementation
class. An instance of this class can be obtained
from the
javax.xml.parsers.SAXParserFactory.newSAXParser()
method.
•Classes related to the application that we write:
•org.xml.sax.ContentHandler: This is the main interface
that most SAX applications implement. It defines the
methods that the parser uses as callbacks. The parser
expects an object of this type to be registered with it, either
through XMLReader.setContentHandler() or as the DefaultHandler
argument of SAXParser.parse().
•org.xml.sax.helpers.DefaultHandler is a class that
implements ContentHandler with empty methods. It is the
default base class for SAX2 event handlers.
•Exception classes: SAXException,
SAXParseException
•Helper classes: SAXParserFactory
•When the parser reaches the end of the document, the only
data in memory is what your application saved.
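As a rough illustration of how these classes fit together, the sketch below obtains the underlying XMLReader from a SAXParser and registers a ContentHandler on it. The class name ReaderSketch and the elementNames helper are hypothetical. The only data left in memory after parsing is the list of names that the handler chose to save.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ReaderSketch {
    /** Collects element names in document order. */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            // SAXParser wraps an XMLReader; getXMLReader() exposes it.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler implements ContentHandler, so only one callback is overridden.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    names.add(qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<note><to>you</to><from>God</from></note>"));
    }
}
```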
SAX Programming model (diagram)
[Figure: 1. SAXParserFactory creates a SAXParser. 2. The XML source and an optional DTD are input to the SAXParser. The SAXParser calls the handler methods (events) startDocument, startElement, characters, endElement, endDocument, etc. on the class implementing ContentHandler, which produces the output.]
org.xml.sax.ContentHandler
• This interface declares the event-handling
methods of SAX.
• void characters(char ch[], int start, int length)
• void startDocument()
• void endDocument()
• void startElement(String uri, String localName, String qName, Attributes attributes)
• void endElement(String uri, String localName, String qName)
• void processingInstruction(String target, String data)
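To see the order in which these callbacks fire, the following sketch records each event as a string. The class name EventTrace and its trace helper are assumptions for illustration. Because a SAX parser is allowed to deliver one run of text through several characters() calls, the text is accumulated in a buffer and emitted when the next tag event arrives.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per text run
    }

    // Emits the buffered text (if it is not just whitespace) and clears the buffer.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("text:" + s);
        text.setLength(0);
    }

    /** Parses the XML text and returns the recorded events in order. */
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        trace("<note><to>you</to></note>").forEach(System.out::println);
    }
}
```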
DefaultHandler and SAXParser
• DefaultHandler: The easiest way to implement the
ContentHandler interface is to extend the
DefaultHandler class, defined in the
org.xml.sax.helpers package, and override only
the callbacks you need.
• SAXParserFactory, SAXParser: SAXParser
is an abstract class. The static newInstance()
method of SAXParserFactory returns a factory, and
the factory's newSAXParser() method returns a new
concrete implementation of SAXParser. newSAXParser()
throws a ParserConfigurationException if it is unable
to produce a parser that matches the specified
configuration of options.
• Xerces parser from Apache: implements the parser
and supports the JAXP API (org.apache.xerces.jaxp).
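A minimal sketch of configuring the factory before asking it for a parser is shown below. The class name FactoryConfig, the rootName helper, and the urn:example:note namespace are made up for the example. With namespace awareness switched on, startElement receives the namespace URI and local name separately.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FactoryConfig {
    /** Returns the root element as "{namespace-uri}local-name". */
    static String rootName(String xml) {
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // report namespace URIs and local names
            factory.setValidating(false);    // the default, shown for clarity
            // newSAXParser() throws ParserConfigurationException if the
            // requested combination of options cannot be satisfied.
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    if (result[0] == null) result[0] = "{" + uri + "}" + localName;
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(rootName("<n:note xmlns:n='urn:example:note'/>"));
    }
}
```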
//Program 1: Counting no. of elements
import java.io.File;
import org.xml.sax.Attributes;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class CountSax extends DefaultHandler {
    private int ele = 0;

    public static void main(String[] s) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }
        // Use the default (non-validating) parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Create a SAXParser using the currently configured factory parameters
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            // Parse the input; the parser calls back into the handler
            saxParser.parse(f, new CountSax());
        else
            System.out.println("unknown file");
    }

    @Override
    public void startDocument() { ele = 0; }

    // Called once for every start tag, so this counts the elements
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        ele++;
    }

    @Override
    public void endDocument() {
        System.out.println("Number of elements :" + ele);
    }
}
Execution:
java CountSax note.xml
Number of elements :4
/*Program 2: Creating HTML document to represent note.xml*/
import java.io.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class NoteSax extends DefaultHandler {
    PrintWriter out;

    public NoteSax() throws Exception {
        out = new PrintWriter(new BufferedWriter(new FileWriter("note.html")));
    }

    public static void main(String[] s) throws Exception {
        if (s.length != 1) {
            System.out.println("Usage: cmd filename");
            System.exit(0);
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        File f = new File(s[0]);
        if (f.exists())
            saxParser.parse(f, new NoteSax());
        else
            System.out.println("unknown file");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("note"))
            out.println("<html><head><title>Note</title></head><body>");
        if (qName.equals("to")) out.println(" To, ");
        if (qName.equals("from"))
            out.println("<p align='right'><font color='black'> -from ");
        if (qName.equals("body")) {
            // Map the attributes of <body> to font colour and subject line
            for (int i = 0; i < attrs.getLength(); i++) {
                String aName = attrs.getQName(i);
                String value = attrs.getValue(i);
                if (aName.equals("type")) {
                    if (value.equals("warm")) out.println("<font color='green'>");
                    if (value.equals("cold")) out.println("<font color='red'>");
                    if (value.equals("formal")) out.println("<font color='blue'>");
                }
                if (aName.equals("subject"))
                    out.println("<I>" + value + ":</I>");
            }
        }
    }

    // endElement has no Attributes parameter; with one, the method would not
    // override the callback and the parser would never call it
    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("body")) out.println("</font>");
        if (qName.equals("from")) out.println("</font></p>");
    }

    @Override
    public void endDocument() {
        out.println("</body></html>");
        out.close();
    }

    @Override
    public void characters(char[] buf, int offs, int l) throws SAXException {
        String s = new String(buf, offs, l);
        out.println(s + "<br>");
    }
}
note.xml
<note>
<to>you</to>
<body type="warm"
subject="Contemplation">If today was a
perfect day then there would be no
tomorrow</body>
<from>God</from>
</note>

Execution:
java NoteSax note.xml
creates note.html, roughly as follows:
<html><head><title>Note</title></head><body>
To,
you<br>
<font color='green'>
<I>Contemplation:</I>
If today was a perfect day then there
would be no tomorrow<br>
<br>
<p align='right'><font color='black'> -from
God<br>
</body></html>
note.html
DOM
•Document Object Model. It is a standard
produced by the W3C.
•All DOM processing assumes that you have
read and parsed a complete document into
memory so that all parts are equally accessible.
The data is represented in the form of a tree.
•Disadvantages
1. It is clumsy if you only want to pick out a few
elements.
2. Memory requirements can become restrictive
for large documents.
org.w3c.dom package
Interfaces:
• Node
• Document (extends Node): The Document
interface represents the entire HTML or XML
document.
• NodeList: provides the abstraction
of an ordered collection of nodes.
• The Node interface defines static constants that
are compared with getNodeType() to check the
node type, e.g. Node.ELEMENT_NODE and
Node.CDATA_SECTION_NODE.
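The node type constants can be used as in the sketch below, which labels each child of the root element by comparing getNodeType() with the constants on Node. The class name NodeTypes and its helpers are hypothetical, and the factory is left at its default settings (comments kept, CDATA sections not merged into text).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodeTypes {
    /** Maps a node's type code to a readable label. */
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:       return "element";
            case Node.TEXT_NODE:          return "text";
            case Node.CDATA_SECTION_NODE: return "cdata";
            case Node.COMMENT_NODE:       return "comment";
            case Node.DOCUMENT_NODE:      return "document";
            default:                      return "other";
        }
    }

    /** Returns the type labels of the root element's children, comma-separated. */
    static String childTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                labels.add(typeName(children.item(i)));
            }
            return String.join(",", labels);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childTypes("<a><b/>hi<!--c--><![CDATA[x]]></a>"));
    }
}
```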
Methods
• Document Methods:
• public NodeList getElementsByTagName(String tagname)
• public Element createElement(String tagName) throws DOMException
• public Comment createComment(String data)
• public Text createTextNode(String data)

• NodeList Methods:
• public int getLength()
• public Node item(int index)
• Node Methods:
•Methods to access information about the current node:
•public String getNodeName()
•public short getNodeType()
•public NodeList getChildNodes()
•Methods to modify the node's children:
•public Node appendChild(Node newChild) throws DOMException
•public Node removeChild(Node oldChild) throws DOMException
•public Node replaceChild(Node newChild, Node oldChild) throws DOMException
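The methods listed above can be exercised on a document built entirely in memory, as in this sketch. The class name DomEdit, the element names, and the comment text are illustrative assumptions. Nodes are always created by the owning Document and only then attached with appendChild or replaceChild.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomEdit {
    /** Builds a small tree, then replaces one child and removes another. */
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element note = doc.createElement("note");
            doc.appendChild(note);

            Element to = doc.createElement("to");
            to.appendChild(doc.createTextNode("you"));
            note.appendChild(to);
            note.appendChild(doc.createComment("draft"));
            Element from = doc.createElement("from");
            from.appendChild(doc.createTextNode("me"));
            note.appendChild(from);

            // Children are now: to, comment, from. Swap the comment for <body>.
            Node comment = note.getChildNodes().item(1);
            note.replaceChild(doc.createElement("body"), comment);
            // Drop <from>, leaving: to, body.
            note.removeChild(from);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the node names of the parent's children, comma-separated. */
    static String childNames(Node parent) {
        NodeList children = parent.getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(childNames(doc.getDocumentElement()));
        System.out.println(doc.getElementsByTagName("from").getLength());
    }
}
```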
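DOM Programming model
[Figure: 1. DocumentBuilderFactory creates a DocumentBuilder. 2. The XML source and an optional DTD are input to the DocumentBuilder. 3. The DocumentBuilder parses the input and builds the Document (DOM) tree of Nodes. The application then searches the nodes recursively (search mechanism) and produces the output.]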
// Program 1: counting no. of elements
import org.w3c.dom.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;

public class CountDom {
    public static void main(String[] str) throws Exception {
        File f = new File(str[0]);
        Node n = readFile(f);
        int ele = getElementCount(n);
        System.out.println(ele);
    }

    // Parses the whole file into an in-memory DOM tree
    public static Document readFile(File f) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Validation needs a DTD declared in the document;
        // remove this line for documents without one
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(f);
    }

    // Recursively counts the element nodes in the subtree rooted at node
    public static int getElementCount(Node node) {
        if (node == null)
            return 0;
        int sum = 0;
        boolean isElement = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isElement)
            sum = 1;
        NodeList children = node.getChildNodes();
        if (children == null)
            return sum;
        for (int i = 0; i < children.getLength(); i++)
            sum += getElementCount(children.item(i));
        return sum;
    }
}
// Program 2: Adding a comment and a node and displaying
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import java.io.*;
import org.w3c.dom.*;

public class AddNodeDom {
    static Node n1;    // the display-name element, once found
    static Comment c;  // the comment to add under the servlet element

    public static void main(String[] str) throws Exception {
        File f = new File(str[0]);
        Document n = readFile(f);
        // A comment node must be created by the owning Document before it
        // can be appended (the comment text here is illustrative)
        c = n.createComment("added by AddNodeDom");
        setElements(n);
        display(n);
        System.out.println("done");
    }

    public static Document readFile(File f) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(f);
    }

    // Prints element names and the values of text and comment nodes
    public static void display(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE)
            System.out.print(node.getNodeName() + ":");
        if (node.getNodeType() == Node.TEXT_NODE ||
                node.getNodeType() == Node.COMMENT_NODE)
            System.out.println(node.getNodeValue().trim());
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                display(children.item(i));
    }

    // Remembers display-name, then moves it and the comment under servlet
    public static void setElements(Node node) {
        if (node == null) return;
        boolean isEle = (node.getNodeType() == Node.ELEMENT_NODE);
        if (isEle && node.getNodeName().equals("display-name")) n1 = node;
        if (isEle && node.getNodeName().equals("servlet")) {
            node.appendChild(c);
            // display-name may not have been seen yet
            if (n1 != null) node.appendChild(n1);
        }
        NodeList children = node.getChildNodes();
        if (children != null)
            for (int i = 0; i < children.getLength(); i++)
                setElements(children.item(i));
    }
}
