Apr03 JohnZ

Core Java Technologies Tech Tips
Tips, Techniques, and Sample Code

Welcome to the Core Java Technologies Tech Tips, April 22, 2003.
Here you'll get tips on using core Java technologies and APIs,
such as those in Java 2 Platform, Standard Edition (J2SE).
This issue covers:
* Validating URL Links
* Reusing Exceptions
These tips were developed using Java 2 SDK, Standard Edition,
v 1.4.
This issue of the Core Java Technologies Tech Tips is written by
John Zukowski, president of JZ Ventures, Inc.
(http://www.jzventures.com).
You can view this issue of the Tech Tips on the Web at
http://java.sun.com/jdc/JDCTechTips/2003/tt0422.html.
See the Subscribe/Unsubscribe note at the end of this newsletter
to subscribe to Tech Tips that focus on technologies and products
in other Java platforms.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
VALIDATING URL LINKS
A frequent problem that web site maintainers have is making sure
that links on a site remain valid. Sometimes a resource that is
the target of a link is removed. For example, consider links to
technical articles. Over time, these articles can get out of date,
and so are sometimes removed. After a resource like this is
dropped, any link to it is no longer valid. Validating these
links, especially links to resources on other sites, can keep an
entire team of people busy, so automating the process can save a
lot of time. The following tip presents a programmatic technique
for validating URL links. Specifically, it presents a program
that checks the response codes for all the foreign URLs on a web
page, and then generates a report. The report has more information
than simply the status of a link. It also includes things like what
a redirected URL actually points to, so you can use the report to
update the web page.
If you've used the Web at all, it's likely that you've
encountered the dreaded 404 error. This means that the target of
a link on a web page, that is, a destination page, is not found.
The 404 in the error is a response code, one of the many response
codes covered in the HTTP protocol defined by the World Wide Web
Consortium (W3C) (http://www.w3.org/Protocols/). Page 40 of
RFC 2616 at ftp://ftp.isi.edu/in-notes/rfc2616.txt shows the
complete list of response codes.
There are three classes in the java.net package that are useful
in checking the response code for a URL link: URL, URLConnection,
and HttpURLConnection. The URL class allows you to create a URL
object for an http/https string. (You can create URLs for
other protocols such as ftp, but the response code is only valid
for HTTP connections.) The URLConnection class represents a
connection to the resource that a URL points to. Its subclass,
HttpURLConnection, is the URLConnection for HTTP requests. It
provides the getResponseCode method, which tells you the response
code associated with a specific URL. (HTTPS requests are handled
by javax.net.ssl.HttpsURLConnection, a subclass of
HttpURLConnection.)
To get a URLConnection for a URL, you call the openConnection
method on the URL object. This gives you the connection object,
but it doesn't yet make the connection to the URL. That gives you
the option to configure the connection first, for instance, by
setting special header fields. To make the connection to the
object associated with the URL, you call the connect method. To
check the response code, you call the getResponseCode() method.
By default, HttpURLConnection uses the HTTP GET method when
retrieving an object, which means the actual contents of the
object are returned. You should read the data from the input
stream, and then close the stream when finished. This avoids the
possibility of leaving the connection hanging, with the data only
partially read.
Here's what a simple check for the response code associated with
a URL looks like:
import java.net.*;
import java.io.*;

public class SimpleURLCheck {
    public static void main(String args[]) {
        if (args.length == 0) {
            System.err.println
                ("Please provide a URL to check");
        } else {
            String urlString = args[0];
            try {
                URL url = new URL(urlString);
                URLConnection connection =
                    url.openConnection();
                if (connection instanceof HttpURLConnection) {
                    HttpURLConnection httpConnection =
                        (HttpURLConnection)connection;
                    httpConnection.connect();
                    int response =
                        httpConnection.getResponseCode();
                    System.out.println(
                        "Response: " + response);
                    InputStream is =
                        httpConnection.getInputStream();
                    byte[] buffer = new byte [256];
                    while (is.read (buffer) != -1) {}
                    is.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
If you run the program with the URL http://java.sun.com/jdc/:

java SimpleURLCheck http://java.sun.com/jdc/
it should return a response code of 200.
Note that if you're behind a firewall, you need to set the
http.proxyHost and http.proxyPort system properties as appropriate
for your proxy. In other words, you need to add code to the
program, before the connection is opened, that looks something
like this:
System.setProperty("http.proxyHost", "your-proxy-host-name");
System.setProperty("http.proxyPort", "your-proxy-port-number");
There's a specific reason why http://java.sun.com/jdc/ was picked
as the URL. If you enter that URL in the browser, you'll notice
that the browser gets redirected to
http://developer.java.sun.com/developer/. The browser then loads
the page to which it is redirected. That page loads without
problem, so the final response code is 200
(HttpURLConnection.HTTP_OK).
Why doesn't HttpURLConnection report that this is a redirected
URL? By default, HttpURLConnection follows redirects. If you want
to find out if a URL redirects, you have to turn off the default
behavior. You can do this for all HttpURLConnection objects by
using the static setFollowRedirects method, or for a specific
HttpURLConnection by using the setInstanceFollowRedirects method.
In either case, providing an argument of false turns off the
automatic redirect behavior. You can test this by adding the
line:
HttpURLConnection.setFollowRedirects(false);
to the SimpleURLCheck program before the call to openConnection
(each new connection takes its default from this setting when it
is created). If you rerun the program with the same URL, the
response code should be 301.
The 301 response code means that the URL has moved permanently
(by comparison, a response code of 302 represents a temporary
redirect). That means that if the original URL was saved, a smart
application could update the saved URL by getting the target of
the URL redirect. To get that redirected URL, you need to
retrieve the Location header of the HttpURLConnection. You do
this with the getHeaderField method.
Here's the SimpleURLCheck program with setFollowRedirects and
getHeaderField methods added:
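import java.net.*;
import java.io.*;

public class SimpleURLCheck {
    public static void main(String args[]) {
        if (args.length == 0) {
            System.err.println(
                "Please provide a URL to check");
        } else {
            String urlString = args[0];
            try {
                // Report redirects instead of following them.
                // Must be set before the connection is created.
                HttpURLConnection.setFollowRedirects(false);
                URL url = new URL(urlString);
                URLConnection connection =
                    url.openConnection();
                if (connection instanceof HttpURLConnection) {
                    HttpURLConnection httpConnection =
                        (HttpURLConnection)connection;
                    httpConnection.connect();
                    int response =
                        httpConnection.getResponseCode();
                    System.out.println("Response: " + response);
                    // For a redirect, Location holds the target URL.
                    String location =
                        httpConnection.getHeaderField("Location");
                    if (location != null) {
                        System.out.println(
                            "Location: " + location);
                    }
                    InputStream is =
                        httpConnection.getInputStream();
                    byte[] buffer = new byte [256];
                    while (is.read (buffer) != -1) {}
                    is.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}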
Now, when you run the application with a URL of
http://java.sun.com/jdc/:
java SimpleURLCheck http://java.sun.com/jdc/
it should produce the following results:
Response: 301
Location: http://developer.java.sun.com/developer/
The check even works for https URLs (the following command
should be entered on one line):
java SimpleURLCheck
https://www.madonotcall.govconnect.com/
should produce the following results:
Response: 302
Location: cookiestest.asp
If you're behind a firewall, remember to set the corresponding
proxy properties for HTTPS: https.proxyHost and https.proxyPort.
Notice that this particular site runs a quick cookie test. If
you rerun the program with the new URL, appending cookiestest.asp
to the end of the first URL, you see another redirect (again,
the following command goes on one line):
java SimpleURLCheck
https://www.madonotcall.govconnect.com/cookiestest.asp
Response: 302
Location: cookies_error.htm
Of course, this little command-line program doesn't support
cookies, so the web site redirects to an error page. Had the URL
been entered into a browser, the response would have been a
redirect to
https://www.madonotcall.govconnect.com/Welcome.asp.
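The Location values in these examples (cookiestest.asp, cookies_error.htm) are relative. Instead of appending text to the first URL by hand, a checker can resolve a Location value against the URL it requested by using the two-argument URL constructor. The following is a minimal sketch. The class name RedirectResolver is hypothetical, and the code uses a Java 5 constructor, so it is slightly newer than the SDK these tips target.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: turns a (possibly relative) Location header
// value into an absolute URL, using the requested URL as the base.
class RedirectResolver {
    static URL resolve(String requested, String location) {
        try {
            // URL(URL context, String spec) applies the usual
            // relative-URL rules; an absolute spec replaces the context.
            return new URL(new URL(requested), location);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, resolving cookies_error.htm against https://www.madonotcall.govconnect.com/cookiestest.asp yields https://www.madonotcall.govconnect.com/cookies_error.htm. An absolute Location, such as the one returned for http://java.sun.com/jdc/, passes through unchanged.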
Yet another thing to add to the link checker program is a smarter
way to check status codes. Because the program doesn't care what
the content actually is, you can set the request method to HEAD.
A HEAD request asks the server for only the response headers
(including the status code), not the actual data. By default,
HttpURLConnection sends a GET request, which returns the data as
well. You can make a HEAD request with an HttpURLConnection by
calling setRequestMethod("HEAD") before the connection is made.
For example, you can add the following line to the SimpleURLCheck
program, just before the call to connect:
httpConnection.setRequestMethod("HEAD");
Instead of showing the full program again here, you'll see the
setRequestMethod method in use in the enhanced URL check report
later.
Unlike setFollowRedirects, which you can call once to cover all
connections, setRequestMethod needs to be called on each
connection.
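The sketch below shows both per-connection settings (HEAD and redirect control) working together without depending on an outside web site. It starts a throwaway local server with the JDK's com.sun.net.httpserver.HttpServer and checks a few paths on that server. That package and the lambda syntax are newer than the SDK these tips were written for. The class name LinkProbe, the /old and /new paths, and the output format are assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class LinkProbe {
    // Sends a HEAD request without following redirects. Returns the
    // response code, plus " -> " and the Location header if present.
    static String probe(String urlString) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(urlString).openConnection();
            conn.setRequestMethod("HEAD");          // headers only, no body
            conn.setInstanceFollowRedirects(false); // this connection only
            try {
                int code = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                return location == null
                    ? String.valueOf(code)
                    : code + " -> " + location;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test fixture: a local server where /old redirects (301) to /new,
    // /new answers 200, and every other path answers 404.
    // Port 0 asks the operating system for any free port.
    static HttpServer startServer() {
        try {
            HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String path = exchange.getRequestURI().getPath();
                if (path.equals("/old")) {
                    exchange.getResponseHeaders().set("Location", "/new");
                    exchange.sendResponseHeaders(301, -1); // -1: no body
                } else if (path.equals("/new")) {
                    exchange.sendResponseHeaders(200, -1);
                } else {
                    exchange.sendResponseHeaders(404, -1);
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling probe on the /old path reports the 301 and its target instead of silently following it. Because the redirect setting is made on the instance, other connections in the same program keep the default behavior.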
With the SimpleURLCheck program, you are able to check a single
URL at a time. By moving the checking code to a method, and
automating the scanning for URLs from a web page, you can
generate a report on the validity of the URLs. The report can
also track redirects, that is, show where redirected URLs now
point. Because a HEAD response has no body, it is not necessary
to have code that gets the InputStream, reads the data, and then
closes the stream. However, there is no problem if you leave that
code in the program; the read returns -1 immediately.
An earlier Tech Tip titled "Extracting Links from an HTML
File" (http://java.sun.com/jdc/TechTips/1999/tt0923.html#tip1)
presented a program that fetches URLs from a web page. You can
combine that program with the SimpleURLCheck program to
generate the link check report.
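If you don't have that earlier program handy, here is another way to collect the href values from a page: Swing's HTML parser with a callback (javax.swing.text.html.parser.ParserDelegator). This is a swapped-in technique and not necessarily the one the earlier tip used. The class name LinkExtractor is hypothetical, and the generics syntax requires a newer Java version than 1.4.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class LinkExtractor {
    // Returns the href attribute of every <a> tag, in document order.
    // Anchors without an href (such as <a name="top">) are skipped.
    static List<String> extractHrefs(Reader html) {
        final List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback =
            new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleStartTag(HTML.Tag tag,
                        MutableAttributeSet attrs, int pos) {
                    if (tag == HTML.Tag.A) {
                        Object href = attrs.getAttribute(HTML.Attribute.HREF);
                        if (href != null) {
                            hrefs.add(href.toString());
                        }
                    }
                }
            };
        try {
            // true = ignore any charset directive inside the page
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hrefs;
    }
}
```

Each returned value could then be passed to the single-URL check. Relative values need to be resolved against the page URL first.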
For a simple report, let's just print out the response code for
each link (and the redirect URL for links that have one). To
understand the report, you need to understand the range of
response codes for HTTP requests. Going back to RFC 2616, you'll
notice the following categories of response codes:
1xx   informational
2xx   successful
3xx   redirection
4xx   client error
5xx   server error
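In code, the category is simply the hundreds digit of the status code. Here is a minimal sketch of that mapping. The class and method names are illustrative, and the "unknown" bucket is an assumption that covers the -1 that getResponseCode returns when a response is not valid HTTP.

```java
// Hypothetical helper: maps an HTTP status code to its RFC 2616 class.
class ResponseCategory {
    static String categorize(int code) {
        switch (code / 100) {   // e.g. 404 / 100 == 4
            case 1: return "informational";
            case 2: return "successful";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    // 1xx and 2xx answers need no attention in a link report.
    static boolean isHealthy(int code) {
        return code >= 100 && code < 300;
    }
}
```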
If you were to generate a smarter report, you could ignore the
links that return response codes in the 100 and 200 ranges, and
report the rest as problems. Another way to make the report
smarter is to also check internal links, not just external ones.
And, if you want the program to be truly smart, you might want
to ignore redirects like the one from http://sun.com to
http://www.sun.com/, or at least flag them differently. Another
enhancement, though a tricky one, is automatically handling URLs
that get tagged with session information, such as when you visit
a URL like http://www.networksolutions.com/.
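One way to flag a redirect like http://sun.com to http://www.sun.com/ differently is to compare the two URLs after a little normalization. The sketch below treats a redirect as cosmetic when the URLs match after three adjustments: dropping a leading "www.", treating an empty path as "/", and filling in the default port. Those rules and the RedirectFilter name are assumptions made for the illustration, not a standard.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper: true when a redirect only adds or removes "www."
// and/or a root slash, so a report can list it separately.
class RedirectFilter {
    static boolean isCosmetic(String from, String to) {
        try {
            URL a = new URL(from);
            URL b = new URL(a, to);   // the Location value may be relative
            return normalize(a).equals(normalize(b));
        } catch (MalformedURLException e) {
            return false;
        }
    }

    private static String normalize(URL url) {
        String host = url.getHost().toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        String path = url.getFile().isEmpty() ? "/" : url.getFile();
        int port = url.getPort() == -1 ? url.getDefaultPort() : url.getPort();
        return url.getProtocol() + "://" + host + ":" + port + path;
    }
}
```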
Here's what the enhanced URL check program looks like.
import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class EnhancedURLCheck {
    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println(
                "Please provide a URL or file to check");
            System.exit(1);
        }
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet
        // handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective",
            Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            // Iterate through the elements
            // of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            javax.swing.text.Element elem;
            while ((elem = it.next()) != null) {
                AttributeSet s = (AttributeSet)
                    elem.getAttributes().getAttribute(
                        HTML.Tag.A);
                if (s != null) {
                    validateHref(
                        (String)s.getAttribute(
                            HTML.Attribute.HREF));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
        System.exit(0);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
            throws IOException {
        if (uri.startsWith("http:")) {
            // Retrieve from Internet.
            URLConnection conn =
                new URL(uri).openConnection();
            return new
                InputStreamReader(conn.getInputStream());
        } else {
            // Retrieve from file.
            return new FileReader(uri);
        }
    }

    private static void validateHref(String urlString) {
        if ((urlString != null) &&
                urlString.startsWith("http://")) {
            try {
                URL url = new URL(urlString);
                URLConnection connection =
                    url.openConnection();
                if (connection instanceof HttpURLConnection) {
                    HttpURLConnection httpConnection =
                        (HttpURLConnection)connection;
                    // Only the headers are needed, and redirects
                    // should be reported rather than followed.
                    httpConnection.setRequestMethod("HEAD");
                    httpConnection.setInstanceFollowRedirects(false);
                    httpConnection.connect();
                    int response =
                        httpConnection.getResponseCode();
                    System.out.println("[" + response + "]" +
                        urlString);
                    String location =
                        httpConnection.getHeaderField("Location");
                    if (location != null) {
                        System.out.println(
                            "Location: " + location);
                    }
                    System.out.println();
                }
            } catch (IOException e) {
                // Report links that could not be checked at all.
                System.out.println("[error]" + urlString +
                    ": " + e);
                System.out.println();
            }
        }
    }
}
If you run the report on the http://java.sun.com page:
java EnhancedURLCheck http://java.sun.com
it should produce output similar to the following (only the first
few lines of the output are shown):
[200]http://java.sun.com/

[200]http://search.java.sun.com/search/java/advanced.jsp

[200]http://java.sun.com/

[302]http://servlet.java.sun.com/logRedirect/frontpage-nav/http://java.sun.com/products/
Location: http://java.sun.com/products/

[302]http://servlet.java.sun.com/logRedirect/frontpage-nav/http://java.sun.com/j2ee/
Location: http://java.sun.com/j2ee/
For more information about using URLs, see the lesson "Working
with URLs" in the Java Tutorial
(http://java.sun.com/docs/books/tutorial/networking/urls/index.html).
Also see the lesson "Reading from and Writing to a URLConnection"
(http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
REUSING EXCEPTIONS
When an exceptional condition occurs in a program, it's typical
to throw an exception. The syntax for throwing an exception often
looks something like this:
throw new MyException("String");
Literally, you create the exception when the exceptional
condition happens, and immediately throw it to the caller (or to
some enclosing try-catch block).
If you print a stack trace at that point with printStackTrace(),
the information you see is filled with the details of the stack
at the time the exception was created. This might sound obvious.
However, if you want to avoid creating new objects when an
exception occurs, you can create the exception once, and reuse it
when the exceptional condition recurs. A common way to do this is
lazy initialization, as in the Singleton design pattern: you
create the object the first time you need it, and then keep
reusing the same object. For example:
MyObject myObject = null;

private synchronized MyObject getMyObject() {
    if (myObject == null) {
        myObject = new MyObject();
    }
    return myObject;
}
What happens if you use a pattern similar to this with an
Exception object (or in fact anything that subclasses
java.lang.Throwable)? In this case, the stack trace is filled
with the trace from when the exception was created. Typically,
that isn't what you want because you probably want to see the
exact trace for each exception, that is, when each exception
happened. In order to share exception objects in this way, you
have to refill the stack trace for the new stack when the
exception happens. The method of Throwable to do this is
fillInStackTrace.
Here's what the getMyObject pattern looks like after adjusting it
to work with exception objects:
MyExceptionObject myExceptionObject = null;

private synchronized MyExceptionObject
        getMyExceptionObject() {
    if (myExceptionObject == null) {
        myExceptionObject = new MyExceptionObject();
    } else {
        myExceptionObject.fillInStackTrace();
    }
    return myExceptionObject;
}
Notice that you don't have to fill in the stack manually the
first time: the Throwable constructor does it for you. You only
have to fill in the stack for subsequent occurrences.
At this point, you might ask why you would want to reuse
exception objects. One reason to reuse (or even just pre-create)
exception objects is to minimize the number of objects created
when an exception actually happens. Of course, filling in the
stack trace still costs time and memory, so reuse isn't
completely free. And, if you truly don't need the stack trace,
this pattern lets you skip refilling it.
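If you truly don't need a stack trace, a common idiom takes this further. You pre-create the exception and override fillInStackTrace so that it never walks the stack, either at construction or later. This idiom is a swapped-in technique that the tip does not show, and the class name StacklessException is hypothetical.

```java
// Hypothetical exception meant to be created once and thrown many times.
// Overriding fillInStackTrace skips the stack walk, both when the
// constructor runs and on any later call, so the trace stays empty.
class StacklessException extends Exception {
    static final StacklessException INSTANCE =
        new StacklessException("resource unavailable");

    private StacklessException(String message) {
        super(message);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        return this; // deliberately record nothing
    }
}
```

Code can then `throw StacklessException.INSTANCE;` without allocating anything. printStackTrace shows only the exception's class and message. Since Java 7, a subclass can get a similar effect by passing writableStackTrace=false to the protected four-argument Throwable constructor.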
To demonstrate this behavior, the following program shows two
ways of reusing exception objects. For each approach, the
exception is printed three times. In the first case, the same
stack is shown three times, even though three different methods
print the trace. The second case demonstrates the difference in
each stack when you refill the stack trace with each call.
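import java.io.*;

public class ReuseException {
    IOException exception1 = null;

    private synchronized IOException
            getException1() {
        if (exception1 == null) {
            exception1 = new IOException();
        }
        return exception1;
    }

    IOException exception2 = null;

    private synchronized IOException
            getException2() {
        if (exception2 == null) {
            exception2 = new IOException();
        } else {
            exception2.fillInStackTrace();
        }
        return exception2;
    }

    void exception1Method1() {
        getException1().printStackTrace();
    }

    void exception1Method2() {
        getException1().printStackTrace();
    }

    void exception1Method3() {
        getException1().printStackTrace();
    }

    void exception2Method1() {
        getException2().printStackTrace();
    }

    void exception2Method2() {
        getException2().printStackTrace();
    }

    void exception2Method3() {
        getException2().printStackTrace();
    }

    public static void main(String[] args) {
        ReuseException reuse =
            new ReuseException();
        reuse.exception1Method1();
        reuse.exception1Method2();
        reuse.exception1Method3();
        System.out.println("---");
        reuse.exception2Method1();
        reuse.exception2Method2();
        reuse.exception2Method3();
    }
}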
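When you run the program, your output should look something like
this (line numbers depend on how the source file is laid out):
java.io.IOException
        at ReuseException.getException1(ReuseException.java:9)
        at ReuseException.exception1Method1(ReuseException.java:27)
        at ReuseException.main(ReuseException.java:53)
java.io.IOException
        at ReuseException.getException1(ReuseException.java:9)
        at ReuseException.exception1Method1(ReuseException.java:27)
        at ReuseException.main(ReuseException.java:53)
java.io.IOException
        at ReuseException.getException1(ReuseException.java:9)
        at ReuseException.exception1Method1(ReuseException.java:27)
        at ReuseException.main(ReuseException.java:53)
---
java.io.IOException
        at ReuseException.getException2(ReuseException.java:19)
        at ReuseException.exception2Method1(ReuseException.java:39)
        at ReuseException.main(ReuseException.java:57)
java.io.IOException
        at ReuseException.getException2(ReuseException.java:21)
        at ReuseException.exception2Method2(ReuseException.java:43)
        at ReuseException.main(ReuseException.java:58)
java.io.IOException
        at ReuseException.getException2(ReuseException.java:21)
        at ReuseException.exception2Method3(ReuseException.java:47)
        at ReuseException.main(ReuseException.java:59)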
For more information about working with exceptions, see the lesson
"Handling Errors with Exceptions" in the Java Tutorial
(http://java.sun.com/docs/books/tutorial/essential/exceptions/index.html).
Also see the documentation for the Throwable class
(http://java.sun.com/j2se/1.4.1/docs/api/java/lang/Throwable.html).
. . . . . . . . . . . . . . . . . . . . . . .
IMPORTANT: Please read our Terms of Use, Privacy, and Licensing
policies:
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy/
http://developer.java.sun.com/berkeley_license.html
* FEEDBACK
Comments? Send your feedback on the Core Java Technologies
Tech Tips to:
jdc-webmaster@sun.com
* SUBSCRIBE/UNSUBSCRIBE
Subscribe to other Java developer Tech Tips:
- Enterprise Java Technologies Tech Tips. Get tips on using
enterprise Java technologies and APIs, such as those in the
Java 2 Platform, Enterprise Edition (J2EE).
- Wireless Developer Tech Tips. Get tips on using wireless
Java technologies and APIs, such as those in the Java 2
Platform, Micro Edition (J2ME).
To subscribe to these and other JDC publications:
- Go to the JDC Newsletters and Publications page,
(http://developer.java.sun.com/subscription/),
choose the newsletters you want to subscribe to and click
"Update".
- To unsubscribe, go to the subscriptions page,
(http://developer.java.sun.com/subscription/),
uncheck the appropriate checkbox, and click "Update".
- To use our one-click unsubscribe facility, see the link at
the end of this email:
* ARCHIVES
You'll find the Core Java Technologies Tech Tips archives at:
http://java.sun.com/jdc/TechTips/index.html
* COPYRIGHT
Copyright 2003 Sun Microsystems, Inc. All rights reserved.
4150 Network Circle, Santa Clara, California 95054 USA.
This document is protected by copyright. For more information, see:
http://java.sun.com/jdc/copyright.html
Core Java Technologies Tech Tips
April 22, 2003
Trademark Information: http://www.sun.com/suntrademarks/
Java, J2SE, J2EE, J2ME, and all Java-based marks are trademarks
or registered trademarks of Sun Microsystems, Inc. in the
United States and other countries.
