Handling URLs
URLs play an important role on the Internet, and even novice users of the web know about them. In Java, a program can use a URL to retrieve information without needing any more detailed understanding of how the Internet works.
Example - A primitive Web browser
It is easy to write a web browser in Java that reads and displays information from a remote host computer, given a URL. The following application does this: the user types a URL, including the http:// part, into the text field, and the contents of the web page are displayed in the text area.
This is the second way of creating a web browser in these notes. The first used a telnet program, and another method is yet to come.
import java.awt.*;
import java.awt.event.*;
import java.net.*;
import java.io.*;

public class MiniBrowser extends Frame implements ActionListener {

    private TextField txtInput;
    private TextArea contents;
    private Button btnDisplay;
    private Button exit;

    public static void main(String[] args) {
        MiniBrowser m = new MiniBrowser();
        m.makeGUI();
        m.setSize(300, 400);
        m.setVisible(true);
    }

    // Build the window: a text field for the URL, a display button,
    // a scrolling text area for the page and an exit button
    public void makeGUI() {
        setLayout(new FlowLayout());
        txtInput = new TextField(50);
        add(txtInput);
        btnDisplay = new Button("Display page at this URL");
        add(btnDisplay);
        contents = new TextArea("", 0, 0, TextArea.SCROLLBARS_VERTICAL_ONLY);
        add(contents);
        btnDisplay.addActionListener(this);
        exit = new Button("exit");
        add(exit);
        exit.addActionListener(this);
    }

    public void actionPerformed(ActionEvent event) {
        if (event.getSource() == exit)
            System.exit(0);

        String line;
        String location = txtInput.getText();
        try {
            // Throws MalformedURLException if the string is not a valid URL
            URL url = new URL(location);
            // Wrap the raw byte stream so that it can be read as lines of characters
            BufferedReader input = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            while ((line = input.readLine()) != null) {
                contents.append(line);
                contents.append("\n"); // readLine strips the line terminator, so put it back
            }
            input.close();
        }
        catch (MalformedURLException e) {
            contents.setText("Invalid URL format");
        }
        catch (IOException io) {
            contents.setText(io.toString());
        }
    }
}
The program first creates a URL object by calling a constructor of the class URL, passing the string version of the URL as the parameter. If the syntax of the URL is wrong, for example if the http:// part is missing, the constructor throws a MalformedURLException and the program displays an error message in the text area.
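To see the syntax check on its own, the following minimal sketch wraps the URL constructor in a hypothetical helper, isWellFormed, that reports whether a string is accepted. Constructing a URL only checks the syntax and that Java has a handler for the protocol; it does not contact the host, so a server that cannot be reached only shows up later, as an IOException from openStream. (The URL(String) constructor is deprecated as of Java 20 in favour of URI.create(...).toURL(), but it still compiles and runs.)

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCheck {

    // Hypothetical helper: true if the URL constructor accepts the string,
    // false if it throws MalformedURLException. No network connection is made.
    static boolean isWellFormed(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("http://www.shu.ac.uk")); // true
        System.out.println(isWellFormed("www.shu.ac.uk"));        // false: no protocol
        System.out.println(isWellFormed("htp://www.shu.ac.uk"));  // false: unknown protocol
    }
}
```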
Streams
Accessing information across the Internet is done with the library classes for streams. These are the same classes that are used to read and write files, so reading from or writing to the network is just like reading from or writing to a serial (sequential) file. The java.io package contains a large number of stream classes, and the trick is to select the appropriate one. Some of them are of no use for Internet access, for example those that provide random access or line numbering on input streams. Some treat a stream as binary data (the byte streams, subclasses of InputStream and OutputStream) and others treat it as character data (the character streams, subclasses of Reader and Writer).
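The difference between the two kinds of stream shows up as soon as the text contains a character that needs more than one byte. In this sketch (the class and method names are made up for illustration, and UTF-8 is assumed as the encoding) the same five bytes, the word "café", are read once through an InputStream, which delivers one value per byte, and once through an InputStreamReader, which decodes them into four characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BytesVersusChars {

    // Byte stream: each call to read() returns one byte (0-255), or -1 at the end
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Character stream: the bytes are decoded as UTF-8 and each call to read()
    // returns one character, or -1 at the end
    static int countChars(byte[] data) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final e
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 5: the accented e is encoded as two bytes
        System.out.println(countChars(data)); // 4: but it is a single character
    }
}
```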
Within the program, an input stream is created:
InputStream is = url.openStream();
This stream is connected to the resource named by the URL. We could read bytes directly from this object, but it is more convenient to wrap it in other stream objects:
InputStreamReader isr = new InputStreamReader(is);
BufferedReader input = new BufferedReader(isr);
These are classes that support character input rather than binary input, because the assumption is that the data at the desired URL is characters (HTML). InputStreamReader converts the incoming bytes into characters, using the platform's default character encoding since none is specified, and BufferedReader lets us process a whole line at a time.
A while loop reads one line at a time using the method readLine of class BufferedReader, and each line is appended to the text area. This continues until there are no more lines, which readLine signals by returning null. Because readLine strips the end-of-line characters, the program has to put a newline back after each line.
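The reading loop does not depend on the network, so it can be tried on its own. This sketch assumes a hypothetical readAll method that takes any InputStream, wraps it in the same way as the browser does and restores the newline that readLine removes. A ByteArrayInputStream stands in for url.openStream() so that it runs without a connection, and UTF-8 is passed explicitly rather than relying on the platform default. The try-with-resources statement (Java 7 and later) also closes the stream if an exception is thrown part way through, which the explicit input.close() call in the browser does not.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LineReader {

    // Read every line from a byte stream and put back the "\n" that readLine removes.
    // The wrapping is the same as in the browser: InputStream -> InputStreamReader -> BufferedReader.
    static String readAll(InputStream is) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader, and the stream underneath it, even on an exception
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = input.readLine()) != null) { // null means the end of the stream
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Web servers usually end lines with "\r\n"; readLine accepts "\r\n", "\n" or "\r"
        byte[] page = "<html>\r\n<body>Hello</body>\r\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page)));
    }
}
```

Passing url.openStream() instead of the ByteArrayInputStream would return the page text that the browser appends to its text area.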
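Streams are:
- the same mechanism that Java uses for reading and writing files, so the network is accessed just like a serial file;
- either byte streams, which treat the data as binary, or character streams, which treat it as text.
To use an input stream:
1. open it, here by calling url.openStream();
2. wrap it in an InputStreamReader and a BufferedReader so that text can be read a line at a time;
3. call readLine repeatedly until it returns null;
4. close it by calling close().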
Conclusion
Note that this program uses a high-level Java API. It needs only a URL, typed in by the user, to retrieve a web page from a remote host. There is no mention of TCP, IP, IP addresses or sockets, although all of them are used behind the scenes. This demonstrates how Java has been designed to carry out Internet tasks easily. The Java library classes used by this program know about and use the HTTP protocol (see the reference summary on HTTP as part of these notes). For example, the HTTP header sent by the web server has been stripped off before the data reaches the client program. The classes know about the protocol but not about the content, which is displayed as raw HTML. This web browser assumes that the data retrieved from the site is text. If the data expected were a GIF file or a Java class file, some other class, such as a plain byte stream rather than a character reader, would be needed to input it. Finally, the program of course assumes that a web server program is running on the remote host to retrieve the information and send it to this client according to the HTTP protocol.
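For binary data such as a GIF file, the stream can be read as raw bytes, with no InputStreamReader in between. The sketch below is one way of doing this, assuming Java 9 or later for InputStream.readAllBytes; the class name BinaryFetch and the fallback URL are only examples. It goes through URLConnection, which url.openConnection() returns, because its getContentType method gives access to the Content-Type header that the HTTP handler keeps out of the data itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class BinaryFetch {

    // Read the whole resource as raw bytes. No InputStreamReader is involved,
    // so nothing is decoded as characters, which is what binary data such as
    // a GIF image or a class file needs.
    static byte[] fetchBytes(URLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes(); // InputStream.readAllBytes needs Java 9 or later
        }
    }

    public static void main(String[] args) throws IOException {
        // The URL comes from the command line; the fallback is only an example
        String location = args.length > 0 ? args[0] : "http://www.shu.ac.uk/";
        URLConnection conn = new URL(location).openConnection();
        // For HTTP this is the value of the Content-Type response header (may be null)
        System.out.println("Content-Type: " + conn.getContentType());
        byte[] data = fetchBytes(conn);
        System.out.println(data.length + " bytes read");
    }
}
```

The content type tells the client which kind of reader to use: a character stream for text/html, a byte stream for image/gif.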
Summary of class URL
The class URL is a classic Java class, with constructor methods to create an object and access methods to retrieve portions of the URL from an object.
class name: URL
import: import java.net.URL; (or import java.net.*;)

| method | description | example |
| --- | --- | --- |
| constructors | | |
| public URL(String url) | creates a new URL object corresponding to the string supplied; throws MalformedURLException if the string is not a valid URL | URL url = new URL("http://www.shu.ac.uk"); |
| public URL(String protocol, String host, int port, String file) | creates a new URL object from the protocol, host name, port number and file supplied; throws MalformedURLException if the protocol is unknown | URL url = new URL("http", "www.shu.ac.uk", 80, "/default.htm"); |
| object methods | | |
| public String getProtocol() | returns the protocol as a string | String protocol = url.getProtocol(); |
| public String getHost() | returns the host name part of the URL as a string | String name = url.getHost(); |
| public int getPort() | returns the port number as an int, or -1 if the URL does not specify a port | int port = url.getPort(); |
| public String getFile() | returns the path and file name part of the URL (including any query string) as a string | String name = url.getFile(); |
| public final InputStream openStream() throws IOException | makes a connection to the resource specified by the URL and returns an InputStream object from which data can be read | InputStream is = url.openStream(); |
| public String toString() | returns the URL as a string | String urlName = url.toString(); |
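The accessor methods can be tried on the example URLs from the table. This sketch (the class name UrlParts is made up) also calls getDefaultPort and getPath, two further methods of class URL that the table does not list, to show that getPort returns -1 when the string contains no port number, and that getFile includes any query string while getPath does not. The values in the comments assume the standard JDK http handler.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    public static void main(String[] args) throws MalformedURLException {
        URL url = new URL("http://www.shu.ac.uk/default.htm");
        System.out.println(url.getProtocol());    // http
        System.out.println(url.getHost());        // www.shu.ac.uk
        System.out.println(url.getPort());        // -1: no port in the string
        System.out.println(url.getDefaultPort()); // 80: the default port for http
        System.out.println(url.getFile());        // /default.htm

        // The four-argument constructor records the port explicitly
        URL explicit = new URL("http", "www.shu.ac.uk", 80, "/default.htm");
        System.out.println(explicit.getPort());   // 80
        System.out.println(explicit);             // http://www.shu.ac.uk:80/default.htm

        // getFile keeps the query string, getPath drops it
        URL query = new URL("http://www.shu.ac.uk/search?q=java");
        System.out.println(query.getFile());      // /search?q=java
        System.out.println(query.getPath());      // /search
    }
}
```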
Exercises
The aims of this exercise are:
1 Run this primitive web browser program and explore the various exceptions that can arise.
Identify the limitations of this browser and suggest enhancements that could be made.
2 Add a default (home) page so that a particular URL is displayed when the program is run. Add a new (home) button to revert to this home page.
3 Enhance the web browser so that it clears the text area before displaying a new web page.
4 Modify the web browser so that it takes some account of the content of the page that is downloaded. Make it display only those lines that contain neither of the characters > and <. To accomplish this, use the library method indexOf of class String, which returns the integer -1 if the string given as the parameter does not occur in the string it is called on. For example:
if (myString.indexOf(">") == -1) {
    // ">" does not occur in myString
} else {
    // ">" occurs in myString
}
Put this code in your own method.
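One possible shape for such a method is sketched below; the name hasNoAngleBrackets is hypothetical, and the sample lines stand in for the lines returned by readLine. In the browser, the test would go inside the while loop, so that contents.append(line) is only called when the method returns true.

```java
public class TagFilter {

    // Hypothetical helper for exercise 4: true if the line contains neither '<' nor '>'.
    // String.indexOf returns -1 when its argument does not occur in the string.
    static boolean hasNoAngleBrackets(String line) {
        return line.indexOf("<") == -1 && line.indexOf(">") == -1;
    }

    public static void main(String[] args) {
        String[] lines = { "<html>", "Welcome to the home page", "<p>Hello</p>", "plain text again" };
        for (String line : lines) {
            if (hasNoAngleBrackets(line)) {
                System.out.println(line); // only the two lines without tags are printed
            }
        }
    }
}
```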