Web Extractor

Chapter 1
Abstract
This is a purely Java-based networking project with which an end user can download files (data) from various servers at high speed. To use the Web Extractor, the user submits the URL of the file to be downloaded; the software then connects to the server and retrieves the file information and its download status. The user is then asked for the path where the file should be saved, after which the Web Extractor downloads the requested file at high speed. The software displays status information about the download every second, that is, the percentage of the file downloaded so far. The most striking features of the Web Extractor are its support for high-speed data transfer and its ability to download more than one file simultaneously. It provides reliable data transfer by tracking the download status at every moment, and it can continue a download from the position where it stopped if the transfer is interrupted, for example by a lost connection or a transmission failure.
Chapter 2
Project Synopsis
Introduction
Web Extractor is software designed to achieve high speeds when downloading information from the Internet. Its striking feature is that, at any given time, it supports both multiple downloads and multilevel downloads. For example, to download a 1 GB file, instead of fetching it from a single server, the software accepts other sources (URLs) of the same file, if available, and downloads that single file from several servers at once. Downloading from a single server can be delayed by network traffic and by the load on that particular server. The software avoids this by implementing multilevel download with multithreading: depending on the number of URLs the user submits for a single file and on the file size, the download creates one thread per URL, the file is divided into an equal number of parts, and each thread is assigned one part to download. Synchronizing the threads lets the parts be downloaded from different servers simultaneously and then integrated into a single file once all parts have finished. Because the work is shared among the threads, a heavily loaded server or a congested network link has less effect on the final download time. Hence, the design provides high-speed downloads with both multiple and multilevel downloading facilities.
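The equal division of one file among several mirror threads can be illustrated with a short sketch. It is not the project's source: the class name EqualSplitter and the choice to give any leftover bytes to the first parts are assumptions, and it is written in current Java rather than Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: divides a file of fileSize bytes into `parts`
// contiguous, non-overlapping, inclusive byte ranges, one per mirror thread.
final class EqualSplitter {

    /** An inclusive byte range [start, end]. */
    static final class Part {
        final long start;
        final long end;
        Part(long start, long end) { this.start = start; this.end = end; }
        long length() { return end - start + 1; }
        @Override public String toString() { return start + "-" + end; }
    }

    static List<Part> split(long fileSize, int parts) {
        if (fileSize <= 0 || parts <= 0) {
            throw new IllegalArgumentException("fileSize and parts must be positive");
        }
        // Never create more parts than there are bytes.
        int n = (int) Math.min(parts, fileSize);
        long base = fileSize / n;       // bytes every part gets
        long remainder = fileSize % n;  // the first `remainder` parts get one extra byte
        List<Part> result = new ArrayList<>(n);
        long start = 0;
        for (int i = 0; i < n; i++) {
            long len = base + (i < remainder ? 1 : 0);
            result.add(new Part(start, start + len - 1));
            start += len;
        }
        return result;
    }
}
```

For a 10-byte file and three mirrors, the parts are 0-3, 4-6, and 7-9.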
Chapter 3
System Configuration
The software requirements specification is produced at the culmination of the analysis task. The function and performance allocated to the software as part of system engineering are refined by establishing a complete information description, a detailed functional description, a representation of system behavior, an indication of performance requirements and design constraints, appropriate validation criteria, and other information pertinent to the requirements. The project requires the following hardware and software:
Operating System: Windows NT/2000 Professional (Client / Server)
Hardware Configuration
Processor: Pentium III
Clock: 500 MHz
RAM: 128 MB
Software Configuration
Java, JDK, JSDK 2.0, Java Network Programming, Swing, Net
Chapter 4
System Analysis
To analyze and understand the system, we first have to analyze and specify the requirements. The software requirements specification (SRS) is the starting point of software development: it translates the ideas in the minds of the clients (the input) into a formal document (the output). Any formal translation process that produces a formal output must have a precise and unambiguous input. The SRS phase consists of two basic activities:
1. Problem analysis.
2. Requirement specification.
Fact Finding Techniques
In this system we aim to give the user a facility that causes no difficulty during use, such as missing data, one-way contacts, or one-view contacts. Because the system is developed with an image-encoding technique, the user need not be concerned about which camera is supported, and the same applies to sound. Because a single speed-control technique is maintained, frame relay will not be a problem for the user, avoiding issues such as an over-speed or hung display.
Feasibility Study
A feasibility study is a high-level capsule version of the entire system analysis and design process. The study begins by clarifying the problem definition; its purpose is to determine whether the system is worth building. Once an acceptable problem definition has been generated, the analyst develops a logical model of the system, and alternatives are searched for and analyzed carefully. The feasibility study has three parts.
Operational Feasibility
The questions to be asked are:
Will the system be used if it is developed and implemented?
Is there sufficient support for the project from management and from the users?
Have the users been involved in planning and developing the project?
Will the system produce poorer results in any respect or area?
This system can be implemented in the organization because there is adequate support from management and users. Because it is developed in Java, the necessary operations are carried out automatically.
Technical Feasibility
Does the necessary technology exist to do what is suggested?
Does the proposed equipment have the technical capacity to run the new system?
Are there technical guarantees of accuracy, reliability, and data security?
The project is developed on a Pentium III with 128 MB of RAM, and the development environment can be any Windows platform. The design uses the Observer pattern together with the Factory pattern to keep the displayed results up to date.
The language used in development is Java 1.4 (J2SDK 1.4) with Java network programming.
Financial and Economical Feasibility
The system, once developed and installed, will benefit the organization. It will be developed and operated on the existing hardware and software infrastructure, so no additional hardware or software is needed.
Existing and Proposed System
Existing System
System definition is the process of obtaining a clear understanding of the problem space, such as business opportunities, user needs, or the market environment, and defining an application or system to solve that problem. In the existing system, download software fetches the requested file from a single server and displays the status information of the download. Since downloading a file from a single server can be delayed by network traffic and by the load on that particular server, it is difficult to download files at a high rate.
Proposed System
Web Extractor is software designed to achieve high speeds in downloading information from the Internet. Its striking feature is that, at any given time, it supports both multiple downloads and multilevel downloads, the latter implemented with multithreading. For a single file of a given size, the download creates one thread per URL (mirror) submitted by the user, and the file is divided into an equal number of parts; each thread then downloads its designated part from the corresponding server.
Analysis Report
How do download accelerators do it? Anyone who has used Download Accelerator Plus or Go!Zilla to download MP3 or Quake demo files from the Internet may have wondered how these tools download 4-5 times faster than a browser. The resuming feature is even more intriguing, and downloading the same file from multiple servers can seem like rocket science.
Input
The end user submits the URL of the file to be downloaded. The software connects to the server and retrieves the file information, its properties, and its download status. The user is then asked for the path where the file should be saved and for any other URLs (mirrors) of the same file, which initiate a multilevel download.
Expected Output
Web Extractor provides reliable data transfer by tracking the download status of each thread at every moment. Synchronizing the threads allows the parts of the file to be downloaded from different servers simultaneously and integrated into a single file once all parts have been downloaded successfully.
Because of this feature, the load on a connected server or network traffic has less effect on the download time, since the employed threads share the work according to their server and network constraints. The design is also expected to let a download continue from the position where it stopped, whatever the reason (technical problems or general failures).
Scope
This document describes in considerable detail how to develop an object-oriented download accelerator in Java using design patterns such as Factory, MVC, Observer, Singleton, and Visitor, and design techniques such as proxies, separation of concerns, and inversion of control. It also discusses developing plug-ins for Java applications and the basics of HTTP. By the end, you will be able to develop your own download accelerator in Java that follows the most important of these design patterns.
DoIt accelerates Internet downloads by spawning multiple threads, each of which downloads a fragment of the file from the server. After all threads have finished downloading, the fragments are recombined into a single file. DoIt also allows resuming disconnected downloads, a feature very useful for people on slow dial-up lines. Mirror URLs can also be specified, in which case DoIt balances the load by downloading the fragments from the servers in a round-robin fashion.
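A round-robin assignment of fragments to mirrors might look like the sketch below. RoundRobinAssigner is a hypothetical name, and the sketch only decides which mirror serves which fragment; it performs no network I/O.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigns fragment indices to mirror URLs in round-robin order.
final class RoundRobinAssigner {

    /** Returns, for each fragment index i, the mirror that should serve it. */
    static List<String> assign(List<String> mirrors, int fragmentCount) {
        if (mirrors.isEmpty()) {
            throw new IllegalArgumentException("at least one mirror is required");
        }
        List<String> plan = new ArrayList<>(fragmentCount);
        for (int i = 0; i < fragmentCount; i++) {
            // Fragment i goes to mirror (i mod m), cycling through the list.
            plan.add(mirrors.get(i % mirrors.size()));
        }
        return plan;
    }
}
```

A plain rotation ignores server speed: a slow mirror still receives its full share of fragments, which is the price of keeping the scheme simple.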
Chapter 5
System Design
Design Strategy
DoIt has two layers, implemented as two separate Java subpackages:
1. core, which contains all the logic for creating and managing file downloads;
2. gui, which contains all the user interface components.
The core package is described first, followed by the gui package and plug-in development. This organization of the code has several advantages. Modular code lets you select the required functionality (modules) and assemble it into a larger application. If a module performs unsatisfactorily, it can be replaced with a better implementation. It also helps allocate well-defined responsibilities to development teams.
Design and Implementation of 'core'

To understand the UI, you first have to understand the core package. Figures 1 and 2 show the sequence diagrams of the core subpackage.
Figure 1. The core sequence diagram (adding the URLs, getting the download information, and analyzing and filtering the URLs provided by the user).
Figure 2. The core sequence diagram (after analyzing and filtering the URLs provided by the user, preparing CoreDownloadWorkers to start, pause, resume, and stop the download).
The GUI object in the figures is later expanded into specific objects in the UI layer; for now, assume that it routes messages from the user to the core subpackage. The black bar parallel to the CoreDownloadWorkers denotes an asynchronous method call.
UML Sequence Diagrams
Chapter 6
System Description
Web Extractor and Download Workers
As the figure suggests, a CoreDownloadManager object is instantiated for each file download, and a list of mirror URLs (possibly just one) is fed to it. For each URL it creates a CoreDownloadWorker, which acts as a proxy that abstracts the protocol-specific implementation of IProtocolWrapper. For example, there can be two mirror URLs, one using HTTP and the other FTP, both referring to copies of the same file. The ProtocolWrapperFactory references the protocol-specific implementations and dispenses them on request through a static method. This is a typical use of the Factory pattern, in which specific subclasses (here, protocol-specific implementations of IProtocolWrapper) are returned based on the requested type.

The CoreDownloadWorker spawns a separate thread and invokes the getDownloadInfo() method on the protocol implementation. The protocol-specific implementation reads the relevant headers to determine the file size and whether resuming is supported, and stores this information in the CoreDownloadWorker. The protocol implementation is kept separate from the core package because this makes supporting multiple protocols easier, much like developing plug-ins. The HTTP support that ships with the default DoIt distribution, namely http_protocolwrapper.HTTP_ProtocolWrapper, is actually a separate plug-in. HTTP_ProtocolWrapper in turn relies on an open source library called HTTPClient, developed by Ronald Tschalär, to do the actual HTTP communication.

The HTTP Protocol
The HTTP HEAD method is used to find out whether the servers support resuming and, if they do, to find the file size. A HEAD request has the same structure as a GET request, except that the server responds with just the header information, such as support for resuming, the file size, and so forth. This saves the user from downloading the entire file just to get the file information. If the server answers a request that carries a Range header with status 206 (Partial Content), it supports byte ranges; in other words, resuming is supported. The response also carries a Content-Length field giving the size of the returned data; for a 206 response, the total file size appears in the Content-Range field. If, on the other hand, status 200 is returned, the server has ignored the range and does not support resuming, so the file cannot be downloaded in parts. For more details of the HTTP/1.1 protocol, read RFC 2616. Other HTTP configuration properties, such as proxy settings, can be configured through the HTTP_ProtocolWrapper.properties file in the 'pw' (short for Protocol Wrapper) directory.

After all the CoreDownloadWorkers have finished gathering information about the file, the CoreDownloadManager consolidates it. CoreDownloadWorkers that support resuming are chosen over those that do not; if none supports resuming, the first one is chosen. If more than one CoreDownloadWorker supports resuming, all of them are used to balance the load. Using this list, the old CoreDownloadWorkers are deleted and new ones are created, one for each URL in the new list. Since it is now known whether the URLs support resuming, and the file length is known, each CoreDownloadWorker is configured to download a specific fragment of the file. If resuming is not supported, a single CoreDownloadWorker is set to download the entire file.

Downloading the File Fragments Simultaneously
The HTTP plug-in uses the GET method to download the file. If resuming is supported, each CoreDownloadWorker is configured to download a non-overlapping fragment of the file. The request is conveyed to the server(s) by setting the Range header of the HTTP GET request to the specific byte range. For example, if a CoreDownloadWorker has to download the first 1024 bytes, the Range value of its GET request is 'bytes=0-1023'. DoIt downloads in chunks whose size is either the CHUNK_SIZE value in DoIt.properties in the DoIt directory (minimum 50 KB) or one seventh of the file size, whichever is bigger. The actual download runs in a separate thread controlled by the CoreDownloadManager. The bytes read from the protocol implementation are saved into a separate file for each CoreDownloadWorker. After all the CoreDownloadWorkers have finished downloading, the FileRecombiner in the sd.util subpackage combines these files into one file at the location the user initially specified. The partial files are stored in a temporary directory inside the 'dwnld' (short for Download) directory; the directory's name is the MD5 hash of the first mirror URL entered for the download. The MD5 implementation for Java used here was developed by Jon Howell.
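The chunking rule and the resulting Range values can be sketched as follows. The configured CHUNK_SIZE is passed in as a parameter here instead of being read from DoIt.properties, ChunkPlanner is a hypothetical name, and rounding one seventh of the file size up (so that there are at most seven chunks when that term dominates) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the chunk-size rule and HTTP Range values described above.
final class ChunkPlanner {
    static final long MIN_CHUNK = 50L * 1024;   // 50 KB lower bound on CHUNK_SIZE

    /** chunk = max(max(configured, 50 KB), ceil(fileSize / 7)). */
    static long chunkSize(long fileSize, long configuredChunk) {
        long configured = Math.max(configuredChunk, MIN_CHUNK);
        long seventh = (fileSize + 6) / 7;       // ceiling division (assumption)
        return Math.max(configured, seventh);
    }

    /** Builds Range header values such as "bytes=0-1023", one per chunk. */
    static List<String> rangeHeaders(long fileSize, long configuredChunk) {
        long chunk = chunkSize(fileSize, configuredChunk);
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += chunk) {
            // Range end positions are inclusive, hence the -1.
            long end = Math.min(start + chunk, fileSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }
}
```

Each string is a value for the Range request header. With java.net.HttpURLConnection, for example, it would be applied with setRequestProperty("Range", value), whereas DoIt itself goes through the HTTPClient library. A 1024-byte file yields the single value bytes=0-1023, matching the example in the text.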
To support resuming the download, the CoreDownloadManager and CoreDownloadWorkers can be serialized. Since they carry with them the URLs being used and the number of bytes copied until now, downloads can be stopped anytime and resumed later by just serializing and de-serializing the CoreDownloadManager, which in turn serializes and de-serializes the CoreDownloadWorkers. This serialized object graph is stored in a file called DownloadInfo.ser in the temporary directory allocated for this download.
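A minimal sketch of serialization-based resuming, assuming simplified stand-ins for the two classes: WorkerState and ManagerState are hypothetical names that keep only the URL, the fragment start, and the number of bytes copied so far.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CoreDownloadWorker: only the state needed to resume.
class WorkerState implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final long rangeStart;
    long bytesCopied;

    WorkerState(String url, long rangeStart) { this.url = url; this.rangeStart = rangeStart; }

    /** Next byte to request when the download is resumed. */
    long resumeOffset() { return rangeStart + bytesCopied; }
}

// Hypothetical stand-in for CoreDownloadManager. Serializing it also
// serializes every WorkerState it references (the whole object graph).
class ManagerState implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<WorkerState> workers = new ArrayList<>();

    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    static ManagerState load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ManagerState) in.readObject();
        }
    }
}
```

On restart, each worker would request its range starting from resumeOffset() instead of from the beginning of its fragment.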
Figure 1. DoIt screenshot (layout of the UI components).
1. Panel to flash messages.
2. Download panel with CardLayout; the visible panel is the DownloadStatusPanel.
3. Panel inside the DownloadStatusPanel, displaying file details.
4. Panel inside the DownloadStatusPanel, containing an array of DownloadStatusPanelListComponents, each displaying the status of its corresponding CoreDownloadWorker.
5. Panel containing the action buttons.
6. AddURLsPanel, which now lies below the DownloadStatusPanel.
The gui subpackage sits on top of the core layer; the figure shows where each UI subcomponent is located. Because the two layers are clearly separated, the design follows the guidelines of the Model-View-Controller (MVC) pattern. The core package, with its CoreDownloadManager and CoreDownloadWorkers, forms the Model, since it holds the data: information about the file being downloaded, support for resuming, and the status of each CoreDownloadWorker. The gui package, as the name suggests, forms the View (UI), which knows how to display the Model. The Controller usually resides alongside the UI components, since it controls UI activities: what the user should see, how to handle events generated by button clicks, menu handling, and so forth. If the functionality offered by the UI is simple, the Controller is usually folded into the UI components. The UI functionality in this example is not that simple, so the Controller is kept separate from the View. DownloadPanel and DownloadStatusPanel are just JPanel containers with no event-handling logic; they were designed by dragging and dropping components in an IDE (NetBeans). All the event-handling logic is in DownloadPanelController and DownloadStatusPanelController.
Downloading a File
1. Enter the URLs (plural because it can be a list of mirror sites) as indicated in Figure 1, F.
2. Get information about the URLs, then analyze and filter the URLs based on the information collected (not shown in the figure).
3. Obtain the destination file name and location (not shown in the figure).
4. Download the file fragments as indicated in Figure 1, C and D, with the two blue inner rectangles.
5. Recombine the fragments.

There also needs to be a mechanism to handle events such as Pause and Stop, which are triggered by button clicks. The threads can also be controlled individually through pop-up menus opened by right-clicking the buttons. Putting all this event-handling code inside the UI components would lead to confusion.
Component Hierarchy and Layout
Referring again to Figure 1, the DownloadPanel has three subcomponents, indicated by A, B (the big red rectangle), and E. A is a JTextField that flashes messages telling the user what to do. E has a one-row, two-column GridLayout that holds the two JButtons. The text of the left button changes with the context: Start, Continue, Pause, or Resume. Component B, however, is just a plain panel with a CardLayout. Since all visual components in the Java Foundation Classes (JFC), a.k.a. Swing, inherit from the Component class, developers are encouraged to mix and match components or nest them inside one another. This is exactly what has been done with the JPanel shown in Figure 1, B. Because it has a CardLayout, all context-specific components are added one on top of the other, like a deck of cards. So, in the first stage, when the user is entering the URLs, the AddURLsPanel is displayed; when the file download has started, the DownloadStatusPanel is brought to the top.
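The deck-of-cards arrangement of panel B can be sketched with java.awt.CardLayout. The card names and the CardDeck class are illustrative assumptions, and plain JPanels stand in for AddURLsPanel and DownloadStatusPanel. In a running application, show() should be called on the event dispatch thread.

```java
import java.awt.CardLayout;
import javax.swing.JPanel;

// Sketch of panel B: a JPanel whose CardLayout stacks context-specific cards.
final class CardDeck {
    static final String ADD_URLS = "addUrls";   // hypothetical card names
    static final String STATUS = "status";

    final CardLayout cards = new CardLayout();
    final JPanel deck = new JPanel(cards);       // panel B in Figure 1
    final JPanel addUrlsPanel = new JPanel();    // stands in for AddURLsPanel
    final JPanel statusPanel = new JPanel();     // stands in for DownloadStatusPanel

    CardDeck() {
        // Cards are stacked like a deck; the first card added is shown initially.
        deck.add(addUrlsPanel, ADD_URLS);
        deck.add(statusPanel, STATUS);
    }

    /** Brings the named card to the top of the deck. */
    void show(String name) {
        cards.show(deck, name);
    }
}
```

When the download starts, the controller would call show(CardDeck.STATUS) so that the status card replaces the URL-entry card.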
Component Configuration
Before a panel is displayed in B, it has to be configured. The components themselves have no brains; they only hold data. So there must be a way for external entities to configure them, which is the job of the DownloadPanelController. This is commonly referred to as Inversion of Control, because the component is controlled from outside, usually by the object that created it. The Visitor pattern recommends a similar strategy when many classes share some common operation that was not, or could not be, included in their inheritance hierarchy. For classes (Hosts) whose functionality has to be extended without disturbing existing code, you can plug in a Visitor that implements that functionality: the Visitor initiates the process by invoking the Host's 'accept(Visitor)' method, passing itself as the parameter, and the Host then invokes the Visitor's 'visit(Host)' method, passing itself as the parameter. The DownloadPanel and all JPanel subclasses that form the 'deck of cards' implement the IHost interface. However, these IHost implementations do not follow the Visitor pattern completely, since the functionality is inside the 'accept(Visitor)' method itself.
The DownloadPanelController has a separate thread running in the background to do all the housekeeping:

1. Switching the JPanels.
2. Changing the text of the JButtons based on the current context.
3. Handling the events generated by button clicks.
Since the sequence of events is clearly defined, the activities are grouped into sets to be performed at each stage, which makes the code more readable and less error-prone, as indicated by Listing 1. The DownloadPanelController keeps track of the current context by means of state variables and case statements. Listing 2 shows the pseudocode for the method dwnldState(), which handles the 'Download State' in DownloadPanelController. Each of the states, including the Download State, has three substates. The first is ENTER_SUB_STATE, in which the context-specific subcomponent is initialized. The state then enters EVT_PROCESS_SUB_STATE, in which events are accepted and processed in first-in, first-out (FIFO) order from the actionEventsQueue. If an event is meant for a subcomponent, it is forwarded to that component's controller (in this case, the DownloadStatusPanelController). This kind of Separation of Concerns is another design philosophy. The DownloadStatusPanelController has to notify the
DownloadPanelController that it has finished processing the event, so that the DownloadPanelController can move to the next state. This callback mechanism is accomplished through the Observer pattern: the DownloadStatusPanelController extends the java.util.Observable class, and the DownloadPanelController implements the java.util.Observer interface, registering its interest in the messages broadcast by the DownloadStatusPanelController. The DownloadStatusPanelController has another subcomponent, the FullDownloadDetailsPanel, which contains an array of DownloadStatusPanelListComponents.
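The Observer callback between the two controllers can be sketched as below. The class names are hypothetical stand-ins for DownloadStatusPanelController and DownloadPanelController, and the "EVENT_DONE:" message format is an assumption. java.util.Observable and java.util.Observer still exist in current JDKs but have been deprecated since Java 9.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;
import java.util.Observer;

// Stand-in for DownloadStatusPanelController: the observed subject.
@SuppressWarnings("deprecation")
class StatusControllerSketch extends Observable {
    /** Processes an event forwarded by the parent controller, then reports back. */
    void process(String event) {
        // ... the status panel would handle the event here ...
        setChanged();                          // without this, notifyObservers does nothing
        notifyObservers("EVENT_DONE:" + event);
    }
}

// Stand-in for DownloadPanelController: registers interest in the child's messages.
@SuppressWarnings("deprecation")
class PanelControllerSketch implements Observer {
    final List<String> received = new ArrayList<>();

    PanelControllerSketch(StatusControllerSketch child) {
        child.addObserver(this);
    }

    @Override
    public void update(Observable source, Object message) {
        // In the described design, this is where the state machine would advance.
        received.add((String) message);
    }
}
```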
Each DownloadStatusPanelListComponent combines a JTextField, which displays the URL used by the corresponding CoreDownloadWorker, with a JProgressBar, which graphically displays the number of bytes it has downloaded.
Thread-safe Updates of the UI
The Swing library is powerful and flexible, but as a standard practice the UI must be updated only by the AWT event dispatch thread. So the UI-handling code is placed in a Runnable target that is handed to that thread (typically through SwingUtilities.invokeLater), as in the example in Listing 3. This keeps UI updates thread-safe.
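The hand-off to the event dispatch thread can be sketched with SwingUtilities.invokeLater. ProgressReporter is a hypothetical class: it computes a percentage on the calling (download) thread and queues only the JProgressBar update for the event dispatch thread.

```java
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;

// Sketch: a download thread reports progress, but only the event dispatch
// thread touches the JProgressBar.
final class ProgressReporter {
    // Created here for brevity; a real UI would build it on the event dispatch thread.
    final JProgressBar bar = new JProgressBar(0, 100);

    /** Safe to call from any thread, e.g. a CoreDownloadWorker's download thread. */
    void report(long bytesDone, long bytesTotal) {
        if (bytesTotal <= 0) {
            return;                              // unknown size: nothing to show
        }
        final int percent = (int) (bytesDone * 100 / bytesTotal);
        // Queue the UI change instead of performing it on the calling thread.
        SwingUtilities.invokeLater(() -> bar.setValue(percent));
    }
}
```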
Safety Measures
All the UI components mentioned above are placed in a JFrame called MainFrame. Since only one instance of MainFrame is needed, and to prevent others from accidentally creating multiple instances, its constructor is made private; in other words, MainFrame is a Singleton. A public static method maintains a private static reference to the MainFrame instance. The first time the method is called, the reference is null, so a new instance is created and stored in the private static variable; the same reference is returned to all subsequent calls. This is how the number of instances of a class is controlled. To avoid running multiple instances of DoIt on the same computer, DoIt opens a ServerSocket on a predefined port. Nothing is actually served to clients that connect to the port; it only prevents other instances of DoIt from starting, because the first instance has already bound to that port. However, another instance can be run on a different port by editing the DoIt.properties file, so this cannot stop a very determined hacker.
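Both safety measures can be sketched in a few lines. MainFrameSketch is a plain class standing in for the JFrame so that the sketch runs without a display, InstanceLock is a hypothetical name, and making getInstance() synchronized is an addition the text does not mention.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Lazily created singleton, as described for MainFrame.
final class MainFrameSketch {
    private static MainFrameSketch instance;   // null until the first request

    private MainFrameSketch() { }              // no instantiation from outside

    // synchronized keeps two threads from both seeing null and creating two instances.
    static synchronized MainFrameSketch getInstance() {
        if (instance == null) {
            instance = new MainFrameSketch();
        }
        return instance;
    }
}

// Single-instance guard: whichever process binds the port first "owns" the application.
final class InstanceLock {
    private ServerSocket socket;

    /** Returns true if this process acquired the port, false if another process holds it. */
    boolean tryAcquire(int port) {
        try {
            // Loopback only; nothing is ever accepted on this socket.
            socket = new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
            return true;
        } catch (IOException alreadyBound) {
            return false;
        }
    }

    int port() { return socket.getLocalPort(); }

    void release() throws IOException {
        if (socket != null) socket.close();
    }
}
```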
Developing Plug-ins
The protocol implementations are expected to be in separate Java Archive (JAR) files whose manifest contains a Main-Class entry. The entry specifies the fully qualified name of the class implementing the IProtocolWrapper interface, without the '.class' extension. Listing 4 shows the manifest entry for HTTP_ProtocolWrapper.jar. When DoIt starts, all the JAR files in the 'pw' directory are loaded into a new URLClassLoader, along with the default ClassLoader, as shown in Listing 5. This new ClassLoader is set as the current thread's context ClassLoader, from which the IProtocolWrapperFactory later picks it up. There, all the classes named by the Main-Class entries in their respective JAR files, which implement the IProtocolWrapper interface, are loaded and cached (see Listing 6). Loading the JAR files programmatically saves the user a lot of trouble: all that is needed is to copy the plug-in to the 'pw' directory and restart DoIt.
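The plug-in factory can be sketched as a cache of implementations keyed by protocol, plus a helper that reads the Main-Class entry from a plug-in JAR. ProtocolWrapper and ProtocolWrapperRegistry are hypothetical, simplified stand-ins for IProtocolWrapper and the factory; the class loading through a URLClassLoader is left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical stand-in for IProtocolWrapper; the real interface has more methods.
interface ProtocolWrapper {
    String protocol();   // e.g. "http"
}

// Sketch of a factory that caches one implementation per protocol and hands it
// out through static methods.
final class ProtocolWrapperRegistry {
    private static final Map<String, ProtocolWrapper> CACHE = new HashMap<>();

    private ProtocolWrapperRegistry() { }

    static synchronized void register(ProtocolWrapper wrapper) {
        CACHE.put(wrapper.protocol().toLowerCase(Locale.ROOT), wrapper);
    }

    /** Returns the wrapper for the URL's scheme, or null if no plug-in supports it. */
    static synchronized ProtocolWrapper forUrl(String url) {
        int colon = url.indexOf(':');
        if (colon < 0) return null;
        return CACHE.get(url.substring(0, colon).toLowerCase(Locale.ROOT));
    }

    /** Reads the Main-Class entry that names a plug-in's implementation class. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }
}
```

Given the class name returned by mainClassOf(), an implementation could be instantiated with Class.forName(name, true, loader).asSubclass(ProtocolWrapper.class).getDeclaredConstructor().newInstance() and then passed to register().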
UML Class and Collaboration Diagrams
Add URLs Panel:
This diagram explains the constructs used in designing the URLs panel of the Web Extractor, which takes as input the URL from which the intended file is to be downloaded and submits it for further processing.
Core Download Worker:
This diagram explains the constructs used in designing the CoreDownloadWorker, which organizes the download of files by implementing multithreading according to the number of URLs submitted by the user. The CoreDownloadWorker is the main part of the source code and is responsible for implementing the project's download logic.
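The core idea behind the CoreDownloadWorkers, one concurrent task per fragment with the results joined in order, can be sketched with an ExecutorService. RangeSource is a hypothetical abstraction for "something that can fetch a byte range" (a protocol wrapper in the real design), and the fragments are held in memory here, whereas DoIt writes each one to its own temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical abstraction standing in for a protocol wrapper.
interface RangeSource {
    byte[] fetch(long start, long endInclusive) throws Exception;
}

final class ParallelFetcher {
    /** Downloads `fragments` equal parts concurrently, rotating through the mirrors. */
    static byte[] download(List<RangeSource> mirrors, long fileSize, int fragments)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(fragments);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            long base = fileSize / fragments, extra = fileSize % fragments, start = 0;
            for (int i = 0; i < fragments; i++) {
                long len = base + (i < extra ? 1 : 0);
                final long s = start, e = start + len - 1;
                final RangeSource src = mirrors.get(i % mirrors.size());
                parts.add(pool.submit(() -> src.fetch(s, e)));   // runs on a pool thread
                start += len;
            }
            // Future.get() waits for each fragment; iterating in order keeps bytes in order.
            ByteArrayOutputStream whole = new ByteArrayOutputStream();
            for (Future<byte[]> f : parts) {
                whole.write(f.get());
            }
            return whole.toByteArray();
        } finally {
            pool.shutdown();
        }
    }
}
```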
Core Web Extractor:
This diagram explains the methods and parameters used in designing the core Web Extractor, which starts the download process when the user submits a file-download request.
DoIt:
This diagram explains the constructs used in designing the main program of the project, which starts the Web Extractor, prompts the user to submit the URL of the file to be downloaded, and carries out the subsequent activities in the sequence defined by the collaborating classes and methods.
Download List Panel:
This diagram explains the constructs used in designing the download list panel, which maintains the list of all downloads initiated from the Web Extractor by submitting URLs to the system. It also maintains the associated information through the classes that collaborate with it.
Web Extractor:
This diagram explains the constructs involved in designing the basic download process, including the class structure, the associated collaborations, and the methods and parameters involved.
Web Extractor GUI Constants:
This diagram explains the various constants used in the programming of the graphical user interface of the Web Extractor. They are also represented with their data types and their flow through various classes.
Download Panel:
This diagram explains the constructs used in designing the Download panel, which displays information about the multilevel download process initiated by the user and integrates the display of further progress through its associated internal panels.
Download Status Panel:
This diagram explains the constructs used in designing the Download status panel, which displays the status report of the multilevel download process initiated by the user, with a separate status indicator for each part.
Download Status Panel List Component:
This diagram explains the constructs used in designing the Download status panel list component, which displays the status of the multilevel download initiated by the user, listing the threads involved with a separate status indicator for each.
File Recombiner:
This diagram explains the constructs used in designing the file recombining process in the Web Extractor, in which the file parts downloaded from various URLs are recombined into a single file and handed to the core Web Extractor for further processing.
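A minimal recombination sketch, assuming the part files are already on disk and listed in fragment order. RecombinerSketch is a hypothetical name, not the project's FileRecombiner.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Appends each part file, in fragment order, to the destination file.
final class RecombinerSketch {
    static void recombine(List<File> partsInOrder, File destination) throws IOException {
        try (OutputStream out = Files.newOutputStream(destination.toPath(),
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (File part : partsInOrder) {
                // Streams the part into the output, so large parts never sit fully in memory.
                Files.copy(part.toPath(), out);
            }
        }
    }
}
```

Order matters: the list must follow the byte ranges, not the order in which the threads happened to finish.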
Full Download details panel:
This diagram explains the constructs used in designing the full download details panel and its links with the other panels in maintaining the sequence of the download process, where the progress of the download is also displayed.
Global Constants:
This diagram explains the various constants used in the programming of the software. They are also represented with their data types and their flow through various classes.
Grid Layout:
This class and collaboration diagram explains the constructs used in designing the GridLayout used in the download process.
HTTP_ProtocolWrapper:
This diagram explains the class and collaboration structure of the HTTP_ProtocolWrapper class for the Web Extractor, which provides the protocol functionality and services used to download information from various sites.
IHost:
This diagram explains the constructs used in designing the IHost for the Web Extractor where it maintains the details of accepting the URLs and displaying the sequence of the host information in various panels and frames used in the process.
InputOutput:
This class and collaboration diagram of InputOutput explains the constructs used in designing the input and output functionality of the Web Extractor, where the transfer of input and output data through streams is coded.
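Transferring data through streams usually reduces to a buffered read/write loop. The sketch below is one possible shape for it; the class `StreamCopier`, the 8 KB buffer size and the progress callback are assumptions added for illustration (the callback shows how the status panels could be fed) and are not taken from the project's InputOutput class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.LongConsumer;

/** Hypothetical stream copier used by a download thread. */
class StreamCopier {
    static final int BUFFER_SIZE = 8192; // assumed buffer size

    /** Copies in to out until end of stream, reporting the running total; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, LongConsumer onProgress) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // read() returns -1 at end of stream
            out.write(buffer, 0, n);
            total += n;
            onProgress.accept(total);
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink,
                bytes -> System.out.println(bytes + " bytes so far"));
        System.out.println("copied " + copied);
    }
}
```

In a real download, `in` would be the stream returned by the server connection and `out` a file stream for the part being saved.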
JFrame:
This diagram explains the constructs used in designing the JFrame for the Web Extractor, where the actions and events needed for the basic purpose of the project are designed.
Main Frame:
This class and collaboration diagram explains the constructs used in designing the main frame of the project, from which the other panels and frames are opened to follow the further progress of the download.
Main Trash:
This diagram explains the methods and parameters used in designing the main trash system. The methods in this file are main() and maintrash(), shown with their return types. The diagram also shows how the other classes of the project collaborate with this class to carry out the basic download process.
MD5:
This diagram explains the constructs used in designing MD5.java, which is designed to maintain the threads that download the files, trace the file information and keep track of the download process.
Protocol Wrapper:
This diagram explains the constructs used in designing the protocol wrapper, where the protocols and their constructs are implemented to transfer data from the URL paths to the local system: sending the request to each URL, reading the data from the server and writing it back to the system.
Protocol Wrapper Factory:
This diagram explains the methods and parameters used to code the basic functionality of the protocols implemented in this software for downloading information over the Internet through an IP-based data network. It also shows how the protocols are called to initialize the threads that download the requested information from the various URLs submitted.
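A factory of this kind typically inspects the URL scheme and returns the matching wrapper, so that the core can start one download thread per URL without knowing the protocol details. The sketch below assumes that only HTTP and HTTPS are supported; `ProtocolWrapper`, `HttpWrapper` and `ProtocolWrapperFactory.create` are hypothetical names loosely modelled on the class names in the diagrams, not the project's actual signatures.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

/** Hypothetical common interface for all protocol wrappers. */
interface ProtocolWrapper {
    String protocol();
}

/** Hypothetical wrapper for http:// and https:// URLs; a real one would also open connections. */
class HttpWrapper implements ProtocolWrapper {
    private final URL url;

    HttpWrapper(URL url) {
        this.url = url;
    }

    public String protocol() {
        return url.getProtocol();
    }
}

class ProtocolWrapperFactory {

    /** Chooses a wrapper from the URL scheme; schemes without a wrapper are rejected. */
    static ProtocolWrapper create(String address) throws MalformedURLException {
        URL url = URI.create(address).toURL();
        switch (url.getProtocol().toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return new HttpWrapper(url);
            default:
                throw new IllegalArgumentException("Unsupported protocol: " + url.getProtocol());
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        ProtocolWrapper w = create("http://example.com/file.zip");
        System.out.println(w.protocol()); // http
    }
}
```

Adding another protocol would then mean adding one wrapper class and one `case`, leaving the code that starts the download threads unchanged.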
Software Overview
About Java
Initially the language was called Oak, but it was renamed Java in 1995. The primary motivation for this language was the need for a platform-independent (i.e., architecture-neutral) language that could be used to create software embedded in various consumer electronic devices.
Java is a programmer's language. Java is cohesive and consistent. Except for the constraints imposed by the Internet environment, Java gives the programmer full control. Finally, Java is to Internet programming what C was to system programming.
Importance of Java to the Internet
Java has had a profound effect on the Internet because it expands the universe of objects that can move about freely in cyberspace. In a network, two categories of objects are transmitted between the server and the personal computer: passive information and dynamic, active programs. Dynamic, self-executing programs cause serious problems in the areas of security and portability. Java addresses those concerns and, by doing so, has opened the door to an exciting new form of program called the applet. Java can be used to create two types of programs: applications and applets. An application is a program that runs on a computer under the operating system of that computer; it is much like a program created using C or C++. Java's ability to create applets is what makes it important. An applet is an application designed to be transmitted over the Internet and executed by a Java-compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network just like an image. The difference is that it is an intelligent program, not just a media file: it can react to user input and change dynamically.
Security
Every time you download a normal program, you are risking a viral infection. Prior to Java, most users did not download executable programs frequently, and those who did scanned them for viruses before execution. Even so, most users still worried about the possibility of infecting their systems with a virus. In addition, another type of malicious program exists that must be guarded against: one that gathers private information, such as credit card numbers, bank account balances and passwords. Java answers both of these concerns by providing a firewall between a networked application and your computer. When you use a Java-compatible web browser, you can safely download Java applets without fear of virus infection or malicious intent.
Portability
For programs to be dynamically downloaded to all the various types of platforms connected to the Internet, some means of generating portable executable code is needed. As you will see, the same mechanism that helps ensure security also helps create portability. Indeed, Java's solution to these two problems is both elegant and efficient.
The Byte Code
The key that allows Java to solve both the security and the portability problems is that the output of the Java compiler is byte code. Byte code is a highly optimized set of instructions designed to be executed by the Java run-time system, which is called the Java Virtual Machine (JVM). That is, in its standard form, the JVM is an interpreter for byte code. Translating a Java program into byte code makes it much easier to run the program in a wide variety of environments, because once the run-time package exists for a given system, any Java program can run on it. Although Java was designed for interpretation, nothing about Java technically prevents on-the-fly compilation of byte code into native code, and Sun introduced a Just In Time (JIT) compiler for byte code for this purpose. When the JIT compiler is part of the JVM, it compiles byte code into executable code in real time, piece by piece, on demand. It is not possible to compile an entire Java program into executable code all at once, because Java performs various checks that can be done only at run time; instead, the JIT compiles code as it is needed during execution.
Java Virtual Machine (JVM)
Beyond the language, there is the Java Virtual Machine, an important element of Java technology. The virtual machine can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified: byte code verification takes place when the class is loaded, before it runs, to make sure that the code is well formed and safe to execute. Byte code verification is therefore integral to loading and executing Java code.
Figure: Java development process (a .java source file is compiled by javac into a .class file of Java byte code, which is executed by the Java Virtual Machine).
The above picture shows the development process a typical Java programmer uses to produce byte code and execute it. The first box indicates that the Java source code is located in a .java file that is processed with the Java compiler, javac. The compiler produces a .class file, which contains the byte code. The class file is then loaded, across the network or locally on your machine, into the execution environment, the Java Virtual Machine, which interprets and executes the byte code.
Java Architecture
Java architecture provides a portable, robust, high-performing environment for development. Java provides portability by compiling to byte code for the Java Virtual Machine, which is then interpreted on each platform by the run-time environment. Java is a dynamic system, able to load code when needed from a machine in the same room or across the planet.
Compilation of Code
When you compile the code, the Java compiler creates machine code (called byte code) for a hypothetical machine called the Java Virtual Machine (JVM). The JVM is supposed to execute the byte code. The JVM was created to overcome the issue of portability: the code is written and compiled for one machine and interpreted on all machines.
Figure: Compiling and interpreting Java source code (source code compiled on a PC, Macintosh or SPARC produces the same platform-independent Java byte code, which a Java interpreter then runs on a PC, Macintosh or SPARC).
During run-time the Java interpreter tricks the byte code file into thinking that it is running on a Java Virtual Machine. In reality this could be an Intel Pentium running Windows 95, a Sun SPARCstation running Solaris or an Apple Macintosh, and all of them could receive code from any computer through the Internet and run the applets.
Simple
Java was designed to be easy for the professional programmer to learn and to use effectively. If you are an experienced C++ programmer, learning Java will be even easier, because Java inherits the C/C++ syntax and many of the object-oriented features of C++. Most of the confusing concepts from C++ are either left out of Java or implemented in a cleaner, more approachable manner. In Java there are a small number of clearly defined ways to accomplish a given task.
Object-Oriented
Java was not designed to be source-code compatible with any other language. This allowed the Java team the freedom to design with a blank slate. One outcome of this was a clean, usable, pragmatic approach to objects. The object model in Java is simple and easy to extend, while simple types, such as integers, are kept as high-performance non-objects.
Robust
The multi-platform environment of the Web places extraordinary demands on a program, because the program must execute reliably in a variety of systems. The ability to create robust programs was given a high priority in the design of Java. Java is a strictly typed language; it checks your code at compile time and at run time. Java virtually eliminates the problems of memory management and deallocation, which is completely automatic. In a well-written Java program, all run-time errors can and should be managed by your program.
HTML
Hypertext Markup Language (HTML), the language of the World Wide Web (WWW), allows users to produce Web pages that include text, graphics and pointers to other Web pages (hyperlinks). HTML is not a programming language; it is an application of ISO Standard 8879, SGML (Standard Generalized Markup Language), specialized to hypertext and adapted to the Web. The idea behind hypertext is that instead of reading text in a rigid linear structure, we can easily jump from one point to another and navigate through the information based on our interest and preference. A markup language is simply a series of elements, each delimited with special characters, that define how text or other items enclosed within the elements should be displayed. Hyperlinks are underlined or emphasized words that lead to other documents or to other portions of the same document.
HTML can be used to display any type of document on the host computer, which can be at a different geographical location. It is a versatile language and can be used on any platform or desktop. HTML provides tags (special codes) to make the document look attractive. HTML tags are not case-sensitive. Graphics, fonts, different sizes, colors, etc., can enhance the presentation of the document. Anything that is not a tag is part of the document itself.
Basic HTML Tags:
<!-- ... -->             Specifies comments
<A>...</A>               Creates hypertext links
<B>...</B>               Formats text as bold
<BIG>...</BIG>           Formats text in a large font
<BODY>...</BODY>         Contains all tags and text in the HTML document
<CENTER>...</CENTER>     Centers text
<DD>...</DD>             Definition of a term
<DL>...</DL>             Creates a definition list
<FONT>...</FONT>         Formats text with a particular font
<FORM>...</FORM>         Encloses a fill-out form
<FRAME>                  Defines a particular frame in a set of frames
<H#>...</H#>             Creates headings of different levels
<HEAD>...</HEAD>         Contains tags that specify information about a document
<HR>                     Creates a horizontal rule
<HTML>...</HTML>         Contains all other HTML tags
<META>                   Provides meta-information about a document
<SCRIPT>...</SCRIPT>     Contains client-side or server-side script
<TABLE>...</TABLE>       Creates a table
<TD>...</TD>             Indicates table data in a table
<TR>...</TR>             Designates a table row
<TH>...</TH>             Creates a heading in a table
ADVANTAGES
An HTML document is small and hence easy to send over the net. It is small because it does not include formatting information. HTML is platform independent. HTML tags are not case-sensitive.
Chapter 7
Testing & Debugging Strategies
Testing
Testing is the process of detecting errors. Testing plays a very critical role in quality assurance and in ensuring the reliability of software. The results of testing are also used later, during maintenance.
Psychology of Testing
The aim of testing is often taken to be demonstrating that a program works by showing that it has no errors. However, the basic purpose of the testing phase is to detect the errors that may be present in the program. Hence one should not start testing with the intent of showing that a program works; the intent should be to show that a program doesn't work. Testing is the process of executing a program with the intent of finding errors.
Testing Objectives
The main objective of testing is to uncover a host of errors, systematically and with minimum effort and time. Stated formally:
Testing is a process of executing a program with the intent of finding an error.
A successful test is one that uncovers an as yet undiscovered error.
A good test case is one that has a high probability of finding an error, if it exists.
If testing reveals no errors, then either the tests are inadequate to detect the errors that may be present, or the software more or less conforms to the quality and reliability standards.
Levels of Testing
In order to uncover the errors present in different phases, we have the concept of levels of testing. The basic levels of testing are shown below:
Client Needs        Acceptance Testing
Requirements        System Testing
Design              Integration Testing
Code                Unit Testing
The philosophy behind testing is to find errors, and test cases are devised with this in mind. One strategy employed for system testing is code testing.
Code Testing
This strategy examines the logic of the program. To follow this method, we developed test data that resulted in executing every instruction in the program and module, i.e. every path is tested. Systems are not designed as entire systems, nor are they tested as single systems. To ensure that the coding is correct, two types of testing are performed on all systems.
Types of Testing
1. Unit Testing
2. Link Testing
Unit Testing
Unit testing focuses verification effort on the smallest unit of software, i.e. the module. Using the detailed design and the process specifications, testing is done to uncover errors within the boundary of the module. All modules must pass the unit test before integration testing begins. In this project each service can be thought of as a module; there are two modules, Client and Server. Each module has been tested by giving it different sets of inputs, both while developing it and after finishing its development, so that each module works without any error. The inputs are validated when they are accepted from the user. In this application the developer tests the programs as they are built up into a system. Software units in a system are the modules and routines that are assembled and integrated to form a specific function. Unit testing is first done on modules, independently of one another, to locate errors. This enables errors within each module to be detected, while errors resulting from interaction between modules are initially avoided.
Link Testing
Link testing does not test software but rather the integration of each module in the system. The primary concern is the compatibility of each module. The programmer tests cases where modules are designed with different parameters, lengths, types, etc.
Integration Testing
After unit testing, we have to perform integration testing. The goal here is to see whether modules can be integrated properly, the emphasis being on testing the interfaces between modules. This testing activity can be considered as testing the design, hence the emphasis on testing module interactions. In this project, integrating all the modules forms the main system. When integrating the modules, I checked whether the integration affects the working of any of the services by giving different combinations of inputs with which the two services ran correctly before integration.
System Testing
Here the entire software system is tested. The reference document for this process is the requirements document, and the goal is to see whether the software meets its requirements. Here the entire system has been tested against the requirements of the project, and it has been checked whether all of them have been satisfied.
Acceptance Testing
The acceptance test is performed with realistic data from the client to demonstrate that the software is working satisfactorily. Testing here is focused on the external behavior of the system; the internal logic of the program is not emphasized. In this project I have collected some data and tested whether the project works correctly. Test cases should be selected so that the largest number of attributes of an equivalence class is exercised at once. The testing phase is an important part of software development. It is the process of finding errors and missing operations, and also a complete verification to determine whether the objectives are met and the user requirements are satisfied.
Black Box Testing
This testing method considers a module as a single unit and checks the unit at its interface and in its communication with other modules, rather than getting into details at the statement level. Here the module is treated as a black box that takes some input and generates output. The output for a given set of input combinations is forwarded to other modules.
Chapter 8
Output Screens
Chapter 9
Conclusion & Recommendations
CONCLUSION
After completing this project, we are satisfied that we have designed all the requirements of our system and solved the problems observed in the existing systems. Because we have used design patterns and followed the standards of the Java network architecture, we have reduced system problems such as hanging and delayed messages. Users of this system can download files from multiple sites, which matters because the present-day Internet is a huge database of various types of information (files). Users of the Web Extractor are instructed to use the system for multiple and multilevel downloads by submitting the appropriate URL path of each required file.
