This Java crawler is useful when you need to search a web page for a specific word, tag, or any other piece of content in the data retrieved from a given URL.
I’ve used it, for example, to search for a specific error message that appeared on a page whenever a connection to the database could not be established. It helped me prove that the error was really caused by a failure of the connection link to the database.
The crawler saves to the file system the page that contains the string you’re searching for. The file name includes the time at which the string was found in the page body, so I could match the time in the file name against the time of the corresponding error in the web server log.
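The idea of naming each saved page after the moment the string was found can be sketched as follows. This is not the original crawler code: the PageSnapshot class, the "page_" prefix and the yyyy-MM-dd_HH-mm-ss pattern are assumptions made for illustration. The pattern deliberately avoids colons, which are not allowed in Windows file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the original crawler.
class PageSnapshot {
    // Assumed naming pattern: no ':' so the name is valid on Windows and Unix.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss");

    // Builds a name such as "page_2008-05-10_14-32-07.html" from the time
    // at which the search string was found.
    static String fileNameFor(LocalDateTime foundAt) {
        return "page_" + foundAt.format(STAMP) + ".html";
    }

    // Writes the page body into dir (created if missing) and returns the file path.
    static Path save(Path dir, String pageBody, LocalDateTime foundAt) throws IOException {
        Files.createDirectories(dir);
        Path file = dir.resolve(fileNameFor(foundAt));
        Files.write(file, pageBody.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path saved = save(Paths.get("snapshots"), "<html>Connection failed</html>",
                LocalDateTime.now());
        System.out.println("Saved " + saved);
    }
}
```

Because the time stamp is in the file name itself, a plain directory listing is enough to line the saved pages up against the web server log.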
The code was originally developed by Rodrigo Gama, a fellow developer and coworker of mine. I just adapted it a little to fit my needs.
What’s the idea behind the crawler?
The main idea behind the crawler is the following:
To run the application you pass two essential parameters: the string you want to search for and the URLs you want to verify.
A thread is then created for each URL, using the PageVerificationThread class, which implements Runnable.
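Here is a minimal sketch of the one-thread-per-URL approach, assuming the first program argument is the search string and the remaining ones are URLs. The PageCheck class is hypothetical and is not the original PageVerificationThread: it fetches each page once and prints a line when the string is found, instead of saving the page and sending e-mails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-URL check; not the original PageVerificationThread.
class PageCheck implements Runnable {
    private final String url;
    private final String searchTerm;

    PageCheck(String url, String searchTerm) {
        this.url = url;
        this.searchTerm = searchTerm;
    }

    // Pure string check, kept separate so it can be tested without a network.
    static boolean containsTerm(String body, String term) {
        return body != null && term != null && body.contains(term);
    }

    // Downloads the page body; assumes the page is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    @Override
    public void run() {
        try {
            if (containsTerm(fetch(url), searchTerm)) {
                System.out.println("Found \"" + searchTerm + "\" in " + url);
            }
        } catch (IOException | IllegalArgumentException e) {
            // Unreachable host, bad URL syntax, HTTP error, etc.
            System.err.println("Could not read " + url + ": " + e.getMessage());
        }
    }

    // args[0] is the search string; args[1..] are the URLs, one thread each.
    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: PageCheck <search string> <url> [<url> ...]");
            return;
        }
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            Thread t = new Thread(new PageCheck(args[i], args[0]));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every page check to finish
        }
    }
}
```

For example, java PageCheck "Connection failed" http://example.com/a http://example.com/b starts two threads, one per URL, so a slow page does not hold up the others.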
A PageVerificationThread creates a Notificator object, which calls the MailSender object, which in turn sends a notification message to the e-mail addresses hardcoded in the Main class.
The message text is also hardcoded, inside the run() method of the PageVerificationThread class.
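The chain from the verification thread to the e-mail can be sketched like this. Sender, RecordingSender and Notifier are hypothetical names standing in for the roles of MailSender and Notificator; RecordingSender only records the messages, so the sketch runs without a mail server or any mail library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over the mail-sending step. A real implementation
// would connect to an SMTP server using the username and password set in the code.
interface Sender {
    void send(List<String> recipients, String subject, String body);
}

// Stand-in for MailSender: records one line per recipient instead of e-mailing.
class RecordingSender implements Sender {
    final List<String> log = new ArrayList<>();

    @Override
    public void send(List<String> recipients, String subject, String body) {
        for (String to : recipients) {
            log.add(to + " | " + subject + " | " + body);
        }
    }
}

// Notificator-like role: knows whom to notify and delegates delivery to a Sender.
class Notifier {
    private final Sender sender;
    private final List<String> recipients;

    Notifier(Sender sender, List<String> recipients) {
        this.sender = sender;
        this.recipients = recipients;
    }

    void notifyFound(String url, String searchTerm) {
        sender.send(recipients, "Search string found",
                "\"" + searchTerm + "\" was found in " + url);
    }

    public static void main(String[] args) {
        RecordingSender sender = new RecordingSender();
        // Hardcoded recipients, mirroring the addresses set in Main.
        Notifier notifier = new Notifier(sender,
                Arrays.asList("admin@example.com", "dev@example.com"));
        notifier.notifyFound("http://example.com/", "Connection failed");
        sender.log.forEach(System.out::println);
    }
}
```

Keeping delivery behind an interface lets the notification logic be exercised without sending real e-mails; an SMTP-backed sender can be plugged in without touching the thread code.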
I advise you to read the comments in the code.
You’ll have to change some strings in the code, such as the username and password used to send the e-mails.
The crawler consists of four classes: MailSender.java, Main.java, Notificator.java, and PageVerificationThread.java.
The full source of the Main, PageVerificationThread, Notificator, and MailSender classes is included in the download at the end of this post.
How do you use it?
Using Eclipse, you just have to run the Main class as a Java application, passing the search string and the URLs as program arguments.
I hope you make good use of it!
Here it is for your delight: http://leniel.googlepages.com/JavaCrawler.zip