Proxy Web Logging and Search System

Part of the C2000 system deals with logging the URLs visited in class and processing them so that they can be displayed appropriately at the end of the class. A Proxy server is used for this purpose. The browsers in C2000-enabled classes are configured to 'go' via the Proxy. The Proxy intercepts the URLs, logs them and, if the requested Web page is not present in its cache, fetches it from the appropriate location. The C2000 Proxy server is a modified version of the Apache Web server built by the Apache Group.

Earlier, a commercial product called 'Frontier' was used as a Web-logger, but the downside was that it ran only on Macintosh machines. This necessitated the presence of at least one Macintosh in every C2000-enabled classroom, leading to considerable inflexibility. The Apache Proxy server, on the other hand, poses no such constraint. In fact, the flexible caching options provided by the Proxy, along with its potential use as a firewall, have made it an ideal choice for this project.

With this overview, let us have a look at the design aspects and data flow involved.


High-level Architecture Diagram

Proxy Web Logging Sequence :

  1. The browser in a C2000-enabled class makes a Web request.
  2. The Proxy server receives the request and logs the URL.
  3. The Proxy checks its cache to see whether the requested page is present. If it is, it takes the page from there; otherwise, a request is sent over the Net as usual.
  4. When the page is brought in, the Proxy stores the page in its cache.
  5. Modules process the URL, consulting the class and room files as they do so.
  6. This URL, along with other information such as the date and time of logging and the course name and room number of that class, is stored in XML format in a file.
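
As an illustration of step 6, the following is a minimal Python sketch of how one log record might be assembled. It is not the Webtracker code, which lives inside the modified Apache proxy module; the function name format_entry is hypothetical, and the field names and date/time layout are taken from the sample record shown in the verification steps further down this page.

```python
from datetime import datetime
from xml.sax.saxutils import escape


def format_entry(when, hostname, url, coursenumber, room):
    """Return one <entry> record as an XML string (hypothetical helper)."""
    # Build the 12-hour time by hand so the AM/PM marker does not
    # depend on the locale of the machine running the sketch.
    am_pm = "AM" if when.hour < 12 else "PM"
    fields = [
        ("date", when.strftime("%m/%d/%y")),
        ("time", when.strftime("%I:%M:%S") + " " + am_pm),
        ("hostname", hostname),
        ("url", url),
        ("coursenumber", coursenumber),
        ("room", room),
    ]
    lines = ["<entry>"]
    for tag, value in fields:
        # escape() protects characters such as & that often occur in URLs.
        lines.append("  <{0}>{1}</{0}>".format(tag, escape(str(value))))
    lines.append("</entry>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_entry(datetime(1999, 1, 15, 9, 56, 53),
                       "dhcp12.gatech.edu",
                       "http://home.netscape.com/h.js",
                       "cs6751b_99_Winter",
                       "102"))
```

Escaping the values is a design choice of this sketch; whether the real logger escapes URLs before writing them is not stated here.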

Log Search Sequence :

  1. At the end of every C2000-enabled class, StreamWeaver passes a request to a CGI program for all the URLs visited in a particular class. It passes the course name as a parameter.
  2. The CGI program, after parsing the query, constructs an expression which is used for the search.
  3. This expression is used by the sgrep utility to perform a search on the XML file.
  4. The search results (in XML format) are passed back to StreamWeaver.
  5. StreamWeaver combines these results with other information such as slide info, audio and video and outputs them to the client browser.

NOTE : In the future, we envisage passing any number and type of parameters to the CGI program for searching. The present system was developed with this in mind and already supports it.
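
The search sequence above can be sketched in Python as follows. This is only an illustration of the idea (parse the GET query, then select the matching records); the real system builds an expression for the sgrep utility instead, and that expression is not reproduced here. The function name search_log and the second sample record are made up for the example; the <proxylog> root and the field names follow the samples elsewhere on this page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Sample data for illustration only. The first entry mirrors the record
# shown in the verification steps; the second entry is invented.
SAMPLE_LOG = """<proxylog>
  <entry>
    <date>01/15/99</date>
    <time>09:56:53 AM</time>
    <hostname>dhcp12.gatech.edu</hostname>
    <url>http://home.netscape.com/h.js</url>
    <coursenumber>cs6751b_99_Winter</coursenumber>
    <room>102</room>
  </entry>
  <entry>
    <date>01/14/99</date>
    <time>10:02:11 AM</time>
    <hostname>dhcp13.example.edu</hostname>
    <url>http://www.example.com/</url>
    <coursenumber>cs6751b_99_Winter</coursenumber>
    <room>101</room>
  </entry>
</proxylog>"""


def search_log(xml_text, query_string):
    """Return the <entry> elements that match every query parameter."""
    # parse_qs maps each key to a list of values; keep the first per key.
    criteria = {key: values[0]
                for key, values in parse_qs(query_string).items()}
    root = ET.fromstring(xml_text)
    matches = []
    for entry in root.iter("entry"):
        # An entry matches when each requested field has the wanted text.
        if all(entry.findtext(field) == wanted
               for field, wanted in criteria.items()):
            matches.append(entry)
    return matches


if __name__ == "__main__":
    for entry in search_log(SAMPLE_LOG, "date=01/15/99"):
        print(ET.tostring(entry, encoding="unicode"))
```

Because the criteria are built from whatever parameters arrive in the query string, the same function handles one parameter or several, which mirrors the flexibility described in the note above.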


Installation Instructions for Webtracker v1.0

Download the source code here

Webtracker is an extension of the Apache Web server (v1.2.6) proxy module. For the instructions below, [base_directory] signifies the base directory where apache_1.2.6 would be installed. The setup in the downloaded code assumes /hm31/webtrack as the server base directory.

  1. Edit the configuration files httpd.conf, srm.conf and access.conf in the subdirectory called "conf".
  2. Go to the subdirectory "[base_directory]/apache_1.2.6/src". Open the file named constants. You will find, among other items, a list of files and their paths. Edit the paths as per your setup.
  3. Edit the files class_and_room.txt and room_ip_mapping.txt to match your setup. These files specify, respectively, the courses conducted in each classroom at given times and days, and the IP addresses of the machines in each classroom.
    Note : A list of all files used in Webtracker and their paths is available.
  4. Compile the code.
  5. Start the server.
  6. Go to the classroom browser machines and set them up to go via the Proxy. The port number can be changed in the configuration files.
    The Proxy address on the fce machine is fce.cc.gatech.edu and the port is 8010.

This completes the installation process.
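
For a scripted client rather than a browser, the same proxy setting can be expressed in Python with the standard urllib.request module. This is a minimal sketch that assumes the fce address and port given in step 6; the function name build_proxy_opener is hypothetical, and no request is sent until the opener is actually used.

```python
import urllib.request

# Address and port of the Proxy as given in the installation steps.
PROXY_HOST = "fce.cc.gatech.edu"
PROXY_PORT = 8010


def build_proxy_opener(host=PROXY_HOST, port=PROXY_PORT):
    """Return an opener that routes plain HTTP requests via the Proxy."""
    proxy_url = "http://{0}:{1}".format(host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener()
    # Calling opener.open("http://...") here would send the request
    # through the Proxy, which should then log the URL.
    print("HTTP requests will go via {0}:{1}".format(PROXY_HOST, PROXY_PORT))
```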

If you would like to verify whether the system works, perform the following steps :
  1. Surf a few sample sites on a Proxy-enabled browser.
  2. Log onto the fce machine. Switch to /hm31/webtrack/apache_1.2.6/logs. View the file proxylog. It should contain records having the following structure for each URL visited :
                      <entry>
                        <date>01/15/99</date>
                        <time>09:56:53 AM</time>
                        <hostname>dhcp12.gatech.edu</hostname>
                        <url>http://home.netscape.com/h.js</url>
                        <coursenumber>cs6751b_99_Winter</coursenumber>
                        <room>102</room>
                      </entry>
    
    If a similar record exists, the proxy logging system works fine. If it does not, please refer to the Troubleshooting section of this page for help.
  3. To verify whether the search system works, start your browser. This browser need not be Proxy-enabled. For demonstration purposes, we shall assume that the date on which you surfed was 15 January, 1999. Type in http://fce.cc.gatech.edu:8080/cgi-bin/c2k_search?date=01/15/99 in the URL Location box.
  4. You should see records with exactly the same structure as in step 2 above, but only those containing <date>01/15/99</date>. What has actually happened is that a search has been performed on the XML file, and only the records logged on 15 January, 1999 have been selected and displayed.
  5. You can repeat the above procedure with any of the following keywords or any combination thereof (provided it is in the standard CGI 'GET' query format) :

Troubleshooting Tips and FAQs

  1. Help! The Proxy does not log requests!!
  2. Help! The Proxy logs requests but the browser complains that it is 'unable to connect to the server. The server may be down or unreachable...'
  3. Help! The Proxy logs requests but the browser cannot find the file /cgi-bin/c2k_search.
  4. The browser does not give any error but the display has only 3 lines :
    <XML version = "1.0"?>
    <proxylog>
    </proxylog>
    


Future Computing Environments Georgia Institute of Technology

Comments/Complaints???   Mail to vishal@cc.gatech.edu

Last modified: Wed January 13 22:55:07 EDT 1999