Rob Bender

Image Caching Server

Another project I worked on at E-Commerce Solutions was a product search engine with data supplied from many partner websites. One problematic aspect of the search engine was trying to display product thumbnails on the results page.

Along with the product's information, each partner site also sent us the URL of a full-size picture of the product. Image sizes varied greatly from site to site, but we wanted to display small thumbnails all formatted to similar dimensions. The first solution was to set the height and width attributes of the image tag, but we did not know the original dimensions ahead of time, so hardcoding the dimensions resulted in horribly distorted pictures. The second solution used JavaScript to scale the pictures, but it only worked in Internet Explorer 4.0 and above, and it would sometimes generate JavaScript errors if there was a problem downloading the image. Both solutions also required the client to download the entire original picture before it could be scaled down, and some of these photos were full-screen.

My solution was to create an intermediate proxy server that would download the images from the partner sites as they were requested and then scale the pictures down on the fly before sending them to the client browser. For efficiency, once an image was scaled, it was saved in a cache on the server so future requests did not require repeat trips to the partner servers.

I wrote the proof of concept in Python using the Python standard library and the Python Imaging Library (PIL). The URL of a requested image was sent to the proxy server, which used Python's URL handling libraries to pull the image from the partner site. PIL loaded the image and extracted its dimensions. With this information, the server was able to calculate the height and width the image needed to fit the search results page without distortion. PIL then scaled down the image and the server saved a copy to disk, keeping a list of scaled images in memory. The scaled image was then sent back to the client's browser.
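The dimension calculation is the part that the hardcoded height and width attributes could not do. Here is a minimal sketch of how it might look, not the original proof-of-concept code. The function name fit_within and the 100x100 box are assumptions made for illustration. In practice the original width and height would come from the loaded image (PIL exposes them as the image's size), and PIL's own thumbnail method performs a similar aspect-preserving reduction.

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits inside the given box
    while keeping the original aspect ratio.

    Hypothetical helper for illustration; not taken from the original server.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller of the two ratios is the limiting side. Capping the
    # scale at 1.0 means small images are never enlarged.
    scale = min(max_width / width, max_height / height, 1.0)
    # Never let a dimension round down to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A landscape photo, a portrait photo, and an image already small enough.
    for original in [(1600, 1200), (600, 1200), (50, 40)]:
        print(original, "->", fit_within(*original, 100, 100))
```

Because both sides are multiplied by the same scale factor, the thumbnail keeps the proportions of the original, which is what avoids the distortion described earlier.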

The next time the same image was requested, the server simply redirected the client to the local copy sitting on the server's hard drive. If the requested image could not be found on the partner's site, then the client was redirected to a generic "Picture Not Available" graphic.
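The request flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: the class name ThumbnailCache, the /thumbs/ path, the placeholder path, and the use of a SHA-1 digest as the cache file name are all hypothetical. The download-and-scale step is passed in as a callable so the sketch stays independent of any particular imaging or networking library.

```python
import hashlib

# Hypothetical location of the generic "Picture Not Available" graphic.
PLACEHOLDER_URL = "/images/not-available.gif"


class ThumbnailCache:
    """Decides how to answer a thumbnail request (illustrative sketch)."""

    def __init__(self, fetch_and_scale, base_url="/thumbs/"):
        # fetch_and_scale(source_url, local_url) is assumed to download the
        # image, scale it, and save it, raising OSError if that fails.
        self._fetch_and_scale = fetch_and_scale
        self._base_url = base_url
        # In-memory list of images that have already been scaled.
        self._known = {}

    def local_url(self, source_url):
        # Derive a stable local name from the partner URL (an assumption;
        # the original naming scheme is not described).
        digest = hashlib.sha1(source_url.encode("utf-8")).hexdigest()
        return f"{self._base_url}{digest}.jpg"

    def resolve(self, source_url):
        """Return (action, url) where action is 'serve' or 'redirect'."""
        # Cache hit: send the client to the copy already on disk.
        if source_url in self._known:
            return ("redirect", self._known[source_url])
        target = self.local_url(source_url)
        try:
            self._fetch_and_scale(source_url, target)
        except OSError:
            # Image missing or unreachable: fall back to the placeholder.
            return ("redirect", PLACEHOLDER_URL)
        # First request: remember the scaled copy and send it directly.
        self._known[source_url] = target
        return ("serve", target)
```

With this shape, only the first request for a given image reaches the partner server. Failed downloads are deliberately not remembered here, so a later request would try the partner site again; whether the original server did the same is not stated.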

The solution was dubbed the Image Caching Server, and a full implementation was written by one of my coworkers in C as a custom Apache module running on a Linux server.