To digitize a document is to preserve it: to stop the passage of time from degrading it any further, and to freeze its image as it is today. Nevertheless, it is very important to keep the original as well; if there is any doubt about its date, an analysis of the paper and the ink can shed light on the question. And, of course, the original is important in itself.

There are two basic ways to digitize documents:

1) Scanning them with a scanner.

2) Photographing them with a digital camera.

Before going into details, let's look at some aspects common to any situation. When visiting a library or a museum, it is important to look carefully at what we are going to work with, and at the ambient lighting, to figure out how to proceed. We may find single sheets, a book, a few documents or a considerable volume of work. We may find a generously lit room or a dim one.

For a book with many pages, we must bear in mind that scanners are much slower than digital cameras. If there are only a few pages or, better yet, single sheets, the scanner is the preferable choice.
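To make the speed difference concrete, here is a rough back-of-the-envelope sketch in Python. It uses the figures quoted further on in this article (10 to 20 seconds per capture for a flatbed scanner, 400 to 500 sheets per hour for a camera with a remote trigger). The midpoints, the 600-page book and the function name `hours_needed` are assumptions made only for illustration, and the estimate ignores the time spent placing each page on the scanner.

```python
def hours_needed(pages: int, captures_per_hour: float) -> float:
    """Return the hours needed to capture `pages` at a steady rate."""
    if captures_per_hour <= 0:
        raise ValueError("captures_per_hour must be positive")
    return pages / captures_per_hour


# Assumption: 15 s per capture, the midpoint of the 10-20 s range.
SCANNER_SECONDS_PER_CAPTURE = 15
scanner_rate = 3600 / SCANNER_SECONDS_PER_CAPTURE  # 240 captures per hour

# Assumption: 450 sheets per hour, the midpoint of the 400-500 range.
camera_rate = 450

pages = 600  # hypothetical book size

print(f"scanner: {hours_needed(pages, scanner_rate):.1f} h")  # scanner: 2.5 h
print(f"camera:  {hours_needed(pages, camera_rate):.1f} h")   # camera:  1.3 h
```

With only a handful of single sheets the difference shrinks to a few minutes, which is why the scanner remains the comfortable choice for small jobs.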

If we are working with a digital camera in a well-lit place, we will probably not need additional light. If the place is dark, it is advisable to add spotlights. The flash should not be used to photograph documents. It is better to find a suitable lighting setup, fix it, and work with it throughout.

Some documents are on thin sheets, and the other side may show through in the image. In that case we need to place a black sheet of paper behind the page we are working on.

The table surface we work on should preferably be dark. If the table is light-colored, we need to cover it with a velvet cloth or any other black fabric.

It is always advisable to bring a laptop and check a test result on it before starting to capture images in earnest. With digital cameras, what one sees in the viewfinder is often not what one will later see on the computer screen. By testing the result first, flaws can be corrected before beginning. It is also good to bring a USB-powered external hard drive, to offload the bulk of the work and avoid filling up the computer's disk as the work advances.


Scanners come in professional and domestic types. The professional ones, which have the highest resolution, have no lid. Documents lie face up and the scanner shoots from a suitable height, for example 25 inches. Some of them can turn the pages automatically, like the one in the photo above, and can make 3,000 captures in an hour. The problem with this professional equipment is that it is very expensive: prices range from about 15 thousand to 35 thousand dollars. Common flatbed scanners can do the job very well if speed is not needed. Their slower capture time (approximately 10 to 20 seconds per capture) makes the work longer. A scanner is also harder to transport than a digital camera, so the need to take it on site should be justified. The advantage is that little adjustment is needed: at a standard resolution of 200 dpi (dots per inch) the result is usually satisfactory, and the scanner itself sets the lighting and compensates for problems in the image.
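To get a feeling for what 200 dpi means in practice, the short Python sketch below converts a page size in inches into the pixel dimensions of the scanned image. The function name `pixel_size` and the US Letter example page are assumptions chosen for illustration; only the 200 dpi setting comes from the text above.

```python
def pixel_size(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical example: a US Letter sheet, 8.5 x 11 inches.
print(pixel_size(8.5, 11))           # (1700, 2200)
print(pixel_size(8.5, 11, dpi=300))  # (2550, 3300)
```

Raising the resolution from 200 to 300 dpi multiplies the number of pixels by 2.25, so file sizes and scan times grow accordingly.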


An important recommendation is to mount the camera on a tripod. This avoids blurred images from camera shake, which is impossible to control when taking many pictures. The trick to working fast is to control the camera remotely from the laptop. Simply pressing the space bar on the keyboard fires the camera and saves the image at once to the folder where we want to store it. This way, when working with a book, we take up our position; with one hand we turn the page to the next document and with the other we tap the laptop's space bar. Working like this, we can photograph around 400 or 500 sheets in an hour, depending on our practice and skill. The highest rate I have heard of an archivist reaching is around 1,000 captures in an hour.

The camera should be mounted on the tripod with the lens pointing at the document at a 90-degree angle, so that the front of the lens is parallel to the page. Even a small angular deviation will cause perspective distortion.

The flash should be disabled whenever possible: it tends to produce washed-out images or excessive glare. Many digital cameras have preset modes; depending on the brand, these may be called "museum mode", "documents" or "indoor mode". In these modes the camera opens the diaphragm and slows the shutter speed to suit the available light. If the first tests with these presets are not satisfactory, it will be necessary to adjust the aperture and the shutter speed manually to obtain the desired quality and brightness. This can take some time, but it only has to be done once. After that the settings stay fixed, and with the tripod in position, some additional light if necessary, and a remote control, the work is automated and becomes fast and efficient.

Digital reflex (SLR) cameras generally show on the LCD screen an image much closer to the final result. Other things being equal, the more megapixels the camera has, the more detail the image will hold.
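One way to compare a camera with the 200 dpi scanner setting mentioned earlier is to work out how many pixels per inch the camera puts on the page. The Python sketch below does this under a loudly stated assumption: the page fills the frame exactly, which never quite happens in practice, so the real figure will be somewhat lower. The function name `effective_dpi`, the 12-megapixel image size and the US Letter page are all hypothetical values for illustration.

```python
def effective_dpi(image_w_px: int, image_h_px: int,
                  page_w_in: float, page_h_in: float) -> float:
    """Pixels per inch on the page, assuming the page fills the frame.

    The smaller of the two ratios is returned, because that side of the
    page is the one that limits how tightly the shot can be framed.
    """
    return min(image_w_px / page_w_in, image_h_px / page_h_in)


# Hypothetical 12-megapixel camera held in portrait orientation.
width_px, height_px = 3000, 4000
print(f"{width_px * height_px / 1e6:.0f} megapixels")            # 12 megapixels

# Hypothetical page: US Letter, 8.5 x 11 inches.
print(f"{effective_dpi(width_px, height_px, 8.5, 11):.0f} dpi")  # 353 dpi
```

Under these assumptions the camera comfortably exceeds 200 dpi for a letter-sized page, but the same camera photographing a large ledger or a map would fall below it.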

Connecting an AC adapter is recommended, so that the batteries do not run down, which would surely happen within a couple of hours. Otherwise, carry rechargeable batteries and a charger.

In both cases it is better to store the images as JPG files. They are more compressed than other formats (TIFF, etc.) and, although the quality is slightly lower, we will not be installing an elephant in our system.
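The size of the "elephant" can be estimated with a little arithmetic, sketched here in Python. An uncompressed 24-bit color image needs 3 bytes per pixel, which is roughly what an uncompressed TIFF stores. The 10:1 JPEG compression ratio is only an assumed round figure (the real ratio depends on the quality setting and on the document), and the 1700 x 2200 pixel page and the 500-page job are hypothetical as well.

```python
BYTES_PER_PIXEL = 3        # 24-bit color, uncompressed
ASSUMED_JPEG_RATIO = 10    # assumption: JPEG is about 10 times smaller


def uncompressed_bytes(width_px: int, height_px: int) -> int:
    """Size in bytes of one uncompressed 24-bit image."""
    return width_px * height_px * BYTES_PER_PIXEL


# Hypothetical job: 500 pages of 1700 x 2200 pixels each.
pages = 500
raw_total = pages * uncompressed_bytes(1700, 2200)
jpeg_total = raw_total / ASSUMED_JPEG_RATIO

print(f"uncompressed: {raw_total / 1e9:.2f} GB")   # uncompressed: 5.61 GB
print(f"JPEG (est.):  {jpeg_total / 1e9:.2f} GB")  # JPEG (est.):  0.56 GB
```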


It is important to rename the files as we classify them, so that they stay easy to identify. Otherwise, the images will keep the names the digital camera assigned to them automatically.
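Renaming can be done by hand, but for hundreds of captures a small script saves time. The following Python sketch renames every JPG in a folder to a descriptive prefix plus a running number. The function name `rename_sequentially`, the four-digit numbering and the example prefix are assumptions for illustration, and the script assumes that the camera's own file names sort in shooting order.

```python
from pathlib import Path


def rename_sequentially(folder: str, prefix: str, start: int = 1) -> list[tuple[str, str]]:
    """Rename every .jpg in `folder` to `<prefix>_<NNNN>.jpg`, in name order.

    Returns the (old_name, new_name) pairs so the mapping can be logged.
    """
    directory = Path(folder)
    # Assumption: camera names such as IMG_0001.JPG sort in shooting order.
    images = sorted(p for p in directory.iterdir()
                    if p.is_file() and p.suffix.lower() == ".jpg")
    plan = [(src, directory / f"{prefix}_{n:04d}.jpg")
            for n, src in enumerate(images, start)]
    # Check the whole plan first, so nothing is ever overwritten.
    for src, dst in plan:
        if dst.exists() and dst != src:
            raise FileExistsError(f"{dst.name} already exists")
    renamed = []
    for src, dst in plan:
        if src != dst:
            src.rename(dst)
        renamed.append((src.name, dst.name))
    return renamed
```

A call such as `rename_sequentially("captures", "parish_register")` (both names hypothetical) would turn `IMG_0001.JPG` into `parish_register_0001.jpg`, and so on. It is prudent to try it first on a copy of the folder.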

Using PDF as the final storage format gives more compression, so the files take up less space than the other formats. PDF files can also be passed to OCR (Optical Character Recognition) software, which reads typed or printed text and converts it into text that can later be edited and formatted like any Word file. OCR does not read handwritten manuscripts.

It is all a matter of patience and practice until the desired results are reached. It does not matter if it takes many trials: once we have the settings we want, we have taken the first step. After that, the work will run quickly along the railroad of our desires.


Pablo Briand, July 30th 2009.