Data mining and image processing experiments on photographs from Portus

As large-scale data processing becomes easier and more affordable, the temptation grows to use new technologies and methods to reduce the manual labour that usually comes with classifying and categorising large data collections. For textual data, techniques for extracting useful information from unstructured sources are more or less established. For image-heavy data sets, like the Portus photos, we have to turn to image processing methods such as object detection and text recognition, which unfortunately are still unreliable and in most cases do not stand up to comparison with a human doing the work.

Portus Project 2011

Before starting work on the Portus data, I had only a vague knowledge of the state of the art in image processing. I knew that some fairly robust object and text recognition algorithms exist, and that the data itself would not be very diverse: there would be some duplicate (or almost identical) photos, and certain objects, such as blackboard tablets, would probably appear in many of the pictures. Therefore, as a proof-of-concept experiment, I constructed a simple image processing pipeline which, for every photo, attempted to find a blackboard tablet and recognise the text written on it. If successful, this program could then be used to extract notes from photographs, or at least to find all the photographs that contain notes. The data was also “real” in the sense that nothing had been done beforehand to make it easier to analyse (such as standardised EXIF tagging or strict folder structures), so any successes could probably be repeated in real-life scenarios with similar data.
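The overall shape of such a pipeline can be sketched as follows. This is only an illustration of the two-stage idea (find the tablet, then try to read it), not the code used in the experiment: the names `PhotoResult`, `process_photos`, `photos_with_tablets` and the `(left, top, right, bottom)` box convention are all assumptions, and the detector and recogniser are passed in as plain functions so that any image processing library could supply them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical convention: (left, top, right, bottom) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class PhotoResult:
    path: str
    tablet_box: Optional[Box]  # None when no tablet was found
    text: Optional[str]        # None when nothing could be read


def process_photos(
    paths: Iterable[str],
    detect_tablet: Callable[[str], Optional[Box]],
    recognise_text: Callable[[str, Box], Optional[str]],
) -> List[PhotoResult]:
    """Run detection on every photo; attempt recognition only where a tablet was found."""
    results = []
    for path in paths:
        box = detect_tablet(path)
        text = recognise_text(path, box) if box is not None else None
        results.append(PhotoResult(path, box, text))
    return results


def photos_with_tablets(results: List[PhotoResult]) -> List[str]:
    """Return the paths of photos in which a tablet was detected."""
    return [r.path for r in results if r.tablet_box is not None]
```

Keeping the two stages separate means that the list of photos containing a tablet stays useful even when text recognition fails for every one of them.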

However, the results of this processing were not entirely satisfactory. While detecting the tablet in a given image (image above) was easy enough, recognising the handwritten text on its surface was not. This is because well-performing text recognition algorithms require a training set of letters in order to optimise their performance. In the case of the Portus data, creating such a set was not possible due to time restrictions. A possible solution to this problem is to replace anything handwritten with computer print-outs or Quick Response (QR) codes, as recognition of typed text is much easier.

In general, I believe that using image processing methods to extract information from, or classify, photograph collections is possible; however, with the current state of technology, the results are far from ideal. A lot can also be improved by adjusting the process of photography itself and the initial storage of the photos. For instance, using a smartphone’s GPS, one could easily add fairly accurate positioning data to individual photos and later see the locations as dots on a map.
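As a small illustration of the last point: EXIF metadata stores a GPS position as degrees, minutes and seconds together with a hemisphere reference (N, S, E or W), whereas most mapping tools expect signed decimal degrees. The sketch below shows that conversion only. The function name `dms_to_decimal` is an assumption, and reading the values out of a photo file is left to whichever EXIF library is at hand.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert a degrees/minutes/seconds coordinate to signed decimal degrees.

    `ref` is the hemisphere letter stored alongside the coordinate:
    'N' or 'S' for latitude, 'E' or 'W' for longitude.
    """
    ref = ref.upper()
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere reference: {ref!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative by convention.
    return -value if ref in ("S", "W") else value
```

Calling `dms_to_decimal` once for the latitude and once for the longitude of each photo gives a pair of numbers that can be plotted directly as a dot on a map.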

Karl Potisepp, MSc, worked on the Portus data as part of his thesis.

Graeme Earl says:

27th February 2014 at 2:03 pm

This was such a great addition to our digital work on the Portus project! We are also considering ways to take this work forward. Alongside previous work trialling digital tablets such as iPads or Kindles, we see the identification of the handwritten tablets as a useful step forward. Ideally, these tablets should in future be not only identified but also isolated from the surrounding scene, so that subsequent manual annotation would be very quick, i.e. we could cycle through all of the tablets assigning appropriate metadata. We are also intrigued by the wider possibilities of this and related approaches. Ongoing research in web-scale computer vision, for example, could allow easy retrieval of related images captured at Portus. Furthermore, by identifying shared architectural or object features in this corpus of c. 30,000 images, we could then target subsequent structure from motion and stereo work, as James Miles recently demonstrated in batch processing a large number of helicopter images from the site. We are also considering the role that the various forms of head-mounted imagery captured on site could play in deriving coarse but functional surface models.