As large-scale data processing becomes easier and more affordable, the temptation grows to use new technologies and methods to reduce the manual labour that usually comes with classifying and categorising big data collections. For textual data, techniques for extracting useful information from unstructured sources are already more or less established. With image-heavy data sets – like the Portus photos – we have to turn to image processing methods such as object detection and text recognition, which are unfortunately still unreliable and in most cases do not stand up to comparison with a human doing the same work.
Before starting work on the Portus data, I had some general knowledge of the state of the art in image processing. I knew that fairly robust object and text recognition algorithms exist, and that the data itself would not be very diverse – there would be some duplicate (or nearly identical) photos, and certain objects (such as blackboard tablets) would probably appear in many of the pictures. Therefore, as a proof-of-concept experiment, I constructed a simple image processing pipeline which, for every photo, attempted to find a blackboard tablet and recognise the text written on it. If successful, this program could then be used to extract notes from photographs, or at least to find all the photographs that contain notes. Given that the data was also “real”, in the sense that nothing had been done beforehand to make analysis easier (such as standardised EXIF tagging or a strict folder structure), any breakthroughs could probably be repeated in real-life scenarios with similar data.
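To give a flavour of the detection step, here is a minimal, illustrative sketch of how a blackboard tablet might be located: a blackboard is much darker than its surroundings, so thresholding the image and taking the bounding box of the dark pixels is a crude first approximation. This is not the actual pipeline used on the Portus data – a real implementation would use a library such as OpenCV (thresholding plus contour detection), and the threshold value below is an assumption for the sake of the example.

```python
# Illustrative sketch: locate a dark (blackboard-like) region in a
# grayscale image, represented here as a nested list of 0-255 values.
# The threshold is a made-up assumption, not a tuned parameter.

DARK_THRESHOLD = 60  # pixels darker than this count as "blackboard"

def find_dark_region(image):
    """Return (top, left, bottom, right) of the dark region, or None."""
    rows = [y for y, row in enumerate(image)
            if any(p < DARK_THRESHOLD for p in row)]
    cols = [x for row in image
            for x, p in enumerate(row) if p < DARK_THRESHOLD]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))

# A 6x8 synthetic "photo": light background with a dark 3x4 patch.
photo = [[200] * 8 for _ in range(6)]
for y in range(2, 5):
    for x in range(3, 7):
        photo[y][x] = 30

print(find_dark_region(photo))  # → (2, 3, 4, 6)
```

Once the bounding box is known, the cropped region can be handed to a text recognition step – which, as described below, is where the real difficulty lies.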
However, the results of this processing were not entirely satisfactory. While detecting the tablet in a given image (image above) was easy enough, recognising the handwritten text on its surface was not. This is because well-performing text recognition algorithms require a training set of letters in order to optimise their performance, and in the case of the Portus data, creating such a set was not possible within the time available. A possible solution is to replace anything handwritten with computer-generated print-outs or Quick Response (QR) codes, as recognising typed text and machine-readable codes is much easier.
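A brief illustration of why machine-readable codes sidestep the training-set problem: a QR code's three finder patterns follow a fixed dark/light run-length ratio of 1:1:3:1:1 along any scan line through their centre, so a decoder can locate them with no training data at all. The sketch below checks that ratio on a single scan line (0 = dark, 1 = light); real decoders (for example OpenCV's QRCodeDetector or the ZBar library) do this in two dimensions with tolerance for noise and perspective distortion.

```python
# Simplified check for a QR finder pattern's 1:1:3:1:1 run-length ratio
# along one scan line. Real decoders allow tolerances; this sketch
# requires exact multiples for clarity.

def run_lengths(line):
    """Collapse a scan line into [value, length] runs."""
    runs = []
    for pixel in line:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1
        else:
            runs.append([pixel, 1])
    return runs

def has_finder_pattern(line):
    """True if five consecutive runs match the 1:1:3:1:1 dark/light ratio."""
    runs = run_lengths(line)
    for i in range(len(runs) - 4):
        window = runs[i:i + 5]
        if window[0][0] != 0:  # the pattern starts with a dark run
            continue
        unit = window[0][1]
        if all(w[1] == unit * e for w, e in zip(window, [1, 1, 3, 1, 1])):
            return True
    return False

# dark(2) light(2) dark(6) light(2) dark(2) is a scaled 1:1:3:1:1 pattern.
scan = [1, 1] + [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0] + [1, 1]
print(has_finder_pattern(scan))  # → True
```

Because the pattern is geometric rather than learned, the same detector works on any QR code from any camera – exactly the robustness that handwriting recognition lacks without a per-writer training set.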
In general, I believe that using image processing methods to extract information from or classify photography collections is possible; however, with the current state of the technology, the results are far from ideal. A lot can also be improved by adjusting the photography process itself and the initial storage of the photos. For instance, using a smartphone’s GPS, one could easily add fairly accurate positioning data to individual photos and later see the locations as dots on a map.
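As a small sketch of that last point: EXIF GPS tags store coordinates as degrees, minutes and seconds plus a hemisphere letter, which need converting to decimal degrees before they can be plotted on a map. The function below mirrors the EXIF GPSLatitude/GPSLatitudeRef fields; reading them from an actual image file would need a library such as Pillow or exifread, and the sample coordinates are only an approximate, illustrative location for Portus.

```python
# Convert EXIF-style GPS coordinates (degrees/minutes/seconds plus a
# hemisphere reference) into the decimal degrees used by mapping tools.

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert D/M/S and a hemisphere letter ('N','S','E','W') to decimal."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value

# Approximate position of Portus, for illustration only.
lat = dms_to_decimal(41, 46, 30, "N")
lon = dms_to_decimal(12, 15, 36, "E")
print(round(lat, 4), round(lon, 4))  # → 41.775 12.26
```

With every photo carrying a pair of decimal coordinates like this, plotting the whole collection as dots on a map becomes a trivial final step.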
Karl Potisepp, MSc, worked on the Portus data as part of his thesis.