Computer Vision for Museum Collection Comparison

Improving the exploration of online collections

In their 2022 thesis of the same name, Noa Nonkes (University of Amsterdam) proposes a data-driven approach to improve the exploration of online museum collections.

The starting point of the study is the observation that current processes require users to know the specific collection they wish to explore before they can search, which is not conducive to general exploration. A possible solution is to merge all collections, which, however, results in a heterogeneous collection. To address these issues, the study uses computer vision techniques to compare and find similarities among museum collections, which aids in combining them.

In the process, high-level features from images of art pieces were extracted using a neural network and clustered using k-Means. The effectiveness of this approach was tested on a well-annotated museum dataset, using two pre-trained models, ResNet18 and a Vision Transformer (ViT), both of which were further fine-tuned on the same museum dataset.

The results, both quantitative and qualitative, showed that the fine-tuned ViT model with k=10 clusters performed best. This approach was then applied in a case study using the Allard Pierson (AP) dataset. The study concluded that the ViT model is suitable for extracting information from visual museum data. However, it also noted that the optimal number of clusters for representing an entire museum collection is specific to each museum.

In the study, Nonkes worked with 20 clusters based on the dataset of the Metropolitan Museum of Art (MET). The Elbow method (using the Davies-Bouldin score and the Silhouette score) was tested as well, but the results were not sufficiently conclusive. Ultimately, k = 20/10/6/4/2 was considered sufficient to test. It turned out that k = 2 worked best for these scores, but after qualitative evaluation, this k did not seem to be optimal. That is the reason why, finally, k = 10, the second-best option, was chosen. However, the final conclusion of the study is that the number of clusters is specific to the museum dataset (what works for the MET might not necessarily work for AP).

On the subject of metadata, the study concludes that it would have been very helpful if it had been known in which knowledge domain an artwork normally occurs. This would make it clear whether the newly created clusters consist of artworks that normally occur together or not. In addition, a specific way of dating and describing the museum objects could help to better describe the clusters created.