Image retrieval is a fundamental problem in computer vision: given a query image, can you find similar images in a large database? This is especially important for query images containing landmarks, which account for a large portion of what people like to photograph.
You can check my team’s implementation in Andres Torrubia’s GitHub repository.
Step 1: Single Model CNN extraction + Nearest Neighbour Search
We extract the activations of the last convolutional layer of a given architecture, apply GeM pooling, and perform L2 normalization.
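As a minimal sketch of this descriptor step (assuming NumPy arrays stand in for the network's activations, and a hypothetical `gem_pool` helper), GeM pooling raises each activation to a power p, averages over the spatial dimensions, and takes the p-th root; p=1 recovers average pooling and large p approaches max pooling:

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dimensions.

    features: array of shape (C, H, W), e.g. the last conv layer's activations.
    """
    clipped = np.clip(features, eps, None)  # GeM needs positive inputs
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)

def l2_normalize(v, eps=1e-12):
    """Scale a descriptor to unit Euclidean length."""
    return v / (np.linalg.norm(v) + eps)

# Example: pool a fake 512-channel, 7x7 activation map into a 512-d descriptor
fmap = np.abs(np.random.randn(512, 7, 7))
desc = l2_normalize(gem_pool(fmap, p=3.0))
print(desc.shape)                # (512,)
print(np.linalg.norm(desc))      # ~1.0
```

The value p=3 here is just an illustrative choice; in practice it is either fixed or learned per model.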
It uses augmentation (images are flipped left-right), so both the index and query features are duplicated.
Once features are extracted, it performs a regular nearest-neighbour search after applying PCA, whitening, and L2 normalization, and the results are ready for a submission.
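A sketch of this search step, assuming scikit-learn and small random matrices in place of the real index and query descriptors (the actual dimensions and neighbour counts here are placeholders, not the competition settings):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# Hypothetical descriptors: 1000 index images, 5 queries, 512-d features
rng = np.random.default_rng(0)
index_feats = rng.standard_normal((1000, 512)).astype(np.float32)
query_feats = rng.standard_normal((5, 512)).astype(np.float32)

# PCA with whitening fitted on the index set, then re-L2-normalize
pca = PCA(n_components=256, whiten=True).fit(index_feats)

def transform(x):
    y = pca.transform(x)
    return y / np.linalg.norm(y, axis=1, keepdims=True)

index_w, query_w = transform(index_feats), transform(query_feats)

# Brute-force nearest-neighbour search; at real dataset sizes an ANN
# library (e.g. faiss) would typically replace this
nn = NearestNeighbors(n_neighbors=100, metric="euclidean").fit(index_w)
dist, idx = nn.kneighbors(query_w)
print(dist.shape, idx.shape)  # (5, 100) (5, 100)
```

Since the transformed vectors are unit-length, Euclidean distance and cosine similarity give the same ranking.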
Step 2: Ensembling
We take the results of the nearest-neighbour search for each query (and its LR-flipped version), aggregate the distances from different runs (different architectures), and build a submission accordingly. For LR-flipped images, we pick the minimum distance.
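The aggregation above can be sketched as follows, assuming each run produces a full query-by-index distance matrix (the architecture names and matrix sizes are illustrative, not the actual runs):

```python
import numpy as np

# Hypothetical distance matrices from two architectures, each with an
# original-query and an LR-flipped-query pass: shape (num_queries, num_index)
rng = np.random.default_rng(1)
runs = {
    "arch_a_orig": rng.random((5, 1000)),
    "arch_a_flip": rng.random((5, 1000)),
    "arch_b_orig": rng.random((5, 1000)),
    "arch_b_flip": rng.random((5, 1000)),
}

# Per architecture: minimum distance over the original/flipped pair
per_arch = [
    np.minimum(runs["arch_a_orig"], runs["arch_a_flip"]),
    np.minimum(runs["arch_b_orig"], runs["arch_b_flip"]),
]

# Ensemble: sum (equivalently, average) distances across architectures,
# then rank index images per query to build the submission rows
ensemble = np.sum(per_arch, axis=0)
ranking = np.argsort(ensemble, axis=1)[:, :100]  # top-100 per query
print(ranking.shape)  # (5, 100)
```

Summing distances assumes the runs are on comparable scales, which holds here because every descriptor is L2-normalized before the search.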