Fine-Grained Sketch-Based Image Retrieval Based on Vision Transformer

Image credit: Charlie Tong

Abstract

With the advent of the Internet era, online shopping has become a common choice for many people. Online stores publish large numbers of product images so that customers can inspect product details. Keywords do not always describe an item's characteristics clearly, especially for fashion-related items such as clothes and shoes. Search by image addresses this by querying on image content instead. However, an image is not always available as the basis for a query. Retrieving items from sketches covers the cases in which the target cannot be photographed. Sketches are also not limited by the physical world: users can search the gallery from their own ideas and find the items they really need. To cope with the growing range of choices on e-commerce platforms and to improve the accuracy of sketch retrieval, this paper first proposes a new fine-grained sketch-based image retrieval method, TripleFormer-Sketch Based Image Retrieval (TF-SBIR). The method builds on deep learning techniques from the last five years. It uses a recent vectorized sketch representation to feed sketch stroke sequences into an improved triplet neural network, and it replaces part of the convolutional and residual networks with a Transformer, which improves Top-1 and Top-10 retrieval accuracy over the traditional triplet neural network. To apply the method, this paper also proposes a fine-grained sketch retrieval system that queries the database for candidate images in real time from the strokes the user draws. The system also analyzes the candidate images to extract information such as the color and model of the retrieved objects.
Extensive experiments on QMUL-ShoeV2, a dataset designed for triplet neural networks, show that the proposed TF-SBIR retrieval method outperforms the traditional triplet neural network method, and that the experimental system effectively helps users retrieve the photos that correspond to their sketches.
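
The abstract refers to triplet networks and to Top-1 and Top-10 accuracy without defining them. The sketch below illustrates both ideas on toy data, assuming the commonly used triplet margin loss and the usual Top-K retrieval metric; it is not the TF-SBIR implementation. The function names, the margin of 0.2, and the toy 2-D embeddings are all hypothetical and chosen only for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss (the margin value is an assumption).

    The loss is zero once the sketch (anchor) is closer to its matching
    photo (positive) than to a non-matching photo (negative) by at least
    `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def top_k_accuracy(sketch_embs, photo_embs, k):
    """Fraction of sketches whose true photo is among the k nearest photos.

    Assumes sketch i and photo i form the ground-truth pair.
    """
    hits = 0
    for i, sketch in enumerate(sketch_embs):
        # Rank all photo indices by distance to this sketch, nearest first.
        ranked = sorted(range(len(photo_embs)),
                        key=lambda j: euclidean(sketch, photo_embs[j]))
        if i in ranked[:k]:
            hits += 1
    return hits / len(sketch_embs)

# Toy 2-D embeddings standing in for the outputs of the two network branches.
sketches = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
photos = [[0.1, 0.0], [1.5, 0.4], [1.2, 0.9]]

if __name__ == "__main__":
    print("loss, well-separated triplet:", triplet_loss(sketches[0], photos[0], photos[1]))
    print("loss, violated triplet:", triplet_loss(sketches[1], photos[1], photos[2]))
    print("Top-1 accuracy:", top_k_accuracy(sketches, photos, 1))
    print("Top-2 accuracy:", top_k_accuracy(sketches, photos, 2))
```

With real data, the embeddings would come from the sketch and photo branches of the trained network, and K would be 1 and 10 to match the metrics reported in the abstract.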

Publication
Undergraduate capstone design
The PDF was translated from Chinese to English using DeepL.
Charlie Tong
Undergraduate at the Ohio State University

My research interests include distributed robotics, mobile computing and programmable matter.