Computational Visual Media  2020, Vol. 6 Issue (4): 477-487    doi: 10.1007/s41095-020-0184-6
Research Article     
A new dataset of dog breed images and a benchmark for fine-grained classification
Ding-Nan Zou1,2, Song-Hai Zhang1,(✉), Tai-Jiang Mu1, Min Zhang3
1 Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing 100084, China
2 NaJiu Company, Hunan 410022, China
3 Harvard Medical School, Brigham and Women’s Hospital, Boston, MA 02115, USA

Abstract  

In this paper, we introduce an image dataset for fine-grained classification of dog breeds: the Tsinghua Dogs Dataset. It is currently the largest dataset for fine-grained classification of dogs, covering 130 dog breeds and 70,428 real-world images. Each image contains a single dog and is annotated with bounding boxes for both the whole body and the head. In comparison to previous similar datasets, it contains more breeds and more carefully chosen images for each breed. The diversity within each breed is also greater, with between 200 and 7000+ images per breed. The whole-body and head annotations make the dataset suitable not only for improving fine-grained classification models based on overall features, but also for models that locate local informative parts. We show that the dataset provides a tough challenge by benchmarking several state-of-the-art deep neural models. The dataset is available for academic purposes at https://cg.cs.tsinghua.edu.cn/ThuDogs/.
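The abstract notes that every image carries two annotated bounding boxes, one for the whole body and one for the head. As a minimal sketch of how such per-image annotations might be consumed, assuming a LabelMe-style XML file with `bodybndbox` and `headbndbox` elements (these tag names are assumptions for illustration, not the dataset's documented schema):

```python
# Hypothetical parser for a per-image annotation file. The dataset's real
# schema may differ; the tag names (bodybndbox, headbndbox) are assumed here.
import xml.etree.ElementTree as ET

def parse_annotation(xml_text):
    """Return (breed, body_box, head_box), boxes as (xmin, ymin, xmax, ymax)."""
    root = ET.fromstring(xml_text)
    obj = root.find("object")
    breed = obj.findtext("name")

    def box(tag):
        # Read the four integer corner coordinates of one bounding box.
        b = obj.find(tag)
        return tuple(int(b.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))

    return breed, box("bodybndbox"), box("headbndbox")
```

With both boxes in hand, a training pipeline can crop the head region for part-based models or use the body box to discard background, which is one way the dual annotation supports both global and part-localizing classifiers.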



Key words: fine-grained classification; dog; dataset; benchmark
Received: 18 May 2020      Published: 30 November 2020
Fund: National Natural Science Foundation of China (Project Nos. 61521002 and 61772298)
Corresponding Authors: Song-Hai Zhang     E-mail: zoudn14@mails.tsinghua.edu.cn;shz@tsinghua.edu.cn;taijiang@tsinghua.edu.cn;mzhang@bwh.harvard.edu
About author: Ding-Nan Zou is a master's candidate in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include computer graphics and computer vision, especially dog face and iris recognition.|Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University. His research interests include image and video analysis and processing as well as geometric computing.|Tai-Jiang Mu is currently an assistant researcher in the Department of Computer Science and Technology, Tsinghua University, Beijing, where he received his bachelor's and doctoral degrees in computer science and technology in 2011 and 2016, respectively. His research interests include visual media learning, SLAM, and human-robot interaction.|Min Zhang is a researcher at Harvard Medical School, Brigham and Women's Hospital. She received her Ph.D. degree in computer science from Stony Brook University and a second Ph.D. degree in mathematics from Zhejiang University. She is an expert in the fields of geometric modeling, medical imaging, graphics, visualization, machine learning, 3D technologies, etc.
Cite this article:

Ding-Nan Zou, Song-Hai Zhang, Tai-Jiang Mu, Min Zhang. A new dataset of dog breed images and a benchmark for fine-grained classification. Computational Visual Media, 2020, 6(4): 477-487.

URL:

http://cvm.tsinghuajournals.com/10.1007/s41095-020-0184-6     OR     http://cvm.tsinghuajournals.com/Y2020/V6/I4/477

Fig. 1 Dog variations in our dog dataset. (a) Great Danes exhibit large variations in appearance, while (b) Norwich terriers and (c) Australian terriers are quite similar to each other.
Fig. 2 Birds in the CUB200-2011 dataset [10].
Fig. 3 Teddy and Cassell Dogs.
Fig. 4 Snapshots of Tsinghua Dogs Dataset.
Fig. 5 Bounding boxes for whole dogs (blue) and their heads (red).
Fig. 6 Adjustment software.
Fig. 7 Labeled images.
Dataset        | Breeds | Images | Images per breed | Object
CUB-200        | 200    | 6033   | 30               | Bird
Stanford Dogs  | 120    | 20,580 | 150-252          | Dog
Ours           | 130    | 70,428 | 200-7449         | Dog, dog's face
Table 1 Dataset comparison
Fig. 8 Top 24 breeds of dogs by number of images.
Fig. 9 Fraction of the image covered by the dog’s head bounding box.
Fig. 10 Fraction of the image covered by the dog’s body bounding box.
Fig. 11 Image resolutions in the Stanford Dogs and Tsinghua Dogs datasets (pixels).
Dataset               | Information                            | Accuracy reported in Ref. [19]       | Accuracy in our test
CUB 200-2011          | 200 species of birds, 11,788 pictures  | 88.9% (single), 89.6% (combined)     | 88.609% (single), 89.454% (combined)
Stanford Dogs         | 120 breeds of dogs, 20,580 pictures    | —                                    | 84.674% (single), 86.515% (combined)
Tsinghua Dogs Dataset | 130 breeds of dogs, 70,428 pictures    | —                                    | 81.98% (single), 83.52% (combined)
Table 2 Performance of PMG [19] on different datasets
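Table 2 reports both a "single" and a "combined" accuracy for PMG. As a sketch of what these two numbers measure, assuming the combined score is a plain element-wise sum of the per-stage score vectors (the exact fusion rule used in Ref. [19] may differ):

```python
# "single": top-1 accuracy from one stage's scores alone.
# "combined": top-1 accuracy after summing score vectors across stages.
# The plain-summation fusion rule is an assumption for illustration.

def top1_accuracy(score_lists, labels):
    """score_lists: per-image lists of per-class scores; labels: ground truth."""
    correct = sum(max(range(len(s)), key=s.__getitem__) == y
                  for s, y in zip(score_lists, labels))
    return correct / len(labels)

def combine_stages(stage_scores):
    """Element-wise sum of per-stage score vectors for each image."""
    return [[sum(vals) for vals in zip(*per_image)]
            for per_image in zip(*stage_scores)]
```

On a toy example, two stages that individually disagree with the labels can still yield a correct combined prediction, which is why the combined column in Table 2 is consistently higher than the single column.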
Model        | Backbone  | Batch size | Epochs | Accuracy
Inception V3 | —         | 64         | 200    | 77.66%
WS-DAN       | Inception | 12         | 80     | 86.404%
PMG          | ResNet50  | 16         | 200    | 83.52%
TBMSL-Net    | ResNet50  | 6          | 200    | 83.7%
Table 3 Fine-grained classification accuracy of PMG [19], TBMSL-Net [23], WS-DAN [38], and Inception V3 [47] on our dataset
Fig. 12 Qualitative comparison of WS-DAN models trained on Stanford Dogs and Tsinghua Dogs. Dogs in each row belong to the same breed. WS-DAN trained on Tsinghua Dogs classifies the dogs correctly except for the last column, while the one trained on Stanford Dogs gives a correct classification only for the first column.
[1]   Cai, S.; Zuo, W.; Zhang, L. Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In: Proceedings of the IEEE International Conference on Computer Vision, 511-520, 2017.
[2]   Cui, Y.; Song, Y.; Sun, C.; Howard, A.; Belongie, S. J. Large scale fine-grained categorization and domain-specific transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4109-4118, 2018.
[3]   Wang, Y.; Morariu, V. I.; Davis, L. S. Learning a discriminative filter bank within a CNN for fine-grained recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4148-4157, 2018.
[4]   Yang, Z.; Luo, T. G.; Wang, D.; Hu, Z. Q.; Gao, J.; Wang, L. W. Learning to navigate for fine-grained classification. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science Vol. 11218. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 438-454, 2018.
[5]   Khosla, A.; Jayadevaprakash, N.; Yao, B.; Li, F.-F. Novel dataset for fine-grained image categorization. In: Proceedings of the 1st Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[6]   Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, 1097-1105, 2012.
[7]   Chen, L.; Yang, M. Semi-supervised dictionary learning with label propagation for image classification. Computational Visual Media Vol. 3, No. 1, 83-94, 2017.
[8]   Chen, K. X.; Wu, X. J. Component SPD matrices: A low-dimensional discriminative data descriptor for image set classification. Computational Visual Media Vol. 4, No. 3, 245-252, 2018.
[9]   Ren, J. Y.; Wu, X. J. Vectorial approximations of infinite-dimensional covariance descriptors for image classification. Computational Visual Media Vol. 3, No. 4, 379-385, 2017.
[10]   Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset. Computation & Neural Systems Technical Report, CNS-TR-2011-001. California Institute of Technology, 2011.
[11]   Liu, J.; Kanazawa, A.; Jacobs, D.; Belhumeur, P. Dog breed classification using part localization. In: Proceedings of the 12th European Conference on Computer Vision, Vol. Part I, 172-185, 2012.
[12]   Berg, T.; Belhumeur, P. N. POOF: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 955-962, 2013.
[13]   Branson, S.; Horn, G. V.; Belongie, S.; Perona, P. Bird species categorization using pose normalized deep convolutional nets. arXiv preprint arXiv:1406.2952, 2014.
[14]   Zhang, N.; Donahue, J.; Girshick, R.; Darrell, T. Part-based R-CNNs for fine-grained category detection. In: Computer Vision-ECCV 2014. Lecture Notes in Computer Science Vol. 8689. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 834-849, 2014.
[15]   Lin, D.; Shen, X.; Lu, C.; Jia, J. Deep LAC: Deep localization, alignment and classification for fine-grained recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1666-1674, 2015.
[16]   Lam, M.; Mahasseni, B.; Todorovic, S. Fine-grained recognition as HSnet search for informative image parts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6497-6506, 2017.
[17]   Chen, Y.; Bai, Y.; Zhang, W.; Mei, T. Destruction and construction learning for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5157-5166, 2019.
[18]   Ge, W. F.; Lin, X. R.; Yu, Y. Z. Weakly supervised complementary parts models for fine-grained image classification from the bottom up. arXiv preprint arXiv:1903.02827, 2019.
[19]   Du, R. Y.; Chang, D. L.; Bhunia, A. K.; Xie, J. Y.; Ma, Z. Y.; Song, Y. Z.; Guo, J. Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. arXiv preprint arXiv:2003.03836, 2020.
[20]   Zheng, H.; Fu, J.; Mei, T.; Luo, J. Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE International Conference on Computer Vision, 5219-5227, 2017.
[21]   Fu, J.; Zheng, H.; Mei, T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4476-4484, 2017.
[22]   Zheng, H.; Fu, J.; Zha, Z.; Luo, J. Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5012-5021, 2019.
[23]   Zhang, F.; Li, M.; Zhai, G.; Liu, Y. Three-branch and multi-scale learning for fine-grained image recognition (TBMSL-Net). arXiv preprint arXiv:2003.09150, 2020.
[24]   Sun, G. L.; Cholakkal, H.; Khan, S.; Khan, F. S.; Shao, L. Fine-grained recognition: Accounting for subtle differences between similar classes. arXiv preprint arXiv:1912.06842, 2019.
[25]   Lin, T.-Y.; RoyChowdhury, A.; Maji, S. Bilinear CNN models for fine-grained visual recognition. In: Proceedings of the IEEE International Conference on Computer Vision, 1449-1457, 2015.
[26]   Gao, Y.; Beijbom, O.; Zhang, N.; Darrell, T. Compact bilinear pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 317-326, 2016.
[27]   Yu, C.; Zhao, X.; Zheng, Q.; Zhang, P.; You, X. Hierarchical bilinear pooling for fine-grained visual recognition. In: Computer Vision-ECCV 2018. Lecture Notes in Computer Science Vol. 11220. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 595-610, 2018.
[28]   Wang, Y.; Choi, J.; Morariu, V. I.; Davis, L. S. Mining discriminative triplets of patches for fine-grained classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1163-1172, 2016.
[29]   Zhang, X.; Zhou, F.; Lin, Y.; Zhang, S. Embedding label structures for fine-grained feature representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1114-1123, 2016.
[30]   Dubey, A.; Gupta, O.; Raskar, R.; Naik, N. Maximum-entropy fine grained classification. arXiv preprint arXiv:1809.05934, 2018.
[31]   Qian, Q.; Jin, R.; Zhu, S.; Lin, Y. Fine-grained visual categorization via multi-stage metric learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3716-3724, 2015.
[32]   Sun, M.; Yuan, Y.; Zhou, F.; Ding, E. Multi-attention multi-class constraint for fine-grained image recognition. In: Computer Vision-ECCV 2018. Lecture Notes in Computer Science Vol. 11220. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 834-850, 2018.
[33]   Dubey, A.; Gupta, O.; Guo, P.; Raskar, R.; Farrell, R.; Naik, N. Pairwise confusion for fine-grained visual classification. In: Computer Vision-ECCV 2018. Lecture Notes in Computer Science Vol. 11216. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 71-88, 2018.
[34]   Zhuang, P.; Wang, Y.; Qiao, Y. Learning attentive pairwise interaction for fine-grained classification. arXiv preprint arXiv:2002.10191, 2020.
[35]   Xu, Z.; Huang, S.; Zhang, Y.; Tao, D. Augmenting strong supervision using web data for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision, 2524-2532, 2015.
[36]   Niu, L.; Veeraraghavan, A.; Sabharwal, A. Fine-grained classification using heterogeneous web data and auxiliary categories. arXiv preprint arXiv:1811.07567, 2018.
[37]   Torralba, A.; Efros, A. A. Unbiased look at dataset bias. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1521-1528, 2011.
[38]   Hu, T.; Qi, H. G.; Huang, Q. M.; Lu, Y. See better before looking closer: Weakly supervised data augmentation network for fine-grained visual classification. arXiv preprint arXiv:1901.09891, 2019.
[39]   Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, 554-561, 2013.
[40]   Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013.
[41]   Nilsback, M.; Zisserman, A. Automated flower classification over a large number of classes. In: Proceedings of the 6th Indian Conference on Computer Vision, Graphics & Image Processing, 722-729, 2008.
[42]   Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248-255, 2009.
[43]   Everingham, M.; van Gool, L.; Williams, C. K. I.; Winn, J.; Zisserman, A. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision Vol. 88, No. 2, 303-338, 2010.
[44]   Lin, T.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C. L.; Dollár, P. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312, 2014.
[45]   Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing Vol. 13, No. 4, 600-612, 2004.
[46]   Russell, B. C.; Torralba, A.; Murphy, K. P.; Freeman, W. T. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision Vol. 77, Nos. 1-3, 157-173, 2008.
[47]   Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Q. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016.