
Book spine recognition with the use of deep neural networks
M.O. Kalinina 1, P.L. Nikolaev 1

Moscow Aviation Institute (National Research University),
121552, Moscow, Russia, Orshanskaya 3


DOI: 10.18287/2412-6179-CO-731

Pages: 968-977.

Full text of article: Russian language.

Deep neural networks now play a significant role in many fields of human activity. They are especially beneficial in areas that deal with large amounts of data and with lengthy operations for obtaining and processing information from the visual environment. This article describes the development of a convolutional neural network based on the YOLO architecture and intended for real-time book recognition. The creation of an original data set and the training of the deep neural network are described. The structure of the resulting network is presented, and the metrics most frequently used to evaluate network performance are considered. A brief review of existing neural network architectures is also given. The YOLO architecture has a number of advantages that allow it to compete successfully with other models and make it the most suitable choice for building an object detection network, since it significantly mitigates some common weaknesses of such networks (for example, difficulty recognizing similar-looking, same-color book covers or slanted books). The results obtained during training allow the network to be used as a basis for developing book spine recognition software.

Keywords: image recognition; object detection; computer vision; machine learning; artificial neural networks; deep learning; convolutional neural networks.

Kalinina MO, Nikolaev PL. Book spine recognition with the use of deep neural networks. Computer Optics 2020; 44(6): 968-977. DOI: 10.18287/2412-6179-CO-731.


