
Advanced Hough-based method for on-device document localization
D.V. Tropin 1,2,5, A.M. Ershov 3,5, D.P. Nikolaev 4,5, V.V. Arlazarov 2,5

1 Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia
2 FRC CSC RAS, Moscow, Russia
3 Moscow State University, Moscow, Russia
4 Institute for Information Transmission Problems of the RAS (Kharkevich Institute), Moscow, Russia
5 LLC "Smart Engines Service", Moscow, Russia


DOI: 10.18287/2412-6179-CO-895

Pages: 702-712.

Article language: English.

The demand for on-device document recognition systems is growing as privacy and security requirements become stricter. In such systems, no data is transferred from the end device to third-party information processing servers. Response time is vital to the user experience of on-device document recognition. Consumer-grade end devices such as smartphones lack discrete GPUs, powerful CPUs, and large RAM capacity; combined with these hardware limits, the response-time requirement puts significant constraints on the computational complexity of algorithms intended for on-device execution.
     In this work, we consider document location in an image without prior knowledge of the document content or its internal structure. According to published works, at least 5 systems offer solutions for on-device document location. All of these systems use a location method that can be considered Hough-based. The precision of such systems appears to be lower than that of state-of-the-art solutions, which were not designed to account for limited computational resources.
     We propose an advanced Hough-based method. In contrast with other approaches, it accounts for the geometric invariants of the central projection model and combines both edge and color features for document boundary detection. The proposed method achieved the second-best precision on the SmartDoc dataset, surpassed only by a U-Net-like neural network. When evaluated on the more challenging MIDV-500 dataset, the proposed algorithm achieved the best precision among published methods. Our method remains applicable to on-device computation.
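
For readers unfamiliar with the term, a Hough-based location method can be illustrated with a minimal sketch: edge pixels vote for straight lines in (rho, theta) space, the strongest lines are taken as candidate document borders, and their intersections give candidate corners. The code below is a generic textbook Hough transform and not the method proposed in this paper; it omits the projective invariants and color features mentioned above. The function names (`hough_lines`, `intersect`, `corners_from_lines`), the suppression radius, and the synthetic test image are all assumptions made for illustration.

```python
import numpy as np


def hough_lines(edges, n_theta=180, top_k=4, radius=5):
    """Return up to top_k (rho, theta, votes) peaks of the Hough accumulator.

    A line is parameterized as x*cos(theta) + y*sin(theta) = rho.
    """
    h, w = edges.shape
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # angles in [0, pi)
    diag = int(np.ceil(np.hypot(h, w)))  # upper bound on |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for j, t in enumerate(thetas):
        # Every edge pixel votes for the line through it at angle t.
        rho = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(np.int64) + diag
        acc[:, j] = np.bincount(rho, minlength=2 * diag + 1)

    # Greedy peak picking with a square suppression window.
    # Simplification: the window does not wrap around theta = 0 / pi.
    peaks = []
    for _ in range(top_k):
        r, t = np.unravel_index(np.argmax(acc), acc.shape)
        votes = int(acc[r, t])
        if votes == 0:
            break
        peaks.append((int(r) - diag, float(thetas[t]), votes))
        acc[max(r - radius, 0):r + radius + 1,
            max(t - radius, 0):t + radius + 1] = 0
    return peaks


def intersect(line_a, line_b):
    """Intersect two (rho, theta) lines; return None if they are parallel."""
    (r1, t1), (r2, t2) = line_a, line_b
    a = np.array([[np.cos(t1), np.sin(t1)],
                  [np.cos(t2), np.sin(t2)]])
    if abs(np.linalg.det(a)) < 1e-9:
        return None
    x, y = np.linalg.solve(a, np.array([r1, r2], dtype=float))
    return float(x), float(y)


def corners_from_lines(lines):
    """Intersect every non-parallel pair of (rho, theta, votes) lines."""
    points = []
    for i in range(len(lines)):
        for k in range(i + 1, len(lines)):
            p = intersect(lines[i][:2], lines[k][:2])
            if p is not None:
                points.append(p)
    return points


# Synthetic edge map: the outline of an axis-aligned rectangle.
edges = np.zeros((100, 120), dtype=bool)
edges[20, 30:91] = edges[80, 30:91] = True   # top and bottom borders
edges[20:81, 30] = edges[20:81, 90] = True   # left and right borders

lines = hough_lines(edges)
corners = corners_from_lines(lines)
```

On a real photograph the edge map would come from an edge detector, and the four strongest lines would not necessarily be the document borders, which is why practical systems rank candidate quadrilaterals with additional cues.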

Keywords:
document detection, rectangle object localization, smartphone-based acquisition, on-device recognition, Hough transform, image segmentation.

This work was partially supported by the Russian Foundation for Basic Research (projects 18-29-26035 and 19-29-09092).

Tropin DV, Ershov AM, Nikolaev DP, Arlazarov VV. Advanced Hough-based method for on-device document localization. Computer Optics 2021; 45(5): 702-712. DOI: 10.18287/2412-6179-CO-895.


