Abstract
This paper presents an effective and efficient approach to extracting scene text from images. The approach first extracts edge information with a local maximum difference filter (LMDF) and, in parallel, decomposes the input image into a group of image layers by color clustering. Candidate text image layers are then identified by combining the geometric structure and spatial distribution of scene text with the edge map. Next, at the character level, candidate text connected components are identified using a set of heuristic rules. Finally, graph-cut computation is used to identify and localize text lines of arbitrary orientation. The segmentation of text pixels is embedded in the text localization computation itself. Comprehensive experiments on four challenging datasets (ICDAR 2003, ICDAR 2011, MSRA-TD500, and the Street View Text (SVT) dataset) validate the approach. In comparisons with many state-of-the-art methods, the results show that the approach effectively handles scene text with diverse fonts, sizes, colors, languages, and arbitrary orientations, and that it is robust to illumination changes.
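The first two stages named in the abstract (edge extraction and color-layer decomposition) can be illustrated with a minimal sketch. The abstract does not define the LMDF, so the version here is an assumption: each pixel receives the largest absolute intensity difference between itself and any neighbor in a small window. The layer step is also simplified: it assigns pixels to the nearest of a set of given color centers, whereas a real clustering step would estimate those centers from the image. The function names `local_max_difference` and `color_layers` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Gray = List[List[int]]
RGB = Tuple[int, int, int]


def local_max_difference(img: Gray, radius: int = 1) -> Gray:
    """Assumed reading of an LMDF: for every pixel, return the largest
    absolute intensity difference to any neighbor inside a
    (2*radius+1) x (2*radius+1) window clipped to the image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    best = max(best, abs(img[y][x] - img[ny][nx]))
            out[y][x] = best
    return out


def color_layers(img: List[List[RGB]], centers: List[RGB]) -> List[Gray]:
    """Split an RGB image into one binary mask per color center by
    assigning each pixel to its nearest center (squared Euclidean
    distance). The centers are given here; a clustering step such as
    k-means would normally estimate them."""
    h, w = len(img), len(img[0])
    masks = [[[0] * w for _ in range(h)] for _ in centers]
    for y in range(h):
        for x in range(w):
            pixel = img[y][x]
            distances = [
                sum((p - c) ** 2 for p, c in zip(pixel, center))
                for center in centers
            ]
            masks[distances.index(min(distances))][y][x] = 1
    return masks


if __name__ == "__main__":
    # A one-row image with a single step edge between columns 1 and 2.
    print(local_max_difference([[0, 0, 9, 9]]))
    # Two reddish pixels on top, two bluish pixels below.
    rgb = [[(255, 0, 0), (250, 5, 5)], [(0, 0, 255), (10, 10, 240)]]
    print(color_layers(rgb, [(255, 0, 0), (0, 0, 255)]))
```

In this sketch the edge response is nonzero only on the two pixels adjacent to the intensity step, and each color layer is a binary mask. Later stages such as scoring layers against the edge map would consume these outputs, and they are not shown here.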
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grants No. 61232013, No. 61271434, and No. 61175115.
About this article
Cite this article
Liu, X., Wang, W. An effective graph-cut scene text localization with embedded text segmentation. Multimed Tools Appl 74, 4891–4906 (2015). https://doi.org/10.1007/s11042-013-1848-3