Abstract
In this paper, we propose a novel holistic methodology for keyword search in historical typewritten documents that combines synthetic data with user feedback. The holistic approach treats each word as a single entity and recognizes the whole word rather than its individual characters. Our aim is to search for keywords typed by the user in a large collection of digitized historical typewritten documents. The proposed method is based on: (i) creation of synthetic word images; (ii) word segmentation using dynamic parameters; (iii) efficient hybrid feature extraction for each word image; and (iv) a retrieval procedure that is optimized by user feedback. Experimental results demonstrate the efficiency of the proposed approach.
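To make the retrieval loop in steps (iii) and (iv) concrete, the following is a minimal sketch and not the method of the paper. The abstract does not specify the hybrid features or the feedback rule, so the sketch assumes a simple zoning density feature and a Rocchio-style query update as stand-ins. All function names (`zone_features`, `rank`, `refine`) and the parameters `rows`, `cols`, and `weight` are hypothetical.

```python
from math import sqrt

def zone_features(img, rows=2, cols=4):
    """Ink density in each cell of a rows x cols grid.

    `img` is a binary word image given as a list of rows of 0/1 values.
    Zoning is an assumed stand-in for the hybrid features of the paper.
    """
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            area = max(1, (y1 - y0) * (x1 - x0))  # guard against empty cells
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(ink / area)
    return feats

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, candidates):
    """Indices of candidate vectors, ordered from closest to farthest."""
    return sorted(range(len(candidates)),
                  key=lambda i: distance(query, candidates[i]))

def refine(query, relevant, weight=0.5):
    """Move the query toward the mean of the matches the user approved.

    This is a Rocchio-style update, assumed here for illustration only.
    """
    if not relevant:
        return list(query)
    mean = [sum(col) / len(relevant) for col in zip(*relevant)]
    return [(1 - weight) * q + weight * m for q, m in zip(query, mean)]
```

In such a setup, the initial query vector would come from a synthetic image of the typed keyword, the candidates from the segmented word images, and the vectors of the results the user marks as correct would be passed to `refine` before ranking again.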
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gatos, B., Konidaris, T., Pratikakis, I., Perantonis, S.J. (2006). A Holistic Methodology for Keyword Search in Historical Typewritten Documents. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_52
DOI: https://doi.org/10.1007/11752912_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34117-8
Online ISBN: 978-3-540-34118-5
eBook Packages: Computer Science, Computer Science (R0)