Abstract
We present a new system for segmenting online handwritten documents into multiple blocks, such as text paragraphs, tables, graphics, or mathematical expressions. The document is represented hierarchically by aggregating strokes into blocks, and interactions between the levels of the hierarchy are modeled with a tree Conditional Random Field. Features are extracted and labels are predicted at each tree level with logistic classifiers, and Belief Propagation is used for optimal inference over the structure. Being fully trainable, the system is shown to properly handle difficult segmentation problems arising in unconstrained online note-taking documents, where no prior knowledge is available regarding the layout or the expected content. Our experiments show very promising results and suggest that fully automatic segmentation of free-form online notes is within reach.
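
The inference step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only shows why Belief Propagation is exact on a tree, using a two-pass sum-product schedule in plain Python. The function name `tree_marginals`, the `parent` list layout, the single shared `pairwise` table, the two-label setup, and all numeric scores are assumptions made for the example; in the paper the node scores would come from the logistic classifiers at each tree level.

```python
def tree_marginals(parent, unary, pairwise):
    """Exact sum-product marginals on a rooted tree (illustrative sketch).

    parent[i] : index of node i's parent, or None for the root.
    unary[i]  : list of K non-negative scores for node i.
    pairwise  : K x K table; pairwise[a][b] scores parent label a with
                child label b (shared by every edge in this sketch).
    """
    n, k = len(parent), len(unary[0])
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p is not None:
            children[p].append(i)
    root = parent.index(None)

    # Order the nodes so that every parent precedes its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Upward pass: up[i][a] is the message from node i to its parent,
    # indexed by the parent's label a.
    up = [None] * n
    inward = [None] * n  # unary score times all messages from children
    for i in reversed(order):
        belief = list(unary[i])
        for c in children[i]:
            belief = [belief[b] * up[c][b] for b in range(k)]
        inward[i] = belief
        if parent[i] is not None:
            up[i] = [sum(pairwise[a][b] * belief[b] for b in range(k))
                     for a in range(k)]

    # Downward pass: down[c][b] is the message from the parent to child c,
    # indexed by the child's label b.
    down = [[1.0] * k for _ in range(n)]
    for i in order:
        for c in children[i]:
            # Parent belief without the contribution of child c itself.
            excl = [unary[i][a] * down[i][a] for a in range(k)]
            for s in children[i]:
                if s != c:
                    excl = [excl[a] * up[s][a] for a in range(k)]
            down[c] = [sum(pairwise[a][b] * excl[a] for a in range(k))
                       for b in range(k)]

    # Combine both directions and normalize each node's distribution.
    marginals = []
    for i in range(n):
        scores = [inward[i][b] * down[i][b] for b in range(k)]
        z = sum(scores)
        marginals.append([s / z for s in scores])
    return marginals


# Hypothetical toy document: node 0 is the page, nodes 1-2 are blocks,
# nodes 3-5 are strokes. Labels: 0 = text, 1 = non-text. Scores are made up.
parent = [None, 0, 0, 1, 1, 2]
unary = [[0.5, 0.5], [0.6, 0.4], [0.3, 0.7],
         [0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
pairwise = [[0.8, 0.2], [0.2, 0.8]]
marginals = tree_marginals(parent, unary, pairwise)
labels = [max(range(2), key=lambda b: m[b]) for m in marginals]
```

Because the graph has no cycles, one upward and one downward pass suffice to obtain the exact marginals, which is what makes a tree-structured model attractive compared with loopy graphs where Belief Propagation is only approximate.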






Notes
For the IAM-OnDo dataset, a more elaborate temporal distance could be adopted, since each sampled point has its own timestamp.
Acknowledgments
This work is supported by the Chinese Academy of Sciences under the Fellowships for Young International Scientists program (No. 2012Y1GB0001), and by the National Natural Science Foundation of China under the Research Fund for International Young Scientists (No. 61250110082).
Cite this article
Delaye, A., Liu, CL. Multi-class segmentation of free-form online documents with tree conditional random fields. IJDAR 17, 313–329 (2014). https://doi.org/10.1007/s10032-014-0221-z
DOI: https://doi.org/10.1007/s10032-014-0221-z