Abstract
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
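As a rough illustration of the reduction described above, the sketch below turns one labeled sequence into per-step cost-sensitive classification examples by rolling out a reference policy under Hamming loss. It is not the authors' implementation. The function names, the feature choice (current word plus previous label), and the toy tag set are assumptions made for illustration, and only the example-generation step is shown.

```python
from typing import Dict, List, Tuple

Example = Tuple[Tuple[str, str], Dict[str, int]]


def hamming_loss(pred: List[str], gold: List[str]) -> int:
    """Number of positions where the prediction differs from the gold labels."""
    return sum(p != g for p, g in zip(pred, gold))


def reference_policy(prefix: List[str], gold: List[str]) -> str:
    """Under Hamming loss, the best next action is the gold label (assumed reference)."""
    return gold[len(prefix)]


def make_examples(words: List[str], gold: List[str], labels: List[str]) -> List[Example]:
    """Turn one labeled sequence into per-step cost-sensitive examples."""
    examples: List[Example] = []
    prefix: List[str] = []
    for word in words:
        prev = prefix[-1] if prefix else "<s>"
        features = (word, prev)  # hypothetical feature set
        costs: Dict[str, int] = {}
        for action in labels:
            # Take `action` now, then let the reference policy finish the sequence.
            completion = prefix + [action]
            while len(completion) < len(gold):
                completion.append(reference_policy(completion, gold))
            costs[action] = hamming_loss(completion, gold)
        # Report each cost relative to the best available action.
        best = min(costs.values())
        examples.append((features, {a: c - best for a, c in costs.items()}))
        # Advance along the reference policy's trajectory.
        prefix.append(reference_policy(prefix, gold))
    return examples
```

Each resulting pair of features and per-action costs can be passed to any cost-sensitive or, after a further reduction, binary classifier, which is the sense in which the structured problem becomes a set of simple classification problems.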
Additional information
Editor: Dan Roth.
Cite this article
Daumé, H., Langford, J. & Marcu, D. Search-based structured prediction. Mach Learn 75, 297–325 (2009). https://doi.org/10.1007/s10994-009-5106-x
DOI: https://doi.org/10.1007/s10994-009-5106-x