Search-based structured prediction

Daumé, Hal; Langford, John; Marcu, Daniel

doi:10.1007/s10994-009-5106-x

Published: 14 March 2009

Volume 75, pages 297–325, (2009)

Machine Learning

Hal Daumé III¹,
John Langford² &
Daniel Marcu³

Abstract

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
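As a rough illustration of what turning a structured problem into classification problems can look like, the sketch below takes one labeled sequence and emits one cost-sensitive classification example per position. It is not the authors' implementation: the function names, the toy features, the Hamming loss, the left-to-right labeling order, and the subtraction of the minimum cost are all assumptions made for this example. The cost of an action is estimated by taking that action and then letting a given policy complete the sequence.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A policy maps (input sequence, labels predicted so far) to the next label.
Policy = Callable[[Sequence[str], List[str]], str]


def hamming_loss(y_true: Sequence[str], y_pred: Sequence[str]) -> int:
    """Number of positions where the predicted label differs from the truth."""
    return sum(a != b for a, b in zip(y_true, y_pred))


def roll_out(x: Sequence[str], prefix: List[str], policy: Policy) -> List[str]:
    """Complete a partial labeling by following `policy` to the end of x."""
    labels = list(prefix)
    while len(labels) < len(x):
        labels.append(policy(x, labels))
    return labels


def make_cost_sensitive_examples(
    x: Sequence[str],
    y: Sequence[str],
    policy: Policy,
    label_set: Sequence[str],
    loss: Callable[[Sequence[str], Sequence[str]], float] = hamming_loss,
) -> List[Tuple[Tuple[str, str], Dict[str, float]]]:
    """Emit one (features, per-label cost) pair for every position of x."""
    examples = []
    prefix: List[str] = []
    for t in range(len(x)):
        # Toy features (an assumption): current token and previous label.
        features = (x[t], prefix[-1] if prefix else "<s>")
        # Cost of each action: take it, finish with the policy, score the result.
        costs = {a: loss(y, roll_out(x, prefix + [a], policy)) for a in label_set}
        best = min(costs.values())
        costs = {a: c - best for a, c in costs.items()}  # best action costs 0
        examples.append((features, costs))
        # Advance the search state with the policy's own choice.
        prefix.append(policy(x, prefix))
    return examples


# Hypothetical toy data; the policy here simply copies the true labels.
x = ["the", "dog", "barks"]
y = ["D", "N", "V"]
examples = make_cost_sensitive_examples(x, y, lambda x_, p: y[len(p)], ["D", "N", "V"])
```

Any cost-sensitive (or, after a further reduction, binary) classifier could then be trained on `examples`. In this toy setup the policy has access to the true labels, so each position's cost vector is zero for the correct label and one for every other label.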

Author information

Authors and Affiliations

School of Computing, University of Utah, Salt Lake City, UT, 84112, USA
Hal Daumé III
Yahoo! Research Labs, New York, NY, 10011, USA
John Langford
Information Sciences Institute, Marina del Rey, CA, 90292, USA
Daniel Marcu

Corresponding author

Correspondence to Hal Daumé III.

Additional information

Editor: Dan Roth.

About this article

Cite this article

Daumé, H., Langford, J. & Marcu, D. Search-based structured prediction. Mach Learn 75, 297–325 (2009). https://doi.org/10.1007/s10994-009-5106-x

Received: 22 September 2006
Revised: 15 May 2008
Accepted: 16 January 2009
Published: 14 March 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s10994-009-5106-x
