Abstract
Many problems in natural language processing and computer vision can be framed as structured prediction problems. The structural support vector machine (SVM) is a popular approach for training structured predictors, in which learning is cast as an optimization problem. Most structural SVM solvers alternate between a model update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs called DEMI-DCD that extends the dual coordinate descent approach by decoupling the model update and inference phases into different threads. We take advantage of multicore hardware to parallelize learning with minimal synchronization between the model update and the inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and we validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, DEMI-DCD achieves competitive performance while making full use of the available processors. For example, it reaches a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
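To make the decoupling idea concrete, the following is a minimal sketch (not the authors' implementation) of a learner whose model-update thread and inference threads communicate only through a shared queue and a lock around the weight vector. The helpers `dcd_update` and `loss_augmented_inference`, the toy feature vectors, and all constants are hypothetical placeholders for the real dual coordinate descent subproblem solver and structured inference routine; the real system runs on multicore hardware rather than Python threads.

```python
# Sketch of decoupled model update and inference (hypothetical stand-in code).
import threading
import queue
import numpy as np

NUM_FEATURES = 10
NUM_EXAMPLES = 50

w = np.zeros(NUM_FEATURES)          # shared model, written only by the learning thread
w_lock = threading.Lock()           # minimal synchronization around the model
work_queue = queue.Queue()          # candidate structures produced by inference threads
stop = threading.Event()

examples = [np.random.randn(NUM_FEATURES) for _ in range(NUM_EXAMPLES)]

def loss_augmented_inference(w_snapshot, x):
    """Placeholder: return a candidate 'structure' (here just a feature vector)."""
    return x * np.sign(w_snapshot @ x + 1e-9)

def dcd_update(w_cur, phi, C=1.0):
    """Placeholder dual-coordinate-descent-style update for one structure."""
    margin = w_cur @ phi
    step = max(0.0, min(C, (1.0 - margin) / (phi @ phi + 1e-12)))
    return w_cur + step * phi

def inference_worker():
    # Repeatedly refresh candidate structures using the latest model snapshot.
    while not stop.is_set():
        with w_lock:
            w_snapshot = w.copy()
        for x in examples:
            work_queue.put(loss_augmented_inference(w_snapshot, x))

def learning_worker(num_updates=2000):
    # Consume structures and update the model without waiting for full inference passes.
    global w
    for _ in range(num_updates):
        phi = work_queue.get()
        with w_lock:
            w = dcd_update(w, phi)
    stop.set()

threads = [threading.Thread(target=inference_worker) for _ in range(3)]
threads.append(threading.Thread(target=learning_worker))
for t in threads:
    t.start()
for t in threads:
    t.join()
print("finished; ||w|| =", np.linalg.norm(w))
```

The key design point illustrated here is that the learning thread never blocks on a full inference pass over the training set: inference threads keep the queue populated with structures computed from slightly stale model snapshots, and only brief lock acquisitions are needed to read or write the shared weights.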
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Chang, K.W., Srikumar, V., Roth, D. (2013). Multi-core Structural SVM Training. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40991-2_26
DOI: https://doi.org/10.1007/978-3-642-40991-2_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40990-5
Online ISBN: 978-3-642-40991-2