Abstract
A fundamental research question in domain adaptation is how to measure the distribution discrepancy between domains. The maximum mean discrepancy (MMD) is one of the most widely used statistical distances for this purpose. However, MMD can lose information about the distributions when it is used with non-characteristic kernels. To address this issue, we devise a new distribution metric, the maximum mean and covariance discrepancy (MMCD), by combining MMD with the proposed maximum covariance discrepancy (MCD). MCD captures second-order statistics in the reproducing kernel Hilbert space, which enables MMCD to capture more distributional information than MMD alone. To verify the efficacy of MMCD, we propose an unsupervised learning model based on MMCD, abbreviated McDA, and optimize it efficiently to solve the domain adaptation problem. Image classification experiments on two benchmark datasets show that McDA outperforms other representative domain adaptation methods, demonstrating the effectiveness of MMCD for domain adaptation.
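To make these quantities concrete, the following minimal sketch computes the standard biased estimator of squared MMD and a second-order discrepancy between kernel-space covariance operators (via centered kernel matrices) for two samples. It illustrates the general idea under common estimator choices, not the paper's McDA implementation; the helper names (`poly_kernel`, `mmd2`, `mcd2`) and the unweighted MMD + MCD combination at the end are our own assumptions.

```python
import numpy as np

def poly_kernel(A, B, d=2, c=1.0):
    """Polynomial kernel k(a, b) = (a^T b + c)^d, rows of A and B are samples."""
    return (A @ B.T + c) ** d

def mmd2(X, Y, d=2, c=1.0):
    """Biased (V-statistic) estimator of squared MMD between samples X and Y."""
    Kxx = poly_kernel(X, X, d, c)
    Kyy = poly_kernel(Y, Y, d, c)
    Kxy = poly_kernel(X, Y, d, c)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mcd2(X, Y, d=2, c=1.0):
    """Squared Hilbert-Schmidt distance between the empirical covariance
    operators of X and Y in the RKHS, computed via centered kernel matrices."""
    n, m = len(X), len(Y)
    Hn = np.eye(n) - np.ones((n, n)) / n   # centering matrices
    Hm = np.eye(m) - np.ones((m, m)) / m
    Kxx = Hn @ poly_kernel(X, X, d, c) @ Hn
    Kyy = Hm @ poly_kernel(Y, Y, d, c) @ Hm
    Kxy = Hn @ poly_kernel(X, Y, d, c) @ Hm
    return (np.sum(Kxx**2) / n**2 + np.sum(Kyy**2) / m**2
            - 2.0 * np.sum(Kxy**2) / (n * m))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 5))   # source sample
Y = rng.normal(0.5, 1.5, size=(120, 5))   # shifted target sample
print("MMD^2:", mmd2(X, Y))
print("MCD^2:", mcd2(X, Y))
print("MMCD-style sum:", mmd2(X, Y) + mcd2(X, Y))  # assumed unweighted combination
```

A shift in the target's scale (second-order structure) contributes to the MCD term even when it is only weakly reflected in the MMD term, which is the intuition behind combining the two.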




Acknowledgements
This work was supported by the National Natural Science Foundation of China (61806213, 61702134, U1435222).
Appendix A: Gradient Computation
According to (16), when the polynomial kernel of degree \(d\) is adopted, the gradient of the empirical estimator of squared MMD with respect to the data matrix \(A=[X, Y]\) is given by

\[ \frac{\partial \widehat{\mathrm{MMD}}^{2}}{\partial A} = 2d\,A\left( M \circ K_{d-1} \right), \tag{39} \]

where \(M\) is the MMD coefficient matrix defined in (16), \(\left( K_{d-1} \right)_{ij} = \left( A_i^T A_j + c \right)^{d-1}\), and \(\circ\) denotes the element-wise (Hadamard) product. Likewise, the gradients of the empirical MCD and MMCD estimators, stated in (40) and (41), follow from the same chain rule through \(K_{d-1}\). The gradients of MMD, MCD and MMCD with the linear kernel can be obtained by setting \(d=1\) and \(c=0\) in (39)–(41), respectively.
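As a sanity check on the gradient above, the following sketch evaluates \(\mathrm{MMD}^2 = \mathrm{tr}(KM)\) in matrix form and compares the analytic gradient \(2d\,A(M \circ K_{d-1})\) against finite differences. It assumes the standard MMD coefficient matrix \(M\) (entries \(1/n^2\), \(1/m^2\), \(-1/(nm)\)); the function name and the test setup are illustrative, not taken from the paper.

```python
import numpy as np

def mmd2_and_grad(A, n, m, d=2, c=1.0):
    """Empirical squared MMD tr(KM) and its gradient 2d * A (M o K_{d-1}).

    A is D x (n+m); columns are samples, source X = A[:, :n], target Y = A[:, n:].
    """
    N = n + m
    M = np.zeros((N, N))                 # MMD coefficient matrix (assumed form)
    M[:n, :n] = 1.0 / n**2
    M[n:, n:] = 1.0 / m**2
    M[:n, n:] = M[n:, :n] = -1.0 / (n * m)
    G = A.T @ A + c                      # Gram entries A_i^T A_j + c
    K = G ** d                           # polynomial kernel matrix
    Kd1 = G ** (d - 1)                   # (K_{d-1})_{ij} = (A_i^T A_j + c)^{d-1}
    value = np.sum(K * M)                # tr(KM), since M is symmetric
    grad = 2.0 * d * A @ (M * Kd1)       # analytic gradient, (39)-style
    return value, grad

# Finite-difference check of the analytic gradient.
rng = np.random.default_rng(1)
D, n, m = 4, 6, 5
A = rng.normal(size=(D, n + m))
val, grad = mmd2_and_grad(A, n, m)
eps = 1e-6
num = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    Ap = A.copy()
    Ap[idx] += eps
    num[idx] = (mmd2_and_grad(Ap, n, m)[0] - val) / eps
print("max abs error:", np.abs(grad - num).max())  # small if the formula is correct
```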
Cite this article
Zhang, W., Zhang, X., Lan, L. et al. Maximum Mean and Covariance Discrepancy for Unsupervised Domain Adaptation. Neural Process Lett 51, 347–366 (2020). https://doi.org/10.1007/s11063-019-10090-0