Abstract
A fundamental research question in domain adaptation is how to measure the distribution discrepancy between domains. The maximum mean discrepancy (MMD) is one of the most widely used statistical distances for this purpose. However, MMD can lose information about the distributions when it is used with non-characteristic kernels. To address this issue, we devise a new distribution metric, the maximum mean and covariance discrepancy (MMCD), by combining MMD with the proposed maximum covariance discrepancy (MCD). MCD captures second-order statistics in the reproducing kernel Hilbert space, which enables MMCD to capture more distributional information than MMD alone. To verify the efficacy of MMCD, we propose an unsupervised learning model based on MMCD, abbreviated McDA, and optimize it efficiently to solve the domain adaptation problem. Image classification experiments on two benchmark datasets show that McDA outperforms other representative domain adaptation methods, demonstrating the effectiveness of MMCD for domain adaptation.
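To make these quantities concrete, the following minimal sketch computes the standard biased estimator of squared MMD and a second-order discrepancy between kernel-space covariance operators (via centered kernel matrices) for two samples. It illustrates the general idea under common estimator choices, not the paper's McDA implementation; the helper names (`poly_kernel`, `mmd2`, `mcd2`) and the unweighted MMD + MCD combination at the end are our own assumptions.

```python
import numpy as np

def poly_kernel(A, B, d=2, c=1.0):
    """Polynomial kernel k(a, b) = (a^T b + c)^d, rows of A and B are samples."""
    return (A @ B.T + c) ** d

def mmd2(X, Y, d=2, c=1.0):
    """Biased (V-statistic) estimator of squared MMD between samples X and Y."""
    Kxx = poly_kernel(X, X, d, c)
    Kyy = poly_kernel(Y, Y, d, c)
    Kxy = poly_kernel(X, Y, d, c)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mcd2(X, Y, d=2, c=1.0):
    """Squared Hilbert-Schmidt distance between the empirical covariance
    operators of X and Y in the RKHS, computed via centered kernel matrices."""
    n, m = len(X), len(Y)
    Hn = np.eye(n) - np.ones((n, n)) / n   # centering matrices
    Hm = np.eye(m) - np.ones((m, m)) / m
    Kxx = Hn @ poly_kernel(X, X, d, c) @ Hn
    Kyy = Hm @ poly_kernel(Y, Y, d, c) @ Hm
    Kxy = Hn @ poly_kernel(X, Y, d, c) @ Hm
    return (np.sum(Kxx**2) / n**2 + np.sum(Kyy**2) / m**2
            - 2.0 * np.sum(Kxy**2) / (n * m))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 5))   # source sample
Y = rng.normal(0.5, 1.5, size=(120, 5))   # shifted target sample
print("MMD^2:", mmd2(X, Y))
print("MCD^2:", mcd2(X, Y))
print("MMCD-style sum:", mmd2(X, Y) + mcd2(X, Y))  # assumed unweighted combination
```

A shift in the target's scale (second-order structure) contributes to the MCD term even when it is only weakly reflected in the MMD term, which is the intuition behind combining the two.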




Acknowledgements
This work was supported by the National Natural Science Foundation of China (61806213, 61702134, U1435222).
Appendix A: Gradient Computation
According to (16), when the polynomial kernel of degree \(d\) is adopted, the gradient of the empirical estimator of squared MMD with respect to the data matrix \(A=[X, Y]\) is given by

\[ \frac{\partial \widehat{\mathrm{MMD}}^{2}}{\partial A} = 2d\,A\left( M \circ K_{d-1} \right), \tag{39} \]

where \(M\) is the MMD coefficient matrix defined in (16), \(\left( K_{d-1} \right)_{ij} = \left( A_i^T A_j + c \right)^{d-1}\), and \(\circ\) denotes the element-wise (Hadamard) product. Likewise, the gradients of the empirical MCD and MMCD estimators, stated in (40) and (41), follow from the same chain rule through \(K_{d-1}\). The gradients of MMD, MCD and MMCD with the linear kernel can be obtained by setting \(d=1\) and \(c=0\) in (39)–(41), respectively.
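As a sanity check on the gradient above, the following sketch evaluates \(\mathrm{MMD}^2 = \mathrm{tr}(KM)\) in matrix form and compares the analytic gradient \(2d\,A(M \circ K_{d-1})\) against finite differences. It assumes the standard MMD coefficient matrix \(M\) (entries \(1/n^2\), \(1/m^2\), \(-1/(nm)\)); the function name and the test setup are illustrative, not taken from the paper.

```python
import numpy as np

def mmd2_and_grad(A, n, m, d=2, c=1.0):
    """Empirical squared MMD tr(KM) and its gradient 2d * A (M o K_{d-1}).

    A is D x (n+m); columns are samples, source X = A[:, :n], target Y = A[:, n:].
    """
    N = n + m
    M = np.zeros((N, N))                 # MMD coefficient matrix (assumed form)
    M[:n, :n] = 1.0 / n**2
    M[n:, n:] = 1.0 / m**2
    M[:n, n:] = M[n:, :n] = -1.0 / (n * m)
    G = A.T @ A + c                      # Gram entries A_i^T A_j + c
    K = G ** d                           # polynomial kernel matrix
    Kd1 = G ** (d - 1)                   # (K_{d-1})_{ij} = (A_i^T A_j + c)^{d-1}
    value = np.sum(K * M)                # tr(KM), since M is symmetric
    grad = 2.0 * d * A @ (M * Kd1)       # analytic gradient, (39)-style
    return value, grad

# Finite-difference check of the analytic gradient.
rng = np.random.default_rng(1)
D, n, m = 4, 6, 5
A = rng.normal(size=(D, n + m))
val, grad = mmd2_and_grad(A, n, m)
eps = 1e-6
num = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    Ap = A.copy()
    Ap[idx] += eps
    num[idx] = (mmd2_and_grad(Ap, n, m)[0] - val) / eps
print("max abs error:", np.abs(grad - num).max())  # small if the formula is correct
```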
Cite this article
Zhang, W., Zhang, X., Lan, L. et al. Maximum Mean and Covariance Discrepancy for Unsupervised Domain Adaptation. Neural Process Lett 51, 347–366 (2020). https://doi.org/10.1007/s11063-019-10090-0