Abstract
Federated Learning (FL) is a promising decentralized machine learning framework that enables a massive number of clients (e.g., smartphones) to collaboratively train a global model over the Internet without sacrificing their privacy. Although FL's efficacy on non-convex problems has been established, its convergence under biased client participation lacks theoretical study. In this paper, we analyze the convergence of FedAvg, the most renowned FL algorithm, on non-convex problems. Assuming that data is evenly sized but non-IID across clients, we derive the convergence rate of FedAvg under biased client participation. Our analysis reveals that biased client participation can significantly reduce the accuracy of the FL model. We validate this finding through trace-driven experiments, which demonstrate that unbiased client participation yields 11% to 50% higher test accuracy than extremely biased client participation.
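To make biased participation concrete, the following is a minimal simulation sketch (illustrative only, not the experimental code used in the paper; the function names and the toy quadratic objective are our own assumptions). It contrasts uniform client sampling with a sampling distribution skewed toward a few clients:

import numpy as np

def sample_clients(rng, num_clients, m, bias=0.0):
    # bias = 0.0 gives uniform (unbiased) participation;
    # larger bias concentrates probability mass on low-index clients.
    weights = np.exp(-bias * np.arange(num_clients))
    probs = weights / weights.sum()
    return rng.choice(num_clients, size=m, replace=False, p=probs)

def fedavg_round(global_w, client_optima, selected, lr=0.1, local_steps=5):
    # One FedAvg round on toy quadratics F_k(w) = 0.5 * ||w - c_k||^2,
    # whose local gradient is simply (w - c_k).
    local_models = []
    for k in selected:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * (w - client_optima[k])
        local_models.append(w)
    return np.mean(local_models, axis=0)  # equal weights since n_k = n/N

rng = np.random.default_rng(0)
client_optima = rng.normal(size=(100, 10))  # non-IID: each client has its own optimum c_k
w = np.zeros(10)
for t in range(200):
    selected = sample_clients(rng, num_clients=100, m=10, bias=0.5)
    w = fedavg_round(w, client_optima, selected)
# The global optimum of the average objective is the mean of the c_k;
# biased sampling (bias > 0) leaves a systematic gap, bias = 0.0 does not.
print(np.linalg.norm(w - client_optima.mean(axis=0)))

With bias = 0.0 the sampled update is an unbiased estimate of the full FedAvg update, whereas bias > 0 systematically over-weights some clients' optima, mirroring the accuracy gap reported above.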
References
Abay, A., Zhou, Y., Baracaldo, N., Rajamoni, S., Chuba, E., Ludwig, H.: Mitigating bias in federated learning. arXiv preprint arXiv:2012.02447 (2020). https://doi.org/10.48550/arXiv.2012.02447
Amiri, M.M., Gündüz, D., Kulkarni, S.R., Poor, H.V.: Convergence of federated learning over a noisy downlink. IEEE Trans. Wireless Commun. 21(3), 1422–1437 (2021). https://doi.org/10.1109/TWC.2021.3103874
Balakrishnan, R., Li, T., Zhou, T., Himayat, N., Smith, V., Bilmes, J.: Diverse client selection for federated learning via submodular maximization. In: International Conference on Learning Representations (ICLR) (2021)
Chen, F., Chen, N., Mao, H., Hu, H.: Assessing four neural networks on handwritten digit recognition dataset (MNIST). arXiv preprint arXiv:1811.08278 (2018). https://doi.org/10.48550/ARXIV.1811.08278
Chilimbi, T., Suzue, Y., Apacible, J., Kalyanaraman, K.: Project Adam: building an efficient and scalable deep learning training system. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI), pp. 571–582 (2014)
Cho, Y.J., Wang, J., Joshi, G.: Client selection in federated learning: convergence analysis and power-of-choice selection strategies. arXiv preprint arXiv:2010.01243 (2020). https://doi.org/10.48550/arXiv.2010.01243
Duan, M., Liu, D., Chen, X., Liu, R., Tan, Y., Liang, L.: Self-balancing federated learning with global imbalanced data in mobile systems. IEEE Trans. Parallel Distrib. Syst. 32(1), 59–71 (2020). https://doi.org/10.1109/TPDS.2020.3009406
Haddadpour, F., Mahdavi, M.: On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425 (2019). https://doi.org/10.48550/arXiv.1910.14425
Kairouz, P., et al.: Advances and open problems in federated learning. Found. Trends Mach. Learn. 14(1–2), 1–210 (2021)
Khaled, A., Mishchenko, K., Richtárik, P.: First analysis of local GD on heterogeneous data. arXiv preprint arXiv:1909.04715 (2019). https://doi.org/10.48550/ARXIV.1909.04715
Khan, L.U., Saad, W., Han, Z., Hossain, E., Hong, C.S.: Federated learning for internet of things: recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. (2021). https://doi.org/10.1109/COMST.2021.3090430
Krizhevsky, A.: Learning Multiple Layers of Features From Tiny Images. University of Toronto, Toronto (2012)
Li, A., Zhang, L., Tan, J., Qin, Y., Wang, J., Li, X.Y.: Sample-level data selection for federated learning. In: IEEE Conference on Computer Communications (INFOCOM), pp. 1–10 (2021). https://doi.org/10.1109/INFOCOM42981.2021.9488723
Li, T., Hu, S., Beirami, A., Smith, V.: Ditto: fair and robust federated learning through personalization. In: Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 6357–6368. PMLR (2021)
Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450 (2020)
Li, T., Sanjabi, M., Smith, V.: Fair resource allocation in federated learning. In: International Conference on Learning Representations (ICLR) (2020)
Li, X., Huang, K., Yang, W., Wang, S., Zhang, Z.: On the convergence of FedAvg on non-IID data. In: Eighth International Conference on Learning Representations (ICLR) (2020)
Liu, R., Cao, Y., Yoshikawa, M., Chen, H.: FedSel: federated SGD under local differential privacy with top-k dimension selection. In: Database Systems for Advanced Applications (DASFAA) (2020)
Ma, J., Xie, M., Long, G.: Personalized federated learning with robust clustering against model poisoning. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds.) ADMA 2022. LNCS, vol. 13726, pp. 238–252. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-22137-8_18
McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282 (2017)
Segarceanu, S., Gavat, I., Suciu, G.: Evaluation of deep learning techniques for acoustic environmental events detection. Romanian J. Technical Sci. Appl. Mech. 66(1), 19–37 (2021)
Tan, L., et al.: AdaFed: optimizing participation-aware federated learning with adaptive aggregation weights. IEEE Trans. Network Sci. Eng. 9(4), 2708–2720 (2022). https://doi.org/10.1109/TNSE.2022.3168969
Xu, J., Glicksberg, B.S., Su, C., Walker, P., Bian, J., Wang, F.: Federated learning for healthcare informatics. J. Healthcare Inform. Res. 5(1), 1–19 (2021)
Yang, H., Fang, M., Liu, J.: Achieving linear speedup with partial worker participation in non-IID federated learning. In: International Conference on Learning Representations (ICLR) (2021)
Yang, W., et al.: Gain without pain: offsetting DP-injected noises stealthily in cross-device federated learning. IEEE Internet Things J. 9(22), 22147–22157 (2021). https://doi.org/10.1109/JIOT.2021.3102030
Yu, H., Jin, R., Yang, S.: On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization. In: International Conference on Machine Learning (ICML), pp. 7184–7193 (2019)
Acknowledgements
This study received support from the National Natural Science Foundation of China through Grants U1911201 and U2001209, the Natural Science Foundation of Guangdong under Grant 2021A1515011369, and the Science and Technology Program of Guangzhou under Grant 2023A04J2029.
Appendix
In this section, we provide the proofs of Lemma 1 and Lemma 2.
1.1 Proof of Lemma 1
For any \(t \ge 0\), there exists a \(t_0 \le t\) with \(t - t_0 \le E\) such that \(\omega_k^{t_0} = \omega^{t_0}\) for all \(k = 1, 2, \dots, N\), i.e., \(t_0\) is the most recent synchronization step. Similar to previous work [17], we have
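Concretely, the standard drift argument (cf. [17]) runs as follows, under the common assumptions of bounded stochastic gradients \(\mathbb{E}\|g_k^\tau\|^2 \le G^2\) and non-increasing step sizes (a sketch of the usual technique, not necessarily the paper's exact constants): since each local model evolves by at most \(E\) SGD steps from \(\omega^{t_0}\),
\[
\omega_k^{t} - \omega^{t_0} = - \sum_{\tau = t_0}^{t-1} \eta_\tau g_k^{\tau} , \qquad \mathbb{E}\left\| \omega_k^{t} - \omega^{t_0} \right\|^2 \le (t - t_0) \sum_{\tau = t_0}^{t-1} \eta_\tau^2\, \mathbb{E}\left\| g_k^{\tau} \right\|^2 \le \eta_{t_0}^2 E^2 G^2 ,
\]
where the first inequality is Cauchy-Schwarz and the second uses \(t - t_0 \le E\) and \(\eta_\tau \le \eta_{t_0}\).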
1.2 Proof of Lemma 2
Since \(n_k = \frac{n}{N}\) for all \(k\) and \(\sum_{k=1}^{N} M_k^t = mt\), we can derive that
Utilizing the \(\rho\)-smoothness of \(F(\omega)\), we can derive the following inequality:
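For reference, the standard descent lemma that \(\rho\)-smoothness yields, applied here with \(x = \omega^t\) and \(y = \omega^{t+1}\), reads
\[
F(\omega^{t+1}) \le F(\omega^{t}) + \left\langle \nabla F(\omega^{t}),\ \omega^{t+1} - \omega^{t} \right\rangle + \frac{\rho}{2} \left\| \omega^{t+1} - \omega^{t} \right\|^2 .
\]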
By applying the fact that \(\mathbb{E}\|x\|^2 = \mathbb{E}\left[\|x - \mathbb{E}x\|^2\right] + \|\mathbb{E}x\|^2\), we can obtain
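This bias-variance decomposition can be verified in one line by expanding the square and noting that the cross term vanishes:
\[
\mathbb{E}\|x\|^2 = \mathbb{E}\left\| (x - \mathbb{E}x) + \mathbb{E}x \right\|^2 = \mathbb{E}\left\| x - \mathbb{E}x \right\|^2 + 2\left\langle \mathbb{E}[x - \mathbb{E}x],\ \mathbb{E}x \right\rangle + \|\mathbb{E}x\|^2 ,
\]
and \(\mathbb{E}[x - \mathbb{E}x] = 0\).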
Since the clients work in parallel and independently, and by Assumption 3, we have
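This step relies on a standard fact: for independent zero-mean random vectors \(x_k\) (here, the per-client stochastic gradient noise), the second moment of the sum splits across clients,
\[
\mathbb{E}\left\| \sum_{k} x_k \right\|^2 = \sum_{k} \mathbb{E}\left\| x_k \right\|^2 ,
\]
so the aggregate variance is controlled by the per-client variance bound of Assumption 3.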
We further note that
Firstly, for bound A1, we can obtain

Secondly, according to Eq. (4), \(\omega^{t+1} = \frac{N}{m} \sum_{k \in s_t} p_k \omega_{k}^{t+1}\); therefore, we have \(\nabla F(\omega^{t+1}) = \frac{N}{m} \sum_{k \in s_{t+1}} p_k \nabla F_k(\omega^{t+1})\) [24]. For bound A2, we can obtain
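The gradient identity is just linearity of the weighted global objective; assuming, as in [24], that \(F\) is the weighted sum of the local objectives, we have
\[
F(\omega) = \sum_{k=1}^{N} p_k F_k(\omega) \quad \Longrightarrow \quad \nabla F(\omega) = \sum_{k=1}^{N} p_k \nabla F_k(\omega) ,
\]
and the sampled form \(\frac{N}{m} \sum_{k \in s_{t+1}} p_k \nabla F_k(\omega)\) matches this in expectation when each client is sampled uniformly with probability \(\frac{m}{N}\).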
According to the Cauchy-Bunyakovsky-Schwarz inequality, we have
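The form of the inequality typically applied at this step bounds the squared norm of a sum of \(m\) vectors:
\[
\left\| \sum_{i=1}^{m} x_i \right\|^2 \le m \sum_{i=1}^{m} \left\| x_i \right\|^2 ,
\]
obtained by applying Cauchy-Bunyakovsky-Schwarz to the inner product of \((\|x_1\|, \dots, \|x_m\|)\) with the all-ones vector.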
By using Assumption 1, we can obtain
By using Lemma 1, we can derive the bound of A2 as
Upon substituting Eq. (31) into Eq. (27), we arrive at the upper bound for A1 as follows:
By combining the results of Eq. (25), Eq. (26) and Eq. (32), we can obtain
Since \(\eta_t = \frac{1}{\rho} \sqrt{\frac{1}{T}}\) and \(\sqrt{\frac{1}{T}} \le 1\) for \(T \ge 1\), we have \(0 \le \eta_t \le \frac{1}{\rho}\), and we can obtain

Dividing both sides by \(\frac{\eta_t}{2}\), we have

According to Eq. (13), we have \(\sum_{k\in S_t} p_k^2 = \sum_{k\in S_t} {\left( \frac{M_k^t}{mt} \right)}^2\). Since \(\eta_t = \frac{1}{\rho} \sqrt{\frac{1}{T}}\), summing Eq. (35) from \(t=0\) to \(T-1\) gives

where \(\omega ^*\) is the optimal solution.
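For intuition, the typical shape of the resulting bound, obtained by telescoping the per-round descent inequality over \(t = 0, \dots, T-1\) with \(\eta_t = \frac{1}{\rho}\sqrt{\frac{1}{T}}\) (a hedged sketch of the standard form, not the paper's exact constants), is
\[
\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 \le \frac{2\rho \left( F(\omega^0) - F(\omega^*) \right)}{\sqrt{T}} + \frac{C}{\sqrt{T}} ,
\]
where \(C\) collects the variance and heterogeneity terms, including the participation weights \(\sum_{k \in S_t} p_k^2\); the telescoping sum \(\sum_{t} \left( F(\omega^t) - F(\omega^{t+1}) \right) = F(\omega^0) - F(\omega^T) \le F(\omega^0) - F(\omega^*)\) supplies the first term.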
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tan, L., Hu, M., Zhou, Y., Wu, D. (2023). Analyzing the Convergence of Federated Learning with Biased Client Participation. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science(), vol 14177. Springer, Cham. https://doi.org/10.1007/978-3-031-46664-9_29
DOI: https://doi.org/10.1007/978-3-031-46664-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46663-2
Online ISBN: 978-3-031-46664-9