Funding Year 2025 / Stipendium Call #20 / Project ID: 7832 / Project: Advancing Privacy in Federated Learning
Federated Learning (FL) is a promising approach to mitigating privacy risks in machine learning. In domains with strict regulatory constraints, such as medical research, it may be the only viable way to analyze data at all. By enabling decentralized collaboration, FL has already supported initiatives that would otherwise be infeasible due to data access restrictions (e.g., the FeatureCloud, dAIbetes, and Probe projects).
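To make the decentralized workflow concrete, the following is a minimal sketch of one federated averaging (FedAvg) round on a toy least-squares model. All names are illustrative, and the single gradient step stands in for real local training; production systems add client sampling, data-size-weighted averaging, and multiple local epochs.

```python
# Minimal FedAvg round: each client trains locally and shares only a
# model update; raw data never leaves the client.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step on a local least-squares objective
    (a toy stand-in for real local training)."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(global_weights, client_data):
    """The server averages the locally computed updates (unweighted here)."""
    updates = [local_update(global_weights, X, y) for X, y in client_data]
    return np.mean(updates, axis=0)  # server sees updates, not data

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
print("global weights after 10 rounds:", w)
```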
However, translating FL into real-world, scalable applications remains challenging. As highlighted by Daly et al. [1], key limitations include the lack of verifiability of server-side computations and the difficulty of validating client-side behavior. This undermines the ability of users and auditors to assess whether privacy guarantees are actually upheld, and means that substantial trust must still be placed in service providers.
Bonawitz et al. [2] outline several core privacy principles for Federated Learning systems. First, systems should ensure transparency, auditability, and user control over data usage—clarifying what data is used, for which purposes, and how it is processed. Second, they emphasize data minimization, reducing exposure at every stage by sharing only necessary information, such as aggregated updates instead of raw data, and by employing secure aggregation and cryptographic techniques. Third, access to sensitive intermediate results should be strictly limited to prevent unintended leakage. Finally, any released outputs should provide formal anonymization guarantees, ensuring that model parameters or aggregate statistics do not reveal information about individual participants.
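The secure aggregation mentioned in the second principle can be illustrated with pairwise masking, the idea behind the protocol of Bonawitz et al.: every pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum and only the aggregate is revealed. The sketch below simulates the shared masks with a single RNG and omits the key agreement and dropout recovery that the real protocol requires.

```python
# Toy pairwise-masking secure aggregation: masks cancel in the sum, so
# the server learns the aggregate but no individual update.
import numpy as np

def mask_updates(updates, seed=0):
    # A single RNG stands in for pairwise key agreement: in the real
    # protocol, each pair derives its mask from a shared secret.
    rng = np.random.default_rng(seed)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask  # client i adds the pairwise mask
            masked[j] -= mask  # client j subtracts the same mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
blinded = mask_updates(updates)
print("a single masked update reveals nothing:", blinded[0])
print("but the sum is exact:", np.sum(blinded, axis=0))  # [9. 12.]
```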
Crucially, FL does not inherently guarantee privacy. Attacks such as membership inference, gradient leakage, and property inference remain possible. While techniques like differential privacy and secure aggregation mitigate these risks, they introduce trade-offs in model utility, computational cost, and system complexity. In practice, privacy becomes a tunable parameter rather than a strict guarantee.
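As a concrete example of such a trade-off, the sketch below shows the clip-and-noise step used in central differential privacy for FL (as in DP-FedAvg): clipping bounds each client's influence, and the noise multiplier is exactly the tunable privacy parameter described above. The values are illustrative; a real deployment would also track the cumulative (ε, δ) budget with a privacy accountant.

```python
# Toy differentially private aggregation: clip each client update to
# bound its sensitivity, then add Gaussian noise to the sum.
import numpy as np

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    rng = np.random.default_rng(seed)
    clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
               for u in updates]  # bound each client's influence
    total = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    # Noisy mean: more noise means more privacy and less utility.
    return (total + noise) / len(updates)

updates = [np.array([0.5, 1.5]), np.array([2.0, -1.0]), np.array([0.1, 0.2])]
print(dp_aggregate(updates, noise_multiplier=0.5))
```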
A key gap is the lack of context-aware risk assessment. Current evaluations focus on technical metrics (e.g., attack accuracy), but the real-world impact of privacy leakage depends on the domain, data sensitivity, and adversarial incentives.
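To see what such a metric captures, and what it leaves out, here is a toy evaluation of a simple loss-threshold membership-inference attack: guess "member" whenever the model's loss on an example is low. The loss distributions here are simulated, and the resulting attack accuracy says nothing by itself about how damaging a membership disclosure would actually be in, say, a medical study.

```python
# Toy loss-threshold membership inference: the kind of "attack accuracy"
# that current privacy evaluations report.
import numpy as np

rng = np.random.default_rng(0)
# Simulated per-example losses: members (training data) tend to have
# lower loss than non-members because the model fits them better.
member_losses = rng.gamma(shape=2.0, scale=0.2, size=1000)
nonmember_losses = rng.gamma(shape=2.0, scale=0.5, size=1000)

threshold = np.median(np.concatenate([member_losses, nonmember_losses]))
hits = (member_losses < threshold).sum()             # members correctly flagged
rejections = (nonmember_losses >= threshold).sum()   # non-members correctly passed
accuracy = (hits + rejections) / 2000
print(f"attack accuracy: {accuracy:.2%} (50% would mean no leakage)")
```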
To address this, my next step will be to conduct interviews with stakeholders and industry experts to better understand which privacy risks matter in practice, which trade-offs are acceptable, and how to align technical evaluations with real-world needs.
Federated Learning’s future depends not only on better algorithms, but also on grounding its assumptions in reality.
[1] K. Daly, H. Eichner, P. Kairouz, H. B. McMahan, D. Ramage and Z. Xu, "Federated Learning in Practice: Reflections and Projections," 2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA), Washington, DC, USA, 2024, pp. 148-156, doi: 10.1109/TPS-ISA62245.2024.00026.
[2] K. Bonawitz, P. Kairouz, B. McMahan and D. Ramage, "Federated learning and privacy: Building privacy-preserving systems for machine learning and data science on decentralized data," ACM Queue, vol. 19, no. 5, pp. 87-114, 2021.