FAItH: Federated Analytics and Integrated Differential Privacy with Clustering for Healthcare Monitoring

In this experiment, we first evaluated the performance of various differential privacy (DP) mechanisms. Next, we compared the results of FAItH with state-of-the-art methods, specifically clustering techniques used in federated analytics34 and federated learning frameworks35,36, which share similarities with our approach but differ in their clustering strategy and in how they integrate within the federated learning and machine learning pipeline. The dataset used for this purpose was the Human Activity Recognition (HAR) dataset, which is typical of sensor data on mobile devices37. The HAR dataset captures three-dimensional measurements from both an accelerometer and a gyroscope; these sensor readings are used to anticipate user behavior with high accuracy. It contains around 10,299 instances, each described by 561 features.

Experiment parameters

The federated analytics environment was simulated with 510 clients in most cases, and with 980 clients for a specific test to evaluate the scalability of FAItH. Two different privacy budgets, represented by epsilon (\(\epsilon\)) values of 10 and 50, were tested. These values determine the level of privacy control, with smaller \(\epsilon\) values providing stronger privacy guarantees but potentially reducing accuracy. Additionally, the performance of various differential privacy mechanisms was evaluated under highly distributed data, and the effectiveness of FAItH was assessed and compared with state-of-the-art methods across different degrees of distribution to test its novelty and effectiveness. Therefore, to mimic the highly imbalanced or non-identical (non-i.i.d.) data distribution across participants, we employed the Dirichlet distribution with parameters \(\alpha = \{0.5, 3, 5\}\). As \(\alpha\) decreases, the label distribution becomes more heterogeneous, indicating a shift towards a more pronounced non-i.i.d. data distribution.
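As a concrete illustration of this setup, the sketch below partitions sample indices across clients with one Dirichlet draw per class; `dirichlet_partition` is an illustrative helper rather than the authors' code, and the random labels stand in for the real HAR activity labels.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Spread each class's samples over clients via a Dirichlet(alpha) draw;
    smaller alpha gives a more skewed (more non-i.i.d.) split."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        # Proportion of this class assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Placeholder labels standing in for the six HAR activity classes.
labels = np.random.default_rng(0).integers(0, 6, size=10_299)
parts = dirichlet_partition(labels, n_clients=510, alpha=0.5)
print(len(parts), sum(len(p) for p in parts))  # 510 clients, 10,299 samples total
```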

To evaluate the first stage, which includes federated analytics and differential privacy (DP), we used three key metrics: membership inference attack success rates, attack accuracy, and overall accuracy. In the second stage, to assess the clustering performance, we employed the silhouette score.

The first metric, Membership Inference Attack Success Rates, was used to evaluate the vulnerability of DP mechanisms to adversarial attacks. We explored three types of membership inference attacks: simple, complex, and advanced. The simple membership inference attack is a basic approach that attempts to determine whether a specific data point was part of the training set. To measure its success, we calculated the true statistic (mean, sum, variance, or quantile) on the original client data and compared it to the DP result. If the difference between the two was below a threshold (\(2/\epsilon\)), the attack was considered successful. The attack success rate was determined by the percentage of successfully attacked clients.
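A minimal sketch of this success criterion, assuming a Laplace mechanism on the mean with scale sensitivity/\(\epsilon\) (the calibration and helper names are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mean(data, epsilon, sensitivity=1.0):
    """True mean plus Laplace noise with scale sensitivity/epsilon (assumed calibration)."""
    true_mean = float(np.mean(data))
    return true_mean, true_mean + rng.laplace(scale=sensitivity / epsilon)

def simple_attack_success_rate(clients, epsilon):
    """Fraction of clients where |DP result - true statistic| < 2/epsilon."""
    hits = sum(abs(dp - true) < 2.0 / epsilon
               for true, dp in (laplace_mean(c, epsilon) for c in clients))
    return hits / len(clients)

clients = [rng.normal(size=200) for _ in range(510)]
print(simple_attack_success_rate(clients, epsilon=10))
```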

In contrast, the Complex Membership Inference Attack employed machine learning techniques, particularly logistic regression, to predict membership. This method involved preparing a dataset by computing DP results for each client and obtaining their true statistics. These values were then used to train a logistic regression model, which predicted membership based on the deviation from the true statistic. The model’s success rate reflected the privacy mechanism’s susceptibility to advanced attacks. However, the effectiveness of this attack can be limited by the diversity of the training dataset.
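The following hedged sketch illustrates such a regression-based attack on synthetic deviation features; how members' and non-members' deviations are constructed here is an assumption made for demonstration, not the paper's dataset-preparation procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
epsilon = 10

# Members' deviations stem from the actual Laplace noise; non-members are
# checked against unrelated data, so their deviations are typically larger.
member_dev = np.abs(rng.laplace(scale=1.0 / epsilon, size=500))
nonmember_dev = np.abs(rng.normal(scale=3.0 / epsilon, size=500))
X = np.concatenate([member_dev, nonmember_dev]).reshape(-1, 1)
y = np.concatenate([np.ones(500), np.zeros(500)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
attack = LogisticRegression().fit(X_tr, y_tr)
print("regression attack accuracy:", attack.score(X_te, y_te))
```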

Furthermore, to capture complex patterns and non-linear relationships in the data, we incorporated an Advanced Membership Inference Attack (Neural Network-based). This attack model utilizes a deep learning approach to predict membership more accurately. The neural network model takes the differentially private outputs, compares them with the true statistics, and learns to predict membership based on these comparisons. The network is trained on the DP results, and the accuracy of its predictions serves as an advanced measure of the privacy protection offered by the DP mechanism. The success rate of this advanced attack was evaluated based on the ability of the neural network to distinguish between members and non-members of the dataset, providing a deeper insight into the privacy mechanism’s robustness against more sophisticated attacks.
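A neural-network variant can reuse the same kind of deviation features, swapping logistic regression for a small MLP; the architecture below is an illustrative choice, not the network reported in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
epsilon = 10

# Same synthetic deviation features as in the regression sketch above.
member_dev = np.abs(rng.laplace(scale=1.0 / epsilon, size=500))
nonmember_dev = np.abs(rng.normal(scale=3.0 / epsilon, size=500))
X = np.concatenate([member_dev, nonmember_dev]).reshape(-1, 1)
y = np.concatenate([np.ones(500), np.zeros(500)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nn_attack = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                          random_state=0).fit(X_tr, y_tr)
print("advanced attack accuracy:", nn_attack.score(X_te, y_te))
```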

Accuracy was used to measure the closeness of the statistical functions (mean, variance, and quantile) to the true values when applying differential privacy. A high accuracy score indicates that, despite the noise introduced by DP, the function still approximates the actual value closely. This metric balances the trade-off between maintaining privacy and preserving the utility of the data. In this experiment, an attack is considered to fail when it cannot correctly predict whether a specific data point was part of the original dataset. For simple attacks, failure occurs when the difference between the true value and the differentially private (DP) result is too large: if the difference exceeds \(2/\epsilon\), the attack fails, meaning the added noise makes it harder for the attacker to infer the original value.
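The paper does not spell out the exact accuracy formula, but the negative scores quoted in the following subsections are consistent with a relative-error formulation such as this sketch (the function name and the small denominator guard are ours):

```python
def dp_accuracy(true_value, dp_value):
    """1 minus the relative error of the DP output; goes negative when the
    noise dominates, matching the negative scores reported in this section."""
    return 1.0 - abs(dp_value - true_value) / (abs(true_value) + 1e-12)

print(dp_accuracy(true_value=0.52, dp_value=0.49))  # close estimate -> near 1
print(dp_accuracy(true_value=0.52, dp_value=1.90))  # noisy estimate -> negative
```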

For complex attacks, failure is measured by how well a machine learning model (such as logistic regression) can predict if a data point was in the dataset. If the model can’t predict well, the attack is less successful, meaning the privacy protection is stronger. Finally, the advanced attack is evaluated based on the neural network’s ability to learn the membership prediction task. A failed advanced attack indicates that the deep learning model could not identify the original data point, signaling that the privacy-preserving mechanism is effective.

In this work, several clustering evaluation metrics are employed to assess the performance of the clustering algorithms. The silhouette score is used to measure the quality of the clustering by comparing how similar each data point is to its own cluster versus neighboring clusters, with higher scores indicating better-defined clusters. Along with the silhouette score, the confidence interval for silhouette score is provided to quantify the uncertainty of the clustering results, offering a range within which the true Silhouette Score is likely to lie. The Davies–Bouldin Index (DBI) is another metric that evaluates the compactness and separation of clusters, where a lower DBI indicates more distinct and compact clusters. Finally, the computation time is measured to assess the efficiency of the clustering process, including both model training and evaluation, providing insight into the algorithm’s performance in terms of time efficiency.
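A hedged sketch of this evaluation loop follows, combining the silhouette score, a bootstrap confidence interval (the bootstrap procedure is our assumption; the paper does not state how its CI is computed), the Davies–Bouldin Index, and wall-clock timing.

```python
import time
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))                 # placeholder client statistics

start = time.perf_counter()
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
elapsed = time.perf_counter() - start

sil = silhouette_score(X, labels)
dbi = davies_bouldin_score(X, labels)

# Bootstrap CI: resample points with their assigned labels and recompute.
scores = []
for _ in range(200):
    idx = rng.choice(len(X), size=len(X), replace=True)
    if len(np.unique(labels[idx])) > 1:           # silhouette needs >= 2 clusters
        scores.append(silhouette_score(X[idx], labels[idx]))
lo, hi = np.percentile(scores, [2.5, 97.5])

print(f"silhouette={sil:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}], "
      f"DBI={dbi:.3f}, time={elapsed:.2f}s")
```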

Evaluation of the performance of various differential privacy (DP) mechanisms

First, we discuss the results for centralized and local DP with different noise mechanisms. Then, we discuss clustering and compare with non-DP results.

Analysis of results for centralized and local DP with different noise mechanisms

In Fig. 2, we illustrate the comparison of accuracy and attack levels across noise types. The figure displays the accuracy of different noise mechanisms alongside the results of the regression attack. In Fig. 3, the figures exclusively present the attack accuracy for the deep learning model. This presentation highlights the distinct performance of the advanced deep learning model in evaluating privacy, offering a more sophisticated measure of attack success compared to the traditional regression model.

Under Central DP with Laplace noise, accuracy improves as the privacy budget increases. At epsilon 10, Mean and Quantile have lower accuracy values (− 1.5886 and − 0.0769, respectively), indicating notable information distortion due to noise addition. At epsilon 50, however, accuracy is positive for all functions, with Variance achieving the highest accuracy (0.9584), suggesting that Laplace noise is effective in preserving accuracy with a higher privacy budget. This result demonstrates the inherent trade-off between privacy and utility, where higher privacy budgets reduce noise distortion but increase the potential for attack vulnerability. Nonetheless, attack levels also increase with a higher epsilon, with Quantile showing the highest susceptibility to attack (0.92) at epsilon 50 in the regression attack. In contrast, for the advanced attack using the deep learning model, the attack success rate at epsilon 10 reaches 0.69, which is higher than the 0.65 achieved by the regression model. This suggests that the deep learning model is better equipped to capture complex relationships in the data, particularly at lower privacy budgets, compared to the simpler regression models, which struggle more under stricter privacy budgets.
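For reference, the sketch below shows one common way to calibrate the three noise mechanisms compared here; the scale choices (sensitivity/\(\epsilon\) style, Gaussian with \(\delta = 10^{-5}\)) are conventional assumptions rather than the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(value, mechanism, epsilon, sensitivity=1.0):
    if mechanism == "laplace":
        return value + rng.laplace(scale=sensitivity / epsilon)
    if mechanism == "exponential":
        # Zero-centred exponential noise (random sign); the scale is an assumption.
        return value + rng.choice([-1, 1]) * rng.exponential(sensitivity / epsilon)
    if mechanism == "gaussian":
        # Classic (epsilon, delta)-DP Gaussian calibration with delta = 1e-5.
        sigma = sensitivity * np.sqrt(2 * np.log(1.25 / 1e-5)) / epsilon
        return value + rng.normal(scale=sigma)
    raise ValueError(mechanism)

for eps in (10, 50):
    print(eps, {m: round(add_noise(0.5, m, eps), 4)
                for m in ("laplace", "exponential", "gaussian")})
```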

For Exponential noise under Central DP, accuracy also improves with a higher epsilon value. Mean and Quantile have relatively lower accuracies at epsilon 10 (− 0.6505 and − 0.1521, respectively), while at epsilon 50, accuracy is positive for all functions, with Variance achieving the highest (0.9571). The improved performance with Exponential noise demonstrates its potential for preserving utility in statistical functions, particularly when privacy constraints are relaxed. Nevertheless, Exponential noise shows a notable increase in vulnerability from epsilon 10 to epsilon 50, suggesting a similar trade-off between accuracy and privacy. For instance, when the attacker uses a deep learning model, the attack accuracy for the Quantile function at epsilon 10 reaches 0.68, which is higher than the regression model’s accuracy of around 0.6. This highlights the importance of using advanced attack models in privacy evaluations, as they are better equipped to exploit complex patterns in the data.

With Gaussian noise under Central DP, Mean and Quantile exhibit negative accuracies at epsilon 10, which are less severe than those under Laplace but still indicate substantial noise impact. At epsilon 50, accuracy improves across all functions, with Variance reaching the highest (0.9603), demonstrating that Gaussian noise can better maintain accuracy at higher privacy budgets. However, as with other noise types, the increased epsilon correlates with higher attack probabilities. This finding further underscores the critical balance between privacy and utility across all mechanisms. For the different attack evaluations, both regression and deep learning attacks show accuracy between roughly 0.6 and 0.8 across epsilon values. However, the results in Figs. 2 and 3 demonstrate that the deep learning attack achieves significantly higher accuracy than regression, underscoring the importance of including deep learning in privacy evaluations.

In summary, accuracy consistently improves across all noise types as epsilon increases, with Variance demonstrating the most resilience and highest accuracy overall. However, attack probability also increases with a higher epsilon. These results underscore the critical need to balance privacy and utility by selecting appropriate privacy budgets and noise mechanisms. For instance, Variance achieves a favorable trade-off between accuracy and attack resistance, while Quantile remains more vulnerable under relaxed privacy budgets, emphasizing the importance of tailoring configurations to specific application needs.

Fig. 2: Comparison of accuracy and attack levels across noise types.

Fig. 3: Membership attack levels across noise types.

Analysis of results for local DP with different noise mechanisms

In Fig. 4, under Local DP with Laplace noise, accuracy improves as the privacy budget increases. At epsilon 10, Mean and Quantile have lower accuracy values (− 2.6517 and − 0.0601, respectively), indicating notable information distortion due to noise addition. At epsilon 50, however, accuracy is positive for all functions, with Variance achieving the highest accuracy (0.9591), suggesting that Laplace noise is effective in preserving accuracy with a higher privacy budget. This improvement comes at the expense of increased vulnerability: attack levels rise as the privacy budget is relaxed, with Quantile showing the highest susceptibility to attack (0.92). This outcome highlights a trade-off between accuracy and privacy, as greater accuracy at epsilon 50 correlates with an increase in attack vulnerability.
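The difference between the two trust models can be made concrete with a short sketch: under central DP, raw statistics reach a trusted server that perturbs the result once, whereas under local DP each client perturbs its own statistic before it leaves the device. The Laplace calibration follows the earlier sketch and is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon, sensitivity = 10, 1.0
client_data = [rng.standard_normal(200) for _ in range(510)]

# Central DP: raw statistics reach a trusted server, which perturbs the result.
true_aggregate = float(np.mean([d.mean() for d in client_data]))
central_release = true_aggregate + rng.laplace(scale=sensitivity / epsilon)

# Local DP: each client perturbs its own statistic before sending it anywhere,
# so the server only ever sees noisy values.
local_reports = [d.mean() + rng.laplace(scale=sensitivity / epsilon)
                 for d in client_data]
local_release = float(np.mean(local_reports))

print(f"central: {central_release:.4f}, local: {local_release:.4f}")
```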

For Exponential noise under Local DP, accuracy also improves with a higher epsilon value. Mean and Quantile have relatively lower accuracies at epsilon 10 (− 2.2975 and − 0.1556, respectively), while at epsilon 50, accuracy is positive for all functions, with Variance achieving the highest (0.9601). The results for Exponential noise highlight its ability to maintain strong accuracy retention while balancing privacy, although attack levels also rise with epsilon, reaching 0.92 for Quantile. Exponential noise thus shows a notable increase in vulnerability from epsilon 10 to epsilon 50, suggesting a similar trade-off between accuracy and privacy.

With Gaussian noise under Local DP, Mean and Quantile exhibit negative accuracies at epsilon 10, which are less severe than those under Laplace but still indicate substantial noise impact. At epsilon 50, accuracy improves across all functions, with Variance reaching the highest (0.9593), demonstrating that Gaussian noise can better maintain accuracy at higher privacy budgets. As with other mechanisms, the increased epsilon correlates with higher attack probabilities, particularly for Quantile, which remains most vulnerable under relaxed privacy settings.

In summary, accuracy consistently improves across all noise types as epsilon increases, with Variance demonstrating the most resilience and highest accuracy overall. However, attack probability also increases with a higher epsilon. These findings emphasize the importance of tailoring noise mechanisms and privacy budgets to specific application requirements, balancing the need for privacy with the utility of the results. For example, Variance offers a favorable balance of accuracy and attack resistance, while Quantile requires careful consideration due to its higher susceptibility to attacks. In comparison to non-DP configurations, the trade-offs in our approach highlight the practicality of achieving sufficient clustering utility while preserving strong privacy guarantees. These findings are particularly relevant for healthcare applications, where balancing data confidentiality with actionable insights is essential.

Fig. 4: Comparison of accuracy and attack levels across noise types for local differential privacy methods.

Clustering analysis

Mean function analysis

Interestingly, for Agglomerative clustering, Centralized DP with Gaussian noise at \(\epsilon = 10\) yields a score of 0.3598, and Local DP with Exponential noise at \(\epsilon = 10\) yields a score of 0.3556, both of which are higher than the non-DP baseline score of 0.3089.

The results in Table 2 show that clustering quality can sometimes even improve with noise added through differential privacy, indicating robust clustering potential while maintaining privacy. These findings demonstrate that differential privacy mechanisms can achieve competitive clustering utility with mean-based functions, even in privacy-preserving configurations, making them suitable for sensitive data applications such as healthcare.

Table 2 Silhouette scores for mean function by DP type and epsilon. Significant values are in bold.
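To make the second stage concrete, the following hedged sketch clusters Laplace-noised client statistics with both KMeans and Agglomerative clustering and compares silhouette scores against a non-DP run; the synthetic three-cluster data and noise scale are illustrative assumptions, not the HAR setup.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
epsilon, sensitivity = 10, 1.0

# 510 clients' 2-D statistics drawn from three synthetic groups, then noised.
true_stats = np.concatenate([rng.normal(c, 0.3, size=(170, 2)) for c in (0, 2, 4)])
dp_stats = true_stats + rng.laplace(scale=sensitivity / epsilon, size=true_stats.shape)

for name, X in (("non-DP", true_stats), ("DP", dp_stats)):
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    ag = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    print(name, round(silhouette_score(X, km), 4), round(silhouette_score(X, ag), 4))
```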

Variance function analysis

Interestingly, for Agglomerative clustering, Local DP with Gaussian noise at \(\epsilon = 50\) yields a score of 0.3930, which is higher than the non-DP baseline score of 0.3089.

The results in Table 3 show that adding differential privacy noise doesn’t always reduce clustering quality; in some cases, it can even enhance it. This indicates a favorable balance between privacy and utility, as variance-based functions exhibit minimal utility loss even under privacy constraints. The findings indicate only a minor utility trade-off when applying DP to variance-based functions, which actually perform better than both the non-DP baseline and the best mean-based result (0.3598). This highlights how specific noise mechanisms and privacy budgets can be carefully selected to optimize clustering performance while maintaining strong privacy guarantees. The inherent data structure captured by Agglomerative clustering further contributes to this balance, demonstrating its effectiveness in privacy-preserving scenarios.

Table 3 Silhouette scores for variance function by DP type and epsilon. Significant values are in bold.

Quantile function analysis

The results show that for the Quantile function, Local DP with Exponential noise at \(\epsilon = 50\) performs very closely to the non-DP baseline, achieving silhouette scores of 0.4378 for both KMeans and Agglomerative clustering, compared to the non-DP score of 0.4387. Although this mechanism does not outperform the non-DP baseline, the similarity in scores suggests minimal loss in clustering utility. Moreover, the Quantile function yields the highest silhouette scores of the three functions, providing the most robust support for privacy-preserving clustering in healthcare applications.

Table 4 Silhouette scores for quantile function by DP type and epsilon. Significant values are in bold.

The results in Table 4 indicate that the Quantile function with Local DP and Exponential noise at \(\epsilon = 50\) achieves the highest clustering quality while preserving privacy, nearly matching the non-DP baseline. This configuration offers a strong balance for healthcare monitoring in applications like chronic disease tracking, where both privacy and accurate clustering are essential. When greater privacy is required, the variance function with Local DP and Gaussian noise at \(\epsilon = 50\) provides a reliable alternative, demonstrating robust clustering performance with minimal utility loss under stricter privacy constraints. Agglomerative clustering plays a key role by effectively capturing the underlying structure of the data, resulting in superior clustering quality.

Overall, these findings show that our privacy-preserving clustering approach is both effective and highly suitable for healthcare applications such as patient activity monitoring and chronic disease management. By balancing privacy and utility through careful selection of configurations, our approach ensures actionable insights without compromising patient confidentiality. By focusing on federated analytics rather than accessing actual patient data, our approach builds trust and encourages adoption in privacy-sensitive environments.

Comparison results

In this experiment, we compared FAItH against two other approaches: federated analytics (without differential privacy) and federated learning (FL). We used the Variance function with the Laplace privacy mechanism as the candidate function for comparison, as it demonstrates a good trade-off between accuracy and privacy. Several metrics were used to evaluate clustering performance, including the silhouette score, Davies–Bouldin Score, and computation time. FAItH is not expected to outperform federated analytics; rather, FA serves as an upper-bound reference, since it does not introduce noise into the results.

First, we discuss results when we set \(\alpha = 3\). When we looked at the Silhouette Score, which measures how well the clusters are formed, federated analytics (without privacy) showed the best results with a score of 0.59 for 510 clients, indicating the best cluster separation. FAItH, which adds differential privacy, naturally had lower Silhouette Scores, with a score of 0.53 for 510 clients, because of the noise introduced for privacy reasons. Still, FAItH performed better than federated learning, which had the lowest Silhouette Scores overall, with a score of 0.31 for 510 clients. Interestingly, as the number of clients increased, federated learning improved slightly, but it still lagged behind FAItH and federated analytics.

Next, we looked at the Confidence Intervals (CIs) for the Silhouette Scores to get a sense of how consistent the clustering results were for each approach. For federated analytics (FA), the CIs were more consistent with a higher upper bound, showing stable cluster formations. In contrast, FAItH’s CIs were a bit narrower, which is expected since adding differential privacy introduces some noise, but the results still stayed within a reasonable range. Federated learning (FL) showed much wider CIs, pointing to more uncertainty in its clustering. This is especially clear because the upper bounds for FL were generally higher, meaning its clustering was less predictable compared to FA and FAItH.

Next, we checked the Davies–Bouldin Score, a measure of cluster quality (lower values are better). As we expected, federated analytics had the best performance here, with the lowest Davies–Bouldin Scores (0.53 for 510 clients), indicating it had the most distinct clusters. FAItH was a bit higher due to the noise from differential privacy, but still performed better than federated learning, which had the highest Davies–Bouldin Scores (1.47 for 510 clients) and the least distinct clusters.

Lastly, we compared computation time. For all numbers of clients involved, both FA and FAItH took less than a minute; however, for federated learning, the time consumption was significantly higher, ranging from approximately 1.43 min (86.02 s for 10 clients) to 16.11 min (966.72 s for 510 clients) due to the training overhead inherent in the nature of FL. In typical simulations, we usually work with around 100 clients. However, to test how well FAItH scales, we extended the experiment up to 980 clients. Since FAItH doesn’t have the extra overhead from optimization, it doesn’t take much longer to process, which made it possible to test such a large number of clients. The results showed that the Silhouette Scores remained stable throughout the experiment. FA showed a small improvement, reaching 0.61, while FAItH reached 0.54. The Davies–Bouldin Score stayed the same for both methods. Since both FA and FAItH finished in under a minute, we decided not to include the federated learning (FL) approach in this part of the experiment, as its communication time was much higher. This demonstrates how scalable FAItH is, even with large numbers of clients.

Fig. 5: Comparison of clustering performance for FAItH, federated analytics, and federated learning across different metrics where \(\alpha = 3\).

Next, we discuss the results when we set \(\alpha = 5\). When we looked at the Silhouette Score, which measures how well the clusters are formed, federated analytics (without privacy) showed the best results with a score of 0.60 for 510 clients, indicating the best cluster separation. FAItH, which adds differential privacy, naturally had lower Silhouette Scores, with a score of 0.53 for 510 clients, because of the noise introduced for privacy reasons. Still, FAItH performed better than federated learning, which had the lowest Silhouette Scores overall, with a score of 0.31 for 510 clients. Interestingly, as the number of clients increased, federated learning improved slightly, but it still lagged behind FAItH and federated analytics.

Next, we looked at the Confidence Intervals (CIs) for the Silhouette Scores to get a sense of how consistent the clustering results were for each approach. For federated analytics (FA), the CIs were more consistent with a higher upper bound, showing stable cluster formations. In contrast, FAItH’s CIs were a bit narrower, which is expected since adding differential privacy introduces some noise, but the results still stayed within a reasonable range. Federated learning (FL) showed much wider CIs, pointing to more uncertainty in its clustering. This is especially clear because the upper bounds for FL were generally higher, meaning its clustering was less predictable compared to FA and FAItH.

Next, we checked the Davies–Bouldin Score, a measure of cluster quality (lower values are better). As we expected, federated analytics had the best performance here, with the lowest Davies–Bouldin Scores (0.53 for 510 clients), indicating it had the most distinct clusters. FAItH was a bit higher due to the noise from differential privacy, but still performed better than federated learning, which had the highest Davies–Bouldin Scores (1.49 for 510 clients) and the least distinct clusters.

Lastly, we compared computation time. For all numbers of clients involved, both FA and FAItH took less than a minute; however, for federated learning, the time consumption was significantly higher, ranging from approximately 1.65 min (98.85 s for 10 clients) to 16.86 min (1011.73 s for 510 clients) due to the training overhead inherent in the nature of FL. As in the previous experiment with \(\alpha = 3\), we increased the number of clients to 980, and the results remained steady. This consistency further demonstrates that FAItH is a suitable solution for large-scale environments.

Fig. 6: Comparison of clustering performance for FAItH, federated analytics, and federated learning across different metrics where \(\alpha = 5\).

In this section, we compared FAItH, federated analytics (FA), and federated learning (FL) using key clustering metrics such as the Silhouette Score, Davies–Bouldin Score, Dunn Index, and computation time. The results clearly show that while FAItH performs well, its clustering quality is slightly lower than federated analytics due to the noise introduced by differential privacy (DP). However, even with this reduction, FAItH outperformed federated learning in terms of clustering quality, as FL had the lowest scores overall.

One of the most significant findings is that FAItH remains stable and performs consistently even as the number of clients increases. When we pushed the experiment to 980 clients, the results still held steady, proving that FAItH is a highly scalable solution. Federated analytics showed a slight improvement in the Silhouette Score, but FAItH’s performance was still close behind, showing the effectiveness of DP without compromising too much on clustering quality. Furthermore, unlike federated learning, which has higher computation time due to training overhead, both FA and FAItH took less than a minute, confirming that FAItH is suitable for large-scale environments.

This makes FAItH a great choice for privacy-preserving federated analytics, especially in large-scale scenarios that require strong privacy protection. DP is becoming the best practice for such environments, as it provides necessary privacy guarantees without introducing significant computational overhead. This highlights FAItH’s potential as a scalable and efficient privacy-enhanced solution for federated analytics.
