The efficacy of Federated Learning (FL), a prominent privacy-preserving machine learning paradigm, is critically challenged by the statistical heterogeneity that arises when data across client silos are not independent and identically distributed (Non-IID). Although numerous algorithms have been proposed to address this issue, a comprehensive understanding of their relative performance under diverse conditions remains elusive. This paper presents a large-scale, systematic comparative study that evaluates nine notable FL algorithms (FedAvg, FedProx, SCAFFOLD, FedNova, MOON, FedBN, and the server-side adaptive methods FedAdagrad, FedYogi, and FedAdam) under a wide spectrum of Non-IID settings. We run experiments across eight benchmark datasets, simulating practical challenges such as label distribution skew, feature distribution skew, and data quantity skew. Our key finding is that no single algorithm is universally superior; the best-performing method depends strongly on the specific type of data heterogeneity. Adaptive optimizers such as FedAdagrad excel under severe label skew, whereas FedBN is the most frequent top performer under feature skew. Furthermore, we uncover the practical limitations of theoretically motivated methods such as SCAFFOLD in highly heterogeneous environments. These insights culminate in a practical decision tree that guides algorithm selection for researchers and practitioners.

| CATEGORY | DATASET | PARTITIONING | FedAvg | FedProx | SCAFFOLD | FedNova | FedAdagrad | FedYogi | FedAdam | MOON | FedBN |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Label Distribution Skew | MNIST | pk ~ Dir(0.5) | 98.85% ± 0.05% | 98.91% ± 0.03% | 61.99% ± 1.4% | 98.97% ± 0.03% | 98.81% ± 0.04% | 98.86% ± 0.06% | 98.46% ± 0.02% | 98.80% ± 0.08% | 98.91% ± 0.02% |
| Label Distribution Skew | MNIST | #C = 1 | 40.34% ± 26.51% | 40.24% ± 21.8% | 10.99% ± 0.5% | 54.66% ± 9.53% | 68.85% ± 9.5% | 34.20% ± 15.34% | 18.86% ± 12.77% | 9.92% ± 0.24% | 60.73% ± 7.55% |
| Label Distribution Skew | MNIST | #C = 2 | 97.53% ± 0.07% | 97.42% ± 0.12% | 21.81% ± 1.92% | 97.37% ± 0.31% | 95.92% ± 0.37% | 97.78% ± 0.22% | 61.91% ± 36.65% | 55.75% ± 32.89% | 97.50% ± 0.26% |
| Label Distribution Skew | MNIST | #C = 3 | 98.53% ± 0.16% | 98.65% ± 0.15% | 29.95% ± 0.15% | 98.43% ± 0.15% | 98.22% ± 0.07% | 98.69% ± 0.11% | 95.8% ± 1.82% | 95.06% ± 1.60% | 98.56% ± 0.11% |
| Label Distribution Skew | FMNIST | pk ~ Dir(0.5) | 86.88% ± 0.38% | 87.16% ± 0.18% | 48.80% ± 1.36% | 87.12% ± 0.32% | 86.39% ± 0.19% | 86.44% ± 0.15% | 84.46% ± 1.72% | 87.15% ± 0.38% | 87.01% ± 0.28% |
| Label Distribution Skew | FMNIST | #C = 1 | 15.88% ± 4.6% | 28.39% ± 16.68% | 10% ± 0.0% | 20.28% ± 14.53% | 37% ± 4.9% | 25.35% ± 10.93% | 10% ± 0.0% | 10% ± 0.0% | 21.13% ± 10.30% |
| Label Distribution Skew | FMNIST | #C = 2 | 79.3% ± 2.37% | 78.33% ± 0.7% | 22.85% ± 2.60% | 75.76% ± 3.94% | 78.11% ± 1.03% | 81.97% ± 0.78% | 63.14% ± 15.59% | 67.77% ± 2.61% | 79.09% ± 1.18% |
| Label Distribution Skew | FMNIST | #C = 3 | 83.67% ± 0.55% | 83.62% ± 0.85% | 25.56% ± 0.16% | 83.51% ± 0.67% | 83.52% ± 0.68% | 83.42% ± 0.13% | 79.62% ± 0.72% | 74.57% ± 3.46% | 83.56% ± 0.76% |
| Label Distribution Skew | SVHN | pk ~ Dir(0.5) | 86.44% ± 0.31% | 86.48% ± 0.1% | 51.74% ± 1.20% | 86.48% ± 0.13% | 86.7% ± 0.35% | 85.34% ± 0.21% | 86.7% ± 0.35% | 86.15% ± 0.46% | 86.45% ± 0.51% |
| Label Distribution Skew | SVHN | #C = 1 | 14.32% ± 2.3% | 13.27% ± 4.53% | 19.59% ± 0.0% | 9.15% ± 1.82% | 18.37% ± 1.72% | 18.37% ± 1.72% | 15.65% ± 5.58% | 12.49% ± 3.03% | 12.69% ± 2.30% |
| Label Distribution Skew | SVHN | #C = 2 | 77.49% ± 1.34% | 79.28% ± 0.17% | 24.55% ± 0.86% | 72.41% ± 2.15% | 76.37% ± 1.18% | 79.03% ± 0.26% | 57.99% ± 27.15% | 74.41% ± 1.76% | 76.27% ± 2.53% |
| Label Distribution Skew | SVHN | #C = 3 | 81.71% ± 1.08% | 82.11% ± 1.14% | 31.11% ± 0.51% | 81.59% ± 0.91% | 82.43% ± 0.14% | 81.27% ± 0.64% | 82.43% ± 0.14% | 81.26% ± 1.06% | 81.77% ± 0.49% |
| Label Distribution Skew | CINIC10 | pk ~ Dir(0.5) | 36.41% ± 0.18% | 36.24% ± 0.3% | 19.16% ± 0.32% | 36.59% ± 0.69% | 30.24% ± 1.90% | 35.65% ± 0.27% | 30.57% ± 1.46% | 36.93% ± 0.94% | 36.24% ± 0.23% |
| Label Distribution Skew | CINIC10 | #C = 1 | 9.64% ± 0.54% | 9.46% ± 0.77% | 11.17% ± 1.51% | 10% ± 0.0% | 10% ± 0.0% | 9.93% ± 0.33% | 10% ± 0.0% | 10% ± 0.0% | 10% ± 0.0% |
| Label Distribution Skew | CINIC10 | #C = 2 | 25.34% ± 1.33% | 28.15% ± 1.2% | 16.47% ± 1.53% | 26.08% ± 1.31% | 13.58% ± 5.06% | 26.58% ± 0.92% | 13.58% ± 5.06% | 25.41% ± 0.68% | 27.55% ± 0.79% |
| Label Distribution Skew | CINIC10 | #C = 3 | 32.91% ± 0.44% | 32.75% ± 0.42% | 15.46% ± 1.17% | 32.99% ± 0.41% | 19% ± 6.36% | 30.01% ± 0.91% | 18.66% ± 6.13% | 33.32% ± 0.86% | 33.08% ± 0.23% |
| Label Distribution Skew | CIFAR10 | pk ~ Dir(0.5) | 65.61% ± 1.52% | 64.16% ± 0.25% | 25.74% ± 1.65% | 64.66% ± 1.05% | 60.62% ± 0.84% | 59.41% ± 0.4% | 38.95% ± 5.79% | 64.69% ± 0.82% | 63.43% ± 0.102% |
| Label Distribution Skew | CIFAR10 | #C = 1 | 9.64% ± 0.54% | 9.46% ± 0.77% | 10.11% ± 0.34% | 11.66% ± 2.9% | 19.4% ± 0.11% | 22.79% ± 3.63% | 10% ± 0.0% | 10% ± 0.0% | 11.20% ± 1.43% |
| Label Distribution Skew | CIFAR10 | #C = 2 | 47.83% ± 3.29% | 49.82% ± 0.66% | 17.44% ± 1.79% | 46.27% ± 2.21% | 46.63% ± 0.97% | 47.38% ± 1.41% | 10% ± 0.0% | 42.23% ± 1.0% | 48.36% ± 1.50% |
| Label Distribution Skew | CIFAR10 | #C = 3 | 62.65% ± 1.49% | 63.57% ± 0.2% | 20.11% ± 2.02% | 61.7% ± 1.51% | 59.9% ± 1.06% | 59.02% ± 0.34% | 38.34% ± 3.52% | 61.21% ± 0.74% | 61.93% ± 0.89% |
| Label Distribution Skew | FedISIC2019 | pk ~ Dir(0.5) | 56.46% ± 0.30% | 56.23% ± 0.49% | 28.30% ± 14.09% | 33.61% ± 0.55% | 53.80% ± 0.60% | 54.14% ± 0.38% | 48.27% ± 0.07% | 55.90% ± 0.29% | 56.24% ± 0.55% |
| Label Distribution Skew | FedISIC2019 | #C = 1 | 18.34% ± 0.0% | 28.3% ± 14.09% | 48.22% ± 0.0% | 22.56% ± 19.46% | 48.55% ± 0.47% | 48.22% ± 0.0% | 48.22% ± 0.0% | 37.02% ± 15.93% | 27.02% ± 15.08% |
| Label Distribution Skew | FedISIC2019 | #C = 2 | 40.09% ± 5.30% | 38.52% ± 14.27% | 48.22% ± 0.0% | 28.95% ± 14.39% | 52.18% ± 0.87% | 51.84% ± 0.53% | 48.22% ± 0.0% | 48.43% ± 1.16% | 48.10% ± 1.58% |
| Label Distribution Skew | FedISIC2019 | #C = 3 | 47.93% ± 1.71% | 48.02% ± 2.41% | 48.22% ± 0.0% | 40.68% ± 4.82% | 52.25% ± 0.81% | 51.92% ± 1.13% | 38.26% ± 14.09% | 50.52% ± 2.15% | 49.18% ± 0.26% |
| Label Distribution Skew | Adult | pk ~ Dir(0.5) | 81.73% ± 2.62% | 83.48% ± 2.17% | 76.49% ± 0.0% | 69.07% ± 2.05% | 83.64% ± 1.71% | 83.31% ± 1.29% | 82.98% ± 0.92% | 82.39% ± 2.40% | 84.39% ± 0.49% |
| Label Distribution Skew | Adult | #C = 1 | 80.21% ± 3.35% | 81.06% ± 2.98% | 76.49% ± 0.0% | 54.08% ± 0.29% | 85.03% ± 0.23% | 84.03% ± 1.48% | 84.70% ± 0.57% | 77.88% ± 4.82% | 79.22% ± 5.57% |
| Label Distribution Skew | Number of times the algorithm performs best | | 3 | 4 | 2 | 1 | 8 | 4 | 2 | 2 | 1 |
| Feature distribution skew | MNIST | x̂ ~ Gau(0.1) | 99.23% ± 0.07% | 99.26% ± 0.06% | 98.19% ± 0.12% | 99.24% ± 0.03% | 99.28% ± 0.07% | 99.22% ± 0.06% | 99.05% ± 0.09% | 99.24% ± 0.02% | 99.28% ± 0.02% |
| Feature distribution skew | FMNIST | x̂ ~ Gau(0.1) | 84.35% ± 0.13% | 84.65% ± 0.30% | 72.68% ± 0.44% | 84.43% ± 0.03% | 83.99% ± 0.25% | 84.57% ± 0.26% | 82.64% ± 0.51% | 84.57% ± 0.12% | 84.72% ± 0.14% |
| Feature distribution skew | SVHN | x̂ ~ Gau(0.1) | 68.85% ± 0.67% | 69.39% ± 0.40% | 70.25% ± 0.41% | 69.75% ± 1.20% | 73.14% ± 1.03% | 72.8% ± 0.56% | 70.09% ± 0.92% | 70.09% ± 0.73% | 69.27% ± 1.75% |
| Feature distribution skew | CINIC10 | x̂ ~ Gau(0.1) | 35.66% ± 1.05% | 34.51% ± 1.01% | 31.97% ± 0.95% | 33.84% ± 0.70% | 23.10% ± 9.29% | 35.44% ± 0.73% | 23.44% ± 9.50% | 34.18% ± 0.5% | 33.82% ± 0.88% |
| Feature distribution skew | CIFAR10 | x̂ ~ Gau(0.1) | 64.02% ± 0.08% | 63.59% ± 0.09% | 47.04% ± 1.01% | 63.16% ± 1.13% | 63.48% ± 0.88% | 63.36% ± 0.42% | 51.72% ± 0.75% | 64.93% ± 0.34% | 63.8% ± 0.56% |
| Feature distribution skew | FedISIC2019 | x̂ ~ Gau(0.1) | 54.75% ± 0.56% | 55.11% ± 0.12% | 49.56% ± 0.68% | 54.78% ± 0.04% | 52.36% ± 0.52% | 52.7% ± 0.99% | 48.87% ± 0.88% | 54.96% ± 0.86% | 55.3% ± 0.46% |
| Feature distribution skew | FCUBE | synthetic | 99.57% ± 0.05% | 99.67% ± 0.12% | 97.07% ± 1.28% | 99.87% ± 0.12% | 99.53% ± 0.12% | 99.67% ± 0.12% | 99.67% ± 0.12% | 99.7% ± 0.14% | 99.67% ± 0.09% |
| Feature distribution skew | Number of times the algorithm performs best | | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 3 |
| Quantity skew | MNIST | q ~ Dir(0.5) | 99.02% ± 0.05% | 99.01% ± 0.07% | 97.3% ± 0.02% | 98.95% ± 0.05% | 99.08% ± 0.04% | 98.89% ± 0.05% | 98.61% ± 0.07% | 99.07% ± 0.06% | 99.08% ± 0.06% |
| Quantity skew | FMNIST | q ~ Dir(0.5) | 88.80% ± 0.05% | 88.46% ± 0.31% | 80.53% ± 0.8% | 88.99% ± 0.11% | 88.95% ± 0.35% | 88.75% ± 0.05% | 86.41% ± 0.16% | 89.09% ± 0.17% | 88.08% ± 0.0% |
| Quantity skew | SVHN | q ~ Dir(0.5) | 87.38% ± 1.27% | 87.99% ± 0.38% | 77.08% ± 0.97% | 87.52% ± 0.44% | 88.80% ± 0.06% | 87.44% ± 0.05% | 85.08% ± 0.28% | 87.85% ± 0.82% | 87.94% ± 0.13% |
| Quantity skew | CINIC10 | q ~ Dir(0.5) | 38.73% ± 0.55% | 38.56% ± 0.39% | 37.61% ± 0.40% | 38.55% ± 0.29% | 26.96% ± 12.0% | 39.23% ± 0.48% | 26.63% ± 11.76% | 38.22% ± 0.67% | 39.73% ± 0.45% |
| Quantity skew | CIFAR10 | q ~ Dir(0.5) | 71.13% ± 0.37% | 71.01% ± 0.39% | 46.14% ± 0.07% | 71.72% ± 0.58% | 69.40% ± 0.38% | 68.24% ± 0.68% | 25.90% ± 22.49% | 71.6% ± 0.44% | 71.23% ± 0.52% |
| Quantity skew | FedISIC2019 | q ~ Dir(0.5) | 57.73% ± 0.29% | 58.14% ± 0.55% | 49.96% ± 0.75% | 57.04% ± 0.53% | 56.18% ± 0.63% | 56.52% ± 0.18% | 48.66% ± 0.63% | 58.51% ± 0.66% | 57.93% ± 0.2% |
| Quantity skew | Adult | q ~ Dir(0.5) | 84.26% ± 0.04% | 84.48% ± 0.23% | 85.55% ± 0.05% | 83.98% ± 0.22% | 83.83% ± 0.48% | 83.5% ± 0.83% | 83.83% ± 0.48% | 83.6% ± 0.82% | 83.93% ± 0.41% |
| Quantity skew | Number of times the algorithm performs best | | 0 | 0 | 1 | 1 | 2 | 0 | 0 | 2 | 2 |
| Homogeneous partition | MNIST | IID | 99.11% ± 0.05% | 99.10% ± 0.01% | 98.27% ± 0.09% | 99.15% ± 0.04% | 99.11% ± 0.07% | 98.99% ± 0.11% | 98.78% ± 0.02% | 99.19% ± 0.03% | 99.09% ± 0.07% |
| Homogeneous partition | FMNIST | IID | 89.27% ± 0.23% | 89.21% ± 0.24% | 83.69% ± 0.45% | 89.29% ± 0.08% | 89.65% ± 0.03% | 89.01% ± 0.11% | 87.54% ± 0.57% | 89.35% ± 0.16% | 89.16% ± 0.36% |
| Homogeneous partition | SVHN | IID | 88.07% ± 0.36% | 88.30% ± 0.19% | 82.05% ± 0.24% | 87.30% ± 1.51% | 60.14% ± 28.69% | 87.32% ± 0.34% | 61.30% ± 29.50% | 87.63% ± 1.65% | 88.35% ± 0.21% |
| Homogeneous partition | CINIC10 | IID | 39.99% ± 0.88% | 39.82% ± 0.26% | 41.58% ± 0.48% | 40.18% ± 0.41% | 36.91% ± 1.39% | 40.83% ± 0.56% | 36.58% ± 0.93% | 40.85% ± 1.13% | 39.92% ± 0.92% |
| Homogeneous partition | CIFAR10 | IID | 72.59% ± 0.51% | 74.14% ± 0.01% | 51.88% ± 0.45% | 72.23% ± 0.31% | 69.53% ± 0.59% | 68.47% ± 0.37% | 65.79% ± 2.08% | 69.16% ± 0.47% | 72.36% ± 0.32% |
| Homogeneous partition | FedISIC2019 | IID | 59.15% ± 0.83% | 60.25% ± 0.12% | 52.59% ± 0.62% | 58.91% ± 0.30% | 51.82% ± 1.85% | 51.49% ± 2.32% | 51.49% ± 2.32% | 59.23% ± 0.54% | 58.9% ± 0.31% |
| Homogeneous partition | Adult | IID | 84.24% ± 0.39% | 84.10% ± 0.16% | 85.59% ± 0.12% | 83.97% ± 0.06% | 83.9% ± 0.1% | 84.23% ± 0.51% | 83.90% ± 0.10% | 84.42% ± 0.97% | 84.09% ± 0.51% |
| Homogeneous partition | Number of times the algorithm performs best | | 0 | 2 | 2 | 0 | 1 | 0 | 0 | 1 | 1 |
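
The partitioning labels in the table (pk ~ Dir(0.5) for label skew, q ~ Dir(0.5) for quantity skew) are not accompanied by code in this section. The sketch below shows one common way such Dirichlet-based splits can be simulated. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the reading of pk as "the share of class k given to each client" is an assumption, and the Dirichlet draw is built from normalised Gamma samples so that only the Python standard library is needed.

```python
import random
from itertools import accumulate


def sample_dirichlet(num_parts, beta, rng):
    """Draw one symmetric Dirichlet(beta) sample by normalising Gamma(beta, 1) draws."""
    draws = [rng.gammavariate(beta, 1.0) for _ in range(num_parts)]
    total = sum(draws)
    return [d / total for d in draws]


def split_by_proportions(items, proportions):
    """Cut a list into consecutive chunks whose sizes follow the given proportions."""
    n = len(items)
    cuts = [min(int(c * n), n) for c in accumulate(proportions)]
    cuts[-1] = n  # guard against floating-point rounding in the last cut
    starts = [0] + cuts[:-1]
    return [items[s:e] for s, e in zip(starts, cuts)]


def label_skew_partition(labels, num_clients, beta=0.5, seed=0):
    """Hypothetical helper: for each class k, spread its samples over clients
    with proportions p_k ~ Dir(beta). Smaller beta gives stronger label skew."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for k in sorted(set(labels)):
        idx_k = [i for i, y in enumerate(labels) if y == k]
        rng.shuffle(idx_k)
        p_k = sample_dirichlet(num_clients, beta, rng)
        for client, chunk in zip(clients, split_by_proportions(idx_k, p_k)):
            client.extend(chunk)
    return clients


def quantity_skew_partition(num_samples, num_clients, beta=0.5, seed=0):
    """Hypothetical helper: shuffle all samples, then give client j a share
    q_j of them with q ~ Dir(beta). Label proportions stay roughly uniform."""
    rng = random.Random(seed)
    idx = list(range(num_samples))
    rng.shuffle(idx)
    q = sample_dirichlet(num_clients, beta, rng)
    return split_by_proportions(idx, q)


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # assumed toy data, 10 balanced classes
    parts = label_skew_partition(toy_labels, num_clients=10, beta=0.5, seed=1)
    print([len(p) for p in parts])
```

Both helpers return one list of sample indices per client, and every index is assigned to exactly one client. With a small `beta`, a client may receive very few or even zero samples, so a practical setup might resample until each client meets a minimum size; that check is left out here to keep the sketch short.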