A Mutual-Information-Guided and ADASYN-Augmented Machine Learning Framework for Early Prediction of Parkinson’s Disease
Keywords:
XGBoost, Tree-Structured Parzen Estimator, Data Augmentation, ADASYN, Feature Selection, Mutual Information
Abstract
Early detection of Parkinson’s disease (PD) is essential for timely medical intervention and better patient outcomes. Speech signal analysis offers a non-invasive, cost-effective, and easily deployable diagnostic pathway, but reliable early prediction remains challenging because of class imbalance, redundant features, and model instability. This study develops an optimized and robust machine learning framework, based on eXtreme Gradient Boosting (XGBoost), to improve the predictive accuracy and stability of PD detection from speech data. The model’s hyperparameters were tuned with the Tree-structured Parzen Estimator (TPE), and Mutual Information (MI) was used to select the most informative features from the speech dataset. To address class imbalance, the Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN) was applied to generate synthetic minority-class samples. Model performance and stability were evaluated over ten independent runs of Stratified 10-Fold Cross-Validation (SCV). The proposed framework achieved an average accuracy of 97.27%, precision of 98.79%, recall of 95.77%, F1-score of 97.18%, and ROC-AUC of 98.11% across multiple evaluations. Compared with similar studies, it showed improved robustness, reliability, and balance between sensitivity and specificity. Integrating MI-based feature selection with ADASYN-based data augmentation significantly enhanced the performance and stability of the XGBoost model for early PD prediction. The proposed model shows strong potential for clinical use as a decision support system, providing a low-cost, non-invasive, and remotely deployable tool for early PD diagnosis from patient speech signals.
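To make the oversampling step concrete, the following is a minimal pure-Python sketch of the ADASYN idea: minority samples surrounded by more majority-class neighbours receive a larger share of the synthetic points. It is an illustration only, not the authors' implementation. The function name `adasyn`, the parameters `k`, `beta`, and `seed`, and the toy data are all assumptions, since the abstract does not report the ADASYN settings used. As a simplification, interpolation partners are drawn from each sample's nearest minority-class neighbours.

```python
import math
import random


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples, concentrated in hard regions.

    minority, majority: lists of equal-length feature tuples.
    k: number of nearest neighbours (hypothetical default).
    beta: desired balance level in [0, 1]; 1.0 aims for full balance.
    """
    rng = random.Random(seed)

    # Target number of synthetic samples.
    total = int((len(majority) - len(minority)) * beta)
    if total <= 0 or len(minority) < 2:
        return []

    # Label every point: 0 = majority, 1 = minority.
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]

    # For each minority sample, the share of majority points among its
    # k nearest neighbours measures how hard it is to learn.
    ratios = []
    for i, x in enumerate(minority):
        own_index = len(majority) + i
        others = [pl for j, pl in enumerate(labelled) if j != own_index]
        neighbours = sorted(others, key=lambda pl: math.dist(x, pl[0]))[:k]
        ratios.append(sum(1 for _, label in neighbours if label == 0) / k)

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        return []  # no minority sample has a majority neighbour

    synthetic = []
    for i, x in enumerate(minority):
        # Per-sample quota; rounding means the grand total is approximate.
        count = round(ratios[i] / ratio_sum * total)
        peers = [p for j, p in enumerate(minority) if j != i]
        peers = sorted(peers, key=lambda p: math.dist(x, p))[:k]
        for _ in range(count):
            z = rng.choice(peers)   # random nearby minority sample
            lam = rng.random()      # interpolation weight in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy two-feature data (made up for illustration only).
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (0.3, 0.3), (1.0, 1.1), (0.9, 1.0)]
minority = [(1.0, 1.0), (2.0, 2.0), (2.1, 1.9)]
new_points = adasyn(minority, majority, k=3)
print(len(new_points), "synthetic minority samples generated")
```

In a cross-validated evaluation such as the one described above, oversampling of this kind is normally applied to the training folds only, so that synthetic points do not leak into the held-out fold.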

License
Copyright (c) 2025 Ghadeer Aqil Ali, Leila Sharifi (Author); Parviz Rashidi-Khazaee (Corresponding author); Hossein Nahid-Titkanlue (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.