Frontiers random forest article Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer’s Disease A Systematic Review Frontiers in Aging Neuroscience


where $p_j$ is the relative frequency of class $j$ in node $n$.
Ardekani, B. A., Bermudez, E., Mubeen, A. M., Bachman, A. H., and Alzheimer’s Disease Neuroimaging, I. . Prediction of incipient Alzheimer’s disease dementia in patients with mild cognitive impairment. J. Alzheimers Dis. 55, 269–281. doi: 10.3233/JAD-160594
RF follows specific rules for tree growing, tree combination, self-testing and post-processing. It is robust to overfitting and is considered more stable than other machine learning algorithms in the presence of outliers and in very high-dimensional parameter spaces. Variable importance is an implicit feature selection performed by RF with a random subspace methodology, and it is assessed by the Gini impurity criterion index. The Gini index measures the predictive power of variables in regression or classification, based on the principle of impurity reduction; it is non-parametric and therefore does not assume that the data follow a particular type of distribution. For a binary split, the Gini index of a node $n$ is calculated as follows:

$$\mathrm{Gini}(n) = 1 - \sum_{j=1}^{2} p_j^{2}$$
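As a minimal sketch of the node impurity described above (one minus the sum of the squared class frequencies in the node), the following Python function may help. The function name and the toy labels are illustrative assumptions, not taken from any of the reviewed studies.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_j p_j^2, where p_j is the
    relative frequency of class j among the records in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical nodes: one containing a single class, one evenly mixed.
pure_node = ["AD"] * 10
mixed_node = ["AD"] * 5 + ["HC"] * 5

print(gini_impurity(pure_node))   # 0.0 (no impurity)
print(gini_impurity(mixed_node))  # 0.5 (maximum for two classes)
```

For two classes the value ranges from 0 (a pure node) to 0.5 (an evenly mixed node).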
Tripoliti et al. conducted a study on 41 subjects, divided into three groups: 12 subjects were AD patients, ranging from very mild to mild according to the Clinical Dementia Rating, 14 subjects were healthy young controls and 14 were healthy elderly subjects.
Results: Twelve articles, published between 2007 and 2017, were included in this systematic review after a quantitative and qualitative selection. The lessons learnt from these works suggest that when RF was applied to multi-modal data for predicting conversion from Mild Cognitive Impairment to Alzheimer’s disease, it produced one of the best accuracies to date. Moreover, RF has important advantages in terms of robustness to overfitting, ability to handle highly non-linear data, stability in the presence of outliers and opportunity for efficient parallel processing, mainly when applied to multi-modality neuroimaging data such as MRI morphometric, diffusion tensor imaging, and PET images.
Figure 1. Illustration of a random forest construct superimposed on a coronal slice of the MNI 152 standard template. Each binary node is partitioned based on a single feature, and each branch ends in a terminal node, where the prediction of the class is provided. The different colors of the branches represent each of the trees in the forest. The final prediction for a test set is obtained by combining the predictions of all single trees by majority vote.
A limitation of this systematic review concerns the lack of information about the tuning of the RF parameters. In particular, the selected works reported little information about how the number and depth of trees in the forest or the splitting criteria were chosen. Although this tuning is performed automatically by RF, it is still unknown how an external assessment of these parameters would improve the overall accuracies.
Oppedal, K., Eftestol, T., Engan, K., Beyer, M. K., and Aarsland, D. . Classifying dementia using local binary patterns from different regions in magnetic resonance images. Int. J. Biomed. Imaging 2015:572567. doi: 10.1155/2015/572567
Caruana, R., and Niculescu-Mizil, A. . “An empirical comparison of supervised learning algorithms,” in 23rd International Conference on Machine Learning , , 161–168.
Zhang, D., Shen, D., and Alzheimer’s Disease Neuroimaging, I. . Predicting future clinical changes of MCI patients using longitudinal and multimodal biomarkers. PLoS ONE 7:e33182. doi: 10.1371/journal.pone.0033182
The high number of GM voxels was reduced with a feature selection approach consisting of a regularized logistic regression framework applied only to the dataset with AD and HC subjects. The selected variables were then aggregated with age and cognitive measurements and used for building the RF classifier for predicting AD in MCI patients, i.e., sMCI vs. pMCI.
Rathore, S., Habes, M., Iftikhar, M. A., Shacklett, A., and Davatzikos, C. . A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer’s disease and its prodromal stages. Neuroimage 155, 530–548. doi: 10.1016/j.neuroimage.2017.03.057
For all these reasons, the main goal of this systematic review was to highlight the role of RF as the ideal candidate for handling the high-dimensional problem and the variable redundancy in the early diagnosis of AD. We sought to review the literature in this area to identify all the works that applied the RF algorithm to single and multi-modality neuroimaging data, possibly combined with demographic and genetic information and with neuropsychological scores. Our aim was also to evaluate how well, in terms of accuracy, RF was able to classify AD and to distinguish between sMCI and pMCI, and how its intrinsic feature selection procedure could improve this overall accuracy.
Keywords: random forest, Alzheimer’s disease, mild cognitive impairment, neuroimaging, classification
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. . Classification and Regression Trees . Boca Raton, FL: CRC press.
A sample of 105 subjects was selected by Son et al. from the ADNI database. The cohort was divided into three age- and sex-matched groups: 30 AD, 40 MCI and 35 HC. All participants underwent 3 T acquisition of T1-w images and resting state functional MRI. Structural scans were pre-processed for correcting movement artifacts and smoothed, and then they were segmented into WM, GM, and CSF. The volumes of 10 subcortical regions were calculated as a measure of atrophy. The rs-fMRI images were pre-processed, registered onto the T1-w images and aligned to the MNI standard space. Given a set of ROIs from the AAL atlas as nodes, the functional networks were constructed by defining the edges as correlation values between nodes. The authors quantified the connectivity of the functional networks within the 10 subcortical regions with the eigenvector centrality measure among AD and HC, MCI and HC, and AD and MCI.
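To illustrate the eigenvector centrality measure mentioned above, here is a minimal sketch based on power iteration over a small weighted network. The four-node matrix is a hypothetical toy example and not data from the study; the iteration uses A + I rather than A, an implementation choice made here to keep the iteration stable, which leaves the eigenvectors unchanged.

```python
from math import sqrt


def eigenvector_centrality(adj, iterations=200):
    """Approximate the leading eigenvector of a symmetric, non-negative
    adjacency matrix (list of lists) by power iteration on A + I."""
    n = len(adj)
    c = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        # One multiplication by (A + I): c[i] is the identity part.
        nxt = [c[i] + sum(adj[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in nxt))
        c = [v / norm for v in nxt]
    return c


# Hypothetical network: edge weights stand in for correlations between ROIs.
# Node 0 is strongly connected to every other node.
adjacency = [
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.1, 0.1],
    [0.8, 0.1, 0.0, 0.1],
    [0.7, 0.1, 0.1, 0.0],
]
centrality = eigenvector_centrality(adjacency)
print(centrality)  # node 0 receives the largest score
```

A node scores highly when it is strongly connected to other nodes that are themselves central, which is why the hub node dominates in this toy network.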
Demographic and behavioral data were grouped with the features extracted from the data preprocessing phase: head motion parameters; volumetric measures, i.e., volumes obtained from the segmentation of gray matter, WM and CSF; activation patterns, consisting of several measures derived from the activated voxels and clusters; and hemodynamic measures extracted from the BOLD responses, such as the amplitude of venous volume or of vascular signal. The authors applied feature selection to this dataset to reduce the dimensionality by removing highly correlated variables. Selected features were used for training an RF classifier with 10 trees, and the performance was assessed using 10-fold cross-validation accuracy. Two separate datasets were evaluated: the first consisted of AD patients and both young and old healthy subjects, while the second consisted of AD and only old controls. Sensitivity and specificity of the two binary classifiers ranged from 94 to 98%, depending on the subset of selected features. The highest values were obtained on the dataset that included AD and old controls, with 98% for both sensitivity and specificity.
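The removal of highly correlated variables can be sketched as a simple greedy filter. This is only an illustration under stated assumptions: the 0.9 threshold, the feature names and the values are hypothetical, and the text does not specify the exact procedure the study used.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if its absolute correlation with every feature
    kept so far does not exceed the threshold (assumed value)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical features for five subjects.
features = {
    "gm_volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gm_volume_scaled": [2.1, 4.0, 6.2, 8.1, 9.9],  # nearly collinear with gm_volume
    "head_motion": [0.3, 0.1, 0.4, 0.1, 0.5],
}
print(drop_correlated(features))  # ['gm_volume', 'head_motion']
```

The result of a greedy filter depends on the order in which features are visited, so a real pipeline would need to fix or justify that order.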
A further interesting characteristic of the RF algorithm in the AD realm was the estimation of feature importance. The ranking of the variables plays an important role because it can show which features contribute most to the prediction, while also providing a correspondence to anatomical regions or structures with a biologically plausible connection to pathology.
This systematic review provided, for the first time, a framework for the exploration of the RF algorithm and of its strength in predicting AD when high-dimensional and multi-modal neuroimaging data are combined with demographic, genetic and cognitive scores. Indeed, as recently stated by Rathore et al., no single neuroimaging modality is enough to reach optimal accuracy for automatic AD prediction; only through the combination of different methodologies can the classification task be effectively translated into the clinical realm. Our work supported the idea that there is some complementary information between modalities and that this knowledge can be successfully explored with a combination of classifiers rather than a single one. RF, as a bagging ensemble model, provided promising results, but with possible limitations. Thus, given the high accuracies reached by RF in the classification of dementia, we aimed at encouraging further studies, especially for comparing and integrating this algorithm with other machine learning approaches such as deep learning, which recently showed its potential in the investigation of neuroimaging correlates. In the future, the aggregation of multi-approach, multimodal and multi-site data would drastically increase our ability to extract reliable biomarkers of neurodegenerative diseases.
Feature importance was assessed with RF-based recursive feature elimination (RFE), using the Gini index as criterion and 10,000 trees. The performance of the models, with and without RFE, was evaluated as the overall accuracy on a separate test set with 35 AD and 75 HC. Findings revealed that the highest accuracy for the classifier AD vs. HC was obtained with RFE on the combined dataset with thickness and non-cortical volumes. This accuracy increased by 0.7% when the authors combined all models by a majority vote approach. The majority vote method also showed the best ability to predict MCI-to-AD conversion 2 years before actual dementia onset, with sensitivity/specificity of 76.6/75%. As a further analysis, the authors found that adding ApoE genotype and demographic data did not improve the overall accuracy in distinguishing AD from HC, while it increased sensitivity/specificity in the prediction of MCI conversion.
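Since several of the reviewed results are reported as sensitivity/specificity pairs, a minimal sketch of how these two figures are derived from predictions may be useful. The labels and predictions below are hypothetical and are not the data of any reviewed study.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP),
    treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: converters (pMCI) are the positive class.
truth = ["pMCI"] * 4 + ["sMCI"] * 4
predicted = ["pMCI", "pMCI", "pMCI", "sMCI", "sMCI", "sMCI", "sMCI", "pMCI"]
print(sensitivity_specificity(truth, predicted, positive="pMCI"))  # (0.75, 0.75)
```

Reporting both values matters when groups are unbalanced, because overall accuracy alone can hide poor detection of the smaller class.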
Greicius, M. D., Srivastava, G., Reiss, A. L., and Menon, V. . Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI. Proc. Natl. Acad. Sci. U.S.A 101, 4637–4642. doi: 10.1073/pnas.0308627101
Another interesting observation was that, in both binary and ternary problems, feature selection based on the Gini index improved the overall performance; this was also true for the works in which only one neuroimaging modality was used. Other kinds of feature selection and extraction, applied prior to the RF classification, also improved the overall accuracies.
Feature selection and classification were performed with an RF classifier with 10 trees, and the 10-fold nested cross-validation accuracy was used as the performance metric. In particular, three RF models were built: a ternary problem HC vs. AD vs. LBD, a binary classifier HC vs. AD+LBD and another binary model AD vs. LBD.
Conclusions: We discussed the strengths of RF, also considering its possible limitations, and encouraged further studies comparing this algorithm with other commonly used classification approaches, particularly in the early prediction of the progression from MCI to AD.
Although the single-modality classification results were comparable between the original dataset and the embedded feature one, the latter presented the best performances, as follows: 86.4% for AD vs. HC with the FDG-PET data, 73.8% for MCI vs. HC with the genetic data, and 58.4% for sMCI vs. pMCI with MRI data. A slight increase in accuracy was obtained with the multi-modality classification for AD vs. HC and for MCI vs. HC, while for pMCI vs. sMCI there was a small decrease.
Eight works applied feature selection/elimination for reducing the dimension of the variable space. The number of trees used in the RF was not specified in two cases. Finally, we reported in the Results column of Table 1 the highest overall accuracies of binary or ternary classifiers reached by each study, except for the one that provided only sensitivity and specificity. Figure 3 presented a comparison, where applicable, of the accuracies obtained by the studies for the binary models AD vs. HC, MCI vs. HC, sMCI vs. pMCI, and the multi-class problem AD vs. HC vs. MCI.
The highest accuracy in distinguishing between sMCI and pMCI was reached when the combination of neuroimaging and neuropsychiatric features was used as the training set. The classifiers built only on the baseline measures or only on HVI values showed poor performance. The variable ranking of the 16 features revealed that, according to the impurity criterion, the ADAS cognitive test was the most important one, followed by the rate of change of the right HVI.
For the ternary problem HC vs. AD vs. LBD, the best accuracy was reached when the classifier was trained on the texture features extracted from the T1 images in the WML masks. Results of the model HC vs. AD+LBD revealed that the highest accuracy was also obtained when only T1WML variables were considered. By contrast, distinguishing AD from LBD with maximum accuracy required the texture features to be extracted from the T1 images in the WM ROI.
Cabral et al. collected 177 subjects from the ADNI database, divided into three balanced groups: AD patients, MCI patients and HC. The authors analyzed FDG-PET data, acquired 24 months after the first visit and already pre-processed by ADNI. In particular, they used the voxel intensities as features of interest, for a total of 309,881 variables. The original dataset was decomposed by using the one-vs.-all scheme, resulting in three subsets: AD vs. ALL, MCI vs. ALL, HC vs. ALL. The Mutual Information criterion was used for extracting the optimal features with the highest ranking value, separately for each pairwise problem. The selected features were then used for training three binary RF models with 100 trees. As aggregation scheme for the ternary problem, the voting strategy was applied. The classification performance was then assessed by the 10-fold cross-validation accuracy, repeated 5 times with fold randomization. The ternary RF classifier provided a multiclass accuracy of 64.63%. It should be noted that the authors applied two other algorithms, linear and RBF SVM, obtaining accuracies of 66.33% and 66.78%, respectively.
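The one-vs.-all decomposition and its aggregation into a ternary decision can be sketched as follows. The aggregation rule shown here (pick the class whose binary model gives the strongest positive vote) is an assumption for illustration; the text does not detail the exact voting strategy of the study, and the scores are hypothetical.

```python
def one_vs_all_labels(labels, positive):
    """Relabel a multi-class problem as binary: 1 for `positive`, 0 for all others."""
    return [1 if y == positive else 0 for y in labels]


def aggregate_by_vote(positive_vote_fraction):
    """Given, for each class, the fraction of trees in that class's binary
    forest voting 'positive', return the class with the largest fraction.
    (Assumed aggregation rule, for illustration only.)"""
    return max(positive_vote_fraction, key=positive_vote_fraction.get)


labels = ["AD", "MCI", "HC", "AD"]
print(one_vs_all_labels(labels, "AD"))   # [1, 0, 0, 1]
print(one_vs_all_labels(labels, "MCI"))  # [0, 1, 0, 0]

# Hypothetical outputs of the three binary forests for one test subject.
votes = {"AD": 0.62, "MCI": 0.55, "HC": 0.20}
print(aggregate_by_vote(votes))  # AD
```

Each binary subset is then used to train its own model, so feature selection can be tailored to each class, as the study did with the Mutual Information criterion.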
Cabral, C., Silveira, M., and Alzheimer’s Disease Neuroimaging, I. . Classification of Alzheimer’s disease from FDG-PET images using favourite class ensembles. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2013, 2477–2480. doi: 10.1109/EMBC.2013.6610042
Objective: Machine learning classification has been the most important computational development in recent years to satisfy the primary need of clinicians for automatic early diagnosis and prognosis. The Random Forest (RF) algorithm has been successfully applied for reducing high-dimensional and multi-source data in many scientific realms. Our aim was to explore the state of the art of the application of RF on single and multi-modal neuroimaging data for the prediction of Alzheimer’s disease.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., and Group, P. . Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339:b2535. doi: 10.1136/bmj.b2535
Borza, T., Engedal, K., Bergh, S., Benth, J., and Selbæk, G. . The course of depression in late life as measured by the montgomery and asberg depression rating scale in an observational study of hospitalized patients. BMC Psychiatry 15:191. doi: 10.1186/s12888-015-0577-8
Fripp, J., Bourgeat, P., Acosta, O., Raniga, P., Modat, M., Pike, K. E., et al. . Appearance modeling of 11C PiB PET images: characterizing amyloid deposition in Alzheimer’s disease, mild cognitive impairment and healthy aging. Neuroimage 43, 430–439. doi: 10.1016/j.neuroimage.2008.07.053
Berchtold, N. C., and Cotman, C. W. . Evolution in the conceptualization of dementia and Alzheimer’s disease: Greco-Roman period to the 1960s. Neurobiol. Aging 19, 173–189. doi: 10.1016/S0197-458000052-9
Methods: A systematic review following PRISMA guidelines was conducted on this field of study. In particular, we constructed an advanced query using boolean operators as follows: AND neuroimaging AND AND . The query was then searched in four well-known scientific databases: PubMed, Scopus, Google Scholar and Web of Science.
Figure 2 reported the four phases—identification, screening, eligibility and inclusion—of the process for the selection of the studies in this review. Nineteen records were excluded after the initial screening of title and abstract and three more records were removed after the full-text assessment, following the inclusion criteria. Finally, 12 studies were included in qualitative synthesis.
For the present systematic review, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. The statement consists of a checklist of recommended items to be reported and a four-step flow diagram.
Table 1. Characteristics of each of the twelve studies included in the systematic review.
Alzheimer’s disease (AD), a common form of dementia, is a progressive neurodegenerative disorder that affects mostly elderly people. It is characterized by a decline in cognitive function, including progressive loss of memory, reasoning, and language. Mild cognitive impairment (MCI) is an intermediate state between healthy aging and AD, which is not severe enough to interfere with daily life. Although not all MCI subjects progress to AD, and some remain cognitively stable for many years, the incidence of progression is estimated at between 10 and 15% per year. There is no generally accepted cure for AD, but several treatments exist for delaying its course. For this reason, it is extremely important to detect early the MCI subjects who are at imminent risk of conversion to AD.
The work of Lebedeva et al. was aimed at predicting MCI and dementia in late-life depression patients 1 year prior to the diagnosis. The analysis was conducted on a cohort of 32 patients including 21 MCI and 8 AD, and a group of 40 age- and sex-matched HC from the PRODE prospective multicenter study. All subjects underwent 1.5/3 T MRI acquisition at the baseline and after 1 year. T1-w images were pre-processed for extracting CTH and subcortical volumes with a standard pipeline, for a total of 148 features. Clinical and neuropsychological assessment was performed for each subject at both time points.
Ardekani et al. applied their classification task to a cohort of 164 MCI patients from the ADNI database, divided into 78 stable MCI and 86 MCI who converted to AD within 3 years from the baseline. All selected subjects underwent two 1.5 T MRI acquisitions, at the baseline and ~1 year later. Neuropsychiatric scores at these two time points were also considered in the analysis. T1-w images, without any pre-processing, were used for calculating the hippocampal volumetric integrity (HVI), defined as the fraction of volume of a region that is expected to surround the hippocampus in a normal brain that is occupied by tissue. The HVI is measured, separately for each hemisphere, as the area under the histogram curve for voxel values above a CSF intensity threshold. The HVI measures and the neuropsychiatric scores were merged for a total of 16 features for each subject, including their average rate of change between the baseline and the 1-year follow-up.
RF is a collection, or ensemble, of Classification and Regression Trees (CART) trained on datasets of the same size as the training set, called bootstraps, which are created by random resampling of the training set itself. Once a tree is constructed, the records of the original dataset that were not included in its bootstrap, the out-of-bag (OOB) samples, are used as the test set. The error rate of the classification of all the test sets is the OOB estimate of the generalization error. Breiman showed by empirical evidence that, for bagged classifiers, the OOB error is as accurate as using a test set of the same size as the training set. Thus, using the OOB estimate removes the need for a separate test set. To classify new input data, each individual CART tree votes for one class and the forest predicts the class that obtains the plurality of votes.
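The bootstrap, out-of-bag and plurality-vote mechanics described above can be sketched without growing actual trees. In this illustration the per-tree predictions are supplied by hand; all names and values are hypothetical, and a real forest would obtain them from trees trained on the corresponding bootstraps.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n record indices with replacement; the records never drawn
    form the out-of-bag (OOB) set for that tree."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def plurality_vote(votes):
    """Return the class that received the most votes."""
    return Counter(votes).most_common(1)[0][0]


def oob_error(y, oob_predictions):
    """oob_predictions: one dict per tree mapping an OOB record index to the
    class that tree predicts. Each record is judged only by the trees that
    did not see it during training."""
    votes = {}
    for per_tree in oob_predictions:
        for i, label in per_tree.items():
            votes.setdefault(i, []).append(label)
    if not votes:
        return None
    wrong = sum(plurality_vote(v) != y[i] for i, v in votes.items())
    return wrong / len(votes)


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(10, rng)
print(sorted(set(in_bag)), out_of_bag)

# Hypothetical OOB predictions of three trees for three records.
y = ["AD", "HC", "AD"]
predictions = [{0: "AD", 1: "AD"}, {1: "HC", 2: "HC"}, {1: "HC"}]
print(oob_error(y, predictions))  # one of three records misclassified
```

For large samples, each bootstrap leaves out roughly a third of the records on average, which is what makes the OOB estimate possible without a separate test set.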
More details about individual works, such as the results obtained with other algorithms or other subsets of features, can be found in the next section.
In a first phase, the authors assessed the importance of the voxels in discriminating AD from HC with two different feature selection methods: the Wilcoxon rank sum test and the ReliefF algorithm, both of which were used within a non-nested and a nested approach. For the classification task, fifteen subsets were then created by selecting an increasing number (from 50 to 3,000) of the most discriminating voxels, ordered by decreasing importance. RF models were trained with 300 trees on each of these feature subspaces and their performance was evaluated with a repeated 5-fold cross-validation accuracy.
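The difference between nested and non-nested selection lies in where the ranking is computed: in the nested case, features are ranked inside each cross-validation split using the training rows only, so the held-out rows cannot influence the choice. The sketch below shows this with repeated k-fold splits. The ranking statistic (absolute difference of class means) is a simple stand-in chosen for brevity, not the Wilcoxon or ReliefF methods of the study, and the data are hypothetical.

```python
import random


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_indices, test_indices) for `repeats` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def rank_features(X, y, rows):
    """Rank feature columns by absolute difference of class means,
    computed on the given rows only (stand-in ranking statistic)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        g0 = [X[i][j] for i in rows if y[i] == 0]
        g1 = [X[i][j] for i in rows if y[i] == 1]
        if not g0 or not g1:
            scores.append(0.0)
        else:
            scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def nested_selection(X, y, k, repeats, top_k, seed=0):
    """For every split, select the top_k features from the training rows only."""
    result = []
    for train, test in repeated_kfold(len(X), k, repeats, seed):
        ranking = rank_features(X, y, train)  # test rows are never used here
        result.append((ranking[:top_k], test))
    return result


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5], [0.2, 3], [0.0, 4], [0.3, 6], [0.1, 5],
     [1.0, 4], [1.2, 5], [0.9, 3], [1.1, 6], [1.0, 5]]
y = [0] * 5 + [1] * 5
for selected, test_rows in nested_selection(X, y, k=5, repeats=2, top_k=1):
    print(selected, test_rows)
```

A non-nested variant would call rank_features once on all rows before splitting, which lets information from the test rows leak into the selection and tends to give optimistic accuracy estimates.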
Several RF models were trained on different feature subsets and their performance was evaluated with the OOB estimate of classification accuracy. The mean reduction of the Gini impurity index was used to assess variable importance.
Strobl, C., Boulesteix, A. L., and Augustin, T. . Unbiased split selection for classification trees based on the gini index. Comput. Stat. Data Anal. 52, 483–501. doi: 10.1016/j.csda.2006.12.030
The ternary problem, AD vs. MCI vs. HC, was evaluated by training an RF classifier with the SV and the eigenvector centrality measures as features. The multi-class accuracy of the RF model was assessed with a repeated leave-one-out cross-validation approach. The authors obtained poor performance in distinguishing among AD, MCI, and HC subjects. However, they identified distinctive regional atrophy and functional connectivity patterns characterizing each binary problem: AD vs. HC, MCI vs. HC, and MCI vs. AD.
AS: Research project: Conception, Organization, and Execution. Statistical Analysis: Design, Execution, Review, and Critique. Manuscript: Writing of the first draft, Review, and Critique. AC: Research project: Conception, Organization and Execution. Manuscript: Review and Critique. AQ: Research project: Organization and Execution. Manuscript: Review and Critique.
Collie, A., and Maruff, P. . The neuropsychology of preclinical Alzheimer’s disease and mild cognitive impairment. Neurosci. Biobehav. Rev. 24, 365–374. doi: 10.1016/S0149-763400012-9
Moradi et al. obtained baseline data for their analysis from the ADNI database and selected 825 subjects grouped as: 200 AD patients, 100 stable MCI, 164 MCI who progressed to AD within 3 years from the baseline and 231 HC. Another group of 100 unknown MCI (uMCI), diagnosed as MCI at the baseline but with missing diagnosis at the 36-month follow-up, was also considered. For integrating the unlabeled uMCI group into the training set and assigning its members to the pMCI or sMCI class, the authors used a low density separation (LDS) approach for semi-supervised learning.
The best accuracies, around 90%, for the binary problem AD vs. HC were observed when the RF classifiers were trained on high-dimensional and multi-modality data. The superior performance of these models can be explained by the ability of RF to detect less extensive changes in the variables, which might not be revealed by other algorithms. Moreover, Moradi et al. showed that RF was more immune to the data type thanks to its capability to handle discrete data and to apply an efficient discretization algorithm to continuous data before the learning step.
To split a binary node in the best way, the improvement in the Gini index should be maximized. In other words, a low Gini index after the split (i.e., a large decrease in impurity) means that a particular predictor feature plays a greater role in partitioning the data into the two classes. Thus, the Gini index can be used to rank the importance of features for a classification problem.
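As a rough illustration of maximizing the improvement in the Gini index, the sketch below computes the impurity decrease of a candidate split and scans thresholds on a single feature. The feature values, labels and function names are hypothetical, and the size-weighted decrease is the usual CART formulation, assumed here rather than quoted from the review.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def best_threshold(values, labels):
    """Try the midpoints between consecutive distinct values and return
    (threshold, impurity decrease) for the best binary split."""
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = gini_decrease(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical feature (e.g., a normalized regional volume) and diagnoses.
volume = [0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
label = ["AD", "AD", "AD", "HC", "HC", "HC"]
threshold, gain = best_threshold(volume, label)
print(threshold, gain)  # the split between 0.4 and 0.7 removes all impurity
```

Summing such decreases over every node where a feature is used, and averaging over the trees, gives the mean decrease in Gini impurity commonly used to rank features.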
Moradi, E., Pepe, A., Gaser, C., Huttunen, H., Tohka, J., and Alzheimer’s Disease Neuroimaging, I. . Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects. Neuroimage 104, 398–412. doi: 10.1016/j.neuroimage.2014.10.002

Sarica, A., Cerasa, A., Valentino, P., Yeatman, J., Trotta, M., Barone, S., et al. . The corticospinal tract profile in amyotrophic lateral sclerosis. Hum. Brain. Mapp. 38, 727–739. doi: 10.1002/hbm.23412
Sivapriya, T. R., Kamal, A. R., and Thangaiah, P. R. . Ensemble merit merge feature selection for enhanced multinomial classification in Alzheimer’s dementia. Comput. Math. Methods Med. 2015:676129. doi: 10.1155/2015/676129
Received: 23 June 2017; Accepted: 22 September 2017; Published: 06 October 2017.
Maggipinto, T., Bellotti, R., Amoroso, N., Diacono, D., Donvito, G., Lella, E., et al. . DTI measurements for Alzheimer’s classification. Phys. Med. Biol. 62, 2361–2375. doi: 10.1088/1361-6560/aa5dbe
The RF model performance was evaluated as the mean accuracy calculated by 10-fold cross-validation. The highest accuracy in predicting MCI-to-AD conversion reached almost 82% when the concatenated measures (age, cognitive, and voxel) and the combination of LDS and RF were considered. The importance analysis of MRI features, age, and cognitive measurements calculated by the RF classifier revealed that the three most predictive variables were: MRI voxels, the Rey’s Auditory Verbal Learning Test and the Alzheimer’s Disease Assessment Scale-cognitive subtest 11.
The models built on the FA features selected with the non-nested approach showed the highest accuracies in both binary problems, AD vs. HC and MCI vs. HC. The non-nested variable selection also produced better results than the nested one when MD voxels were used for training the classifiers.
In Oppedal et al., a total of 73 mild dementia subjects, divided into 57 AD patients and 16 LBD patients, together with 36 HC were investigated. The cohort MRIs were acquired in different research centers with 1.0/1.5 T scanners and FLAIR images were also obtained. T1-w images were corrected, registered and segmented for extracting the white matter tissue. From the pre-processed FLAIR images, the WM lesion (WML) maps were automatically created. In a second phase of the study, the authors applied the local binary pattern (LBP) approach as a texture descriptor on both T1 and FLAIR images, with their derived WM and WML maps as ROIs. To enhance the discriminative power of LBP, an image contrast measure (C) was added as a variable for every voxel in the specified ROI. The total number of features for each subject was 48, resulting from the combination of LBP and C values in each ROI.
Falahati, F., Westman, E., and Simmons, A. . Multivariate data analysis and machine learning in Alzheimer’s disease with a focus on structural magnetic resonance imaging. J. Alzheimers Dis. 41, 685–708. doi: 10.3233/JAD-131928
The features of interest were extracted from 1.5 T MRI images using a surface-based cortex reconstruction and volumetric segmentation. In particular, non-cortical volumes, cortical thickness (CTH), Jacobian maps and sulcal depth were measured for each subject. The ability of these parameters to distinguish AD from HC was assessed individually and with a combination of measurements of CTH and non-cortical volumes.
Breiman, L. . Bagging predictors. Mach. Learn. 24, 123–140. doi: 10.1007/BF00058655
The binary models for distinguishing MCI from HC and stable MCI from progressive MCI showed lower accuracies, around 82%, although they were similarly improved by multi-modal data classification. In particular, the inclusion of age as well as cognitive measurements in the feature space significantly improved the classification of MCI vs. HC and the prediction of AD conversion in MCI patients. By contrast, for the sMCI vs. pMCI problem, Gray et al. found that the accuracy reached with multi-modality classification was not significantly different from that obtained with MRI information alone. Interestingly, the authors suggested that the lack of improvement in predicting the progression to AD could be overcome by incorporating longitudinal information, as Ardekani et al. indeed demonstrated afterwards by considering the rate of change of variables.
The cohort underwent a visual fMRI finger tapping task. Raw structural and functional images were preprocessed for correction of motion artifacts, registered and normalized.
The feature selection and classification task consisted of three main phases in which RF performance was evaluated together with other algorithms: Naïve Bayes, J48 and SVM. Each classifier was trained on each of the four datasets, after they had been dimensionally reduced with a particle swarm optimization approach coupled with the Merit Merge technique. The performance of the classification models was evaluated with the 5-fold cross-validation accuracy of the ternary problem AD vs. MCI vs. HC. RF, implemented with 100 to 1,000 trees, showed its best multi-class accuracy when it was trained on the baseline combined dataset, and the same result was obtained with the CPEMM feature selection methodology. It should be noted that RF reached performance comparable to that of the other classification algorithms, except for SVM, which presented the lowest accuracies in the delineation of dementia.
The study of Wang et al. included 129 subjects with MCI from the ADNI database. The cohort was divided into 65 stable MCI and 64 progressive MCI, who converted to AD within 3 years from the baseline. All subjects underwent the acquisition of 1.5 T MRI, florbetapir-PET and FDG-PET. The authors analyzed neuroimaging data already pre-processed by ADNI, separately grouped according to the modality of acquisition, i.e., features extracted from the T1-w images and the uptake of florbetapir and FDG. A dataset with a combination of these multimodal measures was also evaluated. Three classification algorithms (partial least squares, linear SVM and RF) were trained on these four different datasets. Their ability to distinguish sMCI from pMCI was assessed with the leave-one-out cross-validation accuracy. RF showed its best accuracy when it was trained on the combined multi-modal features dataset. A comparable result on the same dataset was reached by SVM. By contrast, informed PLS generally outperformed both RF and SVM, especially when the three neuroimaging modalities were fused.
Chen, X., Wang, M., and Zhang, H. . The use of classification trees for bioinformatics. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 55–63. doi: 10.1002/widm.14
The study of Lebedev et al. was based on a cohort of 575 subjects from the ADNI database, divided into three main groups: 185 AD, 165 patients with MCI, of which 149 progressed to AD within 4 years, and 225 HC. The MCI group was split into six subgroups according to the month of MCI-to-AD conversion.
Published titles and abstracts in the English language from the first of January 2007 to the first of May 2017 were searched systematically across the following databases: PubMed, Scopus, Google Scholar, and Web of Science. The search terms were concatenated in an advanced query using boolean operators as follows: AND neuroimaging AND AND . After the initial web search, duplicate items among databases were removed.
The cohort investigated by Maggipinto et al. was obtained from the ADNI database and consisted of 150 subjects divided into three groups: 50 AD, 50 MCI, and 50 HC, with ages ranging from 55 to 90. Diffusion-weighted scans acquired with a 3 T scanner, randomly selected from the baseline and follow-up visits, were used for this machine learning study. The DTI data were pre-processed with a standard pipeline to correct for movement artifacts and eddy currents. A diffusion tensor was fitted for each subject, and fractional anisotropy (FA) and mean diffusivity (MD) maps were extracted. The FA and MD maps were then used as input for a tract-based spatial statistics analysis, which produced ~120,000 voxels per subject for each diffusion metric.
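For readers unfamiliar with the two diffusion metrics, the following minimal sketch computes them for a single voxel from the three eigenvalues of a fitted diffusion tensor, using the standard textbook definitions. It is not the pipeline of the reviewed study, whose implementation details are not given here; the function names are hypothetical and the eigenvalues are assumed to be already available.

```python
import math


def mean_diffusivity(eigenvalues):
    """MD: the average of the three tensor eigenvalues."""
    l1, l2, l3 = eigenvalues
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(eigenvalues):
    """FA = sqrt(3/2 * sum((l_i - MD)^2) / sum(l_i^2)), ranging from 0 to 1."""
    md = mean_diffusivity(eigenvalues)
    numerator = sum((lam - md) ** 2 for lam in eigenvalues)
    denominator = sum(lam ** 2 for lam in eigenvalues)
    if denominator == 0.0:
        # Empty voxel: no diffusion signal, so anisotropy is set to zero.
        return 0.0
    return math.sqrt(1.5 * numerator / denominator)
```

FA is 0 when diffusion is equal in all directions and approaches 1 when diffusion is confined to a single direction, which is why it is widely used as a marker of white matter integrity.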
Abraham, A., Milham, M. P., Di Martino, A., Craddock, R. C., Samaras, D., Thirion, B., et al. . Deriving reproducible biomarkers from multi-site resting-state data: an autism-based example. Neuroimage 147, 736–745. doi: 10.1016/j.neuroimage.2016.10.045
Vieira, S., Pinaya, W. H., and Mechelli, A. . Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neurosci. Biobehav. Rev. 74, 58–75. doi: 10.1016/j.neubiorev.2017.01.002
RF has been successfully applied in many scientific fields, such as bioinformatics, proteomics, and genetics, but it has been applied less often to neuroimaging data for the prediction of Alzheimer’s disease. The present paper is, to our knowledge, the first to systematically analyze the literature of the last 10 years on the use of the RF algorithm on neuroimaging data for the early diagnosis of AD. In this review, we summarized the characteristics of twelve works, focusing on the performance reached by their algorithms.
Again, what remains to be assessed is the performance of the RF algorithm on multi-site data. As already demonstrated for rs-fMRI datasets from different sites, the accuracy and reliability of biomarker extraction could be enhanced by dramatically increasing the cohort size. Moreover, it has been shown that classifiers trained on data from multiple sources are likely to generalize better to new observations, thereby limiting overfitting. Thus, it would be interesting to evaluate how well RF classifies when it is trained on features that are not invariant across sites and how sample heterogeneity influences its performance.
During the screening phase, to be assessed for eligibility, studies were required to: investigate a cohort of AD patients in a cross-sectional case-control or longitudinal design; analyze neuroimaging data; and apply the RF algorithm as the machine learning technique for the classification of AD patients.
Gray, K. R., Aljabar, P., Heckemann, R. A., Hammers, A., Rueckert, D., and Alzheimer’s Disease Neuroimaging, I. . Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease. Neuroimage 65, 167–175. doi: 10.1016/j.neuroimage.2012.09.065
Three different binary datasets were used for the RF classification: AD vs. HC, MCI vs. HC, and sMCI vs. pMCI. The performance of each classifier was evaluated with a stratified repeated random sampling approach in which, in each of the 100 runs, the dataset was divided into a training set and a test set. Accuracy on the test set was then calculated as the mean over the 100 repetitions. The RF models were trained with 5,000 trees on the feature data from each of the four modalities independently, and the feature importance ranking was extracted. As a further analysis, the authors measured the similarity between pairs of examples from the RF classifiers and applied a manifold learning approach to single-modality data and to combined/concatenated features.
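The stratified repeated random sampling scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the train/test proportion is not given in the text, so the `test_fraction=0.25` default is a placeholder, and the function names and the `fit` callback (which would wrap an RF trainer in practice) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Split indices so each class keeps roughly the same share in the test set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # At least one test sample per class (assumes each class has >= 2 samples).
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def repeated_sampling_accuracy(features, labels, fit, runs=100,
                               test_fraction=0.25, seed=0):
    """Mean test accuracy over `runs` independent stratified random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        train, test = stratified_split(labels, test_fraction, rng)
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(features[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / runs
```

Unlike k-fold cross-validation, the test sets of different runs may overlap; stratification keeps the class proportions of each split close to those of the full dataset.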
Citation: Sarica A, Cerasa A and Quattrone A (2017) Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer’s Disease: A Systematic Review. Front. Aging Neurosci. 9:329. doi: 10.3389/fnagi.2017.00329
Menze, B. H., Kelm, B. M., Masuch, R., Himmelreich, U., Bachert, P., Petrich, W., et al. . A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 10:213. doi: 10.1186/1471-2105-10-213
Ceriani, L., and Verme, P. . The origins of the Gini index: extracts from Variabilità e Mutabilità by Corrado Gini. J. Econ. Inequal. 10, 421–443. doi: 10.1007/s10888-011-9188-x
All subjects underwent 1.5 T MRI acquisition and the T1-w scans were preprocessed following the voxel-based morphometry approach. In particular, T1-w images were corrected, spatially normalized and segmented into GM, WM, and CSF. The GM maps were then further processed for extracting 29,852 GM density values—for each subject—used as MRI features for the classification task.
Calle, M. L., Urrea, V., Boulesteix, A. L., and Malats, N. . AUC-RF: a new strategy for genomic profiling with random forest. Hum. Hered. 72, 121–132. doi: 10.1159/000330778
Lebedev, A. V., Westman, E., Van Westen, G. J., Kramberger, M. G., Lundervold, A., Aarsland, D., et al. . Random forest ensembles for detection and prediction of Alzheimer’s disease with a good between-cohort robustness. Neuroimage Clin. 6, 115–125. doi: 10.1016/j.nicl.2014.08.023
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gotzsche, P. C., Ioannidis, J. P., et al. . The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 339:b2700. doi: 10.1136/bmj.b2700
Son, S. J., Kim, J., and Park, H. . Structural and functional connectional fingerprints in mild cognitive impairment and Alzheimer’s disease patients. PLoS ONE 12:e0173426. doi: 10.1371/journal.pone.0173426
Tianjin Medical University General Hospital, China
Figure 2. PRISMA workflow of the identification, screening, eligibility, and inclusion of the studies in the systematic review.
To reduce the risk of bias, two authors independently screened paper abstracts and titles and analyzed the full papers that met the inclusion criteria, as suggested by the PRISMA guidelines. The reference lists of the examined full-text papers were also scrutinized for additional relevant publications.
Figure 3. Histograms of the overall accuracy reached by the studies, where applicable, for the binary classifiers AD vs. HC, MCI vs. HC and sMCI vs. pMCI, and for the ternary problem AD vs. MCI vs. HC. See also Table 1. AD, Alzheimer’s disease; HC, healthy controls; MCI, mild cognitive impairment; cMCI, converter MCI; sMCI, stable MCI; pMCI, progressive MCI.
Palmqvist, S., Hertze, J., Minthon, L., Wattmo, C., Zetterberg, H., Blennow, K., et al. . Comparison of brief cognitive tests and CSF biomarkers in predicting Alzheimer’s disease in mild cognitive impairment: six-year follow-up study. PLoS ONE 7:e38639. doi: 10.1371/journal.pone.0038639
Breiman, L. . Random Forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324
Greek Association of Alzheimer’s Disease and Related Disorders, Greece
Lebedeva, A. K., Westman, E., Borza, T., Beyer, M. K., Engedal, K., Aarsland, D., et al. . MRI-based classification models in prediction of mild cognitive impairment and dementia in late-life depression. Front. Aging Neurosci. 9:13. doi: 10.3389/fnagi.2017.00013
The authors wish to acknowledge Mr. Simonluca Spadanuda for the creation of the random forest illustration.
Trzepacz, P. T., Yu, P., Sun, J., Schuh, K., Case, M., Witte, M. M., et al. . Comparison of neuroimaging modalities for the prediction of conversion from mild cognitive impairment to Alzheimer’s dementia. Neurobiol. Aging 35, 143–151. doi: 10.1016/j.neurobiolaging.2013.06.018
The diagnosis of AD is based primarily on multiple variables and factors, such as demographic and genetic information, neuropsychological tests, cerebrospinal fluid biomarkers, and brain imaging data. Moreover, for assessing the risk of conversion from MCI, the rate of change of these variables could represent a further source of knowledge. In particular, neuroimaging technologies such as magnetic resonance imaging (MRI), functional MRI (fMRI), diffusion tensor imaging (DTI), single photon emission tomography (SPECT), and positron emission tomography (PET) have been widely and successfully applied in the study of MCI and AD. The choice of neuroimaging modality depends on the duration and severity of the disease: for example, when MRI does not reveal any brain alterations, fMRI, SPECT, or PET can assess metabolic abnormalities, and DTI can be used to investigate the microstructural disruption of the white matter.
Three works investigated the ternary problem AD vs. MCI vs. HC, but only the work of Sivapriya et al. reached a reliable accuracy, 96.3%. The low performance of the other two studies (64.63% for Cabral et al. and 53.33% for Son et al.) might be due to the heterogeneous pattern of brain changes across the three groups and the inability of RF to model the very large variability in the stages of the pathological process. Thus, although RF extends naturally to multi-class problems, the AD vs. MCI vs. HC ternary model cannot yet be translated into a real-world clinical scenario.
The editor and reviewers’ affiliations are the latest provided on their Loop research profiles and may not reflect their situation at the time of review.
Shen, D., Wu, G., and Suk, H. I. . Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248. doi: 10.1146/annurev-bioeng-071516-044442
Four datasets from the ADNI database were used by Sivapriya et al., and three different groups of subjects were selected: AD, MCI, and HC. The number of subjects in each dataset varied according to the features considered: a neuropsychological dataset, a neuroimaging dataset, a baseline combined dataset with both neuropsychological and neuroimaging measures, and a combined dataset. Some of the neuropsychological tests used were the Clinical Dementia Rating-SB, the ADAS, the RAVLT, and the MoCA. The authors used MRI data already pre-processed by ADNI for their study, in particular neuroimaging measures extracted from T1-w and FDG-PET images, consisting of volumes and average PIB SUVR of several regions of interest.
Data extracted from the studies—finally included in the qualitative synthesis—were: sample diagnosis, sample size and mean age, neuroimaging acquisition type, features of interest, RF classification parameters, classification performance validation, and selected findings in terms of classification performance.
The high dimensionality of the features considered in the diagnosis of AD and in the progression from MCI, together with their complex interactions, makes the data very difficult for humans to interpret. Computer-aided diagnosis represents a valuable automatic tool for supporting clinicians by teaching computers to predict incipient AD. Machine learning and pattern recognition algorithms have been shown to efficiently classify AD patients and healthy controls and to distinguish between stable MCI subjects and progressive MCI subjects who converted to AD. In general, the machine learning methods used on neuroimaging data rely on a single classifier, such as the widely used Support Vector Machine, Linear Discriminant Analysis, or Naïve Bayes. However, in recent years, ensemble algorithms have proven to be a reliable alternative to single classifiers, showing better performance than the latter, especially when multi-modality variables are combined. Although, among all ensemble approaches, Random Forest has produced the best accuracies in many scientific fields and in other neurological diseases, it is still rarely applied to the prediction of AD, and only lately have researchers paid attention to it. In particular, RF has shown important advantages over other methodologies regarding the ability to handle highly non-linearly correlated data, robustness to noise, tuning simplicity, and opportunity for efficient parallel processing. Moreover, RF presents another important characteristic: an intrinsic feature selection step, applied prior to the classification task, that reduces the variable space by assigning an importance value to each feature.
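To make the idea of an importance value concrete, the sketch below computes the Gini impurity of a node from the relative class frequencies, and the impurity reduction produced by one binary split. In the usual formulation, the Gini importance of a feature accumulates these reductions over every node that splits on that feature and averages them across the trees of the forest. The function names are hypothetical, and this is a simplified illustration rather than the implementation used in any of the reviewed studies.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_j p_j^2, with p_j the relative frequency of class j."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    # Child impurities are weighted by the share of samples each child receives.
    weighted_children = ((len(left) / n) * gini_impurity(left)
                         + (len(right) / n) * gini_impurity(right))
    return gini_impurity(parent) - weighted_children
```

For example, a node holding two AD and two HC subjects has impurity 0.5; a split that separates the two classes perfectly yields two pure children and therefore a decrease of 0.5, the largest possible for that node.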
Acosta-Cabronero, J., and Nestor, P. J. . Diffusion tensor imaging in Alzheimer’s disease: insights into the limbic-diencephalic network and methodological considerations. Front. Aging Neurosci. 6:266. doi: 10.3389/fnagi.2014.00266
Tripoliti, E. E., Fotiadis, D. I., and Argyropoulou, M. . A supervised method to assist the diagnosis of Alzheimer’s disease based on functional magnetic resonance imaging. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2007, 3426–3429. doi: 10.1109/IEMBS.2007.4353067
As a further analysis, the authors used their PRODE cohort as a test set for the RF model previously built by Lebedev et al. on AD and HC from the ADNI database. The accuracy was better when only SV measures were used than when SV and CTH were combined.
Data extracted from the studies were summarized in Table 1. In particular, we reported those characteristics that are related to the highest performance reached by RF in each study. Regarding the cohort diagnosis, two works investigated Alzheimer’s patients and healthy controls, four works had AD, HC, and MCI, two studies considered AD, HC, stable MCI, and progressive MCI, two had sMCI and pMCI, one had HC and MCI, and one had AD, HC, and Lewy-body dementia patients.
Frisoni, G. B., Fox, N. C., Jack, C. R. Jr., Scheltens, P.,