1. Background
In recent years, the movement toward primary prevention strategies for mild cognitive impairment (MCI) and dementia, especially Alzheimer’s Disease (AD), has highlighted the importance of early screening and risk identification at scale. Diagnostic criteria for both MCI and AD emphasise the value of biomarkers including β-amyloid protein (Aβ) and tau protein as early indicators of neuropathology that are associated with AD [1-3], but the cost and accessibility of these methods are significant barriers to widespread adoption.
Neurocognitive screening is another method for routine identification of cognitive decline. Several neurocognitive tests have been shown to be sensitive to both underlying Aβ and tau burden [4] and to the diagnosis of MCI and AD itself [1-3]. Neurocognitive tests are also non-invasive and relatively easy to administer. However, whilst ‘in clinic’ testing of cognition offers several advantages, primary care physicians, the most likely first point of contact for potential patients, tend to lack the technical support, infrastructure, and experience to use such methods effectively [5,6]. Furthermore, the relatively high cost of delivering neuropsychological services in clinics and hospitals remains prohibitive in many countries, especially in regions with the highest rates of dementia and/or looming dementia crises, such as Asia [7].
The advent of COVID-19 has created new barriers to ‘in clinic’ screening for MCI and dementia, as significant numbers of elderly people at higher risk of cognitive decline appear to be avoiding attendance at clinics and hospitals due to infection concerns [8]. The problem is worse for researchers based in tertiary medical centres, such as university hospitals, whose work relies on voluntary participation. This situation is unlikely to change in the near future, leading to under-diagnosis and later diagnosis, with all of the attendant costs to patients, their families, healthcare systems and society [9].
Computerised neuropsychological assessment has been in existence for over 20 years and has promised to provide a solution to this problem. Computerised tests address the problems of administration and interpretation of neurocognitive tests because they can provide standardized and accurate delivery and reporting of cognitive function [10-12], but to date they have failed to provide the complete solution that is needed for screening at scale. Often such tests are expensive, require a physician or assistant to administer, are delivered via desktop or hardware solutions that must be separately purchased, and/or may not possess the clinical validity and patient experience necessary to ensure widespread uptake.
A new approach to digital neurocognitive assessment involves the delivery of app-based tests via mobile phone. More than 3 billion people worldwide own or use a smartphone, and this number is forecast to grow by several hundred million in the next few years. Contrary to popular perception, older people are increasingly adopting smartphones: nearly three-quarters (74%) of Americans aged 50-64 are smartphone owners, as are 42% of those 65 and older [13]. In China, there is widespread acceptance and adoption of smartphones for health applications [14], with over 85 percent of elderly mobile users having downloaded more than 20 apps, whilst half have more than 30 apps installed on their phones. Among middle-aged and elderly internet users (over 40 years old), 65.7 percent spend at least a quarter of their free time on mobile phones every day, and nearly 30 percent spend more than half of their free time on their phones [15].
This study is a pilot to investigate the diagnostic validity of Savonix, a mobile/tablet-based neurocognitive assessment system. We investigated the 11 tests of the Savonix system in older adults from Shanghai, China, in order to determine their sensitivity and specificity to MCI and AD and to examine their convergent validity in relation to standard paper-and-pencil screening tests: the MMSE, ADAS-Cog and CDR Sum of Boxes (CDRSB).
2. Methods
2.1 Participants
Forty-nine cognitively impaired older patients (mean age = 72 years) were recruited from Hua Shan hospital, affiliated with Fudan University, Shanghai, China. Twenty-eight patients were diagnosed with MCI, and 21 with mild AD. A slightly younger and larger cohort of 265 neuro-normal older adults (mean age = 62 years) was recruited for normative purposes from two community sites, Jian Ai Charity and Jinmei Care, both located in Shanghai, China. No participant reported color blindness (see Table 1).
2.2 Clinical diagnosis
Clinical cases were diagnosed according to National Institute on Aging (NIA)-Alzheimer’s Association guidelines [16,17]. Alzheimer’s dementia diagnostic criteria followed the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) [18], whereas Mild Cognitive Impairment (MCI) diagnostic criteria followed the Petersen (2004) criteria [19]. Diagnosis of AD or MCI was typically initiated following an informant-reported or self-reported cognitive change. History taking was conducted by an experienced clinician. The Chinese version of the Mini-Mental State Examination (MMSE) [20] was also administered to evaluate general cognitive level and to support diagnosis. Laboratory and imaging (MRI) examinations were used to rule out other neurological or non-neurological medical comorbidities, including, but not limited to, vascular diseases, trauma, and dementia with Lewy bodies. CSF and/or PET examinations are not routinely conducted for AD/MCI patients in the hospital.
3. Cognitive Assessments
3.1 Bedside mental status examinations
Besides the MMSE, clinical patients were administered the Alzheimer Disease Assessment Scale Cognitive subscale (ADAS-Cog), Chinese version [21], and the Clinical Dementia Rating (CDR), Chinese version [22], by an experienced clinician. All paper-and-pen tests, including the MMSE, were taken by patients at least 3 months prior to the mobile cognitive assessment.
All the participants were administered the Savonix mobile cognitive assessment in November and December 2016. For the purposes of initial validation, Savonix was administered by an experienced clinician administrator. The assessment included the following 11 cognitive subtests:
Immediate/Delayed Verbal Learning: This task measures immediate and delayed verbal memory. In 1916, Édouard Claparède published the Test de mémoire des mots (Test of Memory for Words). This test would become the basis for André Rey’s Auditory Verbal Learning Test [23], and subsequent tests of verbal learning, including the Savonix Verbal Learning Task. The task presents patients with a list of words, which are to be committed to memory. Patients are tested on their ability to remember the words immediately after the original list is removed, and then again after at least a 15-minute delay.
Verbal Interference Part 1/Part 2: This task is based on the Stroop effect [24], which has been widely used in clinical and research settings for the past 50 years as an accurate measure of cognitive control and processing speed. Participants were presented with color words such as red, yellow, green, or blue, which were printed in a different color of ink (e.g., red printed in blue ink). Participants were able to easily read the words but found it more difficult to name the color of the ink. Stroop observed that the learned response of reading the words “interfered” with the unfamiliar task of naming the ink color. Naming the ink color requires inhibiting the well-practiced, interfering tendency to read words. This task measures focus by assessing the subject’s ability to inhibit automatic and irrelevant responses.
Trail Making Part 1/Part 2: This task is based on the classic test developed by Reitan (1958) [25]. Part 1 of this task is a measure of visual scanning and information processing speed. The user is presented with a pattern of 13 numbers (1-13) on the screen and is required to touch the numbers in ascending sequence (i.e., 1, 2, 3...). As each number is touched in the correct order, a line is drawn automatically to connect it to the preceding number in the sequence. Part 2 of this task is a measure of cognitive flexibility and attention switching. The user is presented with a pattern of 13 numbers (1-13) and 12 letters (A-L) on the screen and is required to touch numbers and letters alternately in ascending sequence (i.e., 1, A, 2, B, 3, C...). As each number or letter is touched in the correct order, a line is drawn automatically to connect it to the preceding number or letter in the sequence.
Digit Span Forward/Reverse: The Digit Span task is similar to the Digit Span subtest of the Wechsler Adult Intelligence Scale [26]. The Forward Digit Span task consists of a number of trials in which a series of digits is presented at a constant rate on the device screen. Immediately after each trial, the user is required to enter the digits on a keypad in the order in which they were presented. In the Reverse Digit Span task, the user is required to enter the digits in reverse order. Sequence length varies between three and 10, with two trials for each length and with trials presented in ascending order of sequence length. The task ends when the participant fails two trials of any sequence length or when all trials are completed. This task is used as a measure of attention and working memory.
Maze Part 1/Part 2: The Savonix Maze task is similar to the procedure reported by Milner in 1965 [27], which is sometimes referred to as the Austin Maze and is sensitive to hippocampal, frontal lobe, and cerebellar lesions and/or degeneration. In the original maze, bolts were attached to a wooden board with no markings, and subjects were required to discover the path of the “maze” through trial and error. When subjects touched the bolts with a metal-tipped stylus, a sound would alert them if they were on the right path. Though the Savonix test is similar to the Austin Maze, the maze is displayed on a device screen and feedback is provided visually rather than auditorily.
Complex Figure: The Savonix Complex Figure task is based on the classical Rey-Osterrieth Complex Figure Test (ROCF) [29], a measure of visual spatial and constructional abilities as well as spatial memory, which has been used clinically to evaluate attention, planning, and executive function. The Savonix version is adapted for a touchscreen device. Similar to the ROCF, the outcome variable of the Savonix Complex Figure is the accuracy of the reproduced figure against the original figure, as measured by errors versus correct features. Following a delay period of at least 15 minutes, a free recall is administered in which subjects attempt to redraw the original design from memory. Both delayed cued and free recall memory paradigms have been identified as important measures in evaluating Alzheimer’s disease.
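Two of the task rules described above are concrete enough to sketch in code: the target order in Trail Making and the stopping rule in Digit Span. The following is an illustrative sketch only, not the Savonix implementation; the function names and the `is_correct` callback are hypothetical, and the stopping rule is read here as "both trials at one sequence length failed".

```python
import string


def trail_making_targets(part, n_numbers=13, n_letters=12):
    """Return the ordered targets a user must touch (hypothetical helper).

    Part 1: numbers only (1..13). Part 2: numbers and letters alternate
    (1, A, 2, B, ...), as described in the text.
    """
    numbers = [str(i) for i in range(1, n_numbers + 1)]
    if part == 1:
        return numbers
    letters = list(string.ascii_uppercase[:n_letters])
    targets = []
    for i, number in enumerate(numbers):
        targets.append(number)
        if i < len(letters):
            targets.append(letters[i])
    return targets


def run_digit_span(is_correct, min_len=3, max_len=10, trials_per_len=2):
    """Administer trials in ascending length and apply the stopping rule.

    `is_correct(length, trial)` is an assumed callback returning True when
    the participant reproduces the sequence correctly. The task stops once
    two trials at the same length are failed, or when all trials are done.
    Returns the log of (length, trial, correct) tuples actually administered.
    """
    log = []
    for length in range(min_len, max_len + 1):
        failures = 0
        for trial in range(trials_per_len):
            ok = is_correct(length, trial)
            log.append((length, trial, ok))
            if not ok:
                failures += 1
        if failures >= 2:
            break
    return log
```

For example, a simulated participant who reliably recalls sequences shorter than six digits would be stopped after the two failed trials at length six.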
The Savonix application was preinstalled on a 9-inch tablet. User information such as gender and education was entered by an administrator on the tablet, and the tablet was handed to the patient before the test started. Savonix tests were self-administered after user information was collected.
3.2 Statistical analysis
Statistical analyses were conducted using RStudio version 1.3.1073 with R version 4.0.2, pROC package version 1.16.2 [29], psych package version 2.0.9 [30] and dplyr package version 1.0.2 [31].
Demographic variables were compared in the AD, MCI, and neuro-normal groups by one-way Analyses of Variance (ANOVA) for age and education years, and a Pearson χ2 analysis for gender. MMSE total scores, ADAS-Cog total scores, and CDR Sum of Boxes were compared between AD and MCI subjects by t-tests. Univariate logistic regression analyses were applied for variables which showed statistical significance in the comparison tests (t-tests or χ2 analysis).
For each of the Savonix subtests, only one outcome variable was selected for subsequent analysis: ratio of incorrect responses for Immediate/Delayed Verbal Learning, Digit Span Forward/Reverse, Maze Part 1/Part 2, and Complex Figure; total reaction time for Verbal Interference Part 1 and Trail Making Part 1/Part 2; and ratio of incorrect responses divided by total reaction time for Verbal Interference Part 2.
Nine cognitive domain scores were also composed from the 11 subtests: Instant Verbal Memory, Delayed Verbal Memory, Information Processing Speed, Flexible Thinking, Working Memory, Spatial Memory, Attention, Focus, and Executive Function. The raw data were converted to inverse min-max normalized scores, which were used in all subsequent analyses. These scores were compared between AD, MCI, and neuro-normal participants by one-way Kruskal-Wallis rank sum tests, as several scores were non-normally distributed.
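The inverse min-max normalization mentioned above is not spelled out in the text. A plausible reading, given that the raw outcome variables are error ratios and reaction times (where lower is better), is that scores are rescaled to the range 0 to 1 and then flipped so that higher values indicate better performance. The sketch below illustrates that reading only; the function name and the choice of taking the minimum and maximum over the supplied sample are assumptions.

```python
def inverse_minmax(values):
    """Rescale to [0, 1] and invert, so the smallest raw value maps to 1.

    Assumed reading of "inverse min-max norm": 1 - (x - min) / (max - min).
    The min and max are taken over `values`; the reference sample used in
    the study is not stated in the text.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [1.0 - (v - lo) / (hi - lo) for v in values]
```

Under this reading, a participant with the fastest reaction time in the sample receives a score of 1 and the slowest receives 0.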
Convergent validity between the traditional paper-and-pen screening tests and the Savonix tests was measured using Spearman’s correlation. The correlation coefficients were calculated for the MCI and AD subjects using MMSE total scores, ADAS-Cog total scores, CDR Sum of Boxes (CDRSB), total Savonix 11 test scores, total Savonix 9 domain scores, and total Savonix 6 memory-related domain scores (Instant Verbal Memory, Delayed Verbal Memory, Information Processing Speed, Flexible Thinking, Working Memory, Spatial Memory).
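For readers unfamiliar with the statistic, Spearman’s correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal stdlib-only sketch of that definition; it is not the code used in the study (which relied on R), and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is unaffected by monotone rescaling of either score, which suits comparisons between instruments with different scales.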
Logistic regression analyses were conducted to evaluate the discriminatory ability of the Savonix tests relative to the traditional paper-and-pen screening tests (only for MCI vs. AD classification). Based on univariate logistic regression analyses of age and education years, models for Normal vs. MCI, Normal vs. AD, and Normal vs. MCI/AD discrimination were adjusted for age and education years, and models for MCI vs. AD were adjusted for education years. For the MCI vs. AD models, the full sample (21 AD subjects and 28 MCI subjects) was used to estimate one model. For the other discrimination models, the Normal subjects were down-sampled to balance the data: 1000 random draws of 30 Normal subjects (out of 265) were each combined with the 28 MCI subjects to estimate 1000 Normal vs. MCI models; 1000 draws of 20 Normal subjects were combined with the 21 AD subjects for the Normal vs. AD models; and 1000 draws of 50 Normal subjects were combined with the 49 MCI-or-AD subjects for the Normal vs. MCI/AD models.
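The down-sampling scheme can be sketched as repeated sampling without replacement from the Normal group, with the full clinical group retained in every draw. This is an illustration of the procedure as described, not the study code; the function name, the use of subject identifiers, and the fixed seed are assumptions.

```python
import random


def balanced_resamples(normal_ids, case_ids, n_normal, n_draws=1000, seed=0):
    """Yield (sampled_normals, cases) pairs for repeated model fitting.

    Each draw samples `n_normal` Normal subjects without replacement and
    keeps every clinical case. The seed is an assumption for repeatability.
    """
    rng = random.Random(seed)
    cases = list(case_ids)
    for _ in range(n_draws):
        yield rng.sample(list(normal_ids), n_normal), cases


# Group sizes from the text: 265 Normal subjects and 28 MCI subjects,
# with 30 Normal subjects drawn per model. The identifiers are placeholders.
normal_ids = list(range(265))
mci_ids = list(range(1000, 1028))
draws = list(balanced_resamples(normal_ids, mci_ids, n_normal=30))
```

Each element of `draws` would then supply the data for one of the 1000 Normal vs. MCI logistic regression models.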
The discrimination accuracies of the logistic regression models were evaluated using Receiver Operating Characteristic (ROC) analyses. The discriminant indices were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Area Under the ROC Curve (AUC). For the Normal vs. MCI, Normal vs. AD, and Normal vs. MCI/AD models, only those of the 1000 sampled models with statistically significant regression coefficients for the test scores were considered, and of those, the models with the minimum and maximum AUCs were selected for reporting discriminant indices [see Tables 2-5, Figure 1].
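The discriminant indices listed above follow from standard definitions, sketched below with the Python standard library. This is illustrative only: the study used the pROC package in R, the toy scores and labels are invented placeholders rather than study data, and the threshold rule shown (the cut-off closest to the top-left corner of the ROC plot, minimizing (1 - sensitivity)^2 + (1 - specificity)^2) is the usual definition of the top-left method referred to in the Results.

```python
def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def indices_at(scores, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when score >= threshold is positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def top_left_threshold(scores, labels):
    """Pick the observed score closest to the ROC plot's top-left corner."""
    def distance(t):
        m = indices_at(scores, labels, t)
        return (1 - m["sensitivity"]) ** 2 + (1 - m["specificity"]) ** 2
    return min(sorted(set(scores)), key=distance)


# Invented placeholder data: predicted probabilities and true labels (1 = case).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
best = top_left_threshold(toy_scores, toy_labels)
summary = indices_at(toy_scores, toy_labels, best)
```

Note that PPV and NPV depend on the proportion of cases in the sample, so values obtained from balanced down-sampled data would not transfer directly to a screening population with a different prevalence.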
4. Results
4.1 Participant characteristics
Table 1 lists the descriptive statistics of participants’ characteristics (age, gender, and education years) and of the traditional paper-and-pen screening tests, as well as the results of comparison tests and univariate logistic regression analyses.
Between-group one-way ANOVA revealed significant differences in age [F(1, 312)= 57.03, p<0.0001] and education [F(1, 312)= 13.45, p<0.001]. Bonferroni post-hoc tests showed significant differences in age between Normal vs. MCI (p<0.0001) and Normal vs. AD (p<0.0001). They also showed significant differences in education years for Normal vs. AD (p<0.0001) and MCI vs. AD (p<0.05). This means that the neuro-normal subjects were younger than the MCI and AD subjects, and that AD subjects had more education than neuro-normal and MCI subjects.
Univariate logistic analyses to discriminate the three groups (Normal, MCI, and AD) with age and education years suggested that the cognitive assessment scores need to be adjusted by age and education for models of Normal vs. MCI; Normal vs. AD; and Normal vs. MCI or AD discrimination. It also suggested that these scores need to be adjusted by education for MCI vs. AD discrimination.
Between-group t-tests showed significant differences in the MMSE, the ADAS-Cog, and the CDRSB scores between MCI and AD subjects [t(47)= 10.05, p<0.0001 for MMSE; t (47)= 3.27, p<0.01 for ADAS-Cog; and t(47)= 3.08, p<0.01 for CDRSB]. This means that MCI subjects showed better cognitive abilities than AD subjects.
4.2 Savonix test scores
Table 6 shows the variables used to score the 11 subtests, descriptive statistics of the min-max normalized scores, and the results of Kruskal-Wallis tests comparing neuro-normal, MCI, and AD subjects.
The Kruskal-Wallis tests showed significant differences between Normal, MCI, and AD subject groups in most of the scores except for Verbal Interference Part 1.
Post-hoc Mann-Whitney tests with Bonferroni correction showed the following pairwise differences:
Immediate Verbal Learning: Normal vs. MCI (p<0.01), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.05).
Delayed Verbal Learning: Normal vs. MCI (p<0.001) and Normal vs. AD (p<0.0001).
Verbal Interference Part 2: Normal vs. AD (p<0.001) and MCI vs. AD (p<0.01).
Trail Making Part 1: Normal vs. MCI (p<0.001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.001).
Trail Making Part 2: Normal vs. MCI (p<0.0001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.01).
Digit Span Part 1: Normal vs. MCI (p<0.01) and Normal vs. AD (p<0.01).
Digit Span Part 2: Normal vs. MCI (p<0.01), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.05).
Maze 1: Normal vs. AD (p<0.01).
Maze 2: Normal vs. MCI (p<0.001) and Normal vs. AD (p<0.0001).
Complex Figure: Normal vs. MCI (p<0.0001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.01).
Total 11 test score: Normal vs. MCI (p<0.0001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.001).
Total 6 domain score: Normal vs. MCI (p<0.0001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.001).
Total 9 domain score: Normal vs. MCI (p<0.0001), Normal vs. AD (p<0.0001), and MCI vs. AD (p<0.001).
These post-hoc results suggest that the 11 Savonix subtests differ in what they discriminate: some are sensitive only to differences between Normal and MCI/AD subjects, whereas others are sensitive across all types of discrimination.
4.3 Convergent validity
Correlation coefficients between the three types of Savonix assessment total scores and the three traditional screening tests are shown in Table 7; all were significant. The Savonix cognitive assessment showed weak to moderate correlations with the MMSE, ADAS-Cog, and CDRSB for the MCI and AD subjects.
4.4 Discriminant ability of Savonix 11 subtests
Figure 1 shows the result of the ROC curve analysis for the Savonix 11 subtests, MMSE, ADAS-Cog, and CDRSB, using scores adjusted for education years to discriminate MCI and AD. Table 2 shows the discriminant indices of those tests as well as the summary of the logistic regressions. The sensitivities, specificities, PPVs, and NPVs were determined by the top-left method. Setting aside the MMSE, which was itself used for diagnosis, the Savonix 11 subtests showed a higher AUC than the ADAS-Cog and the CDRSB, though the differences were not statistically significant in DeLong’s ROC comparison tests (p= 0.763 for 11 tests vs. ADAS-Cog, p= 0.670 for 11 tests vs. CDRSB).
Tables 3, 4, and 5 show the discriminant indices of the Savonix 11 test scores for detecting MCI from Normal, AD from Normal, and MCI or AD from Normal, respectively, as well as the summary of those logistic regression models. These results show that the Savonix 11 tests, when used together, have strong discriminant ability across Normal vs. MCI, Normal vs. AD, Normal vs. MCI or AD, and MCI vs. AD.
4.5 Selective use of Savonix subtests
The above results show that the 11 Savonix subtests differ in their ability to discriminate between different clinical conditions. For example, one subtest may be particularly effective for differentiating AD from Normal and MCI, but not for differentiating MCI from Normal.
We proceeded to test the hypothesis that a subset of the Savonix subtests is sufficient for identifying clinical conditions.
To test this hypothesis, we compared the AUCs of univariate logistic models of the 11 subtests for Normal vs. MCI, Normal vs. AD, Normal vs. MCI or AD, and MCI vs. AD against the minimum and maximum reference AUCs (MinAUC and MaxAUC).
The AUC for the univariate logistic model with Verbal Interference Part 1 was used as the MinAUC for all three types of discrimination: MCI vs. AD, Normal vs. MCI, and Normal vs. AD.
The MaxAUC was the AUC of the univariate logistic regression model for one of the total test scores: the univariate model of the total 6 domain score was used for the AD vs. Normal and MCI vs. AD discriminations, while the model of the total 11 subtest score was used for the MCI vs. Normal discrimination. Comparisons were performed using DeLong’s ROC comparison tests.
Tables 8, 9, and 10 show the AUC comparisons between the 11 subtests for MCI vs. AD, Normal vs. MCI, and Normal vs. AD discrimination. For each type of discrimination, we selected the subtests whose AUCs were significantly different (at p < 0.01) from the MinAUC but not from the MaxAUC. For MCI vs. AD discrimination, three subtests (Trail Making Part 1, Trail Making Part 2, and Complex Figure) were selected; for Normal vs. MCI discrimination, Trail Making Part 2, Maze Part 2, and Complex Figure were selected; and for Normal vs. AD discrimination, two subtests (Trail Making Part 2 and Complex Figure) were selected.
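The selection rule can be expressed compactly once the DeLong comparison p-values are available. The sketch below illustrates only the filtering step: the p-values are invented placeholders, the subtest labels are hypothetical, and applying the same 0.01 threshold to the MaxAUC comparison is an assumption, since the text states the threshold explicitly only for the MinAUC comparison.

```python
def select_subtests(p_vs_min, p_vs_max, alpha=0.01):
    """Keep subtests whose AUC differs from MinAUC but not from MaxAUC.

    `p_vs_min` and `p_vs_max` map subtest name -> p-value from the ROC
    comparison against the MinAUC and MaxAUC reference models. Using the
    same alpha for both comparisons is an assumption.
    """
    return [
        name
        for name in p_vs_min
        if p_vs_min[name] < alpha and p_vs_max[name] >= alpha
    ]


# Invented placeholder p-values for three hypothetical subtests.
p_min = {"subtest_a": 0.001, "subtest_b": 0.200, "subtest_c": 0.005}
p_max = {"subtest_a": 0.500, "subtest_b": 0.600, "subtest_c": 0.001}
selected = select_subtests(p_min, p_max)
```

In this placeholder example only `subtest_a` is retained: `subtest_b` does not beat the MinAUC reference, and `subtest_c` falls significantly short of the MaxAUC reference.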
Total scores of the selected subsets of subtests were evaluated for their discriminant ability in the same way as the total 11 subtest scores, to detect AD from MCI, MCI from Normal, and AD from Normal, respectively. As seen in Tables 2, 3, and 4, these combinations of subtests discriminate comparably to the full set of 11 Savonix subtests.
5. Conclusion
This study investigates the diagnostic validity of Savonix, a mobile/tablet-based neurocognitive assessment system, in a Chinese population. The investigation comprised three parts: 1) subtest-level discrimination between neuronormal, mild cognitive impairment, and Alzheimer’s disease cases; 2) convergent validity between the Savonix assessment total scores and the traditional paper-and-pen screening tests; and 3) ROC analyses for the total scores of the entire Savonix assessment and the selective subtests.
Of the 11 subtests in the Savonix assessment tool, seven were scored using accuracy variables (i.e., incorrect response rates), three using speed variables (i.e., reaction times), and one using a variable combining accuracy and speed.
In the comparison between neuro-normal, MCI, and AD groups, all seven accuracy-based subtests and all three speed-based subtests showed significant differences between groups, indicating that the tool has fine-grained discriminant ability even at the subtest level.
Comparisons between the total scores and the traditional paper-and-pencil screening tools showed a reasonable level of convergent validity. The relatively weaker correlation between the CDRSB and the Savonix assessment could be due to the scope of the CDRSB, which includes non-cognitive functions such as those related to community affairs, home, hobbies, and personal care [32] that are not currently included in the Savonix assessment.
The results of the ROC analyses indicate high discriminant ability of the Savonix assessment tool’s total scores across different types of discrimination: MCI vs. neuro-normal cases; AD vs. neuro-normal cases; MCI vs. AD cases; and MCI or AD (cognitively impaired) vs. neuro-normal cases. The maximum discriminant indices against physician diagnosis, such as AUCs, sensitivities and specificities, indicate that the Savonix assessment has among the best diagnostic sensitivities of the automated [33], mobile-based [34], app-based, or web-based [11] tools available today. For the screening of AD vs. MCI cases, the Savonix assessment also showed a higher AUC than traditional pen-and-paper screening tests such as the ADAS-Cog and CDRSB, although these differences were not statistically significant.
The present study also shows that selective application of the Savonix 11 subtests is possible when screening for different neurocognitive conditions. Two tests in particular, Complex Figure (which measures visuo-spatial episodic memory, general attention, planning, and executive function) and Trail Making Part 2 (which measures attentional switching), can manage all three screening types (AD vs. MCI, MCI vs. neuro-normal, and AD vs. neuro-normal) with diagnostic ability comparable to the use of all 11 subtests.
This data-driven finding is compatible with studies suggesting that tracking decline in executive functions is as important as tracking episodic memory when differentiating between normal ageing and neurocognitive conditions such as MCI and AD [35,36].
For MCI vs. neuro-normal screening, Maze Part 2 (which taps into visuo-spatial working memory and executive function) was selected. This is compatible with findings that characteristic deficits in visuo-spatial working memory and executive function may be key to distinguishing MCI from normal aging [37,38].
For AD vs. neuro-normal screening, Trail Making Part 1, which is relatively simple compared to the other two subtests (Complex Figure and Trail Making Part 2), was selected. Trail Making Part 1 measures information processing speed and simple attention. This is compatible with findings that the TMT is sensitive to amyloid burden and that tracking processing speed and executive function is important for identifying MCI conditions that progress towards AD [39].
The high screening accuracy of the Savonix assessment, even with selective use of subtests, strongly suggests that the assessment tool, which covers the major cognitive domains, can be adapted and abbreviated for cognitive screening at scale, which is crucial for early detection of MCI or dementia.
There are some limitations in this study. Sample sizes for the MCI and AD groups were relatively small, and the convergent validity examination was limited to clinical cases only. Subtypes of MCI, such as amnestic, non-amnestic, or multiple-domain MCI, were not examined. Future investigation is needed to address these limitations. Nonetheless, this preliminary study provides evidence for the diagnostic and convergent validity of a new mobile/tablet-based neurocognitive assessment, Savonix. Savonix has high potential to enhance early detection of cognitive impairment at scale by overcoming the traditional barriers to in-person cognitive screening in clinics and hospitals. With accumulating evidence that early identification [40], combined with appropriate intervention for modifiable factors, can alter the progression of cognitive change, the possibility of prevention of MCI and possibly dementia is emerging.
Competing Interests
The authors declare the following competing interests:
KS and SC are/were employees of Savonix Inc at the time of writing and retain stockholding in Savonix Inc. QZ has no competing interests.
Author Contributions
KS performed data analysis and prepared early drafts of the manuscript. QZ oversaw conception, design, and data collection, and revised the manuscript. SC oversaw analysis, manuscript preparation, and final revision.