1. Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental disability involving deficits in social communication and the presence of restricted, repetitive patterns of behavior [1], and difficulties with attention and motivation often co-occur [2,3]. The Centers for Disease Control and Prevention (CDC) estimated that one in 59 children has ASD [4]. Children with ASD struggle with emotion identification and recognition and have difficulty recognizing and interpreting facial expressions [5]. These skills are foundational to understanding the mental and emotional states of others and are critical for emotion regulation and social competence [6-8]. Poor emotion recognition in school-age children has been linked to difficulty relating to peers, increasing the risk of peer rejection, social isolation, and mood disorders [9]. Poor social skills can also contribute to underachievement in school and employment [10]. Given the rising prevalence of ASD and its significant impact on long-term functioning, effective early interventions targeting the core deficits of this disorder are crucial.
Social skills therapies for children with ASD have traditionally been led by a clinician in a one-on-one or group setting and involve either directly teaching social skills to the individual with ASD or training peers and family members to interact with the child more appropriately [11]. These types of therapies have been shown to be effective in improving emotion recognition [12,13], joint attention [12,14-17], social communication [12,18], reciprocal social interaction [12,19], and imitation [12,20,21]. However, these treatments require extensive time and resources that are in high demand [22-24]. Furthermore, human social interaction can be anxiety-provoking for children with ASD [25], and individuals with ASD often report that they prefer to interact with social robots rather than people [26]. Social robots may be more interesting, engaging, and motivating for this population [27], and the use of child-preferred, intrinsic reinforcers leads to improvements in social engagement [27,28]. Social robots for individuals with ASD have shown promise in improving imitation, engagement, attention, initiation of social interaction, turn-taking, joint attention, attention span, eye contact, child-led speech, and the use of novel social behaviors [27,29-38].
1.2 SAM robot
Cartoon-like or animalistic robots are more engaging for children with ASD; however, these robots typically offer only a limited range of facial expressions that do not generalize to the human face [37]. With these factors in mind, the Socially Animated Machine (SAM) was created. SAM resembles a stuffed-animal monkey with core features of a human face, including cartoon-like eyes, eyebrows, and a mouth (Figure 1).
SAM displays six facial expressions: happiness, sadness, anger, fear, surprise, and disgust (see Figure 2). Typically developing children were able to identify SAM’s emotions with high accuracy [39]. The SAM intervention is autonomous, and implementing multiple SAM robots in a clinical setting would allow therapists and teachers to serve more individuals using fewer resources. For additional details regarding the design and development of the robot, please refer to previously published work on this topic [39].
During the intervention sessions, SAM sat behind a touchscreen tablet and talked to the child while presenting response options on the tablet. Children were seated at a table facing the SAM robot and indicated their responses by touching the tablet. The SAM robot intervention consisted of five mini-games that targeted emotion recognition and identification. These games were introduced through a series of eight weekly lessons, each lasting 15 to 25 minutes. Games began with simple emotion modeling and matching tasks and increased in complexity across sessions, culminating in an emotion inference task. During this final task, the SAM robot told a short story that evoked a certain emotion and prompted the child to identify the emotion. For this task, the SAM robot did not model the relevant facial expression. Instead, the child had to infer the correct answer solely from the content of the story and select the appropriate emotion on the tablet. See Supplementary file for detailed session content. The control group also interacted with SAM, but the content of the sessions differed and was designed to be free of emotion-based content. Children in the control group selected dance moves on the tablet, and SAM performed the dance move. Previous research with SAM showed that children with ASD and average IQ were engaged, happy, and comfortable when interacting with the robot, but improvement in emotion identification accuracy showed a ceiling effect [39]. This study aimed to determine the effectiveness of the SAM robot intervention for children with ASD across a range of cognitive ability.
2. Materials and Methods
2.1 Participants
Participants were recruited from organizations that work with children with ASD, and flyers were posted at ASD-focused events and community centers. Children ages four to 14 years with a diagnosis of ASD and without uncorrected vision or hearing problems were eligible to participate. The primary investigator conducted a brief phone screener with 39 interested families to determine initial eligibility. Seven children did not meet eligibility criteria, three families chose not to participate due to living a substantial distance from the study site, and others were lost to follow-up. One child discontinued due to anxiety regarding the robot interaction. Twenty children (10 per group) were eligible to participate and completed the study.
All children who participated met classification for ASD. Age ranged from five to 14 years old, and there were nineteen males and one female. Sixteen participants were white and four were black. Cognitive ability level ranged from severely impaired (IQ ≤ 40) to high average (IQ ≥ 110), and receptive language skills ranged from severely impaired (standard score ≤ 40) to above average (standard score ≥ 115). Parent education was varied and ranged from high school to advanced degree. Fisher’s exact tests and independent samples t-tests revealed no differences between groups on demographic and descriptive measures.
2.2 Procedures
The protocol for this study was approved by the Institutional Review Board at the University of Alabama at Birmingham, and all participant information was stored securely. Parent consent was obtained for child participation, and assent was provided by children age seven or older who were cognitively able to assent. This study utilized a controlled trial design in which participants were assigned to the SAM robot intervention group or the SAM robot control group using a predefined algorithm aimed to match groups on IQ (Supplementary file). The study was conducted at the University of Alabama at Birmingham, and the robot-child interaction sessions took place in a setting convenient to the family, including the university, library, or community center.
All participants completed eight sessions. Session 1 included eligibility confirmation (ASD measures), completion of a demographic questionnaire, descriptive measures (IQ, receptive language), outcome measures (facial recognition, parent and teacher rating forms), and a robot interaction (emotion identification task). Participants were then assigned to either the intervention or control group. Children, parents, and teachers were blind to group membership. For the next six sessions, the intervention group played emotion games with the robot, and the control group played dance games with the robot. During the robot sessions, children were provided with visual supports (e.g., timer, picture symbols) as well as reinforcers (e.g., breaks, preferred food items) when needed to decrease anxiety and maintain engagement. Session 8 involved re-administration of outcome measures, the robot emotion identification task, and completion of enjoyment questionnaires. Following study completion, parents were debriefed and informed of group assignment, and control participants were given the option to complete the SAM intervention. No additional data were collected from control participants who chose this option.
2.3 Measures
Demographics: Parents completed a brief questionnaire about child and family characteristics. Child information included age, gender, and ethnic and racial identity. Family information included urban-rural classification, household income, parental education, marital status, and employment status.
ASD diagnosis: The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) [40] is an instrument used to aid in ASD diagnosis. It is a semi-structured observation and interaction session with an administrator and the child. Based on observed behaviors, the measure evaluates skills in the areas of social communication and restricted and repetitive behaviors and yields a diagnostic classification of autism, autism spectrum disorder, or non-spectrum. A comparison score indicates the level of ASD symptoms and ranges from one to 10, with higher scores indicating greater levels of ASD symptomatology. The investigator administering this measure was research reliable on the ADOS-2. The experimenter also completed a rating of diagnostic certainty on a scale from one to five, with four or five indicating a high level of confidence in the ASD diagnosis. Participants were considered to have an ASD diagnosis if they scored in the autism or autism spectrum range on the ADOS-2 and received a diagnostic certainty rating of four or five.
Cognitive ability (IQ): The Kaufman Brief Intelligence Test, Second Edition (KBIT-2) [41] assesses general cognitive abilities and generates verbal, nonverbal, and composite domain scores along with verbal and nonverbal age equivalents. Domain and composite scores are standard scores with a mean of 100 and a standard deviation of 15, with higher scores indicating better performance compared to same-age peers.
Receptive language: The Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4) [42] is a measure of receptive language. Individuals are presented with four color pictures on a page as response options. For each item, the examiner says a word, and the examinee responds by pointing to the picture that best illustrates the meaning of the word. Overall scores are standard scores with a mean of 100 and a standard deviation of 15, with higher scores indicating more-developed abilities relative to same-age peers.
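As a rough illustration of how standard scores on a scale with a mean of 100 and a standard deviation of 15 (the scale used by the KBIT-2 and PPVT-4) relate to z-scores and percentiles, the following sketch assumes that scores in the normative sample are approximately normally distributed. The function names are hypothetical and are not part of either instrument's scoring materials; published norm tables should be used for actual scoring.

```python
from statistics import NormalDist

# Assumed normative distribution: mean 100, SD 15 (as described in the text).
NORMS = NormalDist(mu=100, sigma=15)


def z_score(standard_score: float) -> float:
    """Distance from the normative mean in standard-deviation units."""
    return (standard_score - NORMS.mean) / NORMS.stdev


def approx_percentile(standard_score: float) -> float:
    """Approximate percentile rank under the normality assumption."""
    return 100 * NORMS.cdf(standard_score)


if __name__ == "__main__":
    for score in (40, 85, 100, 115):
        print(score, round(z_score(score), 2), round(approx_percentile(score), 1))
```

Under this assumption, a standard score of 70 lies two standard deviations below the mean and a score of 115 lies one standard deviation above it.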
Facial recognition: Facial recognition was measured using the Benton Facial Recognition Test (Benton), Short Form [43]. The Benton is a clinician-administered measure of facial recognition. The child is presented with a target face and chooses the correct match from an array of six photos. It was originally developed for use with individuals with traumatic brain injury, but it has also been used in recent research with individuals with ASD [44]. This study used the 27-item, short form version of this measure given the age range and expected attention span of the participants. Short form raw scores range from zero to 27. Severe impairment is defined as a raw score ≤ 17, 18 correct indicates moderate impairment, 19 items correct is borderline impaired, and scores ≥ 20 are in the normal range.
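The short form cutoffs described above can be expressed as a small lookup. This is only a sketch of the ranges as stated in the text; the function name is hypothetical and the code is not drawn from the test manual.

```python
def benton_short_form_category(raw_score: int) -> str:
    """Map a Benton short form raw score (0 to 27) to the ranges given in the text."""
    if not 0 <= raw_score <= 27:
        raise ValueError("Short form raw scores range from 0 to 27")
    if raw_score <= 17:
        return "severe impairment"
    if raw_score == 18:
        return "moderate impairment"
    if raw_score == 19:
        return "borderline impaired"
    return "normal range"  # 20 or more items correct


if __name__ == "__main__":
    for raw in (12, 18, 19, 24):
        print(raw, benton_short_form_category(raw))
```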
Social skills: Social skills were measured using parent and teacher questionnaires. Parents and teachers completed the Social Responsiveness Scale, Second Edition (SRS-2), Parent and Teacher Rating Scales [45]. The SRS-2 is a 65-item rating scale that focuses on the severity of social impairments common to individuals with ASD. This measure yields a composite total standard T-score, with a mean of 50 and a standard deviation of 10. Higher scores indicate greater impairment. Scores of 60 to 69 indicate mild impairment, and scores ≥ 70 indicate moderate to severe difficulties.
Emotion identification: Emotion identification accuracy was measured by participant responses to robot prompts during an emotion matching task with the robot. Children were asked to match SAM’s emotional facial expressions to the emotions displayed in schematic face drawings [46] (MATCH-D) and photos of human faces [47] (MATCH-F). Each of the six target emotions (happiness, sadness, anger, fear, surprise, and disgust) was presented four times, twice with schematic drawings and twice with photos of human faces. Accuracy for matching emotions was recorded via responses made on the touchscreen tablet, with possible scores ranging from zero to 24. A previous study showed strong test-retest reliability (r = .79) with a slight upward shift from Session 1 to Session 8 [48].
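The 24-point maximum follows from six emotions, each presented twice with schematic drawings and twice with photos. The sketch below illustrates that arithmetic and one way the tablet responses could be tallied. The trial order, data layout, and function names are assumptions made for illustration and do not describe the actual SAM software.

```python
EMOTIONS = ("happiness", "sadness", "anger", "fear", "surprise", "disgust")
# Presentations per emotion for each stimulus type, as described in the text.
PRESENTATIONS = {"MATCH-D": 2, "MATCH-F": 2}


def build_trials():
    """Return (target_emotion, stimulus_type) pairs; the order here is arbitrary."""
    return [
        (emotion, stimulus)
        for emotion in EMOTIONS
        for stimulus, repetitions in PRESENTATIONS.items()
        for _ in range(repetitions)
    ]


def score_responses(trials, responses):
    """Return the total number correct and the number correct per stimulus type."""
    per_type = {stimulus: 0 for stimulus in PRESENTATIONS}
    for (target, stimulus), response in zip(trials, responses):
        if response == target:
            per_type[stimulus] += 1
    return sum(per_type.values()), per_type


if __name__ == "__main__":
    trials = build_trials()
    print(len(trials))  # 6 emotions x 4 presentations
    print(score_responses(trials, [target for target, _ in trials]))
```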
Enjoyment: Child enjoyment was measured by child and parent questionnaires. The child’s enjoyment questionnaire consisted of two questions: (1) “How much did you like talking to the robot?” and (2) “How much would you like to talk with the robot again?” Each question was rated on a scale from zero to 10, with higher ratings indicating greater enjoyment and motivation to return. A picture of a thermometer was used to aid children in understanding the questions and completing the questionnaire. The parent enjoyment questionnaire was used to measure parent ratings of their child’s enjoyment of the SAM robot interactions. Three questions were each rated on a scale from zero to 10, yielding a maximum total score of 30 on this measure. The questions were worded as follows: (1) “My child enjoyed interacting with the robot,” (2) “My child was motivated to come to the robot sessions,” and (3) “My child would like to interact with the robot again in the future.” Scores were averaged across questions. For this study, ratings from 7 to 10 indicated high favorability, ratings from 4 to 6 indicated moderate favorability, and ratings from 0 to 3 indicated low favorability.
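A minimal sketch of the parent questionnaire scoring follows. The favorability bands in the text are stated for whole-number ratings, so the handling of averages that fall between bands (for example, 6.67) is an assumption here: the lower bound of each band is used as the cutoff. The function name is hypothetical.

```python
from statistics import fmean


def summarize_parent_enjoyment(ratings):
    """Return (total, mean, favorability) for the three parent ratings (0 to 10 each)."""
    if len(ratings) != 3 or any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("Expected three ratings between 0 and 10")
    total = sum(ratings)      # maximum of 30
    average = fmean(ratings)  # averaged across questions
    # Assumption: lower bounds of the stated bands act as cutoffs for averages.
    if average >= 7:
        favorability = "high"
    elif average >= 4:
        favorability = "moderate"
    else:
        favorability = "low"
    return total, average, favorability


if __name__ == "__main__":
    print(summarize_parent_enjoyment([8, 9, 10]))
```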
Level of improvement: Following study completion, individuals were categorized into subgroups according to their level of improvement: responders, non-responders, and disengaged. Responders were defined as individuals who improved 20% or more on emotion identification accuracy from session 1 to session 8. Non-responders were defined as those who improved less than 20% from session 1 to session 8. The disengaged group consisted of those who did not understand the task and responded randomly, performing at chance at session 1 and session 8. These individuals had poor attention and required significant redirection and reinforcement to remain seated and complete the tasks. Participants in the disengaged group were characterized by significant global delays impacting all areas of functioning, including cognitive ability, language skills, facial recognition, and social skills.
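The subgroup rule can be sketched as follows. Two points are assumptions rather than details given in the text: improvement is computed relative to the session 1 score (the text does not state whether the 20% is relative or in percentage points), and the judgment that a child performed at chance in both sessions is passed in as a flag rather than derived from a numeric cutoff. The function name is hypothetical.

```python
def improvement_subgroup(session1, session8, at_chance_both_sessions, threshold=0.20):
    """Classify a participant as responder, non-responder, or disengaged."""
    if at_chance_both_sessions:
        return "disengaged"
    if session1 == 0:
        # Relative change is undefined at zero; treat any gain as meeting the threshold.
        relative_gain = float("inf") if session8 > 0 else 0.0
    else:
        relative_gain = (session8 - session1) / session1
    return "responder" if relative_gain >= threshold else "non-responder"


if __name__ == "__main__":
    print(improvement_subgroup(10, 15, False))  # 50% gain relative to session 1
    print(improvement_subgroup(22, 23, False))  # small gain near the 24-point ceiling
    print(improvement_subgroup(4, 5, True))     # flagged as responding at chance
```

Under the relative-change assumption, a participant who starts near the 24-point maximum cannot reach the threshold, which is consistent with non-responders having the highest baseline abilities.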
2.4 Data analysis
Missing data: Three children were unable to complete the enjoyment questionnaire and one child could not complete the Benton Facial Recognition test due to cognitive ability and inability to respond. One parent questionnaire and three teacher questionnaires were not returned. All missing data were excluded listwise from individual analyses.
Analytic plan: Group differences between the intervention and control groups on enjoyment ratings were measured using the Mann-Whitney U test (due to violations of the assumption of normality). Tests of intervention effect on outcome measures were conducted using one-way ANCOVAs with post-test scores as the dependent variable and pre-test scores as the covariate. Exploratory analyses that followed consisted of within-group paired sample t-tests for each group to investigate change in outcome measures from session 1 to session 8.
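To make the ANCOVA step concrete, the sketch below computes the F statistic for the group effect on post-test scores with pre-test scores as the covariate. It compares a model with one common slope and separate group intercepts against a single regression line that ignores group. This is a textbook formulation written with the Python standard library, not the software used in the study; the data in the example are invented, and a p-value would additionally require an F distribution from a statistics package.

```python
from statistics import fmean


def _centered_sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxx, sxy, syy


def ancova_f(pre_by_group, post_by_group):
    """F statistic and degrees of freedom for the group effect, adjusting for pre-test."""
    k = len(pre_by_group)
    n = sum(len(group) for group in pre_by_group)
    # Full model: common slope, separate intercepts (pooled within-group sums).
    within = [_centered_sums(x, y) for x, y in zip(pre_by_group, post_by_group)]
    sxx_w = sum(s[0] for s in within)
    sxy_w = sum(s[1] for s in within)
    syy_w = sum(s[2] for s in within)
    sse_full = syy_w - sxy_w ** 2 / sxx_w
    # Reduced model: one regression line fitted to all participants.
    all_pre = [v for group in pre_by_group for v in group]
    all_post = [v for group in post_by_group for v in group]
    sxx_t, sxy_t, syy_t = _centered_sums(all_pre, all_post)
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t
    df_effect, df_error = k - 1, n - k - 1
    f_stat = ((sse_reduced - sse_full) / df_effect) / (sse_full / df_error)
    return f_stat, (df_effect, df_error)


if __name__ == "__main__":
    # Invented scores for two groups of three participants each.
    pre = [[0, 1, 2], [0, 1, 2]]
    post = [[0, 1, 3], [2, 3, 3]]
    print(ancova_f(pre, post))
```

With two groups of 10 participants and complete data, the error degrees of freedom in this formulation would be 20 - 2 - 1 = 17.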
To explore which participant characteristics were associated with a greater level of improvement within the intervention group, a series of one-way ANOVAs was conducted with baseline participant characteristics as the dependent variables across the factor of responder status (3 levels: responder, non-responder, or disengaged). Post-hoc testing was completed for significant or trending ANOVA results to further investigate group differences using Tukey’s test (when the assumption of homogeneity of variances was met) or the Games-Howell post hoc test (when the assumption of homogeneity of variances was not met).
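The omnibus one-way ANOVA used for the subgroup comparison can be sketched with the standard between-group and within-group sums of squares. The example data are invented, the function name is hypothetical, and the post-hoc tests (Tukey, Games-Howell) are not reproduced here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the F statistic and (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = fmean([value for group in groups for value in group])
    group_means = [fmean(group) for group in groups]
    # Between-group variation: group means around the grand mean, weighted by size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group variation: individual scores around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


if __name__ == "__main__":
    # Invented baseline scores for three subgroups.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]]))
```

For 10 intervention participants split across three subgroups, a measure with no missing data would have (2, 7) degrees of freedom.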
Regarding assumptions, for emotion identification accuracy, the data showed a ceiling effect, and the within-group residuals were not normally distributed, as measured by Shapiro-Wilk’s W statistic (p < .05). However, ANCOVA is generally considered robust to this violation, and the analysis proceeded as planned. Assumptions were met for all other analyses.
3. Results
Parent and child enjoyment ratings were high across both groups (Table 1), and enjoyment did not differ between groups (p > .05). No differences between the control group and intervention group were found on emotion identification accuracy, facial recognition, parent-rated social skills, or teacher-rated social skills (all p > .05; see Table 2 for unadjusted means).
Table 1 note: Maximum score is 10. Scores between seven and 10 indicate high levels of enjoyment and motivation.
Table 2 note: Emotion identification accuracy scores have a maximum of 24. Parent- and teacher-rated social skills are T-scores with a mean of 50 and a standard deviation of 10. A decrease in parent- and teacher-rated social skills indicates improvement.
Within-groups analyses indicated significant improvement on parent-rated social skills from session 1 to session 8 for both the intervention group (t(9) = -3.08, p < .05) and the control group (t(8) = -3.18, p < .05). Emotion identification accuracy improvement trended toward significance in both groups (Intervention: t(9) = 2.28, p = .053; Control: t(9) = 1.96, p = .082). Facial recognition and teacher-rated social skills did not differ significantly over time for either group (all p > .10).
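The within-group statistics above are paired sample t-tests, for which the degrees of freedom equal the number of complete pairs minus one (so t(9) corresponds to 10 pairs and t(8) to 9 pairs). The sketch below shows the calculation on invented data; the function name is hypothetical, and obtaining a p-value would require a t distribution from a statistics package.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(session1, session8):
    """Return the paired-samples t statistic and its degrees of freedom."""
    if len(session1) != len(session8):
        raise ValueError("Each participant needs a score at both sessions")
    differences = [post - pre for pre, post in zip(session1, session8)]
    n = len(differences)
    # Mean difference divided by its standard error.
    t_stat = fmean(differences) / (stdev(differences) / sqrt(n))
    return t_stat, n - 1


if __name__ == "__main__":
    # Invented scores for three participants.
    print(paired_t([1, 2, 3], [2, 4, 6]))
```

Because the difference is computed as session 8 minus session 1, a decrease in T-scores on the social skills measure produces a negative t value, as in the parent-rated results.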
Regarding subgroups of intervention participants, Figure 3 displays emotion identification accuracy results for individuals within the responder, non-responder, and disengaged groups over time. In general, non-responders had the highest baseline abilities, followed by responders and then the disengaged group. Refer to Table 3 for intervention subgroup means. The omnibus ANOVA indicated differences among intervention subgroups on receptive language (p = .01) and facial recognition (p = .02), and differences trended toward significance for nonverbal IQ (p = .07) and IQ composite (p = .09). No subgroup differences were seen on age, autism severity, verbal IQ, parent-rated social skills, teacher-rated social skills, or enjoyment ratings (all p > .10). Figures 4 and 5 illustrate differences between subgroups.
4. Conclusions
This study aimed to determine whether the SAM robot intervention was enjoyable, motivating, and effective for children with ASD across a range of cognitive ability. As hypothesized, both parent and participant ratings indicated high levels of child enjoyment, motivation, and willingness to interact with the robot again across all participants. There was no difference between groups on this measure. This is of particular importance when considering intervention methods for children with ASD. Given that engagement and motivation can be difficult in this population, and that children who are intrinsically motivated by the learning process will be more likely to benefit from it, continued pursuit of robot-based interventions with this population is a worthwhile endeavor.
Notably, one child who enrolled in the study experienced anxiety while interacting with the robot and discontinued participation following the first session. The comorbidity of anxiety disorder and ASD is high, and current estimates range from 42% to 79% [49]. Although precautions were taken during the study to decrease participant anxiety as much as possible, the SAM robot intervention is likely contraindicated for children who experience significant anxiety while interacting with the robot. Given that the other children enrolled in the study were able to attend all sessions and complete the study, in general, the intervention is feasible for this population.
Results indicated improvement within both groups over time on parent-rated social skills and emotion identification accuracy. Improvement within the control group over time was not anticipated. Although parents were blind to group membership, given the time and effort required for study completion, it is possible that parents were more likely to report social skills improvements due to a placebo effect, wherein they were hopeful that their child was in the intervention group and therefore perceived social skills improvement in their child. This may have resulted in inadvertently better ratings following study completion. A recent study using parent-rated social skills measures for children with ASD showed that parents reported social skills improvements over time for their children even when children were not enrolled in any type of intervention or treatment [50]. This indicates the importance of objective measures of social skills improvements over time when evaluating social skills interventions. Although parent ratings are a commonly accepted method to measure social skills improvements, they may not be a reliable way of measuring skill acquisition and social skills outcomes for this population. Additionally, it is possible that children may have been receiving concurrent social skills interventions, such as school or outpatient therapies, which may have resulted in social skills improvement outside of the study.
Emotion identification accuracy was an objective measure administered at session 1 and session 8. Unexpectedly, both the intervention group and the control group improved over time on this measure, despite the control group not having been exposed to the stimuli in the intervening weeks. This improvement may be explained by the control group being exposed to the task at baseline and receiving immediate feedback regarding the accuracy of their performance. They may have recalled the correct answers to the assessment eight weeks later and demonstrated practice effects. Future research should utilize novel stimuli to avoid practice effects, and accuracy feedback should not be given at baseline to either group. Alternative measures of emotion identification accuracy that have been normed for this population would be a beneficial addition to this type of research.
Further investigation of individual improvement on emotion identification within the intervention group revealed interesting and meaningful information regarding the characteristics of children with ASD who benefitted from the intervention. This intervention was most appropriate for individuals with mildly impaired to borderline cognitive ability and moderately impaired facial recognition skills at baseline. Additionally, the ability to attend to a task for at least 10 to 20 minutes was crucial.
For individuals with low average to average cognitive skills and language abilities, a ceiling effect was seen similar to the results reported in Koch (2017). Prior to study participation, individuals with well-developed cognitive skills had already mastered the six basic emotions on which the SAM intervention is based. However, given the high enjoyment and motivation ratings, the use of robot-based interventions should continue to be explored for this population, perhaps with more advanced social skills goals.
This intervention was not shown to be effective for individuals with low receptive language abilities and poor attention. Those who were unable to attend to the intervention tasks and did not consistently engage with the robot despite prompting from the researcher and external reinforcers (e.g., preferred food items) did not improve over time. The verbal language used in the robot intervention was not tailored to those with low receptive language abilities. Using this type of robot-based social skills intervention is unlikely to be effective for these children. Other therapeutic techniques to address attention and behavior difficulties, such as applied behavior analysis (ABA) [51,52], would be more appropriate for these children prior to participating in a robot- or technology-based intervention.
4.1 Limitations and future directions
Due to small sample size, this study was not adequately powered to find significant differences between the control group and the intervention group. However, the majority of the robot intervention literature with this population has utilized case study or case series designs of eight or fewer participants. Therefore, inclusion of 20 participants in a controlled trial is expected to contribute meaningfully to the field. Additionally, this sample only enrolled one female participant and parents tended to be well-educated and upper-middle class. Future research should attempt to reflect the gender ratio of males to females typically seen in ASD (3:1) [53] and should recruit a diverse sample that is more representative of the ASD population to more accurately generalize findings.
The design of this study is certainly a strength compared to other robot intervention studies for this population. The use of a control group that had equal exposure to the robot compared to the intervention group allowed for adequate blinding of families and teachers, whereas other similar studies have utilized a wait-list controlled trial in which parents, teachers, and participants were not blind to group membership. Future research should consider employing an A/B study design where control participants are not exposed to intervention stimuli at baseline. This design would allow all participants to complete the emotion intervention as well as the non-emotion, dance games with the robot. This would double the number of observations and increase power, allowing participants to be compared to their own performance at baseline. Additionally, researchers in this area should consider conducting additional randomized controlled trials in which participants and researchers are blind to group membership and should include different combinations of intervention dose and content to identify the critical elements of social robot interventions for this population.
Another strength of this study was the inclusion of children with ASD across a range of ability levels. Children with ASD and below average cognitive ability frequently are excluded from participating in research due to difficulty with recruitment and retention. Future intervention research studies should strive to include children with ASD and below average IQ. Although recruitment of these families was certainly a challenge, with adequate time, effort, and support, conducting research with this population is achievable and rewarding.
4.2 Implications
Overall, results indicated that the SAM robot social skills intervention is most effective for individuals with mildly impaired to borderline deficient cognitive and receptive language abilities. Children with low average to average skills had already mastered the basic emotion identification skills covered in this intervention, and children with severely impaired abilities were unable to understand and effectively participate in this intervention. Given the high levels of enjoyment and motivation reported while interacting with the robot and the identification of a subset of children for whom this intervention was most effective, robot-based interventions for children with ASD remain a worthwhile and exciting area of study as the field works to improve upon this modality of therapy delivery.
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
JL was responsible for the conceptualization of the project and was involved in carrying out the majority of this research including developing the research question and methods, recruitment, running study participants, scoring and data entry, data analysis, and writing. CB served as an undergraduate research assistant and assisted with recruitment, running study participants, scoring, and data entry. FB and MH provided guidance and support throughout the research process. Additionally, MH provided edits and assisted in creating the figures for this article.
Acknowledgments
The authors would like to thank the funding sources for this project: the National Center for Advancing Translational Sciences of the National Institutes of Health (TL1TR001418), the University of Alabama at Birmingham (UAB) Civitan International Research Center, and the UAB Civitan-Sparks Clinics Leadership Education in Neurodevelopmental and Related Disabilities (LEND) and University Centers of Excellence in Developmental Disabilities (UCEDD). These sponsors had no involvement in the study design; collection, analysis, or interpretation of data; the writing of the article; or the decision to submit the article for publication. Additionally, the SAM robot was initially conceptualized by Sarah Koch and the Social Technology for Autism Research (STAR) Lab, and Carl Stevens designed and built the SAM robot.