I have now completed the development of an LLM-augmented, emotionally intelligent pedagogical AI conversational agent (called AIvaluate) and tested it with teachers. I have written up the study and submitted it for peer review to Education and Information Technologies (EIT), a Q1 Springer journal for educational technology. Stay tuned for the full study once it has been peer reviewed.
Abstract
Performance-based assessments (PBAs), such as viva voce exams and oral presentations, offer comprehensive evaluations of student knowledge and skills but place substantial burdens on teachers. Emotionally intelligent, LLM-augmented AI conversational agents present a potential way to alleviate this burden while maintaining the integrity and effectiveness of PBAs. This study investigates the use of AIvaluate, a pedagogical AI conversational agent designed to support teachers during oral PBAs by offering emotionally intelligent insights and streamlining the assessment process. A counterbalanced mixed-methods design was employed, with 35 teachers and students participating in both traditional face-to-face and AIvaluate-supported assessments. Data were collected through teacher-assigned grades, System Usability Scale (SUS) questionnaires, and qualitative open-response surveys. Quantitative and qualitative analyses were conducted to compare grading outcomes, system usability, and teacher preferences between the two assessment formats. Teachers issued significantly higher grades to students in AIvaluate-supported assessments (p = 0.033), a difference attributed to more structured, consistent questioning and emotional state reporting. The overall SUS score for AIvaluate indicated “acceptable” usability, surpassing the face-to-face format. Thematic analysis revealed key strengths of AIvaluate, including automated question prompts, real-time emotional insights, and the convenience of remote operation. However, teachers noted limitations, such as occasional technical issues and the lack of a personal connection compared to traditional face-to-face interactions. AIvaluate demonstrates the potential to reduce teacher burden in PBAs while maintaining usability and assessment quality. Its emotionally intelligent features and automated functionalities enhance the assessment process, offering a scalable, technology-driven solution for modern education.
Future research should explore further functionality to meet the diverse needs of teachers, address technical limitations, and assess long-term impacts on teacher satisfaction and student outcomes.
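For readers unfamiliar with the System Usability Scale mentioned in the abstract, the sketch below shows how a single respondent's SUS score is conventionally computed from the standard ten-item questionnaire (each item rated 1 to 5). It assumes the study used this standard scoring, which the abstract does not state; the function name `sus_score` and the sample responses are hypothetical and not taken from the study.

```python
def sus_score(responses):
    """Return a 0-100 SUS score from ten Likert responses (1-5).

    Assumes the standard ten-item SUS: odd-numbered items are positively
    worded, even-numbered items are negatively worded.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd (positively worded) items contribute rating - 1.
            total += rating - 1
        else:
            # Even (negatively worded) items contribute 5 - rating.
            total += 5 - rating

    # The summed contributions (0-40) are scaled to a 0-100 range.
    return total * 2.5


# Hypothetical respondent, not data from the study.
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))
```

An overall SUS score for a system is typically the mean of these per-respondent scores.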
Keywords
Artificial intelligence, Conversational agent, Chatbot, Assessment, Education, Performance-based assessment, Teacher burden, Assessor burden, Workload, Generative AI, LLM