Qualitative research examining students' preconceptions of AI grading in Massive Open Online Courses (MOOCs)
HCI graduate class project with teammate Kevin Duong
Interview Design
Interviewing
Qualitative Analysis
Design Recommendations
User insights and design recommendations for MOOC designers
Massive Open Online Courses (MOOCs) are increasingly popular among students seeking to learn and improve their skills at a fraction of the cost of traditional university courses. However, MOOCs differ significantly from universities, most notably in the absence of dedicated instructors who evaluate student assignments.
In this graduate class project, graduate student Kevin Duong and I proposed that AI technology could be utilized to assess a large volume of student submissions in MOOCs, thereby addressing the quality gap between MOOCs and conventional classroom experiences.
We conducted interviews with eight students to explore their perceptions of a future where AI grades assignments in various MOOC settings. From this qualitative process, we synthesized a set of recommendations for effectively implementing AI grading in MOOCs, taking into account students' concerns and perspectives on the use of AI.
Massive Open Online Courses (MOOCs) are online courses designed for unlimited participation, with platforms like Coursera and Udemy serving as notable examples (Zheng et al., 2015).
In 2021, over 221 million users around the globe took advantage of MOOCs to enhance their skills and explore a wide range of educational subjects (Class Central, 2021). These courses enable learners to access a vast array of topics at their own pace, overcoming the financial and physical barriers often associated with traditional educational institutions.
In a MOOC, students generally learn by progressing through modules, which consist of pre-recorded lessons from the course instructor and various assignments. Often, though not always, students must achieve a certain score on an assignment to advance to the next module.
Assignments can take various formats and may be graded either objectively or subjectively. Grading often happens automatically or through peer assessment, where MOOC participants evaluate each other's work before advancing. In some cases, assignment completion is assessed solely based on participation.
MOOC enrollment can range from hundreds to tens of thousands of students per cohort, making it logistically and financially infeasible for human instructors to assess all the work.
To gather qualitative insights from students, we interviewed 8 participants recruited via convenience sampling from Iowa State University or through personal connections. To qualify, participants had to meet the following criteria:
1. Be a young adult aged 18 to 35
2. Have taken, or be currently enrolled in, a MOOC within the past 12 months
The interviews were semi-structured and lasted about 45 to 60 minutes. They took place on Microsoft Teams and were recorded and transcribed using the platform.
We posed scenario-based questions to help students envision a potential implementation of AI grading. These scenarios were developed from existing HCI studies in the fields of AI and educational technology that speculate on the future use of AI for grading. Then, we asked students to share their thoughts and concerns about the presented scenarios.
Scenario 1
In the future, students taking the same course will be exposed to an assignment grading system powered by Artificial Intelligence.
a. What is the first image that comes to your mind?
b. Do you like/dislike the idea of AI evaluation? Why do you like/dislike this idea?
c. How would you feel about continuing your education through the MOOC platform?
d. What requirements would you have to adapt to this change?
Scenario 2
In the future, MOOCs will implement an AI-peer grading system. An essay will first be graded by 5 different peers based on a rubric. Then, an AI model will attempt to remove biases in peer grading by adjusting the peer grades based on how the class instructor has graded a very small number of essays. (A sketch of one possible calibration approach follows the questions below.)
1. Do you like/dislike the idea of AI evaluation shown in the scenario? Why do you like/dislike this idea?
2. What is your preferred choice among the three options:
a. Evaluations made by peers only
b. Evaluations made by AI only
c. Peer-AI collaboration
3. If peer-AI collaboration is selected, do you prefer:
a. Peer grading has a higher weight in your final score than AI.
b. AI grading has a higher weight in your final score than peers.
c. Peer and AI have equal weights in the final score.
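The scenario deliberately left the AI's mechanics vague, but to make it concrete, here is a minimal sketch of how such a calibration might work. This is purely illustrative and was not part of the study materials: the AI estimates each peer's systematic bias from a few instructor-graded anchor essays, then corrects that peer's grades on the remaining essays.

```python
# Hypothetical sketch of the "AI removes peer-grading bias" scenario:
# estimate each peer's systematic bias from a few instructor-graded
# anchor essays, then correct that peer's grades on the rest.

from statistics import mean

def peer_bias(peer_grades: dict[str, float],
              instructor_grades: dict[str, float]) -> float:
    """Average amount a peer over- or under-grades on anchor essays."""
    anchors = peer_grades.keys() & instructor_grades.keys()
    return mean(peer_grades[e] - instructor_grades[e] for e in anchors)

def calibrated_score(essay: str,
                     all_peer_grades: dict[str, dict[str, float]],
                     instructor_grades: dict[str, float]) -> float:
    """Mean of the bias-corrected peer grades for one essay."""
    corrected = [
        grades[essay] - peer_bias(grades, instructor_grades)
        for grades in all_peer_grades.values()
        if essay in grades
    ]
    return mean(corrected)

# Example: peers A and B graded anchor essay "e1" (instructor gave 80)
# and the ungraded essay "e2".
peers = {
    "A": {"e1": 90.0, "e2": 88.0},   # A grades ~10 points high
    "B": {"e1": 75.0, "e2": 70.0},   # B grades ~5 points low
}
instructor = {"e1": 80.0}
print(calibrated_score("e2", peers, instructor))  # (88-10 + 70+5)/2 = 76.5
```

A peer-AI weighting like the one in question 3 could then blend this calibrated peer score with an independent AI-generated score, e.g. final = w * peer_score + (1 - w) * ai_score.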
Scenario 3
In the future, you receive feedback from an AI grading system on a project for a MOOC class. The feedback is detailed, offering specific suggestions for improvement.
1. Do you like/dislike the idea of AI evaluation shown in the scenario? Why do you like/dislike this idea?
2. For the aspects you dislike, what would alleviate some of your concerns?
We also collected demographic and quantitative data about the participants to inform our analysis of their interview responses.
We analyzed the interview data by qualitatively coding the transcripts, using an inductive, bottom-up coding approach. Using the online coding tool Taguette, we coded the responses and maintained a codebook. We collaboratively coded the first interview and discussed our interpretations of the data. For the next three interviews, we coded individually before reconvening to align our codes. Finally, we coded the last four interviews individually and held one more meeting to ensure our codes were consistent.
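Our alignment meetings were discussion-based rather than statistical, but a common quantitative complement is an inter-rater agreement measure such as Cohen's kappa. The sketch below is illustrative only; the tags and excerpt data are hypothetical, not drawn from our actual codebook.

```python
# Illustrative Cohen's kappa between two coders' tags for the same
# set of interview excerpts (the tag names below are hypothetical).

def cohen_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    assert len(coder_a) == len(coder_b), "coders must tag the same excerpts"
    n = len(coder_a)
    # Raw proportion of excerpts where the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's tag frequencies.
    categories = set(coder_a) | set(coder_b)
    expected = sum(
        (coder_a.count(c) / n) * (coder_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

coder_1 = ["trust", "feedback", "bias", "trust", "appeal"]
coder_2 = ["trust", "feedback", "bias", "transparency", "appeal"]
print(cohen_kappa(coder_1, coder_2))  # 0.75; values near 1 mean strong agreement
```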
Our team believed that qualitative data is inherently subjective, and that we as coders have our own biases and worldviews that we can introduce into our analysis. To acknowledge my own subjectivity, I have included my positionality statement here.
Students were open to AI grading and often preferred it to peer grading
Participant 1:
Yeah, I truthfully think it would be good. The current [peer grading system] was kind of not really important that [the MOOC creators] almost glossed over.
Participant 3:
I don't think I will have any concerns [given] my experience with Coursera... if AI was a grader, I would at least feel reassured that [AI] would go through [answers] in a more detailed way and I'll get a more informative answer [compared to peer grading].
Participant 1:
I'm all for [AI grading]. I truly think it would be fantastic. I have concerns, but I think it would lead to a very great system.
Participant 3:
There's a lot of potential gray areas, but I think I would lean towards being in favor of [AI grading]. I'm curious to see how [AI grading] operates and then seeing if it will work.
Participant 8:
I would be willing to try it at the very least.
Students valued the potential for detailed, interactive feedback
Participant 1:
I would want [AI] to specifically tell me or maybe show me how to get better. Yeah, my initial reaction is it would provide feedback and quite a lot of feedback and I would really like that.
Participant 8:
I would want feedback on how I have been doing on the course so far, like maybe ask me to upload my [website] code assignment that I have been doing so far and look at my website. Maybe a suggestion on where I'm lacking and how the user would view my portfolio?
Participant 6:
Humans tend to be very short and concise in their responses, whereas AI can give you really detailed explanations and give you next steps.
Participant 1:
In my previous experience, [ChatGPT] has been useful and I've been able to learn a lot from using [AI] especially when you're able to ask follow up questions and just dive deeper and dive deeper, because for me, I learn best when I can ask questions.
Participant 3:
You could continue to probe and probe and probe... you can continue to ask why or have AI regenerate along the same lines.
Students saw AI as potentially more consistent and less biased than human graders
Participant 1:
I think a human grading can vary depending on their emotions of the day, or like the recent assignments that they have graded. Because I think human judgement is very subjective based on the recent things that you have seen and done.
Participant 3:
I think, as humans, it might be hard for us to spot biases because we ourselves have biases, but AI does not have that.
Students worried AI would miss nuance and unconventional answers
Participant 4 referenced a time when he wrote a thesis criticizing a newspaper while his other classmates were celebrating it:
If a student maybe has a different angle on something or wants to answer the question differently than what the AI system is programmed to do, then that's where I kind of have concerns with grading specifically.
Because I was so against the grain of what everyone else was doing and also what we learned about in the class, I just don't know if AI would be able to cognitively read that essay and then grade it effectively and fairly in comparison.
Participant 6:
I think AI does a really great job at answering very simple questions, but it falls short more often than not for me when the questions get complex.
Participant 4:
Is this AI going to understand what I'm talking about?
Transparency was a prerequisite for trust
Participant 1:
My biggest concern would be where [the AI grading system] is pulling [data] from… if AI is able to be that transparent, I would 100% trust the AI more just because it seems more serious and I actually know where it's coming from.
Participant 2:
If they're doing this AI grading system, I would want the instructor to actually show me how it works and there should be a discussion between all the students to make sure that they're not treated unfairly.
Participant 5:
Any lack of transparency would cause me to feel more cautious about trusting it.
Students wanted appeal mechanisms and human oversight
Participant 6:
I think there's also a feedback mechanism where you could either thumbs up or thumbs down, or appeal or report a [mistake], which I think at least provides feedback for the AI to get better and to improve.
Participant 5:
So long as I could either appeal or then better understand why I was losing points, I would be OK with that… that would absolutely provide more trust because it effectively gives me a fall back. So if I think that the AI has made a mistake, then I can be comfortable in knowing I have some form of recourse as opposed to just allowing it to happen effectively.
Participant 2:
The AI itself needs to be proctored, like someone would have to check up on it to see that it's performing at the same standard that they first set it up to be.
Participant 4:
I mean, I'm sure there is a margin of error that can happen, so I would hope that maybe the professor would just skim over what the AI did with the grading at the end, just to ensure that it graded correctly and not just entrust everything AI.
1. Be transparent about AI grading
a. Develop features that clarify the grading rubric and AI decision-making processes to students.
b. Include updates on AI enhancements and a straightforward disclaimer regarding its continual improvement.
c. Clearly disclose the use of AI systems in grading upfront.
2. Provide actionable, interactive feedback
a. Ensure AI provides specific, actionable suggestions for improving grades.
b. Include an AI interface that facilitates interactive learning through query-based engagement (follow-up questions).
3. Offer a clear appeal process
a. Implement a user-friendly system for students to challenge and review AI-derived grades, ensuring support through online forms or real-time assistance.
4. Pair AI learning with human oversight
a. Integrate mechanisms for the AI to learn from grading appeals and adjustments, supplemented by periodic human oversight to ensure accuracy and fairness. (A sketch combining this with the appeal process appears after this list.)
5. Train for fairness across diverse perspectives
a. Train AI on diverse datasets so it can appreciate varied perspectives and writing styles, thus enhancing its ability to fairly grade diverse student responses.
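As a purely hypothetical illustration of how recommendations 3 and 4 could fit together, the sketch below queues appeals for human review and turns resolved, overridden appeals into corrective training examples. None of the names or fields come from an existing system.

```python
# Hypothetical sketch tying together recommendations 3 and 4: appeals
# are queued for human review, and resolved appeals become labeled
# examples the grading model could later be retrained on.

from dataclasses import dataclass, field

@dataclass
class Appeal:
    submission_id: str
    ai_score: float
    student_reason: str
    human_score: float | None = None  # filled in after instructor review

@dataclass
class AppealQueue:
    pending: list[Appeal] = field(default_factory=list)
    resolved: list[Appeal] = field(default_factory=list)

    def file(self, appeal: Appeal) -> None:
        self.pending.append(appeal)

    def resolve(self, appeal: Appeal, human_score: float) -> None:
        appeal.human_score = human_score
        self.pending.remove(appeal)
        self.resolved.append(appeal)

    def retraining_examples(self) -> list[tuple[str, float]]:
        """Resolved appeals where the human overrode the AI become
        corrective training signal for the next model update."""
        return [(a.submission_id, a.human_score) for a in self.resolved
                if a.human_score is not None and a.human_score != a.ai_score]

queue = AppealQueue()
appeal = Appeal("essay-42", ai_score=68.0,
                student_reason="Rubric item 3 was satisfied")
queue.file(appeal)
queue.resolve(appeal, human_score=75.0)
print(queue.retraining_examples())  # [('essay-42', 75.0)]
```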
This was my first extended qualitative research study, and I learned a great deal about the process. This project taught me how demanding it can be to apply qualitative research rigorously, and the responsibility that comes with interpreting others' perspectives and narratives accurately.
With my quantitative background, it sometimes felt “wrong” to explore specific aspects in some interviews but not in others. I learned to channel my genuine curiosity and delve deeper into relevant topics as they arose. I also came to understand that the true aim of qualitative research is to cultivate credible, rich narratives and experiences, which can vary greatly among individuals. Consequently, the exact interview questions do not need to be identical for everyone.
Due to our limited schedules as students, Kevin and I chose to do more individual coding rather than collaborative coding. In hindsight, we spent far more time than intended in alignment meetings to reach consensus. We could likely have saved time by engaging in more collaborative coding sessions initially. In the future, I plan to embrace collaborative coding more, especially for the first few interviews.
(1) Class Central. (2021). MOOC statistics and trends 2021. https://www.classcentral.com/report/moocs-stats-and-trends-2021/
(2) Zheng, S., Rosson, M. B., Shih, P. C., & Carroll, J. M. (2015). Understanding student motivation, behaviors, and perceptions in MOOCs. Proceedings of the 2015 ACM International Conference on Computer-Supported Cooperative Work and Social Computing (CSCW '15), 1882–1895. https://doi.org/10.1145/2675133.2675217
(3) Baker, R. S., & Inventado, P. S. (2014). Educational data mining and learning analytics. CiteSeerX. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=fe102164f3c7645cfc6bbff58bc70440570ada74
(4) Conijn, R., Kahr, P., & Snijders, C. (2023). The effects of explanations in automated essay scoring systems on student trust and motivation. Journal of Learning Analytics, 10(1), 37–53. https://doi.org/10.18608/jla.2023.7801