Large Language Models (LLMs) like GPT are taking the world of education by storm. They can be (and often are) used to generate educational questions and possibly even entire lesson plans – for some examples, see insights from a recent study here.

What is largely missing, though, and what is of critical importance, is how pedagogically useful and applicable these models really are in educational contexts. We investigate these questions in our recent paper, “How Useful are Educational Questions Generated by Large Language Models?”. This work is a collaboration between Sabina Elkins (McGill University, MILA, & Korbit), myself, Jackie Chi Kit Cheung (McGill University & MILA), and Iulian Serban (Korbit).

In this study, we used controllable generation and few-shot learning to generate diverse educational questions at different levels of Bloom’s taxonomy and at varying difficulty levels across two subject domains. We then asked a group of teachers to assess these questions and determine whether, from their perspective, they are of high quality and viable for use in the classroom. A rough sketch of this kind of generation setup is shown below.
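To make the setup concrete, here is a minimal sketch of what few-shot, controllable question generation can look like in code. This is an illustration only, not the paper’s actual pipeline: the model name, prompt format, and few-shot examples below are assumptions, and the control signal here is simply the Bloom level and difficulty label included in the prompt.

```python
# A minimal sketch of few-shot, controllable question generation.
# Assumptions (not from the paper): the OpenAI Python client, the
# "gpt-4o-mini" model name, and the hand-written few-shot examples below.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Few-shot demonstrations pairing a context passage with a question at a
# target Bloom's taxonomy level; the level label acts as the control signal.
FEW_SHOT = [
    {
        "context": "Photosynthesis converts light energy into chemical energy.",
        "level": "remember",
        "question": "What does photosynthesis convert light energy into?",
    },
    {
        "context": "Photosynthesis converts light energy into chemical energy.",
        "level": "evaluate",
        "question": "How convincing is the claim that photosynthesis is the "
                    "most important biochemical process on Earth? Justify "
                    "your answer.",
    },
]

def generate_question(context: str, level: str, difficulty: str) -> str:
    """Generate one question at the requested Bloom level and difficulty."""
    # Render the few-shot demonstrations, then append the new request in the
    # same format so the model continues the pattern.
    demos = "\n\n".join(
        f"Context: {ex['context']}\n"
        f"Bloom level: {ex['level']}\n"
        f"Question: {ex['question']}"
        for ex in FEW_SHOT
    )
    prompt = (
        f"{demos}\n\n"
        f"Context: {context}\n"
        f"Bloom level: {level}\n"
        f"Difficulty: {difficulty}\n"
        f"Question:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

print(generate_question(
    "Mitosis is the process by which a cell divides into two identical "
    "daughter cells.",
    level="apply",
    difficulty="medium",
))
```

Varying the `level` and `difficulty` labels while holding the context fixed is what makes the generation controllable: the same passage can yield anything from a simple recall question to an open-ended evaluation prompt.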

Overwhelmingly, the answer is yes! The generated questions are relevant, grammatically correct, and answerable, and their quality and usefulness suggest they can genuinely benefit teachers and students alike.

The paper will appear in the proceedings of the 24th International Conference on Artificial Intelligence in Education (AIED 2023) later this year, but you can already access its preprint version here. Moreover, if you’d like to learn more about the study and the annotators, and to see examples of the generated questions, take a look at our GitHub page for this study.