From Plagiarism Detection To A.I. Detection: ChatGPT and Academic Grading
It’s that time of the semester again (at least in my university) when lecturers plant themselves for hours in their cubicles to grade a thousand papers whilst hoping to maintain a modicum of academic objectivity throughout the process. However, since ChatGPT was unleashed late last year (and, since then, Bard, Bing and so on), grading assignment papers nowadays is somewhat tricky.
For more than a decade, most assignments submitted online have been run through a system known as Turnitin, which is second to none when it comes to plagiarism detection apps (see note 1). This system is used in 99% of the top educational institutions all over the world (including the revered Oxbridge universities) and works by ‘bouncing’ students’ papers off north of 91 billion web pages (and counting).
I had the pleasure of speaking with one of the directors who visited from the United States a few years ago; he told me that every week millions of web pages are continually being added to their database. Heck, they’re even scanning documents found in centuries-old archives.
Basically, practically every written text on earth will eventually be part of the Turnitin database. Therefore, if a student copied and pasted from any document (even if they didn’t do it online!), it’s likely to show up in the originality report.
So all was well and dandy…until ChatGPT arrived on the scene.
Because (and this is really AI in Education 101 here) today the trending problem isn’t students copying material from the Web; it’s students using an A.I. bot to write their essays on their behalf. It really is as simple as keying the assignment question into ChatGPT’s chat box and repeating the process until the system regenerates and refines an answer you find satisfactory.
In one stroke this potentially renders “non-originality qua plagiarism detection” worthless.
This isn’t a flawless analogy, but imagine that in the past art forgers faked their art by copying specific paintings, so the moment the original painting was found a strong case could be made against the con artist. Today, however, the art fakers are copying bits and pieces from thousands of art pieces and combining them to produce artworks which cannot be ‘compared against’ any discrete painting.
Hence, it’s much harder for authorities to prove forgery. Thus any system reliant on checking potential forgeries against specific paintings out there could be made redundant.
Luckily for Turnitin, they immediately produced an Artificial Intelligence (A.I.)-detection function trained to detect content from the GPT-3 and GPT-3.5 language models (which include ChatGPT). Note that GPT-3 and ChatGPT are trained on a vast amount of text from the Internet (a ‘field’ Turnitin does have some expertise in, obviously), and they essentially take that large amount of text and generate sequences of words by picking the next highly probable word.
Large Language Models (LLMs) generate the next word (in a sequence of words) by predicting the most likely or best word based on the data they’re trained on (in this case, much of the Web). As such, “answers” copied from ChatGPT will tend to read differently from human writing which, as we know, tends to be awry, weird, inconsistent and idiosyncratic.
The above paragraph is the theory at least.
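To make the “next highly probable word” idea concrete, here is a minimal sketch in Python. It is a toy bigram counter, not how GPT-3 or ChatGPT is actually built: the corpus is a made-up string and the function names are hypothetical. Real LLMs use neural networks trained on vastly more text, but the principle of extending a sentence with a likely next word is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration (an assumption, not real training data).
CORPUS = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."


def build_bigram_counts(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, length):
    """Greedily extend `start` by always choosing the most likely next word."""
    out = [start]
    for _ in range(length):
        nxt = most_likely_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


counts = build_bigram_counts(CORPUS)
print(generate(counts, "the", 4))  # prints: the cat sat on the
```

Because this toy always takes the single most common follower, its output is smooth and predictable, which is roughly the kind of regularity a detector hopes to pick up on.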
Unlike with plagiarism detection, AI detection works on probabilities. With plagiarism detection, Turnitin is able to show you the exact page your student copied a paragraph from; with AI, Turnitin is limited to saying that based on the way this paragraph sounds it’s “likely” generated from ChatGPT or Bard or what-not.
But herein lies the rub: It’s practically impossible to prove a student copied from an AI chatbot. If it hasn’t happened already, I foresee future news reports where a student and a university fight each other in a lawsuit, with one claiming AI copying and the other denying it.
And, short of me being forced to show my ChatGPT dashboard’s response history, there’s really no way a lecturer can ‘prove’ I cheated. Inserting the assignment question into ChatGPT isn’t going to help because a) the system doesn’t always (if ever) generate the same response to the same question and b) a student can always ask more and more questions until the ‘final’ response is customized to his or her satisfaction.
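One common reason the same question yields different answers is that chatbots typically sample the next word from a probability distribution instead of always taking the top choice. The sketch below illustrates only that idea; the three words and their probabilities are invented, and the function names are hypothetical.

```python
import random

# Hypothetical next-word probabilities after some prompt; the numbers are invented.
NEXT_WORD_PROBS = {"therefore": 0.5, "however": 0.3, "moreover": 0.2}


def greedy_choice(probs):
    """Always pick the single most probable word (fully repeatable)."""
    return max(probs, key=probs.get)


def sampled_choice(probs, rng):
    """Pick a word at random, weighted by its probability (varies run to run)."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Asking the "same question" many times under each strategy.
greedy_runs = {greedy_choice(NEXT_WORD_PROBS) for _ in range(20)}
rng = random.Random(0)  # seeded only so the demo is reproducible
sampled_runs = {sampled_choice(NEXT_WORD_PROBS, rng) for _ in range(200)}

print(greedy_runs)           # a single word every time
print(sorted(sampled_runs))  # several different words
```

With sampling applied at every word of a long essay, two runs on the same assignment question will almost never match, so a lecturer re-entering the question cannot expect to reproduce a student’s submission.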
How do we move forward?
I don’t know if my approach is representative of the country’s lecturers who grade online assignments, but generally I will:
a) check the AI detection score
b) construct a broad assessment of the student and, critically, of whether the person is capable of writing what he submitted and, in the worst cases,
c) interview (that’s my polite replacement for ‘interrogate’) the student as to what he wrote, his sources, etc.
Hardly perfect, but apart from insisting only on closed-door written assignments and/or watching over students’ shoulders like a hawk all the time (both of which are highly impractical, especially in university settings), I don’t for now see any other way of ensuring academic integrity in our A.I. epoch.
The twist in the tale is that (and I’m just guessing here) there is probably a small group of students who are able to use and manipulate AI systems to their advantage. These students are in all likelihood saving a lot of time yet achieving high grades because, well, their lecturers simply haven’t kept up with these issues and said students are smart enough to ‘avoid detection’ when it comes to suspicion of cheating. Some of them are probably very bright individuals in general, participate actively in class discussions and more or less fit the profile of a top student.
Yet, and ironically, because they’re in tune with developments in the AI world, getting a chatbot to bang out a 5,000-word essay would be chicken feed for them. A double irony is, for me at least, if a student is able to utilize an LLM to save time writing an answer and if they can verbally defend what they write (as if they wrote it), and if their Turnitin AI detection scores remain low (see note 2), then has anything ‘bad’ actually occurred? Shouldn’t such students be instead commended for their resourcefulness?
So perhaps that’s the way forward: Work with students on harnessing the power of AI instead of fearing they’ll cheat with it. We probably don’t have many other options left.
Note 1: The more accurate term for what systems like Turnitin do is non-originality detection. Plagiarism is the illegitimate ‘borrowing’ of ideas (which can be rephrased in a manner immune to an online check) whereas what Turnitin does, strictly speaking, is ascertain how much of a student’s paper is copied from something online, i.e. how much of it is ‘non-original’.
Note 2: Turnitin itself notes that for an AI Detection score of 20% or less, the likelihood of false positives is higher. Hence, declaring academic misconduct in such cases needs to be done with extreme caution and only after an extensive investigation.
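To see why false positives call for this much caution, a little arithmetic helps. The sketch below applies Bayes’ rule to estimate how often a flagged paper really was AI-written. All three input numbers are invented for illustration and are not Turnitin’s figures; the function name is hypothetical.

```python
def probability_flag_is_correct(prevalence, detection_rate, false_positive_rate):
    """Share of flagged papers that really are AI-written (Bayes' rule).

    prevalence          -- fraction of all papers that are AI-written
    detection_rate      -- fraction of AI-written papers the detector flags
    false_positive_rate -- fraction of human-written papers wrongly flagged
    """
    true_flags = prevalence * detection_rate
    false_flags = (1 - prevalence) * false_positive_rate
    total_flags = true_flags + false_flags
    if total_flags == 0:
        raise ValueError("no papers would be flagged with these inputs")
    return true_flags / total_flags


# Invented inputs: 10% of papers AI-written, 90% of those caught,
# and 5% of honest papers wrongly flagged.
p = probability_flag_is_correct(
    prevalence=0.10, detection_rate=0.90, false_positive_rate=0.05
)
print(round(p, 3))  # prints: 0.667
```

Under these made-up numbers, one in three flagged students would be innocent, even though the detector looks accurate on paper. The real proportions are unknown to me, but the exercise shows why a detection score alone shouldn’t settle a misconduct case.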