Generating content with AI has been possible since the early 2010s, but it really took off in November 2022 with the release of OpenAI’s ChatGPT, built on the GPT-3.5 model. Suddenly, everybody seemed to be generating essays, job application letters, social media posts, and even poetry by simply typing a prompt into a large language model (LLM). Alongside the growth of artificially generated text, there’s been an equally rapid growth of AI detection tools. A recent study (PDF) by the Center for Democracy & Technology found that 68% of US educators have used AI detection tools to check students’ essays. So what are these tools, and how do they work?
Detectors are a way of using artificial intelligence to spot artificial intelligence. They use algorithms trained on both human-written and AI-generated text to analyze linguistic patterns like repetitive phrasing or unnatural word frequency. Some also look for inconsistencies or superficial reasoning. Most AI detection tools return a percentage score indicating how much of the text is likely human-written and how much is AI-generated. It’s a challenging job, given that LLMs are getting better all the time. We tested different AI detection tools using four pieces of text, two written by humans and two produced by ChatGPT, which you can read about in more detail in the Methodology section.
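To make the idea concrete, here is a toy sketch of the kind of surface signals mentioned above, repetitive phrasing and word-frequency patterns. This is not how any commercial detector actually works (they use trained models); it only illustrates the sort of features such a model might consume:

```python
# Toy stylometric signals -- an illustration only, not a real detector.
from collections import Counter

def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of 3-word phrases that occur more than once (repetitive phrasing)."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

def type_token_ratio(text: str) -> float:
    """Vocabulary diversity: unique words / total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0
```

A trained classifier would combine many such features (or, more commonly today, use a neural model directly) and map them to the percentage scores the tools report.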
If you need to check someone else’s text to see if it was written by a robot, check out our three best tools below. If, however, you want to use AI detection tools to test your own AI-generated work so you can submit it and not get caught, then beware. Our tests showed a huge difference in results across different applications. If whoever checks your work uses a different AI detection tool, you could still get caught out.
QuillBot is best for unlimited checks
QuillBot performed well in tests, identifying both examples of non-AI written content successfully, with scores of 100% human. It also recognized the AI-generated content, although it did think that 34% of the AI factual prose and 7% of the AI fiction writing were written by a human. It’s fast, free to use, and can check English, Spanish, German, and French text. You can either paste in text or upload DOCX or PDF documents. The free version limits you to 1,200 words at a time, but there’s no limit to how many checks you can run, so you can still check longer texts if you break them into chunks of 1,200 words or fewer.
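The chunking workaround above is easy to script. A minimal sketch, assuming a simple whitespace word count (QuillBot may count words slightly differently, so leaving some headroom is sensible):

```python
# Split a long document into chunks of at most 1,200 words each,
# so every chunk fits QuillBot's free-tier limit.
def chunk_words(text: str, limit: int = 1200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + limit]) for i in range(0, len(words), limit)]
```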
Alternatively, you can pay for a premium version ($8.33 per month, billed annually) for unlimited text length. QuillBot also has other features, including a Paraphraser, plagiarism checker, and content summarizer. As well as its percentage scores for “AI-generated” and “Human-written” content, there are two other options: “AI-generated and AI-refined” and “Human-written and AI-refined.” These are only available in the English language version. However, in my tests, the scores for these categories were 0% across all four documents.
Sapling is best for quick results
Smodin is best for non-fiction text
Runners Up: AI detection tools that didn’t quite make the top three
The AI detection tools to avoid
GPTZero allows you to scan text using its Basic Scan for free, with no sign-up required. You get three scans before you need to create a free account, although that account offers fewer features than many other tools’ free tiers. You can only see likely AI passages highlighted if you opt for a paid account. In tests, it performed well with the factual pieces, giving the AI-generated text and human-written text scores of 98% and 0% AI, respectively. However, it was unable to tell the difference between human and ChatGPT fiction. Douglas Adams’ writing received a score of 59% human, making it, in GPTZero’s opinion, only marginally more human than the AI-generated sci-fi, which got 58%.
Undetectable AI claims to check text against multiple AI detection tools, including QuillBot and Sapling. Yet its results didn’t match those we got when using the tools directly. Every one of the four test articles came back as human-written. I did manage to make it detect some AI content by pasting in some shockingly bad ChatGPT examples, but the writing needs to be unnatural and cliché-ridden before Undetectable thinks it was produced by AI.
The absolute worst AI detection tool we tested was Merlin AI, whose scores bore very little resemblance to how the writing examples were produced. My factual article resulted in a score of 40% AI, so it did at least consider it to be slightly more human than the GPT version, which scored 78%. When it came to detecting AI fiction, it was completely off-beam. ChatGPT’s story returned 45% AI-generated, while the preface to the “Hitchhiker’s Guide to the Galaxy” was, in Merlin’s opinion, 97% AI-generated, which is quite a feat for a book published in 1979.
Methodology
We only tested text-based AI detection tools, although similar image and video tools are also available. We focused on products that were free to use, although all came with advanced paid options. We used four pieces of text; two were factual articles, and two were pieces of fiction. For the factual content, I used the words from my entirely human-written LinkedIn article. I then generated an article of similar length with the same title on ChatGPT.
To see how good the tools were at spotting original fiction and AI-generated stuff, I used the preface to Douglas Adams’ “Hitchhiker’s Guide to the Galaxy.” Then I took the first eighteen words (“Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy…”) and told ChatGPT to use them as a starting point for the first 600 words of a sci-fi novel. I removed any framing text around ChatGPT’s answers but did not make any other changes.
The AI detection tools were scored on their accuracy. I also took into account how easy they were to use and gave higher rankings to tools without overly restrictive limits on their free plans. In judging the results, we considered false positives (human-written text reported as AI) a bigger problem than false negatives (AI content that goes undetected). As AI models continue to improve, some AI-generated content is bound to slip through the net, annoying as that might be. The consequences of human-written prose being flagged as AI, however, are far more serious, and the resulting penalties may be completely undeserved.
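The two error rates behind this judgment can be made concrete with a small sketch. The counts below are invented purely to show the arithmetic; they are not drawn from our tests:

```python
# False-positive rate: share of human-written texts wrongly flagged as AI.
# False-negative rate: share of AI-generated texts that slip through.
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    fpr = fp / (fp + tn)  # human text flagged as AI
    fnr = fn / (fn + tp)  # AI text passed as human
    return fpr, fnr

# Illustrative (made-up) counts: 100 AI texts, 100 human texts.
fpr, fnr = error_rates(tp=90, fp=2, tn=98, fn=10)
# fpr = 2/100 = 0.02, fnr = 10/100 = 0.10
```

A tool with these numbers misses 10% of AI text but falsely accuses only 2% of human writers, which is the tradeoff our rankings favor.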