In the fall of 2023, I asked the students in my History of Sciences and Technology course, part of the Bachelor of Arts in History program, to use ChatGPT for a summative assessment. The exercise made them aware of the limitations of this artificial intelligence tool.
In my opinion, technology in education is a complement to the pedagogical experience. It cannot replace it and must remain subordinate to it in all circumstances, especially since these technologies are constantly evolving: today, everyone is talking about ChatGPT; tomorrow, it might be a new tool… Tools like ChatGPT can help students perform certain tasks (suggesting research topics, translating text passages), but they can neither study a subject in depth nor suggest truly reliable sources to document it. Yet this is at the very core of the professional practice of historians. It therefore becomes essential to sharpen our students’ critical thinking skills so that they use these tools judiciously in their future profession.
With a 3-step exercise, I wanted to dispel the magical aura surrounding artificial intelligence and show my students that, even if their own work had weaknesses, once they knew their subject well, ChatGPT would offer them little new information.
A 3-step scientific biography
For my course, students had to produce a scientific biography. In the 1st step, they had to:
- choose a scientist
- conduct bibliographic research on the selected individual, using at least 3 peer-reviewed sources
- write a 2000-word biography containing:
  - personal biography elements (childhood, family background, social origins…)
  - scientific trajectory elements (training and studies, research, significant scientific contributions, etc.)
In their biography, students also had to include contextualization elements (such as historical context or political stance) to better understand who their scientist was and in which era they lived.
In the 2nd step, students had to repeat the same exercise, this time with ChatGPT. In class, I took the time to show my students how to formulate their prompt properly to obtain satisfactory results. Then I let them test ChatGPT. They had to ask the AI to:
- write a biography on the same scientific figure previously chosen (1500 to 2000 words)
- rely on scholarly references in writing its text and provide a bibliography
Students also had to specify the elements they wanted to appear in the biography generated by ChatGPT. Their interactions with ChatGPT and the bibliography it produced had to be included as an appendix to their assignment.
In the 3rd step, as the conclusion of their assignment, students had to conduct a critical comparison (500 to 750 words) of the 2 biographies. They had to analyze:
- discrepancies in factual information between the 2 versions
- the style adopted by ChatGPT
- the reliability and relevance of ChatGPT for such an exercise
Students’ observations
Inaccurate facts
The first thing my students noticed was that the algorithm made several factual errors.
ChatGPT told a student who had worked on the pioneer of computer science (and AI!) Alan Turing that the British mathematician had designed, rather than decrypted, the codes of the Enigma machine used by the Nazis during World War II: a major mistake. She also learned from ChatGPT that Turing had completed a research stay at the Institute for Advanced Study in Princeton between 1945 and 1946, whereas she had noted the dates 1936 and 1939 in her own research. When asked about this discrepancy, ChatGPT confessed to mixing up Turing with one of his contemporaries, the Hungarian mathematician John von Neumann, whose name is more often associated with game theory. Upon verification, von Neumann was indeed in the United States in those years, but he had been a permanent member of the Institute for Advanced Study since the 1930s: ChatGPT’s “correction” was hardly more reliable than its original claim.
Another student who had worked on the inventor Alexander Graham Bell noticed that ChatGPT attributed to him the invention of “visible speech,” whereas it should have credited his father, Alexander Melville Bell, for the invention of this phonetic system.
ChatGPT also taught a student that the French Nobel laureate in physics Louis Néel had been trained by Marie Curie before founding his own laboratory at the University of Strasbourg and then holding a teaching position at the Sorbonne… All of these claims were inaccurate!
In the case of little-known scientific figures, such as the 18th-century French astronomer Nicole-Reine Lepaute, ChatGPT was even more confused, generating entirely fictional sections of biographies.
Hallucinations
While these discrepancies made my students smile, they found it less amusing that ChatGPT’s uncontrolled bursts of creativity, or “hallucinations,” as they are called in AI jargon, extended to bibliographic references.
The student working on the physicist Louis Néel had struggled to collect sources to document his subject. He was therefore surprised to see that the biography generated by ChatGPT cited several academic works he had been unable to find, and even more astonished to discover that these references were, in fact, entirely fabricated.
Another student, who chose to explore the career of the physician Ignace Philippe Semmelweis, discovered not only that ChatGPT had suggested non-existent references that initially seemed plausible, but also that even the real references it provided mentioned Semmelweis only in passing. Interestingly, one of the books ChatGPT cited is even considered a low-quality reference by serious historians of the Hungarian physician.
A second observation, methodological this time: the conversational agent was not only prone to enriching the historiography with imaginary works; even when it proposed real references, the quality of its literature review could prove weak and irrelevant.
From a pedagogical point of view, I could have used these fabricated bibliographic references to explain to my students the “mechanics” behind ChatGPT. Its “hallucinations” are not solely due, as is often believed, to factual errors or contradictory and biased information in its training data: the false references it produced simply don’t exist anywhere on the internet.
These “hallucinations” are in fact inseparable from the tool itself, which remains a very powerful probabilistic text generator: it produces sentences based on the probability that words appear in similar sentences and contexts within its training data. In other words, ChatGPT is neither intelligent nor creative; it is an algorithm that relies on statistical methods to calculate probabilities and on a vast amount of training data to generate the text with the highest chance of “correctly” answering a question.
Even if it were trained on a “perfect” data corpus, the probability of ChatGPT generating errors wouldn’t be zero. It responds in probabilistic terms and not according to truth criteria; its “intelligence” is therefore only apparent, as is the case with all algorithms.
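For readers with a little programming background, this principle can be made concrete with a toy sketch of my own (a drastically simplified illustration, not ChatGPT’s actual architecture): a bigram model that picks each next word according to how often it followed the previous word in a tiny made-up corpus, with no notion of truth at all.

```python
import random
from collections import defaultdict, Counter

# A tiny, invented training corpus (for illustration only).
corpus = (
    "turing studied at princeton . "
    "turing worked at bletchley . "
    "turing studied at cambridge ."
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev, rng):
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = follows[prev]
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights)[0]

# After "at", the model asserts "princeton", "bletchley", or "cambridge"
# purely by frequency: truth never enters the calculation.
print(next_word("at", random.Random(0)))
```

Scaled up by many orders of magnitude and with far richer context, the same logic applies: the model outputs whichever continuation is statistically likely, with equal confidence whether it happens to be true or not.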
A simplistic approach
The 3rd limitation the students identified in ChatGPT’s prose concerns the very nature of a good scientific biography. Several noticed that the texts generated by the conversational agent often leaned toward hagiography, presenting scientists as solitary geniuses and obscuring the social and intellectual context that shaped their trajectories. The biographies produced by ChatGPT were smoothed over, leaving out the darker aspects of the scientists’ lives to offer a flawless portrait.
A student who had chosen Antoine Lavoisier as his subject observed that, unlike his own text, ChatGPT’s text failed to situate the French chemist’s discoveries on oxygen in time, relative to the experiments of his British contemporary Joseph Priestley, who relied on the concept of phlogiston. Putting these events in context is crucial for understanding the originality of Lavoisier’s scientific approach and the epistemological rupture it established with the qualitative approach that had prevailed in chemistry until then.
Even more striking is the case of Thomas Edison, whom ChatGPT presented as the “inventor” of the electric light bulb, a claim that matches popular belief but lacks historical depth. In reality, the development of the carbon-filament incandescent lamp was largely a work of collective invention, with Edison himself heading a team of inventors working in his laboratory in the late 1870s.
At that time, Edison was far from the only one working on the concept of the incandescent lamp, as evidenced by his association with the British electrician Joseph Swan. The success of his model also relied on other innovations, first and foremost the mercury pump developed by the German-British chemist Hermann Sprengel in 1865. To present Edison simply as an inventor of natural genius is to ignore that what made his role socially possible was the emergence of industrial research, which began to take shape as a collective activity within companies in the late 19th century.
Even though ChatGPT generated seemingly polished biographies, composed of well-structured sentences and free of grammar and syntax errors, it showed significant limitations in the accuracy of the facts presented, the relevance of the sources provided, and the problematization of its biographical subjects.
ChatGPT: a reliable tool?
My students greatly appreciated the exercise. At first, they were impressed, even somewhat stunned, by how quickly the conversational tool produced a “quality” biography when they had spent nearly a month on the same exercise. That initial reaction faded as they dug deeper into the work generated by ChatGPT. What puzzled them most was the confidence with which the AI presented inaccurate facts as reliable. They also realized how important it is to stay vigilant about the sources provided by ChatGPT, which sometimes seemed to take pleasure in inventing them: fabricated, yet plausible (for example, the author was real, but the title of the work was fake).
In addition, my students noticed that ChatGPT was even more inventive when the topic did not concern Anglo-Saxon culture. Since ChatGPT was trained predominantly on Anglo-Saxon (English-language) data, its biographies of individuals from other linguistic and cultural backgrounds were much less specific.
Some of my colleagues have taken a more positive approach to ChatGPT in their classrooms. They emphasize the time-saving aspect the tool can provide when it comes to summarizing a text, translating information, or organizing facts chronologically. However, my goal with this exercise was to encourage my students to use critical thinking when faced with information and to act as experts on a given topic. By scrutinizing the life of a scientist for nearly a month, they were able to clearly understand the factual gap between their biographies and those generated by ChatGPT.
In conclusion, this small pedagogical experiment shows that, before engaging in an in-depth conversation with ChatGPT on a given historical topic, reasonable knowledge of that topic remains an essential prerequisite to avoid being misled by AI’s many pitfalls.