Artificial intelligence and tools like ChatGPT are rapidly developing technologies, which can affect how they are used and the policies that govern them. This guide will continue to be updated as new resources and ideas are published. Students should check with their professors and relevant college departments for guidelines.
"Generative AI (GenAI) is an artificial intelligence (AI) technology that automatically generates content in response to prompts written in natural-language conversational interfaces. Rather than simply curating existing webpages by drawing on existing content, GenAI actually produces new content. The content can appear in formats that comprise all symbolic representations of human thinking: texts written in natural language, images (including photographs, digital paintings and cartoons), videos, music and software code."
"GenAI is trained using data collected from webpages, social media conversations and other online media. It generates its content by statistically analysing the distributions of words, pixels or other elements in the data that it has ingested and identifying and repeating common patterns (for example, which words typically follow which other words). While GenAI can produce new content, it cannot generate new ideas or solutions to real-world challenges, as it does not understand real-world objects or social relations that underpin language. Moreover, despite its fluent and impressive output, GenAI cannot be trusted to be accurate" (UNESCO, 2023, p. 8).
Source: United Nations Educational, Scientific and Cultural Organization (UNESCO). (2023). Guidance for generative AI in education and research.
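The pattern-repeating idea described in the quotation above (identifying which words typically follow which other words) can be illustrated with a toy bigram model. The corpus and function names below are invented for illustration; a real system trains on billions of words and uses far more sophisticated statistics:

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; real models ingest billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (bigram statistics).
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the word that most often follows `word` in the corpus."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Note that the model "knows" nothing about cats or mats; it only repeats the statistical patterns in its training text, which is why fluent output can still be wrong.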
Generative AI is changing rapidly, and it is important to keep in mind that content created by AI tools is the output of a predictive computer model; that content is not necessarily accurate, unbiased, or up to date, and it may not be ethical or legal to pass it off as your own work.
Generative AI tools such as ChatGPT have several limitations, including:
| Limitation | Example |
|---|---|
| Inaccuracies or "hallucinations" | There are many reports of false information in responses. Tools built around large language models use words to "predict" accurate information and can make mistakes. EXAMPLE: ChatGPT can produce "fake" or "made-up" citations when asked to provide a list of sources on a topic. |
| Not up to date | Unless the tool is actively connected to the web, it is not trained on current information, which limits its responses. EXAMPLE: When asked for current trends on a topic or who the current President is, certain tools cannot provide that information. |
| Bias in the training material | Because the tools are trained on materials written by biased humans, the responses may also be biased in some way. EXAMPLE: If asked to create images of CEOs or prisoners, the people in the images will reflect stereotypes like those in modern media, perpetuating harmful biases. |
| Transparency of information/source evaluation | We do not know exactly what information is in the training data, and the tools are not "searching" that data the way a search engine or database does; the content is completely stripped of context and authority. EXAMPLE: When you search Google for current trends in legal scholarship, you can evaluate whether the writer is a legal scholar, a law student, or someone completely removed from the field. |
| Information behind paywalls | Generative AI tools do not have access to information behind paywalls, which is frequently of higher quality than what is freely accessible to tools that search the web to provide responses. |
| Limits on conversations | Because they can be used for nefarious purposes, many generative AI tools have "guardrails" that prevent them from answering certain types of questions, including those related to politics, "nonsense," and other sensitive topics. |
The information on this page is adapted from "Student Guide to Generative AI" by Jessica Kiebler, Beekman Library, Pace University and "Generative Artificial Intelligence", Geisel Library, UC San Diego.
Artificial Intelligence - the use of computers to model the behavioral aspects of human reasoning and learning ("artificial intelligence"). AI uses techniques such as logic, pattern recognition, and machine learning to mimic human intelligence. AI can solve complex problems and create art, and it is being integrated into many applications across a variety of fields.
ChatGPT - A chatbot that generates human-like text in response to conversational prompts from a user. It is based on a large language model (version GPT-4 as of 14 March 2023) and deep learning. GPT stands for Generative Pre-trained Transformer, a type of neural network that can be trained on large amounts of data.
Hallucination - When an AI tool generates a confident response whose answer is false. This can occur when the information was not available in the training data, or when the training data was biased. The algorithm produces an answer with a high level of confidence and may go on to repeat that false answer. Hallucinations appear when an AI tool gives sources that do not exist or returns heavily skewed or biased answers, which is why it is important to fact-check AI responses.
Large language models (LLMs) - A large language model "is a deep learning algorithm that can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive datasets" (Lee). LLMs like GPT-3 use neural networks to train on large amounts of data and form predictions based on probability. They have a wide variety of uses, from text and image generation to software development and coding. ChatGPT is an example.
Machine learning - A type of artificial intelligence that allows computers to be trained from data rather than being explicitly programmed.
Neural networks - A type of machine learning algorithm that is modeled after the human brain. It consists of interconnected nodes that process information and make predictions. Neural networks are used in artificial intelligence applications to facilitate image recognition, natural language processing, and speech recognition.
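As a rough sketch of the "interconnected nodes" idea above, a single node computes a weighted sum of its inputs and passes it through an activation function. The input values, weights, and bias below are arbitrary illustrations; in a real network, training adjusts the weights across many layers of such nodes:

```python
import math

def neuron(inputs, weights, bias):
    """One node: weighted sum of inputs plus bias, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation, output in (0, 1)

# Arbitrary example values; training would adjust the weights and bias
# to reduce the error between the node's output and the desired answer.
output = neuron([0.5, 0.8], [0.4, -0.6], 0.1)
print(round(output, 3))
```

A full network chains many such nodes together, with the outputs of one layer serving as the inputs of the next.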
Prompts - The method of communicating with large language models like ChatGPT or Bing Chat to generate a response: the user formulates a question or statement to initiate a response from the AI tool.
Training datasets - The data used to train machine learning models. These datasets teach the model to make predictions based on probability. The quantity and quality of training data are critical factors in determining the reliability and performance of a machine learning model. A training dataset is a fixed set of data; new, often larger datasets are released periodically, prompting developers to train new and improved models. This is why AI tools like ChatGPT release new versions.
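To illustrate why the quantity of training data matters, the toy example below counts which words follow a target word in two invented "datasets." The datasets and helper function are hypothetical, but the effect is real: a model can only repeat patterns its training data contained:

```python
from collections import Counter

# Two toy "training datasets"; real datasets contain billions of documents.
small_data = "the sun is hot".split()
larger_data = "the sun is hot the sun is bright the sky is blue".split()

def follower_counts(words, target):
    """Count the words that follow `target` in a dataset."""
    return Counter(nxt for prev, nxt in zip(words, words[1:]) if prev == target)

# More data changes what the model can "predict" after the word "is".
print(follower_counts(small_data, "is"))   # only "hot" was ever seen
print(follower_counts(larger_data, "is"))  # "hot", "bright", and "blue"
```

Trained only on the smaller dataset, the model would claim everything "is hot"; the larger dataset gives it more patterns to draw on, which is one reason developers retrain models on bigger datasets.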
Sources
"Artificial intelligence." The Columbia Encyclopedia, Paul Lagasse, and Columbia University, Columbia University Press, 8th edition, 2018. Credo Reference, Accessed 29 Mar. 2023.
Lee, Angie. “What Are Large Language Models Used for and Why Are They Important?” NVIDIA Blog, 26 Jan. 2023, https://blogs.nvidia.com/blog/2023/01/26/what-are-large-language-models-used-for/.