AI bias, also called machine learning bias or algorithmic bias, refers to biased results caused by human biases that skew the original training data or the AI algorithm, leading to distorted outputs and potentially harmful outcomes. Generative AI tools reflect the biases present in their training data, which may originate from several sources: the people who provide or input data or content (personal bias), the origin of the data itself (machine bias), and the exclusion of underrepresented or marginalized communities (selection bias). Moreover, users may inadvertently reinforce their existing beliefs by rephrasing prompts until they receive the answer they most desire (confirmation bias). Because generative AI tools amplify and reinforce these biases, it is crucial to remain critical of their outputs.
The reflection and perpetuation of human biases in AI, along with the dominance of certain languages and cultures, pose significant challenges for creating inclusive and equitable AI systems. Researchers must consider the societal and cultural impacts of AI on diverse communities.
While generative AI tools can help users with tasks such as brainstorming new ideas, organizing existing information, mapping out scholarly discussions, or summarizing sources, they are also notorious for not relying fully on factual information or rigorous research strategies. In fact, they are known for producing "hallucinations," a term used in AI research to describe false information that the system generates, often in defense of its own statements. Oftentimes, these "hallucinations" are presented in a very confident manner and consist of partially or fully fabricated citations or facts.
Certain AI tools have even been used to intentionally produce false images or audiovisual recordings to spread misinformation and mislead audiences. Referred to as "deepfakes," these materials can be used to subvert democratic processes and are thus particularly dangerous.
The challenges of ensuring that AI-generated content is accurate and transparent, combined with the risk of spreading misinformation, necessitate rigorous verification methods and ethical guidelines in research. Increased transparency, both from individuals using AI to generate potentially false information and from the companies that train AI models, is crucial to stemming the flow of disinformation spread by these technologies. Inconsistent regulation has made this harder to achieve.
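Part of verifying an AI-generated citation is mechanical: confirming that a cited DOI actually resolves to a real record. The short Python sketch below is an illustration only, not a method prescribed by this guide; it assumes the third-party requests library and uses the public Crossref API as one of several possible lookup services.

    import requests

    def doi_exists(doi: str) -> bool:
        # Crossref returns HTTP 200 with metadata for a registered DOI
        # and 404 when it has no record of the DOI.
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # "10.1000/example-doi" is a hypothetical DOI used purely for illustration;
    # the DOI attached to a fabricated citation would typically come back False.
    print(doi_exists("10.1000/example-doi"))

A check like this catches only fabricated identifiers; a hallucinated citation can also attach a real DOI to the wrong claim, so the retrieved source still needs to be read.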
The management and protection of sensitive data by AI platforms are paramount, requiring research into advanced security protocols and ethical data handling practices.
Content and prompts submitted to a generative AI tool can be incorporated into the tool's training dataset and may be surfaced in some manner to other users. Many privacy policies state that data is automatically collected to enhance the company's training data, algorithms, and products (text, graphics, visual, audio, etc.). As with all terms and conditions, a generative AI tool's privacy policy can be updated at any time, so check it regularly to see how data is collected, used, and stored. It is best practice not to share sensitive information of any kind, whether personal or confidential.
While experts generally believe that training generative AI on varied corpus materials is fair use, the question will remain open until court cases address the issue head-on. General copyright exceptions and limitations, such as fair use, carve out acceptable uses of otherwise copyright-protected works, but contractual restrictions can curtail any or all of those exceptions and limitations. For example, most content on the World Wide Web is automatically protected by copyright, and many websites employ terms of service (contracts) to restrict the use of their content. If a website's terms of service state that its contents may not be used to train AI, then doing so may constitute a breach of contract. However, courts vary on whether they will enforce terms of service that lack a "click-through" license, on the grounds that no legally binding contract exists between the person browsing the web and the website owner.
Beyond just training AI, the use of existing content by AI raises intellectual property issues, and the potential for unethical applications, such as deepfakes, demands a focus on developing ethical AI frameworks.
The environmental footprint of training and deploying AI models requires research into more sustainable AI practices and technologies. Evidence shows that data centers' energy demands have increased as they begin to support the needs of high-powered AI models. The impacts on the power grid, the energy sector, and global water use suggest that much work remains to make this technology sustainable.
A content aggregator, or moderator, is an individual or organization that collects data. Content aggregators are often employees who train and improve a tool's algorithms. These employees, often described as "invisible workers" or "ghost workers," range from those who train, annotate, or label the data to those who enhance and test the algorithms or models, among other tasks. Outsourced and contract data workers are especially susceptible to the poor working conditions described below.
However, it should be noted that if a tool uses web scraping, that is, automatically collecting public information from the Internet, then data is gathered from anyone who has posted content online or has had their content published online. Even if training AI models on copyrighted material is ultimately deemed fair use, the practice can exploit artists and authors while devaluing their labor. In addition to equity and wage concerns, there are also repercussions for workers' mental health due to exposure to sensitive or disturbing content.
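To make the mechanism concrete, the following minimal Python sketch shows what web scraping looks like at its simplest: fetching a public page and stripping out its text. This is an illustration under assumed tooling (the third-party requests and beautifulsoup4 libraries), not a description of any particular company's pipeline, which operates at vastly larger scale.

    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com"  # placeholder address; any public page works
    html = requests.get(url, timeout=10).text

    # Parse the HTML and keep only the human-readable text,
    # the kind of content that can end up in a training corpus.
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(separator=" ", strip=True)
    print(text[:200])

Anyone whose writing is publicly reachable at a URL can be swept up by scripts like this, whether or not they ever agreed to have their work used for AI training.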
Many companies have hired, or are hiring, professionals to create original content and provide oversight, although this process is still under development.
While many have been concerned that AI could replace artists, there is as of now no clear evidence that this is happening, and research on this particular topic is lacking. However, artists, and those who work with them, are keenly watching the ways that AI-generated images, video, audio, and more are changing how artists must navigate the making and dissemination of their work.
Adapted from Maryland Institute College of Art Library, University of South Florida Libraries, Georgetown University Library, and University of Illinois Urbana-Champaign Library