Important Key Terms for AI Governance
The field of AI is advancing quickly across various sectors and industries, creating a challenge for business, technology, and government professionals who lack a common language and understanding of AI governance terms. Even the definition of "artificial intelligence" varies widely, from cinematic examples like HAL 9000 in "2001: A Space Odyssey," to creative applications such as Midjourney and DALL-E, to everyday uses like email autocorrect and mobile maps. AI's use cases and applications are rapidly expanding into all areas of life.
This glossary updates IAPP's Key Terms for AI Governance, originally released in October 2023. The revised version incorporates insights from various sources and feedback from leading AI governance experts, adding new terms and revising existing ones.
The original glossary was created using a wide range of sources, offering concise yet detailed definitions for common AI-related terms. This approach has been maintained for the latest updates, which include both policy and technical perspectives to enhance the discussion on AI governance. While some terms and definitions overlap, this glossary is distinct from the official IAPP Glossary of Privacy Terms.
Important Key Terms for AI Governance
1) Accountability
Developers and users of AI systems must ensure the system works in a way that is ethical, fair, and transparent, and that it complies with the relevant laws and rules. Accountability means that any actions, decisions, or results of the AI system can be traced back to the people or organizations responsible for it.
2) Accuracy
Accuracy refers to how well an AI system does its intended job. It measures how effectively the system produces correct results based on the data it receives. Accuracy is crucial for assessing how reliable an AI model is, particularly in areas where precision is essential, like medical diagnoses.
3) Active learning
Active learning is a part of AI and machine learning where an algorithm chooses specific data to learn from. Instead of using all available data, the model asks for more information on certain points that will help it learn more effectively.
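For illustration, here is a minimal sketch of one common active learning strategy, uncertainty sampling, assuming scikit-learn and a synthetic dataset (the model, batch size, and number of rounds are placeholders, not a prescribed setup):

```python
# A sketch of pool-based active learning with least-confidence uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
# Seed the labeled set with a few examples of each class; the rest form the pool.
labeled = np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]])
pool = np.setdiff1d(np.arange(len(X)), labeled)

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)           # low top-class probability = unsure
    query = pool[np.argsort(uncertainty)[-10:]]   # the 10 least certain points
    labeled = np.concatenate([labeled, query])    # in practice, a human labels these
    pool = np.setdiff1d(pool, query)
```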
4) Adaptive learning
Adaptive learning is a technique that customizes educational content to fit each student’s unique needs, skills, and learning speed. Its goal is to create a personalized learning experience that matches different learning styles and helps each student learn more effectively.
5) Adversarial attack
An adversarial attack is a safety and security risk in which someone deliberately feeds an AI model harmful or misleading data. These attacks can make the model produce wrong or unsafe results, which can lead to serious consequences. For instance, tricking a self-driving car's system into thinking a red light is green could endanger road safety.
6) AI assurance
AI assurance is a set of systems, rules, and practices designed to ensure that AI is safe, reliable, and trustworthy. This can include things like checking for conformity, assessing impacts and risks, conducting audits, issuing certifications, and following relevant standards.
7) AI audit
An AI audit is a thorough examination of an AI system to make sure it works correctly and meets all applicable laws, regulations, and standards. It helps spot potential risks and provides strategies to address them.
8) AI governance
AI governance is a collection of rules, policies, and practices at different levels—international, national, and organizational—that guides how AI technology is developed, used, and monitored. It helps ensure that AI meets stakeholders' goals, is used responsibly and ethically, and follows all relevant laws and regulations.
9) Algorithm
A procedure or set of instructions and rules designed to perform a specific task or solve a particular problem using a computer.
10) Artificial general intelligence (AGI)
AGI (Artificial General Intelligence) refers to AI that has human-like intelligence and can handle a wide variety of tasks and goals across different situations. It remains a theoretical concept and is different from "narrow" AI, which is designed for specific tasks or problems.
11) Artificial intelligence (AI)
Artificial intelligence (AI) is a general term for systems designed to perform or automate tasks using different computational methods. This can involve techniques like machine learning, where systems improve and adapt by learning from experience. AI is a branch of computer science focused on making computers act intelligently, which may also involve automated decision-making.
12) Automated decision-making
The process of making a decision by technological means without human involvement, either in whole or in part.
13) Bias
AI can have different types of bias. Computational or machine bias happens when a model makes systematic errors because of its assumptions or the data it uses. Cognitive bias involves individual mistakes in judgment or thinking, while societal bias leads to widespread prejudice or discrimination. These biases can affect how data is chosen for training the model and can influence outcomes, potentially threatening individual rights and freedoms.
14) Bootstrap aggregating
A machine learning technique that combines several versions of a model, each trained on different random portions of the data. This approach helps make the overall model more reliable and accurate.
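A minimal sketch of the idea, assuming scikit-learn decision trees as the base model (the number of trees and the dataset are arbitrary choices for illustration):

```python
# Bagging: many trees, each trained on a bootstrap sample, combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))    # sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate by majority vote across all trees (here on the training data for brevity).
votes = np.stack([tree.predict(X) for tree in trees])
predictions = (votes.mean(axis=0) > 0.5).astype(int)
```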
15) Chatbot
A type of AI that mimics human conversation and interaction by using natural language processing and deep learning to understand and reply to text or speech.
16) Classification model
A type of machine learning model that categorizes input data into different classes or groups.
*Sometimes referred to as classifiers.
17) Clustering
An unsupervised machine learning technique that finds and analyzes patterns in data, grouping similar data points into clusters based on their similarities.
*Sometimes referred to as clustering algorithms.
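As a small illustration, here is a sketch using k-means, one widely used clustering algorithm, assuming scikit-learn and synthetic two-dimensional data:

```python
# Clustering with k-means on synthetic two-dimensional data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose groups of points around different centers.
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:5])        # cluster assignment for the first few points
print(kmeans.cluster_centers_)   # roughly (0, 0) and (5, 5)
```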
18) Compute
Compute refers to the processing power available to a computer system, including hardware like the central processing unit (CPU) and graphics processing unit (GPU). It is crucial for tasks such as handling memory, storing data, running applications, rendering graphics, and supporting cloud computing.
19) Computer vision
Computer vision is a branch of AI that involves using computers to interpret and analyze visual data like images and videos. It’s used in applications such as facial recognition, object detection, and analyzing medical images.
20) Conformity assessment
An evaluation, usually carried out by an external party, of an AI system to check if it meets various requirements like risk management, data governance, record-keeping, transparency, and cybersecurity.
21) Contestability
The principle of ensuring that AI systems can be scrutinized and questioned by humans. This involves being able to challenge the decisions and actions of AI systems, which relies on transparency and supports accountability in AI governance.
22) Corpus
A corpus is a large set of texts or data that a computer uses to identify patterns, make predictions, or produce specific results. It can include both structured and unstructured data and may focus on a single topic or span multiple subjects.
23) Data leak
A data leak is an unintended release of sensitive or confidential information. It can happen due to weak security, human mistakes, misconfigured storage, or inadequate policies for data sharing. Unlike a data breach, a data leak is accidental rather than the result of a deliberate attack.
24) Data poisoning
An adversarial attack where a malicious user inserts false data into a model's training process to disrupt its learning. This intentional manipulation of the training data aims to introduce errors, causing the model to perform poorly and produce incorrect or harmful results.
25) Data provenance
Data provenance is the process of tracking and recording the history and origin of data from its creation and collection to its current state. It includes details about sources, processing methods, and involved parties, helping to ensure data integrity and quality. This process is crucial for transparency and governance, and it aids in understanding both the data and the overall AI system.
26) Data quality
Data quality refers to how well a dataset meets the needs and expectations for its intended use. It affects the quality of AI results and the performance of an AI system. High-quality data is accurate, complete, valid, consistent, timely, and suitable for its purpose.
27) Decision tree
A decision tree is a type of supervised learning model used in machine learning that maps out decisions and their possible outcomes in a branching diagram.
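A minimal example, assuming scikit-learn and its bundled iris dataset; export_text prints the learned branching rules as plain if/else text:

```python
# A small decision tree on the classic iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))   # prints the learned branching rules
```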
28) Deep learning
Deep learning is a branch of AI and machine learning that uses artificial neural networks to process complex data. It's particularly effective for tasks like natural language processing, speech recognition and image recognition.
29) Deepfakes
Deepfakes are audio or visual content that has been edited or created using AI techniques. They can be used to spread misleading or false information.
30) Diffusion model
A generative model for creating images that starts with random noise and gradually refines it to produce a realistic image based on a given prompt.
31) Discriminative model
A type of machine learning model that maps input features to specific categories and looks for patterns to differentiate between them. It's commonly used in tasks like text classification, such as determining the language of a text or spotting spam. Examples include neural networks, decision trees, and random forests.
32) Disinformation
Disinformation is audio or visual content that is deliberately altered or created to cause harm. It can spread through deepfakes made by people with harmful intentions.
33) Entropy
Entropy measures how unpredictable or random a set of data is in machine learning. Higher entropy means there is more uncertainty in predicting outcomes.
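In code, Shannon entropy of a label set can be computed as follows (a small sketch using numpy; the example labels are arbitrary):

```python
# Shannon entropy of a label distribution: H = -sum(p * log2(p)).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy([0, 0, 0, 1]))   # ~0.81 -- mostly predictable
print(entropy([0, 1, 0, 1]))   # 1.0 -- two equally likely classes, maximally uncertain
```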
34) Expert system
A rules-based AI that uses a knowledge base from human experts to mimic their decision-making skills in a specific area, such as diagnosing medical conditions.
35) Explainability
Explainability is the ability to clearly describe how an AI system produces a particular result or makes a decision. It's crucial for maintaining transparency and trust in AI.
36) Exploratory data analysis
Exploratory data analysis involves techniques used before training a machine learning model to gain initial insights into a dataset. This includes spotting patterns, outliers, anomalies, and relationships between variables.
37) Fairness
Fairness is the principle that AI systems should treat individuals and groups equitably, without creating or reinforcing unjust bias or discrimination. Because there is no single agreed-upon definition of fairness, what counts as fair typically depends on the context in which the system is used.
38) Federated learning
Federated learning is a machine learning method where models are trained locally on data from multiple devices. Instead of sending the actual data to a central server, only updates to the model are sent. These updates are combined at a central location to improve the global model, and this process repeats until the model is fully trained. This approach enhances privacy and security by keeping individual data on the local devices.
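A simplified sketch of the federated averaging idea using plain numpy and simulated devices; the linear model, learning rate, and round count are illustrative assumptions, not a production protocol:

```python
# Federated averaging (FedAvg) sketch: devices train locally, a server averages weights.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Five simulated devices, each holding its own private data.
devices = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    devices.append((X, y))

def local_update(w, X, y, lr=0.1):
    # One gradient step of linear least squares on the device's local data.
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

global_w = np.zeros(2)
for round_ in range(100):
    # Each device trains locally; only updated weights leave the device.
    local_ws = [local_update(global_w, X, y) for X, y in devices]
    global_w = np.mean(local_ws, axis=0)   # the server averages the updates

print(global_w)   # converges close to [2.0, -1.0]
```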
39) Fine-tuning
Fine-tuning is the process of taking a deep learning model that has already been trained on general data and further training it for a specific task using a smaller, labeled dataset. This involves adjusting the model to better perform on the specialized task based on the new data.
40) Foundation model
A foundation model is a large-scale model trained on a wide variety of data to support a range of capabilities like language, vision, robotics, reasoning, search, or human interaction. It serves as a base that can be adapted for specific applications.
41) Generalization
The ability of a model to grasp the patterns and trends in its training data and use that understanding to make predictions or decisions about new, unseen data.
42) Generative AI
Generative AI is a field that uses deep learning and large datasets to create content like text, code, images, music, simulations, and videos based on user prompts. Unlike discriminative models, generative AI generates new content rather than just predicting outcomes based on existing data.
43) Greedy algorithms
A type of algorithm that focuses on making the best decision for a specific step or moment based on the current information, without considering the overall long-term optimal outcome.
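A classic illustration is greedy change-making, which takes the largest coin that still fits at each step; as the final comment notes, the locally best choice is not always globally optimal:

```python
# Greedy change-making: always take the largest coin that still fits.
def greedy_change(amount, coins=(25, 10, 5, 1)):
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            result.append(coin)   # locally best choice at this step
            amount -= coin
    return result

print(greedy_change(63))   # [25, 25, 10, 1, 1, 1]
# Greedy is optimal for these denominations, but not for every coin set:
print(greedy_change(6, coins=(4, 3, 1)))   # [4, 1, 1], though [3, 3] uses fewer coins
```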
44) Ground truth
The ground truth is the known, accurate state of a dataset that serves as a benchmark for evaluating the quality of an AI system. It acts as the real-world reference used to measure the accuracy and reliability of the system's outputs.
45) Hallucinations
Situations where generative AI models produce outputs that seem believable but are actually factually incorrect or fabricated.
46) Human-centric AI
Human-centric AI focuses on designing, developing, and using AI systems in a way that prioritizes human well-being, values, and needs. The aim is to create AI that enhances and supports human abilities rather than diminishing them.
47) Human in the loop (HITL)
A design approach that ensures humans can oversee, intervene, interact with, or control how an AI system operates and makes decisions.
48) Impact assessment
An evaluation process aimed at discovering, understanding, documenting, and addressing the potential ethical, legal, economic, and societal impacts of an AI system in a particular use case.
49) Inference
A machine learning process where a trained model is used to make predictions or decisions based on new input data.
50) Input data
Data given to or collected by a learning algorithm or model to generate an output. This data is essential for machine learning models to learn, make predictions, and perform tasks.
51) Interpretability
Interpretability refers to designing models in a way that makes their reasoning and decisions understandable to humans from the start. Unlike explainability, which explains decisions after they are made, interpretability focuses on creating models with structures, features, or algorithms that are inherently easier to understand. These models are often specific to their domain and require considerable domain expertise to develop.
52) Large language model (LLM)
Large Language Models (LLMs) are a type of AI that uses deep learning to create models trained on vast text datasets. These models learn patterns and relationships between characters, words, and phrases to handle various text-based tasks. There are two main types: generative models, which predict text based on learned word sequence probabilities, and discriminative models, which classify text based on learned data features and weights. The term "large" indicates both the extensive datasets used for its training and the model's substantial capacity, measured by its number of parameters.
53) Machine learning (ML)
Machine learning is a branch of AI where algorithms learn from data and make decisions or predictions based on that learning. Instead of being explicitly programmed, these algorithms build models from training data to handle new data. The process includes collecting and preparing data, engineering features, training, testing, and validating the model. Machine learning is used in various fields like fraud detection, recommendation systems, customer service, healthcare, and logistics.
54) Misinformation
Misleading audio or visual content that is spread unintentionally, for example through deepfakes made by individuals who do not intend to cause harm.
55) Machine learning model
A model captures the patterns and relationships in data by processing a training dataset with an AI algorithm. It can then use this learned representation to make predictions or perform tasks on new, unseen data.
56) Model card
A brief document that provides details about an AI model, including its intended use, performance metrics, and how it performs under different conditions, such as across various cultures, demographics, or races.
57) Multimodal models
A multimodal model in machine learning can handle multiple types of input or output data, such as combining images and text. For instance, it might use both an image and a text caption to produce a score that shows how well the caption describes the image. These models are versatile and can be used for tasks like image captioning and speech recognition.
58) Overfitting
Overfitting in machine learning occurs when a model becomes too tailored to the training data, making it unable to generalize well to new, unseen data. As a result, the model may struggle to make accurate predictions on new datasets.
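The small numpy sketch below makes this concrete: a degree-9 polynomial fit to 10 noisy points drives training error to nearly zero but typically has a larger test error than a simpler degree-3 fit (the data and degrees are illustrative):

```python
# Fitting 10 noisy points: a high-degree polynomial memorizes noise.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)            # the true, noise-free signal

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, train_err, test_err)   # degree 9: near-zero train error, worse test error
```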
59) Oversight
Oversight involves closely monitoring and supervising an AI system to reduce risks, ensure it follows regulations, and maintain responsible practices. This is crucial for good AI governance and can include certification processes, conformity assessments, and regulatory bodies that enforce compliance.
60) Parameters
Parameters are the internal variables that an algorithmic model learns from training data. These values are adjusted during training to enable the model to make predictions on new data. Parameters are specific to the model's architecture; for example, in neural networks, they are the weights assigned to each neuron.
61) Post-processing
Post-processing involves steps taken after running a machine learning model to refine its outputs. This can include modifying the model's predictions or using a holdout dataset—data not used during training—to create a function that adjusts the model's outputs to enhance fairness or meet specific business needs.
62) Preprocessing
Data preprocessing involves preparing data for training a machine learning model by cleaning the data, handling missing values, normalizing, extracting features, and encoding categorical variables. This process is essential for improving data quality, reducing bias, addressing fairness issues, and boosting the performance and reliability of machine learning algorithms.
63) Prompt
An input or command given to an AI model or system to produce a response or result.
64) Prompt engineering
Prompt engineering is the intentional design and refinement of prompts to guide an AI model's responses toward more useful or accurate results.
65) Random forest
A random forest is a supervised machine learning algorithm that creates multiple decision trees and combines their predictions to achieve greater accuracy and stability. Each tree is built using a random subset of the training data, which is why it's called a random forest. This approach is useful for handling datasets with missing values or complex structures.
66) Red teaming
Red teaming involves testing an AI system's safety, security, and performance by simulating adversarial attacks to evaluate its resilience and identify vulnerabilities. This process aims to uncover security risks, model flaws, biases, and other potential issues by trying to exploit or manipulate the model. The findings are then provided to developers to address and improve the model before and after its public release.
67) Reinforcement learning
Reinforcement learning is a machine learning approach where a model is trained to improve its actions within an environment to achieve a specific goal. It uses feedback in the form of rewards and penalties, often learned through trial and error or simulations, without needing external data. For instance, an algorithm might be trained to score high in a video game by receiving feedback on its performance and adjusting its strategy accordingly.
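A minimal tabular Q-learning sketch on a toy one-dimensional environment, where an agent learns by trial and error to walk right toward a reward; the environment, reward scheme, learning rate, and exploration rate are all illustrative assumptions:

```python
# Tabular Q-learning on a 1-D corridor: reach the rightmost state for a reward.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # one learned value per state-action pair
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != n_states - 1:      # episode ends at the rightmost state
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if rng.random() < 0.3:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Update toward the reward plus the discounted best future value.
        Q[state, action] += 0.1 * (reward + 0.9 * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q[:-1].argmax(axis=1))   # learned policy: move right in every non-terminal state
```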
68) Reinforcement learning with human feedback (RLHF)
Reinforcement learning with human feedback involves incorporating human evaluations into the training process. Human feedback is given on the model’s outputs by comparing and choosing which ones better reflect human preferences. This feedback is used as an additional source of rewards or penalties, helping to align the AI’s behavior with human values and preferences while it learns and improves.
69) Reliability
Reliability in an AI system means it consistently performs its intended function accurately and behaves as expected, even when faced with new, unseen data.
70) Robotics
Robotics is a multidisciplinary field focused on designing, building, operating, and programming robots. It enables AI systems and software to interact with and manipulate the physical world.
71) Robustness
Robustness in an AI system refers to its ability to remain functional, accurate, and perform well even when faced with security attacks or changes in input. It ensures the system can withstand and recover from disruptions while maintaining its performance across different conditions.
72) Safety
Safety in AI involves designing, developing, and deploying systems to reduce risks from misinformation, disinformation, deepfakes, and other unintended behaviors. It also includes managing malicious use and preventing potential existential or unexpected risks associated with advanced AI technologies.
73) Semi-supervised learning
Semi-supervised learning is a type of machine learning that uses a combination of a large amount of unlabeled data and a smaller amount of labeled data to train a model. This approach helps overcome the difficulty of obtaining large volumes of labeled data. Generative AI often uses semi-supervised learning to leverage both types of data effectively.
74) Small language models
Small language models are scaled-down versions of large language models. They have fewer parameters and need less training data, making them more efficient and suitable for environments with limited computational resources or applications that require quicker training and inference.
75) Supervised learning
Supervised learning is a type of machine learning where a model is trained using labeled input data and corresponding known outputs. The input data (predictors) and the desired outputs (targets) guide the training process. This approach is useful for tasks like classification, where the model categorizes data into specific groups, and regression, where it makes predictions by understanding relationships between variables.
76) Synthetic data
Synthetic data is created by a system or model to mimic the structure and statistical characteristics of real data without including actual real-world information. It's commonly used for testing or training machine learning models, especially when real data is limited, unavailable, or too sensitive.
77) System card
A system card is a concise document that provides information on how different AI models interact within a network, enhancing the overall explainability of the AI system. It is similar to a model card but focuses on the integration and operation of multiple models within the system.
78) Testing data
The test dataset is used to evaluate the performance of a trained model with new data, typically at the end of the initial development process. It helps assess how well the model performs on unseen data and is also used for future updates or modifications to the model.
79) Training data
The training dataset is used to teach a model how to accurately predict outcomes, discover patterns, or recognize structures by exposing it to various examples and information during the learning process.
80) Transfer learning model
A transfer learning model is designed to apply knowledge gained from one task, like identifying cats, to improve learning and performance on a different but related task, such as recognizing dogs.
81) Transformer model
A transformer model is a neural network architecture designed to understand context and relationships within sequence data, like words in a sentence. It uses an attention mechanism to focus on the most relevant parts of the input, enhancing model accuracy. For instance, in language tasks, it helps the model grasp the meaning of a word by considering the surrounding words in the entire sentence.
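The core of the architecture, scaled dot-product attention, can be sketched in a few lines of numpy (the token embeddings here are random placeholders standing in for real word representations):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V              # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))          # 4 tokens, 8-dimensional embeddings
out = attention(tokens, tokens, tokens)   # self-attention: Q = K = V
print(out.shape)                          # (4, 8)
```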
82) Transparency
Transparency in AI refers to openness and clarity about how AI algorithms operate and make decisions. It encompasses providing information about the system’s workings, such as through model or system cards, and maintaining documentation throughout the AI lifecycle. This can also involve techniques like watermarking to disclose AI usage and making source code accessible in open-source projects. The specific meaning of transparency can vary based on the context.
83) Trustworthy AI
This term is often used interchangeably with responsible AI and ethical AI, all of which emphasize guiding AI development and governance by principles such as security, safety, transparency, explainability, accountability, privacy, and fairness.
84) Turing test
A test designed to evaluate whether a machine can demonstrate intelligent behavior that is indistinguishable from that of a human. Originally proposed by Alan Turing, the test involves a machine engaging in a text-based conversation, where the goal is for a human judge to be unable to differentiate between responses from the machine and those from a human.
85) Underfitting
A concept in machine learning where a model does not adequately capture the complexity of the training data, leading to poor performance and inaccurate predictions. Underfitting can occur due to having too few parameters, excessive regularization, or insufficient or inappropriate features in the training data.
86) Unsupervised learning
A branch of machine learning where the model learns to identify patterns and structures within an unlabelled dataset with little to no human guidance. The AI examines the data to uncover patterns or groupings. This method is useful for tasks like clustering, detecting anomalies, reducing dimensionality, and extracting features.
87) Validation data
A subset of the dataset used to evaluate the model's performance while it's still being trained. Validation data helps in adjusting the model's parameters and preventing overfitting, ensuring it generalizes well before the final assessment with the test dataset.
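A minimal sketch of how a dataset is commonly divided into training, validation, and test sets, assuming scikit-learn (the 60/20/20 proportions are a common convention, not a rule):

```python
# Splitting a dataset into training, validation, and test sets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# First hold back a test set, then split the remainder into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# Result: 60% train (fit parameters), 20% validation (tune and check for
# overfitting during training), 20% test (final, one-time assessment).
```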
88) Variables
In the context of machine learning, a variable is a measurable element or feature that can have different values. These can be numerical (quantitative) or categorical (qualitative), and they are used to describe and analyze data.
89) Variance
A statistical measure of how much the values in a dataset differ from their average. High variance means the values are spread out widely around the mean, while low variance means they are clustered close to the mean. In machine learning, high variance can lead to overfitting, where a model captures noise in the training data instead of general patterns. The trade-off between variance and bias is key: increasing model complexity often reduces bias but raises variance, while simplifying the model reduces variance but increases bias.
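A two-line numpy illustration of the definition (the example values are arbitrary):

```python
# Variance: average squared deviation from the mean.
import numpy as np

clustered = np.array([4.9, 5.0, 5.1])
spread = np.array([1.0, 5.0, 9.0])
print(np.var(clustered))   # ~0.0067 -- low variance, values hug the mean
print(np.var(spread))      # ~10.67  -- high variance, values spread widely
```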
90) Watermarking
Embedding subtle or nearly invisible patterns into AI-generated content or metadata that can be detected only by computer systems. Watermarking aids in identifying and labeling AI-generated content, enhancing transparency.