Generative AI Glossary
A Glossary of Generative AI Terms for Using the Clarifai Platform Effectively
A
Adversarial Autoencoder (AAE)
A type of autoencoder that combines the adversarial training principle of GANs with the architecture of an autoencoder. This combination empowers the model to learn complex data distributions effectively.
Audio Synthesis
This involves using AI to create new, artificial sounds or voice outputs. Such sounds can be as simple as a specific tone or as complex as a mimicked form of speech.
Autoregressive Models
These are generative models that produce data by conditioning each element's probability on previous elements in a sequence. For example, WaveNet and PixelCNN are autoregressive models for creating music and images, respectively.
Autoencoder
An autoencoder is an artificial neural network utilized for learning efficient encodings of input data. It has two crucial components: an encoder that compresses the input data and a decoder that reconstructs the data from its reduced form.
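As a toy sketch of the encoder/decoder interface described above (deliberately not a real neural network, and using a made-up pair-averaging "compression"), the key idea is that encoding is lossy and decoding reconstructs an approximation:

```python
# Toy stand-in for an autoencoder's two components: an "encoder" that
# compresses a vector to half its length by averaging adjacent pairs, and a
# "decoder" that reconstructs an approximation by repeating each value.

def encode(x):
    """Compress a flat list of floats to half its length (lossy)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def decode(z):
    """Reconstruct an approximation of the original from the code."""
    out = []
    for v in z:
        out.extend([v, v])
    return out

x = [1.0, 1.0, 4.0, 6.0]
z = encode(x)       # compressed representation: [1.0, 5.0]
x_hat = decode(z)   # lossy reconstruction: [1.0, 1.0, 5.0, 5.0]
reconstruction_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

A real autoencoder learns its encode and decode functions by minimizing exactly this kind of reconstruction error.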
Autoregressive Generative Models
These models implicitly define a distribution over sequences via the chain rule of probability, predicting the distribution of each element conditioned on the preceding elements. The main architectures for autoregressive models are causal convolutional networks and recurrent neural networks.
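The chain-rule factorization above can be sketched with a toy character-level bigram model (the conditional probabilities here are made-up values for illustration):

```python
# Toy autoregressive model: P(sequence) is factored with the chain rule as
# P(x1) * P(x2 | x1) * ... * P(xn | x(n-1)), using a bigram conditional table.

cond_prob = {
    ("a", "b"): 0.5, ("a", "a"): 0.5,
    ("b", "a"): 1.0,
}
start_prob = {"a": 1.0}

def sequence_prob(seq):
    """Multiply the start probability by each bigram conditional."""
    p = start_prob.get(seq[0], 0.0)
    for prev, nxt in zip(seq, seq[1:]):
        p *= cond_prob.get((prev, nxt), 0.0)
    return p

p_aba = sequence_prob("aba")  # 1.0 * P(b|a) * P(a|b) = 0.5
```

Models like WaveNet and PixelCNN apply the same factorization, but learn the conditionals with deep networks over long contexts instead of a lookup table.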
B
BERT (Bidirectional Encoder Representations from Transformers)
BERT, developed by Google, is a pre-trained transformer-based language model. It stands out for its bidirectional training approach, which allows it to understand the context of a word based on all of its surroundings (left and right of the word).
BLOOM
Developed by the BigScience research workshop, BLOOM is a large-scale multilingual language model that can execute a vast array of natural language understanding and generation tasks accurately.
C
ChatGPT
Developed by OpenAI, ChatGPT is a specialized large-scale language model that generates human-like text. It is a popular choice for developing AI-powered chatbots due to its convincing conversation-generation capabilities.
CLIP (Contrastive Language-Image Pre-training)
Developed by OpenAI, CLIP is a model trained on large numbers of image-text pairs to associate images with their textual descriptions using contrastive learning. Because it embeds images and text in a shared space, it can perform tasks such as zero-shot image classification.
Closed-Book QA
Closed-book QA refers to the ability of an LLM to answer questions without access to any additional information or context beyond its internal, parametric knowledge.
Closed-book QA stands in contrast to open-book QA, where the LLM can access and process external sources of information, such as documents, web pages, or knowledge bases.
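The distinction is easiest to see in how the prompt is constructed. A minimal sketch (the question, context, and wording are made up for illustration):

```python
# Closed-book vs. open-book QA prompts. In the closed-book case the model
# must answer from its internal knowledge; in the open-book case retrieved
# context is supplied alongside the question.

question = "When was the Eiffel Tower completed?"

# Closed-book: no external information is provided.
closed_book_prompt = f"Answer from memory: {question}"

# Open-book: a retrieved passage is included in the prompt.
retrieved_context = "The Eiffel Tower was completed in 1889."
open_book_prompt = (
    f"Context: {retrieved_context}\n"
    f"Question: {question}\n"
    "Answer using only the context above."
)
```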
Conditional GANs (cGANs)
These are a type of GAN where a conditional variable is introduced to the input layer, allowing the model to generate data conditioned on certain factors. This augmentation provides the model with the capability to generate data with desired characteristics.
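One common way the conditioning is implemented (a sketch under that assumption, with made-up dimensions) is to one-hot encode the class label and concatenate it with the noise vector before it enters the generator:

```python
import random

# Sketch of cGAN conditioning: the generator's input is the noise vector
# concatenated with a one-hot encoding of the desired class label.
NOISE_DIM = 4
NUM_CLASSES = 3

def one_hot(label, num_classes):
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def conditioned_generator_input(label, rng):
    """Build the generator input: [noise | one-hot(label)]."""
    noise = [rng.random() for _ in range(NOISE_DIM)]
    return noise + one_hot(label, NUM_CLASSES)

rng = random.Random(0)
z = conditioned_generator_input(2, rng)
# len(z) == NOISE_DIM + NUM_CLASSES; the last entries encode the label.
```

The discriminator is typically given the same label information, so both networks learn a distribution conditioned on the label.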
Cross-modal
Cross-modal learning refers to using information from one modality to understand or make predictions in another modality. This could involve translating or transforming the data in some way. For example, a cross-modal learning system might be designed to accept text input and output a related image or vice versa.
CycleGAN
A type of GAN that can translate an image from a source domain to a target domain without paired examples. It's particularly useful in tasks like photo enhancement, image colorization, and style transfer for unpaired photo-to-photo translation.
D
DALL-E 2
This is an updated version of DALL-E, an AI model developed by OpenAI to generate images from textual descriptions. It's an excellent example of a multi-modal AI system.
Data Distribution
In machine learning, data distribution refers to the overall layout or spread of data points within a dataset. In the case of generative models such as GANs, the generator seeks to mimic the actual data distribution.
Deepfake
Synthetic media in which a person in an existing image or video is replaced with someone else's likeness using machine learning techniques. While deepfakes can serve legitimate entertainment purposes, they can also mislead viewers, sometimes with serious consequences.
Diffusion
In AI, 'diffusion' refers to a technique for generating new data in which random noise is gradually added to real data until the signal is destroyed, and a neural network is trained to reverse this noising process step by step. New samples are then generated by running the learned reverse process starting from pure noise.
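The forward (noising) half of this process can be sketched in a few lines; the noise schedule `beta` here is a made-up constant, where real diffusion models use a carefully tuned schedule and learn the reverse step with a neural network:

```python
import math
import random

# Forward diffusion sketch: at each step a little Gaussian noise is mixed in,
# scaling the signal down so its variance stays controlled. After many steps
# the data is mostly noise; a trained network would learn to reverse this.

def forward_diffuse(x, steps, beta, rng):
    for _ in range(steps):
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1)
             for v in x]
    return x

rng = random.Random(42)
clean = [1.0, -1.0, 0.5]
noisy = forward_diffuse(clean, steps=50, beta=0.05, rng=rng)
# 'noisy' now retains almost none of the original signal.
```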
Discriminator
In a GAN, the discriminator is the component that tries to differentiate real data instances from the fictitious ones fabricated by the generator. It helps refine the generator's ability to create realistic data.
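The adversarial dynamic between the two components can be caricatured with a deliberately tiny toy (real data is a point near 1.0, the "generator" is a single parameter, and the "discriminator" is a simple scoring rule; none of this is a real neural network):

```python
# Toy sketch of GAN roles. The discriminator scores how "real" a sample
# looks (real data lives near 1.0); the generator adjusts its single
# parameter to raise that score.

def discriminator(x):
    """Score in (0, 1]: closer to 1 means the sample looks more real."""
    return 1.0 / (1.0 + abs(x - 1.0))

def generator(g):
    """The generator's 'sample' is just its current parameter value."""
    return g

g = 0.0
for _ in range(20):
    # Nudge the parameter in the direction the discriminator rewards.
    if discriminator(generator(g + 0.1)) > discriminator(generator(g)):
        g += 0.1

score = discriminator(generator(g))  # close to 1.0 after 'training'
```

In a real GAN both networks are trained jointly by gradient descent, with the discriminator simultaneously learning to tell real samples from generated ones.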
E
Embedding
An embedding represents data in a new form, often a vector space, facilitating comparisons and calculations with other data points. Similar items should have similar embeddings, making it an essential feature for many AI tasks, like recommendation systems and natural language processing.
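The "similar items have similar embeddings" property is typically measured with cosine similarity. A minimal sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Compare items by the cosine similarity of their embedding vectors:
# 1.0 means identical direction, 0.0 means unrelated.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
car = [0.0, 0.1, 0.95]

# "cat" is far closer to "kitten" than to "car" in embedding space.
cat_kitten = cosine_similarity(cat, kitten)
cat_car = cosine_similarity(cat, car)
```

Recommendation systems and semantic search rank candidates by exactly this kind of similarity score.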
Emergence/Emergent Behavior
("sharp left turns," intelligence explosions). In artificial intelligence, emergence refers to complex phenomena that arise from simple rules or processes. Radical concepts like "sharp left turns" and "intelligence explosions" denote sudden, dramatic developments in AI, often related to AGI's emergence.
F
Few-Shot Learning
A machine learning method where the model learns to perform a task from a few examples per class. For instance, it can correctly categorize new data after being shown only a few samples from each category.
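With LLMs, few-shot learning is often done purely through the prompt: a handful of labeled examples is shown before the query so the model can infer the task. A sketch with made-up sentiment examples:

```python
# Build a few-shot classification prompt: labeled examples first,
# then the unlabeled query for the model to complete.

examples = [
    ("The movie was wonderful", "positive"),
    ("I hated every minute", "negative"),
    ("An absolute delight", "positive"),
]

def build_few_shot_prompt(examples, query):
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "A dull, plodding film")
```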
Fine-Tuning
A form of transfer learning wherein a pre-trained model is slightly modified or adjusted to perform a new task. This process allows for more efficient use of the pre-trained models by adjusting them to solve tasks similar to the ones they were originally trained on.
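A common fine-tuning recipe is to freeze the pre-trained "backbone" parameters and update only a small task-specific "head". A toy sketch (the parameter names, values, and update rule are made up for illustration):

```python
# Toy fine-tuning sketch: only parameters in the 'trainable' set are
# updated; the pre-trained backbone stays frozen.

params = {
    "backbone.w": 0.7,   # pre-trained, frozen
    "head.w": 0.0,       # new task head, trainable
}
trainable = {"head.w"}

def update(params, grads, lr=0.1):
    """One gradient-descent step, skipping frozen parameters."""
    for name in params:
        if name in trainable:
            params[name] -= lr * grads.get(name, 0.0)
    return params

update(params, {"backbone.w": 1.0, "head.w": 1.0})
# backbone.w is unchanged; only head.w moved.
```

Other recipes unfreeze everything at a small learning rate, but the principle of preserving the pre-trained knowledge is the same.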
Foundation Model
In AI, foundation models are large-scale AI models trained on diverse and extensive data meant to be fine-tuned or adapted for more specific tasks. These are called foundation models, as they offer a robust and broad foundation that can be built upon for various AI tasks.
G
Generative Models for Images
These are generative models like GANs, VAEs, and DALL-E, trained on image data and capable of generating new images that reflect the patterns found in the training data.
Generative Pre-Trained Transformer (GPT)
GPT is a family of neural network models trained to generate content. These models are pre-trained on vast amounts of text data, allowing them to generate coherent and relevant text based on user prompts. GPT models can automate content creation and analyze customer feedback for insights, fostering personalized interactions.
Generator
In a Generative Adversarial Network, the generator is the component that creates new instances of data by learning to mimic the real data distribution.
GPT-1, GPT-2, GPT-3, and GPT-4
Progressive versions of the generative pre-trained transformers developed by OpenAI. Each model sees improvements and expansions on its predecessors, offering advanced text generation capabilities and greater application versatility. GPT-3, for instance, is an extremely sophisticated model known for its wide-ranging applicability, including translation, question-answering, and text completion tasks.
GPT-J
GPT-J is an open-source large language model developed by EleutherAI in 2021. It is a generative pre-trained transformer model with 6 billion parameters, similar to GPT-3, but with some architectural differences. GPT-J was trained on a large-scale dataset called The Pile, a mixture of sources from different domains.
GPT-Neo
GPT-Neo is a family of transformer-based language models from EleutherAI based on the GPT architecture. It is an open-source alternative to GPT-3 that can generate natural language texts using deep learning. The GPT-Neo model comes in 125M, 1.3B, and 2.7B parameter variants. This allows users to choose the model size that best fits their specific use case and computational constraints.
Grounding
It is the process of linking a model's output to factual and verifiable information sources. This technique enhances the accuracy and reliability of the model, especially in applications where factual correctness is critical. Grounding reduces the risk of the model generating unfounded or incorrect content.
H
Hallucination
In AI, a hallucination occurs when a model makes erroneous conclusions and generates content that doesn't correspond to reality. These erroneous outputs indicate problems in the workings of the AI model. Vigilance in identifying and mitigating hallucinations is necessary to maintain the accuracy and reliability of AI systems.
I
Image Translation
A task in computer vision where the goal is to map or translate one image into another, often using a model known as GANs. For example, translating a daytime scene into a nighttime scene.
Inpainting
A generative task where the AI is meant to fill in missing or corrupted parts of an image. Typical applications include photo restoration and the completion of unfinished art.
L
LangChain
LangChain is an open-source framework for building applications powered by large language models by composing ("chaining") prompts, models, memory, and external tools. Related prompting techniques include "chain-of-thought," which improves a model's reasoning by breaking a task into smaller, discrete steps, and the more elaborate "tree-of-thought," which allows reasoning steps to branch and backtrack.
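The core idea of chaining steps can be sketched without any framework: each step is a function that transforms a running state, and the chain runs them in order. The step functions below are made-up stand-ins for prompt/model calls:

```python
# Minimal sketch of chaining steps (this does not use the actual LangChain
# library). Each step takes the state dict, adds its result, and passes it on.

def extract_numbers(state):
    """Step 1: pull the numbers out of the question."""
    state["numbers"] = [int(t) for t in state["question"].split() if t.isdigit()]
    return state

def add_numbers(state):
    """Step 2: compute the answer from the intermediate result."""
    state["answer"] = sum(state["numbers"])
    return state

def run_chain(steps, state):
    for step in steps:
        state = step(state)
    return state

result = run_chain([extract_numbers, add_numbers],
                   {"question": "What is 12 plus 30 ?"})
# result["answer"] == 42
```

Breaking the task into explicit intermediate steps, rather than asking for the final answer in one shot, is the essence of chain-of-thought style pipelines.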
Large Language Models (LLMs)
Large-scale AI models trained on extensive text data, such as GPT-3 and BERT. They can respond to prompts, generate text, answer questions, create poetry, and even generate code. This ability can enable personalized and authentic customer interactions and assist in automating customer-facing content.
Latent Space
In generative models, the latent space is a compressed representation of the input data. In a GAN, for example, the noise vector fed to the generator lives in the latent space, and the generator maps points in that space to output samples.
Llama 2
Llama 2 is a collection of pre-trained and fine-tuned large language models (LLMs) created and publicly released by Meta AI. It is available in three model sizes: 7, 13, and 70 billion parameters. Llama 2-Chat is a fine-tuned version of Llama 2, specifically optimized for dialogue-based scenarios.