1. Generative AI
A subset of artificial intelligence focused on creating new content, such as text, images, audio, video, or code. It relies on models trained on vast datasets to learn patterns and generate novel outputs that resemble, but do not simply copy, the training data.
2. Large Language Model (LLM)
An AI model trained to process and generate human-like text. Examples include GPT (Generative Pre-trained Transformer), BERT, and LaMDA, all of which leverage deep learning architectures, specifically transformers.
3. Transformer
A neural network architecture known for its ability to process sequential data, like text. Transformers use self-attention mechanisms, enabling the model to learn relationships between words and capture long-range dependencies efficiently.
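As a minimal sketch of the core idea, here is scaled dot-product attention in plain NumPy; real transformers add learned query/key/value projections, multiple attention heads, and masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of V's rows,
    with weights derived from how well Q matches K."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real transformer, Q, K, V come from learned linear projections of x.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```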
4. Retrieval-Augmented Generation (RAG)
A hybrid approach combining retrieval-based and generative AI techniques. RAG models retrieve relevant information from external sources (e.g., databases or documents) and then use this information to generate contextually accurate responses, improving the model’s factual accuracy and relevance.
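A rough sketch of the RAG pattern follows, with toy word-overlap retrieval standing in for real embedding search (see "Vector Database" below) and a placeholder `llm` callable standing in for an actual model:

```python
# Minimal RAG sketch: retrieve the most relevant document, then
# prepend it to the prompt before generation.
documents = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "Transformers use self-attention mechanisms.",
]

def retrieve(query, docs):
    # Toy retrieval: score each document by word overlap with the query.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def answer(query, llm):
    context = retrieve(query, documents)
    prompt = f"Answer using this context.\nContext: {context}\nQuestion: {query}"
    return llm(prompt)  # `llm` is a placeholder for any text-generation callable

# answer("How tall is the Eiffel Tower?", llm=my_model)
```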
5. LangChain
A framework designed to facilitate the development of applications that rely on LLMs for various NLP tasks. LangChain is particularly useful for chaining together complex workflows where multiple steps or sources of data are involved, such as combining data retrieval, generation, and summarization into a single, coherent process.
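For flavor, a simple prompt-to-model pipeline in LangChain might look like the sketch below; package layout and imports vary between LangChain versions, so treat it as illustrative rather than a drop-in recipe:

```python
# Sketch of a simple LangChain pipeline; check the current docs for
# the exact imports your installed version expects.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in one sentence:\n{text}"
)
llm = ChatOpenAI(model="gpt-4o-mini")  # assumes OPENAI_API_KEY is set

# The | operator chains steps: prompt -> model -> plain-string output.
chain = prompt | llm | StrOutputParser()
summary = chain.invoke({"text": "LangChain helps chain LLM calls together."})
print(summary)
```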
6. CrewAI
A framework for orchestrating multiple LLM-powered agents that collaborate on a shared task. CrewAI lets developers define agents with distinct roles, goals, and tools, group them into a "crew", and assign them a sequence of tasks, making it well suited to multi-step workflows such as research-then-write pipelines.
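A sketch of the pattern, following CrewAI's documented Agent/Task/Crew classes (argument details may differ across versions, so verify against the current docs):

```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Gather key facts about a topic",
    backstory="An analyst who digs up reliable information.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a short summary",
    backstory="A concise technical writer.",
)

research = Task(
    description="Research the history of transformers in AI.",
    expected_output="A bullet list of key facts.",
    agent=researcher,
)
summarize = Task(
    description="Summarize the research into one paragraph.",
    expected_output="A single-paragraph summary.",
    agent=writer,
)

# The crew runs the tasks in order, passing results between agents.
crew = Crew(agents=[researcher, writer], tasks=[research, summarize])
result = crew.kickoff()
```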
7. Pre-training and Fine-tuning
- Pre-training: Training a model on a large dataset to develop a general understanding of the data.
- Fine-tuning: Customizing the model for specific tasks by training it further on a smaller, task-specific dataset (see the sketch below).
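A minimal fine-tuning sketch using Hugging Face Transformers; `my_tokenized_dataset` is a placeholder for your tokenized, labeled data, and evaluation is elided:

```python
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # a pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)  # adds a fresh task-specific head

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=my_tokenized_dataset)  # placeholder dataset
trainer.train()
```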
8. Prompt and Prompt Engineering
- Prompt: Text input given to a generative model to guide its response.
- Prompt Engineering: The craft of designing prompts that elicit the most accurate or creative responses from a model, especially valuable in guiding LLMs toward desired outputs.
9. Few-shot, One-shot, and Zero-shot Learning
These terms describe how much example data a model is given when asked to perform a new task; the prompt sketch after this list shows the difference.
- Few-shot learning: The model is given a few examples.
- One-shot learning: The model is given only one example.
- Zero-shot learning: The model receives no prior examples and must generalize based on its pre-trained knowledge.
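The difference is easiest to see in the prompts themselves; the sketch below poses the same sentiment task zero-shot and few-shot, and doubles as a small prompt-engineering example:

```python
# The same task posed zero-shot and few-shot. No library calls here;
# these are just prompt strings you would send to any LLM.
zero_shot = """Classify the sentiment of this review as positive or negative.
Review: "The battery died after two days."
Sentiment:"""

few_shot = """Classify the sentiment of each review as positive or negative.
Review: "Absolutely love this phone!" -> positive
Review: "Broke within a week." -> negative
Review: "The battery died after two days." -> """
```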
10. Token
The basic unit of data processed by language models, often representing individual words or subwords. For example, the word “chatbot” might be split into “chat” and “bot.”
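To see tokenization in practice, the snippet below uses tiktoken, OpenAI's tokenizer library; exact splits depend on the encoding a given model uses:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("chatbot")
print(tokens)                             # token IDs; their count is what context limits measure
print([enc.decode([t]) for t in tokens])  # the corresponding text pieces
```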
11. Temperature and Top-p Sampling
These parameters control the randomness of model outputs during sampling, as the sketch after this list illustrates.
- Temperature: A higher temperature introduces more randomness, while a lower temperature makes the output more deterministic.
- Top-p (Nucleus Sampling): Limits the model’s choices to a subset of probable outcomes, refining its output quality by considering only top results until a cumulative probability threshold (p) is reached.
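Both knobs are easy to implement over raw logits; here is a minimal NumPy sketch with a toy vocabulary and no real model:

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=np.random.default_rng()):
    """Sample one token ID from raw logits using temperature and top-p."""
    # Temperature: scale logits before softmax; <1 sharpens, >1 flattens.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-p: keep the most probable tokens whose cumulative probability
    # just reaches p, then renormalize over that nucleus.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    kept = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept)

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy vocabulary of 4 tokens
print(sample(logits, temperature=0.7, top_p=0.9))
```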
12. GAN (Generative Adversarial Network)
A model architecture in which two networks, a generator and a discriminator, are trained adversarially: the generator produces candidate data while the discriminator tries to distinguish generated samples from real ones, pushing both to improve. GANs are especially prominent in image generation.
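A compact PyTorch sketch of the adversarial training loop, with random tensors standing in for real data:

```python
import torch
import torch.nn as nn

# Generator: noise vector -> 2-D sample. Discriminator: sample -> realness logit.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(32, 2) + 3.0  # stand-in for a batch of real data

for step in range(1000):
    fake = G(torch.randn(32, 16))

    # Discriminator step: label real samples 1 and generated samples 0.
    d_loss = (bce(D(real), torch.ones(32, 1)) +
              bce(D(fake.detach()), torch.zeros(32, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```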
13. Diffusion Model
A generative approach used mainly for image creation, where the model generates data by reversing a noise-adding process. Diffusion models have become popular for creating realistic visual outputs in tools like DALL-E and Stable Diffusion.
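The forward (noising) half is simple enough to sketch in closed form with one common schedule; a trained network learns to reverse these steps, which is what generates images:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # a common linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # fraction of original signal left at step t

def noisy_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0): a blend of clean data and Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

x0 = np.ones(4)                  # stand-in for a clean data point (e.g. pixels)
print(noisy_sample(x0, t=10))    # early step: mostly signal
print(noisy_sample(x0, t=990))   # late step: mostly noise
```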
14. Reinforcement Learning from Human Feedback (RLHF)
A training approach where human evaluators provide feedback to improve model outputs. RLHF is used to make responses more accurate and aligned with human values, particularly in conversational AI models.
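The reward-modeling stage at the heart of RLHF can be sketched with the standard pairwise preference loss; the full pipeline then optimizes the LLM against this learned reward with reinforcement learning (e.g. PPO):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # -log sigmoid(r_chosen - r_rejected): minimized when the reward
    # model scores the human-preferred response above the other one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy reward-model scores for two (chosen, rejected) response pairs.
r_chosen = torch.tensor([1.2, 0.3])
r_rejected = torch.tensor([0.1, 0.9])
print(preference_loss(r_chosen, r_rejected))
```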
15. Ethical AI and Responsible AI
Principles guiding the safe, fair, and transparent development of AI systems. This includes addressing issues like bias, privacy, and the potential societal impacts of AI-generated content.
16. Bias and Fairness in AI
Bias in AI, often inherited from skewed or unrepresentative training data, can lead to unfair treatment of individuals or groups and is a significant challenge in ensuring responsible AI use. Bias mitigation techniques are essential to avoid harmful stereotypes and enhance fairness.
17. Synthetic Data
Artificially generated data used for training AI models, often when real data is limited, costly, or poses privacy concerns. Synthetic data diversifies training datasets and improves model accuracy in scenarios with insufficient real-world data.
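As a toy illustration, the sketch below fits per-feature statistics on a small real sample and draws new records from them; the independence and Gaussian assumptions are deliberate oversimplifications, and real pipelines often use GANs or diffusion models to capture richer structure:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for a small real dataset: 100 records of (age, height).
real = rng.normal(loc=[50.0, 1.7], scale=[12.0, 0.1], size=(100, 2))

# Fit simple per-feature statistics, then sample many synthetic records.
mu, sigma = real.mean(axis=0), real.std(axis=0)
synthetic = rng.normal(mu, sigma, size=(10_000, 2))
```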
18. Image-to-Image, Text-to-Image, and Text-to-Text Models
These are generative models for converting one type of input to another:
- Image-to-Image: Converts an image input into a modified version, such as adding color to black-and-white photos.
- Text-to-Image: Generates images from text prompts (e.g., DALL-E).
- Text-to-Text: Language models generating text based on an input prompt (e.g., GPT-4, ChatGPT).
19. Inference and Latency
- Inference: Running a trained model on new inputs to produce outputs, as opposed to training the model.
- Latency: The time delay between input and response, crucial for applications that require real-time interaction (a simple measurement is sketched below).
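A minimal way to measure latency is to time a single inference call; `model.generate` in the usage comment is a placeholder for any inference callable:

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, wall-clock latency in ms)."""
    start = time.perf_counter()
    result = fn(*args)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

# result, ms = timed(model.generate, prompt)
# print(f"latency: {ms:.1f} ms")
```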
20. Multimodal AI
AI that can process and generate multiple types of data (e.g., text, images, audio). Multimodal applications are highly beneficial in tasks where a holistic understanding of diverse data types is required.
21. Hallucination
When a generative model outputs incorrect, fabricated, or nonsensical information, often stated with misplaced confidence. Addressing hallucinations is essential for deploying reliable AI, especially in applications where factual accuracy is crucial.
22. Vector Database
A specialized database designed to store vector embeddings, which are numerical representations of data points (such as text or images). Vector databases and similarity-search libraries, like Pinecone and FAISS, are essential in retrieval-augmented generation (RAG) workflows, allowing models to efficiently search for relevant information.
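A small FAISS example, with random vectors standing in for document embeddings:

```python
import faiss
import numpy as np

d = 128  # embedding dimension
xb = np.random.random((1000, d)).astype("float32")  # "document" vectors
xq = np.random.random((1, d)).astype("float32")     # query vector

index = faiss.IndexFlatL2(d)  # exact L2 search; other index types trade
index.add(xb)                 # accuracy for speed at larger scales
distances, ids = index.search(xq, 5)  # the 5 nearest neighbours
print(ids)
```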
23. Embeddings
Numerical representations of data that capture semantic meaning, used to measure similarity between different pieces of information. Embeddings are vital in search, recommendation, and clustering applications.
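Similarity between embeddings is typically measured with cosine similarity; the toy vectors below stand in for the output of a real embedding model:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = np.array([0.9, 0.1, 0.3])       # pretend embedding of "cat"
kitten = np.array([0.85, 0.15, 0.35])
car = np.array([0.1, 0.9, 0.2])

print(cosine(cat, kitten))  # high: related meanings
print(cosine(cat, car))     # lower: unrelated meanings
```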
24. Fine-tuning vs. Training from Scratch
- Fine-tuning: Adapting a pre-trained model to a specific task by training it further on a smaller dataset.
- Training from Scratch: Training a model from an uninitialized state, usually requiring extensive data and computing power.
25. OpenAI API
OpenAI's hosted API, which provides access to advanced language models such as GPT-4o. It allows developers to build applications leveraging LLM capabilities without training and hosting their own models.
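A minimal call using OpenAI's official Python client (assumes the `openai` package is installed and `OPENAI_API_KEY` is set in the environment; available model names change over time):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain RAG in one sentence."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```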