Essential Technical Terms You Need to Know to Learn AI

AI Technical Word List
Artificial Intelligence (AI) | Artificial General Intelligence (AGI)
LLM (large language model) | model parameters | algorithms | computing power | natural language processing | machine learning (ML) | deep learning (DL) | neural networks | Transformer | token
Agent | multimodal | prompt | RAG | AIGC | What is the B in 12B?
GPU
Basic Terms
Artificial Intelligence (AI)
Artificial Intelligence (AI) is one of the hottest topics in the world today. It is the 21st century's bellwether in science and technology, set to lead future development and transform the way we live.
Artificial General Intelligence (AGI)
Artificial General Intelligence (AGI) is a type of AI that possesses intelligence comparable to or even exceeding that of humans. AGI is not only capable of basic thinking skills such as perception, understanding, learning, and reasoning like humans, but also capable of flexible application, rapid learning, and creative thinking in different fields.
Modeling and Training
Before diving into the keywords below, first take in the following figure. As of this writing, the "artificial intelligence" on everyone's lips generally refers to a trained large language model (described below). A large language model is analogous to the human brain: people are born with very little intelligence and acquire it by learning from the world, just as a model acquires its capabilities through training.
LLM (Large Language Model)
Large Language Models (LLM) are a class of artificial intelligence models trained using large amounts of data to understand and generate natural language text. These models are often based on deep learning neural network architectures, particularly Transformer architectures, which are able to capture the complexity and nuance of language.
Referring back to the figure above: in today's popular generative AI, the large language model plays the role of the human brain. After training on massive amounts of data with machine learning, it contains an enormous number of parameters (which can be understood as learned knowledge). With that knowledge in place, when you ask it a question it performs inference over the learned parameters and returns a result to you.
Model Parameters
Model parameters are variables such as weights and biases that can be learned in machine learning and deep learning models. During training, these parameters are tuned by optimization algorithms (e.g., gradient descent) to minimize the gap between the model's predicted and actual values. The initial values of the parameters are usually random, and as training proceeds, they gradually converge to appropriate values that capture the complex patterns and relationships in the input data.
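The idea above can be made concrete with a minimal sketch: fitting y = w*x + b to toy data with gradient descent. Here w and b are the "model parameters"; they start at random values and are nudged step by step toward values that minimize the prediction error. (This is an illustrative toy, not how a real LLM is trained.)

```python
import random

random.seed(0)                                   # for reproducibility

data = [(x, 2.0 * x + 1.0) for x in range(10)]   # toy data: true w=2, b=1
w, b = random.random(), random.random()          # random initial parameters
lr = 0.01                                        # learning rate

for _ in range(2000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w                             # parameter update step
    b -= lr * grad_b
```

After training, w and b have converged close to the true values 2 and 1, exactly the "gradual convergence" described above, just with two parameters instead of billions.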
Algorithm
Artificial intelligence rests on three pillars: data, algorithms, and computing power. An algorithm is a method for solving a problem, and for solving problems of the same kind at scale.
Computing Power
Computing power usually refers to the ability of a computer or computing system to perform operations, and is an important measure of how fast a computer can process data. It is particularly important in artificial intelligence, big data processing, and scientific computing, since these tasks often involve processing large amounts of data and performing complex computations.
Natural Language Processing
Every animal has its own language, and so do machines. Natural language processing enables computers to understand language in digital form and reason about it appropriately; see the chapter on Natural Language Processing for a detailed introduction.
Machine Learning (ML)
Machine learning studies and builds a special class of algorithms (rather than one particular algorithm) that enable computers to learn from data on their own and thus make predictions. Machine learning is not a specific algorithm but a collective term for many algorithms. Through machine learning, computers gain the ability to make certain judgments and inferences.
Deep Learning (DL)
Deep learning is a branch of machine learning. It has performed well and drove the third wave of artificial intelligence, and it is one of the most important learning methods in training today's large language models.
Neural Networks
In AI, this refers to artificial neural networks, which, as the name suggests, are a bionic concept. Having discovered that biological neurons collaborate to process and transmit information, researchers proposed artificial neural networks as a model for information processing.
Transformer
Google's BERT model set the NLP world on fire by achieving state-of-the-art (SOTA) results on 11 NLP tasks, and a key factor in its success was the power of the Transformer. Google's Transformer model was first used for machine translation, where it also achieved SOTA results at the time. The Transformer fixes the most-criticized shortcoming of RNNs, slow training, by using the self-attention mechanism to enable fast parallel computation. Transformers can also be stacked very deep, fully exploiting the capacity of deep neural networks to improve model accuracy.
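The self-attention mechanism mentioned above can be sketched in plain Python. This is a deliberately simplified single head of scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, without the learned projection matrices or multi-head machinery of a real Transformer:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of vectors (lists of floats); d is the key dimension.
    Each output row is a weighted average of the value vectors, weighted
    by how similar the query is to each key.
    """
    d = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because every query attends to every key independently, all of these dot products can be computed in parallel, which is exactly the property that lets Transformers avoid the sequential bottleneck of RNNs.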
Token
A token is the basic unit of text for a large language model. A short English word may be a single token, such as "refers", while a long English word may be split into multiple tokens, such as "Tokenization". Chinese text generally consumes relatively more tokens, and a single character may be represented by one or even several tokens.
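A toy greedy longest-match tokenizer illustrates the splitting behavior described above. Real tokenizers (e.g. BPE) are learned from data and far more sophisticated, and the tiny vocabulary here is purely hypothetical, but the effect is similar: common short words stay whole, long words get split:

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary.

    At each position, take the longest vocabulary entry that matches;
    unknown characters fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])   # unknown character: one token each
            i += 1
    return tokens

# A made-up miniature vocabulary for illustration only.
vocab = {"refers", "Token", "ization", " "}
```

With this vocabulary, "refers" is one token while "Tokenization" splits into "Token" + "ization", mirroring the examples in the text.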
Model Application
Once training is complete, the model with its huge number of parameters is equivalent to an AI brain. With that brain in hand, deploying the model and using its reasoning ability turns it into an application that everyone can access.
Agents
Academia and industry have proposed various definitions of the term "agent". Generally speaking, an agent should have human-like thinking and planning capabilities, memory and even emotions, and the skills to interact with the environment, other agents, and humans in order to complete specific tasks. An agent can be imagined as a digital person living in an environment, and can be roughly summarized by the following definition:
Agent = Large Language Model (LLM) + Observation + Thinking + Action + Memory
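The definition above can be sketched as a minimal loop. This is a schematic, not a real framework: `call_llm` is a hypothetical placeholder for whatever LLM API an actual agent would use.

```python
def call_llm(prompt):
    # Placeholder: a real agent would call an actual LLM API here.
    return "ACTION: say hello"

class Agent:
    """Agent = LLM + observation + thinking + action + memory."""

    def __init__(self):
        self.memory = []                           # memory: past events

    def step(self, observation):
        self.memory.append(("obs", observation))   # observe the environment
        prompt = (f"History: {self.memory}\n"
                  f"Observation: {observation}\n"
                  "What should you do next?")
        decision = call_llm(prompt)                # think (LLM reasoning)
        self.memory.append(("act", decision))      # remember the decision
        return decision                            # act on the environment
```

Each call to `step` runs one observe-think-act cycle, with the memory list carrying context forward between cycles.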
Multimodal
Multimodality is a very important concept in artificial intelligence: it refers to a system's ability to process and understand data from different sources, or of different types, at the same time. Simply put, it means the AI can "see" (visual information), "hear" (auditory information), and "read" (textual information) simultaneously, along with other kinds of information.
As an example, when humans communicate, they not only understand what the other party says (auditory information), but also see the other party's gestures and expressions (visual information), and can even understand what the other party writes (textual information). Multimodal AI systems mimic this ability, allowing machines to understand and process information more comprehensively and accurately.
Below, I will explain the concept of multimodality in more detail in several ways:
- Data types: The data multimodal AI handles is not limited to a single type; it can include images, sound, text, video, and more.
- Information fusion: Multimodal systems can fuse information from different modalities to reach a more comprehensive understanding, for example combining visual and auditory cues to recognize whether a person is angry.
- Application scenarios: Multimodal technology is used in many fields, such as self-driving cars, which need vision (recognizing road signs and pedestrians), hearing (recognizing sirens from emergency vehicles), and sensor data (sensing the surrounding environment), and chatbots, which need to understand both text and voice.
- Challenges: A major challenge for multimodal systems is how to effectively integrate information from different modalities, which may have very different characteristics and representations.
- Advantages: Multimodal systems are usually more accurate and robust than single-modal systems, because they can interpret information from multiple perspectives, reducing the chance of misinterpretation.
- Evolution: Multimodal AI is evolving rapidly as the technology advances, and more exciting applications are likely to emerge in the future.
Multimodal AI is like equipping a machine with multiple senses, allowing it to understand and process information in a more comprehensive way to better serve humans.
Prompt
In AI, especially in large pre-trained language models (e.g., GPT), "prompt" refers to a piece of text, question, or instruction that is fed into the model to guide it to produce a specific output. Simply put, a prompt is a hint or instruction given by the user to the AI that tells the AI what task to perform or what content to produce.
Here are some key points about the role of prompts in AI:
- Task guidance: A well-designed prompt can guide an AI model to perform specific tasks, such as answering questions, generating text, or translating between languages.
- Content generation: In text-generation tasks, the prompt can be a short question, a complete paragraph, or a set of instructions; the model generates the corresponding text based on it.
- Output optimization: Refining the prompt improves the accuracy and relevance of the model's output. A clear, specific, and consistent prompt tends to produce more accurate results.
- Structured information: Using structured elements such as lists, steps, and headings in the prompt helps the model generate organized, clear text, which is particularly suitable for tutorials, instructions, and step-by-step guides.
- Error avoidance: Vague or ambiguous instructions should be avoided when designing a prompt, to reduce the likelihood of the model generating unreasonable or inaccurate responses.
- Innovation and interest: Designing prompts that elicit innovative and interesting text can be a challenge; open-ended questions, interesting contexts, and quotations can stimulate a model's creativity.
- Technological development: As the technology advances, prompting has demonstrated strong capabilities in fields such as healthcare, finance, and education, and new design methods and optimization strategies are constantly being explored to improve model performance.
- Prompt engineering: An AI technique that improves performance by designing and refining prompts, creating highly effective and controllable AI systems.
- Application prospects: Despite some limitations, prompting remains very promising in many fields and is expected to play an even more important role in the future.
In practice, since AI is not yet intelligent enough to always understand human needs well, prompt design is crucial to model performance: it directly affects the content, style, and quality of the text the model generates.
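The points above, task guidance, structure, and clear instructions, can be illustrated with a small prompt template. The template text and field names are just an example, not a recommended standard:

```python
def build_prompt(task, context, question):
    """Assemble a structured prompt: role, context, question, constraints."""
    return (
        f"You are an assistant for {task}.\n"   # task guidance
        f"Context:\n{context}\n\n"              # structured information
        f"Question: {question}\n"
        # Explicit constraints reduce vague or fabricated answers.
        "Answer in numbered steps, and say 'I don't know' if unsure."
    )

prompt = build_prompt(
    task="technical support",
    context="The user runs Python 3.12 on Linux.",
    question="How do I create a virtual environment?",
)
```

The same question sent without the role, context, and output-format constraints typically yields a vaguer, less consistent answer, which is the whole case for prompt engineering.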
RAG
RAG, or Retrieval-Augmented Generation, is an artificial intelligence technique that combines the accuracy of retrieval models with the creativity of generative models. The RAG architecture leverages the dynamic capabilities of large databases and large language models (LLMs) to produce insightful and accurate results.
The RAG workflow typically consists of the following steps:
- Retrieval: The user query is used to retrieve relevant context from an external knowledge source. This usually means using an embedding model to embed the query into the same vector space as the documents in a vector database, so that a similarity search can return the top-k closest data objects.
- Augmentation: The user query and the retrieved context are inserted into a prompt template. The augmented prompt contains additional information that helps the model generate a more accurate and detailed response.
- Generation: Finally, the augmented prompt is fed into a large language model (LLM) to produce the final answer.
The advantage of RAG is that it allows the LLM to utilize additional data resources to improve the quality of generative AI without retraining. This makes RAG ideal for application scenarios that require up-to-date information, domain-specific knowledge, or personalized data. For example, customer service systems, educational platforms, research and analytics tools, and content generation are all areas that can benefit from RAG.
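The three-step workflow above can be sketched end to end. In this toy version, cosine similarity over word counts stands in for a real embedding model plus vector database, `generate` stubs out the LLM call, and the two documents are invented for illustration:

```python
from collections import Counter
import math

def similarity(a, b):
    """Cosine similarity between two texts over bag-of-words counts.

    A crude stand-in for an embedding model; punctuation is not handled.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Step 1 - Retrieval: top-k documents most similar to the query.
    return sorted(documents, key=lambda d: similarity(query, d), reverse=True)[:k]

def augment(query, context):
    # Step 2 - Augmentation: put query and retrieved context into a template.
    return f"Context: {' '.join(context)}\nQuestion: {query}\nAnswer:"

def generate(prompt):
    # Step 3 - Generation: a stub where a real LLM API call would go.
    return f"[LLM answer based on: {prompt}]"

docs = ["RAG combines retrieval with generation.",
        "GPUs accelerate parallel computation."]
answer = generate(augment("What is RAG", retrieve("What is RAG", docs)))
```

Note that only the retrieval index needs updating when knowledge changes; the model itself is untouched, which is exactly why RAG avoids retraining.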
AIGC
AIGC is a term used in the field of Artificial Intelligence, which stands for "Artificial Intelligence Generated Content". The term is commonly used to describe the process of automatically generating various types of content including, but not limited to, text, images, audio, and video, using AI techniques.
AIGC technology can be applied in a number of areas, such as:
- Writing and editing: automatically generating news articles, reports, novels, and so on.
- Art creation: producing paintings, music, poems, and other artworks.
- Design: automatically generating design patterns, architectural models, etc.
- Entertainment: generating game content, movie scripts, etc.
- Education: creating teaching materials and simulated scenarios.
AIGC typically relies on machine learning models, especially deep learning techniques such as natural language processing (NLP) models, image generation models (e.g., GANs, Generative Adversarial Networks), and so on. These models are able to mimic and replicate human creative processes by learning from a large number of data samples to generate new content.
The development of AIGC technology has opened up many opportunities, as well as a series of discussions about copyright, ethics, and creative labor. As technology advances, AIGC is likely to play an important role in many more areas in the future.
What is the B in 12B?
Figures such as 5B, 7B, or 12B in a large language model's name denote the number of trainable parameters. Here "B" stands for billion, i.e., 10^9; thus 5B means 5 billion trainable parameters and 12B means 12 billion. These parameters are the weights and biases of the neural network, which are updated during training by the back-propagation algorithm so that the model fits the training data better.
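The arithmetic behind these figures is simple and has a practical consequence: parameter count roughly determines how much memory the weights need. Assuming the common case of 2 bytes per parameter (float16/bfloat16 weights), a rough lower bound for a 12B model is:

```python
params = 12 * 10**9            # "12B" = 12 billion trainable parameters
bytes_per_param = 2            # float16 / bfloat16: 2 bytes each
weight_bytes = params * bytes_per_param
weight_gb = weight_bytes / 10**9

print(f"{weight_gb:.0f} GB just for the weights")   # prints: 24 GB just for the weights
```

Actual memory use is higher once activations, optimizer state, and the KV cache are included, but the parameter count sets the floor.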
Other
GPU
The CPU is the brain of the computer: it can handle a wide range of computational tasks and is suited to complex ones. A GPU, by contrast, has a much simpler structure built from many small cores, and excels at repetitive, simple tasks such as vector computation, which makes it ideal for artificial intelligence workloads.