Artificial Intelligence Comes to Everyone (2017~Now)

Written by
Silas Grey
Updated on: May 29th, 2025

Environmental Changes

After the new millennium, computer technology developed at an extraordinary rate, and quantitative changes in AI accumulated into the foundation for a qualitative leap.

Internet Technology

With the emergence of Internet technology, knowledge sharing and collaboration on a global scale have been facilitated, enabling researchers in the field of AI to easily share research results and technological advances. Open-source projects and online forums provide communication platforms for AI research and applications, accelerating the dissemination and application of new technologies.

At the same time, the Internet has generated a vast amount of data. From intelligent personal assistants to automatic translation, recommender systems, and smart home devices, and across e-commerce, social networks, and digital libraries, this content has become an easily accessible source of training data for AI and machine learning algorithms.

Computer Hardware Technology

The semiconductor industry has grown exponentially since Gordon Moore proposed Moore's Law in 1965. Computer storage and computing power have increased rapidly, and the emergence of GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), and other specialized AI accelerators has greatly sped up the training of AI models. Because this hardware is designed for parallel processing, it can efficiently handle large-scale datasets and complex computations, shortening training time and making more complex and deeper neural network models possible.

Software Technology

Advances in hardware have also led researchers to explore new computing paradigms, such as quantum computing and neuromorphic computing; the latter aims to mimic the way the human brain works in order to handle learning tasks more efficiently. While advanced hardware has made these explorations possible, software technology has achieved major breakthroughs in the following areas:

  • Artificial intelligence algorithms (machine learning, deep learning)

  • Neural networks

  • Computer vision

  • Natural language processing

With the development of these technologies, they have gradually been applied to people's daily life, such as:

  • Access control face recognition

  • Automobile automatic driving

  • Industrial intelligent manufacturing

  • Smart speakers (Amazon Echo, Google Assistant, Apple Siri, Xiao Ai Speaker, Tmall Genie, Baidu Xiaodu)

  • ...

Artificial intelligence in these application scenarios solved many real needs and problems, but the systems were relatively narrow and rule-based; beyond a certain range they could appear quite unintelligent, so on the whole the field still remained at the stage of weak (narrow) artificial intelligence.

 

Neural Network Architecture Changes

Natural Language Processing (NLP) is a challenging problem in deep learning. Unlike image recognition and computer vision problems, natural language does not come with a convenient vector or matrix structure, and the meanings of raw words are not as definite and easy to represent as pixel values. Generally we use word embedding techniques to convert words into vectors, which can then be fed into the model for computation.
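To make the idea concrete, here is a minimal sketch of an embedding lookup. The vocabulary, vector dimension, and random table are all illustrative; real systems learn these vectors (e.g. with word2vec, GloVe, or as an embedding layer trained jointly with the network).

```python
import numpy as np

# Toy vocabulary and a randomly initialized embedding table.
# Random values are used here only to illustrate the lookup itself.
vocab = {"the": 0, "cat": 1, "sat": 2}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))  # 4-dimensional vectors

def embed(sentence):
    """Map each word to its vector, producing a (seq_len, dim) matrix."""
    return np.stack([embedding_table[vocab[w]] for w in sentence.split()])

vectors = embed("the cat sat")
print(vectors.shape)  # (3, 4)
```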

Language modeling is a natural language processing technique based on statistical and machine learning methods, which is used to evaluate and predict the probability distribution of a given sequence, usually a sequence of words or a sequence of characters. The main applications of language modeling are tasks such as text generation, machine translation, and speech recognition.
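As an illustration of "predicting the probability of a sequence", here is a toy bigram language model built from maximum-likelihood counts; the three-sentence corpus and the start/end markers are invented for this example.

```python
from collections import Counter

corpus = ["the cat sat", "the cat ran", "the dog sat"]

# Count bigrams and their left contexts across the toy corpus.
bigrams, contexts = Counter(), Counter()
for line in corpus:
    words = ["<s>"] + line.split() + ["</s>"]
    for prev, cur in zip(words, words[1:]):
        bigrams[(prev, cur)] += 1
        contexts[prev] += 1

def prob(prev, cur):
    """Maximum-likelihood estimate of P(cur | prev)."""
    return bigrams[(prev, cur)] / contexts[prev]

def sentence_prob(sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= prob(prev, cur)
    return p

print(prob("the", "cat"))  # 2/3: "the" is followed by "cat" in 2 of 3 lines
```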

Recurrent Neural Network (RNN)

When humans read, we do not work out the meaning of each word from scratch; we understand the current word through the information carried by the words before it. Recurrent Neural Networks (RNNs) were developed based on this behavior.

RNN is a classical sequential model that feeds information from a sequence into the network one by one in a cyclic manner and uses a recurrent structure within the network to capture temporal dependencies in the sequence.

The main problem with the RNN model is that its inputs are sequences: the data can only be processed step by step, not in parallel, so the computational cost is high. In addition, it suffers from the vanishing gradient problem and therefore cannot handle very long sequences.
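The sequential bottleneck is easy to see in code. Below is a minimal, illustrative RNN forward pass (random weights, no training, invented dimensions): each hidden state depends on the previous one, so the loop over time steps cannot be parallelized.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_in, dim_h = 4, 8
W_xh = rng.normal(scale=0.1, size=(dim_h, dim_in))  # input -> hidden
W_hh = rng.normal(scale=0.1, size=(dim_h, dim_h))   # hidden -> hidden (the recurrence)

def rnn_forward(inputs):
    """Process a sequence one step at a time; each step needs the previous h."""
    h = np.zeros(dim_h)
    for x in inputs:                      # this loop is inherently sequential
        h = np.tanh(W_xh @ x + W_hh @ h)  # new state depends on the old state
    return h

sequence = rng.normal(size=(10, dim_in))  # 10 time steps
h_final = rnn_forward(sequence)
print(h_final.shape)  # (8,)
```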

Improvements such as GRU and LSTM exist, but they do not solve these fundamental problems: even when memory is improved by gating and attention mechanisms, the computation is not. Processing remains sequential, since each token has to be handled in order, so these models still cannot resolve long-range contextual relationships or computational efficiency. Hence the birth of the Transformer.

Transformer architecture

In 2017, Google proposed the Transformer architecture in the paper Attention Is All You Need. It is a neural network architecture based on a self-attention mechanism that processes long sequences by letting information from all positions in the sequence interact. The Transformer passes the input sequence through an encoder and a decoder, each consisting of multiple layers, and each layer consisting of a multi-head self-attention mechanism and a fully connected layer. Because the encoder and decoder can process multiple sequence positions simultaneously, the Transformer handles long sequences more efficiently. The following figure shows a schematic of the network from the paper:

A detailed description of the Transformer architecture will be presented in the chapter on the principles of large models. In summary, the main features of the Transformer architecture are as follows:

  • Parallel processing: the Transformer no longer works like an RNN, where each step depends serially on the previous one. As the figure above shows, the Transformer takes the input sentence as a whole and feeds it into the embedding layer, so it can compute in parallel, which greatly improves processing speed. Because it no longer emphasizes the step-by-step order of the input sequence, there is no long-dependency problem.

  • Positional encoding: a technique for adding position information to the input embeddings, so that the model can tell where each part of the input sits within the whole. Because the tokens are processed in parallel rather than in order, this injected position information is what allows the multi-head attention mechanism to weigh the relationships between the disassembled parts (tokens) correctly.

  • Self-attention mechanism: introduced in the Attention Is All You Need paper, this is the core component of the Transformer and its most common structure. Unlike earlier attention mechanisms, it can model dependencies between different positions in a sequence without relying on temporal ordering, which allows much better handling of long sequences.
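As an illustration of the positional-encoding point above, here is the sinusoidal scheme from the Attention Is All You Need paper; the sequence length and dimension are arbitrary example values.

```python
import numpy as np

def positional_encoding(seq_len, dim):
    """Sinusoidal positional encoding:

    PE[pos, 2i]   = sin(pos / 10000^(2i/dim))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/dim))
    """
    pos = np.arange(seq_len)[:, None]   # (seq_len, 1)
    i = np.arange(0, dim, 2)[None, :]   # (1, dim/2): even dimension indices
    angles = pos / np.power(10000.0, i / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)        # even columns get sines
    pe[:, 1::2] = np.cos(angles)        # odd columns get cosines
    return pe

pe = positional_encoding(seq_len=50, dim=16)
print(pe.shape)  # (50, 16)
# The encoding is simply added to the token embeddings before the first layer.
```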

Because the Transformer architecture provides a basis for processing massive amounts of information and building models with massive numbers of parameters, the Transformer and its variants have, since the publication of Attention Is All You Need, been widely used to train large language models on large datasets.
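The self-attention computation at the heart of these models is compact. Below is an illustrative single-head scaled dot-product self-attention in NumPy (random weights, no training, invented dimensions): all positions are compared in one matrix product, with no step-by-step recurrence.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for a single head.

    Every position attends to every other position in one matrix
    multiplication -- there is no sequential loop over time steps.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) pairwise scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(2)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))           # embedded input sequence
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (5, 8) (5, 5)
```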

 

The Emergence of Large Models

Since the advent of the Transformer architecture, large language models have been popping up all over the world.

The turning point: the rise of pre-trained models

In 2018, pre-trained models reached a milestone with the release of Google's BERT (Bidirectional Encoder Representations from Transformers), which lets models understand more complex contexts and semantic relationships by pre-training on large-scale corpora. This innovation enabled large language models to perform well across a variety of natural language processing tasks, opening up new possibilities for applications such as automated Q&A and machine translation.

  • June 2018: GPT-1, trained on approx. 5 GB of text, with 117 million parameters;

  • February 2019: GPT-2, trained on approx. 40 GB of text, with 1.5 billion parameters;

  • June 11, 2020: the GPT-3 language model was announced.

Going global: cross-domain applications of large language models

In the fall of 2022, ChatGPT, built on the GPT-3.5 series, attracted enormous attention as it spread virally across social media. Large Language Models (LLMs) such as GPT-3, with 175 billion parameters and training costs estimated in the millions of dollars, opened a new era in the field of Natural Language Processing (NLP).

In this phase, the emergence of large-scale pre-trained models revolutionized the NLP research and application landscape.

Large-scale language models fully exploit the potential of large amounts of unlabeled data, thus endowing the models with stronger language understanding and generalization capabilities. Large models employing pre-training and fine-tuning strategies have achieved unprecedented success on several NLP tasks, demonstrating outstanding performance in terms of model accuracy, generalization ability, and complex task processing. This has not only attracted a large amount of investment, but also spawned completely new development and research directions in the field of NLP.

At the same time, as both the application threshold and the usage threshold of large language models have gradually fallen, a large number of large-language-model products have continued to emerge: ChatGPT; the AI code editor Cursor; GitHub's official coding assistant Copilot X; Gamma AI, which generates presentation content with one click; Copilot integrated into the Microsoft Office suite; the Generative Fill feature in Photoshop; Midjourney and Stable Diffusion for large-scale image generation... These applications have not only changed the way business operates but have also greatly affected people's lives and work. During this period, large language models crossed over into finance, healthcare, law, and many other fields, bringing intelligence and efficiency improvements to a wide range of industries.

 

The emergence of multimodality

With the wide application of large language models, multimodal learning has become a new direction in their development: models can now better understand multiple forms of information, such as text, images, and sound. At the same time, the concept of adaptive learning is guiding large language models to adapt better to different domains and tasks, making them more generalizable.

In the early morning of February 16, 2024, OpenAI once again shook the global technology community by releasing Sora, a text-to-video model. Compared with earlier text-to-video models, Sora crosses over into being a practical productivity tool; it not only marks a major breakthrough for AI in the field of video generation, but also triggers profound reflection on how the development of AI will affect the future of humanity.

On the same day, Google launched Gemini 1.5 Pro, a medium-sized multimodal model supporting a context window of up to 1 million tokens, far exceeding other base models at the time. It can process a large amount of information at once: for example, 1 hour of video, 11 hours of audio, more than 30,000 lines of code, or more than 700,000 words.

The Dawn of the AGI Era

With the release of Sora, AI seems to have stepped toward the era of artificial general intelligence (AGI): machine intelligence capable of performing a wide range of intelligent activities like humans, including understanding language, recognizing images, and carrying out complex reasoning. The Sora model can directly output videos of up to 60 seconds containing highly detailed backgrounds, complex multi-angle shots, and multiple emotionally rich characters. This capability goes beyond simple image or text generation and begins to touch the more complex and dynamic medium of video. It means that AI is not only becoming ever more powerful at processing static information, but is also showing astonishing potential for dynamic content creation.

Convergence of Dreams and Reality

Humans have had the desire to turn dreams into reality since time immemorial, and the launch of Sora has certainly taken this process a big step forward. Through highly realistic video generation, Sora gives us a glimpse of the possibilities of intermingling dreams with reality. Artists, directors and even ordinary people in the future will be able to quickly transform their creativity and imagination into visual works with the help of tools like Sora. This will not only greatly enrich our cultural life and entertainment experience, but may also have a disruptive impact on industries such as film and television and advertising.

AI generates AI: the age of the unpredictable

The release of Sora also brings another shocking revelation: the era of generating AI with AI may no longer be far away. Driven by AI technology, models like Sora can not only mimic human creative styles, but may even evolve themselves to produce entirely new artistic styles and creative approaches. This process of "AI generating AI" will make the development of artificial intelligence more unpredictable, but at the same time full of unlimited possibilities.

The Rise of Silicon-based Life and Future Challenges

The advent of the AI era will not only push humanity from carbon-based toward silicon-based life, but will also profoundly affect human productivity and lifestyles. From this silicon-based perspective, humanity may achieve a leap in productivity and reach unprecedented heights of civilization through deep integration with AI. However, this also brings unprecedented challenges and ethical issues: as AI-generated content becomes ever more realistic, it will be difficult for humans to distinguish fantasy from reality, which may profoundly affect our cognition, our emotions, and even social stability.

 

Impact on everyone

 

AI's impact bears directly on the development of modern civilization. AI has become a sign that the future has arrived; it will continue to approach and shape people's lives, with a broad and profound impact on the progress of human society. From self-driving cars to smart home systems, from virtual shopping assistants to medical diagnostic systems, AI is gradually penetrating every aspect of our lives.

AI can free up manpower and improve work efficiency. AI digital humans and AIGC (AI-generated content) are changing a number of industries around us, such as news media, e-commerce marketing, planning, product design, and film and television production. A series of heavy, repetitive, and even dangerous tasks will gradually be taken over by AI, while new AI-related careers, such as prompt engineers, will emerge in their place.

The first wave of people who have lost their jobs to AI has already arrived. We cannot stop the rapid development of science and technology; we can only embrace the future as it comes. As witnesses to and participants in human history, we should improve our own understanding of AI and its technology: learn it, use it, and even improve it.

Although AI technology brings many benefits, there are also some concerns and risks. For example, AI technology may lead to increased unemployment, privacy breaches, data misuse, and ethical issues.