The Rise of GenAI and LLMs

DEVELOPMENTS IN THE 1980S, 1990S, AND 2000S

Recurrent neural networks (RNNs) surfaced in 1986 and quickly gained popularity. Unlike traditional feedforward neural networks, in which information flows in one direction only, RNNs could remember previous inputs through an internal state, or memory, and were trained on sequential data with an algorithm known as backpropagation through time. This made them well-suited to natural language processing (NLP) tasks. However, RNNs struggled to retain long-range context, a limitation known technically as the vanishing gradient problem.
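
To make the recurrence concrete, here is a minimal sketch of a single RNN step in Python with NumPy; the dimensions and random weights are purely illustrative, since real weights would be learned with backpropagation through time.

```python
import numpy as np

# Toy sizes and randomly initialized weights (illustrative only).
rng = np.random.default_rng(0)
hidden_size, input_size = 8, 4
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One RNN step: the new hidden state mixes the current input with the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Run a short sequence; h is the network's "memory" carried from step to step.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h)
```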

In 1997, long short-term memory (LSTM) networks, a specialized type of RNN, were introduced. LSTMs addressed the short-term memory limitations of RNNs by slowing the degradation of gradients, allowing them to capture more distant dependencies. With a unique architecture featuring input, forget, and output gates, LSTMs could selectively retain or discard information, maintaining relevant data over long sequences. This ability made LSTMs effective at capturing long-term dependencies in sentences, significantly improving tasks such as coreference resolution.
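
As an illustration of that gating, here is a minimal single-step LSTM cell in Python with NumPy. The sizes and random weights are placeholders rather than anything from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
H, X = 8, 4                                      # toy hidden and input sizes
init = lambda: rng.normal(scale=0.1, size=(H, X + H))
W_i, W_f, W_o, W_c = init(), init(), init(), init()
b_i = b_f = b_o = b_c = np.zeros(H)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W_i @ z + b_i)                   # input gate: what new information to store
    f = sigmoid(W_f @ z + b_f)                   # forget gate: what to discard from the cell state
    o = sigmoid(W_o @ z + b_o)                   # output gate: what to expose as the hidden state
    c = f * c_prev + i * np.tanh(W_c @ z + b_c)  # cell state carries long-range information
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, X)):
    h, c = lstm_step(x_t, h, c)
```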

2010 saw the release of the Stanford CoreNLP tools, an open source NLP toolkit created by the Stanford NLP Group. Its libraries handle a range of text processing tasks, including tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and sentiment analysis.
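
For readers who want to try these tasks from Python, the following is a small sketch using Stanza, the Stanford NLP Group's Python package, which covers the same kinds of tasks and can also drive a CoreNLP server; it assumes the package and its English models have been installed.

```python
import stanza

stanza.download("en")  # one-time download of the English models
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")
doc = nlp("Stanford University released CoreNLP in 2010.")

for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.upos, word.deprel)  # token, part of speech, dependency relation
for entity in doc.ents:
    print(entity.text, entity.type)               # named entities
```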

The launch of Google Brain in 2011 marked a transformative period, giving researchers access to powerful computing resources and large datasets and producing advances such as word embeddings. These developments facilitated a deeper contextual understanding of words, propelling the field forward.

In 2014, Kyunghyun Cho, with colleagues at the University of Montreal, Le Mans University (France), and Jacobs University (Germany), introduced gated recurrent units (GRUs) to simplify the structure of LSTMs while tackling similar problems. GRUs used only two gates—update and reset—to streamline the gating process. This reduced complexity made GRUs more computationally efficient while retaining long-term dependencies effectively, addressing the vanishing gradient issue like LSTMs but with a more straightforward approach.
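
A minimal single-step GRU in Python with NumPy shows how the two gates work together; as before, the dimensions and random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
H, X = 8, 4                                      # toy hidden and input sizes
init = lambda: rng.normal(scale=0.1, size=(H, X + H))
W_z, W_r, W_h = init(), init(), init()
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ np.concatenate([x_t, h_prev]))   # update gate: how much old state to keep
    r = sigmoid(W_r @ np.concatenate([x_t, h_prev]))   # reset gate: how much past state feeds the candidate
    h_tilde = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))
    return (1 - z) * h_prev + z * h_tilde               # blend old state with the new candidate

h = np.zeros(H)
for x_t in rng.normal(size=(5, X)):
    h = gru_step(x_t, h)
```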

And then came a step change. Generative adversarial networks (GANs), introduced by Ian Goodfellow and fellow researchers at the University of Montreal at the 2014 International Conference on Neural Information Processing Systems (NIPS 2014), pit two neural networks against each other in a game. One, the generator, attempts to deceive the other with generated content, while the other, the discriminator, examines that output for irregularities that would betray its source. The game continues until the generated content is so close to real content that the discriminator can no longer challenge its authenticity. Despite these advances, RNNs and their variants, LSTMs and GRUs, still struggled to retain context across long sequences.
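
The adversarial game can be sketched in a few lines of PyTorch (used here only for illustration, not the original 2014 code): a generator learns to imitate samples from a simple one-dimensional distribution while a discriminator learns to tell real samples from generated ones.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.25 + 4.0   # "real" data drawn from N(4, 1.25)
    fake = generator(torch.randn(64, 8))     # generated data

    # Discriminator: push real samples toward label 1 and generated samples toward 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator label generated samples as real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```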

The introduction of the attention mechanism marked a paradigm shift in sequence modeling. Unlike RNN encoder-decoders, which compressed the source into a fixed-size context vector, attention dynamically referenced the entire source sequence, selecting the relevant parts at each output step. This approach prevented the loss of crucial information, especially in longer sequences, and significantly improved performance.
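
A minimal sketch of dot-product attention in Python with NumPy illustrates the idea: instead of squeezing the source into one vector, the decoder scores every encoder state at each output step and takes a weighted mixture of all of them. Names and dimensions here are illustrative.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Score every source position against the current decoder state,
    then return a weighted mixture of the whole source sequence."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)                 # normalized attention weights
    return weights @ encoder_states, weights  # context vector and the weights themselves

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 8))      # 6 source positions, 8-dim states
decoder_state = rng.normal(size=8)
context, weights = attend(decoder_state, encoder_states)
```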

TRANSFORMER ARCHITECTURE

The transformer architecture, introduced in 2017 by Google Brain’s Ashish Vaswani et al. in the paper “Attention Is All You Need” (proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf), revolutionized sequence processing. Relying entirely on attention mechanisms, transformers featured an encoder and a decoder with multiple self-attention layers and feed-forward neural networks. The multi-head attention mechanism allowed transformers to focus on different parts of the input simultaneously, capturing various contextual nuances.
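
A compact NumPy sketch of multi-head self-attention (with illustrative dimensions and random weights, not the paper’s trained parameters) shows how several heads attend to the same sequence in parallel and are then recombined.

```python
import numpy as np

def softmax(m):
    e = np.exp(m - m.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over an entire sequence at once."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))        # one token representation per row

# Each head uses its own projections and attends to the sequence independently.
heads = [self_attention(X,
                        rng.normal(size=(d_model, d_head)),
                        rng.normal(size=(d_model, d_head)),
                        rng.normal(size=(d_model, d_head)))
         for _ in range(n_heads)]
W_o = rng.normal(size=(d_model, d_model))
output = np.concatenate(heads, axis=-1) @ W_o  # concatenate the heads and project back
```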

This architecture processed sequences in parallel, enhancing efficiency and laying the foundation for subsequent models. In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers), a model that processed text bidirectionally, setting new performance standards. OpenAI released GPT-1 (Generative Pre-trained Transformer) the same year.
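
One way to see BERT’s bidirectional training in action is a short sketch using the Hugging Face transformers library (an assumed third-party tool, not part of the original releases), which fills in a masked word using context from both sides.

```python
from transformers import pipeline

# Predict the masked token from both its left and right context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The transformer architecture was introduced in [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```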

OpenAI’s GPT-2 followed in 2019, with GPT-3 arriving in 2020, marking a paradigm shift in AI capabilities. These large language models (LLMs), pretrained on vast amounts of text, could be fine-tuned for specific tasks and perform a wide range of functions.
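
Likewise, a pretrained GPT-2 checkpoint can be sampled in a few lines through the same assumed Hugging Face transformers library; the pretrain-then-fine-tune pattern described above applies in the same way to larger models.

```python
from transformers import pipeline

# Generate a continuation from a small pretrained GPT-2 checkpoint.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```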
