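1. Introduction
This article is my personal notes from studying GPT.
1.1 Study materials
The internet today gives ordinary learners like me excellent support:
- Andrej Karpathy, 3Blue1Brown, and others produce superb videos
- Andrej's GPT course is very detailed and comes with code
- AI assistants such as ChatGPT can help explain questions that come up while studying
Below are some related materials I personally found valuable.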
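1.1.1 Introductory materials for building understanding
- YouTube, 3Blue1Brown: introduction to the basics of deep-learning neural networks
- YouTube, 3Blue1Brown: understanding backpropagation in neural networks
- YouTube, 3Blue1Brown: the formulas behind backpropagation
- YouTube, 3Blue1Brown: a first look at GPT
- YouTube, 3Blue1Brown: introduction to GPT's attention mechanism
- YouTube, 3Blue1Brown: how GPT stores information and knowledge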
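1.1.2 Andrej's hands-on GPT courses
- YouTube, Andrej: understanding and building everything from Bigram to GPT
- YouTube, Andrej: a talk on the outlook, development, and challenges of large models
- YouTube, Andrej: implementing the GPT Tokenizer
- YouTube, Andrej: a complete implementation of GPT-2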
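1.1.3 Other important materials
- "Attention Is All You Need" (paper): introduces the Transformer, an architecture built on attention mechanisms
- "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" (paper): proposes randomly dropping network units during training to prevent overfitting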
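- "Deep Residual Learning for Image Recognition" (paper): proposes residual connections, which make very deep networks easier to train and help mitigate the vanishing-gradient problem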
1.1.4 Available corpora
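2. Learning and implementing GPT step by step
2.1 Start with a simple fully connected neural network
This step is mainly for learning the basics of PyTorch:
- Building a model
- Forward propagation, backpropagation, and optimization
- Making predictions with the model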
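To make the three steps above concrete, here is a minimal sketch written in plain Python rather than PyTorch, so it runs without any dependencies. It is not the notebook's code: the toy data (points on y = 2x + 1), the names `params`, `forward`, `mse`, and `train_step`, and the learning rate are all made up for illustration. The "network" is a single fully connected layer with one input and one output, and the gradients are derived by hand instead of being computed by autograd.

```python
# Plain-Python stand-in for the three steps listed above (not PyTorch code).

# 1. Build the model: one fully connected layer with a single input and output.
params = {"w": 0.0, "b": 0.0}

def forward(x):
    """Forward pass: y_hat = w * x + b."""
    return params["w"] * x + params["b"]

# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse():
    """Mean squared error of the current model on the toy data."""
    return sum((forward(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# 2. Forward pass, backward pass, and one gradient-descent update.
def train_step(lr=0.05):
    n = len(xs)
    # Hand-derived gradients of the mean squared error w.r.t. w and b.
    grad_w = sum(2 * (forward(x) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (forward(x) - y) for x, y in zip(xs, ys)) / n
    params["w"] -= lr * grad_w
    params["b"] -= lr * grad_b

initial_loss = mse()
for _ in range(2000):
    train_step()
final_loss = mse()

# 3. Predict on an input the model has not seen.
prediction = forward(4.0)
print(initial_loss, final_loss, prediction)
```

In PyTorch the same roles are played by an `nn.Module` for the model, `loss.backward()` for the backward pass, and an optimizer's `step()` for the update.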
For the Jupyter Notebook notes and code, see:
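2.2 Implementing a Bigram model
Following Andrej's course, I implemented a Bigram model (about the simplest model in NLP) to get familiar with the basic NLP workflow:
- Processing the corpus (encode, decode)
- What the inputs and targets of each batch should be in supervised learning
- Evaluating the model's loss
- And so on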
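The items above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the notebook's code: the stand-in corpus `"hello world"` and the helper names `get_example` and `cross_entropy` are assumptions made for this example. It shows a character-level `encode`/`decode` pair, how each target sequence is the input shifted by one position, and the cross-entropy a model would score if it guessed uniformly over the vocabulary, which is a useful baseline when reading the training loss.

```python
import math

text = "hello world"  # stand-in corpus (made up for this example)

# Character-level vocabulary: every distinct character gets an integer id.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Map a string to a list of integer ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)

data = encode(text)

def get_example(data, start, block_size):
    """Return (input, target); the target is the input shifted by one."""
    x = data[start : start + block_size]
    y = data[start + 1 : start + block_size + 1]
    return x, y

x, y = get_example(data, 0, 4)
print(decode(x), "->", decode(y))

def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct next token."""
    return -math.log(probs[target])

# A model that guesses uniformly scores ln(vocab_size) on every position.
uniform = [1.0 / len(chars)] * len(chars)
baseline_loss = cross_entropy(uniform, y[0])
print(baseline_loss)
```

A trained model is only learning something if its loss falls below this uniform baseline.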
For the Jupyter Notebook notes and code, see:
Text generated by the trained Bigram model:
I gin cmy tofou winca e omedikinin atorin, un, Wh orir t,
CI d ces nid n wethanole thourselle d!PZAy I fr be Jut maid f bl k hanon; 'ds
A bes
Dout f illemerer,
BRY fano I dl mathepen f w--bukshe! theve at,
minia! ce w garyome Goll, t m'do amyos, wises ne aves thepred; m grconend n he bshasmethityosifowha alllicr tes wothoulor athis held.
INThallele, amalf merqus. MNowhinkid se o.
T:
TE att od OLove f cour howatltheay, y I'd bunth ast o ngy:
QUTheno ghenurd
DD t, waprcrrt kee oy flesserd n k's hy RYo e?
TEDY:
Y more oultime
ARDWinthel gondoleraysind, myOnato t be Ant then merims mong rve COnd berm t welile
MPOM:
Y: itidyoumil llle be; yif
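The quality is clearly very poor, which is understandable: this bigram model generates text character by character, and each step considers only the previous character.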
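The limitation is easy to see in a count-based bigram sampler. Note that this swaps in a different technique from the course, which trains a neural bigram model in PyTorch; the sketch below simply counts which character follows which and samples from those counts. The toy corpus and the names `fit_bigram` and `generate` are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def fit_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample characters one at a time, looking only at the last one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # the last character was never seen with a follower
            break
        candidates = list(followers.keys())
        weights = list(followers.values())
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return "".join(out)

corpus = "the cat sat on the mat. the hat sat on the cat."  # toy corpus
counts = fit_bigram(corpus)
sample = generate(counts, "t", 40)
print(sample)
```

Every adjacent pair in the sample occurs somewhere in the corpus, but nothing ties the pairs together into words, which matches the kind of output shown above.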
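2.3 Practicing the self-attention mechanism
Continuing with Andrej's course.
After implementing the bigram model, it becomes clear that neither a larger model nor longer training makes the output much better (although it does beat no training at all). The main reason is that a bigram predicts the next token from the previous token alone, so its context is extremely short.
We therefore need a way to widen that context so the model can use more of the preceding text. N-grams were once the simple way to capture longer context in NLP, but their effectiveness is also limited. In practice, the Transformer's attention mechanism is very good at gathering and processing information (though it is also expensive to train).
First, a look at the idea behind self-attention, the core of the Transformer.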
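As a rough illustration of the idea, the sketch below implements a single head of causal scaled dot-product attention in plain Python. It makes simplifying assumptions that are not from the notebook: the queries, keys, and values are passed in directly (here all equal to the toy input `x`), whereas a real Transformer produces them with learned linear projections, and the function names are made up. Each position computes a weighted average over the values at its own and earlier positions only, which is how the model gets to use more than one previous token.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Position t may only look at positions 0..t (the causal mask).
        scores = [dot(q, keys[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence of three 2-dimensional token vectors (made up).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for t, w in enumerate(weights):
    print(t, w)
```

The first position can only attend to itself, while later positions spread their weight over everything that came before them.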
For the Jupyter Notebook notes and code, see:
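2.4 Implementing a simplified GPT
Continuing with Andrej's course, I implemented a minimal GPT.
For the Jupyter Notebook notes and code, see:
Text generated by the minimal GPT trained only on Harry Potter: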
Iweared as they stretch of him.
"Sonty, Ronad Howler, you dident them into there?" and Her conduction.
He looked forward. Her hand at the other.
"So what d'you think here?"
Black opened the steering back onto the chocolate.
"That's been known of wizards, next to his feet,Harry Potter, Siriuta."
"You shut are true," said Harry.
Dudley had to go all so muchtember, but coursing picking up inthe last years to be opponing.
"I'm stuggling it," he said importantly, softrying a fewer squashy and still gruly smarking. "What d'you don't taught," Harry asked, "It's not far 'not, jus' better, it's the bethat off' the powers. Ot it -- us you could tall kill yeh!Reluch? Got I'd find tonight on to join yousit
*250*
espectacular outside corridors.
"But Dumbledore's master!" said "The Colin Curesures about the Hall and jinxed."
The Slythering worder lit down the air, dennig stupidly through the shelveshudder through the gave Harry holding him a secret and undolughthe tip. Speed highly in the sunken air kneed his wand. There was a dragon, ugly, Harry kimal jumped tables, and lightly pulled off the door of the doors behind the floor. Fred and George were still completely.
"You don't stop?" she said to Harry.
Many words are still misspelled and the text has no coherent meaning, but it is already much better than the Bigram output.
2.5 Implementing the GPT Tokenizer
For the Jupyter Notebook notes and code, see:
2.6 Combining GPT with the Tokenizer
For the Jupyter Notebook notes and code, see:
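To illustrate the tokenizer side of this combination, the sketch below assumes the tokenizer follows byte-pair encoding (BPE), the scheme covered in Andrej's tokenizer video. It is not the notebook's code: the function names (`most_common_pair`, `merge`, `train_bpe`, `encode`, `decode`), the toy string, and the number of merges are made up for illustration. The model then trains on the resulting token ids instead of on single characters.

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair of ids, or None if too short."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, kept in the order the merges were learned
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for the raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

def encode(text, merges):
    """Apply the learned merges, in learned order, to new text."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids

def decode(ids, merges):
    """Expand token ids back into bytes, then into a string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

toy_text = "aaabdaaabac"  # toy string, made up for this example
merges = train_bpe(toy_text, num_merges=3)
token_ids = encode(toy_text, merges)
print(len(toy_text), "characters ->", len(token_ids), "tokens")
```

Because each token can now cover several characters, the same context length spans more text than it does with a character-level vocabulary.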
Text generated by the minimal GPT plus the simple Tokenizer, trained only on Harry Potter:
who was coming to him ready. Her arms she slopped into her Slytherins:
Mr. But one leapt and that jerkier were explocoed by an antidote spell, which conversation was very continually abandon endles. Mrs. Weasley had planted quiberin through the door hearing her head and Harry thought the Person her advancing subpening left of Fudge. Even Bertha's happy Dumbledore and Hagrill, Mr Weasley’s name interested, that they were cas a short way night remember than Dumbledore. Famous and he was, whose spellbooks,
finally little would have kidnailed be weidering like sense weapon? The door of everybody because Harry under the teach for the World Cup! Anyway, he
missively. He lost every lucky, and saw that creaking fond of seats? A floor of Oce wizard's crackling into Hagrid grip again and
The spiralum had shed and with their brains. The poor wizards around the last for Madam Edawn earer, looking on the way through the sky.
Which affle away a dragon empt, within the tiny air, Harry had passed a great door that encountered they'd have been one
two when they were urrounded by a subject. Voldest less edup. Sirius.” said Hermione at once, morning the same reflection to the pitch. Harry had been buying woods:
He turned right nice hardly by Mosten, in which he was, however, all of Cold, Minister Beauxbatons were insidently to be every obviour, and and the small port of bite bird History of Magical
Jerkins wizarding Lestruck kind of day. Aunt Muze that the arrived hair on the between Percy which herogs, resister couldn’t sit with
their week-eyed live long boyss.
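There are fewer misspelled words and the quality has improved further, but the grammar still has many problems and the meaning remains fairly muddled. Judging from the training runs, the main limitation is the lack of training data (only Harry Potter books 1-7); training for longer would lead to overfitting. Enlarging the training corpus further should gradually improve the results.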