Deep Learning Notes 10

68. Transformer

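Layer Norm is different from Batch Normalization.

BN normalizes the same channel across the different samples of a batch (that is the only way it makes sense), while Layer Norm normalizes across the features of each individual sample.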
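To make the difference concrete, here is a minimal sketch in plain Python. It assumes a 2-D input (rows are samples, columns are features/channels), and it leaves out the learnable scale and shift as well as the running statistics that real implementations keep. The helper names `normalize`, `batch_norm` and `layer_norm` are made up for this note.

```python
import math

def normalize(values, eps=1e-5):
    """Shift and scale a list of numbers to zero mean and roughly unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(x):
    """Normalize each feature (column) with statistics taken over the batch (rows)."""
    columns = [normalize(list(col)) for col in zip(*x)]
    # Transpose back so the result has the same layout as the input.
    return [list(row) for row in zip(*columns)]

def layer_norm(x):
    """Normalize each sample (row) with statistics taken over its own features."""
    return [normalize(row) for row in x]

# Toy batch: 2 samples, 3 features each.
x = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0]]

bn = batch_norm(x)  # every column now has zero mean
ln = layer_norm(x)  # every row now has zero mean
```

With BN the result for one sample depends on the other samples in the batch; with Layer Norm each sample is handled on its own, so batch size and differing sequence lengths do not affect the statistics.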

image-20240203192712738

image-20240203203500016

image-20240203170601578

image-20240203170943456

image-20240203171815812

image-20240203191933707

image-20240203172057536

image-20240203172746423

FFN: image-20240203192356550

image-20240203172854289

image-20240203173202346

image-20240203173355349

Training:

image-20240203173800584

image-20240203174014496

Prediction: image-20240203193309908

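The BLEU score is an evaluation metric used at test time, but it cannot serve as a training loss: it is not differentiable, so no gradient can be backpropagated for gradient descent.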
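A rough sketch of why BLEU is not differentiable: the score is built from counting matching n-grams between discrete token sequences. The version below is a simplified sentence-level BLEU, assuming a single reference, uniform weights over n-gram orders and no smoothing. It is not necessarily the exact variant shown in the figures, and `ngrams` and `simple_bleu` are names made up for this note.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(pred, ref, max_n=2):
    """Simplified sentence-level BLEU of `pred` against one reference `ref`."""
    if not pred:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        pred_counts = ngrams(pred, n)
        ref_counts = ngrams(ref, n)
        total = sum(pred_counts.values())
        # Clipped matches: a predicted n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # no smoothing in this sketch
        # Uniform weights: geometric mean of the n-gram precisions.
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: punish predictions shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(pred)))
    return brevity * math.exp(log_precision_sum)

ref = "the cat sat on the mat".split()
perfect = simple_bleu(ref, ref)                 # identical sentences
short = simple_bleu("the cat".split(), ref)     # correct but too short
```

Everything here works on integer counts of discrete tokens obtained after decoding, so a small change in the model's parameters either leaves the score unchanged or makes it jump. That is why training uses a differentiable loss such as cross-entropy, and BLEU is only reported afterwards.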

image-20240203174644867

image-20240203174914105