PyTorch Transformer Language Model Clarified
Yang Xu
Apr 25
PyTorch provides a fairly thorough tutorial for building a complete pipeline for training and evaluating a Transformer-based language model (link). It offers a sufficient amount of detail for beginners, but there are several places I found that can be further clarified.

1. The batch_size in section "Run the model" should be seq_len

First, let's look at the for loop in train(), where a batch is taken from train_data and processed into data and targets:

for batch, i in enumerate(...
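To see why the per-step length is a sequence length rather than a batch size, here is a minimal sketch of the tutorial-style get_batch logic, using plain Python lists instead of torch tensors so the shapes are easy to inspect. The bptt value and the list-based source are assumptions for illustration, not the tutorial's exact code.

```python
bptt = 4  # assumed maximum sequence length per training step

def get_batch(source, i):
    # source: a flat list of token ids; i: starting index into source.
    # The slice length can be shorter than bptt near the end of the data,
    # so the quantity varying inside the train() loop is a seq_len,
    # not a fixed batch_size.
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    # targets are the same sequence shifted by one position
    target = source[i + 1:i + 1 + seq_len]
    return data, target

source = list(range(10))
data, target = get_batch(source, 0)
print(data)    # [0, 1, 2, 3]
print(target)  # [1, 2, 3, 4]
```

In the real tutorial, data and target are 2-D tensors of shape [seq_len, batch_size] and a flattened [seq_len * batch_size] respectively; the sketch above keeps only the one-step-shift relationship that matters for this point.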
Written by
Yang Xu

I am an Assistant Professor of Computer Science at San Diego State University. I do research in machine learning and NLP.