COMP4650/6490 Document Analysis – Semester 2 / 2024
Tutorial / Lab 6
Last updated June 28, 2024

Q1: Pre-training
Explain why neural network architectures are especially well suited for pre-training.

Q2: Self-supervised Objectives
(a) In BERT's masked language modelling training objective, tokens selected for prediction are sometimes left unchanged or replaced with a random word, instead of being replaced with the [MASK] token. Explain why this is done.
(b) Explain why, for BERT's next sentence prediction task, inputs are encoded as [CLS] sentence1 [SEP] sentence2 [SEP].

Q3: Practical Exercise
In this lab you will fine-tune a pre-trained transformer model using the Hugging Face transformers library [1]. The dataset is the IMDb movie review dataset, and the task is to classify each review as positive (the reviewer liked the movie) or negative (the reviewer did not like the movie). The input is the text of the review and the output is a binary label: 0 (negative) or 1 (positive). Work through the notebook lab6-finetune transformer.ipynb and complete the practical exercise in it.

[1] https://github.com/huggingface/transformers
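
As background for Q2(a), the corruption step the question refers to can be sketched in plain Python. This is a minimal illustration, not BERT's actual implementation: the function name mask_tokens, the toy vocabulary, and the whitespace tokenisation are assumptions made for the example. The proportions follow the original BERT paper, where about 15% of tokens are selected for prediction, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Corrupt a token sequence in the style of BERT's masked LM objective.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    mask_tokens is a hypothetical helper written for this sketch.
    """
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Select roughly select_prob of the positions for prediction.
        if rng.random() >= select_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            # 80% of selected positions: replace with the [MASK] token.
            corrupted[i] = MASK
        elif r < 0.9:
            # 10%: replace with a token drawn uniformly from the vocabulary.
            corrupted[i] = rng.choice(vocab)
        # Remaining 10%: leave the original token in place.
    return corrupted, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy vocabulary and sentence, assumed for illustration only.
    vocab = ["the", "movie", "was", "great", "boring", "plot", "acting"]
    sentence = "the movie was great but the plot was boring".split()
    corrupted, labels = mask_tokens(sentence, vocab, rng)
    print(corrupted)
    print(labels)
```

Note that the loss is computed only at positions where labels is not None, including those where the input token was left unchanged or randomly replaced. Keeping that in mind may help when answering Q2(a).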