Homework 3: Language Modeling
CS 1470/2470
Due October 19, 2020 at 11:59pm AoE

1 Conceptual Questions

1. What are the dimensions of an embedding matrix? What do they represent?

2. Given the following sentences, plot reasonable embeddings in 2D for “Blueno”, “teleported”, “planet”, “star”, and “flew”. (Hint: A simple graph with some clusters is fine.)

   Blueno flew to the planet.
   Then Blueno teleported to the star.
   I went to the star.

3. What are some benefits of using RNNs over trigrams (or n-grams, generally speaking)?

4. What are LSTM cells? How are they different from vanilla RNNs, and why are they able to ‘remember’ information for longer timeframes than vanilla RNNs? (Hint: Your answer should, at minimum, address the concepts of gates and gradients.)

5. (Optional) Have feedback for this assignment? Found something confusing? We’d love to hear from you!

2 Ethical Implications

1. OpenAI and GPT-3: In June 2020, OpenAI created a transformer-based language-generator model called GPT-3 and released a private beta. The text that it is capable of generating nears human-level writing ability. The computer scientists who made the model believed it was irresponsible to release the entire model. Instead, they have released an API that lets businesses and individuals use the model. Explore more about GPT-3 here.

   (a) Do you think OpenAI should release the entire model? Why or why not? (3-5 sentences)

   (b) Identify three ways that GPT-3 can be used maliciously beyond fake news. How can OpenAI be held accountable for misuse of the API? (5-8 sentences)

   (c) Is the usefulness of GPT-3 worth it, given the enormous quantities of energy resources it takes to build? (3-5 sentences)

   (d) Is there a problem in your local community or city that GPT-3 could be used to solve? What would be some potential unintended consequences, and how would you mitigate them? (4-6 sentences)

3 CS2470-only Questions

1. The Gated Recurrent Unit (GRU) is another recurrent network cell that can, like the LSTM, retain information over long sequences. How is it able to do this? Describe its architecture, and compare it with that of the LSTM.

2. While we have studied Convolutional Neural Networks (CNNs) in the context of 2D images, CNNs can also be used for 1D sequence modeling tasks, such as language modeling. Look up some papers that have attempted this and that compare CNN language models to RNN language models (cite which papers you read). What appears to be the general consensus on the pros and cons of the two approaches?