Midterm I Logistics, Topics, Samples
(Midterm Note on September 21, 2020)
IDS575: Machine Learning and Statistical Methods
Moontae Lee

Update on Lecture Notes #04
Based on feedback from the audience, we have updated the naming conventions.
  Updated in the Lecture Notes. (Not updated in the Annotated Notes because of the lecture recording.)
A distribution's usual parameters, such as (μ, σ²) in N(μ, σ²): standard parameters
  Previously called user parameters or user-friendly parameters.
η, obtained when converting to an exponential family: natural parameters
  May be called canonical parameters in other textbooks.
θ in regression models: model parameters
  Interact linearly with the features (independent variables).
(lecture notes are updated to remove typos. download them again if necessary)
9/22/20

Logistics
Date: September 26 (Sat), 2020
Where: Online synchronous session
Time: 9:00am
Duration: Approximately 2 hours
(when and where)

Formats
Closed-book exam.
  Only one cheat sheet is allowed. (letter size, double-sided)
  One additional blank sheet is allowed as scratch paper. (letter size, double-sided)
Online software to use (through Blackboard)
  Respondus Monitor (your exam session will be monitored)
  Lockdown Browser (you will not be able to switch to other windows during the exam)
  Zoom (you may ask questions privately, to the instructor only, and the instructor will answer you)
(how to take an exam)

Respondus Monitor
Respondus Monitor
  Uses your webcam to detect suspicious activity.
Preparation
  Secure a webcam and double-check that it works.
  Mostly compatible with Windows and Mac computers.
  An iPad can be used after installing the dedicated app, but this is generally discouraged.
(what to prepare)

Lockdown Browser
Lockdown Browser
  Online proctoring environment that works with Blackboard.
  You are unable to access other applications or websites.
  You cannot close the test until it is submitted.
Read the articles at the following two links carefully.
  Download and install UIC’s version of Lockdown Browser:
  https://download.respondus.com/lockdown/download.php?id=344933365
  Confirm and follow the general guidelines in advance!
  https://answers.uillinois.edu/uic/99742

Instructions (1)
1. Close all windows and applications on your computer.
   Don’t connect to the Zoom session first.
2. Open the Lockdown Browser.
   You should install it before the exam time.
3. Log in to Blackboard.
   If you properly installed UIC’s version, Blackboard should be the default webpage.
4. Select “Midterm I” in the exam section.
   Password: ids575!! (with two exclamation marks)
(let’s learn step-by-step)

Instructions (2)
5. Follow the steps provided by Respondus Monitor.
   This will take some time, but nothing is complicated.
6. In the first setup question, click the Zoom link to open our Zoom session.
   You should join from the exam question. Do not launch a separate Zoom application.
7. Turn off both video and audio in Zoom.
   Only private chat with the instructor is allowed during the main exam, for clarification questions.
8. Solve the exam.
   Take great care not to close the window before submission!

Guidelines
Before taking an exam
  Select a location where you will not be interrupted.
  Make sure you have a stable internet connection.
  Turn off all other mobile/electronic devices once the main “Midterm I” starts.
  Clear your area except for one cheat sheet, one additional sheet of scratch paper, and pencils.
(prior to taking an exam)

Guidelines
During the main exam
  Remain seated at your desk/workstation for the entire duration of the test.
  Respondus Monitor will alert instructors to any suspicious activity.
  Lockdown Browser will prevent you from accessing other websites or applications.
  Type your questions in the concurrent Zoom session, privately and only to the instructor.
Watch out!
  Do not close your exam window or browser before submission! (you may lose your progress)
(while taking an exam)

Formats
True/False
  Each true/false question (answer either true or false) is followed by a short-answer question.
  If you think the answer is true → briefly justify your rationale in the following short answer.
  If you think the answer is false → provide a simple counterexample in the following short answer.
Multiple Choice
  Choose every option you think is appropriate.
  Some questions are followed by a short answer asking for your justification.
Short answers
  Write the best answer concisely.
(what type of questions will be there)

Basic machine learning setting
Identify the instance, label, and example.
Define the input and output spaces.
A hypothesis = a mathematical function from the input space to the output space.
Hypothesis space = the set of all hypotheses given the input and output spaces.
Q: Asking about a basic property of a mathematical function.
Q: Counting the size of a hypothesis space under a certain condition.
(exam topics and sample questions to study)

k-Nearest Neighbor
Understand the concept of lazy learning.
Understand the formulas that represent different kNN hypotheses.
Understand how to draw decision boundaries and how to make actual predictions.
Q: Worst-case scenario.
Q: Given simple training data, draw the decision boundaries.
Q: Make a prediction on toy examples.

Linear regression
Understand the least-squares objective (loss) function.
Gradient-based training algorithms.
Basic residual and normal equations. (no intricate proofs or heavy computations)
Probabilistic formulation (what assumptions do we make?) and training via MLE.
Q: Formulate a linear regression with a few features. Derive a gradient-descent algorithm.
Q: Formulate a linear regression with toy real data. Answer conceptual questions.

Logistic regression
Role of the sigmoid (logistic) link function and the power of function composition.
Understand how to obtain the optimal parameters via Maximum Likelihood Estimation.
Think about what type of decision boundaries will be produced.
Q: Why is it called regression even though it is intended for classification?
Q: Formulate a logistic regression with a few features.
Q: Choose proper decision boundaries.

Exponential family and Generalized Linear Models
Convert a distribution to an exponential family.
Understand why the least-squares loss is a natural choice for linear regression.
Understand why the sigmoid function is a natural link function for logistic regression.
Q: Concept of statistics and sufficient statistics.
Q: Differentiate standard parameters, natural parameters, and model parameters.
Q: Simple conceptual questions.

Sample question (1)
NOTE: decision trees are not a topic of our class. They are used here only as an example.
Q: Do decision trees always achieve zero training error if they keep splitting on attributes?
Options: (a) True, (b) False
Justification
  False. If a dataset includes two training examples that have the same instance values but different label values, there must be a training error no matter how far individual attributes are split.
(true/false + short justification)

Sample question (2)
Use plain alphanumeric symbols available on a general keyboard.
  Superscript by ^
  Subscript by _
Q: What is the gradient of y = ||x||_2^2 for x ∈ ℝ^n?
Options: (a) 1 (b) 2x (c) 2||x||_2 (d) 2x_1, ..., 2x_n (e) x^2
Justification
  It is y = (x_1^2 + x_2^2 + … + x_n^2). So ∂y/∂x_j = 2x_j.
(multiple choice + short justification)

Sample question (3)
You are trying to estimate housing prices in the Loop area of Chicago. Given 100 training data points, your model is
  y = θ_1 x_1 + θ_2 x_2 + θ_3 x_3
where x_1 is the number of bedrooms, x_2 is the distance to the closest station, and x_3 is the square footage.
Q: What are the expected signs of θ_1, θ_2, θ_3?
Q: Which of the following is the correct update formula for SGD?
Q: Is this linear regression?
Q: This may not be a reasonable model. Why?
Q: What happens if you have only 2 data points rather than 100?
(short answers or multiple choice + short justification)

Overall advice
Try to figure out the dimension of each mathematical symbol. (scalar, vector, matrix)
Our lectures taught various statistical methods in general settings.
  In homework, you applied them to mid-size specific cases.
  In the exam, you will apply them to toy-level problems, but with more concrete numbers.
Make sure you can answer the questions within each slide.
  Don’t spend too much time reading and watching external material.
  Don’t copy-paste from the lecture notes.
  In many cases, exam questions ask for concrete numbers rather than general forms and formulas.
(the exam should be an extension of the lecture. enjoy it rather than suffering through it)
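
As a quick check of the justification in Sample question (2), the following minimal Python sketch compares the analytic partial derivatives 2x_j with a central-difference estimate. The function names, the step size h, and the toy point x are assumptions made for illustration only and are not part of the lecture material.

```python
def squared_norm(x):
    """y = x_1^2 + x_2^2 + ... + x_n^2 for a list of floats x."""
    return sum(v * v for v in x)


def analytic_gradient(x):
    """Gradient from the justification: each partial derivative is 2 * x_j."""
    return [2.0 * v for v in x]


def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        down = list(x)
        up[j] += h      # nudge only coordinate j upward
        down[j] -= h    # and downward
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


x = [1.0, -2.0, 3.0]  # hypothetical toy point, chosen only for illustration
print(analytic_gradient(x))
print(numeric_gradient(squared_norm, x))
```

The two printed vectors should agree up to rounding error, which matches reading the gradient as the vector (2x_1, ..., 2x_n), that is, 2x.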