
Syllabus

The course Computational Methods for Linguists focuses on how data science methods and tools can be applied in the domain of linguistics. One of the main goals of the course is to familiarize students with the basics of programming, including the general technical versatility needed to successfully undertake even the most basic programming tasks. While the syllabus doesn't mention linguistic and social concepts explicitly, they will be present throughout the class, as the assignments will be organized around linguistic or linguistically annotated data. Presentations in particular will be graded on how well they connect to linguistic and social concepts.

Learning outcomes

Students will learn what counts as data in computational linguistics, and how linguistic theory and linguistic questions dictate which computational methods are employed. Students will also learn about the ethical and social implications of data use in linguistics. They will learn basic programming concepts and how to write a range of programs using the Python programming language, as well as how to use the command-line interface and version control. Finally, they will learn a range of techniques for cleaning data, representing data as vectors, thoughtfully choosing a model, loading the data into the model, running the model, and interpreting and visualizing the results.

Extended office hours

This class offers extended office hours because you will run into technical issues that everyone needs help with the first few times they encounter them. The usual expectation still applies: come to office hours with a specific issue and show your work (demonstrate that you have already made some effort). However, there is no expectation that your issue be particularly complex or advanced. Many technical issues are simple but can take hours to figure out the first time you see them. Use the extended office hours.

Asking questions on the Discussion Boards

There will be a dedicated Discussion Board on Canvas for each assignment, as well as an area for general and other questions. It is important that you ask a lot of questions on the Discussion Boards, and we do mean it. Posting a question on the Discussion Board allows others to benefit from your question and our answer! Use email for confidential matters, such as questions about your grades or personal circumstances, but not for any questions related to assignments or class logistics. Use the Discussion Boards for those!

Some advice on how to ask technical questions on the Discussion Board effectively.

Assessment

The class is organized around a series of assignments targeting different concepts and skills, all connected to linguistic data/corpora (TBA). There are no exams. The assignments, on which students will work individually, together account for 80% of the grade. A presentation related to the assignments is worth another 15% (the presentation may be pre-recorded). In addition, students will write a blog post reflecting on a reading of their choice and post comments on their classmates' posts; this is worth the remaining 5%. Participation (such as asking questions in class or on the Discussion Board, attending office hours, etc.) can add a positive adjustment of up to 2%.
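To make the weighting concrete, here is a minimal Python sketch of how the component scores could combine into a final percentage. It assumes each component is scored out of 100 and that the participation adjustment is added in percentage points after weighting; the syllabus does not spell out either detail, and the names `WEIGHTS` and `final_percentage` are hypothetical.

```python
# Component weights taken from the syllabus (fractions of the final grade).
WEIGHTS = {
    "assignments": 0.80,   # all individual assignments together
    "presentation": 0.15,  # presentation related to the assignments
    "blog": 0.05,          # blog post plus comments on classmates' posts
}

# Participation can only raise the grade, by at most 2 (assumed: percentage points).
MAX_PARTICIPATION_BONUS = 2.0


def final_percentage(scores, participation_bonus=0.0):
    """Combine component scores (each assumed to be out of 100) into a course percentage.

    `scores` maps each key in WEIGHTS to a score out of 100.
    The participation adjustment is clipped to the range [0, 2] and added
    after weighting (an assumption, not stated in the syllabus).
    """
    weighted = sum(weight * scores[part] for part, weight in WEIGHTS.items())
    bonus = min(max(participation_bonus, 0.0), MAX_PARTICIPATION_BONUS)
    return weighted + bonus


example = {"assignments": 90, "presentation": 95, "blog": 100}
print(round(final_percentage(example, participation_bonus=1), 2))  # prints 92.25
```

Clipping the bonus means a negative value has no effect and anything above 2 counts as 2, matching "up to 2% (positive)".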

Grading scale:

95% = 4.0, 94% = 3.9, 93% = 3.8 & so on.
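The pattern above amounts to subtracting 0.1 grade points for each percentage point below 95. A minimal Python sketch of that mapping follows. Because the syllabus only says "& so on", the cap at 4.0 for scores above 95% and the floor at 0.0 are assumptions made for illustration, the input is assumed to be a whole-number percentage, and the function name is hypothetical.

```python
def percent_to_grade_point(percent: int) -> float:
    """Convert a whole-number course percentage to a grade point.

    Follows the stated pattern 95% -> 4.0, 94% -> 3.9, 93% -> 3.8, ...
    Assumptions: scores above 95% are capped at 4.0 and the result
    never goes below 0.0.
    """
    # Work in integer tenths of a grade point: 95% -> 40 tenths, 94% -> 39, ...
    tenths = percent - 55
    tenths = min(max(tenths, 0), 40)  # clamp to the assumed 0.0-4.0 range
    return tenths / 10


for p in (97, 95, 94, 93, 90):
    print(p, percent_to_grade_point(p))
```

Doing the arithmetic on integer tenths and dividing once at the end avoids accumulating floating-point error from repeatedly subtracting 0.1.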

Late homework policy

Assignments

Assignments will be due roughly every two weeks. The goal is to give you some breathing space in between and enough time to work on them at a reasonable pace. However, keep in mind that technical assignments may take a long time because of technical issues (this is completely normal, and learning to deal with such issues is one of the main learning goals of this class). In practice, this means that if you delay starting an assignment, you are very likely not to finish it by the time it is due.

I highly recommend the following algorithm for the technical assignments:

  1. Start on the day the assignment is published (or the next day).
  2. Get a feel for how fast you are progressing. Make a note of the first blocker you run into. Post about it on the Discussion Board. Then go do something else.
  3. Come back to it the next day. See if you get unblocked. If not, go to the next office hour and see if that helps you get unblocked.
  4. Repeat from step 2.

Once you feel you have not made progress for 20–30 minutes, stop. Come back to it later (e.g., the next day).

Having two weeks or so for each assignment means you can take nice breaks from the class, but only if you start early. When progress feels slow, take frequent but short breaks (e.g., leave the assignment and come back to it the next day). Once you are almost done, feel that you understand almost everything, and expect to finish the assignment quickly, it becomes possible to take a break of a few days. But not before, or you will not succeed in earning a good grade.

Readings

There will be some assigned readings for most lectures. Some of them will just be blog posts and websites for beginner programmers, etc. They are just as good for learning about these things as books ;) Maybe even better. Other readings will include scholarly papers; these are more difficult to read, so try to identify some specific goals as you read. E.g., "I am reading this to understand what 'Data Statements' are, and I want to form an opinion about whether they are useful in some particular context."

(The blogging assignment readings overlap with the assigned readings, but they are not the same set.)

Texts

There are no required textbooks, though there will be some readings, all available online.

Recommended text (for those who have not taken LING200): Language Files 12.

You may find a book on Python programming for beginners helpful, but in general we will rely on online resources.

Class schedule (subject to change)

Date | Topic | Reading | Due
March 30 | Introduction, course structure, etc. | |
April 1 | Conceptual overview. Data science | What is Data Science? | Online survey (on Canvas); "Assignment 0"; request an account on the patas cluster
April 6 | Basic system/programming concepts | Basics of python programming (Chapter 1) |
April 8 | VS Code basics. Version control. Demo code | 1. The IMDB reviews dataset paper; 2. Data statements for NLP; 3. Version control (read conceptually; ignore the R-studio stuff etc.) | Blogs 1
April 13 | Variables, scope, and control flow; code; fizzbuzz | 0. Assignment and statements (2.1–2.13); 1. Numbers, strings, lists (3.1); 2. Logical operators; 3. Control flow (4.1–4.5); 4. De Morgan's laws | Assignment 1
April 15 | Loops and dictionaries. Input/output; code | 1. Loops; 2. Dicts (6.1–6.2); 3. Input and output |
April 20 | Text processing; code | 0. Regular expressions; 1. Tokenization; 2. Unicode; 3. Modules | Blogs 2
April 22 | Text processing, contd. Unicode. Evaluation. VS Code settings | |
April 27 | Metrics. Precision and recall; code | Precision and recall | Assignment 2
April 29 | Data science and probability; MLE | A mathy tutorial: try to work through a couple of problems of different types, listed there at the end. |
May 4 | Statistics: distributions. The Gaussian distribution; exercise; solution code | Start tutorial, read through "Measures of spread" | Blogs 3
May 6 | Bayes Theorem. Dataframes; code | Finish tutorial; skip the section about R but make sure to read about the Bayes Theorem. You can also skip: Entropy and Information Gain; Inferential statistics. Read those if you like (they are generally important), but we probably don't have time for them. | Assignment 3
May 11 | Machine Learning and matrices: Linear regression; exercise and demo code | Regression and classification; skeleton code |
May 13 | ML, contd. Classification: Logistic regression, Naive Bayes; code | Logistic regression; Naive Bayes | Blogs 4
May 18 | Language models. Non-linearity and neural nets | Deep learning for NLP; Non-linear problems; Testing NLP models |
May 20 | Deep Learning overview. Linguistic knowledge in NLP | Ettinger et al. 2017 |
May 25 | Working with linguistic corpora | Aijmer 2021 or Stange 2021 (both found on Canvas–>Files–>papers) | Assignment 4
May 27 | Visualization and Communication; code | 🐙 (1) To dissect an octopus; (2) Keras word embeddings tutorial (a working version of this is part of your HW5 skeleton); (3) Visualization with pandas | Blogs 5
June 1 | Presentations | |
June 3 | Presentations (possibly asynch.) | |
June 8 | | | Assignment 5

Academic Integrity

The University takes academic integrity very seriously. Behaving with integrity is part of our responsibility to our shared learning community. If you're uncertain whether something is academic misconduct, ask the instructor. The instructor is willing to discuss any questions you might have.

Acts of academic misconduct may include but are not limited to:

Concerns about these or other behaviors prohibited by the Student Conduct Code will be referred for investigation and adjudication by (include information for specific campus office).

Students found to have engaged in academic misconduct may receive a zero on the assignment (or other possible outcome).

Disability Accommodations

Your experience in this class is important. It is the policy and practice of the University of Washington to create inclusive and accessible learning environments consistent with federal and state law. If you have already established accommodations with Disability Resources for Students (DRS), please activate your accommodations via myDRS so we can discuss how they will be implemented in this course.

If you have not yet established services through DRS, but have a temporary health condition or permanent disability that requires accommodations (conditions include but are not limited to mental health, attention-related, learning, vision, hearing, physical, or health impacts), contact DRS directly to set up an Access Plan. DRS facilitates the interactive process that establishes reasonable accommodations. Contact DRS at www.disability.uw.edu.

Safety

Call SafeCampus at 206-685-7233 anytime – no matter where you work or study – to anonymously discuss safety and well-being concerns for yourself or others. SafeCampus’s team of caring professionals will provide individualized support, while discussing short- and long-term solutions and connecting you with additional resources when requested.

Religious Accommodations

Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW's policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy (https://registrar.washington.edu/staffandfaculty/religious-accommodations-policy/). Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form (https://registrar.washington.edu/students/religious-accommodations-request/).
