Due: Monday, February 7, 12pm
This assignment is based on problems 1-5 of Jason Eisner's language modeling homework plus a small programming problem (problem 6). The first thing to do is to download the PDF of the homework. Many thanks to Jason Eisner for making this and other materials for teaching NLP available!
Work through problems 1-5 and hand in your written solutions in class. Problem 6 asks you to write a small program, which you will submit on Blackboard.
A few notes:
You are welcome to consult books that cover probability theory, such as DeGroot and Schervish or the appendices of Cormen et al., as well as the slides on probability from Dickinson, Eisner and Martin. Using Wikipedia in conjunction with the course readings, notes and assignments is also acceptable (especially if you learn something from it). For this assignment, it may be helpful to consult the following: Algebra of sets (especially if you're rusty on set theory) and Bayes' theorem, which is not extensively discussed in Jurafsky & Martin.
There are 100 points total in this assignment. Point values for each problem/sub-problem are given below.
Problem 1: 33 points total (3 points per subproblem)
Problem 2: 15 points
Problem 3: 15 points
c. 10 points (2 points per subproblem)
Problem 4: 7 points
Problem 5: 15 points
Problem 6: 15 points
This problem is a very small programming exercise intended to give you a small amount of practice counting things in text and to make sure you are comfortable running a program on the Unix command line.
First, download the text of Jane Austen's book Persuasion from Project Gutenberg. Then, use the following Unix command to convert it into a file with one word per line:
$ cat 105.txt | tr -cs '[:alpha:]' '\n' > 105_wpl.txt
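If you have not used tr before: the -c flag complements the character set (so it matches everything that is not alphabetic), and -s squeezes each run of replaced characters into a single newline. The result is one word per line. Note that apostrophes count as non-alphabetic, so "don't" ends up as "don" and "t" on separate lines. A rough Python equivalent is sketched below; it assumes plain ASCII letters (the [:alpha:] class in tr depends on your locale), and the helper name words_per_line is made up for illustration.

```python
import re


def words_per_line(text):
    """Replace each run of non-letters with a single newline."""
    # Approximates tr -cs '[:alpha:]' '\n', assuming ASCII letters only.
    return re.sub(r"[^A-Za-z]+", "\n", text)


if __name__ == "__main__":
    sample = "Sir Walter Elliot, of Kellynch Hall"
    print(words_per_line(sample))
```

This is only meant to show what the command does; the tr command above is all you need to produce 105_wpl.txt.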
Now, write a Java or Python program that reads in
Call your program
$ python compute_bigram.py 105_wpl.txt
We will of course test these values on another text, so you should make sure to actually compute the values and not just print them out…
Here's a stub Python script which deals with the command line args to get you going:
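The stub itself does not appear in this text. As a stand-in, a minimal sketch of such a stub might look like the following. The function names read_words and main are hypothetical, and the actual computation is deliberately left as a placeholder.

```python
import sys


def read_words(path):
    """Return the non-empty lines of a one-word-per-line file as a list."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def main(argv):
    # Expect exactly one argument: the word-per-line file, e.g. 105_wpl.txt.
    if len(argv) != 2:
        print("Usage: python compute_bigram.py <words_per_line_file>",
              file=sys.stderr)
        return
    words = read_words(argv[1])
    # Placeholder: compute the values the assignment asks for from `words`.
    print("Read", len(words), "words")


if __name__ == "__main__":
    main(sys.argv)
```

Reading the whole file into a list keeps the stub simple; for a text the size of a single novel this should not be a memory concern.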
The UT Compling Lab has a page with lots of useful python_tips that you might also find helpful.
Submit your file