Homework 1 asks you to write a program that analyzes written text. The text will come from a file. The kinds of analysis I am interested in are:
- count the number of words in the text
- count the number of distinct words in the text
- the X most used words and their frequencies (X will be a number supplied by the user)
- the X most used words that are greater than or equal to Y letters and their frequencies (X and Y will be numbers supplied by the user)
- the words that are used exactly X times (X will be a number supplied by the user)
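The three frequency queries in the list above can be sketched as follows. This is only a rough illustration, assuming the word counts have already been collected into a vector of (word, frequency) pairs. The names `Entry`, `topWords`, and `wordsUsedExactly` are hypothetical, and breaking ties alphabetically is an assumption the assignment does not specify.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: (word, frequency).
using Entry = std::pair<std::string, int>;

// Returns the x most used words that have at least minLength letters.
// With minLength = 0 this is the plain "X most used words" query.
// Assumption: ties in frequency are broken alphabetically.
std::vector<Entry> topWords(std::vector<Entry> entries, std::size_t x,
                            std::size_t minLength = 0) {
    // Drop every word that is shorter than minLength.
    entries.erase(std::remove_if(entries.begin(), entries.end(),
                                 [minLength](const Entry& e) {
                                     return e.first.size() < minLength;
                                 }),
                  entries.end());

    // Highest frequency first; alphabetical order among equal frequencies.
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });

    // Keep at most x entries.
    if (entries.size() > x) entries.resize(x);
    return entries;
}

// Returns every word whose frequency is exactly x, in input order.
std::vector<std::string> wordsUsedExactly(const std::vector<Entry>& entries,
                                          int x) {
    std::vector<std::string> result;
    for (const Entry& e : entries) {
        if (e.second == x) result.push_back(e.first);
    }
    return result;
}
```

In this sketch the total word count would be the sum of all frequencies, and the distinct word count would be the number of entries in the vector.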
You should convert all characters in each word to lowercase. When performing your analysis, strip any character that is not alphabetic or numeric from the word. For example, if you read in a word that ends a sentence, like “character.”, strip the period so that the word is “character” by itself.
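A minimal sketch of that cleanup step, assuming each word is read as a whitespace-separated token. The function name `normalizeWord` is hypothetical. A token made up entirely of punctuation comes back empty, and the caller would presumably skip it instead of counting it.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases a token and drops every character
// that is not a letter or a digit.
std::string normalizeWord(const std::string& raw) {
    std::string cleaned;
    for (unsigned char ch : raw) {  // unsigned char keeps <cctype> calls well defined
        if (std::isalnum(ch)) {
            cleaned += static_cast<char>(std::tolower(ch));
        }
    }
    return cleaned;
}
```

Note that this also removes punctuation inside a word, so a token such as "Don't" becomes "dont".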
I am attaching two very large files to test the program with (you should begin with a much smaller file to test your program initially):
George Orwell’s 1984 (attached, please use this text document for the project).
These are the things I will be looking for when I am grading the assignment:
- Can you read in and count all the words from the file?
- Is the distinct word count correct?
- Do you strip punctuation and convert each word to lowercase?
- Are the frequencies correct?
- What approach did you take to store data (vectors, arrays, repeats)?
- How efficiently did you search for existing words?
NOTE: The program needs to be modular (separate functions for each process). It MUST use inFile, vectors, and ABSOLUTELY binary search; this is an absolute requirement.
P.S.: Well-written comments are necessary to explain how the program works.
Project due by 9-9:30pm Chicago time, latest time 10pm Chicago time (the current time is 7:22pm).