Built a simple Python script to count words, but it gave me a number way too high.

I wrote a function that calls the .split() method on the text of a book I downloaded, 'Pride and Prejudice'. It reported over 150,000 words, which seemed wrong. I learned it was counting punctuation attached to words as separate items. Does anyone know a better way to clean the text before splitting?
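A minimal sketch of one way to clean the text before splitting, using only the standard library. `str.split()` with no arguments splits on runs of whitespace, so punctuation glued to a word stays attached to it, and a mark surrounded by spaces (such as "--") is counted as its own item. Replacing every ASCII punctuation character with a space first handles both cases. The function name `count_words` and the sample sentence are made up for illustration, and `string.punctuation` covers only ASCII, so curly quotes or em-dashes in your file would need extra handling.

```python
import string

# Map every ASCII punctuation character to a space, so "Bennet," -> "Bennet "
# and hyphen-joined words like "well--and" become separate words.
# Note: string.punctuation is ASCII-only (no curly quotes or em-dashes).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def count_words(text: str) -> int:
    """Count words after replacing ASCII punctuation with spaces (hypothetical helper)."""
    cleaned = text.translate(_PUNCT_TO_SPACE)
    # split() with no arguments splits on any run of whitespace and drops
    # empty strings, so tokens that were pure punctuation disappear.
    return len(cleaned.split())


# Made-up sample sentence for illustration.
sample = 'Mr. Bennet replied -- "Is that his design?"'
print(len(sample.split()))  # 8 (raw split counts "--" as a word)
print(count_words(sample))  # 7
```

One trade-off: contractions such as "don't" become "don t" and count as two words with this approach, so whether it fits depends on what you want to count.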
3 Comments
margaretc42
Yeah, 150k for Pride and Prejudice is definitely off. I had that exact problem with a project last month. Try using a regex to strip out punctuation before you split.
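A hedged sketch of the regex approach suggested above. The pattern `PUNCT_RE` and the helper `count_words_regex` are hypothetical choices, not a standard recipe: the pattern replaces with a space any character that is not a word character, whitespace, or apostrophe, plus any apostrophe that is not inside a word. Because Python 3's `\w` is Unicode-aware for `str` patterns, curly quotes and em-dashes get stripped too, while contractions like "don't" stay one word.

```python
import re

# Hypothetical pattern with three alternatives:
#   1. any char that is not a word char, whitespace, or an apostrophe
#      (straight ' or curly \u2019), e.g. . , ; " \u201c \u201d \u2014
#   2. an apostrophe with no word char before it ('tis, opening quotes)
#   3. an apostrophe with no word char after it (closing quotes)
# Apostrophes inside words such as "don't" survive, so contractions count once.
PUNCT_RE = re.compile(r"[^\w\s'\u2019]|(?<!\w)['\u2019]|['\u2019](?!\w)")


def count_words_regex(text: str) -> int:
    """Replace punctuation with spaces via a regex, then split on whitespace."""
    cleaned = PUNCT_RE.sub(" ", text)
    return len(cleaned.split())


# Made-up sample mixing ASCII and curly (Unicode) punctuation.
sample = "Mr. Bennet replied \u2014 \u201cI don\u2019t know -- 'tis odd.\u201d"
print(len(sample.split()))        # 10 raw tokens
print(count_words_regex(sample))  # 8 words
```

Note that `\w` also matches digits and underscores, so numbers count as words and underscore-marked italics like `_this_` stay a single token.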
the_ben · 13d ago
Oh man, same thing happened to me!
abbyhall · 12d ago
I used to think that too until I tried it.