Tuesday, January 5, 2010

[Text] Counts of a word within a file. Not by line, total.

Came across this (courtesy of Franklin52 in the unix.com forums).
http://www.unix.com/shell-programming-scripting/63576-how-find-count-word-within-file.html

Say you need a count of a particular word in a file, for example an XML file where the word can repeat within a line. WC won't do it on its own (it counts every word), and GREP -c or FIND /C only count the lines that match, not the individual occurrences. AWK will do the job, though. (You do have these tools on your Windows box, right? [If you have a UNIX box, it's assumed you do.])


awk 'BEGIN{RS=" "}/WORDTOCOUNT/{h++}END{print h+0}' blah.txt
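A quick way to see the difference from a line-based count is to run both on a small made-up sample. The file name sample.xml and the tag item are placeholders for illustration only, and printing h+0 (rather than h) makes awk print 0 instead of an empty line when nothing matches:

```shell
# Hypothetical sample: the tag repeats within a line, with a space after each,
# plus one more occurrence on a second line.
printf '<item id="1"> <item id="2"> <item id="3">\n<item id="4">\n' > sample.xml

# grep -c reports the number of matching LINES, so this prints 2.
grep -c 'item' sample.xml

# The awk one-liner splits records at every space and counts matching
# records, so this prints 4 (one per occurrence in this sample).
awk 'BEGIN{RS=" "}/item/{h++}END{print h+0}' sample.xml
```
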


One note: this assumes that there will be a space somewhere between occurrences of the word. TESTTEST and
TEST
TEST
would each count as only 1, since there's no space to "reset" the search. (With RS=" ", awk splits the input into records at every space, and a newline is not a separator. Each record that contains the word bumps the counter once, no matter how many times the word appears in it, so each of those examples is a single record holding two matches.) For our XML, there are spaces after the tag we searched for, so the counts work.
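Both cases from that note can be reproduced with two tiny test files (case1.txt and case2.txt are hypothetical names). The last two commands swap in a different technique that is not from the original thread: awk's gsub() returns the number of substitutions it made, so replacing each match with "&" (the matched text itself) counts every non-overlapping occurrence without changing the line and without depending on spaces. A sketch, assuming a POSIX awk:

```shell
# Case 1: TESTTEST on one line. Case 2: TEST on two lines, no space between.
printf 'TESTTEST\n' > case1.txt
printf 'TEST\nTEST\n' > case2.txt

# Space-delimited records: each file is one record, so each prints 1.
awk 'BEGIN{RS=" "}/TEST/{h++}END{print h+0}' case1.txt
awk 'BEGIN{RS=" "}/TEST/{h++}END{print h+0}' case2.txt

# gsub() alternative: add up the substitution count for every line,
# so each file prints 2.
awk '{ n += gsub(/TEST/, "&") } END { print n+0 }' case1.txt
awk '{ n += gsub(/TEST/, "&") } END { print n+0 }' case2.txt
```

Like the original one-liner, the pattern is a regular expression, so /TEST/ also matches inside longer words such as CONTEST.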
