
word frequency counter that can count Japanese characters


mieth
Expert on Something
Posts: 147
Joined: June 7th, 2007 7:55 pm


Postby mieth » August 8th, 2010 1:41 am

Does anyone know of a free or cheap word frequency counter that can count Japanese words? It would be nice if it could do English as well. Anyway, cheers.

Belton
Expert on Something
Posts: 752
Joined: June 16th, 2006 11:39 am

Postby Belton » August 16th, 2010 6:46 pm

I've written a FileMaker solution that will break down texts.
It'll do character counts on individual pieces of text. As yet it doesn't aggregate over several texts, except for the 20 most frequent kanji (maybe in v0.4). Nor does it count words, except for katakana.
It's still beta and the Windows version isn't as fully functional as the Mac version.

http://www.shiawase.co.uk/kanji-sieve/
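
For anyone who would rather script this kind of count, here is a minimal Python sketch of the same idea: tally individual kanji and katakana words in one piece of text. It is not how Kanji Sieve works internally. The function names, the sample sentence, and the character ranges (basic CJK block only, katakana runs treated as words) are all assumptions made for illustration.

```python
from collections import Counter
import re

# Assumption: "kanji" means the basic CJK Unified Ideographs block only.
KANJI = re.compile(r"[\u4e00-\u9fff]")
# Assumption: a katakana "word" is a maximal run of katakana letters
# plus the long-vowel mark (U+30FC).
KATAKANA_WORD = re.compile(r"[\u30a1-\u30fa\u30fc]+")


def kanji_counts(text):
    """Count how often each individual kanji appears in text."""
    return Counter(KANJI.findall(text))


def katakana_word_counts(text):
    """Count how often each run of katakana appears in text."""
    return Counter(KATAKANA_WORD.findall(text))


# Hypothetical sample text, not taken from any real corpus.
sample = "日本語のテキストと日本のコンピュータ、テキスト"

top_kanji = kanji_counts(sample).most_common(20)  # the 20 most frequent kanji
katakana = katakana_word_counts(sample)

for char, n in top_kanji:
    print(char, n)
```

Counting katakana runs works without a parser because they are delimited by the script change; the same trick does not work for words written in kanji and hiragana.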

The major piece of software for parsing Japanese text into words is MeCab, which seems to be the starting point for anyone doing serious research in this area. However, researchers seem to use other software to collect the frequency data. The main problem is the automatic parsing of text into words.
While I could get MeCab to work, I couldn't integrate or automate it in a GUI fashion.
There are links in this comments thread:

http://www.shiawase.co.uk/2010/04/10/ka ... /#comments

You'd need to be very comfortable using the command line or with programming.
And the majority of the documentation is in Japanese.
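
As a rough sketch of the "other software" step, the Python below tallies word frequencies from text that MeCab has already split. It assumes MeCab's default output layout (one token per line, the surface form, a tab, then comma-separated features, with a line reading EOS closing each sentence). The function name is made up, and the sample output is hand-written with shortened feature fields rather than captured from a real run.

```python
from collections import Counter


def count_surface_forms(mecab_output):
    """Tally surface forms from MeCab-style output text."""
    counts = Counter()
    for line in mecab_output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        # Everything before the first tab is the word as it appeared in the text.
        surface, _, _features = line.partition("\t")
        counts[surface] += 1
    return counts


# Hand-written illustration only; real feature fields are longer.
sample_output = (
    "猫\t名詞,一般\n"
    "が\t助詞,格助詞\n"
    "好き\t名詞\n"
    "な\t助動詞\n"
    "猫\t名詞,一般\n"
    "EOS\n"
)

word_counts = count_surface_forms(sample_output)
for word, n in word_counts.most_common():
    print(word, n)
```

This counts inflected forms separately. Grouping them under a dictionary form would mean reading one of the feature fields, whose position depends on the dictionary MeCab is using.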

Online there is the Reading Tutor toolkit.
(I use the later chuta dictionary in my Kanji Sieve to parse the text. It doesn't return statistics, but it was much easier to extract the results from the HTML and save them in a useful form.)

http://language.tiu.ac.jp/tools_e.html#input

The Level checker will give word counts and, as far as I can see, will also count English words. But again, it only works on a single piece of text, and there's a size limit.
You'd need to extract and parse the results and collate them in a database to get results spanning several texts.
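
The collating step could look something like this minimal Python sketch, which uses the standard-library sqlite3 module. It assumes the per-text word counts have already been extracted by some other means. The table layout, the function name, and the sample numbers are all hypothetical.

```python
import sqlite3


def collate(per_text_counts):
    """Sum word counts across several texts and return (word, total) rows."""
    conn = sqlite3.connect(":memory:")  # use a file path to keep the data
    conn.execute("CREATE TABLE counts (source TEXT, word TEXT, n INTEGER)")
    for source, counts in per_text_counts.items():
        conn.executemany(
            "INSERT INTO counts VALUES (?, ?, ?)",
            [(source, word, n) for word, n in counts.items()],
        )
    rows = conn.execute(
        "SELECT word, SUM(n) AS total FROM counts "
        "GROUP BY word ORDER BY total DESC, word"
    ).fetchall()
    conn.close()
    return rows


# Made-up counts for two texts, for illustration only.
totals = collate({
    "text_a": {"猫": 3, "犬": 1},
    "text_b": {"猫": 2, "魚": 4},
})

for word, total in totals:
    print(word, total)
```

Keeping one row per text and word, rather than only the running totals, means you can still ask later which text a word came from.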

While I've seen studies looking at frequency of individual kanji, I haven't seen anything looking at frequency of words.

mieth
Expert on Something
Posts: 147
Joined: June 7th, 2007 7:55 pm

Postby mieth » August 17th, 2010 6:00 am

Thanks for the reply, Belton!!
