Japanese-Japanese dictionary file?


Javizy
Expert on Something
Posts: 1165
Joined: February 10th, 2007 2:41 pm

Postby Javizy » October 9th, 2009 11:04 am

I've been using the crappy JLPT shared deck on the Anki server, and I'm sick of all the mistakes and duplicated cards. Before that, I was using an even worse deck, so I've finally decided to make my own cards. My typing capacity is limited, so I decided to write a little program to do it for me.

Outputting a JLPT list I found in tab-separated format was straightforward enough, and I'm now adding example sentences from the trusty Tanaka corpus (there is a plugin for Anki, but it doesn't save them, so you can't use them with iAnki).
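For anyone wanting to try the same thing, here is a minimal sketch of writing one card per line in tab-separated form, which Anki can import as a text file. The class name `CardWriter`, its methods, and the field order are assumptions made for illustration; they are not the program described in this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a set of card fields into one tab-separated line.
class CardWriter {

    // Tabs and line breaks inside a field would split the field or the card,
    // so they are collapsed to a single space before writing.
    static String clean(String field) {
        if (field == null) {
            return "";
        }
        return field.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // Joins the cleaned fields with tabs, in the order given.
    static String toTsvLine(String... fields) {
        List<String> cleaned = new ArrayList<>();
        for (String field : fields) {
            cleaned.add(clean(field));
        }
        return String.join("\t", cleaned);
    }

    public static void main(String[] args) {
        // Placeholder values; a real run would read them from the word list
        // and the example-sentence file.
        System.out.println(toTsvLine("word", "reading", "meaning", "example sentence"));
    }
}
```

Cleaning each field first matters because a stray tab inside an example sentence would otherwise shift every later column on that card.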

Anyway, I think the English and example sentences together make for good understanding, but ideally I'd like to add a (concise) Japanese definition as well. Does anybody know where I could find an EDICT style file that is Japanese-Japanese? Or where I could ask?

If I can get this done, my Japanese will start improving really fast, and since I can e-mail vocabulary lists from my dictionary on my iPod, I'll never have to type out a flashcard again :lol: Obviously, I'll add it to the shared decks on Anki as well :P

Belton
Expert on Something
Posts: 752
Joined: June 16th, 2006 11:39 am

Postby Belton » October 10th, 2009 9:45 am

There are commercial dictionaries, but for quick access and cross-indexing they probably aren't a simple XML file. (And since they are readily available on keitai, there might be little impetus to develop a free resource like Edict or Eijiro.)
Most public domain data seems to be translation dictionaries (for want of a better word).

Wiktionary might be your best bet. Good luck on extracting the data however. I've found it hard enough to parse what I want out of Kanjidict into a database.

http://www.linkedin.com/answers/technol ... CH_ITS_DBS

http://ja.wiktionary.org/

http://download.wikipedia.org/jawiktionary/20091008/

If you're patient with cut and paste (8000 words!), or good at AppleScript, you could probably repurpose the data in OS X's dictionary, or one on goo.

For learners, the nature of the definition might be an issue (maybe not for you, but for other users). A children's dictionary might be a better starting point. While I've seen interesting ones in print, I've yet to come across a computer version. I have a good one on DS in kakitorikun (the definitions are appropriate to the grade of the word), but apart from transcribing there's probably no way to get at it. But it's fine where it is mostly, if not as convenient as flashcards.

I wonder if you could get a collaborative project started? With enough input it might be possible. It seems like a project JapanesePod might sponsor/host.
The main problem (after man-hours) might be not infringing copyrights: where would the definitions be sourced from?

Btw, even Jim Breen doesn't describe Tanaka as completely trustworthy, due to the way it was originally compiled.
http://www.csse.monash.edu.au/~jwb/tanakacorpus.html
I like to check Reiji as well.

Psy
Expert on Something
Posts: 845
Joined: January 10th, 2007 8:33 am

Postby Psy » October 10th, 2009 4:48 pm

As Belton said, the Tanaka corpus isn't perfect (I've corrected a number of entries myself), but it is available for free as a download, so I would take it over having nothing. If you want to parse stuff, why not design something that'll access a place like dictionary.goo.ne.jp and rip the examples/definitions from there? After deciding on the data you want, it'd be just as easy to throw it into a tab-delimited file for later import with Anki. I do stuff like this all the time with small snippets of PHP and SQL. It'd be best to keep it as a private enterprise though. For one, I wouldn't want to disrespect their servers. For two, as Belton wrote, getting in trouble for copyright infringement wouldn't be fun.

I suppose I'll try fiddling with this idea myself.
High time to finish what I've started. || Anki vocabulary drive: 5,000/10k. Restart coming soon. || Dig my Road to Katakana tutorial on the App store.

Javizy
Expert on Something
Posts: 1165
Joined: February 10th, 2007 2:41 pm

Postby Javizy » October 11th, 2009 12:34 am

Thanks for the replies. I definitely think it will be possible. Goo's definitions are exactly the kind of ones I want. I'd written off trying to rip from an online source, but maybe it's not such a bad idea after all. I managed to work out how to query a word with goo using these values:

dictionary.goo.ne.jp/freewordsearcher.html?MT=word&kind=jn&dict=国語辞書&mode=0

Strangely, it seems to recognise 国語辞書 (unless it just goes by kind=jn), but when I tried setting MT to 想像, it came back as question marks. When I search normally, it is converted to this format: %E6%83%B3%E5%83%8F. Apparently it's some sort of Unicode with the special characters converted. Do you know what this sort of format is called? I'm having some trouble searching for it.

Edit: I think java.net.URLEncoder might take care of the formatting issue, but I'm going to bed now so I'll have to try tomorrow :mario:
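The %E6%83%B3%E5%83%8F form is percent-encoding (also called URL encoding) of the word's UTF-8 bytes, and java.net.URLEncoder does produce it. Below is a minimal sketch of building the query address from this post that way. The class name `GooQuery`, its methods, and the `http://` prefix are assumptions for illustration; the path and parameters are copied from the post and may not match what the site accepts. The `encode(String, Charset)` overload needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the search address with percent-encoded values.
class GooQuery {

    // Host, path, and parameter names come from the post; the scheme is assumed.
    static final String BASE = "http://dictionary.goo.ne.jp/freewordsearcher.html";

    // The dict value from the post, written with Unicode escapes so the
    // source file compiles the same way whatever encoding it is saved in.
    static final String DICT = "\u56fd\u8a9e\u8f9e\u66f8";

    // Percent-encodes the UTF-8 bytes of the text.
    static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Assembles the query string, encoding every value that may hold kanji.
    static String buildUrl(String word) {
        return BASE
                + "?MT=" + encode(word)
                + "&kind=jn"
                + "&dict=" + encode(DICT)
                + "&mode=0";
    }

    public static void main(String[] args) {
        String word = "\u60f3\u50cf"; // the word tried in the post
        System.out.println(encode(word)); // prints %E6%83%B3%E5%83%8F
        System.out.println(buildUrl(word));
    }
}
```

One thing to watch: URLEncoder follows the HTML form convention, so a space becomes `+` rather than `%20`. That makes no difference for single words like this one.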

Javizy
Expert on Something
Posts: 1165
Joined: February 10th, 2007 2:41 pm

Postby Javizy » October 13th, 2009 1:01 pm

えへへ

[Image]

Also, I did read about the problems with the Tanaka corpus, but Breemy said the updated version has undergone a considerable amount of maintenance. Certainly not perfect, but free and very programmer friendly :P
