Edict2 Japanese-English Dictionary Complete in plain HTML

rshiplett · Postby **rshiplett** » August 8th, 2012 3:59 pm

Edict2 Japanese-English Dictionary, complete in plain HTML: 15 pages at http://kanji.aule-browser.com/index.html#edict2

The set has two index pages: one for kanji and one for kana.

13 of the 15 pages have alternative views that load the free Hanazono font HanaMinA.

The pages are PLAIN OLD HTML + CSS - no JavaScript or suspicious links or frames or whatever. CLEAN.

Most pages have about 14,000 items.

The next iteration is in the Curl language from Tokyo's SCSK Corporation (formerly MIT Curl), and maybe a later one will use a JavaScript library.

I am open to any request - provided that it is not for a Flash, Silverlight or Java applet.

Browsing the pages has been a big help to me in discovering my own mnemonic aids.

mmmason8967 · Postby **mmmason8967** » August 12th, 2012 10:35 pm

Would you mind explaining what CURL actually is?

I gather it's a browser plug-in of some sort but I'm not at all clear what it does. I tend to be somewhat suspicious of plug-ins, at least until I'm sure about what they do and where they came from. From comments you've made previously, I guess you're probably the same.

Anyway, a brief Idiot's Guide would be greatly appreciated!

マイケル

rshiplett · Postby **rshiplett** » August 13th, 2012 3:57 am

MIT Curl comes from the same DARPA project as the W3C of the WWW.

Curl can be thought of as an alternative to HTML + JavaScript. In a browser it does require a plugin, and even as a desktop app requires the Curl environment rather as does, say, Java.

Curl can be simple declarative markup, as HTML is. But Curl also has macros, so it can do markup such as 'poem' with 'verse' and 'verse' instead of DIV with a P and a P.

Curl is an expression-based language (two others are ICON and Rebol), but it was developed by LISP professors at MIT and has some very strong features as a programming language, such as anonymous procedures and the macro facilities.

I consider the Curl programming environment to be one of its great strengths - the IDE. The debugger is second to none. What is more, the documentation is itself a live Curl application, and the code in the docs not only runs but can be edited, saved, restored ...

As a programming language Curl comes with very extensive libraries - although not with a rules module or a separate parsing module. For parsing string data I prefer ICON, Rebol or PROLOG.

As a format for literature and language, I prefer it to JavaScript with JSON or my old fallback, Smalltalk.

Cheers,

Robert
PS: the dictionary is Beta 0.8 as of today.

rshiplett · Postby **rshiplett** » August 21st, 2012 1:03 pm

I am very much in need of beta testers at http://www.aule-browser.com/kanji/index.html#curld especially for the DBL or "Double" Kanji version of the Edict2/Kanjidic2 dictionary viewer which uses the Curl RTE plugin from curl.com (English) or curlap.com (Japanese).

Curl from MIT is now based at Tokyo's SCSK Corp. The most recent change is to offer a Korean version (July 2012).

Unlike plugins such as Flash and Adobe Reader, very few updates to the plugin have been required.

Unlike Java, I am not aware of any issues being reported concerning the use of Curl. Curl is on the laptops of a great many financial folks in the Fortune 100 because of software from Paisley, where I was once the lead Curl developer. Paisley is now owned by Thomson Reuters, a dominant supplier of financial software in taxation, audit and GRC.

The "Doubles" Curl app is at http://www.aule-browser.com/kanji/edict2-dbl-kanji-grid-curl.html

The actual Curl file varies as the Beta progresses through the addition of the non-basic features of the app. All basic dictionary viewing features are now in place with no known bugs outstanding. Hiragana and Katakana dictionary files are not yet added (buttons for these two are disabled in the current Beta 0.903).

The app runs on Win, Mac and Linux, with Android and iOS to follow. Possible issues involve user UNICODE font settings in the browser. The desktop version will follow the 1.0 release as a ZIP and gzip file for use either on-line or off-line. A browser version to run off-line is also planned.

The app does not use SQL ... not even SQLite is required.

Your help is appreciated. The app should be an aid to students when off-line or out of range for internet access.

The app is designed to use as little memory and disk space as possible.

mmmason8967 · Postby **mmmason8967** » August 21st, 2012 9:02 pm

I don't really know enough Japanese to give it a proper workout, but I'm quite happy to poke around and let you know if anything appears to go wrong.

I'm using it mainly in Firefox 14 under Windows 7.

マイケル

rshiplett · Postby **rshiplett** » August 23rd, 2012 9:04 am

Thanks - I need whatever help I can get!

Yesterday I relaxed the rule for the "Double Kanji" view to include more words -- doubles ending with a character, etc.

It is most effective at the moment when:

1) entering a search for 2 kanji

2) entering 1 kanji followed by an asterisk, i.e., *

The "Double" version only looks at the first 3 characters in the search field (the entire app is based on the first two characters as the mechanism for fast searches).
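To make the two search forms above concrete, here is a minimal sketch in Python (not Curl, and not the app's actual code). It assumes the dictionary headwords are simply grouped by their first two characters; the names `build_buckets` and `search` are invented for illustration.

```python
from collections import defaultdict


def build_buckets(headwords):
    """Group headwords by their first two characters, keeping file order."""
    buckets = defaultdict(list)
    for word in headwords:
        buckets[word[:2]].append(word)
    return buckets


def search(buckets, query):
    """Handle a two-kanji query, or one kanji followed by '*'."""
    key = query[:3]  # assumption: only the first three characters matter
    if len(key) >= 2 and key[1] == "*":
        first = key[0]
        # Wildcard form: collect every bucket whose key starts with the kanji.
        return [w for k, words in buckets.items()
                if k.startswith(first) for w in words]
    # Two-kanji form: a single bucket lookup on the first two characters.
    return list(buckets.get(key[:2], []))


# Hypothetical sample data, not taken from Edict2 itself.
sample = build_buckets(["彼女", "彼氏", "彼女達", "悲劇"])
pairs = search(sample, "彼*")
```

Keying on two characters keeps each lookup to one dictionary access, which may be why a conventional full index is not needed for the common case.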

I have a post at http://kanjirecog.blogspot.ca/2012/08/recognizng-kanji.html that explains more on navigating using the first learner's kanji.

R

mmmason8967 wrote:I don't really know enough Japanese to give it a proper workout, but I'm quite happy to poke around and let you know if anything appears to go wrong.

I'm using it mainly in Firefox 14 under Windows 7.

マイケル

mmmason8967 · Postby **mmmason8967** » August 23rd, 2012 10:01 pm

When I search for a kanji pair using, say, 彼*, I get what I expect, which is a list of pairs where the first kanji is 彼.

My expectation (which may be mistaken) is that the list will contain only pairs starting with 彼 (like globbing) but in fact the list seems to contain the 彼 pairs followed by pairs which have 悲 as the first character. Is that what you expect?

マイケル

rshiplett · Postby **rshiplett** » August 23rd, 2012 10:55 pm

The entries in Edict2 often have multiple forms in the 'subject' for a given set of English equivalents.

I decided to break them all out as separate entries, but to leave them in order.
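A rough sketch of that break-out step in Python, assuming the usual EDICT2 line layout of semicolon-separated headwords, an optional bracketed reading, and slash-separated glosses. The function name `split_entry` and the sample line (including its entry id) are made up for illustration, and tags attached to headwords such as "(P)" are not handled.

```python
def split_entry(line):
    """Split one EDICT2-style line into one record per headword, in order."""
    head, _, gloss_part = line.partition(" /")
    headwords, _, reading = head.partition(" [")
    reading = reading.rstrip("]")  # empty string for kana-only entries
    glosses = [g for g in gloss_part.split("/") if g]
    return [
        {"headword": hw, "reading": reading, "glosses": glosses}
        for hw in headwords.split(";")
    ]


# Illustrative line only; the id EntL0000000X is a placeholder.
records = split_entry("見る;観る [みる] /(v1,vt) to see/to look/EntL0000000X/")
```

Each record keeps the full gloss list, so the split entries stay next to each other in the original order, as described above.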

This created some oddities as some multiple entries have a "double kanji" in something other than first position of the 'subject' and that same double may not occur anywhere else in the dict.

Today I found a Kanji that is in KanjiDic2 but absent in Edict2 - which I had not expected (roughly equivalent Kanji are there in its place.)

I have to change my algorithm to cope with Kanji which are only in "second position" in doubles.

The algorithm now tries to find a set of entries beginning with a search kanji, but tries to fall back to a previous "find" if it fails. But sometimes taking the first match means missing a later set/series of entries. I will look at cross-linking if it can be done without affecting size and speed too much. When I add user mnemonic notes (soon) there should be a chance to put in "See Also" links. I may add a widget that tracks Kanji as a Preference so that will offer a link. The technique/algorithm must be smart in order to cope with users ADD'ing entries (which is also on my short list). Entries do carry an integer id for their JMDict source and that will allow this shuffling of links to work.

You can see the app do hit-n-miss sometimes on startup - it will look in, say, file #2 of 9 and then go to file #3, where it may fail to find a "double" match. I will re-write some of that code in about 10 days. I am resisting using a conventional "index" as most of the kanji "cluster" in "first" position somewhere in the dictionary, but there are some one-off cases. I may use a partial index tied to the cached records (I cache as "pages" load).

I intend to do other versions in Pharo Smalltalk, Object Icon and perhaps Red or Visual Prolog. Icon allows some backtracking and as a language was designed for this kind of task. I may also try two tasks competing with different rules to see who gets a good match first and track kanji-kana combos which fail to find an expected match based on the CharClass/CharMap lookup.
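The fall-back behaviour can be pictured with a small Python sketch. This is only a guess at the general shape, not the Curl code: it assumes a flat, ordered list of headwords, and the class name `Finder` and its attribute `last_hit` are invented here.

```python
class Finder:
    """Find the first entry starting with a kanji, else reuse the last hit."""

    def __init__(self, entries):
        self.entries = entries
        self.last_hit = 0  # index of the most recent successful find

    def find(self, kanji):
        """Return (index, found); on a miss the index is the previous hit."""
        for i, entry in enumerate(self.entries):
            if entry.startswith(kanji):
                self.last_hit = i
                return i, True
        # Miss: fall back to where the previous search landed.
        return self.last_hit, False


# Hypothetical data for illustration.
finder = Finder(["悲劇", "彼女", "彼氏", "日本"])
position, found = finder.find("彼")
```

Because only the first match is taken, a later cluster of entries with the same kanji would be skipped, which is the weakness described in the paragraph above.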

I should really find a moment to run a few thousand common kanji in batch and see what the success hit rate is. If I add a "rolodex" of a few thousand kanji as a user "index" mapped into the dictionary there will then be that option for a user.
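Such a batch check could look roughly like the following Python sketch. It assumes a "hit" simply means that some headword starts with the kanji; the function name `hit_rate` and the sample lists are hypothetical.

```python
def hit_rate(kanji_list, headwords):
    """Return (fraction of kanji found in first position, list of misses)."""
    first_chars = {w[0] for w in headwords if w}
    misses = [k for k in kanji_list if k not in first_chars]
    hits = len(kanji_list) - len(misses)
    rate = hits / len(kanji_list) if kanji_list else 0.0
    return rate, misses


# Made-up sample: three of the four kanji open at least one headword.
rate, misses = hit_rate(["彼", "悲", "猫", "日"], ["彼女", "悲劇", "日本"])
```

The list of misses is the interesting output, since those are the kanji that would need a cross-link or an index entry.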

Overall, I need to be able to maintain the code and get on to styling the appearance, while keeping it small and fast.

If you get a hit for a double kanji (as a pair) in the full dict, but fail to get a hit in the DBL dict, please let me know.

Thanks

R
PS thanks for not asking why I am not doing this in Perl ;-)

natsukoy9313 · Postby **natsukoy9313** » August 24th, 2012 9:20 am

Hi there

We, the JPpod101 team, just want to say how much we appreciate the information you're providing here!
I checked the link and think it's really good :shock:

Thank you for sharing!

Natsuko(奈津子),
Team JapanesePod101.com

mmmason8967 · Postby **mmmason8967** » August 24th, 2012 8:02 pm

rshiplett wrote:PS thanks for not asking why I am not doing this in Perl

I code only occasionally. I've tried Perl but I always seemed to end up with a program that worked but which I couldn't understand two days later. I'm much more of a Python person really.

マイケル
