Entries in Edict2 often list several forms in the 'subject' for a given set of English equivalents.
I decided to break them all out as separate entries, but to leave them in order.
This created some oddities as some multiple entries have a "double kanji" in something other than first position of the 'subject' and that same double may not occur anywhere else in the dict.
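To illustrate the break-out step described above, here is a minimal Python sketch. It assumes the usual EDICT2 line layout of `FORM1;FORM2 [KANA1;KANA2] /gloss/gloss/EntL.../` and is not the app's actual code. The function name `split_entry`, the field names, and the sample line are made up for illustration.

```python
import re

# Assumed EDICT2-style layout: "FORM1;FORM2 [KANA1;KANA2] /gloss/.../EntL1234567X/"
LINE_RE = re.compile(
    r"^(?P<heads>\S+)(?:\s+\[(?P<kana>[^\]]*)\])?\s+/(?P<rest>.*)/\s*$"
)

def split_entry(line):
    """Break one multi-form line into one record per form, keeping source order."""
    m = LINE_RE.match(line)
    if m is None:
        return []                      # not a line this sketch understands
    parts = m.group("rest").split("/")
    ent_id = None
    if parts and parts[-1].startswith("EntL"):
        ent_id = parts.pop()           # trailing entry id, kept for cross-linking
    records = []
    for position, head in enumerate(m.group("heads").split(";")):
        records.append({
            "head": re.sub(r"\([^)]*\)", "", head),  # drop tags such as "(P)"
            "position": position,      # 0 = first form in the 'subject'
            "kana": m.group("kana") or "",
            "glosses": parts,
            "ent_id": ent_id,
        })
    return records

# Made-up sample line, for illustration only.
sample = "日本(P);日本国 [にほん;にっぽん] /(n) Japan/EntL0000001X/"
for rec in split_entry(sample):
    print(rec["position"], rec["head"], rec["ent_id"])
```

Keeping `position` on each record is one way to remember that a form came from somewhere other than first position, which is where the oddities above come from.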
Today I found a kanji that is in KanjiDic2 but absent from Edict2 - which I had not expected (roughly equivalent kanji are there in its place).
I have to change my algorithm to cope with Kanji which are only in "second position" in doubles.
The algorithm now tries to find a set of entries beginning with the search kanji, and falls back to a previous "find" if that fails. But sometimes taking the first match means missing a later set/series of entries. I will look at cross-linking if it can be done without affecting size and speed too much. When I add user mnemonic notes (soon) there should be a chance to put in "See Also" links. I may add a widget that tracks kanji as a Preference, so that will offer a link. The technique/algorithm must be smart enough to cope with users adding entries (which is also on my short list). Entries do carry an integer id for their JMDict source, and that will allow this shuffling of links to work.
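One possible reading of that lookup, as a rough Python sketch rather than the app's real code: try the full search string as a prefix, then look for it in a non-first position, and only then fall back to the earlier, broader find on the first kanji. The names `find_run` and `lookup` and the sample word list are assumptions made for this example.

```python
def find_run(heads, prefix, start=0):
    """Return (begin, end) of the first contiguous run of heads starting
    with prefix at or after index start, or None if there is no such run."""
    begin = None
    for i in range(start, len(heads)):
        if heads[i].startswith(prefix):
            if begin is None:
                begin = i
        elif begin is not None:
            return (begin, i)
    return (begin, len(heads)) if begin is not None else None

def lookup(heads, query):
    """Return (kind, result), where kind says which rule produced the result."""
    previous = find_run(heads, query[0])       # the earlier, broader "find"
    if previous is not None:
        # Keep scanning past the first run so a later series is not missed.
        run = find_run(heads, query, previous[0])
        if run is not None:
            return "prefix", run
    # Query occurs only in second or later position of some head.
    inner = [i for i, h in enumerate(heads) if query in h[1:]]
    if inner:
        return "inner", inner
    if previous is not None:
        return "fallback", previous
    return "miss", None

# Sample data in dictionary order, for illustration only.
heads = ["学校", "学生", "大学", "大学生", "再入学", "学期"]
for q in ["学生", "学期", "入学", "学問", "猫"]:
    print(q, lookup(heads, q))
```

Here "学期" is found even though it sits after the first run of "学" entries, and "入学" is found only through the non-first-position scan. The scan is linear, so it trades speed for not needing an index.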
You can see the app go hit-and-miss sometimes on startup - it will look in, say, file #2 of 9 and then go to file #3, where it may fail to find a "double" match. I will re-write some of that code in about 10 days. I am resisting using a conventional "index" as most of the kanji "cluster" in "first" position somewhere in the dictionary, but there are some one-off cases. I may use a partial index tied to the cached records (I cache as "pages" load).
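A partial index tied to the cache could look something like the Python sketch below. It only knows about pages that have already been loaded, so it costs nothing up front. The class name `PageCache`, the `load_page` callable and the page contents are hypothetical, and for brevity it indexes every character, not just kanji.

```python
from collections import defaultdict

class PageCache:
    """Caches pages on first load and indexes only what has been loaded."""

    def __init__(self, load_page):
        self._load_page = load_page      # hypothetical: page number -> list of heads
        self.pages = {}                  # page number -> cached heads
        self.seen_on = defaultdict(set)  # character -> pages where it occurred

    def get(self, page_no):
        """Return the heads on a page, loading and indexing it on first use."""
        if page_no not in self.pages:
            heads = self._load_page(page_no)
            self.pages[page_no] = heads
            for head in heads:
                for ch in head:          # any position, so one-off cases are covered
                    self.seen_on[ch].add(page_no)
        return self.pages[page_no]

    def known_pages(self, ch):
        """Pages already cached that contain ch; empty if nothing is known yet."""
        return sorted(self.seen_on.get(ch, ()))

# Stand-in for the real page files, for illustration only.
fake_files = {2: ["学校", "学生"], 3: ["大学", "再入学"]}
cache = PageCache(fake_files.__getitem__)
cache.get(2)
cache.get(3)
print(cache.known_pages("学"), cache.known_pages("入"))
```

A lookup could consult `known_pages` first and only go hunting through the files when it comes back empty.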
I intend to do other versions in Pharo Smalltalk, Object Icon and perhaps Red or Visual Prolog. Icon allows some backtracking and as a language was designed for this kind of task. I may also try two tasks competing with different rules to see which gets a good match first, and track kanji-kana combos which fail to find an expected match based on the CharClass/CharMap lookup.
I should really find a moment to run a few thousand common kanji in batch and see what the hit rate is. If I add a "rolodex" of a few thousand kanji as a user "index" mapped into the dictionary, there will then be that option for a user.
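The batch check could be as small as the Python sketch below. It assumes some `lookup` callable that returns a truthy value on a hit; the function name `hit_rate` and the sample data are invented for the example.

```python
def hit_rate(queries, lookup):
    """Run every query through lookup and return (rate, misses)."""
    misses = [q for q in queries if not lookup(q)]
    total = len(queries)
    rate = (total - len(misses)) / total if total else 0.0
    return rate, misses

# Stand-in lookup: membership in a small set, for illustration only.
known = {"学", "校", "生", "大"}
rate, misses = hit_rate(["学", "校", "猫", "大"], known.__contains__)
print(f"{rate:.0%} hits, misses: {misses}")
```

Keeping the list of misses, not just the rate, gives a ready-made list of the one-off cases to look at.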
Overall, I need to be able to maintain the code and get on to styling the appearance, while keeping it small and fast.
If you get a hit for a double kanji (as a pair) in the full dict, but fail to get a hit in the DBL dict, please let me know.
Thanks
R
PS thanks for not asking why I am not doing this in Perl