OCR software for Japanese

ivoplsek1676
New in Town
Posts: 5
Joined: December 10th, 2010 2:12 pm

OCR software for Japanese

Postby ivoplsek1676 » September 14th, 2011 6:39 am

Hello everyone,
This might be a difficult question to answer, but I am trying my luck. Does anyone know of good OCR software for the Japanese language? Please help out if you know.
Thank you,

ivo

PS. If you need a fuller explanation:
OCR (optical character recognition) helps you recognize text that was, for example, scanned from a book or your notes. Usually when you scan a text like that you are unable to edit it, but with OCR software you can recognize the text and edit it later (Adobe Acrobat Pro has integrated OCR, and these days so does Google Docs). With English, it is no longer a problem to scan a text and then transfer it into a Word document, HTML, etc. But is there any software you know of that can do this reliably for Japanese? I have found out that OmniPage and ABBYY FineReader should be able to do it, but I have never used either in real life, and before shelling out $500 I'd like to know how good they are.


Why do I need to do this: I am currently working on my dissertation, which requires me to read a ton of Japanese scholarly literature, but at my level of Japanese I still cannot read fast (I constantly need to look up kanji that I don't know). Since this obviously takes a lot of time (time I don't have, unfortunately), I thought it would be much easier if I scanned the books, converted the text into HTML, and then used Rikaichan while reading. This will still be time-consuming, but not as much as what I am doing now.
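For anyone who wants to script the last step of that workflow, here is a minimal sketch in Python. It assumes the OCR program has already exported plain text; the function name `text_to_html` is made up for illustration and is not part of any OCR product. It only wraps the text in a small UTF-8 HTML page so that a browser pop-up dictionary can work on it.

```python
import html


def text_to_html(text: str, title: str = "OCR output") -> str:
    """Wrap plain OCR text in a minimal UTF-8 HTML page (hypothetical helper)."""
    # Blank lines separate paragraphs; single newlines become <br> line breaks.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape protects any <, > or & that the OCR step produced.
    body = "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>" for p in paragraphs
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ja">\n'
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

The result can be saved with `pathlib.Path("page.html").write_text(page, encoding="utf-8")` and opened in the browser. Declaring the charset explicitly is the main design choice here, since a wrongly guessed encoding would turn the kanji into garbage.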

Any ideas or help will be truly appreciated.

kraemder
New in Town
Posts: 7
Joined: June 20th, 2011 2:49 pm

Postby kraemder » September 18th, 2011 12:14 am

For a dissertation? Ouch, pressure, heh.

I would really love some good OCR software just for my hobby of trying to learn this terribly difficult writing system. Anyway, I've tried ABBYY FineReader 9.0, which I originally purchased a few years ago for OCR of German, Spanish, and English. It handled those languages with ease and very accurately. I think this is partly because it can easily separate the words from each other and use a dictionary to double-check itself.

FineReader can OCR Japanese text, but there's no option to have it use a dictionary to improve accuracy. I think they're up to version 10.0 and I'm still using 9.0. They may have improved the accuracy in the new version, but with 9.0 I found that it wasn't good enough for Japanese. But I'm more of a beginner, so I needed to look up pretty much every word in a dictionary as I went, whereas someone with more skill wouldn't.

I currently have the latest version of Adobe Acrobat Standard and it seems to do a slightly better job of OCR. It's not perfect though. It seems katakana words in particular cause it headaches.

I am pretty sure all of these software packages let you use them on a trial basis before paying. I would check for that.

I'd love to hear from you if you locate a good solution - please reply to your thread here.

One note - if you buy Adobe I strongly suggest you take the time to order by phone. I had an issue where the email on the order was incorrect and I had to send bank statements and tons of paperwork and it took a month for them to figure it out and allow me to download the software. Terrible experience.

ivoplsek1676
New in Town
Posts: 5
Joined: December 10th, 2010 2:12 pm

it works

Postby ivoplsek1676 » September 18th, 2011 1:40 am

Hello kraemder3911,

So I have looked into it, and it seems that the best applications are OmniPage and ABBYY. I went for ABBYY 10.1 and have to say that I am amazed: it actually works, with almost 100% accuracy. At least so far, the text I have scanned came out perfect. Converting vertical texts into a left-to-right PDF or HTML file was no problem at all either. The only little issues are things like page numbers or words that are not really part of the text itself: sometimes the software just doesn't know what to do with them, and they might appear in the middle of a sentence after the conversion. But that's really a minor issue. I had not expected OCR technology to have advanced this much, and I am pleasantly surprised.
I have no experience with Adobe with regard to Japanese. I thought that Adobe was not able to work with Japanese at all?
I have heard that there's other software out there (written by Japanese companies) that does OCR...e.g. http://www.sourcenext.com/titles/use/124290/#読み取り機能.
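The stray page numbers mentioned above can be partly cleaned up after the conversion. Below is a rough Python sketch, assuming the OCR result was exported as plain text and that a page number ends up on a line of its own; the names `PAGE_NUMBER_LINE` and `drop_page_number_lines` are invented for this example. It cannot help when a number lands inside a sentence.

```python
import re

# A line holding only 1-4 digits (ASCII or full-width),
# optionally wrapped in dashes, e.g. "12", "４５" or "- 12 -".
PAGE_NUMBER_LINE = re.compile(r"^\s*[-－]?\s*[0-9０-９]{1,4}\s*[-－]?\s*$")


def drop_page_number_lines(text: str) -> str:
    """Remove lines that look like bare page numbers (heuristic only)."""
    kept = [line for line in text.splitlines() if not PAGE_NUMBER_LINE.match(line)]
    return "\n".join(kept)
```

This is only a heuristic: a line that legitimately contains nothing but a short number (a year on its own line, say) would be dropped too, so it is worth checking the result against the original scan.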

This is cheap, but ABBYY FineReader is truly easy to use and seems like a really powerful tool, not just for Japanese but for many languages.
Hope this helps.....

ps. if others have experience with Japanese OCR or conversion in general, please post here.

kraemder
New in Town
Posts: 7
Joined: June 20th, 2011 2:49 pm

a

Postby kraemder » September 18th, 2011 2:25 pm

Thanks for replying, I appreciate that. I will have to give ABBYY another look. What dpi are you scanning at? Adobe definitely has Japanese support :p. I tried it out with a couple of pages of Harry Potter. It did better than ABBYY 9. Is there a particular reason you're having the text aligned left to right? It doesn't by some miracle put spaces between words, does it?

kraemder
New in Town
Posts: 7
Joined: June 20th, 2011 2:49 pm

Postby kraemder » September 18th, 2011 11:36 pm

Heh, I bought the FR11 upgrade. I think it just came out, maybe even a few days ago. It has dictionary support for Japanese (maybe version 10 did too?). I did the trial, which only lets you save 1 page at a time. The 1st page definitely seemed better than before and also better than Adobe, so I plunked down 100 bucks for it. I like the interface a lot better than 9.0 too. While it's definitely better, it did have issues with some pages, I think due to furigana. Furigana is a double-edged sword there: it helps me out tons but seems to just confuse the computer. But all in all it's pretty good. I particularly like how I can save to HTML and then use Rikaichan to read, which seems to be the best pop-up dictionary. With Adobe I was using JBrute, which isn't bad but isn't as nice.

Since there are issues with accuracy (I'm not getting 99% maybe because Harry Potter has a lot more furigana than other books) I'm thinking I'll have to have the original available for the parts it got confused on =/.

Would be nice if I could just buy etexts of light popular fiction online. 100% accuracy without a need for OCR would be nice =).

philipp_wagner_501538
New in Town
Posts: 1
Joined: September 7th, 2014 6:11 pm

Re: OCR software for Japanese

Postby philipp_wagner_501538 » September 7th, 2014 6:14 pm

Hello everybody!

I use the following site to read Japanese texts by OCR (this works surprisingly smoothly!!), and then you can do anything you want with them, e.g. put them into Google Translate, for beginners. It is an online OCR service: http://www.ocrgeek.com . The nice thing is that you can upload as many pages as you want at one time. This helps to save some time: the plain text then shows up in one file and is editable...

Do not forget to choose Japanese as the document language right at the beginning, otherwise things will not work properly...

Good luck :-)

community.japanese
Expert on Something
Posts: 2704
Joined: November 16th, 2012 8:54 am

Re: OCR software for Japanese

Postby community.japanese » September 9th, 2014 6:31 am

philipp_wagner さん、

Konnichiwa.
Thank you for sharing the information.

Yuki 由紀
Team JapanesePod101.com
