working with uzn file not working #66

Closed
Description

@michi729

On the command line you would run "tesseract.exe pic1.bmp pic1.txt -psm 4" with a pic1.uzn file in the current directory.
When I try the equivalent through the wrapper:
Tesseract.TesseractEngine tesseract = new Tesseract.TesseractEngine("....path... tessdata", "eng", Tesseract.EngineMode.Default);
Tesseract.Pix picture = Tesseract.Pix.LoadFromFile(@"...path... pic1.bmp");
Tesseract.Page page = tesseract.Process(picture, Tesseract.PageSegMode.SingleColumn); // same as -psm 4
...
string text = page.GetText();

the call to GetText throws an exception (tesseract.exe fails the same way when there is no uzn file).
I therefore assume that the .NET wrapper does not find (or does not search for) the uzn file.

Could you please tell me what to do, or whether this is a bug?

Activity

charlesw (Owner) commented on Jan 24, 2014

Hi michi729,
I wasn't even aware of uzn files before now, which is probably why it doesn't work. I've done a little reading, and it seems Tesseract needs to know the input file name (which makes sense, since that is how it finds the uzn file). Do you think it makes sense to add this to the Page class? In that case you could do:

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default)) {
    using(var img = Pix.LoadFromFile("./phototest.tif")) {
        using(var page = engine.Process(img)) {
            page.InputName = "phototest.tif";
            // do processing
        }
    }
}

Alternatively, I could overload the Process method so that it takes the input name as an optional parameter. Do you have any preference? Also, could you kindly provide an example image and corresponding uzn file, with a brief description of the expected output, so I can write a test case to verify the implementation? Note that these should of course not contain any confidential or copyrighted material.

michi729 (Author) commented on Jan 24, 2014

Hi Charles, thanks for the quick response!
I will get back to you with an example picture as well as uzn file in time.
All the best, Michael

michi729 (Author) commented on Jan 24, 2014

[attached image: test]

Calling "tesseract.exe test.png test -psm 4"
with tesseract, test.png and test.uzn in the same directory will result in a test.txt with the content
This is another test

Content of test.uzn:
100 130 200 30 Text
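
For readers unfamiliar with the format, the line above appears to follow the UNLV zone layout: left, top, width and height in pixels, followed by a zone label. Below is a minimal C# sketch of reading such a line, assuming that field order. The `UznZone` type and its `Parse` method are hypothetical names used for illustration only; they are not part of the wrapper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the Tesseract wrapper.
public sealed class UznZone
{
    public int Left { get; }
    public int Top { get; }
    public int Width { get; }
    public int Height { get; }
    public string Label { get; }

    public UznZone(int left, int top, int width, int height, string label)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
        Label = label;
    }

    // Assumes whitespace-separated fields in the order
    // "left top width height label"; anything after the label is ignored.
    public static UznZone Parse(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));

        // A null separator array makes Split break on any whitespace.
        string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 5)
            throw new FormatException("Expected: left top width height label");

        return new UznZone(
            int.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture),
            int.Parse(parts[2], CultureInfo.InvariantCulture),
            int.Parse(parts[3], CultureInfo.InvariantCulture),
            parts[4]);
    }
}
```

Under that assumption, `UznZone.Parse("100 130 200 30 Text")` describes a 200 by 30 pixel zone whose top-left corner is at (100, 130).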

charlesw (Owner) commented on Jan 24, 2014

Thanks, just what I needed.

michi729 (Author) commented on Jan 27, 2014

Hi Charles, I am not sure whether this should be added as a parameter. Tesseract itself just replaces the suffix of the current picture's name, i.e. you could get the picture name from the path passed to LoadFromFile. What do you think?

charlesw (Owner) commented on Jan 27, 2014

In theory yes, but this would only work if the image was loaded from a file. Tesseract doesn't actually work this way: according to my reading of the source, it relies on the image name being passed as an additional parameter to its ProcessPage routine. It's a pretty simple fix, so I should have it done sometime tomorrow, assuming no unforeseen issues arise.
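
To make the lookup concrete: as discussed above, the zone file is found by swapping the extension of the input name for `.uzn`. A rough C# sketch of the same derivation follows, which can help check that a zone file sits where it is expected before running OCR. The `UznLocator` class and its method names are hypothetical and not part of the wrapper.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Tesseract wrapper.
public static class UznLocator
{
    // Replaces the extension of the input name with ".uzn",
    // e.g. "phototest.tif" becomes "phototest.uzn".
    public static string ExpectedUznPath(string inputName)
    {
        if (string.IsNullOrEmpty(inputName))
            throw new ArgumentException("Input name must not be empty.", nameof(inputName));

        return Path.ChangeExtension(inputName, ".uzn");
    }

    // True if a zone file exists at the derived location. Relative
    // names are resolved against the current working directory.
    public static bool ZoneFileExists(string inputName)
    {
        return File.Exists(ExpectedUznPath(inputName));
    }
}
```

Because relative names resolve against the working directory, passing an absolute image path as the input name is probably the less surprising choice.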

michi729 (Author) commented on Jan 27, 2014

You are right :-) And thanks for taking the time!

charlesw (Owner) commented on Jan 27, 2014

Just released an updated NuGet package (1.10) that supports uzn files through an optional parameter on Process, as previously discussed. Please note that a PSM of SingleColumn (4) does NOT work due to a bug in Tesseract 3.02 (https://code.google.com/p/tesseract-ocr/issues/detail?id=653); other modes do. This will be resolved once Tesseract 3.03 has been released.
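
A usage sketch of what this might look like from calling code. It assumes the input name is accepted as a string argument between the image and the page segmentation mode, i.e. `Process(Pix, string, PageSegMode?)`; that signature is an assumption and should be checked against the installed package version. The `UznExample` class, the `ReadZones` method and the choice of `SingleBlock` (picked only because it is not `SingleColumn`) are likewise illustrative.

```csharp
using System;
using Tesseract;

// Illustrative only; requires the Tesseract NuGet package and a tessdata folder.
public static class UznExample
{
    // imagePath should have a matching .uzn file next to it,
    // e.g. test.png alongside test.uzn.
    public static string ReadZones(string tessdataPath, string imagePath)
    {
        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        using (var img = Pix.LoadFromFile(imagePath))
        // Assumed overload: the second argument is the input name from which
        // Tesseract derives the .uzn file name. SingleColumn is avoided here
        // because of the Tesseract 3.02 bug mentioned above.
        using (var page = engine.Process(img, imagePath, PageSegMode.SingleBlock))
        {
            return page.GetText();
        }
    }
}
```

With the test.png and test.uzn files from earlier in this thread, the returned text would be expected to contain only the zone's content, provided the chosen segmentation mode honours the zone file.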

michi729 (Author) commented on Jan 28, 2014

Hi Charles, thank you very much for your fast support :-)

        working with uzn file not working · Issue #66 · charlesw/tesseract