LabVIEW multilingual text read support


Sharon_

Hi friends,

I am developing OCR software in which I read an Excel file from LabVIEW and compare its text with the output of the OCR module.

My LabVIEW version is Japanese. If the text in the Excel file is in Japanese or English, my string indicator displays it without any issues. But I can't display text in other languages such as Korean, Turkish, or Russian. Is there a way to fix this problem?

I spoke to NI Japan, but the solution they provided (LabVIEW Unicode mode) doesn't seem to work. In Unicode mode I can't even read Japanese text properly.

Thanks for your time and support.

 

 

Sharon

Link to comment

Hi,

 

What encoding does your OCR software use? Is it UTF-8 or UTF-16 or SHIFT-JIS or something else?

 

To display Unicode text in LabVIEW, you must give it Unicode data. If you provide SHIFT-JIS text but LabVIEW tries to interpret it as Unicode text, then the interpretation will be wrong.

 

This page might provide more insight: https://decibel.ni.com/content/docs/DOC-10153
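
A minimal sketch of that misinterpretation, written in Python rather than LabVIEW (the sample string is only an illustration): the same bytes decode to the intended text under Shift-JIS and to unrelated characters when they are treated as UTF-16.

```python
# Encode a Japanese string as Shift-JIS, the way a Japanese Windows
# application might hand it over.
text = "日本語"
sjis_bytes = text.encode("shift_jis")

# Correct: decode with the encoding the bytes were actually written in.
right = sjis_bytes.decode("shift_jis")

# Wrong: treat the same bytes as UTF-16 (little-endian). The decoder still
# returns characters, but they are unrelated to the original text.
wrong = sjis_bytes.decode("utf-16-le", errors="replace")

print(right == text)   # True
print(wrong == text)   # False
```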

Link to comment


Hi JKSH,

Thanks for the reply.

I haven't gone that far yet. I am only trying to read the text that will be compared with the OCR output. Before the comparison, I just want to display the text that I read from the Excel file. I can't trust the final result if I am not sure whether the data I am displaying is correct or whether the format is different.

 

 

Sharon

Link to comment

Hi Sharon,

 

How do you read the text from Excel into LabVIEW?

 

 

"If the text in the Excel file is in Japanese or English, my string indicator displays it without any issues. But I can't display text in other languages such as Korean, Turkish, or Russian."

 

I'm guessing that the text you read from Excel is encoded in SHIFT-JIS.

 

SHIFT-JIS can encode Japanese and English text, but it cannot encode text from Turkish or Russian languages.

 

That's why Unicode was invented. Unicode can encode text from many many different languages at the same time.
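
One way to check whether a given piece of text fits into an encoding is to try encoding it. The sketch below uses Python's built-in codecs; the helper name `can_encode` and the sample strings are made up for illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "Japanese": "日本語",
    "English": "English",
    "Korean": "한국어",
    "Turkish": "güneş",
}

# Print, per language, whether Shift-JIS and UTF-8 can hold the sample.
for language, sample in samples.items():
    print(language, can_encode(sample, "shift_jis"), can_encode(sample, "utf-8"))
```

With these samples, only the Japanese and English strings fit into Shift-JIS, while UTF-8 accepts all four.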

 

 

 

"I am only trying to read the text that will be compared with the OCR output. Before the comparison, I just want to display the text that I read from the Excel file."

 

To display the text in LabVIEW Unicode mode, you must convert the text into a Unicode encoding first.

 

As a starting point, read the link I posted earlier (https://decibel.ni.com/content/docs/DOC-10153 ). I haven't tried it yet, but the example under "Converting ASCII Strings to Unicode" should let you display your Japanese text in LabVIEW's Unicode mode.

 

(I don't think it will correctly convert your Turkish and Russian text, though. But anyway, try it first, and let's do this one step at a time. Text encoding is a moderately complex topic, and you'll probably need a few days to fully understand your problem).

Link to comment

 


Basically, with calls to the Windows APIs MultiByteToWideChar() and WideCharToMultiByte() you can do every possible conversion to, from, and between the MBCS encodings known to Windows. These functions accept as one of their parameters the code page that the MBCS text is in. By default one passes the CP_ACP constant there, which tells Windows to use the current default ANSI code page, but if you know that your text is in a different code page, you have to pass the corresponding constant for that parameter to MultiByteToWideChar(), and you end up with a UTF-16 encoded string in the output.
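
The same idea can be sketched without the Win32 calls. The snippet below uses Python's built-in codecs as a stand-in for the Windows conversion API (it is not that API), and the helper name `mbcs_to_utf16` and the sample text are made up for illustration. The point is only that the caller has to name the code page the bytes are really in.

```python
def mbcs_to_utf16(data: bytes, codepage: str) -> bytes:
    """Decode bytes from the named code page and re-encode them as UTF-16LE."""
    return data.decode(codepage).encode("utf-16-le")


# Russian text as a Windows-1251 application would store it.
russian_bytes = "Привет".encode("cp1251")

# Naming the right code page recovers the text.
utf16 = mbcs_to_utf16(russian_bytes, "cp1251")
print(utf16.decode("utf-16-le") == "Привет")   # True

# Naming the wrong one (Windows-1254, Turkish) yields different characters.
wrong = russian_bytes.decode("cp1254", errors="replace")
print(wrong == "Привет")                        # False
```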

Link to comment

Hi,

I can read and display Unicode characters on indicators if they are entered directly in a control or read from a text file (saved as a Unicode file).

But the problem I am facing is with reading the Excel file. The 'non-supported' characters are displayed as '?'.

Unfortunately, we can't save the Excel file as a Unicode file, right?

I am virtually running out of ideas now. I tried copying the Excel contents into a Unicode .txt file and reading it with LabVIEW code. That seems to be okay.

So how do I handle an Excel file containing ISO 8859-1-15, 8859-9, 8859-5, KS C 5601, and Traditional Chinese and Mandarin characters?

Any solutions?

Thanks for your time and support!

 

 

-Sharon

Link to comment


How do you read the characters from the Excel file? What kind of Excel file is it?

 

Basically, xls files use binary OLE streams for their data, which store strings as OLECHAR, which is basically UTF-16. xlsx files use XML with UTF-8 encoding.
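
Because an xlsx file is a ZIP package of UTF-8 XML parts, its strings can in principle be read without going through Excel at all. The following is a rough sketch in Python, not a LabVIEW solution. It assumes the workbook keeps its cell text in the usual `xl/sharedStrings.xml` part, and the function name `read_shared_strings` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(xlsx) -> list:
    """Return the shared-string table of an .xlsx file as Python strings.

    `xlsx` is a path or a binary file object. Phonetic (furigana) runs are
    skipped so that only the visible cell text is returned.
    """
    with zipfile.ZipFile(xlsx) as package:
        if "xl/sharedStrings.xml" not in package.namelist():
            return []  # workbook has no shared strings
        root = ET.fromstring(package.read("xl/sharedStrings.xml"))
    strings = []
    for item in root.findall("m:si", NS):
        # Plain strings sit in <t>; rich text is split over <r><t> runs.
        parts = item.findall("m:t", NS) + item.findall("m:r/m:t", NS)
        strings.append("".join(part.text or "" for part in parts))
    return strings
```

Cells refer to these entries by index, so mapping the strings back to rows and columns would also need the worksheet parts, which this sketch does not read.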

 

But your problem is most likely that you use the ActiveX interface to Excel. Here LabVIEW's own smartness is likely playing tricks on you, since the strings provided by the Excel ActiveX interface are automagically translated into whatever default MBCS code page is configured for your Windows account. While LabVIEW can support Unicode in its string controls with the unsupported INI file setting, it's quite possible that this support does not extend to LabVIEW's ActiveX interface, and ActiveX, being designed as an idiot-proof interface, doesn't allow you to change that behavior.

Link to comment
  • 5 months later...

rolfk,

I am not sure how to use the MultiByteToWideChar() function in LabVIEW. But if I want to use the Excel property to read values from an Excel file containing Unicode characters, do I need to install language packs?

For instance, I have no problem with Japanese or English text. And I can read characters of any language from a Unicode text file.

Will this also be an issue when writing Unicode values to an Excel file?

Any ideas?

I was planning to create only .txt reports, but the problem with Unicode text files is that the newline is also Unicode-encoded (it doesn't actually produce a line break), so I can't append text on the next line. That's why I wanted to generate the reports in Excel. Argh!

 

 

-Sharon
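
One possible cause of the newline problem, which the thread does not confirm, is appending a single-byte line break to a file in which every character takes two bytes. The sketch below shows in Python how appending could look when the line break is encoded together with the text. The function name `append_line_utf16` is made up for illustration, and UTF-16LE with a byte order mark is an assumption about what "Unicode text file" means here.

```python
import codecs
import os


def append_line_utf16(path: str, line: str) -> None:
    """Append one line to a UTF-16LE text file, creating it with a BOM if needed."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "ab") as report:
        if is_new:
            report.write(codecs.BOM_UTF16_LE)  # byte order mark, written once
        # Encode the text and the line break together, so CR and LF each
        # become two bytes (0D 00 0A 00) like every other character.
        report.write((line + "\r\n").encode("utf-16-le"))
```

Calling the function repeatedly with the same path adds one line per call, and the file can then be read back as ordinary UTF-16 text.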

Link to comment

Well, Excel is (mostly) Unicode throughout, so it can simply display any language that can be represented in UTF-16 (the Unicode encoding scheme used by Windows). LabVIEW is NOT Unicode (aside from the unsupported option to enable UTF-8 support in it, but that is an experimental feature with many difficulties that stand in the way of seamless operation). As such, LabVIEW uses whatever MBCS (multibyte character set) your language setting in the International Control Panel defines. When LabVIEW invokes ActiveX methods in the Excel Automation Server, the strings are automatically translated from the Excel Unicode format to the LabVIEW MBCS format. But Unicode to MBCS can be lossy, since no MBCS coding scheme other than UTF-8 (which can also be considered MBCS) can represent every Unicode character. And unlike Linux, Windows doesn't allow UTF-8 to be set as the system MBCS encoding.

 

So if your Excel string contains characters that cannot be translated to characters in the system's current MBCS, you have a problem. There is no simple solution to this problem; otherwise NI and many others would have implemented it long ago. Any workaround that can be devised will have drawbacks elsewhere.
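
The loss can be reproduced outside LabVIEW. In this Python sketch, `errors="replace"` stands in for the substitution that happens during the automatic conversion (an assumption made for illustration, not a description of what LabVIEW does internally), and cp932 plays the role of the Japanese system code page.

```python
text = "日本語 한국어"

# Converting to the Japanese code page (cp932) keeps the Japanese part but
# has no mapping for Hangul, so each of those characters becomes "?".
as_mbcs = text.encode("cp932", errors="replace")
round_trip = as_mbcs.decode("cp932")
print(round_trip == text)       # False
print(round_trip.count("?"))    # 3

# UTF-8 can represent every character, so the round trip is lossless.
print(text.encode("utf-8").decode("utf-8") == text)   # True
```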

Link to comment
