Sharon_ Posted October 2, 2014 (edited)

Hi friends,

I am developing OCR software in which I read an Excel file from LabVIEW and compare its text with the output of the OCR module. My LabVIEW version is Japanese. If the text in the Excel file is in Japanese or English, my string indicator displays it without any issues, but I can't display text in other languages such as Korean, Turkish, or Russian. Is there a way to fix this problem? I spoke to NI Japan, but the solution they provided (LabVIEW Unicode mode) doesn't seem to work. In Unicode mode I can't even read Japanese text properly.

Thanks for your time and support.

Sharon
JKSH Posted October 2, 2014

Hi,

What encoding does your OCR software use? Is it UTF-8, UTF-16, SHIFT-JIS, or something else?

To display Unicode text in LabVIEW, you must give it Unicode data. If you provide SHIFT-JIS text but LabVIEW tries to interpret it as Unicode text, then the interpretation will be wrong.

This page might provide more insight: https://decibel.ni.com/content/docs/DOC-10153
Sharon_ Posted October 2, 2014 (Author)

Hi JKSH,

Thanks for the reply. I haven't gone that far yet. For now I am only trying to read the text that will be compared with the OCR output. Before the comparison, I just want to display the text that I read from the Excel file. I can't be sure of the final result if I don't know whether the data I am displaying is incorrect or simply in a different format.

Sharon
JKSH Posted October 2, 2014

Hi Sharon,

How do you read the text from Excel into LabVIEW?

Sharon_ wrote: "If the excel file texts are either in Japanese or English my string indicator displays the texts without any issues. But I cant display other language texts like korean,turkish or russian."

I'm guessing that the text that you read from Excel is encoded in SHIFT-JIS. SHIFT-JIS can encode Japanese and English text, but it cannot encode text from Turkish or Russian languages. That's why Unicode was invented. Unicode can encode text from many, many different languages at the same time.

Sharon_ wrote: "I am only trying to read the texts that are to be compared with the OCR output. Before comparison I just want to display the texts that I read from excel file."

To display the text in LabVIEW Unicode mode, you must convert the text into a Unicode encoding first. As a starting point, read the link I posted earlier (https://decibel.ni.com/content/docs/DOC-10153). I haven't tried it yet, but the example under "Converting ASCII Strings to Unicode" should let you display your Japanese text in LabVIEW's Unicode mode. (I don't think it will correctly convert your Turkish and Russian text, though. But anyway, try it first, and let's do this one step at a time. Text encoding is a moderately complex topic, and you'll probably need a few days to fully understand your problem.)
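The coverage difference between a legacy codepage and a Unicode encoding can be checked outside LabVIEW. The following is only a minimal Python sketch, not something from the thread: the sample strings and the helper name `can_encode` are made up for illustration, and Python's `shift_jis` codec stands in for whatever codepage the Japanese system actually uses.

```python
# Hypothetical sample strings, chosen only for illustration.
samples = {
    "English": "Hello",
    "Japanese": "こんにちは",
    "Korean": "안녕하세요",
    "Turkish": "ığş",
}

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for language, text in samples.items():
    # Compare a legacy Japanese codepage with a Unicode encoding.
    print(language, can_encode(text, "shift_jis"), can_encode(text, "utf-16-le"))
```

In this sketch the English and Japanese samples fit into Shift-JIS, the Korean and Turkish samples do not, and all four fit into UTF-16.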
Rolf Kalbermatter Posted October 4, 2014

Basically, with the Windows API functions MultiByteToWideChar() and WideCharToMultiByte() you can do every possible conversion to and from any MBCS encoding known to Windows. These functions accept as one of their parameters the codepage that the MBCS text is in. By default one passes the CP_ACP constant there, which tells Windows to use the current user codepage. But if you know that your text is in a different codepage, you have to pass the corresponding constant for that parameter to MultiByteToWideChar(), and you end up with a UTF-16 encoded string in the output.
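The role of the codepage parameter can be illustrated without calling the Windows API at all. The sketch below is a Python analogue, not the Windows call itself: the helper name `mbcs_to_utf16` is hypothetical, and Python's built-in `cp1254` (Turkish) and `cp1251` (Cyrillic) codecs are used as stand-ins for Windows codepage constants.

```python
def mbcs_to_utf16(raw: bytes, codepage: str) -> bytes:
    """Interpret raw bytes in the given codepage and re-encode them as UTF-16LE."""
    return raw.decode(codepage).encode("utf-16-le")

# Bytes as they might arrive from a Turkish (Windows-1254) source.
turkish_bytes = "ışık".encode("cp1254")

# Correct codepage: the original text is recovered.
right = mbcs_to_utf16(turkish_bytes, "cp1254").decode("utf-16-le")

# Wrong codepage: the same bytes silently turn into different characters.
wrong = mbcs_to_utf16(turkish_bytes, "cp1251").decode("utf-16-le")

print(right)
print(wrong)
```

The conversion never fails in the second case; it just produces the wrong text, which is why the source codepage has to be known rather than guessed.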
Sharon_ Posted October 7, 2014 (Author)

Hi,

I can read and display Unicode characters on indicators if they are entered directly in a control or read from a text file (saved as a Unicode file). The problem I am facing is when I read the Excel file: the "non-supported" characters are displayed as '?'. Unfortunately we can't save the Excel file as a Unicode file, right? I am virtually running out of ideas now.

I tried copying the Excel contents into a txt (Unicode) file and reading it with LabVIEW code. That seems to be okay. So how do I handle an Excel file containing ISO 8859-1-15, 8859-9, 8859-5, KS C 5601, and Traditional Chinese and Mandarin characters? Any solutions?

Thanks for your time and support!

-Sharon
Rolf Kalbermatter Posted October 7, 2014

How do you read the characters from the Excel file? What kind of Excel file is it? Basically, xls files use binary OLE streams for data, which store strings as OLECHAR, which is basically UTF-16. xlsx files use XML with UTF-8 encoding.

But your problem is most likely that you use the ActiveX interface to Excel. Here LabVIEW's own smartness likely plays tricks on you, since the strings provided by the Excel ActiveX interface are automagically translated into whatever default MBCS codepage is configured for your Windows account. While LabVIEW can support Unicode in its string controls with the unsupported ini file setting, it is quite possible that this support does not extend to the ActiveX interface in LabVIEW, and ActiveX, being designed as an idiot-proof interface, doesn't allow you to change that behavior.
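Because xlsx files are ZIP archives of UTF-8 XML, the cell strings can in principle be read straight from the file, with no codepage translation in between. The following Python sketch is not from the thread and makes several assumptions: the workbook keeps its text in the usual shared-string part `xl/sharedStrings.xml` (workbooks with only inline strings do not have it), the helper name `read_shared_strings` is made up, and phonetic (furigana) runs are deliberately skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main schema used inside .xlsx parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def read_shared_strings(xlsx):
    """Return the shared-string table of an .xlsx file as Unicode strings.

    xlsx may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        with archive.open("xl/sharedStrings.xml") as part:
            root = ET.parse(part).getroot()

    strings = []
    for item in root.iter(NS + "si"):
        pieces = []
        for child in item:
            if child.tag == NS + "t":        # plain text item
                pieces.append(child.text or "")
            elif child.tag == NS + "r":      # rich-text run
                pieces.extend(t.text or "" for t in child.iter(NS + "t"))
            # other children (e.g. phonetic runs) are ignored
        strings.append("".join(pieces))
    return strings
```

The returned values are ordinary Python `str` objects, so Korean, Turkish, and Japanese text can coexist in one list. Mapping them back to individual cells would additionally require reading the worksheet parts, which this sketch does not attempt.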
Sharon_ Posted March 12, 2015 (Author, edited)

rolfk,

I am not sure how to use the MultiByteToWideChar() function in LV. But if I want to use the Excel property to read out values from an Excel file containing Unicode characters, do I need to install language packs? For instance, I have no problem with Japanese or English text, and I can read characters of any language from a Unicode text file. Is this going to be an issue when writing Unicode values to an Excel file? Any idea?

I was planning to create only txt file reports, but the problem with Unicode text files is that the newline gets Unicode-encoded (it does not actually produce a new line), so I can't append text to the "next line". That's why I wanted to generate reports in Excel. Argh!

-Sharon
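One possible reading of the newline problem (an assumption, since the thread does not show the code that writes the report): in a UTF-16 text file the line break has to be written as UTF-16 code units as well, so appending a plain single-byte CR/LF pair corrupts the file. The Python sketch below shows the idea with a hypothetical helper `append_line_utf16`; it is not LabVIEW code.

```python
import os

def append_line_utf16(path, text):
    """Append one line to a UTF-16LE text file, creating it with a BOM if needed."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "ab") as report:
        if is_new:
            report.write(b"\xff\xfe")  # UTF-16 little-endian byte order mark
        # Encode the line break together with the text, so CR and LF
        # become two-byte code units like every other character.
        report.write((text + "\r\n").encode("utf-16-le"))
```

Calling the helper repeatedly adds one line per call, and the result can be opened as a Unicode text file in an editor that understands UTF-16.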
Rolf Kalbermatter Posted March 12, 2015

Well, Excel is (mostly) Unicode throughout, so it can simply display any language that can be represented in UTF-16 (the Unicode encoding scheme used by Windows). LabVIEW is NOT Unicode (aside from the unsupported option to enable UTF-8 support in it, but that is an experimental feature with lots of difficulties that make seamless operation very hard). As such, LabVIEW uses whatever MBCS (MultiByte Character Set) your language setting in the International Control Panel defines.

When LabVIEW invokes ActiveX methods in the Excel Automation Server, the strings are automatically translated from the Excel Unicode format to the LabVIEW MBCS format. But Unicode to MBCS can be lossy, since no MBCS coding scheme other than UTF-8 (which can also be considered MBCS) can represent every Unicode character. And unlike Linux, Windows doesn't allow UTF-8 to be set as the system MBCS encoding. So if your Excel string contains characters that cannot be translated to characters in the current MBCS of the system, you have a problem.

There is no simple solution to this problem, otherwise NI and many others would have implemented it long ago. Anything that can be thought up for this will have drawbacks elsewhere in some way.
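The lossy Unicode-to-MBCS step is easy to reproduce in isolation. This is a Python sketch rather than the actual ActiveX conversion: the helper name `to_mbcs` is hypothetical, `cp932` is assumed as the codepage of a Japanese Windows system, and Python's `errors="replace"` mode is used to imitate the substitution of a default character.

```python
def to_mbcs(text, codepage):
    """Encode Unicode text into a legacy codepage, substituting '?' where needed."""
    return text.encode(codepage, errors="replace")

japanese = "日本語"
korean = "안녕"

# Japanese survives the round trip through the Japanese codepage.
print(to_mbcs(japanese, "cp932").decode("cp932"))

# Korean does not: each unrepresentable character becomes '?'.
lossy = to_mbcs(korean, "cp932")
print(lossy)  # b'??'
print(lossy.decode("cp932"))
```

Once the characters have been replaced, nothing downstream can recover them, which matches the '?' symptom reported earlier in the thread.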