_Y_ Posted September 10, 2016 (edited)
I need to "extract" text from an MS Word document (no formatting, just plain text). Unfortunately, the methods I have found return only a conventional LabVIEW string in which all national characters and symbols are lost: each such character is replaced with the code of a question mark. Is there any way to read national text from MS Word? I would be happy to get it in any format, for example as a U16 array of Unicode symbols, a U8 array with two values per symbol, or anything else. I would also be happy with either Word format, doc or docx. Thank you.
ShaunR Posted September 10, 2016 (edited)
They are not "lost". There is just no mapping in the current code page to render them. While LabVIEW doesn't officially support Unicode, there are unofficial things you can do to display and manipulate Unicode strings.
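The code page point can be illustrated outside LabVIEW. The following is a minimal Python sketch, not LabVIEW code; the sample word and the choice of the Windows-1252 code page are assumptions made purely for illustration. It shows how converting text to a code page without a mapping for its characters produces question marks, and what the same text looks like as a U16 array of UTF-16 code units.

```python
import struct

# Sample non-Latin text (an illustrative assumption, not from the thread).
text = "Привет"

# Converting to a code page that has no Cyrillic mapping replaces every
# unmappable character with "?". The original characters cannot be
# recovered from this result.
as_cp1252 = text.encode("cp1252", errors="replace")

# The same text as UTF-16 little-endian: two bytes per character here,
# which corresponds to "U8 array with two values per symbol".
as_utf16_bytes = text.encode("utf-16-le")

# The same bytes viewed as an array of unsigned 16-bit code units,
# which corresponds to "U16 array of Unicode symbols".
u16_units = list(struct.unpack(f"<{len(as_utf16_bytes) // 2}H", as_utf16_bytes))

print(as_cp1252)
print(list(as_utf16_bytes))
print([hex(u) for u in u16_units])
```

The practical consequence is that the conversion to the code page has to be avoided altogether: once the string contains question marks, the Unicode information is no longer there to convert back.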
_Y_ (Author) Posted September 10, 2016
ShaunR, thank you for the answer. The information about Unicode in LabVIEW is really interesting and will be useful. However, I am still at square one. The article does not explain how to get a Unicode string from an MS Word document (or I did not find the answer in the article).
Neil Pate Posted September 10, 2016 (edited)
Can you interact with Word (using ActiveX) and save the document as a text file? Then you can read the text file as pure bytes and interpret them as UTF-8. I have done something similar, whereby I allow a GUI to be translated "on-the-fly" into different languages that are stored as UTF-8 text files.
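The "read as pure bytes and interpret as UTF-8" step might look like the following minimal Python sketch. It assumes the text file really was saved as UTF-8, which depends on the encoding chosen when saving from Word; the helper name `read_utf8_text` and the file name `exported.txt` are hypothetical. A temporary file stands in for the file exported from Word so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def read_utf8_text(path):
    """Read a file as raw bytes and decode the bytes as UTF-8.

    The "utf-8-sig" codec also drops a leading byte order mark (BOM)
    if one is present, and behaves like plain UTF-8 otherwise.
    """
    raw = Path(path).read_bytes()
    return raw.decode("utf-8-sig")


# Stand-in for the text file saved from Word (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "exported.txt"
    sample.write_bytes("Привет".encode("utf-8-sig"))
    decoded = read_utf8_text(sample)

print(decoded)
```

The key point is that the bytes are never passed through a code page conversion: they are read unchanged and only then interpreted as UTF-8.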
_Y_ (Author) Posted September 11, 2016
11 hours ago, Neil Pate said: "Can you interact with Word (using ActiveX) and save as a text file?"
Is it possible to send such a command from LabVIEW to MS Word? Is there any description of how to do it? It could actually be a good solution.
ShaunR Posted September 11, 2016
Oh. I thought you had already obtained the text, since you stated it looks like a series of question marks (so it just needed to be converted). LabVIEW is shipped with some automation examples. The one below (from the examples) interacts with Excel, but the principle is the same. I couldn't find any examples for MS Word without the Report Toolkit, because most interaction with MS products is generally the other way around: writing reports. I don't have M$ products installed to knock up a quick example, unfortunately.
_Y_ (Author) Posted September 13, 2016 (edited)
Thank you, ShaunR, for your attempts. This is my test code. I can read the file, but the output string contains no hidden data, only U8 symbols. So I get a question mark instead of each national character.
Attachments: txt_text.doc, word_reader_160913.vi
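Since the original question says that docx would also be acceptable, one alternative that nobody in the thread proposed is to bypass Word entirely: a docx file is a ZIP archive whose main text is stored as UTF-8 XML in `word/document.xml`. The following is a rough Python sketch under that assumption, not a LabVIEW solution; the function name `docx_plain_text` is hypothetical, and the sketch ignores headers, footnotes, tabs, and line breaks inside paragraphs. It does not apply to the older binary doc format.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_plain_text(docx_bytes):
    """Return the paragraph text of a docx file given its raw bytes.

    Each <w:p> element is treated as one paragraph, and the text of all
    <w:t> runs inside it is concatenated. Formatting is discarded.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

For a file on disk, the bytes could be obtained with `open(path, "rb").read()` and passed to the function. The result is an ordinary Unicode string, so no code page is involved at any point.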