_Y_

Read national text from MS Word document

I need to extract text from an MS Word document (no formatting, just plain text). Unfortunately, the methods I have found return only a conventional LabVIEW string in which all national characters and symbols are lost: each such character is replaced with the code of a question mark.

Is there any way to read national text from MS Word? I would be happy to get it in any format, for example as a U16 array of Unicode code units, a U8 array with two values per character, or anything else. Either Word format would do: doc or docx.

Thank you

Edited by _Y_
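
Since either Word format would do, one option not raised in the thread is to bypass Word for docx files. A docx file is a ZIP archive whose main body lives in `word/document.xml`, with the visible text held in `w:t` elements. The sketch below is written in Python rather than LabVIEW, purely to show the idea. The function name `docx_plain_text` is hypothetical, and the code assumes a simple document: it ignores tabs, line breaks, headers, footnotes, and nested text boxes, and it does not handle the legacy binary doc format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The result is an ordinary Unicode string, so national characters survive as long as nothing downstream forces it through a narrow code page.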


They are not "lost". There is just no mapping in the current code page to render them.

While LabVIEW doesn't officially support Unicode, there are unofficial things you can do to display and manipulate Unicode strings.
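
To illustrate the code page point, here is a small sketch in Python (not LabVIEW) of what happens when Unicode text is forced through a code page that has no mapping for its characters. The sample word and the helper name `to_code_page` are assumptions for illustration only. Whether LabVIEW performs exactly this substitution internally is not confirmed in this thread.

```python
def to_code_page(text, code_page="cp1252"):
    """Convert text to bytes in the given code page.

    Characters with no mapping in that code page are replaced
    with a question mark (byte 0x3F).
    """
    return text.encode(code_page, errors="replace")


sample = "Привет"  # hypothetical Cyrillic sample

western = to_code_page(sample, "cp1252")   # Western European: no Cyrillic
cyrillic = to_code_page(sample, "cp1251")  # Cyrillic code page: full mapping
utf16 = sample.encode("utf-16-le")         # two bytes per character here
```

With the Western code page every character comes out as a question mark, while the Cyrillic code page and the UTF-16 bytes round-trip without loss.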

Edited by ShaunR


ShaunR, thank you for the answer. The information about Unicode in LabVIEW is really interesting and will be useful. However, I am still at square one. The article does not explain how to get a Unicode string from an MS Word document (or I did not find the answer in it).


Can you interact with Word (using ActiveX) and save the document as a text file? Then you can read the text file as raw bytes and interpret them as UTF-8. I have done something similar to let a GUI be translated on the fly into different languages, with the translations stored as UTF-8 text files.
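
The "read as raw bytes, interpret as UTF-8" step can be sketched as follows. This is Python rather than LabVIEW, and the function names are hypothetical. The second helper shows one way to get the U16 array asked for in the first post.

```python
from pathlib import Path


def read_utf8_text(path):
    """Read a file as raw bytes and decode them as UTF-8."""
    raw = Path(path).read_bytes()
    # "utf-8-sig" also accepts files that start with a byte order mark.
    return raw.decode("utf-8-sig")


def utf16_code_units(text):
    """Return the text as a list of 16-bit code units (a 'U16 array')."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]
```

The key point is that the file is never read through the system code page: the bytes are kept as they are until they are explicitly decoded.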

Edited by Neil Pate

11 hours ago, Neil Pate said:

Can you interact with Word (using ActiveX) and save as a text file?

Is it possible to send such a command from LabVIEW to MS Word? Is there a description of how to do it?

Actually, it could be a good solution.


Oh. I thought you had already obtained the text, since you stated it looks like a series of question marks (so it just needed converting).

LabVIEW ships with some automation examples. The one below (from the examples) interacts with Excel, but the principle is the same. I couldn't find any examples for MS Word without the Report Toolkit, because most interaction with MS products generally goes the other way: writing reports.

I don't have M$ products installed to knock up a quick example, unfortunately.

Untitled.png
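
For reference, the "save as a text file" automation calls could look roughly like the sketch below. It is written in Python with the pywin32 package rather than as a LabVIEW diagram, and it is untested here: it assumes Windows, an installed copy of Word, and pywin32. `Documents.Open`, `SaveAs2`, `Close`, and `Quit` are members of Word's automation object model. The function names are hypothetical. The sketch uses Word's Unicode text format (value 7), which as far as I know writes UTF-16 rather than UTF-8, because it needs no extra encoding argument. In LabVIEW the same calls would presumably be made with the ActiveX automation nodes, as in the Excel example.

```python
import os

# Word enumeration value WdSaveFormat.wdFormatUnicodeText.
WD_FORMAT_UNICODE_TEXT = 7


def export_word_text(doc_path, txt_path):
    """Ask Word, via COM automation, to save doc_path as Unicode text."""
    # Imported here so the module still loads where pywin32 is missing.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working
        # directory, so pass absolute paths.
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            doc.SaveAs2(os.path.abspath(txt_path), WD_FORMAT_UNICODE_TEXT)
        finally:
            doc.Close(False)  # False: do not save changes
    finally:
        word.Quit()


def decode_exported_text(raw):
    """Decode the exported bytes, assuming UTF-16 with a byte order mark."""
    return raw.decode("utf-16")
```

After the export, the text file can be read as raw bytes and decoded explicitly, so the national characters never pass through the system code page.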


Thank you, ShaunR, for your attempts. This is my test code. I can read the file, but the output string contains no hidden data, only U8 characters. So I get a question mark instead of each national character.

word_reader_160913.png

txt_text.doc

word_reader_160913.vi

Edited by _Y_

