Jump to content

data extraction from sheets


Recommended Posts

Hey

finally back to some LV programming, this time with an interresting challenge.

i have several large datasets to extract. the number of datasets is such, that i need a semi automated solution.

each dataset includes few thousand files.

some dataset are text files, and some are excel workbooks, including several sheets. this is the first time i deal with excel data extraction, therefore would be happy for some instructions.

the idea is this: look-up in the file a value of interest, and define its "position". Then, for all the files in this dataset the program will automatically extract this value. the way the prog should determine if a value is in the right field, is by its description with preceding string.

Everytime there is no match, i should be able to define a new look-up position.

At the end, i expect the program to pick up a file, and look the value related to several possible formats.

Any idea to give me a lead how to takle this?

edit: i will cross post it on the dark side as well

Link to comment

I can't say I fully understand what you want (an example of a file with what to search for would be nice), but you can easily search a single column by indexing it and using Search 1D Array.

If you want to search things by pattern, you might wish to hold the array as a delimited string, search the string itself and then find the number of delimiters before the match to know which cell you're in.

Talking to Excel is reasonably easy using ActiveX and there are some examples in the Example finder. Just note that the ActiveX interface change between Office versions, so the computer you develop on needs to have the same version of Office as the computer the program is being run on.

Alternatively, you can connect to Excel using ODBC and use SQL to extract data from it. That could handle the searching as well. I don't remember the details, but searching for Excel and ODBC or ADO might even find you some example VIs.

Link to comment

QUOTE (Yen @ Jun 5 2008, 07:25 PM)

If you want to search things by pattern, you might wish to hold the array as a delimited string, search the string itself and then find the number of delimiters before the match to know which cell you're in.

Hello Yen!

thats what i want. the problem is, i do not know in advance what is going to be the string before the value, and how many values wil there be. the best would be that a sheet open, and i could with the mouse show the field for the string and the field for the value(s).

since i have few hundred such formats, then upon opening a file, the software should compare its possible strings/fields with the one under examination. in case the is a fit, then extraction is automatic. but if there is no fit, then once again the datasheet should open itself, and allow me to choose the relevent field. a mouse operation would be fantastic here, and i was hoping somebody already has something done on similar topic.

to note that the datasheet can be a simple text file with no tab delimiting the string and the value (just a space or ':').

Link to comment

QUOTE (Gabi1 @ Jun 5 2008, 07:51 PM)

Hello Yen!

thats what i want. the problem is, i do not know in advance what is going to be the string before the value, and how many values wil there be. the best would be that a sheet open, and i could with the mouse show the field for the string and the field for the value(s).

since i have few hundred such formats, then upon opening a file, the software should compare its possible strings/fields with the one under examination. in case the is a fit, then extraction is automatic. but if there is no fit, then once again the datasheet should open itself, and allow me to choose the relevent field. a mouse operation would be fantastic here, and i was hoping somebody already has something done on similar topic.

to note that the datasheet can be a simple text file with no tab delimiting the string and the value (just a space or ':').

You could do this with Excel and the LabVIEW Report Generation Toolkit.

Open up the datasheet, prompt the user to select the data of interest, read (programmatically via .net) the current selection and you have your first match.

Ton

Link to comment

QUOTE (tcplomp @ Jun 5 2008, 07:59 PM)

You could do this with Excel and the LabVIEW Report Generation Toolkit.

Open up the datasheet, prompt the user to select the data of interest, read (programmatically via .net) the current selection and you have your first match.

Ton

you mean create a macro in excel?

i have tried that, with minimum success. the problem stays the same: it will open a file, discover it doesnt go with its known format, and trow an error. make more elaborate VB to handle all that is out of my league and mainly of my available time.

i am not sure how the report generation toolkit would help me there. coul you expnad?

Link to comment

I still don't understand exactly what it is you want to do (and probably won't without some examples), but in any case, if you cross-post, you should include the link in both posts, so that people can see what replies were given in the other thread.

Link to comment

QUOTE (Yen @ Jun 6 2008, 11:30 AM)

I still don't understand exactly what it is you want to do (and probably won't without some examples), but in any case, if you cross-post, you should include the link in both posts, so that people can see what replies were given in the other thread.

no response whatsoever on the other side, so...

let me depict to you:

i have 100 files, identical except the value:

"

NF=12

SXC 2 dB

PMD= 0.5

...

...

"

i want to extract the NF and PMD values. in this file format NF appear in the first line after the '=', PMD appear in the third line after the ' '. would be easy if i would be able to open the txt file, select with the mouse the string, and then the value, and store the field positions in LV.

Then i have another 100 files, but this time it look like this:

"

vendor...xxx

AP 0

NFa=12

SXC 2 dB

PMD= 0.5

...

...

"

this time the value of interest comes after the string NFa (rather than NF), and comes in the third line. no way to extract it without retelling the computer to refine his search. once i give it to the application, i would like him to keep this, so in the next file it opens it could try with the first criteria, and as the string comparison fails on the position, try with the second defined criteria.

some files come already in excel format, therefore it is easier to find fields. ultimately, after several such manual entries of string and corresponding field to extract the value, the application would be able to handle automatically any file format to extract.

i hope it is more clear, and thanks for your input.

Link to comment

I'm still not clear on the details, but maybe this will help:

Instead of attempting to train the software, why not just write code for each case and then run each one until you don't get an error?

Alternatively, you can try writing identification VIs (which will only attempt to identify the format and then return the format type. You can have one VI for each format and build them using a standard connector) and then run those until you have an answer.

If you're asking about actully parsing the file, it seems to me that the Match Pattern primitive might help you, but it's hard to tell exactly.

Link to comment

QUOTE (Yen @ Jun 7 2008, 10:27 PM)

I'm still not clear on the details, but maybe this will help:

Instead of attempting to train the software, why not just write code for each case and then run each one until you don't get an error?

Alternatively, you can try writing identification VIs (which will only attempt to identify the format and then return the format type. You can have one VI for each format and build them using a standard connector) and then run those until you have an answer.

If you're asking about actully parsing the file, it seems to me that the Match Pattern primitive might help you, but it's hard to tell exactly.

Thanks yen.

i have 2 problems related: i do not know in advance how will the file look like. therefore, i would like a run-time solution, where the file to be read would pop up (or container), and allow me to click on the fields of interest. then if next file is identical, such identification vi would look versus its own built library of "clicked fields", and would not send the file to the container UI for definition of a new format type.

second problem, more basic: i dont see a solution to open the file in a container, and allow the mouse to click and get its response and field. i am sure this has been done before, so i am looking for some existing examples.

i am not yet programing this thing, as the file extraction is the critical element wich fails because of the tremendous amounts of different file formats. manual extraction on the other hand is a brain suicide... :(

Link to comment

If the basic format of the file is the same, then all you need to do is read the field names and put them into a ring and let the user select which fields they want. Alternatively, you can have them mark the text in a string control and use the selection properties to determine which field it is. As I said, I personally don't really understand the structure of the files, so I can't help that much.

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.