
Reading in large data files



I need to read in files over 1 GB and the Express read VI doesn't cut it anymore. The algorithm I used in MATLAB to read files like this worked pretty well: read in 10000 pieces of data, process them, then spit them out. I have the OpenG library and I tried some of the read-data-file VIs in there, but they all returned garbage: where there was supposed to be a zero, the VI would return 2.65487E-28 or something like that... The data is not complicated: the first column is relative time and the other columns are channels of data.

Any suggestions? Thanks...


You've given us too little information to solve the problem. Are you trying to read a text file? Is it tab-delimited? What datatypes are stored in the file? Do you have code you are using to read the file, and could you share it with us? Does the problem occur only with files over 1 GB, or also with small files? Can you provide a small sample file that reproduces the problem?


Sorry... I can't post a copy of my VI because my work has some sort of firewall up...

It's a tab-delimited file with voltage readings for each channel against the time column. It's just a regular ASCII file, so I can open it in Notepad and read through the data. The problem only occurs with the larger files, because the Read From Measurement File Express VI can only read in relatively small files. The file looks exactly like this: it starts with the names of the channels, then moves on to the data:

FAIRING_NOSE

FAIR_FWD_BOLT_1ST

FAIR_FWD_BOLT_2ND

S2_SEPARATION

S3_TVC_BATTERY

FAIR_BASE_JOINT

PAYLOAD_SEP_1ST

PAYLOAD_SEP_2ND

0.000000 0.004883 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

0.000200 0.000000 0.000000 0.000000 0.000000 -0.004883 0.000000 0.000000 0.004883

0.000400 -0.004883 -0.004883 -0.004883 -0.004883 0.000000 0.000000 0.000000 0.000000

0.000600 0.000000 0.000000 -0.004883 -0.004883 -0.004883 -0.004883 0.000000 0.000000

0.000800 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

0.001000 0.000000 -0.004883 0.000000 0.004883 0.000000 0.000000 0.000000 0.000000

The first column is time in seconds. It goes on for about an hour, so the file ends up being pretty big.


QUOTE(Yuri33 @ Aug 10 2007, 10:29 AM)

Is there any reason you can't implement the same solution in LV that you implemented in Matlab (read in / process / spit out 10000 samples at a time)? LV provides open/read/seek/close functions equivalent to Matlab's.

Depending on which version of LV you have, this may or may not work.

LV 8.0 and up handles 64-bit file pointers (I believe). Prior to that, the file offset was limited to 32 bits. But even then, the file I/O primitives tracked the file size internally using 64 bits, so as long as you just let the file I/O functions track the file pointer, it would still work.

HDF5 will also handle files that large.

Under Windows XP and prior, you could seldom read in ALL of the file at once, because Windows could only provide about 1.2 GB of memory.

Ben


I was using textscan in Matlab.

Yeah, LabVIEW throws an exception before it even begins reading in the file. Can anyone post a screenshot of a VI that would read in that kind of file using the functions in LabVIEW 8.0.1?

This is what I have in Matlab:

% Setup (example values; adjust to match your file -- these were defined
% elsewhere in the original script)
fid = fopen('data.txt', 'rt');                 % the big tab-delimited file
sizeChannels = 8;                              % data channels (columns after time)
formatStr = repmat('%f', 1, sizeChannels+1);   % one %f per column, time included
startPoint = 8;                                % header lines (channel names) to skip
outofbounds = ones(1, sizeChannels);           % per-channel bound
count = ones(1, sizeChannels);                 % next free slot per channel
out_of_bounds_matrix = cell(1, sizeChannels);
time_counter = 0;

fprintf('Working');
% first read: skip the header lines, then take 10000 rows
totalData = textscan(fid, formatStr, 10000, 'delimiter', '\t', 'headerlines', startPoint);

while ~isempty(totalData{1})                   % continue until no more data is read
    for C = 1:sizeChannels+1                   % across each column
        for L = 1:length(totalData{C})         % down each row
            if C == 1
                time_counter = time_counter + 1;   % adds up to find total time of test
            elseif totalData{C}(L) > outofbounds(C-1) || totalData{C}(L) < -outofbounds(C-1)
                % adds time and value to the out-of-bounds array
                out_of_bounds_matrix{C-1}(1, count(C-1)) = totalData{1}(L);
                out_of_bounds_matrix{C-1}(2, count(C-1)) = totalData{C}(L);
                count(C-1) = count(C-1) + 1;
            end
        end
    end
    fprintf('.');
    % reads in the next 10000 rows
    totalData = textscan(fid, formatStr, 10000, 'delimiter', '\t');
end
fclose(fid);


Here's some example code:

http://forums.lavag.org/index.php?act=attach&type=post&id=6603

But it's not quite optimal.

Be aware that you have a serious load of data:

5 kHz * 3600 s * 9 columns * 8 bytes = 1.3 GB. This will most likely not fit in your memory, since LabVIEW needs the array in one contiguous block of data.

So I'd try to create a list of pointers with the byte position of every second; that index is only 8 bytes * 3600 = 28.8 kB.
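In Matlab terms, since a block diagram can't be typed out, the idea looks roughly like this (a sketch only; the file name and the 5000-lines-per-second rate are assumptions taken from the math above):

% Build a byte-offset index: one pointer per second of data.
% (The 8 header lines would need skipping first; omitted for brevity.)
fid = fopen('data.txt', 'rt');          % assumed file name
linesPerSecond = 5000;                  % assumed sample rate, per the math above
nSeconds = 3600;
offsets = zeros(nSeconds, 1);
for s = 1:nSeconds
    offsets(s) = ftell(fid);            % byte position of this second's first line
    for k = 1:linesPerSecond
        fgetl(fid);                     % skip one line of text
    end
end
% Later, jump straight to any second and read just that slice:
fseek(fid, offsets(42), 'bof');
oneSecond = textscan(fid, repmat('%f', 1, 9), linesPerSecond, 'delimiter', '\t');
fclose(fid);

With the index in hand you only ever hold one second of data (about 45000 values) in memory at a time.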

Ton

PS: Update your profile; it shows you're using 8.2.1, but you stated 8.0.1 earlier.


I have a large (multi-gigabyte), fixed record length binary file that I extract values from in a post-processing step (no data acq running). I used queues and parallel loops to improve my data load time from ~5 minutes to ~30 seconds!

Here is a picture of my VI. My records are binary, but you could use "read lines" and a for loop to enqueue the strings in the lower loop. Part of my conversion to actual values is done in a sub-vi that I use in other places; you could use Ton's example to extract the values from the string (spreadsheet string to array).

To improve your load time: use parallel loops to retrieve the data and convert it, predefine your array sizes, and use replace element instead of build array or autoindexing.
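The preallocation advice translates directly to the Matlab code earlier in the thread (a sketch; nRows and nextRow are hypothetical stand-ins, not anything from the posts above):

nRows = 5000 * 3600;                    % total samples expected (assumed)
data = zeros(nRows, 9);                 % predefine the array size, once
for i = 1:nRows
    data(i, :) = nextRow();             % replace element: overwrite in place
end
% The slow pattern reallocates and copies the whole array on every append:
% data = [data; nextRow()];             % build array: avoid in a tight loop

The queue half of the trick is LabVIEW's producer/consumer pattern: one loop reads raw records from disk and enqueues them, while a second loop dequeues and converts them, so the disk never waits on the conversion.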

