
Read large data from text file and plot


hhtnwpu

Recommended Posts

Hi, I want to read and process a large data set (nearly 100 million rows; the file is over 500 MB). The original format of the file is .txt; by renaming it to .dat we get a binary file. The attachment is my VI. When the file is larger than 1 million lines (the data is a single column), I get a "memory is full" error. I want to read the data and plot a graph in the time domain so that I can see the detail on the graph with the zoom tools, and then do some analysis such as FFT and statistics. I don't know how to do decimation in chunks. Another thing: maybe releasing memory is also important. Can you help me? Thanks!

The data comes from a dynamic strain test; the sampling rate is 10 kS/s. Could we show the whole result using a little decimated data from each chunk, and then, when we zoom in on some detail (such as within one chunk), show all of the data on the graph without decimation? Thanks!

Attachment: read data.vi

Edited by hhtnwpu
Link to comment

Try using the count terminal to set the size of the chunks you read in every iteration, and use shift registers for the file refnum. That lets you build your data array stepwise.

Renaming the file extension from .txt to .dat does not give you a binary file; it is still a text file! Use the VIs for reading text files. You can read text files line by line if you like.

Edited by Anders Björk
Link to comment
  On 12/22/2012 at 5:07 PM, hhtnwpu said:

Hi, I want to read and process a large data set (nearly 100 million rows; the file is over 500 MB). The original format of the file is .txt; by renaming it to .dat we get a binary file. The attachment is my VI. When the file is larger than 1 million lines (the data is a single column), I get a "memory is full" error. I want to read the data and plot a graph in the time domain so that I can see the detail on the graph with the zoom tools, and then do some analysis such as FFT and statistics. I don't know how to do decimation in chunks. Another thing: maybe releasing memory is also important. Can you help me? Thanks!

The data comes from a dynamic strain test; the sampling rate is 10 kS/s. Could we show the whole result using a little decimated data from each chunk, and then, when we zoom in on some detail (such as within one chunk), show all of the data on the graph without decimation? Thanks!

The easiest (and most memory-efficient) solution is to pre-process the file, put the data into a database, and then use queries to decimate.

Take a look at the "SQLite_Data Logging Example.vi" included with the SQLite API for LabVIEW. It does exactly what you describe, but with real-time acquisition.
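Not LabVIEW, but as a rough Python/sqlite3 sketch of the query-based decimation idea (the samples table, column names, and chunk sizes here are assumptions for illustration, not the API's actual schema):

import sqlite3

# Hypothetical schema: one row per sample; table/column names are made up for illustration.
con = sqlite3.connect("strain.db")
con.execute("CREATE TABLE IF NOT EXISTS samples (idx INTEGER PRIMARY KEY, value REAL)")

# Overview plot: keep every Nth sample across the whole record.
N = 10_000
overview = con.execute("SELECT idx, value FROM samples WHERE idx % ? = 0", (N,)).fetchall()

# Zoomed view: pull every sample, undecimated, for just the index range of interest.
detail = con.execute("SELECT idx, value FROM samples WHERE idx BETWEEN ? AND ?",
                     (1_000_000, 1_010_000)).fetchall()

The point is that only the rows returned by the query are ever in memory; the 100M-point record stays on disk.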

Edited by ShaunR
Link to comment

The advice for acquiring the data sounds good. Pulling the data in as chunks for parsing and placing it into preallocated arrays will keep things memory efficient.

The problem is that 100 million points will always cause you issues if you hold them in memory at once. You will also find that writing the data to a graph requires a separate copy of it, which causes problems again.

I think you are going to have to buffer to disk to achieve this. You could do it with a database, but since you have a simple array I would be just as tempted to put it in a binary file. You can then access specific ranges of elements from the binary file very efficiently (you cannot do this easily with a text file). For the graph, you will have to work out the best way to deal with this: probably decimate the data going into it, and then let the user load more detail for the specific area of interest, to minimise the data in memory at any given time.
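To illustrate the random-access idea (a Python stand-in rather than LabVIEW, and assuming a flat file of little-endian 4-byte floats; the file layout and names are assumptions): with fixed-size records, reading a range is just a seek to sample_index * bytes_per_sample, which a text file cannot offer because its lines vary in length.

import struct

SAMPLE_BYTES = 4  # assuming samples are stored as 4-byte (SGL-like) little-endian floats

def read_samples(path, start, count):
    # Jump straight to the region of interest; only `count` samples are ever in memory.
    with open(path, "rb") as f:
        f.seek(start * SAMPLE_BYTES)
        raw = f.read(count * SAMPLE_BYTES)
    return struct.unpack("<%df" % (len(raw) // SAMPLE_BYTES), raw)

# Naive decimated overview: one sample every `step` points, without loading the whole file.
def overview(path, total_samples, step):
    return [read_samples(path, i, 1)[0] for i in range(0, total_samples, step)]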

Link to comment
  On 12/22/2012 at 9:01 PM, JamesMc86 said:

The advice for acquiring the data sounds good. Pulling the data in as chunks for parsing and placing it into preallocated arrays will keep things memory efficient.

The problem is that 100 million points will always cause you issues if you hold them in memory at once. You will also find that writing the data to a graph requires a separate copy of it, which causes problems again.

I think you are going to have to buffer to disk to achieve this. You could do it with a database, but since you have a simple array I would be just as tempted to put it in a binary file. You can then access specific ranges of elements from the binary file very efficiently (you cannot do this easily with a text file). For the graph, you will have to work out the best way to deal with this: probably decimate the data going into it, and then let the user load more detail for the specific area of interest, to minimise the data in memory at any given time.

You'll end up writing a shed-load of code that realises your own bespoke pseudo-database/file format that's not quite as good, and fighting memory constraints everywhere.

Much easier just to do this:

Edited by ShaunR
  • Like 1
Link to comment
  On 12/22/2012 at 11:45 PM, ShaunR said:
You'll end up writing a shed-load of code that realises your own bespoke pseudo-database/file format that's not quite as good, and fighting memory constraints everywhere. Much easier just to do this:

Thanks for your advice, but SQLite can't run on versions below LabVIEW 2009. Besides, what about HDF5? Is SQLite easier than HDF5?

  On 12/22/2012 at 9:01 PM, JamesMc86 said:
The advice for acquiring the data sounds good. Pulling the data in as chunks for parsing and placing it into preallocated arrays will keep things memory efficient. The problem is that 100 million points will always cause you issues if you hold them in memory at once. You will also find that writing the data to a graph requires a separate copy of it, which causes problems again. I think you are going to have to buffer to disk to achieve this. You could do it with a database, but since you have a simple array I would be just as tempted to put it in a binary file. You can then access specific ranges of elements from the binary file very efficiently (you cannot do this easily with a text file). For the graph, you will have to work out the best way to deal with this: probably decimate the data going into it, and then let the user load more detail for the specific area of interest, to minimise the data in memory at any given time.

Thank you for your suggestion. How can I convert the text file to a binary file? I just know the data's original format is hex with multiple columns.

  On 12/22/2012 at 8:17 PM, ShaunR said:
The easiest (and most memory-efficient) solution is to pre-process the file, put the data into a database, and then use queries to decimate. Take a look at the "SQLite_Data Logging Example.vi" included with the SQLite API for LabVIEW. It does exactly what you describe, but with real-time acquisition.

Thanks. Will using SQLite affect building a .exe installer?

Link to comment
  On 12/22/2012 at 11:45 PM, ShaunR said:

You'll end up writing a shed-load of code that realises your own bespoke pseudo-database/file format that's not quite as good, and fighting memory constraints everywhere.

If you had complex record types I would agree, but this is just straight numeric data. A binary file is not that hard to work with and gives high-performance random access with a smaller footprint than a database, because it doesn't have all the extra functionality we are not using, and it returns the data directly in the correct type with no conversion necessary, which is going to hit you on large data sets (and stress your memory more!). TDMS may be an even better option for its easier API, and should give performance similar to the binary file.

post-18067-0-96200400-1356255497.png

I believe TDMS and HDF5 should give similar performance, as they are both binary formats, but I have not worked with HDF5 directly myself.

For the conversion, you are probably going to have to load the existing file in pieces and write them back out in whatever other format you go with. The hard part is knowing where the chunks are, since (depending on your format) each entry could potentially be a different size. There is a read-multiple-rows option on the built-in Read from Text File, which is probably the best way to break it down (right-click >> Read Lines on Read from Text File).
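As a sketch of that chunked conversion (a Python stand-in for the LabVIEW file VIs; it assumes one decimal value per line, so a multi-column or hex-formatted file would need its own parsing step):

import struct

CHUNK_LINES = 100_000  # only this many values are held in memory at any time

def text_to_binary(txt_path, bin_path):
    # Stream the text file line by line and append fixed-size little-endian floats.
    with open(txt_path, "r") as src, open(bin_path, "wb") as dst:
        chunk = []
        for line in src:
            if line.strip():
                chunk.append(float(line))
            if len(chunk) == CHUNK_LINES:
                dst.write(struct.pack("<%df" % len(chunk), *chunk))
                chunk = []
        if chunk:  # flush the final partial chunk
            dst.write(struct.pack("<%df" % len(chunk), *chunk))

The memory footprint stays at one chunk regardless of the total file size, which is the whole point of converting in pieces.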

Link to comment
  On 12/23/2012 at 7:04 AM, hhtnwpu said:

Thanks for your advice, but SQLite can't run on versions below LabVIEW 2009. Besides,

Some people have successfully back-saved it to earlier versions of LabVIEW. There are certain features of the API that use methods that weren't available in older versions of LabVIEW but, if I remember correctly, there are only 2 or 3 of them (mainly using recursion).

  On 12/23/2012 at 7:04 AM, hhtnwpu said:

What about HDF5? Is SQLite easier than HDF5?

HDF5 is a file format. SQLite is a database. Whilst SQLite has its own file format, it also has a lot of code to search, index, and relate the data in the file. You would have to write all of that yourself to manipulate the data contained in an HDF5 file.

  On 12/23/2012 at 7:04 AM, hhtnwpu said:

Thanks. Will using SQLite affect building a .exe installer?

Not sure what you are asking here. Can you make an exe? Yes. Do you need to add things to an installer? Yes - the SQLite binary.

  On 12/23/2012 at 9:44 AM, JamesMc86 said:

If you had complex record types I would agree, but this is just straight numeric data. A binary file is not that hard to work with and gives high-performance random access with a smaller footprint than a database, because it doesn't have all the extra functionality we are not using

Yup. Looks really easy :) Now decimate and zoom in a couple of times with the x data in time ;) (let's compare apples with apples rather than with pips)

What I was getting at is that you end up writing search algorithms, buffers, and look-up tables so that you can manipulate the data (not to mention all the debugging). Then you find it's really slow (if you don't run out of memory), so you start caching and optimising. Databases already have these features (they are not just a file structure), are quick, and make it really easy to manipulate the data with efficient memory usage. Want the max/min value between two arbitrary points? It's just a query string away, rather than another module that chews up another shed-load of memory.
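For example (a Python/sqlite3 stand-in, reusing the hypothetical samples(idx, value) table from the earlier sketch), the min/max between two arbitrary points really is one query:

import sqlite3

con = sqlite3.connect("strain.db")
lo, hi = con.execute(
    "SELECT MIN(value), MAX(value) FROM samples WHERE idx BETWEEN ? AND ?",
    (500_000, 600_000),
).fetchone()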

Having said that, they are not a "magic bullet", but they are a great place to start for extremely large data sets rather than re-inventing the wheel, especially when there is an off-the-shelf solution already.

(TDMS is a kind of database by the way and beats anything for streaming data. It gets a bit tiresome for manipulating data though)

Link to comment
  On 12/23/2012 at 11:37 AM, ShaunR said:

Some people have successfully back-saved it to earlier versions of LabVIEW. There are certain features of the API that use methods that weren't available in older versions of LabVIEW but, if I remember correctly, there are only 2 or 3 of them (mainly using recursion).

HDF5 is a file format. SQLite is a database. Whilst SQLite has its own file format, it also has a lot of code to search, index, and relate the data in the file. You would have to write all of that yourself to manipulate the data contained in an HDF5 file.

Not sure what you are asking here. Can you make an exe? Yes. Do you need to add things to an installer? Yes - the SQLite binary.

Yup. Looks really easy :) Now decimate and zoom in a couple of times with the x data in time ;) (let's compare apples with apples rather than with pips)

What I was getting at is that you end up writing search algorithms, buffers, and look-up tables so that you can manipulate the data (not to mention all the debugging). Then you find it's really slow (if you don't run out of memory), so you start caching and optimising. Databases already have these features (they are not just a file structure), are quick, and make it really easy to manipulate the data with efficient memory usage. Want the max/min value between two arbitrary points? It's just a query string away, rather than another module that chews up another shed-load of memory.

Having said that, they are not a "magic bullet", but they are a great place to start for extremely large data sets rather than re-inventing the wheel, especially when there is an off-the-shelf solution already.

(TDMS is a kind of database by the way and beats anything for streaming data. It gets a bit tiresome for manipulating data though)

Thanks a lot. I will give SQLite a try. I think my data file is in text format. Using SQLite, when I plot a graph of the whole time record, will there still be a problem with "out of memory" or "the memory is full"?

  On 12/23/2012 at 9:44 AM, JamesMc86 said:

If you had complex record types I would agree, but this is just straight numeric data. A binary file is not that hard to work with and gives high-performance random access with a smaller footprint than a database, because it doesn't have all the extra functionality we are not using, and it returns the data directly in the correct type with no conversion necessary, which is going to hit you on large data sets (and stress your memory more!). TDMS may be an even better option for its easier API, and should give performance similar to the binary file.

post-18067-0-96200400-1356255497.png

I believe TDMS and HDF5 should give similar performance, as they are both binary formats, but I have not worked with HDF5 directly myself.

For the conversion, you are probably going to have to load the existing file in pieces and write them back out in whatever other format you go with. The hard part is knowing where the chunks are, since (depending on your format) each entry could potentially be a different size. There is a read-multiple-rows option on the built-in Read from Text File, which is probably the best way to break it down (right-click >> Read Lines on Read from Text File).

Thank you very much. Following your advice, can I really avoid using a database? What about very large data? As far as I know, when the file size is more than 50 MB there is a "memory is full" problem.

Link to comment

To decimate, loop over single values with an incremental step. Or, for a proper display, you still need to load a whole chunk and use an sk filter or similar to display it correctly. If you just want the max or min in a section, that's where SQLite works nicely, but there is a single function to get the min/max of an array anyway. I've written an example of the sort of thing you need to do (but not from a file) at https://decibel.ni.com/content/docs/DOC-24017
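As a rough sketch of the chunk-wise min/max style of decimation (plain Python, not the linked LabVIEW example; the names and chunk size are assumptions): keeping the extremes of each chunk means narrow spikes still show up in the overview, which a simple every-Nth-point decimation can miss.

def minmax_decimate(samples, target_points):
    # Each pair of output points is the min and max of one chunk of the input.
    chunk = max(1, len(samples) // target_points)
    out = []
    for i in range(0, len(samples), chunk):
        block = samples[i:i + chunk]
        out.append(min(block))
        out.append(max(block))
    return out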

The advantage of any of these methods is that you don't have to load the whole file, which removes the memory issue; you just load the section you need. That said, all of these methods depend on you not loading the whole data set at once: the fundamental issue is having the whole data set in memory at the same time.


Link to comment
  On 12/23/2012 at 3:26 PM, JamesMc86 said:

To decimate, loop over single values with an incremental step. Or, for a proper display, you still need to load a whole chunk and use an sk filter or similar to display it correctly. If you just want the max or min in a section, that's where SQLite works nicely, but there is a single function to get the min/max of an array anyway. I've written an example of the sort of thing you need to do (but not from a file) at https://decibel.ni.c.../docs/DOC-24017

The advantage of any of these methods is that you don't have to load the whole file, which removes the memory issue; you just load the section you need. That said, all of these methods depend on you not loading the whole data set at once: the fundamental issue is having the whole data set in memory at the same time.

Thanks a lot.

Link to comment
  On 12/23/2012 at 9:44 AM, JamesMc86 said:

If you had complex record types I would agree, but this is just straight numeric data. A binary file is not that hard to work with and gives high-performance random access with a smaller footprint than a database, because it doesn't have all the extra functionality we are not using, and it returns the data directly in the correct type with no conversion necessary, which is going to hit you on large data sets (and stress your memory more!). TDMS may be an even better option for its easier API, and should give performance similar to the binary file.

post-18067-0-96200400-1356255497.png

I believe TDMS and HDF5 should give similar performance, as they are both binary formats, but I have not worked with HDF5 directly myself.

For the conversion, you are probably going to have to load the existing file in pieces and write them back out in whatever other format you go with. The hard part is knowing where the chunks are, since (depending on your format) each entry could potentially be a different size. There is a read-multiple-rows option on the built-in Read from Text File, which is probably the best way to break it down (right-click >> Read Lines on Read from Text File).

Thank you for your suggestion. In your opinion, how can I plot all the data if the amount of data is very large? Memory is still a problem: "memory is full" or "out of memory".

Link to comment

I honestly am not sure it will be possible with that number of data points. Here are some tips that may get the code to run, but even then it will probably become very sluggish, as LabVIEW has to process 100M points every time it redraws the graph. Even if you don't decimate, LabVIEW has to, since the graph only has 100-1000 pixels it can use to plot the data.

1. Loading from a binary file is better than text, because text has to be converted, meaning two copies of the data. If you have text, load it a section at a time into a preallocated array (you will have to be very careful about allocations); see the sketch after this list.

2. Use SGL representation. The default in LabVIEW is normally DBL for floating point, but single precision only uses 4 bytes per point.

3. By default on a 32-bit OS, LabVIEW has 2 GB of virtual memory it can use (hence the problems; in SGL format each copy of the data uses 20% of this). If you are on a 32-bit OS, enable the 3GB flag so it can use 3 GB instead (there is a KB on the NI site for this), or move to a 64-bit OS with 32-bit LabVIEW, which will give it 4 GB. The ultimate would be 64-bit LabVIEW, but you tend to hit limitations of supported toolkits, so I suggest it only as a last resort, preferring to avoid the memory problems through programming where possible.
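As a sketch of tips 1 and 2 combined (Python's array module standing in for a preallocated LabVIEW SGL array; the file layout, names, and section size are assumptions): the buffer is allocated once up front and filled a section at a time, so no per-line growth or extra full copies occur.

from array import array

def load_preallocated(path, total_samples, section_lines=100_000):
    # 'f' items are 4-byte floats (SGL-equivalent); the whole buffer is allocated once.
    data = array("f", bytes(4 * total_samples))
    filled = 0
    with open(path, "r") as f:
        section = []
        for line in f:
            section.append(float(line))
            if len(section) == section_lines:
                data[filled:filled + len(section)] = array("f", section)
                filled += len(section)
                section = []
        if section:  # copy in the final partial section
            data[filled:filled + len(section)] = array("f", section)
    return data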

On top of these you just have to be very careful that any data manipulation you do does not require a data copy.

That is how you try to avoid running out of memory, but I would still suggest trying some of the other methods that Shaun and I have suggested. Even if you can get this to run (the programming will be a little easier), the program is going to have poor performance with that much data and will always be on the brink: at any point you could add some feature that needs more memory and you are back to square one.

  • Like 1
Link to comment
  On 12/29/2012 at 11:00 AM, JamesMc86 said:
I honestly am not sure it will be possible with that number of data points. Here are some tips that may get the code to run, but even then it will probably become very sluggish, as LabVIEW has to process 100M points every time it redraws the graph. Even if you don't decimate, LabVIEW has to, since the graph only has 100-1000 pixels it can use to plot the data.

1. Loading from a binary file is better than text, because text has to be converted, meaning two copies of the data. If you have text, load it a section at a time into a preallocated array (you will have to be very careful about allocations).

2. Use SGL representation. The default in LabVIEW is normally DBL for floating point, but single precision only uses 4 bytes per point.

3. By default on a 32-bit OS, LabVIEW has 2 GB of virtual memory it can use (hence the problems; in SGL format each copy of the data uses 20% of this). If you are on a 32-bit OS, enable the 3GB flag so it can use 3 GB instead (there is a KB on the NI site for this), or move to a 64-bit OS with 32-bit LabVIEW, which will give it 4 GB. The ultimate would be 64-bit LabVIEW, but you tend to hit limitations of supported toolkits, so I suggest it only as a last resort, preferring to avoid the memory problems through programming where possible.

On top of these you just have to be very careful that any data manipulation you do does not require a data copy.

That is how you try to avoid running out of memory, but I would still suggest trying some of the other methods that Shaun and I have suggested. Even if you can get this to run (the programming will be a little easier), the program is going to have poor performance with that much data and will always be on the brink: at any point you could add some feature that needs more memory and you are back to square one.

Thank you very much. I can now read and plot data with 10 million points; I will try plotting more data.

Link to comment
