Data acquisition on disk.


Hi guys.

I have created 4 different applications that do data acquisition using cards from NI and MS SQL Server. The logic is the same for all of them:

I use TAB-separated text files. I open a handle to the file outside the main loop, do the acquisition inside the loop, and finally close the file.

Then I start a job on the SQL Server which runs a DTS package that uploads this text file to a database LOCALLY (text file and database on the same computer!) and then deletes the file. It will be created again by the next acquisition procedure.

For a specific application, the measurement file is more than 30MB long, containing approximately 850.000 records!!! The acquisition rate is very high, 250 samples per 50ms. No, it can't be reduced at all!

It was impossible to upload all these records to SQL Server from my application, which is why I created the job: the server does the upload and leaves my application in peace to perform the next measurement. I must admit that the upload of the file by the SQL Server is VERY quick, about 1 minute for all of these records.

And here is my problem. We produce aluminium coils. The application I mentioned takes thickness measurements from 50 coils per day. Imagine this: 50 coils X ~850.000 samples per coil = 42.500.000 records PER DAY!!! What about per week or per month??? My database would become very large, not only because of the amount of data on disk but also because of the number of records. A SELECT query may take a long time to execute. I've thought of keeping my data in BLOB records, but then I would not be able to perform aggregate functions or make comparisons. I've also thought of leaving the text files intact and just adding one record for every coil, containing the path to the data file (the TAB-separated file).
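To put those numbers side by side, here is a small back-of-the-envelope sketch in Python (the thread itself is about LabVIEW, so this is only an illustration). It assumes that each record holds one sample, that production runs 7 days a week, and that a month has 30 days; none of that is stated in the post.

```python
# Figures taken from the post.
SAMPLES_PER_BLOCK = 250        # samples acquired per block
BLOCK_MS = 50                  # one block every 50 ms
RECORDS_PER_COIL = 850_000     # approximate records in one measurement file
COILS_PER_DAY = 50

# Assumption: one record per sample.
samples_per_second = SAMPLES_PER_BLOCK * 1000 // BLOCK_MS
seconds_per_coil = RECORDS_PER_COIL / samples_per_second

records_per_day = RECORDS_PER_COIL * COILS_PER_DAY
records_per_week = records_per_day * 7      # assumption: 7 production days
records_per_month = records_per_day * 30    # assumption: 30-day month

print("samples per second:", samples_per_second)
print("seconds of data per coil:", seconds_per_coil)
print("records per day:", records_per_day)
print("records per week:", records_per_week)
print("records per month:", records_per_month)
```

Under those assumptions the table would pass a billion rows in under a month, which is why the storage layout matters more than the acquisition itself.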

I must mention that my software works fine and the DTS package on the SQL Server does its job nicely. I just want to hear what people who have been involved with data acquisition for years have to say about this.

Here is the structure of the data file:

Date/Time {TAB} Entry_Thickness {TAB} Exit_Thickness

02/08/2005 10:54:21.21 {TAB} 0.0732 {TAB} 0.0660

02/08/2005 10:54:21.25 {TAB} 0.0736 {TAB} 0.0658

..... ...... .....

This file is viewable with NOTEPAD. But it never actually opens. Too large for notepad!

What about saving to binary files? I've thought of this but never implemented it. Any thoughts?
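As a rough idea of what a binary version of this file could look like, here is a minimal Python sketch (not LabVIEW). The record layout is an assumption made for illustration: a 64-bit float timestamp plus two 32-bit floats, little-endian. The date is assumed to be day/month/year, which the post does not confirm, and the function names are hypothetical.

```python
import struct
from datetime import datetime

# Assumed layout: float64 timestamp + float32 entry + float32 exit = 16 bytes.
RECORD = struct.Struct("<dff")

def parse_line(line):
    """Split one TAB-separated text record into (timestamp, entry, exit)."""
    stamp, entry, exit_ = line.rstrip("\r\n").split("\t")
    # Assumption: the date is day/month/year; swap %d and %m if it is not.
    when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S.%f")
    return when.timestamp(), float(entry), float(exit_)

def pack_line(line):
    """Convert one text record into its fixed-size binary form."""
    return RECORD.pack(*parse_line(line))

def unpack_record(blob):
    """Convert a binary record back into (datetime, entry, exit)."""
    ts, entry, exit_ = RECORD.unpack(blob)
    return datetime.fromtimestamp(ts), entry, exit_

line = "02/08/2005 10:54:21.21\t0.0732\t0.0660\r\n"
blob = pack_line(line)
print("text bytes per record:", len(line))
print("binary bytes per record:", len(blob))
print(unpack_record(blob))
```

With this layout a record shrinks from 38 bytes of text to 16 bytes, so 850.000 records would take roughly 13.6 MB instead of about 32 MB. The 32-bit floats keep about 7 significant digits, which covers the 4 decimals shown in the file.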

I'm sorry for the length of my post.

Thanks a lot

Nick The Greek

Don't apologize for the length; most posts contain too little information. :)

Human readable (ASCII) vs. Machine readable (Binary):

I started with microprocessors where an entire program could reside comfortably in 4K. I spent hours squeezing every byte out of the program and packed my data as tight as it could be. For the time, the effort was appropriate. (Yes, I'm an old fart) :D

I now manage an airborne data acquisition system that records a lot of data, and I record everything as generic delimited text files (tab, space, comma, whatever). I think that the benefit of human readable data far outweighs any space saving achieved by using a binary format.

My thinking is as follows:

- Most, if not all LV applications are very low quantity installations, therefore the CPU hardware cost is small relative to the dev. cost.

- I can buy a 250 GB SATA drive for less than 2 hours of my time and that price will continue to drop. It won't be long before consumers have Terabyte+ machines to store bad home videos and make sure they don't miss their favorite irrelevant soap operas.

- That recorded data can be read by a human, Excel, LV, and just about every other data related program. I only have to write the code once and I don't have to worry about other apps interpreting my binary format.

- If my requirements are "on the edge" of what is available when I start the project, the requirements will be more than manageable before the product is finished.

- I'm all for elegance and efficiency, but within what parameters? Even if there was a possibility that an app I was developing would become a high volume item, I would still use LabVIEW first, as it is the best rapid prototyping system that I have found so far. I would establish the proof-of-concept first and then, if and when the numbers become real, I would then look at optimization.

Does it bug me that a stand-alone system now has more storage than 500 multi-user systems had 30 years ago? Yes, a little, but I'm getting over it.

I hope my rant helps a little.

Regards,

Barrie

Hi Nick:

Like Barrie, I'm an old fart :D ... And like him, I started with a microprocessor where the whole program could fit in 4 K (in my case, less than that, perhaps just a couple of yards of paper punch tape). There was no question about compacting the data as much as possible-- you conditioned the signal with analog electronics to use most of the range of the 8 bit A/D converter, and stored each sample as an 8 bit word. Either that, or you ended up with less than 8 bits resolution. (In my case, the binary data was transferred to paper punch tape, and I had to carry it to a mainframe in order to do an FFT...)

I agree completely with what Barrie said-- I might add one note from past experiences. I used to have programs which stored data as compact binary. I had conversion utilities which converted the data to printable ASCII CSV files. This expanded the file size by about a factor of 10. Funny thing is, if you subsequently put that CSV file through PKZIP, it ended up about the same size as the original packed binary. The Zip routines are pretty good-- If a data file has 10% information, and 90% fluffy formatting, Zip will pack it down pretty near to a factor of 10. So when computers got fast enough to do the binary-to-printable ASCII conversion on the fly for my applications, I started doing my initial stream-to-disk storage as printable ASCII.... Which I would subsequently PKZIP for storage. Convenient to send to others too-- no need to send them the clunky conversion program and teach them to use it-- Everyone has Zip or something like it.
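The compression effect is easy to try for yourself. The sketch below is Python rather than LabVIEW, uses zlib (DEFLATE, the same family of compression that Zip tools use) instead of PKZIP itself, and runs on synthetic thickness values invented for the test, so the ratio it prints says nothing definite about real coil data. The binary layout (float64 + two float32) is also just an assumption.

```python
import random
import struct
import zlib
from datetime import datetime, timedelta

random.seed(0)  # repeatable synthetic data
start = datetime(2005, 8, 2, 10, 54, 21)
text_lines = []
binary_parts = []

for i in range(20_000):
    when = start + timedelta(microseconds=200 * i)  # assumed 5000 samples/s
    # Invented values: small noise around the thicknesses shown in the thread.
    entry = round(0.0732 + random.uniform(-0.0005, 0.0005), 4)
    exit_ = round(0.0660 + random.uniform(-0.0005, 0.0005), 4)
    stamp = when.strftime("%d/%m/%Y %H:%M:%S") + ".%02d" % (when.microsecond // 10_000)
    text_lines.append("%s\t%.4f\t%.4f\r\n" % (stamp, entry, exit_))
    binary_parts.append(struct.pack("<dff", when.timestamp(), entry, exit_))

text_bytes = "".join(text_lines).encode("ascii")
binary_bytes = b"".join(binary_parts)
zipped_text = zlib.compress(text_bytes, 9)

print("ASCII text:       ", len(text_bytes), "bytes")
print("packed binary:    ", len(binary_bytes), "bytes")
print("compressed ASCII: ", len(zipped_text), "bytes")
```

Running the same comparison on one real measurement file would show whether compressed text lands near the packed binary size for this particular data.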

And not too long after that as storage costs continued to decrease, I realized that the time I was spending zipping data, and unzipping the data to search through it or use it-- fast as that now is-- cost more than the storage. Today I always store my data as printable ASCII. Perhaps backup and archive utilities pack it, I don't mind if they do, as long as they don't waste my time doing it. I might zip it myself if I'm attaching it to an email, but other than that, leave it in readable form.

Concerning your question about searching and indexing the data-- I haven't a lot of experience with database searching, but it seems to me that writing the index file is well worthwhile, given the size of your data set. On the other hand, if the most common search is to find the data file associated with a particular coil, that implies that each coil has a unique serial number or a date/time code. Why not use the S/N or date code for the file name? Then all you need to do is search for the filename in Windows.

Like Barrie, I've ranted a little, perhaps even rambled, ;) but I hope my rambling has helped a little.

Best Regards, Louis

I think binary files would be a better way to go. The files are NOT human readable anyway (if you can't even open them!!). Binary will make your files much smaller and your database easier to handle.

If there is any specific info that you absolutely must be able to have human-readable, just throw that in the file name.

For example: Coil_01_2004_08_03_1_30_pm_First_Run.bin
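A file name along those lines could be generated automatically. Here is a minimal Python sketch; the function name and its parameters are hypothetical, and it deliberately uses a 24-hour clock instead of the "1_30_pm" form in the example, so that names sort chronologically in a directory listing.

```python
from datetime import datetime

def coil_filename(coil_id, started, run_label, ext="bin"):
    """Build a descriptive file name from coil number, start time and a label."""
    stamp = started.strftime("%Y_%m_%d_%H_%M")   # 24-hour clock, sorts by time
    safe_label = "_".join(run_label.split())     # replace whitespace with "_"
    return "Coil_%02d_%s_%s.%s" % (coil_id, stamp, safe_label, ext)

name = coil_filename(1, datetime(2004, 8, 3, 13, 30), "First Run")
print(name)  # Coil_01_2004_08_03_13_30_First_Run.bin
```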

It will be a bit of work to write the acquisition and file-read VI's, but once debugged, you don't have to mess with them again. Look at the binary data acquisition VI's as well for ideas.

You could limit the number of records in a single file, or keep individual databases by month or year, to limit the size of your database.

Alternately, you could have a summary of the data go to a database, i.e., max/min/median thickness for the day or something like that.
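A summary row per coil might look roughly like the Python sketch below. It is only an illustration: sqlite3 stands in for MS SQL Server, the table and column names are made up, and the choice of exit thickness as the summarized signal is an assumption. The row also stores the path of the raw file, so the full data stays reachable without being loaded into the database.

```python
import sqlite3
import statistics

def summarize_exit_thickness(lines):
    """Return (count, min, max, median) of the third TAB-separated column."""
    values = [float(line.split("\t")[2]) for line in lines if line.strip()]
    return len(values), min(values), max(values), statistics.median(values)

def store_summary(conn, coil_id, data_path, lines):
    """Insert one summary row per coil; the raw file stays on disk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coil_summary ("
        "coil_id TEXT PRIMARY KEY, data_path TEXT, n_records INTEGER, "
        "exit_min REAL, exit_max REAL, exit_median REAL)"
    )
    conn.execute(
        "INSERT INTO coil_summary VALUES (?, ?, ?, ?, ?, ?)",
        (coil_id, data_path, *summarize_exit_thickness(lines)),
    )
    conn.commit()

# The two sample records shown earlier in the thread.
sample = [
    "02/08/2005 10:54:21.21\t0.0732\t0.0660\n",
    "02/08/2005 10:54:21.25\t0.0736\t0.0658\n",
]
conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
store_summary(conn, "coil_0001", "coil_0001.txt", sample)
row = conn.execute(
    "SELECT n_records, exit_min, exit_max, exit_median FROM coil_summary"
).fetchone()
print(row)
```

With 50 coils per day this adds 50 rows per day instead of 42.500.000, and aggregate queries run against the summary columns.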

Face it, 99.99% of the time, nobody is going to look at the reams of raw data. I used to work in the Fuel-Cell industry where they had a mania for maintaining raw data (2Gigs per day). I found in a year's time, NOBODY bothered with looking at it! They were only interested in anomalies (deviations from the norm). That might help you to think about what is important to look for and only store that bit of the information, rather than reams of raw data.

Cheers,

Neville.

I partly agree with everyone, I guess.

I work for a specialized battery manufacturer and we take way more data than necessary all the time.

The usefulness of human readable data cannot be ignored IF someone actually reads the stuff. We have CDs full of data just sitting around from the development and testing of the batteries we produce. Once the battery is out the door we give the customer either the data or just summary data sheets. Either way, we keep copies of the summary data sheets for ourselves for a period of time, and then they get disposed of.

You are producing a ton of data that probably will never get looked at. If someone needs to see it you could easily convert it from binary to ASCII for them to read, but in the meantime you can save storage space by saving it in binary. If your company is like mine you are better off saving space, since even getting a new hard drive can be a huge proposition at times. You can also burn it to CDs, store them, and delete the data from the hard drive of your storage computer to keep it running nicely.

Basically what comes to mind for me is how long it needs to be stored and how often it gets looked at. If you only keep it a short time or it gets looked at frequently then ASCII may be the way to go. If you need to keep it forever and no one ever looks at it then binary might be the better option.

Hope this isn't as confusing as it looks.

Jack

I work for an OEM that manufactures machines for the metals industry. We too log the thickness deviation for every single coil every 10 msec, along with about 255 more signals. However, we don't use LabVIEW to do it. Why reinvent the wheel for this type of data acquisition? IBA has the best solution for us and they have been working in the metal industry for many years. Their product is called PDA & ibaAnalyzer.

http://www.iba-germany.com/

  • 3 weeks later...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

We use IBA's products !!!!!!!!!!!!!! (Don't ask me where I work, I just forgot it ;) )

But they are too expensive, which is why I believe that in my case reinventing the wheel is cheaper!!! I mean, this is a small application acquiring 2 signals. Why pay at least 30.000 euros when I can build it myself? The problem is not the acquisition itself, but saving the data.

Anyway, thanks for the answer. I really appreciate IBA's products, as we use them every single day without problems.
