Posts posted by Herbert

  1. I agree that classes in general should be tested through their public interfaces. On the other hand, I would want to design my tests so they lead me to the root cause of a problem in the shortest amount of time possible. If a "black box" test using my public interface fails, I don't want to have to dig down my VI hierarchy in order to find the root cause. Not if I know that a "white box" test inside the class could have provided me with that information without me doing anything. So I guess I want to test both the public interface and the private methods.

    Obviously, black box testing is the only way of making sure that you're testing the exact behavior your class will expose to its callers. A white box test can interfere with the inner workings of a class, bearing the risk that it alters the class's behavior or otherwise produces results that couldn't occur in a black box test. So, if a black box test fails, I'll probably have to fix my code. If a white box test fails, I might have to fix the test instead. Sometimes it's worthwhile adding and maintaining a white box test, sometimes it's not ...

    I strongly encourage everyone who is interested in unit testing to watch out for new releases on ni.com/softwareengineering and related content on ni.com/largeapps on Friday, 02/06/2009.

  2. I might look at this through my TDMS glasses too much, but to me, the natural way of storing the events you have mentioned would have been to create a channel for each cluster element - where the channel is of the same data type as your cluster element. I realize that this requires you to unbundle and bundle the cluster for writing and reading, respectively. But you wouldn't lose any numeric accuracy, any timestamp tidbits or other things. The only advantage I can see in storing everything as strings would be less coding. Am I missing something there?

    I have thought a lot about allowing arbitrary clusters in TDMS. The problem, as you mentioned, is that you don't know what kind of data you're really dealing with, so it's impossible to magically do the right thing. Some cluster elements are better off being stored as properties, but how would I know? If I store them as properties because they are scalar, I'm out of luck if they change their value after 1000 iterations. Similarly, what would I do with a numeric array in the cluster? Create a channel? Append the array values from the next cluster to that channel? What if these are FFT results? I have not been able to come up with a good way of identifying these things automatically. Of course, you can always come up with some fancy piece of UI that allows users to assign cluster elements to TDMS objects (smells like Express VI :P ), but the best interface we have for making that assignment is the block diagram.

    If a cluster doesn't contain arrays or other clusters, you could make a case that we should handle it by making each cluster element a channel. That would be a viable thing to do. But when it comes to nested clusters and clusters that include arrays, providing "automatic" handling creates expectations that can hardly be fulfilled.
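
    To make the "one channel per cluster element" idea concrete, here is a minimal sketch in Python using the third-party npTDMS library (not the LabVIEW TDMS API discussed above); the field names, values and file name are made up for the example:

    import numpy as np
    from nptdms import TdmsWriter, ChannelObject

    # A "cluster" of per-event values, collected over a few iterations
    events = [
        {"temperature": 23.5, "pressure": 101.2, "valve_open": 1},
        {"temperature": 23.7, "pressure": 101.4, "valve_open": 0},
    ]

    with TdmsWriter("events.tdms") as writer:
        # One channel per cluster element, each keeping its native numeric type
        for name, dtype in [("temperature", np.float64),
                            ("pressure", np.float64),
                            ("valve_open", np.int32)]:
            values = np.array([e[name] for e in events], dtype=dtype)
            writer.write_segment([ChannelObject("Events", name, values)])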

    Herbert

  3. QUOTE(Kevin P @ Jun 19 2007, 10:50 AM)

    1. Talk to NI internal account people so I can install older LabVIEW 8.20 on my newly purchased license rather than 8.2.1 (Would also install DAQmx 8.3 instead of 8.5). This seems like an easy way to be sure that my stand-alone executable will be compatible with the older deployed PC.

    2. Go ahead and install LabVIEW 8.2.1, but stick with DAQmx 8.3. Then I'd be building an 8.2.1 executable that I want to deploy to an older PC with 8.20 runtime. Will this work, both in general due to runtime version difference and in particular with respect to the TDMS functions? Bear in mind that the older PC is creating the TDMS files with a LV 8.20 app.

    Kevin,

    the second option will not work. The executable needs to be compiled with the same LabVIEW version as the Run-Time Engine you are using. The incompatibility is not in the TDMS files; it exists between LabVIEW-compiled code and the LabVIEW Run-Time Engine. So I'm afraid the first option is the only way to go (other than updating everything to 8.2.1, regardless of the DAQmx version).

    Herbert

  4. I had a chance to see motorcycle traffic in Vietnam recently. People entering or joining traffic never look to the left, right or back, but that's OK since everybody is aware of it. If you want to pass someone, you generally honk, so they know something is coming from behind. The whole thing might look quite familiar to you if you have one of those screen savers that simulate a school of fish. There are more details, but it is scary enough just like that.

    Herbert

    http://forums.lavag.org/index.php?act=attach&type=post&id=6142

  5. Thang,

    A) The idea here is that users should never have to touch properties like wf_increment or even know about them. We use the wf_xxx properties to store things that are embedded in LabVIEW data types (e.g. T0 and dT are embedded in the waveform data type). If you use waveforms correctly, all of these properties should be written and read without you doing anything special. That of course only works if the waveforms have the correct values in them. Since you are asking - here are the important ones:

    • T0 is saved to wf_start_time (timestamp).
    • dT is saved to wf_increment (double).
    • If your data is not time-domain, wf_start_time will still be set, but your X0 value goes into wf_start_offset (double). This will happen for example with frequency-domain data or histogram results.
    • If you exchange data with DIAdem, you need to set the wf_samples property to something other than 0 (we usually set it to the number of values in the incoming waveform, so in your file, it is 1). DIAdem will use this property to determine whether a channel is a waveform or not.

    B) That's exactly right. The only thing you need to do is set the property NI_MinimumBufferSize (integer) for each of your data channels to 1000 or 10000 or something. The TDMS API does the buffering automatically (requires LV 8.2.1). This is not crucial to the functionality of your application, but it will speed up writing and reading quite a bit.
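
    For readers who end up writing or inspecting these properties from outside LabVIEW, here is a rough sketch of the wf_* convention described in A) and B), using the third-party npTDMS Python library (channel names and values are invented; note that NI_MinimumBufferSize is only acted upon by the LabVIEW TDMS writing API and is stored here merely as an ordinary property):

    import numpy as np
    from nptdms import TdmsWriter, ChannelObject

    t0 = np.datetime64("2007-06-18T12:00:00")   # waveform T0
    dt = 0.001                                  # waveform dT in seconds
    data = np.sin(np.linspace(0.0, 1.0, 1000))

    channel = ChannelObject(
        "Measurement", "Signal", data,
        properties={
            "wf_start_time": t0,        # T0 -> wf_start_time (timestamp)
            "wf_increment": dt,         # dT -> wf_increment (double)
            "wf_samples": len(data),    # non-zero so DIAdem treats the channel as a waveform
            "NI_MinimumBufferSize": np.int32(10000),  # buffering hint used by LabVIEW 8.2.1+
        })

    with TdmsWriter("waveform.tdms") as writer:
        writer.write_segment([channel])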

    Unrelated) I see from the flags on your account that you're from Vietnam. I just came back from 2 weeks of vacation, visiting friends in Vietnam. They took me on a roundtrip through the country, including Hanoi, Ha Long, Nha Trang and Saigon. Best vacation I had in a long time. I'm addicted to cà phê sữa đá now :thumbup:

    Herbert

  6. Can't you just go with the one waveform you acquire and split it up, e.g. using "Get Waveform Components" combined with "Get Digital Components" or using some of the functions on the "Digital Waveform" -> "Conversion" palette?

    Herbert

  7. QUOTE(Thang Nguyen @ Jun 18 2007, 12:23 PM)

    Yeah, they are stored in the right order. And I store data at different rate. You can see the number of data in high frequency is larger the number of data in low frequency. All of them start and stop at the same time.

    Looking at the file with the TDMS Viewer, what you have is:

    • different channel lengths (high freq channels have 618 values, low freq channels have 224)
    • same dT (1.00 for all channels)
    • varying starting times (T0) for every channel

    It looks like you are using waveforms to save single values. In that case, I'm not sure that DAQmx or other functions that put out waveforms will set dT correctly, because there is no second value to reference against. If you save a series of single values to a waveform channel, you need to be really sure that they are equally sampled. If you're not sure, you should instead split up the waveform data type and store the timestamps and the data values in separate channels (e.g. one timestamp channel and one double channel), as sketched below.

    Saving single values to TDMS like this is also not a very efficient thing to do. It is a lot more efficient to gather a bunch of values and write them as a larger array. You can have the TDMS API do that for you by setting the channel property "NI_MinimumBufferSize" to the number of values that you wish to buffer. In your case, good values might be 1000 or 10000.
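
    As a rough illustration of the "one timestamp channel plus one value channel" layout for unequally sampled single values (again a Python sketch with the third-party npTDMS library and made-up data, not the LabVIEW code under discussion):

    import numpy as np
    from nptdms import TdmsWriter, ChannelObject

    # Irregularly spaced single values: store time and value as two separate channels
    timestamps = np.array(["2007-06-18T12:00:00.000",
                           "2007-06-18T12:00:00.950",
                           "2007-06-18T12:00:02.100"], dtype="datetime64[ms]")
    values = np.array([1.02, 1.05, 0.98])

    with TdmsWriter("low_rate.tdms") as writer:
        writer.write_segment([
            ChannelObject("LowFreq", "Time", timestamps),
            ChannelObject("LowFreq", "Value", values),
        ])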

    Hope that helps,

    Herbert

  8. QUOTE(torekp @ May 17 2007, 08:43 AM)

    I'm almost sure you've seen it, but just in case ... I posted some more details on how we benchmark file formats at NI in this thread: http://forums.lavag.org/index.php?s=&showtopic=7939&view=findpost&p=30185 - including prerequisites and the actual VIs we use to run our benchmarks.

    For relatively short periods of writing, the profiler returns only the time it takes to shove your data into the Windows buffer, but that doesn't mean it's on disc yet. Don't yell at it - the poor thing doesn't know any better :blink:

    Herbert

  9. QUOTE(Tomi Maila @ May 16 2007, 11:28 AM)

    Tomi,

    I used HDF5 version 1.6.4. The LabVIEW API for that was never released to the public. I also don't have that code in my benchmark tool any more.

    You might need to rip some stuff out of the code, e.g. DAQmx or HWS, depending on what you have on your machine. Adding a format is rather simple: just add it to the typedef for the pulldown list and add new cases to the open, write and close case structures.

    Some remarks:

    Hope that helps,

    Herbert

    http://forums.lavag.org/index.php?act=attach&type=post&id=5888

  10. Benchmarks

    Every benchmark was run on a "clean" machine. The machine is what used to be a good office machine two years ago. It has software RAID, which has a minor influence on some of the benchmarks. Depending on what machine you use (e.g. what hard drive, single-processor vs. dual-processor etc.), results may obviously vary. If you see spikes in time consumption where my benchmarks don't show any, you might need a better hard disc / controller. The hard disc on Windows needs to be defragmented and at least half empty in order to achieve reproducible results. No on-demand virus scanning. Better to shut down any service that Windows can survive without. Load your benchmark VI, open the Task Manager, wait until processor usage settles at zero, and hit run. Make sure you have plenty of memory, so your system never starts paging.

    We did not care about the time it takes to write small amounts of data to disc. Windows will buffer that data, and your application continues to run before the data is actually on disc. We only cared about sustained performance that you can hold up for an extended period of time. In order to reach this "steady state", we stored at least 1000 scans in each of our benchmarks. The graphs in the attached PDF files show the number of scans stored on the x axis and the time it took for a single scan to be written on the y axis. The time consumed is only the time for the writing operation; time consumed by acquisition and other parts of the application is not included.
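
    The original benchmarks were LabVIEW VIs; purely to illustrate the measurement idea (time each individual write over many scans and exclude data generation), a harness could look like this Python sketch, where write_scan and make_scan stand in for whichever file API and data source are under test:

    import time

    def benchmark(write_scan, make_scan, n_scans=1000):
        """Return the wall-clock duration of each individual write."""
        times = []
        for i in range(n_scans):
            scan = make_scan(i)          # data generation is not part of the measurement
            start = time.perf_counter()
            write_scan(scan)             # only the write call is timed
            times.append(time.perf_counter() - start)
        return times                     # plot scan index vs. time to see spikes and trends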

    There are several things we were looking for in a benchmark:

    1. Overall time to completion (duh).
    2. Number and duration of spikes in time consumption. Minor spikes are normal and will occur with any file format on pretty much any system. Larger spikes can be a killer for high-speed streaming.
    3. Any dependency of performance on file size and/or file contents. This is where we eliminated most existing formats from our list. Performance often degrades linearly or even exponentially when meta data is added.

    The source data was always a 1D array of analog waveforms with several waveform attributes set. The formats under test were:

    • TDMS
    • TDM
    • LabVIEW Bytestream
    • LabVIEW Datalog (datalog type 1d array of wfm)
    • NI HWS (NI format for Modular Instruments, reuses HDF5 codebase)
    • HDF5
    • LVM (ASCII based, Excel-friendly)

    Some benchmarks only include a subset of these formats. The ones that are missing didn't perform well enough to fit in our graphs. HDF5 was tested only in the "Triggered Measurements" use case, because with the HDF5-based NI HWS format we already had a benchmark "on the safe side". The reason TDM goes down in flames in some benchmarks is that it stores channels as contiguous pieces of data.

    Mainstream DAQ

    The first benchmark is a mainstream DAQ use case: acquire 100 channels with 1000 samples per scan and do that 1000 times in a row. Note the spikes where Datalog and HWS/HDF5 are updating their lookup trees. TDMS beats bytestream by a small margin because of more efficient processing of waveform attributes.

    http://forums.lavag.org/index.php?act=attach&type=post&id=5882

    Modular Instruments

    Acquire 10 channels with 100000 values per scan. Here's where HWS/HDF5 still has TDMS beat. They do that by using asynchronous, unbuffered Windows File I/O. According to MS, that's the fastest way of writing to disc on Windows. We're working on that for TDMS. An interesting detail is the first value in the upper diagram. Note that HWS/HDF5 takes almost a second to initially create the file.

    http://forums.lavag.org/index.php?act=attach&type=post&id=5883

    Industrial Automation

    Acquire single values from 1000 channels. These are LabVIEW 8.20 benchmarks. With the 8.2.1 NI_MinimumBufferSize feature TDMS should look better than that, but I haven't run this test yet. Note that HWS/HDF5 takes about 3 seconds where all 3 native LabVIEW formats stay below 100ms.

    http://forums.lavag.org/index.php?act=attach&type=post&id=5884

    Triggered Measurements

    In this use case, every scan creates a new group with a new set of channels. This typically occurs in triggered measurements, or when you're storing FFTs or other analysis results that you cannot just append. We acquire 1000 values from 16 channels per scan for this use case. Of all things, I've lost the original data for the HDF5 test, so I need to attach 2 diagrams. The first one is the 8.20 benchmark without HDF5:

    http://forums.lavag.org/index.php?act=attach&type=post&id=5885

    The second one is an older benchmark that was done with a purely G-based prototype of TDMS (work title TDS). I attached it because it has the HDF5 data in it. The reason HWS is faster than the underlying HDF5 is that it stores only a limited set of properties.

    http://forums.lavag.org/index.php?act=attach&type=post&id=5886

    Reading

    I also have a bunch of reading benchmarks, e.g. read all meta data from a file, read a whole channel from a file, read a whole scan from a file etc. These are less exciting to look at though, because I only have aggregate numbers for them.

    We also recently conducted a benchmark on how fast DIAdem can load and display data from multi-gigabyte files, where TDMS was the overall fastest reading format.

    Hope that helps,

    Herbert

    QUOTE(Tomi Maila @ May 16 2007, 12:39 AM)

    I'd love to use TDMS but it doesn't suit our needs as it is today with only two hierarchy levels and lacking support multidimensional arrays (3-15d) and scalars. Are you intending to extend tdms format to support these features?

    Tomi

    Yes, we are planning to add these features. The underlying infrastructure (TDMS.DLL) is already fully equipped to do that, and the file format already has placeholders for all the necessary information in it. The reason we don't have these things yet is that TDMS is used for data exchange with DIAdem, where deep hierarchies and multi-dimensional arrays are not supported. So every time we add something like this, we need to coordinate with other groups that use TDMS (CVI, SignalExpress, DIAdem...) to make sure everybody has an acceptable way of handling whatever is in the file. We're working on that.

    Herbert :headbang:

  11. QUOTE(Tomi Maila @ May 15 2007, 02:24 PM)

    Herbert, could you please specify the performance issues and if possible refer to the source.

    Tomi

    Tomi,

    prior to making TDMS, we ran a bunch of different benchmarks on a variety of file formats. Test cases included high-speed logging on 10 channels, single-value logging on 10000 channels, saving FFT results (the point being that you cannot append FFT channels) and more. HDF5 does great on small numbers of channels, but it started having issues once we had about 100 data channels, where a channel in HDF5 is a node with a bunch of properties and a 1D array. If you keep adding channels (as you have to in the FFT results use case), performance goes down exponentially (!) with the number of channels.

    HDF5 furthermore tends to produce spikes in time consumption when writing. We contacted the HDF5 development team about that and they responded that it was a known issue they would be working on, but they couldn't give us a timeline for when it would be fixed.

    Herbert

  12. I think I answered this on Info-LabVIEW earlier today ... for the kind of dataset you describe, the TDMS file format and the TDM Streaming functions (a subpalette of File I/O) would be a good solution. TDMS files are binary, so their disc footprint is going to be much smaller than ASCII. The file size is only limited by your hard disc size. Within the file, you can organize data in groups that you can assign names and properties to (so you can use a smaller number of files). LabVIEW comes with a viewer application for TDMS. TDMS is also supported in CVI, SignalExpress and DIAdem, plus we provide an Excel Add-In for TDMS as a free download. If you need more connectivity than that, there's also a C DLL and documentation of the file format available on ni.com.

    An SQL database might be a reasonable solution, too - if it is well designed. It'll certainly help you maintain the large number of tests that you are storing. HDF5 is probably a bad idea. It is great for storing a few signals at high speed, but it has some really bad performance issues when it comes to storing large numbers of data sets.

    Hope that helps,

    Herbert

  13. When you open a file and write to it, the values are not immediately written to disc. They are cached in memory until a certain amount of data has accumulated, then the operating system will "flush" the cache to disc. If you need an event log for debugging purposes, you have two options:

    • Open, write and close every time you write. Closing the file will force the operating system to flush your data to disc.
    • Use the "Flush" function from the "Advanced" subpalette in File I/O. That essentially does the same thing, but it performs a little better, since you save the overhead of re-opening the file.

    If your system powers off without the OS properly shutting down, and that happens at a point in time when the system is flushing its disc buffers, you might end up with a corrupted file. In that case, I would recommend narrowing down what causes the error, and then writing one log file for each message from the code that appears to cause the problem. This will obviously result in tons of files, but it will tell you when your system bailed out.
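
    In plain-file terms, the two options amount to something like the following sketch (ordinary byte-stream logging in Python, not TDMS; os.fsync is the more aggressive variant that also asks the OS to push its own cache to disc):

    import os

    def log_close_each_time(path, message):
        # Option 1: open, write and close for every record; closing flushes the application buffer
        with open(path, "a") as f:
            f.write(message + "\n")

    def log_keep_open(f, message):
        # Option 2: keep the file open and flush after each record (saves the re-open overhead)
        f.write(message + "\n")
        f.flush()             # push the application buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to write its cache to disc as well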

    Hope that helps,

    Herbert

  14. QUOTE(Thang Nguyen @ May 8 2007, 10:05 AM)

    I want to now if there is any limitation in the size of the TDMS file? If YES, is there any solution to determine it?

    There is practically no limit to the file size. We can address any piece of data within a signed 64-bit integer range (that's several million terabytes). The limiting factor there would be your hard disc size.

    The only practical limit is that we cache meta data and index information. That means we keep the names and properties of all groups and channels in memory, and every time you store data to a channel, we keep a uInt64 value that tells us where in the file that piece of data went. Hence, if you store a large number of objects in a loop with a large number of iterations, you can eventually run out of memory (a rough back-of-the-envelope estimate follows after the list below).

    To give you an idea of what I mean by large: one of our test cases for TDMS has 10000 channels with dozens of properties and 10000 chunks of raw data each. This test runs just fine on an average office machine. If your requirements are below that, you should be fine. If you ever run into that kind of limit, you might be successful by

    • using NI_MinimumBufferSize to reduce the number of times we actually write to disc
    • changing your group/channel setup to be more efficient
    • defragmenting your file
    • using multiple files.
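
    A rough back-of-the-envelope estimate of the index memory described above, as a small Python sketch (8 bytes per chunk pointer, one pointer per channel per write call; the counts are example numbers, not our test case):

    channels = 100
    writes_per_channel = 100_000
    index_bytes = channels * writes_per_channel * 8
    print(f"index alone: ~{index_bytes / 1e6:.0f} MB")  # ~80 MB, before names and properties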

    Hope that helps,

    Herbert

  15. WSDL

    I've been playing with WSDL quite a bit, using a variety of tools to create WSDL files and read them back in. Unfortunately, there are often compatibility issues, e.g. some tools use namespaces that other tools don't support, some tools declare their data types in a way other tools won't recognize, etc. So I went back to what I think is the granddaddy of web services - the Google Search API. I took their WSDL and did some text processing in order to adapt it to my server. Crude method. Works like a charm. For reference on WSDL, I was pleasantly surprised to see that the W3C specification of WSDL is very well written and has a lot of examples. Great resource: http://www.w3.org/TR/wsdl#_http.

    Web Service

    The WSDL file just declares your API; you still have to implement it. Programming the actual web server in LabVIEW is probably quite a bit of work. If you want it running stand-alone, you'll have to implement your own networking code. If you run a LabVIEW-built executable as a CGI app in a web server, you can avoid that effort, but you still need to write code to serialize / deserialize the SOAP strings. That's not exactly rocket science, but it might not be fun either. Plus, you need to live with the disadvantages of CGI apps, the most important of which is that they are slow, because every call starts up a new process.
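
    Just to make the "serialize / deserialize the SOAP strings" part concrete, here is a minimal client-side sketch in standard-library Python; the endpoint, namespace and operation name are invented for the example:

    import urllib.request

    ENDPOINT = "http://example.com/labview-service"   # hypothetical CGI endpoint
    ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <GetMeasurement xmlns="http://example.com/labview-service/">
          <channelName>Signal</channelName>
        </GetMeasurement>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        ENDPOINT,
        data=ENVELOPE.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "http://example.com/labview-service/GetMeasurement"})
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))        # raw SOAP response, still to be parsed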

    If you are using Visual Studio, I'd recommend building your own WSDL as described above, having Visual Studio generate a C# web service from it, and hooking that up to either the LabVIEW ActiveX server interface or a LabVIEW-built DLL (the ActiveX server has the advantage that you can keep LabVIEW running in order to save performance). If you are using a C/C++ compiler, check out http://gsoap2.sourceforge.net/. I know there are many ways of implementing web services in other languages (I think about 90% of them are done in Java), but I'm not quite sure how the LabVIEW connectivity works from there.

    Hope that helps,

    Herbert

  16. You should be safe there. DAQmx refnums are indeed special in that they can be cast to strings. That applies to a variety of them, including channels, tasks, tags etc. I'm not sure it applies to all of them. LabVIEW automatically coerces these refnums to strings if they are wired to a terminal that expects a string.

    There have been some issues in the past if you use DAQmx refnums as attributes of variants or waveforms, where the variant/waveform goes into a LabVIEW function that tries to process the attributes. It doesn't look like you're going to have that use case, though.

    Herbert

  17. QUOTE(PJM_labview @ May 1 2007, 02:34 PM)

    Unfortunately the described workaround (Check the found output) does not work.

    PJM

    :throwpc:

    I guess we'll have to get ugly then. "Get Properties" with no property name and no type wired to it will give you a list of all property names. Worst case, that list can be used to verify whether a property is there or not.

    Herbert
