PaulG. Posted March 22, 2006

What would be the best way to acquire 20 µs of data at 1 GS/s (20k points) at a 1 kHz trigger rate? The board I am using seems to be capable, but the bottleneck appears to be in LabVIEW/Windows. Do I need RT for this? I have been at this for quite some time. I'm using a state machine; the VI "free runs" from one state to the next using the external 1 kHz trigger signal. There is no display, and all data points are placed in an array that I dimension before the DAQ sequence starts, using "Replace Array Subset". My board is capable of "multiple records", but that seems to have little effect on performance. Most of the commands to the digitizer use Code Interface Nodes. Have I hit the proverbial wall and need to brush up on my C? Thanks.

PaulG.
peteski Posted March 22, 2006

What would be the best way to acquire 20 µs of data at 1 GS/s (20k points) at a 1 kHz trigger rate?

PaulG.,

Would it be possible to tell us the hardware you are using? 20 MB/s of sustained, undisturbed transfer for an undefined amount of time is likely to be an issue for a Windows-based system, UNLESS there is a "generous" application of buffering involved.

-Pete Liiva
Gary Rubin Posted March 22, 2006

What seems to be the symptom? Are you missing triggers? Overflowing a buffer on the board? I agree that it would help to know which board you're using. Does it have an onboard FIFO for data acquisition and transfer?

Gary
PaulG. Posted March 22, 2006

Would it be possible to tell us the hardware you are using?

It's a Gage 82G. The board has 2 MB of FIFO-style memory and is capable of acquiring multiple records. It will most likely only run for about 2-3 seconds at this rate, then process the data, then run again for 2-3 seconds. I need to do this for up to 30-60 minutes without crashing or running out of memory. I don't save all the data, just process certain portions of it using array offsets later in the code.

PaulG.
peteski Posted March 23, 2006

It's a Gage 82G.

Hmm... I don't think LabVIEW RT is going to do a bit of good; I suspect there would not be any drivers for this board. Perhaps the "state machine" is not doing what you need it to do. For DAQ systems I prefer a two-parallel-loop process, with one loop doing nothing but acquiring the data and stuffing it into a queue, and the other loop feeding off the same queue as fast as it can.

Here is a quick and dirty idea to see if you even have a chance. Try a simple loop where you acquire the data at the rate you intend to in real life, and do nothing with it. Put a bare minimum of timing checks in the loop to see if things loop fast enough to imply that the data is being acquired that fast, and maybe set a display outside the loop to look at the last data set acquired AFTER the loop is terminated, to verify that you were getting real data. Forget about the state machine, data processing, etc. Just initialize an array fed through a shift register to the 20k-sample size and replace it over and over. If you keep it simple enough, you ought to see whether it is even possible to do what you require with your setup.

Would you be willing to show your code for people to look at, to see if there are any "gotchas" in the diagram? You might want to do screen captures, since the code for your Code Interface calls will typically be a pain to transfer properly with the actual VI if you posted that.

-Pete Liiva
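For readers who think better in text than in G, here is a minimal sketch in C of the two-parallel-loop (producer/consumer) pattern described above, using POSIX threads and a fixed ring buffer as the queue. The acquire_record() stub is a hypothetical stand-in for the digitizer driver call, and the record and queue sizes are illustrative, not taken from Gage's API.

```c
/*
 * Producer/consumer sketch: one thread does nothing but acquire records
 * and push them into a ring-buffer queue; a second thread drains the
 * queue and does the analysis/display work. Single producer, single
 * consumer, guarded by one mutex and two condition variables.
 */
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN   20000              /* 20 us at 1 GS/s            */
#define QUEUE_LEN 64                 /* records the queue can hold */

static int16_t queue[QUEUE_LEN][REC_LEN];
static int head = 0, tail = 0, count = 0, done = 0;
static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Hypothetical stand-in for the driver call: block until the next
 * trigger, then copy one record. Here it just synthesizes a ramp so
 * the sketch compiles and runs on its own. */
static void acquire_record(int16_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)i;
}

static void *daq_loop(void *arg)     /* the lean acquisition loop */
{
    (void)arg;
    for (int rec = 0; rec < 2000; rec++) {    /* ~2 s of 1 kHz triggers */
        pthread_mutex_lock(&lock);
        while (count == QUEUE_LEN)            /* queue full: wait       */
            pthread_cond_wait(&not_full, &lock);
        pthread_mutex_unlock(&lock);
        acquire_record(queue[head], REC_LEN); /* no analysis here!      */
        pthread_mutex_lock(&lock);
        head = (head + 1) % QUEUE_LEN;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                                 /* tell consumer to finish */
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *analysis_loop(void *arg)         /* eats from the queue */
{
    (void)arg;
    static int16_t rec[REC_LEN];
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {             /* drained and finished */
            pthread_mutex_unlock(&lock);
            break;
        }
        memcpy(rec, queue[tail], sizeof rec);
        tail = (tail + 1) % QUEUE_LEN;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        /* decimate/analyze/display rec[] here, off the DAQ thread */
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, daq_loop, NULL);
    pthread_create(&c, NULL, analysis_loop, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

In LabVIEW terms, the producer is the lean DAQ while loop, the consumer is the analysis loop, and the ring buffer plays the role of the queue refnum shared between the two loops.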
PaulG. Posted March 23, 2006

Just initialize an array fed through a shift register to the 20k-sample size and replace it over and over.

I managed to get my 20 µs of data at 1 GS/s at 1k pps. I made some modifications: I put a 0 ms wait in the iteration-counter loop, put some millisecond timers from where the DAQ starts to where it stops, and with 2 seconds (2000 triggers) of 1 kHz pulses I'm down to 2003 milliseconds, very consistently. That's as close to real time as I need. And that's acquiring the data and replacing the data subsets in an array. But once I try to display it, even only every 500 milliseconds, it goes up to 2300-2400 ms. Displaying the data is a major roadblock. I suppose I could try displaying a decimated array, but I don't know how much that will buy me. Would a powerful video card help with this by taking some of the display work away from the CPU? Video cards are cheap, and this would be a lot cheaper than another day or two (or three or four) of development time.

PaulG.
peteski Posted March 23, 2006

Well, it sounds like you might have half the challenge behind you and half the challenge ahead of you. Without knowing your particular display needs, it's hard to give specific advice, but I'll shoot off some "random" suggestions nonetheless.

20 MB/s of information is obviously a lot for the average human being to digest visually. Consider your analysis carefully, and then decide what really needs to be displayed to the user in "real time". Keep in mind that a user is unlikely to keep their attention on a screen for more than maybe a minute, at best, on a good day, with just enough but not too much coffee, etc. You may be able to use these facts to your advantage. I'll guess that you have waveforms of 20,000 points each. Your monitor is on the order of 1000 or so pixels wide, so you could bin your waveform down to fewer than 1000 points and keep track of a max, mean, and min for each bin. It should be possible to do that fast enough, but it means being careful in how you do your operations: additions are quick, but multiplications are slow, at least at these timescales. You should only update the screen once a second or so at the fastest; anything faster may look sexy, but it's not going to impart enough of a lasting impression.

I'd strongly suggest going with the two-parallel-loop shared-queue solution, with your analysis in one loop and your DAQ in the other. You might then even have a third parallel loop, a display loop, that shares a display queue with the analysis loop and is fed "well distilled" information once a second or so, as an indicator of what is going on "under the hood". Do NOT try to display OR do data analysis in the data acquisition loop, please. I'm taking a wild guess that this is what kicks you into the 2300-2400 ms range. If you need to display, feed a queue and let a parallel loop eat from the queue; let that loop do the analysis, update the display, take out the garbage... Displaying and analyzing in the data acquisition loop is fine for slower, more pedestrian applications, but it will simply not do for high-speed applications such as the one you are working on.

I hope this helps!

-Pete Liiva

P.S. Is this a LIDAR application? It sounds not entirely unlike some things we do around where I work.
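A sketch of the binning reduction suggested above, in C: each 20,000-point record collapses to 1,000 display bins, each carrying a min, mean, and max, with only additions and comparisons in the inner loop and a single divide per bin at the end. The sizes and names are illustrative.

```c
/*
 * Min/mean/max binning for display: collapse a 20,000-point record into
 * 1,000 screen bins. Only additions and comparisons in the hot loop;
 * one divide per bin at the end.
 */
#include <stdint.h>

#define REC_LEN  20000
#define N_BINS   1000
#define BIN_SIZE (REC_LEN / N_BINS)     /* 20 samples per bin */

typedef struct {
    int16_t min;
    int16_t max;
    float   mean;
} Bin;

void bin_waveform(const int16_t *rec, Bin out[N_BINS])
{
    for (int b = 0; b < N_BINS; b++) {
        const int16_t *s = rec + b * BIN_SIZE;
        int32_t sum = s[0];
        int16_t lo = s[0], hi = s[0];
        for (int i = 1; i < BIN_SIZE; i++) {
            sum += s[i];                /* additions only       */
            if (s[i] < lo) lo = s[i];   /* comparisons only     */
            if (s[i] > hi) hi = s[i];
        }
        out[b].min  = lo;
        out[b].max  = hi;
        out[b].mean = (float)sum / BIN_SIZE;  /* one divide per bin */
    }
}
```

One advantage over plain decimation (plotting every 20th point): drawing the per-bin min and max as a band preserves narrow echo spikes that simple decimation would drop between samples.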
PaulG. Posted March 23, 2006

Do NOT try to display OR do data analysis in the data acquisition loop, please.

Our application is an ultrasonic NDT application. I think I have a lot to work with this morning. I do have to have some visual display during acquisition, but the visual representation only needs to be just that. I can decimate the array and display only the decimated portion. I can cut it down by a factor of at least 20, refresh the display only once per second, interpolate the graph, and it looks fine. Also, the data analysis happens after each series of scan pulses, so I don't have to worry about that... for now. Thanks for your help.

PaulG.
peteski Posted March 23, 2006

Our application is an ultrasonic NDT application.

I'm somewhat surprised that a full 1 GHz is necessary for an ultrasonic non-destructive testing application, but then again I'm used to the speed of sound in air (much slower than in a solid, I hear! (all pun intended)), and I imagine you are trying to pin down the edge (and phase?) of some form of echo pulse(s). Sounds like a neat project! :thumbup:

Again, I suggest that if you want continuous DAQ, don't decimate or analyze in the same loop the DAQ is running in. Because of the onboard FIFO of the card, if the code interface calls are set up properly, your PC should have a little "free time" on its hands while the acquisition is occurring. If your analysis tasks are in a parallel loop, those tasks can happen while the PC waits for the next data dump out of the card's FIFO. If you don't do it in parallel, the PC has to:

(1) ask for the data
(2) wait for the data (maybe wasting CPU cycles in the process!!)
(3) receive the data
(4) then do the analysis
(5) maybe then occasionally update a display
(6) then come back around and go back to step (1)

Best of luck!

-Pete Liiva
Dan Bookwalter Posted March 23, 2006

Several years ago I had a DAQ application that required me to push the CPU to the limit before I did any analysis/display of the data. I ended up using queues along with TCP/IP to push the data to a second PC that did all the analysis/display/storage. Just another idea...

Dan
peteski Posted March 23, 2006

I ended up using queues along with TCP/IP to push the data to a second PC that did all the analysis/display/storage.

Yes, I fully agree! Been there and done that, and have a couple of T-shirts as a result (yes, literally!). In fact, my avatar happens to be "loosely based" on the program I did that for. It was my full immersion into queues for a variety of aspects of a test and measurement system. TCP/IP is something I had played with prior to that project, though. It's a fantastic protocol, and for my purposes I find it well implemented in LabVIEW.

I would tend to start with a parallel loop on the same machine first, though; it requires a little less development and less equipment. Maybe with the new gigabit NICs you can stream all that raw data (1 Gbit/s × 0.4 "empty" network efficiency ÷ 8 bits per byte = ~50 MB/s, hmm... not bad...), but you need to be aware of whether you are using a crossover cable, a switch, or a hub. Most importantly in such a case, IMHO, keep the network between these "lab" machines disconnected from all other machines, and especially from the Internet!

-Pete Liiva
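For a picture of what the two-PC scheme might look like on the sending side, here is a minimal sketch in C using POSIX sockets (a Windows build would use Winsock, which additionally needs WSAStartup and closesocket). The address, port, and the source of the record buffer are assumptions for illustration, and error handling is trimmed to the basics.

```c
/*
 * Sender side of the two-PC scheme: stream raw 20k-sample records to an
 * analysis machine over TCP. send_all() loops because send() may write
 * fewer bytes than asked for.
 */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define REC_LEN 20000

static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;                  /* connection lost */
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(5025),      /* port: an assumption */
    };
    inet_pton(AF_INET, "192.168.0.2", &addr.sin_addr);  /* analysis PC */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("connect");
        return 1;
    }

    static int16_t rec[REC_LEN];        /* would be fed from the DAQ queue */
    for (int i = 0; i < 2000; i++)      /* ~2 s of records at 1 kHz        */
        if (send_all(fd, rec, sizeof rec) < 0)
            break;

    close(fd);
    return 0;
}
```

At 40,000 bytes per record and 1,000 records per second, this is 40 MB/s, which is why the ~50 MB/s back-of-the-envelope estimate above matters: a dedicated gigabit link is plausibly enough, a shared 100 Mbit network is not.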
Gary Rubin Posted March 23, 2006

Because of the onboard FIFO of the card... your PC should have a little "free time" on its hands.

My experience with Gage (I was using a CS14200 board) is that they do not do double-buffering in their FIFO like most A/D cards do. As a result, you cannot acquire data while you are busy offloading. This means that your acquisition software loop has to be MUCH leaner than with most A/D cards. I don't have the documentation anymore (it was delivered to the customer along with the system), but my recollection is that there was a graph somewhere showing maximum PRF for a given sample rate and sample length. Of course, this would be a little different using the multiple-record feature.

I also found that the LabVIEW drivers you can get from Gage simply call some pre-compiled higher-level functions. These were quite slow. I used my own calls to the DLL to DMA the data and rearm the card. The other weird thing I found (again, this was with the CS14200 board) is that it would always give me 12 more samples than I asked for.

I hope this is helpful.

Gary
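A sketch of that direct-DLL approach in C: the Win32 LoadLibraryA/GetProcAddress calls are real, but the DLL name, the gage_arm and gage_dma_record entry points, and their signatures are all hypothetical stand-ins, since the actual GageScope SDK exports are not given in this thread. The buffer is padded for the 12 extra samples mentioned above.

```c
/*
 * Calling the vendor DLL directly instead of going through its slow
 * LabVIEW wrapper VIs. LoadLibraryA/GetProcAddress are real Win32
 * calls; the gage_* names and signatures below are HYPOTHETICAL
 * stand-ins for whatever the GageScope SDK actually exports.
 */
#include <windows.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical entry points: arm the card for the next trigger, and
 * DMA one captured record into host memory. */
typedef int (__stdcall *ArmFn)(void);
typedef int (__stdcall *DmaFn)(int16_t *dst, int nsamples);

int main(void)
{
    HMODULE dll = LoadLibraryA("gagedrv.dll");   /* name: an assumption */
    if (!dll) { fprintf(stderr, "driver DLL not found\n"); return 1; }

    ArmFn arm = (ArmFn)GetProcAddress(dll, "gage_arm");        /* hypothetical */
    DmaFn dma = (DmaFn)GetProcAddress(dll, "gage_dma_record"); /* hypothetical */
    if (!arm || !dma) { fprintf(stderr, "missing exports\n"); return 1; }

    /* Allow for the 12 extra samples the CS14200 returned per record. */
    static int16_t rec[20000 + 12];

    for (int i = 0; i < 2000; i++) {    /* lean loop: arm, DMA, repeat */
        if (arm() != 0) break;
        if (dma(rec, 20000 + 12) != 0) break;
        /* hand rec off to a queue here; no analysis in this loop */
    }
    FreeLibrary(dll);
    return 0;
}
```

The point of the sketch is the shape of the loop: with no double-buffering on the card, anything between the DMA and the rearm eats directly into the time available before the next trigger.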
peteski Posted March 23, 2006

My experience with Gage (I was using a CS14200 board) is that they do not do double-buffering in their FIFO.

I've never dealt with Gage, so it's good to know to be on the lookout for this! I know that if certain things aren't implemented well in the DLLs, what looks good in the hardware specs can go to waste in the "bottlenecks" created along the way. But still, even if double-buffering is not happening on the card side, in this particular application it might not matter too much, since there would seem to be a 2% duty cycle on the acquisition itself (20 µs every 1 ms). If the DLL seizes control of the CPU from the start of the request for data until the moment it is transferred to memory, then that would be a royal pain in the...! Regardless, I agree with you that keeping the data acquisition loop as lean as possible is simply the best bet!

-Pete Liiva
Gary Rubin Posted March 23, 2006

...there would seem to be a 2% duty cycle on the acquisition itself (20 µs every 1 ms).

I think I was able to do about 2 kHz max with a considerably higher duty cycle. That was DMA'ing the data straight to another card, though, so I didn't have to wait for the OS/LabVIEW to deal with the data arrays.

It'd be a bit different with multiple record. In that case, you store several triggers' worth of data and offload it all at once. It allows you to acquire at a much higher PRF for short periods of time, with the penalty that it takes even longer to offload the data when you're done with the acquisition.
peteski Posted March 23, 2006

It'd be a bit different with multiple record... you store several triggers' worth of data and offload it all at once.

Yes, at which point a second card might be an interesting option: acquire on one card as you download from the other, sort of a "rich man's" double buffering, to bastardize a term. Of course, some consideration then has to be given to properly balancing the split signal, so as not to distort the original. And then there are signal losses, and maybe some other issues I'm forgetting at the moment.

-Pete Liiva