LAVA 1.0 Content Posted June 14, 2006 I've been working on a project that requires logging multiple data streams via TCP at about 2.5 Mbit/s each. The messages are fixed length and contain a counter at the beginning, used to determine timing and to detect missing data. I have an architecture based on a tight loop that pulls data from the TCP connection and places it in a queue. A logger task pulls data from the queue and writes it to disk. I have written a serving VI that simulates my data device, and have been able to source and log data at about 4.5 Mbit/s over a gigabit point-to-point Ethernet segment with no data loss. I now need to add event detection logging. Events will be determined by placing a subset of the messages into a UI data queue that will be evaluated separately. I was thinking of just storing the counter when an event occurs and using that to determine an offset position in the file. I'm not sure how large the log files are going to get; they may exceed 4 GB and require multiple files for a single test. I've been reading various app notes on NI's site about using circular buffers to log event data, such as "Deterministic Data Streaming in Distributed Data Acquisition Systems" and "Event-Driven Circular Hard Disk Data Buffering". This is my first high-speed logging application. Am I approaching this the right way by using queues, or are there factors I have not yet discovered that will cause me grief later on? I'm concerned about switching to a circular buffer and possibly losing data if my logger function is delayed by some sort of system event. I've got 2 GB of physical memory, and the queue works well so far. I'm seeing about 8-12% CPU utilization with a single channel. The additional channels will each have their own Ethernet controller (again, point to point).
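The idea of turning a stored counter into a file position can be sketched in text form. This is only an illustration in Python, not the poster's LabVIEW code. It assumes the counter increases by exactly one per message, never wraps, no messages are missing from the log, and every log file holds a whole number of messages. The 128-byte message size, the 2 GiB file cap, and the name `locate_event` are hypothetical.

```python
MESSAGE_SIZE = 128             # bytes per fixed-length message (assumed value)
MAX_FILE_BYTES = 2 * 1024**3   # hypothetical cap before rolling to a new file


def locate_event(counter, first_counter,
                 message_size=MESSAGE_SIZE, max_file_bytes=MAX_FILE_BYTES):
    """Return (file_index, byte_offset) of the message carrying `counter`.

    `first_counter` is the counter value of the very first logged message.
    """
    messages_per_file = max_file_bytes // message_size
    index = counter - first_counter
    if index < 0:
        raise ValueError("counter precedes the start of the log")
    # Which file the message is in, and its position within that file.
    file_index, message_in_file = divmod(index, messages_per_file)
    return file_index, message_in_file * message_size
```

If messages can be dropped, the computed offset is only a starting guess; the counter stored in the message at that offset would have to be read back and the position corrected.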
Mike Ashe Posted June 14, 2006 Report Posted June 14, 2006 Right now it looks like you have a good architecture worked out and your CPU utilization has plenty of engineering margin. I try to keep my utilization below 50%, and I like 40% for comfort. You are well below that. I would test things out quickly by creating a prototype of your new event additions. Don't try to simulate everything, just the extra data throughput. Dummy up at least twice the maximum additional data you expect and pump that through your system and check for strain. If you are still under 40% I would say you probably want to stick with your queues and current architecture. If you go over 50% or show spikes or data drop after a reasonable test run time, then you may want to think about architectural changes. Otherwise, stick with what seems to be winning in your current paradigm. Good luck. Quote
Grampa_of_Oliva_n_Eden Posted June 14, 2006 Report Posted June 14, 2006 Right now it looks like you have a good architecture worked out and your CPU utilization has plenty of engineering margin. I try to keep my utilization below 50%, and I like 40% for comfort. You are well below that.I would test things out quickly by creating a prototype of your new event additions. Don't try to simulate everything, just the extra data throughput. Dummy up at least twice the maximum additional data you expect and pump that through your system and check for strain. If you are still under 40% I would say you probably want to stick with your queues and current architecture. If you go over 50% or show spikes or data drop after a reasonable test run time, then you may want to think about architectural changes. Otherwise, stick with what seems to be winning in your current paradigm. Good luck. Im betting on the queue to win. Ever since the queue data type could be defined, it has blown away any LV2 I can develop, .... unless the data has to go to more than one place. Ben Quote
LAVA 1.0 Content Posted July 31, 2006 "unless the data has to go to more than one place" Well, I'm back... The data does indeed need to go to "more than one place". I've learned that there is data that must be evaluated AND shown to an operator before and during actual logging to file. I've created a new VI that reads the original raw data queue (TCP strings) and performs a least squares fit on a subset of that data. Within this new VI I create a second data queue to which I "forward" the raw rate data (TCP strings). When I need to start the actual logging, I manually start the original logger VI, perform a lookup in an LV2 global to retrieve the reference to the secondary data queue, then log to disk. I've got enough memory to support the dual queues for now; I can decrease my queue sizes somewhat if I run into problems. Currently, my VIs are started Reentrant (option 8) with Autodispose references set to true. When testing ends, I send a notification to the top-level TCP receiver VI to close. The receiver closes the TCP session, then moves to a pause state where it waits for the first data queue's Queue Status to return empty, destroys the first data queue and exits. The least squares fit VI errors out reading the destroyed first data queue, then moves to a pause state where it waits for the secondary queue to empty, then destroys the secondary data queue and exits. The independently started logger VI errors out reading the secondary queue and closes its file handles. This seems to work, but I fear that I won't close all the threads and references properly. Should I add all of my references to my LV2 global and add an action to check/close all these references? Is this design reliable and scalable to three "channels" of TCP receivers?
Aristos Queue Posted August 26, 2006 Report Posted August 26, 2006 Well, I'm back... The data does indeed need to go "more than one place" I've learned that there is data that must be evaluated AND shown to an operator before and during actual logging to file. I've created a new vi that reads the original raw data queue (tcp strings) and performs a least squares fit on a subset of that data. I create a second data queue within this new vi that I "forward" the raw rate data (tcp strings) to. Sounds like a good design. By the way, under the hood the queues are a very efficient circular in-memory buffer. They don't do any data copies when enqueing and dequeing unless you fork the wire to take the data elsewhere on the diagram (and even then only if the "elsewhere" is a function that will modify the value). Quote
LAVA 1.0 Content Posted September 11, 2006 "By the way, under the hood the queues are a very efficient circular in-memory buffer." I find myself repeatedly writing code to check the timeout of the Enqueue Element function, then placing a case statement that will Dequeue one element to make room for the new element that timed out. If LabVIEW queues are implemented as circular buffers, is there a better way to do this? (See "How to Write to a Full Buffer".) LabVIEW can handle cases 1 and 4 (from the link above), and sort of handles the third case by timing out with a time of zero milliseconds. I'm concerned that my implementation of Case 2 with "Enqueue -> Timeout True? -> Dequeue Element -> Enqueue Element" will add to the CPU load when working with multiple TCP data streams. It would be nice to have an option on "Obtain Queue" for defining how a size-limited queue behaves when full: an enumeration with "Ignore Oldest" and "Ignore Newest". A companion to this would be an additional "Late?" output on the Dequeue Element and Flush Queue functions that could be used to monitor the servicing of the queue. In timed loops, this could even be used to dynamically adjust the loop rate to tune CPU usage.
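The "Enqueue -> Timeout True? -> Dequeue Element -> Enqueue Element" sequence described above can be sketched in Python with the standard `queue` module, purely as a textual stand-in for the LabVIEW diagram. The function name `lossy_put` is hypothetical, and the sketch assumes a single producer per queue, so that after one slot has been freed nothing else can fill it before the second enqueue.

```python
import queue


def lossy_put(q, item):
    """Enqueue `item`; if the queue is full, discard the oldest element first.

    Returns True when an old element was dropped to make room.
    Assumes this is the only producer writing to `q`.
    """
    try:
        q.put_nowait(item)      # zero-timeout enqueue
        return False
    except queue.Full:
        pass                    # the enqueue "timed out"
    try:
        q.get_nowait()          # make room by dropping the oldest element
        dropped = True
    except queue.Empty:
        dropped = False         # the consumer drained the queue in the meantime
    q.put_nowait(item)          # room is guaranteed with a single producer
    return dropped
```

The extra work happens only on the full-queue path, so while the consumer keeps up the cost is one enqueue attempt per message.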
Dan Press Posted September 11, 2006 Report Posted September 11, 2006 So far you seem to be on the right track. I'm just responding to your suggestions for how to change the behavior of queues. It looks to me like we already have all the tools to define how our queues behave. The options for handling how the queue acts when full are all available to you at the Enqueue function. There is nothing wrong with using the timeouts and reacting to them. You've already written one example of how to customize the Enqueue behavior. Just select that inner part of your diagram and Create SubVI! You can even add the other options to give that subVI a more general applicability. Perhaps wire up an enumerated type with your choices of Ignore Oldest, Ignore Newest, Wait, etc. Similarly, you could create your own Dequeue subVI that reports the number of elements in the queue and/or the "fullness %" on which you could then perform some math to determine a period for a timed loop. I could see a use case where you would want to employ the Ignore Newest or Oldest in some parts of your code, but not in others. For example, you could have multiple message types that travel on the same queue. Some message types could be of a higher priority than others and would therefore be given the authority to bump off the oldest elements on the queue. Other lower priority messages would have to wait until there is space. As an aside, you're using TCP so I would be very surprised of you experience data loss due to the network. TCP is a "guaranteed delivery" protocol. If you switch to UDP, you can achieve higher throughput while perhaps risking data loss due to the lack of error checking and such. It sounds like you've got a cool project. We worked on something that had similar requirements and queues turned out to be just the thing. Quote
Kevin P Posted September 12, 2006 Report Posted September 12, 2006 It would be nice to have an option to "Obtain Queue" for defining how a size limited queue behaves when full; and enumeration with "Ignore Oldest","Ignore Newest". I was recently musing about something similar on the NI forums. Specifically, I've been finding fairly frequent need for a behavior more like lossy circular buffering. I'd like to fix the size of the circular buffer, and then the freshest data keeps circularly overwriting the oldest data. The UI thread could then asynchronously perform analysis on the most recent N samples, acting like a sliding window. The other behavior I'd like in a circular buffer would be the ability to query data in a manner like the DAQ circular buffers, i.e., specify Read Marks, Offsets, # Samples to Read, etc. The trouble with writing little wrappers that accomplish something similar using queues is the need to keep re-writing them for different datatypes as the need arises. Besides, the code to retrieve the most recent 1024 samples in a size 16384 buffer seems pretty clunky using queues. -Kevin P. Quote
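The lossy circular buffer with a "most recent N samples" read can be sketched as follows. This is a Python illustration, not a LabVIEW feature; `LossyRing` and its methods are hypothetical names. It relies on `collections.deque` with `maxlen`, which silently discards the oldest item when a new one is appended to a full deque, and it takes a lock so a reader never iterates while a writer appends.

```python
import threading
from collections import deque


class LossyRing:
    """Fixed-size buffer in which the newest samples overwrite the oldest."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def write(self, sample):
        with self._lock:
            self._buf.append(sample)    # drops the oldest sample when full

    def latest(self, n):
        """Return up to `n` of the most recent samples, oldest first."""
        with self._lock:
            snapshot = list(self._buf)  # copy while holding the lock
        return snapshot[-n:] if n > 0 else []
```

Reading does not remove anything, so a display loop can call `latest(1024)` on a 16384-sample buffer as a sliding window. The full copy inside `latest` keeps the sketch simple; a tuned version would copy only the tail.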
LAVA 1.0 Content Posted September 14, 2006 Author Report Posted September 14, 2006 The trouble with writing little wrappers that accomplish something similar using queues is the need to keep re-writing them for different datatypes as the need arises. Besides, the code to retrieve the most recent 1024 samples in a size 16384 buffer seems pretty clunky using queues. Exactly! :thumbup: Lossy Queues would eliminate this. How much extra code and CPU overhead does the current solution involve (enqueue, timeout, eval timeout, dequeue element, enqueue element)? If the Queue was defined as lossy, then it would only need to move the pointer internally +1 and store the data. The Queue must have pointers defined internally that determine the Start and End of data indexes to perform a flush; so the Late? flag would be true when End and Start are adjacent. A "# of Elements" input to Dequeue Element would make 1 call to the underlying Queue, rather than 1024 dequeues while trying to simultaneously load the queue at a high rate. How about a block size input to Obtain Queue with a corresponding Notifier output? The Notifier would be triggered from within the queue each time a complete block was available. The Notifier would contain the index to the block. A Dequeue Block function would return the requested block. And please, don't bring up "Use a variant data type, then you can write generic wrappers" This is about making the queues easy and fast, not about " ways to skin the cat".... I’m betting on the queue to win. Ever since the queue data type could be defined, it has blown away any LV2 I can develop, Queues are proven to be more efficient, I'm just hoping to get the most out of them. I feel as if I'm tacking CB antennas, bumper protectors, and deer whistles onto a Ferrari... If you go over 50% or show spikes or data drop after a reasonable test run time, then you may want to think about architectural changes. 
As a quick note, I did see some serious CPU spikes (> 80%) while monitoring CPU load for multiple channels at higher rates. Using Windows Performance Monitor I determined that the spikes coincided with disk I/O. It appears that Windows was buffering large amounts of data before flushing it. I added a LabVIEW "Flush File" after every 8th "Flush Queue / Write to Binary File" loop. My CPU load went almost flat, even for three channels. The interval may need to change as my data rates or message size increase... It might have something to do with the RAID disk controller, or some Windows "Delayed Write" setting somewhere, but I was able to manage it from LabVIEW directly, and the problem wasn't related to queues.... :thumbup: Quote
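The "flush every 8th write" fix can be sketched in Python. Treating `f.flush()` plus `os.fsync()` as the counterpart of LabVIEW's Flush File is an assumption, and `write_batches` is a hypothetical name. Each batch stands for the array of messages returned by one Flush Queue call.

```python
import os


def write_batches(path, batches, flush_every=8):
    """Append each batch of byte strings, forcing data to disk periodically.

    Returns the total number of bytes written.
    """
    written = 0
    with open(path, "ab") as f:
        for i, batch in enumerate(batches, start=1):
            written += f.write(b"".join(batch))   # one bulk write per batch
            if i % flush_every == 0:
                f.flush()               # push Python's buffer to the OS
                os.fsync(f.fileno())    # ask the OS to write it to disk now
    return written
```

Flushing small amounts at a regular interval spreads the disk work out instead of letting the OS accumulate one large delayed write. As the post notes, the interval would need retuning if the data rate or message size changes.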
Aristos Queue Posted September 14, 2006 Report Posted September 14, 2006 LabVIEW can handle cases 1 and 4 (from link above), and sort of handles the third case by timing out with a time of zero milliseconds. I'm concerned that my implimentation of Case 2 with "Enqueue -> Timeout True? -> Dequeue Element -> Enqueue Element" will add to the CPU load when working with multiple TCP data streams. I looked long and hard at your picture. A few thoughts... a) Are you worried about the unnecessary loss of data? There's no "critical section" protecting the "enqueue, dequeue, enqueue" sequence. Suppose the producer VI tries to enqueue and fails. Ok, so it goes to the dequeue to make room. In the time it takes to do this, the consumer VI has dequeued an element. There's no need for you to do the dequeue, but you don't know that. I don't think this matters -- most lossy streams, such as video conference communication packets, don't really care what packets get dropped. If this matters, a semaphore aquire/release around the "enqueue, dequeue, enqueue" and the same semaphore acquire/release around the dequeue in the consumer loop would fix the problem. b) I think I can suggest a better performing way of doing the "enqueue, dequeue, enqueue." Your current implementation will fork the data being added to the queue and will hurt performance for large data elements. Try this: The Queue Status primitive generates no code for any terminal that is unwired. So if you do not wire the "Elements Out" it will not duplicate the contents of the queue, nor will it take the time to evaluate any of the unwired terminals. Fetching the current element count is very fast, and this avoids ever forking your data wire. Forking the wire is a big deal since if the wire is forked it prevents the Enqueue from taking advantage of one of its biggest speed optimizations and it guarantees a copy of the data will be made at the fork. 
(PS: The 0 that I've wired to the timing input of the dequeue inside the case structure is important... you might detect that the queue is full, so you go to dequeue an element... in the time between when you detect the queue is full and the dequeue, the consumer loop might speed ahead, dequeue all the remaining elements and leave the queue empty. If the timeout terminal of the dequeue is unwired, the dequeue would hang indefinitely waiting for someone to enqueue data. These are the sorts of gotchas that multi threading opens up for you. ) c) I know you said "and don't tell me about variants." Although probably not the solution at the moment for whatever it is that you're working on, as time goes on I would expect those utility VIs that you discuss to be writable with LabVIEW classes, where there is no extra data allocation when you upcast. Over time I believe that users will find a lot of utility in rewriting functionality, particularly with generic data communications systems like the queues or LV2 globals, using the LabVIEW classes for maximum code reuse and minimum "genericization" overhead. Just a thought... I'm downplaying the possibilities here since I've lately been accused of suggesting LV classes as *the* silver bullet for all of LV's problems. I want to keep expectations realistic, but I do think there's benefit in this arena. Quote
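Point (a) above, guarding the "check, dequeue, enqueue" sequence and the consumer's dequeue with the same semaphore, can be sketched in Python. This is not how LabVIEW queues are built internally; `GuardedLossyQueue` is a hypothetical class that uses one `threading.Lock` in place of the semaphore. Returning `None` from `get` when nothing is available stands in for a zero-timeout dequeue that times out.

```python
import threading
from collections import deque


class GuardedLossyQueue:
    """Bounded queue whose "full? -> drop oldest -> enqueue" step is atomic."""

    def __init__(self, maxsize):
        self._items = deque()
        self._maxsize = maxsize
        self._lock = threading.Lock()   # plays the role of the semaphore
        self.dropped = 0                # elements discarded to make room

    def put(self, item):
        with self._lock:                # critical section for the producer
            if len(self._items) >= self._maxsize:
                self._items.popleft()   # drop only if the queue is still full
                self.dropped += 1
            self._items.append(item)

    def get(self):
        with self._lock:                # the consumer takes the same lock
            return self._items.popleft() if self._items else None
```

Because the fullness check and the drop happen under one lock, the producer can no longer discard an element that the consumer has just made unnecessary to drop. The cost is that producer and consumer serialize on the lock.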
LAVA 1.0 Content Posted September 15, 2006 "The Queue Status primitive generates no code for any terminal that is unwired. So if you do not wire the "Elements Out" it will not duplicate the contents of the queue, nor will it take the time to evaluate any of the unwired terminals. Fetching the current element count is very fast, and this avoids ever forking your data wire. Forking the wire is a big deal since if the wire is forked it prevents the Enqueue from taking advantage of one of its biggest speed optimizations and it guarantees a copy of the data will be made at the fork." :thumbup: "Generates no code for unwired terminals" is great news! This is something I was unaware of; I've generally avoided using "Get Queue Status" (think polling). I'll definitely look at and test your suggestion. "Just a thought... I'm downplaying the possibilities (of using the LabVIEW classes) here since I've lately been accused of suggesting LV classes as *the* silver bullet for all of LV's problems." You know LVOOP and its possibilities as well as or better than anyone. If it offers a better, faster or more robust solution to a problem, then let us know! As that old Texan sayin' goes: "It ain't Braggin'"--if it's true! I haven't seriously played with classes yet, but I enjoy reading the 'spirited' discussions on LAVA and Info-LabVIEW. I've spent the last year digging into multithreading, queues, notifiers and Preferred Execution Thread. I look forward to learning LVOOP and adding it to my programming arsenal. I think LVOOP needs a first "real world" example that people could use. Most of the discussions seem more academic to me. I don't care where it might be a problem, show me what it can do! I've been thinking of a twist on the "Getting Started" example. How about adapting it to be a Toolbar class that could be reused by everyday LabVIEW users (like me), maybe placed in their own UI as a subpanel or in a pane? Just thinking.....
Gary Rubin Posted September 15, 2006 While we're on the topic of queues, I have some pretty basic questions about them. I use LV2 Globals for passing data asynchronously between acquisition and processing routines, but have become curious about the applicability of queues, especially after reading about their efficiency. I just played with the LV7.1 queue examples and read the online help about queues, but I still feel like I'm not understanding their true power. My typical LV2 Global looks like this:
2 Shift Registers: one containing an array and the other containing a Count I32 scalar
Init Case: Initialize array and put into shift register
Put Case: Take in new data, put into shift register array using Replace Array Subset and increment Count
Get Case: Use Array Subset, get data up to Count, and reset Count to zero.
Typically, my put case (acquisition) will run in a much faster loop than my get case (processing/display/storage), meaning that I will be "getting" arrays of data, after having "put" many scalars. I was thinking about whether replacing this approach with queues would be beneficial, and here's what I'm still not sure about: From what I can tell, you can only dequeue the same datatype that you enqueue. I would have assumed that if I enqueue a scalar in the form of an array of length = 1, then dequeue the array, it would have given me the whole contents of the queue, but this wasn't the case. I also don't see any sort of way of saying "dequeue the last N queued elements". Therefore, if I wanted to use queues for this type of data transfer, I would have to put the Dequeue, along with Get Queue Status, in a while loop and run it until the queue is empty. Is that correct? If so, can that really be more efficient than what I'm doing now? Thanks, Gary
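For readers who do not use LabVIEW, the Init/Put/Get LV2 global described above behaves roughly like the following Python sketch. The class name `FunctionalGlobal` is hypothetical, and the lock is an assumption that models the fact that a non-reentrant VI lets only one caller execute at a time.

```python
import threading


class FunctionalGlobal:
    """Preallocated array plus a count, with Put and Get actions."""

    def __init__(self, capacity):
        # Init case: allocate the array once and reset the count.
        self._data = [0.0] * capacity
        self._count = 0
        self._lock = threading.Lock()   # one caller at a time

    def put(self, value):
        # Put case: replace one element in place and increment the count.
        with self._lock:
            if self._count >= len(self._data):
                raise OverflowError("buffer full; Get was not called in time")
            self._data[self._count] = value
            self._count += 1

    def get(self):
        # Get case: return the data up to the count, then reset the count.
        with self._lock:
            out = self._data[:self._count]
            self._count = 0
            return out
```

Every `put` and `get` goes through the same lock, so a fast acquisition loop and a slow processing loop take turns. That serialization is the point of comparison with queues raised in the question.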
LAVA 1.0 Content Posted September 15, 2006 "I was thinking about whether replacing this approach with queues would be beneficial, and here's what I'm still not sure about: From what I can tell, you can only dequeue the same datatype that you enqueue. I would have assumed that if I enqueue a scalar in the form of an array of length = 1, then dequeue the array, it would have given me the whole contents of the queue, but this wasn't the case. I also don't see any sort of way of saying "dequeue the last N queued elements". Therefore, if I wanted to use queues for this type of data transfer, I would have to put the Dequeue, along with Get Queue Status, in a while loop and run it until the queue is empty. Is that correct? If so, can that really be more efficient than what I'm doing now?" Well, after professing to have spent the last year learning these things, I'll try to answer. "From what I can tell, you can only dequeue the same datatype that you enqueue." In a strict sense, the answer is "True". In order to create a queue (Obtain Queue) you must provide a data type. This can be as basic as a scalar, or as complex as a cluster of clusters. You may have noticed my emoticon when I mentioned variants. I actually answered a question in another thread about the ADVANTAGES of using variants. So, you actually can enqueue different data types, but you have to know what to do with the data when you dequeue it. "Therefore, if I wanted to use queues for this type of data transfer, I would have to put the Dequeue, along with Get Queue Status, in a while loop and run it until the queue is empty." There is a nice function called "Flush Queue". This will return all of the available elements from the queue, and empty it. This is how I log my data to disk. I take the array of TCP strings and write them to a binary file.
The real gain in using a queue over an LV2 (Functional) Global is that in order for the Functional Global to store data, it must 'block'. That is, while you are replacing elements in the global, the reader must wait. If it wasn't 'blocking' (if the Functional Global was re-entrant) it couldn't store data in its internal shift registers. The underlying LabVIEW implementation of the queue is a circular buffer with multiple pointers. Enqueue Element can be writing to one position in the internal array while Dequeue Element can be extracting a different element, and the queue management code updates the pointers. The discussion about 'lossy queues' doesn't really matter much as long as you can dequeue data faster than you can enqueue it. You might ask how, given that disk I/O is a slow operation: a bulk write of 1000 strings is more efficient than 1000 individual writes. There is one limitation: you can only Dequeue Element or Flush Queue. There is no provision for dequeueing N elements; you must put that in a loop. I may be wrong, but I think the Enqueue Element function is actually polymorphic; you can provide a single element, or an array of elements of the type specified when the queue was obtained. As Ben stated and I have observed, queues should prove to be faster than any LV2 Functional Global you can wire up. Hope I answered your question(s).
Gary Rubin Posted September 15, 2006 Report Posted September 15, 2006 There is a nice function called "Flush Queue" This will return all of the available elements from the Queue, and empty it. ... Hope I answered your question(s) That's exactly what I was looking for! Thanks! Quote
Kevin P Posted September 15, 2006 Report Posted September 15, 2006 One other little note: when you have a queue of 1D arrays of <whatever>, each enqueue operation can pass in a different size array, i.e., an array of 49 elements, followed by an array of 3117 elements, an array of 3 elements, etc. Of course, this may not necessarily be the friendliest thing to do to your downstream data consumer... -Kevin P. Quote
Gary Rubin Posted September 15, 2006 Report Posted September 15, 2006 One other little note: when you have a queue of 1D arrays of <whatever>, each enqueue operation can pass in a different size array, i.e., an array of 49 elements, followed by an array of 3117 elements, an array of 3 elements, etc. Of course, this may not necessarily be the friendliest thing to do to your downstream data consumer... That's the way my LV2 works right now. It doesn't make for smooth dataflow, but it makes sure that the consumer is not waiting for more when it could be processing what's available. Quote
Aristos Queue Posted September 16, 2006 Report Posted September 16, 2006 One other little note: when you have a queue of 1D arrays of <whatever>, each enqueue operation can pass in a different size array, i.e., an array of 49 elements, followed by an array of 3117 elements, an array of 3 elements, etc. Of course, this may not necessarily be the friendliest thing to do to your downstream data consumer...-Kevin P. Curious that you should bring up queues of arrays today.... For reasons of my own, I was reviewing the behind-the-scenes code of the queue primitives today. One of the biggest use cases I have for them in my own programming is tree traversal. For example, control reference traversal: enqueue all the controls on the front panel, then in a loop, dequeue them one by one to do something to them. If the one you dequeue is a tab control or a cluster, enqueue all the sub controls. Continue looping until the queue is empty. Now, what I usually do is create a queue of my base type (for example, control refnum). When I have an array of elements, I drop a for loop and enqueue each element one by one. That way they're all available for dequeue. I got to thinking -- there are some times when I'm enqueuing each individual element and then at the dequeue I'm building an array of elements back up (maybe a filtered list of the elements, for example). In these cases, it seems like it might be beneficial to have two queues, one that handles single elements and one that handles arrays of elements so that I never bother tearing down the array structure if I'm just going to build it again. Of course, this is for traverals where the order of traversal doesn't matter, since you would then dequeue from the lone element queue until it was empty then dequeue from the array queue. 
Since the queues try to simply take ownership of the wire's data and not make a copy of the data (unless the enqueue wire is forked to some other node in which case it has to make a copy), it might make sense in some cases to let the enqueue take ownership of the entire array of elements. I don't have any specific examples at this point. And I don't have any evidence that this would ever be advantageous. It's just one of those passing hunches to think about.... Quote
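The queue-driven tree traversal described above (enqueue the top-level controls, dequeue one at a time, enqueue the children of any container) can be sketched in Python. The names `traverse` and `children_of` are hypothetical, and the small dictionary used in the usage note is made-up example data, not a real front panel.

```python
from collections import deque


def traverse(roots, children_of):
    """Visit every node reachable from `roots`, using a queue of pending nodes.

    `children_of(node)` returns the sub-items of a container, or an empty
    list for a leaf (for example, a control that is not a tab or a cluster).
    """
    pending = deque(roots)              # enqueue all top-level items
    visited = []
    while pending:                      # loop until the queue is empty
        node = pending.popleft()        # dequeue one item
        visited.append(node)            # "do something" with it
        pending.extend(children_of(node))   # enqueue its sub-items, if any
    return visited
```

With `panel = {"tab": ["led", "knob"], "led": [], "knob": [], "stop": []}`, calling `traverse(["tab", "stop"], lambda n: panel[n])` visits the items in the order `tab`, `stop`, `led`, `knob`. The `extend` call adds a whole array of children at once, which is the single-element versus whole-array trade-off the post is musing about.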
Kevin P Posted September 19, 2006 Report Posted September 19, 2006 ...Since the queues try to simply take ownership of the wire's data and not make a copy of the data (unless the enqueue wire is forked to some other node in which case it has to make a copy), it might make sense in some cases to let the enqueue take ownership of the entire array of elements. My most common use case for queues of arrays results from DAQmx data acq. I set up a hardware monitor thread that pushes data straight from a DAQmx Read into an Enqueue call (no data forking). Then another thread Dequeues so I can write to file or do some processing. Any benchmarking I've done makes it seem pretty efficient, but I'd like to confirm: is this how you'd recommend using queues to separate data acq from processing? The other thing I typically do explains why I'd really like native queue-like support for lossy circular buffers. I generally have some type of live display going on that gives the operator a reasonable clue about what's happening. It isn't the "real" analysis, just a brief flickering view into how the test is going. What I wind up doing is that when Dequeueing the DAQ data for file writes, I also decimate it and write it to a homemade circular buffer. Inside the buffer function, I have to copy data values from the input to my internal shift register array. Question: what's the most efficient way to structure the output indicator for such a homemade circular buffer? How does LV decide whether to hang onto and reuse memory space or allocate a new output array on every call? Are there ways to force its hand? I remember some old Traditional NI-DAQ calls under RT where you could wire in the right size input array whose actual memory space was used for filling in the output array values. Would this still be the best way to handle my homemade circular buffer? My RT experience tends to make me look for ways to minimize unnecessary memory allocations... -Kevin P. Quote
LAVA 1.0 Content Posted September 19, 2006 Author Report Posted September 19, 2006 I think I can suggest a better performing way of doing the "enqueue, dequeue, enqueue." Your current implementation will fork the data being added to the queue and will hurt performance for large data elements. Try this: The Queue Status primitive generates no code for any terminal that is unwired. So if you do not wire the "Elements Out" it will not duplicate the contents of the queue, nor will it take the time to evaluate any of the unwired terminals. Fetching the current element count is very fast, and this avoids ever forking your data wire. Forking the wire is a big deal since if the wire is forked it prevents the Enqueue from taking advantage of one of its biggest speed optimizations and it guarantees a copy of the data will be made at the fork. I tried the suggested implementation and ran some tests this morning. I placed a case statement around my enqueue code, and used a boolean to select the case. I start three instances of this receiver, as well as three instances of my logger and three of my UI (receives the Notifier data) What I observed is that the CPU utilization was higher when using the Get Queue Status method you suggested. For three channels, my implementation runs ~ 24%. The Get Queue Status method averaged 32%+. It appears that the Get Queue Status function requires more CPU resources than forking the data does. The data (TCP string) never leaves the loop; it is passed unmodified to the Queue and Notifier. My Queue is grossly oversized (72,000 elements), so I should never get a "full queue" at this point. The data messages are 128 bytes long. The logger flushes the queue once a second, and the data is arriving at the TCP port @ ~ 1kHz right now. The Notifier is sent at 1000/20 or ~20 Hz. The thing that bothered me the most was that while using the Get Queue Status technique, I saw the strip chart in my UI plot Nulls for a second (as designed). 
That is, the Notifier did not fire (waits 200 ms for timeout) for multiple data points. I'm playing back a binary file, and part of my validation is to check that the original and logged binary files are identical. The Log file was missing data! (the first time I've seen this in months). I moved the Get Queue Status outside the case statements and placed it inside the same loop as the TCP Receive, hoping that LabVIEW could "clump" the operations better. My CPU utilization raised to an average of almost 50%, but I did not experience data loss. What could have caused the queue to "block" or "freeze"? Could the Get Queue Status function have blocked during a Flush Queue Operation in my Logger? The delay was long enough that the Windows TCP buffer overfilled by the time I came around to read again... Quote
Aristos Queue Posted September 23, 2006 Report Posted September 23, 2006 I was thinking about whether replacing this approach with queues would be beneficial, and here's what I'm still not sure about:From what I can tell, you can only dequeue the same datatype that you enqueue. I would have assumed that if I enqueue a scalar in the form of an array of length = 1, then dequeue the array, it would have given me the whole contents of the queue, but this wasn't the case. I also don't see any sort of way of saying "dequeue the last N queued elements". Therefore, ff I wanted to use queues for this type of data transfer, I would have to put the Dequeue, along with Get Queue status, in a while loop and run it until the queue is empty. Is that correct? If so, can that really be more efficient than what I'm doing now? You can use Flush Queue to dequeue all of the elements. If you want N elements and account for the possibility that there aren't that many elements in the array, then do this: There's a slightly easier way... Put the Dequeue in a While loop with the Timeout terminal wired with zero. It will dequeue one element. If no element is available, it will immediately timeout. Wire the output Timeout terminal to the Stop terminal of the While loop. You can Or in a test to see if the "i" terminal of the While loop has reached your desired count. This way you don't have to get the Get Queue Status prim involved, and you'll save yourself some thread synchronization overhead. The attached VI is written in LV8.2. Download File:post-5877-1159034903.vi Quote
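The "slightly easier way" above, a While loop around a zero-timeout Dequeue that stops on timeout or when the loop count reaches N, translates to Python roughly as follows. The function name `dequeue_up_to` is hypothetical, and Python's `queue.Queue` stands in for the LabVIEW queue.

```python
import queue


def dequeue_up_to(q, n):
    """Remove and return at most `n` elements, never waiting for more."""
    items = []
    while len(items) < n:                   # stop once the count is reached
        try:
            items.append(q.get(block=False))    # dequeue with zero timeout
        except queue.Empty:
            break                           # the dequeue timed out: stop
    return items
```

No separate status query is involved: the dequeue itself reports when the queue is empty, which is the reason the post gives for preferring this form.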
Aristos Queue Posted September 23, 2006

I start three instances of this receiver, as well as three instances of my logger and three of my UI (receives the Notifier data)

YEEE HAW! I think I found it! The key word in the above sentence is "three".

In the "enqueue dequeue enqueue" version -- the VI tries to enqueue and if there's room it immediately enqueues without releasing the lock on the queue.

In the "Get status, dequeue enqueue" version -- the VI checks Get Status and if it finds that there is enough room in the queue, it enqueues... BUT if there are three receivers operating on the same queue, one of the others may have made the same Get Status check and decided it had enough room -- thus taking the very last spot in the queue!

For example: The queue has max size of 5. There are currently 4 elements already in the queue. At time index t, these events happen:

t = 0, VI A does Get Status and the value 4 returns
t = 1, VI B does Get Status and the value 4 returns
t = 2, VI B does Enqueue
t = 3, VI A does Enqueue -- and hangs because there's no space left

If you're going to use the Get Queue Status, you have to make your test "If (current elements in queue >= (max queue size - number of enqueue VIs)) then { dequeue }"

Subtle!!!!
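The timeline above can be replayed step by step in a single thread. This is a hedged Python sketch, not LabVIEW: `queue.Queue.qsize()` plays the role of Get Queue Status, `put_nowait()` is used in place of a blocking Enqueue so the script reports the failure instead of hanging, and `must_dequeue_first` is a hypothetical helper that encodes the quoted test.

```python
import queue

q = queue.Queue(maxsize=5)
for i in range(4):          # four elements already in the queue
    q.put(i)

seen_by_a = q.qsize()       # t = 0: VI A checks status, sees 4
seen_by_b = q.qsize()       # t = 1: VI B checks status, sees 4

if seen_by_b < q.maxsize:   # t = 2: B believes there is room, and there is
    q.put_nowait("from B")

a_blocked = False
if seen_by_a < q.maxsize:   # t = 3: A acts on a stale reading
    try:
        q.put_nowait("from A")
    except queue.Full:      # a blocking enqueue would hang here
        a_blocked = True

print("A would have hung:", a_blocked)


def must_dequeue_first(current, max_size, n_enqueuers):
    """The guarded test from the post: leave headroom for every enqueuer."""
    return current >= max_size - n_enqueuers


print(must_dequeue_first(4, 5, 2))  # 4 >= 3, so dequeue before enqueueing
```

The guard works by reserving one free slot per potential enqueuer, so a stale status reading can no longer consume the last slot.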
LAVA 1.0 Content Posted September 23, 2006

If you're going to use the Get Queue Status, you have to make your test "If (current elements in queue >= (max queue size - number of enqueue VIs)) then { dequeue }"

All the alternative implementations suggested by LV Punk and Aristos Queue may fail to enqueue the TCP buffer to a concurrently accessed queue. To be 100% certain that you succeed in enqueueing an element, you have to be able to test-and-set in an atomic manner. An atomic operation in computer science refers to a set of operations that can be combined so that they appear to the rest of the system to be a single operation. However, you cannot guarantee atomic test-and-set unless you rely on hardware or operating system test-and-set memory operations. In LabVIEW you can access these OS level operations only by using semaphores or limited size queues. You would need to lock the queue using either a semaphore or another queue, then perform an atomic dequeue+enqueue operation and finally release the lock. This, however, doesn't sound very wise, since your performance would be lower than with a single-threaded application. You could also force all queue operations to happen in a single thread by placing them in a single non-reentrant VI, but this too would reduce your data throughput.

To achieve a practical level of quality you can:

- Dequeue elements when the queue is almost full, as Aristos suggested. However, to be on the safe side you should start dequeueing a little earlier than what Aristos suggested.
- Dequeue an element if the queue is full and try to enqueue the element. If this fails, repeat dequeueing elements until you succeed in enqueueing. To increase the success rate you can first try to dequeue one element, and if you still cannot enqueue, then dequeue two elements, and so on.
- Instead of dequeueing only one element, dequeue multiple elements so that for any practical purpose there is enough room in the queue.
- Use multiple queues, one for each VI, so that collisions never occur.
- Use only a single VI so collisions never occur.

To get more information about atomic operations, google for "atomic operation", "test-and-set" and perhaps also "semaphore". If you get interested, google also for "software transactional memory" for an alternative, lock-free implementation of concurrent operations.
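One way to picture the second option in the list (dequeue, retry the enqueue, and discard progressively more elements on each failure) is the Python sketch below. It is an illustration only: `lossy_put` is a hypothetical name, and Python's `queue.Queue` with `put_nowait()`/`get_nowait()` stands in for a size-limited LabVIEW queue with zero timeouts.

```python
import queue


def lossy_put(q, item):
    """Enqueue item, discarding the oldest elements until there is room.

    Returns the list of elements that were dropped to make space.
    """
    dropped = []
    batch = 1                      # drop one element first, then two, ...
    while True:
        try:
            q.put_nowait(item)     # enqueue attempt that never blocks
            return dropped
        except queue.Full:
            for _ in range(batch):
                try:
                    dropped.append(q.get_nowait())
                except queue.Empty:
                    break          # another consumer emptied the queue
            batch += 1


q = queue.Queue(maxsize=3)
for value in range(3):
    q.put(value)

print(lossy_put(q, 3))  # the oldest element is discarded to make room
```

The retry loop never assumes that a successful dequeue guarantees a successful enqueue, which is the point of the post: another producer may take the freed slot in between.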
LAVA 1.0 Content Posted September 25, 2006

For example: The queue has max size of 5. There are currently 4 elements already in the queue. At time index t, these events happen:

t = 0, VI A does Get Status and the value 4 returns
t = 1, VI B does Get Status and the value 4 returns
t = 2, VI B does Enqueue
t = 3, VI A does Enqueue -- and hangs because there's no space left

If you're going to use the Get Queue Status, you have to make your test "If (current elements in queue >= (max queue size - number of enqueue VIs)) then { dequeue }"

Sorry to burst your bubble, but I may not have explained the implementation fully. There are three TCP receivers, three loggers and three UIs. Each Receiver creates a unique queue and unique notifier and places the refs in an LV2 global array. Each logger and UI retrieves refs to its receiver by index. The queue is used by the logger, and the notifier by the UI. I chose the notifier for the UI because it's not a critical function and the implementation of the notifier is "a single element lossy queue". There is only one consumer for each queue. There is no case where two consumers are reading from a common queue. I need to be able to log without UI for high speed cases, and there is a "stabilize" state that must be reached before logging. So... The receiver can receive and pass data via the queue and notifier. The stabilize state can be determined either in the UI or a separate VI that evaluates the notifier data. When things are "stable" the logger can be started and data written to disk.

I generally have some type of live display going on that gives the operator a reasonable clue about what's happening. It isn't the "real" analysis, just a brief flickering view into how the test is going. What I wind up doing is that when Dequeueing the DAQ data for file writes, I also decimate it and write it to a homemade circular buffer.
Inside the buffer function, I have to copy data values from the input to my internal shift register array.

I'm doing the same thing. I need to perform a least squares fit and "limits test" on a subset of data to show overall state or progress, but "real" analysis could be performed after the fact on the logged data. The data does not necessarily have to be truly decimated (regular interval), just a manageable size block for on-the-fly calculations. A lossy queue would free me from "managing" it; I could stuff it from the producer and flush it from the consumer without worrying about an unlimited sized queue exhausting memory, or eating up CPU cycles performing queue checks. I'm thinking of listening to a "full rate" notifier in my "analysis" VI, and using an internal (to the analysis VI) LV queue that would have the enqueue-error-dequeue-enqueue functionality in it; or maybe even a user event that fires the analysis for every N notifications received. If this VI errored or was closed, the internal queue would be destroyed and the notifier data from the producer could "drop to the floor" without penalty.

I feel I've been complaining too much about something that can't be changed. I have a method that works for me, so I guess it's time to move on and get some work done. Thanks to all, and maybe some day we'll have "lossy" queues in LV. Think "Fat Bast@rd" from Austin Powers: The Spy Who Shagged Me... I want my lossy queues, lossy queues, lossy queues; I want my lossy queues, lossy queues, lossy queues... Lossy... Lossy LV queues..
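For readers who want a text-based picture of the decimating circular buffer and the "lossy queue" wished for above, here is a small Python sketch. It is not the posters' LabVIEW code: the class name `DecimatingBuffer` and its parameters are invented for illustration, and `collections.deque(maxlen=...)` supplies the overwrite-oldest behaviour.

```python
from collections import deque


class DecimatingBuffer:
    """Fixed-size buffer that keeps every Nth sample and overwrites the oldest."""

    def __init__(self, capacity, keep_every):
        # A deque with maxlen silently drops the oldest entry when full.
        self._samples = deque(maxlen=capacity)
        self._keep_every = keep_every
        self._count = 0

    def write(self, sample):
        # Keep only one sample out of every keep_every that arrive.
        if self._count % self._keep_every == 0:
            self._samples.append(sample)
        self._count += 1

    def snapshot(self):
        # Return the whole buffer at once, oldest first, for display or fitting.
        return list(self._samples)


buf = DecimatingBuffer(capacity=4, keep_every=10)
for n in range(100):
    buf.write(n)

print(buf.snapshot())  # [60, 70, 80, 90]
```

The producer never blocks and never needs a status check; the consumer simply takes a snapshot whenever it wants to update the display or run a fit.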
Kevin P Posted October 6, 2006

Adding a couple more tidbits:

I just remembered the RTFIFO that was (is?) downloadable from ni.com back around LV RT v6. It IS lossy. As I recall though, you would have to roll your own loop to retrieve the entire FIFO buffer -- I don't *think* it had functions similar to the Queue's "flush" or "status" which can return all the elements at once. I'm not sure if that version would still be compatible with LV RT 7+, but it would probably still work on the Windows side.

I personally haven't started using LV 8.x, but was at the NI Tech Symposium yesterday and noticed that Shared Variables have an option to allow buffering with overwriting. The guy doing the presentation hinted that under the hood, a Shared Variable configured that way would essentially implement an RTFIFO. I haven't played with Shared Variables yet, so don't know if there might be a way to retrieve the entire buffer at once.

I also recall an NI forum post about using the circular overwrite built into a chart control. The suggestion was to hide the chart on the front panel to try to prevent any expensive screen redraws. Then the "History" property can give you the entire buffer all at once. I haven't tested this out for speed yet, but I kinda suspect there's still gonna be a lot of overhead in a chart control even when its panel is hidden.

-Kevin P.
Guillaume Lessard Posted October 6, 2006

I just remembered the RTFIFO that was (is?) downloadable from ni.com back around LV RT v6. It IS lossy.

I have used an RT-FIFO as a lossy queue on the Windows side, but you lose some functionality compared to the normal queues. That's unfortunate. It has led me to wish for exactly what LV Punk wishes for. Still, it can be a handy shortcut.