This is very tricky, and I definitely don't totally understand it. Also, despite the NI under my name I am not remotely part of R&D and so may just be making this stuff up. But it seems to be mostly accurate in my experience.
LabVIEW is going to use a set of buffers to store your array. You can see these buffers with the "show buffer allocations" tool, but the tool doesnt show the full story. In the specific image in Shaun's post, there should be no difference between a tunnel and shift register, because everything in the loop is *completely* read-only, meaning that LabVIEW can reference just one buffer (one copy of the array) from multiple locations. If you modify the array (for example, your in-place element structure), it means there has to be one copy of the array which retains the original values and another copy which stores the new values. Thats defined by dataflow semantics. However, labview can perform different optimizations if you use them differently. I've attached a picture. The circled dots are what the buffer allocations tool shows you and the rectangles are my guess of the lifetime of each copy of the array:
There are three versions. In version 1 the array is modified inside the loop, so labview cannot optimize. It must take the original array buffer (blue), make a copy of it in its original form into a second buffer (red) and then change elements of the red buffer so that the access downstream can access the updated (red) data.
Version 2 shows the read-only case. Only one buffer is required.
Version 3 shows where a shift register can aid in optimization. As in version one we are changing the buffer with replace array subset, but because the shift register basically tells labview "these two buffers are actually the same", it doesn't need to copy the data on every iteration. This changes with one simple modification:
However you'll note that in order to force a new data copy (note the dot), I had to use a sequence structure to tell labview "these two versions must be available in memory simultaneously". If you remove the sequence structure, LabVIEW just changes the flow of execution to remove the copy (by performing index before replace):
For fun, I've also put a global version together.
You'll note that the copy is also made on every iteration here (as labview has to leave the buffer in the global location and must also make sure the local buffer, on the wire, is up to date).
Sorry for the tl;dr, but hopefully this makes some sense. If not, please correct me