Jump to content

For loop parallelization


Recommended Posts

I have some code that keeps arrays of the most recent states of thousands of objects (1000 in the attached example).

The basic architecture of the code can be represented as shown (this a very simplified example that tries to capture what the real code does). And yes, I know that the initialization function is run more often than necessary. I just threw this together quickly to demonstrate dataflow.

post-4344-126868108338_thumb.png

Clearly, this does not work because the Parallel For Loop structure does not allow the use of the shift registers. I know, however, that I will never have duplicate index values in a call to this function so order of execution of the For Loop iterations does not matter. I also know that the lengths of the Index and Value arrays are highly variable at runtime.

Because there really is no dependency between for-loop iterations, it seems like this should be parallelizable.

So, does anyone have any suggestions? I was thinking I could maybe replace the shift registers with a queue containing the array:

post-4344-126868163284_thumb.png

Is that inefficient? Unsafe? How does the Parallel For Loop deal with this case? It claims to be executable.

After robustness, speed is the ultimate consideration here.

Thanks,

Gary

Link to comment

I have some code that keeps arrays of the most recent states of thousands of objects (1000 in the attached example).

The basic architecture of the code can be represented as shown (this a very simplified example that tries to capture what the real code does). And yes, I know that the initialization function is run more often than necessary. I just threw this together quickly to demonstrate dataflow.

post-4344-126868108338_thumb.png

Clearly, this does not work because the Parallel For Loop structure does not allow the use of the shift registers. I know, however, that I will never have duplicate index values in a call to this function so order of execution of the For Loop iterations does not matter. I also know that the lengths of the Index and Value arrays are highly variable at runtime.

Because there really is no dependency between for-loop iterations, it seems like this should be parallelizable.

So, does anyone have any suggestions? I was thinking I could maybe replace the shift registers with a queue containing the array:

post-4344-126868163284_thumb.png

Is that inefficient? Unsafe? How does the Parallel For Loop deal with this case? It claims to be executable.

After robustness, speed is the ultimate consideration here.

Thanks,

Gary

Option # is a single element Q?

OPtion #2 will also allow using the DVR (data value reference) to replace the SR.

Before i type more are you SURE the time you are trying to optimize is due to they many values tht have to be updated and NOT due to the associate crunching you have not shown above?

Ben

Link to comment

Option # is a single element Q?

OPtion #2 will also allow using the DVR (data value reference) to replace the SR.

Before i type more are you SURE the time you are trying to optimize is due to they many values tht have to be updated and NOT due to the associate crunching you have not shown above?

Ben

Ben,

Yes, Option 2 would be a single element Q, where that element is a preallocated array of fixed length.

Is DVR efficient? For some reason, I thought references made copies.

And there's not a lot of number crunching. Just lots of Array Subsets and Replace Array Subsets. Now that you mention it, though, that could be primarily a memory bottleneck, which depending on the architecture of my CPU, may or may not be helped through parallelization.

Gary

Link to comment

I've run into this as well, but my problem was actually memory allocation: large variable sized arrays of moderately sized objects leading to fragmentation/allocation. With a little code cleanup most copy operations vanished and everything went fine...I never really spent extra time about how it could be parallelized given the shift register limitation.

I'll also caution that unless you can absolutely guarantee the index array will contain all unique values, you'll likely need to do some checking, or keep the VI private such that you can control what accesses it if you go ahead with parallel code.

As for your queue example, that will execute, but there will be very little parallel operation, as all but one of the p-loops will be blocked by the one p-loop that has dequeued the array

I've only ever "played" with DVRs, so unfortunately I can't really comment on them. :(

On second thought, I'd expect the inherent mutual exclusions that are built into DVRs would put you up against the same loop blocking that happens with the single element queue...

Edited by mje
Link to comment

Ben,

Yes, Option 2 would be a single element Q, where that element is a preallocated array of fixed length.

Is DVR efficient? For some reason, I thought references made copies.

And there's not a lot of number crunching. Just lots of Array Subsets and Replace Array Subsets. Now that you mention it, though, that could be primarily a memory bottleneck, which depending on the architecture of my CPU, may or may not be helped through parallelization.

Gary

I have limited experience with the DVR but it can operate in-place like the Q version you illustrated.

Re: architecture

Yes, but the inverse is also possible depending on the hardware architecture (cache size, type, and pipe-lining). If you array will fit into your cache and the cache writes don't interupt the pipe-line you could effectively use the memory management hardware to act as a colator running in paprallel with the CPU.

But then again...

If the Q or DVR version results in CPU waiting for a mutex to the shared buffer, you may be better off just letting one CPU do the work without interacting with the process scheduler (hint: Timed seq will let you dicatate which CPU it runs in so you could use the second CPU as a dedicated number cruncher.)

Not claiming to know what I'm talking about, but always willing to learn,

Ben

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.