For loop parallelization

Gary Rubin · March 15, 2010

I have some code that keeps arrays of the most recent states of thousands of objects (1000 in the attached example).

The basic architecture of the code can be represented as shown (this a very simplified example that tries to capture what the real code does). And yes, I know that the initialization function is run more often than necessary. I just threw this together quickly to demonstrate dataflow.

Clearly, this does not work because the Parallel For Loop structure does not allow the use of the shift registers. I know, however, that I will never have duplicate index values in a call to this function so order of execution of the For Loop iterations does not matter. I also know that the lengths of the Index and Value arrays are highly variable at runtime.

Because there really is no dependency between for-loop iterations, it seems like this should be parallelizable.

So, does anyone have any suggestions? I was thinking I could maybe replace the shift registers with a queue containing the array:

Is that inefficient? Unsafe? How does the Parallel For Loop deal with this case? It claims to be executable.

After robustness, speed is the ultimate consideration here.

Thanks,

Gary

Grampa_of_Oliva_n_Eden · March 15, 2010

I have some code that keeps arrays of the most recent states of thousands of objects (1000 in the attached example).

The basic architecture of the code can be represented as shown (this a very simplified example that tries to capture what the real code does). And yes, I know that the initialization function is run more often than necessary. I just threw this together quickly to demonstrate dataflow.

Clearly, this does not work because the Parallel For Loop structure does not allow the use of the shift registers. I know, however, that I will never have duplicate index values in a call to this function so order of execution of the For Loop iterations does not matter. I also know that the lengths of the Index and Value arrays are highly variable at runtime.

Because there really is no dependency between for-loop iterations, it seems like this should be parallelizable.

So, does anyone have any suggestions? I was thinking I could maybe replace the shift registers with a queue containing the array:

Is that inefficient? Unsafe? How does the Parallel For Loop deal with this case? It claims to be executable.

After robustness, speed is the ultimate consideration here.

Thanks,

Gary

Option # is a single element Q?

OPtion #2 will also allow using the DVR (data value reference) to replace the SR.

Before i type more are you SURE the time you are trying to optimize is due to they many values tht have to be updated and NOT due to the associate crunching you have not shown above?

Ben

Gary Rubin · March 16, 2010

Option # is a single element Q?

OPtion #2 will also allow using the DVR (data value reference) to replace the SR.

Before i type more are you SURE the time you are trying to optimize is due to they many values tht have to be updated and NOT due to the associate crunching you have not shown above?

Ben

Ben,

Yes, Option 2 would be a single element Q, where that element is a preallocated array of fixed length.

Is DVR efficient? For some reason, I thought references made copies.

And there's not a lot of number crunching. Just lots of Array Subsets and Replace Array Subsets. Now that you mention it, though, that could be primarily a memory bottleneck, which depending on the architecture of my CPU, may or may not be helped through parallelization.

Gary

mje · March 16, 2010

I've run into this as well, but my problem was actually memory allocation: large variable sized arrays of moderately sized objects leading to fragmentation/allocation. With a little code cleanup most copy operations vanished and everything went fine...I never really spent extra time about how it could be parallelized given the shift register limitation.

I'll also caution that unless you can absolutely guarantee the index array will contain all unique values, you'll likely need to do some checking, or keep the VI private such that you can control what accesses it if you go ahead with parallel code.

As for your queue example, that will execute, but there will be very little parallel operation, as all but one of the p-loops will be blocked by the one p-loop that has dequeued the array

I've only ever "played" with DVRs, so unfortunately I can't really comment on them.

On second thought, I'd expect the inherent mutual exclusions that are built into DVRs would put you up against the same loop blocking that happens with the single element queue...

Edited March 16, 2010 by mje

Grampa_of_Oliva_n_Eden · March 16, 2010

Ben,

Yes, Option 2 would be a single element Q, where that element is a preallocated array of fixed length.

Is DVR efficient? For some reason, I thought references made copies.

And there's not a lot of number crunching. Just lots of Array Subsets and Replace Array Subsets. Now that you mention it, though, that could be primarily a memory bottleneck, which depending on the architecture of my CPU, may or may not be helped through parallelization.

Gary

I have limited experience with the DVR but it can operate in-place like the Q version you illustrated.

Re: architecture

Yes, but the inverse is also possible depending on the hardware architecture (cache size, type, and pipe-lining). If you array will fit into your cache and the cache writes don't interupt the pipe-line you could effectively use the memory management hardware to act as a colator running in paprallel with the CPU.

But then again...

If the Q or DVR version results in CPU waiting for a mutex to the shared buffer, you may be better off just letting one CPU do the work without interacting with the process scheduler (hint: Timed seq will let you dicatate which CPU it runs in so you could use the second CPU as a dedicated number cruncher.)

Not claiming to know what I'm talking about, but always willing to learn,

Ben

Sign In

For loop parallelization

Recommended Posts

Gary Rubin

Grampa_of_Oliva_n_Eden

Gary Rubin

mje

Grampa_of_Oliva_n_Eden

Join the conversation

Browse

Activity

Important Information