Bruniii Posted April 16

Hi all, I'm looking for help to speed up, as much as possible, a function that reshapes a 1D array from a 4-channel acquisition board into four 2D arrays. The input array is:

Ch0_0_0 - Ch1_0_0 - Ch2_0_0 - Ch3_0_0 - Ch0_1_0 - Ch1_1_0 - Ch2_1_0 - Ch3_1_0 - ... - Ch0_N_0 - Ch1_N_0 - Ch2_N_0 - Ch3_N_0 - Ch0_0_1 - Ch1_0_1 - Ch2_0_1 - Ch3_0_1 - Ch0_1_1 - Ch1_1_1 - Ch2_1_1 - Ch3_1_1 - ... - Ch0_N_M - Ch1_N_M - Ch2_N_M - Ch3_N_M

Basically, the array is the stream of samples from 4 channels over M measures, with N samples per channel per measure: first the first sample of each channel of the first measure, then the second sample of each channel, and so on.

Additionally, I need to remove the first X samples and the last Z samples of each measure for each channel (I get N samples per measure from the board but only care about samples X to N-Z, for each channel and measure). The board can only be configured with a power-of-2 number of samples per measure, so there is no way to receive only the desired length from the board.

The end goal is four 2D arrays (one per channel), each with M rows and N-(X+Z) columns. The typical length of the input 1D array is 4 channels * M = 512 measures * N = 65536 samples per channel per measure; typical X = 200, Z = 30000.

Originally I tried one implementation, and then this one, which is faster (both posted as snippets). Still, every millisecond gained will help, and I'm sure that an expert here can achieve the same result with a single, super-efficient function. The function will run on a 32-core Intel i9 CPU.

Thanks! Marco.
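For reference, the index arithmetic being described works out roughly as in the C-style sketch below. This is not the poster's VI; the function name, signature, and 16-bit sample type are assumptions made purely to pin down the layout.

```c
/* Hypothetical reference for the reshape described above: deinterleave the
 * 4-channel stream and drop X leading / Z trailing samples per measure.
 * Assumes 16-bit samples; adjust the type to whatever the board delivers. */
#include <stddef.h>
#include <stdint.h>

void deinterleave_trim(const int16_t *in,   /* 4*M*N interleaved samples      */
                       int16_t *out[4],     /* 4 buffers of M*(N-X-Z) samples */
                       size_t M, size_t N,  /* measures, samples per measure  */
                       size_t X, size_t Z)  /* samples dropped at start / end */
{
    const size_t K = N - X - Z;             /* kept samples per measure       */
    for (size_t m = 0; m < M; ++m) {
        for (size_t k = 0; k < K; ++k) {
            /* stream position of the first channel of sample (X + k) in measure m */
            const size_t base = (m * N + X + k) * 4;
            for (size_t c = 0; c < 4; ++c)
                out[c][m * K + k] = in[base + c];
        }
    }
}
```

(This is only meant to make the memory layout explicit, not to suggest that a DLL call would beat the LabVIEW code posted as snippets.)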
X___ Posted April 16

Try decimate array?

(Edited April 16 by X___: corrected mistake)
Bruniii Posted April 16

3 hours ago, X___ said:
Try interleave array?

I do not have a "nice" VI anymore, but the very first implementation was based on the decimate array function (I guess you are referring to decimate and not interleave), and it was slower than the other two solutions.
ShaunR Posted April 17

Post the VIs rather than snippets (snippets don't work on Lavag.org), along with example data. It's also helpful if you have standard benchmarks that we can plug our implementations into (a sequence structure with frames and get-millisecs timing) so we can compare and contrast, e.g.
Bruniii Posted April 17

9 hours ago, ShaunR said:
Post the VIs rather than snippets (snippets don't work on Lavag.org), along with example data. It's also helpful if you have standard benchmarks that we can plug our implementations into (a sequence structure with frames and get-millisecs timing) so we can compare and contrast, e.g.

Sure, the attached VI contains the generation of a sample 1D array to simulate the 4 channels, M measures, and N samples, plus the latest version of the code to reshape it, inside a sequence structure. test_reshape.vi
ShaunR Posted April 23

Nope. I can't beat it. To get better performance I expect you would probably have to use different hardware (FPGA or GPU). Self auto-incrementing arrays in LabVIEW are extremely efficient, and I've come across the situation previously where decimate is usually about 4 times slower. Your particular requirement involves deleting a subsection at the beginning and end of each acquisition, so most optimisations aren't available. Just be aware that you have a fixed number of channels, and hope the HW guys don't add more or make a cheaper version with only 2.
Bruniii Posted April 23

2 hours ago, ShaunR said:
Nope. I can't beat it. To get better performance I expect you would probably have to use different hardware (FPGA or GPU). Self auto-incrementing arrays in LabVIEW are extremely efficient, and I've come across the situation previously where decimate is usually about 4 times slower. Your particular requirement involves deleting a subsection at the beginning and end of each acquisition, so most optimisations aren't available. Just be aware that you have a fixed number of channels, and hope the HW guys don't add more or make a cheaper version with only 2.

Thanks for trying! How "easy" is it to use GPUs in LabVIEW for this type of operation? I remember reading that I'm supposed to write the code in C++, where the CUDA API is used, compile the DLL, and then use the LabVIEW toolkit to call the DLL. Unfortunately, I have basically zero knowledge of all of these steps.
ShaunR Posted April 23

2 hours ago, Bruniii said:
Thanks for trying! How "easy" is it to use GPUs in LabVIEW for this type of operation? I remember reading that I'm supposed to write the code in C++, where the CUDA API is used, compile the DLL, and then use the LabVIEW toolkit to call the DLL. Unfortunately, I have basically zero knowledge of all of these steps.

There is a GPU Toolkit if you want to try it. No need to write wrapper DLLs. It's in VIPM so you can just install it and try. Don't bother with the download button on the website; it's just a launch link for VIPM and you'd have to log in.

One afterthought: when benchmarking you must never leave outputs unwired (like the 2D arrays in your benchmark). LabVIEW will know that the data isn't used anywhere and optimise, giving different results than in production. So you should at least do something like this: On my machine your original executed in ~10 ms; with the above it was ~30 ms.
Bruniii Posted April 27

On 4/23/2024 at 5:07 PM, ShaunR said:
There is a GPU Toolkit if you want to try it. No need to write wrapper DLLs. It's in VIPM so you can just install it and try. Don't bother with the download button on the website; it's just a launch link for VIPM and you'd have to log in. One afterthought: when benchmarking you must never leave outputs unwired (like the 2D arrays in your benchmark). LabVIEW will know that the data isn't used anywhere and optimise, giving different results than in production. So you should at least do something like this: On my machine your original executed in ~10 ms; with the above it was ~30 ms.

Thank you for the note regarding the compiler and the need to "use" all the outputs. I know it, but forgot when writing this specific VI.

Regarding the GPU toolkit: it's the one I read about in the past. In the "documentation", NI writes:

"In this toolkit, the function wrappers for the FFT and BLAS operations already are built with the LVGPU SDK, and they specifically call the NVIDIA CUDA libraries and communicate with a GPU through an NVIDIA API. You can use the LVGPU SDK to build wrappers for implementing custom GPU functions to execute on any co-processor device as long as LabVIEW can call the external function."
https://www.ni.com/docs/en-US/bundle/labview-gpu-analysis-toolkit-api-ref/page/lvgpu/lvgpu.html

And, for example, I found the following topic on the NI forum: https://forums.ni.com/t5/GPU-Computing/Need-Help-on-Customizing-GPU-Computing-Using-the-LabVIEW-GPU/td-p/3395649 where it looks like a custom DLL for the specific operations needed is required.
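For a sense of what that "custom DLL" route would involve for this particular reshape, a kernel plus an exported C entry point that a Call Library Function Node could call might look roughly like the sketch below. Every name is hypothetical, error handling is omitted, and the LVGPU SDK's own conventions for passing device buffers are not shown; treat it as an outline, not working toolkit code.

```cuda
// Hypothetical deinterleave+trim kernel and a plain C entry point for LabVIEW.
// Assumes 16-bit samples and the interleaved layout described in the first post.
#include <cuda_runtime.h>
#include <stdint.h>

__global__ void deinterleave_trim_kernel(const int16_t *in, int16_t *out,
                                         int M, int N, int X, int K)
{
    // One thread per output element. "out" holds 4 channel planes of M*K
    // samples each, laid out plane-major so each plane is one 2D array.
    const long long idx   = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    const long long plane = (long long)M * K;
    if (idx >= 4 * plane) return;

    const int c = (int)(idx / plane);        // channel plane
    const int m = (int)((idx % plane) / K);  // measure (row)
    const int k = (int)(idx % K);            // kept sample (column)

    out[idx] = in[((long long)m * N + X + k) * 4 + c];
}

extern "C" int lv_deinterleave_trim(const int16_t *host_in, int16_t *host_out,
                                    int M, int N, int X, int Z)
{
    const int K = N - X - Z;
    const size_t in_bytes  = (size_t)4 * M * N * sizeof(int16_t);
    const size_t out_bytes = (size_t)4 * M * K * sizeof(int16_t);

    int16_t *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, in_bytes);
    cudaMalloc((void **)&d_out, out_bytes);
    cudaMemcpy(d_in, host_in, in_bytes, cudaMemcpyHostToDevice);

    const long long total   = 4LL * M * K;
    const int       threads = 256;
    const int       blocks  = (int)((total + threads - 1) / threads);
    deinterleave_trim_kernel<<<blocks, threads>>>(d_in, d_out, M, N, X, K);

    cudaMemcpy(host_out, d_out, out_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return (int)cudaGetLastError();
}
```

Whether this gains anything is a separate question: assuming 16-bit samples, the input alone is roughly 256 MB at the sizes quoted above, so the host-to-device and device-to-host transfers may well cost as much as the ~10-30 ms the pure LabVIEW version already takes.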
Rolf Kalbermatter Posted April 27

There are several alternatives to the NI GPU Toolkit that are considerably more up to date and actually still maintained.
https://www.ngene.co/gpu-toolkit-for-labview
https://www.g2cpu.com/