So when I think of file integrity I think of checksums and MD5. I realize there are tons of different hash methods and CRCs available, but I prefer MD5. So I was excited when I heard LabVIEW 8.2 got MD5 for files natively (I think it was in vi.lib in 8.0 but nothing on the palette).

But since I started using the native MD5 I've been disappointed in how long it takes to calculate a checksum. So I did some quick tests comparing the native MD5 to the OpenG MD5 and to the command line version I've been using, found at http://www.etree.org/md5com.html . For small files (less than 30 KB) the native MD5 is relatively quick at around 50 ms for one file. This is good if you are checking the integrity of a config file, but I'd rather use it as a general purpose file utility, checking the integrity of a directory of files.

For any file above 30 KB the command line version processes it faster. I performed an MD5 on four 5 MB text files: the native MD5 took 2,786 ms, while the command line took 125 ms. The OpenG version wasn't a good comparison since it processed the whole file at once, taking over 30 seconds.
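Since a LabVIEW diagram can't be pasted as text, here is a rough Python sketch of the difference between hashing a whole file at once and hashing it in chunks. It is only an illustration of the chunked approach, not the code inside any of the VIs discussed here; the function name `md5_file` and the 1 MB default chunk size are my own assumptions.

```python
import hashlib


def md5_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Only one chunk is held in memory at a time, so a large file never has
    to be loaded as a single string.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest does not depend on the chunk size; only the memory use and the number of read calls change.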

So I wrote an "improved" MD5 calculation VI. I think you'll be horrified when you look at the source: it just calls the command line version, but it works, and it's a lot faster than either OpenG or native. I also saved it in 7.1.

EDIT: I seem to have a problem uploading (says I didn't select a file) so I hosted it on my site for now.

http://brian-hoover.com/Code/LabVIEW/MyMD5File.zip


I own the one that ships with LabVIEW. If y'all figure out a way to implement it with 100% G code (i.e. no command line calls) that's faster than the current shipping code, I'd certainly be open to changing it in LV 2010. This topic came up a few years back on LAVA, and at the time, mine was quite a bit faster than the OpenG one.

-D


I didn't expect my code would be put in the next rev. of LabVIEW for several reasons. That wasn't my intent. I just wanted to have a way of calculating the fastest MD5 possible for a directory of files. I ran it on 500 MB of random files in the My Documents folder and it took 3 seconds using my version (with command line embedded) and 75 seconds using the native code. But I realize the limitations of using a command line: it can't handle crashes, it needs Windows and access to a temp folder, and I'm unsure how it will work with new versions of Windows, among other problems.
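For anyone who wants to picture the "MD5 for a directory of files" use case outside of LabVIEW, here is a minimal Python sketch. It is not my VI and not the command line tool; the name `md5_directory` and the choice to key results by relative path are assumptions made for the example.

```python
import hashlib
import os


def md5_directory(root, chunk_size=1024 * 1024):
    """Return {relative path: hex MD5 digest} for every file under root."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as f:
                # Read each file in chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            results[os.path.relpath(path, root)] = digest.hexdigest()
    return results
```

Saving the returned dictionary and comparing it against a later run is one simple way to check whether any file in the folder has changed.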

I don't know how to optimize the MD5 algorithm, but what sort of things are off limits for potential additions to LabVIEW? For example, if I found a .dll that calculated the checksum quickly, could I write a VI which just uses that .dll? I assume there are legal reasons why NI could not include random code from the internet in a commercial product.

@Ton

I saw that code in SourceForge a little while ago but it's missing two VIs

MD5 Unrecoverable U8 padding.vi

MD5 FGHI functions.vi

I'd be glad to do some testing to see how each stacks up.


Thanks Ton, I got all the needed VIs and ran again. OpenG still seems to be the slowest. I've played around with the chunk size and haven't been able to improve it much. I did one 2 MB file with a 10 KB chunk and it took 2.8 seconds. The command line version took 0.01 seconds and the native took 0.2 seconds. For now I'm sticking with my command line version.
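If anyone wants to repeat the chunk size experiment outside of LabVIEW, a small timing harness along these lines should do. This is a sketch only: the names `time_md5` and `compare_chunk_sizes` and the list of chunk sizes are made up for the example, and the timings will depend on the machine and on disk caching.

```python
import hashlib
import time


def time_md5(path, chunk_size):
    """Hash one file with the given chunk size; return (hex digest, seconds)."""
    start = time.perf_counter()
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), time.perf_counter() - start


def compare_chunk_sizes(path, chunk_sizes=(10 * 1024, 64 * 1024, 1024 * 1024)):
    """Return {chunk size: (digest, seconds)} so the sizes can be compared."""
    return {size: time_md5(path, size) for size in chunk_sizes}
```

Run it more than once per file, since the first pass usually includes the cost of pulling the file off disk.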

BTW I reported the fact that we can't upload files.


QUOTE (Darren @ Jun 10 2009, 04:11 PM)

I experimented with the shipping implementation, and found that the following will help the performance:

Disable debugging in "MD5 Checksum File" and the sub-vi "MD5Checksum Core".

Inside "MD5Checksum Core", the inner-most loop contains a section of code that performs Swap Words and Swap Bytes on the current array Element. Move these two functions to the outermost loop and place them immediately after the typecast of the string to an array of U32.

I reduced the MD5 calculation on version 8.6.1f1 LabVIEW.exe from 2.79 seconds to 2.12 seconds.
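To show why the swap can be hoisted, here is a Python sketch of the idea. It assumes that the typecast yields big-endian U32 values, that Swap Words exchanges the two 16-bit halves of a U32, and that Swap Bytes exchanges the two bytes inside each 16-bit half, so that the pair together reverses the byte order into the little-endian words MD5 works on. The function names are mine, not the names used inside the shipping VI.

```python
import struct


def swap_words(x):
    """Exchange the high and low 16-bit halves of a 32-bit value."""
    return ((x << 16) | (x >> 16)) & 0xFFFFFFFF


def swap_bytes(x):
    """Exchange the two bytes inside each 16-bit half of a 32-bit value."""
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def block_to_words(block):
    """Convert one 64-byte block to 16 little-endian U32 words, once.

    The conversion does not depend on which round is running, so it can be
    done right after the typecast instead of on every element access
    inside the innermost loop.
    """
    big_endian_words = struct.unpack(">16I", block)  # stand-in for the typecast
    return [swap_bytes(swap_words(w)) for w in big_endian_words]
```

Under those assumptions the result is the same as reading the block as little-endian directly (`struct.unpack("<16I", block)`), which is why doing the swap once per block is enough.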

QUOTE (hooovahh @ Jun 10 2009, 02:57 PM)

For any file above 30 KB the command line version processes it faster. I performed an MD5 on four 5 MB text files: the native MD5 took 2,786 ms, while the command line took 125 ms. The OpenG version wasn't a good comparison since it processed the whole file at once, taking over 30 seconds.

I revisited my .NET implementation from here and found that one of the .NET methods was broken when I loaded the VI in LabVIEW 8.6.1. I've fixed it and cleaned it up, but can't upload to the LAVA forums at the moment (not sure why...). Maybe the .NET technique will work for you...


I like your .NET method. In my test, for files less than 16 MB the command line version is faster by a little, with both times around 100 ms for the 16 MB file, while the native is around 2,380 ms.

But as files grow to around the size I want to process, the .NET method works faster. I ran a test with 500 MB of files, each between 50 MB and 80 MB, and the command line took 4,900 ms while the .NET took 2,320 ms.

I know what you mean when you said it wouldn't open in a newer version of LabVIEW. It opens fine in 7.1 and 8.0, but in anything newer the Invoke Node names are slightly different and need to be re-linked; after that it works.

So I could determine the size of the file, and use the right method for that file size, but I'm just going to stick with your method since the improvement for small files is very small between them all. Thanks.


QUOTE (Phillip Brooks @ Jun 11 2009, 08:30 AM)

I experimented with the shipping implementation, and found that the following will help the performance:

Thanks, Phillip. I have filed CAR# 173651 to myself for investigating your suggestions in LabVIEW 2010. If anybody else has any suggestions, post them here, as I will be reviewing this thread when looking into the CAR later this year. Again, I'm looking to stick with a 100% G, platform-independent implementation.

-D


QUOTE (Darren @ Jun 11 2009, 05:01 PM)

Thanks, Phillip. I have filed CAR# 173651 to myself for investigating your suggestions in LabVIEW 2010. If anybody else has any suggestions, post them here, as I will be reviewing this thread when looking into the CAR later this year. Again, I'm looking to stick with a 100% G, platform-independent implementation.

-D

I tried converting your code to read U32 values instead of a string, and I also removed the "Get File Position" and "Set File Position" functions.

With these changes I was able to reduce the time for large files by approximately 30%.

/J


QUOTE (Darren @ Jun 11 2009, 11:01 AM)

Thanks, Phillip. I have filed CAR# 173651 to myself for investigating your suggestions in LabVIEW 2010. If anybody else has any suggestions, post them here, as I will be reviewing this thread when looking into the CAR later this year. Again, I'm looking to stick with a 100% G, platform-independent implementation.

-D

FYI, I experimented with loading the file data in one loop and passing the data via a queue to the 'core' function running in a separate loop, thinking that the file I/O was a place for improvement. It appears that the majority of the overhead is in the 'core' vi; no gains were detected...
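For reference, the two-loop arrangement looks roughly like this as a Python sketch: one thread reads the file and queues the chunks, while the caller's thread does the hashing. It only shows the structure of the experiment and is not the LabVIEW code; `md5_file_pipelined`, the chunk size and the queue depth are all assumptions for the example.

```python
import hashlib
import queue
import threading


def md5_file_pipelined(path, chunk_size=1024 * 1024, max_queued=8):
    """Hash a file with the reading done in a separate producer thread."""
    chunks = queue.Queue(maxsize=max_queued)  # bounded so the reader cannot run far ahead
    errors = []

    def reader():
        try:
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    chunks.put(chunk)
        except OSError as exc:
            errors.append(exc)
        finally:
            chunks.put(None)  # sentinel: no more data is coming

    worker = threading.Thread(target=reader)
    worker.start()

    digest = hashlib.md5()
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        digest.update(chunk)
    worker.join()

    if errors:
        raise errors[0]
    return digest.hexdigest()
```

Overlapping the two loops can only hide the file I/O time, so if most of the time is spent in the hash computation itself, this structure would not be expected to help much.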

  • 1 month later...

Ok, I had a few minutes this afternoon to re-read this thread and look into any low-hanging fruit for improving the performance of the MD5 VI that ships with LabVIEW. Unless I missed something, there were three concrete suggestions for improving the performance of the core VI:

1. Disable debugging: Done.

2. Move Swap Words and Swap Bytes functions out of the loop: Done, although this appears to have a negligible effect compared to turning off debugging.

3. Process file as a U32: I haven't done this one yet. JFM, can you post your modified version of the VI so I can take a look at it? I'm not sure yet if I want to go forward with this change, as there's another VI in that LLB (MD5Checksum string.vi) that can be used to generate the MD5 of a string, independent of File I/O, that assumes the core VI takes a string input.

-D
