Slow MD5


15 minutes ago, hooovahh said:

Very neat, glad you found an improvement.  But it seems like you forgot to attach the code.  Are you still making improvements?

Sorry. No can do. It's part of ECL but I've given you all the info to replicate it and proved it might be worth your while ;)

Edited by ShaunR
Link to comment
24 minutes ago, hooovahh said:

I'm pretty satisfied with the solutions posted.  You don't seem to be based on your responses.

I have a different solution...

1 hour ago, hooovahh said:

Well this code has been here for 12 years ready to be improved,

.. which is demonstrably an improvement.

You can lead a horse to water but, this time, I guess I underestimated its thirstiness.

 

 

 

Link to comment
3 hours ago, ShaunR said:

Yuck. Cmd line :throwpc: Try using the EVP_Digest interface of the NIlibeay32.dll ;) You can even have progress events if you want to be fancy :wub:

I don't know why NI didn't use it :blink:

The direct MD5 function from NIlibeay32.dll was called here instead of the EVP_Digest interface.

Link to comment
1 hour ago, ShaunR said:

You can lead a horse to water but

You most certainly can.  I asked for you to share your improvement so everyone can benefit, you just seemed interested in talking about it but not showing it.

32 minutes ago, mahgust said:

ShaunR, thanks for the idea sharing!

Hoovah, code is attached.

MD5_my2.png.65bdb19c6f99dc43744f5d87536a2176.png

MD5_my2.vi 10.22 kB · 0 downloads

Unfortunately I get "Not Enough Memory to complete this operation" error when I try it on my 1GB+ file.  I suspect the other functions in that call library node are needed like Init, Update, and Final.

Link to comment
1 hour ago, hooovahh said:

You most certainly can.  I asked for you to share your improvement so everyone can benefit, you just seemed interested in talking about it but not showing it.

As I explained. The code I used is part of ECL so commercial IP prevents me from sharing. You should know or at least appreciate this!

1 hour ago, hooovahh said:

I suspect the other functions in that call library node are needed like Init, Update, and Final.

Correct. New, Init, update, final and free (the EVP interface). You will also need digestbyname.

Once you have this you will also be able to do *all* the hashes, not just MD5.
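
For readers who prefer text to block diagrams, here is a minimal sketch of that call sequence in Python. It is a stand-in under stated assumptions, not the ECL code and not a LabVIEW diagram: in standard CPython builds `hashlib` is normally backed by OpenSSL's EVP digest functions, so `hashlib.new(name)` roughly covers digestbyname plus New and Init, `update()` corresponds to Update, and `hexdigest()` to Final with hex formatting. Free is handled when the object is garbage collected. The function name `digest_by_name` is made up for illustration.

```python
import hashlib

def digest_by_name(name: str, data: bytes) -> str:
    """Hash `data` with the algorithm called `name`, e.g. "md5" or "sha256"."""
    ctx = hashlib.new(name)   # look the digest up by name, create and initialise a context
    ctx.update(data)          # feed the data; this may be called as often as needed
    return ctx.hexdigest()    # finalise and return the digest as lowercase hex
```

Because the algorithm is selected by name, calling `digest_by_name("sha256", b"abc")` instead of `digest_by_name("md5", b"abc")` needs no other change, which is the point made above about getting all the hashes from one set of functions.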

Edited by ShaunR
Link to comment
1 hour ago, mahgust said:

Yes, it seems that for large files it should be done this way:

http://websites.umich.edu/~x509/ssleay/md5.html

That is the old way. The EVP_Digest interface abstracts the different hashes into a unified set of functions. All that is needed is a while loop around the update call, at the point where the example has two update calls, about two thirds of the way down that page.
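
A minimal sketch of that loop, again in Python as a text stand-in for the diagram. The assumptions are mine: `hash_file`, `chunk_size` and `on_progress` are hypothetical names, the 1 MiB chunk size is an arbitrary choice, and the callback is only a simple substitute for progress events. The file is read in fixed-size chunks and each chunk goes to one update call, so memory use stays at one chunk regardless of file size.

```python
import hashlib

def hash_file(path, name="md5", chunk_size=1 << 20, on_progress=None):
    """Hash a file of any size by feeding it to the digest chunk by chunk."""
    ctx = hashlib.new(name)              # init
    done = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)   # sequential read, at most chunk_size bytes
            if not chunk:                # empty read means end of file
                break
            ctx.update(chunk)            # update, once per chunk
            done += len(chunk)
            if on_progress is not None:
                on_progress(done)        # report the number of bytes hashed so far
    return ctx.hexdigest()               # final
```

Calling `hash_file("big.bin", "sha256", on_progress=print)` on some large file would print a running byte count and return the hex digest; the file name here is only a placeholder.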

Edited by ShaunR
Link to comment
6 minutes ago, hooovahh said:

Sorry if it wasn't clear from the original post, or my replies, but a commercial solution isn't appropriate for this.

I don't think you can see the wood for the trees.

I am saying I can't supply the actual code I used because it's commercial but I can help you "discover" how to do it which isn't bound by commercial restraints.

Edited by ShaunR
Link to comment
4 hours ago, mahgust said:

New version with sequence Init -> Update -> Final.

 

P.S. I tried the EVP interface but I am stuck on the EVP_MD_CTX structure definition in LV.
https://github.com/theos/headers/blob/master/openssl/evp.h

Since you don't access the internal elements of the struct at all from LabVIEW, you can just treat it as a pointer-sized integer.

In fact, since OpenSSL 1.1.0 all those structs are opaque to external users of the API (and it is best to treat them that way with older versions too), so they should never be referenced in any way other than through published OpenSSL functions. For an external API user these contexts are meant to be simply a handle (a pointer to private data whose contents are unknown).

EVP_MD_CTX_create() creates the context -> just configure it to return a pointer sized integer.

Then pass this to all other EVP functions again as pointer sized integer.

And of course don't forget to call the EVP_MD_CTX_free() function at the end to avoid memory leaks. 

 

Edited by Rolf Kalbermatter
Link to comment
2 hours ago, Rolf Kalbermatter said:

Since you don't access the internal elements of the struct at all from LabVIEW, you can just treat it as a pointer-sized integer.

In fact, since OpenSSL 1.1.0 all those structs are opaque to external users of the API (and it is best to treat them that way with older versions too), so they should never be referenced in any way other than through published OpenSSL functions. For an external API user these contexts are meant to be simply a handle (a pointer to private data whose contents are unknown).

EVP_MD_CTX_create() creates the context -> just configure it to return a pointer sized integer.

Then pass this to all other EVP functions again as pointer sized integer.

And of course don't forget to call the EVP_MD_CTX_free() function at the end to avoid memory leaks. 

 

//OpenSSL 1.1.0 and later (including 3.0) use the new/free names
mdctx = EVP_MD_CTX_new();
EVP_MD_CTX_free(mdctx);


//OpenSSL 1.0.x uses the create/destroy names
mdctx = EVP_MD_CTX_create();
EVP_MD_CTX_destroy(mdctx);

 

The library shipped with LV2014 uses the create/destroy names.

I've tried to implement create/destroy with only a pointer and no structure definition, but it crashes LV. Could you suggest what is wrong with my code? Thx!

350255718_Untitled1.png.b4863ee7e7fd35b78089ac5eaca389fd.png

Untitled 1.vi

Link to comment
49 minutes ago, mahgust said:

I've tried to implement create/destroy with only a pointer and no structure definition, but it crashes LV. Could you suggest what is wrong with my code? Thx!

350255718_Untitled1.png.b4863ee7e7fd35b78089ac5eaca389fd.png

Untitled 1.vi 6.97 kB · 2 downloads

One obvious discrepancy: create uses a pointer-sized integer and destroy uses Adapt to Type. This results in the pointer being passed as a u64 by reference (Adapt to Type parameters are always passed by reference unless they are handles, arrays or ActiveX references).

What you want to configure it as is Numeric, Pointer-sized Integer, Pass by Value.

Yes, you want to pass it by value: the value returned from the create function is already a pointer, and destroy expects exactly this pointer.
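
To make the difference concrete, here is a small model of what the callee receives in the two configurations. It is only an illustration using Python's `ctypes`, not the Call Library Node itself: the 64-byte buffer is a hypothetical stand-in for the opaque context, and both function names are invented for this sketch.

```python
import ctypes

# Hypothetical stand-in for the opaque context: 64 bytes we never look inside.
ctx_memory = ctypes.create_string_buffer(64)

# What a "create" call would hand back: the address, held as a plain integer.
handle = ctypes.addressof(ctx_memory)

def seen_when_passed_by_value(h: int) -> int:
    """Pointer value the callee receives when the integer is passed by value."""
    return ctypes.c_void_p(h).value

def seen_when_passed_by_reference(h: int) -> int:
    """Pointer value the callee receives when a u64 copy is passed by reference."""
    box = ctypes.c_uint64(h)        # temporary 64-bit integer holding the handle
    return ctypes.addressof(box)    # the callee gets the address of that temporary
```

By value, the callee sees the context address itself. By reference, it sees the address of a temporary integer, so a destroy function would try to free memory that was never a context, which is consistent with the crash described.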

Link to comment
53 minutes ago, mahgust said:

I've tried to implement create/destroy with only a pointer and no structure definition, but it crashes LV. Could you suggest what is wrong with my code? Thx!

350255718_Untitled1.png.b4863ee7e7fd35b78089ac5eaca389fd.png

Untitled 1.vi 6.97 kB · 0 downloads

The destroy should be an Unsigned Pointer-sized Integer (passed by value) rather than Adapt to Type.

You can also set the nodes to "Run in any thread".

Rolf got there 10 secs before me. lol

Untitled 1.vi

Link to comment

The EVP functions were used according to the documentation for this OpenSSL version:
https://www.openssl.org/docs/man1.0.2/man3/EVP_DigestInit.html

OpenSSL_add_all_digests();
md = EVP_get_digestbyname(argv[1]);
mdctx = EVP_MD_CTX_create();
EVP_DigestInit_ex(mdctx, md, NULL);
EVP_DigestUpdate(mdctx, mess1, strlen(mess1));
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
EVP_cleanup();

Link to comment
1 hour ago, mahgust said:

Thanks a lot, guys! EVP is working now! md4 md5 sha1 sha224 sha256 sha384 sha512 can be calculated.

A quite convenient and fast utility for file integrity checks can be created this way.

Sweet.  My test file was about 4.5s via command line and was 2.2s using your code. 

I'm pretty sure this is just copied over from the NI example, but we can get rid of all the Get and Set file positions.  Not sure why NI put them there in the first place.

Also is this NIlibeay binary available on other RT targets?  I looked at the SSL functions used by Webdav, and the HTTP functions, and they call other NI binaries, the ni_webdavLVClient and ni_httpClient files that likely have a .so file or .dll based on target selection.  I see Pharlap versions of this library but not any others.

Link to comment
2 hours ago, mahgust said:

Thanks a lot, guys! EVP is working now! md4 md5 sha1 sha224 sha256 sha384 sha512 can be calculated.

A quite convenient and fast utility for file integrity checks can be created this way.

Should we point NI to this (hash calc through the EVP interface) via the Idea Exchange?

MD5_my7.png.260bb7164c967e51f1ea8ba2985df09f.png

MD5_my7.vi 26.48 kB · 2 downloads

Nice work. Glad you got there.

Link to comment
49 minutes ago, hooovahh said:

Also is this NIlibeay binary available on other RT targets?

There is an opk package in one of the feeds if it's not there already. Chances are, if it has SSH, it'll have compatible binaries. NIlibeay is just NI's compilation of OpenSSL.

Link to comment

I looked at the code and have one question (with low relevance to the thread): isn't Get File Size slow? I remember it being almost as slow as reading the whole file. Or did I do something wrong? Would listening for the End Of File error be a better option, or does it have some caveats?

Link to comment
11 hours ago, mahgust said:

Thanks a lot, guys! EVP is working now! md4 md5 sha1 sha224 sha256 sha384 sha512 can be calculated.

A quite convenient and fast utility for file integrity checks can be created this way.

Should we point NI to this (hash calc through the EVP interface) via the Idea Exchange?

MD5_my7.png.260bb7164c967e51f1ea8ba2985df09f.png

MD5_my7.vi

You can remove the Set and Get File Offset inside the loop. The LabVIEW file IO nodes maintain a file offset internally (actually it's the underlying OS file IO functions that do so, advancing that pointer as you read). As long as you do purely sequential access there is no need to set the file offset explicitly. In fact, when you open a file in anything but append mode, the file offset is automatically set to 0.

Only when you do random access will you need to do explicit file offset setting. I don't expect this to save a lot of time but why do it if it is not necessary?
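
The same behaviour can be observed from any language that sits on top of the OS file functions. A small Python sketch, with the made-up helper name `offsets_during_sequential_read`, records the file position after each read without ever setting it:

```python
def offsets_during_sequential_read(path, chunk_size):
    """Return the file position after open and after each sequential read."""
    offsets = []
    with open(path, "rb") as f:
        offsets.append(f.tell())       # position right after opening for read
        while f.read(chunk_size):      # no seek anywhere: just keep reading
            offsets.append(f.tell())   # position has advanced by the bytes read
    return offsets
```

For a 10-byte file read in 4-byte chunks this yields the positions 0, 4, 8 and 10: the offset starts at the beginning and follows the reads on its own.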

Quote

I looked at the code and have one question (with low relevance to the thread): isn't Get File Size slow? I remember it being almost as slow as reading the whole file. Or did I do something wrong? Would listening for the End Of File error be a better option, or does it have some caveats?

That would seem very strange. Get File Size translates directly to a Windows API call on the underlying file handle. Why that would be so slow is a mystery to me.

Edited by Rolf Kalbermatter
Link to comment
9 hours ago, hooovahh said:

I'm pretty sure this is just copied over from the NI example, but we can get rid of all the Get and Set file positions.  Not sure why NI put them there in the first place.

 

Most likely because the original code originates from before LabVIEW 8.0. Back then all LabVIEW Read and Write nodes had explicit file offset inputs and outputs. When you upgrade these VIs, LabVIEW mutates them by adding explicit file offset calls before and after the File Read and File Write. It's the only safe way, as LabVIEW can't easily know whether the original file offset handling was unnecessary because the access is fully sequential. Obviously for trivial cases like this the analyzer could be made smart enough to decide that it is not needed, but there are corner cases where this is not easily decided. Rather than try to think up all such corner cases and make sure the analyzer won't decide wrongly by removing one file offset call too many, the easier thing is to simply maintain the original functionality and risk some performance loss (which is minimal in comparison to the old situation where this offset handling was always done anyway).

The "example scrubber" for that code probably cleaned it up but didn't dare to remove the file offset calls, obviously not being too familiar with LabVIEW internals.

 

Link to comment
38 minutes ago, Rolf Kalbermatter said:

 

That would seem very strange. Get File Size translates directly to a Windows API call on the underlying file handle. Why that would be so slow is a mystery to me.
 

Never mind, maybe I used File/directory info.vi.

Link to comment
