
SQLite for OpenG



Posted

LabVIEW DLL functions can be called from C, I’m pretty sure, but I can’t figure out how to get a pointer to the function in LabVIEW.

Just like with any other DLL: by using the Windows API functions LoadLibrary() and GetProcAddress(). The downside is that this makes your SQLite library more Windows-specific and less portable. My personal choice would be to create an extra C file containing the necessary helper functions, and then either add it to the SQLite source and compile my own LabVIEW-friendly SQLite shared library, or, to be able to reuse prebuilt SQLite libraries, build a separate DLL that contains only these helper functions.

Yes, this requires a C compiler for every platform you want to support, but once you support more than one platform (and 64-bit Windows is already a different platform), handling each platform separately and maintaining the platform-specific loading of those helper functions on the LabVIEW side quickly becomes a real hassle compared to simply creating the helper library directly in C. And your LabVIEW VI library stays tidy and clean, no matter which LabVIEW platform you want to support.

BTW: I was just checking out Fossil from the same author. It looks like an interesting lightweight DVCS application, based entirely on an embedded SQLite database for its data storage.

Advantages: an integrated issue tracker and wiki; support for binary files (although obviously without merging of branches); and a simple HTTP-based network protocol with SSL support, used both for syncing and for interfacing with it from a different tool (like a LabVIEW plugin).

And all that in a single executable that is barely larger than the SQLite database engine itself.

The disadvantage, if you can call it that, is the rather simple web-based user interface to all this functionality.

Posted

Any further comments on this package? Or any comments on whether it is right for OpenG? Alternatively, I could try to release it under “Team LAVA”.

I could also use help from people who use LabVIEW on Mac or Linux to test it with SQLite compiled for those platforms. It should work quite easily. It would be a shame to have a Windows-only package.

— James

Posted

I've written my own SQLite implementation, making this the fifth one I'm aware of: yours, mine, ShaunR's, SmartSQLView, and a much older one written by someone at Philips.

Handling variants can be done (mine handles them), but there are several gotchas to deal with.

SQLite's strings can contain binary data, just like LabVIEW strings. It looks like your functions are set up to handle embedded \0's in text, so that's not a problem. So you can write strings as TEXT and flattened data as BLOBs, and then use the type of the column to determine how to read the data back. The next trick is handling NULLs. As your code is written now, NaNs, empty strings and NULLs will all be saved as SQLite NULLs. The empty strings become NULL because an empty string is passed to bind text as a 0 (NULL) pointer. So when you have an empty string, you need to set the number of bytes to 0 but pass in a non-empty string, so that the pointer is not NULL. I never figured out an ideal solution for NaNs. Since I treat NULLs as empty variants, I couldn't store NaNs as NULLs, so I flatten NaNs and store them as BLOBs. I also flatten empty variants that have attributes instead of storing them as NULLs (otherwise the attribute information would be lost). Be aware of type affinity, since that can screw this up.
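A minimal sketch of the empty-string pitfall, in C and without linking SQLite. The SQLite documentation states that passing a NULL data pointer to sqlite3_bind_text() ignores the byte count and gives the same result as sqlite3_bind_null(). `mock_bind_text` is a hypothetical stand-in that models only that rule, and `bind_lv_string` is a hypothetical wrapper showing the fix described above: a non-NULL pointer with a length of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Storage classes, named after SQLite's; the values here are arbitrary. */
typedef enum { STORED_NULL, STORED_TEXT } stored_class;

typedef struct {
    stored_class cls;
    int nbytes;
} bound_value;

/* Hypothetical stand-in for sqlite3_bind_text(): models only the documented
   rule that a NULL data pointer binds SQL NULL and the length is ignored. */
static bound_value mock_bind_text(const char *data, int nbytes)
{
    bound_value v;
    if (data == NULL) {
        v.cls = STORED_NULL;
        v.nbytes = 0;
    } else {
        v.cls = STORED_TEXT;
        v.nbytes = nbytes;
    }
    return v;
}

/* Hypothetical wrapper: an empty LabVIEW string may arrive as a NULL
   pointer, so substitute a valid non-NULL pointer and keep the length 0. */
static bound_value bind_lv_string(const char *data, int nbytes)
{
    static const char empty[1] = "";
    assert(nbytes >= 0);
    if (data == NULL || nbytes == 0) {
        return mock_bind_text(empty, 0);
    }
    return mock_bind_text(data, nbytes);   /* embedded \0 bytes are kept */
}
```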

I like how you used property nodes to handle the binding and reading of different types. If you don't mind, I might try to integrate that idea into my implementation.

If you want to improve performance: passing the DLL path to every CLN can add a lot of overhead to simple functions (at least it did when I last checked).

I use wildcards in the library name instead, as described here:

http://zone.ni.com/reference/en-XX/help/371361H-01/lvexcodeconcepts/configuring_the_clf_node/

If you're executing multiple statements from one SQL string, you can avoid making multiple string copies by converting the SQL string to a pointer (DSNewPtr and MoveBlock; be sure to release it with DSDisposePtr even if an error has occurred). Then you can use prepare_v2 with pointers directly.
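sqlite3_prepare_v2() reports through its pzTail output where the first statement in the SQL buffer ended, so a loop can walk a single buffer without copying the remaining SQL each time. The sketch below imitates that loop without SQLite: `next_statement` is a hypothetical, deliberately naive splitter that stops at the first ';' (unlike SQLite's real parser, it does not understand quoted literals or comments), and `count_statements` is a hypothetical driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sqlite3_prepare_v2(): finds the end of the first
   statement in [sql, end) and stores the start of the remainder in *tail.
   Naive: splits on ';' only, so it mishandles ';' inside string literals. */
static size_t next_statement(const char *sql, const char *end, const char **tail)
{
    const char *p = sql;
    while (p < end && *p != ';') {
        p++;
    }
    size_t len = (size_t)(p - sql);
    *tail = (p < end) ? p + 1 : end;   /* skip the ';' if there was one */
    return len;
}

/* Counts non-blank statements by advancing a pointer through one buffer;
   nothing is copied. */
static int count_statements(const char *sql, size_t nbytes)
{
    const char *end = sql + nbytes;
    const char *cur = sql;
    int count = 0;
    assert(sql != NULL);
    while (cur < end) {
        const char *tail;
        size_t len = next_statement(cur, end, &tail);
        /* Whitespace-only pieces yield no statement, as with
           sqlite3_prepare_v2(), which then returns a NULL statement. */
        for (size_t i = 0; i < len; i++) {
            if (strchr(" \t\r\n", cur[i]) == NULL) {
                count++;
                break;
            }
        }
        cur = tail;
    }
    return count;
}
```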

You might want to add the output of sqlite3_errmsg to "SQLite Error.vi"; I've found it helpful.

Posted

Hi Matt, thanks for bringing your experience to this.

Handling variants can be done (mine handles them), but there are several gotchas to deal with.

SQLite's strings can contain binary data, just like LabVIEW strings. It looks like your functions are set up to handle embedded \0's in text, so that's not a problem. So you can write strings as TEXT and flattened data as BLOBs, and then use the type of the column to determine how to read the data back. The next trick is handling NULLs. As your code is written now, NaNs, empty strings and NULLs will all be saved as SQLite NULLs. The empty strings become NULL because an empty string is passed to bind text as a 0 (NULL) pointer. So when you have an empty string, you need to set the number of bytes to 0 but pass in a non-empty string, so that the pointer is not NULL. I never figured out an ideal solution for NaNs. Since I treat NULLs as empty variants, I couldn't store NaNs as NULLs, so I flatten NaNs and store them as BLOBs. I also flatten empty variants that have attributes instead of storing them as NULLs (otherwise the attribute information would be lost). Be aware of type affinity, since that can screw this up.

It was my feeling that there is no clean way to directly connect SQLite3’s loose typing system with LabVIEW variants. One could make a system similar to the OpenG Variant Config VIs, where one inputs a cluster to define the datatypes to read in, but a straight “Get Column as Variant” seems to have too many gotchas to be worth it. If one did want to store arbitrary LabVIEW datatypes in SQLite, one could just flatten the data and store it as a BLOB, but I thought that option could be left outside the scope of the package.

I like how you used property nodes to handle the binding and reading of different types. If you don't mind, I might try to integrate that idea into my implementation.

Please do. I have wondered if it is a good idea to make functions like Step or Finalize also available as Property nodes, as that would allow more compact code in many cases (though as these functions aren’t really “properties” that might be confusing).

If you want to improve performance: passing the DLL path to every CLN can add a lot of overhead to simple functions (at least it did when I last checked).

Is that true? I wouldn’t have thought that, but I have never tested it. The advantage of passing the DLL path is that one can alter it easily. Do you have any performance data from your system that I could compare to?

If you're executing multiple statements from one SQL string, you can avoid making multiple string copies by converting the SQL string to a pointer (DSNewPtr and MoveBlock; be sure to release it with DSDisposePtr even if an error has occurred). Then you can use prepare_v2 with pointers directly.

I realized this after I did it. But I don’t want to introduce “pointers” into any public API function like “Prepare”. I am considering making an alternate, private version of “Prepare” that uses a pointer in this way to allow higher performance in VIs like “Execute SQL”.

You might want to add the output of sqlite3_errmsg to "SQLite Error.vi"; I've found it helpful.

On the “to do” list. It's slightly tricky because of the need for mutexes described in the documentation:

"When the serialized threading mode is in use, it might be the case that a second error occurs on a separate thread in between the time of the first error and the call to these interfaces. When that happens, the second error will be reported since these interfaces always report the most recent result. To avoid this, each thread can obtain exclusive use of the database connection D by invoking sqlite3_mutex_enter(sqlite3_db_mutex(D)) before beginning to use D and invoking sqlite3_mutex_leave(sqlite3_db_mutex(D)) after all calls to the interfaces listed here are completed."

— James

Posted

Is that true? I wouldn’t have thought that, but I have never tested it. The advantage of passing the DLL path is that one can alter it easily. Do you have any performance data from your system that I could compare to?

If you think about it for a few seconds, you will recognize that this is true. When a path is passed in, LabVIEW has to verify at every call that the path has not changed since the last call. That is not a lot of CPU cycles for a single call, but it can add up if you call many Call Library Nodes like that, especially in loops. So if the DLL name doesn't really change, it's a lot better to specify the library name in the configuration dialog, since then LabVIEW evaluates the path only once at load time and never again afterwards.

If it didn't do this check, the performance of the Call Library Node would be abominably bad, since loading and unloading DLLs is a real performance killer; compared to that, the path comparison is just a micro delay.
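A rough model of the check described here: remember the last path and the handle it produced, compare on every call, and reload only when the path differs. `lib_cache`, `resolve_library` and `load_library` are hypothetical, and this is not LabVIEW's actual implementation; real loading would go through LoadLibrary() or dlopen(), which the sketch deliberately leaves out so that it runs anywhere.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATH_LEN 260   /* this sketch assumes paths shorter than this */

/* Hypothetical per-call-site cache: the last path seen and its "handle". */
typedef struct {
    char last_path[MAX_PATH_LEN];
    int handle;   /* stand-in for an HMODULE or dlopen() handle */
    int loads;    /* how many (expensive) loads have happened */
} lib_cache;

/* Stand-in for the expensive unload/load pair; returns a fake handle. */
static int load_library(lib_cache *c, const char *path)
{
    c->loads++;
    strcpy(c->last_path, path);
    return c->loads;
}

/* Runs on every call: a cheap string compare guards the costly load. */
static int resolve_library(lib_cache *c, const char *path)
{
    assert(c != NULL && path != NULL && strlen(path) < MAX_PATH_LEN);
    if (c->loads == 0 || strcmp(c->last_path, path) != 0) {
        c->handle = load_library(c, path);
    }
    return c->handle;
}
```

Calling `resolve_library` repeatedly with the same path costs one `strcmp` per call and a single load; only a changed path triggers another load.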

If I had to guess, using an unchanging diagram path adds maybe 100 µs, maybe a bit more, compared to the overhead of the Call Library Node itself, which is in the range of single microseconds. Comparing paths for equality is the most expensive comparison, because you can only establish equality after comparing every single element and character. Inequality takes on average half the execution time, since you can break out of the comparison at the first difference.
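The asymmetry between equality and inequality can be made concrete by counting character comparisons. `equal_with_steps` is a hypothetical helper: a plain byte-by-byte comparison that stops at the first mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte equality test of two NUL-terminated strings. Returns 1 if
   equal, 0 otherwise, and stores in *steps how many character pairs were
   examined before the answer was known. */
static int equal_with_steps(const char *a, const char *b, size_t *steps)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && steps != NULL);
    while (a[i] == b[i]) {
        if (a[i] == '\0') {      /* both ended together: every pair matched */
            *steps = i + 1;
            return 1;
        }
        i++;
    }
    *steps = i + 1;              /* stopped early at the first difference */
    return 0;
}
```

Equal strings always cost their length plus one step, while a difference in the first character costs a single step.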

Posted

If I had to guess, using an unchanging diagram path adds maybe 100 µs, maybe a bit more, compared to the overhead of the Call Library Node itself, which is in the range of single microseconds. Comparing paths for equality is the most expensive comparison, because you can only establish equality after comparing every single element and character.

A quick test using my “Example1” shows that I can INSERT 100,000 points, each involving 4 calls with a diagram path, in 0.75 seconds (this time does not include the “COMMIT” to disk). That’s less than 2 microseconds per CLN. So the overhead of the diagram path can’t be that much. Though if it is a significant fraction of the 2 microseconds, it will be good to eventually get rid of it.

— James

Added later: I had a look at ShaunR’s “SQLite_Speed Example.vi”, which INSERTs pairs of strings: he can INSERT 100,000 in 0.36 seconds, half my time. So perhaps I will look into statically specifying the library. I wish I could specify it in one place, though. One thing a user might want to do is use a different SQLite3 version (compiled with different options, for example) for different applications, and statically specifying the library for each CLN makes that problematic. Is there any way to specify the path at runtime, but do it only once? Or at compile time, but specify it in only one place?
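The arithmetic behind these comparisons, using only the figures quoted in the two paragraphs above:

$$\frac{0.75\ \text{s}}{100\,000 \times 4\ \text{calls}} = 1.875\ \mu\text{s per CLN call}$$

$$\frac{0.75\ \text{s}}{100\,000} = 7.5\ \mu\text{s per INSERT}, \qquad \frac{0.36\ \text{s}}{100\,000} = 3.6\ \mu\text{s per INSERT}, \qquad \frac{3.6}{7.5} = 0.48$$

The two benchmarks insert different data (points versus pairs of strings), so the per-INSERT comparison is only a rough one.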

Posted

A quick test using my “Example1” shows that I can INSERT 100,000 points, each involving 4 calls with a diagram path, in 0.75 seconds (this time does not include the “COMMIT” to disk). That’s less than 2 microseconds per CLN. So the overhead of the diagram path can’t be that much. Though if it is a significant fraction of the 2 microseconds, it will be good to eventually get rid of it.

Always nice to have real numbers :D

I guess my estimates are still based on my days of working with 66 MHz i486 CPUs. :D

A modern dual core should of course smash that to pieces.

Posted

I’ve been benchmarking it (by just running a “Bind” in a loop), and using the path adds about 25 nanoseconds per CLN. Haven’t figured out yet why my code seems to be slower than Shaun’s (hope it’s not the LVOOP :) ).

Posted

I’ve been benchmarking it (by just running a “Bind” in a loop), and using the path adds about 25 nanoseconds per CLN. Haven’t figured out yet why my code seems to be slower than Shaun’s (hope it’s not the LVOOP :) ).

Now, 25 ns really amazes me! That's for a loop that needs to compare several dozen characters. It's probably optimized to operate on 4-byte integers instead of individual characters. Or maybe LabVIEW nowadays does use dirty flags for its data handles, but that seems rather unlikely.
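The "4 bytes at a time" guess can be illustrated like this. It is only a sketch of the general idea, not a claim about how LabVIEW actually compares paths (the thread does not establish that); `equal_words` is hypothetical and assumes both buffers have the same known length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compares two buffers of length n, 4 bytes per step where possible.
   memcpy into uint32_t avoids misaligned reads and aliasing problems. */
static int equal_words(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL);
    for (; i + 4 <= n; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, a + i, 4);
        memcpy(&wb, b + i, 4);
        if (wa != wb) {
            return 0;           /* some byte in this 4-byte block differs */
        }
    }
    for (; i < n; i++) {        /* leftover 0 to 3 bytes */
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

For a 40-character path this performs 10 word comparisons instead of 40 byte comparisons.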

An Always Copy node on the path wire before it is passed to the CLN should eliminate any cached dirty flags.

And if LVOOP turned out to be an important part of the picture, that would not surprise me at all. ;)

Posted

Added later: I had a look at ShaunR’s “SQLite_Speed Example.vi”, which INSERTs pairs of strings: he can INSERT 100,000 in 0.36 seconds, half my time. So perhaps I will look into statically specifying the library. I wish I could specify it in one place, though. One thing a user might want to do is use a different SQLite3 version (compiled with different options, for example) for different applications, and statically specifying the library for each CLN makes that problematic. Is there any way to specify the path at runtime, but do it only once? Or at compile time, but specify it in only one place?

There isn't a way to re-bind CLNs that I know of (there might be a scripting method, but...). If you configure the CLNs with a specific path (e.g., ".\bin\sqlite.dll"), that gives developers the opportunity to replace the file on disk with whichever version they prefer. Personally, I do not see value in being able to wire in the path of the DLL to use. If you have to programmatically determine which version of a library to use...

Do CLNs load their library at load time or at run time? If it's the latter (I believe it is), then you have until the first CLN call to programmatically copy a different DLL to your relative location. While dealing with an oversight I recently found in the lvsound2 library, I experimented with forcing a DLL to unload unilaterally. Unfortunately, that does not seem to be possible. With that in mind, if there is a subset of specific functions (e.g. sqlite_version()) that might need to be called as part of the logic to figure out which library to use, it would be wise to wrap them in a dynamically called VI so that the DLL does not stay resident.

Posted

It was my feeling that there is no clean way to directly connect SQLite3’s loose typing system with LabVIEW variants. One could make a system similar to the OpenG Variant Config VIs, where one inputs a cluster to define the datatypes to read in, but a straight “Get Column as Variant” seems to have too many gotchas to be worth it. If one did want to store arbitrary LabVIEW datatypes in SQLite, one could just flatten the data and store it as a BLOB, but I thought that option could be left outside the scope of the package.

I use the type of the data within SQLite to determine how to read it. When you use "Variant to Data" on a variant containing a 64-bit int (as read from SQLite), it can be converted into a 32-bit int without error (as can a double to a single). So I store ints (all sizes), singles, doubles and strings as their related SQLite types, empty variants as NULLs, and everything else (including NaNs) as flattened data. As mine is written, anything saved via a variant, when read back as a variant, will "Variant to Data" back to its original type without loss of data. That handles all the use cases I could think of. Flattening NaNs was the only iffy part. I don't think variant support is critical, but with the way my interface works it gives some advantages.
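A minimal sketch of these storage rules in C. The LabVIEW type enum, `choose_storage`, and the 8-byte `flatten_double`/`unflatten_double` pair are hypothetical; in particular, LabVIEW's Flatten To String produces its own (big-endian) layout, which this sketch does not reproduce. The point is only that a NaN copied byte for byte into a BLOB survives the round trip, whereas a NaN bound as a REAL ends up as NULL, as noted earlier in the thread.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical subset of the LabVIEW types a variant might carry. */
typedef enum { LV_EMPTY, LV_I8, LV_I16, LV_I32, LV_I64, LV_SGL, LV_DBL,
               LV_STRING, LV_CLUSTER } lv_type;

/* SQLite storage classes (names only; no SQLite headers are used). */
typedef enum { ST_NULL, ST_INTEGER, ST_REAL, ST_TEXT, ST_BLOB } storage;

/* Native SQLite types where they exist, NULL for an empty variant, and a
   flattened BLOB for everything else, including NaN. */
static storage choose_storage(lv_type t, double value_if_float)
{
    switch (t) {
    case LV_EMPTY:
        return ST_NULL;
    case LV_I8: case LV_I16: case LV_I32: case LV_I64:
        return ST_INTEGER;
    case LV_SGL: case LV_DBL:
        return isnan(value_if_float) ? ST_BLOB : ST_REAL;
    case LV_STRING:
        return ST_TEXT;
    default:
        return ST_BLOB;
    }
}

/* Copies the raw bytes of a double into an 8-byte buffer and back. */
static void flatten_double(double d, unsigned char out[8])
{
    assert(sizeof d == 8);
    memcpy(out, &d, sizeof d);
}

static double unflatten_double(const unsigned char in[8])
{
    double d;
    memcpy(&d, in, sizeof d);
    return d;
}
```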

Please do. I have wondered if it is a good idea to make functions like Step or Finalize also available as Property nodes, as that would allow more compact code in many cases (though as these functions aren’t really “properties” that might be confusing).

I wouldn't. You're already "abusing" the property system, but in a way that's close to its intent. My basic interface doesn't expose Step or Finalize, so it's not a problem I need to deal with.

Is that true? I wouldn’t have thought that, but I have never tested it. The advantage of passing the DLL path is that one can alter it easily. Do you have any performance data from your system that I could compare to?

On one of my benchmarks, where I write a bunch of string data to the database, I'm 12% slower if I pass the path in on just the bind-text CLN. It should be worse with numeric data (since those calls have far less overhead). I remember it being worse the last time I checked, so I guess some optimizations have been made since then.

I realized this after I did it. But I don’t want to introduce “pointers” into any public API function like “Prepare”. I am considering making an alternate, private version of “Prepare” that uses a pointer in this way to allow higher performance in VIs like “Execute SQL”.

Unless you're preparing multiple statements where the schema changes, you can prepare all of them before running any of them. If the schema is changing, then your "Execute SQL" could be used.

On the “to do” list. It's slightly tricky because of the need for mutexes described in the documentation:

"When the serialized threading mode is in use, it might be the case that a second error occurs on a separate thread in between the time of the first error and the call to these interfaces. When that happens, the second error will be reported since these interfaces always report the most recent result. To avoid this, each thread can obtain exclusive use of the database connection D by invoking sqlite3_mutex_enter(sqlite3_db_mutex(D)) before beginning to use D and invoking sqlite3_mutex_leave(sqlite3_db_mutex(D)) after all calls to the interfaces listed here are completed."

— James

I just assumed that the user (me, in my case) would never use the same database connection in two places at once. The possible bugs from that are far worse than the rare incorrect error description, so I just considered it bad practice and don't try to deal with its problems.

I’ve been benchmarking it (by just running a “Bind” in a loop), and using the path adds about 25 nanoseconds per CLN. Haven’t figured out yet why my code seems to be slower than Shaun’s (hope it’s not the LVOOP :) ).

It's not the LVOOP; mine uses LVOOP and is faster than Shaun's. On Shaun's benchmark with 100,000 points, his takes 181.95 (insert) and 174.82 (dump), while mine takes 155.43 and 157.77.

I have the Property Node interface working on mine (at least a basic version of what I'm going to do) and am currently optimizing it. I found a new way to confuse LabVIEW in the process. Right now, when working with large amounts of data, a VI that only returns an LVOOP object for storing parameters (which contains nothing at that point and takes only about 40 bytes of memory) can take hundreds of ms to run. I think it's trying to reuse the memory of the previous run's LVOOP object and has to free the old data I had assigned to it, but for some odd reason it can slow down drastically.

Posted

Do CLNs load their library at load time or at run time? If it's the latter (I believe it is), then you have until the first CLN call to programmatically copy a different DLL to your relative location. While dealing with an oversight I recently found in the lvsound2 library, I experimented with forcing a DLL to unload unilaterally. Unfortunately, that does not seem to be possible. With that in mind, if there is a subset of specific functions (e.g. sqlite_version()) that might need to be called as part of the logic to figure out which library to use, it would be wise to wrap them in a dynamically called VI so that the DLL does not stay resident.

If you specify the path in the configuration dialog, the DLL is loaded at load time. If you specify it through the path parameter, it is loaded at run time.

Posted

I use the type of the data within SQLite to determine how to read it. When you use "Variant to Data" on a variant containing a 64-bit int (as read from SQLite), it can be converted into a 32-bit int without error (as can a double to a single). So I store ints (all sizes), singles, doubles and strings as their related SQLite types, empty variants as NULLs, and everything else (including NaNs) as flattened data. As mine is written, anything saved via a variant, when read back as a variant, will "Variant to Data" back to its original type without loss of data. That handles all the use cases I could think of. Flattening NaNs was the only iffy part. I don't think variant support is critical, but with the way my interface works it gives some advantages.

If I’m imagining it right, your package goes the route of getting SQLite to serve as a LabVIEW data-type repository. I would guess you could abstract away the details of the SQLite loose typing system from the User entirely, making it simpler to learn.

I went a different route of defining an explicit boundary between the two type systems, with the User dealing directly with the SQLite types. So my BIND VIs refer to SQLite3 types (Text, Blob, Real, Integer) while my GET COLUMN VIs refer to LabVIEW types (string, DBL, I64). On its side of the boundary, I let SQLite store things as it wants, including storing empty strings or NaNs as NULL. This, I think, is an advantage if I need to interact with an SQLite database created by another program; can your package handle reading data from columns where some rows are NULL?
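Typed getters are what make this boundary work with NULLs written by other programs: SQLite documents that sqlite3_column_int64() and sqlite3_column_double() return 0 and 0.0 for a NULL value. The sketch below models only that behavior plus the plain numeric cases; `cell`, `get_i64` and `get_dbl` are hypothetical and do not call SQLite, and the clamping of out-of-range floats is this sketch's own choice.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { CELL_NULL, CELL_INTEGER, CELL_FLOAT } cell_type;

/* Hypothetical column value as read back from a row. */
typedef struct {
    cell_type type;
    int64_t i;   /* valid when type == CELL_INTEGER */
    double r;    /* valid when type == CELL_FLOAT */
} cell;

/* Like sqlite3_column_int64() for these cases: NULL reads as 0. */
static int64_t get_i64(cell c)
{
    switch (c.type) {
    case CELL_INTEGER:
        return c.i;
    case CELL_FLOAT:
        if (c.r != c.r) return 0;                              /* NaN guard */
        if (c.r <= -9223372036854775808.0) return INT64_MIN;   /* clamp */
        if (c.r >= 9223372036854775808.0) return INT64_MAX;    /* clamp */
        return (int64_t)c.r;                                   /* truncate */
    default:
        return 0;
    }
}

/* Like sqlite3_column_double() for these cases: NULL reads as 0.0. */
static double get_dbl(cell c)
{
    switch (c.type) {
    case CELL_INTEGER: return (double)c.i;
    case CELL_FLOAT:   return c.r;
    default:           return 0.0;
    }
}
```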

How do you handle the fact that LabVIEW strings can represent either ANSI text or binary data? The former maps to SQLite_TEXT, while the latter maps to SQLite_BLOB. Do you store all strings as TEXT?

Mine is a lower-level approach, I think, which has tradeoffs of greater flexibility but also greater required knowledge to use. Fortunately SQLite has excellent documentation.

On one of my benchmarks, where I write a bunch of string data to the database, I'm 12% slower if I pass the path in on just the bind-text CLN. It should be worse with numeric data (since those calls have far less overhead). I remember it being worse the last time I checked, so I guess some optimizations have been made since then.

I need to recheck my benchmark, as 25 ns seems unbelievably fast even to me.

I just assumed that the user (me, in my case) would never use the same database connection in two places at once. The possible bugs from that are far worse than the rare incorrect error description, so I just considered it bad practice and don't try to deal with its problems.

I’m trying to get this into OpenG, though, and don’t want rare race conditions. I could get around this issue using DVRs or some other lock, but that’s some effort, so I’m putting it off until I fully understand all issues.

It's not the LVOOP; mine uses LVOOP and is faster than Shaun's. On Shaun's benchmark with 100,000 points, his takes 181.95 (insert) and 174.82 (dump), while mine takes 155.43 and 157.77.

I realized a problem in using my Example1 as a benchmark. Fixing that, I’m still slower than Shaun by about 40%. I need to see if I can improve that.

— James

While dealing with an oversight I recently found in the lvsound2 library, I experimented with forcing a DLL to unload unilaterally. Unfortunately, that does not seem to be possible.

In the context help for the “Specify path..” checkbox, there is a “Tip” that seems to indicate the ability to unload a path-referenced dll:

[Image: context help “Tip” for the “Specify path...” checkbox]

Posted

If I’m imagining it right, your package goes the route of getting SQLite to serve as a LabVIEW data-type repository. I would guess you could abstract away the details of the SQLite loose typing system from the User entirely, making it simpler to learn.

Acting as a data repository is one of the things I set mine up for. I have functions similar to OpenG's "Write Panel to INI" and "Read Panel from INI", but far faster (since they don't have to convert to readable text).

I went a different route of defining an explicit boundary between the two type systems, with the User dealing directly with the SQLite types. So my BIND VIs refer to SQLite3 types (Text, Blob, Real, Integer) while my GET COLUMN VIs refer to LabVIEW types (string, DBL, I64). On its side of the boundary, I let SQLite store things as it wants, including storing empty strings or NaNs as NULL. This, I think, is an advantage if I need to interact with an SQLite database created by another program; can your package handle reading data from columns where some rows are NULL?

Mine handles NULL fine; how it handles it depends on the mode it's in. In string mode it gets zero-length strings, in variant mode the variants are null, and the newer typed reading depends on the particular type of that column.

How do you handle the fact that LabVIEW strings can represent either ANSI text or binary data? The former maps to SQLite_TEXT, while the latter maps to SQLite_BLOB. Do you store all strings as TEXT?

SQLite text can hold binary data just like LabVIEW strings. In mine, BLOBs are typically used to hold flattened LabVIEW data (although they don't have to). As far as SQLite is concerned, there is very little difference between BLOB and TEXT; the only things that come to mind are type affinities and BLOB I/O. I have an optimization where I assume TEXT rarely contains \0 when reading columns, but that's not a functional difference.

I’m trying to get this into OpenG, though, and don’t want rare race conditions. I could get around this issue using DVRs or some other lock, but that’s some effort, so I’m putting it off until I fully understand all issues.

The only way I know of that might work is to have the VIs for savepoints/BEGIN acquire the lock, and the VIs for COMMIT/RELEASE release it. Otherwise the user cannot compose multiple SQL commands together without the risk of parallel work screwing it up. Since SQLite is in serialized threading mode by default, I'm not sure that setup would even gain any protection. There's only so much you can do to protect programmers from themselves. With yours, what would happen to the data output if someone made a copy of the statement object and tried to process the two copies in parallel? I think you'd get a really nasty race condition, and I'm not sure there's a good way to stop that from happening. I've been meaning to suggest non-copyable class wires on the Idea Exchange for stuff like that, but I never really fleshed out the design in my head.

I realized a problem in using my Example1 as a benchmark. Fixing that, I’m still slower than Shaun by about 40%. I need to see if I can improve that.

Shaun's and mine are both highly optimized, so it'll take some work to catch up to them. I would suggest either inlining the majority of your VIs (mine does this) or using subroutine priority (Shaun's does this) as the first optimization to try.

Posted

Mine handles NULL fine; how it handles it depends on the mode it's in. In string mode it gets zero-length strings, in variant mode the variants are null, and the newer typed reading depends on the particular type of that column.

What I mean is: if you take your null variants from variant mode and try to convert them to a number, the “Variant to Data” node will throw an error. Your other two modes specify the type when getting the column, as mine does, allowing SQLite to do the NULL conversion.

SQLite text can hold binary data just like LabVIEW strings. In mine, BLOBs are typically used to hold flattened LabVIEW data (although they don't have to). As far as SQLite is concerned, there is very little difference between BLOB and TEXT; the only things that come to mind are type affinities and BLOB I/O.

Yeah, but why does SQLite, which is very economical in its number of types, bother with separate TEXT and BLOB types? It must make a significant difference somewhere. Remember, I want to remain compatible with non-LabVIEW programs, which may have their own constraints on valid TEXT data. Arbitrary binary data is generally NOT valid UTF-8 or UTF-16.
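If TEXT has to stay readable by non-LabVIEW programs, one option is to check a LabVIEW string before binding it and fall back to BLOB when it is not well-formed UTF-8. `is_valid_utf8` is a hypothetical helper, a straightforward validator that rejects overlong encodings, UTF-16 surrogates and code points above U+10FFFF; whether embedded \0 bytes should also be rejected is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf[0..n) is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0;
    assert(buf != NULL || n == 0);
    while (i < n) {
        unsigned char c = buf[i];
        size_t len;
        unsigned long cp;
        if (c < 0x80) { i++; continue; }                    /* ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                 /* stray continuation byte or 0xF8+ */
        if (n - i < len) return 0;                          /* truncated */
        for (size_t k = 1; k < len; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;      /* bad continuation */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return 0;           /* overlong */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        i += len;
    }
    return 1;
}
```

A binding layer could then choose TEXT when `is_valid_utf8` returns 1 and BLOB otherwise.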

I do have an eye towards eventually implementing BLOB I/O. Another difference between TEXT and BLOB are the collation functions and sort order.

I have an optimization where I assume TEXT rarely contains \0 when reading columns, but that's not a functional difference.

Could you explain this? I don’t see how \0’s have any effect. I extract TEXT or BLOBs as strings with this code, which is unaffected by \0’s:

[Image: block diagram of the code used to extract TEXT or BLOB columns as strings]

The only way I know of that might work is to have the VIs for savepoints/BEGIN acquire the lock, and the VIs for COMMIT/RELEASE release it. Otherwise the user cannot compose multiple SQL commands together without the risk of parallel work screwing it up. Since SQLite is in serialized threading mode by default, I'm not sure that setup would even gain any protection. There's only so much you can do to protect programmers from themselves. With yours, what would happen to the data output if someone made a copy of the statement object and tried to process the two copies in parallel? I think you'd get a really nasty race condition, and I'm not sure there's a good way to stop that from happening. I've been meaning to suggest non-copyable class wires on the Idea Exchange for stuff like that, but I never really fleshed out the design in my head.

A good point about the Statement, but a user could be running multiple statements from the same connection. I only need to lock the connection from the function execution until the query of the extended error code in order to be sure I get the correct code.

Shaun's and mine are both highly optimized, so it'll take some work to catch up to them. I would suggest either inlining the majority of your VIs (mine does this) or using subroutine priority (Shaun's does this) as the first optimization to try.

To get into OpenG, I have to be LabVIEW 2009 compatible, which means no inlining. And I think OpenG frowns on advanced optimizations (or even turning off debugging), so I may be stuck here.

— James

Posted

To get into OpenG, I have to be LabVIEW 2009 compatible, which means no inlining. And I think OpenG frowns on advanced optimizations (or even turning off debugging), so I may be stuck here.

Personally, I don't find subroutine priority an issue if it is applied sparingly and very specifically. But once someone starts applying it to just about every function in a library, that library becomes a clear trash-can candidate in my eyes. Blind optimization like this is about 10 times worse than no optimization at all.

If there are specific functions, as in this case a function that retrieves a single data item from a result set and is therefore potentially called thousands of times in a normal operation, subroutine priority may make sense, provided you know for sure that the function is fast and uninterruptible. By fast I mean that the function should not go through an entire hierarchy of driver layers to do its task, and it should not involve any operation that may block or be interrupted, such as disk I/O or, even worse, network access. If you know that the function only accesses already prepared data stored in the database refnum or result-set refnum, then a subroutine VI is a responsible choice; otherwise it is a disaster waiting to happen. Also consider that subroutine VIs can no longer be debugged, so you really don't want them throughout your entire LabVIEW VI library.

Applying subroutine priority to VIs that are not certain to be executed very repeatedly in loops is laziness and misapplied optimization, with nasty costs such as making the library hard to debug and potentially locking yourself up completely.

As for fixing your threading issue with retrieving error information, my choice would be to write a C wrapper around the SQLite DLL that returns the error code as the function return value. And since I'd already be at it, the wrapper would also take care of things like LabVIEW-friendly function parameters where necessary, semaphore locking of connections and other refnums where useful, and even the dynamic loading of selectable SQLite DLLs, if that were such a dear topic to me. :rolleyes: And I might create a solution based on user refnums, so that the entire access to the interface is done through Property and Method Nodes.
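A minimal sketch of such a wrapper: take a per-connection lock, make the call, read the error code, release the lock, and return the code directly so that no other thread can overwrite it in between. Everything here is hypothetical: `fake_conn` and `fake_exec` stand in for a sqlite3 connection and a real API call, and a pthread mutex stands in for SQLite's own mutex. A real wrapper would presumably use sqlite3_mutex_enter(sqlite3_db_mutex(db)) and sqlite3_extended_errcode(), as in the documentation quoted earlier in the thread.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical connection: the last error is shared per-connection state,
   like SQLite's. */
typedef struct {
    pthread_mutex_t lock;
    int last_error;
} fake_conn;

/* Stand-in for an API call that sets the connection-wide error code.
   It fails on empty SQL; 1 mirrors SQLITE_ERROR and 0 mirrors SQLITE_OK. */
static void fake_exec(fake_conn *c, const char *sql)
{
    c->last_error = (sql[0] == '\0') ? 1 : 0;
}

/* Wrapper: the call and the error-code read happen under one lock, and the
   code is returned directly instead of being queried by a separate call. */
static int locked_exec(fake_conn *c, const char *sql)
{
    int rc;
    assert(c != NULL && sql != NULL);
    pthread_mutex_lock(&c->lock);
    fake_exec(c, sql);
    rc = c->last_error;          /* read before any other thread can call in */
    pthread_mutex_unlock(&c->lock);
    return rc;
}
```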

Posted (edited)

What I mean is, if you take your null variants from variant mode and try to cast them to a number, the “Variant to Data” node will throw an error. Your other two modes specify the type when getting the column, as mine does, allowing SQLite to do the Null conversion.

If I save the number with the variant interface, it will not be stored as a NULL (that's why NaNs are flattened). My variant interface does return the type as stored in SQLite, in case it's being used for something more complicated than a data store.

Yeah, but why does SQLite, which is very economical in its number of types, bother making separate types for TEXT and BLOB? It must make a significant difference somewhere. Remember, I want to remain compatible with non-LabVIEW programs, which may have their own constraints on valid TEXT data. Arbitrary binary data is generally NOT valid UTF-8 or UTF-16.

I do have an eye towards eventually implementing BLOB I/O. Another difference between TEXT and BLOB are the collation functions and sort order.

I can write and recover invalid UTF-8 data fine, so I think the encoding only matters if you start using the UTF-16 functions, since SQLite would then have to convert the encoding, or if you set SQLite to use UTF-16 as its default encoding, since the length would then have to be even. Adding a collation sequence more intelligent than memcmp may also mess things up. There are differences between BLOB and TEXT, but I think they're more meaningful when your language uses C-style strings or you start getting fancy with how you handle text. I had forgotten that TEXT sorts before BLOB (even though by default both use memcmp to sort).
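One way to act on the interoperability concern is to bind a LabVIEW string as TEXT only when it is well-formed UTF-8, and as a BLOB otherwise. SQLite itself does not validate text on insert, as the round-trip test above suggests, so the check would have to live in the binding layer. The function name and the TEXT/BLOB policy below are assumptions for illustration, not something SQLite or either library does today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if s[0..len) is well-formed UTF-8
 * (RFC 3629), rejecting overlong forms, UTF-16 surrogates and code points
 * above U+10FFFF. A binding layer could bind such strings with
 * sqlite3_bind_text() and everything else with sqlite3_bind_blob().
 */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;   /* number of continuation bytes expected */
        uint32_t cp;    /* decoded code point */

        if (c < 0x80) {                 /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1;
            cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2;
            cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3;
            cp = c & 0x07;
        } else {
            return false;               /* stray continuation or invalid lead byte */
        }

        if (extra >= len - i)           /* sequence runs past the end */
            return false;

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;           /* missing continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return false;               /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;               /* surrogate or out of range */

        i += extra + 1;
    }
    return true;
}
```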

Could you explain this? I don’t see how \0’s have any effect. I extract TEXT or BLOBs as strings with this code, which is unaffected by \0’s:

[Screenshot of the LabVIEW code; image not available]

When I read text, the CLN's return value is a string. If that string's length doesn't match the expected number of bytes (which can only happen if it contains \0), I reread it using the MoveBlock method. So if the string doesn't contain \0 I can read it faster, but if it does, mine is slower. This optimization is the reason my select is faster than Shaun's. I would suggest using Byte Array to String instead of Type Cast; they're the same speed, and Byte Array to String will error if its input type gets changed somehow.
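The length check behind that fast path can be modeled in C. This is a sketch under stated assumptions: `text` stands for the pointer returned by `sqlite3_column_text()` (which SQLite always NUL-terminates) and `nbytes` for `sqlite3_column_bytes()` on the same column; both function names are hypothetical, and `copy_as_c_string` only imitates what a CLN configured to return a C string hands back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Imitates a CLN "C String Pointer" return: only the bytes up to the
 * first NUL are copied (at most cap bytes). Returns the number copied.
 */
size_t copy_as_c_string(const char *text, char *dst, size_t cap)
{
    size_t n = strlen(text);
    if (n > cap)
        n = cap;
    memcpy(dst, text, n);
    return n;
}

/*
 * Because text[nbytes] is '\0', strlen(text) can never exceed nbytes;
 * a shorter result means the value contains an embedded NUL and must be
 * reread in full (the MoveBlock path).
 */
bool needs_full_reread(const char *text, size_t nbytes)
{
    return strlen(text) != nbytes;
}
```

Values without an embedded NUL take the single-call path; only those that do pay for the second read.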

A good point about the Statement, but a user could be running multiple statements from the same connection. I only need to lock the connection from function execution until the extended error code has been queried in order to be sure I get the correct code.

It just seems like a lot of work to fix a rare, non-critical bug. And locking every function seems like it'll have a performance hit. Personally I would just add a note to the documentation that the error description can potentially be wrong if two errors occur nearly simultaneously, and not worry about it. The hard part, in my mind, is verifying that whatever you did actually fixed the bug. For now I would suggest adding the errmsg (it's really helpful with syntax errors) and making the race condition a low priority.

To get into OpenG I have to be LabVIEW 2009 compatible, which means no inlining. And I think OpenG frowns on advanced optimizations (or even turning off debugging), so I may be stuck here.

— James

I don't know the OpenG standards well enough to say. Maybe they'll be OK with careful use of subroutines (bindings, column reads and stepping should be prime candidates).

Personally I don't find subroutine priority a problem if it is applied sparingly and very specifically. But once someone starts applying it to just about every function in a library, that library becomes a clear candidate for the trash can in my eyes. Blind optimization like this is about 10 times worse than no optimization at all.

If there are specific functions, such as in this case a function that retrieves a single data item from a result set and is therefore potentially called thousands of times in a normal operation, subroutine priority may make sense, provided you know for sure that the function is fast and uninterruptible. By fast I mean that the function should not go through an entire hierarchy of driver layers and whatnot to do its task, and it should not involve any operation that may block or be interrupted, such as I/O to disk or, even worse, network access. If you know that the function only accesses already prepared data stored in the database refnum or result set refnum, then a subroutine VI is a responsible choice; otherwise it is just a disaster waiting to happen. Also consider that subroutine VIs can no longer be debugged, so you really don't want them throughout your entire LabVIEW VI library.

Applying subroutine priority to VIs that are not certain to be executed repeatedly in loops is laziness and misapplied optimization, with nasty costs such as making the library hard to debug and potentially locking yourself up completely.

The issues with subroutines and blocking are why I use inlining instead. When I activate inlining on VIs based on performance analysis, it tends to get applied liberally to low-level functions. I typically don't add it to high-level functions, so I don't drastically slow down the recompile times of their callers. Dead code elimination and inlining open up some options for making efficient interfaces: in one of my VIs, if you didn't wire one of its outputs, a lot of the code that generates it would be thrown out.

As for fixing your threading issue with retrieving error information, my choice would be to write a C wrapper around the SQLite DLL that returns the error code as the function return value. Since I'd be busy with that anyway, it would also take care of things like LabVIEW-friendly function parameters where necessary, semaphore locking of connections and other refnums where useful, and even the dynamic loading of selectable SQLite DLLs, if that were such a dear topic to me. :rolleyes: I might also create a solution based on user refnums, so that the entire interface is accessed through Property and Method Nodes.

We're talking about extended error information (a common use is a string description of syntax errors). Basic error codes are returned without a problem. SQLite is very well designed, so the only unfriendly function parameters I can think of are the function pointers for extending SQLite. By default the connections lock themselves (the error case is the only exceptional bug I know of that might get through those locks). Efficiently supporting selectable DLLs would require a C DLL, though. I assume user refnums are one of those not publicly documented LabVIEW APIs.

I was looking at how hard it would be to make SQLite into a scripting node. The only example of a scripting node I know of is LabPython, which apparently you wrote. Is there documentation for adding scripting nodes beyond that source code (lvsnapi.h)? My main issue was how to link separate script nodes to the same database connection, ideally passed around via a wire (I guess making a particular data type for getting and setting might work). Also, how do you return descriptive errors?

Edited by Matt W
Posted

We're talking about extended error information (a common use is a string description of syntax errors). Basic error codes are returned without a problem. SQLite is very well designed, so the only unfriendly function parameters I can think of are the function pointers for extending SQLite. By default the connections lock themselves (the error case is the only exceptional bug I know of that might get through those locks). Efficiently supporting selectable DLLs would require a C DLL, though. I assume user refnums are one of those not publicly documented LabVIEW APIs.

User refnums are used, for instance, by the DOM XML Library. They are indeed not documented, but they are not so much a LabVIEW API to call as a combination of an external shared library with a specific API interface and a text document describing that API to the LabVIEW object manager, so that the DLL gets linked properly when you use Property and Method Nodes on the corresponding user refnum. It's a powerful tool for extending LabVIEW with libraries without many LabVIEW VIs involved. And it works from LabVIEW 7 through 2011 without real issues, but there is no guarantee that it won't be removed in a future version.

I was looking at how hard it would be to make SQLite into a scripting node. The only example of a scripting node I know of is LabPython, which apparently you wrote. Is there documentation for adding scripting nodes beyond that source code (lvsnapi.h)? My main issue was how to link separate script nodes to the same database connection, ideally passed around via a wire (I guess making a particular data type for getting and setting might work). Also, how do you return descriptive errors?

While it's theoretically imaginable to interface an SQL database through a script node, I think it is highly impractical. The script node, at least in the version documented in lvsnapi.h (the only information I have available), is meant to work on a session context local to the particular script node, much like your SQLite connection. Passing this connection around to various script nodes is highly complicated and also delivers no real benefit, since the script contents are static. You can't change the script text at runtime, not even with VI scripting, as that is considered an edit operation that can only occur at edit time. So you end up writing your database interface in stone, which is very seldom how you want to access databases. At least some of the query parameters are usually dynamic, and while you could pass them into the script node as parameters, your script node needs to be able to interpret the entire script, so you need some parser too. The script node interface simply receives the text and the list of parameters and has to do something with them. Also, the supported parameter types are somewhat limited.

So you end up either with a script node that can only contain the SQL text you would otherwise pass to a method and always implements a specific SQL statement sequence, or you need to add some intermediate parser that gives you more flexibility in what you can put into the script node besides the SQL statements.

Posted

If I save the number with the variant interface, it will not be stored as a NULL (that's why NaNs are flattened)...

There are differences between BLOB and TEXT, but I think they're more meaningful when your language uses C-style strings...

I’m thinking about interoperability with other programs (admittedly not the most common use case) that don’t use flattened NaNs and the like.

When I read text, the CLN's return value is a string. If that string's length doesn't match the expected number of bytes (which can only happen if it contains \0), I reread it using the MoveBlock method. So if the string doesn't contain \0 I can read it faster, but if it does, mine is slower. This optimization is the reason my select is faster than Shaun's.

I’m surprised that would be faster than the MoveBlock method (but I’ve not benchmarked it).

I would suggest using Byte Array to String instead of Type Cast; they're the same speed, and Byte Array to String will error if its input type gets changed somehow.

Good catch; I don’t know why I didn’t use Byte Array to String there.

It just seems like a lot of work to fix a rare, non-critical bug. And locking every function seems like it'll have a performance hit. Personally I would just add a note to the documentation that the error description can potentially be wrong if two errors occur nearly simultaneously, and not worry about it. The hard part, in my mind, is verifying that whatever you did actually fixed the bug. For now I would suggest adding the errmsg (it's really helpful with syntax errors) and making the race condition a low priority.

I think you’re right. I’ll get the error message added.

Thanks,

— James

Posted

When I read text, the CLN's return value is a string. If that string's length doesn't match the expected number of bytes (which can only happen if it contains \0), I reread it using the MoveBlock method. So if the string doesn't contain \0 I can read it faster, but if it does, mine is slower. This optimization is the reason my select is faster than Shaun's.

Why would reading the data in as a string first be faster than reading it with MoveBlock() or into a byte array? In both cases it's one memory buffer copy.

I would suggest using Byte Array to String instead of Type Cast; they're the same speed, and Byte Array to String will error if its input type gets changed somehow.

Actually, ByteArrayToString has a slight chance of being faster than Typecast. That may sound counterintuitive if you think of Typecast in the C way, but the LabVIEW Typecast is a lot more complicated than a C typecast. It preserves the memory contents logically, but not necessarily physically as a C typecast would. For one thing, it always employs endianness swapping (not really an issue for byte data, as there is nothing to swap there), and typecasting an int32 to a float, for instance, in fact involves double byte swapping on little-endian machines (any LabVIEW machine at the current time except maybe the PowerPC VxWorks cRIO systems): once from the native integer format to the inherent flattened format, and then again from the flattened format to the native floating-point format. So a LabVIEW Typecast is anything but a simple C typecast.
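To make the int32-to-float case concrete, here is a C model of the two conversions described above. It is only a model of the described behavior, not LabVIEW's actual implementation; the function names are hypothetical, and it assumes `float` is a 32-bit IEEE 754 value. The shift-based flatten/unflatten steps are endianness-independent in C, but on a little-endian machine each one amounts to a byte swap.

```c
#include <stdint.h>
#include <string.h>

/* Write v as 4 big-endian bytes (LabVIEW's flattened byte order). */
static void flatten_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read 4 big-endian bytes back into a native 32-bit value. */
static uint32_t unflatten_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Model of the described Type Cast, int32 -> float:
   native int -> flattened (big-endian) bytes -> native float. */
float lv_typecast_i32_to_f32(int32_t x)
{
    unsigned char flat[4];
    uint32_t bits;
    float f;

    flatten_be32((uint32_t)x, flat);   /* first conversion (a swap on little-endian) */
    bits = unflatten_be32(flat);       /* second conversion (swaps back) */
    memcpy(&f, &bits, sizeof f);       /* reinterpret the bit pattern */
    return f;
}

/* C-style reinterpretation: the same bits, no intermediate format. */
float c_reinterpret_i32_to_f32(int32_t x)
{
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

Both functions yield bit-for-bit identical results; the difference is only the extra work on the way there.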

Posted (edited)

Why would reading the data in as a string first be faster than reading it with MoveBlock() or into a byte array? In both cases it's one memory buffer copy.

I'm not exactly sure why, but there was a speed improvement from it. A few possible reasons: the string copy is inlined and doesn't have the slight overhead of the CLN call, or it might use some form of memory copy instead of move (copy doesn't handle overlap, so it should be faster).

I don't think I can read directly into a byte array. The size of the data comes from a separate CLN, and the pointer to the string data is the return value. So there's no way (that I can see) to size the byte array beforehand, which is why I use MoveBlock.

Actually, ByteArrayToString has a slight chance of being faster than Typecast. That may sound counterintuitive if you think of Typecast in the C way, but the LabVIEW Typecast is a lot more complicated than a C typecast. It preserves the memory contents logically, but not necessarily physically as a C typecast would. For one thing, it always employs endianness swapping (not really an issue for byte data, as there is nothing to swap there), and typecasting an int32 to a float, for instance, in fact involves double byte swapping on little-endian machines (any LabVIEW machine at the current time except maybe the PowerPC VxWorks cRIO systems): once from the native integer format to the inherent flattened format, and then again from the flattened format to the native floating-point format. So a LabVIEW Typecast is anything but a simple C typecast.

I was referring to this particular typecast. Since byte arrays and strings have identical memory layouts and don't require swaps (at least on Windows; I'm unsure whether alignment differs between the two on other systems), I assumed they have the same performance (I believe I tested that assumption at some point as well). I wasn't aware that typecasting between i32 and float was so costly. I don't see why they don't just do what C does and simply interpret the bits differently; it sounds like the two methods produce the same results.

Edited by Matt W
Posted

I’m surprised that would be faster than the MoveBlock method (but I’ve not benchmarked it).

— James

I wouldn't worry too much about performance to begin with. Getting everything mapped out and functioning is (IMHO) more important, since a lack of optimisation does not prevent its use, and optimisation can take a while because it is an iterative process (it can be done with each stable release).

If you are looking at making it directly compatible with other apps for viewing, you will need to insert using the "string to UTF8" VI and recover using the "UTF8 To String" VI, as the methods Matt and I use do not perform this conversion.
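For readers wondering what that conversion involves, here is a minimal C sketch, assuming the LabVIEW string holds ISO-8859-1 (Latin-1) text. It is not the implementation of the LabVIEW VIs named above, and the function names are hypothetical. On a Western-European Windows system, LabVIEW strings typically use code page 1252 instead, which differs from Latin-1 in bytes 0x80 to 0x9F (for example the euro sign), so a real implementation would need a lookup table for that range.

```c
#include <stddef.h>

/* Latin-1 -> UTF-8. dst needs room for 2 * len bytes. Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[j++] = c;                                  /* ASCII is unchanged */
        } else {
            dst[j++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte 0xC2 or 0xC3 */
            dst[j++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return j;
}

/* UTF-8 -> Latin-1. dst needs room for len bytes. Returns bytes written, or
   (size_t)-1 if the input is malformed or contains a code point above U+00FF. */
size_t utf8_to_latin1(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i = 0, j = 0;
    while (i < len) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[j++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   (src[i + 1] & 0xC0) == 0x80) {
            dst[j++] = (unsigned char)(((c & 0x1F) << 6) | (src[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;    /* not representable in Latin-1 */
        }
    }
    return j;
}
```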

UTF8 Conversion

Why would reading the data in as a string first be faster than reading it with MoveBlock() or into a byte array? In both cases it's one memory buffer copy.

Because to use MoveBlock you have to make 3 API calls rather than one (get the pointer, get the size, and then move it).

When I read text, the CLN's return value is a string. If that string's length doesn't match the expected number of bytes (which can only happen if it contains \0), I reread it using the MoveBlock method. So if the string doesn't contain \0 I can read it faster, but if it does, mine is slower. This optimization is the reason my select is faster than Shaun's.

That's not the reason ;)
