
What do you use JSON for?


drjdpowell


I’m working on a new JSON library that I hope will be much faster for my use cases.  It skips any intermediate representation (like the LVOOP objects in the LAVA JSON API, or the Variants in some other JSON toolkits) and works directly on JSON text strings.  I’d like to avoid making a tool just for myself, so I’d like to know what other people are using JSON for.  Is anyone using JSON for large(ish) data?  Application Config files?  Communication with remote non-LabVIEW programs?  Databases?


Yes.

I now use the SQLite JSON capabilities for in-application uses, but I also have my own parser for Websockets and comms. The other JSON library was just too slow for streaming Websockets, and the NI primitive is as much use as a chocolate fireguard because it crashes out if anything isn't quite right (which I've raged about before). If you want to see this sort of use case, take a look at blockchain.info for the real-time transactions.

I went back to my original parser from that earlier thread and developed it further, adding a format case for each and every type and using queues for the nesting (I kept the polymorphic reads the same as the original). It is acceptably slower than the native one and orders of magnitude faster than the other library on large data (and much the same on small snippets), although it isn't as good with all the different encodings, which it just hand-waves to a string.
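In pseudo-Python, the queues-for-nesting idea looks roughly like this -- an explicit stack instead of recursion or objects (a sketch of the general approach, not the actual LabVIEW implementation):

```python
def walk_containers(s):
    """Scan JSON text once, tracking nesting with an explicit stack
    (the textual analogue of using queues instead of recursive objects).
    Yields (container_text, depth) for every completed object/array."""
    stack = []                      # holds (opening_char, start_index)
    in_str = esc = False
    for i, c in enumerate(s):
        if in_str:                  # inside a string: only look for its end
            if esc:
                esc = False
            elif c == '\\':
                esc = True
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
        elif c in '{[':
            stack.append((c, i))
        elif c in '}]':
            _opener, start = stack.pop()
            yield s[start:i + 1], len(stack)

for text, depth in walk_containers('{"a": [1, 2], "b": {"c": 3}}'):
    print(depth, text)   # 1 [1, 2] / 1 {"c": 3} / 0 <the whole object>
```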

Edited by ShaunR
8 hours ago, ShaunR said:

The other JSON library was just too slow for streaming Websockets, and the NI primitive is as much use as a chocolate fireguard because it crashes out if anything isn't quite right.

One of the performance advantages of working directly with JSON strings is that, when converting JSON to/from a Variant, one can use the NI primitives to handle large numeric arrays, without ever letting it see (and throw a fit over) the entire string.  In fact, I can often pull out selected elements of a large JSON text faster than using the NI primitive’s “path” input (I think because the primitive insists on parsing the entire string for errors, while I don’t).  
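In Python terms, the idea looks roughly like this (extract_item is a made-up name and the key search is deliberately naive -- a sketch of the approach, not the library code):

```python
import json

_decoder = json.JSONDecoder()

def extract_item(json_text, key):
    """Find '"key":' by plain string search, then decode only the value
    after it -- nothing before or after gets validated. Naive sketch:
    assumes the key occurs once, at the level you want."""
    marker = '"%s"' % key
    i = json_text.index(marker) + len(marker)
    i = json_text.index(':', i) + 1
    while json_text[i] in ' \t\r\n':
        i += 1
    value, _end = _decoder.raw_decode(json_text, i)
    return value

big = '{"meta": {"id": 42}, "data": [0.0, 1.5, 2.5]}'
print(extract_item(big, 'meta'))   # {'id': 42} -- "data" is never parsed
```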

22 hours ago, drjdpowell said:

I’m working on a new JSON library that I hope will be much faster for my use cases.  It skips any intermediate representation (like the LVOOP objects in the LAVA JSON API, or the Variants in some other JSON toolkits) and works directly on JSON text strings.  I’d like to avoid making a tool just for myself, so I’d like to know what other people are using JSON for.  Is anyone using JSON for large(ish) data?  Application Config files?  Communication with remote non-LabVIEW programs?  Databases?

I usually use it for config files or for any debug information (web service output, syslog messages, etc.) which might be read by a human. I'm not sure what quantity makes the data 'large', but it could certainly be a few pages of data if you have arrays. Right at this moment I'm also using it for TCP messages, but I may swap that over to flattened strings -- even if there's no real reason, as a rule I try to avoid using LV-proprietary formats. For the cfg use, performance isn't a huge deal, but for everything else I feel like the LAVA API is too slow for me to run to it for everything. This may be unfair, but in general for those uses I'll pull out the built-in Flatten To JSON.

One thing I can say for sure is I've never needed the in-memory key-value features of the LAVA API. I just use the JSON stuff as an interchange, so all those objects only ever go into one function. The other issue I've had with it is deploying to RT... LabVIEW doesn't like some objects on RT, and the LAVA API falls into that category. Unsure why, but it caused a lot of headaches a few months back when I tried to use it -- ended up just reverting.

Associated with my main usage, the things I'd love to see are:
1-Handle enums and timestamps and similar common types without being whiny about how it's not in the standard, like the built-in API is.
--->This is just because I generally do a quick flatten/unflatten for the cfg files, syslog, and TCP messages. Using the LV API you have to manually convert every offending element, which soaks up any speed boost you get from using the built-in one.
2-Discover and read optional components (technically possible to read optional ones with the LV API, but pretty wasteful and also gross; unless there is magic I don't know, there is no way to discover with the built-in API). See the sketch after this list.
--->Again on the cfg side, being able to pull a substring out as a 'raw JSON object' or something and pass that off to a plugin would let you nicely format things that might change. On the generation side, letting the plugin return a plain JSON object and appending that into the tree is handy too. For the higher-speed code I guess I don't really need this.
3-I love the LAVA API's pretty-print.
--->It's just handy for debugging and, for the cfg files, it's nice to be able to easily read it. Not important for the TCP/syslog use cases. (It occurs to me it would be easy to use the LAVA API for this too, since for config files the slower speed doesn't matter so much.)
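To make point 2 concrete, roughly this round trip, sketched in Python (plugin_reply is a stand-in for whatever the plugin returns):

```python
import json

cfg = json.loads('{"rate": 1000, "plugin": {"shape": "sine", "freq": 50}}')

# Hand the plugin its section as raw JSON text, not parsed data,
# so the host never needs to know the section's structure:
raw_section = json.dumps(cfg.get('plugin', {}))

plugin_reply = '{"shape": "square", "freq": 60}'   # stand-in for plugin output

# ...and splice whatever JSON the plugin returns back into the tree:
cfg['plugin'] = json.loads(plugin_reply)

print(json.dumps(cfg, indent=2))                   # point 3: pretty-print
```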

Edited by smithd
11 hours ago, drjdpowell said:

One of the performance advantages of working directly with JSON strings is that, when converting JSON to/from a Variant, one can use the NI primitives to handle large numeric arrays, without ever letting it see (and throw a fit over) the entire string.  In fact, I can often pull out selected elements of a large JSON text faster than using the NI primitive’s “path” input (I think because the primitive insists on parsing the entire string for errors, while I don’t).  

I was initially manipulating the string, but then you demonstrated the recursive approach with objects for encoding, which was more elegant and removed all the dodgy string logic to handle the hierarchy. Once I found that classes just didn't cut it for performance (as per usual), I went back and solved the same problem with queues.

The fundamental difference in my initial approach was that the retrieval type was chosen by the polymorphic instance that the developer chose (it ignored the implicit type in the JSON data). That was fast, but getting a key/value table was ugly. Since all key/value pairs were strings internally, the objects made it easier to get them into a lookup table. Pushing and popping queues was much faster and more efficient at that, though, and didn't require large amounts of contiguous memory.

6 hours ago, smithd said:

-Handle enums and timestamps and similar common types without being whiny about how it's not in the standard (yet apparently changing the standard for DBL types was just fine).
-Discover and read optional components (technically possible with the LV API, but pretty wasteful and also gross).
-I love the LAVA API's pretty-print.

I’d add:

- Work on a stream (i.e. allow the JSON Value to be followed by something else, like the regular Flatten functions have a “Rest of String” output).

- Give useful error messages that include where in the very long JSON text the parser had an issue.

- Work with “sub-JSON”. Meaning, “I know there is an ‘Options’ item, but it can come in multiple forms, so just return me that item as JSON so I can do more work on it (or pass it on to an application subcomponent that does know what the form is).”

The library I’m working on, JSONtext, is trying to be sort of an extension to the inbuilt JSON primitives that adds all these features.
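As a reference point, Python's json module already behaves this way for the first two items -- raw_decode returns the value plus where it ended, and errors carry a character position (the sample stream here is invented):

```python
import json

decoder = json.JSONDecoder()
stream = '{"id": 1, "data": [1, 2, 3]} {"id": 2} oops'   # values back to back

pos = 0
while pos < len(stream):
    try:
        value, pos = decoder.raw_decode(stream, pos)  # "Rest of String" idea:
        print(value)                                  # pos now points past the value
    except json.JSONDecodeError as err:
        print('parse error at character', err.pos)    # useful error location
        break
    while pos < len(stream) and stream[pos].isspace():
        pos += 1                                      # skip whitespace between values
```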

7 hours ago, smithd said:

I usually use it for config files or for any debug information (web service output, syslog messages, etc.) which might be read by a human. I'm not sure what quantity makes the data 'large', but it could certainly be a few pages of data if you have arrays. Right at this moment I'm also using it for TCP messages, but I may swap that over to flattened strings -- even if there's no real reason, as a rule I try to avoid using LV-proprietary formats. For the cfg use, performance isn't a huge deal, but for everything else I feel like the LAVA API is too slow for me to run to it for everything. This may be unfair, but in general for those uses I'll pull out the built-in Flatten To JSON.

One thing I can say for sure is I've never needed the in-memory key-value features of the LAVA API. I just use the JSON stuff as an interchange, so all those objects only ever go into one function. The other issue I've had with it is deploying to RT... LabVIEW doesn't like some objects on RT, and the LAVA API falls into that category. Unsure why, but it caused a lot of headaches a few months back when I tried to use it -- ended up just reverting.

Associated with my main usage, the things I'd love to see are:
1-Handle enums and timestamps and similar common types without being whiny about how it's not in the standard, like the built-in API is.
--->This is just because I generally do a quick flatten/unflatten for the cfg files, syslog, and TCP messages. Using the LV API you have to manually convert every offending element, which soaks up any speed boost you get from using the built-in one.
2-Discover and read optional components (technically possible to read optional ones with the LV API, but pretty wasteful and also gross; unless there is magic I don't know, there is no way to discover with the built-in API).
--->Again on the cfg side, being able to pull a substring out as a 'raw JSON object' or something and pass that off to a plugin would let you nicely format things that might change. On the generation side, letting the plugin return a plain JSON object and appending that into the tree is handy too. For the higher-speed code I guess I don't really need this.
3-I love the LAVA API's pretty-print.
--->It's just handy for debugging and, for the cfg files, it's nice to be able to easily read it. Not important for the TCP/syslog use cases. (It occurs to me it would be easy to use the LAVA API for this too, since for config files the slower speed doesn't matter so much.)

I don't use any of them for this sort of thing. They introduced the JSON extension as a build option in SQLite, so it just goes straight in (raw) to an SQLite database column, and you can query the entries with SQL just as if it were a table. It's a far superior option (IMO) to anything in LabVIEW for retrieval, including the native one. I did write a quick JSON exporter in my API to create JSON from a query as the corollary (along the lines of the existing export to CSV), but since no-one is investing in the development anymore, I'm pretty "meh" about adding new features even though I have a truck-load of prototypes.
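From any language with SQLite bindings the pattern looks like this; a Python sketch, assuming a SQLite build with the JSON1 extension compiled in (recent versions include it by default):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE results (doc TEXT)')       # raw JSON goes in as-is
db.execute('INSERT INTO results VALUES (?)',
           ('{"test": "ramp", "passed": true, "points": [1.0, 2.5, 3.1]}',))

# Query the JSON column with SQL, just as if it were a table:
row = db.execute(
    "SELECT json_extract(doc, '$.test'), json_extract(doc, '$.points[1]') "
    "FROM results WHERE json_extract(doc, '$.passed')").fetchone()
print(row)   # ('ramp', 2.5)
```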

(And yes. I figuratively wanted to kiss Ton when he wrote the pretty print :D )


Some performance numbers:

I took the "Message on new transaction" JSON from the blockchain.info link that Shaun gave, created a cluster for it, and compared the latest LAVA-JSON-1.4.1**, inbuilt NI-JSON, and my new JSONtext stuff for converting JSON to a cluster.

  • LAVA-JSON: 7.4 ms
  • NI-JSON: 0.08 ms
  • JSONtext: 0.6 ms

Then I added a large array of 10,000 numbers to bulk the JSON out by 50kB.

If I add the array to the Cluster I get these numbers:

  • LAVA-JSON: 220 ms
  • NI-JSON: 5.6 ms
  • JSONtext: 9.0 ms   (I pass the large array to NI-JSON internally, which is why I'm closer)

If I don't add the array to the cluster (say, I'm only interested in the metadata of a measurement):

  • LAVA-JSON: 135 ms
  • NI-JSON: 5.2 ms  
  • JSONtext: 1.1 ms  

The NI tools appear to vet everything very carefully, even unused elements, while I do the minimal checking needed to parse past the large array (in fact, if I find all cluster elements before reaching the array, I just stop, meaning the time to convert is 0.6 ms, as if the array wasn't there).

**Note: earlier LAVA-JSON versions would be notably worse, especially for the large case.
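The "parse past it" trick looks roughly like this in Python -- only the bracket and string tracking needed to find where a value ends, with no validation of its contents (a sketch of the approach, not the JSONtext implementation):

```python
def skip_value(s, i):
    """Return the index just past the JSON value starting at s[i],
    doing only the minimal tracking needed to find its end."""
    if s[i] in '{[':
        depth = 0
        in_str = esc = False
        while i < len(s):
            c = s[i]
            if in_str:
                if esc:
                    esc = False
                elif c == '\\':
                    esc = True
                elif c == '"':
                    in_str = False
            elif c == '"':
                in_str = True
            elif c in '{[':
                depth += 1
            elif c in '}]':
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
        raise ValueError('unterminated container')
    if s[i] == '"':
        i += 1
        esc = False
        while True:
            c = s[i]
            if esc:
                esc = False
            elif c == '\\':
                esc = True
            elif c == '"':
                return i + 1
            i += 1
    while i < len(s) and s[i] not in ',]} \t\r\n':
        i += 1                      # number / true / false / null
    return i

huge = '{"meta": 1, "big": [0, 1, 2, 3], "more": true}'
print(huge[skip_value(huge, huge.index('[')):])   # ', "more": true}' --
                                                  # the array was never parsed
```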

2 hours ago, ShaunR said:

I don't use any of them for this sort of thing. They introduced the JSON extension as a build option in SQLite, so it just goes straight in (raw) to an SQLite database column, and you can query the entries with SQL just as if it were a table. It's a far superior option (IMO) to anything in LabVIEW for retrieval, including the native one.

I prototyped using an in-memory SQLite DB to do JSON operations, but I can get comparable speed by direct parsing. Still, using JSON support in a database is a great option.

35 minutes ago, drjdpowell said:

I prototyped using an in-memory SQLite DB to do JSON operations, but I can get comparable speed by direct parsing. Still, using JSON support in a database is a great option.

300MB/sec? :D

If you want bigger JSON streams, the bitcoin order books are usually a few MB.

Edited by ShaunR
11 hours ago, smithd said:

One thing I can say for sure is I've never needed the in-memory key-value features of the LAVA API. I just use the JSON stuff as an interchange, so all those objects only ever go into one function.

I have only used it that way a bit, and that was basically for recording attributes of a dataset as it passed through a chain of analysis; I was only adding things at a few places before the result got saved to disk/database. The new JSONtext library has Insert functions to add/change values. These are slower than the old library's, but not by enough to make up for its expensive conversion to/from LVOOP objects unless one is doing hundreds of inserts. If someone is using LAVA-JSON objects in such a way, I would like to know about it.
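The flavour of edit an Insert function does on the text is roughly this (a toy Python stand-in that only handles a flat, non-empty object; the real functions are LabVIEW VIs):

```python
def insert_item(obj_text, key, value_json):
    """Splice "key": value into a JSON object's text directly,
    with no conversion to an intermediate representation."""
    body = obj_text.rstrip()
    assert body.endswith('}'), 'expects a JSON object'
    return '%s, "%s": %s}' % (body[:-1].rstrip(), key, value_json)

print(insert_item('{"a": 1}', 'b', '[2, 3]'))   # {"a": 1, "b": [2, 3]}
```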

Edited by drjdpowell
1 hour ago, drjdpowell said:

Only 125MB/sec, but I was testing a call to 'SELECT json_extract($json,$path)', which has the extra overhead of getting the JSON string in and out of the db. I wish I could match 300MB/sec in LabVIEW.

That's nothing to sniff at. At some point you just have to say "this is the wrong way to approach this problem". JSON isn't a high-performance NoSQL database - it's just a text format, and one designed for a non-threaded, interpreted scripting language (so performance was never on the agenda :lol:).

6 hours ago, ShaunR said:

I don't use any of them for this sort of thing. They introduced the JSON extension as a build option in SQLite, so it just goes straight in (raw) to an SQLite database column, and you can query the entries with SQL just as if it were a table. It's a far superior option (IMO) to anything in LabVIEW for retrieval, including the native one. I did write a quick JSON exporter in my API to create JSON from a query as the corollary (along the lines of the existing export to CSV), but since no-one is investing in the development anymore, I'm pretty "meh" about adding new features even though I have a truck-load of prototypes.

(And yes. I figuratively wanted to kiss Ton when he wrote the pretty print :D )

I'm stuck with plain files until NI moves PXI over to Linux RT (I haven't heard any official confirmation this will happen; I'm just assuming they didn't decide to upgrade the entire cRIO line while leaving their high-performance automated test hardware on a 10-year-old OS). It sounds like Pharlap doesn't support SQLite.

2 hours ago, drjdpowell said:

I've had a client hand me a 4GB JSON array, so I'm OK for large test cases. :D

Ah, so that's what you mean by large ;). Nothing like that on my end, but it occurs to me that one of the things I'm doing (in the category of 'stuff I might just flatten to string') is streaming images from a server to a client. Basically I flatten the image separately to a PNG and then put that in a JSON object with some metadata (time, format, etc...). My point here is that as part of the JSON generation step for me, I'm passing in a large binary string which has to be escaped and handled by the Flatten To JSON function. I realize this is probably a bit unusual, but I thought I'd mention it.
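Roughly this pattern, in Python terms (field names invented; latin-1 decoding stands in for LabVIEW's byte-preserving strings):

```python
import json, time

png_bytes = b'\x89PNG\r\n\x1a\n...'   # stand-in for a real flattened image

message = json.dumps({
    'time': time.time(),
    'format': 'png',
    # latin-1 maps every byte to one code point; json.dumps escapes the rest
    'data': png_bytes.decode('latin-1'),
})

recovered = json.loads(message)['data'].encode('latin-1')
assert recovered == png_bytes   # round trip preserves every byte
```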

4 hours ago, smithd said:

My point here is that as part of the JSON generation step for me, I'm passing in a large binary string which has to be escaped and handled by the Flatten To JSON function.

Be careful if you use the NI function to get your binary data back, as it has a bug that will truncate the string at the first zero, even though that zero is properly escaped as \u0000. PNG files might or might not have zeros in them, but other binary things do (flattened LVOOP objects, for example).
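A correct parser should pass this round-trip check -- shown in Python purely to illustrate the expected behaviour:

```python
import json

binary = 'abc\x00def'                  # a "binary" string with an embedded zero
encoded = json.dumps(binary)           # -> "abc\u0000def", properly escaped
assert json.loads(encoded) == binary   # the full string comes back; the bug
                                       # described above would return just 'abc'
```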

10 hours ago, ShaunR said:

Pharlap is a walk in the park ;). VxWorks was the one that made me age 100 years :D. I actually have some source with the relevant changes but never had a device.

Yeah, I've tried to compile things for VxWorks. Even simple things suck. I know Pharlap is just crappy Windows 95, but I'd still rather not edit the source to get it working. I don't trust myself to maintain a working build process.

8 hours ago, drjdpowell said:

Be careful if you use the NI function to get your binary data back, as it has a bug that will truncate the string at the first zero, even though that zero is properly escaped as \u0000. PNG files might or might not have zeros in them, but other binary things do (flattened LVOOP objects, for example).

Oh, meh. How irritating. I don't think the PNGs do, but it's worth checking. That's the part of the system I haven't really gotten around to testing properly yet :/


Looks good, the performance stats look great.

I use the LAVA library in most projects. Normally just small config clusters, but I once had to write my own parser for a 1600-element object (that needed to be quite high-performance).

Sometimes in config files I will use the key-value mode to read what items are in the object, to help with defaults if they're missing or with version migration. I guess that isn't the intention of this API, but it's the only use case I have that this wouldn't work for.

Great performance - sometimes the simple methods are best!
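The defaults-if-missing idea above, sketched in Python (key names invented for illustration):

```python
import json

defaults = {'rate': 1000, 'channels': 4, 'log_path': 'data.log'}

loaded = json.loads('{"rate": 2000}')     # e.g. a config file from an old version
missing = set(defaults) - set(loaded)     # discover which items the file lacks
config = {**defaults, **loaded}           # fill in defaults, keep what was set

print(config)    # {'rate': 2000, 'channels': 4, 'log_path': 'data.log'}
print(missing)   # {'channels', 'log_path'}
```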

1 hour ago, JamesMc86 said:

Sometimes in config files I will use the key-value mode to read what items are in the object, to help with defaults if they're missing or with version migration. I guess that isn't the intention of this API, but it's the only use case I have that this wouldn't work for.

Oh, that IS a use case. Though lookup is much slower than with a Variant-attribute-based object, it is much faster than doing the initial conversion to that object, so one is well ahead overall if one only needs to do a few lookups.

  • 3 weeks later...
Quote

Be careful if you use the NI function to get your binary data back, as it has a bug that will truncate the string at the first zero, even though that zero is properly escaped as \u0000. PNG files might or might not have zeros in them, but other binary things do (flattened LVOOP objects, for example).

Ugh, this is a real killer. I had issues upon issues with transferring images, and then right when I thought I had a solution, this hit me. Meh...

 

The real reason I'm posting is just to bump and see how JSONtext is coming along. It looks like you're still pretty actively working on it on Bitbucket... do you feel more confident about it, or would you still call it "VERY untested"? I'd love to try it out for real when you get closer to, shall we say, a 'beta' release?

Also, from what I could tell there isn't a license file in the code. Are you planning on licensing it any differently from your other libraries, or did you just never get around to putting in a file?
