
Should File IO be handled by a central actor?


AlexA


Following on from the idea of division of responsibility, I designed my application such that each individual process in charge of controlling and acquiring data from a specific piece of hardware would send its acquired data to another actor, which maintained all the file references (TDMS files) and was responsible for opening and closing files as well as saving the data to disk.

 

A consequence of this decision is that someone wanting to introduce a new piece of hardware + its corresponding control code must go further than just dropping a plugin that meets the communication contract into a directory. They must implement all the file IO stuff in the file IO process.

 

The thought has entered my mind that perhaps it would be better to make File IO the responsibility of the plugin that wants to save acquired data. So each individual process would implement a sub loop for saving its own data. The template for plugins would then include the basic file IO stuff required so it would be much easier for someone to just modify the data types to be saved.

 

My goal here is ease of maintenance/extension.

 

The most important consideration for me, apart from ease of extension, is whether having a bunch of independent processes talking to the OS will be more CPU-heavy than one single actor (acting as an intermediary between the OS and all the other processes).

 

Does anyone have any experience in this area?


I switched to a service-oriented approach a while ago, which is the premise of what you are pondering. You can see a simple example in the VIM demo, along with an EDSM.

 

You will note a couple of services, one of which is FILE, which enables other modules to access basic read functionality, and SOUND which, well, plays the sounds :D . Error logging is another that lends itself to this topology, and in real systems I also have a Database service that comes in very handy. The way things usually pan out is you have a set of services that provide a core functionality and supplemental modules can use them if they want. You define an API that other modules can use so they don't have to implement everything themselves. Looking at it that way, there is no presupposition that any module is functionally complete, only that it is a provider of certain features if you decide to use them. No one is forced to, but it is advantageous to do so. If a service gets too heavy, split it out into a couple of services. The module layout doesn't matter; only the message API interface does.
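
Outside LabVIEW, a rough sketch of what that looks like from a client module's point of view might be the following (illustrative Python; the service names, message strings and send() helper are made-up examples, not anything from the demo):

import queue

# Illustrative only: each core service owns a named command queue and client
# modules only need to know the service name and the message strings.
services = {name: queue.Queue() for name in ("FILE", "SOUND", "ERROR")}

def send(service, message):
    # Post a string command to the named service's queue.
    services[service].put(message)

send("FILE", "READ>TEXT>config.ini")     # ask FILE for a file's contents
send("SOUND", "PLAY>alarm.wav")          # ask SOUND to play a clip
send("ERROR", "LOG>DAQ>timeout on AI1")  # record an error via the ERROR service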

 

Because each service is a self-contained module and all interaction is via its message interface, you can transplant them into other applications or expand the feature set, as you can see I do by adding TCPIP here.

Edited by ShaunR

Interesting. Let me see if I understand you correctly, as there are a number of implementation differences that I might get hung up on.

You would copy and paste that File sub VI into every module that may want to do file IO. You access it using the named queues functionality, rather than maintaining queue references or anything like that.

I note that the File IO is non-reentrant; what does this mean if there are multiple "plugins" which have it on their block diagrams? Or are you proposing that each plugin essentially has its own version of that File IO subVI?


You would copy and paste that File sub VI into every module that may want to do file IO.

 

No. There is only one that supplies the FILE service, and it doesn't matter where you put it or how you load it. Plonking it on the diagram of the Main.vi is just a way to load it that can be easily seen and recognised.

 

You access it using the named queues functionality, rather than maintaining queue references or anything like that.

 

Yes, but you don't have to. That is just an implementation detail of the messaging framework I use. Each service is controlled with a queue and data is retrieved via an event. That is the module strategy. The tactical choice of pure string messaging breaks the cohesion between modules and makes the messaging system network agnostic. The use of queue names is an implementation choice to achieve the latter.
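
As a loose illustration of that split (in Python rather than LabVIEW, with made-up names): a command queue in, an "event" broadcast out, and the queue obtained purely by name so nothing holds references to anything else.

import queue, threading

command_queues = {}   # name -> queue of pure string commands
listeners = {}        # name -> queues that want this service's output ("events")

def get_queue(name):
    # Named queue: any module that knows the name gets the same queue.
    return command_queues.setdefault(name, queue.Queue())

def file_service():
    # Sketch only, no error handling: the service is driven entirely by strings.
    cmds = get_queue("FILE")
    while True:
        msg = cmds.get()                      # e.g. "READ>TEXT>config.ini"
        if msg == "EXIT":
            break
        verb, kind, path = msg.split(">", 2)
        if verb == "READ":
            with open(path, "rb") as f:
                data = f.read()
            for q in listeners.get("FILE", []):   # broadcast to whoever registered
                q.put(("FILE", path, data))

threading.Thread(target=file_service, daemon=True).start()
my_events = queue.Queue()
listeners.setdefault("FILE", []).append(my_events)   # register for FILE's output
get_queue("FILE").put("READ>TEXT>config.ini")        # any module can send this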

 

I note that the File IO is non-reentrant; what does this mean if there are multiple "plugins" which have it on their block diagrams? Or are you proposing that each plugin essentially has its own version of that File IO subVI?

 

The services are themselves "plugins". You expand the system and define the system's operation by the "plugins" you create and load for it. This topology is of the "File IO be handled by a central actor" category, so there is only one, and all other modules query it directly or listen for data being emitted by it. It is like your current system without the cohesion problem that you are suffering. Putting a copy in everything is a really bad idea :P

 

I get the impression you looked at the demo source only, probably because all the events were broken due to the VIM. That's a shame really, because you lose all the context and don't get to see each module in action and how they interact.

Edited by ShaunR

Ok, I think I understand. What do you do if a new module someone is developing wants to use a core service in a different way, i.e. wants to save its data differently?

 

I'll refer you to my original comment:

you have a set of services that provide a core functionality and supplemental modules can use them if they want.

 

Every so often, go through the modules that others are creating, see what is useful for reuse, and add it to your core services.


So you're saying if someone comes along with a new module and a new way of saving data, the standard way to interop is for them to implement their own File IO?

 

The workflow for me adding something to your application is to roll my own file IO inside my module?

 

I'm saying let them write it as a service and co-opt it for your reuse libraries/services if it looks interesting and useful :D

If a facility doesn't exist, someone has to write it. Software doesn't spontaneously come into being because you want it. Well. Not unless you are the CEO.

 

So look at my FILE.vi again. It opens a file and sends the contents to whoever requests it. The FILE.vi does not care about the file itself, its structure, or what the bytes mean, but it does require it to be a "normal" file with the usual open and close. The FILE.vi can read a lot of files for most scenarios (config, INI, binary, log files etc.) but it cannot currently read TDMS files, because they need a different procedure to access them and TDMS files aren't required for this demo.

 

Can I add it to the FILE.vi? Sure I can. I can put the code in the FILE.vi and then other modules just use the message FILE>READ>TDMS>filename. Do I want to? Maybe, if I think open/read/close of a TDMS is useful. I could also create a STREAM service that may have a state machine (see the TELEMETRY.vi for a producer-consumer state machine) and allow other module writers to access that via its API (STREAM>WRITE>TDMS>filename etc.). Now I have another service in my application toolkit that I can add to TELEMETRY, FILE, DB, SOUND etc. to make any other applications. Maybe I do both ;)
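
To make the routing idea concrete, a sketch in Python (not the actual FILE.vi; read_tdms() is a hypothetical stand-in): adding TDMS is just one more entry behind the same message interface, and existing callers never notice.

def read_text(path):
    with open(path, "r") as f:
        return f.read()

def read_binary(path):
    with open(path, "rb") as f:
        return f.read()

def read_tdms(path):
    # Hypothetical: would wrap whatever open/read/close procedure TDMS needs.
    raise NotImplementedError

handlers = {("READ", "TEXT"): read_text,
            ("READ", "BIN"):  read_binary,
            ("READ", "TDMS"): read_tdms}     # new message, old callers unaffected

def handle_file_message(msg):
    verb, kind, path = msg.split(">", 2)     # e.g. "READ>TDMS>run42.tdms"
    return handlers[(verb, kind)](path)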

 

You will notice that, either way, the other modules/services only care about the message, not the code that actually does the work or where that code is (it could be on another computer!), and I can partition the software within my application as I see fit without interference from other modules/services. I can also add more APIs, and more commands to a single API, without breaking backward compatibility (within reason).

 

Having said all that, maybe your use case requires a different architecture. There is no one-size-fits-all, no matter how much framework developers would like theirs to be.

Edited by ShaunR

Ok, thanks very much for the clarification. I guess my original question boils down to "how much can we lean on the OS file handling code for handling multiple streaming files?"

 

I acknowledge your point about someone having to write the code. If the OS (Windows) can be trusted to handle multiple open file handles, absorbing multiple streams of data gracefully, then it makes more sense to let people write their own file IO stuff local to their module.

 

There's actually not much difference between completely letting the OS handle it and what I'm currently doing. There's no effort to schedule writes in my current architecture, so it might as well be the same as everyone just writing their own stuff independently.


I can't contribute a lot to the LabVIEW side of this discussion, but I wouldn't try to outsmart the OS in terms of file IO. The OS can delay or cache writes and might implement different schemes depending on the type of disk (that is, it will schedule writes to an SSD differently than to a standard disk). Writing larger chunks is usually better than writing small chunks, and if you know how large your file will be in advance you might get some benefit from setting the file to that size initially, but other than that I'd let the OS do the work.
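
A rough sketch of those two points in Python (the chunk size and the effect of pre-sizing are OS- and filesystem-dependent, so treat the numbers as placeholders, not recommendations):

CHUNK = 4 * 1024 * 1024                      # flush in ~4 MB blocks, not per sample

class ChunkedWriter:
    def __init__(self, path, expected_size=None):
        self.f = open(path, "wb")
        if expected_size:
            self.f.truncate(expected_size)   # hint the final size up front
        self.buf = bytearray()

    def write(self, data):
        self.buf += data
        if len(self.buf) >= CHUNK:
            self.f.write(self.buf)           # one large write instead of many small ones
            self.buf.clear()

    def close(self):
        if self.buf:
            self.f.write(self.buf)
        self.f.truncate()                    # trim any unused pre-allocated space
        self.f.close()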


Ok, thanks very much for the clarification. I guess my original question boils down to "how much can we lean on the OS file handling code for handling multiple streaming files?"

 

I get the feeling we are talking at cross purposes. All file reading and writing must go through the OS (unless you have a special kernel driver) so I don't really know what you are getting at.


That's my point, I guess. Why manage a central repository of file references for streaming files from different modules when the OS can handle the scheduling better than I can anyway? Why not just let the individual modules open and close their own files as they require?


I didn't read through the entire thread, but I thought I could still throw my thoughts in.

 

Generally I'd prefer to write a file API first, service second. I've repeatedly run into problems where services don't quite do what I want (either how they get the data to log or how they do timing or how they shut down or...) but the core API works well. I like the idea that the API is your reuse library and the service is part of a sample project, a drop VI, or similar.

 

As for whether I'd just use the API vs. making a service, I usually end up with a service because I do a lot of RT. Offloading the file I/O to a low-priority loop is the first thing you (should) learn. If timing isn't too important I'll do the file I/O locally (for example, critical error reporting), and this is where starting from an API saves time.
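
As a rough sketch of that split (Python, hypothetical names; on LabVIEW RT the consumer would be a genuinely low-priority loop, here it is just a background thread): the same write_record() API can be called directly where timing doesn't matter, or fed from a queue by a background loop so the time-critical code never blocks on disk.

import queue, threading

def write_record(f, record):
    # The reusable API: all formatting decisions live here.
    f.write(record + "\n")

log_q = queue.Queue()

def logging_loop(path):
    # Consumer loop: drains the queue and does the actual file I/O.
    with open(path, "a") as f:
        while True:
            rec = log_q.get()
            if rec is None:                  # shutdown sentinel
                break
            write_record(f, rec)

threading.Thread(target=logging_loop, args=("data.log",), daemon=True).start()
log_q.put("t=0.001, ai0=1.234")              # acquisition loop only pays for a queue insert
log_q.put(None)                              # stop the logger when done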

 

Finally, having a service for each file vs. multiple files handled by a single QMH/whatever... I'd tend towards 1 file/QMH for any synchronous or semi-synchronous file API, as I'd rather let the OS multiplex than try to pump all my I/O through one pipe. This might be especially true if, for example, you want to log your error report to the C drive but your waveform data to SD/USB cards. I'm guessing there are some OS-level tricks for increasing throughput that I couldn't touch if it was N files per loop.


Generally I'd prefer to write a file API first, service second. I've repeatedly run into problems where services don't quite do what I want (either how they get the data to log or how they do timing or how they shut down or...) but the core API works well.

 

I was trying to decide how I would describe the difference between an API and a Service succinctly and couldn't really come up with anything. API stands for Application Programming Interface, but I tend to use it to describe groupings of individual methods and properties - a collection of useful functions that achieve no specific behaviour in and of themselves - "a PI". Therefore, my distinguishing proposal would be state and behaviour, but applications tend to be stateful and have all sorts of complicated behaviours, so I'm scuppered by the nomenclature there.

Edited by ShaunR

I was trying to decide how I would describe the difference between an API and a Service succinctly and couldn't really come up with anything.

Yeah, that's true too; you could make a background service appear to be an API. I guess the difference to me is that with the service you're more likely to have to be concerned about some of the internal details, like timing or, as in this thread, how it multiplexes. It's probably not a requirement, but maybe it's just that I feel the complexity of a service is higher, so I am less likely to trust that the developer of the service did a good job. For example, with a TCP service, did they decide to launch a bunch of async processes, or do they go through and poll N TCP connections at a specific rate? You do still have to be concerned about some things with a more low-level API/wrapper, but it feels like different ends of a sliding scale, if that makes any sense.


This is getting a little abstract for me. To concretize the discussion a little, @smithd, from what you say I'm visualizing something like:

A wrapper (subVI) around TDMS files (for discussion's sake) which internally looks like a message handler with messages for open/close/write.

I assume it would be a re-entrant VI that you drop on the block diagram of anything that needs file IO and hook up internally.

This is what I infer from your 1 file/qmh statement?


Sorry, I don't think I was describing things properly.

 

All I was saying is, let's say I want to do XYZ with a TDMS file. You could do all the TDMS VI calls directly, but from the other thread and my own experience you end up doing a lot of metadata prep before you're ready to log data, and even then you probably have some specific format you want to use. Rather than directly making some QMH to do all these things, I personally would prefer to wrap that logic into a small library.
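
As a sketch of that kind of small library (Python, with a plain text file standing in for TDMS; the class and its methods are hypothetical): the metadata prep happens once up front, and the calling loop - QMH or otherwise - never deals with the file layout.

class WaveformLog:
    def __init__(self, path, channel_names, sample_rate):
        self.f = open(path, "w")
        # One-off metadata prep, done before any data is logged.
        self.f.write("# channels: %s\n" % ",".join(channel_names))
        self.f.write("# sample_rate: %g\n" % sample_rate)

    def append(self, samples):
        # One line per sample set; the caller never sees the formatting.
        self.f.write(",".join("%g" % s for s in samples) + "\n")

    def close(self):
        self.f.close()

log = WaveformLog("run42.csv", ["ai0", "ai1"], 1000.0)
log.append([1.23, 4.56])
log.close()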

 

 

I've tried to use 'off-the-shelf' processes in the past and have sometimes been unsuccessful because something I wanted wasn't exposed, the communication pipe was annoying to use, I had to use a new stop mechanism separate from the rest of my code, or similar. In these situations I either write a bunch of annoying code to work with the process, or I just go in and edit the thing directly until it does what I want.

 

The 1 file/QMH was really just me agreeing with ned and shaun -- I don't know what optimizations can be done by the OS or by LabVIEW, but I don't want to get in the way of those by trying to make a single QMH manage multiple files.


I think you are probably looking at it slightly awkwardly. You went for a compartmentalised solution according to some best-practice ideology and then found it got awkward for concurrent processes. You want it to be easy to manage and scalable, and the way you did it was exactly that, but the downside was creating bottlenecks for your asynchronous processes.

 

You have the SOUND.vi problem, whereby I needed to be able to play arbitrary asynchronous WAV files simultaneously when a single sound could be tens of seconds long. If SOUND.vi could only process one message at a time, that was still useful, but I wanted a more generic solution. So I made SOUND.vi a service controller: other processes ask it to "PLAY" a particular file. It then launches a sub-process for that file and returns immediately, having no more involvement in the data transfer. How could this work with, say, TDMS?

 

You have the FILE service. You send it a message like FILE>STREAM>TDMS>myfile[.tdms]. The FILE service launches a listener that is registered for "myfile" messages. You tell the DAQ service to launch an acquisition - DAQ>STREAM>AI1>myfile or similar - and that's it! The DAQ pumps out messages with the label "myfile" and the listener(s) consume them. Of course the corollary is that you can use "FILE>STREAM>TEXT", "FILE>STREAM>BIN" etc., even at the same time :yes: , and you still have your FILE>WRITE and FILE>READ, which you don't really have to launch as sub-processes.
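
As a loose sketch of the mechanics (Python, not LabVIEW; the topic dictionary and function names are made up): FILE>STREAM registers a listener for a label, the DAQ side publishes messages under that label, and neither knows anything about the other beyond the label and the message strings.

import queue, threading

topics = {}                                    # label -> subscriber queues

def publish(label, item):
    for q in topics.get(label, []):
        q.put(item)

def stream_to_file(label, path):
    # What the FILE service might do on "STREAM>TEXT>myfile": register a listener
    # for the label and stream everything published under it to disk.
    q = queue.Queue()
    topics.setdefault(label, []).append(q)
    def writer():
        with open(path, "a") as f:
            while True:
                item = q.get()
                if item is None:               # end-of-stream sentinel
                    break
                f.write(item + "\n")
    threading.Thread(target=writer, daemon=True).start()

stream_to_file("myfile", "myfile.log")         # FILE>STREAM>TEXT>myfile
publish("myfile", "t=0.001, ai1=1.234")        # what the DAQ service would emit
publish("myfile", None)                        # close this particular stream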

 

You've started a producer and consumer and connected their messaging pipes. You can do that as many times as you like and let them trundle on in the background. Your other "plugins" just need to send the right messages and listen in on the conversation (also register for myfile).

Edited by ShaunR
