hooovahh Posted May 2, 2018

Crosspost on the dark side. Let's say I have two large files on disk. These files will be merged at the end of a process, but for convenience of reading I don't want to wait until the end of the process to merge them. I could copy the two files, merge the copies, then delete the merge, but that feels like a waste of file I/O, especially when the files are large. I could also work with both files independently and, when I reach the end of one, open the next. But what would be simpler is a linked file, or a virtual file: a file that doesn't really exist and is just a link to the real files. The trick is I don't want this link to point at just one file; I want the linked file to concatenate several:

C:\Temp\1.tdms - 1 GB file
C:\Temp\2.tdms - 1 GB file
C:\Temp\Merged.tdms - linked file that, when opened, has the contents of 1.tdms followed by 2.tdms

Is this possible? I searched around and only found references to mklink, junctions, and other ways of mapping real files to another virtual place on disk. That is part of what I want, but I'd also like to combine the files in a virtual sense. Any thoughts? As I said, I could combine them so Merged.tdms is a real 2 GB file, but making that copy will probably take a while. And I could write code that reads from 1.tdms and then, when it reaches the end, reads from 2.tdms, but that complicates the read functions quite a bit, especially when there is likely a 3.tdms or 4.tdms as well. Has anyone heard of a feature like this?
mje Posted May 2, 2018

It would be news to me if there were a way of doing it. In the past I've wanted to treat blocks of a large file as native file system objects but never found a way of doing it at the operating system level. I figured you're either hanging onto a refnum/handle of the big file and synchronizing I/O operations yourself, or working with a folder holding a collection of files and merging after the fact. Neither was ideal, but I chose the latter since I didn't need real-time read access and it's a whole lot less work.
ShaunR Posted May 2, 2018

I'm not sure I'm understanding this correctly, but you may be able to use memory-mapped files. (I'm going to ignore the fact that you are talking about TDMS, because that may complicate things.) You would create a memory-mapped file twice as big as you need (right, I know: crystal-ball time). Then write file one from the beginning and file two from halfway through. You can read out the data in any order you like, even while it's being written, by just addressing the bytes directly. So, for example, you could read line one from file one and line one from file two (halfway through the map) and show them however you like. Alternatively, write each file into its own memory map and read from multiple maps.
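To make the two-streams-in-one-map idea concrete, here is a minimal sketch using the Python standard-library `mmap` module (Python rather than LabVIEW, purely for brevity; the file name, offsets, and toy sizes are made up for the example, and a real TDMS stream would of course be far larger than this):

```python
import mmap

# Hypothetical sizes: in practice you have to preallocate up front,
# which is the "crystal ball" problem mentioned above.
SIZE_1 = 16  # bytes reserved for stream one
SIZE_2 = 16  # bytes reserved for stream two

# Preallocate a backing file big enough for both streams.
with open("merged.bin", "wb") as f:
    f.truncate(SIZE_1 + SIZE_2)

with open("merged.bin", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the whole file

    # Writer side: stream one starts at offset 0,
    # stream two starts at the halfway point.
    mm[0:7] = b"first.\n"
    mm[SIZE_1:SIZE_1 + 8] = b"second.\n"

    # Reader side: address any byte range directly, in any order,
    # even while writing elsewhere in the map is still in progress.
    line_from_one = mm[0:7]
    line_from_two = mm[SIZE_1:SIZE_1 + 8]
    mm.close()

print(line_from_one, line_from_two)
```

The catch, as noted, is that the total size (or at least a generous upper bound) must be known when the map is created, and the reader has to know where each stream's region starts.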
GregSands Posted May 2, 2018

This probably won't help you now, but you should be using HDF5 files - HDF5 can do exactly this. The H stands for Hierarchical, and it is quite straightforward to write data to multiple files and create a "master" file which transparently links them. That works for writing as well as reading, so you can create the master file at the start and write data to it that will be stored in separate files, or create it after writing the individual files. The HDF5 library handles all of the connection, and it can be as simple as I said, or far more complex if needed.
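For the curious, here is roughly what that master-file trick looks like through the h5py Python bindings (h5py assumed installed; the file and dataset names are hypothetical). The master file stores only external links, and the HDF5 library resolves them transparently on read:

```python
import h5py

# Write two "part" files, as an acquisition would over time.
with h5py.File("part1.h5", "w") as f:
    f.create_dataset("data", data=[1.0, 2.0, 3.0])
with h5py.File("part2.h5", "w") as f:
    f.create_dataset("data", data=[4.0, 5.0, 6.0])

# The master file holds no data itself, only external links pointing
# at datasets inside the part files.
with h5py.File("master.h5", "w") as m:
    m["part1"] = h5py.ExternalLink("part1.h5", "/data")
    m["part2"] = h5py.ExternalLink("part2.h5", "/data")

# Reading through the master file follows the links transparently.
with h5py.File("master.h5", "r") as m:
    print(m["part1"][:].tolist() + m["part2"][:].tolist())  # all six values
```

External links keep the parts as separate datasets under one file; newer HDF5 versions also offer virtual datasets, which go a step further and present the parts as one continuous dataset.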
hooovahh Posted May 3, 2018

Okay, thanks for your input. As mentioned on the dark side, this is probably something that is possible at the OS level, but likely a lot of work in areas I'm not well versed in. I was really just hoping for a tool I hadn't heard of, one that was hard to search for given how similar the terminology is to other kinds of virtual file mapping. Also, I didn't know HDF5 had that feature (something to research for sure), but I think we are a bit too far down the TDMS hole at the moment. Not that we can't change course, but it would be difficult, and I'd need to see some really compelling reasons to invest in it. Since TDMS is handling all we need (other than this, I mean), I don't see that happening soon. I'll either merge the files early and keep track that a partial merge has happened, so that I'm still just reading and parsing one file, or write an abstraction layer that makes reading from multiple files transparent.
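The abstraction-layer route can be surprisingly small. Here is one possible sketch in Python (the class name and API are hypothetical, chosen for illustration; in LabVIEW the same idea would be a set of VIs wrapping an array of file refnums): a read-only, file-like object that presents several files on disk as one continuous byte stream, moving to the next file whenever the current one is exhausted.

```python
import io


class ChainedReader(io.RawIOBase):
    """Read-only file-like object that presents several files on disk
    as one continuous byte stream (hypothetical helper for illustration)."""

    def __init__(self, paths):
        self._files = [open(p, "rb") for p in paths]
        self._index = 0  # which underlying file we are currently reading

    def readable(self):
        return True

    def read(self, size=-1):
        if size < 0:
            # Read everything remaining, across all remaining files.
            return b"".join(f.read() for f in self._files[self._index:])
        chunks = []
        while size > 0 and self._index < len(self._files):
            chunk = self._files[self._index].read(size)
            if not chunk:
                self._index += 1  # current file exhausted; move to the next
                continue
            chunks.append(chunk)
            size -= len(chunk)
        return b"".join(chunks)

    def close(self):
        for f in self._files:
            f.close()
        super().close()


# Demo: two small files read back as one stream.
with open("1.bin", "wb") as f:
    f.write(b"hello ")
with open("2.bin", "wb") as f:
    f.write(b"world")

r = ChainedReader(["1.bin", "2.bin"])
data = r.read()
r.close()
print(data)  # b'hello world'
```

The calling code never needs to know how many files back the stream; adding a 3.tdms or 4.tdms is just one more path in the list.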