
Replace portion of file data



Hay!

I'm currently working on a custom database, and as most database experts know, they are normally very large files. The beauty of databases, therefore, is that you can read and write portions of data without loading the entire file. Well, that is where I have run into a problem.

Given a file, I need to be able to replace a certain amount of data at a given offset with another string of data. But from what I can see, it is impossible to do this without loading the entire file as a string and then using the string operations to replace wherever, then write the entire string back to the file. :( I really need to avoid this since the databases I'm dealing with could be very large (100 or more megabytes :o .) Does anyone have any ideas on how to solve this? Maybe any Win32 functions designed for this? Any help is greatly appreciated!

7J1L1M

Link to comment

If you are getting paid to do this, then you should use a real database. Microsoft and Oracle and Sun/MySQL have solved this problem. They spent billions of dollars doing it.

If you are compelled to do this the hard way (that is, for homework), then I would look more closely at LabVIEW's "Set File Position" primitive.

Good luck,

Jason

Link to comment

jdunham,

Unfortunately it will be the hard way ;) . But I will need to use this same concept for other file formats anyway, I'm just currently up with my head against the wall for this format...

Set File Position? Do you mean "Seek"? LabVIEW doesn't have any functions named "Set File Position".

Certainly if you can read a portion of a file, you can erase part of the file and replace it with something else... Or so it would seem...

If anyone has any more ideas, please write! :thumbup: Thanks!

7J1L1M

Link to comment

Hay Toby,

I don't have LabVIEW 8.x, which would explain the Set File Position :headbang: . Isn't it nearly equivalent to "Seek" in LabVIEW 7.x?

If so, how could it help me replace a portion of a file with another string? If necessary, the operation could probably be performed in two steps: erase part of the file, then insert a string. Still, I have no method for removing part of a file without loading the entire file into LabVIEW and editing it with the string ops, then write it back to a blank file... :thumbdown:

Appreciate the comments, any further ideas would be most helpful! :)

7J1L1M

Link to comment

QUOTE (7J1L1M @ Mar 11 2008, 09:03 PM)


You can easily do this if the string you are replacing is exactly the same size as the string you are writing. If you set the file position to a certain point and then start writing, this will overwrite whatever is already there. You don't really erase data, just replace it. So if the replacement part is the same size as the old part, your job should be done. Since this is string data, and not formatted numeric data, however, it may be unlikely that you will be so lucky to have equal sizes.
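As a rough illustration of the same-size case in a text language (Python here, since LabVIEW code is graphical), a minimal sketch might look like the following. The function name `overwrite_at` is hypothetical; `seek` plays the role of Set File Position / Seek.

```python
import os

def overwrite_at(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes starting at offset, in place.

    Hypothetical helper: the file length never changes, so nothing
    after the replaced region has to be read or moved.
    """
    with open(path, "r+b") as f:           # read/write, no truncation
        size = f.seek(0, os.SEEK_END)      # seek() returns the new position
        if offset < 0 or offset + len(new_bytes) > size:
            raise ValueError("write would fall outside the existing file")
        f.seek(offset)                     # set the file position
        f.write(new_bytes)                 # overwrites what is already there
```

Only the replaced bytes are touched, so the cost does not depend on the size of the file.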

In this case, your best option is to pipeline the process: read the data past the replacement section one block at a time and write each block back at its new position. You can arrange it so that you only load, say, 100,000 bytes at a time, which keeps memory use small no matter how large the file is.
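A sketch of that block-by-block shifting, again in Python and under the assumption that the file is edited in place, could look like this. The name `replace_in_place` and the 100,000-byte default are illustrative only. Note the direction of the copy: when the file shrinks the tail is copied front to back, and when it grows the tail is copied back to front so that unread data is never overwritten.

```python
import os

def replace_in_place(path, offset, old_len, new_bytes, chunk=100_000):
    """Replace old_len bytes at offset with new_bytes, shifting the tail.

    Hypothetical helper; at most `chunk` bytes are held in memory.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if offset < 0 or old_len < 0 or offset + old_len > size:
            raise ValueError("range to replace lies outside the file")
        src = offset + old_len           # where the tail currently starts
        dst = offset + len(new_bytes)    # where the tail must start afterwards
        tail = size - src
        if dst < src:
            # File shrinks: move the tail towards the front, first block first.
            moved = 0
            while moved < tail:
                f.seek(src + moved)
                block = f.read(min(chunk, tail - moved))
                f.seek(dst + moved)
                f.write(block)
                moved += len(block)
            f.truncate(dst + tail)       # drop the leftover bytes at the end
        elif dst > src:
            # File grows: move the tail towards the back, last block first.
            remaining = tail
            while remaining > 0:
                n = min(chunk, remaining)
                remaining -= n
                f.seek(src + remaining)
                block = f.read(n)
                f.seek(dst + remaining)
                f.write(block)
        f.seek(offset)
        f.write(new_bytes)               # finally drop in the replacement
```

A crash partway through leaves the file half-shifted, so this variant trades safety for not needing a second copy of the file.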

If you have full control over the file format used here, you might try adapting some insights used in the NI TDMS format. In this format, all data is always appended to the end of the file. With the exception of defragmenting the file, you never ever erase existing parts of the file. You just use clever indexing to invalidate them and append valid parts to the end. This might be a lot of work, but maybe it'll trigger some idea in your head. The end result is that you never have to load but a small index portion at the beginning of your file in order to dump data at the end.
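To make the append-and-invalidate idea concrete, here is a deliberately tiny Python sketch. It is not the TDMS layout: the class `AppendOnlyStore`, its key-based index, and keeping the index only in memory are all assumptions made for illustration. A real format would also have to store the index in the file.

```python
import os

class AppendOnlyStore:
    """Toy append-only store: updates never touch existing bytes."""

    def __init__(self, path):
        self.path = path
        self.index = {}                  # key -> (offset, length) of live data
        open(path, "ab").close()         # create the file if it is missing

    def put(self, key, data):
        offset = os.path.getsize(self.path)   # new data always goes at the end
        with open(self.path, "ab") as f:
            f.write(data)
        # Repointing the index invalidates any older version of this key;
        # the old bytes stay in the file until a defragmenting pass.
        self.index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

The file only ever grows, which works against the smallest-possible-size goal unless a compaction step is run from time to time.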

Link to comment

QUOTE (7J1L1M @ Mar 11 2008, 09:03 PM)

I don't have LabVIEW 8.x, which would explain the Set File Position :headbang: . Isn't it nearly equivalent to "Seek" in LabVIEW 7.x?

In 7.x, you can use Write File and wire pos mode to 0 and pos offset to the start position for your write. You can also call Seek first, set the start location and use Write File with pos mode set to 2. That will work if you are actually replacing x bytes in your file with x new bytes. If you're changing the number of bytes in the file, you'll have to read and rewrite the rest of the file to make sure you don't overwrite good data or leave remnants of the old data behind.

Link to comment

Thanks for all the suggestions! I'll see what I can do with the "append to end of file" technique, although I'm afraid that will be extremely hectic and complicated for me :( . One of my main goals with this format was to attain the smallest size possible (even at gigantic sizes! :laugh: ) for easy transfer, but still with the speed of commercial databases. I may have found something that will do what I've been wanting to do:

Windows Kernel32.dll has some functions called file mapping like "CreateFileMapping", "Memmove_a", and other similar functions. It looks like it may be possible to use these to actually "move" parts of the file to "nothing" and thereby erase it. I believe there is also an insert function. Does anybody have any experience with file mapping? If so, it may be possible to use this idea. Here's a link to a situation similar to mine that uses this method:

http://forum.soft32.com/pda/delete-part-bi...opict44773.html

If anyone has an idea for this, it would be most welcome!

7J1L1M

Link to comment
  • 2 weeks later...

QUOTE (7J1L1M @ Mar 11 2008, 10:03 PM)

In LabVIEW < 8.0 this function was indeed called Seek but very seldom used, since the Read and Write File functions had an offset input that defaulted to the current offset.

Rolf Kalbermatter

QUOTE (7J1L1M @ Mar 12 2008, 01:30 PM)


File mapping simply maps a view of the file into memory. It is meant for modifying files quickly, but there is no way to map a part of the file to "nothing" and make that part magically disappear. You still have to move the remainder of the file view to its new position before closing the file mapping to make that part go away.
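The point can be shown with Python's `mmap` module, which on Windows is built on the same file-mapping facility. This is only a sketch: the function name `delete_range_mapped` is hypothetical, and it maps the whole file for simplicity. Even with a mapping, the tail has to be moved explicitly and the file shortened afterwards.

```python
import mmap
import os

def delete_range_mapped(path, offset, length):
    """Delete `length` bytes at `offset` using a memory-mapped view.

    Hypothetical helper: the mapping does not remove anything by itself.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        if offset < 0 or length < 0 or offset + length > size:
            raise ValueError("range to delete lies outside the file")
        if length == 0:
            return
        tail = size - (offset + length)
        if tail:
            with mmap.mmap(f.fileno(), 0) as view:        # map the entire file
                view.move(offset, offset + length, tail)  # slide the tail down
                view.flush()
        # The view is closed here; only now is the file actually shortened.
        f.truncate(size - length)
```

The operating system pages data in and out as the view is touched, so the whole file is not read into memory at once, but the amount of data moved is the same as with plain reads and writes.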

Writing a VI that goes to the offset where the modification starts and then, in a loop, reads in chunks of data and writes them back to the new desired offset will not be that difficult. If the part to insert is bigger than the part it replaces, you obviously have to start the chunked read and write-back at the end of the file and work backwards, unless you write the changes into a new temporary file first, then delete the original and rename the temporary file to its name. Of course, this too would be done with a loop reading in chunks at a time, to avoid having to read a 100 MB file into memory.

For safety reasons I would recommend using the temporary file anyhow, as this operation will take some time (seconds) and there is always a chance that a crash in the middle of the data copying might leave the file in a completely corrupted state.
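The temporary-file variant might be sketched in Python as follows. The name `replace_via_temp` and the chunk size are assumptions for illustration; the temporary file is created in the same directory so that the final rename stays on one volume.

```python
import os
import tempfile

def replace_via_temp(path, offset, old_len, new_bytes, chunk=100_000):
    """Replace old_len bytes at offset by building a new file, then swapping.

    Hypothetical helper: the original is untouched until the final rename.
    """
    size = os.path.getsize(path)
    if offset < 0 or old_len < 0 or offset + old_len > size:
        raise ValueError("range to replace lies outside the file")
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            remaining = offset
            while remaining > 0:             # copy everything before the edit
                block = src.read(min(chunk, remaining))
                if not block:
                    break
                dst.write(block)
                remaining -= len(block)
            dst.write(new_bytes)             # the replacement itself
            src.seek(offset + old_len)       # skip the bytes being replaced
            while True:                      # copy everything after the edit
                block = src.read(chunk)
                if not block:
                    break
                dst.write(block)
            dst.flush()
            os.fsync(dst.fileno())           # push the data to disk first
        os.replace(tmp_path, path)           # swap in the finished file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If anything fails before `os.replace`, the original file is still intact and only the temporary file is discarded, at the cost of needing free disk space for a second copy.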

Rolf Kalbermatter

Link to comment
