Reds Posted September 4, 2023

Let's say you have a really big binary file, a file so big that it won't fit into your PC's RAM. Now let's say you want to delete the *first* 100 kB of that file and leave the rest of the file alone. How would you do that? Can it be done quickly? Can it be done without creating a whole new file?
mcduff Posted September 4, 2023

Read and copy the file in chunks; there's no need to open the whole file at once. To increase speed, write in multiples of the disk sector size.
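In C, a minimal sketch of that chunked approach might look like the following (POSIX calls for brevity; the 100 kB cut, the chunk size, and the minimal error handling are illustrative, and on Windows the same idea maps to ReadFile/WriteFile plus SetEndOfFile). Because the read cursor always stays ahead of the write cursor, the copy can be done in place, and the file is truncated at the end:

/* Collapse a file "in place": copy everything after CUT bytes down to
   offset 0 in fixed-size chunks, then truncate the tail. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CUT   (100 * 1024)        /* bytes to drop from the front        */
#define CHUNK (4 * 1024 * 1024)   /* keep this a multiple of sector size */

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char   *buf = malloc(CHUNK);
    off_t   rd  = CUT;            /* read cursor  */
    off_t   wr  = 0;              /* write cursor */
    ssize_t n;

    while ((n = pread(fd, buf, CHUNK, rd)) > 0) {
        if (pwrite(fd, buf, (size_t)n, wr) != n) { perror("pwrite"); return 1; }
        rd += n;
        wr += n;
    }

    if (ftruncate(fd, wr) < 0) { perror("ftruncate"); return 1; }

    free(buf);
    close(fd);
    return 0;
}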
X___ Posted September 4, 2023

Here is ChatGPT's answer:

You can delete part of a file without loading it entirely into RAM by using the dd command. This command can be used to move the contents of the file up by a specified number of bytes, effectively deleting the specified number of bytes from the beginning of the file [1]. Here is an example script that you can use to delete a specified number of bytes from the beginning of a file:

#!/bin/bash
size=$(stat -c %s "$2")
dd bs=$1 if="$2" skip=1 seek=0 conv=notrunc of="$2"
dd bs=$((size - $1)) if="$2" skip=1 seek=1 count=0 of="$2"

You can call this script as ./truncstart.sh 2 file.dat, where 2 is the number of bytes to delete from the beginning of file.dat [1]. However, please note that this solution is not robust in case of an unplanned outage, which could occur part-way through dd's processing, in which case the file would be corrupted [1]. Is there anything else you would like to know? 😊

Learn more: 1. unix.stackexchange.com  2. superuser.com  3. digitalcitizen.life
ShaunR Posted September 5, 2023

8 hours ago, X___ said: Here is ChatGPT's answer: [...]

Linux only huh? No mention of fallocate? Why do people keep posting junk from ChatGPT? At this point I consider it spam.
dadreamer Posted September 5, 2023

I would suggest Memory-Mapped Files, but I'm a bit unsure whether ready-made tools exist for such a task. There's @Rolf Kalbermatter's adaptation: https://forums.ni.com/t5/LabVIEW/Problem-Creating-File-Mapping-Object-in-Memory-Mapped-FIles/m-p/3753032#M1056761 but it seems to need some tweaks to work with common files instead of file mapping objects. Not that hard to do, though. A quick-n-dirty sample (reading 10 bytes only). Yes, I know I should use CreateFileA instead of Open/Create/Replace VI + FRefNumToFD; I was just lazy and short on time.
ShaunR Posted September 5, 2023

15 minutes ago, dadreamer said: I would suggest Memory-Mapped Files [...]

There is a limit to how much you can map into memory. BTW, here is a LabVIEW mmap wrapper for working with files on Windows.
dadreamer Posted September 5, 2023

28 minutes ago, ShaunR said: There is a limit to how much you can map into memory.

Not an issue for "100kB" views, I think. The files themselves may be big enough; a 7.40 GB one opened fine (just checked).
Reds Posted September 5, 2023

Thanks for the ideas, fellas; I'll report back on my progress. I guess I was hoping for some Win32 API that could tweak the NTFS tables to change the starting sector of a file (but I guess that would be too easy).
X___ Posted September 5, 2023

14 hours ago, ShaunR said: Linux only huh? No mention of fallocate? Why do people keep posting junk from ChatGPT? At this point I consider it spam.

Well, isn't Linux part of Windows nowadays?
ShaunR Posted September 6, 2023

On 9/5/2023 at 11:19 AM, dadreamer said: Not an issue for "100kB" views, I think. [...]

A 100kB view will not help you truncate from the front. You can use it to copy chunks like mcduff suggested, but

On 9/4/2023 at 7:41 PM, Reds said: A file so big that it won't fit into your PC RAM

The issue with what the OP is asking is getting the OS to recognise a different start of the file. Truncating from the end is easy (just tell the file system the length has changed); truncating from the front is not, unless you have specific file system operations. On Windows you would have to use Sparse Files to achieve the same as fallocate.
dadreamer Posted September 6, 2023

7 minutes ago, ShaunR said: You can use it to copy chunks like mcduff suggested

That is what I was thinking of, just that with Memory-Mapped Files it should be way more productive than with normal file operations. There's no need to load the entire file into RAM: I have a machine with 8 GB of RAM and 8 GB files are mmap'ed just fine. So, the general sequence is: open the file (with CreateFileA or as shown above) -> map it into memory -> move the data in chunks with read-write operations -> unmap the file -> SetFilePointer(Ex) -> SetEndOfFile -> close the file.
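In raw Win32 C that sequence would look roughly like the sketch below. This is a sketch only: the 100 kB cut, the chunk size, and the minimal error handling are illustrative, and a 64-bit build is assumed so the whole view fits in the address space (address space, not RAM):

/* Map the file, slide everything after CUT down to offset 0, then shrink. */
#include <windows.h>
#include <string.h>

#define CUT   (100 * 1024)         /* bytes to drop from the front  */
#define CHUNK (64 * 1024 * 1024)   /* keeps the working set bounded */

int collapse_front(const char *path)
{
    HANDLE f = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (f == INVALID_HANDLE_VALUE) return -1;

    LARGE_INTEGER size;
    GetFileSizeEx(f, &size);

    HANDLE map  = CreateFileMappingA(f, NULL, PAGE_READWRITE, 0, 0, NULL);
    BYTE  *view = (BYTE *)MapViewOfFile(map, FILE_MAP_WRITE, 0, 0, 0);

    /* Move everything after CUT down to offset 0, one chunk at a time. */
    LONGLONG remaining = size.QuadPart - CUT;
    for (LONGLONG off = 0; off < remaining; off += CHUNK) {
        SIZE_T n = (SIZE_T)((remaining - off) < CHUNK ? (remaining - off) : CHUNK);
        memmove(view + off, view + off + CUT, n);
    }

    FlushViewOfFile(view, 0);
    UnmapViewOfFile(view);
    CloseHandle(map);

    /* Shrink the file by CUT bytes: SetFilePointerEx + SetEndOfFile. */
    LARGE_INTEGER newEnd;
    newEnd.QuadPart = size.QuadPart - CUT;
    SetFilePointerEx(f, newEnd, NULL, FILE_BEGIN);
    SetEndOfFile(f);
    CloseHandle(f);
    return 0;
}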
ShaunR Posted September 6, 2023

1 hour ago, dadreamer said: That is what I was thinking of [...]

Indeed. However, hole punching is much, much faster. If you are talking terabytes, it's really the only way. Set the file to be Sparse. Write 100k of zeros to the beginning. Job done (sort of).
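For reference, "set sparse + zero the front" boils down to two DeviceIoControl calls. A minimal sketch, assuming NTFS and a 100 kB header (the constants are illustrative); note that this deallocates the first 100 kB but does not shift anything, so the file keeps its original size with zeros at the front:

/* Mark the file sparse, then punch a hole over the first 100 kB. */
#include <windows.h>
#include <winioctl.h>

int punch_front_hole(const char *path)
{
    HANDLE f = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (f == INVALID_HANDLE_VALUE) return -1;

    DWORD bytes = 0;

    /* 1. Flag the file as sparse so zeroed ranges are deallocated. */
    if (!DeviceIoControl(f, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &bytes, NULL)) {
        CloseHandle(f);
        return -1;
    }

    /* 2. Zero (and deallocate) the range [0, 100 kB). */
    FILE_ZERO_DATA_INFORMATION zero;
    zero.FileOffset.QuadPart      = 0;
    zero.BeyondFinalZero.QuadPart = 100 * 1024;
    if (!DeviceIoControl(f, FSCTL_SET_ZERO_DATA, &zero, sizeof zero,
                         NULL, 0, &bytes, NULL)) {
        CloseHandle(f);
        return -1;
    }

    CloseHandle(f);
    return 0;
}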
Reds Posted September 6, 2023

9 hours ago, ShaunR said: Indeed. However, hole punching is much, much faster. [...]

Yes, we are indeed talking terabytes. Reading the original file and writing a new one will take many minutes, and it will also require the storage medium to have terabytes of free space available to perform the operation; maybe even a whole separate partition would need to be set aside as free space. "Copy only the parts you want to save" is certainly the obvious solution, but it's not a good one for really big files. Thanks for the Microsoft link to Sparse files. I'll dig into that and learn more.
ShaunR Posted September 7, 2023

9 hours ago, Reds said: Thanks for the Microsoft link to Sparse files. I'll dig into that and learn more.

You can play with fsutil, but Windows (a.k.a. NTFS/ReFS) doesn't have "FALLOC_FL_COLLAPSE_RANGE" like fallocate does (which helps with programs that aren't sparse-aware).
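For contrast, this is what the missing operation looks like on Linux, where FALLOC_FL_COLLAPSE_RANGE removes a block-aligned range and shifts the rest of the file down, changing the file's size and offsets in one metadata operation. A sketch only, assuming ext4 or XFS and a 4 KiB block size (100 kB happens to be exactly 25 such blocks):

/* Drop the first 100 kB of a file in place via collapse-range. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Offset and length must be multiples of the filesystem block size,
       otherwise the call fails with EINVAL. */
    if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 100 * 1024) < 0) {
        perror("fallocate");
        return 1;
    }

    close(fd);
    return 0;
}

From the shell the equivalent should be fallocate --collapse-range --offset 0 --length 102400 file.dat (util-linux); the point, as above, is that NTFS/ReFS has no counterpart.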
Reds Posted September 7, 2023

Yeah, I dug into the Microsoft docs on sparse files, and I don't think that technology is going to solve my problem after all. Cool stuff, and good to know, but it doesn't seem like it's going to solve my immediate pain. I guess what's really needed is a way to modify the NTFS Master File Table (MFT) to change the starting offset of a given file, but I didn't actually see any Win32 APIs that could do that. I'm sure it must be possible with some bit banging, but I'd probably be getting in way over my head if I tried to modify the MFT using a method that was not Microsoft-endorsed.
GregSands Posted September 7, 2023

I'm guessing there must be more to your question, but based on your specs, I'd be asking whether it was worth spending time and effort deleting a relatively tiny part of a file. 100 kB out of tens of GB? I'd just leave it there and work around it!
ShaunR Posted September 8, 2023

9 hours ago, GregSands said: [...] I'd just leave it there and work around it!

It's a common requirement for video editing.
dadreamer Posted September 8, 2023

A technically related question: Insert bytes into middle of a file (in windows filesystem) without reading entire file (using File Allocation Table)? (Or closer, but not that informative.) The takeaway is that it's theoretically possible, but so low-level and hacky that it's easy to mess something up and render the whole system inoperable. If this doesn't stop you, then you may try contacting Joakim Schicht, as he has made a bunch of NTFS tools, incl. PowerMft for low-level modifications, and maybe he will give you some tips on how to proceed (or give it up and switch to traditional ways/workarounds).
ShaunR Posted September 8, 2023

Well, what is your immediate pain? Can you elaborate?

Here is an existing file with the first 0x40000000 (decimal 1073741824) bytes nulled. You can see it only has about 600 MB on disk. If I query it, I see that the data starts at 0x40000000. Now I can do a seek to that location and read the ~600 MB. However, I'm guessing you have further restrictions.
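The query step is presumably FSCTL_QUERY_ALLOCATED_RANGES (the post doesn't say which tool was used, so treat that as an assumption). A minimal sketch that finds where the real data of a sparse file starts and seeks past the hole; the file name is a placeholder, and a real implementation would loop while DeviceIoControl reports ERROR_MORE_DATA:

/* Ask NTFS for the allocated (non-hole) ranges of a sparse file. */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    HANDLE f = CreateFileA("big.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (f == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(f, &size);

    FILE_ALLOCATED_RANGE_BUFFER query;          /* range to search: whole file */
    query.FileOffset.QuadPart = 0;
    query.Length = size;

    FILE_ALLOCATED_RANGE_BUFFER ranges[16];     /* first few allocated ranges  */
    DWORD bytes = 0;

    DeviceIoControl(f, FSCTL_QUERY_ALLOCATED_RANGES,
                    &query, sizeof query, ranges, sizeof ranges, &bytes, NULL);

    if (bytes >= sizeof ranges[0]) {
        printf("data starts at 0x%llx, length 0x%llx\n",
               (unsigned long long)ranges[0].FileOffset.QuadPart,
               (unsigned long long)ranges[0].Length.QuadPart);

        /* Seek past the hole before reading. */
        SetFilePointerEx(f, ranges[0].FileOffset, NULL, FILE_BEGIN);
    }

    CloseHandle(f);
    return 0;
}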
Reds Posted September 8, 2023

Quote: Well, what is your immediate pain? Can you elaborate?

The jumbo file is recorded with a bunch of header data starting at file offset zero. This header data is not actually useful, and it actually causes a third-party analysis application to think that the recorded data is corrupt. If I can manage to delete only the header data at the beginning of the file, then the third-party analysis application can open and analyze the file without throwing any errors.
Reds Posted September 8, 2023

21 hours ago, GregSands said: [...] I'd just leave it there and work around it!

Yeah, I wish that were possible. The problem is that a third-party analysis application can't understand the first 100 kB of the file, and so that software incorrectly concludes that the entire remainder of the file must be corrupt.
Dan Bookwalter N8DCJ Posted September 8, 2023

I was looking for something else and ran across this thread. "This header data is not actually useful" ... my question is, why the header then?

Dan
ShaunR Posted September 9, 2023

10 hours ago, Reds said: If I can manage to delete only the header data at the beginning of the file, then the third-party analysis application can open and analyze the file without throwing any errors.

Indeed.

On 9/7/2023 at 9:01 AM, ShaunR said: but Windows (a.k.a. NTFS/ReFS) doesn't have "FALLOC_FL_COLLAPSE_RANGE"

Which is what you need.
Neil Pate Posted September 9, 2023

@ShaunR and @dadreamer (and @Rolf Kalbermatter), how do you know so much about low-level Windows stuff? Please never leave our community, you are not replaceable!
ShaunR Posted September 10, 2023

18 hours ago, Neil Pate said: how do you know so much about low-level Windows stuff? Please never leave our community, you are not replaceable!

Oh, I am easily replaceable. The other two know how things work in a "white-box", "under-the-hood" manner. I know how stuff works in a "black-box" manner, after decades of finding workarounds and sheer bloody-mindedness.