JackDunaway Posted March 14, 2013
How can I exclude a variable-width, inner part from a regex match? For instance, given the three following inputs:
The quick brown fox jumped over the lazy dog
The quick brown fox jumped over the sleepy dog
The quick brown fox jumped over the hotdog
I want to match the following:
The quick brown fox jumped over the dog
I have investigated negative lookarounds, but since these are zero-width assertions, I can't easily figure out how to include additional regex directives on *both sides* of the lookaround. Is this problem relegated to the use of Search and Replace String configured for regex matching, or can this be achieved with a simple regex match?
Here's a screenshot of one of my naïve attempts using RegexPal; a successful attempt would also show "dog" highlighted as part of the match, but excluding the dog modifier. At least it doesn't match 'frog' :-)
Darin Posted March 14, 2013
In Perl you can get funky; here I would S&R or use capture groups on both sides and concatenate. Actually I would probably do that in Perl as well.
GregSands Posted March 14, 2013
Once again, just as I think of a possible solution (capturing "dog" separately and concatenating), Darin jumps in first with the answer.
JackDunaway Posted March 14, 2013
OK; so, the consensus is that there does not exist a single regex to solve this? (I was really hoping to learn of a solution that could be achieved purely by the regex engine.)
The best name I can come up with for what I'm trying to do is submatch extraction, where the whole match depends on matching directives before and after the submatch, yet excludes the submatch like this:
mje Posted March 14, 2013
Yeah, S&R will be way faster. But you asked...
Don't forget the * and + operators are greedy: they will match as long a string as possible. So a simple "The quick brown fox jumped over the (.*)dog" will do the trick. Substitute + for * if you want to require at least 1 character, or use the {m,n} syntax if you have other length restrictions.
Obligatory:
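For anyone following along outside LabVIEW, here is a minimal Python sketch of the greedy behaviour described above. It assumes Python's `re` module behaves like the PCRE engine for these simple patterns; the input containing two "dog"s and the names `PREFIX`, `greedy`, `lazy` and `at_least_one` are made up for illustration and are not from the thread.

```python
import re

PREFIX = "The quick brown fox jumped over the "

# Hypothetical input with two "dog"s, chosen to make greediness visible.
text = PREFIX + "lazy dog and the hotdog"

# Greedy: (.*) grabs as much as it can while still leaving a final "dog".
greedy = re.search(PREFIX + r"(.*)dog", text)

# Lazy: (.*?) stops at the first "dog" it can reach.
lazy = re.search(PREFIX + r"(.*?)dog", text)

# (.+) requires at least one character before "dog", so an input with
# no modifier at all does not match.
at_least_one = re.search(PREFIX + r"(.+)dog", PREFIX + "dog")

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
print(at_least_one)
```

With only one "dog" in the input, as in the three original examples, the greedy and lazy forms capture the same submatch.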
JackDunaway Posted March 14, 2013
"...a simple "The quick brown fox jumped over the (.*)dog" will do the trick."
Well, simply finding a submatch is the easy part ;-) What I'm really interested in doing is returning the original input string as the match, minus the submatch. Figure this out, and then we can fly around on vines saving days :-)
Darin Posted March 14, 2013
Why does S&R not fulfill your desire? The regex engine deals in offsets and lengths in its internal state machine. Implementing dropped characters would really require a fundamental change to this representation. I am not sure what it would do to its speed (lack thereof) but my guess is that it will not speed up.
There seems to be a reason why regular expressions are low-level tools used in higher-level languages. In Perl you can sprinkle some other code inside, but it really makes things hard to read. Most people do not realize that you can comment inside regexes, and even fewer ever want to deal with a regex which requires commenting.
JackDunaway Posted March 14, 2013
"Why does S&R not fulfill your desire?"
It would be helpful to have access to the text that was replaced. It is easy enough to create a re-use VI that does this; I'm just curious if the ability already exists.
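A rough Python analogue of such a re-use VI, offered only as a sketch: it performs the search-and-replace and also hands back whatever text was replaced. The function name `replace_and_collect` and its signature are hypothetical; the lookaround pattern in the usage lines is the kind of pattern discussed in this thread, and the replacement is treated as a literal string.

```python
import re

def replace_and_collect(pattern, replacement, text):
    """Replace every match of `pattern` in `text` with the literal
    `replacement`, returning (new_text, list_of_replaced_substrings)."""
    removed = []

    def _swap(match):
        # Remember what is about to be dropped, then substitute.
        removed.append(match.group(0))
        return replacement

    new_text = re.sub(pattern, _swap, text)
    return new_text, removed


# Usage: strip the dog modifier and keep a record of it.
pattern = r"(?<=The quick brown fox jumped over the ).*(?=dog)"
result, dropped = replace_and_collect(
    pattern, "", "The quick brown fox jumped over the lazy dog"
)
print(result)
print(dropped)
```

Passing a function to `re.sub` is what lets a single pass both rewrite the string and record the removed pieces.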
mje Posted March 15, 2013
JackDunaway said: "What I'm really interested in doing is returning the original input string as the match, *minus* the submatch. Figure this out, and then we can fly around on vines saving days :-)"
Oh, I misread. Yes, the look ahead/behind are what you want.
(?<=The quick brown fox jumped over the ).*(?=dog)
Using the S&R in regex mode will get you the string you want, but the match primitive won't, since the whole point of the match is not to create new strings but only to return substrings. You could still use the match though if you concatenate the before/after substrings.
Ok, I give up. My phone completely screwed that up:
(?<=The quick brown fox jumped over the ).*(?=dog)
If that doesn't work, I give up.
JackDunaway Posted March 15, 2013
"(?<=The quick brown fox jumped over the ).*(?=dog) If that doesn't work, I give up."
That got me excited... but it's not quite right. The whole match is still just the submatch, because the lookarounds are zero-width. Virtual +50 bounty for the regex that makes the green light come on in the following test harness:
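The zero-width point can be checked quickly in Python; this is only a sketch and assumes Python's `re` treats these lookarounds the same way the LabVIEW match primitive does. The names `LOOKAROUND`, `text` and `m` are illustrative. The whole match comes back as one contiguous (start, end) span of the input, which is why it cannot skip over the modifier.

```python
import re

LOOKAROUND = re.compile(r"(?<=The quick brown fox jumped over the ).*(?=dog)")

text = "The quick brown fox jumped over the lazy dog"
m = LOOKAROUND.search(text)

# The lookbehind and lookahead must succeed but consume nothing, so the
# whole match is only what .* consumed: the modifier itself.
print(repr(m.group(0)))

# A match is reported as a single contiguous slice of the input.
start, end = m.span()
print(text[start:end] == m.group(0))
```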
mje Posted March 15, 2013
By giving up I meant giving up on fixing that post, by the way.
Anyways, you're not going to be able to do it with the match primitive alone. It won't make a new string for you, which is what you need if you want to get the actual "The quick brown fox jumped over the dog" out of the match. The S&R would do it, but then you won't get the match.
If you add some extra logic though, you can do it. I hope I got your examples right; my LabVIEW stopped accepting snippets for some reason, so I rolled this one from scratch. Other case is empty by the way, in case the snippets I produce are as defective as my ability to read them.
Basically you need to construct the final string yourself.
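Since the snippet image does not survive in text form, here is a hedged Python sketch of the same idea, not a transcription of the LabVIEW diagram: run the lookaround match, then build the final string from the text before and after the match. The function name `drop_modifier` is hypothetical, and returning `None` on no match is an assumption standing in for the empty "other case".

```python
import re

LOOKAROUND = re.compile(r"(?<=The quick brown fox jumped over the ).*(?=dog)")

def drop_modifier(text):
    """Return `text` with the matched modifier cut out, or None if the
    pattern does not match at all."""
    m = LOOKAROUND.search(text)
    if m is None:
        return None
    before = text[:m.start()]   # everything up to the submatch
    after = text[m.end():]      # everything following the submatch
    return before + after


for s in (
    "The quick brown fox jumped over the lazy dog",
    "The quick brown fox jumped over the sleepy dog",
    "The quick brown fox jumped over the hotdog",
    "The quick brown fox jumped over the frog",
):
    print(drop_modifier(s))
```

The match still yields the removed text via `m.group(0)`, so this approach gives both the submatch and the reconstructed string.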
JackDunaway Posted March 15, 2013
"Anyways, you're not going to be able to do it with the match primitive alone. It won't make a new string for you..."
Well, it's kinda not a new string... it's just a noncontiguous substring. (Can a substring be defined as 'noncontiguous'? Perhaps not, and what I desire is impossible.)
"I hope I got your examples right, my LabVIEW stopped accepting snippets for some reason, so I rolled this one from scratch."
Wow, thanks for going out of your way to recreate! Sorry snippets broke for you.
That is basically the way I'm solving the problem right now: with extra syntax. The prime motivation for finding a *purely* regex solution is to generalize this problem -- consider wanting to remove adjectives from both nouns:
The fox jumped over the dog
This general solution more closely matches my problem domain. (This thread presents the simplest form of the problem, since I can't even figure that out, or whether the desired solution is even possible!)
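One way the generalized case might be handled with a little extra logic, sketched in Python under the assumption that every piece to keep is wrapped in a capture group and every piece to drop is left outside the groups. The pattern and the helper name `keep_groups` are illustrative, not a solution anyone in the thread proposed in this exact form.

```python
import re

# Kept pieces are captured; the adjectives fall between the groups.
PATTERN = re.compile(r"(The ).*?(fox jumped over the ).*(dog)")

def keep_groups(pattern, text):
    """Concatenate all capture groups of the first match, or return
    None when the pattern does not match."""
    m = pattern.search(text)
    if m is None:
        return None
    # Every group here is mandatory, so none of them can be None.
    return "".join(m.groups())


print(keep_groups(PATTERN, "The quick brown fox jumped over the lazy dog"))
print(keep_groups(PATTERN, "The quick brown fox jumped over the frog"))
```

This is still "construct the final string yourself", but the joining step no longer depends on how many pieces the pattern drops.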
mje Posted March 15, 2013
Hah, no worries. I love these types of problems. Pure logic.
Non-contiguous strings exist, just not in LabVIEW.
I figured you had a good reason to use a regex because as it stood a regex was not the best way to do it: scanning would be faster. You have an interesting problem though in that you want to replace and match at the same time, and apparently globally.
ShaunR Posted March 15, 2013
You are expecting that "Whole Match" really means Whole Match except those bits I don't want?
LV Help: "whole match contains all the characters that match the expression entered in regular expression. Any substring matches the function finds appear in the submatch outputs."
To match individual components you have to create capture groups. The ?: syntax means that you exclude that component from the list of capture groups (this isn't a LV peculiarity, it's how all regex parsers work).
So whilst
(The quick brown fox jumped over the\s*)([a-zA-Z0-9]*)(\s*dog)
will give you three terminal outputs with
1. The quick brown fox jumped over the
2. lazy
3. dog
(The quick brown fox jumped over the\s*)(?:[a-zA-Z0-9]*)(\s*dog)
will give you only two terminals with
1. The quick brown fox jumped over the
2. dog
Similarly,
The quick brown fox jumped over the\s*([a-zA-Z0-9]*)\s*dog
will give you only one terminal with
1. lazy
In all cases the Whole Match will give you the whole string if all capture groups match and bugger all if it doesn't.
Edited March 15, 2013 by ShaunR
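The group counts above can be reproduced in Python as a sanity check. This sketch assumes the `s*` in the patterns was originally `\s*` (the backslashes appear to have been lost in the post) and that Python's `re` numbers groups the same way as the LabVIEW function; the variable names are made up.

```python
import re

text = "The quick brown fox jumped over the lazy dog"

# Three capturing groups: prefix, modifier, suffix.
three = re.search(
    r"(The quick brown fox jumped over the\s*)([a-zA-Z0-9]*)(\s*dog)", text
)
# (?:...) still has to match, but it is not reported as a group.
two = re.search(
    r"(The quick brown fox jumped over the\s*)(?:[a-zA-Z0-9]*)(\s*dog)", text
)
# Only the modifier is captured.
one = re.search(
    r"The quick brown fox jumped over the\s*([a-zA-Z0-9]*)\s*dog", text
)

print(three.groups())
print(two.groups())
print(one.groups())
# The whole match is the same full string in all three cases.
print(three.group(0) == two.group(0) == one.group(0) == text)
```

Note that the captured prefix keeps its trailing space and the captured suffix keeps its leading space, because `\s*` sits inside those groups.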
Phillip Brooks Posted March 15, 2013
So I've tried to load every snippet in this thread into LV2012 and all I get is a picture on the BD. I tested LV by grabbing a couple from the dark side and they work fine. I created a few of my own as well and could load them back in.
Could there be a filter on the LAVAG server that is stripping out the metadata? I'll share my SuperSecret toggle snippet here and see if I can load it back into LV.
EDIT: I think something is happening on the server. My image file on disk was ~15k in size and when downloaded is only 5k. Hmmm....
Edited March 15, 2013 by Phillip Brooks
ShaunR Posted March 15, 2013
Yup. I checked my snippet after Hoover said (on another thread) and it was fine. I've now uploaded the same snippet to http://postimage.org/image/5p8stekp7/ and it works fine when downloaded.
Definitely something going on with lavag.org. Maybe it's now being optimised/compressed?
Daklu Posted March 20, 2013
If you have to do regular expressions very often and want to learn them, RegexBuddy is your friend.