JackDunaway Posted March 22, 2012

I'm trying to use part of the input text inside a regular expression without capturing that text in the match, but I can't figure out how to construct the regex. Below is a snippet that shows what I'm trying to do and the regex I've tried.

Basically, the string is composed of sections that start with "foo", and there is no terminal string that denotes the end of a section. In other words, you know a section has ended when you run into the next "foo" or hit the end of the string. I'm trying to divide this string into an array of sections. (Note: you can copy the input below into http://regexpal.com/ rather than firing up LabVIEW.)

My best-shot regex:

(foo[\s\S]*?)(?:foo)?

An example input:

foo text foo some more text foo some lorem ipsum text foo no more

And the snippet: [LabVIEW snippet image]

Any ideas? Thanks in advance!
Darin Posted March 22, 2012

I cannot test this, but I would use a positive look-ahead instead of the capture group. Try this:

(foo[\s\S]*?)(?=foo|\z)
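For readers without LabVIEW at hand, here is a minimal sketch of the look-ahead idea in Python. It is an illustration under assumptions, not the LabVIEW implementation: Python's `re` module writes the end-of-string anchor as `\Z`, so that replaces `\z`, and the names `SECTION_RE` and `split_sections` are made up for this example.

```python
import re

# Darin's pattern, with one assumed change: Python's re module spells the
# end-of-string anchor \Z, so it stands in for \z here.
SECTION_RE = re.compile(r"(foo[\s\S]*?)(?=foo|\Z)")


def split_sections(text):
    """Return every section that starts with "foo", header included."""
    # The look-ahead checks for the next "foo" (or the end of the string)
    # without consuming it, so the next match can start on that "foo".
    return SECTION_RE.findall(text)


sample = "foo text foo some more text foo some lorem ipsum text foo no more"
for section in split_sections(sample):
    print(repr(section))
```

Because the look-ahead consumes nothing, each section keeps its own "foo" and the last section is closed by the end of the string rather than by another header.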
GregSands Posted March 22, 2012 (edited March 22, 2012)

Can you forget about the regex and use either Spreadsheet String To Array or Scan String For Tokens? This assumes, of course, that the foos will be discarded or can be added back in.
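The fixed-delimiter approach can be sketched outside LabVIEW as well. The Python below uses `str.split` as a stand-in for a token search; it does not model the LabVIEW functions named above, and `split_on_delimiter` is a hypothetical helper name.

```python
def split_on_delimiter(text, delimiter="foo"):
    """Split text on a fixed delimiter and put the delimiter back on each piece."""
    # Everything before the first delimiter is not a section, so drop it.
    _, *pieces = text.split(delimiter)
    return [delimiter + piece for piece in pieces]


sample = "foo text foo some more text foo some lorem ipsum text foo no more"
for section in split_on_delimiter(sample):
    print(repr(section))
```

This only works while the delimiter is a literal string; once the header is itself an expression, a plain split no longer applies.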
JackDunaway (Author) Posted March 22, 2012

Nice! That's getting close, but it's missing the very last section - here's a screenshot from RegexPal: [RegexPal screenshot]

By the way, this helps enough to get me past an immediate hurdle, but can the regex be refined further to match the last section?
GregSands Posted March 22, 2012

> Nice! That's getting close, but it's missing the very last section - here's a screenshot from RegexPal:

The regex grabs the last section in LabVIEW though. But I'd still steer away from regexes if there's any other way to do it.
JackDunaway (Author) Posted March 22, 2012

> The regex grabs the last section in LabVIEW though. But I'd still steer away from regexes if there's any other way to do it.

Good point - it works in LV no prob - probably just a difference in the terminal condition of the loops between my app and RegexPal highlighting. So, as far as I'm concerned, Darin's solution works just fine!

And, why would you steer away from regexes?
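Another possible explanation for the RegexPal result, offered as an assumption rather than something confirmed in this thread: RegexPal evaluates patterns with the browser's JavaScript engine, where `\z` is not an end-of-string anchor and is read as a literal "z". The Python sketch below imitates that reading by writing a plain `z` in the look-ahead, and then shows a spelling of "end of string" that does not depend on `\z` at all. The variable names are made up for the example.

```python
import re

sample = "foo text foo some more text foo some lorem ipsum text foo no more"

# Assumed JavaScript reading of Darin's pattern: \z degrades to a literal "z".
js_like = re.compile(r"(foo[\s\S]*?)(?=foo|z)")

# A spelling of "end of string" that avoids \z entirely: no character follows.
portable = re.compile(r"(foo[\s\S]*?)(?=foo|(?![\s\S]))")

missing_last = js_like.findall(sample)
all_sections = portable.findall(sample)
print(len(missing_last), len(all_sections))
```

With the literal "z" the final section never finds a terminator, which matches the symptom in the screenshot; the negative look-ahead `(?![\s\S])` closes it in any engine that supports look-ahead.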
GregSands Posted March 22, 2012

> And, why would you steer away from regexes?

I spent a few years doing a lot of Perl programming, so I'm not totally anti them! When they're needed, they're extremely powerful, but if you have a fixed delimiter, as in this case, then they will always be slower than a token search.
JackDunaway (Author) Posted March 22, 2012

> When they're needed, they're extremely powerful, but if you have a fixed delimiter, as in this case, then they will always be slower than a token search.

Gotcha! Well, the particular example above is just a subset of what I'm *really* trying to do (no, "foo" is not the real section header). If you saw the full parsing requirements, we would agree that a regex with a few submatches syntactically knocks the socks off a solution with nested token searches.

***EDIT - After analyzing the problem a little further, the "tokens" are expressions themselves, not static, so slicing and dicing the string could really get messy!***
asbo Posted March 22, 2012

It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

By the way, why did you choose [\s\S]? "Match any whitespace or any not-whitespace."
JackDunaway (Author) Posted March 22, 2012

> By the way, why did you choose [\s\S]? "Match any whitespace or any not-whitespace."

I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

> It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

I'm a little confused by this statement - writing a regex is writing my own parser.
asbo Posted March 23, 2012

> I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

It's buried in the help file, but you prefix your string with (?s) - the regex implementation in LV is pretty dirty. The correct format for a regular expression is [delimiter][expression][delimiter][options], e.g. /(foo(?:.(?!foo))+)/sgi, which is my solution for your problem (but LV doesn't like lookaround). I'm spoiled by all the time I spent working with proper PCRE.

> I'm a little confused by this statement - writing a regex is writing my own parser.

Well, you're parsing with regex - it's not the same thing as writing a parser. A true parser is written with the specific grammar of your subject in mind; not necessarily "foo, followed by some stuff, and maybe another foo" like the regex is doing. The "some stuff" part is something that regex is particularly bad at - unlimited quantifiers paired with dot or equivalent tend to be a sign that regex is the wrong tool. Regex is awesome when your subject is precise, but as you can see in your case, the variable-length payload is difficult to deal with elegantly.

I bring this up especially because you mentioned that your header itself is variable, which is only going to further complicate things. You might be successful with a regex, but it will be brittle and potentially very tedious to build. Technically, yes, you can write a true parser using regex (and that's probably okay) but there's a very clear line (to me) when you're trying to do too much with one regular expression.

Darin's suggestion works correctly in LabVIEW (I don't think that it should), so you might be able to get away with finding an expression which works for the header and substituting it for your foo's.
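The three ideas in this reply (the `(?s)` prefix, the `(foo(?:.(?!foo))+)` pattern, and swapping a header expression in for "foo") can be tried together in a short Python sketch. This is an illustration under assumptions: Python's `re` is not the LabVIEW engine, `\Z` is Python's end-of-string anchor, and `section_regex`, `split_sections`, `TEMPERED_RE`, and the bracketed header `\[\w+\]` are all hypothetical names and data for the example.

```python
import re


def section_regex(header=r"foo"):
    # (?s) turns on single-line (dot-all) mode, so "." also matches newlines
    # and the [\s\S] workaround is no longer needed.
    # \Z is Python's end-of-string anchor, used here in place of PCRE's \z.
    return re.compile(rf"(?s)((?:{header}).*?)(?=(?:{header})|\Z)")


def split_sections(text, header=r"foo"):
    # group(1) is always the whole section, even if the header expression
    # contains capture groups of its own.
    return [m.group(1) for m in section_regex(header).finditer(text)]


# asbo's pattern, with the /.../si options written as inline flags.
# (The g option has no flag in Python; finditer/findall already scan globally.)
TEMPERED_RE = re.compile(r"(?si)(foo(?:.(?!foo))+)")

multiline = "foo a\nb foo c\nd"
print(split_sections(multiline))
print(split_sections("[alpha] one\ntwo [beta] three", header=r"\[\w+\]"))
print(TEMPERED_RE.findall(multiline))
```

One visible difference between the two patterns: the look-ahead version keeps the whitespace in front of the next header, while asbo's version stops one character earlier because that character is followed by "foo".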
o u a d j i Posted October 6, 2013 (edited October 6, 2013)

Attachment: JackD_VI.zip
o u a d j i Posted October 6, 2013

or this one:

(.{3})(.|\R)+?(?=\1|$)
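One way to read this last pattern, as an interpretation the post itself does not spell out: the first three characters of a section are captured as its header, and a back-reference `\1` in the look-ahead finds the next occurrence of those same three characters. A Python sketch of that reading follows. Python has no `\R`, so `(?s)` is used to let "." cover newlines, and `\Z` replaces `$`; `BACKREF_RE` and `split_by_repeated_header` are hypothetical names.

```python
import re

# Assumed reading of the pattern: \1 is a back-reference to the three-character
# header captured by (.{3}). Python has no \R, so (?s) lets "." cover newlines,
# and \Z replaces $ as a strict end-of-string anchor.
BACKREF_RE = re.compile(r"(?s)(.{3}).+?(?=\1|\Z)")


def split_by_repeated_header(text):
    # group(0) is the whole section; group(1) is only its three-character header.
    return [m.group(0) for m in BACKREF_RE.finditer(text)]


sample = "foo text foo some more text foo some lorem ipsum text foo no more"
for section in split_by_repeated_header(sample):
    print(repr(section))
```

Under this reading the header does not have to be the literal "foo", but it does have to be exactly three characters long and repeat verbatim at the start of every section.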