Non-Capturing Group in Regular Expression

JackDunaway · March 22, 2012

I'm trying to use a part of input text inside a regular expression without capturing the text in the match, but can't figure out how to construct the regex. Below is a snippet that shows what I'm trying to do, and what regex I've tried.

Basically, the string is composed of sections that start with "foo", and there is no terminal string that denotes the end of a section. In other words, you know the section has ended when you run into the next "foo", or if you hit the end of the string. I'm trying to divide this string into an array of sections.

(Note: you can copy the below input into http://regexpal.com/ rather than firing up LabVIEW)

My best-shot regex:

(foo[\s\S]*?)(?:foo)?

An example input:

foo

text

foo

some more

text

foo

some lorem

ipsum

text

foo

no more

And the snippet:

Any ideas?? Thanks in advance!
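For readers following along without LabVIEW, here is a minimal sketch of why the attempt above comes up short. It uses Python's `re` module as a stand-in, which is an assumption (the thread is about LabVIEW's regex engine), and the sample text is condensed to single newlines. The lazy `[\s\S]*?` is satisfied with zero characters and the optional `(?:foo)?` is allowed to match nothing, so each match stops right after the header.

```python
import re

# Condensed version of the example input (assumption: single newlines).
text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

# The original attempt: lazy body followed by an optional non-capturing "foo".
attempt = re.compile(r"(foo[\s\S]*?)(?:foo)?")

# findall returns the contents of the single capture group for each match.
captured = attempt.findall(text)
print(captured)
```

Nothing forces the lazy quantifier to keep going, which is the gap a look-ahead or an end-of-string anchor has to fill.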

Darin · March 22, 2012

I cannot test this, but I would use a positive look-ahead instead of the non-capturing group.

Try this:

(foo[\s\S]*?)(?=foo|\z)
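A rough way to try this outside LabVIEW is sketched below with Python's `re` module. This is an assumption, not the LabVIEW implementation. Python's `re` has traditionally spelled the end-of-string anchor `\Z` rather than `\z`, so the pattern is adjusted accordingly, and the sample text is condensed to single newlines.

```python
import re

# Condensed version of the example input (assumption: single newlines).
text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

# Lazy body, then a look-ahead that requires either the next "foo" or the
# end of the string. The look-ahead consumes nothing, so the next match can
# start at that "foo".
sections = re.findall(r"(foo[\s\S]*?)(?=foo|\Z)", text)

for section in sections:
    print(repr(section))
```

Because the look-ahead is zero-width, joining the sections back together reproduces the input, including the last section.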

GregSands · March 22, 2012

Can you forget about the regex and use either Spreadsheet String To Array, or Scan String For Tokens?

Assuming of course that the foos will be discarded, or can be added back in.

Edited March 22, 2012 by GregSands
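The token-based idea can be sketched in Python as a plain string split, with the delimiter added back afterwards. This is only an illustration of the approach under the assumption of a fixed delimiter. It is not the LabVIEW Spreadsheet String To Array or Scan String For Tokens function, and the sample text is condensed to single newlines.

```python
# Condensed version of the example input (assumption: single newlines).
text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

delimiter = "foo"

# Splitting discards the delimiter; the first element is whatever precedes
# the first "foo" (empty here), so it is skipped.
parts = text.split(delimiter)

# Add the delimiter back to the front of each section.
sections = [delimiter + body for body in parts[1:]]

print(sections)
```

This only works while the section header is a fixed string.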

JackDunaway · March 22, 2012

Nice! That's getting close, but it's missing the very last section - here's a screenshot from RegexPal:

By the way, this helps enough to get me past an immediate hurdle, but can the regex be refined further to match the last section?

GregSands · March 22, 2012

Nice! That's getting close, but it's missing the very last section - here's a screenshot from RegexPal:

The regex grabs the last section in LabVIEW though. But I'd still steer away from regexes if there's any other way to do it.

JackDunaway · March 22, 2012

The regex grabs the last section in LabVIEW though. But I'd still steer away from regexes if there's any other way to do it.

Good point - it works in LV no prob - probably just a difference in the terminal condition of the loops between my app and RegexPal highlighting. So, as far as I'm concerned, Darin's solution works just fine!

And, why would you steer away from regexes?

GregSands · March 22, 2012

And, why would you steer away from regexes?

I spent a few years doing a lot of Perl programming, so I'm not totally anti them! When they're needed, they're extremely powerful, but if you have a fixed delimiter, as in this case, then they will always be slower than a token search.

JackDunaway · March 22, 2012

When they're needed, they're extremely powerful, but if you have a fixed delimiter, as in this case, then they will always be slower than a token search.

Gotcha! Well, the particular example above is just a subset of what I'm *really* trying to do, (no, "foo" is not the real section header :lol: ). If you saw the full parsing requirements, we would agree that a regex with a few submatches syntactically knocks the socks off of a solution with nested token searches.

***EDIT - And after analyzing the problem a little further, the "tokens" are expressions themselves, not static, so slice-and-dicing the string could really get messy! ***

asbo · March 22, 2012

It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

By the way, why did you choose [\s\S]? "Match any whitespace or any not-whitespace."

JackDunaway · March 22, 2012

By the way, why did you choose [\s\S]? "Match any whitespace or any not-whitespace."

I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

I'm a little confused by this statement - writing a regex is writing my own parser.

asbo · March 23, 2012

I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

It's buried in the help file, but you prefix your string with (?s) - the regex implementation in LV is pretty dirty. The correct format for a regular expression is [delimiter][expression][delimiter][options], e.g. /(foo(?:.(?!foo))+)/sgi, which is my solution for your problem (but LV doesn't like lookaround). I'm spoiled by all the time I spent working with proper PCRE.
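The effect of the (?s) prefix can be seen in a small Python sketch. Python's `re` module accepts the same inline flag, but treating it as equivalent to LabVIEW's behaviour is an assumption. The sample text is condensed to single newlines, and `\Z` is used as Python's end-of-string anchor. The last pattern is the look-around variant quoted above with only the `s` option applied.

```python
import re

# Condensed version of the example input (assumption: single newlines).
text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

# Without single-line mode the dot stops at the first newline, so the
# look-ahead is never satisfied and nothing matches.
dot_default = re.findall(r"(foo.*?)(?=foo|\Z)", text)

# With (?s) the dot matches newlines too, just like [\s\S].
dot_all = re.findall(r"(?s)(foo.*?)(?=foo|\Z)", text)
char_class = re.findall(r"(foo[\s\S]*?)(?=foo|\Z)", text)

# Look-around variant: each consumed character must not be followed by "foo".
lookaround = re.findall(r"(?s)(foo(?:.(?!foo))+)", text)

print(dot_default)
print(dot_all == char_class)
print(lookaround)
```

In this sketch the look-around variant stops one character early, because the character directly before the next "foo" fails the negative look-ahead. With this input that drops the trailing newline from every section except the last.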

I'm a little confused by this statement - writing a regex is writing my own parser.

Well, you're parsing with regex - it's not the same thing as writing a parser. A true parser is written with the specific grammar of your subject in mind; not necessarily foo, followed by some stuff, and maybe another foo like the regex is doing. The "some stuff" part is something that regex is particularly bad for - unlimited quantifiers paired with dot or equivalent tend to be a sign that regex is the wrong tool. Regex is awesome when your subject is precise, but as you can see in your case, the variable-length payload is difficult to deal with elegantly. I bring this up especially because you mentioned that your header itself is variable, which is only going to further complicate things. You might be successful with a regex, but it will be brittle and potentially very tedious to build.

Technically, yes, you can write a true parser using regex (and that's probably okay) but there's a very clear line (to me) when you're trying to do too much with one regular expression.
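As a rough illustration of the parser idea, the Python sketch below walks the input line by line and starts a new section whenever a line looks like a header. The function name `split_sections`, the `is_header` callback and the assumption that headers always start a line are hypothetical; the real grammar is not given in this thread.

```python
import re
from typing import Callable, List


def split_sections(text: str, is_header: Callable[[str], bool]) -> List[str]:
    """Group lines into sections, each starting at a header line."""
    sections: List[List[str]] = []
    for line in text.splitlines(keepends=True):
        if is_header(line):
            # A header line opens a new section.
            sections.append([line])
        elif sections:
            # Any other line belongs to the section currently open.
            sections[-1].append(line)
        # Lines before the first header are ignored.
    return ["".join(lines) for lines in sections]


# Hypothetical header rule: the line starts with the word "foo".
header = re.compile(r"foo\b")

text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"
sections = split_sections(text, lambda line: header.match(line) is not None)
print(sections)
```

The header test is the only piece that knows what a header looks like, so a variable header only changes `is_header` and the section-splitting loop stays the same.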

Darin's suggestion works correctly in LabVIEW (I don't think that it should), so you might be able to get away with finding an expression which works for the header and substituting it for your foo's.

o u a d j i · October 6, 2013

.

JackD_VI.zip

Edited October 6, 2013 by o u a d j i

o u a d j i · October 6, 2013

or this one :

(.{3})(.|\R)+?(?=\1|$)
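Assuming the pattern above is meant to use \R (any newline) and a backreference \1 to the first three characters, the idea can be sketched in Python as follows. Python's `re` module has no \R, so `[\s\S]` stands in for "any character or newline" and `\Z` for the end of the string. Both substitutions are assumptions, and the sample text is condensed to single newlines.

```python
import re

# Condensed version of the example input (assumption: single newlines).
text = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

# Capture the first three characters as the header, then lazily consume
# until the same three characters appear again or the string ends.
pattern = re.compile(r"(.{3})[\s\S]+?(?=\1|\Z)")

sections = [match.group(0) for match in pattern.finditer(text)]
print(sections)
```

This variant does not hard-code "foo", but it does assume the header is exactly three characters long and repeats verbatim at the start of every section.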
