JackDunaway

Non-Capturing Group in Regular Expression


I'm trying to use a part of input text inside a regular expression without capturing the text in the match, but can't figure out how to construct the regex. Below is a snippet that shows what I'm trying to do, and what regex I've tried.

Basically, the string is composed of sections that start with "foo", and there is no terminal string that denotes the end of a section. In other words, you know the section has ended when you run into the next "foo", or if you hit the end of the string. I'm trying to divide this string into an array of sections.

(Note: you can copy the below input into http://regexpal.com/ rather than firing up LabVIEW)

My best-shot regex:

(foo[\s\S]*?)(?:foo)?

An example input:

foo

text

foo

some more

text

foo

some lorem

ipsum

text

foo

no more

And the snippet:

post-17237-0-49550000-1332448940.png

Any ideas?? Thanks in advance!


I can't test this, but I would use a positive look-ahead instead of the non-capturing group.

Try this:

(foo[\s\S]*?)(?=foo|\z)
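For anyone who wants to try this outside LabVIEW, here is a minimal sketch in Python's `re` module. It is not the LabVIEW snippet from the thread; the `TEXT` constant is an assumed stand-in for the example input with the blank lines left out, and Python spells the end-of-string anchor `\Z` where PCRE uses `\z`. It compares the original attempt with the look-ahead version.

```python
import re

# Assumed sample input, modeled on the example in the question.
TEXT = "foo\ntext\nfoo\nsome more\ntext\nfoo\nsome lorem\nipsum\ntext\nfoo\nno more"

# Original attempt: the lazy quantifier is satisfied with zero characters,
# and the optional trailing group never forces it to grow.
original = re.findall(r"(foo[\s\S]*?)(?:foo)?", TEXT)

# Look-ahead version: a section must be followed by "foo" or the end of input.
# The look-ahead checks for the next "foo" without consuming it.
sections = re.findall(r"(foo[\s\S]*?)(?=foo|\Z)", TEXT)

print(original)
print(sections)
```

Because the look-ahead consumes nothing, each "foo" is still available to start the next match, which is why the sections join back into the original string.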


Can you forget about the regex and use either Spreadsheet String To Array, or Scan String For Tokens?

post-3889-0-06922000-1332450695.png

Assuming of course that the foos will be discarded, or can be added back in.
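The same idea in text form, as a rough Python sketch rather than the LabVIEW functions named above: split on the fixed delimiter, then add the "foo" back onto each piece. `split_sections` and `sample` are names made up for this illustration.

```python
DELIMITER = "foo"


def split_sections(text: str, delimiter: str = DELIMITER) -> list[str]:
    """Split text on a fixed delimiter and put the delimiter back on each section."""
    # Anything before the first delimiter is not a section, so it is dropped.
    pieces = text.split(delimiter)[1:]
    return [delimiter + piece for piece in pieces]


sample = "foo\ntext\nfoo\nsome more\ntext\nfoo\nno more"
print(split_sections(sample))
```

This only works while the delimiter is a fixed string; a delimiter that is itself a pattern needs something more than a plain split.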

Edited by GregSands

Share this post


Link to post
Share on other sites

Nice! That's getting close, but it's missing the very last section - here's a screenshot from RegexPal:

post-17237-0-04875600-1332450598.png

By the way, this helps enough to get me past an immediate hurdle, but can the regex be refined further to match the last section?
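One possible explanation, which is an assumption and not something confirmed in this thread: RegexPal runs the browser's JavaScript regex engine, and JavaScript does not treat `\z` as an end-of-string anchor. A look-ahead such as `(?![\s\S])` ("no character of any kind follows") avoids engine-specific anchors altogether. A minimal sketch in Python, with an assumed sample input:

```python
import re

# Assumed sample input, shortened from the example in the question.
TEXT = "foo\ntext\nfoo\nsome more\ntext\nfoo\nno more"

# (?![\s\S]) succeeds only where nothing at all follows, i.e. at the very
# end of the input, without relying on \z or \Z.
PORTABLE = r"(foo[\s\S]*?)(?=foo|(?![\s\S]))"

sections = re.findall(PORTABLE, TEXT)
print(sections[-1])
```

In Python this gives the same sections as the `\Z` form, including the last one.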


The regex grabs the last section in LabVIEW though. But I'd still steer away from regexes if there's any other way to do it.


Good point - it works in LV no prob - probably just a difference in the terminal condition of the loops between my app and RegexPal highlighting. So, as far as I'm concerned, Darin's solution works just fine!

And, why would you steer away from regexes?


I spent a few years doing a lot of Perl programming, so I'm not totally anti them! When they're needed, they're extremely powerful, but if you have a fixed delimiter, as in this case, then they will always be slower than a token search.


Gotcha! Well, the particular example above is just a subset of what I'm *really* trying to do, (no, "foo" is not the real section header :lol: ). If you saw the full parsing requirements, we would agree that a regex with a few submatches syntactically knocks the socks off of a solution with nested token searches.

***EDIT - And after analyzing the problem a little further, the "tokens" are expressions themselves, not static, so slice-and-dicing the string could really get messy! :o ***


It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

By the way, why did you choose [\s\S]? "Match any whitespace or any not-whitespace."


I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

It's beginning to look like writing your own parser is the smarter choice. It's pretty often that regex gets misused in that kind of circumstance (if I had a dollar for every time someone tried to parse HTML with regex...). Give it some thought and see if you'd come out ahead with a proper parser.

I'm a little confused by this statement - writing a regex is writing my own parser.


I'm glad you asked. I have not been able to figure out how to make dots match newlines by turning on single-line mode. LabVIEW does not seem to honor this setting - am I doing something wrong?

It's buried in the help file, but you prefix your string with (?s) - the regex implementation in LV is pretty dirty. The correct format for a regular expression is [delimiter][expression][delimiter][options], e.g.: /(foo(?:.(?!foo))+)/sgi, which is my solution for your problem (but LV doesn't like lookaround). I'm spoiled by all the time I spent working with proper PCRE.
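A small sketch of the `(?s)` prefix in Python's `re` module, standing in for the `/s` option (LabVIEW's behavior is not tested here, and `TEXT` is an assumed input). One thing it shows, at least in Python: with the look-ahead placed after the dot, the character directly before the next "foo" is left out of the match. The more common "tempered dot" form checks the look-ahead before consuming each character and keeps it.

```python
import re

# Assumed short input with two sections.
TEXT = "foo\na\nfoo\nb"

# Without (?s) the dot stops at the first newline.
no_dotall = re.findall(r"foo.*", TEXT)

# The pattern from the post, with (?s) in place of the /s option.
# The look-ahead runs after each consumed character, so the character
# directly before the next "foo" is not part of the match.
as_posted = re.findall(r"(?s)foo(?:.(?!foo))+", TEXT)

# Checking the look-ahead before consuming keeps that character.
tempered = re.findall(r"(?s)foo(?:(?!foo).)*", TEXT)

print(no_dotall)
print(as_posted)
print(tempered)
```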

I'm a little confused by this statement - writing a regex is writing my own parser.

Well, you're parsing with regex - it's not the same thing as writing a parser. A true parser is written with the specific grammar of your subject in mind; not necessarily foo, followed by some stuff, and maybe another foo like the regex is doing. The "some stuff" part is something that regex is particularly bad for - unlimited quantifiers paired with dot or equivalent tend to be a sign that regex is the wrong tool. Regex is awesome when your subject is precise, but as you can see in your case, the variable-length payload is difficult to deal with elegantly. I bring this up especially because you mentioned that your header itself is variable, which is only going to further complicate things. You might be successful with a regex, but it will be brittle and potentially very tedious to build.

Technically, yes, you can write a true parser using regex (and that's probably okay) but there's a very clear line (to me) when you're trying to do too much with one regular expression.
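To make the distinction concrete, here is a very small hand-written parser sketched in Python. It assumes each header sits on its own line, which the thread does not state, and `parse_sections` and `is_header` are hypothetical names for this illustration. The grammar lives in ordinary code, so the header test can be as strict as the real format requires.

```python
from typing import Callable


def parse_sections(text: str, is_header: Callable[[str], bool]) -> list[list[str]]:
    """Group lines into sections, starting a new section at each header line."""
    sections: list[list[str]] = []
    for line in text.splitlines():
        if is_header(line):
            # A header line opens a new section.
            sections.append([line])
        elif sections:
            # Any other line belongs to the section currently open.
            sections[-1].append(line)
        # Lines before the first header are ignored.
    return sections


sample = "foo\ntext\nfoo\nsome more\ntext\nfoo\nno more"
result = parse_sections(sample, lambda line: line.strip() == "foo")
print(result)
```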

Darin's suggestion works correctly in LabVIEW (I don't think that it should), so you might be able to get away with finding an expression which works for the header and substituting it for your foo's.
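A sketch of that substitution in Python's `re` module, not LabVIEW. The header pattern `SEC\d+` and the helper `split_on_header` are invented for illustration, since the real header format is never given in the thread.

```python
import re


def split_on_header(text: str, header: str) -> list[str]:
    """Return sections that each start with a match of the header pattern."""
    # Wrap the header in non-capturing groups so that any alternation
    # inside it stays contained.
    pattern = rf"(?:{header})[\s\S]*?(?=(?:{header})|\Z)"
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical input whose headers vary from section to section.
sample = "SEC1\nalpha\nSEC22\nbeta\ngamma\nSEC3\nlast"
print(split_on_header(sample, r"SEC\d+"))
```

The header expression appears twice, once to start a section and once inside the look-ahead, so it has to be cheap and unambiguous enough to be tested at every position in the payload.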
