zlocm Posted October 15, 2010
I am trying to delete all HTML tags (like <.....>) from a sequence, but LabVIEW doesn't find all of the tags in my sequence. My sequence: <tr id = "ololo">Hello World <br> !!!</tr> Regexp: <(.*?)> My VI (LV 2009) is attached. Regards. delete html tags from sequence.vi
Phillip Brooks Posted October 15, 2010
Funny, I just recently posted a VI (on the dark side) that I wrote a while back to remove HTML tags from TestStand HTML reports. Maybe it will help you. There are some other regex nuggets in the thread: http://forums.ni.com/t5/BreakPoint/Regular-Expressions-Board/m-p/1269088#M14343
jcarmody Posted October 15, 2010
How about this?
zlocm Posted October 15, 2010 (edited)
Thanks for the advice! But the regexp has a problem...
jcarmody Posted October 15, 2010
What's the problem? What should the output be?
asbo Posted October 15, 2010
For the record, /<(.*?)>/ is considered a fairly "bad" regular expression. You should work with something more like /<[^>]+>/. A cookie to whoever can explain why that's better.
jcarmody Posted October 15, 2010 (edited)
As for the problem in your original VI, you're using Shift Registers with the Match Regular Expression node incorrectly. You can't use the Offset After Match if you're only going to search what was found before and after the previous match. Savvy?
asbo wrote: "You should work with something more like /<[^>]+>/. A cookie to whoever can explain why that's better"
Yours will not miss line break characters. If the tag spans multiple lines, the "bad" regular expression will fail.
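The point about Offset After Match can be sketched in Python, the language already used later in this thread. This is only an analogy to the LabVIEW diagram, not the VI itself: the name strip_tags and the variable offset are made up for illustration, with offset standing in for the value a shift register would carry between iterations. The key detail is that every iteration searches the same, unmodified string and only the start position changes.

```python
import re

# Pattern suggested earlier in the thread: "<", anything except ">", then ">".
TAG = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove every <...> span, scanning forward from the end of the previous match."""
    pieces = []
    offset = 0  # plays the role of "Offset After Match" carried between iterations
    while True:
        # Always search the full original string, starting at the saved offset.
        match = TAG.search(text, offset)
        if match is None:
            pieces.append(text[offset:])  # keep whatever follows the last tag
            break
        pieces.append(text[offset:match.start()])  # keep the text before this tag
        offset = match.end()
    return "".join(pieces)


print(strip_tags('<tr id = "ololo">Hello World <br> !!!</tr>'))
```

Mixing the two approaches (cutting the string down to the substring after the match, and also applying an offset measured in the original string) is what makes matches go missing.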
asbo Posted October 15, 2010
The Match Regular Expression node does have a multiline parameter to account for that; when set to True, the ^ and $ anchors no longer match line endings and the . wildcard will also match \r and \n. But more importantly, you should rarely, if ever, write a regex that uses the . wildcard. You might compare it to global variables in LV: there are valid use cases, but they are few and far between. Using more specific matching makes debugging and readability much more straightforward. My solution does have a flaw, though: if there's a nested > (within an attribute, for example), the regex will break. This is why regular expressions are almost never the correct solution for HTML/XML/*ML problems; it's a job for a proper parser (TidyHTML, for example).
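A small Python sketch of the three points made so far: the lazy dot pattern misses a tag that spans lines, the negated character class does not, and both regexes trip over a > inside an attribute. Note the assumptions: this uses Python's re module, not LabVIEW's Match Regular Expression node, and the standard-library html.parser stands in for "a proper parser" (the post names TidyHTML, which is not used here). TextExtractor and text_only are hypothetical helper names.

```python
import re
from html.parser import HTMLParser

LAZY_DOT = re.compile(r"<.*?>")   # "." does not match a newline by default in Python
NEGATED = re.compile(r"<[^>]+>")  # "[^>]" matches any character except ">", newlines included

multiline_tag = '<tr\n id="ololo">Hello</tr>'
print(repr(LAZY_DOT.sub("", multiline_tag)))  # the multi-line opening tag survives
print(repr(NEGATED.sub("", multiline_tag)))   # both tags are removed

# The flaw mentioned above: a ">" inside an attribute value ends the match too early.
tricky = '<a title="x > y">link</a>'
print(repr(NEGATED.sub("", tricky)))


class TextExtractor(HTMLParser):
    """Collects only the text nodes; tags and attributes are discarded."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_only(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(repr(text_only(tricky)))  # the parser understands quoted attribute values
```

The parser version is longer, but it does not depend on guessing where a tag ends.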
ShaunR Posted October 15, 2010
asbo wrote: "....debugging and readability much more straightforward."
This is regex we're talking about
zlocm Posted October 15, 2010
I have the expression: <tr id = "ololo">Hello World <br> !!!</tr>
The output should be: Hello World !!!
But it was: My sequence:Hello World <br> !!!
The same Perl-style regexp works fine in Python and gives the right result, but in LabVIEW it does not. This Python script:

    import re

    if __name__ == '__main__':
        data = '''<tr id = "ololo">Hello World <br> !!!</tr>'''
        table_regex = re.compile("<(.*?)>", re.IGNORECASE)
        print("FIRST EXRESSION ---------------------")
        print(table_regex.search(data).group())
        print("SECOND EXRESSION---------------------")
        # search again in the remainder after the first match
        p2 = data[table_regex.search(data).end():]
        print(table_regex.search(p2).group())
        print("THIRD EXRESSION---------------------")
        # and once more in the remainder after the second match
        p3 = p2[table_regex.search(p2).end():]
        print(table_regex.search(p3).group())

returns:

    FIRST EXRESSION ---------------------
    <tr id = "ololo">
    SECOND EXRESSION---------------------
    <br>
    THIRD EXRESSION---------------------
    </tr>

so this regexp should work fine.
jcarmody wrote: "As far as the problem in your original VI, you're using Shift Registers with the Match Regular Expression node incorrectly. You can't use the Offset After Match if you're only going to search what was found before and after the previous match."
Oh, you're right, thanks.
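For comparison, the script above can be condensed so that the regex engine does the iteration itself. This is a sketch in Python only; it says nothing about how the LabVIEW node behaves. The names tag_regex, tags, stripped and cleaned are illustrative, and the final whitespace collapse is an assumption about the desired output, added because removing <br> leaves two adjacent spaces.

```python
import re

data = '<tr id = "ololo">Hello World <br> !!!</tr>'
tag_regex = re.compile(r"<(.*?)>")  # the pattern from the original post

# finditer walks the whole string, so no manual slicing or offsets are needed.
tags = [m.group() for m in tag_regex.finditer(data)]
print(tags)  # ['<tr id = "ololo">', '<br>', '</tr>']

# sub removes every match in one call.
stripped = tag_regex.sub("", data)

# Collapse runs of whitespace left behind where tags used to be (an assumption
# about the wanted result, which was given as "Hello World !!!").
cleaned = " ".join(stripped.split())
print(cleaned)  # Hello World !!!
```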