The recursive VI approach combined with Build Array is not memory efficient, and it seems like a more complicated solution. My preference is to preallocate the arrays and use the swap/cut method.
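Since a block diagram can't be pasted as text, here is a rough Python sketch of what I mean by preallocating. It assumes the routine splits a string on a delimiter, and it reads "swap/cut" as "replace elements in place, then trim the unused tail"; the function name, the `capacity` parameter, and the doubling rule are all made up for illustration, not taken from anyone's VI.

```python
def split_preallocated(text, delimiter, capacity=1024):
    """Split text on delimiter using a preallocated buffer that is trimmed at the end.

    Hypothetical illustration only: names and the growth policy are assumptions.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")

    parts = [None] * max(capacity, 1)  # allocate once up front
    count = 0                          # number of slots actually filled
    start = 0

    while True:
        idx = text.find(delimiter, start)
        if idx == -1:
            break
        if count == len(parts):
            parts.extend([None] * len(parts))  # rare regrow: double the buffer
        parts[count] = text[start:idx]         # replace in place, no append
        count += 1
        start = idx + len(delimiter)

    if count == len(parts):
        parts.append(None)
    parts[count] = text[start:]  # remainder after the last delimiter
    count += 1

    del parts[count:]  # cut the unused tail
    return parts
```

Calling `split_preallocated("a,b,c", ",")` gives the same list as `"a,b,c".split(",")`. The point is only the allocation pattern: one big allocation, in-place writes, and a single trim, instead of growing the array on every match.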
Focusing exclusively on timing, about 80-90% of the time spent in these routines appears to be taken up by the Search/Split String primitive. Even swapping the swap/cut method out for a Build Array only increases processing time by 10%, whereas deleting the Search/Split String and replacing it with a constant zero on the case input reduces overall time by 80-90%. This shows that no significant timing gains can be achieved without improving the primitive itself.
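To put rough numbers on that, here is a small Amdahl's-law calculation in Python. The 0.85 figure is just the midpoint of the 80-90% range above, and the function name is mine; treat it as back-of-the-envelope arithmetic, not a benchmark.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: overall speedup when `fraction` of the runtime
    is made `factor` times faster and the rest is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


search_fraction = 0.85  # assumed share of runtime spent in Search/Split String

# Best case for tuning everything EXCEPT the primitive:
# the other 15% becomes free (infinitely fast).
best_without_primitive = overall_speedup(1.0 - search_fraction, float("inf"))

# Making only the primitive twice as fast.
primitive_twice_as_fast = overall_speedup(search_fraction, 2.0)

print(f"rest of the code made free: {best_without_primitive:.2f}x")
print(f"primitive 2x faster:        {primitive_twice_as_fast:.2f}x")
```

With these assumed numbers, making all of the array handling free caps out at about 1.18x overall, while merely doubling the speed of the primitive gives about 1.74x.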
EDIT: OK, just read page 2 (duh, I should remember to flip the page!). I did assume that the match could be anywhere in the string.