Finding HTML Stuff with Regular Expressions

Following up on my post [cref 69], here’s what I used it for:

<tr\b[^>]*>([\s\S](?!tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>

This is an excellent way to find a chunk of HTML with regex. It finds the table rows that contain either “String1” or “String2”, regardless of linefeeds, carriage returns, or other nefarious forms of whitespace, because [\s\S] matches any character at all, newlines included.
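To see the expression at work, here is a minimal Python sketch. The sample markup and the names ROW_RE, SAMPLE_HTML and find_rows are made up for illustration; only the pattern itself comes from the post. It assumes Python's re module, where [\s\S] and \b behave as they do in most other regex flavors.

```python
import re

# The expression from the post, unchanged. Group 1 is the repeated
# single-character group; group 2 is whichever search string was found.
ROW_RE = re.compile(
    r'<tr\b[^>]*>([\s\S](?!tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>'
)

# Hypothetical sample markup: only the second row contains a search string,
# and that row is deliberately split across several lines.
SAMPLE_HTML = """<table>
<tr>
  <td>Alpha</td>
</tr>
<tr class="hit">
  <td>has String1
  here</td>
</tr>
<tr><td>Beta</td></tr>
</table>"""


def find_rows(html):
    """Return the full text of every row the expression matches."""
    return [m.group(0) for m in ROW_RE.finditer(html)]


rows = find_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Only the row with class="hit" should come back, line breaks and all. The other rows are skipped because the lookahead stops the lazy group from running past their closing tags.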

Adapted from one of the simpler incarnations in Steve Levithan’s post about evolving a regex to find innermost HTML elements. It could be more complete and/or efficient, but at least this one is fairly easy (kinda) to understand.

2 thoughts on “Finding HTML Stuff with Regular Expressions”

  1. Very cool. One minor suggestion is to change ([\s\S](?!tr\b[^>]*>))*? to ([\s\S](?!</?tr\b[^>]*>))*? so it will also match some corner cases like <tr>Str<b>String1</b></tr>

    Or is there something I’m missing?
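The corner case in the comment above can be checked with a short Python sketch. It assumes the suggested lookahead is meant to read (?!</?tr\b[^>]*>), that is, a negative lookahead for an opening or closing tr tag. The names ORIGINAL, REVISED, corner and two_rows are hypothetical and exist only for this illustration.

```python
import re

# The expression as posted: the lookahead rejects any character that is
# directly followed by "tr", a word boundary, and then a ">" somewhere after.
ORIGINAL = re.compile(
    r'<tr\b[^>]*>([\s\S](?!tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>'
)

# Assumed reading of the commenter's suggestion: only reject a character
# that is directly followed by a real <tr ...> or </tr ...> tag.
REVISED = re.compile(
    r'<tr\b[^>]*>([\s\S](?!</?tr\b[^>]*>))*?(String1|String2)[\s\S]*?</tr>'
)

# The corner case from the comment: "Str" followed by a tag looks like
# "tr...>" to the original lookahead, so the "S" can never be consumed.
corner = '<tr>Str<b>String1</b></tr>'

# Two adjacent rows, to check that the revised lookahead still keeps a
# match from spilling over from one row into the next.
two_rows = '<tr><td>Alpha</td></tr><tr><td>String1</td></tr>'

original_hit = ORIGINAL.search(corner)
revised_hit = REVISED.search(corner)
revised_two = REVISED.search(two_rows)

print(original_hit)            # no match object for the corner case
print(revised_hit.group(0))    # the whole corner-case row
print(revised_two.group(0))    # only the second of the two rows
```

Under that reading, the revised expression picks up the corner case and still stops at row boundaries, which suggests the commenter is not missing anything obvious.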
