Tips and Tricks: Useful Regular Expressions

I’ve learned a whole lot about regular expressions over the last year and a bit, and there are a few ‘phrases’ (for lack of a better term) that I keep coming back to. They are something of a blunt object, or even the nuclear option, but they get me most of the way to where I need to go.

Phrase 1: .*?

This is one of the most useful regular expression ‘phrases’ I’ve learned. It matches zero or more of any kind of character, including spaces, but isn’t greedy. (That just means it matches as little as possible: it stops the first time it sees whatever is put after the question mark.) The key limitation of this phrase is that, by default, the dot does not match new lines (though the exact behavior, and the flag that changes it, varies from language to language).
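To make the greedy versus non-greedy difference concrete, here is a minimal sketch in Ruby. The sample string and the tag names are made up for illustration.

```ruby
# Hypothetical sample text; the <b> tags are only for illustration.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* grabs as much as it can, so the match runs to the LAST </b>.
greedy = text[%r{<b>.*</b>}]

# Non-greedy: .*? stops at the FIRST place the rest of the pattern can match.
lazy = text[%r{<b>.*?</b>}]

puts greedy  # <b>one</b> and <b>two</b>
puts lazy    # <b>one</b>
```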

Phrase 2: .+?

Related to the first phrase, this one matches one or more of any kind of character, including spaces, but isn’t greedy. This is useful when something has to be there, but it doesn’t matter what it is. Like the one before it, this (usually) doesn’t match new lines.
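A small sketch of where the ‘something has to exist’ part matters, again in Ruby. The attribute strings are invented for the example.

```ruby
# Hypothetical inputs: one attribute with an empty value, one with content.
empty  = 'name=""'
filled = 'name="Ada"'

star = /name="(.*?)"/  # zero or more characters between the quotes
plus = /name="(.+?)"/  # at least one character between the quotes

p empty[star, 1]   # ""    (an empty value still matches)
p empty[plus, 1]   # nil   (nothing between the quotes, so no match)
p filled[star, 1]  # "Ada"
p filled[plus, 1]  # "Ada"
```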

Phrase 3: [\s\S]*?

This ‘phrase’ is even more ‘powerful’ than the ones above. It matches zero or more of any kind of character, including all kinds of whitespace (yes, also new lines), but again, isn’t greedy. It works because \s (any whitespace) and \S (anything that isn’t whitespace) together cover every possible character. I used this one a LOT for a while, especially in Java.
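Here is a rough Ruby sketch of the new-line difference. The sample text is hypothetical, and the flag shown at the end is Ruby's; other languages spell the equivalent option differently.

```ruby
# Hypothetical text with a new line inside the part we want to capture.
text = "<p>first line\nsecond line</p>"

# The dot does not match "\n" by default, so this finds nothing.
dot_only = text[%r{<p>(.*?)</p>}, 1]

# [\s\S] matches any character at all, new lines included.
any_char = text[%r{<p>([\s\S]*?)</p>}, 1]

# In Ruby, the /m flag also lets the dot match new lines.
with_flag = text[%r{<p>(.*?)</p>}m, 1]

p dot_only   # nil
p any_char   # "first line\nsecond line"
p with_flag  # "first line\nsecond line"
```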

Get creative

I was having trouble getting a regular expression to match a section of HTML that was broken up onto different lines irregularly [1]. Finally I figured out that I wanted to use \s*? (zero or more whitespace characters, not greedy) between the pieces to account for the different amounts of spacing, without picking up any letters or other characters.
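Something along these lines, sketched in Ruby. The markup is a made-up stand-in for the real HTML, which isn't shown here.

```ruby
# Hypothetical markup with irregular line breaks and indentation.
html = "<tr>\n    <td>one</td>\n\n  <td>two</td></tr>"

# \s*? between the tags absorbs spaces, tabs and new lines, and nothing else.
pattern = %r{<tr>\s*?<td>(\w+)</td>\s*?<td>(\w+)</td>\s*?</tr>}

match = html.match(pattern)
p match[1]  # "one"
p match[2]  # "two"

# Stray letters between the tags are not whitespace, so there is no match.
p "<tr>oops<td>one</td><td>two</td></tr>".match(pattern)  # nil
```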


  1. Yes, I know regex is not the tool of choice for HTML, but my hands were a bit tied on this project.

Some things I learned about case statements in Ruby

I’ve been working with case statements in Ruby, and I was having a difficult time getting them to do what I wanted.

Things I learned:

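  1. Case statements that take an argument after the case (ex: case SOME_VARIABLE) can’t take boolean logic after the when part of the statement. Each when value is compared against the argument using ===, so an expression like thingie.match(...) gets compared to thingie instead of being tested as true or false, and the branch never runs.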

    Bad:

    case thingie
    when thingie.match(/some regex/)
    # something cool happens
    end
    

    Good:

    case thingie
    when "a perfectly boring string"
    # something cool happens
    end
    
  2. Case statements that do not take an argument after the case CAN take logic after the when. Each when expression is simply tested as true or false.

    Good:

    case
    when thingie.match(/some regex/)
    # something cool happens
    end
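To see all of this in one place, here is a minimal sketch. The method names and the sample patterns are hypothetical, chosen only for illustration. It also shows that a regular expression can go directly after when in the first form, because the comparison is done with ===, and Regexp#=== tests for a match.

```ruby
# With an argument: each `when` value is compared using ===.
def with_argument(thingie)
  case thingie
  when /\A\d+\z/ then "digits"                  # Regexp#=== tests for a match
  when "a perfectly boring string" then "boring"
  else "something else"
  end
end

# Without an argument: each `when` expression is tested as true or false.
def without_argument(thingie)
  case
  when thingie.match(/\A\d+\z/) then "digits"
  when thingie.empty? then "empty"
  else "something else"
  end
end

# The "Bad" form: the result of match (MatchData or nil) is compared
# against thingie with ===, which is never true for a string.
def bad_form(thingie)
  case thingie
  when thingie.match(/\d/) then "matched"
  else "fell through"
  end
end

puts with_argument("123")     # digits
puts without_argument("123")  # digits
puts bad_form("123")          # fell through
```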