Writing regular expressions in the context of a new programming language

One of the things that is kind of annoying about regular expressions is that every programming language implements them slightly differently. If you can, find someone who can give you the low-down in this new language. Otherwise, you’ll have to stick with googling, which can take a while to figure out what you need. I’ll get you started with a few languages.

General Resources

Two of my favorite and most helpful resources:

  • RegExPlanet.com: lets you test regular expressions in many different languages.
  • Regex Cheat Sheet: has a pretty comprehensive general overview of regex syntax.

Regular Expressions in Ruby

One of the easiest ways to get started with regular expressions in Ruby is via Rubular.com. This site provides a way to test regular expressions against any text, as well as a quick cheat sheet to help. RegExPlanet.com also has a Ruby tester that is in beta.

Regular Expressions in Java

For help with Java, I really like using the tester at RegExPlanet.com. It does two really cool things.

  1. Different Java methods (apparently) use regular expressions differently. RegExPlanet.com shows if and how a regular expression will work with each of the methods.
  2. RegExPlanet.com also provides the ‘Java string’ for use in Java methods. In Java, we have to escape the backslashes with additional backslashes. This can get pretty confusing very quickly, so having RegExPlanet.comgenerate that string for me is very helpful.

Regular Expressions in JavaScript

Using regular expressions in JavaScript was where I really began using the pattern modifiers. If a regular expression is between two forward slashes, pattern modifiers are the letters that come after the last forward slash.

/hello/i

This regular expression will match the word ‘hello’ as well as capitalized ‘Hello’, and all caps ‘HELLO’. It will even match the super fancy ‘hElLo’, if you are into crazy stuff like that. The i modifier tells the regular expression to be case-insensitive.

Another really useful modifier for regular expressions in JavaScript is g, which stands for ‘global’. In JavaScript, regular expressions will find the first time it matches a pattern and then stop. Using the g modifier tells it to find ALL the instances where a string matches the given pattern.

Tips and Tricks: Useful Regular Expressions

I’ve learned a whole lot about regular expressions over the last year and a bit, and there are two ‘phrases’ for lack of a better term that are sort of like the nuclear option or like a blunt object, but will get me most of where I need to go.

Phrase 1: .*?

This is one of the most useful regular expression ‘phrases’ I’ve learned. It selects zero or more of any kind of character, including spaces, but isn’t greedy. (That just means that it’ll stop the first time it sees whatever is put after the question mark). The key limitation of this phrase is that it will not include new lines (though the exact implementation varies from language to language).

Phrase 2: .+?

Related to the first phrase, this one selects one or more of any kind of character, including spaces, but isn’t greedy. This is useful when something has to exist, but can be any number of things. Like the one before it, this (usually) doesn’t include new lines.

Phrase 3: [\s\S]*?

This ‘phrase’ is even more ‘powerful’ than the one above. This one selects zero or more of any kind of character, including all kinds of whitespace (yes, also new lines), but again, isn’t greedy. I used this one a LOT for a while, especially in Java.

Get creative

I was having trouble getting a regular expression to select an section of HTML that was broken up onto different lines irregularly 1. Finally I figured out that I wanted to use \s*? (zero or more whitespace characters, not greedy) to account for the different amounts of spacing, but not to pick up any letters or other characters.


  1. yes, I know regex is not the tool of choices for HTML, but my hands were a bit tied on this project