Lesson 103 - Pattern matching

Pages often mix dynamic and static parts. In our example, we're going to use a fictional movie theater's website. Here is the description of a few movies they are showing right now. The non-translatable parts are the title, the names of the actors and the name of the director.

Title: The Independents
Year: 2015
Starring: Ronald Russo, Kevin Spielberg, Holley Spacey, James D. Frail
Directed by: James Aranofsky
Genre: Drama, Comedy
Price: $10
Opening: 4/15
Showtimes: 10a, 1p, 4:15p
Summary: Joe and Jane are a couple who decide to go independent. There is just one problem: neither of them is actually employed in the first place. Hilarity ensues.

Title: Enemy at the Barn
Year: 2014
Starring: Julia Franco, René Willis, Amanda Frakes
Directed by: Jane Lemon
Genre: Horror
Price: $12
Opening: 4/17
Showtimes: 11:15a, 3p, 6p, 8:15p
Summary: Hannah and Therese decide to go on a vacation to a small farm in the middle of nowhere. But they are going to have to face the evil that is hiding in the barn, if they want to survive.

Title: Chocolately ever after
Year: 2015
Starring: Billy Piper, Tina Almberg, Randy Starr
Directed by: Istvan Unger
Genre: Romance, Comedy
Price: $14
Opening: 5/2
Showtimes: 9a, 12:15p, 6p, 8:30p
Summary: Tom is a successful businessman. Amy runs a small bakery that specializes in chocolate-covered pastries. One day their lives cross paths and they get married.

The process

To demonstrate this in practice, let's translate this page in Easyling. Create a new project with this lesson's URL (lesson103.tutorial.easyling.com), and scan the page. If you open it for translation, you will see something like this:


To fix this, we're going to add a rule at Dashboard - Advanced Settings - Pattern Matching. Here you can define regular expressions that exclude certain things from translation. For our particular case, this one will do:

(?:(?:Title|Starring|Directed by|Price|Opening|Showtimes|Year)\:)(.+)
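To get a feel for what this expression captures, here is a minimal sketch that runs it against a few lines from the sample page using Python's `re` module. This is only an illustration: Easyling's own regex engine and the way it applies the pattern to segments may differ, and the `excluded_part` helper is a hypothetical name introduced for this example.

```python
import re

# The pattern from the lesson, unchanged. The labels sit in a
# non-capturing group; the single capturing group is everything
# after the colon.
PATTERN = re.compile(
    r"(?:(?:Title|Starring|Directed by|Price|Opening|Showtimes|Year)\:)(.+)"
)

# A few segments taken from the sample page above.
SEGMENTS = [
    "Title: The Independents",
    "Year: 2015",
    "Directed by: James Aranofsky",
    "Genre: Drama, Comedy",
    "Price: $10",
]


def excluded_part(segment):
    """Return the text captured by group 1, or None if nothing matches.

    Hypothetical helper: it assumes the pattern is anchored at the
    start of the segment, which may not be how Easyling applies it.
    """
    match = PATTERN.match(segment)
    return match.group(1) if match else None


for segment in SEGMENTS:
    print(repr(segment), "->", repr(excluded_part(segment)))
```

Note that the captured group starts with the space that follows the colon (for example `' The Independents'`), and that `Genre` lines do not match at all, because `Genre` is not among the labels in the pattern.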

Note that tags within a given segment (e.g. the <b> tags used for formatting) are ignored during matching, but they are not stripped: they remain present in the translation, as you can see in the screenshot above.

Whitespace, however, is a different matter: it is preserved, so you have to make sure your regexes account for it. To further complicate things, HTML renders multiple consecutive whitespace characters as one by default (e.g. three spaces · · · will appear as a single one · ). Hence it is good practice to use the regular expression \s+ wherever whitespace is expected, as .+ or .* will not always catch it.
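The difference between `.+` and `\s+` can be sketched with Python's `re` module. In Python (as in most regex engines) the dot does not match a line break by default, so a segment whose source contains a newline after the label is missed by `.+` alone. The sample string and the two simplified patterns below are assumptions made for illustration; they are not taken from the Easyling configuration.

```python
import re

# Hypothetical source text: in the browser this renders as
# "Title: The Independents", but the underlying text contains
# a line break and several spaces after the colon.
raw = "Title:\n    The Independents"

# Simplified variants of the lesson's pattern, for illustration only.
dot_only = re.compile(r"Title:(.+)")
with_whitespace = re.compile(r"Title:\s+(.+)")

# "." cannot consume the newline, so this pattern does not match.
print(dot_only.match(raw))

# "\s+" consumes the newline and the indentation, then "(.+)"
# captures the title itself.
print(with_whitespace.match(raw).group(1))
```

The first pattern fails on the raw text even though the rendered page looks like a perfect match, which is why spelling out `\s+` is the safer habit.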

You can find an explanation of this particular regex here, in the External pattern tester we link to on the page (see below).

The next step is to clean out the existing translations. Updating the regular expression does not affect segments that already exist. Once they are deleted, however, re-scanning the page will hide the segments we do not want to translate, and they will not be counted in the statistics.

Open the page for translation again, and click the search bar at the top. Search for the pattern /.+/, which will select all the source segments. Click the hourglass icon on the left, which will give you a warning:

After deleting the segments, scan the page again. Based on our regular expression above, certain segments will be hidden and will not show up on the workbench, nor in the statistics: