Soft hyphen rendering problem in modern browsers (July 2011)

Problem description

In Unicode, there are two distinct hyphen characters: U+002D (the normal hyphen) and U+00AD (the soft hyphen). OpenType and TrueType fonts can assign the same glyph or two different glyphs to those characters, using the font's "cmap" table. User agents should always use the character code U+002D to render the normal hyphen (in situations where the hyphen occurs in regular text, such as in the phrase "mid-August"), and should use the soft hyphen code U+00AD when the soft hyphen should appear at the hyphenation point (at the end of a line). This is expected behavior.
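The expected behavior described above can be illustrated with a minimal sketch. It is not how any browser implements line breaking. It is a greedy line breaker written under simplifying assumptions: every character counts as one unit of width, words are separated by single spaces, and hyphenation points are already marked with U+00AD. The function name break_lines and the sample string are hypothetical.

```python
HYPHEN = "\u002d"       # normal hyphen, always rendered where it occurs
SOFT_HYPHEN = "\u00ad"  # soft hyphen, rendered only at a line break


def break_lines(text: str, width: int) -> list[str]:
    """Greedy line breaking that honors soft hyphens (illustrative only).

    Assumption: each character is one unit wide, including a visible
    soft hyphen at the end of a line.
    """
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        parts = word.split(SOFT_HYPHEN)
        while parts:
            prefix = current + " " if current else ""
            whole = "".join(parts)  # unused soft hyphens are dropped
            if len(prefix + whole) <= width:
                current = prefix + whole
                parts = []
                continue
            # Try the longest leading run of fragments that still fits
            # together with a visible U+00AD at the end of the line.
            for n in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:n]) + SOFT_HYPHEN
                if len(prefix + head) <= width:
                    lines.append(prefix + head)
                    current = ""
                    parts = parts[n:]
                    break
            else:
                if current:
                    # No hyphenation point fits: start a new line and retry.
                    lines.append(current)
                    current = ""
                else:
                    # Word is too long even for an empty line: let it overflow.
                    current = whole
                    parts = []
    if current:
        lines.append(current)
    return lines


sample = "mid-August sum\u00admer sol\u00adstice"
for line in break_lines(sample, 14):
    print(repr(line))
```

In this sketch the U+002D in "mid-August" is always kept, the soft hyphen in "summer" disappears because no break occurs there, and the one in "solstice" is emitted as U+00AD at the end of the line, so a renderer would look up the soft hyphen glyph for it.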

In most fonts available today, the hyphen character and the soft hyphen character either point to the same glyph or to two glyphs that are identical in width and appearance. However, there are legitimate reasons why a font developer might want to use a different design (or primarily, a different width) for those two characters.
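As a rough sketch of the distinction drawn above, the following compares the two mappings in a simplified stand-in for a font's character-to-glyph data. The Glyph class, the compare_hyphens function, and all glyph names and widths are hypothetical and are not read from any real font. The check also looks only at glyph identity and advance width, so it cannot tell whether two distinct glyphs look the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    name: str
    advance_width: int  # in font units (hypothetical values below)


def compare_hyphens(cmap: dict[int, Glyph]) -> str:
    """Classify how a font maps U+002D and U+00AD (illustrative only)."""
    hyphen = cmap.get(0x002D)
    soft = cmap.get(0x00AD)
    if hyphen is None or soft is None:
        return "missing"
    if hyphen.name == soft.name:
        return "same glyph"
    if hyphen.advance_width == soft.advance_width:
        return "different glyphs, same width"
    return "different glyphs, different widths"


# A typical font: both code points share one glyph.
typical = {0x002D: Glyph("hyphen", 300), 0x00AD: Glyph("hyphen", 300)}

# A font with a distinct, narrower soft hyphen (made-up widths).
distinct = {0x002D: Glyph("hyphen", 600), 0x00AD: Glyph("softhyphen", 200)}

print(compare_hyphens(typical))
print(compare_hyphens(distinct))
```

A browser that always substitutes the U+002D glyph at a break would render both fonts identically, which is only harmless in the first case.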

It appears that currently (July 2011), some browsers (Safari, Opera) do the right thing and use the soft hyphen glyph at the hyphenation point, while other browsers (Chrome, Firefox, IE) ignore the presence of the U+00AD character in the font and use the regular hyphen glyph at the hyphenation point. Especially when webfonts are used, this may lead to confusion and unexpected problems on the end user's side.

Font used in samples

The samples use a modified version of the Lato font by tyPoland Łukasz Dziedzic, available under the SIL OFL license. The modifications are the following: the hyphen glyph (U+002D) is drawn higher and has a larger advance width, and the soft hyphen glyph (U+00AD) is drawn lower and has an advance width that is narrower than the glyph's drawing. Also, both glyphs are visibly bolder than the rest of the typeface to make the effect visible.

You can download the font used in these samples: softhyphenbug.ttf.

Sample text

From an astronomical view, the equinoxes and solstices would be the middle of the respective seasons, but a variable seasonal lag means that the meteorological start of the season, which is based on average temperature patterns, occurs several weeks later than the start of the astronomical season. According to meteorologists, summer extends for the whole months of June, July, and August in the northern hemisphere and the whole months of December, January, and February in the southern hemisphere. This meteorological definition of summer also aligns with the commonly viewed notion of summer as the season with the longest (and warmest) days of the year (365 days), in which daylight predominates. The meteorological reckoning of seasons is used in Austria, Denmark and the former USSR; it is also used by many in the United Kingdom, where summer is thought of as extending from mid-May to mid-August. In Ireland, the summer months according to the national meteorological service, Met Éireann, are June, July and August. However, according to the Irish Calendar summer begins 1 May and ends 1 August. School textbooks in Ireland follow the cultural norm of summer commencing on 1 May rather than the meteorological definition of 1 June.

From the astronomical perspective, days continue to lengthen from equinox to solstice and summer days progressively shorten after the solstice, so meteorological summer encompasses the build-up to the longest day and a diminishing thereafter, with summer having many more hours of daylight than spring. Solstices and equinoxes are taken to mark the mid-points, not the beginnings, of the seasons. Midsummer takes place over the shortest night of the year, which is the summer solstice.

Browser rendering (screenshots)

Safari 5

Correct rendering on Safari 5.0.5 on Mac OS X 10.6.8: the hyphen character is used when the hyphen should be used, while the soft hyphen character is used where the soft hyphen should appear. It is correct that the soft hyphen character "sticks out" beyond the right margin: that confirms that Safari performs the rendering and full justification correctly. "Hanging hyphens" are in fact a common practice in high-end typography, though of course they would not be drawn as bold as in this sample :) .
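Why the soft hyphen hangs past the margin can be shown with simple arithmetic. Justification positions glyphs by their advance widths, so any ink drawn to the right of the advance width falls outside the text column. The sketch below assumes that model. The function name hanging_overhang and the numbers are hypothetical and are not measurements of the sample font.

```python
def hanging_overhang(advance_width: int, ink_right_edge: int) -> int:
    """Return how far a glyph's ink extends past its advance width.

    Both values are in font units, measured from the glyph origin.
    The result is 0 when the ink stays within the advance width.
    """
    return max(0, ink_right_edge - advance_width)


# Made-up values: a soft hyphen whose drawing is wider than its advance.
print(hanging_overhang(advance_width=200, ink_right_edge=500))
```

If the last glyph on a justified line has a non-zero overhang, that much of its drawing appears to the right of the margin, which is the effect visible in the Safari screenshot.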

Firefox 5

Incorrect rendering on Firefox 5.0 on Mac OS X 10.6.8: the hyphen character (U+002D) is used both when the hyphen and when the soft hyphen character should be used.

Chrome 12

Incorrect rendering on Chrome 12.0.742 on Mac OS X 10.6.8: the hyphen character (U+002D) is used both when the hyphen and when the soft hyphen character should be used. The same problem persists in Chrome 14.0.816 Canary and Chromium 14.0.816.

Internet Explorer 9

Incorrect rendering on Internet Explorer 9 on Windows 7: the hyphen character (U+002D) is used both when the hyphen and when the soft hyphen character should be used.

Internet Explorer 7

Incorrect rendering on Internet Explorer 7 on Windows XP: the hyphen character (U+002D) is used both when the hyphen and when the soft hyphen character should be used.

Opera 11.50

Correct rendering on Opera 11.50 on Mac OS X 10.6.8: the hyphen character (U+002D) is used when the hyphen should be used, and soft hyphen (U+00AD) is used when the soft hyphen should be used. However, Opera 11.50 seems to have strange problems with full justification when the text is hyphenated (the right edge of the text is not really well-aligned).

Also, Opera has another problem: after executing "Reload", the page is redrawn in such a way that all the portions of the soft hyphen glyph that extend beyond its right sidebearing get cut off. Switching to a different app and back "fixes" the display.

Credits

Samples created by Adam Twardoch. Samples use a modified version of the Lato font by tyPoland Łukasz Dziedzic, available under the SIL OFL license. Text sourced from Wikipedia. Hyphenation generated on-the-fly using Hyphenator.js.

Last modified: 2011-07-09.