Monday, May 14, 2012

It's All in the Timing

I'm not a timer, and I don't play one on TV, but in the course of doing editing and QC, I've picked up the basics of this deceptively simple and absolutely necessary part of the fansubbing process. As a result, I've become quite sensitive to bad timing and find myself correcting script timing more often than I would like.

Timing is the process of fitting subtitles to the spoken dialog. Subtitles should appear when a line begins and disappear when a line ends, more or less. (It's a bit more complicated than that.) Really good timers can time a script almost as fast as they can listen to it. For me, the process takes hours, and by the time I'm done, my wrists are really hurting from all the mouse clicks - a sure sign that I'm not doing it correctly.

When I started fansubbing back in 2006, the group I first joined believed in precise timing. Lines appeared precisely when speech began and disappeared precisely when speech ended. This didn't bother me, because I'm a fast reader, but after I started working with other teams, I learned that almost everyone added padding before (lead-in) and after (lead-out) lines, to allow more reading time. In addition, they closed up small gaps between adjacent lines (joining) so that one line didn't vanish an instant before the next appeared, an annoying visual pattern called flashing. Finally, every group tried to make sure that lines didn't spill over a scene change unnecessarily (scene bleed) or start just after, or end just before, a scene change (reverse scene bleed).

Thus, a timer's style can be defined by a relatively small number of parameters:
  1. How many frames of lead-in?
  2. How many frames of lead-out?
  3. How many frames allowed between lines for joining?
  4. How many frames should be added to pad to a scene boundary?
  5. What are the rules on scene bleeds?
Items 1-4 are strictly numeric and can be applied by rote; in fact, Aegisub - the most popular subtitling program - has a tool (the Timing Post-Processor) that will apply these parameters to a precisely timed script.
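
To make "applied by rote" concrete, here is a minimal Python sketch of how parameters 1-4 might be applied to a precisely timed script. It is not Aegisub's actual algorithm. The names (Line, apply_style, snap_to), the millisecond defaults, the choice to measure the joining gap after padding, and the choice to snap in both directions around a scene change are all assumptions made for illustration. It also assumes the lines are sorted and don't overlap.

```python
from dataclasses import dataclass


@dataclass
class Line:
    start: int  # milliseconds
    end: int    # milliseconds


def snap_to(t, scene_changes, threshold):
    """Return the nearest scene change within `threshold` ms of t, else t."""
    near = [sc for sc in scene_changes if abs(sc - t) <= threshold]
    return min(near, key=lambda sc: abs(sc - t)) if near else t


def apply_style(lines, scene_changes,
                lead_in=200, lead_out=300, join_gap=320, snap=250):
    """Pad, join, and snap a list of precisely timed lines (hypothetical helper)."""
    # 1 and 2: add lead-in and lead-out to every line.
    out = [Line(max(0, l.start - lead_in), l.end + lead_out) for l in lines]

    # 3: if the gap to the next line is small (or padding made the two
    # lines overlap), end the earlier line exactly where the next begins.
    for prev, nxt in zip(out, out[1:]):
        if nxt.start - prev.end <= join_gap:
            prev.end = nxt.start

    # 4: move starts and ends onto a nearby scene change, unless doing so
    # would leave the line with no duration.
    for l in out:
        s = snap_to(l.start, scene_changes, snap)
        e = snap_to(l.end, scene_changes, snap)
        if s < e:
            l.start, l.end = s, e
    return out


if __name__ == "__main__":
    script = [Line(1000, 2000), Line(2700, 3500), Line(6000, 7000)]
    for line in apply_style(script, scene_changes=[3900]):
        print(line)
```

Parameter 5 is deliberately left out: whether a line should bleed over a scene change depends on what is being said, which is a judgment call rather than a number.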

The only "controversy," and it's a rather mild one, is around the rules for scene bleeds. Clearly, if a line continues on for a substantial amount of time beyond a scene change, the subtitle has to remain on the screen. On the other hand, if the continuation is just an extension of the last syllable of the line, the subtitle should be cut off at the scene change. But what if there's a whole word, or even just a whole syllable, beyond the scene change? Opinions differ.

My "style" as a timer can be summed up as follows:
  1. Five frames for lead-in (200ms). Most timers use 200-300ms.
  2. Seven or eight frames for lead-out (300ms). Most timers use 300-500ms.
  3. Join if gap is less than or equal to eight frames (320ms). Most timers use 500-1000ms.
  4. Extend to scene boundary if less than or equal to six frames (250ms). Most timers use 300-500ms.
  5. Extend over a scene boundary if a word or significant syllable would be cut off. No real consensus on this one.
If you prefer the majority position on timing to mine, fear not: I've only timed 58 scripts in the last ten years, mostly for my Orphan Fansubs label, and that was 58 too many.
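
The millisecond figures in my list are rounded. As a rough check, here is a small Python sketch of the frame-to-millisecond conversion. The frame rate of 24000/1001 (about 23.976 fps) is an assumption, chosen because it is common for anime video, and frames_to_ms is a hypothetical helper name.

```python
from fractions import Fraction

# Assumed frame rate; a given video may use a different one.
FPS = Fraction(24000, 1001)


def frames_to_ms(frames, fps=FPS):
    """Convert a frame count to milliseconds, rounded to the nearest ms."""
    return round(frames * 1000 / fps)


if __name__ == "__main__":
    for label, frames in [("lead-in", 5), ("lead-out", 7), ("lead-out", 8),
                          ("join gap", 8), ("scene snap", 6)]:
        print(f"{label}: {frames} frames is about {frames_to_ms(frames)} ms")
```

At that rate one frame lasts a little under 42ms, so five frames is closer to 209ms than 200ms, and eight frames is about 334ms.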

In my opinion, timing doesn't get enough respect. Bad timing makes subtitle viewing really unpleasant. For example, almost all R1 subtitle scripts are badly timed, and I almost always prefer to watch fansubs instead. If you think timing is easy, give it a try. I think you'll start to understand why it's important, and why good timers are hard to find.

[Revised 15-Nov-2015]


  1. "For example, almost all R1 subtitle scripts are badly timed, and I almost always prefer to watch fansubs instead."

    Indeed, it wasn't until I started timing a lot of scripts in late 2008 with Sugar Sugar Rune that I realized just how poor R1 DVD subtitle timing was. Some of it is the format's fault -- DVD subs can do weird things if two consecutive lines end and begin on the same frame. Thus they have to insert a short gap of a frame or two, leading to that annoying "flash" effect. But when it comes to scene bleeds, they just don't care. I've heard reps at conventions pay lip service to not letting subtitles cross scene changes, but the actual discs tell a different story.

    I've got no idea which term is correct, but I've always called lines that start just before scene changes "inverse scene bleeds." Starting just after / ending just before is "flicker" in my book.

    One issue I didn't see addressed is total line length. Some groups like to use "fragmented timing," where subs contain very little text, are often onscreen for fractions of a second, and move to the next subtitle at every slight pause in the dialogue. Other groups use "agglomerated timing," where very long blocks of text stay onscreen for long periods of time (6+ sec), often across very noticeable pauses in dialogue.

    Neither of these extremes is good imo. Fragmented timing requires speed reading, constantly shifts viewer focus to changing subs, and leads to disjointed, overliteral writing that tries to line up with Japanese sentence structure. ("Your costume..." || "I'm going to go look for it.") Agglomerated timing gives away too much too soon from a dramatic standpoint, and is almost always accompanied by purple prose and insufficient horizontal margins.

    The best approach is to find a middle ground, which is why I try to keep my subs within the 2 to 5 second range.

    1. You make a really good point about line length. Fragmentation is bad practice, as is agglomeration across pauses. When there's no audible pause (rare), it's a matter of finding a breaking point that is logical in both the audio and the English.