Wednesday, February 1, 2017

Collectr's Curmudgeonly Guide to QC

When I took up fansubbing in 2006, most groups had strong QC teams and QC processes. The teams took pride in putting out a good product; revisions after release were looked on with distaste. That began to change in the later years of the decade. Groups began to compete on getting releases out quickly – the so-called "speed-subbing" phenomenon. Nothing shortens time-to-release more than trimming lengthy parts of the release process, like QC. gg was among the first to go this route, dispensing with QC altogether. Then, as simultaneous streaming took over, QC seemed less important, because the "official subs" were assumed to be decent (a dangerous assumption, it turned out). Old-line groups tried to retain a strong QC process, but in most groups the QC team atrophied. Recruiting QCs became more and more difficult.

QC is probably the least understood – and least appreciated – part of the fansubbing process. QC is all about finding mistakes, not fixing them.  If you do your work as a QC well, no one will notice; and if you do it poorly, everyone will blame you for the mistakes that got through. QC requires attention to detail, as well as selflessness that is rather rare these days. It's also an excellent way to learn and understand the fansubbing process in all its complexity.

What is QC?

QC is the process of reviewing a fansub for mistakes – in translation, timing, editing, typesetting, or encoding – and for possible improvements. There are two phases: script QC (SQC) and release QC (RQC). The former is focused on the script while it is still easily changed; the latter on the final, hopefully releasable episode. In each case, the QC's job is to write a report detailing the errors and suggested changes; it is not to change the script.

SQC is usually done in Aegisub, the ubiquitous tool for subtitling anime. Aegisub offers many advantages, including the ability to replay lines easily and to step frame-by-frame when necessary. It also has a built-in spelling checker and other helpful tools.

RQC can be done in Aegisub, but it is better done by watching an encoded and muxed file. This allows for checks that only apply to the released file: missing or incorrectly typed fonts; missing or incorrect chapters; random muxing mistakes that affect the video or audio.

Script QC (SQC)

Before starting, you will need the script, the encoded file, and the fonts used in the episode. Any unique fonts must be installed before invoking Aegisub. They can be deleted later, if you don't want your font folder to become unduly cluttered. It also helps to have an editing guide, which details the conventions to be used in the show. (See my blog entry on editing for information about compiling an editing guide.) If the translator or editor didn't supply one, you should compile one for yourself. This is particularly important for long series, where inter-episode consistency is easily lost, or for fansubs based on Crunchyroll scripts; CR is notorious for changing character names from script to script.

With all that in hand, it's time to fire up Aegisub and start looking for errors.

Translation Errors

Unless you know Japanese yourself, you are unlikely to find true translation errors, but even a non-speaker can spot certain issues:

  • Discrepancies in length. Sometimes a long Japanese line is translated as a very short English sentence. (The reverse happens as well, although it's less common.) Some compression is to be expected, particularly on conventional polite phrases, but significant length discrepancies may indicate that a phrase or clause has been dropped.
  • Inconsistent romanization of names. Japanese names with long vowels (Kōsaku) can be romanized either by adding extra letters (Kousaku) or by treating long vowels as normal vowels (Kosaku). Whichever is chosen, it needs to be applied consistently to all Japanese names.
  • Inconsistent honorifics. If the translation includes honorifics, then it needs to include them wherever they are present, and to exclude them when absent. It is easy to confuse honorifics with Japanese particles, e.g., to hear "-no" as "-dono."
  • Inconsistent character names. This is a particular hazard in long series.
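
The romanization check in the list above lends itself to a rough automated first pass. Here is a minimal Python sketch, not anything a group actually ships: it assumes you already have the dialog as a list of plain text lines, and it treats any capitalized word as a possible name, so it will throw false positives on ordinary English words. The function names and the vowel-collapsing rules are my own illustration.

```python
import re
from collections import defaultdict

def romanization_key(word):
    """Collapse common long-vowel spellings so that variants share one key."""
    key = word.lower()
    key = key.replace("ō", "o").replace("ū", "u")   # macron forms
    key = key.replace("ou", "o").replace("oo", "o")  # extra-letter forms
    key = key.replace("uu", "u")
    return key

def find_name_variants(lines):
    """Group capitalized words by key; report keys spelled more than one way."""
    groups = defaultdict(set)
    for text in lines:
        # Heuristic: a capitalized word might be a name. Honorific suffixes
        # such as "-san" are lowercase and are not picked up.
        for word in re.findall(r"[A-ZŌŪ][a-zōū]+", text):
            groups[romanization_key(word)].add(word)
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}
```

Fed a script that uses both "Kousaku" and "Kosaku", it reports the two spellings under one key. Deciding which spelling is right still belongs to the translator or the editing guide.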

Timing Errors

Timers can have different conventions for handling lead-in, lead-out, lines that cross scene boundaries, and so on (see this blog entry). You need to understand the timer's preferred style before flagging timing errors.

In checking timing, it is really helpful to have a keyframes file. Encoders for modern codecs, like H.264, do not necessarily put a keyframe at every scene change and may insert a keyframe in the middle of a long, static scene. A keyframes file provides a better (but not foolproof) indicator of where scene boundaries really are. There are batch scripts that will generate a keyframes file, if the encoder does not provide one.

While it is possible to check timing as you go, I usually make a separate pass, looking only at the audio display in Aegisub, to check timing. Issues to look for include:

  • Missing lead-in or lead-out. Unless a line abuts a scene boundary or another line, it should have both lead-in and lead-out.
  • Scene shortfalls. With certain exceptions, lines should not start or stop a few frames from a scene boundary. The timer should have a standard about how many frames after the start or before the end of a scene must be present. If the line violates these standards, it should be snapped to the appropriate scene boundary.
  • Scene bleeds. Sometimes, a line crosses a scene boundary by just a slight amount. The decision of whether to terminate the line at the scene boundary, or to continue into the next scene, depends on the timer's standards. Some timers cross the boundary if there's a full word in the next scene; others if there's a full syllable in the next scene.
  • Gap between adjacent lines. Two adjacent but separated lines must have a minimum time between them, as established by the timer. Otherwise, they should be joined by extending the lead-out of the first line and possibly the lead-in of the second.
  • Lead-out/lead-in balance between joined lines. When adjacent lines are joined, the balance between lead-out and lead-in can be tricky, particularly if the time spacing is short. If there's any spacing at all, there should be both lead-out and lead-in, even if below the normal minimums.
  • Song timing errors. After the first episode in a series, the song translations are simply cut and pasted from episode to episode. A line at the start or end may be missed. Changes in keyframes may result in scene bleeds. The songs need to be checked on every episode, a tedious process.

Timing checks are complicated by the issue of false keyframes. Sometimes, a keyframe gets generated when there is, in fact, no real scene change. Thus, every possible timing violation involving a keyframe has to be checked to see if the scene boundary is really there.
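
Some of the keyframe-related checks above can be pre-screened mechanically. The Python sketch below is only an illustration. It assumes line times and keyframe times are already available in milliseconds, and the thresholds (`min_gap_ms`, `near_ms`) are placeholders for whatever the timer's own standards are. It cannot tell a false keyframe from a real one, so every hit still has to be looked at.

```python
from dataclasses import dataclass

@dataclass
class Line:
    start: int  # start time in milliseconds
    end: int    # end time in milliseconds
    text: str

def check_timing(lines, keyframes_ms, min_gap_ms=250, near_ms=200):
    """Return (line_index, message) pairs for candidate timing problems."""
    issues = []
    ordered = sorted(lines, key=lambda ln: ln.start)
    for i, line in enumerate(ordered):
        # Scene shortfalls and bleeds: a start or end that sits close to a
        # keyframe without being exactly on it.
        for label, t in (("start", line.start), ("end", line.end)):
            for kf in keyframes_ms:
                delta = t - kf
                if delta != 0 and abs(delta) <= near_ms:
                    side = "after" if delta > 0 else "before"
                    issues.append(
                        (i, f"{label} is {abs(delta)} ms {side} keyframe at {kf} ms")
                    )
        # Small gap between adjacent lines: candidates for joining.
        if i + 1 < len(ordered):
            gap = ordered[i + 1].start - line.end
            if 0 < gap < min_gap_ms:
                issues.append((i, f"only {gap} ms before the next line"))
    return issues
```

The indexes refer to the lines sorted by start time. Missing lead-in and lead-out are not covered, because judging them requires the audio.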

Editing Errors

This is the largest category of checks, and includes spelling, grammar, punctuation, and style. Using tools can help to automate editing checks, but there is still a lot of staring and thinking that has to be done.

Automated Tools

Aegisub has a built-in spelling checker, but it gets tripped up by Japanese names and phrases, and of course by the romaji in songs, if included.

A different approach is to use the spelling and grammar checker in Microsoft Word.

  • Export the script as a plain text file.
  • Edit the text file to remove any songs and signs.
  • Join any sentences that are split across multiple lines into a single line.
  • Replace all line breaks (\N) with a space, and then replace any double spaces with a single space.
  • Save the edited file.
  • Load the edited text file into Word and press F7. 

Word's checker is far from perfect. In particular, it gets grumpy about incomplete expressions and messes up on some common clichés (for example, it doesn't like "It's all my fault"). All alleged spelling mistakes have to be looked at; when in doubt, check the word on Google. Nonetheless, Word will find subtle mistakes that often get missed by the eye, like repeated articles ("the the") and "its/it's" confusion.
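
Most of the export-and-clean steps above can be scripted instead of done by hand. The Python sketch below is one way to do it, under some loud assumptions: the script uses the standard ASS event layout (ten comma-separated fields, with the text last), and songs and signs can be recognized by hint words in their style names. The hint list is hypothetical and should be adjusted to the styles the script really uses. Joining sentences that are split across lines is left as a manual step.

```python
import re

# Hypothetical hints: any style whose name contains one of these is skipped.
SKIP_STYLE_HINTS = ("song", "sign", "karaoke")

def ass_to_plain_text(ass_text, skip_hints=SKIP_STYLE_HINTS):
    """Extract dialog text from an ASS script for an external spell checker."""
    out = []
    for raw in ass_text.splitlines():
        if not raw.startswith("Dialogue:"):
            continue
        # Assumed field order: Layer, Start, End, Style, Name, MarginL,
        # MarginR, MarginV, Effect, Text. The text may itself contain commas.
        fields = raw[len("Dialogue:"):].split(",", 9)
        if len(fields) < 10:
            continue
        style, text = fields[3].strip(), fields[9]
        if any(hint in style.lower() for hint in skip_hints):
            continue
        text = re.sub(r"\{[^}]*\}", "", text)   # drop override tags
        text = re.sub(r"\\[Nn]", " ", text)     # line breaks become spaces
        text = text.replace("\\h", " ")         # hard spaces become spaces
        text = re.sub(r" {2,}", " ", text).strip()
        if text:
            out.append(text)
    return "\n".join(out)
```

Save the returned text to a file and load that into Word as described above.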

Problems can arise with expressions that have multiple acceptable spellings, like "goodbye." Any of the accepted variants is fine, but they need to be used consistently. The same applies to "Um" vs "Umm," "Hm" vs "Hmm," and "Geez" vs "Jeez." Hyphenation can be tricky too. Some compound English words are now simply joined (like "heartbreak"); others are not. Again, when you have a concern, Google is your friend.
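
Consistency of variant spellings is easy to tally by machine. Here is a minimal Python sketch, assuming the dialog is available as a list of plain text lines. The variant groups are just the examples from the paragraph above, and a real list would be longer.

```python
import re
from collections import Counter

# Example groups only; extend to suit the show.
VARIANT_GROUPS = [
    ("goodbye", "good-bye", "good bye"),
    ("um", "umm"),
    ("hm", "hmm"),
    ("geez", "jeez"),
]

def variant_report(lines, groups=VARIANT_GROUPS):
    """Return counts for each group in which more than one variant is used."""
    text = "\n".join(lines).lower()
    report = []
    for group in groups:
        counts = Counter()
        for variant in group:
            # Match whole words only, so "um" is not counted inside "umm".
            pattern = r"(?<![\w-])" + re.escape(variant) + r"(?![\w-])"
            found = len(re.findall(pattern, text))
            if found:
                counts[variant] = found
        if len(counts) > 1:
            report.append(dict(counts))
    return report
```

A group shows up in the report only when the script mixes variants. Which variant wins is a call for the editor.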

Finally, a non-US spell checker will flag spellings that vary between US and UK usage, like "honor/honour." Most fansub groups use US spelling and grammar.

Grammar and Punctuation

English grammar and punctuation are very complicated, and you need to know the rules of the road. My blog on editing describes some of the trickier rules, but I stumble over new ones all the time. For example, plurals of acronyms and initialisms are made by simply adding an "s", e.g. "The ABCs of Love" rather than "The ABC's of Love." The most common problems seem to be:

  • Singular/plural agreement. Impersonal sentences are particularly troublesome.
  • Commas after "Then" or "So" or before "too" or in interjections beginning with "Oh." Both including the comma and omitting it are acceptable; including it is more formal, omitting it more conversational. Whatever choices are made, they need to be used consistently.
  • Commas in compound sentences (and not in compound predicates). A compound sentence (two independent clauses joined by "and" or "or") must have a comma before the conjunction. A compound predicate (one subject with two verbs, joined by "and" or "or") must not have a comma between the verbs. This rule is frequently violated in streaming scripts.
  • Subjunctive conjugation. The English subjunctive is a swamp and can result in some quite peculiar sounding phrases, e.g. "If he be…" I generally prefer to ignore subjunctive conjugations, but if one is used, it needs to be right.
  • Punctuation of quotations. US and UK usage differ here. In the US, a concluding comma or period is placed inside the closing quotation mark, while an exclamation point or question mark goes inside only if it belongs to the quoted material. In the UK, punctuation marks are generally placed outside the closing quotation mark unless they belong to the quoted material.
  • Overuse of ellipses. Don't get me started on this one.

Style

Style issues are really nebulous, and it's all too easy for a QC to turn into a "back-seat editor" (which will really tick off the editor, by the way). Still, there are style issues that the QC should look at and potentially flag:

  • Inconsistent use of contractions. Most anime dialog is conversational speech. In English, conversational speech uses contractions. Formal speech may be appropriate in some cases (for example, an elderly servant, a snooty ojou-sama), but the formal versus informal distinction needs to be consistent. Teenagers rarely speak formally, so their speech should use contractions.
  • "Will" versus "Shall." This is a particular instance of formal versus informal speech. The word "shall" rarely appears in US English conversation; its use is largely reserved for legal documents ("Congress shall make no law…"). The most common violation is "Shall we go?" In conversational speech, a person would say "Let's go, okay?" or "Should we go now?"
  • Impersonals. Japanese translations are often full of impersonal phrases: "It seems…" or "There are…" Overuse makes the dialog stilted.
  • Repeated words. If the same word appears in successive lines, it can be very jarring, unless the repetition is intended as reinforcement or is a quotation. "Just" gets thrown in way too often.

The list of potential style problems is endless; see my blog on editing for a more comprehensive discussion.
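
Of the style checks above, repeated words in successive lines is the one that is easiest to pre-screen. Here is a rough Python sketch, assuming the dialog is a list of plain text lines. The ignore list is a small hypothetical sample of words too common to be worth flagging. The script cannot tell deliberate reinforcement or quotation from accident, so treat its output as places to look, not as errors.

```python
import re

# Hypothetical sample of words that are too common to flag.
IGNORE = {
    "the", "a", "an", "and", "or", "but", "to", "of", "in", "on",
    "is", "it", "i", "you", "he", "she", "we", "they", "that", "this",
}

def repeated_words(lines, ignore=IGNORE):
    """Return (line_index, shared_words) where a line reuses a word from the line before it."""
    hits = []
    previous = set()
    for index, text in enumerate(lines):
        words = {w for w in re.findall(r"[a-z']+", text.lower()) if w not in ignore}
        shared = sorted(words & previous)
        if shared:
            hits.append((index, shared))
        previous = words
    return hits
```

Line indexes are zero-based and refer to the second line of each repeating pair.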

Typesetting Errors

Typesetting must be inspected visually. To do that correctly, all fonts used in the episode must be installed prior to running Aegisub. Common problems include:

  • Styling errors. If the script uses different styles for dialog versus thought, or present time versus flashbacks, each line must be checked for use of the correct style. Application of a "thought" style can be tricky if the character involved is not on-screen or is turned away from the viewer.
  • 3-liners. If a line is too long, it may occupy three lines instead of two. Alternatively, a 3-liner may be created if a two-line sub overlaps with another line. (Make sure you've installed the dialog font before flagging these kinds of errors.)
  • Italics errors in fonts without true italics. If a font lacks true italics, the subtitle renderer creates pseudo-italics by leaning the font to the right. This causes crowding between an italicized word and a subsequent non-italicized word. The typesetter must provide padding (e.g. {\i1}word{\i0\fscx130} {\fscx}word).
  • Crowding in fonts with true italics. Even with true italics, an italicized word that abuts an exclamation mark or question mark may look crowded. The typesetter must provide padding (e.g. {\i1}word{\i0\fscx30} {\fscx}!).
  • Sign/dialog overlap. Signs may occur in any part of the screen and can overlap the dialog. If the dialog is not assigned to a higher layer than the sign, the dialog will be "under" the sign. The dialog may need to be moved to the top of the screen in order not to conflict with the sign.
  • Incorrect start or end time. Every sign needs to be inspected for correct start and end time.
  • Missing signs. Sometimes, signs that seem germane may not be typeset. This may be a deliberate decision on the part of the translator or typesetter, or it may be inadvertent.

Encoding Errors

As part of SQC, the QC must actually watch the episode from end to end in order to check for mistakes in the video and audio. It's all too easy to skip from line to line, but in that case, errors between lines will be missed.

The SQC Report

The QC provides a written report of suggested changes back to the team. Comments can be sorted by translation, timing, editing, typesetting; if not sorted, then the comment needs to indicate who in the team needs to look at the issue:

TL:          line (with time references; simply cut and paste from the script)

For editing comments, where the QC has a suggestion to make, the comment can be:

Edit:       line
    suggested new line

Now, if there are a lot of changes, generating a comprehensive report may be really tedious. One shortcut I use is to "fix" a script as I go, save it under a new name, and then generate a "differences" report using Linux diff or Windows WinMerge. This differences report includes the old and new lines, with time references. It's then very easy to annotate each change with a rationale or a description of the underlying issue.
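
If neither diff nor WinMerge is handy, the same kind of report can be produced with a few lines of Python. This sketch swaps in the standard library's `difflib` for those tools. It assumes both scripts have been read into strings, and the OLD/NEW/WHY labels are my own invention, not a standard report format.

```python
import difflib

def differences_report(original_text, fixed_text):
    """List changed script lines as OLD/NEW pairs with a blank slot for the rationale."""
    old = original_text.splitlines()
    new = fixed_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        for line in old[i1:i2]:
            report.append("OLD: " + line)
        for line in new[j1:j2]:
            report.append("NEW: " + line)
        report.append("WHY: ")  # to be filled in by the QC
    return "\n".join(report)
```

Because each event line in the script carries its own start and end times, the time references come along for free.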

Release QC (RQC)

RQC differs from SQC in two significant ways. First, it is done on a finished file, rather than by using Aegisub. Second, it flags only grievous errors, such as missing fonts, bad chapters, and so on. I've already described my release checking process in this blog entry, so I won't repeat the detailed checklist. For comprehensive checking, you will need the final script as well as the finished episode. Here are a few of the more critical steps in RQC:

  • Load the final script into Aegisub and use the "Font Collector" feature to compile a list of required fonts. Check for errors (missing fonts, missing glyphs in fonts). Note that lack of italics or a bold font variant is not a fatal problem; the subtitle renderer compensates.
  • Use Linux diff or Windows WinMerge to compare the initial and final scripts. Check that all changes were done correctly, e.g. with proper spelling.
  • Use "mkvmerge -i" to get a list of fonts attached to the file. Make sure that every font has the correct MIME type (application/x-truetype-font). Check that all fonts are included. Check that the chapter file is included (if the episode is chaptered).
  • Spot check the episode. Check that the correct script was muxed in. Check that tracks are properly labeled. If chapters are included, check that the chapter timing points are correct.
  • Play the episode from end to end. Pay particular attention to songs and signs, and look for any encoding problems. If there are multiple scripts (for example, honorifics and no honorifics), you will have to watch the episode twice (gag).

While it's possible to provide editing suggestions during RQC, you should not expect them to be followed. RQC is about real mistakes, not differences of style or opinion.
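
The attachment check in the list above can be scripted once the output of "mkvmerge -i" has been captured as text. The Python sketch below assumes that attachments are reported on lines of the form `Attachment ID 1: type '...', size ... bytes, file name '...'` and that chapters appear on a line starting with `Chapters:`. Verify both against the output of your own mkvmerge version before trusting it. The list of required font file names is something you supply, for example from Aegisub's Font Collector.

```python
import re

# Assumed layout of an attachment line in the identification output.
ATTACHMENT_RE = re.compile(
    r"^Attachment ID \d+: type '([^']*)', size \d+ bytes, file name '(.*)'$"
)

def check_attachments(identify_output, required_fonts,
                      expected_mime="application/x-truetype-font",
                      expect_chapters=True):
    """Return a list of problems found in captured 'mkvmerge -i' output."""
    problems = []
    attached = set()
    has_chapters = False
    for line in identify_output.splitlines():
        line = line.strip()
        match = ATTACHMENT_RE.match(line)
        if match:
            mime, name = match.groups()
            attached.add(name.lower())
            if mime != expected_mime:
                problems.append(f"{name}: MIME type is {mime}")
        elif line.startswith("Chapters:"):
            has_chapters = True
    for font in required_fonts:
        if font.lower() not in attached:
            problems.append(f"missing font: {font}")
    if expect_chapters and not has_chapters:
        problems.append("no chapters found")
    return problems
```

An empty list means the file passed these particular checks, and nothing more. It is no substitute for playing the episode from end to end.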

Life after QC

When I was a QC, I couldn't wait to "graduate" to a more creative position, editing in particular. Over time, I've added a limited ability to typeset and time to my skill set. However, I still do QC, particularly for other teams. I find that QC is a great way to avoid getting "boxed in" by my own habits.  I get to see how other editors and typesetters work, and I always learn from them. It also builds up goodwill, which I can draw on if I run into thorny issues in Orphan.

So whether you want to do QC forever or view it as an entry point into fansubbing, give it a try. The mistakes are all there, waiting to be found.
