Potter of Babble has been running a bit slowly the past few months. One of the reasons is that the original Spellman Spectrum got out of whack. When it was first designed, it made use of only a few dozen data points per translation. The number of data points has since multiplied, though, and now reaches the triple digits. The model’s calculation became lopsided and no longer seemed to reflect whether a translation was more rigid or loose, as the scale was intended to do.

So we’ve spent some time working on a Spellman Spectrum v2.0. That model aims to be more firmly rooted in data science concepts, but conceptualizing a workable model has taken much longer than anticipated.

As a stopgap to keep things going, I’ve tinkered with the original model, which I’m calling v1.1. Like the original model, it relies largely on an evaluation of various weighted data points. Here’s a bit more about how model v1.1 works (a rough code sketch of the whole pipeline follows the list):

  1. It’s no longer dependent on the number of data points. Whether a translation is 10% evaluated or 90% evaluated, the scale should work. Nor should the raw count of data points matter, whether 50 or 1500.
  2. It’s bound to a sigmoidal function. A basic mechanic of supervised machine learning, the sigmoid is a great tool for tailoring a model to a shifting data set, which is exactly our situation while building model v2.0. It also creates greater distinction in the middle of the scale, where most of the translations lie: a book that previously rated 45 might now be closer to 40, while a book that previously rated 55 might be closer to 65. This leads to a greater spread throughout the scale.
  3. It now uses aggregate weighting. All data points in a category are totaled before the category’s weight is applied. This lets us collect as much data as possible in one weighted category even when a representative amount is unavailable for another, which in turn means a more finely tuned score.
  4. It accounts for having too little data in a high-ranking category. At the time of this post, most of the evaluated translations have 5 or fewer data points in the highest-ranking category, too small a sample to be considered representative. The scale automatically adjusts a category’s weighting if its data falls below a representative threshold.
  5. It changes how some translations rank. The Maori translation was previously the top-rated, in part because it nativizes so many of the low-ranking data points. With the weighting mechanisms adjusted, Maori still ranks very high, but below a few other translations. Meanwhile, the Brazilian Portuguese rating stayed virtually unchanged at 56.3, while the Lusitanian Portuguese rating declined from 57 to 40.

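To make the mechanics above concrete, here is a minimal sketch of a v1.1-style score in Python. The category names, weights, representative threshold, and sigmoid parameters are all hypothetical stand-ins (the real v1.1 values aren’t published here); the point is only to show aggregate weighting, the threshold adjustment, and the sigmoid working together.

```python
import math

# Hypothetical category weights; the real v1.1 weights aren't published.
CATEGORY_WEIGHTS = {"low": 1.0, "mid": 2.0, "high": 4.0}

# Assumed minimum number of data points for a category's weight to count in full.
REPRESENTATIVE_THRESHOLD = 10

def spectrum_score(data_points, midpoint=0.5, steepness=10.0):
    """Sketch of a v1.1-style score.

    `data_points` maps each category to a list of 0-1 "looseness" ratings.
    Returns a score on a 0-100 scale.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for category, ratings in data_points.items():
        if not ratings:
            continue
        # Aggregate weighting: total the category first, then normalize by
        # its count, so the score doesn't depend on how many points exist.
        category_mean = sum(ratings) / len(ratings)
        weight = CATEGORY_WEIGHTS[category]
        # Scale down the weight of an under-sampled category.
        if len(ratings) < REPRESENTATIVE_THRESHOLD:
            weight *= len(ratings) / REPRESENTATIVE_THRESHOLD
        weighted_sum += weight * category_mean
        weight_total += weight
    if weight_total == 0:
        return 50.0  # no data at all: fall back to the midpoint
    # The raw score is a weighted average, so 10% evaluated and 90% evaluated
    # land on the same footing.
    raw = weighted_sum / weight_total
    # Sigmoid: spreads the crowded middle of the scale apart while
    # compressing the extremes, then maps the result onto 0-100.
    return 100.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))
```

Because the raw score is a weighted average of per-category averages, adding more data points (or evaluating a larger share of a translation) sharpens the inputs without shifting the scale itself, which is what point 1 above is after.
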
The model still has its problems, which is why the work doesn’t stop at v1.1. One drawback of this version is that the highest-rated translations on the scale don’t have a good spread. French and Norwegian rate 96.6 and 99.4, respectively. Not only does that quantitative proximity mask the qualitative differences between them, but the scores’ closeness to 100 misleadingly suggests that you can’t get much more creative than the Norwegian translation. This is a side effect of the transformations made to the sigmoidal function, which sacrificed differentiation on the upper end of the scale for greater differentiation in the middle.
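
To see that compression concretely, take the same hypothetical sigmoid from the sketch above: an identical gap in the raw score is worth roughly 24 points in the middle of the scale but fewer than 5 near the top.

```python
import math

def to_scale(raw, midpoint=0.5, steepness=10.0):
    return 100.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))

for raw in (0.45, 0.55, 0.75, 0.85):
    print(f"raw {raw:.2f} -> scale {to_scale(raw):.1f}")
# raw 0.45 -> scale 37.8   } a 0.10 gap in the middle spans ~24 points
# raw 0.55 -> scale 62.2   }
# raw 0.75 -> scale 92.4   } the same 0.10 gap near the top spans ~5
# raw 0.85 -> scale 97.1   }
```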

On the other hand, knowing, in a superlative sense, that French and Norwegian are among the most innovative translations is enough for most Harry Potter translation enthusiasts. It’s in the middle of the scale where the differences between translations are most interesting.