Of Time Leaks and Infinite Playback

Part of my work as a postdoc within the Yale Haskell Group in 2014 involved refining the playback implementation for the Euterpea library. As of Euterpea 2.0, the default MIDI playback implementation, play, supports sequentially infinite musical structures, but at the expense of allowing a time leak. This post will explain how that situation came to be and why the time leak is not actually such a bad thing.

History

Haskell is a lazy language, which means that expressions are only evaluated when they are needed. You can define an infinite structure and traverse it successfully, although, in practice, you will not be able to explore all of it. For example, consider a classic GHCi demonstration of this property:

let x = 1 : x
x

Behold! An infinite stream of 1s printing to the screen until you press Ctrl+C to stop it. This property is quite powerful and has application to music. A simple example would involve repeating patterns, but we may also wish to do something like sonify the digits of pi as pitch classes, producing an infinite while non-repeating series. These are structures that Euterpea allows users to define.

For many years, Euterpea did not support playback of infinitely long musical structures. Users could define them, but had to decide in advance how much they wanted to hear in playback. This was the case for the first 6 years I spent working with Paul Hudak on Euterpea at Yale. Then, one day, there was a conversation that went roughly this:

Paul: We should be able to play infinite values in Euterpea.
Me: Why? We only ever listen to finite things.
Paul: But it should be possible.
Me: It will hurt performance.
Paul: But it should be possible.

This futile debate continued for weeks. Paul wanted ideological purity: we can define and stream infinite lists in Haskell, and so we should be able to do the same with other infinite sequential structures. In contrast, I wanted optimized performance: there is no way to safely ensure accurate timing playback on arbitrary music values (more on this to follow if you don’t believe me now).

Eventually, Paul and I agreed to support both things separately, since there is no viable middle ground that allows great performance and infinite values at the same time. Something interesting also emerged: between arguments of purity and performance, it became clear that it’s actually safer to use an infinite playback strategy if you aren’t sure how long it takes to compute an arbitrary music value. For this reason, the infinite playback implementation became Euterpea’s default play function, while adding a timing-strict playback version (playS) for finite values. With these two functions in place, everyone was happy…for the most part.

The Infinite Playback Algorithm

Euterpea converts Music values to event streams before playback at the MIDI level. This is possible to do lazily on sequentially infinite values, or pieces of music that go on forever. Infinite parallelism is not supported and will just hang, but it also simply makes no sense in MIDI-level music (a point on which Paul and I agreed!). So, from this point on, “infinite” means sequentially infinite.

A musical event can thought of as a pair of two values: a timestamp, t, since the last event, and a note on or off event, n. In pseudocode, Euterpea 2’s play algorithm is roughly this:

for each event (t,n) in the event list:
    sleep for t amount of time
    send n to the MIDI output device

If time measurement was exact, this algorithm would suffer no problems. However, time measurement is not exact. When a thread sleeps for some amount of time, there is the possibility that it can be woken up just a bit late if some higher-priority process hogs the CPU. If we lose a few nanoseconds to such an occurance, that time is gone for good. We don’t get it back; everything else in the sequence ends getting permanently shifted back in time. This type of behavior is often referred to as a time leak: the observed sequence of events will, given enough time, slide slowly out of sync with the ideal sequence of events. Similarly, because the construction of the event list is lazy, we may hear an unintended pause the middle of the music if the computer hits a bubble of time-intensive computation in the music value.

Clearly these time-leak-induced behaviors are not good if we want precise timing! But…is it always bad? As it turns out, there are situations in which it’s quite a beneficial behavior for reasons of safety. Let’s take a look at why that is by addressing some common complaints with the strategy.

Why not make it strict?

Strictness is the opposite of laziness: it involves computing everything in an expression before it is used. This is simply not possible while accommodating infinite sequential structure. Fully evaluating something before its use is fundamentally at odds with the notion of streaming an infinite value to an output device. Playback of arbitrary infinite structures must be lazy to succeed.

Why not simply check the current time and compensate for unanticipated delays?

Although appealing in concept, it is actually unsafe to do this lazily. We don’t even need an infinite value to see why this is problematic, just a finite value that takes a long time to compute.

Suppose we try to compensate for missed time by shifting remaining events forward (subtracting the delay from their t) to stay on time. Let’s say we have a melody that consists of 240 sixteenth notes at 120bpm. The duration of this melody is:

(240 sixteenths)*(1 beat / 4 sixteenths) *(1 min/120 beats)*(60sec/min) = 30 seconds

Now, suppose it takes 31 seconds to compute the very first note, but computation of the others follows swiftly. This is not a contrived scenario. In fact, it is the exact pattern Kulitta exhibits when working with chord spaces: it takes a long time to set up structures needed to compute any notes, so there is a long wait on the first note and then the others arrive immediately.

If we try to do lazy playback, which is required for infinite values, evaluation of the first note will start when we ask for playback to start. And yet, in our hypothetical scenario, it can’t arrive until 31 seconds into the playback. The rest of the music is now 31 seconds late because of lazy evaluation. If we try to adjust subsequent time stamps to compensate for this, we will end up sending out all remaining events with a timestamp of 0 (meaning “now”). The result will be an onslaught of 231 note-on events to whatever MIDI output device is in use, all close enough to sound simultaneous. Playback will consist of 31 seconds of silence, and then BAM! All the notes at once.

This scenario doesn’t just result in what I refer to as note avalanches, or the messy sound of far too many notes happening at once. More importantly, it can actually crash applications and synthesizers! Crashes are typically due to polyphony limits (often 127 – well under the 231 of our example above) and buffering problems when sending messages between devices. This means that bad behaviors resulting from this scenario are often beyond Euterpea’s control. The bottom line is that trying to shift things forward in time is not a safe option when arbitrarily long computation times are involved.

On a historical side note, Euterpea’s older playback algorithm also suffered from note avalanches, despite being constrained to finite values. The design was not very good: it would still ask to pipe out the first value as it was computing everything lazily, but there was a step in the computation that ended up traversing the entire value before it could be played (hence the old requirement for finiteness). On top of that, the old implementation would also try to make up for lost time. As a result, the timing for the first two notes often sounded wrong on simple values and, on more complex values such as those from Kulitta, crashes were abundant. And this was just with finite values! Euterpea’s current implementations are a big improvement over this problematic older approach.

Why not use a buffer?

No degree of buffering will 100% remove the unintentional pause and note avalanche issues, because there is no way to predict what buffer size is needed without examining the music value first (which, again, is at odds with lazy evaluation). If you know something about the structure of the music value, such as if you want to play a repeating finite unit, buffering will absolutely work – but it is an application specific tactic. Even if we buffer some amount of music lazily, we still risk an observable time leak for time-consuming computations that exceed the buffer’s capacity to compensate. So, it is not a sufficiently general solution for playback of arbitrary music values.

Surely there must be some way around this!

Unfortunately, there isn’t, not while accommodating truly arbitrary music values and computation times. You will always end up with one of two behaviors:

A time leak (safe but imperfect performance)
The risk of a note avalance (unsafe on hard-to-compute values)

The only way to find a balance between time leaks and crashes is to impose constraints on the music values. And so…

There is no way to safely ensure accurate timing playback on arbitrary music values.

The Solution

There is really quite a simple solution to all this: accept that purity and performance don’t always intersect and stop trying to make everyone happy with a single function. Instead, it makes more sense to address playback functionality by common use cases.

Use case 1: casual playback of an arbitrary music structure in isolation. In other words, we just want to know sufficiently well what something sounds like and don’t care about things like synchronizing to external time keepers. For a musical analogy, consider it like asking a skilled musician to sightread a never-before-seen score. Euterpea 2’s play function is well-suited to this. First and foremost, it is safe from note avalanches while working on infinite sequential values. In practice, the timing is also just fine to the ear on the vast majority of cases I have heard. Although the time leak will always be there, it is typically too tiny to be audible and therefore does not affect casual listening. If you happen to hit a case where the playback is obviously laggy, then you know you’ve got heavy computation going on (without crashing) and may want to move to the second case instead. It’s an easy switch for finite values and, for infinite ones, simply involves using cut to listen to a finite portion.

Use case 2: strict-timing playback of a finite music structure in isolation. Because the structure is finite, we can precompute the MIDI event sequence (which may take a long time) and avoid hitting bubbles of large computation that cause note avalanches. Playback begins only after everything has been evaluated. This is what Euterpea’s playS (short for “play strict”) does, and it is safe from laziness-induced note avalanches as a result. For another musical analogy, this version is like letting the musician memorize the piece before playing it.

Use case 3: synchronization of streams to external timers. This scenario is fundamentally problematic for infinite values, since it requires shifting things forward in time to compensate for delays. For infinite cases, there are two options: use repeating sequences of finite values or ensure that the computation between events is minimal. Either way, and even when considering finite values, it is really too broad an area to provide any sort of prepackaged support for it in Euterpea. The situation requires potentially substantial constraints over the music values and will also involve application and/or device-specific synchronization methods. For hardware MIDI devices, stream synchronization is often accomplished through use of a separate cable that sends voltage pulses. A similar approach at the software level can be accomplished through functional reactive programming, such as with the UISF library’s event streams.

And so, everyone can be happy! …Except for the people in use case 3, who still have to do a bunch of extra coding.