What makes a piece of music sound good? What makes the pitches that we use work? What makes something consonant or dissonant? These are some interesting questions that I will explore in this post.

In my earlier post on The Physics of Sound, I discussed how sound can be understood as a wave, and how loudness, pitch and timbre are defined. In this post, I delve further into music itself. I will assume that readers have basic knowledge of keys, octaves and major/minor scales.

Consonance and Dissonance

Let’s begin with a question: which key sounds the most like middle C? Most, I suspect, would answer with either “treble C” or “tenor C”. After all, many will already be familiar with the idea that the same key in another octave sounds very similar, even though it’s an entirely different key. Some might even answer with “middle C” itself, which is a perfectly valid answer, however painfully obvious it might sound.

In musical terms, 2 voices (I shall use voices to refer to instruments) playing the same key is known as a unison or prime. Unisons and octaves are the most consonant intervals.

Now, let’s follow through in this direction: what’s the next most similar key to C, without considering the octave? If you’re familiar with the major key, you might reply with either F or G. Indeed, most would agree that the perfect fourth and perfect fifth are the second most consonant intervals.

The question is, what makes these intervals consonant to begin with? How do we define consonance?

Harmonic Series

I would like to emphasise from here on that the theories discussed are human attempts at developing some formal structure with mathematics and physics to understand music; they are not meant to be understood as the truth behind it. However, these theories have been well-verified and very effective in the development of music.

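From this section on, I will label keys with octave numbers in the style of scientific pitch notation, using the convention tenor C = C4, middle C = C5 and treble C = C6. Note that standard scientific pitch notation numbers the octaves one lower (middle C = C4). I will also expand upon a few points made in my previous post, The Physics of Sound: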

Adding an octave to the pitch is the same as multiplying the frequency by 2.
- Adding two octaves is the same as multiplying the frequency by 4, since we are doubling the frequency twice.
- Subtracting an octave is the same as dividing the frequency by 2.
Our ears are more used to hearing lower harmonics than higher harmonics.
- I will make a claim here, that consonance is the natural and dissonance is the unnatural.
- Consequently, consonance comes from lower harmonics, and dissonance comes from higher harmonics. This should not be taken to mean that higher pitches are more dissonant.

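The lowest harmonics are the fundamental (unison) and the first harmonic (octave), so they are the most natural, and hence the most consonant. Note that I count harmonics from above the fundamental, so the Nth harmonic here has N+1 times the fundamental's frequency (elsewhere this is often called the Nth overtone).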

If the fundamental is C5, what is the first harmonic? Trivially, this is just multiplying the frequency by 2, the same as adding one octave, which gives us C6. The more interesting question would be: what is the second harmonic? We need to find the key with 3 times the frequency of C5.

Amazingly, the result is approximately G6, which is the compound fifth above C5. Shifting the result to octave 5, we get G5, the perfect fifth.

We can ask the same question, but from the opposite perspective. Instead of taking C5 to be the fundamental and asking what the harmonic is, what if we take C5 to be the harmonic and ask what the fundamental is?

If C5 is the first harmonic, what is the fundamental? Also trivially, this is just one octave lower, C4. Now, if C5 is the second harmonic, what is the fundamental? We need to find the key with 1/3 times the frequency of C5 (one third of its frequency).

We get a result approximately equal to F3, which is the compound fifth below C5. Shifting the result to octave 5, we get F5, the perfect fourth.

We managed to obtain the perfect fifth and the perfect fourth, but for different reasons. We got the perfect fifth by multiplying the frequency, and the perfect fourth by dividing it. From here on, we can consider the results from multiplication separately from those from division; multiplication will give us the overtone series, and division will give us the undertone series.

Overtone Series

The Overtone Series refers to the pitches we get when we fix the fundamental as the tonic and find the harmonics. Physically, these pitches can be understood as tones that occur naturally in the tonic. In this section, we shall take the tonic to be C5.

Notice that in order to lower a harmonic into the same octave as the tonic, we need to repeatedly divide its frequency by 2 until we end up with a frequency fraction between 1 (inclusive) and 2 (exclusive). In the case of G6, we lowered its frequency until it was 3/2 times that of C5, giving us G5.

Let’s now look at the third harmonic. This has 4 times the frequency of C5, which means it is 2 octaves higher (C7). By lowering it down to octave 5, we get C5, which means that it is equivalent to the tonic. Hence, this harmonic is redundant.

Notice that in general, all even multiples of the frequency are redundant. The harmonic with 2N times the frequency of the fundamental is the same key as the harmonic with N times the frequency.

The fourth harmonic is more interesting, with 5 times the frequency. We need to repeatedly divide by 2 until the result is between 1 and 2, which gives us 5/4 times the frequency. This is closest to the key E5, the major third.

We can ignore the fifth harmonic as it has 6 times the frequency, an even multiple that reduces to the same key as 3 times the frequency (G).

We can continue to find harmonics this way, and the further we go, the more dissonant they become. Below are the frequency fractions we get for the fundamental and first 31 harmonics, in increasing order of dissonance, with redundant fractions ignored:

$1,\dfrac{3}{2},\dfrac{5}{4},\dfrac{7}{4},\dfrac{9}{8},\dfrac{11}{8},\dfrac{13}{8},\dfrac{15}{8},\dfrac{17}{16},\dfrac{19}{16},\dfrac{21}{16},\dfrac{23}{16},\dfrac{25}{16},\dfrac{27}{16},\dfrac{29}{16},\dfrac{31}{16}$
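The octave-reduction procedure described above can be sketched in Python. This is only an illustration under my own assumptions: the function names `octave_reduce` and `overtone_fractions` are made up for this example, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def octave_reduce(ratio):
    """Divide by 2 until the frequency fraction lies in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    return ratio

def overtone_fractions(max_multiple):
    """Octave-reduced fractions for the odd frequency multiples up to max_multiple.

    Even multiples are skipped because they reduce to a key already found.
    """
    return [octave_reduce(k) for k in range(1, max_multiple + 1, 2)]

# Frequency multiples 1 to 32, i.e. the fundamental and the first 31 harmonics
print(", ".join(str(f) for f in overtone_fractions(32)))
```

Running this prints the same sixteen fractions as listed above, starting with 1, 3/2, 5/4, 7/4 and ending with 31/16.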

Below is a plot showing the frequency of each harmonic with its corresponding pitch. The colour and thickness of the dotted line indicate the consonance of that harmonic; redder and thicker lines when more consonant, bluer and thinner lines when more dissonant.

Below is a table with the closest key to each fraction, as well as the approximation error which will be explained later:

Undertone Series

The Undertone Series refers to the pitches we get when we keep each harmonic fixed and find the fundamental. Physically, these pitches can be understood as tones that the tonic occurs naturally in. Here, we set the harmonic as C5.

In the overtone series, we lowered harmonics to the same octave as the tonic. Here, we will raise each fundamental to the octave below the tonic instead. We do so by repeatedly multiplying its frequency by 2, until we end up with a frequency fraction between 1/2 (exclusive) and 1 (inclusive). In the case of F3, we shall increase its frequency until it is 2/3 times that of C5, giving us F4.

Each fundamental for which the tonic is an even multiple of its frequency is redundant. The fundamental with 1/(2N) times the tonic's frequency is the same key as the fundamental with 1/N times its frequency.

The fundamental when C5 is the fourth harmonic has 1/5 times the frequency. Raising this to the current octave results in 4/5 times the frequency, closest to the key A♭4, the minor sixth.

Again, we can follow through with this procedure to find fundamentals, in order of increasing dissonance. Below are the frequency fractions we get by taking the tonic to be the fundamental and first 31 harmonics, in increasing order of dissonance, with redundant fractions ignored:

$1,\dfrac{2}{3},\dfrac{4}{5},\dfrac{4}{7},\dfrac{8}{9},\dfrac{8}{11},\dfrac{8}{13},\dfrac{8}{15},\dfrac{16}{17},\dfrac{16}{19},\dfrac{16}{21},\dfrac{16}{23},\dfrac{16}{25},\dfrac{16}{27},\dfrac{16}{29},\dfrac{16}{31}$

Notice that they are reciprocals of the fractions in the overtone series.
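The mirrored procedure for the undertone series can be sketched the same way. Again, this is a hedged illustration: `raise_into_lower_octave` is a hypothetical helper name, not something from the original study.

```python
from fractions import Fraction

def raise_into_lower_octave(ratio):
    """Multiply by 2 until the frequency fraction lies in (1/2, 1]."""
    ratio = Fraction(ratio)
    while ratio <= Fraction(1, 2):
        ratio *= 2
    return ratio

# Fundamentals for which the tonic is an odd multiple (1, 3, 5, ..., 31) of their frequency
undertones = [raise_into_lower_octave(Fraction(1, k)) for k in range(1, 32, 2)]
print(", ".join(str(u) for u in undertones))

# Each reciprocal falls back into [1, 2), i.e. it is an overtone-series fraction
assert all(1 <= 1 / u < 2 for u in undertones)
```

The final assertion is a quick check of the reciprocal relationship noted above.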

Below is a plot showing the frequency of each fundamental with its corresponding pitch. The colour and thickness of the dotted line indicate the consonance of that fundamental; redder and thicker lines when more consonant, bluer and thinner lines when more dissonant.

Also, the table with the closest key and error:

Notice that the errors here are the negatives of the errors in the overtone series.

Full Series

It is not difficult to imagine the Overtone Series and Undertone Series as just mirror images of each other. In the below plot, the two series are combined. Notice that all horizontal dotted lines above the one at C5 are mirror images of the ones below.

Also notice that for the frequency fractions, if its numerator is odd and its denominator is a power of 2, it comes from the overtone series. Conversely, if its numerator is a power of 2 and its denominator is odd, it comes from the undertone series.

We can also list out all intervals in order of increasing dissonance. Note that these are not exact because approximation errors exist:

Unison/Octave
Perfect Fifth, Perfect Fourth
Major Third, Minor Sixth
Minor Seventh, Major Second
Tritone
Major Seventh, Minor Second
Minor Third, Major Sixth
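
One way to arrive at this ordering is to walk through the odd frequency multiples in increasing order and record the first time each 12-TET interval shows up, once from the overtone side and once from the undertone side. The sketch below assumes exactly that rule; `rank_intervals` and `INTERVAL_NAMES` are names invented for this example.

```python
import math

INTERVAL_NAMES = [
    "Unison/Octave", "Minor Second", "Major Second", "Minor Third",
    "Major Third", "Perfect Fourth", "Tritone", "Perfect Fifth",
    "Minor Sixth", "Major Sixth", "Minor Seventh", "Major Seventh",
]

def rank_intervals(max_multiple=31):
    """Group 12-TET intervals by the first odd frequency multiple that lands on them."""
    seen = set()
    ranking = []
    for k in range(1, max_multiple + 1, 2):
        semitones = round(12 * math.log2(k))      # nearest key, in semitones above the tonic
        overtone = semitones % 12                 # interval obtained by multiplying by k
        undertone = -semitones % 12               # interval obtained by dividing by k
        new = []
        for s in (overtone, undertone):
            if s not in seen:
                seen.add(s)
                new.append(INTERVAL_NAMES[s])
        if new:
            ranking.append(new)
    return ranking

for group in rank_intervals():
    print(", ".join(group))
```

With the default of 31, this prints the seven groups in the same order as the list above. All twelve intervals have already appeared by the multiple 19, so larger multiples add nothing new.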

Tuning

In the above examples, I made the natural assumption that there are 12 keys in an octave, equally spaced in pitch, and most people would also assume this by default. This method of determining the positions of keys is known as Twelve-Tone Equal Temperament (12-TET/12-ET), and is a particular kind of tuning. The A = 440 Hz pitch standard is usually paired with 12-TET, although it only fixes the pitch of a single key and not the tuning of the others. 12-TET is mainly used in Western music.

It shouldn’t be too confusing then, to know that we can use a different number of keys in an octave, also equally spaced in pitch. These methods, together with 12-TET, fall into a common family of tuning systems, known as Equal Temperament. Some examples are 19-TET, 24-TET and 31-TET.

Another way of tuning, which shouldn’t be too confusing, is to simply choose a set of frequency fractions, and assign a key to each. This is known as Just Intonation. Just intonation is mainly used in traditional Eastern music. Notable examples are Pythagorean Tuning and Five-Limit Tuning. Pythagorean Tuning uses the following frequency fractions:

Other keys such as C♯ can be derived from this table, which will be explained later. When all keys are shifted to the same octave, frequency fractions with smaller numbers are referred to as juster pitches.

There are also hybrid tunings which incorporate elements of Equal Temperament and Just Intonation, an example being Meantone Temperament. These systems are complicated so I shall not discuss them here.

You might notice that all keys before C are taken from the undertone series and all keys after C are taken from the overtone series. Also, the diminished fifth, G♭, is slightly lower in pitch than the augmented fourth, F♯.

Best of Both Worlds?

Is it possible then, to have an equal temperament system that is also a just-intoned system? Unfortunately, the answer is no. Here is a short mathematical proof (if you aren’t interested, you can skip to the next subsection):

Equal temperament systems rely on subdividing octaves into smaller units of pitch.
For an n-TET system, there are n keys/intervals in an octave, hence every key has a frequency $2^{1/n}$ times that of the previous key (the $n$-th root of 2).
For a just intonation, the frequency of every key has to be a fraction of the tonic’s frequency (i.e. a rational number).

Hence, to prove that a system cannot be both equal-tempered and just-intoned, we just need to show that there exists some interval in an equal temperament that’s irrational. In particular, we will try to prove that the interval $2^{1/n}$ is irrational, where $n$ is a natural number and $n>1$ . We’ll prove this by contradiction, meaning that we will first assume that it is rational, then show that it leads to a logical error.

If it is rational, then $2^{1/n}=\frac{p}{q}$ , where $p,q$ are natural numbers. We will take it that $\frac{p}{q}$ is in its most simplified form, meaning that $p,q$ are relatively prime (have no common factors).

We can rearrange the above equation into this form: $2q^n=p^n$ .

We can express $q$ as a product of its prime factors: $q=\prod_i{f_i}$ , then $2\prod_i{f_i^n}=p^n$ . This equation shows that all prime factors of $q$ have to also be factors of $p$ . Since we established earlier that they have no common factors, the only possibility is that $q$ has no prime factors, i.e. $q=1$ .

So, $2=p^n$ . Since $n>1$ , no natural number $p$ satisfies this: $p=1$ gives $p^n=1$ , and any $p\geq 2$ gives $p^n\geq 4$ . This is a contradiction, so $2^{1/n}$ is irrational for every $n>1$ , and we have proved that a system cannot be both equal-tempered and just-intoned.

Equal Temperament vs Just Intonation

You might be wondering which tuning is better, so let’s weigh the pros and cons of each.

Equal Temperament (“Every key is equal.”)

Pros
- The consonance/dissonance of intervals does not depend on the key.
- Pitch intervals are equal, so it is transposition-invariant.
- Only need to determine number of intervals per octave.
Cons
- Intervals only approximate the harmonic series. Certain systems can have large approximation errors.
- All intervals except the octave have irrational frequency ratios.

Just Intonation (“Equality is key.”)

Pros
- Intervals are exactly equal to those in the harmonic series.
- All intervals have rational frequency ratios (hence “fractions”). Certain chords can be tuned to follow exact fractions.
Cons
- Intervals further from the tonic have increasing dissonance. An infamous example is the wolf interval (imperfect fifth).
- Pitch intervals are not equal, hence transposition and modulation are limited but still possible (more on this later on).
- Not easy to determine which and how many frequency fractions to use.

Before determining which system is better than the other, it’s important to know about pitch errors and how our ears perceive them.

Making Cents of Pitch Errors

As mentioned in the previous section, we cannot construct an equal temperament that is also a just intonation.

The harmonic series is essentially a set of frequency fractions, meaning that it can only be utilised in its exact form via just intonation.
However, equal temperament is far more practical and convenient as consonances and dissonances do not depend on the key.
Thus, let’s stick to equal temperament for now, and walk through the problems.

As seen earlier, when we try to use the harmonic series under equal temperament, we end up incurring rounding errors. These errors in pitch are in units of cents, which are defined to be 1/100 of a 12-TET semitone.

In The Physics of Sound, I mentioned that to add a semitone to the pitch, you multiply the frequency by $2^{1/12}\approx 1.0595$ . I shall clarify that this only applies to 12-TET semitones.
To add an n-TET semitone to the pitch, you multiply the frequency by $2^{1/n}$ .
Here, to add a cent to the pitch, you multiply the frequency by $2^{1/1200}\approx 1.000578$ .

Let’s look at an example: we shall calculate the key of the minor seventh under 12-TET, and the corresponding error. The minor seventh is more consonant in the overtone series, with a frequency fraction of 7/4. Since ET systems are equally spaced in pitch instead of frequency, we need to convert this frequency fraction into a pitch difference by taking the logarithm.

In an n-TET system, a frequency ratio of $r$ corresponds to a pitch difference of $n\log_2{r}$ , in units of n-TET semitones. For the frequency fraction 7/4 under 12-TET, we get the pitch difference as about 9.68826 semitones, i.e. 9.68826 semitones above the tonic.

However, since we only allow integer numbers of semitones for the key, we have to round this number off. We get that the key is 10 semitones above the tonic, with an error of -0.31174 semitones, or -31.174 cents. For numerical reasons, rounding errors should be rounded up in absolute value instead of rounded off. The absolute value of the error is 31.174 cents, so rounding up to the nearest cent gives 32 cents, and placing the sign back gives us the error of -32 cents.

Wikipedia lists the error as -31 cents as it was rounded off, which is numerically incorrect as it implies a greater precision than is actually present. For example, an error of +0.3 cents would round off to 0 cents, but this value gives the impression that there is no error. It is perfectly fine to overstate rounding errors, but not understate them.
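
The calculation above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `nearest_key_error` is hypothetical, and it follows the convention used here of rounding the error away from zero to the nearest whole cent.

```python
import math

def nearest_key_error(ratio, n=12):
    """Return (nearest key in n-TET steps, exact error in cents, error rounded away from zero)."""
    pitch = n * math.log2(ratio)              # pitch difference in n-TET steps
    key = round(pitch)                        # only whole steps are allowed
    error_cents = (pitch - key) * 1200 / n    # one n-TET step is 1200/n cents
    rounded = int(math.copysign(math.ceil(abs(error_cents)), error_cents))
    return key, error_cents, rounded

key, error_cents, rounded = nearest_key_error(7 / 4)
print(key, round(error_cents, 3), rounded)  # 10 -31.174 -32
```

Passing a different `n` gives the nearest key and error under another equal temperament, for example `nearest_key_error(7 / 4, n=31)`.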

Equal Temperament

Comparing Systems

Since 100 cents is 1 semitone, the error margin of 12-TET is from -50 to +50 cents. We can reduce this error margin by increasing the number of keys in an octave, but this risks over-complicating the system. This is because there’s a just-noticeable difference (JND) for pitch that our ears can perceive, and anything smaller than that cannot be perceived.

However, the pitch JND is not a fixed figure. It varies from person to person, and for different kinds of tones (since pitch differences for harmonics are more discernible). For the average person, the pitch JND is about 5 cents. Since an octave spans 1200 cents, an n-TET system has steps of 1200/n cents, so equal temperament systems with more than 240 keys per octave would have an indiscernible difference between consecutive keys.

I performed a short study to find out how well equal temperament systems could approximate the harmonic series. In this study:

I generated the harmonic series up to the 256th harmonic/fundamental.
I analysed from 2-TET to 100-TET (i.e. 2 to 100 intervals per octave). For each ET system:
1. I calculated the error between each harmonic/fundamental and the nearest key.
2. I first multiplied all errors by the number of intervals per octave. This meant that ET systems with more intervals were penalised more for pitch errors.
3. I then scaled each error depending on the consonance of the harmonic/fundamental. This meant that errors for more consonant intervals were penalised more. I called this scaling factor the error decay factor, since it indicates the rate at which errors decay over the harmonics.
4. I averaged over the absolute value of the errors to get the mean absolute error (MAE).
5. I repeated the above steps to find the MAE for various error decay factors. I normalised the MAE for each decay rate by the geometric series.
I visualised the result as a contour plot, as shown below:

In the above image:

The horizontal axis denotes the number of intervals per octave (i.e. the ET system)
The vertical axis denotes the error decay factor.
The intensity denotes the mean absolute error (MAE); brighter for larger, darker for smaller.
The vertical ticks are plotted for common ET systems. An ET system with a dark vertical strip has high accuracy.

It should thus please you to know that 12-TET is in fact the most accurate out of the first 23 ET systems. 53-TET has the highest accuracy of the first 100, but you probably wouldn’t want to play an instrument with 53 keys per octave!

Hence, although it’s a matter of opinion whether equal temperament or just intonation is better, what we can take away from this is that 12-TET does a remarkable job at approximating the harmonic series.
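
For readers who want to experiment, here is a rough Python sketch of a scoring procedure in the spirit of the study described above. It is not the original analysis: the weighting scheme (a weight of `decay**i` for the i-th odd frequency multiple, normalised by the sum of the weights) and the names `step_error` and `weighted_error` are assumptions made for illustration, so the numbers it produces need not match the contour plot.

```python
import math

def step_error(multiple, n):
    """Absolute error, in n-TET steps, between a frequency multiple and the nearest key."""
    pitch = n * math.log2(multiple)
    return abs(pitch - round(pitch))

def weighted_error(n, decay, max_multiple=256):
    """Weighted mean absolute error for n-TET (assumed scoring, see the text above)."""
    total = 0.0
    weight_sum = 0.0
    weight = 1.0
    # Odd multiples only; even multiples reduce to keys that are already counted.
    # Undertones have the same absolute errors, so they are not repeated here.
    for multiple in range(3, max_multiple + 1, 2):
        error_cents = step_error(multiple, n) * 1200 / n
        total += weight * error_cents * n     # penalise systems with more keys per octave
        weight_sum += weight
        weight *= decay                       # assumed: lower multiples count more
    return total / weight_sum

# Score every system from 2-TET to 100-TET for one assumed decay factor
scores = {n: weighted_error(n, decay=0.5) for n in range(2, 101)}
print(sorted(scores, key=scores.get)[:5])     # the five best-scoring systems
```

Changing `decay` shifts how much weight the higher harmonics receive, which is the role the error decay factor plays on the vertical axis of the plot.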

Enharmonic Equivalence

Now, while we might understand why we have the names A♯ and B♭ to refer to the same key, have you ever wondered why keys like C𝄪 (C double sharp) or D𝄫 (D double flat) exist, even though the former is clearly just D and the latter is just C? The reason for this is that they are only equivalent under 12-TET. By definition:

If two keys/key signatures/intervals have the same frequency/frequency fraction, but have different names, they are said to be enharmonically equivalent. All the example pairs above are enharmonically equivalent.
- For example, B♯, C and D𝄫 are enharmonically equivalent.
The enharmonic spelling of a key/key signature/interval is a different name with the same frequency/frequency fraction.

Strictly following the definition, enharmonic equivalence only applies if the frequencies are exactly equal. Transposition is only possible in ET systems because all keys within a particular interval are forced to be enharmonically equivalent. Just-intoned systems can contain enharmonic equivalents (see Groven’s 36-Tone Just Scale for example), but only in special cases.

Just Intonation

Deriving Keys

As mentioned earlier, certain keys are missing from the Pythagorean tuning table and can be derived from other keys. For example, we shall derive the frequency fraction for C♯. The table again, for reference:

Notice that the only sharp key we have in the table is F♯. Take note that F♯ is to C as C♯ is to G, and G is present in the table. In the same way that F♯ has $\dfrac{729}{512}$ times the frequency of C, C♯ has $\dfrac{729}{512}$ times the frequency of G. G itself has $\dfrac{3}{2}$ times the frequency of C. This means that C♯ has $\dfrac{3}{2} \times \dfrac{729}{512} = \dfrac{2187}{1024}$ times the frequency of C, or $\dfrac{2187}{2048}$ after lowering it by an octave.

By the same rationale, all other keys can be derived through relative positions.
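
As a small worked example of deriving a key through relative positions, the Python sketch below computes C♯ from the fractions quoted in the text for F♯ and G. The helper `octave_reduce` and the variable names are my own; exact fractions are used so the result can be compared digit for digit.

```python
from fractions import Fraction

def octave_reduce(ratio):
    """Shift a frequency fraction by octaves until it lies in [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

F_SHARP = Fraction(729, 512)   # F sharp relative to C, as quoted in the text
G = Fraction(3, 2)             # G relative to C

# C sharp sits above G exactly as F sharp sits above C
c_sharp = octave_reduce(G * F_SHARP)
print(c_sharp)  # 2187/2048
```

The same pattern (multiply two known fractions, then shift by octaves) applies to any other key that is missing from the table.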

Commas

Let us now look at an example of 2 keys that are enharmonically equivalent in 12-TET, and find the difference between their frequency fractions in Pythagorean tuning.

The simplest example would be G♭ and F♯ which are already present in the table. Shifting G♭ up to the same octave gives it a frequency fraction of $\dfrac{1024}{729}$ , while F♯ has a frequency fraction of $\dfrac{729}{512}$ , so the two are not equal. F♯ has $\dfrac{729}{512} \div \dfrac{1024}{729} = \dfrac{531441}{524288}$ times the frequency of G♭. This is equivalent to about 0.2346 semitone (12-TET), or 23.46 cents.

For all other enharmonically equivalent sets, the frequencies are also separated by 23.46 cents. You can check this for yourself! This is a special property of Pythagorean tuning, so this error is called the Pythagorean comma. The term comma is used in music to denote a minuscule pitch difference between two musical keys.
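
If you would like to check the comma numerically, here is a short Python sketch. The `cents` helper is a hypothetical name; the last line relies on the fact that stacking twelve perfect fifths (3/2) overshoots seven octaves by exactly this ratio, which is one way to see why the same comma keeps reappearing.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Convert a frequency ratio into a pitch difference in cents."""
    return 1200 * math.log2(ratio)

g_flat = Fraction(1024, 729)   # G flat, shifted up into the same octave
f_sharp = Fraction(729, 512)   # F sharp

comma = f_sharp / g_flat
print(comma, round(cents(comma), 2))  # 531441/524288 23.46

# Twelve fifths versus seven octaves gives the same ratio
assert Fraction(3, 2) ** 12 / 2 ** 7 == comma
```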

Although keys with a comma between them are not strictly enharmonically equivalent, they may be effectively so. Keeping the JND in consideration, any comma of less than 5 cents is generally unnoticeable. What I should add here, is that if you space two notes further apart in time, the pitch difference between them becomes less noticeable.

For example, the keys X and Y are spaced 20 cents apart, and you replace X with Y in a chord. This is pretty noticeable, since the incorrect frequency clashes with the other keys in the chord. However, if you instead replace X with Y in a melody, it might not be noticeable at all and you might get away with it. In fact, this is what many musicians who use just intonation often do in their compositions to modulate or transpose!

Compositions

While the connection is not always explicit, an understanding of consonance and dissonance can aid greatly in creating harmonic and melodic interest in our compositions.

Consonance and Familiarity

There has been the notion since the Renaissance that excessive consonance sounds boring, and should be avoided in compositions. Even in modern-day part writing, we still observe the age-old rules that “parallel unisons/octaves/fifths should be avoided”, “leaps into octaves should be avoided”, etc. However, in my earlier post The Moods of Music Modes, I mentioned that the Mixolydian mode is the most consonant, even more so than the Ionian mode (major scale), yet it sounds less boring.

The natural question would be: Why is Mixolydian more consonant than Ionian, even though it sounds less boring? In order to answer this, we should identify what it is exactly that our ears find boring.

Imagine discovering 10 songs that sound really amazing and adding them into your playlist. You would then proceed to listen to these 10 songs every single day. Will there come some point when you get tired of them and move on to different songs? I suspect there would be for most of you. It is also no surprise that boredom is a result of excessive repetition.

Hence, it may not be so apt to claim that consonance is boring. It is instead that familiarity is boring, and repetition leads to familiarity. Even if something is exciting and unfamiliar to us at first because of its use of dissonance, we will eventually find it boring after listening to it over and over again. Since consonance is natural, it just has a greater tendency to sound boring, but it is not necessarily boring.

In general, repetition is good in moderation. Too little of it and your music becomes chaos too alien to appreciate, but too much of it and your music becomes a chore to listen to. Some repetition is needed to help you to establish a motif, scale or tonic.

Dissonance and Emotion

You might have heard several musicians claim that dissonance is emotion. From a mathematical standpoint, dissonance is the unnatural. It is like a break from the familiar into unfamiliar territory. However, what is important in compositions is not so much how much dissonance you use, but how familiar the territory that you are exploring is. With this idea, you can portray a whole variety of scenarios:

The most familiar territory, the major scale, is a joyful and happy one.
A slight twist in the major scale, like playing the III chord, can easily turn the joy into a bittersweet one.
The slightly less-familiar territory, the minor scale, is a sad and solemn one.
Pushing towards the unknown, the blues scale, is an impersonal yet relaxing atmosphere.
Even more unfamiliar territory, the Phrygian scale, is an empty, desert landscape.
The very unfamiliar, the Locrian scale, is one of cold, unforgiving apathy.

We can also move between territories in our composition to evoke mystery, like in the melodic minor and the minor major seventh chord. This movement can evoke emotion, but it can just as easily evoke a lack of one. Experimenting with the interplay of familiar and unfamiliar is key to understanding the various pictures that can be painted in our compositions.

Overtone Series: closest key to each frequency fraction (interval above the tonic, 12-TET)
Frequency Fraction	Closest Interval	Error (Cents)
$1$	Unison/Octave	0
$\dfrac{3}{2}$	Perfect Fifth	+02
$\dfrac{5}{4}$	Major Third	-14
$\dfrac{7}{4}$	Minor Seventh	-32
$\dfrac{9}{8}$	Major Second	+04
$\dfrac{11}{8}$	Tritone	-49
$\dfrac{13}{8}$	Minor Sixth	+41
$\dfrac{15}{8}$	Major Seventh	-12
$\dfrac{17}{16}$	Minor Second	+05
$\dfrac{19}{16}$	Minor Third	-03
$\dfrac{21}{16}$	Perfect Fourth	-30
$\dfrac{23}{16}$	Tritone	+29
$\dfrac{25}{16}$	Minor Sixth	-28
$\dfrac{27}{16}$	Major Sixth	+06
$\dfrac{29}{16}$	Minor Seventh	+30
$\dfrac{31}{16}$	Major Seventh	+46

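Undertone Series: closest key to each frequency fraction (named by its key relative to the tonic, 12-TET)
Frequency Fraction	Closest Interval	Error (Cents)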
$1$	Unison/Octave	0
$\dfrac{2}{3}$	Perfect Fourth	-02
$\dfrac{4}{5}$	Minor Sixth	+14
$\dfrac{4}{7}$	Major Second	+32
$\dfrac{8}{9}$	Minor Seventh	-04
$\dfrac{8}{11}$	Tritone	+49
$\dfrac{8}{13}$	Major Third	-41
$\dfrac{8}{15}$	Minor Second	+12
$\dfrac{16}{17}$	Major Seventh	-05
$\dfrac{16}{19}$	Major Sixth	+03
$\dfrac{16}{21}$	Perfect Fifth	+30
$\dfrac{16}{23}$	Tritone	-29
$\dfrac{16}{25}$	Major Third	+28
$\dfrac{16}{27}$	Minor Third	-06
$\dfrac{16}{29}$	Major Second	-30
$\dfrac{16}{31}$	Minor Second	-46

The Mathematics of Music
