Waveform shaping


The above is a visual representation of the opening measures of Muzio Clementi’s Sonatina, Op. 36, No. 1, as rendered by sf2sound, a kind of command-line synthesizer that takes a stream of solfa symbols as input. This is homework for an eventual ear-training program. In any case, here is the audio file:


One of the challenges has been shaping the waveforms of the individual notes so that they fit together without making annoying pops. A simple exponential decay did not work, although it is good for making the sound more or less percussive. What I discovered is that, at a minimum, one has to shape the attack and release of each note. For the moment I have done this by shaping the waveform with a simple quadratic function.
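To make the idea concrete, here is a minimal Python sketch of attack and release shaping. The function names, the ramp formula 1 - (1 - t)^2, and the 10 ms default ramp are assumptions chosen for illustration; the quadratic actually used in quad2samp may well differ.

```python
import math

def quadratic_envelope(n_samples, attack, release):
    """Return n_samples gain values: quadratic rise, flat middle, quadratic fall.

    attack and release are ramp lengths in samples (hypothetical parameters).
    """
    env = []
    for i in range(n_samples):
        gain = 1.0
        if attack > 0 and i < attack:
            t = i / attack                  # 0 .. 1 across the attack
            gain = 1.0 - (1.0 - t) ** 2     # starts at 0, flattens out at 1
        remaining = n_samples - 1 - i       # samples left until the note ends
        if release > 0 and remaining < release:
            t = remaining / release         # 1 .. 0 across the release
            gain = min(gain, 1.0 - (1.0 - t) ** 2)
        env.append(gain)
    return env

def shaped_sine(freq, dur, rate=44100, ramp=0.01):
    """A sine note whose first and last `ramp` seconds are shaped."""
    n = round(dur * rate)
    ramp_samples = round(ramp * rate)
    env = quadratic_envelope(n, ramp_samples, ramp_samples)
    return [env[i] * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
```

Because every note built this way starts and ends at exactly zero amplitude, two such notes can be placed end to end without a jump in the waveform at the join.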

I’ve been experimenting with various settings and algorithms in order to get a better or more interesting sound. The sound file you hear is slightly more complex than the other ones I have posted. In previous versions the sound was either (1) an exponentially damped sine wave, or (2) the former with some kind of shaping of the amplitude profile as mentioned above. In the current version, higher harmonics are mixed with the fundamental tone. Here is the code snippet of quad2samp where the mixing occurs:

// Form the sine wave and add harmonics to it
samp = sin(W*phase);
samp += -0.4*sin(2*W*phase);
samp += +0.2*sin(3*W*phase);
samp += -0.1*sin(4*W*phase);
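For experimenting with the mix outside of C, the same sum can be sketched in Python. The weights are copied from the snippet above; reading W as the angular frequency 2*pi*freq and phase as time in seconds is my assumption, and the function names are hypothetical.

```python
import math

# Harmonic weights copied from the quad2samp snippet: (harmonic number, weight)
HARMONICS = [(1, 1.0), (2, -0.4), (3, 0.2), (4, -0.1)]

def mixed_sample(freq, t, harmonics=HARMONICS):
    """Value at time t (seconds) of the fundamental plus its harmonics."""
    w = 2 * math.pi * freq  # angular frequency; assumed meaning of W
    return sum(weight * math.sin(k * w * t) for k, weight in harmonics)

def peak_bound(harmonics=HARMONICS):
    """Upper bound on |sample|: the sum of the absolute weights."""
    return sum(abs(weight) for _, weight in harmonics)
```

With these weights the bound is 1.7, and the mix does exceed 1 in places (for example one third of the way through a period), so the samples need some scaling before they are written to a fixed-range format.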

I’ve observed an odd but likely well-known phenomenon (or is it an illusion?). When the sound consists of a shaped sine wave, i.e. no (deliberate) harmonic mixture, I find it painful to listen to, even at relatively low volumes. Painful in the most elementary sense of the word, not because of the poor artistry of sf2sound! When I mix in higher harmonics to some degree, the (physical) pain diminishes. I suspect this is because the acoustic energy in the first instance is concentrated near a single frequency, so a small number of hair cells in the inner ear are overstimulated. When the same energy is spread among the various harmonics, albeit in unequal proportions, it is also spread over more hair cells, so that individual cells are not overstimulated. Perhaps someone who really knows what is going on can comment.

I’ll close with one more image, a close-up shot that shows how the waveforms from two adjacent notes join smoothly. The jaggedness of the sound wave reflects the addition of higher harmonics to the sine wave representing the fundamental tone.


Sonatina: close-up

Improvements: more shapely waveforms produced by quad2samp eliminate those durn popping sounds, and much more! Source code for quad2samp and sf2sound.py


Many changes, including new names for the commands. Here is an example:

% sf2sound test '|| fundamental:261 f: tempo:120 | stacc: q do re mi fa | leg: p: h sol sol ||'

And here is the audio:


The dynamics f: and p:, for forte and piano, now work thanks to the improved back-end program quad. I have renamed it quad2samp. The command fundamental:261 sets the frequency in Hertz of do. For longer examples, you can take a file as input:

sf2sound -f theme.solfa

The file theme.solfa represents the first 15 measures of a sonatina in C by Muzio Clementi (Op. 36 No. 1). See the code pages. Here is the sound file:


previous version

Notice that those terrible popping sounds are gone. Eliminating them turned out to be a problem in waveform shaping, which is accomplished in quad2samp using some nice little quadratic functions. I will discuss these improvements and others in the next few posts, as well as give all the source code.

One of the major changes is a clearer idea of what the input language is and can do. I am dubbing the language SF — for solfa, of course. It has three basic entities: note symbols, rhythm symbols, and commands. The symbols can be accented. Thus do+ is an accented do which is interpreted as C#, while do^ is one of two ways to write C an octave above the given C, the other being do2. An important accent for a rhythm symbol is the dot. Just as in music, it increases the value by one-half. Thus q. is a dotted quarter note.

Commands may or may not take arguments. Thus we have tempo:144 but also allegro:. Some commands take more than one argument. An example is cresc:4:f, which means crescendo over four beats to forte. One can also say cresc:4, which means crescendo over four beats from the current level to whatever level results. The rapidity of the crescendo is a default constant which of course can be adjusted: crescendo-speed:1.2. The commands change the values of the “SF” machine that transforms a stream of tokens in the SF language into quadruples. So far this architecture seems to be remarkably easy to extend and maintain.
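As a rough illustration of the three kinds of entity, here is a small Python sketch that classifies whitespace-separated SF tokens. It is not the sf2sound parser: the function name, the returned tuples, and the rule that anything containing a colon is a command are assumptions made for this example, and only the accents mentioned above (+, ^, and octave digits) are handled.

```python
NOTE_NAMES = {"do", "re", "mi", "fa", "sol", "la", "ti"}
RHYTHM_NAMES = {"w", "h", "q", "e", "s"}
PUNCTUATION = {"|", "||", "."}

def classify(token):
    """Return (kind, payload) for one whitespace-separated SF token."""
    if token in PUNCTUATION:
        return ("punctuation", token)
    if ":" in token:
        # Commands look like name: or name:arg or name:arg1:arg2
        name, *args = token.split(":")
        return ("command", (name, [a for a in args if a]))
    if token.rstrip(".") in RHYTHM_NAMES:
        # Payload is (symbol, dotted?)
        return ("rhythm", (token.rstrip("."), token.endswith(".")))
    # Strip accent characters (+, ^, octave digits) to find the base syllable
    base = token.rstrip("+^0123456789")
    if base in NOTE_NAMES:
        return ("note", (base, token[len(base):]))
    return ("unknown", token)
```

For instance, classify("cresc:4:f") gives ("command", ("cresc", ["4", "f"])) and classify("q.") gives ("rhythm", ("q", True)). A real SF machine would then dispatch on the kind and update its state accordingly.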

This little project is more work than I bargained for, but it beats playing video games. I am having fun and learning a lot. The Audio Programming Book, edited by Boulanger and Lazzarini, is a fantastic resource.


Immense destruction

There are no words for this. HH

A better way of creating .wav files given note data


Today’s post in my little saga of learning audio programming features a program, quad.c. It turns a file of quadruples of the form (frequency in Hertz, duration in seconds, decay factor, amplitude) into a text file of waveform samples, which text2sf then converts to a .wav file. This approach has several advantages over the previous one, which was based on concatenating the text files produced by tfork.

First, only a single intermediate text file containing waveform sample data is created. In the first version, one file was created for each note, plus one for the concatenation of all of them. The concatenated file can be quite large: it holds 44,100 numbers, the samples of the waveform, for each second of audio. With quad, the file size is of course the same as that of the concatenated file.

Second, the sampled waveform has continuously varying phase. With the concatenated files produced by tfork, the phase starts again at zero at the beginning of each note.

More importantly, in the new version the volume of the individual notes can be controlled by setting the amplitude in foo.quad. You will notice the increase in volume from note to note when you play the file foo.wav.

In the next post, quad will be incorporated in solfa2sf.

Example. Here is a three-line file of quadruples that represents the notes A E A’, where A’ is an octave above A = 220 Hertz:

File: foo.quad
220 1.0 0.5 0.2
330 1.0 0.5 0.5
440 1.0 0.5 1.0

The comments in quad.c give more details, but suffice it to say here that running the following commands plays the sound represented by foo.quad:

./quad foo.quad foo.samp
text2sf foo.samp foo.wav 44100 1 .90
rm foo.samp
play foo.wav


Note the increase in volume from note to note.
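For readers who want to see the continuous-phase idea without reading the C source, here is a minimal Python sketch that renders quadruples with a single running phase. It is not a port of quad.c: the function names are hypothetical, and the decay law exp(-t/decay) is an assumption about how the decay factor is applied.

```python
import math

def parse_quads(text):
    """Parse lines like '220 1.0 0.5 0.2' into tuples of floats."""
    return [tuple(float(x) for x in line.split())
            for line in text.splitlines() if line.strip()]

def render_quads(quads, rate=44100):
    """Render (freq_hz, dur_s, decay, amp) tuples into one list of samples."""
    samples = []
    phase = 0.0                                # running phase in radians, never reset
    for freq, dur, decay, amp in quads:
        step = 2 * math.pi * freq / rate       # phase advance per sample
        for i in range(round(dur * rate)):
            t = i / rate                       # time since this note began
            gain = amp * math.exp(-t / decay)  # assumed decay law
            samples.append(gain * math.sin(phase))
            phase = (phase + step) % (2 * math.pi)
    return samples
```

Calling render_quads(parse_quads(open("foo.quad").read())) would give one sample list for all three notes. The phase carries over from note to note, but the amplitude still jumps at each boundary, which is one reason further shaping of the waveform is needed to avoid pops.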


In version two of solfa2sf: we can enter rhythm as well as pitch values, and we can control tempo and note decay


Source Code for solfa2sf

The first version of solfa2sf took a string of solfa syllables and produced a .wav sound file representing it. This was good as a start, but we want to do more! In particular, we would like to be able to give different rhythm values to the notes. I’ve posted an improved version of solfa2sf that does just this. One can use the program on the command line like this:

solfa2sf foo '| q do2 . e mi2 do2 . q sol sol |'

The result is a file foo.wav:


It represents a melody that starts with a quarter note C above middle C, continues with a pair of eighth notes, and ends with a pair of quarter notes.

To work with longer bits of music, it is best to write the solfa in a text editor, then compute the sound file like this:

solfa2sf -f theme.solfa

The file theme.solfa contains the following text:

| q do2 . e mi2 do2 . q sol sol |
| do2 . e mi2 do2 . q sol sol2 |
| e fa2 mi2 re2 do2 . ti do2 ti do2 |
| re2 do2 ti la . q sol x |

The symbols || . | have no effect on the program and could be omitted. They do help the human who reads or composes this text, however. The possible rhythm values at this moment are w (whole note), h (half note), q (quarter note), e (eighth note), and s (sixteenth note). The command tempo:120 sets the tempo, and decay:0.1 determines how fast a note trails off into silence: increase it to make the sound more sustained, decrease it to make it more percussive.
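The arithmetic behind the rhythm symbols can be sketched in a few lines of Python. I am assuming here that the tempo is given in quarter notes per minute, that a rhythm symbol stays in force until the next one appears (as the examples suggest), and that x is a rest that takes up time like a note; the names below are hypothetical and not taken from solfa2sf.

```python
BEATS = {"w": 4.0, "h": 2.0, "q": 1.0, "e": 0.5, "s": 0.25}  # in quarter-note beats
IGNORED = {"|", "||", "."}

def duration_seconds(symbol, tempo):
    """Seconds for a rhythm symbol at `tempo` quarter notes per minute."""
    return BEATS[symbol] * 60.0 / tempo

def total_beats(solfa, start_rhythm="q"):
    """Sum the beats in a solfa string; a rhythm symbol holds until the next one."""
    current = BEATS[start_rhythm]
    total = 0.0
    for token in solfa.split():
        if token in IGNORED or ":" in token:
            continue                      # bar lines, grouping dots, commands
        if token in BEATS:
            current = BEATS[token]        # change the prevailing rhythm value
        else:
            total += current              # a note (or x, assumed to be a rest)
    return total
```

At tempo:120 a quarter note lasts 0.5 seconds and an eighth note 0.25 seconds. Under these assumptions the first measure above, q do2 . e mi2 do2 . q sol sol, adds up to four beats.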

Below is the sound file. Click the link to play it. Do you recognize the piece? I studied it as a child back in Iceland.



We combine C and Python programming to transform a sequence of solfege syllables, e.g., do re mi re do, into a sound file.


In our last post we used tfork and a few unix commands to construct a sound file for an A major chord, A C# E A’. The idea was (1) to make text files representing the sounds of the individual notes using tfork, (2) glue the text files end-to-end using cat, (3) convert the resulting big text file into a .wav file using text2sf.

This works fine for short “melodies,” but it soon gets tedious — and out of hand. A better way is to write a short program that does all this for you, given a text string of solfa syllables like “do re mi re do”. This is what we do in the Python program solfa2sf. Below is the sound produced by

./solfa2sf -w foo 0.3 0.1 do re mi re do


SOURCE CODE for solfa2sf

In running solfa2sf, the arguments are as follows: (1) an option: -d for “dry run”, i.e., no output; -v for an output file with verbose messages; -w for “wet”, the opposite of dry, which produces the output .wav file with few messages; (2) the file name: foo results in an output file foo.wav; (3) the note duration in seconds; (4) the decay time; (5) the solfa syllables.

If you use a small decay time, the sound is percussive, like a marimba, or even a drum. If you use a large one, it is more like an organ. Try decay times of 0.01, 0.1, and 1.0 to see what the effect is.

Python is very good with strings, lists, and dictionaries, which is what we need to parse and handle an input string like “do re mi re do”. C is the best tool for fast computation. So we use them together! As you can see from the source code, we call on the C program tfork using the Python command os.system. Like a carpenter, we use saw, chisel, hammer, etc. as needed for the task at hand.
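The division of labor can be sketched as follows: Python maps syllables to frequencies with a dictionary and builds one tfork command per note. This is not the solfa2sf source. The just-intonation ratios, the default do of 220 Hertz, and the note0.txt file names are assumptions for illustration; the argument order follows the sound.sh script from the earlier post; and I use subprocess.run where the original uses os.system.

```python
import subprocess
from fractions import Fraction

# Just-intonation ratios above do; an assumed tuning, not necessarily solfa2sf's
RATIOS = {"do": Fraction(1), "re": Fraction(9, 8), "mi": Fraction(5, 4),
          "fa": Fraction(4, 3), "sol": Fraction(3, 2), "la": Fraction(5, 3),
          "ti": Fraction(15, 8)}

def tfork_commands(syllables, dur, decay, do_freq=220.0, rate=44100):
    """Build one tfork argument list per syllable, in the order used by sound.sh."""
    commands = []
    for i, name in enumerate(syllables.split()):
        freq = float(RATIOS[name] * Fraction(do_freq))
        commands.append(["./tfork", f"note{i}.txt", str(dur), str(freq),
                         str(rate), str(decay)])
    return commands

def run_all(commands):
    """Run each command; requires a compiled ./tfork in the current directory."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes a dry run trivial: print the list instead of passing it to run_all.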


We combine the power of tfork and text2sf with the magic of unix to construct a sound file for an A major chord.


A Chord

In Audio Programming 1, we compiled the tfork.c program from chapter 1 of The Audio Programming Book, edited by Richard Boulanger and Victor Lazzarini. This program, together with text2sf, gave us the tools needed to algorithmically construct a simple sound, namely an exponentially decaying sine wave. We used these tools to fabricate a file that represents the sound of a tuning fork.
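For comparison, the same kind of sound can be sketched in pure Python with the standard wave module. This is not a port of tfork.c or text2sf: the decay law exp(-t/decay), the 16-bit mono format, and the function names are all assumptions made for this example.

```python
import math
import struct
import wave

def tuning_fork(freq, dur, decay, rate=44100):
    """Samples of a sine wave whose amplitude decays as exp(-t / decay)."""
    return [math.exp(-(i / rate) / decay) * math.sin(2 * math.pi * freq * i / rate)
            for i in range(round(dur * rate))]

def write_wav(path, samples, rate=44100, gain=0.9):
    """Write floats as 16-bit mono PCM, scaled by gain and clipped to [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s * gain)) * 32767))
        for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(frames)
```

For example, write_wav("fork.wav", tuning_fork(440, 1.0, 0.5)) would write a one-second decaying A at 440 Hertz.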

Simple as these two tools are, they give us the means to construct more complicated sounds without any additional C programming (fun though that is!). We will use the magic of unix. To begin, a short shell script:

# Language: unix/sh
# File: sound.sh
# Example: sh sound.sh 220 a --- writes text representation of
# a 0.2 second 220 Hertz sound to a file a.txt

./tfork $2.txt 0.2 $1 44100 0.2

Using sound.sh, we make four sounds, each 0.2 seconds in duration, with frequencies of 220, 275, 330, and 440 Hertz. These correspond to the notes A, C#, E, and A’ = A one octave higher. The frequency ratios are C#/A = 5/4, E/A = 3/2, and A’/A = 2. Thus we are using just intonation, in which pitch ratios in the scale are rational numbers with small numerator and denominator. (In Pythagorean tuning, which is built from fifths alone, the major third would be 81/64 rather than 5/4.)
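The frequencies follow directly from the ratios, as this small Python sketch shows. For comparison it also computes the same notes in twelve-tone equal temperament, which is my addition and not something the post uses; the function names are hypothetical.

```python
from fractions import Fraction

A = 220  # Hertz
RATIOS = {"A": Fraction(1), "C#": Fraction(5, 4), "E": Fraction(3, 2), "A'": Fraction(2)}
SEMITONES = {"A": 0, "C#": 4, "E": 7, "A'": 12}   # semitone steps above A

def ratio_freq(name):
    """Frequency from the small-integer ratio, as used with sound.sh."""
    return float(A * RATIOS[name])

def equal_tempered_freq(name):
    """Frequency of the same note in twelve-tone equal temperament."""
    return A * 2 ** (SEMITONES[name] / 12)
```

Here ratio_freq("C#") is 275.0, while equal_tempered_freq("C#") is about 277.18, so the ratio-based third is a little more than 2 Hertz lower than the one on a modern piano.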

Let us now execute the following commands:

% sh sound.sh 220 a
% sh sound.sh 275 c#
% sh sound.sh 330 e
% sh sound.sh 440 a2

The result is the creation of text files a.txt, c#.txt, etc., which represent the given sounds. Next, we concatenate these files, putting a.txt first, c#.txt next, etc:

% cat a.txt c#.txt e.txt a2.txt >chord.txt

Then, we convert chord.txt into a .wav file:

% text2sf chord.txt chord.wav 44100 1 1.0

To conclude, we play the file:

% play chord.wav

Here is the sound:


Clearly there is more to do, including: (1) clean up this sound, which needs to fade cleanly into silence; (2) develop a mini language for transforming a sequence of pitch names into a sound file; (3) make more complex sounds.