Dynamically Rendering a Sound in HTML/JavaScript

Every so often I’ll code something not because it is something that I need, but because creating it is enjoyable. I stumbled upon something that I had coded some time ago and thought it was worth sharing: an experiment in dynamically rendering sound using various technologies. This post focuses on doing that in HTML/JavaScript and explains the code.

When you are rendering sound in the browser, the user must interact with the page before you can actually play anything. This prevents a window that just opened in another tab or behind other windows from making noise before the user knows about it. If you were using this technology in a real solution you would want to design your application so that you know the user has clicked, scrolled, or engaged in some other interaction at least once before your code begins producing audio.
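A minimal sketch of one way to satisfy this requirement follows; the playButton element is a hypothetical placeholder, not something from the original code.

var AudioContext = window.AudioContext || window.webkitAudioContext;
var context = new AudioContext();

// The playButton element is hypothetical; any user gesture works.
// An AudioContext created before a user gesture may start out suspended,
// so resume it inside the click handler before starting any audio.
document.getElementById('playButton').addEventListener('click', function () {
    if (context.state === 'suspended') {
        context.resume();
    }
    // ...safe to create and start audio nodes from this point on...
});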

To render audio you need a reference to the AudioContext constructor. On many browsers this can be found on window.AudioContext. On WebKit-based browsers it can be found on window.webkitAudioContext. An easy way to get a reference to the right object is to coalesce the two.

var AudioContext = window.AudioContext || window.webkitAudioContext;

Now we can create our own AudioContext instance. When creating an AudioContext, two options that you can specify are the sample rate of the audio and a latency hint. The latency setting allows you to balance power consumption on the machine against responsiveness of the audio that is being rendered.

var context = new AudioContext({ latencyHint: "interactive", sampleRate: 48000 });

From our audio context we can create our AudioBuffer objects. These are the objects that contain the waveform data for the sounds. An AudioBuffer can have a different sample rate than the AudioContext used to make it, though in general I don’t suggest doing this. An AudioBuffer can also have multiple channels; we will use a single channel (monaural). When creating the AudioBuffer we must specify the number of channels in the audio, the sample rate, and the count of samples in the buffer. The number of samples maps directly to a length of time. If I want a buffer that is 2 seconds long and I have a sample rate of 48 kHz (48,000 samples per second), then the number of samples needed in the AudioBuffer is 2 * 48,000, or 96,000.

const SAMPLE_RATE = 48000;
const CHANNEL_COUNT = 1;
const SAMPLE_COUNT = 2 * SAMPLE_RATE; //for 2 seconds
var audioBuffer = context.createBuffer(CHANNEL_COUNT, SAMPLE_COUNT, SAMPLE_RATE);

The Web Audio API contains a node for generating tones (the OscillatorNode), but I’m not going to use it. Instead I’m going to manually fill this buffer with the data needed to make a sound. Values in the audio buffer are floating point values in the range of -1.0 to 1.0. I’m going to fill the buffer with a pure tone of 440 Hz using the Math.sin function. The value passed to Math.sin must be incremented by a specific amount for each sample to obtain this frequency; since one cycle of a sine wave spans 2π, that per-sample increment is 2π × frequency ÷ sample rate. This increment is used in a for-loop to populate the buffer.

const TARGET_FREQUENCY = 440;
const dx = (2 * Math.PI * TARGET_FREQUENCY) / SAMPLE_RATE;

var channelBuffer = audioBuffer.getChannelData(0);
for(var i=0, x=0; i<SAMPLE_COUNT; ++i, x+=dx) {
   channelBuffer[i] = Math.sin(x);
}

The AudioBuffer object is now populated. The next step is to play it. To play the AudioBuffer it gets wrapped in an AudioBufferSourceNode and connected to the destination audio device.

var source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start();

If you put all of these code samples in a JavaScript file and run it in your browser, the 440 Hz tone will begin to play after you click. It works, but let’s do something a little more interesting; I want to be able to provide a list of frequencies and hear them played back in the order in which I specified them. This will give the code enough functionality to play a simple melody. To make it even more interesting, I want to be able to have these frequencies play simultaneously. This will allow the code to play chords and melodies with polyphony.

The list of frequencies could be passed in the form of a list of numbers. I have specified melodies that way before; I’ve done it on the HP48G calculator. Instead of doing that, I’ll make a simple class that holds a note letter (A, B, C, D, E, F, G), a number for the octave, and a duration. An array of these will make up one voice within a melody. If multiple voices are specified they will all play their notes at once. I’m switching from JavaScript to TypeScript here just to have something automated looking over my shoulder to check for mistakes. Remember that TypeScript is a superset of JavaScript; if you know JavaScript you will be able to follow along. For fields that contain notes I’m constraining them to one of the following values. Most of these are recognizable music notes. The exception is the letter R, which will be used for rest notes (silence).

type NoteLetter = 'A♭'|'A'|'A#'|'B♭'|'B'|'C'|'C#'|'D♭'|'D'|'D#'|'E'|'F'|'F#'|'G♭'|'G'|'G#'|'R';
type Octave = number;
type Duration = number;
 
 

I’ve made a dictionary to map each note letter to a frequency. I also have a Note class that contains the information necessary to play a single note. Most of the fields on this class explain themselves (letter, octave, duration, volume). The two fields rampUpMargin and rampDownMargin specify how long a note should take to come up to full volume and to fade back out. When I initially tried this with no ramp time, two side-by-side notes of the same frequency sounded like a single note. Adding a time for the sound to ramp up to full volume and back down allows each note to have its own distinct sound.
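A minimal sketch of such a dictionary, assuming the base frequencies of octave 0 (so that the Math.pow(2, octave) multiplier in the render method below gives, for example, A at octave 4 = 27.5 Hz × 16 = 440 Hz) and mapping the rest letter R to 0, could look like this:

const noteFrequency: Record<NoteLetter, number> = {
    'C': 16.35, 'C#': 17.32, 'D♭': 17.32,
    'D': 18.35, 'D#': 19.45,
    'E': 20.60,
    'F': 21.83, 'F#': 23.12, 'G♭': 23.12,
    'G': 24.50, 'G#': 25.96, 'A♭': 25.96,
    'A': 27.50, 'A#': 29.14, 'B♭': 29.14,
    'B': 30.87,
    'R': 0 // rests render as silence
};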

class Note {
    letter:NoteLetter;
    octave:Octave;
    duration:Duration;
    volume = 1;
    rampDownMargin = 1/8;
    rampUpMargin = 1/10;

    constructor(letter:NoteLetter,octave:Octave,duration:Duration) {
        this.letter = letter;
        this.octave = octave;
        this.duration = duration;
    }
}

A voice will be a collection of notes. An array would be sufficient, but I also want to be able to expand the Voice class to support some other features. For example, I’ve added a member named isMuted so that I can silence a voice without deleting it. I may also add methods to serialize or deserialize a voice, or functionality to make editing easier.

class Voice { 
    noteList:Array<Note>;
    isMuted = false;

    constructor() { 
        this.noteList = [];
    }

    addNote(note:NoteLetter, octave:number|null = null, duration:number|null = null ) : Note {
        octave = octave || 4;
        duration = duration || 1;
        var n = new Note(note, octave, duration);
        this.add(n);
        return n;
    }

    add(n:Note):void {
        this.noteList.push(n);
    }

    parse(music:string):void {
        // Placeholder: parsing a melody from a '|'-delimited string is not implemented yet.
        var noteStrings = music.split('|');
    }

}

Voices are combined together as a melody. In addition to collecting the voices, this class also contains the functionality for rendering the voices into a playable audio buffer.

class Melody { 

    voiceList:Array<Voice>;

    constructor() {
        this.voiceList = [];
    }

    addVoice():Voice {
        var v = new Voice();
        this.add(v);
        return v;
    }

    add(v:Voice) { 
        this.voiceList.push(v);
    }

    render(audioBuffer:AudioBuffer, bpm:number) {
        var channelData = audioBuffer.getChannelData(0);
        var sampleRate = audioBuffer.sampleRate;
        var samplesPerBeat = (60/bpm)*sampleRate;
        this.voiceList.forEach(voice => {
            if(voice.isMuted)
                return;
            var position = 0;
            var x = 0;
            voice.noteList.forEach((note)=>{                

                var octaveMultiplier = Math.pow(2,note.octave );
                var frequency = noteFrequency[note.letter] * octaveMultiplier;
                var dx:number = (frequency == 0) ? 0 : (2 * Math.PI * frequency) / sampleRate;
                var sampleCount = samplesPerBeat * note.duration;
                var rampDownCount = samplesPerBeat * note.rampDownMargin;
                var rampUpCount = samplesPerBeat * note.rampUpMargin;
                var noteSample = 0;

                while (sampleCount > 0 && position < channelData.length) {
                    var rampAdjustment = 1;
                    if(rampUpCount > 0) {
                        rampAdjustment *= Math.min(rampUpCount, noteSample)/rampUpCount;
                    }

                    if(rampDownCount > 0) {
                        rampAdjustment *= Math.min(rampDownCount, sampleCount)/rampDownCount;
                    }

                    // Scale down so that multiple voices can be summed without clipping.
                    channelData[position] += rampAdjustment * 0.25 * note.volume * Math.sin(x);
                    --sampleCount;
                    ++noteSample;
                    ++position;
                    x += dx;
                }
            });
        });
    }
}
 
To test this code out I’ve made an audio buffer using the createBuffer call from the earlier sample; just make sure it holds enough samples for the full length of the melody, since render stops writing when it reaches the end of the buffer. I then created a Melody object and added a couple of voices to it: one plays a scale ascending and the other plays the same scale descending.
 
var song = new Melody();

var voice = song.addVoice();

voice.addNote('C', 4);
voice.addNote('D', 4);
voice.addNote('E', 4);
voice.addNote('F', 4);
voice.addNote('G', 4);
voice.addNote('A', 4);
voice.addNote('B', 4);
voice.addNote('C', 5, 2);

voice = song.addVoice();

voice.addNote('C', 5, 1);
voice.addNote('B', 4);
voice.addNote('A', 4);
voice.addNote('G', 4);
voice.addNote('F', 4);
voice.addNote('E', 4);
voice.addNote('D', 4);
voice.addNote('C', 4, 2);    

song.render(audioBuffer, 120 /*BPM*/);
var source = context.createBufferSource();
source.buffer = audioBuffer;
source.connect(context.destination);
source.start();
 
With this functionality present I feel comfortable trying to play a recognizable tune. The first tune I tried was one from a video game that I think many will know. I found the sheet music and typed out the notes.
 
var song = new Melody();

var voice = song.addVoice();
voice.isMuted = false;
voice.addNote('E', 6,0.5);
voice.addNote('E', 6, 0.5);
voice.addNote('R', 6, 0.5);
voice.addNote('E', 6, 0.5);
voice.addNote('R', 6, 0.5);
voice.addNote('C', 6, 0.5);
voice.addNote('E', 6, 0.5);
voice.addNote('R', 6, 0.5);
voice.addNote('G', 6, 1);
voice.addNote('R', 6);
voice.addNote('G', 5 , 0.5);
voice.addNote('R', 6,1.5);

//------------
voice.addNote('C', 5 , 0.5);
voice.addNote('R', 6,1);
voice.addNote('G', 5 , 0.5);
voice.addNote('R', 6,1);
voice.addNote('E', 5 , 0.5);
voice.addNote('R', 6,1);
voice.addNote('A', 6 , 0.5);

There are more notes to this (which are not typed out here). If you want to hear this in action and see the code execute, I have it hosted here: https://j2i.net/apps/dynamicSound/
 
For the sake of trying something different, I also tried “Adagio for Strings and Organ in G Minor”. If you want to hear the results of that, you can find those here.
 
I got this project as far as rendering those two melodies before I turned my attention to C/C++. I preferred using C/C++ because I had more options for rendering the sound and the restrictions of the browser were not present. However, some of the features that I used were specific to Windows, so there is a potential disadvantage (depending on how the code is to be used) of the code not being runnable on other OSes without adapting it to their audio APIs.
 
This is something that I may revisit later. 
 
 
 
