How It Works
A technical deep-dive into the audio capture, frequency analysis, and GPU rendering pipeline that powers System Audio Visualizer.
Step 1: Audio Capture
System Audio Visualizer uses the Screen Capture API (navigator.mediaDevices.getDisplayMedia) to capture audio from your screen or browser tab. When you click “Start Visualization,” the browser presents a dialog where you can choose which screen, window, or tab to share.
The critical step is enabling the “Share audio” checkbox in the browser dialog. Without this, no audio data reaches the visualizer. Once sharing begins, we immediately stop the video track since we only need the audio stream, reducing CPU and bandwidth usage.
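A minimal sketch of this capture step, assuming the dependency on `navigator.mediaDevices` is passed in (so the logic can also run against a mock); `captureSystemAudio` is an illustrative name, not the project's actual API:

```javascript
// Capture tab/screen audio via getDisplayMedia, then drop the video track.
// Note: most browsers require video: true in the constraints even when
// only audio is wanted, and the user must tick "Share audio" in the dialog.
async function captureSystemAudio(mediaDevices) {
  const stream = await mediaDevices.getDisplayMedia({ video: true, audio: true });
  if (stream.getAudioTracks().length === 0) {
    throw new Error('No audio track: "Share audio" was not enabled');
  }
  // Only the audio stream is needed; stop video to save CPU and bandwidth.
  for (const track of stream.getVideoTracks()) {
    track.stop();
  }
  return stream;
}
```

In the browser this would be called as `captureSystemAudio(navigator.mediaDevices)` from the "Start Visualization" click handler.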
Step 2: Audio Context and Analysis
The captured audio stream is connected to the Web Audio API processing pipeline:
- An AudioContext is created to manage the audio processing graph.
- A MediaStreamSource node connects the captured stream to the audio graph.
- An AnalyserNode is attached to perform real-time Fast Fourier Transform (FFT) analysis.
The AnalyserNode is configured with an FFT size of 1024, providing 512 frequency bins. A smoothing time constant of 0.8 prevents jittery visual updates while maintaining responsiveness.
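The wiring above can be sketched as follows, assuming the `stream` from Step 1; `buildAnalyser` is an illustrative name, and the analyser settings mirror the values stated above:

```javascript
// Connect the captured stream to an AnalyserNode for FFT analysis.
// The analyser is not connected to the destination: we only analyze,
// we never play the audio back out.
function buildAnalyser(audioContext, stream) {
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 1024;              // yields 512 frequency bins
  analyser.smoothingTimeConstant = 0.8; // 0..1; higher = smoother visuals
  source.connect(analyser);
  return analyser;
}
```

In the browser this would typically be called as `buildAnalyser(new AudioContext(), stream)`.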
Step 3: Frequency Extraction
On every animation frame, the audio engine extracts two key data arrays from the AnalyserNode:
- Frequency data: A Uint8Array of 512 values (0-255) representing the magnitude of each frequency bin from low to high.
- Waveform data: A Uint8Array representing the current audio waveform shape in the time domain.
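The per-frame read might look like the sketch below, assuming the analyser from Step 2; `createFrameReader` is an illustrative name. Both arrays are allocated once and refilled in place on every frame:

```javascript
// Returns a function that refills two preallocated Uint8Arrays with the
// latest frequency magnitudes and time-domain waveform samples.
function createFrameReader(analyser) {
  const freqData = new Uint8Array(analyser.frequencyBinCount); // 512 bins
  const waveData = new Uint8Array(analyser.fftSize);           // raw samples
  return function read() {
    analyser.getByteFrequencyData(freqData);  // magnitudes 0-255, low to high
    analyser.getByteTimeDomainData(waveData); // waveform; 128 = silence
    return { freqData, waveData };
  };
}
```

Reusing the same arrays each frame avoids per-frame allocations, which ties into the garbage-collection point under Performance Optimization.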
From the frequency data, the engine computes three energy bands:
- Bass (0-15% of bins): Kick drums, sub-bass, bass guitars.
- Mid (15-50% of bins): Vocals, guitars, keys, most melodic content.
- Treble (50-100% of bins): Hi-hats, cymbals, high-frequency sparkle.
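The band split above can be sketched as averaging the normalized bin magnitudes over each range; `computeBands` is an illustrative name, and the bin counts assume the 512 bins from an FFT size of 1024:

```javascript
// Split a Uint8Array of frequency magnitudes (0-255) into three
// normalized energy bands in the range 0..1.
function computeBands(freqData) {
  const n = freqData.length;
  // Mean magnitude over bins [start, end), normalized to 0..1.
  const avg = (start, end) => {
    let sum = 0;
    for (let i = start; i < end; i++) sum += freqData[i];
    return sum / ((end - start) * 255);
  };
  return {
    bass: avg(0, Math.floor(n * 0.15)),                  // kick, sub-bass
    mid: avg(Math.floor(n * 0.15), Math.floor(n * 0.5)), // vocals, melody
    treble: avg(Math.floor(n * 0.5), n),                 // hi-hats, sparkle
  };
}
```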
Step 4: Beat Detection
The beat detection algorithm monitors the bass energy band for sudden peaks. When the current bass energy exceeds the previous frame's energy by a threshold factor (1.4x) and the absolute bass level is above 0.3, a beat is flagged. A cooldown of 150ms prevents rapid false triggers from sustained bass.
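A minimal sketch of this detector, using the thresholds stated above (1.4x jump, 0.3 floor, 150ms cooldown); `createBeatDetector` and its option names are illustrative:

```javascript
// Returns a stateful detector: call it once per frame with the current
// bass energy (0..1) and a timestamp in milliseconds.
function createBeatDetector({ threshold = 1.4, minLevel = 0.3, cooldownMs = 150 } = {}) {
  let prevBass = 0;
  let lastBeatTime = -Infinity;
  return function detect(bass, nowMs) {
    const isBeat =
      bass > prevBass * threshold &&   // sudden jump vs. previous frame
      bass > minLevel &&               // ignore quiet passages
      nowMs - lastBeatTime >= cooldownMs; // suppress rapid retriggers
    if (isBeat) lastBeatTime = nowMs;
    prevBass = bass;
    return isBeat;
  };
}
```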
Beat events drive dramatic visual responses like shockwaves in the Bass Shockwave preset and expansion bursts in Circular Pulse Ring.
Step 5: GPU Rendering
Audio metrics are passed to the WebGL rendering pipeline powered by Three.js. Each visualization preset defines custom GLSL vertex and fragment shaders that run on the GPU.
The rendering loop is driven by requestAnimationFrame, which fires at the display refresh rate (typically 60fps):
- Read audio frequency data from the AnalyserNode.
- Compute bass, mid, treble, amplitude, and beat metrics.
- Update shader uniform values with the new audio data.
- Update any particle systems or geometry transformations.
- Render the frame to the fullscreen canvas.
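One iteration of that loop might look like the sketch below. The dependencies are bundled into a `deps` object so the step can run without a real GPU context, and the uniform names (`uBass`, `uMid`, `uTreble`, `uBeat`) are illustrative, not necessarily the project's actual shader interface:

```javascript
// Execute one frame of the render loop: read audio, compute metrics,
// push them to the shaders, and draw.
function renderFrame(deps, timeMs) {
  const { analyser, freqData, computeBands, detectBeat,
          uniforms, renderer, scene, camera } = deps;
  analyser.getByteFrequencyData(freqData);  // 1. read frequency data
  const bands = computeBands(freqData);     // 2. bass / mid / treble metrics
  const beat = detectBeat(bands.bass, timeMs);
  uniforms.uBass.value = bands.bass;        // 3. update shader uniforms
  uniforms.uMid.value = bands.mid;
  uniforms.uTreble.value = bands.treble;
  uniforms.uBeat.value = beat ? 1.0 : 0.0;
  if (deps.updateGeometry) {
    deps.updateGeometry(bands, beat);       // 4. particles / geometry
  }
  renderer.render(scene, camera);           // 5. draw to the canvas
  return bands;
}
```

In the browser this would be scheduled as `requestAnimationFrame(function loop(t) { renderFrame(deps, t); requestAnimationFrame(loop); })`.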
Step 6: Shader Techniques
Different presets use different GPU rendering techniques:
- Instanced rendering: Used for Neon Spectrum Bars, where 64 bars are rendered with a single draw call using instance attributes.
- Point sprites: Used for Particle Galaxy, rendering 5000 particles as GPU-accelerated point sprites with custom shapes.
- Raymarching: Used for Audio Tunnel, where a fragment shader traces rays through a procedurally defined tunnel geometry.
- Noise-based domain warping: Used for Liquid Shader Field, layering 3D simplex noise functions driven by audio bands.
Performance Optimization
Several strategies ensure smooth performance:
- Pixel ratio is capped at 2x to prevent excessive GPU load on 3x displays.
- Typed arrays (Uint8Array, Float32Array) minimize garbage collection.
- Audio metrics are computed at display refresh rate, not faster.
- Additive blending is used instead of expensive transparency sorting.
- Resources are properly disposed when switching presets or stopping capture.
Last updated: March 2026