Audacity CTF: Audio Forensics

Audacity in CTF audio forensics taught me something I didn’t expect: the first thing to do with an audio file is not to listen to it. I learned this the hard way on a CyLab Security Academy (formerly picoCTF) Morse Code challenge, where I opened the WAV file, hit play, and spent the next twenty minutes trying to decode something by ear — before realizing I should have been looking at the waveform view the entire time.

The Morse Code challenge that reset my approach

The challenge was straightforward in description: decode the audio file. The file was morse_chal.wav. I ran the standard first checks:

$ file morse_chal.wav
morse_chal.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz

$ strings morse_chal.wav | grep -i "flag\|ctf\|pico"
(no output)

$ binwalk morse_chal.wav
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             RIFF audio data (WAV), PCM, 1 channels, 44100 sample rate

No embedded files. No plaintext flag. A clean 28.9-second mono WAV at 44100 Hz. So I opened it in Audacity — and made my first mistake: I switched to Spectrogram view immediately, because that’s what every CTF audio guide says to do first.

I spent fifteen minutes adjusting the spectrogram settings. Window size 1024, then 2048, then 4096. Maximum frequency 8000 Hz, then 20000 Hz. Different color schemes. Nothing. The spectrogram showed a concentrated band around 500 Hz with no visible structure — just a uniform carrier.

Then I switched back to Waveform view, and the pattern was immediately obvious: regular pulses with clear short and long durations. Morse code. I had been looking at it wrong the whole time.

After decoding manually from the waveform and confirming with a Python script (carrier frequency: 517 Hz), the message spelled out WH47 H47H 60D W20U6H7 — leetspeak for “What hath God wrought,” the first message Samuel Morse himself transmitted by telegraph in 1844. The flag: picoCTF{wh47_h47h_60d_w20u6h7}.

The lesson was not “spectrogram is wrong” but “spectrogram is one view among several, and the right view depends on how the data is hidden.” I had over-generalized from a previous challenge where the spectrogram revealed everything. Morse code is encoded in timing: when the tone is on and when it is off. A spectrogram does have a time axis, but zoomed out and with long analysis windows the keying can smear into what looks like one continuous band. The waveform shows the on/off envelope directly.

Getting Audacity ready: the two settings that actually matter

Switch views intentionally, not by reflex

The track dropdown in Audacity lets you switch between Waveform and Spectrogram. Most CTF guides say “switch to Spectrogram immediately.” After the Morse Code incident, I now look at Waveform first for five seconds — checking for the kind of regular amplitude pattern that signals Morse or binary encoding — before switching to Spectrogram.

If the waveform shows random-looking noise with no structure, then Spectrogram is the right next step. If it shows repeating pulse patterns, stay in Waveform.

Tune the spectrogram settings for CTF challenges

Access Edit → Preferences → Spectrograms and adjust:

Window size: Begin with 1024; increase to 2048 or 4096 for sharper frequency resolution. At 512 the frequency axis is blurry. At 4096 it is sharp, but time resolution drops and short events smear together. For static images hidden in audio, 4096 is usually correct.
Maximum frequency: Default is 8000 Hz. If the flag uses high-frequency components (for example, near-ultrasonic encoding), expand to 20000 Hz. On the Morse Code challenge, this adjustment revealed nothing because the signal was all time-domain, a reminder that not every setting change helps.
Color scheme: Audacity 3.x offers “Color (default)”, “Color (classic)”, “Grayscale”, and “Inverse grayscale”. The color schemes typically reveal hidden images better than grayscale.
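
The window-size trade-off is plain arithmetic, so here is a minimal sketch of it. It assumes the 44100 Hz sample rate of morse_chal.wav; `window_resolution` is a name I made up for illustration, not an Audacity function.

```python
# Time/frequency trade-off for a few spectrogram window sizes.
# Assumption: 44100 Hz sample rate, as reported by `file` for morse_chal.wav.
SAMPLE_RATE = 44100


def window_resolution(window_size, sample_rate=SAMPLE_RATE):
    """Return (frequency bin width in Hz, window duration in ms)."""
    bin_width_hz = sample_rate / window_size
    window_ms = 1000.0 * window_size / sample_rate
    return bin_width_hz, window_ms


for size in (512, 1024, 2048, 4096):
    bin_hz, span_ms = window_resolution(size)
    print(f"{size:>5} samples: {bin_hz:6.1f} Hz per bin, {span_ms:5.1f} ms per window")
```

At 4096 samples each window spans roughly 93 ms, which is close to the ~0.11 s dot length in the Morse Code challenge. That is one reason large windows suit static images better than short pulses.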

Nine CTF patterns in audio forensics, ranked by how often I encounter them

Pattern 1: Hidden image in spectrogram (most common)

Flags, QR codes, barcodes, or images embedded in the frequency domain — invisible in Waveform view, immediately visible in Spectrogram with the right window size. This is why the “spectrogram first” rule exists. If you see a suspicious region at the top of the spectrogram that looks cut off, expand the maximum frequency range — some challenges deliberately put the content above the default 8000 Hz ceiling.

If a QR code appears in the spectrogram, screenshot it for scanning. Cleanup in an image editor (adjusting contrast, converting to black and white) may be necessary before a barcode scanner can read it.

Pattern 2: Morse code in the waveform

As in the Morse Code challenge above: short and long pulses encode dots and dashes. The key insight is that Morse code is a time-domain pattern — it’s about when the signal is on and off, not what frequencies are present. The Spectrogram shows you a carrier frequency exists, but the Waveform shows you the timing structure.

# In Audacity Waveform view:
# 1. Zoom in on the time axis (Ctrl+Scroll)
# 2. Look for repeating short/long pulses
# 3. Amplify if pulses are hard to distinguish (Effect → Amplify)
# 4. Time the shortest pulse — that's your dot duration
# Short pulse ≈ dot (.), Long pulse ≈ dash (-)

The Morse Code challenge had a carrier at 517 Hz. The pulses were ~0.11s for dots and ~0.33s for dashes. Once I had that timing, decoding was straightforward. For longer messages, use multimon-ng to decode automatically rather than doing it by hand.

$ multimon-ng -t wav -a MORSE_CW morse_chal.wav
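
If multimon-ng is not available, the manual steps above (find the pulses, take the shortest as the dot, classify the rest) can be sketched in plain Python. This is a rough illustration, not the script I used on the challenge. The function names, the 10 ms frame, the half-of-peak threshold, and the 2-unit and 5-unit cutoffs are all assumptions, and it will struggle with noisy audio or a message that contains no dots.

```python
import struct
import wave

# International Morse for A-Z and 0-9, in that order
_CODES = (".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- "
          ".-. ... - ..- ...- .-- -..- -.-- --.. "
          "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----.")
MORSE = dict(zip(_CODES.split(), "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))


def read_mono_16bit(path):
    """Load a 16-bit mono PCM WAV as (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
        return list(struct.unpack(f"<{len(raw) // 2}h", raw)), wav.getframerate()


def decode_morse(samples, rate, frame_ms=10):
    """Decode on/off keyed Morse from PCM samples using pulse timing only."""
    if not samples:
        return ""
    frame = max(1, int(rate * frame_ms / 1000))
    # Envelope: peak absolute amplitude in each short frame
    envelope = [max(abs(s) for s in samples[i:i + frame])
                for i in range(0, len(samples), frame)]
    threshold = max(envelope) / 2
    # Run-length encode the tone-on / tone-off states
    runs = []
    for on in (level > threshold for level in envelope):
        if runs and runs[-1][0] == on:
            runs[-1][1] += 1
        else:
            runs.append([on, 1])
    marks = [length for on, length in runs if on]
    if not marks:
        return ""
    unit = min(marks)  # assumption: the shortest pulse is one dot
    text, symbol = [], ""
    for on, length in runs:
        if on:
            symbol += "." if length < 2 * unit else "-"
        elif length >= 2 * unit and symbol:
            text.append(MORSE.get(symbol, "?"))  # letter gap (nominally 3 units)
            symbol = ""
            if length >= 5 * unit:
                text.append(" ")  # word gap (nominally 7 units)
    if symbol:
        text.append(MORSE.get(symbol, "?"))
    return "".join(text).strip()
```

To use it, call `read_mono_16bit("morse_chal.wav")` and pass the returned samples and rate to `decode_morse`. The decoder never looks at the carrier frequency, only at when the envelope is above or below the threshold.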

Pattern 3: Reversed or speed-manipulated audio

Speech or tones recorded backwards, at double speed, or pitch-shifted. Apply Effect → Reverse, Effect → Change Speed, Effect → Change Tempo, or Effect → Change Pitch as needed. Speed and tempo are different: Change Speed affects both pitch and duration; Change Tempo preserves pitch. For most CTF challenges, Change Speed is what you want.
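
For scripted use, reversal and a Change Speed-style adjustment can be approximated with Python's standard `wave` module. This is a sketch under assumptions: `reverse_and_rescale` is a hypothetical helper, the input is uncompressed PCM, and speed is changed by rewriting the declared sample rate rather than resampling as Audacity does. The audible effect is similar, with pitch and duration shifting together.

```python
import wave


def reverse_and_rescale(src_path, dst_path, speed=1.0):
    """Write a copy of a PCM WAV with frames reversed and playback speed scaled."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    frame_size = params.sampwidth * params.nchannels
    # Reverse whole frames, not individual bytes, so samples and channels stay intact
    chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
    reversed_frames = b"".join(reversed(chunks))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # Changing only the declared rate shifts pitch and duration together
        dst.setframerate(int(params.framerate * speed))
        dst.writeframes(reversed_frames)
```

Calling it with `speed=0.5` plays the reversed audio at half speed, which undoes a double-speed recording. Change Tempo has no equivalent this simple, because preserving pitch needs real time-stretching.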

Pattern 4: Stereo channel hiding

Data hidden in one channel of a stereo file. Use Track → Split Stereo Track to examine each channel independently. The invert-and-mix technique cancels whatever the two channels share and leaves only what differs between them, which would otherwise be masked:

Split the stereo track into two mono tracks (track dropdown → Split Stereo to Mono)
On one of the two tracks: Effect → Invert
Select both → Tracks → Mix → Mix and Render

The Morse Code challenge file was mono, so there was nothing to split. Confirming that took ten seconds and is worth doing regardless.
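
Invert-and-mix is just left minus right, sample by sample. Here is a minimal sketch assuming a 16-bit stereo PCM file; both function names are hypothetical.

```python
import struct
import wave


def read_stereo_16bit(path):
    """Load a 16-bit stereo PCM WAV as interleaved samples (L, R, L, R, ...)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        raw = wav.readframes(wav.getnframes())
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


def channel_difference(interleaved):
    """Return left minus right: shared content cancels, differences remain."""
    left = interleaved[0::2]
    right = interleaved[1::2]
    return [l - r for l, r in zip(left, right)]
```

If the result is all zeros, the two channels are identical and nothing is hidden in the difference. Any nonzero residue is worth viewing or listening to on its own.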

Pattern 5: DTMF tones

Each phone keypad key produces two simultaneous frequencies (e.g., 697 Hz + 1209 Hz = “1”). In Spectrogram view, DTMF tones appear as brief horizontal lines at two frequencies at the same time. Decode with external tools rather than manually reading frequencies:

$ multimon-ng -t wav -a DTMF mystery.wav
DTMF: 1
DTMF: 3
DTMF: 3
DTMF: 7
...
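
To show what "two simultaneous frequencies" means in practice, here is a small sketch that identifies one key from a chunk of samples using the Goertzel recurrence. It assumes the chunk contains exactly one clean tone pair. It does no silence detection or segmentation, which is the part multimon-ng handles for you. `goertzel_power` and `detect_key` are names chosen for illustration.

```python
import math

LOW = (697, 770, 852, 941)        # row frequencies in Hz
HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, rate, freq):
    """Signal power at one target frequency, via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, rate):
    """Pick the strongest row and column tone and map the pair to a keypad key."""
    low = max(LOW, key=lambda f: goertzel_power(samples, rate, f))
    high = max(HIGH, key=lambda f: goertzel_power(samples, rate, f))
    return KEYS[LOW.index(low)][HIGH.index(high)]
```

Goertzel checks only the eight frequencies DTMF uses, so a full FFT is unnecessary for this job.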

Pattern 6: SSTV signal

Images transmitted as audio using Slow-Scan Television encoding from amateur radio. Appears as regular striped diagonal patterns in Spectrogram view. Audacity can visualize but not decode — use qsstv or an online SSTV decoder to extract the image.

Pattern 7: Binary encoding in tone pulses

Two alternating tones represent 1s and 0s. Identify the two frequencies in Spectrogram, note their timing sequence, convert to binary, then ASCII. If decoding fails, try reversing the bit order within each byte, swapping which frequency is 1 vs 0, or reading the whole sequence right-to-left. CTF authors rarely document which convention they used.
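
Once the tones are written down as a string of 0s and 1s, trying each convention by hand is tedious. A small sketch that decodes all of them at once follows; the function names are hypothetical, and it assumes 8-bit characters.

```python
def bits_to_text(bits):
    """Pack a '0'/'1' string into bytes (MSB first) and decode as Latin-1."""
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8)).decode("latin-1")


def candidate_decodings(bits):
    """Decode under the conventions a challenge author might have used."""
    swapped = bits.translate(str.maketrans("01", "10"))
    per_byte_reversed = "".join(bits[i:i + 8][::-1] for i in range(0, len(bits), 8))
    return {
        "as read": bits_to_text(bits),
        "tones swapped": bits_to_text(swapped),
        "bit order reversed per byte": bits_to_text(per_byte_reversed),
        "whole sequence reversed": bits_to_text(bits[::-1]),
    }


for label, text in candidate_decodings("010000110101010001000110").items():
    print(f"{label}: {text!r}")
```

Usually only one candidate comes out as readable text. If none does, the encoding may use 7-bit ASCII or a different symbol length.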

Pattern 8: LSB steganography in WAV

Data hidden in the least significant bits of audio samples — imperceptible to ears but extractable with dedicated tools. The WAV format stores raw PCM samples: 16-bit audio means each sample is a number from -32768 to 32767. The last bit of each sample is effectively inaudible, so it can carry arbitrary data. Audacity shows only faint noise — you need external tools:

$ pip install stego-lsb
$ stegolsb wavsteg -r -i mystery.wav -o output.txt -n 1 -b 1000

Or with Python directly:

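import numpy as np
from scipy.io import wavfile

# Read PCM samples; stereo files come back as an (N, channels) array
rate, data = wavfile.read('mystery.wav')
# Keep only the least significant bit of each sample (interleaved for stereo)
bits = (data.flatten() & 1).astype(np.uint8)
# Pack the first 800 bits MSB-first into bytes and print them as text
payload = np.packbits(bits[:800]).tobytes()
print(payload.decode('latin-1'))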

Pattern 9: Data appended after audio content

Extra bytes appended at the end of the audio file — invisible in Audacity but detectable with binwalk. Check this before spending time in Audacity:

$ binwalk mystery.wav

DECIMAL       HEXADECIMAL     DESCRIPTION
0             0x0             RIFF (little-endian) data, WAVE audio
1048576       0x100000        Zip archive data, at least v2.0

$ dd if=mystery.wav of=hidden.zip bs=1 skip=1048576
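
The same check can be done without binwalk by comparing the size declared in the RIFF header with the actual file size. This is a sketch with a hypothetical helper name. It assumes the tool that wrote the file recorded a correct RIFF size, which is not guaranteed.

```python
import struct


def trailing_bytes(path):
    """Return any bytes found after the end of the RIFF container."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # Bytes 4..7 hold the container size, excluding the 8-byte RIFF header
    (riff_size,) = struct.unpack("<I", data[4:8])
    return data[8 + riff_size:]
```

A non-empty result starting with `PK\x03\x04` would match the Zip archive binwalk reports in the example above. Write the bytes to a file and inspect them with `file`.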

Full trial process: what I actually tried on morse_chal.wav

| Step | Action | Tool / Command | Result | Why it failed / succeeded |
| --- | --- | --- | --- | --- |
| 1 | File identification | `file morse_chal.wav` | RIFF WAVE, 16-bit mono, 44100 Hz | Confirmed audio format; nothing unusual at file level |
| 2 | String search | `strings morse_chal.wav \| grep flag` | No output | Flag was encoded in timing, not as ASCII bytes |
| 3 | Binwalk check | `binwalk morse_chal.wav` | Only WAV header found | No embedded files; audio only |
| 4 | Open in Audacity, switch to Spectrogram | Track dropdown → Spectrogram | Solid band at ~517 Hz, no structure | Wrong view for time-domain data: Morse lives in amplitude timing, not frequency content |
| 5 | Adjust window size and frequency range | Preferences → 1024 → 2048 → 4096, max 20000 Hz | Same uniform band, no hidden image | 15 minutes wasted; spectrogram adjustments can’t reveal time-domain patterns |
| 6 | Split stereo channels | Track → Split Stereo Track | Single mono channel (file was already mono) | Worth checking; took 10 seconds |
| 7 | Switch to Waveform view | Track dropdown → Waveform | Clear short/long pulse pattern immediately visible | Should have checked this at step 4; Morse code is a waveform pattern |
| 8 | Zoom in on pulses | Ctrl+Scroll to zoom time axis | Dot ≈ 0.11s, Dash ≈ 0.33s, 517 Hz carrier | Timing ratio confirmed 1:3 (standard Morse) |
| 9 | Decode manually + Python verification | Manual + scipy FFT | `WH47 H47H 60D W20U6H7` | Flag: `picoCTF{wh47_h47h_60d_w20u6h7}`, “What hath God wrought” in leetspeak |

Audacity vs other audio tools: how I actually decide

| Situation | First choice | Why not Audacity? |
| --- | --- | --- |
| Unknown audio file, first look | Audacity (waveform → spectrogram) | n/a |
| Decode Morse code automatically | multimon-ng | Audacity requires manual reading of pulse timing |
| Decode SSTV signal | qsstv / online decoder | Audacity can visualize SSTV stripes but can’t decode them |
| Decode DTMF tones programmatically | multimon-ng | Audacity requires manual frequency identification |
| Extract LSB steganography | wavsteg / stegolsb | Audacity has no LSB extraction feature |
| Detect embedded files in WAV | binwalk → then dd | Audacity only works at the audio layer |
| Batch process multiple files | SoX or FFmpeg | Audacity is GUI-only; no scripting without plugins |
| Morse code decoding from clean audio | Audacity (waveform) + multimon-ng | Audacity visualizes; multimon-ng decodes |
| Noise reduction before external tool | Audacity → export → external tool | Audacity’s noise reduction is good as a preprocessor |

Audacity is a visualizer and manual manipulator. It shows you things and lets you transform audio, but it decodes nothing automatically. Anything beyond looking, listening, and applying effects (Morse, DTMF, SSTV, LSB extraction, carving appended files) means handing off to an external tool.

Why frequency-domain thinking matters beyond CTF

The Morse Code flag was “What hath God wrought” — the first message Samuel Morse transmitted by telegraph in 1844, a biblical quote from Numbers 23:23. The CTF authors encoded it in leetspeak and hid it in a WAV file using the same principle Morse used in 1844: timing-based encoding over a carrier signal. The only difference is that the carrier here is a 517 Hz audio tone instead of an electrical pulse.

The spectrogram reveals information that the waveform can’t show you: data encoded in the frequency domain is imperceptible to a casual listener but mathematically present and extractable. This is the same reason radio engineers, malware analysts working on audio-based C2 channels, and steganography researchers all use spectrogram analysis. The WAV format stores raw PCM samples — the “audio” is just numbers, and those numbers can encode anything: an image in their frequency distribution, bits in their least significant positions, arbitrary bytes appended after the valid audio frames.

Understanding this — that audio data has both a time domain and a frequency domain, and that CTF challenges can hide data in either — is what separates a first-time audio forensics attempt from a systematic one.

My current first-three-minutes workflow

# Step 1: What's the file format?
file target.wav
strings target.wav | grep -i "flag\|ctf\|pico"

# Step 2: Check for appended data before opening Audacity
binwalk target.wav

# Step 3: Open in Audacity — Waveform FIRST (5 seconds)
# Look for: repeating pulse patterns → Morse or binary encoding
# If waveform shows random noise → switch to Spectrogram

# Step 4: Spectrogram (if waveform shows nothing)
# Track dropdown → Spectrogram
# Adjust: Window size 1024 → 2048 → 4096 until content is clear
# Adjust: Max frequency 8000 → 20000 if default range shows nothing

# Step 5: If spectrogram shows nothing obvious
# - Split stereo channels: Track → Split Stereo Track
# - Try invert + mix to reveal phase-cancelled content
# - Switch back to Waveform: look for Morse-like pulse patterns

# Step 6: If audio sounds manipulated
# - Effect → Reverse (backwards speech)
# - Effect → Change Speed → speed multiplier 0.5 (undoes double-speed audio)
# - Effect → Change Pitch → -12 semitones (octave-shifted)

# Step 7: If all else fails, export and use external tools
# - Morse: multimon-ng -t wav -a MORSE_CW target.wav
# - DTMF: multimon-ng -t wav -a DTMF target.wav
# - SSTV: qsstv or online decoder
# - LSB: stegolsb wavsteg -r -i target.wav -o output.txt -n 1 -b 1000

The critical change from my pre-Morse Code workflow: waveform before spectrogram. Not always — but for five seconds, to rule out the time-domain patterns that spectrogram view hides. On the Morse Code challenge, five seconds of waveform inspection would have saved fifteen minutes of spectrogram adjustment.