Audacity in CTF audio forensics taught me something I didn’t expect: the first thing to do with an audio file is not to listen to it. I learned this the hard way on a CyLab Security Academy (formerly picoCTF) Morse Code challenge, where I opened the WAV file, hit play, and spent the next twenty minutes trying to decode something by ear — before realizing I should have been looking at the waveform view the entire time.
The Morse Code challenge that reset my approach
The challenge was straightforward in description: decode the audio file. The file was morse_chal.wav. I ran the standard first checks:
$ file morse_chal.wav
morse_chal.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
$ strings morse_chal.wav | grep -i "flag\|ctf\|pico"
(no output)
$ binwalk morse_chal.wav
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             RIFF audio data (WAV), PCM, 1 channels, 44100 sample rate
No embedded files. No plaintext flag. A clean 28.9-second mono WAV at 44100 Hz. So I opened it in Audacity — and made my first mistake: I switched to Spectrogram view immediately, because that’s what every CTF audio guide says to do first.
I spent fifteen minutes adjusting the spectrogram settings. Window size 1024, then 2048, then 4096. Maximum frequency 8000 Hz, then 20000 Hz. Different color schemes. Nothing. The spectrogram showed a concentrated band around 500 Hz with no visible structure — just a uniform carrier.
Then I switched back to Waveform view, and the pattern was immediately obvious: regular pulses with clear short and long durations. Morse code. I had been looking at it wrong the whole time.
After decoding manually from the waveform and confirming with a Python script (carrier frequency: 517 Hz), the message spelled out WH47 H47H 60D W20U6H7 — leetspeak for “What hath God wrought,” the first message Samuel Morse himself transmitted by telegraph in 1844. The flag: picoCTF{wh47_h47h_60d_w20u6h7}.
The lesson was not “spectrogram is wrong”; it was “spectrogram is one view among several, and the right view depends on how the data is hidden.” I had over-generalized from a previous challenge where the spectrogram revealed everything. This Morse signal was encoded in on/off timing, not in frequency content. Waveform view plots amplitude against time, which makes that timing obvious. Spectrogram view plots frequency content against time, which is what you want when the data is hidden in which frequencies are present.
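The verification script itself isn't reproduced in this write-up, so the following is only an assumed reconstruction of the carrier check. It uses NumPy's FFT (not SciPy's), and the function name `dominant_frequency` is made up for illustration.

```python
import numpy as np


def dominant_frequency(samples, rate):
    """Return the frequency (Hz) of the strongest FFT bin in a mono signal."""
    samples = np.asarray(samples, dtype=float)
    samples = samples - samples.mean()             # remove any DC offset
    windowed = samples * np.hanning(len(samples))  # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return float(freqs[np.argmax(spectrum)])
```

With samples loaded through `scipy.io.wavfile.read`, calling `dominant_frequency(data, rate)` on a keyed single-tone file should land near the carrier. The resolution is limited to roughly `rate / len(samples)` Hz.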
Getting Audacity ready: the two settings that actually matter
Switch views intentionally, not by reflex
The track dropdown in Audacity lets you switch between Waveform and Spectrogram. Most CTF guides say “switch to Spectrogram immediately.” After the Morse Code incident, I now look at Waveform first for five seconds — checking for the kind of regular amplitude pattern that signals Morse or binary encoding — before switching to Spectrogram.
If the waveform shows random-looking noise with no structure, then Spectrogram is the right next step. If it shows repeating pulse patterns, stay in Waveform.
Tune the spectrogram settings for CTF challenges
Access Edit → Preferences → Spectrograms and adjust:
- Window size: Begin with 1024; increase to 2048 or 4096 for sharper frequency resolution. At 512 the spectrogram is blurry. At 4096 it’s sharp but time resolution drops — you’ll miss short events. For static images hidden in audio, 4096 is usually correct.
- Maximum frequency: Default is 8000 Hz. If the flag uses high-frequency components (some SSTV signals, ultrasonic encoding), expand to 20000 Hz. On the Morse Code challenge, this adjustment revealed nothing because the signal was all time-domain — a reminder that not every setting change helps.
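- Color scheme: A color scheme typically reveals hidden images better than grayscale. The scheme names vary by Audacity version; recent releases offer “Color (default)”, “Color (classic)”, “Grayscale”, and “Inverse grayscale”.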
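The window-size trade-off above can be put in rough numbers. As a sketch that ignores window shape and overlap, an N-sample window at sample rate fs gives bins about fs/N Hz wide and covers N/fs seconds of audio. The function name is illustrative and has nothing to do with Audacity's internals.

```python
def window_resolution(sample_rate, window_size):
    """Approximate spectrogram resolution for one FFT window.

    Returns (bin width in Hz, window length in milliseconds).
    Ignores window shape and overlap, so treat the numbers as rough guides.
    """
    freq_res_hz = sample_rate / window_size
    time_res_ms = 1000.0 * window_size / sample_rate
    return freq_res_hz, time_res_ms


if __name__ == "__main__":
    for size in (512, 1024, 2048, 4096):
        hz, ms = window_resolution(44100, size)
        print(f"window {size:>4}: ~{hz:5.1f} Hz per bin, ~{ms:5.1f} ms per window")
```

At 44100 Hz a 4096-sample window spans about 93 ms, which is close to the ~0.11 s dot length measured on the Morse Code challenge. Events that short start to smear at that setting.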
Nine CTF patterns in audio forensics, ranked by how often I encounter them
Pattern 1: Hidden image in spectrogram (most common)
Flags, QR codes, barcodes, or images embedded in the frequency domain — invisible in Waveform view, immediately visible in Spectrogram with the right window size. This is why the “spectrogram first” rule exists. If you see a suspicious region at the top of the spectrogram that looks cut off, expand the maximum frequency range — some challenges deliberately put the content above the default 8000 Hz ceiling.
If a QR code appears in the spectrogram, screenshot it for scanning. Cleanup in an image editor (adjusting contrast, converting to black and white) may be necessary before a barcode scanner can read it.
Pattern 2: Morse code in the waveform
As in the Morse Code challenge above: short and long pulses encode dots and dashes. The key insight is that Morse code is a time-domain pattern — it’s about when the signal is on and off, not what frequencies are present. The Spectrogram shows you a carrier frequency exists, but the Waveform shows you the timing structure.
# In Audacity Waveform view:
# 1. Zoom in on the time axis (Ctrl+Scroll)
# 2. Look for repeating short/long pulses
# 3. Amplify if pulses are hard to distinguish (Effect → Amplify)
# 4. Time the shortest pulse: that's your dot duration
# Short pulse ≈ dot (.), Long pulse ≈ dash (-)
The Morse Code challenge had a carrier at 517 Hz. The pulses were ~0.11s for dots and ~0.33s for dashes. Once I had that timing, decoding was straightforward. For longer messages, use multimon-ng to decode automatically rather than doing it by hand (its -t wav input type relies on SoX being installed).
$ multimon-ng -t wav -a MORSE_CW morse_chal.wav
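If multimon-ng isn't available, the manual timing method can be sketched in plain Python. This is a minimal sketch, not the script used on the challenge. It assumes clean on-off keying, treats the shortest tone as one dot, and uses arbitrary thresholds (2× a dot to separate dot from dash and to detect a letter gap, 5× for a word gap). All function names are hypothetical.

```python
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z", "-----": "0", ".----": "1", "..---": "2",
    "...--": "3", "....-": "4", ".....": "5", "-....": "6", "--...": "7",
    "---..": "8", "----.": "9",
}


def on_off_runs(samples, rate, chunk_ms=10):
    """Collapse samples into [is_on, duration_seconds] runs via a chunked envelope."""
    chunk = max(1, int(rate * chunk_ms / 1000))
    # Mean absolute amplitude per chunk approximates the signal envelope.
    levels = [
        sum(abs(s) for s in samples[i:i + chunk]) / chunk
        for i in range(0, len(samples) - chunk + 1, chunk)
    ]
    threshold = max(levels) / 2
    runs = []
    for level in levels:
        is_on = level > threshold
        if runs and runs[-1][0] == is_on:
            runs[-1][1] += chunk / rate
        else:
            runs.append([is_on, chunk / rate])
    return runs


def decode_morse(samples, rate):
    """Decode on-off keyed Morse, taking the shortest tone as one dot."""
    runs = on_off_runs(samples, rate)
    dot = min(duration for is_on, duration in runs if is_on)
    text, symbol = [], ""
    for is_on, duration in runs:
        if is_on:
            symbol += "." if duration < 2 * dot else "-"
        elif duration >= 2 * dot and symbol:
            text.append(MORSE.get(symbol, "?"))  # a letter gap closes the symbol
            symbol = ""
            if duration >= 5 * dot:
                text.append(" ")                 # a longer gap separates words
    if symbol:
        text.append(MORSE.get(symbol, "?"))
    return "".join(text).strip()
```

With SciPy installed, something like `rate, data = wavfile.read('morse_chal.wav')` followed by `decode_morse(data.tolist(), rate)` would feed it a mono file. Noisy recordings or variable keying speed will likely need a smarter threshold.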
Pattern 3: Reversed or speed-manipulated audio
Speech or tones recorded backwards, at double speed, or pitch-shifted. Apply Effect → Reverse, Effect → Change Speed, Effect → Change Tempo, or Effect → Change Pitch as needed. Speed and tempo are different: Change Speed affects both pitch and duration; Change Tempo preserves pitch. For most CTF challenges, Change Speed is what you want.
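The same two transformations can be scripted when there are many files to try. Below is a rough sketch using only Python's standard `wave` module, assuming uncompressed PCM input. The function names are invented for this example. Note that `change_speed` simply relabels the sample rate instead of resampling as Audacity does, which shifts speed and pitch together on playback just as Change Speed does.

```python
import wave


def reverse_wav(src_path, dst_path):
    """Write a copy of a PCM WAV file with its frames in reverse order."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        raw = src.readframes(src.getnframes())
    # One frame holds one sample for every channel; keep frames intact.
    frame_size = params.sampwidth * params.nchannels
    frames = [raw[i:i + frame_size] for i in range(0, len(raw), frame_size)]
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(b"".join(reversed(frames)))


def change_speed(src_path, dst_path, factor):
    """Replay the same samples at factor x the original rate (pitch shifts too)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        raw = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.setframerate(int(params.framerate * factor))
        dst.writeframes(raw)
```

For example, `change_speed('fast.wav', 'normal.wav', 0.5)` would halve the playback speed of a clip that was recorded at double speed (the file names are placeholders).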
Pattern 4: Stereo channel hiding
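Data hidden in one channel of a stereo file. Use Track → Split Stereo Track to examine each channel independently. The invert-and-mix technique reveals content that differs between the two channels (for example, a signal that is phase-inverted in one of them) and is otherwise masked by the shared audio:
- Split the stereo track into two mono tracks (track dropdown → Split Stereo to Mono)
- On one channel: Effect → Invert
- Select both → Tracks → Mix → Mix and Render
Whatever is identical in both channels cancels, leaving only the difference. Inverting a duplicate of the same track and mixing it back would cancel everything and leave silence, so the two tracks have to be the left and right channels.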
On the Morse Code challenge there was nothing to split: the file was mono to begin with. Worth the ten seconds to check regardless.
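Numerically, inverting one channel and mixing it with the other is a per-sample subtraction, L minus R. Here is a small sketch of that idea on interleaved stereo samples. The function name and the 0.5 scale factor (used to keep the result in range) are my own choices for illustration.

```python
def channel_difference(interleaved, scale=0.5):
    """Return (L - R) * scale for interleaved stereo samples [L0, R0, L1, R1, ...].

    Content that is identical in both channels cancels to zero; anything
    present in only one channel (or phase-inverted between them) survives.
    """
    left = interleaved[0::2]
    right = interleaved[1::2]
    return [int((l - r) * scale) for l, r in zip(left, right)]
```

With NumPy arrays from `scipy.io.wavfile.read`, the equivalent is subtracting column 1 from column 0 after casting to a wider integer type so that 16-bit samples cannot overflow.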
Pattern 5: DTMF tones
Each phone keypad key produces two simultaneous frequencies (e.g., 697 Hz + 1209 Hz = “1”). In Spectrogram view, DTMF tones appear as brief horizontal lines at two frequencies at the same time. Decode with external tools rather than manually reading frequencies:
$ multimon-ng -t wav -a DTMF mystery.wav
Output:
DTMF: 1
DTMF: 3
DTMF: 3
DTMF: 7
...
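To see what a DTMF decoder does under the hood, here is a minimal sketch based on the Goertzel algorithm, which measures signal power at a single frequency. It is not how multimon-ng is implemented. It assumes the block of samples passed in contains exactly one key press with no silence detection, and the helper names are hypothetical.

```python
import math

LOW = (697, 770, 852, 941)        # row frequencies in Hz
HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, rate, freq):
    """Signal power at one frequency, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, rate):
    """Return the keypad character whose row and column tones are strongest."""
    row = max(range(4), key=lambda i: goertzel_power(samples, rate, LOW[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, rate, HIGH[i]))
    return KEYS[row][col]
```

To decode a whole file this way, the audio would first have to be cut into one segment per tone, for instance at the silent gaps visible in Waveform view.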
Pattern 6: SSTV signal
Images transmitted as audio using Slow-Scan Television encoding from amateur radio. Appears as regular striped diagonal patterns in Spectrogram view. Audacity can visualize but not decode — use qsstv or an online SSTV decoder to extract the image.
Pattern 7: Binary encoding in tone pulses
Two alternating tones represent 1s and 0s. Identify the two frequencies in Spectrogram, note their timing sequence, convert to binary, then ASCII. If decoding fails, try swapping which frequency is 1 vs 0, reversing the bit order within each byte (LSB-first instead of MSB-first), or reading the whole sequence back to front. CTF authors rarely document which convention they used.
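Trying each convention by hand gets tedious, so it can help to generate all the candidates at once and look for the one that reads as text. A minimal sketch, assuming the tones have already been transcribed into a string of '0' and '1' characters. Both function names are made up for this example.

```python
def bits_to_text(bits):
    """Pack a string of '0'/'1' characters into bytes (MSB first) and decode."""
    usable = len(bits) - len(bits) % 8   # drop any incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("latin-1")


def candidate_decodings(bits):
    """Yield (label, text) for the common conventions a challenge might use."""
    flipped = bits.translate(str.maketrans("01", "10"))
    per_byte_reversed = "".join(
        bits[i:i + 8][::-1] for i in range(0, len(bits), 8)
    )
    yield "as read", bits_to_text(bits)
    yield "0/1 swapped", bits_to_text(flipped)
    yield "bit order reversed per byte", bits_to_text(per_byte_reversed)
    yield "whole stream reversed", bits_to_text(bits[::-1])
```

Looping over `candidate_decodings(bits)` and printing each label with its text usually makes the right convention obvious. Combinations (for example swapped and reversed) are not covered here.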
Pattern 8: LSB steganography in WAV
Data hidden in the least significant bits of audio samples — imperceptible to ears but extractable with dedicated tools. The WAV format stores raw PCM samples: 16-bit audio means each sample is a number from -32768 to 32767. The last bit of each sample is effectively inaudible, so it can carry arbitrary data. Audacity shows only faint noise — you need external tools:
$ pip install stego-lsb
$ stegolsb wavsteg -r -i mystery.wav -o output.txt -n 1 -b 1000
Or with Python directly:
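from scipy.io import wavfile
import numpy as np

# Read the PCM samples; stereo files come back as a 2-D array, so flatten them
rate, data = wavfile.read('mystery.wav')
# Keep only the least significant bit of each sample
bits = (data.flatten() & 1).astype(np.uint8)
# Pack the first 800 bits into bytes (MSB first) and print them as text
payload = np.packbits(bits[:800]).tobytes()
print(payload.decode('latin-1'))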
Pattern 9: Data appended after audio content
Extra bytes appended at the end of the audio file — invisible in Audacity but detectable with binwalk. Check this before spending time in Audacity:
$ binwalk mystery.wav
DECIMAL       HEXADECIMAL     DESCRIPTION
0             0x0             RIFF (little-endian) data, WAVE audio
1048576       0x100000        Zip archive data, at least v2.0
$ dd if=mystery.wav of=hidden.zip bs=1 skip=1048576
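A complementary check that doesn't depend on binwalk recognising a signature: the RIFF header stores the container size, so any bytes past that point are not part of the WAV. The sketch below assumes the header size field is honest, which a challenge author could have tampered with. The function name is illustrative.

```python
import struct


def trailing_bytes(path):
    """Return bytes stored after the end of the RIFF container, if any.

    The 4-byte little-endian size at offset 4 counts everything after
    the first 8 bytes, so the container ends at size + 8.
    """
    with open(path, "rb") as f:
        blob = f.read()
    if blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    (riff_size,) = struct.unpack("<I", blob[4:8])
    return blob[riff_size + 8:]
```

An empty result means nothing was appended past the declared container. Data hidden inside extra chunks within the RIFF structure would not show up this way.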
Full trial process: what I actually tried on morse_chal.wav
| Step | Action | Tool / Command | Result | Why it failed / succeeded |
|---|---|---|---|---|
| 1 | File identification | file morse_chal.wav | RIFF WAVE, 16-bit mono, 44100 Hz | Confirmed audio format — nothing unusual at file level |
| 2 | String search | strings morse_chal.wav \| grep flag | No output | Flag was encoded in timing, not as ASCII bytes |
| 3 | Binwalk check | binwalk morse_chal.wav | Only WAV header found | No embedded files — audio only |
| 4 | Open in Audacity, switch to Spectrogram | Track dropdown → Spectrogram | Solid band at ~517 Hz, no structure | Wrong view for time-domain data — Morse lives in amplitude timing, not frequency content |
| 5 | Adjust window size and frequency range | Preferences → 1024 → 2048 → 4096, max 20000 Hz | Same uniform band, no hidden image | 15 minutes wasted — spectrogram adjustments can’t reveal time-domain patterns |
| 6 | Split stereo channels | Track → Split Stereo Track | Single mono channel (file was already mono) | Worth checking — took 10 seconds |
| 7 | Switch to Waveform view | Track dropdown → Waveform | Clear short/long pulse pattern immediately visible | Should have checked this at step 4 — Morse code is a waveform pattern |
| 8 | Zoom in on pulses | Ctrl+Scroll to zoom time axis | Dot ≈ 0.11s, Dash ≈ 0.33s, 517 Hz carrier | Timing ratio confirmed 1:3 (standard Morse) |
| 9 | Decode manually + Python verification | Manual + scipy FFT | WH47 H47H 60D W20U6H7 | Flag: picoCTF{wh47_h47h_60d_w20u6h7} — “What hath God wrought” in leetspeak |
Audacity vs other audio tools: how I actually decide
| Situation | First choice | Why not Audacity? |
|---|---|---|
| Unknown audio file, first look | Audacity (waveform → spectrogram) | — |
| Decode Morse code automatically | multimon-ng | Audacity requires manual reading of pulse timing |
| Decode SSTV signal | qsstv / online decoder | Audacity can visualize SSTV stripes but can’t decode them |
| Decode DTMF tones programmatically | multimon-ng | Audacity requires manual frequency identification |
| Extract LSB steganography | wavsteg / stegolsb | Audacity has no LSB extraction feature |
| Detect embedded files in WAV | binwalk → then dd | Audacity only works at the audio layer |
| Batch process multiple files | SoX or FFmpeg | Audacity is GUI-only — no scripting without plugins |
| Morse code decoding from clean audio | Audacity (waveform) + multimon-ng | Audacity visualizes; multimon-ng decodes |
| Noise reduction before external tool | Audacity → export → external tool | Audacity’s noise reduction is good as a preprocessor |
Audacity is a visualizer and manual manipulator. It shows you things and lets you transform audio, but it decodes nothing automatically. Beyond inspecting the waveform or spectrogram and applying effects such as reverse, speed change, or invert-and-mix, every pattern above requires handing off to an external tool.
Why frequency-domain thinking matters beyond CTF
The Morse Code flag was “What hath God wrought” — the first message Samuel Morse transmitted by telegraph in 1844, a biblical quote from Numbers 23:23. The CTF authors encoded it in leetspeak and hid it in a WAV file using the same principle Morse used in 1844: timing-based encoding over a carrier signal. The only difference is that the carrier here is a 517 Hz audio tone instead of an electrical pulse.
The spectrogram reveals information that the waveform can’t show you: data encoded in the frequency domain is imperceptible to a casual listener but mathematically present and extractable. This is the same reason radio engineers, malware analysts working on audio-based C2 channels, and steganography researchers all use spectrogram analysis. The WAV format stores raw PCM samples — the “audio” is just numbers, and those numbers can encode anything: an image in their frequency distribution, bits in their least significant positions, arbitrary bytes appended after the valid audio frames.
Understanding this — that audio data has both a time domain and a frequency domain, and that CTF challenges can hide data in either — is what separates a first-time audio forensics attempt from a systematic one.
My current first-three-minutes workflow
# Step 1: What's the file format?
file target.wav
strings target.wav | grep -i "flag\|ctf\|pico"

# Step 2: Check for appended data before opening Audacity
binwalk target.wav

# Step 3: Open in Audacity, Waveform FIRST (5 seconds)
#   Look for: repeating pulse patterns → Morse or binary encoding
#   If waveform shows random noise → switch to Spectrogram

# Step 4: Spectrogram (if waveform shows nothing)
#   Track dropdown → Spectrogram
#   Adjust: Window size 1024 → 2048 → 4096 until content is clear
#   Adjust: Max frequency 8000 → 20000 if default range shows nothing

# Step 5: If spectrogram shows nothing obvious
#   - Split stereo channels: Track → Split Stereo Track
#   - Try invert + mix to reveal phase-cancelled content
#   - Switch back to Waveform: look for Morse-like pulse patterns

# Step 6: If audio sounds manipulated
#   - Effect → Reverse (backwards speech)
#   - Effect → Change Speed → 50% (double-speed audio)
#   - Effect → Change Pitch → -12 semitones (octave-shifted)

# Step 7: If all else fails, export and use external tools
#   - Morse: multimon-ng -t wav -a MORSE_CW target.wav
#   - DTMF: multimon-ng -t wav -a DTMF target.wav
#   - SSTV: qsstv or online decoder
#   - LSB: stegolsb wavsteg -r -i target.wav -o output.txt -n 1 -b 1000
The critical change from my pre-Morse Code workflow: waveform before spectrogram. Not always — but for five seconds, to rule out the time-domain patterns that spectrogram view hides. On the Morse Code challenge, five seconds of waveform inspection would have saved fifteen minutes of spectrogram adjustment.
Further Reading
If you’re building out your audio forensics toolkit, the CTF Forensics Tools: The Ultimate Guide for Beginners covers Audacity alongside the full set of tools used in Forensics challenges — from disk imaging to steganography.
For the pattern of data embedded across a file’s raw byte structure rather than its audio content, the binwalk guide covers scanning for embedded files, reading offset output, and extracting content with dd.
When the audio challenge involves a WAV file with data appended past the audio frames, the dd in CTF forensics guide covers byte-level extraction for challenges that go deeper than the audio layer.
The FFmpeg guide complements Audacity for cases where you need to batch-process audio files, extract frames from video, or convert between formats before opening in Audacity.