Can your ears really detect the phase of sound? – That would be kind of remarkable

The other day a friend asked the question “Can you hear phase?”.

More precisely, this question translates to: “Do your ears have the capacity to detect the time-varying acoustic pressure of a sound wave, or do they only respond to its amplitude envelope?”

To be even more precise, let’s represent a pressure wave as \(p(t) = A(t)\sin{(2\pi\nu t)}\), where \(\nu\) is the carrier frequency (pitch) of the sound wave, and \(A(t)\) is the amplitude, which can vary in time but usually slowly compared to \(\sin{(2\pi\nu t)}\).

Can your ears actually faithfully detect \(p(t)\), or do they only detect \(A(t)\) and \(\nu\)? This is exactly the same as asking whether your eyes detect the time variation of the electromagnetic field, or only the intensity and colour. With your eyes the answer is clear: they detect intensity and colour. No detector yet conceived can directly detect electric field variation at optical frequencies.

Anyway…

After a lot of discussion, thought experiments, and flip-flopping opinions, another friend suggested that I just test it experimentally, so that is what I did.

If you look up sound localisation on Wikipedia, it will tell you that your brain uses the sound phase information delivered by your two ears to help locate the source of a sound. So, is this true, or is it just another thing that sounds so reasonable that people accept it as true?

I’M HONESTLY NOT SURE.

Here is the test:

I generated a stereo sound signal where each ear hears the same frequency (\(\nu_{\rm left}=\nu_{\rm right}\)), with the same amplitude \(A_{\rm left}=A_{\rm right}={\rm constant}\), but with a phase difference that varies in time:

\(p_{\rm left}(t) = A\sin{(2\pi\nu t)}\) and

\(p_{\rm right}(t) = A\sin{(2\pi\nu t + \phi(t))}\), where \(\phi(t)=\pi\sin{(2\pi t/5)}\) sweeps between \(-\pi\) and \(\pi\) with a period of 5 seconds.

If you listen to the clip below with headphones in, you can clearly hear the apparent source of the sound move back and forth (\(\nu=400{\rm Hz}\)).

So, that seems pretty conclusive: your ears can detect the actual pressure as it varies in time and transmit this information to the brain, which can compare the two signals and make a guess about the direction of the source based on the phase difference.

But that actually seems pretty astonishing.

It means that information is being delivered from your ear to your brain at a rate of at least 400 times a second in the above example, and potentially much higher (I think I can still hear the direction variation at frequencies up to \(\nu\approx 1000{\rm Hz}\)). I just didn’t think that neural signals could really work at such high frequencies; after all, I tend not to “perceive” any difference between two very brief events, whether they last 100 milliseconds or 100 nanoseconds (for example, short flashes of light).
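As a rough sanity check on those timescales (a back-of-envelope calculation of my own):

# how fine must the timing be to resolve phase at these frequencies?
for f in (400.0, 1000.0):
    print(f"{f:.0f} Hz -> one full cycle every {1e3/f:.2f} ms")
# 400 Hz -> one full cycle every 2.50 ms
# 1000 Hz -> one full cycle every 1.00 ms
# resolving phase therefore means preserving timing structure
# well below a millisecond, all the way from ear to brain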

So, while that might be the end of the story, I do have a couple of alternative hypotheses for how your ears could be “detecting phase”.

  1. Exactly as above: your ear is a microphone and sends \(p(t)\) to your brain directly.
  2. Your ear is a microphone and detects \(p(t)\), but it doesn’t send \(p(t)\) to your brain. It first “mixes it down” with a “local oscillator”, which creates a much lower frequency signal which it can send, which still preserves all the phase information.
  3. Ears are not microphones; they only detect \(A(t)\) and \(\nu\). BUT it is possible that sound itself travels from one ear to the other through your head meat, where it then interferes with the sound that travelled around your head, which would change the amplitude \(A(t)\) in a way that depends on the phase difference. This would give an indirect way of determining phase (a quick numerical sketch of this follows the list).
  4. I have made an error and there is a flaw in the test.
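To make option 3 a bit more concrete, here is a minimal numerical sketch (entirely my own illustration – the 10% through-head leakage amplitude is a made-up number) of how a small phase-shifted leakage wave would modulate the amplitude at one ear:

import numpy as np

A = 1.0        # amplitude of the wave arriving around the head
eps = 0.1      # assumed amplitude of the wave leaking through the head
dphi = np.linspace(0, 2*np.pi, 9)  # phase difference between the two paths

# amplitude of the sum of two sinusoids of equal frequency with a phase offset:
# |A*e^{i*0} + eps*e^{i*dphi}| = sqrt(A^2 + eps^2 + 2*A*eps*cos(dphi))
A_total = np.sqrt(A**2 + eps**2 + 2*A*eps*np.cos(dphi))

for p, a in zip(dphi, A_total):
    print(f"dphi = {p:4.2f} rad -> combined amplitude = {a:.3f}")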

I think option 1 is probably, maybe, most likely, but again, I find it totally amazing that signals can be faithfully transmitted around your brain at frequencies as high as \(\sim 1\,{\rm kHz}\).

Option 2 seems pretty unlikely. Basically, it requires two high-precision clocks – one in each ear – ticking at exactly the same rate and never going out of sync. It is hard to think of a way to keep these clocks synchronised that doesn’t also involve the brain sending out high-frequency signals as in option 1. So option 2 (probably) has all the same amazing neural transfer frequencies as option 1, but with the added complexity of needing clocks and frequency mixers in each ear.
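For what it’s worth, here is a toy demonstration of what “mixing down” would mean (my own sketch; the 360 Hz local-oscillator frequency is arbitrary):

import numpy as np

sample_rate = 44100
t = np.arange(0, 1, 1/sample_rate)

f = 400.0      # incoming tone (Hz)
f_lo = 360.0   # assumed "local oscillator" frequency (Hz)
phi = np.pi/3  # the phase we want to survive the mixing

p = np.sin(2*np.pi*f*t + phi)  # incoming pressure signal
lo = np.sin(2*np.pi*f_lo*t)    # local oscillator
mixed = p * lo                 # contains f - f_lo (40 Hz) and f + f_lo (760 Hz)

# crude low-pass: a ~2 ms moving average suppresses the 760 Hz component
win = int(0.002 * sample_rate)
low = np.convolve(mixed, np.ones(win)/win, mode='same')

# what survives is ~0.5*cos(2*pi*(f - f_lo)*t + phi): a 40 Hz signal
# that still carries the original phase phi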

Option 3 seems plausible to me. It requires no high-frequency neural connections, and the ears themselves don’t need to be able to detect \(p(t)\). If you are a fan of option 2, then you could probably use the meat-transmitted sound wave as a way to sync up the two clocks, but that still seems less likely to me.

Option 4 is not unlikely. There are other factors that I didn’t fully consider. For example, by varying the phase you are also varying the instantaneous frequency that one ear hears (by \(\phi'(t)/2\pi\)). Maybe you are just perceiving this as the Doppler shift of an object passing you, which is exactly what the sinusoidal variation of \(\phi(t)\) would mimic. Indeed, if I change \(\phi(t)\) to a constant linear ramp, \(\phi(t)=2\pi t/5\), the position-changing effect is reduced, or maybe disappears entirely, as demonstrated by the clip below:

Honestly, I don’t think I can perceive any motion at all in this one.
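Putting numbers on the frequency change (the right ear’s instantaneous frequency is \(\nu + \phi'(t)/2\pi\)):

import numpy as np

# sinusoidal ramp: phi(t) = pi*sin(2*pi*t/5)
# frequency shift = phi'(t)/(2*pi) = (pi/5)*cos(2*pi*t/5)
print(f"sinusoidal ramp: shift oscillates within ±{np.pi/5:.2f} Hz")

# linear ramp: phi(t) = 2*pi*t/5
# frequency shift = phi'(t)/(2*pi) = 1/5 Hz, constant
print("linear ramp: constant 0.2 Hz shift -> the right ear just hears 400.2 Hz")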

So, I think I am still unconvinced either way. Maybe there is another test…

Can a single ear detect phase?

This is a question related to the one above, but with some differences. What I actually mean is: given a sound wave that contains two frequency components, each with constant amplitude, can you hear the difference if the relative phase of the two components is changed?

i.e. considering just a single ear, does \(p(t) = A\sin{(2\pi\nu_a t)} + B\sin{(2\pi\nu_b t + \phi)}\) sound identical, whatever the value of \(\phi\)?
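One reason to expect “yes, identical”: if the ear only measures the amplitude of each frequency component, the two waveforms are indistinguishable, since the magnitude spectrum does not depend on \(\phi\). A quick numerical check of my own:

import numpy as np

sample_rate = 44100
t = np.arange(0, 1, 1/sample_rate)  # one second = whole number of periods

def two_tone(phi):
    return np.sin(2*np.pi*440*t) + np.sin(2*np.pi*880*t + phi)

# magnitude spectra for phi = 0 and phi = pi/2
mag0 = np.abs(np.fft.rfft(two_tone(0)))
mag1 = np.abs(np.fft.rfft(two_tone(np.pi/2)))

print(np.allclose(mag0, mag1))  # True: the amplitude spectra match,
                                # only the phases differ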

To test, I made an audio clip where \(\phi=0\) for the first second, and then \(\phi=\pi/2\) for the next (\(\nu_a=440{\rm Hz}\), \(\nu_b=880{\rm Hz}\)). This cycle repeats a few times just to give you a chance to really listen for it. I smoothly reduced the amplitude to zero around each transition so as not to hear an audible “flick” when the phase is suddenly changed (which would introduce other frequency components). Ignoring the amplitude fading, this is what the waveform looks like:

Here is the resulting audio:

I don’t think I can hear any difference between the segments, but let me know if you can.

In conclusion, I still don’t know if ears can detect sound phase, but I’m leaning towards no. That was my first reaction when I first thought about it, and the tests I have done on myself seem pretty inconclusive. I’d love to know the answer, or to hear any other comments you have.

Science only works when people let each other know how they messed up, so let me know if I have!

– Rory

Update:

My friend has come up with a more convincing, more watertight test. It is a slight modification to my original directionality test: rather than ramping the phase of one signal in time (which changes the frequency), he just used a fixed phase offset between the two sine waves. By repeating short clips in which the ear with the phase offset is swapped, you get a very convincing impression that the source of the sound is jumping from left to right and back again. The phase offset used is calculated from the wavelength of the sound and the width of the head.
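The offset calculation goes something like this (my own sketch; the 0.2 m head width and 343 m/s speed of sound are assumed values, not necessarily what he used):

import numpy as np

f = 400.0          # tone frequency (Hz), as in the original test
c = 343.0          # speed of sound in air (m/s) -- assumed
head_width = 0.2   # ear-to-ear distance (m) -- assumed

wavelength = c / f                                 # ~0.86 m
phase_offset = 2*np.pi * head_width / wavelength   # extra phase at the far ear

print(f"wavelength   = {wavelength:.3f} m")
print(f"phase offset = {phase_offset:.2f} rad")    # ~1.47 rad

Here is the audio: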

 

So, to update my conclusion: YES, YOU CAN HEAR PHASE, and to me, that is truly remarkable!

Here is the Python code I used to generate the directionality test:

import numpy as np
from scipy.io import wavfile
pi = np.pi

f = 400                # tone frequency (Hz)
sample_rate = 44100
clip_time = 10         # seconds
t = np.arange(0, clip_time, 1/sample_rate)
phase_mod_period = 5   # seconds per phase-modulation cycle

left = np.sin(2*pi*t*f)
# sinusoidal ramp
# right = np.sin(2*pi*t*f + pi*np.sin(2*pi*t/phase_mod_period))
# linear ramp
right = np.sin(2*pi*t*f + 2*pi*t/phase_mod_period)

wave_data = np.stack([left, right], axis=1).astype('float32')
wavfile.write('stereo_phase_test.wav', sample_rate, wave_data)

And this generated the two-tone phase difference test:

from pylab import *
import numpy as np
from scipy.io import wavfile

def find_nearest(array, value):
    # return the index and value of the array element closest to `value`
    idx = (abs(array - value)).argmin()
    actual_val = array[idx]
    return idx, actual_val

f1 = 440   # fundamental (Hz)
T1 = 1/f1
f2 = 880   # second harmonic (Hz)

SAMPLE_RATE = 44100

clip_time = 10
# ~1 s segments, rounded to a whole number of f1 periods so every
# segment starts and ends at the same point in the cycle
segment_period = round(1/T1)*T1

t = np.arange(0, clip_time, 1/SAMPLE_RATE)
t_offset_both = -0.08E-3     # small time shift applied to both signals
t_0 = np.arange(0, clip_time, 1/SAMPLE_RATE) + t_offset_both
t_offset_pion2 = -0.196E-3   # extra shift for the phase-stepped signal
t_pion2 = np.arange(0, clip_time, 1/SAMPLE_RATE) + t_offset_pion2 + t_offset_both

# the two versions of the waveform: phi = 0 and phi = pi/2 on the 880 Hz tone
s_0 = 0.5*sin(2*pi*t_0*f1) + 0.5*sin(2*pi*t_0*f2)
s_pion2 = 0.5*sin(2*pi*t_pion2*f1) + 0.5*sin(2*pi*t_pion2*f2 + pi/2)

# slice out one segment of each version, then alternate them
idxcut0, tcut = find_nearest(t, segment_period)
seg0 = s_0[:idxcut0]
idxcut1, tcut = find_nearest(t, 2*segment_period)
seg1 = s_pion2[idxcut0:idxcut1]

all_segs = []
for i in range(int(clip_time/segment_period/2)):
    all_segs.append(seg0)
    all_segs.append(seg1)
s = concatenate(all_segs)

# Gaussian volume dips centred on each transition, so the sudden
# phase step itself never produces an audible click
envelope = ones(len(t))
fade_time = 0.02
for i in range(int(clip_time/segment_period)+1):
    tzero = i*segment_period
    envelope = envelope - exp(-(t-tzero)**2/(2*fade_time**2))

s_audio = s*envelope
wave_data = stack([s_audio, s_audio], axis=1).astype('float32')
wavfile.write('abrupt_phase_shift_with_fade_transition.wav', SAMPLE_RATE, wave_data)

# close-up plots of the waveform around each kind of transition
join_time = 1*segment_period

idxstart, tstart = find_nearest(t, join_time - 1/f1)
idxstop, tstop = find_nearest(t, join_time + 1/f1)
idxcut, tcut = find_nearest(t, join_time)
subplot(1,2,1)
title('ϕ=0 → ϕ=π/2 transition')
plot(t[idxstart:idxstop]*1E3, s[idxstart:idxstop], '--', linewidth = 4, label='generated audio')
plot(t[idxstart:idxstop]*1E3, s_0[idxstart:idxstop], label='ϕ=0')
plot(t[idxstart:idxstop]*1E3, s_pion2[idxstart:idxstop], label='ϕ=π/2')
plot([1E3, 1E3],[s.min(), s.max()], label='switch time' )
xlabel('t (ms)')
ylabel('Waveform value')
legend()

join_time = 2*segment_period
idxstart, tstart = find_nearest(t, join_time - 1/f1)
idxstop, tstop = find_nearest(t, join_time + 1/f1)
idxcut, tcut = find_nearest(t, join_time)
subplot(1,2,2)
title('ϕ=π/2 → ϕ=0 transition')
plot(t[idxstart:idxstop]*1E3, s[idxstart:idxstop], '--', linewidth = 4, label='generated audio')
plot(t[idxstart:idxstop]*1E3, s_0[idxstart:idxstop], label='ϕ=0')
plot(t[idxstart:idxstop]*1E3, s_pion2[idxstart:idxstop], label='ϕ=π/2')
plot([2E3, 2E3],[s.min(), s.max()], label='switch time' )
xlabel('t (ms)')
legend()
tight_layout()
show()

 
