Dan Benjamin


How to Record a Podcast with People in Multiple Locations

I recently read Joel Spolsky’s excellent description of the StackOverflow podcast setup. Although I was impressed by his process and gear, I was also a little bit surprised by its complexity. I also realized that although I’ve written a Podcasting Equipment Guide, I haven’t explained how I actually use that equipment to record a podcast.

Over the last few years of podcasting, first with the Hivelogic Radio Show (on hiatus) and later with The Talk Show (which is actually not on hiatus), I’ve learned a lot about how to record and produce good-quality audio. Unfortunately, because of storage and bandwidth limitations, most podcasts are mixed down to mono and reduced to 32 or 64 kbps. At this level, they lose most of the detail and subtle nuance you’d find in a CD (or better) quality recording. That’s part of why podcasts do well when they focus on producing excellent content.

One of the questions people often ask is how we record The Talk Show with me here in Orlando and John in Philadelphia. Many people suspect that we record using Skype, Audio Hijack, Soundflower, or something similar.

In fact, we use a much more reliable, tried-and-true method that’s so simple, it just might surprise you.

The Golden Years

Recording with good equipment, like the kind I recommend in the Podcasting Equipment Guide, and then editing and mixing down with great care, can help to create the best possible result even when the person on the other end is using less-than-ideal equipment.

When recording the Hivelogic Radio Show, which was essentially an interview show, I was podcasting with people who generally didn’t have recording equipment of their own. They would often have to bark into the built-in microphones in their MacBooks or use the USB headset mics they’d purchased for VoIP and Skype calls. In either case, it made for bad audio quality. With some creative editing techniques I’d learned, I could clean up their audio and boost the levels while simultaneously bringing my own audio down a few notches. After a little bit of compression and a subtle noise gate to remove any hiss, the result would be a balanced, even podcast. Once reduced to a podcast-friendly bitrate, nobody could tell the difference.

But balancing different audio outputs isn’t what this article is about.

A Simple, Direct Way

Gruber and I both use a Shure SM7B, boosted with a PreSonus Tube Pre, connected to the Mac with an M-Audio Firewire Solo. We both use Freeverse SoundStudio 3 to record the audio. This is a solid, budget-minded but professional-level setup.

We found early on that recording the output from Skype (or iChat) was less than reliable, even when using great software like Rogue Amoeba’s amazing Audio Hijack Pro. Initially, I tried using the recorded Skype channels. Then I tried recording my own audio with SoundStudio 3, pulling Gruber’s audio from Audio Hijack, and mixing them back together. I actually tried many more variations than I’m detailing here. In the end, the audio was never as nice as when I recorded directly from my mic into SoundStudio 3 (or QuickTime Pro or GarageBand).

I was talking about this with my friend Ryan Irelan, and he let me in on an old recording-industry technique with an unfortunate name: the double-ender. Although traditionally used for television, the double-ender works just as well with audio, and it’s growing in popularity within the podcasting community because of its simplicity. It works like this:

An interviewer [...] would be videotaped conducting an interview via a long-distance phone call to the interviewee in another part of the world. This interviewee [...] would be videotaped as he was being interviewed. This videotape would then be sent to the interviewer’s city and synchronized with the videotape of the interviewer [...] and the higher-quality sound of the videotapes would be used instead of the telephone audio.

This is precisely what John and I do when recording The Talk Show, and it’s exactly what you should do any time you’re recording audio where the people involved are in different locations.

I record my audio. John records his audio. We talk to each other using Skype or iChat or the telephone; it doesn’t matter how we talk to each other, because we’re not recording the actual conversation, just our own sides. John then zips and uploads his audio, which I download and drop into a track in SoundStudio 3 (GarageBand would also work just fine).

Recording this way saves hours of time in post-production, because we end up with two high quality audio tracks that need almost no audio editing, allowing me (or Ryan when he’s doing the editing) to focus exclusively on the content.

Wondering how we sync up the audio? You’d be amazed how well a quick “3-2-1-Start” actually works.
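To make the sync step concrete, here is a minimal sketch of lining up two recordings on a spoken cue. It is only an illustration, not what SoundStudio does internally: it assumes each side has already been loaded as a list of mono samples at the same sample rate, and that you have found, by eye or ear, the sample index where “Start” lands in each recording. The function names `align_on_cue` and `mix_to_mono` are made up for this example.

```python
def align_on_cue(track_a, cue_a, track_b, cue_b):
    """Trim both tracks so that their cue points land on the same index.

    track_a, track_b: lists of mono samples at the same sample rate.
    cue_a, cue_b: sample index of the spoken "Start" in each track.
    """
    # Keep the same amount of lead-in before the cue in both tracks.
    lead_in = min(cue_a, cue_b)
    a = track_a[cue_a - lead_in:]
    b = track_b[cue_b - lead_in:]
    # Cut both to the length of the shorter one so they end together.
    length = min(len(a), len(b))
    return a[:length], b[:length]


def mix_to_mono(a, b):
    """Average two aligned tracks into a single mono track."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Toy data: the value 9 stands in for the "Start" cue.
dan = [0, 0, 9, 1, 2, 3]
john = [0, 9, 4, 5, 6, 7, 8]
dan_aligned, john_aligned = align_on_cue(dan, 2, john, 1)
mixed = mix_to_mono(dan_aligned, john_aligned)
```

In practice an audio editor does this when you drag one track until the two “Start” peaks line up; the sketch just shows that the whole operation is an offset and a trim.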

Comments

Jon ·

Simple, yet effective. I hadn't thought of that.

I'm curious though. Are you using headphones for the conversation so that it doesn't bleed over into the recording?

Dan Benjamin ·

@Jon - absolutely, we both have headphones on. Depending on how we're talking to each other, I'll either use a set of Samson headphones, or the iPhone's headphones.

David Fredin ·

Hi!

I've heard that people have had problems keeping the sound in sync when recording on two different computers, because of the small but real difference between the quartz crystals in the computers.

Does that affect you?

Ed ·

I would imagine the delay would be due to network congestion over Skype or iChat.

Me: Bonjour!
You hear me 1.2 seconds later, and then respond: Howdy!
I hear you 0.8 seconds later, and so on and so forth.

This seemingly would provide random delays in responses, or talking over each other. Probably minimal with a fast enough connection, but still a problem.

Erik J. Barzeski ·

I've been doing a similar thing, recording both ends, but unfortunately the software I use on my Mac to record the audio shortens segments during which I'm silent (i.e. listening to my partner). It does this despite the preference to do so being disabled. As a result, I still have to do a lot of editing of the timing because my recording is slightly compressed.

My podcast partner uses QuickTime to record his audio on Windows. Perhaps I'll have to use it to record it on the Mac. :-P

Good write-up. I'm making do with a Samson USB mic, so I envy your setup. :-)

Adam Lisagor ·

Is it obnoxious if I chime in? I know, but I will anyway.

For You Look Nice Today (I'm in LA, Scott is in Los Altos, and Merlin in SF), we use a nice Skype plug-in that can be had for $15 called Call Recorder. It records your Skype conversation but the magic is in how it lets you split out the different sides of the conversation into separate channels. So you end up with a file which you can open in QuickTime Pro and extract one channel with just your crisp, clean locally-recorded audio, and another with the other side of the conversation, mixed.

This is crucial for the syncing of the tracks because all you have to do, once you've been sent the other locally-recorded sides of the conversation, is match them up with your cruddy mixed version. No need for a 1-2-3-Go! or a goofy clap (which never works like it's supposed to because of delay). Just pick a word or a clever turn of phrase and line that sucker up!

The one gotcha (as @David Fredin mentions above) is that every once in a while, you'll have to deal with sync slippage. Maybe your recording disk farted or maybe someone used a different sample rate. It happens. In which case, you need only find the points where your sync is off (usually by no more than a fraction of a second), make a cut point, and line it up again.

That's how we do it anyway. Thanks for the post, Dan Benjamin. I love your podcast and can't wait for the new ones.
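The "line it up against the mixed version" step described in this comment can be approximated in code. The sketch below is only an illustration under stated assumptions: both inputs are short lists of mono samples at the same sample rate, and the name `best_offset` is hypothetical. It slides a locally recorded phrase along the mixed reference and keeps the shift with the highest raw correlation; real tools typically normalize the score and use faster methods.

```python
def best_offset(reference, clip, max_shift):
    """Return the shift (in samples) at which clip best matches reference.

    A result of N means clip lines up with reference starting at index N.
    Brute-force, unnormalized correlation; suitable for short excerpts only.
    """
    best_shift, best_score = 0, float("-inf")
    for shift in range(max_shift + 1):
        window = reference[shift:shift + len(clip)]
        if len(window) < len(clip):
            break  # clip would run past the end of the reference
        score = sum(r * c for r, c in zip(window, clip))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: the local phrase appears three samples into the mixed reference.
mixed_reference = [0, 0, 0, 1, -2, 3, -1, 0, 0]
local_phrase = [1, -2, 3, -1]
offset = best_offset(mixed_reference, local_phrase, max_shift=5)
```

Running the same search on a phrase near the end of the show and comparing the two offsets is one way to spot the slippage mentioned above.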

Jason Seifer ·

This is how we do it on my podcast also. Good article!

David Smith ·

For the D-1-3 show, we all record locally (on advice from Gruber), but since two of us are using Macs and one is using a PC, we do get the sync problem. Apparently it's much worse between Macs and PCs than between similar machines. I don't know why this is, now that Macs are using Intel chips, but there it is.

While editing, I have to go through and re-sync up the audio many times through the course of the show.

Andrew Woods ·

That makes a lot of sense, Dan. I have something that's been rattling in my brain every time you and John mention listeners' gripes regarding who's who in your podcast: why not give each participant a stereo bias? E.G. John slightly right channel and yourself slightly left channel (or vice versa). That way the next time someone gripes, you can simply say, "I'm the one on the left!" Seems easy enough to do given your simple setup. I figure you've already thought of this, but thought I'd throw it out there. The show's great, by the way!

Jacob ·

For my podcast we do the same thing and the initial sync is never an issue.

I do notice that it can get out of sync as our recording goes on, but it's simple enough to fix.

Also on the delay we don't notice anything with Skype. Part of that is setting up a static route on my router to allow Skype traffic straight through. You can read more about that here. http://creativebits.org/mac_os_x/improve_skype_sound_quality_0

Nicholas Tolson ·

Perfect example of K.I.S.S. Not only is it easier, it yields higher quality audio.

@Andrew - I like the stereo idea.

bud ·

You could have Dan or John say "I'm on the right", but you might be surprised how many people have their stereo image flipped, often through no fault of their own. So what one or the other would have to say is, "I'm the one slightly on this side, if you can even tell".

And there are people who do not even try for a stereo spread from their speakers, if the speakers themselves were not separated by design. I see a lot of people that stack their stereo speakers on top of each other.

Sandra ·

I thought everybody knew about this; why would you do it any other way?

As for stereo, I've usually listened to The Talk Show on my mono speaker (when I'm not going somewhere, in which case I might use stereo headphones). I love mono, it feels so diegetic. The music or the show becomes a part of my life, my room, instead of the other way around when you're immersed in stereo.

I spent hours in the hammock listening to the talk show with a mono speaker beside me this summer.

heather gold ·

Great piece and thanks Adam L for your comment and explanation. I've been trying to figure out how ylnt sounds so great for ages.

I have a trickier problem which is that I have different people on my show all the time, many of whom aren't geeks.
I've done it live for years and then recorded the old-fashioned way (audio + video), which has been really resource- and time-intensive.
But I've experimented with doing it remotely and with audio only. I've tried using TalkShoe, which gives OK results, but it does allow people to call in, which is about the tech level I can get from them.

What would you recommend for dealing with people who 1) don't have a computer or 2) can't handle the call recorder thing?

Pierre Lebeaupin ·

Yeah, I was wondering how you deal with sync slippage. It can happen very easily: suppose you sample at 44.1 kHz, but the sound card (or USB mic, since it's the one which does the sampling in this case) of one of you actually samples at 44 kHz. It won't be noticed in the individual recordings (such a 2.3‰ pitch shift can't be heard), but the recordings will desync from each other by 8 seconds each hour. Oops!
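The arithmetic in the paragraph above can be checked with a few lines of Python. This is just a worked example of the same calculation; the function name `drift_seconds_per_hour` is made up for illustration.

```python
def drift_seconds_per_hour(nominal_hz, actual_hz):
    """Desync accumulated per hour of real time when a device samples at
    actual_hz but its file is treated as if it were nominal_hz."""
    return 3600 * abs(nominal_hz - actual_hz) / nominal_hz


# 44.1 kHz requested, 44 kHz actually delivered.
relative_error = (44_100 - 44_000) / 44_100            # about 2.3 per mille
drift = drift_seconds_per_hour(44_100, 44_000)         # about 8 seconds
```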

PC software that has to deal with such things (audio and video players/recorders, since audio and video playback/recording are controlled by different clocks; streaming software, since packet delivery timing is controlled by the server; and of course software like Skype) knows this and compensates in various ways.

This issue burned me when I discovered it by testing at my job and was tasked with investigating it: it took me a while to figure out that the sound card in the test computer was actually recording at 8.1 kHz and not 8 kHz as asked (probably because it could only generate frequencies in multiples of 300 Hz, due to the way variable frequencies are usually generated by multiplying the frequency of a reference clock in a PLL). So trust me, it can happen, and yes, that badly (more than 1%!).

Notice that the transmission latency will hide this somewhat (as long as the desync does not grow larger than the latency), as it provides some margin for the sync. What I'd do if this happened to me would be to change the speed of one of the clips to be ever so slightly faster (or slower), so that not only the initial "start" but also a final sync point matches.
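As a rough sketch of that two-point correction: given the times of a sync point near the start and another near the end in both recordings, the required speed change is just the ratio of the two spans. The function names and the example timings below are hypothetical, chosen only to illustrate the idea.

```python
def stretch_factor(ref_start, ref_end, other_start, other_end):
    """Factor by which to lengthen (>1) or shorten (<1) the other clip so
    the span between its two sync points matches the reference clip.
    All arguments are times in seconds within each recording."""
    return (ref_end - ref_start) / (other_end - other_start)


def corrected_time(t, other_start, ref_start, factor):
    """Map a time in the other clip onto the reference timeline."""
    return ref_start + (t - other_start) * factor


# Example: the reference has sync points at 5 s and 3605 s, while the other
# clip has the same two moments at 2 s and 3594 s (it runs about 8 s short).
factor = stretch_factor(5.0, 3605.0, 2.0, 3594.0)
end_on_reference = corrected_time(3594.0, 2.0, 5.0, factor)
```

An audio editor's time-stretch or speed-change tool would apply the factor to the audio itself; the code only computes how much correction is needed.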

Chris Ilias ·

If you're wearing headphones to hear your own side of the conversation and listening to the phone call or Skype call, are you wearing two headphones, or does the hardware mix the two?

Andrew Woods ·

@ bud: I had tons of friends in college that stacked their stereo speakers on one another - even worse were the ones that had mommy-and-daddy-bought full-on surround systems, but piled all the speakers in the corner of their dorm room! Being a bit of an audiophile, I always begged them to let me fix it. And while I appreciate that most people either don't notice or don't care about stereo separation, at least it would give Dan and John a way to shut up those who complain about them sounding alike. (To be fair, I thought that as well the first couple of episodes I watched, but now I can tell them apart!)

Paul D. Waite ·

> Not only is it easier

I dunno: the double-ender does mean you have to edit together the two ends. That’s not necessarily a lot of work, but it is work you have to do every time you record a podcast.

Joel’s set-up seems focused on avoiding editing if at all possible.

Andrew Krzmarzick ·

What are your thoughts on using something like Talkshoe?

http://www.talkshoe.com

Clarence Coggins Crown Prince of Web 2.0 ·

Yes, I'm kinda like Andrew: why not use tech like conference call lines or other software sites that allow multiple call-in lines? This is great information and I will have to come back to it to study it more.

Jay Jennings ·

One reason not to use a conference call recording is quality. If each person is using a "real" microphone rather than whatever's built into a cell phone, house phone, etc., you're going to have better quality audio.

However, a conference call is a really good way to do an interview with someone who's not geeky enough to record their side of the conversation (or in situations where that just wouldn't be appropriate). You end up with (at least) one side of the conversation sounding like a phone call, but folks hear that on terrestrial radio all the time and so are used to it.

Mike Rose ·

We use Talkshoe for the weekly TUAW show but unfortunately the remote-side audio recording is truly, deeply crappy. We swap it out with a locally-recorded Skype call.

A double-ender config is certainly preferable for quality, and we've used it from time to time when we're not called in to TalkShoe. Since we sometimes have four or more participants on the Skype call, plus the TS-side chat, it's a bit daunting to have to collect and recombine all those local files.
