Synchronicity

Over the last year or so, as the user of both desktop systems and a laptop, I’ve had to deal with the challenge of trying to keep all of my data in sync between the machines I use, as well as maintaining a good, reliable backup of all my files.

I thought I might share how I solved these problems using a few homegrown tools, some web-based software, and a commercial service.

Some Background Info: My Setup

I use several machines on a daily basis. As of this writing, my two primary machines are an iMac and a PowerBook.

The iMac makes for the perfect desktop machine (much faster than the 1.8GHz Dual-G5 it replaced), and the PowerBook is the ideal mobile workstation. I use a few other machines from time to time as well, including one of my all-time favorites, a PowerMac G4 Cube, which became an excellent fileserver, backup, and testing machine once I installed a larger, faster hard drive. I still find myself in front of the Cube, as well as remote machines, every so often.

Goals

My goal here wasn’t just to have my files be available on any machine I use, but also to have a live backup of everything, so that the loss of any (or worse, several) machines would have virtually no impact on the integrity or availability of my data.

In the past, I’ve used Retrospect, Firewire drives, and the like to make local backup copies of everything. These measures have saved me time and time again, so I understand their value. But my solution here would be in addition to those methods. I’m talking about not having to worry about this anymore, about not being dependent on my own infrastructure for the integrity of my data, about being able to get to everything from anywhere at any time on any machine and know that it’s up to date.

Kinds of Data

I had about 50GB of data to manage. That’s actually a lot of stuff, so it made sense to spend a bit of time thinking about what kind of data I create and store on my machines. It turns out that it’s easy to categorize everything I have into five basic types of files:

  1. Source code (100MB)
  2. Music (34GB)
  3. Photos and images (9GB)
  4. Email (200MB)
  5. “Everything else” (5GB)
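
As a quick sanity check, the sizes in the list above can be totaled to see how close they come to the 50GB figure. This is only an illustrative sketch: the names `CATEGORY_SIZES_MB`, `total_gb`, and `share_percent` are made up for the example, and it assumes 1GB = 1000MB for simplicity.

```python
# Sizes from the list above, in megabytes.
# Assumption for this example: 1GB = 1000MB.
CATEGORY_SIZES_MB = {
    "Source code": 100,
    "Music": 34_000,
    "Photos and images": 9_000,
    "Email": 200,
    "Everything else": 5_000,
}


def total_gb(sizes_mb):
    """Return the combined size of all categories in gigabytes."""
    return sum(sizes_mb.values()) / 1000


def share_percent(sizes_mb, name):
    """Return one category's share of the total, as a percentage."""
    return 100 * sizes_mb[name] / sum(sizes_mb.values())


if __name__ == "__main__":
    print(f"Total: {total_gb(CATEGORY_SIZES_MB):.1f}GB")
    for name in CATEGORY_SIZES_MB:
        print(f"{name}: {share_percent(CATEGORY_SIZES_MB, name):.1f}%")
```

The total works out to 48.3GB, in line with the "about 50GB" estimate, and music alone makes up a little over 70% of it.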

The first category (Source code) and the last category (Everything else) are the ones I’m most worried about.

The term “Source code” might need a little qualification or explanation, because I’m using the term code a bit loosely here. This category would (obviously) include things like Ruby, PHP, and Java files. But I extend it to also include XHTML and CSS files, images, Photoshop files, even Word documents—anything that’s used to create a project (where tracking changes is important) would fit here.

“Everything else” is simply a category for the remaining files that don’t fit into the first four categories. I’m talking about passwords, bookmarks, address book entries. Also things like PDF files, archived documents, text files, fonts, legal documents, old proposals and RFPs, and supporting files that won’t be changed or edited anymore. Things I might need on any given machine or while I’m on the go.

I clone my drive using the amazing SuperDuper! (any word on the Universal binary? Dave? Bruce?). This takes care of backing up my Music and Photos, which I don’t need to be available on every machine or over the Internet. The music lives on both the iMac and an iPod video as well, so it’s relatively secure and portable. And when I want to share my photos, I upload them to Flickr.

Requirements

I realized that I’d probably need to come up with more than one solution for each of the different categories, not only because of the different types of files, but because of the different ways I work with them. In each case, the solution had to meet the following requirements:

  1. Redundancy
  2. Availability
  3. Synchronization
  4. Security

I was an IT guy for a good while, setting up systems, servers, and networks. I know how things should work, and I’m pretty picky about data integrity, redundancy, and availability. This had to be right.

So, what solutions did I come up with?

Managing Source Code

The decision to use Subversion was an obvious one. In my opinion, Subversion is the best solution out there for managing any type of source code, from Java to Ruby to XHTML to Photoshop files.

What is Subversion? Perhaps the Version Control with Subversion book says it best:

Subversion is a free/open-source version control system. That is, Subversion manages files and directories over time. [It] remembers every change ever made to your files and directories. This allows you to recover older versions of your data, or examine the history of how your data changed.

Subversion can access its repository across networks, which allows it to be used by people on different computers. It is a general system that can be used to manage any collection of files. For you, those files might be source code—for others, anything from grocery shopping lists to digital video mixdowns and beyond.

So here we have a free solution that does everything I’d need to manage my source, and would even allow me to open up my code to others for collaborative efforts and projects.

But there was a problem: Where would I set up the repository so that it would be available 24/7, securely, over the Internet?

My existing SVN repository was hosted on the G4 Cube here in my office. But it was far from super-redundant, and making it available across the Internet meant opening a hole in my firewall and potentially exposing the whole network. Sure, it’s a small risk, but why go there?

Further, while many web-hosting companies offer this kind of service, it’s usually an afterthought to their core business: delivering websites to the masses. And when it is offered, it’s not done securely over https. Rather, it’ll be wide open, using http. And if you’re like me, it’s unlikely that you, your customers, colleagues, and employers would be happy to know you’re sending their code out over the Internet in a completely unencrypted, exposed way, and that it might be stored without true redundancy.
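
The http versus https distinction is easy to check before pointing a client at a repository. The sketch below is a minimal illustration, not part of Subversion itself: the function name `is_encrypted_repo_url` and the example URLs are hypothetical, and it treats `https` and `svn+ssh` as the encrypted transports, since plain `http` and `svn` are not encrypted by default.

```python
from urllib.parse import urlparse

# URL schemes that encrypt traffic in transit.
# Plain "http" and "svn" are not encrypted by default.
ENCRYPTED_SCHEMES = {"https", "svn+ssh"}


def is_encrypted_repo_url(url):
    """Return True if the repository URL uses an encrypted transport."""
    return urlparse(url).scheme.lower() in ENCRYPTED_SCHEMES


if __name__ == "__main__":
    # Hypothetical repository URLs, for illustration only.
    for url in (
        "https://svn.example.com/repos/project/trunk",
        "http://svn.example.com/repos/project/trunk",
        "svn+ssh://svn.example.com/repos/project/trunk",
    ):
        status = "encrypted" if is_encrypted_repo_url(url) else "NOT encrypted"
        print(f"{url}: {status}")
```

A check like this only looks at the URL scheme; it says nothing about how the host stores or backs up the repository.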

Still, other questions remain. How frequently is the SVN repository backed up? Is it in a secure location? Can I export the data at will? Do I need to be a rocket scientist to get things set up and use the service?

I was chatting about this with my friend Duncan Davidson, and it turned out he had the same concerns, and the same needs. Then we realized we had the experience, the know-how, and even the hardware in place to set this up.

So in our spare time, we went ahead and set up a Subversion service for ourselves using solid hardware, running in a data center near Duncan’s place. Now our data is available from anywhere at any time. It’s been a challenge to automate and set up correctly, but it’s been fun, and is working out really well so far.

Source Code Storage Problem: solved.

Managing Email

Most people store their email in one of two places—they leave it on the server or they download it to their hard drive.

Storing email on the server, especially if you’re lucky enough to be able to use IMAP over SSL, is relatively secure and reliable. When using IMAP, you can see all of your mail on every machine you use. Most hosts back up email, as email services are usually a core part of what they do well. Further, most hosts provide web interfaces to their email, so remote access to the email archive is usually possible as well (assuming, again, you’re on board with IMAP).

But I’ve got a few email addresses, some on different hosting services, and I’d like to store everything in one place and use a single interface when I’m on the road. What I really needed was a central archive, independent of any specific host or email account. I’d use it to store all of my mail online, regardless of how it came in.

Enter .Mac. While most people probably think that .Mac is primarily a photo gallery service, one of its biggest strengths actually lies in its robust support of IMAP for email archival. And Apple uses SSL for both IMAP and webmail. Further, this is Apple we’re talking about. They’re doing great, and so is their .Mac service. They’re not about to go out of business, shut down, or sell out. They’re here to stay, and so is .Mac.

So now I’ve got all my email safely in one independent, secure, reliable location. Problem solved.

Everything Else

Deciding how to manage this category of “stuff” was a bit tougher, because on the one hand, I need these files on many machines or while I’m on the go, but they don’t see enough changes or activity to warrant them living in a Subversion repository.

I was already using .Mac for email management, and iSync works great for managing bookmarks, passwords, and address book entries.

So why not see if I could leverage the service here as well?

iDisk is a part of .Mac which allots you storage space on Apple’s Internet servers. They’re up 24/7, redundant, and well backed up, and they’ve got a nice-looking (and fast) web user interface that works on any machine (Mac or PC). This means I can use the Windows 95 machine in the hotel lobby to download and print a copy of the boarding pass I saved from my PowerBook to my iDisk the day before the flight. Perfect.

But here’s the great part. Gruber told me about a very cool iDisk feature which creates a copy of your iDisk on your computer. You can make changes to it at any time, even when you’re not connected to the Internet, and as soon as you are, the changes will be reconciled and synchronized automatically. Now every machine I use will have the same data automatically … and it’ll be available online as well wherever I am.
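
To make "reconciled and synchronized" a bit more concrete, here is a toy sketch of the general idea behind two-way sync. It is not how iDisk is implemented (that isn't documented here); it simply assumes each side is a snapshot mapping a file path to a `(modified_time, content)` pair and lets the newer copy win. The function name `reconcile` and the sample files are hypothetical.

```python
def reconcile(local, remote):
    """Merge two {path: (modified_time, content)} snapshots.

    For each path, the copy with the later modified_time wins, and a path
    present on only one side is carried over to the result. On a tie, the
    local copy is kept. Deletions and real conflicts (both sides edited
    since the last sync) are NOT handled in this toy version.
    """
    merged = {}
    for path in local.keys() | remote.keys():
        # Collect whichever copies exist, local first so it wins ties.
        candidates = [side[path] for side in (local, remote) if path in side]
        merged[path] = max(candidates, key=lambda entry: entry[0])
    return merged


if __name__ == "__main__":
    # Hypothetical snapshots: timestamps are plain integers for readability.
    local = {"boarding-pass.pdf": (200, "v2"), "notes.txt": (100, "draft")}
    remote = {"boarding-pass.pdf": (150, "v1"), "fonts.zip": (50, "fonts")}
    for path, (mtime, content) in sorted(reconcile(local, remote).items()):
        print(path, mtime, content)
```

After reconciling, both sides would hold the merged snapshot. A real sync tool also has to track deletions and detect edits made on both sides, which this sketch leaves out.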

Problem solved!

Conclusions

Yeah, I wound up spending a little bit of money … on both the SVN solution and .Mac. But so far, it’s been well worth it, and I couldn’t be happier with the solution.

I’ll be writing a follow-up here in a few months with my thoughts and feedback, as well as any new discoveries I’ve made.
