This is a bit more of an explanation than what Phil had mentioned, but basically along the same lines.

There are a few factors to consider. First, have you verified that the audio and video are in sync in the post software that you output from? Sounds like a stupid question, but it's worth checking. I'm sure you probably have.

Something else to look at is the machine performance. When you run this, are you only running one layer? What is the resolution? What kind of hardware? It may not be a hardware issue, though, and your performance could be fine. In that case, the video is simply slipping relative to the audio.

Consider that the video portion of the media is playing at one speed and the audio at another. Here's why:

A media file has two parts, the audio and the video. While they are read at the same time, they are not linked except at their starting point. Think of it like a 10-meter steel cable and a 10-meter bungee cord attached at the same point, each with a weight at the very end. The longer the lengths, the greater the difference between the bungee and the steel cable.

In a computer, audio is given the highest priority, comparable to the steel cable. It will run frame-accurate at the rate it was recorded. However, the video will not run at its default frame rate unless deliberately specified. It's like the bungee cord.

The video runs at a multiple of the video card's refresh rate. Basically, 60 Hz means 30 fps. That is, unless you force it by using the speed control in Catalyst, or you're using a device that genlocks the frame rate to exactly what is specified.

We know a media server does not run at 59.94 Hz. It runs at 60 (actually not exactly 60, but definitely not 59.94 either).

Here's an explanation of what you might be seeing. If the video is 5 minutes long (300 seconds) and the post software outputs the final at 29.97 fps progressive, then it would have 8991 frames (300 x 29.97).

Since the video card counts single frames of playback at 30 fps, you will complete 8991 frames in 299.7 seconds, not 300. This means that you see a 9-frame difference (0.3 seconds) between the audio and the video after 5 minutes.
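To make the arithmetic above easy to repeat for other clip lengths, here is a minimal sketch in Python. The function name av_drift and its parameters are made up for illustration, and it assumes the source rate is exactly 29.97 fps as in the example (real NTSC material is 30000/1001, which differs very slightly).

```python
from fractions import Fraction


def av_drift(duration_s, source_fps, playback_fps):
    """Estimate how far video leads audio when every frame is played at playback_fps.

    Returns (frame_count, playback_seconds, drift_seconds, drift_frames).
    Fractions are used so 29.97 is treated as the exact decimal value.
    """
    duration = Fraction(str(duration_s))
    src = Fraction(str(source_fps))
    play = Fraction(str(playback_fps))

    frame_count = duration * src              # frames rendered by the post software
    playback_seconds = frame_count / play     # time the video card needs to show them
    drift_seconds = duration - playback_seconds
    drift_frames = drift_seconds * play       # drift expressed in playback frames
    return frame_count, playback_seconds, drift_seconds, drift_frames


if __name__ == "__main__":
    frames, seconds, drift_s, drift_f = av_drift(300, 29.97, 30)
    print(float(frames), float(seconds), float(drift_s), float(drift_f))
```

For the 5-minute example this gives 8991 frames played in 299.7 seconds, which is a 0.3 second (9 frame) lead of video over audio.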

If you're running through an image processor, scaler, switcher, etc., then you lose a few frames already. It would definitely be noticeable after a few minutes.

If you can render the file again from the post software so that the video is exactly 30 fps, then it should work. Basically, what this would be doing is introducing one new duplicate frame roughly every 1000 frames (about 33 seconds), which adds up to the 9 missing frames over 5 minutes.

I've actually done this manually in the past, just experimenting. Literally adding duplicate frames to a video I no longer had the source for.
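If you want to work out where the duplicates would go, here is a rough sketch in Python. It is not what any post software actually does internally, just one simple way to spread the repeats evenly; the function name duplicate_schedule is made up for illustration.

```python
def duplicate_schedule(source_frames, target_frames):
    """Map each output frame to a source frame index, repeating frames evenly.

    The result has target_frames entries. Every source frame is used at least
    once, and (target_frames - source_frames) of them are used twice.
    """
    if target_frames < source_frames:
        raise ValueError("target must not be shorter than source")
    # Integer math avoids rounding surprises with floating point.
    return [i * source_frames // target_frames for i in range(target_frames)]


if __name__ == "__main__":
    mapping = duplicate_schedule(8991, 9000)
    # Output positions where the same source frame is shown a second time.
    repeats = [i for i in range(1, len(mapping)) if mapping[i] == mapping[i - 1]]
    print(len(mapping), len(repeats))
    print(repeats)
```

For the 5-minute example, 8991 source frames get stretched to 9000 output frames, with the 9 repeats landing about 1000 frames apart.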