I generally suggest a consistent approach: either use video or don't. Besides non-verbal cues, the advantages include a greater potential for synchronicity. In Cases I discuss the degree of focus, regardless of whether or not the tools allow for real-time exchange. Put simply, if you can see the person, you know they are there and paying attention, not trying to chat with three other people! You can also tell whether the person is thinking about the answer or doing something else.
I think being able to see each other helps to create a natural exchange and to build trust and rapport. So even if you are not specifically collecting data about non-verbal cues, facial expressions, etc., you can use the video to further your process.