Around six months ago, we took a serious look at supporting Skype in our IVR toolkit. Confident in our Xtend IVR 3.0 design, which provides a driver approach capable of supporting any voice card, we ventured forth to see how complicated actual Skype integration would be. After all, our toolkit already supported SIP, H.323, multimedia devices, Eicon Diva Server, Dialogic Global Call, Ai-Logix and TAPI. So what could be more complicated than all of these? Boy, were we wrong... :-)
Skype had many gotchas to overcome, not the least being the absence of a documented audio API, and that is just for starters. Read on...
1. Streaming Audio: No problem, you know how to write a Windows virtual audio device, don't you?
The Skype API does not provide any support for obtaining streaming audio data from Skype, so a Windows virtual audio device is the only known way of obtaining the audio an IVR needs. This left us with the choice of either licensing virtual audio cables or implementing a Windows virtual audio driver from scratch.
2. Multiple Skype Channels: Well, you need to create as many user accounts.
An IVR has to support multiple channels. However, the only known method of getting multiple Skype channels up and running on a single system is to use the runas facility to run each Skype instance as a different Windows user. Windows uses the Terminal Services facility to log on multiple users simultaneously, and this incurs extra CPU load and memory. Running eight channels of Skype, for example, requires eight different Windows user accounts, which leads us to the next problem.
3. The API is easy: Unless you have multiple Skypes running, then it's an IPC nightmare!
The Skype API under Windows uses WM_COPYDATA, which is restricted to within the user account boundary. Therefore a single program cannot control multiple Skype instances directly. The only way around this limitation is to have a helper program running in the same user account as each Skype instance, which accesses the API and then pipes the data back to the controlling application via inter-process communication. Think you want to debug this... good luck ;-).
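To make the helper-program idea a little more concrete, here is a minimal sketch in Python of how each helper might tag the Skype API strings it relays, so that the controlling application can tell the channels apart on a shared pipe or socket. The newline-terminated JSON framing and the names encode_frame and decode_frames are assumptions made for illustration; this is not the format our driver uses.

```python
import json


def encode_frame(channel: int, text: str) -> bytes:
    """Pack one API string and its channel number into a newline-terminated frame.

    json.dumps escapes any newline inside `text`, so a raw newline byte
    only ever appears as the frame terminator.
    """
    return (json.dumps({"channel": channel, "text": text}) + "\n").encode("utf-8")


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete frames.

    Returns (frames, leftover), where frames is a list of (channel, text)
    tuples and leftover holds the bytes of a frame that has not fully
    arrived yet.
    """
    *complete, leftover = buffer.split(b"\n")
    frames = []
    for line in complete:
        if line:
            message = json.loads(line.decode("utf-8"))
            frames.append((message["channel"], message["text"]))
    return frames, leftover
```

The controller keeps the leftover bytes and prepends them to the next read, which is what lets one receive loop serve all the helpers regardless of how the transport chops up the data.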
4. And the sampling rate is: How about 48 kHz?
Left to itself, Skype selects 48 kHz as the audio sampling rate. Surprised? I was too. A standard IVR functions somewhere in the 8 kHz to 11 kHz range for audio. Any attempt to massage a standard IVR to function with Skype requires converting between these sampling rates in both directions. To get the best audio quality, you will need to interpolate the 8 kHz data up to 48 kHz rather than simply repeating samples.
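As a rough illustration of the conversion, the sketch below handles the simple case where the IVR runs at exactly 8 kHz, so the ratio to 48 kHz is a whole factor of 6. It uses plain linear interpolation on the way up and block averaging on the way down. Both are stand-ins chosen for brevity, and the function names are made up for this example; a production converter would normally use a proper low-pass filter, and a rate such as 11.025 kHz needs a non-integer ratio that this sketch does not cover.

```python
def upsample_linear(samples, factor=6):
    """Upsample by an integer factor, interpolating linearly between neighbours."""
    if not samples:
        return []
    out = []
    for current, following in zip(samples, samples[1:]):
        step = (following - current) / factor
        out.extend(current + step * k for k in range(factor))
    # Hold the final sample so the output is exactly len(samples) * factor long.
    out.extend([samples[-1]] * factor)
    return out


def downsample_average(samples, factor=6):
    """Downsample by an integer factor, averaging each group of samples.

    Averaging acts as a crude low-pass filter; trailing samples that do
    not fill a whole group are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]
```

Audio going from the IVR to Skype would pass through upsample_linear, and audio captured from Skype would pass through downsample_average before the IVR sees it.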
5. Working on your computer: Sorry, I am going to show popups all over the desktop.
Skype does not provide a silent mode of operation. The GUI keeps popping up informational messages and queries throughout an automated Skype session. And since you are dealing with multiple Skype instances, any hope of simultaneously using your computer for other purposes is lost. (Update: As of Oct 2006, Skype provides a command for turning itself silent, but this command still has issues that need to be thrashed out.)
6. Need Chat & SMS support: Oops... how do I implement that in my IVR?
Skype offers extra functionality, such as chat messages and SMS, which an IVR driver has to support. Surprisingly enough, the practical approach seems to be to expose the raw Skype command and response strings to the scripting language. The interface is then simplified by providing script functions that encapsulate the required functionality. This approach also has the advantage of future-proofing the design, since newer Skype features can be reached through the raw strings before dedicated functions exist.
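The layering can be sketched in a few lines of Python. Everything here is hypothetical: the class name, the send_raw callable standing in for whatever actually delivers a string to Skype, and the exact command strings, which are modelled on my recollection of Skype's text-based API and should be checked against the API documentation before use. It is not the interface of our scripting language.

```python
class SkypeScriptApi:
    """Script-facing helpers layered over a single raw command channel."""

    def __init__(self, send_raw):
        # send_raw is a hypothetical callable: it takes a command string
        # and returns the response string from Skype.
        self._send_raw = send_raw

    def raw(self, command: str) -> str:
        """Escape hatch: pass any command string straight through."""
        return self._send_raw(command)

    def call(self, target: str) -> str:
        # Assumed command format; verify against the Skype API documentation.
        return self.raw(f"CALL {target}")

    def chat_message(self, chat_id: str, text: str) -> str:
        # Assumed command format; verify against the Skype API documentation.
        return self.raw(f"CHATMESSAGE {chat_id} {text}")
```

A script calls the named helpers for the common cases and falls back to raw() for anything the helpers do not cover yet, which is where the future-proofing comes from.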
The proof-of-concept Skype driver has made it into the March release of the Xtend IVR 3.0 developer edition and can be downloaded from
http://www.xtendtech.com/ivr.