Floatp Text-to-Speech Protocol – FTTSP/0.1
Introduction
The protocol described in this document enables a client program (the client) to communicate with a server program (the server) in order to use its speech synthesizing subsystem (the synthesizer).
The client issues requests for text to be spoken by the synthesizer. The server continually responds with status messages as the synthesizer progresses. The client may use the acquired status messages to update the screen, for example to highlight the point in the text that the synthesizer has reached.
A Unix domain socket, TCP socket, or similar may be used for message transport.
Packets
The protocol uses a trivial-to-parse yet human-readable scheme for message packing. Only capital letters are used, numbers are hexadecimal unsigned integers unless noted otherwise, and fields are separated by spaces (20H). The ASCII character code table is used for the interchange, with one exception: the text destined for the synthesizer. Any encoding of the user's choosing may be applied to the text to be spoken, as long as both client and server support it. In this document, ASCII is also used for the example spoken text, for the sake of clarity.
Packet header
A packet header of four characters defines the total packet size, including the packet header itself. The packet size is encoded as a four-digit hexadecimal unsigned integer.
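Because the header carries the total size, a receiver can frame packets on a byte stream without scanning for delimiters. The following is a minimal sketch in Python, not a reference implementation. It assumes that the size counts bytes and that packets follow one another with nothing in between; neither point is stated in this document. The names `read_packet` and `_read_exact` are hypothetical.

```python
import io

HEADER_LEN = 4  # four hexadecimal digits, per the packet header definition


def _read_exact(stream, n):
    """Read exactly n bytes; return None if the stream is already at EOF."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            if buf:
                raise EOFError("stream ended in the middle of a packet")
            return None
        buf += chunk
    return buf


def read_packet(stream):
    """Read one whole packet (header included); return None at EOF."""
    header = _read_exact(stream, HEADER_LEN)
    if header is None:
        return None
    # Assumption: the size is a byte count and includes the header itself.
    total = int(header.decode("ascii"), 16)
    if total < HEADER_LEN:
        raise ValueError(f"packet size {total} is smaller than the header")
    rest = _read_exact(stream, total - HEADER_LEN)
    if rest is None:
        raise EOFError("stream ended in the middle of a packet")
    return header + rest


# Two packets back to back, as they might arrive on a socket.
stream = io.BytesIO(b"000E 0002 ABRT0011 0002 SPEK OK")
print(read_packet(stream))  # b'000E 0002 ABRT'
print(read_packet(stream))  # b'0011 0002 SPEK OK'
```

With a real socket, a file-like object obtained from `socket.makefile("rb")` could be passed in place of the `io.BytesIO` stand-in.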
Packet payload
The packet payload follows the packet header, separated by a space. The payload is divided into fields, each one separated by a space.
Packet header and payload fields
- Packet Size
The total size of the packet. An unsigned integer in the range 0..FFFFH, encoded as a string of four hexadecimal digits.
- Request Serial
The identity of the request. An unsigned integer in the range 0..FFFFH, encoded as a string of four hexadecimal digits.
- Request Name
The name of a requested operation. A string of four characters.
Name | Description | Remark |
---|---|---|
ABRT | Request current speak to be aborted | |
HELO | Request to handshake | Optional |
SPEK | Request text to be spoken | |

- Packet Data
Any data applicable to the packet type.
- Response Type
A string of two characters.
Name | Description | Remark |
---|---|---|
OK | Successful request resolution | Terminal |
ER | Request rejected due to an error | Terminal |
EV | An event occurred while serving the request | Progression |
Client requests
Packets sent from the client to the server.
<Packet Size> <Request Serial> <Request Name>[ <Packet Data>]
ABRT – Abort speak request
A client's request for the server to abort the currently served speak request.
<Packet Size> <Request Serial> ABRT
000E 0002 ABRT
HELO – Handshake request
A client's request to handshake with the server. The request is optional. The server reports information about its environment in response (see the ENVMT event).
<Packet Size> <Request Serial> HELO
000E 0001 HELO
SPEK – Speak request
A client's request for a text to be spoken by the server's synthesizer subsystem.
<Packet Size> <Request Serial> SPEK <Text>
0024 0002 SPEK Floatp Text-to-Speech
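The three request examples above can be reproduced with a small packet builder. This is a sketch only: `build_request` is a hypothetical helper, and it assumes the packet size is counted in bytes, which matters once the spoken text uses a multi-byte encoding.

```python
def build_request(serial, name, data=None):
    """Return one client request packet as bytes.

    `data` is passed as bytes so the caller chooses the text encoding.
    """
    if not 0 <= serial <= 0xFFFF:
        raise ValueError("request serial must fit in four hex digits")
    if len(name) != 4:
        raise ValueError("request name must be four characters")
    body = f" {serial:04X} {name}".encode("ascii")
    if data is not None:
        body += b" " + data
    # Assumption: the size is a byte count; it includes the 4 header digits.
    total = 4 + len(body)
    if total > 0xFFFF:
        raise ValueError("packet too large for a four-digit size field")
    return f"{total:04X}".encode("ascii") + body


print(build_request(1, "HELO"))                            # b'000E 0001 HELO'
print(build_request(2, "ABRT"))                            # b'000E 0002 ABRT'
print(build_request(2, "SPEK", b"Floatp Text-to-Speech"))  # b'0024 0002 SPEK Floatp Text-to-Speech'
```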
Server responses
Packets sent from the server to the client.
<Packet Size> <Request Serial> <Request Name> <Response Type>[ <Data>]
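A response in this layout can be split into its fields with a bounded split on spaces, which leaves the optional data field intact even when it contains spaces itself. The sketch below is illustrative; `Response` and `parse_response` are hypothetical names, and the size check assumes the declared size is a byte count. It also decodes the whole packet as ASCII, which is adequate for server responses as shown in this document.

```python
from typing import NamedTuple


class Response(NamedTuple):
    serial: int    # Request Serial, decoded from hexadecimal
    request: str   # Request Name, e.g. "SPEK"
    kind: str      # Response Type: "OK", "ER" or "EV"
    data: str      # empty string when the packet carries no data


def parse_response(packet):
    """Split one complete response packet (bytes) into its fields."""
    text = packet.decode("ascii")
    # At most four splits: everything after the response type is data.
    parts = text.split(" ", 4)
    if len(parts) < 4:
        raise ValueError("response has fewer than four fields")
    size, serial, request, kind = parts[:4]
    if int(size, 16) != len(packet):
        raise ValueError("declared size does not match packet length")
    if kind not in ("OK", "ER", "EV"):
        raise ValueError(f"unknown response type {kind!r}")
    data = parts[4] if len(parts) == 5 else ""
    return Response(int(serial, 16), request, kind, data)


print(parse_response(b"0011 0002 SPEK OK"))
print(parse_response(b'0028 0001 HELO EV ENVMT ENCODING "UTF-8"'))
```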
ER – Request error response
A server's response packet of type ER reports that an error occurred while processing a request. An error code is supplied as a 3-digit decimal integer in the packet data field. The connection is then closed by the server.
<Packet Size> <Request Serial> <Request Name> ER <3-Digit Decimal Integer>
0015 0002 SPEK ER 503
Error status codes resemble HTTP status codes.
Code | Description |
---|---|
400 | Bad Request |
500 | Internal Server Error |
503 | Service Unavailable |
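A client might turn the data field of an ER packet into a code and a readable description along these lines. This is a sketch: `describe_error` is a hypothetical helper, and the fallback text for codes outside the table is an assumption, since this document does not say how unlisted codes should be treated.

```python
# The three codes listed in the table above.
ERROR_DESCRIPTIONS = {
    400: "Bad Request",
    500: "Internal Server Error",
    503: "Service Unavailable",
}


def describe_error(data):
    """Return (code, description) for the data field of an ER packet."""
    # Unlike the other numeric fields, the error code is decimal.
    if len(data) != 3 or not (data.isascii() and data.isdigit()):
        raise ValueError("error code must be a 3-digit decimal integer")
    code = int(data)
    # Assumption: unlisted codes are reported rather than rejected.
    return code, ERROR_DESCRIPTIONS.get(code, "Unknown error")


print(describe_error("503"))  # (503, 'Service Unavailable')
```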
EV – Request event response
An event packet is a server response of type EV. The event name is a five-character word stored in the first field of the packet data, and any event parameters are stored in the successive fields.
- HELO, ENVMT – Environment information event
Reports information about the server environment.
ENVMT
0028 0001 HELO EV ENVMT ENCODING "UTF-8"
- SPEK, ABRTD – Speak abortion event
Reports that the synthesizer has been stopped due to an abort (ABRT) request.
ABRTD
0017 0001 SPEK EV ABRTD
- SPEK, FNSHD – Speak finished event
Reports that the synthesizer is finished speaking.
FNSHD
0017 0003 SPEK EV FNSHD
- SPEK, PRGRS – Speak progress event
Reports that progress has been made by the synthesizer.
Parameters are two unsigned integers, each encoded as a string of four hexadecimal digits. These numbers form the range in the text for which this event was emitted. The first number is a character offset marking the beginning of the range. The second is the character count of the range.
PRGRS <Offset> <Count>
0021 0001 SPEK EV PRGRS 0004 0005
- SPEK, STRTD – Speak started event
Reports that the synthesizer has started speaking.
STRTD
0017 0001 SPEK EV STRTD
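The PRGRS parameters are what a client would use to highlight the text currently being spoken. The sketch below decodes the two parameters and marks the range with brackets. It assumes the offset is zero-based and that both values count characters of the decoded text rather than bytes; this document does not settle either point. `parse_progress` and `highlight` are hypothetical names.

```python
def parse_progress(params):
    """Decode the '<Offset> <Count>' parameters of a PRGRS event."""
    offset_hex, count_hex = params.split(" ")
    return int(offset_hex, 16), int(count_hex, 16)


def highlight(text, offset, count):
    """Return `text` with the reported range wrapped in brackets."""
    # Assumption: offset is zero-based and counts characters, not bytes.
    end = offset + count
    return text[:offset] + "[" + text[offset:end] + "]" + text[end:]


offset, count = parse_progress("0004 0005")
print(highlight("Floatp Text-to-Speech", offset, count))  # Floa[tp Te]xt-to-Speech
```

A graphical client would replace the brackets with whatever selection or styling mechanism its text widget provides.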
OK – Request success response
A server's response packet of type OK reports the successful resolution of a request.
<Packet Size> <Request Serial> <Request Name> OK
0011 0002 SPEK OK
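The response type table marks OK and ER as terminal and EV as progression, which suggests a client loop that collects events until a terminal response arrives. The following sketch illustrates that reading. The order of events in the sample (STRTD, PRGRS, FNSHD, then OK) is an assumption made for illustration, not something this document specifies, and `collect_until_terminal` is a hypothetical name.

```python
TERMINAL_TYPES = {"OK", "ER"}


def collect_until_terminal(packets):
    """Return (event names, terminal response type) for one request."""
    events = []
    for packet in packets:
        fields = packet.decode("ascii").split(" ")
        kind = fields[3]  # size, serial, request name, response type
        if kind == "EV":
            events.append(fields[4])  # event name is the first data field
        elif kind in TERMINAL_TYPES:
            return events, kind
        else:
            raise ValueError(f"unknown response type {kind!r}")
    raise EOFError("no terminal response received")


# Assumed, illustrative sequence of responses to a single SPEK request.
sample = [
    b"0017 0001 SPEK EV STRTD",
    b"0021 0001 SPEK EV PRGRS 0004 0005",
    b"0017 0001 SPEK EV FNSHD",
    b"0011 0001 SPEK OK",
]
print(collect_until_terminal(sample))  # (['STRTD', 'PRGRS', 'FNSHD'], 'OK')
```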