Speech Dispatcher TODO
======================
The release versions are not final, and could change. The targeted release is
based on demand from users and the difficulty of the work involved.
* Add pitch as an option for capitalization presentation
see https://github.com/brailcom/speechd/issues/24
* Allow setting a synthesis voice in the user config using spd-conf.
* Add support for Mimic
see https://github.com/brailcom/speechd/issues/19
* Drop getting information from other clients
see https://github.com/brailcom/speechd/issues/335
(0.11) Migrate to GSettings.
(0.11) Synthesizer specific settings API.
(0.11) Use more GLib in the server.
(0.11) Client audio retrieval API.
(0.11) Server to module protocol documentation.
(0.12) Server to module protocol improvements.
* Move synth modules to plugin architecture with plugin host.
* Synth plugin API.
* Allow for building synth plugins out of tree.
(0.11) Integrate with logind/consolekit.
(0.11) Properly support system-wide mode.
* Support spawning the server via Systemd socket activation.
The above improvements are documented in detail below. If work has started on
a particular project, a git branch will be noted. These git branches are
located at https://github.com/TheMuso/speechd-wip.git. To read the most up to
date copy of this file, please clone the master Speech Dispatcher git
repository, located at git://git.freebsoft.org/git/speechd.git, and check out
the master branch.
Migrate to GSettings
--------------------
* Write the GSettings metadata XML file.
* Migrate the server to GSettings.
* Listen to GSettings changes.
* Migrate synthesizer modules to GSettings.
* Write a program to migrate user settings to GSettings.
Synthesizer specific settings API
Depends on: Migration to GSettings
---------------------------------
Background:
* Currently have API for espeak pitch range in git master, but this is only
useful for espeak.
* Espeak module has a config option to show variants along with available
voices, which can be a very long list and can choke some clients.
* Implement server to module protocol to support:
- Request available settings.
- Request available settings and their current value.
- Request the value of a setting.
- Set a setting.
- Reset a setting to its default.
* Implement SSIP protocol support.
* Implement C API, see synthesizer specific settings C API draft.
* Implement python API.
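As a rough illustration of the five server to module operations listed above, the following sketch maps a command line to an operation code. The command words (LIST_SETTINGS and so on) and all identifiers are hypothetical placeholders; the actual protocol has not been designed yet.

```c
#include <assert.h>
#include <string.h>

/* One code per operation listed above. */
typedef enum {
    SETTING_OP_INVALID = 0,
    SETTING_OP_LIST,        /* Request available settings. */
    SETTING_OP_LIST_VALUES, /* Request settings and their current value. */
    SETTING_OP_GET,         /* Request the value of a setting. */
    SETTING_OP_SET,         /* Set a setting. */
    SETTING_OP_RESET        /* Reset a setting to its default. */
} SettingOp;

/* Hypothetical command words, chosen for this sketch only. */
static const struct {
    const char *word;
    SettingOp op;
} setting_commands[] = {
    { "LIST_SETTINGS",        SETTING_OP_LIST },
    { "LIST_SETTINGS_VALUES", SETTING_OP_LIST_VALUES },
    { "GET_SETTING",          SETTING_OP_GET },
    { "SET_SETTING",          SETTING_OP_SET },
    { "RESET_SETTING",        SETTING_OP_RESET },
};

/* Return the operation named by the first word of `line`, or
 * SETTING_OP_INVALID if the word is not recognised. */
SettingOp parse_setting_command(const char *line)
{
    assert(line != NULL);

    /* Length of the first word: stop at a space or line terminator. */
    size_t len = strcspn(line, " \r\n");
    size_t count = sizeof setting_commands / sizeof setting_commands[0];

    for (size_t i = 0; i < count; i++) {
        const char *word = setting_commands[i].word;
        if (strlen(word) == len && strncmp(line, word, len) == 0)
            return setting_commands[i].op;
    }
    return SETTING_OP_INVALID;
}
```

A module could call parse_setting_command() on each line read from the server and switch on the result; arguments such as the setting name would follow the command word.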
Synthesizer specific settings C API draft
typedef struct {
    char *name;
    char *description; /* This should be localized */
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;
In the C API, a NULL terminated array of this structure would be returned for
all settings a synth offers.
The SynthSettingValueType enum would look something like this:
typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3 /* A list of strings for the user
                                           to choose from, e.g. voice variants */
} SynthSettingValueType;
C API methods to work with these data types could be as follows:
SynthSetting **spd_synth_get_settings(SPDConnection *connection);
int spd_synth_set_setting(SPDConnection *connection, SynthSetting *setting,
                          void *value);
void free_synth_settings(SynthSetting **settings);
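A client would typically walk the NULL terminated array to find the setting it cares about and validate a value before setting it. The sketch below shows one way to do that. The types are copied from the draft so the example compiles on its own, and find_setting() and number_in_range() are hypothetical helpers, not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Types copied from the draft so that this sketch is self-contained. */
typedef enum {
    SYNTH_SETTING_VALUE_UNKNOWN = 0,
    SYNTH_SETTING_VALUE_NUMBER = 1,
    SYNTH_SETTING_VALUE_STRING = 2,
    SYNTH_SETTING_VALUE_STRING_LIST = 3
} SynthSettingValueType;

typedef struct {
    char *name;
    char *description;
    SynthSettingValueType get_type;
    SynthSettingValueType set_type;
    int min_value;
    int max_value;
    char **value_list;
    void *cur_value;
} SynthSetting;

/* Hypothetical helper: walk the NULL terminated array and return the
 * setting called `name`, or NULL if the synth does not offer it. */
SynthSetting *find_setting(SynthSetting **settings, const char *name)
{
    assert(name != NULL);
    if (settings == NULL)
        return NULL;
    for (size_t i = 0; settings[i] != NULL; i++) {
        if (strcmp(settings[i]->name, name) == 0)
            return settings[i];
    }
    return NULL;
}

/* Hypothetical helper: check a numeric value against the advertised
 * range before passing it to spd_synth_set_setting(). */
int number_in_range(const SynthSetting *setting, int value)
{
    assert(setting != NULL);
    return setting->set_type == SYNTH_SETTING_VALUE_NUMBER
        && value >= setting->min_value
        && value <= setting->max_value;
}
```

With helpers like these, a client could look up a setting such as the espeak pitch range in the array returned by spd_synth_get_settings(), check the new value, and only then call spd_synth_set_setting().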
Use more GLib in the server
---------------------------
* Use GLib event loops where possible.
* Use GLib GThreads and GAsyncQueues for thread communication.
* Use g_spawn calls for executing modules.
* Support multiple client connection methods: unix socket and inet socket.
* Use g_debug and other relevant GLib logging facilities for
messages/logging.
* Use GThreadedSocketService for handling client connections.
* Replace custom implementations of parsing buffers with GLib equivalent
methods where possible.
Move audio into server
----------------------
* Consider using a separate socket for audio transfer. This may be difficult
when attempting to synchronise with index marks; an alternative is to send
index mark data via the audio socket as well.
* Rework modules supporting audio output to not use any advanced internal
playback queueing, and simply send the audio in relatively small buffers to
the server. Smaller buffers allow the server to stop/pause the audio more
responsively.
* Extend priority system to be either global priority, or priority per audio
output device.
* Rework pulseaudio output to use a GLib event loop.
* Rework other audio output modules to better work within an event loop.
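The point about small buffers can be illustrated with a minimal sketch: a module hands audio over in fixed-size buffers and checks a stop flag before each one, so a stop request takes effect after at most one buffer. All names here (send_audio_in_chunks, ChunkSink, DemoState, demo_sink) are hypothetical, and the sink callback stands in for whatever transport the server ends up using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: delivers one buffer of samples. */
typedef void (*ChunkSink)(const int16_t *samples, size_t count,
                          void *user_data);

/* Send `total` samples in buffers of at most `chunk` samples, checking
 * `*stop` before each buffer. Returns the number of samples sent. */
size_t send_audio_in_chunks(const int16_t *samples, size_t total,
                            size_t chunk, const volatile bool *stop,
                            ChunkSink sink, void *user_data)
{
    assert(chunk > 0);
    size_t sent = 0;

    while (sent < total && !*stop) {
        size_t remaining = total - sent;
        size_t n = remaining < chunk ? remaining : chunk;
        sink(samples + sent, n, user_data);
        sent += n;
    }
    return sent;
}

/* Demonstration state: counts buffers and raises the stop flag once
 * `stop_after` buffers have been delivered. */
typedef struct {
    size_t buffers_seen;
    size_t stop_after;
    volatile bool stop;
} DemoState;

/* Demonstration sink: discards the audio and updates the state. */
void demo_sink(const int16_t *samples, size_t count, void *user_data)
{
    DemoState *state = user_data;
    (void)samples;
    (void)count;
    state->buffers_seen++;
    if (state->buffers_seen >= state->stop_after)
        state->stop = true;
}
```

The buffer size trades responsiveness against overhead: the smaller the buffer, the sooner a stop or pause is honoured, at the cost of more transfers per second of audio.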
Client audio retrieval API
--------------------------
* Allow client to either request audio directly, or have audio written to a
designated file on disk.
* Allow modules to decline the use of direct audio retrieval. I know of one
speech synth that is not supported by Speech Dispatcher, whose licensing
model doesn't allow for direct audio retrieval. If this module is ever
supported, its code will likely remain closed to prevent people working
around the implementation, but it would still be nice to support this synth
in the longer term. (Luke Yelavich)
* Load a new instance of the requested synth module, and spin up a worker
thread to handle audio file writing or sending to client, to allow the server
to dispatch other speech messages, as direct audio retrieval should be
independent of the priority system.
Server to module protocol documentation
---------------------------------------
* Similar to the SSIP documentation, write up a texi document that explains
the server to module protocol, which currently runs over stdin/stdout but may
use other IPC in the future.
Server to module protocol improvements
--------------------------------------
* Consider using sockets for IPC, with a dedicated socket per module.
* Consider implementing shared memory support, particularly for audio data
transfer, but this may depend on whether GLib has a shared memory API. The
GMappedFile API may be useful, if the initiator can change the contents of
the GMappedFile, and the other side can notice changes. Needs investigation.
* Support the launching of modules via systems other than Speech Dispatcher,
useful where containers of some sort are being used, and the environment
requires that any separate processes are run in containers/other kind of
sandbox, hence the use of sockets as per above.
Integrate with logind/consolekit
(Depends on migration to GSettings, GLib main event loops everywhere)
--------------------------------
* Query current user, and currently running sessions for that user.
* Subscribe to tty change events and cork audio playback and synthesis flow
if none of the user's sessions are active.
* Allow the enabling/disabling of logind/consolekit via GSettings and at
runtime, enabled being the default.
* Allow the disabling of consolekit/logind at build time.
* Consider abstracting this functionality into plugins, or at the very least
separate code with an internal API to more easily support any future
session/seat monitoring systems.
Properly support system-wide mode
---------------------------------
* Set a default user and group for the system-wide instance to run under, at
build time, and runtime.
* Add a systemd unit to allow the use of system-wide mode, disabled by
default.
Support spawning the server via Systemd socket activation
---------------------------------------------------------
* Allow this to be enabled/disabled at build time.
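For reference, systemd announces sockets passed via socket activation through the LISTEN_PID and LISTEN_FDS environment variables, with the first inherited descriptor being 3. The sketch below shows how the server might detect this without linking against libsystemd, whose sd_listen_fds() performs a similar check. The function names and the ACTIVATION_FDS_START macro are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* First inherited descriptor under systemd socket activation. */
#define ACTIVATION_FDS_START 3

/* Return the number of inherited sockets described by the two variable
 * values, or 0 if they are missing, malformed, or meant for another
 * process. */
int count_activation_fds(const char *listen_pid, const char *listen_fds,
                         long own_pid)
{
    char *end;
    long value;

    assert(own_pid > 0);
    if (listen_pid == NULL || listen_fds == NULL)
        return 0;

    /* LISTEN_PID must name this process. */
    errno = 0;
    value = strtol(listen_pid, &end, 10);
    if (errno != 0 || end == listen_pid || *end != '\0' || value != own_pid)
        return 0;

    /* LISTEN_FDS is the number of descriptors passed. */
    errno = 0;
    value = strtol(listen_fds, &end, 10);
    if (errno != 0 || end == listen_fds || *end != '\0')
        return 0;
    if (value < 0 || value > INT_MAX - ACTIVATION_FDS_START)
        return 0;

    return (int)value;
}

/* Convenience wrapper that reads the environment of this process. */
int inherited_listen_fds(void)
{
    return count_activation_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"),
                                (long)getpid());
}
```

If inherited_listen_fds() returns a positive count, the server would use descriptors ACTIVATION_FDS_START onwards instead of creating its own listening socket; a return of 0 means it was started normally.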
Copyright (C) 2001 Brailcom, o.p.s
Copyright (C) 2016 Luke Yelavich <themuso@themuso.com>
Copyright (C) 2018-2021 Samuel Thibault <samuel.thibault@ens-lyon.org>
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details (file
COPYING in the root directory).
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.