== The speech decoder ==
* Decoder
** Third-party licensing is extremely costly (the usual unit is millions) and leads to an unwanted dependency. Writing a decoder from scratch is tough, and requires highly specialized engineers who are difficult to find.
The good news is that great open source toolkits exist that we can use and enhance. I am a long-time supporter of and contributor to CMU Sphinx, which has a number of quality models for different languages openly available. Plus, pocketsphinx can be very fast and accurate when well tuned, for both FSG and LVCSR language models.
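As a minimal sketch of what driving pocketsphinx looks like, here is a decoding loop using the classic pocketsphinx Python bindings (newer 5.x releases expose a different API). All model and audio paths are placeholders, not files that ship with this project:

<syntaxhighlight lang="python">
# Sketch: decode one recording with pocketsphinx (classic Python
# bindings). The paths below are placeholders for a real acoustic
# model, language model, dictionary, and 16 kHz mono PCM recording.
from pocketsphinx import Decoder

config = Decoder.default_config()
config.set_string('-hmm', 'model/en-us')          # acoustic model directory
config.set_string('-lm', 'model/en-us.lm.bin')    # LVCSR language model
config.set_string('-dict', 'model/cmudict.dict')  # pronunciation dictionary
# For FSG-style command-and-control, a grammar can be used instead
# of the language model, e.g.:
#   config.set_string('-jsgf', 'commands.gram')

decoder = Decoder(config)

decoder.start_utt()
with open('utterance.wav', 'rb') as f:
    f.read(44)  # skip a canonical 44-byte WAV header; raw PCM follows
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hyp = decoder.hyp()
print(hyp.hypstr if hyp is not None else '(no hypothesis)')
</syntaxhighlight>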
* Automatic retrain
** We should also build scripts that automatically adapt the acoustic model to each user's own voice, constantly improving the service both for that individual and for the service as a whole (see the adaptation sketch after this list).
* Privacy
** Some have raised privacy concerns about online services. In the ideal scenario, online recognition is required only for LVCSR, while FSG can be handled offline if architected correctly. Letting users choose whether or not their voice may be used to improve the models is how other OSes handle this issue.
* Offline and online
** The same speech server can be designed to run both online and offline, leaving the responsibility for handling transmission to the middleware that manages the connections with the front end (see the routing sketch below).
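On the automatic retraining point: CMU Sphinx ships speaker-adaptation tools in sphinxtrain, so a per-user adaptation script could be a thin wrapper around them. The tool names below (sphinx_fe, bw, mllr_solve) come from sphinxtrain, but every path, file name, and flag value here is a placeholder that would need to match the actual model; treat this as a hedged sketch, not a working script:

<syntaxhighlight lang="python">
# Hypothetical per-user MLLR adaptation script. Assumes the
# sphinxtrain tools (sphinx_fe, bw, mllr_solve) are on PATH, and
# that the user's recordings plus matching user.fileids and
# user.transcription files have already been collected.
import os
import subprocess

MODEL = 'model/en-us'  # placeholder acoustic model directory

def run(*cmd):
    print('+', ' '.join(cmd))
    subprocess.run(cmd, check=True)

os.makedirs('accum', exist_ok=True)

# 1. Extract MFCC features from the user's recordings.
run('sphinx_fe',
    '-argfile', f'{MODEL}/feat.params',
    '-samprate', '16000',
    '-c', 'user.fileids',
    '-di', 'recordings', '-do', 'recordings',
    '-ei', 'wav', '-eo', 'mfc', '-mswav', 'yes')

# 2. Accumulate observation statistics against the transcripts.
run('bw',
    '-hmmdir', MODEL,
    '-cepdir', 'recordings',
    '-moddeffn', f'{MODEL}/mdef.txt',
    '-ts2cbfn', '.ptm.',
    '-feat', '1s_c_d_dd',
    '-cmn', 'current', '-agc', 'none',
    '-dictfn', 'model/cmudict.dict',
    '-ctlfn', 'user.fileids',
    '-lsnfn', 'user.transcription',
    '-accumdir', 'accum')

# 3. Solve for an MLLR transform the decoder can load with -mllr.
run('mllr_solve',
    '-meanfn', f'{MODEL}/means',
    '-varfn', f'{MODEL}/variances',
    '-outmllrfn', 'user.mllr_matrix',
    '-accumdir', 'accum')
</syntaxhighlight>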
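To make the offline/online split concrete, here is a minimal routing sketch for the middleware: grammar-constrained (FSG) requests stay on the device, open-vocabulary (LVCSR) requests go to the online server, gated by the user's privacy opt-in. Every class and method name here is illustrative, not an existing API:

<syntaxhighlight lang="python">
# Illustrative middleware routing between local FSG decoding and
# the online LVCSR speech server. All names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechRequest:
    audio: bytes            # raw PCM from the front end
    grammar: Optional[str]  # JSGF grammar text, or None for LVCSR

class SpeechRouter:
    def __init__(self, local_decoder, online_client, allow_online: bool):
        self.local_decoder = local_decoder  # e.g. a pocketsphinx wrapper
        self.online_client = online_client  # client for the speech server
        self.allow_online = allow_online    # user's privacy opt-in

    def recognize(self, request: SpeechRequest) -> str:
        if request.grammar is not None:
            # FSG: small fixed command set; audio never leaves the device.
            return self.local_decoder.decode(request.audio, request.grammar)
        if not self.allow_online:
            raise PermissionError('LVCSR needs the online service, '
                                  'and the user has opted out.')
        # LVCSR: large vocabulary, handled by the online speech server.
        return self.online_client.decode(request.audio)
</syntaxhighlight>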
== Web Speech API ==