SpeechRTC - Speech enabling the open web: Difference between revisions

Jump to navigation Jump to search
No edit summary
Line 22: Line 22:
== The speech decoder ==
== The speech decoder ==


Third-party licensing is extremely costly (usual unit is millions) and lead to an unwanted dependency. Write a decoder from scratch is tough, and requires highly specialized and difficult to find engineers.  
* Decoder
** Third-party licensing is extremely costly (usual unit is millions) and lead to an unwanted dependency. Write a decoder from scratch is tough, and requires highly specialized and difficult to find engineers.  


The good news are that exists great open source toolkits that we can use and enhance. I am a long time supportert and contributor of CMU Sphinx that have a number of quality models on different languages openly available. Plus pocketsphinx can run very fast and accurate when well tuned for both FSG and LVSCR language models.  
The good news are that exists great open source toolkits that we can use and enhance. I am a long time supportert and contributor of CMU Sphinx that have a number of quality models on different languages openly available. Plus pocketsphinx can run very fast and accurate when well tuned for both FSG and LVSCR language models.  
Line 30: Line 31:
* Automatic retrain
* Automatic retrain


We should also build scripts to automatically adapt the acoustic model per user with his own voice, to constantly auto-improve the service individually for him but also for the service as overall.  
** We should also build scripts to automatically adapt the acoustic model per user with his own voice, to constantly auto-improve the service individually for him but also for the service as overall.  


* Privacy
* Privacy
   
   
Some argued with me about privacy on online services. At the ideal screnario, actually online recognition is required only for LVSCR, while FSG can be handled offline if architected correctly. I think letting users to choose or not to let us use his voice to improve models is how other OSes handle this issue.  
** Some argued with me about privacy on online services. At the ideal screnario, actually online recognition is required only for LVSCR, while FSG can be handled offline if architected correctly. I think letting users to choose or not to let us use his voice to improve models is how other OSes handle this issue.  


* Offline and online
* Offline and online


The same speech server can be designed to run both online as offline, letting the responsibility to handle transmission to the middleware that handle the connections with the front.
** The same speech server can be designed to run both online as offline, letting the responsibility to handle transmission to the middleware that handle the connections with the front.
 


== Web Speech API ==
== Web Speech API ==
Confirmed users
58

edits

Navigation menu