User:Tedd/Speech Recognition: Difference between revisions

From MozillaWiki
Revision as of 20:03, 8 October 2015

Architecture

API limitation

Access to speech recognition is limited to certified apps (at the time of writing) through the 'Func' extended attribute in the SpeechRecognition WebIDL interface declaration:

 Func="SpeechRecognition::IsAuthorized"
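Gecko's WebIDL bindings treat a Func extended attribute on an interface as an exposure check: the named function is called for a given global, and the interface is only exposed on that global if the function returns true. The following is a minimal, self-contained model of that idea, not Gecko's binding code. FakeGlobal, ExposureCheck, DefineInterface and IsExposed are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Hypothetical stand-in for a JS global object; not a Gecko type.
struct FakeGlobal {
  bool certifiedApp = false;
  std::set<std::string> exposedInterfaces;  // constructors installed on this global
};

// Plays the role of a [Func="..."] hook: decides exposure per global.
using ExposureCheck = std::function<bool(const FakeGlobal&)>;

// Hypothetical binding step: install the interface only if the check passes.
// An empty check models "no Func attribute", i.e. always exposed.
void DefineInterface(FakeGlobal& global, const std::string& name,
                     const ExposureCheck& func) {
  if (!func || func(global)) {
    global.exposedInterfaces.insert(name);
  }
}

// True if the interface's constructor was installed on this global.
bool IsExposed(const FakeGlobal& global, const std::string& name) {
  return global.exposedInterfaces.count(name) > 0;
}
```

In this model, a global for which the check returns false simply never gets a SpeechRecognition constructor, so scripts there cannot create one.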

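IsAuthorized is implemented in the SpeechRecognition class. It returns true only when the recognition preference is enabled and, in addition, the caller is a certified app or the force-enable or test preference is set: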

bool
SpeechRecognition::IsAuthorized(JSContext* aCx, JSObject* aGlobal)
{
  bool inCertifiedApp = IsInCertifiedApp(aCx, aGlobal);
  bool enableTests = Preferences::GetBool(TEST_PREFERENCE_ENABLE);
  bool enableRecognitionEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_ENABLE);
  bool enableRecognitionForceEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE);
  return (inCertifiedApp || enableRecognitionForceEnable || enableTests) && enableRecognitionEnable;
}
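The return expression makes the recognition-enable preference a hard requirement, while any one of the other three conditions is sufficient on top of it. A minimal sketch of the same boolean logic as a pure function, so the combinations can be checked without Gecko. AuthInputs and IsAuthorizedSketch are hypothetical names; the four fields stand in for IsInCertifiedApp() and the three Preferences::GetBool() lookups.

```cpp
#include <cassert>

// Hypothetical plain-value inputs replacing the real lookups.
struct AuthInputs {
  bool inCertifiedApp;           // IsInCertifiedApp(aCx, aGlobal)
  bool testsEnabled;             // TEST_PREFERENCE_ENABLE
  bool recognitionEnabled;       // TEST_PREFERENCE_RECOGNITION_ENABLE
  bool recognitionForceEnabled;  // TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE
};

// Same expression as SpeechRecognition::IsAuthorized:
// (certified app OR force-enable OR tests) AND recognition enabled.
bool IsAuthorizedSketch(const AuthInputs& in) {
  bool allowedCaller =
      in.inCertifiedApp || in.recognitionForceEnabled || in.testsEnabled;
  return allowedCaller && in.recognitionEnabled;
}
```

For example, a certified app is still refused when the recognition-enable preference is off, and an uncertified page is allowed once that preference and the force-enable preference are both on.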

Relevant files in the Gecko source tree

pocketsphinx library code:

./media/pocketsphinx

Speech recognition WebIDL:

./dom/webidl/SpeechRecognitionResultList.webidl
./dom/webidl/SpeechRecognitionResult.webidl
./dom/webidl/SpeechRecognitionAlternative.webidl
./dom/webidl/SpeechRecognition.webidl
./dom/webidl/SpeechRecognitionEvent.webidl
./dom/webidl/SpeechRecognitionError.webidl

WebIDL implementation (C++):

./dom/media/webspeech/recognition/SpeechRecognition.h
./dom/media/webspeech/recognition/SpeechRecognition.cpp
./dom/media/webspeech/recognition/SpeechRecognitionAlternative.h
./dom/media/webspeech/recognition/SpeechRecognitionAlternative.cpp
./dom/media/webspeech/recognition/SpeechRecognitionResult.h
./dom/media/webspeech/recognition/SpeechRecognitionResult.cpp
./dom/media/webspeech/recognition/SpeechRecognitionResultList.h
./dom/media/webspeech/recognition/SpeechRecognitionResultList.cpp

Recognition service IDL:

./dom/media/webspeech/recognition/nsISpeechRecognitionService.idl

Implementation of the IDL interface (C++):

./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
./dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.h

Event implementation (C++):

./dom/events/SpeechRecognitionError.h
./dom/events/SpeechRecognitionError.cpp

Association between components for speech recognition

Speech recognition is exposed to JavaScript through WebIDL, which is bound to a C++ class; that class in turn uses the nsISpeechRecognitionService interface to communicate with the actual recognition service. This section illustrates how these components are connected.
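A minimal sketch of this layering in standard C++: the DOM-facing class depends only on an abstract service interface, so the recognition engine behind it can be swapped. RecognitionService, Recognition and FakeService are hypothetical stand-ins for nsISpeechRecognitionService, SpeechRecognition and PocketSphinxSpeechRecognitionService. The service is injected through the constructor purely for simplicity; this page does not show how Gecko actually obtains mRecognitionService.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Recognition;  // forward declaration

// Hypothetical analogue of nsISpeechRecognitionService.
class RecognitionService {
 public:
  virtual ~RecognitionService() = default;
  virtual bool Initialize(Recognition* owner) = 0;
};

// Hypothetical analogue of the SpeechRecognition DOM class.
class Recognition {
 public:
  explicit Recognition(std::unique_ptr<RecognitionService> service)
      : mService(std::move(service)) {}

  // Mirrors the shape of SpeechRecognition::Start: hand `this` to the service.
  bool Start() { return mService && mService->Initialize(this); }

 private:
  std::unique_ptr<RecognitionService> mService;
};

// Fake engine standing in for a concrete backend such as PocketSphinx.
class FakeService : public RecognitionService {
 public:
  explicit FakeService(Recognition** seen) : mSeen(seen) {}
  bool Initialize(Recognition* owner) override {
    *mSeen = owner;  // record which object initialized this service
    return true;
  }

 private:
  Recognition** mSeen;
};
```

Because Recognition only calls through the interface, a test can substitute a fake engine; the same property lets a backend like PocketSphinx sit behind nsISpeechRecognitionService.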

In JavaScript (given the right permissions), a SpeechRecognition object can be created and started:

var speech = new SpeechRecognition();
// stream: an optional MediaStream to recognize speech from (may be omitted)
speech.start(stream);

The start method is declared in the WebIDL file SpeechRecognition.webidl:

interface SpeechRecognition : EventTarget {
    ...
    void start(optional MediaStream stream);
    ...
};
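The optional MediaStream argument in the WebIDL becomes an Optional<NonNull<DOMMediaStream>> parameter in the C++ implementation (SpeechRecognition::Start). The sketch below models that with std::optional (C++17). It assumes, without this page confirming it, that when no stream is passed the implementation captures audio from the default microphone itself. MediaStream, AudioSource and ChooseAudioSource are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for DOMMediaStream.
struct MediaStream {
  std::string label;
};

// Where the audio for a recognition session comes from.
enum class AudioSource { CallerStream, DefaultMicrophone };

// std::optional plays the role of Gecko's Optional<> (WasPassed()/Value());
// NonNull<> means a passed stream is never null, so only presence is checked.
// ASSUMPTION: with no stream, the implementation requests the microphone.
AudioSource ChooseAudioSource(const std::optional<MediaStream*>& stream) {
  if (stream.has_value()) {
    return AudioSource::CallerStream;
  }
  return AudioSource::DefaultMicrophone;
}
```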

The SpeechRecognition interface is implemented by a C++ class of the same name, and start maps to SpeechRecognition::Start:

void
SpeechRecognition::Start(const Optional<NonNull<DOMMediaStream>>& aStream, ErrorResult& aRv)
{
  ...
  nsresult rv;
  rv = mRecognitionService->Initialize(this);
  ...
}

mRecognitionService points to an instance of a class that implements the nsISpeechRecognitionService interface, declared in nsISpeechRecognitionService.idl:

interface nsISpeechRecognitionService : nsISupports {
    void initialize(in SpeechRecognitionWeakPtr aSpeechRecognition);
    ...
};

In the case of PocketSphinx, this class is PocketSphinxSpeechRecognitionService, declared in PocketSphinxSpeechRecognitionService.h, which also implements the Initialize method:

NS_IMETHODIMP
PocketSphinxSpeechRecognitionService::Initialize(
    WeakPtr<SpeechRecognition> aSpeechRecognition)
{
...
}
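Initialize receives a WeakPtr<SpeechRecognition>, so the service can refer back to the recognition object without keeping it alive. A minimal sketch of that pattern, using std::weak_ptr as an analogue of mozilla::WeakPtr (the two differ in how the referenced object is owned, but both yield null once the object is gone). RecognitionTarget, WeakService and Deliver are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SpeechRecognition object that receives results.
struct RecognitionTarget {
  std::vector<std::string> results;
};

// Keeps only a weak reference, like Initialize(WeakPtr<SpeechRecognition>).
class WeakService {
 public:
  void Initialize(std::weak_ptr<RecognitionTarget> target) {
    mTarget = std::move(target);
  }

  // Deliver a result only if the recognition object still exists.
  bool Deliver(const std::string& text) {
    if (auto target = mTarget.lock()) {
      target->results.push_back(text);
      return true;
    }
    return false;  // owner was destroyed; drop the result
  }

 private:
  std::weak_ptr<RecognitionTarget> mTarget;
};
```

The null check before every callback is the important part: a recognition session can be torn down while the engine is still working, and the service must not touch a destroyed object.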

This class relies on the pocketsphinx library for the recognition itself. For example, it passes the collected audio samples (mAudiovector) to the decoder and then ends the utterance:

rv = ps_process_raw(mPs, &mAudiovector[0], mAudiovector.Length(), FALSE,
                    FALSE);

rv = ps_end_utt(mPs);
confidence = 0;