SIRI – Speech to Text

SIRI Speech to Text: A step-by-step guide on getting started with SiriKit

Excited about the release of iOS 10?
We are too..!!
Here I am going to explain a very simple procedure for using SiriKit – a step-by-step guide on how to get started with it. Here is the complete guide to the SIRI speech-to-text feature.

Prerequisites:

For this demo you will need Xcode 8.0 (or an Xcode 8.x beta) and an Apple Developer Program membership.

Create a new project in Xcode.

Select Single View Application and click Next.
Give a Product Name. Here we are setting it as: SIRIKitDemoApp
Choose your development team if you have already enrolled in the Apple Developer Program. If not, add your developer account from Xcode Preferences -> Accounts.
Set the Organization Name and Identifier for the project.
Set the primary implementation language to Swift and click Next.
Now, here we need the developer account.
Open your Apple developer account and go to Certificates, Identifiers & Profiles.
Now we need to enable SiriKit for your app: create a new App Identifier with your App ID and a name, enable SiriKit from the App Services listed there, click Continue, and register your App ID.
Next, we need to create a development provisioning profile for the SiriKit app.
Create a new provisioning profile by clicking '+'.
Select the iOS App Development profile type and click Continue.
Select your App ID from the dropdown and click Continue.
Select your certificate(s) and devices, give your profile a name, and generate it.
Download the provisioning profile and set it in your Xcode project settings.
Now open the Xcode project and open Capabilities by selecting the project name in the navigator, then Targets -> Capabilities. Turn on Siri and we are done. 😉
This will add the Siri entitlements to your project automatically. They have no effect until you submit your app to the App Store, but since we are going to use the same setup to make your app content searchable while the app is in the background, we are prepared for the next tutorial in advance.
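For reference, turning the capability on adds an entry along these lines to the generated .entitlements file (a sketch; com.apple.developer.siri is Apple's SiriKit entitlement key, and the file name matches your target):

<key>com.apple.developer.siri</key>
<true/>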

2. Code:

1. Import the Speech framework.
To get started with any framework, we first have to import it into the project:

import Speech

Since we are using classes from this framework, we should also adopt its delegate protocol; here that is SFSpeechRecognizerDelegate. If you peek into this delegate, we basically need one optional method from it, i.e.

func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool)
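We implement it at the end of this tutorial to enable or disable the recording button whenever recognition availability changes. A condensed version of what appears in the full listing (microphoneButton is the button outlet we create shortly):

func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
    // Recognition can become unavailable (e.g. no network), so mirror that state on the button.
    microphoneButton.isEnabled = available
}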

Enough introduction. Now open your storyboard and design your view controller.
There is only one "Welcome" label, a UITextView, and a button to toggle recording. Add constraints to pin them in place; the rest of the design is up to you.
Don't forget to create outlets for all the components in the ViewController; the two used in this tutorial are shown below.
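Declared in the ViewController, they look like this:

@IBOutlet weak var textView: UITextView!
@IBOutlet weak var microphoneButton: UIButton!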
Only the button that toggles recording needs an IBAction. We set that action as microphoneTapped. The following will be your blank IBAction method:

@IBAction func microphoneTapped(_ sender: AnyObject) {

}

We are going to use SFSpeechRecognizer from the Speech framework. Hence we need to initialize the speechRecognizer by passing it a locale.

private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

Add this to your code just below the textView/microphoneButton outlets; this is a global declaration of the object.
Here we are providing the en-US locale so the recognizer knows which language you are going to speak; the identifier varies with the language. You can also assign the phone's default locale to the recognizer, as in the sketch below.
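For example, to use the phone's current locale instead of hard-coding en-US (note that this initializer is failable and returns nil for locales the recognizer does not support):

// A sketch: pick up whatever locale the device is currently set to.
private let speechRecognizer = SFSpeechRecognizer(locale: Locale.current)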
You have to request authorization for speech recognition. Depending on whether the user allows speech-to-text, we decide whether to make the actual calls to the recognizer.
Now, put the following code in the microphoneTapped method:

var isButtonEnabled = false

SFSpeechRecognizer.requestAuthorization { (authStatus) in

    switch authStatus {
    case .authorized:
        isButtonEnabled = true
    case .denied:
        isButtonEnabled = false
        print("Speech recognition is denied")
    case .restricted:
        isButtonEnabled = false
        print("Speech recognition is restricted")
    case .notDetermined:
        isButtonEnabled = false
        print("Speech recognition not yet authorized")
    }

    OperationQueue.main.addOperation {
        self.microphoneButton.isEnabled = isButtonEnabled // 3
    }
}

1. Here Siri will ask the user to allow access to speech recognition.
2. Depending on the user's choice, the "Start Recording" button is enabled or disabled.
3. This line applies the user's choice to the button. It is wrapped in OperationQueue.main because the authorization callback can arrive on a background thread, and UI state must be updated on the main thread.
Now build and run your project. This is the first run of the app.
Crash..!!
You will get this log on your console:
[access] This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
Aahh..!! We forgot to add the privacy information to the Info.plist.
Open your Info.plist as source code and add the following entries after any closing </string> tag:

<key>NSMicrophoneUsageDescription</key>
<string>Your microphone will be used to record your speech when you press the &quot;Start Recording&quot; button.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Speech recognition will be used to determine which words you speak into this device&apos;s microphone.</string>

Save the plist, then build and run the application again. Now we have done it correctly, right?
Tap the Start Recording button; iOS will show the permission alerts for speech recognition and the microphone.

Handle the task of speech recognition.

If you do not allow Siri to access the phone's speech recognition and microphone, the Start Recording button will be disabled. To enable it again, go to Settings -> SIRIKitDemoApp and enable the services.
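If you want to spare the user that trip, one optional convenience (not part of this tutorial's code) is to deep-link into your app's page in Settings:

// A sketch: opens this app's page in the Settings app (iOS 10+).
if let settingsURL = URL(string: UIApplicationOpenSettingsURLString) {
    UIApplication.shared.open(settingsURL, options: [:], completionHandler: nil)
}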
Set your speechRecognizer's delegate to self.
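In the full listing this is done at the top of microphoneTapped:

self.speechRecognizer?.delegate = self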
We need instances for the recognition request, the recognition task, and the audio engine. Below is the code for that:

private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var speechRecognitionTask: SFSpeechRecognitionTask?
private let audioEngine = AVAudioEngine()

Create a new function named startRecording.
1. First, check whether any recognition task is already running in the app. If so, cancel it:

if speechRecognitionTask != nil {
    speechRecognitionTask?.cancel()
    speechRecognitionTask = nil
}

2. Start an audio session in the app to listen to the user's speech. We set the session category to record and its mode to measurement. Why not another mode? Because we are capturing the voice only for short stretches, not for VoIP; the measurement mode is better at minimizing system-supplied signal processing. This code can throw, so we wrap it in a do-catch block:

let audioSession = AVAudioSession.sharedInstance()

do {
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioSession.setMode(AVAudioSessionModeMeasurement)
    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
    print("audioSession properties weren't set because of an error.")
}

3. With the audio session set up for recording, we need to send a request to the speech recognizer. So instantiate a request, check that the audioEngine has an input node, and check that the recognitionRequest object is not nil. Then tell the recognizer that it may report partial results as soon as it detects words (in other words, you can read what it is interpreting while you are still talking):

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

guard let inputNode = audioEngine.inputNode else {
    fatalError("Audio Engine has no input node")
}

guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
}

recognitionRequest.shouldReportPartialResults = true

4. Now start the recognition with speechRecognitionTask. The result handler gives you the results returned by the recognizer. If there is an error, or the result is final, we stop the audioEngine, remove the tap from the inputNode, and re-enable the microphone button:

speechRecognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

    var isFinal = false

    if result != nil {
        self.textView.text = result?.bestTranscription.formattedString
        isFinal = (result?.isFinal)!
    }

    // Clean up on error or once the final result has arrived.
    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)

        self.recognitionRequest = nil
        self.speechRecognitionTask = nil

        self.microphoneButton.isEnabled = true
    }
})

5. Add an audio input to the recognitionRequest using the input node's recording format. The audio input can be attached after the recognition task has started. Then prepare the audioEngine:

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()

6. Since starting the audio engine can also throw, wrap it in a do-catch block as well:

do {
    try audioEngine.start()
} catch {
    print("AudioEngine could not start because of an error.")
}

textView.text = "Say something, I'm listening!"
}

Build and run your application. Siri will help you convert your speech to text. 🙂
The complete code in the ViewController will look like this:

import UIKit
import Speech

class ViewController: UIViewController, SFSpeechRecognizerDelegate {

    @IBOutlet weak var textView: UITextView!
    @IBOutlet weak var microphoneButton: UIButton!

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))  //1

    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var speechRecognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    @IBAction func microphoneTapped(_ sender: AnyObject) {

        var isButtonEnabled = false

        SFSpeechRecognizer.requestAuthorization { (authStatus) in

            switch authStatus {
            case .authorized:
                isButtonEnabled = true
            case .denied:
                isButtonEnabled = false
                print("Speech recognition is denied")
            case .restricted:
                isButtonEnabled = false
                print("Speech recognition is restricted")
            case .notDetermined:
                isButtonEnabled = false
                print("Speech recognition not yet authorized")
            }

            OperationQueue.main.addOperation {
                self.microphoneButton.isEnabled = isButtonEnabled
            }
        }

        self.speechRecognizer?.delegate = self

        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio()
            microphoneButton.isEnabled = false
            microphoneButton.setTitle("Start Recording", for: .normal)
        } else {
            startRecording()
            microphoneButton.setTitle("Stop Recording", for: .normal)
        }
    }

    func startRecording() {
        if speechRecognitionTask != nil {
            speechRecognitionTask?.cancel()
            speechRecognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()

        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio Engine has no input node")
        }

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }

        recognitionRequest.shouldReportPartialResults = true

        speechRecognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

            var isFinal = false

            if result != nil {
                self.textView.text = result?.bestTranscription.formattedString
                isFinal = (result?.isFinal)!
            }

            // Clean up on error or once the final result has arrived.
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.speechRecognitionTask = nil

                self.microphoneButton.isEnabled = true
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()

        do {
            try audioEngine.start()
        } catch {
            print("AudioEngine could not start because of an error.")
        }

        textView.text = "Say something, I'm listening!"
    }

    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer, availabilityDidChange available: Bool) {
        if available {
            microphoneButton.isEnabled = true
        } else {
            microphoneButton.isEnabled = false
        }
    }
}

This complete guide will help you get started with SIRI speech to text. Good luck!