Node.js - Google Cloud Text-to-Speech

For various reasons, you may need to convert text into an audio file. So-called text-to-speech (TTS) technology lets you do exactly that. Building your own text-to-speech engine takes a long time and is far from easy, so the simplest solution is to use a hosted service, with the drawback that you have to pay for it.

One such service is provided by Google, and it is known for producing good-quality results. Google also provides an API, which makes it easy to integrate with your application. In this tutorial, I'm going to show you basic usage examples of the Google Text-to-Speech API in Node.js, from the preparation through to the code.

Preparation

1. Create or select a Google Cloud project

A Google Cloud project is required to use this service. Open the Google Cloud console, then create a new project or select an existing one.

2. Enable billing for the project

Like other cloud platforms, Google requires you to enable billing for your project. If you haven't set up billing yet, open the billing page.

3. Enable Google Text-to-Speech API

To use an API, you must enable it first. Open the Text-to-Speech API page in the Google Cloud console and enable it.

4. Set up service account for authentication

For authentication, you need a service account. Create a new one on the service account management page and download its credentials (a JSON key file), or use a service account you have already created.

Then add a new variable to your .env file that points to the downloaded credentials file:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/the/credentials

Of course, the .env file must be loaded before the client is created, so you need a module that reads .env files, such as dotenv.
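If the variable is missing or points to the wrong file, the client fails with an authentication error that can be hard to trace. A small sanity check run right after require('dotenv').config() can surface the problem early. The sketch below uses only Node's built-in fs module; resolveCredentialsPath is a hypothetical helper name, not part of any library.

```javascript
const fs = require('fs');

// Hypothetical helper: checks that GOOGLE_APPLICATION_CREDENTIALS is set
// and points to a readable file. Pass process.env after dotenv has loaded it.
function resolveCredentialsPath(env) {
  const credentialsPath = env.GOOGLE_APPLICATION_CREDENTIALS;

  if (!credentialsPath) {
    throw new Error('GOOGLE_APPLICATION_CREDENTIALS is not set; check your .env file');
  }

  // accessSync throws if the file does not exist or is not readable
  try {
    fs.accessSync(credentialsPath, fs.constants.R_OK);
  } catch (err) {
    throw new Error(`Cannot read credentials file at ${credentialsPath}`);
  }

  return credentialsPath;
}
```

For example, you could call resolveCredentialsPath(process.env) after require('dotenv').config() and before creating the TextToSpeechClient, so a misconfigured path fails with a clear message.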

Dependencies

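This tutorial uses the @google-cloud/text-to-speech client library, plus dotenv for loading the .env file and lodash for safely reading the response. Add the following dependencies to your package.json and run npm install:

  "dependencies": {
    "@google-cloud/text-to-speech": "~0.3.0",
    "dotenv": "~4.0.0",
    "lodash": "~4.17.10"
  }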

1. Synthesize Speech

The example below shows basic speech synthesis. You need to provide the text to synthesize, the voice selection (a language code, optionally with an SSML gender), and the audio configuration (such as the audio encoding). If the request succeeds, the response contains audioContent, which holds the audio data that you can write to a file.

  // Load GOOGLE_APPLICATION_CREDENTIALS (and any other variables) from .env
  require('dotenv').config();
  
  const _ = require('lodash');
  const fs = require('fs');
  
  const textToSpeech = require('@google-cloud/text-to-speech');
  
  // The client reads the service account credentials from the
  // GOOGLE_APPLICATION_CREDENTIALS environment variable
  const client = new textToSpeech.TextToSpeechClient();
  
  const request = {
    // The text to synthesize
    input: { text: 'This is an example' },
  
    // The language code and SSML voice gender
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
  
    // The audio encoding type
    audioConfig: { audioEncoding: 'MP3' },
  };
  
  const outputFileName = 'output.mp3';
  
  client.synthesizeSpeech(request)
    .then((response) => {
      // The promise resolves to an array; the first element is the API response
      const audioContent = _.get(response[0], 'audioContent');
  
      if (audioContent) {
        // audioContent holds the raw audio bytes, so write them to disk as-is
        fs.writeFileSync(outputFileName, audioContent);
  
        console.log(`Audio content successfully written to file: ${outputFileName}`);
      } else {
        console.log('Failed to get audio content');
      }
    })
    .catch((err) => {
      console.error('ERROR:', err);
    });
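The request object above is hard-coded. If you synthesize many texts, a small builder keeps the request shape in one place. The sketch below is not part of the client library: buildSynthesizeRequest and outputFileName are hypothetical names. It relies on two API features: input accepts either text or ssml, and audioEncoding can be LINEAR16, MP3, or OGG_OPUS. Mapping LINEAR16 to a .wav file is my own convention, based on Google documenting that LINEAR16 output includes a WAV header.

```javascript
// Hypothetical helpers for building synthesizeSpeech requests.
// Maps each audio encoding this helper knows about to a file extension.
const EXTENSIONS = {
  LINEAR16: 'wav', // LINEAR16 responses include a WAV header
  MP3: 'mp3',
  OGG_OPUS: 'ogg',
};

function buildSynthesizeRequest(content, options = {}) {
  const {
    isSsml = false,
    languageCode = 'en-US',
    ssmlGender = 'NEUTRAL',
    audioEncoding = 'MP3',
  } = options;

  if (!EXTENSIONS[audioEncoding]) {
    throw new Error(`Unsupported audio encoding: ${audioEncoding}`);
  }

  return {
    // The API accepts either plain text or SSML markup as input
    input: isSsml ? { ssml: content } : { text: content },
    voice: { languageCode, ssmlGender },
    audioConfig: { audioEncoding },
  };
}

// Builds an output file name whose extension matches the request's encoding
function outputFileName(baseName, request) {
  return `${baseName}.${EXTENSIONS[request.audioConfig.audioEncoding]}`;
}
```

For example, buildSynthesizeRequest('<speak>Hello</speak>', { isSsml: true, audioEncoding: 'OGG_OPUS' }) produces a request for Ogg Opus audio, and outputFileName('greeting', request) then gives greeting.ogg.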

2. List Voices

The example below retrieves the list of voices supported by the Google Text-to-Speech service. Run it whenever you need the current list, since the list in this post may be out of date.

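  // Load GOOGLE_APPLICATION_CREDENTIALS from the .env file
  require('dotenv').config();
  
  const textToSpeech = require('@google-cloud/text-to-speech');
  
  const client = new textToSpeech.TextToSpeechClient();
  
  // An empty request lists every voice; pass { languageCode: 'en-US' } to filter by language
  client.listVoices({})
    .then((response) => {
      // response[0] is the API response; its `voices` field holds the voice list
      console.log(JSON.stringify(response[0], null, 2));
    })
    .catch((err) => {
      console.error('ERROR:', err);
    });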
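The raw JSON is long. The sketch below turns response[0].voices into rows like the ones in the list below, optionally filtered by language. It assumes each voice object has languageCodes (an array), name, ssmlGender, and naturalSampleRateHertz, matching the columns of that list; voicesToRows and formatRows are hypothetical helper names.

```javascript
// Hypothetical helper: converts the voices array from listVoices()
// into plain rows, optionally keeping only one language code.
function voicesToRows(voices, languageCode) {
  return voices
    .filter((voice) => !languageCode || voice.languageCodes.includes(languageCode))
    .map((voice) => ({
      languageCode: voice.languageCodes.join(', '),
      name: voice.name,
      ssmlGender: voice.ssmlGender,
      sampleRateHz: voice.naturalSampleRateHertz,
    }))
    // Sort by name so voices of the same language and type appear together
    .sort((a, b) => a.name.localeCompare(b.name));
}

// Formats rows as tab-separated lines, one voice per line
function formatRows(rows) {
  return rows
    .map((row) => [row.languageCode, row.name, row.ssmlGender, row.sampleRateHz].join('\t'))
    .join('\n');
}
```

Inside the .then callback, you could replace the console.log call with console.log(formatRows(voicesToRows(response[0].voices, 'en-US'))) to print only the US English voices.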

Below is the list of supported voices at the time this post was written.

Language Code  Name              SSML Gender  Natural Sample Rate (Hz)
es-ES          es-ES-Standard-A  FEMALE       24000
it-IT          it-IT-Standard-A  FEMALE       24000
ja-JP          ja-JP-Standard-A  FEMALE       22050
ko-KR          ko-KR-Standard-A  FEMALE       22050
pt-BR          pt-BR-Standard-A  FEMALE       24000
tr-TR          tr-TR-Standard-A  FEMALE       22050
sv-SE          sv-SE-Standard-A  FEMALE       22050
nl-NL          nl-NL-Standard-A  FEMALE       24000
en-US          en-US-Wavenet-D   MALE         24000
de-DE          de-DE-Wavenet-A   FEMALE       24000
de-DE          de-DE-Wavenet-B   MALE         24000
de-DE          de-DE-Wavenet-C   FEMALE       24000
de-DE          de-DE-Wavenet-D   MALE         24000
en-AU          en-AU-Wavenet-A   FEMALE       24000
en-AU          en-AU-Wavenet-B   MALE         24000
en-AU          en-AU-Wavenet-C   FEMALE       24000
en-AU          en-AU-Wavenet-D   MALE         24000
en-GB          en-GB-Wavenet-A   FEMALE       24000
en-GB          en-GB-Wavenet-B   MALE         24000
en-GB          en-GB-Wavenet-C   FEMALE       24000
en-GB          en-GB-Wavenet-D   MALE         24000
en-US          en-US-Wavenet-A   MALE         24000
en-US          en-US-Wavenet-B   MALE         24000
en-US          en-US-Wavenet-C   FEMALE       24000
en-US          en-US-Wavenet-E   FEMALE       24000
en-US          en-US-Wavenet-F   FEMALE       24000
fr-FR          fr-FR-Wavenet-A   FEMALE       24000
fr-FR          fr-FR-Wavenet-B   MALE         24000
fr-FR          fr-FR-Wavenet-C   FEMALE       24000
fr-FR          fr-FR-Wavenet-D   MALE         24000
it-IT          it-IT-Wavenet-A   FEMALE       24000
ja-JP          ja-JP-Wavenet-A   FEMALE       24000
nl-NL          nl-NL-Wavenet-A   FEMALE       24000
en-GB          en-GB-Standard-A  FEMALE       24000
en-GB          en-GB-Standard-B  MALE         24000
en-GB          en-GB-Standard-C  FEMALE       24000
en-GB          en-GB-Standard-D  MALE         24000
en-US          en-US-Standard-B  MALE         24000
en-US          en-US-Standard-C  FEMALE       24000
en-US          en-US-Standard-D  MALE         24000
en-US          en-US-Standard-E  FEMALE       24000
de-DE          de-DE-Standard-A  FEMALE       24000
de-DE          de-DE-Standard-B  MALE         24000
en-AU          en-AU-Standard-A  FEMALE       24000
en-AU          en-AU-Standard-B  MALE         24000
en-AU          en-AU-Standard-C  FEMALE       24000
en-AU          en-AU-Standard-D  MALE         24000
fr-CA          fr-CA-Standard-A  FEMALE       24000
fr-CA          fr-CA-Standard-B  MALE         24000
fr-CA          fr-CA-Standard-C  FEMALE       24000
fr-CA          fr-CA-Standard-D  MALE         24000
fr-FR          fr-FR-Standard-A  FEMALE       24000
fr-FR          fr-FR-Standard-B  MALE         24000
fr-FR          fr-FR-Standard-C  FEMALE       24000
fr-FR          fr-FR-Standard-D  MALE         24000
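Once you know which voices exist, you can request a specific one by adding name to the voice object instead of relying on ssmlGender alone. The sketch below picks a voice from a list like the one above, preferring WaveNet voices when one matches. The helper name pickVoice and the WaveNet-first rule are my own assumptions; keep in mind that WaveNet voices are billed at a higher rate than Standard voices.

```javascript
// Hypothetical helper: picks a voice for the given language and gender
// from a voices array (as found in listVoices' response[0].voices),
// preferring WaveNet voices over Standard ones.
function pickVoice(voices, languageCode, ssmlGender) {
  const candidates = voices.filter(
    (voice) => voice.languageCodes.includes(languageCode) && voice.ssmlGender === ssmlGender,
  );

  if (candidates.length === 0) {
    return null;
  }

  // Fall back to the first candidate when no WaveNet voice matches
  const chosen = candidates.find((voice) => voice.name.includes('Wavenet')) || candidates[0];

  // This object can be used as the `voice` field of a synthesizeSpeech request
  return { languageCode, name: chosen.name, ssmlGender };
}
```

Returning null rather than throwing lets the caller decide whether to fall back to a plain { languageCode, ssmlGender } selection.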

That's all about how to use the Google Text-to-Speech API in Node.js. Thank you for reading this post.