In this article, you will learn how to create text-to-speech programs in Python. You will create a Python program that converts any text you provide into speech.
This is an interesting experiment to discover what can be created with Python and to show you the power of Python and its modules.
How can you make Python speak?
Python provides hundreds of thousands of packages that allow developers to write pretty much any type of program. Two cross-platform packages you can use to convert text into speech using Python are PyTTSx3 and gTTS.
Together we will create a simple program to convert text into speech. This program will show you how powerful Python is as a language. It allows us to do even complex things with very few lines of code.
The Libraries to Make Python Speak
In this guide, we will try two different text-to-speech libraries:
- PyTTSx3
- gTTS (Google Text-to-Speech API)
They are both available on the Python Package Index (PyPI), the official repository for Python third-party software. Below you can see the page on PyPI for the two libraries:
- PyTTSx3: https://pypi.org/project/pyttsx3/
- gTTS: https://pypi.org/project/gTTS/
There are different ways to create a program in Python that converts text to speech and some of them are specific to the operating system.
The reason why we will be using PyTTSx3 and gTTS is to create a program that can run in the same way on Windows, Mac, and Linux (cross-platform).
Let’s see how PyTTSx3 works first…
Text-To-Speech With the PyTTSx3 Module
Before using this module remember to install it using pip:
pip install pyttsx3
If you are using Windows and you see one of the following error messages, you will also have to install the module pypiwin32:
No module named win32com.client
No module named win32
No module named win32api
You can use pip for that module too:
pip install pypiwin32
If the pyttsx3 module is not installed you will see the following error when executing your Python program:
ModuleNotFoundError: No module named 'pyttsx3'
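If you would rather check for the module before importing it, here is a minimal sketch. The helper name module_available is hypothetical (it is not part of pyttsx3); it relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# Print a friendly hint instead of failing with ModuleNotFoundError.
if not module_available("pyttsx3"):
    print("pyttsx3 is not installed. Run: pip install pyttsx3")
```

This only tells you whether Python can find the module, not whether the speech driver behind it works on your system.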
There’s also a module called PyTTSx (without the 3 at the end), but it’s not compatible with both Python 2 and Python 3.
We are using PyTTSx3 because it is compatible with both Python versions.
It’s great to see that to make your computer speak using Python you just need a few lines of code:
# import the module
import pyttsx3
# initialise the pyttsx3 engine
engine = pyttsx3.init()
# convert text to speech
engine.say("I love Python for text to speech, and you?")
engine.runAndWait()
Run your program and you will hear the message coming from your computer.
With just four lines of code! (excluding comments)
Also, notice the difference that commas make in your phrase. Try to remove the comma before “and you?” and run the program again.
Can you see (hear) the difference?
Also, you can use multiple calls to the say() function, so:
engine.say("I love Python for text to speech, and you?")
could also be written as:
engine.say("I love Python for text to speech")
engine.say("And you?")
Messages passed to the say() function are only queued: nothing is spoken until the program calls runAndWait(). You can confirm that by commenting out the last line of the program.
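To make the queueing behaviour explicit, here is a small sketch. The function name speak_all is my own (hypothetical); it only uses the say() and runAndWait() calls we have already seen, and it expects an engine created with pyttsx3.init().

```python
def speak_all(engine, messages):
    """Queue every message with say(), then play the whole queue once."""
    for message in messages:
        # Nothing is spoken yet; the message is only added to the queue.
        engine.say(message)
    # A single call speaks all the queued messages in order.
    engine.runAndWait()
    return len(messages)
```

You could call it as speak_all(pyttsx3.init(), ["I love Python for text to speech", "And you?"]).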
Change Voice with PyTTSx3
What else can we do with PyTTSx3?
Let’s see if we can change the voice starting from the previous program.
First of all, let’s look at the voices available. To do that we can use the following program:
import pyttsx3
engine = pyttsx3.init()
voices = engine.getProperty('voices')
for voice in voices:
    print(voice)
You will see an output similar to the one below:
<Voice id=com.apple.speech.synthesis.voice.Alex
name=Alex
languages=['en_US']
gender=VoiceGenderMale
age=35>
<Voice id=com.apple.speech.synthesis.voice.alice
name=Alice
languages=['it_IT']
gender=VoiceGenderFemale
age=35>
<Voice id=com.apple.speech.synthesis.voice.alva
name=Alva
languages=['sv_SE']
gender=VoiceGenderFemale
age=35>
<Voice id=com.apple.speech.synthesis.voice.amelie
name=Amelie
languages=['fr_CA']
gender=VoiceGenderFemale
age=35>
<Voice id=com.apple.speech.synthesis.voice.anna
name=Anna
languages=['de_DE']
gender=VoiceGenderFemale
age=35>
<Voice id=com.apple.speech.synthesis.voice.carmit
name=Carmit
languages=['he_IL']
gender=VoiceGenderFemale
age=35>
<Voice id=com.apple.speech.synthesis.voice.damayanti
name=Damayanti
languages=['id_ID']
gender=VoiceGenderFemale
age=35>
......
....
...
etc...
The voices available depend on your system and they might be different from the ones present on a different computer.
Considering that our message is in English we want to find all the voices that support English as a language. To do that we can add an if statement inside the previous for loop.
Also to make the output shorter we just print the id field for each Voice object in the voices list (you will understand why shortly):
import pyttsx3
engine = pyttsx3.init()
voices = engine.getProperty('voices')
for voice in voices:
    if 'en_US' in voice.languages or 'en_GB' in voice.languages:
        print(voice.id)
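The check above matches two exact tags. As a sketch of a slightly more general filter, the hypothetical helper below accepts any English tag such as en_US, en_GB, or en-AU. I'm assuming each voice has the id and languages attributes shown in the output earlier; the tags your system reports may look different.

```python
def english_voice_ids(voices):
    """Return the ids of voices that list at least one English language tag."""
    matches = []
    for voice in voices:
        for language in voice.languages or []:
            # Assumption: tags are plain strings; anything else is skipped.
            if not isinstance(language, str):
                continue
            tag = language.lower().replace("-", "_")
            if tag == "en" or tag.startswith("en_"):
                matches.append(voice.id)
                break
    return matches
```

With the engine from the program above you would call english_voice_ids(engine.getProperty('voices')).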
Here are the voice IDs printed by the program:
com.apple.speech.synthesis.voice.Alex
com.apple.speech.synthesis.voice.daniel.premium
com.apple.speech.synthesis.voice.Fred
com.apple.speech.synthesis.voice.samantha
com.apple.speech.synthesis.voice.Victoria
Let’s choose a female voice. To do that, we set the voice property of the engine to the id of the voice we want:
engine.setProperty('voice', voice.id)
I select the id com.apple.speech.synthesis.voice.samantha, so our program becomes:
import pyttsx3
engine = pyttsx3.init()
engine.setProperty('voice', 'com.apple.speech.synthesis.voice.samantha')
engine.say("I love Python for text to speech, and you?")
engine.runAndWait()
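The voice id above exists on my Mac, but it may not exist on your computer. Here is a minimal sketch that only switches voice when the engine actually lists that id. The function name set_voice_if_available is hypothetical; it uses only getProperty('voices') and setProperty('voice', ...) from the programs above.

```python
def set_voice_if_available(engine, voice_id):
    """Switch to `voice_id` only if the engine reports a voice with that id."""
    available = [voice.id for voice in engine.getProperty("voices")]
    if voice_id in available:
        engine.setProperty("voice", voice_id)
        return True
    # Leave the default voice untouched when the id is unknown.
    return False
```

If the function returns False, the engine simply keeps speaking with its default voice.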
How does it sound? 🙂
You can also modify the standard rate (speed) and volume of the voice by setting the following engine properties before the calls to the say() function.
Below you can see some examples of how to do it:
Rate
rate = engine.getProperty('rate')
engine.setProperty('rate', rate+50)
Volume
volume = engine.getProperty('volume')
engine.setProperty('volume', volume-0.25)
Play with voice id, rate, and volume to find the settings you like the most!
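Here is a small sketch that applies both changes in one go. The function name adjust_voice is hypothetical. It keeps the volume between 0.0 and 1.0, which is the range pyttsx3 documents for that property; the rate is left as whatever number you end up with.

```python
def adjust_voice(engine, rate_delta=0, volume_delta=0.0):
    """Shift the engine's rate and volume relative to their current values."""
    rate = engine.getProperty("rate") + rate_delta
    volume = engine.getProperty("volume") + volume_delta
    # Keep the volume inside the 0.0 to 1.0 range.
    volume = max(0.0, min(1.0, volume))
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    return rate, volume
```

For example, adjust_voice(engine, rate_delta=50, volume_delta=-0.25) reproduces the two snippets above.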
Text to Speech with gTTS
Now, let’s create a program using the gTTS module instead.
I’m curious to see which one is simpler to use and if there are benefits in gTTS over PyTTSx3 or vice versa.
As usual, we install gTTS using pip:
pip install gtts
One difference between gTTS and PyTTSx3 is that gTTS also provides a CLI tool, gtts-cli.
Let’s get familiar with gtts-cli first, before writing a Python program.
To see all the languages available you can use:
gtts-cli --all
That’s an impressive list!
The first thing you can do with the CLI is to convert text into an MP3 file that you can then play using any suitable application on your system.
We will convert the same message used in the previous section: “I love Python for text to speech, and you?”
gtts-cli 'I love Python for text to speech, and you?' --output message.mp3
I’m on a Mac and I will use afplay to play the MP3 file.
afplay message.mp3
The thing I notice immediately is that the comma and the question mark don’t make much difference. One point for PyTTSx3, which does a better job with this.
I can use the --lang flag to specify a different language. You can see an example in Italian…
gtts-cli 'Mi piace programmare in Python, e a te?' --lang it --output message.mp3
…the message says: “I like programming in Python, and you?”
Now we will write a Python program to do the same thing.
# Import the gTTS module
from gtts import gTTS
# Import the os module so we can play the generated MP3 file
import os
# Generate the audio using the gTTS engine. We are passing the message and the language
audio = gTTS(text='I love Python for text to speech, and you?', lang='en')
# Save the audio in MP3 format
audio.save("message.mp3")
# Play the MP3 file
os.system("afplay message.mp3")
If you run the program you will hear the message.
Remember that I’m using afplay because I’m on a Mac. You can just replace it with any utility that can play sounds on your system.
Looking at the gTTS documentation, I can also have the text read more slowly by passing the slow parameter to the gTTS() function.
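If you want the same script to work on more than one operating system, a sketch like the following picks a playback command based on the platform. The function names player_command and play are hypothetical, and the Windows and Linux commands are my assumptions: they open the file with the default application and may not suit your setup.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return a command (as a list) that should play or open `path`."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS: the same afplay tool used in this article.
        return ["afplay", path]
    if system == "Windows":
        # Assumption: open the file with the default application.
        return ["cmd", "/c", "start", "", path]
    # Assumption: a Linux desktop where xdg-open is available.
    return ["xdg-open", path]


def play(path):
    """Run the platform-specific command for `path`."""
    subprocess.run(player_command(path), check=False)
```

After saving the audio, you would call play("message.mp3") instead of os.system("afplay message.mp3").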
audio = gTTS(text='I love Python for text to speech, and you?', lang='en', slow=True)
Give it a try!
Change Voice with gTTS
How easy is it to change the voice with gTTS?
Is it even possible to customize the voice?
It wasn’t easy to find an answer to this. I have been playing a bit with the parameters passed to the gTTS() function, and I noticed that the English voice changes if the value of the lang parameter is ‘en-US’ instead of ‘en’.
The language parameter uses IETF language tags.
audio = gTTS(text='I love Python for text to speech, and you?', lang='en-US')
The voice seems to take into account the comma and the question mark better than before.
Also from another test it looks like ‘en’ (the default language) is the same as ‘en-GB’.
It looks to me like there’s more variety in the voices available with PyTTSx3 compared to gTTS.
Before finishing this section I also want to show you a way to create a single MP3 file that contains multiple messages, in this case in different languages:
from gtts import gTTS
import os
audio_en = gTTS('hello', lang='en')
audio_it = gTTS('ciao', lang='it')
with open('hello_ciao.mp3', 'wb') as f:
    audio_en.write_to_fp(f)
    audio_it.write_to_fp(f)
os.system("afplay hello_ciao.mp3")
The write_to_fp() function writes bytes to a file-like object that we save as hello_ciao.mp3.
Makes sense?
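The same idea can be sketched for any number of messages. The function name write_messages and its make_tts parameter are hypothetical; by default the function uses the gTTS class exactly as in the program above, so it needs gTTS installed and an Internet connection when it runs.

```python
def write_messages(pairs, fp, make_tts=None):
    """Write one audio segment per (text, lang) pair to the binary file `fp`."""
    if make_tts is None:
        # Imported here so the function can be defined without gTTS installed.
        from gtts import gTTS
        make_tts = gTTS
    for text, lang in pairs:
        # Each segment is appended to the same file-like object.
        make_tts(text=text, lang=lang).write_to_fp(fp)
    return len(pairs)


# Example usage (requires gTTS and a connection):
# with open("hello_ciao.mp3", "wb") as f:
#     write_messages([("hello", "en"), ("ciao", "it")], f)
```

The make_tts parameter is only there so you can swap in a stand-in object when you want to try the function without calling Google.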
Work With Text to Speech Offline
One last question about text-to-speech in Python.
Can you do it offline or do you need an Internet connection?
Let’s run the first one of the programs we created using PyTTSx3.
From my tests, everything works well, so I can convert text into audio even if I’m offline.
This can be very handy for the creation of any voice-based software.
Let’s try gTTS now…
If I run the program using gTTS after disabling my connection, I see the following error:
gtts.tts.gTTSError: Connection error during token calculation: HTTPSConnectionPool(host='translate.google.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x11096cca0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
So, gTTS doesn’t work without a connection because it requires access to translate.google.com.
If you want to make Python speak offline use PyTTSx3.
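You can also combine the two modules: try gTTS first and fall back to PyTTSx3 when the connection is not available. This is only a sketch. The function names are hypothetical, and by default it catches every exception from the online step, which you may want to narrow down (for example to gtts.tts.gTTSError, the error shown above).

```python
def speak_with_fallback(text, online, offline, online_errors=(Exception,)):
    """Try the online speaker first; use the offline one if it fails."""
    try:
        online(text)
        return "online"
    except online_errors:
        offline(text)
        return "offline"


def gtts_speak(text, path="message.mp3"):
    """Online step: generate the MP3 with gTTS (playing it is up to you)."""
    from gtts import gTTS
    gTTS(text=text, lang="en").save(path)


def pyttsx3_speak(text):
    """Offline step: speak the text directly with pyttsx3."""
    import pyttsx3
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

You would call it as speak_with_fallback("I love Python for text to speech", gtts_speak, pyttsx3_speak) and check the returned string to see which module was used.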
Conclusion
We have covered a lot!
You have seen how to use two cross-platform Python modules, PyTTSx3 and gTTS, to convert text into speech and to make your computer talk!
We also went through the customization of voice, rate, volume, and language. From what I can see with the programs we created here, these are more flexible with the PyTTSx3 module.
Are you planning to use this for a specific project?
Let me know in the comments below 🙂
Claudio Sabato is an IT expert with over 15 years of professional experience in Python programming, Linux Systems Administration, Bash programming, and IT Systems Design. He is a professional certified by the Linux Professional Institute.
With a Master’s degree in Computer Science, he has a strong foundation in Software Engineering and a passion for robotics with Raspberry Pi.
Hi, yes, I was planning to develop a program which would read text in multiple voices. I’m not a programmer and was looking to find the simplest way to achieve this. There are so many programming languages out there. Would you say Python would be the best for this purpose?
kind regards
Delton