Python Yield: Create Your Generators [With Examples]

The Python yield keyword is something that at some point you will encounter as a developer. What is yield? How can you use it in your programs?

The yield keyword is used to return a value to the caller of a Python function without losing the state of the function. When the caller asks for the next value, the function resumes its execution from the line after the yield expression. A function that uses the yield keyword is called a generator function.

This definition might not be enough to understand yield.

That’s why we will look at some examples of how to use the yield keyword in your Python code.

Let’s start coding!

Regular Functions and Generator Functions

Most developers are familiar with the Python return keyword. It is used to return a value from a function and it stops the execution of that function.

When you use return in your function any information about the state of that function is lost after the execution of the return statement.

The same doesn’t happen with yield…

When you use yield the function still returns a value to the caller, with the difference that the state of the function is kept in memory. This means that, when the next value is requested, the execution of the function can continue from the line of code after the yield expression.

That sounds complicated!?!

Here is an example…

The following regular function takes as input a list of numbers and returns a new list with every value multiplied by 2.

def double(numbers):
    double_numbers = []
    for number in numbers:
        double_numbers.append(2*number)
    return double_numbers

numbers = [3, 56, 4, 76, 45]
print(double(numbers))

When you execute this code you get the following output:

[6, 112, 8, 152, 90]

When the function reaches the return statement the execution of the function stops. At this point the Python interpreter doesn’t keep any details about its state in memory.

Let’s see how we can get the same result by using yield instead of return.

def double(numbers):
    for number in numbers:
        yield 2*number

numbers = [3, 56, 4, 76, 45]
print(double(numbers))

This new function is a lot simpler…

…here are the differences from the function that was using the return statement:

  • We don’t need the new double_numbers list.
  • We can remove the line that contains the return statement because we don’t need to return an entire list back.
  • Inside the for loop we can directly use yield to return values to the caller one at a time.

What output do we get this time from the print statement?

<generator object double at 0x7fc8600ac820>

A generator function returns a generator object.
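If you want to double-check what you got back, the standard inspect and types modules can help. This small sketch also shows that calling double() doesn’t run any of its body: the generator starts in the GEN_CREATED state and only runs up to the first yield when the first value is requested.

```python
import inspect
import types

def double(numbers):
    for number in numbers:
        yield 2*number

numbers = [3, 56, 4, 76, 45]
double_gen = double(numbers)

# Python marks double() as a generator function
print(inspect.isgeneratorfunction(double))          # True
# Calling it returns a generator object instead of a list
print(isinstance(double_gen, types.GeneratorType))  # True
# None of the function body has run yet
print(inspect.getgeneratorstate(double_gen))        # GEN_CREATED
# Asking for the first value runs the body up to the first yield
print(next(double_gen))                             # 6
print(inspect.getgeneratorstate(double_gen))        # GEN_SUSPENDED
```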

In the next section we will see how to read values from this generator object.

Read the Output of Generator Functions

Firstly, let’s recap what yield does when it is used in a Python function:

A function that contains the yield keyword is called a generator function, as opposed to a regular function that uses the return keyword to return a value to the caller. The behaviour of yield is different from return because yield returns values one at a time and pauses the execution of the function until the next value is requested.

In the previous section we saw that when we print the output of a generator function we get back a generator object.

But how can we get the values from the generator object in the same way we do with a regular Python list?

We can use a for loop. Remember that we were calling the generator function double(). Let’s assign the output of this function to a variable and then loop through it:

double_gen = double(numbers)

for number in double_gen:
    print(number)

With a for loop we get back all the values from this generator object:

6
112
8
152
90

We could use the exact same for loop to print the values in the list returned by the regular function we defined earlier (the one using the return statement).

So, what’s the difference between the two functions?

The regular function creates a list in memory and returns the full list using the return statement. The generator function doesn’t keep the full list of numbers in memory: numbers are produced one by one, each time the for loop asks the generator for the next value.
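To make the difference visible, here is a small sketch that records the order in which values are computed and consumed. The function names double_with_list() and double_with_yield() and the events list are made up for this comparison:

```python
events = []

def double_with_list(numbers):
    # Regular function: computes every value before returning the list
    result = []
    for number in numbers:
        events.append("list version computes {}".format(2*number))
        result.append(2*number)
    return result

def double_with_yield(numbers):
    # Generator function: computes a value only when the loop asks for it
    for number in numbers:
        events.append("generator computes {}".format(2*number))
        yield 2*number

for value in double_with_list([3, 56]):
    events.append("loop receives {}".format(value))

for value in double_with_yield([3, 56]):
    events.append("loop receives {}".format(value))

for event in events:
    print(event)
```

With the list version every value is computed before the loop receives the first one; with the generator the two steps alternate.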

We can also get values from the generator using the next() function.

The next function returns the next item in the generator every time we pass the generator object to it.

We are expecting back a sequence of five numbers. Let’s pass the generator to the next() function six times and see what happens:

double_gen = double(numbers)

print(next(double_gen))
print(next(double_gen))
print(next(double_gen))
print(next(double_gen))
print(next(double_gen))
print(next(double_gen))

[output]
6
112
8
152
90
Traceback (most recent call last):
  File "/opt/python/yield/yield_tutorial.py", line 15, in <module>
    print(next(double_gen))
StopIteration

The first time we call the next() function we get back 6, then 112, then 8 and so on.

After the fifth call there are no more numbers to be returned by the generator, so when we call next() a sixth time the Python interpreter raises a StopIteration exception.

The exception is raised because no more values are available in the generator.
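If you would rather not deal with the exception, next() also accepts an optional second argument: a default value that is returned, instead of raising StopIteration, once the generator is exhausted. A quick sketch using None as the default:

```python
def double(numbers):
    for number in numbers:
        yield 2*number

double_gen = double([3, 56])

print(next(double_gen, None))  # 6
print(next(double_gen, None))  # 112
# The generator is exhausted: next() returns the default instead of raising
print(next(double_gen, None))  # None
```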

When you use the for loop to get the values from the generator you don’t see the StopIteration exception because the for loop handles it transparently.
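Conceptually, a for loop over a generator works like the while loop below: it keeps calling next() and stops as soon as StopIteration is raised. This is a simplified sketch of the idea, not the interpreter’s actual implementation:

```python
def double(numbers):
    for number in numbers:
        yield 2*number

numbers = [3, 56, 4, 76, 45]
double_gen = double(numbers)
collected = []

# Manual version of: for number in double_gen: print(number)
while True:
    try:
        number = next(double_gen)
    except StopIteration:
        # No more values: this is where a for loop stops silently
        break
    collected.append(number)
    print(number)
```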

Next Function and __next__() Generator Object Method

Using the dir() built-in function we can see that __next__ is one of the methods available for our generator object.

This is the method that is called when we pass the generator to the next() function.

print(dir(double_gen))

[output]
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send', 'throw']

Python methods whose name starts and ends with double underscores are called dunder methods.
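The dir() output also lists __iter__. A generator is its own iterator: iter() returns the same object, and every way of consuming it (next(), a for loop) advances the same paused function. A quick sketch, reusing the double() generator function:

```python
def double(numbers):
    for number in numbers:
        yield 2*number

double_gen = double([3, 56, 4])

# A generator is its own iterator, hence both __iter__ and __next__
print(iter(double_gen) is double_gen)  # True

# next() and the for loop share the same paused state
print(next(double_gen))  # 6
rest = []
for number in double_gen:
    rest.append(number)
print(rest)  # [112, 8]
```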

How to Convert a Generator to a Python List

In our generator example we have seen that when we print the generator variable we get back a reference to a generator object, not its values.

But, how can we see all the values in the generator object without using a for loop or the next() function?

A way to do that is by converting the generator into a Python list using the list() function.

double_gen = double(numbers)
print(double_gen)
print(list(double_gen))

[output]
<generator object double at 0x7f821007c820>
[6, 112, 8, 152, 90]

As you can see, list() returns all the values produced by the generator.

This doesn’t necessarily make sense, considering that one of the reasons you would use a generator is that generators require a lot less memory than lists.

That’s because when you use a list Python stores every single element of the list in memory, while a generator produces only one value at a time. Some additional memory is required to “pause” the generator function and remember its state.

When we convert the generator into a list using the list() function, we allocate the memory required for every element returned by the generator (basically the same thing that happens with a regular list).
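A related detail worth knowing: a generator can be consumed only once. After list() has pulled every value out of it, the generator is exhausted, so converting it a second time returns an empty list. If you need the values again, keep the list or call the generator function again to get a fresh generator:

```python
def double(numbers):
    for number in numbers:
        yield 2*number

numbers = [3, 56, 4, 76, 45]
double_gen = double(numbers)

print(list(double_gen))       # [6, 112, 8, 152, 90]
print(list(double_gen))       # [] because the generator is exhausted
# Calling the generator function again creates a new generator
print(list(double(numbers)))  # [6, 112, 8, 152, 90]
```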

In one of the next sections we will analyse the difference in size between a list and a generator.

Generator Expressions

We have seen how to use the yield keyword to create a generator function.

This is not the only way to create generators: you can also use a generator expression.

To introduce generator expressions we will start from an example of list comprehension, a Python construct used to create a new list from an existing one in a one-liner.

Let’s say we want to write a list comprehension that returns the same output as the functions we have defined before.

The list comprehension takes a list and returns a new list where every element is multiplied by 2.

numbers = [3, 56, 4, 76, 45]
double_numbers = [2*number for number in numbers]
print(type(double_numbers))
print(double_numbers)

The list comprehension starts and ends with a square bracket and in a single line does what the functions we have defined before were doing with multiple lines of code.

<class 'list'>
[6, 112, 8, 152, 90]

As you can see the value returned by the list comprehension is of type list.

Now, let’s replace the square brackets of the list comprehension with parentheses. This is a generator expression.

numbers = [3, 56, 4, 76, 45]
double_numbers = (2*number for number in numbers)
print(type(double_numbers))
print(double_numbers)

This time the output is slightly different…

<class 'generator'>
<generator object <genexpr> at 0x7feb88224820>

The object returned by the new expression is a generator, not a list.

We can go through this generator in the same way we have seen before, by using either a for loop or the next() function:

print(next(double_numbers))

[output]
6

To convert a list comprehension into a generator expression replace the square brackets that surround the list comprehension with parentheses.
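Generator expressions are especially handy when you pass them straight to a function that consumes an iterable, such as sum() or max(). When the generator expression is the only argument, Python lets you drop its own parentheses, and no intermediate list is built:

```python
numbers = [3, 56, 4, 76, 45]

# sum() and max() pull the doubled values one at a time
total = sum(2*number for number in numbers)
largest = max(2*number for number in numbers)

print(total)    # 368
print(largest)  # 152
```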

Notice that there is a small difference in the way Python represents an object returned by a generator function and a generator expression.

Generator Function

<generator object double at 0x7f821007c820>

Generator Expression

<generator object <genexpr> at 0x7feb88224820>

More About Using Yield in a Python Function

We have seen an example of how to use yield in a function, but I want to give you another example that clearly shows the behaviour of yield.

Let’s take the generator function we have created before and add some print statements to show exactly what happens when the function is called.

def double(numbers):
    for number in numbers:
        print("Before yield - Number: {}".format(2*number))
        yield 2*number
        print("After yield - Number: {}".format(2*number))

numbers = [3, 56, 4]
double_gen = double(numbers)

When we call the next() function and pass the generator we get the following:

>>> next(double_gen)
Before yield - Number: 6
6

The first print statement and the yield statement are executed. After that the function is paused and the value in the yield expression is returned.

When we call next() again the execution of the function continues from where it left off. Here is what the Python interpreter does:

  1. Execute the print statement after the yield expression.
  2. Start the next iteration of the for loop.
  3. Execute the print statement before the yield expression.
  4. Return the yielded value and pause the function.
>>> next(double_gen)
After yield - Number: 6
Before yield - Number: 112
112

This gives you a better understanding of how Python pauses and resumes the state of a generator function.
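To see what happens once the input list is used up, here is a sketch of the same generator that appends each message to an events list instead of printing it (the events list is just an aid for this example). After the last value has been yielded, one more call to next() runs the line after yield, exits the for loop and raises StopIteration:

```python
events = []

def double(numbers):
    for number in numbers:
        events.append("Before yield - Number: {}".format(2*number))
        yield 2*number
        events.append("After yield - Number: {}".format(2*number))

numbers = [3, 56, 4]
double_gen = double(numbers)

# Three calls consume the three values
for _ in range(3):
    events.append("next() returned {}".format(next(double_gen)))

# The fourth call finishes the function, which raises StopIteration
try:
    next(double_gen)
except StopIteration:
    events.append("StopIteration")

for event in events:
    print(event)
```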

How To Yield a Tuple in Python

In the examples we have seen so far we have been using the yield keyword to return a single number.

Can we apply yield to a tuple instead?

Let’s say we want to pass the following list of tuples to our function:

numbers = [(3, 4), (56, 57), (4, 5)]

We can modify the previous generator function to return tuples where we multiply every element by 2.

def double(numbers):
    for element in numbers:
        # Compute the doubled tuple once and reuse it
        doubled = (2*element[0], 2*element[1])
        print("Before yield {}".format(doubled))
        yield doubled
        print("After yield {}".format(doubled))

In the same way we have done before let’s call the next() function twice and see what happens:

First call

double_gen = double(numbers)
next(double_gen)

[output]
Before yield (6, 8)

Second call

next(double_gen) 

[output]
After yield (6, 8)
Before yield (112, 114)

So, the behaviour is exactly the same.
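Since every yielded value is a tuple, you can unpack it directly in a for loop. A short sketch with a version of the function without the print statements (unpacking each input tuple into first and second is my own variation):

```python
def double(numbers):
    for first, second in numbers:
        yield (2*first, 2*second)

numbers = [(3, 4), (56, 57), (4, 5)]

# Each yielded tuple is unpacked into two variables
for first, second in double(numbers):
    print(first, second)
```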

Multiple Yield Statements in a Python Function

Can you use multiple yield statements in a single Python function?

Yes, you can!

The behaviour of the generator function doesn’t change from the scenario where you have a single yield expression.

Every time the __next__ method gets called on the generator object, the execution of the function continues where it left off until the next yield expression is reached.

Here is an example. Open the Python shell and create a generator function with two yield expressions. The first one returns a list and the second one returns a tuple:

>>> def multiple_yield():
...     yield [1, 2, 3]
...     yield (4, 5, 6)
... 
>>> gen = multiple_yield()

When we pass the generator object gen to the next function we should get back the list first and then the tuple.

>>> next(gen)
[1, 2, 3]
>>> next(gen)
(4, 5, 6) 

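Passing the generator object to the next() function is basically the same as calling the __next__ method of the generator object. Since gen is already exhausted at this point, we first create a new generator object:

>>> gen = multiple_yield()
>>> gen.__next__()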
[1, 2, 3]
>>> gen.__next__()
(4, 5, 6)
>>> gen.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration 

As expected the Python interpreter raises a StopIteration exception when we execute the __next__ method the third time. That’s because our generator function only contains two yield expressions.

Can I Use Yield and Return in the Same Function?

Have you wondered if you can use yield and return in the same function?

Let’s see what happens when we do that in the function we have created in the previous section.

Here we are using Python 3.8.5:

>>> def multiple_yield():
...     yield [1, 2, 3]
...     yield (4, 5, 6)
...     return 'done'
... 
>>> gen = multiple_yield() 

The behaviour is similar to that of the function without the return statement. The first two times we call the next() function we get back the two values in the yield expressions.

The third time we call the next() function the Python interpreter raises a StopIteration exception. The only difference is that the value in the return statement (‘done’) is attached to the exception, which is why it appears in the error message.

>>> next(gen)
[1, 2, 3]
>>> next(gen)
(4, 5, 6)
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: done 
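The value passed to return is stored in the value attribute of the StopIteration exception, so you can read it if you call next() yourself and catch the exception. A minimal sketch (note that a for loop silently discards this value):

```python
def multiple_yield():
    yield [1, 2, 3]
    yield (4, 5, 6)
    return 'done'

gen = multiple_yield()
values = []

while True:
    try:
        values.append(next(gen))
    except StopIteration as exc:
        # The argument of the return statement ends up in exc.value
        result = exc.value
        break

print(values)  # [[1, 2, 3], (4, 5, 6)]
print(result)  # done
```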

If you try to run the same code with Python 2.7 you get a SyntaxError, because in Python 2 a return statement with an argument cannot be used inside a generator function.

>>> def multiple_yield():
...     yield [1, 2, 3]
...     yield (4, 5, 6)
...     return 'done'
... 
  File "<stdin>", line 4
SyntaxError: 'return' with argument inside generator 

Let’s try to remove the return argument:

>>> def multiple_yield():
...     yield [1, 2, 3]
...     yield (4, 5, 6)
...     return
... 
>>>  

All good this time.

This is just an experiment…

In reality it might not make sense to use yield and return as part of the same generator function.

Have you found a scenario where it might be useful? Let me know in the comments.
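One scenario where a return inside a generator can be handy is a bare return used to stop the generator early when a condition is met. A minimal sketch with a made-up helper called take_until_negative(), which yields numbers until it finds the first negative one:

```python
def take_until_negative(numbers):
    for number in numbers:
        if number < 0:
            # A bare return ends the generator: the caller sees StopIteration
            return
        yield number

print(list(take_until_negative([3, 56, -1, 76, 45])))  # [3, 56]
```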

Generators and Memory Usage

One of the reasons to use generators instead of lists is to save memory.

That’s because when working with lists all the elements of a list are stored in memory, while the same doesn’t happen when working with generators.

We will generate a list made of 100,000 elements and see how much space it takes in memory using the sys module.

Let’s start by defining two functions, one regular function that returns a list of numbers and a generator function that returns a generator object for the same sequence of numbers.

Regular Function

def get_numbers_list(limit):
    # limit avoids shadowing the built-in max()
    numbers = []
    for number in range(limit):
        numbers.append(number)
    return numbers

Generator Function

def get_numbers_generator(limit):
    # limit avoids shadowing the built-in max()
    for number in range(limit):
        yield number

Now, let’s get the list of numbers and the generator object back and calculate their size in bytes using the sys.getsizeof() function.

import sys

numbers_list = get_numbers_list(100000)
print("The size of the list is {} bytes".format(sys.getsizeof(numbers_list)))

numbers_generator = get_numbers_generator(100000)
print("The size of the generator is {} bytes".format(sys.getsizeof(numbers_generator)))

The output is:

The size of the list is 824456 bytes
The size of the generator is 112 bytes

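The list takes over 7000 times the memory required by the generator! The exact byte counts depend on your Python version and platform, but the gap between the two stays huge.

So, there is definitely a benefit in memory allocation when it comes to using generators. On the other hand, a list can be indexed and iterated over as many times as you want, and iterating over a list that is already in memory is often a bit faster, so it’s about finding a tradeoff between memory usage and performance.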
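Keep in mind that sys.getsizeof() only measures the container itself: for the list it counts the array of references, not the integer objects they point to. For a rough picture of the total memory involved, here is a sketch using the standard tracemalloc module to compare the peak memory allocated while summing 100,000 numbers from a list and from a generator expression. The peak_bytes() helper is made up for this example, and the numbers you see will vary by Python version and platform.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() and return the peak memory (in bytes) traced while it ran."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# The list comprehension builds all 100,000 integers before sum() starts
list_peak = peak_bytes(lambda: sum([number for number in range(100000)]))
# The generator expression hands sum() one integer at a time
gen_peak = peak_bytes(lambda: sum(number for number in range(100000)))

print("Peak with a list:      {} bytes".format(list_peak))
print("Peak with a generator: {} bytes".format(gen_peak))
```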

Conclusion

You have learned the difference between return and yield in a Python function.

So now you know how to use the yield keyword to convert a regular function into a generator function.

I have also explained how generator expressions can be used as an alternative to generator functions.

Finally, we have compared generators and regular lists from a memory usage perspective and shown why you can use generators to save memory, especially if you are working with big datasets.
