# Understanding Multiprocessing with Python

By [Regan Willis](https://paragraph.com/@regan-willis) · 2022-01-31

---

![](https://storage.googleapis.com/papyrus_images/6199005b4873173039db3eb4f19e257ad5fd2cc01a7b2a2ee97096af6ad92faa.jpg)

Multiprocessing means using more than one processor core so that, instead of running sequentially (like usual), a program splits its work across multiple processes that run in parallel. This can be a useful way to speed up your program. Parallelism is trickier in Python than in many other languages because of the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock), which prevents more than one thread from executing Python bytecode at a time — so threads alone won't give you true parallelism. Luckily, as is typical for Python, we have libraries to help us! We’ll use the standard libraries [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) and [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html). You can view the source code [here](https://github.com/reganwillis/Python-Multiprocessing).

Creating Multiple Parallel Processes
------------------------------------

If you’ve been researching this already you might have heard the word “pool” thrown around. This is a common way to implement parallel processing — creating several different processes that all work at the same time. You can do this with the regular multiprocessing library, but I think the concurrent.futures library is a little more intuitive, and it lets you switch from threads to processes with almost no change in syntax.
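To see how small that switch is, here is a minimal sketch (not from the original post, with a made-up `square` function as the work): the executor class is the only thing that changes between threads and processes.

```python
from concurrent.futures import ThreadPoolExecutor  # swap in ProcessPoolExecutor for processes


def square(n):
    # simulate a unit of computation
    return n * n


# the executor class is the only line that changes between
# a thread pool and a process pool
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```

Note that `ProcessPoolExecutor` additionally requires the submitted function and its arguments to be picklable, and on some platforms the pool must be created under an `if __name__ == "__main__":` guard.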

The code below is an example with two concurrent parts: one is the main program and the other holds a pool of worker processes. All the computation in the pool is done in parallel with the other pool workers. The main program loops through to look for qualifying data and hands each qualifying data point to the pool, where the worker processes compute on it in parallel.

    from time import sleep
    from concurrent.futures import ProcessPoolExecutor
    
    
    def my_function(data):
    
        # simulate computation on data
        if data == 9:
            sleep(4)
        else:
            sleep(1)
    
        print(f'data received: {data}')
    
    
    if __name__ == "__main__":
    
        # create process pool; the with-block waits for every
        # submitted task to finish before the program exits
        with ProcessPoolExecutor() as executor:
    
            # loop through data
            for data in range(1, 31):
    
                # send qualifying data to the pool
                if data % 3 == 0:
                    print('...sending data')
                    executor.submit(my_function, data)
    

[view the Gist on Github](https://gist.github.com/reganwillis/2e1048164c0a92073eb3e5f5eea46036#file-concurrentfutures_example-py)

To test that the processes are running in true parallel, I made each process sleep, and all the processes slept at roughly the same time. In total the processes sleep for 13 seconds (nine one-second sleeps plus one four-second sleep), so if this program were run sequentially it would take over 13 seconds to complete all the processes. However, because we are using multiprocessing, sending all the data points takes less than five seconds. The third process (the one handling 9) takes a little longer, so instead of the program waiting on it, it goes on to complete all the other processes and the third process finishes last.
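You can observe that out-of-order completion directly with `concurrent.futures.as_completed`, which yields futures as they finish rather than in submission order. A minimal sketch (using threads and made-up sleep times so it runs quickly anywhere):

```python
from time import sleep
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(data):
    # the first submitted task sleeps longest
    sleep(0.5 if data == 0 else 0.1)
    return data


with ThreadPoolExecutor() as executor:
    futures = [executor.submit(slow_task, d) for d in range(3)]
    # as_completed yields futures in completion order, not submission order
    finished = [f.result() for f in as_completed(futures)]

print(finished)  # the slow task (0) comes out last
```

Even though task 0 was submitted first, it appears last in `finished` because the other workers were not held up waiting on it — the same behavior described for the process pool above.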

Creating a Spin-Off, FIFO Idling Process
----------------------------------------

The problem with the pool is that although processes may be _created_ in a first-in, first-out manner, there is no way to guarantee that the processes _end_ in order. Usually, this is an advantage: if the fifth process is taking a little longer it’s no problem because the sixth and seventh processes can still keep going. However, you may find yourself in the position where a function can be done asynchronously, but the _order of the output_ of that function matters. In this case, you need only one process that will spin off and work in parallel with the main thread. Here’s an example using the multiprocessing library:

    import multiprocessing as mp
    import time
    
    
    def my_function(queue):
        data = queue.get()
    
        # loop through queue until flag is received
        while data:
            print(f'data received: {data}')
            data = queue.get()
    
    
    if __name__ == "__main__":
    
        # create queue, start process
        queue = mp.Queue()
        process = mp.Process(target=my_function, args=(queue,))
        process.start()
    
        # loop through data
        for data in range(1, 31):
    
            # simulate data search taking some time
            time.sleep(0.25)
    
            # send qualifying data through the queue
            if data % 3 == 0:
                print('...sending data')
                queue.put(data)
    
        # send flag to stop process
        queue.put(False)
    
        # wait for the spun-off process to finish
        process.join()
    

[view the Gist on Github](https://gist.github.com/reganwillis/cb6a114c19ff8dd88c61cc854d76edfc#file-multiprocessing_example-py)

The main thread loops constantly, sending some of the data to the function so it can do time-consuming computation on it in parallel. The multiprocessing library includes a [queue](https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues) that can be passed as an argument to a function. From the main thread, you put data in the queue; calling the get method removes the data from the queue in the order it was sent. The process starts as soon as you call the start method, but the queue’s get method, by default, blocks until there is something in the queue. After the first piece of data arrives in the queue, the function can keep looping through it until a flag is sent to stop the loop.
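The first-in, first-out behavior is easy to see even without spawning a second process. A minimal sketch (not from the original post): items come back from `get` in exactly the order they were `put`.

```python
import multiprocessing as mp

queue = mp.Queue()

# put items in the queue in a known order
for data in (3, 6, 9):
    queue.put(data)

# get removes items in first-in, first-out order; the timeout
# guards against blocking forever if the queue were empty
received = [queue.get(timeout=5) for _ in range(3)]
print(received)  # [3, 6, 9]
```

Passing a timeout to `get` is a useful safety net in real programs too: if the producing process dies without sending the stop flag, a plain `get()` would block forever.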

[Originally published to Medium on 08/01/2020](https://medium.com/@reganwillis/understanding-multiprocessing-with-python-b2a793be046b).

