Categories
Development Tools

Leveraging Python’s multiprocessing module to output plots from MatPlotLib

From time to time, plots generated from MatPlotLib take a while to process and display when placed in serially in a python program. However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed). To maintain the flexibility that I had when originally making the plotter() function, I created two wrapper functions to send MatPlotLib to a new task.  

Introduction

On a current project I am working on, I either have to or desire to plot information using the MatPlotLib library.  However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed).  However, when coded serially – the some plots can take a half minute or more to plot.

Tools used

Context

Here, MatPlotLib was used to plot geospatial data for small portions of a sphere.  A separate file called plotting_toolbox.py was created to store a function and sub functions to plot similar data for the region in question.  the  plotting_toolbox.py and it’s main function plotter(*) is used when I need to plot different attributes in different figures to highlight some aspect of the problem I am solving.  At the time of this writing – the code is embargoed.  However, the function is built as follows:

def plotter(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print("Plotting", title, "(", out_file_name, ")")
    fig, ax = plt.subplots()
    _setup(ax, fig, title, dpi_override)
    if roads is not None:
        _plot_roads(ax, roads)
    if fire_stations is not None:
        _plot_fire_stations(ax, fire_stations)
    #etc...
    if out_file_name is not None:
        fig.savefig(out_file_name, bbox_inches='tight')
    if display:
        plt.show()
def _setup(ax, fig, title, dpi_override):
    fig.gca().set_aspect('equal', adjustable='box')
    ax.grid(zorder=0)
    ax.set_title(label=title)
    fig.dpi = dpi_override
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
def _plot_roads(ax, roads): 
    for road in roads: 
        x, y = zip(*road) 
        ax.plot(x, y, 'b,-', linewidth=0.5, zorder=35)

etc…

For example, if I wanted a map to display just the fire stations, I may call

plotter('fire stations', out_file_name='fire_stations.png', fire_stations=fs)

If I wanted a map of the roads and counties, I may call

plotter('Roads', out_file_name='roads.png', roads=rds, counties=True)

and so forth.

Depending on the size of the road file and other data in the program, it can take a little bit of time to process and plot the graphs.  Using multiprocessing, we can create a new process where the plotting is done in a separate python process and we can let the computational part of the program continue to run in the original process.

The Modification

In Python, to import the multiprocessing module, we call:

from multiprocessing import Process

and can use the Process class as indicated in the Python  documentation, such as:

p = Process(target=f, args=('bob',))
p.start()
p.join()

In this case, we have multiple optional arguments in the plotter(*)function.  To retain the simplicity of being able to call the function with the optional parameters, I created a new function called plotter_mp(*) and   plotter_args_parser(*) which took the identical arguments as plotter(*).

Since Process Cannot handle optional arguments, the function plotter_args_parser(*) exists to convert the optional arguments to their default values.  It simply returns a tuple of all the arguments ensuring that the default arguments retain their values if they are default.

def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)

When used in conjunction with plotter_mp(*), we see:

def plotter_mp(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print('Plotting with Multiprocessing:', title)
    p_args = plotter_args_parser(title, out_file_name, roads, fire_stations, counties, display, dpi_override)
    p = Process(target=plotter, args=p_args)
    p.start()

and the plot will be saved and/or output whenever it is finished.

Savings

When executed on an AMD A8-7600 3.10ghz 4 core computer with 16.0GB RAM (15.0GB useable) on 64 bit Windows 10, without multiprocessing the program took approximately 10 to 11 minutes to complete.  However, when plotting on a separate process (11 images), the process took around 8 to 9 minutes to complete – or about 10-20% in savings.  Multiprocessing was also leveraged to read the large data files, however – that only shaved seconds of a serial implementation.

Conclusion

To maintain the flexibility that I had when originally making the plotter(*) function, I created two wrapper functions, one called plotter_args_parser(*) and another called plotter_mp(*), where the former turns the arguments into a tuple and the latter wraps the Process class and lets the new python process do it’s plotting thing until it’s finished.