Introduction
On a project I am currently working on, I need (or simply want) to plot information using the Matplotlib library. However, the plots generated by the program have no bearing on the rest of the program: they are a side output that is saved to disk and/or displayed, and nothing downstream depends on them. When coded serially, though, some plots can take half a minute or more to generate.
Tools used
Python, Matplotlib, and the multiprocessing module from the Python standard library.
Context
Here, Matplotlib was used to plot geospatial data for small portions of a sphere. A separate file called plotting_toolbox.py was created to hold a main function and several helper functions that plot similar data for the region in question. I use plotting_toolbox.py and its main function, plotter(*), when I need to plot different attributes in different figures to highlight some aspect of the problem I am solving. At the time of writing, the code is embargoed; however, the function is built as follows (abridged):
```python
import matplotlib.pyplot as plt


def plotter(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print("Plotting", title, "(", out_file_name, ")")
    fig, ax = plt.subplots()
    _setup(ax, fig, title, dpi_override)
    if roads is not None:
        _plot_roads(ax, roads)
    if fire_stations is not None:
        _plot_fire_stations(ax, fire_stations)
    # etc. (the remaining layers, including counties, are not shown)
    if out_file_name is not None:
        fig.savefig(out_file_name, bbox_inches='tight')
    if display:
        plt.show()
    plt.close(fig)  # release the figure once it has been saved and/or shown


def _setup(ax, fig, title, dpi_override):
    ax.set_aspect('equal', adjustable='box')  # same axes as fig.gca() here
    ax.grid(zorder=0)
    ax.set_title(label=title)
    fig.dpi = dpi_override
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')


def _plot_roads(ax, roads):
    # Each road is a sequence of (longitude, latitude) points.
    for road in roads:
        x, y = zip(*road)
        ax.plot(x, y, 'b,-', linewidth=0.5, zorder=35)

# etc. (_plot_fire_stations and the other helpers are not shown)
```
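_plot_roads treats each road as a sequence of (longitude, latitude) pairs, and zip(*road) transposes those pairs into one sequence of x values and one of y values, which is the form ax.plot(x, y, ...) expects. A small illustration, assuming that data layout (the coordinates below are made up):

```python
# Hypothetical road: a polyline stored as (longitude, latitude) pairs.
road = [(-105.10, 39.70), (-105.08, 39.71), (-105.05, 39.73)]

# zip(*road) unpacks the pairs as separate arguments to zip, which regroups
# them into a tuple of all longitudes and a tuple of all latitudes.
x, y = zip(*road)
print(x)  # (-105.1, -105.08, -105.05)
print(y)  # (39.7, 39.71, 39.73)
```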
For example, if I wanted a map showing just the fire stations, I might call:
```python
plotter('fire stations', out_file_name='fire_stations.png', fire_stations=fs)
```
If I wanted a map of the roads and counties, I might call:
```python
plotter('Roads', out_file_name='roads.png', roads=rds, counties=True)
```
and so forth.
Depending on the size of the road file and the other data in the program, processing and plotting the graphs can take a while. Using the multiprocessing module, the plotting can be done in a separate Python process while the computational part of the program continues to run in the original process.
The Modification
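In Python, we import the Process class from the multiprocessing module and can use it as shown in the Python documentation:
```python
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':  # required on Windows, where the child re-imports this module
    p = Process(target=f, args=('bob',))
    p.start()  # run f('bob') in a new process
    p.join()   # wait for that process to finish
```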
In this case, the plotter(*) function has several optional arguments. To keep the convenience of calling it with only the optional parameters I need, I created two new functions, plotter_mp(*) and plotter_args_parser(*), which take the same arguments as plotter(*).
Since the args parameter of Process takes only a tuple of positional arguments, plotter_args_parser(*) exists to fill in the default value of every optional argument that was not supplied. It simply returns a tuple of all the arguments, in the order plotter(*) expects, so that any argument left unspecified keeps its default value.
```python
def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)
```
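To see what the helper produces, here is the roads-and-counties call from earlier passed through plotter_args_parser. The coordinates in rds are made up for illustration. The omitted arguments come back as their defaults, in the positional order that Process uses when it calls plotter(*p_args):

```python
def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None,
                        counties=False, display=True, dpi_override=300):
    # Collect every argument, defaults included, in plotter()'s positional order.
    return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)


# Hypothetical road data: a single two-point polyline of (lon, lat) pairs.
rds = [[(-105.10, 39.70), (-105.05, 39.73)]]

p_args = plotter_args_parser('Roads', out_file_name='roads.png', roads=rds, counties=True)
print(p_args)
# ('Roads', 'roads.png', [[(-105.1, 39.7), (-105.05, 39.73)]], None, True, True, 300)
```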
plotter_mp(*) then uses it to build the argument tuple for Process:
```python
from multiprocessing import Process


def plotter_mp(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print('Plotting with Multiprocessing:', title)
    p_args = plotter_args_parser(title, out_file_name, roads, fire_stations, counties, display, dpi_override)
    p = Process(target=plotter, args=p_args)
    p.start()  # returns immediately; the caller is not blocked while plotter() runs
```
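The plot is then saved and/or displayed whenever the child process finishes, while the calling code carries on. The process is never joined explicitly, but Python waits for non-daemonic child processes such as this one before the main program exits. As with any use of Process on Windows, the code that calls plotter_mp(*) should sit under an if __name__ == '__main__': guard.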
Savings
When executed on an AMD A8-7600 (3.10 GHz, 4 cores) with 16.0 GB RAM (15.0 GB usable) running 64-bit Windows 10, the program took approximately 10 to 11 minutes to complete without multiprocessing. When each of the 11 images was instead plotted in its own process, the run took around 8 to 9 minutes, a saving of roughly two minutes (about 20%). Multiprocessing was also used to read the large data files; however, that only shaved seconds off the serial implementation.
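As a rough check of the relative saving, taking the midpoints of the two measured ranges (10.5 and 8.5 minutes):

$$
\text{saving} = \frac{t_{\text{serial}} - t_{\text{parallel}}}{t_{\text{serial}}} \approx \frac{10.5\,\text{min} - 8.5\,\text{min}}{10.5\,\text{min}} \approx 0.19
$$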
Conclusion
To keep the flexibility I had when originally writing the plotter(*) function, I created two wrapper functions: plotter_args_parser(*), which turns the arguments into a tuple, and plotter_mp(*), which wraps the Process class and lets the new Python process do its plotting until it finishes.