Categories
Development Tools

Ensuring a Python path exists

When running programs in Python, I often like to put my output into a folder named after the current time. The following code snippet ensures that the directory for a file path exists (and logs the folder creation).

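import logging
import os


def create_path(file):
    """Creates the parent directory of a file path if it does not exist
    :param file: the path/file whose directory we want to exist
    """
    directory, file_name = os.path.dirname(file), os.path.basename(file)
    # dirname is "" for a bare file name (the current directory, which always exists)
    if directory and not os.path.exists(directory):
        logging.debug(f"Path {directory} does not exist for {file_name}, creating it now!")
        # exist_ok=True avoids an error if another process creates it in the meantime
        os.makedirs(directory, exist_ok=True)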
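
Since the point is to drop output into a folder named after the current time, here is a minimal sketch of that idea using pathlib. The helper name timestamped_output_path and the timestamp format are assumptions for illustration, not the code I actually use. Path.mkdir(parents=True, exist_ok=True) plays the role of the exists/makedirs pair above.

```python
import logging
from datetime import datetime
from pathlib import Path


def timestamped_output_path(base_dir, file_name, now=None):
    """Return base_dir/<YYYY-MM-DD_HH-MM-SS>/file_name, creating the folder.

    Hypothetical helper for illustration; `now` can be passed in for testing.
    """
    if now is None:
        now = datetime.now()
    folder = Path(base_dir) / now.strftime("%Y-%m-%d_%H-%M-%S")
    if not folder.exists():
        logging.debug(f"Path {folder} does not exist for {file_name}, creating it now!")
    # parents=True creates missing intermediate folders; exist_ok=True ignores existing ones
    folder.mkdir(parents=True, exist_ok=True)
    return folder / file_name
```

Computing the timestamp once at program start and reusing it keeps every output file of a run in the same folder.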

Using Python’s Pool.map() with many arguments

One thing that bugged me, and took a while to solve, was how to pass multiple arguments to the function used by Python’s multiprocessing Pool.map(*).

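import multiprocessing as mp


def original_function(arg1, arg2, arg3, arg4):
    # do something with the four arguments (placeholder computation)
    the_result = arg1 + arg2 + arg3 + arg4
    return the_result


def function_wrapper(args):
    # unpack the tuple into four positional arguments
    return original_function(*args)


def main(arg1=1, arg2=2, arg3=3, parameters=range(10)):
    # one tuple of arguments per task; only the last argument varies
    iterable = [(arg1, arg2, arg3, parameter) for parameter in parameters]
    with mp.Pool() as pool:
        results = pool.map(func=function_wrapper, iterable=iterable)
    return results


if __name__ == "__main__":  # required on Windows, where child processes re-import this module
    print(main())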

You simply pass a tuple into a wrapper function and then unpack the arguments inside the wrapper. Good enough for me.
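
For reference, the standard library also covers this case directly: Pool.starmap (Python 3.3+) unpacks each tuple for you, and functools.partial can fix the arguments that never change. A minimal sketch with a placeholder original_function (the two run_with_* names are hypothetical):

```python
import multiprocessing as mp
from functools import partial


def original_function(arg1, arg2, arg3, arg4):
    # placeholder computation standing in for the real work
    return arg1 + arg2 + arg3 + arg4


def run_with_starmap(arg1, arg2, arg3, parameters):
    # starmap unpacks each tuple into positional arguments, so no wrapper is needed
    iterable = [(arg1, arg2, arg3, p) for p in parameters]
    with mp.Pool() as pool:
        return pool.starmap(original_function, iterable)


def run_with_partial(arg1, arg2, arg3, parameters):
    # partial fixes the first three arguments; map then supplies only the fourth
    func = partial(original_function, arg1, arg2, arg3)
    with mp.Pool() as pool:
        return pool.map(func, parameters)


if __name__ == "__main__":
    print(run_with_starmap(1, 2, 3, range(5)))  # [6, 7, 8, 9, 10]
    print(run_with_partial(1, 2, 3, range(5)))  # [6, 7, 8, 9, 10]
```

Both functions must live at module level so the worker processes can unpickle them.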


Leveraging Python’s multiprocessing module to output plots from MatPlotLib

Introduction

On a project I am currently working on, I need (or want) to plot information using the MatPlotLib library. However, the plots generated by the program have no bearing on the rest of the program: they are produced on the side and saved to disk (and/or displayed). The catch is that when coded serially, some plots can take half a minute or more to generate.

Tools used

Context

Here, MatPlotLib was used to plot geospatial data for small portions of a sphere. A separate file called plotting_toolbox.py was created to store a main function and helper functions that plot similar data for the region in question. I use plotting_toolbox.py and its main function plotter(*) when I need to plot different attributes in different figures to highlight some aspect of the problem I am solving. At the time of this writing, the code is embargoed; however, the function is built as follows:

import matplotlib.pyplot as plt


def plotter(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print("Plotting", title, "(", out_file_name, ")")
    fig, ax = plt.subplots()
    _setup(ax, fig, title, dpi_override)
    if roads is not None:
        _plot_roads(ax, roads)
    if fire_stations is not None:
        _plot_fire_stations(ax, fire_stations)
    # etc. (counties and the other layers are plotted the same way)
    if out_file_name is not None:
        fig.savefig(out_file_name, bbox_inches='tight')
    if display:
        plt.show()
    plt.close(fig)  # free the figure's memory once it is saved/shown


def _setup(ax, fig, title, dpi_override):
    ax.set_aspect('equal', adjustable='box')
    ax.grid(zorder=0)
    ax.set_title(label=title)
    fig.dpi = dpi_override
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')


def _plot_roads(ax, roads):
    # each road is a sequence of (longitude, latitude) points
    for road in roads:
        x, y = zip(*road)
        ax.plot(x, y, 'b,-', linewidth=0.5, zorder=35)

etc…

For example, if I wanted a map to display just the fire stations, I may call

plotter('fire stations', out_file_name='fire_stations.png', fire_stations=fs)

If I wanted a map of the roads and counties, I may call

plotter('Roads', out_file_name='roads.png', roads=rds, counties=True)

and so forth.

Depending on the size of the road file and other data in the program, it can take a while to process and plot the graphs. Using multiprocessing, we can do the plotting in a separate Python process and let the computational part of the program continue to run in the original process.

The Modification

In Python, to use the Process class from the multiprocessing module, we import it:

from multiprocessing import Process

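and can use the Process class as shown in the Python documentation. The if __name__ == '__main__': guard matters on Windows, where each new process starts a fresh interpreter that imports the main module:

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()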

In this case, plotter(*) has multiple optional arguments. To keep the convenience of calling the function with only the optional parameters I need, I created two new functions, plotter_mp(*) and plotter_args_parser(*), which take the same arguments as plotter(*).

Because the tuple handed to Process through args is passed to the target positionally, the function plotter_args_parser(*) exists to fill in the default value of every optional argument that was not supplied. It simply returns a tuple of all the arguments, in the order plotter(*) expects them, so that any argument left at its default keeps its default value.

def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)
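
If the list of defaults ever drifts out of sync between plotter(*) and plotter_args_parser(*), one option is to read them from plotter(*)'s own signature with the standard library's inspect module. The generic args_parser(*) below is a hypothetical helper, not part of the original code, and plotter(*) here is only a stub with the same signature:

```python
import inspect


def plotter(title, out_file_name=None, roads=None, fire_stations=None,
            counties=False, display=True, dpi_override=300):
    # Hypothetical stub with the same signature as the real plotter(*)
    pass


def args_parser(func, *args, **kwargs):
    """Return a positional-argument tuple for func, with defaults filled in.

    The defaults are read from func's own signature, so they live in one place.
    """
    bound = inspect.signature(func).bind(*args, **kwargs)  # TypeError on bad arguments
    bound.apply_defaults()  # add every omitted parameter with its default value
    return bound.args


if __name__ == "__main__":
    print(args_parser(plotter, "Roads", out_file_name="roads.png", counties=True))
    # ('Roads', 'roads.png', None, None, True, True, 300)
```

A side benefit is that a misspelled keyword raises a TypeError in the calling process, before any new process is started.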

When used in conjunction with plotter_mp(*), we see:

def plotter_mp(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print('Plotting with Multiprocessing:', title)
    p_args = plotter_args_parser(title, out_file_name, roads, fire_stations, counties, display, dpi_override)
    # run plotter(*) in a separate Python process; start() returns immediately
    p = Process(target=plotter, args=p_args)
    p.start()
    # return the process so the caller can join() it if needed
    return p

and the plot will be saved and/or displayed whenever that process finishes, while the original process keeps running.
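
For what it's worth, Process also accepts a kwargs dictionary alongside args, so the optional arguments can be forwarded by name instead of being expanded into a tuple first. A minimal sketch of that variant, where plotter(*) is a hypothetical stand-in that only prints its arguments:

```python
from multiprocessing import Process


def plotter(title, out_file_name=None, roads=None, fire_stations=None,
            counties=False, display=True, dpi_override=300):
    # Hypothetical stand-in for the real plotter(*); it only reports its arguments
    print("Plotting", title, out_file_name, counties, display, dpi_override)


def plotter_mp(title, **kwargs):
    """Run plotter(*) in a child process, forwarding optional arguments as keywords."""
    p = Process(target=plotter, args=(title,), kwargs=kwargs)
    p.start()
    return p  # the caller may join() it before exiting


if __name__ == "__main__":
    p = plotter_mp("Roads", out_file_name="roads.png", counties=True, display=False)
    p.join()
```

The trade-off is that a misspelled keyword now raises a TypeError inside the child process (which then exits with a non-zero exitcode) rather than at the call site, which the explicit signature of the original plotter_mp(*) catches immediately.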

Savings

When executed on an AMD A8-7600 3.10 GHz 4-core computer with 16.0 GB RAM (15.0 GB usable) running 64-bit Windows 10, the program took approximately 10 to 11 minutes to complete without multiprocessing. When plotting in a separate process (11 images), it took around 8 to 9 minutes, or about 10-20% in savings. Multiprocessing was also leveraged to read the large data files; however, that only shaved seconds off the serial implementation.

Conclusion

To keep the flexibility I had when originally writing the plotter(*) function, I created two wrapper functions, plotter_args_parser(*) and plotter_mp(*). The former turns the arguments into a tuple, and the latter wraps the Process class and lets the new Python process do its plotting until it is finished.


Using Visual Studio 2017, CPLEX 12.8.0, Windows 10

To create the development environment that I anticipate for my research project, I wanted to make sure that I could get Visual Studio 2017 and CPLEX 12.8.0 working together. Unfortunately, this took me the better part of a day, so I am documenting it here for my future reference and, hopefully, to save someone else some heartache.


To begin, I used the material outlined in the post here. However, the post is over a year old, and the method outlined there did not work for me.


Step 1: Updating the path variable

As outlined here, I updated my PATH variable. I am not sure whether this was strictly necessary, since it was one of the first things I tried and I was too lazy to change it back.

As outlined in the steps from IBM, the PATH environment variable already existed, so I added the CPLEX directory to it by clicking “Edit…”. The directory containing the DLL is specifically:

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\bin\x64_win64
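
A quick way to confirm that the PATH change took effect is a few lines of Python run from a newly opened terminal (processes that were already running, including an open Visual Studio, keep their old environment). The dir_on_path(*) helper below is a hypothetical sketch; it compares entries after normalizing them with os.path.normcase, which only folds case and slashes on Windows.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    # os.pathsep is ";" on Windows and ":" elsewhere; empty entries are skipped
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_value.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    cplex_bin = r"C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\bin\x64_win64"
    print("CPLEX bin directory on PATH:", dir_on_path(cplex_bin))
```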

Step 2: Installing Visual Studio 2017, VC++ 2015.3 V140 toolset

If you have already installed Visual Studio 2017, you will need to re-run the Visual Studio Installer (for me, it was in the Start Menu). Click “Modify”. In the next menu, you will be in the “Workloads” tab. Next to “Workloads”, click “Individual Components”. Look for the header “Code Tools”, under which you will find “VC++ 2015.3 V140 toolset for desktop (x86, x64)”. Ensure that toolset is checked, then click “Modify” in the lower right corner. I believe it is a big download (approximately 8 GB).

If you are installing Visual Studio from scratch, I believe the process is similar when you get to the choices for “Workloads”. Choose your desired workloads, then go to the “Individual Components” tab. Look for the header “Code Tools” and ensure that “VC++ 2015.3 V140 toolset for desktop (x86, x64)” is checked.


Step 3: Linking CPLEX with your Visual Studio project

For this step, I outright copied most of the steps outlined here.

To begin, right-click the project and open the project properties.

Under C/C++, General, add the following to “Additional Include Directories”

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\include 
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\include

Under Linker, General, add the following to “Additional Library Directories”

Under the “Release Configuration”

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\lib\x64_windows_vs2017\stat_mda
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\lib\x64_windows_vs2017\stat_mda

Under the “Debug Configuration”

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\lib\x64_windows_vs2017\stat_mdd
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\lib\x64_windows_vs2017\stat_mdd

Under Linker, Input, add the following to “Additional Dependencies”

cplex1280.lib
concert.lib
ilocplex.lib
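
When the linker reports that it cannot open one of these .lib files, a small script can confirm that every directory and library named above actually exists. This is a hypothetical helper, not from the linked post; the install root is assumed to be the default one used throughout this post, and only the Release (stat_mda) directories are listed:

```python
from pathlib import Path

# Assumed install location; adjust if CPLEX Studio 12.8 was installed elsewhere
CPLEX_ROOT = Path(r"C:\Program Files\IBM\ILOG\CPLEX_Studio128")

INCLUDE_DIRS = ["cplex/include", "concert/include"]
# Release (stat_mda) directories; use the stat_mdd ones to check Debug instead
LIB_DIRS = [
    "cplex/lib/x64_windows_vs2017/stat_mda",
    "concert/lib/x64_windows_vs2017/stat_mda",
]
LIBS = ["cplex1280.lib", "concert.lib", "ilocplex.lib"]


def missing_items(root, include_dirs, lib_dirs, libs):
    """Return a list of human-readable problems; an empty list means all were found."""
    root = Path(root)
    problems = []
    for rel in list(include_dirs) + list(lib_dirs):
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {root / rel}")
    # Like the linker, accept each .lib if it is in any of the library directories
    for lib in libs:
        if not any((root / rel / lib).is_file() for rel in lib_dirs):
            problems.append(f"{lib} not found in any library directory")
    return problems


if __name__ == "__main__":
    for problem in missing_items(CPLEX_ROOT, INCLUDE_DIRS, LIB_DIRS, LIBS) or ["all paths found"]:
        print(problem)
```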

And now Visual Studio should be able to call the CPLEX environment.