Ensuring a Python path exists

When running programs in Python, I often like to put my output into a folder based on the current time. The following is a code snippet to ensure a file path exists (and logs the folder creation).

import logging
import os


def create_path(file):
    """Creates a path if it does not exist
    :param file: the path/file that we want to check if it exists
    """
    directory, file = os.path.dirname(file), os.path.basename(file)
    if not os.path.exists(directory):
        logging.debug(f"Path {directory} does not exist for {file}, creating it now!")
        os.makedirs(directory)

Using python’s Pool.map() with many arguments

One thing that bugged me that took a while to find a solution was how to use multiple arguments in Python’s multiprocessing Pool.map(*) function.

def original_function(arg1, arg2, arg3, arg4)
    # do something with the four arguments
    return the_result

def function_wrapper(args):
    return original_function(*args)

def main()
    iterable = list()
    pool = mp.Pool()
    for parameter in parameters:
        iterable.append((arg1, arg2, arg3, parameter))
    results = list(pool.map(func=function_wrapper, iterable=iterable))

You are simply passing the tuple into a wrapper function and then unzipping the arguments inside the wrapper. Good enough for me.

Leveraging Python’s multiprocessing module to output plots from MatPlotLib

From time to time, plots generated from MatPlotLib take a while to process and display when placed in serially in a python program. However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed). To maintain the flexibility that I had when originally making the plotter() function, I created two wrapper functions to send MatPlotLib to a new task.  

Introduction

On a current project I am working on, I either have to or desire to plot information using the MatPlotLib library.  However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed).  However, when coded serially – the some plots can take a half minute or more to plot.

Tools used

Context

Here, MatPlotLib was used to plot geospatial data for small portions of a sphere.  A separate file called plotting_toolbox.py was created to store a function and sub functions to plot similar data for the region in question.  the  plotting_toolbox.py and it’s main function plotter(*) is used when I need to plot different attributes in different figures to highlight some aspect of the problem I am solving.  At the time of this writing – the code is embargoed.  However, the function is built as follows:

def plotter(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print("Plotting", title, "(", out_file_name, ")")
    fig, ax = plt.subplots()
    _setup(ax, fig, title, dpi_override)
    if roads is not None:
        _plot_roads(ax, roads)
    if fire_stations is not None:
        _plot_fire_stations(ax, fire_stations)
    #etc...
    if out_file_name is not None:
        fig.savefig(out_file_name, bbox_inches='tight')
    if display:
        plt.show()
def _setup(ax, fig, title, dpi_override):
    fig.gca().set_aspect('equal', adjustable='box')
    ax.grid(zorder=0)
    ax.set_title(label=title)
    fig.dpi = dpi_override
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
def _plot_roads(ax, roads): 
    for road in roads: 
        x, y = zip(*road) 
        ax.plot(x, y, 'b,-', linewidth=0.5, zorder=35)

etc…

For example, if I wanted a map to display just the fire stations, I may call

plotter('fire stations', out_file_name='fire_stations.png', fire_stations=fs)

If I wanted a map of the roads and counties, I may call

plotter('Roads', out_file_name='roads.png', roads=rds, counties=True)

and so forth.

Depending on the size of the road file and other data in the program, it can take a little bit of time to process and plot the graphs.  Using multiprocessing, we can create a new process where the plotting is done in a separate python process and we can let the computational part of the program continue to run in the original process.

The Modification

In Python, to import the multiprocessing module, we call:

from multiprocessing import Process

and can use the Process class as indicated in the Python  documentation, such as:

p = Process(target=f, args=('bob',))
p.start()
p.join()

In this case, we have multiple optional arguments in the plotter(*)function.  To retain the simplicity of being able to call the function with the optional parameters, I created a new function called plotter_mp(*) and   plotter_args_parser(*) which took the identical arguments as plotter(*).

Since Process Cannot handle optional arguments, the function plotter_args_parser(*) exists to convert the optional arguments to their default values.  It simply returns a tuple of all the arguments ensuring that the default arguments retain their values if they are default.

def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)

When used in conjunction with plotter_mp(*), we see:

def plotter_mp(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
    print('Plotting with Multiprocessing:', title)
    p_args = plotter_args_parser(title, out_file_name, roads, fire_stations, counties, display, dpi_override)
    p = Process(target=plotter, args=p_args)
    p.start()

and the plot will be saved and/or output whenever it is finished.

Savings

When executed on an AMD A8-7600 3.10ghz 4 core computer with 16.0GB RAM (15.0GB useable) on 64 bit Windows 10, without multiprocessing the program took approximately 10 to 11 minutes to complete.  However, when plotting on a separate process (11 images), the process took around 8 to 9 minutes to complete – or about 10-20% in savings.  Multiprocessing was also leveraged to read the large data files, however – that only shaved seconds of a serial implementation.

Conclusion

To maintain the flexibility that I had when originally making the plotter(*) function, I created two wrapper functions, one called plotter_args_parser(*) and another called plotter_mp(*), where the former turns the arguments into a tuple and the latter wraps the Process class and lets the new python process do it’s plotting thing until it’s finished.

Estimating π with random numbers

Introduction

There is a simple experiment you can perform to estimate the value of π, i.e. 3.14159…, using a random number generator.  Explicitly, π is defined as the ratio of a circle’s circumference to its diameter, \(\pi = C/d\) where \(C\) and \(d\) are the circumference and diameter, respectively.  Also recall the area of a circle is \(\pi r^2 = A\)

For this experiment, you’ll need a programming language with a random number generator that will generate uniform random numbers, or that is randomly pick a number over a uniform distribution.

What does a uniform random number mean?  Let us consider a discrete (numbers that are individually separate and distinct) example:

Consider the game where you have a dozen unique marbles in a bag.  You pick one at random, record the color, and then return it.  Over time (i.e. thousands of draws), you will notice that approximately each marble will be drawn approximately the same number of times.  That is because, theoretically, there is no bias toward any specific marble in the bag.  It’s the same concept, in an ideal world a random number will be be picked without bias to any other number.

Unfortunately, computers are not truly random.  Most, if not all, random numbers are pseudo-random number generators.   They rely on mathematical formulations to generate random numbers (which, can’t be random!).  However, they have a depth and complexity to sufficiently provide a program with a random number.

Python’s Random Function

To attempt to compute the value of π using this random sampling, consider python’s random number generator.  Using the function

random.random()

you can draw a random floating point number in the range [0.0, 1.0).

So In your preferred Python IDE of choice, import the random module.

import random

after typing

random.random()

a few dozen times, you should begin to see that you’ll get a random number between 0 and 1.

Consider the circle

Now, consider a geometric representation of a circle.  In a Cartesian plane a circle centered at the origin can be represented as the following equation:

\((x^2 + y^2)^{1/2} = a\)

and the area of a circle as:

\(\pi r^2 = A_C\)

In addition, consider a square that is circumscribed around the same circle:

\((2 \times r)^2 =A_S\)

Visually:

A circle (orange) inscribed in a square (blue)

Looking at the figure above, assume we pick a uniformly random point that is bounded by the blue square \(S\), what is the chance that the point chosen would be inside the circle \(C\) as well? It would be the ratio of the area of the  circle \(A_c\) to the area of the square \(A_s\), that is the probability of selecting a point in \(S\) and \(C\) is \(P\{S \& C\}\)

\(P\{S \& C\} = A_C/A_S\)

\(P\{S \& C\} = {\pi r^2}/{(2r)^2}\)

Of course, assuming a \(r>0\)

\(P\{S \& C\} = {\pi}/{4}\)

So the chance of selecting a point in the square \(S\) that is also inside the circle \(C\) is \({\pi}/{4}\)

Coding the experiment in Python

To test this experiment Pythonically, let us consider a square and circle centered at the origin and, for simplicity, we draw only positive numbers from the random number generator:

def GetPoint():
     # Returns True if the point is inside the circle.
     x, y = random.random(), random.random()  # uniform [0.0, 1.0)
     return math.sqrt(x**2+y**2) <= 1

GetPoint() tests if randomly generated point from inside the square is also inside the circle.

Since we are only considering one-quarter the area of the circle and one-quarter the area of the square, the reduction cancels out and we are still left with the relationship of \(\pi/4\).  To call the experiment and estimate π, we must interpret what happens when a random point is inside the circle.  In Python, we can write:

def circle_estimator(n):
    i = 0
    for k in range(n):
        if GetPoint():
            i += 1
    return 4 * i / n  # returns the estimated value of pi

For this experiment to work, we must execute it many times.  If we only call GetPoint() once, there is a chance it will say π is 4 or 0!  Clearly not a close value to π.

Running the experiment for multiple iterations, the following is the results of the estimated value of π:

Iterations Estimation Pct
10 3.4 8.23%
100 3.12 -0.69%
10000 3.1552 0.43%
1000000 3.143532 0.06%

Close!  However, this is clearly not as efficient as just calling

math.pi