Blog

Ensuring a Python path exists

When running programs in Python, I often like to put my output into a folder based on the current time. The following is a code snippet to ensure a file path exists (and logs the folder creation).

import logging
import os

def create_path(file):
"""Creates a path if it does not exist
:param file: the path/file that we want to check if it exists
"""
directory, file = os.path.dirname(file), os.path.basename(file)
if not os.path.exists(directory):
logging.debug(f"Path {directory} does not exist for {file}, creating it now!")
os.makedirs(directory)


Using python’s Pool.map() with many arguments

One thing that bugged me that took a while to find a solution was how to use multiple arguments in Python’s multiprocessing Pool.map(*) function.

def original_function(arg1, arg2, arg3, arg4)
# do something with the four arguments
return the_result

def function_wrapper(args):
return original_function(*args)

def main()
iterable = list()
pool = mp.Pool()
for parameter in parameters:
iterable.append((arg1, arg2, arg3, parameter))
results = list(pool.map(func=function_wrapper, iterable=iterable))

You are simply passing the tuple into a wrapper function and then unzipping the arguments inside the wrapper. Good enough for me.

Leveraging Python’s multiprocessing module to output plots from MatPlotLib

From time to time, plots generated from MatPlotLib take a while to process and display when placed in serially in a python program. However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed). To maintain the flexibility that I had when originally making the plotter() function, I created two wrapper functions to send MatPlotLib to a new task.

Introduction

On a current project I am working on, I either have to or desire to plot information using the MatPlotLib library.  However – the plots generated by the program have no bearing on the remainder of the program.  In essence – the plots are generated externally to the program and saved to the disk (and/or displayed).  However, when coded serially – the some plots can take a half minute or more to plot.

Context

Here, MatPlotLib was used to plot geospatial data for small portions of a sphere.  A separate file called plotting_toolbox.py was created to store a function and sub functions to plot similar data for the region in question.  the  plotting_toolbox.py and it’s main function plotter(*) is used when I need to plot different attributes in different figures to highlight some aspect of the problem I am solving.  At the time of this writing – the code is embargoed.  However, the function is built as follows:

def plotter(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
print("Plotting", title, "(", out_file_name, ")")
fig, ax = plt.subplots()
_setup(ax, fig, title, dpi_override)
if fire_stations is not None:
_plot_fire_stations(ax, fire_stations)
#etc...
if out_file_name is not None:
fig.savefig(out_file_name, bbox_inches='tight')
if display:
plt.show()
def _setup(ax, fig, title, dpi_override):
ax.grid(zorder=0)
ax.set_title(label=title)
fig.dpi = dpi_override
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
def _plot_roads(ax, roads):
ax.plot(x, y, 'b,-', linewidth=0.5, zorder=35)

etc…

For example, if I wanted a map to display just the fire stations, I may call

plotter('fire stations', out_file_name='fire_stations.png', fire_stations=fs)

If I wanted a map of the roads and counties, I may call

plotter('Roads', out_file_name='roads.png', roads=rds, counties=True)

and so forth.

Depending on the size of the road file and other data in the program, it can take a little bit of time to process and plot the graphs.  Using multiprocessing, we can create a new process where the plotting is done in a separate python process and we can let the computational part of the program continue to run in the original process.

The Modification

In Python, to import the multiprocessing module, we call:

from multiprocessing import Process

and can use the Process class as indicated in the Python  documentation, such as:

p = Process(target=f, args=('bob',))
p.start()
p.join()

In this case, we have multiple optional arguments in the plotter(*)function.  To retain the simplicity of being able to call the function with the optional parameters, I created a new function called plotter_mp(*) and   plotter_args_parser(*) which took the identical arguments as plotter(*).

Since Process Cannot handle optional arguments, the function plotter_args_parser(*) exists to convert the optional arguments to their default values.  It simply returns a tuple of all the arguments ensuring that the default arguments retain their values if they are default.

def plotter_args_parser(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
return (title, out_file_name, roads, fire_stations, counties, display, dpi_override)

When used in conjunction with plotter_mp(*), we see:

def plotter_mp(title, out_file_name=None, roads=None, fire_stations=None, counties=False, display=True, dpi_override=300):
print('Plotting with Multiprocessing:', title)
p_args = plotter_args_parser(title, out_file_name, roads, fire_stations, counties, display, dpi_override)
p = Process(target=plotter, args=p_args)
p.start()

and the plot will be saved and/or output whenever it is finished.

Savings

When executed on an AMD A8-7600 3.10ghz 4 core computer with 16.0GB RAM (15.0GB useable) on 64 bit Windows 10, without multiprocessing the program took approximately 10 to 11 minutes to complete.  However, when plotting on a separate process (11 images), the process took around 8 to 9 minutes to complete – or about 10-20% in savings.  Multiprocessing was also leveraged to read the large data files, however – that only shaved seconds of a serial implementation.

Conclusion

To maintain the flexibility that I had when originally making the plotter(*) function, I created two wrapper functions, one called plotter_args_parser(*) and another called plotter_mp(*), where the former turns the arguments into a tuple and the latter wraps the Process class and lets the new python process do it’s plotting thing until it’s finished.

Introduction

There is a simple experiment you can perform to estimate the value of π, i.e. 3.14159…, using a random number generator.  Explicitly, π is defined as the ratio of a circle’s circumference to its diameter, $$\pi = C/d$$ where $$C$$ and $$d$$ are the circumference and diameter, respectively.  Also recall the area of a circle is $$\pi r^2 = A$$

For this experiment, you’ll need a programming language with a random number generator that will generate uniform random numbers, or that is randomly pick a number over a uniform distribution.

What does a uniform random number mean?  Let us consider a discrete (numbers that are individually separate and distinct) example:

Consider the game where you have a dozen unique marbles in a bag.  You pick one at random, record the color, and then return it.  Over time (i.e. thousands of draws), you will notice that approximately each marble will be drawn approximately the same number of times.  That is because, theoretically, there is no bias toward any specific marble in the bag.  It’s the same concept, in an ideal world a random number will be be picked without bias to any other number.

Unfortunately, computers are not truly random.  Most, if not all, random numbers are pseudo-random number generators.   They rely on mathematical formulations to generate random numbers (which, can’t be random!).  However, they have a depth and complexity to sufficiently provide a program with a random number.

Python’s Random Function

To attempt to compute the value of π using this random sampling, consider python’s random number generator.  Using the function

random.random()

you can draw a random floating point number in the range [0.0, 1.0).

So In your preferred Python IDE of choice, import the random module.

import random

after typing

random.random()

a few dozen times, you should begin to see that you’ll get a random number between 0 and 1.

Consider the circle

Now, consider a geometric representation of a circle.  In a Cartesian plane a circle centered at the origin can be represented as the following equation:

$$(x^2 + y^2)^{1/2} = a$$

and the area of a circle as:

$$\pi r^2 = A_C$$

In addition, consider a square that is circumscribed around the same circle:

$$(2 \times r)^2 =A_S$$

Visually:

Looking at the figure above, assume we pick a uniformly random point that is bounded by the blue square $$S$$, what is the chance that the point chosen would be inside the circle $$C$$ as well? It would be the ratio of the area of the  circle $$A_c$$ to the area of the square $$A_s$$, that is the probability of selecting a point in $$S$$ and $$C$$ is $$P\{S \& C\}$$

$$P\{S \& C\} = A_C/A_S$$

$$P\{S \& C\} = {\pi r^2}/{(2r)^2}$$

Of course, assuming a $$r>0$$

$$P\{S \& C\} = {\pi}/{4}$$

So the chance of selecting a point in the square $$S$$ that is also inside the circle $$C$$ is $${\pi}/{4}$$

Coding the experiment in Python

To test this experiment Pythonically, let us consider a square and circle centered at the origin and, for simplicity, we draw only positive numbers from the random number generator:

def GetPoint():
# Returns True if the point is inside the circle.
x, y = random.random(), random.random()  # uniform [0.0, 1.0)
return math.sqrt(x**2+y**2) <= 1

GetPoint() tests if randomly generated point from inside the square is also inside the circle.

Since we are only considering one-quarter the area of the circle and one-quarter the area of the square, the reduction cancels out and we are still left with the relationship of $$\pi/4$$.  To call the experiment and estimate π, we must interpret what happens when a random point is inside the circle.  In Python, we can write:

def circle_estimator(n):
i = 0
for k in range(n):
if GetPoint():
i += 1
return 4 * i / n  # returns the estimated value of pi

For this experiment to work, we must execute it many times.  If we only call GetPoint() once, there is a chance it will say π is 4 or 0!  Clearly not a close value to π.

Running the experiment for multiple iterations, the following is the results of the estimated value of π:

 Iterations Estimation Pct 10 3.4 8.23% 100 3.12 -0.69% 10000 3.1552 0.43% 1000000 3.143532 0.06%

Close!  However, this is clearly not as efficient as just calling

math.pi

Using Visual Studio 2017, CPLEX 12.8.0, Windows 10

To create the development environment that I anticipate for my research project, I wanted to ensure that I could get Visual Studio 2017 and CPLEX 12.8.0.  This project unfortunately took me the better part of a day, so I am documenting it here for my future reference and hopefully to save someone else some heartache.

To begin, I did use the material outlined in the post here.  However, the post is over a year old and the method outlined there did not yield a positive result.

Step 1: Updating the path variable

As outlined here, I updated my PATH variable.  I am not sure if it was absolutely necessary, as it was one of the first things I tried and was lazy to change it back.

As outlined in the steps from IBM, the PATH environment variable was already there so I added it by clicking “Edit…”.  The path of the dll is specifically:

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\bin\x64_win64

Step 2: Installing Visual Studio 2017, VC++ 2015.3 V140 toolset

If you have already installed Visual Studio 2017, you will need to re-run the Visual Studio Installer, for me it was in the Start Menu.  You will need to click “Modify”.  In the next menu, you will be in the “Workloads” tab.  Next to “Workloads”, click “Individual Components”.  Look for the header “Code Tools” under which will have the “VC++ 2015.3 V140 toolset for desktop (x86, x64)”.  Ensure that toolset is checked.  Click “Modify” in the lower right corner.  I believe it is a big file (approximately 8Gb).

If you are installing visual studio from scratch, I believe this is a similar process when you get to the choices for “Workloads”.  Choose your desired workloads and then go to the “Individual Components” tab.  Look for the header “Code Tools” and ensure the “VC++ 2015.3 V140 toolset for desktop (x86, x64)” is checked.

For this step, I outright copied most of the steps outlined here.

to begin, right click on the the project file and entering the project properties.

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\include
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\include

Under the “Release Configuration”

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\lib\x64_windows_vs2017\stat_mda
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\lib\x64_windows_vs2017\stat_mda

Under the “Debug Configuration”

C:\Program Files\IBM\ILOG\CPLEX_Studio128\cplex\lib\x64_windows_vs2017\stat_mdd
C:\Program Files\IBM\ILOG\CPLEX_Studio128\concert\lib\x64_windows_vs2017\stat_mdd

cplex1280.lib
concert.lib
ilocplex.lib

And now Visual Studio should be able to call the CPLEX environment.

UX Experience: BIC 4-Color Grip Ball Pens Retractable, Medium Point

One of the most pleasant writing experiences I have had the opportunity to use in the past two years has been the BIC 4-color grip ball pen. This writing utensil is not to be confused with the BIC 4-Color Ball Pens Retractable, Medium Point. The main, and only difference between the two is the barrel of the BIC 4-color grip ball pen is the inclusion of a 12.5mm in diameter pen barrel and a rubberized grip (versus a non-rubber, basic plastic grip and a 10.5mm barrel on the non-grip pen).

The primary users of the BIC 4-colors (grip or no grip) are students and nurses. The four colors are commonly used to organize notes or patient charts. As a member of one of those userbases, I use the BIC 4-color with grip for organizing notes in my daily notebook. The multiple colors help highlight to-do lists or take dictation notes form multiple speakers.

The activities which were in progress while using the 4-color grip pen were situations related to note-taking, which typically occurs in work related or study related settings. I have used the pen to write the occasional rent check in addition to loaning the pen temporarily to a child so he could get his baseball signed at a minor league baseball game. No word if the athlete enjoyed the writing experience.

However, for the context of “note-taking,” there typically is the pressure of completing a sentence or a thought to keep up with a lecturer or speaker. In this use-case, the urgency is not critical, i.e. a misunderstood or misheard thought can be rectified quickly or confirmed with the speaker; however, repeated requests clarifications can be met with ire. So, the ability to write for long periods of time without pause can be of benefit for the user. The ability to also manage a typical 3-hour examination is an added benefit, although will probably be of less use or need in the coming year for me. I attribute this increased writing performance to the increased diameter of the 4-color grip pen and the slight cushion provided by the grip.

Another added benefit for the 4-color grip pen is that the retractable pen portion of the is backwards compatible with the older, non-grip version. Therefore, when I broke the stock retractable portion of the original grip pen, I could exchange the barrel of the non-grip pen with the grip barrel to continue using this new writing experience, which was critical as I have not found the grip version of the pen since I first purchased it approximately two years ago.

Returning to the typical use case, for day-to-day note taking I use the Blueline A9 wire bound notebook. Before discovering the grip pen, I would store my pen in what I am calling the “inboard storage,” that is, the barrel of the pen is placed within the metal wire binding. This contrasts with “outboard storage” where only the pen’s clip is placed in the wire binding and the barrel is outside the binding and the notebook. The Blueline A9 had a 17.0mm binding, however, approximately 3.5mm of the binding is occupied by paper. Attempting to store the 4-color grip pen in the inboard position could wear out the grip faster and/or damage the paper in the notebook. Therefore, I was forced to store the pen in the outboard position. This storage option is what led the original BIC 4-color pen to break. Both versions of the BIC 4-color pens have a very narrow pen clip (2.5mm) versus other pens I frequently use such as the single-color BIC Atlantis (7.0mm at the widest) and the single-color Pilot G-2 (7.0mm). Because of this negative use case, I have decided to store a single-color pen (both of which have grips) with the notebook and leave the BIC 4-color grip pen in my bag for longer note-taking sessions. Both alternative pens can be stored more effectively in the outboard position and the Atlantis can be stored in the inboard position. The G-2 can be stored in the inboard position as well, but only while the notebook is closed. In fact, both alternative pens can have multiple pens stored in the outboard position (up to four with the Atlantis) to satisfy my color needs if the primary, 4-color grip pen is forgotten. However, at this time, I believe two colors (pens) will be sufficient.

Using the 10 dimensions of UX as outlined by Provost and Robert (2013)¹, the experience of the BIC 4-Color grip pen is as follows: