1. Introduction

This example illustrates how to use Techila Distributed Computing Engine to speed up Value-at-Risk computations implemented with Python and follows the same approach discussed in the paper below.

Please note that the Python version of this example was created after the original publication of the paper, meaning that the paper does not include performance statistics for the Python version of the code.

If you are unfamiliar with the TDCE terminology or are interested in more general information about TDCE, please see Introduction to Techila Distributed Computing Engine. More details about the TDCE Python application program interface (API) can be found in Techila Distributed Computing Engine with Python.

2. Code Overview

One of the most common risk measures in the finance industry is Value-at-Risk (VaR). Value-at-Risk measures the potential loss that could occur in a portfolio of investments over a given time period at a given confidence level. VaR can be calculated in many different ways, each with its own pros and cons. Monte Carlo simulation is a popular method and is the one used in this example.

In the simplified VaR model used in this example, the value of a portfolio of financial instruments is simulated under a set of economic scenarios. The financial instruments in this example are limited to fixed coupon bonds and equity options. The scenarios can be analyzed independently, meaning that the computations can be sped up significantly by analyzing the scenarios simultaneously on a large number of Techila Worker nodes.
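Conceptually, once a portfolio value has been simulated under each scenario, VaR can be read off as a quantile of the simulated loss distribution. The sketch below illustrates this idea only and is not part of the example code; the function name, the distribution, and all numeric values are hypothetical.

```python
import numpy as np

def value_at_risk(portfolio_values, initial_value, confidence=0.99):
    """Estimate VaR from Monte Carlo simulated portfolio values.

    Loss is measured relative to the initial portfolio value; VaR at a
    99 % confidence level is the 99th percentile of the loss distribution.
    """
    losses = initial_value - np.asarray(portfolio_values)
    return np.percentile(losses, 100 * confidence)

# Hypothetical example: 100,000 simulated end-of-horizon portfolio values.
rng = np.random.default_rng(0)
simulated = 1000.0 + rng.normal(0.0, 25.0, size=100000)
var_99 = value_at_risk(simulated, initial_value=1000.0, confidence=0.99)
```

With normally distributed losses (standard deviation 25), the 99 % VaR lands near 2.33 × 25 ≈ 58.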

This example is available for download at the following link:

The contents of the zip file are shown below for reference:

dir1 python

The run_VaR.py file is the main file used in this example and contains the definition of the run_demo function, which is used to run the example. Depending on the value of its input argument, run_demo runs the VaR computations either locally (run_demo(local=True)) or in TDCE (run_demo(local=False)).

python sample

2.1. Data Locations & Management

This example uses a set of financial instruments, which are generated when the parameters.py file is executed locally. This file is included in the zip file, as can be seen in the previous screenshot.

When the parameters.py file is executed, variables will be defined in the local Python workspace. The data stored in these variables will be used when analyzing the scenarios.

When analyzing the scenarios locally, the values of variables needed in the computations will be passed to the computationally intensive function as input arguments, as in any other standard Python application.

In the distributed version of this application, variables will be transferred to Techila Workers participating in the computations by using the peach function’s params parameter.

The screenshot below illustrates which variables are created by the parameters.py file. The parameters nrOfOptions and nrOfBonds determine the number of financial instruments in the portfolio and affect how long it takes to process one scenario (more instruments means more processing time per scenario). The parameter nrOfScenarios determines the total number of scenarios to be processed. In the local version, nrOfScenarios determines the number of iterations in the for loop; in the distributed version, it is used to calculate how many Jobs are needed to process the computations.

vars python
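For reference, the portfolio parameters could look like the following. The values shown here are hypothetical placeholders; the actual values are defined in the parameters.py file included in the zip file.

```python
# Hypothetical contents of parameters.py (illustrative values only).
nrOfOptions = 50        # number of equity options in the portfolio
nrOfBonds = 50          # number of fixed coupon bonds in the portfolio
nrOfScenarios = 100000  # total number of economic scenarios to process
```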

2.2. Sequential Local Processing

The computations can be executed locally by using the following syntax:

Syntax for Python 2:

result = run_demo(local=True)

Syntax for Python 3:

from run_VaR import *
result = run_demo(local=True)

When executed locally, the computations will be processed using the for loop shown below.

# Process computations locally.
for x in range(0, nrOfScenarios):
    scenario_result = do_scenario(x, scenarios, CFs, PCA_t, PCA1, PCA2, PCA3,
                                  S0, K, iv_t, ir_t, ir_r, ir_displacement,
                                  option_Maturity, volshock_surface, iv_M,
                                  isCall, couponTimes, pos)
    portfolioValue[x] = scenario_result[1]

Each iteration processes one scenario and stores the computational result in portfolioValue. The iterations are also independent, meaning there are no dependencies between them. The results are visualized by calling do_postprocess, which updates the histogram figure with new scenario data at 3-second intervals.
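The 3-second update interval can be implemented with a simple throttling check. The sketch below is a hypothetical illustration of the kind of logic do_postprocess could use; the actual implementation is in the example's run_VaR.py.

```python
import time

_last_update = float("-inf")  # ensures the very first call redraws

def maybe_update_histogram(new_results, min_interval=3.0):
    """Allow a histogram redraw at most once every `min_interval` seconds.

    Returns True when the caller should redraw, False otherwise.
    (Hypothetical sketch; not the example's actual do_postprocess.)
    """
    global _last_update
    now = time.monotonic()
    if now - _last_update >= min_interval:
        _last_update = now
        return True   # enough time has passed; redraw the histogram
    return False      # too soon; skip this redraw
```

A call pattern like `if maybe_update_histogram(results): redraw()` keeps the plotting overhead bounded no matter how quickly results arrive.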

2.3. Distributed Processing

The computations can be executed in Techila Distributed Computing Engine by using the following Python syntax:

Syntax for Python 2:

result = run_demo(local=False)

Syntax for Python 3:

from run_VaR import *
result = run_demo(local=False)

This will result in the following code being executed, which will create the computational Project.

# Process computations in TDCE.
portfolioValue = techila.peach(funcname="do_scenario",
                               params=['<vecidx>', scenarios, CFs, PCA_t,
                                       PCA1, PCA2, PCA3, S0, K, iv_t, ir_t,
                                       ir_r, ir_displacement, option_Maturity,
                                       volshock_surface, iv_M, isCall,
                                       couponTimes, pos],
                               files=['bs_function.py', 'do_scenario.py'],
                               steps=500,
                               jobs=nrOfScenarios,
                               stream=True,
                               callback=do_postprocess)

The peach syntax parameters are explained below:

The funcname parameter defines that each Job will execute the do_scenario function. With the default parameters used in this example, one execution of do_scenario is quite quick (approximately 0.1 seconds) and would result in poor Job efficiency due to overheads related to data transfers and initializations. The efficiency can be greatly improved by using the steps parameter to increase the number of times the function is executed in each Job, thus increasing the amount of meaningful computational work per Job. In this example, steps has been set to 500, meaning each Job will execute the do_scenario function 500 times. Each Job will therefore process 500 scenarios and return a list containing the scenario results.
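Conceptually, with steps set to 500, each Job runs a batching loop similar to the sketch below on the Techila Worker. This is an illustration of the batching idea only; the actual mechanics are handled by the TDCE system, and the stand-in function is hypothetical.

```python
def run_job(scenario_indices, do_scenario):
    # Each Job processes a batch of `steps` scenarios (500 in this example)
    # and returns their results as a list, instead of making a single
    # ~0.1 second function call per Job.
    return [do_scenario(idx) for idx in scenario_indices]

# Hypothetical stand-in for the real do_scenario function.
results = run_job(range(0, 500), lambda idx: idx * idx)
```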

The files parameter is used to list which files will be transferred to the Techila Workers. In this example, we need access to the files bs_function.py and do_scenario.py, which contain the definitions of the functions used in the computations. The TDCE system automatically makes the functions defined in these files available in the base workspace used in the Job.

The input arguments needed by the do_scenario function are defined using the params parameter, where '<vecidx>' will be used to index the list containing the scenarios. The arguments are identical to those in the locally executable version, with the exception that the value of the for loop counter has been replaced with '<vecidx>'. This keyword is recognized by the TDCE system and is automatically replaced by the Jobidx value, which is unique in each Job. For example, in Job #1 the value will be 0, in Job #2 it will be 1, and so forth.

The total number of Jobs in the Project is defined by dividing the value of the jobs parameter (total number of scenarios) by the value of the steps parameter (scenarios processed in each Job). For example, if nrOfScenarios=100000, then the Project would contain 200 Jobs (100000 / 500 = 200). In situations where the steps parameter is not used, the number of Jobs is set to match the value of the jobs parameter.
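The Job count calculation is a one-line integer division:

```python
nrOfScenarios = 100000           # total scenarios (value of the jobs parameter)
steps = 500                      # scenarios processed per Job
jobs = nrOfScenarios // steps    # number of Jobs in the Project
print(jobs)                      # 200
```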

The stream parameter has been set to True, which means results will be streamed from the Techila Server to the End-User’s computer as soon as they are available. All Job results will be postprocessed by the do_postprocess function, which will visualize results using a histogram graph.

3. Performance Comparison

When the run_demo function is executed, the callback function (do_postprocess) will print information about how quickly scenarios are being processed on average. This information can be used to compare the performance of the local version and the distributed version of code.

The average processing speed is calculated using the following formula:

\(\text{Average Processing Speed} = \dfrac{\text{Scenarios Processed}}{\text{Elapsed Time}} \)

The value of Elapsed Time seen in the formula is measured from the start of the run_demo function. The callback function do_postprocess calculates the elapsed time since the program was started and the number of scenarios processed so far. This information is then used to display the average processing speed each time the histogram graph is updated. Please note that because the timer starts as soon as the run_demo function is started, there will be a short period at the start of the computations during which the displayed processing speed is lower. As more and more scenarios are processed, the displayed average becomes more representative of the actual processing speed.
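The speed calculation itself is straightforward. A minimal sketch, assuming the callback tracks time with the standard time module (the function and variable names here are hypothetical, not the example's actual code):

```python
import time

start_time = time.monotonic()  # recorded when run_demo starts
scenarios_done = 0

def report_speed(new_results):
    """Called for each batch of streamed results.

    Returns the running average processing speed in scenarios per second,
    i.e. scenarios processed so far divided by elapsed time.
    """
    global scenarios_done
    scenarios_done += len(new_results)
    elapsed = time.monotonic() - start_time
    return scenarios_done / elapsed
```

Because elapsed time includes Project setup and data transfers, the value returned early on understates the steady-state speed, exactly as noted above.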

The processing speed of the local version will mostly depend on the CPU characteristics of the computer you are using to run the code. The screenshot below illustrates the processing speed when running the computations on an Intel i5-5200U CPU @ 2.20 GHz.

python var local perf
Figure 1. The average local processing speed settled at approx 9 scenarios / second.

The processing speed of the distributed version will mostly depend on the number of Techila Worker CPU cores available for computations. The screenshot below illustrates the processing speed when using 400 Techila Worker CPU cores.

python var techila perf
Figure 2. The average processing speed in TDCE settled at approx 3300 scenarios / second when using 400 Techila Worker CPU cores.