1. Scheduling

In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously). (Source: Wikipedia)

If you are unfamiliar with Techila Distributed Computing Engine (TDCE) terminology or the operating principles of the TDCE technology, information on these can be found in Introduction to Techila Distributed Computing Engine.

Computational workload (Jobs) is transferred to computing nodes (Techila Workers) by the scheduler functionality of the Techila Server. The scheduling cycle consists of matching computational resources (Techila Worker CPU cores) with computational Projects using a randomized match making procedure.

image005
Figure 1. The scheduling cycle.

Scheduling cycles can be triggered by several different events in a TDCE environment. Events that trigger scheduling cycles include:

  • New computational Project is created by an End-User

  • Techila Worker finishes a computational Job

  • New trusted Techila Workers are added to the Techila environment

  • State of the Techila Worker changes to allow execution of Jobs (state changes to running)

  • New Jobs are created

  • Jobs are marked as expired

These event-based triggers can cause several hundred scheduling cycles each second, meaning computational resources will be matched with computational Jobs at a very rapid pace.

In addition to these event-based triggers, scheduling cycles are also executed at fixed time intervals (default 5 seconds). The purpose of these additional scheduling cycles is to ensure that all Jobs will be assigned to Techila Workers, even if the Job would not meet any criteria typically required to trigger a scheduling cycle.
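The combination of event-based triggers and a fixed-interval fallback can be pictured as a blocking wait with a timeout. The following is a minimal sketch only: the function name, the event labels and the use of a queue are assumptions made for this example and do not describe the internal implementation of the Techila Server. Only the default interval of 5 seconds comes from the text above.

```python
import queue

DEFAULT_INTERVAL = 5.0  # seconds; the default fixed interval mentioned above


def wait_for_trigger(events, interval=DEFAULT_INTERVAL):
    """Block until an event arrives or the interval elapses.

    Returns the event label, or "timer" if nothing arrived in time.
    The labels are hypothetical and exist only for this sketch.
    """
    try:
        return events.get(timeout=interval)
    except queue.Empty:
        return "timer"


events = queue.Queue()
events.put("job_finished")  # hypothetical event label

# A short interval is used here so the example finishes quickly.
first = wait_for_trigger(events, interval=0.01)   # an event is waiting
second = wait_for_trigger(events, interval=0.01)  # queue is empty, timer fires
print(first, second)
```

In this sketch, every return from `wait_for_trigger` would start one scheduling cycle, regardless of whether it was caused by an event or by the timer.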

1.1. Techila License

The Techila License defines the maximum number of Jobs that can be running at any given time in the TDCE environment. The Techila License does not limit the number of Techila Workers that can be added to the computing environment. The Techila scheduler will ensure that the number of cores performing computations at the same time does not exceed the amount specified in the license.

For example, consider a customer with a Techila License for 500 CPU cores who has installed the Techila Worker software on 600 dual-core computers (1,200 CPU cores in total). Computational Jobs will be automatically processed on the 500 most efficient CPU cores. The remaining 700 CPU cores will remain available and can be automatically taken into use if any of the 500 active CPU cores becomes unavailable (for example, if the node shuts down). The process of selecting which Techila Workers will participate in the Projects is explained in Resource Listing.
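The arithmetic of this example can be checked with a short sketch. The function name and the randomly generated benchmark scores are hypothetical; how TDCE actually measures which cores are "most efficient" is not specified here, so a single numeric score per core is assumed.

```python
import random

LICENSED_CORES = 500      # license size from the example
COMPUTERS = 600           # dual-core computers running Techila Worker
CORES_PER_COMPUTER = 2


def split_by_license(core_scores, licensed):
    """Return (active, standby) core ids, preferring higher scores.

    `core_scores` maps a core id to an assumed efficiency score.
    """
    ranked = sorted(core_scores, key=core_scores.get, reverse=True)
    return ranked[:licensed], ranked[licensed:]


rng = random.Random(0)
core_scores = {
    (computer, core): rng.uniform(1.0, 2.0)  # made-up benchmark score
    for computer in range(COMPUTERS)
    for core in range(CORES_PER_COMPUTER)
}

active, standby = split_by_license(core_scores, LICENSED_CORES)
print(len(core_scores), len(active), len(standby))  # 1200 500 700
```

If an active core becomes unavailable, the best core from `standby` would take its place in this model, keeping the number of computing cores at the licensed limit.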

1.2. Project Priority

When an End-User creates a computational Project, they can specify the priority for their Project. If no Project priority is specified, the "Normal" priority will be used by default. These priorities are used to determine the order in which Projects created by the End-User are processed. Project priority values are:

Priority

  • Highest (1)

  • High (2)

  • Above Normal (3)

  • Normal (4) [2]

  • Below Normal (5)

  • Low (6)

  • Lowest (7) [1]

[1] Jobs belonging to a Project with the "Lowest" priority will be terminated on Techila Workers if a Project with a priority level of "Normal" or higher requests computing capacity.

[2] This is the default priority value.

By default, Projects created by an End-User are processed in the order determined by their Project priority values. Jobs belonging to the Project with the highest priority will be sent to Techila Workers first. As soon as all Jobs from this Project have been assigned to Techila Workers, the TDCE system will start assigning Jobs from the End-User's next Project. If the End-User has created more than one Project with the same priority, the Projects will be processed in the order in which they were created. The image below illustrates how Project creation order affects the order in which Jobs are processed.

image006
Figure 2. In this example, End-User "John" has created two Projects that both consist of six (6) Jobs. There are four CPU cores of computing capacity. In Step 1, Jobs from Project ID 1 will be assigned to Techila Workers. This is because Project 1 was created before Project 2. In Step 2, the remaining two Jobs from Project 1 have been assigned. This means two Jobs from Project ID 2 can also be assigned. In Step 3, Project 1 has been completed and the last Jobs from Project 2 have been assigned to Techila Workers. In Step 4, all Jobs from both Projects have been completed.
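The ordering rule (priority first, creation order second) and the steps of Figure 2 can be reproduced with a small sketch. The `Project` class, its field names and the helper functions are hypothetical. The sketch also assumes that all Jobs running in a step finish at the same time, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Project:
    project_id: int
    created: int       # creation sequence number (hypothetical field)
    priority: int = 4  # 1 = Highest ... 7 = Lowest; 4 = Normal (default)


def processing_order(projects):
    """Sort by priority (smaller number first), then by creation order."""
    return sorted(projects, key=lambda p: (p.priority, p.created))


def assignment_steps(projects, jobs_per_project, cores):
    """Group Jobs into steps of `cores` Jobs each, in processing order."""
    job_queue = [
        p.project_id
        for p in processing_order(projects)
        for _ in range(jobs_per_project)
    ]
    return [job_queue[i:i + cores] for i in range(0, len(job_queue), cores)]


# John's two Projects from Figure 2: same priority, six Jobs each, four cores.
johns_projects = [Project(project_id=2, created=2), Project(project_id=1, created=1)]
steps = assignment_steps(johns_projects, jobs_per_project=6, cores=4)
for number, step in enumerate(steps, start=1):
    print(f"Step {number}: Jobs from Projects {step}")
```

The three printed steps correspond to Steps 1 to 3 of Figure 2: four Jobs from Project 1, then two Jobs from each Project, then the last four Jobs from Project 2.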

1.3. Expired Job Management

If Jobs belonging to the current highest priority Project are marked as "Expired" (taking longer than expected to complete), these Jobs will be rescheduled and sent to new Techila Workers before processing Jobs from the next Project. This is illustrated in the image below.

image007
Figure 3. In Step 1, Jobs from Project 1 will be processed. In Step 2, Jobs are being processed from both Projects. In Step 3, one Job from Project 1 has expired and has been restarted on an additional Techila Worker CPU core. This means there are two copies of this Job running simultaneously. In Step 4, the expired Job from Project 1 has been completed. This means that the remaining Jobs from Project 2 can be assigned to the Techila Worker CPU cores.

1.4. Simultaneous Project Processing

Techila Server can also be configured to schedule multiple Projects simultaneously from a single End-User. When this feature is enabled, computing capacity will be divided between the Projects using the same Random Ticket Draw that is used when managing Projects created by different End-Users (explained in Random Ticket Draw).

The image below illustrates how Jobs from two different Projects are scheduled simultaneously using the Random Ticket Draw. Please note that this example is slightly simplified, as it assumes any Job can be executed on any Techila Worker’s CPU core (no requirements for processor architecture or operating system).

image008
Figure 4. In this example, two Projects are being scheduled simultaneously. Computing capacity is divided evenly between these Projects using a Random Ticket Draw. The random nature of the scheduling algorithm means that there can be deviations in the amount of computing capacity dedicated to any given Project at any given time. For example, in Step 2, Project 1 receives 3/4 of the CPU cores. Conversely, in Step 3, Project 2 receives a larger amount of computing power.

1.5. Multi End-User Environment

In a TDCE environment where several End-Users are running several Projects each at the same time, computing capacity will be divided between computational Projects. This is done by using a Random Ticket Draw to decide which Project is given access to computational capacity.

By default, only one Project from each End-User will be allowed to participate in this Random Ticket Draw. If an End-User has more than one active Project, the Project with the highest priority (for example, "High" before "Normal") will be selected for the Random Ticket Draw. If an End-User has multiple Projects with the same priority, the Projects will be processed in the order they were created. This is illustrated in the example in Figure 5 below.

image009
Figure 5. In this example, End-Users "John" and "Alice" have both created two Projects. Techila Distributed Computing Engine will then select the Project with the highest priority from each End-User for the Random Ticket Draw. Project ID 2 will be selected from "John" ("High" is more important than "Normal") and Project ID 3 will be selected from "Alice" (Project 3 was created before Project 4).
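The default selection of one Project per End-User can be sketched as follows. The dictionary layout and the function name are hypothetical. The priorities of Project 1 and of Alice's two Projects are assumed to be "Normal" (4), since the figure caption only states that Project 2 is "High" and that Alice's Projects are ordered by creation time.

```python
def draw_candidates(projects):
    """Pick one Project per End-User: best priority first, then oldest."""
    selected = {}
    for project in projects:
        key = (project["priority"], project["created"])
        current = selected.get(project["owner"])
        if current is None or key < (current["priority"], current["created"]):
            selected[project["owner"]] = project
    return selected


# Priority numbers follow the table in Project Priority (2 = High, 4 = Normal).
projects = [
    {"id": 1, "owner": "John", "priority": 4, "created": 1},   # assumed Normal
    {"id": 2, "owner": "John", "priority": 2, "created": 2},   # High
    {"id": 3, "owner": "Alice", "priority": 4, "created": 3},  # assumed Normal
    {"id": 4, "owner": "Alice", "priority": 4, "created": 4},  # assumed Normal
]

selected = draw_candidates(projects)
print({owner: project["id"] for owner, project in selected.items()})
# {'John': 2, 'Alice': 3}
```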

2. Scheduling Cycle

This chapter explains the operations that take place during one scheduling cycle.

Please note that the scheduling cycle will be executed several hundred times each second, meaning computational resources will be matched with computational Jobs at a very rapid pace. For example, if End-User "John" has created a Project with 500 Jobs and End-User "Alice" has created a Project with 1000 Jobs, all Jobs will be assigned to Techila Workers (assuming there is a sufficient amount of computing capacity) within a couple of seconds.

2.1. Resource Listing

The Techila Server starts the scheduling cycle by creating a list of all available Techila Worker CPU (Central Processing Unit) cores, sorted according to the amount of free CPU power available. This list also contains information on the hardware and software specifications of each Techila Worker, such as the processor architecture and the operating system. This information will be used in the next step when selecting which Projects are compatible with each specific Techila Worker.

After the fastest Techila Worker CPU core has been selected, the Techila Server will select Projects compatible with this CPU core for the Random Ticket Draw.
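The resource listing and the compatibility check can be sketched as a sort followed by a filter. All field names, values and function names below are hypothetical, and a requirement of `None` is assumed to mean that the Project accepts any value.

```python
def list_resources(cores):
    """Sort available cores so the one with the most free CPU power is first."""
    return sorted(cores, key=lambda core: core["free_power"], reverse=True)


def compatible_projects(core, projects):
    """Keep Projects whose requirements (None = no requirement) match the core."""
    return [
        project for project in projects
        if project["os"] in (None, core["os"])
        and project["arch"] in (None, core["arch"])
    ]


cores = [
    {"worker": "w1", "free_power": 0.40, "os": "Windows", "arch": "x86_64"},
    {"worker": "w2", "free_power": 0.95, "os": "Linux", "arch": "x86_64"},
    {"worker": "w3", "free_power": 0.70, "os": "Linux", "arch": "x86_64"},
]
projects = [
    {"id": 1, "os": "Linux", "arch": None},
    {"id": 2, "os": "Windows", "arch": None},
    {"id": 3, "os": None, "arch": "x86_64"},
]

fastest = list_resources(cores)[0]
candidates = compatible_projects(fastest, projects)
print(fastest["worker"], [project["id"] for project in candidates])  # w2 [1, 3]
```

Here the core on worker `w2` has the most free CPU power, so only the Projects that can run on it (1 and 3) would enter the Random Ticket Draw for this cycle.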

2.2. Calculating Project Ranking

As soon as the Projects have been selected for the Random Ticket Draw, the Techila Server calculates the ranking for each Project. These ranking values will determine how many random number tickets each Project will receive. These random number tickets will be used in the next step of the scheduling cycle to determine which Projects are given computing capacity.

The parameters that are used when calculating the ranking are shown below:

  • End-User priority (Configured by the Techila administrator. By default, all End-User priorities are the same)

  • End-User Project priority (Configured by the End-User)

  • Project wall clock time

  • CPU time used in completed Jobs of the Project

  • Number of Jobs that are being processed from the Project

  • Relative order in which the Projects were created

  • Sum of Techila Worker benchmarks that are participating in the Project

2.3. Random Ticket Draw

As soon as Project ranking values have been calculated, each Project will be given a specific number of random number tickets based on these ranking values. The number of random number tickets for each Project will be calculated by dividing the sum of all Project ranking values by the Project's ranking value. This essentially means that Projects with small ranking values will get more random tickets, which in turn will increase their probability of winning the Random Ticket Draw and getting access to computing capacity.
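The ticket rule described above (sum of all ranking values divided by the Project's own ranking value) can be sketched as follows. The function names and the example ranking values are hypothetical, and ranking values are assumed to be positive. Whether TDCE rounds ticket counts to whole numbers is not stated, so fractional tickets are used as weights here.

```python
import random


def ticket_counts(rankings):
    """Tickets for each Project: sum of all rankings / own ranking."""
    total = sum(rankings.values())
    return {project: total / ranking for project, ranking in rankings.items()}


def win_probabilities(tickets):
    """Probability of winning one draw, proportional to the ticket count."""
    total = sum(tickets.values())
    return {project: count / total for project, count in tickets.items()}


def draw_winner(tickets, rng):
    """Weighted random draw of one Project."""
    names = list(tickets)
    weights = [tickets[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rankings = {"A": 1.0, "B": 3.0}           # made-up ranking values
tickets = ticket_counts(rankings)         # A: 4/1 = 4 tickets, B: 4/3 tickets
probabilities = win_probabilities(tickets)  # A: about 0.75, B: about 0.25

rng = random.Random(42)
winner = draw_winner(tickets, rng)
print(tickets, probabilities, winner)
```

With these values, Project A has the smaller ranking value and therefore holds three times as many tickets as Project B, which gives it roughly a 75 % chance of winning each draw.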

2.4. Job Assignment

As soon as the winning Project has been determined, one Job from that Project will be assigned to the CPU core of the Techila Worker that had the most free CPU power available.

After a Job has been assigned to a Techila Worker, the scheduling cycle is complete. The Techila Server then restarts the scheduling cycle by listing the resources, which will again be followed by the randomized match making process. The scheduling cycle will be repeated as long as there are computational Jobs waiting to be assigned.
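The four steps of this chapter can be combined into one simplified loop. This is a sketch under several assumptions: all names and data structures are hypothetical, ranking values are supplied as fixed inputs instead of being recalculated from the parameters listed in Calculating Project Ranking, only the operating system is checked for compatibility, and the loop stops at the first cycle that cannot assign a Job.

```python
import random


def scheduling_cycle(free_cores, projects, rng):
    """Run one cycle; return (core name, project id) or None if nothing matched."""
    if not free_cores:
        return None
    # Resource Listing: the core with the most free CPU power comes first.
    free_cores.sort(key=lambda core: core["free_power"], reverse=True)
    core = free_cores[0]
    candidates = [
        p for p in projects
        if p["pending"] > 0 and p["os"] in (None, core["os"])
    ]
    if not candidates:
        return None
    # Ranking and Random Ticket Draw: tickets = sum of rankings / own ranking.
    total = sum(p["ranking"] for p in candidates)
    weights = [total / p["ranking"] for p in candidates]
    winner = rng.choices(candidates, weights=weights, k=1)[0]
    # Job Assignment: one Job from the winner goes to the selected core.
    winner["pending"] -= 1
    free_cores.remove(core)
    return core["name"], winner["id"]


def run_scheduler(free_cores, projects, rng):
    """Repeat cycles while Jobs can still be assigned to free cores."""
    assignments = []
    while True:
        result = scheduling_cycle(free_cores, projects, rng)
        if result is None:
            return assignments
        assignments.append(result)


free_cores = [
    {"name": "c1", "free_power": 0.5, "os": "Linux"},
    {"name": "c2", "free_power": 0.9, "os": "Linux"},
    {"name": "c3", "free_power": 0.7, "os": "Linux"},
]
projects = [
    {"id": 1, "os": None, "ranking": 1.0, "pending": 2},
    {"id": 2, "os": None, "ranking": 2.0, "pending": 5},
]
assignments = run_scheduler(free_cores, projects, random.Random(1))
print(assignments)
```

In this example the cores are always filled in the order `c2`, `c3`, `c1`, while the Project that receives each core depends on the random draw.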