1. Scheduling
In computer science, scheduling is the method by which threads, processes or data flows are given access to system resources (e.g. processor time, communications bandwidth). This is usually done to load balance a system effectively or achieve a target quality of service. The need for a scheduling algorithm arises from the requirement for most modern systems to perform multitasking (execute more than one process at a time) and multiplexing (transmit multiple flows simultaneously). (Source: Wikipedia)
If you are unfamiliar with Techila Distributed Computing Engine (TDCE) terminology or the operating principles of the TDCE technology, information on these can be found in Introduction to Techila Distributed Computing Engine.
Computational workload (Jobs) is transferred to computing nodes (Techila Workers) by the scheduler functionality of the Techila Server. The scheduling cycle consists of matching computational resources (Techila Worker CPU cores) with computational Projects using a randomized matchmaking procedure.
Scheduling cycles can be triggered by several different events, which can take place in a TDCE environment. Events that trigger scheduling cycles include:
- New computational Project is created by an End-User
- Techila Worker finishes a computational Job
- New trusted Techila Workers are added to the Techila environment
- State of the Techila Worker changes to allow execution of Jobs (state changes to running)
- New Jobs are created
- Jobs are marked as expired
These event-based triggers can cause several hundred scheduling cycles each second, meaning computational resources will be matched with computational Jobs at a very rapid pace.
In addition to these event-based triggers, scheduling cycles are also executed at fixed time intervals (default 5 seconds). The purpose of these additional scheduling cycles is to ensure that all Jobs will be assigned to Techila Workers, even if the Job would not meet any criteria typically required to trigger a scheduling cycle.
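The combination of event-based triggers and a fixed-interval fallback can be sketched as a wait on an event queue with a timeout. This is only an illustration of the idea, not the Techila Server implementation; the function name `wait_for_trigger`, the event names and the use of Python's `queue` module are assumptions made for the example.

```python
import queue

FALLBACK_INTERVAL_S = 5.0  # default interval stated in the text


def wait_for_trigger(events, timeout=FALLBACK_INTERVAL_S):
    """Return the event that triggers the next cycle, or "timer" if none arrives in time."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        # No event arrived within the interval: run a cycle anyway.
        return "timer"


events = queue.Queue()
events.put("job_finished")  # hypothetical event name

# A short timeout is used here so the example finishes quickly.
first = wait_for_trigger(events, timeout=0.01)   # event-based trigger
second = wait_for_trigger(events, timeout=0.01)  # queue is empty, so the timer fires
```

With this structure, a Job that never causes an event is still picked up at the latest when the next timer-based cycle runs.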
1.1. Techila License
The Techila License defines the maximum number of Jobs that can be running at any given time in the TDCE environment. The Techila License does not limit the number of Techila Workers that can be added to the computing environment. The Techila scheduler will ensure that the number of cores performing computations at the same time does not exceed the amount specified in the license.
For example, consider a customer with a Techila License for 500 CPU cores who has installed the Techila Worker software on 600 dual-core computers (1200 CPU cores in total). Computational Jobs will be automatically processed on the 500 most efficient CPU cores. The remaining 700 CPU cores will remain available and can be automatically taken into use if any of the 500 CPU cores becomes unavailable (for example, if the node shuts down). The process of selecting which Techila Workers will participate in the Projects is explained in Resource Listing.
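The arithmetic of this example can be checked with a small sketch. The `Core` class, the `benchmark` field (used here as a stand-in for "most efficient") and the `select_licensed_cores` function are hypothetical names introduced for illustration; the actual selection criteria are described in Resource Listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    worker: str
    index: int
    benchmark: float  # hypothetical efficiency score; higher is better


def select_licensed_cores(cores, licensed):
    """Split cores into those used for Jobs and those kept in reserve."""
    ranked = sorted(cores, key=lambda c: c.benchmark, reverse=True)
    return ranked[:licensed], ranked[licensed:]


# 600 dual-core computers give 1200 CPU cores in total.
cores = [
    Core(f"worker-{w}", i, benchmark=float(w % 50))
    for w in range(600)
    for i in range(2)
]
active, standby = select_licensed_cores(cores, licensed=500)
print(len(cores), len(active), len(standby))
```

If an active core disappears, rerunning the selection over the remaining cores promotes the best standby core, so the licensed capacity stays in use.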
1.2. Project Priority
When an End-User creates a computational Project, they can specify the priority for their Project. If no Project priority is specified, the "Normal" priority will be used by default. These priorities are used to determine the order in which Projects created by the End-User are processed. Project priority values are:
Priority | Notes |
---|---|
Highest (1) | |
High (2) | |
Above Normal (3) | |
Normal (4) | Default priority value. |
Below Normal (5) | |
Low (6) | |
Lowest (7) | Jobs belonging to a Project with the "Lowest" priority will be terminated on Techila Workers if a Project with a priority level of "Normal" or higher requests computing capacity. |
By default, Projects created by an End-User are processed in the order determined by their Project priority values. Jobs belonging to the Project with the highest priority will be sent to Techila Workers first. As soon as all Jobs from this Project have been assigned to Techila Workers, the TDCE system will start assigning Jobs from the End-User's next Project. If the End-User has created more than one Project with the same priority, the Projects will be processed in the order in which they were created. The image below illustrates how Project creation order affects the order in which Jobs are processed.
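The default ordering for a single End-User amounts to sorting by priority first and creation order second. The sketch below assumes a hypothetical `Project` record with a `created` sequence number; neither name comes from the TDCE API.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    jobs: int
    priority: int = 4  # 1 = Highest ... 7 = Lowest; 4 = Normal (default)
    created: int = 0   # hypothetical creation sequence number


def job_stream(projects):
    """Yield (project name, job index) in the order Jobs would be sent out."""
    ordered = sorted(projects, key=lambda p: (p.priority, p.created))
    for project in ordered:
        for job in range(1, project.jobs + 1):
            yield project.name, job


projects = [
    Project("C", jobs=1, created=3),
    Project("A", jobs=2, created=1),
    Project("B", jobs=2, priority=2, created=2),
]
stream = list(job_stream(projects))
```

Here Project B goes first because of its "High" priority, followed by A and C, which share the "Normal" priority and are therefore taken in creation order.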
1.3. Expired Job Management
If Jobs belonging to the current highest priority Project are marked as "Expired" (taking longer than expected to complete), these Jobs will be rescheduled and sent to new Techila Workers before processing Jobs from the next Project. This is illustrated in the image below.
1.4. Simultaneous Project Processing
Techila Server can also be configured to schedule multiple Projects simultaneously from a single End-User. When this feature is enabled, computing capacity will be divided between the Projects using the same Random Ticket Draw that is used when managing Projects created by different End-Users (explained in Random Ticket Draw).
The image below illustrates how Jobs from two different Projects are scheduled simultaneously using the Random Ticket Draw. Please note that this example is slightly simplified, as it assumes any Job can be executed on any Techila Worker’s CPU core (no requirements for processor architecture or operating system).
1.5. Multi End-User Environment
In a TDCE environment where several End-Users are running several Projects each at the same time, computing capacity will be divided between computational Projects. This is done by using a Random Ticket Draw to decide which Project is given access to computational capacity.
By default, only one Project from each End-User will be allowed to participate in this Random Ticket Draw. If an End-User has more than one active Project, the Project with the highest priority will be selected for the Random Ticket Draw. If an End-User has multiple Projects with the same priority, the Projects will be processed in the order they were created. This is illustrated in the example in Figure 5 below.
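Selecting one Project per End-User for the draw can be sketched as follows. The dictionary keys (`owner`, `priority`, `created`) and the function name `draw_candidates` are assumptions made for this example.

```python
def draw_candidates(projects):
    """Pick, for each End-User, the active Project with the highest priority.

    Priority 1 is the highest; ties are broken by creation order.
    """
    best = {}
    for project in sorted(projects, key=lambda p: (p["priority"], p["created"])):
        # The first Project seen for an owner is that owner's best one.
        best.setdefault(project["owner"], project)
    return list(best.values())


projects = [
    {"name": "P1", "owner": "John", "priority": 4, "created": 1},
    {"name": "P2", "owner": "John", "priority": 2, "created": 2},
    {"name": "P3", "owner": "Alice", "priority": 4, "created": 3},
    {"name": "P4", "owner": "Alice", "priority": 4, "created": 4},
]
candidates = {p["owner"]: p["name"] for p in draw_candidates(projects)}
```

John's higher-priority P2 enters the draw ahead of P1, while Alice's two "Normal" Projects are separated by creation order, so P3 is chosen.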
2. Scheduling Cycle
This chapter explains the operations that take place during one scheduling cycle.
Please note that the scheduling cycle will be executed several hundred times each second, meaning computational resources will be matched with computational Jobs at a very rapid pace. For example, if End-User "John" has created a Project with 500 Jobs and End-User "Alice" has created a Project with 1000 Jobs, all Jobs will be assigned to Techila Workers (assuming there is a sufficient amount of computing capacity) within a couple of seconds.
2.1. Resource Listing
The Techila Server starts the scheduling cycle by creating a list of all available Techila Worker CPU (Central Processing Unit) cores, sorted according to the amount of free CPU power available. This list also contains information on the hardware and software specifications of each Techila Worker, such as the processor architecture and the operating system. This information will be used in the next step when selecting which Projects are compatible with each specific Techila Worker.
After the fastest Techila Worker CPU core has been selected, the Techila Server will select Projects compatible with this CPU core for the Random Ticket Draw.
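A minimal sketch of this step, assuming each core and Project is described by a small dictionary. The field names (`free_cpu`, `os`, `arch`) and both function names are hypothetical and chosen only for illustration.

```python
cores = [
    {"worker": "w1", "free_cpu": 0.60, "os": "Linux", "arch": "x86_64"},
    {"worker": "w2", "free_cpu": 0.95, "os": "Windows", "arch": "x86_64"},
]
projects = [
    {"name": "P1", "os": {"Linux"}, "arch": {"x86_64"}},
    {"name": "P2", "os": {"Linux", "Windows"}, "arch": {"x86_64"}},
]


def list_resources(cores):
    """Sort available cores so the one with the most free CPU power comes first."""
    return sorted(cores, key=lambda c: c["free_cpu"], reverse=True)


def compatible_projects(core, projects):
    """Keep only Projects whose requirements match the core's platform."""
    return [
        p for p in projects
        if core["os"] in p["os"] and core["arch"] in p["arch"]
    ]


best_core = list_resources(cores)[0]
eligible = [p["name"] for p in compatible_projects(best_core, projects)]
```

In this example the Windows core has the most free CPU power, so only P2, which accepts Windows, would proceed to the Random Ticket Draw.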
2.2. Calculating Project Ranking
As soon as the Projects have been selected for the Random Ticket Draw, the Techila Server calculates the ranking for each Project. These ranking values will determine how many random number tickets each Project will receive. These random number tickets will be used in the next step of the scheduling cycle to determine which Projects are given computing capacity.
The parameters that are used when calculating the ranking are shown below:
- End-User priority (Configured by the Techila administrator. By default, all End-User priorities are the same)
- End-User Project priority (Configured by the End-User)
- Project wall clock time
- CPU time used in completed Jobs of the Project
- Number of Jobs that are being processed from the Project
- Relative order in which the Projects were created
- Sum of Techila Worker benchmarks that are participating in the Project
2.3. Random Ticket Draw
As soon as Project ranking values have been calculated, each Project will be given a specific number of random number tickets based on these ranking values. The number of random number tickets for each Project will be calculated by dividing the sum of all Project ranking values by the Project’s ranking value. This essentially means that Projects with small ranking values will get more random tickets, which in turn will increase their probability of winning the Random Ticket Draw and getting access to computing capacity.
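The ticket rule described above (tickets for a Project = sum of all ranking values / that Project's ranking value) can be sketched as a weighted random choice. The ranking values below are made up, the function names are hypothetical, and `random.choices` is used as one possible way to perform a weighted draw; the text does not say how the draw itself is implemented.

```python
import random


def ticket_counts(rankings):
    """Tickets per Project: sum of all ranking values divided by its own ranking."""
    total = sum(rankings.values())
    return {name: total / value for name, value in rankings.items()}


def draw_winner(rankings, rng):
    """Pick one Project, weighted by its number of tickets."""
    tickets = ticket_counts(rankings)
    names = list(tickets)
    return rng.choices(names, weights=[tickets[n] for n in names], k=1)[0]


rankings = {"A": 1.0, "B": 3.0}  # illustrative ranking values (must be positive)
tickets = ticket_counts(rankings)
winner = draw_winner(rankings, random.Random(0))
```

With these values Project A receives 4 tickets and Project B receives 4/3, so A wins a single draw with probability 4 / (4 + 4/3) = 0.75, which shows how a smaller ranking value translates into a larger share of computing capacity.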
2.4. Job Assignment
As soon as the winning Project has been determined, one Job from that Project will be assigned to the CPU core of the Techila Worker that had the most free CPU power available.
After a Job has been assigned to a Techila Worker, the scheduling cycle is complete. The Techila Server then restarts the scheduling cycle by listing the resources, which will again be followed by the randomized matchmaking process. The scheduling cycle will be repeated as long as there are computational Jobs waiting to be assigned.
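Putting the steps of this chapter together, repeated scheduling cycles can be sketched as a loop. This is a simplified illustration under several assumptions: any Job can run on any core, each core takes a single Job, the ranking values are fixed placeholders (the actual ranking formula is not given in the text), and `run_cycles` is a hypothetical name.

```python
import random


def run_cycles(cores, pending, ranking, rng):
    """Repeat scheduling cycles until no Job is waiting or no core is free.

    cores:   {core id: free CPU power}
    pending: {project name: number of waiting Jobs}
    ranking: {project name: positive ranking value} (placeholder values)
    """
    free = dict(cores)
    waiting = dict(pending)
    assignments = []
    while free and any(waiting.values()):
        # Resource listing: take the core with the most free CPU power.
        core = max(free, key=free.get)
        # Only Projects with waiting Jobs take part in the draw.
        names = [n for n, count in waiting.items() if count > 0]
        total = sum(ranking[n] for n in names)
        tickets = [total / ranking[n] for n in names]
        # Random Ticket Draw: weighted choice by ticket count.
        winner = rng.choices(names, weights=tickets, k=1)[0]
        # Job assignment: one Job from the winner goes to the selected core.
        assignments.append((core, winner))
        waiting[winner] -= 1
        del free[core]
    return assignments


cores = {"w1/0": 0.9, "w1/1": 0.7, "w2/0": 0.8}
pending = {"John": 2, "Alice": 2}
ranking = {"John": 1.0, "Alice": 2.0}
assignments = run_cycles(cores, pending, ranking, random.Random(42))
```

Each pass through the loop corresponds to one scheduling cycle: exactly one Job is matched with exactly one core, and the cores are consumed in order of free CPU power.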