Load Balancing Grid Scheduler Management
Abstract- A computational grid is a widespread distributed computing environment that provides huge computational power for large-scale distributed applications. One of the challenging issues in a computational grid is load balancing. This paper proposes a new load balancing scheduler algorithm that not only increases resource utilization and throughput but also achieves load balancing within the grid environment. Updated topology and load information is acquired dynamically from the resources using the event notification approach. To maximize resource utilization and increase system performance, application-level load balancing is needed for individual parallel jobs. In many approaches, load balancing is done only at the local scheduler level, which suits small applications and leads to more communication overhead between the nodes. For large-scale applications, load balancing at the local scheduler level does not provide a feasible solution. A novel load balancing algorithm is therefore proposed, which provides load balancing at the meta-scheduler level. A triggering policy is used to initiate load balancing; it determines the appropriate time to start the load balancing operation. The triggering policy can be initiated using two approaches: the threshold approach and the boundary value approach. These approaches increase performance in large-scale applications by submitting each job to the least loaded machine, which reduces the elapsed time and waiting time of the job and maximizes the utilization of resources that are idle or least loaded.
Index Terms - Load balancing, topology information, load information, event notification, triggering policy, threshold value and boundary value approach.
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities. One of the challenging issues in a grid environment is load balancing, which coordinates and schedules resources dynamically so that tasks execute more quickly. Grid technology provides the toolkit and the system platform to deal with these issues. A vast array of Globus Toolkit (GT4) middleware has been developed to support applications in the Grid.
The purpose of the Load Balancing Scheduler (LBS) algorithm is to choose the best-fit resource from the grid environment, that is, the resource that executes the task more quickly than the others. As far as the grid environment is concerned, each cluster is considered a resource. In many existing grid scheduling algorithms, the resource with high CPU speed and free memory is selected as the best resource for task execution. Such algorithms may lead to overhead when a newly arrived task requires more CPU speed and free memory. This situation leads to overloading of that resource due to demand and, at times, to the resource becoming unavailable. So far, load balancing algorithms have been implemented by exploiting static or dynamic resource information, topology information, and load information of the resource, such as whether it is loaded or unloaded. Each kind of information has been exploited as a separate model, but they have not been integrated into a single algorithm. To avoid overloading and unavailability of resources, we propose a new LBS algorithm that exploits both topology and load information, obtained dynamically from the grid resources.
There are, of course, many load balancing algorithms that act efficiently in a grid environment. Since jobs are submitted dynamically to the scheduler, many jobs may have the same requirements. As a result, the grid scheduler will submit a large number of jobs to a single resource, which leads to overloading. If local users also submit jobs to the resource without the knowledge of the meta-scheduler, the resource is forced to handle too many jobs from both the meta-scheduler and the local users, and this again leads to overloading of the resource. To avoid overloading and reduce job waiting time, the load on that resource needs to be balanced. Load balancing at the application level for individual jobs is therefore a preferable solution. Using LBS algorithms in the meta-scheduler reduces the total elapsed time and waiting time of the job and maximizes the utilization of resources that are idle or least loaded.
The most important issues in a grid environment are the performance degradation caused by load imbalance and achieving minimum response time for client job requests. Load balancing is therefore indispensable for heterogeneous clusters, to ensure an appropriate distribution of the workload across the clusters. Existing approaches broadly implement load sharing algorithms, which can be static or dynamic and can use centralized or distributed control. A hybrid of static and dynamic strategies for resource selection has been reported to provide various load balancing policies that give good performance. The literature also describes a dynamic load balancing technique for grid applications based on graph partitioning, which exploits knowledge of the topology of the grid environment to partition the communication graph so as to reduce cross-site communication. For a dynamic load balancing algorithm, it is unacceptable to exchange state information frequently because of the high communication overhead. In one reported approach, the sender processor collects status information about neighboring processors by communicating with them at every load balancing instant. In a large-scale grid environment, where communication latency is very large, the status exchange at each load balancing instant can lead to large communication overhead. In our approach, the problem of frequent information exchange is alleviated by estimating the load information on demand using the event notification approach. A triggering policy is also considered, which decides when the load balancing operation is to be initiated.
DYNAMIC LOAD BALANCING GRID SCHEDULER
Load Balancing policy
To maximize resource utilization and reduce job waiting time, application-level load balancing is needed for individual parallel jobs. In a grid environment, load balancing is carried out according to one or more load balancing policies, such as the transfer policy, selection policy, location policy, information policy, and triggering policy.
In our proposed model we consider only four policies. The first is the transfer policy, also called the threshold policy, whose thresholds are expressed in units of load. When a new job or task originates at a node and the load at that node exceeds a threshold T, the transfer policy decides that the node is a sender. If the load at the node falls below the threshold T, the transfer policy decides that the node can be a receiver for a remote task. The second is the location policy, whose responsibility is to find suitable nodes with which to share load. A widely used method for finding a suitable node is polling, in which a node polls another node to find out whether it is suitable for load sharing. Nodes can be polled either serially or in parallel. An alternative to polling is to broadcast a query to find out whether any node is available for load sharing. The third is the information policy, which decides when information about the states of other nodes in the system should be collected, where it should be collected from, and what information should be collected. Many information policies are available, but we consider the sender-initiated demand-driven policy. In the sender-initiated policy, the sender looks for a receiver to which it can transfer its load; the reverse is the receiver-initiated policy.
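The transfer and location policies described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the function names, and the "neutral" state for a load exactly equal to T are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    load: float  # current load, in the same units as the threshold T


def classify(node: Node, threshold: float) -> str:
    """Transfer policy: above T the node is a sender, below T a receiver."""
    if node.load > threshold:
        return "sender"
    if node.load < threshold:
        return "receiver"
    # Assumption: the text does not say what happens at exactly T.
    return "neutral"


def poll_serially(sender: Node, candidates: List[Node],
                  threshold: float) -> Optional[Node]:
    """Location policy: poll candidates one by one, stop at the first receiver."""
    for node in candidates:
        if node is not sender and classify(node, threshold) == "receiver":
            return node
    return None  # no suitable node answered the poll
```

Serial polling stops at the first suitable node, which keeps the number of messages low; polling in parallel or broadcasting a query would trade extra messages for a faster answer.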
Finally, we consider the triggering policy, which determines the appropriate time to start a load balancing operation. The triggering policy can be initiated by either of two approaches. In the threshold approach, a threshold value is set for the load of each resource or cluster in the grid environment. If the load of a resource exceeds this threshold value, the load balancing policies (information, selection, and location) are applied to migrate jobs to other resources whose load is below the threshold value. The boundary value approach is used when no job arrives at the meta-scheduler for a certain time interval; the average load of the resources is then calculated, and upper and lower boundary values are set for the resources. If the load of a resource exceeds the upper boundary value, the load balancer migrates jobs to a resource whose load is below the lower boundary value. Job migration continues until the resource reaches a moderate load.
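The two triggering approaches can be illustrated with the following sketch. The way the boundaries are derived from the average load (a symmetric band controlled by `factor`) and the migration of one job unit at a time are assumptions made for the example; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def threshold_trigger(loads: Dict[str, float], threshold: float) -> List[str]:
    """Threshold approach: resources whose load exceeds the threshold value."""
    return [name for name, load in loads.items() if load > threshold]


def boundaries(loads: Dict[str, float], factor: float) -> Tuple[float, float]:
    """Assumed form: a band placed symmetrically around the average load."""
    average = sum(loads.values()) / len(loads)
    return average * (1 - factor), average * (1 + factor)


def boundary_migrations(loads: Dict[str, float], factor: float,
                        job_unit: float = 1.0) -> List[Tuple[str, str]]:
    """Boundary value approach: move jobs from high-load to low-load resources."""
    loads = dict(loads)  # work on a copy, leave the caller's data untouched
    lower, upper = boundaries(loads, factor)
    moves = []
    while True:
        high = [n for n, load in loads.items() if load > upper]
        low = [n for n, load in loads.items() if load < lower]
        if not high or not low:
            break  # every remaining resource is at moderate load on one side
        src = max(high, key=loads.get)
        dst = min(low, key=loads.get)
        if loads[src] - loads[dst] <= job_unit:
            break  # a further move would only swap the imbalance
        loads[src] -= job_unit
        loads[dst] += job_unit
        moves.append((src, dst))
    return moves
```

For example, with loads of 10, 2, and 6 and a factor of 0.25, the average is 6 and the band is 4.5 to 7.5, so jobs move from the first resource to the second until the first falls inside the band.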
Load Balancing Grid Scheduler model
In this paper, the Load Balancing Grid Scheduler (LBGS) model is proposed, built on the Load Balancing Scheduler (LBS), Load Balancer (LB), and Job Migration (JM) algorithms. The scheduler model is shown in Fig 1. Users submit their jobs to the meta-scheduler, where they are placed in the queue of the request handler. The Request Handler service provides a user interface through which a client can submit jobs, described using the JSDL specification, to the underlying meta-scheduler. This service obtains the jobs from the user and stores them in the request handler queue. The dispatch manager, which is part of the CARE Resource Broker (CRB), periodically obtains the submitted job information from the queue implemented in the request handler component. At every scheduling interval it maintains, it sends the jobs to the LBS to discover suitable resources.
Fig 1: Structure of Load Balancing Grid scheduler
The scheduler performs load balancing and job migration by exploiting the information gathered from the load monitor and the information manager, along with the topology cost and load cost. It allocates the job to the resource that has a lower topology cost to reach and a lower load cost. The Load Monitor holds information about all the load agents and prepares system load information for all the resources. This load information is updated automatically by the load agent, using the event notification approach, whenever the resource load changes. Finally, the load information is given to the LBS for the actual load balancing and job migration.
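The event notification idea, in which a load agent pushes an update only when its load changes, can be sketched as follows. The class and method names are hypothetical, and ranking resources by the sum of topology cost and load cost is an assumption made for the example; the text above does not state how the two costs are combined.

```python
from typing import Callable, Dict, List, Optional


class LoadAgent:
    """Runs on a resource and notifies listeners only when its load changes."""

    def __init__(self, resource: str) -> None:
        self.resource = resource
        self._load: Optional[float] = None
        self._listeners: List[Callable[[str, float], None]] = []

    def subscribe(self, listener: Callable[[str, float], None]) -> None:
        self._listeners.append(listener)

    def set_load(self, load: float) -> None:
        if load == self._load:
            return  # unchanged load: no message is sent
        self._load = load
        for listener in self._listeners:
            listener(self.resource, load)


class LoadMonitor:
    """Keeps the latest load reported by every load agent."""

    def __init__(self) -> None:
        self.loads: Dict[str, float] = {}
        self.updates = 0  # number of notifications received

    def on_load_changed(self, resource: str, load: float) -> None:
        self.loads[resource] = load
        self.updates += 1


def pick_resource(loads: Dict[str, float],
                  topology_cost: Dict[str, float]) -> str:
    """Assumed ranking: smallest sum of topology cost and load cost."""
    return min(loads, key=lambda r: topology_cost[r] + loads[r])
```

Because the agent stays silent while its load is unchanged, the monitor receives no periodic status traffic, which is the overhead the event notification approach is meant to avoid.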
The Load Agent service acts as a server, and a copy of it has to run on every resource or cluster where users want to run their applications. The load agent provides system load information such as the job queue length, the CPU speed, and the number of nodes available in the resource.
The information manager queries the Monitoring and Discovery Service (MDS) and sends the host information to the LBS. Based on the monitoring interval, it keeps track of the host status and updates the host information in the LBS. If new resources are added, their information is also updated periodically. The Transfer Manager is invoked by the dispatch manager with the job-id and the matching resource-id as input. Once invoked, the transfer manager creates a remote directory for the path name given in the user input and sets the permissions required to execute the job in that remote directory. When this process is complete, it informs the dispatch manager through messages.
The execution manager is invoked by the dispatch manager once the transfer manager has created the directory on the remote host. The dispatch manager then dispatches the job for execution. The execution manager keeps updating the job status to the scheduler and finally reports the completion or failure of the job to the scheduler.
Before presenting the exact problem statement, we first describe the notations and terminology that are used throughout the paper (refer to Table 1).
Table 1: List of notations and terminology
Number of heterogeneous resources (R1, R2, …, RM)
Number of jobs to be processed
Delay to reach the resource Ri
Topology Factor of the resource Ri (Number of hop to reach the resource)
Topology cost of the resource
Job unit of the task or job Ji (size of job)
Estimated arrival rate for the resource Ri at time T
Estimated service rate for the resource Ri at time T
Boundary value estimation factor
Threshold value estimation factor
Threshold load value of the resource
Resource is of low load
Resource is of high load
Resource is of moderate load
Estimated load of the resource Ri at time T
Total load of all the resources R = (R1, R2, R3, …, RM)
Average load of all the resource R
Load upper boundary for the resource R
Load lower boundary for the resource R
We now introduce the key performance metrics considered in this paper, namely topology cost and load cost, for grid scheduling and load balancing in a real grid environment.
Estimation of Topology Cost
Topology is regarded as one of the most important factors, since it can strongly affect grid performance. In our proposed approach, the LBS decides to submit the job to the best-fit resource, that is, the one with the lowest topology cost to reach, by exploiting topology information. The topology information, such as the delay and the hop count, is gathered using the Network Weather Service tool. The topology cost is calculated as follows:
TCi = 2 * Di * TFi
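As a small worked example of the formula above, the sketch below computes TCi for a few resources and selects the one that is cheapest to reach. The delay and hop values are made up for illustration, and the function and variable names are hypothetical.

```python
from typing import Dict, Tuple


def topology_cost(delay: float, topology_factor: int) -> float:
    """TCi = 2 * Di * TFi, with Di the delay and TFi the hop count."""
    return 2 * delay * topology_factor


def cheapest_resource(measurements: Dict[str, Tuple[float, int]]) -> str:
    """Return the resource with the lowest topology cost."""
    return min(measurements, key=lambda r: topology_cost(*measurements[r]))


# Illustrative (delay, hop count) pairs; not measured values.
measurements = {"R1": (0.5, 3), "R2": (0.25, 4), "R3": (1.0, 2)}
costs = {r: topology_cost(d, tf) for r, (d, tf) in measurements.items()}
```

Here R1 has a cost of 2 * 0.5 * 3 = 3.0, R2 a cost of 2 * 0.25 * 4 = 2.0, and R3 a cost of 2 * 1.0 * 2 = 4.0, so R2 is the cheapest resource to reach even though it is the most hops away.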