The chip design technologies

Essay added: 10-01-2017, 18:25

Abstract - Recent advances in chip design technologies have led to the development of large, computationally powerful datacenters. This increase in computational power, however, is accompanied by a steep rise in the total power needed to operate these datacenters. Most modern datacenters lack efficient, fine-grained monitoring and decision-making capabilities to minimize total power consumption while maintaining the desired Quality of Service. In this paper, we propose a low-overhead full-system power model, System Power Meter (SPMeter), to predict the power and resource utilization of a cluster node. The paper gives special consideration to two significant problems in developing a power model that predicts instantaneous power consumption: selecting the system events that correlate most strongly with total power consumed, and the various approaches to building a model that is both accurate and low-overhead. Moreover, the work proposes an infrastructure to collate the monitored vital statistics from all participating nodes of a homogeneous or heterogeneous cluster at a central decision-making node, namely the one hosting the job scheduler. The SPMeter modelling techniques are evaluated using six well-known, computationally diverse benchmark programs on more than one server hardware platform, such as AMD Opteron Sun servers and an Intel Pentium D Dell PowerEdge server. The proposed model predicts instantaneous power with an average median error of 8% or less against the actual power measured by an external power-measuring device.

Keywords: Sensor Network, Data Center, Job Scheduling

I. Introduction

Recent years have seen rapid growth in High Performance Computing datacenters, in both number and size. The major reason for this growth is that the principles and resources of high performance computing apply to a myriad of applications. In recent decades, advances in microarchitecture and fabrication techniques have fuelled the development of datacenters with extreme computational power and accuracy, but the downside of this development is the steep increase in the power required to sustain the desired Quality of Service on these server farms. For instance, a 360 TFlops supercomputer built from conventional processors, such as IBM Blue Gene/L, requires 20 MW to operate []. According to the 2007 report to Congress on "Server and Data Center Energy Efficiency" [], US data centers and servers consumed 61 billion kilowatt-hours (kWh) of energy in 2006, at a total electricity cost of about $4.5 billion; this is roughly equal to the energy consumed by 5.8 million average US households. Moreover, the carbon emission (CO2) equivalent for this level of consumption was about 846 million metric tons. The report also indicated that, given the current trend, these figures are expected to double within the next five years.

The power dissipated in a datacenter is shared almost equally between the server hardware and the power distribution and cooling equipment. There is a tradeoff in energy use between greater utilization of hardware resources and performance degradation of the running jobs, which implies an optimal level of power consumption. Intelligent deployment of jobs in the cluster to minimize power use is realizable if there exists a low-overhead monitoring mechanism that gathers the runtime resource utilization and power requirements of the participating cluster nodes.

Most modern datacenters lack the capability to profile cluster nodes on the basis of their resource and power requirements. In this paper, we propose System Power Meter (SPMeter), a resource and power monitoring system. The major components of SPMeter are SPM_monitor, SPM_power and SPM_report. SPM_monitor is a daemon responsible for gathering vital statistics on the selected system events that have the greatest impact on power consumption, and for generating a profile of the system based on the resource utilization of its four major sub-components: cpu, cache, dram and disk. SPM_power is responsible for predicting the instantaneous power consumed by a physical node based on the data monitored by the SPM_monitor daemon. The work here is an attempt to determine the minimum set of system events that can be used to predict power with an acceptable average deviation error against the actual consumed power. The work has been verified on more than one hardware platform, such as AMD Opteron Sun servers and Intel Pentium D Dell PowerEdge servers; we are able to profile a given physical node with acceptable accuracy using at most four significant system events. Further, we propose a framework to collate the monitored per-node resource and power utilization at a central decision-making node within the cluster, essentially the node hosting the scheduler, with the intention of assisting the implementation of power-aware scheduling policies.

The rest of the paper is organized as follows. Section II gives insight into past approaches to power prediction and some related work. Section III describes the system events that are potentially significant for building a full-system power model. Section IV provides an architectural overview of SPMeter. Section V details the challenges in modelling the monitored system events and describes how the proposed power model was built. Section VI discusses our experimental setup and power model validation, and presents the results obtained by running six benchmark programs. Section VII concludes the paper and highlights some of our future work.

II. Related work

With the increasing popularity of datacenters, there have been numerous research efforts to develop and refine methods that reduce the total power consumed in operating these server farms while maintaining the desired Quality of Service. There are ongoing efforts to optimize power consumption in both software and hardware. One of the most successful and widely implemented hardware techniques for power optimization is Dynamic Voltage and Frequency Scaling (DVFS). DVFS helps minimize power consumption by reducing the cpu operating frequency during idle cycles. Moreover, most modern operating systems implement software to regulate processor speed based on the current system workload.

Moore et al. proposed temperature-aware workload placement in datacenters [8]. Qinghui et al. suggested, based on experiments, an algorithm that deploys jobs in the datacenter according to the physical location of the server nodes in order to minimize the datacenter's cooling cost [5].

Prior efforts towards power prediction can be broadly classified into two categories: first, power estimates based on the usage of significant functional units, and second, the use of performance counters to predict power for significant sub-components within the system. Our power model is closer to the latter approach, estimating power by modelling the correlation of selected system events with the total power consumed. Joseph and Martonosi [] estimate power consumption using performance counters. They used a simulator to produce results and required multiple runs to gather the selected events because of hardware limitations. Wu et al. [] estimate the power consumption of functional units for the Pentium architecture using micro-benchmarks. Economou et al. [] used performance counters to predict the power of blade servers. Lee and Brooks [] built a statistical model trained on selected hardware design parameters and predict power consumption based on the statistical correlation of selected hardware design events. Their approach requires sampling a large design space and estimates power consumption from previous power values for the same design space profiled a priori. This a priori sampling constraint prevents profiling applications for which the sample space is not known. Contreras and Martonosi [] predict power consumption using performance counters on an XScale. Merkel and Bellosa [] use performance counters to estimate power consumption in a symmetric multiprocessing (SMP) system; the processes were shuffled to reduce overheating of any single processor. Bircher et al. [] propose online measurement of full-system power consumption using cpu performance counters. Their work details the performance counters that have a major impact on total power consumption but does not detail how to generate the model to predict runtime power, and provides equations to predict power consumption by individual sub-components of the system.

The closest approach to the work presented in this paper is that of Singh et al. []. Their work proposes a linear regression model using a piece-wise function to estimate power consumption per thread as well as for the full system. The work presented in this paper proposes a low-overhead technique to profile cluster nodes based on their resource and power consumption. We propose a novel approach that models full-system power by subdividing the task into two separate linear regression power models, each formed by combining system events (gathered using cpu performance counters and disk accesses) that have a high degree of correlation. The proposed model is verified on more than one hardware platform, such as AMD Opteron and Intel Pentium D. Moreover, we propose an infrastructure to collate the resource and power utilization of each individual cluster node at a central decision-making node, to assist the scheduler in achieving power-aware scheduling.

Although there is a growing corpus of work in this area, to the best of our knowledge the work proposed in this paper is the first attempt to predict full-system instantaneous power using separate linear models of highly correlated system events, as well as to profile cluster nodes based on their resource and power consumption.

III. System Events Selection

The accuracy with which a power model predicts instantaneous full-system power depends mostly on selecting the system events that correlate more strongly with power consumption than other, less related events. The most significant system events, and therefore the most likely candidates for inclusion in a power model, are:

  1. Cpu Cycles
     Almost all modern chip architectures expose performance counters that count UNHALTED as well as HALTED cpu cycles. Earlier work has shown that the cpu sub-component is one of the major contributors to the total power consumed by the system.

  2. Retired Micro Operations
     This system event accounts for all instructions that completed execution and for which the architectural state was updated. It includes interrupts and exceptions raised during instruction execution. It is another potential event to monitor when predicting cpu power consumption.

  3. Cache Access
     The cache subsystem consists of the on-chip caches, such as the L1 and L2 data and instruction caches (and L3 if available). The cache subsystem is one of the major contributors to total power consumption. Moreover, we observed a high correlation between unhalted cpu cycles and cache accesses in the system.

  4. DRAM Activity
     A miss in the cache leads to a dram access. A dram access can lead to one of three possible scenarios: page hit, page miss or page conflict. The dram access count should be taken into account when modelling full-system power consumption.

  5. Disk Accesses
     A page fault in the dram subsystem leads to an access to the secondary hard disk to fetch the required page. This operation introduces high latency, along with power consumption due to the mechanical nature of hard disks (other than solid state drives). This system event also accounts for Direct Memory Accesses (DMA) issued during the sampling period.

The accuracy of a full-system power model depends on the selection of the input system events monitored to generate the training dataset. After careful observation of various system events, we selected at most four events as input vectors to our power model: unhalted cpu cycles, cache accesses (data and instruction cache accesses), dram accesses and disk accesses (bytes read and written). The selection of events is specific to the given hardware platform, but the major sub-components monitored to predict power are always cpu, cache, dram and disk. The first three system events were monitored using the exposed cpu performance counters. Because no direct performance counter gives accurate disk accesses, we used a utility program to obtain the total disk accesses (reads as well as writes, in bytes) for a given sampling period. This approach also addresses the limit that certain hardware platforms (such as AMD Opteron) impose on the number of performance counters that can be monitored simultaneously, thus achieving real-time power prediction even on those platforms.
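The selection step described above can be illustrated with a minimal sketch that ranks candidate events by the strength of their correlation with measured power. The paper does not state which correlation measure was used; the Pearson coefficient, the function names and the sample values below are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def rank_events(samples, power, top_k=4):
    """Return the top_k event names ordered by |correlation| with power."""
    scores = {name: abs(pearson(values, power)) for name, values in samples.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical per-interval event counts (made-up numbers, illustration only).
samples = {
    "unhalted_cycles": [10, 20, 30, 40, 50],
    "cache_accesses": [12, 19, 33, 38, 52],
    "branch_misses": [5, 3, 6, 2, 5],
}
power_watts = [110, 120, 130, 140, 150]
selected = rank_events(samples, power_watts, top_k=2)
```

With these made-up samples, the two events that track power closely are kept and the weakly correlated one is dropped. On a real node the candidate list and `top_k` would depend on how many counters the platform can monitor at once.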

IV. SPMeter Architecture Overview

The monitoring technique proposed in this paper is composed of three major components: SPM_monitor, SPM_power and SPM_report.

SPM_monitor is a daemon responsible for sampling the selected system events to gather the data that the other components need. We implemented this daemon with a hybrid approach, using both cpu performance counters and a Linux utility program to gather samples for the selected system events. Oprofile, a system-wide Linux profiler [], is used to obtain hardware performance counter values for a given sampling period. Most commercial Linux distributions, such as Red Hat RHEL, install Oprofile by default, so running SPMeter does not require any additional software to be installed on a cluster node. SPM_power is the implementation of our power model. The power modelling approach and its validation are discussed in detail in Sections V and VI respectively.
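The paper does not name the Linux utility used for disk sampling. As one possible way to obtain bytes read and written per sampling period, the sketch below parses two snapshots in the format of the kernel's `/proc/diskstats` file, which reports sectors in 512-byte units. The choice of `/proc/diskstats`, the helper names and the snapshot values are assumptions, not the authors' implementation.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def parse_diskstats(text):
    """Map device name -> (bytes_read, bytes_written) from /proc/diskstats text."""
    totals = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue  # skip blank or malformed lines
        name = parts[2]
        sectors_read = int(parts[5])     # third statistics field
        sectors_written = int(parts[9])  # seventh statistics field
        totals[name] = (sectors_read * SECTOR_BYTES, sectors_written * SECTOR_BYTES)
    return totals

def disk_bytes_delta(before, after, device):
    """Bytes read and written on `device` between two snapshots."""
    r0, w0 = before[device]
    r1, w1 = after[device]
    return r1 - r0, w1 - w0

# Two made-up snapshots taken one sampling period apart.
snap_a = "   8       0 sda 100 0 2000 50 40 0 1000 30 0 60 80\n"
snap_b = "   8       0 sda 130 0 2600 65 55 0 1400 42 0 80 107\n"
read_bytes, written_bytes = disk_bytes_delta(
    parse_diskstats(snap_a), parse_diskstats(snap_b), "sda")
```

On a live node the snapshot text would come from reading `/proc/diskstats` at the start and end of each sampling period; the counters are cumulative, so only the difference between snapshots is meaningful.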

SPM_report generates reports on the resource and power utilization of a cluster node, which can be collated at a central node within the cluster. The proposed monitoring method follows the client-server paradigm, with separate implementations for the client and server nodes. The client periodically generates a resource and power profile of its node using SPM_monitor and SPM_power, and sends this profile to the server within the cluster. The server node is responsible for collecting samples from all participating nodes and maintains a list of the nodes active in the cluster at any given time. The design takes care not to open too many communication ports, which would make the model infeasible to deploy in large cluster environments. Moreover, the list accounts for changes in cluster node state, i.e. it is updated on any node failure or recovery. Figure [1] depicts the various components and their intercommunication.
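The server-side bookkeeping described above can be sketched as a small registry that records the latest profile per node and treats a node as failed when its reports stop arriving. The timeout-based failure detection, the class name and the profile fields are assumptions made for illustration; the paper does not specify how node failure and recovery are detected.

```python
class NodeRegistry:
    """Server-side list of active nodes, refreshed by periodic client reports."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s
        self.reports = {}  # node name -> (timestamp, profile dict)

    def record(self, node, profile, now):
        """Store the latest profile from a node; this also marks it as alive."""
        self.reports[node] = (now, profile)

    def active_nodes(self, now):
        """Names of nodes whose last report is no older than the timeout."""
        return sorted(name for name, (ts, _) in self.reports.items()
                      if now - ts <= self.timeout_s)

    def lowest_power_node(self, now):
        """Hypothetical scheduler helper: active node with the lowest predicted power."""
        active = self.active_nodes(now)
        if not active:
            return None
        return min(active, key=lambda name: self.reports[name][1]["power_w"])

# Made-up reports; timestamps are plain seconds to keep the example deterministic.
registry = NodeRegistry(timeout_s=30.0)
registry.record("node1", {"power_w": 120.0, "cpu_util": 0.8}, now=0.0)
registry.record("node2", {"power_w": 95.0, "cpu_util": 0.3}, now=10.0)
alive_early = registry.active_nodes(now=20.0)
alive_late = registry.active_nodes(now=35.0)   # node1 has gone silent
registry.record("node1", {"power_w": 90.0, "cpu_util": 0.2}, now=36.0)
alive_after_recovery = registry.active_nodes(now=36.0)
```

Passing the current time in explicitly keeps the sketch testable; a deployed server would use a monotonic clock and receive the profiles over a single listening socket rather than one port per client.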

V. Power Model Details

Our power model, SPM_power, uses the data monitored by SPM_monitor to predict the total power consumed by the system at any given time. We made several attempts to develop a power model, with two main objectives: low overhead in terms of runtime power computation, and reasonably accurate results when compared against an externally attached full-system power meter. The following paragraphs describe the approaches we tried before settling on the power model proposed in this paper.

As discussed in the earlier sections, we based our power model on measuring the power consumed by the four major sub-components of the system, i.e. cpu, cache, dram and disk. Our first attempt was to find a four-dimensional linear regression model that fits the input variables and thus predicts instantaneous power, of the form

$$ P = C + c_1 E_{cpu} + c_2 E_{cache} + c_3 E_{dram} + c_4 E_{disk} $$

where E_{cpu}, E_{cache}, E_{dram} and E_{disk} denote the monitored event counts for the four sub-components, C accounts for idle power, when the system is running no jobs other than the housekeeping done by kernel threads, and c_1, c_2, c_3 and c_4 were determined by training the model on data obtained by running benchmark programs. This attempt was a failure, and its accuracy was highly dependent on the benchmark programs. We then tried to refine the four-dimensional linear model into an N-degree polynomial with four input vectors. Even with this modification, we were not able to achieve an average error deviation of 20% or less against the actual measured power on most of the benchmark programs.
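To make the first attempt concrete, here is a minimal sketch of fitting an idle-power constant and four coefficients by ordinary least squares. The paper does not say how the regression was trained; the normal-equations solver, the function names and the synthetic training data (generated from known coefficients so the fit can be checked) are all assumptions.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear(rows, power):
    """Least-squares fit of P = C + c1*x1 + ... + ck*xk via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the constant C
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * p for d, p in zip(design, power)) for i in range(k)]
    return solve(xtx, xty)

def predict(coeffs, row):
    """Evaluate the fitted model on one (cpu, cache, dram, disk) sample."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))

# Synthetic samples (cpu, cache, dram, disk) generated from known coefficients.
true_coeffs = [100.0, 2.0, 1.0, 0.5, 3.0]
rows = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),
        (0, 0, 0, 1), (1, 1, 1, 1), (2, 1, 0, 3)]
power = [predict(true_coeffs, r) for r in rows]
fitted = fit_linear(rows, power)
```

Because the synthetic data is noise-free, the fit recovers the generating coefficients. Real counter data would be scaled first, since raw cycle counts are many orders of magnitude larger than power values and make the normal equations ill-conditioned.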

To understand the input vector space better, we drew 3D plots of three input vectors at a time and observed that the data can be clustered into fewer dimensions. The next approach we tried was to partition the input space into clusters and use the resulting centroids as a priori information for modelling unseen test data. Even though the input vector space can be classified into a few clusters, this approach was not much better than the earlier attempt at modelling instantaneous power. One reason for this failure was that the model could not capture the proper correlation of the input vectors with total power consumption. Moreover, the failure may be due to the broad spectrum of values in the input vector space as the system goes from idle power to its maximum power consumption for a particular benchmark program.
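The clustering attempt can be sketched as follows. The paper names neither the clustering algorithm nor how the centroids were used for prediction; this example assumes k-means and predicts the power of an unseen sample as the mean training power of its nearest centroid. All names and data points are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def cluster_power(points, power, centroids):
    """Mean measured power of the training samples assigned to each centroid."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p, w in zip(points, power):
        i = nearest(p, centroids)
        sums[i] += w
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Made-up two-dimensional samples forming a low-load and a high-load group.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (9, 8), (8, 9)]
power = [100, 200, 105, 103, 195, 198]
centroids = kmeans(points, k=2)
per_cluster = cluster_power(points, power, centroids)
predicted = per_cluster[nearest((8.5, 8.5), centroids)]
```

The sketch also shows why such a model is coarse: every sample that falls in the same cluster receives the same prediction, so the wide range between idle and peak power within one benchmark is flattened.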

With the lessons learned from the earlier attempts, we ran more experiments to better understand the correlation between the input variables. The system workload at any given time can be broadly classified into two major categories: cpu-intensive or I/O-intensive. We observed that the dimensionality of the input vector space can shrink from four dimensions to two, with two separate power models that estimate the power of the cpu subsystem separately from that of the I/O subsystem. The cpu and cache (data and instruction) sub-systems work more closely together when the system is running cpu-intensive jobs, and the same holds for the dram and disk components. This understanding motivated us to remodel our approach as two separate two-dimensional linear regression models with input vectors {cpu, cache} and {disk, dram}.

There still remained the task of combining these two separate models to obtain a reasonably accurate estimate of full-system power. To solve this problem, we combined the two estimates in a weighted equation of the form

$$ P_{total} = \alpha P_{cpu,cache} + \beta P_{disk,dram} $$

The above equation can estimate the runtime power consumption of the full system irrespective of the nature of the workload being run, by combining the power factors from both the cpu-cache and disk-dram sub-systems. The constants α and β are determined heuristically by closely observing the dataset obtained from running computationally diverse benchmark programs and understanding the contribution of each individual power model to total power consumption. As expected, the experimental results obtained using this method were much better than those of the earlier approaches.
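A minimal sketch of the two-model scheme, assuming the combination is a weighted sum of the two sub-model estimates. The coefficient values, the weights `alpha` and `beta`, and the function names are invented for illustration; the paper determines its constants heuristically and does not list them here.

```python
def linear_power(coeffs, x1, x2):
    """Two-input linear model: c0 + c1*x1 + c2*x2."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x1 + c2 * x2

def total_power(sample, cpu_cache_coeffs, disk_dram_coeffs, alpha, beta):
    """Weighted combination of the {cpu, cache} and {disk, dram} sub-models."""
    p_cpu_cache = linear_power(cpu_cache_coeffs, sample["cpu"], sample["cache"])
    p_disk_dram = linear_power(disk_dram_coeffs, sample["disk"], sample["dram"])
    return alpha * p_cpu_cache + beta * p_disk_dram

# Hypothetical coefficients and weights (assumed values, not from the paper).
cpu_cache_coeffs = (100.0, 2.0, 1.0)
disk_dram_coeffs = (100.0, 0.5, 0.25)
sample = {"cpu": 10.0, "cache": 20.0, "disk": 8.0, "dram": 4.0}
estimate = total_power(sample, cpu_cache_coeffs, disk_dram_coeffs,
                       alpha=0.75, beta=0.25)
```

Each prediction costs only a handful of multiplications and additions per sample, which is consistent with the low-overhead goal stated for SPM_power.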

There remains an open question: should the power model treat the input vectors with a linear regression model, with an N-degree polynomial (N > 1), or with some other mathematical relation (logarithmic, exponential, sinusoidal, etc.) that fits the data best? With the intention of improving our model, we performed experiments to determine the polynomial degree that provides the best fit to the input vector space. Below are the results, giving the average deviation error obtained by running six well-known benchmark programs with increasing polynomial degree in the power model equations:

In line with the law of diminishing returns, increasing the complexity of the model does not yield a substantial improvement in accuracy. We therefore conclude that a linear regression model incurs the least overhead while achieving a comparable average deviation error.
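The degree comparison can be illustrated with a small sketch that fits polynomials of increasing degree to the same data and reports the average percentage error of each. For brevity it uses a single input variable and synthetic, nearly linear data; the paper's models take two inputs and its measured results are not reproduced here. All names and values are assumptions.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    design = [[x ** d for d in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def mean_pct_error(coeffs, xs, ys):
    """Average absolute deviation from the measured value, in percent."""
    errors = []
    for x, y in zip(xs, ys):
        estimate = sum(c * x ** d for d, c in enumerate(coeffs))
        errors.append(abs(estimate - y) / y)
    return 100.0 * sum(errors) / len(errors)

# Synthetic data: a linear trend plus small alternating measurement noise.
xs = [float(i) for i in range(8)]
ys = [100.0 + 10.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
errors_by_degree = {d: mean_pct_error(polyfit(xs, ys, d), xs, ys) for d in (1, 2, 3)}
```

On data like this the higher-degree fits cannot do much better than the straight line, while each extra degree adds terms to evaluate at every sampling interval, which is the tradeoff the conclusion above rests on.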
