Performance Modeling and Dynamic Pipeline Optimization

Overview

Many visualization component architectures use a dataflow pipeline paradigm for their distributed execution model. This organization underpins the most successful visualization packages, such as AVS, IBM's Data Explorer, and the Visualization Toolkit (VTK). In the simplest form, all components of the application pipeline reside on a single platform. Early attempts at distributed execution required the user to manually partition the components into two groups, one running locally and the other running on a remote host. Such a partitioning is static: it never changes in response to changing application needs or environmental conditions.

However, there is no a priori way to select an optimal (or even tolerable) pipeline distribution at startup without first being able to accurately predict the performance of the individual components. Because no such performance models exist, placement of components to date has been entirely heuristic. Given a performance model for each component in a pipeline, we can create a composite parametric model that accurately predicts the overall performance of the pipeline, and can therefore make quantifiably optimal selections for the distribution of the components across resources. The ability to place components optimally will be a core requirement for the resource selection mechanisms needed for effective Grid computing. Note that "pipeline elements" consist not only of individual software components, but also of the "pipes" through which data flows between components.
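
As a minimal sketch of such a composite model, assume a strictly linear pipeline whose total time is the sum of per-component compute times plus the transfer time of every "pipe" that crosses hosts. The function names, the host and component names, and all numbers below are hypothetical illustrations, not measurements or an existing API, and exhaustive enumeration is only practical for small pipelines.

```python
from itertools import product

def pipeline_time(placement, compute_s, link_mb_per_s, data_mb):
    """Estimate end-to-end seconds for one placement of a linear pipeline.

    placement     -- host name for each component, in pipeline order
    compute_s     -- compute_s[i][host]: predicted seconds for component i on host
    link_mb_per_s -- link_mb_per_s[(a, b)]: predicted bandwidth from host a to b
    data_mb       -- data_mb[i]: megabytes flowing from component i to i + 1
    """
    total = sum(compute_s[i][host] for i, host in enumerate(placement))
    for i in range(len(placement) - 1):
        a, b = placement[i], placement[i + 1]
        if a != b:  # a pipe only costs transfer time when it crosses hosts
            total += data_mb[i] / link_mb_per_s[(a, b)]
    return total

def best_placement(hosts, compute_s, link_mb_per_s, data_mb):
    """Enumerate every placement and return the one with the lowest estimate."""
    candidates = product(hosts, repeat=len(compute_s))
    return min(candidates,
               key=lambda p: pipeline_time(p, compute_s, link_mb_per_s, data_mb))

# Hypothetical three-stage pipeline: reader -> isosurface -> renderer.
hosts = ["local", "remote"]
compute_s = [
    {"local": 8.0, "remote": 1.0},   # reader
    {"local": 6.0, "remote": 1.0},   # isosurface
    {"local": 0.5, "remote": 5.0},   # renderer
]
data_mb = [500.0, 20.0]              # raw data, then extracted geometry
link_mb_per_s = {("local", "remote"): 10.0, ("remote", "local"): 10.0}

best = best_placement(hosts, compute_s, link_mb_per_s, data_mb)
best_time = pipeline_time(best, compute_s, link_mb_per_s, data_mb)
print(best, best_time)
```

With these made-up inputs the cheapest placement keeps the two data-heavy stages on the remote host and ships only the small geometry to the local renderer, which is the kind of choice a manual partitioning has to guess at.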

The performance of a visualization pipeline also varies dynamically as a function of input data, user parameters, and environmental conditions (e.g., competing but measurable traffic on shared network links). Therefore, the performance model must continuously estimate the performance of visualization "applications" consisting of multiple software components deployed in a heterogeneous and distributed environment. A resilient pipeline flow executive must be able to obtain a quantitative measure or estimate of performance for each element in the distributed application, and should also be able to modify its use of resources at run time to achieve optimal performance. The algorithm must take into account the cost of redefining the dataflow topology as well as the potential impact on user interactivity requirements.
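
One way to picture this trade-off is a simple break-even test: reconfigure only if the one-time cost of changing the topology fits within an interactivity budget and is repaid by the expected per-frame savings. The sketch below is an illustration under those assumptions; the function name, its parameters, and the idea of a fixed "frames remaining" horizon are hypothetical and not part of any existing executive.

```python
def should_reconfigure(current_s, candidate_s, migration_s,
                       frames_remaining, max_stall_s):
    """Decide whether switching to a candidate placement is expected to pay off.

    current_s        -- estimated seconds per frame for the current placement
    candidate_s      -- estimated seconds per frame for the candidate placement
    migration_s      -- estimated one-time cost of redefining the topology
    frames_remaining -- assumed number of frames still to be produced
    max_stall_s      -- longest pause the user's interactivity needs allow
    """
    if migration_s > max_stall_s:
        return False  # the pause itself would violate interactivity
    gain_per_frame = current_s - candidate_s
    # Reconfigure only when the accumulated savings exceed the one-time cost.
    return gain_per_frame * frames_remaining > migration_s

worth_it = should_reconfigure(4.0, 2.5, 10.0, 20, 15.0)      # large gain
marginal = should_reconfigure(4.0, 3.9, 10.0, 20, 15.0)      # tiny gain
too_disruptive = should_reconfigure(4.0, 1.0, 30.0, 100, 15.0)  # long stall
print(worth_it, marginal, too_disruptive)
```

A real executive would presumably re-evaluate this test whenever the per-element estimates change, rather than once.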

Efforts to support such dynamic redeployment of Grid-based simulation codes have been hampered by the cost of migrating the entire simulation state to a new resource in response to a "contract violation." The costs of migrating the application in this situation often far exceed the benefits of the improved performance of the new deployment. Visualization offers a unique opportunity to support greater flexibility in dynamic job migration because, unlike simulation codes, each time a new dataset is loaded from disk, the entire visualization pipeline state is "flushed," thereby permitting dramatic re-distribution of the components without the need to migrate large amounts of state or checkpoint-restart information.

Accurate pipeline performance prediction also requires accurate models of network performance and current capacity. A number of network monitoring systems are in development, such as LBL's Network Characterization Service (NCS) and Web100, which use indirect methods (testing at the endpoints of the network) to estimate network topology, link bandwidth, and available capacity. We have also been investigating direct methods of extracting these parameters using tools like Network Ferret. A SciDAC project, the Bandwidth Estimation project, aims to provide the means for applications to detect and respond to changes in network performance characteristics. We will investigate the relative advantages and disadvantages of these different methods for the purpose of accurately estimating pipeline performance.

There is an opportunity to become more deeply involved with the Grid community to define a set of "standards" whereby Grid-enabled components can "publish" performance estimates. An external process can then use the published performance estimates to derive an estimate of aggregate performance. We will use performance estimates of the pipeline components to determine if a given pipeline distribution is optimal, or if the distribution should be reconfigured to improve performance. The performance model must include the cost of resource migration and balance it against the nature of the contract violation that has prompted the need for redeploying the pipeline. We will investigate the use of empirical/history-based, extrapolated, and (if possible) fully parameterized performance models for some comparatively simple visualization components and the network resources that tie them together.
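
To make the empirical/history-based and extrapolated options concrete, the sketch below fits a straight line to past timings of a single component and extrapolates to a larger input. It assumes that run time grows roughly linearly with input size, which may not hold for real visualization components; the function name and the sample history are hypothetical.

```python
def fit_linear(history):
    """Least-squares fit of seconds = a + b * size_mb.

    history -- list of (size_mb, seconds) samples from earlier runs
    Returns the intercept a and slope b.
    """
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical timing history for one component: (input size in MB, seconds).
history = [(100.0, 1.2), (200.0, 2.2), (400.0, 4.2)]
a, b = fit_linear(history)

# Extrapolate to an input larger than any seen so far.
predicted_s = a + b * 800.0
print(a, b, predicted_s)
```

An estimate of this kind is what a component could publish for an external process to combine into an aggregate pipeline estimate; a fully parameterized model would replace the fitted line with terms derived from the algorithm itself.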

This information will likely take the form of a new data schema that is published through a standard Grid information service like the Globus MDS. We will also explore a way to expose the performance modeling information through custom MDS information providers and whether such data can be tied into the emerging CIM data model for resource description. Such information will be useful to virtually any distributed application architecture. Therefore, this development work has the potential to impact a wide array of Grid computing activities that are far outside its original scope.

Research objectives

  1. Define the scope of the variables needed to accurately model the performance of a visualization application consisting of software components deployed in a wide-area fashion.
  2. Investigate ways to obtain data for the variables. Foster collaborative ties with other appropriate research groups to define the means to obtain data.
  3. Investigate network performance monitoring and modeling frameworks in development by the network research community. Accurate and current network performance estimation is a critical component of the process of pipeline performance estimation.
  4. Using performance data and performance estimates, perform dynamic pipeline reconfiguration and optimization.
  5. Establish a collaborative relationship with the Global Grid Forum ACE (Advanced Collaborative Environments) working group to foster a two-way exchange of ideas and standards for performance modeling and estimation. Visualization is a key Grid application that will be used to test and refine performance measurement and optimization concepts.