Nancy Johnston
Visualization Group
The mission of the Visualization Group at Lawrence Berkeley National Laboratory is to enhance the field of visualization by working with researchers at LBNL and the National Energy Research Scientific Computing Center (NERSC). As part of this mission, we maintain a visualization facility, provide low-level and high-level visualization software, and investigate new ways of visualizing data.
The main focus of visualization group members is participation in multidisciplinary teams. Frequently these are long-term collaborations, but we also provide more casual short-term consulting services. In both cases, we serve a user community from a diversity of disciplines, including life sciences, earth sciences, materials sciences, nuclear sciences, research medicine, physics, and mathematics.
Of particular concern to the group is the support of remotely located NERSC users. The wide distribution of NERSC users throughout the United States poses unique problems associated with providing user access to higher-end visualization resources. Developing and adapting our tools to address these issues is an ongoing direction for the group.
A successful and interesting, but short-lived, experiment which addressed both information and technology dissemination and technology sharing took place during 1991-1995 with the International AVS Center. During this time, the LBL Visualization Group contributed some 80 AVS "modules" to the user community. This effort was well received, as evidenced by the three awards given to LBL by the International AVS Center. AVS shortcomings aside, this model was highly successful for a number of reasons.
In addition to source code distribution, we have published numerous papers, primarily in the area of visualization engineering. More specifically, most of these papers address the theory and implementation of scalable desktop-to-immersive Virtual Reality interfaces for scientific visualization.
Another venture into tool-sharing was carried out via the legacy of the old SLATEC group. These members agreed to "share code." However, due to lengthy delays induced by attorneys at the various laboratories, it took some four years before attorneys from the original SLATEC participants' organizations signed a new agreement which blessed code sharing. LBL was the principal contributor, making the AVS modules available as well as some miscellaneous utilities for visualization. This group has since become inactive.
A good way to satisfy 80% of users' needs with 20% of the effort is to use stable, broadly applicable software which is commercially supported. That software should have good documentation, good support including tutorials and examples, and should ideally be easy to learn. A good example of spending 80% of the effort on 20% of users' needs would be the customization required to enable an older Fortran code, for example, which runs in batch mode to run in an interactive environment which includes visualization as well as "steering." In the former case, users are often "on their own" in terms of doing the visualizations, but can and do call on our group for consulting. This 20% effort rarely advances the state of visualization science, but does make many users happy. The 80% effort produces papers, often results in some sort of advance in visualization science, occasionally may be peripherally involved in a new scientific discovery, but is often speculative.
To date, our efforts to customize user codes to run in an interactive visualization environment have proceeded along one of two paths. One path is the integration of such a code into a dataflow environment, such as AVS or Khoros. In order to accomplish this goal, "hooks" must be manually placed within the code at judicious locations; these are subroutine calls which copy data out of the simulation and into the visualization environment. The second path requires a similar step, but uses custom socket code to achieve the same goal. Both approaches require that some sort of user interface be built to control the code. The UIs typically consist of subroutine calls to package-specific widget-type objects.
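A minimal sketch of the second (socket) path may help make the "hook" idea concrete. The function names (viz_hook, read_timestep) and the wire format (a step-number/count header followed by doubles) are invented for illustration; they are not our production code.

```python
import socket
import struct

def viz_hook(sock, step, values):
    """Called from inside the simulation loop: copy one timestep of
    data out of the simulation and into the visualization environment."""
    payload = struct.pack("!ii", step, len(values))
    payload += struct.pack("!%dd" % len(values), *values)
    sock.sendall(payload)

def read_timestep(sock):
    """Counterpart on the visualization side: read one timestep back."""
    header = _recv_exact(sock, 8)
    step, n = struct.unpack("!ii", header)
    data = _recv_exact(sock, 8 * n)
    return step, list(struct.unpack("!%dd" % n, data))

def _recv_exact(sock, nbytes):
    """Read exactly nbytes from the socket (recv may return short reads)."""
    buf = b""
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return buf

if __name__ == "__main__":
    # A socketpair stands in for the simulation <-> visualization link.
    sim_side, viz_side = socket.socketpair()
    viz_hook(sim_side, step=3, values=[1.0, 2.5, 4.0])
    print(read_timestep(viz_side))
```

In a real port, the call to viz_hook is the manually placed hook; everything on the other side of the socket belongs to the visualization environment.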
One of our goals for future work includes the exploration of tools and techniques which will streamline this process: take what has been essentially a customization phase and put it back into the "mainstream," covered by the tools in the 20% category.
We work with a variety of researchers who use (or will soon need to use) hierarchical data structures in their computations. The need for these data models arises when the data cannot be effectively represented at one resolution. It can also arise when different portions of the computation are more naturally represented in different ways (e.g., different coordinate systems). The computations lead to very large data sets which contain a large number of variables over a large range of scales. As a result, we are looking at ways to deal with the visualization needs of these researchers.
We are working with researchers in the earth sciences, computational fluid dynamics, chemistry, and biochemistry to help them visualize their specific multiresolution, hierarchical data. From these collaborations, we are getting a better understanding of which problems (and solutions) they have in common and which problems need to be solved on a case-by-case basis. These collaborations also let us explore the balance of computing and data distribution in the visualization process (e.g., geometry computed on the NERSC T3E, the geometry rendered on the NERSC Onyx2, and the resulting image displayed on the researcher's local workstation). This in turn has led to experimenting with visualization tools on a wide spectrum of computing architectures over a wide range of communication networks, each with its own advantages and disadvantages.
The CCSE at LBNL/NERSC developed a C++ class library, Boxlib, which supports adaptive mesh refinement (AMR). This library is freely available and is being used by researchers around the world. It was vectorized on the NERSC Cray C90, where it achieved close to the theoretical peak performance of that machine. It has now been ported to and parallelized on the NERSC T3E. On the T3E, the CFD calculations will produce hundreds of timesteps with gigabytes of data per timestep. In order to avoid moving this large amount of data off the T3E for visualization, much of the visualization will be done on the T3E. One of our current research projects is to parallelize software volume rendering so that it can produce images at several frames a second from a 1K x 1K x 1K data set.
Another group at LBNL/NERSC (ANAG) is extending Boxlib to include more complicated data at the individual cell level. Specifically, they are adding geometry which does not lie on the grid and can cause cells to be multivalued. We are working with them to extend Khoros to enable them to perform simple operations (e.g., select a scalar, take 2D slices, take 3D subvolumes) and then display the results in a number of ways (e.g., color image, zoomed image, cell spreadsheet). These tools will be used to help them debug their extensions and to look at data sets once the code is working. Also, to avoid "reinventing the wheel," we are reusing existing computational tools in Khoros: we are writing utilities to transform the AMR data into a stream of numbers, which can be operated on by existing Khoros tools, and then back to AMR data. In this way AMR data with the same hierarchical structure can be processed in Khoros.
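The AMR-to-stream round trip can be sketched as a pair of functions: one flattens the hierarchy into a stream of numbers plus a separate structure record, and the other rebuilds the hierarchy after a flat tool has processed the stream. The in-memory layout used here (levels as lists of box/value pairs) is invented for illustration and is not Boxlib's actual representation.

```python
def flatten_amr(levels):
    """Serialize AMR data into (stream, structure).

    `levels` is a list of refinement levels; each level is a list of
    (box, values) pairs, where `box` is any hashable box descriptor and
    `values` is the list of cell values in that box.
    """
    stream, structure = [], []
    for level in levels:
        level_rec = []
        for box, values in level:
            level_rec.append((box, len(values)))  # remember shape, not data
            stream.extend(values)                 # data goes into the flat stream
        structure.append(level_rec)
    return stream, structure

def unflatten_amr(stream, structure):
    """Rebuild the AMR hierarchy from a (possibly processed) flat stream."""
    levels, pos = [], 0
    for level_rec in structure:
        level = []
        for box, count in level_rec:
            level.append((box, stream[pos:pos + count]))
            pos += count
        levels.append(level)
    return levels
```

Any existing flat tool (a scale, a threshold, a filter) can then operate on `stream` alone, and `unflatten_amr` restores the result to the original hierarchical structure.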
Web-based technology in scientific visualization will continue to play a major role in the Visualization Group's efforts. One of the primary uses of web-based technology is to enhance the dissemination of information to and among collaborators. With many of our users scattered throughout the United States, web-based technology has helped us in educating and working with our remote users in a collaborative fashion. Currently, most of this interaction requires third-party tools and information sharing in non-real time.
A rich area of research is the development of software tools that allow users to generate and manipulate scientific visualizations over the Internet. This is especially applicable to our users who are not located at LBNL and thus not able to utilize our visualization facility. One of the strongest points of web-based technology is the relative hardware independence of most web-based tools. This is especially true of tools developed using Java as the front end. Although performance may vary widely from hardware platform to hardware platform, the cross-platform capability ensures, for the most part, a common method of interaction and display regardless of the type of system it is being run on. This flattens the learning curve required of the user and diminishes the need for specific hardware systems. We will continue to develop tools and use off-the-shelf software that will allow some of our remote users the capability of visualizing their data from within a web browser.
Most tools that have been developed for web use cannot adapt themselves to the underlying network. Although users are presented with a similar or common interface, performance can vary widely between different hardware platforms. Accordingly, we are continuing to research the development of tools that utilize web-based technology and are "network aware." The concept of network-aware applications is not a new idea, but it has been given a fair amount of attention lately with the explosive growth of the Internet. A network-aware application monitors the underlying network capabilities and allows the user to adjust performance by determining how the application should be running. It is analogous to allowing a user to change how a visualization is rendered according to the rendering capabilities of the hardware. (In some instances the user may want to see only a wireframe to increase the rendering rate or to reduce the amount of bandwidth being used by the application.)
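The adaptation described above (dropping to wireframe when bandwidth is scarce) can be sketched as a simple policy function. The mode names and the per-mode data fractions below are illustrative assumptions, not measurements from any real tool.

```python
def choose_render_mode(bandwidth_bps, scene_bytes, target_fps, modes):
    """Pick the richest rendering mode whose per-frame data volume fits
    the measured bandwidth at the desired frame rate.

    `modes` is ordered richest-first; each entry pairs a mode name with
    the (assumed) fraction of the full scene's bytes that mode transmits
    per frame.
    """
    byte_budget = bandwidth_bps / 8.0 / target_fps  # bytes available per frame
    for name, fraction in modes:
        if scene_bytes * fraction <= byte_budget:
            return name
    return modes[-1][0]  # nothing fits: fall back to the leanest mode

# Hypothetical mode table: full shading, flat shading, wireframe.
MODES = [("full_shaded", 1.0), ("flat_shaded", 0.4), ("wireframe", 0.05)]
```

For a 1 MB scene at 15 frames per second, a 155 Mb/s link supports the full mode, a 100 Mb/s link drops to flat shading, and a 56 kb/s modem falls all the way back to wireframe. A network-aware tool would re-run this decision as its bandwidth measurements change.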
Visualization of data has evolved over the years from simple X/Y plots into complex three-dimensional models, complete with interaction techniques. Over the past few years, numerous research groups have explored techniques which facilitate the use of a variety of input devices to interact with data, such as probes, streamline rakes, etc.
One of the useful lessons learned from these projects is that a good user interface enables "point and click" and "net surfing"-style navigation using hyperlinks. Freeing the user to explore and "mine" data is a highly relevant topic to visualization efforts of today and the near future.
Systematizing this process will form the basis for research efforts over the next few years. While it is presently possible to construct systems which permit three-dimensional navigation and traversal through heterogeneous data, each such system is a customization, and thus falls into the 80% rather than the 20% category (see Sec. 2.1, above). Such mechanisms are not currently supported in commercially available software. The closest thing available "commercially" is VRML, which has embedded hyperlinks in the model itself. VRML, however, was not originally developed with scientific visualization in mind. Since VRML is an evolving standard, there are still many issues to be resolved before VRML can become a staple for scientific visualizations.
Currently, one of the major drawbacks of VRML for scientific visualization is its lack of scaling. VRML was developed with the thought that most of the objects being rendered would be of a basic geometric shape, i.e., cubes, spheres, and cones. Unfortunately, these types of objects are typically in the minority in scientific visualizations. The mechanism for displaying non-basic geometry is unwieldy and, in some cases, such as volume rendering, non-existent.
Another major drawback with VRML is the lack of an interpretation standard across different hardware platforms. A VRML scene viewed on one platform can look completely different when viewed from another platform. This can have disastrous effects on a scientific visualization.
Communication between and among laboratories is informal. People tend to know only what personal acquaintances are researching or developing. Moreover, visualization activities throughout the laboratories are so extensive that it is impossible to stay abreast of them all without access to a centrally-located store of information. At the same time, ignorance of whom to call to get access to software and/or to obtain necessary permission to use it, etc. both contribute to a barrier to the sharing of tools. One possible approach to redress these problems would be the formation of a central MICS-sponsored Web site.
The purpose of this site would be to provide a repository for software, publications, and announcements which could then be used by all MICS-sponsored sites. In order to make such an idea work, a person would need to be committed to run the project, and endowed with the authority to collect information. Further incentive might take the form of making contributions to the Web site a condition of funding, so that projects would be required to disseminate via this site.
Even with "flat" data formats there is a "Tower of Babel" problem: everyone uses a different format. Typically, users either have no format or they've "extended" a format for their special needs. With hierarchical data formats, the problem is exacerbated: there are very few standards (e.g., HDF), and these are almost always too rigid for the researchers. In the "flat" case, we can translate most formats into a single format and then work with that. With the hierarchical formats this isn't currently the case. Also, we will probably want to impose a hierarchical format on some large, "flat" data sets to facilitate viewing and navigating within them.
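The single-format strategy for flat data amounts to a hub-and-spoke translator: one reader per foreign format into a common in-memory form, so N formats need N readers rather than N x N pairwise converters. A small sketch, with invented format names and trivially simple layouts:

```python
# Registry mapping a format name to a reader that produces the common
# in-memory form (here: a list of rows of floats).
READERS = {}

def reader(fmt):
    """Decorator: register a reader for the named (hypothetical) format."""
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register

@reader("comma_separated")
def read_comma_separated(text):
    return [[float(x) for x in line.split(",")] for line in text.splitlines()]

@reader("space_separated")
def read_space_separated(text):
    return [[float(x) for x in line.split()] for line in text.splitlines()]

def to_common(fmt, text):
    """Translate any registered format into the single common form."""
    return READERS[fmt](text)
```

All downstream visualization tools then operate only on the common form. The difficulty noted above is that hierarchical formats do not yet have an agreed-upon common form to translate into.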
As indicated above, dataflow environments have many positive qualities which have led to their proliferation over the past few years. However, as these tools have become more widely used, their shortcomings have become more apparent.
The user-written modules that extend the dataflow environment often are evolved from a computational kernel, such as a convolution, or minimization solver. It is a straightforward matter to "put a wrapper" around kernels such as this, and conceptually, only a few lines of code would need to be changed to move from one dataflow environment to another. The amount of code to be changed begins to rapidly increase as the number and type of data to be communicated to and from the user-written module increase. Often, there is sufficient difference between data types and objects from one vendor to another to make the porting process very time consuming.
In a dataflow model, data typically flows in from a "source", through several processing stages, then to a "sink." Common sources are data files or custom-modified simulation codes which generate data. Common sinks are renderers or tools for writing data to secondary storage. All currently available dataflow models accommodate this type of topology. Complications arise when "feedback loops" appear within the dataflow graph. Such a feedback loop is used to pass information from one processing stage back "upstream": for example, a computation requiring data from the renderer, or an iterative solver computation repeating until it reaches a desired level of tolerance. But however useful looping (and branching) may be, many commercially available systems (selling for thousands of dollars) do not support such fundamental programming constructs.
The dataflow packages are broadly based upon data structures, and not data objects or data models. As such, these systems suffer from serious limitations in terms of memory consumption. If data doesn't fit into memory, including the multiple copies made by the system, computation cannot continue. With increasing amounts of data coming from the research community, this limitation is no longer acceptable. Even when a robust data model is supported, user codes (computational kernels) will need to be re-architected to take advantage of the new data model. This activity will consume a significant amount of effort.
It takes a good deal of effort to port any code from a batch-oriented model into a dataflow environment. Execution control within the simulation has to be re-engineered so that the simulation is run according to the dataflow process scheduler, and data are "nicely" communicated to and from the outside world. It is not uncommon for a couple of person-months to be spent doing a port.
Each of the different dataflow packages has a very steep learning curve, thereby making them unattractive to most users. By and large, it is the visualization development community which has embraced these systems, for one of their primary redeeming qualities is the environment for rapid prototyping of visualization algorithms.
One barrier to interoperability is the difficulty of sharing code. Code sharing is difficult because the national laboratories use a variety of commercial and homegrown visualization packages. It is difficult to agree on a particular product because of our user community and our relationships with various vendors.
For example, some labs use IBM Data Explorer because they have developed a good working relationship with IBM and get answers to their questions and problems quickly. We find that getting the license information from them is difficult. This isn't to say that IBM is a bad company, but that we have all built up relationships with various vendors, and the lists are disjoint.
Another barrier is that no one package meets all of our needs. For example, high-energy physicists' and environmental researchers' requirements usually are different. Additionally, we all have a large investment in our current software. Besides the software we have written, our users have made a huge investment in learning these systems and modifying their codes to use these products.
One problem is that most of the visualization development has occurred under UNIX and in an X-windows environment. These codes do not simply "port" to Windows. The only thing that X and Windows have in common is that both are "windowing systems." It is not reasonable to expect that many tools, if any at all, will be ported from X to Windows.
There are at least two commercially available X-servers for the NT platform. One, from OpenNT (nee Intergraph) claims to support OpenGL but crashes when used. The other, from Hummingbird, has a better track record in terms of stability. However, these tools just provide a way to use existing X code on the PC, rather than addressing the porting issue. These products will cost the users money, in addition to the nickels-and-dimes they are already spending.
Another barrier is access to the graphics hardware itself. The most promising path to using the hardware is through OpenGL, a widely accepted and implemented API for 3D graphics. Most commercially available visualization tools are built using OpenGL, in order to make optimal use of graphics hardware. OpenGL, while "supported" by WindowsNT, is available only in the Win32 subsystem, not the Posix subsystem, and as such, is not usable by tools running from within one of these X-emulation models. Writing an OpenGL DLL for graphics hardware is difficult and time-consuming, and it won't happen unless vendors see a significant commercial opportunity.
Games technology might be considered a viable source (or inspiration) of scientific visualization tools. Descent and Doom, for example, are dynamic, first-person games with stunning interactivity and graphics. Furthermore, working under severe RAM constraints, games developers learned how to cut corners by squeezing the last drop out of the hardware through tricks and hacks and other engineering marvels. Scientific visualization tools, on the other hand, tend toward the "completely general" design. The same tool has to work with astrophysics data, with chemistry data, with GIS data, and so forth. Thus the corner-cutting approach, while seemingly promising, is inappropriate for use in scientific visualization. The amount of data to be processed in a scientific visualization can far exceed anything games technology is capable of handling. Moreover, game platforms such as PCs are fine for X/Y plots and a few specialized tasks, but they are not designed in the same way as conventional graphics workstations. A complete discussion of this topic is beyond the scope of this paper, but our basic contention is that the labs should not be put into a position of "doing ports" of visualization tools to a PC platform (even as many of us are currently required to do so).
We have found that visualization is either ignored in the planning stages or cut out of projects because of funding shortages. How do we get both the researchers and the funding agencies to agree that visualization is important?
There are many long-term problems associated with remote visualization. Because NERSC users are scattered throughout the United States and their problem domains are diverse, there is no universal solution to the challenge of providing visualization services to remote sites.
One of the primary problems associated with remote visualization is the varying amount of network bandwidth between LBNL and remote sites. Such a dynamic problem suggests a dynamic solution: visualization tools which can adapt to the underlying bandwidth.
Since we have little control of the underlying bandwidth capabilities, any applications run over the Internet will have to take into consideration the amount of the data that can be transferred rapidly so that the user will not perceive serious latency delays. In some cases, with low bandwidth, it may be more advantageous to perform as much of the rendering at the user site as possible.
Another major issue in solving remote visualization problems at LBNL is the varying level of hardware infrastructure (i.e., graphics and computing capacity as well as software tools) at the remote user sites. This diversity is most apparent amongst NERSC users. The users range from having access to only X terminals to having access to CAVEs. Since we do not control the hardware at the remote sites, the best we can hope for is to support a fairly common subset of hardware, or alternatively, to support a common software infrastructure deployed on heterogeneous hardware.
More fundamental than a laundry list of hardware and software, is the philosophy under which we operate and the consistency of this philosophy with the needs and practices of a diverse and widely-scattered user community, limited by budgetary constraints. Any hardware and software recommendations should thus follow from our basic approach and be congruent with user demands.
All of our hardware is from commercial sources. We are interested in VR technology that can be brought to the desktop and be affordable. We do acquire new and somewhat expensive VR hardware to evaluate its eventual potential for our users' workstations. Because we have a large HPC center here, we also need to provide graphics accelerators. For example, the network connection to a user's site and their local graphics hardware are not always capable of visualizing a 10GB picture 15 times per second. Along with network and device capabilities, ease of equipment maintenance is also important.
Because we have never been a large group, we cannot afford to maintain a large body of software. Thus, we believe in using commercial software whenever possible and adding on to it. The best of both worlds is achieved when we can release our changes/additions/etc. back to the software vendor for maintenance.