Parallel Atlas

> In the past, I wasn't that interested in parallel atlas because
> I was thinking of the "VWF" case - lots of jobs, where parallel atlas
> would presumably be slower.  But I can now see some benefit while running
> in "debugging" mode (initial file set-up and testing).

> does it make sense to run parallel atlas on a 2000 node job if your
> primary interest is just making it faster?  if you answer yes, this
> will greatly increase my interest in parallel atlas.
Absolutely yes. For this type of job you would get 65-75% efficiency on
2 or 4 processors. 

> how many cpu's can you use in parallel atlas?  is it adjustable?  

There is no hard upper limit, but the benefits tend to flatten out
around 16 CPUs, depending on problem size (see later).
The number of CPUs used is user-selectable on the GO statement:

	go atlas simflags="-V 4.3.2.R -P 6"  

> (Note: I had to install 4.3.2.R, as 4.3.0.R didn't support
> multiple CPU's)


> how well does the efficiency scale with node count?  what are the
> "best" types of problems for parallel atlas?
The 'efficiency' goes up as the node count goes up. This is easy to
rationalise since the overhead (or inefficiency) is due to
splitting up and recombining the mesh. The less often this is done
the more efficient the solution is.

The best problems to run are those that have a large CPU time per
iteration, typically those with a large mesh and/or complex equation
sets.

Simulations with a large number of iterations but a fast time per
iteration are much less efficient.

Small mesh problems will be SLOWER on 7 CPUs than on 4 CPUs due to
the overhead of dividing up the mesh and recombining it. Also, the
boundary area between mesh points solved on different CPUs is a
higher percentage of the total in this case. This requires more
communication of data between CPUs, which is slow.

Running an 8000 node job on 4 CPUs is not the same as running four
2000 node jobs since the 8000 node mesh has to be solved
self-consistently. The efficiency includes all effects.  A similar
effect is related to memory. Parallel ATLAS does use more total
memory than regular ATLAS but not more memory per processor.
Therefore just adding 3 more processors to a typical 1 processor
machine might not give the improvements we quote unless memory is
also added. Dani said he was considering a fixed memory per
processor of 256 MB, which is exactly the way to go.

> you told me there were a couple of things parallel atlas couldn't do - i
> think they were automatic IV curve tracing, and 3D simulations.  but
> mixed-mode will work, right?    
Correct. Mixedmode will work fine in parallel ATLAS. The unsupported
features are CURVETRACE, the MOD.WATT model, and the 3D modules.

> I am a little confused now about the parallel atlas licenses.  I guess
> parallel atlas is a separate version, rather than simply activating a
> license.  

When parallel ATLAS runs, it takes one of the normal licenses and
one or more threaded licenses.  For example, if you want to run
4-processor parallel ATLAS, 1 normal ATLAS license will be used, and
3 threaded licenses will be used.
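
The license accounting above can be sketched in a few lines of Python.
The function name licenses_needed is made up for illustration, and the
"one threaded license per extra processor" rule is generalized from the
single 4-processor example, so treat it as an assumption rather than a
description of how the license manager actually works.

```python
def licenses_needed(n_cpus: int) -> dict:
    """Return the assumed license usage for a parallel ATLAS run.

    Assumption: one normal license plus one threaded license for each
    additional processor, generalized from the 4-processor example.
    """
    if n_cpus < 2:
        raise ValueError("a parallel run needs at least 2 processors")
    return {"normal": 1, "threaded": n_cpus - 1}


if __name__ == "__main__":
    # Print the assumed license usage for a few processor counts.
    for n in (2, 4, 6):
        print(n, licenses_needed(n))
```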


> Some timing numbers on a 2000 node, 2 carrier problem:

> SunOS law 5.5.1  SPARCstation-20  with 2 CPU's

> The following table has the CPU times in seconds for a series of
> bias-points, all in the same file.  it appears to take a while for
> the parallel version to get "revved up", and then it saturates at
> about a 35% improvement, which is fine.

>  2 CPUs (s)   1 CPU (s)   Speed-up
>     37.5         43.5        14%
>     81.6        105          22%
>    174          254          31%
>    274          412          33%
>    377          580          35%
>   1234         1925          35%

> 35% improvement might not sound like a lot, but with 2 CPUs the ideal
> is only a 50% improvement, so you are getting 70% of ideal.  if
> you already have a 2 CPU machine available, adding a parallel
> atlas license is an easy way to "upgrade" your machine (i.e., to 
> get results faster).
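
The arithmetic behind the timing table can be checked with a short
Python sketch. The "Speed-up" column is read here as the fraction of
run time saved (1 - T2/T1), which matches the quoted numbers. The
function names are made up for illustration. The conventional parallel
efficiency, T1 / (N * TN), is included only for comparison and is not a
figure taken from the original message.

```python
# (2-CPU seconds, 1-CPU seconds) pairs copied from the table above.
TIMINGS = [
    (37.5, 43.5),
    (81.6, 105.0),
    (174.0, 254.0),
    (274.0, 412.0),
    (377.0, 580.0),
    (1234.0, 1925.0),
]


def time_saved(t_parallel: float, t_serial: float) -> float:
    """Fraction of run time saved, the table's 'Speed-up' column."""
    return 1.0 - t_parallel / t_serial


def fraction_of_ideal(t_parallel: float, t_serial: float, n_cpus: int) -> float:
    """Time saved relative to the ideal saving of 1 - 1/N."""
    ideal = 1.0 - 1.0 / n_cpus
    return time_saved(t_parallel, t_serial) / ideal


def parallel_efficiency(t_parallel: float, t_serial: float, n_cpus: int) -> float:
    """Conventional efficiency: speed-up ratio divided by CPU count."""
    return t_serial / (n_cpus * t_parallel)


if __name__ == "__main__":
    for t2, t1 in TIMINGS:
        print(
            f"{t2:7.1f} {t1:7.1f} "
            f"saved={time_saved(t2, t1):5.1%} "
            f"of_ideal={fraction_of_ideal(t2, t1, 2):5.1%} "
            f"efficiency={parallel_efficiency(t2, t1, 2):5.1%}"
        )
```

Both measures describe the same runs. "Percent of ideal" compares time
saved against the best possible saving, while the conventional
efficiency compares the speed-up ratio against the CPU count, so the
two numbers differ slightly for the same pair of timings.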

> Note that the only HP-UX release that supports multi-threading is HP-UX 11.

Above is a plot of the performance speed-up versus number of CPUs for a
small and a large job. The single-CPU times for these two cases were
roughly 4 minutes and 48 minutes. They were just MOSFET IV sweeps,
nothing advanced. Your performance may vary.

The following two questions also came up and are included for advanced
users:

> I've been using 7 CPUs for this simulation (Atlas confirms it gets all
> 7), but the load on the machine doesn't go above 3, and even top doesn't
> show more than 20% CPU usage (in my 8-CPU machine, 1 CPU=12.5%).
> Michael has run other completely different atlas jobs that did actually
> use 80%+ of CPU usage, according to top.  I presume this is because of
> the special 2-level newton solution method; is there something better I
> could try that'll still converge?

If you try .OPTIONS FULLN and METHOD NEWTON it should be the most
efficient for the parallel solver. It may also help to look at the
parallel statistics for this example, as below.

> what is the significance of the last two lines?
>
> Mesh statistics:
>     Type                         non-cylindrical
>     Total grid points            2001
>     Total triangles              3778
>     Obtuse triangles             0 (0 %)
>     Variance in partition size:  0.E+0
>     Tagged Elements              407 ( 10 %)

The explanation from the developer is:

As for the output, the "Tagged Elements" is the number of mesh elements
belonging to a border line between different partitions. These elements
are the ones requiring special processing when using parallel
algorithms. The percentage is the percentage of the total number of
elements in the mesh. The smaller this number is, the more efficient
parallel atlas is.

The variance is a measure of how balanced the partitions are. If this
number is high (say > 100), it is likely that some processors will do
overtime while others take time off, which is bad for overall parallel
performance. In the example you gave me, 0 variance means the
partitioning algorithm achieved perfect balance between the partitions.
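
The tagged-element percentage in the mesh statistics can be reproduced
with a small Python sketch. It assumes that "elements" means the
triangle count (3778). Under that assumption the ratio comes out a
little above the printed "10 %", which would be consistent with the
output being truncated to a whole number, but that is a guess. The
function name tagged_percentage is made up for illustration.

```python
def tagged_percentage(tagged: int, total_elements: int) -> float:
    """Percentage of mesh elements that lie on a partition border."""
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    return 100.0 * tagged / total_elements


if __name__ == "__main__":
    # Values copied from the mesh statistics quoted above.
    # Assumption: the total element count is the triangle count.
    pct = tagged_percentage(407, 3778)
    print(f"tagged elements: {pct:.1f} % of triangles")
```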


This page last updated Dec 17, 1998 by