Presentation Title

Presentation Title

ALGORITHM IMPROVEMENTS AND HPC 1 Basic MD algorithm 1. Initialize atoms status and the LJ potential table; set parameters controlling the simulation; O(N) 2. For all time steps do 3. Update positions of all atoms; O(N) 4. If there are atoms who have moved too much, do Update the neighbour list, including all atom pairs that are within a distance range (half distance matrix computation); O(N2)

End if 5. Make use of the neighbour list to compute forces acted on all atoms; O(N) 6. Update velocities of all atoms; O(N) 7. Update the displace list, which contains the displacements of atoms; O(N) 8. Accumulate and output target statistics of each time step; O(N) 9. End for 2 Some general ideas

MD basic algorithm has not changed the last 25 years Most major MD codes use similar approaches for optimization All trivial tricks have been already applied!! No major breakthrough except perhaps GPUs and Anton MD improvement is largely ligated to the improvement of hardware resources Can we use efficiently a present supercomputer for MD? 3 Improvement directions Floating-point calculations Precision, Custom Maths

Non-Bonded interactions PBC, Common interactions Stream processing & GPUs Integration steps Multiple time steps, Constrains, Virtual interactions sites Parallelization. Spatial decomposition Asynchronous parallelization Replica exchange, Markov state models P. Larsson, B. Hess & E. Lindahl. WIRE Comp. Mol. Sci. 2011, 1, 93-108 Floating point calculations MD limiting step is the calculation of Non-bonded interactions

Floating-point operations Random memory access Move to simple precision Problem for accumulated numerical error or for too small position displacements However, force calculations require only a relative error < 10-3 Small effect in the calculation itself in modern processors Improves use cache, SIMD register Math routines in single precision are more efficient Custom Math routines MD does not require maths to be IEEE complain!! Some internal checks can be avoided (ex. positive parameter in sqrt )

5 Non-bonded interactions Avoid O(N2) Maintain list of NB pairs within a distance cutoff Updated at larger time steps Caution, too short cutoff distance give wrong behaviour Domain decomposition (more later) PBC requires to check for the closest image Simple approach with 6 conditionals kill modern processors Cut-off sphere partitioning method

Maintains information about all 26 copies of the particle, updated with the NB list. Grid based NB pair list 6 Specialized interactions Some interactions are more abundant than others WAT-WAT dominate NB interactions list Water models SPC, TIP3P, TIP4P Rigid, only VdW in O, reduced set of charges The number of interactions to check is largely reduced

Only oxygens should be considered in the NB Lists A recent Charmm implementation uses a combined Coulomb + Lennard-Jones potential for Wat-Wat interactions 7 Streaming & GPUs GPUs have become popular in simulation field.

One order of magnitude faster that comparable CPUs Requires reformulate algorithms Usually the approach is not compatible with others SIMD approach, No random access to memory, no persistent data CPU is responsible of the main algorithm, GPU performs selected parts Update the NB list, Evaluate interactions, Update coordinates & velocities GPU could be viewed as a multicore device executing a high number of threads in parallel Threads are grouped in blocks that share memory. Memory distribution could be limitation Atom based decomposition is preferred

8 Use of GPUs 9 Integration steps Multiple time steps Numerical integrations require that the time step is kept below the fastest movement in the system Normally 1 femtosecond (109 iterations to reach biologically meaningful time scales) Most parts of the calculation do not change significatively in 1 fs.

GROMACS style: 1fs bond vibrations, 2fs NB interactions, 4fs PME (long range interactions) Tenths fs (NB list update, temperature and pressure update) Increasing the granularity of the minimal calculation (ex. in coarse-grained) will allow longer time steps 10 Integration steps. Constrains Constrains Some movements are not relevant for global flexibility Algorithms like SHAKE, RATTLE, M-SHAKE, LINCS, update bond distances by direct minimization

Constrains are very efective but limit parallelization Usually restricted to bonds with hydrogen atoms Virtual interaction sites Hydrogen atoms are replaced by virtual atoms Possible to reach 4-5 fs 11 MD Parallelization MD parallelization uses MPI Heavily limited by communication bandwith Modern multicore processors will allow to combine with openMP Simplest strategy is Atoms/Force decomposition

Atoms are distributed among processing units Bad and unpredictable load balance Requires all-to-all update of coordinates Partially improved by distribution NB list instead of atoms Scales only 8-16 processors Distribution should be dynamic: Spatial decomposition Space regions (not atoms) are distributed among processing units Assignment of atoms is dynamic, depending on their position in space

12 Spatial decomposition Model to use depends on hardware communication model Eight shell for single network (classical Linux cluster) Midpoint for full-duplex connections (Cray, BlueGene) Pulses from on processor to the next in each geometrical direction diminish communication calls for long range interactions 13 Spatial decomposition

Load balancing is the major efficiency problem Cells need not to be equal, and possibly updated dynamically 14 Asyncronous parallelization Any parallelization method become useless with less of ca. 100 atoms per processing unit Present computers can have thousands of cores but most usual simulations have around 100,000 atoms !!! A single long simulation can be replaced by a series of shorter ones.

Same statistics Sampling is improved as initial structures could be different In most cases simulations are completely independent (embarrisingly parallel) A single supercomputer is not really needed Distributed computing Grid [email protected] 15 Replica Exchange

P S S ' = min1, exp m n Em En Markov State Model Kinetic clustering of states Trajectory defines different ensembles separated by energy barriers Markov models used to predict transitions Trained with a large number of short simulations 17 Conclusions

Major MD codes uses a combination of optimization methods Current routine tends to be GPU based calculations, but combinations MPI + openMP are being considered. GPU + MPI + approaches are being considered but are highly dependent on hardware evolution. No parallelization schema will survive next generation of supercomputers 18 MD & HPC future? An ideal approach would for instance be to run a simulation of an ion channel system over 1000 cores in parallel, employing,

e.g., 1000 independent such simulations for kinetic clustering, and finally perhaps use this to screen thousands of ligands or mutations in parallel (each setup using 1,000,000 cores) P. Larsson, B. Hess & E. Lindahl. WIRE Comp. Mol. Sci. 2011, 1, 93-108 19

Recently Viewed Presentations

  • Everyday Public Speaking - Alvin Independent School District

    Everyday Public Speaking - Alvin Independent School District

    Everyday Public Speaking Chapter 13 After completing this chapter, you will be able to explain types and functions of social-ritual speeches create and deliver social-ritual speeches describe the purposes of an informative speech and a training speech prepare scripts or...
  • TQA Training PPT final

    TQA Training PPT final

    Step back and refrain from making physical contact with them. ... - A stretcher requires two people to gently roll the animal onto it. Handlers should hold/push at the flank and under the forelegs. ... one to operate the machine...
  • Chapter 1  Microbiology is the study of microorganisms

    Chapter 1 Microbiology is the study of microorganisms

    1860s: Joseph Lister used a chemical disinfectant to prevent surgical wound infections. 1876: Robert Koch established a set of postulates, used to prove that a specific microbe causes a specific disease (i.e. anthrax). The Germ Theory of Disease 1796: Edward...
  • Financial and Non Financial Assets of Institutional Investors ...

    Financial and Non Financial Assets of Institutional Investors ...

    17 countries provide total non financial assets (AN) for the sector S12_IF, 19 countries for the sector S125 and 7 countries out of 12 for the sector S12_OF. Summary of methodological survey - 7IA 21 countries have provided the methodological...

    In red, generic sentence trials (Run 1); and in blue, famous sentence trials (Run 2). Temporal auditory areas are significantly activated as are the posterior language areas. Broca's area is significantly active only in Run 1. Figure 5: (e&f) The...
  • 8 8 INTRODUCTION Motivation is an important function

    8 8 INTRODUCTION Motivation is an important function

    Incentive. Motivation is the incentive of employees. It is a personal and natural feeling of the mind of the employees. In fact an individual is inspired to make his efforts to satisfy the needs of his life. 2. Unending Process.
  • Chapter 3 Teams, Teamwork and Team Dynamics

    Chapter 3 Teams, Teamwork and Team Dynamics

    focusing on the patient to provide the best means of serving patient/client interests. allowing team members to carry out the team's work and to manage itself as an independent group of people. ... Chapter 3 Teams, Teamwork and Team Dynamics...
  • ECSS - Emily Carr Secondary School

    ECSS - Emily Carr Secondary School

    OSAP-Loan. Loans eventually need to be paid back. Online application process. will not update this year's application . until May 2014. OSAP-Grant. Grants are gifts . 30% off Ontario Tuition. money you don't have to repay. designed for students...