
Details

Author(s) / Contributors
Title
High-performance computing on complex environments
Edition
1st ed.
Place / Publisher
Hoboken, New Jersey: Wiley
Year of Publication
2014
Descriptions/Notes
  • Includes bibliographical references at the end of each chapter and an index.
  • Cover -- Title Page -- Contents -- Contributors -- Preface -- European Science Foundation
  • Part I Introduction
  • Chapter 1 Summary of the Open European Network for High-Performance Computing in Complex Environments -- 1.1 Introduction and Vision -- 1.2 Scientific Organization -- 1.2.1 Scientific Focus -- 1.2.2 Working Groups -- 1.3 Activities of the Project -- 1.3.1 Spring Schools -- 1.3.2 International Workshops -- 1.3.3 Working Groups Meetings -- 1.3.4 Management Committee Meetings -- 1.3.5 Short-Term Scientific Missions -- 1.4 Main Outcomes of the Action -- 1.5 Contents of the Book -- Acknowledgment
  • Part II Numerical Analysis for Heterogeneous and Multicore Systems
  • Chapter 2 On the Impact of the Heterogeneous Multicore and Many-Core Platforms on Iterative Solution Methods and Preconditioning Techniques -- 2.1 Introduction -- 2.2 General Description of Iterative Methods and Preconditioning -- 2.2.1 Basic Iterative Methods -- 2.2.2 Projection Methods: CG and GMRES -- 2.3 Preconditioning Techniques -- 2.4 Defect-Correction Technique -- 2.5 Multigrid Method -- 2.6 Parallelization of Iterative Methods -- 2.7 Heterogeneous Systems -- 2.7.1 Heterogeneous Computing -- 2.7.2 Algorithm Characteristics and Resource Utilization -- 2.7.3 Exposing Parallelism -- 2.7.4 Heterogeneity in Matrix Computation -- 2.7.5 Setup of Heterogeneous Iterative Solvers -- 2.8 Maintenance and Portability -- 2.9 Conclusion -- Acknowledgments -- References
  • Chapter 3 Efficient Numerical Solution of 2D Diffusion Equation on Multicore Computers -- 3.1 Introduction -- 3.2 Test Case -- 3.2.1 Governing Equations -- 3.2.2 Solution Procedure -- 3.3 Parallel Implementation -- 3.3.1 Intel PCM Library -- 3.3.2 OpenMP -- 3.4 Results -- 3.4.1 Results of Numerical Integration -- 3.4.2 Parallel Efficiency -- 3.5 Discussion -- 3.6 Conclusion -- Acknowledgment -- References
  • Chapter 4 Parallel Algorithms for Parabolic Problems on Graphs in Neuroscience -- 4.1 Introduction -- 4.2 Formulation of the Discrete Model -- 4.2.1 The θ-Implicit Discrete Scheme -- 4.2.2 The Predictor–Corrector Algorithm I -- 4.2.3 The Predictor–Corrector Algorithm II -- 4.3 Parallel Algorithms -- 4.3.1 Parallel θ-Implicit Algorithm -- 4.3.2 Parallel Predictor–Corrector Algorithm I -- 4.3.3 Parallel Predictor–Corrector Algorithm II -- 4.4 Computational Results -- 4.4.1 Experimental Comparison of Predictor–Corrector Algorithms -- 4.4.2 Numerical Experiment of Neuron Excitation -- 4.5 Conclusions -- Acknowledgments -- References
  • Part III Communication and Storage Considerations in High-Performance Computing
  • Chapter 5 An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing -- 5.1 Introduction -- 5.2 General Overview -- 5.2.1 A Key to Scalability: Data Locality -- 5.2.2 Data Locality Management in Parallel Programming Models -- 5.2.3 Virtual Topology: Definition and Characteristics -- 5.2.4 Understanding the Hardware -- 5.3 Formalization of the Problem -- 5.4 Algorithmic Strategies for Topology Mapping -- 5.4.1 Greedy Algorithm Variants -- 5.4.2 Graph Partitioning -- 5.4.3 Schemes Based on Graph Similarity -- 5.4.4 Schemes Based on Subgraph Isomorphism -- 5.5 Mapping Enforcement Techniques -- 5.5.1 Resource Binding -- 5.5.2 Rank Reordering -- 5.5.3 Other Techniques -- 5.6 Survey of Solutions -- 5.6.1 Algorithmic Solutions -- 5.6.2 Existing Implementations -- 5.7 Conclusion and Open Problems -- Acknowledgment -- References
  • Chapter 6 Optimization of Collective Communication for Heterogeneous HPC Platforms -- 6.1 Introduction -- 6.2 Overview of Optimized Collectives and Topology-Aware Collectives -- 6.3 Optimizations of Collectives on Homogeneous Clusters -- 6.4 Heterogeneous Networks -- 6.4.1 Comparison to Homogeneous Clusters -- 6.5 Topology- and Performance-Aware Collectives -- 6.6 Topology as Input -- 6.7 Performance as Input -- 6.7.1 Homogeneous Performance Models -- 6.7.2 Heterogeneous Performance Models -- 6.7.3 Estimation of Parameters of Heterogeneous Performance Models -- 6.7.4 Other Performance Models -- 6.8 Non-MPI Collective Algorithms for Heterogeneous Networks -- 6.8.1 Optimal Solutions with Multiple Spanning Trees -- 6.8.2 Adaptive Algorithms for Efficient Large-Message Transfer -- 6.8.3 Network Models Inspired by BitTorrent -- 6.9 Conclusion -- Acknowledgments -- References
  • Chapter 7 Effective Data Access Patterns on Massively Parallel Processors -- 7.1 Introduction -- 7.2 Architectural Details -- 7.3 K-Model -- 7.3.1 The Architecture -- 7.3.2 Cost and Complexity Evaluation -- 7.3.3 Efficiency Evaluation -- 7.4 Parallel Prefix Sum -- 7.4.1 Experiments -- 7.5 Bitonic Sorting Networks -- 7.5.1 Experiments -- 7.6 Final Remarks -- Acknowledgments -- References
  • Chapter 8 Scalable Storage I/O Software for Blue Gene Architectures -- 8.1 Introduction -- 8.2 Blue Gene System Overview -- 8.2.1 Blue Gene Architecture -- 8.2.2 Operating System Architecture -- 8.3 Design and Implementation -- 8.3.1 The Client Module -- 8.3.2 The I/O Module -- 8.4 Conclusions and Future Work -- Acknowledgments -- References
  • Part IV Efficient Exploitation of Heterogeneous Architectures
  • Chapter 9 Fair Resource Sharing for Dynamic Scheduling of Workflows on Heterogeneous Systems -- 9.1 Introduction -- 9.1.1 Application Model -- 9.1.2 System Model -- 9.1.3 Performance Metrics -- 9.2 Concurrent Workflow Scheduling -- 9.2.1 Offline Scheduling of Concurrent Workflows -- 9.2.2 Online Scheduling of Concurrent Workflows -- 9.3 Experimental Results and Discussion -- 9.3.1 DAG Structure -- 9.3.2 Simulated Platforms -- 9.3.3 Results and Discussion -- 9.4 Conclusions -- Acknowledgments -- References
  • Chapter 10 Systematic Mapping of Reed–Solomon Erasure Codes on Heterogeneous Multicore Architectures -- 10.1 Introduction -- 10.2 Related Works -- 10.3 Reed–Solomon Codes and Linear Algebra Algorithms -- 10.4 Mapping Reed–Solomon Codes on Cell/B.E. Architecture -- 10.4.1 Cell/B.E. Architecture -- 10.4.2 Basic Assumptions for Mapping -- 10.4.3 Vectorization Algorithm and Increasing its Efficiency -- 10.4.4 Performance Results -- 10.5 Mapping Reed–Solomon Codes on Multicore GPU Architectures -- 10.5.1 Parallelization of Reed–Solomon Codes on GPU Architectures -- 10.5.2 Organization of GPU Threads -- 10.6 Methods of Increasing the Algorithm Performance on GPUs -- 10.6.1 Basic Modifications -- 10.6.2 Stream Processing -- 10.6.3 Using Shared Memory -- 10.7 GPU Performance Evaluation -- 10.7.1 Experimental Results -- 10.7.2 Performance Analysis Using the Roofline Model -- 10.8 Conclusions and Future Works -- Acknowledgments -- References
  • Chapter 11 Heterogeneous Parallel Computing Platforms and Tools for Compute-Intensive Algorithms: A Case Study -- 11.1 Introduction -- 11.2 A Low-Cost Heterogeneous Computing Environment -- 11.2.1 Adopted Computing Environment -- 11.3 First Case Study: The N-Body Problem -- 11.3.1 The Sequential N-Body Algorithm -- 11.3.2 The Parallel N-Body Algorithm for Multicore Architectures -- 11.3.3 The Parallel N-Body Algorithm for CUDA Architectures -- 11.4 Second Case Study: The Convolution Algorithm -- 11.4.1 The Sequential Convolver Algorithm -- 11.4.2 The Parallel Convolver Algorithm for Multicore Architectures -- 11.4.3 The Parallel Convolver Algorithm for GPU Architectures -- 11.5 Conclusions -- Acknowledgments -- References
  • Chapter 12 Efficient Application of Hybrid Parallelism in Electromagnetism Problems -- 12.1 Introduction -- 12.2 Computation of Green's Functions in Hybrid Systems -- 12.2.1 Computation in a Heterogeneous Cluster -- 12.2.2 Experiments -- 12.3 Parallelization in NUMA Systems of a Volume Integral Equation Technique -- 12.3.1 Experiments -- 12.4 Autotuning Parallel Codes -- 12.4.1 Empirical Autotuning -- 12.4.2 Modeling the Linear Algebra Routines -- 12.5 Conclusions and Future Research -- Acknowledgments -- References
  • Part V CPU + GPU Coprocessing
  • Chapter 13 Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models -- 13.1 Introduction -- 13.2 Related Work -- 13.3 Data Partitioning Based on Functional Performance Model -- 13.4 Example Application: Heterogeneous Parallel Matrix Multiplication -- 13.5 Performance Measurement on CPUs/GPUs System -- 13.6 Functional Performance Models of Multiple Cores and GPUs -- 13.7 FPM-Based Data Partitioning on CPUs/GPUs System -- 13.8 Efficient Building of Functional Performance Models -- 13.9 FPM-Based Data Partitioning on Hierarchical Platforms -- 13.10 Conclusion -- Acknowledgments -- References
  • Chapter 14 Efficient Multilevel Load Balancing on Heterogeneous CPU + GPU Systems -- 14.1 Introduction: Heterogeneous CPU + GPU Systems -- 14.1.1 Open Problems and Specific Contributions -- 14.2 Background and Related Work -- 14.2.1 Divisible Load Scheduling in Distributed CPU-Only Systems -- 14.2.2 Scheduling in Multicore CPU and Multi-GPU Environments -- 14.3 Load Balancing Algorithms for Heterogeneous CPU + GPU Systems -- 14.3.1 Multilevel Simultaneous Load Balancing Algorithm -- 14.3.2 Algorithm for Multi-Installment Processing with Multidistributions -- 14.4 Experimental Results -- 14.4.1 MSLBA Evaluation: Dense Matrix Multiplication Case Study -- 14.4.2 AMPMD Evaluation: 2D FFT Case Study -- 14.5 Conclusions -- Acknowledgments
  • Description based on print version record.
Language
English
Identifiers
ISBN: 1-118-71189-0, 1-118-86667-3
OCLC number: 870336445
Title ID: 9925036845806463
Format
1 online resource (470 p.)
Subject headings
High performance computing