I am new to parallel programming. I am trying to multiply two matrices. I have partitioned the problem as follows:

Let the operation be mat3 = mat1 x mat2.

I am broadcasting mat2 to all the processes in the communicator, cutting mat1 into strips of rows, and scattering the strips to the processes in the communicator group. Once every process has the entire mat2 and its own strip of mat1, it multiplies the strip by mat2. I then use the gather operation on the local results to accumulate the entire result in the root process.

I wanted to know if there is a better way to partition the problem of multiplying two matrices on a general-purpose computer.

My implementation is in OpenMPI.

There is a variety of algorithms in the literature on matrix multiplication that can be extended to the MPI paradigm.
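The partitioning described in the question (every process holds all of mat2, each process multiplies one strip of rows of mat1, and the strips of the result are collected at the root) can be sketched in plain C. This is a serial stand-in, not the asker's code: the loop over `rank` simulates the processes, and the names `multiply_strip`, `strip_matmul`, and `nprocs` are hypothetical. In a real MPI program the loop body would run once per process, with `MPI_Bcast` delivering mat2 and `MPI_Scatterv`/`MPI_Gatherv` moving the strips, since strips of unequal size cannot be handled by plain `MPI_Scatter`/`MPI_Gather`.

```c
#include <stddef.h>

/* Multiply a strip of `rows` rows of mat1 (rows x n) by mat2 (n x p).
   This is the work one process would do after receiving its strip.
   All matrices are stored row-major in flat arrays. */
static void multiply_strip(const double *strip, const double *mat2,
                           double *out, int rows, int n, int p)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < p; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += strip[i * n + k] * mat2[k * p + j];
            out[i * p + j] = sum;
        }
    }
}

/* Serial stand-in for the broadcast / scatter / multiply / gather pattern.
   mat1 is m x n, mat2 is n x p, mat3 is m x p.
   nprocs (hypothetical name, must be > 0) is the simulated process count;
   m does not have to be divisible by nprocs. */
void strip_matmul(const double *mat1, const double *mat2, double *mat3,
                  int m, int n, int p, int nprocs)
{
    int base = m / nprocs;   /* rows every process gets */
    int extra = m % nprocs;  /* the first `extra` processes get one more */
    int first = 0;           /* index of the first row of the current strip */

    for (int rank = 0; rank < nprocs; rank++) {
        int rows = base + (rank < extra ? 1 : 0);
        /* Writing straight into mat3 plays the role of the gather step. */
        multiply_strip(mat1 + (size_t)first * n, mat2,
                       mat3 + (size_t)first * p, rows, n, p);
        first += rows;
    }
}
```

Handing the leftover rows to the first few processes keeps the strip sizes within one row of each other, which is one simple way to limit load imbalance when the row count is not a multiple of the process count.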
View Notes - MPI-Matix-Vector from CS 211 at University of California, Riverside. Matrix-vector Multiplication. Chapter Objectives: review matrix-vector multiplication; propose replication of vectors. Outer product: recall that when we multiply an m × n matrix by an n × p matrix we get an m × p matrix. Outer product of column vector aT and vector b = matrix C: an m × 1 times a 1 × n. A1,3. x3,1 Multiplication table with rows formed by a and columns formed by b.
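The outer product in the notes above can be written out directly: an m × 1 column times a 1 × n row gives an m × n matrix whose entry in row i, column j is the i-th element of a times the j-th element of b. A minimal sketch, where the function name `outer_product` and the row-major flat-array layout are assumptions made for illustration:

```c
/* Outer product: c (m x n) = a (m x 1) times b (1 x n).
   c is stored row-major, so c[i * n + j] holds a[i] * b[j].
   Row i of c is b scaled by a[i], like a row of a multiplication table. */
void outer_product(const double *a, const double *b, double *c, int m, int n)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            c[i * n + j] = a[i] * b[j];
}
```

For a = (1, 2, 3) and b = (4, 5) this fills a 3 × 2 matrix with rows (4, 5), (8, 10), and (12, 15).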
For example: 1D-systolic [1]; 2D-systolic, Cannon’s algorithm [2]; Fox’s algorithm [3]; Berntsen’s algorithm [4]; DNS algorithm [5].

If you ignore the matrix properties (sparsity, etc.), it basically comes down to how the data is distributed among the processes so as to minimize synchronization and load imbalance (differences in the amount of work assigned to each process).

In this you can see two different data distribution approaches and a comparison between them.

Papers:

[1] Golub G. H. and Van Loan C. F., “Matrix Computations”, Johns Hopkins University Press, 1989.
[2] Whaley R. C., Petitet A., Dongarra J. J., “Automated empirical optimizations of software and the ATLAS project”, Parallel Computing 27, 1-2 (2001), 3-35.
[3] Fox G. W., and Hey A. G., “Matrix algorithms on a hypercube I: Matrix multiplication”, Parallel Computing, vol. 1987.
[4] Berntsen J., “Communication efficient matrix multiplication on hypercubes”, Parallel Computing, vol. 335-342, 1989.
[5] Ranka S. and Sahni S., “Hypercube Algorithms for Image Processing and Pattern Recognition”, Springer-Verlag, New York, NY, 1990.
Parallel Programming Labs: MPI and OpenMP Examples

These labs will help you to understand C parallel programming with MPI and OpenMP.

Visual Studio 2010 solution, Intel Compiler with /Qopenmp.

HelloWorld
See a Hello message from every thread and every process.

SumNumbers
This program sums all rows in an array using parallelism. The root process acts as a master and sends a portion of the array to each child process. Master and child processes then all calculate a partial sum of the portion of the array assigned to them, and the child processes send their partial sums to the master, who calculates a grand total.

SumNumbersCascade
Sums all rows in an array. Cascade algorithm.
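The partial-sum scheme in SumNumbers can be sketched in plain C without any MPI or OpenMP calls. This is not the lab's code: the loop over `rank` simulates the processes, and every function name here is hypothetical. The lab text does not say how its cascade works, so `combine_cascade` assumes a tree-style reduction in which the partial sums are added pairwise over successive rounds instead of all being sent to the master.

```c
/* Sum of values[first .. first + count - 1]: the work of one process. */
static long partial_sum(const int *values, int first, int count)
{
    long s = 0;
    for (int i = 0; i < count; i++)
        s += values[first + i];
    return s;
}

/* Fill partial[0 .. nprocs-1] with one partial sum per simulated process.
   The first (n % nprocs) processes get one extra element each. */
void compute_partials(const int *values, int n, int nprocs, long *partial)
{
    int base = n / nprocs, extra = n % nprocs, first = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        int count = base + (rank < extra ? 1 : 0);
        partial[rank] = partial_sum(values, first, count);
        first += count;
    }
}

/* Master-style combine: rank 0 adds up every partial sum in turn,
   which takes nprocs - 1 sequential additions. */
long combine_master(const long *partial, int nprocs)
{
    long total = 0;
    for (int r = 0; r < nprocs; r++)
        total += partial[r];
    return total;
}

/* Cascade-style combine (assumed to mean a tree reduction): in each round,
   rank r adds in the value held by rank r + stride, and the stride doubles.
   Overwrites partial[]; the grand total ends up in partial[0]. */
long combine_cascade(long *partial, int nprocs)
{
    for (int stride = 1; stride < nprocs; stride *= 2)
        for (int r = 0; r + stride < nprocs; r += 2 * stride)
            partial[r] += partial[r + stride];
    return partial[0];
}
```

Under this assumption both combine steps give the same total; the difference is that the additions within one cascade round are independent of each other, so the number of rounds grows with the logarithm of the process count.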