The idea is that we have two very large matrices and we want to multiply them. Obviously we want the multiplication to go parallely and in a master-worker fation. Given two matrices A and B, we want to compute C = A * B; Perhaps the simplest way of doing it is to ship the entire matrices accross to the workers. Then the master can specify a set of rows to compute upon. For the sake of illustration we take this approach.
The task is in the file Task-Matmul.C. This is the class Task_Matmul which is derived from MWTask.
As said earlier the master will specify the worker to compute a range of rows of matrix C. Thus the main components of this class are startRow and endRow which specify this range. The pointer results is used to hold the results of the computed rows.
The master distributes these tasks to the workers. To distribute a task the master will call pack_work(). In our case we just pack the startRow and the endRow. At the worker end the worker will call unpack_work() and download whatever is passed on from the master into a new Task structure. The worker computes this task and the result will be stored in the results pointer of the task. The call pack_results() in the worker will be used to pack the results to be shipped to the master. In our case we ship the computed matrix rows. On the master side unpack_results() will be used to get whatever was shipped over.
Notice that MWDriver treats the MWTask structures as abstract data types that are shipped back and forth between the master and workers. This treatment enables the applications to attach whatever meaning to their tasks as long as they write the requisite virtual functions.
Lets examine the master now. This is the class Driver_Matmul in Driver-Matmul.C which is instantiated from the MWDriver class. The main components of this class are the three matrices A, B and C. It also stores the rows and cols of the matrices. The interesting parameter is partition_factor. Say suppose the matrix A has 100 rows. If the partition_factor equals 10, we will be asking the workers to solve 10 rows at any time.
Lets examine the get_userinfo() function. The in_master file specified in the file submit_mwfile will be given to this program as the stdin. In this case we are reading the number of different arch/opsys pair of executables that we have. Condor is a heterogenous environment and we can have machines belonging to different arch/opsys combinations. We can compile the worker programs on several different arch/opsys combination and specify those here. The calls RMC->set_worker_attributes says that the ith executable file is this one and that it's running constraints are ((Arch==\"INTEL\") && (Opsys==\"SOLARIS26\")). In our case we have just one executable which is of type INTEL/SOLARIS26. Next we scan the matrices A and B themselves. Thus the primary function of the get_userinfo is to specify to the MW the executable characteristics and also do some initialization.
The next important function is the setup_initial_tasks(). Here MW wants from the application how many initial tasks are there and what are those. The main quantities in our Tasks are the startRow and endRow of the rows that are to be computed as part of computing that task. Thus the tasks are initialized in the loop and given to MW.
In essence the above two functions will constitute the setup part of MW. After setup_initial_tasks is done, MW knows how many tasks are there and what are those tasks. It also knows how many workers to get. It starts acquiring workers and sending tasks to them. Before it sends any task to a new worker it first sends some init data. This is specified by the function pack_worker_init_data(). In our case we have to send the entire A and B matrices.
Now as and when the workers complete the tasks and return back the results, MW will get those results and call act_on_completed_task(). In this function the application can do things like looking up to see what are the results of that task, storing those results or creating new tasks based on the results. In our case the result of the task are the specified rows of the computed matrix and we merely store those rows into our resulting matrix.
Finally printresults which will be called at the end of everything, prints out the matric C.
The worker is the entity which actually does the work. This is in the file Worker-Matmul.C. This is the class Worker_Matmul which is derived from the base class MWWorker.
The main components of the worker are the two matrices A and B that it stores.
When the worker first starts up it gets initial data from the master. This is processed in unpack_init_data(). In our case the master sends the matrices A and B. Thus we store the two matrices in this function.
As and when a worker gets a task to execute MW will call the function execute_task(). This will contain the task that has to be computed upon. In our case a task will contain the startRow and the endRow and we have to compute these rows of matrix C. The execution takes place and we store the results in that task itself.