[Python] 3 Ways of Multi-threaded Matrix Multiplication

Recently, I have implemented 3 different ways of multi-threaded matrix multiplication. There are 3 ways of thinking when writing a parallel program: –

  • Input Decomposition
  • Output Decomposition
  • Intermediate Decomposition

We want to create matrix multiplication (3 x 3) program in multi-threaded way.

Input: Matrix A, B and each one of them is 3×3 size.
Output: Matrix C which is the resultant of matrix A * B

Input Decomposition

There are several way when multiply 2 matrices, one of them is Block Matrix, on which you divide the matrix to sub-matrices (under some constraints), then multiplying the sub-matrices and finally sum them up in matrix C.

Here, the matrices are 3×3, so, we can divide them to A = [A1, A2, A3] transpose, B = [B1, B2, B3], where A1, A2, A3 are of size 3×1 and B1, B2, B3 are of size 1×3. Each thread calculates A1*B1, A2*B2, A3*B3. The result of these multiplications is 3×3 matrix.

The only thing remaining now is add the 3 3×3 matrices together to get a matrix C.

Output Decomposition

We know that the elements of each cell in the matrix C rows depends on a row of matrix A and the matrix B columns. So, we make each thread calculates the row values of matrix C. And the input of each thread is a row from matrix A and the whole matrix B.


Intermediate Decomposition

It is similar to Input Decomposition, the only difference here is that every thread will return a unique matrix C (intermediate matrix). In other words, every thread will return an intermediate matrix with size 3×3. So, we will have 3 temp matrix of size 3×3, then the matrix C = (thread_1_matrix_C + thread_2_matrix_C, thread_3_matrix_C)


Here https://github.com/AhmedHani/CS617-HPC/tree/master/Assignments/parallel_matrix_multiplication/multi-threaded_matrix_multiplication is the source code that implements the 3 techniques

The input is: A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]

The output: C = A * B


– Algorithms and Parallel Computing by Fayez Gebali
– Introduction to Parallel Computing, Second Edition by  Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar


Solving Time-dependent Graph Using Modified Dijkstra Algorithm

Paper: http://research.microsoft.com/pubs/173806/edbt08tdsp.pdf

Content: –

  • Time-dependent Graph – Definition
  • Problem Definition
  • Existed Solutions
    – Bellman-Ford
    – Extended A*
  • New Dijkstra Based Algorithm


Time-dependent Graph – Definition

We can define Time-dependent Graph (TDG) as GT(V, E, W), where V is a graphs vertices, E is the edges between each vertex and W is the weights between each edge of the graph. Weights vary for each time interval T where T is a time domain pre-specified by the graph. So, W is a function of time (called edge-delay function) that has 3 parameters, the source node, destination node and the departure time from the source node.


Figure 1 shows an example of TDG graph. We have 4 nodes and directed edges between each one of them. There are weights on the edges between every pair of the graph. The weights change during time. In (b), it shows that at all time intervals, the weights are constant (10). (c) shows that the weight from nodes 1, 3 change in 2 time intervals, from [0, 5] the weight will be 10 and from [10, 60] the weight increases to 25.


Problem Definition

The given problem for the graph is to estimate the least total travel time (LTT) from the source node to the destination node given the edge-delay functions and time interval T. The travel time is the arrival time minus the starting time.

The source node, destination node and the starting time are given from the user. In other words, the LTT function should have 3 parameters LTT(source, destination, departure time of source).

There is another factor of time which is the waiting time at node. It is allowed to wait at a node, hence, the departure time of the node = arrival time + wait time.

The solution should estimate the shortest path of time at time-dependent graph (TDSP) from the source node to the destination node.


Existed Solutions

There are 2 existed solutions for TDSP. The objective is to minimize the LTT function given the starting node and destination node.

Bellman-Ford and extended A* algorithms are discrete time solution for such a problem.

Given graph GT(V, E, W) and query LTT(source, destination, starting_time), minimize the LTT function.

Bellman-Ford Based Algorithm: –


We can illustrate the algorithm by the following steps: –

  • First, we need to initialize the time to reach every node of the graph from the source by infinite. GL(t) indicates the earliest arrival time at node L from the source node S with starting time (t).
  • Initialize the edges between each pair of nodes. HK,L(t) is a function that indicates the earliest arrival to node L from source node S given the edge (K, L) with starting time (t)
  • Brute force to form relaxation between edges. Then update GL(t) and HK,L(t) till we get minimum time. This occurs when there are no more changes to the weights of the edges (Convergence).
  • Updating GL(t) and HK,L(t) are called path-selection and time-refinement steps respectively

A* Based Algorithm: –

Another algorithm to solve TDSP problem. Here, the algorithm assumes that there is no waiting time at each node.

There are many paths from the source node and destination node, the main idea here is use Priority Queue (like regular A*) to get all paths from source to destination. Let Pk is the kth path to reach from source node S to destination node D. For each path Pk we have an associated function with it. FPk(t) =  GPk(t) + dk,e(t) – t, where GPk(t) is the arrival time from source node S to node K using path Pk and dk,d(t) is a lower-bound estimation (heuristic) for time from node K to destination D.

For each iteration, the queue get on of paths then trying to minimize the function FPk(t). Then pop the path, get the other path and so on.

Here, path-selection and time-refinement are used in the priority queue and function update respectively.

The algorithm is called KDXZ for the first characters of their authors names. Experiments showed that this algorithm works fine with small graphs and struggles with larger ones as it is hard to estimate the value of dk,e(t)


New Dijkstra Based Algorithm: –

The algorithm is about decomposing the 2 mentioned steps, the time-refinement and path-selection steps as outlines of two-step-LLT


The time-refinement function estimates the earliest arrival time functions of each node in the graph. Then checks if there exists a path between the source and destination nodes. The path-selection gest the path which match the best arrival time for the best starting time t*.

Path-selection Algorithm: –


Time-refinement Algorithm (Dijkstra main function)


At first, we initialize the earliest arrival time function for each node of the graph. The algorithm refines arrival time incrementally for each sub-interval of the node time domain. Then use this function by extending the interval each time. Well-refined function is a function that specifies the earliest arrival time at a node from the source node.

For each iteration, the algorithm takes the front node of the priority queue then trying to expand its starting-time sub-interval to reach the well-refined function. Then update the function for every neighbor of the current node.

If the current node is the destination node, then the arrival time function of the destination node.


Getting started with Tensorflow

Tensorflow is an open source library created by Google for deep learning tasks. The library mainly works with matrices operations, it represents the operations between matrices and data by a graph that shows the dependency between the tasks. The edges (called tensors) between the nodes are the output of the operations.

There are many advantages of using tensorflow such as it reduces the development time by avoiding hard-coding some mathematical operations, also it supports GPU.

Recently I have added a solution for XOR learning problem in my notebook using tensorflow. It was the first time for me to use tensorflow in Machine Learning, and I think it won’t be the last time as I find it an awesome library that I could use.

This is a model from my machine learning masters course’s assignment.


You can check the solution here https://github.com/AhmedHani/CS624-ML/blob/master/Assignments/xor_learning.ipynb