A Research Platform for Singapore

Solving the Computational Challenge of City-Scale Microscopic Traffic Simulation


In recent years, traffic simulation has become one of the most used approaches for traffic analysis in support of the design and evaluation of traffic systems. In microscopic traffic simulations, road traffic is formed by driver-vehicle-units that have certain driving behaviors.

How to simulate the road traffic at a large scale, such as a city, at a high level of detail in real-time? To calculate the movement of large amount of driver-vehicle-units with a reasonable speed requires a large amount of computational resource. Parallel and distributed computing can be used to make such big simulations possible. Conventional parallel and distributed algorithms may not be suitable for traffic simulation. New algorithms which incorporate the characteristics of road traffic should be designed. 


The concept of parallel simulation in CityMoS is illustrated in the video.

As is well-known in the parallel and distributed computing community, to improve the speed of a program is not simply a matter of adding more processors. This is because parallel computing comes with overheads. Two most dominant overheads are load imbalance and synchronization. When the computational work load is distributed unevenly among the processors, some of the processors would not be utilized efficiently. Synchronization is necessary for processes (a part of the simulation that runs on a processor) to exchange data due to data dependencies. This incurs communication among processors.

Thus, in this research project, we investigate and design efficient load-balancing and synchronization algorithms for microscopic traffic simulation.


We have developed an innovative partitioning algorithms to partition the simulation to make use of distributed memory computers. Some preliminary results on the speed-up we achieved are shown below.

In a city-scale simulation scenario, AIDA’s partitioning algorithm is able to speed-up around 7 time compared to a non-parallelized simulation, reducing the simulation from more than 3 hours to less than half an hour. Its performance surpasses the popular partitioning methods Stripe, METIS, and Buffoon.


As is shown in the example above, our algorithms can significantly reduce the time required to run a simulation study. This speeds up the whole modelling and simulation process.

On-going work

We will continue to investigate more modular and easy-to-use ways to utilize supercomputers to improve the load-balancing and reduce synchronization overhead.