Abstract

The 3-phase task execution model has been shown to be a good candidate for tackling the memory bus contention problem. It divides the execution of tasks into computation and memory phases, enabling a fine-grained memory bus contention analysis. However, existing works on bus contention analysis for 3-phase tasks neglect the fact that memory bus contention strongly relates to the number of bus/memory requests generated by tasks, which, in turn, depends on the content of the cache memories during the execution of those tasks. These works assume that the worst-case number of bus/memory requests is generated during every memory phase of every task, irrespective of the content already present in the cache memory. This overestimates the memory bus contention of tasks, leading to pessimistic worst-case response time (WCRT) bounds.
This work proposes a holistic approach to bus contention analysis for 3-phase tasks by (1) deriving an upper bound on the actual cache misses of tasks that lead to bus/memory requests; (2) improving the State-of-the-Art (SoA) bus contention analysis for the two bus arbitration schemes that dominate all existing works on bus contention analysis for 3-phase tasks; and (3) performing an extensive experimental evaluation under different settings to compare the proposed analysis against the SoA. Results show that incorporating a tighter bound on the number of cache misses of tasks into the bus contention analysis can lead to a significant improvement in task set schedulability.
Published at: 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA’23)
Introduction
The adoption of multicore platforms in hard real-time systems, i.e., systems that run applications with stringent timing requirements, is still under the scrutiny of academia and industry. The main challenge that hinders the use of commercial off-the-shelf (COTS) multicore platforms in hard real-time systems is their unpredictability, which originates from the sharing of different hardware resources, e.g., Last-Level Cache (LLC), the interconnect (e.g., memory bus), and the main memory. A task executing on one core of a multicore platform has to compete with other co-running tasks (running on other cores) to access these shared resources. For instance, the shared memory bus connects all the cores to the main memory. Due to such sharing, tasks running on different cores may have to compete to access the memory bus in order to read/write data/code from/to the main memory, resulting in bus contention. This bus contention can significantly increase the Worst-Case Execution Time (WCET) and Worst-Case Response Time (WCRT) of tasks.
The 3-phase task model [3], [19] has been extensively used by the state-of-the-art to tackle the memory/bus contention problem [1], [5], [8], [9], [16], [18]. The 3-phase task model divides the execution of each task into distinct computation and memory phases such that a task can only access the shared memory bus/main memory during its memory phases, while the core executes the task during the computation phase without accessing the shared bus/main memory. Although the 3-phase task model makes the bus/memory access patterns of tasks more predictable, 3-phase tasks can still suffer bus/memory contention, e.g., when multiple concurrent tasks running on different cores try to access the bus/memory. Considering the impact bus contention can have on the WCET/WCRT of tasks, several works in the state-of-the-art [1], [8], [9], [16], [25] have focused on analyzing the maximum bus contention that can be suffered by 3-phase tasks and its impact on taskset schedulability. In a few recent works, Arora et al. [8], [9] have presented bus contention analyses for 3-phase tasks that dominate all existing bus contention analyses for 3-phase tasks. However, when computing bus contention, these works [8], [9] assume that the number of bus/memory requests that can be generated during the memory phases of each job of a task is always equal to its worst-case memory access demand in isolation. Although this assumption is safe, it can lead to pessimistic results, especially for platforms with cache memories. Caches are smaller, faster memories that store the recently referenced data/instructions of tasks and allow for data re-use, i.e., cache content fetched during the execution of one job of a task may be re-used during the execution of a subsequent job of the same task [20]. This can significantly reduce the number of bus/memory requests generated during the execution of subsequent jobs of tasks.
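To make this re-use effect concrete, the following toy sketch (ours, not taken from the paper; the `Cache` class, the block granularity, and the numbers are purely illustrative) shows how cache content surviving between two jobs of the same 3-phase task can eliminate bus requests in the second job's memory phase:

```python
class Cache:
    """Toy fully-associative cache holding block identifiers."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = set()

    def fetch(self, needed):
        """Memory phase: load the needed blocks and return the number
        of cache misses, i.e., bus/memory requests actually issued."""
        misses = [b for b in needed if b not in self.blocks]
        for b in misses:
            if len(self.blocks) >= self.capacity:
                self.blocks.pop()          # evict an arbitrary block
            self.blocks.add(b)
        return len(misses)

cache = Cache(capacity=8)
working_set = set(range(6))                # blocks used by the task

# Job 1: cold cache, so every block fetch is a bus request
misses_job1 = cache.fetch(working_set)

# Job 2: the content survived, so the memory phase issues no requests
misses_job2 = cache.fetch(working_set)

print(misses_job1, misses_job2)            # -> 6 0
```

An analysis that charges the worst-case demand (6 requests) to every job would, in this toy scenario, overestimate the second job's bus traffic entirely.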
However, this re-use is not accounted for by [8], [9], and hence these works overestimate the bus contention of tasks.
In this work, we exploit the interdependence between the cache memory, bus requests, and bus contention, i.e., the fact that the bus contention suffered by tasks depends on the number of bus requests, which in turn depends on the number of cache misses, to improve the analysis presented in [8], [9]. First, we analyze the cache to tightly upper bound the number of cache misses that lead to bus/memory requests during the memory phases.
We then improve the existing bus contention analyses [8], [9] by incorporating a tighter bound on the number of cache misses/bus requests into the analysis. Finally, we propose a WCRT-based schedulability analysis by integrating the bounds on bus contention into the WCRT of tasks. Formally, our main contributions are:
(1) Upper bounding cache misses of 3-phase tasks to compute bus/memory requests generated during the memory phases of tasks;
(2) Improving the existing memory bus contention analyses for 3-phase tasks under fixed-priority partitioned scheduling [8], [9] using the derived bound on cache misses;
(3) Integrating the bounds on bus contention into a WCRT formulation to perform schedulability analysis; and
(4) Performing an extensive empirical evaluation under different settings to compare our improved bus contention analyses against the existing ones [8], [9]. Experimental results show that bus contention analyses that account for the interdependence between cache misses and bus requests can improve taskset schedulability by up to 55%.
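As a rough illustration of how such bounds enter a WCRT-based schedulability test, the sketch below iterates the classic fixed-point response-time recurrence with a deliberately simplified contention model: each bus/memory request stalls the task for a constant `BUS_LAT`, and `M[j]` is a per-job bound on the cache misses (and hence bus requests) of task j. All names and the contention model itself are our simplifying assumptions, not the exact analysis of [8], [9]:

```python
import math

BUS_LAT = 2   # hypothetical cost of one bus/memory request (time units)

def wcrt(i, C, T, M, hp, remote):
    """Fixed-point WCRT of task i under fixed-priority partitioned
    scheduling. C = bus-free WCETs, T = periods (= deadlines),
    M = per-job cache-miss bounds, hp = higher-priority tasks on the
    same core, remote = tasks running on other cores."""
    R = C[i] + M[i] * BUS_LAT               # own execution + own requests
    while R <= T[i]:
        # CPU interference from same-core higher-priority jobs
        cpu = sum(math.ceil(R / T[j]) * (C[j] + M[j] * BUS_LAT)
                  for j in hp)
        # bus contention: each remote job may delay task i by one bus
        # slot per request it can issue within the window R
        bus = sum(math.ceil(R / T[j]) * M[j] * BUS_LAT for j in remote)
        R_new = C[i] + M[i] * BUS_LAT + cpu + bus
        if R_new == R:
            return R                        # converged: schedulable
        R = R_new
    return None                             # exceeds period: unschedulable

# Tiny example: task 1 with one local higher-priority task (0) and one
# remote task (2); tightening M[2] would directly shrink the result.
C, T, M = [1, 3, 2], [10, 20, 20], [1, 2, 3]
print(wcrt(1, C, T, M, hp=[0], remote=[2]))   # -> 19
```

The point of the sketch is the coupling our analysis exploits: a tighter `M[j]` immediately reduces both the `cpu` and `bus` terms, and thus the computed WCRT.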
Acknowledgement
This work was supported by the CISTER Research Unit (UIDP/UIDB/04234/2020), financed by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology); by project ADACORSA (ECSEL/0010/2019 – JU grant nr. 876019) financed through National Funds from FCT and European funds through the EU ECSEL JU. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Austria, Sweden, Spain, Italy, France, Portugal, Ireland, Finland, Slovenia, Poland, Netherlands, Turkey – Disclaimer: This document reflects only the author’s view and the Commission is not responsible for any use that may be made of the information it contains. This work is also a result of the work developed under project Aero.Next Portugal (nº C645727867-00000066) and FLY-PT (grant nº 46079, POCI-01-0247-FEDER-046079). It was also funded by FCT under PhD grant 2020.09532.BD.
Access Complete Publication
For an in-depth exploration of our findings and methodology, download the full paper here
