Abstract |
The emergence of programmable switching ASICs has substantially realized the goal of flexible, high-speed, programmer-friendly packet processing. A critical concern when compiling a program onto such an ASIC, however, is the depth of the program's pipeline layout. The depth of a pipeline is correlated with its latency, which in turn is correlated with pipeline power consumption and FLOP count. Further, because many commercial pipelines such as RMT and Flexpipe cannot allocate memory freely to stages, noncompact pipelines can leave significant memory unused in underloaded stages. Inter-table control and data dependencies critically limit the ability of compilers to lay out tables compactly, but the design of pipelines that can resolve these dependencies has not previously been studied. In this paper, we introduce precedence, an extension of the RMT pipeline architecture that enables tables linked by dependencies to be executed in parallel or even out of order. We analyze precedence and show that it resolves up to 90% of dependencies in our benchmark program suite, reduces pipeline depth by up to 40%, and can be implemented with lightweight and inexpensive hardware. |