The ability to have reliable and ordered delivery of a group of messages can facilitate and simplify many distributed applications. To achieve ordering, existing approaches either employ centralized sequencers or tokens, thus suffer from limited scalability, or use distributed consensus protocols, which incurs high overhead in bandwidth and delay.
This paper describes Reliable Ordered Message Scattering (ROMS), an efficient and scalable method to scatter groups of messages in serializable order via data center network. ROMS is scalable as it distributes work to each switch and end host. It is reliable in the sense that a message is guaranteed to be delivered exactly once to a nonfaulty host. At its core, ROMS separates the bookkeeping of order information from message forwarding. ROMS aggregates order information using in-network computation at switches. This forms the “control plane” of the system. On the “data plane”, ROMS forwards messages in the network as usual and reorders them at the receiverend based on the order information.
We build two ROMS prototypes using Barefoot and Arista switches. Our evaluation shows that ROMS achieves high performance and fault tolerance with low CPU and network overheads. As case studies, ROMS improves atomic multi-site operation throughput by 50x under YCSB+T workload, achieves 100x scalability for TPC-C Payment transactions and scales serializable log replication.
Note: Our ongoing research has significant improvement over the preliminary poster publication.
- Oct. 2017, Second Place, SOSP’17 Student Research Competition (SRC) Undergraduate Category by Gefei Zuo.