A Scalable and Efficient Architecture for FPGA-based HTTPS Accelerator
Joint project with Tianyi Cui for Microsoft Hackathon 2016.
HTTPS is a protocol for secure connections to web services. Rising concern over user privacy on the Internet has led most Microsoft web services to offer access via HTTPS. Over the last 5 years, the share of HTTPS traffic has grown by 40% every 6 months, and it now accounts for more than 40% of all web connections.
HTTPS provides three security guarantees. First, it authenticates the identity of the web server during connection setup. Second, it encrypts data transmitted between the user and the web server. Third, it checks the integrity of the data. Of the three mechanisms, authentication is the most computationally intensive.
Without HTTPS, a single CPU core can process more than 7 thousand requests per second. With HTTPS, throughput drops by a factor of 35 because of the TLS handshake during connection setup. This high computational overhead has been the major obstacle to deploying HTTPS on high-traffic websites.
In the TLS handshake, the web server performs a decryption using its private key. For a 2048-bit private key, the decryption involves about 8 million integer multiplications. A CPU core has to execute these multiplications one by one. An FPGA is reconfigurable hardware with thousands of multiplication units that can operate in parallel. However, programs on a CPU can be arbitrarily long, while an FPGA has very limited resources to hold a program. TLS certificate parsing and protocol handling are complicated but require much less computation than decryption. So we modify the OpenSSL library to offload decryption to the FPGA and leave the other parts on the CPU.
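To make the multiplication count concrete, here is a minimal sketch of a private-key decryption, assuming RSA with the Chinese Remainder Theorem and plain square-and-multiply exponentiation. The write-up does not say which algorithms the accelerator uses, so this is an illustration only; the function names and the toy key are hypothetical. Each modular multiplication counted here would itself expand into many machine-word multiplications at real key sizes, which is how the total reaches the millions.

```python
def modexp_count(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (result, number of modular multiplications performed).
    """
    result = 1
    count = 0
    base %= modulus
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus  # one squaring per exponent bit
        count += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for set bits
            count += 1
    return result, count


def rsa_crt_decrypt(ciphertext, p, q, d):
    """Toy RSA decryption via CRT (no padding; illustration only)."""
    dp, dq = d % (p - 1), d % (q - 1)  # half-size exponents
    q_inv = pow(q, -1, p)              # modular inverse, Python 3.8+
    m1, c1 = modexp_count(ciphertext, dp, p)
    m2, c2 = modexp_count(ciphertext, dq, q)
    h = (q_inv * (m1 - m2)) % p        # recombine the two halves
    return m2 + h * q, c1 + c2


# Toy key (assumption): real keys use primes of about 1024 bits each.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))
message = 65
ciphertext = pow(message, e, n)
recovered, mults = rsa_crt_decrypt(ciphertext, p, q, d)
print(recovered, mults)
```

The two half-size exponentiations are independent of each other, and the word-level multiplications inside each modular multiplication are largely independent too. That independence is the kind of parallelism an FPGA's multiplication units can exploit.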
We build the accelerator on top of the Catapult FPGA platform, which has been deployed in Microsoft data centers. Catapult FPGAs have accelerated Bing search ranking by 20 times and Azure networking by more than 10 times. In this Hackathon, we apply the FPGA to HTTPS acceleration. Our preliminary implementation supports 12 thousand decryptions per second, which is 20 times faster than a single CPU core. In terms of power efficiency, the FPGA consumes a third of the power of a CPU.
In addition to high performance and power efficiency, our accelerator design has two more advantages. First, it is written in a high-level language. FPGAs are predominantly programmed with low-level hardware description languages, which are hard to program and hard to debug. We leverage the ClickNP framework from Microsoft Research Asia to program the FPGA in a high-level language. This lets software developers write accelerator code as well, and it makes development several times more efficient.
Second, we build the accelerator in a modular and scalable way. The accelerator needs to be co-located with other accelerators in the data center, and private key sizes are increasing over the years for better security. Each step of the decryption can be implemented with multiple algorithms: some are slower but use fewer resources, and others are faster but use more. Our design explores the design space automatically and generates the hardware with the best throughput for a given set of resource constraints and private key size.
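The design-space search can be pictured with a small sketch. It assumes the decryption is a pipeline whose throughput is bounded by its slowest stage, and that each stage has a few interchangeable variants with a known resource cost. The stage names, variant names, and numbers below are hypothetical, and the exhaustive search stands in for whatever exploration method the actual design uses.

```python
from itertools import product

# Hypothetical variants per stage: (name, resource units, operations per second).
STAGES = {
    "multiplier": [("compact", 10, 2000), ("wide", 40, 9000)],
    "reduction": [("iterative", 5, 3000), ("unrolled", 25, 12000)],
}


def best_design(stages, resource_budget):
    """Try every combination of variants and keep the fastest one that fits.

    Returns (throughput, resource cost, chosen variant names), or None
    if no combination fits within the budget.
    """
    best = None
    for combo in product(*stages.values()):
        cost = sum(variant[1] for variant in combo)
        if cost > resource_budget:
            continue  # does not fit next to the other accelerators
        throughput = min(variant[2] for variant in combo)  # slowest stage wins
        if best is None or throughput > best[0]:
            best = (throughput, cost, [variant[0] for variant in combo])
    return best


print(best_design(STAGES, resource_budget=50))
print(best_design(STAGES, resource_budget=100))
```

With a tight budget the search settles for a slower reduction stage so that the wide multiplier fits; with more resources it picks the fastest variant of every stage. Exhaustive enumeration is only practical for a handful of stages and variants.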
To summarize, our HTTPS accelerator on reconfigurable hardware will accelerate connection setup for HTTPS web servers by 20 times. The accelerator is based on the Catapult FPGAs already deployed in Microsoft data centers, so it does not require additional hardware. With the HTTPS accelerator deployed in data centers, Microsoft will be able to provide secure web services to both end users and cloud customers at significantly reduced cost.
Aug. 2016, Global 2nd place in Cloud and Enterprise Category, Microsoft Hackathon 2016. [Video]
Note: This HTTPS Accelerator project is mainly done by Tianyi Cui.