magicstack

uvloop: Blazing fast Python networking

by Yury Selivanov @1st1
May 03, 2016

TL;DR

asyncio is an asynchronous I/O framework shipping with the Python Standard Library. In this blog post, we introduce uvloop: a full, drop-in replacement for the asyncio event loop. uvloop is written in Cython and built on top of libuv.

uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, and any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.

asyncio & uvloop

The asyncio module, introduced by PEP 3156, is a collection of network transports, protocols, and streams abstractions, with a pluggable event loop. The event loop is the heart of asyncio. It provides APIs for:

  • scheduling calls,
  • transmitting data over the network,
  • performing DNS queries,
  • handling OS signals,
  • convenient abstractions to create servers and connections,
  • working with subprocesses asynchronously.
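As a minimal sketch of the first item in the list above (scheduling calls), the snippet below uses only the standard asyncio loop API. The function name run_scheduling_demo and the delay values are illustrative choices, not something taken from uvloop itself.

```python
import asyncio


def run_scheduling_demo():
    """Schedule callbacks on a fresh event loop and return the order they ran in."""
    order = []
    loop = asyncio.new_event_loop()
    try:
        # call_later() runs the callback after the given delay in seconds.
        loop.call_later(0.02, order.append, "later")
        # call_soon() runs the callback on the next loop iteration.
        loop.call_soon(order.append, "soon")
        # Stop the loop once both callbacks have had time to run.
        loop.call_later(0.05, loop.stop)
        loop.run_forever()
    finally:
        loop.close()
    return order
```

Because the same methods are part of the event loop interface, the code should behave the same way on a uvloop instance.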

As of this moment, uvloop is only available on *nix platforms and Python 3.5.

uvloop is a drop-in replacement of the built-in asyncio event loop. You can install uvloop with pip:

$ pip install uvloop

Using uvloop in your asyncio code is as easy as:

import asyncio
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

The above snippet makes any asyncio.get_event_loop() call return an instance of uvloop.

You can also create a uvloop instance explicitly by calling uvloop.new_event_loop().
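A hedged sketch of the explicit variant follows. The coroutine add and the helper run_on_explicit_loop are hypothetical names used only for illustration, and the fallback to the stock asyncio loop is an assumption added so the snippet still runs where uvloop is not installed.

```python
import asyncio

try:
    import uvloop
except ImportError:  # assumption: fall back to plain asyncio if uvloop is missing
    uvloop = None


async def add(a, b):
    """Trivial coroutine that yields to the loop once before returning."""
    await asyncio.sleep(0)
    return a + b


def run_on_explicit_loop():
    """Create a loop explicitly, run one coroutine on it, and close it."""
    if uvloop is not None:
        loop = uvloop.new_event_loop()
    else:
        loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(add(2, 3))
    finally:
        loop.close()
```

Creating the loop explicitly leaves the global event loop policy untouched, which can be handy in tests or when only part of a program should run on uvloop.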

Architecture

uvloop is written in Cython and is built on top of libuv.

libuv is a high-performance, multiplatform asynchronous I/O library used by nodejs. Because nodejs is so widespread and popular, libuv has been exercised heavily in production, and it is fast and stable.

uvloop implements all asyncio event loop APIs. High-level Python objects wrap low-level libuv structs and functions. Inheritance is used to keep the code DRY and ensure that any manual memory management is in sync with libuv primitives' lifespans.

Benchmarks

To check how the performance of uvloop stacks up against other implementations, we have created a toolbench to benchmark TCP and UNIX socket I/O, as well as the performance of the HTTP protocol.

Benchmarked servers run inside a Docker container with an outside load-generating tool (wrk for HTTP benchmarks) that measures request throughput and latency.

All benchmarks in this blog post were run on an Intel Xeon CPU E5-1620 v2 @ 3.70GHz, and Ubuntu Linux. We use Python 3.5, and all servers are single-threaded. Additionally, we use GOMAXPROCS=1 for Go code, nodejs does not use cluster, and all Python servers are single-process. Every benchmark sets the TCP_NODELAY flag.
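For readers unfamiliar with the TCP_NODELAY flag mentioned above, here is a minimal sketch of how it can be set on a plain socket with the standard library. The helper name make_nodelay_socket is illustrative; the actual benchmark servers may set the flag differently.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY sends small writes immediately instead of coalescing them,
    # which matters for request/response benchmarks with small messages.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```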

Benchmarks on Mac OS X exhibit very similar results.

TCP

This benchmark tests the performance of a simple echo server with different message sizes. We use 1, 10, and 100 KiB messages. The concurrency level is 10. Each benchmark was run for 30 seconds.

See also the full TCP benchmarks report.

Some comments on each position:

  1. asyncio-streams. asyncio with its built-in, pure-Python event loop. In this benchmark, we test the performance of the high-level streams abstraction. We use asyncio.start_server() to create a server that passes a pair of (reader, writer) to the client coroutine.
  2. tornado. This server implements a very simple Tornado protocol, which immediately sends back any data it receives.
  3. curio-streams. Curio is the new kid on the Python aio lib block. Similarly to asyncio-streams, in this benchmark we test curio streams, using curio.make_streams() to create a pair of (reader, writer) that provide high-level APIs such as readline().
  4. twisted. Similarly to Tornado, here we test a minimal echo protocol.
  5. curio. This benchmark tests the performance of curio sockets: a tight loop of sock.recv() and sock.sendall() coroutines.
  6. uvloop-streams. As in #1, here we test the performance of asyncio high-level streams, but this time on top of uvloop.
  7. gevent. We use gevent.StreamServer and a gevent socket to send/receive data in a tight loop.
  8. asyncio. It appears that vanilla asyncio is quite fast! Similarly to #2 and #4, here we test the performance of a minimal echo protocol implemented in pure-Python asyncio.
  9. nodejs. We use the net.createServer API to test the streams performance in nodejs v4.2.6.
  10. uvloop. This benchmark tests a minimal echo protocol (as in #2, #4, #8) implemented in asyncio, on top of uvloop. With 1 KiB messages, uvloop is the fastest implementation with a whopping 105,000 requests per second! For 100 KiB messages, uvloop manages to pump through about 2.3 GiB/s.
  11. Go. A tight loop of net.Conn.Read/Write calls. Golang performance is very similar to uvloop, slightly better for 10 and 100 KiB messages.
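To make the difference between the "protocol" and "streams" variants in the list above concrete, here is a minimal sketch of both echo styles in asyncio. It is not the benchmark code itself: the names EchoProtocol, echo_stream and roundtrip are illustrative, and the snippet assumes a modern Python (3.7 or later) for asyncio.run() and asyncio.get_running_loop().

```python
import asyncio


class EchoProtocol(asyncio.Protocol):
    """Low-level style: callbacks are invoked directly by the event loop."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Send back whatever was received, with no buffering layer in between.
        self.transport.write(data)


async def echo_stream(reader, writer):
    """High-level style: a coroutine that receives a (reader, writer) pair."""
    while True:
        data = await reader.read(65536)
        if not data:  # an empty read means the peer closed the connection
            break
        writer.write(data)
        await writer.drain()
    writer.close()


async def roundtrip(payload, use_streams):
    """Start an echo server on a free local port and bounce one message off it."""
    loop = asyncio.get_running_loop()
    if use_streams:
        server = await asyncio.start_server(echo_stream, "127.0.0.1", 0)
    else:
        server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(payload)
    await writer.drain()
    echoed = await reader.readexactly(len(payload))
    writer.close()
    await writer.wait_closed()
    server.close()  # stop accepting new connections
    return echoed
```

The protocol version does less work per message, which is consistent with the protocol-based entries ranking above their streams counterparts in the list.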

How to read box-charts: the marks show, from top to bottom, the 99th, 75th, 50th, and 25th percentiles and the minimum.

Code for all benchmarks can be found here.

See also UNIX sockets benchmark results.

HTTP

Initially, we wanted to test aiohttp on asyncio and uvloop against nodejs and Go. aiohttp is the most popular framework for writing asynchronous HTTP servers and clients with asyncio.

See also the full HTTP benchmarks report.

However, the performance bottleneck in aiohttp turned out to be its HTTP parser, which is so slow that it matters very little how fast the underlying I/O library is. To make things more interesting, we created a Python binding for http-parser (the nodejs HTTP parser C library, originally developed for Nginx). The library is called httptools, and is available on GitHub and PyPI.

For HTTP, all benchmarks use wrk to generate the load. The concurrency level is set to 300. The duration of each benchmark is 30 seconds.

Quite surprisingly, pure-Python asyncio, with the help of a high-performance HTTP parser, is faster than nodejs, which uses the same HTTP parser!

Go is faster for 1 KiB responses, but uvloop+asyncio is measurably better for 10/100 KiB responses. The quality of service is excellent for asyncio and uvloop with httptools, as well as for Go.

Admittedly, the httptools-based server is very minimal and does not include any routing logic, unlike the other implementations. Nonetheless, the benchmark demonstrates how fast uvloop can be with an efficiently implemented protocol.

Conclusion

It is safe to conclude that, with uvloop, it is possible to write Python networking code that can push tens of thousands of requests per second per CPU core. On multicore systems a process pool can be used to scale the performance even further.
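One possible way to scale across cores, sketched here under stated assumptions, is to bind a listening socket once in a parent process and let several forked workers accept connections on it, each running its own event loop. The helper names (make_listening_socket, serve, start_workers) are hypothetical, the "fork" start method limits the sketch to *nix systems, and other designs (for example a separate socket per worker) are equally valid.

```python
import asyncio
import multiprocessing
import socket


class EchoProtocol(asyncio.Protocol):
    """Minimal echo protocol: write back whatever arrives."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)


def make_listening_socket(host="127.0.0.1", port=0):
    """Bind and listen once in the parent; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def serve(sock):
    """Worker entry point: one event loop accepting on the shared socket."""
    async def main():
        loop = asyncio.get_running_loop()
        server = await loop.create_server(EchoProtocol, sock=sock)
        await server.serve_forever()

    asyncio.run(main())


def start_workers(sock, count):
    """Fork `count` worker processes that all accept on the same socket."""
    ctx = multiprocessing.get_context("fork")  # assumption: *nix only
    workers = [
        ctx.Process(target=serve, args=(sock,), daemon=True)
        for _ in range(count)
    ]
    for worker in workers:
        worker.start()
    return workers
```

To run the workers on uvloop, each one could call asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()) at the top of serve() before starting its loop.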

uvloop and asyncio, combined with the power of async/await in Python 3.5, make it easier than ever to write high-performance networking code in Python.

Please try uvloop (github) and share your results with us!