What I read this week (December 6 - December 12)

2020-12-12 481 words what I read,

Building a feature store at DoorDash, radial data visualizations, and writing programming comics.

Building a Gigascale ML Feature Store with Redis, Binary Serialization, String Hashing, and Compression by Arbaz Khan and Zohaib Sibte Hassan, from the DoorDash engineering blog, is a technical deep dive into their ML feature store. Features are inputs to machine learning models, like the device from which an order was made or a list of restaurants that a customer has ordered from. Feature stores make these inputs available to models running in production.

DoorDash had billions (?!) of features, associated with customers, merchants, and menu items. They needed millions of lookups per second (which each had to be very fast), and mostly-nightly (but sometimes more frequent) full data refreshes that could take longer. They benchmarked candidate key/value stores (Cassandra, CockroachDB, Redis, ScyllaDB, YugabyteDB), finding Redis to be fastest, then focused their efforts on optimizing it.

I think this was a very well-written engineering blog post. The authors clearly outlined their requirements for people outside DoorDash to understand, and used these to describe a methodical approach to building out their feature store. (I have to wonder if it was so methodical in practice, too!)

Sidenote: ML feature stores feel like a problem that many companies have to reinvent in house; Eugene Yan maintains a list of some of these. I wonder when we’ll start seeing more standardization in this space.

Why use a radial data visualization? from Kerry Rodden on Observable gives a deep dive into radial visualization formats, where data is laid out circularly, instead of on the XY plane. Rodden discusses why you might choose this kind of visualization (they’re visually interesting, which can drive engagement; they also work well for periodic time series data). Drawbacks include misleading areas (our perception of sizes of wedges is unreliable) and being harder to immediately understand.

That all was interesting—but the post included amazing examples built in D3.js. They were flawlessly integrated with the post, inviting me to explore the ideas the author was presenting interactively. Above all, I appreciate how much work must have gone into building this!

Electoral college decision tree by Kerry Rodden was referenced in the above post; it’s a radial data visualization of electoral outcomes in the 2020 presidential election. This way of presenting the data is more complicated than others, like the NYT (I had to link a tweet because the actual page just shows Biden winning now). But it’s more visually interesting, and I’m sure it was fun to make.

How I write useful programming comics by Julia Evans was a nice look into her process for creating her amazing comics. Her comics are typically about surprising or hidden facts, lists of important things, or stories. I really love posts that show me part of the creative process of people I respect, or people whose work I admire, so I enjoyed this.

Return home