This week’s reading was all over the place: a talk on Docker and Python for data science, authorial intent in code, the book Clean Code, and an essay on NPM.

[Talk] Docker and Python: making them play nicely and security for data science and ML

Speaker: Tania Allard

This talk, from (the online) PyCon 2020, was a data science-focused introduction to Docker. The start of the talk was a pretty standard overview of what Docker is and why you might want to use it. Allard explained both clearly, which I appreciated, and her slides were amazing. I learned about tools like repo2docker and cookiecutter templates for Docker.

Allard concluded by discussing automation. Using CI to rebuild your Docker images on new pushes and every 2-ish weeks can be a good practice, since it helps you keep dependencies up to date or (at the very least) install security updates. She showed an example workflow using (I think?) GitHub Actions.

I wish the talk had gone into a little more depth, but otherwise it was quite good!

Code Only Says What It Does

Author: Mark Brooker

In this blog post, Mark Brooker makes the point that code only tells you what it does. As a result, understanding intent is one of the most challenging parts of handling software:

Since most software doesn’t have a formal spec, most software “is what it does”, there’s an incredible pressure to respect authorial intent when editing someone else’s code. You don’t know which quirks are load-bearing.

Brooker suggests design documentation to help with this. Document not only the how and what behind the designs, but also how you decided. What assumptions did you have when building the system? What cases did you believe were out of scope? What did you think the software would, and would not, have to handle?

Oh, and comments. Those help too.

It’s probably time to stop recommending Clean Code

I’m not sure who the author is, but this is a post tearing apart Robert Martin’s Clean Code, a widely recommended book about writing good software. “The major problem I have with Clean Code is that a lot of the example code in the book is just dreadful.”

My criticism of the book, ever since I was first introduced to it, has been that it is too Java-focused. The “everything in a class, clear public/private distinctions” approach didn’t sit well with my Python-trained self, but I chalked that up to different languages having different patterns. Here, though, the author calls out specific examples from Clean Code as being simply terrible code.

The post (unlike the example code!) is quite readable. I recommend it (and do not recommend the book).

Worrying about the NPM Ecosystem

Author: Sam Bleckley

The NPM ecosystem seems unwell. If you are concerned with security, reliability, or long-term maintenance, it is almost impossible to pick a suitable package to use—both because there are 1.3 million packages available, and even if you find one that is well documented and maintained, it might depend on hundreds of other packages, with dependency trees stretching ten or more levels deep—as one developer, it’s impossible to validate them all.

This is a much better-argued take than The NodeJS Ecosystem is Chaotic and Insecure, which makes the rounds every few months, though it ultimately comes to the same conclusions. Bleckley takes a data-driven approach by charting the dependencies of packages in NPM. Ideally, a package's dependency tree would be shallow, with few indirect dependencies and few cycles.
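To make those three measures concrete for myself, here is a minimal sketch in Python. It is not Bleckley's method or code; the dependency graph is a made-up toy (the package names are hypothetical, not real NPM data), and the function names are my own. It computes the indirect dependencies of a package, the length of its longest dependency chain, and whether a cycle is reachable from it.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy graph: package name -> direct dependencies.
# These names are invented for illustration only.
DEPS: Dict[str, List[str]] = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["util"],
    "log": ["util"],
    "util": [],
    "a": ["b"],  # "a" and "b" depend on each other (a cycle)
    "b": ["a"],
}


def transitive_deps(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Every package reachable from root, direct or indirect, excluding root."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    seen.discard(root)  # root can reappear if it sits on a cycle
    return seen


def indirect_deps(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Dependencies that root pulls in without listing them itself."""
    return transitive_deps(graph, root) - set(graph.get(root, []))


class CycleFound(Exception):
    """Raised internally when a dependency cycle is reachable."""


def max_depth(graph: Dict[str, List[str]], root: str) -> Optional[int]:
    """Length of the longest dependency chain below root.

    Returns None when a cycle is reachable, since depth is undefined there.
    """
    memo: Dict[str, int] = {}
    in_progress: Set[str] = set()

    def visit(pkg: str) -> int:
        if pkg in memo:
            return memo[pkg]
        if pkg in in_progress:
            raise CycleFound(pkg)
        in_progress.add(pkg)
        depth = 0
        for dep in graph.get(pkg, []):
            depth = max(depth, 1 + visit(dep))
        in_progress.discard(pkg)
        memo[pkg] = depth
        return depth

    try:
        return visit(root)
    except CycleFound:
        return None


if __name__ == "__main__":
    print(sorted(indirect_deps(DEPS, "app")))  # ['http', 'util']
    print(max_depth(DEPS, "app"))              # 3
    print(max_depth(DEPS, "a"))                # None (cycle between a and b)
```

Even in this toy graph, "app" lists two dependencies but ends up relying on four packages, which is the kind of gap the essay's charts are getting at.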

Why is the ecosystem like this? NPM makes it easy to create a package; the package.json file is used to define projects and packages alike. And JavaScript has no standard library. That packages are easy to make isn’t bad: it enables people to get into programming and feel like they’re doing something “real” much more easily than in, say, Python.

“I suggest that this is a social problem, more than a technical one, and propose a semi-social solution”, he writes.

Bleckley proposes labeling “healthy” packages, meaning those whose maintainers are active, triage bugs, resolve security issues, etc., and then goes on to explain why this would never work. “The most popular javascript packages, especially large frameworks, have no impetus to dramatically change their practices just because some person in West Michigan made some graphs and said it would be nice.” How thoughtfully self-aware!

I cannot offer a batteries-included answer, but I want you to walk away with these conclusions:

  • Something is wrong with the javascript ecosystem.
  • It’s not just a feeling; it’s measurable.
  • It’s not really a technical problem, but mostly a social one.
  • A functional solution will not be to change how packages are published, but how packages are selected for use.
  • That solution will need to take into account not just a package, but all its direct and indirect dependencies, too.
  • And, for approval and adoption, it will need the social clout of major names in the field.