I did a lot of reading this week—articles shared on Twitter, from newsletters, or passed along by friends. Topics include AI bias, data science careers, technical debt and testing data science systems, and more.

AI bias and the problems of ethical locality

Author: Amanda Askell (research scientist at OpenAI)

In this post, Dr. Askell argues that there are two “ethical locality” problems in discussing (and attempting to reduce) AI bias:

The first ethical locality problem is the problem of practical locality: we are limited in what we can do because the actions available to us depend on the society we find ourselves in. The second ethical locality problem is the problem of epistemic locality: we are limited in what we can do because ethical views evolve over time and vary across regions.

Practical locality: the options available to an AI decision maker depend on the practices of the society it is embedded in. “This means that even the most procedurally fair choice can reflect the unfair practices of this society.” The author gives an example in gendered hiring based on a hypothetical 1860s factory.

Getting rid of the unfairness that we have inherited from the past—such as different levels of investment in education and health across nations and social groups—may require proactive interventions. We may even want to make decisions that are less procedurally fair in the short-term if doing so will reduce societal unfairness in long-term. For example, we could think that positive discrimination is procedurally unfair and yet all-things-considered justified.

Epistemic locality: as we learn more about the world, our ethical views change. What might have been considered fair in the 1860s (not hiring someone because of a physical disability, for example) would be considered unfair today (when we would attempt to find an accommodation).

In general, I think a good rule of thumb is ‘if a problem hasn’t been solved despite hundreds of years of human attention, we probably shouldn’t build our AI systems in a way that presupposes finding the solution to that problem.’

This was a fairly long post, but remarkably readable. Some of the author’s other posts look great, too, and I’m looking forward to reading more.

Are dashboards dead?

Author: Tristan Handy (found via Twitter)

You know Betteridge’s law of headlines, where if the headline is a yes/no question, the answer is “no”? This is a glaring example.

But the core idea behind this post is that the reporting side of data (dashboards, notebooks, slides, etc.), like so much else, “is not a technical problem, it is a people problem.” Democratization is nice, but pure democratization without curation is confusing and a waste of effort.

Instead, try to “empower people to do something meaningful with data” via curation (only model useful data, only create useful dashboards), context (explain why things matter), and complexity (use systems with low barriers to entry).

“Dashboards aren’t dead,” they write, “but the post [which said that they are] did a great job of highlighting some problems with the status quo that I feel like no one’s paying enough attention to.”

Unpopular opinion - data scientists should be more end-to-end

Author: Eugene Yan

This is a personal blog post making the point that “it’s difficult to be effective when the data science process is split across different people.”

An end-to-end data scientist can identify and solve problems with data, to deliver value.

Being more end-to-end improves your ability to make an impact, Yan argues. It gives you more context on the problem (which is critical!) and reduces communication and coordination overhead. Yan cites organizational examples from Stitch Fix and Netflix.

Which aspects would disproportionately improve your ability to deliver as a data scientist?

This is a great question to reflect on. One of my near-term goals is to get more experience with productionalization and ML engineering. I think I’ll be referring to this post more often to remind me why.

Shorter articles

Google’s Advertising Platform Is Blocking Articles About Racism by Aaron Mak reveals that Google’s Ad Exchange platform has automatically demonetized articles reporting on racism. This included 10 articles on Slate, along with The Atlantic reprinting King’s “Letter From Birmingham Jail.”

The demonetizations can be (and were) appealed, and a Google spokesperson admitted that their moderation algorithm can sometimes single out keywords stripped of context. Still, the author points out (correctly, in my opinion):

Still, the demonetizations underscore the fact that publishers’ ability to make money from their work remains partially at the mercy of opaque algorithms that can be tweaked at any time, possibly with significant consequences for their business. The potential concerns might be especially pronounced for news organizations whose missions are to focus on topics, like racism and discrimination, that aren’t necessarily ad platform–friendly.

Effective testing for machine learning systems by Jeremy Jordan clearly motivates the value of tests in machine learning, and contrasts their role with that of tests in traditional software systems.

The difference: in traditional software, humans write logic to interact with data to produce desired behavior. In machine learning, humans provide desired behavior during training and testing, and use that with data to learn the logic.

The author splits tests into:

  • pre-train (to short-circuit training and identify issues ahead of time), like checking shapes, checking output ranges, making sure one gradient step decreases the loss, etc.
  • post-train (to verify that the model artifact did what we expect), which are much harder. One example is invariance (the sentences “Mark is a great instructor” and “Samantha is a great instructor” should have the same sentiment), and another is directionality (holding everything else constant, increasing X should cause an increase in the predicted value).
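To make the two categories concrete for myself, here is a minimal sketch of what such checks could look like. Everything in it is my own stand-in rather than code from the article: the tiny logistic regression, the toy dataset (`XS`, `YS`), the lexicon-based `toy_sentiment` scorer, and all function names are hypothetical and exist only to give the tests something to run against.

```python
import math

def predict(weights, bias, x):
    """Logistic-regression probability for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, xs, ys):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def gradient_step(weights, bias, xs, ys, lr=0.5):
    """One full-batch gradient-descent step; returns new (weights, bias)."""
    n = len(xs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = predict(weights, bias, x) - y
        grad_b += err / n
        for j, xj in enumerate(x):
            grad_w[j] += err * xj / n
    return [w - lr * g for w, g in zip(weights, grad_w)], bias - lr * grad_b

# Hypothetical toy data: the label is 1 when the first feature exceeds 0.5.
XS = [(i / 10, j / 10) for i in range(11) for j in range(11)]
YS = [1 if x1 > 0.5 else 0 for x1, _ in XS]

def run_pre_train_checks():
    """Cheap checks that can run before committing to a full training job."""
    w, b = [0.0, 0.0], 0.0
    preds = [predict(w, b, x) for x in XS]
    assert len(preds) == len(YS)                # shape check
    assert all(0.0 <= p <= 1.0 for p in preds)  # output range check
    before = log_loss(w, b, XS, YS)
    w_next, b_next = gradient_step(w, b, XS, YS)
    after = log_loss(w_next, b_next, XS, YS)
    assert after < before                       # one step should reduce the loss
    return before, after

def train(steps=300):
    """Plain gradient descent from a zero initialization."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        w, b = gradient_step(w, b, XS, YS)
    return w, b

# Stand-in sentiment "model" for the invariance check (an assumption, not a real model).
LEXICON = {"great": 1.0, "good": 0.5, "bad": -0.5, "terrible": -1.0}

def toy_sentiment(sentence):
    """Sum of lexicon weights; unknown words (such as names) score 0."""
    words = sentence.lower().rstrip(".").split()
    return sum(LEXICON.get(word, 0.0) for word in words)

def run_post_train_checks(w, b):
    """Behavioral checks on the trained artifact."""
    # Directionality: raising x1 with x2 held fixed should raise the prediction.
    low = predict(w, b, (0.2, 0.5))
    high = predict(w, b, (0.9, 0.5))
    assert high > low
    # Invariance: swapping the name should not change the sentiment.
    assert toy_sentiment("Mark is a great instructor") == toy_sentiment(
        "Samantha is a great instructor"
    )
    return low, high

if __name__ == "__main__":
    print(run_pre_train_checks())
    print(run_post_train_checks(*train()))
```

In a real project these would presumably live in a test suite and run against the actual model, with the pre-train checks gating the expensive training job.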

This was a great resource!

Technical debt for data scientists by Gordon Shotwell was shared in my work’s Slack. The author suggests addressing technical debt through testing (same story as always), documentation (an interesting idea was “treat documentation as a first-class skill in hiring”), robustness (general software development advice), and social interaction. The last is the most interesting—foster emotional safety on development teams to create trust and encourage feedback, then use that trust to do better code reviews.

Bike-share rebalancing is a classic data challenge. It just got harder. by Stephen Gossett on BuiltIn discusses many of the data challenges facing bike-share programs in NY. Classic demand estimation (inflow vs. outflow) fails badly, since both rentals and returns represent demand (for a bike, for a spot). Pandemic-driven changes in people’s behavior complicate the problem further.

Multi-armed bandits and the Stitch Fix experimentation problem by Brian Amadio is a Stitch Fix blog post describing their approach to experimentation via multi-armed bandits. The post builds up the theory of why you might want to use a multi-armed bandit, then explains the architecture of their production experimentation system.

It’s clear that Stitch Fix has done a lot of key infra work to enable data science to move quickly and have strong impacts on the business. Giving data scientists the tools to experiment in this way sounds valuable!
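For anyone who hasn't seen a multi-armed bandit before, here is a minimal generic sketch using Thompson sampling with Bernoulli rewards. This is a textbook illustration and not a description of Stitch Fix's system: the class name, the `simulate` helper, and the conversion rates are all hypothetical.

```python
import random

class BernoulliThompsonSampler:
    """Thompson sampling over arms with 0/1 rewards and Beta(1, 1) priors."""

    def __init__(self, n_arms, seed=None):
        self.successes = [1] * n_arms  # Beta alpha parameter per arm
        self.failures = [1] * n_arms   # Beta beta parameter per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one plausible conversion rate per arm, then play the best draw.
        draws = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # Update the posterior for the arm that was played.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

def simulate(true_rates, n_rounds=5000, seed=0):
    """Run the sampler against made-up true rates; return pulls per arm."""
    env_rng = random.Random(seed)
    sampler = BernoulliThompsonSampler(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = sampler.select_arm()
        reward = env_rng.random() < true_rates[arm]
        sampler.update(arm, reward)
        pulls[arm] += 1
    return pulls

if __name__ == "__main__":
    # Hypothetical conversion rates for three experiment variants.
    print(simulate([0.1, 0.2, 0.6]))
```

The appeal over a fixed-split A/B test is that traffic shifts toward the better-performing variant while the experiment is still running, so fewer users see the weaker options.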

And, finally, other things I read but don’t feel like summarizing (been doing lots of writing this week!):