Articles I read this week, including “data science is different now” and a Washington Post editorial about regulating AI.

Data science is different now

Author: Vicki Boykis

How I found this: I think Twitter?

Summary: this is a long post about the recent history and current state of data science. The thesis is that there’s an important problem in data science: “an oversupply of junior data scientists hoping to enter the industry, and mismatched expectations on what they can hope to find once they do get that coveted title of data scientist.”

Boykis discusses at length the glut of aspiring data scientists trying to break into the field, citing the explosion of MOOCs, the rise of bootcamps, and even the creation of data science degrees by universities. She goes on to talk about unrealistic expectations for what “data science” is: everyone wants to do modeling, and no one wants to do data cleaning. It has also become clearer that data science is largely engineering work, and that engineering skills are very important.

She closes the post with advice to aspiring data scientists: don’t aim directly for a data science job (there aren’t very many and you have a lot of competition), and learn the skills needed for data science (SQL, a programming language, and cloud skills). “Pick a small piece of something and start there. Do something small. Learn something small, build something small.”

Thoughts: this is a great post. I haven’t found many resources that discuss the state of data science with as much nuance as this one. The advice that “your first job might not be a data science job” is some of the best I’ve seen, and the list of problems is remarkably comprehensive. It furthers my belief that data science requires strong engineering skills with a data background; whether that’s more stats, visualization, or modeling is less important.

Here’s how to regulate AI properly

Author: R. David Edelman (Washington Post)

How I found this: probably Skynet Today

Summary: the White House published regulatory guidelines for AI earlier this week. The memo is about what AI should look like in practice, and not about what ethical AI is or about any specific technology. This article calls for the government to craft “substantive, tailored AI policies” that look at the ways specific technologies are used in specific contexts.

“Regarding AI as a truly singular technology is a mistake, one that puts us at risk of missing out on its potential while also inviting algorithmic dystopia.”

The author brings up the European GDPR as an example of a well-intentioned policy that works in many cases but also has disastrous side effects (for example, on small companies). “Developing these policies will be hard, technical work. But it’s the only way we can weigh values in conflict and ensure that AI systems are used for us — and not against us,” the author concludes, and I decidedly agree.

Thoughts: this is a good take. The core point about “AI” not being a monolithic type of technology is hugely important, and it speaks to something even more general about the “tech industry.” Ben Thompson has written about this a few times:

“That leads to a broader point: “tech” is not simply another category, like railroads or telecom. Tech is a means, not an end, but Senator Warren’s approach presumes the latter. That is why she proposes the same set of rules for the sale of toasters and the sale of apps, and everything in between. The truth is that Amazon is a retailer; Apple a combination of hardware maker and platform makers. Google is a search and advertising company, and Facebook a publishing and advertising company.”

AI is not a technology; AI is a field of study and practice. The industry matters too: regulating credit-scoring ML systems in banks must differ from regulating audience-measurement ML systems in media companies.

I do worry about incompetent or otherwise inadequate regulation, though. Mark Zuckerberg’s Senate hearing in April 2018 revealed that an alarming number of lawmakers fundamentally misunderstand how Facebook (and tech more broadly) works. This leaves me with little faith in their ability to regulate systems that even technology professionals have difficulty making sense of.

Finally, I’ll readily admit that policy in general is one of my blind spots. I wish I knew more about it and have been reading more about it in recent months, but it’s still something that I don’t feel all that confident writing about yet.