Articles I read this week, including privacy micro-networks, “so-so AI,” pieces on political modeling, and more.

Why private micro-networks could be the future of how we connect

Author: Tanya Basu

How I found this from Skynet Today

Summary: this is a critique of current social media from the angle of oversharing. Current platforms, the author argues, are based around carefully cultivating your online image and broadcasting highlights of your life. It’s not quite right for more close-knit communication. “Updating family about a vacation across platforms—via Instagram stories or on Facebook, for example—might not always be appropriate. Do you really want your cubicle pal, your acquaintance from book club, and your high school frenemy to be looped in as well?”

The social norms and wide-reaching audiences of platforms like Facebook, Instagram, and Twitter often make sharing inappropriate. Many people keep multiple accounts on these platforms (ever heard of finstas?) which are intended for different audiences. Micro-networks offer more control of your audience.

Thoughts this kind of reads like an advertisement for Cocoon, but I’ll focus on the ideas behind it. I largely agree with the point here: despite social media aiming to connect people, it often feels impersonal. It’s also the case that recent data mishaps have created a desire for a more privacy-focused, and perhaps more intimate, network. And platforms that optimize for engagement can create anxiety or otherwise have unintended effects, like on YouTube.

I’m not sure if something like Cocoon can take off. The free internet is supported by ads, and ads are so heavily ingrained in the norms of the internet that the idea of paying for online services seems outlandish. But I’ve recently found myself more willing to pay for ad-free, high-quality content (like Stratechery, and I’m also considering The Athletic or NYT / The Upshot), and I think this will slowly catch on.

Regardless, I appreciate this article for proposing a new alternative to the dominant social media platforms. The problems inherent in these platforms seem to be getting more obvious (maybe I’m just paying more attention), and questions about platform design are increasingly interesting to me.

The Democratic electorate on Twitter is not the actual Democratic electorate

Authors: Nate Cohn and Kevin Quealy

How I found this: when trying to find the Atlantic article about the same topic

Summary: this is an article characterizing how the Democratic electorate on Twitter differs from the electorate off Twitter. It summarizes data from the Hidden Tribes project. Some highlights:

  • 29% of Dems on social media identify as moderates or conservatives, but 53% of other Dems do
  • 48% of Dems on social media say political correctness is a problem in the US, compared to 70% of other Dems
  • 11% of Dems on social media are African American, compared to 24% of other Dems

Those who post on social media are also more likely to have a college degree, to be white, to have become more liberal throughout their life, and to donate to a political group. There are a number of other comparisons in the article, but the takeaway is that your average Twitter liberal is quantifiably different from your garden variety Democrat.

Thoughts: this is great, and absolutely consistent with my experience. Perhaps it’s the trend of social media to create outrage, but I often see stark differences between Democrats who I follow online and those offline. The Hidden Tribes report is certainly something that I want to dig into in the future. To me, the message is clear: social media is warping our perception of what’s normal.

So-So Artificial Intelligence

Author: Jessy Lin

How I found this: Skynet Today (on their website)

Summary: if we stop thinking about Artificial General Intelligence (AGI) for a moment, there are a lot of interesting questions about “narrow” task-specific AI. This article discusses what the authors call “so-so AI,” talking about how many AI applications (like customer service chatbots or facial recognition) are just “so-so,” but still disruptive enough to displace workers.

The author continued by discussing reasons for these trends, citing “the culture around AI in industry and academia” as the core concept. Research, the author argues, designs for humans out of the loop, meaning systems are rarely designed (and almost never evaluated) for humans to be working with them.

Thoughts: this is very well argued. I love the concept of human-centered AI, and it’s one of the broad research areas that I’m interested in. Designing AI to work with humans is a tough, interdisciplinary, and domain-specific problem. Some of the papers I’ve read recently have talked about possible human-in-the-loop applications of ML for online content moderation, and there are many other ideas out there too.

Bayesian Product Ranking at Wayfair

Authors: Dave Harris and Tom Croonenborghs

How I found this: one of my coworkers (thanks Zack!)

Summary: This is a Wayfair blog post discussing a new Bayesian system they built to identify products likely to appeal to people & present them to their customers. The first problem was to determine products’ general appeal, but this is complicated by the fact that products at the top are more likely to be ordered first.

They created a logistic regression that fits the probability that a customer will order product i in position j as logit(α + product effect[i] + position effect[j]. Product effects are due to product-specific characteristics, and position effects are due to the product’s position on the site.

Another challenged faced is sparsity; small counts can be super noisy, and so you always have to worry about e.g., a shower curtain that was shown to just one customer who ended up purchasing it (a 100% conversion rate!). The post is vague about how they handle this, but it sounds like some kind of partial pooling approach that shares information across products.

They’re also not just interested in orders; clicking a product, adding it to a cart, or saving it for later all also provide signals about a product’s appeal. When considering path-to-purchase, the team modeled this as p(order | product shown) = p(click | product shown) * p(add to cart | click) * p(order | add to cart). They can fit these stages separately and combine the results.

Finally, the model needs to be updated over time. Bayesian methods excel at incremental learning; they use (most of) the previous day’s posterior as the prior for the next day, and this almost entirely takes care of the problem.

Thoughts: it’s incredible how new ways of looking at a problem can lead to innovative solutions. This entire model is a logistic regression with “some other stuff” (pooled parameters, incrementally trained, etc.), and I admire the simplicity of it all.

I’m also coming to appreciate logistic regression much more now that I’m in the industry. It’s incredible how powerful it can be when you choose your parameters intelligently. The simple idea of “add effects for all of these together, scale it to (0, 1)” gets you really far, and its simplicity & interpretability are quite valuable.

Other, shorter articles

Making Machine Learning Scalable and Accessible at Grubhub is an article from Grubhub’s data science & MLE teams about the machine learning platform they’ve built. This is the latest in a long line of company-internal ML platforms, including Uber’s Michelangelo, Facebook’s FBLearner Flow, the one that I worked on at Nielsen, and more that I can’t find. The article is a good read, but not much that I haven’t seen before.

Reflecting on our Python 2 to 3 Project is an article about Rover’s migration from Python 2 to 3. It is a followup to a previous post they wrote, Moving Safely to Python 3, nearly a year ago.

Applying mypy to real world projects is some nice, practical advice on mypy. It’s more advanced than introductions or tutorials, and more practical than most resources I’ve seen. It links to the Dropbox post on type checking their codebase, which is also a great read.

The terrifying uncertainty at the heart of FiveThirtyEight’s election forecasts is an article from late 2018 that breaks down a lot of the forecasts that FiveThirtyEight makes. This is a clear, well-written piece that does a fantastic job of communicating the uncertainty inherent in predictions. FiveThirtyEight repeatedly stresses that outcomes that are 80% likely are not certainties, and this is often lost on people interpreting their predictions; this article does the opposite and instead breaks down that that means.