The afternoon keynotes were on racism in policing, data in COVID-19, and lessons from FiveThirtyEight. Dr. Phillip Goff delivered an incredible talk on what we need to invest in to support Black communities, and Nate Silver followed with what it means to think probabilistically and to work together in a crisis. Amazing!

Racism and Policing: The Path Forward

Speaker: Dr. Phillip Atiba Goff

Dr. Goff, from the Center for Policing Equity, discusses #BlackLivesMatter and what kinds of services we need to invest in. Data can lead the way, but it has to be thoughtfully applied.

On “defund the police”: only the most extreme position on this means “completely abolish the police.” Think about “defunding,” though: Black communities have been defunded for generations, with reduced investment in public education and health systems. “Instead of public health systems, we have law enforcement.”

We’ve defunded public education, we’ve defunded public health, we’ve defunded mental health … and the only public ‘good’ that we pour funding into … is the police. If we’re going to keep defunding Black communities, let’s take money out of law enforcement and put it into [all these other things].

Goff passionately argues for the need to invest in Black communities in other ways. But this can’t be done naively; you can’t just throw money at the problem and hope it works. This is where we need analytics: listen to Black communities to figure out what they need, of course, but then use data and analytics to help allocate that money.

There are lots of different things we have to do to reimagine public safety, Goff continues. And we won’t get there with only the people who write laws and argue for civil rights. “We’re going to need nerds to make sure this happens responsibly,” too; we’re going to need data to help us understand, for example, what’s going to happen in Minneapolis over the next year as they literally abolish their police.

With this moment being as big as it is, I implore all of you: … figure out a way to turn your “data nerd” status into “justice nerd” status. … We can do that now, and we couldn’t before.

An amazing talk—everyone should go watch the recording once it’s up.

Rapid Response Research for COVID-19 and Other Challenges: Machine Learning and Data Science at Cal

Speaker: Prof. Jennifer Chayes

Prof. Chayes talked about Cal’s new Division of Computing, Data Science, and Society (CDSS); it spans computer science, the School of Information, statistics, and their data science commons (research partnerships that are under construction). The talk started with some of their research initiatives, but the remainder focused on COVID-19.

One of the projects was on predictions of hospital demand, by Bin Yu’s research group. Their goal was to predict demand 7 days out, which could help to distribute PPE (including 1M face shields). There were at least half a dozen other research projects discussed, including drug design, interpreting satellite images from the developing world, digital contact tracing, and helping understand how to create a COVID-safe campus.

The Signal and the Noise: the Big Lessons from 20 Years of Data Analysis

Speaker: Nate Silver

I was really excited to hear Silver speak! He started with some interesting stats:

  • 90% of the world’s data was generated in the past two years (approximately true at any given time)
  • only ~a third of scientific studies can actually be replicated

He discussed election forecasting: during the Iowa caucus, there was a fair amount of data and lots of room to argue about it. Who’s ahead (four people at different points), how much do you smooth the data, how do you connect the dots, and who won (literally two people depending on your definition of “winning”)?

And, famously, in 2016, 538 gave Trump a 28–29% chance of winning, and the Trump campaign itself gave him a 30% chance; compare this to the NYT’s 15% or Princeton’s 1%. The difference came down to correlated errors: 538 understood that it didn’t have 50 independent states voting at once; a polling miss in one state makes a polling miss in another state more likely.
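The effect of correlated errors on tail probabilities is easy to see in a toy simulation. This is not 538’s actual model, and all the numbers below (ten swing states, a 2-point deficit, a 4-point polling error) are made up for illustration; the point is only that a shared national polling miss fattens the underdog’s tail compared to fifty independent misses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000

# Toy setup: the underdog trails by 2 points in 10 equally weighted
# swing states; each state's total polling error has std dev 4 points.
margins = np.full(10, -2.0)

# Independent errors: each state misses entirely on its own.
indep = margins + rng.normal(0, 4, size=(n_sims, 10))

# Correlated errors: a shared national miss (std 3) plus smaller
# state-level noise, so the total per-state std is still 4.
national = rng.normal(0, 3, size=(n_sims, 1))
state = rng.normal(0, np.sqrt(4**2 - 3**2), size=(n_sims, 10))
corr = margins + national + state

# "Win" = carry a majority of the swing states.
p_indep = np.mean((indep > 0).sum(axis=1) >= 6)
p_corr = np.mean((corr > 0).sum(axis=1) >= 6)
print(f"independent errors: {p_indep:.1%}; correlated errors: {p_corr:.1%}")
```

With independent errors the underdog needs six separate upsets at once, which is rare; with a correlated miss, one national polling error flips many states together, so the underdog’s win probability is several times higher.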

Interpreting data is generally hard; elections are a great example of this, but so is coronavirus. What metric should you home in on: new cases, hospitalizations, deaths, positivity rate, all of them, none of them? These are all different ways to argue about the same thing, and Silver argues that “motivated reasoning is a big issue.” When we see data that confirms our priors, we are less inclined to accept nuance, uncertainty, or future evidence to the contrary.

Silver continued on the signal-to-noise ratio and how it can mislead us. The COVID vaccine stage 1 trials will be filled with all kinds of noise—false positives and just plain nonsense—and extracting a signal from all that is difficult. Polls, obviously, are the same way: they’re noisy indicators, not precise measurements.
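A quick sketch of the “polls are noisy indicators” point: a single poll of a hypothetical race (true support and sample size invented here) can easily miss by a point or two just from sampling noise, while an average over many polls lands much closer to the truth.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical race: true support is 52%, each poll samples 800 voters.
true_support, n_voters, n_polls = 0.52, 800, 25
polls = rng.binomial(n_voters, true_support, size=n_polls) / n_voters

single_error = abs(polls[0] - true_support)
aggregate_error = abs(polls.mean() - true_support)
print(f"one poll is off by {single_error:.1%}; "
      f"the average of {n_polls} polls is off by {aggregate_error:.1%}")
```

Averaging shrinks the standard error by roughly the square root of the number of polls, which is the statistical core of poll aggregation.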

He offers suggestions:

Think probabilistically:

  • make forecasts with a margin of error, apply that margin of error in your head, and calibrate your probabilities (do things you say have a 30% chance of happening actually happen 30% of the time?)
  • lean on high-probability suggestions: a few months ago, if everyone had started wearing masks, not knowing for certain that they were effective but knowing that they likely were, we would be in a better place today.
  • think about actionable insights: what’s the likely 100-mile radius for where a hurricane will hit, so that we can recommend evacuations?
  • don’t be overcertain; there’s no prize for it.
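The calibration check in the first bullet can be made concrete: bucket your forecasts by stated probability and see how often each bucket’s events actually happened. The sketch below uses synthetic forecasts that are well calibrated by construction; for a real forecaster you would substitute their predicted probabilities and observed outcomes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic forecaster: 5,000 predicted probabilities, with outcomes
# drawn so that stated probability matches reality (perfect calibration).
probs = rng.uniform(0, 1, size=5000)
outcomes = rng.uniform(0, 1, size=5000) < probs

# Bucket forecasts and compare stated probability to observed frequency.
bins = np.linspace(0, 1, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (probs >= lo) & (probs < hi)
    if mask.any():
        print(f"forecast {lo:.0%}-{hi:.0%}: "
              f"happened {outcomes[mask].mean():.0%} of the time")
```

For a well-calibrated forecaster, the ~30% bucket should come in near 30%; a consistently overcertain forecaster would show observed frequencies pulled toward 50% relative to their stated extremes.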

Know where you’re coming from:

  • we’re all approaching the world with our own perspective.
  • Silver asked the question “what characteristics make crowds wise?”, the subject of James Surowiecki’s book The Wisdom of Crowds. The answers were diversity (bring in people with different ways of thinking about a problem), independence (empower people to, e.g., question their boss), and decentralization (don’t be in an echo chamber).

Silver gives the example of Joe Biden winning the Democratic primary. If you use Twitter a lot, this may have come as a shock to you; but he was in fact a frontrunner for the nomination. It’s not a good default to assume that everyone who disagrees with you is misinformed.

Trial and error: until you have a product actually tested by real people, you can be quite surprised. He also cites the 80/20 rule: 20% of the effort can get you 80% of the way there (e.g., when building an election model or playing poker). The marginal improvements, though, are where you really differentiate yourself (as Silver did with election forecasting).

That was awesome, and the AMA session with him afterward was also insightful. This was a great set of keynotes.