Doing Good Data Science: Ethics of Data Science

We need to understand how to build the software systems that implement fairness. That’s what we mean by doing good data science. Any code of data ethics will tell you that you shouldn’t collect data from experimental subjects without informed consent. But that code won’t tell you how to implement “informed consent.” Informed consent is easy when you’re interviewing a few dozen people in person for a psychology experiment. Informed consent means something different when someone clicks an item in an online catalog (hello, Amazon), and ads for that item start following them around ad infinitum. Do you use a pop-up to ask for permission to use their choice in targeted advertising? How many customers would you lose if you did so? Informed consent means something different yet again when you’re asking someone to fill out a profile for a social site, and you might (or might not) use that data for any number of experimental purposes.

Do you pop up a consent form in impenetrable legalese that basically says “we will use your data, but we don’t know for what”? 

Do you phrase this agreement as an opt-out, and hide it somewhere on the site where nobody will find it? 

Those are the sorts of questions we need to answer. And we need to find ways to share best practices. Once we have the ethical principle, we have to think about how to implement it.

That isn’t easy; it encompasses everything from user experience design to data management. 
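
On the data management side, one possible starting point is to record consent per purpose and treat anything the user has not explicitly agreed to as forbidden. The sketch below is only an illustration under that assumption. The class name `ConsentLedger`, its methods, and the purpose strings are all hypothetical, and a real system would also need the user-facing wording, an audit trail, and legal review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentLedger:
    """Hypothetical per-user record of which data uses were explicitly allowed."""

    user_id: str
    # Maps a purpose name (e.g. "targeted_ads") to the time consent was granted.
    granted: dict = field(default_factory=dict)

    def grant(self, purpose: str) -> None:
        # Record an explicit opt-in for one named purpose.
        self.granted[purpose] = datetime.now(timezone.utc)

    def revoke(self, purpose: str) -> None:
        # Withdrawing consent always succeeds, even if it was never granted.
        self.granted.pop(purpose, None)

    def allows(self, purpose: str) -> bool:
        # Default is "no": a purpose the user never agreed to is not permitted.
        return purpose in self.granted


ledger = ConsentLedger(user_id="u123")
ledger.grant("order_fulfillment")
print(ledger.allows("order_fulfillment"))  # True
print(ledger.allows("targeted_ads"))       # False
```

Making the default answer "no" is what separates this from the hidden opt-out described above: a new use of the data requires going back to the user rather than relying on an agreement they never saw.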
