Friday, April 29, 2016

Ethics in Data Science


“Big data”: the term is everywhere. We are living in the aftermath of a big data explosion, where mountains of data have been collected and are just waiting to be mined. Data scientists are the people with the skills to extract usable insights from this data. As such, they are in high demand, prized by tech companies seeking to make better business decisions. However, it is worth remembering that the field of data science is so new that even the top data scientists are still figuring out best practices.

One issue that has been discussed recently is ethics in data science. Many data scientists hail from academia, with PhDs in physics, astronomy, computational biology, and other fields. They are all very good at working with data from their respective domains. But working with data about humans is a very different thing from working with, say, astronomical observations. Some of the data may come from people who never gave explicit informed consent to its collection. It may be tempting to think that “mere information” is not a big deal, because no individual human is being physically harmed. But as Fairfield and Shtein wrote in 2014, “The traditional focus of social science has been on physical, rather than informational harms, and on not harming individuals, but big data impacts communities as much (or more) than individuals. Yet the notion that information is not a cognizable harm is not supportable in the context of an information-based society.”

Furthermore, there is much we cannot foresee about the implications of using data in a certain way. Using a dataset for a purpose it was never intended for can have consequences, so data scientists need to be careful and sensitive. They need to keep in mind the environment the data was collected in, and how clean it is. As Cathy O’Neil wrote in 2016, “People have too much trust in data to be intrinsically objective, even though it is in fact only as good as the human processes that collected it.” She described a former project of hers in which her team was tasked with developing an algorithm to predict how long a family would stay in New York City’s homeless services program, with the aim of pairing each family with the most appropriate service. The training dataset spanned the previous 30 years, and her team had to decide whether to use race as a predictor variable. In the end, they decided not to, because the dataset was sure to be biased against blacks: due to the effects of racism, blacks have historically made up a large portion of the homeless population. If the algorithm found that blacks were less likely to find a job, and city officials used it today, then homeless black men might be less likely to receive job counseling.
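To make that feature-selection decision concrete, here is a minimal sketch in Python of how a team might deliberately exclude a sensitive attribute before training such a model. The file name, column names, and choice of model are hypothetical illustrations of mine, not details from O’Neil’s actual project.

```python
# Hypothetical sketch: excluding a sensitive attribute from a
# duration-prediction model. File, columns, and model choice are
# illustrative assumptions, not details from O'Neil's project.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("homeless_services_history.csv")  # hypothetical dataset

SENSITIVE = ["race"]         # attributes deliberately excluded
TARGET = "days_in_program"   # hypothetical outcome column

# Keep every (assumed numeric) column except the target and the
# sensitive attributes.
features = [c for c in df.columns if c not in SENSITIVE + [TARGET]]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[TARGET], test_size=0.2, random_state=0
)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
```

Even then, dropping the column is not a complete fix: other features, a zip code for instance, can act as proxies for race, which is part of why the human judgment behind the data matters so much.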

Knowing all this, it can seem like today’s data scientists are walking through a minefield: there is no telling when their work might have a negative impact on communities. Thus, it is important to keep the conversation alive, and to keep asking the right questions of ourselves. We must keep asking whether we are doing enough, and whether we have considered the unique needs of the communities whose data we handle.


References

[Untitled illustration of an iceberg with the words “BIG DATA” written on it]. Retrieved April 29, 2016 from http://timoelliott.com/blog/wp-content/uploads/2013/06/big-data-graphic-iceberg-690.jpg

Fairfield, J. and Shtein, H. (2014). Big Data, Big Problems: Emerging Issues in the Ethics of Data Science and Journalism. Journal of Mass Media Ethics, 29(1), 38-51.

O’Neil, C. (2016). The Ethical Data Scientist. Slate. http://www.slate.com/articles/technology/future_tense/2016/02/how_to_bring_better_ethics_to_data_science.html

Friday, April 15, 2016

Security vs Privacy: is it all or nothing?



Last month, South by Southwest 2016 opened with President Obama in a keynote conversation. He discussed the issue of privacy and security with Evan Smith, editor in chief of The Texas Tribune (The Daily Conversation, 2016). I am sure there were many people from the tech community in the audience for whom this is an extremely sensitive issue. Many executives from top tech companies signed a letter supporting Apple’s decision to resist the FBI’s request for help breaking into the iPhone, on the grounds that complying would compromise user privacy. Obama declined to speak specifically about the San Bernardino iPhone case, but he did address the issue in a broader sense. He said, “The dangers are real. Maintaining law and order and a civilized society is important. Protecting our kids is important. And so I would just caution against taking an absolutist perspective on this, because we make compromises all the time.”

One of the things that I admire about Obama is his ability to see the nuances in an issue. I think it is a mark of intelligence to recognize that things are rarely black and white; life is lived mostly in the gray areas. Thus, I agree with Obama that we should not approach the iPhone issue from an absolutist point of view. Brian Barrett of WIRED agreed, writing, “It also should not be framed as an absolute. Doing so presents the issue to the American public in a way that makes the FBI’s request palatable while obfuscating the potentially dangerous precedent it would represent.” (Barrett, 2016)

What we should be striving for is the right balance between privacy and security, where in most cases only those with legitimate legal authority can access someone’s phone. In this case, Apple was justified in refusing to build the tool the FBI was requesting. It was too much to ask of the company, as it would have violated the trust Apple had worked so hard to build with its users. The government should figure out how to handle these situations without depending on the industry. As it turned out, that is exactly what happened: the FBI eventually retrieved the data from the iPhone without Apple’s help. Now it is up to the American people to decide whether the FBI should have been allowed to break into the iPhone at all.

This is how it has always been: the US government uses its own resources to protect Americans, and sometimes that involves accessing people’s private data. Whether the government has overstepped has always been debatable, and people should remember that public opinion can shift over time. Indeed, the post-9/11 world was marked by increased support for government surveillance. As Levi and Wall wrote in 2004, “Despite fierce opposition from pro-privacy groups, the shift towards increased dataveillance has tended to occur against a backdrop of increased public support and against a broadly muted opposition, providing policing and security agencies with a strengthened mandate to carry out obtrusive security measures that would not have been so readily tolerated previously.”

Years later, in June 2013, news broke that Edward Snowden had leaked documents detailing NSA surveillance programs. A poll of Americans conducted shortly afterward found that “three-quarters said they approved of the government’s tracking phone records of Americans suspected of terrorist activity. Nearly the same number approved of the United States’ monitoring the Internet activities of people living in foreign countries.” (Kopicki, 2013)


Clearly, many people support surveillance when it is framed as necessary to combat terrorism. But the issue is never that simple, is it? Several community leaders have suggested that these issues be addressed in Congress, rather than through hastily developed local agreements or secret surveillance programs. That is something I wholeheartedly support, because it would give America the discussion it so badly needs in today’s technology-driven world.


References
[Untitled illustration of a door with the Apple logo by Then One for WIRED magazine]. Retrieved April 15, 2016 from http://www.wired.com/2016/02/apple-fbi-privacy-security/

The Daily Conversation (Producer). (2016, March 11). Obama Explains The Apple/FBI iPhone Battle. Retrieved from https://www.youtube.com/watch?v=ZjvX5zq7BXg

Barrett, B. (2016). The Apple-FBI Fight Isn’t About Privacy vs. Security. Don’t Be Misled. WIRED. http://www.wired.com/2016/02/apple-fbi-privacy-security/

Kopicki, A. (2013). Poll Finds Disapproval of Record Collection, but Little Personal Concern. The Caucus (New York Times blog). http://thecaucus.blogs.nytimes.com/2013/06/11/poll-finds-disapproval-but-little-personal-concern-about-record-collection/

Levi, M. and Wall, D. S. (2004). Technologies, Security, and Privacy in the Post-9/11 European Information Society. Journal of Law and Society, 31(2), 194-220.

Friday, April 1, 2016

The Perfect Internship

Photo Credit: Stuart Isett for The New York Times


It’s officially spring, and many students are currently searching for summer internships. Here in Silicon Valley, the market for computer science (CS) internships is highly competitive. Many students struggle to land internship offers. But suppose, hypothetically, that a student received multiple offers. What factors would influence her decision on which offer to accept?

Now, I am not a CS major. But as an MS Statistics student hoping to work in analytics and data science, I value some of the same things a CS major values in an internship. For me, three things matter most. The first is the possibility of the internship leading to a full-time position after I graduate. In Silicon Valley, connections are still the best way to land a job. If a company is open to hiring me after graduation based on the great work I did as an intern, I cannot afford to pass up that opportunity. This sentiment is shared by other students: in a study James Risley reported on, career advancement opportunity ranked as the most important attribute students look for in an internship (Risley, 2015).

Another thing that is important to me is learning skills not currently taught in my grad program, namely tools for working with terabyte-scale data such as MapReduce, Hadoop, and Spark. Few students seem to be exposed to these in their university programs; most people become familiar with them on the job. Thus, I need to learn these tools to stay competitive when I graduate and begin hunting for a job in data science.
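To give a flavor of what these tools look like in practice, below is a minimal PySpark sketch of the MapReduce pattern, the canonical word-count example. The input path is a placeholder of mine, and this is just the standard introductory exercise, not code from any particular program or company.

```python
# Minimal PySpark sketch of the classic MapReduce word count.
# The input path is a placeholder; run with spark-submit.
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

counts = (
    sc.textFile("hdfs:///data/some_large_corpus.txt")  # placeholder path
      .flatMap(lambda line: line.split())   # "map" step: emit words
      .map(lambda word: (word, 1))          # pair each word with a count of 1
      .reduceByKey(lambda a, b: a + b)      # "reduce" step: sum counts per word
)

for word, n in counts.take(10):  # inspect a few results on the driver
    print(word, n)

sc.stop()
```

The appeal of frameworks like this is that the same few lines run on a laptop or on a cluster holding terabytes; Spark handles partitioning the data and shuffling the intermediate results.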

My ability to learn, however, depends on being in an environment conducive to learning. This leads me to the third thing that’s important to me, and I believe it is THE most important one: a good mentor. Even if I am surrounded by the fanciest technology in the world, I can’t learn unless I have someone who is willing to teach me and is invested in my success. I do not thrive in very unstructured environments in which I am expected to be practically independent. I need someone who is patient enough to explain concepts and interested in building a solid working relationship with me. I need to get the sense that I have something valuable to contribute. This is especially important to me as a minority woman planning to make a career in tech, where men greatly outnumber women; if I don’t feel like I belong, I am not going to come back. The importance of mentors is reiterated by Gloria Townsend in a 2002 paper: “Whether mentors are men or women, if mentors can provide encouragement in a genuine and sincere manner, then these simple actions can transform the way a young woman views her connectedness to our discipline.” (Townsend, 2002)


References
Isett, S. (Photographer). (2007). [Untitled photograph of female students in front of a computer]. Retrieved April 1, 2016 from http://www.nytimes.com/2007/04/17/science/17comp.html?_r=0

Risley, J. (2015). Women nearly one third as likely to seek internships in tech, study finds. Geekwire. http://www.geekwire.com/2015/women-nearly-three-times-less-likely-to-seek-internships-in-tech-study-finds/

Townsend, G. C. (2002). People Who Make a Difference: Mentors and Role Models. ACM SIGCSE Bulletin, 34(2), 57-61.