Friday, April 29, 2016

Ethics in Data Science


“Big data”: the term is everywhere. We are living in the aftermath of a big data explosion, where mountains of data have been collected and are just waiting to be mined. Data scientists are the people with the skills to extract usable insights from all of this, and as such they are in high demand, prized by tech companies seeking to make better business decisions. What people need to remember, however, is that the field of data science is so new that even the top data scientists are still figuring out best practices.

One issue that has been discussed recently is ethics in data science. Many data scientists hail from academia, with PhDs in physics, astronomy, computational biology, and other fields. They are very good at working with data from their respective domains, but working with data about humans is a very different thing from working with, say, astronomical observations. Some of that data may come from people who never gave explicit informed consent to have it collected. It may be tempting to think that “mere information” is not a big deal, since no individual human seems to be harmed. But as Fairfield and Shtein wrote in 2014, “The traditional focus of social science has been on physical, rather than informational harms, and on not harming individuals, but big data impacts communities as much (or more) than individuals. Yet the notion that information is not a cognizable harm is not supportable in the context of an information-based society.”

Furthermore, there is much we cannot foresee about the implications of using data in a certain way. Using a dataset for a purpose it was never intended for can have real consequences, so data scientists need to be careful and sensitive: they need to keep in mind the environment in which the data was collected and how clean the data is. As Cathy O’Neil (2016) wrote, “People have too much trust in data to be intrinsically objective, even though it is in fact only as good as the human processes that collected it.” She described a past project in which her team was tasked with developing an algorithm to predict how long a family would remain in New York City’s homeless services program, so that each homeless family could be paired with the most appropriate service. The training dataset spanned the previous 30 years, and her team had to decide whether or not to use race as a predictor variable. In the end they decided against it, because the dataset was sure to be biased against blacks: due to the effects of racism, blacks have historically made up a large portion of the homeless population. If the algorithm learned that blacks were less likely to find a job, and city officials used it today, then homeless black men might be less likely to receive job counseling.
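
To make that kind of decision concrete, here is a minimal sketch of what excluding a sensitive predictor from a training set can look like. It is purely illustrative: the file name, column names, and choice of model are my own assumptions, not anything from O’Neil’s actual project.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Hypothetical dataset of past homeless-services cases.
    df = pd.read_csv("homeless_services_history.csv")

    # Deliberately leave the sensitive attribute out of the feature set so the
    # model cannot learn from (and reproduce) the racial bias encoded in the
    # historical data.
    feature_cols = ["family_size", "num_prior_stays", "months_unemployed"]
    X = df[feature_cols]            # note: "race" is intentionally excluded
    y = df["days_in_program"]       # target: how long the family stayed

    model = LinearRegression().fit(X, y)

Leaving a column out is not a complete answer to bias, of course, but it shows how an ethical judgment call ends up expressed in a few lines of code.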

Knowing all this, it can seem like today’s data scientists are walking through a minefield: there is no telling when their work might have a negative impact on communities. That is why it is important to keep the conversation alive and to keep asking the right questions of ourselves: whether we are doing enough, and whether we have considered the unique needs of the communities whose data we handle.


References

[Untitled illustration of an iceberg with the words “BIG DATA” written on it]. Retrieved April 29, 2016 from http://timoelliott.com/blog/wp-content/uploads/2013/06/big-data-graphic-iceberg-690.jpg

Fairfield, J. and Shtein, H. (2014). Big Data, Big Problems: Emerging Issues in the Ethics of Data Science and Journalism. Journal of Mass Media Ethics, 29(1), 38-51.

O’Neil, C. (2016). The Ethical Data Scientist. Slate. http://www.slate.com/articles/technology/future_tense/2016/02/how_to_bring_better_ethics_to_data_science.html

Friday, April 15, 2016

Security vs Privacy: is it all or nothing?



Last month, South by Southwest 2016 opened with President Obama in a keynote conversation. He discussed the issue of privacy and security with Evan Smith, editor-in-chief of The Texas Tribune. (The full conversation is available on YouTube.) I am sure that there were many people from the tech community in the audience for whom this is an extremely sensitive issue. Many executives from top tech companies had signed a letter supporting Apple’s decision to resist the FBI’s request for help breaking into an iPhone, a request they argued would compromise user privacy. Obama declined to speak specifically about the San Bernardino iPhone case, but he did address the issue in a broader sense. He said, “The dangers are real. Maintaining law and order and a civilized society is important. Protecting our kids is important. And so I would just caution against taking an absolutist perspective on this, because we make compromises all the time.” (Obama, 2016)

One of the things that I admire about Obama is his ability to see the nuances in an issue. I think it is the mark of intelligence to recognize that things are rarely black and white. Life is lived mostly in the gray areas. Thus, I agree with Obama that we should not approach the iPhone issue with an absolutist point of view. Brian Barrett from WIRED also agreed, as he wrote, “It also should not be framed as an absolute. Doing so presents the issue to the American public in a way that makes the FBI’s request palatable while obfuscating the potentially dangerous precedent it would represent.” (2016)

What we should be striving for is the right balance between privacy and security, where in most cases access to someone’s phone is limited to the people with a legitimate reason to have it. In this case, Apple was justified in refusing to build the tool the FBI was requesting. It was too much to ask of them, as it would have violated the trust they had worked so hard to build with their users. The government should figure out how to handle these situations without depending on the industry. As it turned out, that is exactly what happened: the FBI did not need Apple’s help after all, and eventually managed to retrieve the data from the iPhone on its own. Now it is up to the American people to decide whether the FBI should have been allowed to break into the iPhone at all.

This is how it has always been: the US government uses its own resources to protect Americans, and sometimes that involves accessing people’s private data. Whether the government has overstepped has always been debatable, and people should remember that public opinion can shift over time. Indeed, the post-9/11 years were marked by increased support for government surveillance. As Levi and Wall wrote in 2004, “Despite fierce opposition from pro-privacy groups, the shift towards increased dataveillance has tended to occur against a backdrop of increased public support and against a broadly muted opposition, providing policing and security agencies with a strengthened mandate to carry out obtrusive security measures that would not have been so readily tolerated previously.”

Years later, in June 2013, news broke that Edward Snowden had leaked documents revealing NSA surveillance programs. A poll of Americans conducted shortly afterward found that “three-quarters said they approved of the government’s tracking phone records of Americans suspected of terrorist activity. Nearly the same number approved of the United States’ monitoring the Internet activities of people living in foreign countries.” (Kopicki, 2013)


Clearly, many people support surveillance when it is framed as a need to combat terrorists. But the issue is never that simple, is it? Several community leaders have suggested that these issues be addressed in Congress, rather than through hastily developed local agreements or secret surveillance programs. That is something I wholeheartedly support, because it would allow America to have the discussion it so badly needs in today’s technology-driven world.


References
[Untitled illustration of a door with the Apple logo by Then One for WIRED magazine]. Retrieved April 15, 2016 from http://www.wired.com/2016/02/apple-fbi-privacy-security/

The Daily Conversation (Producer). (2016, March 11). Obama Explains The Apple/FBI iPhone Battle. Retrieved from https://www.youtube.com/watch?v=ZjvX5zq7BXg

Barrett, B. (2016). The Apple-FBI Fight Isn’t About Privacy vs. Security. Don’t Be Misled. WIRED. http://www.wired.com/2016/02/apple-fbi-privacy-security/

Kopicki, A. (2013). Poll Finds Disapproval of Record Collection, but Little Personal Concern. The Caucus (New York Times blog). http://thecaucus.blogs.nytimes.com/2013/06/11/poll-finds-disapproval-but-little-personal-concern-about-record-collection/

Levi, M. and Wall, D. S. (2004). Technologies, Security, and Privacy in the Post-9/11 European Information Society. Journal of Law and Society, 31(2), 194-220.

Friday, April 1, 2016

The Perfect Internship

Photo Credit: Stuart Isett for The New York Times


It’s officially spring, and many students are currently searching for summer internships. Here in Silicon Valley, the market for computer science (CS) internships is highly competitive. Many students struggle to land internship offers. But suppose, hypothetically, that a student received multiple offers. What factors would influence her decision on which offer to accept?

Now, I am not a CS major. But as an MS Statistics student hoping to work in analytics and data science, I value many of the same things a CS major values in an internship. Three things matter most to me. The first is the possibility of the internship leading to a full-time position after I graduate. In Silicon Valley, connections are still the best way to land a job, and if a company is open to hiring me after graduation on the strength of the work I did as an intern, I cannot afford to pass up that opportunity. Other students share this sentiment: in a study James Risley wrote about, career advancement opportunity ranked as the most important of all the attributes students value in an internship (Risley, 2015).

The second is learning skills not currently taught in my grad program, namely tools for working with terabyte-scale data such as MapReduce, Hadoop, and Spark. Few students seem to be exposed to these in their university programs; most people become familiar with them on the job. I need to learn these tools to stay competitive when I graduate and begin hunting for a job in data science.
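
For readers who have never seen these tools, here is a rough taste of the style of programming involved: the classic distributed word count, written with PySpark. This is only a toy sketch run in local mode rather than on a real cluster, and the input file name is a placeholder of my own.

    from pyspark import SparkContext

    # Start a local Spark context; "local[*]" uses all available CPU cores.
    sc = SparkContext("local[*]", "WordCount")

    counts = (
        sc.textFile("data/large_corpus.txt")      # hypothetical input file
          .flatMap(lambda line: line.split())     # map: break lines into words
          .map(lambda word: (word, 1))            # map: emit (word, 1) pairs
          .reduceByKey(lambda a, b: a + b)        # reduce: sum the counts per word
    )

    print(counts.take(10))                        # peek at the first few results
    sc.stop()

The appeal of this style is that the same handful of map and reduce operations can be spread across a whole cluster when the data no longer fits on one machine.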

My ability to learn, however, depends on whether I am in an environment conducive to learning. This leads me to the third thing, and I believe it is THE most important one: a good mentor. Even if I am surrounded by the fanciest technology in the world, I can’t learn unless I have someone who is willing to teach me and is invested in my success. I do not thrive in highly unstructured environments where I am expected to operate almost entirely on my own. I need someone who is patient enough to explain concepts and interested in building a solid working relationship with me. I need to get the sense that I have something valuable to contribute. This is especially important to me as a minority woman planning to make a career in tech, where men greatly outnumber women; if I don’t feel like I belong, I am not going to come back. The importance of mentors is reiterated by Gloria Townsend, who wrote, “Whether mentors are men or women, if mentors can provide encouragement in a genuine and sincere manner, then these simple actions can transform the way a young woman views her connectedness to our discipline.” (Townsend, 2002)


Works Cited
Isett, S. (Photographer). (2007). [Untitled photograph of female students in front of a computer]. Retrieved April 1, 2016 from http://www.nytimes.com/2007/04/17/science/17comp.html?_r=0

Risley, J. (2015). Women nearly one third as likely to seek internships in tech, study finds. Geekwire. http://www.geekwire.com/2015/women-nearly-three-times-less-likely-to-seek-internships-in-tech-study-finds/

Townsend, G. C. (2002). People Who Make a Difference: Mentors and Role Models. ACM SIGCSE Bulletin - Women and Computing Homepage archive, 34(2), 57-61.

Friday, March 11, 2016

When Technology Fails Us

I am a big fan of technology. In fact, I want to build my career around technology because I truly believe it has tremendous potential to improve the lives of humans everywhere. However, sometimes technology fails spectacularly and becomes a huge annoyance instead. For example, I have a Nest thermostat in my house. It is what is called a “smart” thermostat. For the first week, my family was happy with the Nest thermostat. I particularly enjoyed the ability to control the heater remotely using a mobile app. If the weather was particularly chilly on a certain day and I was 15 minutes away from arriving home, I could turn on the heater via the app, and the house would already be toasty warm when I arrived. Later in the evening, I could use the mobile app again to turn off the thermostat from the comfort of my warm bed.

Screenshot of Nest's online store (Nest, 2016)
However, I quickly realized that the “smart” features of the Nest thermostat were nowhere near as useful as I had hoped. Nest’s Auto-Away feature is supposed to automatically shut off the thermostat when nobody is home, but it is of little use in households where the family’s schedules are constantly changing. Rayoung Yang and Mark Newman (2013) studied this in work presented at an ACM conference on ubiquitous computing. They wrote, “Participants expected Auto-Away would save energy when they were not at home. Several participants reported that they did not obtain much benefit from it since Auto-Away often either turned on when they were at home or did not turn on when they were not at home.”

Furthermore, the ability to access the Nest via the mobile app is completely dependent on the stability of one’s home WiFi. There are multiple reasons a home WiFi connection might be unstable: spotty service from the internet provider, a faulty router, and so on. When the Nest thermostat loses its connection to your home WiFi, it doesn’t tell you; it currently has no system for doing so. You only find out that it is offline when you are 25 miles away from home, trying to turn off the thermostat. Needless to say, this is highly annoying. This is apparently a known issue with the Nest thermostat, as I later found out when I read an article by Dave Greenbaum (2015). He wrote, “The app and the thermostat let you know your thermostat is offline if you check manually, but Nest doesn’t proactively warn you with an email or mobile notification if it detects the thermostat hasn’t checked in with the server in a while.” Greenbaum suggested a web service, Junction, as a possible solution to the notification problem. I might just try it, though I can’t help thinking that I shouldn’t have to rely on a third-party service: that feature should have been built into my Nest thermostat in the first place.
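
To show how simple such a feature could be, here is a rough sketch of a do-it-yourself watchdog that could run on an always-on computer in the house. The thermostat’s IP address and the check interval are assumptions of mine, and a real version would send an email or push notification instead of printing a warning.

    import subprocess
    import time

    THERMOSTAT_IP = "192.168.1.50"   # hypothetical local IP of the thermostat
    CHECK_INTERVAL = 300             # seconds between reachability checks

    def is_reachable(ip):
        """Return True if the device answers a single ping (Linux/macOS flags)."""
        result = subprocess.run(
            ["ping", "-c", "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    while True:
        if not is_reachable(THERMOSTAT_IP):
            # In a real setup, replace this with an email or push notification.
            print("Warning: the thermostat appears to be offline!")
        time.sleep(CHECK_INTERVAL)

A few lines like these are hardly a product feature, but they make the point that offline alerts are not rocket science.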

For sure, this is one prime example of technology failing us.


Works Cited

[Screenshot of Nest’s online store]. Retrieved March 11, 2016 from https://store.nest.com/product/thermostat/

Greenbaum, D. (2015). How to Fix what Google Won’t Fix With The Nest Thermostat. GroovyPost. http://www.groovypost.com/howto/fix-annoying-nest-thermostat-issues-that-google-wont/

Yang, R., & Newman, M. W. (2013). Learning from a Learning Thermostat: Lessons for Intelligent Systems for the Home. Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing, 93-102.

Friday, February 19, 2016

On the skills gap between a CS education and the "real world"

I have often heard people complain that formal education does not prepare students for the “real world”. The frustration towards formal education is typically directed at liberal arts colleges, specifically those that produce graduates with few technical skills who end up struggling to find employment. But as it turns out, graduates with technical degrees don’t fare much better.

I recently read an article by Daniel Gelernter (2015) in which he lamented how poorly prepared computer science graduates are for real-world jobs. He writes, “University computer science departments are in miserable shape: 10 years behind in a field that changes every 10 minutes.” I can’t say I disagree with him about universities lagging behind. Technology is developed at such a rapid pace, and there is no telling what the next big thing will be. For example, once upon a time Adobe Flash was the coolest way to produce a professional-looking website. But just as quickly as it came, Flash was on its way out. People realized that Flash sites were clunky and took too long to load, and the complicated designs of some Flash websites looked cluttered on smaller screens, which became a huge problem as mobile phones overtook desktop computers. Today, Flash is rapidly disappearing from the web, as major browsers increasingly restrict or block it. Every day, new technologies are introduced, and today’s programming language of choice may well be obsolete a few years from now.

In this sense, I think that universities are not entirely to blame. A technology may be so new that schools struggle to find qualified people willing to teach it. Even the corporate world cannot always keep up: once a company builds its system with a particular tool, it can be very difficult (not to mention expensive) to update it. Take Netflix, for example. Netflix is widely considered a pioneer in high-quality video streaming, and it makes every effort to stay on top of its game. But I recently attended the ODSC West conference and sat in on a talk by Chris Colburn of Netflix. In an almost apologetic tone, he mentioned that Netflix stores its data on the Hadoop platform; the team knows there are newer tools out there, but Netflix is just not ready to migrate yet.

Like Gelernter, many other people in tech claim that they struggle to find qualified candidates. Some believe the problem is that not enough people are studying the relevant subjects in school. However, Joann Weiner wrote in 2014 that “the reality is that there are enough STEM graduates. But these jobs go unfilled because there’s a gap between what graduates in STEM fields can do and the skills that STEM employers are seeking.”

So what exactly are the skills that Computer Science graduates lack? Experts in the field believe it is a combination of technical skills and the “softer skills”. Alex Radermacher and Gursimran Walia (2013) did an extensive review of research on this topic, and found that “graduating students are lacking in many different areas, including technical abilities (design, testing, configuration management tools, etc.), personal skills (communication, teamwork, etc.), and professional qualities (e.g. ethics).”

Given this, I think the best way to prepare students for the workplace is to put them in the workplace: students should do extensive internships before they graduate. The University of Waterloo in Canada is known as a key feeder of Silicon Valley tech companies, and many people attribute this to its cooperative education program. Though the co-op path takes five years, students who graduate with a CS degree from Waterloo already have two years of industry experience.

It is important to realize, though, that not everyone will have access to internships unless they are enrolled in a formal degree program. Many job postings specifically state that they do not accept applications from people who are not current students. I think this is rather short-sighted, because many talented people without college degrees have managed to teach themselves the right skills for CS jobs. I hope the industry becomes more open-minded, and that people refrain from making sweeping generalizations about candidates with non-traditional backgrounds. Even Gelernter (2015), who happily hires developers without college degrees, seems guilty of this. He said, “The thing I look for in a developer is a longtime love of coding/people who taught themselves to code in high school and still can’t get enough of it.” He declared that the majority of people who graduate from intensive bootcamp programs are not good, because they just don’t love coding as much as someone who has been doing it for years. I disagree, because plenty of people were simply never exposed to programming at an early age. Who’s to say that someone graduating from a coding bootcamp doesn’t love coding as much as someone who started ten years earlier? A bootcamp graduate may be making a career transition into which he has poured his heart and soul; given a chance, he would make a great employee. He may even be fulfilling a childhood dream, because someone who discovered early on that he liked to code may not have had the resources to run with it. Computers cost money, and a child without a computer at home cannot spend hours teaching himself to code.

All in all, I think the skills gap can be closed with effort from both sides, the schools as well as the industry: schools need to push their students to do more internships, and companies need to be more open-minded about whom they hire for them.


References:

Gelernter, D. (2015). Why I’m Not Looking to Hire Computer-Science Majors. The Wall Street Journal. http://www.wsj.com/articles/why-im-not-looking-to-hire-computer-science-majors-1440804753

Weiner, J. (2014). The STEM paradoxes: Graduates’ lack of non-technical skills, and not enough women. The Washington Post. https://www.washingtonpost.com/blogs/she-the-people/wp/2014/09/26/the-stem-paradox-lack-of-skills-by-stem-graduates-and-not-enough-women/

Radermacher, A., & Walia, G. (2013). Gaps between industry expectations and the abilities of graduates. SIGCSE '13 Proceeding of the 44th ACM technical symposium on Computer Science Education, 525-530.

Wednesday, February 10, 2016

Hello World: The Significance of the First Step

 print("Hello World!")  

This is usually the first line of code that someone writes when they first start learning a programming language. This, or some version of it. It's practically a tradition to use "Hello World!" as a test message to demonstrate basic syntax to novice programmers. For me, since Python 3 was the first programming language I learned, the line above was my very first line of code.

I love the "Hello, World!" tradition because it imparts a sense of adventure and discovery. Learning to code is truly like exploring another world. Or perhaps it is more accurate to compare it to someone pulling back a curtain and revealing that the world you've always lived in has a whole other dimension. Suddenly you see things differently.

It's only fitting then that I launch my first technical blog with the same words. It is, after all, a beginning: a venture into a new world.

Hello, World! Stef has some things to share with you.