Saturday, March 1, 2014

Why The New York Times hired a Biology Researcher as its Chief Data Scientist


TO HELP MAKE SENSE OF THE MASSIVE TROVES OF DATA PRODUCED BY PEOPLE CLICKING AROUND ITS WEBSITE, THE TIMES MADE A (VERY) NONTRADITIONAL HIRE--CHRIS WIGGINS, A BIOLOGY RESEARCHER WITH A PHD IN THEORETICAL PHYSICS. IF YOU CAN MAP THE HUMAN GENOME, MAYBE YOU CAN EVEN FIX JOURNALISM.

It doesn't come as a huge surprise that the New York Times has hired a chief data scientist. Even 162-year-old media companies know that technology will play a huge role in the future of journalism. And, despite its age, the Times hasn't shied away from digital innovation. What's surprising, however, is that the new hire, Chris Wiggins, has spent the last 10 years steeped in biology research.

Wiggins's role at the paper started less officially last fall, when he used a sabbatical to apply his skills to the newsroom. "The best thing you can do with a sabbatical is to find something completely unrelated to your research to do," he explained to Fast Company. "I've been hearing for years from my students that this is where the future is and I was happy that I found a way that I could participate in building that future while building the data science skills that I've been developing in a different domain within the academy," he added.


As of last week, the Times made him a permanent fixture. Wiggins, who is also an associate professor in the Department of Applied Physics and Applied Mathematics at Columbia University, will spend one day a week at the paper, leading its yet-to-be-built "machine learning" team.

Fast Company spoke with Wiggins about what that role will entail and how having a background in biology makes him at all qualified to lead the Times into the future.

FAST COMPANY: Before we get to this biology quandary, tell us about your background. Maybe that will clear things up.
CHRIS WIGGINS: My PhD was in theoretical physics. I got my PhD at Princeton at a time which is now called the second string revolution. Basically, everyone was interested in string theory, except for me. I was interested in the opposite of string theory. I was interested in how we model complex real-world problems in terms of the data that we have about them. At the time I was working in biology and that's where I've been working for the last several years doing what we would now call data science.

Okay, got it. So that makes a little more sense of things. But still: what does biology have to do with media?
The pain that many fields are experiencing by becoming data driven is a pain that was experienced in biology 15 years ago when whole genomes started being sequenced. I think there is a lot in common about applying data science in the natural sciences and applying data science in the real world. Real people have questions from a different domain, they have abundant data, and it's a fun and creative task to try to reframe those domain questions as machine learning tasks.

You lost me at machine learning tasks. What does that mean?
Machine learning sits at the intersection of data engineering and mathematical modeling. The thing that makes it different from statistics traditionally, is far more focus on building algorithms.

Another difference, although this is more of a spiritual difference, is that statistics traditionally has had a stronger emphasis in explaining a data set and machine learning has far more interest in building predictive models. For example, when Netflix tells you what movie to watch or when Amazon predicts what book to buy--that’s machine learning. In a way, machine learning has crept up all around you.

Like Netflix--now you're speaking our language. But how does this relate to the Times? How do you plan to apply "machine learning" to the Grey Lady and its newsroom?
There's a lot of potential both on the business side and the editorial side. On the business side The New York Times is interested in establishing long-term relationships with their readers. So the Times is trying to
  • understand what are the behaviors that seem to be correlated with loyalty
  • what are the behaviors that seem to be correlated with dissatisfaction, particularly among subscribers--the way a long-term relationship manifests itself at the Times is via subscribers. 
If we're trying to understand how people engage with the site in general and in the abstract, machine learning is both useful for gaining insights about what kind of content engages users, and also good business. There is a way to listen to your customers at scale that you can do at a website that is just qualitatively different than having a focus group or giving out a survey.

And what about editorially?
I've worked with people on the data visualization group to think about how better to cluster the results that they look at that make the data visualization more interpretable. I've worked with business analysts to think about how to present data in user engagement in a way that is more meaningful and more statistically robust. I've worked with people from the personalization team to think about how best to test and optimize their recommendation models. There's a lot of opportunities for a machine learning group that makes all the groups stronger and draws a common narrative with people thinking about data in a high-powered way throughout the company, particularly on the technology side. It's a very technologically strong company.

It sounds like they have a pretty strong tech department.
People at the New York Times I found to be totally optimistic. They're not putting their heads in the sand. If you look at just the changes, even on the front-facing New York Times, they've really built out a presence in interactive news and much more video. I would say people are aware of the challenges and are really working hard to build a sustainable future for journalism.

I understand that part of your role is to further extend the tech department and build a team?
I'll be focusing on people who do and speak machine learning. There's already excellent data engineers; there's already excellent business analysts; there's already a great data visualization group. So it's a fantastic place now to build a machine learning group to try to make sense of those data, reframe business analytics questions as prediction problems, and to work with people who will visualize and help make interpretable the insights you gain from those types of predictive models.

How do you think this team led by you will change things at the Times?
News has gone from a device that shows up on your doorstep to a website, which just opens up a whole new universe of ways of understanding your readers and listening to your readers better. Anytime anyone does anything on a website, that is an event and that person leaves a trail of data. Putting all of that data together is definitely a non-trivial task. It gives so much immediate insight into the way people use your product, how your products can be improved, what new products you should be thinking about. I think that's a real transformation for anybody in any business.

[Photos courtesy of Chris Wiggins]

ORIGINAL: Fast Company
February 12, 2014
