Report back from DataGotham

This is a guest post by Sean J. Taylor, a PhD student in Information Systems at NYU’s Stern School of Business.

Last Thursday and Friday I attended the 2nd annual DataGotham conference in New York City. Alex Hanna asked me to write about my experience there for the benefit of those who were unable to attend, so here’s my take on the event.

Thursday evening was a social event in a really sweet rooftop space in Tribeca with an open bar and great food (a dangerous combination for this still-grad-student). Though I spent a lot of the time catching up with old friends, I would describe the evening as “hanging out on Twitter, but in person.” I met no fewer than a dozen people I had only previously known online. I am continually delighted at how awesomeness on Twitter is a reliable indicator of awesomeness in person. Events like DataGotham are often worth it for this reason alone.

Friday was a giant block of short and long talks, with two panels mixed in. The most common type of talk showcased an interesting problem that the speaker works on. These ranged from industry applications such as optimizing what goes into a Birch Box to more academic questions like inferring the age of Twitter users from their names or predicting the flood damage from hurricanes. I think these talks are a great way to stimulate discussion and make personal connections. We all think hard about empirical problems, so it’s fun to share the lessons learned as we converged on satisfying solutions. From a social perspective, it helps us form a mental map of who has which kinds of expertise, which is useful information if you’re looking for advice, a product, or a job.

The second most popular kind of talk could be described as “here’s a peek at what we do at our organization (and how we do it).” Bob Gleichauf (In-Q-Tel and Lab41), Phil Kim (Capital One Labs), Sebastián Pérez Saaibi (Aentropico), and Jeff Hammerbacher (Mount Sinai School of Medicine) shared high-level overviews of their organizations, what they care about, and how they work. I think DataGotham was probably the perfect forum for these people to increase awareness about what they’re developing. Each of these speakers was pretty skillful at injecting novel ideas that added insight to talks that could have been merely descriptive.

Besides the talented people, impressive organizations, and diverse set of problems they are working on, I would say two main themes emerged at DataGotham. The first is epitomized by Brian Dalessandro’s talk about how he used intellectual capital gained during his day job to help another organization in need. Many people I chatted with shared his sentiment of wanting to use their data skills for good ™ in other contexts. DataKind seems to be the state-of-the-art approach right now, but I’m hopeful other avenues will emerge with alternative approaches. In particular, I’m skeptical that one-off, weekend projects are as useful for non-profit organizations as we would like to believe.

The second theme was building and sustaining our community. Even just a few years ago, the idea of a community organized around a shared interest in data would have seemed a bit crazy. We organized around software tools, technologies, academic fields, companies, or industries. But judging from my armchair social network analysis, a fairly cohesive cluster of people has emerged that spans geography and disciplines. A panel led by the smooth-voiced Jon Bruner discussed the unique challenges and opportunities for growing our nascent community on both local and global scales. Harlan Harris shared valuable experience he’s garnered from growing DC’s community and Noel Hidalgo added a fascinating argument for creating “safe places” for people to work, learn, and collaborate. I won’t be able to do justice to the panel’s discussion here, so I highly recommend you watch it yourself when it becomes available on YouTube.

I’ve heard (and proposed) a number of hypotheses for why NYC’s own data community is so vibrant. There is a diversity of organizations (academic, non-profit, finance, ad-tech, and non-ad-tech) located in close physical proximity and they are generous with money and space. There are many active meetup groups that one could attend every month, each with fantastic volunteer speakers. People in NYC seem excited about their work and social enough to come out and talk about it on a regular basis (especially at data drinkups). There are conferences like DataGotham and Strata that bring hundreds of people together to talk about their work.

But I would submit that the real reason for the success in NYC is the core group of nerds who have been tirelessly organizing events since before it was cool (no hipster). The organizers of DataGotham (John Myles White, Mike Dewar, Drew Conway, and Hilary Mason) settle for nothing less than awesomeness in all that they do, forming a strong nucleus that others can organize around, imitate, and build on. This year’s event was no exception to this high standard. So while we might be tempted to attribute the NYC data community’s success to excellent events like DataGotham, this probably gets the causal direction wrong. They are effects of some amazing people who are the real causes.

Bad Hessian
