About Alex

I'm Alex Hanna, a PhD candidate in sociology at the University of Wisconsin-Madison. Substantively, I'm interested in social movements, media, and the Middle East. Methodologically, I'm interested in computational social science, textual analysis, and social network analysis. You can find me on Twitter at @alexhanna and on the web at http://alex-hanna.com.


(GIF via Dilettwat)

We're down to the final episode. This one is for all the marbles. Wait, that's not the best saying in this context. In any case, moving right along. In the top four episode, Detox was eliminated, but not before Roxxxy threw maybe ALL of the shade towards Jinkx (although, to Roxxxy's credit, she says a lot of this was due to editing).

Jinkx, however, defended herself well by absolutely killing the lipsync. Probably one of the top three of the season, easy.

Getting down to the wire, it’s looking incredibly close. As it is, the model has ceased to tell us anything of value. Here are the rankings:

1         Alaska  0.6050052  1.6752789
2 Roxxxy Andrews  2.5749070  3.6076899
3  Jinkx Monsoon  3.4666713  3.2207345

[Figure: predict-change-20130416]

But the confidence intervals show that all three estimates are statistically indistinguishable from zero. The remaining girls don't have sufficient variation on the variables of interest to differentiate them from each other in terms of winning this thing.
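To make "indistinguishable from zero" a bit more concrete: the table above doesn't label its columns, but if the second number next to each estimate is its standard error (an assumption on my part), a rough Wald-style check is just whether the estimate sits at least roughly 1.96 standard errors away from zero, which is the same thing as a 95% confidence interval that excludes zero. A quick sketch in Python:

    # Rough Wald-style check: is each estimate at least ~1.96 standard errors
    # away from zero? This assumes the second number printed for each queen is
    # a standard error, which the table above doesn't actually say.
    estimates = {
        "Alaska":         (0.6050052, 1.6752789),
        "Roxxxy Andrews": (2.5749070, 3.6076899),
        "Jinkx Monsoon":  (3.4666713, 3.2207345),
    }

    for queen, (est, se) in estimates.items():
        z = est / se
        verdict = "distinguishable" if abs(z) > 1.96 else "indistinguishable"
        print(f"{queen:15s} z = {z:5.2f}  {verdict} from zero")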

So what's a drag race forecaster to do? Well, the first thought that came to my mind was: MOAR DATA. And hunty, there's one place where I've got data by the troves: Twitter.


We’re in the Final Four now, the actual final four that matters (sorry sports forecasters).

Last week, Coco got the chop, which made sense both statistically (she had a huge relative risk AND had been the first queen to have to lipsync four times) and from a narrative standpoint: Alyssa got eliminated the week before, so they didn't really need to keep Coco around to continue all that drama.

So now the biggest question is: who gets kicked off this week, leaving us with our top three? Before I get to my predictions, I want to point readers to Dilettwat's analysis, which, while it uses no regressions, is still chock-full of interesting statistics about the top four and makes some predictions about who needs to do what in this episode to win.

For my own analysis, it’s looking very close here. Here are the numbers.

1  Jinkx Monsoon  1.2170057  1.5580338
2         Alaska  1.2509423  2.0045457
3 Roxxxy Andrews  3.2466063  3.3926072
4          Detox  5.5899580  1.5527694

Jinkx and Alaska are neck-and-neck in this model, and the confidence intervals make pairwise comparisons rather difficult here. But this order is the same as Homoviper's Index.

Just to get a sense of how close this is, here is a plot that tracks the relative risks across the last few weeks.

[Figure: predict-change-20130415]

Last week, Coco's relative risk was incredibly high, the highest it had been. This week, Detox has the only relative risk that is distinguishable from zero, which makes me think she's about to go. Which makes sense: she's the only one in the group who has lipsynced more than once. So the only thing I'm willing to say with any confidence is that Detox goes home tonight.

On a totally different note, Jujubee came to Madison on Thursday and I got a chance to tell her about my forecasting efforts when we were taking some pictures…

Then I saw Nate Silver at the Midwest Political Science Association conference in Chicago. Unfortunately, there was not an opportunity for a Kate Silver/Nate Silver photo op. Maybe next time.

I’m scribbling this furiously because I had a busy weekend, but stay tuned for next week’s extra special analysis.

This week, the Global Database of Events, Language, and Tone (GDELT) dataset went public. The architect of this project is Kalev Leetaru, a researcher in library and information sciences, and it owes much to the work of Phil Schrodt.

The scale of this project is nothing short of groundbreaking. It includes 200 million dyadic events from 1979-2012. Each event records the source and target actors (not only states, but also substate actors), the type of event, drawn from the CAMEO scheme specified by Schrodt, and, for many events, the longitude and latitude where it took place. The events are drawn from several different news sources, including the AP, AFP, Reuters, and Xinhua, and are computer-coded with Schrodt's TABARI system.

To give you a sense of how much this improves on the granularity of what we once had, the last large project of this sort that wasn't housed in a national security organization was King and Lowe's 10 million dyadic events dataset. Furthermore, the dataset will be updated daily. And to put a cherry on top, as Jay Ulfelder pointed out, it was funded by the National Science Foundation.

For my own purposes, I'm planning on using these data to extract protest event counts. Social movement scholars have typically relied on hand-coding newspaper archives to count protest events, which is time-consuming and also susceptible to selection and description bias (Earl et al. 2004 have a good review of this). This dataset has the potential to take some of the time out of that; the jury is still out on how well it handles the biases, though.

For what it's worth, though, it looks like it does a pretty bang-up job with some of the Egypt data. Here's a simple plot I did across time for CAMEO codes related to protest with some Egyptian entity as the source actor. Counts are rather low until January 2011, then stay fairly steady throughout the year, peaking again in November 2011 during the Mohamed Mahmoud clashes.

[Figure: egypt-gdelt — monthly GDELT protest events with an Egyptian source actor]
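For readers who want to poke at the raw files themselves, here's one way a plot like this could be put together. This is just an illustrative sketch, not the code behind the figure above: it assumes a tab-delimited GDELT event file on disk, a local copy of GDELT's published header row (the gdelt_header.txt filename is made up), and it leans on the GDELT fields MonthYear, EventRootCode, and Actor1CountryCode, with CAMEO root code 14 covering protest events.

    # Sketch: monthly counts of GDELT events with CAMEO root code 14 (protest)
    # and an Egyptian source actor. Filenames here are placeholders.
    import pandas as pd
    import matplotlib.pyplot as plt

    # GDELT's event files ship without a header row; assume we've saved the
    # published column names to a local file as one tab-separated line.
    gdelt_cols = open("gdelt_header.txt").read().strip().split("\t")

    events = pd.read_csv("egypt_events.tsv", sep="\t", names=gdelt_cols, dtype=str)

    protests = events[(events["EventRootCode"] == "14") &
                      (events["Actor1CountryCode"] == "EGY")]

    counts = protests.groupby("MonthYear").size().sort_index()
    counts.index = pd.to_datetime(counts.index, format="%Y%m")

    counts.plot()
    plt.ylabel("Protest events, Egyptian source actor")
    plt.show()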

These data have a lot of potential for political sociology, where computer-coded event data haven't really made much of an appearance. Considering how granular the data are and how many substate actors they cover, social movement scholars would be remiss not to start digging in.

A few other resources on GDELT:
Leetaru and Schrodt’s 2013 ISA paper
A paper by Jay Yonamine (one of Schrodt's students) on predicting levels of violence in Afghanistan

Last week, Alyssa got the boot and Jinkx kept her place. And I totally called it with my first model that accounted for the proportional hazards assumption. I think the model is having a little more success as the season plods on.

Before I get to the predictions for episode 10, there are two really interesting prospects that may either give this model some more predictive power or become very interesting projects in their own right.


Last week, Alaska took it home with her dangerous performance, while Ivy Winters was sent home after going up against Alyssa Edwards. This is sad on many fronts. First, I love me some Ivy Winters. Second, Jinkx had revealed that she had a crush on Ivy, and the relationship that might have flourished between the two would have been too cute. But lastly, and possibly worst of all, both of my models from last week had Ivy on top. Ugh.

What went wrong? Well, this certainly wasn’t Ivy’s challenge. But it’s high time that I started interrogating the models a little further.

A few weeks ago, Twitter announced that they were releasing a client for their Streaming API. It’s open-source! Get it here: https://github.com/twitter/hbc

This is pretty great news, for a few reasons:

  1. The Streaming API relies on a persistent connection, so doing all that messy authentication and making sure you're not going to drop any information is simplified and will comport with Twitter's specs.
  2. Twitter is deprecating v1 of its APIs, including both the Streaming and REST APIs. They haven't made any dramatic changes to the Streaming API, but it still means changing libraries or expecting whoever maintains your library of choice to update it.
  3. It’s all being developed and actively maintained in-house by Twitter. The maintainers, @steven and @kevino (apparently one of the perks of working at Twitter is getting an awesome username), are especially responsive with bug fixes and pull requests.
  4. There’s a plugin for the Twitter4j library, if you want to implement listeners that do any background data handling or parsing for particular pieces of data (deletes vs. stall_warnings). I haven’t tried this yet but it looks promising.

The downside? It’s in Java. While this used to be a nice insult when I was hacking around in CS 180 and Java was at version 1.4.2, Java has gotten much faster since then. The addition of projects like Apache Maven has made development with dependencies and handling classpaths much easier. But then, you still have to know at least a little Java to get the thing up and running.

I’ve been using this as my primary gardenhose collection device for a few weeks now with only a handful of issues, as bugs are surfacing in development but being squashed soon after.

Wow, last week’s Drag Race post made the rounds in the stats and Drag Race circles. It was cross-posted to Jezebel and has been getting some pretty high-profile links. A little birdy told me that Ms. Ru herself has read it. I think I can die a happy man knowing that RuPaul has visited Bad Hessian.

Anyhow, last week I tried to count Coco out. I was reading her like the latest AJS. The library is open. But her response to me was simple — girl, please:

(Also this happened. Wig under a wig.)


(both of these gifs courtesy of f%^@yeahdragrace)

Can that win safeguard Coco from getting eliminated? Let’s look at the numbers after the jump.


If you follow me on Twitter, you know that I'm a big fan of RuPaul's Drag Race. The transformation, the glamour, the sheer eleganza extravaganza is something my life needs to interrupt the monotony of grad school. I was able to catch up on nearly four seasons in a little less than a month, and I've been watching the current (fifth) season religiously every Monday at Plan B, the gay bar across from my house.

I don't know if this happens with other reality shows (this is the first one I've been taken with), but there is some element of prediction involved in guessing who will come out as the winner. A drag queen we spoke with at Plan B suggested that the length of time each queen appears in the season preview is an indicator, while Homoviper's "index" is largely based on a more qualitative, hermeneutic analysis. I figured, hey, we could probably build a statistical model to figure out which factors are most determinative in winning the competition.
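I'll save the modeling details, but since the posts above talk in terms of relative risks and a proportional hazards assumption, a survival-analysis setup is one natural way to frame the problem. Below is a toy sketch of that kind of model using Python's lifelines library (not necessarily the tooling behind these posts) on made-up data, where each row is a queen, the duration is the episode she lasted until, and the covariates are things like challenge wins and lipsyncs.

    # Toy sketch of a survival-style model of elimination, using lifelines.
    # The data below are made up for illustration; this is not the actual model.
    import pandas as pd
    from lifelines import CoxPHFitter

    queens = pd.DataFrame({
        "episode":    [1, 2, 3, 4, 5, 6, 7, 8],   # episode the queen lasted until
        "eliminated": [1, 1, 1, 1, 1, 1, 1, 0],   # 0 = censored (still standing)
        "wins":       [0, 1, 0, 0, 2, 1, 2, 3],   # challenge wins so far
        "lipsyncs":   [2, 0, 1, 2, 0, 1, 1, 0],   # times in the bottom two
    })

    # A small ridge penalty keeps the fit stable on a tiny toy dataset.
    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(queens, duration_col="episode", event_col="eliminated")
    cph.print_summary()  # exp(coef) is the relative risk tied to each covariate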


A friend of mine recently lent me a Raspberry Pi, a tiny, ultra-cheap computer that uses an SD card for a hard drive and a micro-USB cord for power. It runs either their Debian distro (Raspbian) or Arch without much trouble.

I’ve been thinking of cool things to do with it. There are a number of pretty cool projects that folks have done with a little bit of wiring and elbow grease. These are no doubt amazing and worthy of much geek admiration. I’ve been trying to think of something novel to do with mine. Matt and I thought we were on to something when we discussed a cat toy that would change direction when it hit a wall.

I put the little DC motor away when I realized that there's probably some nifty social-scientific data to be had with a tiny computing device, a portable power source, and a USB wireless card. When I asked Twitter what one could do here, David Masad suggested replicating this study, which estimates crowd sizes using dropped wireless packets. A few other ideas I've had: an infrared sensor that counts foot traffic in particular areas, sensors hooked up next to a door to get data on people entering and leaving a room, or a sound monitor that collects data on voice interaction. Of course, some of these suggestions get dangerously close to violating individual privacy and disclosure, so the more these sensors keep people anonymous, the better.
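To make the wireless idea a little more concrete, here's a rough sketch of one way a Pi could count nearby devices: listening for Wi-Fi probe requests with scapy and keeping only hashed identifiers, so nothing personally identifying gets stored. To be clear, this is a different (and cruder) technique than the dropped-packets crowd-estimation study mentioned above; the interface name is just a placeholder, and it assumes a wireless card that supports monitor mode.

    # Sketch: count distinct nearby devices by sniffing Wi-Fi probe requests.
    # Requires a wireless interface in monitor mode ("mon0" is a placeholder)
    # and root privileges; MAC addresses are hashed so raw identifiers are
    # never written anywhere.
    import hashlib
    from scapy.all import sniff, Dot11

    seen = set()

    def handle(pkt):
        # Probe requests are 802.11 management frames (type 0) with subtype 4.
        if pkt.haslayer(Dot11) and pkt.type == 0 and pkt.subtype == 4:
            mac = pkt.addr2
            if mac:
                seen.add(hashlib.sha1(mac.encode()).hexdigest())
                print("distinct devices seen so far:", len(seen))

    sniff(iface="mon0", prn=handle, store=0)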

So, tossing this idea out to Bad Hessian readers — what would be some other “unobtrusive methods” that could be leveraged with a tiny computer and some simple sensors?