
Software and Chicken Entrails

I had a collection of epiphanies today about Informational Software.

“Informational Software” is a term I use to describe software that helps you understand and make decisions about information. It is not a product and does not make your business money, but it can be used to help you understand your business and therefore, in theory, help you make money. For example, analytics software does not make you money, but you can use it to understand your traffic and hopefully to then minimize your costs and maximize your revenues. (Note that if you are selling an analytics package, this definition is still true, because in that case the software is a product, not an information tool*.)

Informational Software comes in two types: software that interprets trends and data and makes decisions for the user, and software that collects and reveals data so the user can make their own decisions. Expert systems are an example of the former; analytics packages are an example of the latter.

Let’s call this mess of data the chicken entrails. We want to read them and predict the future, right? Sure we do. That’s what chicken entrails are for.

Okay, enough definition. Here are the epiphanies:

First: If your users are untrained and the data is simple, your system can advise the user and/or make decisions for them. If they cannot read the entrails, or the entrails are too simple to bother the entrail readers, do not let/make them read the entrails.

Second: (Pay attention, this is the important bit) If your users are highly trained and the data is very complex, do not attempt to interpret the entrails for them. Just show them the entrails. Highly trained users who have asked you for Informational Software do not want you to do their job; they want a tool to help them do their job better.

Third: My instinct is always to write software that interprets entrails, regardless of the complexity and regardless of user knowledge. Learn to stop and figure out what is really needed.

Fourth: Interpreting complex data is really, really hard. On a small, knock-it-out project, it is almost certainly doomed to fail. Now reread the second epiphany: on small, knock-it-out projects for highly trained users, it is also completely unnecessary.

Right now I’m working on software that reveals entrails to some truly arcane masters of entrail reading. They have become masters because the information system currently available to them is literally designed to protect the data from their eyes. I have found two pieces of data, correlated them, and put them into a report, and they think I’m a super genius. Not because my software is smart, but because it is smart enough to be dumb.
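Here is a minimal sketch of what "smart enough to be dumb" might look like in code: match two data sets on a shared key and print them side by side, with no scoring, ranking, or advice. The function names and the sample data are invented for illustration; they are not from the actual system described above.

```python
# Hypothetical sketch: join two data sets on a shared key and show them
# side by side, without interpreting anything for the reader.

def correlate(left, right):
    """Pair up the values from two dicts for every key present in both."""
    shared = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared]


def format_report(rows, headers):
    """Render rows as a plain fixed-width text table."""
    table = [tuple(headers)] + [tuple(str(cell) for cell in row) for row in rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(headers))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in table
    )


# Invented placeholder data, not taken from any real system.
orders_by_region = {"north": 120, "south": 95, "west": 40}
complaints_by_region = {"north": 3, "south": 11, "east": 2}

report = format_report(
    correlate(orders_by_region, complaints_by_region),
    headers=("region", "orders", "complaints"),
)
print(report)
```

The report deliberately stops at showing the paired numbers. Deciding what they mean is left to the people who already know how to read them.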

I really like epiphanies where I suddenly realize that it is not necessary to be doomed to failure.

* And if you’re smart enough to reason “but what if you’re using your analytics tool to analyze the sales of your analytics tool”, I congratulate you on your cleverness. Can you also see the flaw in this reasoning?

Terrible Beauty

Recently I dumped all of the data out of one of our geographic tables in our database, and plotted a red dot for every record. This was the result:

[Image: map of the United States with one red dot per record. The original post linked to a larger 1200×800 version.]

This is an interesting graph. It appears to follow population, but there are also some patterns that follow state outlines: there are some curious voids in North Carolina and Minnesota, for example. In the west, we can almost make out highways; there’s a visible north/south track through Denver along I-25, and there’s a strong clustering down I-15 through Utah with a big blotch over Las Vegas.

Kinda neat, huh?

What you are looking at is every registered sex offender in the United States. My job is to collect and analyze that data, along with crime information.

Still neat? Sigh.

There are still some interesting patterns, however. Population centers follow the freeways in the western states. As the population density rises to the east, we lose track of individual roads; sex offenders are only absent where people cannot live. Physical barriers like coastlines and bays are clearly marked, of course; but you can also clearly see places where people merely aren’t allowed to live: the Okefenokee National Wildlife Refuge on the Georgia-Florida border, the Allegheny National Forest and Susquehannock State Forest in Pennsylvania, and the cluster of tiny parks and preserves all throughout upstate New York.

I am most interested, however, in North Carolina and Minnesota. Why are they so obviously underrepresented here? Are people just less likely to offend sexually in those states? Possibly; I am too cynical to believe that people in those states behave differently than in the other 48, but it could be that NC and MN convict sex offenders less often, or they could simply have less strict sex offender laws.

Ah, but see how trusting you are of my little dots? It could be that the NC and MN sex offender registry systems are just harder to get data from. While I haven’t worked with those states directly, I have worked on the code that aggregates this data, and I can tell you that no two states provide data the same way. Some states are sensible and reliable, while others are such a headbashing nightmare—Alaska, I am looking at you here—that you wonder if their IT department had to make do with wild monkeys because the trained ones were too expensive.

Here’s a really good possibility: Some states track sex offenders at different levels, and sex offenders under a certain level are tracked inside the state but not published to national registries. It could be that those two states have just as many registered sex offenders (RSOs) per capita, but they do not report misdemeanor offenses outside the state.

Or, who knows. It really could be that in North Carolina people are too nice—and in Minnesota it’s just too cold—for that sort of thing.
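One way to make the "per capita" question concrete is to turn raw counts into a rate per 100,000 residents and flag any state that falls far below the median. This is only a sketch: the state labels, counts, populations, and the 50% threshold below are all invented placeholders, not real registry or census figures.

```python
import statistics


def rate_per_100k(count, population):
    """Registered records per 100,000 residents."""
    return count * 100_000 / population


def flag_low_states(counts, populations, threshold=0.5):
    """Return {state: rate} for states whose rate is below threshold * median.

    A low rate only shows that a gap exists. It cannot tell a real
    difference in behavior apart from missing or unpublished data.
    """
    rates = {s: rate_per_100k(counts[s], populations[s]) for s in counts}
    median = statistics.median(rates.values())
    return {s: r for s, r in rates.items() if r < threshold * median}


# Invented placeholder numbers for four made-up states.
counts = {"A": 500, "B": 480, "C": 100, "D": 1000}
populations = {"A": 1_000_000, "B": 1_000_000, "C": 1_000_000, "D": 2_000_000}

flagged = flag_low_states(counts, populations)
print(flagged)
```

With these made-up numbers, only state "C" is flagged. The code can point at the void; it cannot say which of the explanations above is the right one.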

Note: some of the dots lie outside the US. This is not always a mistake. Sex offenders are required to keep their address registered even if they move outside the USA. There are a few thousand US sex offenders scattered across the globe—including one at the South Pole.
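For the curious, here is a rough sketch of how a dot map like the one above could be produced with nothing but the Python standard library. It is not the code I actually used. The bounding box, the simple equirectangular projection, the function names, and the three sample points are all assumptions made for illustration. Points outside the box, like the one at the South Pole, are counted and skipped rather than drawn.

```python
WIDTH, HEIGHT = 1200, 800

# Assumed bounding box roughly covering the contiguous United States.
WEST, EAST = -125.0, -66.0
SOUTH, NORTH = 24.0, 50.0


def to_pixel(lat, lon):
    """Map a coordinate to (x, y) on the canvas, or None if it is off the map."""
    if not (SOUTH <= lat <= NORTH and WEST <= lon <= EAST):
        return None
    x = int((lon - WEST) / (EAST - WEST) * (WIDTH - 1))
    y = int((NORTH - lat) / (NORTH - SOUTH) * (HEIGHT - 1))  # y grows downward
    return x, y


def render(records):
    """Return (pixels, skipped): the set of dot positions and the off-map count."""
    pixels, skipped = set(), 0
    for lat, lon in records:
        point = to_pixel(lat, lon)
        if point is None:
            skipped += 1
        else:
            pixels.add(point)
    return pixels, skipped


def write_ppm(pixels, path):
    """Write red dots on a white background as a binary PPM image."""
    white, red = bytes((255, 255, 255)), bytes((255, 0, 0))
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (WIDTH, HEIGHT))
        for y in range(HEIGHT):
            f.write(b"".join(red if (x, y) in pixels else white
                             for x in range(WIDTH)))


# Illustrative points only (approximate city coordinates), not real records.
sample = [
    (39.74, -104.99),  # Denver
    (36.17, -115.14),  # Las Vegas
    (-90.0, 0.0),      # South Pole, outside the bounding box
]
pixels, skipped = render(sample)
print(len(pixels), "dots drawn,", skipped, "skipped")
```

Calling `write_ppm(pixels, "dots.ppm")` would save the image to disk. A single pixel per record is crude, but with enough records the density does the work, which is why roads and state lines show up without anyone drawing them.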