Wednesday 24 June 2009

To Excel or not to Excel?

Microsoft software product names are a pretty good source of ironic amusement. Of those I use regularly, “Excel” and “Access” vie for the top spot, which is probably why I found the paper by David Lawunmi, posted to the HSUG mailing list a week or two ago, so interesting.

Lawunmi uses a wealth of interesting references to tour some of the drawbacks of Excel. These include - without rehashing his entire article - a variety of statistical flaws and missing functions, the use of binary rather than decimal floating-point arithmetic, and generally appalling charting. More damning to my mind, though, is the lack of record keeping involved in analytical spreadsheet work. Generally, at the end of a piece of Excel work you’re left with your raw data file, your results, a mish-mash of semi-processed data, and a grim smile. There’s no audit trail or log of what you’ve done, and your results can’t be replicated by anyone else (or, often, by yourself). Not that using spreadsheets makes good practice impossible, of course, but they certainly discourage it.
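
The floating-point issue is easy to demonstrate. Excel, like most numerical software, stores numbers as binary IEEE 754 doubles, so many exact decimal values (0.1, say) have no exact binary representation. A minimal Python sketch - Python uses the same binary doubles - of the sort of thing that results:

```python
# Binary doubles cannot represent most decimal fractions exactly,
# so apparently trivial sums come out very slightly wrong.
print(0.1 + 0.2)           # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)    # False

# True decimal arithmetic, by contrast, behaves as you'd expect.
from decimal import Decimal
print(Decimal("0.1") + Decimal("0.2"))   # prints 0.3
```

Excel rounds some of this away at display time, which arguably makes matters worse: the error is still there, just hidden.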

The analogy between programming projects and spreadsheets that Lawunmi takes from the NPL document is pretty close, but a tighter one is with scripting languages, that subset of the programming languages: quick and dirty solutions to small problems, and a way of making large problems functionally insoluble. However, friends who work in the software business tell me that wherever there is an interface between large proprietary systems you will find a nest of script jobs providing the cushion and the interface. And of course we live in the real world: Excel is ubiquitous and we will all continue to use it:

“Excel is utterly pervasive. Nothing large (good or bad) happens without it passing at some time through Excel”
Source: Anonymous interviewee quoted by Croll.

So where do we go from here?

You’re going to use Excel for many (most?) things, but make the effort to supersede it where it counts – don’t get sucked into software monolingualism. For proper stats, use a proper stats package. For proper graphs, use a proper graphics package. When you do use Excel, be careful; and if you can’t be careful, be lucky. Or perhaps you could move to OpenOffice. Hmmm, must check that out...
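
For a sense of what the scripted alternative buys you, here is a hypothetical, minimal Python sketch (the same point applies to R, Stata, or any scripting stats package; the input file and column name are made up for illustration). The whole analysis lives in the script, so it doubles as the audit trail a spreadsheet never gives you:

```python
# A hypothetical minimal scripted analysis: every step from raw data
# to results is written down and re-runnable, unlike spreadsheet work.
import csv
import statistics

# "measurements.csv" is a made-up input file with a "value" column.
with open("measurements.csv", newline="") as f:
    values = [float(row["value"]) for row in csv.DictReader(f)]

print(f"n     = {len(values)}")
print(f"mean  = {statistics.mean(values):.3f}")
print(f"stdev = {statistics.stdev(values):.3f}")
```

Rerun it next year, or hand it to a colleague, and you get the same results - no grim smile required.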

In the final analysis perhaps the question should be rewritten wryly: “To excel when Excel doesn’t?”

Wednesday 17 June 2009

Generating Principles for Analysts, Part 1

I was reading the Overseas Development Institute's Exploring the Science of Complexity today (more on this later), which I found through Duncan Green's blog From Poverty To Power.

One of the examples it gave of complex behaviour generated from simple rulesets was the US Marine Corps, which apparently uses "Capture the high ground", "Stay in touch", and "Keep moving" (I haven't managed to confirm this, though the principles are certainly in keeping with the maneuver warfare thinking of John Boyd).

Regardless, the question arises... what might these be for analysts?

I think I got a line on the first one a few weeks ago, when I was facilitating a workshop on the main points of Tufte's books. He spends a fair amount of time in The Visual Display of Quantitative Information on how graphs are made up of "data ink", which conveys information, and "non-data ink" (including chart junk), which doesn't. One of his examples takes an original graph (Linus Pauling, General Chemistry, San Francisco, 1947, p. 64):

[Figure: Pauling's original graph]
Source: The Visual Display of Quantitative Information, Edward Tufte

and with the non-performing non-data ink removed (and just a little bit of data ink added), we have:

[Figure: Tufte's redesign of the same graph]
Source: The Visual Display of Quantitative Information, Edward Tufte

Much better - it reads at a glance.

It occurred to me that the concept of data ink versus non-data ink is another way of expressing the signal-to-noise ratio.

Errors are a form of noise. Lack of clarity is noise. Extraneous information is noise.

So... Maximise Signal:Noise as a first generating principle. Not bad.
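
Tufte even quantifies the graphical version of this as the data-ink ratio: the data ink divided by the total ink used to print the graphic, which is more or less signal over signal-plus-noise. A minimal matplotlib sketch, with made-up data, of what maximising it looks like in practice:

```python
# A minimal sketch (made-up data, not one of Tufte's figures) of raising
# the data-ink ratio: plot the data, then strip away the non-data ink.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

fig, ax = plt.subplots()
ax.plot(x, y, color="black")

# Remove non-data ink: the box around the plot and any grid.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)
ax.tick_params(direction="out", length=3)

ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```

Same data, less ink: a higher signal-to-noise ratio.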

Wednesday 10 June 2009

"A city is not a tree"

Christopher Alexander's 1965 essay (Part I & Part II, and also Part I with all diagrams intact) has a great title, but it deserves its status as a classic for its ideas. It argues that the elements of a city – the people, the buildings, the businesses – and their interrelations are not well represented by a mathematical tree structure, but much better by a mathematical semi-lattice. The complex interrelations between these elements are far better described by a structure that allows them to overlap.
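
Alexander's definitions are crisp enough to write down directly. Roughly (my paraphrase, with a made-up example): a collection of sets forms a tree if any two of its sets are either disjoint or nested, and a semi-lattice if the intersection of any two overlapping sets is itself in the collection. A small Python sketch:

```python
# Alexander's structures, paraphrased: a tree allows no partial overlaps;
# a semi-lattice allows them, provided each overlap is itself an element.

def is_tree(sets):
    # Any two sets must be disjoint, or one must contain the other.
    return all((a & b) in (set(), a, b) for a in sets for b in sets)

def is_semilattice(sets):
    # Every non-empty pairwise intersection must be in the collection.
    return all(not (a & b) or (a & b) in sets for a in sets for b in sets)

city = [{1, 2, 3, 4, 5, 6}, {1, 2, 3}, {3, 4, 5}, {3}]
print(is_tree(city))         # False: {1, 2, 3} and {3, 4, 5} partially overlap
print(is_semilattice(city))  # True: their intersection {3} is in the collection
```

Every tree is trivially a semi-lattice, but not the reverse - the overlaps are exactly what the tree structure throws away.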

Isn't this obvious? To me at least, only in hindsight. When I first read it I had one of those "Ah-Hah!" moments as the idea struck me. Of course it had to be that way, except I'd somehow never known it.

And to this day tree-patterned suburbs continue to be built around many urban cores, which suggests either that it isn't obvious to the planners, or that Alexander was also correct in advancing the idea that people have a cognitive barrier to representing concepts as semi-lattice structures. Another cognitive problem to look out for, I guess...

So, what has any of this got to do with information analysis? The lesson I take from it is that I need to get the analytical framework right before I can hope to get anything useful out of my data, and that any work I do based on the wrong framework is worse than useless.

And wider than that? That the systems that interest us most are richer and more complex than we find it easy to conceive. That seems like a good place to leave this first post.