Keywords: Statistics, data analysis, Excel
Title: Head First Data Analysis
Author: Michael Milton
Verdict: A good introduction for the beginning data analyst
Most books on data analysis focus on algorithms, giving you the statistical tools of the job along with instructions on which technique to use for different tasks and different data sets. That isn't the approach that this Head First book takes, far from it. Author Michael Milton focuses instead on what it is a data analyst does: the why rather than purely the how. Add to that the distinctive Head First approach (informal, light-hearted and geared heavily around exercises for you to engage in) and you've got yourself a fairly unique proposition as far as data analysis books go.
The first and most obvious thing to point out is that this is not a book aimed at experienced analysts (though those who already walk the walk might enjoy dipping in anyway). The book is aimed squarely at the beginner: someone who is faced with the task of analysing some data but who doesn't have a systematic way of going about it. What the book provides, then, is a gentle way into the field using common tools (mainly Excel and OpenOffice), worked examples and a chance to tag along with exercises, questions and answers, sample data and common scenarios.
Aside from specific topics such as hypothesis testing, regression, data cleansing and so on, the book brings out the point that much data analysis work is iterative, exploratory and that it needs to be utterly focused or it ends up being a black hole that leads nowhere (but with a lot of numbers along the way). The exercises are generally useful, and it's fair to point out that you don't get the best out of this if you're just content to read along rather than dive in. The data is there and it's easy to download and play with — it's the only way to get a feel for the subject really.
A good example is the chapter on data cleansing, which works through a simple but fairly complete scenario of cleaning a dataset. It starts from scratch with a text file full of delimited records that you want to turn into something useful. This means splitting fields, cleaning text, identifying duplicates and so on. It's bread and butter work that takes up an awful lot of time in the real world, but isn't much discussed in a lot of data analysis books. Here it's a late chapter in the book, but worth having all the same.
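To give a flavour of that kind of work, here is a minimal sketch in Python. It is not taken from the book, which uses a spreadsheet for this chapter. The sample records, the field names and the `clean_records` helper are all made up for illustration, and the rule for spotting duplicates (same name and phone number after cleaning) is just one plausible assumption.

```python
import csv
import io

# Hypothetical raw data (not from the book): pipe-delimited records with
# stray whitespace, inconsistent capitalisation and one disguised duplicate.
RAW = """name|city|phone
 Alice Smith |london|020 7946 0001
Bob Jones|Leeds |0113 496 0002
alice smith|London|020-7946-0001
"""


def clean_records(text, delimiter="|"):
    """Split delimited text into fields, tidy each field, drop duplicates."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    seen = set()
    cleaned = []
    for row in reader:
        record = {
            # Collapse runs of whitespace and normalise capitalisation.
            "name": " ".join(row["name"].split()).title(),
            "city": row["city"].strip().title(),
            # Keep only the digits so differently formatted numbers compare equal.
            "phone": "".join(ch for ch in row["phone"] if ch.isdigit()),
        }
        # Assumption: two rows are duplicates if name and phone match after cleaning.
        key = (record["name"], record["phone"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


records = clean_records(RAW)
for record in records:
    print(record)
```

The third row only turns out to be a duplicate of the first once the text has been cleaned, which is why the de-duplication step comes last.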
While much of the first part of the book uses nothing more than a spreadsheet and CSV files, as topics proceed the tools change as well. The stats tool that it uses is R, the open source statistical package that packs a serious punch. While the book doesn't go into huge amounts of depth, the introduction to R is a good move, and the examples show how easy it is to get value from the tool very quickly. This is definitely a bonus.
So, on the whole this is a pretty good place to start for someone new to data analysis, but as always there's room for improvement too. Firstly the choice of topics is somewhat strange. The material on Bayesian analysis doesn't quite come off. It's a complex topic, and perhaps this is one which could have been left out or left to an appendix. It doesn't help that the text contains a number of errors which make it hard to follow (but the O'Reilly site does contain an errata page that you can check out). Secondly there is a chapter on relational databases, but it doesn't really go into SQL or any useful depth. If you've never heard of a relational database maybe you'd find it useful (at a pinch), but for most readers there's not a lot here. Again, this is something that might have been good in an appendix. More useful would have been a bit more on some basic statistical analysis of data that would prove useful to a beginner — percentiles, simple tests of significance, box plots etc.
So, overall this is a good book and one easy to recommend as a first introduction to the basic ideas of data analysis, although less suited to those already versed in the field or those who want more in the way of statistical methodologies.