
Keywords: Machine learning, statistics, R, data mining

Title: Machine Learning for Hackers

Authors: Drew Conway and John Myles White

Publisher: O'Reilly

ISBN: 978-1449303716

Media: Book

Level: Introductory machine learning, some previous experience of R useful

Verdict: A solid and readable book for those wanting to explore machine learning with R


The first thing to say about this book is that the title doesn't really do the content justice. To my mind, a title like Machine Learning for Hackers suggests programmers writing code in a language like Python, Ruby or Java. It really doesn't bring to mind the R statistical language and environment. A more accurate title would be something like Hands-on Machine Learning With R - not such a snappy title, I'll grant you, but closer to what the authors have produced.

With that said, there's no doubting that this is a great little book for those who want to get to grips with data mining/machine learning using the best open source statistical system in the world. One of the great things about this book is that the authors clearly emphasise the iterative, exploratory nature of much machine learning and data mining activity. In this respect the book is much more useful than a simple listing of different algorithms and approaches (there are plenty of good books already on the market that focus more on the algorithmic side of machine learning).

Aside from the first couple of chapters, the book is primarily task oriented. Those opening chapters cover the fundamentals, such as installing and running R, and take a quick look at key descriptive statistics (quantiles, standard deviations, averages etc) in the context of data exploration. The latter is an important point: as the succeeding chapters show, a good chunk of any data analysis project is taken up with understanding, validating, cleansing and formatting the data so that it's useful for the task at hand.
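To give a flavour of the descriptive statistics mentioned above, here is a minimal sketch. It is written in Python rather than R, uses only the standard library, and the page view figures are made up for illustration; it is not taken from the book.

```python
import statistics

# Hypothetical sample: daily page views for a small site (made-up numbers,
# with one deliberate outlier).
page_views = [120, 135, 150, 110, 400, 125, 140, 130]

mean_views = statistics.mean(page_views)
median_views = statistics.median(page_views)
stdev_views = statistics.stdev(page_views)          # sample standard deviation
quartiles = statistics.quantiles(page_views, n=4)   # three cut points: Q1, Q2, Q3

print(f"mean={mean_views:.2f} median={median_views} stdev={stdev_views:.2f}")
print(f"quartiles={quartiles}")
```

Comparing the mean with the median is a quick way to spot the kind of outlier that needs checking before any modelling starts, which is exactly the sort of data exploration the early chapters encourage.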

From then on the book tackles standard machine learning topics chapter by chapter, each geared around an extensive worked example. Topics covered include classification (spam filtering), ranking (priority email inbox), regression (page views prediction), regularisation (text regression), optimisation (code breaking), principal components analysis (building a market index), multidimensional scaling or MDS (exploring US senator similarity), k-nearest neighbours or kNN (recommendation systems), analysing social graphs and support vector machines. That's a fairly comprehensive list of techniques, and a good grounding in them provides a very solid background for tackling some very complex data mining and knowledge discovery projects.

While these techniques differ in algorithm and implementation, each chapter runs a fairly common course. The first step is always to explore the data, get a feel for it and then go through the process of collating different data sets (for training and testing, for example). In doing this there's a good run-down of key R concepts in data handling, input/output and libraries for visualising the data. Because the examples cover such a wide range of applications, the main data types (numeric, text and dates) are covered in some detail. Once the data has been organised the particular technique being used is introduced, very often by performing intermediate steps explicitly rather than just using a library function in one hit (though these library functions are introduced later). It's a very practical and hands-on way of working and, more importantly, it aids in developing understanding and proficiency. It's this hands-on, iterative way of working that the authors say is the hallmark of the true hacker…
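As a rough illustration of that course (split the data into training and test sets, then spell out the intermediate steps of a technique instead of calling a library function), here is a small sketch of a k-nearest-neighbours classifier. It is in Python rather than R, the data is synthetic, and the function names `train_test_split` and `knn_predict` are hypothetical helpers written for this example, not the authors' code.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=3):
    """Classify a point by majority vote among its k nearest training rows."""
    # Step 1: rank every training row by Euclidean distance to the point.
    neighbours = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    # Step 2: count the labels of the k closest rows and take the winner.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Synthetic, well-separated two-class data: ((x, y), label).
data = [((x, y), "a") for x in range(0, 4) for y in range(0, 4)]
data += [((x, y), "b") for x in range(10, 14) for y in range(10, 14)]

train, test = train_test_split(data)
accuracy = sum(knn_predict(train, p) == label for p, label in test) / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

Writing out the distance ranking and the vote by hand makes it clear what a packaged kNN function is doing on your behalf, which is the point of working through the steps before reaching for the library version.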

Overall, this is a great book for R users wanting to get into machine learning, and for people already familiar with machine learning who want to know how to use R as their platform.


Contents © TechBookReport 2012. Published July 12 2012