Visualizing Data - TechBookReport

Keywords: Data mining, knowledge discovery, graphics

Title: Visualizing Data

Author: Ben Fry

Publisher: O'Reilly

ISBN: 0596514557

Media: Book

Level: Introductory

Verdict: An excellent introduction to the Processing environment and language


Trying to get a handle on large or complex datasets is not always easy or straightforward. Excel, the vehicle of choice for many people doing basic data analysis, simply doesn't cut it. And whereas there are plenty of data mining and knowledge discovery tools to choose from, many of these are designed to run in batch mode. For those interested in iteratively exploring data, whether to understand it, to extract features or to discover new knowledge, interactivity is essential. This is the area that Processing (http://processing.org) seeks to address. Processing is an open-source programming language and environment for processing data and creating innovative, interactive displays for data mining and exploration.

Processing consists of an integrated development environment for developing 'Sketches' (data visualisation projects) and an associated programming language based on Java (Processing programs are actually compiled down to Java). Initiated by Ben Fry and Casey Reas, Processing is fast becoming the tool of choice for developers and data analysts who want to create feature-rich graphical applications that help users understand their data.

In 'Visualizing Data', Ben Fry provides the reader with both a set of tutorials on how to use Processing with different types of data and a set of guidelines for how to approach such projects in general. The scene is set in the second chapter, which presents a worked example (mapping zip codes to locations) that shows what Processing can do, as well as Fry's recommended process for tackling these projects.

While the first two chapters do a good job of laying the groundwork, the following chapters delve more deeply into specific types of data and visualisations: maps, time series, graphs, scatterplots, networks, trees and more. In each case there is discussion of the key issues involved: data acquisition, parsing, generating displays, using supporting libraries and so on. The book makes liberal use of code, which is available for download along with sample datasets. Additional topics are covered where appropriate; for example, there is discussion of XML parsing, JSON and other common formats, and of external libraries for accessing Excel.

The theme of data parsing, though touched on in a number of chapters, is also the subject of a dedicated chapter. Here there is discussion of common formats and how they can be parsed, along with a look at the tools available for handling them. The emphasis is on finding what works rather than over-engineering solutions, and the heuristic approach that Fry recommends makes a lot of sense for those of us who have had to do such work in practice. While not as detailed as Greg Wilson's 'Data Crunching', there's some good advice on offer.

The relationship between the Processing language and Java is also explored, and the final chapter looks at how you can make use of the Processing API from within Java, including instructions on using Processing APIs with Java applets.

Finally, mention must be made that the book is illustrated with numerous colour graphics, giving the reader a taste of what the code in the book is doing. For anyone interested in learning how to use Processing, the book is a must. It does more than just walk through a set of examples: it provides the reader with good background, sound advice and lots of real code to try out.
