Tools to help you manipulate text data

Thursday, April 21, 2011 |

You may be wondering what tools we recommend for working with Data Transfer and Match Tables. While there are many options, both commercial and otherwise, the following are some of our recommendations. What they all have in common is that they are tools designed for manipulation of large text files. These tools also share something else: they are all either free or open-source, so we hope that makes them even more useful for you. Some of these same tools are used by the DoubleClick Customization team - the DoubleClick team that builds the occasional DT-based custom report or application for clients.

So if you're looking for some alternatives to try out, we hope you find the following tools useful:

Crush-Tools - CRUSH (Custom Reporting Utilities for SHell) is a collection of tools for processing delimited-text data from the command line or in shell scripts. The tools are especially "Data Transfer aware" in that they understand DT headers, delimiters and column types. At the site find a tutorial and usergroup. A simple example of CRUSH is the following command line statement where familiar shell commands are enhanced:

gzdog <dt_filename> | grepfield -F Event-ID [12] | aggregate2 -K Site-ID,Activity-Sub-Type -S Revenue -L | pivot -P Site-ID,Activity-Sub-Type -A Revenue-Sum


"unzip and cat the DT file, select only post-click and post-impression activities, aggregate the revenue by Site and Floodlight tag, and pivot the results for each site/activity combination."

Google Refine - a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase. The tool can help make merging disparate data a breeze. Check out the engaging demo videos.

Google Fusion Tables (beta) - Upload small or large data sets from spreadsheets or CSV files. Visualize your data on maps, timelines and charts. Pick who can access your data; hide parts of your data if needed. Merge data from multiple tables. Discuss your data with others. Track changes and discussions. Drop a DT file into Fusion Tables and in short order you'll be able to graph and sift through the data through a spreadsheet like UI.

Google Visualization API - lets you access multiple sources of structured data that you can display, choosing from a large selection of visualizations. Thus you can create reports and dashboards as well as analyze and display your data through the wealth of available visualization applications. In terms of DT, once you've processed your files, make it visually useful!

-- Matthew Trojanovich, the DoubleClick Data Transfer Team