6 tips for consuming Data Transfer files

Tuesday, October 26, 2010

Every day we receive various support tickets asking for help with existing Data Transfer configurations. Sometimes the answer to a question can be found within the DT FAQs in the Help Center (DFA | DFP | Rich Media) and other times it requires personal assistance. We regularly review both sources of information in an effort to improve your Data Transfer experience. With that in mind, we've identified a number of tips that will help improve your work with DT.

1) Download Daily
DT files are published daily* according to your account's timezone. Correspondingly, we suggest that you download your new DT files every day. While you can download files every other day, only on weekdays, or even once a week, downloading every day ensures that you see your data while it is fresh. It also means that you can identify issues while they are still recent. By default, DT files stay on the server for 30 days from the date they are posted - but don't wait until the files are about to expire to pick them up! Purged DT files cannot be recovered.
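As a rough illustration of a daily pickup, the sketch below works out which posted files you have not yet saved locally. The function name and the local directory are hypothetical; the list of remote names is assumed to come from whatever FTP client you use (for example, Python's ftplib offers FTP.nlst() for listing a directory).

```python
from pathlib import Path


def files_to_download(remote_names, local_dir):
    """Return the remote file names that are not yet present in local_dir."""
    # Only regular files count as "already downloaded".
    local_names = {p.name for p in Path(local_dir).iterdir() if p.is_file()}
    return sorted(name for name in remote_names if name not in local_names)
```

Run once a day, a check like this keeps the pending list short and makes a missed file obvious well before the 30-day window closes.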

2) Secure your login
If your files are used by different parties, it's best to limit access to your FTP account. For instance, say you have a reporting team, an analytics team and a third-party consulting team all consuming your DT files. Suppose that each is located in a different office or is part of a different company. Sharing your login means that you cannot control which group sees what data. If one of these parties should need its access restricted (filtered) or revoked, that is difficult to do with a shared login. Some DT (and Match Table) data can be quite sensitive. The best approach is to download all of your files locally and then distribute them as needed. While this requires more effort on your part, it's almost always well worth it. We get requests to build custom scripts to help clients distribute their DT files, but such scripts are "black box", limited in their business rules, and incur an additional fee.
If you do need to reset your FTP login, please contact your support team and they'll be happy to assist.

3) Save for a rainy day
We recommend that you save 60 days of DT files in their raw, un-aggregated format. Storage these days is relatively cheap, and keeping the raw files means that you can easily re-load them into your data store if needed. Once files have aged off the DoubleClick FTP server, they cannot be regenerated. DT files are special since, among other reasons, they contain your non-sampled User-ID ad events. We sometimes hear about clients who discover a clever (but legal!) new use for their event files - and frequently that new use requires granular values like User-ID. Moreover, depending upon your system, it's possible to query flat files just like querying a database. There are a number of open-source options to achieve this.
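To give a feel for querying flat files directly, here is a minimal sketch of a "group by and count" over gzipped files. It assumes each file starts with a header row of column names; the delimiter and the text encoding are parameters because they depend on your configuration, and the function name is hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_by(paths, column, delimiter, encoding="utf-8"):
    """Count rows per distinct value of `column` across gzipped files."""
    counts = Counter()
    for path in paths:
        # "rt" decompresses and decodes on the fly; newline="" is what csv expects.
        with gzip.open(path, "rt", encoding=encoding, newline="") as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                counts[row[column]] += 1
    return counts
```

Calling count_by(paths, "User-ID", delimiter) on a set of saved files would give the number of events per User-ID without loading anything into a database first.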

4) Identifying DT fields
DT file types (e.g., impression, click and activity) change formats very infrequently. In fact, if there is a change in format, it's likely because you have asked us to add or remove a field. In short, you can depend on both the file format (gzipped text) and the structure (column A, column B and so on). From time to time, however, you may want to change the order of your columns. To ensure that your ETL system is flexible, identify DT columns by column name rather than by column position. Identifying a field by name means that changes to the DT format require no further programming on your part. Indexing columns by their name is dependable; fields like Time, User-ID, Ad-ID, Site-ID, etc., have never changed. When we build custom reports based on DT, we always index on column name.
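A small sketch of name-based lookup follows. It assumes the first line of a file is a header listing the column names; the delimiter is passed in because it depends on your configuration, and both function names are hypothetical.

```python
def column_positions(header_line, delimiter):
    """Map each column name in the header line to its position."""
    names = header_line.rstrip("\r\n").split(delimiter)
    return {name: position for position, name in enumerate(names)}


def pick_fields(line, positions, wanted, delimiter):
    """Return the requested fields of one data line, looked up by column name."""
    values = line.rstrip("\r\n").split(delimiter)
    return {name: values[positions[name]] for name in wanted}
```

Because the positions are rebuilt from each file's header, a reordered or extended column list yields the same values for User-ID or Ad-ID with no code change.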

5) Use .done files
.done files are a somewhat under-used free feature of the Match Tables (MT) system. .done files are produced daily and provide you with information that can be very useful in ensuring smooth ETL processing. A .done file is produced once all dependencies have been met for that day. By default, the dependencies are your newly updated MT files. Thus, if a .done file is posted for a given day, it means that you can start downloading your MT files and begin identifying event dimensions (names). .done files also provide you with SHA checksums as well as a filename list and filesize list. Note: if you have network-level Activities, you'll also receive a .done file. This file, however, is different from the MT .done file.
To request .done files for your account, contact your account manager or support team.
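Once you have the checksum and filesize for a file, verifying a download could look something like the sketch below. It does not parse the .done file itself, since its layout is not described here; the expected values are assumed to have been read from it already. The SHA variant is a parameter (defaulting to SHA-1 purely as an assumption), and the function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_done_entry(path, expected_hex, expected_size, algorithm="sha1"):
    """Check a downloaded file against the size and checksum listed for it."""
    # The cheap size check runs first; the digest is only computed if it passes.
    if os.path.getsize(path) != expected_size:
        return False
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A file that fails either check is a good candidate for re-downloading before it goes anywhere near your ETL process.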

6) Use open source tools
Processing your DT and MT files doesn't have to involve expensive ETL resources. There are a number of open-source ETL tools (as well as open-source databases such as MySQL) which can do a fine job while helping you manage operating costs. One such tool is Crush-Tools. We designed Crush-Tools internally and released it to the open-source community a few years back. Crush-Tools is a Unix-based library of utilities specifically designed to consume, manipulate and operate on DoubleClick Data Transfer files. For instance, via pipes and redirects you can filter, aggregate, calculate and produce reports, all directly from DT flat files. We even use Crush-Tools extensively ourselves! Read more about Crush-Tools here.

We hope these tips will prove helpful to you.

--Matthew Trojanovich, the Data Transfer Service Team

* Some clients have asked to receive their DT files more frequently. Our product team is indeed working on this. Dates cannot be given yet, but stay tuned to this channel for more.