Opening a CSV in Excel—without corruption
Comma Separated Values (AKA comma-delimited ASCII) are one of the most common formats used to exchange information. This is because they're compatible with applications on just about any operating system—going back to DOS days. On many computers, a .CSV file will be automatically associated with Microsoft Excel, and when you double-click on the file it opens using the automatic settings.
This is great—unless you have certain types of data such as East coast Zip codes.
Here's an example .CSV file: three fields listing a name, date in EU format, and Zip code.
Name,Date in EU format,Zip
Jane Doe,1/5/07,01252
John Smith,19/7/07,98107
If you double-click on this file, Excel opens it as:
What Excel has done is classify all the columns as "General" format. The names are fine, but the May 1st date has been changed to January 5th. While it may look like the date is still salvageable as the EU format, Excel codes dates with an underlying 5 digit value and this cell is now the January 5, 2007 value. On that same row Jane's Zip code has lost the leading zero, which you can see in the formula bar as completely gone, not simply hidden from view. On John's record the July 19 date is being treated like text (note the left alignment) because Excel doesn't recognize months greater than 12.
If at this point you close the file without saving, it will still be OK, but the minute you save, the values will be permanently changed.
Dates in formats other than M/D/Y and pure number IDs such as Zips and some customer or employee IDs are the most vulnerable. If you don't have those in your data file, you'll generally be OK with a double-click to open.
When you do need to be careful of the formats, you want to go through Excel's import dialog. You can get to this dialog by importing, or by opening a file with an extension Excel doesn't recognize (.txt will sometimes work).
If importing:
- Open a fresh worksheet and click in A1
- Data menu
- Import External Data
- Import Data
- Select your file
Now the wizard, which is identical for both importing and opening an unrecognized file:
- On Step 1 of 3 defaults are OK
- On Step 2 of 3 change the Delimiter to Comma
- On Step 3 of 3 select all columns (click on first, Shift+click on last)
- Set their format to Text then Finish
In Excel, what Text essentially means is "Don't change anything." While I could go through the individual columns during Import and set the proper date, etc., the all Text approach is faster. Once the data is open, I can always select a column and use the Format menu, Cells to adjust the Category if needed, but as long as it's set to Text Excel will leave the data alone.
Need a Hand?
A little help can add a lot of polish—or just save hours and headaches:
The course was very well received. Ann in one word is phenomenal. Please thank her again for all her hard work and of course patience. Amazing woman.
Marian Slobodian
Statistics Canada