What is CSV encoding and what should I use?

We're going to get into a more technical topic now, but one that is absolutely necessary to avoid errors with your CSV! Don't worry, let's break it down into simple terms to understand: what is encoding, why does it matter, and what should you (most likely) be using in your CSV?
So...what the heck is encoding?

Put simply, it is a method of translating. When you type on your device, the characters you select (numbers, letters, punctuation, etc.) are stored in computer lingo. This lingo is made up of "bytes", which are just numeric values (yes, those "01101000 01100001" binary codes that make you feel like you're in the Matrix). When you save and close your file, the computer translates what looks to you like normal text into its own byte-language and stores it for you. Thanks, computer!

Now, the next time you (or someone else) open up your saved file, the computer has to translate it back for you, this time from its byte-language into your language. You know, so you can actually read and understand what you had typed...
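
If you'd like to see that round trip for yourself, here is a minimal Python sketch. The sample text "ha" is just an illustration we picked (it happens to be what those Matrix-style binary codes above spell out), and UTF-8 is assumed as the key:

```python
# Encode: translate text into bytes, using UTF-8 as the key.
text = "ha"
data = text.encode("utf-8")

# Show each byte as the 8-digit binary code mentioned above.
bits = " ".join(format(byte, "08b") for byte in data)
print(bits)  # 01101000 01100001

# Decode: translate the bytes back into text with the same key.
restored = data.decode("utf-8")
print(restored == text)  # True
```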

All of this is called character encoding. And you may have guessed, it requires a key in order to work. Just like a key on a map or a cipher, or other types of puzzle challenges - you must have a key in order to translate the coded characters into text that can be read and understood.

Have you ever played one of these cipher puzzles? Can you imagine trying to figure it out without the key located at the top?!
It is the same with any text you type. Your computer requires a key in order to translate the text from your language into bytes, and then back again the next time you want to open and read your file.

So hopefully, you're feeling confident about what encoding is and why it's necessary. Which leads us to:

What encoding key should you use?

Let's not beat around the bush. It is recommended to always use UTF-8 encoding in your CSV files.

Why?
UTF-8 can represent 1,112,064 different characters (every valid Unicode code point), which encompasses just about any character you would type, in any language. This means you could essentially type any text, numbers, punctuation, emojis - anything - and using UTF-8 encoding, the computer will have the key it needs to translate this information into bytes, and back again.
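
For the curious, here is a rough Python sketch of what that looks like in practice. The four sample characters are our own picks, written as escape codes so the snippet runs anywhere. It shows that UTF-8 spends between one and four bytes per character, and where the 1,112,064 figure comes from:

```python
# Sample characters (illustrative picks): A, e-acute, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths.append(len(encoded))
    # Print the code point, how many bytes UTF-8 needs, and the bytes in hex.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# Unicode has 0x110000 code points in total; 0x800 of them are reserved
# "surrogates" that can never be encoded on their own.
total = 0x110000 - 0x800
print(total)  # 1112064
```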

There are other encoding options, but most of the older ones (such as ASCII or Windows-1252) are far more limited than UTF-8. So, as a general rule of thumb, select UTF-8 and you're golden. If you're using a different encoding key, you may run into error messages, or simply end up with a file that looks like this when you open it:

See those question marks? That's the computer's confusion you're looking at. Poor thing. It didn't have the proper key to translate the text you typed, so now it spits out utter nonsense back to you.
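
You can reproduce this kind of confusion yourself. The sketch below is only an illustration: the sample text is made up, and Windows-1252 and ASCII stand in for "the wrong key" (your own file might have been mangled by a different one):

```python
original = "Café, 10 €"

# Store the text using UTF-8 as the key.
stored = original.encode("utf-8")

# Right key: the text comes back intact.
print(stored.decode("utf-8"))   # Café, 10 €

# Wrong key: the same bytes read as Windows-1252 turn into gibberish.
garbled = stored.decode("cp1252")
print(garbled)                  # CafÃ©, 10 â‚¬

# Too-small key: ASCII has no é or €, so each one is swapped for "?".
lossy = original.encode("ascii", errors="replace")
print(lossy)                    # b'Caf?, 10 ?'
```
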
Many programs default to UTF-8 for you, so you don't have to do all that pesky thinking and choosing. Google Sheets, for example, uses UTF-8 as its default. However, if you are using Excel, you may need to check that you're using UTF-8 encoding. Here's how:

  1. In your open file, select "Save As" and choose ".CSV" as the file type
  2. Name your file
  3. At the bottom of this dialog box, select "Tools" and then "Web options"
  4. Head to the "Encoding" tab and in the "Save this document as:" dropdown, choose Unicode (UTF-8).
  5. Save!
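(Obviously the steps can vary a bit depending on which version of Excel you are using. Newer versions, for example, list "CSV UTF-8 (Comma delimited)" right in the file type dropdown. The principle remains, though - don't forget to specify your character encoding!)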
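
And if your CSV is created by a script rather than a spreadsheet program, the same principle applies: name the key explicitly. Here is a minimal Python sketch using the standard csv module. The file name and the sample rows are made up for illustration:

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-English characters.
rows = [["name", "city"], ["Zoë", "München"], ["李雷", "北京"]]

# Hypothetical file name, placed in the system's temp folder.
path = os.path.join(tempfile.gettempdir(), "example_utf8.csv")

# Write the CSV, explicitly choosing UTF-8 as the key.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back with the same key to confirm nothing was lost.
with open(path, "r", encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```

One optional tweak: swapping "utf-8" for "utf-8-sig" when writing adds a small marker at the start of the file, which Excel generally uses as a hint that the file is UTF-8.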


The idea is simple: you are the creator of the file, so you choose the key that the computer will use to translate your text into its computer-lingo. Your reward for this task is a file that can be opened and read again in your language, without any translation errors!