What is a character set and what is a collation?

A character set is a set of symbols and encodings, whereas a collation is a set of rules for comparing the characters in a character set. For example:

Suppose that we have an alphabet with four letters: "A", "B", "a", "b". We give each letter a number: "A" = 0, "B" = 1, "a" = 2, "b" = 3. The letter "A" is a symbol, the number 0 is the encoding for "A", and the combination of all four letters and their encodings is a character set.

Suppose that we want to compare two string values, "A" and "B". The simplest way to do this is to look at the encodings: 0 for "A" and 1 for "B". Because 0 is less than 1, we say "A" is less than "B". What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): "compare the encodings." We call this simplest of all possible collations a binary collation.
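
The four-letter example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a database server implements collations; the names CHARSET, encode and binary_compare are made up for this sketch.

```python
# Toy character set from the text: each symbol maps to a numeric encoding.
# (Illustrative names only; nothing here is a MySQL API.)
CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

def encode(s):
    """Return the list of encodings for the characters in s."""
    return [CHARSET[ch] for ch in s]

def binary_compare(x, y):
    """Binary collation: compare two strings by their encodings only.

    Returns -1 if x sorts before y, 0 if they are equal, 1 if x sorts after y.
    """
    ex, ey = encode(x), encode(y)
    return (ex > ey) - (ex < ey)

print(binary_compare("A", "B"))                  # -1, because 0 < 1
print(binary_compare("a", "B"))                  # 1, because 2 > 1
print(sorted(["b", "A", "a", "B"], key=encode))  # ['A', 'B', 'a', 'b']
```

Note that under this binary collation "a" sorts after "B", because only the encodings are compared. A case-insensitive collation would need an extra rule that treats "A" and "a" as equal.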

How do you specify the character set and collation per database?

To create a database whose tables use a given default character set and collation for data storage, use a statement like the following:

CREATE DATABASE mydb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;