The collation utf8mb4_0900_ai_ci is a character set collation for MySQL databases, introduced in MySQL 8.0.1. It is based on the Unicode Collation Algorithm (UCA) 9.0.0, and the character set is utf8mb4, which supports a wide range of Unicode characters.
The “ai” in the collation name stands for “accent insensitive” and the “ci” stands for “case insensitive.” This means that comparisons between characters are done without considering differences in case or accents.
If you are receiving an “Unknown collation” error, it may be because your MySQL server version is older than 8.0.1 and doesn’t support this collation. To fix this issue, you can:
Problem
During the migration of a web application, I got the below error while restoring a database on another server. The collation id may differ based on the MySQL version.
Error message:
Error 1273 (HY000) at line 25 Unknown collation: ‘utf8mb4_0900_ai_ci’
See the error screenshot during database restoration.
Here you go with a solution.
Solution
After a little investigation, I found that the MySQL server running on the destination is an older version than the source. So we got that the destination server doesn’t contain the required database collation.
Then we do a little tweak in the backup file to resolve this. Edit the database backup file in text editor and replace “utf8mb4_0900_ai_ci” with “utf8mb4_general_ci” and “CHARSET=utf8mb4” with “CHARSET=utf8“.
- Replace the below string:
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
- With:
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
Here we are changing the CHARSET to utf8, that is the older version and have limitation, Read the implications at the end of this article before making the changes in database.
Save your file and restore the database.
The Linux system users can use the sed command to replace text in files directly.
sed -i 's/utf8mb4_0900_ai_ci/utf8_general_ci/g' backup.sql
sed -i 's/CHARSET=utf8mb4/CHARSET=utf8/g' backup.sql
That it. after the above changes, the database was successfully restored!
Hope this is solution helped you to resolve “Unknown collation: ‘utf8mb4_0900_ai_ci’” issue.
Limitations of UTF8 Character Set:
Changing the character set from utf8mb4 to utf8 in MySQL is not inherently bad, but it may have some implications that you should consider before making the change:
- Limited Unicode support: The utf8 character set in MySQL only supports a limited range of Unicode characters, specifically the Basic Multilingual Plane (BMP), which includes characters from the Unicode code points U+0000 to U+FFFF. In contrast, utf8mb4 supports the full range of Unicode characters, including supplementary characters (code points U+10000 to U+10FFFF), such as emojis and certain rare symbols or scripts. If you need to store these supplementary characters in your database, you should use utf8mb4.
- Data loss or corruption: If your existing data contains characters outside of the BMP, converting the character set from utf8mb4 to utf8 may result in data loss or corruption. These characters will be replaced with the Unicode replacement character (U+FFFD) during the conversion process.
- Index length limitations: The utf8 character set uses less storage space (up to 3 bytes per character) compared to utf8mb4 (up to 4 bytes per character). This may help you work around index length limitations, especially with older versions of MySQL. However, you should be aware of the trade-offs in terms of Unicode support.