Monday, December 12, 2016

[Linux FAQ] How can I convert multiple files to UTF-8 encoding

Source From Here 
Question 
I have a bunch of text files that I'd like to convert from any given charset to UTF-8 encoding. Are there any command line tools or Perl (or language of your choice) one liners I can use to do this en masse? 

How-To 
iconv converts between many character encodings. With a little bash around it, we can write: 
- cov.sh 
  #!/bin/bash
  # Convert every .txt file in the current directory from ASCII to UTF-16.
  for file in *.txt; do
      iconv -f ascii -t utf-16 "$file" -o "${file%.txt}.utf16.txt"
  done
Then run it like this: 
# file test.txt
test.txt: ASCII text, with CRLF line terminators

// Write all supported fromcode and tocode values to standard output in an unspecified format.
# iconv -l

// Convert every file with the .txt extension from ASCII to UTF-16
// -f fromcodeset: Identify the codeset of the input file.
// -t tocodeset: Identify the codeset to be used for the output file.

# ./cov.sh
# file test.utf16.txt
test.utf16.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

This runs iconv -f ascii -t utf-16 on every file ending in .txt and writes the recoded output to a file with the same base name, ending in .utf16.txt instead of .txt. 
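The loop above hardcodes ASCII as the source and UTF-16 as the target, while the question asks for a conversion to UTF-8 from an arbitrary charset. A minimal sketch of that variant follows. The helper name to_utf8 and its argument order are assumptions made for illustration, not part of the original script, and the source charset still has to be known in advance because iconv does not detect it.

```shell
#!/bin/bash
# to_utf8 is a hypothetical helper, not part of the original cov.sh.
# Usage: to_utf8 FROM_CHARSET INPUT_FILE OUTPUT_FILE
to_utf8() {
    local from="$1" src="$2" dst="$3"
    local tmp="$dst.tmp.$$"
    # Convert into a temporary file first, so a failed conversion
    # never leaves a truncated output file behind.
    if iconv -f "$from" -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        echo "skipped $src: could not convert from $from" >&2
        return 1
    fi
}
```

It can be dropped into the same kind of loop, for example for Latin-1 input: for file in *.txt; do to_utf8 ISO-8859-1 "$file" "${file%.txt}.utf8.txt"; done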

Converting from ASCII to UTF-8 would not actually change your files, because ASCII is a subset of UTF-8. The example targets UTF-16 instead, to show how to convert between encodings in general. 
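The subset relationship can be checked directly: recoding a pure-ASCII file as UTF-8 should give back exactly the same bytes. The sketch below assumes cmp is available. The function name ascii_unchanged_as_utf8 is hypothetical and used only for illustration.

```shell
#!/bin/bash
# ascii_unchanged_as_utf8 is a hypothetical helper for illustration.
# It succeeds when converting FILE from ASCII to UTF-8 yields
# byte-for-byte the same content as FILE itself.
ascii_unchanged_as_utf8() {
    # cmp -s compares silently; "-" makes it read iconv's output from stdin.
    iconv -f ASCII -t UTF-8 "$1" | cmp -s - "$1"
}
```

Running ascii_unchanged_as_utf8 test.txt on the ASCII file from the session above would be expected to succeed silently. A file containing non-ASCII bytes makes the check fail, since iconv stops at the first byte that is not valid ASCII.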


Supplement 
Converting text file encodings with the iconv command (Big5 to UTF-8, UTF-8 to Big5)
