Tuesday, April 22, 2014

[ Java Article Collection ] Write Binary File in Java to be read by a C Program

Source: here 
Preface: 
Consider the following C code, which reads binary data from a file and interprets it as the struct "mystruct": 
  #include <stdio.h>
  #include <stdlib.h>

  struct mystruct
  {
      unsigned char x;
      unsigned short y;
      unsigned int z;
  };

  int main(void)
  {
      FILE* myfile;
      struct mystruct ms;

      myfile = fopen("data.bin", "rb");
      if(!myfile)
      {
          printf("Unable to open the binary!\n");
          return EXIT_FAILURE;
      }

      /* Read one struct-sized record straight into ms */
      fread(&ms, sizeof(struct mystruct), 1, myfile);
      /* sizeof yields a size_t, so the sizes are printed with %zu */
      printf("mystruct size=%zu\n", sizeof(ms));
      printf("unsigned byte=%x (%zu)\n", ms.x, sizeof(ms.x));
      printf("unsigned short=%x (%zu)\n", ms.y, sizeof(ms.y));
      printf("unsigned int=%u (%zu)\n", ms.z, sizeof(ms.z));

      fclose(myfile);
      return EXIT_SUCCESS;
  }
Is there a way to generate the binary data for the struct "mystruct" from Java code? The sample code below shows how. 

Example: 
First, going by the C data types, unsigned char takes 1 byte, unsigned short takes 2 bytes, and unsigned int takes 2 to 4 bytes (4 bytes on my machine). A first guess is therefore that mystruct occupies 1+2+4=7 bytes. Now let's see how Java code can produce a binary file for mystruct: 
- NativeDataOutputStream.java 
  import java.io.BufferedOutputStream;
  import java.io.FileOutputStream;
  import java.io.FilterOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;

  // Writes multi-byte values in the platform's native byte order
  public class NativeDataOutputStream extends FilterOutputStream {
      public NativeDataOutputStream(OutputStream out) {
          super(out);
      }

      public void writeShort(short value) throws IOException {
          ByteBuffer buffer = ByteBuffer.allocate(2).order(ByteOrder.nativeOrder());
          buffer.putShort(value);
          out.write(buffer.array());
      }

      public void writeInt(int value) throws IOException {
          ByteBuffer buffer = ByteBuffer.allocate(4).order(ByteOrder.nativeOrder());
          buffer.putInt(value);
          out.write(buffer.array());
      }

      public void writeLong(long value) throws IOException {
          ByteBuffer buffer = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
          buffer.putLong(value);
          out.write(buffer.array());
      }

      public static void main(String[] args) throws IOException {
          // Java has no unsigned types, so each value is held in the next wider type.
          // All three fields are set to 1, as in the program output shown below.
          short unsignedByte = 1;
          int unsignedShort = 1;
          long unsignedInt = 1;

          NativeDataOutputStream out = new NativeDataOutputStream(new BufferedOutputStream(new FileOutputStream("data.bin", false)));

          try {
              out.write     ((byte)  unsignedByte );
              out.writeShort((short) unsignedShort);
              out.writeInt  ((int)   unsignedInt  );
              out.flush();
          } finally {
              out.close();
          }
      }
  }
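The code above uses a ByteBuffer to hold each value's byte array, and ByteOrder to select the system's endianness (big-endian or little-endian, that is, whether the most significant or the least significant byte comes first in the array). After running it, checking the generated "data.bin" confirms that it contains the expected 7 bytes. 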
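
To make the effect of ByteOrder concrete, here is a minimal sketch that encodes the same short value under both byte orders. The class name ByteOrderDemo and the helper encodeShort are made up for this illustration; only ByteBuffer and ByteOrder come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, not part of the original article
class ByteOrderDemo {
    // Encode a short into 2 bytes using the given byte order
    static byte[] encodeShort(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    public static void main(String[] args) {
        // What the JVM reports for the machine it is running on
        System.out.println("native order:  " + ByteOrder.nativeOrder());
        // Most significant byte first: [0, 1]
        System.out.println("big-endian:    "
                + Arrays.toString(encodeShort((short) 1, ByteOrder.BIG_ENDIAN)));
        // Least significant byte first: [1, 0]
        System.out.println("little-endian: "
                + Arrays.toString(encodeShort((short) 1, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Using ByteOrder.nativeOrder() only makes sense when the Java writer and the C reader run on machines with the same byte order; otherwise the order has to be fixed explicitly.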
 

Next, use the compiled C program a.out to read data.bin. This is where the problem appears: 
> ./a.out
mystruct size=8
unsigned byte=1 (1)
unsigned short=100 (2)
unsigned int=0 (4)
 # Where did the 1 go?!

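From unsigned short=100 (hex) one might suspect a byte-order mismatch, but the machine is little-endian (the dump produced by the C program further down stores the short 1 as 01 00). The real problem, which also explains why unsigned int is 0, is that the compiler automatically pads a struct for alignment purposes: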
You're assuming that sizeof(struct) is the sum of the sizes of the members (7 bytes) and they follow each other one after the other with no gaps; this is actually unlikely. Usually structs are padded to 4 or even 8-byte boundaries, depending on the compiler. I think most compilers these days would pack that struct into 8 bytes: one for the char, an unused byte, two for the short, and four for the int. Therefore when you read the data, it's aligned incorrectly, so you get weird numbers; and likewise with the Java code (which doesn't confirm that it reads the number of bytes it expects, either, and gets one more byte than the 7 it wants.)

So the sad truth is that the layout of the C struct is compiler dependent, and you'll have to determine what your compiler does in terms of padding, and then match it with your Java program. Once you do that, you should be fine.
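
The padding rule described in the quote can be sketched as a small calculation. This is only an approximation under a stated assumption: every member is aligned to a multiple of its own size and the total size is rounded up to the largest member alignment, which is what many common compilers do for these primitive types but is not guaranteed by the C standard. The class name StructLayoutSketch and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper that estimates a C struct layout
class StructLayoutSketch {
    // Round offset up to the next multiple of alignment
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // Returns each member's offset, followed by the total struct size
    static int[] layout(int[] memberSizes) {
        int[] result = new int[memberSizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int i = 0; i < memberSizes.length; i++) {
            int align = memberSizes[i];      // assumption: alignment == size
            offset = alignUp(offset, align); // insert padding if needed
            result[i] = offset;
            offset += memberSizes[i];
            maxAlign = Math.max(maxAlign, align);
        }
        // Trailing padding so that arrays of the struct stay aligned
        result[memberSizes.length] = alignUp(offset, maxAlign);
        return result;
    }

    public static void main(String[] args) {
        // unsigned char (1), unsigned short (2), unsigned int (4)
        System.out.println(Arrays.toString(layout(new int[] {1, 2, 4})));
        // prints [0, 2, 4, 8]: x at 0, y at 2, z at 4, sizeof = 8
    }
}
```

For mystruct this gives one padding byte at offset 1 and a total of 8 bytes, which agrees with the "mystruct size=8" line printed by the C program. The authoritative answer still comes from the compiler itself, for example via sizeof and offsetof.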

So I used the following C code to write out the binary data of mystruct: 
  #include <stdio.h>

  struct mystruct{
      unsigned char x;
      unsigned short y;
      unsigned int z;
  };

  int main()
  {
      FILE *pFile;
      pFile = fopen("data.bin", "wb");
      if(!pFile)
      {
          printf("Fail to open file!\n");
          return -1;
      }

      /* Write the struct exactly as it is laid out in memory, padding included */
      struct mystruct s = {1,1,1};
      fwrite(&s, sizeof(s), 1, pFile);
      fclose(pFile);
      return 0;
  }
The data.bin it produces has the following hex dump: 
01 BC 01 00 01 00 00 00
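
As a cross-check, the dump can be decoded in Java by reading each field at its padded offset. This is a sketch under assumptions: the offsets 0, 2 and 4 and the little-endian order are read off the dump above, and the class name DumpDecoder is made up. The second call feeds in the 7 unpadded bytes that the first Java version would have written for the values 1, 1, 1, with an assumed zero in the missing eighth position, to show where the earlier "unsigned short=100" comes from.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical decoder mirroring how the C reader interprets the 8 bytes
class DumpDecoder {
    // Returns {x, y, z} as non-negative values
    static long[] decode(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        long x = buf.get(0) & 0xFF;           // unsigned char at offset 0
        long y = buf.getShort(2) & 0xFFFF;    // unsigned short at offset 2
        long z = buf.getInt(4) & 0xFFFFFFFFL; // unsigned int at offset 4
        return new long[] {x, y, z};
    }

    public static void main(String[] args) {
        // The padded record written by the C program
        byte[] padded = {0x01, (byte) 0xBC, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00};
        long[] ok = decode(padded);
        System.out.printf("padded:   x=%x y=%x z=%d%n", ok[0], ok[1], ok[2]);

        // The 7 unpadded bytes plus an assumed trailing zero
        byte[] unpadded = {0x01, 0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00};
        long[] bad = decode(unpadded);
        System.out.printf("unpadded: x=%x y=%x z=%d%n", bad[0], bad[1], bad[2]);
    }
}
```

The byte at offset 1 (BC) never takes part in the decoding, which is why its value does not matter.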

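This shows that the byte x must be followed by one padding byte before y. The dump shows BC, but a padding byte simply holds whatever happened to be in memory, and the C reader ignores it. With this finding we can add a writeByte method to the Java code and have main call it for the first field. The padding always comes after the value, regardless of byte order, because x sits at offset 0 of the struct: 
  public void writeByte(byte value) throws IOException
  {
      out.write(value);                      // x at offset 0
      out.write(HexByteKit.Hex2Byte("BC"));  // padding byte at offset 1
  }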
The data.bin generated this time is 8 bytes long, matching sizeof(struct mystruct). 
 

Reading data.bin from C now gives the correct result: 
> a.out
mystruct size=8
unsigned byte=1 (1)
unsigned short=1 (2)
unsigned int=1 (4)
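
An alternative to writing the padding by hand is to build the whole 8-byte record in one ByteBuffer and place every field at its offset. The following is a sketch, not the article's method: it assumes the layout found above (offsets 0, 2, 4 and a total size of 8), and the class name MyStructWriter and the method encode are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical writer that emits one mystruct record
class MyStructWriter {
    static final int STRUCT_SIZE = 8; // assumption: sizeof(struct mystruct) == 8

    // Builds the record; bytes never written (the padding) stay zero
    static byte[] encode(int x, int y, long z, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(STRUCT_SIZE).order(order);
        buf.put(0, (byte) x);       // unsigned char  at offset 0
        buf.putShort(2, (short) y); // unsigned short at offset 2
        buf.putInt(4, (int) z);     // unsigned int   at offset 4
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        // Writer and reader are assumed to run on the same machine
        byte[] record = encode(1, 1, 1L, ByteOrder.nativeOrder());
        Files.write(Paths.get("data.bin"), record);
    }
}
```

Keeping the offsets in one place makes it easier to adjust the writer if a different compiler or platform lays the struct out differently.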

Supplement: 
flib - HexByteKit 
Java Primitive Data Type
