Friday, March 1, 2019

[ Python Article Collection ] Base64 Encoding and Decoding Using Python

Source From Here 
Preface 
Say you have a binary image file you want to transfer across a network. You’re surprised to find that the file wasn’t received properly on the other side: it just contained strange characters! Well, it seems that you attempted to send your file in its raw bits-and-bytes format, while the medium used was designed for streaming text. 

What would be the workaround to avoid such an issue? The answer is Base64 encoding. In this article, I will show you how we can use Python to encode and decode a binary image. The program is illustrated as a standalone local program, but you can apply the concept to different applications like sending your encoded image from your mobile device to a server, and many other applications. 

What Is Base64? 
Before going deeper into the article, let’s define what we mean by Base64.

Base64 is a way in which 8-bit binary data is encoded into a format that can be represented in 7 bits. This is done using only the characters A-Z, a-z, 0-9, +, and / to represent data, with = used to pad data. For instance, using this encoding, three 8-bit bytes (24 bits) are converted into four 6-bit values, each of which is written as one 7-bit ASCII character. The term Base64 is taken from the Multipurpose Internet Mail Extensions (MIME) standard, which is widely used for HTTP and XML, and was originally developed for encoding email attachments for transmission. 
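
As a quick illustration of the three-bytes-to-four-characters rule, here is a minimal sketch using Python’s standard base64 module. The sample input b"Man" is just an illustrative choice, not something from the original article.

```python
import base64

raw = b"Man"                     # three 8-bit bytes: 0x4D 0x61 0x6E
encoded = base64.b64encode(raw)  # four ASCII characters

print(encoded)                   # b'TWFu'
print(len(raw), len(encoded))    # 3 4
```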

Why Do We Use Base64? 
Base64 matters because it lets binary data be represented in a form that looks and behaves like plain text, which makes it more reliable to store in databases, send in emails, or embed in text-based formats such as XML. In short, Base64 is used for representing data in an ASCII string format. 

As mentioned in the introduction of this article, without Base64 sometimes data will not be readable at all. 

Base64 Encoding 
Base64 encoding is the process of converting binary data into a limited character set of 64 characters. As shown in the first section, those characters are A-Z, a-z, 0-9, +, and / (count them: 26 + 26 + 10 + 2 = 64). This is the most common Base64 character set, and is referred to as MIME’s Base64. It uses A-Z, a-z, and 0-9 for the first 62 values, and + and / for the last two values. 
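
The alphabet described above can be written out in a few lines. This is only a sketch for illustration; the name BASE64_ALPHABET is my own, and the real base64 module does not require you to build this table yourself.

```python
import string

# Standard (MIME) Base64 alphabet:
# index 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> '+', 63 -> '/'
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))  # 64
print(BASE64_ALPHABET[4])    # E
print(BASE64_ALPHABET[62:])  # +/
```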


The Base64 encoded data ends up being longer than the original data: as mentioned above, every 3 bytes of binary data become 4 bytes of Base64 encoded data, which is roughly a 33% increase before any padding or line breaks. This is because each encoded character carries only 6 bits of the original data instead of 8. 
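
To make the size increase concrete, the sketch below compares a simple length formula with what base64.b64encode actually returns. The helper name encoded_length is hypothetical, and the formula assumes padded output without line breaks.

```python
import base64
import math

def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (no line breaks)."""
    return 4 * math.ceil(n_bytes / 3)

for n in (1, 2, 3, 4, 100):
    actual = len(base64.b64encode(bytes(n)))  # bytes(n) is n zero bytes
    print(n, encoded_length(n), actual)
```

Each row prints the input size followed by the predicted and the actual encoded size, which should agree.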

Have you ever seen part of a raw email file like the one shown below (which most likely originates from an email not being delivered)? If so, then you have seen Base64 encoding in action! (A trailing “=” is a good hint that you are looking at Base64, since the equals sign is used in the encoding process for padding.): 
    Content-Type: text/plain; charset=UTF-8
    Content-Transfer-Encoding: base64

    2KfZhNiz2YTYp9mFINi52YTZitmD2YUg2YjYsdit2YXYqSDYp9mE2YTZhyDZiNio2LHZg9in2KrZ
    h9iMDQoNCtij2YjYryDZgdmC2Lcg2KfZhNin2LPYqtmB2LPYp9ixINi52YYg2KfZhNmF2YLYsdix
    2KfYqiDYp9mE2K/Ysdin2LPZitipINin2YTYqtmKINiq2YbYtdit2YjZhiDYqNmH2Kcg2YTZhdmG
    INmK2LHZitivINin2YTYqtmI2LPYuSDZgdmKDQrYt9mE2Kgg2KfZhNi52YTZhSDYp9mE2LTYsdi5
    2YrYjCDYudmE2YXYpyDYqNij2YbZiiDYutmK2LEg2YXYqtiu2LXYtSDYqNin2YTYudmE2YUg2KfZ
    hNi02LHYudmKINmI2KPZgdiq2YLYryDZhNmE2YXZhtmH2Kwg2KfZhNi52YTZhdmKDQrZhNiw2YTZ
    gy4NCg0K2KzYstin2YPZhSDYp9mE2YTZhyDYrtmK2LHYpyDYudmE2Ykg2YbYtdit2YPZhSDZgdmK
    INmH2LDYpyDYp9mE2LTYo9mGLg0KDQrYudio2K/Yp9mE2LHYrdmF2YYNCg==
    --089e0141aa264e929a0514593016
    Content-Type: text/html; charset=UTF-8
    Content-Transfer-Encoding: base64

Base64 encoding is carried out in multiple steps, as follows: 
* The text to be encoded is converted into its respective decimal values, that is, into its ASCII equivalents (e.g. a: 97, b: 98, etc.), as listed in any ASCII table.
* The decimal values obtained in the above step are converted into their 8-bit binary equivalents (e.g. 97: 01100001).
* All the binary equivalents are concatenated, obtaining a large set of binary numbers.
* The large set of binary numbers is divided into equal sections, with each section containing only 6 bits.
* The equal sets of 6 bits are converted into their decimal equivalents.
* Finally, the decimal equivalents are converted into their Base64 characters by looking them up in the Base64 alphabet (e.g. 4: E).
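
The steps above can be sketched in plain Python. This is a teaching sketch, not how the base64 module is implemented; the names ALPHABET and manual_b64encode are my own, and the zero-bit and “=” padding handling is added so that inputs whose length is not a multiple of 3 also work.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def manual_b64encode(data: bytes) -> str:
    # Steps 1-3: each byte -> 8-bit binary string, all concatenated.
    bits = "".join(format(byte, "08b") for byte in data)
    # Step 4: pad with zero bits to a multiple of 6, then split into 6-bit groups.
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Steps 5-6: each group -> decimal index -> Base64 character.
    encoded = "".join(ALPHABET[int(group, 2)] for group in groups)
    # Pad with '=' so the output length is a multiple of 4.
    return encoded + "=" * (-len(encoded) % 4)

sample = b"abc"
print(manual_b64encode(sample))                                      # YWJj
print(manual_b64encode(sample) == base64.b64encode(sample).decode())  # True
```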

Base64 Decoding 
Base64 decoding is the opposite of Base64 encoding. In other words, it is carried out by reversing the steps described in the previous section. So, the steps of Base64 decoding can be described as follows: 
* Each character in the string is changed to its Base64 decimal value.
* The decimal values obtained are converted into their binary equivalents.
* The first two bits of each 8-bit binary number are dropped (they are always 00, since every value is below 64), and the remaining sets of 6 bits are combined, forming one large string of binary digits.
* The large string of binary digits obtained in the previous step is split into groups of 8 bits.
* The 8-bit binary numbers are converted into their decimal equivalents.
* Finally, the decimal values obtained are converted into their ASCII equivalent.
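
The decoding steps can be sketched the same way. Again this is only an illustration under assumptions: the names ALPHABET and manual_b64decode are my own, and the input is assumed to be well-formed Base64 without whitespace or line breaks. Formatting each index directly as 6 bits has the same effect as writing it in 8 bits and dropping the first two.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def manual_b64decode(text: str) -> bytes:
    # Drop the '=' padding; it carries no data.
    text = text.rstrip("=")
    # Steps 1-3: character -> Base64 index -> 6-bit binary, all concatenated.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in text)
    # Steps 4-6: keep only complete 8-bit groups and turn each into a byte.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(manual_b64decode("YWJj"))  # b'abc'
print(manual_b64decode("YQ=="))  # b'a'
```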

Encoding an Image 
Let’s now get to the meat of this article. In this section, I’m going to show you how we can easily Base64 encode an image using Python. I will be using the following binary image. Go ahead, download it and let’s get Python rolling! (I’m assuming that the name of the image is avenger.jpg.)


The first thing we have to do in order to use Base64 in Python is to import the base64 module (hashlib is imported as well, so that we can checksum the data later): 
    import base64   # Base64 encoding/decoding
    import hashlib  # MD5 checksum calculation
In order to encode the image, we simply use the function base64.b64encode(bytes). The Python documentation says the following about this function: 
Encode the bytes-like object s using Base64 and return the encoded bytes.

Thus, we can do the following in order to Base64 encode our image: 
    # Read the image as raw bytes ('rb' = read, binary mode)
    with open('avenger.jpg', 'rb') as fh:
        image_read = fh.read()

    encoded = base64.b64encode(image_read)
    print(type(encoded).__name__)  # b64encode returns bytes, not str
    # Checksum of the original bytes, to compare after decoding
    md5 = hashlib.md5()
    md5.update(image_read)
    print("avenger.jpg with MD5={}".format(md5.hexdigest()))
Output: 
bytes
avenger.jpg with MD5=2adc08bf723e21e9d92c084f101927eb
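
Note that the result is a bytes object. If the encoded image has to travel inside a text format such as JSON (for instance when posting it to a server, as mentioned in the preface), one option is to decode those bytes to an ASCII str first. Below is a minimal sketch under assumptions: the message layout and the field names filename and data are made up for illustration, and four JPEG header bytes stand in for the real file contents.

```python
import base64
import json

# Stand-in for the image bytes; a real program would read avenger.jpg instead.
image_read = b"\xff\xd8\xff\xe0"

# b64encode returns bytes; json.dumps needs str, so decode as ASCII.
payload = base64.b64encode(image_read).decode("ascii")
message = json.dumps({"filename": "avenger.jpg", "data": payload})

# The receiving side reverses the two steps.
received = json.loads(message)
restored = base64.b64decode(received["data"])
print(restored == image_read)  # True
```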

Decoding an Image 
To decode an image using Python, we simply use the base64.b64decode(bytes) function. The Python documentation says the following about this function: 
Decode the Base64 encoded bytes-like object or ASCII string s and return the decoded bytes.

So, in order to decode the image we encoded in the previous section, we do the following: 
    decoded = base64.b64decode(encoded)
    md5 = hashlib.md5()
    md5.update(decoded)
    print("decoded with MD5={}".format(md5.hexdigest()))
Output: 
decoded with MD5=2adc08bf723e21e9d92c084f101927eb


Putting It All Together 
Let’s put the pieces together in a program that Base64 encodes and decodes a byte string. An interactive Python session that does that should look something like the following: 
    In [1]: data = b'data to be encoded'

    In [2]: import base64, hashlib

    In [3]: encoded = base64.b64encode(data)

    In [4]: decoded = base64.b64decode(encoded)

    In [5]: decoded == data
    Out[5]: True

    In [6]: decoded
    Out[6]: b'data to be encoded'
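
The same round trip can also go through files, which is closer to the image scenario from the earlier sections. The following is a sketch under assumptions rather than a definitive program: it uses generated bytes as a stand-in for avenger.jpg so that it runs anywhere, and the helper md5_hex and the file names image.b64 and image_copy.bin are hypothetical.

```python
import base64
import hashlib
import os
import tempfile

def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of data as a hex string."""
    return hashlib.md5(data).hexdigest()

# Stand-in for real image bytes, so the sketch runs without avenger.jpg.
image_bytes = bytes(range(256)) * 4

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "image.b64")       # hypothetical file names
    copy_path = os.path.join(tmp, "image_copy.bin")

    # Encode the bytes and store the result as ASCII text.
    with open(text_path, "wb") as fh:
        fh.write(base64.b64encode(image_bytes))

    # Read the text back, decode it, and write the binary copy.
    with open(text_path, "rb") as fh:
        restored = base64.b64decode(fh.read())
    with open(copy_path, "wb") as fh:
        fh.write(restored)

    # Matching checksums mean the copy is identical to the original.
    with open(copy_path, "rb") as fh:
        round_trip_ok = md5_hex(fh.read()) == md5_hex(image_bytes)

print(round_trip_ok)  # True
```

To try it on the real image, replace the stand-in bytes with the contents of avenger.jpg read in 'rb' mode.
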
Supplement 
Python MD5 and SHA hash calculation: tutorial and examples
