How to encode a text string into a number in Python?

Issue

Let’s say you have a string:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"

I am looking for a way to convert that string into a number, for example:

encoded_string = number_encode(mystring)

print(encoded_string)

08713091353153848093820430298

...that can then be converted back to the original string:

decoded_string = number_decode(encoded_string)

print(decoded_string)

"Welcome to the InterStar cafe, serving you since 2412!"

It doesn’t have to be cryptographically secure, but it does have to produce the same number for the same string, regardless of which computer it runs on.

Solution

Encode the string to bytes using a fixed encoding (such as UTF-8), then convert the bytes to an int with int.from_bytes. The reverse operation is to call .to_bytes on the resulting int, then decode the bytes back to a str:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8')  # Fixed encoding, so the bytes are identical on every machine
myint = int.from_bytes(mybytes, 'little')  # Explicit byte order, independent of the platform
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')  # Fewest bytes that can hold myint
recoveredstring = recoveredbytes.decode('utf-8')
print(recoveredstring)
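To make the byte-order argument concrete, here is a small worked example using a two-character string (a sketch for illustration, not part of the original answer). With 'little', the first byte is the least significant, so b'ab' becomes 0x62 * 256 + 0x61; with 'big', it becomes 0x61 * 256 + 0x62. Because both the encoding and the byte order are passed explicitly, the result does not depend on the machine's native byte order (sys.byteorder).

```python
import sys

data = "ab".encode("utf-8")  # b'ab' == bytes([0x61, 0x62])

little = int.from_bytes(data, "little")  # 0x62 * 256 + 0x61
big = int.from_bytes(data, "big")        # 0x61 * 256 + 0x62

print(little)  # 25185
print(big)     # 24930

# The explicit byteorder argument overrides the platform's native order,
# so the values above are the same whatever this prints.
print(sys.byteorder)

# Round trip: the byte length is recovered from the integer's bit length.
length = (little.bit_length() + 7) // 8
print(little.to_bytes(length, "little").decode("utf-8"))  # ab
```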


This has one flaw: if the string ends in NUL characters ('\0'/'\x00'), you’ll lose them (switching to 'big' byte order would lose them from the front instead). If that’s a problem, you can explicitly pad with a '\x01' byte and remove it on the decode side, so there are no trailing zero bytes to lose:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8') + b'\x01'  # Pad with 1 to preserve trailing zeroes
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes[:-1].decode('utf-8')  # Strip pad before decoding
print(recoveredstring)
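The question asked for a pair of functions named number_encode and number_decode. Those names come from the question, not from any library; the following is a minimal sketch that packages the padded version into them, assuming UTF-8 and little-endian order as in the answer. The ValueError check is an extra assumption added for illustration. The sketch also confirms that trailing NULs and the empty string survive the round trip, and shows the unpadded version dropping a trailing NUL.

```python
def number_encode(text: str) -> int:
    """Encode text as a non-negative int (UTF-8, little-endian, 0x01 pad)."""
    # The appended 0x01 becomes the most significant byte, so any trailing
    # NUL bytes from the text are no longer leading zeros of the int.
    data = text.encode("utf-8") + b"\x01"
    return int.from_bytes(data, "little")


def number_decode(number: int) -> str:
    """Reverse number_encode: recover the bytes, drop the pad, decode."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "little")
    if not data.endswith(b"\x01"):
        raise ValueError("not produced by number_encode")
    return data[:-1].decode("utf-8")


mystring = "Welcome to the InterStar cafe, serving you since 2412!"
encoded = number_encode(mystring)
print(encoded)
print(number_decode(encoded))

# Trailing NULs and the empty string survive the round trip.
for sample in ["ends with NUL\0\0", "", "\0"]:
    assert number_decode(number_encode(sample)) == sample

# Without the pad, the trailing NUL of "a\0" is lost.
unpadded = int.from_bytes("a\0".encode("utf-8"), "little")
print(unpadded.to_bytes((unpadded.bit_length() + 7) // 8, "little"))  # b'a'
```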

Answered By – ShadowRanger

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
