This repository was archived by the owner on Jan 13, 2023. It is now read-only.

TryteString.as_string needs some re-branding #90

Closed
todofixthis opened this issue Oct 31, 2017 · 5 comments
Comments

@todofixthis
Contributor

TryteString.as_string desperately needs to be renamed; many users confuse it with __str__.

@todofixthis
Contributor Author

todofixthis commented Oct 31, 2017

My recommendation is to call it decode:

  • Users will be familiar with this method because it is built into Python strings (and TryteString is supposed to "feel" like the Tangle version of a Python string).
  • Users who are familiar with how bytes.decode works will also grasp more easily that this is a "trytes -> bytes -> characters" process (especially once #62, "TryteString.as_bytes() does not work as expected", is resolved), so they are less likely to confuse it with __str__.
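
As a rough illustration of the "trytes -> bytes -> characters" idea, here is a minimal sketch. It assumes the two-trytes-per-byte ASCII scheme that matches the 'EXAMPLE' / 'OBGCKBWBZBVBOB' pair quoted later in this thread; the helper names trytes_to_bytes and decode_trytes are made up for this example and are not PyOTA's API.

```python
# Assumed tryte alphabet: '9' is 0, 'A'..'Z' are 1..26.
ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def trytes_to_bytes(trytes: str) -> bytes:
    """Step 1: turn each pair of trytes back into one byte."""
    if len(trytes) % 2:
        raise ValueError("expected an even number of trytes")
    out = bytearray()
    for i in range(0, len(trytes), 2):
        low = ALPHABET.index(trytes[i])
        high = ALPHABET.index(trytes[i + 1])
        # Raises ValueError if the pair does not fit in one byte.
        out.append(low + 27 * high)
    return bytes(out)


def decode_trytes(trytes: str, encoding: str = "utf-8") -> str:
    """Step 2: interpret those bytes as text, the way bytes.decode does."""
    return trytes_to_bytes(trytes).decode(encoding)


print(decode_trytes("OBGCKBWBZBVBOB"))  # EXAMPLE
```

The second step is an ordinary bytes.decode call, which is what makes the name decode feel natural for the combined operation.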

todofixthis changed the title from "TryteString.as_string needs a makeover" to "TryteString.as_string needs some re-branding" on Oct 31, 2017
@mlouielu
Contributor

mlouielu commented Nov 1, 2017

Conclusion

+1 for renaming as_string to decode and as_bytes to encode.

I think TryteString should act like this:

>>> import iota
>>> ts = iota.codecs.encode(b'EXAMPLE'.decode('ascii'), 'utf-8')  # Return TryteString
>>> ts = iota.codecs.encode('EXAMPLE', 'utf-8')                    # Return TryteString
>>> ts = iota.TryteString.from_string('EXAMPLE')
>>> ts = iota.TryteString.from_bytes(b'*\x15d\x96\xb5\x121\x8b\x01')
>>> ts = iota.TryteString('OBGCKBWBZBVBOB')
>>> ts
iota.TryteString('OBGCKBWBZBVBOB')
>>> ts.encode()                          # encode "tryte-string" to "tryte-in-bytes"
b'*\x15d\x96\xb5\x121\x8b\x01'
>>> ts.decode('utf-8')                   # decode "tryte-string" to "str"
>>> ts.decode()                          # default with utf-8
'EXAMPLE'
>>> str(ts)
'OBGCKBWBZBVBOB'
>>> bytes(ts)                           # Not b'OBGCKBWBZBVBOB'
b'*\x15d\x96\xb5\x121\x8b\x01'

Explanation

Users are confused about the difference between 'EXAMPLE', iota.Hash('EXAMPLE'), iota.Hash(b'EXAMPLE'), str(iota.Hash('EXAMPLE')), and bytes(iota.Hash('EXAMPLE')):

  • 'EXAMPLE': a string, which may be a tryte string or an ordinary Python string
  • iota.Hash('EXAMPLE'): a TryteString whose value is initialized with 'EXAMPLE'
  • iota.Hash(b'EXAMPLE'): a TryteString whose value is initialized with b'EXAMPLE' (the same as 'EXAMPLE')
  • str(iota.Hash('EXAMPLE')): the tryte string as str, taken from iota.TryteString('EXAMPLE')
  • bytes(iota.Hash('EXAMPLE')): the tryte string as bytes, taken from iota.TryteString('EXAMPLE')

The point is that TryteString.__init__ accepts both str and bytes, and in that context both represent a "tryte string".

No decoding or encoding is involved, so it makes sense that str(iota.Hash('EXAMPLE')) is 'EXAMPLE' and bytes(iota.Hash('EXAMPLE')) is b'*\x15d\x96\xb5\x121\x8b\x01'.


But from_string and as_string do involve encoding/decoding: from_string encodes the input string as UTF-8 and passes the result to from_bytes, so these two are the same:

>>> iota.Hash.from_string('妳好') == iota.Hash.from_bytes('妳好'.encode('utf-8'))
True
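
To make that equivalence concrete, here is a small sketch with a hypothetical stand-in class, DemoTrytes. It is not PyOTA's implementation; it assumes the two-trytes-per-byte ASCII scheme that reproduces the 'EXAMPLE' -> 'OBGCKBWBZBVBOB' pair above.

```python
# Assumed tryte alphabet: '9' is 0, 'A'..'Z' are 1..26.
ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"


class DemoTrytes:
    """Hypothetical stand-in for TryteString; not PyOTA's implementation."""

    def __init__(self, trytes: str) -> None:
        self.trytes = trytes

    def __eq__(self, other: object) -> bool:
        return isinstance(other, DemoTrytes) and self.trytes == other.trytes

    @classmethod
    def from_bytes(cls, data: bytes) -> "DemoTrytes":
        # Accepts *any* bytes: each byte becomes two trytes.
        return cls("".join(ALPHABET[b % 27] + ALPHABET[b // 27] for b in data))

    @classmethod
    def from_string(cls, text: str) -> "DemoTrytes":
        # Text is first encoded as UTF-8, then handed to from_bytes.
        return cls.from_bytes(text.encode("utf-8"))


assert DemoTrytes.from_string("妳好") == DemoTrytes.from_bytes("妳好".encode("utf-8"))
print(DemoTrytes.from_string("EXAMPLE").trytes)  # OBGCKBWBZBVBOB
```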

The deeper problem comes from from_bytes: it accepts not a "tryte string in bytes format" but "any bytes".


I think we have mixed two different conversions into one type: str/bytes -> tryte-string -> TryteString -> bytes, and tryte-string (as str or bytes) -> TryteString.

# str/bytes -> tryte-string
"This is a message from GitHub" -> "CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC"

# tryte-string -> TryteString
"CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC" -> TryteString("CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC")

# TryteString -> bytes
TryteString("CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC") -> b'T\xf4\xe6\xcd\xbc\x0bNf.\xcb\xb7\x0bm\xeb@\xce^\x17L\xeb.R\x08&oT.\xe6\x05\x1at\x94R\xe9\x08'

---------

# str/bytes -> tryte-string
"EXAMPLE" -> "OBGCKBWBZBVBOB"

# tryte-string -> TryteString
"OBGCKBWBZBVBOB" -> TryteString("OBGCKBWBZBVBOB")
# tryte-string -> TryteString
"EXAMPLE" -> TryteString("EXAMPLE")
b"EXAMPLE" -> TryteString("EXAMPLE")

BTW, @todofixthis, you are using Python 2, right? In Python 3, str can only be encoded to bytes, and bytes can only be decoded to str; str has no decode method. I think that's why I'm stuck on TryteString.decode(): if TryteString acts like a Python string, it can't have decode in Python 3...
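
The Python 3 asymmetry mentioned here is easy to check with plain built-in types, no PyOTA involved:

```python
text = "EXAMPLE"
data = text.encode("utf-8")          # str -> bytes
assert data.decode("utf-8") == text  # bytes -> str

# In Python 3, str has encode but no decode; bytes has decode but no encode.
print(hasattr(text, "encode"), hasattr(text, "decode"))  # True False
print(hasattr(data, "encode"), hasattr(data, "decode"))  # False True
```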

@todofixthis
Contributor Author

todofixthis commented Nov 1, 2017

These are great ideas, thanks @mlouielu !


tl;dr version: Overall, I think we're in agreement; I just have a couple of minor changes to request:

Changes to be made:

  • Rename TryteString.as_bytes to encode.
  • Rename TryteString.as_string to decode.

Everything else can stay the way it is — we'll make additional changes for #62, but for #90, I think we only need to rename a couple of methods.


Let's tackle this one item at a time:

1. __init__

I like the idea of TryteString('FOO') == TryteString(b'FOO'). In fact, this is what PyOTA does currently.

2. __str__ and __bytes__

To be consistent, __str__ and __bytes__ should either:

Return the ASCII representation of the trytes:

  • str(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == 'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'
  • bytes(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == b'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'

OR return the binary representation of the trytes:

  • str(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == '你好,世界!'
  • bytes(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'

I think the former satisfies the Principle of Least Astonishment. Additionally, it conforms to the Zen of Python ("There should be one-- and preferably only one --obvious way to do it.") because we will use encode/decode to get binary representations of TryteStrings anyway.
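
A minimal sketch of the first option, where __str__ and __bytes__ both return the tryte characters themselves. AsciiViewTrytes is a hypothetical stand-in written for this comment, not PyOTA's TryteString:

```python
class AsciiViewTrytes:
    """Hypothetical stand-in for TryteString; not PyOTA's implementation."""

    def __init__(self, trytes) -> None:
        # Accept str or bytes; both are treated as tryte characters.
        if isinstance(trytes, str):
            trytes = trytes.encode("ascii")
        self._trytes = bytes(trytes)

    def __str__(self) -> str:
        # The tryte characters as text; nothing is decoded here.
        return self._trytes.decode("ascii")

    def __bytes__(self) -> bytes:
        # The same tryte characters as ASCII bytes.
        return self._trytes


ts = AsciiViewTrytes("ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM")
assert str(ts) == "ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM"
assert bytes(ts) == b"ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM"
```

With this shape, str() and bytes() never surprise the caller, and any conversion to the underlying payload has to go through an explicit encode/decode call.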

3. iota.codecs.encode and iota.codecs.decode

This is not necessary, as we can leverage Python's built-in codecs system.

To decode bytes into trytes:

>>> from codecs import encode, decode

>>> bytes_ = '你好,世界!'.encode('utf-8')
>>> decode(bytes_, 'trytes_binary')
TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')

# Using legacy ASCII codec:
>>> decode(bytes_, 'trytes_ascii')
TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD')

To encode strings into trytes:

>>> str_ = '你好,世界!'
>>> encode(str_, 'trytes_binary')
TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')

# Using legacy ASCII codec:
>>> encode(str_, 'trytes_ascii')
TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD')

Note: PyOTA already uses decode and encode internally to convert some values, so we might have to get creative here.
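
For readers who have not used Python's codec registry, here is a rough sketch of plugging a custom codec into it with codecs.register. The codec name demo_trytes, the helper functions, and the choice of which direction counts as "encode" are all assumptions made for this example; it is not the codec PyOTA ships. The mapping is the two-trytes-per-byte ASCII scheme.

```python
import codecs

ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODEC_NAME = "demo_trytes"  # hypothetical name, chosen for this sketch


def _encode(data, errors="strict"):
    """bytes -> tryte characters (two trytes per byte)."""
    raw = bytes(data)
    trytes = "".join(ALPHABET[b % 27] + ALPHABET[b // 27] for b in raw)
    return trytes.encode("ascii"), len(raw)


def _decode(trytes, errors="strict"):
    """tryte characters -> bytes (inverse of _encode)."""
    raw = bytes(trytes)
    if len(raw) % 2:
        raise ValueError("expected an even number of trytes")
    chars = raw.decode("ascii")
    out = bytes(
        ALPHABET.index(chars[i]) + 27 * ALPHABET.index(chars[i + 1])
        for i in range(0, len(chars), 2)
    )
    return out, len(raw)


def _search(name):
    # The registry calls this with the requested codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name=CODEC_NAME)
    return None


codecs.register(_search)

payload = "你好".encode("utf-8")
trytes = codecs.encode(payload, CODEC_NAME)
assert codecs.decode(trytes, CODEC_NAME) == payload
print(codecs.encode(b"EXAMPLE", CODEC_NAME))  # b'OBGCKBWBZBVBOB'
```

Once registered, the codec is reachable through the standard codecs.encode and codecs.decode functions, so no separate iota.codecs namespace is needed.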

4. TryteString.from_bytes and TryteString.from_string

I think we're in alignment here; I just need to make one minor tweak, because we also have to support the legacy ASCII codec. See next section.

5. TryteString.encode replaces TryteString.as_bytes

I like the rename, and I think it will resonate with Python users; it is the reverse of decode(bytes_, 'trytes_binary') from the example above:

  • decode(bytes_, 'trytes_binary').encode('trytes_binary') == bytes_
  • decode(bytes_, 'utf-8').encode('utf-8') == bytes_

We will need to support the legacy ASCII codec, so there needs to be an optional argument to that method:

# Using binary codec (default):
>>> TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM').encode()
>>> TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM').encode('trytes_binary')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'

# Using legacy ASCII codec:
>>> TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD').encode('trytes_ascii')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'

6. TryteString.decode replaces TryteString.as_string

Similar comments as the previous section.

todofixthis added a commit that referenced this issue Jan 6, 2018
- `TryteString.from_string` is now `from_unicode`.
- `TryteString.as_string` is now `decode`.
- `TryteString.as_bytes` is now `encode`.
- Original methods are still available, but deprecated.
@todofixthis
Contributor Author

Summary of changes:

  • Rename TryteString.from_string to from_unicode.
  • Rename TryteString.as_bytes to encode.
  • Rename TryteString.as_string to decode.
  • Add deprecated versions of the renamed functions.
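
One common way to keep an old method name working while steering users to the new one is a thin alias that emits a DeprecationWarning. The sketch below shows that pattern only; DemoTrytes is a hypothetical stand-in, and this is not necessarily how the referenced commit implements it.

```python
import warnings


class DemoTrytes:
    """Hypothetical stand-in; only the deprecation pattern matters here."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def encode(self) -> bytes:
        # New name; the real conversion is out of scope for this sketch.
        return self._payload

    def as_bytes(self) -> bytes:
        """Deprecated alias that forwards to encode()."""
        warnings.warn(
            "as_bytes() is deprecated; use encode() instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this method
        )
        return self.encode()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert DemoTrytes(b"abc").as_bytes() == b"abc"

assert caught[0].category is DeprecationWarning
```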

@todofixthis
Contributor Author

Scheduled for release: 2.0.4

marko-k0 pushed a commit to marko-k0/iota.lib.py that referenced this issue Jul 28, 2018
- `TryteString.from_string` is now `from_unicode`.
- `TryteString.as_string` is now `decode`.
- `TryteString.as_bytes` is now `encode`.
- Original methods are still available, but deprecated.
2 participants