How can I write custom text preprocessing that can be saved as part of a model?
Suppose that I would like to have two features:
- Auto-correct the string input with some function; words may change after this operation.
- Do query expansion on the string input, so that the resulting text/tokens may contain a few additional words (whose weights would be trained).
Something like this:
fli to London -> Fly to London
fly to London -> Fly to London loc_city
(the loc_city token would need to be in the vocabulary in advance, which can be done)
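A minimal sketch of the two steps using only TensorFlow string ops, so that they could run inside a model graph rather than in Python code outside it. The correction table `CORRECTIONS`, the city list `CITIES`, and the helper names `autocorrect` and `expand` are hypothetical, and whole-word regex replacement is just one simple way to auto-correct. Unlike the examples above, this sketch lowercases the text instead of capitalizing it (TextVectorization lowercases by default anyway).

```python
import tensorflow as tf

# Hypothetical correction table (misspelling -> correction); adapt to your data.
CORRECTIONS = {"fli": "fly"}
# Hypothetical list of city names that trigger the extra token "loc_city".
CITIES = ["london", "paris"]


def autocorrect(text: tf.Tensor) -> tf.Tensor:
    """Step 1: lowercase, then replace misspelled whole words."""
    text = tf.strings.lower(text)
    for wrong, right in CORRECTIONS.items():
        # \b is a word boundary in RE2, so "fli" does not match inside "flight".
        text = tf.strings.regex_replace(text, rf"\b{wrong}\b", right)
    return text


def expand(text: tf.Tensor) -> tf.Tensor:
    """Step 2: append " loc_city" to every string that mentions a known city."""
    pattern = r".*\b(" + "|".join(CITIES) + r")\b.*"
    has_city = tf.strings.regex_full_match(text, pattern)
    # The scalar "loc_city" is broadcast against the batch of strings.
    expanded = tf.strings.join([text, "loc_city"], separator=" ")
    return tf.where(has_city, expanded, text)


batch = tf.constant(["fli to London", "fly home", "flight to Paris"])
print(expand(autocorrect(batch)))
```

Because every operation is a TensorFlow op, the same functions can be called from a Keras layer's `call` or from a `standardize` callable.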
After step 1 and/or step 2, the result would be fed to a TextVectorization / Embedding layer.
TextVectorization accepts a custom standardize callback, but I do not see an obvious way of doing this with the existing tf.strings operations.
Ideally, there would be a callback function or layer that accepts a string (or tokens) and maps it to another string (or string tokens).
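One possible route, sketched under the assumption that Keras's `TextVectorization` is used: its `standardize` argument accepts a callable that receives the raw string tensor and returns a string tensor, so both steps can run there, and the `vocabulary` argument puts `loc_city` into the vocabulary up front. The function name `custom_standardize` and the word lists are hypothetical. Note that a custom callable replaces the default lowercasing and punctuation stripping, so the callable has to do that itself. Whether the callable survives saving depends on the save format (for the `.keras` format it likely has to be registered as serializable); treat that as an assumption to verify for your Keras version.

```python
import tensorflow as tf


def custom_standardize(text: tf.Tensor) -> tf.Tensor:
    """Hypothetical standardize callable: auto-correct, then expand the query."""
    text = tf.strings.lower(text)
    # Step 1: whole-word auto-correction.
    text = tf.strings.regex_replace(text, r"\bfli\b", "fly")
    # Step 2: append the extra token when a known city is mentioned.
    has_city = tf.strings.regex_full_match(text, r".*\blondon\b.*")
    expanded = tf.strings.join([text, "loc_city"], separator=" ")
    return tf.where(has_city, expanded, text)


vectorizer = tf.keras.layers.TextVectorization(
    standardize=custom_standardize,
    # loc_city is in the vocabulary from the start, so it gets its own index
    # (and therefore its own trainable row in a following Embedding layer).
    vocabulary=["fly", "to", "london", "loc_city"],
    output_mode="int",
)
print(vectorizer.get_vocabulary())
print(vectorizer(tf.constant(["fli to London", "fly home"])))
```

With the default mask and OOV tokens, index 0 is padding and index 1 is `[UNK]`, so the out-of-vocabulary word "home" maps to 1 and the shorter sequence is padded with 0.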
You can get the first character of a string like this:
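```python
import tensorflow as tf


class StringLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(StringLayer, self).__init__()

    def call(self, inputs):
        # bytes_split turns the [batch, 1] strings into a ragged [batch, 1, None]
        # tensor of characters; squeeze drops the size-1 axis, to_tensor pads
        # with b'', and [:, 0] keeps the first character of each string.
        return tf.squeeze(tf.strings.bytes_split(inputs), axis=1).to_tensor()[:, 0]


s = tf.constant([['next_string'], ['some_string']])
layer = StringLayer()
print(layer(s))
# tf.Tensor([b'n' b's'], shape=(2,), dtype=string)
```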
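Applying the same custom-layer pattern to the question's token-level case: a sketch of a string-to-string layer that splits on whitespace, maps each token through a `tf.lookup.StaticHashTable`, and leaves tokens without an entry unchanged. `TokenCorrectionLayer` and its corrections dictionary are hypothetical. The idea is that the lookup table is a TensorFlow resource owned by the layer, so the mapping belongs to the model's state instead of living in Python code outside it; whether it round-trips through saving depends on the save format and Keras version, which is an assumption to check.

```python
import tensorflow as tf


class TokenCorrectionLayer(tf.keras.layers.Layer):
    """Hypothetical string-to-string layer: maps tokens through a lookup table."""

    def __init__(self, corrections, **kwargs):
        super().__init__(**kwargs)
        # The StaticHashTable is a TensorFlow resource held by the layer.
        self.table = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                list(corrections.keys()), list(corrections.values())
            ),
            default_value="",  # "" marks "no correction for this token"
        )

    def _correct(self, tokens):
        replaced = self.table.lookup(tokens)
        # Keep the original token wherever the table has no entry.
        return tf.where(tf.equal(replaced, ""), tokens, replaced)

    def call(self, inputs):
        tokens = tf.strings.split(inputs)  # ragged tensor of tokens
        corrected = tf.ragged.map_flat_values(self._correct, tokens)
        # Join the tokens back into one string per example.
        return tf.strings.reduce_join(corrected, axis=-1, separator=" ")


layer = TokenCorrectionLayer({"fli": "fly", "londn": "London"})
print(layer(tf.constant(["fli to londn", "fly home"])))
```

In a model, a layer like this could sit in front of TextVectorization and Embedding so that the correction is applied to raw string input.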
Answered By – AloneTogether