Custom text pre-processing saved in a TensorFlow model


How can I write custom text pre-processing that can be saved as part of a model?

Suppose that I would like to have two features:

  1. Auto-correct the string input with some function. Words might change after this operation.
  2. Do query expansion on the string input, so that the resulting text/tokens might contain a few additional words (for which weights would be trained).

Something like this:

  1. fli to London -> Fly to London

  2. fly to London -> Fly to London loc_city

    -> this token (loc_city) would need to be in the vocabulary in advance, which could be done

After steps 1 and/or 2, should the result be fed to a TextVectorization / Embedding layer?

There is a standardize callback, but I do not see an obvious way of doing that with the existing tf.strings operations.

Ideally, there would be a callback function / layer which accepts a string (or tokens) and maps it to another string (or string tokens).
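For illustration, a string-to-string callback of this kind can be sketched with tf.strings.regex_replace alone. The function name correct_and_expand, the single correction rule (fli -> fly), and the tag rule (london -> loc_city) are hypothetical placeholders, not a real auto-correct or expansion method; lowercasing is an added assumption so the patterns match regardless of case.

```python
import tensorflow as tf

def correct_and_expand(text):
    """Map a string tensor to another string tensor using only tf.strings ops."""
    text = tf.strings.lower(text)
    # Step 1 (auto-correct): hypothetical one-entry correction rule.
    text = tf.strings.regex_replace(text, r"\bfli\b", "fly")
    # Step 2 (query expansion): append a tag token after a known city name.
    text = tf.strings.regex_replace(text, r"\b(london)\b", r"\1 loc_city")
    return text

queries = tf.constant(["fli to London", "fly to Paris"])
expanded = correct_and_expand(queries)
print(expanded.numpy())
# [b'fly to london loc_city' b'fly to paris']

# The same function can be passed as the standardize callback; the tag
# token is listed in the vocabulary in advance, as the question suggests.
vectorizer = tf.keras.layers.TextVectorization(
    standardize=correct_and_expand,
    vocabulary=["fly", "to", "london", "loc_city"],
)
token_ids = vectorizer(queries)
```

Regex rules only scale to a handful of fixed substitutions; anything resembling real spell-checking would need a different mechanism.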


You can do string manipulation inside a custom layer. For example, this layer returns the first character of each string:

import tensorflow as tf

class StringLayer(tf.keras.layers.Layer):
  def __init__(self):
    super(StringLayer, self).__init__()

  def call(self, inputs):
    return tf.squeeze(tf.strings.bytes_split(inputs), axis=1).to_tensor()[:, 0]

s = tf.constant([['next_string'], ['some_string']])
layer = StringLayer()
print(layer(s))
# tf.Tensor([b'n' b's'], shape=(2,), dtype=string)
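The same custom-layer pattern could be extended towards the two steps in the question. The following is a minimal sketch, not a definitive implementation: the class name CorrectAndExpand and the two dictionaries are hypothetical, the input is assumed to be a 1-D batch of strings, and the "auto-correct" is only an exact-match table lookup per token.

```python
import tensorflow as tf

class CorrectAndExpand(tf.keras.layers.Layer):
    """Hypothetical layer: token-level correction plus tag expansion."""

    def __init__(self, corrections, tags, **kwargs):
        super().__init__(**kwargs)
        self.corrections = self._table(corrections)
        self.tags = self._table(tags)

    @staticmethod
    def _table(mapping):
        # Static string -> string table; missing keys map to "".
        init = tf.lookup.KeyValueTensorInitializer(
            keys=tf.constant(list(mapping.keys())),
            values=tf.constant(list(mapping.values())),
        )
        return tf.lookup.StaticHashTable(init, default_value="")

    def call(self, inputs):
        # Split on whitespace: RaggedTensor of shape [batch, None].
        tokens = tf.strings.split(tf.strings.lower(inputs))
        flat = tokens.flat_values
        # Step 1: replace tokens found in the correction table.
        fixed = self.corrections.lookup(flat)
        fixed = tf.where(tf.equal(fixed, ""), flat, fixed)
        # Step 2: look up an extra tag token for each corrected token.
        tags = self.tags.lookup(fixed)
        fixed_rt = tokens.with_flat_values(fixed)
        # Keep only the non-empty tags, row by row.
        tags_rt = tf.ragged.boolean_mask(
            tokens.with_flat_values(tags),
            tokens.with_flat_values(tf.not_equal(tags, "")),
        )
        # Append the tags to each row and join back into one string.
        return tf.strings.reduce_join(
            tf.concat([fixed_rt, tags_rt], axis=1), axis=1, separator=" ")

layer = CorrectAndExpand({"fli": "fly"}, {"london": "loc_city"})
out = layer(tf.constant(["fli to London"]))
print(out.numpy())
# [b'fly to london loc_city']
```

The output is again a string tensor, so it can be fed to a TextVectorization layer whose vocabulary already contains the tag tokens.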

Answered By – AloneTogether

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
