How to save TextVectorization to disk in TensorFlow?


from tensorflow.keras.layers.experimental.preprocessing import TextVectorization 

text_dataset = 
vectorizer = TextVectorization(max_tokens=100000, output_mode='tf-idf',ngrams=None)

I have trained a TextVectorization layer and I want to save it to disk so that I can reload it next time. I have tried pickle and joblib.dump, but neither works.

How can I save it?

The generated error is the following:

InvalidArgumentError: Cannot convert a Tensor of dtype resource to a NumPy array


Instead of pickling the object itself, pickle its configuration and weights. Later, unpickle them, use the configuration to recreate the object, and load the saved weights into it. Official docs here.


import pickle

import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

text_dataset = tf.data.Dataset.from_tensor_slices([
    "this is some clean text",
    "some more text",
    "even some more text"])

# Fit a TextVectorization layer
vectorizer = TextVectorization(max_tokens=10, output_mode='tf-idf', ngrams=None)
vectorizer.adapt(text_dataset.batch(1024))

# Vector for the word "this"
print(vectorizer("this"))

# Pickle the config and weights
with open("tv_layer.pkl", "wb") as f:
    pickle.dump({'config': vectorizer.get_config(),
                 'weights': vectorizer.get_weights()}, f)

print("*" * 10)
# Later you can unpickle and use
# `config` to create the object and
# `weights` to load the trained weights.

with open("tv_layer.pkl", "rb") as f:
    from_disk = pickle.load(f)
new_v = TextVectorization.from_config(from_disk['config'])
# You have to call `adapt` with some dummy data (BUG in Keras)
new_v.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
new_v.set_weights(from_disk['weights'])

# Let's see the vector for the word "this"
print(new_v("this"))


tf.Tensor(
[[0.         0.         0.         0.         0.91629076 0.
  0.         0.         0.         0.        ]], shape=(1, 10), dtype=float32)
**********
tf.Tensor(
[[0.         0.         0.         0.         0.91629076 0.
  0.         0.         0.         0.        ]], shape=(1, 10), dtype=float32)
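
If you need this in more than one place, the same config-plus-weights pattern can be wrapped in a few small helpers. This is only a sketch: the names `save_layer_state`, `load_layer_state` and `restore_layer` are made up for illustration, and it assumes the layer exposes the usual Keras methods `get_config`, `get_weights`, `from_config`, `adapt` and `set_weights`. The helpers themselves do not import TensorFlow, so they should work with any layer that follows that interface.

```python
import pickle


def save_layer_state(layer, path):
    """Pickle the layer's config and weights, not the layer object itself."""
    state = {"config": layer.get_config(), "weights": layer.get_weights()}
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_layer_state(path):
    """Read back the dict written by save_layer_state."""
    with open(path, "rb") as f:
        return pickle.load(f)


def restore_layer(layer_cls, state, dummy_data):
    """Rebuild a layer from a saved state dict.

    `dummy_data` is passed to `adapt` before the weights are set,
    mirroring the workaround described in the answer.
    """
    layer = layer_cls.from_config(state["config"])
    layer.adapt(dummy_data)
    layer.set_weights(state["weights"])
    return layer
```

With the layer from the answer, this would be called as `save_layer_state(vectorizer, "tv_layer.pkl")`, and later as `restore_layer(TextVectorization, load_layer_state("tv_layer.pkl"), dummy)`, where `dummy` is any small dataset of strings. As with any pickle file, only load files you created yourself.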

Answered By – mujjiga

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
