TensorFlow with string-based tabular data [UNIMPLEMENTED: Cast string to float is not supported]

Issue

I have the following tabular data stored in a dataframe df:

input1 input2 score
aaaaaa xxxxxx 0.1.
bbbbbb yyyyyy 0.1.

I want to build a regression model on this data using the TF functional API. Because the inputs are strings, I am using Embedding layers. Here is the network:

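from tensorflow.keras import Model
from tensorflow.keras.layers import Concatenate, Dense, Embedding, Flatten, Input

# embedding path for the first string column
input1 = Input(shape=(1,), name="input1")
embedding1 = Embedding(n_input1, 5)(input1)
vec1 = Flatten()(embedding1)

# embedding path for the second string column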
input2 = Input(shape=(1,), name="input2")
embedding2 = Embedding(n_input2, 5)(input2)
vec2 = Flatten()(embedding2)

# concatenate features
conc = Concatenate()([vec1, vec2])

# add fully-connected-layers
fc1 = Dense(256, activation='relu')(conc)
fc2 = Dense(128, activation='relu')(fc1)
fc3 = Dense(128, activation='relu')(fc2)
out = Dense(1)(fc3)

# Create model and compile it
model = Model([input1, input2], out)
model.compile('adam', 'mean_squared_error')

where n_input1 and n_input2 are the number of unique items in each column.

Since df.dtypes returns:

input1          object
input2          object
score          float64
dtype: object

I convert the string columns with df = data_df.astype({'input1': 'string', 'input2': 'string'}), although I am not sure this is useful.

When trying to fit the model using:
history = model.fit([df.input1, df.input2], df.score, epochs=10, verbose=1)

I end up with the following error:

UnimplementedError: Graph execution error:

Detected at node 'model/Cast' defined at (most recent call last):
...
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 671, in _conform_to_reference_input
      tensor = tf.cast(tensor, dtype=ref_input.dtype)
Node: 'model/Cast'
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
     [[{{node model/Cast}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_965]

I am not sure what I missed here.

Solution

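An Embedding layer expects integer indices, not strings. Input defaults to a float dtype, so Keras tries to cast the string columns to float, which is the failing 'model/Cast' node in the traceback. Check the documentation: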

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding

As it says:

This layer can only be used on positive integer inputs of a fixed range. The tf.keras.layers.TextVectorization, tf.keras.layers.StringLookup, and tf.keras.layers.IntegerLookup preprocessing layers can help prepare inputs for an Embedding layer.

Example:

[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
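As a minimal sketch of what the documentation suggests, the strings can be mapped to integer indices with a StringLookup layer placed in front of each Embedding. The toy tensors x1, x2 and y below are hypothetical stand-ins for df.input1, df.input2 and df.score, and the helper embedding_branch is an illustrative name, not part of Keras. The Input layers are declared with dtype="string" so that Keras does not attempt the string-to-float cast.

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Concatenate, Dense, Embedding, Flatten, Input, StringLookup,
)

# Hypothetical toy data standing in for df.input1, df.input2 and df.score.
# Shape (n_rows, 1) matches Input(shape=(1,)).
x1 = tf.constant([["aaaaaa"], ["bbbbbb"], ["aaaaaa"], ["cccccc"]])
x2 = tf.constant([["xxxxxx"], ["yyyyyy"], ["yyyyyy"], ["xxxxxx"]])
y = tf.constant([[0.1], [0.2], [0.3], [0.4]])


def embedding_branch(values, name, dim=5):
    """Build Input(string) -> StringLookup -> Embedding -> Flatten for one column."""
    lookup = StringLookup()  # by default one index is reserved for unseen strings
    lookup.adapt(values)     # learn the vocabulary from the column
    inp = Input(shape=(1,), dtype="string", name=name)
    ids = lookup(inp)        # strings -> integer indices
    # vocabulary_size() includes the out-of-vocabulary slot
    emb = Embedding(lookup.vocabulary_size(), dim)(ids)
    return inp, Flatten()(emb)


input1, vec1 = embedding_branch(x1, "input1")
input2, vec2 = embedding_branch(x2, "input2")

# Same head as in the question
conc = Concatenate()([vec1, vec2])
fc1 = Dense(256, activation="relu")(conc)
fc2 = Dense(128, activation="relu")(fc1)
fc3 = Dense(128, activation="relu")(fc2)
out = Dense(1)(fc3)

model = Model([input1, input2], out)
model.compile("adam", "mean_squared_error")
history = model.fit([x1, x2], y, epochs=2, verbose=0)
```

With the dataframe from the question, something like tf.reshape(tf.constant(df["input1"].tolist()), (-1, 1)) should give an equivalent input tensor. Because the vocabulary is taken from the adapted lookup layer, n_input1 and n_input2 no longer need to be computed by hand.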

Answered By – Alex

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
