How to apply tf.data transformations to a DataFrame

Issue

I want to apply tf.data transformations to a pandas DataFrame. According to the TensorFlow docs, I can pass a DataFrame to tf.data directly, but the DataFrame must have a uniform dtype.

When I build a dataset from one column of my DataFrame like this:

tf.data.Dataset.from_tensor_slices(df['reports'])

it raises this error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

When I print df['reports'].dtype, I get dtype('O') (object), which suggests the column is not uniform. If that is the cause, how can I convert this DataFrame to a uniform dtype?

Solution

Try using a ragged tensor, which can hold rows of different lengths:

import tensorflow as tf
import pandas as pd

df = pd.DataFrame(data={'reports': [[2.0, 3.0, 4.0], [2.0, 3.0], [2.0]]})

dataset = tf.data.Dataset.from_tensor_slices(tf.ragged.constant(df['reports']))

for x in dataset:
  print(x)

This prints:

tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32)
tf.Tensor([2. 3.], shape=(2,), dtype=float32)
tf.Tensor([2.], shape=(1,), dtype=float32)
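
If it is unclear why the column has dtype('O'), it may help to check which Python types it actually holds before choosing a fix. The snippet below is a minimal sketch, not part of the original answer: summarize_types is a hypothetical helper, and sample_column is made-up data standing in for df['reports'], assuming the column holds text with a missing value.

```python
from collections import Counter

def summarize_types(values):
    """Count the Python type name of each element in an iterable."""
    return Counter(type(v).__name__ for v in values)

# Made-up stand-in for an object column such as df['reports']:
# two strings and one missing value, which pandas stores as a float NaN.
sample_column = ["report a", float("nan"), "report b"]

type_counts = summarize_types(sample_column)
print(type_counts)  # Counter({'str': 2, 'float': 1})
```

With a real DataFrame you could call summarize_types(df['reports']), since iterating a Series yields its values. If the result shows mostly str plus a few float entries, the floats are probably NaN placeholders for missing rows, and filling them first (for example with df['reports'].fillna('')) is one option. If it shows list, the ragged approach above applies.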

Answered By – AloneTogether

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
