Convert list of tuples to tensorflow dataset (tf.data.Dataset)

Issue

The data comes from the Kaggle competition "Natural Language Processing with Disaster Tweets".

ds_train

>>> [("Already expecting to be inundated w/ articles about trad authors' pay plummeting by early next year but if this is true it'll be far worse",
  0),
 ('@blazerfan not everyone can see ignoranceshe is Latinoand that is All she can ever benothing morebut an attack dog 4 a hate group GOP',
  0), ...]


👆 That is, a list of the form [(X_1, y_1), …, (X_n, y_n)].

Or, as a dataframe column:

0                      Just happened a terrible car crash
1       Heard about #earthquake is different cities, s...
2       there is a forest fire at spot pond, geese are...

I want to convert it to a TensorFlow dataset (tf.data.Dataset).
I tried tf.data.Dataset.from_tensor_slices(ds_train) but got this error:

ValueError: Can't convert Python sequence with mixed types to Tensor.

Solution

One option is to split the list of tuples into a sequence of texts and a sequence of labels, so that each one contains a single type:

import tensorflow as tf

# Each element is a (text, label) tuple.
data = [
    ("Already expecting to be inundated w/ articles about trad authors' pay plummeting by early next year but if this is true it'll be far worse", 0),
    ('@blazerfan not everyone can see ignoranceshe is Latinoand that is All she can ever benothing morebut an attack dog 4 a hate group GOP', 0),
]
# Unzip the pairs into one tuple of texts and one tuple of labels.
x, y = zip(*data)
dataset = tf.data.Dataset.from_tensor_slices((list(x), list(y)))
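To see why the split helps, here is a minimal pure-Python sketch of the unzip step on its own. The two sample pairs are hypothetical stand-ins for ds_train, and the names types_per_tuple, texts and labels are chosen only for illustration. Each tuple mixes a str and an int, which is presumably what the "mixed types" error refers to, while the transposed sequences each hold one type.

```python
# Two hypothetical (text, label) pairs standing in for ds_train.
data = [("first tweet", 0), ("second tweet", 1)]

# Each tuple mixes str and int, so the list as a whole has no single dtype.
types_per_tuple = [tuple(type(v).__name__ for v in pair) for pair in data]

# zip(*data) transposes the pairs: one tuple of all texts, one of all labels.
texts, labels = zip(*data)
texts, labels = list(texts), list(labels)

print(types_per_tuple)
print(texts)
print(labels)
```

The resulting texts and labels lists are what gets passed, as a 2-tuple, to tf.data.Dataset.from_tensor_slices.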

With a dataframe, pass the text and label columns as a tuple:

dataset = tf.data.Dataset.from_tensor_slices((df['text'], df['target']))
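As a rough end-to-end check of the dataframe route, the sketch below builds a tiny frame and reads one element back. The two rows and their labels are made up for illustration; only the column names 'text' and 'target' follow the answer. It converts each column with .tolist() before building the dataset, which is a slight variant of the line above (an assumption on my part that plain lists are the least surprising input), and the batch size of 2 is arbitrary.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical two-row frame; labels are invented for this example.
df = pd.DataFrame(
    {
        "text": ["Just happened a terrible car crash", "What a lovely sunny day"],
        "target": [1, 0],
    }
)

# Converting each column to a plain Python list keeps the input types simple.
dataset = tf.data.Dataset.from_tensor_slices(
    (df["text"].tolist(), df["target"].tolist())
)

# Each element is a (text, label) pair of scalar tensors.
first_text, first_label = next(iter(dataset))
decoded_text = first_text.numpy().decode("utf-8")
label_value = int(first_label.numpy())
print(decoded_text, label_value)

# Group elements into batches before feeding them to a model.
batched = dataset.batch(2)
```

Strings come back as bytes inside the tensor, hence the explicit decode step when inspecting them.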

Answered By – AloneTogether

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
