
Boolean Labels Training with Keras causes the error “Please provide data which shares the same first dimension”

So, here is what I am trying to do. My model has to receive a number of training samples, each a conjunction of Boolean literals (i.e. a vector of 0s and 1s) assigned a truth value. Learning from the samples, it must be able to receive some test vector and determine its truth value.

More concretely, a vector of 0s and 1s such as V = [1,0,0,…,0,1] may be either acceptable or not (labeled 1 or 0).
My training sample array contains 15202 such vectors, so it has shape (15202, 20), and the corresponding training label array, which holds one label per sample, has shape (15202, 1). That is, the following piece of code

print(np.shape(train_samples))
print(type(train_samples))
print(np.shape(train_labels))
print(type(train_labels))

gives the results:

(15202, 20)
<class 'numpy.ndarray'>
(15202, 1)
<class 'numpy.ndarray'>

The rest of the code is as follows:

import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
#Main Code
# Randomly generated samples and labels, for illustration only
train_samples = np.random.randint(2, size=(15202,20))
train_labels = np.random.randint(2, size=(15202,1))
#--------

train_labels, train_samples = shuffle(train_labels, train_samples)
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)

The final line causes an error:

ValueError: Data cardinality is ambiguous:
  x sizes: 304040
  y sizes: 15202
Please provide data which shares the same first dimension.

I notice that the reported x size (304040) is actually 15202 times 20. So what am I doing wrong here, and how can I fix that? Thanks.
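For reference, the mismatch can be reproduced in isolation. This is just a sketch of what the two reshaping choices do to the shapes (using the same randomly generated samples as above), not a claim about what the model should ultimately look like:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_samples = np.random.randint(2, size=(15202, 20))

# reshape(-1, 1) flattens all 15202 * 20 entries into a single column,
# which is where the 304040 in the error message comes from:
flattened = MinMaxScaler(feature_range=(0, 1)).fit_transform(
    train_samples.reshape(-1, 1))
print(flattened.shape)  # (304040, 1)

# Scaling the array as-is keeps one row per sample (each of the 20
# columns is scaled independently), so x and y then agree on the
# first dimension:
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(train_samples)
print(scaled.shape)  # (15202, 20)
```

If the samples are passed this way, the first Dense layer would presumably need `input_shape=(20,)` rather than `(1,)` to match the 20 features per sample.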
