How does the BERT model (in the TensorFlow or PaddlePaddle frameworks) relate to the nodes of the underlying neural net that's being trained?

The BERT model in frameworks like TensorFlow or PaddlePaddle appears as a graph of computation nodes (subtract, accumulate, add, multiply, etc.) arranged in 12 layers. But this graph looks nothing like the neural networks typically shown in textbooks (e.g. https://en.wikipedia.org/wiki/Artificial_neural_network#/media/File:Colored_neural_network.svg), where each edge carries a weight that is being trained…
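A way to see the correspondence: every fully connected layer from the textbook picture collapses into a single matrix-multiply node plus an add node in the framework's computation graph, with all the per-edge weights packed into one weight matrix. A minimal NumPy sketch (the shapes here are illustrative, not BERT's actual dimensions):

```python
import numpy as np

# A textbook fully connected layer: 3 input neurons -> 2 output neurons.
# Each edge weight in the classic diagram is one entry of W; the whole
# layer becomes one "matmul" node plus one "add" node in the graph.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))   # 3 * 2 = 6 trainable edge weights
b = np.zeros(2)                   # 2 trainable biases
x = np.array([1.0, 2.0, 3.0])     # activations of the input neurons

y = x @ W + b                     # one matmul node + one add node
print(y.shape)                    # -> (2,)
```

So the arithmetic nodes you see in the BERT graph are not individual neurons; they are whole-layer operations whose tensor-valued parameters hold all the edge weights at once.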

Indexing tensors in custom loss function with Keras

I'm using a custom loss function in Keras. This is the function:

def custom_loss(groups_id_count):
    def listnet_loss(real_labels, predicted_labels):
        losses = tf.placeholder(shape=[None], dtype=tf.float32)  # Tensor of rank 1
        for group in groups_id_count:
            start_range = 0
            end_range = (start_range + group[1])
            batch_real_labels = tf.slice(real_labels, [start_range, 1, None], [end_range, 1, None])
            batch_predicted_labels = tf.slice(predicted_labels, [start_range, 0, 0], [end_range, 0,…
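A minimal sketch of the per-group slicing this loss seems to be after, written in NumPy for clarity. Two assumptions: `groups_id_count` is a list of `(group_id, count)` pairs whose counts sum to the batch size, and the labels are rank-1 tensors; note that the running offset must advance after each group (in the snippet above `start_range` is reset to 0 on every iteration), and the hypothetical per-group mean-squared-error here just stands in for whatever the real per-group loss is:

```python
import numpy as np

# Hypothetical data: two groups of sizes 2 and 3 over a batch of 5 labels.
groups_id_count = [("a", 2), ("b", 3)]
real_labels = np.arange(5.0)
predicted_labels = np.arange(5.0) * 0.1

losses = []
start = 0
for _, count in groups_id_count:
    end = start + count                      # advance the offset per group
    batch_real = real_labels[start:end]      # cf. tf.slice(real_labels, [start], [count])
    batch_pred = predicted_labels[start:end]
    losses.append(np.mean((batch_real - batch_pred) ** 2))
    start = end                              # next group begins where this one ends

print(len(losses))                           # -> 2, one loss per group
```

In TensorFlow the same pattern would use `tf.slice(t, [start], [count])` (begin and size, each matching the tensor's rank) or plain `t[start:end]` slicing; `tf.placeholder` is a TF1 construct and is not needed to accumulate per-group losses.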