Categories
Development

Is a Spark RDD deterministic for the set of elements in each partition?

I can’t find much documentation on ensuring partitioning order – i just want to ensure that given a set of deterministic transformations (output rows always the same), partitions always receive the same set of elements if the underlying dataset doesn’t change. Is that possible? It doesn’t need to be sorted: an example would be after […]