I have a query that can run on the same data set, and sometimes it fails and sometimes it succeeds
The query is generated by hive metadata service, and I can’t modify it.
This is a simplified version of the query:
select "TBLS"."TBL_ID", "FILTER0"."PART_ID", "TBLS"."TBL_NAME", "FILTER0"."PART_KEY_VAL" from "PARTITIONS" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = 'test_table_int' inner join "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" where cast("FILTER0"."PART_KEY_VAL" as decimal(21, 0)) = 1
When I spin up a new database, and populate the relevant tables, this is how the whole data looks like (querying without any filters):
and running the query above will return a single row (the one with
PART_KEY_VAL = 1)
the problem starts after I run some automated tests that write to those tables. I couldn’t find any pattern, I just run a few complicate tests that write to those tables
Now if I populate those tables again, the data looks similar:
but running the query above will result in:
SQL Error [22P02]: ERROR: invalid input syntax for type numeric: “c”
for some reason, the value “c” is being cast to decimal and it fails, even though the same query on the same data was working earlier
what could be the reason for this behavior?
for reference, here is where the query is generated, but I simplified it a bit above: https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1289-L1339