java.lang.IllegalArgumentException when using parquet file #69

@JyotiRSharma

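When I run data-validator with a nullCheck on a parquet file, it fails while parsing the config with java.lang.ExceptionInInitializerError, caused by java.lang.IllegalArgumentException: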

root@lubuntu:/home/jyoti/Spark# /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml
22/01/11 11:50:53 WARN Utils: Your hostname, lubuntu resolves to a loopback address: 127.0.1.1; using 192.168.195.131 instead (on interface ens33)
22/01/11 11:50:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/01/11 11:50:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/11 11:50:59 INFO Main$: Logging configured!
22/01/11 11:51:00 INFO Main$: Data Validator
22/01/11 11:51:01 INFO ConfigParser$: Parsing `config.yaml`
22/01/11 11:51:01 INFO ConfigParser$: Attempting to load `config.yaml` from file system
Exception in thread "main" java.lang.ExceptionInInitializerError
	at com.target.data_validator.validator.RowBased.<init>(RowBased.scala:11)
	at com.target.data_validator.validator.NullCheck.<init>(NullCheck.scala:12)
	at com.target.data_validator.validator.NullCheck$.fromJson(NullCheck.scala:37)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$decoders$2.apply(JsonDecoders.scala:16)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$decoders$2.apply(JsonDecoders.scala:16)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$2.apply(JsonDecoders.scala:32)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$2.apply(JsonDecoders.scala:32)
	at scala.Option.map(Option.scala:230)
	at com.target.data_validator.validator.JsonDecoders$$anon$7.com$target$data_validator$validator$JsonDecoders$$anon$$getDecoder(JsonDecoders.scala:32)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$apply$3.apply(JsonDecoders.scala:27)
	at com.target.data_validator.validator.JsonDecoders$$anon$7$$anonfun$apply$3.apply(JsonDecoders.scala:27)
	at cats.syntax.EitherOps$.flatMap$extension(either.scala:149)
	at com.target.data_validator.validator.JsonDecoders$$anon$7.apply(JsonDecoders.scala:27)
	at io.circe.SeqDecoder.apply(SeqDecoder.scala:17)
	at io.circe.Decoder$class.tryDecode(Decoder.scala:36)
	at io.circe.SeqDecoder.tryDecode(SeqDecoder.scala:6)
	at com.target.data_validator.ConfigParser$anon$importedDecoder$macro$15$1$$anon$6.apply(ConfigParser.scala:21)
	at io.circe.generic.decoding.DerivedDecoder$$anon$1.apply(DerivedDecoder.scala:13)
	at io.circe.Decoder$$anon$28.apply(Decoder.scala:178)
	at io.circe.Decoder$$anon$28.apply(Decoder.scala:178)
	at io.circe.SeqDecoder.apply(SeqDecoder.scala:17)
	at io.circe.Decoder$class.tryDecode(Decoder.scala:36)
	at io.circe.SeqDecoder.tryDecode(SeqDecoder.scala:6)
	at com.target.data_validator.ConfigParser$anon$importedDecoder$macro$81$1$$anon$10.apply(ConfigParser.scala:28)
	at io.circe.generic.decoding.DerivedDecoder$$anon$1.apply(DerivedDecoder.scala:13)
	at io.circe.Json.as(Json.scala:106)
	at com.target.data_validator.ConfigParser$.configFromJson(ConfigParser.scala:28)
	at com.target.data_validator.ConfigParser$$anonfun$parse$1.apply(ConfigParser.scala:65)
	at com.target.data_validator.ConfigParser$$anonfun$parse$1.apply(ConfigParser.scala:65)
	at cats.syntax.EitherOps$.flatMap$extension(either.scala:149)
	at com.target.data_validator.ConfigParser$.parse(ConfigParser.scala:65)
	at com.target.data_validator.ConfigParser$.parseFile(ConfigParser.scala:60)
	at com.target.data_validator.Main$.loadConfigRun(Main.scala:23)
	at com.target.data_validator.Main$.main(Main.scala:171)
	at com.target.data_validator.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: requirement failed: Literal must have a corresponding value to bigint, but class Integer found.
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:219)
	at org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:296)
	at org.apache.spark.sql.catalyst.expressions.Literal$.create(literals.scala:144)
	at com.target.data_validator.validator.ValidatorBase$.<init>(ValidatorBase.scala:139)
	at com.target.data_validator.validator.ValidatorBase$.<clinit>(ValidatorBase.scala)
	... 47 more
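
A note on the "Caused by" line above: the message comes from Spark's Literal validation, which in this Spark version requires the JVM value to match the declared Catalyst type (bigint is LongType, so a Long is expected). The sketch below is only one reading of the trace, not the code at ValidatorBase.scala:139, whose contents are not shown here. It assumes a spark-shell session on Spark 3.1.2 and assumes the failing call pairs an Int value with LongType; the names `mismatched` and `matched` are illustrative.

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.LongType
import scala.util.Try

// Assumption: an Int value is paired with LongType, as the error message suggests.
// Literal.create runs the same validateLiteralValue check seen in the trace.
val mismatched = Try(Literal.create(1, LongType))
println(s"Int with LongType failed: ${mismatched.isFailure}")
mismatched.failed.foreach(e => println(e.getMessage))

// A Long value paired with LongType satisfies the same check.
val matched = Literal.create(1L, LongType)
println(s"Long with LongType has data type: ${matched.dataType}")
```

Because the exception is thrown from the ValidatorBase$ static initializer (<clinit>), the JVM reports it as ExceptionInInitializerError. The trace also shows it happening inside ConfigParser$.parse, which suggests the parquet file itself had not been read yet.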

I submitted the job as follows:

spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml

The config.yaml file has the following content:

numKeyCols: 2
numErrorsToReport: 742

tables:
  - parquetFile: /home/jyoti/Spark/userdata1.parquet
    checks:
      - type: nullCheck
        column: salary

I downloaded userdata1.parquet from the following GitHub link:
https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet

Environment details:
data-validator 0.13.0, built from the latest source code
Spark 3.1.2 (spark-3.1.2-bin-hadoop3.2)
Lubuntu 18.04 LTS x64 on VMware Player
4 CPU cores and 2 GB RAM
Java version:

jyoti@lubuntu:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

lsb_release output:

jyoti@lubuntu:~$ lsb_release -a 2>/dev/null
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04 LTS
Release:	18.04
Codename:	bionic

uname -s:

jyoti@lubuntu:~$ uname -s
Linux

sbt -version:

root@lubuntu:/home/jyoti/Spark# sbt -version
downloading sbt launcher 1.6.1
[info] [launcher] getting org.scala-sbt sbt 1.6.1  (this may take some time)...
[info] [launcher] getting Scala 2.12.15 (for sbt)...
sbt version in this project: 1.6.1
sbt script version: 1.6.1

Please let me know if you need anything else.
