I defined an Avro schema and used SBT Avrohugger to generate the Scala code. So far, serialization and deserialization work on my local machine. I am doing something like this:
```scala
val x: Array[Byte] = ... // Get the serialized data
val myThing = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x)
```
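For context, the full round trip looks roughly like this. This is a minimal sketch: I am assuming here that `MyAvroThing` is the Avrohugger-generated case class and that `foo`, `bar`, and `quux` (the field names that show up in the schema dump further down) are plain string fields.

```scala
import scala.util.Try
import com.twitter.bijection.Injection
import com.twitter.bijection.avro.SpecificAvroCodecs

// Build the codec once; Injection[A, B] is both the A => B serializer
// and (via invert) the B => Try[A] deserializer.
val codec: Injection[MyAvroThing, Array[Byte]] =
  SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$)

// Field names taken from the schema; string-typed values are an assumption.
val thing = MyAvroThing(foo = "f", bar = "b", quux = "q")

val bytes: Array[Byte] = codec(thing)            // serialize
val back: Try[MyAvroThing] = codec.invert(bytes) // deserialize
assert(back.get == thing)
```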
When I run this locally, it works perfectly. I then packaged the code as a Spark job with the SBT Assembly plugin. When I submit this job locally (using `spark-submit --master local[*]`), the deserialization still works. However, when I submit it to a "real" Spark installation, I get a ClassCastException (CCE):
```
Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.example.avro.MyAvroThing
```
So the deserializer does not produce my specific record type and instead falls back to a generic Avro type. I double-checked that all the necessary Avro libraries and Twitter's Bijection-Avro are correctly embedded in the resulting JAR.
As a next step in the investigation, I analyzed the `GenericData.Record` I get by doing:

```scala
import scala.collection.JavaConverters._
import scala.util.Try
import org.apache.avro.generic.GenericData
import com.twitter.bijection.avro.SpecificAvroCodecs

// Cast to Try[Any] so the match can distinguish what invert really returned.
val mystery = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$)
  .invert(x).asInstanceOf[Try[Any]]

mystery.get match {
  case _: MyAvroThing =>
    println("ok!")
  case r: GenericData.Record =>
    println("Got a generic record with schema: " +
      r.getSchema.getFields.asScala.map(_.name()).mkString(", "))
  case _ =>
    println("Got something completely different")
}
```
When I run this locally, it prints `ok!`, as it correctly gets the `MyAvroThing`. When I run it on the Spark cluster, I get:
```
Got a generic record with schema: foo, bar, quux
```
This means my schema IS honored by the deserializer and the data is decoded correctly; only the conversion to the generated class somehow does not happen.
When I query the record's fields by name, I get exactly the data I would expect in my `MyAvroThing`.
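For reference, that by-name access looks like this (a sketch; the field names come from the schema dump above, and the string types are my assumption):

```scala
import org.apache.avro.generic.GenericData

val r = mystery.get.asInstanceOf[GenericData.Record]

// Avro decodes string fields as org.apache.avro.util.Utf8, so call
// toString before comparing against a Scala String.
val foo  = r.get("foo").toString
val bar  = r.get("bar").toString
val quux = r.get("quux").toString
```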
What is going wrong here?
I wonder if this could be a classpath issue: locally you have one version of Avro, on the cluster you have another, and the mismatch shows up as a runtime error.
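One way to check that, as a sketch using a standard JVM diagnostic (nothing Bijection-specific), is to print which jar each relevant class is actually loaded from on the driver and the executors:

```scala
// If SpecificData resolves to a jar under the Spark installation rather than
// your assembly, the cluster's Avro copy wins and specific-record resolution
// can silently fall back to GenericData.Record.
def jarOf(c: Class[_]): String =
  Option(c.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)
    .getOrElse("<bootstrap/unknown>")

println(jarOf(classOf[org.apache.avro.specific.SpecificData]))
println(jarOf(classOf[com.example.avro.MyAvroThing]))
```

If the cluster's copy does win, the usual things to try are `spark.driver.userClassPathFirst=true` / `spark.executor.userClassPathFirst=true`, or shading Avro inside the assembly.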