
Update Spark Version 2.2 #43

Open
DataWanderer opened this issue Mar 30, 2018 · 6 comments

@DataWanderer

No description provided.

@traviscrawford (Owner) commented Mar 30, 2018 via email

@DataWanderer (Author) commented Mar 31, 2018 via email

@traviscrawford (Owner)

Looking at https://github.com/traviscrawford/spark-dynamodb/blob/master/pom.xml we see there's no explicit Guava dependency, so whatever version Spark brings is what's used.

Does your application override the Guava version, either explicitly or transitively from some other dependency? What version of Guava is on your application's classpath, and what's pulling it in?
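
For reference, a quick way to answer that on a running cluster is to ask the JVM which jar actually provides RateLimiter, on both the driver and the executors. A minimal diagnostic sketch, assuming a spark-shell session where sc is available:

// Diagnostic sketch: print the jar that provides Guava's RateLimiter.
import com.google.common.util.concurrent.RateLimiter

// Driver classpath
println(classOf[RateLimiter].getProtectionDomain.getCodeSource.getLocation)

// Executor classpath
sc.parallelize(Seq(1)).map { _ =>
  classOf[RateLimiter].getProtectionDomain.getCodeSource.getLocation.toString
}.collect().foreach(println)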

@DataWanderer (Author)

The application does not override the Guava version. But somehow Spark 2.2.1 and 2.2.0 on EMR give this issue.

@klamouri commented Oct 16, 2018

It looks like it's not limited to EMR. The same issue appears on Databricks.

Here is my code:

import com.github.traviscrawford.spark.dynamodb.DynamoScanner
val rdd = DynamoScanner(sc, "table", 8, 1000, None, Option(50), Option("us-east-1"))
rdd.collect()

And the output:

Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcDI$sp(DynamoScanner.scala:67)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:60)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:60)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:58)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:354)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I'm running it on a Databricks cluster using their 3.5 LTS runtime (includes Apache Spark 2.2.1, Scala 2.11).
The only dependency attached to the cluster is spark-dynamodb-0.0.13.

Can I ask what kind of cluster you are running those Spark jobs on, @traviscrawford? EMR, Databricks, or a self-maintained cluster?

EDIT: It's worth noting that this only occurs when the rate limit passed in isn't None.
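
As a stopgap sketch (assuming the Option(50) in the call above is the rate-limit argument), leaving the rate limit as None avoids the RateLimiter.acquire call entirely:

import com.github.traviscrawford.spark.dynamodb.DynamoScanner
// Workaround sketch: pass None for the (assumed) rate-limit parameter so the
// Guava RateLimiter is never constructed or invoked.
val rdd = DynamoScanner(sc, "table", 8, 1000, None, None, Option("us-east-1"))
rdd.collect()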

@klamouri

Here is how I fixed it.

I added spark-dynamodb-0.0.13 to my application along with guava:16, and then shaded Guava 16 so it would not conflict with the Databricks-provided version. The issue occurs because RateLimiter.acquire(int) returned void before Guava 16 and returns a double since Guava 16.

I think there is something funky with the Scala compilation. The stack trace above means the library code is trying to call an acquire method that takes an int and returns a double, which is not possible on a stock version of Spark (Guava is stuck at 15 because of Hadoop, if I'm not mistaken). Hence, shading Guava 16 solves the issue.
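
For reference, a sketch of what that shading looks like with sbt-assembly; the artifact coordinates and the shaded package name below are illustrative, not taken from this thread:

// build.sbt sketch (assumes the sbt-assembly plugin is enabled).
libraryDependencies ++= Seq(
  "com.github.traviscrawford" %% "spark-dynamodb" % "0.0.13",  // illustrative coordinates
  "com.google.guava" % "guava" % "16.0.1"
)

assemblyShadeRules in assembly := Seq(
  // Relocate the bundled Guava 16 classes and rewrite all references to them
  // (including spark-dynamodb's RateLimiter calls), so they cannot clash with
  // the Guava version the cluster already provides.
  ShadeRule.rename("com.google.common.**" -> "shadedguava.@1").inAll
)

A Maven build would do the same thing with the maven-shade-plugin's relocation feature.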
