
Update Spark Version 2.2 #43

Open
DataWanderer opened this issue Mar 30, 2018 · 6 comments

@DataWanderer

No description provided.

@traviscrawford (Owner) commented Mar 30, 2018 via email

@DataWanderer (Author) commented Mar 31, 2018 via email

@traviscrawford (Owner)

Looking at https://github.com/traviscrawford/spark-dynamodb/blob/master/pom.xml we see there's no explicit Guava dependency, so whatever version Spark brings is what's used.

Does your application override the Guava version, either explicitly or transitively from some other dependency? What version of Guava is on your application's classpath, and what's pulling it in?
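
For reference, a quick way to answer that on a running cluster is to ask the JVM which jar actually provides RateLimiter, on both the driver and the executors. A minimal diagnostic sketch, assuming a spark-shell session where sc is available:

// Diagnostic sketch: print the jar that provides Guava's RateLimiter.
import com.google.common.util.concurrent.RateLimiter

// Driver classpath
println(classOf[RateLimiter].getProtectionDomain.getCodeSource.getLocation)

// Executor classpath
sc.parallelize(Seq(1)).map { _ =>
  classOf[RateLimiter].getProtectionDomain.getCodeSource.getLocation.toString
}.collect().foreach(println)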

@DataWanderer (Author)

The application does not override the Guava version. But somehow Spark 2.2.1 and 2.2.0 on EMR give this issue.

@klamouri commented Oct 16, 2018

It looks like it's not limited to EMR. The same issue appears on Databricks.

Here is my code:

import com.github.traviscrawford.spark.dynamodb.DynamoScanner
val rdd = DynamoScanner(sc, "table", 8, 1000, None, Option(50), Option("us-east-1"))
rdd.collect()

And the output:

Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.RateLimiter.acquire(I)D
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcDI$sp(DynamoScanner.scala:67)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3$$anonfun$apply$1.apply(DynamoScanner.scala:66)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:66)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1$$anonfun$apply$3.apply(DynamoScanner.scala:60)
	at scala.Option.foreach(Option.scala:257)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:60)
	at com.github.traviscrawford.spark.dynamodb.DynamoScanner$$anonfun$com$github$traviscrawford$spark$dynamodb$DynamoScanner$$scan$1.apply(DynamoScanner.scala:58)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:947)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2159)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:111)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:354)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I'm running it on a Databricks cluster using their 3.5 LTS runtime (includes Apache Spark 2.2.1, Scala 2.11).
The only dependency attached to the cluster is spark-dynamodb-0.0.13.

Can I ask what kind of cluster you are running those Spark jobs on, @traviscrawford? EMR, Databricks, or a self-maintained cluster?

EDIT: It's worth noting that this only occurs when the rate limit passed in isn't None.
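
As a stopgap sketch (assuming the Option(50) in the call above is the rate-limit argument), leaving the rate limit as None avoids the RateLimiter.acquire call entirely:

import com.github.traviscrawford.spark.dynamodb.DynamoScanner
// Workaround sketch: pass None for the (assumed) rate-limit parameter so the
// Guava RateLimiter is never constructed or invoked.
val rdd = DynamoScanner(sc, "table", 8, 1000, None, None, Option("us-east-1"))
rdd.collect()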

@klamouri

Here is how I fixed it.

I added spark-dynamodb-0.0.13 to my application along with guava:16, and then shaded Guava 16 so it would not conflict with the Databricks-provided version. The issue occurs because RateLimiter.acquire(int) returned void before Guava 16 and returns a double since Guava 16.

I think there is something funky with the Scala compilation. The stack trace above means the library code is trying to call an acquire method that takes an int and returns a double, which is not possible on a stock version of Spark (Guava is stuck at 15 because of Hadoop, if I'm not mistaken). Hence, shading Guava 16 solves the issue.
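
For reference, a sketch of what that shading looks like with sbt-assembly; the artifact coordinates and the shaded package name below are illustrative, not taken from this thread:

// build.sbt sketch (assumes the sbt-assembly plugin is enabled).
libraryDependencies ++= Seq(
  "com.github.traviscrawford" %% "spark-dynamodb" % "0.0.13",  // illustrative coordinates
  "com.google.guava" % "guava" % "16.0.1"
)

assemblyShadeRules in assembly := Seq(
  // Relocate the bundled Guava 16 classes and rewrite all references to them
  // (including spark-dynamodb's RateLimiter calls), so they cannot clash with
  // the Guava version the cluster already provides.
  ShadeRule.rename("com.google.common.**" -> "shadedguava.@1").inAll
)

A Maven build would do the same thing with the maven-shade-plugin's relocation feature.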
