Crop with bounds leads to a JVM crash for files with 500M points #14
Hm, if you have a smaller file that reproduces this issue, it would be really helpful.
The issue is that it only happens with big files.
Then it could be some memory allocation issue.
When we run it on a small file we do not get a crash. However, if we check the number of points returned using the metadata, it does not reflect the crop output:
It should be the cropped count. To obtain this result we use the RDD defined above and do:
In case we write the result of the pipeline into a local file, its metadata has the correct number of points (that's how we know the crop worked out). We think the metadata in the RDD should also be updated when we do a crop.
@pomadchin I have isolated the code and now it should be easy to debug the issue. I am using an Ubuntu virtual machine with 4 cores, 32 GB of main memory, and 48 GB of swap. The data can be obtained from here (it has around 500M points):
Then the following code was executed in a Jupyter notebook running a Scala kernel. The code is a light version of your code from PointCloudInputFormat.scala:

```scala
import io.pdal.pipeline.{CropFilter, LasRead, LasWrite}
import io.pdal.{Pipeline, PointView}
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConversions._
import java.util

val compression_type: Option[String] = Option("laszip")
val polygon_str: Option[String] = Option("POLYGON ((128000.0 479000.0, 128000.0 485000.0, 136000.0 485000.0, 136000.0 479000.0, 128000.0 479000.0))")

val pipelineExpr = LasRead("/data/local/home/ecolidar/C_25GN2.laz", compression = compression_type, spatialreference = Option("EPSG:4326")) ~ CropFilter(polygon = polygon_str)
val pipeline = pipelineExpr.toPipeline
pipeline.execute()

// If a filter extent is set, don't actually load points.
val (pointViewIterator: util.Iterator[PointView], disposeIterator): (util.Iterator[PointView], () => Unit) = {
  val pvi = pipeline.getPointViews()
  (pvi.asInstanceOf[util.Iterator[PointView]], pvi.dispose _)
}

// conversion to list to load everything into JVM memory
val pointClouds = pointViewIterator.toList.map { pointView =>
  val pointCloud = pointView.getPointCloud()
  pointView.dispose()
  pointCloud
}.iterator
```

The pipeline execution consumed all memory plus 1.5 GB of swap space. The issue is the conversion to a list to load everything into JVM memory: while doing it the JVM crashes.
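The crash happens while eagerly materializing every point view via `toList`. One alternative pattern is to consume the iterator lazily, disposing each view before moving on, so only one packed buffer is alive at a time. This is a hedged sketch with a stand-in `Disposable` interface, not the real io.pdal API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LazyViews {
    // Stand-in for a point view: holds a payload and must be disposed.
    interface Disposable {
        byte[] payload();
        void dispose();
    }

    /** Processes each view and disposes it immediately, instead of
     *  collecting every payload into one in-memory list first. */
    static long totalBytes(Iterator<? extends Disposable> views) {
        long total = 0;
        while (views.hasNext()) {
            Disposable v = views.next();
            total += v.payload().length; // process this view's points
            v.dispose();                 // release native memory before the next view
        }
        return total;
    }

    public static void main(String[] args) {
        List<Disposable> demo = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            byte[] data = new byte[10];
            demo.add(new Disposable() {
                public byte[] payload() { return data; }
                public void dispose() { /* native release would go here */ }
            });
        }
        System.out.println(totalBytes(demo.iterator())); // 30
    }
}
```

This keeps peak memory proportional to the largest single view rather than the whole file, which is the shape of workaround the thread discusses.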
The answer in your case is that there is nothing smart to do about it; the obvious workarounds in this case would be:
It doesn't look like too much. Also, a possible workaround is to think about better structures to store points, but the real solution would be number 2. P.S. In your code you don't need that extra call.
@pomadchin I decided to dive a bit into this issue because I think lack of memory can't be the only reason. Take the following file:
It is 440 MB in LAZ format and stores 91M points. Executing exactly the same code as above, even without cropping, leads to the crash.
The machine has 32 GB of main memory, so we think there should be enough memory to process 91M points. The pipeline execution consumes 16% of the memory; the crash happens when 30% of memory is used (as reported by top), during the copy to the JVM. The reported error is:
We think the issue is something other than a lack of memory.
What's inside of the hs_err log? You said it crashes even without a crop; without a crop there would be more points. Try to crop by a small area, I wonder what the result would be.
hs_err_pid21764.log is in the attachments. For the large file, 500M points, we did a crop to 15M and even after that it crashed. We assumed there was some intermediate being created and that it was the reason for the JVM crash. Hence, we decided to test without cropping, just a clean read, and that is why we wrote "even without cropping". However, if we crop at most 70% of the points it does not crash. Sorry for the incomplete info.
How many dimensions does this LAZ file have?
Have you tried to calculate how much memory such a 15-million-element array should occupy?
It has 13 dimensions. Assuming 6 of them are doubles and the other 7 are integers (we are overestimating), a record would be 6 × 8 + 7 × 4 = 76 bytes, so 76 bytes × 91M ≈ 6.6 GB, which is near 20% of our memory (top reported 16% after the pipeline execution). Maybe this info helps; these are the dimensions reported by the metadata:
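To sanity-check the back-of-envelope figure, here is the same arithmetic in a few lines of Java. The "6 doubles plus 7 four-byte ints" layout is the assumption from the comment above, not the file's real schema; the decimal result, about 6.9 GB, is in the same ballpark as the 6.6 GB quoted:

```java
public class PointMemoryEstimate {
    public static void main(String[] args) {
        long points = 91_000_000L;                // points in the 440 MB LAZ file
        int bytesPerPoint = 6 * 8 + 7 * 4;        // 6 doubles + 7 four-byte ints = 76 bytes
        long totalBytes = points * bytesPerPoint; // 6,916,000,000 bytes, roughly 6.9 GB
        System.out.println(bytesPerPoint + " bytes per point, "
                + totalBytes + " bytes total");
    }
}
```

Note that the total must be computed in a `long`; in a 32-bit `int` this multiplication would overflow.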
Can you try to select only X, Y and Z and check the behaviour?
So it stores all points in a single byte array; the issue happens at this point: https://github.com/PDAL/java/blob/caa95c9c9f73f341fd08db99c86e3173cb223868/native/src/io_pdal_PointView.cpp#L169 You can try to insert some debugging print here, to print the buffer length it would try to allocate. Taking into account your input, 91M points times the packed point size in bytes can easily exceed what a single Java array can hold. Remark: the max array length on the JVM is close to Integer.MAX_VALUE (2^31 - 1) elements.
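For concreteness, the overflow condition can be checked with plain Java. The names `points`, `pointSize` and `MAX_ARRAY_LENGTH` here are illustrative stand-ins, not the actual variables in io_pdal_PointView.cpp:

```java
public class BufSizeCheck {
    // Conservative JVM array-length ceiling; the exact limit is VM-specific.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    public static void main(String[] args) {
        long points = 91_000_000L; // points in the example file
        long pointSize = 76L;      // assumed packed size per point in bytes
        long bufSize = points * pointSize; // 6,916,000,000 bytes
        if (bufSize > MAX_ARRAY_LENGTH) {
            System.out.println("bufSize " + bufSize
                    + " exceeds the max byte[] length " + MAX_ARRAY_LENGTH);
        } else {
            System.out.println("allocation of " + bufSize + " bytes is representable");
        }
    }
}
```

With these assumed numbers the buffer would be more than three times the maximum `byte[]` length, so the single-array allocation cannot succeed regardless of how much RAM the machine has.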
A silly question. I have modified the sources you mentioned, then I did:
And then I thought:
How should I compile to get the native library compiled?
You don't have to build the assembly; just publish everything locally using this script: https://github.com/PDAL/java/blob/master/scripts/publish-local.sh
@pomadchin thanks, that did the job. However, I do not see anything being printed. The following line was added:
With strace I see that the new library is picked up:
It is strange that I do not see it in the output.
Can you show me the code?
The Scala code we use is the following:
I meant the new C function body, sorry for the confusion. And what's the new error, including the stacktrace in the log file?
The C function looks like this:
And strace.zip was obtained like this:
In strace you can see it picked up that one.
Ok, and what's the printed length? You may try to kill the process right after printing the length.
@pomadchin I injected a print statement where we get the buffer length. That showed me that the print statement should have been added in GetPackedPoints instead. Anyway, here is the important output:
It does not look like a big allocation.
Ah yes, sorry for the confusion. Ok, reread this comment: bufSize should be the number of points times the packed point size.
Yeah, #11 solved the problem of copying a big file into the local fs. PDAL can work only with files on a local fs while Hadoop doesn't care, so I decided to use streams here to copy files instead of common byte arrays, though it adds some constraints on this stream usage; it should be used carefully and we should definitely know why that's necessary. #12 would resolve not only the issue of better parallelism but also the data distribution problem: we can consistently spread points across the cluster to avoid the situation that happened in the current post.
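The stream-based copy described above, moving a possibly remote file to the local fs without ever buffering it in one byte array, can be sketched with the standard library. This is a generic java.nio version under our own assumptions, not the actual code from #11:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamCopy {
    /** Copies a (possibly remote) input stream to a local file in chunks,
     *  so the whole file never has to fit in a single byte array. */
    public static long copyToLocal(InputStream in, Path localTarget) throws IOException {
        // Files.copy streams through a small internal buffer.
        return Files.copy(in, localTarget, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Demo with two local temp files standing in for HDFS source and local target.
        Path src = Files.createTempFile("points", ".laz");
        Files.write(src, new byte[]{1, 2, 3, 4});
        Path dst = Files.createTempFile("local", ".laz");
        try (InputStream in = Files.newInputStream(src)) {
            long copied = copyToLocal(in, dst);
            System.out.println("copied " + copied + " bytes"); // copied 4 bytes
        }
    }
}
```

In the Spark setting the `InputStream` would come from the Hadoop filesystem API instead of a local file, but the constraint the comment mentions stays the same: the stream is consumed once and must be closed by the caller.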
We will see if we can get something done for #12, but first we need to understand the code. We will get back to this issue in the coming days.
@pomadchin looking at the code, we will need some time to get #12 done; not sure if we have time for that before July or even August. From what we saw, the issue is that the generation of the RDD would be done by a single executor, because the file is downloaded first and multiple downloads are not recommended; plus we have the pipeline execution, which should be done over the entire LAS/LAZ file and not over portions of it. Please correct me if I misunderstood something.
It turned out that PDAL supports direct reads from remote storage. Maybe it makes sense to think about adding HDFS / Hadoop-compatible backend support into https://github.com/PDAL/PDAL/blob/master/vendor/arbiter/arbiter.cpp#L131-L168
The following read:
Leads to the following error:
The content of the log file hs_err_pid25838.log is in the attachments.
Any ideas or suggestions for such a crash?
We have a machine with 32 GB; the worker uses 30 GB and the executor 30 GB, that is, one executor per worker.
If we need to provide more information please let us know.