Hiveless is a Scala library for working with Spark and Hive using a more expressive typed API. It adds typed HiveUDFs and implements Spatial Hive UDFs. It consists of the following modules:
- hiveless-core with the typed Hive UDFs API and the initial base set of codecs
- hiveless-jts with the TWKB JTS encoding support
- hiveless-spatial with Hive GIS UDFs (depends on GeoMesa)
- hiveless-spatial-index with extra Hive GIS UDFs that may be used for GIS indexing purposes (depends on GeoMesa and GeoTrellis)

There is also a forked release, CartoDB/analytics-toolbox-databricks, which is a complete copy of hiveless-spatial and hiveless-spatial-index at this point. However, it may contain extended GIS functionality in the future.
To use Hiveless in your project, add the following to your build.sbt file as needed:
resolvers ++= Seq(
// for snapshot artifacts only
"oss-sonatype" at "https://oss.sonatype.org/content/repositories/snapshots"
)
libraryDependencies ++= List(
"com.azavea" %% "hiveless-core" % "<latest version>",
"com.azavea" %% "hiveless-spatial" % "<latest version>",
"com.azavea" %% "hiveless-spatial-index" % "<latest version>"
)
CREATE OR REPLACE FUNCTION st_geometryFromText as 'com.azavea.hiveless.spatial.ST_GeomFromWKT';
CREATE OR REPLACE FUNCTION st_intersects as 'com.azavea.hiveless.spatial.ST_Intersects';
CREATE OR REPLACE FUNCTION st_simplify as 'com.azavea.hiveless.spatial.ST_Simplify';
-- ...and more
The full list of supported functions can be found here.
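Once registered, these functions can be called from Spark SQL like any other Hive UDFs. Below is a minimal usage sketch; the polygons table and its WKT geom_wkt column are placeholder names for illustration, not part of Hiveless:

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark: SparkSession = ???

// Select rows whose geometry intersects a query polygon, using the UDFs
// registered above; `polygons` and `geom_wkt` are hypothetical names.
val intersecting: DataFrame = spark.sql(
  """
    |SELECT *
    |FROM polygons
    |WHERE st_intersects(
    |  st_geometryFromText(geom_wkt),
    |  st_geometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))')
    |)
    |""".stripMargin
)

intersecting.show()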
There are two supported optimizations, ST_Intersects and ST_Contains, which allow Spark to push down predicates when possible.
To enable optimizations:
import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.rules.SpatialFilterPushdownRules

val spark: SparkSession = ???

// Register the pushdown rules on the session's SQLContext
SpatialFilterPushdownRules.registerOptimizations(spark.sqlContext)
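With the rules registered, a spatial predicate such as st_intersects can be pushed down to the data source rather than evaluated on every row. As a hedged sketch (the buildings table is a placeholder, and whether the filter is actually pushed down depends on the underlying source), the physical plan can be inspected to confirm it:

// Check the physical plan for a pushed-down spatial filter
spark.sql(
  """
    |SELECT *
    |FROM buildings
    |WHERE st_intersects(geom, st_geometryFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))'))
    |""".stripMargin
).explain(true)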
It is also possible to enable the optimizations through the Spark configuration, via the optimizations injector:
import org.apache.spark.SparkConf
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

val conf: SparkConf = ???
conf.set("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
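The same property can also be supplied when the session is built, so every SparkSession created from the builder picks up the rules; a sketch with placeholder master and application name:

import org.apache.spark.sql.SparkSession
import com.azavea.hiveless.spark.sql.SpatialFilterPushdownOptimizations

// Any session created by this builder injects the spatial filter pushdown
// optimizations via spark.sql.extensions.
val spark: SparkSession = SparkSession
  .builder()
  .master("local[*]")                  // placeholder master
  .appName("hiveless-spatial-example") // placeholder application name
  .config("spark.sql.extensions", classOf[SpatialFilterPushdownOptimizations].getName)
  .getOrCreate()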
Code is provided under the Apache 2.0 license, available at http://opensource.org/licenses/Apache-2.0 as well as in the LICENSE file. This is the same license used by Spark.