
Commit df01b39

Merge pull request #654 from NVIDIA/branch-24.04

release 24.04 [skip ci]

YanxuanLiu authored May 10, 2024
2 parents e0f644d + ac4785c, commit df01b39
Showing 44 changed files with 3,893 additions and 291 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/auto-merge.yml
@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-24.02
+      - branch-24.04
     types: [closed]

 jobs:
@@ -27,16 +27,16 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
-          ref: branch-24.02 # force to fetch from latest upstream instead of PR ref
+          ref: branch-24.04 # force to fetch from latest upstream instead of PR ref

       - name: auto-merge job
         uses: ./.github/workflows/auto-merge
         env:
           OWNER: NVIDIA
           REPO_NAME: spark-rapids-ml
-          HEAD: branch-24.02
-          BASE: branch-24.04
+          HEAD: branch-24.04
+          BASE: branch-24.06
           AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
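The HEAD/BASE bump above moves the auto-merge target one release forward. As an illustration only (this helper is hypothetical, not part of the repo), the branch names follow a `branch-YY.MM` scheme that advances in two-month steps, which is visible across this commit (24.02 to 24.04 to 24.06):

```python
# Hypothetical helper mirroring the HEAD/BASE bump in the workflow above.
# Release branches use a branch-YY.MM scheme advancing two months at a time,
# rolling into the next year after month 12 (e.g. branch-24.12 -> branch-25.02).
def next_release_branch(branch: str) -> str:
    prefix, version = branch.rsplit("-", 1)
    year, month = (int(part) for part in version.split("."))
    month += 2
    if month > 12:
        month -= 12
        year += 1
    return f"{prefix}-{year:02d}.{month:02d}"

print(next_release_branch("branch-24.04"))  # -> branch-24.06
```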

2 changes: 0 additions & 2 deletions .github/workflows/blossom-ci.yml
@@ -44,8 +44,6 @@ jobs:
       GaryShen2008,\
       NvTimLiu,\
       YanxuanLiu,\
-      zhanga5,\
-      Er1cCheng,\
       ', format('{0},', github.actor)) && github.event.comment.body == 'build'
     steps:
       - name: Check if comment is issued by authorized person
2 changes: 1 addition & 1 deletion .github/workflows/gcs-benchmark.yml
@@ -39,7 +39,7 @@ jobs:
       SERVICE_ACCOUNT: ${{ secrets.GCLOUD_SERVICE_ACCOUNT }}
       CLUSTER_NAME: github-spark-rapids-ml-${{github.run_number}}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - name: run benchmark
         shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/signoff-check.yml
@@ -23,7 +23,7 @@ jobs:
   signoff-check:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4

       - name: sigoff-check job
         uses: ./.github/workflows/signoff-check
5 changes: 3 additions & 2 deletions README.md
@@ -35,16 +35,17 @@ The following table shows the currently supported algorithms. The goal is to ex
 | Supported Algorithms   | Python | Scala |
 | :--------------------- | :----: | :---: |
 | CrossValidator         |   √    |       |
+| DBSCAN (*)             |   √    |       |
 | KMeans                 |   √    |       |
-| k-NN (*)               |   √    |       |
+| approx/exact k-NN (*)  |   √    |       |
 | LinearRegression       |   √    |       |
 | LogisticRegression     |   √    |       |
 | PCA                    |   √    |   √   |
 | RandomForestClassifier |   √    |       |
 | RandomForestRegressor  |   √    |       |
 | UMAP (*)               |   √    |       |

-Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library.
+Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library. As an alternative to KMeans, we also provide a Spark API for GPU accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a density based clustering algorithm in the RAPIDS cuML library.

 ## Getting started
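The new table row and README note position DBSCAN as a density-based alternative to KMeans. The real implementation lives in RAPIDS cuML and is exposed through a Spark API; purely as an illustration of the density-based idea, here is a toy from-scratch DBSCAN (stdlib only, not the cuML algorithm) that finds two dense clumps and flags an isolated point as noise, with no cluster count given up front:

```python
# Toy DBSCAN sketch, illustrating the density-based idea only; the actual
# spark-rapids-ml DBSCAN is backed by RAPIDS cuML on GPU.
# Labels: cluster ids starting at 0, with -1 marking noise points.
from math import dist

def dbscan(points, eps=1.0, min_samples=3):
    n = len(points)
    # Precompute each point's eps-neighborhood (including itself).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = not yet assigned
    cluster = -1
    for i in range(n):
        if labels[i] is not None or len(neighbors[i]) < min_samples:
            continue  # already clustered, or not a core point
        cluster += 1  # start a new cluster from this core point
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                frontier.extend(neighbors[j])  # core point: keep expanding
    return [-1 if lbl is None else lbl for lbl in labels]

pts = [(0, 0), (0, 1), (1, 0), (1, 1),  # dense clump A
       (8, 8), (8, 9), (9, 8), (9, 9),  # dense clump B
       (4, 12)]                         # isolated point
labels = dbscan(pts, eps=1.5, min_samples=3)
print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike KMeans, which partitions all points among a fixed number of centroids, the density criterion discovers the cluster count and leaves low-density points unassigned.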
2 changes: 1 addition & 1 deletion ci/Dockerfile
@@ -37,6 +37,6 @@ RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86
     && conda config --set solver libmamba

 # install cuML
-ARG CUML_VER=24.02
+ARG CUML_VER=24.04
 RUN conda install -y -c rapidsai -c conda-forge -c nvidia cuml=$CUML_VER python=3.9 cuda-version=11.8 \
     && conda clean --all -f -y
21 changes: 11 additions & 10 deletions ci/Jenkinsfile.premerge
@@ -1,6 +1,6 @@
 #!/usr/local/env groovy
 /*
- * Copyright (c) 2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2024, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -40,7 +40,7 @@ pipeline {
     agent {
         kubernetes {
             label "premerge-init-${BUILD_TAG}"
-            cloud 'sc-ipp-blossom-prod'
+            cloud "${common.CLOUD_NAME}"
             yaml cpuImage
         }
     }
@@ -87,7 +87,7 @@ pipeline {

             def title = githubHelper.getIssue().title
             if (title ==~ /.*\[skip ci\].*/) {
-                githubHelper.updateCommitStatus("$BUILD_URL", "Skipped", GitHubCommitState.SUCCESS)
+                githubHelper.updateCommitStatus("", "Skipped", GitHubCommitState.SUCCESS)
                 currentBuild.result == "SUCCESS"
                 skipped = true
                 return
@@ -107,7 +107,7 @@ pipeline {
         agent {
             kubernetes {
                 label "premerge-docker-${BUILD_TAG}"
-                cloud 'sc-ipp-blossom-prod'
+                cloud "${common.CLOUD_NAME}"
                 yaml pod.getDockerBuildYAML()
                 workspaceVolume persistentVolumeClaimWorkspaceVolume(claimName: "${PVC}", readOnly: false)
                 customWorkspace "${CUSTOM_WORKSPACE}"
@@ -116,7 +116,7 @@ pipeline {

         steps {
             script {
-                githubHelper.updateCommitStatus("$BUILD_URL", "Running - preparing", GitHubCommitState.PENDING)
+                githubHelper.updateCommitStatus("", "Running - preparing", GitHubCommitState.PENDING)
                 checkout(
                     changelog: false,
                     poll: true,
@@ -169,7 +169,7 @@ pipeline {
         agent {
             kubernetes {
                 label "premerge-ci-${BUILD_TAG}"
-                cloud 'sc-ipp-blossom-prod'
+                cloud "${common.CLOUD_NAME}"
                 yaml pod.getGPUYAML("${IMAGE_PREMERGE}", "${env.GPU_RESOURCE}", '8', '32Gi')
                 workspaceVolume persistentVolumeClaimWorkspaceVolume(claimName: "${PVC}", readOnly: false)
                 customWorkspace "${CUSTOM_WORKSPACE}"
@@ -178,7 +178,7 @@ pipeline {

         steps {
             script {
-                githubHelper.updateCommitStatus("$BUILD_URL", "Running - tests", GitHubCommitState.PENDING)
+                githubHelper.updateCommitStatus("", "Running - tests", GitHubCommitState.PENDING)
                 container('gpu') {
                     timeout(time: 2, unit: 'HOURS') { // step only timeout for test run
                         common.resolveIncompatibleDriverIssue(this)
@@ -198,14 +198,15 @@ pipeline {
             }

             if (currentBuild.currentResult == "SUCCESS") {
-                githubHelper.updateCommitStatus("$BUILD_URL", "Success", GitHubCommitState.SUCCESS)
+                githubHelper.updateCommitStatus("", "Success", GitHubCommitState.SUCCESS)
             } else {
                 // upload log only in case of build failure
-                def guardWords = ["gitlab.*?\\.com", "urm.*?\\.com"]
+                def guardWords = ["gitlab.*?\\.com", "urm.*?\\.com", "sc-ipp-*"]
                 guardWords.add("nvidia-smi(?s)(.*?)(?=git)") // hide GPU info
-                guardWords.add("sc-ipp*") // hide cloud info
                 githubHelper.uploadLogs(this, env.JOB_NAME, env.BUILD_NUMBER, null, guardWords)

-                githubHelper.updateCommitStatus("$BUILD_URL", "Fail", GitHubCommitState.FAILURE)
+                githubHelper.updateCommitStatus("", "Fail", GitHubCommitState.FAILURE)
             }

             if (TEMP_IMAGE_BUILD) {
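On build failure, the pipeline uploads logs with the guardWords patterns redacted so internal hostnames, the cloud name, and nvidia-smi output never reach the public log. As a hedged sketch (the `redact` helper is illustrative, not the actual blossom plugin API), the same redaction can be expressed with regex substitution; the patterns mirror the guardWords list above, with the Groovy `\\.com` escapes written as `\.com` and the `(?s)` flag moved to the start of its pattern, since modern Python rejects mid-pattern global flags:

```python
import re

# Patterns mirroring the guardWords list in the Jenkinsfile diff above.
GUARD_WORDS = [
    r"gitlab.*?\.com",           # internal GitLab hosts
    r"urm.*?\.com",              # internal artifact hosts
    r"sc-ipp-*",                 # cloud name: "sc-ipp" plus optional trailing hyphens
    r"(?s)nvidia-smi.*?(?=git)", # nvidia-smi output up to the next "git" (hide GPU info)
]

def redact(log: str) -> str:
    """Illustrative helper: replace every guard-word match with a placeholder."""
    for pattern in GUARD_WORDS:
        log = re.sub(pattern, "[redacted]", log)
    return log

log = ("push to gitlab.internal.com on sc-ipp-blossom-prod\n"
       "nvidia-smi\nTesla V100\n"
       "git checkout")
print(redact(log))
```

Note the non-greedy `.*?` in the host patterns stops at the first `.com`, keeping each redaction as narrow as possible.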
2 changes: 1 addition & 1 deletion docker/Dockerfile.pip
@@ -18,7 +18,7 @@ ARG CUDA_VERSION=11.8.0
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

 ARG PYSPARK_VERSION=3.3.1
-ARG RAPIDS_VERSION=24.2.0
+ARG RAPIDS_VERSION=24.4.0
 ARG ARCH=amd64
 #ARG ARCH=arm64
 # Install packages to build spark-rapids-ml
2 changes: 1 addition & 1 deletion docker/Dockerfile.python
@@ -17,7 +17,7 @@
 ARG CUDA_VERSION=11.8.0
 FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04

-ARG CUML_VERSION=24.02
+ARG CUML_VERSION=24.04

 # Install packages to build spark-rapids-ml
 RUN apt update -y \
5 changes: 3 additions & 2 deletions docs/site/compatibility.md
@@ -11,16 +11,17 @@ The following table shows the currently supported algorithms. The goal is to ex
 | Supported Algorithms   | Python | Scala |
 | :--------------------- | :----: | :---: |
 | CrossValidator         |   √    |       |
+| DBSCAN (*)             |   √    |       |
 | KMeans                 |   √    |       |
-| k-NN (*)               |   √    |       |
+| approx/exact k-NN (*)  |   √    |       |
 | LinearRegression       |   √    |       |
 | LogisticRegression     |   √    |       |
 | PCA                    |   √    |   √   |
 | RandomForestClassifier |   √    |       |
 | RandomForestRegressor  |   √    |       |
 | UMAP (*)               |   √    |       |

-Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library.
+Note: Spark does not provide a k-Nearest Neighbors (k-NN) implementation, but it does have an [LSH-based Approximate Nearest Neighbor](https://spark.apache.org/docs/latest/ml-features.html#approximate-nearest-neighbor-search) implementation. As an alternative to PCA, we also provide a Spark API for GPU accelerated Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction algorithm in the RAPIDS cuML library. As an alternative to KMeans, we also provide a Spark API for GPU accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a density based clustering algorithm in the RAPIDS cuML library.


 ## Supported Versions
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
 project = 'spark-rapids-ml'
 copyright = '2024, NVIDIA'
 author = 'NVIDIA'
-release = '24.02.0'
+release = '24.04.0'

 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
6 changes: 5 additions & 1 deletion docs/source/spark_rapids_ml.rst
@@ -57,6 +57,8 @@ Clustering
    :template: autosummary/class_with_docs.rst
    :toctree: api

+   DBSCAN
+   DBSCANModel
    KMeans
    KMeansModel

@@ -85,6 +87,8 @@ Nearest Neighbors
    :template: autosummary/class_with_docs.rst
    :toctree: api

+   ApproximateNearestNeighbors
+   ApproximateNearestNeighborsModel
    NearestNeighbors
    NearestNeighborsModel

@@ -111,4 +115,4 @@ UMAP
    :toctree: api

    UMAP
-   UMAPModel
+   UMAPModel