Feature/support cosmos #145

Draft
wants to merge 59 commits into
base: main
Changes from 1 commit
48394f1
Remove usePurviewType experimental flag
wjohnson Dec 5, 2022
b6b48a3
Prioritize resource sets and allow for matching against dfs and blob …
wjohnson Dec 6, 2022
fa5c9c4
Adding recent troubleshooting guidance
wjohnson Dec 12, 2022
9c71e45
Testing Overhaul
wjohnson Dec 12, 2022
73e656e
Resolving race condition based on app settings being deployed while m…
wjohnson Jan 3, 2023
612f4c0
Bump cryptography from 38.0.4 to 39.0.1 in /tests/environment
dependabot[bot] Feb 8, 2023
b74ad2b
Added unit test and integration test for Azure MySQL, and updated LIM…
hmoazam Feb 6, 2023
6f42108
Documentation references OL 0.18 and DBR 11.3
wjohnson Feb 5, 2023
e83d8ac
No longer supporting Spark 2
wjohnson Feb 5, 2023
7099edd
Updates - Postgres (#148)
hmoazam Feb 6, 2023
6296607
Feature/support kusto (#147)
hmoazam Feb 6, 2023
f59a347
Refactoring SelectReturnEntity to be more clear
wjohnson Feb 12, 2023
dbccd6a
Refactoring SelectReturnEntity to reflect we only accept entities wit…
wjohnson Feb 12, 2023
2636d40
Fix spark3-test-def merge conflict
wjohnson Feb 12, 2023
45f65db
Prioritize blob paths over placeholder entity
wjohnson Feb 12, 2023
79410ce
Refactoring validentity to validEntitiesAfterFiltering to make it mor…
wjohnson Feb 12, 2023
61ea232
Removing unncessary column mapping comments and Validate_Resource_Set…
wjohnson Feb 12, 2023
d53f45f
Refactoring PurviewIngestion to remove unused methods and refactor ou…
wjohnson Feb 12, 2023
a947969
Refactor Validate_X_Json method names to reflect what it is testing o…
wjohnson Feb 12, 2023
f061826
Refactoring SendToPurview in PurviewIngestion for each loop's variabl…
wjohnson Feb 12, 2023
d3dd7af
Refactoring PurviewIngestion naming conventions to clarify entities t…
wjohnson Feb 13, 2023
d732ef7
Adding a field for originalQualifiedName and removing unused methods …
wjohnson Feb 13, 2023
48dc0c6
Refactoring the cache naming conventions inside of PurviewIngestion
wjohnson Feb 13, 2023
50b0bf5
ColParser should optionally take in a mapping of original dataset nam…
wjohnson Feb 13, 2023
48af630
Refactoring to support extracting the ColumnParser to be passed aroun…
wjohnson Feb 14, 2023
ada23c9
Implement ColumnParser injection and update column mappings of proces…
wjohnson Feb 14, 2023
0c68b4a
Updating Limitations and Readme to better reflect current state and s…
wjohnson Feb 14, 2023
eddfa25
Handle Azure Data Factory Job Names (#137)
wjohnson Feb 22, 2023
3fd940d
Reflect support for Azure Data Factory (#170)
wjohnson Feb 23, 2023
f4b166b
Update ADF and Kusto limitations (#169)
wjohnson Feb 23, 2023
811a7ae
Update Delta Merge support (#167)
wjohnson Feb 23, 2023
014183b
Adding Hive Table as part of supported features (#171)
wjohnson Feb 24, 2023
fbabc92
OL 13 -> 18 (#173)
hmoazam Feb 24, 2023
5e65747
Added snowflake mapping to gallery (#172)
hmoazam Feb 24, 2023
39dcd36
Updated with new aka.ms url ready for release (#168)
hmoazam Feb 24, 2023
2c58203
fixed MySQL and Postgres test expectations (#174)
hmoazam Feb 25, 2023
b6b1e87
Fix Library Definitions in Job Tasks to prevent deserialization error…
wjohnson Feb 28, 2023
bc40c97
OlToPurviewMapping Quality of Dev Improvements
wjohnson Mar 13, 2023
46ae5b9
Enabling Workflow Dispatch to run the build
wjohnson Mar 13, 2023
05a54b2
Correct one line mappings as artifact
wjohnson Mar 13, 2023
6fd492e
Fix Mappings for Mount Points with Subdirectories
wjohnson Mar 1, 2023
7502ed5
Adding unit tests to confirm mount behavior
wjohnson Mar 13, 2023
c0f65d1
Testin
wjohnson Mar 10, 2023
c9f0189
Rollback Delta Merge statements due to false positive in test suite
wjohnson Mar 13, 2023
51678a0
Remove unncessary comment
wjohnson Mar 13, 2023
6d8df7d
Update readme to rollback delta merge support
wjohnson Mar 13, 2023
c2bf960
Mappings must be a separate artifact
wjohnson Mar 13, 2023
c73398b
Implementing Cosmos support
hmoazam Dec 21, 2022
ba6c4aa
Cosmos integration test added, and test-env README updated. TODO: Upd…
hmoazam Jan 15, 2023
7f24fa2
WIP - found flaws in DataSourceV2 events logic
hmoazam Jan 17, 2023
9f3714c
Cosmos support WIP
hmoazam Jan 18, 2023
b67bbe1
Cosmos WIP
hmoazam Jan 18, 2023
50f3129
Cosmos WIP
hmoazam Jan 18, 2023
00b18d4
Update LIMITATIONS
hmoazam Jan 18, 2023
247e56a
Updated UnitTestData, test CompleteNoOutputsInputsFullMessage expecte…
hmoazam Jan 18, 2023
c3bbfdd
save progress - VS code broken
hmoazam Jan 29, 2023
e008feb
Cosmos WIP
hmoazam Feb 6, 2023
2da877e
Unsuccessful debugging
hmoazam Feb 12, 2023
edd5657
Fix null error when accessing inputs from table storage, and clean up…
hmoazam Feb 17, 2023
Adding recent troubleshooting guidance
wjohnson committed Dec 18, 2022
commit fa5c9c490e4ef796e26bd11a4500751438c3666e
51 changes: 51 additions & 0 deletions TROUBLESHOOTING.md
@@ -48,6 +48,57 @@

In this case, use the Databricks CLI to upload the jar to the expected location so that the file name does not change.

* ### Internal Error Resolving Secrets

For the demo deployment, if your cluster fails and returns the error "Internal Error resolving secrets" and "Failed to fetch secrets referred to in Spark Conf", the deployment script may have failed to add an Access Policy to the Azure Key Vault or the secret scope was not created.

**Solution**: Update the values in the script below and run it in the Azure Cloud Shell. The script deletes the demo deployment's secret scope and then recreates it. After it runs, you should see an access policy for "AzureDatabricks" in your Azure Key Vault.

```bash
adb_ws_url=adb-DATABRICKS_WORKSPACE.ID.azuredatabricks.net
global_adb_token=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d -o tsv --query '[accessToken]')
adb_ws_id=/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_NAME/providers/Microsoft.Databricks/workspaces/DATABRICKS_WORKSPACE_NAME
subscription_id=123acb-456-def
akv_name=AKV_NAME
akv_resource_id=/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_NAME/providers/Microsoft.KeyVault/vaults/AKV_NAME

# Remove the Secret Scope if it exists
cat << EOF > delete-scope.json
{
"scope": "purview-to-adb-kv"
}
EOF

curl \
-X POST https://$adb_ws_url/api/2.0/secrets/scopes/delete \
-H "Authorization: Bearer $global_adb_token" \
-H "X-Databricks-Azure-Workspace-Resource-Id: $adb_ws_id" \
--data @delete-scope.json

# If the above fails, that's okay
# Ultimately, we just need a clean slate

cat << EOF > create-scope.json
{
"scope": "purview-to-adb-kv",
"scope_backend_type": "AZURE_KEYVAULT",
"backend_azure_keyvault":
{
"resource_id": "$akv_resource_id",
"dns_name": "https://$akv_name.vault.azure.net/"
},
"initial_manage_principal": "users"
}
EOF


curl \
-X POST https://$adb_ws_url/api/2.0/secrets/scopes/create \
-H "Authorization: Bearer $global_adb_token" \
-H "X-Databricks-Azure-Workspace-Resource-Id: $adb_ws_id" \
--data @create-scope.json
```
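
Because the script only works once every placeholder has been replaced, a quick pre-flight check can help. The following is a minimal sketch and is not part of the deployment scripts: `check_placeholders` is a hypothetical helper name, and the tokens it looks for are simply the placeholder names used in the script above.

```bash
# Hypothetical helper (not part of the repo): returns 0 when none of the
# values passed in still contain a placeholder token from the script above,
# and returns 1 (with a message on stderr) at the first leftover placeholder.
check_placeholders() {
  for _value in "$@"; do
    case "$_value" in
      *SUBSCRIPTION_ID*|*RESOURCE_GROUP_NAME*|*AKV_NAME*|*DATABRICKS_WORKSPACE*)
        echo "Placeholder still present in: $_value" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

One way to use it is to run `check_placeholders "$adb_ws_url" "$adb_ws_id" "$akv_name" "$akv_resource_id"` after setting the variables and before the first `curl` call, and to stop if it returns a non-zero status.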

## <a id="no-lineage" />I don't see lineage in Microsoft Purview

* ### Try Refreshing the Page
2 changes: 2 additions & 0 deletions deploy-base.md
@@ -117,6 +117,8 @@ From the [Azure Portal](https://portal.azure.com)

echo $purview_type_resp_custom_type
```

If you need a PowerShell alternative, see the [docs](./docs/powershell-alternatives.md#upload-custom-types).

## <a id="download-openlineage" />Download the OpenLineage Spark agent and configure with your Azure Databricks clusters

4 changes: 4 additions & 0 deletions deploy-demo.md
@@ -120,3 +120,7 @@ purview_type_resp_custom_type=$(curl -s -X POST $purview_endpoint/catalog/api/at

echo $purview_type_resp_custom_type
```
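
The echoed response is easy to skim past, so it may be worth checking it explicitly. The sketch below is not part of the deployment guide: `response_ok` is a hypothetical helper, and it assumes that a failed upload returns an Atlas-style JSON error body containing an `errorCode` field, which should be confirmed against the response you actually receive.

```bash
# Hypothetical helper (not part of the repo): succeeds unless the response
# body is empty or looks like an error.
# Assumption: error responses are JSON objects with an "errorCode" key.
response_ok() {
  case "$1" in
    "")
      echo "Empty response" >&2
      return 1
      ;;
    *'"errorCode"'*)
      echo "Error response: $1" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

For example, `response_ok "$purview_type_resp_custom_type" && echo "Custom types uploaded"` prints the confirmation only when the body is non-empty and carries no `errorCode`.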

If you need a PowerShell alternative, see the [docs](./docs/powershell-alternatives.md#upload-custom-types).

You should now be able to run your demo notebook and receive lineage.
25 changes: 25 additions & 0 deletions docs/powershell-alternatives.md
@@ -0,0 +1,25 @@
# PowerShell Alternative Scripts

In some cases, you may not be able to use the Azure Cloud Shell, or you may not have access to a machine that can run WSL or curl. This doc provides PowerShell alternatives to select scripts.

## Upload Custom Types

This script assumes you are in the `deployment/infra` folder of the repo.

```powershell
$purview_endpoint="https://PURVIEW_ACCOUNT_NAME.purview.azure.com"
$TENANT_ID="TENANT_ID"
$CLIENT_ID="CLIENT_ID"
$CLIENT_SECRET="CLIENT_SECRET"

$get_token=(Invoke-RestMethod -Method 'Post' -Uri "https://login.microsoftonline.com/$TENANT_ID/oauth2/token" -Body "resource=https://purview.azure.net&client_id=$CLIENT_ID&client_secret=$CLIENT_SECRET&grant_type=client_credentials")
$token=$get_token.access_token
$body=(Get-Content -Path .\Custom_Types.json -Raw)
$headers = @{
'Content-Type'='application/json'
'Authorization'= "Bearer $token"
}

Invoke-RestMethod -Method 'Post' -Uri "$purview_endpoint/catalog/api/atlas/v2/types/typedefs" -Body $body -Headers $headers

```