Features | Architecture | Documentation References | Compatibility | Log Ingestion Examples | Feedback | Legal Information
The Splunk Integration project is an unsupported bidirectional connector consisting of three main components, as depicted in the architecture diagram:
- The Databricks Add-on for Splunk, an app that allows Splunk Enterprise and Splunk Cloud users to run queries and execute actions, such as running notebooks and jobs, in Databricks
- Splunk SQL database extension (Splunk DB Connect) configuration for Databricks connectivity
- Notebooks to push events and alerts from Databricks to Splunk and to pull them from Splunk into Databricks
We also provide extensive documentation on log collection: how to ingest, store, and process logs on the economical and performant Delta Lake.
## Features
- Run Databricks SQL queries right from the Splunk search bar and see the results in the Splunk UI (Fig 1)
- Execute actions in Databricks, such as notebook runs and jobs, from Splunk (Fig 2 and Fig 3)
- Use the Splunk SQL database extension to integrate Databricks information with Splunk queries and reports (Fig 4 and Fig 5)
- Push events, summaries, and alerts to Splunk from Databricks (Fig 6 and Fig 7)
- Pull events and alerts data from Splunk into Databricks (Fig 8)
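For example, once the add-on is configured with a Databricks instance and an access token, a search such as `| databricksquery cluster="<cluster_name>" query="SELECT * FROM default.my_table LIMIT 10" command_timeout=60` returns the query results directly in the Splunk UI. The `databricksquery` custom command and argument names shown here are illustrative; confirm the exact syntax against the add-on's installation and usage guide.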
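On the Databricks side, the push notebooks deliver events to Splunk; a common transport for this is Splunk's HTTP Event Collector (HEC). Below is a minimal sketch of that pattern in Python, assuming a reachable HEC endpoint and token (both placeholders, not configuration shipped with this project); the project's notebooks are the authoritative implementation.

```python
import json
import requests

# Hypothetical values -- replace with your Splunk deployment's HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "<your-hec-token>"

def push_events_to_splunk(events, index="main", sourcetype="databricks:alerts"):
    """Send a batch of dict events to Splunk's HTTP Event Collector."""
    # HEC accepts several JSON event envelopes concatenated in a single request body.
    payload = "".join(
        json.dumps({"event": event, "index": index, "sourcetype": sourcetype})
        for event in events
    )
    response = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=payload,
        timeout=30,
    )
    response.raise_for_status()

# Example: push a summary row produced by a Databricks job.
push_events_to_splunk([{"alert": "excessive_failed_logins", "user": "jdoe", "count": 57}])
```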
## Documentation References
- Databricks Add-on for Splunk Integration Installation and Usage Guide:
  - Documentation: [markdown, pdf, word]
  - Link to the Databricks Add-on for Splunk on Splunkbase
- Splunk DB Connect Guide for Databricks
- Push Data to Splunk from Databricks.docx
- Pull Data from Splunk into Databricks.docx (a pull sketch follows this list)
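For the pull direction, one well-documented mechanism is Splunk's REST search API; the sketch below streams results from the `/services/search/jobs/export` endpoint into a Databricks notebook. The host, credentials, and example search are placeholders, and the project's pull notebooks may use a different approach.

```python
import json
import requests

# Placeholder Splunk management endpoint and credentials.
SPLUNK_BASE = "https://splunk.example.com:8089"

def pull_from_splunk(spl, username, password):
    """Run a Splunk search via the REST export endpoint and yield result rows."""
    response = requests.post(
        f"{SPLUNK_BASE}/services/search/jobs/export",
        auth=(username, password),
        data={"search": f"search {spl}", "output_mode": "json"},
        stream=True,
        verify=False,  # self-signed certs are common on lab instances; verify in production
        timeout=300,
    )
    response.raise_for_status()
    # The export endpoint streams one JSON document per line as results become available.
    for line in response.iter_lines():
        if line:
            record = json.loads(line)
            if "result" in record:
                yield record["result"]

rows = list(pull_from_splunk("index=main sourcetype=syslog | head 100", "admin", "<password>"))
# e.g., hand the rows to Spark: df = spark.createDataFrame(rows)
```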
## Compatibility
The Databricks Add-on for Splunk, notebooks, and documentation provided in this project are compatible with:
- Splunk Enterprise versions: 8.1.x and 8.2.x
- Databricks REST API versions: 1.2 and 2.0
- Azure Databricks
- AWS SaaS, E2 and PVC deployments
- GCP
- OS: Platform independent
- Browser: Safari, Chrome and Firefox
## Log Ingestion Examples
This project also provides documentation and notebooks that showcase how to use Databricks to collect various logs (a comprehensive list is provided below) via stream and batch ingest, using Databricks Auto Loader and Spark Structured Streaming to land the data in cloud data lakes (for example, S3) for durable storage. The included documentation and notebooks also cover, for each log type, the methods and code for parsing, schematizing, ETL/aggregation, and storing in Delta format so the logs are available for analytics.
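As an illustration of the stream-ingest pattern (all paths, the schema location, and the table name below are placeholders; the project's notebooks contain the full, source-specific versions):

```python
# Minimal Databricks Auto Loader pattern: incrementally ingest raw JSON logs from
# cloud storage into a bronze Delta table. Runs in a Databricks notebook, where a
# SparkSession named `spark` is already available.
raw_logs = (
    spark.readStream.format("cloudFiles")          # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/schemas/cloudtrail/")
    .load("s3://my-bucket/raw/cloudtrail/")
)

(
    raw_logs.writeStream.format("delta")
    .option("checkpointLocation", "s3://my-bucket/checkpoints/cloudtrail/")
    .trigger(availableNow=True)                    # batch-style incremental run
    .toTable("security.cloudtrail_bronze")
)
```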
Notebooks and documentation are included for the following data sources:
- CloudTrail logs
- VPC flow logs (a parsing sketch follows this list)
- Syslog
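Parsing and schematizing follow the same shape for each source. As a sketch, the default (version 2) VPC flow log format is space-delimited with a fixed field order per AWS's documentation; a hypothetical parser (path and helper name are illustrative, not from this project) might look like this:

```python
from pyspark.sql import functions as F

# Default (version 2) VPC flow log fields, in record order.
VPC_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_vpc_flow_logs(df, raw_col="value"):
    """Split raw space-delimited flow log lines into typed, named columns."""
    parts = F.split(F.col(raw_col), " ")
    for i, name in enumerate(VPC_FIELDS):
        df = df.withColumn(name, parts.getItem(i))
    return (
        df.drop(raw_col)
        .withColumn("packets", F.col("packets").cast("long"))
        .withColumn("bytes", F.col("bytes").cast("long"))
        # start/end are epoch seconds; cast through long to timestamp
        .withColumn("start", F.col("start").cast("long").cast("timestamp"))
        .withColumn("end", F.col("end").cast("long").cast("timestamp"))
    )

# Example: parsed = parse_vpc_flow_logs(spark.read.text("s3://my-bucket/raw/vpcflow/"))
```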
## Feedback
Issues with the application? Found a bug? Have a great idea for an addition? Feel free to file an issue or submit a pull request.
## Legal Information
This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, help, and feature requests can be communicated via email to cybersecurity@databricks.com or through the Issues page of this repo.