Skip to content

DataBiosphere/java-pfb

Repository files navigation

Java-PFB

A Java implementation of the pyPFB library that includes a CLI and a Java library.

As discussed in the docs provided by pyPFB, PFB files are Avro files with a particular required schema. Given the specific PFB schema, the gradle avro plugin automatically generates class files on project build. Together with the Avro Java library, we are able to deserialize the PFB files and get the relevant peices of data: The schema, metadata, nodes, and table row data.

To learn more about PFBs, see pfb.md.

Getting Started

See the library README for more details on how to reference the library in your project.

The CLI is a wrapper around the library. See the CLI README for more information.

Developer Information

Publishing

See library and cli readmes for more details.

Running SourceClear locally

SourceClear is a static analysis tool that scans a project's Java dependencies for known vulnerabilities. If you get a build failure due a SourceClear error and want to debug the problem locally, you need to get the API token from vault before running the gradle task.

export SRCCLR_API_TOKEN=$(vault read -field=api_token secret/secops/ci/srcclr/gradle-agent)
./gradlew srcclr

Results of the scan are uploaded to Veracode. You can request an account to view results from #dsp-infosec-champions.

Running SonarQube locally

SonarQube is a static analysis code that scans code for a wide range of issues, including maintainability and possible bugs. If you get a build failure due to SonarQube and want to debug the problem locally, you need to get the the sonar token from vault before runing the gradle task.

export SONAR_TOKEN=$(vault read -field=sonar_token secret/secops/ci/sonarcloud/java-pfb)
./gradlew sonar

Unlike SourceClear, running this task produces no output unless your project has errors. To always generate a report, run using --info:

./gradlew sonar --info

We run the scans for two projects: java-pfb and java-pfb-cli. The results are uploaded to the sonarcloud dashboard.

Benchmarking

Java Microbenchmark Harness (JMH) is a high-performance benchmarking tool for individual Java methods. It is integrated into this project via the JMH Gradle plugin.

To run the benchmarks manually:

./gradlew jmh

After benchmarks run, you can view the results at library/build/reports/jmh/human.txt.

Benchmarks automatically run via GitHub Actions for all PRs and for all pushes to the main branch.

You can view main branch benchmarks at https://databiosphere.github.io/java-pfb/dev/bench/.

For PRs, navigate to your most recent commit and look for a comment on that commit.

Note that benchmarks run via GitHub Actions are variable, since the runner machines are variable. Developers are encouraged to run benchmarks locally while developing to best assess the impact of their changes.