This guide discusses the details of REDCap-ETL configuration properties.
The data source is the REDCap project that has the data to be extracted. It is specified as the REDCap API token for the project and the REDCap API URL for the REDCap instance that has the project, for example:
redcap_api_url = http://redcap.someplace.edu/api/
data_source_api_token = 1A2FC4E892CC712031BA94D1198127C1
The transformation rules specify how the records in REDCap should be transformed into records in your database.
You need to set the transform_rules_source property to one of the numbers (1, 2, or 3), which are described below:
- text - rules are entered directly in the configuration file. This option is only available for JSON (.json) configuration files, and not .ini configuration files. JSON configuration files are
- file - the rules are stored in a file:
- on the REDCap-ETL server, for file-based configuration
- uploaded to the REDCap configuration project, for project-based configuration
- auto-generated - default rules (one table per form) are automatically generated by REDCap-ETL. This option is mainly intended for testing that an installation or project has been set up correctly.
For option 2 (the file option), you need to specify the file containing your transformation rules using the transform_rules_file property. So, for example, you might have the following in your configuration file:
transform_rules_source = 2
transform_rules_file = basic-demography-rules.txt
Using option 3 (auto-generated) above will not allow you to modify the auto-generated rules. To generate rules so that they can be modified, use the following script to generate a file of rules, and then edit those rules as appropriate:
bin/transformation_rules_generator.php
The script uses your REDCap-ETL configuration file to generate rules. For example, from the top-level directory of the installation, you could use the following command to generate rules for a config/test.ini configuration file:
php bin/transformation_rules_generator.php config/test.ini > config/test-rules.txt
The above script supports several options:
- -c, --complete-fields - includes form complete fields in the auto-generated transformation rules.
- -d, --dag-fields - includes DAG (Data Access Group) fields in the auto-generated transformation rules.
- -f, --file-fields - includes file fields in the auto-generated transformation rules. Note that if a file field is included in the rules, only a flag indicating that a document has been uploaded for this field will be exported from REDCap. The file contents will NOT be exported.
- -s, --survey-fields - includes survey fields (survey identifier and timestamp) for forms that are enabled as surveys.
- -n, --notes-fields - remove notes fields.
- -i, --identifier-fields - remove identifier fields.
- -t, --table-nonrepeating <table-name> - combine non-repeating fields into table "table-name" (for non-longitudinal projects).
Examples:
php bin/transformation_rules_generator.php -c -d -f config/test.ini > config/test-rules.txt
php bin/transformation_rules_generator.php -cdf config/test.ini > config/test-rules.txt
php bin/transformation_rules_generator.php --dag-fields config/test.ini > config/test-rules.txt
php bin/transformation_rules_generator.php --dag-fields --file-fields config/test.ini > config/test-rules.txt
After creating the rules file with the script, you will need to modify your configuration file to use the rules file, for example:
transform_rules_source = 2
transform_rules_file = test-rules.txt
Details about the transformation rules can be found here: Transformation Rules Guide
In your configuration file, you need to set the property db_connection to a string that provides connection information for your database, for example:
db_connection = MySQL:127.10.10.1:etl_user:etl_password:etl_test_db
The database where the data is loaded is specified as a database connection string. This string has the following format:
<database-connection-type> : <database-connection-values>
Currently, the supported database connection types are
- MySQL
- CSV (comma-separated values)
- PostgreSQL
- SQLite
- SQL Server
For MySQL, the format of the database connection string is:
MySQL:<host>:<username>:<password>:<database>[:<port>]
Example MySQL database connection strings:
MySQL:localhost:etl_user:etl_password:etl_test_db
MySQL:someplace.edu:admin:admin_password_123:etl_prod_db:3306
Note: Since the ':' character is used as a separator for the database connection string, if any of the values in your database connection contain a ':', it needs to be escaped with a backslash. For example, if your password is "my:password", then it would need to be specified as "my\:password".
For CSV, the database connection string format is:
CSV:<output-directory>
For example:
CSV:/home/redcap-etl/csv/project1
For PostgreSQL, the database connection string format is:
PostgreSQL:<host>:<username>:<password>:<database>[:<schema>[:<port>]]
For SQLite, the database connection string format is:
SQLite:<database-file-path>
For example:
SQLite:/home/redcap-etl/db/etl-data.db
For SQL Server, the format of the database connection string is:
SQLServer:<host>:<username>:<password>:<database>[:<port>]
Example SQL server database connection strings:
SQLServer:localhost:some_user:amazingPW987:etl_test_db
SQLServer:someplace.edu:admin:adminPassword123:etl_prod_db:1433
Note: Since the ':' character is used as a separator for the database connection string, if any of the values in your database connection contain a ':', it needs to be escaped with a backslash. For example, if your password is "my:password", then it would need to be specified as "my\:password".
This section provides a detailed list of the configuration properties.
Property | Description |
---|---|
redcap_api_url | The URL for your REDCap API. Typically this will be your REDCap URL with "/api/" appended to the end. Not ending the URL with a slash (/) may cause an error. |
ssl_verify | Indicates if SSL verification is used for the connection to REDCap. This defaults to true. Setting it to false is insecure. |
ca_cert_file | Certificate authority certificate file. This can be used to support SSL verification of the connection to REDCap if your system does not provide support for it by default |
Property | Description |
---|---|
data_source_api_token | The API token for the REDCap project from which the data is being extracted from REDCap. |
extract_filter_logic | REDCap filter logic that is used to restrict the records extracted from the project.
For example, extract_filter_logic='[record_id] < 1020'would cause only records with a record ID less than 1020 to be extracted from REDCap. Note that REDCap-ETL just passes the extract filter logic value to REDCap, and REDCap does minimal error checking. In many cases when you have an error in your filter, REDCap will not generate an error and no records will be returned. You can test out your filter logic in the API Playground within REDCap to make sure that it returns the records expected. Also note that some characters typically used in for the filter logic can cause an error when REDCap-ETL processes the configuration file. To prevent this from happening, it is best to put the filter logic value in quotes as shown in the example above. |
Properties for the database where the extracted data is loaded.
Property | Description |
---|---|
db_connection | The database connection string for the database where the data is loaded. |
db_ssl | Flag that indicates if SSL should be used for database connection (true by default). This property is not applicable to CSV files or the SQLite database. Note: on Linux systems when using MySQL, having this set to true may cause the database connection to fail when the database host is specified as "localhost". To fix this problem, set db_ssl to 'false' (or 0), or specify "127.0.0.1" for the database host instead of "localhost". |
db_ssl_verify | Flag that indicates if the SSL certificate of the database server should be verified. For this to work, a valid ca_cert_file (certificate authority certificate file) needs to be specified. This property is not applicable to CSV files or SQLite databases. |
db_primary_keys | Flag that indicates if primary key constraints should be created for database tables (true by default). This functionality is not available for CSV files or SQLite databases. |
db_foreign_keys | Flag that indicates if foreign key constraints should be created for database tables that are specified as having parent tables in the transformation rules (true by default). This functionality is not available for CSV files or SQLite databases. |
REDCap-ETL logs to the load database by default. It creates and logs to 2 tables:
- etl_log - contains one row for each ETL process run, where each row contains the start time, table prefix used (if any), batch size, and REDCap-ETL version number.
- etl_event_log - contains the individual event messages for each ETL process run.
The 2 tables can be joined on their log_id attributes.
Unlike the other tables REDCap-ETL generates, the database logging tables are not deleted between runs, so these tables accumulate the results of all ETL runs.
Property | Description |
---|---|
db_logging | A true/false property indicating if REDCap-ETL should log to the database. The default value for this property is true. Database logging is not supported for CSV (comma-separated value) file output. |
db_log_table | The name of the main database logging table. This name defaults to **etl_log**. |
db_event_log_table | The name of the database logging event table. This name defaults to **etl_event_log**. |
Properties for e-mail error notifications and processing summaries that can be sent by REDCap-ETL.
Property | Description |
---|---|
email_errors | True/false value that indicates if an e-mail should be sent when errors occur to the "email_to_list". The default value is true. |
email_summary | True/false value that indicates if an e-mail summary of the log messages should be sent to the "email_to_list". The default value is false, and no summary will be sent if the ETL process encounters an error. |
email_from_address | The from address for e-mail notifications sent by REDCap-ETL |
email_subject | The subject for e-mail notifications sent by REDCap-ETL |
email_to_list | The to address list for e-mail notifications sent by REDCap-ETL |
For configurations where the transform_rules_source property is set to 3 (auto-generation), the following properties can be used to control the auto-generation of transformation rules:
Property | Description |
---|---|
autogen_include_complete_fields | Set to 1 or 'true' to include form complete fields in the auto-generated transformation rules. By default, they will not be included. |
autogen_include_dag_fields | Set to 1 or 'true' to include DAG (Data Access Group) fields in the auto-generated transformation rules. By default, they will not be included. |
autogen_include_file_fields | Set to 1 or 'true' to include file fields in the auto-generated transformation rules. By default, they will not be included. |
autogen_include_survey_fields | Set to 1 or 'true' to include survey fields in the auto-generated transformation rules. By default, they will not be included. |
autogen_remove_notes_fields | Set to 1 or 'true' to remove REDCap notes fields in the auto-generated transformation rules. By default, they will be included. |
autogen_remove_identifier_fields | Set to 1 or 'true' to remove REDCap fields that are marked as identifiers in the auto-generated transformation rules. By default, they will be included. |
autogen_combine_non_repeating_fields | Set to 1 or 'true' to have all fields in non-repeating forms combined into a single table for non-longitudinal projects. By default, each form will be in a separate table. |
autogen_non_repeating_fields_table | If the autogen_combine_non_repeating_fields property is set, this property needs to be set as the table name to use for the table that combines the non-repeating fields. |
REDCap-ETL supports properties that can be used to specify custom SQL processing. You can specify SQL to be executed before and/or after the ETL process runs.
Note:
- If a table prefix is specified for the ETL process, that prefix will NOT be automatically added to table names in the post-processing SQL file, so you will need to add it manually, and change it if the table prefix changes.
- This feature is intended for running commands that update the database, and select commands will generate no output.
- Custom SQL processing is not supported for CSV files.
Property | Description |
---|---|
pre_processing_sql_file | File with SQL statements to execute on the database before the ETL process runs. This can be useful for dropping indexes and views created using post-processing SQL. |
post_processing_sql_file | File with SQL statements to execute on the database after the ETL process has finished. This can be used, for example, to create indexes or custom views on tables generated by the ETL process. |
In REDCap, multiple choice questions have values and corresponding labels, e.g., [0 => "no", 1 => "yes"]. Originally, REDCap-ETL only displayed the values in the tables generated in the target database, and then created views that contained only the labels. Now REDCap-ETL generates both the values and labels in the tables. The current plan is to remove the label views in a future version of REDCap-ETL.
Property | Description |
---|---|
label_view_suffix | *(Deprecated)* A view is created that contains the labels for multiple choice questions (and not the codes/values) |
label_field_suffix | Suffix that will be added to multiple-choice label fields. The default value is "_label" To turn off these fields from being generated, set this to blank. |
REDCap-ETL generates several fields automatically. The generated field type properties can be used to modify the types of these fields.
Property | Description |
---|---|
generated_label_type | The type used for the label fields generated by REDCap-ETL for checkbox, radio and dropdown fields. If varchar or char types are specified, the size for the labels will be calculated to be the minimum needed for the largest value for the specific field for the label. |
generated_name_type | The type used for name fields for events and instruments generated by REDCap-ETL (the redcap_event_name and redcap_repeat_instrument fields) |
generated_record_id_type | The type used for the REDCap record ID fields generated for each table by REDCap-ETL |
generated_suffix_type | The type used for the suffix fields generated by REDCap-ETL when a suffixes rows type is specified |
REDCap-ETL uses a "Lookup" table internally to map REDCap's multiple choice values to their corresponding labels. This internal table can be saved to your database.
Property | Description |
---|---|
create_lookup_table | A true/false property that indicates whether or not a lookup table should be created in the database. The property defaults to "false" if not specified. |
lookup_table_name | The name to use for the lookup table created in the database. The default table name is "etl_lookup". |
Property | Description |
---|---|
calc_field_ignore_pattern | A pattern represented as a PHP regular expression that will be used in determining what calculation fields values are considered when evaluating REDCap data rows for inclusion in a table. By default, if a calculation field is included in a table, then any REDCap data records which have a non-blank value for that calculation field will be included in the table. If, however, you specified the pattern '/^0$/' for this property, then REDCap-ETL would ignore calculation fields that are blank or zero when determining what REDCap data records should be included in the table. |
ignore_empty_incomplete_forms | In certain situations, REDCap will set the form complete field for a form to the "incomplete" value for a record, even though the form has never been edited or saved for that record. When this happens, by default, REDCap-ETL will store a row for that form for the record in the database. The row will have no values except for the form complete field, which will be set to zero. However, if this property is set to 1 or "true", then REDCap-ETL will ignore records for such forms. |
extracted_record_count_check | A true/false property that indicates whether or not a check should be done to make sure that the number of records extracted from REDCap is the number expected. This is done to catch cases where REDCap's API does not export all the records that it should due to an internal error, but does not notify REDCap-ETL of the error. By default this check is turned on, but you might need to turn it off if records could be deleted or added when the ETL process runs, which could also cause the counts not to match. |
log_file | File to use for logging |
time_limit | Maximum number of seconds the ETL process is allowed to run. A value of zero (the default value) means there is no limit. |
time_zone | Timezone to use; see the URL below for valid values: https://secure.php.net/manual/en/timezones.php By default, the default PHP timezone on the system is used. |