diff --git a/clients/drcachesim/docs/drcachesim.dox.in b/clients/drcachesim/docs/drcachesim.dox.in index b1ce51d450d..dbbcdbc107c 100644 --- a/clients/drcachesim/docs/drcachesim.dox.in +++ b/clients/drcachesim/docs/drcachesim.dox.in @@ -998,7 +998,7 @@ $ bin64/drrun -t drmemtrace -indir newdir -tool basic_counts \endcode **************************************************************************** -\page google_workload_traces Google Workload Traces +\page google_workload_traces Google Workload Traces (Version 2) With the rapid growth of internet services and cloud computing, workloads on warehouse-scale computers (WSCs) have become an important @@ -1011,11 +1011,11 @@ instruction and memory address traces from workloads running in Google data centers so that computer architecture researchers can study and develop new architecture ideas to improve the performance and efficiency of this important class of workloads. To protect Google's -intellectual property, these traces have been modified in order to -filter out sensitive information. These traces follow a synthetic ISA -(#DR_ISA_REGDEPS) that removes architecture specific details (e.g., the -opcode of instructions), while still providing enough information (e.g., -register dependencies) to perform meaningful analyses and simulations. +intellectual property, these traces have had their original ISA replaced +with a synthetic ISA that we call #DR_ISA_REGDEPS. This synthetic ISA +removes architecture specific details (e.g., the opcode of instructions), +while still providing enough information (e.g., register dependencies, +instruction categories) to perform meaningful analyses and simulations. \section sec_google_format Public Trace Format @@ -1031,29 +1031,24 @@ hints on the type of operation an instruction performs. Being a synthetic ISA, some routines that work on instructions coming from an actual ISA (such as #DR_ISA_AMD64) are not supported (e.g., decode_sizeof()). 
- -Currently we support: -- instr_convert_to_isa_regdeps(): to convert an #instr_t of an actual ISA to a - #DR_ISA_REGDEPS #instr_t. -- instr_encode() and instr_encode_to_copy(): to encode a #DR_ISA_REGDEPS #instr_t - into a sequence of contiguous bytes. -- decode() and decode_from_copy(): to decode an encoded #DR_ISA_REGDEPS instruction - into an #instr_t. +We do support decode() and decode_from_copy() to decode an encoded #DR_ISA_REGDEPS +instruction into an #instr_t. A #DR_ISA_REGDEPS #instr_t contains the following information: -- categories: composed by #dr_instr_category_t values, they indicate the type of +- Categories: composed by #dr_instr_category_t values, they indicate the type of operation performed (e.g., a load, a store, a floating point math operation, a branch, etc.). Note that categories are composable, hence more than one category can be set. This information can be obtained using instr_get_category(). -- arithmetic flags: we don't distinguish between different flags, we only report if +- Arithmetic flags: we don't distinguish between different flags, we only report if at least one arithmetic flag was read (all arithmetic flags will be set to read) and/or written (all arithmetic flags will be set to written). This information can be obtained using instr_get_arith_flags(). -- number of source and destination operands: we only consider register operands. +- Number of source and destination operands: we only consider register operands. This information can be obtained using instr_num_srcs() and instr_num_dsts(). -- source operation size: is the largest source operand the instruction operates on. - This information can be obtained by accessing the #instr_t operation_size field. -- list of register operand identifiers: they are contained in #opnd_t lists, + Memory operands can be deduced from the subsequent read and write records in the trace. +- Source operation size: is the largest source operand the instruction operates on. 
+ This information can be obtained using instr_get_operation_size(). +- List of register operand identifiers: they are contained in #opnd_t lists, separated in source and destination. Note that these #reg_id_t identifiers are virtual and it should not be assumed that they belong to any DR_REG_ enum value of any specific architecture. These identifiers are meant for tracking register @@ -1062,44 +1057,49 @@ A #DR_ISA_REGDEPS #instr_t contains the following information: instr_get_src(). - ISA mode: is always #DR_ISA_REGDEPS. This information can be obtained using instr_get_isa_mode(). -- encoding bytes: an array of bytes containing the #DR_ISA_REGDEPS #instr_t +- Encoding bytes: an array of bytes containing the #DR_ISA_REGDEPS #instr_t encoding. Note that this information is present only for decoded instructions (i.e., #instr_t generated by decode() or decode_from_copy()). This information can be obtained using instr_get_raw_bits(). -- length: the length of the encoded instruction in bytes. Note that this +- Length: the length of the encoded instruction in bytes. Note that this information is present only for decoded instructions (i.e., #instr_t generated by - decode() or decode_from_copy()). This information can be obtained by accessing - the #instr_t length field. + decode() or decode_from_copy()). This information can be obtained using instr_length(). + Be aware that in Google Workload Traces the instruction fetch size of a #memref_t and + the instr_length() of the corresponding instruction do not match! For convenience + reasons we kept the instruction fetch size to be the same as the size of the original + ISA instruction. Note that all routines that operate on #instr_t and #opnd_t are also supported for -#DR_ISA_REGDEPS instructions. 
However, querying information outside of those -described above (e.g., the instruction opcode with instr_get_opcode()) will return -the zeroed value set by instr_create() or instr_init() when the #instr_t was -created (e.g., instr_get_opcode() would return OP_UNDECODED). +#DR_ISA_REGDEPS instructions and their operands. However, querying information outside +of those described above (e.g., the instruction opcode with instr_get_opcode()) will +return the zeroed value set by instr_create() or instr_init() when the #instr_t was +created. On top of instructions and memory acceses, traces also have #dynamorio::drmemtrace::trace_marker_type_t markers. All markers of the original trace are present, except for: -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_IDX -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_TRACE_START -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_TRACE_END -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_FAILED, which have been removed. +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_IDX +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_TRACE_START +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_TRACE_END +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_SYSCALL_FAILED +These markers have been removed. Because tracing overhead results into inflated context switches, the #dynamorio::drmemtrace::TRACE_MARKER_TYPE_CPU_ID values have been modified to -"unknown CPU" to avoid confusion. We recommend users to use our scheduler for a -realistic schedule of a trace threads. Also, the only -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_ID -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_ARG -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_RETVAL -#dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_RETADDR markers preserved are those -related to SYS_futex functions. +"unknown CPU" to avoid confusion. 
We recommend that users use our scheduler +(see \ref sec_drcachesim_sched) for a realistic schedule of a trace's threads. +Also, we preserved the following markers: +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_ID +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_ARG +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_RETVAL +- #dynamorio::drmemtrace::TRACE_MARKER_TYPE_FUNC_RETADDR +These markers are preserved only for SYS_futex functions. Finally, every trace has a v2p.textproto file associated to it, which provides a plausible virtual to physical mapping of the virtual addresses present in a trace for more realistic TLB simulations. This is a static virtual to physical mapping with 2 MB pages. Users can generate different mappings (e.g., smaller page size) -by modifying such file, or create their own mapping following the same +by modifying this file, or create their own mapping following the same v2p.textproto format. \section sec_google_get Getting the Traces @@ -1111,7 +1111,7 @@ The Google Workload Traces can be downloaded from: Directory structure: - \verbatim workload_name/ - ..drmemtrace.zip + ..memtrace.zip v2p.textproto \endverbatim @@ -1148,6 +1148,18 @@ You can contribute to the project in many ways: - Sharing and collaborating on architecture research. - Reporting issues: see \ref sec_google_help +\section sec_public_v1_deprecated Deprecated Google Workload Traces (Version 1) + +The previous version of Google workload traces contains a subset of the +information of the current traces and has been deprecated. +Please use the current version described above. + +The previous version can still be found at: + + - [Google workload trace folder (Version 1)](https://console.cloud.google.com/storage/browser/external-traces) + +DynamoRIO 11.0 is the latest version that supports these traces. + **************************************************************************** \page sec_drcachesim_config_file Configuration File