TEST: Perf branch stack sampling #6330

Draft
wants to merge 773 commits into
base: rpi-6.10.y

Conversation

popcornmix
Collaborator

Do not merge.

This is for testing a patch set that adds branch stack sampling to perf, which can be used by BOLT.

6by9 and others added 30 commits August 22, 2024 14:29
11cf37e switched to using drm_fb_dma_get_gem_addr instead of
drm_fb_dma_get_gem_obj and adding fb->offset[].

However the tiled formats need to compute the offset in a more
involved manner than drm_fb_dma_get_gem_addr applies, and we
were ending up with the offset for src_[xy] being applied twice.

Switch back to using drm_fb_dma_get_gem_obj and fully computing
the offsets ourselves.

Fixes: 11cf37e ("drm/vc4: Move the buffer offset out of the vc4_plane_state")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a flag custom_fb_num to denote that the client has
requested a specific fbdev node number via node.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
For situations where there are multiple DRM cards in a system,
add a query of DT for "drm_fb" designations for cards to set
their preferred /dev/fbN designation.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>

drm/fb_helper: Change query for FB designation from drm_fb to drm-fb

Fixes: 1216ea5 ("drm/fb-helper: Look up preferred fbdev node number from DT")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Apparently aliases are only allowed lower case and hyphens,
so swap the use of underscore to hyphen.

Fixes: 3aa1f24 ("drm: Look for an alias for the displays to use as the DRM device name")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This property can be used to delay deassertion of external fundamental
reset, which may be useful for endpoints that require an extended time for
internal setup to complete.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
When factoring out __vc4_hvs_stop_channel, the logic got inverted from
	if (condition)
	  // stop channel
to
	if (condition)
	  goto out
	//stop channel
	out:
and also changed the exact register writes used to stop the channel.

Correct the logic so that the channel is actually stopped, and revert
to the original register writes.
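
The inversion described above can be shown with a minimal, self-contained sketch. The names `fake_channel`, `stop_channel_buggy` and `stop_channel_fixed` are hypothetical and are not the vc4 driver's code; the sketch only assumes that the condition means "the channel is currently enabled".

```c
#include <stdbool.h>

/* Hypothetical stand-in for a hardware channel; not the real vc4 state. */
struct fake_channel {
	bool enabled;
};

/* Buggy refactor: bails out exactly when there is work to do. */
static void stop_channel_buggy(struct fake_channel *chan)
{
	if (chan->enabled)
		goto out;
	chan->enabled = false;	/* never reached for an enabled channel */
out:
	return;
}

/* Corrected form: skip the stop only when the channel is already idle. */
static void stop_channel_fixed(struct fake_channel *chan)
{
	if (!chan->enabled)
		goto out;
	chan->enabled = false;
out:
	return;
}
```

Calling the buggy variant on an enabled channel leaves it enabled, which is the symptom the commit corrects.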

Fixes: 6d01a10 ("drm/vc4: crtc: Move HVS init and close to a function")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The reset condition for the EMPTY flag in DISPSTATx is 0,
so seeing as we've just reset the pipeline there is no
guarantee that the flag will denote empty if it hasn't been
enabled.

Drop the WARN.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The code handling freeing stale dlists had 2 issues:
- it disabled the interrupt as soon as the first EOF interrupt
  occurred, even if it didn't clear all stale allocations, thus
  leading to stale entries
- It didn't free stale entries from disabled channels, so eg
  "kmstest -c 0" could leave a stale alloc on channel 1 floating
  around.

Keep the interrupt enabled whilst there are any outstanding
allocs, and discard those on disabled channels. This second
change does require us to call vc4_hvs_stop_channel from
vc4_crtc_atomic_disable so that the channel actually gets stopped.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Users are reporting running out of DLIST memory. Add a
debugfs file to dump out all the allocations.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
BCM2711 runs pixelvalve at two pixels per clock cycle which results
in an unfortunate limitation that odd horizontal timings are not
possible. This is apparent on the standard DMT mode of 1366x768@60
which cannot be driven with correct timing.

BCM2712 defaults to the same behaviour, but has a mode to support
odd timings. While internally it still runs at two pixels per clock,
setting the PV_VCONTROL_ODD_TIMING bit makes it appear externally
to behave as if it were one pixel per clock.

Switching to this mode fixes the 1366x768@60 mode, and other custom
resolutions with odd horizontal timings.

Signed-off-by: Dom Cobley <popcornmix@gmail.com>
With a DMA FIFO threshold greater than 1 (encoded as 0), it is possible
for data in the FIFO to be inaccessible, causing the transfer to fail
after a timeout. If the transfer includes a transmission, reduce the
RX threshold when the TX completes, otherwise use 1 for the whole
transfer (inefficient, but not catastrophic at SPI data rates).

See: raspberrypi#5696

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Certain controllers (dwc-mshc) generate timeout conditions separately to
command-completion conditions, where the end result is interrupts are
separated in time depending on the current SDCLK frequency.

This causes spurious interrupts if SDCLK is slow compared to the CPU's
ability to process and return from interrupt. This occurs during card
probe with an empty slot where all commands that would generate a
response time out.

Add a quirk to squelch command response interrupts when a command
timeout interrupt is received.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The DWC MSHC controller on RP1 needs differentiating from the generic
version.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com>

Added export feature to gpio-poweroff documentation

Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com>
With the new support for a chain of sys_off handlers, gpio-poweroff
does not disable a normal shutdown (though it does delay it). There
is therefore no need for the noisy WARN from the kernel.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
See: https://forums.raspberrypi.com/viewtopic.php?p=2159344

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add the following formats:

- V4L2_PIX_FMT_RGB48/V4L2_PIX_FMT_BGR48
  48-bit RGB where each colour sample is 16-bits.

- V4L2_PIX_FMT_PISP_COMP1_MONO/V4L2_PIX_FMT_PISP_COMP2_MONO
  16-bit to 8-bit pisp compressed monochrome pixel format.

Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Users have reported log spam created by "Event Ring Full" xHC event
TRBs. These are caused by interrupt latency in conjunction with a very
busy set of devices on the bus. The errors are benign, but throughput
will suffer as the xHC will pause processing of transfers until the
event ring is drained by the kernel. Expand the number of event TRB slots
available by increasing the number of event ring segments in the ERST.

Controllers have a hardware-defined limit as to the number of ERST
entries they can process, so make the actual number in use
min(ERST_MAX_SEGS, hw_max).
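
As a rough illustration of the clamp described above, the following sketch picks the segment count. The value given to `ERST_MAX_SEGS` here and the helper name `erst_segs_in_use` are assumptions for the example, not taken from the xHCI driver.

```c
#include <stdint.h>

/* Assumed driver-side cap; the value is illustrative only. */
#define ERST_MAX_SEGS 8u

/*
 * Pick the number of event ring segments to use: the driver's cap,
 * limited by what the controller reports it can handle.
 */
static uint32_t erst_segs_in_use(uint32_t hw_max)
{
	return hw_max < ERST_MAX_SEGS ? hw_max : ERST_MAX_SEGS;
}
```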

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
All the handling for the properties was present, but they
were never attached to the connector to allow userspace
to change them.

Add them to the connector.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The step-wise governor increases the mitigation level when the
temperature goes above a threshold and decreases the mitigation when
the temperature falls below the threshold. If the temperature hovers
around a threshold, the mitigation will be applied and removed at
every iteration. This reaction to the temperature is inefficient for
performance.

Using a hysteresis temperature avoids this ping-pong of
mitigation by relaxing the mitigation only when the
temperature goes below this lower hysteresis value.
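
The behaviour described above can be sketched as a small state function. This is not the governor's code: the function name, the millidegree units, and the choice of `>=` at the trip point are assumptions made for the illustration.

```c
#include <stdbool.h>

/*
 * Decide whether mitigation should be active.
 * temp, trip and hyst are in millidegrees Celsius (assumed for this sketch).
 */
static bool throttle_with_hysteresis(bool throttled, int temp, int trip, int hyst)
{
	if (temp >= trip)
		return true;		/* at or above the trip point: mitigate */
	if (temp < trip - hyst)
		return false;		/* below the hysteresis value: relax */
	return throttled;		/* inside the band: keep the previous state */
}
```

With a trip point of 85000 and a hysteresis of 5000, a reading of 84000 keeps whatever state was already in effect, so a temperature hovering just under the trip point no longer toggles the mitigation on every iteration.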

Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
Signed-off-by: Lina Iyer <ilina@codeaurora.org>

drivers: thermal: step_wise: avoid throttling at hysteresis temperature after dropping below it

Signed-off-by: Serge Schneider <serge@raspberrypi.org>

Fix hysteresis support in gov_step_wise.c

Directly get hyst value instead of going through an
optional and, now, unimplemented function.

Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
The mainline driver has implemented analogue gain using the control
V4L2_CID_GAIN instead of V4L2_CID_ANALOGUE_GAIN.

libcamera requires V4L2_CID_ANALOGUE_GAIN, and therefore fails.

Update the driver to use V4L2_CID_ANALOGUE_GAIN.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
It is permitted for a plane to be configured such that none
of it is on-screen via either negative dest rectangle X,Y
offset, or just an offset that is greater than the crtc
dimensions.

These planes were resized via drm_atomic_helper_check_plane_state
such that the source rectangle had a zero width or height, but
they still created a dlist entry even though they contributed
no pixels. In the case of vc6_plane_mode_set, that could result
in negative values being written into registers, which caused
incorrect behaviour.

Drop planes that result in a source width or height of 0 pixels
to avoid the incorrect rendering.
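
A minimal sketch of the visibility check, under the assumption that clipping the destination rectangle to the crtc is a fair proxy for what the helper does to the source rectangle. The `rect` type and both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical rectangle type: [x1, x2) by [y1, y2), in pixels. */
struct rect {
	int x1, y1, x2, y2;
};

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Clip dst to a crtc of the given size and report whether any pixels
 * remain. A caller would skip emitting a dlist entry when this is false.
 */
static bool plane_is_visible(struct rect dst, int crtc_w, int crtc_h)
{
	int x1 = clamp_int(dst.x1, 0, crtc_w);
	int x2 = clamp_int(dst.x2, 0, crtc_w);
	int y1 = clamp_int(dst.y1, 0, crtc_h);
	int y2 = clamp_int(dst.y2, 0, crtc_h);

	return (x2 - x1) > 0 && (y2 - y1) > 0;
}
```

A plane placed entirely to the left of the screen, or starting at an X offset equal to the crtc width, clips to zero width and is reported as not visible.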

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Commit 7cd7065 ("drm/bridge: display-connector: implement
bus fmts callbacks") added use of drm_atomic_helper_bridge_*
functions, but didn't select the dependency of DRM_KMS_HELPER.
If nothing else selected that dependency it resulted in a
build failure.

Select the missing dependency.

Fixes: 7cd7065 ("drm/bridge: display-connector: implement bus fmts callbacks")
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
atomic_check creates a state, and allocates the dlist memory for
it such that atomic_flush can not fail.

On destroy that dlist allocation was being put in the stale list,
even though it had never been programmed into the hardware,
therefore doing lots of atomic_checks could consume all the dlist
memory and fail.

If the dlist has never been programmed into the hardware, then
free it immediately.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The dmabuf import already checks that the backing buffer is contiguous
and rejects it if it isn't. vc4 also requires that the buffer is
in the bottom 1GB of RAM, and this is all correctly defined via
dma-ranges.

However the kernel silently uses swiotlb to bounce dma buffers
around if they are in the wrong region. This relies on dma sync
functions to be called in order to copy the data to/from the
bounce buffer.

DRM is based on all memory allocations being coherent with the
GPU so that any updates to a framebuffer will be acted on without
the need for any additional update. This is fairly fundamentally
incompatible with needing to call dma_sync_ to handle the bounce
buffer copies, and therefore we have to detect and reject mappings
that use bounce buffers.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The moplet registers as VC4_ENCODER_TYPE_TXP1 and can be
fed from mux output 2 of HVS channel 1.

Correct the option which checked for VC4_ENCODER_TYPE_TXP0

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
MOP uses register offset 0x24 for the high bits of the address,
whilst Moplet uses 0x1c.

Handle this difference between the block types.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add YAML device tree bindings for the ROHM BU64754 VCM Motor Driver for
Camera Autofocus.

Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
pelwell and others added 24 commits August 22, 2024 14:29
Ensure the transmit FIFO has emptied before ending the transfer by
dropping the TX threshold to 0 when the last byte has been pushed into
the FIFO. Include a similar fix for the non-IRQ paths.

See: raspberrypi#6285
Fixes: 6014649 ("spi: dw: Save bandwidth with the TMOD_TO feature")
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The DW SPI interface has a 16-bit clock divider, where the bottom bit
of the divisor must be 0. Limit how low the clock speed can go to
prevent the clock divider from being truncated, as that could lead to
a much higher clock rate than requested.
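
The truncation hazard can be made concrete with a short sketch. The helper names and the 200 MHz input clock used in the note below are assumptions for illustration; only the 16-bit, even-divisor constraint comes from the text above.

```c
#include <stdint.h>

#define SKETCH_MAX_DIV 0xfffeu	/* largest even value that fits in 16 bits */

/* Lowest bus speed reachable without overflowing the divider. */
static uint32_t sketch_min_speed_hz(uint32_t max_freq)
{
	return (max_freq + SKETCH_MAX_DIV - 1) / SKETCH_MAX_DIV;
}

/* Clamp the request, then round the divisor up to an even value. */
static uint32_t sketch_clk_div(uint32_t max_freq, uint32_t speed_hz)
{
	uint32_t min_hz = sketch_min_speed_hz(max_freq);
	uint32_t div;

	if (speed_hz < min_hz)
		speed_hz = min_hz;
	div = (max_freq + speed_hz - 1) / speed_hz;	/* round up */
	return (div + 1) & 0xfffeu;			/* force bit 0 to 0 */
}
```

Without the clamp, a 100 Hz request against a 200 MHz clock would need a divisor of 2000000, which does not fit in 16 bits; masking it would silently select a much smaller divisor and therefore a much faster clock.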

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
There is now an ssd1327-spi overlay, but it's of little use without
the corresponding display drivers. Add them as modules to the usual
defconfig files.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Using the "cores * 1.5" heuristic, configure the kernel builds for the
4-core GitHub-hosted runners.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The BQ32000 DT property set by the trickle-resistor-ohms parameter
should be "trickle-resistor-ohms", not "abracon,tc-resistor".

See: raspberrypi#6291

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Many HD44780 LCD displays are connected via a very common I2C
GPIO expander.
We have an overlay for connecting the displays directly to GPIOs,
but not one for it connected via a backpack. Add such an overlay.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The default values defining a 16x2 display weren't documented,
so add them.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The corresponding driver implementation has seen sufficient testing,
so enable by default. Retain the dtparam so it can be turned off for test.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
In the same way that other subsystems support the setting of device
id numbers from Device Tree aliases, allow gpiochip numbers to be
derived from "gpiochip<n>" aliases.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add a gpiochip0 alias pointing to the rp1 GPIO node, making it appear
as gpiochip0.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Make the BCM2712's onboard GPIOs start at gpiochip10, marking them out
as system resources and preventing accidental use by existing Pi 5
code.

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Allow block devices to be used as caches for other devices. The primary
use is to allow small, low latency media to act as caches for spinning
rust drives.

See: raspberrypi#6303
     raspberrypi#455

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add CONFIG_ZRAM_WRITEBACK=y and CONFIG_ZRAM_MULTI_COMP=y.

See: raspberrypi#2939

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This reverts commit abb1ad6.

See: raspberrypi#6294

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This patch adds definitions related to the Branch Record Buffer Extension
(BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver
in subsequent patches.

Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent
generated definitions.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message

 arch/arm64/include/asm/sysreg.h |  17 ++---
 arch/arm64/tools/sysreg         | 131 ++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+), 11 deletions(-)
The Branch Record Buffer Extension (BRBE) adds a number of system registers
and instructions, which we don't currently intend to expose to guests. Our
existing logic handles this safely, but this could be improved with some
explicit handling of BRBE.

The presence of BRBE is currently hidden from guests as the cpufeature
code's ftr_id_aa64dfr0[] table doesn't have an entry for the BRBE field,
and so this will be zero in the sanitised value of ID_AA64DFR0 exposed to
guests via read_sanitised_id_aa64dfr0_el1(). As the ftr_id_aa64dfr0[] table
may gain an entry for the BRBE field in future, for robustness we should
explicitly mask out the BRBE field in read_sanitised_id_aa64dfr0_el1().
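
Masking the field amounts to clearing a 4-bit range in the ID register value. The sketch below assumes the BRBE field sits at bits [55:52] of ID_AA64DFR0_EL1; the macro and function names are hypothetical and are not the kernel's generated definitions.

```c
#include <stdint.h>

/* Assumed field position: ID_AA64DFR0_EL1.BRBE occupies bits [55:52]. */
#define SKETCH_BRBE_SHIFT 52
#define SKETCH_BRBE_MASK  (UINT64_C(0xf) << SKETCH_BRBE_SHIFT)

/* Return the register value with the BRBE field forced to zero. */
static uint64_t hide_brbe(uint64_t id_aa64dfr0)
{
	return id_aa64dfr0 & ~SKETCH_BRBE_MASK;
}
```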

The BRBE system registers and instructions are currently trapped by the
existing configuration of the fine-grained traps. As neither the registers
nor the instructions are described in the sys_reg_descs[] table,
emulate_sys_reg() will warn that these are unknown before injecting an
UNDEFINED exception into the guest.

Well-behaved guests shouldn't try to use the registers or instructions, but
badly-behaved guests could use these, resulting in unnecessary warnings. To
avoid those warnings, we should explicitly handle the BRBE registers and
instructions as UNDEFINED.

Address the above by having read_sanitised_id_aa64dfr0_el1() mask out the
ID_AA64DFR0.BRBE field, and explicitly handling all of the BRBE system
registers and instructions as UNDEFINED.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Updated the commit message

 arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
In order to support the Branch Record Buffer Extension (BRBE), we need to
extend the arm_pmu framework with some basic infrastructure for branch
stack sampling which arm_pmu drivers can opt-in to using. Subsequent
patches will use this to add support for BRBE in the PMUv3 driver.

With BRBE, the hardware records branches into a hardware FIFO, which will
be sampled by software when perf events overflow. A task may be context-
switched an arbitrary number of times between overflows, and to avoid
losing samples we need to save the current records when a task is context-
switched out. To do this we'll need to use the pmu::sched_task() callback,
and we'll also need to allocate some per-task storage space via the event
flag PERF_ATTACH_TASK_DATA.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Scan valid branch stack events in armpmu_start() to create merged filter
- Updated the commit message

 drivers/perf/arm_pmu.c       | 42 +++++++++++++++++++++++++++++++++---
 include/linux/perf/arm_pmu.h | 32 ++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 4 deletions(-)
Fine-grained trap controls for BRBE register and instruction accesses need
to be configured in the HDFGRTR_EL2, HDFGWTR_EL2 and HFGITR_EL2 registers
when the kernel enters at EL1 but EL2 is present. This changes
__init_el2_fgt() as required.

Similarly cycle and mis-prediction capture need to be enabled in BRBCR_EL1
and BRBCR_EL2 when the kernel enters either into EL1 or EL2. This adds new
__init_el2_brbe() to achieve this objective.

This also updates Documentation/arch/arm64/booting.rst with all the above
EL2 requirements along with the MDCR_EL3.SBRBE requirements.

First this replaces an existing hard encoding (1 << 62) with corresponding
applicable macro HDFGRTR_EL2_nPMSNEVFR_EL1_MASK.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe()
- Updated the in code comment around __init_el2_brbe()
- Dropped the write up for EL2->EL1 transition, moved up the EL3 write up

 Documentation/arch/arm64/booting.rst | 21 +++++++
 arch/arm64/include/asm/el2_setup.h   | 87 +++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 3 deletions(-)
This extends recently added branch stack sampling framework in ARMV8 PMU to
enable such events via new architecture feature called Branch Record Buffer
Extension aka BRBE. This implements all the armv8pmu_branch_xxx() callbacks
as expected at ARMV8 PMU level required to drive perf branch stack sampling
events. This adds a new config option CONFIG_ARM64_BRBE to encapsulate this
BRBE based implementation, available only on ARM64 platforms.

BRBE hardware captures a branch record via three distinct system registers
representing branch source address, branch target address, and other branch
information. A BRBE buffer implementation is organized as multiple banks of
32 branch records each, which is a collection of BRBSRC_EL1, BRBTGT_EL1 and
BRBINF_EL1 registers. However, the total number of BRBE record entries,
i.e. BRBE_MAX_ENTRIES, cannot exceed MAX_BRANCH_RECORDS as defined for ARM PMU.
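
The bank organisation implies a simple mapping from a flat record number to a bank and an index within it. The sketch below only relies on the 32-records-per-bank figure from the text; the type and function names are hypothetical and the real driver may organise this differently.

```c
#define SKETCH_BANK_SIZE 32	/* records per bank, per the text above */

/* Position of a record: which bank, and which slot inside that bank. */
struct brbe_slot {
	int bank;
	int index;
};

static struct brbe_slot brbe_locate(int record)
{
	struct brbe_slot s = { record / SKETCH_BANK_SIZE, record % SKETCH_BANK_SIZE };
	return s;
}
```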

Branch stack sampling is enabled and disabled along with regular PMU
events. This adds the required function callbacks in armv8pmu_branch_xxx()
format to drive the PMU branch stack hardware when supported. This also
adds fallback stub definitions for these callbacks for PMUs which do not
have the required support.

BRBE hardware attributes get captured in a new reg_brbidr element in struct
arm_pmu during armv8pmu_branch_probe() which is called from broader probing
function __armv8pmu_probe_pmu(). Attributes such as number of branch record
entries implemented in the hardware can be derived from armpmu->reg_brbidr.

BRBE gets enabled via armv8pmu_branch_enable() where it also derives branch
filter, and additional requirements from event's 'attr.branch_sample_type'
and configures them via BRBFCR_EL1 and BRBCR_EL1 registers.

PMU event overflow triggers IRQ, where current branch records get captured,
stitched along with older records available in 'task_ctx', before getting
processed for core perf ring buffer. Task context switch outs incrementally
save current branch records in event's 'pmu_ctx->task_ctx_data' to optimize
workload's branch record samples.
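
One way to picture the stitching step is appending previously saved records behind the freshly captured ones until a cap is reached. Everything in this sketch (types, names, the cap of 64, the newest-first ordering) is an assumption for illustration and is not the driver's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RECORDS 64	/* illustrative cap, not taken from the driver */

struct sketch_record {
	uint64_t from, to;
};

struct sketch_stack {
	size_t nr;
	struct sketch_record entries[SKETCH_MAX_RECORDS];
};

/*
 * Append older saved records behind the freshly captured ones, newest
 * first, dropping whatever no longer fits.
 */
static void stitch_records(struct sketch_stack *live, const struct sketch_stack *saved)
{
	size_t i;

	for (i = 0; i < saved->nr && live->nr < SKETCH_MAX_RECORDS; i++)
		live->entries[live->nr++] = saved->entries[i];
}
```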

In case multiple events with different branch sample type requests converge
on the same PMU, BRBE gets enabled for the merged branch filter accommodating
all those events' branch sample types. Captured branch records get filtered
in software for an overflown event if the BRBE hardware config does not match
its branch sample type, while handling the PMU IRQ.
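
The merge-then-filter idea can be sketched as follows. The branch-type bits, record layout and function names are hypothetical; they stand in for the perf branch sample type flags and do not reproduce the driver's filtering code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch-type bits for this sketch only. */
#define SK_BR_CALL   (UINT64_C(1) << 0)
#define SK_BR_RETURN (UINT64_C(1) << 1)
#define SK_BR_COND   (UINT64_C(1) << 2)

struct sk_record {
	uint64_t from, to, type;
};

/* Hardware is programmed with the union of every event's request. */
static uint64_t merge_filters(const uint64_t *requests, size_t n)
{
	uint64_t merged = 0;
	size_t i;

	for (i = 0; i < n; i++)
		merged |= requests[i];
	return merged;
}

/* Keep only the records one event asked for; returns the new count. */
static size_t filter_records(struct sk_record *recs, size_t n, uint64_t wanted)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (recs[i].type & wanted)
			recs[kept++] = recs[i];
	return kept;
}
```

Because the hardware captures the union, an event that asked only for calls has to discard the conditional branches that were recorded on behalf of another event.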

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Replaced BRBIDR0_EL1_FORMAT_0 as BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver
- Added SW filtering framework in read_branch_records() during filter mismatch
- Added SW filtering for both privilege modes and branch types

 drivers/perf/Kconfig            |   11 +
 drivers/perf/Makefile           |    1 +
 drivers/perf/arm_brbe.c         | 1198 +++++++++++++++++++++++++++++++
 drivers/perf/arm_pmuv3.c        |  160 ++++-
 drivers/perf/arm_pmuv3_branch.h |   83 +++
 include/linux/perf/arm_pmu.h    |    5 +
 6 files changed, 1457 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/arm_brbe.c
 create mode 100644 drivers/perf/arm_pmuv3_branch.h
Disable the BRBE before we enter the guest, saving its status, and re-enable
it once we get out of the guest. This avoids capturing branch records in
the guest kernel or userspace, which would confuse the host samples.
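
The save, disable and restore sequence can be modelled in a few lines. This is a toy model: `sketch_cpu` and both functions are hypothetical, and treating a control value of 0 as "recording off" is an assumption, not a statement about the BRBCR_EL1 encoding.

```c
#include <stdint.h>

/* Hypothetical model of one CPU's BRBE control state; not KVM's structures. */
struct sketch_cpu {
	uint64_t brbcr;		/* live control value; 0 assumed to mean "off" */
	uint64_t saved_brbcr;	/* host value stashed across the guest run */
};

/* Before entering the guest: remember the host setting, then disable. */
static void sketch_guest_enter(struct sketch_cpu *cpu)
{
	cpu->saved_brbcr = cpu->brbcr;
	cpu->brbcr = 0;
}

/* After leaving the guest: put the host setting back. */
static void sketch_guest_exit(struct sketch_cpu *cpu)
{
	cpu->brbcr = cpu->saved_brbcr;
}
```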

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
----
Changes in V18:

- Used host_data_ptr() to access host_debug_state.brbcr_el1 register
- Changed DEBUG_STATE_SAVE_BRBE to use BIT(7)
- Reverted back iflags as u8

 arch/arm64/include/asm/kvm_host.h  |  3 +++
 arch/arm64/kvm/debug.c             |  5 +++++
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 31 ++++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)
The test runs quite slowly in the model, so replace "xargs -n1" with
"tr ' ' '\n'" which does the same thing but in single digit minutes
instead of double digit minutes.

Also reduce the number of loops in the test application. Unfortunately
this causes intermittent failures on x86, presumably because the
sampling interval is too big to pick up any loops, so keep it the same
there.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
In the perf script command, spaces are turned into newlines. But when
there is a double space this results in empty lines which fail the
following inverse grep test, so strip the empty lines.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Add Arm64 BRBE-specific testing to the existing branch stack sampling test.
The test currently passes on the Arm FVP RevC model, but no hardware has
been tested yet.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Co-developed-by: German Gomez <german.gomez@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>