TEST: Perf branch stack sampling #6330
Draft
popcornmix wants to merge 773 commits into raspberrypi:rpi-6.10.y from popcornmix:perf_branch_stack_sampling
11cf37e switched to using drm_fb_dma_get_gem_addr instead of drm_fb_dma_get_gem_obj and adding fb->offset[]. However the tiled formats need to compute the offset in a more involved manner than drm_fb_dma_get_gem_addr applies, and we were ending up with the offset for src_[xy] being applied twice. Switch back to using drm_fb_dma_get_gem_obj and fully computing the offsets ourselves. Fixes: 11cf37e ("drm/vc4: Move the buffer offset out of the vc4_plane_state") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Add a flag custom_fb_num to denote that the client has requested a specific fbdev node number via node. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
For situations where there are multiple DRM cards in a system, add a query of DT for "drm_fb" designations for cards to set their preferred /dev/fbN designation. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com> drm/fb_helper: Change query for FB designation from drm_fb to drm-fb Fixes: 1216ea5 ("drm/fb-helper: Look up preferred fbdev node number from DT") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Apparently aliases are only allowed lower case and hyphens, so swap the use of underscore to hyphen. Fixes: 3aa1f24 ("drm: Look for an alias for the displays to use as the DRM device name") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
This property can be used to delay deassertion of external fundamental reset, which may be useful for endpoints that require an extended time for internal setup to complete. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
When factoring out __vc4_hvs_stop_channel, the logic got inverted from
    if (condition)
        // stop channel
to
    if (condition)
        goto out
    // stop channel
    out:
and also changed the exact register writes used to stop the channel. Correct the logic so that the channel is actually stopped, and revert to the original register writes. Fixes: 6d01a10 ("drm/vc4: crtc: Move HVS init and close to a function") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The reset condition for the EMPTY flag in DISPSTATx is 0, so seeing as we've just reset the pipeline there is no guarantee that the flag will denote empty if it hasn't been enabled. Drop the WARN. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The code handling freeing stale dlists had 2 issues:
- It disabled the interrupt as soon as the first EOF interrupt occurred, even if it didn't clear all stale allocations, thus leading to stale entries.
- It didn't free stale entries from disabled channels, so eg "kmstest -c 0" could leave a stale alloc on channel 1 floating around.
Keep the interrupt enabled whilst there are any outstanding allocs, and discard those on disabled channels. This second channel does require us to call vc4_hvs_stop_channel from vc4_crtc_atomic_disable so that the channel actually gets stopped. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Users are reporting running out of DLIST memory. Add a debugfs file to dump out all the allocations. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
BCM2711 runs pixelvalve at two pixels per clock cycle which results in an unfortunate limitation that odd horizontal timings are not possible. This is apparent on the standard DMT mode of 1366x768@60 which cannot be driven with correct timing. BCM2712 defaults to the same behaviour, but has a mode to support odd timings. While internally it still runs at two pixels per clock, setting the PV_VCONTROL_ODD_TIMING bit makes it appear externally to behave as if it is one pixel per clock. Switching to this mode fixes 1366x768@60 mode, and other custom resolutions with odd horizontal timings. Signed-off-by: Dom Cobley <popcornmix@gmail.com>
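The odd-timing limitation above can be illustrated with a small parity check. This is only a sketch, not the driver's code: `struct h_timings` and `needs_odd_timing_mode()` are hypothetical names, and the example values in the usage note are the horizontal timings commonly listed for the DMT 1366x768@60 mode, quoted here as an assumption.

```c
#include <stdbool.h>

/* Hypothetical subset of a display mode's horizontal timings, in pixels. */
struct h_timings {
	unsigned int hdisplay;
	unsigned int hsync_start;
	unsigned int hsync_end;
	unsigned int htotal;
};

/*
 * At two pixels per clock, every horizontal boundary has to land on an
 * even pixel count to be represented exactly. If any boundary is odd,
 * the mode needs the odd-timing behaviour described above.
 */
static bool needs_odd_timing_mode(const struct h_timings *t)
{
	return (t->hdisplay | t->hsync_start | t->hsync_end | t->htotal) & 1;
}
```

With the assumed DMT values 1366/1436/1579/1792 the sync end is odd, so the check returns true; a mode such as 1920/2008/2052/2200 returns false.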
With a DMA FIFO threshold greater than 1 (encoded as 0), it is possible for data in the FIFO to be inaccessible, causing the transfer to fail after a timeout. If the transfer includes a transmission, reduce the RX threshold when the TX completes, otherwise use 1 for the whole transfer (inefficient, but not catastrophic at SPI data rates). See: raspberrypi#5696 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Certain controllers (dwc-mshc) generate timeout conditions separately to command-completion conditions, where the end result is interrupts are separated in time depending on the current SDCLK frequency. This causes spurious interrupts if SDCLK is slow compared to the CPU's ability to process and return from interrupt. This occurs during card probe with an empty slot where all commands that would generate a response time out. Add a quirk to squelch command response interrupts when a command timeout interrupt is received. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
The DWC MSHC controller on RP1 needs differentiating from the generic version. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com> Added export feature to gpio-poweroff documentation Signed-off-by: Nick Bulleid <nedbulleid@fastmail.com>
With the new support for a chain of sys_off handlers, gpio-poweroff does not disable a normal shutdown (though it does delay it). There is therefore no need for the noisy WARN from the kernel. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
See: https://forums.raspberrypi.com/viewtopic.php?p=2159344 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add the following formats:
- V4L2_PIX_FMT_RGB48/V4L2_PIX_FMT_BGR48: 48-bit RGB where each colour sample is 16-bits.
- V4L2_PIX_FMT_PISP_COMP1_MONO/V4L2_PIX_FMT_PISP_COMP2_MONO: 16-bit to 8-bit pisp compressed monochrome pixel format.
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Users have reported log spam created by "Event Ring Full" xHC event TRBs. These are caused by interrupt latency in conjunction with a very busy set of devices on the bus. The errors are benign, but throughput will suffer as the xHC will pause processing of transfers until the event ring is drained by the kernel. Expand the number of event TRB slots available by increasing the number of event ring segments in the ERST. Controllers have a hardware-defined limit as to the number of ERST entries they can process, so make the actual number in use min(ERST_MAX_SEGS, hw_max). Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
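The min(ERST_MAX_SEGS, hw_max) selection above can be sketched as follows. This is an illustration rather than the patch itself: the value given to `ERST_MAX_SEGS` is a placeholder, the helper names are hypothetical, and the decoding assumes the xHCI layout in which HCSPARAMS2 bits 7:4 hold the ERST Max exponent.

```c
#include <stdint.h>

/* Placeholder software cap; the value chosen by the patch is not stated above. */
#define ERST_MAX_SEGS 8u

/* Hardware limit, assuming HCSPARAMS2[7:4] encodes it as a power of two. */
static unsigned int erst_hw_max(uint32_t hcsparams2)
{
	return 1u << ((hcsparams2 >> 4) & 0xf);
}

/* Number of event ring segments actually used: min(ERST_MAX_SEGS, hw_max). */
static unsigned int erst_segs_in_use(uint32_t hcsparams2)
{
	unsigned int hw_max = erst_hw_max(hcsparams2);

	return hw_max < ERST_MAX_SEGS ? hw_max : ERST_MAX_SEGS;
}
```

A controller reporting an exponent of 1 would be limited to 2 segments, while one reporting 15 would be capped by the software limit.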
All the handling for the properties was present, but they were never attached to the connector to allow userspace to change them. Add them to the connector. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The step wise governor increases the mitigation level when the temperature goes above a threshold and decreases the mitigation when the temperature falls below the threshold. If the temperature hovers around a threshold, the mitigation will be applied and removed at every iteration. This reaction to the temperature is inefficient for performance. The use of hysteresis temperature could avoid this ping-pong of mitigation by relaxing the mitigation only when the temperature goes below this lower hysteresis value. Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org> Signed-off-by: Lina Iyer <ilina@codeaurora.org> drivers: thermal: step_wise: avoid throttling at hysteresis temperature after dropping below it Signed-off-by: Serge Schneider <serge@raspberrypi.org> Fix hysteresis support in gov_step_wise.c Directly get hyst value instead of going through an optional and, now, unimplemented function. Signed-off-by: Jürgen Kreileder <jk@blackdown.de>
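The hysteresis behaviour described above can be sketched as a small state function. This is not the code in gov_step_wise.c: the function name and the simple on/off model are assumptions made for illustration, and the exact boundary handling in the governor may differ.

```c
#include <stdbool.h>

/*
 * Decide whether mitigation should be active. temp, trip and hyst share
 * one unit (for example millidegrees Celsius).
 */
static bool throttle_with_hysteresis(int temp, int trip, int hyst,
				     bool was_throttling)
{
	if (temp >= trip)
		return true;		/* at or above the trip point: mitigate */
	if (temp < trip - hyst)
		return false;		/* below the hysteresis band: relax */
	return was_throttling;		/* inside the band: keep previous state */
}
```

With a trip point of 80000 and a hysteresis of 5000, a temperature of 78000 keeps mitigation on if it was already on, but does not start it, which avoids the ping-pong around the threshold.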
The mainline driver has implemented analogue gain using the control V4L2_CID_GAIN instead of V4L2_CID_ANALOGUE_GAIN. libcamera requires V4L2_CID_ANALOGUE_GAIN, and therefore fails. Update the driver to use V4L2_CID_ANALOGUE_GAIN. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
It is permitted for a plane to be configured such that none of it is on-screen via either negative dest rectangle X,Y offset, or just an offset that is greater than the crtc dimensions. These planes were resized via drm_atomic_helper_check_plane_state such that the source rectangle had a zero width or height, but they still created a dlist entry even though they contributed no pixels. In the case of vc6_plane_mode_set, that could result in negative values being written into registers, which caused incorrect behaviour. Drop planes that result in a source width or height of 0 pixels to avoid the incorrect rendering. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
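The off-screen case above can be sketched with plain rectangle clipping. This does not reproduce drm_atomic_helper_check_plane_state or the vc4 code; `visible_len()` and `plane_is_offscreen()` are hypothetical helpers that only show why a fully off-screen plane ends up with zero width or height.

```c
#include <stdbool.h>

/* Clip the span [pos, pos + size) to [0, limit) and return the visible length. */
static int visible_len(int pos, int size, int limit)
{
	int start = pos < 0 ? 0 : pos;
	int end = pos + size > limit ? limit : pos + size;

	return end > start ? end - start : 0;
}

/* A plane with no visible width or height contributes no pixels. */
static bool plane_is_offscreen(int x, int y, int w, int h,
			       int crtc_w, int crtc_h)
{
	return visible_len(x, w, crtc_w) == 0 ||
	       visible_len(y, h, crtc_h) == 0;
}
```

A caller would skip generating a dlist entry when `plane_is_offscreen()` returns true, rather than passing zero or negative sizes on to the register setup.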
Commit 7cd7065 ("drm/bridge: display-connector: implement bus fmts callbacks") added use of drm_atomic_helper_bridge_* functions, but didn't select the dependency of DRM_KMS_HELPER. If nothing else selected that dependency it resulted in a build failure. Select the missing dependency. Fixes: 7cd7065 ("drm/bridge: display-connector: implement bus fmts callbacks") Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
atomic_check creates a state, and allocates the dlist memory for it such that atomic_flush can not fail. On destroy that dlist allocation was being put in the stale list, even though it had never been programmed into the hardware, therefore doing lots of atomic_checks could consume all the dlist memory and fail. If the dlist has never been programmed into the hardware, then free it immediately. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
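The decision described above reduces to one flag per allocation. The sketch below uses hypothetical names (`struct dlist_alloc`, `dlist_on_state_destroy()`) and is not the driver's data structure; it only shows the rule that an allocation goes on the stale list solely when the hardware may still be using it.

```c
#include <stdbool.h>

enum dlist_fate { DLIST_FREE_NOW, DLIST_DEFER_AS_STALE };

/* Hypothetical stand-in for the driver's per-state dlist allocation. */
struct dlist_alloc {
	bool programmed;	/* set once the dlist was written to the hardware */
};

static enum dlist_fate dlist_on_state_destroy(const struct dlist_alloc *a)
{
	/* Only a programmed dlist can still be read by the hardware. */
	return a->programmed ? DLIST_DEFER_AS_STALE : DLIST_FREE_NOW;
}
```

States created by atomic_check alone never set the flag, so repeated checks no longer accumulate stale entries.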
The dmabuf import already checks that the backing buffer is contiguous and rejects it if it isn't. vc4 also requires that the buffer is in the bottom 1GB of RAM, and this is all correctly defined via dma-ranges. However the kernel silently uses swiotlb to bounce dma buffers around if they are in the wrong region. This relies on dma sync functions to be called in order to copy the data to/from the bounce buffer. DRM is based on all memory allocations being coherent with the GPU so that any updates to a framebuffer will be acted on without the need for any additional update. This is fairly fundamentally incompatible with needing to call dma_sync_ to handle the bounce buffer copies, and therefore we have to detect and reject mappings that use bounce buffers. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The moplet registers as VC4_ENCODER_TYPE_TXP1 and can be fed from mux output 2 of HVS channel 1. Correct the option which checked for VC4_ENCODER_TYPE_TXP0 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
MOP uses register offset 0x24 for the high bits of the address, whilst Moplet uses 0x1c. Handle this difference between the block types. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
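One way to handle the difference above is to select the offset by block type. The enum and function names here are hypothetical and only the two offsets come from the commit message.

```c
/* Hypothetical block-type tag; the driver may distinguish the blocks differently. */
enum txp_block { TXP_BLOCK_MOP, TXP_BLOCK_MOPLET };

/* Register offset holding the high bits of the destination address. */
static unsigned int txp_addr_high_offset(enum txp_block block)
{
	return block == TXP_BLOCK_MOP ? 0x24 : 0x1c;
}
```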
Add YAML device tree bindings for the ROHM BU64754 VCM Motor Driver for Camera Autofocus. Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Ensure the transmit FIFO has emptied before ending the transfer by dropping the TX threshold to 0 when the last byte has been pushed into the FIFO. Include a similar fix for the non-IRQ paths. See: raspberrypi#6285 Fixes: 6014649 ("spi: dw: Save bandwidth with the TMOD_TO feature") Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The DW SPI interface has a 16-bit clock divider, where the bottom bit of the divisor must be 0. Limit how low the clock speed can go to prevent the clock divider from being truncated, as that could lead to a much higher clock rate than requested. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
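The divider limit above can be worked through in a short sketch. The helper names and `DW_SPI_MAX_DIV` are assumptions for illustration, not the driver's identifiers; the arithmetic simply follows from a 16-bit divisor whose bottom bit must be 0.

```c
#include <stdint.h>

#define DW_SPI_MAX_DIV 0xfffeu	/* largest even value that fits in 16 bits */

/* Lowest speed reachable without the divider overflowing 16 bits. */
static uint32_t dw_spi_min_speed(uint32_t max_freq)
{
	return (max_freq + DW_SPI_MAX_DIV - 1) / DW_SPI_MAX_DIV;
}

/* Clamp the request to the minimum speed, then round the divider up to even. */
static uint32_t dw_spi_clk_div(uint32_t max_freq, uint32_t speed_hz)
{
	uint32_t min_speed = dw_spi_min_speed(max_freq);
	uint32_t div;

	if (speed_hz < min_speed)
		speed_hz = min_speed;
	div = (max_freq + speed_hz - 1) / speed_hz;	/* round up */
	return (div + 1) & 0xfffeu;
}
```

For an assumed 200 MHz input clock the minimum speed works out to 3052 Hz. Without the clamp, a 1000 Hz request would need a divider of 200000, which truncates to 3392 in 16 bits and yields a clock far faster than requested.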
Signed-off-by: Phil Elwell <phil@raspberrypi.com>
There is now an ssd1327-spi overlay, but it's of little use without the corresponding display drivers. Add them as modules to the usual defconfig files. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Using the "cores * 1.5" heuristic, configure the kernel builds for the 4-core GitHub-hosted runners. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
The DT property for the BQ32000 controlled by the trickle-resistor-ohms parameter should be "trickle-resistor-ohms", not "abracon,tc-resistor". See: raspberrypi#6291 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Many HD44780 LCD displays are connected via very common I2C GPIO expander. We have an overlay for connecting the displays directly to GPIOs, but not one for it connected via a backpack. Add such an overlay. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The default values defining a 16x2 display weren't documented, so add them. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
The corresponding driver implementation has seen sufficient testing, so enable by default. Retain the dtparam so it can be turned off for test. Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
In the same way that other subsystems support the setting of device id numbers from Device Tree aliases, allow gpiochip numbers to be derived from "gpiochip<n>" aliases. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add a gpiochip0 alias pointing to the rp1 GPIO node, making it appear as gpiochip0. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Make the BCM2712's onboard GPIOs start at gpiochip10, marking them out as system resources and preventing accidental use by existing Pi 5 code. Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Allow block devices to be used as caches for other devices. The primary use is to allow small, low latency media to act as caches for spinning rust drives. See: raspberrypi#6303 raspberrypi#455 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Add CONFIG_ZRAM_WRITEBACK=y and CONFIG_ZRAM_MULTI_COMP=y. See: raspberrypi#2939 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This reverts commit abb1ad6. See: raspberrypi#6294 Signed-off-by: Phil Elwell <phil@raspberrypi.com>
This patch adds definitions related to the Branch Record Buffer Extension (BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver in subsequent patches. Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent generated definitions. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Changed BRBIDR0_EL1 register fields CC and FORMAT, updated the commit message arch/arm64/include/asm/sysreg.h | 17 ++--- arch/arm64/tools/sysreg | 131 ++++++++++++++++++++++++++++++++ 2 files changed, 137 insertions(+), 11 deletions(-)
The Branch Record Buffer Extension (BRBE) adds a number of system registers and instructions, which we don't currently intend to expose to guests. Our existing logic handles this safely, but this could be improved with some explicit handling of BRBE. The presence of BRBE is currently hidden from guests as the cpufeature code's ftr_id_aa64dfr0[] table doesn't have an entry for the BRBE field, and so this will be zero in the sanitised value of ID_AA64DFR0 exposed to guests via read_sanitised_id_aa64dfr0_el1(). As the ftr_id_aa64dfr0[] table may gain an entry for the BRBE field in future, for robustness we should explicitly mask out the BRBE field in read_sanitised_id_aa64dfr0_el1(). The BRBE system registers and instructions are currently trapped by the existing configuration of the fine-grained traps. As neither the registers nor the instructions are described in the sys_reg_descs[] table, emulate_sys_reg() will warn that these are unknown before injecting an UNDEFINED exception into the guest. Well-behaved guests shouldn't try to use the registers or instructions, but badly-behaved guests could use these, resulting in unnecessary warnings. To avoid those warnings, we should explicitly handle the BRBE registers and instructions as UNDEFINED. Address the above by having read_sanitised_id_aa64dfr0_el1() mask out the ID_AA64DFR0.BRBE field, and explicitly handling all of the BRBE system registers and instructions as UNDEFINED. 
Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Updated the commit message arch/arm64/kvm/sys_regs.c | 56 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) Reviewed-by: Mark Rutland <mark.rutland@arm.com>
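The field masking described in this commit can be sketched in isolation. The macro and function names below are hypothetical (the kernel uses its generated sysreg definitions), and the sketch assumes the architectural placement of ID_AA64DFR0_EL1.BRBE in bits [55:52].

```c
#include <stdint.h>

/* Assumed field position: ID_AA64DFR0_EL1.BRBE, bits [55:52]. */
#define DFR0_BRBE_SHIFT	52
#define DFR0_BRBE_MASK	(UINT64_C(0xf) << DFR0_BRBE_SHIFT)

/* Hide BRBE from a guest by forcing the field to zero (not implemented). */
static uint64_t hide_brbe(uint64_t dfr0)
{
	return dfr0 & ~DFR0_BRBE_MASK;
}
```

Masking explicitly means the guest-visible value stays zero even if the sanitised feature table later gains an entry for the field.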
In order to support the Branch Record Buffer Extension (BRBE), we need to extend the arm_pmu framework with some basic infrastructure for branch stack sampling which arm_pmu drivers can opt-in to using. Subsequent patches will use this to add support for BRBE in the PMUv3 driver. With BRBE, the hardware records branches into a hardware FIFO, which will be sampled by software when perf events overflow. A task may be context- switched an arbitrary number of times between overflows, and to avoid losing samples we need to save the current records when a task is context- switched out. To do these we'll need to use the pmu::sched_task() callback, and we'll also need to allocate some per-task storage space via event flag PERF_ATTACH_TASK_DATA. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Scan valid branch stack events in armpmu_start() to create merged filter - Updated the commit message drivers/perf/arm_pmu.c | 42 +++++++++++++++++++++++++++++++++--- include/linux/perf/arm_pmu.h | 32 ++++++++++++++++++++++++++- 2 files changed, 70 insertions(+), 4 deletions(-)
Fine grained trap control for BRBE registers, and instruction access need to be configured in HDFGRTR_EL2, HDFGWTR_EL2 and HFGITR_EL2 registers when kernel enters at EL1 but EL2 is present. This changes __init_el2_fgt() as required. Similarly cycle and mis-prediction capture need to be enabled in BRBCR_EL1 and BRBCR_EL2 when the kernel enters either into EL1 or EL2. This adds new __init_el2_brbe() to achieve this objective. This also updates Documentation/arch/arm64/booting.rst with all the above EL2 along with MDCR_EL3.SBRBE requirements. First this replaces an existing hard encoding (1 << 62) with corresponding applicable macro HDFGRTR_EL2_nPMSNEVFR_EL1_MASK. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Dropped ifdef CONFIG_ARM64_BRBE around __init_el2_brbe() - Updated the in code comment around __init_el2_brbe() - Dropped the write up for EL2->EL1 transition, moved up the EL3 write up Documentation/arch/arm64/booting.rst | 21 +++++++ arch/arm64/include/asm/el2_setup.h | 87 +++++++++++++++++++++++++++- 2 files changed, 105 insertions(+), 3 deletions(-)
This extends the recently added branch stack sampling framework in ARMV8 PMU to enable such events via a new architecture feature called Branch Record Buffer Extension aka BRBE. This implements all the armv8pmu_branch_xxx() callbacks as expected at ARMV8 PMU level required to drive perf branch stack sampling events. This adds a new config option CONFIG_ARM64_BRBE to encapsulate this BRBE based implementation, available only on ARM64 platforms. BRBE hardware captures a branch record via three distinct system registers representing branch source address, branch target address, and other branch information. A BRBE buffer implementation is organized as multiple banks of 32 branch records each, which is a collection of BRBSRC_EL1, BRBTGT_EL1 and BRBINF_EL1 registers. However, the total number of BRBE record entries, i.e. BRBE_MAX_ENTRIES, cannot exceed MAX_BRANCH_RECORDS as defined for ARM PMU. Branch stack sampling is enabled and disabled along with regular PMU events. This adds required function callbacks in armv8pmu_branch_xxx() format, to drive the PMU branch stack hardware when supported. This also adds fallback stub definitions for these callbacks for PMUs which would not have required support. BRBE hardware attributes get captured in a new reg_brbidr element in struct arm_pmu during armv8pmu_branch_probe() which is called from broader probing function __armv8pmu_probe_pmu(). Attributes such as number of branch record entries implemented in the hardware can be derived from armpmu->reg_brbidr. BRBE gets enabled via armv8pmu_branch_enable() where it also derives branch filter, and additional requirements from event's 'attr.branch_sample_type' and configures them via BRBFCR_EL1 and BRBCR_EL1 registers. PMU event overflow triggers IRQ, where current branch records get captured, stitched along with older records available in 'task_ctx', before getting processed for core perf ring buffer.
Task context switch outs incrementally save current branch records in event's 'pmu_ctx->task_ctx_data' to optimize workload's branch record samples. In case multiple events with different branch sample type requests converge on the same PMU, BRBE gets enabled for the merged branch filter accommodating all those events' branch sample types. Captured branch records get filtered in software for an overflown event if BRBE hardware config does not match its branch sample type, while handling the PMU IRQ. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Replaced BRBIDR0_EL1_FORMAT_0 as BRBIDR0_EL1_FORMAT_FORMAT_0 in BRBE driver - Added SW filtering framework in read_branch_records() during filter mismatch - Added SW filtering for both privilege modes and branch types drivers/perf/Kconfig | 11 + drivers/perf/Makefile | 1 + drivers/perf/arm_brbe.c | 1198 +++++++++++++++++++++++++++++++ drivers/perf/arm_pmuv3.c | 160 ++++- drivers/perf/arm_pmuv3_branch.h | 83 +++ include/linux/perf/arm_pmu.h | 5 + 6 files changed, 1457 insertions(+), 1 deletion(-) create mode 100644 drivers/perf/arm_brbe.c create mode 100644 drivers/perf/arm_pmuv3_branch.h
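The merged-filter and software-filtering idea in this commit can be sketched as below. Everything here is a simplified assumption: the `BR_*` bits stand in for perf's branch sample type flags, `struct branch_rec` is a hypothetical record layout, and the real driver also filters on privilege mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch-type bits standing in for branch_sample_type flags. */
#define BR_CALL		(1u << 0)
#define BR_RETURN	(1u << 1)
#define BR_COND		(1u << 2)

struct branch_rec {
	uint64_t from;
	uint64_t to;
	uint32_t type;	/* exactly one BR_* bit */
};

/* The hardware is programmed with the union of every event's filter. */
static uint32_t merge_filters(const uint32_t *filters, size_t n)
{
	uint32_t merged = 0;

	for (size_t i = 0; i < n; i++)
		merged |= filters[i];
	return merged;
}

/*
 * Keep only the records one event asked for. Records are compacted in
 * place and the new count is returned.
 */
static size_t filter_records(struct branch_rec *recs, size_t n, uint32_t want)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (recs[i].type & want)
			recs[kept++] = recs[i];
	return kept;
}
```

An event whose own filter equals the merged filter can skip the second step; only events with a narrower request need their records filtered in software.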
Disable the BRBE before we enter the guest, saving the status, and enable it again once we get out of the guest. This avoids capturing branch records in the guest kernel or userspace, which would confuse the host samples. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> ---- Changes in V18: - Used host_data_ptr() to access host_debug_state.brbcr_el1 register - Changed DEBUG_STATE_SAVE_BRBE to use BIT(7) - Reverted back iflags as u8 arch/arm64/include/asm/kvm_host.h | 3 +++ arch/arm64/kvm/debug.c | 5 +++++ arch/arm64/kvm/hyp/nvhe/debug-sr.c | 31 ++++++++++++++++++++++++++++++ 3 files changed, 39 insertions(+)
The test runs quite slowly in the model, so replace "xargs -n1" with "tr ' ' '\n'" which does the same thing but in single digit minutes instead of double digit minutes. Also reduce the number of loops in the test application. Unfortunately this causes intermittent failures on x86, presumably because the sampling interval is too big to pickup any loops, so keep it the same there. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
In the perf script command, spaces are turned into newlines. But when there is a double space this results in empty lines which fail the following inverse grep test, so strip the empty lines. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Add Arm64 BRBE-specific testing to the existing branch stack sampling test. The test currently passes on the Arm FVP RevC model, but no hardware has been tested yet. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-kernel@vger.kernel.org Co-developed-by: German Gomez <german.gomez@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
popcornmix force-pushed the rpi-6.10.y branch 3 times, most recently from 760d872 to 6598edf, September 12, 2024 14:20
popcornmix force-pushed the rpi-6.10.y branch from e1dadc4 to e15745e, October 10, 2024 12:54
Do not merge.
This is for testing a patch set that adds branch stack sampling to perf, which can be used by BOLT.