ROCm · randyh62 · Aug 3, 2024
@@ -34,7 +34,6 @@ enum
 embeded
 extern
 fatbinary
-foundationally
 frontends
 gedit
 GPGPU
@@ -45,12 +44,14 @@ HIP's
 hipcc
 hipexamine
 hipified
+HIPify
 hipother
 HIPRTC
 hcBLAS
 icc
 IILE
 iGPU
+inlined
 inplace
 Interoperation
 interoperate
@@ -71,6 +72,7 @@ ltrace
 makefile
 Malloc
 malloc
+MALU
 multicore
 multigrid
 multithreading
@@ -80,19 +82,22 @@ nonnegative
 NOP
 Numa
 Nsight
+omnitrace
 overindex
 overindexing
 oversubscription
 pragmas
 preconditioners
 prefetched
 preprocessor
+profilers
 PTX
 PyHIP
 queryable
 prefetching
 representable
 RMW
+rocgdb
 ROCm's
 rocTX
 RTC
@@ -108,7 +113,6 @@ structs
 SYCL
 syntaxes
 tradeoffs
-templated
 typedefs
 UMM
 variadic

@@ -1,27 +1,20 @@
 # HIP documentation
 
 The Heterogeneous-computing Interface for Portability (HIP) API is a C++ runtime
-API and kernel language that lets developers create portable applications for AMD
-and NVIDIA GPUs from single source code.
+API and kernel language that lets developers create portable applications that run
+on heterogeneous systems with CPUs and AMD or NVIDIA GPUs, from a single source code base.
+HIP provides a simple marshalling language to access either the AMD ROCm back-end
+or the NVIDIA CUDA back-end to build and run application kernels. For more information,
+see [Introduction to HIP](./understand/introduction_to_hip).
 
-For HIP supported AMD GPUs on multiple operating systems, see:
-
-* [Linux system requirements](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus)
-* [Microsoft Windows system requirements](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#windows-supported-gpus)
-
-The CUDA enabled NVIDIA GPUs are supported by HIP. For more information, see [GPU Compute Capability](https://developer.nvidia.com/cuda-gpus).
-
-On the AMD ROCm platform, HIP provides header files and runtime library built on top of HIP-Clang compiler in the repository [Common Language Runtimes (CLR)](./understand/amd_clr), which contains source codes for AMD's compute languages runtimes as follows,
-
-On non-AMD platforms, like NVIDIA, HIP provides header files required to support non-AMD specific back-end implementation in the repository ['hipother'](https://github.com/ROCm/hipother), which translates from the HIP runtime APIs to CUDA runtime APIs.
-
-## Overview
+The HIP documentation is organized as follows:
 
 ::::{grid} 1 1 2 2
 :gutter: 3
 
 :::{grid-item-card} Install
 
+* [Linux system requirements](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus)
 * [Installing HIP](./install/install)
 * [Building HIP from source](./install/build)
 
@@ -30,37 +23,32 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 :::{grid-item-card} Conceptual
 
 * {doc}`./understand/programming_model`
-* {doc}`./understand/programming_model_reference`
 * {doc}`./understand/hardware_implementation`
 * {doc}`./understand/amd_clr`
 
 :::
 
 :::{grid-item-card} How to
 
-* [Programming manual](./how-to/programming_manual)
-* [HIP porting guide](./how-to/hip_porting_guide)
-* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
+* [Programming Manual](./how-to/programming_manual)
+* [HIP Porting Guide](./how-to/hip_porting_guide)
+* [HIP Porting: Driver API Guide](./how-to/hip_porting_driver_api)
 * {doc}`./how-to/hip_rtc`
 * {doc}`./how-to/performance_guidelines`
 * [Debugging with HIP](./how-to/debugging)
 * {doc}`./how-to/logging`
-* [Unified memory](./how-to/unified_memory)
-* [Cooperative Groups](./how-to/cooperative_groups)
+* [Unified Memory](./how-to/unified_memory)
 * {doc}`./how-to/faq`
 
 :::
 
 :::{grid-item-card} Reference
 
 * {doc}`/doxygen/html/index`
-* [C++ language extensions](./reference/cpp_language_extensions)
-* [C++ language support](./reference/cpp_language_support)
-* [HIP math API](./reference/math_api)
-* [Comparing syntax for different APIs](./reference/terms)
-* [HSA runtime API for ROCm](./reference/virtual_rocr)
-* [HIP managed memory allocation API](./reference/unified_memory_reference)
-* [HIP Cooperative Groups API](./reference/cooperative_groups)
+* [C++ Language Extensions](./reference/cpp_language_extensions)
+* [Comparing Syntax for Different APIs](./reference/terms)
+* [HSA Runtime API for ROCm](./reference/virtual_rocr)
+* [HIP Managed Memory Allocation API](./reference/unified_memory_reference)
 * [List of deprecated APIs](./reference/deprecated_api_list)
 
 :::
@@ -71,8 +59,6 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support
 * [HIP examples](https://github.com/ROCm/HIP-Examples)
 * [HIP test samples](https://github.com/ROCm/hip-tests/tree/develop/samples)
 * [SAXPY tutorial](./tutorial/saxpy)
-* [Reduction tutorial](./tutorial/reduction)
-* [Cooperative groups tutorial](./tutorial/cooperative_groups_tutorial)
 
 :::
 

@@ -0,0 +1,28 @@
+.. meta::
+  :description: This chapter provides an introduction to the HIP API.
+  :keywords: AMD, ROCm, HIP, CUDA, C++ language extensions
+
+.. _intro-to-hip:
+
+*******************************************************************************
+Introduction to HIP
+*******************************************************************************
+
+The Heterogeneous-computing Interface for Portability (HIP) is a C++ runtime API and kernel language that lets you create portable applications for AMD and NVIDIA GPUs from a single source code base.
+
+* HIP is a thin API with little or no performance impact over coding directly in NVIDIA CUDA or AMD ROCm.
+* HIP enables coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
+* The :doc:`HIPify <hipify:index>` tools convert source from CUDA to HIP.
+* Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.
+
+HIP includes the runtime API, kernel language, compilers (``clang``, ``hipcc``), code profilers (``rocprof``, ``omnitrace``), debugging tools (``rocgdb``), and libraries to create heterogeneous applications running on both CPUs and GPUs. HIP provides marshalling libraries like :doc:`hipFFT <hipfft:index>` or :doc:`hipBLAS <hipblas:index>` that act as a thin programming layer over either NVIDIA CUDA or AMD ROCm, so that either platform can serve as the back-end. These libraries offer pointer-based memory interfaces and are easily integrated into your applications.
+
+HIP supports the ability to build and run on either AMD GPUs or NVIDIA GPUs. GPU programmers familiar with NVIDIA CUDA or OpenCL will find the HIP API familiar and easy to use. Developers no longer need to choose between AMD and NVIDIA GPUs. You can quickly port your application to run on the available hardware while maintaining a single codebase. The HIPify tools, based on the clang front-end and the Perl language, can convert CUDA API calls into the corresponding HIP calls. However, HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to port existing projects, as described in the `HIP Porting Guide <../how-to/hip_porting_guide.html>`_.
+
+For the AMD ROCm platform, HIP provides headers and a runtime library built on top of the HIP-Clang compiler in the repository `Common Language Runtime (CLR) <./amd_clr.html>`_. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm can use :doc:`ROCgdb <rocgdb:index>` for debugging and :doc:`ROCProfiler <rocprofiler:index>` for profiling.
+
+For the NVIDIA CUDA platform, HIP provides header files in the repository `hipother <https://github.com/ROCm/hipother>`_, which translate from the HIP runtime APIs to CUDA runtime APIs. The header files contain mostly inlined functions and thus have very low overhead. Developers coding in HIP should expect the same performance as coding in native CUDA. The code is then compiled with ``nvcc``, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA debugger and profiler.
+
+HIP is designed to work seamlessly with the ROCm Runtime (ROCr). HIP provides two types of APIs: those that run on the CPU, also known as the host system, and those that run on GPUs, or accelerators. The host-based code is used to create device buffers, move data between the host application and a device, launch the device code (also known as a kernel), manage streams and events, and perform synchronization. The device or kernel code, running on GPUs, provides significantly increased performance over CPUs for certain types of functions, as described in the `Programming Model <./programming_model.html>`_.
+
+In summary, HIP simplifies cross-platform development, maintains performance, and provides a familiar C++ experience for GPU programming that runs seamlessly on both AMD and NVIDIA GPUs.