Learning and practice of high performance computing

Install

git clone

Application

pocket-ai -- A Portable Toolkit for deploying Edge AI and HPC.

https://github.com/cjmcv/pocket-ai

Practice

cux -- An experimental framework for performance analysis and optimization of CUDA kernel functions.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/cux

tag: cuda / simd / openmp.

mrpc -- Mini-RPC, based on asio.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/mrpc

tag: distributed computing.

DEPRECATED

hcs -- A heterogeneous computing system for multi-task scheduling optimization.

vky -- A Vulkan-based computing framework.

"hcs" and "vky" have been moved to pocket-ai and renamed graph and vk, respectively.

Learning

Distributed computing

mpi/mpi4py

alg_matrix_multiply ： gemm: C = A * B.
base_broadcast_scatter_gather ： Record the basic usage of Bcast, Scatter, Gather and Allgather.
base_group ： Group communication.
base_hello_world ： Environment Management Routines.
base_reduce_alltoall_scan ： Record the basic usage of Reduce, Allreduce, Alltoall, Scan and Exscan.
base_send_recv ： Record the basic usage of MPI_Send/MPI_Recv and MPI_Isend/MPI_Irecv.
base_type_contiguous ： Send and receive custom types of data by using MPI_Type_contiguous.
base_type_struct ： Send and receive custom types of data by using MPI_Type_struct.
util_bandwidth_test ： Test bandwidth by point-to-point communications.
py_base_broadcast_scatter_gather ： Record the basic usage of Bcast, Scatter, Gather and Allgather.
py_base_reduce_scan ： Record the basic usage of Reduce and Scan.
py_base_send_recv ： Record the basic usage of Send and Recv.

Heterogeneous computing

cuda

cuda_util ： Utility functions.
alg_histogram ： Histogram; mainly introduces atomicAdd.
alg_matrix_multiply ： gemm: C = A * B.
alg_vector_add ： Vector addition: C = A + B.
alg_vector_dot_product ： Vector dot product: h_result = SUM(A * B).
alg_vector_scan ： Scan. Prefix Sum.
base_aligned_memory_access ： An experiment on aligned memory access.
base_bank_conflict ： An experiment on Bank Conflict in Shared Memory.
base_coalesced_memory_access ： An experiment on coalesced memory access.
base_float2half ： Record the basic usage of float2half.
base_graph ： Record the basic usage of CUDA Graphs.
base_hyperQ ： Demonstrate how HyperQ allows supporting devices to avoid false dependencies between kernels in different streams.
base_kernel_layout ： Record the basic execution configuration of kernel.
base_occupancy ： Record the basic usage of cudaOccupancyMaxPotentialBlockSize.
base_texture ： Record the basic usage of Texture Memory.
base_unified_memory ： A simple task consumer using threads and streams with all data in Unified Memory.
base_zero_copy ： Record the basic usage of Zero Copy.
cub_block_reduce ： Simple demonstration of cub::BlockReduce.
cub_block_scan ： Simple demonstration of cub::BlockScan.
cub_device_reduce ： Simple demonstration of cub::DeviceReduce::Sum.
cub_device_scan ： Simple demonstration of cub::DeviceScan::ExclusiveSum.
cub_warp_reduce ： Simple demonstration of cub::WarpReduce.
cub_warp_scan ： Simple demonstration of cub::WarpScan.
cublas_gemm_float16 ： gemm: C = A * B. Uses cuBLAS with half precision.
thrust_iterators ： Record the basic usage of Iterators in Thrust.
thrust_sort ： Sort arrays with Thrust.
thrust_transformations ： Some of the parallel vector operations in Thrust.
thrust_vector ： Record the basic usage of Vector in Thrust.
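Several entries above (alg_vector_scan, cub_block_scan, cub_device_scan) revolve around inclusive and exclusive prefix sums. A host-side reference such as the following sketch is a common way to check GPU scan results. It is not the repository's code, and the function names inclusive_prefix_sum and exclusive_prefix_sum are hypothetical. Inclusive scan here uses std::partial_sum; the exclusive variant is written as an explicit loop for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + in[1] + ... + in[i]
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[0] = 0, out[i] = in[0] + ... + in[i - 1]
// (the convention used by cub::DeviceScan::ExclusiveSum).
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;  // sum of all elements strictly before i
        running += in[i];
    }
    return out;
}
```

For the input {3, 1, 4, 1, 5}, the inclusive result is {3, 4, 8, 9, 14} and the exclusive result is {0, 3, 4, 8, 9}.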

vulkan

vky

opencl

ocl_util ： Utility functions.
alg_dot_product ： Vector dot product: h_result = SUM(A * B).
alg_vector_add ： Vector addition: C = A + B.
base_platform_info ： Query OpenCL platform information.

Thread

std

alg_quick_sort： Quick sort using std::thread.
alg_vector_dot_product： Vector dot product: h_result = SUM(A * B). Records the basic usage of std::thread and thread synchronization.
base_async： Record the basic usage of std::async.
util_blocking_queue： Blocking queue. Mainly implemented by thread, queue and condition_variable.
util_internal_thread： Internal Thread. Mainly implemented by std::thread.
util_thread_pool： Thread Pool. Mainly implemented by thread, queue, future and condition_variable.
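util_blocking_queue and util_thread_pool are described as built from std::thread, std::queue, std::future and std::condition_variable. The following is a minimal sketch of how those pieces can fit together. It is not the repository's implementation. The class name ThreadPool and the method Submit are hypothetical, and the sketch assumes a fixed worker count with FIFO task order.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size thread pool: workers pop tasks from a shared queue
// guarded by a mutex and sleep on a condition_variable while it is empty.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { WorkerLoop(); });
        }
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cond_.notify_all();  // wake every worker so it can observe stop_
        for (auto& t : workers_) t.join();
    }

    // Enqueues a callable and returns a future for its result.
    template <typename F>
    auto Submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only; a shared_ptr makes it storable in std::function.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cond_.notify_one();
        return result;
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cond_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                // Drain queued tasks before exiting so no future is left unfulfilled.
                if (stop_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool stop_ = false;
};
```

Submitting ten tasks that each return i * i and summing the futures' results gives 285. Exceptions thrown inside a task are captured by std::packaged_task and rethrown from the corresponding future's get().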

openmp

alg_matrix_multiply ： gemm: C = A * B.
alg_pi_calculate ： Calculate PI using parallel, for and reduction.
base_flush ： Records the basic usage of flush.
base_mutex ： Mutual exclusion in OpenMP, including critical, atomic and lock.
base_parallel_for ： Parallel and For.
base_schedule ： Records the basic usage of schedule.
base_sections_single ： Records the basic usage of Sections and Single.
base_synchronous ： Synchronization in OpenMP, including barrier, ordered and master.
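alg_pi_calculate combines parallel, for and reduction. The following sketch shows that pattern with the usual midpoint-rule integration of 4 / (1 + x^2) over [0, 1]. It is not taken from the repository, and the function name calculate_pi is hypothetical. It compiles with or without OpenMP enabled (for example -fopenmp on GCC/Clang). Without it, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>

// Approximates pi as the integral of 4 / (1 + x^2) over [0, 1] (midpoint rule).
// "parallel for" splits the iterations across threads; reduction(+ : sum)
// gives each thread a private partial sum that OpenMP adds together at the end.
double calculate_pi(long num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < num_steps; ++i) {
        const double x = (static_cast<double>(i) + 0.5) * step;  // midpoint of interval i
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

Without the reduction clause, every thread would update the shared sum concurrently, which is a data race. Because floating-point addition is not associative, results with different thread counts may differ in the last few bits.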

tbb

base_allocator ： The basic use of allocator.
base_atomic ： The basic use of atomic.
base_concurrent_hash_map ： The basic use of concurrent_hash_map.
base_concurrent_queue ： The basic use of concurrent_queue.
base_mutex ： The basic use of mutex in tbb.
base_parallel_for ： The basic use of parallel_for.
base_parallel_reduce ： The basic use of parallel_reduce.
base_parallel_scan ： The basic use of parallel_scan.
base_parallel_sort ： The basic use of parallel_sort.
base_task_scheduler ： The basic use of the task scheduler.
count_strings ： Count strings. Use the concurrent_hash_map.

Coroutines

libco

asyncio

base_future： Record the basic usage of future.
base_gather： Use gather to execute tasks in parallel.
base_hello_world： Hello world. Record the basic usage of async, await and loop.
base_loop_chain： Executes nested coroutines.

SIMD

sse/avx

matrix_multiply ： Matrix Multiplication.
matrix_transpose ： Matrix Transpose.
vector_dot_product ： Vector dot product: result = SUM(A * B).
vector_scan ： Scan. Prefix Sum.
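As an illustration of the sse/avx examples above, the following sketches a dot product with SSE intrinsics. It is not the repository's code. The function name dot_product and the macro DOT_USE_SSE are hypothetical. The sketch assumes an x86 target with SSE, processes four floats per iteration, and falls back to a plain scalar loop elsewhere. An AVX version would follow the same structure with __m256 and the _mm256_loadu_ps / _mm256_mul_ps / _mm256_add_ps intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define DOT_USE_SSE 1
#endif

// Dot product: result = SUM(a[i] * b[i]) for i in [0, n).
float dot_product(const float* a, const float* b, std::size_t n) {
    float result = 0.0f;
    std::size_t i = 0;
#ifdef DOT_USE_SSE
    __m128 acc = _mm_setzero_ps();  // four running partial sums
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  // horizontal sum of the four lanes
    result = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail for the remaining n % 4 elements (or the whole loop without SSE).
    for (; i < n; ++i) result += a[i] * b[i];
    return result;
}
```

The SIMD version sums in a different order than a plain loop, so for general inputs the result can differ from a scalar reference in the last bits. Comparisons should therefore use a tolerance unless the inputs are exactly representable.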

neon

matrix_multiply ： Matrix Multiplication.
matrix_transpose ： Matrix Transpose.
