diff --git a/README.md b/README.md
index e2d7920b1..0ab356325 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,31 @@
 Take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training.
-You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing). A StableDiffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
+You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing).
+A Stable Diffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
 ![demo-gif](demo.gif)
 ## Disclaimer
+
 This software is meant to be a productive contribution to the rapidly growing AI-generated media industry. It will help artists with tasks such as animating a custom character or using the character as a model for clothing etc.
 The developers of this software are aware of its possible unethical applications and are committed to take preventative measures against them. It has a built-in check which prevents the program from working on inappropriate media including but not limited to nudity, graphic content, sensitive material such as war footage etc. We will continue to develop this project in the positive direction while adhering to law and ethics. This project may be shut down or include watermarks on the output if requested by law.
 Users of this software are expected to use this software responsibly while abiding the local law. If face of a real person is being used, users are suggested to get consent from the concerned person and clearly mention that it is a deepfake when posting content online. Developers of this software will not be responsible for actions of end-users.
-## How do I install it?
+## How to install?
 ### Basic
-It is more likely to work on your computer but it will also be very slow.
You can follow instructions for the basic install [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
+It is more likely to work on your computer, but will be quite slow. Follow instructions for the basic installation [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
 ### Acceleration
-If you have a good GPU and are ready for solving any software issues you may face, you can enable GPU which is wayyy faster. To do this, first follow the basic install instructions given above and then follow GPU-specific instructions [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+If you own a capable GPU and are prepared to address any software problems, you have the option to activate such acceleration, which offers significantly enhanced speed. Once you finished the basic installation, you can follow the instructions for the acceleration installation [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+
+## How to use?
-## How do I use it?
+### UI
 Executing `python run.py` command will launch this window:
@@ -29,33 +33,38 @@ Executing `python run.py` command will launch this window:
 Choose a face (image with desired face) and the target image/video (image/video in which you want to replace the face) and click on `Start`. Open file explorer and navigate to the directory you select your output to be in. You will find a directory named `` where you can see the frames being swapped in realtime. Once the processing is done, it will create the output file. That's it.
-Additional command line arguments are given below. To learn out what they do, check [this guide](https://github.com/s0md3v/roop/wiki/Advanced-Options).
+## CLI
+
+Additional command line arguments are given below. To learn out what they do, check the guide [here](https://github.com/s0md3v/roop/wiki/Advanced-Options).
 ```
 options:
-  -h, --help  show this help message and exit
-  -s SOURCE_PATH, --source SOURCE_PATH  select an source image
-  -t TARGET_PATH, --target TARGET_PATH  select an target image or video
-  -o OUTPUT_PATH, --output OUTPUT_PATH  select output file or directory
-  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]  frame processors (choices: face_swapper, face_enhancer, ...)
-  --keep-fps  keep target fps
-  --keep-frames  keep temporary frames
-  --skip-audio  skip target audio
-  --many-faces  process every face
-  --reference-face-position REFERENCE_FACE_POSITION  position of the reference face
-  --reference-frame-number REFERENCE_FRAME_NUMBER  number of the reference frame
-  --similar-face-distance SIMILAR_FACE_DISTANCE  face distance used for recognition
-  --video-encoder {libx264,libx265,libvpx-vp9}  adjust output video encoder
-  --video-quality [0-51]  adjust output video quality
-  --max-memory MAX_MEMORY  maximum amount of RAM in GB
-  --execution-provider {cpu} [{cpu} ...]  available execution provider (choices: cpu, ...)
-  --execution-threads EXECUTION_THREADS  number of execution threads
-  -v, --version  show program's version number and exit
+  -h, --help  show this help message and exit
+  -s SOURCE_PATH, --source SOURCE_PATH  select an source image
+  -t TARGET_PATH, --target TARGET_PATH  select an target image or video
+  -o OUTPUT_PATH, --output OUTPUT_PATH  select output file or directory
+  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]  frame processors (choices: face_swapper, face_enhancer, ...)
+  --keep-fps  keep target fps
+  --keep-frames  keep temporary frames
+  --skip-audio  skip target audio
+  --many-faces  process every face
+  --reference-face-position REFERENCE_FACE_POSITION  position of the reference face
+  --reference-frame-number REFERENCE_FRAME_NUMBER  number of the reference frame
+  --similar-face-distance SIMILAR_FACE_DISTANCE  face distance used for recognition
+  --temp-frame-format {jpg,png}  image format used for frame extraction
+  --temp-frame-quality [1-100]  image quality used for frame extraction
+  --output-video-encoder {libx264,libx265,libvpx-vp9,h264_nvenc,hevc_nvenc}  encoder used for the output video
+  --output-video-quality [1-100]  quality used for the output video
+  --max-memory MAX_MEMORY  maximum amount of RAM in GB
+  --execution-provider {cpu} [{cpu} ...]  available execution provider (choices: cpu, ...)
+  --execution-threads EXECUTION_THREADS  number of execution threads
+  -v, --version  show program's version number and exit
 ```
-Looking for a CLI mode? Using the -s/--source argument will make the run program in cli mode.
+Using the `-s/--source`, `-t/--target` and `-o/--output` argument will run the program in headless mode.
 ## Credits
+
 - [henryruhs](https://github.com/henryruhs): for being an irreplaceable contributor to the project
 - [ffmpeg](https://ffmpeg.org/): for making video related operations easy
 - [deepinsight](https://github.com/deepinsight): for their [insightface](https://github.com/deepinsight/insightface) project which provided a well-made library and models.
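The README change above says that passing `-s/--source`, `-t/--target` and `-o/--output` runs the program in headless mode. A minimal sketch of assembling such an invocation follows. The helper name `build_headless_command` and the file names are hypothetical and not part of roop; only `python run.py` and the flags come from the README text. The helper builds the argument list and does not launch anything.

```python
from typing import List, Optional


def build_headless_command(source_path: str, target_path: str, output_path: str,
                           output_video_quality: Optional[int] = None,
                           keep_fps: bool = False) -> List[str]:
    # Hypothetical helper: assembles the argument list only, it does not run roop.
    command = ['python', 'run.py', '-s', source_path, '-t', target_path, '-o', output_path]
    if keep_fps:
        # Flag taken from the options table in the README.
        command.append('--keep-fps')
    if output_video_quality is not None:
        # Option renamed from --video-quality in this change.
        command.extend(['--output-video-quality', str(output_video_quality)])
    return command


if __name__ == '__main__':
    # Placeholder file names, chosen for illustration.
    print(' '.join(build_headless_command('face.jpg', 'input.mp4', 'output.mp4', keep_fps=True)))
```

The resulting list could be handed to `subprocess.run` from a checkout of the repository. That step is left out here because it depends on a working installation.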
diff --git a/gui-demo.png b/gui-demo.png
index 345c420f4..38f1c6338 100644
Binary files a/gui-demo.png and b/gui-demo.png differ
diff --git a/roop/core.py b/roop/core.py
index 32a6d3952..456aa7e9b 100755
--- a/roop/core.py
+++ b/roop/core.py
@@ -44,8 +44,10 @@ def parse_args() -> None:
     program.add_argument('--reference-face-position', help='position of the reference face', dest='reference_face_position', type=int, default=0)
     program.add_argument('--reference-frame-number', help='number of the reference frame', dest='reference_frame_number', type=int, default=0)
     program.add_argument('--similar-face-distance', help='face distance used for recognition', dest='similar_face_distance', type=float, default=0.85)
-    program.add_argument('--video-encoder', help='adjust output video encoder', dest='video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9'])
-    program.add_argument('--video-quality', help='adjust output video quality', dest='video_quality', type=int, default=18, choices=range(52), metavar='[0-51]')
+    program.add_argument('--temp-frame-format', help='image format used for frame extraction', dest='temp_frame_format', default='png', choices=['jpg', 'png'])
+    program.add_argument('--temp-frame-quality', help='image quality used for frame extraction', dest='temp_frame_quality', type=int, default=0, choices=range(100), metavar='[1-100]')
+    program.add_argument('--output-video-encoder', help='encoder used for the output video', dest='output_video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9', 'h264_nvenc', 'hevc_nvenc'])
+    program.add_argument('--output-video-quality', help='quality used for the output video', dest='output_video_quality', type=int, default=35, choices=range(100), metavar='[1-100]')
     program.add_argument('--max-memory', help='maximum amount of RAM in GB', dest='max_memory', type=int)
     program.add_argument('--execution-provider', help='available execution provider (choices: cpu, ...)',
dest='execution_provider', default=['cpu'], choices=suggest_execution_providers(), nargs='+')
     program.add_argument('--execution-threads', help='number of execution threads', dest='execution_threads', type=int, default=suggest_execution_threads())
@@ -65,8 +67,10 @@ def parse_args() -> None:
     roop.globals.reference_face_position = args.reference_face_position
     roop.globals.reference_frame_number = args.reference_frame_number
     roop.globals.similar_face_distance = args.similar_face_distance
-    roop.globals.video_encoder = args.video_encoder
-    roop.globals.video_quality = args.video_quality
+    roop.globals.temp_frame_format = args.temp_frame_format
+    roop.globals.temp_frame_quality = args.temp_frame_quality
+    roop.globals.output_video_encoder = args.output_video_encoder
+    roop.globals.output_video_quality = args.output_video_quality
     roop.globals.max_memory = args.max_memory
     roop.globals.execution_providers = decode_execution_providers(args.execution_provider)
     roop.globals.execution_threads = args.execution_threads
@@ -151,7 +155,7 @@ def start() -> None:
     # process image to videos
     if predict_video(roop.globals.target_path):
         destroy()
-    update_status('Creating temp resources...')
+    update_status('Creating temporary resources...')
     create_temp(roop.globals.target_path)
     # extract frames
     if roop.globals.keep_fps:
@@ -163,10 +167,14 @@ def start() -> None:
         extract_frames(roop.globals.target_path)
     # process frame
     temp_frame_paths = get_temp_frame_paths(roop.globals.target_path)
-    for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
-        update_status('Progressing...', frame_processor.NAME)
-        frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
-        frame_processor.post_process()
+    if temp_frame_paths:
+        for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
+            update_status('Progressing...', frame_processor.NAME)
+            frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
+
            frame_processor.post_process()
+    else:
+        update_status('Frames not found...')
+        return
     # create video
     if roop.globals.keep_fps:
         fps = detect_fps(roop.globals.target_path)
@@ -186,6 +194,7 @@ def start() -> None:
             update_status('Restoring audio might cause issues as fps are not kept...')
         restore_audio(roop.globals.target_path, roop.globals.output_path)
     # clean temp
+    update_status('Cleaning temporary resources...')
     clean_temp(roop.globals.target_path)
     # validate video
     if is_video(roop.globals.target_path):
diff --git a/roop/globals.py b/roop/globals.py
index 3b8bdeb37..f7c68e650 100644
--- a/roop/globals.py
+++ b/roop/globals.py
@@ -12,8 +12,10 @@
 reference_face_position = None
 reference_frame_number = None
 similar_face_distance = None
-video_encoder = None
-video_quality = None
+temp_frame_format = None
+temp_frame_quality = None
+output_video_encoder = None
+output_video_quality = None
 max_memory = None
 execution_providers: List[str] = []
 execution_threads = None
diff --git a/roop/metadata.py b/roop/metadata.py
index 0f4e05168..1de4d8409 100644
--- a/roop/metadata.py
+++ b/roop/metadata.py
@@ -1,2 +1,2 @@
 name = 'roop'
-version = '1.2.0'
+version = '1.3.0'
diff --git a/roop/processors/frame/face_enhancer.py b/roop/processors/frame/face_enhancer.py
index 20cdd958f..3a7f5a217 100644
--- a/roop/processors/frame/face_enhancer.py
+++ b/roop/processors/frame/face_enhancer.py
@@ -60,6 +60,12 @@ def post_process() -> None:
 def enhance_face(target_face: Face, temp_frame: Frame) -> Frame:
     start_x, start_y, end_x, end_y = map(int, target_face['bbox'])
+    padding_x = int((end_x - start_x) * 0.5)
+    padding_y = int((end_y - start_y) * 0.5)
+    start_x = max(0, start_x - padding_x)
+    start_y = max(0, start_y - padding_y)
+    end_x = max(0, end_x + padding_x)
+    end_y = max(0, end_y + padding_y)
     temp_face = temp_frame[start_y:end_y, start_x:end_x]
     if temp_face.size:
         with THREAD_SEMAPHORE:
diff --git a/roop/utilities.py b/roop/utilities.py
index c84eeb600..ac0ed0daa 100644
---
a/roop/utilities.py
+++ b/roop/utilities.py
@@ -12,8 +12,8 @@
 import roop.globals
-TEMP_FILE = 'temp.mp4'
 TEMP_DIRECTORY = 'temp'
+TEMP_VIDEO_FILE = 'temp.mp4'
 # monkey patch ssl for mac
 if platform.system().lower() == 'darwin':
@@ -21,7 +21,7 @@
 def run_ffmpeg(args: List[str]) -> bool:
-    commands = ['ffmpeg', '-hide_banner', '-hwaccel', 'auto', '-loglevel', roop.globals.log_level]
+    commands = ['ffmpeg', '-hide_banner', '-loglevel', roop.globals.log_level]
     commands.extend(args)
     try:
         subprocess.check_output(commands, stderr=subprocess.STDOUT)
@@ -42,27 +42,35 @@ def detect_fps(target_path: str) -> float:
     return 30
-def extract_frames(target_path: str, fps: float = 30) -> None:
+def extract_frames(target_path: str, fps: float = 30) -> bool:
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.png')])
+    temp_frame_quality = roop.globals.temp_frame_quality * 31 // 100
+    return run_ffmpeg(['-hwaccel', 'auto', '-i', target_path, '-q:v', str(temp_frame_quality), '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format)])
-def create_video(target_path: str, fps: float = 30) -> None:
+def create_video(target_path: str, fps: float = 30) -> bool:
     temp_output_path = get_temp_output_path(target_path)
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.png'), '-c:v', roop.globals.video_encoder, '-crf', str(roop.globals.video_quality), '-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    output_video_quality = (roop.globals.output_video_quality + 1) * 51 // 100
+    commands = ['-hwaccel', 'auto', '-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.'
+ roop.globals.temp_frame_format), '-c:v', roop.globals.output_video_encoder]
+    if roop.globals.output_video_encoder in ['libx264', 'libx265', 'libvpx']:
+        commands.extend(['-crf', str(output_video_quality)])
+    if roop.globals.output_video_encoder in ['h264_nvenc', 'hevc_nvenc']:
+        commands.extend(['-cq', str(output_video_quality)])
+    commands.extend(['-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    return run_ffmpeg(commands)
 def restore_audio(target_path: str, output_path: str) -> None:
     temp_output_path = get_temp_output_path(target_path)
-    done = run_ffmpeg(['-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
+    done = run_ffmpeg(['-hwaccel', 'auto', '-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
     if not done:
         move_temp(target_path, output_path)
 def get_temp_frame_paths(target_path: str) -> List[str]:
     temp_directory_path = get_temp_directory_path(target_path)
-    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.png')))
+    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.' + roop.globals.temp_frame_format)))
 def get_temp_directory_path(target_path: str) -> str:
@@ -73,7 +81,7 @@ def get_temp_directory_path(target_path: str) -> str:
 def get_temp_output_path(target_path: str) -> str:
     temp_directory_path = get_temp_directory_path(target_path)
-    return os.path.join(temp_directory_path, TEMP_FILE)
+    return os.path.join(temp_directory_path, TEMP_VIDEO_FILE)
 def normalize_output_path(source_path: str, target_path: str, output_path: str) -> Optional[str]:
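The two integer expressions that `extract_frames` and `create_video` use to turn the new quality options into ffmpeg values can be checked in isolation. The sketch below copies only those expressions from the diff. The function names `to_ffmpeg_qscale` and `to_ffmpeg_crf` are hypothetical and do not exist in roop.

```python
def to_ffmpeg_qscale(temp_frame_quality: int) -> int:
    # Same expression as in extract_frames: the result is passed to ffmpeg as -q:v.
    return temp_frame_quality * 31 // 100


def to_ffmpeg_crf(output_video_quality: int) -> int:
    # Same expression as in create_video: the result is passed as -crf or -cq.
    return (output_video_quality + 1) * 51 // 100


if __name__ == '__main__':
    # Print the mapping for a few option values, including both defaults (0 and 35).
    for quality in (0, 35, 50, 99):
        print(quality, to_ffmpeg_qscale(quality), to_ffmpeg_crf(quality))
```

With these expressions the new `--output-video-quality` default of 35 maps to 18, the same number as the previous `--video-quality` default. Both mappings increase with the option value, so a larger option value produces a larger `-q:v`, `-crf` or `-cq` number.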