s0md3v · henryruhs · Jul 25, 2023 · Jul 22, 2023 · Jul 23, 2023 · Jul 23, 2023
diff --git a/README.md b/README.md
@@ -1,61 +1,70 @@
 Take a video and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training.
 
-You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing). A StableDiffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
+You can watch some demos [here](https://drive.google.com/drive/folders/1KHv8n_rd3Lcr2v7jBq1yPSTWM554Gq8e?usp=sharing).
+A Stable Diffusion extension is also available, [here](https://github.com/s0md3v/sd-webui-roop).
 
 ![demo-gif](demo.gif)
 
 ## Disclaimer
+
 This software is meant to be a productive contribution to the rapidly growing AI-generated media industry. It will help artists with tasks such as animating a custom character or using the character as a model for clothing etc.
 
 The developers of this software are aware of its possible unethical applications and are committed to taking preventative measures against them. It has a built-in check which prevents the program from working on inappropriate media including but not limited to nudity, graphic content, and sensitive material such as war footage. We will continue to develop this project in a positive direction while adhering to law and ethics. This project may be shut down or include watermarks on the output if required by law.
 
 Users of this software are expected to use it responsibly while abiding by local law. If the face of a real person is being used, users are advised to get consent from the person concerned and to clearly mention that it is a deepfake when posting content online. Developers of this software will not be responsible for the actions of end-users.
 
-## How do I install it?
+## How to install?
 
 ### Basic
 
-It is more likely to work on your computer but it will also be very slow. You can follow instructions for the basic install [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
+It is more likely to work on your computer, but will be quite slow. Follow instructions for the basic installation [here](https://github.com/s0md3v/roop/wiki/1.-Installation).
 
 ### Acceleration
 
-If you have a good GPU and are ready for solving any software issues you may face, you can enable GPU which is wayyy faster. To do this, first follow the basic install instructions given above and then follow GPU-specific instructions [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+If you own a capable GPU and are prepared to troubleshoot software problems, you can enable GPU acceleration, which is significantly faster. Once you have finished the basic installation, follow the instructions for the acceleration installation [here](https://github.com/s0md3v/roop/wiki/2.-Acceleration).
+
+## How to use?
 
-## How do I use it?
+### UI
 
 Executing the `python run.py` command launches this window:
 
 ![gui-demo](gui-demo.png)
 
 Choose a face (an image with the desired face) and the target image/video (the image/video in which you want to replace the face), then click on `Start`. Open a file explorer and navigate to the directory you selected for your output. You will find a directory named `<video_title>` where you can see the frames being swapped in real time. Once the processing is done, it will create the output file. That's it.
 
-Additional command line arguments are given below. To learn out what they do, check [this guide](https://github.com/s0md3v/roop/wiki/Advanced-Options).
+### CLI
+
+Additional command line arguments are given below. To learn what they do, check the guide [here](https://github.com/s0md3v/roop/wiki/Advanced-Options).
 
 ```
 options:
-  -h, --help                                               show this help message and exit
-  -s SOURCE_PATH, --source SOURCE_PATH                     select an source image
-  -t TARGET_PATH, --target TARGET_PATH                     select an target image or video
-  -o OUTPUT_PATH, --output OUTPUT_PATH                     select output file or directory
-  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]  frame processors (choices: face_swapper, face_enhancer, ...)
-  --keep-fps                                               keep target fps
-  --keep-frames                                            keep temporary frames
-  --skip-audio                                             skip target audio
-  --many-faces                                             process every face
-  --reference-face-position REFERENCE_FACE_POSITION        position of the reference face
-  --reference-frame-number REFERENCE_FRAME_NUMBER          number of the reference frame
-  --similar-face-distance SIMILAR_FACE_DISTANCE            face distance used for recognition
-  --video-encoder {libx264,libx265,libvpx-vp9}             adjust output video encoder
-  --video-quality [0-51]                                   adjust output video quality
-  --max-memory MAX_MEMORY                                  maximum amount of RAM in GB
-  --execution-provider {cpu} [{cpu} ...]                   available execution provider (choices: cpu, ...)
-  --execution-threads EXECUTION_THREADS                    number of execution threads
-  -v, --version                                            show program's version number and exit
+  -h, --help                                                                 show this help message and exit
+  -s SOURCE_PATH, --source SOURCE_PATH                                       select an source image
+  -t TARGET_PATH, --target TARGET_PATH                                       select an target image or video
+  -o OUTPUT_PATH, --output OUTPUT_PATH                                       select output file or directory
+  --frame-processor FRAME_PROCESSOR [FRAME_PROCESSOR ...]                    frame processors (choices: face_swapper, face_enhancer, ...)
+  --keep-fps                                                                 keep target fps
+  --keep-frames                                                              keep temporary frames
+  --skip-audio                                                               skip target audio
+  --many-faces                                                               process every face
+  --reference-face-position REFERENCE_FACE_POSITION                          position of the reference face
+  --reference-frame-number REFERENCE_FRAME_NUMBER                            number of the reference frame
+  --similar-face-distance SIMILAR_FACE_DISTANCE                              face distance used for recognition
+  --temp-frame-format {jpg,png}                                              image format used for frame extraction
+  --temp-frame-quality [1-100]                                               image quality used for frame extraction
+  --output-video-encoder {libx264,libx265,libvpx-vp9,h264_nvenc,hevc_nvenc}  encoder used for the output video
+  --output-video-quality [1-100]                                             quality used for the output video
+  --max-memory MAX_MEMORY                                                    maximum amount of RAM in GB
+  --execution-provider {cpu} [{cpu} ...]                                     available execution provider (choices: cpu, ...)
+  --execution-threads EXECUTION_THREADS                                      number of execution threads
+  -v, --version                                                              show program's version number and exit
 ```
 
-Looking for a CLI mode? Using the -s/--source argument will make the run program in cli mode.
+Using the `-s/--source`, `-t/--target` and `-o/--output` arguments will run the program in headless mode.
 
 ## Credits
+
 - [henryruhs](https://github.com/henryruhs): for being an irreplaceable contributor to the project
 - [ffmpeg](https://ffmpeg.org/): for making video related operations easy
 - [deepinsight](https://github.com/deepinsight): for their [insightface](https://github.com/deepinsight/insightface) project which provided a well-made library and models.
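
To try the headless mode described in the README hunk above, a command can be assembled along these lines. This is only a sketch: the file names `face.jpg`, `target.mp4`, and `output.mp4` and the helper `build_headless_command` are hypothetical, and only the flags are taken from the option list in this diff.

```python
import shlex
from typing import List


def build_headless_command(source: str, target: str, output: str) -> List[str]:
    """Assemble (but do not run) a headless invocation of run.py."""
    return [
        'python', 'run.py',
        '--source', source,                   # image with the desired face
        '--target', target,                   # image or video to process
        '--output', output,                   # output file or directory
        '--temp-frame-format', 'jpg',         # one of the choices: jpg, png
        '--output-video-encoder', 'libx264',  # one of the listed encoders
    ]


command = build_headless_command('face.jpg', 'target.mp4', 'output.mp4')
print(shlex.join(command))
```

The list form can be passed to `subprocess.run` as-is, which avoids shell quoting problems with paths that contain spaces.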

diff --git a/gui-demo.png b/gui-demo.png
diff --git a/roop/core.py b/roop/core.py
@@ -44,8 +44,10 @@ def parse_args() -> None:
     program.add_argument('--reference-face-position', help='position of the reference face', dest='reference_face_position', type=int, default=0)
     program.add_argument('--reference-frame-number', help='number of the reference frame', dest='reference_frame_number', type=int, default=0)
     program.add_argument('--similar-face-distance', help='face distance used for recognition', dest='similar_face_distance', type=float, default=0.85)
-    program.add_argument('--video-encoder', help='adjust output video encoder', dest='video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9'])
-    program.add_argument('--video-quality', help='adjust output video quality', dest='video_quality', type=int, default=18, choices=range(52), metavar='[0-51]')
+    program.add_argument('--temp-frame-format', help='image format used for frame extraction', dest='temp_frame_format', default='png', choices=['jpg', 'png'])
+    program.add_argument('--temp-frame-quality', help='image quality used for frame extraction', dest='temp_frame_quality', type=int, default=0, choices=range(100), metavar='[1-100]')
+    program.add_argument('--output-video-encoder', help='encoder used for the output video', dest='output_video_encoder', default='libx264', choices=['libx264', 'libx265', 'libvpx-vp9', 'h264_nvenc', 'hevc_nvenc'])
+    program.add_argument('--output-video-quality', help='quality used for the output video', dest='output_video_quality', type=int, default=35, choices=range(100), metavar='[1-100]')
     program.add_argument('--max-memory', help='maximum amount of RAM in GB', dest='max_memory', type=int)
     program.add_argument('--execution-provider', help='available execution provider (choices: cpu, ...)', dest='execution_provider', default=['cpu'], choices=suggest_execution_providers(), nargs='+')
     program.add_argument('--execution-threads', help='number of execution threads', dest='execution_threads', type=int, default=suggest_execution_threads())
@@ -65,8 +67,10 @@ def parse_args() -> None:
     roop.globals.reference_face_position = args.reference_face_position
     roop.globals.reference_frame_number = args.reference_frame_number
     roop.globals.similar_face_distance = args.similar_face_distance
-    roop.globals.video_encoder = args.video_encoder
-    roop.globals.video_quality = args.video_quality
+    roop.globals.temp_frame_format = args.temp_frame_format
+    roop.globals.temp_frame_quality = args.temp_frame_quality
+    roop.globals.output_video_encoder = args.output_video_encoder
+    roop.globals.output_video_quality = args.output_video_quality
     roop.globals.max_memory = args.max_memory
     roop.globals.execution_providers = decode_execution_providers(args.execution_provider)
     roop.globals.execution_threads = args.execution_threads
@@ -151,7 +155,7 @@ def start() -> None:
     # process image to videos
     if predict_video(roop.globals.target_path):
         destroy()
-    update_status('Creating temp resources...')
+    update_status('Creating temporary resources...')
     create_temp(roop.globals.target_path)
     # extract frames
     if roop.globals.keep_fps:
@@ -163,10 +167,14 @@ def start() -> None:
         extract_frames(roop.globals.target_path)
     # process frame
     temp_frame_paths = get_temp_frame_paths(roop.globals.target_path)
-    for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
-        update_status('Progressing...', frame_processor.NAME)
-        frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
-        frame_processor.post_process()
+    if temp_frame_paths:
+        for frame_processor in get_frame_processors_modules(roop.globals.frame_processors):
+            update_status('Progressing...', frame_processor.NAME)
+            frame_processor.process_video(roop.globals.source_path, temp_frame_paths)
+            frame_processor.post_process()
+    else:
+        update_status('Frames not found...')
+        return
     # create video
     if roop.globals.keep_fps:
         fps = detect_fps(roop.globals.target_path)
@@ -186,6 +194,7 @@ def start() -> None:
             update_status('Restoring audio might cause issues as fps are not kept...')
         restore_audio(roop.globals.target_path, roop.globals.output_path)
     # clean temp
+    update_status('Cleaning temporary resources...')
     clean_temp(roop.globals.target_path)
     # validate video
     if is_video(roop.globals.target_path):
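
A note for reviewers on the new quality options: they combine `choices=range(100)` with `metavar='[1-100]'`. The sketch below reproduces just the `--output-video-quality` argument in a standalone parser to show which values `argparse` accepts with that combination. The helper `accepts` is hypothetical and exists only for this illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--output-video-quality',
    dest='output_video_quality',
    type=int,
    default=35,
    choices=range(100),   # accepts 0 through 99 inclusive
    metavar='[1-100]',    # only affects the help text, not validation
)


def accepts(value: str) -> bool:
    """Return True if the parser accepts the given quality value."""
    try:
        parser.parse_args(['--output-video-quality', value])
        return True
    except SystemExit:
        # argparse reports invalid choices by exiting with status 2
        return False


default_quality = parser.parse_args([]).output_video_quality
```

With this definition `0` and `99` are accepted and `100` is rejected, so the displayed range and the validated range differ by one at each end.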

diff --git a/roop/globals.py b/roop/globals.py
@@ -12,8 +12,10 @@
 reference_face_position = None
 reference_frame_number = None
 similar_face_distance = None
-video_encoder = None
-video_quality = None
+temp_frame_format = None
+temp_frame_quality = None
+output_video_encoder = None
+output_video_quality = None
 max_memory = None
 execution_providers: List[str] = []
 execution_threads = None

diff --git a/roop/metadata.py b/roop/metadata.py
@@ -1,2 +1,2 @@
 name = 'roop'
-version = '1.2.0'
+version = '1.3.0'
diff --git a/roop/processors/frame/face_enhancer.py b/roop/processors/frame/face_enhancer.py
@@ -60,6 +60,12 @@ def post_process() -> None:
 
 def enhance_face(target_face: Face, temp_frame: Frame) -> Frame:
     start_x, start_y, end_x, end_y = map(int, target_face['bbox'])
+    padding_x = int((end_x - start_x) * 0.5)
+    padding_y = int((end_y - start_y) * 0.5)
+    start_x = max(0, start_x - padding_x)
+    start_y = max(0, start_y - padding_y)
+    end_x = max(0, end_x + padding_x)
+    end_y = max(0, end_y + padding_y)
     temp_face = temp_frame[start_y:end_y, start_x:end_x]
     if temp_face.size:
         with THREAD_SEMAPHORE:
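
The padding logic in `enhance_face` can be read as the following standalone sketch. The function name `pad_bbox`, the `ratio` parameter, and the explicit `frame_width`/`frame_height` bounds are assumptions made for this illustration. The hunk itself applies `max(0, ...)` to all four coordinates and, assuming the frame is a NumPy array, relies on slicing to tolerate end indices past the array edge.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]


def pad_bbox(bbox: BBox, frame_width: int, frame_height: int, ratio: float = 0.5) -> BBox:
    """Grow a face bounding box by `ratio` of its size on every side."""
    start_x, start_y, end_x, end_y = bbox
    padding_x = int((end_x - start_x) * ratio)
    padding_y = int((end_y - start_y) * ratio)
    return (
        max(0, start_x - padding_x),             # clamp to the left edge
        max(0, start_y - padding_y),             # clamp to the top edge
        min(frame_width, end_x + padding_x),     # clamp to the right edge (assumption)
        min(frame_height, end_y + padding_y),    # clamp to the bottom edge (assumption)
    )


print(pad_bbox((100, 80, 200, 240), 256, 256))  # (50, 0, 250, 256)
```

Clamping the end coordinates explicitly does not change the crop that slicing would produce, but it keeps the coordinates valid if they are reused later, for example when pasting the enhanced face back into the frame.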

diff --git a/roop/utilities.py b/roop/utilities.py
@@ -12,16 +12,16 @@
 
 import roop.globals
 
-TEMP_FILE = 'temp.mp4'
 TEMP_DIRECTORY = 'temp'
+TEMP_VIDEO_FILE = 'temp.mp4'
 
 # monkey patch ssl for mac
 if platform.system().lower() == 'darwin':
     ssl._create_default_https_context = ssl._create_unverified_context
 
 
 def run_ffmpeg(args: List[str]) -> bool:
-    commands = ['ffmpeg', '-hide_banner', '-hwaccel', 'auto', '-loglevel', roop.globals.log_level]
+    commands = ['ffmpeg', '-hide_banner', '-loglevel', roop.globals.log_level]
     commands.extend(args)
     try:
         subprocess.check_output(commands, stderr=subprocess.STDOUT)
@@ -42,27 +42,35 @@ def detect_fps(target_path: str) -> float:
     return 30
 
 
-def extract_frames(target_path: str, fps: float = 30) -> None:
+def extract_frames(target_path: str, fps: float = 30) -> bool:
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.png')])
+    temp_frame_quality = roop.globals.temp_frame_quality * 31 // 100
+    return run_ffmpeg(['-hwaccel', 'auto', '-i', target_path, '-q:v', str(temp_frame_quality), '-pix_fmt', 'rgb24', '-vf', 'fps=' + str(fps), os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format)])
 
 
-def create_video(target_path: str, fps: float = 30) -> None:
+def create_video(target_path: str, fps: float = 30) -> bool:
     temp_output_path = get_temp_output_path(target_path)
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.png'), '-c:v', roop.globals.video_encoder, '-crf', str(roop.globals.video_quality), '-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    output_video_quality = (roop.globals.output_video_quality + 1) * 51 // 100
+    commands = ['-hwaccel', 'auto', '-r', str(fps), '-i', os.path.join(temp_directory_path, '%04d.' + roop.globals.temp_frame_format), '-c:v', roop.globals.output_video_encoder]
+    if roop.globals.output_video_encoder in ['libx264', 'libx265', 'libvpx-vp9']:
+        commands.extend(['-crf', str(output_video_quality)])
+    if roop.globals.output_video_encoder in ['h264_nvenc', 'hevc_nvenc']:
+        commands.extend(['-cq', str(output_video_quality)])
+    commands.extend(['-pix_fmt', 'yuv420p', '-vf', 'colorspace=bt709:iall=bt601-6-625:fast=1', '-y', temp_output_path])
+    return run_ffmpeg(commands)
 
 
 def restore_audio(target_path: str, output_path: str) -> None:
     temp_output_path = get_temp_output_path(target_path)
-    done = run_ffmpeg(['-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
+    done = run_ffmpeg(['-hwaccel', 'auto', '-i', temp_output_path, '-i', target_path, '-c:v', 'copy', '-map', '0:v:0', '-map', '1:a:0', '-y', output_path])
     if not done:
         move_temp(target_path, output_path)
 
 
 def get_temp_frame_paths(target_path: str) -> List[str]:
     temp_directory_path = get_temp_directory_path(target_path)
-    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.png')))
+    return glob.glob((os.path.join(glob.escape(temp_directory_path), '*.' + roop.globals.temp_frame_format)))
 
 
 def get_temp_directory_path(target_path: str) -> str:
@@ -73,7 +81,7 @@ def get_temp_directory_path(target_path: str) -> str:
 
 def get_temp_output_path(target_path: str) -> str:
     temp_directory_path = get_temp_directory_path(target_path)
-    return os.path.join(temp_directory_path, TEMP_FILE)
+    return os.path.join(temp_directory_path, TEMP_VIDEO_FILE)
 
 
 def normalize_output_path(source_path: str, target_path: str, output_path: str) -> Optional[str]:
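
The quality arithmetic introduced in `extract_frames` and `create_video` can be checked in isolation. The sketch below copies the two integer formulas from the hunks above into hypothetical helper functions (`to_temp_frame_quality`, `to_output_video_quality`, `quality_arguments`); the encoder list for `-crf` uses `libvpx-vp9`, the name offered by `--output-video-encoder`.

```python
from typing import List


def to_temp_frame_quality(quality: int) -> int:
    """Scale the 0-99 CLI value to the 0-31 range passed to ffmpeg -q:v."""
    return quality * 31 // 100


def to_output_video_quality(quality: int) -> int:
    """Scale the 0-99 CLI value to the 0-51 range passed to -crf or -cq."""
    return (quality + 1) * 51 // 100


def quality_arguments(encoder: str, quality: int) -> List[str]:
    """Pick the quality flag that matches the encoder family."""
    value = str(to_output_video_quality(quality))
    if encoder in ['libx264', 'libx265', 'libvpx-vp9']:
        return ['-crf', value]
    if encoder in ['h264_nvenc', 'hevc_nvenc']:
        return ['-cq', value]
    return []


print(to_output_video_quality(35))          # 18
print(quality_arguments('hevc_nvenc', 35))  # ['-cq', '18']
```

The new default of 35 therefore maps to 18, the same number that was the default of the removed `--video-quality` option.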