Merge with x264.git #1

Open
wants to merge 1,730 commits into
base: master
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Feb 4, 2012

  1. Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ

    Also remove unused AVX cruft.
    Fiona Glaser committed Feb 4, 2012
    6d7c5ef
  2. Minor asm optimizations/cleanup

    Fiona Glaser committed Feb 4, 2012
    04c3819
  3. x86inc: add TAIL_CALL macro to abstract a common asm idiom

    pengvado authored and Fiona Glaser committed Feb 4, 2012
    e0581e0
  4. TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support

    TBM and BMI1 are supported by Trinity/Piledriver.
    The others (and BMI1) will probably appear in Intel's upcoming Haswell.
    Also update x86inc with AVX2 stuff.
    Fiona Glaser committed Feb 4, 2012
    ae289e6

Commits on Feb 5, 2012

  1. Fix regression in r2141

    Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv.
    Did not cause any problems.
    Henrik Gramner authored and Fiona Glaser committed Feb 5, 2012
    a37a424

Commits on Feb 15, 2012

  1. Fix interlaced + extremal slice-max-size

    Broke if the first macroblock in the slice exceeded the set slice-max-size.
    Fiona Glaser committed Feb 15, 2012
    282c3cf

Commits on Mar 6, 2012

  1. Fix RGB colorspace input

    BGR/BGRA input was correct.
    MasterNobody authored and Fiona Glaser committed Mar 6, 2012
    0fc5acc
  2. Add error handling for out-of-tree build

    chikuzen authored and Fiona Glaser committed Mar 6, 2012
    10e1ba5
  3. ICL: fix out of tree building and resource file usage on Windows

    kemuri-9 authored and Fiona Glaser committed Mar 6, 2012
    38a26cd
  4. Fix rare overflow in 10-bit intra_satd_x3_16x16 asm

    MasterNobody authored and Fiona Glaser committed Mar 6, 2012
    0a36950
  5. Fix possible alignment crash when linking from MSVC

    x264_cavlc_init needs to be stack-aligned now.
    Fiona Glaser committed Mar 6, 2012
    d52d0b1
  6. Fix incorrect zero-extension assumptions in x86_64 asm

    Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero.
    This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI.
    As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations.
    Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary.
    Also add checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
    Henrik Gramner authored and Fiona Glaser committed Mar 6, 2012
    3131a19
  7. x86inc: support yasm -f win64

    Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.
    rbultje authored and Fiona Glaser committed Mar 6, 2012
    3a5f2fe
  8. Export PSNR/SSIM in x264 API

    Fiona Glaser committed Mar 6, 2012
    9da19fb
  9. Remove explicit run calculation from coeff_level_run

    Not necessary with the CAVLC lookup table for zero run codes.
    Fiona Glaser committed Mar 6, 2012
    1b31a10
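
The zero-extension fix in 3131a19 can be illustrated without assembly. The following is a minimal C sketch, not x264 code: it simulates a 64-bit register whose low half holds an `int` argument and whose high half holds leftover bits, then contrasts using the whole register with sign-extending the low 32 bits (what `movsxd` does). All function names are hypothetical.

```c
#include <stdint.h>

/* Simulate a 64-bit register whose low 32 bits carry an int argument.
 * The upper 32 bits hold whatever the caller left there; the x86-64 ABI
 * does not promise that they are zero. */
static uint64_t reg_with_garbage(int32_t value, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | (uint32_t)value;
}

/* Buggy pattern: treat the full 64-bit register as the offset. */
static int64_t offset_assuming_clean(uint64_t reg)
{
    return (int64_t)reg;
}

/* Fixed pattern: keep only the low 32 bits and sign-extend them.
 * The narrowing casts rely on two's-complement conversion, which gcc
 * and clang provide. */
static int64_t offset_sign_extended(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

With a stride of -16 and arbitrary upper bits, `offset_sign_extended` recovers -16 while `offset_assuming_clean` yields an unrelated value. When the upper bits happen to be zero and the value is non-negative, both agree, which is why the bug went unnoticed with gcc.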

Commits on Mar 7, 2012

  1. Add a small per-MB cost penalty for lowres

    Helps avoid VBV predictors going nuts with very low-cost MBs.
    One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but (before this patch) no cost penalty gets factored in for this, because anything times zero is zero.
    MasterNobody authored and Fiona Glaser committed Mar 7, 2012
    48e8e52
  2. Abstract bitstream backup/restore functions

    Required for row re-encoding.
    Fiona Glaser committed Mar 7, 2012
    bc473dd
  3. Add row-reencoding support to VBV for improved accuracy

    Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs).
    Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs).
    Still inaccurate with sliced threads, but better than before.
    Fiona Glaser committed Mar 7, 2012
    2535ba1
  4. Minor asm changes

    Fiona Glaser committed Mar 7, 2012
    92b0bd9
  5. BMI1 decimate functions

    Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible.
    This means we don't actually have to add new functions to make it work.
    Fiona Glaser committed Mar 7, 2012
    42db5e6
  6. x86inc: switch to amdnops

    Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes).
    Won't affect much, since we don't use ALIGN much.
    Fiona Glaser committed Mar 7, 2012
    5b2c62a
  7. Add full-recon API option

    Fully reconstruct frames even without dump-yuv.
    Fiona Glaser committed Mar 7, 2012
    90408ec
  8. Sliced-threads: do hpel and deblock after returning

    Lowers encoding latency around 14% in sliced threads mode with preset superfast.
    Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (singlethreaded) lookahead.
    For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.
    Fiona Glaser committed Mar 7, 2012
    a155572
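
Row re-encoding (2535ba1) depends on being able to rewind the bitstream writer, which is what the backup/restore abstraction in bc473dd provides. The sketch below shows the general idea with a hypothetical bit writer; the struct layout and function names are assumptions for illustration and do not match x264's actual bitstream code.

```c
#include <stdint.h>

/* Hypothetical minimal bit writer (MSB first). */
typedef struct {
    uint8_t *p;     /* next byte to write */
    uint8_t  cur;   /* partially filled byte */
    int      used;  /* bits already stored in cur (0..7) */
} bitwriter_t;

static void bw_put_bit(bitwriter_t *bw, int bit)
{
    bw->cur = (uint8_t)((bw->cur << 1) | (bit & 1));
    if (++bw->used == 8) {      /* byte complete: flush it */
        *bw->p++ = bw->cur;
        bw->cur = 0;
        bw->used = 0;
    }
}

/* Write the low `count` bits of `value`, most significant bit first. */
static void bw_write(bitwriter_t *bw, int count, uint32_t value)
{
    for (int i = count - 1; i >= 0; i--)
        bw_put_bit(bw, (int)((value >> i) & 1));
}

/* Backup: the writer state is small, so a plain copy is enough. */
static bitwriter_t bw_backup(const bitwriter_t *bw)
{
    return *bw;
}

/* Restore: rewind to the saved position. Bytes written after the backup
 * stay in the buffer but are overwritten by the re-encode. */
static void bw_restore(bitwriter_t *bw, const bitwriter_t *saved)
{
    *bw = *saved;
}
```

A caller would take a backup at the start of a row, encode the row, and restore before encoding it again at a different QP if the row came out too large.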

Commits on Mar 12, 2012

  1. Fix clobbering of mutex/cvs

    Regression in r2183.
    Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower.
    Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.
    MasterNobody authored and Fiona Glaser committed Mar 12, 2012
    e046ba7

Commits on Mar 14, 2012

  1. Fix sliced-threads ratecontrol bug

    Was using qp instead of qscale; could cause NANs (not to mention less accurate results).
    Fiona Glaser committed Mar 14, 2012
    bca4127

Commits on Mar 22, 2012

  1. Fix comment in deblock.c

    The code does, in fact, handle CAVLC+8x8dct correctly already.
    Fiona Glaser committed Mar 22, 2012
    065fec2

Commits on Mar 25, 2012

  1. Fix frame input colorspace check

    MasterNobody authored and Fiona Glaser committed Mar 25, 2012
    fff12b1

Commits on Mar 27, 2012

  1. Fix intra-refresh + hrd

    Kieran Kunhya authored and Fiona Glaser committed Mar 27, 2012
    52f7a14

Commits on Apr 23, 2012

  1. ICL/MSVS: Fix shared library generation and usage

    MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport.
    This does not fix x264 cli being unable to use a shared library built by ICL however.
    kemuri-9 authored and Fiona Glaser committed Apr 23, 2012
    70877e3
  2. configure: correct use of RC variable and add --extra-rcflags

    komisar666 authored and Fiona Glaser committed Apr 23, 2012
    62d7007
  3. Update config.guess and config.sub

    Adds support for a bunch of targets, including:
    aarch64 (armv8)
    arm-linux-androideabi
    funman authored and Fiona Glaser committed Apr 23, 2012
    f4aefb3
  4. configure: force select -mXX gcc option for i386/x86-64

    Makes multilib compilation more convenient.
    komisar666 authored and Fiona Glaser committed Apr 23, 2012
    ffea9f5
  5. Fix disabling of mbtree when using 2pass encoding and zones

    MasterNobody authored and Fiona Glaser committed Apr 23, 2012
    b0f44f9
  6. Eradicate all mention of Extended Profile

    x264 never supported it and never will because nobody uses it.
    Henrik Gramner authored and Fiona Glaser committed Apr 23, 2012
    66acbbf
  7. Add Level 5.2 support

    astrataro authored and Fiona Glaser committed Apr 23, 2012
    e8952df
  8. Faster chroma weight cost calculation

    New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.
    Henrik Gramner authored and Fiona Glaser committed Apr 23, 2012
    4442eac

Commits on Apr 24, 2012

  1. Add mb_info API for signalling constant macroblocks

    Some use-cases of x264 involve encoding video with large constant areas of the frame.
    Sometimes, the caller knows which areas these are, and can tell x264.
    This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems.
    This is really only suitable without B-frames.
    An example use-case would be using x264 for VNC.
    Fiona Glaser committed Apr 24, 2012
    8e57a9a
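
For the mb_info use case in 8e57a9a, a caller such as a VNC server needs a per-macroblock map of which areas did not change. The sketch below shows one way a caller might build such a map by comparing two luma planes. It is an illustration only: the flag value and function name are hypothetical, and x264's real interface is declared in x264.h and is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define MB_SIZE 16
#define MBINFO_CONSTANT_FLAG 1  /* hypothetical flag value for this sketch */

/* Fill mb_info[] (one byte per macroblock, raster order) by comparing the
 * current luma plane with the previous one. Assumes width and height are
 * multiples of 16 and that both planes share the same stride. */
static void mark_constant_mbs(const uint8_t *prev, const uint8_t *cur,
                              int width, int height, int stride,
                              uint8_t *mb_info)
{
    int mb_w = width / MB_SIZE;
    int mb_h = height / MB_SIZE;

    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            int constant = 1;
            /* Compare the macroblock row by row; stop at the first change. */
            for (int y = 0; y < MB_SIZE && constant; y++) {
                size_t off = (size_t)(my * MB_SIZE + y) * (size_t)stride
                           + (size_t)(mx * MB_SIZE);
                constant = memcmp(prev + off, cur + off, MB_SIZE) == 0;
            }
            mb_info[my * mb_w + mx] = constant ? MBINFO_CONSTANT_FLAG : 0;
        }
    }
}
```

A complete caller would also compare the chroma planes. The commit's note about B-frames still applies: the map describes changes relative to the previous frame only.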

Commits on May 15, 2012

  1. Fix some bugs in mb_info code

    MasterNobody authored and Fiona Glaser committed May 15, 2012
    44d2f08
  2. Add support for RGB formats in bit-depth conversion filter

    MasterNobody authored and Fiona Glaser committed May 15, 2012
    7cfe43c

Commits on May 18, 2012

  1. Threaded lookahead

    Split each lookahead frame analysis call into multiple threads.  Has a small
    impact on quality, but does not seem to be consistently any worse.
    
    This helps alleviate bottlenecks with many cores and frame threads. In many
    cases, this massively increases performance on many-core systems.  For example,
    over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system.
    Realtime 1080p30 at --preset slow should now be feasible on real systems.
    
    For sliced-threads, this patch should be faster regardless of settings (~10%).
    
    By default, lookahead threads are 1/6 of regular threads.  This isn't exacting,
    but it seems to work well for all presets on real systems.  With sliced-threads,
    it's the same as the number of encoding threads.
    Fiona Glaser committed May 18, 2012
    df700ea
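
The default described in df700ea can be written as a small helper. This is a sketch of the stated rule only: the function name is hypothetical, and the lower bound of one thread is an assumption that the commit message does not spell out.

```c
/* Default number of lookahead threads, following the rule in the commit
 * message: 1/6 of the regular threads, or the same as the encoding
 * threads when sliced-threads is enabled. */
static int default_lookahead_threads(int threads, int sliced_threads)
{
    if (sliced_threads)
        return threads;

    int n = threads / 6;
    return n < 1 ? 1 : n;   /* assumed floor: always at least one thread */
}
```

Under these assumptions, 12 frame threads give 2 lookahead threads, and 8 sliced threads give 8.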

Commits on Jul 3, 2012

  1. Fix crash with --fps 0

    Fix some integer overflows and check input parameters better.
    Also fix incorrect type specifiers for demuxer info printing.
    MasterNobody authored and Fiona Glaser committed Jul 3, 2012
    5e3aaf1
  2. x86inc: import patches from libav

    Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX
    SSE version of mova is movaps rather than movdqa.
    YMM version of movnta.
    Add mp size for named arguments.
    Fix DEFINE_ARGS when used outside of a cglobal.
    Define a few more cpuflags.
    3-argument wrappers for a few more instructions.
    pengvado authored and Fiona Glaser committed Jul 3, 2012
    5754ea2

Commits on Jul 17, 2012

  1. Cap ratecontrol predictor parameters

    Limits VBV mispredictions after long periods of relatively constant video.
    MasterNobody authored and Fiona Glaser committed Jul 17, 2012
    bcd1a70
  2. Print elapsed time in encoding progress indicator

    komisar666 authored and Fiona Glaser committed Jul 17, 2012
    498af9c
  3. Support changing resolutions between passes with macroblock-tree

    Implement a basic separable bilinear filter to rescale the quantizer offsets.
    Structure inspired by swscale, but floating-point instead of fixed-point.
    Not as optimized as it could be, but it's quite fast already.
    
    Example compression penalties on a 720p video game recording:
    First pass with 720p and second as 480p: ~-1.5% (vs. same res)
    First pass with 480p and second as 720p: ~-3% (vs. same res)
    Fiona Glaser committed Jul 17, 2012
    dea5d7a
  4. Try 8x8 transform analysis even when sub8x8 partitions are present

    Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help.
    Small compression improvement with p4x4 on (~0.1-0.5%).
    Also update related comments.
    Fiona Glaser committed Jul 17, 2012
    d026397
  5. Faster predictor checking with subme<3

    Fix a typo that made an early-skip less effective.
    Avoid a relatively unpredictable branch.
    Slightly changed output due to the typo-fix.
    ~50 cycles faster on Core i7.
    Fiona Glaser committed Jul 17, 2012
    2ec6941
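
The "separable bilinear filter" mentioned in dea5d7a means resampling the grid of quantizer offsets in two one-dimensional passes, horizontal then vertical, in floating point. The sketch below shows that structure under simplifying assumptions: plain two-tap linear interpolation with clamped edges, and hypothetical function names. It is not x264's actual filter, whose coefficients and handling of downscaling may differ.

```c
#include <stdlib.h>

/* Resample n_in samples to n_out samples with linear interpolation.
 * Strides let the same routine walk along rows or down columns. */
static void resample_1d(const float *in, int n_in, int in_stride,
                        float *out, int n_out, int out_stride)
{
    for (int i = 0; i < n_out; i++) {
        /* Map the centre of output sample i into input coordinates. */
        float pos = (i + 0.5f) * n_in / n_out - 0.5f;
        if (pos < 0.0f)
            pos = 0.0f;
        if (pos > n_in - 1)
            pos = (float)(n_in - 1);

        int i0 = (int)pos;                      /* left neighbour */
        int i1 = i0 + 1 < n_in ? i0 + 1 : i0;   /* right neighbour, clamped */
        float frac = pos - i0;

        out[i * out_stride] = in[i0 * in_stride] * (1.0f - frac)
                            + in[i1 * in_stride] * frac;
    }
}

/* Rescale a sw x sh grid of offsets into a dw x dh grid.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int rescale_offsets(const float *src, int sw, int sh,
                           float *dst, int dw, int dh)
{
    float *tmp = malloc(sizeof(float) * (size_t)dw * (size_t)sh);
    if (!tmp)
        return -1;

    for (int y = 0; y < sh; y++)    /* horizontal pass: sw -> dw */
        resample_1d(src + y * sw, sw, 1, tmp + y * dw, dw, 1);
    for (int x = 0; x < dw; x++)    /* vertical pass: sh -> dh */
        resample_1d(tmp + x, sh, dw, dst + x, dh, dw);

    free(tmp);
    return 0;
}
```

Splitting the work into two passes keeps the cost proportional to the grid size times the tap count per axis, rather than the product of the taps in both axes.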

Commits on Jul 18, 2012

  1. Revert r2204

    People don't seem to like this so I'm just going to get rid of it.
    Fiona Glaser committed Jul 18, 2012
    3d03b61

Commits on Jul 26, 2012

  1. Free user supplied data when deleting a frame

    This eliminates a memory leak when calling x264_encoder_close.
    kierank authored and Fiona Glaser committed Jul 26, 2012
    cbb9070

Commits on Jul 27, 2012

  1. x86inc: automatically insert vzeroupper for YMM functions

    Backported from libav.
    rbultje authored and Fiona Glaser committed Jul 27, 2012
    ed56837

Commits on Sep 5, 2012

  1. Remove special-casing for OpenBSD pthread handling

    Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread.
    It's been libpthread anyway, and policy has changed to stop using -pthread.
    brad0 authored and Fiona Glaser committed Sep 5, 2012
    f8fd641
  2. Export the average effective CRF of each frame

    Useful to judge the resulting quality of a frame when VBV is enabled.
    Fiona Glaser committed Sep 5, 2012
    cc5dced
  3. Improve mb_info constant mb optimization

    Allow fast skipping even if the pskip MV isn't zero.
    Fiona Glaser committed Sep 5, 2012
    05089a3
  4. Enhance nalu_process

    Add the input frame opaque pointer to the arguments.
    This makes it easier to use with multiple simultaneous x264 encodes.
    Fiona Glaser committed Sep 5, 2012
    f93b786
  5. Fix mb_info_free with sliced threads

    x264 would free mb_info before it was completely done using it.
    Fiona Glaser committed Sep 5, 2012
    033df0a
  6. Enhance mb_info: add mb_info_update

    This feature lets the callee know which decoded macroblocks have changed.
    Fiona Glaser committed Sep 5, 2012
    8980dd8

Commits on Sep 11, 2012

  1. Set libm in the configure script if the OS has libm

    Prerequisite for another configure patch after this.
    Idea copied from libpthread.
    brad0 authored and Fiona Glaser committed Sep 11, 2012
    e8e8b9a

Commits on Sep 26, 2012

  1. Fix pkg-config for dynamic vs static linking

    brad0 authored and Fiona Glaser committed Sep 26, 2012
    02217bd
  2. Fix use of deprecated av_close_input_file call

    Jason Martens authored and Fiona Glaser committed Sep 26, 2012
    9657747

Commits on Nov 7, 2012

  1. Fix ALIGNED_ARRAY_EMU macros on ICL

    ICL's preprocessor doesn't handle it correctly.
    This fix is similar to libav's fix in 0db2d9.
    dwbuiten authored and Fiona Glaser committed Nov 7, 2012
    21ba91a
  2. Fix reconfiguring to crf=0

    Lossless mode can't currently be enabled mid-stream.
    MasterNobody authored and Fiona Glaser committed Nov 7, 2012
    480bbc9
  3. Fix crash with no-scenecut + mbtree

    Fiona Glaser committed Nov 7, 2012
    ac2d7c0
  4. Disable ARM NEON MRC CPU test for Apple devices

    The Apple A6 CPU doesn't support performance counters, so this test caused a crash.
    David Wolstencroft authored and Fiona Glaser committed Nov 7, 2012
    3f516c5
  5. x86inc: only define program_name if the macro is unset.

    This allows overriding the value from outside the file.
    This can be useful if x86inc.asm is used outside of x264.
    DonDiego authored and Fiona Glaser committed Nov 7, 2012
    00cc160
  6. x86inc: Rename 3dnow2 to 3dnowext

    The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.
    DonDiego authored and Fiona Glaser committed Nov 7, 2012
    5d85879
  7. Add support for the ffmpeg/vapoursynth high bit depth y4m extensions

    jeeb authored and Fiona Glaser committed Nov 7, 2012
    cc61a4b
  8. Update level dpb size calculation to match newer H.264 spec

    Doesn't actually change encoding behavior, but makes it more correct.
    Warning messages should now be accurate at higher bit depths and non-4:2:0.
    Technically, since it redefines x264_level_t, this is an API version increment.
    Fiona Glaser committed Nov 7, 2012
    0d5f6fb
  9. Improve slice header QP selection

    Use the first macroblock of each slice instead of the last of the previous.
    Lets us pick a reasonable initial QP for the first slice too.
    Slightly improved compression.
    Fiona Glaser committed Nov 7, 2012
    b304a7c
  10. Attempt to optimize PPS pic_init_qp in 2-pass mode

    Small compression improvement; up to ~0.5% in extreme cases.
    Helps more with small slice sizes (tiny resolutions or slice-max-size).
    Note that this changes the 2-pass stats file format.
    Fiona Glaser committed Nov 7, 2012
    1580a74
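
The DPB change in 0d5f6fb follows the newer H.264 level limits, which express the decoded picture buffer in macroblocks (MaxDpbMbs) rather than bytes, so the limit no longer depends on bit depth or chroma format. The spec derives the frame count as MaxDpbFrames = Min(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs), 16). The sketch below applies that formula; the function name is hypothetical and this is not x264's own level-checking code.

```c
/* Maximum number of frames the DPB may hold for a given level limit and
 * frame size in macroblocks, per the H.264 Annex A formula. */
static int max_dpb_frames(int max_dpb_mbs, int width_mbs, int height_mbs)
{
    int frames = max_dpb_mbs / (width_mbs * height_mbs);
    return frames > 16 ? 16 : frames;   /* the spec caps the result at 16 */
}
```

For example, Level 4.1 has MaxDpbMbs = 32768, and a 1920x1088 frame is 120x68 macroblocks, so the DPB holds 32768 / 8160 = 4 frames. Level 3.1 (MaxDpbMbs = 18000) at 1280x720 (80x45 macroblocks) allows 5.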

Commits on Nov 8, 2012

  1. Fix possible issues with out-of-spec QP values

    Fixes a possible regression in r2228.
    MasterNobody authored and Fiona Glaser committed Nov 8, 2012
    bfed708

Commits on Nov 12, 2012

  1. Fix crash when using libx264.dll compiled with ICL for X86_64

    MasterNobody authored and Fiona Glaser committed Nov 12, 2012
    144b791

Commits on Nov 19, 2012

  1. lavf input: allocate AVFrame correctly

    Allocate AVFrames correctly with avcodec_alloc_frame().
    This caused crashes with newer libavcodecs that try to free frame extradata.
    elenril authored and Fiona Glaser committed Nov 19, 2012
    0db80be

Commits on Dec 6, 2012

  1. Solaris: use sysconf to get processor count

    Solaris responds correctly to the same value as Cygwin, so let's use that.
    SeanMcG authored and Fiona Glaser committed Dec 6, 2012
    12458a2
  2. configure: fix gpac detection with -Wp,-D_FORTIFY_SOURCE=2

    sergiomb2 authored and Fiona Glaser committed Dec 6, 2012
    cd71765
  3. Fix typo in r2222

    Slightly wrong numbers in level table.
    Fiona Glaser committed Dec 6, 2012
    042fdd3
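
The Solaris change in 12458a2 relies on `sysconf` to report the processor count. The commit does not name the value it queries; the sketch below assumes `_SC_NPROCESSORS_ONLN`, which is widely available on Unix-like systems, including Solaris and Linux. The function name and the fallback to 1 are choices made for this example.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails. */
static int online_cpu_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* assumed selector, see above */
    return n < 1 ? 1 : (int)n;
}
```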

Commits on Dec 12, 2012

  1. Fix pthread_join emulation on win32 and BeOS

    Doesn't actually affect x264, but it's more correct.
    MasterNobody authored and Fiona Glaser committed Dec 12, 2012
    23829dd

Commits on Jan 8, 2013

  1. Fix build on ARM with binutils >= 2.23.51.0.6

    GAS doesn't seem to like spaces in vld1 anymore, so remove those.
    Bernhard Rosenkränzer authored and Fiona Glaser committed Jan 8, 2013
    05c1646
  2. Fix crash if the first frame is forced to a non-keyframe

    This is obviously bad user input, but x264 shouldn't crash if it happens.
    MasterNobody authored and Fiona Glaser committed Jan 8, 2013
    8eddd52
  3. Update config.guess and config.sub

    Henrik Gramner authored and Fiona Glaser committed Jan 8, 2013
    9d5ec55
  4. x86inc: support stack mem allocation and re-alignment in PROLOGUE

    Use this in 8-bit loopfilter functions so they can be used if
    there is no aligned stack (e.g. x86-32 MSVC or ICC 10.x).
    rbultje authored and Fiona Glaser committed Jan 8, 2013
    b073e87
  5. x86inc: activate REP_RET automatically

    Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition.
    REP_RET is still needed manually when it's a branch target, but that's much rarer.
    The implementation involves lots of spurious labels, but that's ok because we strip them.
    pengvado authored and Fiona Glaser committed Jan 8, 2013
    4cf2728

Commits on Jan 9, 2013

  1. x86inc: Use VEX-encoded instructions in AVX functions

    Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version.
    This change makes it easier to extend existing code to use AVX2.
    Also add support for AVX emulation of a few instructions that were missing before.
    Henrik Gramner authored and Fiona Glaser committed Jan 9, 2013
    8a9608b
  2. AVX2/FMA3 version of mbtree_propagate

    First AVX2 function for testing.
    Bump yasm version to 1.2.0 for AVX2 support.
    Fiona Glaser committed Jan 9, 2013
    ccda1ba
  3. x86inc: Drop tzcnt workaround

    It is no longer needed now that we've bumped the version requirement of yasm to 1.2.0.
    Henrik Gramner authored and Fiona Glaser committed Jan 9, 2013
    f2b4f29
  4. Bump dates to 2013

    pengvado authored and Fiona Glaser committed Jan 9, 2013
    732b072

Commits on Feb 25, 2013

  1. x86-32: use simple nop codes for <= sse

    The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags:
    fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng
    rng_en ace ace_en) SIGILLs on long nop codes.
    rbultje authored and Fiona Glaser committed Feb 25, 2013
    9475e6a
  2. x86-64: fix trellis asm with interlacing

    Regression in r2145.
    Assembly assumed array was [2][64] when it was actually [2][63].
    Tiny (~0.1%) compression improvement.
    Fiona Glaser committed Feb 25, 2013
    5743b19
  3. x86: don't use the red zone on win64

    MasterNobody authored and Fiona Glaser committed Feb 25, 2013
    c2c2a95
  4. Fix possible non-determinism with mbtree + open-gop + sync-lookahead

    Code assumed keyframe analysis would only pull one frame off the list; this
    isn't true with open-gop.
    MasterNobody authored and Fiona Glaser committed Feb 25, 2013
    43ff8f1
  5. Update "Install and compile x264" in doc/regression_test.txt

    Neil authored and Fiona Glaser committed Feb 25, 2013
    b671762
  6. Cosmetics: stricter definition of parameterless functions

    Gramner authored and Fiona Glaser committed Feb 25, 2013
    6a82e49
  7. x264.h: improve x264_encoder_reconfig documentation

    Fiona Glaser committed Feb 25, 2013
    3269534
  8. x86inc: rename program_name to private_prefix

    Synced from libav.
    The new name is more descriptive and will allow defining a separate public
    prefix for externally visible library symbols.
    DonDiego authored and Fiona Glaser committed Feb 25, 2013
    faf3dbe
  9. x86inc: Add cvisible macro for C functions with public prefix

    This allows defining externally visible library symbols.
    
    Signed-off-by: Diego Biurrun <diego@biurrun.de>
    DonDiego authored and Fiona Glaser committed Feb 25, 2013
    fd2c4a0
  10. x86inc: Set ELF hidden visibility for global constants

    Gramner authored and Fiona Glaser committed Feb 25, 2013
    5ec5c78
  11. Windows: Enable DEP and ASLR

    Gramner authored and Fiona Glaser committed Feb 25, 2013
    5e0fca8
  12. configure: add QNX support

    llmike authored and Fiona Glaser committed Feb 25, 2013
    f6e0d28
  13. 64-bit cabac optimizations

    ~4% faster PIC
    
    WIN64:
    ~3% faster and 16 bytes shorter cabac_encode_bypass
    ~8% faster cabac_encode_terminal
    Benchmarked on Ivy Bridge
    
    UNIX64:
    One instruction less in cabac_encode_bypass
    Gramner authored and Fiona Glaser committed Feb 25, 2013
    c3983b8
  14. x86: Use SSE instead of SSE2 for copying data

    Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu.
    Also merge MMX and SSE versions of memcpy_aligned into a single macro.
    Gramner authored and Fiona Glaser committed Feb 25, 2013
    5a76432
  15. Improve lookahead-threads auto selection

    Smarter decision to improve fast-first-pass performance in 2-pass encodes.
    Dramatically improves CPU utilization on multi-core systems.
    
    Tested on a quad-core Ivy Bridge (12 threads, 1080p):
    Fast first pass:
    veryfast:     ~7% faster
    faster:      ~11% faster
    fast/medium: ~15% faster
    slow/slower: ~42% faster
    veryslow:    ~55% faster
    CRF/1-pass:
    veryfast:     ~9% faster
    (all others remained the same)
    Fiona Glaser committed Feb 25, 2013
    d2a9d25
  16. Fix two bugs in predictor checking

    In some cases pmv wasn't checked properly, and neither was the zero vector.
    Output-changing portion of the following patch.
    Fiona Glaser committed Feb 25, 2013
    0046406
  17. x86: optimize and clean up predictor checking

    Branchlessly handle elimination of candidates in MMX roundclip asm.
    Add a new asm function, similar to roundclip, except without the round part.
    Optimize and organize the C code, and make both subme>=3 and subme<3 consistent.
    Add lots of explanatory comments and try to make things a little more understandable.
    ~5-10% faster with subme>=3, ~15-20% faster with subme<3.
    Fiona Glaser committed Feb 25, 2013
    6371c3a
  18. x86: faster high bit depth ssd

    About 15% faster on average.
    irock authored and Fiona Glaser committed Feb 25, 2013
    93bf124
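
The "Use SSE instead of SSE2 for copying data" commit in this group rests on movaps/movups having encodings one byte shorter than movdqa/movdqu. The sketch below only illustrates that size difference; it lists the standard x86 register-to-register encodings for `<op> xmm0, xmm1`. The array and function names are made up for this example and do not come from x264.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-code bytes for "<op> xmm0, xmm1" (register-to-register form).
 * The SSE2 integer moves need a mandatory prefix byte (66 or F3) that
 * the SSE float moves do not. */
static const unsigned char enc_movaps[] = { 0x0F, 0x28, 0xC1 };
static const unsigned char enc_movups[] = { 0x0F, 0x10, 0xC1 };
static const unsigned char enc_movdqa[] = { 0x66, 0x0F, 0x6F, 0xC1 };
static const unsigned char enc_movdqu[] = { 0xF3, 0x0F, 0x6F, 0xC1 };

/* Bytes saved per aligned move when movaps replaces movdqa. */
size_t bytes_saved_aligned(void)
{
    assert(sizeof enc_movdqa > sizeof enc_movaps);
    return sizeof enc_movdqa - sizeof enc_movaps;
}

/* Bytes saved per unaligned move when movups replaces movdqu. */
size_t bytes_saved_unaligned(void)
{
    assert(sizeof enc_movdqu > sizeof enc_movups);
    return sizeof enc_movdqu - sizeof enc_movups;
}
```

Both helpers return 1, and that saving repeats at every copy instruction in the affected functions.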

Commits on Feb 26, 2013

  1. x86: port SSE2+ SATD functions to high bit depth

    Makes SATD 20-50% faster across all partition sizes but 4x4.
    irock authored and Fiona Glaser committed Feb 26, 2013
    790c648
  2. x86: combined SA8D/SATD dsp function

    The speedup is most apparent for 8-bit (~30%), but 10-bit also improves
    somewhat (~12%).
    64-bit only for now.
    irock authored and Fiona Glaser committed Feb 26, 2013
    75d9270
  3. x86: detect Bobcat, improve Atom optimizations, reorganize flags

    The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this
    and apply the appropriate flags.
    
    It also has an extremely slow palignr instruction; create a flag for this to
    avoid massive penalties on palignr-heavy functions.
    
    Improve Atom function selection and document exactly what the SLOW_ATOM flag
    covers.
    
    Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3
    optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on
    Atom along with other SIMD multiplies.
    
    Drop TBM detection; it'll probably never be useful for x264.
    
    Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe).
    
    Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
    Fiona Glaser committed Feb 26, 2013
    5d60b9c
  4. x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac

    Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer
    do movddup in the load unit, so it's totally free this way.
    
    On Sandy Bridge:
    ~6% faster sa8d_satd
    ~5% faster hadamard_ac
    ~9% faster 32-bit satd
    ~2% faster sa8d
    Fiona Glaser committed Feb 26, 2013
    68a6268
  5. Fix some store forwarding stalls

    There are quite a few others, but fixing most of them doesn't help, or there's
    no easy way to avoid them.
    Fiona Glaser committed Feb 26, 2013
    7de9a9a
  6. Eliminate some branchiness in ME/analysis

    Faster, fewer branch mispredictions.
    Fiona Glaser committed Feb 26, 2013
    7b1301e
  7. Add AvxSynth support to the AviSynth input module.

    Uses dlopen to load AvxSynth on Linux and OS X.
    
    Allows the use of --demuxer avs for AvxSynth, though the only source filter it
    can currently use is FFMS2.
    
    Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that
    users don't need to actually have AvxSynth development headers installed to
    enable support for it (mirroring the AviSynth behavior).
    
    Based on a patch by 0x09 (tab@lavabit.com)
    qyot27 authored and Fiona Glaser committed Feb 26, 2013
    5ee1d03
  8. quant_4x4x4: quant one 8x8 block at a time

    This reduces overhead and lets us use less branchy code for zigzag, dequant,
    decimate, and so on.
    Reorganize and optimize a lot of macroblock_encode using this new function.
    ~1-2% faster overall.
    
    Includes NEON and x86 versions of the new function.
    Using larger merged functions like this will also make wider SIMD, like
    AVX2, more effective.
    Fiona Glaser committed Feb 26, 2013
    993c81e
  9. CABAC/CAVLC: use the new bit-iterating macro here too

    Fiona Glaser committed Feb 26, 2013
    215f2be
  10. ARM: update NEON mc_chroma to work with NV12 and re-enable it

    Up to 10-15% faster overall.
    Stefan Groenroos authored and Fiona Glaser committed Feb 26, 2013
    3a8baa0

Commits on Mar 1, 2013

  1. ARM: Fix bug in x264_quant_4x4x4_neon

    Regression in r2273.
    Stefan Groenroos authored and Fiona Glaser committed Mar 1, 2013
    cb4547a

Commits on Apr 13, 2013

  1. Fix undefined behavior in x264_ratecontrol_mb

    Fiona Glaser committed Apr 13, 2013
    3703344

Commits on Apr 23, 2013

  1. Fix array overreads that caused miscompilation in gcc 4.8

    Fiona Glaser committed Apr 23, 2013
    3cdaca1
  2. x86inc: fix some corner cases of SWAP

    These cases used to produce the wrong permutation:
    SWAP with >=3 named (rather than numbered) args
    PERMUTE followed by SWAP with 2 named args
    pengvado authored and Fiona Glaser committed Apr 23, 2013
    bed18d0
  3. x86: correctly check stack alignment for Atom hadamard_ac

    Regression in r2265 (only affected compilers with broken stack alignment,
    like ICL on win32).
    Fiona Glaser committed Apr 23, 2013
    42c500a
  4. Fix y4m input with C420paldv colorspace

    MasterNobody authored and Fiona Glaser committed Apr 23, 2013
    aa73459
  5. lavf input: don't use deprecated AVStream fields

    Fixes building against newer libavcodecs from the Libav project.
    twalker314 authored and Fiona Glaser committed Apr 23, 2013
    e74287e
  6. Show "avs: no" with --disable-avs option instead of empty string

    MasterNobody authored and Fiona Glaser committed Apr 23, 2013
    bf52bab
  7. Disable mbtree asm with cpu-independent option

    Results vary between versions because of different rounding results.
    MasterNobody authored and Fiona Glaser committed Apr 23, 2013
    8a3a41d
  8. Add slice-min-mbs feature

    Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly
    small slices.
    Useful with certain decoders that barf on extremely small slices.
    
    If slice-min-mbs would be violated as a result of slice-max-size, x264 will
    exceed slice-max-size and print a warning.
    Fiona Glaser committed Apr 23, 2013
    fdfffa3
  9. Add slices-max feature

    The H.264 spec technically has limits on the number of slices per frame. x264
    normally ignores this, since most use-cases that require large numbers of
    slices prefer that it does. However, certain decoders may break with extremely
    large numbers of slices, as can occur with some slice-max-size/mbs settings.
    
    When set, x264 will refuse to create any slices beyond the maximum number,
    even if slice-max-size/mbs requires otherwise.
    Fiona Glaser committed Apr 23, 2013
    732e4f7
  10. weightp: improve scale/offset search, chroma

    Rescale the scale factor if the offset clips. This makes weightp more effective
    in fades to/from white (and any other situation that requires big offsets).
    
    Search more than 1 scale factor and more than 1 offset, depending on --subme.
    
    Try to find the optimal chroma denominator instead of hardcoding it.
    
    Overall improvement: a few percent in fade-heavy clips, such as a sample from
    Avatar: TLA.
    Fiona Glaser committed Apr 23, 2013
    2d0c47a
  11. OpenCL lookahead

    OpenCL support is compiled in by default, but must be enabled at runtime by an
    --opencl command line flag. Compiling OpenCL support requires perl. To avoid
    the perl requirement use: configure --disable-opencl.
    
    When enabled, the lookahead thread is mostly off-loaded to an OpenCL capable GPU
    device.  Lowres intra cost prediction, lowres motion search (including subpel)
    and bidir cost predictions are all done on the GPU.  MB-tree and final slice
    decisions are still done by the CPU.  Presets which do not use a threaded
    lookahead will not use OpenCL at all (superfast, ultrafast).
    
    Because of data dependencies, the GPU must use an iterative motion search which
    performs more total work than the CPU would do, so this is neither work efficient
    nor power efficient. But if there are GPU cycles to spare, it can often speed up
    the encode. Output quality with OpenCL lookahead enabled is often very slightly
    worse than with the CPU lookahead (because of the same data dependencies).
    
    x264 must compile its OpenCL kernels for your device before running them, and in
    order to avoid doing this every run it caches the compiled kernel binary in a
    file named x264_lookahead.clbin (--opencl-clbin FNAME to override).  The cache
    file will be ignored if the device, driver, or OpenCL source are changed.
    
    x264 will use the first GPU device which supports the cl_image features
    required by its kernels. Most modern discrete GPUs and all AMD integrated GPUs
    will work.  Intel integrated GPUs (up to Ivy Bridge) do not support those
    features. Use --opencl-device N to specify a number of capable GPUs to skip
    during device detection.
    
    Switchable graphics environments (e.g. AMD Enduro) are currently not supported,
    as some have bugs in their OpenCL drivers that cause output to be silently
    incorrect.
    
    Developed by MulticoreWare with support from AMD and Telestream.
    sborho authored and Fiona Glaser committed Apr 23, 2013
    f49a1b2
  12. x86-64: cabac_block_residual assembly

    RDO: ~20% faster than C
    Bitstream: ~50% faster than C
    1-2% faster overall, highest on preset superfast/fast/medium.
    Fiona Glaser committed Apr 23, 2013
    a3f5c73
  13. x86inc: fix AVX emulation of cmp(p|s)(s|d)

    Fiona Glaser committed Apr 23, 2013
    3a8dfb2
  14. x86inc: create xm# and ym#, analogous to m#

    For when we want to mix simd sizes within one function.
    pengvado authored and Fiona Glaser committed Apr 23, 2013
    19e1a2b
  15. x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks

    AVX2 functions:
    mc_chroma
    intra_sad_x3_16x16
    last64
    ads
    hpel
    dct4
    idct4
    sub16x16_dct8
    quant_4x4x4
    quant_4x4
    quant_4x4_dc
    quant_8x8
    SAD_X3/X4
    SATD
    var
    var2
    SSD
    zigzag interleave
    weightp
    weightb
    intra_sad_8x8_x9
    decimate
    integral
    hadamard_ac
    sa8d_satd
    sa8d
    lowres_init
    denoise
    Fiona Glaser committed Apr 23, 2013
    0ea5be8
  16. x86util: Support ymm registers in HADD macros

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    184c505
  17. x86: AVX2 high bit-depth predict_8x8c_h/predict_8x16c_h

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    51708c3
  18. x86: AVX2 high bit-depth predict_16x16_h

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    7908dc6
  19. x86: AVX2 high bit-depth predict_4x4_h

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    fa40b44
  20. x86: AVX high bit-depth predict_16x16_v

    Also restructure some code to reduce code size of various functions,
    especially in high bit-depth.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    f3d521d
  21. x86: AVX2 predict_16x16_p

    Also fix the AVX implementation to correctly use the SSSE3 inline asm
    instead of SSE2.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    8ecdeb2
  22. x86: AVX2 predict_8x8c_p/predict_8x16c_p

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    97ad171
  23. x86: AVX2 predict_16x16_dc

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    0f776f6
  24. x86: AVX memzero_aligned

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    547a657
  25. x86: AVX2 nal_escape

    Also rewrite the entire function to be faster and drop the AVX version, which is no longer useful.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    e7a46b6
  26. x86: AVX2 high_bit_depth pixel_avg2, get_ref, mc_copy_w16, mc_luma

    Also reduce the number of xmm registers used by mc_copy_* to avoid
    saving and restoring xmm6 and xmm7 on 64-bit Windows.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    295f83a
  27. x86: AVX2 high bit-depth pixel_sad

    Also use loops instead of duplicating code; reduces code size by ~10kB with
    negligible effect on performance.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    9f885c1
  28. x86: AVX2 high bit-depth vsad

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    0e69048
  29. x86: AVX2 high bit-depth pixel_sad_x3/pixel_sad_x4

    Also reduce the number of xmm registers used by sse2/ssse3 pixel_sad_x3.
    Gramner authored and Fiona Glaser committed Apr 23, 2013
    f49c2eb
  30. x86: AVX2 high bit-depth pixel_ssd

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    dc05aeb
  31. x86: AVX2 pixel_ssd_nv12_core

    Gramner authored and Fiona Glaser committed Apr 23, 2013
    03396f8
  32. x86: SSSE3 ads_mvs

    ~55% faster ads in benchasm, ~15-30% in real encoding.
    ~4% faster "placebo" preset overall.
    Fiona Glaser committed Apr 23, 2013
    40316f8
  33. x86-64: BMI2 cabac_residual functions

    Fiona Glaser committed Apr 23, 2013
    c17d12f
  34. x86: SSSE3 LUT-based faster coeff_level_run

    ~2x faster coeff_level_run.
    Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}.
    Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.
    Fiona Glaser committed Apr 23, 2013
    67d6f60

Commits on Apr 29, 2013

  1. Fix two bugs in slice-min-mbs and slices-max

    Slices-max broke slice-max-size when slices-max wasn't used.
    Slice-min-mbs broke in rare cases near the end of a threadslice.
    Fiona Glaser committed Apr 29, 2013
    7f36065

Commits on May 15, 2013

  1. Fix invalid memcpy in sliced-threads

    Likely didn't actually break in practice, but memcpy with src==dst
    is incorrect.
    Fiona Glaser committed May 15, 2013
    3ba0fb8
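
The "Fix invalid memcpy in sliced-threads" commit above turns on a rule of the C standard: memcpy has undefined behavior when source and destination overlap, and that includes the case where they are the same pointer. The commit message does not show how the fix was made. The sketch below is one common guard, with a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes unless both pointers name the same
 * buffer. Only the src == dst case is guarded; partially overlapping
 * ranges would still need memmove. */
void *copy_unless_same(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (dst != src)              /* memcpy requires non-overlapping objects */
        memcpy(dst, src, n);
    return dst;
}
```

When arbitrary overlap is possible, memmove is the simpler choice, because it is defined for every overlap pattern.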

Commits on May 17, 2013

  1. checkasm: Fix stack alignment bug

    Gramner authored and Fiona Glaser committed May 17, 2013
    0e000e7
  2. checkasm: Use 64-bit cycle counters

    Prevents overflows that can occur in some cases.
    Gramner authored and Fiona Glaser committed May 17, 2013
    5444e95
  3. x86inc: Remove .rodata kludges

    The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old.
    
    a.out was superseded by ELF on sane systems a few decades ago.
    Gramner authored and Fiona Glaser committed May 17, 2013
    c1e3709
  4. x86: add Jaguar CPU detection

    Fiona Glaser committed May 17, 2013
    25e219a
  5. x86: Add missing initializations for high bit-depth sad_aligned

    Gramner authored and Fiona Glaser committed May 17, 2013
    16d0372
  6. x86: Don't use explicitly aligned versions of SAD on AVX CPUs

    On modern CPUs, movdqu isn't slower than movdqa when used on aligned data, and using the same code in both cases saves cache.
    
    This was already done for the high bit-depth AVX2 implementation, but the aligned version still exists as dead code, so remove that.
    Gramner authored and Fiona Glaser committed May 17, 2013
    33c3526
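
The "checkasm: Use 64-bit cycle counters" commit in this group says only that overflows could occur in some cases. The sketch below is a generic illustration of that class of bug, not checkasm's code: a 32-bit running total wraps modulo 2^32, while a 64-bit total does not. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulate per-call cycle counts in 32 bits; wraps modulo 2^32. */
uint32_t sum_cycles_32(uint32_t cycles_per_call, uint32_t calls)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < calls; i++)
        total += cycles_per_call;
    return total;
}

/* The same accumulation in 64 bits; no wrap for realistic benchmarks. */
uint64_t sum_cycles_64(uint32_t cycles_per_call, uint32_t calls)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < calls; i++)
        total += cycles_per_call;
    return total;
}

/* The average is only meaningful if the total did not wrap. */
uint64_t average_cycles(uint64_t total, uint32_t calls)
{
    assert(calls != 0);
    return total / calls;
}
```

Two calls of 3,000,000,000 cycles each sum to 6,000,000,000 in 64 bits, but wrap to 1,705,032,704 in 32 bits, so any average derived from the 32-bit total is wrong.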

Commits on May 20, 2013

  1. x86inc: Utilize the shadow space on 64-bit Windows

    Store XMM6 and XMM7 in the shadow space in functions that clobber them.
    This way we don't have to adjust the stack pointer as often,
    reducing the number of instructions as well as code size.
    Gramner authored and Fiona Glaser committed May 20, 2013
    30c91f6
  2. x86: 32-byte align the stack if possible

    Avoids the need for manual 32 byte array alignment on compilers that support
    -mpreferred-stack-boundary.
    Fiona Glaser committed May 20, 2013
    7cbb27f
  3. x86-64: faster SSSE3 trellis

    ~2% faster trellis.
    Fiona Glaser committed May 20, 2013
    1f5a32c
  4. x86: faster SSSE3 hpel

    ~7% faster using the pmulhrsw trick from mc_chroma.
    Fiona Glaser committed May 20, 2013
    a838417
  5. x86: Faster high bit-depth intra_sad_x3_4x4

    20->16 cycles on Ivy Bridge
    Gramner authored and Fiona Glaser committed May 20, 2013
    594dd84
  6. x86: AVX2 deblock strength

    30->18 cycles
    Fiona Glaser committed May 20, 2013
    8e4f045
  7. x86: AVX2 high bit-depth intra_sad_x3_8x8

    43->24 cycles
    Gramner authored and Fiona Glaser committed May 20, 2013
    f114746
  8. x86: AVX2 intra_sad_x3_8x8c

    30->22 cycles
    Fiona Glaser committed May 20, 2013
    af6647e
  9. x86: faster AVX2 quant_4x4x4

    10->9 cycles
    Fiona Glaser committed May 20, 2013
    0c00c2c
  10. x86: AVX2 add16x16_idct_dc

    27->19 cycles
    Fiona Glaser committed May 20, 2013
    02aa136
  11. x86: AVX2 high bit-depth quant

    quant_4x4: 13->6 cycles
    quant_4x4_dc: 14->8 cycles
    quant_8x8: 47->24 cycles
    quant_4x4x4: 48->25 cycles
    Gramner authored and Fiona Glaser committed May 20, 2013
    481e4cd
  12. x86: AVX2 high bit-depth denoise_dct

    28->15 cycles
    
    Also reorder instructions to use fewer registers; this is 3 cycles faster on Ivy Bridge with 64-bit Windows.
    Gramner authored and Fiona Glaser committed May 20, 2013
    89f067b
  13. x86-64: 64-bit variant of AVX2 hpel_filter

    ~5% faster than 32-bit.
    Fiona Glaser committed May 20, 2013
    bc88d1b
  14. x86: AVX2 high bit-depth dequant

    Gramner authored and Fiona Glaser committed May 20, 2013
    edf31ed
  15. x86: AVX2 dequant_4x4_dc

    Gramner authored and Fiona Glaser committed May 20, 2013
    e7cb328
  16. x86: shave a few instructions off AVX deblock

    Fiona Glaser committed May 20, 2013
    0b2c3d3
  17. OpenCL support improvement/refactoring

    Autoload the OpenCL library so that it's not required to run an OpenCL-enabled
    build of x264.
    
    Update X264_BUILD, which should have been changed with the first patch.
    MasterNobody authored and Fiona Glaser committed May 20, 2013
    3aa9a67
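
The "faster SSSE3 hpel" commit in this group refers to "the pmulhrsw trick from mc_chroma" without spelling it out. One plausible reading, offered here as an assumption, is that pmulhrsw with a power-of-two constant performs a rounded right shift in one instruction instead of an add followed by a shift. The scalar model below follows the documented per-lane behavior of pmulhrsw. The function names and the example constant 512 (a rounded shift by 6) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of pmulhrsw:
 * result = (((a * b) >> 14) + 1) >> 1.
 * Assumes >> on negative values is an arithmetic shift. */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(((product >> 14) + 1) >> 1);
}

/* Conventional rounded shift: add half, then shift right by n. */
int16_t round_shift(int16_t x, int n)
{
    assert(n >= 1 && n <= 15);
    return (int16_t)(((int32_t)x + (1 << (n - 1))) >> n);
}
```

For a shift by n, multiplying by 1 << (15 - n) gives the same result as round_shift, so pmulhrsw_lane(x, 512) matches round_shift(x, 6).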

Commits on May 22, 2013

  1. Fix compilation with OpenCL on Mac OS X

    Also fix crash in the case of OpenCL error during encoding.
    MasterNobody authored and Fiona Glaser committed May 22, 2013
    3b8e924

Commits on May 28, 2013

  1. Fix building with compilers without inline asm support

    Also fix crash in high bit depth builds compiled with unaligned stack.
    MasterNobody authored and Fiona Glaser committed May 28, 2013
    e32d9c2

Commits on Jul 3, 2013

  1. Fix potential misalignment crash in AVX2 denoise_dct

    Gramner authored and Fiona Glaser committed Jul 3, 2013
    c41b629
  2. Fix build with PIC on some systems

    pengvado authored and Fiona Glaser committed Jul 3, 2013
    25ef3f5
  3. Fix possible crash when writing very large filler NALUs

    Bitstream-reallocation function didn't handle the case of filler.
    MasterNobody authored and Fiona Glaser committed Jul 3, 2013
    ffc3ad4
  4. OpenCL cosmetics

    MasterNobody authored and Fiona Glaser committed Jul 3, 2013
    83d35c7
  5. Interface: if vbv-maxrate < bitrate, set bitrate = vbv-maxrate

    This probably makes more sense to the user than setting vbv-maxrate = bitrate,
    as before.
    Fiona Glaser committed Jul 3, 2013
    9143d5a
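
The rule in this commit message can be sketched in a few lines of C. This is only an illustration, not x264's actual code: the struct, its field names, and the convention that a maxrate of 0 means "VBV disabled" are all assumptions made for the example.

```c
/* Hypothetical stand-in for the two rate-control settings involved.
 * Field names and the "0 = VBV disabled" convention are assumptions. */
typedef struct {
    int bitrate;          /* target average bitrate, kbit/s */
    int vbv_max_bitrate;  /* VBV maxrate, kbit/s; 0 means no VBV limit */
} rc_settings_t;

/* If a VBV maxrate is set and it is lower than the requested bitrate,
 * lower the bitrate to the maxrate (rather than raising the maxrate).
 * Returns 1 if the bitrate was changed, 0 otherwise. */
int clamp_bitrate_to_vbv(rc_settings_t *rc)
{
    if (rc->vbv_max_bitrate > 0 && rc->vbv_max_bitrate < rc->bitrate) {
        rc->bitrate = rc->vbv_max_bitrate;
        return 1;
    }
    return 0;
}
```

With a requested bitrate of 5000 and a maxrate of 3000, the sketch leaves the maxrate alone and lowers the bitrate to 3000.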
  6. Add "--stitchable" option for segmented encoding

    Stops x264 from attempting to optimize global stream headers, ensuring that
    different segments of a video will have identical headers when used with
    identical encoding settings.
    Fiona Glaser committed Jul 3, 2013
    fa215fc
  7. 397f60e
  8. bfa2f0c
  9. Tweak i16x16-delta-quant-avoidance code

    Don't omit the delta quant if it'd raise the quantizer to do so; this fixes
    a rare flickering issue caused by deblocking.
    Fiona Glaser committed Jul 3, 2013
    01087fd

Commits on Jul 5, 2013

  1. x86: Remove X264_CPU_SSE_MISALIGN functions

    Prevents a crash if the misaligned exception mask bit is cleared for some reason.
    
    Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule.
    They also require modifying the MXCSR control register and by removing those functions
    we can get rid of that complexity altogether.
    
    VEX-encoded instructions also support unaligned memory operands. I tried adding AVX
    implementations of all removed functions but there were no performance improvements on
    Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though
    so I kept them and added some minor cosmetic fixes and tweaks.
    Gramner authored and Fiona Glaser committed Jul 5, 2013
    ff41804

Commits on Aug 23, 2013

  1. Fix AVX2 detection bug with "limit CPUID" enabled in BIOS

    Fiona Glaser committed Aug 23, 2013
    2d66c7c
  2. Fix a few minor bugs found with a static analyzer

    MasterNobody authored and Fiona Glaser committed Aug 23, 2013
    a6c396f
  3. Fix cases in which intra refresh allowed prediction from disallowed pixels

    MasterNobody authored and Fiona Glaser committed Aug 23, 2013
    1430b04
  4. x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

    This is also a valid value for WIN64.
    dwbuiten authored and Fiona Glaser committed Aug 23, 2013
    adc99d1
  5. configure: Support cygwin64

    Diogo Franco authored and Fiona Glaser committed Aug 23, 2013
    401edc3
  6. x86: Faster AVX2 pixel_sad_x3 and pixel_sad_x4

    Gramner authored and Fiona Glaser committed Aug 23, 2013
    4becc3e
  7. x86: SSSE3 implementation of pixel_sad_x3 and pixel_sad_x4

    Gramner authored and Fiona Glaser committed Aug 23, 2013
    e33aac9
  8. Transparent hugepage support

    Combine frame and mb data mallocs into a single large malloc.
    Additionally, on Linux systems with hugepage support, ask for hugepages on
    large mallocs.
    
    This gives a small performance improvement (~0.2-0.9%) on systems without
    hugepage support, as well as a small memory footprint reduction.
    
    On recent Linux kernels with hugepage support enabled (set to madvise or
    always), it improves performance up to 4% at the cost of about 7-12% more
    memory usage on typical settings.
    
    It may help even more on Haswell and other recent CPUs with improved 2MB page
    support in hardware.
    Gramner authored and Fiona Glaser committed Aug 23, 2013
    fa1e2b7
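
As a rough sketch of "ask for hugepages on large mallocs", the function below aligns large allocations to 2 MB and passes the Linux `MADV_HUGEPAGE` hint to `madvise`. It is not the code from this commit: the function name, the 2 MB page size, and the size threshold are assumptions chosen for illustration, and the hint is compiled out on systems that do not define `MADV_HUGEPAGE`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose madvise/MADV_HUGEPAGE on glibc in strict -std= modes */
#endif
#include <stdlib.h>
#include <sys/mman.h>

/* Assumed values for illustration only. */
#define HUGE_PAGE_SIZE      ((size_t)2 * 1024 * 1024)
#define HUGE_PAGE_THRESHOLD (HUGE_PAGE_SIZE * 7 / 8)

/* Allocate size bytes. Large requests are aligned to the huge page size
 * and the kernel is advised that huge pages would be welcome. The advice
 * is only a hint: the allocation works the same if it is ignored.
 * Release the memory with free(). */
void *alloc_maybe_huge(size_t size)
{
    void *p = NULL;
    if (size >= HUGE_PAGE_THRESHOLD) {
        if (posix_memalign(&p, HUGE_PAGE_SIZE, size) != 0)
            return NULL;
#ifdef MADV_HUGEPAGE
        madvise(p, size, MADV_HUGEPAGE);  /* best effort; return value ignored */
#endif
        return p;
    }
    /* Small requests: ordinary allocation with a modest alignment. */
    if (posix_memalign(&p, 64, size) != 0)
        return NULL;
    return p;
}
```

Combining many small allocations into one large one, as the commit message describes, is what makes a request big enough to cross a threshold like this in the first place.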
  9. AVC-Intra support

    This format has been reverse engineered and x264's output has almost exactly
    the same bitstream as Panasonic cameras and encoders produce. It therefore does
    not comply with SMPTE RP2027 since Panasonic themselves do not comply with
    their own specification. It has been tested in Avid, Premiere, Edius and
    Quantel.
    
    Parts of this patch were written by Fiona Glaser and some reverse
    engineering was done by Joseph Artsimovich.
    Kieran Kunhya authored and Fiona Glaser committed Aug 23, 2013
    9b94896
  10. Windows Unicode support

    Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8.
    
    This patch does the following in order to handle things like Unicode filenames:
    * Keep strings internally as UTF-8.
    * Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
    * Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
    * Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
    Gramner authored and Fiona Glaser committed Aug 23, 2013
    fa3cac5
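
The UTF-16 to UTF-8 step in the list above can be illustrated with a small portable converter. On Windows this conversion is normally done with the `WideCharToMultiByte` API using `CP_UTF8`; the commit message does not say which approach the patch takes, so the hand-written function below (its name and error convention are made up for this example) is only a sketch of what the conversion involves.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-16 string to NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * input contains an unpaired surrogate or dst is too small. */
int utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t o = 0;
    while (*src) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;  /* low surrogate without a preceding high one */
        }
        size_t n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + n >= dst_size)
            return -1;  /* keep one byte free for the terminating NUL */
        if (n == 1) {
            dst[o++] = (char)cp;
        } else if (n == 2) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (n == 3) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (o >= dst_size)
        return -1;
    dst[o] = '\0';
    return (int)o;
}
```

Keeping strings as UTF-8 internally means a conversion like this is needed only at the boundary, when the command line is read; the reverse conversion happens just before each Windows API call.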

Commits on Aug 24, 2013

  1. Fix GPAC support on Windows

    boiled-sugar authored and Fiona Glaser committed Aug 24, 2013
    098b686

Commits on Aug 26, 2013

  1. Fix masked access violation in KERNEL32

    Caused crashes under gdb in Windows and might cause other unknown problems.
    MasterNobody authored and Fiona Glaser committed Aug 26, 2013
    5bcff2a

Commits on Aug 27, 2013

  1. Workaround for FFMS indexing bug

    If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL
    like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
    Gramner authored and Fiona Glaser committed Aug 27, 2013
    2fd2923
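
The workaround amounts to a guard that runs before `FFMS_ReadIndex` is called. A minimal sketch of such a guard in portable C follows; the helper name is hypothetical and the actual patch may detect an empty file differently (for example via the file size).

```c
#include <stdio.h>

/* Returns 1 if the file at path can be opened and contains at least one
 * byte, 0 if it is missing, unreadable, or empty. A caller would read the
 * index only when this returns 1 and otherwise treat the index as absent. */
int index_file_is_usable(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    int has_data = fgetc(f) != EOF;  /* EOF on the first read means empty */
    fclose(f);
    return has_data;
}
```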

Commits on Sep 3, 2013

  1. Fix INSTALL in configure for Solaris systems

    timmooney authored and Fiona Glaser committed Sep 3, 2013
    5b272b2

Commits on Oct 24, 2013

  1. 50a0c33
  2. Fix compilation of shared library for Windows with original MinGW toolchain

    MasterNobody authored and Fiona Glaser committed Oct 24, 2013
    266fdfc
  3. Fix compilation in case the HAVE_LOG2F check fails spuriously

    MasterNobody authored and Fiona Glaser committed Oct 24, 2013
    03450be
  4. configure: include dependency libs in the Libs pkg-config

    If only a static library is built, the user of the library that just
    tries to link to the lib using the flags provided by pkg-config
    might not know that only a static lib exists and that he'd have to
    pass --static to pkg-config to get the internal dependencies to
    be able to link the library.
    
    For a shared build, the internal dependencies are kept in Libs.private
    as before.
    
    This matches how libav's pkg-config files are generated.
    mstorsjo authored and Fiona Glaser committed Oct 24, 2013
    12f9d49
  5. configure: don't generate a git version number if .git isn't present

    SeanMcG authored and Fiona Glaser committed Oct 24, 2013
    c3c73f1

Commits on Oct 25, 2013

  1. version.sh: change to use /bin/sh

    funman authored and Fiona Glaser committed Oct 25, 2013
    b7b6029
  2. Update to current libav/ffmpeg API

    MasterNobody authored and Fiona Glaser committed Oct 25, 2013
    05f0438
  3. Replace gf_malloc with regular malloc in mp4 muxer

    It was used as a workaround for a bug that only existed in the GPAC repository
    for a few weeks back in 2010. There's no reason to keep it anymore.
    Gramner authored and Fiona Glaser committed Oct 25, 2013
    8b58a4c
  4. Use calloc instead of malloc + memset

    Gramner authored and Fiona Glaser committed Oct 25, 2013
    b54422a
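
For readers unfamiliar with the idiom, the change is of the following shape. The struct and function names are invented for the example and do not come from x264.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, for illustration only. */
typedef struct {
    int width;
    int height;
    unsigned char *plane;
} frame_info_t;

/* Before: allocate, then clear in a separate step. */
frame_info_t *frame_info_new_memset(void)
{
    frame_info_t *f = malloc(sizeof(*f));
    if (f)
        memset(f, 0, sizeof(*f));
    return f;
}

/* After: calloc returns zero-filled memory in a single call. */
frame_info_t *frame_info_new_calloc(void)
{
    return calloc(1, sizeof(frame_info_t));
}
```

Both functions return an all-zero structure or NULL on failure; the calloc version is shorter and leaves no window in which the memory is allocated but not yet cleared.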
  5. x86inc: Make ym# behave the same way as xm#

    This makes more sense for future implementations of templates with zmm registers.
    Gramner authored and Fiona Glaser committed Oct 25, 2013
    4b68633
  6. CRF-max: don't warn if VBV underflow occurs

    Only warn if underflow occurs for reasons other than CRF-max, as CRF-max
    implies that VBV underflow is desired by the user.
    Fiona Glaser committed Oct 25, 2013
    7634f8c
  7. chroma-me: take shortcut in BI analysis

    ~100 cycles faster with subme>=9
    Fiona Glaser committed Oct 25, 2013
    77cc44f

Commits on Oct 30, 2013

  1. Make x264_encoder_reconfig more threadsafe

    Do the reconfig when the next frame's encode begins.
    Fixes some rare crashes with frame-threading and encoder_reconfig.
    MasterNobody authored and Fiona Glaser committed Oct 30, 2013
    350b214
  2. Add --filler option

    Allows generation of hard-CBR streams without using NAL HRD.
    Useful if you want to be able to reconfigure the bitrate (which you can't do
    with NAL HRD on).
    Fiona Glaser committed Oct 30, 2013
    c084f6c
  3. Add AVC-Intra 1080p50/60 Class 100 parameters

    Also add some compatibility fixes.
    kierank authored and Fiona Glaser committed Oct 30, 2013
    c9f2bce
  4. Add L-SMASH support as preferable alternative for MP4-muxing

    MasterNobody authored and Fiona Glaser committed Oct 30, 2013
    09c7010
  5. Remove --visualize option.

    It probably wasn't used or maintained for the last few years.
    MasterNobody authored and Fiona Glaser committed Oct 30, 2013
    95d196e

Commits on Jan 6, 2014

  1. Fix uninitialized variable

    Caused if the timebase is not specified in stats file. Found by Clang.
    MasterNobody authored and Fiona Glaser committed Jan 6, 2014
    a2f5d60

Commits on Jan 8, 2014

  1. Fix ARM asm compilation with Apple assembler

    Steve Clark authored and Fiona Glaser committed Jan 8, 2014
    9148141
  2. Fix input support from named pipes in Windows

    MasterNobody authored and Fiona Glaser committed Jan 8, 2014
    008c56e
  3. CLI: Avoid redundant 16-bit upconversions in piped raw input

    It's not possible to seek in pipes, so if we want to skip frames we have to read and
    discard unused ones. It's pointless to do bit-depth upconversions in those frames.
    Gramner authored and Fiona Glaser committed Jan 8, 2014
    6bc6341
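
Skipping frames on a non-seekable input means reading the bytes and throwing them away, with no per-frame processing in between. The sketch below shows that idea with plain stdio; the function name and the fixed scratch-buffer size are assumptions for the example, not x264's CLI code.

```c
#include <stdio.h>

/* Discard count frames of frame_size bytes each from a stream that cannot
 * seek (for example a pipe). The data is read into a small scratch buffer
 * and never converted or copied anywhere else.
 * Returns 0 on success, -1 if the stream ends early. */
int skip_frames(FILE *in, size_t frame_size, int count)
{
    unsigned char scratch[4096];
    for (int i = 0; i < count; i++) {
        size_t left = frame_size;
        while (left > 0) {
            size_t chunk = left < sizeof(scratch) ? left : sizeof(scratch);
            if (fread(scratch, 1, chunk, in) != chunk)
                return -1;
            left -= chunk;
        }
    }
    return 0;
}
```

Any bit-depth upconversion would then be applied only to frames that are actually handed to the encoder.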
  4. 7664014
  5. Remove tools/xyuv.c

    It's an old stand-alone application that isn't relevant to x264.
    Gramner authored and Fiona Glaser committed Jan 8, 2014
    02697d5
  6. Bump dates to 2014

    Also update AUTHORS file and my e-mail address in the headers of various files.
    Gramner authored and Fiona Glaser committed Jan 8, 2014
    807aeaa
  7. Avoid some unnecessary memory loads in macroblock_encode

    Gramner authored and Fiona Glaser committed Jan 8, 2014
    8be6600

Commits on Jan 21, 2014

  1. Fix quantization factor allocation

    We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're
    never used.
    Fiona Glaser committed Jan 21, 2014
    e2a9662
  2. v210 input support

    Assembly based on code by Henrik Gramner and Loren Merritt.
    jamesba authored and Fiona Glaser committed Jan 21, 2014
    41227fa
  3. Add support for AVC-Intra Class 200

    kierank authored and Fiona Glaser committed Jan 21, 2014
    dd6a303
  4. x86inc: speed up compilation with yasm

    Work around yasm's inefficiency with handling large numbers of variables
    in the global scope.
    pengvado authored and Fiona Glaser committed Jan 21, 2014
    42d2519

Commits on Feb 24, 2014

  1. Fix build with Android NDK

    Android NDK does not expose sched_getaffinity.
    dreifachstein authored and Fiona Glaser committed Feb 24, 2014
    0d668be
  2. Really fix quantization factor allocation

    Actually allocate less (instead of just initialize less) and fix comments.
    MasterNobody authored and Fiona Glaser committed Feb 24, 2014
    ee8d5e4

Commits on Mar 11, 2014

  1. Fix checkasm --bench output when nop_cycles is too large

    MasterNobody authored and Fiona Glaser committed Mar 11, 2014
    48dbfa2
  2. Fix corruption with CAVLC overflow handling in MBAFF+main profile

    Probably a regression in r2178.
    Fiona Glaser committed Mar 11, 2014
    19dddbc
  3. Fix memory overwrite in x264_deblock_h_chroma_mbaff_sse2

    Fixes possible corruption with MBAFF+sliced threads.
    MasterNobody authored and Fiona Glaser committed Mar 11, 2014
    850c8c5
  4. mbaff: fix mb_field_decoding_flag tracking and simplify allow skip check

    Fixes an issue with too many forced non-skips in mbaff+cavlc, as well as
    non-deterministic output with mbaff+cavlc+sliced-threads.
    MasterNobody authored and Fiona Glaser committed Mar 11, 2014
    8b821ec

Commits on Mar 12, 2014

  1. Fix pointer cast warning for 64-bit builds

    MasterNobody authored and Fiona Glaser committed Mar 12, 2014
    de01d88
  2. x264.h: fix documentation

    The full details of the return values of encoder_encode and encoder_headers
    were mistakenly removed a while ago; re-add them.
    Fiona Glaser committed Mar 12, 2014
    b7a50c1
  3. Don't set chroma_loc_info_present_flag for non-4:2:0

    The H.264 spec says it shouldn't be set in these cases.
    MasterNobody authored and Fiona Glaser committed Mar 12, 2014
    f35e3fc
  4. Write 3D metadata when outputting Matroska

    For when --frame-packing is set.
    Steve Lhomme authored and Fiona Glaser committed Mar 12, 2014
    0bb3b2e
  5. x86: Pass -Worphan-labels to yasm

    Makes it easier to detect typos.
    Gramner authored and Fiona Glaser committed Mar 12, 2014
    8596dd3
  6. x86inc: free up variable name "n" in global namespace

    pengvado authored and Fiona Glaser committed Mar 12, 2014
    974f2e7
  7. x86inc: warn if XOP integer FMA instruction emulation is impossible

    Emulation requires a temporary register if arguments 1 and 4 are the same; this
    doesn't obey the semantics of the original instruction, so we can't emulate
    that in x86inc.
    
    ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever
    needs it.
    
    Also add pmacsdql emulation.
    MasterNobody authored and Fiona Glaser committed Mar 12, 2014
    039fab9
  8. x86inc: Support arbitrary stack alignments

    If the stack is known to be at least 32-byte aligned we can safely store ymm
    registers on the stack without doing manual alignment.
    
    Change ALLOC_STACK to always align the stack before allocating stack space for
    consistency. Previously alignment would occur either before or after allocating
    stack space depending on whether manual alignment was required or not.
    Gramner authored and Fiona Glaser committed Mar 12, 2014
    7c860f0
  9. x86: Minor mbtree_propagate_cost improvements

    Reduce the number of registers used from 7 to 6.
    Reduce the number of vector registers used by the AVX2 implementation from 8 to 7.
    Multiply fps_factor by 1/256 once per frame instead of once per macroblock row.
    Use mova instead of movu for dst since it's guaranteed to be aligned.
    Some cosmetics.
    Gramner authored and Fiona Glaser committed Mar 12, 2014
    f032147
  10. x86: SSE2 and SSSE3 plane_copy_deinterleave_rgb

    About 5.6x faster than C on Haswell.
    Gramner authored and Fiona Glaser committed Mar 12, 2014
    a90ea34
  11. arm: implement x264_pixel_var_8x16_neon

    checkasm --bench on a cortex-a9:
    var_8x16_c: 4306
    var_8x16_neon: 791
    Janne Grunau authored and Fiona Glaser committed Mar 12, 2014
    6683612
  12. arm: implement x264_pixel_var2_8x16_neon

    checkasm --bench on a cortex-a9:
    var2_8x16_c: 5677
    var2_8x16_neon: 1421
    Janne Grunau authored and Fiona Glaser committed Mar 12, 2014
    ac8f2e8

Commits on Mar 13, 2014

  1. arm: use available neon functions for intra_sa8d/sad/satd_x3

    4% faster on main/medium, 15% faster on baseline/superfast on a cortex-a9.
    Janne Grunau authored and Fiona Glaser committed Mar 13, 2014
    00a00cc
  2. Macroblock tree overhaul/optimization

    Move the second core part of macroblock tree into an assembly function;
    SIMD-optimize roughly half of it (for x86). Roughly 25-65% faster mbtree,
    depending on content.
    
    Slightly change how mbtree handles the tradeoff between range and precision
    for propagation.
    
    Overall a slight (but mostly negligible) effect on SSIM and ~2% faster.
    Fiona Glaser committed Mar 13, 2014
    b3fb718

Commits on Mar 6, 2017

  1. 01f973d