Merge with x264.git #1
base: master
Commits on Feb 4, 2012
Clean up and optimize weightp, plus enable SSSE3 weight on SB/BDZ
Also remove unused AVX cruft.
Fiona Glaser committed Feb 4, 2012 (commit 6d7c5ef)
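The weightp functions implement H.264 explicit weighted prediction. As a rough illustration of the per-pixel operation the SIMD code accelerates, here is a plain-C sketch written from the H.264 single-list weighting formula rather than taken from x264. The name `weight_row`, its parameters and the 8-bit pixel assumption are illustrative.

```c
#include <stdint.h>

/* Clamp to the 8-bit pixel range [0, 255]. */
static inline uint8_t clip_pixel(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical helper: weight one row of 8-bit pixels.
 * log_wd = log2 of the weight denominator, w = scale, o = offset.
 * Assumes an arithmetic right shift for negative products (true on gcc/clang). */
static void weight_row(uint8_t *dst, const uint8_t *src, int width,
                       int log_wd, int w, int o)
{
    for (int x = 0; x < width; x++) {
        int v;
        if (log_wd >= 1)
            v = ((src[x] * w + (1 << (log_wd - 1))) >> log_wd) + o; /* rounded */
        else
            v = src[x] * w + o;
        dst[x] = clip_pixel(v);
    }
}
```

With `log_wd = 6` and `w = 64` the weight is 1.0, so only the offset changes the pixels.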
Minor asm optimizations/cleanup
Fiona Glaser committed Feb 4, 2012 (commit 04c3819)
Commit e0581e0
TBM, AVX2, FMA3, BMI1, and BMI2 CPU detection support
TBM and BMI1 are supported by Trinity/Piledriver. The others (and BMI1) will probably appear in Intel's upcoming Haswell. Also update x86inc with AVX2 stuff.
Fiona Glaser committed Feb 4, 2012 (commit ae289e6)
Commits on Feb 5, 2012
Broke register preservation in x264_cpu_cpuid and x264_cpu_xgetbv. Did not cause any problems.
Henrik Gramner authored and Fiona Glaser committed Feb 5, 2012 (commit a37a424)
Commits on Feb 15, 2012
Fix interlaced + extremal slice-max-size
Broke if the first macroblock in the slice exceeded the set slice-max-size.
Fiona Glaser committed Feb 15, 2012 (commit 282c3cf)
Commits on Mar 6, 2012
BGR/BGRA input was correct.
Commit 0fc5acc
Commit 10e1ba5
Commit 38a26cd
Commit 0a36950
Fix possible alignment crash when linking from MSVC
x264_cavlc_init needs to be stack-aligned now.
Fiona Glaser committed Mar 6, 2012 (commit d52d0b1)
Fix incorrect zero-extension assumptions in x86_64 asm
Some x264 asm assumed that the high 32 bits of registers containing "int" values would be zero. This is almost always the case, and it seems to work with gcc, but it is *not* guaranteed by the ABI. As a result, it breaks with some other compilers, like Clang, that take advantage of this in optimizations. Accordingly, fix all x86 code by using intptr_t instead of int or using movsxd where necessary. Also add a checkasm hack to detect when assembly functions incorrectly assume that 32-bit integers are zero-extended to 64-bit.
Henrik Gramner authored and Fiona Glaser committed Mar 6, 2012 (commit 3131a19)
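To see why the missing extension matters, the sketch below contrasts sign extension (what movsxd does) with the zero extension the old asm implicitly relied on. The helper names are hypothetical. For non-negative values the two agree, which helps explain why the bug went unnoticed; for a negative value such as a negative stride they diverge. The ABI leaves the upper 32 bits unspecified in any case, so only an explicit extension or a 64-bit parameter type such as intptr_t is safe.

```c
#include <stdint.h>

/* movsxd semantics: sign-extend a 32-bit int to 64 bits. */
static int64_t sign_extend32(int32_t v)
{
    return (int64_t)v;
}

/* The assumption the old asm made: upper 32 bits zero, i.e. zero extension. */
static int64_t zero_extend32(int32_t v)
{
    return (int64_t)(uint32_t)v;
}
```

A stride of -16 becomes -16 under sign extension but 4294967280 under zero extension, which would send pointer arithmetic roughly 4 GiB off.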
Not necessary for x264, as -m amd64 already does the right thing, but used by external users of x86inc.
Commit 3a5f2fe
Fiona Glaser committed Mar 6, 2012 (commit 9da19fb)
Remove explicit run calculation from coeff_level_run
Not necessary with the CAVLC lookup table for zero run codes.
Fiona Glaser committed Mar 6, 2012 (commit 1b31a10)
Commits on Mar 7, 2012
Add a small per-MB cost penalty for lowres
Helps avoid VBV predictors going nuts with very low-cost MBs. One particular case this fixes is zero-cost MBs: adaptive quantization decreases the QP a lot, but before this patch no cost penalty was factored in for this, because anything times zero is zero.
Commit 48e8e52
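The "anything times zero is zero" problem can be illustrated with a toy cost model. This is a hypothetical sketch: the function name, the use of a single multiplicative factor for the effect of a lowered QP, and the penalty values are assumptions for illustration, not x264's lowres code.

```c
/* Hypothetical toy model: a per-MB cost scaled by a factor representing how
 * much more expensive the MB becomes when AQ lowers its QP, with an optional
 * fixed penalty added before scaling. */
static int scaled_mb_cost(int raw_cost, int penalty, double qscale_factor)
{
    /* Without a penalty, raw_cost == 0 stays 0 for any factor. */
    return (int)((raw_cost + penalty) * qscale_factor + 0.5);
}
```

With `penalty = 0`, a zero-cost MB stays free no matter how far its QP drops; a small penalty makes the scaled cost grow with the factor.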
Abstract bitstream backup/restore functions
Required for row re-encoding.
Fiona Glaser committed Mar 7, 2012 (commit bc473dd)
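A minimal illustration of the backup/restore idea: saving the writer position before encoding a row lets that row be discarded and re-encoded. This uses a hypothetical bit writer, not x264's bs_t, and assumes the buffer is large enough for all output. Zero bits are written explicitly, so re-encoded data fully overwrites the discarded bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit writer (not x264's bs_t). */
typedef struct {
    uint8_t *buf;    /* output buffer, assumed large enough */
    size_t bitpos;   /* number of bits written so far */
} bitwriter;

/* Saved writer state; restoring it discards everything written since. */
typedef struct {
    size_t bitpos;
} bw_backup;

static void bw_put(bitwriter *bw, uint32_t value, int nbits)
{
    for (int i = nbits - 1; i >= 0; i--) {
        size_t byte = bw->bitpos >> 3;
        unsigned mask = 1u << (7 - (bw->bitpos & 7));
        /* Write 0 bits explicitly so re-encoded data overwrites stale bits. */
        if ((value >> i) & 1)
            bw->buf[byte] |= (uint8_t)mask;
        else
            bw->buf[byte] &= (uint8_t)~mask;
        bw->bitpos++;
    }
}

static bw_backup bw_save(const bitwriter *bw)
{
    bw_backup b = { bw->bitpos };
    return b;
}

static void bw_restore(bitwriter *bw, bw_backup b)
{
    bw->bitpos = b.bitpos;
}
```

A row encoder would call `bw_save` before the row, and `bw_restore` followed by a second encode if the row overshot its bit budget.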
Add row-reencoding support to VBV for improved accuracy
Extremely accurate, possibly 100% so (I can't get it to fail even with difficult VBVs). Does not yet support rows split on slice boundaries (occurs often with slice-max-size/mbs). Still inaccurate with sliced threads, but better than before.
Fiona Glaser committed Mar 7, 2012 (commit 2535ba1)
Fiona Glaser committed Mar 7, 2012 (commit 92b0bd9)
Intel was nice enough to make tzcnt equal to "rep bsf", which is backwards-compatible. This means we don't actually have to add new functions to make it work.
Fiona Glaser committed Mar 7, 2012 (commit 42db5e6)
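The trick works because TZCNT is encoded as BSF with a REP prefix, which older CPUs execute as plain BSF. For a nonzero operand both return the index of the lowest set bit; they differ only for a zero operand (TZCNT returns the operand width, BSF leaves the destination undefined) and in the flags they set. A portable C sketch of the shared semantics, with a hypothetical name:

```c
#include <stdint.h>

/* Trailing-zero count. For x != 0 this matches both BSF and TZCNT;
 * for x == 0 it returns 32, which is TZCNT's (not BSF's) behavior. */
static int ctz32(uint32_t x)
{
    if (x == 0)
        return 32;
    int n = 0;
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}
```

Code that never passes zero therefore gets identical results from either instruction.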
Recent AMD CPUs' instruction decoders choke horribly on extremely long nops (i.e. with 4 prefixes). Won't affect much, since we don't use ALIGN much.
Fiona Glaser committed Mar 7, 2012 (commit 5b2c62a)
Fully reconstruct frames even without dump-yuv.
Fiona Glaser committed Mar 7, 2012 (commit 90408ec)
Sliced-threads: do hpel and deblock after returning
Lowers encoding latency by around 14% in sliced-threads mode with preset superfast. Additionally, even if there is no waiting time between frames, this improves parallelism, because hpel+deblock are done during the (single-threaded) lookahead. For ease of debugging, dump-yuv forces all of the threads to wait and finish instead of setting b_full_recon.
Fiona Glaser committed Mar 7, 2012 (commit a155572)
Commits on Mar 12, 2012
Regression in r2183. Bizarrely seemed to work on many platforms, but crashed on win64 and may have been slower. Only affected sliced threads during encoding, but could cause crashes on x264 encoder close even without sliced threads.
Commit e046ba7
Commits on Mar 14, 2012
Fix sliced-threads ratecontrol bug
Was using qp instead of qscale; could cause NaNs (not to mention less accurate results).
Fiona Glaser committed Mar 14, 2012 (commit bca4127)
Commits on Mar 22, 2012
The code does, in fact, handle CAVLC+8x8dct correctly already.
Fiona Glaser committed Mar 22, 2012 (commit 065fec2)
Commits on Mar 25, 2012
Commit fff12b1
Commits on Mar 27, 2012
Kieran Kunhya authored and Fiona Glaser committed Mar 27, 2012 (commit 52f7a14)
Commits on Apr 23, 2012
ICL/MSVS: Fix shared library generation and usage
MSVS requires exported variables to be declared with the DATA keyword, and requires that imported variables be declared with dllimport. This does not, however, fix the x264 CLI being unable to use a shared library built by ICL.
Commit 70877e3
Commit 62d7007
Update config.guess and config.sub
Adds support for a bunch of targets, including aarch64 (armv8) and arm-linux-androideabi.
Commit f4aefb3
configure: force select -mXX gcc option for i386/x86-64
Makes multilib compilation more convenient.
Commit ffea9f5
Commit b0f44f9
Eradicate all mention of Extended Profile
x264 never supported it and never will because nobody uses it.
Henrik Gramner authored and Fiona Glaser committed Apr 23, 2012 (commit 66acbbf)
Commit e8952df
Faster chroma weight cost calculation
New assembly function with SSE2, SSSE3 and XOP implementations for calculating absolute sum of differences.
Henrik Gramner authored and Fiona Glaser committed Apr 23, 2012 (commit 4442eac)
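Reading "absolute sum of differences" as the absolute value of the sum of differences (not a sum of absolute differences, SAD), positive and negative differences cancel, so the result reflects only the difference in mean level between the two blocks. A scalar sketch of that computation over an 8-pixel-wide block; the name and signature are hypothetical, modeled loosely on x264's pixel functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* |sum(a - b)| over an 8-wide, height-tall block. */
static int asd8(const uint8_t *a, intptr_t stride_a,
                const uint8_t *b, intptr_t stride_b, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++, a += stride_a, b += stride_b)
        for (int x = 0; x < 8; x++)
            sum += a[x] - b[x];
    return abs(sum);
}
```

For two blocks with one pixel +10 and another -10, SAD would be 20 while this returns 0.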
Commits on Apr 24, 2012
Add mb_info API for signalling constant macroblocks
Some use-cases of x264 involve encoding video with large constant areas of the frame. Sometimes, the caller knows which areas these are, and can tell x264. This API lets the caller do this and adds internal tracking of modifications to macroblocks to avoid problems. This is really only suitable without B-frames. An example use-case would be using x264 for VNC.
Fiona Glaser committed Apr 24, 2012 (commit 8e57a9a)
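A caller that does not already know its static regions might derive them by comparing frames. The sketch below marks 16x16 luma macroblocks that are byte-identical to the previous frame, producing one byte per macroblock in raster order. It is hypothetical: the flag value is assumed to mirror X264_MBINFO_CONSTANT from x264.h, chroma is ignored, and width and height are assumed to be multiples of 16. Check x264.h for the exact picture-property field that receives such an array and its ownership rules.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to mirror X264_MBINFO_CONSTANT in x264.h. */
#define MBINFO_CONSTANT_FLAG (1u << 0)

/* Hypothetical helper: flag luma macroblocks unchanged since the previous frame. */
static void mark_constant_mbs(uint8_t *mb_info, const uint8_t *cur,
                              const uint8_t *prev, int stride,
                              int width, int height)
{
    int mb_w = width / 16, mb_h = height / 16;
    for (int my = 0; my < mb_h; my++) {
        for (int mx = 0; mx < mb_w; mx++) {
            int same = 1;
            for (int y = 0; y < 16 && same; y++) {
                const uint8_t *c = cur + (my * 16 + y) * stride + mx * 16;
                const uint8_t *p = prev + (my * 16 + y) * stride + mx * 16;
                same = memcmp(c, p, 16) == 0;
            }
            mb_info[my * mb_w + mx] = same ? MBINFO_CONSTANT_FLAG : 0;
        }
    }
}
```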
Commits on May 15, 2012
Commit 44d2f08
Commit 7cfe43c
Commits on May 18, 2012
Split each lookahead frame analysis call into multiple threads. Has a small impact on quality, but does not seem to be consistently any worse. This helps alleviate bottlenecks with many cores and frame threads. In many cases, this massively increases performance on many-core systems. For example, over 100% faster 1080p encoding with --preset veryfast on a 12-core i7 system. Realtime 1080p30 at --preset slow should now be feasible on real systems. For sliced-threads, this patch should be faster regardless of settings (~10%). By default, lookahead threads are 1/6 of regular threads. This isn't exact, but it seems to work well for all presets on real systems. With sliced-threads, it's the same as the number of encoding threads.
Fiona Glaser committed May 18, 2012 (commit df700ea)
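The default described above amounts to a small rule. The helper below is a hypothetical sketch of it; rounding down and the minimum of one thread are assumptions, and a later commit in this list refines the automatic selection.

```c
/* Hypothetical: default lookahead thread count for a given encoder setup. */
static int default_lookahead_threads(int threads, int sliced_threads)
{
    if (sliced_threads)
        return threads;        /* same as the number of encoding threads */
    int n = threads / 6;       /* roughly 1/6 of regular threads */
    return n < 1 ? 1 : n;
}
```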
Commits on Jul 3, 2012
Fix some integer overflows and check input parameters better. Also fix incorrect type specifiers for demuxer info printing.
Commit 5e3aaf1
x86inc: import patches from libav
Allow manual invocation of WIN64_SPILL_XMM even under INIT_MMX.
SSE version of mova is movaps rather than movdqa.
YMM version of movnta.
Add mp size for named arguments.
Fix DEFINE_ARGS when used outside of a cglobal.
Define a few more cpuflags.
3-argument wrappers for a few more instructions.
Commit 5754ea2
Commits on Jul 17, 2012
Cap ratecontrol predictor parameters
Limits VBV mispredictions after long periods of relatively constant video.
Commit bcd1a70
Commit 498af9c
Support changing resolutions between passes with macroblock-tree
Implement a basic separable bilinear filter to rescale the quantizer offsets. Structure inspired by swscale, but floating-point instead of fixed-point. Not as optimized as it could be, but it's quite fast already.
Example compression penalties on a 720p video game recording:
First pass with 720p and second as 480p: ~-1.5% (vs. same res)
First pass with 480p and second as 720p: ~-3% (vs. same res)
Fiona Glaser committed Jul 17, 2012 (commit dea5d7a)
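A minimal sketch of a separable bilinear rescale of a float grid, as a stand-in for the quantizer-offset rescaling described above. It is not x264's implementation: centre-aligned sampling, edge clamping and the function names are assumptions. Filtering horizontally into a temporary buffer and then vertically is what makes it separable; each pass only blends two neighbours.

```c
#include <stdlib.h>

/* Per output index: left source index and weight of its right neighbour,
 * using centre-aligned sampling with the source position clamped to range. */
static void bilinear_coeffs(int src_n, int dst_n, int *idx, float *frac)
{
    for (int i = 0; i < dst_n; i++) {
        float pos = (i + 0.5f) * src_n / dst_n - 0.5f;
        if (pos < 0.0f)
            pos = 0.0f;
        if (pos > (float)(src_n - 1))
            pos = (float)(src_n - 1);
        idx[i] = (int)pos;              /* pos >= 0, so truncation is floor */
        frac[i] = pos - (float)idx[i];
    }
}

/* Rescale a w x h grid to dw x dh. Returns 0 on success, -1 on allocation failure. */
static int rescale_bilinear(const float *src, int w, int h,
                            float *dst, int dw, int dh)
{
    float *tmp = malloc(sizeof(float) * (size_t)dw * (size_t)h);
    int *xi = malloc(sizeof(int) * (size_t)dw);
    int *yi = malloc(sizeof(int) * (size_t)dh);
    float *xf = malloc(sizeof(float) * (size_t)dw);
    float *yf = malloc(sizeof(float) * (size_t)dh);
    int ret = -1;
    if (tmp && xi && yi && xf && yf) {
        bilinear_coeffs(w, dw, xi, xf);
        bilinear_coeffs(h, dh, yi, yf);
        /* Horizontal pass: w x h -> dw x h. */
        for (int y = 0; y < h; y++) {
            const float *row = src + (size_t)y * w;
            for (int x = 0; x < dw; x++) {
                int x0 = xi[x], x1 = x0 + 1 < w ? x0 + 1 : x0;
                tmp[(size_t)y * dw + x] = row[x0] + (row[x1] - row[x0]) * xf[x];
            }
        }
        /* Vertical pass: dw x h -> dw x dh. */
        for (int y = 0; y < dh; y++) {
            int y0 = yi[y], y1 = y0 + 1 < h ? y0 + 1 : y0;
            const float *r0 = tmp + (size_t)y0 * dw, *r1 = tmp + (size_t)y1 * dw;
            for (int x = 0; x < dw; x++)
                dst[(size_t)y * dw + x] = r0[x] + (r1[x] - r0[x]) * yf[y];
        }
        ret = 0;
    }
    free(tmp); free(xi); free(yi); free(xf); free(yf);
    return ret;
}
```

Upscaling the row {0, 4} to four samples gives {0, 1, 3, 4}; a constant grid stays constant at any size.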
Try 8x8 transform analysis even when sub8x8 partitions are present
Turn off the sub8x8 partitions, try it, and turn them back on if it didn't help. Small compression improvement with p4x4 on (~0.1-0.5%). Also update related comments.
Fiona Glaser committed Jul 17, 2012 (commit d026397)
Faster predictor checking with subme<3
Fix a typo that made an early-skip less effective. Avoid a relatively unpredictable branch. Slightly changed output due to the typo-fix. ~50 cycles faster on Core i7.
Fiona Glaser committed Jul 17, 2012 (commit 2ec6941)
Commits on Jul 18, 2012
People don't seem to like this so I'm just going to get rid of it.
Fiona Glaser committed Jul 18, 2012 (commit 3d03b61)
Commits on Jul 26, 2012
Free user supplied data when deleting a frame
This eliminates a memory leak when calling x264_encoder_close.
Commit cbb9070
Commits on Jul 27, 2012
x86inc: automatically insert vzeroupper for YMM functions
Backported from libav.
Commit ed56837
Commits on Sep 5, 2012
Remove special-casing for OpenBSD pthread handling
Previously it was policy to use -pthread, but OpenBSD now recommends -lpthread. It's been libpthread anyway, and policy has changed to stop using -pthread.
Commit f8fd641
Export the average effective CRF of each frame
Useful to judge the resulting quality of a frame when VBV is enabled.
Fiona Glaser committed Sep 5, 2012 (commit cc5dced)
Improve mb_info constant mb optimization
Allow fast skipping even if the pskip MV isn't zero.
Fiona Glaser committed Sep 5, 2012 (commit 05089a3)
Add the input frame opaque pointer to the arguments. This makes it easier to use with multiple simultaneous x264 encodes.
Fiona Glaser committed Sep 5, 2012 (commit f93b786)
Fix mb_info_free with sliced threads
x264 would free mb_info before it was completely done using it.
Fiona Glaser committed Sep 5, 2012 (commit 033df0a)
Enhance mb_info: add mb_info_update
This feature lets the callee know which decoded macroblocks have changed.
Fiona Glaser committed Sep 5, 2012 (commit 8980dd8)
Commits on Sep 11, 2012
Set libm in the configure script if the OS has libm
Prerequisite for another configure patch after this. Idea copied from libpthread.
Commit e8e8b9a
Commits on Sep 26, 2012
Commit 02217bd
Fix use of deprecated av_close_input_file call
Jason Martens authored and Fiona Glaser committed Sep 26, 2012 (commit 9657747)
Commits on Nov 7, 2012
Fix ALIGNED_ARRAY_EMU macros on ICL
ICL's preprocessor doesn't handle it correctly. This fix is similar to libav's fix in 0db2d9.
Commit 21ba91a
Lossless mode can't currently be enabled mid-stream.
Commit 480bbc9
Fix crash with no-scenecut + mbtree
Fiona Glaser committed Nov 7, 2012 (commit ac2d7c0)
Disable ARM NEON MRC CPU test for Apple devices
The Apple A6 CPU doesn't support performance counters, so this test caused a crash.
David Wolstencroft authored and Fiona Glaser committed Nov 7, 2012 (commit 3f516c5)
x86inc: only define program_name if the macro is unset.
This allows overriding the value from outside the file. This can be useful if x86inc.asm is used outside of x264.
Commit 00cc160
x86inc: Rename 3dnow2 to 3dnowext
The name "3dnowext" is more common than "3dnow2". Doesn't affect x264.
Commit 5d85879
Commit cc61a4b
Update level dpb size calculation to match newer H.264 spec
Doesn't actually change encoding behavior, but makes it more correct. Warning messages should now be accurate at higher bit depths and non-4:2:0. Technically, since it redefines x264_level_t, this is an API version increment.
Fiona Glaser committed Nov 7, 2012 (commit 0d5f6fb)
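In recent editions of H.264, the DPB limit in Table A-1 is expressed in macroblocks (MaxDpbMbs) rather than in bytes, so it no longer depends on bit depth or chroma format. A sketch of the resulting frame limit, Min(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs), 16), for progressive video and a subset of levels; the function names are hypothetical.

```c
/* MaxDpbMbs from H.264 Table A-1 for a subset of levels (level_idc 41 = level 4.1). */
static int max_dpb_mbs(int level_idc)
{
    switch (level_idc) {
    case 30: return 8100;
    case 31: return 18000;
    case 32: return 20480;
    case 40: case 41: return 32768;
    case 42: return 34816;
    case 50: return 110400;
    case 51: case 52: return 184320;
    default: return -1; /* level not covered by this sketch */
    }
}

/* Maximum DPB size in frames for progressive video (frame_mbs_only assumed). */
static int max_dpb_frames(int level_idc, int width, int height)
{
    int dpb_mbs = max_dpb_mbs(level_idc);
    if (dpb_mbs < 0)
        return -1;
    int frame_mbs = ((width + 15) / 16) * ((height + 15) / 16);
    int frames = dpb_mbs / frame_mbs;
    return frames > 16 ? 16 : frames;
}
```

For 1080p at level 4.1, a frame is 120 x 68 = 8160 MBs, so 32768 / 8160 allows 4 reference frames.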
Improve slice header QP selection
Use the first macroblock of each slice instead of the last of the previous. Lets us pick a reasonable initial QP for the first slice too. Slightly improved compression.
Fiona Glaser committed Nov 7, 2012 (commit b304a7c)
Attempt to optimize PPS pic_init_qp in 2-pass mode
Small compression improvement; up to ~0.5% in extreme cases. Helps more with small slice sizes (tiny resolutions or slice-max-size). Note that this changes the 2-pass stats file format.
Fiona Glaser committed Nov 7, 2012 (commit 1580a74)
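Each slice header codes slice_qp_delta as a signed Exp-Golomb value, se(v), relative to 26 + pic_init_qp_minus26, so a PPS value close to the typical slice QP saves bits in every slice. Given the slice QPs from a first pass, one way to choose it is an exhaustive search. The sketch below does that; it is a hypothetical illustration, not necessarily x264's heuristic, and it ignores the one-time cost of coding pic_init_qp_minus26 in the PPS.

```c
/* Length in bits of the signed Exp-Golomb code se(v). */
static int se_bits(int v)
{
    unsigned code_num = v > 0 ? 2u * (unsigned)v - 1 : 2u * (unsigned)(-v);
    int len = 0;
    for (unsigned t = code_num + 1; t > 1; t >>= 1)
        len++;                  /* len = floor(log2(code_num + 1)) */
    return 2 * len + 1;
}

/* Hypothetical: pick the initial QP in [0, 51] (8-bit range assumed) that
 * minimises the total size of slice_qp_delta over the given slice QPs. */
static int best_init_qp(const int *slice_qp, int n, int *total_bits)
{
    int best_qp = 26, best = -1;
    for (int qp = 0; qp <= 51; qp++) {
        int bits = 0;
        for (int i = 0; i < n; i++)
            bits += se_bits(slice_qp[i] - qp);
        if (best < 0 || bits < best) {
            best = bits;
            best_qp = qp;
        }
    }
    if (total_bits)
        *total_bits = best;
    return best_qp;
}
```

For slice QPs {26, 26, 30}, an initial QP of 26 costs 1 + 1 + 7 = 9 bits, cheaper than any other choice.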
Commits on Nov 8, 2012
Fix possible issues with out-of-spec QP values
Fixes a possible regression in r2228.
Commit bfed708
Commits on Nov 12, 2012
Commit 144b791
Commits on Nov 19, 2012
lavf input: allocate AVFrame correctly
Allocate AVFrames correctly with avcodec_alloc_frame(). The incorrect allocation caused crashes with newer libavcodecs that try to free frame extradata.
Commit 0db80be
Commits on Dec 6, 2012
Solaris: use sysconf to get processor count
Solaris responds correctly to the same value as Cygwin, so let's use that.
Commit 12458a2
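A sketch of the sysconf approach. The commit does not name the constant; _SC_NPROCESSORS_ONLN is assumed here. It is a widely supported extension (Linux, Solaris and Cygwin among others) rather than part of base POSIX, hence the preprocessor guard. The helper name is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: number of online processors, falling back to 1. */
static int cpu_count(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;   /* sysconf returns -1 on failure */
#else
    return 1;                    /* constant unavailable on this platform */
#endif
}
```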
Commit cd71765
Commit 042fdd3
Commits on Dec 12, 2012
Fix pthread_join emulation on win32 and BeOS
Doesn't actually affect x264, but it's more correct.
Commit 23829dd
Commits on Jan 8, 2013
Fix build on ARM with binutils >= 2.23.51.0.6
GAS doesn't seem to like spaces in vld1 anymore, so remove those.
Bernhard Rosenkränzer authored and Fiona Glaser committed Jan 8, 2013 (commit 05c1646)
Fix crash if the first frame is forced to a non-keyframe
This is obviously bad user input, but x264 shouldn't crash if it happens.
Commit 8eddd52
Update config.guess and config.sub
Henrik Gramner authored and Fiona Glaser committed Jan 8, 2013 (commit 9d5ec55)
x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in 8-bit loopfilter functions so they can be used if there is no aligned stack (e.g. x86-32 MSVC or ICC 10.x).
Commit b073e87
x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition. REP_RET is still needed manually when it's a branch target, but that's much rarer. The implementation involves lots of spurious labels, but that's OK because we strip them.
Commit 4cf2728
Commits on Jan 9, 2013
x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before.
Henrik Gramner authored and Fiona Glaser committed Jan 9, 2013 (commit 8a9608b)
AVX2/FMA3 version of mbtree_propagate
First AVX2 function for testing. Bump yasm version to 1.2.0 for AVX2 support.
Fiona Glaser committed Jan 9, 2013 (commit ccda1ba)
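The quantity mbtree_propagate computes can be sketched in scalar C. This float version is a simplification written for illustration: x264 works on fixed-point cost arrays, and the exact scaling may differ between versions. The core step, scaling an accumulated amount by (intra - inter) / intra, is a chain of multiplies and adds that maps naturally onto FMA3.

```c
/* Simplified float sketch of one MB-tree propagate step per block. */
static void propagate_cost(float *dst, const float *propagate_in,
                           const float *intra, const float *inter,
                           const float *inv_qscale, float fps_factor, int n)
{
    for (int i = 0; i < n; i++) {
        /* Inter cost never exceeds intra cost for this purpose. */
        float inter_c = inter[i] < intra[i] ? inter[i] : intra[i];
        /* Information this block carries: its own (scaled) intra cost plus
         * whatever later frames have already propagated into it. */
        float amount = propagate_in[i] + intra[i] * inv_qscale[i] * fps_factor;
        /* Fraction inherited from its reference: (intra - inter) / intra. */
        dst[i] = intra[i] > 0.0f ? amount * (intra[i] - inter_c) / intra[i] : 0.0f;
    }
}
```

A block whose inter cost is a quarter of its intra cost passes three quarters of its amount back to its reference; a block that predicts no better than intra passes nothing.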
It is no longer needed now that we've bumped the version requirement of yasm to 1.2.0.
Henrik Gramner authored and Fiona Glaser committed Jan 9, 2013 (commit f2b4f29)
Commit 732b072
Commits on Feb 25, 2013
x86-32: use simple nop codes for <= sse
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes.
Commit 9475e6a
x86-64: fix trellis asm with interlacing
Regression in r2145. Assembly assumed array was [2][64] when it was actually [2][63]. Tiny (~0.1%) compression improvement.
Fiona Glaser committed Feb 25, 2013 (commit 5743b19)
Commit c2c2a95
Fix possible non-determinism with mbtree + open-gop + sync-lookahead
Code assumed keyframe analysis would only pull one frame off the list; this isn't true with open-gop.
Commit 43ff8f1
Update "Install and compile x264" in doc/regression_test.txt
Neil authored and Fiona Glaser committed Feb 25, 2013 (commit b671762)
Commit 6a82e49
x264.h: improve x264_encoder_reconfig documentation
Fiona Glaser committed Feb 25, 2013 (commit 3269534)
x86inc: rename program_name to private_prefix
Synced from libav. The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols.
Commit faf3dbe
x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>
Commit fd2c4a0
Commit 5ec5c78
Commit 5e0fca8
Commit f6e0d28
~4% faster PIC
WIN64: ~3% faster and 16 bytes shorter cabac_encode_bypass, ~8% faster cabac_encode_terminal
Benchmarked on Ivy Bridge
UNIX64: One instruction less in cabac_encode_bypass
Commit c3983b8
x86: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu. Also merge MMX and SSE versions of memcpy_aligned into a single macro.
Commit 5a76432
Improve lookahead-threads auto selection
Smarter decision to improve fast-first-pass performance in 2-pass encodes. Dramatically improves CPU utilization on multi-core systems.
Tested on a quad-core Ivy Bridge (12 threads, 1080p):
Fast first pass:
veryfast: ~7% faster
faster: ~11% faster
fast/medium: ~15% faster
slow/slower: ~42% faster
veryslow: ~55% faster
CRF/1-pass:
veryfast: ~9% faster
(all others remained the same)
Fiona Glaser committed Feb 25, 2013 (commit d2a9d25)
Fix two bugs in predictor checking
pmv wasn't checked properly in some cases, and neither was the zero vector. This is the output-changing portion of the following patch.
Fiona Glaser committed Feb 25, 2013 (commit 0046406)
x86: optimize and clean up predictor checking
Branchlessly handle elimination of candidates in MMX roundclip asm. Add a new asm function, similar to roundclip, except without the round part. Optimize and organize the C code, and make both subme>=3 and subme<3 consistent. Add lots of explanatory comments and try to make things a little more understandable. ~5-10% faster with subme>=3, ~15-20% faster with subme<3.
Fiona Glaser committed Feb 25, 2013 (commit 6371c3a)
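A scalar sketch of the candidate handling described above: round quarter-pel candidates to full-pel, clip them to the motion-vector range, and drop candidates equal to the predicted MV without a data-dependent branch. The name, argument layout (limit[0] holds the minimum x/y, limit[1] the maximum) and exact behavior are assumptions modeled on the commit text, not x264's code.

```c
#include <stdint.h>

static int clip3(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical: returns the number of candidates kept in dst (capacity n).
 * Assumes arithmetic right shift for negative MVs (true on gcc/clang). */
static int predictor_roundclip(int16_t (*dst)[2], int16_t (*mvc)[2], int n,
                               int16_t limit[2][2], const int16_t pmv[2])
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        /* Quarter-pel to full-pel with rounding, then clip to the MV range. */
        int mx = clip3((mvc[i][0] + 2) >> 2, limit[0][0], limit[1][0]);
        int my = clip3((mvc[i][1] + 2) >> 2, limit[0][1], limit[1][1]);
        /* Always store, but only advance when the candidate differs from pmv,
         * so a duplicate is overwritten by the next candidate. */
        dst[kept][0] = (int16_t)mx;
        dst[kept][1] = (int16_t)my;
        kept += (mx != pmv[0]) | (my != pmv[1]);
    }
    return kept;
}
```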
x86: faster high bit depth ssd
About 15% faster on average.
Commit 93bf124
Commits on Feb 26, 2013
x86: port SSE2+ SATD functions to high bit depth
Makes SATD 20-50% faster across all partition sizes except 4x4.
Commit 790c648
x86: combined SA8D/SATD dsp function
Speedup is most apparent for 8-bit (~30%), but gives some improvements for 10-bit too (~12%). 64-bit only for now.
Commit 75d9270
x86: detect Bobcat, improve Atom optimizations, reorganize flags
The Bobcat has a 64-bit SIMD unit reminiscent of the Athlon 64; detect this and apply the appropriate flags. It also has an extremely slow palignr instruction; create a flag for this to avoid massive penalties on palignr-heavy functions. Improve Atom function selection and document exactly what the SLOW_ATOM flag covers. Add Atom-optimized SATD/SA8D/hadamard_ac functions: simply combine the ssse3 optimizations with the sse2 algorithm to avoid pmaddubsw, which is slow on Atom along with other SIMD multiplies. Drop TBM detection; it'll probably never be useful for x264. Invert FastShuffle to SlowShuffle; it only ever applied to one CPU (Conroe). Detect CMOV, to fail more gracefully when run on a chip with MMX2 but no CMOV.
Fiona Glaser committed Feb 26, 2013
Commit 5d60b9c
x86: faster AVX satd/sa8d/sa8d_satd/hadamard_ac
Use Conroe-style movddup in AVX transforms; both Sandy Bridge and Bulldozer do movddup in the load unit, so it's totally free this way. On Sandy Bridge: ~6% faster sa8d_satd, ~5% faster hadamard_ac, ~9% faster 32-bit satd, ~2% faster sa8d.
Fiona Glaser committed Feb 26, 2013
Commit 68a6268
Fix some store forwarding stalls
There are quite a few others, but fixing most of them doesn't help, or there's no easy way to avoid them.
Fiona Glaser committed Feb 26, 2013
Commit 7de9a9a
Eliminate some branchiness in ME/analysis
Faster, fewer branch mispredictions.
Fiona Glaser committed Feb 26, 2013
Commit 7b1301e
Add AvxSynth support to the AviSynth input module.
Uses dlopen to load AvxSynth on Linux and OS X. Allows the use of --demuxer avs for AvxSynth, though the only source filter it can currently use is FFMS2. Add a local copy of avxsynth_c.h and its dependent headers in extras/ so that users don't need to actually have AvxSynth development headers installed to enable support for it (mirroring the AviSynth behavior). Based on a patch by 0x09 (tab@lavabit.com)
Commit 5ee1d03
quant_4x4x4: quant one 8x8 block at a time
This reduces overhead and lets us use less branchy code for zigzag, dequant, decimate, and so on. Reorganize and optimize a lot of macroblock_encode using this new function. ~1-2% faster overall. Includes NEON and x86 versions of the new function. Using larger merged functions like this will also make wider SIMD, like AVX2, more effective.
Fiona Glaser committed Feb 26, 2013
Commit 993c81e
CABAC/CAVLC: use the new bit-iterating macro here too
Fiona Glaser committed Feb 26, 2013
Commit 215f2be
ARM: update NEON mc_chroma to work with NV12 and re-enable it
Up to 10-15% faster overall.
Stefan Groenroos authored and Fiona Glaser committed Feb 26, 2013
Commit 3a8baa0
Commits on Mar 1, 2013
ARM: Fix bug in x264_quant_4x4x4_neon
Regression in r2273.
Stefan Groenroos authored and Fiona Glaser committed Mar 1, 2013
Commit cb4547a
Commits on Apr 13, 2013
Fix undefined behavior in x264_ratecontrol_mb
Fiona Glaser committed Apr 13, 2013
Commit 3703344
Commits on Apr 23, 2013
Fix array overreads that caused miscompilation in gcc 4.8
Fiona Glaser committed Apr 23, 2013
Commit 3cdaca1
x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args, and PERMUTE followed by SWAP with 2 named args, used to produce the wrong permutation.
Commit bed18d0
x86: correctly check stack alignment for Atom hadamard_ac
Regression in r2265 (only affected compilers with broken stack alignment, like ICL on win32).
Fiona Glaser committed Apr 23, 2013
Commit 42c500a
Commit aa73459
lavf input: don't use deprecated AVStream fields
Fixes building against newer libavcodecs from the Libav project.
Commit e74287e
Commit bf52bab
Disable mbtree asm with cpu-independent option
Results vary between versions because of differences in rounding.
Commit 8a3a41d
Works in conjunction with slice-max-mbs and/or slice-max-size to avoid overly small slices. Useful with certain decoders that barf on extremely small slices. If slice-min-mbs would be violated as a result of slice-max-size, x264 will exceed slice-max-size and print a warning.
Fiona Glaser committed Apr 23, 2013
Commit fdfffa3
The H.264 spec technically limits the number of slices per frame. x264 normally ignores this limit, since most use cases that require large numbers of slices prefer it that way. However, certain decoders may break with extremely large numbers of slices, as can occur with some slice-max-size/mbs settings. When set, x264 will refuse to create any slices beyond the maximum number, even if slice-max-size/mbs would require otherwise.
Fiona Glaser committed Apr 23, 2013
Commit 732e4f7
weightp: improve scale/offset search, chroma
Rescale the scale factor if the offset clips. This makes weightp more effective in fades to/from white (and any other situation that requires big offsets). Search more than one scale factor and more than one offset, depending on --subme. Try to find the optimal chroma denominator instead of hardcoding it. Overall improvement: a few percent in fade-heavy clips, such as a sample from Avatar: TLA.
Fiona Glaser committed Apr 23, 2013
Commit 2d0c47a
OpenCL support is compiled in by default, but must be enabled at runtime with the --opencl command line flag. Compiling OpenCL support requires perl; to avoid the perl requirement, use configure --disable-opencl.
When enabled, the lookahead thread is mostly offloaded to an OpenCL-capable GPU device. Lowres intra cost prediction, lowres motion search (including subpel) and bidir cost predictions are all done on the GPU. MB-tree and final slice decisions are still done by the CPU. Presets which do not use a threaded lookahead (superfast, ultrafast) will not use OpenCL at all.
Because of data dependencies, the GPU must use an iterative motion search which performs more total work than the CPU would, so this is neither work-efficient nor power-efficient. But if there are spare GPU cycles, it can often speed up the encode. Output quality with OpenCL lookahead enabled is often very slightly worse than with the CPU lookahead (because of the same data dependencies).
x264 must compile its OpenCL kernels for your device before running them, and to avoid doing this on every run it caches the compiled kernel binary in a file named x264_lookahead.clbin (--opencl-clbin FNAME to override). The cache file will be ignored if the device, driver, or OpenCL source is changed.
x264 will use the first GPU device which supports the cl_image features required by its kernels. Most modern discrete GPUs and all AMD integrated GPUs will work. Intel integrated GPUs (up to Ivy Bridge) do not support those necessary features. Use --opencl-device N to specify a number of capable GPUs to skip during device detection.
Switchable graphics environments (e.g. AMD Enduro) are currently not supported, as some have bugs in their OpenCL drivers that cause output to be silently incorrect.
Developed by MulticoreWare with support from AMD and Telestream.
Commit f49a1b2
x86-64: cabac_block_residual assembly
RDO: ~20% faster than C. Bitstream: ~50% faster than C. 1-2% faster overall, highest on presets superfast/fast/medium.
Fiona Glaser committed Apr 23, 2013
Commit a3f5c73
x86inc: fix AVX emulation of cmp(p|s)(s|d)
Fiona Glaser committed Apr 23, 2013
Commit 3a8dfb2
x86inc: create xm# and ym#, analogous to m#
For when we want to mix simd sizes within one function.
Commit 19e1a2b
x86: more AVX2 framework, AVX2 functions, plus some existing asm tweaks
AVX2 functions: mc_chroma, intra_sad_x3_16x16, last64, ads, hpel, dct4, idct4, sub16x16_dct8, quant_4x4x4, quant_4x4, quant_4x4_dc, quant_8x8, SAD_X3/X4, SATD, var, var2, SSD, zigzag interleave, weightp, weightb, intra_sad_8x8_x9, decimate, integral, hadamard_ac, sa8d_satd, sa8d, lowres_init, denoise
Fiona Glaser committed Apr 23, 2013
Commit 0ea5be8
Commit 184c505
Commit 51708c3
Commit 7908dc6
Commit fa40b44
x86: AVX high bit-depth predict_16x16_v
Also restructure some code to reduce code size of various functions, especially in high bit-depth.
Commit f3d521d
Also fix the AVX implementation to correctly use the SSSE3 inline asm instead of SSE2.
Commit 8ecdeb2
Commit 97ad171
Commit 0f776f6
Commit 547a657
Also rewrite the entire function to be faster and drop the AVX version which is no longer useful.
Commit e7a46b6
x86: AVX2 high_bit_depth pixel_avg2, get_ref, mc_copy_w16, mc_luma
Also reduce the number of xmm registers used by mc_copy_* to avoid saving and restoring xmm6 and xmm7 on 64-bit Windows.
Commit 295f83a
x86: AVX2 high bit-depth pixel_sad
Also use loops instead of duplicating code; reduces code size by ~10kB with negligible effect on performance.
Commit 9f885c1
Commit 0e69048
x86: AVX2 high bit-depth pixel_sad_x3/pixel_sad_x4
Also reduce the number of xmm registers used by sse2/ssse3 pixel_sad_x3.
Commit f49c2eb
Commit dc05aeb
Commit 03396f8
~55% faster ads in benchasm, ~15-30% in real encoding. ~4% faster "placebo" preset overall.
Fiona Glaser committed Apr 23, 2013
Commit 40316f8
x86-64: BMI2 cabac_residual functions
Fiona Glaser committed Apr 23, 2013
Commit c17d12f
x86: SSSE3 LUT-based faster coeff_level_run
~2x faster coeff_level_run. Faster CAVLC encoding: {1%,2%,7%} overall with {superfast,medium,slower}. Uses the same pshufb LUT abuse trick as in the previous ads_mvs patch.
Fiona Glaser committed Apr 23, 2013
Commit 67d6f60
Commits on Apr 29, 2013
Fix two bugs in slice-min-mbs and slices-max
Slices-max broke slice-max-size when slice-max wasn't used. Slice-min-mbs broke in rare cases near the end of a threadslice.
Fiona Glaser committed Apr 29, 2013
Commit 7f36065
Commits on May 15, 2013
Fix invalid memcpy in sliced-threads
Likely didn't actually break in practice, but memcpy with src==dst is incorrect.
Fiona Glaser committed May 15, 2013
Commit 3ba0fb8
Commits on May 17, 2013
Commit 0e000e7
checkasm: Use 64-bit cycle counters
Prevents overflows that can occur in some cases.
Commit 5444e95
x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old. a.out was superseded by ELF on sane systems a few decades ago.
Commit c1e3709
Fiona Glaser committed May 17, 2013
Commit 25e219a
Commit 16d0372
x86: Don't use explicitly aligned versions of SAD on AVX CPUs
On modern CPUs, movdqu isn't slower than movdqa when used on aligned data, and using the same code in both cases saves cache. This was already done for the high bit-depth AVX2 implementation, but the aligned version still exists as dead code, so remove it.
Commit 33c3526
Commits on May 20, 2013
x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing both the number of instructions and code size.
Commit 30c91f6
x86: 32-byte align the stack if possible
Avoids the need for manual 32-byte array alignment on compilers that support -mpreferred-stack-boundary.
Fiona Glaser committed May 20, 2013
Commit 7cbb27f
Commit 1f5a32c
~7% faster using the pmulhrsw trick from mc_chroma.
Fiona Glaser committed May 20, 2013
Commit a838417
x86: Faster high bit-depth intra_sad_x3_4x4
20->16 cycles on Ivy Bridge
Commit 594dd84
Commit 8e4f045
x86: AVX2 high bit-depth intra_sad_x3_8x8
43->24 cycles
Commit f114746
Commit af6647e
Commit 0c00c2c
Commit 02aa136
x86: AVX2 high bit-depth quant
quant_4x4: 13->6 cycles, quant_4x4_dc: 14->8 cycles, quant_8x8: 47->24 cycles, quant_4x4x4: 48->25 cycles
Commit 481e4cd
x86: AVX2 high bit-depth denoise_dct
28->15 cycles. Also reorder instructions to use fewer registers; 3 cycles faster on Ivy Bridge with 64-bit Windows.
Commit 89f067b
x86-64: 64-bit variant of AVX2 hpel_filter
~5% faster than 32-bit.
Fiona Glaser committed May 20, 2013
Commit bc88d1b
Commit edf31ed
Commit e7cb328
x86: shave a few instructions off AVX deblock
Fiona Glaser committed May 20, 2013
Commit 0b2c3d3
OpenCL support improvement/refactoring
Autoload the OpenCL library so that it's not required to run an OpenCL-enabled build of x264. Update X264_BUILD, which should have been changed with the first patch.
Commit 3aa9a67
Commits on May 22, 2013
Fix compilation with OpenCL on MacOS X
Also fix crash in the case of OpenCL error during encoding.
Commit 3b8e924
Commits on May 28, 2013
Fix building with compilers without inline asm support
Also fix crash in high bit depth builds compiled with unaligned stack.
Commit e32d9c2
Commits on Jul 3, 2013
Commit c41b629
Commit 25ef3f5
Fix possible crash when writing very large filler NALUs
Bitstream-reallocation function didn't handle the case of filler.
Commit ffc3ad4
Commit 83d35c7
Interface: if vbv-maxrate < bitrate, set bitrate = vbv-maxrate
This probably makes more sense to the user than setting vbv-maxrate = bitrate, as before.
Fiona Glaser committed Jul 3, 2013
Commit 9143d5a
Add "--stitchable" option for segmented encoding
Stops x264 from attempting to optimize global stream headers, ensuring that different segments of a video will have identical headers when used with identical encoding settings.
Fiona Glaser committed Jul 3, 2013
Commit fa215fc
Commit 397f60e
x86: faster AVX2 iDCT, AVX deblock_luma_h, deblock_luma_h_intra
Fiona Glaser committed Jul 3, 2013
Commit bfa2f0c
Tweak i16x16-delta-quant-avoidance code
Don't omit the delta quant if it'd raise the quantizer to do so; this fixes a rare flickering issue caused by deblocking.
Fiona Glaser committed Jul 3, 2013
Commit 01087fd
Commits on Jul 5, 2013
x86: Remove X264_CPU_SSE_MISALIGN functions
Prevents a crash if the misaligned exception mask bit is cleared for some reason. Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register, and by removing those functions we can get rid of that complexity altogether. VEX-encoded instructions also support unaligned memory operands. I tried adding AVX implementations of all removed functions, but there were no performance improvements on Ivy Bridge. pixel_sad_x3 and pixel_sad_x4 had significant code size reductions though, so I kept them and added some minor cosmetic fixes and tweaks.
Commit ff41804
Commits on Aug 23, 2013
Fix AVX2 detection bug with "limit CPUID" enabled in BIOS
Fiona Glaser committed Aug 23, 2013
Commit 2d66c7c
Commit a6c396f
Commit 1430b04
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.
Commit adc99d1
Diogo Franco authored and Fiona Glaser committed Aug 23, 2013
Commit 401edc3
Commit 4becc3e
Commit e33aac9
Combine frame and mb data mallocs into a single large malloc. Additionally, on Linux systems with hugepage support, ask for hugepages on large mallocs. This gives a small performance improvement (~0.2-0.9%) on systems without hugepage support, as well as a small memory footprint reduction. On recent Linux kernels with hugepage support enabled (set to madvise or always), it improves performance by up to 4% at the cost of about 7-12% more memory usage on typical settings. It may help even more on Haswell and other recent CPUs with improved 2MB page support in hardware.
Commit fa1e2b7
This format has been reverse engineered and x264's output has almost exactly the same bitstream as Panasonic cameras and encoders produce. It therefore does not comply with SMPTE RP2027 since Panasonic themselves do not comply with their own specification. It has been tested in Avid, Premiere, Edius and Quantel. Parts of this patch were written by Fiona Glaser and some reverse engineering was done by Joseph Artsimovich.
Kieran Kunhya authored and Fiona Glaser committed Aug 23, 2013
Commit 9b94896
Windows, unlike most other operating systems, uses UTF-16 for Unicode strings while x264 is designed for UTF-8. This patch does the following in order to handle things like Unicode filenames:
* Keep strings internally as UTF-8.
* Retrieve the CLI command line as UTF-16 and convert it to UTF-8.
* Always use Unicode versions of Windows API functions and convert strings to UTF-16 when calling them.
* Attempt to use legacy 8.3 short filenames for external libraries without Unicode support.
Commit fa3cac5
Commits on Aug 24, 2013
Commit 098b686
Commits on Aug 26, 2013
Fix masked access violation in KERNEL32
Caused crashes under gdb in Windows and might cause other unknown problems.
Commit 5bcff2a
Commits on Aug 27, 2013
Workaround for FFMS indexing bug
If FFMS_ReadIndex is used with an empty index file it gets stuck in an infinite loop instead of returning NULL like it's supposed to do on failure. Explicitly check if the file is empty before calling it as a workaround.
Commit 2fd2923
Commits on Sep 3, 2013
Commit 5b272b2
Commits on Oct 24, 2013
Commit 50a0c33
Commit 266fdfc
Commit 03450be
configure: include dependency libs in the Libs pkg-config
If only a static library is built, a user who links against it with the flags provided by pkg-config may not know that only a static library exists, and that they would have to pass --static to pkg-config to get the internal dependencies needed to link it. For a shared build, the internal dependencies are kept in Libs.private as before. This matches how libav's pkg-config files are generated.
Commit 12f9d49
Commit c3c73f1
Commits on Oct 25, 2013
Commit b7b6029
Commit 05f0438
Replace gf_malloc with regular malloc in mp4 muxer
It was used as a workaround for a bug that only existed in the GPAC repository for a few weeks back in 2010. There's no reason to keep it anymore.
Commit 8b58a4c
Commit b54422a
x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.
Commit 4b68633
CRF-max: don't warn if VBV underflow occurs
Only warn if underflow occurs for reasons other than CRF-max, as CRF-max implies that VBV underflow is desired by the user.
Fiona Glaser committed Oct 25, 2013
Commit 7634f8c
chroma-me: take shortcut in BI analysis
~100 cycles faster with subme>=9
Fiona Glaser committed Oct 25, 2013
Commit 77cc44f
Commits on Oct 30, 2013
Make x264_encoder_reconfig more threadsafe
Do the reconfig when the next frame's encode begins. Fixes some rare crashes with frame-threading and encoder_reconfig.
Commit 350b214
Allows generation of hard-CBR streams without using NAL HRD. Useful if you want to be able to reconfigure the bitrate (which you can't do with NAL HRD on).
Fiona Glaser committed Oct 30, 2013
Commit c084f6c
Add AVC-Intra 1080p50/60 Class 100 parameters
Also add some compatibility fixes.
Commit c9f2bce
Commit 09c7010
It probably wasn't used or maintained for the last few years.
Commit 95d196e
Commits on Jan 6, 2014
Caused if the timebase is not specified in the stats file. Found by Clang.
Commit a2f5d60
Commits on Jan 8, 2014
Fix ARM asm compilation with Apple assembler
Steve Clark authored and Fiona Glaser committed Jan 8, 2014
Commit 9148141
Commit 008c56e
CLI: Avoid redundant 16-bit upconversions in piped raw input
It's not possible to seek in pipes, so if we want to skip frames we have to read and discard unused ones. It's pointless to do bit-depth upconversions in those frames.
Commit 6bc6341
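The idea behind the piped-input change can be illustrated with a minimal sketch: on a non-seekable stream, skipped frames are read into a scratch buffer and thrown away as raw bytes, so no bit-depth conversion is spent on them. The helper `skip_raw_frames` and its fixed `frame_size` parameter are hypothetical and are not x264's actual input-module code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: skip `n_frames` frames of `frame_size` bytes each
 * from a non-seekable stream such as a pipe. The bytes are read into a
 * scratch buffer and discarded; no bit-depth upconversion is applied.
 * Returns 0 on success, -1 on a short read or allocation failure. */
static int skip_raw_frames(FILE *fh, size_t frame_size, int n_frames)
{
    unsigned char *scratch = malloc(frame_size);
    if (!scratch)
        return -1;
    for (int i = 0; i < n_frames; i++) {
        if (fread(scratch, 1, frame_size, fh) != frame_size) {
            free(scratch);
            return -1;  /* stream ended before the requested frame count */
        }
    }
    free(scratch);
    return 0;
}
```

Only frames that are actually encoded would then go through the conversion path.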
Commit 7664014
It's an old stand-alone application that isn't relevant to x264.
Commit 02697d5
Also update AUTHORS file and my e-mail address in the headers of various files.
Commit 807aeaa
Commit 8be6600
Commits on Jan 21, 2014
Fix quantization factor allocation
We don't need to wastefully allocate quant tables above QP_MAX_SPEC; they're never used.
Fiona Glaser committed Jan 21, 2014
Commit e2a9662
Assembly based on code by Henrik Gramner and Loren Merritt.
Commit 41227fa
Commit dd6a303
x86inc: speed up compilation with yasm
Work around yasm's inefficiency with handling large numbers of variables in the global scope.
Commit 42d2519
Commits on Feb 24, 2014
Android NDK does not expose sched_getaffinity.
Commit 0d668be
Really fix quantization factor allocation
Actually allocate less (instead of just initialize less) and fix comments.
Commit ee8d5e4
Commits on Mar 11, 2014
Commit 48dbfa2
Fix corruption with CAVLC overflow handling in MBAFF+main profile
Probably a regression in r2178.
Fiona Glaser committed Mar 11, 2014
Commit 19dddbc
Fix memory overwrite in x264_deblock_h_chroma_mbaff_sse2
Fixes possible corruption with MBAFF+sliced threads.
Commit 850c8c5
mbaff: fix mb_field_decoding_flag tracking and simplify allow skip check
Fixes an issue with too many forced non-skips in mbaff+cavlc, as well as non-deterministic output with mbaff+cavlc+sliced-threads.
Commit 8b821ec
Commits on Mar 12, 2014
Commit de01d88
The full details of the return values of encoder_encode and encoder_headers were mistakenly removed a while ago; re-add them.
Fiona Glaser committed Mar 12, 2014
Commit b7a50c1
Don't set chroma_loc_info_present_flag for non-4:2:0
The H.264 spec says it shouldn't be set in these cases.
Commit f35e3fc
Write 3D metadata when outputting Matroska
For when --frame-packing is set.
Steve Lhomme authored and Fiona Glaser committed Mar 12, 2014
Commit 0bb3b2e
x86: Pass -Worphan-labels to yasm
Makes it easier to detect typos.
Commit 8596dd3
Commit 974f2e7
x86inc: warn if XOP integer FMA instruction emulation is impossible
Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. ffmpeg has an x86util emulation for that case; I'll add it if x264's asm ever needs it. Also add pmacsdql emulation.
Commit 039fab9
x86inc: Support arbitrary stack alignments
If the stack is known to be at least 32-byte aligned we can safely store ymm registers on the stack without doing manual alignment. Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.
Commit 7c860f0
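The "align first, then allocate" ordering can be modeled with plain address arithmetic. x86inc itself is NASM/yasm macro code, so the C below is only an illustration: `align_down` and `alloc_stack` are hypothetical names, and the model assumes the required alignment is a power of two and that the stack pointer really does have the alignment claimed by `known_align`.

```c
#include <stddef.h>
#include <stdint.h>

/* Round `addr` down to a multiple of `align` (must be a power of two).
 * The stack grows toward lower addresses, so rounding the stack pointer
 * down never overlaps data already on the stack. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* Hypothetical model of the allocation order: align first (only when the
 * known incoming alignment is insufficient), then reserve `size` bytes,
 * rounded up to a multiple of `required_align` so the new stack pointer
 * stays aligned. Returns the new stack pointer value. */
static uintptr_t alloc_stack(uintptr_t sp, size_t known_align,
                             size_t required_align, size_t size)
{
    if (known_align < required_align)
        sp = align_down(sp, required_align);  /* manual alignment */
    size = (size + required_align - 1) & ~(required_align - 1);
    return sp - size;
}
```

When the stack is already known to be at least 32-byte aligned, the manual step is skipped entirely, which matches the case the commit describes for storing ymm registers.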
x86: Minor mbtree_propagate_cost improvements
Reduce the number of registers used from 7 to 6. Reduce the number of vector registers used by the AVX2 implementation from 8 to 7. Multiply fps_factor by 1/256 once per frame instead of once per macroblock row. Use mova instead of movu for dst since it's guaranteed to be aligned. Some cosmetics.
Commit f032147
x86: SSE2 and SSSE3 plane_copy_deinterleave_rgb
About 5.6x faster than C on Haswell.
Commit a90ea34
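For reference, the operation that the SSE2/SSSE3 code vectorizes is a simple split of packed pixels into three planes. The scalar sketch below is written for illustration; the name `deinterleave_rgb` and its parameter order are assumptions and may not match x264's C function. It assumes 8-bit samples, byte strides, and `pw` bytes per packed pixel (3 for RGB/BGR, 4 when a fourth byte is present and ignored).

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch: copy the first three bytes of each packed pixel into
 * three separate planes. `pw` is the packed pixel width in bytes. */
static void deinterleave_rgb(uint8_t *dst_a, ptrdiff_t stride_a,
                             uint8_t *dst_b, ptrdiff_t stride_b,
                             uint8_t *dst_c, ptrdiff_t stride_c,
                             const uint8_t *src, ptrdiff_t stride_src,
                             int pw, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            dst_a[x] = src[x * pw + 0];
            dst_b[x] = src[x * pw + 1];
            dst_c[x] = src[x * pw + 2];
        }
        dst_a += stride_a;
        dst_b += stride_b;
        dst_c += stride_c;
        src += stride_src;
    }
}
```

The SIMD versions gain their speed by replacing the per-byte gather in the inner loop with byte shuffles over whole registers.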
arm: implement x264_pixel_var_8x16_neon
checkasm --bench on a cortex-a9: var_8x16_c: 4306 var_8x16_neon: 791
Janne Grunau authored and Fiona Glaser committed Mar 12, 2014
Commit 6683612
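Block variance of this kind is usually computed from two running totals, the pixel sum and the sum of squared pixels, which map well onto NEON multiply-accumulate instructions. The scalar sketch below shows that formulation for an 8x16 block of 8-bit pixels. The function names are hypothetical, and x264's own var functions may return the intermediate sums instead of the final value.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate the sum and the sum of squares over an 8x16 block. */
static void block_sum_sqr_8x16(const uint8_t *pix, ptrdiff_t stride,
                               uint32_t *sum, uint32_t *sqr)
{
    uint32_t s = 0, ss = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 8; x++) {
            s += pix[x];
            ss += (uint32_t)pix[x] * pix[x];
        }
        pix += stride;
    }
    *sum = s;
    *sqr = ss;
}

/* Variance scaled by N = 128 pixels: sum(p^2) - sum(p)^2 / 128.
 * The division is a right shift by 7 because 128 = 2^7. */
static uint32_t block_var_8x16(const uint8_t *pix, ptrdiff_t stride)
{
    uint32_t sum, sqr;
    block_sum_sqr_8x16(pix, stride, &sum, &sqr);
    return sqr - (uint32_t)(((uint64_t)sum * sum) >> 7);
}
```

A flat block yields 0, and a block with half its pixels at 0 and half at 2 (per-pixel variance 1) yields 128.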
arm: implement x264_pixel_var2_8x16_neon
checkasm --bench on a cortex-a9: var2_8x16_c: 5677 var2_8x16_neon: 1421
Janne Grunau authored and Fiona Glaser committed Mar 12, 2014
Commit ac8f2e8
Commits on Mar 13, 2014
arm: use available neon functions for intra_sa8d/sad/satd_x3
4% faster on main/medium, 15% faster on baseline/superfast on a cortex-a9.
Janne Grunau authored and Fiona Glaser committed Mar 13, 2014
Commit 00a00cc
Macroblock tree overhaul/optimization
Move the second core part of macroblock tree into an assembly function; SIMD-optimize roughly half of it (for x86). Roughly 25-65% faster mbtree, depending on content. Slightly change how mbtree handles the tradeoff between range and precision for propagation. Overall, a slight (but mostly negligible) effect on SSIM, and ~2% faster.
Fiona Glaser committed Mar 13, 2014
Commit b3fb718
Commits on Mar 6, 2017
Commit 01f973d