public inbox for git-commits@fedoraproject.org
help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@fedoraproject.org>
To: git-commits@fedoraproject.org
Subject: [rpms/gcc] rhel-f41-base: 4.1.1-56
Date: Mon, 29 Jun 2026 12:23:07 GMT [thread overview]
Message-ID: <178273578709.1.7201169702592553932.rpms-gcc-15f5edb33ed3@fedoraproject.org> (raw)
A new commit has been pushed.
Repo : rpms/gcc
Branch : rhel-f41-base
Commit : 15f5edb33ed3dbe7acfa11a774946f641766f7f8
Author : Jakub Jelinek <jakub@fedoraproject.org>
Date : 2007-02-10T19:18:26+00:00
Stats : +3632/-9 in 4 file(s)
URL : https://src.fedoraproject.org/rpms/gcc/c/15f5edb33ed3dbe7acfa11a774946f641766f7f8?branch=rhel-f41-base
Log:
4.1.1-56
---
diff --git a/.cvsignore b/.cvsignore
index 612cc08..f09d384 100644
--- a/.cvsignore
+++ b/.cvsignore
@@ -1 +1 @@
-gcc-4.1.1-20070202.tar.bz2
+gcc-4.1.1-20070209.tar.bz2
diff --git a/gcc41-amdfam10.patch b/gcc41-amdfam10.patch
new file mode 100644
index 0000000..4a7938b
--- /dev/null
+++ b/gcc41-amdfam10.patch
@@ -0,0 +1,3614 @@
+2007-02-08 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/xmmintrin.h: Make inclusion of emmintrin.h
+ conditional to __SSE2__.
+ * config/i386/emmintrin.h: Generate #error if __SSE2__ is not
+ defined.
+ * config/i386/pmmintrin.h: Generate #error if __SSE3__ is not
+ defined.
+ * config/i386/tmmintrin.h: Generate #error if __SSSE3__ is not
+ defined.
+
+2007-02-05 Harsha Jagasia <harsha.jagasia@amd.com>
+
+ * config/i386/athlon.md (athlon_fldxf_k8, athlon_fld_k8,
+ athlon_fstxf_k8, athlon_fst_k8, athlon_fist, athlon_fmov,
+ athlon_fadd_load, athlon_fadd_load_k8, athlon_fadd, athlon_fmul,
+ athlon_fmul_load, athlon_fmul_load_k8, athlon_fsgn,
+ athlon_fdiv_load, athlon_fdiv_load_k8, athlon_fdiv_k8,
+ athlon_fpspc_load, athlon_fpspc, athlon_fcmov_load,
+ athlon_fcmov_load_k8, athlon_fcmov_k8, athlon_fcomi_load_k8,
+ athlon_fcomi, athlon_fcom_load_k8, athlon_fcom): Added amdfam10.
+
+ * config/i386/i386.md (x86_sahf_1, cmpfp_i_mixed, cmpfp_i_sse,
+ cmpfp_i_i387, cmpfp_iu_mixed, cmpfp_iu_sse, cmpfp_iu_387,
+ swapsi, swaphi_1, swapqi_1, swapdi_rex64, fix_truncsfdi_sse,
+ fix_truncdfdi_sse, fix_truncsfsi_sse, fix_truncdfsi_sse,
+ x86_fldcw_1, floatsisf2_mixed, floatsisf2_sse, floatdisf2_mixed,
+ floatdisf2_sse, floatsidf2_mixed, floatsidf2_sse,
+ floatdidf2_mixed, floatdidf2_sse, muldi3_1_rex64, mulsi3_1,
+ mulsi3_1_zext, mulhi3_1, mulqi3_1, umulqihi3_1, mulqihi3_insn,
+ umulditi3_insn, umulsidi3_insn, mulditi3_insn, mulsidi3_insn,
+ umuldi3_highpart_rex64, umulsi3_highpart_insn,
+ umulsi3_highpart_zext, smuldi3_highpart_rex64,
+ smulsi3_highpart_insn, smulsi3_highpart_zext, x86_64_shld,
+ x86_shld_1, x86_64_shrd, sqrtsf2_mixed, sqrtsf2_sse,
+ sqrtsf2_i387, sqrtdf2_mixed, sqrtdf2_sse, sqrtdf2_i387,
+ sqrtextendsfdf2_i387, sqrtxf2, sqrtextendsfxf2_i387,
+ sqrtextenddfxf2_i387): Added amdfam10_decode.
+
+ * config/i386/athlon.md (athlon_idirect_amdfam10,
+ athlon_ivector_amdfam10, athlon_idirect_load_amdfam10,
+ athlon_ivector_load_amdfam10, athlon_idirect_both_amdfam10,
+ athlon_ivector_both_amdfam10, athlon_idirect_store_amdfam10,
+ athlon_ivector_store_amdfam10): New define_insn_reservation.
+ (athlon_idirect_loadmov, athlon_idirect_movstore): Added
+ amdfam10.
+
+ * config/i386/athlon.md (athlon_call_amdfam10,
+ athlon_pop_amdfam10, athlon_lea_amdfam10): New
+ define_insn_reservation.
+ (athlon_branch, athlon_push, athlon_leave_k8, athlon_imul_k8,
+ athlon_imul_k8_DI, athlon_imul_mem_k8, athlon_imul_mem_k8_DI,
+ athlon_idiv, athlon_idiv_mem, athlon_str): Added amdfam10.
+
+ * config/i386/athlon.md (athlon_sseld_amdfam10,
+ athlon_mmxld_amdfam10, athlon_ssest_amdfam10,
+ athlon_mmxssest_short_amdfam10): New define_insn_reservation.
+
+ * config/i386/athlon.md (athlon_sseins_amdfam10): New
+ define_insn_reservation.
+ * config/i386/i386.md (sseins): Added sseins to define_attr type
+ and define_attr unit.
+ * config/i386/sse.md: Set type attribute to sseins for insertq
+ and insertqi.
+
+ * config/i386/athlon.md (sselog_load_amdfam10, sselog_amdfam10,
+ ssecmpvector_load_amdfam10, ssecmpvector_amdfam10,
+ ssecomi_load_amdfam10, ssecomi_amdfam10,
+ sseaddvector_load_amdfam10, sseaddvector_amdfam10): New
+ define_insn_reservation.
+ (ssecmp_load_k8, ssecmp, sseadd_load_k8, seadd): Added amdfam10.
+
+ * config/i386/athlon.md (cvtss2sd_load_amdfam10,
+ cvtss2sd_amdfam10, cvtps2pd_load_amdfam10, cvtps2pd_amdfam10,
+ cvtsi2sd_load_amdfam10, cvtsi2ss_load_amdfam10,
+ cvtsi2sd_amdfam10, cvtsi2ss_amdfam10, cvtsd2ss_load_amdfam10,
+ cvtsd2ss_amdfam10, cvtpd2ps_load_amdfam10, cvtpd2ps_amdfam10,
+ cvtsX2si_load_amdfam10, cvtsX2si_amdfam10): New
+ define_insn_reservation.
+
+ * config/i386/sse.md (cvtsi2ss, cvtsi2ssq, cvtss2si,
+ cvtss2siq, cvttss2si, cvttss2siq, cvtsi2sd, cvtsi2sdq,
+ cvtsd2si, cvtsd2siq, cvttsd2si, cvttsd2siq,
+ cvtpd2dq, cvttpd2dq, cvtsd2ss, cvtss2sd,
+ cvtpd2ps, cvtps2pd): Added amdfam10_decode attribute.
+
+ * config/i386/athlon.md (athlon_ssedivvector_amdfam10,
+ athlon_ssedivvector_load_amdfam10, athlon_ssemulvector_amdfam10,
+ athlon_ssemulvector_load_amdfam10): New define_insn_reservation.
+ (athlon_ssediv, athlon_ssediv_load_k8, athlon_ssemul,
+ athlon_ssemul_load_k8): Added amdfam10.
+
+ * config/i386/i386.h (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL): New macro.
+ (x86_sse_unaligned_move_optimal): New variable.
+
+ * config/i386/i386.c (x86_sse_unaligned_move_optimal): Enable for
+ m_AMDFAM10.
+ (ix86_expand_vector_move_misalign): Add code to generate movupd/movups
+ for unaligned vector SSE double/single precision loads for AMDFAM10.
+
+ * config/i386/i386.h (TARGET_AMDFAM10): New macro.
+ (TARGET_CPU_CPP_BUILTINS): Add code for amdfam10.
+ Define TARGET_CPU_DEFAULT_amdfam10.
+ (TARGET_CPU_DEFAULT_NAMES): Add amdfam10.
+ (processor_type): Add PROCESSOR_AMDFAM10.
+
+ * config/i386/i386.md: Add amdfam10 as a new cpu attribute to match
+ processor_type in config/i386/i386.h.
+ Enable imul peepholes for TARGET_AMDFAM10.
+
+ * config.gcc: Add support for --with-cpu option for amdfam10.
+
+ * config/i386/i386.c (amdfam10_cost): New variable.
+ (m_AMDFAM10): New macro.
+ (m_ATHLON_K8_AMDFAM10): New macro.
+ (x86_use_leave, x86_push_memory, x86_movx, x86_unroll_strlen,
+ x86_cmove, x86_3dnow_a, x86_deep_branch, x86_use_simode_fiop,
+ x86_promote_QImode, x86_integer_DFmode_moves,
+ x86_partial_reg_dependency, x86_memory_mismatch_stall,
+ x86_accumulate_outgoing_args, x86_arch_always_fancy_math_387,
+ x86_sse_partial_reg_dependency, x86_sse_typeless_stores,
+ x86_use_ffreep, x86_use_incdec, x86_four_jump_limit,
+ x86_schedule, x86_use_bt, x86_cmpxchg16b, x86_pad_returns):
+ Enable/disable for amdfam10.
+ (override_options): Add amdfam10_cost to processor_target_table.
+ Set up PROCESSOR_AMDFAM10 for amdfam10 entry in
+ processor_alias_table.
+ (ix86_issue_rate): Add PROCESSOR_AMDFAM10.
+ (ix86_adjust_cost): Add code for amdfam10.
+
+ * config/i386/i386.opt: Add new Advanced Bit Manipulation (-mabm)
+ instruction set feature flag. Add new (-mpopcnt) flag for popcnt
+ instruction. Add new SSE4A (-msse4a) instruction set feature flag.
+ * config/i386/i386.h: Add builtin definition for SSE4A.
+ * config/i386/i386.md: Add support for ABM instructions
+ (popcnt and lzcnt).
+ * config/i386/sse.md: Add support for SSE4A instructions
+ (movntss, movntsd, extrq, insertq).
+ * config/i386/i386.c: Add support for ABM and SSE4A builtins.
+ Add -march=amdfam10 flag.
+ * config/i386/ammintrin.h: Add support for SSE4A intrinsics.
+ * doc/invoke.texi: Add documentation on flags for sse4a, abm, popcnt
+ and amdfam10.
+ * doc/extend.texi: Add documentation for SSE4A builtins.
+
+2007-02-05 Dwarakanath Rajagopal <dwarak.rajagopal@amd.com>
+
+ * gcc.dg/i386-cpuid.h: Test whether SSE4A is supported
+ for running tests.
+ * gcc.target/i386/sse4a-extract.c: New test.
+ * gcc.target/i386/sse4a-insert.c: New test.
+ * gcc.target/i386/sse4a-montsd.c: New test.
+ * gcc.target/i386/sse4a-montss.c: New test.
+
+2006-12-15 H.J. Lu <hongjiu.lu@intel.com>
+
+ * gcc.dg/i386-cpuid.h (bit_SSSE3): New.
+
+2006-11-30 H.J. Lu <hongjiu.lu@intel.com>
+
+ * gcc.dg/i386-cpuid.h (bit_SSE3): New.
+ (i386_get_cpuid): New function.
+ (i386_cpuid_ecx): Likewise.
+ (i386_cpuid_edx): Likewise.
+ (i386_cpuid): Updated to call i386_cpuid_edx.
+
+--- gcc/doc/extend.texi.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/doc/extend.texi 2007-02-09 21:26:06.000000000 +0100
+@@ -6931,6 +6931,23 @@ v4si __builtin_ia32_pabsd128 (v4si)
+ v8hi __builtin_ia32_pabsw128 (v8hi)
+ @end smallexample
+
++The following built-in functions are available when @option{-msse4a} is used.
++
++@smallexample
++void _mm_stream_sd (double*,__m128d);
++Generates the @code{movntsd} machine instruction.
++void _mm_stream_ss (float*,__m128);
++Generates the @code{movntss} machine instruction.
++__m128i _mm_extract_si64 (__m128i, __m128i);
++Generates the @code{extrq} machine instruction with only SSE register operands.
++__m128i _mm_extracti_si64 (__m128i, int, int);
++Generates the @code{extrq} machine instruction with SSE register and immediate operands.
++__m128i _mm_insert_si64 (__m128i, __m128i);
++Generates the @code{insertq} machine instruction with only SSE register operands.
++__m128i _mm_inserti_si64 (__m128i, __m128i, int, int);
++Generates the @code{insertq} machine instruction with SSE register and immediate operands.
++@end smallexample
++
+ The following built-in functions are available when @option{-m3dnow} is used.
+ All of them generate the machine instruction that is part of the name.
+
+--- gcc/doc/invoke.texi.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/doc/invoke.texi 2007-02-09 21:56:44.000000000 +0100
+@@ -522,7 +522,7 @@ Objective-C and Objective-C++ Dialects}.
+ -mno-fp-ret-in-387 -msoft-float -msvr3-shlib @gol
+ -mno-wide-multiply -mrtd -malign-double @gol
+ -mpreferred-stack-boundary=@var{num} @gol
+--mmmx -msse -msse2 -msse3 -mssse3 -m3dnow @gol
++-mmmx -msse -msse2 -msse3 -mssse3 -msse4a -m3dnow -mpopcnt -mabm @gol
+ -mthreads -mno-align-stringops -minline-all-stringops @gol
+ -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
+ -m96bit-long-double -mregparm=@var{num} -msseregparm @gol
+@@ -9062,6 +9062,10 @@ instruction set support.
+ @item k8, opteron, athlon64, athlon-fx
+ AMD K8 core based CPUs with x86-64 instruction set support. (This supersets
+ MMX, SSE, SSE2, 3dNOW!, enhanced 3dNOW! and 64-bit instruction set extensions.)
++@item amdfam10
++AMD Family 10 core based CPUs with x86-64 instruction set support. (This
++supersets MMX, SSE, SSE2, SSE3, SSE4A, 3dNOW!, enhanced 3dNOW!, ABM and 64-bit
++instruction set extensions.)
+ @item winchip-c6
+ IDT Winchip C6 CPU, dealt in same way as i486 with additional MMX instruction
+ set support.
+@@ -9339,8 +9343,14 @@ preferred alignment to @option{-mpreferr
+ @itemx -mno-sse3
+ @item -mssse3
+ @itemx -mno-ssse3
++@item -msse4a
++@item -mno-sse4a
+ @item -m3dnow
+ @itemx -mno-3dnow
++@item -mpopcnt
++@itemx -mno-popcnt
++@item -mabm
++@itemx -mno-abm
+ @opindex mmmx
+ @opindex mno-mmx
+ @opindex msse
+--- gcc/testsuite/gcc.target/i386/sse4a-insert.c.jj 2007-02-09 21:26:06.000000000 +0100
++++ gcc/testsuite/gcc.target/i386/sse4a-insert.c 2007-02-09 21:26:06.000000000 +0100
+@@ -0,0 +1,110 @@
++/* { dg-do run { target i?86-*-* x86_64-*-* } } */
++/* { dg-options "-O2 -msse4a" } */
++#include <ammintrin.h>
++#include <stdlib.h>
++#include "../../gcc.dg/i386-cpuid.h"
++
++static void sse4a_test (void);
++
++typedef union
++{
++ long long i[2];
++ __m128i vec;
++} LI;
++
++int
++main ()
++{
++ unsigned long cpu_facilities;
++
++ cpu_facilities = i386_extended_cpuid_ecx ();
++
++ /* Run SSE4a test only if host has SSE4a support. */
++ if ((cpu_facilities & bit_SSE4a))
++ sse4a_test ();
++
++ exit (0);
++}
++
++static long long
++sse4a_test_insert (long long in1, long long in2)
++{
++ __m128i v1,v2;
++ long long index_length, pad;
++ LI v_out;
++ index_length = 0x0000000000000810;
++ pad = 0x0;
++ v1 = _mm_set_epi64x (pad, in1);
++ v2 = _mm_set_epi64x (index_length, in2);
++ v_out.vec = _mm_insert_si64 (v1, v2);
++ return (v_out.i[0]);
++}
++
++static long long
++sse4a_test_inserti (long long in1, long long in2)
++{
++ __m128i v1,v2;
++ long long pad = 0x0;
++ LI v_out;
++ v1 = _mm_set_epi64x (pad, in1);
++ v2 = _mm_set_epi64x (pad, in2);
++ v_out.vec = _mm_inserti_si64 (v1, v2, (unsigned int) 0x10, (unsigned int) 0x08);
++ return (v_out.i[0]);
++}
++
++static chk (long long i1, long long i2)
++{
++ int n_fails =0;
++ if (i1 != i2)
++ n_fails +=1;
++ return n_fails;
++}
++
++long long vals_in1[5] =
++ {
++ 0x1234567887654321,
++ 0x1456782093002490,
++ 0x2340909123990390,
++ 0x9595959599595999,
++ 0x9099038798000029
++ };
++
++long long vals_in2[5] =
++ {
++ 0x9ABCDEF00FEDCBA9,
++ 0x234567097289672A,
++ 0x45476453097BD342,
++ 0x23569012AE586FF0,
++ 0x432567ABCDEF765D
++ };
++
++long long vals_out[5] =
++ {
++ 0x1234567887CBA921,
++ 0x1456782093672A90,
++ 0x2340909123D34290,
++ 0x95959595996FF099,
++ 0x9099038798765D29
++ };
++
++static void
++sse4a_test (void)
++{
++ int i;
++ int fail = 0;
++ long long out;
++
++ for (i = 0; i < 5; i += 1)
++ {
++ out = sse4a_test_insert (vals_in1[i], vals_in2[i]);
++ fail += chk(out, vals_out[i]);
++
++ out = sse4a_test_inserti (vals_in1[i], vals_in2[i]);
++ fail += chk(out, vals_out[i]);
++ }
++
++ if (fail != 0)
++ abort ();
++
++ exit (0);
++}
+--- gcc/testsuite/gcc.target/i386/sse4a-extract.c.jj 2007-02-09 21:26:06.000000000 +0100
++++ gcc/testsuite/gcc.target/i386/sse4a-extract.c 2007-02-09 21:26:06.000000000 +0100
+@@ -0,0 +1,100 @@
++/* { dg-do run { target i?86-*-* x86_64-*-* } } */
++/* { dg-options "-O2 -msse4a" } */
++#include <ammintrin.h>
++#include <stdlib.h>
++#include "../../gcc.dg/i386-cpuid.h"
++
++static void sse4a_test (void);
++
++typedef union
++{
++ long long i[2];
++ __m128i vec;
++} LI;
++
++int
++main ()
++{
++ unsigned long cpu_facilities;
++
++ cpu_facilities = i386_extended_cpuid_ecx ();
++
++ /* Run SSE4a test only if host has SSE4a support. */
++ if ((cpu_facilities & bit_SSE4a))
++ sse4a_test ();
++
++ exit (0);
++}
++
++static long long
++sse4a_test_extrq (long long in)
++{
++ __m128i v1, v2;
++ long long index_length, pad;
++ LI v_out;
++ index_length = 0x0000000000000810;
++ pad = 0x0;
++ v1 = _mm_set_epi64x (pad, in);
++ v2 = _mm_set_epi64x (pad, index_length);
++ v_out.vec = _mm_extract_si64 (v1, v2);
++ return (v_out.i[0]);
++}
++
++static long long
++sse4a_test_extrqi (long long in)
++{
++ __m128i v1;
++ long long pad =0x0;
++ LI v_out;
++ v1 = _mm_set_epi64x (pad, in);
++ v_out.vec = _mm_extracti_si64 (v1, (unsigned int) 0x10,(unsigned int) 0x08);
++ return (v_out.i[0]);
++}
++
++static chk (long long i1, long long i2)
++{
++ int n_fails =0;
++ if (i1 != i2)
++ n_fails +=1;
++ return n_fails;
++}
++
++long long vals_in[5] =
++ {
++ 0x1234567887654321,
++ 0x1456782093002490,
++ 0x2340909123990390,
++ 0x9595959599595999,
++ 0x9099038798000029
++ };
++
++long long vals_out[5] =
++ {
++ 0x0000000000006543,
++ 0x0000000000000024,
++ 0x0000000000009903,
++ 0x0000000000005959,
++ 0x0000000000000000
++ };
++
++static void
++sse4a_test (void)
++{
++ int i;
++ int fail = 0;
++ long long out;
++
++ for (i = 0; i < 5; i += 1)
++ {
++ out = sse4a_test_extrq (vals_in[i]);
++ fail += chk(out, vals_out[i]);
++
++ out = sse4a_test_extrqi (vals_in[i]);
++ fail += chk(out, vals_out[i]);
++ }
++
++ if (fail != 0)
++ abort ();
++
++ exit (0);
++}
+--- gcc/testsuite/gcc.target/i386/sse4a-montss.c.jj 2007-02-09 21:26:06.000000000 +0100
++++ gcc/testsuite/gcc.target/i386/sse4a-montss.c 2007-02-09 21:26:06.000000000 +0100
+@@ -0,0 +1,64 @@
++/* { dg-do run { target i?86-*-* x86_64-*-* } } */
++/* { dg-options "-O2 -msse4a" } */
++#include <ammintrin.h>
++#include <stdlib.h>
++#include "../../gcc.dg/i386-cpuid.h"
++
++static void sse4a_test (void);
++
++int
++main ()
++{
++ unsigned long cpu_facilities;
++
++ cpu_facilities = i386_extended_cpuid_ecx ();
++
++ /* Run SSE4a test only if host has SSE4a support. */
++ if ((cpu_facilities & bit_SSE4a))
++ sse4a_test ();
++
++ exit (0);
++}
++
++static void
++sse4a_test_movntss (float *out, float *in)
++{
++ __m128 in_v4sf = _mm_load_ss (in);
++ _mm_stream_ss (out, in_v4sf);
++}
++
++static int
++chk_ss (float *v1, float *v2)
++{
++ int n_fails = 0;
++ if (v1[0] != v2[0])
++ n_fails += 1;
++ return n_fails;
++}
++
++float vals[10] =
++ {
++ 100.0, 200.0, 300.0, 400.0, 5.0,
++ -1.0, .345, -21.5, 9.32, 8.41
++ };
++
++static void
++sse4a_test (void)
++{
++ int i;
++ int fail = 0;
++ float *out;
++
++ out = (float *) malloc (sizeof (float));
++ for (i = 0; i < 10; i += 1)
++ {
++ sse4a_test_movntss (out, &vals[i]);
++
++ fail += chk_ss (out, &vals[i]);
++ }
++
++ if (fail != 0)
++ abort ();
++
++ exit (0);
++}
+--- gcc/testsuite/gcc.target/i386/sse4a-montsd.c.jj 2007-02-09 21:26:06.000000000 +0100
++++ gcc/testsuite/gcc.target/i386/sse4a-montsd.c 2007-02-09 21:26:06.000000000 +0100
+@@ -0,0 +1,64 @@
++/* { dg-do run { target i?86-*-* x86_64-*-* } } */
++/* { dg-options "-O2 -msse4a" } */
++#include <ammintrin.h>
++#include <stdlib.h>
++#include "../../gcc.dg/i386-cpuid.h"
++
++static void sse4a_test (void);
++
++int
++main ()
++{
++ unsigned long cpu_facilities;
++
++ cpu_facilities = i386_extended_cpuid_ecx ();
++
++ /* Run SSE4a test only if host has SSE4a support. */
++ if ((cpu_facilities & bit_SSE4a))
++ sse4a_test ();
++
++ exit (0);
++}
++
++static void
++sse4a_test_movntsd (double *out, double *in)
++{
++ __m128d in_v2df = _mm_load_sd (in);
++ _mm_stream_sd (out, in_v2df);
++}
++
++static int
++chk_sd (double *v1, double *v2)
++{
++ int n_fails = 0;
++ if (v1[0] != v2[0])
++ n_fails += 1;
++ return n_fails;
++}
++
++double vals[10] =
++ {
++ 100.0, 200.0, 300.0, 400.0, 5.0,
++ -1.0, .345, -21.5, 9.32, 8.41
++ };
++
++static void
++sse4a_test (void)
++{
++ int i;
++ int fail = 0;
++ double *out;
++
++ out = (double *) malloc (sizeof (double));
++ for (i = 0; i < 10; i += 1)
++ {
++ sse4a_test_movntsd (out, &vals[i]);
++
++ fail += chk_sd (out, &vals[i]);
++ }
++
++ if (fail != 0)
++ abort ();
++
++ exit (0);
++}
+--- gcc/testsuite/gcc.dg/i386-cpuid.h.jj 2006-10-05 00:26:53.000000000 +0200
++++ gcc/testsuite/gcc.dg/i386-cpuid.h 2007-02-07 13:07:08.000000000 +0100
+@@ -2,23 +2,32 @@
+ Used by 20020523-2.c and i386-sse-6.c, and possibly others. */
+ /* Plagarized from 20020523-2.c. */
+
++/* %ecx */
++#define bit_SSE3 (1 << 0)
++#define bit_SSSE3 (1 << 9)
++
++/* %edx */
+ #define bit_CMOV (1 << 15)
+ #define bit_MMX (1 << 23)
+ #define bit_SSE (1 << 25)
+ #define bit_SSE2 (1 << 26)
+
++/* Extended Features */
++/* %ecx */
++#define bit_SSE4a (1 << 6)
++
+ #ifndef NOINLINE
+ #define NOINLINE __attribute__ ((noinline))
+ #endif
+
+-unsigned int i386_cpuid (void) NOINLINE;
+-
+-unsigned int NOINLINE
+-i386_cpuid (void)
++static inline unsigned int
++i386_get_cpuid (unsigned int *ecx, unsigned int *edx)
+ {
+- int fl1, fl2;
++ int fl1;
+
+ #ifndef __x86_64__
++ int fl2;
++
+ /* See if we can use cpuid. On AMD64 we always can. */
+ __asm__ ("pushfl; pushfl; popl %0; movl %0,%1; xorl %2,%0;"
+ "pushl %0; popfl; pushfl; popl %0; popfl"
+@@ -42,15 +51,99 @@ i386_cpuid (void)
+ if (fl1 == 0)
+ return (0);
+
+- /* Invoke CPUID(1), return %edx; caller can examine bits to
++ /* Invoke CPUID(1), return %ecx and %edx; caller can examine bits to
+ determine what's supported. */
+ #ifdef __x86_64__
+- __asm__ ("pushq %%rcx; pushq %%rbx; cpuid; popq %%rbx; popq %%rcx"
+- : "=d" (fl2), "=a" (fl1) : "1" (1) : "cc");
++ __asm__ ("pushq %%rbx; cpuid; popq %%rbx"
++ : "=c" (*ecx), "=d" (*edx), "=a" (fl1) : "2" (1) : "cc");
+ #else
+- __asm__ ("pushl %%ecx; pushl %%ebx; cpuid; popl %%ebx; popl %%ecx"
+- : "=d" (fl2), "=a" (fl1) : "1" (1) : "cc");
++ __asm__ ("pushl %%ebx; cpuid; popl %%ebx"
++ : "=c" (*ecx), "=d" (*edx), "=a" (fl1) : "2" (1) : "cc");
++#endif
++
++ return 1;
++}
++
++static inline unsigned int
++i386_get_extended_cpuid (unsigned int *ecx, unsigned int *edx)
++{
++ int fl1;
++ if (!(i386_get_cpuid (ecx, edx)))
++ return 0;
++
++ /* Invoke CPUID(0x80000000) to get the highest supported extended function
++ number */
++#ifdef __x86_64__
++ __asm__ ("cpuid"
++ : "=a" (fl1) : "0" (0x80000000) : "edx", "ecx", "ebx");
++#else
++ __asm__ ("pushl %%ebx; cpuid; popl %%ebx"
++ : "=a" (fl1) : "0" (0x80000000) : "edx", "ecx");
++#endif
++ /* Check if highest supported extended function used below are supported */
++ if (fl1 < 0x80000001)
++ return 0;
++
++ /* Invoke CPUID(0x80000001), return %ecx and %edx; caller can examine bits to
++ determine what's supported. */
++#ifdef __x86_64__
++ __asm__ ("cpuid"
++ : "=c" (*ecx), "=d" (*edx), "=a" (fl1) : "2" (0x80000001) : "ebx");
++#else
++ __asm__ ("pushl %%ebx; cpuid; popl %%ebx"
++ : "=c" (*ecx), "=d" (*edx), "=a" (fl1) : "2" (0x80000001));
+ #endif
++ return 1;
++}
++
++
++unsigned int i386_cpuid_ecx (void) NOINLINE;
++unsigned int i386_cpuid_edx (void) NOINLINE;
++unsigned int i386_extended_cpuid_ecx (void) NOINLINE;
++unsigned int i386_extended_cpuid_edx (void) NOINLINE;
++
++unsigned int NOINLINE
++i386_cpuid_ecx (void)
++{
++ unsigned int ecx, edx;
++ if (i386_get_cpuid (&ecx, &edx))
++ return ecx;
++ else
++ return 0;
++}
++
++unsigned int NOINLINE
++i386_cpuid_edx (void)
++{
++ unsigned int ecx, edx;
++ if (i386_get_cpuid (&ecx, &edx))
++ return edx;
++ else
++ return 0;
++}
+
+- return fl2;
++unsigned int NOINLINE
++i386_extended_cpuid_ecx (void)
++{
++ unsigned int ecx, edx;
++ if (i386_get_extended_cpuid (&ecx, &edx))
++ return ecx;
++ else
++ return 0;
++}
++
++unsigned int NOINLINE
++i386_extended_cpuid_edx (void)
++{
++ unsigned int ecx, edx;
++ if (i386_get_extended_cpuid (&ecx, &edx))
++ return edx;
++ else
++ return 0;
++}
++
++static inline unsigned int
++i386_cpuid (void)
++{
++ return i386_cpuid_edx ();
+ }
+--- gcc/config.gcc.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config.gcc 2007-02-09 21:26:06.000000000 +0100
+@@ -264,12 +264,12 @@ xscale-*-*)
+ i[34567]86-*-*)
+ cpu_type=i386
+ extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
+- pmmintrin.h tmmintrin.h"
++ pmmintrin.h tmmintrin.h ammintrin.h"
+ ;;
+ x86_64-*-*)
+ cpu_type=i386
+ extra_headers="mmintrin.h mm3dnow.h xmmintrin.h emmintrin.h
+- pmmintrin.h tmmintrin.h"
++ pmmintrin.h tmmintrin.h ammintrin.h"
+ need_64bit_hwint=yes
+ ;;
+ ia64-*-*)
+@@ -2396,6 +2396,9 @@ if test x$with_cpu = x ; then
+ ;;
+ i686-*-* | i786-*-*)
+ case ${target_noncanonical} in
++ amdfam10-*)
++ with_cpu=amdfam10
++ ;;
+ k8-*|opteron-*|athlon_64-*)
+ with_cpu=k8
+ ;;
+@@ -2436,6 +2439,9 @@ if test x$with_cpu = x ; then
+ ;;
+ x86_64-*-*)
+ case ${target_noncanonical} in
++ amdfam10-*)
++ with_cpu=amdfam10
++ ;;
+ k8-*|opteron-*|athlon_64-*)
+ with_cpu=k8
+ ;;
+@@ -2668,7 +2674,7 @@ case "${target}" in
+ esac
+ # OK
+ ;;
+- "" | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic)
++ "" | amdfam10 | k8 | opteron | athlon64 | athlon-fx | nocona | core2 | generic)
+ # OK
+ ;;
+ *)
+--- gcc/config/i386/i386.h.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config/i386/i386.h 2007-02-09 21:29:00.000000000 +0100
+@@ -141,6 +141,7 @@ extern const struct processor_costs *ix8
+ #define TARGET_GENERIC32 (ix86_tune == PROCESSOR_GENERIC32)
+ #define TARGET_GENERIC64 (ix86_tune == PROCESSOR_GENERIC64)
+ #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64)
++#define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10)
+
+ #define TUNEMASK (1 << ix86_tune)
+ extern const int x86_use_leave, x86_push_memory, x86_zero_extend_with_and;
+@@ -159,6 +160,7 @@ extern const int x86_accumulate_outgoing
+ extern const int x86_epilogue_using_move, x86_decompose_lea;
+ extern const int x86_arch_always_fancy_math_387, x86_shift1;
+ extern const int x86_sse_partial_reg_dependency, x86_sse_split_regs;
++extern const int x86_sse_unaligned_move_optimal;
+ extern const int x86_sse_typeless_stores, x86_sse_load0_by_pxor;
+ extern const int x86_use_ffreep;
+ extern const int x86_inter_unit_moves, x86_schedule;
+@@ -208,6 +210,8 @@ extern int x86_prefetch_sse, x86_cmpxchg
+ #define TARGET_PARTIAL_REG_DEPENDENCY (x86_partial_reg_dependency & TUNEMASK)
+ #define TARGET_SSE_PARTIAL_REG_DEPENDENCY \
+ (x86_sse_partial_reg_dependency & TUNEMASK)
++#define TARGET_SSE_UNALIGNED_MOVE_OPTIMAL \
++ (x86_sse_unaligned_move_optimal & TUNEMASK)
+ #define TARGET_SSE_SPLIT_REGS (x86_sse_split_regs & TUNEMASK)
+ #define TARGET_SSE_TYPELESS_STORES (x86_sse_typeless_stores & TUNEMASK)
+ #define TARGET_SSE_LOAD0_BY_PXOR (x86_sse_load0_by_pxor & TUNEMASK)
+@@ -376,6 +380,8 @@ extern int x86_prefetch_sse, x86_cmpxchg
+ } \
+ else if (TARGET_K8) \
+ builtin_define ("__tune_k8__"); \
++ else if (TARGET_AMDFAM10) \
++ builtin_define ("__tune_amdfam10__"); \
+ else if (TARGET_PENTIUM4) \
+ builtin_define ("__tune_pentium4__"); \
+ else if (TARGET_NOCONA) \
+@@ -400,6 +406,8 @@ extern int x86_prefetch_sse, x86_cmpxchg
+ builtin_define ("__SSSE3__"); \
+ builtin_define ("__MNI__"); \
+ } \
++ if (TARGET_SSE4A) \
++ builtin_define ("__SSE4A__"); \
+ if (TARGET_SSE_MATH && TARGET_SSE) \
+ builtin_define ("__SSE_MATH__"); \
+ if (TARGET_SSE_MATH && TARGET_SSE2) \
+@@ -455,6 +463,11 @@ extern int x86_prefetch_sse, x86_cmpxchg
+ builtin_define ("__k8"); \
+ builtin_define ("__k8__"); \
+ } \
++ else if (ix86_arch == PROCESSOR_AMDFAM10) \
++ { \
++ builtin_define ("__amdfam10"); \
++ builtin_define ("__amdfam10__"); \
++ } \
+ else if (ix86_arch == PROCESSOR_PENTIUM4) \
+ { \
+ builtin_define ("__pentium4"); \
+@@ -493,13 +506,14 @@ extern int x86_prefetch_sse, x86_cmpxchg
+ #define TARGET_CPU_DEFAULT_nocona 17
+ #define TARGET_CPU_DEFAULT_core2 18
+ #define TARGET_CPU_DEFAULT_generic 19
++#define TARGET_CPU_DEFAULT_amdfam10 20
+
+ #define TARGET_CPU_DEFAULT_NAMES {"i386", "i486", "pentium", "pentium-mmx",\
+ "pentiumpro", "pentium2", "pentium3", \
+ "pentium4", "geode", "k6", "k6-2", "k6-3", \
+ "athlon", "athlon-4", "k8", \
+ "pentium-m", "prescott", "nocona", \
+- "core2", "generic"}
++ "core2", "generic", "amdfam10"}
+
+ #ifndef CC1_SPEC
+ #define CC1_SPEC "%(cc1_cpu) "
+@@ -2162,6 +2176,7 @@ enum processor_type
+ PROCESSOR_CORE2,
+ PROCESSOR_GENERIC32,
+ PROCESSOR_GENERIC64,
++ PROCESSOR_AMDFAM10,
+ PROCESSOR_max
+ };
+
+--- gcc/config/i386/i386.md.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config/i386/i386.md 2007-02-10 19:33:43.000000000 +0100
+@@ -151,6 +151,12 @@
+ (UNSPEC_PSHUFB 120)
+ (UNSPEC_PSIGN 121)
+ (UNSPEC_PALIGNR 122)
++
++ ; For SSE4A support
++ (UNSPEC_EXTRQI 130)
++ (UNSPEC_EXTRQ 131)
++ (UNSPEC_INSERTQI 132)
++ (UNSPEC_INSERTQ 133)
+ ])
+
+ (define_constants
+@@ -190,7 +196,8 @@
+ \f
+ ;; Processor type. This attribute must exactly match the processor_type
+ ;; enumeration in i386.h.
+-(define_attr "cpu" "i386,i486,pentium,pentiumpro,geode,k6,athlon,pentium4,k8,nocona,core2,generic32,generic64"
++(define_attr "cpu" "i386,i486,pentium,pentiumpro,geode,k6,athlon,pentium4,k8,
++ nocona,core2,generic32,generic64,amdfam10"
+ (const (symbol_ref "ix86_tune")))
+
+ ;; A basic instruction type. Refinements due to arguments to be
+@@ -201,10 +208,10 @@
+ incdec,ishift,ishift1,rotate,rotate1,imul,idiv,
+ icmp,test,ibr,setcc,icmov,
+ push,pop,call,callv,leave,
+- str,cld,
++ str,bitmanip,cld,
+ fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
+ sselog,sselog1,sseiadd,sseishft,sseimul,
+- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv,
++ sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv,sseins,
+ mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft"
+ (const_string "other"))
+
+@@ -218,7 +225,7 @@
+ (cond [(eq_attr "type" "fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint")
+ (const_string "i387")
+ (eq_attr "type" "sselog,sselog1,sseiadd,sseishft,sseimul,
+- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv")
++ sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,sseicvt,ssediv,sseins")
+ (const_string "sse")
+ (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft")
+ (const_string "mmx")
+@@ -228,7 +235,8 @@
+
+ ;; The (bounding maximum) length of an instruction immediate.
+ (define_attr "length_immediate" ""
+- (cond [(eq_attr "type" "incdec,setcc,icmov,str,cld,lea,other,multi,idiv,leave")
++ (cond [(eq_attr "type" "incdec,setcc,icmov,str,cld,lea,other,multi,idiv,leave,
++ bitmanip")
+ (const_int 0)
+ (eq_attr "unit" "i387,sse,mmx")
+ (const_int 0)
+@@ -282,7 +290,7 @@
+ ;; Set when 0f opcode prefix is used.
+ (define_attr "prefix_0f" ""
+ (if_then_else
+- (ior (eq_attr "type" "imovx,setcc,icmov")
++ (ior (eq_attr "type" "imovx,setcc,icmov,bitmanip")
+ (eq_attr "unit" "sse,mmx"))
+ (const_int 1)
+ (const_int 0)))
+@@ -407,7 +415,7 @@
+ (const_string "load")
+ (and (eq_attr "type"
+ "!alu1,negnot,ishift1,
+- imov,imovx,icmp,test,
++ imov,imovx,icmp,test,bitmanip,
+ fmov,fcmp,fsgn,
+ sse,ssemov,ssecmp,ssecomi,ssecvt,sseicvt,sselog1,
+ mmx,mmxmov,mmxcmp,mmxcvt")
+@@ -961,10 +969,11 @@
+ "sahf"
+ [(set_attr "length" "1")
+ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")
+ (set_attr "mode" "SI")])
+
+ ;; Pentium Pro can do steps 1 through 3 in one go.
+-
++;; comi*, ucomi*, fcomi*, ficomi*,fucomi* (i387 instructions set condition codes)
+ (define_insn "*cmpfp_i_mixed"
+ [(set (reg:CCFP FLAGS_REG)
+ (compare:CCFP (match_operand 0 "register_operand" "f#x,x#f")
+@@ -978,7 +987,8 @@
+ (if_then_else (match_operand:SF 1 "" "")
+ (const_string "SF")
+ (const_string "DF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "*cmpfp_i_sse"
+ [(set (reg:CCFP FLAGS_REG)
+@@ -993,7 +1003,8 @@
+ (if_then_else (match_operand:SF 1 "" "")
+ (const_string "SF")
+ (const_string "DF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "*cmpfp_i_i387"
+ [(set (reg:CCFP FLAGS_REG)
+@@ -1012,7 +1023,8 @@
+ (const_string "DF")
+ ]
+ (const_string "XF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "*cmpfp_iu_mixed"
+ [(set (reg:CCFPU FLAGS_REG)
+@@ -1027,7 +1039,8 @@
+ (if_then_else (match_operand:SF 1 "" "")
+ (const_string "SF")
+ (const_string "DF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "*cmpfp_iu_sse"
+ [(set (reg:CCFPU FLAGS_REG)
+@@ -1042,7 +1055,8 @@
+ (if_then_else (match_operand:SF 1 "" "")
+ (const_string "SF")
+ (const_string "DF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "*cmpfp_iu_387"
+ [(set (reg:CCFPU FLAGS_REG)
+@@ -1061,7 +1075,8 @@
+ (const_string "DF")
+ ]
+ (const_string "XF")))
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "direct")])
+ \f
+ ;; Move instructions.
+
+@@ -1267,7 +1282,8 @@
+ [(set_attr "type" "imov")
+ (set_attr "mode" "SI")
+ (set_attr "pent_pair" "np")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "double")])
+
+ (define_expand "movhi"
+ [(set (match_operand:HI 0 "nonimmediate_operand" "")
+@@ -1384,8 +1400,10 @@
+ [(set_attr "type" "imov")
+ (set_attr "mode" "SI")
+ (set_attr "pent_pair" "np")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "double")])
+
++;; Not added amdfam10_decode since TARGET_PARTIAL_REG_STALL is disabled for AMDFAM10
+ (define_insn "*swaphi_2"
+ [(set (match_operand:HI 0 "register_operand" "+r")
+ (match_operand:HI 1 "register_operand" "+r"))
+@@ -1558,8 +1576,10 @@
+ [(set_attr "type" "imov")
+ (set_attr "mode" "SI")
+ (set_attr "pent_pair" "np")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "vector")])
+
++;; Not added amdfam10_decode since TARGET_PARTIAL_REG_STALL is disabled for AMDFAM10
+ (define_insn "*swapqi_2"
+ [(set (match_operand:QI 0 "register_operand" "+q")
+ (match_operand:QI 1 "register_operand" "+q"))
+@@ -2113,7 +2133,8 @@
+ [(set_attr "type" "imov")
+ (set_attr "mode" "DI")
+ (set_attr "pent_pair" "np")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "double")])
+
+ (define_expand "movti"
+ [(set (match_operand:TI 0 "nonimmediate_operand" "")
+@@ -4122,7 +4143,8 @@
+ "cvttss2si{q}\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "SF")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ (define_insn "fix_truncdfdi_sse"
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+@@ -4131,7 +4153,8 @@
+ "cvttsd2si{q}\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ (define_insn "fix_truncsfsi_sse"
+ [(set (match_operand:SI 0 "register_operand" "=r,r")
+@@ -4140,7 +4163,8 @@
+ "cvttss2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ (define_insn "fix_truncdfsi_sse"
+ [(set (match_operand:SI 0 "register_operand" "=r,r")
+@@ -4149,7 +4173,8 @@
+ "cvttsd2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ ;; Avoid vector decoded forms of the instruction.
+ (define_peephole2
+@@ -4410,7 +4435,8 @@
+ [(set_attr "length" "2")
+ (set_attr "mode" "HI")
+ (set_attr "unit" "i387")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "vector")])
+ \f
+ ;; Conversion between fixed point and floating point.
+
+@@ -4461,6 +4487,7 @@
+ (set_attr "mode" "SF")
+ (set_attr "unit" "*,i387,*,*")
+ (set_attr "athlon_decode" "*,*,vector,double")
++ (set_attr "amdfam10_decode" "*,*,vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatsisf2_sse"
+@@ -4471,6 +4498,7 @@
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "SF")
+ (set_attr "athlon_decode" "vector,double")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatsisf2_i387"
+@@ -4504,6 +4532,7 @@
+ (set_attr "mode" "SF")
+ (set_attr "unit" "*,i387,*,*")
+ (set_attr "athlon_decode" "*,*,vector,double")
++ (set_attr "amdfam10_decode" "*,*,vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatdisf2_sse"
+@@ -4514,6 +4543,7 @@
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "SF")
+ (set_attr "athlon_decode" "vector,double")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatdisf2_i387"
+@@ -4572,6 +4602,7 @@
+ (set_attr "mode" "DF")
+ (set_attr "unit" "*,i387,*,*")
+ (set_attr "athlon_decode" "*,*,double,direct")
++ (set_attr "amdfam10_decode" "*,*,vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatsidf2_sse"
+@@ -4582,6 +4613,7 @@
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+ (set_attr "athlon_decode" "double,direct")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatsidf2_i387"
+@@ -4615,6 +4647,7 @@
+ (set_attr "mode" "DF")
+ (set_attr "unit" "*,i387,*,*")
+ (set_attr "athlon_decode" "*,*,double,direct")
++ (set_attr "amdfam10_decode" "*,*,vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatdidf2_sse"
+@@ -4625,6 +4658,7 @@
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+ (set_attr "athlon_decode" "double,direct")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "fp_int_src" "true")])
+
+ (define_insn "*floatdidf2_i387"
+@@ -6832,6 +6866,14 @@
+ "TARGET_64BIT"
+ "")
+
++;; On AMDFAM10
++;; IMUL reg64, reg64, imm8 Direct
++;; IMUL reg64, mem64, imm8 VectorPath
++;; IMUL reg64, reg64, imm32 Direct
++;; IMUL reg64, mem64, imm32 VectorPath
++;; IMUL reg64, reg64 Direct
++;; IMUL reg64, mem64 Direct
++
+ (define_insn "*muldi3_1_rex64"
+ [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+ (mult:DI (match_operand:DI 1 "nonimmediate_operand" "%rm,rm,0")
+@@ -6854,6 +6896,11 @@
+ (match_operand 1 "memory_operand" ""))
+ (const_string "vector")]
+ (const_string "direct")))
++ (set (attr "amdfam10_decode")
++ (cond [(and (eq_attr "alternative" "0,1")
++ (match_operand 1 "memory_operand" ""))
++ (const_string "vector")]
++ (const_string "direct")))
+ (set_attr "mode" "DI")])
+
+ (define_expand "mulsi3"
+@@ -6864,6 +6911,14 @@
+ ""
+ "")
+
++;; On AMDFAM10
++;; IMUL reg32, reg32, imm8 Direct
++;; IMUL reg32, mem32, imm8 VectorPath
++;; IMUL reg32, reg32, imm32 Direct
++;; IMUL reg32, mem32, imm32 VectorPath
++;; IMUL reg32, reg32 Direct
++;; IMUL reg32, mem32 Direct
++
+ (define_insn "*mulsi3_1"
+ [(set (match_operand:SI 0 "register_operand" "=r,r,r")
+ (mult:SI (match_operand:SI 1 "nonimmediate_operand" "%rm,rm,0")
+@@ -6885,6 +6940,11 @@
+ (match_operand 1 "memory_operand" ""))
+ (const_string "vector")]
+ (const_string "direct")))
++ (set (attr "amdfam10_decode")
++ (cond [(and (eq_attr "alternative" "0,1")
++ (match_operand 1 "memory_operand" ""))
++ (const_string "vector")]
++ (const_string "direct")))
+ (set_attr "mode" "SI")])
+
+ (define_insn "*mulsi3_1_zext"
+@@ -6910,6 +6970,11 @@
+ (match_operand 1 "memory_operand" ""))
+ (const_string "vector")]
+ (const_string "direct")))
++ (set (attr "amdfam10_decode")
++ (cond [(and (eq_attr "alternative" "0,1")
++ (match_operand 1 "memory_operand" ""))
++ (const_string "vector")]
++ (const_string "direct")))
+ (set_attr "mode" "SI")])
+
+ (define_expand "mulhi3"
+@@ -6920,6 +6985,13 @@
+ "TARGET_HIMODE_MATH"
+ "")
+
++;; On AMDFAM10
++;; IMUL reg16, reg16, imm8 VectorPath
++;; IMUL reg16, mem16, imm8 VectorPath
++;; IMUL reg16, reg16, imm16 VectorPath
++;; IMUL reg16, mem16, imm16 VectorPath
++;; IMUL reg16, reg16 Direct
++;; IMUL reg16, mem16 Direct
+ (define_insn "*mulhi3_1"
+ [(set (match_operand:HI 0 "register_operand" "=r,r,r")
+ (mult:HI (match_operand:HI 1 "nonimmediate_operand" "%rm,rm,0")
+@@ -6938,6 +7010,10 @@
+ (eq_attr "alternative" "1,2")
+ (const_string "vector")]
+ (const_string "direct")))
++ (set (attr "amdfam10_decode")
++ (cond [(eq_attr "alternative" "0,1")
++ (const_string "vector")]
++ (const_string "direct")))
+ (set_attr "mode" "HI")])
+
+ (define_expand "mulqi3"
+@@ -6948,6 +7024,10 @@
+ "TARGET_QIMODE_MATH"
+ "")
+
++;;On AMDFAM10
++;; MUL reg8 Direct
++;; MUL mem8 Direct
++
+ (define_insn "*mulqi3_1"
+ [(set (match_operand:QI 0 "register_operand" "=a")
+ (mult:QI (match_operand:QI 1 "nonimmediate_operand" "%0")
+@@ -6962,6 +7042,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "direct")))
++ (set_attr "amdfam10_decode" "direct")
+ (set_attr "mode" "QI")])
+
+ (define_expand "umulqihi3"
+@@ -6988,6 +7069,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "direct")))
++ (set_attr "amdfam10_decode" "direct")
+ (set_attr "mode" "QI")])
+
+ (define_expand "mulqihi3"
+@@ -7012,6 +7094,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "direct")))
++ (set_attr "amdfam10_decode" "direct")
+ (set_attr "mode" "QI")])
+
+ (define_expand "umulditi3"
+@@ -7038,6 +7121,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "DI")])
+
+ ;; We can't use this pattern in 64bit mode, since it results in two separate 32bit registers
+@@ -7065,6 +7149,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ (define_expand "mulditi3"
+@@ -7091,6 +7176,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "DI")])
+
+ (define_expand "mulsidi3"
+@@ -7117,6 +7203,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ (define_expand "umuldi3_highpart"
+@@ -7153,6 +7240,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "DI")])
+
+ (define_expand "umulsi3_highpart"
+@@ -7188,6 +7276,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ (define_insn "*umulsi3_highpart_zext"
+@@ -7210,6 +7299,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ (define_expand "smuldi3_highpart"
+@@ -7245,6 +7335,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "DI")])
+
+ (define_expand "smulsi3_highpart"
+@@ -7279,6 +7370,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ (define_insn "*smulsi3_highpart_zext"
+@@ -7300,6 +7392,7 @@
+ (if_then_else (eq_attr "cpu" "athlon")
+ (const_string "vector")
+ (const_string "double")))
++ (set_attr "amdfam10_decode" "double")
+ (set_attr "mode" "SI")])
+
+ ;; The patterns that match these are at the end of this file.
+@@ -10281,7 +10374,8 @@
+ [(set_attr "type" "ishift")
+ (set_attr "prefix_0f" "1")
+ (set_attr "mode" "DI")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "vector")])
+
+ (define_expand "x86_64_shift_adj"
+ [(set (reg:CCZ FLAGS_REG)
+@@ -10496,7 +10590,8 @@
+ (set_attr "prefix_0f" "1")
+ (set_attr "mode" "SI")
+ (set_attr "pent_pair" "np")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "vector")])
+
+ (define_expand "x86_shift_adj_1"
+ [(set (reg:CCZ FLAGS_REG)
+@@ -11256,7 +11351,8 @@
+ [(set_attr "type" "ishift")
+ (set_attr "prefix_0f" "1")
+ (set_attr "mode" "DI")
+- (set_attr "athlon_decode" "vector")])
++ (set_attr "athlon_decode" "vector")
++ (set_attr "amdfam10_decode" "vector")])
+
+ (define_expand "ashrdi3"
+ [(set (match_operand:DI 0 "shiftdi_operand" "")
+@@ -14520,7 +14616,23 @@
+ [(set (match_dup 0) (xor:SI (match_dup 0) (const_int 31)))
+ (clobber (reg:CC FLAGS_REG))])]
+ ""
+- "")
++{
++ if (TARGET_ABM)
++ {
++ emit_insn (gen_clzsi2_abm (operands[0], operands[1]));
++ DONE;
++ }
++})
++
++(define_insn "clzsi2_abm"
++ [(set (match_operand:SI 0 "register_operand" "=r")
++ (clz:SI (match_operand:SI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_ABM"
++ "lzcnt{l}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "SI")])
+
+ (define_insn "*bsr"
+ [(set (match_operand:SI 0 "register_operand" "=r")
+@@ -14529,7 +14641,44 @@
+ (clobber (reg:CC FLAGS_REG))]
+ ""
+ "bsr{l}\t{%1, %0|%0, %1}"
+- [(set_attr "prefix_0f" "1")])
++ [(set_attr "prefix_0f" "1")
++ (set_attr "mode" "SI")])
++
++(define_insn "popcountsi2"
++ [(set (match_operand:SI 0 "register_operand" "=r")
++ (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_POPCNT"
++ "popcnt{l}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "SI")])
++
++(define_insn "*popcountsi2_cmp"
++ [(set (reg FLAGS_REG)
++ (compare
++ (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))
++ (const_int 0)))
++ (set (match_operand:SI 0 "register_operand" "=r")
++ (popcount:SI (match_dup 1)))]
++ "TARGET_POPCNT && ix86_match_ccmode (insn, CCZmode)"
++ "popcnt{l}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "SI")])
++
++(define_insn "*popcountsi2_cmp_zext"
++ [(set (reg FLAGS_REG)
++ (compare
++ (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))
++ (const_int 0)))
++ (set (match_operand:DI 0 "register_operand" "=r")
++ (zero_extend:DI(popcount:SI (match_dup 1))))]
++ "TARGET_64BIT && TARGET_POPCNT && ix86_match_ccmode (insn, CCZmode)"
++ "popcnt{l}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "SI")])
+
+ (define_expand "clzdi2"
+ [(parallel
+@@ -14541,7 +14690,23 @@
+ [(set (match_dup 0) (xor:DI (match_dup 0) (const_int 63)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "TARGET_64BIT"
+- "")
++{
++ if (TARGET_ABM)
++ {
++ emit_insn (gen_clzdi2_abm (operands[0], operands[1]));
++ DONE;
++ }
++})
++
++(define_insn "clzdi2_abm"
++ [(set (match_operand:DI 0 "register_operand" "=r")
++ (clz:DI (match_operand:DI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_64BIT && TARGET_ABM"
++ "lzcnt{q}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "DI")])
+
+ (define_insn "*bsr_rex64"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+@@ -14550,7 +14715,92 @@
+ (clobber (reg:CC FLAGS_REG))]
+ "TARGET_64BIT"
+ "bsr{q}\t{%1, %0|%0, %1}"
+- [(set_attr "prefix_0f" "1")])
++ [(set_attr "prefix_0f" "1")
++ (set_attr "mode" "DI")])
++
++(define_insn "popcountdi2"
++ [(set (match_operand:DI 0 "register_operand" "=r")
++ (popcount:DI (match_operand:DI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_64BIT && TARGET_POPCNT"
++ "popcnt{q}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "DI")])
++
++(define_insn "*popcountdi2_cmp"
++ [(set (reg FLAGS_REG)
++ (compare
++ (popcount:DI (match_operand:DI 1 "nonimmediate_operand" "rm"))
++ (const_int 0)))
++ (set (match_operand:DI 0 "register_operand" "=r")
++ (popcount:DI (match_dup 1)))]
++ "TARGET_64BIT && TARGET_POPCNT && ix86_match_ccmode (insn, CCZmode)"
++ "popcnt{q}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "DI")])
++
++(define_expand "clzhi2"
++ [(parallel
++ [(set (match_operand:HI 0 "register_operand" "")
++ (minus:HI (const_int 15)
++ (clz:HI (match_operand:HI 1 "nonimmediate_operand" ""))))
++ (clobber (reg:CC FLAGS_REG))])
++ (parallel
++ [(set (match_dup 0) (xor:HI (match_dup 0) (const_int 15)))
++ (clobber (reg:CC FLAGS_REG))])]
++ ""
++{
++ if (TARGET_ABM)
++ {
++ emit_insn (gen_clzhi2_abm (operands[0], operands[1]));
++ DONE;
++ }
++})
++
++(define_insn "clzhi2_abm"
++ [(set (match_operand:HI 0 "register_operand" "=r")
++ (clz:HI (match_operand:HI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_ABM"
++ "lzcnt{w}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "HI")])
++
++(define_insn "*bsrhi"
++ [(set (match_operand:HI 0 "register_operand" "=r")
++ (minus:HI (const_int 15)
++ (clz:HI (match_operand:HI 1 "nonimmediate_operand" "rm"))))
++ (clobber (reg:CC FLAGS_REG))]
++ ""
++ "bsr{w}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_0f" "1")
++ (set_attr "mode" "HI")])
++
++(define_insn "popcounthi2"
++ [(set (match_operand:HI 0 "register_operand" "=r")
++ (popcount:HI (match_operand:HI 1 "nonimmediate_operand" "")))
++ (clobber (reg:CC FLAGS_REG))]
++ "TARGET_POPCNT"
++ "popcnt{w}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "HI")])
++
++(define_insn "*popcounthi2_cmp"
++ [(set (reg FLAGS_REG)
++ (compare
++ (popcount:HI (match_operand:HI 1 "nonimmediate_operand" "rm"))
++ (const_int 0)))
++ (set (match_operand:HI 0 "register_operand" "=r")
++ (popcount:HI (match_dup 1)))]
++ "TARGET_POPCNT && ix86_match_ccmode (insn, CCZmode)"
++ "popcnt{w}\t{%1, %0|%0, %1}"
++ [(set_attr "prefix_rep" "1")
++ (set_attr "type" "bitmanip")
++ (set_attr "mode" "HI")])
+ \f
+ ;; Thread-local storage patterns for ELF.
+ ;;
+@@ -15302,7 +15552,8 @@
+ sqrtss\t{%1, %0|%0, %1}"
+ [(set_attr "type" "fpspc,sse")
+ (set_attr "mode" "SF,SF")
+- (set_attr "athlon_decode" "direct,*")])
++ (set_attr "athlon_decode" "direct,*")
++ (set_attr "amdfam10_decode" "direct,*")])
+
+ (define_insn "*sqrtsf2_sse"
+ [(set (match_operand:SF 0 "register_operand" "=x")
+@@ -15311,7 +15562,8 @@
+ "sqrtss\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sse")
+ (set_attr "mode" "SF")
+- (set_attr "athlon_decode" "*")])
++ (set_attr "athlon_decode" "*")
++ (set_attr "amdfam10_decode" "*")])
+
+ (define_insn "*sqrtsf2_i387"
+ [(set (match_operand:SF 0 "register_operand" "=f")
+@@ -15320,7 +15572,8 @@
+ "fsqrt"
+ [(set_attr "type" "fpspc")
+ (set_attr "mode" "SF")
+- (set_attr "athlon_decode" "direct")])
++ (set_attr "athlon_decode" "direct")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_expand "sqrtdf2"
+ [(set (match_operand:DF 0 "register_operand" "")
+@@ -15399,7 +15652,8 @@
+ "fsqrt"
+ [(set_attr "type" "fpspc")
+ (set_attr "mode" "XF")
+- (set_attr "athlon_decode" "direct")])
++ (set_attr "athlon_decode" "direct")
++ (set_attr "amdfam10_decode" "direct")])
+
+ (define_insn "fpremxf4"
+ [(set (match_operand:XF 0 "register_operand" "=f")
+@@ -20186,7 +20440,7 @@
+ (mult:DI (match_operand:DI 1 "memory_operand" "")
+ (match_operand:DI 2 "immediate_operand" "")))
+ (clobber (reg:CC FLAGS_REG))])]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size
+ && (GET_CODE (operands[2]) != CONST_INT
+ || !CONST_OK_FOR_LETTER_P (INTVAL (operands[2]), 'K'))"
+ [(set (match_dup 3) (match_dup 1))
+@@ -20200,7 +20454,7 @@
+ (mult:SI (match_operand:SI 1 "memory_operand" "")
+ (match_operand:SI 2 "immediate_operand" "")))
+ (clobber (reg:CC FLAGS_REG))])]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size
+ && (GET_CODE (operands[2]) != CONST_INT
+ || !CONST_OK_FOR_LETTER_P (INTVAL (operands[2]), 'K'))"
+ [(set (match_dup 3) (match_dup 1))
+@@ -20215,7 +20469,7 @@
+ (mult:SI (match_operand:SI 1 "memory_operand" "")
+ (match_operand:SI 2 "immediate_operand" ""))))
+ (clobber (reg:CC FLAGS_REG))])]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size
+ && (GET_CODE (operands[2]) != CONST_INT
+ || !CONST_OK_FOR_LETTER_P (INTVAL (operands[2]), 'K'))"
+ [(set (match_dup 3) (match_dup 1))
+@@ -20233,7 +20487,7 @@
+ (match_operand:DI 2 "const_int_operand" "")))
+ (clobber (reg:CC FLAGS_REG))])
+ (match_scratch:DI 3 "r")]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size
+ && CONST_OK_FOR_LETTER_P (INTVAL (operands[2]), 'K')"
+ [(set (match_dup 3) (match_dup 2))
+ (parallel [(set (match_dup 0) (mult:DI (match_dup 0) (match_dup 3)))
+@@ -20249,7 +20503,7 @@
+ (match_operand:SI 2 "const_int_operand" "")))
+ (clobber (reg:CC FLAGS_REG))])
+ (match_scratch:SI 3 "r")]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size
+ && CONST_OK_FOR_LETTER_P (INTVAL (operands[2]), 'K')"
+ [(set (match_dup 3) (match_dup 2))
+ (parallel [(set (match_dup 0) (mult:SI (match_dup 0) (match_dup 3)))
+@@ -20265,7 +20519,7 @@
+ (match_operand:HI 2 "immediate_operand" "")))
+ (clobber (reg:CC FLAGS_REG))])
+ (match_scratch:HI 3 "r")]
+- "(TARGET_K8 || TARGET_GENERIC64) && !optimize_size"
++ "(TARGET_K8 || TARGET_GENERIC64 || TARGET_AMDFAM10) && !optimize_size"
+ [(set (match_dup 3) (match_dup 2))
+ (parallel [(set (match_dup 0) (mult:HI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC FLAGS_REG))])]
+--- gcc/config/i386/athlon.md.jj 2006-10-29 20:56:45.000000000 +0100
++++ gcc/config/i386/athlon.md 2007-02-09 21:26:06.000000000 +0100
+@@ -29,6 +29,8 @@
+ (const_string "vector")]
+ (const_string "direct")))
+
++(define_attr "amdfam10_decode" "direct,vector,double"
++ (const_string "direct"))
+ ;;
+ ;; decode0 decode1 decode2
+ ;; \ | /
+@@ -131,18 +133,22 @@
+
+ ;; Jump instructions are executed in the branch unit completely transparent to us
+ (define_insn_reservation "athlon_branch" 0
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "ibr"))
+ "athlon-direct,athlon-ieu")
+ (define_insn_reservation "athlon_call" 0
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (eq_attr "type" "call,callv"))
+ "athlon-vector,athlon-ieu")
++(define_insn_reservation "athlon_call_amdfam10" 0
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "call,callv"))
++ "athlon-double,athlon-ieu")
+
+ ;; Latency of push operation is 3 cycles, but ESP value is available
+ ;; earlier
+ (define_insn_reservation "athlon_push" 2
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "push"))
+ "athlon-direct,athlon-agu,athlon-store")
+ (define_insn_reservation "athlon_pop" 4
+@@ -153,12 +159,16 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "pop"))
+ "athlon-double,(athlon-ieu+athlon-load)")
++(define_insn_reservation "athlon_pop_amdfam10" 3
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "pop"))
++ "athlon-direct,(athlon-ieu+athlon-load)")
+ (define_insn_reservation "athlon_leave" 3
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "leave"))
+ "athlon-vector,(athlon-ieu+athlon-load)")
+ (define_insn_reservation "athlon_leave_k8" 3
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (eq_attr "type" "leave"))
+ "athlon-double,(athlon-ieu+athlon-load)")
+
+@@ -167,6 +177,11 @@
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (eq_attr "type" "lea"))
+ "athlon-direct,athlon-agu,nothing")
++;; Lea executes in AGU unit with 1 cycle latency on AMDFAM10
++(define_insn_reservation "athlon_lea_amdfam10" 1
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "lea"))
++ "athlon-direct,athlon-agu,nothing")
+
+ ;; Mul executes in special multiplier unit attached to IEU0
+ (define_insn_reservation "athlon_imul" 5
+@@ -176,29 +191,35 @@
+ "athlon-vector,athlon-ieu0,athlon-mult,nothing,nothing,athlon-ieu0")
+ ;; ??? Widening multiply is vector or double.
+ (define_insn_reservation "athlon_imul_k8_DI" 4
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "imul")
+ (and (eq_attr "mode" "DI")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-direct0,athlon-ieu0,athlon-mult,nothing,athlon-ieu0")
+ (define_insn_reservation "athlon_imul_k8" 3
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "imul")
+ (eq_attr "memory" "none,unknown")))
+ "athlon-direct0,athlon-ieu0,athlon-mult,athlon-ieu0")
++(define_insn_reservation "athlon_imul_amdfam10_HI" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "imul")
++ (and (eq_attr "mode" "HI")
++ (eq_attr "memory" "none,unknown"))))
++ "athlon-vector,athlon-ieu0,athlon-mult,nothing,athlon-ieu0")
+ (define_insn_reservation "athlon_imul_mem" 8
+ (and (eq_attr "cpu" "athlon")
+ (and (eq_attr "type" "imul")
+ (eq_attr "memory" "load,both")))
+ "athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,nothing,athlon-ieu")
+ (define_insn_reservation "athlon_imul_mem_k8_DI" 7
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "imul")
+ (and (eq_attr "mode" "DI")
+ (eq_attr "memory" "load,both"))))
+ "athlon-vector,athlon-load,athlon-ieu,athlon-mult,nothing,athlon-ieu")
+ (define_insn_reservation "athlon_imul_mem_k8" 6
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "imul")
+ (eq_attr "memory" "load,both")))
+ "athlon-vector,athlon-load,athlon-ieu,athlon-mult,athlon-ieu")
+@@ -209,21 +230,23 @@
+ ;; other instructions.
+ ;; ??? Experiments show that the idiv can overlap with roughly 6 cycles
+ ;; of the other code
++;; Using the same heuristics for amdfam10 as K8 with idiv
+
+ (define_insn_reservation "athlon_idiv" 6
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "idiv")
+ (eq_attr "memory" "none,unknown")))
+ "athlon-vector,(athlon-ieu0*6+(athlon-fpsched,athlon-fvector))")
+ (define_insn_reservation "athlon_idiv_mem" 9
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "idiv")
+ (eq_attr "memory" "load,both")))
+ "athlon-vector,((athlon-load,athlon-ieu0*6)+(athlon-fpsched,athlon-fvector))")
+ ;; The parallelism of string instructions is not documented. Model it same way
+ ;; as idiv to create smaller automata. This probably does not matter much.
++;; Using the same heuristics for amdfam10 as K8 with idiv
+ (define_insn_reservation "athlon_str" 6
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "str")
+ (eq_attr "memory" "load,both,store")))
+ "athlon-vector,athlon-load,athlon-ieu0*6")
+@@ -234,34 +257,62 @@
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-direct,athlon-ieu")
++(define_insn_reservation "athlon_idirect_amdfam10" 1
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "none,unknown"))))
++ "athlon-direct,athlon-ieu")
+ (define_insn_reservation "athlon_ivector" 2
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "none,unknown"))))
+ "athlon-vector,athlon-ieu,athlon-ieu")
++(define_insn_reservation "athlon_ivector_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "none,unknown"))))
++ "athlon-vector,athlon-ieu,athlon-ieu")
++
+ (define_insn_reservation "athlon_idirect_loadmov" 3
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "imov")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-load")
++
+ (define_insn_reservation "athlon_idirect_load" 4
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "direct")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-load,athlon-ieu")
++(define_insn_reservation "athlon_idirect_load_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "load"))))
++ "athlon-direct,athlon-load,athlon-ieu")
+ (define_insn_reservation "athlon_ivector_load" 6
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "vector")
+ (and (eq_attr "unit" "integer,unknown")
+ (eq_attr "memory" "load"))))
+ "athlon-vector,athlon-load,athlon-ieu,athlon-ieu")
++(define_insn_reservation "athlon_ivector_load_amdfam10" 6
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "load"))))
++ "athlon-vector,athlon-load,athlon-ieu,athlon-ieu")
++
+ (define_insn_reservation "athlon_idirect_movstore" 1
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "imov")
+ (eq_attr "memory" "store")))
+ "athlon-direct,athlon-agu,athlon-store")
++
+ (define_insn_reservation "athlon_idirect_both" 4
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "direct")
+@@ -270,6 +321,15 @@
+ "athlon-direct,athlon-load,
+ athlon-ieu,athlon-store,
+ athlon-store")
++(define_insn_reservation "athlon_idirect_both_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "both"))))
++ "athlon-direct,athlon-load,
++ athlon-ieu,athlon-store,
++ athlon-store")
++
+ (define_insn_reservation "athlon_ivector_both" 6
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "vector")
+@@ -279,6 +339,16 @@
+ athlon-ieu,
+ athlon-ieu,
+ athlon-store")
++(define_insn_reservation "athlon_ivector_both_amdfam10" 6
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "both"))))
++ "athlon-vector,athlon-load,
++ athlon-ieu,
++ athlon-ieu,
++ athlon-store")
++
+ (define_insn_reservation "athlon_idirect_store" 1
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "direct")
+@@ -286,6 +356,14 @@
+ (eq_attr "memory" "store"))))
+ "athlon-direct,(athlon-ieu+athlon-agu),
+ athlon-store")
++(define_insn_reservation "athlon_idirect_store_amdfam10" 1
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "store"))))
++ "athlon-direct,(athlon-ieu+athlon-agu),
++ athlon-store")
++
+ (define_insn_reservation "athlon_ivector_store" 2
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "athlon_decode" "vector")
+@@ -293,6 +371,13 @@
+ (eq_attr "memory" "store"))))
+ "athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu,
+ athlon-store")
++(define_insn_reservation "athlon_ivector_store_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "unit" "integer,unknown")
++ (eq_attr "memory" "store"))))
++ "athlon-vector,(athlon-ieu+athlon-agu),athlon-ieu,
++ athlon-store")
+
+ ;; Athlon floatin point unit
+ (define_insn_reservation "athlon_fldxf" 12
+@@ -302,7 +387,7 @@
+ (eq_attr "mode" "XF"))))
+ "athlon-vector,athlon-fpload2,athlon-fvector*9")
+ (define_insn_reservation "athlon_fldxf_k8" 13
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fmov")
+ (and (eq_attr "memory" "load")
+ (eq_attr "mode" "XF"))))
+@@ -314,7 +399,7 @@
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fpload,athlon-fany")
+ (define_insn_reservation "athlon_fld_k8" 2
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fmov")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fstore")
+@@ -326,7 +411,7 @@
+ (eq_attr "mode" "XF"))))
+ "athlon-vector,(athlon-fpsched+athlon-agu),(athlon-store2+(athlon-fvector*7))")
+ (define_insn_reservation "athlon_fstxf_k8" 8
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fmov")
+ (and (eq_attr "memory" "store,both")
+ (eq_attr "mode" "XF"))))
+@@ -337,16 +422,16 @@
+ (eq_attr "memory" "store,both")))
+ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
+ (define_insn_reservation "athlon_fst_k8" 2
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fmov")
+ (eq_attr "memory" "store,both")))
+ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
+ (define_insn_reservation "athlon_fist" 4
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fistp,fisttp"))
+ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
+ (define_insn_reservation "athlon_fmov" 2
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fmov"))
+ "athlon-direct,athlon-fpsched,athlon-faddmul")
+ (define_insn_reservation "athlon_fadd_load" 4
+@@ -355,12 +440,12 @@
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fpload,athlon-fadd")
+ (define_insn_reservation "athlon_fadd_load_k8" 6
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fop")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_fadd" 4
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fop"))
+ "athlon-direct,athlon-fpsched,athlon-fadd")
+ (define_insn_reservation "athlon_fmul_load" 4
+@@ -369,16 +454,16 @@
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fpload,athlon-fmul")
+ (define_insn_reservation "athlon_fmul_load_k8" 6
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fmul")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fmul")
+ (define_insn_reservation "athlon_fmul" 4
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fmul"))
+ "athlon-direct,athlon-fpsched,athlon-fmul")
+ (define_insn_reservation "athlon_fsgn" 2
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fsgn"))
+ "athlon-direct,athlon-fpsched,athlon-fmul")
+ (define_insn_reservation "athlon_fdiv_load" 24
+@@ -387,7 +472,7 @@
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fpload,athlon-fmul")
+ (define_insn_reservation "athlon_fdiv_load_k8" 13
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fdiv")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fmul")
+@@ -396,16 +481,16 @@
+ (eq_attr "type" "fdiv"))
+ "athlon-direct,athlon-fpsched,athlon-fmul")
+ (define_insn_reservation "athlon_fdiv_k8" 11
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (eq_attr "type" "fdiv"))
+ "athlon-direct,athlon-fpsched,athlon-fmul")
+ (define_insn_reservation "athlon_fpspc_load" 103
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "fpspc")
+ (eq_attr "memory" "load")))
+ "athlon-vector,athlon-fpload,athlon-fvector")
+ (define_insn_reservation "athlon_fpspc" 100
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fpspc"))
+ "athlon-vector,athlon-fpsched,athlon-fvector")
+ (define_insn_reservation "athlon_fcmov_load" 7
+@@ -418,12 +503,12 @@
+ (eq_attr "type" "fcmov"))
+ "athlon-vector,athlon-fpsched,athlon-fvector")
+ (define_insn_reservation "athlon_fcmov_load_k8" 17
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fcmov")
+ (eq_attr "memory" "load")))
+ "athlon-vector,athlon-fploadk8,athlon-fvector")
+ (define_insn_reservation "athlon_fcmov_k8" 15
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (eq_attr "type" "fcmov"))
+ "athlon-vector,athlon-fpsched,athlon-fvector")
+ ;; fcomi is vector decoded by uses only one pipe.
+@@ -434,13 +519,13 @@
+ (eq_attr "memory" "load"))))
+ "athlon-vector,athlon-fpload,athlon-fadd")
+ (define_insn_reservation "athlon_fcomi_load_k8" 5
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fcmp")
+ (and (eq_attr "athlon_decode" "vector")
+ (eq_attr "memory" "load"))))
+ "athlon-vector,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_fcomi" 3
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "athlon_decode" "vector")
+ (eq_attr "type" "fcmp")))
+ "athlon-vector,athlon-fpsched,athlon-fadd")
+@@ -450,18 +535,18 @@
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fpload,athlon-fadd")
+ (define_insn_reservation "athlon_fcom_load_k8" 4
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "fcmp")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_fcom" 2
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (eq_attr "type" "fcmp"))
+ "athlon-direct,athlon-fpsched,athlon-fadd")
+ ;; Never seen by the scheduler because we still don't do post reg-stack
+ ;; scheduling.
+ ;(define_insn_reservation "athlon_fxch" 2
+-; (and (eq_attr "cpu" "athlon,k8,generic64")
++; (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ ; (eq_attr "type" "fxch"))
+ ; "athlon-direct,athlon-fpsched,athlon-fany")
+
+@@ -516,6 +601,23 @@
+ (and (eq_attr "type" "mmxmov,ssemov")
+ (eq_attr "memory" "load")))
+ "athlon-direct,athlon-fploadk8,athlon-fstore")
++;; On AMDFAM10 all double, single and integer packed and scalar SSEx data
++;; loads generated are direct path, latency of 2 and do not use any FP
++;; executions units. No seperate entries for movlpx/movhpx loads, which
++;; are direct path, latency of 4 and use the FADD/FMUL FP execution units,
++;; as they will not be generated.
++(define_insn_reservation "athlon_sseld_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssemov")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8")
++;; On AMDFAM10 MMX data loads generated are direct path, latency of 4
++;; and can use any FP executions units
++(define_insn_reservation "athlon_mmxld_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "mmxmov")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8, athlon-fany")
+ (define_insn_reservation "athlon_mmxssest" 3
+ (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "type" "mmxmov,ssemov")
+@@ -533,6 +635,25 @@
+ (and (eq_attr "type" "mmxmov,ssemov")
+ (eq_attr "memory" "store,both")))
+ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
++;; On AMDFAM10 all double, single and integer packed SSEx data stores
++;; generated are all double path, latency of 2 and use the FSTORE FP
++;; execution unit. No entries seperate for movupx/movdqu, which are
++;; vector path, latency of 3 and use the FSTORE*2 FP execution unit,
++;; as they will not be generated.
++(define_insn_reservation "athlon_ssest_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssemov")
++ (and (eq_attr "mode" "V4SF,V2DF,TI")
++ (eq_attr "memory" "store,both"))))
++ "athlon-double,(athlon-fpsched+athlon-agu),((athlon-fstore+athlon-store)*2)")
++;; On AMDFAM10 all double, single and integer scalar SSEx and MMX
++;; data stores generated are all direct path, latency of 2 and use
++;; the FSTORE FP execution unit
++(define_insn_reservation "athlon_mmxssest_short_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "mmxmov,ssemov")
++ (eq_attr "memory" "store,both")))
++ "athlon-direct,(athlon-fpsched+athlon-agu),(athlon-fstore+athlon-store)")
+ (define_insn_reservation "athlon_movaps_k8" 2
+ (and (eq_attr "cpu" "k8,generic64")
+ (and (eq_attr "type" "ssemov")
+@@ -578,6 +699,11 @@
+ (and (eq_attr "type" "sselog,sselog1")
+ (eq_attr "memory" "load")))
+ "athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
++(define_insn_reservation "athlon_sselog_load_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sselog,sselog1")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)")
+ (define_insn_reservation "athlon_sselog" 3
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "sselog,sselog1"))
+@@ -586,6 +712,11 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "sselog,sselog1"))
+ "athlon-double,athlon-fpsched,athlon-fmul")
++(define_insn_reservation "athlon_sselog_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "sselog,sselog1"))
++ "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)")
++
+ ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that.
+ (define_insn_reservation "athlon_ssecmp_load" 2
+ (and (eq_attr "cpu" "athlon")
+@@ -594,13 +725,13 @@
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fpload,athlon-fadd")
+ (define_insn_reservation "athlon_ssecmp_load_k8" 4
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssecmp")
+ (and (eq_attr "mode" "SF,DF,DI,TI")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_ssecmp" 2
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssecmp")
+ (eq_attr "mode" "SF,DF,DI,TI")))
+ "athlon-direct,athlon-fpsched,athlon-fadd")
+@@ -614,6 +745,11 @@
+ (and (eq_attr "type" "ssecmp")
+ (eq_attr "memory" "load")))
+ "athlon-double,athlon-fpload2k8,(athlon-fadd*2)")
++(define_insn_reservation "athlon_ssecmpvector_load_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecmp")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_ssecmpvector" 3
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "ssecmp"))
+@@ -622,6 +758,10 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "ssecmp"))
+ "athlon-double,athlon-fpsched,(athlon-fadd*2)")
++(define_insn_reservation "athlon_ssecmpvector_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "ssecmp"))
++ "athlon-direct,athlon-fpsched,athlon-fadd")
+ (define_insn_reservation "athlon_ssecomi_load" 4
+ (and (eq_attr "cpu" "athlon")
+ (and (eq_attr "type" "ssecomi")
+@@ -632,10 +772,20 @@
+ (and (eq_attr "type" "ssecomi")
+ (eq_attr "memory" "load")))
+ "athlon-vector,athlon-fploadk8,athlon-fadd")
++(define_insn_reservation "athlon_ssecomi_load_amdfam10" 5
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecomi")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_ssecomi" 4
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (eq_attr "type" "ssecmp"))
+ "athlon-vector,athlon-fpsched,athlon-fadd")
++(define_insn_reservation "athlon_ssecomi_amdfam10" 3
++ (and (eq_attr "cpu" "amdfam10")
++;; It seems athlon_ssecomi has a bug in the attr_type, fixed for amdfam10
++ (eq_attr "type" "ssecomi"))
++ "athlon-direct,athlon-fpsched,athlon-fadd")
+ (define_insn_reservation "athlon_sseadd_load" 4
+ (and (eq_attr "cpu" "athlon")
+ (and (eq_attr "type" "sseadd")
+@@ -643,13 +793,13 @@
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fpload,athlon-fadd")
+ (define_insn_reservation "athlon_sseadd_load_k8" 6
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "sseadd")
+ (and (eq_attr "mode" "SF,DF,DI")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_sseadd" 4
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "sseadd")
+ (eq_attr "mode" "SF,DF,DI")))
+ "athlon-direct,athlon-fpsched,athlon-fadd")
+@@ -663,6 +813,11 @@
+ (and (eq_attr "type" "sseadd")
+ (eq_attr "memory" "load")))
+ "athlon-double,athlon-fpload2k8,(athlon-fadd*2)")
++(define_insn_reservation "athlon_sseaddvector_load_amdfam10" 6
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseadd")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,athlon-fadd")
+ (define_insn_reservation "athlon_sseaddvector" 5
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "sseadd"))
+@@ -671,6 +826,10 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "sseadd"))
+ "athlon-double,athlon-fpsched,(athlon-fadd*2)")
++(define_insn_reservation "athlon_sseaddvector_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "sseadd"))
++ "athlon-direct,athlon-fpsched,athlon-fadd")
+
+ ;; Conversions behaves very irregularly and the scheduling is critical here.
+ ;; Take each instruction separately. Assume that the mode is always set to the
+@@ -684,12 +843,25 @@
+ (and (eq_attr "mode" "DF")
+ (eq_attr "memory" "load")))))
+ "athlon-direct,athlon-fploadk8,athlon-fstore")
++(define_insn_reservation "athlon_ssecvt_cvtss2sd_load_amdfam10" 7
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "DF")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ (define_insn_reservation "athlon_ssecvt_cvtss2sd" 2
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "type" "ssecvt")
+ (and (eq_attr "athlon_decode" "direct")
+ (eq_attr "mode" "DF"))))
+ "athlon-direct,athlon-fpsched,athlon-fstore")
++(define_insn_reservation "athlon_ssecvt_cvtss2sd_amdfam10" 7
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (eq_attr "mode" "DF"))))
++ "athlon-vector,athlon-fpsched,athlon-faddmul,(athlon-fstore*2)")
+ ;; cvtps2pd. Model same way the other double decoded FP conversions.
+ (define_insn_reservation "athlon_ssecvt_cvtps2pd_load_k8" 5
+ (and (eq_attr "cpu" "k8,athlon,generic64")
+@@ -698,12 +870,25 @@
+ (and (eq_attr "mode" "V2DF,V4SF,TI")
+ (eq_attr "memory" "load")))))
+ "athlon-double,athlon-fpload2k8,(athlon-fstore*2)")
++(define_insn_reservation "athlon_ssecvt_cvtps2pd_load_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (and (eq_attr "mode" "V2DF,V4SF,TI")
++ (eq_attr "memory" "load")))))
++ "athlon-direct,athlon-fploadk8,athlon-fstore")
+ (define_insn_reservation "athlon_ssecvt_cvtps2pd_k8" 3
+ (and (eq_attr "cpu" "k8,athlon,generic64")
+ (and (eq_attr "type" "ssecvt")
+ (and (eq_attr "athlon_decode" "double")
+ (eq_attr "mode" "V2DF,V4SF,TI"))))
+ "athlon-double,athlon-fpsched,athlon-fstore,athlon-fstore")
++(define_insn_reservation "athlon_ssecvt_cvtps2pd_amdfam10" 2
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "direct")
++ (eq_attr "mode" "V2DF,V4SF,TI"))))
++ "athlon-direct,athlon-fpsched,athlon-fstore")
+ ;; cvtsi2sd mem,reg is directpath path (cvtsi2sd reg,reg is doublepath)
+ ;; cvtsi2sd has troughput 1 and is executed in store unit with latency of 6
+ (define_insn_reservation "athlon_sseicvt_cvtsi2sd_load" 6
+@@ -713,6 +898,13 @@
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "load")))))
+ "athlon-direct,athlon-fploadk8,athlon-fstore")
++(define_insn_reservation "athlon_sseicvt_cvtsi2sd_load_amdfam10" 9
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "SF,DF")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsi2ss mem, reg is doublepath
+ (define_insn_reservation "athlon_sseicvt_cvtsi2ss_load" 9
+ (and (eq_attr "cpu" "athlon")
+@@ -728,6 +920,13 @@
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "load")))))
+ "athlon-double,athlon-fploadk8,(athlon-fstore*2)")
++(define_insn_reservation "athlon_sseicvt_cvtsi2ss_load_amdfam10" 9
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "SF,DF")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsi2sd reg,reg is double decoded (vector on Athlon)
+ (define_insn_reservation "athlon_sseicvt_cvtsi2sd_k8" 11
+ (and (eq_attr "cpu" "k8,athlon,generic64")
+@@ -736,6 +935,13 @@
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "none")))))
+ "athlon-double,athlon-fploadk8,athlon-fstore")
++(define_insn_reservation "athlon_sseicvt_cvtsi2sd_amdfam10" 14
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "mode" "SF,DF")
++ (eq_attr "memory" "none")))))
++ "athlon-vector,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsi2ss reg, reg is doublepath
+ (define_insn_reservation "athlon_sseicvt_cvtsi2ss" 14
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+@@ -744,6 +950,13 @@
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "none")))))
+ "athlon-vector,athlon-fploadk8,(athlon-fvector*2)")
++(define_insn_reservation "athlon_sseicvt_cvtsi2ss_amdfam10" 14
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "mode" "SF,DF")
++ (eq_attr "memory" "none")))))
++ "athlon-vector,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsd2ss mem,reg is doublepath, troughput unknown, latency 9
+ (define_insn_reservation "athlon_ssecvt_cvtsd2ss_load_k8" 9
+ (and (eq_attr "cpu" "k8,athlon,generic64")
+@@ -752,6 +965,13 @@
+ (and (eq_attr "mode" "SF")
+ (eq_attr "memory" "load")))))
+ "athlon-double,athlon-fploadk8,(athlon-fstore*3)")
++(define_insn_reservation "athlon_ssecvt_cvtsd2ss_load_amdfam10" 9
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "SF")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsd2ss reg,reg is vectorpath, troughput unknown, latency 12
+ (define_insn_reservation "athlon_ssecvt_cvtsd2ss" 12
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+@@ -760,6 +980,13 @@
+ (and (eq_attr "mode" "SF")
+ (eq_attr "memory" "none")))))
+ "athlon-vector,athlon-fpsched,(athlon-fvector*3)")
++(define_insn_reservation "athlon_ssecvt_cvtsd2ss_amdfam10" 8
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "vector")
++ (and (eq_attr "mode" "SF")
++ (eq_attr "memory" "none")))))
++ "athlon-vector,athlon-fpsched,athlon-faddmul,(athlon-fstore*2)")
+ (define_insn_reservation "athlon_ssecvt_cvtpd2ps_load_k8" 8
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+ (and (eq_attr "type" "ssecvt")
+@@ -767,6 +994,13 @@
+ (and (eq_attr "mode" "V4SF,V2DF,TI")
+ (eq_attr "memory" "load")))))
+ "athlon-double,athlon-fpload2k8,(athlon-fstore*3)")
++(define_insn_reservation "athlon_ssecvt_cvtpd2ps_load_amdfam10" 9
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "V4SF,V2DF,TI")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
+ ;; cvtpd2ps mem,reg is vectorpath, troughput unknown, latency 10
+ ;; ??? Why it is fater than cvtsd2ss?
+ (define_insn_reservation "athlon_ssecvt_cvtpd2ps" 8
+@@ -776,6 +1010,13 @@
+ (and (eq_attr "mode" "V4SF,V2DF,TI")
+ (eq_attr "memory" "none")))))
+ "athlon-vector,athlon-fpsched,athlon-fvector*2")
++(define_insn_reservation "athlon_ssecvt_cvtpd2ps_amdfam10" 7
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssecvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "V4SF,V2DF,TI")
++ (eq_attr "memory" "none")))))
++ "athlon-double,athlon-fpsched,(athlon-faddmul+athlon-fstore)")
+ ;; cvtsd2si mem,reg is doublepath, troughput 1, latency 9
+ (define_insn_reservation "athlon_secvt_cvtsX2si_load" 9
+ (and (eq_attr "cpu" "athlon,k8,generic64")
+@@ -784,6 +1025,13 @@
+ (and (eq_attr "mode" "SI,DI")
+ (eq_attr "memory" "load")))))
+ "athlon-vector,athlon-fploadk8,athlon-fvector")
++(define_insn_reservation "athlon_secvt_cvtsX2si_load_amdfam10" 10
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "SI,DI")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-fadd+athlon-fstore)")
+ ;; cvtsd2si reg,reg is doublepath, troughput 1, latency 9
+ (define_insn_reservation "athlon_ssecvt_cvtsX2si" 9
+ (and (eq_attr "cpu" "athlon")
+@@ -799,6 +1047,29 @@
+ (and (eq_attr "mode" "SI,DI")
+ (eq_attr "memory" "none")))))
+ "athlon-double,athlon-fpsched,athlon-fstore")
++(define_insn_reservation "athlon_ssecvt_cvtsX2si_amdfam10" 8
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "SI,DI")
++ (eq_attr "memory" "none")))))
++ "athlon-double,athlon-fpsched,(athlon-fadd+athlon-fstore)")
++;; cvtpd2dq reg,mem is doublepath, troughput 1, latency 9 on amdfam10
++(define_insn_reservation "athlon_sseicvt_cvtpd2dq_load_amdfam10" 9
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "TI")
++ (eq_attr "memory" "load")))))
++ "athlon-double,athlon-fploadk8,(athlon-faddmul+athlon-fstore)")
++;; cvtpd2dq reg,mem is doublepath, troughput 1, latency 7 on amdfam10
++(define_insn_reservation "athlon_sseicvt_cvtpd2dq_amdfam10" 7
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseicvt")
++ (and (eq_attr "amdfam10_decode" "double")
++ (and (eq_attr "mode" "TI")
++ (eq_attr "memory" "none")))))
++ "athlon-double,athlon-fpsched,(athlon-faddmul+athlon-fstore)")
+
+
+ (define_insn_reservation "athlon_ssemul_load" 4
+@@ -808,13 +1079,13 @@
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fpload,athlon-fmul")
+ (define_insn_reservation "athlon_ssemul_load_k8" 6
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssemul")
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fploadk8,athlon-fmul")
+ (define_insn_reservation "athlon_ssemul" 4
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssemul")
+ (eq_attr "mode" "SF,DF")))
+ "athlon-direct,athlon-fpsched,athlon-fmul")
+@@ -828,6 +1099,11 @@
+ (and (eq_attr "type" "ssemul")
+ (eq_attr "memory" "load")))
+ "athlon-double,athlon-fpload2k8,(athlon-fmul*2)")
++(define_insn_reservation "athlon_ssemulvector_load_amdfam10" 6
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssemul")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,athlon-fmul")
+ (define_insn_reservation "athlon_ssemulvector" 5
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "ssemul"))
+@@ -836,6 +1112,10 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "ssemul"))
+ "athlon-double,athlon-fpsched,(athlon-fmul*2)")
++(define_insn_reservation "athlon_ssemulvector_amdfam10" 4
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "ssemul"))
++ "athlon-direct,athlon-fpsched,athlon-fmul")
+ ;; divsd timings. divss is faster
+ (define_insn_reservation "athlon_ssediv_load" 20
+ (and (eq_attr "cpu" "athlon")
+@@ -844,13 +1124,13 @@
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fpload,athlon-fmul*17")
+ (define_insn_reservation "athlon_ssediv_load_k8" 22
+- (and (eq_attr "cpu" "k8,generic64")
++ (and (eq_attr "cpu" "k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssediv")
+ (and (eq_attr "mode" "SF,DF")
+ (eq_attr "memory" "load"))))
+ "athlon-direct,athlon-fploadk8,athlon-fmul*17")
+ (define_insn_reservation "athlon_ssediv" 20
+- (and (eq_attr "cpu" "athlon,k8,generic64")
++ (and (eq_attr "cpu" "athlon,k8,generic64,amdfam10")
+ (and (eq_attr "type" "ssediv")
+ (eq_attr "mode" "SF,DF")))
+ "athlon-direct,athlon-fpsched,athlon-fmul*17")
+@@ -864,6 +1144,11 @@
+ (and (eq_attr "type" "ssediv")
+ (eq_attr "memory" "load")))
+ "athlon-double,athlon-fpload2k8,athlon-fmul*34")
++(define_insn_reservation "athlon_ssedivvector_load_amdfam10" 22
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "ssediv")
++ (eq_attr "memory" "load")))
++ "athlon-direct,athlon-fploadk8,athlon-fmul*17")
+ (define_insn_reservation "athlon_ssedivvector" 39
+ (and (eq_attr "cpu" "athlon")
+ (eq_attr "type" "ssediv"))
+@@ -872,3 +1157,12 @@
+ (and (eq_attr "cpu" "k8,generic64")
+ (eq_attr "type" "ssediv"))
+ "athlon-double,athlon-fmul*34")
++(define_insn_reservation "athlon_ssedivvector_amdfam10" 20
++ (and (eq_attr "cpu" "amdfam10")
++ (eq_attr "type" "ssediv"))
++ "athlon-direct,athlon-fmul*17")
++(define_insn_reservation "athlon_sseins_amdfam10" 5
++ (and (eq_attr "cpu" "amdfam10")
++ (and (eq_attr "type" "sseins")
++ (eq_attr "mode" "TI")))
++ "athlon-vector,athlon-fpsched,athlon-faddmul")
+--- gcc/config/i386/pmmintrin.h.jj 2006-10-05 00:29:29.000000000 +0200
++++ gcc/config/i386/pmmintrin.h 2007-02-09 21:26:06.000000000 +0100
+@@ -30,7 +30,11 @@
+ #ifndef _PMMINTRIN_H_INCLUDED
+ #define _PMMINTRIN_H_INCLUDED
+
+-#ifdef __SSE3__
++#ifndef __SSE3__
++# error "SSE3 instruction set not enabled"
++#else
++
++/* We need definitions from the SSE2 and SSE header files*/
+ #include <xmmintrin.h>
+ #include <emmintrin.h>
+
+--- gcc/config/i386/tmmintrin.h.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config/i386/tmmintrin.h 2007-02-09 21:26:06.000000000 +0100
+@@ -30,7 +30,11 @@
+ #ifndef _TMMINTRIN_H_INCLUDED
+ #define _TMMINTRIN_H_INCLUDED
+
+-#ifdef __SSSE3__
++#ifndef __SSSE3__
++# error "SSSE3 instruction set not enabled"
++#else
++
++/* We need definitions from the SSE3, SSE2 and SSE header files*/
+ #include <pmmintrin.h>
+
+ static __inline __m128i
+--- gcc/config/i386/sse.md.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config/i386/sse.md 2007-02-09 21:26:06.000000000 +0100
+@@ -963,6 +963,7 @@
+ "cvtsi2ss\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "vector,double")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "mode" "SF")])
+
+ (define_insn "sse_cvtsi2ssq"
+@@ -976,6 +977,7 @@
+ "cvtsi2ssq\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "vector,double")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "mode" "SF")])
+
+ (define_insn "sse_cvtss2si"
+@@ -989,6 +991,7 @@
+ "cvtss2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "SI")])
+
+ (define_insn "sse_cvtss2siq"
+@@ -1002,6 +1005,7 @@
+ "cvtss2siq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "DI")])
+
+ (define_insn "sse_cvttss2si"
+@@ -1014,6 +1018,7 @@
+ "cvttss2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "SI")])
+
+ (define_insn "sse_cvttss2siq"
+@@ -1026,6 +1031,7 @@
+ "cvttss2siq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "DI")])
+
+ (define_insn "sse2_cvtdq2ps"
+@@ -1921,7 +1927,8 @@
+ "cvtsi2sd\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+- (set_attr "athlon_decode" "double,direct")])
++ (set_attr "athlon_decode" "double,direct")
++ (set_attr "amdfam10_decode" "vector,double")])
+
+ (define_insn "sse2_cvtsi2sdq"
+ [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+@@ -1934,7 +1941,8 @@
+ "cvtsi2sdq\t{%2, %0|%0, %2}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DF")
+- (set_attr "athlon_decode" "double,direct")])
++ (set_attr "athlon_decode" "double,direct")
++ (set_attr "amdfam10_decode" "vector,double")])
+
+ (define_insn "sse2_cvtsd2si"
+ [(set (match_operand:SI 0 "register_operand" "=r,r")
+@@ -1947,6 +1955,7 @@
+ "cvtsd2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "SI")])
+
+ (define_insn "sse2_cvtsd2siq"
+@@ -1960,6 +1969,7 @@
+ "cvtsd2siq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")
+ (set_attr "mode" "DI")])
+
+ (define_insn "sse2_cvttsd2si"
+@@ -1972,7 +1982,8 @@
+ "cvttsd2si\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "SI")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ (define_insn "sse2_cvttsd2siq"
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+@@ -1984,7 +1995,8 @@
+ "cvttsd2siq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "sseicvt")
+ (set_attr "mode" "DI")
+- (set_attr "athlon_decode" "double,vector")])
++ (set_attr "athlon_decode" "double,vector")
++ (set_attr "amdfam10_decode" "double,double")])
+
+ (define_insn "sse2_cvtdq2pd"
+ [(set (match_operand:V2DF 0 "register_operand" "=x")
+@@ -2015,7 +2027,8 @@
+ "TARGET_SSE2"
+ "cvtpd2dq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "ssecvt")
+- (set_attr "mode" "TI")])
++ (set_attr "mode" "TI")
++ (set_attr "amdfam10_decode" "double")])
+
+ (define_expand "sse2_cvttpd2dq"
+ [(set (match_operand:V4SI 0 "register_operand" "")
+@@ -2033,7 +2046,8 @@
+ "TARGET_SSE2"
+ "cvttpd2dq\t{%1, %0|%0, %1}"
+ [(set_attr "type" "ssecvt")
+- (set_attr "mode" "TI")])
++ (set_attr "mode" "TI")
++ (set_attr "amdfam10_decode" "double")])
+
+ (define_insn "sse2_cvtsd2ss"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+@@ -2047,20 +2061,22 @@
+ "cvtsd2ss\t{%2, %0|%0, %2}"
+ [(set_attr "type" "ssecvt")
+ (set_attr "athlon_decode" "vector,double")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "mode" "SF")])
+
+ (define_insn "sse2_cvtss2sd"
+- [(set (match_operand:V2DF 0 "register_operand" "=x")
++ [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+ (vec_merge:V2DF
+ (float_extend:V2DF
+ (vec_select:V2SF
+- (match_operand:V4SF 2 "nonimmediate_operand" "xm")
++ (match_operand:V4SF 2 "nonimmediate_operand" "x,m")
+ (parallel [(const_int 0) (const_int 1)])))
+- (match_operand:V2DF 1 "register_operand" "0")
++ (match_operand:V2DF 1 "register_operand" "0,0")
+ (const_int 1)))]
+ "TARGET_SSE2"
+ "cvtss2sd\t{%2, %0|%0, %2}"
+ [(set_attr "type" "ssecvt")
++ (set_attr "amdfam10_decode" "vector,double")
+ (set_attr "mode" "DF")])
+
+ (define_expand "sse2_cvtpd2ps"
+@@ -2081,7 +2097,8 @@
+ "TARGET_SSE2"
+ "cvtpd2ps\t{%1, %0|%0, %1}"
+ [(set_attr "type" "ssecvt")
+- (set_attr "mode" "V4SF")])
++ (set_attr "mode" "V4SF")
++ (set_attr "amdfam10_decode" "double")])
+
+ (define_insn "sse2_cvtps2pd"
+ [(set (match_operand:V2DF 0 "register_operand" "=x")
+@@ -2092,7 +2109,8 @@
+ "TARGET_SSE2"
+ "cvtps2pd\t{%1, %0|%0, %1}"
+ [(set_attr "type" "ssecvt")
+- (set_attr "mode" "V2DF")])
++ (set_attr "mode" "V2DF")
++ (set_attr "amdfam10_decode" "direct")])
+
+ ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ ;;
+@@ -4550,3 +4568,92 @@
+ "pabs<mmxvecsize>\t{%1, %0|%0, %1}";
+ [(set_attr "type" "sselog1")
+ (set_attr "mode" "DI")])
++
++;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
++;;
++;; AMD SSE4A instructions
++;;
++;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
++
++(define_insn "sse4a_vmmovntv2df"
++ [(set (match_operand:DF 0 "memory_operand" "=m")
++ (unspec:DF [(vec_select:DF
++ (match_operand:V2DF 1 "register_operand" "x")
++ (parallel [(const_int 0)]))]
++ UNSPEC_MOVNT))]
++ "TARGET_SSE4A"
++ "movntsd\t{%1, %0|%0, %1}"
++ [(set_attr "type" "ssemov")
++ (set_attr "mode" "DF")])
++
++(define_insn "sse4a_movntdf"
++ [(set (match_operand:DF 0 "memory_operand" "=m")
++ (unspec:DF [(match_operand:DF 1 "register_operand" "x")]
++ UNSPEC_MOVNT))]
++ "TARGET_SSE4A"
++ "movntsd\t{%1, %0|%0, %1}"
++ [(set_attr "type" "ssemov")
++ (set_attr "mode" "DF")])
++
++(define_insn "sse4a_vmmovntv4sf"
++ [(set (match_operand:SF 0 "memory_operand" "=m")
++ (unspec:SF [(vec_select:SF
++ (match_operand:V4SF 1 "register_operand" "x")
++ (parallel [(const_int 0)]))]
++ UNSPEC_MOVNT))]
++ "TARGET_SSE4A"
++ "movntss\t{%1, %0|%0, %1}"
++ [(set_attr "type" "ssemov")
++ (set_attr "mode" "SF")])
++
++(define_insn "sse4a_movntsf"
++ [(set (match_operand:SF 0 "memory_operand" "=m")
++ (unspec:SF [(match_operand:SF 1 "register_operand" "x")]
++ UNSPEC_MOVNT))]
++ "TARGET_SSE4A"
++ "movntss\t{%1, %0|%0, %1}"
++ [(set_attr "type" "ssemov")
++ (set_attr "mode" "SF")])
++
++(define_insn "sse4a_extrqi"
++ [(set (match_operand:V2DI 0 "register_operand" "=x")
++ (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
++ (match_operand 2 "const_int_operand" "")
++ (match_operand 3 "const_int_operand" "")]
++ UNSPEC_EXTRQI))]
++ "TARGET_SSE4A"
++ "extrq\t{%3, %2, %0|%0, %2, %3}"
++ [(set_attr "type" "sse")
++ (set_attr "mode" "TI")])
++
++(define_insn "sse4a_extrq"
++ [(set (match_operand:V2DI 0 "register_operand" "=x")
++ (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
++ (match_operand:V16QI 2 "register_operand" "x")]
++ UNSPEC_EXTRQ))]
++ "TARGET_SSE4A"
++ "extrq\t{%2, %0|%0, %2}"
++ [(set_attr "type" "sse")
++ (set_attr "mode" "TI")])
++
++(define_insn "sse4a_insertqi"
++ [(set (match_operand:V2DI 0 "register_operand" "=x")
++ (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
++ (match_operand:V2DI 2 "register_operand" "x")
++ (match_operand 3 "const_int_operand" "")
++ (match_operand 4 "const_int_operand" "")]
++ UNSPEC_INSERTQI))]
++ "TARGET_SSE4A"
++ "insertq\t{%4, %3, %2, %0|%0, %2, %3, %4}"
++ [(set_attr "type" "sseins")
++ (set_attr "mode" "TI")])
++
++(define_insn "sse4a_insertq"
++ [(set (match_operand:V2DI 0 "register_operand" "=x")
++ (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "0")
++ (match_operand:V2DI 2 "register_operand" "x")]
++ UNSPEC_INSERTQ))]
++ "TARGET_SSE4A"
++ "insertq\t{%2, %0|%0, %2}"
++ [(set_attr "type" "sseins")
++ (set_attr "mode" "TI")])
+--- gcc/config/i386/i386.opt.jj 2007-02-09 16:18:25.000000000 +0100
++++ gcc/config/i386/i386.opt 2007-02-09 21:26:06.000000000 +0100
+@@ -205,6 +205,22 @@ mmni
+ Target Undocumented Mask(SSSE3) MaskExists
+ Support MMX, SSE, SSE2, SSE3 and SSSE3 built-in functions and code generation
+
++msse4a
++Target Report Mask(SSE4A)
++Support MMX, SSE, SSE2, SSE3 and SSE4A built-in functions and code generation
++
++mpopcnt
++Target Report Mask(POPCNT)
++Support code generation of popcount instruction for popcount built-ins
++namely __builtin_popcount, __builtin_popcountl and __builtin_popcountll
++
++mabm
++Target Report Mask(ABM)
++Support code generation of Advanced Bit Manipulation (ABM) instructions,
++which include popcnt and lzcnt instructions, for popcount and clz built-ins
++namely __builtin_popcount, __builtin_popcountl, __builtin_popcountll and
++__builtin_clz, __builtin_clzl, __builtin_clzll
++
+ msseregparm
+ Target RejectNegative Mask(SSEREGPARM)
+ Use SSE register passing conventions for SF and DF mode
+--- gcc/config/i386/ammintrin.h.jj 2007-02-09 21:26:06.000000000 +0100
++++ gcc/config/i386/ammintrin.h 2007-02-09 21:26:06.000000000 +0100
+@@ -0,0 +1,73 @@
++/* Copyright (C) 2007 Free Software Foundation, Inc.
++
++ This file is part of GCC.
++
++ GCC is free software; you can redistribute it and/or modify
++ it under the terms of the GNU General Public License as published by
++ the Free Software Foundation; either version 2, or (at your option)
++ any later version.
++
++ GCC is distributed in the hope that it will be useful,
++ but WITHOUT ANY WARRANTY; without even the implied warranty of
++ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
++ GNU General Public License for more details.
++
++ You should have received a copy of the GNU General Public License
++ along with GCC; see the file COPYING. If not, write to
++ the Free Software Foundation, 51 Franklin Street, Fifth Floor,
++ Boston, MA 02110-1301, USA. */
++
++/* As a special exception, if you include this header file into source
++ files compiled by GCC, this header file does not by itself cause
++ the resulting executable to be covered by the GNU General Public
++ License. This exception does not however invalidate any other
++ reasons why the executable file might be covered by the GNU General
++ Public License. */
++
++/* Implemented from the specification included in the AMD Programmers
++ Manual Update, version 2.x */
++
++#ifndef _AMMINTRIN_H_INCLUDED
++#define _AMMINTRIN_H_INCLUDED
++
++#ifndef __SSE4A__
++# error "SSE4A instruction set not enabled"
++#else
++
++/* We need definitions from the SSE3, SSE2 and SSE header files*/
++#include <pmmintrin.h>
++
++static __inline void __attribute__((__always_inline__))
++_mm_stream_sd (double * __P, __m128d __Y)
++{
++ __builtin_ia32_movntsd (__P, (__v2df) __Y);
++}
++
++static __inline void __attribute__((__always_inline__))
++_mm_stream_ss (float * __P, __m128 __Y)
++{
++ __builtin_ia32_movntss (__P, (__v4sf) __Y);
++}
++
++static __inline __m128i __attribute__((__always_inline__))
++_mm_extract_si64 (__m128i __X, __m128i __Y)
++{
++ return (__m128i) __builtin_ia32_extrq ((__v2di) __X, (__v16qi) __Y);
++}
++
++#define _mm_extracti_si64(X, I, L) \
++((__m128i) __builtin_ia32_extrqi ((__v2di)(X), I, L))
++
++static __inline __m128i __attribute__((__always_inline__))
++_mm_insert_si64 (__m128i __X,__m128i __Y)
++{
++ return (__m128i) __builtin_ia32_insertq ((__v2di)__X, (__v2di)__Y);
++}
++
++#define _mm_inserti_si64(X, Y, I, L) \
++((__m128i) __builtin_ia32_insertqi ((__v2di)(X), (__v2di)(Y), I, L))
++
++
++#endif /* __SSE4A__ */
++
++#endif /* _AMMINTRIN_H_INCLUDED */
+--- gcc/config/i386/emmintrin.h.jj 2006-10-05 00:29:29.000000000 +0200
++++ gcc/config/i386/emmintrin.h 2007-02-09 21:26:06.000000000 +0100
+@@ -30,7 +30,11 @@
+ #ifndef _EMMINTRIN_H_INCLUDED
+ #define _EMMINTRIN_H_INCLUDED
+
+-#ifdef __SSE2__
++#ifndef __SSE2__
++# error "SSE2 instruction set not enabled"
++#else
++
++/* We need definitions from the SSE header files*/
+ #include <xmmintrin.h>
+
+ /* SSE2 */
+--- gcc/config/i386/i386.c.jj 2007-02-09 16:24:00.000000000 +0100
++++ gcc/config/i386/i386.c 2007-02-10 19:47:05.000000000 +0100
+@@ -534,6 +534,71 @@ struct processor_costs k8_cost = {
+ COSTS_N_INSNS (35), /* cost of FSQRT instruction. */
+ };
+
++struct processor_costs amdfam10_cost = {
++ COSTS_N_INSNS (1), /* cost of an add instruction */
++ COSTS_N_INSNS (2), /* cost of a lea instruction */
++ COSTS_N_INSNS (1), /* variable shift costs */
++ COSTS_N_INSNS (1), /* constant shift costs */
++ {COSTS_N_INSNS (3), /* cost of starting multiply for QI */
++ COSTS_N_INSNS (4), /* HI */
++ COSTS_N_INSNS (3), /* SI */
++ COSTS_N_INSNS (4), /* DI */
++ COSTS_N_INSNS (5)}, /* other */
++ 0, /* cost of multiply per each bit set */
++ {COSTS_N_INSNS (19), /* cost of a divide/mod for QI */
++ COSTS_N_INSNS (35), /* HI */
++ COSTS_N_INSNS (51), /* SI */
++ COSTS_N_INSNS (83), /* DI */
++ COSTS_N_INSNS (83)}, /* other */
++ COSTS_N_INSNS (1), /* cost of movsx */
++ COSTS_N_INSNS (1), /* cost of movzx */
++ 8, /* "large" insn */
++ 9, /* MOVE_RATIO */
++ 4, /* cost for loading QImode using movzbl */
++ {3, 4, 3}, /* cost of loading integer registers
++ in QImode, HImode and SImode.
++ Relative to reg-reg move (2). */
++ {3, 4, 3}, /* cost of storing integer registers */
++ 4, /* cost of reg,reg fld/fst */
++ {4, 4, 12}, /* cost of loading fp registers
++ in SFmode, DFmode and XFmode */
++ {6, 6, 8}, /* cost of storing fp registers
++ in SFmode, DFmode and XFmode */
++ 2, /* cost of moving MMX register */
++ {3, 3}, /* cost of loading MMX registers
++ in SImode and DImode */
++ {4, 4}, /* cost of storing MMX registers
++ in SImode and DImode */
++ 2, /* cost of moving SSE register */
++ {4, 4, 3}, /* cost of loading SSE registers
++ in SImode, DImode and TImode */
++ {4, 4, 5}, /* cost of storing SSE registers
++ in SImode, DImode and TImode */
++ 3, /* MMX or SSE register to integer */
++ /* On K8
++ MOVD reg64, xmmreg Double FSTORE 4
++ MOVD reg32, xmmreg Double FSTORE 4
++ On AMDFAM10
++ MOVD reg64, xmmreg Double FADD 3
++ 1/1 1/1
++ MOVD reg32, xmmreg Double FADD 3
++ 1/1 1/1 */
++ 64, /* size of prefetch block */
++ /* New AMD processors never drop prefetches; if they cannot be performed
++ immediately, they are queued. We set number of simultaneous prefetches
++ to a large constant to reflect this (it probably is not a good idea not
++ to limit number of prefetches at all, as their execution also takes some
++ time). */
++ 100, /* number of parallel prefetches */
++ 5, /* Branch cost */
++ COSTS_N_INSNS (4), /* cost of FADD and FSUB insns. */
++ COSTS_N_INSNS (4), /* cost of FMUL instruction. */
++ COSTS_N_INSNS (19), /* cost of FDIV instruction. */
++ COSTS_N_INSNS (2), /* cost of FABS instruction. */
++ COSTS_N_INSNS (2), /* cost of FCHS instruction. */
++ COSTS_N_INSNS (35), /* cost of FSQRT instruction. */
++};
++
+ static const
+ struct processor_costs pentium4_cost = {
+ COSTS_N_INSNS (1), /* cost of an add instruction */
+@@ -816,11 +881,13 @@ const struct processor_costs *ix86_cost
+ #define m_PENT4 (1<<PROCESSOR_PENTIUM4)
+ #define m_K8 (1<<PROCESSOR_K8)
+ #define m_ATHLON_K8 (m_K8 | m_ATHLON)
++#define m_AMDFAM10 (1<<PROCESSOR_AMDFAM10)
+ #define m_NOCONA (1<<PROCESSOR_NOCONA)
+ #define m_CORE2 (1<<PROCESSOR_CORE2)
+ #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
+ #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
+ #define m_GENERIC (m_GENERIC32 | m_GENERIC64)
++#define m_ATHLON_K8_AMDFAM10 (m_K8 | m_ATHLON | m_AMDFAM10)
+
+ /* Generic instruction choice should be common subset of supported CPUs
+ (PPro/PENT4/NOCONA/CORE2/Athlon/K8). */
+@@ -828,23 +895,31 @@ const struct processor_costs *ix86_cost
+ /* Leave is not affecting Nocona SPEC2000 results negatively, so enabling for
+ Generic64 seems like good code size tradeoff. We can't enable it for 32bit
+ generic because it is not working well with PPro base chips. */
+-const int x86_use_leave = m_386 | m_K6_GEODE | m_ATHLON_K8 | m_CORE2 | m_GENERIC64;
+-const int x86_push_memory = m_386 | m_K6_GEODE | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
++const int x86_use_leave = m_386 | m_K6_GEODE | m_ATHLON_K8_AMDFAM10 | m_CORE2
++ | m_GENERIC64;
++const int x86_push_memory = m_386 | m_K6_GEODE | m_ATHLON_K8_AMDFAM10 | m_PENT4
++ | m_NOCONA | m_CORE2 | m_GENERIC;
+ const int x86_zero_extend_with_and = m_486 | m_PENT;
+-const int x86_movx = m_ATHLON_K8 | m_PPRO | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC | m_GEODE /* m_386 | m_K6 */;
++/* Enable to zero extend integer registers to avoid partial dependencies */
++const int x86_movx = m_ATHLON_K8_AMDFAM10 | m_PPRO | m_PENT4 | m_NOCONA
++ | m_CORE2 | m_GENERIC | m_GEODE /* m_386 | m_K6 */;
+ const int x86_double_with_add = ~m_386;
+ const int x86_use_bit_test = m_386;
+-const int x86_unroll_strlen = m_486 | m_PENT | m_PPRO | m_ATHLON_K8 | m_K6 | m_CORE2 | m_GENERIC;
+-const int x86_cmove = m_PPRO | m_GEODE | m_ATHLON_K8 | m_PENT4 | m_NOCONA;
++const int x86_unroll_strlen = m_486 | m_PENT | m_PPRO | m_ATHLON_K8_AMDFAM10
++ | m_K6 | m_CORE2 | m_GENERIC;
++const int x86_cmove = m_PPRO | m_GEODE | m_ATHLON_K8_AMDFAM10 | m_PENT4
++ | m_NOCONA;
+ const int x86_fisttp = m_NOCONA;
+-const int x86_3dnow_a = m_ATHLON_K8;
+-const int x86_deep_branch = m_PPRO | m_K6_GEODE | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
++const int x86_3dnow_a = m_ATHLON_K8_AMDFAM10;
++const int x86_deep_branch = m_PPRO | m_K6_GEODE | m_ATHLON_K8_AMDFAM10
++ | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+ /* Branch hints were put in P4 based on simulation result. But
+ after P4 was made, no performance benefit was observed with
+ branch hints. It also increases the code size. As the result,
+ icc never generates branch hints. */
+ const int x86_branch_hints = 0;
+-const int x86_use_sahf = m_PPRO | m_K6_GEODE | m_PENT4 | m_NOCONA | m_GENERIC32; /*m_GENERIC | m_ATHLON_K8 ? */
++const int x86_use_sahf = m_PPRO | m_K6_GEODE | m_PENT4 | m_NOCONA | m_GENERIC32;
++ /*m_GENERIC | m_ATHLON_K8 ? */
+ /* We probably ought to watch for partial register stalls on Generic32
+ compilation setting as well. However in current implementation the
+ partial register stalls are not eliminated very well - they can
+@@ -856,13 +931,16 @@ const int x86_use_sahf = m_PPRO | m_K6_G
+ const int x86_partial_reg_stall = m_PPRO;
+ const int x86_partial_flag_reg_stall = m_CORE2 | m_GENERIC;
+ const int x86_use_himode_fiop = m_386 | m_486 | m_K6_GEODE;
+-const int x86_use_simode_fiop = ~(m_PPRO | m_ATHLON_K8 | m_PENT | m_CORE2 | m_GENERIC);
++const int x86_use_simode_fiop = ~(m_PPRO | m_ATHLON_K8_AMDFAM10 | m_PENT
++ | m_CORE2 | m_GENERIC);
+ const int x86_use_mov0 = m_K6;
+ const int x86_use_cltd = ~(m_PENT | m_K6 | m_CORE2 | m_GENERIC);
+ const int x86_read_modify_write = ~m_PENT;
+ const int x86_read_modify = ~(m_PENT | m_PPRO);
+ const int x86_split_long_moves = m_PPRO;
+-const int x86_promote_QImode = m_K6_GEODE | m_PENT | m_386 | m_486 | m_ATHLON_K8 | m_CORE2 | m_GENERIC; /* m_PENT4 ? */
++const int x86_promote_QImode = m_K6_GEODE | m_PENT | m_386 | m_486
++ | m_ATHLON_K8_AMDFAM10 | m_CORE2 | m_GENERIC;
++ /* m_PENT4 ? */
+ const int x86_fast_prefix = ~(m_PENT | m_486 | m_386);
+ const int x86_single_stringop = m_386 | m_PENT4 | m_NOCONA;
+ const int x86_qimode_math = ~(0);
+@@ -872,18 +950,37 @@ const int x86_promote_qi_regs = 0;
+ if our scheme for avoiding partial stalls was more effective. */
+ const int x86_himode_math = ~(m_PPRO);
+ const int x86_promote_hi_regs = m_PPRO;
+-const int x86_sub_esp_4 = m_ATHLON_K8 | m_PPRO | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_sub_esp_8 = m_ATHLON_K8 | m_PPRO | m_386 | m_486 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_add_esp_4 = m_ATHLON_K8 | m_K6_GEODE | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_add_esp_8 = m_ATHLON_K8 | m_PPRO | m_K6_GEODE | m_386 | m_486 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_integer_DFmode_moves = ~(m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC | m_GEODE);
+-const int x86_partial_reg_dependency = m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_memory_mismatch_stall = m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_accumulate_outgoing_args = m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC;
++/* Enable if add/sub rsp is preferred over 1 or 2 push/pop */
++const int x86_sub_esp_4 = m_ATHLON_K8_AMDFAM10 | m_PPRO | m_PENT4 | m_NOCONA
++ | m_CORE2 | m_GENERIC;
++const int x86_sub_esp_8 = m_ATHLON_K8_AMDFAM10 | m_PPRO | m_386 | m_486
++ | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
++const int x86_add_esp_4 = m_ATHLON_K8_AMDFAM10 | m_K6_GEODE | m_PENT4 | m_NOCONA
++ | m_CORE2 | m_GENERIC;
++const int x86_add_esp_8 = m_ATHLON_K8_AMDFAM10 | m_PPRO | m_K6_GEODE | m_386
++ | m_486 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
++/* Enable if integer moves are preferred for DFmode copies */
++const int x86_integer_DFmode_moves = ~(m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA
++ | m_PPRO | m_CORE2 | m_GENERIC | m_GEODE);
++const int x86_partial_reg_dependency = m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA
++ | m_CORE2 | m_GENERIC;
++const int x86_memory_mismatch_stall = m_ATHLON_K8_AMDFAM10 | m_PENT4 | m_NOCONA
++ | m_CORE2 | m_GENERIC;
++/* If ACCUMULATE_OUTGOING_ARGS is enabled, the maximum amount of space required
++ for outgoing arguments will be computed and placed into the variable
++ `current_function_outgoing_args_size'. No space will be pushed onto the stack
++ for each call; instead, the function prologue should increase the stack frame
++ size by this amount. Setting both PUSH_ARGS and ACCUMULATE_OUTGOING_ARGS is
++ not proper. */
++const int x86_accumulate_outgoing_args = m_ATHLON_K8_AMDFAM10 | m_PENT4
++ | m_NOCONA | m_PPRO | m_CORE2
++ | m_GENERIC;
+ const int x86_prologue_using_move = m_ATHLON_K8 | m_PPRO | m_CORE2 | m_GENERIC;
+ const int x86_epilogue_using_move = m_ATHLON_K8 | m_PPRO | m_CORE2 | m_GENERIC;
+ const int x86_shift1 = ~m_486;
+-const int x86_arch_always_fancy_math_387 = m_PENT | m_PPRO | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
++const int x86_arch_always_fancy_math_387 = m_PENT | m_PPRO
++ | m_ATHLON_K8_AMDFAM10 | m_PENT4
++ | m_NOCONA | m_CORE2 | m_GENERIC;
+ /* In Generic model we have an confict here in between PPro/Pentium4 based chips
+ that thread 128bit SSE registers as single units versus K8 based chips that
+ divide SSE registers to two 64bit halves.
+@@ -893,15 +990,66 @@ const int x86_arch_always_fancy_math_387
+ this option on P4 brings over 20% SPECfp regression, while enabling it on
+ K8 brings roughly 2.4% regression that can be partly masked by careful scheduling
+ of moves. */
+-const int x86_sse_partial_reg_dependency = m_PENT4 | m_NOCONA | m_PPRO | m_CORE2 | m_GENERIC;
++const int x86_sse_partial_reg_dependency = m_PENT4 | m_NOCONA | m_PPRO | m_CORE2
++ | m_GENERIC | m_AMDFAM10;
+ /* Set for machines where the type and dependencies are resolved on SSE
+ register parts instead of whole registers, so we may maintain just
+ lower part of scalar values in proper format leaving the upper part
+ undefined. */
+ const int x86_sse_split_regs = m_ATHLON_K8;
+-const int x86_sse_typeless_stores = m_ATHLON_K8;
++/* Code generation for scalar reg-reg moves of single and double precision data:
++ if (x86_sse_partial_reg_dependency == true | x86_sse_split_regs == true)
++ movaps reg, reg
++ else
++ movss reg, reg
++ if (x86_sse_partial_reg_dependency == true)
++ movapd reg, reg
++ else
++ movsd reg, reg
++
++ Code generation for scalar loads of double precision data:
++ if (x86_sse_split_regs == true)
++ movlpd mem, reg (gas syntax)
++ else
++ movsd mem, reg
++
++ Code generation for unaligned packed loads of single precision data
++ (x86_sse_unaligned_move_optimal overrides x86_sse_partial_reg_dependency):
++ if (x86_sse_unaligned_move_optimal)
++ movups mem, reg
++
++ if (x86_sse_partial_reg_dependency == true)
++ {
++ xorps reg, reg
++ movlps mem, reg
++ movhps mem+8, reg
++ }
++ else
++ {
++ movlps mem, reg
++ movhps mem+8, reg
++ }
++
++ Code generation for unaligned packed loads of double precision data
++ (x86_sse_unaligned_move_optimal overrides x86_sse_split_regs):
++ if (x86_sse_unaligned_move_optimal)
++ movupd mem, reg
++
++ if (x86_sse_split_regs == true)
++ {
++ movlpd mem, reg
++ movhpd mem+8, reg
++ }
++ else
++ {
++ movsd mem, reg
++ movhpd mem+8, reg
++ }
++ */
++const int x86_sse_unaligned_move_optimal = m_AMDFAM10;
++const int x86_sse_typeless_stores = m_ATHLON_K8_AMDFAM10;
+ const int x86_sse_load0_by_pxor = m_PPRO | m_PENT4 | m_NOCONA;
+-const int x86_use_ffreep = m_ATHLON_K8;
++const int x86_use_ffreep = m_ATHLON_K8_AMDFAM10;
+ const int x86_rep_movl_optimal = m_386 | m_PENT | m_PPRO | m_K6_GEODE | m_CORE2;
+ const int x86_use_incdec = ~(m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC);
+
+@@ -909,19 +1057,22 @@ const int x86_use_incdec = ~(m_PENT4 | m
+ integer data in xmm registers. Which results in pretty abysmal code. */
+ const int x86_inter_unit_moves = 0 /* ~(m_ATHLON_K8) */;
+
+-const int x86_ext_80387_constants = m_K6_GEODE | m_ATHLON | m_PENT4 | m_NOCONA | m_PPRO | m_GENERIC32;
++const int x86_ext_80387_constants = m_K6_GEODE | m_ATHLON | m_PENT4
++ | m_NOCONA | m_PPRO | m_GENERIC32;
+ /* Some CPU cores are not able to predict more than 4 branch instructions in
+ the 16 byte window. */
+-const int x86_four_jump_limit = m_PPRO | m_ATHLON_K8 | m_PENT4 | m_NOCONA | m_CORE2 | m_GENERIC;
+-const int x86_schedule = m_PPRO | m_ATHLON_K8 | m_K6_GEODE | m_PENT | m_CORE2 | m_GENERIC;
+-const int x86_use_bt = m_ATHLON_K8;
++const int x86_four_jump_limit = m_PPRO | m_ATHLON_K8_AMDFAM10 | m_PENT4
++ | m_NOCONA | m_CORE2 | m_GENERIC;
++const int x86_schedule = m_PPRO | m_ATHLON_K8_AMDFAM10 | m_K6_GEODE | m_PENT
++ | m_CORE2 | m_GENERIC;
++const int x86_use_bt = m_ATHLON_K8_AMDFAM10;
+ /* Compare and exchange was added for 80486. */
+ const int x86_cmpxchg = ~m_386;
+ /* Compare and exchange 8 bytes was added for pentium. */
+ const int x86_cmpxchg8b = ~(m_386 | m_486);
+ /* Exchange and add was added for 80486. */
+ const int x86_xadd = ~m_386;
+-const int x86_pad_returns = m_ATHLON_K8 | m_CORE2 | m_GENERIC;
++const int x86_pad_returns = m_ATHLON_K8_AMDFAM10 | m_CORE2 | m_GENERIC;
+
+ /* In case the average insn count for single function invocation is
+ lower than this constant, emit fast (but longer) prologue and
+@@ -1485,16 +1636,24 @@ ix86_handle_option (size_t code, const c
+ case OPT_msse:
+ if (!value)
+ {
+- target_flags &= ~(MASK_SSE2 | MASK_SSE3);
+- target_flags_explicit |= MASK_SSE2 | MASK_SSE3;
++ target_flags &= ~(MASK_SSE2 | MASK_SSE3 | MASK_SSE4A);
++ target_flags_explicit |= MASK_SSE2 | MASK_SSE3 | MASK_SSE4A;
+ }
+ return true;
+
+ case OPT_msse2:
+ if (!value)
+ {
+- target_flags &= ~MASK_SSE3;
+- target_flags_explicit |= MASK_SSE3;
++ target_flags &= ~(MASK_SSE3 | MASK_SSE4A);
++ target_flags_explicit |= MASK_SSE3 | MASK_SSE4A;
++ }
++ return true;
++
++ case OPT_msse3:
++ if (!value)
++ {
++ target_flags &= ~MASK_SSE4A;
++ target_flags_explicit |= MASK_SSE4A;
+ }
+ return true;
+
+@@ -1546,7 +1705,8 @@ override_options (void)
+ {&nocona_cost, 0, 0, 0, 0, 0, 0, 0},
+ {&core2_cost, 0, 0, 16, 7, 16, 7, 16},
+ {&generic32_cost, 0, 0, 16, 7, 16, 7, 16},
+- {&generic64_cost, 0, 0, 16, 7, 16, 7, 16}
++ {&generic64_cost, 0, 0, 16, 7, 16, 7, 16},
++ {&amdfam10_cost, 0, 0, 32, 7, 32, 7, 32}
+ };
+
+ static const char * const cpu_names[] = TARGET_CPU_DEFAULT_NAMES;
+@@ -1565,7 +1725,10 @@ override_options (void)
+ PTA_3DNOW_A = 64,
+ PTA_64BIT = 128,
+ PTA_SSSE3 = 256,
+- PTA_CX16 = 512
++ PTA_CX16 = 512,
++ PTA_POPCNT = 1024,
++ PTA_ABM = 2048,
++ PTA_SSE4A = 4096
+ } flags;
+ }
+ const processor_alias_table[] =
+@@ -1621,6 +1784,10 @@ override_options (void)
+ | PTA_3DNOW_A | PTA_SSE | PTA_SSE2},
+ {"athlon-fx", PROCESSOR_K8, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW | PTA_64BIT
+ | PTA_3DNOW_A | PTA_SSE | PTA_SSE2},
++ {"amdfam10", PROCESSOR_AMDFAM10, PTA_MMX | PTA_PREFETCH_SSE | PTA_3DNOW
++ | PTA_64BIT | PTA_3DNOW_A | PTA_SSE
++ | PTA_SSE2 | PTA_SSE3 | PTA_POPCNT
++ | PTA_ABM | PTA_SSE4A | PTA_CX16},
+ {"generic32", PROCESSOR_GENERIC32, 0 /* flags are only used for -march switch. */ },
+ {"generic64", PROCESSOR_GENERIC64, PTA_64BIT /* flags are only used for -march switch. */ },
+ };
+@@ -1772,6 +1939,15 @@ override_options (void)
+ x86_prefetch_sse = true;
+ if (processor_alias_table[i].flags & PTA_CX16)
+ x86_cmpxchg16b = true;
++ if (processor_alias_table[i].flags & PTA_POPCNT
++ && !(target_flags_explicit & MASK_POPCNT))
++ target_flags |= MASK_POPCNT;
++ if (processor_alias_table[i].flags & PTA_ABM
++ && !(target_flags_explicit & MASK_ABM))
++ target_flags |= MASK_ABM;
++ if (processor_alias_table[i].flags & PTA_SSE4A
++ && !(target_flags_explicit & MASK_SSE4A))
++ target_flags |= MASK_SSE4A;
+ if (TARGET_64BIT && !(processor_alias_table[i].flags & PTA_64BIT))
+ error ("CPU you selected does not support x86-64 "
+ "instruction set");
+@@ -1963,6 +2139,10 @@ override_options (void)
+ if (TARGET_SSSE3)
+ target_flags |= MASK_SSE3;
+
++ /* Turn on SSE3 builtins for -msse4a. */
++ if (TARGET_SSE4A)
++ target_flags |= MASK_SSE3;
++
+ /* Turn on SSE2 builtins for -msse3. */
+ if (TARGET_SSE3)
+ target_flags |= MASK_SSE2;
+@@ -1982,6 +2162,10 @@ override_options (void)
+ if (TARGET_3DNOW)
+ target_flags |= MASK_MMX;
+
++ /* Turn on POPCNT builtins for -mabm. */
++ if (TARGET_ABM)
++ target_flags |= MASK_POPCNT;
++
+ if (TARGET_64BIT)
+ {
+ if (TARGET_ALIGN_DOUBLE)
+@@ -8900,8 +9084,16 @@ ix86_expand_vector_move_misalign (enum m
+ }
+
+ if (TARGET_SSE2 && mode == V2DFmode)
+- {
+- rtx zero;
++ {
++ rtx zero;
++
++ if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
++ {
++ op0 = gen_lowpart (V2DFmode, op0);
++ op1 = gen_lowpart (V2DFmode, op1);
++ emit_insn (gen_sse2_movupd (op0, op1));
++ return;
++ }
+
+ /* When SSE registers are split into halves, we can avoid
+ writing to the top half twice. */
+@@ -8929,7 +9121,15 @@ ix86_expand_vector_move_misalign (enum m
+ emit_insn (gen_sse2_loadhpd (op0, op0, m));
+ }
+ else
+- {
++ {
++ if (TARGET_SSE_UNALIGNED_MOVE_OPTIMAL)
++ {
++ op0 = gen_lowpart (V4SFmode, op0);
++ op1 = gen_lowpart (V4SFmode, op1);
++ emit_insn (gen_sse_movups (op0, op1));
++ return;
++ }
++
+ if (TARGET_SSE_PARTIAL_REG_DEPENDENCY)
+ emit_move_insn (op0, CONST0_RTX (mode));
+ else
+@@ -13461,6 +13661,7 @@ ix86_issue_rate (void)
+ case PROCESSOR_PENTIUM4:
+ case PROCESSOR_ATHLON:
+ case PROCESSOR_K8:
++ case PROCESSOR_AMDFAM10:
+ case PROCESSOR_NOCONA:
+ case PROCESSOR_GENERIC32:
+ case PROCESSOR_GENERIC64:
+@@ -13659,6 +13860,7 @@ ix86_adjust_cost (rtx insn, rtx link, rt
+
+ case PROCESSOR_ATHLON:
+ case PROCESSOR_K8:
++ case PROCESSOR_AMDFAM10:
+ case PROCESSOR_GENERIC32:
+ case PROCESSOR_GENERIC64:
+ memory = get_attr_memory (insn);
+@@ -14370,6 +14572,14 @@ enum ix86_builtins
+ IX86_BUILTIN_PABSW128,
+ IX86_BUILTIN_PABSD128,
+
++ /* AMDFAM10 - SSE4A New Instructions. */
++ IX86_BUILTIN_MOVNTSD,
++ IX86_BUILTIN_MOVNTSS,
++ IX86_BUILTIN_EXTRQI,
++ IX86_BUILTIN_EXTRQ,
++ IX86_BUILTIN_INSERTQI,
++ IX86_BUILTIN_INSERTQ,
++
+ IX86_BUILTIN_VEC_INIT_V2SI,
+ IX86_BUILTIN_VEC_INIT_V4HI,
+ IX86_BUILTIN_VEC_INIT_V8QI,
+@@ -15102,6 +15312,18 @@ ix86_init_mmx_sse_builtins (void)
+ = build_function_type_list (void_type_node,
+ pchar_type_node, V16QI_type_node, NULL_TREE);
+
++ tree v2di_ftype_v2di_unsigned_unsigned
++ = build_function_type_list (V2DI_type_node, V2DI_type_node,
++ unsigned_type_node, unsigned_type_node,
++ NULL_TREE);
++ tree v2di_ftype_v2di_v2di_unsigned_unsigned
++ = build_function_type_list (V2DI_type_node, V2DI_type_node, V2DI_type_node,
++ unsigned_type_node, unsigned_type_node,
++ NULL_TREE);
++ tree v2di_ftype_v2di_v16qi
++ = build_function_type_list (V2DI_type_node, V2DI_type_node, V16QI_type_node,
++ NULL_TREE);
++
+ tree float80_type;
+ tree float128_type;
+ tree ftype;
+@@ -15435,6 +15657,20 @@ ix86_init_mmx_sse_builtins (void)
+ def_builtin (MASK_SSSE3, "__builtin_ia32_palignr", di_ftype_di_di_int,
+ IX86_BUILTIN_PALIGNR);
+
++ /* AMDFAM10 SSE4A New built-ins */
++ def_builtin (MASK_SSE4A, "__builtin_ia32_movntsd",
++ void_ftype_pdouble_v2df, IX86_BUILTIN_MOVNTSD);
++ def_builtin (MASK_SSE4A, "__builtin_ia32_movntss",
++ void_ftype_pfloat_v4sf, IX86_BUILTIN_MOVNTSS);
++ def_builtin (MASK_SSE4A, "__builtin_ia32_extrqi",
++ v2di_ftype_v2di_unsigned_unsigned, IX86_BUILTIN_EXTRQI);
++ def_builtin (MASK_SSE4A, "__builtin_ia32_extrq",
++ v2di_ftype_v2di_v16qi, IX86_BUILTIN_EXTRQ);
++ def_builtin (MASK_SSE4A, "__builtin_ia32_insertqi",
++ v2di_ftype_v2di_v2di_unsigned_unsigned, IX86_BUILTIN_INSERTQI);
++ def_builtin (MASK_SSE4A, "__builtin_ia32_insertq",
++ v2di_ftype_v2di_v2di, IX86_BUILTIN_INSERTQ);
++
+ /* Access to the vec_init patterns. */
+ ftype = build_function_type_list (V2SI_type_node, integer_type_node,
+ integer_type_node, NULL_TREE);
+@@ -15923,9 +16159,9 @@ ix86_expand_builtin (tree exp, rtx targe
+ enum insn_code icode;
+ tree fndecl = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
+ tree arglist = TREE_OPERAND (exp, 1);
+- tree arg0, arg1, arg2;
+- rtx op0, op1, op2, pat;
+- enum machine_mode tmode, mode0, mode1, mode2, mode3;
++ tree arg0, arg1, arg2, arg3;
++ rtx op0, op1, op2, op3, pat;
++ enum machine_mode tmode, mode0, mode1, mode2, mode3, mode4;
+ unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
+
+ switch (fcode)
+@@ -16340,6 +16576,114 @@ ix86_expand_builtin (tree exp, rtx targe
+ emit_insn (pat);
+ return target;
+
++ case IX86_BUILTIN_MOVNTSD:
++ return ix86_expand_store_builtin (CODE_FOR_sse4a_vmmovntv2df, arglist);
++
++ case IX86_BUILTIN_MOVNTSS:
++ return ix86_expand_store_builtin (CODE_FOR_sse4a_vmmovntv4sf, arglist);
++
++ case IX86_BUILTIN_INSERTQ:
++ case IX86_BUILTIN_EXTRQ:
++ icode = (fcode == IX86_BUILTIN_EXTRQ
++ ? CODE_FOR_sse4a_extrq
++ : CODE_FOR_sse4a_insertq);
++ arg0 = TREE_VALUE (arglist);
++ arg1 = TREE_VALUE (TREE_CHAIN (arglist));
++ op0 = expand_expr (arg0, NULL_RTX, VOIDmode, 0);
++ op1 = expand_expr (arg1, NULL_RTX, VOIDmode, 0);
++ tmode = insn_data[icode].operand[0].mode;
++ mode1 = insn_data[icode].operand[1].mode;
++ mode2 = insn_data[icode].operand[2].mode;
++ if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
++ op0 = copy_to_mode_reg (mode1, op0);
++ if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
++ op1 = copy_to_mode_reg (mode2, op1);
++ if (optimize || target == 0
++ || GET_MODE (target) != tmode
++ || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
++ target = gen_reg_rtx (tmode);
++ pat = GEN_FCN (icode) (target, op0, op1);
++ if (! pat)
++ return NULL_RTX;
++ emit_insn (pat);
++ return target;
++
++ case IX86_BUILTIN_EXTRQI:
++ icode = CODE_FOR_sse4a_extrqi;
++ arg0 = TREE_VALUE (arglist);
++ arg1 = TREE_VALUE (TREE_CHAIN (arglist));
++ arg2 = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (arglist)));
++ op0 = expand_expr (arg0, NULL_RTX, VOIDmode, 0);
++ op1 = expand_expr (arg1, NULL_RTX, VOIDmode, 0);
++ op2 = expand_expr (arg2, NULL_RTX, VOIDmode, 0);
++ tmode = insn_data[icode].operand[0].mode;
++ mode1 = insn_data[icode].operand[1].mode;
++ mode2 = insn_data[icode].operand[2].mode;
++ mode3 = insn_data[icode].operand[3].mode;
++ if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
++ op0 = copy_to_mode_reg (mode1, op0);
++ if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
++ {
++ error ("index mask must be an immediate");
++ return gen_reg_rtx (tmode);
++ }
++ if (! (*insn_data[icode].operand[3].predicate) (op2, mode3))
++ {
++ error ("length mask must be an immediate");
++ return gen_reg_rtx (tmode);
++ }
++ if (optimize || target == 0
++ || GET_MODE (target) != tmode
++ || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
++ target = gen_reg_rtx (tmode);
++ pat = GEN_FCN (icode) (target, op0, op1, op2);
++ if (! pat)
++ return NULL_RTX;
++ emit_insn (pat);
++ return target;
++
++ case IX86_BUILTIN_INSERTQI:
++ icode = CODE_FOR_sse4a_insertqi;
++ arg0 = TREE_VALUE (arglist);
++ arg1 = TREE_VALUE (TREE_CHAIN (arglist));
++ arg2 = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (arglist)));
++ arg3 = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (arglist))));
++ op0 = expand_expr (arg0, NULL_RTX, VOIDmode, 0);
++ op1 = expand_expr (arg1, NULL_RTX, VOIDmode, 0);
++ op2 = expand_expr (arg2, NULL_RTX, VOIDmode, 0);
++ op3 = expand_expr (arg3, NULL_RTX, VOIDmode, 0);
++ tmode = insn_data[icode].operand[0].mode;
++ mode1 = insn_data[icode].operand[1].mode;
++ mode2 = insn_data[icode].operand[2].mode;
++ mode3 = insn_data[icode].operand[3].mode;
++ mode4 = insn_data[icode].operand[4].mode;
++
++ if (! (*insn_data[icode].operand[1].predicate) (op0, mode1))
++ op0 = copy_to_mode_reg (mode1, op0);
++
++ if (! (*insn_data[icode].operand[2].predicate) (op1, mode2))
++ op1 = copy_to_mode_reg (mode2, op1);
++
++ if (! (*insn_data[icode].operand[3].predicate) (op2, mode3))
++ {
++ error ("index mask must be an immediate");
++ return gen_reg_rtx (tmode);
++ }
++ if (! (*insn_data[icode].operand[4].predicate) (op3, mode4))
++ {
++ error ("length mask must be an immediate");
++ return gen_reg_rtx (tmode);
++ }
++ if (optimize || target == 0
++ || GET_MODE (target) != tmode
++ || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
++ target = gen_reg_rtx (tmode);
++ pat = GEN_FCN (icode) (target, op0, op1, op2, op3);
++ if (! pat)
++ return NULL_RTX;
++ emit_insn (pat);
++ return target;
++
+ case IX86_BUILTIN_VEC_INIT_V2SI:
+ case IX86_BUILTIN_VEC_INIT_V4HI:
+ case IX86_BUILTIN_VEC_INIT_V8QI:
+--- gcc/config/i386/xmmintrin.h.jj 2006-10-05 00:29:29.000000000 +0200
++++ gcc/config/i386/xmmintrin.h 2007-02-09 21:26:06.000000000 +0100
+@@ -1241,7 +1241,9 @@ do { \
+ } while (0)
+
+ /* For backward source compatibility. */
+-#include <emmintrin.h>
++#ifdef __SSE2__
++# include <emmintrin.h>
++#endif
+
+ #endif /* __SSE__ */
+ #endif /* _XMMINTRIN_H_INCLUDED */
diff --git a/gcc41.spec b/gcc41.spec
index 1af6a15..294870a 100644
--- a/gcc41.spec
+++ b/gcc41.spec
@@ -1,10 +1,10 @@
-%define DATE 20070202
+%define DATE 20070209
%define gcc_version 4.1.1
-%define gcc_release 55
+%define gcc_release 56
%define _unpackaged_files_terminate_build 0
%define multilib_64_archs sparc64 ppc64 s390x x86_64
%define include_gappletviewer 1
-%ifarch %{ix86} x86_64 ia64 ppc
+%ifarch %{ix86} x86_64 ia64 ppc alpha
%define build_ada 1
%else
%define build_ada 0
@@ -116,7 +116,7 @@ Patch6: gcc41-ada-pr18302.patch
Patch7: gcc41-ada-tweaks.patch
Patch8: gcc41-java-slow_pthread_self.patch
Patch9: gcc41-ppc32-retaddr.patch
-Patch10: gcc41-i386-tune-core2.patch
+Patch10: gcc41-amdfam10.patch
Patch11: gcc41-dsohandle.patch
Patch12: gcc41-rh184446.patch
Patch13: gcc41-pr20297-test.patch
@@ -124,7 +124,7 @@ Patch14: gcc41-objc-rh185398.patch
Patch15: gcc41-tests.patch
Patch16: gcc41-pr25874.patch
Patch17: gcc41-pr30189.patch
-Patch18: gcc41-ssse3.patch
+Patch18: gcc41-rh227983.patch
Patch19: gcc41-hash-style-gnu.patch
Patch20: gcc41-pr30001.patch
Patch21: gcc41-java-libdotdotlib.patch
@@ -429,7 +429,7 @@ which are required to run programs compiled with the GNAT.
%patch7 -p0 -b .ada-tweaks~
%patch8 -p0 -b .java-slow_pthread_self~
%patch9 -p0 -b .ppc32-retaddr~
-%patch10 -p0 -b .i386-tune-core2~
+%patch10 -p0 -b .amdfam10~
%patch11 -p0 -b .dsohandle~
%patch12 -p0 -b .rh184446~
%patch13 -p0 -E -b .pr20297-test~
@@ -437,7 +437,7 @@ which are required to run programs compiled with the GNAT.
%patch15 -p0 -b .tests~
%patch16 -p0 -b .pr25874~
%patch17 -p0 -b .pr30189~
-%patch18 -p0 -b .ssse3~
+%patch18 -p0 -b .rh227983~
%patch19 -p0 -b .hash-style-gnu~
%patch20 -p0 -b .pr30001~
%patch21 -p0 -b .java-libdotdotlib~
@@ -1529,6 +1529,15 @@ fi
%doc rpm.doc/changelogs/libmudflap/ChangeLog*
%changelog
+* Sat Feb 10 2007 Jakub Jelinek <jakub@redhat.com> 4.1.1-56
+- update from gcc-4_1-branch (-r121479:121738)
+ - PRs c++/29487, target/29487, target/30370
+- merge gomp fixes from gcc-4_2-branch (-r121689:121690)
+ PR c++/30703
+- add AMDfam10 support (Harsha Jagasia, #222897)
+- set build_ada to 1 on alpha (#224247)
+- regenerate libjava.util.TimeZone data from tzdata2007a (#227888)
+
* Fri Feb 2 2007 Jakub Jelinek <jakub@redhat.com> 4.1.1-55
- update from gcc-4_1-branch (-r121069:121479)
- PRs c++/28988, fortran/30278, libstdc++/30586, middle-end/29683,
diff --git a/sources b/sources
index 4a137e7..2da9fe9 100644
--- a/sources
+++ b/sources
@@ -1 +1 @@
-da5ebc8e1045dab9142f7a0a9ce47304 gcc-4.1.1-20070202.tar.bz2
+900d1b9e7c3edfa4d855cd99d23f200f gcc-4.1.1-20070209.tar.bz2
reply other threads:[~2026-06-29 12:23 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=178273578709.1.7201169702592553932.rpms-gcc-15f5edb33ed3@fedoraproject.org \
--to=jakub@fedoraproject.org \
--cc=git-commits@fedoraproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox