[patch] Adding optimized kernel copying support - Part III
Attilio Rao
Posted: Wed May 31, 2006 6:56 pm    Post subject: [patch] Adding optimized kernel copying support - Part III

Hi,
this is the final release for the project; it is essentially finished and complete.

I have tested it for consistency for a long time, and the FPU handling
mechanism seems very robust, as do copyin/copyout.

What I'm looking for, at this point, are testers for performance.
What is proposed in the patch is one of the better solutions for UP
archs (not running with PREEMPTION), but more general cases might be
handled with time.

I hope that somebody wants to play with it, giving suggestions and
running different benchmarks.

The code can be found here:
http://users.gufi.org/~rookie/works/patches/xmmcopy_6_1.diff

and is for RELEASE_6_1 in order to have a wider range of testers (a
diff against HEAD will be available ASAP).

Please keep in mind that this is not simply a rip of the DragonFly BSD
code, because it differs in a lot of parts.

For any kind of technical question, please mail me.

Attilio

PS: particular thanks go to Bruce Evans for his benchmarks and
feedback about the code structure.


--
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-arch@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
Attilio Rao
Posted: Wed May 31, 2006 8:10 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

Sorry, but I forgot one thing, so please re-download the patch now.

Attilio

Suleiman Souhlal
Posted: Wed May 31, 2006 8:18 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

Hello Attilio,

Attilio Rao wrote:
Quote:
Hi,
this is the final release for the project; it is essentially finished
and complete.

I have tested it for consistency for a long time, and the FPU handling
mechanism seems very robust, as do copyin/copyout.

Nice work. Any chance you could also port it to amd64? :-)

Quote:
What I'm looking for, at this point, are testers for performance.
What is proposed in the patch is one of the better solutions for UP
archs (not running with PREEMPTION), but more general cases might be
handled with time.

Does that mean it won't work with SMP and PREEMPTION?

Quote:
I hope that somebody wants to play with it, giving suggestions and
running different benchmarks.

What kind of performance improvements did you see in your benchmarks?

Quote:
The code can be found here:
http://users.gufi.org/~rookie/works/patches/xmmcopy_6_1.diff

and is for RELEASE_6_1 in order to have a wider range of testers (a
diff against HEAD will be available ASAP).

Please keep in mind that this is not simply a rip of the DragonFly BSD
code, because it differs in a lot of parts.

For any kind of technical question, please mail me.

I wonder if we could get rid of the memcpy_vector (copyin/copyout_vector
before this patch), bzero_vector and bcopy_vector function pointers and
do boot-time patching of the callers to the right version.

I have a linux-inspired proof-of-concept demo of this boot-time patching
at http://people.freebsd.org/~ssouhlal/testing/bootpatch-20060527.diff.
It prefetches the next element in the *_FOREACH() macros in sys/queue.h.
The patching that it does is to use PREFETCH instruction instead of
PREFETCHNTA if the cpu is found to support SSE2.
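For readers unfamiliar with the `*_vector` approach mentioned above, the sketch below models it in user-space C: a function pointer is chosen once at start-up according to a CPU feature, and every call goes through that pointer. The routine names `generic_memcpy`/`sse2_memcpy`, the `select_memcpy` helper and the `has_sse2` flag are hypothetical stand-ins, not the actual FreeBSD symbols; boot-time patching, as proposed, would instead rewrite the call sites so this indirection disappears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy routines; both simply delegate to memcpy in this sketch. */
static void *generic_memcpy(void *dst, const void *src, size_t len)
{
    /* Stand-in for a plain integer-register copy. */
    return memcpy(dst, src, len);
}

static void *sse2_memcpy(void *dst, const void *src, size_t len)
{
    /* Stand-in for an XMM-based copy; a real one would use SSE2 moves. */
    return memcpy(dst, src, len);
}

/* The "vector": every caller jumps indirectly through this pointer. */
static void *(*memcpy_vector)(void *, const void *, size_t) = generic_memcpy;

/* Run once at "boot" after CPU feature detection (modelled here as a flag). */
static void select_memcpy(bool has_sse2)
{
    memcpy_vector = has_sse2 ? sse2_memcpy : generic_memcpy;
}
```

The cost being discussed is the indirect call on every copy; patching the callers at boot would turn it into a direct call to whichever routine was selected.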

-- Suleiman

Attilio Rao
Posted: Wed May 31, 2006 8:29 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

2006/5/31, Suleiman Souhlal <ssouhlal@freebsd.org>:
Quote:
Hello Attilio,

Hello Suleiman,

Quote:
Nice work. Any chance you could also port it to amd64? :-)

Not in the near future, I think. :P

Quote:
Does that mean it won't work with SMP and PREEMPTION?

Yes, it will work (even if I think it needs more testing), but it might
give lower performance on SMP|PREEMPTION due to too much
traffic on memory/cache. For this I was planning to use non-temporal
instructions (obviously benchmarks would be very much appreciated).

Quote:
What kind of performance improvements did you see in your benchmarks?

I'm sorry, but I didn't benchmark on P4 (with xmm instructions).
On P3, using integer copies, I measured about a 2% increase with dd
and time; I hope for more on P4 (where xmm usage can be added too).

Quote:
I wonder if we could get rid of the memcpy_vector (copyin/copyout_vector
before this patch), bzero_vector and bcopy_vector function pointers and
do boot-time patching of the callers to the right version

Mmm, please note that on i386, at boot time (I've never studied that
code), it seems to require the vectorized versions of bcopy/bzero.
The memcpy_vector that I introduced is used in a slightly different way
from the others, so I don't think it's that simple to apply your idea
to it.

Quote:
I have a linux-inspired proof-of-concept demo of this boot-time patching
at http://people.freebsd.org/~ssouhlal/testing/bootpatch-20060527.diff.
It prefetches the next element in the *_FOREACH() macros in sys/queue.h.
The patching that it does is to use PREFETCH instruction instead of
PREFETCHNTA if the cpu is found to support SSE2.

It would be very much appreciated to have it MI (yes, I mean an MD + MI structure :PP)

Attilio

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
Bruce Evans
Posted: Wed May 31, 2006 11:25 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

On Wed, 31 May 2006, Attilio Rao wrote:

Quote:
2006/5/31, Suleiman Souhlal <ssouhlal@freebsd.org>:
Nice work. Any chance you could also port it to amd64? :-)

Not in the near future, I think. :P

It is not useful for amd64. An amd64 has enough instruction bandwidth
to saturate the L1 cache using 64-bit accesses although not using
32-bit accesses. An amd64 has 64-bit integer registers which can be
accessed without the huge setup overheads and code complications for
MMX/XMM registers. It already uses 64-bit registers or 64-bit movs
for copying and zeroing of course. Perhaps it should use prefetches
and nontemporal writes more than it already does, but these don't
require using SSE2 instructions like nontemporal writes do for 32-bit
CPUs.
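To make the amd64 point concrete, here is a minimal user-space sketch of a copy loop that moves 8 bytes per iteration through an ordinary `uint64_t` (a general-purpose register) and finishes the tail byte by byte, with no FPU/MMX/XMM state involved. It is only an illustration under that assumption; the real amd64 kernel routines are written in assembly, and `copy64` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes using 64-bit integer loads/stores only (no FPU/XMM state). */
static void copy64(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop: 8 bytes per iteration through a 64-bit integer.
     * memcpy into a uint64_t keeps this safe for unaligned buffers;
     * optimizing compilers typically lower it to a single 64-bit mov. */
    while (len >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s, sizeof word);
        memcpy(d, &word, sizeof word);
        s += sizeof word;
        d += sizeof word;
        len -= sizeof word;
    }

    /* Tail: the remaining 0..7 bytes. */
    while (len-- > 0)
        *d++ = *s++;
}
```

Nothing here needs the setup that MMX/XMM copies require (saving FPU state, toggling CR0_TS), which is the overhead the 32-bit patch has to amortize.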

Quote:
Does that mean it won't work with SMP and PREEMPTION?

Yes, it will work (even if I think it needs more testing), but it might
give lower performance on SMP|PREEMPTION due to too much
traffic on memory/cache. For this I was planning to use non-temporal
instructions (obviously benchmarks would be very much appreciated).

Er, isn't its main point to fix some !SMP assumptions made in the old
copying-through-the-FPU code? (The old code is messy due to its avoidance
of global changes. It wants to preserve the FPU state on the stack, but
this doesn't quite work so it does extra things (still mostly locally)
that only work in the !SMP && (!SMPng even with UP) case. Patching this
approach to work with SMP || SMPng cases would make it messier.)

The new code wouldn't behave much differently under SMP. It just might
be a smaller optimization because more memory pressure for SMP causes
more cache misses for everything and there are no benefits from copying
through MMX/XMM unless nontemporal writes are used. All (?) CPUs with
MMX or SSE* can saturate main memory using 32-bit instructions. On
32-bit CPUs, the benefits of using MMX/XMM come from being able to
saturate the L1 cache on some CPUs (mainly Athlons and not P[2-4]),
and from being able to use nontemporal writes on some CPUs (at least
AthlonXP via SSE extensions, and all CPUs with SSE2).
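As an illustration of "copying through XMM with nontemporal writes", the user-space sketch below uses the SSE2 intrinsics `_mm_load_si128` (movdqa), `_mm_stream_si128` (movntdq) and `_mm_sfence`. It assumes 16-byte aligned buffers and a length that is a multiple of 16, and falls back to `memcpy` when the compiler does not target SSE2. It is not the patch's code: a kernel version would be assembly bracketed by FPU state save/restore, and `nt_copy` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy len bytes with SSE2 non-temporal (streaming) stores when available,
 * so the destination does not displace useful cache lines.
 * Assumes 16-byte aligned pointers and len a multiple of 16. */
static void nt_copy(void *dst, const void *src, size_t len)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    assert(len % 16 == 0);
#if defined(__SSE2__)
    __m128i *d = dst;
    const __m128i *s = src;
    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_load_si128(&s[i]);  /* movdqa: aligned 16-byte load */
        _mm_stream_si128(&d[i], v);         /* movntdq: store bypassing the cache */
    }
    _mm_sfence();  /* streaming stores are weakly ordered; fence before returning */
#else
    memcpy(dst, src, len);  /* no SSE2: plain copy */
#endif
}
```

The fence matters because streaming stores are not ordered with respect to later ordinary stores; without it another CPU could observe the copy out of order.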

Bruce
Attilio Rao
Posted: Wed May 31, 2006 11:32 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

2006/6/1, Bruce Evans <bde@zeta.org.au>:
Quote:

Does that mean it won't work with SMP and PREEMPTION?

Yes, it will work (even if I think it needs more testing), but it might
give lower performance on SMP|PREEMPTION due to too much
traffic on memory/cache. For this I was planning to use non-temporal
instructions (obviously benchmarks would be very much appreciated).

Er, isn't its main point to fix some !SMP assumptions made in the old
copying-through-the-FPU code? (The old code is messy due to its avoidance
of global changes. It wants to preserve the FPU state on the stack, but
this doesn't quite work so it does extra things (still mostly locally)
that only work in the !SMP && (!SMPng even with UP) case. Patching this
approach to work with SMP || SMPng cases would make it messier.)

The new code wouldn't behave much differently under SMP. It just might
be a smaller optimization because more memory pressure for SMP causes
more cache misses for everything and there are no benefits from copying
through MMX/XMM unless nontemporal writes are used. All (?) CPUs with
MMX or SSE* can saturate main memory using 32-bit instructions. On
32-bit CPUs, the benefits of using MMX/XMM come from being able to
saturate the L1 cache on some CPUs (mainly Athlons and not P[2-4]),
and from being able to use nontemporal writes on some CPUs (at least
AthlonXP via SSE extensions, and all CPUs with SSE2).

I was just speaking about the copying routine itself and not about the
SSE2 environment-preserving mechanism. That remains unchanged in the SMP
case.

However, I have to say you were right when you suggested that I merge
everything into support.s, since it has a more coherent design.

Attilio


Alexander Leidinger
Posted: Thu Jun 01, 2006 7:30 am    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

Quoting Attilio Rao <asmrookie@gmail.com> (from Thu, 1 Jun 2006
01:32:12 +0200):

Quote:
2006/6/1, Bruce Evans <bde@zeta.org.au>:

The new code wouldn't behave much differently under SMP. It just might
be a smaller optimization because more memory pressure for SMP causes
more cache misses for everything and there are no benefits from copying
through MMX/XMM unless nontemporal writes are used. All (?) CPUs with
MMX or SSE* can saturate main memory using 32-bit instructions. On
32-bit CPUs, the benefits of using MMX/XMM come from being able to
saturate the L1 cache on some CPUs (mainly Athlons and not P[2-4]),
and from being able to use nontemporal writes on some CPUs (at least
AthlonXP via SSE extensions, and all CPUs with SSE2).

I was just speaking about the copying routine itself and not about the
SSE2 environment preserving mechanism. It remains untouched in SMP
case.

AFAIR the DFly FPU rework allows FPU/XMM instructions to be used in their
kernel without the need for manual state preservation (it's done
automatically on demand). So one could use XMM instructions in the RAID 5
or crypto parts of the code to test whether they give a performance
benefit. Do I understand the above correctly that with this patch this
is also the case for us in the UP case, but not in the SMP case?

Bye,
Alexander.

--
Selling GoodYear Eagle F1 235/40ZR18, 2x 4mm + 2x 5mm, ~150 EUR
you have to pick it up between Germany/Saarland and Luxembourg/Capellen
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137

Attilio Rao
Posted: Thu Jun 01, 2006 12:27 pm    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

2006/6/1, Alexander Leidinger <Alexander@leidinger.net>:
Quote:
AFAIR the DFly FPU rework allows FPU/XMM instructions to be used in their
kernel without the need for manual state preservation (it's done
automatically on demand). So one could use XMM instructions in the RAID 5
or crypto parts of the code to test whether they give a performance
benefit. Do I understand the above correctly that with this patch this
is also the case for us in the UP case, but not in the SMP case?

Since there seems to be a misunderstanding about this, I will try to
explain better.

The patch can be seen as a 3-step issue*:

1) Implementing a robust and working method to preserve FPU/MMX/XMM
state when these registers are used in the kernel

2) Modifying copyin/copyout/memcpy in order to use xmm registers (at
first I thought of doing bzero/bcopy too but, since they're used for
small amounts of data, xmm usage is not worthwhile there due to the
heavy cost of context saving).

3) Providing a reliable and better version of memcpy (which I called i686_memcpy).

1 is achieved successfully and it is the same on UP and SMP archs.
It's imported from DragonFly, and I tested it on my boxes very carefully
and for a long time. It never gave me problems.

2 seems good too, even if I think it needs more stress-testing. It is
the same on UP and SMP and needs no changes.

3 is what I meant when I spoke about having different versions for UP
and SMP. It needs more testing, even if the code seems correct to me.
It's important to understand that it is an example of how the new
architecture for FPU saving/restoring can be used (you can see it as a
reference for further coding, I guess).

Maybe FPU_PICKUP/FPU_DROP could be modified and exported so that they
can be used in other parts of the kernel...
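The reasoning in step 2 (short copies are not worth the cost of saving FPU context) can be sketched as a size-threshold dispatch around a pickup/drop bracket. Everything here is hypothetical: `fpu_pickup`/`fpu_drop` only imitate the role of FPU_PICKUP/FPU_DROP (their real definitions are in the patch), `XMM_COPY_THRESHOLD` is an arbitrary illustrative value rather than a measured one, and `memcpy` stands in for the XMM loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-off below which saving FPU state costs more than it gains. */
#define XMM_COPY_THRESHOLD 1024

static int fpu_pickups;  /* counts how often the expensive bracket is entered */

/* Hypothetical stand-ins for FPU_PICKUP/FPU_DROP. In a kernel these would
 * save any live FPU/XMM state and later release the registers; here the
 * pickup only counts calls so the dispatch decision can be observed. */
static void fpu_pickup(void) { fpu_pickups++; }
static void fpu_drop(void) { }

static void *kcopy(void *dst, const void *src, size_t len)
{
    if (len < XMM_COPY_THRESHOLD)
        return memcpy(dst, src, len);  /* small copy: integer registers only */

    fpu_pickup();                      /* pay the state-save cost once */
    memcpy(dst, src, len);             /* stand-in for the XMM copy loop */
    fpu_drop();
    return dst;
}
```

The right threshold depends on how expensive the save is on a given CPU, which is exactly what the requested benchmarks would have to establish.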

So I hope it's clearer now.

Attilio

* I refer, for this discussion, exclusively to FreeBSD-i386

Matthew Dillon
Posted: Tue Jun 06, 2006 6:48 am    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

:AFAIR the DFly FPU rework allows to use FPU/XMM instructions in their
:kernel without the need to do some manual state preserving (it's done
:...
:
:Bye,
:Alexander.

That actually isn't quite how it works. If the userland had active
FP state then the kernel still has to save it before it can use the
FP registers. The kernel does not have to restore it, however (that is,
it can just let userland take a fault to restore its FP state).
However, the kernel still has to mess around with CR0_TS when pushing
and popping an FP context / save area.

The FP state reworking in DragonFly had the following effects:

* We now have a save area pointer instead of a fixed, static save area.
This allows FP state to be 'stacked' without having to play weird
games with a static save area.

* The standard FP restoration fault is no longer limited to userland.
The kernel can push its own state, switch away to another thread,
switch back, and take a fault to restore it, independent of the
user FP state.
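A toy model may help picture the two effects described above: the CPU holds a pointer to the current owner's save area rather than one static area, and popping the kernel context just sets "TS" so that the next FP use faults and reloads the owner's state lazily. This is a simplified user-space simulation under those assumptions, not DragonFly's code: `fpu_push`, `fpu_pop`, `fpu_fault` and the four-int register file are hypothetical (a real save area is the 512-byte FXSAVE image).

```c
#include <assert.h>
#include <string.h>

struct fpu_area { int regs[4]; };  /* stand-in for the 512-byte FXSAVE area */

struct cpu {
    int regs[4];             /* live FP registers */
    int ts;                  /* models CR0_TS: 1 = next FP use faults */
    struct fpu_area *save;   /* save area of whoever owns the FP state */
};

/* Kernel wants the FP registers: save the live owner's state (if it is
 * actually loaded) into its area, then point the CPU at the kernel's area. */
static struct fpu_area *fpu_push(struct cpu *c, struct fpu_area *mine)
{
    struct fpu_area *prev = c->save;
    if (!c->ts)  /* registers hold prev's live state */
        memcpy(prev->regs, c->regs, sizeof c->regs);
    c->save = mine;
    c->ts = 0;   /* kernel may now use FP without faulting */
    return prev;
}

/* Kernel is done: restore the pointer but not the registers; setting TS
 * makes the previous owner take a fault on its next FP instruction. */
static void fpu_pop(struct cpu *c, struct fpu_area *prev)
{
    c->save = prev;
    c->ts = 1;
}

/* Device-not-available fault: reload from whatever area is current. */
static void fpu_fault(struct cpu *c)
{
    memcpy(c->regs, c->save->regs, sizeof c->regs);
    c->ts = 0;
}
```

Because the pointer can refer to any area, a second push simply chains another area, which is the "stacking" that a single static save area could not express.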

--

It would be possible to simplify matters and actually implement what
you say... the ability to use FP registers without any manual state
preserving. That is, to be able to treat the FP registers just like
normal registers. It would require saving and restoring a great deal
more state in the interrupt/exception frame push code and the
thread switch code, though. It could be conditionalized based on CR0_TS
or it could just be done unconditionally. I'm not sure it would yield
any improvement in performance, though.

There is also the problem of the storage required to manage multiple
save areas. It's something like, what, 512 bytes on the stack? Because
of that DragonFly still implements an FPU interlock so the kernel
doesn't stack more than one additional save area due to FAST interrupts,
stacked exceptions, etc.
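The storage and interlock points can be sketched in C11. The FXSAVE/FXRSTOR image is 512 bytes and must be 16-byte aligned, so each stacked save area costs that much stack; an interlock that admits only one extra kernel save area lets a nested user (for example a fast interrupt) fall back to an integer path instead. The `atomic_flag` and the `kernel_fpu_*` names are hypothetical simplifications; a real kernel would keep such a flag per CPU and coordinate it with preemption and interrupts.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

/* FXSAVE/FXRSTOR operate on a 512-byte area that must be 16-byte aligned. */
struct fxsave_area {
    alignas(16) unsigned char bytes[512];
};
_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE area is 512 bytes");

/* Hypothetical interlock: at most one additional kernel save area. */
static atomic_flag kernel_fpu_busy = ATOMIC_FLAG_INIT;

/* Try to claim the kernel FPU slot; false means "use the integer path". */
static bool kernel_fpu_tryenter(void)
{
    return !atomic_flag_test_and_set(&kernel_fpu_busy);
}

static void kernel_fpu_exit(void)
{
    atomic_flag_clear(&kernel_fpu_busy);
}
```

Refusing the nested case bounds the extra stack usage at one 512-byte area, at the price of the nested copy not getting the XMM speedup.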

-Matt

Attilio Rao
Posted: Tue Jun 06, 2006 7:18 am    Post subject: Re: [patch] Adding optimized kernel copying support - Part III

2006/6/6, Matthew Dillon <dillon@apollo.backplane.com>:
Quote:
It would be possible to simplify matters and actually implement what
you say... the ability to use FP registers without any manual state
preserving. That is, to be able to treat the FP registers just like
normal registers. It would require saving and restoring a great deal
more state in the interrupt/exception frame push code and the
thread switch code, though. It could be conditionalized based on CR0_TS
or it could just be done unconditionally. I'm not sure it would yield
any improvement in performance, though.

I tend to agree with you, because it would take too much work/storage
for saving, which would lose all the improvements given by xmm registers.
The point of using xmm registers is just performance improvement.
I think that having an interlock in the kernel (and so just one
kernel saved state) is the better choice for performance, even if it
doesn't provide truly unconditional usage.

Attilio

PS: Please also consider that xmm registers seem to improve
performance only when used with aligned data (movaps,
movdqa), so not in the general case.
MMX, instead, seems to give very little improvement, in particular on
more advanced architectures (>= P3).
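The alignment remark can be made concrete: movaps/movdqa fault on addresses that are not 16-byte aligned, so an aligned XMM loop is only usable if source and destination sit at the same offset within a 16-byte line, in which case copying a short head with integer moves aligns both. The helper names `xmm_coaligned` and `xmm_head_len` are hypothetical, and this is only one plausible way to organize the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if dst and src share the same offset within a 16-byte line, so that
 * after a short integer-copied head both become 16-byte aligned. */
static bool xmm_coaligned(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 15) == 0;
}

/* Number of leading bytes to copy with integer moves before dst is aligned. */
static size_t xmm_head_len(const void *dst)
{
    return (16 - ((uintptr_t)dst & 15)) & 15;
}
```

When the two pointers are not co-aligned, one side must use unaligned loads or stores (movdqu), which is the "general case" where the gain is less clear.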
