QUICK LOADING: Directly into Stack segment, Win32; File mapping

Coenraad Loubser

22 years ago

This post touches on two issues:
(1) Valid file mapping ranges for the various OS's
(2) Location of stack segment of loaded image file,
access rights, and other issues

Hi, I've been experimenting with file mapping over the last few weeks.
Quite convenient, not having to initialise your runtime data every time
you load.

Now there are certain issues. If you use hardlinked (absolute) pointers in
Windows mapped images, it seems you are stuck with the OS you created the
links on, because a range that is valid for Windows 98 is completely invalid
for Windows 2000, and different again on Windows XP.

Can anyone shed more specific light on this?
a) What ranges will Windows allocate to your mapping object if you ask it to?
b) What ranges are valid for predefined mapping offsets?
c) Slightly off the file mapping topic, maybe: does anyone know how to
   create "alias" selectors in your running image (under Win32)? In other
   words, how can you make two regions point to the same data?

which brings us to (2):

I have a cool little routine that creates any app by just repeating a loop
which pops its needed pointers and data off the stack. So I load my data
file right onto the stack. The problem is, I just can't seem to create a
segment in my PE exe that I can either use as the stack, or point to in
order to load directly onto the stack.

I assume that I can't move the stack to some arbitrary region of memory
because the loading thread for my program needs access to this region,
which it doesn't have, and I don't know how to give it access. Or maybe
this region is just not visible to some other threads (I wish I knew
which!?) that for some reason share the loaded app's stack. (Questions e,
f and maybe g are in this paragraph somewhere!)

So, what I ask for is:
Information about the stack in Windows (98, NT, XP, 2000, 2003, whatever,
all of them!)

Why, you ask? There are so many other ways to do it.
Well, has anyone ever experimented to see what is quicker: adding a section
that gets mapped to your PE file, or loading some other file on
initialisation of your app?

Popping values off the stack, or reading consecutive data from another block?

Any numbers here would be questions h) and i)!

Thanks!
I'll post my little stack routine here when I get some answers. It's really
nice: it can load a Delphi-like application instantly, with all the details
and data set up, custom classes, fonts, colours, all from just a simple
data file and a tiny loader. I've coded an IDE that's smaller than 3 KB. I
like it... imagine how many more apps you'd be able to fit into your 2 GHz
server's 2 GB of RAM! Instead of being able to host a few hundred 4 MB
apps, it'd be able to host thousands of, say, tiny custom database
query applications and such.

Coenraad

Ivan Brugiolo [MSFT]

22 years ago

a) What ranges will windows allocate to your mapping object if you ask it to

The first available address will be used, unless it's an image, in which
case the preferred address is honored if it is available with the required
range. The address ranges available/in use in a given process are extracted
from the VAD nodes of the VAD tree, plus there's the concept of a contiguous
set of pages with the same protection bits in the same VAD.
You can use the !vad debugger extension command in a (Local)KD session to
inspect them.
You can also use VirtualQueryEx or the !address debugger extension to
accomplish the same.

2) Location of stack segment of loaded image file, access rights, and other issues

Stack segment and data segment are the same by default in user mode.
Access rights to the stack are implemented with page protection techniques
rather than segment protection.
Again, look at the VirtualQueryEx output to understand the guard-page
concept.

0:000> r
eax=77fc35e6 ebx=7ffdf000 ecx=00000000 edx=77f8ed01 esi=77fc23b4 edi=00191f08
eip=77f43847 esp=0006fb60 ebp=0006fc48 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

c) ..Anyone know how to create "alias" selectors in your running image?

You can use the unsupported call-gate techniques (for the GDT)
and/or the undocumented/unsupported NtSetLdtEntries to replace/set/change
LDT entries, and change the segment register to something different.
I would advise you to look at the fiber code in kernel32 to see how to
switch stacks in a more supported way.
I hope this answers the unclear concept of an "alias selector in an image".
Selectors are an x86-specific CPU concept, while an image is a Win32 concept.

Now some short advice:

If you had reverse engineered the CreateProcess code, you would see how
the process is created, how the stack is created, and how the thread is
created.
Given that the 3 concepts are separate, I can see a possibility for you to
create the process first, map your data over there, set the stack to be over
the mapped section, and create a thread that executes over that stack.
This will interfere with the stack-commit code for stacks in MM, but if you
can commit the region of memory in advance, and swear to never overflow the
stack, you should not have to expand it.

Caveat: the answers above are generally true for supported NT-derivative OSes.

I also don't understand what your final goal is: having small working-set
applications?
An application would anyway use OS code that is shared across all the
running applications, and the virtual address space of 4 GB per application
is not a waste, exactly because it's virtual.
But maybe I did not understand your comments at the end of the post.

BTW, you chose a wide set of newsgroups to cross-post to, which does not
seem appropriate in this case.
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Coenraad Loubser

22 years ago

Hi, Thanks!

Yeah, I'm not all that active on the newsgroups and have no idea where a
post like this really belongs, so it was just a shot in the dark! I know my
questions were vaguely formulated, but you understood them right!

The problem is that XP and 98SE (and Windows 95, 98, 2000 for that matter)
don't seem to share a common mapping region when mapping a file with
MapViewOfFileEx.

If you leave the address up to Windows, XP maps a first 1 MB file to
0x00840000 and 98SE to 0x8d224000. (I presume that these addresses further
depend on which DLLs are used and which other applications are resident.)
When you specify an address manually, some of these addresses are rejected.
I've tried some addresses by trial and error, and XP seems to favour
addresses in the 0x02000000-0x08000000 region, while 98SE favours addresses
from 0x90000000 and up.

Now, why 50 different applications can't map 50 different files to the same
(unused) region in their own virtual address space, I don't really
understand. Or I could, but shouldn't it NOT work like that?

The main things I'm trying to avoid are having to fix up loaded pointers
(and having to keep track of them in the first place) and having different
data chunks to load under different OSes.

Aha, but I'm being silly, aren't I. Because 0x00400000-to-wherever is shared
under all of the OS's!!

The only other thing I'd like to do then is get the stack to run in this
mapped region, just for a short section of code. I know I could just copy
the data to the stack, but I'm trying to avoid this. When I set SS=DS it
does pack a lot of garbage (from other threads, no doubt) into this region,
but eventually it crashes. (It must be the access rights, because I've given
it many megabytes of committed space!) I will look at VirtualQueryEx and
also CreateFiberEx; maybe that's just the starting point I was looking for...

Thanks

Coenraad

Post by Ivan Brugiolo [MSFT]
Selectors are an x86-specific CPU concept, while an image is a Win32 concept.

Far as I'm concerned Win32 only runs on x86's! :)

Post by Ivan Brugiolo [MSFT]
If you had reverse engineered the CreateProcess code, you would see how
the process is created, how the stack is created, and how the thread is
created.

Something like this could take DAYS for me, just to distinguish where it's
doing what! Shouldn't it just be properly documented and freely accessible
somewhere?

Post by Ivan Brugiolo [MSFT]
Given that the 3 concepts are separate, I can see a possibility for you to
create the process first, map your data over there, set the stack to be over
the mapped section,

I presume that this is where windows maps the various segments of the PE
image, right after it creates the process. But my guess is that it won't use
the regular API calls to do so, so it may be hard to track...

Post by Ivan Brugiolo [MSFT]
and create a thread that executes over that stack.
This will interfere with the stack-commit code for stacks in MM, but if you
can commit the region of memory in advance, and swear to never overflow the
stack, you should not have to expand it.

Arg, it sounds too complicated. I wouldn't even know where to start to
change the stack for a thread. All I know how to do is PUSH DS; POP SS, and
I was hoping that there's a simple call or something that could make that
work!

Post by Ivan Brugiolo [MSFT]
Caveat: the answers above are generally true for supported NT-derivative OSes.
I also don't understand what your final goal is: having small working-set
applications?
An application would anyway use OS code that is shared across all the
running applications, and the virtual address space of 4 GB per application
is not a waste, exactly because it's virtual.
But maybe I did not understand your comments at the end of the post.

I love simplicity and elegance. Imagine this:

When you call an API function, everything is pushed onto the stack. So you
could, for example, push the parameters for 5 functions and then call them
consecutively, with no pushes in between. Now, you could just as well load
all that data from a file, and then just run a loop that calls a function a
set number of times.

This works quite nicely, since you can first pop the number of times off and
run the loop: maybe after calling a function (CreateWindow), pop the offset
where to store the handle, or pop how many fixups there are and store what
needs to go there, then repeat the loop. So you can do a whole lot of
initialisations while eliminating thousands of push instructions, by using a
small data file and a small init routine.

PE Exe file

  .data section (0xStackofs): data in a pushed-on-stack order for creating
  a few classes, windows, fonts, etc, with various sub-controls and whatever

  .code section 0x400000: maybe 100 bytes of code that does simply this:

    pop n
    repeat n times
      call createwindow
      pop fixups
      jz nextloop
      repeat fixups times
        pop address
        mov address,eax

Voilà!
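
For readers who want to experiment with the idea without touching the real
stack, here is a minimal C sketch of the same loop. It is only an
illustration under stated assumptions, not the loader described in this
post: the data is read from an ordinary array instead of ESP, fake_create is
a hypothetical stand-in for CreateWindow, each record is assumed to carry
exactly two arguments, and fixups are slot indices rather than raw addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CreateWindow: returns a fake "handle". */
static uint32_t fake_create(uint32_t kind, uint32_t parent) {
    return kind * 100u + parent;
}

/* Cursor over the data stream; pop() plays the role of the POP instruction. */
typedef struct { const uint32_t *words; size_t pos, len; } Stream;

static int pop(Stream *s, uint32_t *out) {
    if (s->pos >= s->len) return 0;       /* stream exhausted */
    *out = s->words[s->pos++];
    return 1;
}

/* Stream layout assumed by this sketch:
 *   n, then n records of: kind, parent, fixup_count, fixup slot indices...
 * Returns the number of calls made, or -1 if the stream is malformed. */
int run_loader(const uint32_t *words, size_t len,
               uint32_t *slots, size_t nslots) {
    assert(words != NULL && slots != NULL);
    Stream s = { words, 0, len };
    uint32_t n, kind, parent, fixups, slot;

    if (!pop(&s, &n)) return -1;                       /* pop n */
    for (uint32_t i = 0; i < n; i++) {                 /* repeat n times */
        if (!pop(&s, &kind) || !pop(&s, &parent)) return -1;
        uint32_t handle = fake_create(kind, parent);   /* call createwindow */
        if (!pop(&s, &fixups)) return -1;              /* pop fixups */
        for (uint32_t f = 0; f < fixups; f++) {        /* repeat fixups times */
            if (!pop(&s, &slot) || slot >= nslots) return -1;
            slots[slot] = handle;                      /* mov address,eax */
        }
    }
    return (int)n;
}
```

Calling run_loader with a stream describing two records stores the first
fake handle in one slot and the second in two slots. Unlike the raw POP
version, the sketch checks bounds on every read, which costs a few
instructions but keeps a truncated data file from running off the end.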

The conventional sickens me! .RC files! Arg, please!

Manually editing text in a CreateWindow call in a .CPP or .PP source file!
Arg!

I've got a nice graphical interface for editing this stack-format file, and
a small loader.

Well, modern CPUs may be better optimised to execute large linear chunks of
code, and disk space may be almost free, but I'm sure this would execute
just as fast, maybe faster, and it will definitely run better on older PCs
(of which there are a lot in the world that could be put to good use with
efficient software!)

I have 486s configured to boot Windows 95 quicker than 1 GHz P4 PCs can
boot XP. I live in a poverty-stricken country where even the oldest,
crappiest PCs can be put to good use.

I can tell you, from my experience and experiments, I have found that around
50-60% of the code in most Windows applications is utterly redundant, a
by-product of ancient and arcane compiler technology. Imagine some kinda
super-recompiler that can make your 2 GHz PC perform like a 5 GHz one.....
it's not gonna happen, but better compilers and OSes are going to be
developed...

Ivan Brugiolo [MSFT]

22 years ago

I'm not at all familiar with the Win9x OSes, but NT-derivative kernels
return the first available address that spans the requested size for a
file-mapping request.
Attempts to rely on a given number are pointless, given all the factors
that are involved in the address space layout construction.
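
One common way to stop depending on where a view lands is to store byte
offsets from the start of the view instead of absolute pointers, so no
fixups are needed whatever address the OS picks. The following is a minimal
C sketch of that idea, not something taken from this thread: the Node layout
and the helper names are hypothetical, offset 0 is reserved to mean "null",
and the base pointer is assumed to be suitably aligned (as a mapped view
would be).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record stored inside the mapped file. Links are byte
 * offsets from the start of the view; 0 terminates the list. */
typedef struct {
    uint32_t next_off;   /* offset of the next Node, 0 = end of list */
    uint32_t value;
} Node;

/* Turn an offset into a usable pointer for wherever the view is mapped. */
static Node *node_at(void *base, uint32_t off) {
    return off ? (Node *)((unsigned char *)base + off) : NULL;
}

/* Walk the list starting at first_off and add up the values. The same
 * file contents give the same result at any base address. Assumes the
 * list is well formed (no cycles, offsets inside the view). */
uint32_t sum_list(void *base, uint32_t first_off) {
    assert(base != NULL);
    uint32_t total = 0;
    for (Node *n = node_at(base, first_off); n != NULL;
         n = node_at(base, n->next_off))
        total += n->value;
    return total;
}
```

With a real mapping, base would be whatever MapViewOfFile or MapViewOfFileEx
returned. The cost is one add per pointer dereference, in exchange for a
data file that loads unchanged on any OS and at any address.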

50 applications can map a view of the same file at 50 (potentially
different) addresses, because that's the whole purpose of the virtual
address space.

0x00400000 is not shared under all the OSes.
It may just happen by pure luck that the loader honors the base address of
the image set by the linker, because there was no other module or address
range competing for that address.
This is of course not guaranteed to happen.

You don't need to replace the segment registers on x86, since the stack and
data segments span the whole available user-mode address space anyway.

Win32 runs on x86, IA64, AMD64 and it used to run on Alpha, AXP64, MIPS,
PPC, i860.
If you think for a second about the different exception handling
implementations in IA64 and AMD64, and the different calling mechanism on
IA64, you will realize why there's no point in the OS exposing unportable
and useless features.
Fibers are the exposed way to switch to a different stack from the current
thread.
I won't go over the complications of explaining what a backing storage is on
IA64, why it's essential that the OS manages it, and why you cannot simply
map a piece of memory and have that be the stack/backing storage.

As far as using an arbitrary stack as pre-loaded storage of data goes,
I would advise you to understand what else goes into a stack, such as the
exception unwind information.
I'm really curious how you would load that from a mapped file and have it
work properly.