Template cache inefficiency under Windows

7 messages

Daniel Dekany
A not-too-big issue for Windows-only apps: basically, we cache parsed
files using their path as the key. Under Windows the same file can be
referred to by various names that differ only in their case,
like /bar.ftl and /Bar.ftl. Yet, if both variations are used to get the
template, the template will be loaded, parsed and then cached twice
(or even more times -- there are several other case variations). I
wonder if something could be done to prevent this. Like, detecting whether
the filesystem is case sensitive... what is the most reliable way of
doing that? Checking whether the "os.name" system property contains
"Windows" or "WinCE", or whether "file.separator" is "\", comes to my
mind, but I'm not sure how reliable these are in practice.
(Canonicalizing all paths would be a solution in principle, but it
would require canonicalization whenever a template is requested,
regardless of whether it is already in the cache, which is certainly
not a good idea, given that canonicalization can entail I/O.) (Or, for
apps that should run on UN*X as well: maybe the TemplateLoader
implementation could check under Windows whether the case was correct.
But then again, how can I be really sure that I'm under Windows?)
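For concreteness, here is a minimal Java sketch of the system-property heuristic described above. It uses only the standard "os.name" and "file.separator" properties; the class and method names are hypothetical. As the replies point out, a positive answer only guesses the OS family and says nothing reliable about the case sensitivity of the file system actually in use.

```java
import java.util.Locale;

// Hypothetical helper illustrating the heuristic only: it guesses the OS family,
// not whether a particular file system compares names case-insensitively.
final class PlatformHeuristic {
    private PlatformHeuristic() {}

    static boolean looksLikeWindows() {
        String osName = System.getProperty("os.name", "").toLowerCase(Locale.ROOT);
        String separator = System.getProperty("file.separator", "/");
        // Checks the two os.name substrings mentioned above, plus the separator.
        return osName.contains("windows") || osName.contains("wince") || separator.equals("\\");
    }
}
```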

--
Best regards,
 Daniel Dekany


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
FreeMarker-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/freemarker-devel

Re: Template cache inefficiency under Windows

Attila Szegedi
On Wed, 11 Apr 2007 17:04:08 +0200, Daniel Dekany <[hidden email]>  
wrote:

> A not-too-big issue for Windows-only apps: basically, we cache parsed
> files using their path as the key. Under Windows the same file can be
> referred to by various names that differ only in their case,
> like /bar.ftl and /Bar.ftl. Yet, if both variations are used to get the
> template, the template will be loaded, parsed and then cached twice
> (or even more times -- there are several other case variations). I
> wonder if something could be done to prevent this. Like, detecting whether
> the filesystem is case sensitive... what is the most reliable way of
> doing that? Checking whether the "os.name" system property contains
> "Windows" or "WinCE", or whether "file.separator" is "\", comes to my
> mind, but I'm not sure how reliable these are in practice.
> (Canonicalizing all paths would be a solution in principle, but it
> would require canonicalization whenever a template is requested,
> regardless of whether it is already in the cache, which is certainly
> not a good idea, given that canonicalization can entail I/O.) (Or, for
> apps that should run on UN*X as well: maybe the TemplateLoader
> implementation could check under Windows whether the case was correct.
> But then again, how can I be really sure that I'm under Windows?)

You can't really be sure. And even if you were, Windows can handle
arbitrary filesystems, not all of them case-insensitive. As a matter of
fact, an NTFS partition on Windows NT-based OSes can be used as both case
sensitive and case insensitive at the same time. In general, accessing the
filesystem through the Win32 OS personality[*] will cause it to act as
case insensitive, and accessing it through the POSIX OS personality will
cause it to act as case sensitive. Nice ball of hair, huh? :-)

And, you'd really have to solve this at the TemplateLoader level, because a
TemplateCache can use any loader, and some loaders could be case sensitive
regardless of what OS they run on. I don't really see a better solution  
than doing filename canonicalization, but we'd have to extend  
TemplateLoader to be able to return a "canonical" template path.
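A minimal sketch of what such an extension point could look like, assuming a file-based loader. The interface `CanonicalizingLoader` and its `canonicalName` method are hypothetical, not part of FreeMarker's actual TemplateLoader API. The file-based implementation leans on `File.getCanonicalPath()`, which may perform I/O, and it rejects names that resolve outside the base directory.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical extension point (not FreeMarker's real API): maps any requested
// template name to one canonical key, so equivalent names yield equal keys.
interface CanonicalizingLoader {
    String canonicalName(String requestedName) throws IOException;
}

// File-based sketch. File.getCanonicalPath() may do I/O: it resolves symbolic
// links and, on some platforms, may normalize the case of existing path components.
class FileCanonicalizingLoader implements CanonicalizingLoader {
    private final File baseDir;
    private final String basePrefix;

    FileCanonicalizingLoader(File baseDir) throws IOException {
        this.baseDir = baseDir.getCanonicalFile();
        this.basePrefix = this.baseDir.getPath() + File.separator;
    }

    @Override
    public String canonicalName(String requestedName) throws IOException {
        String canonical = new File(baseDir, requestedName).getCanonicalPath();
        if (!canonical.startsWith(basePrefix)) {
            throw new IOException("Template name escapes the base directory: " + requestedName);
        }
        // Return the path relative to the base directory, using '/' as separator.
        return canonical.substring(basePrefix.length()).replace(File.separatorChar, '/');
    }
}
```

With this, "bar.ftl", "./bar.ftl" and "sub/../bar.ftl" all map to the same key. Whether "Bar.ftl" also maps to "bar.ftl" depends on what the platform's canonicalization does, which is exactly the uncertainty discussed in this thread.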

Another solution would be to turn to cryptography -- calculate a
cryptographic hash of the template source code (e.g. SHA-1, which yields
160 bits), and if it matches another Template object's hash, create a new
Template object that reuses the other's root TemplateElement. But I really
think that's overkill.
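A small sketch of the content-hash idea, for illustration only. The `parser` function is a hypothetical stand-in for the real template parser, and the class is single-threaded. It uses SHA-1 via `java.security.MessageDigest`, which every Java platform is required to support. Note that this deduplicates parsing, not loading: each name's source still has to be read in order to hash it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: sources with the same SHA-1 digest share one parsed tree.
// "parser" is a hypothetical stand-in for real template parsing. Not thread-safe.
class ContentHashDeduplicator<T> {
    private final Map<String, T> parsedByDigest = new HashMap<>();
    private final Function<String, T> parser;
    private int parseCount = 0;

    ContentHashDeduplicator(Function<String, T> parser) {
        this.parser = parser;
    }

    T parsedTreeFor(String source) {
        String digest = sha1Hex(source);
        // Parse only if no earlier source had the same digest.
        return parsedByDigest.computeIfAbsent(digest, d -> {
            parseCount++;
            return parser.apply(source);
        });
    }

    int parseCount() {
        return parseCount;
    }

    private static String sha1Hex(String source) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] hash = md.digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }
}
```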

Also, the problem is not Windows-specific -- you can have filesystems in
Linux and other Unices that aren't case sensitive. E.g., by default, the
HFS+ filesystem of Mac OS X is not case sensitive either (but you can
format it to be case sensitive).
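Since case sensitivity belongs to the file system (and even to the access path), probing the actual directory is arguably closer to the truth than sniffing the OS name. Below is a hypothetical probe, sketched under the assumption that the directory contains at least one entry whose name has letters. It is still only a guess for that one directory: other directories may live on other mounts, and names whose upper-case form changes length (such as "ß") can mislead it.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical probe: guesses whether name lookups in one directory ignore case.
final class CaseSensitivityProbe {
    private CaseSensitivityProbe() {}

    /** TRUE = lookups seem case-insensitive, FALSE = case-sensitive, null = cannot tell. */
    static Boolean isCaseInsensitive(File dir) {
        String[] names = dir.list();
        if (names == null) return null; // not a directory, or an I/O error
        for (String name : names) {
            String upper = name.toUpperCase(Locale.ROOT);
            String variant = upper.equals(name) ? name.toLowerCase(Locale.ROOT) : upper;
            if (variant.equals(name)) continue; // no letters whose case could change
            if (Arrays.asList(names).contains(variant)) {
                return Boolean.FALSE; // both spellings exist as distinct entries
            }
            // If the differently-cased name still finds a file, lookups ignore case.
            return new File(dir, variant).exists();
        }
        return null; // empty directory, or no suitable entry
    }
}
```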

Attila.

--
home: http://www.szegedi.org
weblog: http://constc.blogspot.com

[*] WinNT-based Windows versions expose distinct sets of APIs to the
software that runs on them, referred to as "personalities". The "usual"
personality is of course Win32, but there's also a POSIX personality,
and an optionally installable OS/2 personality as well. These allow
non-GUI programs written for the POSIX and OS/2 APIs to be compiled and
run natively on these OSes.


Re: Template cache inefficiency under Windows

Randall R Schulz
In reply to this post by Daniel Dekany
On Wednesday 11 April 2007 08:04, Daniel Dekany wrote:
> A not-too-big issue for Windows-only apps: basically, we cache parsed
> files using their path as the key. Under Windows the same file can be
> referred to by various names that differ only in their case,
> like /bar.ftl and /Bar.ftl. Yet, if both variations are used to get the
> template, the template will be loaded, parsed and then cached twice
> (or even more times -- there are several other case variations).
> I wonder if something could be done to prevent this. ...

Check out how Cygwin (<http://cygwin.com/>) computes its pseudo inode
numbers (visible when, e.g., you use the "-i" option of "ls"). If I
recall correctly, that number is stable over time and uniquely
identifies a given file. It should make a fine key for your associative
store, if you can get at the same information they use.

Of course, you may well need JNI code to do that.
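For what it's worth, Java 7 and later (which postdate this thread) offer a partial pure-Java counterpart to such pseudo-inode numbers: `BasicFileAttributes.fileKey()`, which returns an object uniquely identifying the file, or null if no file key is available. Its Javadoc notes that uniqueness is only guaranteed while the file system and files remain static. The sketch below, with a hypothetical `FileIdentity` helper, falls back to `Path.toRealPath()` when no key is provided. Either way, each lookup still costs I/O, just like canonicalization.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Sketch: a cache key that identifies the file itself rather than the name used.
final class FileIdentity {
    private FileIdentity() {}

    static Object identityKey(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        Object key = attrs.fileKey(); // may be null where the platform offers no such key
        // Fallback: the real path (symbolic links resolved, redundant elements removed).
        return key != null ? key : path.toRealPath();
    }
}
```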


Randall Schulz

Re: Template cache inefficiency under Windows

Leos Literak
Randall R Schulz wrote:

> On Wednesday 11 April 2007 08:04, Daniel Dekany wrote:
>> A not-too-big issue for Windows-only apps: basically, we cache parsed
>> files using their path as the key. Under Windows the same file can be
>> referred to by various names that differ only in their case,
>> like /bar.ftl and /Bar.ftl. Yet, if both variations are used to get the
>> template, the template will be loaded, parsed and then cached twice
>> (or even more times -- there are several other case variations).
>> I wonder if something could be done to prevent this. ...
>
> Check out how Cygwin (<http://cygwin.com/>) computes its pseudo inode
> numbers (visible when, e.g., you use the "-i" option of "ls"). If I
> recall correctly, that number is stable over time and uniquely
> identifies a given file. It should make a fine key for your associative
> store, if you can get at the same information they use.
>
> Of course, you may well need JNI code to do that.

I can understand that it is suboptimal. The question is whether
the time spent on this topic is worth the gain, especially when I see
JNI and platform-dependent checks... Do you have any benchmarks that
show this is a real issue?

I can tell from my experience (unconfirmed by any kind of measurement)
that FreeMarker is very fast. The user cannot distinguish the first load
of a page from subsequent ones, so template loading costs next to nothing.
You would need thousands of templates to feel the difference...

Well, I am just curious whether the optimization would lead to a real gain.
Sincerely

Leos


Re: Template cache inefficiency under Windows

Attila Szegedi
On Fri, 13 Apr 2007 23:06:48 +0200, Leos Literak <[hidden email]>  
wrote:

>
> I can understand that it is suboptimal. The question is whether
> the time spent on this topic is worth the gain, especially when I see
> JNI and platform-dependent checks... Do you have any benchmarks that
> show this is a real issue?
>
> I can tell from my experience (unconfirmed by any kind of measurement)
> that FreeMarker is very fast. The user cannot distinguish the first load
> of a page from subsequent ones, so template loading costs next to nothing.
> You would need thousands of templates to feel the difference...
>
> Well, I am just curious whether the optimization would lead to a real gain.

I don't think it would -- I answered Daniel on a theoretical level, but
I don't think I'd ever feel the need to complicate the code because of  
this; it indeed has a very dubious benefit :-)

Attila

--
home: http://www.szegedi.org
weblog: http://constc.blogspot.com

Re: Template cache inefficiency under Windows

Daniel Dekany
Saturday, April 14, 2007, 10:03:19 AM, Attila Szegedi wrote:


> I don't think it would -- I answered Daniel on a theoretical level, but
> I don't think I'd ever feel the need to complicate the code because of
> this; it indeed has a very dubious benefit :-)

Sure, it just bugged me (as I had just dealt with something
similar to FreeMarker's caching). Since then I have also decided that
I'd rather not care.

BTW, where case sensitivity does have real importance is when you
develop something on a case-insensitive system (like Windows, or even
OS X, if I understand correctly), and it is later deployed to a
case-sensitive one (like a UN*X server), where it suddenly breaks. I
know, one should develop on UN*X if the users run UN*X...

--
Best regards,
 Daniel Dekany


Re: Template cache inefficiency under Windows

Daniel Dekany
Saturday, April 14, 2007, 2:35:01 PM, Daniel Dekany wrote:

> Sure, it just bugged me (as I had just dealt with something
> similar to FreeMarker's caching). Since then I have also decided that
> I'd rather not care.

Er... I seem to be unable to not care... :) So, for those who care
too, here is what I found out and how it fails. (Funny how storage
mechanisms tend to make it impossible to put a *decent quality*
parsed-thing cache in front of them... as if they were worth much
without one. A parsed-thing cache in front of JCR/Jackrabbit, anyone?
Eh... toy-system standards, these are.)

So, I have found out that the only true solution is -- not
surprisingly -- to canonicalize the template ID (the template path).
Among other things, this would resolve aliases due to soft links under
UNIX, and eliminate the case variations under Windows. I have also
reached the conclusion that the Map key in the cache should *not* be
the canonical ID, but the "requested" ID, which, however, could be
mapped to the canonical ID, which in turn should be associated with
the parsed template. This way, even though there could be a cache
entry for each equivalent template ID that was ever requested, the
template itself would still be loaded + parsed + cached only once
(once for each different canonical ID). And, since the keys are not
the canonical IDs, one could get a template by a previously requested
non-canonical ID without causing canonicalization (I/O) again. Of
course, just as cached templates have an update delay, the
canonicalization results should have an update delay as well (tied
together with template updating, but the details are not interesting
now).

Easy. Except that, at least as far as a "FileTemplateLoader" is
concerned, it doesn't work reliably (no, not because of the update
delays; they are irrelevant). Say I have two IDs, ID1 and ID2, and ID1
is the path of a link in the filesystem to the ID2 file (so ID1's
canonical form is ID2). If ID1 is later modified to point to ID3
instead, and after that ID2's content is modified, then it can happen
that when the requested template is ID1, the TemplateLoader returns
the new content of ID2, even though there was never a moment when the
ID1 link pointed to a file with such content. It can happen because
the TemplateLoader must separate the canonicalization and the loading
steps. So first it canonicalizes ID1 to ID2; then meanwhile, if we are
unlucky, ID1 is modified to point to ID3 and ID2's content is
modified; and then the TemplateLoader starts to read ID2, which now
contains the new content. Bang!
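The requested-ID to canonical-ID to parsed-template design described above can be sketched as two maps. All names here are hypothetical, the `canonicalizer` and `loadAndParse` functions stand in for real loader operations, and update delays are deliberately left out. Because canonicalization and loading remain two separate calls, this sketch does not avoid the race just described.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical two-level cache: requested ID -> canonical ID -> parsed template.
// Update delays/expiration are deliberately omitted from this sketch.
final class TwoLevelTemplateCache<T> {
    private final ConcurrentHashMap<String, String> canonicalIdByRequestedId = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, T> templateByCanonicalId = new ConcurrentHashMap<>();
    private final Function<String, String> canonicalizer; // may entail I/O
    private final Function<String, T> loadAndParse;       // called with a canonical ID

    TwoLevelTemplateCache(Function<String, String> canonicalizer, Function<String, T> loadAndParse) {
        this.canonicalizer = canonicalizer;
        this.loadAndParse = loadAndParse;
    }

    T get(String requestedId) {
        // Canonicalization happens only the first time a given spelling is requested.
        String canonicalId = canonicalIdByRequestedId.computeIfAbsent(requestedId, canonicalizer);
        // Loading and parsing happen only once per canonical ID.
        return templateByCanonicalId.computeIfAbsent(canonicalId, loadAndParse);
    }
}
```

Requesting "/bar.ftl" and then "/Bar.ftl" twice would canonicalize twice (once per spelling) but load and parse only once, assuming the canonicalizer maps both spellings to the same ID.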

> BTW, where case sensitivity does have real importance is when you
> develop something on a case-insensitive system (like Windows, or even
> OS X, if I understand correctly), and it is later deployed to a
> case-sensitive one (like a UN*X server), where it suddenly breaks. I
> know, one should develop on UN*X if the users run UN*X...

This one doesn't seem easy either. How can I get the filename in its
original case with the Java API, so that the TemplateLoader can refuse
to load the template under any other name? Maybe from the
canonicalized path... but that's not guaranteed at all. So it seems
that I just can't reliably get it with the Java API. I know, JNI... but
just to solve this? No way.
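One possible pure-Java approach, offered as a sketch rather than a guaranteed answer: compare each path component against the parent directory's listing from `File.list()`. This assumes that directory listings report names as stored on disk, even on case-insensitive file systems, which is common but is an assumption here. Unicode normalization differences (HFS+ stores decomposed forms) can also cause false mismatches for non-ASCII names. It costs one directory listing per path component, so one might enable it only during development. The `ExactCaseCheck` name is hypothetical.

```java
import java.io.File;

// Hypothetical check: does relativePath (with '/' separators) match the on-disk
// spelling of every component exactly, as reported by directory listings?
final class ExactCaseCheck {
    private ExactCaseCheck() {}

    static boolean hasExactCase(File baseDir, String relativePath) {
        File current = baseDir;
        for (String component : relativePath.split("/")) {
            if (component.isEmpty() || component.equals(".")) continue;
            if (component.equals("..")) return false; // keep the sketch simple: reject parent references
            String[] entries = current.list();
            if (entries == null) return false; // not a directory, or an I/O error
            boolean found = false;
            for (String entry : entries) {
                if (entry.equals(component)) { // exact, case-sensitive comparison
                    found = true;
                    break;
                }
            }
            if (!found) return false;
            current = new File(current, component);
        }
        return true;
    }
}
```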

Well, this is why it is incredibly hard to do things (near-)perfectly...
A TemplateLoader mechanism is surely not the most complex issue in a
template engine, and yet...

--
Best regards,
 Daniel Dekany

