Post by tenochtitlanuk on Jan 26, 2021 6:56:53 GMT -5
Revisited code showing the terrible behaviour of LB4 word$ on long strings.
The code below reads a large newline-separated dictionary (available online: search for 'unixdict.txt', which is linked from Rosetta Code).
In LB4 my alternative 'MyWord$(' works as expected, while the native LB4 'word$(' takes ages.
In LB5 NEITHER works - they see the dictionary but fail to use the separator character I ask for...
Any ideas??
' OK in LB4 but fails in LB5
open "unixdict.txt" for input as #fIn   ' ASCII text with chr$( 10) separator
f$ =input$( #fIn, lof( #fIn))
close #fIn
print "Searching started..."
data 1, 2, 3, 10, 100, 1000, 9000
do
    read N
    now =time$( "ms")
    print "MyWord$ found word # "; using( "######", N); " to be <"; MyWord$( f$, N, chr$( 10)); "> taking ", using( "######", time$( "ms") -now); " ms."
    now =time$( "ms")
    print "LBword$ found word # "; using( "######", N); " to be <"; word$( f$, N, chr$( 10)); "> taking ", using( "######", time$( "ms") -now); " ms."
    print ""
loop until N =9000
print "Done."
end
function MyWord$( source$, rank, sep$)  ' get round LB4 word$( problem on large strings.
    if rank =1 then
        e =instr( source$, sep$)
        MyWord$ =left$( source$, e -1)
    else
        L =len( source$)
        p =0
        rank =rank -1
        do
            p =instr( source$, sep$, p +1)
            rank =rank -1
            scan
        loop until rank <=0
        e =instr( source$, sep$, p +2)
        MyWord$ =mid$( source$, p +1, e -p -1)
    end if
end function
Post by Carl Gundel on Jan 26, 2021 11:45:49 GMT -5
Revisited code showing the terrible behaviour of LB4 word$ on long strings.
The code below reads a large newline-separated dictionary (available online: search for 'unixdict.txt', which is linked from Rosetta Code). In LB4 my alternative 'MyWord$(' works as expected, while the native LB4 'word$(' takes ages. In LB5 NEITHER works - they see the dictionary but fail to use the separator character I ask for... Any ideas??
By terrible behavior do you mean slow performance? word$() is a general purpose function; for some kinds of problems it makes things faster. You've discovered a use case that is very slow. Perhaps you will get better performance by reading the file and parsing out each item as you read? Do you have a simple example that demonstrates the LB5 bad behavior?
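As a rough sketch of that read-and-parse-as-you-go approach (not code from the thread; the target index is arbitrary, and it assumes unixdict.txt is one word per chr$( 10)-terminated line, as the code above describes):

' illustrative sketch only: fetch the Nth word by reading line by line
' instead of pulling one huge string apart with word$()
wanted =1000                          ' arbitrary target index
open "unixdict.txt" for input as #fIn
count =0
while eof( #fIn) =0 and count <wanted
    line input #fIn, w$               ' read the next word
    count =count +1
wend
close #fIn
if count =wanted then print "word # "; wanted; " is <"; w$; ">"

Each word is only touched once, so the time to reach word N grows with N rather than with the length of the whole buffer on every lookup.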
Post by Carl Gundel on Jan 27, 2021 14:14:27 GMT -5
Post by Carl Gundel on Jan 27, 2021 15:26:33 GMT -5
Oh, that link doesn't really work so well.
Post by tenochtitlanuk on Jan 27, 2021 16:12:36 GMT -5
The link is straight from the Rosetta Code task ( rosettacode.org/wiki/Words_containing_%22the%22_substring ), which is trivial with a fast word$(...
web.archive.org/web/20180611003215/http://www.puzzlers.org/pub/wordlists/unixdict.txt works fine for me.
I've lived with the anomalous timing of LB's word$, and know how to get round it, and thought LB5 had it cracked - but it fails as I've noted on this long-string dictionary - plain ASCII text with chr$( 10) separators.
If you run my code you'll see the timings do not make sense - the current version gave the following printout. How can it take so long to find a first word, and nearly the same for the last one??!!

Searching started...
MyWord$ found word # 1 to be <10th> taking 9 ms.
MyWord$ found word # 1 to be <10th> taking 2 ms.
LBword$ found word # 1 to be <10th> taking 14466 ms.
LBword$ found word # 1 to be <10th> taking 15187 ms.

MyWord$ found word # 1 to be <1st> taking 5 ms.
MyWord$ found word # 2 to be <1st> taking 20 ms.
LBword$ found word # 1 to be <1st> taking 15830 ms.
LBword$ found word # 2 to be <1st> taking 15813 ms.

MyWord$ found word # 1 to be <2nd> taking 4 ms.
MyWord$ found word # 3 to be <2nd> taking 22 ms.
LBword$ found word # 1 to be <2nd> taking 15835 ms.
LBword$ found word # 3 to be <2nd> taking 15911 ms.

MyWord$ found word # 1 to be <9th> taking 26 ms.
MyWord$ found word # 10 to be <9th> taking 25 ms.
LBword$ found word # 1 to be <9th> taking 15847 ms.
LBword$ found word # 10 to be <9th> taking 15832 ms.

MyWord$ found word # 1 to be <absence> taking 183 ms.
MyWord$ found word # 100 to be <absence> taking 185 ms.
LBword$ found word # 1 to be <absence> taking 15958 ms.
LBword$ found word # 100 to be <absence> taking 15862 ms.

MyWord$ found word # 1 to be <annals> taking 1571 ms.
MyWord$ found word # 1000 to be <annals> taking 1562 ms.
LBword$ found word # 1 to be <annals> taking 15856 ms.
LBword$ found word # 1000 to be <annals> taking 15915 ms.

MyWord$ found word # 1 to be <grandmother> taking 12931 ms.
MyWord$ found word # 10000 to be <grandmother> taking 12908 ms.
LBword$ found word # 1 to be <grandmother> taking 15865 ms.
LBword$ found word # 10000 to be <grandmother> taking 16059 ms.

MyWord$ found word # 1 to be <windowpane> taking 73883 ms.
MyWord$ found word # 100000 to be <windowpane> taking 75945 ms.
LBword$ found word # 1 to be <> taking 21180 ms.
LBword$ found word # 100000 to be <> taking 16356 ms.
Done.

One bright spot - there are some very odd words in this dictionary, presumably scraped from some Unix documentation. It includes US cities/towns I'd never heard of!! But not Framlingham!!
Post by tsh73 on Jan 27, 2021 16:27:54 GMT -5
Easy if it loads and parses the whole string into an array first. Or a more advanced (slower?) data structure.
Have you read about "Shlemiel the painter's algorithm"? Have a good laugh: www.joelonsoftware.com/2001/12/11/back-to-basics/ - I suspect there might be one here.
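For what it's worth, a minimal sketch of that parse-once idea (not tsh73's code; it splits the f$ read by the program above, and the array size is just a guessed upper bound for unixdict.txt, which holds roughly 25,000 entries). One forward pass of instr( fills the array, after which any lookup is a plain index:

' illustrative sketch: split f$ into an array with a single forward pass
dim words$( 30000)              ' guessed upper bound for unixdict.txt
sep$ =chr$( 10)
p =1
n =0
do
    e =instr( f$, sep$, p)
    if e =0 then e =len( f$) +1 ' last word may lack a trailing separator
    n =n +1
    words$( n) =mid$( f$, p, e -p)
    p =e +1
loop until p >len( f$)
print "parsed "; n; " words; word # 1000 is <"; words$( 1000); ">"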
Post by tenochtitlanuk on Jan 27, 2021 18:52:41 GMT -5
Yup - plenty of ways to make it unnecessarily slow!
Hadn't met Shlemiel before, but a relevant reference. Given that a sub using instr( can gallop through from the start counting delimiters faster than LB's word$(, there has to be a weird algorithm in LB4.
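To spell out the Shlemiel point (an illustration only, not code from the thread; it assumes f$ ends with a separator and N =1000 is an arbitrary count): each separate word$( call has to re-count all the earlier delimiters from character 1, so fetching words 1..N that way crosses roughly N*N/2 separators, while carrying the previous instr( position forward crosses each one just once.

' Shlemiel style - every call starts counting again from character 1
N =1000
for i =1 to N
    w$ =word$( f$, i, chr$( 10))
next i

' single-pass style - remember where the previous separator was
p =0
for i =1 to N
    e =instr( f$, chr$( 10), p +1)    ' next separator after position p
    w$ =mid$( f$, p +1, e -p -1)      ' word i sits between the separators
    p =e
next i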
Kernighan and Ritchie - yeah, part of my youth. I swear by the Wirth formula: 'Algorithms + Data Structures = Programs' is still valid. It is also complete - a program is nothing more than algorithms acting on data structures. And it won't optimise itself!
Also from the same era, Programming Proverbs in Pascal, and Programming in Forth. And the Cambridge programming book I still look at, with the examples in Fortran. (That was my first exposure to a high-level language - we had access to Cambridge's Titan, running an early ancestor of Fortran.)
Post by Carl Gundel on Jan 27, 2021 21:29:34 GMT -5
Easy if it loads and parses the whole string into an array first. Or a more advanced (slower?) data structure. Have you read about "Shlemiel the painter's algorithm"? Have a good laugh: www.joelonsoftware.com/2001/12/11/back-to-basics/ - I suspect there might be one here.
It's slow because it's a naive implementation. When I wrote it, speed was not on my mind. Nobody complained until now, and that's usually why things get optimized - when it becomes obvious that they're slow.
Post by tenochtitlanuk on Jan 28, 2021 7:36:32 GMT -5
Nobody complained until now,
Hmmm... I've mentioned it many times, on this site and the old one, including links to pages on my site explaining it! e.g. see the middle of this page linked from a post on this forum.
Don't get me wrong - there's no way I could write a language from initial specification to largely bug-free completion. I admire and sincerely recommend your creations, Carl, and realise you can't follow all our postings on the various wikis/boards/forums and websites like mine. You're still a saint for me!
Post by Carl Gundel on Jan 28, 2021 11:08:41 GMT -5
Nobody complained until now, Hmmm... I've mentioned it many times, on this site and the old one, including links to pages on my site explaining it! e.g. see the middle of this page linked from a post on this forum. Don't get me wrong - there's no way I could write a language from initial specification to largely bug-free completion. I admire and sincerely recommend your creations, Carl, and realise you can't follow all our postings on the various wikis/boards/forums and websites like mine. You're still a saint for me!
Sorry, I should have said that I don't recall it being complained about. I feel I also owe you a lot of gratitude.