Adding search facility for 999999 entries

Tasp
Full Member

Posts: 215

Adding search facility for 999999 entries Nov 6, 2020 12:17:34 GMT -5

Quote

Post by Tasp on Nov 6, 2020 12:17:34 GMT -5

Good evening.

I have a DB that potentially can have 999999 entries. In reality it won't have, but each entry will have a user defined 6 digit customer number, I store the rest of the users information with other fields of an array.

For example;


names = 1
address = 2
details = 3

DIM customer$ (999999,20)
customer$(2,names) = "joe bloggs"
customer$(2,address) = "999 Letsby Avenue"
customer$(2,details) = "Bad payer"

customer$(456,names) = "john smith"
customer$(456,address) = "12 The Pub"
customer$(456,details) = "Pulls Pints"

print customer$(456,names)

Obviously this isn't the greatest way to store this info, however this is part of a larger program that is very time critical, so rather than search thru a DB on the fly, it's quicker long term to load all the DB into an array at startup. Then just call the details from the array as needed.

But when it comes to DB management, ie editing details trying to create a speedy search routine is slow at best.
This is what I have tried so far;


'sSite.xxxx details are grabbed from textboxes
    FOR a = 1 TO 999999

        IF sSite.Customer$ <> "" OR_
           sSite.Name$ <> "" OR_
           sSite.Address$ <> "" OR_
           sSite.Contactname$ <> "" OR_
           sSite.Telephone$ <> "" OR_
           sSite.Keyno$ <> "" THEN

            IF sSite.Customer$ =      CUSTOMER$(a, 1) THEN NOTICE "FOUND"
            IF sSite.Name$ =        CUSTOMER$(a, 2) THEN NOTICE "FOUND"
            IF sSite.Address$ =     CUSTOMER$(a, 3) THEN NOTICE "FOUND"
            IF sSite.Contactname$ = CUSTOMER$(a, 4) THEN NOTICE "FOUND"
            IF sSite.Telephone$ =   CUSTOMER$(a, 5) THEN NOTICE "FOUND"
            IF sSite.Keyno$ =       CUSTOMER$(a, 6) THEN NOTICE "FOUND"
        END IF
    NEXT

Any suggestions on a better search method? Or am I stuck due to the way the DB is actually created?

Last Edit: Nov 6, 2020 12:18:16 GMT -5 by Tasp

If you're ever being chased by a Police dog, try not to run thru a tunnel, over a see-saw and jump thru a fiery ring, they're trained for that stuff.

Rod
Global Moderator

Posts: 2,621

Adding search facility for 999999 entries Nov 6, 2020 14:18:58 GMT -5

Quote

Post by Rod on Nov 6, 2020 14:18:58 GMT -5

Don't hold everything in an array. Hold it in a RAF and front it with an index array. that is the fastest way to search. It is discussed in this thread. Basically you file the data in a RAF and leave it alone. You just front it with a simple extract array that keeps your search criteria. You need to read some because it takes a bit of digesting.

libertybasiccom.proboards.com/thread/878/simple-contacts-database

Running Win11 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz 64bit with 32Gb ram

alincon
Full Member

Posts: 111

Adding search facility for 999999 entries Nov 6, 2020 16:43:50 GMT -5

Quote

Post by alincon on Nov 6, 2020 16:43:50 GMT -5

[openPerFileNew]
open fileNameNew$ for random as #perFileNew len = 370
field #perFileNew, 25 as name$, 28 as address$, 15 as city$, 2 as state$, 9 as zip$, 3 as apmt$, 1 as gndr$,_ '83
10 as phone1$, 10 as phone2$, 25 as hmEmail$, 10 as emgPhone$, 20 as emgName$,_ '75
9 as ssn$, 5 as dob, 5 as doh, 1 as mar, 1 as hins, 1 as lins, 1 as pens, 1 as savs,_ '24
2 as posn$, 2 as step$, 25 as wkEmail$, 10 as wkPhone$, 1 as exmps,_ '40
5 as earnAmt, 5 as earnDate, 5 as priorAmt, 5 as priorDate, 5 as tmpAmt, 5 as tmpDat1,_ '30
5 as tmpDat2, 6 as eAmty, 5 as pAmty, 5 as tAmty, 4 as sickUsed, 4 as vacnUsed,_ '29
5 as taxWh, 5 as socWh, 5 as medWh, 5 as hinsWh, 5 as linsWh, 5 as savsWh, 5 as pensWh,_ '35
5 as taxWhy,5 as socWhy,5 as medWhy,5 as hinsWhy,5 as linsWhy,5 as savsWhy,5 as pensWhy, _ '35
4 as sickEarn, 4 as vacnEarn, 5 as fill$, 2 as dept$, 2 as sect$, 1 as type$, 1 as xx$
empRecCtr = lof(#perFileNew) / 370
notice "Emp Ctr ";str$(empRecCtr) 'has to be a whole number
return

[MakePerFileIndex]
for n = 1 to empRecCtr
get #perFileNew, n
emps$(n,2) = sect$
emps$(n,4) = lz$(n,3) 'record number is vital
emps$(n,1) = dept$
emps$(n,3) = name$
print name$,dob,doh,earnDate
if dept$ = "99" then
n99 = n99 + 1
delRec(n99) = n 'save for adding new emp records
end if
next
sort emps$(),1,empRecCtr,1 'dept 99 recs sort to end
empRecCtr = empRecCtr - n99
notice "Del Ctr ";str$(n99)
return

Tasp Full Member Posts: 215	Adding search facility for 999999 entries Nov 8, 2020 11:40:52 GMT -5 Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by Tasp on Nov 8, 2020 11:40:52 GMT -5 Well everything is stored in a CSV, so maybe I'm actually searching in the wrong location and should be searching through the file instead. Cheers.
	If you're ever being chased by a Police dog, try not to run thru a tunnel, over a see-saw and jump thru a fiery ring, they're trained for that stuff.

Rod
Global Moderator

Posts: 2,621

Adding search facility for 999999 entries Nov 8, 2020 14:00:05 GMT -5

Quote

Post by Rod on Nov 8, 2020 14:00:05 GMT -5

The secret is to have an ordered extract of the CSV file, so it needs to be in an array to be sorted. Sorting is very fast and need only happen once. Then searching for records means finding the first matching record in the array and similar records will be one step away. So finding similar records is again very fast. The array holds the search data and points to the CSV file record.

So you blast through the extract array and pick out the CSV record as needed.

You cannot order a CSV file very easily, if you can't order it then neither can you search it quickly, it will all be very laborious as you trawl through every CSV record for every search.

Running Win11 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz 64bit with 32Gb ram

tsh73
Global Moderator

Global Moderator

just in case you wondered Anatoly is my real name

Posts: 1,418

Adding search facility for 999999 entries Nov 8, 2020 15:41:46 GMT -5

Quote

Post by tsh73 on Nov 8, 2020 15:41:46 GMT -5

I have a DB that potentially can have 999999 entries. In reality it won't have, but each entry will have a user defined 6 digit customer number,

But how much numbers it'll have?
Because running full search


N=999999
t0=time$("ms")
'first of all
        IF not(sSite.Customer$ <> "" OR_
           sSite.Name$ <> "" OR_
           sSite.Address$ <> "" OR_
           sSite.Contactname$ <> "" OR_
           sSite.Telephone$ <> "" OR_
           sSite.Keyno$ <> "") THEN notice "Nothing to search for!": end

'else
    FOR a = 1 TO N
            IF (sSite.Name$ <> "" and sSite.Name$ =        customer$(a, names))  or _
               (sSite.Address$ <> "" and sSite.Address$ =  customer$(a, address)) _
            THEN
                    t1=time$("ms")
                    NOTICE  t1-t0;" FOUND ";a
                    exit for
            END IF
    NEXT
t11=time$("ms")
print t11-t0, "search"

for 1 000 000 entries - in array - takes on my (old) machine 32 seconds

But if I have array of 10 000 with numbers in 1 .. 999 999 range,

dim used(10000)
maxUsed = 10000
for i = 1 to maxUsed
    used(i) = int(rnd(0)*1000000)
next

I take number from that array - and check it in bigger array

    FOR i = 1 TO maxUsed
        a=used(i)
        SCAN    'if we hope to  ever break it
            IF (sSite.Name$ <> "" and sSite.Name$ =        customer$(a, names))  or _
               (sSite.Address$ <> "" and sSite.Address$ =  customer$(a, address)) _
            THEN
                    t1=time$("ms")
                    NOTICE  t1-t0;" FOUND ";a
                    exit for
            END IF
    NEXT

And it all takes 0.2 seconds

Last Edit: Nov 8, 2020 15:47:55 GMT -5 by tsh73

Rod
Global Moderator

Posts: 2,621

Adding search facility for 999999 entries Nov 9, 2020 10:35:04 GMT -5

Quote

Post by Rod on Nov 9, 2020 10:35:04 GMT -5

I am not sure showing an array example helps. In reality the data will be held in in a file. That may be a RAF or a CSV file. Getting it into an array will take time. Searching the array less time. But if you bring all of the data into an array then you are responsible for edits and changes and for saving those edits and changes. If the data is left in the RAF or CSV file you take on less ownership and can manage the file record by record for edits. That's the old thinking, when programs crashed more often.

Perhaps the new world is super fast, I will need to experiment with some manufactured data.

Running Win11 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz 64bit with 32Gb ram

Rod
Global Moderator

Posts: 2,621

Adding search facility for 999999 entries Nov 11, 2020 9:53:19 GMT -5

Quote

Post by Rod on Nov 11, 2020 9:53:19 GMT -5

Ok this code creates 1,000,000 RAF records of 259 bytes. That is a 250MB file so DO NOT RUN THIS CODE UNLESS YOU ARE PARTICIPATING IN THE DISCUSSION. The file holds one special record at position 500000. So this is a realistic file. Here are my results.

creating RAF file
file creation took 108630
searching RAF file
found
searched entire file in 54763
creating array based index
created array index in 57928
searching array index
found
searched entire index file in 5508

So the file took some time to create but searching through all records in the RAF was quite quick but still nearly 60s to process all those records. I then build an array based index. This was built reasonably quickly but again near 60s to do. Once built searching the entire array took only 5s so an enormous advantage.

The thing about indexes is they can be built once and used often, they can have mixed info like yyyymmddhhmmss They can be sorted. This is a massive advantage because once you find the first record you will instantly find the next, say a days receipts, they will all be sorted together.

I have not played with CSV file structure yet but I think it will be quite slow compared to the RAF, getting the data into Liberty will be the slow point. Out on file a CSV is really quite inaccessible.

So for a really large dataset I would still say, RAF fronted by array based indexes. But if you are managing 1,000,000 records perhaps you need a better tool.

'goto [skip]
open "contact.dat" for output as #1
close #1
Open "contact.dat" For Random As #1 Len = 259
Field #1, 35 As company$, 35 As contact$, 35 As addr1$, 35 As addr2$, 35 As addr3$,_
        10 As zip$, 12 As phone1$,12 As phone2$, 50 As email$
n$="Robert John Paul Lisa Janette Peter Gordon Manuel Harry William Jane Joan Debra"
s$="Houston Wilson Raydon Malcom McDonald House Morrow Menzies Buxton Souness Keen Goran Allan"
b$="Ace Acme Arrow Big Best Custom Fast Cheap Hungry"
d$="Carriers Hauliers Chemicals Cleaners Builders Plumbers Decorators Electricians Insurance"
s=time$("ms")
print "creating RAF file"
for n= 1 to 1000000
    company$=word$(b$,int(rnd(0)*9+1))+" "+word$(d$,int(rnd(0)*9+1))
    contact$=word$(n$,int(rnd(0)*13+1))+" "+word$(s$,int(rnd(0)*13+1))
    addr1$=str$(n)+" First line of address"
    addr2$="Second line of address"
    addr3$="Third line of address"
    zip$="9999"
    if n=500000 then
    phone1$="9999999999"
    else
    phone1$="0123456789"
    end if
    phone2$="0987654321"
    email$=contact$+"@hotmail.com"
    put #1,n
next
print "file creation took ";time$("ms")-s
close #1

[skip]
Open "contact.dat" For Random As #1 Len = 259
Field #1, 35 As company$, 35 As contact$, 35 As addr1$, 35 As addr2$, 35 As addr3$,_
        10 As zip$, 12 As phone1$,12 As phone2$, 50 As email$



s=time$("ms")
print "searching RAF file"
for n= 1 to 1000000
    get #1,n
    if trim$(phone1$)="9999999999" then print "found"
next
print "searched entire file in ";time$("ms")-s

dim index$(1000000,2)
s=time$("ms")
print "creating array based index"
for n= 1 to 1000000
    get #1,n
    index$(n,1)=phone1$
    index$(n,2)=str$(n)
next
print "created array index in ";time$("ms")-s

s=time$("ms")
print "searching array index"
for n= 1 to 1000000
    if trim$(index$(n,1))="9999999999" then print "found"
next
print "searched entire index file in ";time$("ms")-s
close #1
end

Running Win11 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz 2.42 GHz 64bit with 32Gb ram

Tasp
Full Member

Posts: 215

Adding search facility for 999999 entries Nov 21, 2020 15:19:38 GMT -5

Quote

Post by Tasp on Nov 21, 2020 15:19:38 GMT -5

Thanks to all for the responses.

I took aspects from all and realised I should not be searching through an array like that.
While the main program uses an array to hold the customer info, the database management doesn't need to hold it in the same way at all.

Now changed it loads and searches fast.

If you're ever being chased by a Police dog, try not to run thru a tunnel, over a see-saw and jump thru a fiery ring, they're trained for that stuff.

Post by Tasp on Nov 6, 2020 12:17:34 GMT -5

Post by Rod on Nov 6, 2020 14:18:58 GMT -5

Post by alincon on Nov 6, 2020 16:43:50 GMT -5

Post by Tasp on Nov 8, 2020 11:40:52 GMT -5

Post by Rod on Nov 8, 2020 14:00:05 GMT -5

Post by tsh73 on Nov 8, 2020 15:41:46 GMT -5

Post by Rod on Nov 9, 2020 10:35:04 GMT -5

Post by Rod on Nov 11, 2020 9:53:19 GMT -5

Post by Tasp on Nov 21, 2020 15:19:38 GMT -5