Tasp
Full Member
Posts: 215
|
Post by Tasp on Nov 6, 2020 12:17:34 GMT -5
Good evening.
I have a DB that potentially can have 999999 entries. In reality it won't have, but each entry will have a user defined 6 digit customer number, I store the rest of the users information with other fields of an array.
For example;
names = 1 address = 2 details = 3
DIM customer$ (999999,20) customer$(2,names) = "joe bloggs" customer$(2,address) = "999 Letsby Avenue" customer$(2,details) = "Bad payer"
customer$(456,names) = "john smith" customer$(456,address) = "12 The Pub" customer$(456,details) = "Pulls Pints"
print customer$(456,names)
Obviously this isn't the greatest way to store this info, however this is part of a larger program that is very time critical, so rather than search thru a DB on the fly, it's quicker long term to load all the DB into an array at startup. Then just call the details from the array as needed.
But when it comes to DB management, ie editing details trying to create a speedy search routine is slow at best. This is what I have tried so far;
'sSite.xxxx details are grabbed from textboxes FOR a = 1 TO 999999
IF sSite.Customer$ <> "" OR_ sSite.Name$ <> "" OR_ sSite.Address$ <> "" OR_ sSite.Contactname$ <> "" OR_ sSite.Telephone$ <> "" OR_ sSite.Keyno$ <> "" THEN
IF sSite.Customer$ = CUSTOMER$(a, 1) THEN NOTICE "FOUND" IF sSite.Name$ = CUSTOMER$(a, 2) THEN NOTICE "FOUND" IF sSite.Address$ = CUSTOMER$(a, 3) THEN NOTICE "FOUND" IF sSite.Contactname$ = CUSTOMER$(a, 4) THEN NOTICE "FOUND" IF sSite.Telephone$ = CUSTOMER$(a, 5) THEN NOTICE "FOUND" IF sSite.Keyno$ = CUSTOMER$(a, 6) THEN NOTICE "FOUND" END IF NEXT
Any suggestions on a better search method? Or am I stuck due to the way the DB is actually created?
|
|
|
Post by Rod on Nov 6, 2020 14:18:58 GMT -5
Don't hold everything in an array. Hold it in a RAF and front it with an index array. that is the fastest way to search. It is discussed in this thread. Basically you file the data in a RAF and leave it alone. You just front it with a simple extract array that keeps your search criteria. You need to read some because it takes a bit of digesting. libertybasiccom.proboards.com/thread/878/simple-contacts-database
|
|
|
Post by alincon on Nov 6, 2020 16:43:50 GMT -5
[openPerFileNew] open fileNameNew$ for random as #perFileNew len = 370 field #perFileNew, 25 as name$, 28 as address$, 15 as city$, 2 as state$, 9 as zip$, 3 as apmt$, 1 as gndr$,_ '83 10 as phone1$, 10 as phone2$, 25 as hmEmail$, 10 as emgPhone$, 20 as emgName$,_ '75 9 as ssn$, 5 as dob, 5 as doh, 1 as mar, 1 as hins, 1 as lins, 1 as pens, 1 as savs,_ '24 2 as posn$, 2 as step$, 25 as wkEmail$, 10 as wkPhone$, 1 as exmps,_ '40 5 as earnAmt, 5 as earnDate, 5 as priorAmt, 5 as priorDate, 5 as tmpAmt, 5 as tmpDat1,_ '30 5 as tmpDat2, 6 as eAmty, 5 as pAmty, 5 as tAmty, 4 as sickUsed, 4 as vacnUsed,_ '29 5 as taxWh, 5 as socWh, 5 as medWh, 5 as hinsWh, 5 as linsWh, 5 as savsWh, 5 as pensWh,_ '35 5 as taxWhy,5 as socWhy,5 as medWhy,5 as hinsWhy,5 as linsWhy,5 as savsWhy,5 as pensWhy, _ '35 4 as sickEarn, 4 as vacnEarn, 5 as fill$, 2 as dept$, 2 as sect$, 1 as type$, 1 as xx$ empRecCtr = lof(#perFileNew) / 370 notice "Emp Ctr ";str$(empRecCtr) 'has to be a whole number return
[MakePerFileIndex] for n = 1 to empRecCtr get #perFileNew, n emps$(n,2) = sect$ emps$(n,4) = lz$(n,3) 'record number is vital emps$(n,1) = dept$ emps$(n,3) = name$ print name$,dob,doh,earnDate if dept$ = "99" then n99 = n99 + 1 delRec(n99) = n 'save for adding new emp records end if next sort emps$(),1,empRecCtr,1 'dept 99 recs sort to end empRecCtr = empRecCtr - n99 notice "Del Ctr ";str$(n99) return
|
|
Tasp
Full Member
Posts: 215
|
Post by Tasp on Nov 8, 2020 11:40:52 GMT -5
Well everything is stored in a CSV, so maybe I'm actually searching in the wrong location and should be searching through the file instead.
Cheers.
|
|
|
Post by Rod on Nov 8, 2020 14:00:05 GMT -5
The secret is to have an ordered extract of the CSV file, so it needs to be in an array to be sorted. Sorting is very fast and need only happen once. Then searching for records means finding the first matching record in the array and similar records will be one step away. So finding similar records is again very fast. The array holds the search data and points to the CSV file record.
So you blast through the extract array and pick out the CSV record as needed.
You cannot order a CSV file very easily, if you can't order it then neither can you search it quickly, it will all be very laborious as you trawl through every CSV record for every search.
|
|
|
Post by tsh73 on Nov 8, 2020 15:41:46 GMT -5
But how much numbers it'll have? Because running full search
N=999999 t0=time$("ms") 'first of all IF not(sSite.Customer$ <> "" OR_ sSite.Name$ <> "" OR_ sSite.Address$ <> "" OR_ sSite.Contactname$ <> "" OR_ sSite.Telephone$ <> "" OR_ sSite.Keyno$ <> "") THEN notice "Nothing to search for!": end
'else FOR a = 1 TO N IF (sSite.Name$ <> "" and sSite.Name$ = customer$(a, names)) or _ (sSite.Address$ <> "" and sSite.Address$ = customer$(a, address)) _ THEN t1=time$("ms") NOTICE t1-t0;" FOUND ";a exit for END IF NEXT t11=time$("ms") print t11-t0, "search"
for 1 000 000 entries - in array - takes on my (old) machine 32 seconds
But if I have array of 10 000 with numbers in 1 .. 999 999 range,
dim used(10000) maxUsed = 10000 for i = 1 to maxUsed used(i) = int(rnd(0)*1000000) next
I take number from that array - and check it in bigger array
FOR i = 1 TO maxUsed a=used(i) SCAN 'if we hope to ever break it IF (sSite.Name$ <> "" and sSite.Name$ = customer$(a, names)) or _ (sSite.Address$ <> "" and sSite.Address$ = customer$(a, address)) _ THEN t1=time$("ms") NOTICE t1-t0;" FOUND ";a exit for END IF NEXT
And it all takes 0.2 seconds
|
|
|
Post by Rod on Nov 9, 2020 10:35:04 GMT -5
I am not sure showing an array example helps. In reality the data will be held in in a file. That may be a RAF or a CSV file. Getting it into an array will take time. Searching the array less time. But if you bring all of the data into an array then you are responsible for edits and changes and for saving those edits and changes. If the data is left in the RAF or CSV file you take on less ownership and can manage the file record by record for edits. That's the old thinking, when programs crashed more often.
Perhaps the new world is super fast, I will need to experiment with some manufactured data.
|
|
|
Post by Rod on Nov 11, 2020 9:53:19 GMT -5
Ok this code creates 1,000,000 RAF records of 259 bytes. That is a 250MB file so DO NOT RUN THIS CODE UNLESS YOU ARE PARTICIPATING IN THE DISCUSSION. The file holds one special record at position 500000. So this is a realistic file. Here are my results.
So the file took some time to create but searching through all records in the RAF was quite quick but still nearly 60s to process all those records. I then build an array based index. This was built reasonably quickly but again near 60s to do. Once built searching the entire array took only 5s so an enormous advantage.
The thing about indexes is they can be built once and used often, they can have mixed info like yyyymmddhhmmss They can be sorted. This is a massive advantage because once you find the first record you will instantly find the next, say a days receipts, they will all be sorted together.
I have not played with CSV file structure yet but I think it will be quite slow compared to the RAF, getting the data into Liberty will be the slow point. Out on file a CSV is really quite inaccessible.
So for a really large dataset I would still say, RAF fronted by array based indexes. But if you are managing 1,000,000 records perhaps you need a better tool.
'goto [skip] open "contact.dat" for output as #1 close #1 Open "contact.dat" For Random As #1 Len = 259 Field #1, 35 As company$, 35 As contact$, 35 As addr1$, 35 As addr2$, 35 As addr3$,_ 10 As zip$, 12 As phone1$,12 As phone2$, 50 As email$ n$="Robert John Paul Lisa Janette Peter Gordon Manuel Harry William Jane Joan Debra" s$="Houston Wilson Raydon Malcom McDonald House Morrow Menzies Buxton Souness Keen Goran Allan" b$="Ace Acme Arrow Big Best Custom Fast Cheap Hungry" d$="Carriers Hauliers Chemicals Cleaners Builders Plumbers Decorators Electricians Insurance" s=time$("ms") print "creating RAF file" for n= 1 to 1000000 company$=word$(b$,int(rnd(0)*9+1))+" "+word$(d$,int(rnd(0)*9+1)) contact$=word$(n$,int(rnd(0)*13+1))+" "+word$(s$,int(rnd(0)*13+1)) addr1$=str$(n)+" First line of address" addr2$="Second line of address" addr3$="Third line of address" zip$="9999" if n=500000 then phone1$="9999999999" else phone1$="0123456789" end if phone2$="0987654321" email$=contact$+"@hotmail.com" put #1,n next print "file creation took ";time$("ms")-s close #1
[skip] Open "contact.dat" For Random As #1 Len = 259 Field #1, 35 As company$, 35 As contact$, 35 As addr1$, 35 As addr2$, 35 As addr3$,_ 10 As zip$, 12 As phone1$,12 As phone2$, 50 As email$
s=time$("ms") print "searching RAF file" for n= 1 to 1000000 get #1,n if trim$(phone1$)="9999999999" then print "found" next print "searched entire file in ";time$("ms")-s
dim index$(1000000,2) s=time$("ms") print "creating array based index" for n= 1 to 1000000 get #1,n index$(n,1)=phone1$ index$(n,2)=str$(n) next print "created array index in ";time$("ms")-s
s=time$("ms") print "searching array index" for n= 1 to 1000000 if trim$(index$(n,1))="9999999999" then print "found" next print "searched entire index file in ";time$("ms")-s close #1 end
|
|
Tasp
Full Member
Posts: 215
|
Post by Tasp on Nov 21, 2020 15:19:38 GMT -5
Thanks to all for the responses.
I took aspects from all and realised I should not be searching through an array like that. While the main program uses an array to hold the customer info, the database management doesn't need to hold it in the same way at all.
Now changed it loads and searches fast.
|
|