|
Post by meerkat on Feb 28, 2023 6:39:57 GMT -5
Thanks pierre.
You are correct. This is an interesting subject..
Maybe this is more of a personal decision on using the PRIMARY KEY.
Some reasons to use an explicitly defined INTEGER PRIMARY KEY column instead of the rowid include:
1. Control over the primary key: Using an explicitly defined primary key column allows you to choose the name and data type of the primary key. This can make it easier to manage the table and can provide more flexibility when working with the data.
2. Compatibility with other databases: Using an explicitly defined primary key column makes it easier to migrate the data to other databases that may not support the rowid concept.
3. Consistency with best practices: Using an explicitly defined primary key column is a common best practice in database design and can make the table more consistent with other tables in your database.
In summary, while it is possible to use the rowid as the primary key for a table in SQLite, it is generally recommended to use an explicitly defined INTEGER PRIMARY KEY column instead.
I guess for me I don't like giving up control and letting the system assign numbers. Instead of an INTEGER PRIMARY KEY, I simply define it as an INDEX and stay away from the PRIMARY KEY. Then I assign the values. I still keep the rowid for a lot of reasons.
|
|
|
Post by meerkat on Feb 28, 2023 8:26:13 GMT -5
Just to show ways to calculate age. This uses the 'now' or current date. And you have the birth and death date in a person record
SELECT CASE WHEN birth_date IS NULL OR birth_date = '' THEN NULL ELSE DATE('now') - DATE(birth_date, '+1 day', '-1 year') END AS age FROM table_name WHERE id = <person_id>;
Sometimes things get a little complicated. Suppose you have a person file. The person file has events that you want to track including the birth event ,eventType = 'BI', and the death event, eventType = 'DE'. And there may not be a birth or death event. If they have a birth and no death event then return nothing. It could look like this.SELECT person.personNum,person.firstName,person.middleName,person.lastName, CASE WHEN death.eventType = 'DE' THEN CAST((julianday(death.eventDate) - julianday(birth.eventDate)) / 365.25 AS INTEGER) ELSE CAST((julianday('now') - julianday(birth.eventDate)) / 365.25 AS INTEGER) END AS age FROM person JOIN event AS birth ON person.personNum = birth.personNum AND birth.eventType = 'BI' LEFT JOIN event AS death ON person.personNum = death.personNum AND death.eventType = 'DE'
WHERE person.personNum = <personNum>;
There are other ways to do this. If you have something, you may want to show us how.
|
|
|
Post by pierre on Feb 28, 2023 15:11:17 GMT -5
Your formula makes me 2 years older than I really am....
The following seems to work well (adapted from the internet):
output is: name, birthdate, age john 2003-02-28 20 mary 2003-03-31 19
sqliteconnect #sql, "test.db"
#sql execute("create table if not exists persons (name, birthdate)")
#sql execute("insert into persons values ('john', '2003-02-28'),('mary', '2003-03-31')")
query$ = " select name, birthdate, case when strftime('%m', date('now')) > strftime('%m', birthdate) then strftime('%Y', date('now')) - strftime('%Y', birthdate) when strftime('%m', date('now')) = strftime('%m', birthdate) then case when strftime('%d', date('now')) >= strftime('%d', date(birthdate)) then strftime('%Y', date('now')) - strftime('%Y', birthdate) else strftime('%Y', date('now')) - strftime('%Y', birthdate) - 1 end when strftime('%m', date('now')) < strftime('%m', birthdate) then strftime('%Y', date('now')) - strftime('%Y', birthdate) - 1 end as age from persons"
#sql execute(query$)
print #sql columnnames$() while #sql hasanswer() #row = #sql #nextrow() name$ = #row name$() birthdate$ = #row birthdate$() age = #row age() print name$;" ";birthdate$;" ";age wend
|
|
|
Post by pierre on Mar 1, 2023 4:15:07 GMT -5
As far as your second example is concerned, the result is correct.
Working with a number of days, poses the problem of leap years, therefore you divide by 365.25.
Actually it all depends on the exact number of 'February, 29 th' incurred between 'now' and the 'birthdate'. So 365.25 is not completely exact, but it remains a good approximation.
pierre
|
|
|
Post by meerkat on May 14, 2023 6:59:37 GMT -5
Calendar interface. Here is a simple program that will propagate a calendar database with all the dates for any given year.. You can use the LBgen program to create a calendar interface. This would allow you to do wildcard searches, add, changes, and deletes.Change YYYY for the year.Change db$ where you want the database, or add the calendar to an existing db.
'Create calendar database with notes for a given yyyy 'Change YYYY to desired yyyy 'Change db$ to you database file , or use and existing database ' '
YYYY = 2023 ' enter the year to generate db$ = "C:\rbp101\projects\a_project\data\a.db"
sqliteconnect #sql, db$ ' open or create a sqlite calendar database
'-- Create a table to store the calendar if it does not exist sql$ = "CREATE TABLE IF NOT EXISTS calendar ( yyyy INTEGER, mm INTEGER, dd INTEGER, dow INTEGER, notes TEXT);" #sql execute(sql$) sql$ = "CREATE UNIQUE INDEX IF NOT EXISTS calIndx ON calendar (yyyy,mm,dd)" #sql execute(sql$)
sql$ = "DELETE FROM calendar WHERE yyyy = ";YYYY #sql execute(sql$)
'-- Generate the calendar for a given year sql$ = "WITH RECURSIVE dates(date) AS ( VALUES (date('";YYYY;"-01-01')) UNION ALL SELECT date(date, '+1 day') FROM dates WHERE date < date('";YYYY;"-12-31') ) INSERT INTO Calendar (yyyy, mm, dd, dow, notes) SELECT strftime('%Y', date) AS yyyy, strftime('%m', date) AS mm, strftime('%d', date) AS dd, strftime('%w', date) AS dow, '' AS notes FROM dates;"
#sql execute(sql$)
'-- Query the calendar for a given year and month sql$ = "SELECT * FROM calendar WHERE yyyy = ";YYYY;" and mm = 1 ;" #sql execute(sql$) WHILE #sql hasanswer() #row = #sql #nextrow() yyyy = #row yyyy() mm = #row mm() dd = #row dd() dow = #row dow() notes$ = #row notes$() print yyyy;" | ";mm;" | ";dd;" | ";dow;" | ";notes$ WEND end
|
|
|
Post by pierre on May 14, 2023 8:30:40 GMT -5
interesting !
thanks
pierre
|
|
|
Post by meerkat on May 14, 2023 12:48:45 GMT -5
MLR - Multiple Linear Regression. Here is a simple program to do MLR on 3 columns and one target. You need to fill the table with the data. The size of the data is limited only by the size of your disk.
dbName$ ="a" sqliteconnect #sql, "C:\rbp101\projects\a_project\data\a.db"
'-- Step 2: Retrieve the relevant data from the database '-- Assuming you have a table named 'yourTable' with columns 'column1', 'column2', 'column3', and 'targetColumn' ' if not then create it sql$ = "CREATE TABLE IF NOT EXISTS yourTable (column1, column2, column3, targetColumn)" #sql execute(sql$)
sql$ = "SELECT column1, column2, column3, targetColumn FROM yourTable;"
print "------ MLR info -------------" gosub [getSql]
'-- Step 3: Perform the multiple linear regression using SQL '-- Assuming you want to estimate the coefficients for the equation: targetColumn = b0 + b1 * column1 + b2 * column2 + b3 * column3 '-- Calculate the means of each column sql$ = "SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable;"
print "--- means of each column" gosub [getSql]
'-- Calculate the sum of squared differences from the means sql$ = "SELECT SUM((column1 - mean_column1) * (column1 - mean_column1)) AS ssq_column1, SUM((column2 - mean_column2) * (column2 - mean_column2)) AS ssq_column2, SUM((column3 - mean_column3) * (column3 - mean_column3)) AS ssq_column3, SUM((targetColumn - mean_target) * (targetColumn - mean_target)) AS ssq_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable);"
print "--- sum of squared differences from the means" gosub [getSql]
'-- Calculate the cross-products and sum of cross-products sql$ = "SELECT SUM((column1 - mean_column1) * (targetColumn - mean_target)) AS cross_prod_column1_target, SUM((column2 - mean_column2) * (targetColumn - mean_target)) AS cross_prod_column2_target, SUM((column3 - mean_column3) * (targetColumn - mean_target)) AS cross_prod_column3_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable);"
print "--- cross-products and sum of cross-products" gosub [getSql]
'-- Calculate the regression coefficients sql$ = "SELECT cross_prod_column1_target / ssq_column1 AS coefficient_column1, cross_prod_column2_target / ssq_column2 AS coefficient_column2, cross_prod_column3_target / ssq_column3 AS coefficient_column3, mean_target - (cross_prod_column1_target / ssq_column1) * mean_column1 - (cross_prod_column2_target / ssq_column2) * mean_column2 - (cross_prod_column3_target / ssq_column3) * mean_column3 AS intercept FROM ( SELECT SUM((column1 - mean_column1) * (targetColumn - mean_target)) AS cross_prod_column1_target, SUM((column2 - mean_column2) * (targetColumn - mean_target)) AS cross_prod_column2_target, SUM((column3 - mean_column3) * (targetColumn - mean_target)) AS cross_prod_column3_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable) ) AS cross_products, ( SELECT SUM((column1 - mean_column1) * (column1 - mean_column1)) AS ssq_column1, SUM((column2 - mean_column2) * (column2 - mean_column2)) AS ssq_column2, SUM((column3 - mean_column3) * (column3 - mean_column3)) AS ssq_column3, SUM((targetColumn - mean_target) * (targetColumn - mean_target)) AS ssq_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable) ) AS sum_sq_diffs, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable);"
print "-- regression coefficients" gosub [getSql]
'-- Output the regression coefficients and intercept sql$ = "WITH cross_products AS ( SELECT SUM((column1 - mean_column1) * (targetColumn - mean_target)) AS cross_prod_column1_target, SUM((column2 - mean_column2) * (targetColumn - mean_target)) AS cross_prod_column2_target, SUM((column3 - mean_column3) * (targetColumn - mean_target)) AS cross_prod_column3_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable) ), sum_sq_diffs AS ( SELECT SUM((column1 - mean_column1) * (column1 - mean_column1)) AS ssq_column1, SUM((column2 - mean_column2) * (column2 - mean_column2)) AS ssq_column2, SUM((column3 - mean_column3) * (column3 - mean_column3)) AS ssq_column3, SUM((targetColumn - mean_target) * (targetColumn - mean_target)) AS ssq_target FROM yourTable, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable) ) SELECT coefficient_column1, coefficient_column2, coefficient_column3, intercept FROM ( SELECT cross_prod_column1_target / ssq_column1 AS coefficient_column1, cross_prod_column2_target / ssq_column2 AS coefficient_column2, cross_prod_column3_target / ssq_column3 AS coefficient_column3, mean_target - (cross_prod_column1_target / ssq_column1) * mean_column1 - (cross_prod_column2_target / ssq_column2) * mean_column2 - (cross_prod_column3_target / ssq_column3) * mean_column3 AS intercept FROM cross_products, sum_sq_diffs, (SELECT AVG(column1) AS mean_column1, AVG(column2) AS mean_column2, AVG(column3) AS mean_column3, AVG(targetColumn) AS mean_target FROM yourTable) ) AS regression_result;"
print "--- regression coefficients and intercept" gosub [getSql] end
[getSql] #sql execute(sql$) colNames$ = #sql columnnames$() print colNames$
WHILE #sql hasanswer() result$ = #sql nextrow$(" |";chr$(9)) print result$ WEND print "===============================" RETURN
Here is the results from a test file I created;
------ MLR info ------------- column1, column2, column3, targetColumn 1 | 2 | 3 | 4 2 | 3 | 4 | 5 33 | 44 | 55 | 66 3 | 6 | 9 | 12 2 | 4 | 6 | 8 =============================== --- means of each column mean_column1, mean_column2, mean_column3, mean_target 8.2 | 11.8 | 15.4 | 19.0 =============================== --- sum of squared differences from the means ssq_column1, ssq_column2, ssq_column3, ssq_target 770.8 | 1304.8 | 1981.2 | 2800.0 =============================== --- cross-products and sum of cross-products cross_prod_column1_target, cross_prod_column2_target, cross_prod_column3_target 1465.0 | 1910.0 | 2355.0 =============================== -- regression coefficients coefficient_column1, coefficient_column2, coefficient_column3, intercept 1.90062273 | 1.46382587 | 1.18867353 | -32.1638241 =============================== --- regression coefficients and intercept coefficient_column1, coefficient_column2, coefficient_column3, intercept 1.90062273 | 1.46382587 | 1.18867353 | -32.1638241 ===============================
|
|
|
Post by meerkat on May 19, 2023 4:36:33 GMT -5
MLR Generator Here is a simple program to make it easy to set up a MLR (multiple linear regression). I don't know of any limits on the number of columns or the size of your data. Simply change the following values at the beginning of the program; dim col$(??) - set this to be large enough to hold all you column names col$(?) - set to the column names in you database table target$ - set this to the target name in your database table. tblName$ - the name of the table in your database db$ - this is the directory and name of your database ---------------------------------------- The output you would be interested in is --- regression coefficients and intercept coefficient_begQty: 0.915414312 coefficient_endQty: 0.175504934 coefficient_begGauge: -229.745036 coefficient_endGauge: -183.209723 intercept: 82498.7243
Using these values it should give you the best guess of a target If you had known values of 1,2,3,4 the formula would be: target = 1 * 0.915414312 + 2 * 0.175504934 + 3 * -229.745036 4 * -183.209723 + 82498.7243
The included program was set up to run a MLR of some data from a table I have. Change the values to fit your sample;
' =================================================================================== ' Change these values
dim col$(20) ' change to you number of columns numCol = 4 ' change this to number of columns col$(1) = "begQty" ' change to name of your columns col$(2) = "endQty" ' .. col$(3) = "begGauge" ' .. col$(4) = "endGauge" ' .. target$ = "movQty" ' change to your target columnn tblName$ ="wpoDtl" ' change to your database name
db$ = "C:\rbp101\projects\wine_project\data\wine.db" ' change to your database name ' =================================================================================== sqliteconnect #sql, db$ ' open database cr$ = chr$(13)
print "------------- MLR info -------------"
sql$ = "SELECT count(*) as numRows FROM ";tblName$ #sql execute(sql$) #row = #sql #nextrow() numRows = #row numRows() print "There are ";numRows;" records in the ";tblName$;" table" print sql$ = "SELECT " for i = 1 to numCol sql$ = sql$ + col$(i);", " next i sql$ = sql$ + target$;" FROM ";tblName$;" LIMIT 20;" #sql execute(sql$) colNames$ = #sql columnnames$() print colNames$ WHILE #sql hasanswer() result$ = #sql nextrow$(" |";chr$(9)) print result$ WEND print "==============================="
'-- Perform the multiple linear regression using SQL '-- Calculate the means of each column
sql$ = "SELECT " for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;";" hdr$ = "--- means of each column" gosub [getSql] ' ---------------------------------------------------------------- hdr$ = "-- Calculate the sum of squared differences from the means"
sql$ = "SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + " SUM((";col$(i);") - mean_";col$(i);" * (";col$(i);" - mean_";col$(i);")) AS ssq_";col$(i);",";cr$ next i sql$ = sql$ + " SUM((";target$;" - mean_target) * (";target$;" - mean_target)) AS ssq_target FROM ";tblName$;", (SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;");" gosub [getSql]
'----------------------------------------------------------------
hdr$ = "-- Calculate the regression coefficients"
sql$ = "SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + " cross_prod_";col$(i);"_target / ssq_";col$(i);" AS coefficient_";col$(i);",";cr$ next i sql$ = sql$ + " mean_target ";cr$ for i = 1 to numCol sql$ = sql$ + " - (cross_prod_";col$(i);"_target / ssq_";col$(i);") * mean_";col$(i);cr$ next i sql$ = sql$ + " AS intercept FROM ( SELECT ";cr$ for i = 1 to numCol x$ = ","+cr$ if i = numCol then x$ = "";cr$ sql$ = sql$ + " SUM((";col$(i);" - mean_";col$(i);") * (";target$;" - mean_target)) AS cross_prod_";col$(i);"_target";x$ next i sql$ = sql$ + " FROM ";tblName$;", (SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;") ) AS cross_products, ( SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + " SUM((";col$(i);" - mean_";col$(i);") * (";col$(i);" - mean_";col$(i);")) AS ssq_";col$(i);",";cr$ next i sql$ = sql$ + " SUM((";target$;" - mean_target) * (";target$;" - mean_target)) AS ssq_target FROM ";tblName$;", (SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;") ) AS sum_sq_diffs, (SELECT ";cr$ for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;");"
gosub [getSql]
hdr$ = "'-- Output the regression coefficients and intercept" sql$ = "WITH cross_products AS ( SELECT ";cr$ for i = 1 to numCol x$ = ","+cr$ if i = numCol then x$ = "";cr$ sql$ = sql$ + " SUM((";col$(i);" - mean_";col$(i);") * (";target$;" - mean_target)) AS cross_prod_";col$(i);"_target";x$ next i sql$ = sql$ + " FROM ";tblName$;", (SELECT " for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;") ), sum_sq_diffs AS ( SELECT " for i = 1 to numCol x$ = ","+cr$ if i = numCol then x$ = "";cr$ sql$ = sql$ + " SUM((";col$(i);" - mean_";col$(i);") * (";col$(i);" - mean_";col$(i);")) AS ssq_";col$(i);x$ next i sql$ = sql$ +" FROM ";tblName$;", (SELECT " for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;") ) SELECT " for i = 1 to numCol sql$ = sql$ + "coefficient_";col$(i);",";cr$ next i sql$ = sql$ + " intercept FROM ( SELECT " for i = 1 to numCol sql$ = sql$ + " cross_prod_";col$(i);"_target / ssq_";col$(i);" AS coefficient_";col$(i);",";cr$ next i sql$ = sql$ + " mean_target " for i = 1 to numCol sql$ = sql$ + " - (cross_prod_";col$(i);"_target / ssq_";col$(i);") * mean_";col$(i);cr$ next i sql$ = sql$ + " AS intercept FROM cross_products, sum_sq_diffs, (SELECT " for i = 1 to numCol sql$ = sql$ + "AVG(";col$(i);") AS mean_";col$(i);",";cr$ next i sql$ = sql$ + "AVG(";target$;") AS mean_target FROM ";tblName$;") ) AS regression_result; " hdr$ = "--- regression coefficients and intercept" gosub [getSql] end ' ====================================================================================== [getSql] print hdr$ 'print sql$ #sql execute(sql$) colNames$ = #sql columnnames$() WHILE #sql hasanswer() result$ = #sql nextrow$(chr$(9)) for i = 1 to 100 cn$ = word$(colNames$,i,",") if cn$ = "" then exit for print cn$;":";chr$(9);word$(result$,i,chr$(9)) next i WEND print "===============================" RETURN
Here is the output from the sample I ran;
------------- MLR info ------------- There are 488 records in the wpoDtl table
begQty, endQty, begGauge, endGauge, movQty 0 | 17118 | 0 | 180.15 | 17118 0 | 10274 | 0 | 108.59 | 10274 0 | 4948 | 0 | 52.52 | 4948 0 | 15255 | 0 | 162.65 | 15255 0 | 5322 | 0 | 56.99 | 5322 0 | 14936 | 0 | 160.65 | 14936 0 | 15814 | 0 | 170.85 | 15814 0 | 5181 | 0 | 56.23 | 5181 0 | 17024 | 0 | 185.58 | 17024 105872 | 0 | 0 | 0 | 105872 0 | 976 | 0 | 10.14 | 976 0 | 13291 | 0 | 137.51 | 13291 0 | 18262 | 0 | 188.14 | 18262 0 | 19876 | 0 | 203.91 | 19876 0 | 13020 | 0 | 133.02 | 13020 116834 | 0 | 999.99 | 0 | 116834 15900 | 30979 | 0 | 362 | 15979 18300 | 36356 | 0 | 413 | 18356 970 | 1876 | 0 | 22 | 976 13260 | 26291 | 0 | 295 | 13291 =============================== --- means of each column mean_begQty: 84887.6025 mean_endQty: 84799.75 mean_begGauge: 21.996373 mean_endGauge: 135.75168 mean_target: 145164.24 =============================== -- Calculate the sum of squared differences from the means ssq_begQty: 4.142515e7 ssq_endQty: 4.1382278e7 ssq_begGauge: 10734.23 ssq_endGauge: 66246.82 ssq_target: 4.40716074e13 =============================== --- regression coefficients and intercept coefficient_begQty: 0.915414312 coefficient_endQty: 0.175504934 coefficient_begGauge: -229.745036 coefficient_endGauge: -183.209723 intercept: 82498.7243 ===============================
To estimate the target you use the coefficients and intercept Suppose you had known columns of 10,20,30,40 then the equation to estimate the target would be
target = 10 * 0.915414312 + 20 * 0.175504934 + 30 * -229.745036 + 40 * -183.209723 + 82498.7243
|
|