hal9k
Junior Member
Posts: 87
|
Post by hal9k on Oct 18, 2022 16:32:16 GMT -5
My wife wants me to write a program to collect some specific data from a web site she subscribes to. I'm comfortable using httpget$ and parsing the results from a known URL, but this seems like it would be much more difficult. In addition to building the correct URL string and supplying it to httpget$, I have to first get through the site's login page. I'm guessing there is a clever way to do this, but it's beyond my feeble abilities (I know very little about html magic).
Does anybody have advice or an example of a LB program authenticating to a web site?
As always, thanks!
|
|
|
Post by Chris Iverson on Oct 18, 2022 17:08:23 GMT -5
It might be possible using the WinINET API functions, but it would basically require writing custom code for every site you want to support logging in to.
And that's if the site supports logins from automated systems, and not just web browsers. There are a lot of things that can detect whether a browser is being used nowadays, and it's quite likely that if you're not using one, your connection will get rejected.
There's really no guaranteed way to tell, other than just figuring it out and trying it out.
Is there a specific site you'd like to be able to log in to?
|
|
Post by hal9k on Oct 18, 2022 17:19:48 GMT -5
|
|
|
Post by Chris Iverson on Oct 18, 2022 17:36:32 GMT -5
First thing I notice off the bat is that it supports SSO/OAuth using either Google or Apple accounts.
If the account you want to access is tied to one of those, that's basically the end of the game right there. Apple and Google guard their authentication flows very heavily.
If you're using a regular username and password on the site, you might be able to get away with it. Not sure yet.
|
|
Post by hal9k on Oct 18, 2022 17:50:52 GMT -5
She uses a regular user/pw to log in.
Thanks again for all your help!
|
|
|
Post by xxgeek on Oct 18, 2022 22:00:31 GMT -5
It should auto login.
The problem most likely lies in the cookie being deleted when the browser closes. Check the browser settings under privacy and security, cookie management, or something similar.
I set up an account and tested the theory.
It worked for me. I'm using Firefox, so the setting in your browser may have a different name.
|
|
Post by hal9k on Oct 19, 2022 9:35:07 GMT -5
I'm not sure where the browser comes into play. I want to access the web site via an LB program. I thought about manually logging in via a browser and then running the LB program, hoping that the manual authentication would also cover the LB access, but I didn't think that would work. I'll give it a try.
|
|
|
Post by xxgeek on Oct 19, 2022 16:57:07 GMT -5
hal9k wrote: "I'm not sure where the browser comes into play. I want to access the web site via an LB program. I thought about manually logging in via a browser and then running the LB program, hoping that the manual authentication would also cover the LB access, but I didn't think that would work. I'll give it a try."
It is impossible to sign in to this website using LB code. It does a robot check, with pictures.
To sign in to this website automatically, it needs to find its cookie on your PC. If no cookie (or file) exists, then you get a sign-in page. (It may be a file rather than a cookie placed on your PC. I found NO cookie here, just files.)
Once you provide your password and email address, you are sent to verify that you are not a robot, so no automatic sign-in is possible with LB, unless you can prove in LB code that you're not a robot. Good luck.
But if you don't delete its cookie or file (from the last visit), you get signed in automatically. That's why you need the cookie, or the file it places on your PC.
In LB to go to that site in your default browser the code is
webSite$ = "https://my.carbmanager.com/"
run "explorer ";webSite$
Ok you're on the website now, all signed in. I see no options to download any info or files etc. So you'll have to get inventive.
If there is a way to get "specific data" through the source code of the page someone else may know. I'd like to know too.... I doubt it, since all the data is embedded in an image, and can't be copied by selecting it.
If I am off topic, sorry about the misunderstanding.
I'd still like to know how this works out though, for future ideas.
|
|
Post by hal9k on Oct 19, 2022 17:46:21 GMT -5
Thanks! When I get some spare cycles I'll play around with it and see if I can sneak in. I'll post here if I find a way.
|
|
|
Post by Chris Iverson on Oct 19, 2022 20:02:18 GMT -5
Yeah, sorry, I have to concur with xxgeek.
I went to examine how they handle their own logins, but it's an identity management system run by Google, with the same checks as Google's own accounts.
You might be able to pull something off like setting up an embedded browser, and siphoning the data out once the user is logged in, but I wouldn't even know where to begin with that, and it would be even more complicated.
|
|
|
Post by Rod on Oct 20, 2022 1:57:25 GMT -5
So does your wife have to log in every time she accesses the site? As an analogy, when I log in to this site I get the option to “remember me”. Provided my cookies, machine, and IP stay static, I can just visit the site and gain access.
In that state I believe httpget$ should be able to access the site. httpget$ just looks like a browser request to the site.
|
|
|
Post by Chris Iverson on Oct 20, 2022 2:25:15 GMT -5
Two things. If using HTTPGET$() worked after logging in through a browser, the login would have to be registered in the system's cookie cache, so I don't think you could use Chrome or Firefox to do the login.
You'd have to use IE or Edge, whichever one's on your system.
And that's assuming it's cookies that are used to save sessions, which isn't always the case anymore.
Also, HTTPGET$() does not look like a browser. It makes only a single request, whereas a browser loading a general webpage would make tons of requests to load all of the corresponding content on the page. A lone request like that is the kind of behavior that gets filtered out.
For example, if I just visit my homepage in my browser, I automatically get two requests:
98.52.84.195 - - [20/Oct/2022:07:22:01 +0000] "GET / HTTP/2.0" 200 6 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
98.52.84.195 - - [20/Oct/2022:07:22:01 +0000] "GET /favicon.ico HTTP/2.0" 404 176 "https://chrisiverson.net/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
One for the root page /, and then one for the favicon to show in the browser's tab (which I don't have set up). Both have the user agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36". Also, the favicon request has the referer filled in.
On the other hand, using HTTPGET$() gets this result in my webserver's access log:
98.52.84.195 - - [20/Oct/2022:07:19:03 +0000] "GET / HTTP/1.1" 200 6 "-" "My User Agent"
One single request, with the useragent "My User Agent".
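To make the comparison concrete, here is a minimal Python sketch (not LB) that parses the three access log lines quoted above and groups the requested paths by user agent. The regular expression assumes the log uses the common "combined" layout seen in those lines, and the names LOG_PATTERN, parse_line, and requests_by_agent are made up for illustration. It only shows what the server sees; it says nothing about how any particular site decides what to filter.

```python
import re
from collections import defaultdict

# Assumed layout (matches the quoted lines):
# ip ident user [time] "METHOD path PROTOCOL" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def requests_by_agent(lines):
    """Map each user agent string to the list of paths it requested."""
    grouped = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            grouped[entry["agent"]].append(entry["path"])
    return dict(grouped)


# The three log lines from the post above.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
)
SAMPLE_LOG = [
    f'98.52.84.195 - - [20/Oct/2022:07:22:01 +0000] "GET / HTTP/2.0" 200 6 "-" "{CHROME_UA}"',
    f'98.52.84.195 - - [20/Oct/2022:07:22:01 +0000] "GET /favicon.ico HTTP/2.0" 404 176 '
    f'"https://chrisiverson.net/" "{CHROME_UA}"',
    '98.52.84.195 - - [20/Oct/2022:07:19:03 +0000] "GET / HTTP/1.1" 200 6 "-" "My User Agent"',
]

if __name__ == "__main__":
    for agent, paths in requests_by_agent(SAMPLE_LOG).items():
        print(f"{len(paths)} request(s) from {agent!r}: {paths}")
```

Run against those lines, the browser's user agent ends up with two paths (/ and /favicon.ico) and "My User Agent" with one, which is the difference a server-side filter could key on.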
|
|
Post by hal9k on Oct 20, 2022 9:50:54 GMT -5
Thank you all for all of your help and suggestions. I plan to put a little more effort into experimenting with the ideas I've gleaned from your suggestions. If, as I suspect, I'm unable to get the access I want, I will give up and chalk this up to a learning experience.
Thanks again. I'm continually overwhelmed with the quality and quantity of assistance from this wonderful community.
|
|