dkl
Full Member
Posts: 234
|
Post by dkl on Mar 23, 2021 18:57:58 GMT -5
Can LB retrieve info from the Windows clipboard or from a web browser address link?
I know how to put info from LB into the clipboard, but I want to do the reverse.
Ultimately, I would like to copy link locations from the browser without having to do it manually. It would only be one specific link on a page, not various scattered links.
However, just getting the info from the clipboard into LB would be great.
Thanks
|
|
|
Post by Brandon Parker on Mar 23, 2021 21:10:56 GMT -5
You can use the "HTTPGet$()" function to return the HTML content from a URL. You would then simply parse the returned string data for what you're after. Check out the old Liberty BASIC Newsletter #108 at the link below.
www.libertybasicuniversity.com/lbnews/nl108/tip.htm
{:0)
Brandon Parker
|
|
Post by dkl on Mar 23, 2021 22:06:52 GMT -5
Thanks Brandon, I am already using the "HTTPGet$()" function to get the HTML content. I want to get the URL address to then be able to use the "HTTPGet$()" function!
This is laziness at its finest!!! Let's say I have 20 films I want to get the info for. I use LB to gather the names of the files and open Chrome, which goes to a movie database, searches for the film, and finds the film's web page. I then want to grab the URL required to download the HTML content to gather the info required. Once at the web page I could do a highlight/copy/paste, find the next film... repeat, but for multiple movies that's a bit tedious, so I'm trying to automate it all as much as possible.
PS: The info LB needs to find the movie within the browser is not the same as the source page used to acquire the movie info.
On another note... I have seen the clipboard APIs in the same newsletter and tried them out, but it seems that the API causes the clipboard to function only with certain programmes. That is, I can copy/paste to Notepad and the Clipboard Demo, but cannot do the same from a browser while the Clipboard Demo is running. Odd!? I can close the Clipboard Demo, copy from the browser, and run the Clipboard Demo again, and then it will accept the paste, but I then have to close the Demo again to do another copy, which rather defeats the purpose.
|
|
|
Post by Chris Iverson on Mar 23, 2021 23:28:33 GMT -5
Are you using one of the demos on that page? Which one?
Not being able to copy/paste makes me think the clipboard might not be getting closed properly.
Also, if you're trying to bulk-retrieve data on various movies on IMDb, you'd be better off asking about their API. If you can point to any API documentation, we may be able to help work with it, but using their API to pull the data will probably be a lot simpler than trying to remote-control a browser, play games with the clipboard, and download and parse HTML manually.
|
|
Post by dkl on Mar 24, 2021 1:47:53 GMT -5
I was using the txt version and the multiple type version. I've actually done everything regarding getting the data etc. without the use of an API. All I need to do is automate the bit I mentioned. Although, I might just use a 3rd-party clipboard, store the links there, and put them into a file. Then LB will do the rest. I was just trying to make it fully automated from start to finish! I've looked into APIs and have Alyce Watson's book, but I'm afraid it's over my head!
|
|
|
Post by Rod on Mar 24, 2021 2:41:11 GMT -5
Why do you need the clipboard when httpget$ gets you the very info you want? Surely you can just parse out the link from the data received and httpget$ the parsed link.
|
|
Post by dkl on Mar 24, 2021 6:23:27 GMT -5
I’m sorry, gentlemen, perhaps I’m misunderstanding something? Yes I know httpget$ downloads all the info I need, but to get that info I have to go to IMDB - find the movie and copy the link to use with httpget$
That’s the bit I’m trying to automate.
How can I use httpget$ if I don't have the link? I don't think just using http://www.IMDb.com + movie name will get any response.
|
|
|
Post by metro on Mar 24, 2021 8:30:31 GMT -5
It would seem there are datasets for all movies (of no interest to me, having owned two video libraries over 20 years ago).
datasets.imdbws.com/ gives you compressed lists of movies (different info in each, I think). name.basics.tsv.gz is approx. 250 MB, which expands to 640 MB.
Movie titles seem to be just numbered in the URL: www.imdb.com/title/tt0068647/?ref_=adv_li_tt gives you Gonshchiki (1973); change the 7 in the link to a 6 and you get The Godfather (1972).
I'd put money on it that you could create a list of movies and their URLs from the tab-separated list, so then you don't have to copy and paste.
I'm no expert, so there may be issues with this code. However, I can "Ctrl + C" a URL in an address bar and hit the button to paste it into the textbox. You can "Alt-Tab" between your program and the browser to copy and paste, or build your URLs from the DB.

nomainwin
WindowWidth = 500
WindowHeight = 220
UpperLeftX = int((DisplayWidth-WindowWidth)-230)
UpperLeftY = 100 ' int((WindowHeight-1320))
TexteditorColor$ = "white"
TextboxColor$ = "white"
textbox #main.textbox1, 5, 27, 450, 25
button #main.btn,"PASTE", [button4Click],UL,65, 60, 70, 40
open "Paste Urls" for dialog as #main
print #main, "font ms_sans_serif 10"
print #main, "trapclose [quit.main]"
h=hwnd(#main)
wait

[button4Click] 'Perform action for the button named 'button4'
calldll #user32, "OpenClipboard",h as long, result as long
calldll #user32, "GetClipboardData",_CF_TEXT as long, txt as long
if txt <> 0 then
    #main.textbox1, winstring(txt)
end if
CallDll #user32, "CloseClipboard", r as boolean
wait

[quit.main] 'End the program
' you may or may not want to empty the Clipboard
calldll #user32, "OpenClipboard",h as long, result as long
CallDll #user32, "EmptyClipboard", r as boolean
CallDll #user32, "CloseClipboard", r as boolean
close #main
end
|
|
|
Post by Brandon Parker on Mar 24, 2021 10:09:28 GMT -5
dkl wrote:
"I'm sorry, gentlemen, perhaps I'm misunderstanding something? Yes I know httpget$ downloads all the info I need, but to get that info I have to go to IMDB - find the movie and copy the link to use with httpget$. That's the bit I'm trying to automate. How can I use httpget$ if I don't have the link? I don't think just using http://www.IMDb.com + movie name will get any response."

You will need to use HTTPGet$() more than once with some parsing in between to find the link(s) you need to get data from. You will also have to do quite a bit of error checking if you go this route.
The code below requests www.imdb.com to find "The Avengers" and then parses the returned data to extract the link to the page for the actual movie. Obviously, you would want to make sure you are grabbing the correct one for movies that have remakes and such. This is just a simple example of plowing through the IMDb website using simple web scraping...

movieName$ = "The Avengers"

'Find the movie by the given name
IMDbData$ = HTTPGet$("https://www.imdb.com/find?q=";movieName$)

'Narrow down the data to the first findList table
IMDbData$ = After$(IMDbData$, "<table class=";chr$(34);"findList";chr$(34);">")

'Narrow down the data to the end of that table
IMDbData$ = UpTo$(IMDbData$, "</table>")

'Extract the link immediately following the result_text td class definition
IMDbData$ = After$(UpTo$(IMDbData$, chr$(34);" >";movieName$), "<td class=";chr$(34);"result_text";chr$(34);"> <a href=";chr$(34))

'Combine the previously extracted link with www.imdb.com to pull the movie's actual webpage.
IMDbData$ = HTTPGet$("https://www.imdb.com";IMDbData$)

Print IMDbData$

When trying to do web scraping, the trick is to figure out the URLs the server expects to receive when the user triggers an action. It can get way more complicated, and possibly impossible, with more complex sites, but from what I can tell the IMDb site seems to make it fairly simple to navigate. My first step was simply to manually search for "The Avengers" to see how the website formatted the "find movie" request, and then I just went from there.
At the point where the IMDbData$ variable is above, you are at the perfect spot to parse out all of the main characters from the string. I would do this by first narrowing down the data that lies between "Cast overview, first billed only:" and "See full cast", since this is where they live in the HTML data.
Typically with things like this, the webpages are dynamically produced, but in a very predictable fashion, since the developer would want to build this type of page for every object (i.e. movie) and reuse the majority of the code, only dynamically building in the differences between the objects at each request from a user.
I hope that helps a little ...
{:0)
Brandon Parker
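As a rough sketch of the cast-narrowing step described above: it assumes the two marker strings still appear in the movie page (the site's layout may well have changed), and it uses a made-up stand-in string in place of the real page so the snippet runs by itself. The variable name castData$ is just for illustration.

```basic
'Stand-in for the HTML returned by the last HTTPGet$() call (hypothetical content).
IMDbData$ = "page header Cast overview, first billed only: Actor One, Actor Two See full cast page footer"

'Keep only what follows the first marker...
castData$ = After$(IMDbData$, "Cast overview, first billed only:")

'...and then only what comes before the second marker.
castData$ = UpTo$(castData$, "See full cast")

if castData$ = "" then
    print "Cast section not found; the page layout may have changed."
else
    print castData$
end if
end
```

With a real page, castData$ would still contain HTML tags, so further After$()/UpTo$() passes would be needed to pull out the individual names.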
|
|
Post by dkl on Mar 24, 2021 23:06:17 GMT -5
Metro, thank you for the code; it helped me see how to use a DLL and call the clipboard. I was aware of the IMDb structure, although just adding one extra numeral doesn't always work, but I get your point. I already had the datasets that you pointed me to but wasn't really sure how to use them. Having revisited them, I understand what they are all about, so thank you for pointing me in the right direction.
|
|
Post by dkl on Mar 24, 2021 23:06:33 GMT -5
Brandon, that was exactly what I was looking for. I had already been using the final URL that the code produces to open the web page, but had been going to the website and copying the link. That was the bit I wanted to automate, and now I can :)!
The 2nd to last line of code (combined with the code before) is very helpful for producing the movie name and URL. I've used a crude way to extract the data I want from the final code, so I will look at how you have gone about it to improve my code, which I can easily combine with my current programme to produce what I need.
|
|
Post by dkl on Mar 25, 2021 4:46:10 GMT -5
Brandon, I'm really pleased with that bit of code, because now, together with the programme I've already written, I can search Actors, Directors, Writers etc. as well as Film and TV straight from the programme, without having to use the browser if necessary.
|
|
|
Post by Brandon Parker on Mar 25, 2021 12:47:34 GMT -5
I am certainly glad that you find it useful!! Let us know if there's anything else you might need...
{:0)
Brandon Parker
|
|
Post by dkl on Mar 29, 2021 1:58:12 GMT -5
Just 1 query.......
When AFTER$, UPTO$ or AFTERLAST$ are used multiple times to read a file, does each call restart at the beginning of the file, or does it continue from where the last call left off like INSTR() does?
|
|
|
Post by Chris Iverson on Mar 29, 2021 2:05:13 GMT -5
They all always start from the beginning of the string, with the exception of INSTR() if you give it a third parameter telling it where to start in the string.
They only operate on strings, not on files. How you get the file's data into the string to work with is up to you.
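A small sketch of the difference, using a made-up sample string (sample$, rest$ and hit are illustrative names only): repeated calls on the same string give the same answer, so to walk through the data you either keep the remainder and search that, or use instr() with a start position.

```basic
'Stand-in data; any string works the same way.
sample$ = "name=Alice;name=Bob;name=Carol;"

'Calling After$() twice on the same string gives the same result both times,
'because each call starts searching at the first character.
print After$(sample$, "name=")
print After$(sample$, "name=")

'To move forward, keep the remainder and search that instead.
rest$ = sample$
while instr(rest$, "name=") > 0
    rest$ = After$(rest$, "name=")
    print UpTo$(rest$, ";")
wend

'instr() can resume from a given position through its optional third argument.
hit = instr(sample$, "name=")
while hit > 0
    print "found at position "; hit
    hit = instr(sample$, "name=", hit + 1)
wend
end
```

The middle loop is the same pattern as the scraping code earlier in the thread, where IMDbData$ is reassigned after each After$() or UpTo$() call.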
|
|