I find myself needing to extract URLs from text files quite often, and this is the easiest one-liner of Linux command-line magic I have found for extracting URLs from text files.
cat filename | grep http | grep -shoP 'http.*?[" >]' > outfilename

The first grep helps reduce CPU load. The second grep uses Perl-compatible regex syntax (-P) to enable non-greedy matching, so each match stops as early as possible and you can pull several URLs out of a single line of HTML. You will usually still be left with a trailing quote at the end of each URL; that is easy to remove in your favorite text editor by replacing every quote with nothing. Simple and short, and it works well.

This one is a lot more powerful: it searches every file under the current directory for links and writes them to a file one directory level above. If you write the output into the same directory, you will end up in an infinite loop that fills your hard drive.

find * -exec cat {} \; | grep http | grep -shoP 'http.*?[" >]' > ../outfilename
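If you want to skip the clean-up step in the text editor, you can strip that trailing quote, space, or bracket with sed right in the pipeline. This is just a sketch of the same idea (the sed expression is mine, not part of the original one-liner), assuming GNU grep and sed:

grep -shoP 'http.*?[" >]' filename | sed 's/[" >]$//' > outfilename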
‘cat filename | grep http’ is pointless; do ‘grep http < filename’ instead. See: http://mywiki.wooledge.org/BashGuide/Practices#Don.27t_Ever_Do_These
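Applied to the one-liner above, that suggestion would look something like this (a sketch, not the commenter's exact command), keeping the plain grep only as the optional pre-filter the post mentions:

grep http < filename | grep -shoP 'http.*?[" >]' > outfilename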
I tried it, but it didn't work as expected. I got many URLs but not all of them, especially URLs that are long and complicated, with hash keys and such. Example:
http://5.lw5.videobam.com/storage/5/videos/a/dw/KCsRT/encoded.mp4/6bd24ab76dbe926715432d764e9bb9b2/535290ae?ss=232
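One likely reason for the missed matches: the pattern 'http.*?[" >]' only matches a URL that is followed by a quote, space, or '>', so a URL sitting at the end of a line, or delimited by anything else, is silently dropped. Matching the allowed URL characters directly tends to be more robust. A sketch, assuming GNU grep with PCRE support (the character class is approximate, not an exhaustive URL grammar):

grep -shoP 'https?://[^\s"<>]+' filename > outfilename

This picks up long URLs with query strings, like the example above, without requiring a trailing delimiter, though it may still need tuning for URLs wrapped in single quotes or parentheses.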