I started doing this with 10-K reports, but the formatting differences between years were too much of a pain in the ass. They started embedding Excel documents, in what I think is base64, into text files? I don't know. In the 90s the tables were in plain text.
Yes, XBRL (XML) files. However, they only give you the financials. What OP wanted was the management summary, risk factors, etc., which still need to be parsed. Luckily, many companies now put anchors to those sections in the HTML.
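For what it's worth, here is a rough sketch of how those anchors could be picked up with nothing but the Python standard library. It assumes the filing has a table of contents made of in-page links such as `<a href="#some_id">Item 1A. Risk Factors</a>`; that markup, the class `AnchorLinkCollector`, and the helper `find_section_anchors` are all my own illustration, and real filings vary a lot, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class AnchorLinkCollector(HTMLParser):
    """Collect (fragment, link text) pairs for in-page links like <a href="#x">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._fragment = None  # fragment of the <a> we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Only keep links that point inside the same document.
            if href.startswith("#") and len(href) > 1:
                self._fragment = href[1:]
                self._text = []

    def handle_data(self, data):
        if self._fragment is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._fragment is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._fragment, text))
            self._fragment = None


def find_section_anchors(html, keywords):
    """Map each keyword to the first in-page fragment whose link text contains it."""
    collector = AnchorLinkCollector()
    collector.feed(html)
    collector.close()
    found = {}
    for fragment, text in collector.links:
        for keyword in keywords:
            if keyword.lower() in text.lower() and keyword not in found:
                found[keyword] = fragment
    return found
```

Once you have the fragment, you would still need to locate the element carrying that `id` (or `name`) and cut the text up to the next section, which is where most of the per-company mess tends to be.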
I once hacked together an AI to try to predict whether the price of Bitcoin would go up or down based only on time and price history. The program worked, but I remember it didn't predict very well. Maybe tinkering with it and reworking it would lead to something, but combining the AI with the exchange APIs is daunting.
I use the Google Daydream headset mostly for YouTube. I would kill for a browser so I could plug my keyboard in and browse the internet on a gigantic screen.
I tried to get bulk data from EDGAR a while back. Turns out bulk data acquisition has been turfed to a third party, which charges for downloads. This is supposed to be free federal data. I was so pissed off. I am still pissed off.
Huh? It takes time to download all the filings for free by parsing the index [1], but it can be done. You also have to add backoff logic for when you get a "too many requests" error, but it isn't that hard. I did it, and I am still updating as new filings are posted.
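In case it helps anyone, the backoff part can be sketched roughly like this in Python. This is not the code I actually run, just a minimal illustration: the function names, the default delays, and the `user_agent` string are placeholders I made up, and it assumes the server signals "too many requests" with HTTP status 429.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield delays base, 2*base, 4*base, ... with each one capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                       max_retries=5, base=1.0, cap=60.0,
                       user_agent="example-crawler admin@example.com"):
    """Fetch `url`, sleeping and retrying whenever the server answers 429.

    `opener` and `sleep` are injectable so the retry logic can be tested
    without touching the network. The user agent is a placeholder.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for delay in backoff_delays(max_retries, base, cap):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # any other HTTP error is not a rate limit, so give up
            # Add random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"still rate limited after {max_retries} attempts: {url}")
```

You would call `fetch_with_backoff(url)` for each index or filing URL and tune `base` and `cap` to whatever request rate the server tolerates.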
I made fedcrawl.com; the entire backend runs on Lua/Torch. This involves scraping federal contract sites, cleaning text, and ultimately indexing everything on Elasticsearch. It's really just a few Lua scripts running as services via systemd and passing messages with Redis. I love it for the speed and how simple the code looks.
Not really, it's pretty unorthodox to look for genetic sequences as evidence of infection. Typically you look for antibodies, as the virus may not be circulating, or is simply not where you are sampling.
Folding is dynamic over the lifetime of the RNA. Things like SHAPE are nice but only provide a snapshot. It is still leagues better than simply going by the old-school thermodynamics-based folding algorithms. A neat way to check out this 'folding space' is to generate all possible structures (possible even for long sequences if you don't include pseudoknots), then do alignments, or integrate experimental data like SHAPE.
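To make "generate all possible structures" a bit more concrete, here is a toy sketch in Python that lists every nested (pseudoknot-free) structure of a sequence in dot-bracket notation. The function name, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop of 3 are my own assumptions for illustration. It materializes every structure as a string, so it is only practical for short sequences; it does no energy scoring and does not touch SHAPE data.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq, min_loop=3):
    """Return all pseudoknot-free structures of `seq` as dot-bracket strings."""

    @lru_cache(maxsize=None)
    def build(i, j):
        # All structures for the half-open interval seq[i:j].
        if j - i <= 0:
            return ("",)
        results = []
        # Case 1: base i stays unpaired.
        for rest in build(i + 1, j):
            results.append("." + rest)
        # Case 2: base i pairs with base k, enclosing at least min_loop bases.
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                for inner in build(i + 1, k):
                    for outer in build(k + 1, j):
                        results.append("(" + inner + ")" + outer)
        return tuple(results)

    return list(build(0, len(seq)))
```

Because each structure is split uniquely by what the first base does (unpaired, or paired with one specific partner), nothing is generated twice. For example, `enumerate_structures("GAAAC")` gives just the open chain `.....` and the single hairpin `(...)`.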