kusmi · on Dec 15, 2019

I always used NiFi.

kusmi · on May 31, 2018

I started doing this with 10-K reports, but the formatting differences between years were too much of a pain in the ass. They started embedding Excel documents, in what I think is base64, into text files? I don't know. In the 90s the tables were in plain text.

rpedela · on June 1, 2018

Yes, XBRL (XML) files. However, they only give you the financials. What OP wanted was the management summary, risk factors, etc., which still need to be parsed. Luckily, many companies now put anchors to those sections in the HTML.
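
For anyone wanting to try the anchor route, here is a rough sketch using only the Python standard library. It assumes the filing has a table of contents made of in-page links such as `<a href="#item1a">`; the class name, function name, and anchor ids are made up for illustration, and real filings vary a lot.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class AnchorCollector(HTMLParser):
    """Collect in-page links (href starting with '#') and their link text."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: Dict[str, str] = {}   # link text -> anchor id
        self._current: Optional[str] = None  # anchor id of the link being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("#"):
                self._current = href[1:]
                self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace inside the link text.
            label = " ".join("".join(self._text).split())
            self.sections[label] = self._current
            self._current = None


def find_section_anchors(html: str) -> Dict[str, str]:
    """Return a mapping of table-of-contents labels to anchor ids."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Once you have the anchor ids, you still have to cut the text between one anchor target and the next, which this sketch does not attempt.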

kusmi · on April 28, 2018

I once hacked together an AI to try to predict whether the price of Bitcoin would go up or down based only on time and price history. The program worked, but I remember it didn't predict very well. Maybe tinkering and reworking it would lead to something, but combining the AI with the exchange APIs is daunting.

kusmi · on April 3, 2018

I use the Google Daydream headset mostly for YouTube. I would kill for a browser so I can plug my keyboard in and browse the internet on a gigantic screen.

kusmi · on March 22, 2018

I tried to get bulk data from EDGAR a while back. Turns out bulk data acquisition has been turfed to a third party, which charges for downloads. This is supposed to be free federal data. I was so pissed off. I am still pissed off.

rpedela · on March 23, 2018

Huh? It takes time, but you can download all the filings for free by parsing the index [1]. You also have to add back-off logic for when you get a "too many requests" error, but it isn't that hard. I did it and I am still updating as new filings are posted.
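
To give an idea of what the back-off part can look like, here is a minimal sketch, not my actual crawler. It assumes the "too many requests" error shows up as HTTP status 429 (check what the server really returns), and `fetch` is a hypothetical function you supply that takes a URL and returns a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, bytes]],
    url: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url), retrying with exponential back-off on status 429."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 429:
            # Anything other than "too many requests" is not retried here.
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_retries:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up on {url} after {max_retries} retries")
```

The `sleep` parameter is only there so the waiting can be swapped out in tests. In a real crawler you would probably also add some random jitter and a fixed pause between successful requests.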

1. https://www.sec.gov/edgar/searchedgar/accessing-edgar-data.h...

__john · on March 23, 2018

How long ago was this?

kusmi · on March 23, 2018

Half a year?

kusmi · on March 15, 2018

Lua/Torch backend + Redis + React

kusmi · on March 12, 2018

I made fedcrawl.com; the entire backend runs on Lua/Torch. This involves scraping federal contract sites, cleaning text, and ultimately indexing everything in Elasticsearch. It's really just a few Lua scripts running as services via systemd and passing messages with Redis. I love it for the speed and how simple the code looks.

kusmi · on Feb 20, 2018

Not really, it's pretty unorthodox to look for genetic sequences as evidence of infection. Typically you look for antibodies, as the virus may not be circulating, or is simply not where you are sampling.

kusmi · on Feb 14, 2018

Folding is dynamic over the lifetime of the RNA. Things like SHAPE are nice but only provide a snapshot. It is still leagues better than simply going by the old-school thermodynamics-based folding algorithms. A neat way to check out this 'folding space' is to generate all possible structures (possible even for long sequences if you don't include pseudoknots), then do alignments, or integrate experimental data like SHAPE.
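
As a toy illustration of "generate all possible structures" without pseudoknots, here is a small sketch. It is not what any particular folding package does: it just enumerates every nested set of base pairs in dot-bracket notation, assuming Watson-Crick and G-U pairs and a minimum hairpin loop of 3 unpaired bases. The function name and the `min_loop` parameter are my own choices. The explicit list grows very quickly with length, so this version is only meant for short sequences.

```python
from functools import lru_cache
from typing import List, Tuple

# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq: str, min_loop: int = 3) -> List[str]:
    """Return every pseudoknot-free structure of seq in dot-bracket notation."""

    @lru_cache(maxsize=None)
    def fold(i: int, j: int) -> Tuple[str, ...]:
        # All structures for the subsequence seq[i:j].
        if j - i <= 0:
            return ("",)
        results = []
        # Case 1: base i is unpaired.
        for rest in fold(i + 1, j):
            results.append("." + rest)
        # Case 2: base i pairs with base k, leaving at least min_loop bases
        # between them. The parts inside and after the pair fold independently,
        # which is exactly what rules out pseudoknots.
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                for inner in fold(i + 1, k):
                    for outer in fold(k + 1, j):
                        results.append("(" + inner + ")" + outer)
        return tuple(results)

    return list(fold(0, len(seq)))


print(enumerate_structures("GAAAC"))  # ['.....', '(...)']
```

Each structure comes out exactly once, because the first base is either unpaired or paired with one specific partner. From here you could score the structures, cluster them, or filter them against experimental reactivities.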

kusmi · on Dec 10, 2017

Where's the code?