Obligatory Comment: "Excel as a Database" by Rory Blyth, circa early 2000s.
"As a developer, you've probably, at some unfortunate point in your life (possibly several points, actually), been handed an Excel file that has been crammed full of 'data' by someone in marketing and told to 'do something with it.'" http://wyorock.com/excelasadatabase.htm
As a developer, what should I do if I actually want to do that? I get a lot of satisfaction out of massaging messy wads of data into a clean, uniform, accessible form. Is there a job title or something I should look for?
The Analytics team at my company does this. We call them ETLs (Extract, Transform, Load). Some of us are app devs and some are data analysts; there's cross-over. We do other things too, but moving data between systems is a big part of it.
As for job titles: Software Engineer (or Developer) of Analytics, Data Analyst, Data Scientist, or something along those lines. Probably varies by company.
Recently I worked on a contract doing some data ingestion that the resident team didn't want to deal with.
They had been provided a JSON API to get what should have been a batch file. The owner/API creator said "this is good enough, we won't accommodate you." So much for sensible data transmission and batch processing.
Because I have had to deal with these sorts of things before, I have a fairly robust tool chain to hammer APIs with requests to get the data in a reasonable time frame. In this case my client went full tilt and simply hammered the API till its owner gave in and started sending batch files.
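For anyone wondering what "hammering" looks like in practice, here is a minimal sketch, not my actual tool chain. It assumes the API is paginated and that pages can be requested independently; `fetch_page` is a hypothetical stand-in that returns canned data instead of making a real HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page: int) -> list[dict]:
    """Hypothetical stand-in for one HTTP request to the JSON API."""
    # A real version would issue a GET for this page and decode the body.
    return [{"page": page, "row": i} for i in range(3)]


def fetch_all(pages: int, workers: int = 8) -> list[dict]:
    """Fetch every page concurrently and flatten the results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though the requests overlap.
        batches = pool.map(fetch_page, range(1, pages + 1))
        return [record for batch in batches for record in batch]


records = fetch_all(pages=5)
```

A real version would also want retries and some respect for rate limits, unless the goal is to annoy the owner into sending batch files.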
Yes, the end result in most cases is good old CSV: repeated variable names are bloat when getting lots of data, and extra formatting is the same. CSVs are great for getting these things down.
As another poster pointed out, (S)FTP is the way to go when sending CSVs, and they are compressed in most cases to save storage and data transmission on both ends.
I am old enough to remember when mailing a hard disk or a tape might be a faster way to move a LOT of data. The practice is still alive and well, but the option now comes by the "truckload".
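To put a rough shape on the bloat argument, here is a small sketch that serializes the same made-up rows as JSON and as CSV, then gzips the CSV. The field names and values are invented for illustration, and the exact sizes will depend on your data.

```python
import csv
import gzip
import io
import json

# Made-up rows; every JSON object repeats all three key names.
rows = [
    {"transaction_id": i, "amount_cents": 100 + i, "currency": "USD"}
    for i in range(1000)
]

json_bytes = json.dumps(rows).encode("utf-8")

# CSV states the column names once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_bytes = buffer.getvalue().encode("utf-8")

# Compress before transfer, as is common for files sent over (S)FTP.
csv_gz = gzip.compress(csv_bytes)

print(len(json_bytes), len(csv_bytes), len(csv_gz))
```

Gzipped JSON closes much of the gap, since repeated keys compress well, so the difference matters most for what has to be parsed and stored uncompressed.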
Probably means a single report with all the data sent at a regular interval. (All transactions between 00:00 hours and 24:00 hours the previous day, sent by 9:00 am to these email inboxes).
Instead, the client was given an API to query, and no batch processing of information was allowed.
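The selection step of a daily report like that can be sketched in a few lines. This assumes naive timestamps in a single time zone and a hypothetical list of transaction dicts with a `timestamp` key; the half-open window treats "24:00" as 00:00 of the next day, so a midnight transaction lands in exactly one batch.

```python
from datetime import date, datetime, time, timedelta


def previous_day_window(today: date) -> tuple[datetime, datetime]:
    """Return [start, end) covering all of the day before `today`."""
    start = datetime.combine(today - timedelta(days=1), time.min)
    end = datetime.combine(today, time.min)
    return start, end


def select_batch(transactions: list[dict], today: date) -> list[dict]:
    """Keep only the transactions that fall inside yesterday's window."""
    start, end = previous_day_window(today)
    return [t for t in transactions if start <= t["timestamp"] < end]


# Hypothetical sample data for illustration.
transactions = [
    {"id": 1, "timestamp": datetime(2024, 3, 4, 23, 59, 59)},
    {"id": 2, "timestamp": datetime(2024, 3, 5, 0, 0, 0)},
    {"id": 3, "timestamp": datetime(2024, 3, 5, 23, 59, 59)},
    {"id": 4, "timestamp": datetime(2024, 3, 6, 0, 0, 0)},
]
batch = select_batch(transactions, today=date(2024, 3, 6))
```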
I’ve encountered this idea somewhere, and fortunately have never had to deal with such a broken scenario. Getting piecemeal information is infuriating.
He probably means the output of a batch processing pipeline. Clients will calculate a set of data once a day and shove it into some kind of storage[0] for you to download.
[0] probably an FTP server with no TLS and with the same weak username and password for everyone who connects, a setup which they won't change no matter how many times you ask
What the other child poster said. This is 90% of my job. It’s really fun a lot of the time and absolutely brutal the rest of it. Be prepared for things like writing code that checks the color or text styling of cells, etc. People encode data in wacky ways.
Seriously. I frequently get handed spreadsheets with integers typed as strings... people using strings for things that, heaven's sake, should undoubtedly be booleans... Getting the thing into usable condition takes up more time and energy than the actual analysis they desire.
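A minimal sketch of the kind of cleanup this involves, assuming the cell values have already been read into Python. The accepted spellings for true/false are my own guesses at what turns up; real spreadsheets will need a longer list.

```python
# Assumed spellings; extend these for whatever your spreadsheets contain.
TRUE_WORDS = {"true", "yes", "y", "1"}
FALSE_WORDS = {"false", "no", "n", "0"}


def to_int(value):
    """Return an int for values like ' 1,204 ' or 42.0, else None."""
    if isinstance(value, bool):
        return None  # bool is a subclass of int; don't count True as 1 here
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value) if value.is_integer() else None
    text = str(value).strip().replace(",", "")
    try:
        return int(text)
    except ValueError:
        return None


def to_bool(value):
    """Return True/False for common yes/no spellings, else None."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    return None
```

For example, `to_int(" 1,204 ")` gives `1204` and `to_bool("Yes ")` gives `True`, while anything unrecognized comes back as `None` so it can be flagged rather than silently guessed.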
"As a developer, you've probably, at some unfortunate point in your life (possibly several points, actually), been handed an Excel file that has been crammed full of 'data' by someone in marketing and told to 'do something with it.' " http://wyorock.com/excelasadatabase.htm