Did my PhD around that time and did a project “scaling” my work on a Spark cluster. Huge PITA and no better than my local setup, which was an MBP15 with pandas and Postgres (actually I wrote and contributed a big chunk of pandas read_sql at that time to make it Postgres-compatible using SQLAlchemy).