No public posts yet, unfortunately. For spinning up jobs, we have an internal tool that launches machines and provisions them, but in my own experiments I usually just end up spinning up machines and rsyncing the experiments over.
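For anyone wondering what the manual "spin up machines and rsync" route might look like, here is a minimal sketch. It is not the internal tool mentioned above. The helper names (`build_rsync_command`, `sync_to_hosts`), the `ubuntu` login, and the paths are all hypothetical, and it assumes the machines are already running and reachable over SSH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_command(host, local_dir, remote_dir, user="ubuntu"):
    """Build the rsync argv for one host (hypothetical helper).

    -a preserves permissions and timestamps, -z compresses in transit.
    The trailing slash on the source copies the directory's contents
    rather than the directory itself.
    """
    source = f"{local_dir.rstrip('/')}/"
    target = f"{user}@{host}:{remote_dir}"
    return ["rsync", "-az", "-e", "ssh", source, target]


def sync_to_hosts(hosts, local_dir, remote_dir, user="ubuntu", max_workers=4):
    """Run rsync against several hosts in parallel; return exit codes by host."""

    def sync(host):
        cmd = build_rsync_command(host, local_dir, remote_dir, user)
        # check=False so one failing host does not abort the others
        return host, subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(sync, hosts))
```

Calling `sync_to_hosts(["10.0.0.1", "10.0.0.2"], "experiments", "/home/ubuntu/experiments")` would return a dict mapping each host to its rsync exit code, with 0 meaning success. The addresses here are placeholders.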
Ah, fair enough. Too bad, I'd be really interested in hearing how you do that. I work for a consulting firm as a data scientist, and we have a lot of issues with data size. I'm aware of the tricks for image data (rotating and flipping, playing with different gamma levels), but I haven't seen much for standard tabular data.
Also, what do you use to spin up multiple jobs in parallel on EC2?