
> is there a better tutorial/course for a beginner into this field ?

I have no clue. I have a degree in Computer Science and am finishing my master's in Applied Statistics. It just so happens that my skill set, statistical modeling, overlaps a lot with machine learning.

For a general overview, maybe https://r4ds.had.co.nz/

But for straight-up modeling I would start with, and recommend, this book: https://otexts.org/fpp2/ (update: this book doesn't go over EDA, outliers, and such... r4ds does cover EDA. For outliers and imputation, a statistics book can cover that.)

It's for time series modeling (statistically slanted), but I think it goes over the modeling aspect really well and it's very introductory (Coursera has a course a step above that book for a little more depth in time series). To be fair... for univariate time series, statistical models are number one-ish; see the M4 competition or the Uber blog on time series.
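
To give a flavor of the kind of univariate methods an intro forecasting text like that covers, here is a minimal sketch of naive, seasonal naive, and simple exponential smoothing forecasts. It is in plain Python rather than R so it runs without packages; the function names and the choice to initialise the level with the first observation are my own assumptions, not anything taken from the book.

```python
def naive_forecast(series):
    """One-step-ahead forecast: repeat the last observed value."""
    return series[-1]


def seasonal_naive_forecast(series, period):
    """One-step-ahead forecast: repeat the value from one season ago."""
    if period > len(series):
        raise ValueError("need at least one full season of data")
    return series[-period]


def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    Assumption for this sketch: the level starts at the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for y in series[1:]:
        # New level is a weighted average of the new point and the old level.
        level = alpha * y + (1 - alpha) * level
    return level
```

With alpha = 1 the smoother collapses to the naive forecast, which is a handy sanity check. In practice alpha would be estimated from the data rather than picked by hand.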

> the end goal not being academia, but being able to think and write reasonable production code.

For the thinking part, which affects how you write code: from my experience (I could be wrong), the data science/ML approach to modeling is different from the statistics approach.

The vast majority of the time, data science/ML people are given the data, which is why I believe AI algorithms can be biased (see Gfycat and the Asian computer vision classification problem). Whereas in statistics you usually figure out your hypothesis, or the question you're trying to answer, and then you design an experiment or a strategy for collecting the data without bias, hopefully controlling for other factors. Statistics also works with given data, but the vast majority of statistical models out there are slanted toward explanation rather than forecasting/prediction. DS/ML seems to care more about forecasting/prediction.

I also think how each discipline approaches modeling affects the way a person thinks within that field. I haven't figured out a unified view of how the ML/DS discipline approaches modeling, but I am confident that it's not the same as statistics. Speaking of statistics vs applied math, though, I can give an example. For time series data, statisticians model on the assumption that all we're given is the data, and we try to extract every bit of information out of it and explain away the variance with models (each predictor removes some of the noise by explaining it; whatever is left is error/chance). It's more data-focused. In applied math, they start from how the data is generated, e.g. their model assumes a particular generating mechanism, so they work with stochastic processes and lean more on probability than on statistics.
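
A toy way to see the two viewpoints side by side, as a rough sketch only: the "applied math" side writes down an assumed generating process (an AR(1) here, which is my choice for illustration) and simulates from it, while the "statistics" side is handed only the resulting numbers and estimates the coefficient from them. The function names, seed, and parameter values are all hypothetical.

```python
import random


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Generating-process view: y[t] = phi * y[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def estimate_ar1(y):
    """Data-first view: least-squares slope of y[t] on y[t-1], no intercept."""
    num = sum(curr * prev for curr, prev in zip(y[1:], y[:-1]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den


true_phi = 0.7
series = simulate_ar1(true_phi, n=5000)
phi_hat = estimate_ar1(series)
print(f"true phi = {true_phi}, estimated phi = {phi_hat:.3f}")
```

Whatever the fitted coefficient doesn't explain stays in the residuals, which is the "whatever is left is error/chance" part.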

So thinking affects code... How a statistician codes stuff is probably different from how ML/DS people do. I've seen people call imputation witchcraft and throw temporal data into a random forest >___>.
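
One reading of the random forest complaint (an assumption on my part) is that temporal ordering gets ignored, e.g. rows are shuffled before splitting so the model is tested on points that come before some of its training data. A minimal sketch of splits that respect time order, with hypothetical helper names:

```python
def time_ordered_split(series, test_size):
    """Hold out the LAST `test_size` points; never shuffle temporal data."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def rolling_origin_splits(series, initial, horizon=1):
    """Yield (train, test) pairs where the training window grows over time.

    Each test block starts right after its training block ends, so the model
    is always evaluated on data that comes after everything it has seen.
    """
    for end in range(initial, len(series) - horizon + 1):
        yield series[:end], series[end:end + horizon]


train, test = time_ordered_split(list(range(10)), test_size=3)
folds = list(rolling_origin_splits(list(range(5)), initial=3))
```

Here `train` holds the first seven points and `test` the last three, and each fold only ever tests on observations later than its training window.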

> write reasonable production code.

I do R for modeling.

If you're not creating any fancy new algorithm, you can just use packages. You train them on the data set and just ship the trained model. You can wrap it as a REST service via https://www.rplumber.io/. I'd like to think Python has something similar?
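
For the Python side, here is a rough sketch of the same train / ship / serve flow using only the standard library. The "model" is a toy that just stores the training mean, and every name here (`train_and_serialize`, `handle_request`, `serve`, the `horizon` field, port 8000) is made up for illustration; a real setup would more likely use a web framework such as Flask or FastAPI and a proper modeling package.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer


def train_and_serialize(values):
    """'Train' a toy model (just the mean) and return it as shippable bytes."""
    model = {"mean": sum(values) / len(values)}
    return pickle.dumps(model)


def handle_request(model, body):
    """Map a JSON request body like {"horizon": 2} to a JSON response body."""
    horizon = int(json.loads(body).get("horizon", 1))
    return json.dumps({"forecast": [model["mean"]] * horizon})


def make_handler(model):
    """Build a request handler class bound to one already-trained model."""
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length) or b"{}"
            payload = handle_request(model, body).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return Handler


def serve(model_bytes, host="127.0.0.1", port=8000):
    """Load the shipped model and answer POST requests until interrupted."""
    model = pickle.loads(model_bytes)
    HTTPServer((host, port), make_handler(model)).serve_forever()
```

Calling `serve(train_and_serialize([3.0, 5.0, 7.0]))` would start the service locally. The training step and the serving step only share the serialized bytes, which is the "train it, then just ship the trained model" idea. Only unpickle files you produced yourself, since pickle can execute arbitrary code on load.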

Do note that I know very little about deep learning or how to ship it. I've chosen to specialize in statistical modeling and to use R for modeling.


