It's also in the links in that repo, but may be more interesting to some folks here:

Katie's paper on VLBI reconstruction: https://arxiv.org/abs/1512.01413

This is how I learned about the topic and I think it's well suited for computery folks, since it was published in CVPR.


There are 6 authors for that paper, the way you linked it suggested it's solely by her; are the other authors listed honorarily?

Similarly, the title suggests she worked alone on the project, which seems exceedingly unlikely given the need for telescope time and computing time and the wide range of disciplines I imagine the project covers. Did she work alone? That must be almost unique in experimentalism nowadays.


It depends a bit on the scientific field, but in mine (computer vision, which this was also published in) the first author is the "main" author, usually doing most if not all the "ground work", then you have collaborators and co-authors, and finally the supervisors (in the case of the PhD student that she was at the time). It is becoming more common to see papers with "Author1*, Author2*, Author3..." lists, where the "*" authors count with equal contribution. This is important for attribution and the infamous metrics that funding often depends on, but it's not the case here.

So I'm certain those authors did their part, so maybe yes, I should have linked this as "Bouman et al." but I wouldn't expect this to be six equal contributions either.

That all being said, she's certainly standing on the shoulders of a pyramid of giants there.

Edit: to the people downvoting the parent, maybe explain? I didn't take this as a bad faith comment. It can be genuinely confusing to someone who doesn't know the ins and outs of academic attribution...


I'd argue that in practice science is more of a pile of giants.


Giant turtles. It's giant turtles all the way down.


I've found that CS contributions in academia are often poorly credited (sometimes even entirely neglected) to endeavors not purely executed within CS itself.

It's interesting considering how many modern scientific endeavors are dependent on new innovative algorithms, software, and computing techniques in both experimental and theoretical work, yet it's frequently just hand-waved away as "technology."

I'm not saying such contributions typically lay the groundwork for an experiment or theory in another domain (though they can), but I am saying active CS involvement/expertise is typically critical to many scientific endeavors' success these days. If a project is interdisciplinary, there's probably a computer scientist on the team helping out.


I’ve gotten authorship when I have collaborated on the paper itself (rather than just research: that’s how you get into acknowledgements, same as lab techs, collaborators who didn’t work on this specific paper, etc.)


Although I agree with the point you have raised, it can sometimes be a little tricky to draw the line. Should we have Microsoft cited for projects completed using their software or system? Should we always cite Newton when using calculus?

I think society implicitly assumes that there has been a ton of people backing up a single individual towards their main achievements and that the individual is humble enough to know and to try - ever so slightly - to show appreciation.


That's why I said "actively."

From my perspective, if in order to accomplish your work, you need consultation or active collaboration with a computer scientist and otherwise could not develop/test your theory or conduct your experiment, then they almost certainly should be cited as an author/collaborator.

If you utilize something OTS outside the project that just works for you and don't need a computer scientist, then whatever entity created that OTS IP isn't really an author/collaborator, but it's likely their work should be cited/referenced if it's part of the critical methodology (as part of disclosure and repeatability).

If your project used Microsoft Word to write up a report, it's not important to the underlying science you conducted. You could have substituted it with TeX, other Word processors, or pen/paper and it wouldn't change the outcome of the underlying theory/experiment you developed. If you used an Ansys package to perform analysis for some purpose, you should probably mention that out of rigor but Ansys isn't an author or collaborator.

If on the other hand you need someone to architect a solution to handle processing your massive dataset, needed someone to write custom code because nothing could do what you needed, needed a new algorithm because you had no clue how to approach the problem, or even needed someone to modify source code significantly to something that existed but couldn't do what you needed, then they are certainly an author/collaborator. If you took existing code/algorithm and made it more efficient in order to accomplish a task that would have taken too long otherwise, you're a contributor/collaborator and should be listed as an author.

This has been a huge issue in academic research but it's been getting a bit better and researchers are starting even more to acknowledge/credit computing professionals as crucial contributors and authors, as they rightfully should be.


> That's why I said "actively."

That's one of the reasons I said "I agree with the point you have raised".

I also think you have, once again, raised some good points. Hopefully, others will use similar structures when writing and publishing their research.


In this case she is a Computer Scientist, so the CS contributions are the first author on the paper.


Not just CS, but depending on the discipline, math and statistics more generally.


Papers are routinely credited to their first author. Everyone understands the co-authors also took part. She also clearly credits the rest of the team in the posted BBC article.


She does clearly credit the rest of the team, which is admirable as she's also clearly been key to this piece of work.

Somehow though my facebook feed is already littered with images saying she was single-handedly responsible and no one's talking about her.

https://bit.ly/2Gkfk7f


[flagged]


I guess that's gonna happen once these things become so commonplace as not to qualify to be "news". So... they're working on it I guess :D

To clarify: I don't doubt women can do science, just empirically, they don't get to do it as often (at this level) as men.


There is actually nothing really in the BBC story about being a woman other than in the context of her getting attention on Twitter, but barely that. I think whoever changed the title might have been confused as well. It doesn't refer to woman as gender but woman as subject as in e.g. "the man walked his dog".


They are not doing that.


you seem pretty eager to get upset about something!


Very few science papers have a single author these days. She's listed as first author, so presumably her contribution was at least as important as anyone else's.


If we want to be pedantic, this paper is the work of millions of humans throughout history, who have helped develop the math knowledge for such a project to even be feasible.


[flagged]


No doubt. But can you please not post unsubstantive comments to Hacker News? Especially not ones that violate this site guideline: "Please don't post shallow dismissals, especially of other people's work."

https://news.ycombinator.com/newsguidelines.html


Once you post your impossibly distant black hole images, I'm sure they'll consider your pull request cleaning it up...


For those who didn't realise above, I was being flippant.

On a slightly serious note though, I wonder how much productivity is lost in the scientific community due to poorly written and documented code?

I've heard stories of 40 year old Fortran code written by long deceased professors that was written to crunch physics numbers or whatever, and when it's come time to modify or add to it, nobody can make head nor tail of it and they have to write it from scratch.

There's a reason why in the non-academic world we have coding standards and code review. Code isn't written in a bubble, other people will look at it and work on it.

That's not to belittle or criticise the work done in the slightest. Cleanliness of code is orthogonal to functionality. You can have beautifully written, clean and documented code that doesn't do what it's meant to, and likewise you can have a complete mess of code that performs some genius function perfectly.


I write that code. ;-)

It's a toss-up. On the one hand, there's a loss due to dirty code, but a gain by a smaller group of people being able to do multidisciplinary work. In my own case, I'm a physicist outside academia, and in addition to code, I also do electronics and a variety of other things.

When you're doing exploratory R&D, as I am, there are downsides to getting things done by domain specialists. First, you have to find people with quantitative skills, and they tend to be in the greatest demand due to scarcity. Second, you have to manage the politics of getting them assigned and engaged. Third, you have to manage the interface between specialties. It becomes a project management exercise. And then, the way that code and project files are structured, it may be possible to read isolated sections of code, but very hard for a non-expert to find their way around the myriad of files that tend to form a modern code base.

In my own case, I do what I can to write good code. I try to keep up to date on good practices, and so forth. Could we do better? Sure. The quest to improve my coding is how I accidentally bumped into HN in the first place.


Don't worry about these comments. The worst thing in science is usually that the code is not published (and these comments on code quality don't help).

As long as it's published, if somebody wants to reuse it, reimplementing from the paper is the hardest part.


I agree with you that the scientific community is way behind industry standards, but the reason for that is much less of their code is actually designed for reuse. The overwhelming majority of their work is just "let me try writing this code and see what results I get."

Industry professionals are forced to take the approach of "I need to write this code to be as maintainable and flexible as possible" because they have no idea what the business is going to want next and generally have no set timeframe for how long they may have to maintain any particular project.


A lot of industry code is also glue logic which doesn't express any original idea which makes it inherently easier to document. Code expressing a novel algorithm is never going to be as easy to document and maintain as code plugging standard libraries together. Notably, code in "industry" which does express novel algorithms is often also not so easy to read, there just isn't that much of it on most projects.


There are efforts to integrate more industry standard software engineering practices into research (RSE or "research software engineering" as a phrase is growing in popularity):

US: http://urssi.us UK: https://www.software.ac.uk


A lot of academic code is also written by students. For instance, I'm working on a project that ends up with code written by 6 masters' students. I'm trying desperately to get them to use Git or some other kind of version control rather than emailing me files, but it's only been partially successful. My last CS class per se was 18 years ago. They don't know (&(^ about programming -- at least I've been paid to do it in a production context in a company that has to make money to justify its existence -- and since they learned C++ first but we're programming in R or Python there are some ridiculous and unnecessary maneuvers and lots of for loops. I try to work through the code with them but I also don't have time for all of it, since I'm also teaching several classes etc. Sometimes it's easier to go with the crap I've got (that I've tested for correctness) than rewrite things.
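To make the "lots of for loops" point concrete, here is a toy sketch of the kind of thing I mean. It is not code from the actual project, and both function names are made up for illustration; it just shows the same small calculation written the way someone coming from C++ tends to write it, and then in more idiomatic Python.

```python
def mean_of_squares_loop(values):
    # C++-style: manual index loop with a running accumulator
    total = 0.0
    for i in range(len(values)):
        total = total + values[i] * values[i]
    return total / len(values)


def mean_of_squares_idiomatic(values):
    # Same computation with a generator expression and the built-in sum()
    return sum(v * v for v in values) / len(values)


data = [1.0, 2.0, 3.0, 4.0]
# Both versions agree; the second is just shorter to read and review
assert mean_of_squares_loop(data) == mean_of_squares_idiomatic(data)
```

Neither version is wrong, which is part of why it's hard to justify the time to rewrite tested student code.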

If people have good resources I could pass to students about standards for Python code, for instance, let me know.


This is an issue in my field (Engineering) as well.

Most people in my field (materials engineering) are not programmers either; they are lucky if they've done one intro course 10 years ago (which was probably done in a language like Java or Visual Basic).

Even then, what gets taught in an intro course at university is not the type of code that is written "on the job". I did two semesters of programming courses when I was at uni (as electives); my courses were taught in Java and focused on stuff like object-oriented programming and memorizing stuff about "the waterfall model".

There is a pretty big gap between this and my first experience which was being sat down in front of some 30 year old Fortran code which had no objects, no classes etc.

The go-to, at least in my org, when people are trying to understand scientific code, write their own algorithms, etc., is the 30 year old "Numerical Recipes" (https://en.wikipedia.org/wiki/Numerical_Recipes) textbook. The explanations in this textbook are the best and simplest I have come across by far.

I know I personally referenced this book heavily when I was writing code in C to do Spline interpolation/smoothing. I am unaware of any other reference for a lot of algorithms/techniques than this book.

The only other thing I am aware of is the GNU GSL library, which in my experience is harder to understand for beginners; even its example code is "for loop based".

For example: https://www.gnu.org/software/gsl/doc/html/bspline.html

If I had to convert this code to R (which I do know) or Python (which I've never written) I'd probably write it in this loop-based style as well; it's what I know and what makes sense to me and the people in my org I'd expect to be interacting with my code (the "engineers can write Fortran in any language" meme is a real issue).

Maybe someone should write a new textbook on the "modern" way to solve these sorts of problems. If such a thing exists I am unaware of it, but it would certainly be welcome.


This makes me wonder if universities could employ a bootcamp-like curriculum, with lots of feedback, collaboration and unit tests, and make it available for students in these disciplines. Like how many schools have everyone take writing classes.


I think this would be very useful. So useful. I personally haven't been able to get anything code-related through the curriculum committee though (I'm not in CS).


There's a lot of Fortran code in underlying math libraries that is highly, highly optimized, including the Fortran compilers themselves (mainly due to age and demand to eke performance out).

I worked with an old Fortran codebase at one point and there were comments in the documentation (a scan of a typewritten document) throughout about switching "cards" and "decks"... took me a moment to realize it was referring to punch cards (and I thought I was old). That also led to the program being fragmented into several smaller sub-programs (so the card reader could handle it), which is now a trivial matter to handle. Maybe they were just ready for the SOA and microservices trend.

In academia, pressure is often on publishing and pulling funding in through grants and contracts. I've done a lot of rapid prototyping in academic research environments and while writing clean software is always on my mind, often, sitting down and refactoring to be more cleverly efficient or taking time to focus on structure, long term maintainability, etc. isn't a priority and refocuses needed cognitive load from the high level research goal the software needed to achieve to instead focusing on production quality software.

I'm not concerned if it takes O(2n) vs O(n) or O(n log n) vs. O(n) time if I know the target scale is small. I'm not concerned that I can cleverly avoid using an extra data structure (and reduce space complexity) if I can do this operation in place on an existing data structure using some reasonably complex algorithm. Chances are I might remove this functionality entirely tomorrow or some student may have to figure it out later on, and I don't want to implement or explain to the student the Boyer-Moore majority algorithm when a brute force O(n^2) approach is just fine here and a lot easier to adjust/maintain for a passerby scientist/student.
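For anyone who hasn't seen the trade-off, here is a minimal sketch of both approaches to finding a majority element (one that appears in more than half the positions). The function names are my own and this is illustrative only, not code from any project discussed here.

```python
def majority_brute_force(items):
    # O(n^2) time: for each candidate, rescan the whole list to count it
    for candidate in items:
        if items.count(candidate) > len(items) // 2:
            return candidate
    return None


def majority_boyer_moore(items):
    # O(n) time, O(1) extra space: the first pass pairs off unequal
    # elements, leaving the only possible majority candidate
    candidate, count = None, 0
    for item in items:
        if count == 0:
            candidate = item
        count += 1 if item == candidate else -1
    # The second pass verifies the candidate, because the first pass
    # produces one even when no majority exists
    if items and items.count(candidate) > len(items) // 2:
        return candidate
    return None
```

The brute force version is obvious at a glance; the voting version needs a paragraph of explanation about why cancelling pairs works, which is exactly the cost I'd rather not pay at small scale.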

I'm aware there's a lot of problems and maybe my abstraction hierarchies aren't the best, I could probably make something better with more time.

You have some high-level complex process you're trying to represent and translate into a program (maybe a simulation, maybe a complex model or set of models, etc.). You're not always concerned about whether there's a better way to write it or about making extensive use of all the features of whatever language you needed to work in (which you may or may not have experience with, since you needed to work from existing codebases to start with because time is tight); you simply want to use whatever requires the least time and cognitive load to think about and produce results so you can keep your eyes on the target of what you're developing.

Later on, when prototypes work (or if you hit performance bottlenecks stopping progress), then and only then do you start refactoring and looking at performance optimization--targeting the biggest bottlenecks first.
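As a rough sketch of what "target the biggest bottlenecks first" looks like in practice, assuming a Python prototype: the standard library's cProfile and pstats modules will rank functions by cumulative time. The three functions below are hypothetical stand-ins for a real pipeline, not anything from an actual codebase.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Deliberately quadratic: stands in for the real bottleneck
    return sum(i * j for i in range(n) for j in range(n))


def fast_step(n):
    # Linear work that is not worth optimizing yet
    return sum(range(n))


def pipeline(n):
    return slow_step(n) + fast_step(n)


def profile_report(n=200, top=5):
    # Profile one run of the pipeline and return the ranked report as text
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline(n)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_report()
print(report)
```

The point is only that the measurement comes first: the report tells you which function to rewrite, and everything else can stay ugly for now.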

If everything works, then you can focus on overall refactoring and optimization and turning your Frankenstein into a supermodel (if you have resources/money to do that with--good luck), but you typically need a functional proof of concept to even have a chance of securing funding for that step.

If there's no money in that effort moving forward and you decide "well, maybe someone can use this" so let's release it, that typically has to get approval through a technology transfer office who are always up in arms about protecting potential IP so it ends up on some disks rotting away never to be seen or used again.

If you're permitted to release the IP, you begin wondering how the development quality will reflect on you and your group, especially for those who see it and have no context of the constraints you worked with to produce that miracle functional Frankenstein. It's ugly as sin, but it fulfilled the goal to deliver the core research results and did so as quickly as possible and cheaply as possible.


> https://github.com/achael/eht-imaging/graphs/contributors

Huh, I wonder how accurate this is. All the code is beyond me in any case, I'm in no position to judge the relative value of any of it.


Number of lines is not an indicator of contribution. The reason I posted this repo was so that folks curious about the programming behind this science could have their fill. I am just happy to see an academic project using good version control practices released under a proper license.


I'm not sure what the idea behind posting this is? In a large scale project rarely does someone high up the command chain do any of the "grunt" work (i.e. programming). Also, by training she appears to be dealing more with theory than the actual implementation aspect of this all.


ok, so just so everyone is clear, the point between parent and grandparent is Andrew Chael seems to have written a hell of a lot more code than Katie Bouman, for a lot longer

* achael 566 commits 850,275 ++ 131,044 --

* klbouman 90 commits 2,410 ++ 1,265 --

However, at least at the level of reading the commit messages, Katie's are pretty math heavy:

"fixed bug in the fake briggs weighting"

"starting to fix chirp problems with polrep"

"made it possible to do a min uv cut on closure phase when adding it a..."

While Andrew's lean frequently toward code maintenance:

"updated some docstrings in imager_utils"

"moved imgsum to plotting.summary_plots"

"modified README"

That said, Andrew and others seem to have pretty good insight too.


To add some more context... of the 850k lines, 500k lines are mostly models and machine generated code. Andrew is definitely smart (smarter than an average HN user) and his code is very important but I have never seen so much display of misogyny and sexism against a woman scientist. She never took any credit and clearly said that this was a team effort. Some of the top posts on reddit are trying to mischaracterize the work that Dr. Katie has done and the comments are so vile.


One of the few redeeming features of the HN conventions of civility and seriousness is that we don't have Reddit's problems and we don't need to talk about Reddit.

But what could possibly qualify you to say that "Andrew is definitely smart (smarter than an average HN user) and his code is very important"?


[flagged]


I cannot believe you have written the last sentence without any sense of irony. We are just discussing how stupid the LOC metric is and how most of the LOC Andrew wrote were machine generated. People were also saying that her commits were more math heavy. Anyway if you actually believe that there is a secret feminist agenda, I don’t think I can say anything that will change your mind.


Don’t think GP supports that statement. That post just explains why the earlier poster was miffed about the commit log.

It’s disappointing to see this celebration of an amazing technical achievement devolve into a contentious meta-analysis inspired by the USA’s broken politics.


There are, just factually speaking, a lot of headlines reading something like "this is the woman who wrote the code..."


You've misunderstood me slightly. You're absolutely right about the loc metric -- seems reasonable to me that she'd design the/some algos and let others do boiler plate and implementation of [other] algo's. That's why I emphasised "appears", as in "someone naively approaches the subject, sees that and thinks 'her contribution was really small'".

I don't think there is a "secret feminist agenda" as such, but news outlets do over-egg the situation to try and create "women heroes of science". The way it's done appears to be sexist in an attempt at, so-called, positive discrimination; rather than being equalist.

You seem to consider my analysis to be abjectly errant, I would appreciate hearing why?


> so much display of misogyny and sexism against a woman scientist

There are no woman scientists, science has no gender.

The article is sexist, not the people who are curious what Dr. Bouman actually did to be honored with a mention in a BBC article.


It's fallacious to presume that, because gender, sex, race, etc. shouldn't impact people's opportunities, we should treat it as though it doesn't impact them.

This is a good article on the concept: https://everydayfeminism.com/2013/09/dont-see-race/

[Edit:] Or this, as a complementary one: https://www.mcsweeneys.net/articles/i-dont-see-race


We don’t exist in a purely meritocratic and egalitarian society. Maybe you have never been told that you are not good enough for some work but growing up in deeply paternalistic society, I constantly heard “women are too stupid for hard sciences and they should just stick to kitchens”. If celebrating her achievements in this way changes minds of a few people and inspires a few girls to believe in themselves, I think it is worth the “biased” coverage that she is getting for her work.


> I grew up hearing women are too stupid for hard sciences

Bad for you.

Thank God I grew up in a society where every person, regardless of gender and age, can do hard science.


Nobody is saying that only some societies have both genders doing hard sciences. It’s the matter of opportunity.


People don’t surrender their outside identity when they become a scientist.


People don't surrender their outside political views, their lineage, sexual orientation and other background either. See how stupid the article would sound if titled: "Jane Doe: The divorced homosexual black democratic woman with three children behind the first black hole image". Science matters.


Since when was software development about who wrote the most code or did the most commits?

Do people like Linus deserve less credit now that he isn't the leader on the commit scoreboard?


Linus wrote the most code to start the project though, that doesn't seem a good example.


True, but that need not be of high importance when assigning credit. Especially in projects that involve a lot of implementation, the high level algorithms and techniques are developed by someone who might be far removed from actually writing code, and that aspect is outsourced to someone much more competent at programming. A lot of PhD students / professors do not have the talent or have not developed the thought process to write complex code, simply because their focus has been on other things.


Exactly. One project I worked on a while back involved me taking a big pile of very clever and complicated Matlab code and rewriting it in python in a way that made it easy to use from other projects and able to read a couple of additional input file formats. If you just look at the commits on that project it would look like I was responsible for 90+% of the entire thing, while in truth I was basically just doing transcribing and cleaning and had basically nothing to do with any of the difficult parts.


She could have written 0 lines and be "the woman behind the picture".

Writing code is easy; figuring out complex algorithms is something very different, and does not require coding knowledge.


https://github.com/achael/eht-imaging/commit/886b07b8a00d142...

one commit with 524,306 additions. adding a model.
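Quick arithmetic on what that one commit does to the totals, using only the numbers quoted in this thread. I'm assuming the model commit is counted in achael's additions on the contributors graph; the variable names are just mine.

```python
# Figures quoted in this thread from the GitHub contributors graph
achael_additions = 850_275
klbouman_additions = 2_410
model_commit_additions = 524_306  # the single model-adding commit linked above

# Assumption: the model commit is included in achael's total
achael_without_model = achael_additions - model_commit_additions
model_share = model_commit_additions / achael_additions

print(f"achael additions excluding the model commit: {achael_without_model:,}")
print(f"share of achael's additions from that one commit: {model_share:.1%}")
```

So 325,969 additions remain once that single commit is set aside, i.e. well over half of the headline number is one data drop. The raw additions column says very little about who did what.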


This is deeply offensive. I'm the owner and founder of my company and I haven't written any meaningful code for our core products in 10 years. Our github repo has barely a scratch from me in it. Does this make my work, my hard long hours in managing my team and designing the product, worthless?


It's offensive and frustrating. I don't understand why people have to try and pick holes in this kind of celebration like there is some conspiracy to promote Katie at the cost of others.

If I read it right, she mentioned and praised her team as well.


Honestly I'd take satisfaction in what I had built. As a leader I'd greatly enjoy giving credit to the team who committed to my vision and made it a reality.



