Autocorrect errors in Excel still creating genomics headache
tcskeptic 2021-08-17 16:43:29 +0000 UTC [ - ]
UnFleshedOne 2021-08-17 17:52:26 +0000 UTC [ - ]
KMnO4 2021-08-18 02:48:14 +0000 UTC [ - ]
(Maybe genomics isn’t as bad as proteomics)
guitarbill 2021-08-17 16:52:40 +0000 UTC [ - ]
(You could argue that supplementary data should be in an open format, so you could automate this checking. I'm not against this, either.)
nnmg 2021-08-17 19:33:48 +0000 UTC [ - ]
Scientific editors do nothing for data validation. There is no accountability, even after retractions.
Scientific journal editors are glorified gatekeepers for "high impact" work (read: flashy), and then use free reviewer labor to cover themselves so they can call it 'peer reviewed'. In the rare cases when journals do require supporting data, they explicitly ask for excel spreadsheets :(
breakfastduck 2021-08-17 17:39:44 +0000 UTC [ - ]
Honestly, these kinds of things really wind me up.
ironmagma 2021-08-17 17:41:55 +0000 UTC [ - ]
breakfastduck 2021-08-17 17:48:53 +0000 UTC [ - ]
You KNOW the problem exists, yet you STILL choose to use the software, THEN expect it to change its behaviors because it's inconvenient for your very specific use case. Microsoft aren't going to fix this behavior because there are likely millions if not billions of spreadsheets that rely on it.
That is not a software problem.
ironmagma 2021-08-17 17:56:59 +0000 UTC [ - ]
bserge 2021-08-17 20:41:03 +0000 UTC [ - ]
breakfastduck 2021-08-17 19:03:27 +0000 UTC [ - ]
"I didn't install it, I only chose to use it for something it isnt suited for instead of looking for alternatives, not my fault"
ironmagma 2021-08-17 19:26:38 +0000 UTC [ - ]
breakfastduck 2021-08-17 20:32:01 +0000 UTC [ - ]
But one specific use case for genetics means it's totally unsuitable & a big problem with the product. Got it.
setr 2021-08-18 06:47:34 +0000 UTC [ - ]
Excel documents almost always have issues, independent of genomics[0]. I can probably just skim through my emails and find a bunch of problem spreadsheets all over.
Fundamentally, the problem is threefold.
1. Excel tries to be smart, and this works in the easy case and breaks down in the edge cases. The edge cases also tend to appear with larger data entry as various one-offs, so it’s particularly insidious.
2. Excel is difficult to validate (formulas are hidden, naming things properly is a bit of a hack, no real typechecking, data constraints can be set but it’s hard to check what’s set to what across all cells, etc)
3. The typical user / target market isn’t trained to use what facilities excel does happen to offer.
Ultimately, the problem is that excel doesn’t “scale”, and by default you get the wrong thing.
And even if you know how to set things up properly, it’s near impossible to be sure it stays proper.
[0] https://www.marketwatch.com/story/88-of-spreadsheets-have-er...
quantified 2021-08-17 21:20:49 +0000 UTC [ - ]
AmericanChopper 2021-08-17 20:47:21 +0000 UTC [ - ]
It is however most suited to accounting and logistics tasks. For any more novel use case, you’re going to have to put some thought into how you use it, or use a more suitable alternative. Even for the tasks that it’s very well suited to, there will always be more sophisticated alternatives. But I’d consider the way it lowers the technical barriers to entry an overwhelming positive, even if it doesn’t completely absolve the users from learning about how their data works. I have the same frustrations with developers who have been trained by ORMs to have no idea how RDBMSes work. But I consider it their fault for not learning their trade properly, rather than the fault of the people who make nice tools to help them.
Closi 2021-08-17 17:52:00 +0000 UTC [ - ]
Further, the software would be fine if you use it properly for this sort of use case (i.e. bring the data in via PowerQuery which has enforced types).
BugsJustFindMe 2021-08-17 20:44:33 +0000 UTC [ - ]
IAmEveryone 2021-08-17 20:22:40 +0000 UTC [ - ]
People choose their tools for some reason(s), and there is no law of man or nature that says that using a common tool for some esoteric purpose is somehow wrong, just because you consider it too easy.
breakfastduck 2021-08-17 23:53:24 +0000 UTC [ - ]
psychometry 2021-08-17 20:08:11 +0000 UTC [ - ]
What makes matters worse is that scientists, who are already not great coders, have to work with even less technical people (i.e. the physicians). These collaborative processes will inevitably involve software like Excel. Need your M.D. co-collaborator to annotate a gene list or whatever? Your CSV's getting re-saved in Excel whether you like it or not.
tssva 2021-08-17 21:38:21 +0000 UTC [ - ]
The issue is choosing the incorrect tool or, if you have no choice but to use a spreadsheet, failing to learn how your tools work.
cyanydeez 2021-08-18 00:22:02 +0000 UTC [ - ]
people like spreadsheets because it provides very immediate feedback
delosrogers 2021-08-17 16:45:01 +0000 UTC [ - ]
OptionX 2021-08-17 16:24:54 +0000 UTC [ - ]
an_d_rew 2021-08-17 16:28:16 +0000 UTC [ - ]
Furthermore, Microsoft office is installed everywhere everywhere everywhere… like it leave it love it or hate it, it doesn’t matter… it’s true (at least for huge swaths of the demographic).
So the first thing most people learn is excel and the last thing most people use is excel. It’s a database, it’s a spreadsheet, you can even use it as a word processor if you want to. And some have.
I’m not saying you could and I’m not saying you should, I’m just answering as to “why“.
OptionX 2021-08-17 16:49:31 +0000 UTC [ - ]
I had my education on an engineering-focused campus, so that may skew it, but even non-IT-related areas had programming courses to, at least, learn the basics.
Do newer researchers still show reluctance to move from Excel, or is it an old guard kind of deal?
jasode 2021-08-17 17:23:31 +0000 UTC [ - ]
The feature of Excel that's underestimated/overlooked by skilled programmers of Python/R is that spreadsheets immediately expose an editable visual spatial canvas datagrid GUI.
I have decades of programming C/C++/C#/Python/etc and yet I create new spreadsheets and use them every day.
Spreadsheets are much faster than wiring up a datagrid in C# or C++ Qt gui widget editor or any other "true" programming language. Spreadsheets are also faster than using Python package Pandas to import a csv into a dataframe and view it in a Jupyter notebook. And last time I checked, displaying an output grid of cells in Jupyter is read-only and not 2-way editable like Excel.
And yes, the complaint is that "MS Excel calculations are not auditable, repeatable, version controlled, etc". All true, but it still doesn't change the fact that Python doesn't have an instant datagrid GUI. UI affordances also matter in viewing science data sets as well as financial budgets.
OptionX 2021-08-18 00:36:40 +0000 UTC [ - ]
And I was wondering if people in the non-engineering but scientific fields, like the ones discussed in the post, are exposed to the alternatives during their formative years.
If one chooses to use Excel because it's the best tool, great, but if it's because it's the perceived only option, that's probably counterproductive.
int_19h 2021-08-17 20:45:32 +0000 UTC [ - ]
fartcannon 2021-08-17 16:40:17 +0000 UTC [ - ]
Use the right tool.
qayxc 2021-08-17 16:51:19 +0000 UTC [ - ]
Step one should be learn your tool, no matter the tool.
There's several ways to avoid the issue in Excel. If someone working with Excel isn't able to learn that (heck, a simple template would suffice), I have no hope the same demographic would have any success with R or Python.
da_chicken 2021-08-17 17:57:33 +0000 UTC [ - ]
No, not really. There are several ways to help reduce the issue, but none of them eliminate what it does.
Example that comes to mind is the data file for College Board's SAT test. The data formats for student reports for schools come in three formats: PDF (one page per student), CSV, and fixed-width. That is the comprehensive list of your options. College Board doesn't care about you as a customer. They're too big. Any request you submit will be black holed.
Some of the columns in the file indicate a range, usually in the format "X-Y". Excel will try to coerce that into a date, if it's valid.
Other columns indicate a ratio, expressed as "X/Y". Excel will coerce that into a date, if it's valid.
Other columns indicate an ID number, expressed as a large, fixed-digit number, zero-padded. Excel will coerce that into an integer, or, if it's too long, into scientific notation discarding digits.
It doesn't matter how you format the CSV. Excel will do the above.
Here's an example CSV:
ID,Range,Ratio
12345678901234567890,8-9,"7/15"
"00000000000000012345","9-10",21/35
I open that with Excel and immediately save it as a CSV. I look at the file in a text editor and I see:
ID,Range,Ratio
1.23457E+19,9-Aug,15-Jul
12345,10-Sep,21/35
Do you have any idea how fun it is to explain to teachers and school administrators what happened here?
The correct way to work with this data file is: Do not, under any circumstances, open it with Excel if you expect to use it for anything else.
The problem is, there are very few applications that work well with CSV files. I know of CsvEd and Delimit. There are several text editors with a CSV column mode that makes the file look like a table (with varying degrees of success). All of these vary between "godawful" and "a complete nightmare" in terms of performance and usability compared to Excel.
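For what it's worth, here is a minimal sketch of the "never let Excel touch it" route using only Python's standard `csv` module, which hands back every field as a plain string. The sample text is the example CSV from this comment; the helper name `round_trip` is made up for illustration, and this is one possible approach rather than a recommended tool.

```python
import csv
import io

# The example CSV from this comment, as raw text.
SAMPLE = (
    "ID,Range,Ratio\n"
    '12345678901234567890,8-9,"7/15"\n'
    '"00000000000000012345","9-10",21/35\n'
)

def round_trip(text):
    """Read CSV text and write it back without interpreting any field."""
    rows = list(csv.reader(io.StringIO(text)))  # every field stays a str
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return rows, out.getvalue()

rows, rewritten = round_trip(SAMPLE)
print(rows[1])  # long ID, range and ratio are still the original strings
print(rows[2])  # leading zeros survive

# For contrast: pushing the first ID through a float, roughly what a
# numeric coercion does, throws away the low-order digits.
lossy = format(float(rows[1][0]), ".5E")
print(lossy)
```

The only change in the rewritten file is that quotes which were not strictly needed get dropped; the values themselves are untouched.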
dragonwriter 2021-08-17 18:39:51 +0000 UTC [ - ]
Only if you open the file, so that Excel assumes defaults for formatting. If you import the CSV (Data -> From Text in the current UI) you can specify the format with the Import Text Wizard as described on the College Board instructions for using the file. Except...
Unfortunately, the College Board has outdated instructions on their website; Excel used to offer the Import Text Wizard on opening a csv or text file, rather than making default assumptions, and the College Board instructions page, while it does tell you to use that Import Text Wizard and provide all the details of what to plug into that wizard, tells you to open the file to get it, which bypasses the wizard.
Things like this are why clerical and other low-level positions involving Excel test candidates on proficiency with specific versions of Excel. Higher-level workers are expected to be able to figure out these kinds of changes themselves, though (or consume lower-level staff time, if they are in management).
da_chicken 2021-08-17 19:56:14 +0000 UTC [ - ]
We have now changed a 1 second automated process that anybody could do, into a 2 minute manual process that requires knowledge of data types. We've gone from a non-technical task to a highly technical task. To open a file without completely munging the data.
It also will sometimes do weird things like exclusive lock the file on disk until you close every open window of Excel because it creates data connections to the file.
dragonwriter 2021-08-17 20:03:19 +0000 UTC [ - ]
You don't, because while the wizard doesn't default to headers, the first screen of the wizard has a checkbox for it, so the option is there without Power Query.
da_chicken 2021-08-17 23:33:15 +0000 UTC [ - ]
There is no checkbox in the wizard.
occamrazor 2021-08-17 18:27:47 +0000 UTC [ - ]
pletnes 2021-08-17 17:19:33 +0000 UTC [ - ]
qayxc 2021-08-17 19:03:55 +0000 UTC [ - ]
If not then your personal experience is irrelevant in this context.
CobaltFire 2021-08-17 18:00:18 +0000 UTC [ - ]
So I get to cobble together some really ugly spreadsheets in excel when I’d MUCH rather use something more appropriate. Sometimes it’s not the person, it’s the org. In this case, government.
jpeloquin 2021-08-17 16:53:54 +0000 UTC [ - ]
(1) Graduate students (who are the bulk of the scientific labor force) usually have some training in python, R, or Matlab, but are seldom practiced in applying it to real-world work. So if a lab standardizes on python, R, etc. the senior people have to do a lot of extra work to support the programming, rather than doing work with intrinsic scientific value. It's easier to teach the Excel quirks than to teach effective programming practices. Excel quirks are concrete, and "good practices" in programming are vague and situational.
(2) Python & R have their own pitfalls. It's easy to apply the wrong filters to a data set and compare subsets that aren't what you think you're comparing. Although when they go wrong, they go very wrong, and this is easier for a supervisor to notice.
(3) Excel has rich text formatting, graph embedding, and data types, so it's very useful as a human-readable data interchange & summary format even if you never do computation in it.
IMO neither programming languages nor Excel are a great fit for data analysis. Something purpose-made, like JMP or even GraphPad, is probably the better choice in most situations. Programming gets you automation but at the cost of high complexity. Since you still need to look at and think about the data there's a limit to how much time automation can save (Amdahl's law).
cm2187 2021-08-17 16:45:31 +0000 UTC [ - ]
Try to tell someone who just needs to get a task done that he needs to spend the next 3 months learning programming from scratch vs using excel.
breakfastduck 2021-08-17 17:37:03 +0000 UTC [ - ]
"A bad workman always blames his tools" has never been truer than this situation.
gerdesj 2021-08-17 16:30:23 +0000 UTC [ - ]
qayxc 2021-08-17 16:52:26 +0000 UTC [ - ]
acomjean 2021-08-18 04:43:54 +0000 UTC [ - ]
You get data from a lot of sources, and people (like me) just sometimes forget to check the spreadsheet. Often it’s a CSV file, so it’s not clear it’s been generated by Excel. It’s also only a problem for gene symbols (not Entrez gene IDs or FlyBase gene numbers). Also the datasets are big, so if there are 15 genes in error over the whole dataset it can be hard to spot.
Honestly genes are a constantly moving target and are kinda a pain to write software for. They get new names, split and merge over time…
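A rough sketch of the kind of check that is easy to forget: scan a gene-symbol column for values that look like short day-month dates. It assumes the mangled symbols appear as text like "4-Sep", the same form seen in the saved-CSV example elsewhere in this thread; the function name and the sample column are made up.

```python
import re

# Assumption: mangled symbols show up as day-month text such as "4-Sep",
# the short form Excel writes when a date cell is saved to CSV.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)

def suspicious_symbols(symbols):
    """Return (index, value) pairs for entries that look like dates."""
    return [(i, s) for i, s in enumerate(symbols) if DATE_LIKE.match(s.strip())]

genes = ["TP53", "4-Sep", "BRCA1", "1-Mar", "SEPT4"]  # made-up column
flagged = suspicious_symbols(genes)
print(flagged)
```

This only flags candidates for a human to look at; it cannot tell you which gene the original symbol was.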
qayxc 2021-08-18 09:20:47 +0000 UTC [ - ]
And why hasn't there been a collaboration between professionals to fix this on an organisational level (like agreeing on a spreadsheet template or sane method for data exchange)?
Is it really every lab for themselves with MS Excel and CSV of all things being the lowest common denominator?
acomjean 2021-08-19 12:26:19 +0000 UTC [ - ]
There are groups dedicated to organizing/ annotating genes. They’re quite organized and have highly structured databases.
http://gmod.org/wiki/Introduction_to_Chado
A lot of the model organism species have gotten together to make things more uniform.
https://www.alliancegenome.org/
Also most people in this field are using free software.
gpapilion 2021-08-17 16:40:15 +0000 UTC [ - ]
Excel, and most spreadsheets for that matter, avoid some of the basic control structures beginners find challenging. Many functions are basically map and reduce functions so you typically are getting the same number of cells back, or one. Often maps are done visually in the spreadsheet with a formula being applied to every cell in a column.
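To make the map/reduce comparison concrete, here is a small sketch in Python. The column contents are invented, and the spreadsheet formulas named in the comments are only analogies.

```python
from functools import reduce

prices = [2.50, 4.00, 1.25]   # pretend this is column A
quantities = [4, 1, 8]        # pretend this is column B

# "Fill a formula down column C": one output cell per input row (a map).
totals = [p * q for p, q in zip(prices, quantities)]

# "=SUM(C:C)": many cells in, one cell out (a reduce).
grand_total = reduce(lambda acc, x: acc + x, totals, 0.0)

print(totals)
print(grand_total)
```

No explicit loop counters or branching are needed in either step, which is roughly what the spreadsheet gives a beginner for free.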
delosrogers 2021-08-17 16:40:33 +0000 UTC [ - ]
enaaem 2021-08-17 16:31:24 +0000 UTC [ - ]
taeric 2021-08-17 18:26:55 +0000 UTC [ - ]
It is a terrible format that has very few redeeming qualities. It happens to also be the most widely used one.
happytoexplain 2021-08-17 16:36:05 +0000 UTC [ - ]
minikites 2021-08-17 16:28:01 +0000 UTC [ - ]
blackbear_ 2021-08-17 16:38:47 +0000 UTC [ - ]
So yes, maybe these "relatively unskilled users" will struggle for a few weeks/months, but it will be a net positive change as many of them will see immense benefit after some tinkering.
instagraham 2021-08-17 16:31:14 +0000 UTC [ - ]
Errors made from using new, presumably neutral, software would be the kind you are supposed to catch as a researcher.
There is obviously scope for error in everything but Excel's user-hostile methods are not justified by this.
an_d_rew 2021-08-17 16:32:14 +0000 UTC [ - ]
Excel is everywhere.
People are familiar with it. The same people are often not familiar with, and have never used, and may even be confused by “notepad.exe”.
ltbarcly3 2021-08-17 16:39:51 +0000 UTC [ - ]
codetrotter 2021-08-17 16:58:32 +0000 UTC [ - ]
A reasonable alternative would be for example statically typing each column or row in the sheet.
So in one sheet column A contains only floats, col B contains only text, col C contains only dates etc. And in another sheet col A may be date type, col B date type also, col C and D text, and col E monetary amounts.
But this would come at the cost of not being able to mix types in column, and a lot of people want to be able to mix.
And that’s why we’re stuck
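As a sketch of what per-column static typing could look like outside a spreadsheet, assuming a hypothetical schema where each column has exactly one parser (the `SCHEMA` mapping and `check_row` helper are made up for illustration):

```python
from datetime import date

# Hypothetical per-column schema: each column gets exactly one parser.
SCHEMA = {
    "A": float,               # numbers only
    "B": str,                 # text only, never reinterpreted
    "C": date.fromisoformat,  # dates only, in YYYY-MM-DD form
}

def check_row(row, schema=SCHEMA):
    """Parse each cell with its column's type; collect cells that fail."""
    parsed, errors = {}, []
    for col, parse in schema.items():
        try:
            parsed[col] = parse(row[col])
        except (ValueError, KeyError):
            errors.append((col, row.get(col)))
    return parsed, errors

ok_parsed, ok_errors = check_row({"A": "3.14", "B": "SEPT4", "C": "2021-08-17"})
bad_parsed, bad_errors = check_row({"A": "3.14", "B": "SEPT4", "C": "SEPT4"})

print(ok_parsed["B"])  # stays text because column B is declared as text
print(bad_errors)      # the non-date in the date column is rejected, not guessed at
```

The trade-off is the one described above: a cell that does not fit its column's type becomes an error instead of a silent conversion, so mixed-type columns are not allowed.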
wtallis 2021-08-17 17:16:34 +0000 UTC [ - ]
I feel like a lot of the desire for this comes from Excel's desire to fill the screen with a single infinite spreadsheet. Apple's Numbers and some other programs make it more natural to have multiple separate tables on screen, each of which can have their own header rows and columns and separate type and format rules for those columns/rows. Excel more or less forces people to emulate this capability by just using a separate region of the same table, which gets in the way of applying consistent formatting and typing rules to whole columns or rows.
eitland 2021-08-17 17:14:29 +0000 UTC [ - ]
automatically "correcting" something that was correct based on a dumb hunch
It is really simple to remove the most embarrassing autocomplete errors on phones:
- just disable autocorrect,
- or preferably use another keyboard that just proposes fixes but leaves it to you to select the suggestion (I use SwiftKey and it does this, others probably exist too).
The same is not possible in Excel as far as I am aware:
You cannot turn off autoformatting.
The whole thing could be easily (on a ux level at least ;-) solved by introducing a single checkbox: "stop embarrassing me" with help text that says "turn off autoformatting".
Checking that box would leave values like you typed or pasted them in.
No need to select on a column basis.
Edit: It is also almost at the level of modern search engines that replace my very specific queries with generic queries about something unrelated or vaguely related.
Edit 2: it is already more than a year ago I guess since iPhones were caught changing the names of correctly spelled medications to the name of another unrelated medication. Sooner or later this incessant drive to be smarter than the user is going to cost lives I'm afraid.
When it comes to search it is already costing me many minutes lost a day, and that is just first order effects, before we get into all the things that don't get done because we give up on finding it.
jjk166 2021-08-17 17:34:25 +0000 UTC [ - ]
Select format cells
Select "Text"
This will disable excel's autoformatting. Given the rarity that someone types "SEPT4" into excel and isn't typing a date, putting the option one level deep in a menu seems more appropriate than a top level button.
eitland 2021-08-17 19:22:44 +0000 UTC [ - ]
Given how rarely someone who has pasted 100 000 rows into a sheet wants them to be autoborked, disabling autoborking, at least for pasted data, seems appropriate enough to me ;-)
jjk166 2021-08-17 20:00:15 +0000 UTC [ - ]
eitland 2021-08-18 06:59:07 +0000 UTC [ - ]
Seriously? Also for pasted series? Also when the rest of the column doesn't match?
jjk166 2021-08-18 14:59:18 +0000 UTC [ - ]
The I-want-to-put-data-in-my-table-which-closely-resembles-an-incredibly-common-format-but-is-problematic-if-mistaken-for-that-format-oh-and-I-can-not-be-bothered-to-google-how-to-turn-off-autoformat niche is not very large. I don't work for microsoft and I don't have copies of user data, but I would be willing to bet my entire life savings that over 95% of the time someone pastes a series which appears to have mismatched data types in the columns, the series actually does have mismatched data types in the columns.
Closi 2021-08-17 17:53:56 +0000 UTC [ - ]
You can actually statically type the columns now for imported data via the PowerQuery editor (which is built into Excel), although not a lot of people know how to use it.
zozbot234 2021-08-17 17:11:03 +0000 UTC [ - ]
Excel actually supports this out of the box. But it's one more option to set, and people get lazy.
dspillett 2021-08-17 17:23:33 +0000 UTC [ - ]
As well as being easy to be lazy and not use the options, it is easy to accidentally undo them also.
Tagbert 2021-08-18 19:40:04 +0000 UTC [ - ]
This is definitely a problem if you are just opening a raw CSV file and editing it as the default cell type does allow autoconversion.
djbebs 2021-08-19 09:00:49 +0000 UTC [ - ]
If you know how to get it to never do that sort of thing, please let me know, because this is causing major issues in production software...
gpapilion 2021-08-17 16:31:05 +0000 UTC [ - ]
I think it has interesting implications for self-driving, since as far as I can tell we're at an earlier but similar stage in its development; works for simple situations, likely to drive you into a wall for no reason. We build so much trust in these systems, we stop paying attention to what they are doing, and then pay the price when it behaves in an unexpected way to the user.
soco 2021-08-17 20:18:19 +0000 UTC [ - ]
PedroBatista 2021-08-17 17:27:36 +0000 UTC [ - ]
But this time I feel we should give the business to the geneticists. The Excel "problem" is known and has been known for a long time, finance guys and policy makers were maybe the first big ones to "discover" this for slightly different reasons.
Then WHY on Earth do these top professionals, elites, "creme de la creme" people continue on this path? And it's not like they love Excel, quite the contrary. Yes, the main reason is they are not programmers so they need to find a way to hack something to solve their problem. The problem with their "problem" solving is they are being slobs in work that requires the EXACT opposite.
"Oh now we need to hire a programmer too?" - Yes, you do.
"But we don't have the budget for it" - Yes, you do. You see, money is never enough for anything, it's a question of priorities and when you blow millions of dollars with fancy experiments to ruin your career because of Excel, maybe that Python/R/Julia guy was a bargain after all.
I know some of these people, and under the aura of lab coats and distinguished professors it's just some guy/gal trying to crank that paper ASAP to keep the hamster wheel spinning AND they are being sloppy about the data and process AND most of them know it.
Sorry, but no excuses.
Closi 2021-08-17 17:50:30 +0000 UTC [ - ]
I think we are comparing something that is knocked out in an hour in excel to someone spending 4-10x of the time doing it in Python / Julia, but I don't think that's a like-for-like comparison.
breakfastduck 2021-08-17 17:35:47 +0000 UTC [ - ]
No sympathy from me. Embarrassing. I certainly wouldn't get away with such negligence at my workplace.
koksik202 2021-08-18 14:11:21 +0000 UTC [ - ]
planet-and-halo 2021-08-17 16:37:18 +0000 UTC [ - ]
mceoin 2021-08-17 20:13:17 +0000 UTC [ - ]
- It's definitely possible to build a custom spreadsheet product with a small team, even one targeted at such a "niche" user group. So it's an idea worth testing.
- Product can be "backwards compatible" - you can export to xlsx and import from xlsx - so you don't have to change behaviour of the entire industry on day one to get this to work, only a single genomics researcher or lab.
- Pricing, unit economics, etc are unknown to me (I have no background in genomics or scientific research). But presumably you could leverage standard SaaS models and build a viable model up from there using a few case studies. There's definitely schleppy behaviour going on here that can be solved.
- Even a "lifestyle business" has significant upside beyond the financial: improving genomics research improves genomic research!
- The product advantage over time presumably involves building more custom tooling into the genomics / data ecosystem. "Not creating typos" is just the beachhead.
I've never actually interviewed genomics people about their need here, but if anyone knows people with this problem I would love to talk to them: @mceoin on twitter. (DMs open)
cm2187 2021-08-17 16:42:23 +0000 UTC [ - ]
dspillett 2021-08-17 17:20:18 +0000 UTC [ - ]
Though convincing people to switch to a new app for that reason, even if it were free and Free, would still be an uphill struggle.
> just formatting your cells as text before entering the values
It is a little more than that. I work with finance people, and there a lot of data is manipulated in Excel but passed around as CSV files for compatibility elsewhere. This causes no end of problems because fixes by setting cell properties are obviously lost in transcription, and date errors creep in as things are moved back & forth between people in the US and those in locales that do dates properly.
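A tiny illustration of the date problem, using a made-up cell value: the same text parses to two different days depending on which locale's convention the reader assumes.

```python
from datetime import datetime

raw = "03/04/2021"  # made-up cell value as it might appear in a CSV

as_us = datetime.strptime(raw, "%m/%d/%Y").date()   # month first
as_dmy = datetime.strptime(raw, "%d/%m/%Y").date()  # day first

print(as_us)
print(as_dmy)

# Writing dates as ISO 8601 text removes the ambiguity in the file itself.
print(as_dmy.isoformat())
```

Neither parse raises an error, which is why this kind of mistake tends to go unnoticed until the numbers stop adding up.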
masklinn 2021-08-17 17:30:22 +0000 UTC [ - ]
It's been going for more than half a decade now, and genomicists apparently would rather rename genes than stop using excel...
prionassembly 2021-08-17 17:16:22 +0000 UTC [ - ]
NullPrefix 2021-08-17 16:40:02 +0000 UTC [ - ]
That's a hard to implement feature. Hand waving and buzzword lingo will not be enough for people to believe it.
masklinn 2021-08-17 17:31:13 +0000 UTC [ - ]
It really isn't: just remove any data autodetection and you're done.
Closi 2021-08-17 17:28:22 +0000 UTC [ - ]
HPsquared 2021-08-17 17:40:38 +0000 UTC [ - ]
nnmg 2021-08-17 19:05:41 +0000 UTC [ - ]
And really, even if you know python or R, are you really going to fire up a jupyter notebook, load the data, and run pandas queries every time someone in lab meeting or after a talk asks you about this gene or that gene in your data?
I think the important question is why is date conversion a default? Would it really break backwards compatibility for MS Excel users if date conversions were explicit instead of automatic? Turning that off by default would fix a lot of this.
BugsJustFindMe 2021-08-17 20:36:46 +0000 UTC [ - ]
Sometimes, but the situation is in reality worse than that. Excel is also used as the gold standard database/storage/interchange format of record for random shit that clinical researchers have typed in by hand whether directly or transcribed from other notes, often when that data isn't actually fundamentally tabular in nature because people really like working in grids. Even when grids hurt more than help.
A big secret in genetic research is that the MDs, grad students, project managers, and coordinators running the research programs are often not super focused on what well-structured data looks like and don't know what things like "key-value store" or "nested tree-like structure" mean, and even if they did there aren't good GUI tools for entering them anyway, and it leads to countless errors that maybe (here I speculate) they just assume will wash out as noise.
> I think the important question is why is date conversion a default?
Yes, why any kind of conversion is ever the default is a real money question.
quantified 2021-08-17 21:18:35 +0000 UTC [ - ]
netizen-936824 2021-08-18 13:39:48 +0000 UTC [ - ]
gaze 2021-08-19 04:48:05 +0000 UTC [ - ]
SonicScrub 2021-08-17 20:20:24 +0000 UTC [ - ]
If I could choose the tools used by the whole process involving multiple different companies and departments, hey I would! It would be python all the way down. But I was but a cog in a massive organization.
dragonwriter 2021-08-17 20:25:29 +0000 UTC [ - ]
If you stay in spreadsheets these problems mostly don’t occur (that is, once data entry is squared away so that the initial spreadsheet has what you want it doesn't tend to get lost); it's when you move in and out of spreadsheets via text and take the path of least resistance [0] to do the transition that the problem occurs.
[0] and to be fair, there is a lot of resistance off that path.
SonicScrub 2021-08-17 20:40:21 +0000 UTC [ - ]
God I hated working in old-school engineering/manufacturing. "That's not how we do things" is the answer to everything. I
dragonwriter 2021-08-17 23:57:43 +0000 UTC [ - ]
dec0dedab0de 2021-08-17 20:09:32 +0000 UTC [ - ]
I don't do any scientific research, but I have been using jupyter as a replacement for excel since it was called the ipython notebook. I don't really use pandas all that often, I just find it easier to read and edit data in python. Though I first learned ipython added the notebook from a talk Wes McKinney gave about Pandas.