Complexity theory puts limits on performance of gradient descent
miketery 2021-08-18 00:15:57 +0000 UTC [ - ]
This seems counterintuitive. Why would this be true? I would have thought that it's almost impossible for a particle to end up in the same place as it started.
fshbbdssbbgdd 2021-08-18 00:26:32 +0000 UTC [ - ]
One example of a compact convex set is a disk (a filled-in circle). Imagine some continuous function that maps the disk to itself. An easy example is rotation. For a rotation, the center stays in the same place. Now imagine some more complicated function that might do rotation, stretching, spiral distortions, whatever you can imagine. But it has to be continuous, meaning you can't go off one edge and come back on the other side (among other things). Try to draw, or even just imagine, a transformation like this, and you'll find that there's always some point that stays in the same place.
For a 3D object like a glass of water it's hard to see the same thing intuitively, but Brouwer's theorem says that it holds (and it also holds for higher-dimensional sets).
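A minimal numerical illustration (the map F below is a toy example of my own, a rotation followed by shrinking and shifting, not anything from the article or the comment above): scan a grid over the unit disk and report the point that F moves the least.

    import numpy as np

    def F(p):
        # rotate by ~40 degrees, shrink by half, shift toward (0.3, 0);
        # every point of the unit disk lands back inside the unit disk
        c, s = np.cos(0.7), np.sin(0.7)
        rotated = np.array([c * p[0] - s * p[1], s * p[0] + c * p[1]])
        return 0.5 * rotated + np.array([0.3, 0.0])

    best, best_move = None, np.inf
    for x in np.linspace(-1.0, 1.0, 201):
        for y in np.linspace(-1.0, 1.0, 201):
            if x * x + y * y <= 1.0:
                p = np.array([x, y])
                move = np.linalg.norm(F(p) - p)
                if move < best_move:
                    best, best_move = p, move

    print(best, best_move)  # the grid point that F moves the least

The search only ever finds an approximately fixed point; Brouwer's theorem is what guarantees an exactly fixed one exists, and (as discussed below) it doesn't tell you how to find it.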
kevinventullo 2021-08-18 01:07:58 +0000 UTC [ - ]
enriquto 2021-08-18 10:46:20 +0000 UTC [ - ]
A glass is not convex, so this theorem does not apply. You'd need other fixed-point theorems for that. If the glass has a handle, the result is trivially false (consider rotating a doughnut: it does not have any fixed point).
antman 2021-08-18 08:25:52 +0000 UTC [ - ]
techwizrd 2021-08-18 00:32:44 +0000 UTC [ - ]
I think the 3-dimensional analogy (similar to the stirring of coffee and sugar Brouwer observed) is rather unintuitive. I think some of the proofs are quite beautiful, but I like this analogy from Wikipedia:
> Take an ordinary map of a country, and suppose that that map is laid out on a table inside that country. There will always be a "You are Here" point on the map which represents that same point in the country.
SquishyPanda23 2021-08-18 01:34:40 +0000 UTC [ - ]
As you mentioned, it's also not intuitive. It's by now well-digested, but it's a fairly surprising theorem, even to Brouwer.
Finally, it's not constructive. Generally you can't figure out which point is fixed. If I recall correctly, frustration with this fact fueled Brouwer's interest in intuitionism, which gives us a type of logic where some propositions are neither true nor false but essentially undecidable.
I'm not sure what you mean by a "particle" in your sentence. If you mean to go down to the quantum scale, then I'm sure all sorts of things go wrong in trying to apply the theorem. For one, even talking about the position of a particle at all is very different from talking about a point in Euclidean space.
If you don't mean to go down to the quantum scale, then lots of maps have fixed points. Rotations and reflections are two large classes of examples.
vlovich123 2021-08-18 02:05:15 +0000 UTC [ - ]
SquishyPanda23 2021-08-18 02:39:07 +0000 UTC [ - ]
Intuitionistic logic is a logic that is weaker than classical logic in the sense that there are some proofs in classical logic that don't work in intuitionistic logic. Specifically, the law of the excluded middle ("either A is true or not A is true") does not hold.
You can convince yourself that this is a reasonable type of logic by interpreting "A is true" to mean "I have a proof of A", and likewise "not A is true" to mean "I have a proof of not A". Then it's pretty reasonable that you might have neither a proof of A nor a proof of not A.
I haven't read Gödel's proof, but my understanding is that he uses intuitionistic logic in some of the key steps. Since facts in intuitionistic logic are also facts in classical logic, I believe the incompleteness theorem holds in both logics. But I'm not at all an expert here, so someone please correct me if I'm wrong.
"Intuitionism" also refers to a specific philosophy of mathematics. But a lot of people just use the word to refer to the logic that came out of that philosophical school.
Intuitionistic logic is also very common in type theory, so a lot of proof assistants use it.
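As a concrete taste of that, here is a small Lean 4 sketch of my own (not from the comment): the law of excluded middle P ∨ ¬P is not provable intuitionistically, but its double negation is, using only the constructive rules.

    -- provable constructively, with no classical axiom:
    theorem not_not_em (P : Prop) : ¬¬(P ∨ ¬P) :=
      fun h => h (Or.inr (fun hp => h (Or.inl hp)))

A proof assistant based on such a type theory accepts this term, but it will not let you conclude P ∨ ¬P itself unless you explicitly invoke a classical axiom (Classical.em in Lean).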
TeMPOraL 2021-08-18 08:43:53 +0000 UTC [ - ]
Between math and software development, I feel I've encountered this idea at least three times (+ rediscovered it independently), under different names - "3-value logic", "3-state booleans" - without ever hearing about "intuitionism". Thanks for mentioning it, and now I wonder how it all relates (i.e. I'm about to get stuck on Wikipedia).
> interpreting "A is true" to mean "I have a proof of A", and likewise "not A is true" means "I have a proof of not A".
Makes me think of an interesting progression I've also seen when studying maths and then (classical) AI algorithms. You start with boolean logic, then figure out you need to represent lack of knowledge/certainty and get 3-state logic. Then someone notices that there are degrees to being sure, and you get fuzzy logic[0], representing logical values as real numbers. And then someone notices that this smells awful lot like probability theory, and you get Bayesian probability as applied to reasoning, with all the extra concepts like using whole probability distributions as first-class logic values.
--
[0] - I'm not sure how much mathematical rigour there is to it; it's something I've learned when studying AI for videogames.
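For what it's worth, here is a tiny Python sketch of the fuzzy-logic step in that progression (using the common min/max operators; other t-norms exist, and the truth values below are made up):

    # truth values are reals in [0, 1] instead of booleans
    def fuzzy_and(a, b): return min(a, b)
    def fuzzy_or(a, b):  return max(a, b)
    def fuzzy_not(a):    return 1.0 - a

    ground_is_wet = 0.8   # "mostly true"
    it_is_raining = 0.3   # "somewhat true"
    print(fuzzy_and(ground_is_wet, it_is_raining))            # 0.3
    print(fuzzy_or(ground_is_wet, fuzzy_not(it_is_raining)))  # 0.8

The Bayesian step then swaps min/max for the rules of probability, as described above.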
SquishyPanda23 2021-08-18 10:24:30 +0000 UTC [ - ]
They are different logics in general, though. 3-valued logic is more like: A is True, False, or None/Unknown.
Another interpretation one can put on intuitionistic logic is game semantics. Here a statement is true if the verifier has a winning strategy to prove the proposition. It's false if the falsifier has a winning strategy. Then intuitionistic logic corresponds roughly to the idea that some games don't have a winning strategy for either player.
So here (and in the proofs example I gave above), the idea of "neither true nor false" is represented by the lack of an explicit rule for concluding truth or falsity in some cases.
In contrast, 3-valued logic gets at the idea of "neither true nor false" by introducing another symbol (e.g. "None" or "-1") together with new operators that operate on sentences with that symbol.
So we get one from classical logic by deleting something (namely the law of the excluded middle) and the other by adding something.
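A quick Python sketch of the "add a symbol plus new operators" approach (this is Kleene's strong three-valued logic, with None playing the Unknown role; my own illustration, not from the parent comment):

    def and3(a, b):
        if a is False or b is False:
            return False      # a definite False wins even against Unknown
        if a is None or b is None:
            return None       # otherwise Unknown propagates
        return True

    def or3(a, b):
        if a is True or b is True:
            return True       # a definite True wins even against Unknown
        if a is None or b is None:
            return None
        return False

    print(and3(True, None))   # None (unknown)
    print(or3(True, None))    # True

Here "neither true nor false" is an explicit third value that the operators handle, in contrast to the intuitionistic approach above, which never introduces a third value at all.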
skulk 2021-08-18 02:18:40 +0000 UTC [ - ]
im3w1l 2021-08-18 02:07:14 +0000 UTC [ - ]
So just thinking out loud here, but in the 1d case... the set will be an interval [a,b]. Let g(x) = f(x) - x.
Then g(a) >= 0 and g(b) <= 0. Now we can use the intermediate value theorem and binary search to find (get arbitrarily close to) a point with g(x) = 0 (i.e. a fixed point of f).
Not possible to do such searches in higher dimensions?
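A sketch of that 1d bisection idea in Python (assuming f really does map [a,b] into itself, so g changes sign at the endpoints):

    import math

    def approx_fixed_point(f, a, b, tol=1e-12):
        # g(x) = f(x) - x satisfies g(a) >= 0 and g(b) <= 0, so bisect on its sign
        g = lambda x: f(x) - x
        while b - a > tol:
            m = (a + b) / 2
            if g(m) >= 0:
                a = m
            else:
                b = m
        return (a + b) / 2

    # cos maps [0, 1] into itself; its fixed point is the Dottie number
    print(approx_fixed_point(math.cos, 0.0, 1.0))  # ~0.7390851

In one dimension the sign change gives the search its direction; that is the part with no obvious analogue in higher dimensions.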
SquishyPanda23 2021-08-18 10:12:39 +0000 UTC [ - ]
This is useful for practical applications, but it's not the same as knowing the fixed point for mathematical applications.
E.g. you may want to prove that some function preserves the irrational number pi. But you may not be able to prove it, only show that computations make your conjecture more and more likely.
ChrisKnott 2021-08-18 06:56:34 +0000 UTC [ - ]
I don't think you necessarily have g(a) >= 0 and g(b) <= 0. Finding the starting points for binary search might be essentially as hard as finding the fixed point.
xarici_ishler 2021-08-18 00:23:45 +0000 UTC [ - ]
This is a great video explaining Brouwer's theorem and its (theoretical) application to stirring a cup of water.
hyperbovine 2021-08-18 12:44:00 +0000 UTC [ - ]
max_ 2021-08-18 18:15:30 +0000 UTC [ - ]
That's true on a short time scale. But on a longer time scale, you will eventually visit all possible states of the system.
I think this is similar to the concept of ergodicity[0].
arketyp 2021-08-18 07:20:35 +0000 UTC [ - ]
chompychop 2021-08-18 08:15:14 +0000 UTC [ - ]
voldacar 2021-08-18 08:33:07 +0000 UTC [ - ]
joebob42 2021-08-18 00:24:07 +0000 UTC [ - ]
Or on the glass of water point: what if I pour the water out on the table? It's certainly all been moved, yes?
(To be clear I'm obviously wrong / not understanding the premise, but I would be curious to know what I'm missing)
nonameiguess 2021-08-18 00:34:26 +0000 UTC [ - ]
im3w1l 2021-08-18 00:27:27 +0000 UTC [ - ]
joebob42 2021-08-18 00:36:06 +0000 UTC [ - ]
The upside here is that it seems perhaps I wasn't missing the premise, and it was just mis-stated :)
voldacar 2021-08-18 08:37:09 +0000 UTC [ - ]
cblconfederate 2021-08-18 09:31:27 +0000 UTC [ - ]
im3w1l 2021-08-18 00:19:31 +0000 UTC [ - ]
kevinventullo 2021-08-18 01:16:00 +0000 UTC [ - ]
alanbernstein 2021-08-18 06:06:14 +0000 UTC [ - ]
drdeca 2021-08-18 01:56:04 +0000 UTC [ - ]
kevinventullo 2021-08-18 03:39:35 +0000 UTC [ - ]
drdeca 2021-08-18 17:20:32 +0000 UTC [ - ]
But for a normal spoon shape, I don't think dipping it in makes the map between points in the water fail to be continuous. I'm somewhat confident that there is a smooth path from an embedding of a cylinder into R^3 to an embedding, into R^3, of a cylinder which is a little taller and has a spoon shape removed from it.
zbendefy 2021-08-18 11:36:59 +0000 UTC [ - ]
zitterbewegung 2021-08-18 00:36:16 +0000 UTC [ - ]
drdeca 2021-08-18 01:54:19 +0000 UTC [ - ]
Are you sure?
tzs 2021-08-18 13:48:23 +0000 UTC [ - ]
There is an easy way to counteract this. Let's say your original run takes 1000 seconds. Squaring that would give 1000000 seconds. That's a jump from under 17 minutes to over 11 days.
The fix: measure everything in hours. Your original run takes 0.278 hours. Squaring that gives 0.077 hours. Squaring the running time now makes it 3.6 times faster.
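(Spelling out the arithmetic, since "squaring a time" silently depends on the unit:)

    seconds = 1000.0
    print(seconds ** 2)            # 1,000,000 s, over 11 days
    hours = seconds / 3600.0       # 0.278 h
    print(hours ** 2 * 3600.0)     # ~278 s, about 3.6x faster than the original 1000 s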
jakeinspace 2021-08-18 14:01:23 +0000 UTC [ - ]
mmiyer 2021-08-18 18:42:17 +0000 UTC [ - ]
dynm 2021-08-18 10:40:20 +0000 UTC [ - ]
> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent
I don't quite understand this. Say f(p) is the complexity of gradient descent for precision p. The first paragraph seems to imply that f(p) = c×p^2, so that f(2p) = c×4×p^2 = 4×f(p). However, the second paragraph then just seems to imply that f(p) = c×p. Am I missing something?
Incidentally, I love that Quanta brings attention to work mostly just on the basis of how important it is. 99% of the other stuff you hear about new research is really just driven by PR departments at the researchers' home institutions.
mmiyer 2021-08-18 18:47:34 +0000 UTC [ - ]
FabHK 2021-08-18 11:30:04 +0000 UTC [ - ]
> You can imagine a function as a landscape, where the elevation of the land is equal to the value of the function (the “profit”) at that particular spot. Gradient descent searches for the function’s local minimum [...]
optimalsolver 2021-08-18 01:00:10 +0000 UTC [ - ]
wenc 2021-08-18 03:05:13 +0000 UTC [ - ]
gerdesj 2021-08-17 23:35:27 +0000 UTC [ - ]
Buttons840 2021-08-18 02:33:50 +0000 UTC [ - ]
> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable.
I think a silver lining here is that a company having access to 10,000 times as much computing power as a common consumer can only achieve, say, a 10x better model. So the inequality isn't as extreme as it could be.
adrianN 2021-08-18 09:53:21 +0000 UTC [ - ]
kbelder 2021-08-18 17:23:48 +0000 UTC [ - ]
Then you apply better techniques that hit the limits of your PC, and you bring that up to maybe 80%.
Then you bring in the PhDs, move your analysis to a cluster or into the cloud, and you gain another 2%.
In other words, it's more often that 10000x the computing power gives you something like 1.2x the model quality, not 10000x or even 100x. Still worth doing for a large enough company, though.
Just my experience, and only a generalization. Some models just can't be built without a minimum amount of horsepower, etc.
mjburgess 2021-08-18 05:03:48 +0000 UTC [ - ]
It's not necessarily that squaring precision leads to a "better model"; it's that it fails to be a model at all without that precision.
E.g., consider a climate model with effects at the first decimal place and at the 10th decimal place of some measurable quantity vs. time.
The effects at the 10th dp could aggregate to overwhelm the effects at the 1st dp.
It isn't that the model is better if it can "compute at the 10th dp" -- it's just wildly inaccurate otherwise.
ben_w 2021-08-18 07:48:53 +0000 UTC [ - ]
While the broader point is valid, IIRC climate modelling (being about averages and broad trends) is fine with the imperfections, and handles this by running multiple simulations with slightly different parameters and seeing what usually happens. Weather forecasting likewise.
TeMPOraL 2021-08-18 09:13:35 +0000 UTC [ - ]
That may be true, but in simulations using floating-point numbers (which are, as 'mjburgess points out, discrete), there are approximation errors that accumulate with almost any operation, on top of any measurement or modelling errors you may have. On some problems, numerical integration on floats bleeds precision really fast.
I recall seeing some problems in numerics labs during my university years, where using doubles instead of floats was what made the algorithm converge on a solution instead of endlessly spinning or diverging to infinity. I imagine that ideally you'd use a more sophisticated algorithm instead[0] - perhaps one that simulates real numbers with greater precision. But computing power isn't infinite (and it certainly wasn't in the last century) - floats and doubles are fast, supported by hardware, and you may need that speed to get results in a reasonable timeframe. Using doubles instead of floats may give you just enough of a precision buffer to make the faster algorithm work.
--
[0] - A trivial example that I'm sure is known to anyone doing physical simulations (including for videogames) is using increasingly sophisticated integrators for kinematics. You may start with naive Euler calculating v += aΔt, s += vΔt, but this quickly becomes inaccurate in quite spectacular ways (try simulating springs with naive Euler). So you may then progress through backwards / symplectic Euler, and end up with Verlet integration or even reach for 4th-order Runge-Kutta, to keep the energy in the system more or less conserved as it evolves. And yes, the difference between floats and doubles is also very apparent, especially with Euler methods.
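A tiny Python sketch of that spring example (step size and step count made up for illustration): naive Euler versus semi-implicit (symplectic) Euler on x'' = -x. The energy should stay at 0.5 forever; naive Euler inflates it by orders of magnitude, while the symplectic variant keeps it bounded.

    # Unit spring: x'' = -x, energy E = (v*v + x*x) / 2 should stay at 0.5.
    def spring_energy(symplectic, dt=0.01, steps=100_000):
        x, v = 1.0, 0.0
        for _ in range(steps):
            if symplectic:
                v = v - x * dt    # semi-implicit: update v first...
                x = x + v * dt    # ...then use the *new* v for x
            else:
                x_old = x
                x = x + v * dt    # naive Euler uses the old state for both updates
                v = v - x_old * dt
        return 0.5 * (v * v + x * x)

    print("naive Euler:     ", spring_energy(symplectic=False))  # blows up to ~1e4
    print("symplectic Euler:", spring_energy(symplectic=True))   # stays close to 0.5

The symplectic variant costs exactly the same per step; it just applies the updates in a different order.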
gugagore 2021-08-18 14:22:17 +0000 UTC [ - ]
Truncation error is present even if there were zero approximation error in the elementary floating-point operations. I'm not sure what you mean by the difference between floats and doubles showing up especially with Euler methods.
With no floating-point error, you can drive truncation error toward zero by driving the step size toward zero. But the floating-point precision sets a floor on how low you can go. Perhaps this is what you meant.
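That floor is easy to see in a toy Python/NumPy experiment (my own, purely illustrative): forward Euler for x' = x on [0, 1], whose exact answer is e. Shrinking the step keeps helping in float64, but in float32 the round-off accumulated over more steps eventually stops the improvement.

    import numpy as np

    def euler_e(n_steps, dtype):
        # forward Euler for x' = x, x(0) = 1, integrated to t = 1
        h = dtype(1.0) / dtype(n_steps)
        x = dtype(1.0)
        for _ in range(n_steps):
            x = x + h * x
        return float(x)

    for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
        err32 = abs(euler_e(n, np.float32) - np.e)
        err64 = abs(euler_e(n, np.float64) - np.e)
        print(f"n={n:>9}  float32 error={err32:.2e}  float64 error={err64:.2e}")

Past some point, taking more and smaller steps no longer helps the float32 answer, while float64 keeps improving for a good while longer.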
mjburgess 2021-08-18 08:22:22 +0000 UTC [ - ]
When creating simulations, we do so by defining a computable function (i.e., over discrete binary strings) which approximates some non-computable function (i.e., over the reals).
It is important to note that those non-computable functions are derived, in part, analytically, based on theorems which require infinite precision, i.e., real numbers.
When we define a computable approximation, we can only do so by narrowing the simulation to a scale where "the numbers have only a few significant digits". This is really the only thing we can simulate, and even then, we are using mathematical analysis to derive much of it -- which cannot be simulated.
If you really wanted to simulate across scales, i.e., run a simulation "from atoms to tables", I think we'd require many, many decimal places. It would likely, I think, be physically impossible to do (or if not, then atoms to planets would be), as the discrete precision required would imply a theoretical minimum of mass that would create a black hole.
E.g., consider that a table has 10^verylargenumber "spherical" atoms, and so its aggregate shape would be extremely sensitive to a miscalculation of each atom's shape ... and we would require a precision in pi of probably something like 10^-verylargenumber.
wcoenen 2021-08-18 11:57:00 +0000 UTC [ - ]
That "verylargenumber" exponent is not actually a large number. It's probably close to 28.
E.g. for a rough estimate, pretend that the table has a mass of 120kg and is made of only carbon atoms. 120000 gram / (12 gram/mol) = 10000 mol; multiplying by Avogadro's number (about 6x10^23 atoms/mol) gives roughly 6x10^27 atoms < 10^28.
On a related note, the estimated number of atoms in the visible universe is around 10^80.
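(The same estimate in two lines of Python, with Avogadro's number made explicit:)

    atoms = 120_000 / 12 * 6.022e23   # grams / (grams per mol) * atoms per mol
    print(f"{atoms:.1e}")             # ~6.0e27, a bit under 10^28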
mjburgess 2021-08-18 14:24:24 +0000 UTC [ - ]
So we need 30 bits/atom to store the volume, for 10^30 atoms -- this is 10^15 petabytes.
Suppose we're trying to estimate the shape of the table using the relevant state vector for each atom (i.e., radius, coordinates, EM shell, etc.), i.e., how these atoms interact to produce a structure.
Then we need perhaps 10 numbers/atom, so 10^16 petabytes. And consider the computation.
The issue is that macroscopic structuring effects (i.e., that the table has its tabular shape) also exert a top-down effect constraining the structure of the atoms.
We'd have to solve that problem through iteration. I.e., we compute a shape from local effects only, then iteratively recompute until increasing orders of scale are included.
Suppose we perform 1 operation/atom to compute an interaction; that's 10^30 operations. At 1ns/op, we have 10^21 seconds -- optimistically.
That's one scale; now we need to iterate. Given the orders of effects range from 1nm to 1m, that's c. 10 scales, say. We're now at 10^22 seconds with 10^16 petabytes of data.
And also, since we're running a 10-fold iteration, we need more than 10^30 precision, as errors will compound.
This is for a table! As I said, choose your scale, e.g., atom to star, and the storage requirements have an essentially physically impossible mass.
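Redoing that back-of-envelope arithmetic in Python (all the inputs, 10^30 atoms, 30 bits each, 1 ns per operation, about 10 scales of iteration, are the parent comment's assumptions, not mine):

    atoms = 1e30
    storage_pb = atoms * 30 / 8 / 1e15           # 30 bits/atom -> bytes -> petabytes
    print(f"{storage_pb:.1e} petabytes")         # ~3.8e15 PB for one number per atom

    seconds_one_pass = atoms * 1e-9              # one 1 ns operation per atom
    print(f"{seconds_one_pass:.0e} s per pass")                          # 1e21 seconds
    print(f"{seconds_one_pass * 10:.0e} s for ~10 scales of iteration")  # 1e22 seconds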
actually_a_dog 2021-08-18 11:41:50 +0000 UTC [ - ]
---
[0]: https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal...
morpheos137 2021-08-18 12:37:07 +0000 UTC [ - ]
This comment makes no sense without the number of significant digits.
legulere 2021-08-18 05:04:19 +0000 UTC [ - ]
nightski 2021-08-18 18:44:01 +0000 UTC [ - ]
madsbuch 2021-08-18 06:19:44 +0000 UTC [ - ]
cblconfederate 2021-08-18 09:23:01 +0000 UTC [ - ]
scotth 2021-08-18 03:04:02 +0000 UTC [ - ]