Google considers switching FLoC to a topic-based approach
godshatter 2021-08-17 15:58:17 +0000 UTC [ - ]
vineyardmike 2021-08-17 20:01:27 +0000 UTC [ - ]
Because I have an interest in X, but I frequent sites for topics X, Y, Z, A, and B daily because I'm a complex individual. Maybe X is the most "valuable" topic to profit off, so now A, B, X, Y, and Z all show ads for just X.
qeternity 2021-08-17 22:16:59 +0000 UTC [ - ]
RcouF1uZ4gsC 2021-08-17 16:58:51 +0000 UTC [ - ]
nixass 2021-08-17 15:21:32 +0000 UTC [ - ]
john_yaya 2021-08-17 15:32:34 +0000 UTC [ - ]
rektide 2021-08-17 15:49:27 +0000 UTC [ - ]
But this is not to a user's disadvantage. Google genuinely is trying to find a good path forward. The proof to me comes in the quest for alternatives. Find anything else pro-user being suggested. Find ANYTHING. There's nothing. It's barren.
There is a huge astroturfed war going on. Google is trying to do good. It's trying to find some compromise between content-holders & users. The content-holders are stirring up a colossal storm because they want what they have, which is everything. They're winning this PR blitz. Low, cheap, pervasive attacks on Google are winning this war, & it's users who are losing.
tyingq 2021-08-17 16:56:21 +0000 UTC [ - ]
danShumway 2021-08-17 16:51:00 +0000 UTC [ - ]
Every major browser other than Google is taking proactive steps to limit cookies right now. Firefox very recently announced better cookie deletion for sites; they'll now try to delete cookies from other domains that are linked to those sites. This is going to get progressively better and better.
But the sentiment here is exactly what Google promotes. It assumes the very point in question: whether we need to compromise with content-holders on privacy at all. We don't; no browser other than Chrome views user tracking as a necessary compromise with advertisers. There was a point in the past where we might honestly have worried about whether blocking 3rd-party cookies was feasible, but since then multiple efforts and improvements from Mozilla and Safari have made it clear that browser manufacturers don't have to get advertisers' permission to push code.
Google is refusing to proactively move forward with steps that everyone else has agreed are necessary, because Google is mad that we're not thinking enough about their business interests. We have a good path forward, Google is blocking it for a large number of users, and then Google is complaining "why is no one else except us coming up with solutions?" But we have come up with solutions, they're just not solutions that Google likes.
The caution from Chrome about cookies is really interesting. Chrome broke web audio and alerts, and the dev team is regularly, openly hostile to suggestions that the consensus process moves too quickly. They tried to deprecate the URL, for crying out loud, but now they want to be all cautious about cookies? Sure.
> There is a huge astroturfed war going on.
Agreed, but not in the direction that you're suggesting. The general public wants better privacy; they don't care about whether or not it makes advertisers happy. Nontechnical users are often misinformed about tracking, and they often lack the technical skills or willpower to increase their privacy on their own. But that doesn't mean the public doesn't care about privacy: ordinary users support the fingerprinting protections Apple is building into iOS, and they support the privacy changes Mozilla is building into Firefox. Most ordinary people are on the side of privacy in this debate.
Advertisers are desperate to argue that users are being harmed or that moving forward without an alternative tracking system is impossible, when the reality is that even the nontechnical public has turned out to have overwhelmingly positive feelings towards on-by-default technical measures to reduce tracking. The astroturfing here is the FUD coming out of companies claiming that we need (for either moral, technical, or PR reasons) to compromise with advertisers before we can move forward removing 3rd-party cookies.
It's really only Google who is lagging on this stuff. If they want the least private browser on the market, then they can have it. I don't think we need to hold back everyone else while we wait for Google to make up their mind about this.
jsnell 2021-08-17 17:05:57 +0000 UTC [ - ]
danShumway 2021-08-17 17:50:00 +0000 UTC [ - ]
This is exactly how the rollout of DoH has worked: different regions have had rollouts at different rates, particularly in regions where legislators are concerned about market effects. If the UK moves to block that too, then maybe we have a more serious problem. But Google should be trying to roll out these protections in areas where it can.
It also doesn't mean that the UK is opposed to restricting 3rd-party cookies. The UK is worried about Google privileging their own ads. That's not the same thing as demanding a compromise with advertisers to enable cross-site tracking before cookies are removed, it's a requirement that Google not use the removal of 3rd-party cookies to give themselves outsized control over the advertising ecosystem.
jsnell 2021-08-17 19:44:19 +0000 UTC [ - ]
As for the first point, do you have a region in mind where the regulatory authorities have greenlit this? Certainly not anywhere in the EU, given they launched a pre-emptive probe into the plans before anything was launched. Or in the US, where the unlaunched plans are included in the antitrust complaint being driven by the states.
Look, there are a lot of complaints one can direct at Google. But this isn't one of those things. Given the amount of scrutiny, how could they possibly unilaterally launch anything related to this, beyond tiny experiments?
danShumway 2021-08-17 20:00:48 +0000 UTC [ - ]
My suggestion is literally that they not unilaterally launch this across every region.
> The position of the CMA is that other advertisers need 3p cookies but Google doesn't. [...] They will demand there to be some mechanism that gives small ad tech companies more data than FLOC did.
I disagree with your connection of those two points. Google's job is to convince regulators that the removal of 3rd-party cookies won't advantage them. There are multiple ways they could attack that problem, including limiting the amount of data they collect. The UK's position is not that it is fundamentally important for advertisers to track users; the UK's position is that it is important that there not be one advertiser who can track users.
It's particularly galling for Google to ignore this, because it assumes that of course Google can't possibly collect less information about its users or use less information in ad targeting, so the only solution is to make sure everyone else has the same amount of information as Google. But that's a false premise in my mind. I'm not advocating to remove 3rd-party cookies because I want Google to be the only entity that can target me; I would be perfectly happy if Google removed 3rd-party cookies and committed to tracking me less to reduce its dominance in the market.
In fact, I would argue that it would be better if Google went down that route. I don't trust Google any more than I trust any other 3rd-party advertiser; it's just as problematic for Google to be collecting this kind of invasive data. The assumption that Google's business model or data collection can't possibly change, and therefore we can't make any privacy improvements for anyone else... I just really strongly disagree with that idea.
This is also true in the US. There are some probes happening in the US that cite cookie removal as a possible anti-competitive indicator. Notably, there are other policies and actions that these probes cite that Google has not scrambled to avoid in the same way as it has for cookies. But whatever -- even in the US, regulators are not saying that advertisers need this information, they are saying it's unacceptable for Google to have this large of a disparity with other competitors.
dmitriid 2021-08-17 16:00:16 +0000 UTC [ - ]
Google is a company that earns 80% of its money from online ads. There's literally no incentive for Google to "find a good path forward".
dylan604 2021-08-17 15:41:01 +0000 UTC [ - ]
jefftk 2021-08-17 15:42:31 +0000 UTC [ - ]
(Disclosure: I work on ads at Google, speaking only for myself)
luckylion 2021-08-17 16:34:40 +0000 UTC [ - ]
With ads, content quality goes down. You don't need to convince people that your content is worth reading, you just need to convince Google. Slap a few expired domains on your text-spinner-generated content and off you go.
Not all content is created equal: Made-for-AdSense content would not have been made without ads, and so wouldn't be taking attention away from quality content.
I believe you're rationalizing your paycheck.
rchaud 2021-08-17 16:04:35 +0000 UTC [ - ]
https://digiday.com/media/googles-confirmed-clicks-initiativ...
dessant 2021-08-17 15:56:42 +0000 UTC [ - ]
gxnxcxcx 2021-08-17 16:46:46 +0000 UTC [ - ]
The monetized web has by many measures exceeded anything the ragtag petite bourgeoisie of the early internet could ever have built, but in the process the great majority of us (both users and the very long tail of non-top content creators) have invariably become peasants.
I'm not saying there was any other way to have what we have now, but what we have now kinda ain't sooo good when one isn't the middleman in a process architected for the middleman's benefit.
rurp 2021-08-17 23:23:57 +0000 UTC [ - ]
dogleash 2021-08-17 16:56:28 +0000 UTC [ - ]
Weird position to take when most high-quality content is already shifting to a paywall model (or never based its business model on advertising in the first place). And ad-supported content is in a race to the bottom, meaning content useful beyond light entertainment is drying up.
easrng 2021-08-17 17:01:02 +0000 UTC [ - ]
ericholscher 2021-08-17 18:19:23 +0000 UTC [ - ]
dmitriid 2021-08-17 16:01:51 +0000 UTC [ - ]
They are not.
The world has survived for millennia without tracking ads.
greiskul 2021-08-17 16:38:34 +0000 UTC [ - ]
Nicksil 2021-08-17 17:08:00 +0000 UTC [ - ]
Since its birth, clearly. No matter how you try to spin this, advertising is not a requisite for a free and open Web. If ads were to disappear tomorrow, the web sites society deems important would survive; the others (if income was necessary in the first place) would indeed perish. And we would be just fine.
dmitriid 2021-08-17 16:48:49 +0000 UTC [ - ]
For a very, very long time.
> Cause I see a lot of it dying every day, being replaced with walls.
That's because companies see value in silos and in extracting every cent they can out of users.
gxnxcxcx 2021-08-17 17:45:27 +0000 UTC [ - ]
perryizgr8 2021-08-18 09:35:18 +0000 UTC [ - ]
wendythehacker 2021-08-17 17:09:56 +0000 UTC [ - ]
-) Is it possible to re-identify webpages a user visits based on the Cohort ID?
-) E.g., can a website be built to show your "profile" and "interests" based on the Cohort, rather than just the FLoC ID? Google and others who (I assume) will share their back-end data would be able to build such a website.
-) Can a "rainbow table of FLoCs" be pre-calculated? This would allow re-identifying certain browsing habits of users.
-) In fact, what if someone creates a Chrome extension that publishes visited domain names and their resulting FLoC ID? Imagine many people downloading and using it for fun! This would effectively decentralize the previously mentioned re-identification attacks and render FLoC useless for all other privacy-concerned users.
-) How much easier is it to track a user with just IP address + FLoC ID now?
BUT, what I'm missing entirely at this point is how the web server (Google and other ad companies) will actually *use/share* the Cohort information. That is not being described at all by Google - and seems rather critical to me.
More details and testing ideas in this article: https://embracethered.com/blog/posts/2021/red-teaming-floc-c...
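To make the rainbow-table idea concrete, here's a toy sketch. The shipped FLoC experiment derived cohorts via SortingLSH over a SimHash of the domains of visited sites; the hash below is a simplified stand-in and the domain list is invented, so treat this as an illustration of the precomputation attack, not the actual algorithm:

```typescript
import { createHash } from "crypto";

// Toy stand-in for FLoC's cohort function (illustrative only).
function cohortId(domains: string[], bits = 16): number {
  const counts = new Array(bits).fill(0);
  for (const d of domains) {
    const h = createHash("sha256").update(d).digest().readUInt32BE(0);
    for (let i = 0; i < bits; i++) counts[i] += (h >>> i) & 1 ? 1 : -1;
  }
  return counts.reduce((id, c, i) => (c > 0 ? id | (1 << i) : id), 0);
}

// "Rainbow table": enumerate small subsets of popular domains, record
// which subsets map to which ID, then invert any observed ID.
const popular = ["news.ycombinator.com", "github.com", "nytimes.com", "webmd.com"];
const table = new Map<number, string[][]>();
for (let mask = 1; mask < 1 << popular.length; mask++) {
  const subset = popular.filter((_, i) => (mask & (1 << i)) !== 0);
  const id = cohortId(subset);
  table.set(id, [...(table.get(id) ?? []), subset]);
}

const observed = cohortId(["github.com", "webmd.com"]);
console.log(table.get(observed)); // candidate browsing histories for this ID
```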
purplepatrick 2021-08-18 01:43:53 +0000 UTC [ - ]
How granular should a “topic” be? “Financial services” isn’t gonna cut it. Or how about medical sites? “Medical” or “health” are useless mean-nothing buckets, while “rectal bleeding” may be too granular.
Ditto the issue of topic visit frequency - sure, someone visiting shoe sites multiple times is a thing, but how often is someone going to look for, say, mortgage loans? ~90%+ of people only gather two offers for home loans, for example. They can get that on one site in one visit. But they may have read a ton about it before on sites that may only have been tagged as "financial content" or some wishy-washy label.
Data structuring for topics, occupations, etc. online or offline has almost never succeeded. Ever. I have never seen anyone do it well and I work in an industry that depends on it.
Oh, and don’t even get me started on sets…
Making this the basis for targeting is really only going to help sell shoes, fashion, porn, jewelry, and cars.
tehjoker 2021-08-17 16:32:53 +0000 UTC [ - ]
I do think it is fun to imagine what topics they could assign to you around the web. What about "anti-vax", "terrorist", "communist", or whatnot, now totally divorced from the material that generated the label?
ocdtrekkie 2021-08-17 16:45:46 +0000 UTC [ - ]
This attempts to solve that problem by restricting classification to a much smaller set of topic groups that are presumably plaintext-readable, and it sounds like an option to remove yourself from incorrect ones may also be considered.
That being said, I'm happy to use a web browser that doesn't consider better ad targeting a reason to implement features.
spullara 2021-08-17 19:58:28 +0000 UTC [ - ]
iamben 2021-08-17 20:31:27 +0000 UTC [ - ]
jtsiskin 2021-08-18 03:25:15 +0000 UTC [ - ]
srswtf123 2021-08-17 15:41:29 +0000 UTC [ - ]
But hey, shareholder value, right? Waaay more important.
dheera 2021-08-17 15:54:03 +0000 UTC [ - ]
Welcome to capitalism.
neolog 2021-08-18 03:00:21 +0000 UTC [ - ]
srswtf123 2021-08-17 18:36:40 +0000 UTC [ - ]
Then again, rarely is lawbreaking by corporations adequately prosecuted. Even rarer are actual guilty verdicts resulting in actual punishment. Of course, "Don't be evil" isn't a law, and I'm not claiming Google is breaking any.
So while I think I may be technically correct, reality certainly does seem to favor your statement. This seems like a fairly sad state of affairs.
Is it like this everywhere capitalism is practiced? I'd think a visiting extraterrestrial scholar might be able to weigh in.
dheera 2021-08-17 19:32:46 +0000 UTC [ - ]
I agree with this. But laws and ethics are two different things. Corporations often do things that are in line with the law but unethical. By "civil duty" I meant ethics.
kwhitefoot 2021-08-17 16:48:37 +0000 UTC [ - ]
No it isn't.
erhk 2021-08-17 17:02:49 +0000 UTC [ - ]
kwhitefoot 2021-08-19 16:47:28 +0000 UTC [ - ]
nixroot 2021-08-17 16:56:05 +0000 UTC [ - ]
syrrim 2021-08-18 01:29:44 +0000 UTC [ - ]
As such, I doubt this model will leave anyone very happy. Advertisers will dislike that it offers poorer tracking, and users will dislike that it leaks demographic information. I for one am satisfied with an approach that screws over advertisers. However, I suspect Google won't be. What I see as their only escape is to create sandboxed environments in the browser from which to run ad-targeting code. A JavaScript worker of some sort would be restricted from making web requests, and would only be able to act on information fed to it. This information would come from webpages the user visited, and would include information about the page, as well as their actions there. The worker would use this information to decide what ads to target at the user.
Downloading a particular ad without leaking information remains a problem. You could download every possible ad, and have the worker select one, but this is very wasteful of bandwidth. "Private information retrieval" (PIR) schemes can reduce the bandwidth wastage substantially, without leaking any information to the server. These schemes have received a great deal of attention from cryptographers, but as far as I can tell have been overlooked by programmers. Likely this is because there are few instances where a project cares about privacy, but the project owner is untrusted to protect privacy. Thus, it seems like privacy protecting targeted advertising would be an ideal testbed for PIR. It would be very interesting if google went this route.
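For the curious, the simplest two-server construction fits in a few lines. This is the textbook information-theoretic PIR scheme, not anything Google has proposed; the equal-length ad records are made up, and the usual assumption is that the two servers don't collude:

```typescript
import { randomInt } from "crypto";

// Equal-length "ads" replicated on two non-colluding servers.
const ads = ["ad0:shoes___", "ad1:travel__", "ad2:finance_", "ad3:gadgets_"]
  .map((s) => Buffer.from(s));

// Each server XORs together the records selected by the query vector; a
// uniformly random vector reveals nothing about which record is wanted.
function serverAnswer(db: Buffer[], query: number[]): Buffer {
  const acc = Buffer.alloc(db[0].length);
  query.forEach((bit, i) => {
    if (bit) for (let j = 0; j < acc.length; j++) acc[j] ^= db[i][j];
  });
  return acc;
}

const want = 2; // index the client wants; hidden from both servers
const q1 = ads.map(() => randomInt(2)); // uniformly random query
const q2 = [...q1];
q2[want] ^= 1; // differs from q1 only at the wanted index

const a1 = serverAnswer(ads, q1);
const a2 = serverAnswer(ads, q2);
const record = Buffer.alloc(a1.length);
for (let j = 0; j < record.length; j++) record[j] = a1[j] ^ a2[j];
console.log(record.toString()); // "ad2:finance_" -- neither server learns `want`
```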
audit 2021-08-18 03:49:56 +0000 UTC [ - ]
Perhaps we can look at advertising as two categories:
(1) advertising as a means to notify people about an existing product or service
(2) advertising as a means to elevate one product over another in the eyes of consumers
Category (1) seems very legitimate and welcome.
Category (2) falls into implicit manipulation, almost con artistry.
So let's assume category (1) is the one the public would like to find solutions for (and that category (2) will eventually be outlawed... let's hope).
With that in mind, perhaps a way for a user to set legitimate interests (at the browser level), along with the types of 'pages' on which they accept seeing ads, would work.
The 'types' of pages might follow some sort of taxonomy standard that labels a page with a type (e.g., privacy-preserving or not, with many subtypes).
As a user navigates through the web, only pages with specific subtypes would participate in the 'digital identity build-up'.
Pages that, according to user preferences, cannot participate in the 'digital identity' build-up can still show ads, but cannot use or 'donate' their information to the digital identity database.
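As a sketch, with the loud caveat that every label and field here is hypothetical (no such page-type taxonomy standard exists today):

```typescript
// Hypothetical browser-level preferences; all names are invented.
interface UserPrefs {
  allowedInterests: Set<string>; // legitimate interests the user opted into
  identityBuildingPageTypes: Set<string>; // page types allowed to feed the profile
}

const prefs: UserPrefs = {
  allowedInterests: new Set(["cycling", "cooking"]),
  identityBuildingPageTypes: new Set(["editorial/privacy-preserving"]),
};

// Pages of other types can still show ads, but neither use nor "donate"
// information to the digital identity database.
function mayContributeToProfile(pageType: string, p: UserPrefs): boolean {
  return p.identityBuildingPageTypes.has(pageType);
}

console.log(mayContributeToProfile("editorial/privacy-preserving", prefs)); // true
console.log(mayContributeToProfile("retargeting/tracker", prefs)); // false
```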
jeffalo 2021-08-17 15:27:16 +0000 UTC [ - ]
mandevil 2021-08-17 16:12:49 +0000 UTC [ - ]
So Google has been trying to come up with something that Apple will let them get away with. Either give up on targeted advertising to iPhone/iPad users or come up with something better than third party cookies.
vineyardmike 2021-08-17 20:13:34 +0000 UTC [ - ]
Apple has no interest in supporting FLoC at all though, right?
btown 2021-08-17 16:32:39 +0000 UTC [ - ]
Do. Not. Allow. Anything. LGBTQ+. Related. To. Be. A. Topic. Unless. It's. Opt. In.
People. Live. In. Unpredictably. Unsafe. Places.
People. Will. Die.
kodah 2021-08-17 17:03:48 +0000 UTC [ - ]
To me, in the United States, there are two major camps of thought:
* Targeted advertising is useful to me and I don't mind someone knowing everything about me
* Targeted advertising is dangerous to me for x, y, and z reason
A world that allows both is optimal. Allow companies to collect data, but have federal agency oversight and requirements up to and including the right to be forgotten. It should be possible to pull a lever and dump most (if not all) collected data associated with you.
ssss11 2021-08-17 21:15:37 +0000 UTC [ - ]
I don’t personally think targeting works - does anyone have a good research paper that proves it?
I think the optimal solution would be the old pre-internet advertising... general ads, like billboards, but now on websites. No tracking.
fsociety 2021-08-17 21:52:40 +0000 UTC [ - ]
That means that if we go back to a pre-internet style of ads, it will be expensive to advertise your business to a niche community, and therefore only larger companies will be able to advertise.
There is a privacy angle that needs to be figured out, and it is complex. But simply going back to the old way empowers larger businesses to beat out smaller ones. This is the sad reality we live in now.
rovr138 2021-08-18 03:43:13 +0000 UTC [ - ]
You can choose where to advertise. Is this website, page, game, product, etc. related to what you're selling?
You can do it the way DuckDuckGo does, by targeting keywords.
Multiple people can buy space/impressions and you can rotate them. It doesn’t have to go back to old static banners.
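A minimal sketch of that kind of contextual, keyword-based selection (the inventory and keywords are invented):

```typescript
// Contextual ad selection: the page's own keywords pick the ad;
// no user profile is consulted anywhere.
const inventory = [
  { keywords: ["bike", "cycling"], ad: "Acme Bike Shop" },
  { keywords: ["mortgage", "loan"], ad: "Example Lender" },
];

function pickAd(pageKeywords: string[]): string | undefined {
  return inventory.find((slot) =>
    slot.keywords.some((k) => pageKeywords.includes(k))
  )?.ad;
}

console.log(pickAd(["cycling", "routes"])); // "Acme Bike Shop"
```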
danShumway 2021-08-17 17:34:25 +0000 UTC [ - ]
I agree with your overall sentiment, but I think the solution you propose is bad. It allows companies to collect data automatically behind users' backs, and then tries to address the extreme power imbalance that creates with regulations and escape hatches that users can take after that data has already been created. But ideally, we want to prevent problems from occurring at all, not just to try and fix them after the fact.
To your point about voluntary tracking, users may want to signal interest in some topics but not others. They may want to signal interest about some topics only in certain situations. They may want to signal interest about a topic to one site, but signal interest about another topic to a different site. They may want to "freeze" the topics that they signal interest about, so that they don't need to worry about accidentally revealing new preferences in the future, or to think about how browsing a set of sites will change the ads that they get. All of these are valid things for a user to want to do.
It is better if users have complete agency over what data is transmitted, and if companies do not fingerprint or correlate users behind their backs at all. Attempting to determine user cohorts clientside is the correct move, arguably the only part of FLoC that Google got right. The problem with FLoC is that even if cohorts are determined completely clientside, the user still does not have agency and transparency about those cohorts.
There is theoretically a version of FLoC that would be really good for privacy, and that would also allow the kind of optional tracking that some users want. Such a system would need to be easy for users to toggle on and off on the fly on a site-by-site basis. It would need to be transparent to users about what categories they're in, it would need to allow users to edit and delete their categories, to add new categories, and to turn off category aggregation. It would need to not send new categories to a website without the user's permission. It would need to block websites from determining whether they're being given an automatically generated category, or a user-defined category. And importantly, it would all need to be opt-in.
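Sketched as code, the control surface described above might look something like this. Every type and method name here is hypothetical; nothing like this API exists:

```typescript
// Hypothetical user-controlled topic store.
type Topic = string;

class TopicStore {
  private topics = new Set<Topic>(); // user-visible, editable categories
  private perSite = new Map<string, Set<Topic>>(); // per-site allowances
  frozen = false; // "freeze" stops automatic aggregation of new topics

  addTopic(t: Topic) { this.topics.add(t); } // user-defined or aggregated
  deleteTopic(t: Topic) { this.topics.delete(t); }
  allowForSite(site: string, t: Topic) {
    if (!this.perSite.has(site)) this.perSite.set(site, new Set());
    this.perSite.get(site)!.add(t);
  }
  // A site sees only what the user allowed for it, and cannot tell a
  // user-defined topic from an automatically generated one.
  topicsForSite(site: string): Topic[] {
    const allowed = this.perSite.get(site) ?? new Set<Topic>();
    return [...this.topics].filter((t) => allowed.has(t));
  }
}
```

The key property is that nothing crosses a site boundary without an explicit per-site allowance, and sites can't distinguish volunteered topics from aggregated ones.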
Right now, tracking is an invisible system that happens automatically, with (rare) escape hatches that the majority of users are not expected to pursue, and that give little to no granularity about how users can customize how they present to the world. This kind of tracking is inherently invasive. We should pursue a paradigm shift where users proactively volunteer information about how they want to present to the world in any given situation -- in other words, we should give users agency to choose their own identities online.
Most regulatory suggestions that I see online (including the one you give) consciously or unconsciously focus on preserving the invasive nature of tracking while adding safeguards to try and prevent advertisers from abusing the inherent power imbalance. Instead, that power imbalance ought to be totally flipped in favor of the user. We need to reject the underlying idea that preferences should be extracted from users, rather than something that users consciously choose to volunteer.
----
I've commented to the same effect a couple of times in the past:
- https://news.ycombinator.com/item?id=25906791
- https://news.ycombinator.com/item?id=26353494
- https://news.ycombinator.com/item?id=25907079
This isn't only true for advertising. Think about how little control users have over how personalized algorithmic content filtering/ordering works on sites like Youtube, and you'll see similar conceptual problems with how the systems are designed. But it's really apparent in advertising, more so than in other contexts.
I want to fight back against policy changes that (in my mind) cement and normalize the current philosophical approach to advertising. It's not OK to be tracked without your permission, and regulations or safeguards and escape hatches around tracking aren't enough to make it OK. The system would still be fundamentally broken.
kodah 2021-08-17 18:07:04 +0000 UTC [ - ]
danShumway 2021-08-17 19:31:54 +0000 UTC [ - ]
I think it's kind of important when talking about privacy to keep in mind that the goal shouldn't be to force people to never reveal anything about themselves, it's to give them agency over what they reveal. That gets lost sometimes, I'm often guilty of losing sight of it as well. So a privacy world that makes it impossible for people to connect with each other, or that tells them that they're not allowed to present a certain way online is just as much of a problem as a solution that requires them to do so.
To me, the core idea behind privacy in regards to user tracking is that people should have agency over what their identities are, over what identities they're "allowed" to have, and over where they share those identities and whether those identities are associated with each other. It's totally valid for people to want to be able to tell Google that they're interested in seeing certain ads, they should be able to do that.
And when we expand out from privacy and look at algorithms on sites like Youtube/Twitter, that underlying idea of control becomes a bit more obvious -- it's not that content suggestions are bad, it's the inversion of control over how those suggestions are determined, the requirement that suggestions are constantly being computed and updated based on every action the user takes, the requirement that there be one set of suggestions for each user regardless of context, and the refusal from companies to give users the ability to do anything beyond slightly tweak or retrain their suggestions or to treat personally volunteered preferences as valid or trustworthy compared to what the algorithm determines they should like.
But I do want to get Youtube suggestions for related videos, I just want to be in control over how that happens. I want to be the entity with power in that relationship, I want to be the entity holding onto my data, I don't want to have to trust Google not to abuse me or to ask permission for Google to forget things.
This also gets at the potential benefits of a private world where users have real agency. It's very easy (I'm often guilty of this) to phrase the end goal of privacy as a world where tons of things just go away. But the reality is that targeted ads today mostly kind of stink, and there's a lot of potential for filtering, curation, and community aggregation that we can't take advantage of because users are excluded from the process of determining what they see online and how they're perceived by others. I wish there was more effort to try and describe how a private world could be better for things like search suggestions, user-relevant ads, content filtering -- because in a world where users had control over how this stuff worked and could customize their own experiences, it might be possible to try new applications, share more information, or experiment with new identities without risking abuse.
Imagine a world where you're an LGBTQ+ adjacent teen trying to figure out your own identity, and you temporarily turn on a category related to that for a subset of sites. If there's not a huge danger of fingerprinting, you can see how it feels for sites to recognize that -- maybe to tell Youtube that for right now you'd like to see more videos suggested based on that category. But you can safely do that because you know that other sites won't get that information, and that at any point you can switch the category off with zero consequences. You don't have to ask Google/Youtube for permission to edit what they know about you or to forget a category, you can control it locally right from your browser without asking anyone's permission.
That opens the door for really powerful applications or recommendation engines that arguably couldn't be (morally) built today. It's not about trying to create a world where nobody knows anything about anyone, it's about flipping the power imbalance and inverting the current predominant narrative about how information should be collected online.
With FLoC (and with other privacy initiatives from companies like Facebook), the feeling I get is that Google is trying to convince users that it can be a responsible data steward both because of internal policies and regulations. These companies try to create a narrative that the only options are either they track us, or that we never get anything recommended again. But neither of those options are what I want, what I want is to be my own data steward.
ramses0 2021-08-17 20:44:02 +0000 UTC [ - ]
When providing information in response to a search by a user of the EAIS, the EAIS must order the information provided so that the flight options that best satisfy the parameters of the user-selected search criteria are displayed conspicuously and no less prominently (e.g., in the same or larger font size and the same or more noticeable font color) than any other flight option displayed.
https://www.law.cornell.edu/cfr/text/14/256.4
...............
The airline industry (once upon a time) got its collective hand slapped because they would only display flights from certain carriers on "page 1" and relegated "other options" to "page 2".
It doesn't seem like we're going to put the genie back in the bottle of push promotion or "algorithmic" ordering, and it's a very difficult battle to fight.
The original genius of the reddit/n.y.c display-ordering algorithm was that it represented an aggregate priority order that was user-controlled (i.e.: 100 users give 500 upvotes across 1000 articles, and you get to see them ordered 10 at a time). The site is/was the cohort, and "the algorithm" was shown equally to all comers.
Contrast to twitter (presumably) and facebook (definitely) where content "chum"[0] is intermingled as soon as possible after the original "hook" of original or requested content, and tailored to the individual user. It's a constant stream of distraction and lies, and no wonder that a significant portion of net-users become hooked.
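For contrast, that aggregate ordering fits in a few lines precisely because there is no per-user tailoring; a minimal sketch:

```typescript
// One shared tally per article; everyone sees the same ordering.
const votes = new Map<string, number>();

function upvote(articleId: string): void {
  votes.set(articleId, (votes.get(articleId) ?? 0) + 1);
}

function frontPage(page = 0, perPage = 10): string[] {
  return [...votes.entries()]
    .sort((a, b) => b[1] - a[1]) // aggregate, user-controlled priority order
    .slice(page * perPage, (page + 1) * perPage)
    .map(([id]) => id);
}
```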
googlryas 2021-08-18 06:40:25 +0000 UTC [ - ]
btown 2021-08-18 16:39:40 +0000 UTC [ - ]
> Do not allow anything LGBTQ+ related to be a topic unless it's opt-in. People live in unpredictably unsafe places. People will die.
I try to write in an accessible way generally, but in this specific case I wanted a message that would stand out visually to sighted people, because they are the most likely (probabilistically) to be both the group leading this project at Google, and also the group who might have not thought about how their technical choices can impact people with diverse backgrounds. I definitely tripped over the trolley problem here, and I sincerely apologize for hurting you.
shuckles 2021-08-17 16:44:58 +0000 UTC [ - ]
btown 2021-08-17 17:05:28 +0000 UTC [ - ]
With FLoC, though, the idea was that the browser would provide document.interestCohort() and the individual site's JS could react accordingly: https://github.com/WICG/floc . This means that any site, regardless of its contracts with ad networks, could immediately identify your cohort and associate it with your activity. Web developers working in good faith would be encouraged to have user.cohort or user.topic fields from day one "just so you have it" - imagine all the ways someone could use this in bad faith. Inevitably this data would leak (or be intentionally leaked) and could trivially become a target list for doxxing closeted people. It's a dangerous, dangerous proposal.
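To make the exposure concrete: assuming the API shape from the WICG explainer (a promise resolving to an { id, version } pair), a few lines of first-party script would suffice to tie the cohort to a known account. The endpoint and identifier below are invented for illustration:

```typescript
// Hypothetical first-party script; `/api/profile` and `currentUserId`
// are made-up names, not part of any real service.
async function attachCohortToAccount(currentUserId: string): Promise<void> {
  const cohort = await (document as any).interestCohort(); // { id, version }
  // Any first party can now tie the cohort to a known identity:
  await fetch("/api/profile", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ user: currentUserId, cohortId: cohort.id }),
  });
}
```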
shuckles 2021-08-17 19:30:36 +0000 UTC [ - ]
jefftk 2021-08-17 20:24:32 +0000 UTC [ - ]
FLoC is one of a collection of proposals, which does include preventing fingerprinting: https://www.chromium.org/Home/chromium-privacy/privacy-sandb...
If you're not going to prevent fingerprinting, why even get rid of third-party cookies? Fingerprinting is a step backwards from cookies, since, for example, closing and reopening a private browsing window gives you a new cookie jar but the same fingerprint.
(Disclosure: I work for Google, speaking only for myself)
shuckles 2021-08-17 20:37:05 +0000 UTC [ - ]
I’m skeptical that fingerprinting can be “solved”, but let’s suspend disbelief.
danShumway 2021-08-17 22:41:49 +0000 UTC [ - ]
Let's say fingerprinting is completely solved. The browser's still going to be sending your FLoC data to Facebook/Reddit/Twitter/Amazon when you log in -- and it's not clear you'll even be able to tell what information that FLoC payload reveals about you.
Solving fingerprinting is one thing, but there are a ton of situations online where users will be revealing their real identity to service providers, regardless of the fingerprinting protections. What FLoC data will be sent to those providers?
danShumway 2021-08-17 22:38:39 +0000 UTC [ - ]
FLoC greatly increases the potential abuses that can come from fingerprinting, particularly from smaller sites.
So is Google committed to completely eliminating fingerprinting from the browser before it launches FLoC? Because it would really stink for everyone if the privacy sandbox doesn't completely block fingerprinting, and then the sites that manage to fingerprint users suddenly have a huge amount of extra data about them.
What will happen if the privacy sandbox comes out and people are still fingerprinting through other invasive means, or if a hard-to-fix exploit is found, or if a research paper demonstrates that the privacy sandbox is insufficient? Will Google commit to disabling FLoC if that happens?
vineyardmike 2021-08-17 20:00:10 +0000 UTC [ - ]
It shows that Google no longer thinks owning data is a competitive advantage.
This seems like it should be bigger news.
vineyardmike 2021-08-17 16:58:22 +0000 UTC [ - ]
FLoC is centralized at the individual. That means YOU are profiled completely, and all the profile info is centralized for everyone to grab.
danShumway 2021-08-17 17:09:24 +0000 UTC [ - ]
With FLoC, anyone can get that data without entering into an expensive agreement. I can get that data as a single person running a personal blog with no cookies. I can get it directly; I don't need to sign up with another 3rd-party tracking company that might not tell me why they're serving my users a specific ad or what data they have about them. And it's all scalable with no extra cost to my operations. This opens up additional attack vectors that might not exist otherwise: in a small operation it may be a lot easier for me to fingerprint you using your IP address, login information, or other data I have access to, and to correlate that with a real-world identity. When that happens, I also have access to all of the information about you from your FLoC categories, for free.
On some level, Google is taking a gamble that users will be difficult to fingerprint or identify using FLoC so it won't matter that some sensitive information is leaked. But it's just not the case, users can be fingerprinted. And in scenarios where they are fingerprinted/identified, FLoC provides a much more complete picture of that person based on activity from sites that might not have ever sold that information, that might have never intended to leak information about them in the first place.
shuckles 2021-08-17 17:02:16 +0000 UTC [ - ]
josefx 2021-08-17 18:04:09 +0000 UTC [ - ]
vineyardmike 2021-08-17 19:58:04 +0000 UTC [ - ]
FB and Google used to (idk about now, don't care to check) allow you to upload data you gathered/bought and use it for tracking.
To prove a point, I created a FB ad to target my parents and family (uploaded their email, phone number, identifiers, interests, locations they frequent - only data I knew FB already had) and watched as their ads became "hey vineyardmike's parents. the internet is not private".
tyingq 2021-08-17 16:46:22 +0000 UTC [ - ]
Spivak 2021-08-17 20:30:23 +0000 UTC [ - ]
FLoC is fundamentally a proxy for your aggregate browser history, which is something that hasn't really been exposed before.
audit 2021-08-18 03:35:42 +0000 UTC [ - ]
"... There are also continued, non-specific calls for violence on multiple online platforms associated with DVE ideologies or conspiracy theories on perceived election fraud and alleged reinstatement, and responses to anticipated restrictions relating to the increasing COVID cases. ..." [1]
So browsing thegatewaypundit.com, gab.com and theconservativetreehouse.com might get you jailed, or 'suicided', and, certainly, economically ruined -- by federal law enforcement of the wealthiest country on Earth...
[1] https://www.dhs.gov/ntas/advisory/national-terrorism-advisor...
ericyan 2021-08-17 16:38:40 +0000 UTC [ - ]
charlesdaniels 2021-08-17 15:30:17 +0000 UTC [ - ]
256 topics would be ceil(log2(256)) = 8 bits of entropy
30,000 topics would be ceil(log2(30000)) = 15 bits of entropy
As a reminder, there are ~ 10 billion people on earth, so if you have 34 bits of entropy or so, you can uniquely identify each person.
So really, the way to think of this as "Google considers making FLoC 20% less effective at fingerprinting users", and that's not even considering other sources of entropy, like user agent or screen size.
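Spelling out the arithmetic (the last line is one plausible reading of the "20%" figure: 7 of the 34 identifying bits go away):

```typescript
// Bits of entropy needed to distinguish n equally likely possibilities.
const bits = (n: number): number => Math.ceil(Math.log2(n));

console.log(bits(256)); // 8 bits for 256 topics
console.log(bits(30_000)); // 15 bits for 30,000 topics
console.log(bits(10_000_000_000)); // 34 bits to single out 1 of ~10 billion people
console.log((15 - 8) / 34); // ~0.2 -- the "20% less effective" estimate
```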
josefx 2021-08-17 15:56:08 +0000 UTC [ - ]
As a reminder: Chrome sends 16 bits of x-client-data with every HTTP request aimed at Google servers. So they already have half the bits they need to uniquely identify your system without FLoC.
jefftk 2021-08-17 16:29:58 +0000 UTC [ - ]
Earlier comment with more details: https://news.ycombinator.com/item?id=27367482
(Disclosure: I work on ads at Google, speaking only for myself)
gruez 2021-08-17 17:02:58 +0000 UTC [ - ]
Just like the 2FA phone numbers that people gave to Facebook were for "security purposes" only, but it later turned out they were using them for ad targeting?
https://techcrunch.com/2018/09/27/yes-facebook-is-using-your...
Spivak 2021-08-17 20:19:54 +0000 UTC [ - ]
nimih 2021-08-17 20:55:20 +0000 UTC [ - ]
More concretely, I think it's easy to believe that:
- The Facebook software developers and product managers who originally built and promoted phone 2FA were being earnest when they said the data would never be used for advertising.
- Some number of years later, someone elsewhere in the organization successfully got themselves access to that information without the knowledge/approval of the first group of people--who in all likelihood don't even work at Facebook anymore--and broke that original promise.
Throwing your hands up in the air and crying "well if they're lying, then all is for naught!" ignores the fact that large organizations act in complex ways, and even if you assume good faith on behalf of the current set of actors, you still need to push for systems which remain ethical and safe if some future set of actors turns out to be complete scumbags.
pessimizer 2021-08-17 20:41:39 +0000 UTC [ - ]
numbsafari 2021-08-17 20:52:55 +0000 UTC [ - ]
This is the same problem with Apple’s new SpywareKit.
neolog 2021-08-18 02:56:08 +0000 UTC [ - ]
dessant 2021-08-17 16:38:53 +0000 UTC [ - ]
Is there an option in Chrome to opt-out of this data being sent to Google?
anchpop 2021-08-17 17:02:32 +0000 UTC [ - ]
It's possible that Google is tracking you with FLoC or with extra HTTP headers or whatever. But they're also openly tracking you all the time anyway because you use Chrome. If you don't trust them to use the data they collect responsibly, don't use Chrome. (I'm not saying you shouldn't pressure Chrome to collect less data, I'm saying it doesn't make any sense to theorize about secret HTTP header fingerprinting operations when they're making literally no effort to hide the much bigger data collection operation right in front of you.)
gruez 2021-08-17 17:08:22 +0000 UTC [ - ]
What if I'm not signed into Chrome, or have that feature disabled?
perryizgr8 2021-08-18 09:31:07 +0000 UTC [ - ]
ggggtez 2021-08-17 16:42:27 +0000 UTC [ - ]
Considering you're already aware of screen size and user agent, and other forms of fingerprinting, you should probably realize that in the pre-FLoC world, you're likely already 100% identified by numerous ad networks.
kleene_op 2021-08-17 17:26:41 +0000 UTC [ - ]
Unless several topics can be assigned to a person (which seems to be implied in the article), in which case that's 256 bits of entropy available to classify each person.
>As a reminder, there are ~ 10 billion people on earth, so if you have 34 bits of entropy or so, you can uniquely identify each person.
Yeah, well theoretically you could. But that assumes that browsers are able to extract and balance some very arbitrary and very specific information from the browsing habits of all people on earth in a perfect decision tree.
In practice, lots of browsing habits overlap, making this decision tree far less discriminating and powerful than the theoretically optimal one.
Though I think you are absolutely correct that in practice the number of bits needed to build a classifier able to uniquely classify each person must be pretty low. Maybe a few hundred.
That may very well be possible with those 256 topics mentioned in that article.
Also, I don't understand the difference between cohorts and topics, apart from the fact that topics are less numerous and can have appealing names?
charlesdaniels 2021-08-17 18:54:42 +0000 UTC [ - ]
Good catch, forgot this was a bit vector, not a single key.
> Yeah, well theoretically you could. But that assumes that browsers are able to extract and balance some very arbitrary and very specific information from the browsing habits of all people on earth in a perfect decision tree.
Not really, people have found in the past that combinations of user agent, screen resolution, installed fonts, installed extensions, and things of that sort can come very close to uniquely identifying individual people.
> Though I think you are absolutely correct that in practice the number of bits to build up a classifier able to uniquely classify each person must be pretty low. Maybe a few hundreds.
Exactly. It might not narrow it down to one person, but perhaps a relatively small pool.
jnwatson 2021-08-17 21:00:58 +0000 UTC [ - ]
Google would also have to limit the number of bits an advertiser has access to.
omegalulw 2021-08-17 20:27:04 +0000 UTC [ - ]
omginternets 2021-08-17 20:07:49 +0000 UTC [ - ]
I understand it's an information-theoretical concept, and also understand it's somehow related to randomness, but I'm not sure exactly how, and I would like to have a more precise understanding.
Seirdy 2021-08-17 20:54:55 +0000 UTC [ - ]
N bits of entropy refers to 2^N possible states.
Cryptanalysis:
AES-128 has a key size of 128 bits, so there are 2^128 possible AES-128 keys. A brute-force attack capable of testing 2^128 keys can break any AES-128 key with certainty.
Fingerprinting:
If a website measures your "uniqueness", saying "one in over 14 thousand people" isn't a great way to measure uniqueness, because that number changes exponentially. Since we're dealing with possible states, i.e. possible combinations of screen size, user-agent, etc., we instead take the base-2 logarithm of this to get a count of entropy bits (~13.8 bits).
Thermal physics:
The second law of thermodynamics states that spontaneous changes in a system should move from a low- to a high-entropy state. Hot particles are far apart and moving a lot; there are many possible states. Cold particles are moving around less and can't change as easily; there are fewer possible states. Heat cannot move from cold things to hot things on its own, but it can move from hot things to cold things. Think of balls on a billiards table moving apart rather than together.
Entropy of the whole universe is perpetually on the rise. In an unimaginably long time, the most popular understanding is that particles will all be so far apart that they'll never interact. The universe will look kind of like white noise: an endless sea of random-like movement, where everything adds up to nothing, everywhere and forever.
nybble41 2021-08-18 20:21:48 +0000 UTC [ - ]
One minor caveat: You have to be able to recognize when you've found the right key. If the message is short (less than the key size) then it is likely that there are multiple keys that can decode the ciphertext to a plausible message and you have no way to know which one was correct. This is why an ideal One-Time Pad is considered unbreakable even by brute force: For any possible message of size less than or equal to the ciphertext there exists a key which will decode the ciphertext into that message.
omginternets 2021-08-18 21:17:40 +0000 UTC [ - ]
omeze 2021-08-17 20:31:12 +0000 UTC [ - ]
omginternets 2021-08-17 20:36:02 +0000 UTC [ - ]
2021-08-17 20:54:26 +0000 UTC [ - ]
mishafb 2021-08-17 15:58:11 +0000 UTC [ - ]