Privacy Investigation: What Companies Actually Know About You

June 10, 2025

A practical look at what data Google, social platforms, and data brokers collect — and how much they can infer from behavior you think is private.

Tags: Privacy, Security, Technology

Most people underestimate how much data they're generating — not because they're making obvious mistakes, but because the inference layer on top of their data is more powerful than they realize. You don't need to share your political views for a platform to model them. You don't need to share your location for a broker to know your home address. This post is a concrete look at what's actually being collected and what can be inferred from it.

What Social Media Knows From Minimal Sharing

You can use Instagram primarily to post photos of projects, avoid personal details, and still generate a surprisingly rich profile. Here's what's derivable from that "minimal" usage:

Interest graph. What you post about reveals interests with high confidence. What you engage with — likes, saves, time spent on posts you don't interact with (measured by scroll pause time) — reveals interests you haven't explicitly stated.

Behavioral patterns. Posting frequency and timing reveal your schedule. If you post every weeknight between 9 and 11 PM, that's a signal about your working hours, timezone, and routine.
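To make the timing point concrete, here is a minimal sketch of how posting timestamps alone can surface a routine. The timestamps are made up, and the function name `active_hours` is purely illustrative; real platforms presumably use far richer models.

```python
from collections import Counter
from datetime import datetime


def active_hours(timestamps, top_n=3):
    """Return the top_n most common posting hours (0-23), most frequent first."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hour for hour, _count in hours.most_common(top_n)]


# Hypothetical post timestamps, assumed to already be in the account's local time.
posts = [
    "2025-06-02T21:15:00",
    "2025-06-03T22:40:00",
    "2025-06-04T21:05:00",
    "2025-06-05T21:50:00",
    "2025-06-06T22:10:00",
    "2025-06-09T09:30:00",
]

print(active_hours(posts, top_n=2))  # [21, 22]
```

Six posts are enough to show the 9 to 11 PM cluster. If the timestamps were stored in UTC instead, the offset of that cluster from a typical evening would hint at the timezone.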

Social graph. Who you follow and who follows you back builds a relationship map. Even if you have a private account, the shape of your network (professional vs. personal, geographic clusters) is inferable.

Skill and capability trajectory. If you post about projects over time, the progression is visible. Platforms and advertisers can model your current capability level from the pattern of your content.

None of this requires you to fill out a profile form. The behavioral signal is richer than the declared data anyway.

Google's Data Surface

Google's data collection is unusually broad because its products are unusually broad. The full picture of what's in a Google account is worth actually looking at: you can review most of it at myaccount.google.com.

Web & App Activity is the most comprehensive single source. It can include your searches, your Chrome browsing history if you're signed in with sync enabled, app activity on Android, and interactions with Google-integrated services. Unless auto-delete is turned on, this builds a behavioral history that can go back years.

Location Timeline is the most sensitive for many people. If you've had an Android device with location history enabled, Google has a timestamped record of where you've been, including home, work, regular stops, and travel. This data is useful for Google's own products, but it is also a high-value target for legal process and data breaches.
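To see why a timestamped location record is so sensitive, consider a rough sketch of how "home" and "work" fall out of it. This is not Google's method. It assumes local-time timestamps, treats 10 PM to 6 AM as "night" and weekdays 9 AM to 5 PM as "working hours", and snaps coordinates to a grid of roughly 100 m. The coordinates and the function name `infer_home_and_work` are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def infer_home_and_work(points):
    """points: iterable of (iso_timestamp, lat, lon). Returns (home_cell, work_cell)."""
    night, workday = Counter(), Counter()
    for ts, lat, lon in points:
        t = datetime.fromisoformat(ts)
        cell = (round(lat, 3), round(lon, 3))  # three decimals is roughly a 100 m cell
        if t.hour >= 22 or t.hour < 6:
            night[cell] += 1  # assumed sleeping hours
        elif t.weekday() < 5 and 9 <= t.hour < 17:
            workday[cell] += 1  # assumed office hours, Monday to Friday
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work


# Hypothetical location points (timestamp, latitude, longitude).
points = [
    ("2025-06-02T23:30:00", 40.7131, -74.0062),
    ("2025-06-03T02:10:00", 40.7129, -74.0059),
    ("2025-06-03T10:00:00", 40.7580, -73.9851),
    ("2025-06-03T14:20:00", 40.7583, -73.9849),
    ("2025-06-03T19:00:00", 40.7300, -73.9900),
]

home, work = infer_home_and_work(points)
print("home:", home, "work:", work)
```

The most frequent night-time cell is very likely where you sleep, and the most frequent weekday daytime cell is very likely where you work. No address ever has to be entered.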

YouTube Watch History feeds the recommendation algorithm but also reveals a lot about interests, political leanings, mental health patterns, and purchasing intent — topics that are heavily modeled by advertisers.

Ad personalization data is worth reviewing because it shows you explicitly what Google has inferred about you. It's often more accurate than people expect, and it occasionally includes inferences that users find surprising.

Data Brokers: The Less Visible Layer

Google and social platforms at least have privacy controls and data deletion options. Data brokers are a different category.

Companies like Spokeo, Whitepages, BeenVerified, and dozens of smaller services aggregate public records — property records, voter registrations, court filings, business registrations — and combine them with data purchased from apps and loyalty programs. The result is profiles that include home address, family members, estimated income, vehicles owned, and more, available to anyone willing to pay a small fee.

The challenge here is that the underlying data is mostly public record. The aggregation is the problem, not the individual sources. And unlike social platforms, many data brokers have minimal incentive to respond to deletion requests — they're not building a relationship with you.

What "Limited" Sharing Actually Means

The gap between what you think you're sharing and what's collectible comes down to a few things:

Metadata is as valuable as content. Who you message, when, and how often can be more revealing than what you say. Call metadata (numbers, duration, frequency) can be used to model relationship strength, health patterns, and routine.
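As a small illustration of the metadata point, the sketch below ranks contacts using only numbers and call durations, with no call content at all. The call log is fabricated and `rank_contacts` is a hypothetical helper; total talk time is just one plausible proxy for relationship strength.

```python
from collections import defaultdict


def rank_contacts(call_log):
    """call_log: iterable of (number, duration_seconds).

    Returns numbers ordered by total talk time, then by call count, highest first.
    """
    stats = defaultdict(lambda: {"calls": 0, "seconds": 0})
    for number, seconds in call_log:
        stats[number]["calls"] += 1
        stats[number]["seconds"] += seconds
    return sorted(
        stats,
        key=lambda n: (stats[n]["seconds"], stats[n]["calls"]),
        reverse=True,
    )


# Hypothetical call records: (number, duration in seconds).
call_log = [
    ("555-0101", 1200),
    ("555-0102", 45),
    ("555-0101", 900),
    ("555-0103", 300),
    ("555-0102", 30),
]

print(rank_contacts(call_log))  # ['555-0101', '555-0103', '555-0102']
```

Add timestamps to the same records and the picture sharpens further: long late-night calls, short weekday calls, or regular calls to a clinic each tell a different story.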

Cross-platform linking. Your behavior on one platform gets linked to your behavior on others through device fingerprinting, email addresses, phone numbers, and ad network identifiers. The walled gardens aren't actually walled.
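Linking by email address is the simplest of these mechanisms to sketch. The example below assumes two datasets that each carry an email field and joins them on a SHA-256 hash of the normalized address, a pattern commonly described for identifier matching. The record layouts and the names `email_key` and `link_profiles` are assumptions made for illustration.

```python
import hashlib


def email_key(email):
    """Normalize an email address and hash it so records can be joined on the digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_profiles(platform_a, platform_b):
    """Pair up records from two datasets that share the same hashed email."""
    by_key = {email_key(rec["email"]): rec for rec in platform_a}
    linked = []
    for rec in platform_b:
        key = email_key(rec["email"])
        if key in by_key:
            linked.append({"key": key, "a": by_key[key], "b": rec})
    return linked


# Hypothetical records held by two unrelated services.
platform_a = [{"email": "Alex@Example.com", "interests": ["woodworking"]}]
platform_b = [
    {"email": " alex@example.com ", "device": "Pixel 8"},
    {"email": "sam@example.com", "device": "iPhone"},
]

for match in link_profiles(platform_a, platform_b):
    print(match["a"]["interests"], match["b"]["device"])
```

Hashing does not make the join any less effective: the same address always produces the same digest, so two services can match users without exchanging the plain address.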

Inference at scale. With hundreds of millions of users to learn from, even weak signals can be combined into reliable predictions. A platform doesn't need to know your income directly: it can model it from your device type, app usage patterns, shopping behavior, and neighborhood.
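A worked example helps show how weak signals add up. The sketch below uses a naive Bayes update, which assumes the signals are independent (a simplification), and the prior and likelihood ratios are invented numbers rather than measurements. The function name `combine_signals` is illustrative.

```python
import math


def combine_signals(prior, likelihood_ratios):
    """Naive Bayes update: treat signals as independent and add their log-odds."""
    log_odds = math.log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1 / (1 + math.exp(-log_odds))


# Assumed numbers: 20% of users fall in some income bracket, and each of five
# signals (device type, neighborhood, and so on) is twice as common inside it.
prior = 0.2
signals = [2.0, 2.0, 2.0, 2.0, 2.0]

print(f"one signal:   {combine_signals(prior, signals[:1]):.3f}")
print(f"five signals: {combine_signals(prior, signals):.3f}")
```

In this toy setup a single signal moves the estimate from 20% to about 33%, while five together push it to roughly 89%. Scale matters because the likelihood ratios themselves can only be estimated reliably from a large user base.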

Practical Implications

The goal of understanding this isn't paranoia — most data collection is used for advertising, not anything nefarious. But a few practical things follow from understanding the actual scope:

  • Reviewing and deleting Google location history periodically is worth doing, especially if you're concerned about data breach exposure.
  • Browser fingerprinting bypasses most cookie-based tracking controls. If privacy matters for specific browsing, a browser that addresses fingerprinting (Firefox with Enhanced Tracking Protection set to Strict, or Brave) is more effective than incognito mode, which does nothing to change the fingerprint.
  • Data broker removal is tedious but possible. Tools like DeleteMe automate the requests, though it's an ongoing process — brokers re-aggregate data continuously.
  • The most sensitive data — location history, health data, financial patterns — deserves more scrutiny than the most visible data. A photo you post is visible; your location timeline is just as real but much less considered.
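The fingerprinting point in the list above is easier to appreciate with a sketch. The example assumes a tracker can read a handful of browser attributes and hashes them into a stable identifier. The attribute names and values are hypothetical, `fingerprint` is an illustrative helper, and real fingerprinting scripts draw on many more signals (canvas rendering, audio stack, installed fonts).

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable identifier."""
    # Sorting keys makes the result independent of attribute order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes from two visits by the same browser, with no cookies.
visit_monday = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "2560x1440",
    "timezone": "America/Chicago",
    "fonts": ["DejaVu Sans", "Fira Code", "Noto Serif"],
}
visit_friday = dict(visit_monday)  # incognito mode leaves these values unchanged

# A different browser that differs in a single attribute.
other_browser = dict(visit_monday, screen="1920x1080")

print(fingerprint(visit_monday) == fingerprint(visit_friday))   # True
print(fingerprint(visit_monday) == fingerprint(other_browser))  # False
```

Clearing cookies or opening a private window changes none of the inputs, which is why anti-fingerprinting browsers work by standardizing or randomizing the attributes themselves.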