Non-Breach Privacy Events

Simson L Garfinkel; Mary Theofanos

Abstract

While data breaches frequently create privacy concerns, other types of non-security-related events may also raise privacy concerns. The present study collected and characterized a corpus of non-security-related privacy events that we term “non-breach privacy events.” In this article, we consider non-breach privacy events, which we define as incidents in which the action or inaction by an individual or organization resulted in a perceived privacy violation, but where the action did not involve the theft of data as the result of a computer intrusion. Using a systematic search methodology, we identified 44 non-breach privacy events enabled by technology. We then organized these events according to the data flows using Solove’s Taxonomy of Privacy and several characteristics that examined the range of these non-breach events.

Results summary: A curated dataset of 44 events that resulted in privacy harms. This dataset is a valuable tool for other researchers wanting to explore handling user data while respecting privacy. Also provided is a qualitative analysis of the nature of the privacy incidents, revealing several trends and lessons learned, the most significant being that just a few people operating within a large organization can create large-scale privacy events.

Methods

Search Methodology

We found events using the following approaches:

We performed searches on the Federal Trade Commission (FTC) website for privacy enforcement actions that contained the keyword "privacy" and then manually reviewed each to remove the enforcement actions resulting from data breaches as we define the term [6],[7]. Because the FTC does not require the existence of a malicious actor in order to categorize an incident as a data breach, some incidents that the FTC classified as a data breach may appear in this list.
We searched the Federal Communications Commission (FCC) website for press releases (keyword: "For immediate release") that featured the keyword "privacy" [8] and reviewed the results from 2000 through the present day. FCC actions that specifically mentioned "data breaches" were not included, but FCC actions that featured data being exposed but not necessarily being exploited were included. We also reviewed FCC enforcement actions [9].
We reviewed journalistic articles that featured lists of famous privacy mishaps by searching for the keyword privacy along with the search terms flaps, snafus, and credit bureaus.
We asked colleagues to review earlier drafts of this article and provide us with events that we had missed.

We sought to include cases from 1990 through 2015 that were public, well publicized, well known, and legally settled (that is, no longer the subject of an appeal).

Exclusion Criteria

Because this article is solely concerned with privacy events that result from the improper, authorized, and intentional use of data, we excluded the following kinds of events:

Where data were stolen by an outsider due to a computer security configuration error or a vulnerability that was exploited. Such "breach events" have been widely reported elsewhere.
Where data were stolen by an insider due to the insider’s dishonesty or systems that allowed the insider to exceed his or her authorized access.
Where data were released because of the failure on the part of a data custodian to properly destroy data on equipment prior to disposal [10].
When attackers engaged in pretexting, identity theft, or identity fraud. (Pretexting is a form of social engineering, in which the perpetrator lies or provides false or misleading information to an information custodian in an attempt to obtain confidential information about a targeted individual. Identity theft is the theft of personal information that could be used to obtain credit or steal something of value; identity fraud is the use of the personal information for a fraudulent purpose.)
When an individual was harmed because of a mismatch in a database (for example, a person being prohibited from boarding a flight because of a mismatch on a "no-fly" list).
When employers legally accessed email, phone conversations, or the work space of their employees.
Incidents of improper government surveillance, such as the incidents described by the Senate Select Committee to Study Governmental Operations with Respect to Intelligence Activities, 1975-1976 (Church Committee) [11].
Data released by inadvertent inclusion in publicly accessible directories. Although such incidents are unfortunate, they typically are the result of poor usability, incorrect system configuration, or poor training, and do not reflect the actor’s common business practices.

We also attempted to limit our collection to cases that had cultural or societal significance, as indicated by the number of people impacted, the involvement of government, or media attention.

Finally, we excluded events in which there was apparent wrongdoing but no finding from a government agency, or in which there was no statement of explanation or apology issued by the organization involved in the event.

Analysis Criteria

Solove’s taxonomy views privacy as a series of information flows. Informational privacy involves information collected from an individual by surveillance of the individual or collected by data holders through interrogation of an individual. In Solove’s taxonomy, the difference between surveillance and interrogation rests with the manner of data collection: “Surveillance is the watching, listening to, or recording of an individual’s activities. Interrogation consists of various forms of questioning or probing for information.” (p. 490)

Despite the outsized role that consent plays in today’s privacy world, with many organizations requiring all manner of consent from consumers before the consumers can use their service, Solove’s taxonomy makes little mention of consent. Solove notes that consent frequently determines the context of an activity and, as a result, whether or not a privacy violation has occurred: “Thus, if a couple invites another to watch them have sex, this observation would not constitute a privacy violation. Without consent, however, it most often would.” (p. 484) But the word “consent” appears just 24 times on 12 pages of the 84-page article, mostly to emphasize that a particular privacy violation happened, in part, because an action was taken without an individual’s informed consent.

Once a data holder has information about a data subject, the data holder can violate an individual’s privacy by employing a variety of privacy-invading information processing techniques. Finally, the data holders may disseminate the personal information in several privacy-invading ways. Separately, Solove considers privacy invasions that an individual may suffer. Thus, using the taxonomy, it is possible to decompose a single privacy-violating event into multiple kinds of privacy harms.

Solove’s taxonomy does not directly address the scale, network, growth, and movement of data within the ecosystem of data holders that has grown drastically since the early 2000s when the taxonomy was formulated. Nevertheless, privacy harms in today’s data economy can be readily categorized using the taxonomy. This strongly implies that while the modern data economy has created more opportunities for privacy harms, it is not creating fundamentally new kinds of privacy harms.

Solove’s taxonomy is summarized by its iconic diagram, which we reprint below as Figure 1, and display as a list in Figure 2:

Figure 1: Solove’s privacy taxonomy diagram.

Figure 2. Solove’s Taxonomy of Privacy [12]; text in parentheses is from Solove’s descriptions.

In creating this compilation, we noted that many areas of concern to privacy researchers in recent years do not fit neatly into only one of these categories. For example, early criticisms of Google Street View service focused on the perceived privacy harms of surveillance (the recording of street-level photos), aggregation (the assemblage of photos from all over the world), identification (the matching of photos to places), and disclosure (the revealing of true facts—the photographs). Although Street View could not function without all of these aspects, we classify it under Information Collection/Surveillance because of the FCC’s action against Google pertaining to Street View’s collection of wireless network traffic.

In addition, we characterize each case according to the following characteristics, using terms that we adopted for this purpose:

Scale - The number of people impacted, to the nearest order of magnitude. When discussing a privacy event, the number of people affected is a useful measure for putting the event into perspective. Scale is not a measure of the impact of the event on those people, as events that impact a small number of people typically have a larger impact on those people than events that impact thousands or millions.
Purview - We reviewed official statements from regulatory agencies, letters from corporations, news reports, and other material that we reference to determine the number of individuals that had direct knowledge of the actions leading up to the privacy event. We use the word purview to indicate knowledge or experience; we are silent on the issue of legal responsibility.
Awareness - Because of our selection methodology, all of the events in this article involved some aspect of intentional data use. Beyond the intent to use the data, sometimes the actors were aware that their actions might create a privacy event, while other times the privacy event was an unexpected outcome. We use yes to indicate that the privacy incident was the result of deliberate, intentional decisions to engage in a particular practice, while no indicates that those with purview were not aware that their actions would result in a privacy event.
Goal of Identifiability - Whether the technology, application, or focus of the privacy event was to single out individuals or equipment associated with individuals, including identifying a person with a group or characteristic. We use yes to indicate that the privacy event was focused on singling out individuals, while no indicates that the event was not focused on singling out individuals, even though the event may ultimately have had that result.

We find these characterizations useful for understanding why organizations engage in practices that impact the privacy experiences of customers, individuals, and the public at large.
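The Scale characteristic can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used to build the corpus: it assumes that "nearest order of magnitude" means rounding the base-10 logarithm of the head count, and the function names are hypothetical. The two sample figures (150 million individuals and a single consumer) are taken from incidents described in this article.

```python
import math


def scale_exponent(people_affected: int) -> int:
    """Return n such that 10**n is the nearest order of magnitude.

    Assumption: "nearest" is judged on a logarithmic scale, so the
    base-10 logarithm of the head count is rounded to an integer.
    """
    if people_affected < 1:
        raise ValueError("at least one person must be affected")
    return round(math.log10(people_affected))


def format_scale(people_affected: int) -> str:
    """Render the scale in the style used in the incident summaries."""
    n = scale_exponent(people_affected)
    return "1" if n == 0 else f"10^{n}"


# 150 million individuals in a marketing database.
print(format_scale(150_000_000))
# A single consumer whose telephone call was broadcast.
print(format_scale(1))
```

Because the exponent discards everything but the magnitude, two events that differ by a factor of two or three in head count can receive the same Scale value, which is consistent with using Scale only to put an event into perspective.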

Results

In this section, we briefly describe each of the 44 non-breach privacy incidents in our corpus. For each, we provide a brief name and description and the year that it took place. We provide a category and sub-category for each, using Solove’s taxonomy. For each we provide a scale, which is the number of people who were affected, expressed as the nearest power of 10. To the best of our ability based on published information, we present the purview of the event, as well as whether the event had the goal of identifiability. All of this is presented in Table 1.

Table 1. The compilation of incidents. "Paragraph" refers to the paragraph in this article where the incident is discussed. "Incident" is our title for the incident. "Year" is the year the incident took place. "Category" and "Sub-Category" refer to the Solove category we used to classify the incident. "Scale" refers to the order of magnitude of the number of people impacted by the incident. "Purview" refers to the number of people in the organization who were aware of the incident before it became publicly known. "Awareness" indicates whether the organization responsible for the privacy incident was aware of the privacy impact. "Goal of identifiability" indicates whether the goal of the incident was to identify or single out individuals.
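For readers who want to work with the compilation programmatically, the columns described in the Table 1 caption map naturally onto a simple record type. The following is a minimal sketch under stated assumptions: the class name, field names, and the choice to store Scale as an exponent are hypothetical and are not part of the published dataset. The two sample rows use the values given for incidents 1.1 and 1.2 in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Incident:
    """One row of the compilation; field names are illustrative."""
    paragraph: str                  # paragraph where the incident is discussed
    incident: str                   # our title for the incident
    year: int                       # year the incident took place
    category: str                   # Solove category
    sub_category: str               # Solove sub-category
    scale_exponent: int             # Scale is 10 ** scale_exponent people
    purview: str                    # e.g. "few", "organization"
    awareness: bool                 # True for "yes"
    goal_of_identifiability: bool   # True for "yes"


corpus: List[Incident] = [
    Incident("1.1", "Google Street View Wi-Fi Capture", 2007,
             "Information Collection", "Surveillance",
             8, "few", True, False),
    Incident("1.2", "Lower Merion School District spycam", 2010,
             "Information Collection", "Surveillance",
             3, "few", True, True),
]

# Example query: incidents that were not aimed at singling out individuals.
not_targeted = [i.paragraph for i in corpus if not i.goal_of_identifiability]
print(not_targeted)
```

A frozen dataclass is used here so that rows cannot be modified by accident once loaded; any tabular representation with the same columns would serve equally well.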

1. A — Information Collection

This section presents incidents involving the information collection activities of data holders. Surveillance events typically involve passive collection, while interrogation events involve interaction between the data subject and the data holder.

1.1 Google Street View Wi-Fi Capture (2007) A1 Surveillance

Scale: 10⁸; Purview: few [13],[14]; Awareness: yes; Goal of Identifiability: no

The Google Street View program involves driving cars on public roads, collecting photographs as the cars drive, and geolocating those photographs on Google’s online map products. As part of its Street View program, Google captured the location and Wi-Fi MAC address of every wireless router that it could identify so that Google could deploy a Wi-Fi–based geolocation service, similar to the service pioneered by Skyhook Wireless in 2003. In 2010 Google conducted a technical review of Street View and determined that, in addition to photographs and Wi-Fi geolocation data, Google’s cars also recorded and aggregated unencrypted Wi-Fi frames. Google commissioned an outside consulting firm to audit its practices and self-reported to multiple national regulatory agencies, which then conducted their own reviews. In the United States, the FCC concluded that the data Google captured included "names, addresses, telephone numbers, URLs, passwords, e-mail, text messages, medical records, video and audio files, and other information from Internet users in the United States" [15]. As a result of its investigation, the FCC assessed a $25,000 Notice of Apparent Liability (similar to a fine) against Google “for willfully and repeatedly violating an Enforcement Bureau directive to respond to a letter of inquiry” [16]. Separately, Google agreed to pay $7 million to 38 states and the District of Columbia to settle claims arising from the incident [17].

1.2 Lower Merion School District “spycam” (2010) A1 Surveillance

Scale: 10³; Purview: few; Awareness: yes; Goal of Identifiability: yes

System administrators at the Lower Merion, Pennsylvania school district installed software on laptop computers provided by the school district to high school students that secretly snapped photographs every 15 minutes and transmitted those photographs to servers operated by the school system [18]. After a student was disciplined at school for conduct in his bedroom, two parents filed suit against the school district for invading the students' privacy rights. During the course of the suit, it was revealed that more than 66,000 images of students had been secretly snapped and recorded and that two school staffers knew that the images were being recorded. Several students later alleged that the photos included images in which they were nude or partially dressed, and filed suit against the school system [19],[20],[21]. In October 2010 the school district settled the primary lawsuit for $610,000.

1.3 Newport Television KTVX(DT) Telephone Disclosure (2012) A1 Surveillance; C2 Disclosure

Scale: 1; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In August 2012, Newport Television LLC’s KTVX(DT) in Salt Lake City recorded and broadcast "a consumer's telephone conversation as part of a news segment without first telling the person that the call was being recorded and would be broadcast" [22]. Newport Television LLC agreed to pay a $35,000 civil penalty for the violation of the FCC’s Telephone Broadcast Rule in November 2014.

1.4 Brightest Flashlight (2013) A1 Surveillance

Scale: 10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

A popular Android operating system application called “Brightest Flashlight Free” was downloaded over 50 million times [23] by users and used to turn an Android phone into a flashlight. Unknown to users, the app also collected precise location and Device ID from the user’s phone and transmitted this data to third parties for the purpose of improving advertising messages. The FTC took action against the makers of the app, Goldenshores Technologies, LLC, for not disclosing the collection of personal information to users [24]. The company was required to improve its notification of users, to provide users with controls over the collection, use, and sharing of geolocation information, and to delete the data collected from users prior to the settlement.

1.5 BabyBus (2014) A1 Surveillance

Scale: 10⁶; Purview: unknown; Awareness: yes; Goal of Identifiability: yes

BabyBus created popular mobile applications designed to teach letters, numbers, and shapes to young children. In 2014 the FTC sent a letter to BabyBus, a Chinese developer of apps for children, warning that the company might be in violation of the Children’s Online Privacy Protection Act (COPPA) [25]. "Your apps, offered to users in nine languages, have been downloaded millions of times," the FTC wrote in its letter to BabyBus. "Several of your apps appear to collect precise geolocation information that is transmitted to third parties, including advertising networks and/or analytics companies. Under COPPA and its implementing Rule, 16 C.F.R. § 312 et seq..., developers of apps that are directed to children under 13—or that knowingly collect personal information from children under 13—are required to post accurate privacy policies, provide notice, and obtain verifiable parental consent before collecting, using, or disclosing any ‘personal information’ collected from children."

According to BabyBus, the geolocation information was collected by an “Android third-party statistics software plug-in” [26]. Google suspended the BabyBus apps from the Play Store a week after the FTC’s letter was publicized. The apps were later re-admitted to the app marketplace.

1.6 Yelp, TinyCo COPPA (2014) A2 Interrogation

Scale: 10⁶; Purview: few; Awareness: yes; Goal of Identifiability: yes

The online review site Yelp, Inc., and its mobile application developer TinyCo, Inc., settled an FTC action involving the collection of children’s information on mobile applications in violation of COPPA. The Yelp mobile application requested that users enter their date of birth, name, email address, and other personal information. In thousands of cases, the FTC alleged, children told Yelp’s app that they were under 13 but the app continued to collect their personal information. The FTC alleged that this was a violation of COPPA, as Yelp did not have written permission from parents to collect that information. Under the terms of the settlements, Yelp agreed to pay a $450,000 civil penalty, while TinyCo agreed to pay a $300,000 civil penalty [27].

1.7 Harvard University Classroom Covert Photography (2014) A1 Surveillance

Scale: 10³; Purview: few; Awareness: yes; Goal of Identifiability: yes

As part of an experiment measuring course attendance and completion rates, digital cameras placed in classrooms at Harvard University photographed students in order to electronically determine classroom attendance using facial recognition. Neither the professors nor the students in the courses were told that video monitoring would be taking place. Harvard’s Institutional Review Board (IRB), the body federally mandated to regulate human subjects research at the university, gave approval to the study on the grounds that the work “did not constitute human subjects research,” and thus did not require consent of those being monitored [28]. Following the disclosure of the surveillance, Harvard’s Vice Provost for Advances in Learning said that all of the collected images would be destroyed.

1.8 AddThis Canvas Fingerprinting (2014) A2 Interrogation

Scale: 10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

AddThis developed free website tools including a “sharing button” and “follow buttons,” making it easy for website operators to have buttons that allow users to post information from a website on social media such as Facebook and Twitter, and to “follow” the organization. The buttons are deployed on a website by including JavaScript code on the website that includes code from the AddThis website. Unknown to organizations using the technology, AddThis modified its code to include a technology called “canvas fingerprinting” [29] that allowed AddThis to uniquely identify and track every website visitor, irrespective of the use of “private browsing,” cookie deleting, or other privacy-signaling mechanisms. Because AddThis was used by thousands of top websites, it allowed AddThis to correlate browsing activity across a large percentage of the Internet’s users and properties [30]. Following the publicity of the tracking technique, some websites removed the AddThis technology.

1.9 Verizon “Perma-Cookie” (2014) A1 Surveillance

Scale: 10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Wireless provider Verizon injected a new header (“X-UIDH”) into unencrypted Hypertext Transfer Protocol (HTTP) requests sent from Verizon cell phones to websites. The header, which was added only to requests sent over the carrier’s wireless network (as opposed to Wi-Fi), contained a device-specific identifier that did not change, allowing websites to correlate activity from an individual cell phone as it moved from place to place [31]. Further testing revealed that AT&T also experimented with device-specific headers; AT&T stopped this practice in November 2014 [32]. In January 2015 Verizon announced that it would allow users to opt out of the UIDH advertising program [33]; as of June 2018, Verizon’s customer support website indicated that the UIDH advertising program was still operational, but that the UIDH headers were only sent on a limited basis [34].
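To make concrete why an unchanging header enables tracking, the following minimal sketch shows how a receiving website could group requests by the value of the X-UIDH header, even when the requests arrive from different network addresses. It is an illustration only: the function name, the token value, and the sample requests are hypothetical (the addresses come from ranges reserved for documentation), and it does not reproduce any code used by Verizon or by any website.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

UIDH_HEADER = "X-UIDH"  # header name as reported for the Verizon program


def correlate_by_uidh(
    requests: Iterable[Tuple[str, Mapping[str, str]]]
) -> Dict[str, List[str]]:
    """Group client addresses by the injected header value.

    Each request is a (client_ip, headers) pair. Requests without the
    header (for example, those sent over Wi-Fi) are ignored.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for client_ip, headers in requests:
        # HTTP header names are case-insensitive, so normalize them.
        normalized = {name.lower(): value for name, value in headers.items()}
        uidh = normalized.get(UIDH_HEADER.lower())
        if uidh is not None:
            seen[uidh].append(client_ip)
    return dict(seen)


# Hypothetical traffic: the same phone seen from two places, plus one
# request that carries no injected header.
sample_requests = [
    ("198.51.100.7", {"Host": "example.com", "X-UIDH": "hypothetical-token-A"}),
    ("203.0.113.42", {"Host": "example.com", "x-uidh": "hypothetical-token-A"}),
    ("192.0.2.15", {"Host": "example.com"}),
]

profiles = correlate_by_uidh(sample_requests)
print(profiles)
```

The grouping needs no cookie and no cooperation from the browser, which is why clearing cookies or using private browsing would not interrupt this kind of correlation.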

1.10 Nomi Technologies Wi-Fi Marketing (2015) A1 Surveillance

Scale: 10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Nomi Technologies developed technologies for tracking consumers entering stores based on identifiers transmitted by their mobile phones. Nomi posted a privacy statement on its website indicating that consumers could opt out of the tracking process on the Nomi website or in person at the stores; however, no opt-out process existed at the stores. FTC brought an action against Nomi and negotiated a settlement in which Nomi acknowledged misleading customers, promised that it would not mislead customers in the future, and agreed to FTC monitoring of its public statements and consumer complaints relating to the FTC action for a period of five years [35].

1.11 Pearson Twitter (2015) A1 Surveillance

Scale: 10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Pearson is a British-owned US company that publishes educational materials and assessment tests. In March 2015 the company informed the superintendent of a New Jersey high school district that one of the school district’s students had posted information about the Partnership for Assessment of Readiness for College and Careers (PARCC) test to Twitter [36]. The company said in a statement that it was “contractually required by states to monitor public conversations on social media to ensure that no assessment information (text, photos, etc.) that is secure and not public is improperly disclosed” [37]. The American Federation of Teachers issued a statement criticizing Pearson for not signing the Student Privacy Pledge “designed to limit the collection, maintenance and use of student personal information” [38].

2. B — Information Processing

This section presents incidents involving the information processing activities of data holders. In these cases, the incident resulted not from the collection of the data, but from its inappropriate use. We characterize these incidents using Solove’s taxonomy, which identifies five potential information processing harms: aggregation, identification, insecurity, secondary use, and exclusion.

2.1 Facebook News Feed (2006) B1 Aggregation

Scale: 10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Two years after its founding, Facebook launched News Feed, a new service within Facebook that aggregates status updates and changes in one’s “friends” and places them on each user’s Facebook home page. Previously users needed to check on each of their friends’ “walls” to see what they were doing. News Feed automatically aggregated all of this information. News Feed had the result of making information evident that was previously accessible but not prominently featured. For example, parents received details of their children’s lives that they previously had to seek out, potentially revealing more information than desired or expected. Facebook founder and CEO Mark Zuckerberg issued an apology for not including sufficient privacy controls into News Feed, saying “we really messed this one up” [39]. Facebook kept the News Feed as one of the primary ways that users interact with the website and attempted to address the privacy issues by adding a steadily growing and changing number of end-user controls [40].

2.2 Verizon Marketing with Consumer Information without Opt-Out (2006) B4 Secondary Use

Scale: 10⁶; Purview: organization; Awareness: yes; Goal of Identifiability: yes

The FCC holds that the Communications Act requires approval from consumers before a carrier can use consumer information for marketing purposes. However, between 2006 and 2008 Verizon used customer proprietary network information (CPNI) for marketing without first allowing the consumers to opt out or opt in. Verizon discovered the privacy event in September 2012 and reported it to the FCC on January 18, 2013 (126 days later). Verizon settled with the FCC, agreeing to pay $7.4 million, to create an internal compliance program, and “to notify consumers of their opt-out rights on every bill for the next three years” [41],[42].

2.3 Facebook Beacon (2007) B4 Secondary Use

Scale: 10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Facebook Beacon was an advertising tracking system that monitored what a Facebook user purchased on a non-Facebook website and then reported purchases in the news feeds of the user's friends [43]. For example, Beacon could report to a user’s friends when the user rented a movie at Blockbuster, purchased a movie ticket at Fandango, or purchased an engagement ring. An investigation by Computer Associates found that Facebook received information from partner websites even when the Facebook user had logged out of Facebook [44]. Facebook Founder and CEO Mark Zuckerberg apologized: “We’ve made a lot of mistakes building this feature, but we’ve made even more with how we’ve handled them” [45].

Facebook terminated Beacon in September 2009 and paid $9.5 million to resolve a class-action lawsuit resulting from the introduction of the service [46].

2.4 MIT Gaydar: Facebook friendships expose sexual orientation (2009) B1 Aggregation

Scale: 10³; Purview: few; Awareness: yes; Goal of Identifiability: yes

Students at the Massachusetts Institute of Technology developed a statistical model based on data from the Facebook social network graph that could accurately predict the sexual orientation of MIT community members. This was significant, as the model could predict the orientation even in cases where the individual chose not to make that information public. The model required a valid Facebook account within the MIT network in order to access the complete list of an individual’s Facebook “friends” [47].

The students trained the computer program on the social networks of 1,544 men whose Facebook profile indicated they were straight, 21 whose profile said they were bisexual, and 33 whose profiles claimed to be gay. They then tested the program on 947 men who did not report their sexuality on Facebook. The students reviewed 10 people in the sample whom they knew to be gay, and the program identified all 10 as being gay [48]. The project was heralded as an example of the power of social network analysis, and the students’ faculty advisors reported on numerous occasions that leaders in the MIT gay community confirmed that the program could identify people who did not make their sexual orientation public.

2.5 Target Pregnancy Forecasting (2010) B1 Aggregation

Scale: 10⁶; Purview: few; Awareness: yes; Goal of Identifiability: yes

At a talk at Predictive Analytics World, Andy Pole, a statistician at Target, explained how the company could infer whether some customers were pregnant by a sudden change in their buying habits [49]. By discovering that women started purchasing unscented hand lotions and some vitamins, Pole said that Target could proactively send the women coupons for baby items. The story was largely unnoticed at the time but received significant attention after an article appeared two years later in The New York Times [50] and an article about the Times article appeared in Forbes [51]. According to the Times article, women establish new buying habits when pregnant, and these new habits may last for 10 or more years, so companies like Target are highly motivated to influence those habits to the company’s advantage.

According to the Times article, the father of a teenage girl in Minnesota received the coupons and called Target to complain that they were inappropriate; he later called back to apologize when he learned that his daughter was actually pregnant. Other customers were similarly spooked, the Times reported, so Target started balancing the targeted advertisements with advertisements for lawn mowers so that the targeted women would think that receiving maternity-themed coupons was sheer coincidence.

Target refused to meet with the Times reporter while he was working on his story, and some commentators alleged that the Minnesota story is apocryphal, since the Times did not name the Target executive who provided the anecdote [52].

2.6 Apple iPhone Tracking Location (2011) B1 Aggregation

Scale: 10⁷; Purview: no one; Awareness: no; Goal of Identifiability: no

A programming error on Apple’s iPhone operating system caused the phone to remember the time and date of every Wi-Fi hotspot and cell phone tower it encountered. (The operating system collected this information and reported it back to Apple to assist in geolocation.) Users discovered that the iPhone’s database was copied when the phone was backed up to a desktop, and from there the database could be accessed by others, providing a database of where the user had been. After the bug was publicly disclosed, Apple acknowledged the error and issued a software update so that the iPhone (and the iPhone backups) would not retain more than seven days of data [53].

2.7 Uber “Rides of Glory” (2012) B4 Secondary Use

Scale: 10⁵; Purview: small group; Awareness: yes; Goal of Identifiability: yes

In a blog post, Uber’s data science team showed how data from the ride-hailing service could be used to find customers who spent the night at a place other than their primary residence (with implicit sexual overtones). The original blog post was removed after negative publicity regarding the misuse of transactional data [54], [55].

2.8 PaymentsMD Improper Collection (2012) B4 Secondary Use

Scale: 10³; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In 2014 the FTC filed a complaint against PaymentsMD, alleging that the company’s consumer-facing payment processing website solicited consent from consumers to obtain their complete medical record, which PaymentsMD then used to build an electronic health record (EHR) for a new business opportunity that the company was pursuing. The FTC alleged that consumers were deceived and misled into providing consent, even though consent was not required for the purpose of bill presentation and payment. Under the terms of the 2014 settlement with PaymentsMD and its former CEO Michael C. Hughes, the company was forced to destroy the healthcare information that it collected and was prohibited from engaging in similar practices in the future [56].

2.9 Facebook “Year in Review” (2014) B1 Aggregation; B4 Secondary Use

Scale: 10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In 2014, Facebook launched a “Year in Review” feature on its website that automatically evaluated photographs from each user’s photo history and prepared a collage customized for each user with the default tagline “It’s been a great year. Thanks for being a part of it.” These collages were prepared for all users, without opting in, and there was no way to opt out. In some cases, the photos produced pain and suffering as they reminded Facebook users of tragic events [57]. One example commonly cited was that of Eric Meyer of Cleveland Heights, Ohio, whose daughter had died from brain cancer during 2014: the daughter’s photo was prominently featured in the automatically generated collage with the tagline “See Your Year” [58], [59]. A Facebook product manager who oversaw the project apologized to Meyer and said that the company would do better in the future [60].

Facebook continued the practice of preparing “year in review” slideshows. Today it is common for photo management systems from Facebook, Google, and Apple to show users images from a few years earlier with the hope of triggering pleasant memories and increasing user interaction with the system.

3. C — Information Dissemination

Privacy incidents can result when an organization legitimately entitled to hold personal data releases that data in an inappropriate manner. Solove identifies seven categories of information dissemination harms: breach of confidentiality, disclosure, exposure, increased accessibility, blackmail, appropriation, and distortion.

3.1 Lotus Marketplace (1990) C4 Increased Accessibility

Scale:10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In April 1990 the Lotus Development Corporation [61] announced a partnership with Equifax to create Lotus Marketplace:Households, a targeted marketing platform. Delivered on CDROM, the system contained a compact disc database with data on 150 million individuals, each categorized by name, address, age, gender, marital status, household income, 50 lifestyle categories, and buying propensity for 100 specific products [62]. The $695 product included a software meter that would allow users to search on any field but only generate name and address reports of 5000 individuals.

Privacy activists expressed grave concern with Marketplace:Households shortly after it was announced. They noted that even though these kinds of data had long been collected and used by service providers to create targeted prospect lists for mailing and telemarketing, this would be the first time that the entire database would be put directly in the hands of users, allowing them to perform searches and generate lists without any oversight.

Lotus countered, saying that it considered privacy issues in producing the product, planned to limit purchasers to corporations, and required a license agreement that prohibited specific uses of the data. Lotus also claimed that protection mechanisms built into the software prevented a user from simply extracting the entire database—a claim that activists disputed. Privacy activists also said that consumers would be unable to opt out once each quarterly CDROM had been produced. The controversy ignited a grassroots campaign against the product. Lotus received more than 30,000 email messages from consumers demanding that their names be removed from the database. Soon afterward, Lotus terminated the project in October 1990, without ever releasing the product.

3.2 Massachusetts GIC (Group Insurance Commission) (1996) C1 Breach of Confidentiality

Scale:10⁴; Purview: organization; Awareness: yes; Goal of Identifiability: no

In 1996 the Massachusetts Group Insurance Commission released a dataset to healthcare researchers of records belonging to Massachusetts state employees who had been hospitalized. Then-governor of Massachusetts William Weld championed the release. “He said privacy would be protected because all identifiers had been eliminated from the records” [63]. Specifically, the state de-identified the records by removing each employee’s name and address, but the employee’s date of birth, ZIP code, and sex remained to allow for statistical analysis. Latanya Sweeney, then an MIT graduate student, obtained a copy of the GIC data and decided to look for the medical records of Governor Weld, who was hospitalized after collapsing during a graduation ceremony at Bentley College on May 18, 1996. Knowing that Weld lived in Cambridge and was almost certainly a registered voter, Sweeney purchased the city of Cambridge’s voter rolls for $20, used them to learn Weld’s birthday and ZIP code, and then used this information to find the corresponding medical records in the GIC data set [64], [65]. Partly as a result of this study, the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule later established a de-identification standard requiring suppression of 18 different data fields, including days and months, and generalization of ZIP codes to the first three digits [66].
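The linkage described above can be illustrated with a minimal sketch. It assumes that both datasets share the three quasi-identifiers named in the text (date of birth, ZIP code, and sex); the records, field names, and the `link` helper are hypothetical and are not drawn from the GIC data, the Cambridge voter rolls, or Sweeney's actual procedure.

```python
from collections import defaultdict

# Hypothetical toy records; none of these values come from real data.
deidentified_records = [
    {"birth_date": "1950-01-15", "zip": "02139", "sex": "F", "diagnosis": "A"},
    {"birth_date": "1948-06-02", "zip": "02138", "sex": "M", "diagnosis": "B"},
    {"birth_date": "1948-06-02", "zip": "02139", "sex": "M", "diagnosis": "C"},
]
voter_roll = [
    {"name": "Voter One", "birth_date": "1948-06-02", "zip": "02138", "sex": "M"},
    {"name": "Voter Two", "birth_date": "1950-01-15", "zip": "02139", "sex": "F"},
    {"name": "Voter Three", "birth_date": "1971-09-30", "zip": "02139", "sex": "F"},
]

# Fields left in the "de-identified" release that also appear in the public roll.
QUASI_IDENTIFIERS = ("birth_date", "zip", "sex")


def quasi_key(row):
    """Return the tuple of quasi-identifier values for one row."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(records, roll):
    """Map a voter name to a diagnosis when the quasi-identifiers match exactly one voter."""
    names_by_key = defaultdict(list)
    for voter in roll:
        names_by_key[quasi_key(voter)].append(voter["name"])

    matches = {}
    for record in records:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # an unambiguous match re-identifies the record
            matches[names[0]] = record["diagnosis"]
    return matches


matches = link(deidentified_records, voter_roll)
print(matches)
```

In this toy example two of the three records are re-identified even though no names or addresses were released, because each combination of quasi-identifiers matches a single voter.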

3.3 Eli Lilly Prozac Mailing List (2001) C1 Breach of Confidentiality

Scale:10²; Purview: few (1); Awareness: no; Goal of Identifiability: no

Between March 15, 2000 and June 22, 2001, Eli Lilly and Company, a major US pharmaceutical company, operated an e-mail reminder service that allowed patients to sign up on Prozac.com to receive messages reminding them to refill their prescriptions for the antidepressant. On June 27, 2001, a Lilly employee sent an email message to all of the service’s 669 subscribers informing them that the service was discontinued. Unfortunately, the email message listed all of the subscribers in the “To:” field of the email message, effectively providing each subscriber with a complete list of the other service subscribers. The FTC negotiated a settlement with Lilly in which the company agreed to establish an information security program to protect personal data [67].
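One common way to avoid this kind of exposure is to address each notice to a single recipient rather than placing the whole list in one header. The following is a minimal sketch using Python's standard `email.message.EmailMessage`; the addresses, subject, and the `build_notices` helper are hypothetical, and the sketch does not describe how Lilly's system worked or what its later security program required.

```python
from email.message import EmailMessage


def build_notices(subscribers, sender, subject, body):
    """Build one message per subscriber so no recipient sees another's address."""
    messages = []
    for address in subscribers:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address  # exactly one recipient per message
        msg["Subject"] = subject
        msg.set_content(body)
        messages.append(msg)
    return messages


# Hypothetical addresses for illustration only.
subscribers = ["a@example.com", "b@example.com", "c@example.com"]
notices = build_notices(
    subscribers,
    sender="service@example.com",
    subject="Service discontinued",
    body="The reminder service has ended.",
)
print(len(notices), "messages prepared")
```

Each message would then be handed to a mail server separately. The trade-off is more messages to send, in exchange for never serializing the full subscriber list into any header.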

3.4 JetBlue Releases Customer Data to DHS (2002) C1 Breach of Confidentiality

Scale:10⁶; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In September 2002, jetBlue Airways provided 5 million “passenger name records” to Torch Concepts, a defense contractor developing a counterterrorism tool based on “data pattern analysis” [68].

Following the terrorist attacks of September 11, 2001, Torch Concepts of Huntsville, AL, approached the Department of Defense (DoD) with a proposal to use “data pattern analysis” to evaluate the risk posed by visitors to DoD installations. Briefly, the approach was to combine consumer reporting and demographic information with travel information to create risk analysis models. DoD added Torch Concepts as a subcontractor to an existing contract in March 2002 to perform a “limited initial test” of the technology. Torch made numerous approaches to federal agencies to obtain information but was unsuccessful. After unsuccessfully approaching American Airlines and Delta Airlines, Torch contacted the Department of Transportation (DOT) and the Transportation Security Administration (TSA). Finally, after it was approached by “a relatively new” TSA employee, jetBlue agreed to provide Torch with assistance. Torch engaged the data aggregation firm Acxiom to handle aspects of its data processing. In September 2002 jetBlue provided Acxiom with five million records, representing 1.5 million passengers. In October 2002, Torch purchased additional information from Acxiom.

Based on its analytics, Torch prepared a presentation concluding that “several distinctive travel patterns were identified” in the data and that “known airline terrorists appear readily distinguishable from the normal jetBlue passenger patterns” [69]. This presentation was eventually discovered on the Internet by members of the public and the media. As a result, the Department of Homeland Security (DHS) Privacy Office investigated. The DHS Privacy Office concluded that while TSA employees were involved in the data transfer and “acted without appropriate regard for individual privacy interests or the spirit of the Privacy Act of 1974,” no actual Privacy Act violation had taken place, since the data were transferred directly from jetBlue to Acxiom.

3.5 Release of “de-identified” AOL search logs for research (2006) C2 Disclosure

Scale:10⁵; Purview: group; Awareness: yes; Goal of Identifiability: no

A group of researchers at the consumer Internet provider America Online (AOL) released a series of search queries made by AOL subscribers using the AOL search engine to assist academic researchers working in the area of Internet search and text retrieval. Prior to release, AOL removed the users’ identifying information and replaced it with a randomly generated pseudonym so that subsequent searches by the same individual could be correlated. Journalists were able to identify several users from their search terms and contacted the users to verify the re-identification. “There are also many thousands of sexual queries, along with searches about ‘child porno’ and ‘how to kill oneself by natural gas’ that raise questions about what legal authorities can and should do with such information,” read an article in The New York Times [70], [71]. Although AOL apologized for the release [72], researchers noted that other Internet search engines had released de-identified user search histories in 1999 and 2001 [73].
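The weakness of this kind of pseudonymization can be shown with a short sketch. It assumes a log of (user, query) pairs and replaces each user with a random token, as the paragraph above describes; the log contents and the `pseudonymize` and `profiles` helpers are hypothetical and do not reproduce AOL's actual release process.

```python
import secrets
from collections import defaultdict


def pseudonymize(log):
    """Replace each user name with a stable random pseudonym."""
    pseudonyms = {}
    used = set()
    released = []
    for user, query in log:
        if user not in pseudonyms:
            token = secrets.token_hex(4)
            while token in used:  # keep pseudonyms unique
                token = secrets.token_hex(4)
            used.add(token)
            pseudonyms[user] = token
        released.append((pseudonyms[user], query))
    return released


def profiles(released):
    """Group released queries by pseudonym, as any recipient of the data could."""
    grouped = defaultdict(list)
    for pseudonym, query in released:
        grouped[pseudonym].append(query)
    return dict(grouped)


# Hypothetical search log for illustration only.
log = [
    ("alice", "landscapers in smalltown"),
    ("bob", "weather tomorrow"),
    ("alice", "homes sold in smalltown"),
]
released = pseudonymize(log)
by_pseudonym = profiles(released)
print(by_pseudonym)
```

The user names are gone, but every query from the same person still shares one pseudonym, so the combined search history can point to an individual even though no single query does.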

Following the release, a class action lawsuit, Landwehr v. AOL Inc., alleged that AOL violated specific privacy and consumer protection laws by publicly releasing some of its users’ search queries. On May 28, 2013, a federal court approved a settlement of up to $5 million in the case in which AOL admitted no wrongdoing. A settlement fund allowed affected AOL users to claim up to $100 each [74], [75].

3.6 Netflix Prize (2006) C2 Disclosure

Scale:10⁵; Purview: organization; Awareness: yes; Goal of Identifiability: no

To spur academic research in data mining, the video rental firm Netflix released customer video rental histories for roughly 480,000 Netflix customers that included the rental date and the customer's rating. Netflix tried to protect customer privacy by replacing customer names with a unique number. The dataset included no other direct identifiers. Netflix offered a prize of $1 million to the winning team that could develop a recommendation algorithm that performed better than the internal Netflix algorithm. Arvind Narayanan and Vitaly Shmatikov, two researchers at the University of Texas, developed an approach for identifying some of the records in the Netflix Prize dataset by correlating the video ratings with ratings in IMDb, a publicly available dataset. Unlike the Netflix set, the IMDb dataset also included the names or other identifiers of the individuals who performed the ratings. By using the IMDb dataset, the researchers showed that they could discover additional movies that a Netflix subscriber might have watched but not publicly rated on IMDb. As a result of the release of the Netflix data and the company’s announcement of a second contest, the FTC sent a letter of inquiry to Netflix [76], and a class action lawsuit accused Netflix of violating fair-trade laws and the Video Privacy Protection Act [77]. Four months later, Netflix announced that it had settled the lawsuit and canceled the second contest [78].

3.7 Google Street View Photography Capture (2007) C4 Increased Accessibility

Scale:10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In 2007, Google released Street View, street-level photographs of streets, houses and businesses taken from vehicles that Google had driven in major US cities. (The program was later expanded worldwide.) Privacy activists criticized Street View for collecting these photographs without permission. Although the photographs had been taken from public streets, Street View made it possible for people around the world to see and share imagery that was previously difficult to collect. Because Street View was based on automated collection and processing of the geolocation-tagged images, many potentially embarrassing images were found and publicized by the general public before they could be reviewed and removed by Google. Privacy activists also raised concerns about the fact that Google’s vehicles took photographs inside of houses if the windows were open. Google responded by blurring faces and license plates and creating a mechanism for individuals to request that images be removed [79], [80]. The government of Italy levied a €1 million ($1.4 million) fine against Google for taking Street View photographs from cars not clearly marked as belonging to Google, and for the interception of unencrypted Wi-Fi signals [81].

3.8 Jerk.com (2009-2015) C5 Blackmail; C6 Appropriation

Scale:10⁷; Purview: small organization; Awareness: yes; Goal of Identifiability: yes

In a 2014 enforcement action, the FTC found that John Fanning created the website Jerk.com, downloaded up to 85 million individual user profiles from Facebook, labeled some of the people as a “Jerk” or “not a Jerk,” and then offered users $30 “memberships” to his website. The memberships allegedly gave Jerk.com users the ability to “manage your reputation” and to “dispute” the information posted online [82]. The FTC ruled against Jerk LLC in March 2015. Fanning appealed the FTC decision to the First Circuit Court of Appeals, which affirmed the Commission’s summary decision except for the provision regarding ongoing monitoring of Fanning.

3.9 Facebook “Like” (2009) C2 Disclosure

Scale:10⁸; Purview: organization; Awareness: yes; Goal of Identifiability: yes

In 2009 Facebook created a “like” button which allowed users of the Facebook platform to click “like” on items in their Facebook News Feed to indicate that they approved of a posting or message. Code on the Facebook platform collected all of the Facebook users who “liked” an item and displayed the total numbers to Facebook users, as well as the names of any of their friends who might like something. The Facebook “like” button also allowed third parties to place “like” buttons on their own web properties. As with the “like” button in the newsfeed, Facebook users visiting those third-party websites would see how many other users “liked” the web property, and who among those likers were their Facebook “friends.” As of this article's publication, the Facebook "like" button was still operational.

3.10 CVS Caremark and Rite Aid Improper Disposal of Sensitive Documents (2009, 2010) C1 Breach of Confidentiality

Scale:10⁷; Purview: small groups; Awareness: yes; Goal of Identifiability: no

Investigations by the FTC and the US Department of Health and Human Services found that both CVS Caremark [83] and Rite Aid [84] improperly disposed of pharmacy-related information in open dumpsters behind their stores. In the case of CVS, “media reports from around the country [indicated] that its pharmacies were throwing trash into open dumpsters that contained pill bottles with patient names, addresses, prescribing physicians’ names, medication and dosages; medication instruction sheets with personal information; computer order information from the pharmacies, including consumers’ personal information; employment applications, including social security numbers; payroll information; and credit card and insurance card information, including, in some cases, account numbers and driver’s license numbers.” In the case of Rite Aid, the personal information included “pharmacy labels and job applications.”

In both cases the pharmacy chains were penalized by the FTC for deceptive trade practices, both having publicly claimed to respect consumer privacy and to properly safeguard protected health information. CVS, the largest pharmacy chain in the US, “agreed to pay $2.25 million and implement a Corrective Action Plan to ensure that it will appropriately dispose of protected health information such as labels from prescription bottles and old prescriptions” [85]. Rite Aid and 40 affiliated entities agreed to pay $1 million, as well as “to take corrective action to improve policies and procedures to safeguard the privacy of its customers when disposing of identifying information on pill bottle labels and other health information” [86].

3.11 Google Buzz (2010) C2 Disclosure

Scale:10⁸; Purview: small group; Awareness: yes; Goal of Identifiability: yes

In 2010 Google launched a social networking service called Google Buzz as a complement to its Gmail e-mail platform. When Google users logged into Gmail on the day Buzz launched, they were encouraged to automatically sign up for Buzz. Users who signed up for Buzz were automatically configured to “follow” the Gmail users that they “email and chat with the most,” and this list of followers became publicly available, violating Gmail’s privacy policy. Information in some users’ Buzz public profiles was augmented with information from other Google products, including Picasa (photo sharing) and Reader (news reading). In February 2010 the Electronic Privacy Information Center filed a complaint before the FTC requesting an investigation of Google. Google and the FTC reached a preliminary agreement in March 2011 and a final agreement in October 2011, in which Google agreed to establish a comprehensive privacy program and be subject to regular, independent privacy audits for 20 years [87].

3.12 Snapchat (2011) C1 Breach of Confidentiality

Scale:10⁶; Purview: small group; Awareness: yes; Goal of Identifiability: yes

In 2011, mobile application developer Snapchat launched a service that allowed users to send “disappearing” photos to each other. By default, the photos were visible for 10 seconds, after which time the company promised the photos would be deleted. In its FAQ the company promised that “snaps disappear after the timer runs out” and stated they could not be recovered. In fact, the snaps were not removed from consumers’ phones, only made invisible. Furthermore, the snaps remained accessible on the company’s servers through an API. Snapchat further promised that it did not “ask for, track, or access any location-specific information from your device at any time while you are using the Snapchat application,” when in fact it had integrated an Android analytics tracking service into its application.

The FTC filed a complaint against Snapchat. As a result of the complaint, Snapchat agreed to establish a comprehensive privacy program and submit to third-party monitoring of its privacy practices for a period of 20 years, and direct monitoring by the FTC for a period of five years [88].

3.13 Uber “God View” (2011) C1 Breach of Confidentiality

Scale:10⁵; Purview: small group or organization; Awareness: yes; Goal of Identifiability: yes

Uber’s “God View” was a tool that showed the present location of every Uber vehicle in a geographical area. The company developed the tool to allow it to see operations and direct vehicles to areas that lacked service. Reportedly the tool allowed the operator to see the names of individuals in Uber vehicles, or to monitor a rider’s history. In 2011 the tool was demonstrated at a party [89]. In 2013 a person who interviewed for a job at Uber’s Washington office was given access to the “God View” application for a full day following the person’s interview, during which time he was able to review travel records of people that he knew [90].

In a letter to Senator Al Franken [91], Uber confirmed that it had created a tool that allowed individuals in Uber’s operations department to view the location of every car, and admitted that one of its employees had looked at real-time information on a journalist.

The Federal Trade Commission conducted several investigations of Uber as a result of the company’s business practices and a 2014 data breach. On January 19, 2017, Uber agreed “to pay $20 million to settle FTC charges that it recruited prospective drivers with exaggerated earnings claims” [92]. The settlement was amended in August 2017 to include 20 years of privacy assessments by a third party [93]. The settlement was amended yet again in April 2018 to cover an unrelated cyberattack that took place in 2016 that was “strikingly similar [to the] 2014 breach” [94].

3.14 Location Sharing in Facebook Messenger (2012) C2 Disclosure

Scale:10⁸; Purview: small group; Awareness: yes; Goal of Identifiability: yes

Facebook Messenger is an instant messaging service that allows users to communicate with each other using a “chat” interface. From 2011 until May 2015, Facebook tagged every message sent by Facebook’s Android mobile app with the location of the sender. These locations were shared with all users in a group chat, irrespective of the users’ relationships or privacy settings. The location sharing was discovered by Aran Khanna, a Harvard University undergraduate, who hypothesized that there was no public outcry regarding the casual sharing of location information because users were either not aware of the sharing or were not concerned about the collection and visibility of their locational data. Khanna developed a tool that allowed users to display a map of all of the location data they shared with other users through Facebook Messenger chats—information that was already available through the Facebook user interface, but in a more aggregated form. The tool was downloaded more than 85,000 times, and more than 170 global news publications wrote about the article. Nine days after the tool’s release, Facebook made location sharing an opt-in feature, demonstrating that “sufficient public attention may be necessary for redress of reported privacy concerns” [95].

3.15 Release by Washington State of de-identified Patient Health Records (2013) C1 Breach of Confidentiality

Scale:10¹; Purview: organization; Awareness: yes; Goal of Identifiability: no

Acting under state law, Washington State released de-identified hospital discharge records to assist in healthcare policy analysis. Researchers demonstrated that discharge records for hospitalizations resulting from accidents could occasionally be re-identified manually by correlating information in the discharge records with newspaper articles describing the accident that caused the hospitalization [96].

3.16 Revenge Porn (2015) C3 Exposure, C6 Appropriation

Scale:10³; Purview: few (1); Awareness: yes; Goal of Identifiability: yes

Website operator Craig Brittain had been operating a so-called “revenge porn” site that solicited nude photographs of women and posted the photographs with the women’s names. Brittain also operated other websites, “Takedown Hammer” and “Takedown Lawyer,” which accepted money from victims and caused the photos to be taken down. Brittain agreed to refrain from posting nude photographs or videos of people without their affirmative consent. Brittain also agreed to 10 years of monitoring by the FTC of any new business he started or employment that he took [97].

3.17 Healthcare.gov ad tracking (2015) C1 Breach of Confidentiality

Scale:10⁶; Purview: small group; Awareness: yes; Goal of Identifiability: yes

Analytics software deployed on the US Government’s Healthcare.gov website transmitted personal data, including age, smoking status, pregnancy status, parental status, zip code, state, and income to at least 14 third-party analytics and marketing firms [98]. Following Congressional hearings, the website’s operators responded by adding a privacy control panel that allows web visitors to disable tracking from their computer [99].

4. D — Invasion

Invasion is a fundamentally different kind of privacy offense than those examined above. While Collection, Processing, and Dissemination all involve information that’s taken from a data subject, invasion involves doing something to the data subject. Invasion directly impacts the subject and forces a reaction. Note that there is a significant difference between invasion and interrogation: although both are the result of an interaction between the data subject and the perpetrator, in invasion the harm is caused by the interaction, while in interrogation the purpose of the interaction is the extraction of personal information.

4.1 Commercial Spam Email (1995-) D1 Intrusion

Scale:10⁸; Purview: organization (small); Awareness: yes; Goal of Identifiability: no

Large-scale commercial use of unsolicited email started in 1994 [100] and grew rapidly. Spam mail was highly intrusive from approximately 1995 to 2005, until security and filtering techniques largely prevented it from reaching the inboxes of victims. Today spam mail is widely sent but is more frequently an annoyance, as filtering occasionally causes legitimate mail to be missed. In 2010, studies found that 88 percent of the worldwide email traffic was spam, amounting to roughly 90 billion email messages sent to valid email addresses each day [101]. A study published in 2012 estimated that the cost of spam to American firms and consumers was almost $20 billion annually, while spammers and spam-advertised merchants received less than $200 million per year as a result of their efforts. “Thus, the ‘externality ratio’ of external costs to internal benefits for spam is around 100:1” [102].
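The quoted externality ratio follows directly from the two dollar figures above. The sketch below reproduces the arithmetic, assuming the rounded bounds stated in the text ($20 billion in external costs and $200 million in spammer revenue) rather than the study's exact estimates; the `externality_ratio` helper is a hypothetical name.

```python
def externality_ratio(external_cost, internal_benefit):
    """Return external cost per dollar of benefit to the party causing it."""
    return external_cost / internal_benefit


# Rounded figures from the text, in US dollars per year (assumed bounds).
annual_cost_to_society = 20_000_000_000
annual_revenue_to_spammers = 200_000_000

ratio = externality_ratio(annual_cost_to_society, annual_revenue_to_spammers)
print(f"externality ratio is about {ratio:.0f}:1")
```

Because the text gives the cost as "almost" $20 billion and the revenue as "less than" $200 million, the result is best read as an order-of-magnitude estimate, consistent with the study's "around 100:1."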

4.2 Facebook Get Out the Vote experiment (2010) D2 Decisional Interference

Scale:10⁷; Purview: organization; Awareness: yes; Goal of Identifiability: yes

Researchers from the University of California, San Diego and Facebook conducted a randomized controlled trial on 61 million Facebook users during the 2010 US congressional elections to see if they could motivate individuals to vote. Some users saw messages in their newsfeed allowing them to post that they voted to their newsfeeds and showing them the names and faces of their Facebook friends who voted. The results indicated that those who saw the message were 0.39% more likely to vote than those who received no messages at all.

“First and foremost, online political mobilization works. It induces political self-expression, but it also induces information gathering and real, validated voter turnout,” the authors of the study noted. “Furthermore, as many elections are competitive, these changes could affect electoral outcomes. For example, in the 2000 US presidential election, George Bush beat Al Gore in Florida by 537 votes (less than 0.01% of votes cast in Florida). Had Gore won Florida, he would have won the election” [103]. The implication is that, by determining the political leanings of an individual and then targeting specific individuals with “get out the vote” messages, major social media providers might be able to influence the outcome of closely contested elections.

4.3 Dialing Services, LLC automated calls to cellphones (2011) D1 Intrusion

Scale:10⁶; Purview: organization; Awareness: yes; Goal of Identifiability: no

The FCC alleged that Dialing Services, LLC placed automated phone calls with “artificial or prerecorded” messages to millions of wireless phones without authorization, a violation of the Communications Act and the Commission’s rules that prohibit “robocalls” and “autodialed calls” to wireless phones when not made for emergency purposes or with prior express consent [104], [105], [106].

In 2012 the FTC announced a series of contests and challenges to spur inventors to develop technical solutions for fighting automated callers [107]. The grand prize went to Daniel Klein and Dean Jackson for a system called Nomorobo, which used simultaneous call technology to suppress robot calls from phone numbers that appeared on a blacklist [108]. However, within a few years robot calls were once again a major problem, largely the result of technology allowing robot callers to spoof caller-ID technology [109].

4.4 Sprint “Do Not Call” violations (2011, 2014) D1 Intrusion

Scale:10⁶; Purview: no one; Awareness: no; Goal of Identifiability: no

Sprint Corporation placed unwanted marketing calls and texts to consumers who had asked to be placed on the company’s “do not call” list. In 2011 Sprint paid the FCC a $400,000 fine following the negotiation of a consent decree. However, Sprint continued to place calls and send text messages to consumers, a violation of the Telephone Consumer Protection Act. Three years later, Sprint settled with the FCC, agreeing to pay an additional $7.5 million and to implement “a two-year plan to ensure compliance with FCC requirements designed to protect consumer privacy and prevent consumers from receiving unwanted telemarketing calls” [110].

4.5 Facebook Emotional Contagion Experiment (2012) D2 Decisional Interference

Scale:10⁵; Purview: small group; Awareness: yes; Goal of Identifiability: no

Facebook intentionally manipulated the news feeds of 689,003 Facebook users to determine if it could change their emotions by controlling the information they saw. In an article published in the Proceedings of the National Academy of Sciences of the United States of America, the authors concluded: “We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues” [111].

Following the publication, there was considerable outcry in both the news media and the academic community that the researchers experimented on Facebook users without their permission and without giving the users the ability to opt out. Furthermore, even though two of the study’s authors were affiliated with Cornell University, the Cornell Institutional Review Board, the organization at Cornell that reviews human subjects research, did not approve the study. Following the outcry, the editor of PNAS published an “Editorial Expression of Concern and Correction.” The editorial noted that the original paper stated that the research “was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.” However, Facebook only added the term “research” to its data policy four months after the study took place [112]. Furthermore, the authors reported to the journal that “[b]ecause this experiment was conducted by Facebook, Inc. for internal purposes, the Cornell University IRB [Institutional Review Board] determined that the project did not fall under Cornell’s Human Research Protection Program” [113].

4.6 Spying/Stalking Apps on Mobile Phones (2015) D1 Intrusion

Scale:10⁶; Purview: organization; Awareness: yes; Goal of Identifiability: yes

There is growing attention to apps on mobile phones that covertly collect geolocation, application use, screen displays, and user interaction and send this information to third parties. Such apps are reportedly used by men to spy on their ex-girlfriends and by employers to spy on their employees [114]. The FTC publishes consumer information for victims of violence and stalking [115].

Analysis of Events

An aggregate-level examination of the characteristics of the privacy events reveals several trends. First, while the timeline includes 32 events from 1990 through 2015, 17 of those events occurred between 2010 and 2015. The number of privacy events per year increased as did the number of years per decade that experienced events. This trend of increasing non-breach privacy events is shown in Figure 3.

Figure 3: Privacy events between 1990 and 2015.