Who Knows What About Me? A Survey of Behind the Scenes Personal Data Sharing to Third Parties by Mobile Apps

Jinyan Zang; Krysta Dummit; James Graves; Paul Lisker; Latanya Sweeney

Abstract

What types of user data are mobile apps sending to third parties? We chose 110 of the most popular free mobile apps as of June-July 2014 from the Google Play Store and Apple App Store, across 9 categories likely to handle potentially sensitive data about users including job information, medical data, and location. For each app, we used a man-in-the-middle proxy to record HTTP and HTTPS traffic that occurred while using the app and looked for transmissions that include personally identifiable information (PII), behavior data such as search terms, and location data, including geo-coordinates. An app that collects these data types may not need to notify the user in current permissions systems.

Results summary: We found that the average Android app sends potentially sensitive data to 3.1 third-party domains, and the average iOS app connects to 2.6 third-party domains. Android apps are more likely than iOS apps to share with a third party personally identifying information such as name (73% of Android apps vs. 16% of iOS apps) and email address (73% vs. 16%). For location data, including geo-coordinates, more iOS apps (47%) than Android apps (33%) share that data with a third party. In terms of potentially sensitive behavioral data, we found that 3 out of the 30 Medical and Health & Fitness category apps in the sample share medically-related search terms and user inputs with a third party. Finally, the third-party domains that receive sensitive data from the most apps are Google.com (36% of apps), Googleapis.com (18%), Apple.com (17%), and Facebook.com (14%). 93% of Android apps tested connected to a mysterious domain, safemovedm.com, likely due to a background process of the Android phone. Our results show that many mobile apps share potentially sensitive user data with third parties, and that they do not need visible permission requests to access the data. Future mobile operating systems and app stores should consider designs that more prominently describe to users potentially sensitive user data sharing by apps.

Introduction

Since the introduction of the Apple App Store in 2007 and the Google Play Store in 2008, smartphones (and more recently, tablets and other devices running mobile operating systems) have become a dominant means of personal computing. Smartphones run programs called applications or “apps,” and the Apple App Store and Google Play Store make millions of apps available for all kinds of uses. Google reports one billion monthly active users (MAUs) on its Android platform [1], and estimates put iOS MAUs at 500 million-600 million [2]. More than 1.5 million different apps are available to users on both the App Store and Play Store [59], with the average consumer using 26 apps per month [93].

Given the popularity of apps on smartphones, consumers worry about how much personal information apps share. In a survey of more than 2,000 Americans, the Pew Research Center found that 54% of users decided to not install an app after learning about how much personal information they would need to share to use it [4]. Pew indicated that 30% of users reported uninstalling an app already on their phone because they learned that it collected personal information that they did not want to share [4]. Similar rates of avoiding apps or uninstalling apps due to privacy concerns are seen for both iOS and Android users [4]. Consumers are sensitive about the collection of geolocation data, with 30% of smartphone owners turning off the location tracking feature of their phone owing to concerns about who might access that information [4]. In a different survey of more than 1,100 Americans, 70% of respondents said that they would “definitely not allow” a cellphone provider to use their location to tailor ads [5]. In another survey of more than 3,100 Americans, 60% reported being “very upset” if an app shares their location with an advertiser [6].

Governments are starting to address data collection and sharing in apps. For example, California’s Online Privacy Act, last amended in 2013, requires developers to have privacy policies that state whether third parties can collect personally identifiable information on users [7]. In 2013, the Federal Trade Commission (FTC) pursued Goldenshore Technologies, LLC for violating the Federal Trade Commission Act, because the company’s “Brightest Flashlight Free” app had a privacy policy that did not reflect the app’s use of personal data, including location data, and because the app presented consumers with a false choice about sharing location data [8]. Internationally, in 2013, the European Union’s Article 29 Data Protection Working Party ruled that the 1995 Data Protection Directive and the ePrivacy Directive apply to all mobile apps regardless of the developer’s country of origin, and that users must first give consent before an app can install or access any information from their device and send it to third parties [9].

Why do apps trigger concerns from consumers and governments? First, an app may share a unique IDs related to a device such as a System ID, SIM card ID, IMEI, MEID, MAC address, UDID, etc. The ID can be used to track an individual [10, 11]. Second, an app can request user permission to access device functions and potentially personal or sensitive data, with the most popular requests being access to network communications, storage, phone calls, location, hardware controls, system tools, contact lists, and photos & videos [12, 13]. Some apps practice over-privileging, where the app requests permissions to access more data and device functions than it needs for advertising and data collection [14, 16, 17, 18, 19]. Third, any data collected by the app may be sent to a third party, such as an advertiser [10, 46, 58]. Fourth, a user may have a hard time understanding permission screens and other privacy tools in a device’s operating system [15, 21].

In 2010, a Wall Street Journal (WSJ) study raised concerns about the amount of data sharing by apps. The WSJ conducted a survey of 101 popular Android and iOS apps [22]. By using network analysis to examine the data transmitted by different apps, the WSJ found that 56 apps sent the device’s unique ID to third parties without a user’s awareness or consent [22]. Forty-seven apps sent the device’s location [22]. Five sent age, gender, and other personal details to third parties [22]. One major beneficiary of this data sharing was Google, with its AdMob, AdSense, Analytics and DoubleClick products receiving data from 38 of the 101 apps tested [22]. Publication of the Wall Street Journal report prompted multiple lawsuits against Apple, Pandora, The Weather Channel, Dictionary.com, and 5 other app developers in the US and Canada [23, 24, 25].

The marketplace for apps has changed significantly since 2010. Certain regulators, such as California’s Attorney General, now require privacy policies for all mobile apps in the belief that policies will specify personal data collection and sharing [7]. In 2012, Apple’s App Stores started displaying an app’s privacy policy before allowing a user to download the app [26]. Apple started phasing out UDID as a unique tracking identifier in 2011 [27] and MAC addresses and cookies in 2013 [28, 29] in favor of its own IDFA (ID For Advertisers) system, which allows a user to opt out of sharing their IDFA for tracking purposes [30]. Google launched App Ops for Android, which allows a user to toggle permissions by app after installation, in 2013, but removed the feature later the same year [31]. Google re-launched these features in May 2015 as part of Android M [32]. While operating system designers, Apple and Google, are moving in the direction of providing a user more controls over data sharing, mobile advertising has blossomed in recent years, growing from 5% of total digital advertising in 2010 to 37% in 2014, or $19 billion [22, 33]. One fast-growing area of mobile advertising is location-based ad targeting that requires sharing of geo-location data. Forecasts indicate that location-based ads will account for 52% of mobile ad spending by 2017 [34].

Given these changes in the app marketplace, our study examines how frequently apps share geo-location information. In addition, what other kinds of personal data are apps sharing today, and with what parties?

Background

There are three main approaches to surveying data sharing by mobile apps: permissions analysis, static code analysis and dynamic analysis.

Permissions analysis examines the permission requests from an app either before installation or during use as disclosed to the user, usually on the app’s download page in the Google Play Store or Apple App Store [12, 13, 35, 36, 37]. The benefit of this approach is that it allows efficient review of thousands of apps at once. The shortcoming is that the review is only at a high level, without knowing whether the app actually collects the requested data and who receives it [43]. One study of over 22,000 Android apps found that free apps and “look-alike” apps with names similar to popular ones request more permissions [13]. There is a correlation between number of downloads and the number of permission requests. The greater the number of downloads, the more likely the app requests more permissions [13]. Barrera and Oorschot’s review of 1,100 Android apps found an exponential decay in the number of apps requesting large numbers of permissions. A few apps ask for very many permissions [35].

Static code analysis studies the code of an app after decompiling to look for the permissions it requests as part of its design [14, 20, 38, 39, 40, 41, 42, 44]. This approach provides more insight into the design of the app and remains fairly easy to automate, but its accuracy depends on the decompiler used. Also, results may be too high because they include “dead code” that never actually executes in use [43]. In one review of 114,000 apps, on average, each app had at least one ad library, which often requests permissions to access the Wi-Fi network, camera, contact list, microphone, and browser history [38]. Static analysis can also detect over-privileging in apps [14]. Beyond just looking at permissions, Egele et al. found that more than half of the 1,400 iOS apps in their sample shared a unique device ID [39]. Static analysis can also uncover potential malware and vulnerabilities, such as ad libraries directly fetching and running unexpected code from the Internet [40, 41, 44].

Dynamic analysis can capture what is actually happening when an app is used, but it requires human intervention, which makes it more difficult to scale [10, 22, 43, 45, 46, 47]. Taintdroid for Android tracks private information flows from the app to its destination [45]. In one study, it found 97 out of the 145 apps tested sent potentially private information such as phone information, device IDs, or geo-coordinates to primary or third-party servers [58]. Researchers at the University of Washington expanded on Taintdroid’s functions to look for leakages of other data types such as AndroidID and to check for commonly used native functions such as MD5 hashing that may obscure the extent of data sharing [10]. Another dynamic analysis method uses a virtual private network (VPN) to monitor traffic from a device, employing tools such as Meddle or AntMonitor, which have found apps on iOS and Android that share personally identifiable information, such as name, email, and password, as plaintext [46, 47]. The 2010 WSJ study used a third method by monitoring a man-in-the-middle Wi-Fi network that a device used to connect to the Internet [22].

For this study, we focused on examining actual transmissions of personal data by apps during routine use. As our method, we selected dynamic analysis monitoring with a man-in-the-middle Wi-Fi network, as used in the WSJ report.

Other research at the FTC and Privacy Rights Clearinghouse also uses this approach [22, 48, 49]. In 2014, a researcher from the FTC’s Mobile Lab conducted a runtime analysis of 15 health and fitness apps on mobile phones [48]. In total, from the apps tested, 18 third parties received device-specific identifiers, 14 received consumer-specific identifiers, and 22 received other health information [48]. Overall, the study found that 12 of the apps surveyed transmitted information to 76 different hosts, with many of the third parties receiving information from multiple apps [48]. This result supported the findings of a 2013 Privacy Rights Clearinghouse study that surveyed 43 mobile health and fitness apps and found that the biggest risks to the privacy of the personal information of users of mobile health and fitness apps resulted from apps using unencrypted connections to third-party advertisers and analytics services [49]. Our study expands on this work by studying 110 apps in health and other categories with potentially sensitive data.

Methods

Our goal was to select and use popular free apps from the Google Play Store for Android and the Apple App Store for iOS and to record the amount and kind of sensitive data transmitted from the user’s device. Afterwards, we analyzed the recordings looking for different kinds of information sharing—specifically, personally identifiable information (PII), behavioral, and location information shared with a primary or third-party domain.

Selecting the apps

We began by selecting a number of popular free apps from the Google Play Store and from the Apple App Store. We focused on the Play Store and App Store, since they are the two largest mobile app stores, with four times the number of apps of the closest competitor, the Amazon Appstore [59]. Before March 2015, a developer could submit and publish any app in the Google Play Store with no human review, and as a result the Play Store is a largely non-curated facility [60]. An app appearing in the Apple App Store must pass a human review and a registration process [60]. The process requires the app to have a privacy policy and terms of use statement describing requests for and uses of personal information [61, 62]. Thus, the Apple App Store is a more curated facility. By choosing apps from these two stores, we thought our results might also reveal whether curating makes a difference in the transmission of personal information by the most popular apps.

Using the apps

To test an app, we simulated typical use for 10 to 20 minutes, sufficient to establish personal accounts with passwords, populate requests with personally identifiable information (PII), and use the basic functionalities of the app such as conducting a search, looking at a page of results, or playing one level of a game. Thus, the time spent on each app varied and depended on the nature of the app. We set all permissions to the most permissible—i.e., we allowed all requests for sharing geolocation and agreed to any other permission requests. However, we generally did not permit push notifications, which allow an app to send data in the background when not in use, such as when a different app was being tested. We wanted to avoid contaminating the data capture during each app’s testing with push notifications that would cause background activity from unrelated apps to bleed through. We also deleted all apps on the tested smartphone not essential to the operating system. We tested our iOS apps on an iPhone 5 and the Android apps on a Samsung Galaxy S3.

Recording app communications

We monitored and recorded all communications between the phone and the Internet using the described man-in-the-middle approach with the free software mitmproxy [25]. The mobile phone connected to the Internet through a computer running mitmproxy. Thus, mitmproxy passed along and recorded the Internet traffic on HTTP and HTTPS going to and from the phone. Encrypted communications on HTTPS were visible as clear text in this set-up, because the mitmproxy computer records the keys necessary to decipher encrypted communications. Mitmproxy recorded two pieces of information for each flow or instance of HTTP or HTTPS traffic to and from the phone: (1) the full site address and (2) clear text, if available, metadata about an image, javascript, or other files transmitted.

For each app, we assumed the flows that occur during app testing are likely due to that app’s activity. As mentioned, we minimized background processes such as push notification for other apps as much as possible to reduce contamination. However, we could not shut off all background processes, such as those related to the phone’s own operating system. Thus if Android or iOS sent traffic to the domains of Google, Apple, or others during testing, these connections might have been recorded as belonging to the specific app that was open for testing.

Analyzing the recorded app communication data

We used Python scripts to help analyze captured data. These scripts searched for transmissions in clear text for different kinds of personal data that we put into an app, such as PII and behavioral data, as well as data from the phone, such as geolocation via longitude and latitude values. Table 1 lists the kinds of personal data types that our scripts tracked and defines the categories we used. A complete list of the terms can be found in the Appendix.

When our scripts found a potential occurrence in clear text of a match to one of our inputs, we visually inspected the occurrence to determine whether the match was accurate. For example, if we input a birthday field into an app as June 1, 1980 and the script found a potential match to “06011980” in one of recorded communications for the app, we visually inspected to make sure that the match looked like part of a transmission related to a birthday rather than being part of a very large integer. One limitation of our approach is that we only had HTTP and HTTPS data, and we only looked for clear text matches based on our list of terms. Thus, if the app uses a different protocol to transmit the data or hashes data like birthday date into a less obvious string, our approach would not identify that transmission of the potentially sensitive data.

Table 1. Kinds of potentially sensitive data shared. A complete list of terms tracked for each data type and related variations can be found in the Appendix.

For analysis, we merged domains that are the same at the top two levels. For example, we combined “trafficservicecdn.telenav.com” and “logshedcn.telenav.com,” which are subdomains of “telenav.com”. In the case of websites that have country-specific suffixes, such as “ad-x.co.uk”, we merged three levels of naming. We researched each domain in order to categorize it as either a primary domain belonging to the app-maker or as a third-party domain.

We used the statistical package R to render graphs, using bipartite graphs to show how apps connected to domains where they sent potentially sensitive data. See our data citation below to access an archived copy of raw communications captured, analyses, and scripts used.

Results

We tested 110 free apps, 55 each from the Google Play Store and the Apple App Store. We tested and recorded these apps in two waves. Wave 1 was done on June 24-26, 2014 and Wave 2 on July 15-22, 2014. During Wave 1, we chose the five most popular free apps from the Google Play Store in each of the following categories: Business, Games, Health & Fitness, and Travel & Local. In the App Store, we tested similar categories: Business, Games, Health & Fitness, and Navigation. In July 2014, we expanded our testing with Wave 2 and tested the five most popular free apps in the Play Store categories Communication, Medical, and Shopping and in the App Store categories Lifestyle, Medical, and Photo & Video. In addition, we made deeper dives—testing ten apps rather than five—in the categories Health & Fitness, Social, and Travel & Local for the Play Store and in the Health & Fitness, Navigation, and Social categories for the App Store. We chose the targeted categories in Wave 1 and 2 due to their likely handling of potentially sensitive data including job information, medical data, and location. Wave 2 did not re-test apps previously tested in Wave 1. Table 2 and 3 show the list of the apps in Android and iOS that we tested along with their wave for testing. When there was a problem testing an app, we replaced that app with the next most popular app not already tested. A complete list of all apps, including those we were unable to test is in the Appendix.

Table 2. List of tested Android apps. These were the most popular apps on Google Play for Android accessed during Wave 1 (June 2014, highlighted in orange) and during Wave 2 (July 2014) in the eight categories of Business, Communication, Games, Health & Fitness, Medical, Shopping, Social, and Travel & Local. Apps appear alphabetically per category. A more thorough list of apps, including those that could not be tested, appears in the Appendix.

Table 3. List of tested iOS apps. These were the most popular apps on the App Store for iOS accessed during Wave 1 (June 2014, highlighted in orange) and during Wave 2 (July 2014) in the eight categories of Business, Games, Health & Fitness, Lifestyle, Medical, Navigation, Photo & Video, and Social. Apps appear alphabetically per category. A more thorough list of apps, including those that could not be tested, appears in the Appendix.

Android results

Out of the 55 apps that we tested for Android, Text Free, Glide, and Map My Walk sent potentially sensitive data to the most primary and third-party domains (Figure 1). The top three domains that received potentially sensitive data from the largest number of apps are google.com, googleapis.com, and facebook.com, though that appears to be less the case for location data versus PII or behavior data (Figures 1, 2, 3, and 4). Facebook.com was also the primary domain for three of the apps tested: Facebook Messenger, Facebook Pages, and Instagram.

For PII data, Text Free, Glide, and Map My Walk again rose to the top as sending data to the most domains (Figure 2). For behavior data such as a search term input into the app, Pinterest and Drugs.com are the top two apps (Figure 3). In the case of location data, such as the user’s current coordinates, Text Free and MapQuest sent the data to the most domains (Figure 4).

Figure 1. Sensitive data sharing by Android apps. Apps (left) connected to various domains (right. The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared sensitive data with more domains, both primary and third-party.

Figure 2. PII data sharing by Android apps. Apps (left) connected to various domains (right). The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared PII data with more domains, both primary and third-party.

Figure 3. Behavior data sharing by Android apps. Apps (left) connected to various domains (right). The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared behavior data with more domains, both primary and third-party.

Figure 4. Location data sharing by Android apps. Apps (left) connected to various domains (right). The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared location data with more domains, both primary and third-party.

In general, there were only one or two primary domains per app that received sensitive data, but the average number of third-party domains was 3.1 (Table 4). Health & Fitness and Communication apps sent sensitive data, mostly PII data, to more third-party domains than apps in other categories. Text Free, an app listed under the Social category of the Play Store, sent sensitive data to 11 third-party domains, more than any other app, with 9 domains receiving PII data, 2 receiving behavior data, and 6 receiving location data. The apps in the sample generally sent PII data to more third-party domains than behavior or location data. Glide, Map My Walk, and Text Free are each sending PII data to 7 or more third-party domains. Many apps have no observable traffic to any third-party domains that contain behavior or location data, hence the many empty cells in Table 4 for those two columns.

Table 4. Distribution of domains receiving any sensitive data for Android apps tested. Empty cells indicate no observed data of that type was sent to a primary or third-party domain by the app.

iOS results

Out of the 55 apps that we tested for iOS, Local Scope sent potentially sensitive data to the most primary and third-party domains (Figure 5). The top three domains that received potentially sensitive data from the most apps are apple.com, yahooapis.com, and exacttargetapis.com, especially for location data versus PII or behavior data (Figures 5, 6, 7, and 8).

For PII data, Pinterest, Map My Run, MapQuest, Piano Tiles, and Timehop rose to the top as sending data to the most domains (Figure 6). For behavior data such as search terms input into the app, Local Scope is the app sending data to the most domains (Figure 7). For location data such as the user’s current GPS coordinates, Local Scope again sent that data to the most domains (Figure 8).

Figure 5. Sensitive data sharing by iOS apps. The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared sensitive data with more domains, both primary and third-party.

Figure 6. PII data sharing by iOS apps. The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared PII data with more domains, both primary and third-party.

Figure 7. Behavior data sharing by iOS apps. The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared behavior data with more domains, both primary and third-party.

Figure 8. Location data sharing by iOS apps. The color of the line indicates whether the domain is that of the primary maker (orange) of the app or of a third party (black). Apps with bigger circles shared location data with more domains, both primary and third-party.

Much like Android apps, iOS apps usually send sensitive data just one or two primary domains, but on average to 2.6 third-party domains (Table 5). Every category had a mix of apps that sent sensitive data to third-party domains and apps that did not. Local Scope, an app listed under the Navigation category of the App Store, sent sensitive data to 17 third-party domains, more than any other app, with 15 domains receiving behavior data, and 17 receiving location data. Piano Tiles and Pinterest both sent PII data to at least 3 third-party domains. Job Search – Indeed.com and Local Scope sent behavior data to at least 3 third-party domains. Job Search – Snagajob, Nike+ Running, Groupon, Walgreens, Urgent Care, Local Scope, and Phone Tracker sent location data to at least 3 third-party domains.

Table 5. Distribution of domains receiving any sensitive data for iOS apps tested. Empty cells indicate no observed data of that type was sent to a primary or third-party domain by the app.

Potentially sensitive data types shared with third-party domains

For Android apps, the most common data type shared with a third-party domain is a user’s email address, which is PII data, with 73% of the Android apps transmitting that data (Table 6 and 8). Other commonly shared data types in Android include name (49% of apps), address (25% of apps), and phone information such as IMEI number (24% of apps) for the PII data category, username (25% of apps) for the behavior data category, and location data such as the user’s current GPS coordinates (33% of apps) (Table 8).

Less commonly shared data types may still be potentially sensitive data. For example, the Drugs.com app shared medical info input by the user in testing—including words such as “herpes” or “interferon”—with 5 third-party domains: doubleclick.net, googlesyndication.com, intellitxt.com, quantserve.com, and scorecardresearch.com. None of the 5 domains directly received any PII from the app, though google.com and googleapis.com did receive names and email addresses while the app ran. For a different type of potentially sensitive behavior data, the Business category apps, Job Search and Snagajob, shared employment-related search terms such as “driver,” “cashier,” and “burger” with third-party domains google.com, google-analytics.com, scorecardresearch.com, and 2o7.net during testing. One of the domains in the Job Search app, google.com, also received PII data including the user’s email address. The third-party domains that received passwords from apps include crashlytics.com for RunKeeper, appspot.com for Snapchat, and instagram.com for Timehop.

Finally, some apps sent to the same third-party domain potentially sensitive combinations of data such as name and current GPS location. Facebook.com connected with 7 apps, American Well, Groupon, Pinterest, RunKeeper, Tango, Text Free, and Timehop, to access this data combination. Appboy.com received this data on the Glide app.

Table 6. Categories of sensitive data (columns) shared to third-party domains by Android apps (rows). Cells shaded orange indicate that at least one third-party domain received data of that category while the selected app ran. The values inside orange cells show specifically how many third-party domains received the data.

For iOS apps, the most common data type shared with a third-party domain was a user’s current location and GPS coordinates, with 47% of the apps transmitting that data (Table 7 and 8). Other commonly shared data types in iOS include name (18% of apps) and email address (16% of apps) in the PII data category (Table 8). 4 out of the 5 Game apps tested transferred name and email data to the domain apple.com, specifically to Apple’s Game Center site at service.gc.apple.com. Pinterest, a Social category app, sent names to 4 third-party domains, yoz.io, facebook.com, crittercism.com, and flurry.com. The third-party domains that received passwords from apps include instagram.com for Timehop for InstaSize and appspot.com for SnapChat.

A few different apps shared potentially sensitive behavior data from user inputs and searches with third-party domains. For example, Period Tracker Lite shared an input into a symptom field of “insomnia” with apsalar.com. In the Business category, the two Job Search apps from Indeed.com and Snagajob shared employment-related inputs such as “Nurse” and “Car mechanic” with 4 third-party domains, 207.net, healthcareresource.com, google-analytics.com, and scorecardresearch.com.

Finally, compared to Android, fewer of the tested iOS apps sent the same third-party domain potentially sensitive combinations of data such as name and current GPS location. Facebook.com connected with 2 apps, Pinterest and Timehop, to access this data combination.

Table 7. Categories of sensitive data (columns) shared to third-party domains by iOS apps (rows). Cells shaded orange indicate that at least one third-party domain received data of that category while the selected app ran. The values inside orange cells show specifically how many third-party domains received the data.

Compared to Android apps, fewer iOS apps shared PII and behavior data with third-party domains. In some data types, the contrast is significant, with 73% of Android apps transmitting email addresses versus 16% of iOS apps. In addition, 49% of Android apps transferred either first or last name, compared to 18% of iOS apps. On the other hand, more iOS apps (47%) than Android apps (33%) transmitted current location data, including GPS coordinates, to a third-party domain. In terms of all 110 apps tested across both operating systems, the top three data types most commonly shared were email (45% of apps), location (40% of apps), and name (34% of apps).

Table 8. Summary of the number of apps in Android and iOS sharing data with third-party domains by data type. We looked for phone info for Android apps only.

Third-party domains that received potentially sensitive data from the most apps

Table 9 shows the 13 third-party domains that received potentially sensitive data from at least 4 of the Android or iOS apps that we tested. The top 6 third-party domains provided API functions for the app that allowed the app to access code libraries and datasets provided by Google, Apple, Facebook, ExactTarget, and Yahoo. Apps mostly shared PII and behavior data with Google.com and Googleapis.com in Android, while apple.com received location data in iOS. Since we were not able to disable all background processes ran by the Android or iOS operating systems during, some of the observed data transmissions to Google or Apple domains may have been due to unrelated background processes. No single analytics or advertising third-party domain dominated in receiving potentially sensitive data across a large number of the apps in the sample. The most popular analytics domain, google-analytics.com, and the most popular advertising domain, scorecardresearch.com, received data from only 5% of the apps tested. We found 94 distinct third-party domains that received at least one instance of potentially sensitive data from one of the 110 apps tested.

Table 9. Top 13 third-party domains that received any sensitive data from the apps tested. The top 13 domains received sensitive data from at least 4 apps in the sample. The table categorizes each domain by its primary function for its API, analytics, or advertising-related capabilities.

One third-party domain not included the tables and figures presented is safemovedm.com, which was connected to by 51 or 93% of the Android apps tested. The purpose of this domain connection is unclear at this time; however, its ubiquity is curious. When we used the phone without running any app, connections to this domain continued. It may be a background connection being made by the Android operating system; thus we excluded it from the tables and figures in order to avoid mis-attributing this connection to the apps we tested. The relative emptiness of the information flows sent to safemovedm.com indicate the possibility of communication via other ports outside of HTTP not captured by mitmproxy. These other ports—which may be monitored by sniffers such as Wireshark—may be of future interest in a subsequent mobile app security study.

Figure 9. Flow of information from mitmproxy for connections to safemovedm.com. Since mitmproxy only examines HTTP and HTTPS traffic, it may be possible that other tools such as Wireshark might be used in future studies to monitor FTP and other types of traffic to safemovedm.com.

Discussion

We found that many mobile apps transmitted potentially sensitive user data to third-party domains, especially a user’s current location, email, and name. In general, iOS apps were less likely to share sensitive data of nearly every type with third-party domains than were Android apps, except for location data (Table 8). One reason might be the App Store human curation process that checks to see if apps only ask for personal information for app-related purposes [61, 62]. Collecting location data, including GPS coordinates, requires an app to request the permission of the user, which would occur before installation on the app download page for Android and as a pop-up notification during use for iOS [64, 64]. Thus, receiving location data requires user approval of a more prominent notification for iOS, and we saw more iOS apps (47%) sending location data to third parties than Android apps (33%). Our results for each operating system were in line with other studies [22, 36, 58]. In contrast, we found significantly less sharing of behavioral data, such as search terms from Medical and Health & Fitness apps, compared to previous research on data-sharing on healthcare websites. A 2015 study of more than 80,000 healthcare webpages found that on 70% of the pages, third parties can learn about the specific “conditions, treatments, and diseases” viewed [65, 66]. In our study, only 3 apps out of 30 Medical and Health & Fitness apps sent medical info, including search terms, to a third party (Table 6, 7).

The average Android app sent sensitive data to 3.1 third-party domains, and the average iOS app connected to 2.6 third-party domains. The top domains that received sensitive data from the most apps belonged to Google and Apple (Table 9). Other studies have found a similar dominance by Google [10, 46, 58]. One factor may be the mobile ad networks and services operated by Google with AdMob, DoubleClick, and Google Analytics [68], and by Apple with iAds [69]. It is also possible that system processes that we were unable to turn off on Android and iOS sent data to the two companies’ domains in the background while we tested our apps. Besides Google and Apple, no other third-party domain in our study received data from more than 14% of the apps tested. By contrast, the reach of third-party advertisers on websites is very extensive, with the top 12 ad networks all reaching more than 50% of American Internet users [67].

Implications for technology design and policy

The results of this study point out that the current permissions systems on iOS and Android are limited in how comprehensively they inform users about the degree of data sharing that occurs. Apps on Android and iOS today do not need to have permission request notifications (Figure 10, 11) for user inputs like PII and behavioral data. Three options are under current development by researchers, regulators, and companies for users who may want to more comprehensively protect their privacy while using mobile apps. These are: (1) send false data in response to app requests, (2) allow users to opt out of data collection, and (3) design app stores to prominently inform users about third parties who may receive their data.

Figure 10. iOS permission request notification for location data [94]. iOS does not require apps to have notifications like this for PII or behavior data.

Figure 11. Android permission request notification for location data [95]. Android does not require apps to have notifications like this for PII or behavior data.

Researchers have designed tools that can protect user privacy by sending false data to satisfy permission requests from apps. MockDroid, TISSA, and AppFence are three examples that send fake information back to the app if it makes certain API calls [70, 71, 80]. It may be possible to modify these tools to send fake user data inputs as well when the recipient is a third-party domain, though that may also impact the experience of the app for targeted advertising and other functions that depend on accurate user data.

Another option is to provide mobile users an opt-out option to limit tracking and sending of user data to third parties. In recent years, web browsers, with the support of the Federal Trade Commission and the White House, implemented “Do Not Track” settings that signal to sites that the user is choosing to limit data collection of their search and browsing patterns [72]. However, this signal is voluntary with no set standard on how websites should respond [72]. Since 2014, California started requiring each website to describe in its privacy policy how the site will respond to a “Do Not Track” signal from a user’s browser, though sites may still choose to ignore the signal [72]. In 2013, the FTC stated in a non-binding report that mobile apps should include a do-not-track feature to safeguard personal information [73]. No federal or state legislation mandates compliance with a do-not-track signal.

Despite the lack of formal regulation, Google and Apple in recent years implemented tracking prevention settings to a degree on their mobile operating systems. In September 2012, Apple launched a “Limit Ad Tracking” feature as part of iOS 6 that blocks ad networks from collecting their IDFA, a unique device ID [74]. By April 2014, Apple stated that it may remove or deny apps that don’t respect the “Limit Ad Tracking” setting [75]. Following Apple’s lead, Google implemented a similar “Opt out of interest-based ads” setting in Android KitKat in October 2013 [76]. However, as Google notes, this setting will not stop interest-based ads not served by Google or not part of the Google Display Network [77]. Also, even if a user opts out of interest-based ads, an app may still track user activity for “contextual advertising, frequency capping, conversion tracking, reporting, security and fraud detection” [78]. Interestingly, as gatekeepers of the operating system, companies such as Apple and Google moved faster than the regulators in providing and enforcing an opt-out option to tracking on mobile apps to consumers.

Finally, app stores can show the degree of third-party data sharing more prominently on their app download page to inform users before they install the app. Many apps may describe the degree of data collection and sharing with third parties in their privacy policies, which research has found to be confusing, dense, misunderstood, and often ignored by consumers [79, 81]. App stores may choose to feature this information more noticeably, using notices similar to what exists for children’s apps today. Apps meant to be used for children, such as learning or game apps, are under more scrutiny by regulators as a result of laws such as the federal Children’s Online Privacy Protection Act (COPPA), enforced by the FTC to control the amount of geolocation data, photos, videos, audio recordings, and persistent identifiers collected and shared by apps without parental consent [82]. In California, S.B. 568 gives minors the right to an “Eraser Button” that will remove any content or information they submitted to websites or apps [83]. Beyond regulations, civil society groups such as Moms With Apps have signed on more than 300 app developers to practice best practices by disclosing the data collection and sharing activities of their apps [84]. Mom With Apps even built its own app store, which allows parents to filter apps by requirements such as “Works without internet,” “No in-app purchases,” “No links to social networks,” and “No advertising” [85]. In September 2013, Apple launched Kids App Store, which includes apps for children that comply with COPPA restrictions and limit advertising [86]. Google followed in April 2015 with the launch of its “Designed For Families” program for Android apps [87]. In conclusion, app stores could possibly adapt the current designs for children’s apps more broadly to apply to all apps by clearly describing the degree of third-party data sharing by an app before it is downloaded.

Future work

To expand on the results of this study, future research can focus on improving the accuracy of the Internet traffic captured for each app, testing more apps under different conditions, and reviewing whether privacy policies reflect the data collection and sharing activities recorded for each app.

We can improve the app testing process by looking at non-TCP traffic, leakage through simple hashing like MD5, and less contaminated transmissions without background system processes. The Privacy Rights Clearinghouse study of 43 health apps used a man-in-the-middle proxy like ours to monitor all HTTP and HTTPS traffic, and it also had WireShark and tcpdump to monitor all packet-level traffic that is not on TCP [49]. Therefore, a future study could also incorporate WireShark and other tools to examine non-TCP traffic for data leakage (we note that one study using a VPN method to capture all device traffic found that over 90% of traffic volume from apps is on TCP [46]). We might also look for potential leakage of simply encrypted versions of sensitive user data sent not as clear text but by using common hashes such as MD5 that may be vulnerable to attack [88]. One 2011 study found that multiple apps sent AndroidIDs simply hashed with MD5 as plaintext to different ad networks including Google’s DoubleClick and AdMob [10]. There was no “salting” or extending the identifier with new data to make it more difficult to decrypt by those who may have intercepted the plaintext traffic [10]. Finally, future work may be able to use modified tools such as Taintdroid to monitor both the operating system and the app so as to better distinguish between leakage that occurred as a result of app activity versus a background system process [45].

Future research might also expand the scope of our testing with more apps under different conditions. We tested apps in only 9 categories in the Play Store and App Store. As of September 2015, the Play Store had 27 categories, and the App Store had 23 categories [89, 90], so there are many more categories to examine. We could also test paid apps to see whether their data sharing patterns differ from those of free apps. Beyond just testing more apps, we can re-test the apps in our sample to track changes in their data sharing over time. Finally, we could test apps under the condition of turning on Android’s “opt out of interest-based ads” and iOS’ “Limit Ad Tracking” settings to see if we observe a difference in app activity.

Finally, we could review the privacy policies and terms of use for each tested app to see if the policy matches the practice. The FTC has fined companies, such as Path and Goldenshore Technologies, for deception for collecting and sharing data in their mobile apps despite the claims in their privacy policies [8, 91, 92]. Future work may provide other examples of apps whose practices contradict their privacy policies.