Voter Identity Theft: Submitting Changes to Voter Registrations Online to Disrupt Elections

Latanya Sweeney; Ji Su Yoo; Jinyan Zang

Abstract

Could an attacker impact U.S. elections by merely changing voter registrations online? This reportedly happened during the 2016 Republican primary election in Riverside County, California. What about elsewhere? We surveyed official voter record websites for the 50 states and the District of Columbia and assessed the means and costs for an attacker to change voter addresses. Relatedly, an attacker could also change party affiliations, delete voter registrations, or request absentee ballots online. A voter whose address was changed without her knowledge, for example, in most states would have a polling place different than expected. On Election Day, when she appeared at her presumed polling place, she would have been unable to cast a regular vote because her name was not on the precinct’s register. She may have been turned away or given a provisional ballot, and in many cases, a provisional ballot would not count. Perpetrated at scale, changing voter addresses, deleting voter registrations, or requesting absentee ballots could disenfranchise a significant percentage of voters, and if carefully distributed, such an attack might go unnoticed even if the impact was significant. So, how practical is it to submit false changes to voter registrations online?

Results summary: We found that in 2016, the District of Columbia and 35 of the 50 states had websites that allowed voters to submit registration changes. These websites determined whether a visitor was an actual voter by requesting commonly available personal information. Some websites gave multiple ways for a voter to self-identify. Of these, {name, date of birth, address} was required in 15, {name, date of birth, driver’s license number} was required in 27, and {name, date of birth, last 4 SSN} was required in 3. We found that an attacker could acquire the voter names, demographic information and government-issued numbers needed to impersonate voters on all 36 websites from government offices, data brokers, the deep web, or darknet markets.

Overall, the total cost of an attack varied based on the number of voters to impersonate, data sources used, whether the websites had CAPTCHAs, and specific states of interest. We found that the practical costs of changing 1 percent of the voters on all 36 websites could range from $10,081 to $24,926 depending on whether the attacker used data from government, data broker, darknet or other sources. Costs for an attack on a specific geographical area or state were much less, such as $1 for Alaska or $1,020 for Illinois. Back office processes and election practices, which varied among states, could have possibly limited attack success rates.

Keywords: election, voting, identity theft, authentication, computer security, privacy

Background

Online Voter Registration

In many states, the ability to change existing voter registrations online is provided along with the ability for new voters to register online (“online voter registration” in this writing). The first state with online voter registration was Arizona in 2002, followed by Washington six years later [3]. There were 11 more states offering online registration by 2012, 7 more by July 2014 [3], and a reported total of 31 states and D.C. by June 2016 [4]. Kentucky was among the most recent, launching its system on March 1, 2016 [5].

One of the major motivations for states to provide an online voter registration tool is to reduce cost. While an online tool costs approximately $240,000 on average to build, with the highest cost of $1.8 million for California [4], [6], voter registration websites can significantly reduce costs. For example, Arizona reported a per-registration processing cost of 83 cents for paper registrations but only 3 cents for online registrations [4]. Beyond cost reductions, improved accessibility and visibility of registration data are additional benefits of online registration systems to voters and election administrators [7].
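The cost argument above can be made concrete with a rough break-even estimate. The sketch below is illustrative only: it assumes that Arizona's per-registration costs apply to a state that paid the average build cost, and it ignores operating and maintenance costs. The function and variable names are ours, not taken from any cited source.

```python
# Back-of-the-envelope break-even estimate using figures quoted in the text.
# Assumption: Arizona's per-registration costs apply to a state that paid the
# average build cost; operating and maintenance costs are ignored.
import math

BUILD_COST_CENTS = 240_000 * 100   # average build cost [4], [6]
PAPER_COST_CENTS = 83              # Arizona, per paper registration [4]
ONLINE_COST_CENTS = 3              # Arizona, per online registration [4]


def break_even_registrations(build_cents, paper_cents, online_cents):
    """Return how many online registrations recoup the build cost."""
    saving = paper_cents - online_cents
    if saving <= 0:
        raise ValueError("online processing must be cheaper than paper")
    return math.ceil(build_cents / saving)


registrations = break_even_registrations(
    BUILD_COST_CENTS, PAPER_COST_CENTS, ONLINE_COST_CENTS
)
print(f"Break-even after about {registrations:,} online registrations")
```

Under these assumptions, a system with the average build cost pays for itself after roughly 300,000 online registrations.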

In January 2014, the Presidential Commission on Election Administration (PCEA) recommended that U.S. “states adopt online voter registration” [8] in order to reduce errors from paper-based registration, save money, decrease delays at the polling place with more accurate rolls, and improve the voter registration experience.

In a 2014 survey of 8,000 local election officials, researchers found that moving to online registration was one of the most common responses to an open-ended question about how to improve the election administration process. Similar comments were made repeatedly in public hearings of the PCEA [9].

While encouraging states to deploy online voter registration systems, the PCEA report also cautioned that “questions about security will require close attention to ensure that unauthorized changes to voter registration cannot be made” [8]. These concerns were reiterated by the Congressional Research Service (CRS) in a report on October 18, 2016, which stated that “successful attacks could compromise the confidentiality, integrity, or availability of election information or processes…For example, voter registration lists could be deleted or altered” [10]. If so, how could it be done?

An attacker who pretends to be a voter online engages in a form of identity theft. The attacker’s goal is to disenfranchise the impersonated voter or to discredit the election system. This is possible if the impostor can convince the website that he is the actual voter, and therefore eligible to submit online changes to the voter’s registration record, and if his submitted changes are accepted. How does the website know a visitor is a particular voter? A website considers anyone, or anything, that provides the correct personal information about a voter to be the voter. So, an impostor needs to have personal information about voters in order to impersonate voters online.

The paragraphs below describe identity theft, sources of personal information, ways that changing voter registrations can disenfranchise voters or undermine confidence in elections, and the kinds of back office logs state administrators may keep to limit an attacker or help track one down. Afterwards, we use these concepts to determine which sources of personal information and which websites would have allowed impostors to submit changes to voter registrations in the 2016 presidential election.

Voter Identity Theft

Identity theft is the fraudulent use of a person’s identifying information, usually for financial gain. We introduce the term “voter identity theft” (or “voter ID theft”) to describe an impostor’s use of a voter’s personally identifying information to impersonate the voter in the voting franchise. The Riverside County, California case appears to be the first reported case of a cyber-attack using voter ID theft [1], [2].

Other government services encounter identity theft as a challenge as they operate and secure systems to serve the needs of Americans. For example, the Internal Revenue Service (IRS) regularly confronts identity theft in connection with tax fraud. The attacker impersonates a taxpayer and files a false tax return to obtain a tax refund before the actual taxpayer files her own legitimate return. When the actual taxpayer files subsequently, she receives a notice of double filing from the IRS and begins a lengthy administrative review process [11]. According to the U.S. Government Accountability Office in a report to Congress, the IRS estimates that in 2014, it paid $3.1 billion in refunds for 1.3 million false tax returns filed by identity thieves [11]. Anti-identity theft measures such as the IRS’s Identity Protection PIN and e-File PIN, which were given only to taxpayers who have already been victims of identity theft, were found to have been compromised by identity thieves and were suspended in 2016 [12], [13]. In May 2015, the IRS suspended its Get Transcript service, which allows taxpayers to view their old tax returns online, after identity thieves attacked the service and used the personal information of 100,000 taxpayers to log in and acquire the taxpayers’ returns [14].

Cyber-attacks involving identity theft do not require technical penetration of the computers that power the websites, nor do they require the kind of computer “break-ins,” data breaches, or compromised passwords traditionally discussed as computer security concerns. Instead, cyber-attacks involving identity theft rely on access to personal information, which today is widely available on Americans. Impersonating a registered voter online merely requires having personal data on the voter and knowledge of election specifics. It involves little or no computer hacking expertise.

Attacks based on Voter ID Theft

Without breaking into a computer or compromising a password, voter ID theft can disrupt voting by changing voter registrations. Here are three ways.

The attacker can change a home address to make a voter ineligible to vote in the local precinct (or county) where the voter expected to vote in most states [15]. We term this the “change of address attack.”
The attacker can request an absentee ballot, which makes the voter ineligible to vote in person in some jurisdictions, such as some counties in Florida [16], or ineligible unless the voter remembers to bring the blank absentee ballot to the polling place, as in California or Virginia [17], [18]. We term this the “absentee ballot attack.”
The attacker can change party affiliation to a different party to make a voter ineligible to vote in closed party primaries as reported in the Riverside County incident described earlier.

An attack can combine the first two methods by first changing a voter’s address to an attacker-preferred address and then requesting that an absentee ballot be sent to the new address. In this case, the attacker might submit a false absentee ballot on behalf of the voter. This was one of the main concerns of security researchers regarding Maryland’s decision on September 14, 2016 to move to a new online ballot system [19], [20], [21].

Changes to voter records made by an attacker can have long-term consequences in many states. According to the National Voter Registration Act (NVRA) of 1993, a state is allowed to remove voters from local registration rolls due to address change, mental incapacity, criminal conviction, or death [22], with address change generally the only option that publicly accessible state online websites offer to voters.

Provisional ballots

A malicious attack that changes a voter’s home address in most states can adversely impact the voter’s experience at the polling place. The voter may be prohibited from voting, required to somehow locate and visit the precinct assigned to his new address—an address that is unknown to him—or obliged to cast a provisional ballot at the precinct that is no longer assigned to him [23]. If the new address set by the attacker is still in the same Congressional district as the old address of the voter, then the NVRA requires states to permit the voter to vote either at the previous polling place, the new polling place, or a designated central location upon “oral or written affirmation by the registrant of the new address.” A voter will find it difficult to affirm a new address if the new address on file results from an attack rather than an actual move [22]. If the attacker chooses a new address in a different Congressional district, then the voter would likely receive a provisional ballot at her old precinct on Election Day, but provisional ballots are often not counted, depending on the state and circumstances.

The National Conference of State Legislatures (NCSL) conducted a nationwide survey in June 2015 and found that 24 states (updated to exclude Illinois) do not count provisional ballots that were cast in the wrong precinct [23]. Illinois counts provisional ballots in some cases. Table 1 has a summary. Twenty-one states and D.C. include in their vote tallies a partial count of provisional ballots cast in the wrong precinct by only counting votes for federal races, statewide races, or local races that are shared between the new and old precincts [23]. Maine does a full count of all races from all provisional ballots cast in the wrong precinct only if the number of provisional ballots cast is large enough to affect the outcome of an election [23]. Idaho, Minnesota, and New Hampshire do not issue provisional ballots since they offer same-day registration. North Dakota does not have voter registration and only issues provisional ballots when poll hours are extended [23]. In 2016, 2.5 million provisional ballots were cast nationally, and 62 percent were counted in full [24].
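As a consistency check on the figures in this paragraph, the short sketch below tallies the wrong-precinct categories and converts the 2016 percentage into ballot counts. It assumes that Illinois falls within the partial-count group and that the four categories are mutually exclusive. The text does not state either point explicitly, so treat the tally as illustrative.

```python
# Tally of the wrong-precinct categories reported in the text [23].
# Assumption: the categories are mutually exclusive and Illinois is included
# in the partial-count group.
categories = {
    "not counted": 24,                              # states
    "partial count": 21 + 1,                        # 21 states and D.C.
    "conditional full count": 1,                    # Maine
    "no provisional ballots or no registration": 4, # ID, MN, NH, ND
}
jurisdictions = sum(categories.values())

# 2016 national provisional-ballot figures [24].
ballots_cast = 2_500_000
counted_in_full_pct = 62
counted_in_full = ballots_cast * counted_in_full_pct // 100
not_counted_in_full = ballots_cast - counted_in_full  # rejected or partly counted

print(f"Jurisdictions accounted for: {jurisdictions}")
print(f"Counted in full: {counted_in_full:,}")
print(f"Not counted in full: {not_counted_in_full:,}")
```

Under these assumptions the categories cover all 50 states and D.C., and about 950,000 of the provisional ballots cast in 2016 were rejected or only partly counted.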

Table 1. State Handling of Provisional Ballots Cast in the Wrong Precinct. From the National Conference of State Legislatures in June 2015 [23], corrected by an Illinois election official. Idaho, Minnesota, New Hampshire, and North Dakota are excluded for having either same-day registration or no registration.

Timing matters

For a hypothetical change of address attack to succeed, the attacker would need to focus on those states that host websites that allow changes to voter rolls and perpetrate the attack on those websites before registration deadlines. According to the NVRA, states may set registration deadlines of up to 30 days before an election [22]. In the days before the deadline, voters can submit personal information to the state on physical or electronic voter forms for change of address, which the state can then act on by immediately updating the voter rolls before the election [22]. After the registration deadline, states close the voter rolls to updates until after the election. Voters are expected to vote at the designated precincts based on the addresses that appear on the rolls [25].

In order for a falsified absentee ballot attack to succeed, an attacker needs to request the ballots on behalf of voters before the absentee ballot request deadline of each state. After the deadline passes, voters filing absentee ballot requests may be required to vote in person. Deadlines for absentee ballot requests range from 21 days before the election for Rhode Island to noon of the day before for 7 states [26].

In the case of the Riverside County attack, described earlier, registration changes may have occurred as early as two months before the election. At least one voter reported seeing his party affiliation changed online when he checked on April 11, almost two months before the June 7 primary [1]. However, complaints did not become prominent until the Republican primary on June 7, reportedly when voters found themselves receiving unexpected provisional ballots at the polling place.
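The timing constraints in this section reduce to simple date arithmetic. The sketch below applies the 30-day NVRA limit and the 21-day Rhode Island absentee deadline to the November 8, 2016 general election, and measures the interval in the Riverside County report. It ignores state-specific rules, such as adjustments for weekends and holidays, so the computed dates are approximations rather than official deadlines. The helper name is ours.

```python
from datetime import date, timedelta

ELECTION_DAY_2016 = date(2016, 11, 8)  # 2016 general election


def days_before(election_day, days):
    """Return the calendar date that falls `days` days before the election."""
    return election_day - timedelta(days=days)


# Earliest registration deadline a state may set under the NVRA [22].
earliest_registration_deadline = days_before(ELECTION_DAY_2016, 30)

# Rhode Island's absentee ballot request deadline, 21 days before [26].
rhode_island_absentee_deadline = days_before(ELECTION_DAY_2016, 21)

# Riverside County: first reported change (April 11) to the June 7 primary [1].
riverside_lead_days = (date(2016, 6, 7) - date(2016, 4, 11)).days

print(earliest_registration_deadline.isoformat())
print(rhode_island_absentee_deadline.isoformat())
print(riverside_lead_days)
```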

Authenticating voters

A critical component to the change of address attack and the absentee ballot attack is the personally identifying information a state voter website requests from a visitor to prove he is the actual voter.

A May 2015 survey by Pew of 20 states found that each of the states surveyed used some combination of driver’s license number or State ID number, last 4 digits of the Social Security number (SSN), and/or the full Social Security number to authenticate website visitors for voter registration [3]. Louisiana also required the audit code of the driver’s license or State ID in addition to the driver’s license number or State ID number, and Washington required the issuance date of the driver’s license or State ID in addition to the driver’s license number or State ID number [3].

Figure 1 shows the voter registration pages for Delaware at which a Delaware voter can change home address. In Delaware, a registered voter may enter either name, date of birth, and 5-digit ZIP (postal) code, as shown in Figure 1a, or driver’s license number (or State ID) and date of birth, as shown in Figure 1b.

Figure 1. Information required to identify a voter at the Delaware website is either: (a) name, date of birth, and ZIP; or, (b) driver’s license number (or State ID if not a driver) and date of birth. In the example shown, the voter also has to enter “LANARK” in the CAPTCHA field to proceed. Visiting the web page again would require the voter to enter different text in the CAPTCHA field. See Figure 4 for the source URL.

Sources of Personal Information on Americans

The intent is to require voters to provide “information that others will not have,” according to the bipartisan National Conference of State Legislatures [4]. However, in today’s data-rich society, is it reasonable to believe that a Social Security number or driver’s license number, or in the Delaware example (Figure 1) date of birth, is only known to the voter?

Social Security numbers (SSNs) are issued by the federal U.S. Social Security Administration to people who work in the United States, babies born in the United States, and people who are tax dependents of U.S. workers. Driver’s license numbers refer to the numbers that appear on identification cards issued by states as a requirement to drive a motor vehicle in the United States. Many states issue identification cards, through the same department that issues driver’s license numbers, to those state residents who do not drive.

Below are four readily available ways of procuring personally identifying information on Americans, including SSNs and driver’s license numbers.

Purchase from a data broker or vendor
There is a significant ecosystem of data brokers and vendors that legally acquire sensitive personal information such as Social Security numbers or driver’s license numbers to sell to clients for background checks, fraud checks, or other purposes [27]. These companies include credit reporting bureaus such as Experian and data brokers such as Acxiom [28], [29]. Some companies sell their data to others who may then, in turn, use it for identity theft purposes [30], [31]. For example, in 2013, Experian discovered that one of its subsidiaries, Court Ventures, sold 3.1 million personal records of Americans including Social Security numbers, dates of birth, addresses, previous addresses, phone numbers, email addresses and other sensitive data to a Vietnam-based identity theft service, Superget.info, from at least 2011 to February 2013 [30]. The Vietnam company earned $1.9 million by reselling its subscription access to Court Ventures’ data to its 1,300 clients for use in identity theft [30]. In February 2016, the Federal Trade Commission won $5.7 million judgments against two data brokers, LeapLab and Leads Company, for gathering personal information from payday lending applications. The information gathered was a consumer’s name, address, phone number, employer, Social Security number, and bank account number, including the bank routing number. These companies sold the data to non-lenders, who had no need for this information, for $0.50 a record [30].
Acquire data from a data breach, directly or indirectly
In recent years, multiple major data breaches have affected millions of Americans. A 2015 survey by the American Institute of CPAs found that 25 percent of Americans fell victim to information security breaches in the past year [32]. The number of data breaches, as tracked by the Identity Theft Resource Center, reached an all-time high in 2015 of 338 breaches involving Social Security numbers and a combined total of 164.4 million records [33]. Victims of major breaches include: (a) the 21.5 million current and former federal workers affected by the Office of Personnel Management breach in July 2015 [34]; (b) the 78.8 million current and former customers and employees of Anthem, a health insurance company, who had their names, birth dates, Social Security numbers (SSNs), home addresses, and other personal information stolen in an attack in February 2015 [35]; and, (c) the 15 million customers of T-Mobile, the wireless service provider, who had their names, dates of birth, addresses, Social Security numbers, and driver’s license numbers exposed through a breached Experian server that T-Mobile used for credit assessment in October 2015 [36]. Even public figures such as former First Lady Michelle Obama, former Vice President Al Gore, and the singer Beyoncé had their credit reports and Social Security numbers posted online in March 2013 as a result of attacks against the three credit bureaus Experian, Equifax, and TransUnion [37].
These breaches and many other documented ones provided personal data on Americans sufficient to enable black-market vendors to sell personal credentials such as Social Security numbers at fairly low prices. One survey in August 2016 found Social Security numbers sold discreetly on the web for $1 per record [38]. Security researchers also found web sites such as SSNDOB that sell Social Security numbers for $1 and driver’s license numbers for $4 [39]. In two years, SSNDOB served more than 1.02 million Social Security numbers and more than 3.1 million birth records to its customers [40]. SSNDOB allegedly acquired the personal data by infiltrating the internal systems of data brokers LexisNexis and Dun & Bradstreet and the background check company HireRight [40].
There have been breaches of voter data from state agencies and from political campaigns. In June 2016, the United States Federal Bureau of Investigation (FBI) investigated a hacking attack against a database of 15 million voter records at the Illinois Board of Elections, with the attackers using a SQL injection to steal tens of thousands of records [41], [42], [43]. The database was 10 years old and included voters’ names, addresses, sex, and birthdates, with some records including the last 4 digits of the voter’s SSN or the voter’s entire driver’s license number. Board of Elections officials declared that they were “highly confident [the hackers] weren’t able to change anything” [42]. Investigators believe the hackers were likely based overseas, and suggested Russia as a possibility [44].
In May 2016, Arizona officials took the statewide voting registration system offline after the FBI alerted the Arizona Department of Administration of a credible threat [41]. Arizona officials discovered that a county election official’s username and password had been posted online. That official’s account could only access county-level data rather than the entire state system. State investigations of the incident found no evidence that any data in the system was compromised or any malware installed.
Georgia had two significant breaches of voter information in about the same number of years. A breach of about 7.5 million Georgia voter records took place in March 2017 [45]. In 2015, the Office of the Georgia Secretary of State accidentally sent the Social Security numbers and other private information of more than 6 million Georgia voters to a distribution list of media outlets and political parties [45].
Predict driver’s license numbers from other personal information
Intrinsic vulnerabilities in the design of some driver’s license numbers may allow others to predict an individual’s driver’s license number from publicly available personal information such as the individual’s name and date of birth.
University of Wisconsin–Madison researcher Alan De Smet has published online the methodology to predict the driver’s license numbers of 11 states: Florida, Illinois, Maryland, Michigan, Minnesota, Nevada, New Hampshire, New Jersey, New York, Washington, and Wisconsin [46]. These states use a combination of encoded first name, last name, middle initial, and date of birth [46]. Maryland uses driver’s license numbers as part of its verification check before delivering absentee ballots online to voters, but it is also a state with predictable driver’s license numbers based on a voter’s last name, first name, middle initial, and birth month and date [19], [20], [47].
Access voter registration data
All states provide copies of voter rolls to political campaigns and others. These data often include the name, address, date of birth, party designation and voting history for each voter. These data do not usually include the SSN or driver’s license number of voters, however. Some states provide the data for a fee, while others provide it freely online. For example, anyone can download a copy of the voter registration data for Ohio [48]. Data brokers that specialize in voter data, such as Aristotle [49] and L2 [50], also make voter registries available.
Once released, a copy of a voter list may be shared widely. For example, a Republican contractor’s database that contained nearly every voter in the United States was left exposed online for anyone to copy or access [51].

Audit logs

Webserver logs can record the date, time, and Internet addresses of mobile devices and computers connecting to a website. Database logs can record all changes made to the voter database over time. Maintaining these kinds of audit logs will not prevent a change of address attack or an absentee ballot attack. However, audit logs can help determine, after the fact, whether an attack actually occurred and could possibly provide information about the attacker and the extent and nature of the attack.
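To illustrate how such logs could support after-the-fact detection, the sketch below scans a hypothetical audit log for source addresses that changed many distinct voter records within a short time window. The log layout, the threshold, the window length, and all names are assumptions made for this example. They do not describe any state's actual system.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical audit-log entries:
# (ISO timestamp, source IP, voter record ID, field changed).
# The IP addresses come from ranges reserved for documentation.
SAMPLE_LOG = [
    ("2016-05-01T10:00:00", "203.0.113.7", "V001", "address"),
    ("2016-05-01T10:00:20", "203.0.113.7", "V002", "address"),
    ("2016-05-01T10:00:40", "203.0.113.7", "V003", "address"),
    ("2016-05-01T10:01:00", "203.0.113.7", "V004", "party"),
    ("2016-05-01T11:30:00", "198.51.100.9", "V100", "address"),
    ("2016-05-02T09:15:00", "198.51.100.9", "V100", "party"),
]


def flag_bulk_sources(entries, max_voters=3, window=timedelta(minutes=10)):
    """Return source IPs that changed more than `max_voters` distinct
    voter records within any time span no longer than `window`."""
    by_ip = defaultdict(list)
    for stamp, ip, voter_id, _field in entries:
        by_ip[ip].append((datetime.fromisoformat(stamp), voter_id))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Advance the left edge until the span fits inside the window.
            while events[end][0] - events[start][0] > window:
                start += 1
            voters = {voter for _, voter in events[start:end + 1]}
            if len(voters) > max_voters:
                flagged.add(ip)
                break
    return flagged


suspicious = flag_bulk_sources(SAMPLE_LOG)
print(sorted(suspicious))
```

A check of this kind only flags candidates for review. An attacker who spreads submissions across many addresses or over a long period would not trip a simple per-address threshold.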

A 2014 Pew survey found that 11 of the 13 states surveyed reported using audit logs for their online voter registration systems, while California and Arizona did not respond to the question [6]. The registrar for the voter data in Riverside, California reportedly maintained no logs [1], [2].

As mentioned earlier, the IRS faces similar cyber-security and identity theft challenges. In response, the IRS mandates that Authorized IRS e-File Providers meet certain security requirements, including a requirement to maintain audit logs [52].

CAPTCHAs

An impostor can manually submit address changes on a voter website, one voter at a time. To increase the number of voter record changes per hour, he could employ more people to do the same or automate the process by having a computer program submit the kind of changes at the website that he would submit by hand. Once configured, a computer program could conduct a change of address attack or an absentee ballot attack without human intervention, iteratively impersonating each targeted voter. In addition, a computer program can operate on multiple machines simultaneously. Automation can dramatically increase the number of voter record changes submitted per hour.

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) can slow automated attacks. CAPTCHAs are a security scheme first proposed in 2000 and now widely used online to determine whether a user is more likely to be a human than a computer program [53], [54]. A CAPTCHA displays an image or group of images on the web page and asks the viewer to enter the text displayed in the image or to answer a question about the displayed image(s). Humans can usually respond easily to a CAPTCHA, but computer programs tend to have a difficult time interpreting images, making the proper response to a CAPTCHA more difficult for a program to achieve. Therefore, if a voter website has a CAPTCHA, the attacker has to find a way to automate or semi-automate responses to the CAPTCHA in order for his computer program to iterate its execution over multiple voter records [54], [55].

Figure 1 shows the voter registration pages used by Delaware registered voters to submit address changes. Voters have one of two choices, to either enter name, date of birth, and ZIP (Figure 1a) or enter driver’s license number and date of birth (Figure 1b). Both options display a CAPTCHA at the bottom of the page. The voter in the example shown has to enter “LANARK” in the CAPTCHA field to proceed. Visiting the web page again requires the voter to enter different text in the CAPTCHA field.

The IRS in its security requirements for Authorized IRS e-File Providers [52], the NCSL in its advice to states constructing online voter registration systems [4], and the National Institute of Standards and Technology in its Guidelines on Securing Web Servers all recommend that websites that accept information from visitors to the website use CAPTCHAs [56].

As we just described, CAPTCHAs may increase the effort required to automate an attack, but a CAPTCHA does not necessarily slow or prevent automated attacks. As an example, consider a 2010 criminal case in which a hacker reportedly wrote a computer program to impersonate thousands of individual ticket buyers on websites of online ticket vendors such as Ticketmaster, Musictoday and Tickets.com in order to automatically purchase premium event tickets and then resell the tickets later at higher prices [57]. The websites asked for personal financial information for the purchase and used CAPTCHAs to help thwart automated ticket buying [57]. The program defeated the CAPTCHAs presented and ran on a network of computers simultaneously to scale the attack. According to prosecutors, the program grabbed more than 1 million tickets for concerts and sporting events, and the resale of those tickets between 2002 and 2009 yielded more than $25 million in profit [57].

A semi-automated way an attacker’s program could defeat CAPTCHAs is to use cheap human labor, such as Amazon Mechanical Turk’s Human Intelligence Tasks [58]. The attacker’s program electronically routes a copy of the CAPTCHA to a human who views the CAPTCHA and electronically sends the response back to the program, which in turn submits the correct response to the website.

An automated way an attacker’s computer program could defeat a CAPTCHA is to use a dictionary of all the CAPTCHA images that could possibly appear on the web page. This approach assumes that the number of CAPTCHA images available at the website is small. The attacker records each image along with the proper response for each image in a dictionary. Later, when the computer program encounters the CAPTCHA, it looks up the image in its dictionary and responds with the pre-stored answer. The ticket-scalping scheme described above created a dictionary of about a thousand CAPTCHAs found at popular ticket selling websites. The scheme’s program then used the dictionary to bypass the CAPTCHAs.

Advanced programming can also defeat some CAPTCHAs straightforwardly by responding to the CAPTCHA as a human would [59], [60], [61]. For example, Google engineers defeated street image CAPTCHAs using a Street View algorithm designed to decipher blurry street addresses [61]. They achieved 99.8 percent accuracy. A company named Vicarious developed software it claims can crack up to 90 percent of CAPTCHAs – including reCAPTCHA – offered by Google, Yahoo and PayPal [60]. Published academic papers use machine learning algorithms to defeat rendered-text CAPTCHAs [59].

Results

Below is a summary of our findings. Supporting details appear in the subsections that follow this summary and are organized following the 6-step approach described above.

We found online voter websites for 35 states and D.C. that allowed voters in those states to submit changes to their addresses. The kinds of personal information an attacker would have provided were the voter’s name in 30 (83 percent) of 36 websites, date of birth (35 or 97 percent), driver’s license number, or State ID if the voter did not drive (33 or 92 percent), and part or all of the voter’s SSN (22 or 61 percent). There were 43 possible combinations of fields of personal voter information required at the 36 websites. In combination, {name, date of birth, driver’s license} was needed in 27 (63 percent) of the 43 and {name, date of birth, address} was necessary in 15 (35 percent).
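The percentages in this paragraph follow directly from the reported counts. The short sketch below recomputes them as a consistency check. The dictionary labels are our shorthand for the fields named in the text.

```python
SITES = 36          # websites surveyed (35 states and D.C.)
COMBINATIONS = 43   # field combinations accepted across the 36 websites

# Number of websites requesting each field.
field_counts = {
    "name": 30,
    "date of birth": 35,
    "driver's license or State ID": 33,
    "SSN (part or all)": 22,
}

# Number of accepted combinations that match each field set.
combination_counts = {
    "name + date of birth + driver's license": 27,
    "name + date of birth + address": 15,
}


def percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


field_pct = {k: percent(v, SITES) for k, v in field_counts.items()}
combo_pct = {k: percent(v, COMBINATIONS) for k, v in combination_counts.items()}

for label, value in {**field_pct, **combo_pct}.items():
    print(f"{label}: {value} percent")
```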

Our surveys found that an attacker could acquire voter names and demographics from government offices, data brokers, the deep web, or darknet markets, and could acquire government-issued identification numbers from data brokers, government sources, websites, and the deep web or darknet markets.

For voter lists, we calculated that an attacker could have spent $219 to acquire voter lists for 18 (50 percent) states, $4,407 for 29 (81 percent) states, or $17,679 for all 35 states and D.C. from authoritative government sources, publicized websites, or data brokers. Sources of all cost information and the assumptions we used to calculate attack costs are laid out further below in the sub-sections.

Alternatively, an attacker could have spent $1,002 from darknet sources to acquire 2 datasets that jointly contained the names, addresses, dates of birth, genders, and SSNs of most adult Americans.

While prices varied, an attacker could spend as little as $40 per month for an unlimited number of searches at a data broker website to acquire the SSN and driver’s license numbers of voters (and an additional $0.01 per record for more details such as prior addresses and driver’s license issue date). Some data brokers charged $1 per name compared to $0.41 from clandestine sources on darknet markets. Compilations from swipes of the magnetic strip of driver’s licenses and photo images of driver’s licenses were available on darknet markets for prices ranging from $0.01 to $15 each.

Government election offices, political campaigns, and data analytic companies working on elections have experienced breaches containing most or all the personal information an attacker would have needed to impersonate voters at the 36 websites. Copies of some of these datasets have been found publicly available on the deep web and could have been available for an attacker to use if the attacker made or became aware of the URLs. In these cases, the data would have been available at no additional cost.

Eleven (31 percent) of the 36 websites had a CAPTCHA service that attempted to limit the speed with which an attacker could submit address changes on the website. In 2016, however, automated programs could respond to the kinds of CAPTCHAs found on all the state websites that had CAPTCHAs, thereby rendering them a nominal deterrent.

In summary, the cost of an attack may vary based on actual data sources and whether the website used CAPTCHAs. An attacker that primarily used two datasets offered on darknet markets, the Texas voter list, and a data broker as sources could change 1 percent of the voter records on all 36 websites for a total attack cost of $10,081. The minimum state cost was $1 (Alaska), the maximum $3,059 (Texas), and the median $41 per state. Alternatively, an attacker could primarily purchase data from government offices, data brokers and websites archived on search engines to change 1 percent of the voter records on all 36 websites for $24,926. The minimum state cost was $5 (Delaware), the maximum $3,059 (Texas), and the median $417 per state. An attacker that found relevant information on the deep web, or had a confederate that placed it there, would have no data costs, dramatically reducing the attack cost to $748 in machine use for all 36 state websites.

Results for Step 1: Number of Websites

Thirty-five states and D.C. offer online websites for submitting changes to voter addresses. Specifics appear below.

From June 29 to July 6, 2016, and again in November 2016, we searched google.com for official state election websites. Our search queries were of the form “[state] voter registration address change,” where [state] is the name of a state. An example is “Alabama voter registration address change” for the state of Alabama. We reviewed the top 6 results from each search. Using this approach, we located 35 websites for states and one website for the District of Columbia (D.C.). All 36 websites allowed a voter to submit online changes to a residential address in order to subsequently change the address in the voter rolls. The 35 states were: Alabama, Alaska, Arizona, California, Colorado, Connecticut, Delaware, Georgia, Hawaii, Illinois, Indiana, Kansas, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Missouri, Nebraska, Nevada, New Jersey, New Mexico, New York, Ohio, Oregon, Pennsylvania, Rhode Island, South Carolina, Texas, Utah, Vermont, Virginia, Washington, and West Virginia. Figure 4 lists these states and the URLs of their websites. Figure 5 shows them geographically.

Thirty (83 percent) of the 36 websites allowed a registered voter to submit a change of address at the same website where he would register to vote. Some of these may have required the change of address to be done by re-registering. The other six (Colorado, Michigan, New Jersey, Ohio, Texas, and the District of Columbia) provided different websites for a registered voter wanting to change an address online.

South Carolina required a voter to make changes on two websites in order to submit a residential address change online. A voter would first submit an address change to the voter’s driver’s license information at a motor vehicle website, and then submit a change of the address to the voter registration record at a voter registration website.

We did not find an online change of address option for 15 (or 29 percent) of the 51 possibilities (50 states and D.C.). Specifically, Arkansas, Florida, Idaho, Iowa, Maine, Mississippi, Montana, New Hampshire, North Carolina, North Dakota, Oklahoma, South Dakota, Tennessee, Wisconsin, and Wyoming did not have online options for submitting voter address changes. Most of these states had an online web page that specifically described offering mail-in or in-person options only.

Figure 4. The website URLs for 35 states and D.C. (jointly termed “36 states”) that allowed visitors to update voter addresses online in 2016. References in this writing to 36 state websites include Washington, D.C., which is not a state but is also not part of any state. All but 6 of the 36 websites used the same website for both voter address changes and voter registration. *South Carolina required changing a voter address at two websites, one a voter registration website. We count South Carolina once in references to the total number of states or state websites.

Figure 5. Map of the United States, color-coded to distinguish the 35 states and the District of Columbia (jointly termed “36 states”) that provided websites to change voter addresses online (orange) from the states that provided no online means for changing voter information (gray).

Results for Step 2: Personal Information Required

Submitting voter address changes online required providing date of birth at 35 (97 percent) of 36 websites, driver’s license number, or state ID number if the voter did not drive, at 33 (92 percent) of 36 websites, name in 30 (83 percent), part or all of the voter’s SSN at 22 (61 percent), address information at 15 (42 percent), and gender at 7 (19 percent). There were 43 possible ways personal voter information could be entered into the 36 websites. The voter’s name, date of birth, and driver’s license was needed in 27 (63 percent) of the 43 possible ways. The voter’s name, date of birth and address was necessary in 15 (35 percent) of the 43. Nineteen (44 percent) of the 43 possible ways included the voter’s name, date of birth and the last four digits or all of the voter’s SSN. Details appear below.
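The percentages in this summary follow directly from the reported counts. As a minimal sketch, the following Python snippet recomputes them, assuming each percentage is the count divided by the stated total and rounded to the nearest whole number. The helper name `share` and the dictionary labels are illustrative and are not part of the study.

```python
# Recompute the reported percentages from the counts given in the text.
# Assumption: each percentage is count / total, rounded to the nearest integer.

def share(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)

TOTAL_WEBSITES = 36       # 35 states plus D.C.
TOTAL_COMBINATIONS = 43   # possible ways to enter personal voter information

# Number of websites requiring each field, as reported above.
field_counts = {
    "date of birth": 35,
    "driver's license or state ID number": 33,
    "name": 30,
    "part or all of SSN": 22,
    "address": 15,
    "gender": 7,
}
field_shares = {f: share(n, TOTAL_WEBSITES) for f, n in field_counts.items()}

# Number of the 43 combinations containing each group of fields.
combination_counts = {
    "name + date of birth + driver's license": 27,
    "name + date of birth + address": 15,
    "name + date of birth + SSN (last 4 or full)": 19,
}
combination_shares = {
    c: share(n, TOTAL_COMBINATIONS) for c, n in combination_counts.items()
}

if __name__ == "__main__":
    for label, pct in {**field_shares, **combination_shares}.items():
        print(f"{label}: {pct} percent")
```

Note that the field percentages use the 36 websites as the denominator, while the combination percentages use the 43 possible combinations.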

A visitor to any of the websites had to provide some combination of personal information about the voter before submitting a change of the voter’s address. For example, on Delaware’s website, a visitor could choose to provide either: (1) the name, date of birth, and ZIP (postal) code of the voter (Figure 1a); or (2) date of birth and driver’s license number of the voter (Figure 1b). On Alabama’s website, a visitor must provide the voter’s name, date of birth and driver’s license number or state ID if the person does not drive (Figure 6). South Carolina’s website additionally required SSN, gender and race of the voter; see Figure 7.

Figure 6. Information required to authenticate a voter at the Alabama website was: name, date of birth, and driver’s license number (or state ID for a non-driver). Required fields have a red asterisk (*); the last 4 digits of the SSN are not marked, so a visitor may optionally enter them. See Figure 4 for the source URL.

Figure 7. Information required of a voter to identify himself at the South Carolina website was: driver’s license number (or state ID if he does not drive), name, SSN, date of birth, gender, race and a telephone number. See Figure 4 for the source URL.

Table 2 lists the minimal personal information demanded by each of the 36 websites, and Table 3 enumerates the information by field. We report the minimal information because, as described in our methodology earlier, we stopped surveying a website at the point where we believed any further advance might alter a voter’s data. An attacker may reasonably conclude these are the required fields because of website design, notwithstanding discrepancies between these and state FAQs, as described in our Methods section earlier.

Name and Demographic Information. The 36 state websites relied heavily on name and demographic fields. The voter’s first and last name was required in 30 (83 percent) of the 36 websites, date of birth in 35 (97 percent) of the websites, and some part of the voter’s address in 15 (42 percent) of the websites. New Jersey had the only website that did not require the voter’s date of birth. The South Carolina website was the only website that required race.

The Delaware website requested a visitor to enter the voter’s name, date of birth and 5-digit ZIP code; see Figure 1. The Connecticut website required a visitor to enter the voter’s name, date of birth, and town if the voter did not have a driver’s license.

Websites for Minnesota, New York, and South Carolina requested a telephone number or an email address. Does the voting office have a reliable stored value of a telephone number or an email address for comparison? If not, values entered for phone and email are not verified against any previously stored values, but instead may be used for future verification of the voter or as a means for subsequent contact if needed. Therefore, values entered on the website for these fields would not have to be those actually held by the voter. A made-up value or a non-traceable phone number, such as one purchased on Skype [63], or a free email account, such as would be available on Gmail [64], could suffice. See Figure 7 for an example.

Missouri’s website had asterisks on the required fields {name, date of birth, address, and last 4 digits of SSN}. Driver’s license number and gender were among the fields listed, but not marked as required. When we attempted to submit a blank page, a message appeared that listed the fields required to complete the form, which were the same as those marked with asterisks but also included gender. Driver’s license number, a field that, like gender, did not have an asterisk, was not listed as being required. For this reason, we include gender as a required field for Missouri. We did not submit any actual changes, so we do not know whether Missouri’s website actually required gender. However, it seems plausible that, under these circumstances, an attacker might believe gender was required. If gender was not required, our assumption unduly increased the burden on the attacker.

Government-Issued Numbers. The 36 state websites also relied heavily on government-issued numbers, namely SSNs and driver’s license numbers. Thirty-three (92 percent) of 36 websites requested a driver’s license number. If a voter did not have a driver’s license, a voter’s state ID issued by the same state department that regulates motor vehicle driver’s licenses could be entered; so, for convenience in this writing, all subsequent references to “driver’s license” include non-driving state IDs issued by the same state department that issues driver’s licenses in the state unless stated otherwise or obvious from context.

Of the 33 websites that requested a driver’s license number, only the websites for Colorado and Connecticut gave visitors the option of alternatively providing the last 4 digits of the voter’s SSN or using the voter’s name, date of birth and ZIP code, respectively. These are denoted by “OR” on Table 2. If a voter did not have a driver’s license number (or state ID number), a visitor could enter the last 4 digits of the voter’s SSN in Connecticut, Minnesota, Pennsylvania, Vermont, and District of Columbia websites. These options are listed as “ELSE” on Table 2. In all other cases, if a voter could not provide a requested driver’s license, a mail-in or in-person option was necessary.

Twenty-two (61 percent) of the 36 websites requested all or part of the voter’s SSN. Five (14 percent) websites required the full 9-digit SSN, while 17 (47 percent) required only the last 4 digits. Only the voter’s date of birth and SSN were necessary for Kentucky’s website.

Connecticut and Delaware websites (see Figure 1) allowed a visitor to enter combinations of fields of personal information that did not include any government-issued ID number. All the other states and D.C. required a government-issued ID number, though none of the websites relied solely on a government-issued ID number. Fourteen (39 percent) websites accepted combinations that included both a driver’s license number and some form of SSN, 19 (53 percent) accepted combinations with driver’s license and no SSN, and 8 states (22 percent) accepted combinations with some form of SSN and no driver’s license number.

Other Kinds of Information. California and Pennsylvania websites required party affiliation. Illinois, South Carolina, and Washington required the issuance date of the driver’s license. New York and Texas required document numbers that appear on the driver’s license. Michigan required eye color, which also appears on the driver’s license. Texas required a voter identification number. And Kansas required copies of citizenship documents on initial voter registrations but not on changes to existing registrations.

Combinations of Fields of Personal Information. As mentioned earlier, the websites for Colorado, Connecticut, Delaware, Minnesota, Pennsylvania, Vermont, and Washington, D.C. allowed a visitor to enter one of two different combinations of fields of personal voter information. Therefore, there were 43 possible combinations of personal voter information to enter for the 36 websites.
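The count of 43 combinations can be reproduced from the figures above. The sketch below assumes, as the text states, that each of the seven listed websites accepted exactly two combinations and that every other website accepted exactly one. The variable names are illustrative.

```python
# Reproduce the count of 43 field combinations across the 36 websites.
# Assumption from the text: seven websites offered two combinations each;
# the remaining websites offered one.

TOTAL_WEBSITES = 36

two_choice_websites = [
    "Colorado",
    "Connecticut",
    "Delaware",
    "Minnesota",
    "Pennsylvania",
    "Vermont",
    "Washington, D.C.",
]

single_choice_count = TOTAL_WEBSITES - len(two_choice_websites)   # 29 websites
total_combinations = single_choice_count * 1 + len(two_choice_websites) * 2

if __name__ == "__main__":
    print(f"{single_choice_count} single-choice websites, "
          f"{len(two_choice_websites)} two-choice websites, "
          f"{total_combinations} combinations in total")
```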

Table 2 identifies the 43 combinations of fields of personal voter information necessary at the 36 state websites. For those websites that offered a choice, Table 2 describes whether the choice was an OR, in which case a visitor could select either combination of fields to enter, or was an ELSE, in which case a visitor could only use the second choice if the voter did not have a driver’s license number (or non-driving state ID number). If a state website required information such as party affiliation or voter identification number, Table 2 identifies that “Other” information was required. If the website required any information that could be found on the driver’s license, such as issuance date, audit number, or eye color, Table 2 identifies that “Driver’s License Other Info” was required.

Table 4 lists which fields of voter information were required at the same state websites. Most websites (16) required a unique combination of fields of personal information. The voter’s name, date of birth, and driver’s license number were required at 8 of the websites. Five websites additionally required the voter’s address, and, alternatively, five websites additionally required the last four digits of the voter’s SSN. See Table 4 for a complete list.

Overall, the voter’s name, date of birth, and driver’s license were needed in 27 (or 63 percent) of the 43 possible ways personal voter information could be entered into the 36 websites. The voter’s name, date of birth, and address were necessary in 15 (35 percent) of the 43 possible combinations. Nineteen (44 percent) of the 43 possible ways included the voter’s name, date of birth and the last four digits or all of the voter’s SSN.

Table 2. Information required from a visitor at a state website to be accepted as a voter allowed to change the voter’s address, based on an attacker’s inspection of the website. Fields included date of birth (DOB), driver’s license or state identification ID numbers (DL number), and other information combined with the DL number, gender (Sex), and Social Security numbers (SSN), including just the last 4 digits (Last 4 SSN). CT and other states offered a choice between two groups of fields to provide (“OR”). SC required visiting two websites (“THEN”). CT, DC, MN, PA, and VT allowed the second choice of fields only if the visitor did not have a DL number (“ELSE”). Figure 4 lists the source URLs for each state website. Kansas requires a copy of citizenship documents on an initial registration but not on address changes. *We consider gender as required for Missouri because upon a blank submission, gender was among the list of required fields even though it was not marked as required on the form.

Table 3. Kinds of information required in order to submit updates online by state. Orange represents a required field, based on an attacker’s inspection of the website. Some state websites provided a choice between the sets of fields required, identified as 1 and 2. Table 2 lists the actual fields minimally required by each state’s website and for those state websites providing a choice, the OR or ELSE relationship between choices.

Table 4. Groups of states whose websites have the same minimal requirements for visitors to identify themselves as voters. Table 3 lists these requirements by state.

Results for Step 3: Data Sources of Personal Voter Information

In summary, an attacker could have acquired the personal information needed for any or all of the 36 websites from named sources or from clandestine alternatives (Figure 2). We surveyed these two tracks separately. Below is a summary with details in the sub-sections that follow this summary.

When using named sources, the attacker would have started with voter data because it contains a list of actual voters. Starting with a dataset of the names and demographics of the general adult population is not as efficient. The attacker could have acquired voter data for all 36 states from government offices, voting list brokers, or publicized websites. For the 18 (50 percent) states having the least expensive voter data, the total cost would have been $219; for 29 (81 percent) states, the cost would have been $4,407, or for all 36 states, $17,679.

Voter names and all the demographic information needed to impersonate voters at more than 20 (56 percent) of the 36 websites were contained in voter lists acquired from government sources or from voter list brokers. Date of birth information was missing for 13 states in data acquired from government sources and for 11 states in data from voting list brokers. But for less than $30, an attacker could use public data broker websites to append the missing date of birth information. Gender was missing in a Missouri dataset acquired from a voting list broker, but it could have been inferred from the first names of voters or guessed.

SSNs were required on 17 (47 percent) of the 36 websites. Twenty-two data brokers proffered a service that provides an SSN in response to a search of a voter’s name and address or to a search additionally including the voter’s birth date. While prices varied, the least expensive offer was for an unlimited number of searches at a monthly fee of $40.

Driver’s license numbers and sometimes issuance dates were required by 33 (92 percent) of the 36 websites. Eleven data brokers provided this information in response to a search of a voter’s name and address or date of birth. Monthly subscriptions ranged from $39.95 to $59.95 per month, and some websites charged an additional $1, $5 or $10 per lookup or 1 cent more for issuance date. Alternatives were found for some states. A Nebraska government website revealed a voter’s driver’s license number if given the voter’s SSN. Driver’s license numbers for Illinois, Maryland, Michigan, New Jersey and Washington could be computed directly from voter demographics. Arizona used SSN as a driver’s license number for some of its population.

Offers from clandestine sources on darknet markets included access to voter lists or similar population lists for as little as $1 per million and to datasets having names, addresses, dates of birth, SSNs, and driver’s license numbers with issuance dates for as little as $0.41 per record for Americans in all 50 states and the District of Columbia. One-time flat fee offers included $500 for a dataset that reportedly contained the names, addresses, dates of birth, and SSNs of all adult Americans and $10 for a URL link to a data broker or website having equivalent information. Negotiable volume discounts beyond these advertised prices appeared widely.

National and statewide voter datasets, as well as the SSNs and driver’s license data on millions of Americans, have been the subject of breaches and the targets of break-ins. Government election offices, political campaigns and data analytic companies working on elections had breaches of datasets containing most or all the personal information an attacker needed to impersonate voters at the 36 websites. Copies of some of these datasets are publicly available for anyone, including an attacker, to access on the deep web (e.g., [65], [51]).

In presenting our detailed findings, we walk through two approaches for an attacker to acquire the personal voter information needed at the 36 state websites; see Figure 2. In the first approach, we start with (a) voter lists, and then append requisite (b) birth and (c) gender information to complete the set of demographics needed per state website (Table 2). We then use the compiled demographic data to acquire any needed government-issued numbers, such as (d) Social Security numbers and (e) driver’s license data. In the second approach, the attacker acquires all the data needed from clandestine sources on the (f) darknet or elsewhere on the (g) deep web. Our lists of available sources and any data quality issues, access requirements, or costs appear in the following subsections.

Results for Step 3: Data Sources: (a) Voter Lists

“Voter lists” contain the name, personal demographics, and voting history of each registered voter. As an example, Table 5 lists the fields of data available in the South Carolina voter list; the list includes name, date of birth, gender, race, address, and voting history for each registered voter. Government election offices tend to share some (or all) of these fields widely to help assure the integrity of the voting franchise. Listing registered voters by name with demographic information makes the voting process more transparent and election results more accountable. An attacker, seeking to impersonate voters on websites, might first acquire personal information about registered voters.

The local election office is usually responsible for the integrity of the local voting list. Voting lists are almost always available at the local or county level. As described earlier, statewide systems have emerged, and so state election offices control access to statewide voting lists and may also provide a copy of the voting list for a given county. Therefore, there are usually two government sources of voter lists, one at the local or county level and another at the state level.

An attacker who wants to target a specific municipality could acquire the voting list from the local election office or from the state election office. Access requirements and costs often vary by municipality, county, or state for the same data. So, shopping around can make a difference. Of course, if the attacker needs access to the entire state’s voter list, then assembling the list using local sources could be onerous, as each locality would have to be contacted separately and each source may have a different process for obtaining a voter list. On the other hand, obtaining voter data from a local election office is usually easier and less restricted than obtaining voter data at the state level. Similarly, if statewide data are expensive, acquiring voter data through local sources may cost less. As an example, we acquired the voter list for the city of Cambridge, Massachusetts for $20, even though the state of Massachusetts would not directly provide the statewide voter list to us.

In this writing, we focus on statewide voter lists for convenience. That is, our investigation surveyed access and costs to obtain statewide voter lists. Acquiring voter data from local municipalities may yield different results.

Access to Voter Lists. Whether a person or organization can acquire a voter list directly from a state election office depends on the state and the stated purposes of the recipient. According to U.S. Voter List Information at the United States Election Project [66] and our own independent survey, state offices for 30 (83 percent) of the 36 websites listed in Figure 4 give or sell their voter lists to anyone in the public, although restrictions exist on acceptable uses. State offices in 23 (64 percent) of the 36 states give or sell their voter lists to anyone in the general public, provided the recipient attests to not using the data for commercial purposes [66]. State offices in 6 of the 36 states, namely Illinois, Kentucky, Maryland, Massachusetts, Minnesota, and South Carolina, do not give or sell their voter list to members of the public but they do give or sell voter lists to specific groups. For example, state election offices in Illinois, Kentucky, Massachusetts, and South Carolina give or sell their voter lists to political campaigns and political committees. State offices in Maryland and Minnesota sell their voter lists to political campaigns and political committees for non-research political purposes. Kentucky and Massachusetts election offices additionally give or sell their voter lists to non-profit organizations. South Carolina’s election office does not provide data to the public generally, but it does provide its voter list to any registered South Carolina voter. Table 6 details access restrictions by state.

To summarize access to government sources of voter data: an attacker in 2016 could request voter lists from state offices in 30 (83 percent) of the 36 states that are the subject of this study. In the other 6 states, the attacker would have had two options: collude with a political committee, non-profit organization, or a registered voter for access to the statewide government offering, or request copies of localized lists from municipal or county offices.

Table 5. Data fields in the South Carolina statewide voter list, which includes the Voter ID, name, date of birth, address, gender, race, voting history, voter status, and precinct information for each registered voter.

Table 6. Access requirements for statewide voter lists in the 35 states and D.C. from Figure 4. States have requirements that vary by recipient: any person or organization, a state resident, a non-profit organization, or a political committee or political campaign. States also vary access requirements by data uses. Non-commercial use means the sale or resale of the data is prohibited. Political purposes include election activities, political activities, and voter registration. Primary source: U.S. Voter List Information at the United States Election Project [66].

Some voter lists are freely available online for anyone to download. For example, the state of Ohio allows anyone to download the most recent Ohio voter list online at no charge [48]. Some of those who acquire voter lists from government offices subsequently host the data online for public use. For example, websites exist where anyone can download recent copies of statewide voter lists from Colorado [67], Connecticut [68], Delaware [69], Michigan [70], Rhode Island [71], Utah [72], and the District of Columbia [73].

Data brokers, such as Aristotle [49], Catalist [74], L2 [50], e-Merges [75], and Nation Builder [76], specialize in selling voter data (these are called “voter list brokers” in this writing). These companies offer national databases as well as statewide versions or extracts by county or voting district. Voter list brokers add value to statewide voter lists in a variety of ways. They usually append additional information on each voter, such as more demographic data, lifestyle facts, hobbies, interests, email addresses, and phone numbers. They standardize the data format because each original source of voter data, whether a local municipality or a state election office, has its own format for storing a voter list. For example, the date of birth, April 5, 1945, might be stored as 19450405 in one voter list and as 05-APR-1945 in another. The order of fields and which fields are present also vary. These differences require time-consuming standardization of the selection and arrangement of fields and the format of values within each field in order for computer programs to use the data from multiple sources. Voter data from a voter list broker has a consistent, standard format. Several voter list brokers also offer advanced computing services on the data, such as geo-coding, specialized sorts, mapping, and analyses. An attacker can acquire voter information for any municipality in the United States from a voter list broker.
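To illustrate the kind of format standardization described above, the following minimal Python sketch converts the two date-of-birth formats given in the example into a single representation. The function name `normalize_date` and the list of formats are assumptions made for illustration; actual voter lists may use other formats, and this is not the process any particular broker uses. Month-name parsing here assumes an English locale.

```python
from datetime import date, datetime

# The two formats from the example in the text: 19450405 and 05-APR-1945.
# Assumption: a real standardization step would need to cover more formats.
KNOWN_FORMATS = ("%Y%m%d", "%d-%b-%Y")

def normalize_date(raw: str) -> date:
    """Parse a date string in any of the known formats into a date object."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")

if __name__ == "__main__":
    for value in ("19450405", "05-APR-1945"):
        print(value, "->", normalize_date(value).isoformat())
```

Both inputs from the example resolve to the same date, which is what allows records from differently formatted lists to be compared or merged.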

To summarize access to voter data: An attacker in 2016 could acquire statewide voter lists for all 36 states from government offices and/or voting list brokers.

Prices for Voter Lists. First, we surveyed the price of voter lists from government sources. Table 7 reports the total amount per state as well as a per voter cost. Offices in Massachusetts, New York, Ohio, Vermont, and Washington provided the data for free to qualifying groups. The median price was $180 and the average $2,619, with a standard deviation of $7,786. Arizona had the most expensive list at $34,006. All costs were surveyed in 2016.

Next, we surveyed the price of voter lists for the 36 states from 3 voter list brokers. Aristotle offered a rate of 3 cents per voter on its website [49] and sold statewide voter lists at a negotiable rate. L2 had a slightly higher per voter rate of $0.032 on its website [50], and its website in January 2016 offered a flat rate of $2,000 per state for any statewide voter data. The e-Merges website [75] offered a statewide voter list with prices ranging from $350 for D.C. to $10,000 for Arizona, with a median price of $750 and an average price of $2,186.

Table 7. Prices for statewide voter lists in the 35 states and D.C. when provided directly from respective state election offices (“authoritative” or “government” source), with the number of registered voters for each state and the per voter cost. Recent copies of government-issued voter lists are freely available indirectly at third-party websites. The most recent years of data available from indirect sources are shown. References appear in []’s.

An attacker could price shop. Suppose an attacker wanted to target all 36 websites. (1) He could purchase statewide data from a voter list broker for $2,000 per state, for a total of $72,000. (2) Alternatively, he could purchase statewide data directly from government sources or indirectly through qualified groups for a total of $94,279, which he could lower to $92,820 by using third-party websites that offer free downloads of recent lists as available. Or (3) he could strategically reduce the cost to $17,679 by downloading freely available voter lists from third-party websites and purchasing lists from government sources for states having prices $2,000 or less and from voter list brokers for all others. Table 8 summarizes costs under all three plans.
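The three plans differ only in the rule used to pick a source for each state. The sketch below expresses those rules in Python, using the $2,000 flat broker rate quoted earlier. The per-state prices in `sample` are hypothetical placeholders and are not the surveyed prices from Table 7; the type and function names are likewise illustrative.

```python
from typing import NamedTuple

BROKER_FLAT_RATE = 2_000  # flat per-state broker price quoted in the text

class ListOffer(NamedTuple):
    government_price: float   # price when bought from the state office
    free_download: bool       # True if a recent copy is free on a website

def plan_broker_only(offers: dict) -> float:
    """Plan (1): buy every statewide list from a broker at the flat rate."""
    return len(offers) * BROKER_FLAT_RATE

def plan_government(offers: dict, use_free_downloads: bool) -> float:
    """Plan (2): buy from government sources, optionally using free copies."""
    return sum(
        0 if (use_free_downloads and o.free_download) else o.government_price
        for o in offers.values()
    )

def plan_cheapest(offers: dict) -> float:
    """Plan (3): free copy if available; else government if $2,000 or less;
    else the broker flat rate."""
    total = 0.0
    for o in offers.values():
        if o.free_download:
            continue
        total += min(o.government_price, BROKER_FLAT_RATE)
    return total

# Hypothetical placeholder prices, for illustration only.
sample = {
    "State A": ListOffer(government_price=150, free_download=True),
    "State B": ListOffer(government_price=1_500, free_download=False),
    "State C": ListOffer(government_price=30_000, free_download=False),
}

if __name__ == "__main__":
    print("broker only:", plan_broker_only(sample))
    print("government:", plan_government(sample, use_free_downloads=False))
    print("government + free:", plan_government(sample, use_free_downloads=True))
    print("cheapest mix:", plan_cheapest(sample))
```

With 36 entries, `plan_broker_only` returns 36 × $2,000 = $72,000, matching plan (1) above. The other totals in the text depend on the surveyed prices in Tables 7 and 8.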

Alternatively, an attacker could opportunistically target websites based on available funds. For $219, an attacker could acquire voter data for 18 (50 percent) of the 36 websites from freely available third-party websites and states that charge less than or equal to $100 per list. These 18 states are: Alaska, California, Colorado, Connecticut, Delaware, District of Columbia, Massachusetts, Michigan, Minnesota, Missouri, New Jersey, New York, Ohio, Pennsylvania, Rhode Island, Utah, Vermont, and Washington. See the rightmost column of Table 8. For $957, an attacker could purchase voter lists from 21 (or 58 percent) of the 36 states; see the rightmost column in Table 8 for voter lists that cost less than or equal to $250 each. Finally, for $4,407, an attacker could acquire voter data for 29 (81 percent) of the 36 websites; see Massachusetts through Utah in the rightmost column of Table 8 for voter lists that cost less than or equal to $500 each.
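The budget tiers above come from selecting every state whose least expensive list price falls at or below a cap and then summing those prices. A minimal sketch of that selection follows; the prices in `least_price` are hypothetical placeholders rather than the values in Table 8, and the function name is illustrative.

```python
def coverage_under_cap(prices: dict, cap: float) -> tuple:
    """Return (states priced at or below cap, total cost of those states)."""
    chosen = sorted(state for state, price in prices.items() if price <= cap)
    return chosen, sum(prices[state] for state in chosen)

# Hypothetical least-expensive price per state, for illustration only.
least_price = {
    "State A": 0,
    "State B": 20,
    "State C": 90,
    "State D": 240,
    "State E": 480,
    "State F": 2_000,
}

if __name__ == "__main__":
    for cap in (100, 250, 500):
        states, total = coverage_under_cap(least_price, cap)
        print(f"cap ${cap}: {len(states)} states for ${total}")
```

Applied to the rightmost column of Table 8, caps of $100, $250, and $500 correspond to the 18-, 21-, and 29-state tiers described in the text.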

The Texas website asks for a voter identification number, which is a field in the voter data acquired from Texas. At $1,272, the State of Texas is therefore the least expensive named source of both its voter data and the Texas voter identification number.

Table 8. Prices of statewide voter lists in the 35 states and D.C. from named sources, sorted by cost. Column (a) lists prices for voter lists obtained directly from state government offices. Column (b) lists “$0” if the voter list could be obtained as a free download, either from the state or from a third-party website; otherwise, it lists the price from the state government office. Column (c) lists “$0” if the voter list could be obtained as a free download. If not, it lists the government price when that price is $2,000 or less, and the voter list broker price when the government price is more than $2,000. In short, column (c) reflects the least expensive option among a free download, a state government office, and a voter list broker. See Table 7 for more details. The least expensive option is (c) at $17,678.50 for data on all voters in the 35 states and D.C.

Results for Step 3: Data Sources: (b) Dates of Birth

We purchased voter lists from government offices and from voter list brokers. We downloaded voter lists from third-party websites. We reviewed descriptions that government offices and voter list brokers provided of their voter lists. Finally, we constructed a national voter database to determine which fields of personal information appear in voter lists.

First, we surveyed the fields of voter lists provided from government sources. Twenty-three (64 percent) states provided names and all the demographic information –namely, date of birth, gender, address– needed to impersonate voters at their websites. These states were: California, Connecticut, Illinois, Kansas, Kentucky, Maryland, Massachusetts, Missouri, Nebraska, Nevada, New Jersey, New Mexico, New York, Ohio, Oregon, Pennsylvania, Rhode Island, South Carolina, Texas, Utah, Virginia, Washington, and West Virginia (see path 3A on Figure 8). Some of these state websites still required additional government issued numbers –namely, driver’s license data and Social Security numbers– as described later.

Thirteen states had websites that required an attacker to enter a voter’s date of birth but provided voter lists that did not include dates of birth. These states were: Alaska, Alabama, Arizona, Colorado, Delaware, Georgia, Hawaii, Indiana, Louisiana, Michigan, Minnesota, Vermont, and the District of Columbia (see path 3B on Figure 8). The voter lists from some states, e.g., Alabama and Louisiana, provided age. Other states, e.g., Colorado, Delaware, Michigan, Minnesota, and Vermont, provided year of birth. Path 3 on Figure 8 has a state-wise breakdown.

When we surveyed the fields of data provided by voter list brokers, we found similar results. Voter list brokers provided data for 24 (67 percent) states that contained all the name and demographic information needed to impersonate voters at those state websites. The states were: Alabama, Arizona, California, Connecticut, Illinois, Indiana, Kansas, Kentucky, Maryland, Massachusetts, Nebraska, Nevada, New Jersey, New York, Oregon, Pennsylvania, Rhode Island, South Carolina, Texas, Utah, Virginia, Washington, the District of Columbia, and West Virginia (see path 1A on Figure 8).

Voter list brokers provided data for 11 states that did not have the date of birth information needed at the states’ websites. These states were: Alaska, Colorado, Delaware, Georgia, Hawaii, Louisiana, Michigan, Minnesota, New Mexico, Ohio, and Vermont (see path 1B on Figure 8).

We surveyed data brokers to acquire missing date of birth information. There are many kinds of data brokers. Earlier in this writing, we examined data brokers that specialized in voter data and termed them voter list brokers. In this section, we introduce data brokers that specialize in compiling dossiers from disparate publicly available personal information; we term these “public data brokers” in this writing.

Public data brokers tend to gather and merge personal data from diverse public sources, including some combination of: (1) population registries, e.g., marriage records, divorce records, marriage license records, driving records, phone number directories, professional licenses, weapons permits, and voter registrations; (2) court records, e.g., civil filings, civil actions, bankruptcies, liens, and judgments; (3) property records, e.g., mortgage records and real estate property and owner information; (4) criminal records, e.g., arrests and warrants, mug shots, DUI (driving under the influence of alcohol) records, restraining orders, misdemeanors and felonies, convictions and incarcerations, and driving violations; and, (5) online and social media, e.g., email addresses, and personal references appearing on Facebook, Flickr, Twitter, Google+, MySpace, Kickstarter, LinkedIn, YouTube, Picasa, Pinterest, Reddit, Gravatar, Ancestry, Klout, Instagram, Amazon, Vimeo, and Foursquare.

A simple Google search of a voter’s name, city, and state (and optionally “date of birth”) revealed links to public data brokers that optimize part of their database content for web searches. We harvested links from search results and ads to compile a list of 15 public data brokers: People Smart (www.peoplesmart.com), BeenVerified (www.beenverified.com), Spokeo (www.spokeo.com), PublicRecords (publicrecords.directory), Radaris (radaris.com), Intelius (www.intelius.com), People by Name (www.peoplebyname.com/people/), Public Data Check (publicdatacheck.com), Truth Finder (www.truthfinder.com), Spyfly (spyfly.com), Instant Checkmate (www.instantcheckmate.com), White Pages (www.whitepages.com), People Finders (www.peoplefinders.com), Persopo (persopo.com), and DOB Search (dobsearch.com). In this writing, we consider these “popular public data brokers”.

Freely available search results on 13 (87 percent) of the 15 popular public data brokers show a voter’s age when the search contains the voter’s name, city, and state. The other two public data brokers provide more detailed birth information. BeenVerified sometimes reports the full date of birth and other times reports only age in free searches. People by Name reports the month and year of birth in free searches. Eleven (or 73 percent) of the 15 public data brokers claimed to provide the date of birth for a fee as part of a complete dossier on the person. Prices for one month of unlimited searches ranged from $8.99 (DOB Search) to $34.78 (Instant Checkmate), with an average monthly price of $26.01.

In short, for less than $30, an attacker could search for known voters at popular public data broker websites in order to append date of birth information to voter lists. (See a later section on automation, Results for Step 5. Test automation, for examples of search automation.)

Results for Step 3: Data Sources: (c) Inferring Gender

Gender is another field an attacker needs in order to impersonate voters at the websites for Hawaii, Louisiana, Missouri, New Mexico, South Carolina, and Washington (see Table 3). Voter lists acquired from government sources included gender.

Data from voter list brokers included gender for all the state websites that require it except for Missouri. There are several strategies an attacker could have deployed to enter gender on the Missouri website. One option is for an attacker to enter the same gender for all voters whose address the attacker wants to change. Missouri’s system would presumably not change the addresses of those voters for whom the wrong gender was entered but would change the addresses of those voters for whom the correct gender was entered. So, about half the attempted record changes would succeed.

As a second option, an attacker can use lists of common male and female first names (e.g., [113]). When a first name on the voter list agrees with a name most often associated with one gender, the attacker enters the respective gender. The attacker can ignore all voters whose first names are not on the list or whose first name is not easily distinguishable.

A final option is for an attacker to try one gender, and if it fails for a voter, retry with the other gender. In this case, some attempts to change a voter address would take two tries to be successful. This strategy can also work with eye color. If eye color is a required field, the attacker can attempt multiple changes until successful, since there are a limited number of eye color options. The attacker could pursue any of these three options by manual or automated means.

In short, an attacker could have acquired all the demographic information needed to impersonate a voter on any or all of the 36 websites listed in Figure 4 by obtaining some combination of voter lists and date of birth information from public data brokers, and, for the Missouri website, inferring gender.

Figure 8 summarizes the attacker’s data acquisition requirements for each state. An attacker could have acquired statewide voter lists for any of the 35 states and D.C. from either a voter list broker (Path 1) or a government source (Path 3).

Voting lists from a voter list broker provided the names and personal demographics required for the websites of Alabama (AL), Arizona (AZ), California (CA), Connecticut (CT), Illinois (IL), Indiana (IN), Kansas (KS), Kentucky (KY), Maryland (MD), Massachusetts (MA), Nebraska (NE), Nevada (NV), New Jersey (NJ), New York (NY), Oregon (OR), Pennsylvania (PA), Rhode Island (RI), South Carolina (SC), Texas (TX), Utah (UT), Virginia (VA), Washington (WA), District of Columbia (DC), and West Virginia (WV); these are identified on path 1A in Figure 8. No further demographic information was necessary for these states. However, gender was needed for Missouri, and date of birth was needed for the other 11 states. A public data broker could have provided the missing date of birth information, as identified by path 1B on Figure 8. Gender could be inferred from first names (see path 1C on Figure 8).
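As a consistency check on the broker-side breakdown, the three groups named in the text (path 1A, path 1B, and Missouri on path 1C) should partition the 36 websites. The sketch below assumes the standard two-letter postal abbreviations for the state names listed in this section; the variable names are illustrative only.

```python
# Path 1A: broker lists were sufficient on their own (24 jurisdictions)
path_1a = {"AL", "AZ", "CA", "CT", "IL", "IN", "KS", "KY", "MD", "MA", "NE", "NV",
           "NJ", "NY", "OR", "PA", "RI", "SC", "TX", "UT", "VA", "WA", "DC", "WV"}

# Path 1B: broker lists lacked the date of birth the website required (11 states)
path_1b = {"AK", "CO", "DE", "GA", "HI", "LA", "MI", "MN", "NM", "OH", "VT"}

# Path 1C: broker list lacked gender
path_1c = {"MO"}

groups = [path_1a, path_1b, path_1c]

# No jurisdiction may appear in more than one group ...
overlaps = (path_1a & path_1b) | (path_1a & path_1c) | (path_1b & path_1c)
# ... and together the groups should cover all 36 websites.
covered = set().union(*groups)

print(len(path_1a), len(path_1b), len(path_1c))  # 24 11 1
print(len(covered), sorted(overlaps))            # 36 []
```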

Similarly, voting lists from government sources provided the names and personal demographics required for the websites of California (CA), Connecticut (CT), Illinois (IL), Kansas (KS), Kentucky (KY), Maryland (MD), Massachusetts (MA), Missouri (MO), Nebraska (NE), Nevada (NV), New Jersey (NJ), New Mexico (NM), New York (NY), Ohio (OH), Oregon (OR), Pennsylvania (PA), Rhode Island (RI), South Carolina (SC), Texas (TX), Utah (UT), Virginia (VA), Washington (WA), and West Virginia (WV), as identified by path 3A on Figure 8. No further demographic information was necessary for these states. However, date of birth was needed for the remaining 12 states and the District of Columbia. A public data broker could provide the missing date of birth information, as identified by path 3B on Figure 8.

Figure 8. Four paths for an attacker to acquire the names and demographics needed to impersonate voters on the 36 state websites listed in Figure 4. Paths 1, 2, and 4 follow data from a voter list broker or darknet or deep web source. Path 3 follows data from government sources. Path 1A identifies states (abbreviations AL to WV) for which voter lists acquired from a broker were sufficient in 2016. Path 1B identifies states (abbreviations AK to VT) for which voting lists from brokers augmented with date of birth information were sufficient. Path 1C specifies the need to add gender to a Missouri (MO) voter list acquired from a broker. Path 3A identifies states (abbreviations CA to WV) for which voter lists acquired from government sources were sufficient. Path 3B identifies states (AK to VT) for which voting lists from government offices augmented with date of birth information were sufficient. Alternatively, names and demographics are available with SSNs or driver’s license data on the darknet or deep web, as shown by Path 4A. Paths continue on Figure 9, which shows ways to append any needed government identifiers.

Figure 9. Continuation of paths 1 through 4 from Figure 8 to additionally acquire any SSNs or driver’s license data needed to impersonate voters on the 36 state websites listed in Figure 4. In Figure 8, Paths 1 through 4 show how names and demographic information might be acquired from a voter list broker, government source, or the deep web or darknet. Path 5 applies to states that need driver’s license data and SSNs, and Path 6 to states that need driver’s license data only. As shown in Path 7A, two states required no additional information. Path 5A concludes with SSNs acquired from an identification broker for 6 states (AZ to VT) and the ELSE requirement for 4 states (DC to VT). Path 5F provides SSNs for the same states as does path 5A, except that the SSNs come from sources on the deep web or darknet. Driver’s license data is the driver’s license number and issue date. Paths 5B and 6A conclude the requirements for 32 states (AK to WV) with driver’s license data from an identification broker. Path 6B satisfies NE requirements using a government website. Paths 5E and 6D satisfy requirements for 5 states using an algorithm. Paths 5D and 6C provide driver’s license data for the same states as do paths 5B and 6A, except that the data come from sources on the deep web or darknet. * indicates an OR requirement (Table 2); ** states also need the driver’s license issue date, which is not included in the algorithmic computation.

Results for Step 3: Data Sources: (d) Social Security Numbers

None of the voter lists acquired from government sources or from voter list brokers included Social Security numbers. We surveyed other kinds of data brokers as a means for an attacker to obtain the Social Security numbers of named voters needed on the 17 (47 percent) websites that require SSNs.

Voter list brokers, as described earlier, provide voting lists and related services. One voter list broker may be better than another based on how recent or complete the data may be and added fields of information. Public data brokers, as described earlier, gather public records and produce dossiers on individuals primarily from publicly available information. One public data broker may be better than another based on sources of information and the ability to correctly link information about the same person across disparate public sources. Most sources of public records merely record the person’s name and address; some include the person’s date of birth. Therefore, attempts to construct personal dossiers from public records necessitate matches of name, address and date of birth. False and missed matches are common.

One way to dramatically improve linkages is to incorporate sources that record Social Security numbers and driver’s license numbers. Often these are financial, insurance, employment, or credit-related sources of personal information. In this writing, we use the term “identification broker” to refer to a data broker that provides personal information that includes Social Security numbers or driver’s license numbers. Of course, the fact that a data broker has or uses Social Security numbers and driver’s license numbers internally does not necessarily mean that the broker provides or sells those numbers to others. We reserve the term “identification broker” for a broker that provides either Social Security numbers, driver’s license numbers, or both for named individuals. Identification brokers often provide services related to verifying identity, such as verifying information on rental applications or performing background employment checks.

Any particular data broker may offer a suite of data options; therefore, a broker may be a voter list broker, a public data broker, and an identification broker, or some combination of these.

To locate identification brokers, we started with previously assembled lists. In prior work, Pam Dixon at the World Privacy Forum [114] and Julia Angwin at ProPublica [115] itemized data brokers who held information on millions of Americans as part of their own separate studies. The World Privacy Forum listed 347 brokers [114], and ProPublica listed 201 sources [115]. We combined these lists to obtain 426 distinct brokers, to which we added 5 popular data brokers from above that were on neither list. We also added 23 other data brokers we learned about during our investigations. Our list began with 454 brokers.

We surveyed the websites of each of the 454 brokers and recorded any claims made to provide dates of birth, Social Security numbers or driver’s license numbers for Americans, and the price for doing so. We then made purchase agreements with 5 brokers to verify available content.

Thirty-five of the 454 data broker websites were defunct, and 20 websites no longer worked. That reduced the total number of operational data broker websites to 399.
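The broker counts reported above can be re-derived with simple arithmetic. In the sketch below, all input figures are taken from the text; the overlap between the two source lists is an inferred quantity that assumes neither list contained internal duplicates.

```python
# Figures reported in the text
world_privacy_forum = 347
propublica = 201
combined_distinct = 426
popular_brokers_added = 5
other_brokers_added = 23
defunct = 35
not_working = 20

# Inferred: brokers appearing on both source lists
# (assumes no duplicates within either list)
overlap = world_privacy_forum + propublica - combined_distinct

starting_list = combined_distinct + popular_brokers_added + other_brokers_added
operational = starting_list - defunct - not_working

print(overlap)        # 122
print(starting_list)  # 454
print(operational)    # 399
```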

The websites of 79 data brokers offered Social Security number services, and the websites of 49 data brokers provided driver’s license number services. Fifty additional data brokers, beyond those that provide SSNs or driver’s license numbers, advertised an ability to provide date of birth information on named Americans.

Of the 79 data broker websites offering Social Security number services, 14 offered the SSNs of decedents only. Two data broker websites reported whether a given Social Security number had been assigned and if so, whether the recipient was deceased. Thirty-seven (37) data broker websites verified matches between Social Security numbers and names. Twenty-two (22) websites provided the ability to associate a name with a given Social Security number or a Social Security number with a given name and address or birthdate.

Of course, the last group, the 22 broker websites that provide an SSN in response to a search of a voter’s name and address or to a search additionally including the voter’s date of birth or age, would be the most convenient service for an attacker. Prices varied widely, making it cost prohibitive to try each of the 22 data brokers. However, we tested 5 broker websites and found SSNs correctly associated with the searched names in all cases. (See a later section on automation, Results for Step 5. Test automation, for examples of search automation.) The most we paid was $1 per search; the least expensive option was a monthly fee of $39.95 for an unlimited number of searches (seemingly for a single machine).

The Gramm-Leach-Bliley Act (GLBA) is a federal law enacted in the United States to control the ways that financial institutions handle the personal information of individuals [116]. It covers the kind of personal information, including Social Security numbers, that individuals provide to open a checking account, acquire an insurance policy, apply for a loan, or request a line of credit. If a data broker uses information from a GLBA source, then any user who accesses the data broker’s data must attest to using the information for a “permissible purpose.” Permissible purposes under the GLBA are: (a) to protect against or prevent actual or potential fraud, unauthorized transactions, claims, or other liability; (b) required institutional risk control, or for resolving consumer disputes or inquiries; (c) use by persons holding a legal or beneficial interest relating to the consumer; or, (d) to comply with Federal, State, or local laws, rules, and other applicable legal requirements. Most data broker websites list the permissible purposes and request website users to mouse-click on the purpose that applies. Self-attestation is sufficient with no further validation. As researchers in this study, we selected (a), as would an individual looking up their own information. Of course, “disenfranchising voters” is not listed; an attacker would have to claim one of these purposes in order to proceed at sites requiring a GLBA selection.

Acquiring SSNs, when combined with the acquisition of demographics we described earlier, completes the requirements for Colorado, Delaware, Kentucky, Missouri, and Virginia; see Table 3. If an attacker wanted to focus on voters who were unlikely drivers, such as the elderly or those whose address was a care institution, then acquiring SSNs would also complete the requirements for Minnesota, Pennsylvania, Vermont and the District of Columbia. These are summarized on Figure 9 as path 5A.

For completeness, we now discuss ways some of the other SSN services could have helped an attacker to impersonate voters, and we identify which other SSN services would not have been beneficial to an attacker. A service based solely on decedent information is not useful because the attacker would only be interested in live voters. On the other hand, searches by SSNs could have helped the attacker in cases where the service permitted an unlimited number of searches at a reasonable price. SSNs assigned prior to 2011 follow an encoded format that allows predictions about the first 5 digits of the 9-digit number [117]. For voters born after 1987, more of the 9 digits may be predicted from birthdate and birthplace [118]. With some work, an attacker may predict digits of a voter’s SSN by assuming the voter was born in the same state. The attacker would then exhaustively search numbers having the same first 3 or 5 digits on the SSN website to determine which of these SSNs are assigned to living people. Then, the attacker would search those numbers assigned to living people to see which match a given voter’s name. This approach could be automated to scale performance (see section Results for Step 5. Test automation for examples). However, the direct search approach, described in the prior paragraphs, provides more SSN-to-voter matches with far less effort.

Results for Step 3: Data Sources: (e) Driver’s License Data

Recall, 33 (92 percent) of the 36 websites require driver’s license data. We surveyed: (1) the use of SSNs as driver’s license numbers; (2) state websites that revealed driver’s license numbers; (3) data brokers that provided driver’s license data; and (4) encodings of personal demographics to construct driver’s license numbers. We summarize per state findings in Figure 9.

Several sources (e.g., [119]) have reported that some states still use a person’s SSN as a driver’s license number. These reports cite Arizona, Indiana, Kansas, Massachusetts, Missouri, Nevada, New Mexico, Ohio, and Virginia. We tested each and found SSNs used as driver’s license numbers, and then only in some cases, in Arizona alone. We therefore appended Arizona to path 5A in Figure 9, because in those cases the attacker would only need SSNs.

We surveyed state websites that allowed an individual to inquire about her own driving record or change the address on her driver’s license. Our search strings were “<state> driver’s license status,” “<state> driver’s license verify,” “<state> driver’s license check,” and “<state> driver’s license record,” where <state> is the name of the state that is the subject of the query. We found such websites for 34 of the 36 states; Hawaii and Vermont were the exceptions. We then examined whether these websites reveal a person’s driver’s license number after one enters the person’s name and demographics. Of the 34 websites tested, only the website for Nebraska revealed the driver’s license number if an attacker entered a person’s SSN [120]. There was no cost for the service. This is summarized on Figure 9 as path 6B.

Of the 454 data broker websites we surveyed for SSN services, 49 offered driver’s license services. But only 11 of these provided a driver’s license number and issuance date in response to a search of a voter’s name and address or date of birth. Monthly subscriptions ranged from $39.95 to $59.95, and some websites charged an additional $1, $5, or $10 per lookup. The $39.95 service provided unlimited searches (seemingly for a single machine) that included SSNs and driver’s license numbers and charged $0.01 per record for further details, such as driver’s license issue date. We sampled the least expensive of these and found accurate information. (For completeness, we report that the other 38 data broker websites provided driving reports that listed drunk driving convictions or other driving record information in response to a given driver’s license number.)

Combining driver’s license data from an identification broker with the acquired demographic information described earlier would have provided an attacker with all the data needed to impersonate voters on the websites of Alabama, Colorado, Connecticut, Delaware, Georgia, Illinois, Indiana, Kansas, Louisiana, Massachusetts, Minnesota, Nebraska, Oregon, Pennsylvania, Rhode Island, Utah, Vermont, Washington, and the District of Columbia. These are summarized on Figure 9 along path 6C.

Similarly, when further combined with the SSN information described in the prior subsection, an attacker would have acquired all the data needed to impersonate voters on the websites for Alaska, Arizona, California, Hawaii, Maryland, Michigan, Nevada, New Jersey, New Mexico, New York, Ohio, South Carolina, Texas and West Virginia. These are summarized on Figure 9 along path 5D.

A driver’s license number in some states is an encoding of a person’s name and date of birth [46], [47]. The relevant states are: Illinois, Maryland, Michigan, Minnesota (licenses issued prior to 2005), Nevada (prior to 1998), New Jersey (prior to 2004), New York (prior to 1992), and Washington. We used information on a random sample of 10 drivers from each state to see whether the encodings remained in use.

Driver’s license number encodings often rely on Soundex to phonetically encode a name into a 4-character code [46]. The Soundex code for a name consists of a letter followed by three digits [121]. The letter is the first character of the person’s name, and the digits encode the remaining consonants. Similar-sounding consonants share the same digit. Figure 10 shows the Soundex algorithm. Various programs exist on the Web to produce Soundex encodings (such as [122]; some online programs are specific to producing driver’s license numbers that use Soundex, such as [47]).

Figure 10. Soundex algorithm for phonetically encoding names to a 4-character code used in some driver’s license numbers [46]. “Robert” and “Rupert” are both R163, and “Ashcroft” is A261.
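For readers without access to Figure 10, the following is a minimal sketch of Soundex, assuming the standard American Soundex rules (similar-sounding consonants share a digit, vowels separate repeated codes, and H and W do not). The figure itself may present the steps differently; the function name `soundex` and the table layout here are illustrative choices.

```python
# Consonant groups of standard American Soundex; vowels, H, W, and Y carry no digit.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the 4-character Soundex code (one letter, three digits) for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")   # its digit still suppresses an adjacent repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = _CODES.get(ch, "")       # vowels and Y map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]         # pad with zeros or truncate to 4 characters


for example in ("Robert", "Rupert", "Ashcroft"):
    print(example, soundex(example))
# Robert R163
# Rupert R163
# Ashcroft A261
```

The three printed codes agree with the examples given in the Figure 10 caption.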

An Illinois driver’s license number is based on the driver’s name and birth date in the format SSSS-FFFY-YDDD, where SSSS is the Soundex code of the driver’s last name, FFF requires looking up the first name and middle initial in a table (see Appendix A [123]), Y-Y is the last two digits of the year of birth, and DDD is the calculation: (birth_month - 1) * 31 + birth_day + (male:0, female: 600). Online programs can compute the number automatically (e.g., [124]). All of the driver’s licenses in our sample matched the encoded format. Therefore, an attacker could produce Illinois driver’s license numbers from demographics, so we recorded Illinois on path 6D of Figure 9.

Maryland, Michigan, and Minnesota (in licenses issued prior to 2005) use the same driver’s license number encoding of the driver’s name and birth date in the format LLLL-FFF-MMM-BBB, where LLLL is the Soundex code of the driver’s last name, FFF is an encoding of the driver’s first name based on a table, MMM encodes the driver’s middle name based on a table, and BBB is the driver’s birthday and month based on a table (see Appendix A [125]). Then, for Michigan, drop the last two digits. Online programs can compute the number automatically (e.g., [126] for Maryland, [127] for Michigan, and [128] for Minnesota). All of the driver’s licenses in our sample matched the encoded format for Maryland and Michigan, but not for Minnesota. Therefore, an attacker could produce Maryland and Michigan driver’s license numbers from demographics, so we recorded Maryland and Michigan on path 5E of Figure 9.

If issued before 1998, the Nevada driver’s license number was based on the driver’s SSN and year of birth—specifically, take the SSN, multiply by 2, and then add 2,600,000,001 [129]. Append the two-digit year to the end. The result is a 12-digit value. Nevada driver’s licenses issued after 1998 have a two-digit code identifying the issuing office followed by 8 digits. We found none of the older encoded values in our sample.

If issued before 2004, the New Jersey driver’s license number is based on the driver’s name, birth date, gender, and eye color [130]. The format is Axxxx fffmm MMyye, where A is the first letter of the driver’s last name, xxxx is the Soundex encoding of the driver’s last name starting with the second letter, fff is the 3-digit part of the Soundex encoding of the driver’s first name, mm is an index for the driver’s middle initial (A=61, …, Z=89, none=00), MM is the month of birth and gender (01=Jan male, …, 12=Dec male, 51=Jan female, …, 62=Dec female), yy is the last 2 digits of the year of birth, and e is eye color (1=Black, 2=Brown, 3=Grey, 4=Blue, 5=Hazel, 6=Green). All of the driver’s licenses in our sample matched the encoded format with guesses for eye color. Therefore, an attacker could produce New Jersey driver’s license numbers from demographics, so we recorded New Jersey on path 5E of Figure 9. Of course, guessing eye color would require more attempts.

If issued before 1992, the New York driver’s license number is based on the driver’s name and birthdate [131]. The first character is the first letter of the last name, and the next 12 digits encode the first three letters of the first name (F1, F2, F3), the middle initial (M1), and the second through fifth letters of the last name (L2, L3, L4, and L5). Each letter has a value (1=A, …, 26=Z, 0 otherwise). The length of the last name is X. Compute: 10,017,758,323 L2 + 371,538,441 L3 + 13,779,585 L4 + 510,355 L5 + 19,657 F1 + 729 F2 + 27 F3 + M1 – X. If the result is not 12 digits long, add zeros to the left side as needed. Then, the next three digits use the driver’s birth month (m=1,…,12), birth day (d=1,…,31), and gender (g: 0=male, 1=female) to compute: 63 m + 2 d + g. Pad the left side with zeros to make the number three digits long as needed. The last two digits are the driver’s year of birth. Finally, if more than one person would have the same number, another digit is added before the year of birth digits, seemingly assigned sequentially. Online programs can compute the number automatically (e.g., [132]). We found none of the older encoded values for New York in our sample.

The Washington driver’s license number is based on the driver’s name and birth date in the format LLLLLFMYYXmd, where LLLLL is the first five letters of the driver’s last name padded with asterisks (*) as needed, F is the initial of the driver’s first name, M is the initial of the driver’s middle name, YY is 100 minus the two digit year of birth, X is a checksum computed over values from an assignment of codes to prior computed characters (see Appendix A), m is the result of looking up the driver’s month of birth in a table of two possible values depending on whether a prior similar driver number has been assigned, and d is the result of looking up the driver’s day of birth in a table (see Appendix A [133]). Online programs can compute the number automatically (e.g., [134]). All of the driver’s licenses in our sample matched the encoded format. Therefore, an attacker could produce Washington driver’s license numbers from demographics, so we recorded Washington on path 6D of Figure 9.

In short, driver’s license numbers for Illinois, Maryland, Michigan, New Jersey and Washington could be computed from the voter demographics.

We have now shown multiple means by which an attacker could acquire all the personal data needed to impersonate voters at any of the 36 websites (Figure 4). Figure 8 summarizes the acquisition of needed demographics by state, and Figure 9 describes how to append Social Security numbers and driver’s license data as needed. All data sources used so far are named government offices, online websites, algorithms, and data brokers. In the next subsection, we examine how versions of the same data can be acquired through unknown sources on the deep web and darknet.

Results for Step 3: Data Sources: (f) Darknet

In November 2016, February 2017, June 2017, and July 2017, we surveyed darknet websites for offers that included the combinations of personal information that an attacker would need. We found the darknet to be an easily accessible, worldwide marketplace with numerous markets in which personal data on Americans was for sale. Buyers and sellers use anonymous, but persistent, identities with ratings based on purchase history. Payments are made using bitcoins, often escrowed by the market as a way to assure the integrity of purchases.

Our goal was to learn what kinds of darknet offers might have been available to a 2016 attacker. We did not seek to record the prevalence of personal information on darknet markets. That is, we were not interested in how many offers were available, but in what kinds of offers were available.

Our guidelines were simple. If an offer differed from all other offers we recorded by either the combination of personal information provided, the price, or the geography covered, we included that offer in our count. On the other hand, distinct offers for the same kind of personal information bundled in a similar manner with similar costs and for the same US geography, even if from different suppliers, were counted once.
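The counting rule above can be sketched in a few lines of Python. This is only an illustration, assuming each recorded offer is reduced to the set of fields offered, the price, and the geography covered. The `Offer` class, its attribute names, and the sample values are hypothetical and are not drawn from our archive. For simplicity, the sketch treats "similar" prices as equal prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    fields: frozenset  # kinds of personal information in the offer
    price: float       # advertised price in US dollars
    geography: str     # US geography covered, e.g. a state or "US"
    vendor: str        # supplier; ignored when comparing kinds of offers


def count_kinds(offers):
    """Count offers that differ in fields, price, or geography.

    Offers from different vendors with the same fields, price, and
    geography collapse into a single kind.
    """
    kinds = {(o.fields, o.price, o.geography) for o in offers}
    return len(kinds)


# Hypothetical sample, for illustration only.
sample = [
    Offer(frozenset({"name", "address", "dob", "ssn"}), 0.90, "US", "vendor_a"),
    Offer(frozenset({"name", "address", "dob", "ssn"}), 0.90, "US", "vendor_b"),
    Offer(frozenset({"name", "address", "dob", "ssn"}), 0.90, "Ohio", "vendor_a"),
    Offer(frozenset({"name", "address", "dob"}), 0.90, "US", "vendor_c"),
]

print(count_kinds(sample))
```

In this sample, the first two offers differ only by vendor, so they count as one kind.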

We used a 2-step approach: (1) use a darknet search engine to identify the two markets having the most offers of personal information of interest, and then (2) survey relevant offers on those markets specifically. We did not purchase any data from darknet sources, so actual dataset contents may vary from advertised descriptions. Instead, we used darknet market trust levels (e.g., [135]) as a measure of the likelihood that an offer properly describes the dataset contents. These levels use the reputation of the vendor and range from 1 (no real historical experience) to 10 (having 900 or more sales with 90 percent positive feedback). We accepted offers of level 4 or higher; these are vendors having at least 300 sales with $10,000 volume and 90 percent positive feedback.

First, we searched Grams [136], a search engine used to locate goods for sale on darknet markets. Our search strings were “voter,” “SSN,” “driver’s license,” and “fullz.” The term “fullz” usually refers to a record containing an individual’s name, SSN, birth date, account numbers, and other data [137]. We counted the occurrence of darknet marketplaces on the first 6 pages of search results. The two darknet markets having the most offers from these searches were Alphabay [138] and Hansa [139]. During the writing of this paper, Alphabay was shut down [140], so these offers now appear on Hansa and Dream [141]. Our findings are from our prior surveys of Alphabay and Hansa.

On Alphabay, we recorded the price and contents of the first 25 distinct kinds of offers encountered for access to: (a) datasets that included voter lists or the names, addresses, and dates of birth of thousands of Americans but excluded SSNs or driver’s license data; (b) datasets that also included Social Security numbers; or (c) datasets that included driver’s license data. We then searched Alphabay for the existence of state-specific datasets having the names, addresses, dates of birth, and SSNs of thousands of residents in each state. On Hansa, we surveyed the first 20 offers of (a), (b), or (c) and recorded any that were different in kind from those previously found on the first market. Hansa included links to a master list of datasets called dbworld [142], which seemed to be another means to access offers on Hansa, so we only counted these kinds of offers once. Offers on Hansa, however, included a sample of the first rows of the data, which allowed us to verify contents.

Among what we found for sale were: lists of personal information needed to establish credit, including SSNs and sometimes driver’s license data (20 kinds of offers); files of driver’s license data (9 kinds of offers); links to online repositories of profiles containing names, demographics, SSNs, and driver’s license data (6 kinds of offers); bundles of databases containing names and demographics including voter lists (5 kinds of offers); credit card information including driver’s license data (2 kinds of offers); and discounted access to data broker websites often using pre-established account names and passwords (2 kinds of offers). We also found one recurring option for each of the 50 states and D.C. for access to the names, addresses, dates of birth, and SSNs for thousands of that state’s residents (51 offers). In total, we archived 95 kinds of offers.

Datasets sold on darknet markets are not necessarily illegal. Their origins may be legal, illegal, or “in-between.” Credit card numbers likely originated from illegal activity. Account names and passwords to data broker websites may come from stolen account holder information, or they may come from account holders exploiting their own pre-existing unlimited-use account arrangements. Opportunists who located breached information freely available online may be the source of lists of names and demographics (including voter lists), or data holders wanting to make money from data they legally hold may be the sources.

Regardless of the source of the data, however, pricing structures suggest that primary buyers intend to perpetrate credit card fraud or tax identity theft, because higher prices correlate with increased potential for economic gain through fraud. For example, the most recently released credit card numbers had a higher price than credit card numbers on the market for a longer time, presumably because over time credit card holders and banks cancel the accounts of exposed numbers, rendering the data less useful for fraud. As another example, the SSNs of individuals with higher credit ratings cost more than SSNs of those with lower credit ratings or unknown credit ratings.

These market incentives dictate how an attacker, seeking to impersonate voters on the 36 websites, could acquire the personal information needed. In the prior subsections, we described the attacker’s possible trail of data acquisition from named sources as one that started with voter lists, to which dates of birth were added, gender was inferred, and SSNs and driver’s license information were acquired as needed (Figure 8 and Figure 9). The shortest trail started with voter lists and then relied on the same data broker to append date of birth, SSNs, and driver’s license data.

Using darknet markets, an attacker could acquire access to single datasets having all the fields of information needed. For example, datasets having the personal information needed to establish credit minimally include: name, address, date of birth, and SSN (e.g., the recent examples in Figure 11 and Figure 12). As mentioned earlier, prices often depend on credit worthiness, so an attacker not concerned with credit could acquire the least expensive options. Prices for these datasets ranged from $0.41 to $3 per SSN record with a median price of $0.90 and negotiable volume discounts. We also found lists of names, addresses, dates of birth, and SSNs for all 50 states and the District of Columbia, though not all lists were complete (median price of $0.90 per account). Acquiring any of these datasets for relevant states would have completed the requirements for Colorado, Delaware, Kentucky, Missouri, and Virginia; see Table 3. If the attacker wanted to focus on voters who were unlikely drivers, then acquiring these would have also completed the requirements for Connecticut, Minnesota, Pennsylvania, Vermont, and the District of Columbia. These are summarized in Figure 9 as paths 5F and 7A. Of course, we do not believe that any of these datasets include all or most adults in a given state, so these datasets would provide information for opportunistic rather than broad attacks.

Offers having the same fields but including most or all of the adult American population are also available. For about $500, an attacker could purchase either a copy of a dataset or obtain unlimited access to a dataset that reportedly contains names and SSNs for all adult Americans who have SSNs (see the recent example in Figure 13). For another $500 (actual price $501.94), an attacker could purchase a dataset having the names, addresses, gender, and dates of birth of 203 million Americans, seemingly originating from a breach at Experian [30]; see Figure 14. Together, these two datasets would complete the requirements for Colorado, Delaware, Kentucky, Missouri, and Virginia; see Table 3. If the attacker wanted to focus on voters who were unlikely drivers, then acquiring these data also would complete the requirements for Connecticut, Minnesota, Pennsylvania, Vermont, and the District of Columbia. These are summarized in Figure 9 as paths 5F and 7A. These datasets would provide comprehensive coverage of the American population.

An attacker could alternatively acquire files of driver’s license numbers that include driver’s license issue dates, dates of birth, names, and addresses, but not SSNs (e.g., Figure 15). Prices for these datasets ranged from $0.41 to $30 per record with negotiable volume discounts. Again, price depended on credit worthiness. Acquiring one of these datasets would complete the requirements for Alabama, Delaware, Colorado, Connecticut, Georgia, Illinois, Indiana, Kansas, Louisiana, Massachusetts, Minnesota, Nebraska, Oregon, Pennsylvania, Rhode Island, Utah, Vermont, Washington, and the District of Columbia; see Figure 9, paths 7A and 6C.

Because SSNs are critical to credit, their inclusion in a dataset usually has a dominant influence on price. An attacker could just as easily purchase a dataset having SSNs, names, address, dates of birth, and driver’s license data as a dataset having SSNs, names, address, dates of birth, and no driver’s license data (e.g., Figure 16). Prices of records having SSNs with driver’s license data included ranged from $0.41 to $3 with negotiable volume discounts. Acquiring one of these datasets, with SSNs and driver’s license data, would complete the requirements for Alaska, Alabama, Arizona, California, Delaware, Colorado, Connecticut, Georgia, Hawaii, Illinois, Indiana, Kansas, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Nebraska, Nevada, New Jersey, New York, Ohio, Oregon, Pennsylvania, Rhode Island, South Carolina, Texas, Utah, Vermont, Washington, West Virginia and the District of Columbia; see Figure 9 paths 5D, 6C and 7A. These datasets may not include all or most adults in the state.

Of course, the datasets we just described, whether comprehensive or not, included voters and non-voters (those not registered to vote), indiscriminately. An attacker could use these datasets alone for opportunistic attacks, but if an attacker wanted to tamper with an election, a voter list would be useful to identify specific people to impersonate.

Voter lists on the darknet are often difficult to discern from generally available datasets that contain names and addresses of Americans because the seller tends to barely or incidentally mention voter affiliation information among the available fields. For example, Figure 17 shows an example of a bundle of 141 datasets we found for $653, which included voter files for 13 states; an attacker who purchased these data only for the voter files would have paid $13 per million voters.

Voter lists are also offered separately. We found offers for 9 (25 percent of the 36) statewide voter lists; these were: $10.98 for Alaska, $30.97 for Colorado, $25.97 for Connecticut, $20.98 for Delaware, $30.97 for Michigan, $26.97 for Nevada, $30.97 for Ohio, $25.98 for Rhode Island, and $30.97 for Washington. We found offers for 4 voter lists that did not include all the voters in the state; these were: $10.98 for 4 percent of the statewide list from Alabama, $30.98 for 7 percent of the statewide list from Pennsylvania, $15.98 for 4 percent of the statewide list from Texas, and $25.98 for 5 percent of the statewide list from Utah. We also found offers for voter lists from Florida, North Carolina, and Oklahoma, which are not among the 36 states. See Figure 18 for excerpts of these offers. The Texas voter list included the voter ID number for each person.

An attacker could have spent $235 for the 9 statewide voter lists offered over the darknet (Figure 18), instead of spending $678 for those voter lists from government sources (Table 8). The bulk of the savings comes from paying $26.97 for Nevada instead of $500 each to acquire these from named government offices or data brokers. Surprisingly, an attacker could lower the cost further, to just $38, by using freely available online voter lists for Colorado (saving $30.97), Connecticut (saving $25.97), Delaware (saving $20.98), Michigan (saving $30.97), Ohio (saving $30.97), Rhode Island (saving $25.98), and Washington (saving $30.97); see Table 8. Using the least expensive combination of named or darknet sources to acquire statewide voter lists for all 36 states drops the total cost from $17,679 to $17,196.
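As a check on the arithmetic, the short Python sketch below sums the darknet prices for the 9 statewide voter lists and then removes the 7 states whose lists were freely available online. The variable names are ours and purely illustrative; the prices and state groupings are those reported in the text.

```python
# Darknet prices (US dollars) for the 9 statewide voter lists reported above.
darknet_prices = {
    "Alaska": 10.98,
    "Colorado": 30.97,
    "Connecticut": 25.97,
    "Delaware": 20.98,
    "Michigan": 30.97,
    "Nevada": 26.97,
    "Ohio": 30.97,
    "Rhode Island": 25.98,
    "Washington": 30.97,
}

# States whose voter lists were freely available online (Table 8).
free_online = {
    "Colorado", "Connecticut", "Delaware", "Michigan",
    "Ohio", "Rhode Island", "Washington",
}

# Total if all 9 lists are bought on the darknet.
total_darknet = sum(darknet_prices.values())

# Total if free online lists are used wherever they exist.
remaining = sum(
    price for state, price in darknet_prices.items()
    if state not in free_online
)

print(f"All 9 lists on the darknet: ${total_darknet:.2f}")
print(f"After using free online lists: ${remaining:.2f}")
```

The two totals round to the $235 and $38 figures quoted above; only Alaska and Nevada remain as purchases.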

Also sold on the darknet is discounted access to named data broker websites. Rather than purchasing datasets on the darknet, an attacker could purchase a discount coupon for a named data broker or credentials for a previously established account at a data broker’s website. We found prices ranging from $2 to $12, with a median price of $10 per account. The account credentials (username and password) allow an attacker unlimited use of the data broker website for the flat fee. Figure 19 shows an offer for Intelius, one of the popular public data brokers mentioned earlier. Figure 20 shows offers to URL links where data can be searched freely or downloaded, one of which reports having profiles for 70 percent of all Americans who have SSNs.

Some of the flat-fee offers for URL links to personal data on Americans may point to files stored freely and openly on the deep web (see the recent examples in Figure 20). Because such a URL is freely and openly available, the attacker might be able to locate the data without a darknet purchase at all.

Many of the datasets mentioned may or may not include gender. In cases where the dataset does not include gender, the attacker could infer gender from first names, as described in an earlier section.

The websites for New York and Texas both require additional information from the driver’s license beyond the license number and issuance date. New York’s website requires a driver’s license “document number,” and Texas’s requires an “audit number” from the driver’s license. We refer to both as document numbers, and they are commonly found on U.S. driver’s licenses. A driver’s license number is unique to a driver, but a document number is unique to the card. A replacement driver’s license, for example, will have the same driver’s license number but a different document number. Different states assign document numbers differently. In Massachusetts, for example, the document number is an encoding of the date of issuance for the driver’s license. Some states may sequentially assign document numbers, so that numbers correlate with issuance dates. Other states may encode other kinds of information, so having a list of these document numbers with other personal information about the drivers will often reveal the encoding scheme. A document number may end with a checksum digit so that a mathematical computation can be done on the digits to confirm that the number is valid. The magnetic strip on the back of the driver’s license usually includes the document number.

Document numbers were not available from the data broker websites mentioned earlier; however, several offers on the darknet include physical copies of actual driver’s licenses and data compiled from swipes of the magnetic strips of driver’s licenses. Prices range from $1 to $10 per license for copies, with bulk discounts for magnetic strip data of $0.01 to $1 each. Using this information, an attacker could learn whether these numbers are encoded values derived from other information or sequentially assigned, and could then predict document numbers for other driver’s licenses without purchasing more data. Or, an attacker could use the purchased data to opportunistically impersonate Texas and New York voters. In this writing, we assume that all references to driver’s license data for New York and Texas include the document number at the cost of $0.01 per record unless stated otherwise or obvious from context.

In short, darknet markets sell all the kinds of personal information an attacker would need to impersonate voters on the 36 websites. For example, for about $1,000, an attacker could have acquired the names, addresses, dates of birth, gender, and SSNs of most adult Americans in the United States by purchasing two datasets. Driver’s license numbers with issue dates, names, addresses, and dates of birth were available for as little as $0.41 per record, with negotiable volume discounts or bundled in some offers with SSN data, though not necessarily available for every driver in America. An attacker who wanted to use voter lists to target voters to impersonate could have purchased statewide voter lists for Alaska, Nevada, and Rhode Island for $64 on the darknet instead of $1,020 from named sources.

Figure 11. Excerpt of a 2017 darknet offer of a dataset containing name, address, date of birth, and SSN for $0.90 per record. Volume discounts available.

Figure 12. Excerpt of a darknet offer of a state dataset containing name, address, date of birth, and SSN. Similar solicitations were found for each of the 50 states and the District of Columbia.

Figure 13. Excerpt of a darknet offer of unlimited access to a dataset containing name, address, date of birth, and SSN for $500. Ability to download the entire dataset included. The offer claims that 100 percent of SSNs appear.

Figure 14. Excerpts of (a) darknet offer for a dataset containing (b) fields that included the names, addresses, gender, and dates of birth of 203 million Americans. Cost for a copy of the dataset was $501.94.

Figure 15. Excerpt of a darknet offer of access to a dataset containing name, address, date of birth, driver’s license number, and driver’s license issuance date. Bulk prices seem negotiable but the total number of driver licenses available is not clear.

Figure 16. Excerpt of a darknet offer for access to a dataset containing the name, address, date of birth, SSN, and driver’s license number of people in the United States.

Figure 17. Excerpt of a darknet offer for a dataset containing 141 databases for $653, including 2015 voter files from 12 states. Some appear to include 100 percent of the voters and others a fraction of the voters in that state. Florida’s list is for 2013.

Figure 18. Excerpts of darknet offers for voter lists by state. These offers included lists for 13 of the 36 states in our study, of which 9 included all voters (Alaska, Colorado, Connecticut, Delaware, Michigan, Nevada, Ohio, Rhode Island, and Washington) and 4 did not include all the voters in the state (Alabama, Pennsylvania, Texas, and Utah). Offers shown for voter lists from states not in our study are from Florida, North Carolina, and Oklahoma.

Figure 19. Excerpt of a darknet offer for discounted access to popular public data broker websites.

Figure 20. Excerpt of 2 darknet offers for URL links to personal data claiming (a) a source similar to a popular public data broker and (b) the SSNs of 70 percent of all Americans having SSNs.

Results for Step 3: Data Sources: (g) Deep Web

The public often learns about large data breaches of sensitive personal information when computer security investigators find datasets publicly available at openly accessible website URLs (e.g. [65],[51]). No password or account credentials block access. Instead, the datasets are available to anyone who happens to know the URL, discovers the URL, or perhaps purchases knowledge of the URL from a darknet market. These breached data contain the names, addresses, dates of birth, SSNs, and driver’s license numbers of millions of Americans (e.g., [33], [34], [35], [36]).

There are three primary reasons these data could be unguarded. First, a legitimate data holder may naïvely believe that the URL is so well hidden that no one beyond those with whom they share the URL will know the dataset is there. Meanwhile, the open access allows them to easily share voluminous information with business associates and others.

Second, the data holder may use weak security settings on the cloud storage platform that hosts the data. These settings may even default to open access and then remain unchanged because the data holder is unaware of the vulnerability.

Lastly, an insider or a thief could share the dataset at an obscure URL link on purpose, enabling outsider access without actually having personal possession of the data. As mentioned earlier, the open access makes it easier to share large amounts of information.

The last of these scenarios seems the most likely for an attacker seeking to impersonate voters at the 36 websites.

Earlier we described the breaches of millions of voter records from Illinois and Arizona government election offices [41], [42], [43], [44]. In this subsection, we turn our attention to political campaigns.

We surveyed breach notices, news articles, and the websites of data analytic companies that work with political campaigns to identify the kinds of personal information that could be leaked to, discovered or stolen by an attacker.

Overall, we found that political campaigns hold a treasure trove of personal data on Americans. They have up-to-the-minute voter lists, often embellished with lifestyle and interest information. A campaign often holds credit card information of those who make donations. Data analytic companies working with or on behalf of campaigns may use driver’s license data to identify unregistered voters and Social Security numbers to disambiguate and link information across disparate datasets. Resulting voter profiles may be shared back to the campaigns in aggregate or detailed form. In large data operations, a campaign may maintain all the demographic and government-issued data an attacker needed to impersonate voters.

We also found reports of ongoing attempts to steal and acquire personal information on Americans from political campaigns, and some political campaign datasets were found openly available on the deep web.

NBC News reported that U.S. authorities traced massive political cyber-espionage activities targeting the 2008 U.S. presidential campaigns of Barack Obama and John McCain to the People’s Republic of China [143]. The goal, according to the officials, was to export massive amounts of internal data from both campaigns.

An article published in Time magazine described ongoing hacking attempts faced constantly by the U.S. presidential campaigns of Barack Obama and Mitt Romney in 2012 [144]. According to the article, organized crime attempted to steal credit card data, and there were ongoing attempts to gain access to all kinds of valuable data seemingly by foreign nation states. Zac Moffatt, digital director of the Romney campaign, reported being “under constant attack” with highly visible incidents occurring “four or five times a week.” Other organizations under attack included the National Republican Congressional Committee.

Data from political campaigns have also been found publicly and openly available on the deep web. For example, a national voter list containing 191 million U.S. voter registration records appeared openly available on the Web in December 2015 [65]. The 300 GB dataset contained names, addresses, dates of birth, party affiliations, and voting history for each voter in all 50 states and the District of Columbia.

Another example is 170 GB of 2016 political profiles on American voters found freely available on the Web [51]. The dataset contained virtually every voter in the United States, including all the kinds of information found on voter lists embellished with surveys and detailed analyses about enthusiasm for Trump.

In short, the voter lists needed to impersonate voters at the 36 websites are openly and freely available on the deep web, and data breaches containing many of the needed government identifiers have been reported.

Results for Step 4. Record and Test CAPTCHAs.

Of the 36 websites in our study, 11 (31 percent) had some form of CAPTCHA. These states were Connecticut, Delaware, Indiana, Nebraska, Nevada, Ohio, Oregon, Pennsylvania, South Carolina, Utah, and Washington.

Connecticut’s website had the easiest CAPTCHA to defeat. All of the 100 images we captured consisted of a 5-digit number displayed using the same font, print angle, and background. Figure 21 shows some examples. Here are three ways that an attacker could semi- or fully automate responses to these CAPTCHAs. First, in a preliminary step, the attacker could copy a separate image of each digit, 0 to 9, and then write a simple image processing program to match the stored images of digits against a CAPTCHA image of a 5-digit number to report which digits appear. This approach allows the attacker to fully automate address changes at scale.

As an alternative, the attacker could construct a dictionary that associates each 5-digit number, 00000 to 99999, with its corresponding image. Each time the website displays a new 5-digit code, the attacker enters the code by hand, recording the image and the answer. Any repeated CAPTCHA images would then be recognized automatically, so this semi-automated approach has slow manual performance in the beginning until enough recognized images re-appear.

As a third and final alternative, the attacker could write a simple computer program that sends a copy of the CAPTCHA image to a network of human helpers, (e.g., Amazon Mechanical Turk [58]), who in turn provide the text answer back to the program, which then enters the text as a response to the CAPTCHA. This semi-automated approach would have been possible because the Connecticut website’s page did not reset even after 5 minutes, which is more than enough time to route the image to human helpers and get an answer back. Amazon’s Mechanical Turk costs about $0.042 per CAPTCHA (about $0.03 to the human worker plus Amazon’s 40 percent overhead) [145], [146].
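The per-CAPTCHA cost quoted above follows directly from the worker payment and the overhead rate. The Python sketch below reproduces that arithmetic. The function names are ours, and the batch size of 1,000 is a hypothetical figure chosen only to illustrate scaling, not a number from our study.

```python
WORKER_PAYMENT = 0.03   # US dollars paid to the human worker per CAPTCHA
OVERHEAD_RATE = 0.40    # Amazon's overhead, as a fraction of the payment


def cost_per_captcha(worker_payment=WORKER_PAYMENT, overhead_rate=OVERHEAD_RATE):
    """Return the total cost of one CAPTCHA answer, including overhead."""
    return worker_payment * (1 + overhead_rate)


def total_cost(num_captchas):
    """Return the cost of answering num_captchas CAPTCHAs at the default rates."""
    return num_captchas * cost_per_captcha()


print(f"Per CAPTCHA: ${cost_per_captcha():.3f}")
# Hypothetical batch size, for illustration only.
print(f"Per 1,000 CAPTCHAs: ${total_cost(1000):.2f}")
```

At these rates, each answer costs $0.042, so the cost grows linearly at about $42 per thousand CAPTCHAs.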

Ohio’s website had a similar but more advanced CAPTCHA. All of the 100 images we captured showed 5 characters. Each image had a different font effect and print angle against a random dot background. Figure 3a and Figure 22 show examples. Researchers describe ways to automate the recognition of this kind of CAPTCHA, and they report a 30 to 50 percent success rate [59]. Of course, the semi-automated means described earlier using a network of human helpers would defeat this CAPTCHA because the pages did not reset even after 5 minutes.

Delaware, Indiana, South Carolina, Utah, and Washington’s websites used Google’s reCAPTCHA v1 system of images of street signs from Google Street View [147]. Figure 3b and Figure 23 show examples. As mentioned earlier, the Google Street View algorithm [61] and a program from a commercial company [60] are automated ways to defeat these CAPTCHAs. These web pages also accepted answers after 5 minutes of idle time, so the semi-automated approach using human helpers would have defeated them also.

Lastly, Nebraska, Nevada, Oregon, and Pennsylvania’s websites used Google’s reCAPTCHA v2, which presents an “I’m not a robot” checkbox based on user click behavior rather than an image-based quiz [148]. After the “I’m not a robot” box is clicked (Figure 24), a 3x3 grid of images appears, and the user answers by clicking those images having a specific characteristic (Figure 3c and Figure 3d). These CAPTCHAs require a response of mouse clicks within the grid of images in only 2 minutes, so a network of human helpers would likely not work.

However, both the reCAPTCHA v1 and reCAPTCHA v2 services allowed an audio alternative to visual images. The attacker could have used semi-automated or automated approaches to defeat “verbal CAPTCHAs” even when image CAPTCHAs could not be defeated. Clicking the headphones icon replaces the image CAPTCHA with a verbal one (Figure 24). A verbal CAPTCHA plays a sound file of a person speaking some digits or words. The user responds by typing the spoken words into the text box (Figure 24). The attacker could automate this sound-to-text test by writing a computer program that sends the sound file to a speech recognition program (e.g., Google’s own speech API [149]), which in turn, returns the resulting text to enter as a response to the CAPTCHA (see the release of a computer program after the election that worked in this manner [150], which forced Google to make further changes to the reCAPTCHA service). Alternatively, the attacker could develop a semi-automated solution in which a network of human helpers sends text responses to verbal prompts. The wait time for responses was 2 minutes, so performance of the human network would have to be optimized.

In short, 11 (31 percent) of the 36 websites had a CAPTCHA service that attempted to limit the rate at which an attacker could change addresses on the website, but an attacker could have defeated all of them. Connecticut, Delaware, Indiana, Ohio, South Carolina, and Washington’s websites had CAPTCHAs that fully automated computer programs could defeat. Connecticut’s website was the easiest to semi-automate, because the image could be farmed out in real-time to an online network of human helpers who sent back the text to enter into the CAPTCHA. Using a network of human helpers would have cost about $0.042 per CAPTCHA. Nebraska, Nevada, Oregon, and Pennsylvania’s websites used the same, very difficult to defeat image CAPTCHA service. However, the CAPTCHAs on these websites, as well as those on the websites of Delaware, Indiana, South Carolina, and Washington, included a verbal option, which could be defeated by automated or semi-automated means by converting sound to text using a speech recognition program or sending the sound file to a network of human helpers who sent back the text from the spoken words.

Figure 21. Twelve sample CAPTCHA images from the Connecticut website.

Figure 22. Eight sample CAPTCHA images from the Ohio website.

Figure 23. Sample CAPTCHA images from the websites of Delaware (top row), Indiana, South Carolina, Utah, and Washington (bottom row).

Figure 24. Sample reCAPTCHA image (a), with verbal option (c) selected by clicking on the headphones icon (b).

Results for Step 5. Test Automation.

In 2016, we wrote computer programs using the Python programming language that automatically submitted names and demographic information to 19 voter websites. We randomly chose the websites for Alabama, Alaska, Connecticut, District of Columbia, Delaware, Hawaii, Indiana, Kansas, Kentucky, Massachusetts, Minnesota, Nevada, New York, Ohio, Rhode Island, Utah, Vermont, Washington, and West Virginia. These websites reported a voter’s polling place and were not the same websites used for registering or changing the registrations of voters. We made this distinction to further insulate our automation tests from actual voter rolls. Like the websites used for changing addresses, polling place websites required entering information into fields on a web page and “clicking” a form submission button. Often the submission process spanned multiple web pages.

We used the Python language and constructed different kinds of programs to best match different Python libraries to the features present on websites. Here are several simple examples.

Figure 25a shows the web form that finds polling places in Ohio from street addresses. Figure 25b shows a simple Python program for looking up one voter. By writing a program that automatically changed the values associated with ‘frmstnum’ and ‘frmstname’ to be all the addresses found in the voter file for Ohio, we learned all the polling places associated with voter addresses.

Similarly, Figure 26a shows an example of the website that finds polling places in Alabama. Figure 26b shows a simple Python program for looking up one voter. By changing values for ‘nameLast,’ ‘dobMonth,’ ‘dobDay,’ and ‘dobYear,’ our program found all polling places for voter addresses in Alabama.

The other websites required more advanced programming, but we were completely successful at automating searches on all 18 websites using Python programs we wrote.

[We merged some of our results with results from other approaches to create a public service website voteGPS.org, where voters could easily find their polling place in those states whose information was not as readily available in early 2016. By November 2016, however, polling place information was readily available online.]

In short, we demonstrated how simple Python programs can be used to automate online activity at state websites.

Figure 25. Ohio website features: (a) form for looking up polling places in an Ohio county [http://www.voterfind.com/lickingoh/pollfinder.aspx] and (b) a simple, complete Python program to look up the polling place for one Ohio voter at the county website. The code was further modified to look up the polling places of all voters in the county and then modified further (c) for all voters in the state [http://voterlookup.sos.state.oh.us/voterlookup.aspx].

Figure 26. Alabama website (a) for looking up polling places [https://myinfo.alabamavotes.gov/VoterView/PollingPlaceSearch.do] and (b) a simple, complete Python program to look up the polling place of one Alabama voter at this website. The code was further modified to look up the polling places of all voters in the state.

Results for Step 6. Compare Total Attack Costs.

We have now shown that the primary ingredients an attacker needed to execute a change of address attack at scale on any of the 36 state websites in 2016 were: (1) computing facilities to edit voter records online; (2) personal data to impersonate voters; (3) computing facilities to defeat CAPTCHAs; and, (4) the programs or programming skills to automate tasks. We have also shown variability among the state websites in terms of data requirements and CAPTCHAs, so in this section we compare costs of executing a change of address attack by website.

We focus on perpetrating attacks at scale using automation. The computer programs needed to impersonate voters or to defeat CAPTCHAs differ by website, so a scaled attack requires programming. For the remainder of this section, we assume the attacker had the programming skills needed to write the kinds of straightforward Python or equivalent programs required (e.g., Figure 25b and Figure 26b) or had a confederate with that skill. Our cost comparisons do not include programming costs.

If an attacker sought to cause widespread havoc by changing all or most voter records at a state website, the impact would have been noticed at the polling places. But what if the attacker wanted to have impact but remain undetected? In this case, he would seek to alter records strategically across the state in small percentages so that no one polling place would have a large number of disenfranchised voters complaining. The overall impact of this approach across the polling places in the state could be considerable. A change of 1, 2, 5, or perhaps even 10 percent of the voter rolls may not be noticed. So we look at the costs of a change-of-address attack on the 36 websites using different sources for personal data and different ways of addressing CAPTCHAs to disenfranchise 1, 5 and 10 percent of the voters in the state.

Results for Step 6a. Total Attack Costs Using Named Sources

In this sub-section, we compute the total attack costs to change 1 percent, 5 percent, and 10 percent of voter records using personal data from named sources and an automated method to defeat CAPTCHAs.

Changes to 1 percent of the voters in a state would have affected anywhere from 505 voters in Alaska, which had the fewest voters, to 170,283 voters in California, the largest population of voters. The median was 33,000 voters and the average 42,550, with a standard deviation of 39,951. Changes to 5 and 10 percent of the voters equated to 2,526 and 5,052 Alaskan voters and 851,415 and 1,702,829 Californian voters, respectively. See Figure 28.

A computer can work around the clock with possibly some down time for programming or storage changes. In our computations, we estimate a computer working 20 hours a day for 30 days a month, for a monthly total of 600 hours.

Extrapolating from our experiment with automated retrieval of polling place data (prior subsection), we assume that a computer could impersonate a voter on a state website and make an address change within 1 minute. In other words, changing the addresses of 505 Alaskan voters would take a computer 505 minutes; done using a single computer, it would take less than a day. Figure 27 shows the number of machine hours needed to edit the addresses of 1 percent of the voters on each state website. The minimum is 8 hours to change the 505 voters on Alaska’s website. The maximum is 2,838 hours to change the 170,283 voters on California’s website. The median is 550 hours and the average 709 hours.

An attacker can use a farm of computers as needed by purchasing virtual cloud machines (e.g., [151]). We use the current price of a basic configuration from Microsoft Azure, which is $13 per machine per month. We compute the machine cost per state based on the machine time needed to edit the voter rolls. This is the number of required machine hours divided by 600 hours/month times $13/month for machine rental.

We start by ignoring any machine costs needed to respond to CAPTCHAs or harvest personal information from broker websites. Instead, we first look at the computer time needed to enter personal information at the website. Then, in a few paragraphs below, we add the additional costs.

For example, to change 1 percent of the Alaskan voters, the machine cost for editing the website is 8 hours divided by 600 times $13, or $0.18. For California, the cost is 2838 hours divided by 600 times $13, or $61. Figure 27 enumerates the machine costs for editing voter rolls by state to change the records of 1 percent of the voters. The median cost is $12 and the average $15, with a standard deviation of 14. The total machine costs to edit the records of 1 percent of voters at all 36 websites would have been $553, the cost of about 1 month of computing by 43 cloud machines.
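The machine-cost arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the stated assumptions (1 minute per record, 600 machine hours per month, $13 per machine month); the function and constant names are hypothetical and are not taken from the programs used in the study.

```python
# Illustrative sketch of the machine-cost arithmetic; all names are hypothetical.
MINUTES_PER_RECORD = 1           # assumed time to process one voter record
HOURS_PER_MACHINE_MONTH = 600    # 20 hours a day for 30 days
DOLLARS_PER_MACHINE_MONTH = 13.0 # basic cloud machine rental


def machine_hours(records, minutes_per_record=MINUTES_PER_RECORD):
    """Machine hours needed to process the given number of records."""
    return records * minutes_per_record / 60


def machine_cost(records, minutes_per_record=MINUTES_PER_RECORD):
    """Rental cost in dollars for the machine time those records need."""
    months = machine_hours(records, minutes_per_record) / HOURS_PER_MACHINE_MONTH
    return months * DOLLARS_PER_MACHINE_MONTH


if __name__ == "__main__":
    # Record counts for 1 percent of voters, as reported in the text.
    for state, records in [("Alaska", 505), ("California", 170_283)]:
        hours = round(machine_hours(records))
        print(f"{state}: {hours} hours, ${machine_cost(records):.2f}")
```

Run on the Alaska and California counts, the sketch gives about 8 and 2,838 machine hours, matching the $0.18 and $61 figures after rounding.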

An attacker could use virtual machines to scale attacks as speed and resources permitted. For example, the 2,838 hours needed to edit voter registrations on the California website could be done by a single computer working almost 5 months, 5 virtual machines working almost 1 month, or 10 virtual machines working for about 2 weeks, all for the same $61 in computing costs.
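The trade-off between elapsed time and number of machines is simple division, as the following sketch shows. It assumes that work divides evenly across machines and that rental is billed in proportion to machine hours used; both are simplifying assumptions, and the function names are illustrative.

```python
# Illustrative sketch: elapsed time shrinks with more machines, cost does not.
HOURS_PER_MACHINE_MONTH = 600    # 20 hours a day for 30 days
DOLLARS_PER_MACHINE_MONTH = 13.0


def months_of_work(total_hours, machines):
    """Elapsed months when the work is split evenly across machines."""
    return total_hours / (HOURS_PER_MACHINE_MONTH * machines)


def rental_cost(total_hours):
    """Cost depends only on total machine hours, not on the machine count."""
    return total_hours / HOURS_PER_MACHINE_MONTH * DOLLARS_PER_MACHINE_MONTH


if __name__ == "__main__":
    california_hours = 2838  # machine hours reported for California
    for machines in (1, 5, 10):
        months = months_of_work(california_hours, machines)
        print(f"{machines} machine(s): {months:.2f} months, "
              f"${rental_cost(california_hours):.0f}")
```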

As described in detail earlier, changing voter addresses requires access to voter names and demographics (Table 3). Prices and sources vary. Suppose the attacker purchases voter data from government sources for all purchases less than $100 and from brokers or government sources for all other states (Table 8). In that case, voter data for 16 states cost less than $100 each and for 6 states, the cost is the maximum of $2,000 each; the total for all 36 states is $17,679. The median cost is $114 and the average $491, with a standard deviation of $733. Figure 27 lists the voter data costs per state.

An attacker would also need government-issued identifiers such as SSNs and driver’s license data. As reported earlier, we found named data brokers that provided SSNs and driver’s license data; the least expensive was a flat $40 per month with an additional $0.01 for detailed data that included the driver’s license issue date.

The Texas and New York websites need document numbers. We budget $0.01 for each document number based on an assortment of possibilities. An attacker could purchase driver’s license data on the darknet that includes document numbers for $0.01 each. Alternatively, an attacker may have derived a scheme for predicting document numbers in these states, as discussed earlier, in which case there are no additional data costs but there are additional computational costs for multiple tries. So, for convenience, we use a cost of one cent per record to model these different possibilities.

In estimating costs, we applied the $40 per state for each 600 hours of computer time. This cost model treats the seeming limits on the “unrestricted” access as what a single computer can access within a month from the broker’s website, i.e., 36,000 hits per month at 1 minute per hit. So, the cost to acquire government data differs by state website based on the number of hits per month. We used the number of machine hours computed earlier for each state, divided by 600, and multiplied by 40. This computation appears for each state in Figure 27 as the cost of government data (Gov ID). There is no cost shown for Delaware because no government number was needed on its website. Otherwise, the least expensive was $1 for Alaska. The most expensive was $1,678 for Texas because of the one-cent lookup cost for the driver’s license document number. The median was $40 and the average $170, with a standard deviation of $381. The cost of acquiring all the SSNs and driver’s license data needed to impersonate 1 percent of the voters on the 36 websites totaled $5,958. Figure 27 lists per state totals.
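The government-identifier cost model can be written as a prorated flat fee plus an optional per-record fee. The sketch below is illustrative only: the function name and parameters are hypothetical, and which states incur the per-record fee is an input to the function rather than something the sketch determines.

```python
# Illustrative sketch of the government-ID data cost model; names are hypothetical.
def gov_id_cost(records, per_record_fee=0.0, minutes_per_lookup=1,
                flat_fee_per_month=40.0, hours_per_month=600):
    """Prorated flat broker fee plus any per-record fee, in dollars."""
    lookup_hours = records * minutes_per_lookup / 60
    flat_part = lookup_hours / hours_per_month * flat_fee_per_month
    return flat_part + records * per_record_fee


if __name__ == "__main__":
    # Alaska, 1 percent of voters, assuming no per-record fee applies.
    print(f"Alaska: ${gov_id_cost(505):.2f}")
    # A hypothetical state with 100,000 records that also needs a $0.01 item.
    print(f"Hypothetical: ${gov_id_cost(100_000, per_record_fee=0.01):.2f}")
```

With no per-record fee, the Alaska case comes to well under a dollar before rounding, consistent with the $1 reported above.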

Using a data broker website to display the SSNs and driver’s license data would require a computer program to look up the information and harvest the needed numbers. We used the same 1 minute-per-voter time estimate, so the computing costs are the same as those for editing the voter information online. These values are replicated in Figure 27 as the machine cost for acquiring personal data.

Some websites used CAPTCHAs, but these could be automated. For these websites, we computed the machine costs for running the program to respond to the CAPTCHA to also be 1 minute per voter, with the exception of Ohio, where we used 3 minutes per voter because of the algorithm’s hit rate being 30 to 50 percent. These values are stored as the machine costs for CAPTCHAs on Figure 27. The minimum was $2 for Delaware. The maximum was $85 for Ohio. The median was $9 and the average $18, with a standard deviation of $24. The total machine cost for defeating all CAPTCHAs automatically was $195.
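One way to see why a 30 to 50 percent hit rate motivates a larger time allowance is to compute the expected number of attempts per success. The sketch below assumes attempts are independent with a fixed success probability, which is a modeling assumption not stated in the text; the Ohio record count of 78,610 is the figure used later in this section, and the function names are illustrative.

```python
# Illustrative sketch; assumes independent attempts with a fixed success rate.
def expected_attempts(success_rate):
    """Mean number of attempts until the first success (geometric model)."""
    return 1 / success_rate


def captcha_machine_cost(voters, minutes_per_voter,
                         hours_per_month=600, dollars_per_month=13.0):
    """Machine rental cost in dollars for responding to CAPTCHAs."""
    hours = voters * minutes_per_voter / 60
    return hours / hours_per_month * dollars_per_month


if __name__ == "__main__":
    for rate in (0.3, 0.5):
        print(f"hit rate {rate:.0%}: {expected_attempts(rate):.1f} attempts on average")
    print(f"Ohio at 3 minutes per voter: ${captcha_machine_cost(78_610, 3):.0f}")
```

Under this model a 30 to 50 percent hit rate implies between two and about three attempts per voter on average, so 3 minutes per voter sits near the upper end of that range.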

In short, an attacker could acquire personal data from government sources and data brokers and use programs to defeat CAPTCHAs to change the addresses of 1 percent of the voters on the 36 websites. Figure 27 summarizes the costs of doing so. Delaware was the least expensive at $5, and Texas was the most expensive at $3,059. The median cost for an attacker to attack one state is $339; the average is $692 with a standard deviation of $841. The voter addresses of 1 percent of the voters on all 36 websites could be changed for a total of $24,926.

The cost of acquiring voter data from government offices and data brokers dominated the total cost to change 1 percent of the voters on all 36 websites. Specifically, the cost of the voter data was $17,679 (71 percent) of the $24,926 total. The acquisition of SSNs and driver’s license data totaled $5,958 (24 percent) and was the second largest cost.

As would be expected, costs increase with the number of voter records changed per state. As we just itemized, changing 1 percent of the voter records on all 36 websites would have cost $24,926. Changing 5 percent of the voter records on all 36 websites would roughly double the cost to $53,915, and changing 10 percent would increase the cost to $90,152. See Figure 28.

Figure 27. Total attack costs for changing the addresses of 1 percent of the voters on the 36 websites using automated programs to respond to CAPTCHAs and purchasing voter lists from government sources when those lists cost less than $100 or else, for all other states, from the least expensive named source (Table 8). Estimates include 1 minute per voter for a computer to make the address change on the state website (“Editing (Machine)”), another minute to scrape the SSN and driver’s license number from the data broker’s website, and another minute to respond to the CAPTCHA (except 3 minutes for Ohio because the program had only 30-50 percent accuracy). Machine time cost was estimated at $13 for 600 hours (a computer work month). SSN and driver’s license numbers purchased from a data broker for a flat fee of $40 per 600 hours plus $0.01 for each driver’s license issue date. In the cases of NY and TX, we presume that additional driver’s license data were inferable.

Figure 28. Total attack costs for changing the addresses of 1, 5, and 10 percent of the voters on the 36 websites using automated programs to respond to CAPTCHAs and purchasing voter lists from government sources when those lists cost less than $100 or else, for all other states, from the least expensive named source (Table 8). Estimates include 1 minute per voter for a computer to make the address change on the state website, another minute to scrape the SSN and driver’s license number from the data broker’s website, and another minute to respond to the CAPTCHA (except 3 minutes for Ohio because the program had only 30-50 percent accuracy). Machine time cost was estimated at $13 for 600 hours (a computer work month). SSN and driver’s license numbers purchased from a data broker for a flat fee of $40 per 600 hours plus $0.01 for each driver’s license issue date. In the cases of NY and TX, we presume that additional driver’s license data were inferable.

Results for Step 6b. Total Attack Costs using Darknet Sources

In this sub-section, we compute the total attack costs to change 1 percent, 5 percent, and 10 percent of voter records using personal data from darknet sources with automated means to defeat CAPTCHAs.

An attacker could use personal datasets sold on darknet markets in strategic ways to impersonate voters at scale. One way would be to combine the SSN dataset that contained the names and SSNs of most Americans (Figure 13) with the dataset that contained the name, address, and demographics of 203 million Americans (Figure 14) to make a combined “National Dataset.” The cost for the two datasets together would have been $1,002, and the contents would complete the requirements for 6 states and the ELSE requirements for 5 states. See path A in Figure 29.

The information from the National Dataset could then be used with the state website for Nebraska to learn needed driver’s license numbers for Nebraska voters at no extra cost; see path C in Figure 29. The combined information could be used with computer programs to compute needed driver’s license numbers from demographics for Maryland, Michigan, and New Jersey, at no extra cost; see path D in Figure 29.

The website for Texas required a voter ID number (Table 2). The voter ID numbers were present on Texas voter lists purchased from named sources, as well as from darknet sources. The darknet offer was for a partial list of Texas voters and cost $15.98. The statewide dataset from Texas itself was $1,272. We used the statewide version in our analyses because of its completeness. We also use the $0.01 per record model cost for the document or “audit” number as described earlier.

Finally, an attacker could use additional information from the data broker described earlier to learn needed driver’s license numbers and issuance dates for 26 states at a cost of $40 per machine month and $0.01 per issuance date; see path B in Figure 29.

An attacker may have made a deal with a supplier of driver license data for a dataset that included the driver’s license numbers and issuance dates of most Americans. We found an offer on a darknet market that seemed to encourage the negotiation of such a dataset (Figure 15), but we did not contact the supplier or engage in such negotiations so we do not use the acquisition of the dataset in these computations. If an attacker did acquire driver’s license data in bulk from a darknet source, the total attack cost could be much lower than what we report. See path X on Figure 29.

Figure 30 itemizes the relative costs to change 1 percent of the voters on each of the 36 state websites using the National Dataset and driver’s license data from a data broker, the Texas voter list from a government office in Texas, a state website, and algorithms that infer driver’s license numbers; see paths A, B, C, and D in Figure 29. There are no state-specific costs for voter data other than Texas because all voter (and non-voter) information is included in the $1,002 cost for the National Dataset.

The National Dataset is not voter-specific, so we estimate the number of attempts (or “tries”) required at the state website to find registered voters. The U.S. Census reports the percentage of each state’s residents who are registered voters [152]. The number of tries per change is 1 divided by the proportion of residents in the state who are registered voters. Nationally, the proportion of registered voters is 64.2 percent, which increases the number of attempts by a factor of 1.6. In other words, for every three people’s information attempted, two would likely be voters. The “Tries” column in Figure 30 reports how this plays out for each state. The factor is 1.0 for Texas because we use the actual Texas voter list. The smallest factor, other than for Texas, is 1.3 for Washington, D.C., where 75.9 percent of the population are registered voters. The largest factor is 2 for Hawaii, which has less than half of its population registered to vote. The average and median are both 1.5, with a standard deviation of 0.2.

The number of attempts is the number of voter records sought to be changed times the number of tries. For example, a 1 percent change in Alaska would involve changing 505 voter records. With 69.1 percent of Alaska’s population registered to vote, the factor is 1.4. We estimate that it will take about 707 attempts to edit 505 voter registrations, because 505 times 1.4 is 707.

These additional attempts to locate voters through state websites increase the number of hours machines would have to spend editing voter rolls. We still assume that each attempt takes 1 minute. The machine cost to change 1 percent of the voter records in a state is the number of required machine hours divided by 600 hours/month times $13 per month for machine rental. See the machine cost column under “Website Editing” in Figure 30 for state-level totals. Alaska has the lowest cost, $0.26, for 12 hours of effort. California has the highest at $114. The median is $16 and the average $24, with a standard deviation of $27. The total for all 36 states is $587.
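The tries factor and its effect on machine time can be reproduced with the short sketch below. It rounds the factor to one decimal place before multiplying, which mirrors the Alaska example (505 times 1.4) but is otherwise an assumption about how the figures were derived; the function names are illustrative.

```python
# Illustrative sketch of the "tries" adjustment; names are hypothetical.
def tries_factor(registered_fraction):
    """Attempts per successful match: 1 over the registered share, to 1 decimal."""
    return round(1 / registered_fraction, 1)


def attempts_needed(records_to_change, registered_fraction):
    """Total website attempts needed to change the given number of records."""
    return round(records_to_change * tries_factor(registered_fraction))


def editing_cost(attempts, minutes_per_attempt=1,
                 hours_per_month=600, dollars_per_month=13.0):
    """Machine rental cost in dollars for the given number of attempts."""
    hours = attempts * minutes_per_attempt / 60
    return hours / hours_per_month * dollars_per_month


if __name__ == "__main__":
    alaska_attempts = attempts_needed(505, 0.691)
    print("national factor:", tries_factor(0.642))
    print("Alaska attempts:", alaska_attempts)
    print(f"Alaska editing cost: ${editing_cost(alaska_attempts):.2f}")
```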

The only state-specific cost for personal information was the Texas voter list and New York and Texas document numbers. There were no personal data costs for any other states or for SSNs, only the $1,002 cost of the National Dataset itself. Driver’s license data, however, is not included in the National Dataset. The least expensive data broker for driver’s license data charged a flat $40 per month with an additional $0.01 for detailed data that included the driver’s license issue date. As we did in the previous assessment, we applied the $40 per state for each 600 hours of computer time. The total cost for acquiring driver’s license data was $6,109. The average cost among those states needing driver’s license data to be purchased was $255 with a standard deviation of $460. The median was $52. See the “Driver ID” column on Figure 30 for costs per state.

We also estimated 1 minute for each access to the driver’s license data at the broker’s website. This increased the machine costs for those states requiring driver’s license data that could not be inferred from demographics or SSNs or looked up for free. The total was $587. The median was $16 and the average $24, with a standard deviation of $27. See the machine costs for “Personal Data” in Figure 30 for totals by state.

As in the prior assessment, we estimate the costs of automated approaches to defeat any website CAPTCHAs. We allocate 1 minute per attempt for each state having a CAPTCHA except for Ohio, where we estimate 3 minutes per attempt. The per state costs appear under the “CAPTCHA” heading on Figure 30. The minimum is $4 for Delaware. The maximum is $123 for Ohio. The median is $14 and the average $26, with a standard deviation of $34. The total machine costs for defeating all CAPTCHAs automatically were $284.

Overall, an attacker could use the National Dataset to change 1 percent of the voter records at the 36 websites for a total cost of $10,081. Changing 5 percent of the voter records on all 36 websites cost $41,310, and changing 10 percent cost $80,347.

An attacker could have changed voter records using the National Dataset, but for any state other than Texas, targeting specific voters could only be done based on residence, race inferred from first name for Blacks and last name for Hispanics and Asians, or gender. If an attacker wanted to impersonate voters based on actual rather than inferred party affiliation, then voter lists would be necessary for the other states.

An alternative to using the National Dataset is to join the SSN dataset with the least expensive options for acquiring statewide voter lists from named and darknet sources. The SSN dataset cost $500 (Figure 13), and the cost of acquiring statewide voter lists using the least expensive sources totaled $17,196. Combining these datasets would complete the requirements for 6 states and the ELSE requirements for 5 states. See path E in Figure 29. We computed the per state costs for using the combination of statewide voter lists and the SSN dataset. The total cost to change 1 percent of the voter records on all 36 websites was $20,073. Changing 5 percent of the voter records cost $29,605, and changing 10 percent cost $41,488.

In short, two datasets that contained the names, demographics and SSNs of most adult Americans were available from darknet sources for $1,002 total. Together, these datasets provide a national dataset that dramatically lowered the costs for an attacker by avoiding the purchase of voter lists altogether. For $10,081, an attacker could use the two datasets and a data broker to change 1 percent of the voter records on the 36 websites. Had the attacker acquired a dataset that contained the SSNs of most Americans and statewide voter lists from darknet sources, the cost to change 1 percent of the voter records would have been $20,073. Both options cost less than the $24,926 option described earlier, which used data primarily from named government and data broker sources.

Figure 29. Six paths for an attacker to have acquired the personal data needed to impersonate voters on each of the 36 websites: (a) combination of two datasets from the darknet that jointly produce a National Dataset having the names, demographics and SSNs of most adult Americans combined with the Texas Voter List purchased from Texas; (e) an alternative of statewide voter lists and SSNs combined; and then, addition of driver’s license data from (b) a data broker, (c) a state website, (d) inferred from demographics, or, (x) a dataset having driver’s license numbers and issuance dates acquired from the darknet (e.g., Figure 15). The (+) superscript identifies those states whose websites required driver’s license issue date information.

Figure 30. Total attack costs for changing the addresses of 1 percent of the voters on the 36 websites using automated programs to respond to CAPTCHAs and acquiring from darknet sources a national dataset of SSNs (Figure 13) and a national dataset of profiles having the names, address, gender, and dates of birth of 203 million Americans (Figure 14). Estimates include 1 minute per attempt for a computer to make the address change on the state website (“Website Editing”), where the number of attempts is based on a factor derived from the percentage of registered voters in the state population (“Tries”). Costs include a minute to scrape driver’s license number from the data broker’s website, and another minute to respond to the CAPTCHA (except 3 minutes for Ohio because the program had only 30-50 percent accuracy). Machine time estimated at $13 for 600 hours (a computer work month). Driver license numbers purchased from a data broker for a flat fee of $40 per 600 hours plus $0.01 for each driver’s license issue date. In the cases of NY and TX, we presume that additional driver’s license data were inferable. Costs for driver’s license data could be reduced if attacker acquired a dataset of driver’s license data from the darknet (e.g., Figure 15).

Results for Step 6c. Total Attack Costs Using Deep Web Sources

In this sub-section, we compute the total attack costs to change 1 percent, 5 percent, and 10 percent of voter records using personal data from deep web sources with automated means to defeat CAPTCHAs.

Had an attacker accessed complete datasets either found on the Internet or shared secretly from a confederate, then there would have been no costs for data at all. However, machine costs to edit the voter records at the website and to defeat CAPTCHAs would remain. These costs total $748 to change 1 percent of the voter records at all 36 websites. Costs would jump to $3,739 and $7,477 to change 5 and 10 percent of the records, respectively.

Results for Step 6d. Total Attack Costs Using Semi-Automated Means to Defeat CAPTCHAs

In this sub-section, we compute the total attack costs to change 1 percent, 5 percent, and 10 percent of voter records using a network of human helpers to defeat CAPTCHAs.

In our prior assessments, we used computer programs to defeat CAPTCHAs because these automations were applicable to the websites as they appeared in 2016. Months after the election, however, Google modified its reCAPTCHA v2 service to make it harder to use speech recognition for sound-to-text automation. That means that an attacker in 2016 had an advantage that an attacker today does not. Instead of automated response, an attacker today might need to turn to a network of human helpers.

With a network of human helpers, costs increase on a per-CAPTCHA basis. We use a rate of $0.042 per CAPTCHA to pay humans for simple image-to-text CAPTCHAs and $0.084 per CAPTCHA for more difficult sound-to-text CAPTCHAs. We contextualize these increased costs in the two scenarios we described earlier: (a) changing 1 percent of the voter records using personal data from named sources (Figure 27), and (b) changing 1 percent of the voter records using a national dataset of personal data from darknet sources (Figure 30). Actual costs vary by the kind of CAPTCHAs found at the 11 state websites that had CAPTCHAs.

The kind of CAPTCHA that appears on Connecticut’s website (Figure 21) is not part of Google’s service, so it remains susceptible to the same automatic matching of digit images that we discussed earlier. We estimate its cost, as we did previously, based on the allocated machine time for the program to respond, namely, 1 minute per CAPTCHA. In the assessment that used data from named sources, processing CAPTCHAs for 353 Connecticut voters cost $8. This cost rises to $12 in the assessment that used the National Dataset because each change requires 1.6 attempts.

Ohio’s website had a kind of CAPTCHA (Figure 22) that was also not part of Google’s service. The computer program used to solve this kind of CAPTCHA provided a correct answer 30 to 50 percent of the time, meaning that it would often make multiple attempts per voter. A network of human helpers could reduce the number of attempts. Instead of multiple tries by the program, the program could try first and, if not successful, forward the replacement CAPTCHA to the network of human helpers. There would be at most 2 CAPTCHA attempts per voter. We set machine costs based on 6 minutes per voter to handle the extra processing. We expect half of the attempts to be forwarded to the human helpers, so the human costs are $0.042 times half the number of voters. In the assessment that used data from named sources, the machine costs to handle CAPTCHAs for 78,610 voters are $170 and the costs for human helpers are $1,651. These costs rise to $245 in machine costs and $1,651 in human helper costs in the assessment that used the National Dataset because it required 1.6 machine attempts per change.
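The Ohio figures for the named-sources scenario follow directly from the stated rates, as this sketch shows. It covers only that scenario; the function names are illustrative, and the parameters restate the assumptions given above (6 machine minutes per voter, half of voters forwarded to helpers at $0.042 each).

```python
# Illustrative sketch of the semi-automated Ohio CAPTCHA costs (named sources).
def machine_cost(voters, minutes_per_voter=6,
                 hours_per_month=600, dollars_per_month=13.0):
    """Machine rental cost in dollars for handling CAPTCHAs."""
    hours = voters * minutes_per_voter / 60
    return hours / hours_per_month * dollars_per_month


def human_helper_cost(voters, forwarded_share=0.5, rate_per_captcha=0.042):
    """Payments in dollars for CAPTCHAs forwarded to human helpers."""
    return voters * forwarded_share * rate_per_captcha


if __name__ == "__main__":
    ohio_voters = 78_610  # 1 percent of Ohio voters, as reported in the text
    print(f"machine cost: ${machine_cost(ohio_voters):.0f}")
    print(f"human helper cost: ${human_helper_cost(ohio_voters):.0f}")
```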

Delaware, Indiana, South Carolina, Utah, and Washington’s websites used Google’s reCAPTCHA v1 system of images of street signs (Figure 23). Automation was done with a version of the Google street view algorithm. The automation appears to still work at the time of this writing, but we explore a semi-automated alternative. The attacker’s computer forwards the images to human helpers, who return the text read from the images. We base machine costs on an estimate of 5 minutes per CAPTCHA to allow for the handling, and human costs on $0.042 per CAPTCHA. Figure 31 shows the totals for the two scenarios for each of these states.

Finally, Nebraska, Nevada, Oregon, and Pennsylvania’s websites used Google's reCAPTCHA v2, which uses clicks on images but offers a verbal option. Automation can be done by using a speech recognition program to respond to the verbal CAPTCHA option. In March, following the November election, Google made the speech more difficult to discern, which increased the number of errors from speech recognition. A network of human helpers could help. Human helpers would receive a sound file and then return the text heard from the sound. We based machine costs on an allocation of 2 minutes per CAPTCHA and human helper costs on $0.084 per CAPTCHA. We used the higher rate for humans because Google has made the task more difficult for humans as well as for speech recognition software. Figure 31 shows the totals for the two scenarios for each of these states.

With semi-automated approaches, total costs increase as the number of voters increases because the costs paid to the human helpers are per CAPTCHA; see Figure 31. Using personal data from darknet sources for the same state website always had higher CAPTCHA costs than using personal data from named sources because this approach involved more attempts to authenticate registered voters; see Figure 31.

The cost of addressing CAPTCHAs becomes a greater percentage of the total attack cost as the number of voter records increases. Consider the assessment that used automated CAPTCHA responses and personal data from named sources. The total automated CAPTCHA costs to change 1 percent of voter records across all 36 websites were $195 (0.8 percent of $24,926 total), costs to change 5 percent of voters were $973 (1.8 percent of $53,915), and costs to change 10 percent were $1,946 (2.2 percent of $90,152). Now consider the assessment that used semi-automated CAPTCHA responses and the same data. Changing 1 percent of voter records across all 36 websites cost $19,416 (43 percent of $44,681 total), changing 5 percent of voters cost $97,080 (64 percent of $152,692), and changing 10 percent cost $194,161 (67 percent of $287,705).
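The percentages in this comparison can be recomputed from the dollar figures quoted above. The sketch below simply divides each CAPTCHA cost by the corresponding total; the data structure and names are illustrative.

```python
# Illustrative sketch: CAPTCHA cost as a share of total attack cost.
# Each entry is (percent of voters changed, CAPTCHA cost, total attack cost),
# using the dollar figures quoted in the text for named-source data.
SCENARIOS = {
    "automated": [(1, 195, 24_926), (5, 973, 53_915), (10, 1_946, 90_152)],
    "semi-automated": [(1, 19_416, 44_681), (5, 97_080, 152_692),
                       (10, 194_161, 287_705)],
}


def share_percent(part, total):
    """Part as a percentage of total."""
    return 100 * part / total


if __name__ == "__main__":
    for method, rows in SCENARIOS.items():
        for voters_pct, captcha_cost, total_cost in rows:
            share = share_percent(captcha_cost, total_cost)
            print(f"{method}, {voters_pct}% of voters: {share:.1f}% of total")
```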

Moving from automation to semi-automation of CAPTCHAs increased the total attack cost for the 36 websites by $19,755 (from $24,926 to $44,681) when changing 1 percent of the voter records using data from named sources and by $29,006 (from $10,081 to $39,087) when using data from darknet sources. See Figure 32.

In short, a website that uses CAPTCHAs can dramatically increase the cost of an attack if the attacker has no means other than labor to circumvent the CAPTCHA. In 2016, however, automation was possible on all the state websites that had CAPTCHAs, rendering CAPTCHAs less of a deterrent and keeping attack costs low.

Figure 31. Relative costs of using a network of human helpers to defeat CAPTCHAs while changing 1, 5, and 10 percent of the voters at the 11 websites having CAPTCHAs. Center right columns compare costs based on using named sources for data (Figure 27); the rightmost columns show costs based on using darknet sources for data (Figure 30).

Figure 32. Comparison of total attack costs for changing 1 percent of voter records on state websites using automated means to defeat CAPTCHAs versus using a semi-automated means with a network of human helpers. On the left are relative costs for using automated means to defeat CAPTCHAs with named sources for data, copied from Figure 27, and with darknet sources for data, copied from Figure 30. The totals on the right show costs when a network of humans is used to defeat CAPTCHAs instead of automated means. The 11 state websites that had CAPTCHAs were: CT, DE, IN, NE, NV, OH, OR, PA, SC, UT, and WA.

Results for Step 6e. Comparison of Total Attack Costs by State

In this sub-section, we rank states by total attack costs. Throughout these results, we base our assessments on two primary attack models that differ by the source of personal data used. In one model, the attacker acquires data through named government offices, data brokers or websites published on search engines. In the other model, the attacker uses a named data broker, a voter list from Texas, and two datasets from darknet sources (National Dataset).

State rankings varied by model. Figure 33 ranks states by the total attack costs for changing 1 percent of all voter records at the 36 state websites using data acquired from named sources. Texas is the most expensive at $3,059, followed by Indiana ($2,106), Virginia ($2,101), and Arizona ($2,066). Delaware is the least expensive at $5, followed by Vermont and the District of Columbia ($9), Rhode Island ($14), and Alaska ($21).

Figure 34 ranks the states by total attack costs for making the same voter record changes using the second model, which involves data acquired from darknet sources. Texas is again the most expensive at $3,059, followed by New York ($1,638), Illinois ($1,020), and California ($580). The least expensive is Alaska at $1, followed by West Virginia ($7), Delaware ($7), Nebraska ($11), the District of Columbia ($12), and Vermont ($12).

Factors that determine the attack costs for states in these models are the number of voters in the state, the lowest price found for the personal data and government-issued numbers of the voters, and whether the website used CAPTCHAs.

Website editing of voter rolls, scraping data from data broker websites, and responding to CAPTCHAs are all per person costs. The more voters or attempts to impersonate voters, the greater these costs. States with more voters will therefore have greater attack costs for these activities, if all other factors remain the same.

Another factor that influences state rankings in the two models is reliance on SSNs. The darknet sources made the cost of acquiring SSNs insignificant, which lowered the total attack costs for those states whose websites relied on SSNs alone.

Texas ranks as the most expensive in both models because its website required a voter ID number that was available only from its voter list. We found a partial Texas voter list on the darknet for $15.98 (Figure 18) but not a statewide list, so we believe that the attacker would have no choice but to purchase the Texas voter list. If other state websites also required voter ID numbers, then the attacker would have to acquire their voter lists too, which would raise the maximum attack costs to those in the first model (Figure 35). On the other hand, if those same voter lists were available on the darknet at prices comparable to what we found for statewide voter lists from darknet sources (e.g., $30.97 for the Nevada voter list; see Figure 18), then the maximum attack costs would be about the same as those we recorded for the second model (Figure 36). Put another way, if a statewide Texas voter list became available on the darknet, the attack costs for Texas would plummet.

In short, we found two attack models that exploit the state websites based on different sources of data available at the time.

Figure 33. Comparison of total attack costs by state for an attack using data from named sources—i.e., data from government offices, data brokers, and websites published on search engines. See Figure 27 for details.

Figure 34. Comparison of total attack costs by state for an attack using data from a combination of two datasets from the darknet to jointly produce a National Dataset having the names, demographics, and SSNs of most adult Americans, access to a named data broker’s website, and a copy of the Texas Voter List purchased from Texas. See Figure 30 for details.

Results: Closing thoughts on data and attack costs

We do not claim that our reported attack totals are the actual costs any attacker would have paid, because specifics matter. For example, using a data broker that charges $1 per driver’s license issue date is 100 times more expensive than using a data broker that charges $0.01 per lookup. Still, the models we constructed can be used to estimate total costs based on actual attack decisions made. For example, using a data broker that charges $1 per driver’s license issue date lookup would have had no impact in the 31 states that did not use driver’s license issue dates. However, attempting to change 1 percent of the voters on all 36 websites and paying $1 instead of $0.01 for each driver’s license issue date lookup would have been significantly more expensive, with the cost increasing from $10,081 to $432,026 using the darknet data option and from $24,926 to $446,983 using data from named sources.

Some kinds of data were more expensive to obtain than others, so states could increase attack costs by adding requirements for website visitors to satisfy. This adds more data hurdles for online users, but would it be effective? Texas’ website required a voter ID, which was only available on its voter list. Driver’s license issuance dates required a per person lookup fee because the data was most widely available from a data broker through a search service. Document numbers on driver’s licenses were more obscure but still available or inferable. CAPTCHAs increased the computational costs. The attack cost to submit address changes for 1 percent of the voters on all 36 websites, assuming each website had all these requirements, goes from $24,926 to only $36,358. This suggests that field selection alone is an insufficient guard against voter identity theft attacks.
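To put the two sensitivity results for named-source data side by side, the sketch below compares each reported total against the $24,926 baseline. The ratios are derived here and are not stated in the text; the scenario labels and function name are illustrative.

```python
# Reported totals (USD) for changing 1 percent of voter records on all 36 websites
# using data from named sources.
baseline = 24_926
scenarios = {
    "issue-date lookups priced at $1 instead of $0.01": 446_983,
    "every website imposes all listed requirements": 36_358,
}


def relative_increase(base, new):
    """Return the new total as a multiple of the baseline total."""
    return new / base


for label, total in scenarios.items():
    print(f"{label}: ${total:,} ({relative_increase(baseline, total):.2f}x baseline)")
```

Read this way, a change in a single data price moves the total far more than requiring every field on every website, which is consistent with the conclusion that field selection alone offers limited protection.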

We showed that automated programs could defeat the kinds of CAPTCHAs we found on the 11 websites that had CAPTCHAs. As a result, total attack costs were low for those websites. By March 2017, after the presidential election, Google updated its reCAPTCHA v2 service to thwart the kind of audio-to-text automation we describe as capable of defeating those CAPTCHAs. Semi-automated attacks on CAPTCHAs remain possible, but they are much more expensive than fully automated ones because the attacker must pay humans. We estimated that the cost of changing 1 percent of the voters using data from named sources increases from $24,926 to $44,681 when automated CAPTCHA responses are replaced with semi-automated ones. Even with the improvement Google made to its service, there is no guarantee that another automated method will not defeat the new version and drive costs down again.

An attacker does not even need voter data to perpetrate an attack. We showed that using 2 datasets available on the darknet along with a data broker’s website, an attacker could implement an attack for about $10,000 to change the addresses of 1 percent of the voters on 35 of the 36 websites. These 2 datasets, together costing $1,002, included the names, addresses, demographics, and SSNs of most adult Americans. An attacker can use this data, without SSNs, as a replacement for voter data in the attacks we described. Doing so requires making more attempts at a state website because the website will accept only some as registered voters. As we reported, using general population data rather than a voter list increases the number of attempts made by an attacker by about 60 percent. A voter list usually includes voting history and party designation too, making it easier for an attacker to target specific kinds of voters to disenfranchise. However, an attacker can target specific kinds of voters from population data by using address and even race and gender, inferred by name, as the bases for targeting. More generally, consumer lists and club memberships could also be used to target voters presumed to be in favor of or opposed to a position, and can be used without a voter list.

Impersonating voters on the Texas website required voter ID numbers, so the voter list for Texas was necessary for acquiring the needed voter ID numbers. An attacker could purchase the Texas voter list from the state for $1,160 to acquire voter data including the ID numbers. The partial voter list available on the darknet for $15.98 that included 4 percent of Texas voters also included voter ID numbers. The lesson from the Texas example is that using voter ID numbers on a website can force an attacker to have a voter list, but the availability of most voter lists at practical costs, as discussed above, suggests that the requirement of voter ID numbers on websites offers little or no protection when voter ID appears on the voter list or is inferable from other information.

Similarly, we found driver’s license data readily available from data brokers or on darknet markets for all states.

Americans and American financial systems consider Social Security numbers (SSNs) to be personally identifying privately held information, but we showed that SSNs are widely available through named data brokers and on the darknet. Some brokers and darknet sources charge $10 per lookup for a targeted person or for a person matching a particular criterion. But in bulk, SSNs are widely available and inexpensive to acquire from named sources as well as on darknet markets. Most notably, we reported multiple offers on the darknet for bulk access to SSNs, including a dataset for $500 that reportedly included the SSNs of virtually all adult Americans. Restricting the search to named sources, we reported data broker websites providing unlimited searches for flat monthly fees from $40 to $60 per month. Ironically, because SSNs are so widely associated with financial, tax, and other records, SSNs are widely available. Twenty-one of the 36 websites required SSNs.

We found data brokers that provide driver’s license data too, although fewer data brokers provided driver’s license data than supplied SSNs. A data broker that provided SSNs and driver’s license data tended to do so at the same bundled price. That is, the price for either SSN or driver’s license information alone tended to be the same as the price for both together. We also found this pattern in darknet offers, although we did not find a comprehensive dataset of driver’s license data for sale on the darknet markets as we did for SSNs.

Some of the 33 websites that require driver’s license data had driver’s license numbers that could be acquired online. For example, one state had a website where, if the attacker provided the voter’s SSN, the website would provide the voter’s driver’s license number. And 6 states had driver’s license numbers that could be computed from demographics. No data purchase was needed for these states, although two of them also required driver’s license issuance dates, which necessitated using a data broker except when the issuance dates used by the state relate to the voter’s birthday. Because driver’s license numbers are not as widely associated with other kinds of records as are SSNs, the secondary availability of driver’s license data is less than that of SSNs, but the numbers are still available at reasonable costs for an attacker.

An attack to change 1, 5, or 10 percent of the voter registrations on the 36 websites requires a lot of computation. We explained that an attacker could deploy 57 virtual cloud machines for one month at a cost of less than $1,000 to do the work needed to change 1 percent of the voter records at all 36 websites. That includes the machine time needed to defeat any CAPTCHAs and scrape information from a data broker website. Virtual machines available from multiple cloud services can be utilized on demand.
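The compute figure above implies an upper bound on the per-machine price. The sketch below works that bound out; the 30-day month is our assumption for converting to an hourly rate, and the constant names are illustrative. Because the text reports the cost only as "less than $1,000," the results are ceilings, not actual prices.

```python
# Reported in the text: 57 virtual machines for one month at a cost of less than $1,000.
MACHINES = 57
MONTHLY_BUDGET_USD = 1_000  # upper bound stated in the text
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month of continuous use

# Ceiling on what each machine could cost while staying within the budget.
max_per_machine_month = MONTHLY_BUDGET_USD / MACHINES
max_per_machine_hour = max_per_machine_month / HOURS_PER_MONTH

print(f"At most ${max_per_machine_month:.2f} per machine per month")
print(f"At most ${max_per_machine_hour:.4f} per machine per hour")
```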

Architecting a scaled attack takes programming skill along the lines of a reasonably proficient web programmer. But people with little or no programming experience can modify existing code for the specifics of each website. We gave examples of the kinds of simple Python programs needed to accomplish a similar task at a state website. We had students and summer interns, most of whom were newcomers to programming, revise and customize sample code to state website specifics to learn polling places. And of course, changing hundreds of voter records could be done manually, without any programming whatsoever. Automation becomes necessary at scale, if the goal is to change thousands or millions of voter records.