Spencer: Privacy and Predictive Analytics in E-commerce

Published on May 2016 | Categories: Documents | Downloads: 41 | Comments: 0 | Views: 133

Privacy and Predictive Analytics in E-Commerce

SHAUN B. SPENCER*

INTRODUCTION

This Article discusses the implications of predictive analytics for
consumer privacy in e-commerce and surveys potential regulatory
responses. Part I introduces predictive analytics and illustrates its
potential uses in e-commerce. Predictive analytics helps merchants operate
efficiently and maximize profits, but also risks denying consumers
important commercial benefits. Part II examines how predictive analytics
harms consumer privacy. The prevailing theoretical accounts define
privacy as the ability to control what others know about you and recognize
privacy’s role in promoting personal autonomy and dignity. Predictive
analytics harms privacy as control because individuals cannot know what
the data they share will ultimately predict. In addition, predictive analytics
harms consumer autonomy and dignity because it deprives them of
significant commercial benefits based on secret formulas, and risks
automating societal discrimination. Finally, Part III examines potential
regulatory responses to the harms of predictive surveillance. Any
regulatory response must be measured to avoid diminishing real
commercial benefits and stifling innovation. In addition, regulation must
be tailored to the type and degree of privacy harm posed by the varying
uses of predictive analytics.

*Assistant Professor and Director of Legal Skills, University of Massachusetts School of Law-Dartmouth. I am grateful to the panelists and participants at the New England Law Review’s
Spring 2015 Symposium, What Stays in Vegas, and at the University of Michigan Technology and
Communications Law Review’s Spring 2015 Symposium, Privacy, Technology, and the Law, where
I presented earlier versions of this article. Parts I and II of this article will appear in Predictive
Analytics, Consumer Privacy, and E-Commerce, in RESEARCH HANDBOOK ON ELECTRONIC
COMMERCE LAW (John A. Rothchild ed., Elgar Publishing, forthcoming 2015).

New England Law Review

v. 49 | 629

I. Predictive Analytics and E-Commerce
A. Overview of Predictive Analytics

Predictive analytics predicts future behavior based on the patterns of
past behavior.1 Although predictive analytics uses statistical techniques, it
departs from traditional statistical analysis in several important ways. First,
predictive analytics usually analyzes vast quantities of data rather than
carefully drawn samples. In contrast, traditional statistical analysis has
always relied on sophisticated techniques for drawing representative
samples and inferring population characteristics from those samples.2 With
the data explosion over the last few decades, however, researchers can use
predictive analytics to observe the entire population and find subtle
patterns that help predict future behavior.3
Second, predictive analytics is less concerned about causation than
traditional statistical methods. By using predictive analytics to study large
datasets with many variables, analysts can build extremely accurate
predictive models based on strong correlations in the data, regardless of
why those correlations exist.4 This technique can reveal correlations one
might not have imagined if one were looking for causation.5
For example, predictive analytics can generate models that predict
when a given mechanical device, like a motor or a bridge, will fail. The
models are built from vast amounts of sensor data capturing patterns in
what the devices emit, such as heat, vibration, stress,
and sound. It is far less important to know why the device may fail than it
is to know that it will probably fail soon.6 Eric Siegel’s Predictive Analytics7
gives us many examples of what predictive analytics can show us,
including: “[s]uicide bombers do not buy life insurance”;8 crime rises after
upset losses in college football;9 and phone card sales predict massacres in
the Congo.10 In each of these cases, researchers use past correlations to

1 ERIC SIEGEL, PREDICTIVE ANALYTICS: THE POWER TO PREDICT WHO WILL CLICK, BUY, LIE,
OR DIE 80 (2013).
2 See VIKTOR MAYER-SCHÖNBERGER & KENNETH CUKIER, BIG DATA: A REVOLUTION THAT
WILL TRANSFORM HOW WE LIVE, WORK, AND THINK 24–25 (2013).
3 Id. at 6.
4 See id. at 13.
5 SIEGEL, supra note 1, at 88.
6 See MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 58–59.
7 SIEGEL, supra note 1.
8 Id. at 85.
9 Id. at 86.
10 Id.

2015

Privacy and Predictive Analytics

631

predict future behavior.
B. Using Predictive Analytics in E-Commerce
Merchants use predictive analytics to identify consumers who share a
condition of interest to the merchant, also known as the “target variable.”11
That condition may be the likelihood of a commercial behavior, like
clicking on an online ad, purchasing a product, defecting to a competitor,
or defaulting on a loan.12 Or the condition may be the likelihood of noncommercial behaviors like dying early or getting into a car accident.13 Or
the condition may not involve future behavior. It may instead be a specific
characteristic that is of interest to the merchant, such as whether the
consumer is pregnant14 or has a particular medical condition.15
Predictive analytics relies mainly on secondary evidence of these
conditions of interest, rather than primary evidence. Primary evidence, for
example, might take the form of a consumer’s answer to survey questions
about the consumer’s preferences or characteristics. Secondary evidence, in
contrast, appears in many small bits of data about the consumer’s past
behavior.16 This behavioral evidence more accurately reflects consumers’

11 VIJAY KOTU & BALA DESHPANDE, PREDICTIVE ANALYTICS AND DATA MINING: CONCEPTS &
PRACTICE WITH RAPIDMINER 13 (2015); Testimony of Solon Barocas, FTC Workshop: Big Data: A
Tool for Inclusion or Exclusion, at 19 (FTC Sept. 15, 2014), available at
https://www.ftc.gov/system/files/documents/public_events/313371/bigdata-transcript9_15_14.pdf.
12 See KOTU & DESHPANDE, supra note 11, at xi (discussing prediction of customer defection
to a competitor); SIEGEL, supra note 1, at 83 (discussing prediction of loan repayment risk);
FEDERAL TRADE COMMISSION, OFFICE OF THE SECRETARY, DIRECT MARKETING ASSOCIATION
PUBLIC COMMENT, SPRING PRIVACY SERIES: ALTERNATIVE SCORING PRODUCTS 1, 4–5 (Apr. 17,
2014) available at https://www.ftc.gov/policy/public-comments/2014/04/17/comment-00011
(“predictive analytics are used to predict a consumer’s likelihood of being interested in a
product or service” and to “tailor[] marketing materials to meet the preferences of
consumers”).
13 SIEGEL, supra note 1, at 83 (discussing automobile insurer’s prediction of bodily injury
based on vehicle characteristics); id. at 64–65 (discussing health insurance companies’
predictions of policyholder mortality).
14 See Charles Duhigg, How Companies Learn Your Secrets, N.Y. TIMES, Feb. 16, 2012, available
at http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html (discussing Target’s
pregnancy prediction score).
15 See Ryen W. White et al., Web-scale Pharmacovigilance: Listening to Signals from the Crowd,
20 J. AM. MED. INFORM. ASSOC. 404–08 (2012), available at
http://jamia.oxfordjournals.org/content/jaminfo/20/3/404.full.pdf (describing how web
searches provide evidence of an adverse interaction between two drugs, the antidepressant
Paroxetine and the cholesterol drug Pravastatin).
16 See KOTU & DESHPANDE, supra note 11, at 13.


attitudes and preferences than self-reports.17
The familiar story of Target’s “pregnancy prediction score” illustrates
the predictive analytics process.18 First, the merchant identifies a group of
consumers who possess the condition of interest to create a training set.19
Target wanted to know which customers were pregnant because the
changes in habit formation associated with pregnancy created a significant
opportunity to secure future purchases.20 Target already had a set of
customers with a known condition of interest — customers who had
signed up for Target’s online baby shower registry and shared their due
date.21 So Target would use a subset of these as its training set.22
Next, the merchant uses the many variables in its training set to
develop a predictive model.23 Target’s training set would be a subset of its
baby shower registrants. Target would use these customers’ detailed
purchase histories to develop a model that weighs many different types of
purchases in order to generate a “pregnancy score” to predict whether a
given customer is pregnant.24 Eventually, Target settled on a model that
included and weighed about twenty-five different products to produce a
“pregnancy prediction” score.25
Third, the merchant tests and refines the model using a different subset
of the customers with known conditions of interest.26 For example, Target
would use another subset of its baby shower registrants to refine and
perfect its pregnancy score model, along with other customers who
showed no evidence of pregnancy.27

17 Lior Jacob Strahilevitz, Toward a Positive Theory of Privacy Law, 126 HARV. L. REV. 2010,
2023 (2013).
18 Duhigg, supra note 14.
19 See generally KOTU & DESHPANDE, supra note 11, at 17–19, 27–28 (discussing
implementation and usage of data mining processes); see Testimony of Solon Barocas, supra
note 11, at 20–21.
20 Duhigg, supra note 14. When consumers change their routines they are susceptible to
forming new shopping habits. Merchants, therefore, see new parents as a valuable customer
segment because landing them while their routines are in flux may produce substantial sales
over the long term. As a Target statistician explained, if Target could identify pregnant
consumers in their second trimester, “there’s a good chance we could capture them for years.”
Id.
21 Id.
22 Id.
23 See KOTU & DESHPANDE, supra note 11, at 27–28.
24 Duhigg, supra note 14.
25 Id.
26 KOTU & DESHPANDE, supra note 11, at 27–28.
27 See Duhigg, supra note 14. Although Duhigg does not mention testing the algorithm on


Finally, once the model is optimized, the merchant applies the final
model to current prospects or customers.28 A Target employee illustrated
how Target might use this prediction with regard to a hypothetical
customer. Based on the customer’s purchase history, Target’s algorithm
might assign an 87% chance that she is pregnant and due in August. Based
on other data about her shopping habits, Target may also know the most
likely marketing approaches to draw her to a Target store or website. For
example, email coupons may trigger her to purchase online, whereas direct
mail that arrives on a Friday may be likely to get her to a store over the
weekend. By applying those techniques to the tens of thousands of
consumers with high pregnancy prediction scores, Target hoped to reshape
their shopping habits to generate purchases at Target for years to come.29
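The four steps described above (assemble a training set, build a model, test it on held-out customers, then score new customers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product names, the purchase histories, and the weighting method (a smoothed log-odds weight per product) are all invented here and are not Target's actual data or model.

```python
import math

# Invented purchase histories (sets of product names) with a known label:
# True = customer on the baby registry, False = no evidence of pregnancy.
TRAIN = [
    ({"unscented lotion", "zinc supplement", "cotton balls"}, True),
    ({"unscented lotion", "large tote bag"}, True),
    ({"zinc supplement", "cotton balls", "bread"}, True),
    ({"bread", "beer"}, False),
    ({"beer", "chips", "cotton balls"}, False),
    ({"bread", "chips"}, False),
]
# A separate subset with known labels, held back for testing the model.
TEST = [
    ({"unscented lotion", "zinc supplement"}, True),
    ({"beer", "large tote bag"}, False),
]

def fit_weights(rows):
    """Learn one log-odds weight per product from labeled histories."""
    pos = [items for items, label in rows if label]
    neg = [items for items, label in rows if not label]
    products = set().union(*(items for items, _ in rows))
    weights = {}
    for product in products:
        # Add-one smoothing keeps the ratio finite for rarely seen products.
        p_pos = (sum(product in items for items in pos) + 1) / (len(pos) + 2)
        p_neg = (sum(product in items for items in neg) + 1) / (len(neg) + 2)
        weights[product] = math.log(p_pos / p_neg)
    return weights

def score(weights, items):
    """Combine the weights of purchased products into a score in (0, 1)."""
    total = sum(weights.get(product, 0.0) for product in items)
    return 1 / (1 + math.exp(-total))

def accuracy(weights, rows, threshold=0.5):
    """Share of labeled customers the model classifies correctly."""
    hits = sum((score(weights, items) >= threshold) == label for items, label in rows)
    return hits / len(rows)

weights = fit_weights(TRAIN)                  # build the model
holdout_accuracy = accuracy(weights, TEST)    # test it on held-out customers
new_customer_score = score(                   # apply it to a current customer
    weights, {"unscented lotion", "zinc supplement", "cotton balls"}
)
```

A real model would be fit on far more customers and products, and the held-out test would typically drive several rounds of refinement before deployment.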
Target’s implementation involved sending ads for maternity and baby
products to consumers with high pregnancy scores.30 Target’s model
proved to be too accurate for its own good. One Minnesota father stormed
into his local Target complaining that his teenage daughter was receiving
maternity ads.31 The puzzled store manager could only apologize. A week
later, however, the father called to apologize, saying that there had “been
some activities in my house I haven’t been completely aware of. She’s due
in August.”32 Target had learned that the daughter was pregnant before her
father.
Predictive analytics has myriad uses in e-commerce, but they can be
grouped into four common categories: (1) targeted advertising; (2) price
discrimination; (3) customer segmentation; and (4) eligibility
determinations for particular financial and insurance products.33 These

customers who had not signed up for the baby registry, Target had to include non-pregnant
customers in the test set to determine whether the model could predict the likelihood of
pregnancy. Id.
28 See KOTU & DESHPANDE, supra note 11, at 32.
29 Id.
30 Duhigg, supra note 14.
31 Id.
32 Id.
33 See CLAUDIA PERLICH, FED. TRADE COMM’N, SPRING PRIVACY SERIES: ALTERNATIVE
SCORING PRODUCTS 11–14 (Mar. 19, 2014), available at
https://www.ftc.gov/system/files/documents/public_events/182261/alternative-scoring-products_final-transcript.pdf (discussing
targeted advertising); FED. TRADE COMM’N, COMMENTS OF THE SOFTWARE & INFORMATION
INDUSTRY ASSOCIATION ON THE FTC WORKSHOP ON ALTERNATIVE SCORING PRODUCTS 9–10
(Apr. 17, 2014), available at https://www.ftc.gov/policy/public-comments/2014/04/17/comment-00010
(discussing price discrimination); CATALYSIS, BUILDING BEST PRACTICE CUSTOMER
SEGMENTATION USING PREDICTIVE ANALYTICS (Feb. 10, 2012), available at
http://media.catalysis.com/prod/resources/files/articles/pdfs/Building%20Best%20practice%20


categories are not mutually exclusive; they represent points along a
continuum. For example, price discrimination that quotes some consumers
impossibly high auto insurance rates can effectively render those
consumers ineligible for auto insurance.
1. Online Behavioral Advertising

Online behavioral advertising means “the tracking of a consumer’s
online activities over time – including the searches the consumer has
conducted, the web pages visited, and the content viewed – in order to
deliver advertising targeted to the individual consumer’s interests.”34 For
example, a consumer might search a travel website for flights to New York
City, but not buy tickets. The consumer might then visit a local newspaper
to read about the Washington Nationals baseball team. On the newspaper’s
website, the consumer would see a display ad for flights from Washington,
D.C. to New York City.35
The behind-the-scenes process that led to the display ad involved the
relationships between the travel website, the newspaper website, and an
intermediary called a network advertiser. The travel website had an
arrangement with a network advertiser (DoubleClick, for example), so
when the consumer visited the travel website, the network advertiser
placed a cookie on the consumer’s computer. This cookie tracks aspects of
the user’s online behavior such as websites visited and includes a unique
identifier assigned by the network advertiser. The newspaper website also
had an arrangement with the network advertiser to place an ad on its
website. So when the consumer visited the newspaper website, the
network advertiser’s cookie identified the user as someone potentially
interested in flying to New York, and displayed an ad consistent with that
interest.36
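The cookie mechanism described above can be illustrated with a toy simulation. This is a minimal sketch, not how any real ad network is implemented: the class name `NetworkAdvertiser`, its methods, and the cookie key `ad_network_id` are hypothetical, and a browser's cookie jar is modeled as a plain dictionary.

```python
import itertools

class NetworkAdvertiser:
    """Toy third-party ad network that recognizes browsers by cookie ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.profiles = {}  # cookie ID -> list of observed interests

    def observe(self, browser_cookies, interest):
        """Runs when a browser loads a partner page; sets the cookie if absent."""
        if "ad_network_id" not in browser_cookies:
            browser_cookies["ad_network_id"] = next(self._ids)
        cookie_id = browser_cookies["ad_network_id"]
        self.profiles.setdefault(cookie_id, []).append(interest)

    def pick_ad(self, browser_cookies, default_ad="generic banner"):
        """Runs when a partner page has an ad slot to fill for this browser."""
        cookie_id = browser_cookies.get("ad_network_id")
        interests = self.profiles.get(cookie_id, [])
        return f"ad for {interests[-1]}" if interests else default_ad

network = NetworkAdvertiser()
browser = {}  # the consumer's cookie jar, initially empty

# Visit to the travel website: the network sets its cookie and logs the search.
network.observe(browser, "flights to New York City")

# Later visit to the newspaper website: the same cookie links the two visits.
ad = network.pick_ad(browser)
```

A browser the network has never seen carries no cookie, so `pick_ad` falls back to the default ad; the tailoring depends entirely on the shared identifier.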
customer%20segmentation.pdf (discussing use of predictive analytics in customer
segmentation); PAM DIXON & ROBERT GELLMAN, THE SCORING OF AMERICA: HOW SECRET
CONSUMER SCORES THREATEN YOUR PRIVACY AND YOUR FUTURE 8–9 (2014), available at
http://www.worldprivacyforum.org/wp-content/uploads/2014/04/WPF_Scoring_of_America_April2014_fs.pdf
(discussing eligibility determinations based on “consumer scores”).
34 FEDERAL TRADE COMMISSION STAFF REPORT: SELF-REGULATORY PRINCIPLES FOR ONLINE
BEHAVIORAL ADVERTISING 46 (2009), available at
https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-staff-report-self-regulatory-principles-online-behavioral-advertising/p085400behavadreport.pdf.
35 Id. at 3.
36 Id. Contrast targeted advertising with “contextual advertising,” in which advertisers
place ads based on the content of the page, and therefore on inferences about the types of
consumers who will be reading that page. Jonathan R. Mayer & John C. Mitchell, Third-Party
Web Tracking: Policy & Technology, IEEE SYMP. ON SECURITY & PRIVACY (2012), available at


This process became more complex with the emergence of ad
exchanges. Ad exchanges emerged in the mid-2000s as a way for websites
to sell the “remnant” ad spaces they could not sell through advertising
networks.37 For each ad in its inventory, an ad exchange takes bids in real
time from many different advertising networks.38 Ad exchanges, however,
did not change the tailoring of ad placement to consumers’ online
behavior.
Predictive analytics can make online behavioral advertising more
efficient by showing ads to consumers who are more likely to click on
them. For example, an education “information portal” targeted at high
school seniors used predictive analytics to increase the click-through rate
for its ads.39 The portal hired a predictive analytics firm to analyze millions
of instances where consumers clicked or did not click on different ads. The
firm then generated many different models to determine which consumers’
behavioral profiles made them more likely to click on which ads. Using the
resulting models increased the portal’s response rate by 25% over its
existing online advertising.40
2. Price Discrimination

Price discrimination involves merchants selling “the same or similar
products at different prices in different markets, where such price
differentials are not based on differences in marginal cost.”41 Familiar
examples of price discrimination include airlines selling seats on the same
flight to different passengers at different rates and theaters offering senior
discounts.42
Predictive analytics, however, allows merchants to make dynamic,
real-time use of price discrimination in e-commerce. The Wall Street Journal

https://jonathanmayer.org/papers_data/trackingsurvey12.pdf; Blase Ur et al., Smart, Useful,
Scary, Creepy: Perceptions of Online Behavioral Advertising, SYMP. ON USABLE PRIVACY AND
SECURITY (SOUPS), July 11–13, 2012, available at
https://www.andrew.cmu.edu/user/pgl/soups2012.pdf.
37 Mayer & Mitchell, supra note 36, at 419.
38 Id.
39 Prediction Impact, Case Study: How Predictive Analytics Generates $1 Million Increased
Revenue, PREDICTIVE ANALYTICS WORLD, http://www.predictiveanalyticsworld.com/casestudy.php
(last visited Sept. 3, 2015).
40 Id.
41 NICK WILKINSON, MANAGERIAL ECONOMICS: A PROBLEM-SOLVING APPROACH 396 (2005),
available at http://www.railassociation.ir/Download/Article?books?Managerial%20Economics%20A%20Problem%20Solving%20Approach.pdf.
42 See id.


reported on companies:
consistently adjusting prices and displaying different product
offers based on a range of characteristics that could be discovered
about the user. Office Depot, for example, told the Journal that it
uses “customers’ browsing history and geolocation” to vary the
offers and products it displays to a visitor to its site.43

Similarly, Capital One Financial used “personalization technology to
decide which credit cards to show first-time visitors to its website.”44 The
Journal’s follow-up testing showed that users deemed to have “excellent
credit” saw different cards than those with “average credit.”45
Discrimination need not be limited to price. A major cable company
worked with data broker eBureau to “determine the appropriate
equipment and service packages to sell to each new customer.”46 The
company developed a predictive model that “identified and segmented the
risk for every online lead, ultimately scoring and rank ordering each
customer for appropriate level of service and equipment.”47
3. Customer Segmentation

Customer segmentation groups people or organizations with similar
characteristics such as demographics, purchase histories, or preferences.48
Segmenting customers improves merchants’ marketing and customer
retention efforts by helping them understand their customers better.49
Predictive analytics augments the segmentation process to reveal more

43 Jennifer Valentino-Devries et al., Websites Vary Prices, Deals Based on Users' Information,
WALL STREET J., Dec. 24, 2012, available at
http://www.wsj.com/articles/SB10001424127887323777204578189391813881534 (referencing
Staples, Discover Financial Services, Rosetta Stone Inc. and Home Depot Inc.).
44 Id.
45 Id.
46 U.S. PIRG & CENTER FOR DIGITAL DEMOCRACY, PROTECTING CONSUMER PRIVACY AND
WELFARE IN THE ERA OF “E-SCORES,” REAL-TIME BIG-DATA “LEAD GENERATION” PRACTICES
AND OTHER SCORING/PROFILE APPLICATIONS 11, COMMENTS SUBMITTED TO FTC WORKSHOP:
ALTERNATIVE SCORING PRODUCTS (2014), available at
https://www.ftc.gov/policy/public-comments/comment-00006-75 (citing eBureau, Fortune 500 and
Top 5 Cable Operator,
http://www.ebureau.com/sites/all/files/file/ebureau_successstory_top5cable_operator.pdf).
47 Id.
48 David Vergara, Database: Get a Little Closer: Use Effective Segmentation with Predictive
Analytics to Personalize Customer Relationships (May 2009),
http://www.targetmarketingmag.com/article/use-effective-segmentation-predictive-analytics-personalize-customer-relationships-406169/1.
49 Id.


subtle and granular segments than traditional approaches.50
Companies often use customer segmentation to develop “churn
scores” identifying the risk that a customer will defect to a competitor.51
For example, a cellular phone carrier may use predictive analytics to
identify the customers who are most likely to switch carriers within a few
months. The same company may then use predictive analytics to identify
which potential defectors offer sufficient long-term value to merit spending
resources to retain them. Finally, predictive analytics may help that
company determine what offers are most likely to persuade the valuable
customers to stay.52
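The first two stages of that churn workflow (find likely defectors, then keep only those worth the cost of a retention offer) reduce to a simple filter once the scores exist. The sketch below assumes the scores have already been produced by some upstream model; the `Customer` fields, the thresholds, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_score: float     # hypothetical model output: probability of defecting
    lifetime_value: float  # hypothetical estimate of future profit, in dollars

def retention_targets(customers, churn_threshold=0.6, offer_cost=50.0):
    """Return likely defectors whose expected loss exceeds the cost of an offer."""
    targets = []
    for customer in customers:
        expected_loss = customer.churn_score * customer.lifetime_value
        if customer.churn_score >= churn_threshold and expected_loss > offer_cost:
            targets.append(customer.name)
    return targets

customers = [
    Customer("A", 0.9, 1200.0),  # likely to leave and valuable
    Customer("B", 0.8, 40.0),    # likely to leave but low value
    Customer("C", 0.1, 3000.0),  # valuable but unlikely to leave
]
targets = retention_targets(customers)
```

Only customer A passes both tests here. Customer B is a likely defector whom the company would let go, which is the kind of differential treatment the Article goes on to examine.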
Predictive analytics can also be used to decide what level of service to
deliver to each customer. For example, a merchant’s call center can connect
high-value customers to the best customer service agents, while routing
lower-value customers to an “outsourced overflow call center.”53
4. Eligibility Determinations

Predictive analytics can also help merchants decide whether to do
business at all with certain consumers.54 Many consumers are familiar with
credit scores like the FICO Score widely used to determine loan eligibility.55
However, the proliferation of data in e-commerce allows merchants to
create and use consumer scores in many other contexts. Merchants may
refuse to do business with some consumers because of a risk of fraud or
default.56 Many of these consumer scores are not subject to the Fair Credit
Reporting Act. For example, Experian offers a “Consumer View
Profitability Score” designed to “predict, identify, and target prospect in
households likely to be profitable and pay debt.”57 The database includes
50 Id.
51 DIXON & GELLMAN, supra note 33, at 51–52.
52 See IBM SOFTWARE, REAL WORLD PREDICTIVE ANALYTICS: PUTTING ANALYSIS INTO ACTION
FOR VISIBLE RESULTS 6–8 (2010), available at
http://www.revelwood.com/uploads/whitepapers/PA/WP_Real-World-Predictive-Analytics_IBM_SPSS.pdf.
53 Natasha Singer, Secret E-Scores Chart Consumers’ Buying Power, N.Y. TIMES (Aug. 12,
2012), http://www.nytimes.com/2012/08/19/business/electronic-scores-rank-consumers-by-potential-value.html.
54 DIXON & GELLMAN, supra note 33, at 19–21.
55 See FICO Score, Critical in Billions of Lending Decisions, FICO,
http://www.fico.com/en/products/fico-score (last visited Sept. 3, 2015).
56 DIXON & GELLMAN, supra note 33, at 53–55; IBM SOFTWARE, REAL WORLD PREDICTIVE
ANALYTICS: PUTTING ANALYSIS INTO ACTION FOR VISIBLE RESULTS 2 (2010), available at
http://www.revelwood.com/uploads/whitepapers/PA/WP_Real-World-Predictive-Analytics_IBM_SPSS.pdf.
57 DIXON & GELLMAN, supra note 33, at 46.


information on “235 million consumers and 117 million households from
hundreds of data sources.” Scores like this can serve as proxies for credit
risks. However, because they assess households rather than individuals,
they are not governed by the Fair Credit Reporting Act.58
Merchants may also make de facto eligibility determinations by not
targeting prospects who may be credit risks. Their risk assessment may
include traditional credit scores or include other variables including “the
history of which customers proved to be good or bad risks in this
business.”59 Consumers with scores above a certain risk threshold will be
excluded from marketing outreach, while some risky prospects may be
targeted if their potential value is high enough.60
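The de facto eligibility rule just described amounts to a threshold with a value-based exception. The following sketch shows one way such a rule could be written; the function name, the cutoff values, and the 0-to-1 risk scale are assumptions made for illustration, not figures from the sources cited.

```python
def include_in_outreach(risk_score, potential_value,
                        risk_threshold=0.7, value_override=5000.0):
    """Decide whether a prospect receives marketing outreach.

    Prospects at or below the risk threshold are always included. Riskier
    prospects are excluded unless their potential value is high enough.
    All cutoffs here are hypothetical.
    """
    if risk_score <= risk_threshold:
        return True
    return potential_value >= value_override

low_risk = include_in_outreach(0.2, 100.0)           # included
risky_low_value = include_in_outreach(0.9, 100.0)    # silently excluded
risky_high_value = include_in_outreach(0.9, 8000.0)  # included despite the risk
```

The excluded prospect is never told that an offer existed, which is why the Article treats this kind of targeting as an eligibility determination in practice.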
II. Predictive Analytics and Consumer Privacy
A. Prevailing Theories of Privacy
The prevailing theoretical accounts of privacy describe what privacy is
and what privacy does. Many theorists define privacy as the individual’s
ability to control what others know about him or her.61 This notion of
privacy as control reaches back to Warren and Brandeis’ famous account of
the “right to be let alone” in their seminal 1890 law review article, The Right
to Privacy.62 Privacy as control also animated Alan Westin’s work in the
1960s defining privacy as control in four different states: solitude,
anonymity, intimacy, and reserve.63
Leading theorists have identified privacy’s instrumental value for
promoting personal dignity and autonomy in ways that are important for
individual personality, healthy civic discourse, and democratic
governance.64 For Warren and Brandeis, privacy promotes the “inviolate
personality.”65 Edward Bloustein observed that privacy defines one’s
essence as a human being by promoting individual dignity, integrity,
personal autonomy, and independence.66 Similarly, Ruth Gavison
58 Id.
59 IBM SOFTWARE, supra note 56, at 8.
60 Id. at 9.
61 Daniel J. Solove, Conceptualizing Privacy, 90 CALIF. L. REV. 1087, 1092 (2002) (identifying
varying accounts of privacy as the right to be let alone, limited access to the self, secrecy,
control over personal information, personhood, and intimacy).
62 Samuel D. Warren & Louis D. Brandeis, The Right to Privacy, 4 HARV. L. REV. 193 (1890).
63 ALAN F. WESTIN, PRIVACY AND FREEDOM 31–32 (1967).
64 Solove, Conceptualizing Privacy, supra note 61, at 1093 (noting accounts of privacy’s
importance for “freedom, democracy, social welfare, [and] individual well-being”).
65 Warren & Brandeis, supra note 62, at 205.
66 Edward J. Bloustein, Privacy as an Aspect of Human Dignity: An Answer to Dean Prosser, 39


described privacy’s role as promoting “liberty, autonomy, selfhood, . . .
human relations, and . . . the existence of a free society.”67
B. Predictive Analytics and Privacy as Control
One can assume that the daughter in Target’s pregnancy prediction
score story did not want Target to know that she was pregnant. After all,
she apparently did not sign up for Target’s baby shower registry. If Target
had asked her in a survey whether she was pregnant, she surely would
have said no. But when she shared all of her shopping habits with Target,
she could not possibly know that she was also sharing secondary evidence
that Target would use to generate a pregnancy prediction score. Had the
daughter known what Target could learn from her purchases, she might
have exercised control over what Target could learn about her by paying in
cash or shopping elsewhere. But that was not an option. Moreover, for the
many consumers who shared their purchases with Target before Target
developed its pregnancy prediction model, even Target did not know that
customer purchases could predict pregnancy.
The control problem gets even more challenging when companies
combine their internal data with third party data to build predictive
models. For example, a merchant might provide data on its existing “high
value” customers to a predictive analytics company. The predictive
analytics company then combines the merchant’s data with information
about those same customers obtained from third parties. Finally, the
predictive analytics company uses the combined data to develop a model
to help identify future high value prospects.68 If the consumers could not
anticipate future predictive uses of the data they shared with the merchant,
they certainly could not know about the future predictive uses of the data
shared with third parties.
Merchants themselves have difficulty valuing data’s future uses. For
example, at the time of Facebook’s initial public offering in 2012, the
issuing banks had valued Facebook at $104 billion.69 However, Facebook’s
audited financial statements for 2011 reported assets of only $6.3 billion,
which included cash and physical assets but excluded the vast stores of

N.Y.U. L. REV. 962, 965–66, 1002–03 (1964).
67 Ruth Gavison, Privacy and the Limits of Law, 89 YALE L.J. 421, 423 (1980).
68 See, e.g., TruSignal, Leading Life Insurance Broker Case Study: Firm Reaches New High Value
Customers Through Targeted Display Advertising, TRU-SIGNAL.COM, http://www.trusignal.com/wp-content/uploads/2014/11/TruSignal-Leading_Life_Insurance_Broker.pdf (last
visited Sept. 13, 2015) (describing a model using more than 100 predictive factors to identify a
lookalike audience of over 8 million high value prospects).
69 MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 118.


personal data that are Facebook’s lifeblood. The data valuation challenge
arises because most of data’s value lies in unknown future secondary uses,
rather than the original purpose of collection.70 If merchants cannot value
data easily and consistently, consumers can hardly be asked to value
unknown future uses of their own data.
C. Predictive Analytics and Personal Autonomy and Dignity
Predictive analytics also harms personal autonomy and dignity in
several ways. One type of harm arises from the nature of predictive
algorithms and their secrecy in e-commerce. Another arises from the risk
that predictive algorithms can institutionalize latent societal
discrimination.
1. Algorithms in E-Commerce: Secret, Predictive, and Imperfect

As described above, predictive models can determine the prices
consumers pay, the level of service merchants provide, and even
consumers’ eligibility to make purchases or obtain such essentials as credit,
housing, and insurance. To most consumers, however, the very existence of
the models that dole out these important commercial benefits is a secret. To
the extent that some consumers know they exist, merchants will not reveal
how they work, and consumers are powerless to reverse engineer them. A
recent White House report observed that “big data analytics may . . . create
such an opaque decision-making environment that individual autonomy is
lost in an impenetrable set of algorithms.”71 Thus, the real-world effects of
these secret algorithms diminish consumers’ sense of autonomy and
dignity.72
Next, predictive models do not judge individuals based on their own
actions. Instead, they judge individuals based on things they have not yet
done, and even worse, on things that other people did.73 Research suggests
that people have an aversion to algorithms because of notions that

70 See id. at 118–20.
71 EXECUTIVE OFFICE OF THE PRESIDENT, BIG DATA: SEIZING OPPORTUNITIES, PRESERVING
VALUES 10 (2014), available at https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf.
72 See Danielle Keats Citron & Frank Pasquale, The Scored Society: Due Process for Automated
Predictions, 89 WASH. L. REV. 1, 27 (2014) (discussing how secret scoring systems affecting
credit, housing, employment, and other opportunities threaten human dignity); Strahilevitz,
supra note 17, at 2028 (discussing dignitary harm from “service discrimination”).
73 See Testimony of Solon Barocas, supra note 11, at 20–21; KOTU & DESHPANDE, supra note
11, at 17–19, 27–29.


“algorithms are dehumanizing” or “cannot properly consider individual”
subjects.74 For these reasons, predictive models challenge individuals’ sense
of autonomy.
Finally, predictive models are always wrong for a subset of consumers.
Merchants do not need them to be perfect as applied to every consumer.
They merely need them to be better than the previous approaches to
pricing, marketing, and eligibility determinations.75 So, predictive models
optimize profits for the merchants, but inevitably misclassify some
consumers. Misclassified consumers who pay higher prices, have fewer
options, and cannot secure credit, housing, or insurance are simply the
“collateral damage” of a predictive algorithm. Treating some consumers as
“collateral damage” from an algorithm they can neither see nor
comprehend offends their sense of dignity and autonomy.
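
The merchant's standard described above (better than the prior approach, not perfect) can be illustrated with a short numerical sketch. The counts and the function name below are hypothetical assumptions chosen for illustration; they do not come from any source cited in this Article.

```python
# Illustrative only: every figure here is an invented assumption,
# not data from any merchant or study.

def error_rate(misclassified: int, total: int) -> float:
    """Share of consumers that a decision rule classifies incorrectly."""
    return misclassified / total

TOTAL_CONSUMERS = 100_000

# Hypothetical prior approach (a crude rule of thumb): wrong for 20,000 consumers.
old_errors = 20_000
# Hypothetical predictive model: wrong for 12,000 consumers.
model_errors = 12_000

old_rate = error_rate(old_errors, TOTAL_CONSUMERS)
model_rate = error_rate(model_errors, TOTAL_CONSUMERS)

# The merchant's test is relative: does the model beat what it replaces?
model_is_improvement = model_rate < old_rate

print(f"prior approach error rate: {old_rate:.0%}")
print(f"predictive model error rate: {model_rate:.0%}")
print(f"merchant adopts model: {model_is_improvement}")
print(f"consumers still misclassified: {model_errors:,}")
```

Under these assumed figures the merchant rationally adopts the model, yet thousands of consumers remain misclassified, which is the "collateral damage" the text describes.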
2. The Institutionalization of Societal Discrimination

Predictive analytics can institutionalize existing societal prejudices.76
Although the mathematics underlying algorithms may be free from
prejudice, the choices that data scientists must make are not. First, someone
must decide how to define the target variable, which carries a risk of
intentional or unintentional discrimination.77 Target, for example, saw
pregnancy as a valuable trait in its customers, but some insurers might
view pregnancy differently. Next, someone must decide what training data
to use. If that training data resulted in part from societal discrimination,
then the existing discriminatory effects will be baked into the predictive
model.78 For example, if there already exists a discriminatory pattern of
lenders targeting poor consumers for unfavorable credit terms, then a
model trained on those data will reproduce that pattern of discrimination.
Poor consumers will be saddled with higher debt service, and will
74 Berkeley J. Dietvorst et al., Algorithm Aversion: People Erroneously Avoid Algorithms After
Seeing Them Err, 144 J. EXPERIMENTAL PSYCHOL.: GEN. 114–26 (2015), available at
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2466040.
75 MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 45–49 (describing predictive modeling
as aiming for “good enough” results).
76 Testimony of Solon Barocas, supra note 11, at 19–22; see FRANK PASQUALE, THE BLACK
BOX SOCIETY: THE SECRET ALGORITHMS THAT CONTROL MONEY AND INFORMATION 41 (2015);
Citron & Pasquale, supra note 72, at 4–5, 13.
77 Testimony of Solon Barocas, supra note 11, at 19–20.
78 Id. at 20–22; accord Michael Aleo & Pablo Svirsky, Foreclosure Fallout: The Banking
Industry’s Attack on Disparate Impact Race Discrimination Claims Under the Fair Housing Act and
the Equal Credit Opportunity Act, 18 B.U. PUB. INT. L.J. 1, 5 (2008) (discussing the irony of
charging higher rates to riskier debtors, thus increasing the risk of default); see also Citron &
Pasquale, supra note 72, at 18 n.106.


therefore have even fewer resources available to engage in the kinds of
behaviors that would convince the predictive model that they should
receive more favorable credit terms.79 And because predictive models
operate invisibly to consumers, they render patterns of discrimination
nearly undetectable. Each consumer sees only the offers that merchants
make to them, not the offers merchants make to others.80 As Michael
Fertik has observed, the rich see a different internet than the poor.81
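
The mechanism by which biased training data is "baked into" a model can be sketched in a few lines. The example below is a deliberately simplified toy, not a description of any lender's actual system: the records, the income tiers, and the `train` function are all hypothetical, and the "model" simply learns the most common past outcome for each tier.

```python
from collections import Counter, defaultdict

# Hypothetical historical lending records: (income_tier, terms_offered).
# The skew against the "low" tier is an assumption built into this toy data
# to stand in for a pre-existing discriminatory pattern.
history = (
    [("low", "unfavorable")] * 80 + [("low", "favorable")] * 20
    + [("high", "unfavorable")] * 20 + [("high", "favorable")] * 80
)

def train(records):
    """Learn, for each income tier, the terms most often offered in the past."""
    counts = defaultdict(Counter)
    for tier, terms in records:
        counts[tier][terms] += 1
    return {tier: c.most_common(1)[0][0] for tier, c in counts.items()}

model = train(history)

# New applicants are scored only by their resemblance to past records,
# so the historical pattern is reproduced for every future applicant.
print("low-income applicant receives:", model["low"])
print("high-income applicant receives:", model["high"])
```

Nothing in the arithmetic is prejudiced, but because the toy model's only evidence is the skewed history, it assigns unfavorable terms to every new low-income applicant.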
Predictive analytics company TruSignal offers an “Ideal Audiences”
service to let merchants market only to online consumers that “look like”
current high-value customers.82 TruSignal generates ideal audiences by
drawing on predictive analytics tools and its “big data warehouse of offline
consumer profile information.”83 TruSignal draws its audience data from
the Bluekai Exchange, which purports to hold “actionable audience data”
on 80% of the U.S. Internet population.84 The risk, of course, is that the way
that current high-value customers “look” may be a product of societal
discrimination. If so, building a model to replicate those customers
institutionalizes that discrimination.
Researcher Nathan Newman found evidence that online display
advertising reinforced racial stereotypes. He created test Gmail accounts
and assigned some white-sounding names, others African-American-sounding names, and still others Latino-sounding names. He then sent
emails about several different topics to and from the test accounts. Because
Google scans all Gmail and delivers ads based on their content, he wanted
to see if the types of ads delivered would vary if he held the content
constant but varied the names. When the test accounts sent emails about
car purchases, the white-sounding names all saw ads from car dealers or

79 Cf. PASQUALE, supra note 76, at 41.
80 Citron & Pasquale, supra note 72, at 10–11; Michael Fertik, The Rich See a Different Internet
Than the Poor, SCIENTIFIC AMERICAN, (Jan. 15, 2013), http://www.scientificamerican.com/article/rich-see-different-internet-than-the-poor/.
81 Fertik, supra note 80; accord EXECUTIVE OFFICE OF THE PRESIDENT, supra note 71, at 10
(discussing the risk of disparate treatment of disadvantaged groups); Singer, supra note 53
(discussing how financial sector consumer scores risk creating “a new subprime class”);
Joseph W. Jerome, Buying and Selling Privacy: Big Data's Different Burdens and Benefits, 66 STAN.
L. REV. ONLINE 47, 51 (2013) (discussing how big data harms self-determination and
autonomy, especially for poor consumers).

82 TruSignal Unveils High Value Consumer Audience Targeting Segments on the BlueKai
Exchange, TRUSIGNAL (Feb. 16, 2012), http://www.tru-signal.com/press-releases/trusignalunveils-high-value-consumer-audience-targeting-segments-on-the-bluekai-exchange.
83 Id.
84 Data Activation: The Audience Data Marketplace, ORACLE | BLUEKAI,
http://www.bluekai.com/audience-data-marketplace.php (last visited Sept. 13, 2015).


car buying sites. In contrast, the African-American-sounding names all saw
at least one ad related to “bad credit card loans” or used car purchases. The
Latino-sounding names saw a mix. When the test accounts sent emails with
the term “education” in the subject line, the white-sounding names saw
more ads for graduate education, while the non-white names saw more ads
for undergraduate and non-college education.85
Predictive analytics may also reinforce class-based discrimination. A
Wall Street Journal investigation showed that geographically-based price
discrimination can “reinforce patterns that e-commerce had promised to
erase: prices that are higher in areas with less competition, including rural
or poor areas. It diminishes the Internet’s role as an equalizer.”86 The
Journal found that Staples offered discount prices to ZIP codes with a
weighted average income of about $59,900, but offered higher prices to ZIP
codes with a weighted average income of about $48,700.87
III. Implications for Regulating Predictive Analytics
Any regulatory response must balance potential harm against the
commercial benefits of predictive analytics.88 Predictive analytics can
maximize revenues by improving marketing efficiency, improving
customer retention, attracting high-value customers, preventing fraud, and
avoiding credit risks. It can also improve the consumer experience by
providing consumers with more relevant ads, offers, and products.89
In addition, regulation must avoid stifling innovation. Legislating
technology can have unintended consequences, and legislators may have
difficulty keeping pace with developing technologies. For example, there is
near-universal agreement that the Electronic Communications Privacy
85 Nathan Newman, Racial and Economic Profiling in Google Ads: A Preliminary Investigation
(Updated), HUFFINGTON POST (Nov. 20, 2011, 5:12 AM), http://www.huffingtonpost.com/
nathan-newman/racial-and-economic-profi_b_970451.html, noted in PASQUALE, supra note 76,
at 40. For a similar study finding statistically significant discrimination in ad delivery based
on searches of 2,184 racially associated personal names, see Latanya Sweeney, Discrimination
in Online Ad Delivery, ACM QUEUE (Apr. 2, 2013), https://queue.acm.org/detail.cfm?id=2460278 (cited in PASQUALE, supra note 76, at 236).
86 Valentino-Devries et al., supra note 43.
87 Id.
88 For a discussion of how existing law may apply to predictive analytics in e-commerce,
see Shaun B. Spencer, Predictive Analytics, Consumer Privacy, and E-Commerce, in RESEARCH
HANDBOOK ON ELECTRONIC COMMERCE LAW (Elgar Publishing, John A. Rothchild ed.,
forthcoming 2015) (discussing application of the FTC Act, Children’s Online Privacy
Protection Act, Equal Credit Opportunity Act, Fair Housing Act, and state laws prohibiting
discrimination in insurance and public accommodations).
89 See MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 58; SIEGEL, supra note 1, at 23.


Act’s (“ECPA”) approach to e-mail privacy is based on a long-outdated
conception of how e-mail works.90 Yet Congress has been unable to amend
ECPA despite repeated attempts.91 Similarly, the Computer Fraud and
Abuse Act (“CFAA”) has been decried as an abuse of justice and the “worst
law in technology.”92
In the context of predictive analytics, regulators must avoid dictating
what types of data merchants can use to train predictive models, aside
from overtly discriminatory factors such as race. Regulators should likewise
avoid regulating how the models are constructed.
Instead, regulators should focus on the harmful outputs or effects of
predictive models. As discussed above, the privacy harms caused by
predictive analytics fall into three categories: (1) the loss of control over
how one’s information is used; (2) autonomy and dignity harms from
secret and even flawed uses of models to dole out commercial benefits; and
(3) discriminatory allocation of commercial benefits. Figure 1 below
proposes a classification for the degree of harm posed by the various uses
of predictive analytics in e-commerce.
Figure 1: Degrees of Harm Posed by Predictive Analytics in E-commerce

Harm \ Use | Online Behavioral Advertising | Price Discrimination | Customer Segmentation | Eligibility: General Commercial Transactions | Eligibility: Credit, Housing, Insurance
Error Rate | Minimal | Minimal | Minimal | Minimal | Moderate
Secrecy | Minimal | Minimal | Minimal | Minimal | Moderate
Poverty Discrimination | Moderate | Moderate | Moderate | High | High
Discrimination Against Traditionally Protected Classes | High | High | High | High | High

90 See, e.g., Charles H. Kennedy, An ECPA for the 21st Century: The Present Reform Efforts and
Beyond, 20 COMMLAW CONSPECTUS 129, 129, 145–53 (2011) (discussing the challenges in
applying ECPA’s outdated framework to unanticipated technologies).
91 Id. at 153–61 (discussing reform efforts). For recent attempts to amend ECPA, see
Electronic Communications Privacy Act Amendments Act of 2015, S. 356, 114th Cong. (2015);
Electronic Communications Privacy Act Amendments Act of 2013, S. 607, 113th Cong. (2013);
Electronic Communications Privacy Act Amendments Act of 2011, S. 1011, 112th Cong. (2011).
92 Lothar Determann, Internet Freedom and Computer Abuse, 35 HASTINGS COMM. & ENT. L.J.
429, 429–30 (2013) (quoting Tim Wu, Fixing the Worst Law in Technology, THE NEW YORKER
(Mar. 18, 2013), available at http://www.newyorker.com/news/news-desk/fixing-the-worst-lawin-technology).


Areas of minimal harm should be left largely to the market with
minimal regulatory intervention. The error rate, in particular, should be left
to competition between merchants to develop the most accurate models. To
mitigate the privacy harm from the secret use of predictive analytics, the
FTC could treat merchants’ privacy policies as deceptive acts under the
FTC Act, unless they disclose predictive uses of data in at least general
terms.93 This approach is by far the most achievable because it would not
require any new legislation or regulations.
Areas of moderate and high harm, in contrast, require more direct
intervention targeted at the harmful outcomes. For the moderate harms
caused by flawed data or model error in critical eligibility determinations,
regulatory approaches should require that merchants use statistically
sound methodology94 and should afford consumers the opportunity to
review and correct data.95
To address the high harms caused by discrimination against the poor
and against traditionally protected classes, new legislation should
authorize disparate-impact claims concerning eligibility for all commercial
transactions. Disparate-impact claims are currently authorized to varying
degrees in credit, housing, insurance, and employment sectors. 96 We ought
93 See Daniel J. Solove & Woodrow Hartzog, The FTC and the New Common Law of Privacy,
114 COLUM. L. REV. 583, 585 (2014) (discussing FTC privacy enforcement actions pursuant to
its jurisdiction over unfair and deceptive trade practices).
94 For example, the Equal Credit Opportunity Act requires that creditors using “empirically
derived” scoring systems use statistically sound methodology. 15 U.S.C. § 1691(b)(3) (2012); 12
C.F.R. § 202.2(p) (2014).
95 For example, the Fair Credit Reporting Act provides consumers with the right to access
information in their credit file, 15 U.S.C. § 1681g (2012), and provides a procedure for
consumers to dispute the information in their file, 15 U.S.C. § 1681i (2012).
96 See Spencer, supra note 88 (discussing how disparate-impact claims under ECOA and
FHA regulations may apply to predictive analytics). In 2015, the U.S. Supreme Court
confirmed that disparate-impact claims are cognizable under the FHA, and offered guidance
on the plaintiff’s prima facie case, the burden-shifting framework, and the business necessity
defense in FHA cases. Tex. Dep’t of Hous. & Cmty. Affairs v. Inclusive Cmtys. Project, Inc.,


not tolerate practices that exclude traditionally protected classes from
commercial transactions, regardless of whether that exclusion is
unintended.
Applying the disparate-impact test to predictive analytics, however,
will be quite challenging.97 Under the traditional burden-shifting
framework of employment cases, business necessity is a defense to proof of
disparate impact.98 Merchants may argue that predictive models by their
very nature must survive disparate-impact scrutiny because they produce
the most efficient results. Accordingly, they would argue, there is no
alternative model that would have a less discriminatory impact while
maintaining an equally efficient outcome. Consumers and regulators may
respond that a modest increase in marginal profits should not excuse
disadvantaging protected classes. Consumers could draw analogies to
employers’ failed attempts to justify discriminatory hiring practices based
on the biases of their customers.99 As a practical matter, of course, passing
broad disparate-impact legislation seems unlikely. But this debate over
how to apply disparate-impact claims to predictive analytics will likely

135 S. Ct. 2507, 2514–15, 2525 (2015). The Court emphasized several limitations on disparate-impact liability to ensure that regulated entities can make “practical business choices and
profit-related decisions” essential to free enterprise. Id. at 2518. First, the plaintiff cannot
establish a prima facie case based solely on a statistical disparity. Instead, the plaintiff must
also identify the “defendant’s policy or policies causing that disparity.” Id. at 2523. Second, the
Court emphasized that defendants facing FHA disparate-impact claims have a defense
analogous to the business necessity defense in Title VII cases, and that the defense allows
defendants to “state and explain the valid interest served by their policies.” Id. at 2522. Third,
the Court emphasized that, to refute the defendant’s stated business need or government
interest, the plaintiff must show an available alternative practice with less disparate impact
that still serves the defendant’s legitimate needs. Id. at 2518. Finally, the Court stressed that
the defendant’s “policies are not contrary to the disparate-impact requirement unless they are
‘artificial, arbitrary, and unnecessary barriers.’” Id. at 2524 (quoting Griggs v. Duke Power
Co., 401 U.S. 424, 431 (1971)).
97 In fact, discovering disparate impact in the first place may be challenging, because
consumers will not know why the merchant refused to do business with them or whether
other consumers in the protected class were excluded.
98 See 42 U.S.C. § 2000e-2(k)(1)(A)(i) (2012) (requiring that, after complainant demonstrates
disparate impact on protected class, employer must demonstrate that “the challenged practice
is job related for the position in question and consistent with business necessity”).
99 See, e.g., Rucker v. Higher Educ. Aids Bd., 669 F.2d 1179, 1181 (7th Cir. 1982) (holding
that Title VII forbids “refus[ing] on racial grounds to hire someone because your customers or
clientele do not like his race”); Diaz v. Pan Am. World Airways, Inc., 442 F.2d 385, 389 (5th
Cir. 1971) (rejecting customer preference for female flight attendants as justification for sex
discrimination, where discriminatory employment policy was not founded on “business
necessity”).


play out under the ECOA and FHA.100
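
As a purely illustrative aid, the kind of statistical disparity at issue in a disparate-impact claim can be summarized by comparing approval rates across groups. The sketch below assumes hypothetical applicant counts and hypothetical function names; this Article does not prescribe any particular metric or threshold, and, as note 96 explains, a statistical disparity alone does not establish a prima facie case.

```python
# Illustrative only: the applicant counts are invented assumptions.

def approval_rate(approved: int, applicants: int) -> float:
    """Fraction of applicants in a group whom the model deems eligible."""
    return approved / applicants

def impact_ratio(protected_rate: float, comparison_rate: float) -> float:
    """Protected group's approval rate relative to the comparison group's.

    A value of 1.0 means equal rates; lower values mean the protected
    group is approved less often.
    """
    return protected_rate / comparison_rate

# Hypothetical outcomes of a predictive eligibility model.
protected = approval_rate(approved=300, applicants=1_000)
comparison = approval_rate(approved=600, applicants=1_000)

ratio = impact_ratio(protected, comparison)
print(f"protected-class approval rate: {protected:.0%}")
print(f"comparison-group approval rate: {comparison:.0%}")
print(f"impact ratio: {ratio:.2f}")
```

On these assumed numbers the protected class is approved half as often as the comparison group. Whether such a gap is actionable would still turn on the causation, business necessity, and less-discriminatory-alternative questions discussed above.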

CONCLUSION
Predictive analytics promises substantial benefits for merchants and
consumers alike, but can harm consumer privacy. Balancing these benefits
and harms requires careful attention to the nature and degree of the harm,
as well as the risk that regulatory intervention may stifle innovation. This
Article proposes that regulators allow the market to police areas of minimal
harm flowing from the error rates and secrecy inherent in predictive
analytics. For more significant harms involving discrimination against the
poor and against traditionally protected classes, regulators and legislators
should intervene, either using existing legal tools or through new
legislation targeted at potential discriminatory impact from predictive
analytics.

100 See Spencer, supra note 88 (discussing how disparate impact claims under ECOA and
FHA regulations may apply to predictive analytics); Solon Barocas & Andrew D. Selbst, Big
Data’s Disparate Impact, 104 CALIF. L. REV. (forthcoming 2016) (discussing how disparate
impact claims may be applied to predictive analytics).
