Professor Luo Instagram research

Published on June 2016 | Categories: Types, Presentations | Downloads: 89 | Comments: 0 | Views: 673

A University of Rochester professor is mining Instagram for trends on teenage drinking.


Monitoring Adolescent Alcohol Use via Multimodal Analysis in Social Multimedia
Ran Pang, Agustin Baretto, Henry Kautz, and Jiebo Luo
University of Rochester
Rochester, NY 14627
{rpang, abaretto, kautz, jluo}@cs.rochester.edu
Abstract—Underage drinking or adolescent alcohol use is a
major public health problem that causes more than 4,300
annual deaths. Traditional methods for monitoring adolescent
alcohol consumption are based on surveys, which have many
limitations and are difficult to scale. The main limitations
include 1) respondents may not provide accurate, honest
answers, 2) surveys with closed-ended questions may have a
lower validity rate than other question types, 3) respondents
who choose to respond may be different from those who chose
not to respond, thus creating bias, 4) cost, 5) small sample size,
and 6) lack of temporal sensitivity. We propose a novel
approach to monitoring underage alcohol use by analyzing
Instagram users’ contents in order to overcome many of the
limitations of surveys. First, Instagram users’ demographics
(such as age, gender and race) are determined by analyzing
their selfie photos with automatic face detection and face
analysis techniques supplied by a state-of-the-art face
processing toolkit called Face++. Next, the tags associated with
the pictures uploaded by users are used to identify the posts
related to alcohol consumption and discover the existence of
drinking patterns in terms of time, frequency and location. To
that end, we have built an extensive dictionary of drinking
activities based on internet slang and major alcohol brands.
Finally, we measure the penetration of alcohol brands among
underage users within Instagram by analyzing the followers of
such brands in order to evaluate to what extent they might
influence their followers’ drinking behaviors. Experimental
results using a large number of Instagram users have revealed
several findings that are consistent with those of the
conventional surveys, thus partially validating the proposed
approach. Moreover, new insights are obtained that may help
develop effective interventions. We believe that this approach
can be effectively applied to other domains of public health.
Keywords-social media, social multimedia, data mining,
underage drinking, public health

INTRODUCTION
Surveys have been widely used in conducting public
health related studies, in the forms of online survey, mobile
survey, paper survey, or a combination of all modes.
Online surveys and mobile surveys tend to be the most
cost-effective modes of survey research today. Surveys have
several advantages and disadvantages, including:
Advantages:
- Relatively easy to administer.
- Can be developed in less time (compared to other
  data-collection methods).
- Can be administered remotely via online, mobile
  devices, mail, email, kiosk, or telephone.
- Conducting a survey remotely can reduce or prevent
  geographical dependence.
- Numerous questions can be asked about a
  subject, giving extensive flexibility in data analysis.
Disadvantages:
- Respondents may not feel encouraged to provide
  accurate, honest answers.
- Respondents may not feel comfortable providing
  answers that present themselves in an unfavorable
  manner.
- Respondents may not be fully aware of their reasons
  for any given answer because of lack of memory on
  the subject, or even boredom.
- Surveys with closed-ended questions may have a
  lower validity rate than other question types.
- Data errors due to question non-responses may exist.
- Respondents who choose to answer a survey question
  may differ from those who choose not to answer,
  thus creating bias.
With the growing popularity of social networking sites
and the proliferation of mobile devices and camera phones,
new opportunities and challenges emerge as people can
now actively generate contents that provide a unique
compilation of information that is more frequently updated
and self-representative than traditional media. Analysis of
pictures uploaded by users and their descriptions, together
with the use of face detection technologies and data mining
techniques, can allow us to obtain rich and useful data
about the users. Such data can be collected at scale and used
to better understand populations’ behaviors. This can be of
vital importance for areas such as public health when trying
to identify subpopulations at greatest risk for a particular
problem, as health professionals can potentially target their
prevention campaigns to intervene before problems fully
develop.
Despite the extensive research on social media, social
multimedia is a relatively under-investigated area. A few
notable studies have shown the enormous potential of using
rich multimodal data, especially visual data and textual data
together, to develop novel solutions to current problems in
politics, business, health, travel, entertainment, social
networking, and lifestyle [12][29][28][24][25][26][27]. In
this paper, we introduce the use of social multimedia as an
alternative and complementary source of data for research
that could potentially overcome many of the limitations
of traditional methods of data collection, such as
surveys. In particular, we show that we can monitor
underage consumption of alcohol by analyzing multimedia
contents from the popular social network Instagram,
identifying both age and drinking patterns. We will also
analyze the Instagram accounts of several alcohol brands in
order to measure the number of underage users following
them and evaluate the penetration of alcohol advertising
within social networks and its impact upon users.
I.

RELATED WORK

Alcohol is the drug of choice among youth. According to
the National Survey on Drug Use and Health (NSDUH), a
survey carried out in 2013 by the U.S. Department of
Health and Human Services, an estimated 8.7 million
underage persons (aged 12 to 20) were current drinkers,
including 5.4 million binge drinkers (consuming 4 or more
drinks per occasion for women or 5 or more drinks per
occasion for men at least once in the past month) and 1.4
million heavy drinkers. Corresponding percentages of
underage persons in 2013 were 22.7 percent for current
alcohol use, 14.2 percent for binge alcohol use, and 3.7
percent for heavy use. Apart from the direct consequences
of drinking to a teenager’s health, such as brain [23], liver
[3] and growth and endocrine [5] effects, there are also the
indirect risks (school problems, social problems, legal
problems, unprotected sexual activity, access to other drugs)
and harms caused by underage drinkers to others (violence,
sexual assault) or even fatalities such as traffic accidents
[11]. NSDUH indicates that an estimated 10.9 percent of
persons aged 12 or older drove under the influence of
alcohol at least once in the past year. Young adult drinkers
pose a serious public health threat, putting themselves and
others at risk [16].
Surveys are the most widely used data collection method
for underage drinking¹. In the US, several national-scope
surveys related to alcohol drinking are carried out every
year, most of them sponsored by organizations such as the
National Institute on Alcohol Abuse and Alcoholism
(NIAAA) and the Centers for Disease Control and
Prevention (CDC). One of the most popular ones is the
“Monitoring the Future” (MTF) survey, defined on its
website as “a continuing study of American youth”, which
has been taking place in secondary schools since
1975. The MTF and NSDUH are the Federal Government's
largest and primary tools for tracking youth substance use.
All these surveys provide useful data for the understanding
of alcohol problems and the elaboration of specific
prevention campaigns. Still, they suffer from limitations that
are intrinsic to surveys, such as sampling error margins, high
costs in terms of time and money, low scalability, and
delays between data collection and the availability of
results. It takes, for example, around a year to prepare,
perform and process the
data for the MTF survey with a sample of 46,000 students
from 389 secondary schools across the United States. Under
such circumstances, a more scalable approach with
acceptable accuracy to monitoring adolescent alcohol use
becomes imperative.

¹ http://pubs.niaaa.nih.gov/publications/2012DataDirectory/2012DataDirectory.htm

There is a fundamental difference between traditional
survey data and data extracted from social media. While
questionnaire researchers actively ask subjects for the data
of interest, social media collection and analytics
focus on mining for the data of interest among the data
voluntarily provided and made public by subjects. The most
important advantage of such a data gathering method lies in
its scalability. With huge amounts of free information
flowing over social media, the scale can easily range from a
small, localized area to a nation-wide
focus without significant additional costs. On the other
hand, information on social media tends to be mixed and
noisy, as opposed to the straightforward answers that can be
found on any traditional questionnaire. Thus, data mining
methods are often required to filter useful information. Our
approach calls for the use of the photos and texts on
Instagram to first identify the targeted population and
subsequently monitor the adolescent drinking activities and
patterns. In addition to scalability and potentially interesting
new findings, our approach provides a nearly real-time,
low-cost, effective and complementary alternative to
traditional surveys.
A growing body of research literature addresses
the use of public information available on the Internet
to survey populations. Google Trends data has been used to
monitor phenomena ranging from viral disease outbreaks [30]
to the use of tobacco by youth and adults [19]. Other
works use social media contents to
monitor epidemic diseases [13] and to detect
alcohol-related promotions on Twitter [14]. There are also
reports on how whispers on Twitter help law enforcement
detect adolescent alcohol drinking parties, vandalism, and
the traces of fugitives [20]. However, only a very limited
number of activities are popular enough to produce
whispers that get noticed, while the majority of more
private alcohol use activities can easily escape attention.
The main contributions of this work are several-fold:
1. We identify a demographically matching social
multimedia platform in Instagram to study the
important social problem of underage drinking;
2. We employ state-of-the-art computer vision
techniques to identify the targeted population
among Instagram users;
3. We exploit natural language understanding based
on an extensive dictionary to extract activity signals;

4. We combine robust and complementary signals
from multiple modalities to discover behavior patterns
at a large scale and a fine granularity not seen
before on this subject; and
5. We develop a novel alternative approach to surveys
using social multimedia and obtain promising
results for a public health problem.

II. DATA SOURCE

Several features of Instagram make it a good choice for
monitoring adolescent alcohol use. Today there are more
than 300 million Instagram users², among which around
41% are aged between 16 and 24. Instagram has become
the most important social platform for teenagers, who have
started to migrate from their former accounts on Facebook and
Twitter, especially those with upper incomes and
from urban areas [22]. One motivation for this migration
may be that Instagram offers greater opportunities for youth
to share information outside of the gaze of parents or school
officials. Adolescents often friend their parents on
Facebook [15], and Facebook’s real names policy makes it
relatively easy for any parent or school official to find a
youth’s Facebook profile. While Twitter does not require
real names, its broadcast nature and large adult audience
make it an unattractive place to post information that a user
wishes to keep confidential. By contrast, teenagers find in
Instagram a safe place to post potentially incriminating
information about underage alcohol use, without fear that it
will be seen by their parents or teachers. As a matter of fact,
Instagram does not require a real name or even a birth date
when registering on the site. The lack of age verification
means that Instagram may have many users even younger
than 13, even though this would not be in compliance with
existing federal laws³. While only 17% of online US adults
use Instagram, more than 90% of Instagram users are under
the age of 35, and usage by teens aged 13-18 grew from
17% to 30% between 2012 and 2014 [22]. Roughly half of
Internet-using young adults ages 18-29 (53%) use
Instagram and half of all Instagram users (49%) use the site
daily [7]. This makes our studied subsample highly
representative for the intended study.
The distributions can also be seen graphically in Figure 1.

Figure 1: Instagram statistics from BI Intelligence.

Interestingly matched with Instagram user demographics,
studies have also found that higher levels of family income
are related to higher levels of alcohol use, heavy use, and
abuse/dependence among persons under age 18 [8].
Another advantage of using Instagram over other social
networks is that users’ interactions within Instagram are
primarily based on sharing images and videos (that can be
supported by comments and even hashtags), which provide
a good input for both image-based analysis and tag-based
analysis. Tags may be the most important text information
source. Users can add hashtags to their posts to label the
topics or features being represented. Anyone can search for
posts on Instagram related to the topics and
features they are interested in. Users can also comment on
posts of others.
Moreover, Instagram is a mobile-phone based app with
official support for iOS and Android handsets and
third-party support for Blackberry and Nokia-Symbian devices.
Therefore, most photos and videos on Instagram are taken
by users with their own smartphone cameras and from their
real current locations as things are happening. In fact, the
level of user engagement is very high, with 49% of them
connecting on a daily basis [7]. Most Instagram media
contents are self-representative and users use the social
network as a way to show themselves and their lives on a
daily basis.
Because of its incredible growth over the last few years,
Instagram has also become a place to be for advertisers.
Around 92% of major brands have an Instagram account [9]
and this provides a baseline to measure the penetration (if
any) of alcohol-related brands among teenagers. Finally,
most Instagram account contents, profiles and lists of
friends and followers are available via official APIs (with
the exception of data set as private by their owners).

² http://blog.instagram.com/post/104847837897/141210-300million
³ http://www.ecfr.gov/ (Title 16, Chapter I, Subchapter C, Part 312)
III. METHODOLOGY

Instagram does not provide any personal information
about its users (such as real name, age, gender, address).
Therefore, most of our study will be based on extracting
information out of the only data available: contents
generated by users. Taking into account that Instagram is an
image-based social network where most of the users upload
self-representative pictures and that these are usually
accompanied by descriptive hashtags, we can make use of
both these elements to infer the missing data needed for our
study. This computational data analytics framework is
shown in Figure 2 and is generally applicable to problems that
involve using social media to study user behaviors.

Figure 2: The general framework.

A. Text Analysis
The first, quickest and easiest way of filtering media and
profiles is by analyzing the text attached to them. Pictures
and videos contain descriptions and comments (which can
also contain hashtags) and user profiles can also have some
text such as a bio. We will use the bio and descriptions
created by owners of accounts to detect the language
spoken by the user with the aid of existing Java libraries⁴.
We will also use customized dictionaries in order to filter
specific contents requested from Instagram. A selfies
dictionary will be used to detect those images containing
tags related to self-representation, and we will then assume
that such images are representations of the owner. The
selfie images will eventually be used as an input to
calculate users’ demographics with image analysis tools, as
explained later in this section.
Another set of keywords will be used for the alcohol
dictionary (see Appendix A). This dictionary will be used to
identify those pictures associated with alcohol consumption.
If a user posts a picture with alcohol-related tags such as
"tequila" or "drunk", and considering most of the contents
are usually self-referenced, it is reasonable to believe that
the picture might have been taken during an alcohol
consumption act where the user was involved. The
procedure for creating the dictionary was iterative and
manual. In the beginning, the dictionary only contained a
few seed terms directly associated with drinking behaviors
and conditions (drunk, drinking and alcohol). As we
iterated over the pictures returned by Instagram containing
those tags, we gradually incorporated similar tags or
synonyms also found in those images.
Human supervision was required for this task in order to
avoid adding to the list popular hashtags (e.g., #happy) or
terms that were not exclusive to alcohol consumption (e.g.,
“party” might appear in many images that are associated with
alcohol but also in others that are not). Picture descriptions
often include slang terms and intentional or
accidental misspellings. Thus, many term variations (such
as plurals) and deformations (e.g., “wiskie”) were also
included in the list.
The alcohol dictionary is composed of 1) non-ambiguous
words referring to alcohol consumption and its
consequences, 2) a list of non-ambiguous popular alcohol
brands (e.g., “Corona” is excluded because it is a beer brand
but also means “crown” in Spanish, so that hashtag might not
necessarily be related to alcohol consumption), and 3) a list
of non-ambiguous popular alcoholic drinks (e.g., “daiquiri”
but not “screwdriver”). More details can be found in the
Appendix.

⁴ Shuyo, Nakatani, Language Detection Library for Java.
http://code.google.com/p/language-detection, 2010

Figure 3: Examples of face detection and facial feature
localization by Face++ (shown with permission).

Figure 4: Examples of selfies.
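
The dictionary-based filtering described under Text Analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two tiny dictionaries, the `normalize_tag` and `matches` helpers, and the shape of the `posts` records are hypothetical stand-ins for the curated lists in the Appendix and for media returned by an API query.

```python
# Hypothetical miniature dictionaries; the paper's real lists are far larger
# and were curated by hand (see the Appendix).
ALCOHOL_TERMS = {"drunk", "drinking", "alcohol", "tequila", "daiquiri", "wiskie"}
SELFIE_TERMS = {"selfie", "selfies"}


def normalize_tag(tag):
    """Lower-case a hashtag and strip whitespace and the leading '#'."""
    return tag.strip().lstrip("#").lower()


def matches(tags, dictionary):
    """Return True if at least one normalized tag appears in the dictionary."""
    return any(normalize_tag(tag) in dictionary for tag in tags)


# Toy posts standing in for media returned by an API query (assumed format).
posts = [
    {"id": 1, "tags": ["#Tequila", "#friday"]},
    {"id": 2, "tags": ["#selfie", "#happy"]},
    {"id": 3, "tags": ["#party"]},   # ambiguous term, not in the dictionary
    {"id": 4, "tags": ["#corona"]},  # ambiguous brand name, not in the dictionary
]

alcohol_posts = [p["id"] for p in posts if matches(p["tags"], ALCOHOL_TERMS)]
selfie_posts = [p["id"] for p in posts if matches(p["tags"], SELFIE_TERMS)]
print(alcohol_posts, selfie_posts)
```

Exact matching against a curated set mirrors the choice to keep only non-ambiguous terms: a tag such as "party" or "corona" is simply absent from the set, so it never triggers a match.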

B. Image Analysis
We will make use of an existing engine, Face++, which is
devoted to face and age detection from images of people⁵, in
order to discriminate those users above the legal drinking
age (21 years old for the US) from those below. Given an
image, the engine can locate all the faces or the single
largest face (based on parameter settings) and return an
estimation of the age of that person. The example (shown
with permission) in Figure 3 shows the faces detected by
the engine in a photo, as well as the facial features that will be
used for gender and age recognition. Here pink squares
suggest female and blue squares suggest male. There are also
estimations for age and race for each face detected in the
photo. Note that the actual selfies (Figure 4) may not be as
clean and high in quality.

⁵ Megvii Inc. Face++ Research Toolkit. www.faceplusplus.com, December
2013.

The Face++ engine also allows us to estimate other
demographics from a face picture, such as gender and race.
Gender is a floating-point number that ranges from -1.0 (male)
to +1.0 (female) and race is a discrete value that is
one of White, Asian and Black.
To benchmark the accuracy of the face analysis engine
for classifying users based on age and gender, we tested on
a subset of 1000 randomly selected Instagram users who
had posted selfies. We compared the age and gender
estimation of the face analysis engine with our manually
classified ground truth. We found that the precision of the
age classifier was 0.80 with a recall of 0.64. The precision
of the gender classifier was 0.93 with a recall of 0.94 (at the
threshold of 0.0). The accuracy for gender classification is
very high on our data set, which is completely independent
of Face++. Note that the accuracy for age estimation is
sufficient to support the rest of the experiment: even
though the recall is only 0.64, what matters to our study is to
reliably find, at the precision of 0.80, sufficient numbers of
(as opposed to all) users in the age groups of interest.
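
As a reminder of how the precision and recall figures above are defined, the following minimal sketch thresholds gender scores at 0.0 and compares them with manual labels. The scores and labels are made-up toy values, and treating "female" as the positive class is an assumption; the paper does not state which class was used.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for paired boolean predictions and labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy gender scores in [-1.0, +1.0]; values above the 0.0 threshold are read
# as female (assumed positive class).
scores = [0.8, -0.3, 0.1, -0.9, 0.4, -0.2]
is_female_truth = [True, False, False, False, True, True]  # made-up manual labels

predicted_female = [score > 0.0 for score in scores]
precision, recall = precision_recall(predicted_female, is_female_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy sample two of the three predicted positives are correct and two of the three true positives are found, so both values come out at two thirds.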

Figure 5: Distribution of estimated age of Instagram Users.

Figure 5 shows the distribution of estimated age of the
population of Instagram users we track. As expected, the
Instagram users are mostly young people with the peak age
around 24. Moreover, our population consists of a good
portion of teenagers and young adults, who are the primary
subject of this study.
It is very challenging for state-of-the-art image
analysis algorithms to determine whether a picture is
related to alcohol
consumption. A photo of a bottle or a glass cannot
reliably be assumed to contain alcohol, and the
possibilities of objects being photographed are endless.
Instead, our alcohol detection approach is based on the tags
and comments attached to users’ uploaded media.
C. Age Detection Process
Our approach to detecting the age of users involves the
following steps:
1) We first request from Instagram all of the media
contents uploaded by our specific target user;
2) We filter those pictures with our selfies dictionary in
order to obtain what we assume are pictures representing
the user (we double check face sizes to confirm selfies);
3) We use the face detection engine to detect the
presence of any face in each of the pictures. In case no
face is detected, the picture is discarded;
4) In case there is one face detected, we estimate the age
of the person with the aid of the age detection engine. If
there is more than one face, we use the largest face as
the input for the engine. The logic is: if the photo is a
selfie and there is a face in it, this face is likely to be the
account owner’s face. If there are many faces, the person
taking the picture is likely the one closest to the camera;
5) We continue iterating over the rest of the pictures from
that same target user also tagged as selfies, applying the
same validations and processing to each of them until
there are no more selfies or we reach a reasonable limit;
6) Once all of the selfies from a user have been analyzed, we
average all the different age estimations and assign that
estimated age to the respective account; and
7) Other demographics are also calculated by the engine
and associated with each account: gender is averaged, and
race is picked as the value appearing the most times.
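
The aggregation in steps 3 to 7 can be sketched as follows. This is a minimal illustration, not the authors' code: the face records (dictionaries with `width`, `height`, `age`, `gender`, `race`) are a hypothetical stand-in for whatever the face analysis engine returns, and the `max_selfies` cap is an assumed value for the "reasonable limit" mentioned in step 5.

```python
from collections import Counter
from statistics import mean


def largest_face(faces):
    """Pick the face with the largest bounding-box area (assumed account owner)."""
    return max(faces, key=lambda face: face["width"] * face["height"])


def estimate_demographics(selfies, max_selfies=50):
    """Aggregate per-selfie face estimates into one estimate per account.

    `selfies` is a list with one entry per selfie; each entry is the list of
    faces detected in that picture (hypothetical format).
    """
    picked = []
    for faces in selfies[:max_selfies]:
        if not faces:          # step 3: discard pictures without any face
            continue
        picked.append(largest_face(faces))  # step 4: keep the largest face
    if not picked:
        return None
    return {
        "age": mean(face["age"] for face in picked),        # step 6
        "gender": mean(face["gender"] for face in picked),  # step 7
        # Most frequent race label; ties resolve to the first one encountered.
        "race": Counter(face["race"] for face in picked).most_common(1)[0][0],
    }


selfies = [
    [{"width": 120, "height": 120, "age": 19, "gender": 0.7, "race": "White"},
     {"width": 40, "height": 40, "age": 30, "gender": -0.5, "race": "Asian"}],
    [],  # no face detected, discarded
    [{"width": 100, "height": 110, "age": 17, "gender": 0.9, "race": "White"}],
]
result = estimate_demographics(selfies)
print(result)
```

Here the small background face in the first picture is ignored, the empty picture is skipped, and the two remaining age estimates (19 and 17) are averaged.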
It is worth mentioning that the availability of many
selfies for the same user helps reduce the error in the
estimated age. Face++ claims a standard deviation $\sigma$ of 5
years for age estimation. If we take the average of $m$
independent age estimates from different selfies of the same
Instagram user, assuming that the error follows a Gaussian
distribution, the standard deviation $\sigma_m$ of the averaged age
estimate from $m$ independent identically distributed
estimates will be reduced to
$$\sigma_m = \frac{\sigma}{\sqrt{m}} \qquad (1)$$
Assuming on average $m = 4$, the standard deviation is
down to 2.5 years. If $m \geq 25$ for some users, then $\sigma_m \leq 1.0$.
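
The arithmetic behind Equation (1) can be checked with a short sketch. The function name `averaged_std` is introduced here for illustration only; the value of 5 years is the standard deviation quoted above.

```python
import math


def averaged_std(sigma, m):
    """Standard deviation of the mean of m i.i.d. estimates, sigma / sqrt(m)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return sigma / math.sqrt(m)


for m in (1, 4, 25):
    print(m, averaged_std(5.0, m))
```

With a per-estimate standard deviation of 5 years, four selfies bring the value down to 2.5 years and twenty-five selfies to 1 year, matching the figures in the text.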
IV. EXPERIMENTS AND EVALUATION

A. Experiment 1: Underage Drinking Patterns
The first of our experiments is directed at monitoring the
time of publication of alcohol-related contents created by
English-speaking teenagers and discovering any patterns in
them. To this end, we queried the Instagram API for an
initial sample of images matching words from our alcohol
dictionary and for the data associated with each of the
owners of those alcohol-related images. Some preprocessing
tasks had to be performed first. We
wanted our sample of users to be English speakers (as that
was the language used in our dictionaries) and teenagers.
To do this, we detected the language used in the bio and
comments of the users in the sample and discarded those
with a language other than English. After that, we estimated
the age of each of the remaining users and discarded those
above 21 years old. Our sample then had a size of 15,522
underage, English-speaking users who had uploaded
195,325 pictures associated with alcohol.

Figure 6: Time pattern of underage alcohol use.
[Plots not reproduced. Rows: alcohol media and all media (y-axis: number of media), and ratio (y-axis: percentage). Columns: day of week, month, day of month.]

Figure 7: Time pattern of alcohol use in NYC.
[Plots not reproduced. Panels: alcohol media and all media (y-axis: number of media), and ratio (y-axis: percentage), each by hour of day.]

The next step was to track all the posts tagged with
alcohol-related words and take their timestamps as a
reference for the time of the user's alcohol use. One
assumption here is
that every time a user posts some content related to alcohol,
the alcohol consumption act happened within a reasonable
timeframe of at most one day from that upload time.
Instagram, unlike other photo sharing platforms such as
Flickr and Pinterest, is designed so that most users post
pictures instantly as things are happening.
Figure 6 displays the results of our first experiment. The
first row shows the number of alcohol-tagged media posted
on each day of the week, in each month, and on each day of
the month, respectively. The second row shows the same
statistics for
all media posted by those same users (no matter what their
tags were). The third row shows the ratio of the number of
alcohol-tagged media to the number of all media. The
results show a positive correlation between alcohol-tagged
media and all media. Both alcohol-tagged media and all
media increase significantly during the weekends (with a
Monday hangover). They both grow around December and
January. They are slightly higher at the beginning and end
of each month. The ratios indicate that the proportion of
alcohol-tagged media does not increase as much as the
volume of all media during the weekends, at the end and
beginning of a year, or at the end and beginning of a
month. We also noted that both the number of
alcohol-tagged media and the number of all media increase steadily
across most of the year. This might reflect the general
growth in the number of media posted
since the debut of Instagram, as the platform has become
more popular.
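
The three rows of Figure 6 (alcohol-tagged counts, all-media counts, and their ratio) can be reproduced per time bucket with a short sketch. This is an illustration under assumptions: timestamps are taken to be Unix times in UTC, the weekday numbering (1 = Monday) is a choice made here and may not match the figure's axis, and the toy data is invented.

```python
from collections import Counter
from datetime import datetime, timezone


def weekday_counts(timestamps):
    """Count posts per ISO weekday (1 = Monday ... 7 = Sunday), in UTC."""
    return Counter(
        datetime.fromtimestamp(stamp, tz=timezone.utc).isoweekday()
        for stamp in timestamps
    )


def ratio_by_bucket(alcohol_counts, all_counts):
    """Share of alcohol-tagged posts among all posts, per time bucket."""
    return {
        bucket: alcohol_counts.get(bucket, 0) / total
        for bucket, total in all_counts.items()
        if total > 0
    }


def utc_ts(year, month, day, hour):
    """Helper that builds a Unix timestamp from a UTC date and hour."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


# Toy data: 3 January 2015 was a Saturday, 4 January a Sunday, 5 January a Monday.
all_media = [utc_ts(2015, 1, 3, 20), utc_ts(2015, 1, 3, 22), utc_ts(2015, 1, 4, 12),
             utc_ts(2015, 1, 5, 9), utc_ts(2015, 1, 5, 18)]
alcohol_media = [utc_ts(2015, 1, 3, 22), utc_ts(2015, 1, 5, 18)]

ratios = ratio_by_bucket(weekday_counts(alcohol_media), weekday_counts(all_media))
print(ratios)
```

Dividing by the all-media count per bucket is what separates a genuine rise in alcohol-related posting from a rise that merely follows overall posting volume. The same two functions work for month or day-of-month buckets if the bucketing expression is changed.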
B. Experiment 2: Location-Specific Drinking Pattern
The goal of the second experiment is to obtain more
accurate and fine-grained data about drinking times. Note
that Instagram uses GMT in the timestamp regardless of
where the user is, so we do not have the precise local time.
To reduce the variation in posting times across
the world, we needed to limit our sample users to a specific
time zone, which involves using the geolocation property
of pictures. Users can optionally allow Instagram to store
their current location at the time of uploading a picture or
video. Unfortunately, only a very small percentage of media
contents contain location data, so we decided to pick a
crowded city as the filtering parameter for our media
requests in order to maximize our sample size. The global
coordinates sent as a parameter to our Instagram query
were the ones surrounding the greater area of New York
City, including the neighboring area in New Jersey. Once
we obtained all of the pictures taken in NYC, we filtered
them through our alcohol dictionary and extracted their
posting times.
Figure 7 shows the within-day time pattern of
alcohol use in New York City. The alcohol-tagged pictures
posted within the area of New York City give us an insight
into the within-day time pattern of alcohol consumption
among Instagram users. However, the results show that
both alcohol-tagged pictures and pictures in general follow
a similar pattern with respect to the posting hour of day.
The numbers of posts of both alcohol-tagged pictures and
all pictures
reach their minimum around midnight and surge to
their maximum around 6 pm. The ratio of the number of
alcohol-tagged pictures to the number of all pictures
suggests that Instagram users tend to post more
alcohol-tagged pictures during the night than early in the day.
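
The location filter and the conversion from GMT timestamps to local posting hours can be sketched as below. Everything specific in this example is an assumption: the bounding-box coordinates are a rough, hypothetical approximation of the greater New York City area, the `lat`, `lon` and `created_time` field names are illustrative, and a fixed UTC-5 offset is used for simplicity, which ignores daylight saving time.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical bounding box roughly covering the greater New York City area.
NYC_BOX = {"lat_min": 40.4, "lat_max": 41.0, "lon_min": -74.3, "lon_max": -73.6}
# Fixed UTC-5 offset (Eastern Standard Time); daylight saving time is ignored.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def in_box(post, box=NYC_BOX):
    """Return True if the post's coordinates fall inside the bounding box."""
    return (box["lat_min"] <= post["lat"] <= box["lat_max"]
            and box["lon_min"] <= post["lon"] <= box["lon_max"])


def local_hour(unix_ts):
    """Convert a UTC Unix timestamp to the local hour of day (0-23)."""
    utc_time = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return utc_time.astimezone(EASTERN_STANDARD).hour


def hourly_counts(posts):
    """Count geotagged posts inside the box by local posting hour."""
    return Counter(local_hour(p["created_time"]) for p in posts if in_box(p))


def utc_ts(year, month, day, hour, minute=0):
    """Helper that builds a Unix timestamp from a UTC date and time."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc).timestamp()


posts = [
    {"lat": 40.73, "lon": -73.99, "created_time": utc_ts(2015, 1, 10, 23, 30)},
    {"lat": 40.65, "lon": -73.95, "created_time": utc_ts(2015, 1, 11, 3)},
    {"lat": 34.05, "lon": -118.24, "created_time": utc_ts(2015, 1, 11, 3)},  # outside the box
]
counts = hourly_counts(posts)
print(counts)
```

Restricting the sample to one time zone is what makes a single offset meaningful. A production version would more likely use a time zone database so that daylight saving time is handled.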
C. Experiment 3: Youth Exposure to Alcohol Media
Our third experiment is designed to find
underage users among the followers of Instagram accounts
that belong to major alcohol brands and thereby
demonstrate the exposure teenagers have today to contents
such as advertisements and promotions that encourage the
consumption of alcoholic drinks. Note that, in theory, this
should not be allowed.
We randomly picked 5 Instagram accounts belonging to
major alcohol brands (Smirnoff, Skyy Vodka, Heineken,
Chandon USA, and Carlsberg) out of a list of popular
brands⁶ and requested from Instagram the profiles and media
of each of the followers of these brand accounts. We iterated
through the users to run our demographics analysis process.

Figure 8: Age distribution of alcohol brand followers.

Age Distribution:
Figure 8 shows the age distribution of alcohol brand
followers. This experiment shows that all of the 5 brands
are being followed by underage users. In general, these
underage users account for more than one quarter of all the
followers of each brand. This could be alarming!

Figure 9: Gender distribution of alcohol brand followers.

Gender Distribution:
Figure 9 shows the gender distribution of alcohol brand
followers. Gender tends to vary among the brands. There
are significantly more female followers than male followers
for Smirnoff and Chandon USA. On the other hand,
Heineken and Carlsberg have more male followers than
female followers. Heineken and Carlsberg are the brands
with the lowest ratios of underage followers. Table 1 also
shows that youth and adult audiences do not always follow
the same gender distribution. For example, the underage
followers of the beer brands Heineken and Carlsberg are
roughly balanced in gender, whereas male adults are about
double female adults for both brands.
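
The gender comparison drawn from Table 1 can be checked directly from its percentages. The sketch below assumes that the flattened table reads row by row (Male Under, Female Under, Male Above, Female Above) across the five brand columns, which is consistent with each brand's four values summing to about 100%. The dictionary layout and function names are illustrative only.

```python
# Percentages from Table 1, assuming the row-by-row reading described above.
TABLE_1 = {
    "Smirnoff":    {"male_under": 6.43,  "female_under": 19.76, "male_above": 29.69, "female_above": 44.12},
    "Skyy Vodka":  {"male_under": 6.79,  "female_under": 15.12, "male_above": 42.16, "female_above": 35.93},
    "Heineken":    {"male_under": 10.95, "female_under": 12.17, "male_above": 51.91, "female_above": 24.97},
    "Chandon USA": {"male_under": 6.10,  "female_under": 19.49, "male_above": 24.27, "female_above": 50.14},
    "Carlsberg":   {"male_under": 13.21, "female_under": 11.58, "male_above": 52.41, "female_above": 22.79},
}


def underage_share(row):
    """Percentage of a brand's followers estimated to be underage."""
    return row["male_under"] + row["female_under"]


def male_to_female(row, group):
    """Male-to-female ratio within an age group ('under' or 'above')."""
    return row[f"male_{group}"] / row[f"female_{group}"]


for brand, row in TABLE_1.items():
    print(f"{brand:12s} underage={underage_share(row):5.2f}%  "
          f"M/F under={male_to_female(row, 'under'):.2f}  "
          f"M/F above={male_to_female(row, 'above'):.2f}")
```

For Heineken, for instance, the male-to-female ratio is close to 1 among underage followers and close to 2 among adult followers.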
⁶ http://list.totems.co/tag/alcoholic-drinks/

Figure 10: Race distribution of alcohol brand followers.

Race Distribution:
Figure 10 shows the race distribution of alcohol brand
followers. For all five brands, more than three quarters of
the followers are white people. The remaining followers are
divided into black people and Asian people.
Table 1 summarizes and cross-examines age, gender and
the five major alcohol brands. We can clearly see different
preferences among users of different ages and genders. The
underage interests do not always mirror those of the adults.

Table 1: Alcohol follower distribution by gender and age.

               Smirnoff   Skyy Vodka   Heineken   Chandon USA   Carlsberg
Male Under       6.43%       6.79%      10.95%       6.10%       13.21%
Female Under    19.76%      15.12%      12.17%      19.49%       11.58%
Male Above      29.69%      42.16%      51.91%      24.27%       52.41%
Female Above    44.12%      35.93%      24.97%      50.14%       22.79%
(Under/Above: below/above the legal drinking age.)

D. Discussions
Using a different data source in social multimedia, we
observed many interesting patterns of underage drinking in
its natural, undisturbed state. Some of the findings are
consistent with what we already know from numerous
surveys on this same subject, for example,
1. Underage consumers still make up a significant
proportion of total alcohol consumers.
2. Underage alcohol consumption is higher on weekends
and holidays, and happens more often near the end of a
day. This is similar to the alcohol consumption of the
general population.
3. The underage alcohol drinker group consists of both
females and males and is not strongly biased toward
either gender. The one-third proportion of male
underage alcohol users among all underage alcohol users
is likely a result of the gender distribution of
general Instagram users, with 68% female users [22].

The above findings are consistent with those of
conventional surveys, thus indirectly validating the
proposed approach. On the other hand, we have also
discovered several new patterns, which may help develop
effective interventions, including:
1. Social media does provide an informative platform for
analyzing underage alcohol use activities. Social
media excel at real-time, scalable data analysis.
2. Underage drinkers are willing to share their alcohol
consumption experience to some extent. The time
pattern of posting alcohol-related activities is
correlated with the general pattern of posting, while
underage Instagram users who use alcohol are slightly
more likely to show their involvement with alcohol
during the weekdays. This may indicate a psychological
influence of alcohol use.
3. Alcohol promotion on social media has deeply
penetrated the young user segment. The
significant number of underage users following the
polled brands could be an alarming indication of the
lack of regulation of social media advertising and of
the ease of access to content that should be off-limits
to minors.
4. Not all of the alcohol brands have the same proportion of
underage followers. This suggests that the results of our
third experiment do not just represent random
noise but rather reflect the reach each brand has among
young audiences and the age segments it targets.
Heineken, for example, has aimed many marketing
campaigns at seniors in the last few years [10],
and it appears in our experiment as a brand that is not
so popular among teenagers.
5. Different genders also show preferences for certain
brands of alcohol. However, gender preferences are
not always consistent between underage people and
adults.
V. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel approach to monitoring
underage drinking, a major public health problem. The
traditional survey-based methods used to monitor adolescent
alcohol consumption have serious limitations in terms of
scale, timeliness, and cost. Given the wide adoption of
social media by the targeted population of teenagers, we
exploit the content of the same media that youth use to
express themselves to extract signals that can help us
get a handle on the problem of underage drinking. To that
end, we overcome the lack of explicit demographic
information by employing state-of-the-art computer
vision algorithms for face analytics to acquire information
on the age, gender and race of teenagers involved in alcohol
use. We then build a comprehensive language model to
capture drinking-related activities from the tags associated
with Instagram photos. We are able to demonstrate the
effectiveness and synergy of the proposed multimodal
approach by discovering a number of underage drinking
patterns in contrast to adult drinking patterns, and by
examining the potential influence of social media on
underage drinking.
In the future, we intend to mine deeper-level patterns on
this subject in terms of factors such as family income, rural
vs. urban and coastal vs. inland regions, as well as social
influence by peers in the social networks. One important
direction is to combine the proposed approach with surveys,
which can be used to verify the findings obtained through
social media data mining. In the long run, we are interested
in applying this methodology for social good to other
problems that involve youth, such as tobacco, drugs, teen
pregnancy, stress [6], and depression [4].
ACKNOWLEDGMENT
The authors gratefully acknowledge the generous support of
Google, Xerox, Yahoo, New York State CoE CEIS and IDS.
REFERENCES
[1] Carter, M. Vertical Focus – Alcohol Brands: Drink up. New
Media Age, 18-20, 2010.
[2] Chester, J.; Montgomery, K.; and Dorfman, L. Alcohol Marketing
in the Digital Age. Center for Digital Democracy and
Berkeley Media Studies Group, 2010.
[3] Clark, D.B.; Lynch, K.G.; Donovan, J.E.; and Block, G.D.
Health problems in adolescents with alcohol use disorders:
Self-report, liver injury, and physical examination findings
and correlates. Alcoholism: Clinical and Experimental
Research 25:1350–1359, 2001. PMID: 11584156
[4] De Choudhury, M.; and De, S. Mental Health Discourse on
reddit: Self-disclosure, Social Support, and Anonymity.
Proceedings of International Conference on Web and Social
Media (ICWSM), 2014.
[5] Dees, W.L.; Srivastava, V.K.; and Hiney, J.K. Alcohol and
female puberty: The role of intraovarian systems. Alcohol
Research & Health 25(4):271–275, 2001. PMID: 11910704
[6] Dinakar, K.; Lieberman, H.; Weinstein, E.; and Selman, R.
Stacked Generalization to Predict Adolescent Distress.
Proceedings of International Conference on Web and Social
Media (ICWSM), 2014.
[7] Duggan, M., Ellison, N.B., Lampe, C., Lenhart, A. and
Madden, M. Social Media Update 2014, Pew Research
Center, January 2015. Available at:
http://www.pewinternet.org/2015/01/09/social-mediaupdate-2014/
[8] Robert L. Flewelling, Mallie J. Paschall, and Christopher
Ringwalt. The Epidemiology of Underage Drinking in the
United States: An Overview.
http://www.ncbi.nlm.nih.gov/books/NBK37602/
[9] Galloway, S.; Sabria, P. L2 Intelligence Report: Instagram,
L2, February 2014.
[10] Heineken Press, HEINEKEN crowns winners of Ideas
Brewery Crowdsourcing Platform 60+ Challenge, August
2014. Available at:
http://www.theheinekencompany.com/media/mediareleases/press-releases/2013/08/1722670
[11] Hingson, Ralph, et al. Magnitude of alcohol-related
mortality and morbidity among US college students ages
18–24: Changes from 1998 to 2001. Public Health 26, 2005
[12] Xin Jin, Andrew Gallagher, Jiawei Han, Jiebo Luo. Wisdom
of Social Multimedia: Using Flickr for Prediction and
Forecast. ACM Multimedia Conference, 2010.
[13] Krieck, Manuela, et al. A new age of public health:
Identifying disease outbreaks by analyzing tweets.
Proceedings of Health Web-Science Workshop, ACM Web
Science Conference. 2011.
[14] Menon, A.; Farmer, F.; Whalen, T.; Beini Hua; Najib, K.;
Gerber, M. Automatic identification of alcohol-related
promotions on Twitter and prediction of promotion spread.
Systems and Information Engineering Design Symposium
(SIEDS), pp. 233-238, April 25, 2014.
[15] Moira Burke. How Families Interact on Facebook.
www.facebook.com. Web. Accessed 1/23/2015, 2012.
[16] National Institute on Alcohol Abuse and Alcoholism
(NIAAA). Young Adult Drinking. Alcohol Alert, Vol. 68,
April 2006
[17] National Survey on Drug Use and Health (NSDUH). Survey
on Drug Use and Health: Summary of National Findings,
NSDUH Series H-48, HHS Publication No. (SMA) 14-4863.
Rockville, MD: Substance Abuse and Mental Health
Services Administration, 2014.
[18] Nicholls, J. Everyday, everywhere: alcohol marketing and
social media—current trends. Alcohol and alcoholism,
ags043, 2012
[19] Patricia A Cavazos-Rehg, Melissa J Krauss, Edward L
Spitznagel, Ashley Lowery, Richard A Grucza, Frank J
Chaloupka, Laura Jean Bierut. Monitoring of non-cigarette
tobacco use using Google Trends, 2014
[20] Quiggins P. Police catch two fugitives with help of
Facebook.
http://www.wkyt.com/home/headlines/Police_catch_two_fugitives_with_help_of_Facebook_139556273.html, 2012.
[21] Sebastian, J. Carlsberg offers drinkers half-price beer in
exchange for Instagram posts. Marketing Week (Online),
Mar 17, 2014
[22] Smith, C. Here's Why Instagram's Demographics Are So
Attractive To Brands, Business Insider, August 2014.
Available at: http://www.businessinsider.com/instagramdemographics-2013-12
[23] White, A.M.; Jamieson-Drake, D.W.; and Swartzwelder,
H.S. Prevalence and correlates of alcohol-induced blackouts
among college students: Results of an e-mail survey. Journal
of American College Health 51:117-119, 122–131, 2002.
PMID: 12638993
[24] Ruichi Yu, Shuguan Yang, Guifan Li, Chao Qian, Sambit
Sahu, Ching-Yung Lin. Mobile App Connecting People
Based on Personality Detection and Image Perception
Analysis. ISM 2014: 333-340.
[25] Yun Yang, Peng Cui, Vicky Zhao, Wenwu Zhu, Shiqiang
Yang. Emotionally Representative Image Discovery for
Social Events. ACM ICMR, 2014

[26] Ting Yao, Yuan Liu, Chong-Wah Ngo, Tao Mei. Unified
Entity Search in Social Media Community. International
World-Wide Web Conference (WWW), 2013.
[27] Heng Liu, Tao Mei, Houqiang Li, Jiebo Luo, Shipeng Li.
Robust and Accurate Mobile Visual Localization and Its
Applications. ACM Trans. on Multimedia Computing
Communications and Applications, Volume 9, Number 1,
October 2013.
[28] Danning Zheng, Tianran Hu, Quanzeng You, and Jiebo Luo.
Towards Lifestyle Understanding: Predicting Home and
Vacation Locations from User’s Online Photo
Collections. AAAI International Conference on Weblogs
and Social Media (ICWSM), 2015.
[29] Dawei Zhou, Jiebo Luo, Vincent Silenzio, Yun Zhou, Glenn
Currier, and Henry Kautz. Tackling Mental Health by
Integrating Unobtrusive Multimodal sensing. AAAI
Conference on Artificial Intelligence (AAAI), 2015.
[30] Zhou, X., Ye, J., & Feng, Y. Tuberculosis surveillance by
analyzing Google trends. Biomedical Engineering, IEEE
Transactions on, 2011.
