Portfolio (Compressed)

Published on January 2017

Wu, Jie-Wei
I certify that the work included in this portfolio is my own original work.
Work conducted as part of a team or other group is indicated and attributed as such; the other team members are named and a true
description of my role in the project is included.

Self Graphic
Works

“I’m practicing letting it go.
It’s all about you but nothing to do with you.”
“Which accompanies you longer, a person or a
pair of shoes?”

2011 NTUEE
Camp, Journal
and Film Group

Lectures
Git Best Practice & Common FAQ
Photoshop Class
Implicit Animation
Am I Destined to Love You?

Research
Publications
Design of Test Functions for Discrete Estimation of Distribution Algorithms.

Thesis

Appendix

User Preference Based Recommendation System Design with
Adaptive Concept Space

WEBSITE DESIGN
As one of the easiest ways to spread information, a good website
must get many details right in its design.
Here I select two websites whose design I was wholly responsible for.

News e Forum
This is a web design case for “News e Forum.” News e Forum
is a student media outlet staffed by graduate students from the
journalism departments of three universities: National
Taiwan University, National Chengchi University, and National Taiwan Normal University. They aim to inform more
people about public issues in the aftermath of the popular
“Sunflower Movement” of 2014 in Taiwan. I was responsible
for the entire website design, and “Aotter” studio (http://aotter.net/) was in charge of the implementation work. Dark red
is the theme color, and I made the website background
white to make the articles more readable.

URL: http://newseforum.com/

I designed both the PC and mobile versions
for a better user reading experience.

News Reading Page

Including reference link, sharing function,
and moving to next/prev page

Introduction Page

Activity Page

Projectype
A web service, built on Ruby on Rails, that helps users manage individual/group projects. After an extensive survey of current project
management services, we designed it to feature easy settings for sub-task dependencies and a friendly UI driven by simple mouse clicks/drags.
It received the first place award given by Chunghwa Telecom (Taiwan’s biggest telecom company)
and one of three creativity awards given by Wantoto Inc. in the final demo/poster show for the course Network and Platform
Service Programming.
I was independently responsible for the whole UI/UX design and a major portion of the front-end development, and I
discussed the website data structure design with Wang Yu-Hsin. Wang was responsible for the entire data structure implementation, and
Chen Pei-Chun worked on some of the front-end development and was the major presenter. I proposed the idea, and the three of us
then discussed it in detail.
URL: http://projectype.herokuapp.com/

After logging in to the welcome page, users can create multiple
projects.
We divide a big project into three levels: project, tasks, and subtasks. A
project contains many tasks and each task is composed of sequential
subtasks. Subtasks are dependent; that is, you can only start one subtask
after finishing another.
These pictures are part of the introduction page.

Gantt Chart
Subtasks belonging to a task are shown with their
start/due dates corresponding to the date scroll below.
Users can drag the subtasks around to change their
start/due dates and draw arrows between two subtasks to
set their dependency. Just click on the cute red dots and
start dragging the arrows around.
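
The dependency arrows amount to a directed graph over subtasks. Below is a minimal Python sketch of checking such a graph, assuming hypothetical subtask names and a list of (before, after) arrow pairs. The Projectype service itself was built on Ruby on Rails, and its actual data model is not shown in this portfolio.

```python
from collections import defaultdict, deque


def startable_order(subtasks, depends_on):
    """Return subtasks in an order that respects every dependency.

    subtasks: list of subtask names (hypothetical labels).
    depends_on: list of (before, after) pairs, meaning `after` can only
    start once `before` is finished.
    Raises ValueError if the arrows form a cycle.
    """
    successors = defaultdict(list)
    pending = {name: 0 for name in subtasks}
    for before, after in depends_on:
        successors[before].append(after)
        pending[after] += 1

    # Begin with subtasks that wait on nothing (Kahn's algorithm).
    ready = deque(name for name in subtasks if pending[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(subtasks):
        raise ValueError("dependency arrows form a cycle")
    return order
```

A cycle check of this kind is one way to reject an arrow that would make two subtasks wait on each other.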

Subtask Card
The details of a subtask such as start/due date, description, comment and members in charge are recorded in a
"card" which shows up by clicking the subtask.

This service provides two modes for better project management: Task mode and User mode. Users can add
tasks and subtasks in Task mode and see the “subtask card.”
In User mode, there is a calendar where users can see
the start/due dates of the subtasks they and their project
partners are in charge of.

PROJECTS
I worked on projects combining graphics and programming. Here I
select a mobile phone game and a work from a hackathon.

King of Plants
This is a PvP Android game featuring a cute hand-drawn user interface. Two players play it on Android mobile
phones. My team members were Jian, Bo-Yu and Shao, Zhong-Yu. I was in charge of most of the
UI/UX design and drew all of the pictures needed. I discussed some user interactions and the touch-screen
function with Jian, Bo-Yu. I also took part in some programming work, such as the animated background and video
effects, while my team members created the major programming structure.


The two mobile phones are connected
via Bluetooth. Players need to move
their phones and touch screens to
control their characters to attack each
other until one of them is out of
“blood.”

When users want to move their
characters, they need to tilt the
phone to the left/right, and the
character moves left/right,
following gravity.

For different actions, the entire screen is divided into eight blocks. When users
want to invoke jumping or shooting “bullets,” they need to touch the corresponding block. In addition, there is a configuration view for users to change the
function of the blocks to their preference.
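
The touch handling can be sketched as a lookup from touch position to block number. The Python sketch below assumes a 2 x 4 grid and a hypothetical default action table. The game's real block layout and its Android code are not given in this portfolio.

```python
# Assumed layout: 2 rows x 4 columns = 8 blocks, numbered 0..7 row by row.
ROWS, COLS = 2, 4

# Hypothetical default assignment; a configuration view would edit this table.
DEFAULT_ACTIONS = {0: "jump", 3: "shoot"}


def block_index(x, y, width, height):
    """Map a touch at pixel (x, y) to a block number on a width x height screen."""
    col = min(int(x * COLS / width), COLS - 1)   # clamp touches on the right edge
    row = min(int(y * ROWS / height), ROWS - 1)  # clamp touches on the bottom edge
    return row * COLS + col


def action_for_touch(x, y, width, height, actions=DEFAULT_ACTIONS):
    """Return the action bound to the touched block, or None if unbound."""
    return actions.get(block_index(x, y, width, height))
```

Keeping the block-to-action table separate from the geometry is what would let a configuration view reassign functions without touching the hit-testing code.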

I designed three characters: sunflowers, dandelions and peas. I slightly
modified the graphics to fit each character's actions, such as
jumping, moving, falling and shooting bullets.

Demo Video:
www.youtube.com/watch?v=QUu6QaXmG-I

If characters touch the bugs, they lose “blood.”
I drew three different bugs.

After connection, users will see this view and they may
choose their characters and the “circumstance” in which
they are going to fight.

The background makes players feel that they are in a flower
shop. During play, some furniture is blurred; only the pieces
in the clear foreground view are available for characters to
move around on.

Pictures for more powerful weapons, weapon symbols, and
bumping effect.

This is the main menu. Users can touch the door to enter the game or the blackboard on the
right to change the configuration.

I Have Questions
I participated in “HackNTU” with my classmates, Yu-Hsin Wang and Hsiang-Sheng Liang.
Our work is named “I Have Questions.” It provides instructors a JavaScript snippet to insert
into their HTML slides that allows students to ask questions directly on the slides during the
lecture. The idea came from Liang, who worked with Wang to build the server connection
and the database.
I was responsible for the user interface design, website interface design, logo
design, and front-end development. Ultimately we won the first place award. We were also
invited to talk about our work at another meetup, where we won the most popular award as
voted by the audience. I represented our team in introducing our work to the public.

During a Class:
This is an HTML slide based on RevealJS. The
left panel shows the questions that students
raise, and the number shows how many
students have the same question. Students can
see what questions others have and click a button
to indicate “I have the same question.”
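
The counter in the left panel can be thought of as one set of students per question. Below is a minimal Python sketch with hypothetical class and method names. The actual work is a JavaScript snippet backed by a server and a database, neither of which is reproduced here.

```python
from collections import defaultdict


class QuestionBoard:
    """Tracks which students share each question on a slide."""

    def __init__(self):
        # question text -> set of student ids who asked or agreed
        self._askers = defaultdict(set)

    def ask(self, student_id, question):
        """Post a question, or mark 'I have the same question'."""
        self._askers[question].add(student_id)

    def panel(self):
        """Return (question, count) pairs, most shared first."""
        counts = [(q, len(ids)) for q, ids in self._askers.items()]
        return sorted(counts, key=lambda item: -item[1])
```

Storing student ids in a set means a student who clicks the button twice is still counted once.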

Website
This website explains how our work functions. Here lecturers can
get a “code” to put into their slides to use our work.

URL: http://owo.herokuapp.com/

GRAPHIC DESIGN
I was involved in creating various graphic designs, such as logos, phone
cases, clothes, and name cards.
I have a huge passion for creating beautiful visual work.

2012
NTUEE
APPAREL

I designed this shirt with Hsiang-Sheng
Liang in 2012. This work won the most
votes and became the 2012 department-pride apparel for more than 1,000
fellow EE students.
The idea came from the heavy workload
we EE students face every day, including
memorizing many formulas from
the study of electronics, electromagnetism,
differential equations and so
on. I discussed this idea with Liang and
we worked together to typeset many
formulas with LaTeX and convert
them into Photoshop shapes. I also
drew an amplifier on it and a different
reading of “GB product,” which
originally means that gain cross
bandwidth is constant. I thought that
“Grade cross bed is constant” could
indicate how EE students study hard to
get good grades. I believe this T-shirt is
meaningful since it reminds us of the
days we studied so hard.

2011 NTUEE Camp Uniform
NTUEE camp is the most important event to introduce the department to
potential students. We organized a 5-day event that included various
lectures, experiments, and activities for 120 high school students. The
theme for the 2011 NTUEE Camp was “rEEturn.” I was in charge of
designing the uniform. With Tiffany blue as the base color, I intended to show the flow
of time and used a yellow character to present the theme of this camp.

2012 NTUEE Year Book
I was one of two editors of our 24-page EE Yearbook and was responsible for the major design, cooperating with
Hsiang-Sheng Liang. This work won second place in the NTU yearbook contest. Here I select part of it.
The opening spread reads as if
we were about to tell you a story,
and this story must have a computer:
the screen is a photo of
all EE students, while the
keyboard shows many scenes
we’re familiar with, including
our buildings, the Smith chart,
Lena, and the textbooks we
both love and hate. On the
right side, the shape of an electronics
textbook lies on the
tables we used to study at, along with
the sign “No Food and
Keep Quiet.”

This is a view from a game that was popular among us
then and an important part of our lives.
Thus I used it to frame students’ profile photos.

The familiar classrooms are places we will never
forget. We learned so much here with
our classmates. That’s why I chose this place as the
background and added photos showing us studying,
playing and hanging out together. The photos and words
are laid out like dialog boxes. I wanted to create the feeling that even if the classroom is empty and quiet,
what happened there still exists forever.

NTUEE camp is the most important event to introduce
the department to potential students. We organized a
5-day event that included various lectures, experiments,
and activities for 120 high school students. This is the
building where, on the last day, all of the EE students and
our high school students would take pictures. Thus, I made
the photos seem to emerge from the building,
indicating that the place contains our camp memories.

NTUEE night is one of the most important events for EE students and
it’s one of the incredible nights at NTU. Across these two pages, the
background is the window of the place where we performed, so I
put the photos of our performance in those windows.
The last page looks like one of our exam papers, with many familiar
graphs. However, all of the charts are drawn as timelines
and every important event is noted. In addition, the elephant
shape and the refrigerator come from a joke: “How do engineers
put an elephant into a refrigerator? Just give them a deadline.”
Hsiang-Sheng Liang was mainly responsible for this page. I
helped him confirm details and discussed the typesetting.

Graphics of Films

Pencil, Photoshop CS6, and graphics tablet (Wacom bamboo)

These drawings were used in a film that my friends and I entered in our campus film contest.
I played the leading character and was responsible for all of the drawings shown in the film. The story is about a little girl
who loved drawing and imagined a boy to be a caterpillar. I drew the pictures to reveal what was in her mind. I drew with pencil and paper, scanned
the drawings into the computer, and then colored them in Adobe Photoshop with a graphics tablet.

Mola
Flutefish
Phone Case and Bag Mockup
These are pictures I created for a demo website in the lecture by Hsiang-Sheng Liang. I drew the mola and flutefish with
Adobe Illustrator then put them in this phone/bag case template.

This is the screenshot of the demo website.
http://flutefish.herokuapp.com/

CUPETIT Valentine’s Day Visual Theme

CUPETIT is a store selling cupcakes (https://www.cupetit.com). I helped them design the pictures used in flyers and on their
website for Valentine’s Day 2012. The photo is from CUPETIT and I discussed the wording and colors with them, using Adobe
Photoshop to adjust the photo.

Original photograph Credit: Huang Shih-Yu

Logo Design
“DMCC” is an abbreviation of Digital Movie Creator Club.
We get together to produce films. The club had started just a year
before I joined, and I became executive-in-chief in my second
year, when I designed the logo for our club. I put many movie
elements inside this logo, such as a camera, clapperboard, and film,
implying that all of these are part of “DMCC.”

Shotwill is a startup which provides a platform for photographers
and people who want to buy photo materials. I designed the
camera shape logo. The gradient blue color and the white clouds
are added by Liang, Hsiang-Sheng, one of the co-founders of
Shotwill.

This is a fish logo composed of the letters J-a-p-i-e. Japie is my
nickname. I love the image of fish, so I created this logo as
my identity.

Name Card Design
When I was an intern at Womany, I designed my name card. People
can scan the 1D barcode as well.

Womany Facebook Pages Graphic Design
When I was an intern at Womany, I helped design a picture for a
series of activities that aimed to tell readers how to get along
with each other and love themselves more. In my design, the checklist
items are statements readers tick to see whether they love themselves. This was
used for one post on the Facebook pages that Womany operates.

Do You Really Love Yourself? Five Checks:
You’re always unsatisfied with your appearance.
You always fit yourself into the trend.
You seldom sincerely praise yourself.
You live in the image set by others.
You negate yourself because other people refuse
you.
If you check three or more, that means you should love yourself more.

Facebook pages

SELF GRAPHIC WORK
I enjoy drawing and have created many works on my own.
Several pictures are selected here.

“I’m practicing letting it go.
It’s all about you but nothing to do with you.”

This picture was inspired by a sentence in a movie, The Shoe Fairy.

“Which accompanies you longer, a person or a pair of shoes?”

The idea came from a lyric:
“I fall asleep while missing you.”

“Do you hide a sun in your hand?”
“Or why do I feel so warm whenever holding your
hand?”

I love to plan my schedule while listening to
my favorite music.

I depicted my image and put things I love
together.

Taking a picture of myself

A dancing girl shakes her body to the rhythm.

2011 NTUEE Camp, Journal and Film Group
NTUEE Camp is the most important event in a year. I was in the Journal
and Film Group and several works are selected.

COURSE PROJECTS
Here I select four course projects and briefly introduce them. The complete
reports are appended as an appendix of this portfolio.

2011 NTUEE Camp, Journal and Film Group
Newspaper
I was in charge of the newspaper, named EEN, in the camp. We interviewed people during the
events and compiled interesting stories. In addition, we included fun things as well, such as paper
advertisements, crossword puzzles and horoscopes. We published a six-page newspaper reporting the events of the previous day for six consecutive days. I worked with a group and was
responsible for the major design, work distribution and final review.
I designed the logo of EEN.

All Newspaper for six days

NTUEE camp newspaper

The students were reading our newspaper.

We used mosaic skills to compose these two pages with
photos taken during the camp.

Good Morning, Circuit
Video URL: https://www.youtube.com/watch?v=3JsAJvu4ijY
This was one of many “advertisements” between the “TV
News” segments produced by the Journal and Film group. These
films were played during NTUEE Camp. We mimicked the
famous MTV format and modified it to be more “electronic engineering oriented.” In this advertisement, the
original lyrics were adjusted to make them sound like setting up
circuits in a lab. I played the “host” role, was responsible
for the graphics, and directed this advertisement.
I designed this banner and decomposed it into parts so that it
could be animated in the film.

Course Projects
Here I select four course projects and briefly introduce them. The complete reports are appended as an
appendix of this portfolio.

Faceloook

Artificial Intelligence

Faceloook is a browser extension that learns
a user's preferences on her/his own Facebook
feed. Faceloook uses click events on each post
to distinguish the interesting posts from the
others, collecting labels without a user’s cognitive effort. A Naive Bayes classifier is used to
create the user preference model out of contextual features of the posts and the click event
labels. 5-fold cross validation is performed to
evaluate the system. The entire project is open
source. From the context of the post, we
extracted a couple of features that indicate whether
a user was interested in a post or not. All tokenized text features were concatenated and
marked with labels, “Interesting” or “Not-Interesting,” according to the presence of click
events. Then Faceloook used a third-party
Naive Bayes classifier library to
train and test. The output was the probability
that the user was interested in the post. In addition,
users could give feedback by clicking on the
star we added to a post; the label of that
post was then altered, but only if the post
had not been used for training yet. In the query interface
view, the feed posts were categorized and
sorted by the probability that the user is interested
in each post. Time was also considered as an
aging factor.
Query Interface
Faceloook Block Diagram

Ringtune

Data Mining

We built an Android application that intelligently adjusts the volume level of mobile
devices to a desirable level, to avoid the
volume being too loud in a quiet environment or
too soft in a noisy environment. After observing and surveying previous work, we decided
upon several features, including ambient
sound, light, proximity, z-axis acceleration and
low-frequency acceleration. We then collected
the features and the ringer volume settings as
training data during daily use of the mobile
phone. We combined two models.
The first model, called the “ambience
model,” takes ambient sound data as
input and outputs an “intermediate
volume.” Then the “major model” predicts the
ringer volume based on the intermediate
volume and the other sensor features. The reason for the split
is that the ambient sound is recorded as an
amplitude histogram: we had 96 different
numbers, but they are actually closely related. In
this project, I was responsible for the ambience
model, collecting training data and integrating
the inputs and outputs of the system.
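
To illustrate the two-stage structure, the sketch below chains two regressors in Python. The k-nearest-neighbor regressor is a stand-in assumed for illustration; this portfolio does not state which learning algorithms the two models used, and all function names are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training vectors closest to `query`."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def predict_ringer_volume(histogram, sensors, ambience_data, major_data, k=3):
    """Two-stage prediction: histogram -> intermediate volume -> ringer volume.

    ambience_data: (list of histograms, list of volumes) for stage one.
    major_data: (list of [intermediate, *sensors] rows, list of volumes)
    for stage two.
    """
    hist_x, hist_y = ambience_data
    # Stage one: collapse the whole histogram into a single number.
    intermediate = knn_predict(hist_x, hist_y, histogram, k)

    major_x, major_y = major_data
    # Stage two: combine that number with the other sensor readings.
    return knn_predict(major_x, major_y, [intermediate, *sensors], k)
```

One plausible reading of the design is that collapsing the histogram first keeps its many closely related bins from outweighing the handful of other sensor features in the second stage.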

The comments left by online readers are
tokenized and extracted as features via
chi-square selection.

ShrimpNews

Web Retrieval and Mining

We extracted the latest news
articles and their social network
comments to create a training
database. After that, we invited
volunteers on the Internet to mark
whether a news post from the database
was useless. Terms that were
highly correlated with valueless
news were extracted from the
database and used as feature
terms in our classifier. We then
trained different ML models,
including SVM, Naive Bayes and
KNN. After adding some terms
as editor’s choices, we
presented the list of valueless
news posts.
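
Chi-square term selection, mentioned above for the reader comments, can be sketched in Python as follows. This is the textbook 2x2 chi-square score under assumed inputs (one token list and one label per document), not the team's actual code, and the function names are hypothetical.

```python
def chi_square(term_docs, label_docs, total_docs):
    """Chi-square score for one term against one label from a 2x2 table.

    term_docs: set of document ids containing the term.
    label_docs: set of document ids carrying the label (e.g. "valueless").
    total_docs: number of documents overall.
    """
    a = len(term_docs & label_docs)   # term present, label present
    b = len(term_docs) - a            # term present, label absent
    c = len(label_docs) - a           # term absent, label present
    d = total_docs - a - b - c        # term absent, label absent
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return total_docs * (a * d - b * c) ** 2 / denominator


def top_terms(docs, labels, target, n):
    """Rank terms by chi-square against `target` and keep the best n."""
    label_docs = {i for i, label in enumerate(labels) if label == target}
    postings = {}
    for i, tokens in enumerate(docs):
        for token in set(tokens):
            postings.setdefault(token, set()).add(i)
    scored = {t: chi_square(ids, label_docs, len(docs)) for t, ids in postings.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scored, key=lambda t: (-scored[t], t))[:n]
```

The chi-square score is symmetric, so it also ranks highly the terms that signal the opposite label. Keeping only terms associated with valueless news would need an extra direction check, for example comparing `a * d` with `b * c`.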
I was responsible for building a
website on which people can tag
news posts as valueless or useful.
In addition, I cooperated on the
feature extraction and model
building, and represented our
team in the presentation at the
final demo show.

A title list of valueless news posts

Movie to Map

Multimedia Analysis and Indexing

When people watch a film, they
might want to know where its
scenes were shot, and they might
even want to visit the same place.
Based on this idea, we created a
tool that automatically clips
scenes from a film and then pins
their locations on a map.
First we trained a gist filter with
an SVM and an existing database, then
used the trained gist filter and
face recognition tools to clip the
scenes from a film. Several
groups of shots were produced,
and each group represented a
scene; we used multiple shots to
improve the precision rate. Then
we made use of the imgur API and
Nokogiri (a Ruby library) to
perform a Google image search and
extract similar images, which are
presented with name information.
Those names then became the
search keywords on Google Maps.
The map with the scenes pinned is
the result.
I was in charge of the structure
design and integration of the slots
result, using the imgur API and the
Google Maps API to search for
similar results and the map API to
present visual feedback. In
addition, I represented my team
in the presentation at the final
demo show.

A film was extracted into several groups.

This shows the map with the particular
location of a scene in a film.

LECTURES
I gave various types of lectures. Here I select a part of them and briefly
introduce them. Full slides are on SlideShare.
http://www.slideshare.net/jieweiwu/presentations

RESEARCH
The abstracts of a published paper and my master's thesis briefly describe
my research experience in genetic algorithms and information retrieval.

APPENDIX
The reports of Course projects.

Lectures
I gave various types of lectures. Here I select part of them and briefly introduce them. Full slides are
on SlideShare. (http://www.slideshare.net/jieweiwu/presentations)

Introduction to Protractor
Yahoo Internal
Tech

Protractor is an end-to-end testing framework that was
originally designed to test AngularJS and then expanded to
test web views/functions in general. I talked about the
concept of Protractor, followed by the steps of installing and running it. Several test cases are used as
examples to help the audience understand. I presented
this talk at a front-end meetup at Yahoo Inc.

Git Best Practice & Common FAQ
Yahoo Internal
Tech

Git is a powerful version control tool. Here I talk about
best practices for collaboration. Some common situations and their best solutions are presented. I also talk about
some skills like rebase, reflog, and stash. I presented this
talk to three search teams at Yahoo Inc.

Implicit Animation
iOS

This was a talk for an iOS study
group at Yahoo. I
was in charge of
implicit animation.

Photoshop Class
Design Tool
Club

A lecture for a Photoshop class at the NTU Digital
Movie Creator Club (DMCC), with an appended handout.

Am I Destined to Love You?
Psychology

This is a two-hour lecture about intimate relationships
from the perspective of psychology, given at a gathering
named “Code and Beer” and attended by EE and
CSIE students at the end of 2013. I talked about how
psychologists view a relationship and introduced some
experiments and theories. I also discussed which is the
key to maintaining a relationship: knowing yourself
better, lessening your own anxiety, or finding the right guy.

Research
Publications
Shih-Ming Wang, Jie-Wei Wu, Wei-Ming Chen & Tian-Li Yu. Design of Test Functions for Discrete Estimation of Distribution Algorithms. Proceedings of the Genetic
and Evolutionary Computation Conference (GECCO-2013), 2013.
Full Paper: https://goo.gl/83iKOy
Abstract
Two types of problem structures, overlapping and
conflict structures, are challenging for the estimation of
distribution algorithms (EDAs) to solve. To test the
capabilities of different EDAs of dealing with overlapping and conflict structures, some test problems have
been proposed. However, the upper-bound of the degree
of overlap and the effect of conflict have not been fully
investigated. This paper investigates how to properly
define the degree of overlap and the degree of conflict to
reflect the difficulties of problems for the EDAs. A new
test problem is proposed with the new definitions of the
degree of overlap and the degree of conflict. A framework for building the proposed problem is presented,
and some model-building genetic algorithms are tested
by the problem. This test problem can be applied to
further research on overlapping and conflict structures.

Thesis
User Preference Based Recommendation System Design with Adaptive Concept Space
Full Thesis: https://goo.gl/Rj0Npd
Abstract
This thesis proposes a recommendation system (RS)
which incorporates the advantages of the
user/item-based collaborative filtering (CF) and the
content-based filtering. Unlike the user/item-based CF,
where the user/item spaces are of high dimension, the
proposed RS utilizes user-based and item-based
concept spaces whose dimension, or number of
concepts, is increased only when necessary. In addition, the
proposed system can deal with the cold start problem
by producing another kind of dimension for items. By
modifying clustering results, it can be used to create
recommendations amid rapidly increasing information.
The dimension of the item-based concepts is defined by
the features of the items, and concepts are the clustering
result of the item-based concept space. The user-based
concepts are the result of adjusting the clustering of the
item-based concepts with information on users’
behaviors, such as whether or not a user is interested in
both items in a concept. The user-based and item-based
concepts co-evolve iteratively in the above manner. In
the end, the proposed RS utilizes the learned concepts
combined with the reading dependence to perform
recommendation. The proposed techniques are demonstrated on article recommendation. In this case, the
features of an item correspond to the segmented
contents of an article, and users’ behaviors correspond to
users’ reading preferences.
In the experiment, the item-based and user-based CF dimensions are about 30,000 and 3,000, respectively, while the concept space
of articles in the proposed RS starts at 5 and ends at
merely 87 after 12 iterations. The proposed RS dynamically adjusts the dimension of articles. The dimension of
articles is 44 in the end and is used for clustering articles.
New articles can then be clustered and recommended as
well.
The precision-recall curves indicate that the proposed
RS achieves more hits than user-based/item-based CF
and content-based filtering. The average precision-recall
curves and mean average precision of the proposed system
grow and exceed those of the others. This idea of two concept
spaces can be extended to other situations where items have
extractable features to serve as dimensions and where items
and users interact.
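
Mean average precision, one of the measures cited above, can be computed as in this Python sketch. It uses the common definition (precision averaged at the ranks where relevant items appear, normalized by the number of relevant items). The thesis's exact evaluation protocol is not reproduced here, and the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where a relevant item appears.

    ranked: recommended item ids, best first.
    relevant: set of item ids the user was actually interested in.
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average AP over (ranked, relevant) pairs, one pair per user."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A system that places the articles a user actually read nearer the top of its list scores a higher value, which is the sense in which one recommender "exceeds" another on this measure.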

Faceloook: Learning user preferences on Facebook feed
Hsiang-Sheng Liang

Jie-Wei Wu

Department of Electrical Engineering
National Taiwan University
{b97901125, b97901084}@ntu.edu.tw
ABSTRACT

We present Faceloook, a browser extension that learns a user's
preferences on her/his own Facebook feed. Faceloook uses
click events on each post to distinguish the interesting
posts from the others, collecting labels without the user’s
cognitive effort. A Naive Bayes classifier is used to create the
user preference model out of contextual features of the
posts and the click event labels. 5-fold cross validation is
performed to evaluate the system. The entire project is
available on the Internet (footnote 1).
INTRODUCTION

Browsing the overwhelming feed on Facebook is time-consuming, not to mention searching for a specific post on the
feed. However, the posts that a user is really interested in are
far fewer than those the user has read. Users often find
themselves scrolling down their news feed merely to
pick up one or two posts they are really interested in. This is
still true even with the “Top stories” version of the news
feed, which shows only posts selected according to the affinity
score and the social events that occurred on the post [1].
We believe that a user preference model can do better at
choosing what a user is really interested in. User
interactions and the contextual cues when browsing the
feed can help us conclude whether a post is interesting or
not. Combined with the context of the post as the machine-learning feature, a user preference model on the Facebook feed
can be constructed.
Faceloook is a Google Chrome extension that records user
clicks on the user’s Facebook news feed. A simple click
event can be any interaction between the post and the user,
such as “like” and “reply”. When such an interaction exists,
we suppose that the user is interested in the post. The click
event data and the post context are used to train a Naive
Bayes classifier, which is the user preference model that
distinguishes interesting posts from others.
As an application of the model, the browser extension also
provides a query interface, enabling search on Facebook
feeds. The results are sorted according to the classifier
output, helping users to find desired posts quickly.
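
A minimal Python sketch of such a preference model follows: a multinomial Naive Bayes classifier with add-one smoothing, trained on tokenized posts labeled by whether they were clicked. This is an illustrative stand-in rather than Faceloook's code; the class name, the smoothing choice, and the label strings passed in by the caller are assumptions.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        total = sum(counts.values()) + len(self.vocab)
        # Log prior: share of training posts carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            if token in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[token] + 1) / total)
        return score

    def probability(self, tokens, label):
        """Posterior probability of `label` for a tokenized post."""
        scores = {c: self._log_score(tokens, c) for c in self.classes}
        top = max(scores.values())
        # Subtract the maximum before exponentiating to avoid underflow.
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        return exp[label] / sum(exp.values())
```

Sorting posts by `probability(tokens, "interesting")` would give the kind of ordering the query interface describes.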

1 https://github.com/MrOrz/faceloook

Figure 1: Screenshot of Faceloook query interface. The
figure shows the posts classified as “interesting” to the
user, guessing what post the user might want to search
even before he/she enters a search term. The order is
determined using the trained user preference model.
RELATED WORK

Webb et al. [2] identify several challenges in user
modeling, including the need for large datasets and labeled
data, concept drift and computational complexity.
Michalski et al. [3] discussed algorithms for learning and
revising user profiles that can determine which websites on
a given topic would be interesting to a user. They used a
naive Bayesian classifier for this task, and demonstrated
that it can incrementally learn profiles from user feedback
on the interestingness of Web sites. In an experimental
evaluation, they compare the Bayesian classifier to
computationally more intensive alternatives and show that
it performs at least as well as these approaches across
a range of different domains. They also empirically analyze the
effects of providing the classifier with background
knowledge in the form of user-defined profiles, and examine the
use of lexical knowledge for feature selection.
Facebook once had a function called “news feed
preferences”, which allowed users to adjust which types of
story show up in their news feed. However, it
required user intervention and cognitive effort to tune the
preference settings. The function has not been available
since 2009.

because she/he does not see it yet. The existance of
“updated_at” timestamp marks the post as “seen”, or “once
appeared in screen”. We use only “seen” posts to train the
model. Secondly, the timestamp can be used to impose
different weights on the classifier output, which is
explained in detail in the “query interface” section below.
Another interaction the extension concerns is the “click”
event mentioned before. Any click event occured on the
area of a post (figure 4) will mark the post as “clicked” in
the Web SQL Database. The “updated_at” timestamp is
updated when a post is clicked.
Figure 2: News feed preferences of Facebook.2
DESIGN AND IMPLEMENTATION

Faceloook is implemented as a Google Chrome extension.
It collects user interaction with the Facebook feed. The
interaction information is combined with contextual
information of the posts, which consist of the training
feature and the interesting / not-interesting label. The
feature is processed by word segmentation system, before
being modeled by the Naive Bayes algorithm. The
extension also provides a feedback mechanism to let user
actually see the classification result, and the ability to alter
the label. Finally, a query interface is built, with the user
preference model in mind, enabling search functionality
over the user’s news feed. The system block diagram is
shown in figure 3.
Figure 4: The purple shaded area is the area of a feed.
Any mouse click action, including like, comment, or
drag-selecting words within will mark the post as
“clicked”.
Feature Extraction

From the context of the post, we extracted several
features that affect whether a user is interested in a post.
Table 1 lists the features that we consider to
matter most.

Figure 3: Block Diagram for Faceloook, the Google
Chrome extension.
User Interaction Recording

The first thing the extension does is collect user
interactions with the posts in the feed. The browser
extension content script stores the Facebook object IDs of all
posts in the Web SQL Database, but only those that have
appeared on screen have an “updated_at” timestamp,
indicating the last interaction time. The “updated_at”
timestamp is important in two ways. First, only “seen”
posts are meaningful to user preference modeling, since we
cannot say the user is not interested in an unseen post just

Feature      Description
message      The content of the post.
link         The URL of the link included in the post.
caption      The title of the link.
description  The content of the link.
from         The author of this post, in Facebook Object ID.
type         Could be one of “status”, “photo”, “video”, “link”, or “checkin”.
group        The group ID this post is posted in.

Table 1: Features of a post on the feed.
2 http://www.chewie.co.uk/facebook/facebook-news-feedpreferences-no-longer-work/

We used MMSEG [4], a well-known Chinese tokenizer, to
segment the post contents into words. Since word
segmentation is not the main focus of this work, we
directly adopted a Ruby implementation 3 of the MMSEG
algorithm for convenience. The
dictionary file is from the Chewing input method. The
word segmentation system is deployed as a web service 4
for the browser extension to access.

User Feedback

Machine Learning

To insert the star-shaped mark, Faceloook must first classify
each post on the current feed. We first obtain the
context of each post using the Facebook Graph API, given the
object IDs recorded by the browser extension content
scripts. The context is then segmented and classified with
the current model. An empirical threshold of 0.7 is chosen
for the classifier. The word-segmented results are cached in
the database to speed up the training procedure.
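The classification step can be sketched in Python as follows. This is a minimal sketch: the segmenter and the scoring function are passed in as plain callables, and the names make_cached_segmenter and classify_feed are hypothetical. Only the 0.7 threshold and the idea of caching segmentation results come from the text.

```python
THRESHOLD = 0.7  # empirical threshold quoted in the text

def make_cached_segmenter(segment):
    """Wrap a segmenter so each object ID is segmented at most once."""
    cache = {}

    def cached(object_id, text):
        if object_id not in cache:
            cache[object_id] = segment(text)
        return cache[object_id]

    return cached

def classify_feed(posts, segment, score, threshold=THRESHOLD):
    """Map each object ID to True (interesting) or False.

    posts:   dict of object_id -> post text
    segment: callable (object_id, text) -> list of tokens
    score:   callable tokens -> probability that the user is interested
    """
    return {pid: score(segment(pid, text)) > threshold for pid, text in posts.items()}
```

Because the cache is keyed by object ID, classifying the same feed a second time does not call the underlying segmenter again.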

The way Faceloook trains the model is very similar to
training a spam mail classifier. All tokenized text features,
such as caption, description, and even links, are concatenated
and marked with the label “interesting” or “not-interesting”
according to the presence of a click event. Features that
identify a specific user or group, such as “from” and “group”,
are represented by their Facebook object IDs. Post types are
transformed into tokens that do not appear in everyday
language: the types are prefixed with the string “TYPE”, resulting
in post type tokens like “TYPEstatus” or “TYPElink”.
Faceloook uses a third-party classifier library, brain.js [5],
to train and test the Naive Bayes model. We modified the
classifier library to output a score, which is the probability that
the user is interested in the post. The training procedure
starts in the background when the user opens Google Chrome.
Only posts marked “untrained” are processed at
that time. When the user visits Facebook, newer posts with the
“untrained” mark enter the Web SQL Database,
awaiting click events. These posts are trained on after the
next restart of Google Chrome.
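The token construction and the scoring classifier described above can be illustrated with a small self-contained Python sketch. It does not reproduce brain.js or the authors' modification of it. The NaiveBayes class below is a generic two-class multinomial Naive Bayes with add-one smoothing, and post_tokens, the field order, and the decision to segment the link field are assumptions made for this example.

```python
import math
from collections import Counter

def post_tokens(post, segment):
    """Turn one post (a dict of Table 1 fields) into a flat token list."""
    tokens = []
    for field in ("message", "caption", "description", "link"):
        if post.get(field):
            tokens.extend(segment(post[field]))
    for field in ("from", "group"):           # object IDs are used as tokens directly
        if post.get(field):
            tokens.append(str(post[field]))
    if post.get("type"):
        tokens.append("TYPE" + post["type"])  # e.g. "TYPEstatus"
    return tokens

class NaiveBayes:
    """Two-class multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, tokens, clicked):
        """Add one post; clicked=True means the label is 'interesting'."""
        self.word_counts[clicked].update(tokens)
        self.doc_counts[clicked] += 1

    def score(self, tokens):
        """Probability that the post is interesting, given its tokens."""
        vocab = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_docs = sum(self.doc_counts.values())
        log_p = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            log_p[label] = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokens:
                # The extra +1 in the denominator reserves mass for unseen tokens.
                log_p[label] += math.log(
                    (self.word_counts[label][tok] + 1) / (total_words + vocab + 1)
                )
        top = max(log_p.values())             # subtract the max to avoid underflow
        exp = {label: math.exp(value - top) for label, value in log_p.items()}
        return exp[True] / (exp[True] + exp[False])
```

The score is a normalized posterior, so it can be compared directly against a threshold such as the 0.7 used by Faceloook.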

Faceloook inserts a star-shaped mark (figure 5) into each
post in the Facebook feed. The color of the star reflects the
classification result. The star also acts as a toggle switch,
enabling users to manually mark an individual post as
interesting or not-interesting. When the user clicks on the star,
the label of that post is altered, but only if the post
has not been trained yet.
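The toggle rule for the star can be written down in a few lines of Python. The PostLabel record and the toggle_star function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass

@dataclass
class PostLabel:
    interesting: bool      # current label shown by the star
    trained: bool = False  # True once the post has been used for training

def toggle_star(label):
    """Flip the label on a star click, unless the post was already trained.

    Returns True if the label changed, False if the click was ignored.
    """
    if label.trained:
        return False
    label.interesting = not label.interesting
    return True
```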

Query Interface

Figure 1 shows the query interface of Faceloook, as an
application of the user preference model. Clicking on a
post item leads the user to the original post. The feed posts
are categorized and sorted by the probability the user is
interested in each post. The probability is weighted by the
age of the post. More specifically, the age is determined
with the following formula:
Where the “updated_time” denotes the time the post is seen
or clicked by the user.
At first, the latest 50 posts are presented, sorted using the
method mentioned above. These posts are suggestions
made by Faceloook, which predicts that they are the posts
the user is most likely to be looking for, before
any search term is entered. After the user enters a search
term and submits the search form, queries are made to
the Facebook Graph API. The search results are segmented
and classified in the same way as when Faceloook puts stars on
the feed posts. Finally, the search results are categorized
by post type and sorted by the weighted probability.
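The grouping and sorting step might look like the Python sketch below. The age weighting is deliberately left as a caller-supplied function (age_weight), since no particular formula is assumed here. The function name rank_results and the dictionary keys are likewise assumptions for illustration.

```python
from collections import defaultdict

def rank_results(posts, score, age_weight, now):
    """Group posts by type and sort each group by weighted probability, highest first.

    posts:      iterable of dicts with "id", "type", "updated_time", "tokens"
    score:      callable tokens -> probability that the user is interested
    age_weight: callable age -> multiplicative weight (hypothetical, caller-supplied)
    now:        current time, in the same unit as "updated_time"
    """
    groups = defaultdict(list)
    for post in posts:
        weighted = score(post["tokens"]) * age_weight(now - post["updated_time"])
        groups[post["type"]].append((weighted, post["id"]))
    return {
        kind: [pid for _, pid in sorted(items, key=lambda pair: pair[0], reverse=True)]
        for kind, items in groups.items()
    }
```

Keeping the weight as a parameter makes it easy to compare different ways of discounting older posts without touching the ranking code.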
EVALUATION

Figure 5: The star in each post indicates the
classification result. A yellow star means this post is
classified as “interesting”, or the user has manually
marked the post as “interesting”.

3 http://rmmseg.rubyforge.org/
4 A Ruby on Rails Application is created specifically for the word
segmentation. The application is deployed on Heroku
(http://www.heroku.com) and has a public domain name.

We created several agents to emulate user clicking
behavior. After the agents label all the posts cached in the
Web SQL database, 5-fold cross validation is used to
score the user preference model. The posts
in the Web SQL database are real-world Facebook posts
collected over one month from 2 users, amounting to 6200
posts of test data. We split the test data into 2 groups.
The first contains 1200 posts, roughly the number of
posts a user may read in a week. The other group holds the
remaining 5000 posts, emulating a longer period of data
collection.
The agents consist of simple rules that determine whether an agent is
interested in a post. The rules involve looking for specific
keywords in the message attribute, checking whether the post is
from a specific group or user, or whether the content belongs to a
specific type. The agent we used for the cross validation
deterministically labels 85% of the posts as not-interesting.
The percentage is empirically derived from the actual click
data collected from the 2 users; that is, they clicked 15% of
the posts they had seen on their news feed.
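A rule-based agent and a 5-fold split of this kind can be sketched in Python as follows. The rule set passed to make_agent and the interleaved fold assignment in k_fold are illustrative assumptions. The rules and the fold assignment actually used in the evaluation are not specified beyond the description above.

```python
def make_agent(keywords=(), authors=(), groups=(), types=()):
    """Build a rule-based labeler: True means the agent 'clicks' the post."""
    def agent(post):
        message = post.get("message", "")
        return (
            any(word in message for word in keywords)
            or post.get("from") in authors
            or post.get("group") in groups
            or post.get("type") in types
        )
    return agent

def k_fold(items, k=5):
    """Yield (train, test) splits; fold i tests on every k-th item starting at index i."""
    for i in range(k):
        test = items[i::k]
        train = [item for j, item in enumerate(items) if j % k != i]
        yield train, test
```

Each item lands in exactly one test fold, so averaging the per-fold scores uses every labeled post once for testing.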

classifier output of a post must be larger than the
threshold in order to be marked as interesting.
CONCLUSIONS AND FUTURE WORK

We implemented a system that builds a user preference
model from Facebook post content and
user click data. However, the performance of the trained
classifier is far from useful, leaving considerable room for
improvement. Some directions for improvement are listed
below:

Figure 6: The PR curve of 1200-post group (left) and
the 5000-post group (right).

Figure 6 shows the PR curves of the two groups of test
data. The precision and recall are far from desirable in
both groups. As the threshold varies, the average
precision is approximately 0.3 and the recall rarely exceeds
0.5. The recall of the 5000-post group is particularly
poor. The longer data collection period does not seem to
improve the performance at all.
To compare the classifier performance with the “always-guess-0”
baseline classifier, figure 7 illustrates the
accuracy-to-threshold graph of both groups. With 85%
of the test data marked as not-interesting in the ground truth,
the baseline classifier accuracy is 0.85. It turns out
that the overall performance of the Naive Bayes classifier
does not improve much on the baseline. The optimal
threshold giving the best accuracy is above 0.9 in both
cases, where the classifier is no different from the “always-guess-0”
classifier.
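To make the comparison with the baseline concrete, the Python sketch below computes precision, recall, and accuracy at a given threshold, along with the accuracy of always predicting not-interesting. The function names and the toy data in the usage note are assumptions for illustration and are not taken from the evaluation.

```python
def evaluate(scores, labels, threshold):
    """Return (precision, recall, accuracy) when score > threshold means 'interesting'."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted = s > threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return precision, recall, accuracy

def baseline_accuracy(labels):
    """Accuracy of the 'always-guess-0' classifier, which never predicts 'interesting'."""
    return sum(1 for y in labels if not y) / len(labels)
```

With a label set in which 15% of the posts are interesting, baseline_accuracy returns 0.85. Once the threshold is high enough that no post exceeds it, evaluate reports the same accuracy with zero recall, which is the situation the accuracy-to-threshold graph exposes.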

● When combined with other classifiers such as SVM [6]
and a language model, the classifier may perform
better than with the Naive
Bayes algorithm alone.
● Filter out common words in the Chinese language before
training the Naive Bayes classifier. This is a common way
to rule out irrelevant words that have little impact on
whether the post is interesting or not.
● Speed also needs to be improved. The
front end refreshes the post list only after all search data has been
received and all probabilities have been calculated, and the
data transfer depends on the network speed.
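The stop-word filtering suggested in the list above could be as simple as the following Python sketch. The handful of words in STOP_WORDS is purely illustrative. A real list would be much longer and would be chosen for the corpus at hand.

```python
# Illustrative stop words only; not a list used in this work.
STOP_WORDS = frozenset({"的", "了", "是", "在", "和"})

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set, keeping the original order."""
    return [tok for tok in tokens if tok not in stop_words]
```

The filter would run on the segmented tokens before they are passed to the classifier for training or scoring.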

REFERENCES

1. http://smartblogs.com/social-media/2010/10/14/how-tomake-your-facebook-content-top-news/
2. Webb, G. I., Pazzani, M. J., & Billsus, D. (2001).
Machine Learning for User Modeling. User Modeling
and User-Adapted Interaction 2001, (1978), 19-29.
3. S.Michalski, R., & Wnek, J. (1997). Learning and
revising user profiles: The identification of interesting
web sites. Machine Learning - Special issue on
multistrategy learning, 331, 313-331.
4. http://technology.chtsai.org/mmseg/
5. https://github.com/harthur/brain
6. P.Yeh et al. 結合 SVM 與 Naïve Bayes 演算法防堵垃圾郵件的研究
[A study on blocking spam mail by combining the SVM and Naïve Bayes algorithms]. 2007

Figure 7: The accuracy-to-threshold graph of 1200-post
group (top) and the 5000-post group (bottom). The

WM & IR Final Project Report

ShrimpNews
Motivation

System Flow
Approach
Data Source and Labeling

Keywords: Apple Daily, comment, label, feature term, chi-square feature selection, feature vector, model.
Comparison of Machine Learning Algorithms

Keywords: discriminative model, generative model, retrieval-based model, nearest neighbor, Euclidean distance, weighting, grid search (cost, gamma), kernel function, cross validation.
Classifier   TP    FP    FN    TN    Precision  Recall  F1
LIBSVM       124   2032  40    4546  0.057      0.756   0.106
Naive Bayes  36    15    128   4546  0.71       0.21    0.33
1NN          18    49    146   4512  0.269      0.110   0.156
3NN          2     0     162   4561  1          0.012   0.024

Classifier   F1-score
LIBSVM       0.677
Naive Bayes  0.298
1NN          0.358
3NN          0.154

Demo Site
http://shrimpnews.herokuapp.com/
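The precision, recall, and F1 figures reported for the classifiers follow from the TP, FP, and FN counts. The short Python function below, whose name prf is chosen for this sketch, shows the standard definitions, so any row of counts can be rechecked.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, prf(124, 2032, 40) gives a precision of about 0.058, a recall of about 0.756, and an F1 of about 0.107. The reported 0.057, 0.756, and 0.106 for those counts agree with this if the values were truncated rather than rounded.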

Revisions
Future Work
Division of Work
Reference
[1] https://github.com/timdream/wordfreq (N-gram, stopword)