bioinformatics

Published on May 2017 | Categories: Documents | Downloads: 35 | Comments: 0 | Views: 419
Browsing Genomes
with Ensembl

www.ensembl.org
www.ensemblgenomes.org

Coursebook v74

URL

Place & date
TABLE OF CONTENTS

Introduction to Ensembl
Exploring the Ensembl genome browser
  Demo: Ensembl species
  Exercises: Ensembl species
  Demo: The Region in detail view
  Exercises: The Region in Detail view
Genes and transcripts
  Demo: The gene tab
  Demo: The transcript tab
  Exercises: Genes and transcripts
BioMart
  Demo: BioMart
  Exercises: BioMart
Variation
  Demo: Exploring variants in Ensembl
  Exercises: Exploring variants in Ensembl
  Demo: The Variant Effect Predictor (VEP)
  Exercise: The Variant Effect Predictor (VEP)
Comparative genomics
  Demo: Gene trees and homologues
  Exercises: Gene trees and homologues
  Demo: Whole genome alignments
  Exercises: Whole genome alignments
Regulation
  Demo: Raw ChIPSeq data
  Demo: Regulatory features and segmentation
  Exercises: Regulation
Advanced Access
  Demo: Upload small files
  Demo: Attach URLs of large files
  Demo: REST API
Advanced exercise
Answers - Exploring the Ensembl genome browser
  Ensembl species
  Region in detail
Answers - Genes and Transcripts
Answers - BioMart
Answers - Variation
  Finding variants in Ensembl
  VEP
Answers - Comparative Genomics
  Gene trees and homologues
  Whole genome alignments
Answers - Regulation
Answers - Advanced exercise
Quick Guide to Databases and Projects

Introduction to Ensembl
Getting started with Ensembl
www.ensembl.org

Ensembl is a joint project between the EBI (European Bioinformatics
Institute) and the Wellcome Trust Sanger Institute that annotates
chordate genomes (i.e. vertebrates and closely related invertebrates
with a notochord such as sea squirt). Gene sets from model
organisms such as yeast and worm are also imported for comparative
analysis by the Ensembl 'compara' team. Most annotation is updated
every two months, leading to increasing Ensembl versions (such as
version 74); however, the gene sets are determined less frequently. A
sister browser at www.ensemblgenomes.org is set up to access
non-chordates, namely bacteria, plants, fungi, metazoa, and protists.

Ensembl provides genes and other annotation such as regulatory
regions, conserved base pairs across species, and sequence
variations. The Ensembl gene set is based on protein and mRNA
evidence in UniProtKB and NCBI RefSeq databases, along with
manual annotation from the VEGA/Havana group. All the data are
freely available and can be accessed via the web browser at
www.ensembl.org. Perl programmers can directly access Ensembl
databases through Application Programming Interfaces (Perl
APIs). Gene sequences can be downloaded from the Ensembl
browser itself, or through the use of the BioMart web interface,
which can extract information from the Ensembl databases without
the need for programming knowledge by the user.


Synopsis - What can I do with Ensembl?

• View genes with other annotation along the chromosome.
• View alternative transcripts (i.e. splice variants) for a given
gene.
• Explore homologues and phylogenetic trees across more than
60 species for any gene.
• Compare whole genome alignments and conserved regions
across species.
• View microarray sequences that match to Ensembl genes.
• View ESTs, clones, mRNA and proteins for any chromosomal
region.
• Examine single nucleotide polymorphisms (SNPs) for a gene or
chromosomal region.
• View SNPs across strains (rat, mouse), populations (human), or
breeds (dog).
• View positions and sequence of mRNAs and proteins that align
with Ensembl genes.
• Upload your own data.
• Use BLAST or BLAT against any Ensembl genome.
• Export sequence or create a table of gene information with
BioMart.
• Determine how your variants affect genes and transcripts using
the Variant Effect Predictor.
• Share Ensembl views with your colleagues and collaborators.

Need more help?

Check Ensembl documentation
Watch video tutorials on YouTube
View the FAQs
Try some exercises
Read some publications
Go to our online course

Stay in touch!

• Email the team with comments or questions at
helpdesk@ensembl.org
• Follow the Ensembl blog
• Sign up to a mailing list

Further reading

Flicek, P. et al
Ensembl 2013
Nucleic Acids Res. Advanced Access (Database Issue)
http://www.ncbi.nlm.nih.gov/pubmed/23203987

Ensembl Methods Series
http://www.biomedcentral.com/series/ENSEMBL2010

Xosé M. Fernández-Suárez and Michael K. Schuster
Using the Ensembl Genome Server to Browse Genomic Sequence Data.
UNIT 1.15 in Current Protocols in Bioinformatics, Jun 2010.

Giulietta M Spudich and Xosé M Fernández-Suárez
Touring Ensembl: A practical guide to genome browsing
BMC Genomics 2010, 11:295 (11 May 2010)

Exploring the Ensembl genome browser

Demo: Ensembl species

The front page of Ensembl is found at ensembl.org. It contains lots of
information and links to help you navigate Ensembl:

[Screenshot: Ensembl front page. Callouts: Blue bar remains visible on every Ensembl page; Search; Link back to homepage; Ensembl tools; News; Drop-down list of species; How-tos for commonly used Ensembl features]

Click on View full list of all Ensembl species.

Click on the common name of your species of interest to go to the
species homepage. We'll click on Human.


To find out more about the genome assembly and genebuild, click on
More information and statistics.

[Screenshots: species homepage and statistics page. Callouts: Search; News; Information and statistics; Links to example features in Ensembl; Information; Tables of statistics]

Let's take a look at the Ensembl Genomes homepage at
ensemblgenomes.org.

[Screenshot: Ensembl Genomes homepage. Callouts: Links to the taxa-specific sites; Link back to Ensembl; News]

Click on the different taxa to see their homepages. Each one is
colour-coded.

[Screenshots: Protists, Fungi, Metazoa, Plants and Bacteria homepages]
You can navigate most of the taxa in the same way as you would with
Ensembl, but Ensembl Bacteria has a large number of genomes, so
needs slightly different methods. Let's look at it in more detail.

[Screenshot: Ensembl Bacteria homepage. Callouts: Search for a gene; Search for a species; Information on Ensembl Bacteria]

There's no full species list for bacteria as it would be hard to navigate
with the number of species. To find a species, start to type the species
name into the species search box. A drop down list will appear with
possible species.

For example, to find a substrain of Clostridium difficile type in
Clostridium d.

The drop down contains various strains of Clostridium difficile. Let's
choose Clostridium difficile 630. This will take us to another species
homepage, where we can explore various features.


Exercises: Ensembl species

Exercise 1 - Panda

(a) Go to the species homepage for Panda. What is the name of the
genome assembly for Panda?

(b) Click on More information and statistics. How long is the Panda
genome (in bp)? How many genes have been annotated?


Exercise 2 - Zebrafish

(a) What's new in release 74 for zebrafish?

(b) What previous assembly is available for zebrafish?


Exercise 3 - Mosquitos

(a) Go to Ensembl Metazoa. How many species of the genus Anopheles
are there?

(b) Who published the genome sequence for Anopheles gambiae?


Exercise 4 - Bacteria

Go to Ensembl Bacteria and find the species Belliella baltica. How
many coding and non-coding genes does it have?


Demo: The Region in detail view

Start at the Ensembl front page, ensembl.org. You can search for a
region by typing it into a search box, but you have to specify the
species.

Type (or copy and paste) human 4:123792818-123867895 into
either search box.

[Screenshots: the two search boxes]

Press Enter or click Go to jump directly to the Region in detail Page.
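
Region strings typed into the search box follow a chromosome:start-end pattern. As a rough illustration, the Python sketch below splits such a string into its parts and reports the length of the region. The helper name parse_region and the coordinates in the example are made up for this sketch and are not part of Ensembl.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Coordinates are 1-based and inclusive, so both ends count.
        return self.end - self.start + 1


# Hypothetical pattern: "chromosome:start-end", tolerating thousands separators.
_REGION_RE = re.compile(r"^\s*([\w.]+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text: str) -> Region:
    """Split a 'chromosome:start-end' string into its three parts."""
    match = _REGION_RE.match(text)
    if match is None:
        raise ValueError(f"not a chromosome:start-end region: {text!r}")
    chromosome, start, end = match.groups()
    start_bp = int(start.replace(",", ""))
    end_bp = int(end.replace(",", ""))
    if start_bp > end_bp:
        raise ValueError("start must not be greater than end")
    return Region(chromosome, start_bp, end_bp)


if __name__ == "__main__":
    region = parse_region("4:1,000,000-1,075,000")  # made-up coordinates
    print(region.chromosome, region.start, region.end, region.length)
```

A check like this can be handy before pasting a long list of regions into the browser one by one, since a malformed string is rejected early.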

Click on the button to view page-specific help.

The help pages provide links to Frequently Asked Questions, a
Glossary, Video Tutorials, and a form to Contact HelpDesk.

There is a help video on this page at http://youtu.be/tTKEvgP0q94.

[Screenshot: Region in detail page. Callouts: Location views; Chromosome; Tool buttons; Scrollable 1Mb view; Region of interest in detail; Page-specific help]
The Region in detail page is made up of three images. Let's look at
each one in detail.

The first image shows the chromosome:

[Screenshot: chromosome image. Callouts: Chromosome bands; Our position; Haplotypes and patches]

You can jump to a different region by dragging out a box in this
image. Drag out a box on the chromosome and a pop-up menu will
appear.

[Screenshot: pop-up menu. Callout: Box dragged out]

If you wanted to move to the region, you could click on Jump to
region (###bp). For now, we'll close the pop-up by clicking on the X
on the corner.

The second image shows a 1Mb region around our selected region.
This view allows you to scroll back and forth along the chromosome.

[Screenshot: 1Mb view. Callouts: Scrolling buttons; Region of interest; Blocks represent genes. Names are shown bottom left.]
At the moment the gene track is set to a fixed height. Click on the
Automatic track height button to expand the image to include all
possible data in the track.

Scroll along the chromosome by clicking and dragging within the
image. As you do this you'll see the image below grey out and two
blue buttons appear. Clicking on Update this image would jump the
lower image to the region central to the scrollable image. We want to
go back to where we started, so we'll click on Reset scrollable image.

You can also drag out and jump to a region. Either hold down shift
and drag in the image, or click on the Drag/Select button to
change the action of your mouse click, and drag out a box.

Click on the X to close the pop-up menu.

The third image is a detailed, configurable view of the region.



[Screenshot: detailed view. Callouts: Blue bar is the genome; Forward-stranded transcripts; Reverse-stranded transcripts; Click and drag the position of tracks; Track names; Legends]

We can edit what we see on this page by clicking on the blue
Configure this page menu at the left.

This will open a menu that allows you to change the image.

You can put some tracks on in different styles; more details are in this
FAQ: http://www.ensembl.org/Help/Faq?id=335.



[Screenshot: configuration menu. Callouts: Search for tracks; Track categories; Track information; Track names; Configuration tabs; Turn tracks on/off and change style]

Let's add some tracks to this image. Add:
• Human proteins - Labels
• dbSNP variants - Normal
• 1000 Genomes - AMR - Collapsed
Now click on the tick in the top left hand to save and close the menu.
Alternatively, click anywhere outside of the menu. We can now see
the tracks in the image.

We can also change the way the tracks appear by hovering over the
track name then the cog wheel to open a menu. We can move tracks
around by clicking and dragging on the bar to the left of the track
name.

Now that you've got the view how you want it, you might like to show
something you've found to a colleague or collaborator. Click on the
Share this page button to generate a link. Email the link to someone
else, so that they can see the same view as you, including all the
tracks you've added. These links contain the Ensembl release
number, so if a new release or even assembly comes out, your link
will just take you to the archive site for the release it was made on.

To return this to the default view, go to Configure this page and select
Reset configuration at the bottom of the menu.


Exercises: The Region in Detail view

Exercise 5 - Exploring a genomic region in human

(a) Go to the region from 32,448,000 to 33,198,000 bp on human
chromosome 13. On which cytogenetic band is this region located?
How many contigs make up this portion of the assembly (contigs are
contiguous stretches of DNA sequence that have been assembled
solely based on direct sequencing information)?

(b) Zoom in on the BRCA2 gene.

(c) Turn on the Tilepath track in this view. What is this track? Are
there any Tilepath clones that contain the complete BRCA2 gene?

(d) Create a Share link for this display. Email it to yourself and open
the link.

(e) Export the genomic sequence of the region you are looking at in
FASTA format.

(f) Turn off all tracks you added to the Region in detail page.

Exercise 6 - Exploring patches and haplotypes in human

(a) Go to the region 6:112294691-112624977 in human. What is the
green highlighted region? (Tip: if you see a word or phrase you don't
know in Ensembl, search for it to see help pages.)

(b) Can you see the patches in the chromosome view? Drag out a box
to jump to a region containing the leftmost patch on this
chromosome, named HG27_patch (note: you must drag out a region
smaller than 1Mb). What are the coordinates of the patch?

(c) Can you compare this patch with the reference? What has
changed between this patch and the sequence it replaced?

(d) Go back to the Region in detail and scroll to the right in the 1Mb
view until you reach a red highlighted region. What is this?

Genes and transcripts

Demo: The gene tab

If you click on any one of the transcripts in the Region in detail image,
a pop-up menu will appear, allowing you to jump directly to that gene
or transcript.

[Screenshot: transcript pop-up menu. Callout: Links]

Another way to go to a gene of interest is to search directly for it.

We're going to look at the human ESPN gene. This gene encodes a
multifunctional actin-bundling protein with a major role in mediating
sensory transduction in various mechanosensory and chemosensory
cells. Mutations in this gene are associated with deafness
(http://tinyurl.com/espn-ncbi-gene).

From ensembl.org, type ESPN into the search bar and click the Go
button. You will get a list of hits with the human gene at the top.

Where you search for something without specifying the species, or
where the ID is not restricted to a single species, the most popular
species will appear first; in this case, these are human, mouse and
zebrafish. You can restrict your query to species or features of
interest using the options on the left.


Click on the gene name or Ensembl ID. The Gene tab should open:

[Screenshot: Gene tab. Callouts: ESPN-001 transcript. Click for info; Blue bar is the genome; Option: Open table of transcripts; Forward-stranded transcripts; Reverse-stranded transcripts; Gene views; Links; Gene tab]

Let's walk through some of the links in the left hand navigation
column. How can we view the genomic sequence? Click Sequence at
the left of the page.






The sequence is shown in FASTA format. Take a look at the FASTA
header:
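
The callouts for this header name five pieces of information: the genome assembly, the chromosome, the base pair start, the base pair end and the strand (1 is forward, -1 is reverse). The Python sketch below shows one way such a header could be taken apart. It assumes a colon-separated layout of the form coord_system:assembly:seq_region:start:end:strand; the function name and the coordinates in the example are invented for illustration, so compare the layout against the header you actually see on the page.

```python
from typing import NamedTuple


class SequenceLocation(NamedTuple):
    coord_system: str
    assembly: str
    seq_region: str
    start: int
    end: int
    strand: int


def parse_location_header(header: str) -> SequenceLocation:
    """Parse a header such as '>chromosome:GRCh37:1:100:200:1'.

    Assumed layout (not taken from the booklet):
    coord_system:assembly:seq_region:start:end:strand
    """
    parts = header.lstrip(">").split()
    if not parts:
        raise ValueError("empty FASTA header")
    # Use the last whitespace-separated token, in case an identifier
    # precedes the location on the header line.
    fields = parts[-1].split(":")
    if len(fields) != 6:
        raise ValueError(f"expected 6 colon-separated fields: {header!r}")
    coord_system, assembly, seq_region, start, end, strand = fields
    strand_value = int(strand)
    if strand_value not in (1, -1):
        raise ValueError("strand must be 1 (forward) or -1 (reverse)")
    return SequenceLocation(
        coord_system, assembly, seq_region, int(start), int(end), strand_value
    )


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    location = parse_location_header(">chromosome:GRCh37:1:100:200:-1")
    print(location)
```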



[Screenshot: Sequence view. Callouts: Click Sequence; Most recent human genome assembly GRCh37 = hg19; Upstream sequence; ESPN exon; Exon of an overlapping gene]









Exons are highlighted within the genomic sequence. Variations can
be added with the Configure this page link found at the left. Click on it
now.

Once you have selected changes (in this example, Show variations
and Line numbering) click at the top right.

Let's look at where our gene is expressed. Click on Expression in the
left-hand menu.

[Screenshot: sequence configuration. Callouts: Links to the variation tab; Show variants; Turn on line numbers]

[Screenshot: FASTA header. Callouts: name of genome assembly; chromosome; base pair start; base pair end; forward strand (-1 is reverse)]

Hover over the column titles for a pop-up definition.

Can our gene be found in other databases? Go up the left-hand menu
to External references:

This contains links to the gene in other projects, such as EntrezGene.

To find out more about the individual transcripts of this gene, click
on Transcript comparison in the left-hand menu.

You must now choose the transcripts you'd like to see. Click on the
blue Select transcripts button.




[Screenshot: Select transcripts menu. Callouts: Click on the + to add a transcript; Select all transcripts of a particular biotype]

Let's select all the protein-coding transcripts, then close the menu.

[Screenshot: Transcript comparison. Callouts: Legend; Transcript sequences; Gene sequence]

Demo: The transcript tab

Let's now explore one splice isoform. Click on Show transcript table
at the top.

Click on the ID for the largest one, ESPN-001 (ENST00000377828).



You are now in the Transcript tab for ESPN-001. The left hand
navigation column provides several options for the transcript
ESPN-001. Click on the Exons link.

[Screenshot: Exons view. Colour key: Purple: UTR; Blue: introns; Grey: coding sequence; Green: flanking sequence]

You may want to change the display (for example, to show more
flanking sequence, or to show full introns). In order to do so click on
Configure this page and change the display options accordingly.



If you would like to export the sequence, including the colours, click
Download view as RTF. A Rich Text Format document will be
generated that can be opened in a word processor such as MS Word.

Now click on the cDNA link to see the spliced transcript sequence.

[Screenshot: Transcript tab. Callout: Click cDNA]


UnTranslated Regions (UTRs) are highlighted in dark yellow, codons
are highlighted in light yellow, and exon sequence is shown in black
or blue letters to show exon divides. Sequence variants are
represented by highlighted nucleotides and clickable IUPAC codes
are above the sequence.
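
IUPAC codes are single letters that stand for a set of possible bases, which is how a variant position can be written as one character above the sequence. The Python sketch below illustrates the standard nucleotide ambiguity codes for single-base alleles. The function name iupac_code is made up for this example, and insertions or deletions are deliberately not handled.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
_CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: set of bases -> single-letter code.
_BASES_TO_CODE = {frozenset(bases): code for code, bases in _CODE_TO_BASES.items()}


def iupac_code(alleles):
    """Return the IUPAC code covering all single-base alleles given.

    `alleles` can be any iterable of one-letter bases, e.g. ["C", "T"].
    """
    key = frozenset(allele.upper() for allele in alleles)
    try:
        return _BASES_TO_CODE[key]
    except KeyError:
        raise ValueError(f"no IUPAC code for alleles {sorted(key)}") from None


if __name__ == "__main__":
    print(iupac_code(["C", "T"]))  # a C/T variant
    print(iupac_code(["A", "G"]))  # an A/G variant
```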

Next, follow the General identifiers link at the left.

This page shows information from other databases such as RefSeq,
UniProtKB, CCDS and others, that match to the Ensembl transcript
and protein.


Click on Ontology table to see GO terms from the Gene Ontology
consortium. www.geneontology.org

Click on the icon to see a guide to the three-letter Evidence codes.

Now click on Protein summary to view domains from Pfam, PROSITE,
Superfamily, InterPro, and more.

[Screenshot: Protein summary. Callouts: Protein domains; Ensembl ESPN protein]

Clicking on Domains & features shows a table of this information.



Exercises: Genes and transcripts

Exercise 7 - Exploring the human MYH9 gene

(a) Find the human MYH9 (myosin, heavy chain 9, non-muscle) gene,
and go to the Gene tab.

• On which chromosome and which strand of the genome is this
gene located?
• How many transcripts (splice variants) are there?
• How many of these transcripts are protein coding?
• What is the longest transcript, and how long is the protein it
encodes?
• Which transcript has a CCDS record associated with it?
Why is the CCDS important - what does it tell us?

(b) Click on Phenotype at the left side of the page. Are there any
diseases associated with this gene, according to OMIM (Online
Mendelian Inheritance in Man)?

(c) In the transcript table, click on the transcript ID for MYH9-001,
and go to the Transcript tab.

• How many exons does it have?
• Are any of the exons completely or partially untranslated?
• Is there an associated sequence in UniProtKB/Swiss-Prot?
Have a look at the General identifiers for this transcript.
• What are some functions of MYH9-001 according to the Gene
Ontology consortium? Have a look at the Ontology table for this
transcript.
(d) Are there microarray (oligo) probes that can be used to monitor
ENST00000216181 expression?

Exercise 8 - Finding a gene associated with a phenotype

Phenylketonuria is a genetic disorder caused by an inability to
metabolise phenylalanine in any body tissue. This results in an
accumulation of phenylalanine causing seizures and mental
retardation.

(a) Search for phenylketonuria from the Ensembl homepage. What
gene is associated with this disorder?

(b) What tissues is this gene expressed in? Is this surprising, given
the gene's role in disease? What is meant by "Intron-spanning reads"
and "RNASeq alignments"?

(c) How many protein coding transcripts does this gene have? View
all of these in the transcript comparison view.

(d) What is the MIM disease identifier for this gene?


Exercise 9 - Exploring a plant gene (Vitis vinifera, grape)

Start in http://plants.ensembl.org/index.html and select the Vitis
vinifera genome.

(a) What GO: biological process terms are associated with the MADS4
gene?

(b) Go to the transcript tab for the only transcript,
Vv01s0010g03900.t01. How many exons does it have? Which one is
the longest? How much of that is coding?

(c) What domains can be found in the protein product of this
transcript? How many different domain prediction methods agree
with each of these domains?

BioMart

Demo: BioMart

Follow these instructions to guide you through BioMart to answer the
following query:

You have three questions about a set of human genes:
ESPN, MYH9, USH1C, CISD2, THRB, DFNB31
(these are HGNC gene symbols. More details on the HUGO
Gene Nomenclature Committee can be found on
http://www.genenames.org)

1) What are the EntrezGene IDs for these genes?

2) Are there associated functions from the GO (gene
ontology) project that might help describe their function?

3) What are their cDNA sequences?


Step 1: Click on BioMart in the top header of a www.ensembl.org
page to go to: www.ensembl.org/biomart/martview

NOTE: These answers were determined using BioMart Ensembl 74.

Step 2: Choose Ensembl Genes 74 as the primary database.

Step 3: Choose Homo sapiens genes as the dataset.







Step 4: Click Filters at the left. Expand the GENE panel.

Step 5: In ID List Limit, paste in your gene symbols. Change the
heading to read HGNC symbol(s) [e.g. ZFY].

Step 6: Click Count to see BioMart is reading 6 genes out of 64,138
possible H. sapiens genes. Since we entered 6 gene symbols, this
confirms that our filters have worked correctly.














Step 7: Click on Attributes to select output options (i.e. GO terms).

Step 8: Expand the EXTERNAL panel.

Step 9: Scroll down to select EntrezGene ID (to answer question 1).

Step 10: Also select HGNC symbol to see the input gene symbols we
started with.

Step 11: Scroll back up to select GO term fields (to answer
question 2).

Step 12: Click Results.
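
The dataset, filter and attribute choices made in the steps so far can also be written down as a BioMart XML query, which is a convenient way to record exactly what was selected. The Python sketch below only builds the query text and does not send it anywhere. The dataset name hsapiens_gene_ensembl and the filter and attribute names (hgnc_symbol, entrezgene, go_id, name_1006) are assumptions about BioMart's internal names and may differ between releases, so compare them with the XML that the BioMart interface itself offers for your query.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart-style XML query as a string.

    dataset    -- internal dataset name (assumed, e.g. 'hsapiens_gene_ensembl')
    filters    -- dict mapping a filter name to a list of values
    attributes -- list of attribute (output column) names
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset_node = ET.SubElement(
        query, "Dataset", {"name": dataset, "interface": "default"}
    )
    for name, values in filters.items():
        # Several values for one filter are given as a comma-separated list.
        ET.SubElement(dataset_node, "Filter",
                      {"name": name, "value": ",".join(values)})
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    genes = ["ESPN", "MYH9", "USH1C", "CISD2", "THRB", "DFNB31"]
    xml_text = build_biomart_query(
        "hsapiens_gene_ensembl",                    # assumed dataset name
        {"hgnc_symbol": genes},                     # assumed filter name
        ["hgnc_symbol", "entrezgene", "go_id", "name_1006"],  # assumed names
    )
    print(xml_text)
```

Keeping the query as text makes it easy to rerun the same selection later or to share it with a colleague, in the same spirit as the Share link in the browser.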
Why are there multiple rows for one gene ID? For example, look at the
first few rows.







Step 13: Click Attributes again.

Step 14: Select Sequences at the top, then expand SEQUENCES and
choose the option cDNA sequences (to answer question 3).

Step 15: Expand Header information to select the Associated Gene
name (this is the official gene name, for human it is HGNC which was
our original input).

Step 16: Click Results to see the cDNA sequences in FASTA format.

Step 17: Change View 10 rows to View All rows so that you see the
full table.

Note: Pop-up blocking must be switched off in your browser.

Note: you can use the Go button to export a file.

What did you learn about the human genes in this exercise?
Could you learn these things from the Ensembl browser? Would it take
longer?


For more details on BioMart, have a look at these publications:

Smedley, D. et al
BioMart - biological queries made easy
BMC Genomics 2009 Jan 14;10:22

Kinsella, R.J. et al
Ensembl BioMarts: a hub for data retrieval across taxonomic
space.
Database (Oxford) 2011:bar030


Exercises: BioMart

Exercise 10 - Finding genes by protein domain

Find mouse proteins with transmembrane domains located on
chromosome 9.

Exercise 11 - Convert IDs

BioMart is a very handy tool when you want to convert IDs from
different databases. The following is a list of 29 IDs of human
proteins from the NCBI RefSeq database
(http://www.ncbi.nlm.nih.gov/projects/RefSeq/):

NP_uu1218
NP_2uS12S
NP_2uS124
NP_2uS126
NP_uu1uu72SS
NP_1Su6S6
NP_1Su6SS
NP_uu1214
NP_1Su6S7
NP_1Su6S4
NP_1Su649
NP_uu1216
NP_116787
NP_uu1217
NP_12746S
NP_uu122u
NP_uu4SS8
NP_uu4SS7
NP_116786
NP_uS6246
NP_1167S6
NP_1167S9
NP_uu1221
NP_2uSS19
NP_uu1u7SS94
NP_uu1219
NP_uu1u7SS9S
NP_2uSS2u
NP_2uSS22

Generate a list that shows to which Ensembl Gene IDs and to which
HGNC symbols these RefSeq IDs correspond. Do these 29 proteins
correspond to 29 genes?

Hint: For this exercise, it's easier to copy and paste the IDs from the
online exercise booklet (copy one column, then the other). See the
front cover for the URL.


Exercise 12 - Export homologues

For a list of Ciona savignyi Ensembl genes, export the human
orthologues.

ENSCSAvuuuuuuuuuuu2
ENSCSAvuuuuuuuuuuuS
ENSCSAvuuuuuuuuuuu6
ENSCSAvuuuuuuuuuuu7
ENSCSAvuuuuuuuuuuu9
ENSCSAvuuuuuuuuuu11


Exercise 13 - Export structural variants

You can use BioMart to query variants, not just genes. (Make sure you
use the right Datasets.)

(a) Export the study accession, source name, chromosome, sequence
region start and end (in bp) of human structural variations (SV) on
chromosome 1, starting at 130,408 and ending at 210,597.

(b) In a new BioMart query, find the alleles, phenotype descriptions,
and associated genes for the human SNPs rs1801500 and rs1801368.
Can you view this same information in the Ensembl browser?


Exercise 14 - Find genes associated with array probes

Forrest et al performed a microarray analysis of peripheral blood
mononuclear cell gene expression in benzene-exposed workers
(Environ Health Perspect. 2005 June; 113(6): 801-807). The
microarray used was the human Affymetrix U133A/B (also called
U133 plus 2) GeneChip. The top 25 up-regulated probe-sets were:

2u76Su_s_at
22184u_at
219228_at
2u4924_at
22761S_at
22S4S4_at
228962_at
214696_at
21u7S2_s_at
212S7u_at
22SS9u_s_at
22764S_at
2266S2_at
221641_s_at
2u2uSS_at
22674S_at
228S9S_s_at
22S12u_at
218S1S_at
2u2224_at
2uu614_at
212u14_x_at
22S461_at
2u98SS_x_at
21SS1S_x_at

(a) Retrieve for the genes corresponding to these probe-sets the
Ensembl Gene and Transcript IDs as well as their HGNC symbols and
descriptions.

(b) In order to analyse these genes for possible promoter/enhancer
elements, retrieve the 2000 bp upstream of the transcripts of these
genes.

(c) In order to be able to study these human genes in mouse, identify
their mouse orthologues. Also retrieve the genomic coordinates of
these orthologues.

Variation

Demo: Exploring variants in Ensembl

In any of the sequence views shown in the Gene and Transcript tabs,
you can view variants on the sequence. You can do this by clicking on
Configure this page from any of these views.

Let's take a look at the Gene sequence view for MCM6 in human.
Search for MCM6 and go to the Sequence view.

If you can't see variants marked on this view, click on Configure this
page and select Show variations: Yes and show links.

[Screenshot: Sequence view with variants. Callouts: Variants on sequences shown as IUPAC codes; Links to variants; Legend of variant consequence types]

Find out more about a variant by clicking on it.

You can add variants to all other sequence views in the same way.

You can go to the Variation tab by clicking on the variant ID. For now,
we'll explore more ways of finding variants.

To view all the sequence variations in table form, click the Variation
table link at the left of the Gene tab.

The table is divided into consequence types.

Click on Show to expand a detailed table for any of the consequence
types available.

Let's expand Missense variants.

[Screenshot: Variation table. Callouts: Variant IDs; Transcript affected; SIFT and PolyPhen scores]

The table contains lots of information about the variants. You can
click on the IDs here to go to the Variation tab too.

Let's look at Structural variation in the Gene Tab. You'll find it in the
left-hand menu.



[Screenshot: Structural variation. Callouts: Table of all SVs; Smaller SVs are shown individually; All larger SVs are condensed into a single bar]

You can click on the structural variants (SVs) in the image, or on their
IDs in the table to go to the SV tab.

You can also see the phenotypes associated with a gene. Click on
Phenotypes in the left hand menu.

[Screenshot: Phenotypes. Callouts: Click to see a list of variants; Phenotypes associated with variants in the gene; Phenotypes associated with the gene]

Let's have a look at variants in the Location tab. Click on the Location
tab in the top bar.

Configure this page and open Variation from the left-hand menu.

There are various options for turning on variants. You can turn on
variants by source, by frequency, presence of a phenotype or by
individual genome they were isolated from. Turn on the following
sequence variants in Expanded with name.
• 1000 genomes - All
• 1000 genomes - All - common
• All phenotype-associated variants
• ENSEMBL:Venter

Also turn on Larger and Smaller Structural variants (all sources) in
Expanded.



Click on a variant to find out more information. It may be easier to
see the individual variants if you zoom in.

Let's zoom in on the region 2:1S66u78Su-1S66u9811 by typing it
into the Location box.

Now that we are zoomed in, we can see the variant names. Click on
the variant rs4988235 to open a pop-up, then click on rs4988235
properties to open the Variation tab.

[Screenshot: Location tab with variants. Callouts: Variation legends; SVs; SNPs and indels]


[Screenshot: Variation tab. Callouts: Variation icons. These go to the same places as the links on the left; Variation views; Variant information]

The icons show you what information is available for this variant.
Click on Genes and regulation, or follow the link at the left.

This variant is found in three transcripts of the MCM6 gene. It has not
been associated with any regulatory features or motifs.

Let's look at population genetics. Either click on Explore this variant
in the left hand menu then click on the Population genetics icon, or
click on Population genetics in the left-hand menu.



These data are mostly from the 1000 Genomes and HapMap
projects in human.

There are big differences in allele frequencies between populations.
Let's have a look at the phenotypes associated with this variant to see
if they are known to be specific to certain human populations. Either
click on Explore this variant in the left-hand menu then click on the
Phenotype data icon, or click on Phenotype Data in the left-hand
menu.



This variant is associated with lactase persistence, which is known to
be common in European populations, and rare in Asian populations,
exactly as we saw in the allele frequencies in these populations.

Are there other variants in the genome that also cause lactase
persistence? Click on [View on Karyotype] to find out.

[Figure callouts: Table of more detailed data. Expand subpopulations. Pie charts of allele frequencies.]


Two variants are known to be associated with this phenotype. Both
are found in the MCM6 gene.

Click back to the Variation tab. Click on Phylogenetic context to see
the variant in other species.



[Figure callouts: Table of variants. Legend showing hit significance. Hits on the karyotype. SNP of interest. Aligned regions. Choose your alignment. Alignment between species.]
The variant is not marked in the other species. This means that the
variant arose in humans.


Exercises: Exploring variants in Ensembl

Exercise 15 – Human population genetics and phenotype data

The SNP rs1738074 in the 5' UTR of the human TAGAP gene has been
identified as a genetic risk factor for a few diseases.

(a) In which transcripts is this SNP found?

(b) What is the least frequent genotype for this SNP in the Yoruba
(YRI) population from the HapMap set?

(c) What is the ancestral allele? Is it conserved in the 37 eutherian
mammals?

(d) With which diseases is this SNP associated? Are there any known
risk (or associated) alleles?


Exercise 16 – Exploring a SNP in human

The missense variation rs1801133 in the human MTHFR gene has
been linked to elevated levels of homocysteine, an amino acid whose
plasma concentration seems to be associated with the risk of
cardiovascular diseases, neural tube defects, and loss of cognitive
function. This SNP is also referred to as 'A222V', 'Ala222Val' as well
as other HGVS names.

(a) Find the page with information for rs1801133.

(b) Is rs1801133 a Missense variation in all transcripts of the MTHFR
gene?

(c) Why are the alleles for this variation in Ensembl given as G/A and
not as C/T, as in dbSNP and literature?
(http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1801133)

(d) What is the major allele in rs1801133?

(e) In which paper is the association between rs1801133 and
homocysteine levels described?

(f) According to the data imported from dbSNP, the ancestral allele
for rs1801133 is G. Ancestral alleles in dbSNP are based on a
comparison between human and chimp. Does the sequence at this
same position in four other primates, i.e. gorilla, orangutan, macaque
and marmoset, confirm that the ancestral allele is G?

(g) Were both alleles of rs1801133 already present in Neanderthal?
To answer this question, have a look at the individual reads at its
genomic position in the Neanderthal Genome Browser
(http://neandertal.ensemblgenomes.org/).



Exercise 17 – Structural variation in human

In the paper 'The influence of CCL3L1 gene-containing segmental
duplications on HIV-1/AIDS susceptibility' (Gonzalez et al. Science.
2005 Mar 4; 307(5704):1434-40) it is shown that a higher copy
number of the CCL3L1 (Chemokine (C-C motif) ligand 3-like 1) gene is
associated with lower susceptibility to HIV infection.

(a) Find the human CCL3L1 gene.

(b) Have any CNVs been annotated for this gene? Note: In Ensembl,
CNVs are classified as structural variants.


Exercise 18 – Exploring a SNP in mouse

Madsen et al. in the paper 'Altered metabolic signature in pre-diabetic
NOD mice' (PLoS One. 2012; 7(4): e35445) have described several
regulatory and coding SNPs, some of them in genes residing within
the previously defined insulin dependent diabetes (IDD) regions. The
authors describe that one of the identified SNPs in the murine Xdh
gene (is29S22S48) would lead to an amino acid substitution and
could be damaging, as predicted by SIFT (http://sift.jcvi.org/).

(a) Where is the SNP located (chromosome and coordinates)?

(b) What is the HGVS recommendation nomenclature for this SNP?

(c) Why does Ensembl put the C allele first (C/T)?

(d) Are there differences between the genotypes reported in
NOD/LtJ and BALB/cByJ?


Demo: The Variant Effect Predictor (VEP)

We have analysed a sample from a patient with a genetic disorder.
The patient presents with facial and limb deformities, mental
retardation and gastrointestinal reflux. Our genotyping has identified
a mutation that may be responsible for the phenotype:
An A->G mutation on chromosome 5 at 37,017,205 on the + strand.

We will use the Ensembl VEP to determine:

• Has my variant already been annotated in Ensembl?
• What genes are affected by my variant?
• Does my variant result in a protein change?

Go to the front page of Ensembl and click on the VEP button.



This page contains information about the VEP, including links to
download the script version of the tool. Click on Launch the online
VEP tool!

This will open up a dialogue box. This allows us to input data on our
variant.



The data is in the format:
Chromosome Start End alleles (reference/mutation) strand

Delete the writing already in the Paste data box and type in:
5 37017205 37017205 A/G +
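
The input layout above is easy to generate from a script when there are many variants to submit. The following is a minimal Python sketch, not part of the VEP itself; the function name `vep_input_line` is made up for illustration, and it only formats text without checking the alleles against the genome.

```python
def vep_input_line(chrom, start, end, ref, alt, strand="+"):
    """Return one line in the layout: chromosome start end ref/alt strand.

    Hypothetical helper for illustration only; it formats text and does
    not validate the alleles against the reference sequence.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return f"{chrom} {start} {end} {ref}/{alt} {strand}"


# The A->G change on chromosome 5 described in this demo.
print(vep_input_line(5, 37017205, 37017205, "A", "G"))
```

The printed line can be pasted into the Paste data box, one variant per line.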

Scroll down to see some of the options we can also choose.


[Figure callouts: Put your data in here. Give your data a name. You can also upload a file. Choose which database to map your variant to. Choose to see scores for protein changes. Find out if variants already exist in our database. Choose to only see common or rare variants.]
Select Prediction and Score for SIFT predictions and PolyPhen
predictions. These are algorithms that predict how deleterious a
mutation will be on a protein.

When you've selected everything you need, scroll right to the bottom
and click Next.

Click HTML to view your results with clickable links.









Exercise: The Variant Effect Predictor (VEP)

Exercise 19 – VEP

Resequencing of the genomic region of the human CFTR (cystic
fibrosis transmembrane conductance regulator (ATP-binding
cassette sub-family C, member 7)) gene (ENSG00000001626) has
revealed the following variants (alleles defined in the forward
strand):
• G/A at 7:117,171,uS9
• T/C at 7:117,171,092
• T/C at 7:117,171,122

(a) Use the VEP tool in Ensembl and choose the options to see SIFT
and PolyPhen predictions. Do these variants result in a change in the
proteins encoded by any of the Ensembl genes? Which gene? Have
the variants already been found?

(b) Go to Region in detail for CFTR. Do you see the VEP track?

[Figure callouts: Our mutation affects two transcripts of one gene. Our mutation causes an amino acid change. Our mutation is already in the Ensembl database.]

Comparative genomics

Demo: Gene trees and homologues

Let's look at the homologues of human BRCA2. Search for the gene
and go to the Gene tab.

Click on Gene tree (image), which will display the current gene in the
context of a phylogenetic tree used to determine orthologues and
paralogues.




Funnels indicate collapsed nodes. We can expand them by clicking on
the node and selecting Expand this sub-tree from the pop-up menu.


[Figure callouts: Collapsed nodes. Gene of interest. Protein alignments. Legend. Expand this sub-tree.]

We can look at homologues in the Orthologues and Paralogues pages,
which can be accessed from the left-hand menu. The numbers of
orthologues or paralogues available are indicated in brackets
alongside the name. If there are none, then the name will be greyed
out. Paralogues is greyed out for BRCA2, indicating that there are no
paralogues available.

Click on Orthologues to see the 61 orthologues available.



Choose to see only Rodent orthologues by selecting the box. The
table below will now only show details of rodent orthologues. Let's
look at mouse.



Links from the orthologue allow you to go to alignments of the
orthologous proteins and cDNAs. Click on Alignment (protein) for the
mouse orthologue.

[Figure callouts: Orthologue types. Information on orthologues. Choose a taxon of interest.]



Exercises: Gene trees and homologues

Exercise 20 – Orthologues, paralogues and gene trees for the
human 0123 gene.

(a) How many orthologues are predicted for this gene in primates?
Note the Target %id and Query %id.
How much sequence identity does the Tarsius syrichta protein have
to the human one? Click on the Alignment link next to the Ensembl
identifier column to view a protein alignment in Clustal format.

(b) Go to the orthologue in marmoset. Is there a genomic alignment
between marmoset and human? Is there a gene for both species in
this region?


Demo: Whole genome alignments

Let's look at some of the comparative genomics views in the Location
tab. Go to the region 2:176914144-177094980 in human, which
contains the HoxD cluster. This cluster is involved in limb development
and is highly conserved between species.

In the Region in detail view, we can already see the Constrained
elements for 37 eutherian mammals EPO_LOW_COVERAGE track by
default. This track indicates regions of high conservation between
species, considered to be "constrained" by evolution.
[Figure callouts: Alignment in Clustal W format. Information on orthologue pair. Protein IDs.]



This track has a matching conservation score track. Click on
Configure this page, then Comparative genomics and turn on the
track for Conservation score for 37 eutherian mammals
EPO_LOW_COVERAGE. Save and close the menu.



You can now see the conservation scores that were used to
determine the peaks indicated in the constrained elements track.

We can also look at individual species comparative genomics tracks
in this view by clicking on Configure this page.

Select BLASTz/LASTz alignments from the left-hand menu to choose
alignments between closely related species. Turn on the alignments
for Mouse and Chimpanzee in Normal. Go to Translated blat
alignments and turn on alignments with Zebrafish and Xenopus in
Normal. Save and close the menu.








The alignment is greatest between closely related species.

We can also look at the alignment between species or groups of
species as text. Click on Alignments (text) in the left-hand menu.

Select Mouse from the alignments list then click Go.
[Figure callouts: Protein alignments in magenta. Filled boxes are aligned sequences. Empty boxes are no alignments. Nucleotide alignments in baby pink.]


You will see a list of the regions aligned, followed by the sequence
alignment. Exons are shown in red.

This can also be viewed graphically. Click on Alignments (image) in
the left-hand menu.


[Figure callouts: Multiple alignments. Choose an alignment from the drop-down. Pairwise alignments. Human region. Mouse is already selected (from text view). Mouse region, rearranged to align with human.]

In both alignment views the contig in the compared species is
rearranged to align to the species of interest. To compare with both
contigs in their natural order, go to Region comparison.

To add species to this view, click on the blue Select species or regions
button. Choose Mouse from the list then close the menu.



We can view large scale syntenic regions from our chromosome of
interest. Click on Synteny in the left-hand menu.

[Figure callouts: Aligned regions are linked up. Human region. Mouse region.]



Exercises: Whole genome alignments

Exercise 21 – Zebrafish orthologues

Go to www.ensembl.org to find the dbh gene on the zebrafish
genome.
(a) Go to the Location page for this gene. View the Alignments
(image) and Alignments (text) for the 5 teleost fish. Which fish
genomes are represented in the alignment? Do all the fish show a
gene in these alignments?

(b) Export the alignments (as Clustal).

[Figure callouts: Syntenic regions. Human chromosome. Mouse chromosome with syntenic region. Choose another species or chromosome. Region of interest. Table of syntenic genes.]

(c) Click on the Region in detail link at the left and turn on the tracks
for multiple alignments and conservation score for the 5 teleost fish
EPO by configuring the page.

What is the difference between the 5 teleost fish EPO multiple
alignment track and the Constrained elements track? Which regions
of the gene do most of the constrained element blocks match up to?

Can you find more information on how the constrained elements
track was generated?


Exercise 22 – Synteny

Go to www.ensembl.org
Find the Rhodopsin (RHO) gene for Human. Go to the Location tab.

(a) Click Synteny at the left. Are there any syntenic regions in dog? If
so, which chromosomes are shown in this view?

(b) Stay in the Synteny view. Is there a homologue in dog for human
RHO? Are there more genes in this syntenic block with homologues?


Exercise 23 – Whole genome alignments

(a) Find the Ensembl BRCA2 (Breast cancer type 2 susceptibility
protein) gene for human and go to the Region in detail page.

(b) Turn on the BLASTZ or LASTZ-net alignment tracks for chicken,
chimp, mouse and platypus and the Translated BLAT alignment
tracks for anole lizard and zebrafish. Does the degree of conservation
between human and the various other species reflect their
evolutionary relationship? Which parts of the BRCA2 gene seem to be
the most conserved? Did you expect this?

(c) Have a look at the Conservation score and Constrained elements
tracks for the set of 37 mammals and the set of 21 amniota
vertebrates. Do these tracks confirm what you already saw in the
tracks with pairwise alignment data?

(d) Retrieve the genomic alignment for a constrained element.
Highlight the bases that match in >50% of the species in the
alignment.

(e) Retrieve the genomic alignment for the BRCA2 gene for primates.
Highlight the bases that match in >50% of the species in the
alignment.
Regulation

Demo: Raw ChIPSeq data

We're going to add some regulation data to the Region in detail
view. We'll start at the human region 11:2u12486-2uSu1SS, which
contains the imprinted H19 gene.

Add regulation tracks using Configure this page. First, we're going to
add ChIP-seq data for histone modifications and polymerase binding.
Click on Histones & polymerases under Regulation in the left-hand
menu.



You can turn on a single track by clicking on the box in the matrix.
Note that certain tracks are selected for all cell lines by default (PolII,
PolIII, H3K27me3, H3K36me3, H3K4me3, H3K9me3). These will
appear in the Region in detail view only if you specify a track style for
the cell lines.

[Figure callouts: Select boxes. Cell lines. Choose track styles. Add tutorial labels to help use this view. Histone modifications. Legend.]

Turn on all the tracks for GM12878. Hover over the cell line name
then select All.



Now choose the track style for the tracks you've switched on. Click on
the track style box for GM12878 and select Both.



There is a similar matrix for Open chromatin & TFBS. Use this to turn
on all tracks for GM12878 in Both.

Close the menu to see the tracks in the browser.





Demo: Regulatory features and segmentation

These data are used to construct the Reg-feats and Segmentation
features. The merged Reg-feats are switched on in the Region in
detail view by default.

[Figure callouts: Peaks of histone modifications. Histograms of histone modifications. Click for legend of histogram colours.]

Click on Configure this page. Then select Regulatory features. Turn on
the Reg. Feats: GM12878 and Reg. Segs: GM12878 tracks.

Save and close the menu.



Can you see correlations between the different kinds of regulatory
data representation?

You can also add methylation data using Configure this page. Find it
under DNA methylation and turn on GM12878 RRBS ENCODE and
GM12878 WGBS ENCODE.



[Figure callouts: Reg feats are shown as bar and whisker plots. A single coloured bar represents the segmentation. Legends of reg feats and segmentation colour codes.]

Our regulatory data incorporates the ENCODE data. To see the raw
ENCODE data and the ENCODE segmentation, you need to add the
ENCODE hub.

From ensembl.org, click on the ENCODE icon.


This page contains information about the ENCODE data and how it is
incorporated into Ensembl.

Add the ENCODE hub by clicking on the Link to add the ENCODE
track hub.

This will take you directly to the matrices for adding ENCODE data to
the Region in detail view. The ENCODE matrices work in the same
way as the Open chromatin & TFBS and Histones & polymerases
matrices, except that some have multiple options (indicated by
numbers within the boxes).


Exercises: Regulation

Exercise 24 – Gene regulation: Human STX7

(a) Find the Location tab (Region in detail page) for the STX7 gene.
Are there regulatory features in this gene region? If so, where in the
gene do they appear?

(b) Click Configure this page and on the Regulatory features menu in
the left-hand side. Turn on Segmentation features for HUVEC, HeLa-
S3, and HepG2 cell types. Do any of these cells show predicted
enhancer regions in the STX7 region?

(c) Use Configure this page to add supporting data indicating open
chromatin for HeLa-S3 cells. Are there sites enriched for marks of
open chromatin (DNase1 and FAIRE) in HeLa cells at the 5' end of
STX7?

(d) Configure this page once again to add histone modification
supporting data for the same cell type as above (e.g. HeLa-S3). Which
ones are present at the 5' end of STX7?

(e) Is there any data to support methylated CpG sites in this region
(5' end) of STX7 in B-cells?

(f) Create a Share link for this display. Email it to yourself then open
the link.


Exercise 25 – Regulatory features in human

The HLA-DRB1 and HLA-DQA1 genes are part of the human major
histocompatibility complex class II (MHC-II) region and are located
about 44 kb from each other on chromosome 6. In the paper 'The
human major histocompatibility complex class II HLA-DRB1 and HLA-
DQA1 genes are separated by a CTCF-binding enhancer-blocking
element' (Majumder et al. J Biol Chem. 2006 Jul 7;281(27):18435-43)
a region of high acetylation located in the intergenic sequences
between HLA-DRB1 and HLA-DQA1 is described. This region, termed
XL9, coincided with sequences that bound the insulator protein
CCCTC-binding factor (CTCF). Majumder et al. hypothesise that the
XL9 region may have evolved to separate the transcriptional units of
the HLA-DR and HLA-DQ genes.

(a) Go to the region from 32,540,000 to 32,620,000 bp on human
chromosome 6.

(b) Is there a regulatory feature annotated in the intergenic region
between the HLA-DRB1 and HLA-DQA1 genes that has CTCF binding
supporting data as (part of) its core evidence?

(c) Has the CTCF binding detected at this position been observed in
all cell/tissue types analysed?

(d) Have a look at the Regulatory supporting evidence - Histones &
Polymerases configuration matrix. For which cell/tissue type are the
most histone acetylation data sets available? In this cell/tissue type,
is the region that shows CTCF binding also a region of high
acetylation, as found by Majumder et al.?
Advanced Access

Demo: Upload small files

We have some patients that present with microcephaly and
developmental delay. They all have large scale deletions on
chromosome five:

Patient Chromosome Start End
P1 S S68216S2 S7u912S4
P2 S S67S1476 S6978Su6
PS S S69u8SS2 S71u8671

We can turn them into a BED file and view them in the genome
browser.

To find out about BED format, click on Help & Documentation in the
top bar from any page in Ensembl:



Click on BED File Format to find out more:


This page describes the BED file format.

For our data, we have chromosome coordinates and a name for each
feature. Following the instructions on this page, we can put our data
into BED format as follows:

chiS S68216S2 S7u912S4 P1
chiS S67S1476 S6978Su6 P2
chiS S69u8SS2 S71u8671 PS
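
With more than a handful of patients, the same conversion can be scripted. The sketch below is only an illustration in Python: `to_bed_line` is a hypothetical helper, and the coordinates in `rows` are invented placeholders rather than the patient data above. It copies the coordinates unchanged, as this demo does; note that BED starts are formally zero-based, so a stricter converter would subtract one from each start.

```python
def to_bed_line(chrom, start, end, name):
    """Format one feature as 'chrN start end name' (illustrative helper)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return f"chr{chrom} {start} {end} {name}"


# Placeholder rows: (name, chromosome, start, end). These numbers are
# invented for the example and are NOT the patient coordinates.
rows = [("P1", "5", 1000, 2000), ("P2", "5", 1500, 2500)]

bed_text = "\n".join(to_bed_line(c, s, e, n) for n, c, s, e in rows)
print(bed_text)
```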

To see this data in Ensembl, we need to go to a region of interest.
We'll go to the region of these data. Put human 5:36700000-
37110000 into the top right search box to jump to the Region in
detail page.

Click on the Add your data button at the left. If you've previously
added data to Ensembl, this button will say Manage your data
instead.

A menu will appear:


More options will now appear in the menu. Since upload is allowed
for BED, this option appears. You are still able to attach a URL if you
want to.

[Figure callouts: Choose a name for the data. Species is human. Select BED.]


Paste the BED data into the box then click Upload.

You should get to a dialogue box telling you your upload has been
successful. Close the menu to go back to your region of interest.



To have a look at the file, click on Manage your data.



If you've got an Ensembl account, you can save this data to your
account. Accounts are free to set up and allow you to save
configurations and data, and share with groups.

[Figure callouts: The data in the browser. Hover over the track name to change its appearance. Save, share or delete this data.]

Demo: Attach URLs of large files

Larger files, such as BAM files generated by NGS, need to be attached
by URL. I've put a BAM file of human chromosome 20 RNASeq data
online at:
http://www.ebi.ac.uk/~emily/Workshops/BAM

Let's take a look at that URL.



Here you can see two files, Illumina_reads_test.bam and
Illumina_reads_test.bam.bai (the files beginning with ._ are artefacts
of creating this folder on a Mac - ignore them). These files are the
BAM file and the index file respectively. When attaching a BAM file to
Ensembl, there must be an index file in the same folder.
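
Since the index sits next to the BAM file under the same name plus `.bai`, its expected address can be derived from the BAM URL before attaching. This is a small Python sketch of that naming rule only; `expected_index_url` is a made-up helper and the example address is a placeholder, not a real file.

```python
def expected_index_url(bam_url):
    """Return the URL where the matching .bai index is expected to be.

    Assumes the naming used in this demo: the index is the BAM file
    name with '.bai' appended, stored in the same folder.
    """
    if not bam_url.endswith(".bam"):
        raise ValueError("URL does not end in .bam")
    return bam_url + ".bai"


# Placeholder address for illustration only.
print(expected_index_url("http://example.org/data/reads.bam"))
```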

To attach the file, click on Manage your data, then click on Add your
data to add a new track.

We get to the same dialogue box as before. This time we'll name our
data Illumina reads and choose BAM as the data format.

Paste in the URL of the BAM file itself
(http://www.ebi.ac.uk/~emily/Workshops/BAM/Illumina_reads_test.bam),
then click Attach.



Close the menu.

To see this data, jump to a region on chromosome 20. Let's go to the
region of the CDH22 gene. Search for the gene and click on the
location.



We can zoom in to see the sequence itself. Drag out boxes in the view
to zoom in, until you see a view like this.


[Figure callouts: BAM read intensity. BAM reads. Sequence of individual BAM reads. Consensus BAM read sequence. Genomic sequence.]

Demo: REST API

I have the coordinates of a particular protein motif with respect to
the protein that it's in. I would like to find out where this motif lies on
the genome.

I'm interested in a coiled-coil domain at position 116-216 in the
protein ENSPuuuuuS862uu.

To do this I want to use the REST API. I'll start at the REST homepage
at http://beta.rest.ensembl.org/.



Here you can see a list of all the possible REST endpoints, with names
and short descriptions. Scroll down to find the section Mapping. The
endpoint GET map/translation/:id/:region does what we want. Click
on the link.



If you wish to extract this data using a language such as Perl, Python,
Ruby or Java, or to get the data using command line tools such as Curl
or Wget, you can click on them to see code examples. We're just going
to do a simple lookup using a URL.

The top of the page shows us that the method is
map/translation/:id/:region. That means that we can get our data
using a URL in the format
beta.rest.ensembl.org/map/translation/:id/:region.

[Figure callouts: Example requests. Optional parameters: allow you to choose your output format. Required parameters: what the endpoint NEEDS to work. Description of the endpoint. Code examples in different languages for accessing this endpoint. The example output shown by default.]
For our data we can use the URL
http://beta.rest.ensembl.org/map/translation/ENSPuuuuuS862uu/116..216.
Put this into your internet browser.

This will take you to a text page:



From this we can see that our coiled-coil domain covers two different
regions, which will be two different exons of the transcript. They are
on chromosome 7 and span 114268607-1142687S2 and 114269860-
11427uuS6.

If we were accessing this data programmatically, the standard output
format would allow us to extract the data.
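
As an illustration of that programmatic route, here is a hedged Python sketch using only the standard library. The function names are invented for this example. The host is the one used in this demo and may since have moved, and the code assumes the reply is JSON with a `mappings` list whose entries carry `seq_region_name`, `start`, `end` and `strand`. The sample reply is made up so that the parsing step can be tried offline.

```python
import json
from urllib.request import urlopen

# Assumption: the host used in this demo; the service may have moved since.
SERVER = "http://beta.rest.ensembl.org"


def translation_map_url(protein_id, start, end, server=SERVER):
    """Build the map/translation/:id/:region URL and ask for JSON output."""
    return (f"{server}/map/translation/{protein_id}/{start}..{end}"
            "?content-type=application/json")


def genomic_blocks(payload):
    """Pull (chromosome, start, end, strand) out of a decoded reply.

    Assumes the reply holds a 'mappings' list whose entries carry
    'seq_region_name', 'start', 'end' and 'strand' keys.
    """
    return [(m["seq_region_name"], m["start"], m["end"], m["strand"])
            for m in payload["mappings"]]


def fetch_blocks(protein_id, start, end):
    """Query the server (needs network access) and return the blocks."""
    with urlopen(translation_map_url(protein_id, start, end)) as reply:
        return genomic_blocks(json.load(reply))


# Offline demonstration with an invented reply (coordinates are made up).
sample = {"mappings": [
    {"seq_region_name": "7", "start": 1000, "end": 1100, "strand": 1},
    {"seq_region_name": "7", "start": 2000, "end": 2200, "strand": 1},
]}
print(genomic_blocks(sample))
```

Each tuple corresponds to one genomic block, so a domain that spans two exons comes back as two entries.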



Advanced exercise

This exercise requires you to combine the knowledge you have
gained about different aspects of Ensembl. It is designed to be
challenging and force you to come up with solutions yourself.

Methylation data in human

The human PDHA2 gene, which encodes a subunit of the pyruvate
dehydrogenase complex, is exclusively expressed in spermatogenic
cells. In the paper 'Human testis-specific PDHA2 gene: Methylation
status of a CpG island in the open reading frame correlates with
transcriptional Activity' (Pinheiro et al. Mol Genet Metab. 2010
Apr;99(4):425-30), two CpG islands in the PDHA2 gene are reported,
one encompassing the core promoter region and extending into the
open reading frame, the other exclusively located in the coding
region. The latter CpG island was shown to be methylated in somatic
tissues but demethylated in testicular germ cells and has therefore
been proposed to play an important role in the tissue-specific
expression of the PDHA2 gene.

(a) Find the PDHA2 gene for human and go to the Region in detail
page. Zoom out one step, so that S kb around the PDHA2 gene is
shown.

(b) Turn on the CpG islands track. Two CpG islands are reported in
the PDHA2 gene by Pinheiro et al. (2010). Do they appear in this
track? If not, why not? (Tip: turn on Display empty tracks to confirm
that a track is on but has no data.)

(c) Confirm the existence of the two CpG islands using the EMBOSS
program CpGPlot
(http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html) on the
sequence around the PDHA2 gene.

(d) Upload the CpG islands found by CpGPlot using Manage your data.
Use BED format, which in its simplest form just consists of the
chromosome and the start and end coordinates, separated by spaces
(as an optional fourth field, you can add a name/description). The
genomic start and end coordinates of the CpG islands can be
calculated from the genomic start coordinate of the sequence on
which the CpGPlot program was run and the relative location of the
CpG islands on this sequence as given by the CpGPlot output.
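
The arithmetic in (d) can be sketched as follows. This is an illustrative Python helper, not part of CpGPlot or Ensembl. It assumes the sequence was exported from the forward strand and that both the genomic start and the island positions are 1-based, so the first base of the sequence (relative position 1) sits at the genomic start itself. The numbers in the example are invented.

```python
def island_to_genomic(seq_start, rel_start, rel_end):
    """Convert 1-based positions within a sequence to genomic positions.

    seq_start is the genomic coordinate of the first base of the
    sequence that was analysed; a forward-strand sequence is assumed.
    """
    if rel_start < 1 or rel_end < rel_start:
        raise ValueError("relative positions must satisfy 1 <= start <= end")
    offset = seq_start - 1
    return offset + rel_start, offset + rel_end


# Invented example: a sequence starting at genomic position 1000 with
# an island reported at positions 51..250 of that sequence.
print(island_to_genomic(1000, 51, 250))
```

The resulting pair, together with the chromosome name, gives the three fields needed for one line of the upload.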

(e) Create a link to allow you to show your new BED track to
colleagues, compared to the %GC track.

(f) What is the methylation status of the two CpG islands in different
tissues? Is there any tissue in particular which is different to other
tissues?

(g) Turn on the RNASeq tracks for different tissues. Is there evidence
that PDHA2 is expressed in one tissue more than others? How does
this relate to the DNA methylation data you saw? What does this
suggest about the way this gene is regulated?

(h) How well conserved is the region of the PDHA2 gene amongst the
37 eutherian mammals? Are the CpG islands conserved?

(i) How many GO terms are associated with PDHA2? Can you export
the sequences of all human genes that are also associated with the
first of these terms?

(j) Can you fetch the gene sequence for PDHA2 in FASTA using the
Ensembl REST API?

Answers – Exploring the Ensembl genome browser

Ensembl species

Exercise 1 – Panda

(a) Select Panda from the drop down species list, or click on View full
list of all Ensembl species, then choose Panda from the list.
The assembly is ailMel1 or GCA_000004335.1

(b) Click on More information and statistics. Statistics are shown in
the tables on the left.
The length of the genome is 2,24S,S12,8S1 bp.
There are 19,S4S coding genes.

Exercise 2 – Zebrafish

(a) Click on Zebrafish on the front page of Ensembl to go to the
species homepage. News is in the top right.
What's new in Zebrafish release 7S:
• Splicing events
• Structural variations
• Zebrafish knockout data

(b) Assembly Zv8 is available in the archived release 59.


Exercise 3 – Mosquitos

(a) Go to metazoa.ensembl.org. Open the drop down list or click on
View full list of all Ensembl Metazoa species.
There are two Anopheles species: Anopheles gambiae and
Anopheles darlingi.

(b) Click on Anopheles gambiae, then on More information and
statistics.
The genome was published in 2002 by Holt et al. and
updated in 2007 by Sharakhova et al.



Exercise 4 – Bacteria

Go to bacteria.ensembl.org and start to type the name Belliella baltica
into the search species box. It will autocomplete, allowing you to
select Belliella baltica DSM 15883 (TaxID 866536) from the drop-
down list. Click on More information and statistics.
Belliella baltica has S,68u coding genes and SS non-coding.


Region in detail

Exercise 5 – Exploring a genomic region in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).

Select Search: Human and type 13:32448000-33198000 in the text
box (or alternatively leave the Search drop-down list like it is and
type human 13:32448000-33198000 in the text box).
Click Go.
This genomic region is located on cytogenetic band q13.1. It is
made up of seven contigs, indicated by the alternating light and
dark blue coloured bars in the Contigs track.

(b) Draw with your mouse a box encompassing the BRCA2
transcripts. Click on Jump to region in the pop-up menu.

(c) Click Configure this page in the side menu (or on the cog wheel
icon in the top left-hand side of the bottom image).

Type tilepath in the Find a track text box.
Select Tilepath.
Click on the (i) button to find out more.
The tilepath track shows the BAC clones that the assembly was
based upon.
Save and close the new configuration by clicking on ! (or anywhere
outside the pop-up window).
There is not just one clone that contains the complete BRCA2
gene. The BAC clone RP11-S7E2S contains most of the gene,
but not its very S' end (contained in RP11-298PS). This is
reflected in the two contigs that make up the entire BRCA2
gene (the Contigs track is on by default).

(d) Click Share this page in the side menu.

Select the link and copy.
Compose an email to yourself, paste the link in and send the message.
Open the email and click on your link. You should be able to view the
page with the new configuration and data tracks you had added in
the Location tab.

(e) Click Export data in the side menu. Leave the default parameters
as they are.
Click Next>.
Click on Text.

Note that the sequence has a header that provides information about
the genome assembly (GRCh37), the chromosome, the start and end
coordinates and the strand. For example:

>13 dna:chromosome
chromosome:GRCh37:13:32883613:32978196:1
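
When exported sequences are processed by a script, the fields in this header can be split apart. The following is a minimal Python sketch under the assumption that the header has the layout shown above (joined onto one line), with the last whitespace-separated field holding six colon-separated values. The function name `parse_ensembl_header` is invented for this example.

```python
def parse_ensembl_header(header):
    """Split an exported FASTA header into its location fields.

    Assumes the last whitespace-separated field has the form
    coord_system:assembly:name:start:end:strand, as in the example.
    """
    fields = header.lstrip(">").split()
    coord_system, assembly, name, start, end, strand = fields[-1].split(":")
    return {
        "coord_system": coord_system,
        "assembly": assembly,
        "region": name,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }


info = parse_ensembl_header(
    ">13 dna:chromosome chromosome:GRCh37:13:32883613:32978196:1"
)
print(info["assembly"], info["region"], info["start"], info["end"])
```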

(f) Click Configure this page in the side menu.
Click Reset configuration.
Click !.


Exercise 6 – Exploring patches and haplotypes in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).

Select Search: Human and type 6:112294691-112624977 in the text
box (or alternatively leave the Search drop-down list like it is and
type human 6:112294691-112624977 in the text box).
Click Go.

You will see a green highlighted region in the middle of this region.
Click on the thin dark green bar in any of the three views to see the
label HG1304_PATCH. To learn about patches, open a new tab in
your internet browser, go to the Ensembl homepage and put patch
into the search box.

Choose Help & Docs from the left-hand side. There are glossary terms
(Patch and Alternative sequence) and an FAQ (What haplotypes and
assembly patches can I see for human?) that explain patches.

(b) Patches are marked in green in the chromosome view at the top.
Click on the leftmost patch to confirm that it is definitely HG27_patch.
Drag a box around it (less than 1Mb) then click on Jump to region.

Scroll down to the Region in detail view and click on the thin dark
green bar at the top of the patch. A drop-down containing the
coordinates of the patch will appear.
6: 26S8S84S-268S9228

(c) Another option in this drop-down is Compare with reference.
Click on this.

Scroll down the page to see the comparison between the patch and
reference. Aligned sequences are highlighted in pink and linked
together in green.
The sequences in this region have been rearranged.

(d) Click the back button in your browser to return to the Region in
detail page. Using your mouse, click and drag within the 1Mb view to
move right. The red highlighted regions are all labelled HSCHR6_MHC
etc, which is the MHC haplotypic region. Search help again to
understand what haplotypes are, in the same way as you did for
patches.

Answers - Genes and Transcripts

Exercise 7 - Exploring the human MYH9 gene

(a) Go to the Ensembl homepage (http://www.ensembl.org).

Select Search: Human and type MYH9. Click Go, then Human on the
results page. Click on Gene.

Click on either the Ensembl ID ENSG00000100345 or the HGNC
official gene name MYH9.

• Chromosome 22 on the reverse strand.
• Ensembl has 11 transcripts annotated for this gene.
• Three transcripts are protein coding.
• The longest transcript is MYH9-001 and it codes for a protein
of 1,960 amino acids.
• MYH9-001 has a CCDS record. CCDS is the consensus coding
sequence set, in which coding sequences (CDS) are agreed upon
by Ensembl, Havana, NCBI and UCSC.
The CCDS set is a collection of reviewed, agreed-upon coding
sequences (for human and mouse). These sequences are of high
confidence, and unlikely to change in the future.

(b) These are some of the phenotypes associated with MYH9
according to MIM: autosomal dominant deafness, Epstein
syndrome, and Fechtner syndrome. Click on the records for
more information.

(c) Click on ENST00000216181.

• It has 41 exons. This is shown in the Transcript summary or in
the left hand side menu Exons.
• Click on the Exons link in this side menu. Exon 1 is completely
untranslated, and exons 2 and 41 are partially untranslated
(UTR sequence is shown in purple). You can also see this in the
cDNA view if you click on the cDNA link in the left side menu.
• MYH9_HUMAN from UniProt/Swiss-Prot matches the
translation of the Ensembl transcript. Click on MYH9_HUMAN
to go to UniProtKB, or click align for the alignment.
• The Gene Ontology project (http://www.geneontology.org/)
maps terms to a protein in three classes: biological process,
cellular component, and molecular function. Meiotic spindle
organisation, cell morphogenesis, and cytokinesis are some of
the roles associated with MYH9-001.

(d) Click on Oligo probes in the side menu.
Probesets from Affymetrix, Agilent, Codelink, Illumina, and
Phalanx match to this transcript sequence. Expression analysis
with any of these probesets would reveal information about the
transcript. Hint: this information can sometimes be found in the
ArrayExpress Atlas: www.ebi.ac.uk/arrayexpress/


Exercise 8 - Finding a gene associated with a phenotype

(a) Start at the Ensembl homepage (http://www.ensembl.org).

Type phenylketonuria into the search box then click Go. Choose Gene
from the left hand menu.
The gene associated with this disorder is PAH, phenylalanine
hydroxylase, ENSG00000171759.

(b) Click on the gene symbol to go to the Gene tab. Click on
Expression in the left hand menu.
The gene is expressed in all tissues listed. This is unsurprising
for a metabolic gene.

Hover over the column titles to view definitions.
Intron-spanning reads are RNASeq reads that cover exon
junctions.
RNASeq alignments are RNASeq reads that align to the genome.

(c) If the transcript table is hidden, click on Show transcript table to
see it.
There are four protein coding transcripts.

Click on Transcript comparison in the left hand menu. Click on Select
transcripts. Either select all the transcripts labelled protein coding
one-by-one, or click on the drop down and select Protein coding.
Close the menu.

(d) Click on External references.
The MIM disease ID is 261600.


Exercise 9 - Exploring a plant gene (Vitis vinifera, grape)

(a) Go to http://plants.ensembl.org/index.html

Select Vitis vinifera from the drop down menu All genomes - select a
species or click on View full list of all Ensembl Plants species and then
choose V. vinifera.

Type MADS4 and click on the gene name link MADS4
[vIT_u1suu1uguS9uu].
Click on GO: biological process in the side menu.
There are nine terms listed including GO:0006351,
transcription, DNA-dependent, and GO:0006355, regulation of
transcription, DNA-dependent.

(b) Click on the transcript tab named vvu1suu1uguS9uu.tu1 (or on
the Transcript tab). Click on Exons in the left hand menu.
There are eight exons, of which exon 8 is the longest with SuS bp,
of which 1S are coding.

(c) Click on either Protein Summary or Domains & features in the left
hand menu to see the domains graphically or as a table respectively.
A TF_MADSbox is identified by six domain prediction methods.
A TF_Kbox domain is identified by two. Two coiled-coils are
identified by one.

Answers - BioMart

Exercise 10 - Finding genes by protein domain

As with all BioMart queries you must select the dataset, set your
filters (input) and define your attributes (desired output). For this
exercise:
Dataset: Ensembl genes in mouse
Filters: Transmembrane proteins on chromosome 9
Attributes: Ensembl gene and transcript IDs and Associated gene
names
Go to the Ensembl homepage (http://www.ensembl.org) and click on
BioMart at the top of the page.
Select Ensembl genes as your database and Mus musculus genes as
the dataset.
Click on Filters on the left of the screen and expand REGION. Change
the chromosome to 9.
Now expand PROTEIN DOMAINS, also under filters, and select
Transmembrane domains and then Only. Clicking on Count should
reveal that you have filtered the dataset down to 420 genes.
Click on Attributes and expand GENE. Select Associated gene name.
Now click on Results. The first 10 results are displayed by default;
display all results by selecting ALL from the drop down menu.

The output will display the Ensembl gene ID, Ensembl Transcript ID
and Associated gene names of all proteins with a transmembrane
domain on mouse chromosome 9. If you prefer, you can also export
as an Excel sheet by using the Export all results to XLS option.
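
The dataset / filters / attributes structure of this query can also be written down as a BioMart XML query. The sketch below only builds the XML text and does not contact any server. The internal dataset, filter and attribute names used here (mmusculus_gene_ensembl, chromosome_name, transmembrane_domain, ensembl_gene_id, ensembl_transcript_id, external_gene_name) are assumptions for illustration and may differ between releases; the real names can be read from the XML button in the BioMart interface.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string (nothing is sent anywhere)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, spec in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, **spec})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# All names below are assumed, not taken from the coursebook.
xml_query = build_query(
    dataset="mmusculus_gene_ensembl",
    filters={
        "chromosome_name": {"value": "9"},
        # Boolean filter: excluded="0" is meant as "Only" in the interface.
        "transmembrane_domain": {"excluded": "0"},
    },
    attributes=["ensembl_gene_id", "ensembl_transcript_id",
                "external_gene_name"],
)
print(xml_query)
```

Setting uniqueRows to 1 corresponds to ticking Unique results only in the web interface.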


Exercise 11 - Convert IDs

Click New.
Choose the ENSEMBL Genes 7S database.
Choose the Homo sapiens genes (GRCh37) dataset.

Click on Filters in the left panel.
Expand the GENE section by clicking on the + box.
Select ID list limit - RefSeq protein ID(s) and enter the list of IDs in
the text box (either comma separated or as a list).
HINT: You may have to scroll down the menu to see these.

Count shows 11 genes (remember that one gene may have multiple splice
variants coding for different proteins, which is why these 29
proteins do not correspond to 29 genes).
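
The many-to-one relationship between proteins and genes can be illustrated with a small sketch. The identifiers below are invented placeholders, not the IDs from the exercise.

```python
from collections import defaultdict

# Hypothetical (protein ID, gene ID) pairs, as a BioMart result table
# with one row per protein might provide them.
protein_to_gene = [
    ("NP_0001", "GENE_A"),
    ("NP_0002", "GENE_A"),  # second splice variant of GENE_A
    ("NP_0003", "GENE_B"),
    ("NP_0004", "GENE_B"),
    ("NP_0005", "GENE_C"),
]

# Group the proteins under the gene that encodes them.
proteins_by_gene = defaultdict(list)
for protein_id, gene_id in protein_to_gene:
    proteins_by_gene[gene_id].append(protein_id)

print(len(protein_to_gene), "proteins map to", len(proteins_by_gene), "genes")
```

This is why the gene Count can be smaller than the number of protein IDs entered.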

Click on Attributes in the left panel.
Select the Features attributes page.
Expand the External section by clicking on the + box.
Select HGNC symbol and RefSeq Protein ID from the External
References section.

Click the Results button on the toolbar.
Select View All rows as HTML or export all results to a file. Tick the
box Unique results only.


Exercise 12 - Export homologues

Click New.
Choose the ENSEMBL Genes 74 database.
Choose the Ciona savignyi genes (CSAV2.0) dataset.

Click on Filters in the left panel.
Expand the GENE section by clicking on the + box.
Enter the gene list in the ID List Limit box.

Click on Attributes in the left panel.
Select the Homologs attributes page.
Expand the Orthologs section by clicking on the + box.
Select Human Ensembl Gene ID.
Click Results (remember to tick the Unique results only box).


Exercise 13 - Export structural variants

(a) Choose Ensembl Variation 74 and Homo sapiens Structural
variation.
Filters: Region: Chromosome 1, Base pair start: 1Su4u8, Base pair
end: 21uS97
Count shows 6 out of S,S77,u2S structural variants.
Attributes: Structural variation (SV) Information: DGVa Study
Accession and Source Name
Structural variation (SV) Location: Chromosome name, Sequence
region start (bp) and Sequence region end (bp).

(b) Choose Ensembl Variation 74 and Homo sapiens Short variation
(SNPs and indels).
Filters: Filter by Variation ID enter: is18u1Suu, is18u1S68
Attributes: Variation Name, Variant Alleles, Phenotype description,
and Associated gene.
You can view this same information in the Ensembl browser.
Click on one of the variation IDs (names) in the result table. The
Variation tab should open in the Ensembl browser. Click
Phenotype Data.


Exercise 14 - Find genes associated with array probes

(a) Click New.
Choose the ENSEMBL Genes 74 database.
Choose the Homo sapiens genes (GRCh37) dataset.

Click on Filters in the left panel.
Expand the GENE section by clicking on the + box.
Select ID list limit - Affy hg u133 plus 2 probeset ID(s) and enter the
list of probeset IDs in the text box (either comma separated or as a
list).

Count shows 2S genes match this list of probesets.

Click on Attributes in the left panel.
Select the Features attributes page.
Expand the GENE section by clicking on the + box.
In addition to the default selected attributes, select Description.
Expand the External section by clicking on the + box.
Select HGNC symbol from the External References section and AFFY
HG U133-PLUS-2 from the Microarray Attributes section.

Click the Results button on the toolbar.
Select View All rows as HTML or export all results to a file. Tick the
box Unique results only.
Your results should show that the 2S probes map to 2S
Ensembl genes.

(b) Don't change Dataset and Filters - simply click on Attributes.

Select the Sequences attributes page.
Expand the SEQUENCES section by clicking on the + box.
Select Flank (Transcript) and enter 2000 in the Upstream flank text
box.
Expand the Header information section by clicking on the + box.
Select, in addition to the default selected attributes, Description and
Associated Gene Name.

Note: Flank (Transcript) will give the flanks for all transcripts of a
gene with multiple transcripts. Flank (Gene) will give the flanks for
one possible transcript in a gene (the most 5' coordinates for
upstream flanking).

Click the Results button on the toolbar.
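
What "upstream" means depends on the strand of the feature. The sketch below shows one way to compute the coordinates of an upstream flank, assuming 1-based inclusive coordinates and strand values of 1 and -1; the example coordinates are hypothetical and this is not a description of how BioMart is implemented.

```python
def upstream_flank(start, end, strand, length):
    """Return (flank_start, flank_end) of the region upstream of a feature.

    Coordinates are 1-based and inclusive, always given on the forward
    strand, so for a reverse-strand feature the upstream region lies at
    higher coordinates than the feature itself.
    """
    if strand == 1:
        return max(1, start - length), start - 1
    if strand == -1:
        return end + 1, end + length
    raise ValueError("strand must be 1 or -1")


# Hypothetical transcript spanning 10000-12000 with a 2000 bp flank.
forward_flank = upstream_flank(10000, 12000, 1, 2000)
reverse_flank = upstream_flank(10000, 12000, -1, 2000)
print(forward_flank, reverse_flank)
```

For a gene with several transcripts, applying this to each transcript start gives one flank per transcript, whereas using only the most 5' start gives a single flank per gene.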

(c) You can leave the Dataset and Filters the same, and go directly to
the Attributes section:

Click on Attributes in the left panel.
Select the Homologs attributes page.
Expand the GENE section by clicking on the + box.
Select Associated Gene Name.
Deselect Ensembl Transcript ID.
Expand the ORTHOLOGS section by clicking on the + box.
Select Mouse Ensembl Gene ID, Mouse Chromosome Name, Mouse
Chr Start (bp) and Mouse Chr End (bp).

Click the Results button on the toolbar.
Check the box Unique results only. Select View All rows as HTML or
export all results to a file.
Your results should show that for most of the human genes at
least one mouse orthologue has been identified.



Answers - Variation

Finding variants in Ensembl

Exercise 15 - Human population genetics and phenotype data

(a) Please note there is more than one way to get this answer. Either
go to the Variation Table for the human TAGAP gene, and Show
variants in the 5'UTR, or search Ensembl for rs1738074 directly.

Once you're in the Variation tab, click on the Genes and regulation
link or icon. This SNP is found in three transcripts
(ENST00000326965, ENST00000338313, and ENST00000367066).

(b) Click on Population genetics at the left of the Variation tab. (Or
click on Explore this variation at the left and click the Population
genetics icon.)
In Yoruba (CSHL-HAPMAP:HapMap-YRI population), the least
frequent genotype is CC at a frequency of 9.7%. This is also
the least frequent genotype in other populations. To find out
what the three-letter population codes are, have a look at our FAQ
(http://www.ensembl.org/Help/Faq?id=328).

(c) Click on Phylogenetic context.
The ancestral allele is T and it's inferred from the alignment in
primates.

Select the 37 eutherian mammals EPO LOW COVERAGE alignment
and click on Go.
A region containing the SNP (highlighted in red and placed in
the centre) and its flanking sequence are displayed. The T allele
is conserved in all but three of the 37 eutherian mammals
displayed. Note that one species has no alignment in that region
and many other species have no variation database.

(d) Click Phenotype Data at the left of the Variation page.
This variation is associated with diabetes, multiple sclerosis
and coeliac disease. There are known risk alleles for both multiple
sclerosis and coeliac disease and the corresponding P values are
provided. The allele A is associated with coeliac disease. Note
that the alleles reported by Ensembl are T/C. Ensembl reports
alleles on the forward strand. This suggests that A was
reported on the reverse strand in the PubMed article.

You can view External Data sources that mirror data from
SNPedia and LOVD, which share information about the effects of
variations in DNA, citing peer-reviewed scientific publications.
Click on SNPedia and LOVD in the left hand menu to explore
further. No LOVD data was found for this variant so far.


Exercise 16 - Exploring a SNP in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).

Type rs1801133 in the Search box, then click Go.
Click on rs1801133.

(b) Click on Genes and Regulation in the side menu (or the Genes and
Regulation icon).
No, rs1801133 is a missense variant in four MTHFR transcripts.
It's a downstream gene variant of ENSTuuuuu418uS4.

(c) In Ensembl, the alleles of rs1801133 are given as G/A because
these are the alleles in the forward strand of the genome. In the
literature and in dbSNP, the alleles are given as C/T because the
MTHFR gene is located on the reverse strand. The alleles in the
actual gene and transcript sequences are C/T.

(d) Click on Population genetics in the side menu.
In all populations but two (from the 1000 Genomes and
HapMap projects), the allele G is the major one. The two
exceptions are: CLM (Colombian in Medellin; 1000 Genomes),
HCB (Han Chinese in Beijing, China; HapMap).
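
The relationship between forward-strand and reverse-strand alleles is a reverse complement. A minimal sketch, with an illustrative function name:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_alleles(alleles: str) -> str:
    """Convert an allele string such as 'G/A' to the opposite strand.

    Each allele is reverse-complemented, so multi-base alleles are
    handled as well as single bases.
    """
    return "/".join(a.translate(COMPLEMENT)[::-1] for a in alleles.split("/"))


print(flip_alleles("G/A"))  # C/T
```

Applying the function twice returns the original alleles, which is a quick way to check a strand conversion.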

(e) Click on Phenotype Data in the left hand side menu.
The specific study where the association was originally
described is given in the Phenotype Data table. Click on
pubmed/20031578 for more details.

The association between rs1801133 and homocysteine levels is
described in the paper 'Novel associations of CPS1, MUT, NOX4
and DPEP1 with plasma homocysteine in a healthy population:
a genome-wide evaluation of 13,974 participants in the
Women's Genome Health Study' (Paré et al, Circ Cardiovasc
Genet. 2009 Apr;2(2):142-50).

(f) Click on Phylogenetic Context in the side menu.

Select Alignment: 6 primates EPO and click Go.
Gorilla, orangutan, chimp, macaque and marmoset all have a G
in this position. Please note that there is no variation database
for gorilla and marmoset though.

(g) Go to http://neandertal.ensemblgenomes.org/ and type
rs1801133 in the Search Neandertal text box.
Click Go.
Click on rs1801133 on the results page.
Click on Jump to region in detail.
Click on Configure this page in the side menu.
Click on Variation features.
Select All variations - Normal.
SAVE and close.
Draw a box of about Su bp around rs1801133 (shown in yellow in the
centre of the display).
Click on Jump to region on the pop-up menu.
The Sequences track shows that there are four reads for
Neanderthal at the position of rs1801133, all with a G, so based
on these (very limited) data there is no evidence that both
alleles were already present in Neanderthal.


Exercise 17 - Structural variation in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).
Select Search: Human and type CCL3L1 in the search box.
Click Go.
Click on CCL3L1 (Human Gene) at the top.

(b) Click on Structural variation in the side menu.
Yes, CNVs have been annotated for this gene by multiple
studies, as indicated by the many bars in the larger and smaller
structural variants tracks in the display. Details are given in the
table below the display.

Note: Can you do this with BioMart?


Exercise 18 - Exploring a SNP in mouse

(a) Go to www.ensembl.org, type is29S22S48 in the search box. Click
on is29S22S48 (Mouse Variation).
SNP is29S22S48 is located on 17:7S92499S. In Ensembl, its
alleles are provided as in the forward strand.

(b) Click on HGVS names to reveal information about HGVS
nomenclature.
This SNP has three HGVS names, one at the genomic DNA
level (17:g.7S92499SC>T), one at the transcript level
(c.721G>A) and one at the protein level (p.Val241Ile).

(c) In Ensembl, the allele that is present in the reference genome
assembly is always put first (C is the allele for the reference
mouse genome, strain C57BL/6J).

(d) Click on Individual genotypes in the left hand side menu. In the
summary of genotypes by population, click on Show for
PERLEGEN:MM_PANEL2, or search for the two strain names.
There are indeed differences between the genotypes reported
in those two different strains. The genotype reported in
NOD/LtJ is TT whereas in BALB/cByJ the genotype is CC.
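
The transcript-level and protein-level names are linked through the codon number: coding position 721 falls in codon 241. The sketch below checks this for simple substitutions only; the regular expressions are simplified assumptions and do not cover the full HGVS nomenclature.

```python
import re

# Simplified patterns for single-base and single-residue substitutions.
CODING_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def codon_of(cds_position: int):
    """Return (codon number, position within codon), both 1-based."""
    index = cds_position - 1
    return index // 3 + 1, index % 3 + 1


coding = CODING_SUB.match("c.721G>A")
protein = PROTEIN_SUB.match("p.Val241Ile")

codon, base_in_codon = codon_of(int(coding.group(1)))
residue = int(protein.group(2))
print(codon, base_in_codon, residue)  # 241 1 241
```

The G>A change at the first base of codon 241 is consistent with a valine codon (GTN) becoming an isoleucine codon (ATH).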


VEP

Exercise 19 - VEP (Variant Effect Predictor tool)

(a) Go to www.ensembl.org and click on the link tools at the top of
the page. Currently there are S tools listed in that page. Click on
Variant Effect Predictor and enter the three variants as below:
7 117171uS9 117171uS9 G/A
7 117171092 117171092 T/C
7 117171122 117171122 T/C

Note: Variation data input can be done in a variety of formats. See
more details here:
http://www.ensembl.org/info/docs/variation/vep/vep_formats.html
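
The whitespace-separated input used above (chromosome, start, end, alleles) can be checked before upload with a few lines of code. This is a sketch under the assumption that an optional fifth column holds the strand and an optional sixth an identifier, with the strand defaulting to forward when omitted; the class and function names are illustrative.

```python
from typing import NamedTuple, Optional


class VepRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    alleles: str
    strand: str
    identifier: Optional[str]


def parse_vep_line(line: str) -> VepRecord:
    """Parse one whitespace-separated variant line."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}")
    chrom, start, end, alleles = fields[:4]
    strand = fields[4] if len(fields) > 4 else "+"
    identifier = fields[5] if len(fields) > 5 else None
    return VepRecord(chrom, int(start), int(end), alleles, strand, identifier)


records = [parse_vep_line(line) for line in (
    "7 117171092 117171092 T/C",
    "7 117171122 117171122 T/C",
)]
for rec in records:
    print(rec.chrom, rec.start, rec.alleles, rec.strand)
```

For a SNP the start and end coordinates are identical, which makes a simple sanity check possible before submitting the data.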

Under the non-synonymous SNP predictions option, select prediction
only for SIFT and PolyPhen, then click Next.
The output format is either HTML or text. You will get a table
with the consequence terms from the Sequence Ontology
project (http://www.sequenceontology.org/) (e.g. synonymous,
missense, downstream, intronic, 5' UTR, 3' UTR) provided
by VEP for the listed SNPs. You can also upload the VEP results
as a track and view them on Location pages in Ensembl. SIFT
and PolyPhen are available for missense SNPs only. For two of
the entered positions, the variations have been predicted to be
probably damaging/deleterious (coordinate 117171092) and
benign/tolerated (coordinate 117171122). All three
variations have already been described and are known as
rs1800078, rs1800077 and isSSS16286 in dbSNP and other
sources (databases, literature, etc).

(b) In order to see your uploaded SNPs as a track in Region in detail,
you will need to choose a name for this upload (e.g. VEP) when
entering the data into the VEP tool. So you may need to enter the data
again. Once you have done that and given a name to the upload, click
on any link under the location column (in the VEP results table) to
see your newly added VEP track with the three variations in the
Location tab (or Region in detail view) in Ensembl.



Answers - Comparative Genomics

Gene trees and homologues

Exercise 20 - Orthologues, paralogues and gene trees for the
human BRAF gene.

(a) Go to www.ensembl.org, choose human and search for BRAF. Click
through to the Gene tab view.

On the Gene tab, click on Orthologues at the left side of the page to see
all the 6S orthologous genes.
There are orthologues in 8 primates.

The percentage of identical amino acids in the Tarsier protein
(the orthologue) compared with the gene of interest, i.e. human
BRAF (the target species/gene), is 69%. This is known as the
Target %ID. The identity of the gene of interest (human BRAF)
when compared with the orthologue (Tarsier BRAF, the query
species/gene) is 62% (the query %ID).

Note the difference in the values of the Target and Query %ID
reflects the different protein lengths for the human and tarsier
BRAF genes.
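
Why two proteins of different length give two different %ID values can be shown with simple arithmetic: the same number of identical positions is divided by a different protein length. The numbers below are hypothetical and are not the real human or tarsier values.

```python
def percent_identity(identical_positions: int, protein_length: int) -> float:
    """Identical positions as a percentage of one protein's length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * identical_positions / protein_length


# Hypothetical alignment: 450 identical residues between a longer
# protein (720 aa) and a shorter one (650 aa).
identical = 450
pct_of_longer = round(percent_identity(identical, 720), 1)
pct_of_shorter = round(percent_identity(identical, 650), 1)
print(pct_of_longer, pct_of_shorter)  # 62.5 69.2
```

The shorter protein always shows the higher percentage, because the same identical residues make up a larger share of it.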

(b) There is more than one way to get to the answer.
Option 1: Go to the orthologues page and click on the marmoset
orthologue to open the Gene tab.
Click Genomic alignments at the left. Then select Alignment: Human
(Homo sapiens) - lastz and click Go.
The red sequence is present in exons, so there is a gene in both
species in this region. You can find where the start and stop codons
are located if you configure this page and select START/STOP codons.

Option 2: Go to the Location tab of the marmoset BRAF gene and then
click on Region Comparison view at the left. Click on Select species or
regions at the left and click on the + to select Human (Homo sapiens)
- lastz then save and close. You should see an alignment between the
human BRAF gene region and the BRAF gene region for the
marmoset.

(Note: To see a blue line connecting homologous genes in the
Region Comparison view page, click on Configure this page and
under Comparative features select Join genes. Zoom out on the
location view to see blue lines connecting all the homologous
genes between marmoset and human in that region).


Whole genome alignments

Exercise 21 - Zebrafish orthologues

(a) Start in the Location tab (Region in detail) for +53
(ENSDARG00000069446). Click on Alignments (image) at the left,
and select the 5 teleost fish EPO alignment in the pull-down menu in
the view. The zebrafish, stickleback, medaka, fugu, and tetraodon are
shown in this region. All the species show a gene in the aligned
region. This can also be seen in the Alignments (text) page (the exons
are highlighted in red).

(b) You can export the alignments from either the Alignments (image)
or Alignments (text) menus in the Location tab. Click on the blue
Export data button at the left, and choose Clustal from the list.

(c) Click on Region in detail in the left hand menu. Turn on the
multiple alignment, constrained elements and conservation score
tracks for 5 teleost fish EPO, all under the Comparative genomics
menu, by configuring the page.

The 5 teleost fish EPO track just shows that the whole region for the
+53 gene can be aligned among those five species of fish. The
Constrained elements and Conservation score tracks show where the
conserved sequence is located in the alignment.
Higher conservation regions match up with exonic regions
(exons tend to be highly conserved) of the gene. Note that there
are intronic regions that seem to be fairly conserved across the
species available.

Click on the track name and the (information button) to read
more about constrained elements (or any other data track).



Exercise 22 - Synteny

(a) Change the species to dog next to the image.
Yes, there are multiple syntenic regions in dog to human
chromosome 3, which is in the centre of this view. Dog
chromosomes 6, 20, 23, 31, 33, and 34 have syntenic regions to
human chromosome 3.

(b) Scroll down to the bottom of the page.
There is a homologue in dog of human RHO. Click Centre on
gene RHO to compare the genes between human and dog in this
syntenic block.


Exercise 23 - Whole genome alignments

(a) Go to the Ensembl homepage (http://www.ensembl.org/).
Select Search: Human and type brca2 in the search box.
Click Go.
Click on 13:32889611-32973805:1 below BRCA2 (Human Gene).

You may want to turn off all tracks that you added to the display in
the previous exercises as follows:
Click Configure this page in the side menu.
Click Reset configuration.
SAVE and close.

(b) Click Configure this page in the side menu.
Click on BLASTZ/LASTZ alignments under the Comparative genomics
menu. Select Chicken (Gallus gallus) - BLASTZ_NET - Normal,
Chimpanzee (Pan troglodytes) - BLASTZ_NET - Normal, Mouse (Mus
musculus) - BLASTZ_NET - Normal and Platypus (Ornithorhynchus
anatinus) - BLASTZ_NET - Normal.
Click on Translated blat alignments. Select Anole Lizard (Anolis
carolinensis) - TRANSLATED_BLAT_NET - Normal and Zebrafish
(Danio rerio) - TRANSLATED_BLAT_NET - Normal.
SAVE and close.
Yes, the degree of conservation does reflect the evolutionary
relationship between human and the other species; the highest
degree of conservation is found in chimp, followed by mouse,
platypus, chicken, lizard and zebrafish, respectively. The exonic
sequences of BRCA2 in particular seem to be highly conserved
between the various species, which is to be expected
because these are supposed to be under higher selection
pressure than intronic and intergenic sequences.

(c) Click Configure this page in the side menu.
Click on Conservation regions under the Comparative genomics
menu.
Select Conservation score for 37 eutherian mammals
EPO_LOW_COVERAGE, Conservation score for 21 amniota
vertebrates Pecan and Constrained elements for 21 amniota
vertebrates Pecan.
SAVE and close.
Both the Conservation score and Constrained elements tracks
largely correspond with the data seen in the pairwise
alignment tracks; all exons of the BRCA2 gene show a high
degree of conservation (note the UTRs, which are not
conserved).

(d) Click on a constrained element (brown block).
Click on View alignments (text) in the pop-up menu.
Click Configure this page in the side menu.
Select Conservation regions: All conserved regions.
SAVE and close.

The conserved regions will be shown in light blue.

(e) Click on the Gene: BRCA2 tab.
Click on Genomic alignments under Comparative Genomics in the
side menu.
Select Alignment: 6 primates EPO.
Click Go.
Click Configure this page in the side menu.
Select Conservation regions: All conserved regions.
SAVE and close.

The conserved regions will be shown in light blue.



Answers - Regulation

Exercise 24 - Gene regulation: Human STX7

(a) Search for human gene STX7 from the home page. Click on
Location in the search results.
Regulatory features from the Ensembl 'regulatory build' are
based on indicators of open chromatin such as CTCF binding
sites, DNase I hypersensitive sites, and Transcription Factor
binding sites. The Regulatory features are turned on by default
in the Region in detail view.

There are many regulatory features mapping to the STX7
transcripts, including the 5' end.

Click on the Reg. Feats track name to jump to an article
explaining the underlying data. Click and drag the Reg. Feats
track next to the Genes (Merged Ensembl/Havana) track to
better compare where the Regulatory features (grey boxes) are
in the gene.

(b) See the legend below the Region in detail view to find that the
predicted enhancer segments are coloured in yellow. Two
appear in the HUVEC cell type only (out of the three cells
chosen).

(c) Configure this page and click on Open chromatin & TFBS. Turn on
both peaks and signal for DNase1 and FAIRE in HeLa-S3 cells (the
boxes in this Configure this page window will turn blue. For more
information on how to select and view the supporting data, click on
Show tutorial in the pop up window). Close the menu.
There are two DNase1 hypersensitive sites in the 5' exon of
STX7. Click on the coloured block to find out that the DNase1
enriched sites in HeLa-S3 cells come from the ENCODE project.
There is no FAIRE site known in this region.

(d) Configure this page and click on Histones & polymerases. Change
the Filter by menu from All classes to Histone. Select all the
histone modifications available for HeLa cells (some of them might be
on by default). Save and close the menu.
H3K4me3, H3K9ac and H3K27ac sites have been found in the 5'
region of STX7 in HeLa-S3 cells.

(e) Click on Configure this page and choose the DNA Methylation
menu. Scroll down to Enable/disable all External data then turn on
the first track in the list (MeDIP-chip B-cells). Save and close the
menu.
The CpG sites at the 5' end of STX7 are not highly methylated
(note the yellow/green bars). Yellow, green, and blue bars
represent unmethylated, intermediately methylated, and
methylated regions, respectively. For more information on
human DNA methylation DAS tracks, see:
www.ensembl.org/info/docs/funcgen/index.html

(f) Click Share this page in the side menu.
Select the link and copy.
Go into your email account and compose an email to yourself.
Paste the link in, then send.
Open the email and click on your link.


Exercise 25 - Regulatory features in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).
Select Search: Human and type 6:S2S4uuuu-S262uuuu in the search
box.
Click Go.

You may want to turn off all tracks that you added to the display in
the previous exercises as follows:

Click Configure this page in the side menu.
Click Reset configuration.
SAVE and close.

(b) You can click on all the regulatory features shown in the Reg.
Feats track that are located in the intergenic region of those genes.
The resulting pop-up window for each of those will show the core
attributes underlying the regulatory features.
Yes, there is one regulatory feature around coordinates
S2S89947-S2S9127S that has CTCF binding data as part of its
core evidence. Its ID is ENSRuuuuu488u2S.

(c) Click Configure this page in the side menu.
Click on Regulation - Open chromatin & TFBS.
Select MultiCell - Track style: Peaks.
SAVE and close.
CTCF binding has been detected at this position in eleven of the
cell/tissue types analysed (CD4, GM06990, GM12878, H1ESC,
HMEC, HSMM, HUVEC, HeLa-S3, HepG2, NH-A, NHEK).

(d) Click Configure this page in the side menu.
Click on Regulation - Histones & polymerases.
According to the Histones & Polymerases configuration matrix
the most information on histone acetylation is available for CD4
cells.

Hover over CD4 in the Histones & Polymerases configuration matrix.
Select Select features for CD4 - All.
SAVE and close.
Yes, the region that shows CTCF binding is also a region of high
acetylation of histone 2A, 2B, 3 and 4 in CD4 cells.



Answers - Advanced exercise

Methylation data in human

(a) Go to the Ensembl homepage (http://www.ensembl.org/).
Select Search: Human and type PDHA2 in the for text box.
Click Go.
Click on 4:967612S9-9676262S:1.
Zoom out one step, so that the Skb region around the PDHA2 gene is
shown.

You may want to turn off all tracks that you added to the display in
the previous exercises as follows:

Click Configure this page in the side menu.
Click Reset configuration.
SAVE and close.

(b) Click Configure this page in the side menu.
Type cpg in the Find a track box.
Select CpG islands.
SAVE and close.
No CpG islands are shown. Since a minimum length of 400 bp is
required for the inclusion of CpG islands into the Ensembl
database for human, the reason for this could be that the CpG islands
in the PDHA2 gene are shorter than 400 bp. However, there is a
%GC track, which shows that the region that comprises the 5'
part of the PDHA2 gene and the region directly upstream of the
gene has a high %GC (the red line in the %GC track indicates
50% GC). It is difficult / impossible to distinguish individual
CpG islands in this track, though.

(c) Click Export data in the side menu.
Click Next>.
Click on Text.
Select and copy the sequence.
Go to http://www.ebi.ac.uk/Tools/emboss/cpgplot/index.html.
Paste the sequence into the text box.
Click Run.
CpGPlot does confirm the existence of two CpG islands in the
PDHA2 gene region of lengths 200 and 263 bp, respectively. So,
it is indeed because of their length being less than 400 bp that
these CpG islands are not present in the Ensembl database.
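
CpG island prediction tools typically combine two statistics: the %GC of a sequence window and the ratio of observed to expected CpG dinucleotides. The sketch below computes both for a single sequence. It is an illustration of these two measures only, not a reimplementation of CpGPlot, and the toy sequence is made up.

```python
def cpg_stats(seq: str):
    """Return (%GC, observed/expected CpG ratio) for a DNA sequence.

    Expected CpG count is taken as (number of C * number of G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_percent = 100.0 * (c_count + g_count) / n
    expected = c_count * g_count / n
    obs_exp = cpg_count / expected if expected else 0.0
    return gc_percent, obs_exp


# Toy example: 3 C, 3 G and 3 CpG dinucleotides in 10 bases.
gc_percent, obs_exp = cpg_stats("CGCGCGATAT")
print(round(gc_percent, 1), round(obs_exp, 2))  # 60.0 3.33
```

A real analysis would slide a window along the exported sequence and report stretches where both values stay above chosen thresholds for a minimum length.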

(d) Click Add your data in the side menu (Note that if you have
previously uploaded data to Ensembl, this box will say Manage your
data instead).
Click on Upload Data.
Type CpG islands in the Name for this upload (optional) box.
Select Data format: BED.
Copy the following into the Paste file box:

chr4 96761176 96761375 cpg_island_1
chr4 96761500 96761762 cpg_island_2

Click Upload.
Click on Go to nearest region with data: 4:96701276-96811276.
The two CpG islands should now be shown on the Region in
detail page. They should coincide with the regions of high %GC.

Zoom in on the two CpG islands.

To display the names of the CpG islands:

Hover over the CpG islands track name.
Hover over the icon of the cog-wheel.
Select Labels.

(e) Drag your CpG islands track so that it is next to the %GC track.
Click Share this page in the side menu.
Select the link and copy.
Paste into your internet browser to view.
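
The two BED lines pasted in step (d), read here as chr4 96761176 96761375 cpg_island_1 and chr4 96761500 96761762 cpg_island_2, can be parsed with a short script. The BED format conventionally uses 0-based, half-open intervals, whereas browser coordinates are 1-based and inclusive, so the sketch reports the feature length under both readings; which one applies to the coordinates in this exercise is not stated in the text.

```python
BED_TEXT = """\
chr4 96761176 96761375 cpg_island_1
chr4 96761500 96761762 cpg_island_2
"""


def parse_bed(text: str):
    """Yield (chrom, start, end, name) from whitespace-separated BED lines."""
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name = line.split()[:4]
        yield chrom, int(start), int(end), name


lengths = {}
for chrom, start, end, name in parse_bed(BED_TEXT):
    lengths[name] = {
        "half_open": end - start,      # BED convention
        "inclusive": end - start + 1,  # if both ends are counted
    }
    print(name, lengths[name])
```

Counting both ends gives 200 and 263 bp, matching the island lengths quoted in the answer to (c).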

(f) Click Configuie this page in the siue menu.
Click on Regulation - BNA Nethylation.
Select all NeBIP tiacks in Noimal moue.
SAvE anu close.
Yellow, gieen anu blue iepiesent unmethylateu, inteimeuiately
methylateu anu methylateu iegions, iespectively (see the
Nethylation Legenu at the bottom of the page). It can be seen
that the iegion aiounu the S' pait of the ;C?08 gene is
methylateu in all assayeu tissues anu cell lines, except in speim.
102
The NeBIP-seq tiack foi speim shows that the unmethylateu
iegions coinciue with the Cpu islanus founu by CpuPlot.

(g) Click on Configure this page, then select RNASeq models. Turn on
the BAM files for all the tissues in Coverage only.
You will see histograms of RNASeq coverage for each of the
tissues. All of these histograms appear to be the same height,
but the numbers at the left indicate the peak. The largest
number is for the merged reads, 10,048. Among the tissue-specific
reads, Testes have a peak of more than 1,800, higher than all the
other tissues. There are also more and wider peaks in the Testes
track. The unmethylated CpG islands in sperm suggest that this
gene is negatively regulated by CpG island methylation.

(h) Click on Configure this page, then select Comparative genomics.
Turn on the tracks for the Constrained elements for 37 eutherian
mammals and Conservation score for 37 eutherian mammals.
The region of the gene itself has high GERP scores, indicated by
constrained elements over most of the gene. There is no
apparent difference in the conservation score between the CpG
islands and their flanking regions.

(i) Click on the Transcript Tab, Transcript: PDHA2-001 and select
Ontology table.
There are ten terms in the table, the first being GO:0006090,
pyruvate metabolic process.

To export the list use BioMart.
Click on BioMart in the top bar.
Choose Ensembl Genes 75 and Homo sapiens genes (GRCh37).

Click on Filters.
Open the menu for GENE ONTOLOGY.
Select GO Term Accession and put GO:0006090 into the box.

Click on Attributes.
Choose Sequences.
Expand SEQUENCES and select Unspliced (Gene).
Expand Header information and deselect Ensembl Transcript ID.

Click Results.
You can export these results if you wish.
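
The same BioMart query can also be written as an XML document and sent to the mart service instead of clicking through the interface. The sketch below only builds that XML and a request URL; it does not contact the server. The dataset name, the filter name "go", the attribute names and the service URL are assumptions that may differ between Ensembl releases, so check them against the XML button on the BioMart results page before relying on them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed service address; an archive site would use a different host.
MART_SERVICE = "http://www.ensembl.org/biomart/martservice"


def biomart_query(go_term,
                  dataset="hsapiens_gene_ensembl",      # assumed dataset name
                  go_filter="go",                       # assumed filter name
                  attributes=("ensembl_gene_id",        # assumed attribute names
                              "gene_exon_intron")):
    """Build a BioMart XML query for unspliced gene sequences by GO term."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    ET.SubElement(ds, "Filter", {"name": go_filter, "value": go_term})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def martservice_url(xml_query, service=MART_SERVICE):
    """Return a GET URL carrying the XML in the 'query' parameter."""
    return service + "?" + urlencode({"query": xml_query})


if __name__ == "__main__":
    xml_query = biomart_query("GO:0006090")
    print(xml_query)
    print(martservice_url(xml_query))
```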


(j) Go to the REST API documentation page at
http://beta.rest.ensembl.org/documentation.
Click on GET sequence/id/:id to get the documentation for this
command.

You will need the stable ID of PDHA2. Go to the browser page to find
that it is ENSG00000163114.

Use the documentation to construct a URL in the correct form, i.e.:
http://beta.rest.ensembl.org/sequence/id/:id?format=fasta

Add the ID to the URL to create:
http://beta.rest.ensembl.org/sequence/id/ENSG00000163114?format=fasta

This URL will give you the sequence.
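
The URL can also be built and requested from a script. The sketch below is a minimal example under stated assumptions: it targets https://rest.ensembl.org (the beta server named above is not assumed to be reachable) and asks for FASTA through the content-type parameter rather than the format parameter used in the exercise. sequence_url and fetch_sequence are hypothetical helper names.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: current public server, not the beta host used in the exercise.
SERVER = "https://rest.ensembl.org"


def sequence_url(stable_id, server=SERVER, content_type="text/x-fasta"):
    """Build a sequence/id/:id URL for the given stable ID."""
    params = urlencode({"content-type": content_type})
    return f"{server}/sequence/id/{quote(stable_id)}?{params}"


def fetch_sequence(stable_id):
    """Download the sequence as text (requires network access)."""
    with urlopen(sequence_url(stable_id), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(sequence_url("ENSG00000163114"))
    # Uncomment to perform the request:
    # print(fetch_sequence("ENSG00000163114")[:200])
```

Building the URL in one place keeps the stable ID as the only thing that changes between requests.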


Quick Guide to Databases and Projects

Here is a list of databases and projects you will come across in these
exercises. Google any of these to learn more. Projects include many
species, unless otherwise noted.

Other help:
The Ensembl Glossary: http://www.ensembl.org/Help/Glossary
Ensembl FAQs:
http://www.ensembl.org/Help/Faq
SEQUENCES
EMBL-Bank, NCBI GenBank, DDBJ - Contain nucleic acid sequences
deposited by submitters such as wet-lab biologists and gene
sequencing projects. These three databases are synchronised with
each other every day, so the same sequences should be found in each.

CCDS - coding sequences that are agreed upon by Ensembl,
VEGA-Havana, UCSC, and NCBI. (human and mouse).

NCBI Entrez Gene - NCBI's gene collection

NCBI RefSeq - NCBI's collection of 'reference sequences', includes
genomic DNA, transcripts and proteins. NM stands for 'Known mRNA'
(eg NM_005476) and NP (eg NP_005467) are 'Known proteins'.

UniProtKB - the "Protein knowledgebase", a comprehensive set of
protein sequences. Divided into two parts: Swiss-Prot and TrEMBL.

UniProt Swiss-Prot - the manually annotated, reviewed protein
sequences in the UniProtKB. High quality.

UniProt TrEMBL - the automatically annotated, unreviewed set of
proteins (EMBL-Bank translated). Varying quality.

VEGA - Vertebrate Genome Annotation, a selection of
manually-curated genes, transcripts, and proteins. (human, mouse,
zebrafish, gorilla, wallaby, pig, and dog).

VEGA-HAVANA - The main contributor to the VEGA project, located
at the Wellcome Trust Sanger Institute, Hinxton, UK.

GENE NAMES

HGNC - HUGO Gene Nomenclature Committee, a project assigning a
unique and meaningful name and symbol to every human gene.
(Human).

ZFIN - The Zebrafish Model Organism Database. Gene names are only
one part of this project. (Zebrafish).

PROTEIN SIGNATURES
InterPro - A collection of domains, motifs, and other protein
signatures. Protein signature records are extensive, and combine
information from individual projects such as UniProt, along with
other databases such as SMART, PFAM and PROSITE (explained
below).

PFAM - A collection of protein families.

PROSITE - A collection of protein domains, families, and functional
sites.

SMART - A collection of evolutionarily conserved protein domains.

OTHER PROJECTS
NCBI dbSNP - A collection of sequence polymorphisms; mainly
single nucleotide polymorphisms, along with insertion-deletions.

NCBI OMIM - Online Mendelian Inheritance in Man, a resource
showing phenotypes and diseases related to genes (human).
