Data Analysis & Visualization with R

Published July 2016

Slides used in talk on R at fifthelephant.in



Shree Joshi twitter:@2joshis

Image Credit: waveking1/flickr

What is R?
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
It includes
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

Objects/Symbols
• Everything in R is an object
• Common Object Types
– Vector - collection of same type of objects
> c(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> even <- seq(from=2,to=10,by=2)
> odd <- seq(from=1,to=10,by=2)
> ifelse(rep(c(0,1),times=2),odd,even)
> sort(c(even,odd)) * c(1,2)

– List - collection of dissimilar objects
> address <- list(door=1,street="infinity loop",city="cupertino",dimension=c(1:3))
> address$door
[1] 1
> address$street
[1] "infinity loop"
> address$dimension
[1] 1 2 3
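The vector and list snippets above can be traced step by step. The following is a minimal sketch in base R that reuses the slide's `even`, `odd` and `address` objects; the names `picked`, `scaled` and `street` are introduced here only for illustration.

```r
# Vectors hold elements of a single type; shorter operands are recycled.
even <- seq(from = 2, to = 10, by = 2)   # 2 4 6 8 10
odd  <- seq(from = 1, to = 10, by = 2)   # 1 3 5 7 9

# ifelse() picks element-wise: where the test is non-zero (TRUE) it takes
# the value from `odd`, otherwise from `even`. The test has length 4,
# so the result has length 4 as well.
picked <- ifelse(rep(c(0, 1), times = 2), odd, even)   # 2 3 6 7

# c(1, 2) is recycled along the ten sorted values.
scaled <- sort(c(even, odd)) * c(1, 2)   # 1 4 3 8 5 12 7 16 9 20

# Lists may mix types; elements are reached by name with $ or [[ ]].
address <- list(door = 1, street = "infinity loop",
                city = "cupertino", dimension = 1:3)
street <- address[["street"]]            # same value as address$street
```

Recycling is silent when the longer length is a multiple of the shorter one, which is why `scaled` produces no warning here.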

Objects/Symbols
– Factors - ordered and unordered, similar to Enum
– Matrix - two dimensional vector of same type
– data.frame - tables with columns of different object types
– array
– Timeseries
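As a rough illustration of the object types listed above, here is a short base R sketch. All values, including the symbols "AAA" and "BBB", are made up for the example.

```r
# Factor: categorical values with a fixed set of levels (like an enum).
size <- factor(c("small", "large", "small"), levels = c("small", "large"))

# Ordered factor: the levels carry a ranking, so comparisons make sense.
grade <- factor(c("low", "high", "mid"),
                levels = c("low", "mid", "high"), ordered = TRUE)
above_low <- grade > "low"               # FALSE TRUE TRUE

# Matrix: a two-dimensional vector of one type, filled column by column.
m <- matrix(1:6, nrow = 2)               # 2 rows, 3 columns

# data.frame: a table whose columns may have different types.
df <- data.frame(symbol = c("AAA", "BBB"), close = c(10.5, 20.0),
                 stringsAsFactors = FALSE)

# array: generalises the matrix to more than two dimensions.
a <- array(1:24, dim = c(2, 3, 4))
```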

Operations
Subsetting Data
> c <- (1:10)
> c
[1] 1 2 3 4 5 6 7 8 9 10
> c[c<5]
[1] 1 2 3 4
> c[-(1:2)]
[1] 3 4 5 6 7 8 9 10

Arithmetic
> c*2
[1] 2 4 6 8 10 12 14 16 18 20
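The same subsetting and arithmetic operations, written as a small script. This sketch names the vector `v` rather than `c` to avoid confusion with the `c()` function; the other variable names are likewise illustrative.

```r
v <- 1:10

below_five <- v[v < 5]        # logical index keeps matching elements: 1 2 3 4
tail_part  <- v[-(1:2)]       # negative index drops positions 1 and 2
doubled    <- v * 2           # arithmetic is applied element-wise
evens      <- v[v %% 2 == 0]  # any logical condition can be used: 2 4 6 8 10
```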

Operations
> x <- cbind(1:10)
> x
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
> apply(x,2,sum)
[1] 55
> foo <- list(a=1:10,b=11:20)
> lapply(foo,sum)
> sapply(foo,sum)
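The slide does not show what `lapply()` and `sapply()` return. A minimal sketch of the difference, reusing the slide's `x` and `foo` (the result names are illustrative):

```r
x <- cbind(1:10)                       # 10 x 1 matrix
col_total <- apply(x, 2, sum)          # margin 2 = columns, giving 55

foo <- list(a = 1:10, b = 11:20)
as_list   <- lapply(foo, sum)          # always a list: a = 55, b = 155
as_vector <- sapply(foo, sum)          # simplified to a named vector
```

`lapply()` is the predictable choice inside functions because its return type never changes; `sapply()` is convenient at the console.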

Functions/Libraries
> foo <- function(x) { x*x}
> foo
function(x) { x*x}
> foo(2)
[1] 4
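A slightly expanded sketch of user-defined functions. `square` mirrors the slide's `foo`; `power` and its `exponent` argument are hypothetical additions showing default and named arguments.

```r
square <- function(x) {
  x * x                   # the last evaluated expression is the return value
}
sq_scalar <- square(2)    # 4
sq_vector <- square(1:3)  # arithmetic is vectorised: 1 4 9

# Arguments can have defaults and can be passed by name.
power <- function(x, exponent = 2) x^exponent
p_default <- power(3)                 # 9
p_named   <- power(2, exponent = 3)   # 8
```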

CRAN – Package Repository

Accessing Data with R
• Reading/Writing Data
– CSV/Text Files
foo <- read.csv("D:/shree/R/rprojects/5el/cm26JUL2012bhav.csv",strip.white=TRUE)
foo2 <- subset(foo,SERIES=="EQ",select=c("SYMBOL","OPEN","HIGH","LOW","CLOSE","PREVCLOSE"))
foo3 <- cbind(foo2,change=(foo2$CLOSE-foo2$PREVCLOSE)/foo2$PREVCLOSE)
summary(foo3$change)
breaks <- seq(from=-0.1,to=0.1,by=0.02)
f <- cut(foo3$change,breaks)
summary(f)

– Databases, Excel, Rcpp
– Web – readHTMLTable(), XML, JSON
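The CSV workflow above depends on a local file. The sketch below runs the same steps on a few inline rows so it works anywhere; the rows are invented for illustration and only follow the column layout the slide uses, they are not real market data.

```r
# Hypothetical rows; symbols and prices are made up.
csv_text <- "SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,PREVCLOSE
AAA,EQ,100,105,99,103,100
BBB,EQ,50,51,47,47.5,50
CCC,BE,10,11,9,10,10"

foo <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

# Keep equity rows only; comparison needs ==, a single = would not filter.
foo2 <- subset(foo, SERIES == "EQ",
               select = c("SYMBOL", "OPEN", "HIGH", "LOW", "CLOSE", "PREVCLOSE"))

# Fractional change from the previous close.
foo3 <- cbind(foo2, change = (foo2$CLOSE - foo2$PREVCLOSE) / foo2$PREVCLOSE)

# Bin the changes into 2% buckets and count the rows in each bucket.
breaks <- seq(from = -0.1, to = 0.1, by = 0.02)
f <- cut(foo3$change, breaks)
counts <- table(f)

# Writing is the mirror image of reading; a temporary file is used here.
path <- tempfile(fileext = ".csv")
write.csv(foo3, path, row.names = FALSE)
back <- read.csv(path)
```

Values outside the range of `breaks` become `NA` in `f`, so a wider range may be needed on real data.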

Graphing
plot(foo3$change,col='seagreen')
hist(foo3$change,breaks=50,col='seagreen')
plot(foo3$change,col='seagreen',type='h')
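The three plotting calls above can be tried without the original data file. This sketch substitutes simulated values for `foo3$change` (an assumption, not the talk's data) and writes the plots to a temporary PDF so it also runs in a non-interactive session.

```r
set.seed(1)                                  # reproducible stand-in data
change <- rnorm(200, mean = 0, sd = 0.02)    # hypothetical daily changes

out <- tempfile(fileext = ".pdf")
pdf(out)                                     # send plots to a file, not the screen
plot(change, col = "seagreen")               # points: value against index
hist(change, breaks = 50, col = "seagreen")  # distribution of the changes
plot(change, col = "seagreen", type = "h")   # one vertical line per observation
invisible(dev.off())                         # close the device to finish the file

# hist() can also return the bin counts without drawing anything.
h <- hist(change, breaks = 50, plot = FALSE)
```

In `hist()`, a single number for `breaks` is treated as a suggestion, so the actual number of bins may differ from 50.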

Time series
• Sequence of data points ordered in time
– Regularly spaced
– Irregularly spaced

ts - regularly spaced time series
mts - multiple regularly spaced time series
its - irregularly spaced time series
timeSeries - default for Rmetrics packages
fts - R interface to tslib (C++ time series library)
zoo - regular/irregular and arbitrary time stamp classes
xts - an extension of the zoo class
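Of the classes listed above, `ts` and `mts` ship with base R, so they can be sketched without installing anything. The monthly values below are placeholders, and the variable names are illustrative.

```r
# Two years of hypothetical monthly values starting January 2012.
monthly <- ts(1:24, start = c(2012, 1), frequency = 12)

freq <- frequency(monthly)               # 12 observations per year
first_obs <- start(monthly)              # year 2012, period 1

# window() extracts a sub-range by time rather than by position.
h2 <- window(monthly, start = c(2012, 7), end = c(2012, 12))  # Jul-Dec 2012

# Binding ts objects column-wise gives an mts: several regularly spaced
# series sharing one time index.
both <- cbind(a = monthly, b = monthly * 2)
```

Irregularly spaced data needs one of the add-on classes from the list, such as zoo or xts, which are installed from CRAN.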

Packages for Financial Analysis

Slide Credit: Guy Yollin

Blotter Flow

Slide Credit: Guy Yollin

QuantStrat

Slide Credit: Guy Yollin

Acknowledgements
• Guy Yollin
• R Cookbook
• R in a Nutshell
