A First Course in Statistical Programming with R
This new, color edition of Braun and Murdoch’s bestselling textbook
integrates use of the RStudio platform and adds discussion of newer
graphics systems, extensive exploration of Markov chain Monte Carlo,
expert advice on common error messages, motivating applications of
matrix decompositions, and numerous new examples and exercises.
This is the only introduction you’ll need to start programming in R, the
computing standard for analyzing data. Co-written by an R Core Team
member and an established R author, this book comes with real R code
that complies with the standards of the language. Unlike other
introductory books on the R system, this book emphasizes programming,
including the principles that apply to most computing languages, and
techniques used to develop more complex projects. Solutions, datasets,
and any errata are available from the book’s website. The many examples,
all from real applications, make it particularly useful for anyone
working in practical data analysis.
W. John Braun is Deputy Director of the Canadian Statistical Sciences
Institute. He is also Professor and Head of the Departments of Computer
Science, Physics, Mathematics and Statistics at the University of British
Columbia Okanagan. His research interests are in the modeling of
environmental phenomena, such as wildfire, as well as statistical
education, particularly as it relates to the R programming language.
Duncan J. Murdoch is a member of the R Core Team of developers, and
is co-president of the R Foundation. He is one of the developers of the rgl
package for 3D visualization in R, and has also developed numerous other
R packages. He is also a professor in the Department of Statistical and
Actuarial Sciences at the University of Western Ontario.
A First Course in
Statistical Programming
with R
Second Edition
W. John Braun and Duncan J. Murdoch
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107576469
© W. John Braun and Duncan J. Murdoch 2007, 2016
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2007
Second edition 2016
Printed in the United States of America by Sheridan Books, Inc.
A catalogue record for this publication is available from the British Library.
ISBN 978-1-107-57646-9 Hardback
Additional resources for this publication at www.cambridge.org/9781107576469.
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.
Contents
Preface to the second edition
Preface to the first edition
1 Getting started
1.1 What is statistical programming?
1.2 Outline of this book
1.3 The R package
1.4 Why use a command line?
1.5 Font conventions
1.6 Installation of R and RStudio
1.7 Getting started in RStudio
1.8 Going further
2 Introduction to the R language
2.1 First steps
2.2 Basic features of R
2.3 Vectors in R
2.4 Data storage in R
2.5 Packages, libraries, and repositories
2.6 Getting help
2.7 Logical vectors and relational operators
2.8 Data frames and lists
2.9 Data input and output
3 Programming statistical graphics
3.1 High level plots
3.2 Choosing a high level graphic
3.3 Low level graphics functions
3.4 Other graphics systems
4 Programming with R
4.1 Flow control
4.2 Managing complexity through functions
4.3 The replicate() function
4.4 Miscellaneous programming tips
4.5 Some general programming guidelines
4.6 Debugging and maintenance
4.7 Efficient programming
5 Simulation
5.1 Monte Carlo simulation
5.2 Generation of pseudorandom numbers
5.3 Simulation of other random variables
5.4 Multivariate random number generation
5.5 Markov chain simulation
5.6 Monte Carlo integration
5.7 Advanced simulation methods
6 Computational linear algebra
6.1 Vectors and matrices in R
6.2 Matrix multiplication and inversion
6.3 Eigenvalues and eigenvectors
6.4 Other matrix decompositions
6.5 Other matrix operations
7 Numerical optimization
7.1 The golden section search method
7.2 Newton–Raphson
7.3 The Nelder–Mead simplex method
7.4 Built-in functions
7.5 Linear programming
Appendix Review of random variables and distributions
Index
Expanded contents
Preface to the second edition
Preface to the first edition
1 Getting started
1.1 What is statistical programming?
1.2 Outline of this book
1.3 The R package
1.4 Why use a command line?
1.5 Font conventions
1.6 Installation of R and RStudio
1.7 Getting started in RStudio
1.8 Going further
2 Introduction to the R language
2.1 First steps
2.1.1 R can be used as a calculator
2.1.2 Named storage
2.1.3 Quitting R
2.2 Basic features of R
2.2.1 Functions
2.2.2 R is case-sensitive
2.2.3 Listing the objects in the workspace
2.3 Vectors in R
2.3.1 Numeric vectors
2.3.2 Extracting elements from vectors
2.3.3 Vector arithmetic
2.3.4 Simple patterned vectors
2.3.5 Vectors with random patterns
2.3.6 Character vectors
2.3.7 Factors
2.3.8 More on extracting elements from vectors
2.3.9 Matrices and arrays
2.4 Data storage in R
2.4.1 Approximate storage of numbers
2.4.2 Exact storage of numbers
2.4.3 Dates and times
2.4.4 Missing values and other special values
2.5 Packages, libraries, and repositories
2.6 Getting help
2.6.1 Built-in help pages
2.6.2 Built-in examples
2.6.3 Finding help when you don’t know the function name
2.6.4 Some built-in graphics functions
2.6.5 Some elementary built-in functions
2.7 Logical vectors and relational operators
2.7.1 Boolean algebra
2.7.2 Logical operations in R
2.7.3 Relational operators
2.8 Data frames and lists
2.8.1 Extracting data frame elements and subsets
2.8.2 Taking random samples from populations
2.8.3 Constructing data frames
2.8.4 Data frames can have non-numeric columns
2.8.5 Lists
2.9 Data input and output
2.9.1 Changing directories
2.9.2 dump() and source()
2.9.3 Redirecting R output
2.9.4 Saving and retrieving image files
2.9.5 The read.table function
3 Programming statistical graphics
3.1 High level plots
3.1.1 Bar charts and dot charts
3.1.2 Pie charts
3.1.3 Histograms
3.1.4 Box plots
3.1.5 Scatterplots
3.1.6 Plotting data from data frames
3.1.7 QQ plots
3.2 Choosing a high level graphic
3.3 Low level graphics functions
3.3.1 The plotting region and margins
3.3.2 Adding to plots
3.3.3 Adjusting axis tick labels
3.3.4 Setting graphical parameters
3.4 Other graphics systems
3.4.1 The ggplot2 package
3.4.2 The lattice package
3.4.3 The grid package
3.4.4 Interactive graphics
4 Programming with R
4.1 Flow control
4.1.1 The for() loop
4.1.2 The if() statement
4.1.3 The while() loop
4.1.4 Newton’s method for root finding
4.1.5 The repeat loop, and the break and next statements
4.2 Managing complexity through functions
4.2.1 What are functions?
4.2.2 Scope of variables
4.2.3 Returning multiple objects
4.2.4 Using S3 classes to control printing
4.3 The replicate() function
4.4 Miscellaneous programming tips
4.4.1 Always edit code in the editor, not in the console
4.4.2 Documentation using #
4.4.3 Neatness counts!
4.5 Some general programming guidelines
4.5.1 Top-down design
4.6 Debugging and maintenance
4.6.1 Recognizing that a bug exists
4.6.2 Make the bug reproducible
4.6.3 Identify the cause of the bug
4.6.4 Fixing errors and testing
4.6.5 Look for similar errors elsewhere
4.6.6 Debugging in RStudio
4.6.7 The browser(), debug(), and debugonce() functions
4.7 Efficient programming
4.7.1 Learn your tools
4.7.2 Use efficient algorithms
4.7.3 Measure the time your program takes
4.7.4 Be willing to use different tools
4.7.5 Optimize with care
5 Simulation
5.1 Monte Carlo simulation
5.2 Generation of pseudorandom numbers
5.3 Simulation of other random variables
5.3.1 Bernoulli random variables
5.3.2 Binomial random variables
5.3.3 Poisson random variables
5.3.4 Exponential random numbers
5.3.5 Normal random variables
5.3.6 All built-in distributions
5.4 Multivariate random number generation
5.5 Markov chain simulation
5.6 Monte Carlo integration
5.7 Advanced simulation methods
5.7.1 Rejection sampling
5.7.2 Importance sampling
6 Computational linear algebra
6.1 Vectors and matrices in R
6.1.1 Constructing matrix objects
6.1.2 Accessing matrix elements; row and column names
6.1.3 Matrix properties
6.1.4 Triangular matrices
6.1.5 Matrix arithmetic
6.2 Matrix multiplication and inversion
6.2.1 Matrix inversion
6.2.2 The LU decomposition
6.2.3 Matrix inversion in R
6.2.4 Solving linear systems
6.3 Eigenvalues and eigenvectors
6.4 Other matrix decompositions
6.4.1 The singular value decomposition of a matrix
6.4.2 The Choleski decomposition of a positive definite matrix
6.4.3 The QR decomposition of a matrix
6.5 Other matrix operations
6.5.1 Kronecker products
6.5.2 apply()
7 Numerical optimization
7.1 The golden section search method
7.2 Newton–Raphson
7.3 The Nelder–Mead simplex method
7.4 Built-in functions
7.5 Linear programming
7.5.1 Solving linear programming problems in R
7.5.2 Maximization and other kinds of constraints
7.5.3 Special situations
7.5.4 Unrestricted variables
7.5.5 Integer programming
7.5.6 Alternatives to lp()
7.5.7 Quadratic programming
Appendix Review of random variables and distributions
Index
Preface to the second edition
A lot of things have happened in the R community since we wrote the first
edition of this text. Millions of new users have started to use R, and it
is now the premier platform for data analytics. (In fact, the term “data
analytics” hardly existed when we wrote the first edition.)
RStudio, a cross-platform integrated development environment for R,
has had a large influence on the increase in popularity. In this edition
we recommend RStudio as the platform for most new users, and have
integrated simple RStudio instructions into the text. In fact, we have
used RStudio and the knitr package in putting together the manuscript.
We have also added numerous examples and exercises, and cleaned up
existing ones when they were unclear. Chapter 2 (Introduction to the R
language) has had extensive revision and reorganization. We have added
short discussions of newer graphics systems to Chapter 3 (Programming
statistical graphics). Reference material on some common error messages
has been added to Chapter 4 (Programming with R), and a list of
pseudorandom number generators as well as a more extensive discussion of
Markov chain Monte Carlo is new in Chapter 5 (Simulation). In Chapter 6
(Computational linear algebra), some applications have been added to give
students a better idea of why some of the matrix decompositions are so
important.
Once again we have a lot of people to thank. Many students have used
the first edition, and we are grateful for their comments and criticisms.
Some anonymous reviewers also provided some helpful suggestions and
pointers so that we could make improvements to the text. We hope our
readers find this new edition as interesting and educational as we think it
is.
W. John Braun
Duncan Murdoch
November, 2015
Preface to the first edition
This text began as notes for a course in statistical computing for second
year actuarial and statistical students at the University of Western
Ontario. Both authors are interested in statistical computing, both as
support for our other research and for its own sake. However, we have
found that our students were not learning the right sort of programming
basics before they took our classes. At every level from undergraduate
through Ph.D., we found that the students were not able to produce simple,
reliable programs; that they didn’t understand enough about numerical
computation to understand how rounding error could influence their
results; and that they didn’t know how to begin a difficult computational
project.
We looked into service courses from other departments, but we found
that they emphasized languages and concepts that our students would not
use again. Our students need to be comfortable with simple programming
so that they can put together a simulation of a stochastic model; they also
need to know enough about numerical analysis so that they can do
numerical computations reliably. We were unable to find this mix in an
existing course, so we designed our own.
We chose to base this text on R. R is an open source computing package
which has seen a huge growth in popularity in the last few years. Being
open source, it is easily obtainable by students and economical to install
in our computing lab. One of us (Murdoch) is a member of the core R
development team, and the other (Braun) is a co-author of a book on data
analysis using R. These facts made it easy for us to choose R, but we are
both strong believers in the idea that there are certain universals of
programming, and in this text we try to emphasize those: it is not a
manual about programming in R, it is a course in statistical programming
that uses R.
Students starting this course are not assumed to have any programming
experience or advanced statistical knowledge. They should be familiar with
university-level calculus, and should have had exposure to a course in
introductory probability, though that could be taken concurrently: the
probabilistic concepts start in Chapter 5. (We include a concise appendix
reviewing the probabilistic material.) We include some advanced topics in
simulation, linear algebra, and optimization that an instructor may choose
to skip in a one-semester course offering.
We have a lot of people to thank for their help in writing this book. The
students in Statistical Sciences 259b have provided motivation and
feedback, Lutong Zhou drafted several figures, Kristy Alexander, Yiwen
Diao, Qiang Fu, and Yu Han went over the exercises and wrote up detailed
solutions, and Diana Gillooly of Cambridge University Press, Professor
Brian Ripley of Oxford University, and some anonymous reviewers all
provided helpful suggestions. And of course, this book could not exist
without R, and R would be far less valuable without the contributions of
the worldwide R community.
W. John Braun
Duncan Murdoch
February, 2007
1
Getting started
Welcome to the world of statistical programming. This book contains a
lot of specific advice about the hows and whys of the subject. We start in
this chapter by giving you an idea of what statistical programming is all
about. We will also tell you what to expect as you proceed through the
rest of the book. The chapter will finish with some instructions about how
to download and install R, the software package and language on which we
base our programming examples, and RStudio, an “integrated development
environment” (or “IDE”) for R.
1.1 What is statistical programming?
Computer programming involves controlling computers, telling them what
calculations to do, what to display, etc. Statistical programming is
harder to define. One definition might be that it’s the kind of computer
programming statisticians do – but statisticians do all sorts of
programming. Another would be that it’s the kind of programming one does
when one is doing statistics: but again, statistics involves a wide
variety of computing tasks.
For example, statisticians are concerned with collecting and analyzing
data, and some statisticians would be involved in setting up connections
between computers and laboratory instruments: but we would not call that
statistical programming. Statisticians often oversee data entry from
questionnaires, and may set up programs to aid in detecting data entry
errors. That is statistical programming, but it is quite specialized, and
beyond the scope of this book.
Statistical programming involves doing computations to aid in statistical
analysis. For example, data must be summarized and displayed. Models
must be fit to data, and the results displayed. These tasks can be done in a
number of different computer applications: Microsoft Excel, SAS, SPSS,
S-PLUS, R, Stata, etc. Using these applications is certainly statistical
computing, and usually involves statistical programming, but it is not the
focus of this book. In this book our aim is to provide a foundation for an
understanding of how those applications work: what are the calculations
they do, and how could you do them yourself?
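To make the idea of doing the calculations yourself concrete, here is a minimal sketch in R. It assumes a small made-up data vector x (hypothetical values, not taken from the book), computes a sample mean and variance directly from their definitions, and compares the results with R's built-in mean() and var() functions.

```r
# Hypothetical data, invented for illustration only
x <- c(2.1, 3.4, 1.9, 5.0, 4.6)
n <- length(x)

# Sample mean: the sum of the values divided by how many there are
my_mean <- sum(x) / n

# Sample variance: sum of squared deviations divided by n - 1
my_var <- sum((x - my_mean)^2) / (n - 1)

# Compare with the built-in functions, allowing for rounding error
isTRUE(all.equal(my_mean, mean(x)))
isTRUE(all.equal(my_var, var(x)))
```

The comparison uses all.equal() rather than == because results computed in different ways may differ by tiny rounding errors.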
Since graphs play an important role in statistical analysis, drawing
graphics of one-, two-, or higher-dimensional data is an aspect of
statistical programming.
An important part of statistical programming is stochastic simulation.
Digital computers are naturally very good at exact, reproducible
computations, but the real world is full of randomness. In stochastic
simulation we program a computer to act as though it is producing random
results, even though, if we knew enough, the results would be exactly
predictable.
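The predictability behind apparently random results can be seen in a short sketch using R's built-in generator. The seed value 2016 is an arbitrary choice made for this illustration; any fixed seed would behave the same way.

```r
# Fix the state of the pseudorandom number generator (arbitrary seed)
set.seed(2016)
first <- runif(5)    # five values that look random

# Restoring the same state reproduces exactly the same "random" values
set.seed(2016)
second <- runif(5)

identical(first, second)
```

Knowing the seed is an example of "knowing enough": with it, every value the generator produces can be predicted.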
Statistical programming is closely related to other forms of numerical
programming. It involves optimization, and approximation of mathematical
functions. Computational linear algebra plays a central role. There is less
emphasis on differential equations than in physics or applied mathematics
(though this is slowly changing). We tend to place more of an emphasis
on the results and less on the analysis of the algorithms than in computer
science.
1.2 Outline of this book
This book is an introduction to statistical programming. We will start
with basic programming: how to tell a computer what to do. We do this
using the open source R statistical package, so we will teach you R, but we
will try not to just teach you R. We will emphasize those things that are
common to many computing platforms.
Statisticians need to display data. We will show you how to construct
statistical graphics. In doing this, we will learn a little bit about human
vision, and how it motivates our choice of display.
In our introduction to programming, we will show how to control the
flow of execution of a program. For example, we might wish to do repeated
calculations as long as the input consists of positive integers, but then
stop when an input value hits 0. Programming a computer requires basic
logic, and we will touch on Boolean algebra, a formal way to manipulate
logical statements. The best programs are thought through carefully before
being implemented, and we will discuss how to break down complex problems
into simple parts. When we are discussing programming, we will spend
quite a lot of time discussing how to get it right: how to be sure that the
computer program is calculating what you want it to calculate.
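The flow-control example mentioned above (repeat a calculation while the inputs are positive, then stop at 0) might be sketched in R as follows. The vector inputs and the choice of summing squares are assumptions made for illustration; a real program might read its values from a file or from the user.

```r
# Hypothetical inputs: positive integers followed by a 0 that stops the loop
inputs <- c(3, 5, 2, 0, 7)

i <- 1
total <- 0

# Continue while there is input left and the current value is positive
while (i <= length(inputs) && inputs[i] > 0) {
  total <- total + inputs[i]^2   # the repeated calculation: add a square
  i <- i + 1
}

total   # 3^2 + 5^2 + 2^2 = 38; the 7 after the 0 is never used
```

The first condition in the while() test guards against running past the end of the vector when no 0 is present.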
One distinguishing characteristic of statistical programming is that it is
concerned with randomness: random errors in data, and models that include
stochastic components. We will discuss methods for simulating random
values with specified characteristics, and show how random simulations
are useful in a variety of problems.
Many statistical procedures are based on linear models. While discussion
of linear regression and other linear models is beyond the scope of
this book, we do discuss some of the background linear algebra, and how
the computations it involves can be carried out. We also discuss the
general problem of numerical optimization: finding the values which make a
function as large or as small as possible.
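As a small taste of numerical optimization, the following sketch uses the optimize() function from R's stats package to locate the minimum of a one-variable function. The function f and the search interval are hypothetical choices for this illustration.

```r
# Hypothetical function with its minimum at x = 2, where f(2) = 1
f <- function(x) (x - 2)^2 + 1

# Search the interval [0, 5] for the value of x that makes f smallest
result <- optimize(f, interval = c(0, 5))

result$minimum     # location of the minimum, approximately 2
result$objective   # value of f there, approximately 1
```

The answers are approximate because the search stops once it is within a numerical tolerance of the minimum.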
Each chapter has a number of exercises which are at varying degrees
of difficulty. Solutions to selected exercises can be found on the web at
www.statprogr.science.
1.3 The R package
This book uses R, which is an open source package for statistical
computing. “Open source” has a number of different meanings; here the
important one is that R is freely available, and its users are free to see
how it is written, and to improve it. R is based on the computer language
S, developed by John Chambers and others at Bell Laboratories in 1976. In
1993 Robert Gentleman and Ross Ihaka at the University of Auckland wanted
to experiment with the language, so they developed an implementation, and
named it R. They made it open source in 1995, and thousands of people
around the world have contributed to its development.
1.4 Why use a command line?
The R system is mainly command-driven, with the user typing in text and
asking R to execute it. Nowadays most programs use interactive graphical
user interfaces (menus, touchscreens, etc.) instead. So why did we choose
such an old-fashioned way of doing things?
Menu-based interfaces are very convenient when applied to a limited set
of commands, from a few to one or two hundred. However, a command-line
interface is open ended. As we will show in this book, if you want
to program a computer to do something that no one has done before, you
can easily do it by breaking down the task into the parts that make it up,
and then building up a program to carry it out. This may be possible in
some menu-driven interfaces, but it is much easier in a command-driven
interface.
Moreover, learning how to use one command-line interface will give
you skills that carry over to others, and may even give you some insight
into how a menu-driven interface is implemented. As statisticians, it is
our belief that your goal should be understanding, and learning how to
program at a command line will give you that at a fundamental level.
Learning to use a menu-based program makes you dependent on the
particular organization of that program.
There is no question that command-line interfaces require greater
knowledge on the part of the user – you need to remember what to type
to achieve a particular outcome. Fortunately, there is help. We recommend
that you use the RStudio integrated development environment (IDE). IDEs
were first developed in the 1970s to help programmers: they allow you to
edit your program, to search for help, and to run it; when your first
attempt doesn’t work, they offer support for diagnosing and fixing errors.
RStudio is an IDE for R programming, first released in 2011. It is
produced by a Boston company named RStudio, and is available for free use.
1.5 Font conventions
This book describes how to do computations in R. As we will see in the
next chapter, this requires that the user types input, and R responds with
text or graphs as output. To indicate the difference, we have typeset the
user input and R output in a gray box. The output is prefixed with ##. For
example
This was typed by the user
## This is a response from R
In most cases other than this one and certain exercises, we will show
the actual response from R corresponding to the preceding input. We have
used the knitr package so that R itself is computing the output. The
computations in the text were done with R version 3.2.2 (2015-08-14).
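As a concrete instance of this convention, here is a minimal sketch assuming the user types the expression sqrt(16) (an example chosen for this illustration). The first line is the input; the line prefixed with ## is what R prints in response.

```r
sqrt(16)
## [1] 4
```

The [1] in the response is part of R's normal printing of a result; it labels the first element of the output.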
There are also situations where the code is purely illustrative and is not
meant to be executed. (Many of those are not correct R code at all; others
illustrate the syntax of R code in a general way.) In these situations we
have typeset the code examples in an upright typewriter font. For example,
f( some arguments )
1.6 Installation of R and RStudio
R can be downloaded from http://cloud.r-project.org. Most
users should download and install a binary version. This is a version that
has been translated (by compilers) into machine language for execution
on a particular type of computer with a particular operating system. R is
designed to be very portable: it will run on Microsoft Windows, Linux,
Solaris, Mac OS X, and other operating systems, but different binary
versions are required for each. In this book most of what we do would be
the same on any system, but when we write system-specific instructions, we
will assume that readers are using Microsoft Windows.
Installation on Microsoft Windows is straightforward. A binary
version is available for Windows Vista or above from the web page
http://cloud.r-project.org/bin/windows/base. Download
the “setup program,” a file with a name like R-3.2.5-win.exe.
Clicking on this file will start an almost automatic installation of the R
system. Though it is possible to customize the installation, the default
responses will lead to a satisfactory installation in most situations,
particularly for beginning users.
One of the default settings of the installation procedure is to create an
R icon on your computer’s desktop.
You should also install RStudio, after you have installed R. As with R,
there are separate versions for different computing platforms, but they all
look and act similarly. You should download the “Open Source Edition” of
“RStudio Desktop” from www.rstudio.com/, and follow the
instructions to install it on your computer.
Fig. 1.1 A typical RStudio display.
1.7 Getting started in RStudio
Once you have installed R and RStudio, you will be ready to start
statistical programming. We’ll start with a quick tour of RStudio, and
introduce more detail in later chapters.
When you are working in RStudio, you’ll see a display something like
Figure 1.1. (The first time you start it, you won’t see all the content that
is in the figure.) The display includes four panes. The top left pane is the
Source Pane, or editor. You will type your program (or other document)
there. The bottom left pane is called the Console Pane. This is where you
communicate with R. You can type directly into this pane, but it is
usually better to work within the editor pane, because that way you can
easily correct mistakes and try again.
The two right-hand panes contain a variety of tabs. In the figure, the
top pane is showing the Workspace, and the bottom pane is showing a plot;
we’ll discuss these and the other tabs in later chapters. For now, you just
need to know the following points:
• You should do most of your work in the editor, but you can occasionally
  type in the console.