APS 240: Data Analysis and Statistics with R
2021-01-14
Chapter 1 Course information and overview
This is the online course book for the Data Analysis and Statistics with R (APS 240) module. You can view this book in any modern desktop browser, as well as on your phone or tablet device.
1.1 Why do a data analysis course?
To do science yourself, or to understand the science other people do, you need some understanding of the principles of experimental design, data collection, statistical analysis and scientific presentation. That does not mean becoming a maths wizard, or a computer genius. It means knowing how to take sensible decisions about designing studies and collecting data, and then being able to interpret those data correctly. Sometimes the methods required are extremely simple, sometimes more complex. You aren’t expected to get to grips with all of them, but what we hope to do in the course is to give you a good practical grasp of the core techniques that are widely used in biology. You should then be equipped to use these techniques intelligently and, equally importantly, know when they are not appropriate, and when you need to seek help to find the correct way to design or analyse your study.
You should, with some work on your part, acquire a set of skills which you will use at various stages throughout the remainder of your course, in practicals, field courses and in your project work. These same skills will almost certainly also be useful after your degree, whether doing biology, or something completely different. We live in a world that is increasingly flooded with data, and people who know how to make sense of this are in high demand.
1.2 Course overview
1.2.1 Aims
This course has two main, and equal, aims. The first is to provide a basic training in the use of statistical methods and software (R and RStudio) to analyse biological data. The second is to introduce some of the principles of experimental design, sampling, data interpretation, graphical presentation and scientific writing relevant to the biological sciences.
1.2.2 Objectives
By the end of the course you should be familiar with the principles and use of a range of basic statistical techniques, be able to use the R programming language to carry out appropriate analyses, evaluate your statistical models, and make sensible interpretations of the results. You should be able to relate the ways in which data are collected to the types of statistical methods that can be used to analyse those data. In combination with the skills you developed in APS 135, you should be able to decide on appropriate ways of investigating data graphically, be able to produce good quality scientific figures, and incorporate these, along with statistical results, into a formal report.
1.2.3 Assumed background
You are assumed to be familiar with the use of personal computers and with the use of R for data input, manipulation and plotting. The key skills required are reviewed in the prerequisites chapter. If you are unsure about these basic methods you should familiarise yourself with the material introduced in APS 135.
1.2.4 Methods
Students come to this course with very different prior experiences and competencies. Consequently, the course is designed as a ‘self-directed’ module to provide flexibility in how people work through the material.
Everything is meant to be covered in one semester (~12 weeks). Each week involves a block of preparatory reading and a ‘walk-through’ R practical from the course book, followed by one or more interactive learnr tutorials to practise application and consolidate learning.
1.2.4.1 Preparatory reading and practical work
The book chapters are generally of one of two types:
Concept chapters. These focus on ideas and concepts, rather than using R per se. They provide background information or more detailed discussion relating to each topic. These are an integral part of the course. Please don’t skip them!
Practical walk-through chapters. These are designed to introduce the practical aspects of different analysis techniques. These generally focus on the ‘when’ and ‘why’ we use a particular technique, as well as how to actually do it in R.
It is up to you to complete the preparatory reading in your own time. You are not expected to understand everything when you first encounter it. Do your best to understand it, taking careful notes of anything you’re struggling with. The TAs and staff will happily answer any questions you have via the discussion board or feedback lectures.
1.2.5 What do you need to learn?
The learning outcomes chapter tells you what you should know and be able to do by the end of the course. It also highlights what you don’t need to know from an assessment perspective. As always, feel free to post a message on the discussion board if you’re not sure of what you need to be able to do.
1.2.6 What is required of you?
A willingness to learn and to take responsibility for your learning! Data analysis is not the easiest subject in the world, but neither is it the most difficult. It’s worth making the effort because what you learn in this course will form the basis for much of what you do in practical and project work that follows in later semesters.
The minimum requirement for the course is that you:
complete the preparatory practical work and reading, taking careful notes of anything that you don’t understand,
be proactive about your learning and post questions on the discussion board about things you’re struggling with,
work through the learnr tutorials each week to consolidate your learning and practise using R.
How you work through the book is fairly flexible, but remember, the self-study preparatory work lays the foundation for the material covered in the tutorials. Take our word for it, you will probably struggle with the tutorials if you skip the preparatory reading.
A word of advice: Don’t let the flexibility of the course tempt you into letting a backlog of work build up. It will be hard to catch up later if you haven’t made a good ‘first pass’ at all the material during semester.
One last point: the university guidelines assume the total study time associated with a 10 credit module to be about 100 hours. You should aim to spend about 4-5 hours each week working through the chapters, completing the tutorials, and attending feedback sessions during semester. This leaves you about 40-50 hours to revise for the formal exam in the new year.
1.3 How to use the teaching material
1.3.1 The online course book
All of the teaching material is available through this online course book. The book is organised such that it forms a complete, stand-alone introduction to data analysis. Bookmark it now if you haven’t already done so. There are a couple of good reasons for delivering the course material this way:
Practicality: Most exercises can be completed by building on the examples in the course material. Copying bits of relevant R code from the course website into a script is much more efficient, and less error-prone, than copying by eye from a printed page. A website also allows us to cross-reference topics and link to the odd bit of outside reading.
Permanence: Experience suggests that many students will want to refer to the material in this course after they graduate. Bits of paper are easy to lose, and because the R landscape is always changing, some elements of the course may be less relevant in a few years’ time. Putting everything on a website ensures you will always be able to access a familiar but up-to-date data analysis course.
1.3.2 How to make best use of the teaching material
DO:
When working through a tutorial exercise, follow the instructions carefully, but also think about what you are doing. Work at your own pace; you are not going to be assessed on whether you can do an analysis quickly.
Ask teaching staff for help using the discussion boards if there are things you don’t follow, or when things don’t seem to come out the way they should; that’s what they’re there for!
Collaborate! If you are not sure you understand something feel free to discuss it with a friend—more often than not this is exactly how scientists resolve and clarify problems in their experimental design and analysis.
Be prepared to experiment with R to solve problems that you encounter. You can’t break R or RStudio by generating an error. When you run into a problem, go back to the line of code that generated the first error and try making a change.
Complete each week’s work before the next week’s session. You may be able to complete some sessions quite quickly; others may take more time and require more effort from you.
DON’T:
Just copy what someone else tells you to do without understanding why you are doing it. You need to understand it for yourself!
Skip preparatory work or get behind schedule — there is too much material to assimilate all at once.
Like all skills, data analysis is something you have to practise. It often feels difficult and confusing at first but it does get easier with practice.
1.3.3 Conventions used in the course material
The teaching material, as far as possible, uses a standard set of conventions to differentiate between various sorts of information, action and instruction:
1.3.3.1 Text, instructions, and explanations
Normal text, instructions, explanations etc. are written in the same type as this paragraph (obvious really). We will often use bold to highlight specific technical terms when they are first introduced. Italics are generally used for emphasis and with Latin names.
When we want you to do something important or pay particular attention we place the text inside a box like this one:
Brown boxes
These boxes contain text telling you to do something or remember something important. Sometimes it is just signposting that you should really pay attention to the next bit.
Please don’t ignore these boxes.
At various points in the text you will also come across different coloured boxes that contain additional information:
Blue boxes
These aim to offer a not-too-technical discussion of how or why something works the way it does. These are things that it may be useful to know, or at least know about, but aren’t necessarily part of the main thread of a section.
Red boxes
These contain a warning or flag a common gotcha that may trip you up. They highlight potential pitfalls and show you how to avoid them. You will avoid future mistakes if you pay close attention to these.
We use block quotations to indicate an example of how a statistical result should be presented when you write it in a report: e.g.
The mean lengths of male and female locusts differed significantly (t=4.04, df=15, p=0.001), with males being significantly larger.
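As a rough illustration of where the numbers in such a sentence come from, the sketch below runs a two-sample t-test in R and assembles the statistics in the same t, df, p layout. The data are simulated and the names (locusts, body_length, sex) are hypothetical; they are not the data behind the example above, and the equal-variance test is an assumption made here so that the degrees of freedom are a whole number.

```r
# Simulated (not real) locust lengths: 9 males and 8 females,
# so an equal-variance t-test has 9 + 8 - 2 = 15 degrees of freedom.
set.seed(42)
locusts <- data.frame(
  sex = rep(c("male", "female"), times = c(9, 8)),
  body_length = c(rnorm(9, mean = 50, sd = 3),
                  rnorm(8, mean = 45, sd = 3))
)

# Two-sample t-test assuming equal variances (an assumption of this sketch)
result <- t.test(body_length ~ sex, data = locusts, var.equal = TRUE)

# Very small p-values are conventionally reported as p<0.001
p_text <- if (result$p.value < 0.001) {
  "p<0.001"
} else {
  sprintf("p=%.3f", result$p.value)
}

# Assemble the statistics in the reporting layout used in the text
report <- sprintf("t=%.2f, df=%.0f, %s",
                  result$statistic, result$parameter, p_text)
cat(report, "\n")
```

The sign of t depends on the order of the groups (R orders factor levels alphabetically, so here it is female minus male); the written sentence should still say in words which group was larger.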
1.3.3.2 R code, files and RStudio
This typeface is used to distinguish R code within a sentence of text: e.g. “We use the summary function to obtain information about an object produced by the lm function.”
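To make the example sentence concrete, here is a minimal sketch of calling summary on an object produced by lm. It uses R's built-in cars data set purely for illustration; that choice, and the object names model and model_summary, are assumptions of this sketch rather than something taken from the course material.

```r
# Fit a simple linear regression of stopping distance on speed
# using the built-in `cars` data set (chosen only for illustration).
model <- lm(dist ~ speed, data = cars)

# summary() collects the coefficient table, R-squared and related statistics
model_summary <- summary(model)
print(model_summary)

# Individual pieces can be pulled out of the summary object
coef(model_summary)       # estimates, standard errors, t-values, p-values
model_summary$r.squared   # proportion of variance explained
```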
A sequence of selections from an RStudio menu is indicated as follows: e.g. File ▶ New File ▶ R Script
Data file names referred to in general text are given in upper case in the normal typeface: e.g. MYFILE.CSV.
1.3.4 Feedback
There are a number of ways in which you can obtain feedback on how well you understand the course material.
1.3.4.1 Self-assessment questions:
There are multiple choice questions for you to answer at various points in the learnr tutorials. These provide a bit of instant ‘formative assessment,’ i.e. they allow you to check your understanding of a topic. They will often provide some additional explanation as to why an answer is right (or wrong!).
1.3.4.2 Online drop-in sessions
You will be able to access ‘live,’ online drop-in sessions several times a week, during which you can ask questions about any conceptual or practical aspect of the course causing you problems. These sessions will be run by academic staff and graduate teaching assistants. Details of these sessions will be posted on Blackboard.
1.3.4.3 Working together:
Discussing what you are doing or working through a problem with someone else can help clarify your understanding and we encourage you to work with one another on tutorial exercises. Use whatever chat or video conferencing platform works best for you to make this happen. Bear in mind, however, that you learn little or nothing by simply copying information from someone else, and when it comes to an assessment, it must be your own work.
1.3.5 Overall…
We hope that the material is clear and easy to use, and that you find the course useful, or even enjoy it!
In a text of this size, which is continually being improved and updated, errors do creep in; if you find something you think is wrong please tell us. If it’s not wrong we will happily explain why, and if it is then you will save yourself and others a lot of confusion. Similarly, if you have any comments or suggestions for improving the teaching materials please let us know.
1.4 Health and safety using display screen equipment
Although using a computer may not seem like a particularly risky activity you should be aware that you can suffer ill effects if you work at a computer for long periods without observing a few sensible precautions. The standard guidelines are as follows:
Make sure that your equipment is properly adjusted:
ensure that your lower back is well supported by adjusting the seat back height
adjust your chair seat height so that your forearms are level when using the keyboard
make sure that the front edge of the keyboard is at least 8-10 cm away from the edge of the desk
if you are using a mouse, have it far enough away from the edge of the desk so that your wrist is supported whilst you use it.
Do not have your screen positioned in such a way that there is glare or reflections from the windows or room lights on the screen.
Take regular breaks away from the computer. It is recommended that you take a break of about 10 minutes every hour.