5021.20 - Data Science and Statistics with RStudio
High school mathematics, B-level.
The course introduces modern data science, statistics and the concept of reproducible research using advanced but easy-to-use facilities in RStudio. Compared with using R alone, RStudio makes it much easier to organise and analyse data and to knit code, text, formulas and plots into polished statistical reports. As a computer scientist, data scientist or researcher, you will get an introduction to both modern computational tools and the basic theory of statistical data analysis.
Data Science: data input, organisation, transformation and modelling of tidy data. Introduction to R and RStudio. Configuration and development of RStudio projects for reproducible research. Documentation and presentation: dynamic reports with RStudio.
Statistics: types of data, random sampling and sampling distributions. Probability and random variables. Discrete distributions, including the binomial and Poisson distributions. Continuous distributions, including the normal and exponential distributions. Statistical description and graphs. Point and interval estimation, confidence intervals. Statistical models and hypothesis testing for both numerical and categorical data. Correlation and linear regression, t-test and chi-square test, power of tests, large and small samples. Examples of the use of statistics in computer science and other fields, e.g. human health, biology and sociology.
Learning and teaching approaches
Lectures, computer exercises with open-source R and RStudio, group work and discussion.
As a result of this course, the student will be able to:
- use RStudio to input data and to use tidy data for statistical analysis.
- configure RStudio projects for reproducible research, and produce and use R code for statistics.
- produce clear reports, including tables and plots, that follow the principles of reproducible research with RStudio.
- demonstrate basic knowledge of data, probability, stochastic variables and distributions, and assess which statistical methods can be useful for a given dataset.
The assessment is based on the following elements:
- mandatory assignments (two small and one large), to be handed in on time.
- an oral examination (counts 100%) without preparation time, based on the large assignment and on questions about theory and the small assignments.
The large assignment is presented and defended during the examination and is therefore sent to the external examiner (censor) in good time beforehand. Note: the two small assignments must be handed in on time and approved before the student can be registered for the examination.
Open-source online textbooks:
- R for Data Science (http://r4ds.had.co.nz/).
- OpenIntro Statistics (https://www.openintro.org).
- Possibly other materials introduced during the course.