5021.16 - Data Science and Statistics with RStudio
High school mathematics, B-level.
To introduce modern data science, statistics and the concept of reproducible research using the advanced but easy-to-use facilities of RStudio. Compared with using R alone, RStudio makes it much easier to organise and analyse data, and to knit code, text, formulas and plots into statistical reports, presentations, scientific articles, books and web sites, all without significant use of other software tools. As a computer scientist, data scientist or researcher, you will get an introduction to both modern computational tools and the basic theory of statistical analysis of data.
Data Science: data input, organisation, transformation and modelling of tidy data. Introduction to R and RStudio. Configuration and development of RStudio projects for reproducible research. Documentation and presentation: dynamic reports and slides, web sites with RStudio. Statistics: types of data, random sampling and sampling distributions. Probability and random variables. Discrete distributions, including the binomial and Poisson distributions. Continuous distributions, including the normal and exponential distributions. Computer simulations. Statistical description and graphs. Point and interval estimation, confidence intervals. Statistical models and hypothesis testing for both numerical and categorical data. Correlation, linear and logistic regression, t-test and chi-square test, power of tests, large and small samples. Examples of the use of statistics in computer science and other fields, e.g. human health, biology and sociology.
Learning and teaching approaches
Lectures, computer exercises with open-source R and RStudio, group work and discussion. Open-source textbooks, videos and other modern learning technologies will be used.
As a result of this course, the student will be able to:
- use RStudio to input data, and use tidy data for statistical analysis.
- configure RStudio projects for reproducible research, and produce and use R code for statistics.
- produce tables and plots, reports, presentation slides and web sites.
- demonstrate basic knowledge of data, probability, stochastic variables and distributions, and assess which statistical methods are useful for a given dataset.
- simulate uncertainties and random phenomena.
The assessment is based on the following elements:
- mandatory assignments (two small and one large), to be handed in on time.
- an oral examination (counts 100%) without preparation time, based on the large assignment and on questions about theory and the small assignments.
The large assignment will be presented and defended during the examination. It will therefore be sent to the censor in good time before the examination. Note: the two small assignments must be handed in on time and approved for the student to be registered for the examination.
Open-source online textbooks:
- R for Data Science (http://r4ds.had.co.nz/).
- OpenIntro Statistics (https://www.openintro.org/stat/textbook.php?stat_book=os).
- Possibly other materials introduced during the course.