{LSTbook}: An R package for Lessons in Statistical Thinking (2024)

The {LSTbook} package provides software and datasets for Lessons in Statistical Thinking.

Installation

We hope to have {LSTbook} available on CRAN by February 2024. In the meantime, you can install the development version of {LSTbook} from GitHub with:

Overview

The {LSTbook} package has been developed to help students and instructors learn and teach statistics and early data science. {LSTbook} supports the 2024 textbook Lessons in Statistical Thinking, but instructors may want to use {LSTbook} even with other textbooks.

The statistics component of Lessons may fairly be called a radical innovation. As an introductory, university-level course, Lessons gives students access to important modern themes in statistics including modeling, simulation, co-variation, and causal inference. Data scientists, who use data to make genuine decisions, will get the tools they need. This includes a complete rethinking of statistical inference, starting with confidence intervals very early in the course, then gently introducing the structure of Bayesian inference. The coverage of hypothesis testing has greatly benefited from the discussions prompted by the American Statistical Association’s Statement on P-values and is approached in a way that, I hope, will be appreciated by all sides of the debate.

The data-science part of the course includes the concepts and wrangling needed to undertake statistical investigations (not including data cleaning). It is based, as you might expect, on the tidyverse and {dplyr}.

Some readers may be familiar with the {mosaic} suite of packages, which provides many students and instructors with their first framework for statistical computation. But R has changed a great deal since {mosaic} was introduced in 2011, notably with pipes and the tidyverse style of referring to variables. {mosaic} coexists uneasily with the tidyverse. In contrast, the statistical functions in {LSTbook} fit the tidyverse style and mesh well with {dplyr} commands.

The {LSTbook} function set is highly streamlined and internally consistent. Its computations produce only four types of object:

  • Data frames
  • Graphic frames ({ggplot2}-compatible, but much streamlined)
  • Models, which are summarized to produce either data frames or graphic frames
  • Data simulations (specified as DAGs), which can be sampled to produce data frames

Vignettes provide an instructor-level tutorial introduction to {LSTbook}. The student-facing introduction is the Lessons in Statistical Thinking textbook.

Statistics for data science

Every instructor of introductory statistics is familiar with textbooks that devote separate chapters to each of a half-dozen basic tests: means, differences in means, proportions, differences in proportions, and simple regression. It’s been known for a century that these topics invoke the same statistical concepts. Moreover, they are merely precursors to the essential multivariable modeling techniques used in mainstream data-science tasks such as dealing with confounding.
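The claim that these "separate" tests share the same machinery can be checked directly in base R. The sketch below uses simulated, hypothetical data (not Galton) and compares a classic equal-variance two-sample t test with a linear model containing a two-level grouping variable. It illustrates the statistical point only; it is not how {LSTbook} itself is used.

```r
set.seed(1)
# Hypothetical data: a numeric response measured in two groups, "A" and "B"
dat <- data.frame(
  group = rep(c("A", "B"), each = 50),
  y     = c(rnorm(50, mean = 10), rnorm(50, mean = 11))
)

# Classic equal-variance two-sample t test
tt <- t.test(y ~ group, data = dat, var.equal = TRUE)

# The same comparison expressed as a linear model
fit <- lm(y ~ group, data = dat)
slope <- unname(coef(fit)["groupB"])

# The model coefficient is exactly the difference in group means (B minus A)
diff_means <- mean(dat$y[dat$group == "B"]) - mean(dat$y[dat$group == "A"])

# t.test() reports an interval for mean(A) - mean(B), so the model's
# interval for B - A is the t-test interval with its signs flipped
ci_model <- unname(confint(fit)["groupB", ])
ci_ttest <- -rev(as.numeric(tt$conf.int))
```

Both approaches use the pooled residual standard deviation and the same 98 degrees of freedom, which is why the intervals agree to floating-point precision.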

To illustrate how {LSTbook} supports teaching such topics in a unified and streamlined way, consider two datasets provided by the {mosaicData} package: Galton, which contains the original data used by Francis Galton in the 1880s to study the heritability of genetic traits, specifically human height; and Whickham, which contains results from a 20-year follow-up survey of smoking and health.

Start by installing {LSTbook} as described above, then loading it into the R session:

In the examples that follow, we will use the {LSTbook} function point_plot(), which handles both numerical and categorical variables with a single syntax. Here’s a graphic for looking at the difference between two means.


Point plots can be easily annotated with models. To illustrate the difference between the two means, add a model annotation:


Other annotations available in point_plot() are "violin" and "bw" (box-and-whisker).

In Lessons, models are always graphed in the context of the underlying data and shown as confidence intervals.
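As a rough base-R analogue of the kind of interval a model annotation summarizes, the sketch below fits a model with one mean per group and tabulates a 95% confidence interval for each. The heights are simulated and hypothetical, and the model assumes a pooled residual variance; this is not a description of how point_plot() computes its annotation.

```r
set.seed(2)
# Hypothetical heights (inches) for two groups; not the real Galton data
dat <- data.frame(
  sex    = rep(c("F", "M"), each = 100),
  height = c(rnorm(100, mean = 64, sd = 2.5), rnorm(100, mean = 69, sd = 2.5))
)

# Dropping the intercept (- 1) makes each coefficient a group mean
fit <- lm(height ~ sex - 1, data = dat)

# One row per group: lower bound, model value (the group mean), upper bound
intervals <- data.frame(
  group = sub("^sex", "", names(coef(fit))),
  lower = confint(fit)[, 1],
  mean  = coef(fit),
  upper = confint(fit)[, 2],
  row.names = NULL
)
intervals
```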

The same graphics and modeling conventions apply to categorical variables:

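One way to see why the same modeling conventions carry over to a categorical response is to code a yes/no outcome as 0/1: a linear model on that outcome then reproduces the difference in proportions. The counts below are hypothetical, not the Whickham data, and the normal-theory interval from confint() is only an approximation for a 0/1 outcome.

```r
# Hypothetical survey counts (not the real Whickham numbers):
# 40 of 200 smokers died; 70 of 250 non-smokers died.
dat <- data.frame(
  smoker = c(rep("Yes", 200), rep("No", 250)),
  dead   = c(rep(1, 40), rep(0, 160), rep(1, 70), rep(0, 180))
)

# Proportion dead in each group (the mean of a 0/1 variable is a proportion)
props <- tapply(dat$dead, dat$smoker, mean)

# Linear model on the 0/1 outcome: the smoker coefficient is the
# difference in proportions, Yes minus No ("No" is the reference level)
fit <- lm(dead ~ smoker, data = dat)
coef(fit)["smokerYes"]
confint(fit)["smokerYes", ]
```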

Simple regression works in the same way:

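For simple regression the model annotation is a fitted line, and the least-squares slope has the closed form cov(x, y) / var(x). The base-R sketch below checks that identity on simulated, hypothetical parent and child heights; the variable names only echo Galton's and the generating rule is an assumption.

```r
set.seed(3)
# Hypothetical parent/child heights; the generating slope is set to 0.5
mother <- rnorm(200, mean = 64, sd = 2.3)
height <- 40 + 0.5 * mother + rnorm(200, sd = 3)

fit <- lm(height ~ mother)

# The least-squares slope equals cov(x, y) / var(x)
slope_lm     <- unname(coef(fit)["mother"])
slope_manual <- cov(mother, height) / var(mother)

# Confidence interval on the slope
confint(fit)["mother", ]
```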

The syntax extends naturally to the inclusion of covariates. For example, the simple difference between the proportions of smokers and non-smokers who died is misleading; age, not smoking status, plays the primary role in explaining mortality.

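The reversal described above can be reproduced in base R with a small, fully hypothetical table of counts (not the Whickham data) in which older people smoke less and die more. Age is reduced to two bands here, whereas Whickham records it numerically. Without the covariate, smokers appear to die less often; once age is included, the smoking coefficient becomes positive.

```r
# Build one row per person from group counts
make_group <- function(age, smoker, n, n_dead) {
  data.frame(
    age    = age,
    smoker = smoker,
    dead   = c(rep(1, n_dead), rep(0, n - n_dead))
  )
}

# Hypothetical counts: within each age band, smokers die slightly more often
dat <- rbind(
  make_group("young", "Yes", 100, 10),  # 10% dead
  make_group("young", "No",   50,  4),  #  8% dead
  make_group("old",   "Yes",  20, 16),  # 80% dead
  make_group("old",   "No",  100, 75)   # 75% dead
)

naive    <- lm(dead ~ smoker, data = dat)
adjusted <- lm(dead ~ smoker + age, data = dat)

coef(naive)["smokerYes"]     # negative: smokers appear to die less often
coef(adjusted)["smokerYes"]  # positive: within age bands, smokers die more often
```

With an age-band covariate, the adjusted coefficient is a weighted average of the within-band differences (0.02 and 0.05), so it must lie between them; here it works out to 0.03, while the unadjusted difference is -0.31.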

NOTE: To highlight statistical inference, we have been working with an n = 200 subsample of Galton:
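The sketch below draws such a subsample with base R's sample() rather than assuming any particular {LSTbook} helper; in {dplyr}, slice_sample(n = 200) does the same job. The data frame is a hypothetical stand-in for Galton.

```r
set.seed(4)
# Hypothetical stand-in for a larger data frame such as Galton
full <- data.frame(id = 1:1000, height = rnorm(1000, mean = 66, sd = 3.6))

# Draw 200 distinct rows at random (sample() is without replacement by default)
sub <- full[sample(nrow(full), size = 200), ]
```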

Quantitative modeling has the same syntax, but rather than relying on the default R reports for models, {LSTbook} offers concise summaries.
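To give a feel for what a concise, data-frame-style summary looks like compared with the default summary() printout, here is a small base-R sketch. The helper coef_table() is hypothetical, written only for illustration; it is not part of {LSTbook}, whose own summary functions and column names may differ.

```r
# Hypothetical helper (not an {LSTbook} function): one row per coefficient,
# with the confidence interval laid out as lower / estimate / upper.
coef_table <- function(model, level = 0.95) {
  ci <- confint(model, level = level)
  data.frame(
    term      = rownames(ci),
    lower     = ci[, 1],
    estimate  = coef(model),
    upper     = ci[, 2],
    row.names = NULL
  )
}

set.seed(5)
x <- runif(100)
y <- 2 + 3 * x + rnorm(100)
coef_table(lm(y ~ x))
```

Returning a data frame rather than printed text means the summary can be piped straight into {dplyr} verbs such as filter() or mutate().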

To help students develop a deeper appreciation of the importance of covariates, we can turn to data-generating simulations, where we know the rules behind the data and can check whether modeling reveals them faithfully.


From the rules, we can see that y increases directly with x, with a coefficient of 1. A simple model gets this wrong:
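The same kind of failure can be sketched in base R with hypothetical rules of our own (they are not necessarily those of the {LSTbook} simulation): a common cause c drives both x and y, and y = x + c + noise. Omitting c biases the x coefficient to cov(x, y) / var(x) = 3 / 2 = 1.5 rather than the true value of 1.

```r
set.seed(6)
n <- 10000
# Hypothetical data-generating rules:
#   cvar -> x, cvar -> y, and x -> y with coefficient 1
# (the common cause is named cvar to avoid masking base R's c())
cvar <- rnorm(n)
x    <- cvar + rnorm(n)
y    <- x + cvar + rnorm(n)

# The simple model omits the common cause
simple <- lm(y ~ x)
coef(simple)["x"]  # close to 1.5, not the true value of 1
```

Including cvar in the model formula is the covariate exercise posed in the text.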

I’ll leave it as an exercise to the reader to see what happens when c is included in the model as a covariate.

Finally, here is an advanced example, intended as a demonstration, that illustrates the flexibility that comes from unifying modeling, simulation, and wrangling. We’ll calculate the width of the confidence interval on the x coefficient as a function of the sample size n, averaging over 100 trials.

I’ve used only two trials to show the output of trials(). To get the summary below, increase it to, say, times = 100 and finish off the wrangling with the {dplyr} function summarize(mean(width), .by = sample_size).

#>   sample_size mean(width)
#> 1         100  0.34800368
#> 2         400  0.17059320
#> 3        1600  0.08483481
#> 4        6400  0.04251015
#> 5       25600  0.02123563
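Each fourfold increase in sample size roughly halves the interval width in the table, as expected from width ∝ 1/√n. For readers who want to see that scaling without {LSTbook}, here is a base-R sketch of the same experiment. The generating rule (y = x + noise) is a hypothetical assumption, so the absolute widths will not match the table above, only the halving pattern.

```r
set.seed(7)
# Width of the 95% confidence interval on the x coefficient for one sample of size n
ci_width <- function(n) {
  x  <- rnorm(n)
  y  <- x + rnorm(n)  # hypothetical rule: y = x + noise
  ci <- confint(lm(y ~ x))["x", ]
  unname(ci[2] - ci[1])
}

sample_sizes <- c(100, 400, 1600, 6400, 25600)
times <- 20  # trials per sample size; the text suggests 100

# Average the width over repeated trials at each sample size
mean_width <- sapply(sample_sizes, function(n) mean(replicate(times, ci_width(n))))
data.frame(sample_size = sample_sizes, mean_width = mean_width)
```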