DataSum is built for the first serious look at a dataset. Before modeling, teaching, or publication, analysts need to know what is missing, what is unusual, which variables are skewed, whether normality checks are meaningful, and which columns need closer inspection.
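
For orientation, the checks named above correspond to familiar base R computations. The sketch below is not DataSum's implementation. It assumes Tukey's 1.5 × IQR fences for outliers and moment-based skewness and kurtosis scaled by the sample standard deviation, and it hard-codes a 0.05 alpha for the Shapiro-Wilk decision. All variable names are illustrative.

```r
# Base R sketch of a first-look summary; definitions here are assumptions,
# not DataSum's documented formulas.
x <- c(1, 2, 2, NA, 10)

# Missingness
n_missing <- sum(is.na(x))            # 1
missing_pct <- 100 * mean(is.na(x))   # 20
x_ok <- x[is.finite(x)]
n <- length(x_ok)

# Quartiles (R's default type 7) and interquartile range
q <- quantile(x_ok, c(0.25, 0.75), names = FALSE)  # 1.75 and 4
iqr <- q[2] - q[1]                                  # 2.25

# Outliers: values beyond Tukey's 1.5 * IQR fences (assumed rule)
is_outlier <- x_ok < q[1] - 1.5 * iqr | x_ok > q[2] + 1.5 * iqr
outlier_count <- sum(is_outlier)                    # 1 (the value 10)

# Shape: central moments scaled by the sample standard deviation (assumed)
d <- x_ok - mean(x_ok)
skewness <- mean(d^3) / sd(x_ok)^3
excess_kurtosis <- mean(d^4) / sd(x_ok)^4 - 3

# Normality: shapiro.test() only accepts 3 to 5000 non-missing values
if (n >= 3 && n <= 5000) {
  sw <- shapiro.test(x_ok)
  decision <- if (sw$p.value < 0.05) {
    "Evidence against normality"
  } else {
    "No evidence against normality"
  }
} else {
  decision <- "Not tested"
}
```

For this vector, these definitions agree with the `summarize_vector()` output shown next (skewness about 0.721, excess kurtosis about -1.705, one outlier). That suggests the package uses similar definitions but does not confirm it.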
summarize_vector(c(1, 2, 2, NA, 10), name = "score")
#> variable type n n_complete n_missing missing_pct n_unique mode mode_count
#> 1 score numeric 5 4 1 20 3 2 2
#> mode_ties mean median sd variance minimum q25 q75 maximum range iqr
#> 1 FALSE 3.75 2 4.193249 17.58333 1 1.75 4 10 9 2.25
#> mad skewness excess_kurtosis outlier_count outlier_pct normality_test
#> 1 0.7413 0.7209456 -1.70475 1 25 Shapiro-Wilk
#> normality_statistic normality_p_value normality_alpha
#> 1 0.7252874 0.02203226 0.05
#> normality_decision warning
#> 1 Evidence against normality <NA>

summarize_data(iris)
#> variable type n n_complete n_missing missing_pct n_unique
#> 1 Sepal.Length numeric 150 150 0 0 35
#> 2 Sepal.Width numeric 150 150 0 0 23
#> 3 Petal.Length numeric 150 150 0 0 43
#> 4 Petal.Width numeric 150 150 0 0 22
#> 5 Species factor 150 150 0 0 3
#> mode mode_count mode_ties mean median sd
#> 1 5 10 FALSE 5.843333 5.80 0.8280661
#> 2 3 26 FALSE 3.057333 3.00 0.4358663
#> 3 1.4, 1.5 13 TRUE 3.758000 4.35 1.7652982
#> 4 0.2 29 FALSE 1.199333 1.30 0.7622377
#> 5 setosa, versicolor, virginica 50 TRUE NA NA NA
#> variance minimum q25 q75 maximum range iqr mad skewness
#> 1 0.6856935 4.3 5.1 6.4 7.9 3.6 1.3 1.03782 0.3086407
#> 2 0.1899794 2.0 2.8 3.3 4.4 2.4 0.5 0.44478 0.3126147
#> 3 3.1162779 1.0 1.6 5.1 6.9 5.9 3.5 1.85325 -0.2694109
#> 4 0.5810063 0.1 0.3 1.8 2.5 2.4 1.5 1.03782 -0.1009166
#> 5 NA NA NA NA NA NA NA NA NA
#> excess_kurtosis outlier_count outlier_pct normality_test normality_statistic
#> 1 -0.6058125 0 0.000000 Shapiro-Wilk 0.9760903
#> 2 0.1387047 4 2.666667 Shapiro-Wilk 0.9849179
#> 3 -1.4168574 0 0.000000 Shapiro-Wilk 0.8762681
#> 4 -1.3581792 0 0.000000 Shapiro-Wilk 0.9018349
#> 5 NA 0 NA <NA> NA
#> normality_p_value normality_alpha normality_decision
#> 1 1.018116e-02 0.05 Evidence against normality
#> 2 1.011543e-01 0.05 No evidence against normality
#> 3 7.412263e-10 0.05 Evidence against normality
#> 4 1.680465e-08 0.05 Evidence against normality
#> 5 NA 0.05 Not tested
#> warning
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 Normality requires at least 3 finite numeric values.

Grouped summaries are useful for teaching and comparative research workflows.

summarize_data(iris, by = "Species")
#> Warning in data.frame(..., check.names = FALSE): row names were found from a
#> short variable and have been discarded
#> Warning in data.frame(..., check.names = FALSE): row names were found from a
#> short variable and have been discarded
#> Warning in data.frame(..., check.names = FALSE): row names were found from a
#> short variable and have been discarded
#> Species variable type n n_complete n_missing missing_pct n_unique
#> 1 setosa Sepal.Length numeric 50 50 0 0 15
#> 2 setosa Sepal.Width numeric 50 50 0 0 16
#> 3 setosa Petal.Length numeric 50 50 0 0 9
#> 4 setosa Petal.Width numeric 50 50 0 0 6
#> 5 versicolor Sepal.Length numeric 50 50 0 0 21
#> 6 versicolor Sepal.Width numeric 50 50 0 0 14
#> 7 versicolor Petal.Length numeric 50 50 0 0 19
#> 8 versicolor Petal.Width numeric 50 50 0 0 9
#> 9 virginica Sepal.Length numeric 50 50 0 0 21
#> 10 virginica Sepal.Width numeric 50 50 0 0 13
#> 11 virginica Petal.Length numeric 50 50 0 0 20
#> 12 virginica Petal.Width numeric 50 50 0 0 12
#> mode mode_count mode_ties mean median sd variance minimum
#> 1 5, 5.1 8 TRUE 5.006 5.00 0.3524897 0.12424898 4.3
#> 2 3.4 9 FALSE 3.428 3.40 0.3790644 0.14368980 2.3
#> 3 1.4, 1.5 13 TRUE 1.462 1.50 0.1736640 0.03015918 1.0
#> 4 0.2 29 FALSE 0.246 0.20 0.1053856 0.01110612 0.1
#> 5 5.5, 5.6, 5.7 5 TRUE 5.936 5.90 0.5161711 0.26643265 4.9
#> 6 3 8 FALSE 2.770 2.80 0.3137983 0.09846939 2.0
#> 7 4.5 7 FALSE 4.260 4.35 0.4699110 0.22081633 3.0
#> 8 1.3 13 FALSE 1.326 1.30 0.1977527 0.03910612 1.0
#> 9 6.3 6 FALSE 6.588 6.50 0.6358796 0.40434286 4.9
#> 10 3 12 FALSE 2.974 3.00 0.3224966 0.10400408 2.2
#> 11 5.1 7 FALSE 5.552 5.55 0.5518947 0.30458776 4.5
#> 12 1.8 11 FALSE 2.026 2.00 0.2746501 0.07543265 1.4
#> q25 q75 maximum range iqr mad skewness excess_kurtosis
#> 1 4.800 5.200 5.8 1.5 0.400 0.29652 0.11297784 -0.4508724
#> 2 3.200 3.675 4.4 2.1 0.475 0.37065 0.03872946 0.5959507
#> 3 1.400 1.575 1.9 0.9 0.175 0.14826 0.10009538 0.6539303
#> 4 0.200 0.300 0.6 0.5 0.100 0.00000 1.17963278 1.2587179
#> 5 5.600 6.300 7.0 2.1 0.700 0.51891 0.09913926 -0.6939138
#> 6 2.525 3.000 3.4 1.4 0.475 0.29652 -0.34136443 -0.5493203
#> 7 4.000 4.600 5.1 2.1 0.600 0.51891 -0.57060243 -0.1902555
#> 8 1.200 1.500 1.8 0.8 0.300 0.22239 -0.02933377 -0.5873144
#> 9 6.225 6.900 7.9 3.0 0.675 0.59304 0.11102862 -0.2032597
#> 10 2.800 3.175 3.8 1.6 0.375 0.29652 0.34428489 0.3803832
#> 11 5.100 5.875 6.9 2.4 0.775 0.66717 0.51691747 -0.3651161
#> 12 1.800 2.300 2.5 1.1 0.500 0.29652 -0.12181190 -0.7539586
#> outlier_count outlier_pct normality_test normality_statistic
#> 1 0 0 Shapiro-Wilk 0.9776985
#> 2 2 4 Shapiro-Wilk 0.9717195
#> 3 4 8 Shapiro-Wilk 0.9549768
#> 4 2 4 Shapiro-Wilk 0.7997645
#> 5 0 0 Shapiro-Wilk 0.9778357
#> 6 0 0 Shapiro-Wilk 0.9741333
#> 7 1 2 Shapiro-Wilk 0.9660044
#> 8 0 0 Shapiro-Wilk 0.9476263
#> 9 1 2 Shapiro-Wilk 0.9711794
#> 10 3 6 Shapiro-Wilk 0.9673905
#> 11 0 0 Shapiro-Wilk 0.9621864
#> 12 0 0 Shapiro-Wilk 0.9597715
#> normality_p_value normality_alpha normality_decision warning
#> 1 4.595132e-01 0.05 No evidence against normality <NA>
#> 2 2.715264e-01 0.05 No evidence against normality <NA>
#> 3 5.481147e-02 0.05 No evidence against normality <NA>
#> 4 8.658573e-07 0.05 Evidence against normality <NA>
#> 5 4.647370e-01 0.05 No evidence against normality <NA>
#> 6 3.379951e-01 0.05 No evidence against normality <NA>
#> 7 1.584778e-01 0.05 No evidence against normality <NA>
#> 8 2.727780e-02 0.05 Evidence against normality <NA>
#> 9 2.583147e-01 0.05 No evidence against normality <NA>
#> 10 1.808960e-01 0.05 No evidence against normality <NA>
#> 11 1.097754e-01 0.05 No evidence against normality <NA>
#> 12 8.695419e-02 0.05 No evidence against normality <NA>

profile <- profile_data(iris)
profile$dataset
#> rows columns complete_rows duplicated_rows total_missing missing_pct
#> 1 150 5 150 1 0 0
#> type_profile
#> 1 factor=1, numeric=4
profile$warnings
#> variable level
#> 1 <dataset> duplicates
#> 2 Sepal.Length normality
#> 3 Petal.Length normality
#> 4 Petal.Width normality
#> 5 Species data-quality
#> message
#> 1 Duplicate rows were detected.
#> 2 Normality test suggests evidence against a normal distribution.
#> 3 Normality test suggests evidence against a normal distribution.
#> 4 Normality test suggests evidence against a normal distribution.
#> 5 Normality requires at least 3 finite numeric values.

report_path <- datasum_report(iris, format = "qmd", render = FALSE)
file.exists(report_path)
#> [1] TRUE

The generated Quarto source contains the dataset overview, variable
diagnostics, warnings, formula definitions, and interpretation notes.
Rendering HTML, PDF, or DOCX output is available when the optional
quarto package and Quarto CLI are installed.
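
As a hedged sketch of that optional step, a `.qmd` file can be rendered from R once both the quarto package and the Quarto CLI are detected. The example writes a small stand-in `.qmd` to a temporary file so that it is self-contained. With DataSum, the path returned by `datasum_report()` would presumably take its place. The variables `has_pkg` and `has_cli` are illustrative names, not part of DataSum.

```r
# Stand-in Quarto source; in practice this would be the generated report path.
qmd <- tempfile(fileext = ".qmd")
writeLines(
  c("---", "title: \"Example report\"", "format: html", "---", "", "Hello."),
  qmd
)

# Check for the optional R package and for the Quarto CLI on the PATH.
has_pkg <- requireNamespace("quarto", quietly = TRUE)
has_cli <- nzchar(Sys.which("quarto"))

if (has_pkg && has_cli) {
  # Render next to the source file; PDF or DOCX output needs further tooling.
  quarto::quarto_render(qmd, output_format = "html")
} else {
  message("Quarto not available; keeping the .qmd source only.")
}
```

Checking availability before rendering keeps the `.qmd` source usable on machines without Quarto, which matches the `render = FALSE` call shown earlier.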