1 Department of Bioinformatics, Institute of Computer Science, University of Białystok
✉ Correspondence: Jarosław Kotowicz <j.kotowicz@uwb.edu.pl>
library(readr)  # provides read_delim()
daneMieszkania <- read_delim("http://www.biecek.pl/R/dane/daneMieszkania.csv",
                             ";",
                             escape_double = FALSE,
                             trim_ws = TRUE)
Parsed with column specification:
cols(
cena = col_double(),
pokoi = col_double(),
powierzchnia = col_double(),
dzielnica = col_character(),
`typ budynku` = col_character()
)
cena pokoi powierzchnia dzielnica typ budynku
Min. : 83280 Min. :1.00 Min. :17.00 Length:200 Length:200
1st Qu.:143304 1st Qu.:2.00 1st Qu.:31.15 Class :character Class :character
Median :174935 Median :3.00 Median :43.70 Mode :character Mode :character
Mean :175934 Mean :2.55 Mean :46.20
3rd Qu.:208741 3rd Qu.:3.00 3rd Qu.:61.40
Max. :295762 Max. :4.00 Max. :87.70
cena pokoi powierzchnia dzielnica typ budynku
Min. : 83280 Min. :1.00 Min. :17.00 Biskupin :65 kamienica :61
1st Qu.:143304 1st Qu.:2.00 1st Qu.:31.15 Krzyki :79 niski blok:63
Median :174935 Median :3.00 Median :43.70 Srodmiescie:56 wiezowiec :76
Mean :175934 Mean :2.55 Mean :46.20
3rd Qu.:208741 3rd Qu.:3.00 3rd Qu.:61.40
Max. :295762 Max. :4.00 Max. :87.70
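The two summaries differ because summary() reports only Length/Class/Mode for character columns, but counts per level for factors. The conversion step between them is not shown; the sketch below assumes it was done with as.factor(). The data frame `mieszkania_demo` is a hypothetical toy example, not the real dataset.

```r
# Hypothetical toy data mimicking the structure of daneMieszkania
mieszkania_demo <- data.frame(
  cena      = c(150000, 190000, 170000, 210000),
  dzielnica = c("Biskupin", "Krzyki", "Krzyki", "Srodmiescie"),
  stringsAsFactors = FALSE
)

summary(mieszkania_demo$dzielnica)   # character: only Length / Class / Mode

# Convert the grouping column to a factor so that summary() counts the levels
mieszkania_demo$dzielnica <- as.factor(mieszkania_demo$dzielnica)
summary(mieszkania_demo$dzielnica)   # Biskupin 1, Krzyki 2, Srodmiescie 1

# In the real data the column name contains a space and needs backticks, e.g.
# daneMieszkania$`typ budynku` <- as.factor(daneMieszkania$`typ budynku`)
```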
Call:
lm(formula = cena ~ dzielnica, data = daneMieszkania)
Coefficients:
(Intercept) dzielnicaKrzyki dzielnicaSrodmiescie
189494 -21321 -18351
Interpret the result!
Call:
lm(formula = cena ~ dzielnica - 1, data = daneMieszkania)
Coefficients:
dzielnicaBiskupin dzielnicaKrzyki dzielnicaSrodmiescie
189494 168173 171143
Interpret the result!
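A hint for interpreting the two models: with an intercept (default treatment contrasts) the intercept is the mean price of the reference level (Biskupin) and the remaining coefficients are differences from it; without the intercept each coefficient is a district mean. The outputs agree: 189494 - 21321 = 168173 and 189494 - 18351 = 171143. A minimal sketch on simulated data (the data frame `demo` is hypothetical):

```r
set.seed(1)
# Hypothetical toy data: three districts with different mean prices
demo <- data.frame(
  dzielnica = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 20)),
  cena      = c(rnorm(20, 190000, 40000),
                rnorm(20, 168000, 40000),
                rnorm(20, 171000, 40000))
)

m_treat <- lm(cena ~ dzielnica, data = demo)      # intercept = reference level
m_cell  <- lm(cena ~ dzielnica - 1, data = demo)  # one coefficient per district

group_means <- tapply(demo$cena, demo$dzielnica, mean)

coef(m_treat)[1]                  # mean of the reference level (Biskupin)
coef(m_treat)[-1]                 # differences from the reference level ...
group_means[-1] - group_means[1]  # ... equal to these differences of means
coef(m_cell)                      # simply the three group means
```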
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
dzielnica 2 1.7995e+10 8997691613 5.0456 0.007294 **
Residuals 197 3.5130e+11 1783263361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
dzielnica 3 6.2086e+12 2.0695e+12 1160.5 < 2.2e-16 ***
Residuals 197 3.5130e+11 1.7833e+09
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
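Why the two ANOVA tables differ: both models have the same fitted values, hence the same residual line (3.5130e+11 on 197 df). With an intercept, the dzielnica row tests H0 "all district means are equal" (2 df); without it, the row tests H0 "all district means are zero" (3 df), which is why F = 1160.5 says nothing about differences between districts. The model sums of squares differ by n times the squared overall mean; here 6.2086e+12 - 1.7995e+10 is about 6.19e+12, close to 200 * 175934^2. A sketch on simulated data (`demo` is hypothetical):

```r
set.seed(2)
demo <- data.frame(
  dzielnica = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 20)),
  cena      = rnorm(60, 175000, 40000)
)
m_treat <- lm(cena ~ dzielnica, data = demo)
m_cell  <- lm(cena ~ dzielnica - 1, data = demo)

a_treat <- anova(m_treat)  # H0: all district means are equal (2 df)
a_cell  <- anova(m_cell)   # H0: all district means are zero  (3 df)

# Same fitted values, hence the same residual sum of squares
a_treat["Residuals", "Sum Sq"]
a_cell["Residuals", "Sum Sq"]

# The model sums of squares differ by n * mean(cena)^2
a_cell["dzielnica", "Sum Sq"] - a_treat["dzielnica", "Sum Sq"]
nrow(demo) * mean(demo$cena)^2
```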
1 2 3 4 5 6
58125.20 -61372.64 -52810.03 30920.71 -37869.97 -33252.10
Jarque-Bera test for normality
data: model$residuals
JB = 5.2583, p-value = 0.054
Interpret the result!
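The Jarque-Bera statistic combines sample skewness S and kurtosis K: JB = n/6 * (S^2 + (K - 3)^2 / 4). The same JB = 5.2583 is reported later with p = 0.06 instead of 0.054, which suggests the p-value in this output was obtained by simulation; the asymptotic chi-squared(2) p-value would be exp(-5.2583/2), about 0.072. A sketch of the asymptotic version (the helper `jarque_bera` is hypothetical, moments use divisor n):

```r
# Jarque-Bera statistic with the asymptotic chi-squared(2) p-value
jarque_bera <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  S  <- mean(d^3) / m2^1.5          # sample skewness
  K  <- mean(d^4) / m2^2            # sample kurtosis (3 for a normal)
  jb <- n / 6 * (S^2 + (K - 3)^2 / 4)
  list(statistic = jb,
       p.value   = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(3)
jarque_bera(rnorm(200))   # normal data: usually a large p-value
jarque_bera(rexp(200))    # strongly skewed data: tiny p-value
```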
Durbin-Watson test
data: daneMieszkania$cena ~ daneMieszkania$dzielnica
DW = 2.1565, p-value = 0.8655
alternative hypothesis: true autocorrelation is greater than 0
Interpret the result!
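The Durbin-Watson statistic is the sum of squared differences of successive residuals divided by the residual sum of squares. Values near 2 (here 2.1565) indicate no first-order autocorrelation; values near 0 or 4 indicate positive or negative autocorrelation. Because it depends on the row order, the test is only informative when that order means something (e.g. time), which may not be the case for this dataset. A sketch (the helper `durbin_watson` is hypothetical):

```r
# Durbin-Watson statistic computed from a vector of residuals
durbin_watson <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(4)
x <- 1:100
y <- 3 + 0.5 * x + rnorm(100)          # independent errors
durbin_watson(residuals(lm(y ~ x)))    # close to 2
```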
Goldfeld-Quandt test
data: daneMieszkania$cena ~ daneMieszkania$dzielnica
GQ = 1.0691, df1 = 97, df2 = 97, p-value = 0.3713
alternative hypothesis: variance increases from segment 1 to 2
Interpret the result!
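The Goldfeld-Quandt test fits the model separately to the first and second part of the ordered data and compares residual variances with an F test. The reported df1 = df2 = 97 are consistent with two halves of 100 observations and 3 coefficients each. A simplified sketch with equal halves and no omitted middle observations (the helper `goldfeld_quandt` and the data are hypothetical):

```r
# Compare residual variances of the second and first half of the data
goldfeld_quandt <- function(formula, data) {
  n      <- nrow(data)
  half   <- floor(n / 2)
  first  <- lm(formula, data = data[1:half, , drop = FALSE])
  second <- lm(formula, data = data[(n - half + 1):n, , drop = FALSE])
  df1 <- df.residual(second)
  df2 <- df.residual(first)
  gq  <- (deviance(second) / df1) / (deviance(first) / df2)
  list(statistic = gq, df1 = df1, df2 = df2,
       # alternative: variance increases from the first to the second segment
       p.value = pf(gq, df1, df2, lower.tail = FALSE))
}

set.seed(5)
demo <- data.frame(x = 1:200)
demo$y <- 2 + 0.3 * demo$x + rnorm(200, sd = demo$x / 20)  # variance grows with x
goldfeld_quandt(y ~ x, demo)
```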
mean sd
1.1618982 2.0215240
(0.2021524) (0.1429433)
mean sd
1.161898 2.021524
meanlog sdlog
0.2618982 2.0215240
meanlog sdlog
0.2618982 2.0215240
(0.2021524) (0.1429433)
NaNs produced (the same warning was issued repeatedly)
shape rate
0.36475915 0.04496287
(0.04138815) (0.00901971)
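The estimates with standard errors in parentheses appear to come from maximum-likelihood fitting (the layout matches MASS::fitdistr). For the normal distribution the ML estimates have closed forms: the mean is the sample mean, the sd uses divisor n rather than n - 1, and the standard errors are sd/sqrt(n) and sd/sqrt(2n). The printed values fit this with n = 100: 2.0215240 / 0.2021524 = 10. A sketch (the helper `fit_normal_ml` is hypothetical):

```r
# Closed-form ML fit of a normal distribution with asymptotic standard errors
fit_normal_ml <- function(x) {
  n     <- length(x)
  mu    <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))   # ML estimate: divisor n, not n - 1
  c(mean = mu, sd = sigma,
    se_mean = sigma / sqrt(n), se_sd = sigma / sqrt(2 * n))
}

fit_normal_ml(c(1, 2, 3, 4, 5))
```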
library(readr)
daneMieszkania <- read_delim("http://www.biecek.pl/R/dane/daneMieszkania.csv",
";", escape_double = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
cena = col_double(),
pokoi = col_double(),
powierzchnia = col_double(),
dzielnica = col_character(),
`typ budynku` = col_character()
)
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
√ ggplot2 3.3.0 √ dplyr 0.8.5
√ tibble 3.0.1 √ stringr 1.4.0
√ tidyr 1.0.2 √ forcats 0.5.0
√ purrr 0.3.4
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
x dplyr::select() masks MASS::select()
cena pokoi powierzchnia dzielnica typ budynku
Min. : 83280 Min. :1.00 Min. :17.00 Length:200 Length:200
1st Qu.:143304 1st Qu.:2.00 1st Qu.:31.15 Class :character Class :character
Median :174935 Median :3.00 Median :43.70 Mode :character Mode :character
Mean :175934 Mean :2.55 Mean :46.20
3rd Qu.:208741 3rd Qu.:3.00 3rd Qu.:61.40
Max. :295762 Max. :4.00 Max. :87.70
t.test((daneMieszkania[daneMieszkania$dzielnica == "Biskupin",])$cena,
(daneMieszkania[daneMieszkania$dzielnica == "Krzyki",])$cena)
Welch Two Sample t-test
data: (daneMieszkania[daneMieszkania$dzielnica == "Biskupin", ])$cena and (daneMieszkania[daneMieszkania$dzielnica == "Krzyki", ])$cena
t = 2.9793, df = 140.82, p-value = 0.003404
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
7173.093 35468.945
sample estimates:
mean of x mean of y
189494 168173
Interpret the result!
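The Welch test estimates the same difference as the dzielnicaKrzyki coefficient of the linear model (189494 - 168173 = 21321), but it does not assume equal variances in the two groups, hence the fractional df = 140.82. A sketch of the statistic and the Welch-Satterthwaite degrees of freedom, checked against t.test() (the helper `welch_t` and the simulated prices are hypothetical):

```r
# Welch two-sample t statistic with Welch-Satterthwaite degrees of freedom
welch_t <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  t  <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t, df = df, p.value = 2 * pt(-abs(t), df))
}

set.seed(6)
a <- rnorm(65, 190000, 40000)   # hypothetical Biskupin-like prices
b <- rnorm(79, 168000, 45000)   # hypothetical Krzyki-like prices
welch_t(a, b)
t.test(a, b)                    # same t, df and p-value
```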
Call:
lm(formula = cena ~ dzielnica, data = daneMieszkania)
Coefficients:
(Intercept) dzielnicaKrzyki dzielnicaSrodmiescie
189494 -21321 -18351
Interpret the result!
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
dzielnica 2 1.7995e+10 8997691613 5.0456 0.007294 **
Residuals 197 3.5130e+11 1783263361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
Call:
lm(formula = cena ~ dzielnica - 1, data = daneMieszkania)
Coefficients:
dzielnicaBiskupin dzielnicaKrzyki dzielnicaSrodmiescie
189494 168173 171143
Interpret the result!
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
dzielnica 3 6.2086e+12 2.0695e+12 1160.5 < 2.2e-16 ***
Residuals 197 3.5130e+11 1.7833e+09
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
The following objects are masked from daneMieszkania (pos = 13):
cena, dzielnica, pokoi, powierzchnia, typ budynku
There were 28 warnings (use warnings() to see them)
Call:
lm(formula = cena ~ typ, data = daneMieszkania)
Coefficients:
(Intercept) typniski blok typwiezowiec
178318 10473 -14955
Interpret the result!
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
typ 2 2.2770e+10 1.1385e+10 6.4725 0.001895 **
Residuals 197 3.4653e+11 1.7590e+09
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
Call:
lm(formula = cena ~ typ - 1, data = daneMieszkania)
Coefficients:
typkamienica typniski blok typwiezowiec
178318 188791 163363
Interpret the result!
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
typ 3 6.2133e+12 2.0711e+12 1177.4 < 2.2e-16 ***
Residuals 197 3.4653e+11 1.7590e+09
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!
[1] 200
One-sample Kolmogorov-Smirnov test
data: model4$residuals
D = 0.505, p-value < 2.2e-16
alternative hypothesis: two-sided
Interpret the result!
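A hint for D = 0.505: by default ks.test(x, "pnorm") compares the data with the standard normal N(0, 1). Residuals on the price scale (tens of thousands) are almost all far below -1 or far above 1, so D is roughly the larger of the fractions of negative and positive residuals, i.e. about 0.5. Comparing with a normal distribution of matching scale is more sensible, although estimating the sd from the same data makes the p-value conservative (the Lilliefors test, e.g. nortest::lillie.test, corrects for this). A sketch on simulated residuals (`res` is hypothetical):

```r
set.seed(7)
res <- rnorm(200, mean = 0, sd = 42000)   # hypothetical residuals on the price scale

ks_raw    <- ks.test(res, "pnorm")              # against N(0, 1): D close to 0.5
ks_scaled <- ks.test(res, "pnorm", 0, sd(res))  # against N(0, sd(res))
ks_raw
ks_scaled
```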
Anderson-Darling normality test
data: model4$residuals
A = 0.63027, p-value = 0.09918
Interpret the result!
Jarque Bera Test
data: model4$residuals
X-squared = 4.0672, df = 2, p-value = 0.1309
Interpret the result!
Length Class Mode
100 character character
[1] "z" "z" "x" "y" "x" "y" "y" "x" "y" "y" "y" "x" "y" "y" "x" "z" "z" "y" "y" "y" "y" "y" "y"
[24] "x" "z" "y" "x" "x" "y" "x" "y" "x" "y" "x" "y" "y" "x" "y" "x" "y" "y" "y" "z" "y" "y" "x"
[47] "y" "y" "y" "x" "x" "y" "y" "x" "y" "y" "y" "x" "x" "y" "y" "z" "x" "z" "x" "y" "y" "z" "y"
[70] "y" "x" "x" "y" "y" "x" "y" "y" "y" "y" "y" "z" "x" "x" "z" "z" "x" "y" "x" "x" "z" "y" "x"
[93] "z" "y" "y" "y" "z" "x" "z" "y"
a b
x 21 10
y 41 12
z 10 6
Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: tablica
X-squared = 1.7499, df = 2, p-value = 0.4169
Interpret the result!
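The warning comes from a small expected count. Under independence the expected count of a cell is row total * column total / grand total; for the cell (z, b) this is 16 * 28 / 100 = 4.48, below the usual threshold of 5. The sketch below rebuilds the table from the output above, reproduces X-squared = 1.7499 and shows the simulation-based p-value that chisq.test() offers as an alternative:

```r
# Contingency table copied from the output above
tablica <- as.table(matrix(c(21, 41, 10, 10, 12, 6), nrow = 3,
                           dimnames = list(c("x", "y", "z"), c("a", "b"))))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tablica), colSums(tablica)) / sum(tablica)
expected                      # cell (z, b) is 4.48 < 5

ct <- chisq.test(tablica)     # emits the warning because of that cell
ct$statistic                  # X-squared = 1.7499

# Monte Carlo p-value, which does not rely on the chi-squared approximation
chisq.test(tablica, simulate.p.value = TRUE, B = 2000)
```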
Pearson's product-moment correlation
data: x.norm and x.gamma
t = -1.9241, df = 998, p-value = 0.05462
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.122326536 0.001203135
sample estimates:
cor
-0.06079448
Interpret the result!
Spearman's rank correlation rho
data: x.norm and x.gamma
S = 177858726, p-value = 0.03374
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
-0.06715342
Interpret the result!
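Spearman's rho is Pearson's correlation computed on ranks, so it is insensitive to monotone transformations and to the skewness of the gamma sample. The statistic S printed by cor.test() is (n^3 - n)(1 - rho)/6, which for untied data equals the sum of squared rank differences; with n = 1000 and rho = -0.06715342 this gives about 177858726, as reported. A sketch on simulated data (`x`, `y` are hypothetical):

```r
set.seed(8)
x <- rnorm(50)
y <- rgamma(50, shape = 2)

rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y))     # identical when there are no ties

# S reported by cor.test: sum of squared rank differences = (n^3 - n)(1 - rho) / 6
n <- length(x)
S <- sum((rank(x) - rank(y))^2)
cor.test(x, y, method = "spearman")$statistic
(n^3 - n) * (1 - rho_builtin) / 6
```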
Pearson's product-moment correlation
data: x.norm[1:100] and x.norm[901:1000]
t = 1.5964, df = 98, p-value = 0.1136
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.0384156 0.3448386
sample estimates:
cor
0.1592038
Interpret the result!
Remember the assumptions!
F test to compare two variances
data: x.norm and x.norm2
F = 1, num df = 999, denom df = 999, p-value = 1
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.8832987 1.1321198
sample estimates:
ratio of variances
1
Interpret the result!
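The F test statistic is simply the ratio of the two sample variances, with n1 - 1 and n2 - 1 degrees of freedom. F = 1 exactly, as in the first test, means the two samples have identical sample variances (which would happen, for instance, if one sample were a shifted copy of the other). The test relies on normality; the Mood and Ansari-Bradley tests that follow are rank-based alternatives. A sketch checked against var.test() (data are simulated and hypothetical):

```r
set.seed(9)
x <- rnorm(100)
y <- rnorm(100, sd = 1.2)

f   <- var(x) / var(y)            # ratio of sample variances
df1 <- length(x) - 1
df2 <- length(y) - 1
p   <- 2 * min(pf(f, df1, df2), pf(f, df1, df2, lower.tail = FALSE))  # two-sided
c(F = f, p.value = p)
var.test(x, y)                    # same F and p-value
```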
F test to compare two variances
data: x.norm[1:100] and x.norm[901:1000]
F = 1.1009, num df = 99, denom df = 99, p-value = 0.6334
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7407362 1.6362053
sample estimates:
ratio of variances
1.100907
Interpret the result!
Mood two-sample test of scale
data: x.norm and x.norm2
Z = -0.15617, p-value = 0.8759
alternative hypothesis: two.sided
Interpret the result!
Ansari-Bradley test
data: x.norm and x.norm2
AB = 501504, p-value = 0.8764
alternative hypothesis: true ratio of scales is not equal to 1
Interpret the result!
1-sample proportions test with continuity correction
data: 399 out of 1000, null probability 0.5
X-squared = 40.401, df = 1, p-value = 2.068e-10
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3685995 0.4301862
sample estimates:
p
0.399
Interpret the result!
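The one-sample proportion test statistic with continuity correction is (|x - n p0| - 0.5)^2 / (n p0 (1 - p0)), where the correction is capped at |x - n p0|. It reproduces the outputs in this section: 40.401 for 399 of 1000 with p0 = 0.5, 0.26042 for 63 of 100 with p0 = 0.6 and 9.503 for 295 of 750 with p0 = 0.45. Note also that the confidence interval does not depend on p0, which is why the tests with p0 = 0.6 and 0.7 report the same interval. A sketch (the helper `prop_stat` is hypothetical):

```r
# Score statistic for H0: p = p0, with Yates continuity correction
prop_stat <- function(x, n, p0) {
  yates <- min(0.5, abs(x - n * p0))   # correction never exceeds |x - n p0|
  (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
}

prop_stat(399, 1000, 0.5)                # 40.401
prop.test(399, 1000, p = 0.5)$statistic  # the same value
```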
[1] 63
1-sample proportions test with continuity correction
data: sum(probka3) out of length(probka3), null probability 0.6
X-squared = 0.26042, df = 1, p-value = 0.6098
alternative hypothesis: true p is not equal to 0.6
95 percent confidence interval:
0.5271463 0.7227373
sample estimates:
p
0.63
Interpret the result!
1-sample proportions test with continuity correction
data: sum(probka3) out of length(probka3), null probability 0.7
X-squared = 2.0119, df = 1, p-value = 0.1561
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
0.5271463 0.7227373
sample estimates:
p
0.63
Interpret the result!
[1] 0.1322852
[1] 0.1311428
[1] 0.1322852
[1] 0.1112382
-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
√ ggplot2 3.3.0 √ purrr 0.3.4
√ tibble 3.0.1 √ dplyr 0.8.5
√ tidyr 1.0.2 √ stringr 1.4.0
√ readr 1.3.1 √ forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
daneMieszkania <- read_delim("http://www.biecek.pl/R/dane/daneMieszkania.csv",
";",
escape_double = FALSE,
trim_ws = TRUE)
Parsed with column specification:
cols(
cena = col_double(),
pokoi = col_double(),
powierzchnia = col_double(),
dzielnica = col_character(),
`typ budynku` = col_character()
)
cena pokoi powierzchnia dzielnica typ budynku
Min. : 83280 Min. :1.00 Min. :17.00 Biskupin :65 kamienica :61
1st Qu.:143304 1st Qu.:2.00 1st Qu.:31.15 Krzyki :79 niski blok:63
Median :174935 Median :3.00 Median :43.70 Srodmiescie:56 wiezowiec :76
Mean :175934 Mean :2.55 Mean :46.20
3rd Qu.:208741 3rd Qu.:3.00 3rd Qu.:61.40
Max. :295762 Max. :4.00 Max. :87.70
model <- lm(cena ~ dzielnica, data = daneMieszkania)
model_1 <- lm(cena ~ dzielnica - 1, data = daneMieszkania)
Durbin-Watson test
data: model
DW = 2.1565, p-value = 0.8655
alternative hypothesis: true autocorrelation is greater than 0
Interpret the result!
1 2 3 4 5 6 7 8
58125.195 -61372.644 -52810.033 30920.707 -37869.975 -33252.104 -2761.984 -69247.315
9 10
22043.257 -51866.395
Cramer-von Mises normality test
data: model$residuals
W = 0.102, p-value = 0.1048
Interpret the result!
Shapiro-Francia normality test
data: model$residuals
W = 0.98656, p-value = 0.05273
Interpret the result!
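The Shapiro-Francia statistic W' is the squared correlation between the ordered sample and approximate expected normal order statistics (Blom scores, qnorm(ppoints(n, a = 3/8))). Values close to 1 support normality. The sketch computes only the statistic; the p-value printed above needs a separate approximation that is not reproduced here. The helper `shapiro_francia_w` is hypothetical:

```r
# Shapiro-Francia W': squared correlation of the sorted sample with Blom scores
shapiro_francia_w <- function(x) {
  x <- sort(x)
  m <- qnorm(ppoints(length(x), a = 3/8))
  cor(x, m)^2
}

set.seed(10)
shapiro_francia_w(rnorm(100))   # close to 1
shapiro_francia_w(rexp(100))    # noticeably smaller
```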
Jarque-Bera test for normality
data: model$residuals
JB = 5.2583, p-value = 0.06
Interpret the result!
F test to compare two variances
data: x.norm[1:100] and x.norm[301:400]
F = 1.0601, num df = 99, denom df = 99, p-value = 0.7721
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7132824 1.5755629
sample estimates:
ratio of variances
1.060104
Interpret the result!
Mood two-sample test of scale
data: x.norm[1:100] and x.norm[901:1000]
Z = 0.004164, p-value = 0.9967
alternative hypothesis: two.sided
Interpret the result!
Ansari-Bradley test
data: x.norm[501:600] and x.norm[701:800]
AB = 5116, p-value = 0.747
alternative hypothesis: true ratio of scales is not equal to 1
Interpret the result!
1-sample proportions test with continuity correction
data: 295 out of 750, null probability 0.45
X-squared = 9.503, df = 1, p-value = 0.002051
alternative hypothesis: true p is not equal to 0.45
95 percent confidence interval:
0.3583488 0.4294256
sample estimates:
p
0.3933333
Interpret the result!
1-sample proportions test with continuity correction
data: 295 out of 750, null probability 0.4
X-squared = 0.1125, df = 1, p-value = 0.7373
alternative hypothesis: true p is not equal to 0.4
95 percent confidence interval:
0.3583488 0.4294256
sample estimates:
p
0.3933333
Interpret the result!
[1] 804
1-sample proportions test with continuity correction
data: sum(proba) out of length(proba), null probability 0.78
X-squared = 3.2182, df = 1, p-value = 0.07282
alternative hypothesis: true p is not equal to 0.78
95 percent confidence interval:
0.7777307 0.8278955
sample estimates:
p
0.804
Interpret the result!
Pearson's Chi-squared test
data: tablica
X-squared = 0.00060927, df = 2, p-value = 0.9997
Interpret the result!
The chi-squared test applied in this way is valid only for a multivariate normal distribution.
Pearson's product-moment correlation
data: x.norm and x.gamma
t = 0.70572, df = 998, p-value = 0.4805
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.03971457 0.08420999
sample estimates:
cor
0.02233349
Interpret the result!
Spearman's rank correlation rho
data: x.norm and x.gamma
S = 165662512, p-value = 0.8491
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.006023934
Interpret the result!
[1] 0.002468333
[1] 0.002468333
[1] 0.002468333
[1] 0.002468325
[1] 0.002075612
library(readr)
daneMieszkania <- read_delim("http://www.biecek.pl/R/dane/daneMieszkania.csv",
";", escape_double = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
cena = col_double(),
pokoi = col_double(),
powierzchnia = col_double(),
dzielnica = col_character(),
`typ budynku` = col_character()
)
cena pokoi powierzchnia dzielnica typ budynku
Min. :120290 Min. :1.000 Min. :17.10 Biskupin :65 kamienica :26
1st Qu.:156655 1st Qu.:2.000 1st Qu.:35.20 Krzyki : 0 niski blok:17
Median :189291 Median :3.000 Median :45.10 Srodmiescie: 0 wiezowiec :22
Mean :189494 Mean :2.585 Mean :47.05
3rd Qu.:214462 3rd Qu.:3.000 3rd Qu.:61.20
Max. :295762 Max. :4.000 Max. :87.70
[1] 65 5
One Sample t-test
data: daneMieszkania.Biskupin$cena
t = 37.585, df = 64, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
179422 199566
sample estimates:
mean of x
189494
Interpret the result!
Welch Two Sample t-test
data: daneMieszkania.krzyki$cena and daneMieszkania.Biskupin$cena
t = -2.9793, df = 140.82, p-value = 0.003404
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-35468.945 -7173.093
sample estimates:
mean of x mean of y
168173 189494
Interpret the result!
Welch Two Sample t-test
data: daneMieszkania.Biskupin$cena and daneMieszkania.srodmiescie$cena
t = 2.5079, df = 117.13, p-value = 0.01351
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3859.764 32841.317
sample estimates:
mean of x mean of y
189494.0 171143.5
Interpret the result!
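Running a separate Welch test for each pair of districts, as above, inflates the chance of at least one false positive. Base R offers adjusted alternatives: pairwise.t.test() with unpooled variances and Holm-adjusted p-values, and TukeyHSD() on the one-way ANOVA. A sketch on simulated prices with the group sizes from the summary (65, 79, 56); the data frame `demo` is hypothetical:

```r
set.seed(11)
demo <- data.frame(
  dzielnica = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"), times = c(65, 79, 56))),
  cena      = c(rnorm(65, 189000, 40000),
                rnorm(79, 168000, 40000),
                rnorm(56, 171000, 40000))
)

# Welch t-tests for every pair of districts, p-values adjusted with Holm's method
pw <- pairwise.t.test(demo$cena, demo$dzielnica,
                      pool.sd = FALSE, p.adjust.method = "holm")
pw

# Tukey's honest significant differences based on the one-way ANOVA
tk <- TukeyHSD(aov(cena ~ dzielnica, data = demo))
tk
```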
model <- lm(cena ~ dzielnica, data = daneMieszkania)
model2 <- lm(cena ~ dzielnica - 1, data = daneMieszkania)
Call:
lm(formula = cena ~ dzielnica, data = daneMieszkania)
Coefficients:
(Intercept) dzielnicaKrzyki dzielnicaSrodmiescie
189494 -21321 -18351
Analysis of Variance Table
Response: cena
Df Sum Sq Mean Sq F value Pr(>F)
dzielnica 2 1.7995e+10 8997691613 5.0456 0.007294 **
Residuals 197 3.5130e+11 1783263361
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpret the result!