Overview

Dataset statistics

Number of variables: 10
Number of observations: 392
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.8 KiB
Average record size in memory: 80.3 B

Variable types

Numeric: 6
Categorical: 4

Alerts

name has a high cardinality: 301 distinct values (High cardinality)
mpg is highly correlated with cyl and 7 other fields (High correlation)
cyl is highly correlated with mpg and 6 other fields (High correlation)
displ is highly correlated with mpg and 6 other fields (High correlation)
hp is highly correlated with mpg and 6 other fields (High correlation)
weight is highly correlated with mpg and 6 other fields (High correlation)
accel is highly correlated with mpg and 4 other fields (High correlation)
yr is highly correlated with mpg (High correlation)
origin is highly correlated with mpg and 5 other fields (High correlation)
mfr is highly correlated with mpg and 5 other fields (High correlation)
name is uniformly distributed (Uniform)

Reproduction

Analysis started: 2022-09-20 15:16:11.906631
Analysis finished: 2022-09-20 15:16:15.454252
Duration: 3.55 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

mpg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 127
Distinct (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.44591837
Minimum: 9
Maximum: 46.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 9
5-th percentile: 13
Q1: 17
Median: 22.75
Q3: 29
95-th percentile: 37
Maximum: 46.6
Range: 37.6
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 7.805007487
Coefficient of variation (CV): 0.3328940826
Kurtosis: -0.5159934946
Mean: 23.44591837
Median Absolute Deviation (MAD): 5.8
Skewness: 0.4570923231
Sum: 9190.8
Variance: 60.91814187
Monotonicity: Not monotonic
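
Several of the figures above follow from the others by simple arithmetic. As a rough sanity check, the sketch below recomputes the range, the IQR, the coefficient of variation, the variance, the distinct percentage and the mean from the numbers reported for mpg. The variable names are illustrative only and are not part of pandas-profiling.

```python
# Values copied from the mpg tables in this report.
n_obs = 392
distinct = 127
minimum, maximum = 9.0, 46.6
q1, q3 = 17.0, 29.0
mean = 23.44591837
std = 7.805007487
total = 9190.8

value_range = maximum - minimum        # Range = Maximum - Minimum
iqr = q3 - q1                          # Interquartile range = Q3 - Q1
cv = std / mean                        # Coefficient of variation = std / mean
variance = std ** 2                    # Variance = squared standard deviation
distinct_pct = 100 * distinct / n_obs  # Distinct (%) = distinct / observations
mean_from_sum = total / n_obs          # Mean recomputed from Sum

print(f"range={value_range:.1f} iqr={iqr:.0f} cv={cv:.4f}")
print(f"variance={variance:.3f} distinct={distinct_pct:.1f}% mean={mean_from_sum:.4f}")
```

The recomputed values agree with the reported ones up to rounding of the displayed digits.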
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
13                  20     5.1%
14                  19     4.8%
18                  17     4.3%
15                  16     4.1%
26                  14     3.6%
16                  13     3.3%
19                  12     3.1%
24                  11     2.8%
28                  10     2.6%
22                  10     2.6%
Other values (117)  250    63.8%

Value               Count  Frequency (%)
9                   1      0.3%
10                  2      0.5%
11                  4      1.0%
12                  6      1.5%
13                  20     5.1%
14                  19     4.8%
14.5                1      0.3%
15                  16     4.1%
15.5                5      1.3%
16                  13     3.3%

Value               Count  Frequency (%)
46.6                1      0.3%
44.6                1      0.3%
44.3                1      0.3%
44                  1      0.3%
43.4                1      0.3%
43.1                1      0.3%
41.5                1      0.3%
40.8                1      0.3%
39.4                1      0.3%
39.1                1      0.3%

cyl
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value  Count
4      199
8      103
6      83
3      4
5      3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 392
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8
2nd row: 8
3rd row: 8
4th row: 8
5th row: 8

Common Values

Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%
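
Each Frequency (%) figure is the count divided by the number of observations (392). A minimal sketch with `collections.Counter` is shown below. The list `cyl` is rebuilt from the counts reported above purely for illustration; it is not read from the original data file.

```python
from collections import Counter

# Rebuild a cyl column from the counts reported in the table above.
reported_counts = {"4": 199, "8": 103, "6": 83, "3": 4, "5": 3}
cyl = [value for value, count in reported_counts.items() for _ in range(count)]

counts = Counter(cyl)
n_obs = len(cyl)

# Value, count and relative frequency in percent, most common first.
table = [(value, count, round(100 * count / n_obs, 1))
         for value, count in counts.most_common()]

for value, count, pct in table:
    print(f"{value:<6}{count:<7}{pct:.1f}%")
```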

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%

Most occurring characters

Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      392    100.0%

Most frequent character per category

Decimal Number
Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%

Most occurring scripts

Value               Count  Frequency (%)
Common              392    100.0%

Most frequent character per script

Common
Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               392    100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
4                   199    50.8%
8                   103    26.3%
6                   83     21.2%
3                   4      1.0%
5                   3      0.8%

displ
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 81
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 194.4119898
Minimum: 68
Maximum: 455
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 68
5-th percentile: 85
Q1: 105
Median: 151
Q3: 275.75
95-th percentile: 400
Maximum: 455
Range: 387
Interquartile range (IQR): 170.75

Descriptive statistics

Standard deviation: 104.6440039
Coefficient of variation (CV): 0.5382590036
Kurtosis: -0.7783169302
Mean: 194.4119898
Median Absolute Deviation (MAD): 61
Skewness: 0.7016690997
Sum: 76209.5
Variance: 10950.36755
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
97                  21     5.4%
350                 18     4.6%
98                  17     4.3%
318                 17     4.3%
250                 17     4.3%
140                 15     3.8%
400                 13     3.3%
225                 13     3.3%
91                  12     3.1%
302                 11     2.8%
Other values (71)   238    60.7%

Value               Count  Frequency (%)
68                  1      0.3%
70                  3      0.8%
71                  2      0.5%
72                  1      0.3%
76                  1      0.3%
78                  1      0.3%
79                  6      1.5%
80                  1      0.3%
81                  1      0.3%
83                  1      0.3%

Value               Count  Frequency (%)
455                 3      0.8%
454                 1      0.3%
440                 2      0.5%
429                 3      0.8%
400                 13     3.3%
390                 1      0.3%
383                 2      0.5%
360                 4      1.0%
351                 8      2.0%
350                 18     4.6%

hp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 93
Distinct (%): 23.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104.4693878
Minimum: 46
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 46
5-th percentile: 60.55
Q1: 75
Median: 93.5
Q3: 126
95-th percentile: 180
Maximum: 230
Range: 184
Interquartile range (IQR): 51

Descriptive statistics

Standard deviation: 38.49115993
Coefficient of variation (CV): 0.3684443908
Kurtosis: 0.6969469997
Mean: 104.4693878
Median Absolute Deviation (MAD): 19.5
Skewness: 1.087326282
Sum: 40952
Variance: 1481.569393
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
150                 22     5.6%
90                  20     5.1%
88                  19     4.8%
110                 18     4.6%
100                 17     4.3%
75                  14     3.6%
95                  14     3.6%
105                 12     3.1%
70                  12     3.1%
67                  12     3.1%
Other values (83)   232    59.2%

Value               Count  Frequency (%)
46                  2      0.5%
48                  3      0.8%
49                  1      0.3%
52                  4      1.0%
53                  2      0.5%
54                  1      0.3%
58                  2      0.5%
60                  5      1.3%
61                  1      0.3%
62                  2      0.5%

Value               Count  Frequency (%)
230                 1      0.3%
225                 3      0.8%
220                 1      0.3%
215                 3      0.8%
210                 1      0.3%
208                 1      0.3%
200                 1      0.3%
198                 2      0.5%
193                 1      0.3%
190                 3      0.8%

weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 346
Distinct (%): 88.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2977.584184
Minimum: 1613
Maximum: 5140
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1613
5-th percentile: 1931.6
Q1: 2225.25
Median: 2803.5
Q3: 3614.75
95-th percentile: 4464
Maximum: 5140
Range: 3527
Interquartile range (IQR): 1389.5

Descriptive statistics

Standard deviation: 849.40256
Coefficient of variation (CV): 0.2852656743
Kurtosis: -0.8092593883
Mean: 2977.584184
Median Absolute Deviation (MAD): 639.5
Skewness: 0.5195856741
Sum: 1167213
Variance: 721484.709
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
1985                4      1.0%
2130                4      1.0%
2125                3      0.8%
2720                3      0.8%
2945                3      0.8%
2300                3      0.8%
2265                3      0.8%
2155                3      0.8%
2408                2      0.5%
2164                2      0.5%
Other values (336)  362    92.3%

Value               Count  Frequency (%)
1613                1      0.3%
1649                1      0.3%
1755                1      0.3%
1760                1      0.3%
1773                1      0.3%
1795                2      0.5%
1800                2      0.5%
1825                2      0.5%
1834                1      0.3%
1835                1      0.3%

Value               Count  Frequency (%)
5140                1      0.3%
4997                1      0.3%
4955                1      0.3%
4952                1      0.3%
4951                1      0.3%
4906                1      0.3%
4746                1      0.3%
4735                1      0.3%
4732                1      0.3%
4699                1      0.3%

accel
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 95
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.54132653
Minimum: 8
Maximum: 24.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 8
5-th percentile: 11.255
Q1: 13.775
Median: 15.5
Q3: 17.025
95-th percentile: 20.235
Maximum: 24.8
Range: 16.8
Interquartile range (IQR): 3.25

Descriptive statistics

Standard deviation: 2.758864119
Coefficient of variation (CV): 0.1775179303
Kurtosis: 0.4442335534
Mean: 15.54132653
Median Absolute Deviation (MAD): 1.7
Skewness: 0.2915869257
Sum: 6092.2
Variance: 7.611331228
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
14.5                23     5.9%
15.5                21     5.4%
14                  16     4.1%
16                  16     4.1%
13.5                15     3.8%
15                  14     3.6%
16.5                13     3.3%
17                  13     3.3%
13                  12     3.1%
19                  11     2.8%
Other values (85)   238    60.7%

Value               Count  Frequency (%)
8                   1      0.3%
8.5                 2      0.5%
9                   1      0.3%
9.5                 2      0.5%
10                  4      1.0%
10.5                1      0.3%
11                  7      1.8%
11.1                1      0.3%
11.2                1      0.3%
11.3                1      0.3%

Value               Count  Frequency (%)
24.8                1      0.3%
24.6                1      0.3%
23.7                1      0.3%
23.5                1      0.3%
22.2                2      0.5%
22.1                1      0.3%
21.9                1      0.3%
21.8                1      0.3%
21.7                1      0.3%
21.5                1      0.3%

yr
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 13
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.97959184
Minimum: 70
Maximum: 82
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 70
5-th percentile: 70
Q1: 73
Median: 76
Q3: 79
95-th percentile: 82
Maximum: 82
Range: 12
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.683736544
Coefficient of variation (CV): 0.04848323681
Kurtosis: -1.16744622
Mean: 75.97959184
Median Absolute Deviation (MAD): 3
Skewness: 0.01968829963
Sum: 29784
Variance: 13.56991492
Monotonicity: Increasing
Histogram with fixed size bins (bins=13)

Value               Count  Frequency (%)
73                  40     10.2%
78                  36     9.2%
76                  34     8.7%
75                  30     7.7%
82                  30     7.7%
70                  29     7.4%
79                  29     7.4%
72                  28     7.1%
77                  28     7.1%
81                  28     7.1%
Other values (3)    80     20.4%

Value               Count  Frequency (%)
70                  29     7.4%
71                  27     6.9%
72                  28     7.1%
73                  40     10.2%
74                  26     6.6%
75                  30     7.7%
76                  34     8.7%
77                  28     7.1%
78                  36     9.2%
79                  29     7.4%

Value               Count  Frequency (%)
82                  30     7.7%
81                  28     7.1%
80                  27     6.9%
79                  29     7.4%
78                  36     9.2%
77                  28     7.1%
76                  34     8.7%
75                  30     7.7%
74                  26     6.6%
73                  40     10.2%

origin
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value          Count
North America  245
Asia           79
Europe         68

Length

Max length: 13
Median length: 13
Mean length: 9.971938776
Min length: 4

Characters and Unicode

Total characters: 3909
Distinct characters: 16
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
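
As a hedged illustration of how such character properties can be tallied, the sketch below applies Python's standard `unicodedata` module to the three origin values, weighted by the counts shown in this report. It is not necessarily how pandas-profiling computes its figures, and the dictionary `origin_counts` is reconstructed from the report rather than read from the data.

```python
import unicodedata
from collections import Counter

# Category values and their counts, as reported for the origin variable.
origin_counts = {"North America": 245, "Asia": 79, "Europe": 68}

category_totals = Counter()
char_totals = Counter()
for text, count in origin_counts.items():
    for ch in text:
        # unicodedata.category returns the two-letter general category, e.g.
        # "Lu" (uppercase letter), "Ll" (lowercase letter), "Zs" (space separator).
        category_totals[unicodedata.category(ch)] += count
        char_totals[ch] += count

total_chars = sum(char_totals.values())
print(total_chars, len(char_totals), dict(category_totals))
```

With these inputs the totals come out as 3909 characters and 16 distinct characters, matching the figures in this section.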

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: North America
2nd row: North America
3rd row: North America
4th row: North America
5th row: North America

Common Values

Value               Count  Frequency (%)
North America       245    62.5%
Asia                79     20.2%
Europe              68     17.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count  Frequency (%)
north               245    38.5%
america             245    38.5%
asia                79     12.4%
europe              68     10.7%

Most occurring characters

Value               Count  Frequency (%)
r                   558    14.3%
A                   324    8.3%
i                   324    8.3%
a                   324    8.3%
o                   313    8.0%
e                   313    8.0%
N                   245    6.3%
t                   245    6.3%
h                   245    6.3%
(space)             245    6.3%
Other values (6)    773    19.8%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    3027   77.4%
Uppercase Letter    637    16.3%
Space Separator     245    6.3%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
r                   558    18.4%
i                   324    10.7%
a                   324    10.7%
o                   313    10.3%
e                   313    10.3%
t                   245    8.1%
h                   245    8.1%
m                   245    8.1%
c                   245    8.1%
s                   79     2.6%
Other values (2)    136    4.5%

Uppercase Letter
Value               Count  Frequency (%)
A                   324    50.9%
N                   245    38.5%
E                   68     10.7%

Space Separator
Value               Count  Frequency (%)
(space)             245    100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               3664   93.7%
Common              245    6.3%

Most frequent character per script

Latin
Value               Count  Frequency (%)
r                   558    15.2%
A                   324    8.8%
i                   324    8.8%
a                   324    8.8%
o                   313    8.5%
e                   313    8.5%
N                   245    6.7%
t                   245    6.7%
h                   245    6.7%
m                   245    6.7%
Other values (5)    528    14.4%

Common
Value               Count  Frequency (%)
(space)             245    100.0%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               3909   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
r                   558    14.3%
A                   324    8.3%
i                   324    8.3%
a                   324    8.3%
o                   313    8.0%
e                   313    8.0%
N                   245    6.3%
t                   245    6.3%
h                   245    6.3%
(space)             245    6.3%
Other values (6)    773    19.8%

name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 301
Distinct (%): 76.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value               Count
amc matador         5
ford pinto          5
toyota corolla      5
toyota corona       4
amc hornet          4
Other values (296)  369

Length

Max length: 36
Median length: 28
Mean length: 16.12244898
Min length: 6

Characters and Unicode

Total characters: 6320
Distinct characters: 45
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 245
Unique (%): 62.5%

Sample

1st row: chevrolet chevelle malibu
2nd row: buick skylark 320
3rd row: plymouth satellite
4th row: amc rebel sst
5th row: ford torino

Common Values

Value               Count  Frequency (%)
amc matador         5      1.3%
ford pinto          5      1.3%
toyota corolla      5      1.3%
toyota corona       4      1.0%
amc hornet          4      1.0%
chevrolet chevette  4      1.0%
chevrolet impala    4      1.0%
amc gremlin         4      1.0%
peugeot 504         4      1.0%
ford maverick       4      1.0%
Other values (291)  349    89.0%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
ford                48     4.7%
chevrolet           43     4.2%
plymouth            31     3.0%
sw                  28     2.7%
dodge               28     2.7%
amc                 27     2.6%
toyota              25     2.4%
datsun              23     2.2%
custom              18     1.8%
buick               17     1.7%
Other values (302)  737    71.9%

Most occurring characters

Value               Count  Frequency (%)
(space)             633    10.0%
o                   523    8.3%
a                   494    7.8%
e                   413    6.5%
r                   381    6.0%
t                   379    6.0%
c                   346    5.5%
l                   327    5.2%
d                   261    4.1%
i                   252    4.0%
Other values (35)   2311   36.6%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    5288   83.7%
Space Separator     633    10.0%
Decimal Number      307    4.9%
Open Punctuation    36     0.6%
Close Punctuation   36     0.6%
Dash Punctuation    10     0.2%
Other Punctuation   8      0.1%
Math Symbol         2      < 0.1%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
o                   523    9.9%
a                   494    9.3%
e                   413    7.8%
r                   381    7.2%
t                   379    7.2%
c                   346    6.5%
l                   327    6.2%
d                   261    4.9%
i                   252    4.8%
s                   246    4.7%
Other values (16)   1666   31.5%

Decimal Number
Value               Count  Frequency (%)
0                   100    32.6%
1                   55     17.9%
2                   48     15.6%
4                   26     8.5%
5                   25     8.1%
3                   15     4.9%
6                   13     4.2%
9                   11     3.6%
8                   10     3.3%
7                   4      1.3%

Other Punctuation
Value               Count  Frequency (%)
.                   3      37.5%
/                   3      37.5%
@                   1      12.5%
'                   1      12.5%

Space Separator
Value               Count  Frequency (%)
(space)             633    100.0%

Open Punctuation
Value               Count  Frequency (%)
(                   36     100.0%

Close Punctuation
Value               Count  Frequency (%)
)                   36     100.0%

Dash Punctuation
Value               Count  Frequency (%)
-                   10     100.0%

Math Symbol
Value               Count  Frequency (%)
+                   2      100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               5288   83.7%
Common              1032   16.3%

Most frequent character per script

Latin
Value               Count  Frequency (%)
o                   523    9.9%
a                   494    9.3%
e                   413    7.8%
r                   381    7.2%
t                   379    7.2%
c                   346    6.5%
l                   327    6.2%
d                   261    4.9%
i                   252    4.8%
s                   246    4.7%
Other values (16)   1666   31.5%

Common
Value               Count  Frequency (%)
(space)             633    61.3%
0                   100    9.7%
1                   55     5.3%
2                   48     4.7%
(                   36     3.5%
)                   36     3.5%
4                   26     2.5%
5                   25     2.4%
3                   15     1.5%
6                   13     1.3%
Other values (9)    45     4.4%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               6320   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
(space)             633    10.0%
o                   523    8.3%
a                   494    7.8%
e                   413    6.5%
r                   381    6.0%
t                   379    6.0%
c                   346    5.5%
l                   327    5.2%
d                   261    4.1%
i                   252    4.0%
Other values (35)   2311   36.6%

mfr
Categorical

HIGH CORRELATION

Distinct: 30
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value              Count
ford               48
chevrolet          47
plymouth           31
dodge              28
amc                27
Other values (25)  211

Length

Max length: 10
Median length: 8
Mean length: 6.209183673
Min length: 2

Characters and Unicode

Total characters: 2434
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 1.0%

Sample

1st row: chevrolet
2nd row: buick
3rd row: plymouth
4th row: amc
5th row: ford

Common Values

Value               Count  Frequency (%)
ford                48     12.2%
chevrolet           47     12.0%
plymouth            31     7.9%
dodge               28     7.1%
amc                 27     6.9%
toyota              26     6.6%
datsun              23     5.9%
volkswagen          22     5.6%
buick               17     4.3%
pontiac             16     4.1%
Other values (20)   107    27.3%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
ford                48     12.2%
chevrolet           47     12.0%
plymouth            31     7.9%
dodge               28     7.1%
amc                 27     6.9%
toyota              26     6.6%
datsun              23     5.9%
volkswagen          22     5.6%
buick               17     4.3%
pontiac             16     4.1%
Other values (20)   107    27.3%

Most occurring characters

Value               Count  Frequency (%)
o                   301    12.4%
e                   203    8.3%
t                   189    7.8%
a                   187    7.7%
d                   174    7.1%
l                   143    5.9%
r                   141    5.8%
c                   132    5.4%
u                   109    4.5%
h                   99     4.1%
Other values (13)   756    31.1%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    2434   100.0%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
o                   301    12.4%
e                   203    8.3%
t                   189    7.8%
a                   187    7.7%
d                   174    7.1%
l                   143    5.9%
r                   141    5.8%
c                   132    5.4%
u                   109    4.5%
h                   99     4.1%
Other values (13)   756    31.1%

Most occurring scripts

Value               Count  Frequency (%)
Latin               2434   100.0%

Most frequent character per script

Latin
Value               Count  Frequency (%)
o                   301    12.4%
e                   203    8.3%
t                   189    7.8%
a                   187    7.7%
d                   174    7.1%
l                   143    5.9%
r                   141    5.8%
c                   132    5.4%
u                   109    4.5%
h                   99     4.1%
Other values (13)   756    31.1%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               2434   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
o                   301    12.4%
e                   203    8.3%
t                   189    7.8%
a                   187    7.7%
d                   174    7.1%
l                   143    5.9%
r                   141    5.8%
c                   132    5.4%
u                   109    4.5%
h                   99     4.1%
Other values (13)   756    31.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
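
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation on toy data, not the code pandas-profiling runs; the helper names `average_ranks`, `covariance` and `spearman_rho` are hypothetical, and ties receive the average of their rank positions.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    sx = covariance(rx, rx) ** 0.5
    sy = covariance(ry, ry) ** 0.5
    return covariance(rx, ry) / (sx * sy)

# Toy data (hypothetical): y = x**3 is nonlinear but strictly increasing.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)
```

Because the relationship in the toy data is strictly increasing, the ranks of x and y coincide and ρ comes out as 1 up to floating-point error, even though the relationship is not linear.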

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
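
A minimal sketch of this calculation, together with a check of the invariance property, is given below. The function name `pearson_r` and the toy data are assumptions made for illustration; they are not taken from the report or from pandas-profiling.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # The 1/n factors of the covariance and of the two variances cancel out.
    return cov / (var_x * var_y) ** 0.5

# Toy data (hypothetical): roughly linear with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Shifting and rescaling each variable (with positive factors) leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])

# Exact linear relations give r = 1 whether the slope is steep or shallow.
r_steep = pearson_r(x, [100 * v for v in x])
r_shallow = pearson_r(x, [0.01 * v for v in x])
print(r, r_rescaled, r_steep, r_shallow)
```

Note that a negative scale factor on one variable would flip the sign of r while keeping its magnitude.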

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
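
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below on toy data. The function name `kendall_tau_a` is hypothetical. Library implementations often apply a tie correction instead (for example, SciPy's `kendalltau` defaults to the tau-b variant), so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted as neither.
    return (concordant - discordant) / len(pairs)

# Toy data (hypothetical): 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```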

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
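
For orientation, the sketch below implements the commonly cited form of the bias-corrected Cramér's V in plain Python. It is written from the published formula as generally stated, not copied from pandas-profiling, so treat the function name `cramers_v_corrected` and the details as assumptions to verify against the original paper.

```python
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return (phi2_corr / denom) ** 0.5 if denom > 0 else 0.0

# Toy data (hypothetical): perfectly associated and perfectly independent labels.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```

In the two toy cases the function returns values at the ends of the 0 to 1 range: 1 for the perfectly associated labels and 0 for the independent ones.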

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness visually.

Sample

First rows

     mpg   cyl  displ  hp   weight  accel  yr  origin         name                        mfr
0    18.0  8    307.0  130  3504    12.0   70  North America  chevrolet chevelle malibu   chevrolet
1    15.0  8    350.0  165  3693    11.5   70  North America  buick skylark 320           buick
2    18.0  8    318.0  150  3436    11.0   70  North America  plymouth satellite          plymouth
3    16.0  8    304.0  150  3433    12.0   70  North America  amc rebel sst               amc
4    17.0  8    302.0  140  3449    10.5   70  North America  ford torino                 ford
5    15.0  8    429.0  198  4341    10.0   70  North America  ford galaxie 500            ford
6    14.0  8    454.0  220  4354    9.0    70  North America  chevrolet impala            chevrolet
7    14.0  8    440.0  215  4312    8.5    70  North America  plymouth fury iii           plymouth
8    14.0  8    455.0  225  4425    10.0   70  North America  pontiac catalina            pontiac
9    15.0  8    390.0  190  3850    8.5    70  North America  amc ambassador dpl          amc

Last rows

     mpg   cyl  displ  hp   weight  accel  yr  origin         name                        mfr
382  26.0  4    156.0  92   2585    14.5   82  North America  chrysler lebaron medallion  chrysler
383  22.0  6    232.0  112  2835    14.7   82  North America  ford granada l              ford
384  32.0  4    144.0  96   2665    13.9   82  Asia           toyota celica gt            toyota
385  36.0  4    135.0  84   2370    13.0   82  North America  dodge charger 2.2           dodge
386  27.0  4    151.0  90   2950    17.3   82  North America  chevrolet camaro            chevrolet
387  27.0  4    140.0  86   2790    15.6   82  North America  ford mustang gl             ford
388  44.0  4    97.0   52   2130    24.6   82  Europe         vw pickup                   volkswagen
389  32.0  4    135.0  84   2295    11.6   82  North America  dodge rampage               dodge
390  28.0  4    120.0  79   2625    18.6   82  North America  ford ranger                 ford
391  31.0  4    119.0  82   2720    19.4   82  North America  chevy s-10                  chevrolet