Machine learning / Scaling of Features

When data has different ranges or units, its values are hard to compare directly. Scaling converts the values to a common scale so that they can be compared.

One common scaling method, standardization, uses the formula:
z = (x - u) / s

where
z is the new (scaled) value
x is the original value
u is the mean of the column
s is the standard deviation of the column

Consider the data set below, stored in a file named products.csv:
Product price quantity profit category productconditiongood
A1 200 190 49 x YES
A2 400 560 45 x YES
A3 200 329 45 x YES
A4 100 265 40 x YES
A5 700 540 55 x YES
A6 200 329 55 x YES
A7 600 509 40 x YES
A8 700 765 42 x YES
A9 700 512 48 x YES
A10 800 550 49 x YES
A11 300 380 49 x YES
A12 500 390 51 x YES
A13 200 512 49 x YES
A14 800 652 44 x YES
A15 800 726 47 x YES
A16 800 730 47 y YES
A17 800 765 49 y YES
A18 1400 680 54 y YES
A19 800 519 54 y YES
A20 1200 728 55 y YES
A21 800 984 44 y NO
A22 1200 828 49 y NO
A23 1300 765 49 y NO
A24 800 815 49 y NO
A25 1200 815 49 y NO
A26 700 865 52 y NO
A27 1200 890 54 y NO
A28 1200 1125 64 y NO
A29 800 923 59 z NO
A30 1200 1105 64 z NO
A31 1300 1005 65 z NO
A32 1200 1146 67 z NO
A33 800 635 54 z NO
A34 800 790 58 z NO
A35 800 805 59 z NO
A36 1700 795 70 z NO
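To run the scaling program yourself, the table first has to exist as a file named products.csv. The sketch below is one way to create it with Python's csv module. It assumes the file should be comma-separated and should use the same header names as the table, since the program selects the price and quantity columns by those names.

```python
import csv

# The table above, copied verbatim (whitespace-separated).
TABLE = """\
Product price quantity profit category productconditiongood
A1 200 190 49 x YES
A2 400 560 45 x YES
A3 200 329 45 x YES
A4 100 265 40 x YES
A5 700 540 55 x YES
A6 200 329 55 x YES
A7 600 509 40 x YES
A8 700 765 42 x YES
A9 700 512 48 x YES
A10 800 550 49 x YES
A11 300 380 49 x YES
A12 500 390 51 x YES
A13 200 512 49 x YES
A14 800 652 44 x YES
A15 800 726 47 x YES
A16 800 730 47 y YES
A17 800 765 49 y YES
A18 1400 680 54 y YES
A19 800 519 54 y YES
A20 1200 728 55 y YES
A21 800 984 44 y NO
A22 1200 828 49 y NO
A23 1300 765 49 y NO
A24 800 815 49 y NO
A25 1200 815 49 y NO
A26 700 865 52 y NO
A27 1200 890 54 y NO
A28 1200 1125 64 y NO
A29 800 923 59 z NO
A30 1200 1105 64 z NO
A31 1300 1005 65 z NO
A32 1200 1146 67 z NO
A33 800 635 54 z NO
A34 800 790 58 z NO
A35 800 805 59 z NO
A36 1700 795 70 z NO
"""

# Split each non-empty line into its fields.
rows = [line.split() for line in TABLE.splitlines() if line.strip()]

# Write the rows as comma-separated values (assumed file format).
with open("products.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```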


Scaling of Features: Python program and output

The program reads products.csv (assumed to hold the table above as comma-separated values), selects the price and quantity columns, and standardizes each column with scikit-learn's StandardScaler.

Python program:

    import pandas
    from sklearn.preprocessing import StandardScaler

    # StandardScaler applies z = (x - u) / s to each column separately
    scale = StandardScaler()
    fp = pandas.read_csv("products.csv")

    # Select the two features to put on a common scale
    Xpoints = fp[['price', 'quantity']]

    # fit_transform learns u and s for each column, then scales the data
    scaledXpoints = scale.fit_transform(Xpoints)
    print(scaledXpoints)

Output:

    [[-1.59336644 -2.10389253]
     [-1.07190106 -0.55407235]
     [-1.59336644 -1.52166278]
     [-1.85409913 -1.78973979]
     [-0.28970299 -0.63784641]
     [-1.59336644 -1.52166278]
     [-0.55043568 -0.76769621]
     [-0.28970299 0.3046118 ]
     [-0.28970299 -0.7551301 ]
     [-0.0289703 -0.59595938]
     [-1.33263375 -1.30803892]
     [-0.81116837 -1.26615189]
     [-1.59336644 -0.7551301 ]
     [-0.0289703 -0.16871166]
     [-0.0289703 0.14125238]
     [-0.0289703 0.15800719]
     [-0.0289703 0.3046118 ]
     [ 1.53542584 -0.05142797]
     [-0.0289703 -0.72580918]
     [ 1.01396046 0.14962979]
     [-0.0289703 1.2219378 ]
     [ 1.01396046 0.5685001 ]
     [ 1.27469315 0.3046118 ]
     [-0.0289703 0.51404696]
     [ 1.01396046 0.51404696]
     [-0.28970299 0.72348212]
     [ 1.01396046 0.8281997 ]
     [ 1.01396046 1.81254495]
     [-0.0289703 0.96642691]
     [ 1.01396046 1.72877089]
     [ 1.27469315 1.30990057]
     [ 1.01396046 1.90050772]
     [-0.0289703 -0.23991961]
     [-0.0289703 0.40932938]
     [-0.0289703 0.47215993]
     [ 2.31762392 0.4302729 ]]
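StandardScaler standardizes each column independently with z = (x - u) / s, where s is the population standard deviation (it divides by n, not n - 1). The dependency-free sketch below reproduces the same numbers with Python's statistics module. It assumes the price and quantity columns are copied from the table, and the helper name standardize is hypothetical, not part of scikit-learn.

```python
from statistics import mean, pstdev

# price and quantity columns copied from the table (products A1..A36)
price = [200, 400, 200, 100, 700, 200, 600, 700, 700, 800, 300, 500,
         200, 800, 800, 800, 800, 1400, 800, 1200, 800, 1200, 1300, 800,
         1200, 700, 1200, 1200, 800, 1200, 1300, 1200, 800, 800, 800, 1700]
quantity = [190, 560, 329, 265, 540, 329, 509, 765, 512, 550, 380, 390,
            512, 652, 726, 730, 765, 680, 519, 728, 984, 828, 765, 815,
            815, 865, 890, 1125, 923, 1105, 1005, 1146, 635, 790, 805, 795]


def standardize(column):
    """Hypothetical helper: return (scaled values, mean u, std s)."""
    u = mean(column)
    s = pstdev(column)  # population std (divide by n), as StandardScaler does
    return [(x - u) / s for x in column], u, s


scaled_price, u_price, s_price = standardize(price)
scaled_quantity, u_quantity, s_quantity = standardize(quantity)

# Pair the scaled columns back into rows, like the 36 x 2 array above.
scaled_rows = list(zip(scaled_price, scaled_quantity))

print(round(u_price, 2), round(s_price, 2))    # 811.11 383.53
print([round(v, 4) for v in scaled_rows[0]])   # [-1.5934, -2.1039]
```

The rows of scaled_rows should match the printed array up to rounding. After scaling, each column has mean 0 and standard deviation 1, which is what puts price and quantity on the same footing.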


Explanation
The first row (product A1) before scaling: price 200, quantity 190
The same row after scaling: -1.59336644, -2.10389253


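The original values are hard to compare because price and quantity have different ranges. The scaled values are easy to compare: each one tells how many standard deviations the value lies from its column's mean. Product A1 has a price about 1.59 standard deviations below the average price and a quantity about 2.10 standard deviations below the average quantity. That is why we use scaling.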
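As a check, the scaled values of the first row can be worked out by hand with the formula z = (x - u) / s. The means come from the 36 rows of the table, s is the population standard deviation (dividing by n = 36, as StandardScaler does), and all figures are rounded.

```latex
\begin{aligned}
z_{\text{price}} &= \frac{200 - 811.11}{383.53} \approx -1.593,
  \qquad u = \tfrac{29200}{36} \approx 811.11,\; s \approx 383.53 \\
z_{\text{quantity}} &= \frac{190 - 692.28}{238.74} \approx -2.104,
  \qquad u = \tfrac{24922}{36} \approx 692.28,\; s \approx 238.74
\end{aligned}
```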

