Analyze the Feature Space
pandas_ml_utils.feature_selection(df, label_column=None, ignore=[], top_features=5, correlation_threshold=0.5, minimum_features=1, lags=range(0, 100), show_plots=True, figsize=(12, 10))
The feature_selection functionality helps you analyze your features, filter out highly correlated ones, and focus on the most important features. This function also applies an autoregression over the given lags and embeds an ACF (autocorrelation function) plot.
- Parameters
  df (DataFrame) – the DataFrame to which the function is applied
  label_column (Optional[str]) – column name of your dependent variable
  ignore (Union[List[str], str]) – columns you want to ignore
  top_features (int) – number of most important features you want to select
  correlation_threshold (float) – threshold at which correlated features drop out
  minimum_features (int) – number of features you want to keep even if they are highly correlated
  lags (Iterable[int]) – iterable of lags you want to analyze as an AR process
  show_plots (bool) – whether to show plots or not
  figsize (Tuple[int, int]) – size of the plots
- Returns
None
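To make the interplay of correlation_threshold and minimum_features more concrete, here is a minimal sketch of a greedy correlation filter. It is not the library's implementation. The helper name filter_correlated, the demo data, and the reading of minimum_features as "the first N ranked features are always kept" are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def filter_correlated(df, ranked_features, correlation_threshold=0.5, minimum_features=1):
    """Hypothetical helper: walk the ranking and drop any feature whose absolute
    correlation with an already kept feature exceeds the threshold."""
    corr = df[ranked_features].corr().abs()
    kept = []
    for feature in ranked_features:
        # Assumption: the first `minimum_features` entries are kept unconditionally.
        if len(kept) < minimum_features:
            kept.append(feature)
            continue
        # Keep the feature only if it is weakly correlated with every kept feature.
        if (corr.loc[feature, kept] <= correlation_threshold).all():
            kept.append(feature)
    return kept


# Synthetic demo data (not the burrito data set).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.01, size=200),  # nearly identical to "a"
    "b": rng.normal(size=200),                       # independent noise
})

kept = filter_correlated(demo, ["a", "a_copy", "b"])
print(kept)
```

In this sketch a near-duplicate column is dropped because it sits above the 0.5 threshold against a feature that is already kept, while an unrelated column survives.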
import pandas_ml_utils as pmu
import pandas as pd
df = pd.read_csv('_static/burritos.csv')[["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling", "Uniformity", "Salsa", "Synergy", "Wrap", "overall"]]
df.feature_selection(label_column="overall")
Tortilla overall Synergy Fillings Temp Salsa \
Tortilla 1.0 0.403981 0.367575 0.345613 0.290702 0.267212
Meat Uniformity Meat:filling Wrap
Tortilla 0.260194 0.208666 0.207518 0.160831
label is continuous: True
Feature ranking:
['Synergy', 'Meat', 'Fillings', 'Meat:filling', 'Wrap', 'Tortilla', 'Uniformity', 'Salsa', 'Temp']
TOP 5 features
Synergy Meat Fillings Meat:filling Wrap
Synergy 1.0 0.601545 0.663328 0.428505 0.08685
filter feature Fillings with correlation 1.0 > 0.5
filter feature Meat with correlation 1.0 > 0.5
Features after correlation filer:
Synergy Meat:filling Wrap
Tortilla 0.367575 0.207518 0.160831
Synergy 1.000000
Synergy_0 1.000000
Synergy_1 0.147495
Synergy_56 0.128449
Synergy_78 0.119272
Synergy_55 0.111832
Synergy_79 0.086466
Synergy_47 0.085117
Synergy_53 0.084786
Synergy_37 0.084312
Name: Synergy, dtype: float64
Meat:filling 1.000000
Meat:filling_0 1.000000
Meat:filling_15 0.185946
Meat:filling_35 0.175837
Meat:filling_1 0.122546
Meat:filling_87 0.118597
Meat:filling_33 0.112875
Meat:filling_73 0.103090
Meat:filling_72 0.103054
Meat:filling_71 0.089437
Name: Meat:filling, dtype: float64
Wrap 1.000000
Wrap_0 1.000000
Wrap_63 0.210823
Wrap_88 0.189735
Wrap_1 0.169132
Wrap_87 0.166502
Wrap_66 0.146689
Wrap_89 0.141822
Wrap_74 0.120047
Wrap_11 0.115095
Name: Wrap, dtype: float64
best lags are
[(1, '-1.00'), (2, '-0.15'), (88, '-0.10'), (64, '-0.07'), (19, '-0.07'), (89, '-0.06'), (36, '-0.05'), (43, '-0.05'), (16, '-0.05'), (68, '-0.04'), (90, '-0.04'), (87, '-0.04'), (3, '-0.03'), (20, '-0.03'), (59, '-0.03'), (75, '-0.03'), (91, '-0.03'), (57, '-0.03'), (46, '-0.02'), (48, '-0.02'), (54, '-0.02'), (73, '-0.02'), (25, '-0.02'), (79, '-0.02'), (76, '-0.02'), (37, '-0.02'), (71, '-0.02'), (15, '-0.02'), (49, '-0.02'), (12, '-0.02'), (65, '-0.02'), (40, '-0.02'), (24, '-0.02'), (78, '-0.02'), (53, '-0.02'), (8, '-0.02'), (44, '-0.01'), (45, '0.01'), (56, '0.01'), (26, '0.01'), (82, '0.01'), (77, '0.02'), (22, '0.02'), (83, '0.02'), (11, '0.02'), (66, '0.02'), (31, '0.02'), (80, '0.02'), (92, '0.02'), (39, '0.03'), (27, '0.03'), (70, '0.04'), (41, '0.04'), (51, '0.04'), (4, '0.04'), (7, '0.05'), (13, '0.05'), (97, '0.06'), (60, '0.06'), (42, '0.06'), (96, '0.06'), (95, '0.06'), (30, '0.07'), (81, '0.07'), (52, '0.07'), (9, '0.07'), (61, '0.07'), (84, '0.07'), (29, '0.08'), (94, '0.08'), (28, '0.11')]
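The per-feature listings above (for example Synergy_1, Synergy_56) look like correlations between a column and lagged copies of itself. A comparable table can be sketched with plain pandas as shown below. This is an approximation under stated assumptions, not the code the library runs: the helper name top_lag_correlations, the `<name>_<lag>` labelling, and the synthetic series are all hypothetical.

```python
import numpy as np
import pandas as pd


def top_lag_correlations(series, lags=range(0, 100), n=10):
    """Hypothetical helper: correlate a series with shifted copies of itself
    and return the n largest absolute correlations."""
    correlations = pd.Series(
        # Series.corr ignores the NaN rows introduced by shift().
        {f"{series.name}_{lag}": series.corr(series.shift(lag)) for lag in lags}
    )
    return correlations.abs().sort_values(ascending=False).head(n)


# Synthetic series with a period of 7 samples plus a little noise.
rng = np.random.default_rng(1)
t = np.arange(300)
x = pd.Series(np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.1, size=t.size), name="x")

top = top_lag_correlations(x, lags=range(0, 20), n=5)
print(top)
```

Lag 0 always correlates perfectly with the original series, which matches the `Synergy_0 1.000000` style rows in the output. For the periodic demo series, multiples of the period rank directly behind it.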