Apply Models to DataFrames

While all possible arguments are documented in the API, the general pattern follows along these lines:

import pandas as pd
import pandas_ml_utils as pmu
from pandas_ml_utils.summary.binary_classification_summary import BinaryClassificationSummary
from sklearn.linear_model import LogisticRegression

# load the burrito ratings and derive a binary label ("with_fires") plus a (negative) price column
df = pd.read_csv('_static/burritos.csv')
df["with_fires"] = df["Fries"].apply(lambda x: str(x).lower() == "x")
df["price"] = df["Cost"] * -1
df = df[["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling", "Uniformity", "Salsa", "Synergy", "Wrap", "overall", "with_fires", "price"]].dropna()

# fit a logistic regression classifier on the rating columns to predict "with_fires";
# gross_loss attaches the price, which shows up as the Confusion Loss in the summary below
fit = df.fit(pmu.SkModel(LogisticRegression(solver='lbfgs'),
                         pmu.FeaturesAndLabels(["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling",
                                                "Uniformity", "Salsa", "Synergy", "Wrap", "overall"],
                                               ["with_fires"],
                                               gross_loss=lambda f: f["price"]),
                         BinaryClassificationSummary))

fit
Data was not in RNN shape
Data was not in RNN shape
Data was not in RNN shape
Data was not in RNN shape

Training Data

Confusion Matrix

Prediction/Truth     True    False
True                   19        8
False                  49      119

Confusion Loss

Prediction/Truth      True     False
True               -128.82    -62.38
False              -340.61   -840.31

FN/TP Ratio (should be < 0.5, ideally 0)    0.42
FP/TP Ratio (should be < 0.5, ideally 0)    2.58
F1 Score (should be > 0.5, ideally 1)       0.40

[Loss and Chart plots not reproduced here]

Test Data

Confusion Matrix

Prediction/Truth     True    False
True                   12       14
False                  31       74

Confusion Loss

Prediction/Truth      True     False
True                -86.39   -103.91
False              -228.66   -530.61

FN/TP Ratio (should be < 0.5, ideally 0)    1.17
FP/TP Ratio (should be < 0.5, ideally 0)    2.58
F1 Score (should be > 0.5, ideally 1)       0.35

[Loss and Chart plots not reproduced here]

pandas_ml_utils.model.models(LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='auto', n_jobs=None, penalty='l2', random_state=None, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False), FeaturesAndLabels(['Tortilla', 'Temp', 'Meat', 'Fillings', 'Meat:filling', 'Uniformity', 'Salsa', 'Synergy', 'Wrap', 'overall'],['with_fires'],None,None,NoneNone) #10 features expand to 10)
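
As a quick sanity check, the F1 scores in the summary follow directly from the confusion matrix counts; for the test set above:

# standard F1 from confusion-matrix counts: F1 = 2*TP / (2*TP + FP + FN)
f1_test = 2 * 12 / (2 * 12 + 14 + 31)  # = 24 / 69 ≈ 0.35, matching the summary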

From here you can save the fitted model and reuse it like so:

fit.save_model('/tmp/burrito.model')
df.predict(pmu.Model.load('/tmp/burrito.model')).tail()
saved model to: /tmp/burrito.model
Data was not in RNN shape
     prediction
     with_fires
380    0.251311
381    0.328659
382    0.064751
383    0.428745
384    0.265546
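
The returned column holds predicted probabilities rather than hard class labels. A minimal sketch of turning them into booleans with a 0.5 cut-off, assuming the two-level prediction/with_fires column header shown in the output above is a pandas MultiIndex:

predicted = df.predict(pmu.Model.load('/tmp/burrito.model'))
# column access below assumes a MultiIndex header ("prediction", "with_fires") as printed above
has_fries = predicted[("prediction", "with_fires")] > 0.5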

This is basically all you need to know. The same pattern applies to regressors and to agents for reinforcement learning.
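
For instance, a regression fit reuses the same df.fit(...) call chain. The following is a minimal sketch, assuming SkModel wraps any scikit-learn estimator; the choice of LinearRegression, the use of "overall" as the label, and the omission of a summary class are illustrative assumptions, not taken from the example above:

from sklearn.linear_model import LinearRegression

# same pattern as the classification example, just with a regressor and a numeric label
fit_reg = df.fit(pmu.SkModel(LinearRegression(),
                             pmu.FeaturesAndLabels(["Tortilla", "Temp", "Meat", "Fillings", "Meat:filling",
                                                    "Uniformity", "Salsa", "Synergy", "Wrap"],
                                                   ["overall"])))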