import pandas as pd
from sklearn import model_selection
import statsmodels.api as sm
Profit = pd.read_excel('Predict to Profit.xlsx')
Profit.head()
train, test = model_selection.train_test_split(Profit, test_size = 0.2, random_state=1234)
model = sm.formula.ols('Profit~RD_Spend+Administration+Marketing_Spend+C(State)',data=train).fit()
print('Partial regression coefficients of the model:\n', model.params)
test_X = test.drop(labels = 'Profit', axis=1)
pred = model.predict(exog = test_X)
print('Predicted vs. actual values:\n', pd.DataFrame({'prediction':pred,'Real':test.Profit}))
model.summary()
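First, split the dataset into a training set and a test set;
then fit a multiple linear regression with sm (statsmodels) on the training set;
compute the predicted values for the test set;
finally, compute the model's summary statistics.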
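
The four steps above can also be sketched without the Excel file, which may help when checking the workflow end to end. The example below is only an illustration under stated assumptions: the data are synthetic, the fit uses `numpy.linalg.lstsq` rather than statsmodels, and the names `true_coef`, `add_intercept`, `rmse`, and `r2` are hypothetical and do not appear in the original code. It reduces the predicted-versus-actual comparison on the held-out rows to two numbers, RMSE and R².

```python
import numpy as np

# Synthetic data (assumption): 3 numeric predictors and a known linear signal.
rng = np.random.default_rng(1234)
n = 200
X = rng.normal(size=(n, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = 10.0 + X @ true_coef + rng.normal(scale=0.1, size=n)

# Step 1: split into training (80%) and test (20%) rows.
idx = rng.permutation(n)
n_test = int(n * 0.2)
test_idx, train_idx = idx[:n_test], idx[n_test:]

def add_intercept(a):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(a)), a])

# Step 2: ordinary least squares on the training rows.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Step 3: predictions for the held-out test rows.
pred = add_intercept(X[test_idx]) @ coef

# Step 4: statistics comparing predictions with actual values.
residuals = y[test_idx] - pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))
ss_res = float(np.sum(residuals ** 2))
ss_tot = float(np.sum((y[test_idx] - y[test_idx].mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print('coefficients (intercept first):', coef)
print('test RMSE:', rmse)
print('test R^2:', r2)
```

With the real data, the same two statistics could be computed from `pred` and `test.Profit`. Note that the R² reported by `model.summary()` is measured on the training set, so a test-set value is a separate check.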