tsfresh
Machine Learning Feature Engineering
wangzf / 2022-05-03
tsfresh is a library that automatically extracts features from time series.
Installing tsfresh
$ pip install tsfresh
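tsfresh workflow
A typical tsfresh workflow has the following steps:
Training phase:
- Data preparation: prepare a dataset in the input format tsfresh expects
- Sampling: draw samples with a sliding window that advances by a stride of s
- Feature generation: generate features for the sampled windows and collect them
- Feature selection: gather the derived features produced from the individual source features and run feature selection on them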
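The sliding-window sampling step can be sketched in plain Python. This is a minimal illustration, not tsfresh code: the function name `sliding_window_samples` and the `window` parameter are assumptions introduced here (the text only names the stride s), and the output uses the long `id` / `time` / `value` layout so that each window becomes one sample.

```python
from typing import Dict, List, Sequence


def sliding_window_samples(
    values: Sequence[float], window: int, stride: int
) -> List[Dict[str, object]]:
    """Cut one series into windows and return rows in long (id, time, value) form."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    rows: List[Dict[str, object]] = []
    # Each window start position becomes its own sample id.
    starts = range(0, len(values) - window + 1, stride)
    for sample_id, start in enumerate(starts):
        for offset in range(window):
            rows.append(
                {
                    "id": sample_id,
                    "time": start + offset,
                    "value": values[start + offset],
                }
            )
    return rows


series = [10, 11, 12, 13, 14, 15, 16]
# With window=3 and stride=2 the windows start at positions 0, 2 and 4.
rows = sliding_window_samples(series, window=3, stride=2)
```

tsfresh also provides `roll_time_series` in `tsfresh.utilities.dataframe_functions` for rolling over a DataFrame; the hand-written version above is only meant to make the idea concrete.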
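Deployment phase:
- Data preparation: prepare a dataset in the input format tsfresh expects
- Feature generation: generate features for the sliding-window samples and collect them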
tsfresh data formats
Input data formats
- Flat DataFrame
- Stacked DataFrame
- dictionary of flat DataFrame
column_id | column_value | column_sort | column_kind |
---|---|---|---|
id | value | sort | kind |
Applicable APIs:
tsfresh.extract_features()
tsfresh.
Flat DataFrame
id | time | x | y |
---|---|---|---|
A | t1 | x(A, t1) | y(A, t1) |
A | t2 | x(A, t2) | y(A, t2) |
A | t3 | x(A, t3) | y(A, t3) |
B | t1 | x(B, t1) | y(B, t1) |
B | t2 | x(B, t2) | y(B, t2) |
B | t3 | x(B, t3) | y(B, t3) |
Stacked DataFrame
id | time | kind | value |
---|---|---|---|
A | t1 | x | x(A, t1) |
A | t2 | x | x(A, t2) |
A | t3 | x | x(A, t3) |
A | t1 | y | y(A, t1) |
A | t2 | y | y(A, t2) |
A | t3 | y | y(A, t3) |
B | t1 | x | x(B, t1) |
B | t2 | x | x(B, t2) |
B | t3 | x | x(B, t3) |
B | t1 | y | y(B, t1) |
B | t2 | y | y(B, t2) |
B | t3 | y | y(B, t3) |
Dictionary of flat DataFrame
{
"x":
| id | time | value |
|----|------|----------|
| A | t1 | x(A, t1) |
| A | t2 | x(A, t2) |
| A | t3 | x(A, t3) |
| B | t1 | x(B, t1) |
| B | t2 | x(B, t2) |
| B | t3 | x(B, t3) |
, "y":
| id | time | value |
|----|------|----------|
| A | t1 | y(A, t1) |
| A | t2 | y(A, t2) |
| A | t3 | y(A, t3) |
| B | t1 | y(B, t1) |
| B | t2 | y(B, t2) |
| B | t3 | y(B, t3) |
}
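The three input layouts carry the same information, so one can be derived from another. The following pandas sketch shows one way to do that with made-up numbers; the variable names `flat`, `stacked` and `by_kind` are illustrative only.

```python
import pandas as pd

# Flat DataFrame: one column per kind of time series.
flat = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "time": [1, 2, 3, 1, 2, 3],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Stacked DataFrame: one row per (id, time, kind) with a single value column.
stacked = flat.melt(id_vars=["id", "time"], var_name="kind", value_name="value")

# Dictionary of flat DataFrames: one (id, time, value) frame per kind.
by_kind = {
    kind: group.drop(columns="kind").reset_index(drop=True)
    for kind, group in stacked.groupby("kind")
}
```

When passing these to tsfresh, the flat layout needs `column_id` and `column_sort`, the stacked layout additionally needs `column_kind` and `column_value`, and the dictionary layout needs `column_value` but no `column_kind`, since the dictionary keys already name the kinds.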
Output data format
id | x feature 1 | … | x feature N | y feature 1 | … | y feature N |
---|---|---|---|---|---|---|
A | … | … | … | … | … | … |
B | … | … | … | … | … | … |
scikit-learn Transformers
Feature extraction
tsfresh.transformers.FeatureAugmenter
Feature selection
tsfresh.transformers.FeatureSelector
Feature extraction and selection
tsfresh.transformers.RelevantFeatureAugmenter
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from tsfresh.examples import load_robot_execution_failures
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures
from tsfresh.transformers import RelevantFeatureAugmenter

# Download the example dataset (robot execution failures)
download_robot_execution_failures()

# Extract and select relevant features, then fit a classifier on them
pipeline = Pipeline([
    ("augmenter", RelevantFeatureAugmenter(column_id="id", column_sort="time")),
    ("classifier", RandomForestClassifier()),
])

# df_ts holds the time series; y holds one label per id
df_ts, y = load_robot_execution_failures()

# X starts without feature columns; the augmenter adds the extracted features
X = pd.DataFrame(index=y.index)

# The time series are handed to the augmenter separately from X
pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X, y)