Instance filter:
- ICF
- RT3
Learning process:
- k-fold Cross-validation
- Bootstrap (0.632)
- Jackknife
- Non-information error rate
- Resampling
- Stratified sampling
- Bolstered estimate
- Bayesian confidence intervals
- Permutation tests
Confusion matrix:
- Confusion Matrix
- Database bias
- ROC (equally distributed data)
- AUC (equally distributed data)
- F-score (equally distributed data)
- “Unbalanced” data scores: Classification error rate, Recall, Specificity, Precision, Accuracy, FPR, FNR, NPV, Kappa statistic, H-Measure
- Brier score (calibration)
- LogLoss (calibration)
Non-binary classification:
- One vs. All (OVA)
- One vs. One (OVO)
Algorithm comparison:
Supervised Learning
Supervised learning, or supervised machine learning, learns patterns and relationships between input and output data. It is defined by its use of labeled data: a dataset containing many examples, each pairing features (the inputs) with a target (the known output). Supervised learning algorithms learn the relationship between the features and the target from this dataset. This process is called training or fitting.
There are two types of supervised learning algorithms:
- Classification
- Regression
Classification
Classification is a type of supervised machine learning in which algorithms learn from labeled data to predict a discrete outcome, such as whether an event will occur in the future. For example:
A bank may have a set of customer data containing credit history, loans, investment details, etc., and it may want to know whether a customer will default. The historical data will contain features and a target:
- The features will be attributes of a customer such as credit history, loans, investments, etc.
- The target will indicate whether a particular customer has defaulted in the past (normally represented by 1 or 0, True or False, or Yes or No).
Classification algorithms are used to predict discrete outcomes. If the result can take only two possible values, such as True or False, Default or No default, Yes or No, it is called binary classification. When the result can take more than two possible values, we speak of multiclass classification.
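To make the bank example concrete, here is a minimal sketch of binary classification in plain Python. The data, the two feature names (missed payments and debt-to-income ratio), and the nearest-centroid method are all assumptions chosen for illustration; they are not a recommended credit-scoring approach.

```python
from math import dist

# Hypothetical toy data: each row is (missed_payments, debt_to_income_ratio).
features = [
    (0, 0.10), (1, 0.20), (0, 0.25), (1, 0.15),  # customers who did not default
    (4, 0.60), (5, 0.75), (3, 0.80), (6, 0.65),  # customers who defaulted
]
targets = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted, 0 = did not default


def fit_centroids(X, y):
    """Training step: compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Prediction step: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


model = fit_centroids(features, targets)
print(predict(model, (0, 0.12)))  # new customer resembling the non-defaulters
print(predict(model, (5, 0.70)))  # new customer resembling the defaulters
```

The two features here are on different scales, so the count of missed payments dominates the distance. A real pipeline would normally rescale the features before training.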
Regression
Regression is a type of supervised machine learning in which algorithms learn from data to predict continuous values such as sales, salary, weight, or temperature. For example:
A dataset may contain home features such as lot size, number of bedrooms, number of bathrooms, neighborhood, etc., together with the price of each house. A regression algorithm can be trained to learn the relationship between the features and the price of the house.
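The house-price example can be sketched with a single feature and ordinary least squares, written in plain Python. The lot sizes, prices, and function names below are hypothetical and exist only to illustrate the idea; a real model would use several features and a library implementation.

```python
# Hypothetical toy data: lot size in square metres, sale price in thousands.
lot_sizes = [100.0, 150.0, 200.0, 250.0, 300.0]
prices = [200.0, 290.0, 410.0, 500.0, 600.0]


def fit_line(xs, ys):
    """Training step: least-squares slope and intercept for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(lot_sizes, prices)


def predict_price(lot_size):
    """Prediction step: estimate a continuous price from the fitted line."""
    return slope * lot_size + intercept


print(round(predict_price(220.0), 1))  # estimated price for an unseen lot size
```

Unlike the classification case, the prediction is a continuous number rather than one of a fixed set of labels.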