How large a training set is needed?
Published: 2019-05-25


Is there a common method used to determine how many training samples are required to train a classifier (an LDA in this case) to obtain a minimum threshold generalization accuracy?

I am asking because I would like to minimize the calibration time usually required in a brain-computer interface.

The search term you are looking for is "learning curve", which gives the (average) model performance as a function of the training sample size.

Learning curves depend on a lot of things, e.g.

  • classification method
  • complexity of the classifier
  • how well the classes are separated.

(I think for two-class LDA you may be able to derive some theoretical power calculations, but the crucial fact is always whether your data actually meets the "equal covariance, multivariate normal" assumption. I'd go for some simulation of the LDA assumptions and for resampling of your already existing data.)
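
Below is a minimal sketch of that simulation idea in Python/scikit-learn (not from the original answer): it generates two classes that exactly meet the equal-covariance multivariate-normal assumption and traces a learning curve for LDA by refitting on increasingly large training draws. The separation `delta`, the dimensionality, and the sample sizes are illustrative assumptions.

```python
# Illustrative sketch only; all parameter values are assumptions, not taken from the answer.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def simulate(n_per_class, delta=1.5, dim=10):
    """Two Gaussian classes, equal (identity) covariance, separated along the first axis."""
    mean1 = np.r_[delta, np.zeros(dim - 1)]
    X = np.vstack([rng.normal(np.zeros(dim), 1.0, size=(n_per_class, dim)),
                   rng.normal(mean1, 1.0, size=(n_per_class, dim))])
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return X, y

# Large simulated test set, so the test error contributes little noise here.
X_test, y_test = simulate(5000)

# Learning curve: average accuracy over repeated training draws of each size.
for n in [10, 20, 50, 100, 200, 500]:          # training cases per class
    accs = [LinearDiscriminantAnalysis().fit(*simulate(n)).score(X_test, y_test)
            for _ in range(30)]
    print(f"n per class = {n:4d}: mean accuracy = {np.mean(accs):.3f}")
```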

There are two aspects of the performance of a classifier trained on a finite sample size n (as usual),

  • bias, i.e. on average a classifier trained on n training samples is worse than the classifier trained on n = ∞ training cases (this is what is usually meant by "learning curve"), and
  • variance: a given training set of nn cases may lead to quite different model performance.
    Even with few cases, you may be lucky and get good results. Or you have bad luck and get a really bad classifier.
    As usual, this variance decreases with increasing training sample size n.
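
As an illustration of the variance aspect (again only a sketch with assumed names and sizes, in the spirit of resampling your already existing data): draw many training sets of the same size n from one fixed data set and look at the spread of the resulting test performance.

```python
# Illustrative sketch; the data set is a stand-in for "your already existing data".
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import ShuffleSplit

X, y = load_breast_cancer(return_X_y=True)

for n in [50, 100, 200, 400]:                          # fixed training sample sizes
    splitter = ShuffleSplit(n_splits=50, train_size=n, random_state=0)
    accs = [LinearDiscriminantAnalysis().fit(X[tr], y[tr]).score(X[te], y[te])
            for tr, te in splitter.split(X, y)]
    print(f"n = {n:3d}: accuracy {np.mean(accs):.3f} ± {np.std(accs):.3f} "
          f"(min {np.min(accs):.3f}, max {np.max(accs):.3f})")
```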

Another aspect that you may need to take into account is that it is usually not enough to train a good classifier: you also need to prove that the classifier is good (or good enough). So you also have to plan the sample size needed for validation to a given precision. If you need to report these results as a fraction of successes among so many test cases (e.g. producer's or consumer's accuracy / precision / sensitivity / positive predictive value), and the underlying classification task is rather easy, this can require more independent cases than the training of a good model.

As a rule of thumb, for training, the sample size is usually discussed in relation to model complexity (number of cases : number of variates), whereas absolute bounds on the test sample size can be given for a required precision of the performance measurement.
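
For the test-set side, one simple way to get such an absolute bound is the normal approximation for a binomial proportion at the worst case p = 0.5 (a sketch; the confidence level and half-widths below are illustrative choices):

```python
# Rough test sample sizes for a desired 95 % confidence-interval half-width on a
# proportion-type figure of merit (accuracy, sensitivity, ...), worst case p = 0.5.
from math import ceil

z = 1.96                                  # two-sided 95 % normal quantile
for half_width in [0.05, 0.02, 0.01]:
    n = ceil(z**2 * 0.5 * 0.5 / half_width**2)
    print(f"half-width ±{half_width:.2f} -> at least {n} independent test cases")
# prints roughly 385, 2401, and 9604 test cases, respectively
```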

Here's a paper where we explain these things in more detail and also discuss how to construct learning curves:

Beleites, C. and Neugebauer, U. and Bocklitz, T. and Krafft, C. and Popp, J.: Sample size planning for classification models. Anal Chim Acta, 2013, 760, 25-33.

This is the "teaser", showing an easy classification problem (we actually have one easy distinction like this in our classification problem, but other classes are far more difficult to distinguish):

We did not try to extrapolate to larger training sample sizes to determine how many more training cases are needed, because the test sample sizes are our bottleneck, and larger training sample sizes would let us construct more complex models, so extrapolation is questionable. For the kind of data sets I have, I'd approach this iteratively: measure a bunch of new cases, show how much things improved, measure more cases, and so on.

This may be different for you, but the paper contains literature references to papers using extrapolation to higher sample sizes in order to estimate the required number of samples.

Asking about training sample size implies you are going to hold back data for model validation. This is an unstable process requiring a huge sample size. Strong internal validation with the bootstrap is often preferred. If you choose that path, you only need to compute the one sample size. As @cbeleites so nicely stated, this is often an "events per candidate variable" assessment, but you need a minimum of 96 observations to accurately predict the probability of a binary outcome even if there are no features to be examined [this achieves, with 0.95 confidence, a margin of error of 0.1 in estimating the actual marginal probability that Y=1].
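
The bracketed 96-observation figure can be reproduced with the usual normal-approximation arithmetic for a binomial proportion (a sketch of the calculation; the answer's own derivation may differ slightly):

```python
# Margin of error 0.1 at 0.95 confidence for estimating P(Y = 1), worst case p = 0.5.
z, p, margin = 1.96, 0.5, 0.1
n = z**2 * p * (1 - p) / margin**2
print(n)   # ~96.04, i.e. about 96 observations even with no features
```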

It is important to consider proper scoring rules for accuracy assessment (e.g., Brier score and log likelihood/deviance). Also make sure you really want to classify observations as opposed to estimating membership probability. The latter is almost always more useful as it allows a gray zone.
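
A short sketch of what that looks like in practice (the data set, split, and gray-zone cut-offs are illustrative assumptions): score the predicted membership probabilities with proper scoring rules instead of only thresholding them into class labels.

```python
# Illustrative sketch: proper scoring rules on probability estimates vs. plain accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

model = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]                    # membership probabilities, not labels

print("accuracy   :", accuracy_score(y_te, (p > 0.5).astype(int)))
print("Brier score:", brier_score_loss(y_te, p))       # proper scoring rule
print("log loss   :", log_loss(y_te, p))               # proper scoring rule (mean negative log likelihood)

# A "gray zone": refuse to classify cases whose estimated probability is too uncertain.
gray = (p > 0.3) & (p < 0.7)
print(f"{gray.mean():.1%} of test cases left undecided")
```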
