- 06 Feb 2024
- 1 Minute to read
- Print
- DarkLight
Cogniac Rare Subject Mining Best Practices
- Updated on 06 Feb 2024
- 1 Minute to read
- Print
- DarkLight
Best Practices for Rare Subject Mining
Cogniac Confidential
During the application building stage, the Cogniac system is designed to facilitate the discovery and Labeling of rare subjects to increase model Accuracy as rapidly as possible. We call this process “rare subject mining”. This technical note describes our best practices for this process.
Starting assumptions:
You have at least a handful of images labeled for the rare condition, with some instances in the training set and some instances in the validation set for both True and False Consensus
A model has been released with a validation F1 of greater than 0%
Configure the detection threshold for the rare subject lower than the 0.5 default value. A detection threshold value in the .2 to .3 range is suggested.
Iterate on the following process:
Replay (or process for the first time) a large set of random media (1,000’s to 10,000’s) through the app, typically from an input subject
Wait until the replay completes (no more outputs appear in the application output view)
(Optional) Check for rare subject candidates from the Subject Media view on the rare subject using the following filter configuration:
Note the lower probability above 0 and the sort specified on “Highest Prob first”.
Perform “Replay” :
Select the replay to use the rare subject you are mining, like the following. Note that the lower end of the probability filter must be set above 0, and Force Feedback must be selected.
Provide feedback from the yellow “Provide Feedback” button after the replay completes
Wait for a new model to be released
Repeat
Suppose insufficient candidate media are found during the iterative process. In that case, a more manual labeling mechanism must be utilized (e.g., manually searching through many images and providing feedback on specific items) until the model learns the subject. Alternatively, it is possible that setting a progressively lower detection threshold could surface useful candidates.