Applications
Applications process incoming media from 'input_subjects' and associate the media with outbound 'output_subjects'.
The attributes associated with an individual application include:
Name | Example | Description |
---|---|---|
application_id string | "di71rG94" | (read only) Unique application_id automatically assigned to each Application object upon creation |
tenant_id string | "2ijgkd9we8dksd" | (read only) Unique tenant_id automatically assigned to each tenant object upon creation |
name string | "Person detector" | Name should be brief and descriptive |
description string | "Find people walking into the front door" | A full description of the purpose of the application. Use this field to capture detailed subject definitions and any exceptional feedback instructions. |
type string | "classification" | (read only) Cogniac application type. See below for the valid types |
input_subjects array | ["flicker_cats", "animals_pics"] | Array of input subject IDs (note that input subjects must already be created and have a valid ID before being added to an Application) |
output_subjects array | ["cat", "dog", "face"] | List of subject tags corresponding to objects, patterns, or features that are of interest in this application |
release_metrics string | "best_F1" | The performance measure used to assess model performance. |
detection_thresholds dictionary | {"cat": 0.5, "dog": 0.7, "face": 0.5} | Map between subjects and their associated probability thresholds. Detections below a subject's threshold are not forwarded to subsequent applications (if any) and are not posted to the detection_post_urls (if any). |
custom_fields dictionary (Being deprecated, replaced by app_type_config) | { "min_px": 50, "max_px": 450, "stride_px": 20 } | Field values for application parameters that are unique to a particular application type. |
detection_post_urls array | ["http://127.0.0.1:9999/my_model_output.net", "https://detections-now.com/detections"] | A list of URLs where model detections will be surfaced in addition to the web and iOS interfaces. Posts are retried for thirty seconds; URLs that still fail after thirty seconds of retries are blacklisted for five minutes. |
gateway_post_urls array | ["http://127.0.0.1:9999/my_model_output.net", "https://detections-now.com/detections"] | A list of URLs where model detections will be surfaced from the gateway. Posts are retried for thirty seconds; URLs that still fail after thirty seconds of retries are blacklisted for five minutes. Specifying a gateway post URL implies that the gateway will implement the application along with any linked applications. |
active bool | true | Flag to control whether the application is active. Inactive applications do not process images submitted to their input subjects and do not request feedback. |
replay bool | false | Switch to turn on replay of the input subjects to the app. This is used to 'skim' from the pool of input subject images for the purpose of creating more consensus data. |
refresh_feedback bool | false | Flag to control whether the images waiting for user feedback should be re-evaluated by the new model when a new model is released. |
model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Current model in use by cloud inference. |
staging_gateway_model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Force the gateway to use a specific staging model. If a staging model is specified and the gateway is configured to use staging models, the gateway will download and use the specified model. If the gateway is configured to use staging models but no staging model is specified for a particular application, the gateway defaults to the latest staging model. |
production_gateway_model_id string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | Force the gateway to use a specific production model. If a production model is specified and the gateway is configured to use production models, the gateway will download and use the specified model. If the gateway is configured to use production models but no production model is specified for a particular application, the gateway defaults to the latest production model. |
app_managers array | ["user1@email.com", "user2@email.com"] | List of user email addresses. These users are given the app_manager role, which is authorized to manage application settings and maintain feedback control. |
system_feedback_per_hour integer | 48 | (read only) The current target number of feedback requests per hour for the application. By default this is determined automatically by the system based on the current model performance and the number of subject-media associations that have reached consensus. The user can override the system-selected value by setting the requested_feedback_per_hour configuration item to the desired feedback level. |
requested_feedback_per_hour integer | 50 | Override the target rate of feedback to surface per hour. A null value (the default) indicates the system-selected feedback rate should be used. Set a higher value to schedule more feedback requests, or a lower value to schedule fewer. |
hpo_credit integer | 10 | (read only) Accrued based on the amount of feedback given; 1 credit is good for 1 immediately prioritized hyperparameter optimization training run for this application. Training is scheduled giving priority to higher credit holders. |
created_at float | 1455044755 | (read only) Unix Timestamp |
modified_at float | 1455044770 | (read only) Unix Timestamp |
created_by string | "test@cogniac.co" | (read only) email address of the user who created this application |
current_performance float | 0.9 | _(read only)_ Performance of the current winning model, based on the current validation images, with respect to the release metric |
best_model_ccp_filename string | "Hpo-d-bf30-019ahMzliYY2KT4iWI-YN_mtsv1_4426.tgz" | _(read only)_ Filename of the current winning model |
last_candidate_at float | 1455044770 | _(read only)_ Timestamp when the last model was trained for this app |
last_released_at float | 1455044770 | _(read only)_ Timestamp when the last winning model was released for this app |
candidate_model_count int | 20 | _(read only)_ Number of models trained for this app |
release_model_count int | 5 | _(read only)_ Number of models released for this app |
training_data_count int | 750 | _(read only)_ Number of training images used by the current winning model |
validation_data_count int | 250 | _(read only)_ Number of validation images used to evaluate the current winning model's performance |
inference_execution_policies dict | { "replicas": 1, "max_batch": 8, "runtime_policy": { "rtc_timeout_seconds": 5, "model_seconds": 0, "model_load_policy": "realtime", "gpu_simul_load": 1, "gpu_selection_policy": "instance-ix" } } | Controls the tradeoffs among the number of simultaneous applications, application throughput, and GPU memory consumption on Cogniac EdgeFlow and CloudFlow instances. See Inference Execution Policies below for details. |
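For orientation, the sketch below creates an application with some of the writable fields above using Python's requests library. The base URL, endpoint path, and token handling are illustrative assumptions rather than the definitive interface; consult the API reference (or the Cogniac Python SDK) for the authoritative calls.

```python
import requests

API_BASE = "https://api.cogniac.io/1"  # assumed base URL, for illustration only
TOKEN = "YOUR_ACCESS_TOKEN"            # obtained via your normal auth flow

# Writable fields from the table above; read-only fields such as
# application_id, tenant_id, type, and created_at are assigned by the system.
new_app = {
    "name": "Person detector",
    "description": "Find people walking into the front door",
    "input_subjects": ["flicker_cats", "animals_pics"],  # must already exist
    "output_subjects": ["cat", "dog", "face"],
    "release_metrics": "best_F1",
    "detection_thresholds": {"cat": 0.5, "dog": 0.7, "face": 0.5},
    "active": True,
}

resp = requests.post(
    f"{API_BASE}/applications",  # endpoint path is an assumption
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=new_app,
)
resp.raise_for_status()
app = resp.json()
print(app["application_id"])  # read-only id assigned on creation
```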
Inference Execution Policies
Cogniac EdgeFlows and CloudFlow instances provide a high degree of flexibility for executing application models. Workloads can range from a single application to hundreds of simultaneous applications. Different EdgeFlow and CloudFlow models can contain from one GPU to dozens of GPUs in clustered environments.
The following controls are available on a per-application basis to tune the tradeoffs between the number of applications, application throughput, and GPU memory consumption.
replicas The number of instances of the application model that are simultaneously running. Running multiple instances of a model can increase the application throughput when multiple GPUs are available at the expense of more overall GPU memory consumption by the application's models. The default replicas is 1.
max_batch The number of media items that may be batched in a single model inference pass. Increasing the max_batch size may increase application throughput for smaller media items at the expense of higher latency. The default max_batch is 1.
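For example, a throughput-oriented configuration on a multi-GPU system might look like the following sketch; the specific values are illustrative, not recommendations.

```python
# Hypothetical throughput-oriented settings: two model replicas can serve
# media in parallel when two GPUs are available (at the cost of holding two
# copies of the model in GPU memory), and batches of up to 8 media items
# raise throughput for smaller media at the expense of some latency.
inference_execution_policies = {
    "replicas": 2,   # default is 1
    "max_batch": 8,  # default is 1
}
```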
runtime_policy specifies controls for managing the scenarios where not all applications (including all application replicas) can fit in GPU memory simultaneously, in which case models must be dynamically loaded and unloaded from GPU memory. Loading and unloading models from GPU memory is relatively slow (temporarily reducing throughput and increasing latency). Furthermore, in highly oversubscribed scenarios there can be large delays in acquiring a GPU with sufficient available memory. These policies allow the tradeoffs to be tuned for different usage patterns.
runtime_policy consists of the following:

- model_load_policy
- model_seconds
- rtc_timeout_seconds
- gpu_selection_policy
- gpu_simul_load

where
model_load_policy is one of "realtime", "timebound", or "run-to-completion". This controls the policy for UNLOADING a model from GPU memory. This policy is needed because there is a latency associated with loading a model into GPU memory, and a potentially even more significant latency in finding a GPU with sufficient available memory if the EdgeFlow is highly oversubscribed with respect to the number of applications relative to the amount of GPU memory. The default model_load_policy is "realtime".
With the "realtime" policy, the model is never unloaded (once successfully loaded). This provides the highest sustained throughput for applications that process constant media streams or that cannot otherwise absorb the latency, and potential uncertainty, of acquiring a GPU with sufficient available memory combined with the subsequent latency of loading the model into memory.
With the "timebound" policy, a model is unloaded model_seconds (int) after successfully loading. If model_seconds expires while an inference is in progress, the in-progress inference is allowed to complete. Thus, a "timebound" policy with model_seconds of 0 results in the model being removed immediately after processing a single media batch. The default model_seconds is 0.
With the "run-to-completion" policy, a model is unloaded rtc_timeout_seconds (float) after its input queue is emptied of input media items. This policy is helpful for periodic application input patterns where the inputs may be somewhat spread out in time. For example, a usage pattern that expects to receive 4 images every 20 seconds, with the 4 images arriving over several seconds, would be a good candidate for the "run-to-completion" policy. The default rtc_timeout_seconds is 5.
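To make the three unload policies concrete, here are sketch runtime_policy values for each, using the defaults and the periodic-input example described above; the dict shapes mirror the inference_execution_policies example in the attribute table.

```python
# "realtime" (the default): never unload once loaded; best for constant streams.
realtime_policy = {"model_load_policy": "realtime"}

# "timebound" with model_seconds 0: unload immediately after each media batch,
# freeing GPU memory between batches at the cost of reloading the model.
timebound_policy = {"model_load_policy": "timebound", "model_seconds": 0}

# "run-to-completion": keep the model loaded until the input queue has been
# empty for rtc_timeout_seconds, so a burst of 4 images arriving over several
# seconds is served by a single model load.
rtc_policy = {"model_load_policy": "run-to-completion", "rtc_timeout_seconds": 5.0}
```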
gpu_selection_policy is one of "instance-ix" or "by-free-memory". The default gpu_selection_policy is "by-free-memory".
When the "instance-ix" policy is selected, a model will be assigned to a GPU based on the replica index (0 to replicas - 1) modulo the total GPU count. This is more deterministic and most appropriate for realtime applications.
When "by-free-memory" policy is selected gpu_simul_load controls the number of models that are allowed to contend for a give GPU's available memory simultaneously. This is only relevant if there are many 10s of apps contending for each GPU. The default gpu_simul_load is 1.