diff --git a/docs/docs/01-fundamentals/01-getting-started.md b/docs/docs/01-fundamentals/01-getting-started.md index 7e7ea936f..9c70651b0 100644 --- a/docs/docs/01-fundamentals/01-getting-started.md +++ b/docs/docs/01-fundamentals/01-getting-started.md @@ -76,7 +76,7 @@ If you plan on using your models via require() instead of fetching them from a u This allows us to use binaries, such as exported models or tokenizers for LLMs. -:::caution +:::warning When using Expo, please note that you need to use a custom development build of your app, not the standard Expo Go app. This is because we rely on native modules, which Expo Go doesn’t support. ::: diff --git a/docs/docs/02-hooks/01-natural-language-processing/useLLM.md b/docs/docs/02-hooks/01-natural-language-processing/useLLM.md index 3f072f93c..2f704394a 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useLLM.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useLLM.md @@ -197,7 +197,7 @@ Sometimes, you might want to stop the model while it’s generating. To do this, There are also cases when you need to check if tokens are being generated, such as to conditionally render a stop button. We’ve made this easy with the `isGenerating` property. -:::caution +:::warning If you try to dismount the component using this hook while generation is still going on, it will result in crash. You'll need to interrupt the model first and wait until `isGenerating` is set to false. ::: diff --git a/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md b/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md index d94c96a66..807f24fa5 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md @@ -20,7 +20,7 @@ description: "Learn how to use speech-to-text models in your React Native applic Speech to text is a task that allows to transform spoken language to written text. It is commonly used to implement features such as transcription or voice assistants. :::warning -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny.en). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md b/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md index 7d4706f15..e123e6ce5 100644 --- a/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md +++ b/docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md @@ -17,8 +17,8 @@ description: "Learn how to use text embeddings models in your React Native appli Text Embedding is the process of converting text into a numerical representation. This representation can be used for various natural language processing tasks, such as semantic search, text classification, and clustering. 
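Since the supported text-embedding models return normalized vectors (as noted further down in this file), the dot product of two embeddings is their cosine similarity, which is all that semantic search needs. Below is a minimal sketch of that pattern; the `ALL_MINILM_L6_V2` constant, the hook's configuration props, and the `isReady` flag are assumptions here — the hook's Reference section documents the exact API.

```tsx
import { useTextEmbeddings, ALL_MINILM_L6_V2 } from 'react-native-executorch';

// For normalized vectors, the dot product equals the cosine similarity.
const cosineSimilarity = (a: number[], b: number[]) =>
  a.reduce((sum, value, i) => sum + value * b[i], 0);

function useSemanticSimilarity() {
  // Configuration prop names vary between versions — check the Reference section.
  const embeddings = useTextEmbeddings({ model: ALL_MINILM_L6_V2 });

  // Returns a similarity score in [-1, 1] for two pieces of text, or null if
  // the model is not loaded yet.
  return async (query: string, candidate: string) => {
    if (!embeddings.isReady) return null;
    const queryVector = await embeddings.forward(query);
    const candidateVector = await embeddings.forward(candidate);
    return cosineSimilarity(queryVector, candidateVector);
  };
}
```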
-:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-all-MiniLM-L6-v2). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-embeddings-68d0ed42f8ca0200d0283362). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -141,7 +141,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useClassification.md b/docs/docs/02-hooks/02-computer-vision/useClassification.md index eaf9afcb7..d0fd199b7 100644 --- a/docs/docs/02-hooks/02-computer-vision/useClassification.md +++ b/docs/docs/02-hooks/02-computer-vision/useClassification.md @@ -8,8 +8,8 @@ Image classification is the process of assigning a label to an image that best d Usually, the class with the highest probability is the one that is assigned to an image. However, if there are multiple classes with comparatively high probabilities, this may indicate that the model is not confident in its prediction. ::: -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-efficientnet-v2-s). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/classification-68d0ea49b5c7de8a3cae1e68). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -104,7 +104,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md b/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md index b6decd1d2..31d3d42fb 100644 --- a/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md +++ b/docs/docs/02-hooks/02-computer-vision/useImageEmbeddings.md @@ -18,8 +18,8 @@ description: "Learn how to use image embeddings models in your React Native appl Image Embedding is the process of converting an image into a numerical representation. This representation can be used for tasks, such as classification, clustering and (using contrastive learning like e.g. CLIP model) image search. 
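Image embeddings can be used the same way for image search: rank a small gallery against a query image by dot product. The sketch below assumes the `CLIP_VIT_BASE_PATCH32_IMAGE` constant, a `{ model: ... }` configuration prop, and a `forward(imageUri)` method returning `number[]`; treat these as placeholders and consult the hook's Reference section for the exact API.

```tsx
import {
  useImageEmbeddings,
  CLIP_VIT_BASE_PATCH32_IMAGE,
} from 'react-native-executorch';

// Normalized embeddings: dot product == cosine similarity.
const dot = (a: number[], b: number[]) =>
  a.reduce((sum, value, i) => sum + value * b[i], 0);

function useImageSearch(galleryUris: string[]) {
  // Configuration prop names are an assumption — check the Reference section.
  const embeddings = useImageEmbeddings({ model: CLIP_VIT_BASE_PATCH32_IMAGE });

  // Returns the gallery sorted from most to least similar to the query image.
  return async (queryUri: string) => {
    const queryVector = await embeddings.forward(queryUri);
    const scored: { uri: string; score: number }[] = [];
    for (const uri of galleryUris) {
      const vector = await embeddings.forward(uri);
      scored.push({ uri, score: dot(queryVector, vector) });
    }
    return scored.sort((a, b) => b.score - a.score);
  };
}
```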
-:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-clip-vit-base-patch32). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-embeddings-68d0eda599a9d37caaaf1ad0). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -119,7 +119,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also heavily depends on image size, because resize is expansive operation, especially on low-end devices. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md b/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md index 7fee70880..6631fc217 100644 --- a/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md +++ b/docs/docs/02-hooks/02-computer-vision/useImageSegmentation.md @@ -4,8 +4,8 @@ title: useImageSegmentation Semantic image segmentation, akin to image classification, tries to assign the content of the image to one of the predefined classes. However, in case of segmentation this classification is done on a per-pixel basis, so as the result the model provides an image-sized array of scores for each of the classes. You can then use this information to detect objects on a per-pixel basis. React Native ExecuTorch offers a dedicated hook `useImageSegmentation` for this task. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-segmentation-68d5291bdf4a30bee0220f4f), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -55,7 +55,7 @@ To run the model, you can use the `forward` method. It accepts three arguments: - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at [`DeeplabLabel`](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. 
The model runs inference on a scaled (probably smaller) version of your image (224x224 for `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: @@ -98,7 +98,7 @@ function App() { ### Memory usage -:::warning warning +:::warning Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. ::: @@ -108,7 +108,7 @@ Data presented in the following sections is based on inference with non-resized ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useOCR.md b/docs/docs/02-hooks/02-computer-vision/useOCR.md index d07efd601..b7395879c 100644 --- a/docs/docs/02-hooks/02-computer-vision/useOCR.md +++ b/docs/docs/02-hooks/02-computer-vision/useOCR.md @@ -4,8 +4,8 @@ title: useOCR Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -311,7 +311,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | -------------------------------------------- | -------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md b/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md index 2bae6a658..309d0e3ab 100644 --- a/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md +++ b/docs/docs/02-hooks/02-computer-vision/useObjectDetection.md @@ -5,8 +5,8 @@ title: useObjectDetection Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving. `useObjectDetection` is a hook that allows you to seamlessly integrate object detection into your React Native applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). 
You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/object-detection-68d0ea936cd0906843cbba7d). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -143,7 +143,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md b/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md index f5d0a423c..72f420668 100644 --- a/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md +++ b/docs/docs/02-hooks/02-computer-vision/useStyleTransfer.md @@ -4,8 +4,8 @@ title: useStyleTransfer Style transfer is a technique used in computer graphics and machine learning where the visual style of one image is applied to the content of another. This is achieved using algorithms that manipulate data from both images, typically with the aid of a neural network. The result is a new image that combines the artistic elements of one picture with the structural details of another, effectively merging art with traditional imagery. React Native ExecuTorch offers a dedicated hook `useStyleTransfer`, for this task. However before you start you'll need to obtain ExecuTorch-compatible model binary. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/style-transfer-68d0eab2b0767a20e7efeaf5), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -102,7 +102,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/02-computer-vision/useTextToImage.md b/docs/docs/02-hooks/02-computer-vision/useTextToImage.md index 3eaf7d826..540b1263f 100644 --- a/docs/docs/02-hooks/02-computer-vision/useTextToImage.md +++ b/docs/docs/02-hooks/02-computer-vision/useTextToImage.md @@ -9,7 +9,7 @@ Text-to-image is a process of generating images directly from a description in n :::warning -It is recommended to use models provided by us which are available at our Hugging Face repository, you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
+It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-to-image-68d0edf50ae6d20b5f9076cd), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md b/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md index f317d527e..4f64c5de8 100644 --- a/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md +++ b/docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md @@ -8,8 +8,8 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -326,7 +326,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ---------------------------------------------------- | --------------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md b/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md index 137b19d92..f9b9f21a9 100644 --- a/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md +++ b/docs/docs/02-hooks/03-executorch-bindings/useExecutorchModule.md @@ -4,7 +4,7 @@ title: useExecutorchModule useExecutorchModule provides React Native bindings to the ExecuTorch [Module API](https://pytorch.org/executorch/stable/extension-module.html) directly from JavaScript. -:::caution +:::warning These bindings are primarily intended for custom model integration where no dedicated hook exists. If you are considering using a provided model, first verify whether a dedicated hook is available. Dedicated hooks simplify the implementation process by managing necessary pre and post-processing automatically. 
Utilizing these can save you effort and reduce complexity, ensuring you do not implement additional handling that is already covered. ::: diff --git a/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md b/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md index 99deae014..89344c70a 100644 --- a/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md +++ b/docs/docs/03-typescript-api/02-computer-vision/ImageSegmentationModule.md @@ -63,7 +63,7 @@ To run the model, you can use the `forward` method on the module object. It acce - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at `DeeplabLabel` enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for the `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: diff --git a/docs/docs/04-benchmarks/inference-time.md b/docs/docs/04-benchmarks/inference-time.md index dbfc2b21d..5f24cb416 100644 --- a/docs/docs/04-benchmarks/inference-time.md +++ b/docs/docs/04-benchmarks/inference-time.md @@ -2,7 +2,7 @@ title: Inference Time --- -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: @@ -104,7 +104,17 @@ Benchmark times for text embeddings are highly dependent on the sentence length. Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time. ::: -## Text to Image +## Image Segmentation + +:::warning +Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
+::: + +| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | +| ----------------- | ---------------------------- | -------------------------------- | --------------------------------- | +| DEELABV3_RESNET50 | 1000 | 670 | 700 | + +## Text to image | Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | | --------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | diff --git a/docs/docs/04-benchmarks/memory-usage.md b/docs/docs/04-benchmarks/memory-usage.md index a0c5a7b6d..b20e04986 100644 --- a/docs/docs/04-benchmarks/memory-usage.md +++ b/docs/docs/04-benchmarks/memory-usage.md @@ -73,7 +73,17 @@ All the below benchmarks were performed on iPhone 17 Pro (iOS) and OnePlus 12 (A | --------------------------- | :--------------------: | :----------------: | | CLIP_VIT_BASE_PATCH32_IMAGE | 345 | 340 | -## Text to Image +## Image Segmentation + +:::warning +Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. +::: + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ----------------- | ---------------------- | ------------------ | +| DEELABV3_RESNET50 | 930 | 660 | + +## Text to image | Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | | --------------------- | ---------------------- | ------------------ | diff --git a/docs/docs/04-benchmarks/model-size.md b/docs/docs/04-benchmarks/model-size.md index 00e819494..81798954b 100644 --- a/docs/docs/04-benchmarks/model-size.md +++ b/docs/docs/04-benchmarks/model-size.md @@ -83,7 +83,13 @@ title: Model Size | --------------------------- | :----------: | | CLIP_VIT_BASE_PATCH32_IMAGE | 352 | -## Text to Image +## Image Segmentation + +| Model | XNNPACK [MB] | +| ----------------- | ------------ | +| DEELABV3_RESNET50 | 168 | + +## Text to image | Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] | | ----------------- | --------------------------- | ------------------- | -------------------------- | diff --git a/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md b/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md index 31dc81174..7ae4c9875 100644 --- a/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md +++ b/docs/versioned_docs/version-0.3.x/benchmarks/inference-time.md @@ -91,7 +91,7 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode ### Decoding -Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length. +Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length. 
| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | | -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | diff --git a/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md b/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md index f5d6d0113..4a4027b5d 100644 --- a/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md +++ b/docs/versioned_docs/version-0.4.x/benchmarks/inference-time.md @@ -92,7 +92,7 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode ### Decoding -Average time for decoding one token in sequence of 100 tokens, with encoding context is obtained from audio of noted length. +Average time for decoding one token in sequence of 100 tokens, with encoding context obtained from audio of noted length. | Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | | -------------------- | :--------------------------: | :--------------------------: | :------------------------: | :-------------------------------: | :-----------------------: | diff --git a/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md b/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md index b5d60c35b..18d845862 100644 --- a/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md +++ b/docs/versioned_docs/version-0.5.x/01-fundamentals/01-getting-started.md @@ -76,7 +76,7 @@ If you plan on using your models via require() instead of fetching them from a u This allows us to use binaries, such as exported models or tokenizers for LLMs. -:::caution +:::warning When using Expo, please note that you need to use a custom development build of your app, not the standard Expo Go app. This is because we rely on native modules, which Expo Go doesn’t support. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md index f639a6cf6..e49b1a8e4 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useLLM.md @@ -30,7 +30,7 @@ React Native ExecuTorch supports a variety of LLMs (checkout our [HuggingFace re Lower-end devices might not be able to fit LLMs into memory. We recommend using quantized models to reduce the memory footprint. ::: -:::caution +:::warning Up to version 0.5.3, our architecture was designed to support only one instance of the model runner at a time. As a consequence, only one active component could leverage `useLLM` concurrently. Starting with version 0.5.3, this limitation has been removed ::: @@ -199,7 +199,7 @@ Sometimes, you might want to stop the model while it’s generating. To do this, There are also cases when you need to check if tokens are being generated, such as to conditionally render a stop button. We’ve made this easy with the `isGenerating` property. -:::caution +:::warning If you try to dismount the component using this hook while generation is still going on, it will result in crash. You'll need to interrupt the model first and wait until `isGenerating` is set to false. 
::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md index d94c96a66..807f24fa5 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useSpeechToText.md @@ -20,7 +20,7 @@ description: "Learn how to use speech-to-text models in your React Native applic Speech to text is a task that allows to transform spoken language to written text. It is commonly used to implement features such as transcription or voice assistants. :::warning -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny.en). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md index fd595d208..23cc32438 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/01-natural-language-processing/useTextEmbeddings.md @@ -17,8 +17,8 @@ description: "Learn how to use text embeddings models in your React Native appli Text Embedding is the process of converting text into a numerical representation. This representation can be used for various natural language processing tasks, such as semantic search, text classification, and clustering. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-all-MiniLM-L6-v2). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-embeddings-68d0ed42f8ca0200d0283362). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -141,7 +141,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md index e17bfa775..8e18ab999 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useClassification.md @@ -8,8 +8,8 @@ Image classification is the process of assigning a label to an image that best d Usually, the class with the highest probability is the one that is assigned to an image. However, if there are multiple classes with comparatively high probabilities, this may indicate that the model is not confident in its prediction. ::: -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-efficientnet-v2-s). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/classification-68d0ea49b5c7de8a3cae1e68). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -104,7 +104,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md index 4d417590c..17f82502c 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageEmbeddings.md @@ -18,8 +18,8 @@ description: "Learn how to use image embeddings models in your React Native appl Image Embedding is the process of converting an image into a numerical representation. This representation can be used for tasks, such as classification, clustering and (using contrastive learning like e.g. CLIP model) image search. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-clip-vit-base-patch32). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-embeddings-68d0eda599a9d37caaaf1ad0). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -119,7 +119,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. 
Initial run times may be up to 2x longer due to model loading and initialization. Performance also heavily depends on image size, because resize is expansive operation, especially on low-end devices. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md index 7fee70880..6631fc217 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useImageSegmentation.md @@ -4,8 +4,8 @@ title: useImageSegmentation Semantic image segmentation, akin to image classification, tries to assign the content of the image to one of the predefined classes. However, in case of segmentation this classification is done on a per-pixel basis, so as the result the model provides an image-sized array of scores for each of the classes. You can then use this information to detect objects on a per-pixel basis. React Native ExecuTorch offers a dedicated hook `useImageSegmentation` for this task. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-segmentation-68d5291bdf4a30bee0220f4f), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -55,7 +55,7 @@ To run the model, you can use the `forward` method. It accepts three arguments: - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at [`DeeplabLabel`](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: @@ -98,7 +98,7 @@ function App() { ### Memory usage -:::warning warning +:::warning Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. ::: @@ -108,7 +108,7 @@ Data presented in the following sections is based on inference with non-resized ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md index 5a1e80cfc..13034e276 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useOCR.md @@ -4,8 +4,8 @@ title: useOCR Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -311,7 +311,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ----------------------------------------------- | ----------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md index 7f49e8389..cd53abffd 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useObjectDetection.md @@ -5,8 +5,8 @@ title: useObjectDetection Object detection is a computer vision technique that identifies and locates objects within images or video. It’s commonly used in applications like image recognition, video surveillance or autonomous driving. `useObjectDetection` is a hook that allows you to seamlessly integrate object detection into your React Native applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/object-detection-68d0ea936cd0906843cbba7d). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
::: ## Reference @@ -143,7 +143,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md index 2bedba325..00f08c0a9 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useStyleTransfer.md @@ -4,8 +4,8 @@ title: useStyleTransfer Style transfer is a technique used in computer graphics and machine learning where the visual style of one image is applied to the content of another. This is achieved using algorithms that manipulate data from both images, typically with the aid of a neural network. The result is a new image that combines the artistic elements of one picture with the structural details of another, effectively merging art with traditional imagery. React Native ExecuTorch offers a dedicated hook `useStyleTransfer`, for this task. However before you start you'll need to obtain ExecuTorch-compatible model binary. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/style-transfer-68d0eab2b0767a20e7efeaf5), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -102,7 +102,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md index 73c3fc108..87eb9d780 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/02-computer-vision/useVerticalOCR.md @@ -8,8 +8,8 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. 
-:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -326,7 +326,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ------------------------------------------------------- | ------------------------------------------------------------ | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md b/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md index 137b19d92..f9b9f21a9 100644 --- a/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md +++ b/docs/versioned_docs/version-0.5.x/02-hooks/03-executorch-bindings/useExecutorchModule.md @@ -4,7 +4,7 @@ title: useExecutorchModule useExecutorchModule provides React Native bindings to the ExecuTorch [Module API](https://pytorch.org/executorch/stable/extension-module.html) directly from JavaScript. -:::caution +:::warning These bindings are primarily intended for custom model integration where no dedicated hook exists. If you are considering using a provided model, first verify whether a dedicated hook is available. Dedicated hooks simplify the implementation process by managing necessary pre and post-processing automatically. Utilizing these can save you effort and reduce complexity, ensuring you do not implement additional handling that is already covered. ::: diff --git a/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md b/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md index 99deae014..89344c70a 100644 --- a/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md +++ b/docs/versioned_docs/version-0.5.x/03-typescript-api/02-computer-vision/ImageSegmentationModule.md @@ -63,7 +63,7 @@ To run the model, you can use the `forward` method on the module object. It acce - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at `DeeplabLabel` enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for the `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. 
::: diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md index 89f1f9de1..340d4fb72 100644 --- a/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md +++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md @@ -2,7 +2,7 @@ title: Inference Time --- -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: @@ -64,6 +64,16 @@ The values below represent the averages across all runs for the benchmark image. ❌ - Insufficient RAM. +### Streaming mode + +Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead. + +| Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] | +| ----------------------- | :-------------------------------------------: | :-------------------------------------------: | :-----------------------------------------: | :------------------------------------------------: | :----------------------------------------: | +| Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s | +| Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s | +| Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s | + ### Encoding Average time for encoding audio of given length over 10 runs. For `Whisper` model we only list 30 sec audio chunks since `Whisper` does not accept other lengths (for shorter audio the audio needs to be padded to 30sec with silence). @@ -104,6 +114,18 @@ Benchmark times for text embeddings are highly dependent on the sentence length. Image embedding benchmark times are measured using 224×224 pixel images, as required by the model. All input images, whether larger or smaller, are resized to 224×224 before processing. Resizing is typically fast for small images but may be noticeably slower for very large images, which can increase total inference time. ::: + +## Image Segmentation + +:::warning +Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
+::: + +| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 14 Pro Max (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | +| ----------------- | ---------------------------- | -------------------------------- | --------------------------------- | +| DEELABV3_RESNET50 | 1000 | 670 | 700 | + + ## Text to Image | Model | iPhone 17 Pro (XNNPACK) [ms] | iPhone 16 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] | diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md index a0c5a7b6d..042cecacc 100644 --- a/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md +++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/memory-usage.md @@ -71,6 +71,17 @@ All the below benchmarks were performed on iPhone 17 Pro (iOS) and OnePlus 12 (A | Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | | --------------------------- | :--------------------: | :----------------: | +| CLIP_VIT_BASE_PATCH32_IMAGE | 350 | 340 | + +## Image Segmentation + +:::warning +Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. +::: + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ----------------- | ---------------------- | ------------------ | +| DEELABV3_RESNET50 | 930 | 660 | | CLIP_VIT_BASE_PATCH32_IMAGE | 345 | 340 | ## Text to Image diff --git a/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md b/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md index 128cbd7fb..898166e49 100644 --- a/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md +++ b/docs/versioned_docs/version-0.5.x/04-benchmarks/model-size.md @@ -83,6 +83,12 @@ title: Model Size | --------------------------- | :----------: | | CLIP_VIT_BASE_PATCH32_IMAGE | 352 | +## Image Segmentation + +| Model | XNNPACK [MB] | +| ----------------- | ------------ | +| DEELABV3_RESNET50 | 168 | + ## Text to Image | Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] | diff --git a/docs/versioned_docs/version-0.6.0/01-fundamentals/01-getting-started.md b/docs/versioned_docs/version-0.6.0/01-fundamentals/01-getting-started.md index 4ac66bb7e..2bbbe8b55 100644 --- a/docs/versioned_docs/version-0.6.0/01-fundamentals/01-getting-started.md +++ b/docs/versioned_docs/version-0.6.0/01-fundamentals/01-getting-started.md @@ -75,7 +75,7 @@ If you plan on using your models via require() instead of fetching them from a u This allows us to use binaries, such as exported models or tokenizers for LLMs. -:::caution +:::warning When using Expo, please note that you need to use a custom development build of your app, not the standard Expo Go app. This is because we rely on native modules, which Expo Go doesn’t support. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useLLM.md b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useLLM.md index 3f072f93c..2f704394a 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useLLM.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useLLM.md @@ -197,7 +197,7 @@ Sometimes, you might want to stop the model while it’s generating. To do this, There are also cases when you need to check if tokens are being generated, such as to conditionally render a stop button. 
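A minimal sketch of the stop-button pattern described above is shown below. `isGenerating` and `interrupt` are the members discussed in this section; the model constant, configuration props, and the `response` string are assumptions — the hook's Reference section documents the exact shape for your version.

```tsx
import { Button, Text, View } from 'react-native';
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

function ChatScreen() {
  // The model constant and config props are placeholders — see the Reference section.
  const llm = useLLM({ model: LLAMA3_2_1B });

  return (
    <View>
      <Text>{llm.response}</Text>
      {/* Render the stop button only while tokens are being generated. */}
      {llm.isGenerating && (
        <Button title="Stop" onPress={() => llm.interrupt()} />
      )}
    </View>
  );
}
```

Before unmounting a screen that uses this hook, call `interrupt()` and wait for `isGenerating` to become `false`, as the warning below explains.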
We’ve made this easy with the `isGenerating` property. -:::caution +:::warning If you try to dismount the component using this hook while generation is still going on, it will result in crash. You'll need to interrupt the model first and wait until `isGenerating` is set to false. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useSpeechToText.md b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useSpeechToText.md index 86e59ee0a..1031b6c43 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useSpeechToText.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useSpeechToText.md @@ -20,7 +20,7 @@ description: "Learn how to use speech-to-text models in your React Native applic Speech to text is a task that allows to transform spoken language to written text. It is commonly used to implement features such as transcription or voice assistants. :::warning -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-whisper-tiny.en). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useTextEmbeddings.md b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useTextEmbeddings.md index 7d4706f15..dc045e06c 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useTextEmbeddings.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useTextEmbeddings.md @@ -17,8 +17,8 @@ description: "Learn how to use text embeddings models in your React Native appli Text Embedding is the process of converting text into a numerical representation. This representation can be used for various natural language processing tasks, such as semantic search, text classification, and clustering. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-all-MiniLM-L6-v2). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/text-embeddings-68d0ed42f8ca0200d0283362). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
::: ## Reference diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useVAD.md b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useVAD.md index b38fe8df0..417570732 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useVAD.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/01-natural-language-processing/useVAD.md @@ -4,8 +4,8 @@ title: useVAD Voice Activity Detection (VAD) is the task of analyzing an audio signal to identify time segments containing human speech, separating them from non-speech sections like silence and background noise. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-fsmn-vad). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/speech-to-text-68d0ec99ed794250491b8bbe). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useClassification.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useClassification.md index eaf9afcb7..d0fd199b7 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useClassification.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useClassification.md @@ -8,8 +8,8 @@ Image classification is the process of assigning a label to an image that best d Usually, the class with the highest probability is the one that is assigned to an image. However, if there are multiple classes with comparatively high probabilities, this may indicate that the model is not confident in its prediction. ::: -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-efficientnet-v2-s). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/classification-68d0ea49b5c7de8a3cae1e68). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -104,7 +104,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. 
::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageEmbeddings.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageEmbeddings.md index b6decd1d2..31d3d42fb 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageEmbeddings.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageEmbeddings.md @@ -18,8 +18,8 @@ description: "Learn how to use image embeddings models in your React Native appl Image Embedding is the process of converting an image into a numerical representation. This representation can be used for tasks, such as classification, clustering and (using contrastive learning like e.g. CLIP model) image search. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-clip-vit-base-patch32). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-embeddings-68d0eda599a9d37caaaf1ad0). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -119,7 +119,7 @@ For the supported models, the returned embedding vector is normalized, meaning t ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. Performance also heavily depends on image size, because resize is expansive operation, especially on low-end devices. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageSegmentation.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageSegmentation.md index 7fee70880..6631fc217 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageSegmentation.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useImageSegmentation.md @@ -4,8 +4,8 @@ title: useImageSegmentation Semantic image segmentation, akin to image classification, tries to assign the content of the image to one of the predefined classes. However, in case of segmentation this classification is done on a per-pixel basis, so as the result the model provides an image-sized array of scores for each of the classes. You can then use this information to detect objects on a per-pixel basis. React Native ExecuTorch offers a dedicated hook `useImageSegmentation` for this task. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
+:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/image-segmentation-68d5291bdf4a30bee0220f4f). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -55,7 +55,7 @@ To run the model, you can use the `forward` method. It accepts three arguments: - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at [`DeeplabLabel`](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/imageSegmentation.ts) enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: @@ -98,7 +98,7 @@ function App() { ### Memory usage -:::warning warning +:::warning Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. ::: @@ -108,7 +108,7 @@ Data presented in the following sections is based on inference with non-resized ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useOCR.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useOCR.md index 13021fd36..d491ed65b 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useOCR.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useOCR.md @@ -4,8 +4,8 @@ title: useOCR Optical character recognition(OCR) is a computer vision technique that detects and recognizes text within the image. It's commonly used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library.
::: ## Reference @@ -311,7 +311,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ----------------------------------------------- | ----------------------------------------------------- | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useObjectDetection.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useObjectDetection.md index 2bae6a658..4ec125827 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useObjectDetection.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useObjectDetection.md @@ -6,7 +6,7 @@ Object detection is a computer vision technique that identifies and locates obje `useObjectDetection` is a hook that allows you to seamlessly integrate object detection into your React Native applications. :::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-ssdlite320-mobilenet-v3-large). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/object-detection-68d0ea936cd0906843cbba7d). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -143,7 +143,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useStyleTransfer.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useStyleTransfer.md index f5d0a423c..72f420668 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useStyleTransfer.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useStyleTransfer.md @@ -4,8 +4,8 @@ title: useStyleTransfer Style transfer is a technique used in computer graphics and machine learning where the visual style of one image is applied to the content of another. This is achieved using algorithms that manipulate data from both images, typically with the aid of a neural network. The result is a new image that combines the artistic elements of one picture with the structural details of another, effectively merging art with traditional imagery. React Native ExecuTorch offers a dedicated hook `useStyleTransfer`, for this task. However before you start you'll need to obtain ExecuTorch-compatible model binary. -:::caution -It is recommended to use models provided by us which are available at our [Hugging Face repository](https://huggingface.co/software-mansion/react-native-executorch-style-transfer-candy), you can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. 
+:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/style-transfer-68d0eab2b0767a20e7efeaf5). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -102,7 +102,7 @@ function App() { ### Inference time -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useVerticalOCR.md b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useVerticalOCR.md index 6d6aa7990..88934fd2e 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useVerticalOCR.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/02-computer-vision/useVerticalOCR.md @@ -8,8 +8,8 @@ The `useVerticalOCR` hook is currently in an experimental phase. We appreciate f Optical Character Recognition (OCR) is a computer vision technique used to detect and recognize text within images. It is commonly utilized to convert a variety of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Traditionally, OCR technology has been optimized for recognizing horizontal text, and integrating support for vertical text recognition often requires significant additional effort from developers. To simplify this, we introduce `useVerticalOCR`, a tool designed to abstract the complexities of vertical text OCR, enabling seamless integration into your applications. -:::caution -It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/software-mansion). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. +:::warning +It is recommended to use models provided by us, which are available at our [Hugging Face repository](https://huggingface.co/collections/software-mansion/ocr-68d0eb320ae6d20b5f901ea9). You can also use [constants](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/constants/modelUrls.ts) shipped with our library. ::: ## Reference @@ -326,7 +326,7 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc | ------------------------------------------------------- | ------------------------------------------------------------ | | Original Image | Image with detected Text Boxes | -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
::: diff --git a/docs/versioned_docs/version-0.6.0/02-hooks/03-executorch-bindings/useExecutorchModule.md b/docs/versioned_docs/version-0.6.0/02-hooks/03-executorch-bindings/useExecutorchModule.md index 137b19d92..f9b9f21a9 100644 --- a/docs/versioned_docs/version-0.6.0/02-hooks/03-executorch-bindings/useExecutorchModule.md +++ b/docs/versioned_docs/version-0.6.0/02-hooks/03-executorch-bindings/useExecutorchModule.md @@ -4,7 +4,7 @@ title: useExecutorchModule useExecutorchModule provides React Native bindings to the ExecuTorch [Module API](https://pytorch.org/executorch/stable/extension-module.html) directly from JavaScript. -:::caution +:::warning These bindings are primarily intended for custom model integration where no dedicated hook exists. If you are considering using a provided model, first verify whether a dedicated hook is available. Dedicated hooks simplify the implementation process by managing necessary pre and post-processing automatically. Utilizing these can save you effort and reduce complexity, ensuring you do not implement additional handling that is already covered. ::: diff --git a/docs/versioned_docs/version-0.6.0/03-typescript-api/02-computer-vision/ImageSegmentationModule.md b/docs/versioned_docs/version-0.6.0/03-typescript-api/02-computer-vision/ImageSegmentationModule.md index 99deae014..89344c70a 100644 --- a/docs/versioned_docs/version-0.6.0/03-typescript-api/02-computer-vision/ImageSegmentationModule.md +++ b/docs/versioned_docs/version-0.6.0/03-typescript-api/02-computer-vision/ImageSegmentationModule.md @@ -63,7 +63,7 @@ To run the model, you can use the `forward` method on the module object. It acce - The `classesOfInterest` list contains classes for which to output the full results. By default the list is empty, and only the most probable classes are returned (essentially an arg max for each pixel). Look at `DeeplabLabel` enum for possible classes. - The `resize` flag says whether the output will be rescaled back to the size of the image you put in. The default is `false`. The model runs inference on a scaled (probably smaller) version of your image (224x224 for the `DEEPLAB_V3_RESNET50`). If you choose to resize, the output will be `number[]` of size `width * height` of your original image. -:::caution +:::warning Setting `resize` to true will make `forward` slower. ::: diff --git a/docs/versioned_docs/version-0.6.0/04-benchmarks/inference-time.md b/docs/versioned_docs/version-0.6.0/04-benchmarks/inference-time.md index dbfc2b21d..45607f468 100644 --- a/docs/versioned_docs/version-0.6.0/04-benchmarks/inference-time.md +++ b/docs/versioned_docs/version-0.6.0/04-benchmarks/inference-time.md @@ -2,7 +2,7 @@ title: Inference Time --- -:::warning warning +:::warning Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization. ::: @@ -64,6 +64,16 @@ The values below represent the averages across all runs for the benchmark image. ❌ - Insufficient RAM. +### Streaming mode + +Note that for the `Whisper` model, which takes 30-second audio chunks as input (shorter audio is automatically padded with silence to 30 seconds), `fast` mode has the lowest latency (the time from starting transcription to the first returned token, a consequence of the streaming algorithm) but the slowest generation speed. If this might be a problem for you, prefer `balanced` mode instead.
+ +| Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] | +| ----------------------- | :-------------------------------------------: | :-------------------------------------------: | :-----------------------------------------: | :------------------------------------------------: | :----------------------------------------: | +| Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s | +| Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s | +| Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s | + ### Encoding Average time for encoding audio of given length over 10 runs. For `Whisper` model we only list 30 sec audio chunks since `Whisper` does not accept other lengths (for shorter audio the audio needs to be padded to 30sec with silence). diff --git a/docs/versioned_docs/version-0.6.0/04-benchmarks/memory-usage.md b/docs/versioned_docs/version-0.6.0/04-benchmarks/memory-usage.md index a0c5a7b6d..f1aa638af 100644 --- a/docs/versioned_docs/version-0.6.0/04-benchmarks/memory-usage.md +++ b/docs/versioned_docs/version-0.6.0/04-benchmarks/memory-usage.md @@ -73,6 +73,16 @@ All the below benchmarks were performed on iPhone 17 Pro (iOS) and OnePlus 12 (A | --------------------------- | :--------------------: | :----------------: | | CLIP_VIT_BASE_PATCH32_IMAGE | 345 | 340 | +## Image Segmentation + +:::warning +Data presented in the following sections is based on inference with non-resized output. When resize is enabled, expect higher memory usage and inference time with higher resolutions. +::: + +| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | +| ----------------- | ---------------------- | ------------------ | +| DEEPLAB_V3_RESNET50 | 930 | 660 | + ## Text to Image | Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] | diff --git a/docs/versioned_docs/version-0.6.0/04-benchmarks/model-size.md b/docs/versioned_docs/version-0.6.0/04-benchmarks/model-size.md index 00e819494..c095757ba 100644 --- a/docs/versioned_docs/version-0.6.0/04-benchmarks/model-size.md +++ b/docs/versioned_docs/version-0.6.0/04-benchmarks/model-size.md @@ -83,6 +83,12 @@ title: Model Size | --------------------------- | :----------: | | CLIP_VIT_BASE_PATCH32_IMAGE | 352 | +## Image Segmentation + +| Model | XNNPACK [MB] | +| ----------------- | ------------ | +| DEEPLAB_V3_RESNET50 | 168 | + ## Text to Image | Model | Text encoder (XNNPACK) [MB] | UNet (XNNPACK) [MB] | VAE decoder (XNNPACK) [MB] |
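The `useImageSegmentation` and `ImageSegmentationModule` hunks above describe `forward` as taking an image source, a `classesOfInterest` list of `DeeplabLabel` values, and a `resize` flag, and they warn that resizing makes `forward` slower. Below is a minimal TypeScript sketch of that trade-off; the hook config key, the `DEEPLAB_V3_RESNET50` constant export, and the specific `DeeplabLabel` members are assumptions taken from the surrounding prose rather than a verified API.

```tsx
// Sketch only: names flagged as assumptions above may differ between library versions.
import {
  useImageSegmentation,
  DeeplabLabel,
  DEEPLAB_V3_RESNET50,
} from 'react-native-executorch';

export function useDogCatMasks() {
  // Assumed config key; check the useImageSegmentation reference for the exact shape.
  const segmentation = useImageSegmentation({ model: DEEPLAB_V3_RESNET50 });

  const run = async (imageUri: string) => {
    // forward(input, classesOfInterest, resize), as described in the hunks above:
    // full per-pixel score arrays come back only for the listed classes; everything
    // else is collapsed to the most probable class per pixel (an arg max).
    // Keeping resize=false returns results at the model's 224x224 working resolution,
    // which is faster; resize=true maps them back to width * height of the input.
    return segmentation.forward(imageUri, [DeeplabLabel.DOG, DeeplabLabel.CAT], false);
  };

  return run;
}
```

Enabling `resize` is the simplest way to get masks at the original resolution, at the cost noted in the warning above.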
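Similarly, the `useLLM` hunks repeat the warning that unmounting a component while `isGenerating` is still true leads to a crash. A small sketch of the guard that warning implies is shown below; `isGenerating` comes from the documentation, while the `interrupt()` method name and the model configuration key are assumptions to be checked against the `useLLM` reference.

```tsx
import React, { useEffect, useState } from 'react';
import { Button, View } from 'react-native';
// LLAMA3_2_1B stands in for whichever model constant you actually use - an assumption.
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

function ChatScreen({ onClose }: { onClose: () => void }) {
  const llm = useLLM({ model: LLAMA3_2_1B }); // assumed config shape
  const [closing, setClosing] = useState(false);

  const requestClose = () => {
    if (llm.isGenerating) {
      llm.interrupt(); // stop generation first; never unmount mid-generation
      setClosing(true);
    } else {
      onClose();
    }
  };

  useEffect(() => {
    // Leave the screen only once isGenerating has flipped back to false.
    if (closing && !llm.isGenerating) {
      onClose();
    }
  }, [closing, llm.isGenerating, onClose]);

  return (
    <View>
      <Button title={llm.isGenerating ? 'Stop and close' : 'Close'} onPress={requestClose} />
    </View>
  );
}

export default ChatScreen;
```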