carbon-typescript-sdk
0.2.53 • Public • Published

Connect external data to LLMs, no matter the source.

Table of Contents

Installation

npm i carbon-typescript-sdk
pnpm i carbon-typescript-sdk
yarn add carbon-typescript-sdk

Getting Started

import { Carbon } from "carbon-typescript-sdk";

// Generally this is done in the backend to avoid exposing the API key to the client

const carbonWithApiKey = new Carbon({
  apiKey: "API_KEY",
  customerId: "CUSTOMER_ID",
});

const accessToken = await carbonWithApiKey.auth.getAccessToken();

// Once an access token is obtained, it can be passed to the frontend
// and used to instantiate the SDK client without an API key

const carbon = new Carbon({
  accessToken: accessToken.data.access_token,
});

// use SDK as usual
const whiteLabeling = await carbon.auth.getWhiteLabeling();
// etc.

Reference

carbon.auth.getAccessToken

Get Access Token

🛠️ Usage

const getAccessTokenResponse = await carbon.auth.getAccessToken();

🔄 Return

TokenResponse

🌐 Endpoint

/auth/v1/access_token GET

🔙 Back to Table of Contents


carbon.auth.getWhiteLabeling

Returns whether or not the organization is white labeled and which integrations are white labeled

🛠️ Usage

const getWhiteLabelingResponse = await carbon.auth.getWhiteLabeling();

🔄 Return

WhiteLabelingResponse

🌐 Endpoint

/auth/v1/white_labeling GET

🔙 Back to Table of Contents


carbon.cRM.getAccount

Get Account

🛠️ Usage

const getAccountResponse = await carbon.cRM.getAccount({
  id: "id_example",
  dataSourceId: 1,
  includeRemoteData: false,
});

⚙️ Parameters

id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]

🔄 Return

Account

🌐 Endpoint

/integrations/data/crm/accounts/{id} GET

🔙 Back to Table of Contents


carbon.cRM.getAccounts

Get Accounts

🛠️ Usage

const getAccountsResponse = await carbon.cRM.getAccounts({
  data_source_id: 1,
  include_remote_data: false,
  order_dir: "asc",
  includes: [],
  order_by: "created_at",
});

⚙️ Parameters

data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: AccountFilters

🔄 Return

AccountResponse

🌐 Endpoint

/integrations/data/crm/accounts POST

🔙 Back to Table of Contents


carbon.cRM.getContact

Get Contact

🛠️ Usage

const getContactResponse = await carbon.cRM.getContact({
  id: "id_example",
  dataSourceId: 1,
  includeRemoteData: false,
});

⚙️ Parameters

id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]

🔄 Return

Contact

🌐 Endpoint

/integrations/data/crm/contacts/{id} GET

🔙 Back to Table of Contents


carbon.cRM.getContacts

Get Contacts

🛠️ Usage

const getContactsResponse = await carbon.cRM.getContacts({
  data_source_id: 1,
  include_remote_data: false,
  order_dir: "asc",
  includes: [],
  order_by: "created_at",
});

⚙️ Parameters

data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: ContactFilters

🔄 Return

ContactsResponse

🌐 Endpoint

/integrations/data/crm/contacts POST

🔙 Back to Table of Contents


carbon.cRM.getLead

Get Lead

🛠️ Usage

const getLeadResponse = await carbon.cRM.getLead({
  id: "id_example",
  dataSourceId: 1,
  includeRemoteData: false,
});

⚙️ Parameters

id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]

🔄 Return

Lead

🌐 Endpoint

/integrations/data/crm/leads/{id} GET

🔙 Back to Table of Contents


carbon.cRM.getLeads

Get Leads

🛠️ Usage

const getLeadsResponse = await carbon.cRM.getLeads({
  data_source_id: 1,
  include_remote_data: false,
  order_dir: "asc",
  includes: [],
  order_by: "created_at",
});

⚙️ Parameters

data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: LeadFilters

🔄 Return

LeadsResponse

🌐 Endpoint

/integrations/data/crm/leads POST

🔙 Back to Table of Contents


carbon.cRM.getOpportunities

Get Opportunities

🛠️ Usage

const getOpportunitiesResponse = await carbon.cRM.getOpportunities({
  data_source_id: 1,
  include_remote_data: false,
  order_dir: "asc",
  includes: [],
  order_by: "created_at",
});

⚙️ Parameters

data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]

🔄 Return

OpportunitiesResponse

🌐 Endpoint

/integrations/data/crm/opportunities POST

🔙 Back to Table of Contents


carbon.cRM.getOpportunity

Get Opportunity

🛠️ Usage

const getOpportunityResponse = await carbon.cRM.getOpportunity({
  id: "id_example",
  dataSourceId: 1,
  includeRemoteData: false,
});

⚙️ Parameters

id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]

🔄 Return

Opportunity

🌐 Endpoint

/integrations/data/crm/opportunities/{id} GET

🔙 Back to Table of Contents


carbon.dataSources.addTags

Add Data Source Tags

🛠️ Usage

const addTagsResponse = await carbon.dataSources.addTags({
  tags: {},
  data_source_id: 1,
});

⚙️ Parameters

tags: object
data_source_id: number

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/data_sources/tags/add POST

🔙 Back to Table of Contents


carbon.dataSources.query

Data Sources

🛠️ Usage

const queryResponse = await carbon.dataSources.query({
  order_by: "created_at",
  order_dir: "desc",
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir

🔄 Return

OrganizationUserDataSourceResponse

🌐 Endpoint

/data_sources POST

🔙 Back to Table of Contents


carbon.dataSources.queryUserDataSources

User Data Sources

🛠️ Usage

const queryUserDataSourcesResponse =
  await carbon.dataSources.queryUserDataSources({
    order_by: "created_at",
    order_dir: "desc",
  });

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir

🔄 Return

OrganizationUserDataSourceResponse

🌐 Endpoint

/user_data_sources POST

🔙 Back to Table of Contents


carbon.dataSources.removeTags

Remove Data Source Tags

🛠️ Usage

const removeTagsResponse = await carbon.dataSources.removeTags({
  data_source_id: 1,
  tags_to_remove: [],
  remove_all_tags: false,
});

⚙️ Parameters

data_source_id: number
tags_to_remove: string[]
remove_all_tags: boolean

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/data_sources/tags/remove POST

🔙 Back to Table of Contents


carbon.dataSources.revokeAccessToken

Revoke Access Token

🛠️ Usage

const revokeAccessTokenResponse = await carbon.dataSources.revokeAccessToken({
  data_source_id: 1,
});

⚙️ Parameters

data_source_id: number

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/revoke_access_token POST

🔙 Back to Table of Contents


carbon.embeddings.getDocuments

For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2 and tags are specified, tags is ignored. tags_v2 enables building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:

{
    "OR": [
        {
            "key": "subject",
            "value": "holy-bible",
            "negate": false
        },
        {
            "key": "person-of-interest",
            "value": "jesus christ",
            "negate": false
        },
        {
            "key": "genre",
            "value": "religion",
            "negate": true
        },
        {
            "AND": [
                {
                    "key": "subject",
                    "value": "tao-te-ching",
                    "negate": false
                },
                {
                    "key": "author",
                    "value": "lao-tzu",
                    "negate": false
                }
            ]
        }
    ]
}

In this case, files will be filtered such that:

  1. "subject" = "holy-bible" OR
  2. "person-of-interest" = "jesus christ" OR
  3. "genre" != "religion" OR
  4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to a depth of 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

  1. "key" is required and must be a string
  2. "value" is required and can be any or list[any]
  3. "negate" is optional and must be true or false. If present and true, the filter block is negated in the resulting query. It is false by default.
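
The typing rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: validateTagsV2 and its helpers are hypothetical names, and the way nesting is counted (the top-level array is level 1, each nested "OR"/"AND" array adds one) is an assumption, since the text only states the limit.

```typescript
// Sketch only: a client-side check of the tags_v2 rules described above.
// Assumption: the top-level "OR"/"AND" array counts as nesting level 1,
// and each nested "OR"/"AND" array adds one level.
const MAX_TAG_NESTING = 3;

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a list of human-readable problems; an empty list means the
// filter satisfies the rules stated in the text.
function validateTagsV2(filter: unknown): string[] {
  const errors: string[] = [];

  function visit(node: unknown, depth: number, path: string): void {
    if (!isPlainObject(node)) {
      errors.push(path + ": expected an object");
      return;
    }
    const op = "OR" in node ? "OR" : "AND" in node ? "AND" : undefined;
    if (op !== undefined) {
      // Logical group: check the depth, then validate every child.
      if (depth > MAX_TAG_NESTING) {
        errors.push(path + ": nested deeper than " + MAX_TAG_NESTING);
      }
      const children = node[op];
      if (!Array.isArray(children)) {
        errors.push(path + "." + op + ": expected an array");
        return;
      }
      children.forEach((child: unknown, i: number) =>
        visit(child, depth + 1, path + "." + op + "[" + i + "]")
      );
      return;
    }
    if (depth === 1) {
      errors.push(path + ': top level must be an "OR" or "AND" array');
      return;
    }
    // Tag block: "key" and "value" are required, "negate" is optional.
    if (typeof node["key"] !== "string") {
      errors.push(path + ".key: required and must be a string");
    }
    if (node["value"] === undefined) {
      errors.push(path + ".value: required");
    }
    const negate = node["negate"];
    if (negate !== undefined && typeof negate !== "boolean") {
      errors.push(path + ".negate: must be true or false");
    }
  }

  visit(filter, 1, "$");
  return errors;
}
```

A filter that passes this check can then be supplied as tags_v2; the server remains the authority on what is accepted.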

When querying embeddings, you can optionally specify the media_type parameter in your request. By default (if not set), it is equal to "TEXT". This means that the query will be performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query will be performed over image files (for now, .jpg and .png files). You can think of this field as an additional filter on top of any filters set in file_ids and

When hybrid_search is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval. By default, these search methods are weighted equally during the ranking process. To adjust the weight (or "importance") of each search method, you can use the hybrid_search_tuning_parameters property. The tuning parameters are:

  • weight_a: weight to assign to semantic search
  • weight_b: weight to assign to keyword search

You must ensure that sum(weight_a, weight_b,..., weight_n) for all n weights is equal to 1. The equality has an error tolerance of 0.001 to account for possible floating point issues.
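
The sum constraint can be checked locally before building the request. This is a small sketch under stated assumptions: sumsToOne is a hypothetical helper, not an SDK function, and treating the 0.001 tolerance as an inclusive bound is an assumption.

```typescript
// Tolerance taken from the text above; whether the bound is inclusive
// on the server is an assumption.
const WEIGHT_SUM_TOLERANCE = 0.001;

// Returns true when the given weights add up to 1 within the tolerance.
function sumsToOne(...weights: number[]): boolean {
  const total = weights.reduce((sum, w) => sum + w, 0);
  return Math.abs(total - 1) <= WEIGHT_SUM_TOLERANCE;
}

// Example tuning parameters: favour semantic search over keyword search.
const tuningParams = { weight_a: 0.7, weight_b: 0.3 };
const tuningParamsAreValid = sumsToOne(tuningParams.weight_a, tuningParams.weight_b);
```

If the check passes, tuningParams can be supplied as hybrid_search_tuning_parameters alongside hybrid_search: true.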

In order to use hybrid search for a customer across a set of documents, two flags need to be enabled:

  1. Use the /modify_user_configuration endpoint to enable sparse_vectors for the customer. The payload body for this request is below:
{
  "configuration_key_name": "sparse_vectors",
  "value": {
    "enabled": true
  }
}
  2. Make sure hybrid search is enabled for the documents across which you want to perform the search. For the /uploadfile endpoint, this can be done by setting the following query parameter: generate_sparse_vectors=true

Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated via the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.
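
The A/B/C/D example above can be modelled locally to make the selection rule concrete. This is an illustrative sketch only: IndexedFile and filesConsideredForQuery are hypothetical names, and the filtering itself happens on Carbon's side, not in the SDK.

```typescript
// The two text models named above, using the enum-style values from the text.
type TextEmbeddingModel = "OPENAI" | "COHERE_MULTILINGUAL_V3";

// Hypothetical local record of which model each file was embedded with.
interface IndexedFile {
  name: string;
  embeddingModel: TextEmbeddingModel;
}

// Mirrors the rule in the text: only files embedded with the requested
// model are considered, and OPENAI is the default when none is given.
function filesConsideredForQuery(
  files: IndexedFile[],
  embeddingModel: TextEmbeddingModel = "OPENAI"
): string[] {
  return files
    .filter((f) => f.embeddingModel === embeddingModel)
    .map((f) => f.name);
}

const sampleFiles: IndexedFile[] = [
  { name: "A", embeddingModel: "OPENAI" },
  { name: "B", embeddingModel: "OPENAI" },
  { name: "C", embeddingModel: "COHERE_MULTILINGUAL_V3" },
  { name: "D", embeddingModel: "COHERE_MULTILINGUAL_V3" },
];
```

Calling filesConsideredForQuery(sampleFiles) selects A and B, while passing "COHERE_MULTILINGUAL_V3" selects C and D, matching the example in the text.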

🛠️ Usage

const getDocumentsResponse = await carbon.embeddings.getDocuments({
  query: "query_example",
  k: 1,
  include_all_children: false,
  media_type: "TEXT",
  embedding_model: "OPENAI",
  include_file_level_metadata: false,
  high_accuracy: false,
  exclude_cold_storage_files: false,
});

⚙️ Parameters

query: string

Query for which to get related chunks and embeddings.

k: number

Number of related chunks to return.

tags: Record<string, Tags1>

A set of tags to limit the search to. Deprecated and may be removed in the future.

query_vector: number[]

Optional query vector for which to get related chunks and embeddings. It must have been generated by the same model used to generate the embeddings across which the search is being conducted. Cannot provide both query and query_vector.

file_ids: number[]

Optional list of file IDs to limit the search to

parent_file_ids: number[]

Optional list of parent file IDs to limit the search to. A parent file describes a file to which another file belongs (e.g. a folder)

include_all_children: boolean

Flag to control whether or not to include all children of filtered files in the embedding search.

tags_v2: object

A set of tags to limit the search to. Use this instead of tags, which is deprecated.

include_tags: boolean

Flag to control whether or not to include tags for each chunk in the response.

include_vectors: boolean

Flag to control whether or not to include embedding vectors in the response.

include_raw_file: boolean

Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response.

hybrid_search: boolean

Flag to control whether or not to perform hybrid search.

hybrid_search_tuning_parameters: HybridSearchTuningParamsNullable
embedding_model: EmbeddingGeneratorsNullable
include_file_level_metadata: boolean

Flag to control whether or not to include file-level metadata in the response. This metadata will be included in the content_metadata field of each document along with chunk/embedding level metadata.

high_accuracy: boolean

Flag to control whether or not to perform a high accuracy embedding search. By default, this is set to false. If true, the search may return more accurate results, but may take longer to complete.

file_types_at_source: AutoSyncedSourceTypesPropertyInner[]

Filter files based on their type at the source (for example help center tickets and articles)

exclude_cold_storage_files: boolean

Flag to control whether or not to exclude files that are not in hot storage. If set to false, then an error will be returned if any filtered files are in cold storage.

🔄 Return

DocumentResponseList

🌐 Endpoint

/embeddings POST

🔙 Back to Table of Contents


carbon.embeddings.getEmbeddingsAndChunks

Retrieve Embeddings And Content

🛠️ Usage

const getEmbeddingsAndChunksResponse =
  await carbon.embeddings.getEmbeddingsAndChunks({
    order_by: "created_at",
    order_dir: "desc",
    filters: {
      user_file_id: 1,
      embedding_model: "OPENAI",
    },
    include_vectors: false,
  });

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir
include_vectors: boolean

🔄 Return

EmbeddingsAndChunksResponse

🌐 Endpoint

/text_chunks POST

🔙 Back to Table of Contents


carbon.embeddings.list

Retrieve Embeddings And Content V2

🛠️ Usage

const listResponse = await carbon.embeddings.list({
  order_by: "created_at",
  order_dir: "desc",
  filters: {
    include_all_children: false,
    non_synced_only: false,
  },
  include_vectors: false,
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir
include_vectors: boolean

🔄 Return

EmbeddingsAndChunksResponse

🌐 Endpoint

/list_chunks_and_embeddings POST

🔙 Back to Table of Contents


carbon.embeddings.uploadChunksAndEmbeddings

Upload Chunks And Embeddings

🛠️ Usage

const uploadChunksAndEmbeddingsResponse =
  await carbon.embeddings.uploadChunksAndEmbeddings({
    embedding_model: "OPENAI",
    chunks_and_embeddings: [
      {
        file_id: 1,
        chunks_and_embeddings: [
          {
            chunk_number: 1,
            chunk: "chunk_example",
          },
        ],
      },
    ],
    overwrite_existing: false,
    chunks_only: false,
  });

⚙️ Parameters

embedding_model: EmbeddingGenerators
chunks_and_embeddings: SingleChunksAndEmbeddingsUploadInput[]
overwrite_existing: boolean
chunks_only: boolean
custom_credentials: { [key: string]: object; }

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/upload_chunks_and_embeddings POST

🔙 Back to Table of Contents


carbon.files.createUserFileTags

A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used:

  • db_embedding_id
  • organization_id
  • user_id
  • organization_user_file_id

Carbon currently supports two data types for tag values: string and list<string>. Keys can only be strings. If values other than string and list<string> are used, they're automatically converted to strings (e.g. 4 will become "4").
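
To avoid relying on the automatic conversion, tags can be normalized before they are sent. The sketch below is a hypothetical helper, not part of the SDK. It rejects the reserved keys listed above and converts scalar values with String(); converting each list element individually is an assumption about how list values should be handled.

```typescript
// Reserved keys copied from the list above.
const RESERVED_TAG_KEYS = [
  "db_embedding_id",
  "organization_id",
  "user_id",
  "organization_user_file_id",
];

type TagScalar = string | number | boolean;
type RawTagValue = TagScalar | TagScalar[];
type NormalizedTagValue = string | string[];

// Produces tags whose values are only string or string[].
// Throws if a reserved key is used.
function normalizeTags(
  tags: Record<string, RawTagValue>
): Record<string, NormalizedTagValue> {
  const out: Record<string, NormalizedTagValue> = {};
  for (const key of Object.keys(tags)) {
    if (RESERVED_TAG_KEYS.indexOf(key) !== -1) {
      throw new Error('"' + key + '" is a reserved tag key');
    }
    const value = tags[key];
    if (value === undefined) continue;
    // Assumption: list elements are converted one by one.
    out[key] = Array.isArray(value)
      ? value.map((v) => String(v))
      : String(value);
  }
  return out;
}
```

The result can be passed as the tags argument of carbon.files.createUserFileTags.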

🛠️ Usage

const createUserFileTagsResponse = await carbon.files.createUserFileTags({
  tags: {
    key: "string_example",
  },
  organization_user_file_id: 1,
});

⚙️ Parameters

tags: Record<string, Tags1>
organization_user_file_id: number

🔄 Return

UserFile

🌐 Endpoint

/create_user_file_tags POST

🔙 Back to Table of Contents


carbon.files.deleteFileTags

Delete File Tags

🛠️ Usage

const deleteFileTagsResponse = await carbon.files.deleteFileTags({
  tags: ["tags_example"],
  organization_user_file_id: 1,
});

⚙️ Parameters

tags: string[]
organization_user_file_id: number

🔄 Return

UserFile

🌐 Endpoint

/delete_user_file_tags POST

🔙 Back to Table of Contents


carbon.files.deleteMany

Deprecated

Delete Files Endpoint

🛠️ Usage

const deleteManyResponse = await carbon.files.deleteMany({
  delete_non_synced_only: false,
  send_webhook: false,
  delete_child_files: false,
});

⚙️ Parameters

file_ids: number[]
sync_statuses: ExternalFileSyncStatuses[]
delete_non_synced_only: boolean
send_webhook: boolean
delete_child_files: boolean

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_files POST

🔙 Back to Table of Contents


carbon.files.deleteV2

Delete Files V2 Endpoint

🛠️ Usage

const deleteV2Response = await carbon.files.deleteV2({
  send_webhook: false,
  preserve_file_record: false,
});

⚙️ Parameters

send_webhook: boolean
preserve_file_record: boolean

Whether to delete all data related to the file from the database while preserving the file metadata, which allows for resyncs. By default preserve_file_record is false, which means that all data related to the file, as well as its metadata, will be deleted. Note that even if preserve_file_record is true, raw files uploaded via the uploadfile endpoint still cannot be resynced.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_files_v2 POST

🔙 Back to Table of Contents


carbon.files.getParsedFile

Deprecated

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

const getParsedFileResponse = await carbon.files.getParsedFile({
  fileId: 1,
});

⚙️ Parameters

fileId: number

🔄 Return

PresignedURLResponse

🌐 Endpoint

/parsed_file/{file_id} GET

🔙 Back to Table of Contents


carbon.files.getRawFile

Deprecated

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

const getRawFileResponse = await carbon.files.getRawFile({
  fileId: 1,
});

⚙️ Parameters

fileId: number

🔄 Return

PresignedURLResponse

🌐 Endpoint

/raw_file/{file_id} GET

🔙 Back to Table of Contents


carbon.files.modifyColdStorageParameters

Modify Cold Storage Parameters

🛠️ Usage

const modifyColdStorageParametersResponse =
  await carbon.files.modifyColdStorageParameters({});

⚙️ Parameters

enable_cold_storage: boolean
hot_storage_time_to_live: number

🌐 Endpoint

/modify_cold_storage_parameters POST

🔙 Back to Table of Contents


carbon.files.moveToHotStorage

Move To Hot Storage

🛠️ Usage

const moveToHotStorageResponse = await carbon.files.moveToHotStorage({});

⚙️ Parameters

🌐 Endpoint

/move_to_hot_storage POST

🔙 Back to Table of Contents


carbon.files.queryUserFiles

For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2 and tags are specified, tags is ignored. tags_v2 enables building complex filters through the use of "AND", "OR", and negation logic. Take the below input as an example:

{
    "OR": [
        {
            "key": "subject",
            "value": "holy-bible",
            "negate": false
        },
        {
            "key": "person-of-interest",
            "value": "jesus christ",
            "negate": false
        },
        {
            "key": "genre",
            "value": "religion",
            "negate": true
        },
        {
            "AND": [
                {
                    "key": "subject",
                    "value": "tao-te-ching",
                    "negate": false
                },
                {
                    "key": "author",
                    "value": "lao-tzu",
                    "negate": false
                }
            ]
        }
    ]
}

In this case, files will be filtered such that:

  1. "subject" = "holy-bible" OR
  2. "person-of-interest" = "jesus christ" OR
  3. "genre" != "religion" OR
  4. "subject" = "tao-te-ching" AND "author" = "lao-tzu"

Note that the top level of the query must be either an "OR" or "AND" array. Currently, nesting is limited to a depth of 3. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:

  1. "key" is required and must be a string
  2. "value" is required and can be any or list[any]
  3. "negate" is optional and must be true or false. If present and true, the filter block is negated in the resulting query. It is false by default.
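
To make the filter semantics concrete, the sketch below evaluates a tags_v2 filter against one file's tags in memory. It is an illustration, not Carbon's implementation. The type names and functions are hypothetical, and two behaviours are assumptions: a list value matches when any element matches, and a negated block matches a file that lacks the key.

```typescript
// Local illustration types; they are not exported by the SDK.
type TagBlock = { key: string; value: unknown; negate?: boolean };
type TagGroup = { OR: TagNode[] } | { AND: TagNode[] };
type TagNode = TagBlock | TagGroup;
type FileTags = Record<string, string | string[]>;

function matchesBlock(tags: FileTags, block: TagBlock): boolean {
  const actual = tags[block.key];
  const held: string[] =
    actual === undefined ? [] : Array.isArray(actual) ? actual : [actual];
  const wanted: unknown[] = Array.isArray(block.value)
    ? block.value
    : [block.value];
  // Assumption: any overlap between held and wanted values is a match.
  const hit = held.some((h) => wanted.some((w) => w === h));
  return block.negate === true ? !hit : hit;
}

function matchesFilter(tags: FileTags, node: TagNode): boolean {
  if ("OR" in node) return node.OR.some((child) => matchesFilter(tags, child));
  if ("AND" in node) return node.AND.every((child) => matchesFilter(tags, child));
  return matchesBlock(tags, node);
}

// The example filter from the text.
const exampleFilter: TagNode = {
  OR: [
    { key: "subject", value: "holy-bible", negate: false },
    { key: "person-of-interest", value: "jesus christ", negate: false },
    { key: "genre", value: "religion", negate: true },
    {
      AND: [
        { key: "subject", value: "tao-te-ching", negate: false },
        { key: "author", value: "lao-tzu", negate: false },
      ],
    },
  ],
};
```

Under these assumptions, a file tagged with genre "religion" only passes if one of the other three conditions holds, for example subject "tao-te-ching" together with author "lao-tzu".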

🛠️ Usage

const queryUserFilesResponse = await carbon.files.queryUserFiles({
  order_by: "created_at",
  order_dir: "desc",
  presigned_url_expiry_time_seconds: 3600,
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir
include_raw_file: boolean

If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.

include_parsed_text_file: boolean

If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.

include_additional_files: boolean

If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.

presigned_url_expiry_time_seconds: number

The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.

🔄 Return

UserFilesV2

🌐 Endpoint

/user_files_v2 POST

🔙 Back to Table of Contents


carbon.files.queryUserFilesDeprecated

Deprecated

This route is deprecated. Use /user_files_v2 instead.

🛠️ Usage

const queryUserFilesDeprecatedResponse =
  await carbon.files.queryUserFilesDeprecated({
    order_by: "created_at",
    order_dir: "desc",
    presigned_url_expiry_time_seconds: 3600,
  });

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir
include_raw_file: boolean

If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.

include_parsed_text_file: boolean

If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.

include_additional_files: boolean

If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.

presigned_url_expiry_time_seconds: number

The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.

🔄 Return

UserFile

🌐 Endpoint

/user_files POST

🔙 Back to Table of Contents


carbon.files.resync

Resync File

🛠️ Usage

const resyncResponse = await carbon.files.resync({
  file_id: 1,
  force_embedding_generation: false,
  skip_file_processing: false,
});

⚙️ Parameters

file_id: number
chunk_size: number
chunk_overlap: number
force_embedding_generation: boolean
skip_file_processing: boolean

🔄 Return

UserFile

🌐 Endpoint

/resync_file POST

🔙 Back to Table of Contents


carbon.files.upload

This endpoint is used to directly upload local files to Carbon. The POST request should be a multipart form request. Note that the set_page_as_boundary query parameter is applicable only to PDFs for now. When this value is set, PDF chunks are at most one page long. However, additional information can then be retrieved for each chunk, namely the coordinates of the bounding box around the chunk (this can be used for things like text highlighting). The following is a description of all possible query parameters:

  • chunk_size: the chunk size (in tokens) applied when splitting the document
  • chunk_overlap: the chunk overlap (in tokens) applied when splitting the document
  • skip_embedding_generation: whether or not to skip the generation of chunks and embeddings
  • set_page_as_boundary: described above
  • embedding_model: the model used to generate embeddings for the document chunks
  • use_ocr: whether or not to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
  • generate_sparse_vectors: whether or not to generate sparse vectors for the file. Required for hybrid search.
  • prepend_filename_to_chunks: whether or not to prepend the filename to the chunk text
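
When calling /uploadfile directly rather than through the SDK, the parameters above go in the query string. The sketch below shows one way to encode them. UploadQuery and toQueryString are hypothetical names, and the SDK's upload method normally handles this for you.

```typescript
// Query parameters named in the list above; all are optional here.
type UploadQuery = {
  chunk_size?: number;
  chunk_overlap?: number;
  skip_embedding_generation?: boolean;
  set_page_as_boundary?: boolean;
  embedding_model?: string;
  use_ocr?: boolean;
  generate_sparse_vectors?: boolean;
  prepend_filename_to_chunks?: boolean;
};

// Encodes the defined parameters as key=value pairs joined by "&".
function toQueryString(params: UploadQuery): string {
  const pairs: string[] = [];
  const keys = Object.keys(params) as Array<keyof UploadQuery>;
  for (const key of keys) {
    const value = params[key];
    if (value === undefined) continue; // skip unset parameters
    pairs.push(
      encodeURIComponent(key) + "=" + encodeURIComponent(String(value))
    );
  }
  return pairs.join("&");
}

// Example: enable sparse vectors so the file is a candidate for hybrid search.
const uploadQueryString = toQueryString({
  chunk_size: 512,
  generate_sparse_vectors: true,
});
```

The resulting string would be appended to the /uploadfile URL after a "?", with the file itself sent as multipart form data.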

Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated via the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.

🛠️ Usage

import * as fs from "fs";

const uploadResponse = await carbon.files.upload({
  skipEmbeddingGeneration: false,
  setPageAsBoundary: false,
  useOcr: false,
  generateSparseVectors: false,
  prependFilenameToChunks: false,
  parsePdfTablesWithOcr: false,
  detectAudioLanguage: false,
  transcriptionService: "assemblyai",
  includeSpeakerLabels: false,
  mediaType: "TEXT",
  splitRows: false,
  enableColdStorage: false,
  generateChunksOnly: false,
  storeFileOnly: false,
  file: fs.readFileSync("/path/to/file"),
});

⚙️ Parameters

file: Uint8Array | File | buffer.File
chunkSize: number

Chunk size in tiktoken tokens to be used when processing file.

chunkOverlap: number

Chunk overlap in tiktoken tokens to be used when processing file.

skipEmbeddingGeneration: boolean

Flag to control whether or not embeddings should be generated and stored when processing file.

setPageAsBoundary: boolean

Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.

embeddingModel: EmbeddingModel

Embedding model that will be used to embed file chunks.

useOcr: boolean

Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.

generateSparseVectors: boolean

Whether or not to generate sparse vectors for the file. This is required for the file to be a candidate for hybrid search.

prependFilenameToChunks: boolean

Whether or not to prepend the file's name to chunks.

maxItemsPerChunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

parsePdfTablesWithOcr: boolean

Whether to use rich table parsing when use_ocr is enabled.

detectAudioLanguage: boolean

Whether to automatically detect the language of the uploaded audio file.

transcriptionService: TranscriptionServiceNullable

The transcription service to use for audio files. If no service is specified, 'deepgram' will be used.

includeSpeakerLabels: boolean

Detect multiple speakers and label segments of speech by speaker for audio files.

mediaType

The media type of the file. If not provided, it will be inferred from the file extension.

splitRows: boolean

Whether to split tabular rows into chunks. Currently only valid for CSV, TSV, and XLSX files.

enableColdStorage: boolean

Enable cold storage for the file. If set to true, the file will be moved to cold storage after a certain period of inactivity. Default is false.

hotStorageTimeToLive: number

Time in days after which the file will be moved to cold storage. Must be one of [1, 3, 7, 14, 30].

generateChunksOnly: boolean

If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.

storeFileOnly: boolean

If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
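
The hotStorageTimeToLive parameter above only accepts a fixed set of values, which can be checked before uploading. This is a minimal sketch; isValidHotStorageTtl is a hypothetical helper, not an SDK function.

```typescript
// Allowed values in days, copied from the hotStorageTimeToLive description.
const HOT_STORAGE_TTL_DAYS: number[] = [1, 3, 7, 14, 30];

// Returns true when the value is one of the allowed day counts.
function isValidHotStorageTtl(days: number): boolean {
  return HOT_STORAGE_TTL_DAYS.indexOf(days) !== -1;
}
```

A caller might run this check before passing enableColdStorage: true together with hotStorageTimeToLive to carbon.files.upload.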

🔄 Return

UserFile

🌐 Endpoint

/uploadfile POST

🔙 Back to Table of Contents


carbon.files.uploadFromUrl

Create Upload File From Url

🛠️ Usage

const uploadFromUrlResponse = await carbon.files.uploadFromUrl({
  url: "url_example",
  skip_embedding_generation: false,
  set_page_as_boundary: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  use_textract: false,
  prepend_filename_to_chunks: false,
  parse_pdf_tables_with_ocr: false,
  detect_audio_language: false,
  transcription_service: "assemblyai",
  include_speaker_labels: false,
  media_type: "TEXT",
  split_rows: false,
  generate_chunks_only: false,
  store_file_only: false,
});

⚙️ Parameters

url: string
file_name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
set_page_as_boundary: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
use_textract: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

parse_pdf_tables_with_ocr: boolean
detect_audio_language: boolean
transcription_service: TranscriptionServiceNullable
include_speaker_labels: boolean
split_rows: boolean
cold_storage_params: ColdStorageProps
generate_chunks_only: boolean

If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.

store_file_only: boolean

If this flag is enabled, the file will be stored with Carbon, but no processing will be done.

🔄 Return

UserFile

🌐 Endpoint

/upload_file_from_url POST

🔙 Back to Table of Contents


carbon.files.uploadText

Carbon supports multiple models for use in generating embeddings for files. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter in /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered. For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default, queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, then only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated via the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model. This model is used automatically by Carbon when it detects an image file.

🛠️ Usage

const uploadTextResponse = await carbon.files.uploadText({
  contents: "contents_example",
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  generate_chunks_only: false,
  store_file_only: false,
});

⚙️ Parameters

contents: string
name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
overwrite_file_id: number
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
cold_storage_params: ColdStorageProps
generate_chunks_only: boolean

If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.

store_file_only: boolean

If this flag is enabled, the file will be stored with Carbon, but no processing will be done.

🔄 Return

UserFile

🌐 Endpoint

/upload_text POST

🔙 Back to Table of Contents


carbon.github.getIssue

Issue

🛠️ Usage

const getIssueResponse = await carbon.github.getIssue({
  issueNumber: 1,
  includeRemoteData: false,
});

⚙️ Parameters

issueNumber: number
includeRemoteData: boolean
dataSourceId: number
repository: string

🔄 Return

Issue

🌐 Endpoint

/integrations/data/github/issues/{issue_number} GET

🔙 Back to Table of Contents


carbon.github.getIssues

Issues

🛠️ Usage

const getIssuesResponse = await carbon.github.getIssues({
  data_source_id: 1,
  include_remote_data: false,
  repository: "repository_example",
  page: 1,
  page_size: 30,
  order_by: "created",
  order_dir: "asc",
});

⚙️ Parameters

data_source_id: number
repository: string

Full name of the repository, denoted as {owner}/{repo}

include_remote_data: boolean
page: number
page_size: number
next_cursor: string
filters: IssuesFilter
order_by: IssuesOrderBy
order_dir: OrderDirV2Nullable
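As a minimal sketch, the repository parameter's {owner}/{repo} format can be assembled with a small helper. repositoryFullName is a hypothetical function, not part of the SDK, and the owner and repository names used with it below are placeholders.

```typescript
// Hypothetical helper (not part of the SDK): builds the {owner}/{repo}
// full name expected by the repository parameter.
function repositoryFullName(owner: string, repo: string): string {
  for (const part of [owner, repo]) {
    // Reject empty segments and segments that would add extra slashes.
    if (part.length === 0 || part.indexOf("/") !== -1) {
      throw new Error("owner and repo must be non-empty and must not contain '/'");
    }
  }
  return owner + "/" + repo;
}
```

The result could then be passed as repository, for example repositoryFullName("octocat", "hello-world").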

🔄 Return

IssuesResponse

🌐 Endpoint

/integrations/data/github/issues POST

🔙 Back to Table of Contents


carbon.github.getPr

Get Pr

🛠️ Usage

const getPrResponse = await carbon.github.getPr({
  pullNumber: 1,
  includeRemoteData: false,
});

⚙️ Parameters

pullNumber: number
includeRemoteData: boolean
dataSourceId: number
repository: string

🔄 Return

PullRequestExtended

🌐 Endpoint

/integrations/data/github/pull_requests/{pull_number} GET

🔙 Back to Table of Contents


carbon.github.getPrComments

Pr Comments

🛠️ Usage

const getPrCommentsResponse = await carbon.github.getPrComments({
  data_source_id: 1,
  include_remote_data: false,
  repository: "repository_example",
  page: 1,
  page_size: 30,
  pull_number: 1,
  order_by: "created",
  order_dir: "asc",
});

⚙️ Parameters

data_source_id: number
repository: string

Full name of the repository, denoted as {owner}/{repo}

pull_number: number
include_remote_data: boolean
page: number
page_size: number
next_cursor: string
order_by: CommentsOrderBy
order_dir: OrderDirV2Nullable

🔄 Return

CommentsResponse

🌐 Endpoint

/integrations/data/github/pull_requests/comments POST

🔙 Back to Table of Contents


carbon.github.getPrCommits

Pr Commits

🛠️ Usage

const getPrCommitsResponse = await carbon.github.getPrCommits({
  data_source_id: 1,
  include_remote_data: false,
  repository: "repository_example",
  page: 1,
  page_size: 30,
  pull_number: 1,
});

⚙️ Parameters

data_source_id: number
repository: string

Full name of the repository, denoted as {owner}/{repo}

pull_number: number
include_remote_data: boolean
page: number
page_size: number
next_cursor: string

🔄 Return

CommitsResponse

🌐 Endpoint

/integrations/data/github/pull_requests/commits POST

🔙 Back to Table of Contents


carbon.github.getPrFiles

Pr Files

🛠️ Usage

const getPrFilesResponse = await carbon.github.getPrFiles({
  data_source_id: 1,
  include_remote_data: false,
  repository: "repository_example",
  page: 1,
  page_size: 30,
  pull_number: 1,
});

⚙️ Parameters

data_source_id: number
repository: string

Full name of the repository, denoted as {owner}/{repo}

pull_number: number
include_remote_data: boolean
page: number
page_size: number
next_cursor: string

🔄 Return

FilesResponse

🌐 Endpoint

/integrations/data/github/pull_requests/files POST

🔙 Back to Table of Contents


carbon.github.getPullRequests

Get Prs

🛠️ Usage

const getPullRequestsResponse = await carbon.github.getPullRequests({
  data_source_id: 1,
  include_remote_data: false,
  repository: "repository_example",
  page: 1,
  page_size: 30,
  order_by: "created",
  order_dir: "asc",
});

⚙️ Parameters

data_source_id: number
repository: string

Full name of the repository, denoted as {owner}/{repo}

include_remote_data: boolean
page: number
page_size: number
next_cursor: string
order_by: PROrderBy
order_dir: OrderDirV2Nullable

🔄 Return

PullRequestResponse

🌐 Endpoint

/integrations/data/github/pull_requests POST

🔙 Back to Table of Contents


carbon.integrations.cancel

Cancel Data Source Items Sync

🛠️ Usage

const cancelResponse = await carbon.integrations.cancel({
  data_source_id: 1,
});

⚙️ Parameters

data_source_id: number

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/items/sync/cancel POST

🔙 Back to Table of Contents


carbon.integrations.connectDataSource

Connect Data Source

🛠️ Usage

const connectDataSourceResponse = await carbon.integrations.connectDataSource({
  authentication: {
    source: "GOOGLE_DRIVE",
    access_token: "access_token_example",
  },
});

⚙️ Parameters

authentication: AuthenticationProperty
sync_options: SyncOptions

🔄 Return

ConnectDataSourceResponse

🌐 Endpoint

/integrations/connect POST

🔙 Back to Table of Contents


carbon.integrations.connectDocument360

You will need an access token to connect your Document360 account. To obtain an access token, follow the steps highlighted here https://apidocs.document360.com/apidocs/api-token.

🛠️ Usage

const connectDocument360Response = await carbon.integrations.connectDocument360(
  {
    account_email: "account_email_example",
    access_token: "access_token_example",
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    sync_files_on_connection: true,
    sync_source_items: true,
  }
);

⚙️ Parameters

account_email: string

This email will be used to identify your Carbon data source. It should have access to the Document360 account you wish to connect.

access_token: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

file_sync_config: FileSyncConfigNullable
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/document360 POST

🔙 Back to Table of Contents


carbon.integrations.connectFreshdesk

Refer to this article to obtain an API key: https://support.freshdesk.com/en/support/solutions/articles/215517. Make sure that your API key has permission to read solutions from your account and that you are on a paid plan. Once you have an API key, you can make a request to this endpoint along with your Freshdesk domain. This will trigger an automatic sync of the articles in your "solutions" tab. Additional parameters below can be used to associate data with the synced articles or modify the sync behavior.

🛠️ Usage

const connectFreshdeskResponse = await carbon.integrations.connectFreshdesk({
  domain: "domain_example",
  api_key: "api_key_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_files_on_connection: true,
  sync_source_items: true,
});

⚙️ Parameters

domain: string
api_key: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

file_sync_config: FileSyncConfigNullable
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/freshdesk POST

🔙 Back to Table of Contents


carbon.integrations.connectGitbook

You will need an access token to connect your Gitbook account. Note that the permissions are determined by the user who generates the access token, so make sure you have permission to access the spaces you will be syncing. Refer to this article for more details: https://developer.gitbook.com/gitbook-api/authentication. Additionally, you need to specify the name of the organization you will be syncing data from.

🛠️ Usage

const connectGitbookResponse = await carbon.integrations.connectGitbook({
  organization: "organization_example",
  access_token: "access_token_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_files_on_connection: true,
  sync_source_items: true,
});

⚙️ Parameters

organization: string
access_token: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

file_sync_config: FileSyncConfigNullable
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/gitbook POST

🔙 Back to Table of Contents


carbon.integrations.connectGuru

You will need an access token to connect your Guru account. To obtain an access token, follow the steps highlighted here https://help.getguru.com/docs/gurus-api#obtaining-a-user-token. The username should be your Guru username.

🛠️ Usage

const connectGuruResponse = await carbon.integrations.connectGuru({
  username: "username_example",
  access_token: "access_token_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_files_on_connection: true,
  sync_source_items: true,
});

⚙️ Parameters

username: string
access_token: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
sync_files_on_connection: boolean
request_id: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

file_sync_config: FileSyncConfigNullable
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/guru POST

🔙 Back to Table of Contents


carbon.integrations.createAwsIamUser

This endpoint can be used to connect S3 as well as Digital Ocean Spaces (S3 compatible).

For S3, create a new IAM user with permissions to:

  1. List all buckets.
  2. Read from the specific buckets and objects to sync with Carbon. Ensure any future buckets or objects carry the same permissions.

Once created, generate an access key for this user and share the credentials with us. We recommend testing this key beforehand.

For Digital Ocean Spaces, generate the above credentials in your Applications and API page here https://cloud.digitalocean.com/account/api/spaces. Endpoint URL is required to connect Digital Ocean Spaces. It should look like <>.digitaloceanspaces.com
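The difference between the two cases above (no endpoint URL for S3, a required one for Digital Ocean Spaces) can be sketched as a parameter builder. buildS3ConnectParams and the S3ConnectParams interface are hypothetical names for this illustration; the field names follow the parameters listed for this method, and the suffix check is only an assumption about how one might validate the endpoint locally.

```typescript
// Field names follow the parameters documented for createAwsIamUser.
interface S3ConnectParams {
  access_key: string;
  access_key_secret: string;
  sync_source_items: boolean;
  endpoint_url?: string;
}

const DO_SPACES_SUFFIX = ".digitaloceanspaces.com";

// Hypothetical helper: omit the endpoint for S3, pass it for Digital Ocean Spaces.
function buildS3ConnectParams(
  accessKey: string,
  accessKeySecret: string,
  digitalOceanEndpoint?: string
): S3ConnectParams {
  const params: S3ConnectParams = {
    access_key: accessKey,
    access_key_secret: accessKeySecret,
    sync_source_items: true,
  };
  if (digitalOceanEndpoint !== undefined) {
    // Local sanity check (an assumption, not SDK behavior): the endpoint must
    // end with the Digital Ocean Spaces suffix and have something before it.
    const suffixStart = digitalOceanEndpoint.length - DO_SPACES_SUFFIX.length;
    const endsWithSuffix =
      suffixStart > 0 &&
      digitalOceanEndpoint.indexOf(DO_SPACES_SUFFIX, suffixStart) === suffixStart;
    if (!endsWithSuffix) {
      throw new Error("endpoint must end with " + DO_SPACES_SUFFIX);
    }
    params.endpoint_url = digitalOceanEndpoint;
  }
  return params;
}
```

The returned object could be passed to carbon.integrations.createAwsIamUser; for S3 buckets the endpoint argument is simply left out.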

🛠️ Usage

const createAwsIamUserResponse = await carbon.integrations.createAwsIamUser({
  access_key: "access_key_example",
  access_key_secret: "access_key_secret_example",
  sync_source_items: true,
});

⚙️ Parameters

access_key: string
access_key_secret: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

endpoint_url: string

You can specify a Digital Ocean endpoint URL to connect a Digital Ocean Space through this endpoint. The URL should be of format .digitaloceanspaces.com. It's not required for S3 buckets.

data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/s3 POST

🔙 Back to Table of Contents


carbon.integrations.getOauthUrl

This endpoint can be used to generate the following URLs

  • An OAuth URL for OAuth based connectors
  • A file syncing URL which skips the OAuth flow if the user already has a valid access token and takes them to the success state.

🛠️ Usage

const getOauthUrlResponse = await carbon.integrations.getOauthUrl({
  scopes: [],
  service: "BOX",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_files_on_connection: true,
  set_page_as_boundary: false,
  connecting_new_account: false,
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  enable_file_picker: true,
  sync_source_items: true,
  incremental_sync: false,
});

⚙️ Parameters

tags: any
scope: string
scopes: string[]

List of scopes to request from the OAuth provider. Please note that the scopes will be used as they are and will not be combined with the default props that Carbon uses. Each scope should be its own array element.

chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
zendesk_subdomain: string
microsoft_tenant: string
sharepoint_site_name: string
confluence_subdomain: string
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

salesforce_domain: string
sync_files_on_connection: boolean

Used to specify whether Carbon should attempt to sync all your files automatically when authorization is complete. This is only supported for a subset of connectors and will be ignored for the rest. Supported connectors: Intercom, Zendesk, Gitbook, Confluence, Salesforce, Freshdesk

set_page_as_boundary: boolean
data_source_id: number

Used to specify a data source to sync from if you have multiple connected. It can be skipped if you only have one data source of that type connected or are connecting a new account.

connecting_new_account: boolean

Used to connect a new data source. If not specified, we will attempt to create a sync URL for an existing data source based on type and ID.

request_id: string

This request id will be added to all files that get synced using the generated OAuth URL

use_ocr: boolean

Enable OCR for files that support it. Supported formats: pdf, png, jpg

parse_pdf_tables_with_ocr: boolean
enable_file_picker: boolean

Enable the integration's file picker for sources that support it. Supported sources: BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT

sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

incremental_sync: boolean

Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.

file_sync_config: FileSyncConfigNullable
automatically_open_file_picker: boolean

Automatically open source file picker after the OAuth flow is complete. This flag is currently supported by BOX, DROPBOX, GOOGLE_DRIVE, ONEDRIVE, SHAREPOINT. It will be ignored for other data sources.

gong_account_email: string

If you are connecting a Gong account, you need to input the email of the account you wish to connect. This email will be used to identify your Carbon data source.

servicenow_credentials: ServiceNowCredentialsNullable
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.
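Several of the flags above are only honored by specific sources and are ignored elsewhere. A rough local check, assuming the supported-source lists quoted in the incremental_sync and automatically_open_file_picker descriptions, might look like the sketch below. flagsIgnoredFor and OauthFlags are hypothetical names, and the lists should be re-checked against the current documentation before relying on them.

```typescript
// Source lists copied from the parameter descriptions above.
const FILE_PICKER_AUTO_OPEN_SOURCES = [
  "BOX", "DROPBOX", "GOOGLE_DRIVE", "ONEDRIVE", "SHAREPOINT",
];
const INCREMENTAL_SYNC_SOURCES = [
  "ONEDRIVE", "GOOGLE_DRIVE", "BOX", "DROPBOX", "INTERCOM", "GMAIL",
  "OUTLOOK", "ZENDESK", "CONFLUENCE", "NOTION", "SHAREPOINT", "SERVICENOW",
];

interface OauthFlags {
  automatically_open_file_picker?: boolean;
  incremental_sync?: boolean;
}

// Hypothetical helper: names the enabled flags that the given service
// would ignore according to the lists above.
function flagsIgnoredFor(service: string, flags: OauthFlags): string[] {
  const ignored: string[] = [];
  if (
    flags.automatically_open_file_picker === true &&
    FILE_PICKER_AUTO_OPEN_SOURCES.indexOf(service) === -1
  ) {
    ignored.push("automatically_open_file_picker");
  }
  if (
    flags.incremental_sync === true &&
    INCREMENTAL_SYNC_SOURCES.indexOf(service) === -1
  ) {
    ignored.push("incremental_sync");
  }
  return ignored;
}
```

Running this before calling getOauthUrl gives an early hint when a requested flag will have no effect for the chosen service.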

🔄 Return

OuthURLResponse

🌐 Endpoint

/integrations/oauth_url POST

🔙 Back to Table of Contents


carbon.integrations.listConfluencePages

Deprecated

This endpoint has been deprecated. Use /integrations/items/list instead.

To begin listing a user's Confluence pages, at least a data_source_id of a connected Confluence account must be specified. This base request returns a list of root pages for every space the user has access to in a Confluence instance. To traverse further down the user's page directory, additional requests to this endpoint can be made with the same data_source_id and with parent_id set to the id of page from a previous request. For convenience, the has_children property in each directory item in the response list will flag which pages will return non-empty lists of pages when set as the parent_id.
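The traversal described above can be sketched as a helper that turns one response into the follow-up requests for the next level. The id and has_children fields are named in the description; the ConfluenceItem and ListPagesRequest shapes and the childPageRequests helper are assumptions made for this illustration, and the endpoint itself is deprecated.

```typescript
// Assumed minimal shape of a directory item in the response list.
interface ConfluenceItem {
  id: string;
  has_children: boolean;
}

// Request body fields as documented for listConfluencePages.
interface ListPagesRequest {
  data_source_id: number;
  parent_id?: string;
}

// Hypothetical helper: for every item flagged with has_children, build the
// request that lists its child pages (same data_source_id, parent_id = item id).
function childPageRequests(
  dataSourceId: number,
  items: ConfluenceItem[]
): ListPagesRequest[] {
  return items
    .filter((item) => item.has_children)
    .map((item) => ({ data_source_id: dataSourceId, parent_id: item.id }));
}
```

Items without children are skipped because a request with their id as parent_id would return an empty list.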

🛠️ Usage

const listConfluencePagesResponse =
  await carbon.integrations.listConfluencePages({
    data_source_id: 1,
  });

⚙️ Parameters

data_source_id: number
parent_id: string

🔄 Return

ListResponse

🌐 Endpoint

/integrations/confluence/list POST

🔙 Back to Table of Contents


carbon.integrations.listConversations

List all of your public and private channels, DMs, and Group DMs. The ID from the response can be used as a filter to sync messages to Carbon.
types: Comma-separated list of types. Available types are im (DMs), mpim (group DMs), public_channel, and private_channel. Defaults to public_channel.
cursor: Used for pagination. If next_cursor is returned in the response, you need to pass it as the cursor in the next request.
data_source_id: Data source needs to be specified if you have linked multiple Slack accounts.
exclude_archived: Whether archived conversations should be excluded. Defaults to true.
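A minimal sketch of the types and cursor handling described above follows. The parameter names match the ones listed for this method; conversationTypes and nextConversationsParams are hypothetical helpers, and how next_cursor is read from the response is left to the caller because the response shape is not documented here.

```typescript
type ConversationType = "im" | "mpim" | "public_channel" | "private_channel";

// Parameter names as listed for listConversations.
interface ListConversationsParams {
  types?: string;
  cursor?: string;
  dataSourceId?: number;
  excludeArchived?: boolean;
}

// Builds the comma-separated value expected by the types parameter.
function conversationTypes(types: ConversationType[]): string {
  return types.join(",");
}

// Hypothetical helper: returns the parameters for the next page, or null
// when the previous response carried no next_cursor.
function nextConversationsParams(
  previous: ListConversationsParams,
  nextCursor: string | null | undefined
): ListConversationsParams | null {
  if (!nextCursor) {
    return null;
  }
  return { ...previous, cursor: nextCursor };
}
```

A caller would keep requesting pages until nextConversationsParams returns null.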

🛠️ Usage

const listConversationsResponse = await carbon.integrations.listConversations({
  types: "public_channel",
  excludeArchived: true,
});

⚙️ Parameters

types: string
cursor: string
dataSourceId: number
excludeArchived: boolean

🌐 Endpoint

/integrations/slack/conversations GET

🔙 Back to Table of Contents


carbon.integrations.listDataSourceItems

List Data Source Items

🛠️ Usage

const listDataSourceItemsResponse =
  await carbon.integrations.listDataSourceItems({
    data_source_id: 1,
    order_by: "name",
    order_dir: "asc",
  });

⚙️ Parameters

data_source_id: number
parent_id: string
pagination: Pagination
order_dir: OrderDirV2

🔄 Return

ListDataSourceItemsResponse

🌐 Endpoint

/integrations/items/list POST

🔙 Back to Table of Contents


carbon.integrations.listFolders

After connecting your Outlook account, you can use this endpoint to list all of your folders in Outlook. This includes both system folders like "inbox" and user-created folders.

🛠️ Usage

const listFoldersResponse = await carbon.integrations.listFolders({});

⚙️ Parameters

dataSourceId: number

🌐 Endpoint

/integrations/outlook/user_folders GET

🔙 Back to Table of Contents


carbon.integrations.listGitbookSpaces

After connecting your Gitbook account, you can use this endpoint to list all of your spaces under the current organization.

🛠️ Usage

const listGitbookSpacesResponse = await carbon.integrations.listGitbookSpaces({
  dataSourceId: 1,
});

⚙️ Parameters

dataSourceId: number

🌐 Endpoint

/integrations/gitbook/spaces GET

🔙 Back to Table of Contents


carbon.integrations.listLabels

After connecting your Gmail account, you can use this endpoint to list all of your labels. User created labels will have the type "user" and Gmail's default labels will have the type "system"

🛠️ Usage

const listLabelsResponse = await carbon.integrations.listLabels({});

⚙️ Parameters

dataSourceId: number

🌐 Endpoint

/integrations/gmail/user_labels GET

🔙 Back to Table of Contents


carbon.integrations.listOutlookCategories

After connecting your Outlook account, you can use this endpoint to list all of your categories in Outlook. We currently support listing up to 250 categories.

🛠️ Usage

const listOutlookCategoriesResponse =
  await carbon.integrations.listOutlookCategories({});

⚙️ Parameters

dataSourceId: number

🌐 Endpoint

/integrations/outlook/user_categories GET

🔙 Back to Table of Contents


carbon.integrations.listRepos

Once you have connected your GitHub account, you can use this endpoint to list the repositories your account has access to. You can use a data source ID or username to fetch from a specific account.

🛠️ Usage

const listReposResponse = await carbon.integrations.listRepos({
  perPage: 30,
  page: 1,
});

⚙️ Parameters

perPage: number
page: number
dataSourceId: number

🌐 Endpoint

/integrations/github/repos GET

🔙 Back to Table of Contents


carbon.integrations.listSharepointSites

List all Sharepoint sites in the connected tenant. The site names from the response can be used as the site name when connecting a Sharepoint site. If site name is null in the response, then site name should be left null when connecting to the site.

This endpoint requires an additional Sharepoint scope: "Sites.Read.All". Include this scope along with the default Sharepoint scopes to list Sharepoint sites, connect to a site, and finally sync files from the site. The default Sharepoint scopes are: openid, offline_access, User.Read, Files.Read.All.

data_source_id: Data source needs to be specified if you have linked multiple Sharepoint accounts.
cursor: Used for pagination. If next_cursor is returned in the response, you need to pass it as the cursor in the next request.
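Because the scopes passed to getOauthUrl are used as given rather than merged with defaults, the extra scope has to be sent together with the default ones. A minimal sketch, assuming the default scopes listed above (openid, offline_access, User.Read, Files.Read.All); sharepointScopesForSiteListing is a hypothetical helper name.

```typescript
// Default Sharepoint scopes as listed in the description above.
const DEFAULT_SHAREPOINT_SCOPES = [
  "openid",
  "offline_access",
  "User.Read",
  "Files.Read.All",
];

// Additional scope required for listing Sharepoint sites.
const SITE_LISTING_SCOPE = "Sites.Read.All";

// Hypothetical helper: one scope per array element, defaults first.
function sharepointScopesForSiteListing(): string[] {
  return DEFAULT_SHAREPOINT_SCOPES.concat([SITE_LISTING_SCOPE]);
}
```

The resulting array could be passed as the scopes parameter of carbon.integrations.getOauthUrl when connecting a Sharepoint account.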

🛠️ Usage

const listSharepointSitesResponse =
  await carbon.integrations.listSharepointSites({});

⚙️ Parameters

dataSourceId: number
cursor: string

🌐 Endpoint

/integrations/sharepoint/sites/list GET

🔙 Back to Table of Contents


carbon.integrations.syncAzureBlobFiles

After optionally loading the items via /integrations/items/sync and /integrations/items/list, use the container name and file name as the ID in this endpoint to sync them into Carbon. Additional parameters below can associate data with the selected items or modify the sync behavior.

🛠️ Usage

const syncAzureBlobFilesResponse = await carbon.integrations.syncAzureBlobFiles(
  {
    ids: [{}],
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    embedding_model: "OPENAI",
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    set_page_as_boundary: false,
    use_ocr: false,
    parse_pdf_tables_with_ocr: false,
  }
);

⚙️ Parameters

tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: boolean
data_source_id: number
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
file_sync_config: FileSyncConfigNullable

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/azure_blob_storage/files POST

🔙 Back to Table of Contents


carbon.integrations.syncAzureBlobStorage

This endpoint can be used to connect Azure Blob Storage.

For Azure Blob Storage, follow these steps:

  1. Create a new Azure Storage account and grant the following permissions:
    • List containers.
    • Read from specific containers and blobs to sync with Carbon. Ensure any future containers or blobs carry the same permissions.
  2. Generate a shared access signature (SAS) token or an access key for the storage account.

Once created, provide us with the following details to generate the connection URL:

  1. Storage Account KeyName.
  2. Storage Account Name.

🛠️ Usage

const syncAzureBlobStorageResponse =
  await carbon.integrations.syncAzureBlobStorage({
    account_name: "account_name_example",
    account_key: "account_key_example",
    sync_source_items: true,
  });

⚙️ Parameters

account_name: string
account_key: string
sync_source_items: boolean
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/azure_blob_storage POST

🔙 Back to Table of Contents


carbon.integrations.syncConfluence

Deprecated

This endpoint has been deprecated. Use /integrations/files/sync instead.

After listing pages in a user's Confluence account, the set of selected page ids and the connected account's data_source_id can be passed into this endpoint to sync them into Carbon. Additional parameters listed below can be used to associate data to the selected pages or alter the behavior of the sync.

🛠️ Usage

const syncConfluenceResponse = await carbon.integrations.syncConfluence({
  data_source_id: 1,
  ids: ["string_example"],
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  set_page_as_boundary: false,
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  incremental_sync: false,
});

⚙️ Parameters

data_source_id: number
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: boolean
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
incremental_sync: boolean

Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.

file_sync_config: FileSyncConfigNullable

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/confluence/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncDataSourceItems

Sync Data Source Items

🛠️ Usage

const syncDataSourceItemsResponse =
  await carbon.integrations.syncDataSourceItems({
    data_source_id: 1,
  });

⚙️ Parameters

data_source_id: number

🔄 Return

OrganizationUserDataSourceAPI

🌐 Endpoint

/integrations/items/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncFiles

After listing files and folders via /integrations/items/sync and /integrations/items/list, use the selected items' external ids as the ids in this endpoint to sync them into Carbon. Sharepoint items take an additional parameter root_id, which identifies the drive the file or folder is in and is stored in root_external_id. That additional parameter is optional; excluding it tells the sync to assume the item is stored in the default Documents drive.
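The mapping from listed items to sync ids can be sketched as follows. This is an illustration under assumptions: the ListedItem shape (external_id, root_external_id) and the object form { id, root_id } for an entry of ids are inferred from the description and are not confirmed by the usage example, which shows plain strings. toSyncIds is a hypothetical helper.

```typescript
// Assumed shape of an item returned by the list items endpoint.
interface ListedItem {
  external_id: string;
  root_external_id?: string | null;
}

// Assumed object form of one entry in ids; root_id is only relevant
// for Sharepoint items.
interface SyncFileId {
  id: string;
  root_id?: string;
}

// Hypothetical helper: copies external_id to id and, when present,
// root_external_id to root_id.
function toSyncIds(items: ListedItem[]): SyncFileId[] {
  return items.map((item) => {
    const entry: SyncFileId = { id: item.external_id };
    if (item.root_external_id) {
      entry.root_id = item.root_external_id;
    }
    return entry;
  });
}
```

Entries without root_id fall back to the default Documents drive, as described above.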

🛠️ Usage

const syncFilesResponse = await carbon.integrations.syncFiles({
  data_source_id: 1,
  ids: ["string_example"],
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  set_page_as_boundary: false,
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
  incremental_sync: false,
});

⚙️ Parameters

data_source_id: number
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: boolean
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
incremental_sync: boolean

Only sync files if they have not already been synced or if the embedding properties have changed. This flag is currently supported by ONEDRIVE, GOOGLE_DRIVE, BOX, DROPBOX, INTERCOM, GMAIL, OUTLOOK, ZENDESK, CONFLUENCE, NOTION, SHAREPOINT, SERVICENOW. It will be ignored for other data sources.

file_sync_config: FileSyncConfigNullable

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/files/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncGitHub

Refer to this article to obtain an access token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens. Make sure that your access token has permission to read content from your desired repos. Note that if your access token expires, you will need to manually update it through this endpoint.

🛠️ Usage

const syncGitHubResponse = await carbon.integrations.syncGitHub({
  username: "username_example",
  access_token: "access_token_example",
  sync_source_items: false,
});

⚙️ Parameters

username: string
access_token: string
sync_source_items: boolean

Enabling this flag will fetch all available content from the source to be listed via list items endpoint

data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/github POST

🔙 Back to Table of Contents


carbon.integrations.syncGitbook

You can sync up to 20 Gitbook spaces at a time using this endpoint. Additional parameters below can be used to associate data with the synced pages or modify the sync behavior.
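When more than 20 spaces need syncing, the ids can be split into batches and sent in separate requests. The sketch below only does the splitting; batchSpaceIds is a hypothetical helper, and the limit of 20 comes from the description above.

```typescript
// Limit stated in the description above.
const MAX_GITBOOK_SPACES_PER_SYNC = 20;

// Hypothetical helper: splits space ids into batches that respect the limit.
function batchSpaceIds(
  spaceIds: string[],
  batchSize: number = MAX_GITBOOK_SPACES_PER_SYNC
): string[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batches: string[][] = [];
  for (let start = 0; start < spaceIds.length; start += batchSize) {
    batches.push(spaceIds.slice(start, start + batchSize));
  }
  return batches;
}
```

Each batch could then be passed as space_ids in its own carbon.integrations.syncGitbook call.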

🛠️ Usage

const syncGitbookResponse = await carbon.integrations.syncGitbook({
  space_ids: ["space_ids_example"],
  data_source_id: 1,
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
});

⚙️ Parameters

space_ids: string[]
data_source_id: number
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
request_id: string
file_sync_config: FileSyncConfigNullable

🌐 Endpoint

/integrations/gitbook/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncGmail

Once you have successfully connected your Gmail account, you can choose which emails to sync with us using the filters parameter. Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support a limited set of keys listed below.

label: Built-in Gmail labels, for example "Important", or a custom label you created.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values - starred, important, snoozed, and unread
from: Email address of the sender
to: Email address of the recipient
in: Can have the following values - sent (sync emails sent by the user)
has: Can have the following values - attachment (sync emails that have attachments)

Using keys or values outside of the specified values can lead to unexpected behaviour.

An example of a basic query with filters:

{
    "filters": {
        "key": "label",
        "value": "Test"
    }
}

This will list all emails that have the label "Test".

You can use AND and OR operations in the following way:

{
    "filters": {
        "AND": [
            {
                "key": "after",
                "value": "2024/01/07"
            },
            {
                "OR": [
                    {
                        "key": "label",
                        "value": "Personal"
                    },
                    {
                        "key": "is",
                        "value": "starred"
                    }
                ]
            }
        ]
    }
}

This will return emails after 7th of Jan that are either starred or have the label "Personal". Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.
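
The nesting rule above can be checked locally before sending a request. The following is a minimal sketch, assuming the filter shapes shown in the JSON examples; the type names (`FilterLeaf`, `FilterGroup`, `FilterNode`) and the helper functions are hypothetical and are not exported by the SDK.

```typescript
// Leaf filter: one key/value pair, e.g. { key: "label", value: "Test" }.
type FilterLeaf = { key: string; value: string };
// Group filter: combines child filters with AND or OR.
type FilterGroup = { AND: FilterNode[] } | { OR: FilterNode[] };
type FilterNode = FilterLeaf | FilterGroup;

// Returns the child filters of a group, or undefined for a leaf.
function childrenOf(node: FilterNode): FilterNode[] | undefined {
  if ("AND" in node) return node.AND;
  if ("OR" in node) return node.OR;
  return undefined;
}

// Counts how many AND/OR groups are nested inside each other.
function groupDepth(node: FilterNode): number {
  const children = childrenOf(node);
  if (children === undefined) return 0;
  return 1 + Math.max(0, ...children.map((child) => groupDepth(child)));
}

// A group inside a group is the deepest nesting described above.
const MAX_GROUP_DEPTH = 2;

function isSupportedNesting(filters: FilterNode): boolean {
  return groupDepth(filters) <= MAX_GROUP_DEPTH;
}

// Same structure as the AND/OR example above.
const gmailFilters: FilterNode = {
  AND: [
    { key: "after", value: "2024/01/07" },
    {
      OR: [
        { key: "label", value: "Personal" },
        { key: "is", value: "starred" },
      ],
    },
  ],
};

// One level too deep: an AND inside an OR inside an AND.
const tooDeep: FilterNode = {
  AND: [{ OR: [{ AND: [{ key: "is", value: "unread" }] }] }],
};

console.log(isSupportedNesting(gmailFilters), isSupportedNesting(tooDeep));
```

A filter that passes this check could then be supplied as the `filters` value in the usage example below.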

🛠️ Usage

const syncGmailResponse = await carbon.integrations.syncGmail({
  filters: {},
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_attachments: false,
  incremental_sync: false,
});

⚙️ Parameters

filters: object
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
data_source_id: number
request_id: string
sync_attachments: boolean
file_sync_config: FileSyncConfigNullable
incremental_sync: boolean

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/gmail/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncOutlook

Once you have successfully connected your Outlook account, you can choose which emails to sync with us using the filters and folder parameters. "folder" should be the folder you want to sync from Outlook. By default we get messages from your inbox folder.
Filters is a JSON object with key-value pairs. It also supports AND and OR operations. For now, we support a limited set of keys listed below.

category: Custom categories that you created in Outlook.
after or before: A date in YYYY/mm/dd format (example 2023/12/31). Gets emails after/before a certain date. You can also use them in combination to get emails from a certain period.
is: Can have the following values: flagged
from: Email address of the sender

An example of a basic query with filters:

{
    "filters": {
        "key": "category",
        "value": "Test"
    }
}

This lists all emails that have the category "Test".

To specify a custom folder in the same query:

{
    "folder": "Folder Name",
    "filters": {
        "key": "category",
        "value": "Test"
    }
}

You can use AND and OR operation in the following way:

{
    "filters": {
        "AND": [
            {
                "key": "after",
                "value": "2024/01/07"
            },
            {
                "OR": [
                    {
                        "key": "category",
                        "value": "Personal"
                    },
                    {
                        "key": "category",
                        "value": "Test"
                    }
                ]
            }
        ]
    }
}

This will return emails after 7th of Jan that have either Personal or Test as category. Note that this is the highest level of nesting we support, i.e. you can't add more AND/OR filters within the OR filter in the above example.
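
Because after and before expect dates in YYYY/mm/dd format, a small formatter can help when the dates come from `Date` objects. This is only a sketch: `toFilterDate` and `dateRangeFilter` are hypothetical helpers, combining the two keys with AND is one reading of "use them in combination", and using the UTC calendar date is an assumption, since the description does not say which time zone applies.

```typescript
// Formats a Date as YYYY/MM/DD using its UTC calendar date (assumed time zone).
function toFilterDate(date: Date): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const day = String(date.getUTCDate()).padStart(2, "0");
  return `${year}/${month}/${day}`;
}

// Builds an AND filter that combines "after" and "before" into a date range.
function dateRangeFilter(after: Date, before: Date) {
  return {
    AND: [
      { key: "after", value: toFilterDate(after) },
      { key: "before", value: toFilterDate(before) },
    ],
  };
}

// Request fields as they appear in the examples above; the dates are illustrative.
const outlookRequest = {
  folder: "Inbox",
  filters: dateRangeFilter(
    new Date(Date.UTC(2024, 0, 7)),
    new Date(Date.UTC(2024, 0, 31)),
  ),
};

console.log(JSON.stringify(outlookRequest, null, 2));
```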

🛠️ Usage

const syncOutlookResponse = await carbon.integrations.syncOutlook({
  folder: "Inbox",
  filters: {},
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  sync_attachments: false,
  incremental_sync: false,
});

⚙️ Parameters

filters: object
tags: object
folder: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
data_source_id: number
request_id: string
sync_attachments: boolean
file_sync_config: FileSyncConfigNullable
incremental_sync: boolean

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/outlook/sync POST

🔙 Back to Table of Contents


carbon.integrations.syncRepos

You can retrieve the repos your token has access to using /integrations/github/repos and sync their content. You can also pass the full name of any public repository (username/repo-name). This stores the repo content with Carbon, where it can be accessed through the /integrations/items/list endpoint. A maximum of 25 repositories is accepted per request.
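
A quick local check can catch oversized or malformed lists before a request is made. The sketch below assumes every entry is a full name in username/repo-name form; `checkRepos` and the pattern it uses are hypothetical and deliberately loose, and only the limit of 25 comes from the description above.

```typescript
// The limit of 25 repositories per request is taken from the description.
const MAX_REPOS_PER_REQUEST = 25;

// Loose pattern for "username/repo-name": two non-empty segments, one slash, no whitespace.
const REPO_FULL_NAME = /^[^\s/]+\/[^\s/]+$/;

// Returns a list of human-readable problems; an empty list means the input looks fine.
function checkRepos(repos: readonly string[]): string[] {
  const problems: string[] = [];
  if (repos.length > MAX_REPOS_PER_REQUEST) {
    problems.push(
      `too many repositories: ${repos.length} (maximum ${MAX_REPOS_PER_REQUEST})`,
    );
  }
  for (const repo of repos) {
    if (!REPO_FULL_NAME.test(repo)) {
      problems.push(`not in username/repo-name form: ${repo}`);
    }
  }
  return problems;
}

// Illustrative input: the second entry is missing its owner segment.
console.log(checkRepos(["octocat/hello-world", "hello-world"]));
```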

🛠️ Usage

const syncReposResponse = await carbon.integrations.syncRepos({
  repos: ["repos_example"],
});

⚙️ Parameters

repos: string[]
data_source_id: number

🌐 Endpoint

/integrations/github/sync_repos POST

🔙 Back to Table of Contents


carbon.integrations.syncRssFeed

RSS Feed

🛠️ Usage

const syncRssFeedResponse = await carbon.integrations.syncRssFeed({
  url: "url_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
});

⚙️ Parameters

url: string
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
request_id: string
data_source_tags: object

Tags to be associated with the data source. If the data source already has tags set, then an upsert will be performed.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/rss_feed POST

🔙 Back to Table of Contents


carbon.integrations.syncS3Files

After optionally loading the items via /integrations/items/sync and /integrations/items/list, use the bucket name and object key as the ID in this endpoint to sync them into Carbon. Additional parameters below can associate data with the selected items or modify the sync behavior.

🛠️ Usage

const syncS3FilesResponse = await carbon.integrations.syncS3Files({
  ids: [{}],
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  set_page_as_boundary: false,
  use_ocr: false,
  parse_pdf_tables_with_ocr: false,
});

⚙️ Parameters

Each input should be one of the following: A bucket name, a bucket name and a prefix, or a bucket name and an object key. A prefix is the common path for all objects you want to sync. Paths should end with a forward slash.

tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number

Number of objects per chunk. For csv, tsv, xlsx, and json files only.

set_page_as_boundary: boolean
data_source_id: number
request_id: string
use_ocr: boolean
parse_pdf_tables_with_ocr: boolean
file_sync_config: FileSyncConfigNullable
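
Since prefix paths should end with a forward slash, it can help to normalize them before building the inputs. This is a minimal sketch; `normalizePrefix` is a hypothetical helper, and it makes no assumption about the field names of the objects passed in `ids`.

```typescript
// Ensures a non-empty prefix ends with exactly one forward slash.
function normalizePrefix(prefix: string): string {
  if (prefix === "") return prefix; // nothing to normalize
  return prefix.replace(/\/*$/, "/");
}

// Illustrative prefixes; all three normalize to "logs/2024/".
for (const prefix of ["logs/2024", "logs/2024/", "logs/2024//"]) {
  console.log(normalizePrefix(prefix));
}
```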

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/integrations/s3/files POST

🔙 Back to Table of Contents


carbon.integrations.syncSlack

You can list all conversations using the endpoint /integrations/slack/conversations. The ID of a conversation is used as the input for this endpoint, with timestamps as optional filters.

🛠️ Usage

const syncSlackResponse = await carbon.integrations.syncSlack({
  filters: {
    conversation_id: "conversation_id_example",
  },
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  embedding_model: "OPENAI",
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
});

⚙️ Parameters

filters: SlackFilters
tags: object
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
data_source_id: number
request_id: string

🌐 Endpoint

/integrations/slack/sync POST

🔙 Back to Table of Contents


carbon.organizations.get

Get Organization

🛠️ Usage

const getResponse = await carbon.organizations.get();

🔄 Return

OrganizationResponse

🌐 Endpoint

/organization GET

🔙 Back to Table of Contents


carbon.organizations.update

Update Organization

🛠️ Usage

const updateResponse = await carbon.organizations.update({});

⚙️ Parameters

global_user_config: UserConfigurationNullable
data_source_configs: Record<string, DataSourceConfiguration>

Used to set organization level defaults for configuration related to data sources.

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/organization/update POST

🔙 Back to Table of Contents


carbon.organizations.updateStats

Use this endpoint to reaggregate the statistics for an organization, for example aggregate_file_size. The reaggregation process is asynchronous, so a webhook with the event type FILE_STATISTICS_AGGREGATED is sent to notify you when the process is complete. After the aggregation is complete, the updated statistics can be retrieved using the /organization endpoint. The response of /organization will also contain a timestamp of the last time the statistics were reaggregated.

🛠️ Usage

const updateStatsResponse = await carbon.organizations.updateStats();

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/organization/statistics POST

🔙 Back to Table of Contents


carbon.users.delete

Delete Users

🛠️ Usage

const deleteResponse = await carbon.users.delete({
  customer_ids: ["customer_ids_example"],
});

⚙️ Parameters

customer_ids: string[]

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_users POST

🔙 Back to Table of Contents


carbon.users.get

User Endpoint

🛠️ Usage

const getResponse = await carbon.users.get({
  customer_id: "customer_id_example",
});

⚙️ Parameters

customer_id: string

🔄 Return

UserResponse

🌐 Endpoint

/user POST

🔙 Back to Table of Contents


carbon.users.list

List users within an organization

🛠️ Usage

const listResponse = await carbon.users.list({
  order_by: "created_at",
  order_dir: "asc",
  include_count: false,
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDirV2
include_count: boolean

🔄 Return

UserListResponse

🌐 Endpoint

/list_users POST

🔙 Back to Table of Contents


carbon.users.toggleUserFeatures

Deprecated

Toggle User Features

🛠️ Usage

const toggleUserFeaturesResponse = await carbon.users.toggleUserFeatures({
  configuration_key_name: "sparse_vectors",
  value: {},
});

⚙️ Parameters

configuration_key_name: ConfigurationKeys
value: object

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/modify_user_configuration POST

🔙 Back to Table of Contents


carbon.users.updateUsers

Update Users

🛠️ Usage

const updateUsersResponse = await carbon.users.updateUsers({
  customer_ids: ["customer_ids_example"],
});

⚙️ Parameters

customer_ids: string[]

List of organization supplied user IDs

auto_sync_enabled_sources: AutoSyncEnabledSourcesProperty
max_files: number

Custom file upload limit for the user over all user's files across all uploads. If set, then the user will not be allowed to upload more files than this limit. If not set, or if set to -1, then the user will have no limit.

max_files_per_upload: number

Custom file upload limit for the user across a single upload. If set, then the user will not be allowed to upload more files than this limit in a single upload. If not set, or if set to -1, then the user will have no limit.

max_characters: number

Custom character upload limit for the user over all user's files across all uploads. If set, then the user will not be allowed to upload more characters than this limit. If not set, or if set to -1, then the user will have no limit.

max_characters_per_file: number

A single file upload from the user cannot exceed this character limit. If set, then the file will not be synced if it exceeds this limit. If not set, or if set to -1, then the user will have no limit.

max_characters_per_upload: number

Custom character upload limit for the user across a single upload. If set, then the user won't be able to sync more than this many characters in one upload. If not set, or if set to -1, then the user will have no limit.

auto_sync_interval: number

The interval in hours at which the user's data sources should be synced. If not set or set to -1, the user will be synced at the organization level interval or default interval if that is also not set. Must be one of [3, 6, 12, 24].
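
The rules for these limits can be checked on the client before calling the endpoint. The sketch below is illustrative only: `UserLimits` and `checkUserLimits` are hypothetical names, and treating any negative value other than -1 (or any non-integer) as invalid is an assumption; the parameter descriptions only state that -1 means no limit and that auto_sync_interval must be one of [3, 6, 12, 24].

```typescript
// Allowed sync intervals in hours, as listed in the parameter description.
const ALLOWED_SYNC_INTERVALS = [3, 6, 12, 24];
// -1 means "no limit" (or "fall back to the organization/default interval").
const NO_LIMIT = -1;

// Hypothetical local type covering the numeric parameters described above.
type UserLimits = {
  max_files?: number;
  max_files_per_upload?: number;
  max_characters?: number;
  max_characters_per_file?: number;
  max_characters_per_upload?: number;
  auto_sync_interval?: number;
};

// Returns a list of problems; an empty list means the values look acceptable.
function checkUserLimits(limits: UserLimits): string[] {
  const problems: string[] = [];
  const { auto_sync_interval, ...caps } = limits;

  for (const [name, value] of Object.entries(caps)) {
    if (value === undefined || value === NO_LIMIT) continue;
    // Assumption: limits other than -1 should be non-negative integers.
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} must be -1 or a non-negative integer`);
    }
  }

  if (
    auto_sync_interval !== undefined &&
    auto_sync_interval !== NO_LIMIT &&
    !ALLOWED_SYNC_INTERVALS.includes(auto_sync_interval)
  ) {
    problems.push("auto_sync_interval must be -1 or one of 3, 6, 12, 24");
  }
  return problems;
}

// Illustrative values: the interval of 5 hours is not in the allowed list.
console.log(checkUserLimits({ max_files: 100, auto_sync_interval: 5 }));
```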

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/update_users POST

🔙 Back to Table of Contents


carbon.users.whoAmI

Me Endpoint

🛠️ Usage

const whoAmIResponse = await carbon.users.whoAmI();

🔄 Return

UserResponse

🌐 Endpoint

/whoami GET

🔙 Back to Table of Contents


carbon.utilities.fetchUrls

Deprecated

Extracts all URLs from a webpage.

Args: url (str): URL of the webpage

Returns: FetchURLsResponse: A response object with a list of URLs extracted from the webpage and the webpage content.

🛠️ Usage

const fetchUrlsResponse = await carbon.utilities.fetchUrls({
  url: "url_example",
});

⚙️ Parameters

url: string

🔄 Return

FetchURLsResponse

🌐 Endpoint

/fetch_urls GET

🔙 Back to Table of Contents


carbon.utilities.fetchWebpage

Fetch Urls V2

🛠️ Usage

const fetchWebpageResponse = await carbon.utilities.fetchWebpage({
  url: "url_example",
});

⚙️ Parameters

url: string

🌐 Endpoint

/fetch_webpage POST

🔙 Back to Table of Contents


carbon.utilities.fetchYoutubeTranscripts

Fetches English transcripts from YouTube videos.

Args: id (str): The ID of the YouTube video. raw (bool): Whether to return the raw transcript or not. Defaults to False.

Returns: dict: A dictionary with the transcript of the YouTube video.

🛠️ Usage

const fetchYoutubeTranscriptsResponse =
  await carbon.utilities.fetchYoutubeTranscripts({
    id: "id_example",
    raw: false,
  });

⚙️ Parameters

id: string
raw: boolean

🔄 Return

YoutubeTranscriptResponse

🌐 Endpoint

/fetch_youtube_transcript GET

🔙 Back to Table of Contents


carbon.utilities.processSitemap

Retrieves all URLs from a sitemap, which can subsequently be utilized with our web_scrape endpoint.

🛠️ Usage

const processSitemapResponse = await carbon.utilities.processSitemap({
  url: "url_example",
});

⚙️ Parameters

url: string

🌐 Endpoint

/process_sitemap GET

🔙 Back to Table of Contents


carbon.utilities.scrapeSitemap

Extracts all URLs from a sitemap and performs a web scrape on each of them.

Args: sitemap_url (str): URL of the sitemap

Returns: dict: A response object with the status of the scraping job message.

🛠️ Usage

const scrapeSitemapResponse = await carbon.utilities.scrapeSitemap({
  url: "url_example",
  chunk_size: 1500,
  chunk_overlap: 20,
  skip_embedding_generation: false,
  enable_auto_sync: false,
  generate_sparse_vectors: false,
  prepend_filename_to_chunks: false,
  html_tags_to_skip: [],
  css_classes_to_skip: [],
  css_selectors_to_skip: [],
  embedding_model: "OPENAI",
  url_paths_to_include: [],
  url_paths_to_exclude: [],
  urls_to_scrape: [],
  download_css_and_media: false,
  generate_chunks_only: false,
  store_file_only: false,
  use_premium_proxies: false,
});

⚙️ Parameters

url: string
tags: Record<string, Tags1>
max_pages_to_scrape: number
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
enable_auto_sync: boolean
generate_sparse_vectors: boolean
prepend_filename_to_chunks: boolean
html_tags_to_skip: string[]
css_classes_to_skip: string[]
css_selectors_to_skip: string[]
embedding_model: EmbeddingGenerators
url_paths_to_include: string[]

URL subpaths or directories that you want to include. For example if you want to only include URLs that start with /questions in stackoverflow.com, you will add /questions/ in this input

url_paths_to_exclude: string[]

URL subpaths or directories that you want to exclude. For example if you want to exclude URLs that start with /questions in stackoverflow.com, you will add /questions/ in this input

urls_to_scrape: string[]

You can submit a subset of URLs from the sitemap that should be scraped. To get the list of URLs, you can check out /process_sitemap endpoint. If left empty, all URLs from the sitemap will be scraped.

download_css_and_media: boolean

Whether the scraper should download css and media from the page (images, fonts, etc). Scrapes might take longer to finish with this flag enabled, but the success rate is improved.

generate_chunks_only: boolean

If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.

store_file_only: boolean

If this flag is enabled, the file will be stored with Carbon, but no processing will be done.

use_premium_proxies: boolean

If the default proxies are blocked and not returning results, this flag can be enabled to use alternate proxies (residential and office). Scrapes might take longer to finish with this flag enabled.
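
To get a rough idea of which sitemap URLs a given url_paths_to_include and url_paths_to_exclude combination would keep, the selection can be approximated locally. This is only a sketch of the "starts with" behavior described above, not the server's actual logic: `previewSelection` is a hypothetical helper, and treating an empty include list as "include everything" and letting exclusions win over inclusions are both assumptions.

```typescript
// True if the URL's path starts with any of the given path prefixes.
function pathStartsWithAny(url: string, prefixes: readonly string[]): boolean {
  const path = new URL(url).pathname;
  return prefixes.some((prefix) => path.startsWith(prefix));
}

// Approximates which URLs remain after applying include and exclude paths.
function previewSelection(
  urls: readonly string[],
  include: readonly string[],
  exclude: readonly string[],
): string[] {
  return urls.filter((url) => {
    // Assumption: an empty include list means every URL is a candidate.
    const included = include.length === 0 || pathStartsWithAny(url, include);
    // Assumption: an exclude match always removes the URL.
    return included && !pathStartsWithAny(url, exclude);
  });
}

// Illustrative URLs in the spirit of the stackoverflow.com example above.
const sitemapUrls = [
  "https://stackoverflow.com/questions/123",
  "https://stackoverflow.com/questions/tagged/typescript",
  "https://stackoverflow.com/tags",
];

const selectedUrls = previewSelection(
  sitemapUrls,
  ["/questions/"],
  ["/questions/tagged/"],
);

// Only the /questions/123 URL remains.
console.log(selectedUrls);
```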

🌐 Endpoint

/scrape_sitemap POST

🔙 Back to Table of Contents


carbon.utilities.scrapeWeb

Conduct a web scrape on a given webpage URL. Our web scraper is fully compatible with JavaScript and supports recursion depth, enabling you to efficiently extract all content from the target website.

🛠️ Usage

const scrapeWebResponse = await carbon.utilities.scrapeWeb([
  {
    url: "url_example",
    recursion_depth: 3,
    max_pages_to_scrape: 100,
    chunk_size: 1500,
    chunk_overlap: 20,
    skip_embedding_generation: false,
    enable_auto_sync: false,
    generate_sparse_vectors: false,
    prepend_filename_to_chunks: false,
    html_tags_to_skip: [],
    css_classes_to_skip: [],
    css_selectors_to_skip: [],
    embedding_model: "OPENAI",
    url_paths_to_include: [],
    download_css_and_media: false,
    generate_chunks_only: false,
    store_file_only: false,
    use_premium_proxies: false,
  },
]);

⚙️ Request Body

WebscrapeRequest[]

🌐 Endpoint

/web_scrape POST

🔙 Back to Table of Contents


carbon.utilities.searchUrls

Perform a web search and obtain a list of relevant URLs.

As an illustration, when you perform a search for “content related to MRNA,” you will receive a list of links such as the following:

- https://tomrenz.substack.com/p/mrna-and-why-it-matters

- https://www.statnews.com/2020/11/10/the-story-of-mrna-how-a-once-dismissed-idea-became-a-leading-technology-in-the-covid-vaccine-race/

- https://www.statnews.com/2022/11/16/covid-19-vaccines-were-a-success-but-mrna-still-has-a-delivery-problem/

- https://joomi.substack.com/p/were-still-being-misled-about-how

Subsequently, you can submit these links to the web_scrape endpoint in order to retrieve the content of the respective web pages.

Args: query (str): Query to search for

Returns: FetchURLsResponse: A response object with a list of URLs for a given search query.
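
Handing search results on to the web_scrape endpoint amounts to turning a list of URLs into a list of request objects. The sketch below assumes the results are already available as plain strings; `toScrapeRequests` and `ScrapeRequestSketch` are hypothetical names, the example URLs are placeholders, and only the `url` field name is taken from the scrapeWeb usage example.

```typescript
// Minimal local shape for one scrape request; only `url` is set here.
type ScrapeRequestSketch = { url: string };

// Removes duplicate URLs and wraps each remaining one in a request object.
function toScrapeRequests(urls: readonly string[]): ScrapeRequestSketch[] {
  const unique = Array.from(new Set(urls));
  return unique.map((url) => ({ url }));
}

// Placeholder URLs standing in for search results.
const foundUrls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/a",
];

const scrapeRequests = toScrapeRequests(foundUrls);

// Two requests remain after removing the duplicate.
console.log(scrapeRequests);
```

The resulting array has the same outer shape as the request body shown for scrapeWeb, so further fields such as chunk_size could be added per entry.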

🛠️ Usage

const searchUrlsResponse = await carbon.utilities.searchUrls({
  query: "query_example",
});

⚙️ Parameters

query: string

🔄 Return

FetchURLsResponse

🌐 Endpoint

/search_urls GET

🔙 Back to Table of Contents


carbon.utilities.userWebpages

User Web Pages

🛠️ Usage

const userWebpagesResponse = await carbon.utilities.userWebpages({
  order_by: "created_at",
  order_dir: "asc",
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDirV2

🌐 Endpoint

/user_webpages POST

🔙 Back to Table of Contents


carbon.webhooks.addUrl

Add Webhook Url

🛠️ Usage

const addUrlResponse = await carbon.webhooks.addUrl({
  url: "url_example",
});

⚙️ Parameters

url: string

🔄 Return

Webhook

🌐 Endpoint

/add_webhook POST

🔙 Back to Table of Contents


carbon.webhooks.deleteUrl

Delete Webhook Url

🛠️ Usage

const deleteUrlResponse = await carbon.webhooks.deleteUrl({
  webhookId: 1,
});

⚙️ Parameters

webhookId: number

🔄 Return

GenericSuccessResponse

🌐 Endpoint

/delete_webhook/{webhook_id} DELETE

🔙 Back to Table of Contents


carbon.webhooks.urls

Webhook Urls

🛠️ Usage

const urlsResponse = await carbon.webhooks.urls({
  order_by: "created_at",
  order_dir: "desc",
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir
filters: WebhookFilters

🔄 Return

WebhookQueryResponse

🌐 Endpoint

/webhooks POST

🔙 Back to Table of Contents


carbon.whiteLabel.create

Create White Labels

🛠️ Usage

const createResponse = await carbon.whiteLabel.create([
  {
    data_source_type: "GOOGLE_DRIVE",
    credentials: {
      client_id: "client_id_example",
      redirect_uri: "redirect_uri_example",
    },
  },
]);

⚙️ Request Body

WhiteLabelCreateRequestInner[]

🌐 Endpoint

/white_label/create POST

🔙 Back to Table of Contents


carbon.whiteLabel.delete

Delete White Labels

🛠️ Usage

const deleteResponse = await carbon.whiteLabel.delete({
  ids: [1],
});

⚙️ Parameters

ids: number[]

🌐 Endpoint

/white_label/delete POST

🔙 Back to Table of Contents


carbon.whiteLabel.list

List White Labels

🛠️ Usage

const listResponse = await carbon.whiteLabel.list({
  order_by: "created_at",
  order_dir: "desc",
});

⚙️ Parameters

pagination: Pagination
order_dir: OrderDir

🌐 Endpoint

/white_label/list POST

🔙 Back to Table of Contents


carbon.whiteLabel.update

Update White Label

🛠️ Usage

const updateResponse = await carbon.whiteLabel.update({
  data_source_type: "GOOGLE_DRIVE",
  credentials: {
    client_id: "client_id_example",
    redirect_uri: "redirect_uri_example",
  },
});

⚙️ Parameters

data_source_type: string
credentials: Credentials

🌐 Endpoint

/white_label/update POST

🔙 Back to Table of Contents


Author

This TypeScript package is automatically generated by Konfig



License

Unlicense
