Storages

Storages are where Vidispine will store any files that are ingested/created in the system. All files on a storage location will get an entry in the Vidispine database, containing state, file size, hash etc. This is to keep track of any file changes.

For information about files in storage, see Files.

Storages

Storage types

A storage must be designated a type, based on what type of operations are to be performed on the contained files. Operations in this context are transcode, move, delete, and destination (that is, placing new files here).

LOCAL
A Vidispine-specific storage, suitable for all operations. Note that LOCAL doesn’t necessarily imply that the storage is physically local. It should however be a dedicated Vidispine storage. That is, files on such storages should not be written to or deleted by any external application.
SHARED
A storage shared with another application. Vidispine will not create new files, nor perform any write operations here.
REMOTE
A storage on a remote computer. Files should be copied to a local storage before being used.
EXTERNAL
A storage placeholder.
ARCHIVE
A storage meant for archiving. It needs a plugin bean or a JavaScript, described in more detail at Archive Integration.
EXPORT
Files are not monitored, but copy operations to here will create a file entry in the database.

Storage states

Storages will have one of the following states:

NONE
Not used.
READY
Operating normally.
OFFLINE
No available storage method could be reached.
FAILED
Currently not used in Vidispine.
DISABLED
Currently not used in Vidispine.
EVACUATING
Storage is being evacuated.
EVACUATED
Evacuating process finished.

For more information about storage evacuation, see section on Evacuating storages.

Storage groups

Storages can be placed in named groups, called storage groups. These storage groups can then be used in Storage rules and Quota rules.

Storage capacity

When a storage is created, a capacity can be specified. This is the total number of bytes that are freely available on the storage. The free capacity is calculated as total capacity - sum(file sizes in database list). Note that this means that the size of MISSING and LOST files is included in the used capacity. If you do not expect a file with these states to return, it is best to delete the file entity using the API.
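
As a rough illustration of the calculation above, the following Python sketch computes the free capacity from a list of file entries. The FileEntry type and its field names are hypothetical and only stand in for the database entries; this is not Vidispine's implementation.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file entry in the database."""
    size: int   # size in bytes
    state: str  # e.g. "CLOSED", "MISSING", "LOST"


def free_capacity(total_capacity: int, files: list[FileEntry]) -> int:
    """Free capacity = total capacity - sum of all file sizes in the database.

    MISSING and LOST files are deliberately not filtered out, mirroring
    the behavior described in the text.
    """
    return total_capacity - sum(f.size for f in files)


files = [
    FileEntry(size=400, state="CLOSED"),
    FileEntry(size=100, state="MISSING"),  # still counted as used space
]
print(free_capacity(1000, files))  # 500
```

Deleting the MISSING entry from the list would raise the reported free capacity to 600, which is why removing stale file entities matters.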

Auto-detecting the storage capacity

By setting the element autoDetect in the StorageDocument you can make Vidispine read the capacity from the file system. This only works if the storage has a storage method that points to the local file system, that is, a file:// URI.

Warning

Do not enable auto-detection for multiple storages located on the same device, as each storage will then have the capacity of the device. This means that storages may appear to have free space in Vidispine, when there is actually no space left on the device.

Storage cleanup

If you have used storage rules to control the placement of files on storages then you may have noticed that files have been copied to the storages selected by the rules, but that files on the source storages have not been removed.

This is by design. Vidispine prefers to keep multiple copies of a file, and only removes files when a storage is about to become full. The storage high and low watermarks control when files should start to be removed, and when enough files have been removed and storage cleanup should stop.

For example, for a 1 TB storage with a high watermark at 80% and a low watermark at 40%, Vidispine will keep adding files to the storage until the usage exceeds 800 GB. Once that happens, cleanup will occur. Files that are deletable, that is, files that have a copy on another storage and that are not required to exist according to the storage rules, will be deleted. Cleanup will stop once the usage has reached 400 GB or when there are no more deletable files.
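
The watermark arithmetic can be sketched in Python as follows. The function names are hypothetical, and the order in which deletable files are removed is an assumption made for the simulation; the text does not specify it.

```python
def watermark_thresholds(capacity: int, high_pct: int, low_pct: int) -> tuple[int, int]:
    """Return the (high, low) usage thresholds in bytes."""
    return capacity * high_pct // 100, capacity * low_pct // 100


def simulate_cleanup(usage: int, deletable_sizes: list[int], high: int, low: int) -> int:
    """Return the usage after one cleanup pass.

    Cleanup only starts once usage exceeds the high watermark, and stops
    when usage reaches the low watermark or no deletable files remain.
    Files are removed in list order (an assumption of this sketch).
    """
    if usage <= high:
        return usage  # cleanup is not triggered
    for size in deletable_sizes:
        if usage <= low:
            break
        usage -= size
    return usage


GB = 10**9
high, low = watermark_thresholds(1000 * GB, 80, 40)
print(high // GB, low // GB)  # 800 400
print(simulate_cleanup(850 * GB, [100 * GB] * 10, high, low) // GB)  # 350
```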

If this behavior is not desirable, then there are two options.

  1. Update the storage rules to specify where files should not exist, using the not element. For example, using <not><any/></not>.

    <StorageRuleDocument xmlns="http://xml.vidispine.com/schema/vidispine">
      <storageCount>1</storageCount>
      <storage>VX-122</storage>
      <not><any/></not>
    </StorageRuleDocument>
    
  2. Set the high watermark on the storage to 0%. Updating the storage rules is preferred as storage cleanup will be triggered continuously if the high watermark is set at a low level.

Evacuating storages

If you would like to delete a storage, but you still have files there which are connected to items, you can first trigger an evacuation of the storage. This will cause Vidispine to attempt to delete redundant files, or move files to other storages. Once the evacuation is complete, the storage will get the state EVACUATED.

Storage methods

Methods are the way Vidispine talks to the storage. Every method has a base URL. See Storage method URIs for the list of supported schemes.

Retrieve a storage to check its status. The storage state shows if the storage is accessible to Vidispine. If a storage is not accessible, then its state will be OFFLINE. Check the failureMessage in the storage methods to find out why. The failure message will be the error from when the last attempt to connect to the storage was made, and will be available even when the storage comes back online again. Compare lastSuccess to lastFailure to determine if the error message is current or not.
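
The comparison of lastSuccess and lastFailure can be sketched as below. This assumes that both values have already been extracted from the storage method and are ISO 8601 strings that Python's datetime.fromisoformat can parse; the exact timestamp format returned by the API is not stated here, so treat that as an assumption.

```python
from datetime import datetime
from typing import Optional


def failure_is_current(last_success: Optional[str], last_failure: Optional[str]) -> bool:
    """Return True if the most recent connection attempt was a failure.

    A failureMessage is only relevant if lastFailure is newer than
    lastSuccess (or if there has never been a success).
    """
    if last_failure is None:
        return False
    if last_success is None:
        return True
    return datetime.fromisoformat(last_failure) > datetime.fromisoformat(last_success)


# The storage came back online after the failure, so the message is stale.
print(failure_is_current("2015-08-14T10:00:00+00:00", "2015-08-14T09:00:00+00:00"))  # False
```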

If multiple methods are defined for one storage, it is important, in order to avoid inconsistencies, that they all point to the same physical location. E.g. a storage might have one file system method, and one HTTP method. The HTTP URL must point to the same physical location as the file system method.

Storage method examples

Here are some examples of valid storage methods:

  • file:///mnt/vidistorage/
  • ftp://vidispine:pA5sw0rd!?@10.85.0.10/storage/
  • azure://:%2ZmFuODl0MGg0MmJ5ZnZuczc5YmhndjkrZThodnV5Ymhqb2lwbW9lcmN4c2Rmc2Q0NThmdjQ0Mzc4cWF5NGcxNg0Kdjg0NyANCmw3csO2NWk%3D%3D@vsstorage/

Method types

Methods can also be of different types. By default, the type is empty. Only methods with an empty type are used by Vidispine when doing file operations. Methods of other types are ignored for file operations, but can be returned, for example when requesting URLs in search results.

New in version 4.1: Credentials are encrypted. This means that passwords cannot be viewed through the API/server logs.

Auto method types

One exception is method type AUTO, or any method type with prefix AUTO-. When a file URL is requested with such a method type, a no-auth URL will be created (with the method URL as base).

If there is no AUTO method defined, but a file URL is requested with method type AUTO, an implicit one will be used automatically.

GET /item/VX-2406?content=uri&methodType=AUTO
Accept: application/xml
<ItemDocument xmlns="http://xml.vidispine.com/schema/vidispine" id="VX-2406">
  <files>
    <uri>http://vs.example.com:8089/APInoauth/storage/VX-1/file/VX-6537/0.7354486788234469/VX-6537.mp4</uri>
    <uri>http://vs.example.com:8089/APInoauth/storage/VX-1/file/VX-6536/0.7638025887084131/VX-6536.dv</uri>
  </files>
</ItemDocument>

The URL returned is only valid for the duration of fileTempKeyDuration minutes. The expiration timer is reset whenever the URL is used in a new operation (e.g. HEAD or GET).

Method metadata

In addition to selecting method types, method metadata can be given as instructions for the URI returned. Two metadata values are defined:

format

Specifies if any special format of the URI should be returned. By default, the normal URI is returned. Two values are defined:

SIGNED
Returns an HTTP URI that is signed and points directly to Azure or S3 storage. If a signed URI cannot be generated from the underlying (default) URI, no URI is returned.
SIGNED-AUTO

New in version 4.2.9.

As above, but if no URI can be generated, an AUTO URI (see above) is returned.

expiration
Sets the expiration time of the signed URI, in minutes. If not specified, the expiration time is 60 minutes, unless azureSasValidTime is set.
contentDisposition
Sets the Content-Disposition header for the signed URI. If not specified, the Content-Disposition header will be set to null.
vsauri

Specifies if the VSA URI (scheme vxa) should use UUID or name syntax. By default, UUID is used.

UUID
Return URI with hostname being the UUID of the VSA.
NAME
Return URI with hostname being the NAME of the VSA.

GET /item/VX-206?content=uri&methodMetadata=format=SIGNED-AUTO&methodMetadata=contentDisposition=attachment%3b+filename%3dmyfile.mov
Accept: application/xml
<ItemDocument xmlns="http://xml.vidispine.com/schema/vidispine" id="VX-206">
  <files>
    <uri>https://vstest.s3.amazonaws.com/VX-362.mp4?Expires=1439545041&amp;AWSAccessKeyId=AKIAJCCXQRY2MW4YQUVQ&amp;Signature=UcNdTIm1v1omM%2FaIGaYXf4QNfc%3D</uri>
    <uri>http://vs.example.com:8089/APInoauth/storage/VX-1/file/VX-336/0.7638025117084131/VX-336.dv</uri>
  </files>
</ItemDocument>
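
Because each methodMetadata value is itself a key=value pair, the query string for a request like the one above needs careful escaping. The sketch below builds it with Python's standard library. The helper name item_uri_request is hypothetical, and the sketch percent-encodes every reserved character in the values, including the inner "=", which decodes to the same parameters as the request shown above.

```python
from urllib.parse import parse_qsl, urlencode


def item_uri_request(item_id: str, metadata: dict[str, str]) -> str:
    """Build the request path for fetching item URIs with method metadata."""
    params = [("content", "uri")]
    # One methodMetadata parameter per entry, each carrying "key=value".
    params += [("methodMetadata", f"{key}={value}") for key, value in metadata.items()]
    return f"/item/{item_id}?{urlencode(params)}"


path = item_uri_request(
    "VX-206",
    {
        "format": "SIGNED-AUTO",
        "contentDisposition": "attachment; filename=myfile.mov",
    },
)
print(path)

# Decoding the query string recovers the original key=value pairs.
decoded = parse_qsl(path.split("?", 1)[1])
print(decoded)
```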

Parent directory management

For local file systems (method is using a file:// URI), Vidispine will by default remove empty parent directories when deleting the last file in the directory.

New in version 4.2.5: This can be controlled, either on system level or on storage level. If the storage metadata keepEmptyDirectories is set to true, empty directories are preserved in that storage. Likewise, if the configuration property keepEmptyDirectories is set to true, empty directories are preserved for all storages. Storage configuration overrules system configuration.

Files

When are files scanned?

In order to discover changes made to files, or if any files have been removed/added, Vidispine will scan the storages periodically. It is possible to disable the scanning by not having any methods with browse=true on the storage. The scan interval is also configurable on a per storage basis by setting the scanInterval storage metadata. The value should be in seconds. Setting this to a higher value will lower the I/O load of the device, but any file changes will take longer to be discovered. This also means that file notifications for file changes or file creation will be triggered later for changes occurring outside of Vidispine’s control.

You can force a rescan of a storage by calling POST /storage/(storage-id)/rescan. This will trigger an immediate rescan of a storage if the supervisor is idle. If a supervisor is already busy processing the files then you may notice that the rescan happens some time later.

Avoiding frequent scans of S3 storages

New in version 4.4.

Scanning an S3 storage can be expensive both in terms of time and money. To make it cheaper to access an S3 bucket, you can configure Vidispine to poll Amazon SQS for S3 events.

See S3 Event Notifications for more information.

File States

Files can be in one of the following states:

NONE
Just created, not used.
OPEN
Discovered or created, not yet marked as finished.
CLOSED
File is no longer growing.
UNKNOWN
The current state is not known.
MISSING
File is missing from the file system/storage.
LOST
File has been missing for a longer period. Candidate for restoration from archive.
TO_APPEAR
File will appear on file system/storage. The transfer subsystem or transcoder will create it.
TO_BE_DELETED
The file is no longer in use, and will be deleted at the next clean-up sweep.
BEING_READ
File is in use by transfer subsystem or transcoder.
ARCHIVED
File is archived.
AWAITING_SYNC
File will be synchronized by multi-site agent.

Vidispine will mark a file as MISSING when it is first detected that the file no longer exists on the storage. No action is taken for files that are missing. If the file does not appear within the time specified by lostLimit, then the file will be marked as LOST. Lost files will be restored from other copies if such exist.
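
The MISSING to LOST transition described above amounts to a simple threshold check. The following sketch assumes that lostLimit and the time a file has been absent are expressed in the same unit (seconds here, which is an assumption); the function name is hypothetical.

```python
def state_for_absent_file(seconds_missing: float, lost_limit: float) -> str:
    """Return the state of a file that is currently absent from the storage.

    The file stays MISSING until it has been absent for longer than
    lost_limit, after which it is considered LOST.
    """
    return "LOST" if seconds_missing > lost_limit else "MISSING"


print(state_for_absent_file(600, lost_limit=3600))   # MISSING
print(state_for_absent_file(7200, lost_limit=3600))  # LOST
```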

Items and storages

By default, when creating a new file, Vidispine will choose the LOCAL storage with the highest free capacity. This can be changed in a few different ways:

File hashing

Vidispine will calculate a hash for all files in a storage. This is done by a background process, running continuously. Files are hashed one by one for performance reasons, so if a large number of files are added to the system in a short time span it might take some time for all hashes to be calculated. The default hashing algorithm is SHA-1. This can be changed by setting the configuration property fileHashAlgorithm. See below for a list of supported values.

Additional algorithms

Vidispine can be configured to calculate hashes using additional algorithms by setting the additionalHash metadata field on the storage. It should contain a comma-separated list (no spaces) of algorithms. The supported algorithms are:

  • MD2
  • MD5
  • SHA-1
  • SHA-256
  • SHA-384
  • SHA-512
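
To verify a hash reported by Vidispine independently, the same digests can be computed locally. The sketch below reads the data once and feeds every requested algorithm, using Python's hashlib. The mapping from the names above to hashlib names is part of the sketch, and MD2 is left out because hashlib does not guarantee an MD2 implementation.

```python
import hashlib
import io
from typing import BinaryIO

# Algorithm names as listed above, mapped to hashlib names (MD2 omitted).
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def parse_additional_hash(value: str) -> list[str]:
    """Split a comma-separated additionalHash value (no spaces)."""
    return [name for name in value.split(",") if name]


def hash_stream(stream: BinaryIO, algorithms: list[str], chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute hex digests for all algorithms in a single pass over the data."""
    digests = {name: hashlib.new(HASHLIB_NAMES[name]) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


algorithms = ["SHA-1"] + parse_additional_hash("MD5,SHA-256")
digests = hash_stream(io.BytesIO(b"abc"), algorithms)
print(digests["SHA-1"])  # a9993e364706816aba3e25717850c26c9cd0d89d
```

For a file on disk, pass open(path, "rb") instead of the in-memory stream.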

Throttling storage I/O

Vidispine will retrieve information about files on a storage at the configured scan intervals. If you find that the I/O on your local disk drives is high, even when no transfers or transcodes are being performed, then you can try rate limiting the stat calls performed by Vidispine. Do this by setting the storage metadata statsPerSecond or the configuration property statsPerSecond to a suitable limit. During the file system scan, Vidispine will typically perform one stat per file.

An easy way to check if rate limiting the stat calls will have any effect is to disable the storage supervisors in Vidispine. This can be done using PUT /vidispine-service/service/StorageSupervisorServlet/disable. Remember to enable the service afterwards or you will find that Vidispine no longer detects new files on the storages, among other things.

It could also be that the file hashing service is the cause of the I/O. You should be able to tell which service is behind it by monitoring your disk devices. If there is high read activity, or a large amount of data read from a device, then file hashing could be the cause. If the number of read operations per second is high, then it is more likely the storage supervisor.

Tip

Use tools such as htop, iotop, dstat and iostat to monitor your systems and devices.

Throttling transfer to and from a storage

New in version 4.0.

It is possible to specify a bandwidth on a storage or a specific storage method. This causes any file transfers involving the specified storage or storage method to be throttled. If multiple transfers take place concurrently, the total bandwidth will be allocated between the transfers. If a bandwidth is set on both the storage and its storage methods, the lowest applicable bandwidth will be used.
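
The selection rule above can be sketched as follows. The function names are hypothetical, and the even split between concurrent transfers is an assumption of this sketch; the text only says that the total bandwidth is allocated between the transfers.

```python
from typing import Optional


def effective_bandwidth(storage_bw: Optional[int], method_bw: Optional[int]) -> Optional[int]:
    """Return the lowest bandwidth that is set, or None if nothing is set."""
    limits = [bw for bw in (storage_bw, method_bw) if bw is not None]
    return min(limits) if limits else None


def per_transfer_share(total_bw: int, concurrent_transfers: int) -> int:
    """Split the bandwidth evenly between transfers (assumed policy)."""
    return total_bw // concurrent_transfers


limit = effective_bandwidth(storage_bw=50_000_000, method_bw=20_000_000)
print(limit)                         # 20000000
print(per_transfer_share(limit, 4))  # 5000000
```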

To set a bandwidth, set the bandwidth element in the StorageDocument or StorageMethodDocument when creating or updating a storage or storage method. The bandwidth is set in bytes per second.

Example

Updating a storage to set a bandwidth of 50,000,000 bytes per second.

PUT /storage/VX-2
Content-Type: application/xml

<StorageDocument xmlns="http://xml.vidispine.com/schema/vidispine">
  <type>LOCAL</type>
  <capacity>1000000000</capacity>
  <bandwidth>50000000</bandwidth>
</StorageDocument>

Example

Updating a storage method to set a bandwidth of 20,000,000 bytes per second.

PUT /storage/VX-2/method?uri=http://10.5.1.2/shared/&bandwidth=20000000

Temporary storages for transcoder output

New in version 4.2.3.

The Vidispine transcoder requires that the destination (output) file can be partially updated. This is in order to be able to write the file headers after the essence has been written.

In previous versions, this was solved by the application server storing the intermediate result as a temporary file on the local file system (/tmp). This requires a lot of space on the application server.

With version 4.2.3, another strategy is available. Instead of storing the result as one file on the application server, several small files are stored directly on the destination file system as “segments”. After the transcode has finished, the segments are merged. On S3 storage, this merging can be done with S3 object(s)-to-object copy.

Control of the segment file strategy is via the useSegmentFiles configuration property.

Storage credentials

Storage credentials can be specified in the storage URL, but can also be saved in an external location and referenced by an alias. This is configured in the server configuration file. Credentials can be stored in a Java Keystore, a local file, or Hashicorp Vault, each described below.

For example, an FTP storage could be configured either using ftp://testuser:testpassword@ftp.example.com/, or using ftp://exampleftp@ftp.example.com/, with exampleftp being an alias referencing the externally stored credentials.

Java Keystore

A Java Keystore can be used to store private keys, for example, the private keys for a Google Cloud Platform service account.

server.yaml
secrets:
  keyStore:
    path: /etc/vidispine/server.keystore
    password: changeit

Local file

For local file secret storage, the alias refers to the file under the configured secret path, containing the private key or username and password credentials.

  • With private keys, the file should contain the private key as is.
  • With username and password credentials, the alias should refer to a directory containing two files, username and password.

For example:

server.yaml
secrets:
  file:
    path: /etc/secrets/

$ mkdir -p /etc/secrets/exampleftp/
$ echo -n "testuser" > /etc/secrets/exampleftp/username
$ echo -n "testpassword" > /etc/secrets/exampleftp/password

This could be one way to consume credentials from secrets in Kubernetes, or similar services that expose secrets via the local file system.
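
To check that a secret directory is laid out as described, the files can be read back with a few lines of Python. The read_secret helper is hypothetical and only mirrors the layout described above; it is not how Vidispine itself loads secrets. The example uses a temporary directory in place of /etc/secrets/.

```python
import tempfile
from pathlib import Path


def read_secret(secret_path: str, alias: str) -> dict:
    """Read the secret stored under secret_path for the given alias.

    A directory is treated as username/password credentials, and a
    plain file is treated as a private key stored as is.
    """
    entry = Path(secret_path) / alias
    if entry.is_dir():
        return {
            "username": (entry / "username").read_text(),
            "password": (entry / "password").read_text(),
        }
    return {"private_key": entry.read_text()}


with tempfile.TemporaryDirectory() as root:
    # Same layout as the shell commands above, without trailing newlines.
    alias_dir = Path(root) / "exampleftp"
    alias_dir.mkdir()
    (alias_dir / "username").write_text("testuser")
    (alias_dir / "password").write_text("testpassword")
    credentials = read_secret(root, "exampleftp")

print(credentials["username"])  # testuser
```

Note the use of echo -n in the shell example: a trailing newline in the file would otherwise become part of the value read back.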

Hashicorp Vault

When using Hashicorp Vault, the alias should match the name of a secret in Vault. Username and password credentials will be read from the keys username and password; private keys from the private_key key.

For example:

server.yaml
secrets:
  vault:
    address: http://vault.example.com:8200
    token: 2262e94c-39c3-b9a8-605d-f0450dfc558b
    keyPrefix: secret/

The keyPrefix setting can be used to, among other things, select the backend to use. For example, with Vault configured with a “generic” backend mounted at secret/:

$ vault mounts
Path        Type       Default TTL  Max TTL  Description
secret/     generic    system       system   generic secret storage
sys/        system     n/a          n/a      system endpoints used for control, policy and debugging
$ vault write secret/exampleftp username=testuser password=testpassword
$ vault read secret/exampleftp
Key                  Value
---                  -----
refresh_interval     720h0m0s
password             testpassword
username             testuser

Storage method URIs

The following URI schemes are defined.

file

Syntax: file:///{path}
Example: file:///mnt/storage/, file:///C:/mystorage/
Note: The URI file://mnt/storage/ is not valid! (But file:/mnt/storage/ is.)
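
The reason file://mnt/storage/ is rejected is that the text after the two slashes is parsed as a host name rather than as part of the path. A quick check with Python's standard URL parser illustrates this. The is_valid_file_uri helper is a hypothetical pre-flight check, not Vidispine's own validation.

```python
from urllib.parse import urlsplit


def is_valid_file_uri(uri: str) -> bool:
    """Accept file URIs with an empty host and an absolute path."""
    parts = urlsplit(uri)
    return parts.scheme == "file" and parts.netloc == "" and parts.path.startswith("/")


for uri in ("file:///mnt/storage/", "file:/mnt/storage/", "file://mnt/storage/"):
    parts = urlsplit(uri)
    print(uri, "host:", repr(parts.netloc), "path:", parts.path, is_valid_file_uri(uri))
```

In the last case the parser reports mnt as the host and /storage/ as the path, which is not what was intended.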

ftp

Syntax: ftp://{user}:{password}@{host}/{path}
Example: ftp://johndoe:secr3t@example.com/mystorage/

New in version 4.1.2: Add query parameter passive=false to force active mode. To set the client side ports used in active mode, set the configuration property ftpActiveModePortRange. The value should be a range, e.g. 42100-42200.

To set the client IP used in active mode, set the configuration property ftpActiveModeIp.
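
User names and passwords that contain characters such as @, : or / cannot be placed in a URI as they are. The sketch below assembles an FTP storage URI and percent-encodes the credentials. The helper name ftp_storage_uri is hypothetical, and the assumption that Vidispine decodes percent-encoded credentials follows general URI conventions rather than anything stated in this section.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def ftp_storage_uri(user: str, password: str, host: str, path: str,
                    passive: Optional[bool] = None) -> str:
    """Build an ftp:// storage URI with percent-encoded credentials."""
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"ftp://{userinfo}@{host}/"
    clean_path = path.strip("/")
    if clean_path:
        uri += quote(clean_path) + "/"  # storage URIs end with a slash
    if passive is not None:
        # passive=false forces active mode, as described above.
        uri += "?" + urlencode({"passive": str(passive).lower()})
    return uri


print(ftp_storage_uri("johndoe", "secr3t", "example.com", "mystorage"))
# ftp://johndoe:secr3t@example.com/mystorage/
print(ftp_storage_uri("johndoe", "p@ss:word", "example.com", "mystorage", passive=False))
# ftp://johndoe:p%40ss%3Aword@example.com/mystorage/?passive=false
```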

sftp

Syntax: sftp://{user}:{password}@{host}/{path}
Example: sftp://johndoe:secr3t@example.com/mystorage/

http

Syntax: http://{user}:{password}@{host}/{path}
Example: http://johndoe:secr3t@example.com/mystorage/
Note: Requires WebDAV support in host.

https

Syntax: https://{user}:{password}@{host}/{path}
Example: https://johndoe:secr3t@example.com/mystorage/
Note: Requires WebDAV support in host.

omms

Syntax: omms://{userId}:{userKey}@{hostList}/{clusterId}/{vaultId}/
Example: omms://c2f6a2f4-6927-11e1-cc94-ab94bd11183f:some%20secret@10.0.0.3,10.0.0.4/4255378f-dc73-fca3-e40d-5726008b3dac/0a49472d-15d4-12f1-862e-f9708d49267e/
Note: Object Matrix MatrixStore.

s3

Syntax: s3://{accessKey}:{secretKey}@{bucket}/{path}
Example: s3://KDASODSALSDI8U:RxZYlu23NDSIN293002WdlNyq@mystore/storage1/

If no access key is provided, then the credentials will be read from the AwsCredentials.properties file in the credentials directory, if one exists. Else, credentials will be read from the default locations used by the AWS SDK.

Valid S3 bucket names must agree with DNS requirements.

Changed in version 4.6: The default locations specified by the AWS SDK are now also searched.

The following query parameters are supported:

endpoint

The endpoint that the S3 requests will be sent to.

See Regions and Endpoints in the Amazon documentation for more information.

New in version 4.4.

region

The region that will be used in the S3 requests.

See Regions and Endpoints in the Amazon documentation for more information.

New in version 4.4.

signer

The algorithm to use for signing requests. Valid values include S3SignerType for AWS signature v2, and AWSS3V4SignerType for AWS signature v4.

New in version 4.5.3.

Default: Signature algorithm will be selected by region.

Note

For Version 4 Signature only regions (Beijing and Frankfurt) to work, the endpoint or region parameter must be set. Example:

  • s3://frankfurt-bucket/?endpoint=s3.eu-central-1.amazonaws.com
  • s3://frankfurt-bucket/?region=eu-central-1

Storage method metadata keys can be used to control the interaction with the storage.

storageClass

The default Amazon S3 storage class that will be used for new files created on an Amazon S3 storage. Can be either standard, infrequent or reduced.

Default: standard

New in version 4.0.3.

Changed in version 4.5: Support for infrequent access was added.

sseAlgorithm

The encryption used to encrypt data on the server side. See Server-Side Encryption. By default no encryption will be performed.

This sets the x-amz-server-side-encryption header on PUT Object S3 requests.

Example: AES256

New in version 4.4.1.

accelerate

Enable S3 Transfer Acceleration.

Default: false

New in version 4.8.

retrievalTier

The default Glacier retrieval tier to use when restoring the file. Can be set to either Expedited, Standard or Bulk. See Restoring Archived Objects for more information.

New in version 4.9.

Note

For S3 Transfer Acceleration to work, the endpoint or region parameter must be set. Also make sure that transfer acceleration is enabled on the bucket.

Other S3 compatible endpoints may not support transfer acceleration.

ds3

Syntax: ds3://{accessKey}:{secretKey}@{bucket}/{path}
Example: ds3://KDASODSALSDI8U:RxZYlu23NDSIN2Nyq@bucketname/?endpoint=http://blackpearl-endpoint
Note: Spectra BlackPearl Deep Storage Gateway.

New in version 4.5.

The following query parameters are supported:

endpoint

The endpoint of the BlackPearl service. This is mandatory.

chunkReadyTimeout

The maximum time (in seconds) to wait for BlackPearl to prepare the target data chunk. If the timeout is exceeded, an EOF is returned.

Default: 1800

checksumType

If set, a client-side checksum will be computed and sent to the BlackPearl gateway for data integrity verification. Supported checksum types are md5, crc32 and crc32c.

Default: Empty, no checksum will be sent.

azure

Syntax: azure://:{accessKey}@{accountName}/{containerName}
Example: azure://:KLKau23dEE02WdlLiO@companyname/container1/

New in version 4.0.1.

gs

Google Cloud Storage.

New in version 4.7.

Using a P12 private key:

Syntax: gs://{privateKeyAlias}@{bucket}/?project={project}&account={account}
Example: gs://test-key-p12@test-bucket/?project=12345&account=67890

Using a JSON private key:

Syntax: gs://{privateKeyAlias}@{bucket}/?project={project}
Example: gs://test-key-json@test-bucket/?project=12345

Using an OAuth2 access token:

Syntax: gs://:{accessToken}@{bucket}/?project={project}
Example: gs://:abc123@test-bucket/?project=12345

Using the credentials file specified in the GOOGLE_APPLICATION_CREDENTIALS environment variable:

Syntax: gs://{bucket}/
Example: gs://test-bucket/

See also

See here for some notes on how to write URIs.