Add to awss3sink and awss3putobjectsink elements the following
paramerters which are set on the uploaded S3 objects:
* cache-control;
* content-encoding; and
* content-language
Bugfix: Set the content-type and content-disposition values in the S3
putobject call. Previously the params were defined on the element but
unused.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1585>
Quoting [`BehaviorVersion` documentation]:
> Over time, new best-practice behaviors are introduced. However, these
> behaviors might not be backwards compatible. For example, a change which
> introduces new default timeouts or a new retry-mode for all operations might
> be the ideal behavior but could break existing applications.
This commit uses `BehaviorVersion::v2023_11_09()`, which is the latest
major version at the moment. When a new major version is released, the method
will be deprecated, which will warn us of the new version and let us decide
when to upgrade, after any changes if required. This is safer that using
`latest()` which would silently use a different major version, possibly
breaking existing code.
[`BehaviorVersion` documentation]: https://docs.rs/aws-config/1.1.8/aws_config/struct.BehaviorVersion.html
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1520>
When streaming small amounts of data, using awss3sink might not be a
good idea, as we need to accumulate at least 5 MB of data for a
multipart upload (or we flush on EOS).
The alternative, while inefficient, is to do a complete PutObject of
_all_ the data periodically so as to not lose data in case of a pipeline
failure. This element makes a start on this idea by doing a PutObject
for every buffer.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1337>
Prior to this commit, we were sending over words concatenated together
with no separators, for instance "Idon'twanttobeanemperor".
The translation service seems clever enough to translate the contents
anyway, but there is no reason to make its task harder than necessary,
and it didn't re-add separators when the target language was the same as
the source language, which resulted in less than ideal output.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1171>
* A queue dedicated to transcript items not intended for translation.
* A queue dedicated to transcript items intended for translation. The items are
enqueued after a separator is detected or translate-lookahead was reached.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1137>
This commit adds an optional experimental translation tokenization feature.
It can be activated using the `translation_src_%u` pads property
`tokenization-method`. For the moment, the feature is deactivated by default.
The Translate ws accepts '<span></span>' tags in the input and adds matching
tags in the output. When an 'id' is also provided as an attribute of the
'span', the matching output tag also uses this 'id'.
In the context of close captions, the 'id's are of little use. However, we can
take advantage of the spans in the output to identify translation chunks, which
more or less reflect the rythm of the input transcript.
This commit adds simples spans (no 'id') to the input Transcript Items and
parses the resulting spans in the translated output, assigning the timestamps
and durations sequentially from the input Transcript Items. Edge cases such as
absence of spans, nested spans were observed and are handled here. Similarly,
mismatches between the number of input and output items are taken care of by
some sort of reconcialiation.
Note that this is still experimental and requires further testings.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This commit adds an optional transcript translation feature implemented as
request src Pads.
When requesting a src Pad, the user can specify the translation language code
using Pad properties 'language-code'.
The following properties are defined on the Element:
- 'transcribe-latency': formerly 'latency', defines the expected latency for
the Transcribe webservice.
- 'translate-latency': defines the expected latency for the Translate
webservice.
- 'transcript-lookahead': maximum transcript duration to send to translation
when a transcript is hitting its deadline and no punctuation was found.
When the input and output languages are the same, only the 'transcribe-latency'
is used for the Pad. Otherwise, the resulting latency is the addition of
'transcribe-latency' and 'translate-latency'.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1109>
This helps gather together the details related to the `TranscriberLoop`.
One difference with previous implementation is that the ws `Client` is
build each time the loop is started instead of being reused. With the new
approach, we don't keep the connection open after EOS and we should be
more resistant in case of a connection failure.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>
Instead of sending transcription events to the src pad loop, this commit
enqueues the transcribed buffers immediately in the ws loop, then notifies
the src pad loop. The src pad loop is only in charge of dequeuing the buffers.
This should help with upcoming evolutions.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1104>
for uploaded object default content-type is set to binary/octet-stream,
which is correct.
metadata cannot be used to set content-type and content-disposition as
setting metadata add a prefix x-amz-meta to key
e.g. setting metadate "content-type=video/mp4" actually set value as
x-amz-meta-content-type. So these has to be seaprate property.
Part-of: <https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/merge_requests/1085>