Extension |
Name |
Risk |
Default Filtering Configuration |
Short Description |
|
csv |
Comma Separated Values (.csv) |
Medium |
field delimiter: comma (,); text qualifier: double quote ("); CSV escaping mode: duplicates the qualifier; excludes qualifiers from extracted text; excludes leading/trailing white space from the extracted text; adds qualifiers to output when appropriate; extraction mode: extracts table data; table properties: values start at line 1 (no row of column names); extracts data from all columns; the number of columns is defined by the values (may vary between rows); allows trimming of leading/trailing spaces and tabs; converts \t, \n, \\ and \uXXXX into characters; separates lines with line feeds (\n); includes okf_html@FP-subfilter-default and protects generic placeholders |
The filter extracts all table data from all columns. Generic placeholders and embedded HTML are protected. |
|
dita |
Darwin Information Typing Architecture (.dita) |
Medium |
assumes the document is well-formed; preserves white space; uses codeFinder to protect generic placeholders |
The filter accepts only well-formed XML documents that adhere to DITA syntax rules. Generic placeholders are protected. |
|
ditamap |
Darwin Information Typing Architecture Map (.ditamap) |
Low |
assumes the document is well-formed; lists elements and attributes for translation |
The filter accepts only well-formed documents that adhere to XML syntax rules. |
|
docm |
Microsoft Word (.docm) |
Medium |
does not extract document properties and comments; translates headers and footers; excludes graphical metadata; automatically accepts revisions; includes styles and highlights |
The filter extracts everything except document properties, comments and graphical metadata. It automatically accepts revisions if they are present in the document. |
|
docx |
Microsoft Word (.docx) |
Medium |
extracts headers and footers; excludes graphical metadata; includes HTML subfilter |
The filter extracts everything except document properties, comments and graphical metadata. |
|
dtd |
Document Type Definition XML (.dtd) |
Low |
The filter is intended for XML DTD files that contain entity declarations with translatable text. |
|
|
html / htm |
HyperText Markup Language (.htm) |
Low |
protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
icml |
InCopy Markup Language (.icml) |
Medium |
extracts master spreads; simplifies inline codes where possible; uses codeFinder for tag protection |
The filter extracts all content from the file. |
|
idml |
InDesign Markup Language (.idml) |
Medium |
does not untag XML structures (the filter cannot put the tags back, so this must be done manually during DTP, which may be an issue depending on the size of the file); extracts master spreads |
The filter extracts all content from the file, except for XML structures. |
|
|
json |
JavaScript Object Notation (.json) |
Medium |
extracts all key/string pairs; extracts strings without an associated key; uses the key as resname; an HTML subfilter deals with embedded HTML and protects generic placeholders |
The filter extracts all values. Embedded HTML and generic placeholders are protected. |
|
markdown / md |
Markdown (.markdown) |
Low |
translates fenced code blocks; translates inline code; translates the YAML metadata header; translates image alt text; protects placeholders as inline codes (in this configuration, placeholders such as #company and [checkout_date] are not protected, because # and [...] are part of Markdown syntax); uses the default embedded HTML filter configuration tailored for the Markdown filter (no HTML subfilter is needed) |
The filter extracts all content from the file. Embedded HTML and generic placeholders are protected. In this configuration, placeholders such as #company and [checkout_date] are not protected, because # and [...] are part of Markdown syntax. |
|
mif |
Adobe FrameMaker Interchange format (.mif) |
Medium |
extracts variables; extracts index markers; extracts body pages; extracts master pages; inline code protection for fonts |
The filter extracts variables, index markers, body pages and master pages. |
|
mqxliff |
XML Localization Interchange File Format (.mqxliff) |
Medium |
adds the target language attribute if not present; segments only if the input text is segmented; includes ITS markup; balances codes; uses a custom XML stream parser; sets Finished segments as translate="no"; default translation_type value: manual_translation; value of the tm_score to match: 100.00; protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
mxliff |
XML Localization Interchange File Format (.mxliff) |
Medium |
adds the target language attribute if not present; segments only if the input text is segmented; includes ITS markup; balances codes; uses a custom XML stream parser; sets Finished segments as translate="no"; value of the tm_score to match: 100.00; protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
odp |
OpenDocument (Ver 2) Presentation (.odp) |
High |
extracts everything |
The filter extracts everything from the file. Each embedded file is treated as a separate sub-document. For example, when an ODP file is extracted to XLIFF, the resulting XLIFF document is made up of three <file> elements: one for content.xml, one for styles.xml and one for meta.xml. Note that very often only content.xml contains extracted text. |
|
ods |
OpenDocument (Ver 2) Spreadsheet (.ods) |
Medium |
extracts everything |
The filter extracts everything from the file. Each embedded file is treated as a separate sub-document. For example, when an ODS file is extracted to XLIFF, the resulting XLIFF document is made up of three <file> elements: one for content.xml, one for styles.xml and one for meta.xml. Note that very often only content.xml contains extracted text. |
|
odt |
OpenDocument (Ver 2) Text Document (.odt) |
Medium |
extracts everything |
The filter extracts everything from the file. Each embedded file is treated as a separate sub-document. For example, when an ODT file is extracted to XLIFF, the resulting XLIFF document is made up of three <file> elements: one for content.xml, one for styles.xml and one for meta.xml. Note that very often only content.xml contains extracted text. |
|
ots |
OpenDocument (Ver 2) Spreadsheet Template (.ots) |
Medium |
extracts everything |
The filter extracts everything from the file. Each embedded file is treated as a separate sub-document. For example, when an OTS file is extracted to XLIFF, the resulting XLIFF document is made up of three <file> elements: one for content.xml, one for styles.xml and one for meta.xml. Note that very often only content.xml contains extracted text. |
|
po |
Portable Object (.po) |
Low |
bilingual mode: msgid contains the source text and msgstr contains the translation; generates identifiers from the source text; CodeFinder takes care of the placeholders (no HTML subfilter) |
The filter treats the file as bilingual - it extracts the content of "msgid" and places the translation in "msgstr". Generic placeholders are protected. |
|
potm |
Microsoft PowerPoint (.potm) |
High |
does not extract document properties and comments; extracts Masters; ignores placeholder text in Masters |
The filter extracts all content but document properties, comments and notes. It extracts the content of the Master slide while ignoring the placeholder text in it. |
|
potx |
Microsoft PowerPoint (.potx) |
High |
does not extract document properties and comments; extracts Masters; ignores placeholder text in Masters |
The filter extracts all content but document properties, comments and notes. It extracts the content of the Master slide while ignoring the placeholder text in it. |
|
ppsm |
Microsoft PowerPoint (.ppsm) |
High |
does not extract document properties and comments; extracts Masters; ignores placeholder text in Masters |
The filter extracts all content but document properties, comments and notes. It extracts the content of the Master slide while ignoring the placeholder text in it. |
|
ppsx |
Microsoft PowerPoint (.ppsx) |
High |
does not extract document properties and comments; extracts Masters; ignores placeholder text in Masters |
The filter extracts all content but document properties, comments and notes. It extracts the content of the Master slide while ignoring the placeholder text in it. |
|
pptm |
Microsoft PowerPoint (.pptm) |
High |
does not extract document properties and comments; extracts Masters; ignores placeholder text in Masters |
The filter extracts all content but document properties, comments and notes. It extracts the content of the Master slide while ignoring the placeholder text in it. |
|
pptx |
Microsoft PowerPoint (.pptx) |
High |
extracts all slides; extracts master slides, but not the placeholder text on them |
The filter extracts all content but document properties and comments. It extracts the content of the Master slide while ignoring the placeholder text in it. It also extracts speaker notes. |
|
properties |
Configuration File (.properties) |
Low |
uses localization directives when they are present; extracts items outside the scope of localization directives; extracts comments as notes; converts \n and \t to line breaks and tabs; CodeFinder takes care of placeholders (an HTML subfilter deals with embedded HTML); does not escape extended characters (\uHHHH notation) |
The filter extracts the content of the values. Embedded HTML and generic placeholders are protected. |
|
resx |
.NET Managed Resource (.resx) |
Low |
extracts by default: //data[not(@type) and not(starts-with(@name, '>'))]/value and //data[@name='$this.Text']/value; extracts as notes: //data[not(@type) and not(starts-with(@name, '>') or starts-with(@name, '$'))]/value; an HTML subfilter deals with placeholders and embedded HTML |
The filter extracts the content of the values. Embedded HTML and generic placeholders are protected. |
|
sdlxliff |
SDL XML-based Localization Interchange File Format (.sdlxliff) |
Medium |
uses the SDLXLIFF writer; adds the target-language attribute if not present; preserves whitespace by default; skips seg-sources with no marked segments; segments only if the input text unit is segmented; includes ITS markup; balances codes; uses a custom XML stream parser; sets Finished segments as translate="no"; default translation_status value: finished; default translation_type value: manual translation; value of the tm_score to match: 100.00; protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
srt |
SubRip Subtitle (.srt) |
Low |
a regex filter processes the .srt file, while the HTML subfilter deals with embedded HTML and protects generic placeholders; time codes are not added as notes because of a limitation found when combining the regex filter with the HTML subfilter |
The filter extracts all content from the file. Generic placeholders and line-breaks are protected. |
|
strings |
Text Strings File (.strings) |
Low |
does not include notes (a limitation found when combining the regex filter with the HTML subfilter); extracts the content of the source group; preserves whitespace; regular expression options: dot also matches line feed + multiline; uses localization directives when they are present; extracts items outside the scope of localization directives; escaped characters use a backslash; MIME type for the document: text/plain; protects generic placeholders and embedded HTML |
The filter extracts the content of all values. Embedded HTML, generic placeholders and line-breaks are protected. |
|
stringsdict |
Apple Stringsdict (.stringsdict) |
Low |
extracts /plist/dict/dict/string and /plist/dict/dict/dict/string for translation; does not extract strings with the keys NSStringFormatSpecTypeKey and NSStringFormatValueTypeKey; protects generic placeholders |
The filter extracts the content of <string> elements, except those whose key is NSStringFormatSpecTypeKey or NSStringFormatValueTypeKey. Generic placeholders are protected. |
|
tmx |
Translation Memory eXchange files (.tmx) |
Medium |
groups the skeleton of all document parts into one; skips invalid TUs; creates the segment if segtype is 'sentence' or undefined; uses a comma (,) to delimit property values when there are duplicate properties |
The filter extracts all content from the file. |
|
txt |
Plain Text (.txt) |
Low |
extracts text by lines; converts \t, \n, \\ and \uXXXX into characters; separates lines with line feeds (\n); protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
vsdx |
Microsoft Visio (.vsdx) |
Medium |
uses the default okf_openxml filter; includes HTML subfilter; offers no Visio-specific options |
|
|
xlf / xliff |
XML Localization Interchange File Format (.xlf) |
Medium |
adds the target language attribute if not present; segments only if the input text is segmented; includes ITS markup; balances codes; uses a custom XML stream parser; sets Finished segments as translate="no"; protects generic placeholders |
The filter extracts all content from the file. Generic placeholders are protected. |
|
xlsm |
Microsoft Excel Macro-Enabled (.xlsm) |
High |
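does not extract document properties and comments; does not extract hidden rows or columns; does not extract sheet names; does not extract diagram data; does not extract drawings; the HTML subfilter deals with embedded HTML and protects generic placeholders |
The filter extracts all content from the file but document properties, comments, hidden rows/columns, sheet names, diagram data and drawings. Embedded HTML and generic placeholders are protected. |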
|
xlsx |
Microsoft Excel (.xlsx) |
High |
does not extract hidden rows or columns; does not extract sheet names; does not extract diagram data; does not extract drawings; protects embedded HTML and generic placeholders |
The filter extracts all content from the file but document properties, comments, hidden rows/columns, diagram data and drawings. Embedded HTML and generic placeholders are protected. |
|
xltx |
Microsoft Excel Template (.xltx) |
High |
does not extract hidden rows or columns; does not extract sheet names; does not extract diagram data; does not extract drawings; includes HTML subfilter |
The filter extracts all content from the file but document properties, comments, hidden rows/columns, diagram data and drawings. Embedded HTML and generic placeholders are protected. |
|
xml |
Extensible Markup Language (.xml) |
High |
accepts only valid, well-formed XML; protects HTML only in CDATA; does not protect placeholders; preserves whitespace |
The filter accepts only well-formed documents that adhere to XML syntax rules. HTML is protected only in CDATA sections. Generic placeholders are not protected. |
|
|
yaml / yml |
YAML Ain't Markup Language (.yaml) |
Low |
extracts isolated strings; extracts all key/value pairs; uses the key as the name; uses the full key path; does not use codeFinder; the HTML subfilter deals with placeholders and embedded HTML |
The filter extracts the content of all values in the file. Embedded HTML and generic placeholders are protected. |
|
vtt |
Web Video Text Tracks (.vtt) |
Low |
extracts the content of the source group using regex; preserves whitespace; regex options: dot also matches line feed + multiline; uses localization directives when they are present; extracts items outside the scope of localization directives; beginning/end of string: ""; escaped characters use a backslash prefix; MIME type: text/plain; the HTML subfilter deals with placeholders and embedded HTML |
The filter extracts all content from the file. Embedded HTML, generic placeholders and line breaks are protected. |
|
tsv |
Tab Separated Values (.tsv) |
Medium |
field delimiter: tab (\t); extraction mode: extracts table data; table properties: values start at line 1 (no row of column names); extracts data from all columns; the number of columns is defined by the values (may vary between rows); allows trimming of leading/trailing spaces and tabs; converts \t, \n, \\ and \uXXXX into characters; separates lines with line feeds (\n); protects generic placeholders |
The filter extracts all table data from all columns. Generic placeholders are protected. |
|
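For the csv row, the default settings (double-quote qualifier, a qualifier inside a field escaped by doubling it, qualifiers excluded from the extracted text, surrounding spaces and tabs trimmed, qualifiers added back on output only when needed) behave roughly like Python's standard `csv` module. The following is a minimal sketch under those assumptions, not the filter's own code; `extract_csv_cells` and `write_csv_row` are hypothetical names (Python 3.9+).

```python
import csv
import io


def extract_csv_cells(text: str) -> list[list[str]]:
    """Return the cell values a translator would see, row by row."""
    # doublequote=True: a qualifier inside a quoted field is written as ""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"', doublequote=True)
    # The reader already removes the qualifiers; trim leading/trailing spaces and tabs.
    # Rows may have different numbers of columns.
    return [[cell.strip(" \t") for cell in row] for row in reader]


def write_csv_row(cells: list[str]) -> str:
    """Write one row back, adding qualifiers only where a cell needs them."""
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerow(cells)
    return out.getvalue()


sample = 'Hello,"She said ""hi"", then left"\n  padded  ,"a, b"\nsingle\n'
print(extract_csv_cells(sample))
# [['Hello', 'She said "hi", then left'], ['padded', 'a, b'], ['single']]
print(write_csv_row(["a, b", 'She said "hi"', "plain"]), end="")
# "a, b","She said ""hi""",plain
```

The third row shows why the number of columns is taken from the values: nothing forces every row to match the first.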
|
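The csv, txt and tsv rows say the filter converts \t, \n, \\ and \uXXXX into characters. A minimal sketch of that conversion follows; it assumes any other backslash sequence is left untouched (the table does not say), and `unescape` is a hypothetical name. It uses a single regular-expression pass so that an escaped backslash is never re-read as the start of another escape.

```python
import re

# One alternative per escape named in the table: \t, \n, \\ and \uXXXX
_ESCAPE = re.compile(r"\\(?:(t)|(n)|(\\)|u([0-9A-Fa-f]{4}))")


def unescape(text: str) -> str:
    """Replace the supported backslash escapes with the characters they stand for."""

    def _replace(match: re.Match) -> str:
        tab, newline, backslash, hex_code = match.groups()
        if tab:
            return "\t"
        if newline:
            return "\n"
        if backslash:
            return "\\"
        return chr(int(hex_code, 16))

    # Left-to-right in one pass: in C:\\new the escaped backslash is consumed
    # first, so the following "n" is not turned into a line feed.
    return _ESCAPE.sub(_replace, text)


print(repr(unescape(r"Caf\u00e9\tMenu\nC:\\new")))
# 'Café\tMenu\nC:\\new'
```

Replacing each escape with a separate `str.replace` call would be shorter, but it gets C:\\new wrong depending on the order of the calls.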
|
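The json row says the filter extracts every key/string pair, also extracts strings that have no key of their own, and uses the key as the resname; the yaml row uses the full key path instead. The sketch below walks a parsed JSON document and pairs each string with its full key path. It is an illustration under assumptions: the `/` separator, the use of array indexes for keyless strings and the name `string_units` are hypothetical, not the filter's actual naming scheme.

```python
import json
from typing import Any, Iterator


def string_units(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (key path, text) for every string value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from string_units(value, f"{path}/{key}")
    elif isinstance(node, list):
        # Array items have no key of their own, so their index stands in for one
        for index, value in enumerate(node):
            yield from string_units(value, f"{path}/{index}")
    elif isinstance(node, str):
        yield path, node
    # Numbers, booleans and null carry no translatable text and are skipped


doc = json.loads('{"menu": {"title": "File", "items": ["Open", "Save"], "count": 2}}')
print(list(string_units(doc)))
# [('/menu/title', 'File'), ('/menu/items/0', 'Open'), ('/menu/items/1', 'Save')]
```

Keeping only the last path component would correspond to the json row's "key as resname"; keeping the whole path corresponds to the yaml row's "full key path".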
|
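The resx row gives the default extraction rule as XPath expressions. Python's `xml.etree.ElementTree` supports only a small XPath subset without `not()` or `starts-with()`, so this minimal sketch evaluates the same predicate by hand. It covers only the "extracts by default" rule, not the notes rule or the HTML subfilter, and `resx_units` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def resx_units(resx_text: str) -> list[tuple[str, str]]:
    """Return (name, text) for the <value> elements the default rule extracts."""
    root = ET.fromstring(resx_text)
    units = []
    for data in root.iter("data"):  # //data
        name = data.get("name", "")
        value = data.find("value")
        if value is None:
            continue
        # not(@type) and not(starts-with(@name, '>'))
        plain_text = data.get("type") is None and not name.startswith(">")
        # or @name='$this.Text'
        if plain_text or name == "$this.Text":
            units.append((name, value.text or ""))
    return units


sample = """<root>
  <data name="Greeting"><value>Hello</value></data>
  <data name="&gt;&gt;button1.Name"><value>button1</value></data>
  <data name="Icon" type="System.Drawing.Bitmap, System.Drawing"><value>AAAA</value></data>
  <data name="$this.Text"><value>Main window</value></data>
</root>"""
print(resx_units(sample))
# [('Greeting', 'Hello'), ('$this.Text', 'Main window')]
```

The >>-prefixed entry (designer metadata) and the typed entry (binary data) are skipped, which is the point of the two predicates.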
|
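The stringsdict row extracts /plist/dict/dict/string and /plist/dict/dict/dict/string but skips strings whose key is NSStringFormatSpecTypeKey or NSStringFormatValueTypeKey. In a plist, a key is the preceding sibling <key> element, so the rule has to look at key/value pairs. A minimal sketch under that reading, with `stringsdict_units` as a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Keys whose <string> values the stringsdict row says are not extracted
SKIPPED_KEYS = {"NSStringFormatSpecTypeKey", "NSStringFormatValueTypeKey"}


def stringsdict_units(plist_text: str) -> list[tuple[str, str]]:
    """Return (key, text) for the <string> values the stringsdict row extracts."""
    root = ET.fromstring(plist_text)  # the <plist> element
    units = []
    # Parents of /plist/dict/dict/string and /plist/dict/dict/dict/string
    for parent in root.findall("dict/dict") + root.findall("dict/dict/dict"):
        children = list(parent)
        # A plist <dict> alternates <key> elements with their values
        for key_el, value_el in zip(children[::2], children[1::2]):
            if value_el.tag == "string" and key_el.text not in SKIPPED_KEYS:
                units.append((key_el.text, value_el.text or ""))
    return units


sample = """<plist version="1.0"><dict>
  <key>files_count</key>
  <dict>
    <key>NSStringLocalizedFormatKey</key><string>%#@files@</string>
    <key>files</key>
    <dict>
      <key>NSStringFormatSpecTypeKey</key><string>NSStringPluralRuleType</string>
      <key>NSStringFormatValueTypeKey</key><string>d</string>
      <key>one</key><string>%d file</string>
      <key>other</key><string>%d files</string>
    </dict>
  </dict>
</dict></plist>"""
print(stringsdict_units(sample))
# [('NSStringLocalizedFormatKey', '%#@files@'), ('one', '%d file'), ('other', '%d files')]
```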
|
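The strings row describes a regex-based filter with the "dot also matches line feed" and multiline options, where escaped characters use a backslash. The sketch below is one possible pattern for `"key" = "value";` entries under those options; the pattern and `strings_units` are assumptions, not the expression the filter ships with. Comments are not captured, consistent with the row's note that notes are not included.

```python
import re

# re.DOTALL: "." also matches line feeds (so \\. can consume an escaped newline)
# re.MULTILINE: "^" matches at the start of every line, not only the first
ENTRY = re.compile(
    r'^\s*"(?P<key>(?:[^"\\]|\\.)*)"\s*=\s*"(?P<value>(?:[^"\\]|\\.)*)"\s*;',
    re.DOTALL | re.MULTILINE,
)


def strings_units(text: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs; backslash escapes are kept as written."""
    return [(m.group("key"), m.group("value")) for m in ENTRY.finditer(text)]


sample = '/* Greeting shown on launch */\n"greeting" = "Hello, \\"friend\\"!";\n"multi" = "Line one\nLine two";\n'
for key, value in strings_units(sample):
    print(key, "->", repr(value))
# greeting -> 'Hello, \\"friend\\"!'
# multi -> 'Line one\nLine two'
```

The `(?:[^"\\]|\\.)*` alternation is what lets an escaped quote stay inside the value instead of ending it.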
|
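The OpenDocument rows (odp, ods, odt, ots) say each embedded file becomes its own sub-document, and therefore its own XLIFF <file> element. An ODF file is a ZIP package, so a minimal sketch of that split needs only Python's `zipfile` module. It assumes the three parts named in those rows (content.xml, styles.xml, meta.xml) are the ones of interest and does not parse or extract any text; `odf_sub_documents` is a hypothetical name.

```python
import io
import zipfile

# The package parts the OpenDocument rows name as separate sub-documents
SUB_DOCUMENTS = ("content.xml", "styles.xml", "meta.xml")


def odf_sub_documents(package: bytes) -> dict[str, bytes]:
    """Return each sub-document present in an ODF package, keyed by part name."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        present = set(archive.namelist())
        return {name: archive.read(name) for name in SUB_DOCUMENTS if name in present}


# Build a tiny stand-in package in memory (real files contain more parts)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    archive.writestr("content.xml", "<document-content/>")
    archive.writestr("styles.xml", "<document-styles/>")
    archive.writestr("meta.xml", "<document-meta/>")

parts = odf_sub_documents(buffer.getvalue())
print(list(parts))  # one XLIFF <file> element per entry
# ['content.xml', 'styles.xml', 'meta.xml']
```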
|
|
|
|
|
|
|
|
|
|