Thextedit user manual - version 0.1.0

A simple text editor focused on character sets encoding.
It can be useful to analyze the binary data of text allowing to experiment with codecs (character set encoding), BOM (Byte Order Mark), EOL (End Of Line mark), and to convert files from an encoding to another.
It features a basic text editor, and an hex viewer that allow to identify which character maps to which byte(s) (that may be not stright forward for multy-byte encodings). Powered by the "Qt" widgets toolkit, it supports most existing encodings (from ASCII to Unicode).

Release info

Version: 0.1.0, 10/10/2010

Linux version build with QtCreator 2.0.0 based on Qt 4.7.0 (32 bit)

Windows version build with QtCreator 2.0.0 based on Qt 4.6.3 (32 bit)

This is the first beta version. It may be considered as a prototype to explore the various facilities the editor could feature, but is still incomplete and has several performance issues. In particular it does not have large file support (it becomes very slow with files larger than 1MB).
Planned developing:

large file support (maybe with a virtual paging mechanism)
advanced find/replace
syntax highlighting
settings configuration
extended binary editing

Install

The latest binary and source packages can be downloaded from: http://sourceforge.net/projects/thext/
Thextedit ships as a single executable and has no setup: just extract the archive and double click the executable.
The linux version is linked against Qt4 that should be present in your system (otherwise is promptly available in most distribution).
The windows version is statically linked, so no external library is requested.

User Interface reference

Thextedit is a TDI (Tabbed Document Interface) application. Each document consists of a text/hex editor, described in the next section.
Here's a list of the UI commands menus/toolbars/keyboard-shortcats:

File menu and toolbar
- New: Create a new text document.
- Open: Open an existing file for edinting.
- Save: Save current file.
- Save as (only in menu): Save current file with a different name.
Edit menu
- Find: Show the search toolbar.
Help menu
- Supported encodings: Displays a list of all current supported encodings, with name, MIBenum, BOM and aliases: it can be used to check if a known codec is supported by onother name.
- About: Shows the version informations.
- User manual: (Should) open this document.
Encoding toolbar
- Encoding combo: The leftmost combo in the toolbar contains the list of the avaiable encoding: The displayed value is the text encoding used by the current document. How changing the encoding affects the documents depends on the status of the left button as follows:
  - Change display encoding: change the characters set encoding of the text editor: the document is left unaltered but the original bytes are decoded differently.
  - Convert text: convert the displayed text from the current encoding to the selected encoding: it does modify the document as the displayed text is encoded ignoring the original bytes.
- BOM combo: Right to the encoding combo is the BOM encoding: The displayed value is the BOM (Byte Order Mark) of the current document, if present. The BOM is placed in the first 2 to 4 bytes of the text document, and is used to determine the byte order for wide-char and multi-byte encoded charsets. Changing the value alter the document, prepending/removing the selected bom to the head of the buffer. The combo is accesible only if Convert text is selected.
- EOL combo: The last combo in the toolbar is the EOL (End Of Line) mark combo: There are 3 possible values:
  - CRLF used by window (hex value: 0D 0A)
  - LF used by unix (hex value: 0D)
  - CR used by mac (hex value: 0A)
  The displayed value is the EOL currently used in the document. For open documents it is determined by the first occurence that is found. Changing the value affects the document as all occurence of the other EOL will be replaced with the newly selected one. The current value will be used as a replace value also if the document is saved without any further changes.
View toolbar
- Find : Display the search toolbar
- HEX view : Switch between plain-text/binary view
- Base combo: Determine which numeric base is used to format the displayed bytes (only in binary view).
Search toolbar:

The search bar displays at the bottom of the window and allow to perform basic find/replace in the document.
The possible input in the find/replace text input fields matches the currently selected view-mode.
The selected codec is used to encode/decode the input value when changing view-mode.
- Whole word : Find only matches the entire words (only in text view-mode)
- Case sensitive : match exact text casing (only in text view-mode)
- Find: The text/bytes to be searched
- Find prev : find previous match (only in text view-mode)
- Find next : find next match
- Replaces: The text/bytes the find input will be replaced with.
- Replace prev : replace next match (only in text view-mode)
- Replace next : replace next match
Shorcuts:
- ctrl+'f' with focus in the editor will show up the search bar.
- return key whit focus in the search input field execute find next.
- shift+return whit focus in the search input field execute find next.
- return key whit focus in the replace input field execute replace previous (only in text view-mode).
- shift+return whit focus in the replace input field execute replace next (only in text view-mode).

The editor

Plain text view

In plain text view a very basic text editor is available, nevertheless it can be useful to experiment with various text encoding, and to convert a text document.
Please note the encoding and BOM selections are intenctionally left indepent, as to allow to reproduce commonly displayed text decoding error. Only if the encoding matches the BOM, this is hidden from the text, as all Unicode enabled programs will do. If converting a BOM marked document from the correspondenting encoding, also the BOM is converted, while it will produce extravagant (thought realistic) results, if encoding mismatched BOM.

When editing the status bar keeps updated information about the document.
Left to right the following information are displayed:

document icon as associated by the host system;
file type (suffix or extension) and size (for existing files);
number of characters, bytes, and lines;
line and column number of the cursor position in the text (only in text mode);
number of selected characters (text mode);
character ordinal position in the encoded text, and the corresponding byte offset in the raw bytes array.

Hex view

In HEX view the editor is splitted in the usual 3 areas:

offset area: display the starting offset of the displayed line;
binary area: display the portion of the buffer starting at offset;
text area: display the corresponding characters to the left bytes.

Note that while as usual all non printable characters are rendered with a '.' in the text area, the "extra" bytes of a multybyte characters are rendered with a space and not a dot, to better recognise the char/bytes association.
Moving the cursor in both the text/binary area the highlighted selection are keept syncroniced, and a different highlight color is used when a multibyte character is encountered.

License

Copyright (C) 2010 Attilio Pavone <tilly@utillyty.eu>

Thext is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Thext is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with thext. If not, see http://www.gnu.org/licenses/.

And here's your copy of the licence: GNU General Public License version 3

Develope

Source packages: sourceforge.net/projects/thext
SVN repository: svn co https://thext.svn.sourceforge.net/svnroot/thext thext

Revisions history

Version: 0.1.0, 10/10/2010
Build late at night in Macerata...
This is the first public release