Text Analysis Info - Information retrieval software

 

Last update: 30 June 2005

Programs listed here fall into two narrower groups:

  • pure information retrievers: searching and displaying texts, indexers
  • concordancers: programs providing concordances

 

AntConc 3.0.1

 

program: AntConc 3.0.1
author: Laurence Anthony
distributor: Linguist's Software
documentation: readme file for usage
download: free version
operating system: MS-Windows, Mac-OS
description:

 

AnyText

 

program: AnyText
author: Linguist's Software
distributor: Linguist's Software
documentation: no
download: no
operating system: Mac-OS System 7.1-9.2, or the Classic system in OS X. (You must be able to boot into Classic to install.) 2 MB of RAM.
description: AnyText is a HyperCard®-based Full Proximity Boolean Search Engine and Index Generator that allows you to create concordances and do FAST word searches on ordinary text files in English, Greek and Russian languages. AnyText was designed especially to work with the Greek, English, Cyrillic and Latin Bible texts, but can be used with any text-only file. The text files can be on diskettes, hard disk drives or CD-ROM drives, as long as there is disk space for the special indexing files that AnyText must create and access for operation. Requires 2 Megabytes of RAM.
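
A proximity search over a prebuilt index, as described for AnyText, can be illustrated with a small positional index. This is only a minimal sketch of the general technique, not AnyText's actual implementation; the function names `build_index` and `near` are hypothetical.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each lowercased word to the list of its word positions."""
    index = defaultdict(list)
    for pos, word in enumerate(re.findall(r"\w+", text.lower())):
        index[word].append(pos)
    return index

def near(index, a, b, distance):
    """Return (pos_a, pos_b) pairs where a and b lie within `distance` words."""
    return [(i, j)
            for i in index.get(a.lower(), [])
            for j in index.get(b.lower(), [])
            if i != j and abs(i - j) <= distance]

text = "In the beginning was the Word, and the Word was with God"
index = build_index(text)
hits = near(index, "word", "god", 4)  # only the second "Word" is close enough
print(hits)
```

The index is built once and reused for every query, which is why such programs need extra disk space for their index files.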

 

Ask Sam 6.0

 

program: Ask Sam 6.0
author: Ask Sam Software Development
distributor: Ask Sam Software Development
documentation: overview and quick tour
download: trial version
operating system: MS-Windows
description: AskSam is a fast information retrieval program that can also search e-mails and PDF files. The new professional version allows programming (e.g. with Visual Basic).

 

ATA - Ashton Text Analyser (WinATA Mark 2)

 

program: ATA - Ashton Text Analyser
author and distributor: Peter Roe
documentation: user's guide
download: no, but it is free for non-commercial applications
operating system(s): Win9x, WinNT
description: ATA generates word lists and KWIC (keyword in context) and KWOC (keyword out of context) concordances.
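
For readers unfamiliar with the term, a KWIC concordance lists every occurrence of a keyword with a few words of context on each side. The following is a minimal sketch of the idea, not ATA's own method; the function name `kwic` and the `width` parameter are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) for each hit, with `width` context words per side."""
    words = re.findall(r"\w+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

sample = "The cat sat on the mat. The dog chased the cat."
for left, key, right in kwic(sample, "cat", width=2):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>15}  {key}  {right}")
```

A KWOC listing uses the same hits but prints the keyword as a heading outside the context line.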

 

Concordance 3.2

 

program: Concordance 3.2
author and distributor: Rob J C Watt
documentation: manual
download: trial
operating system(s): Win9x, WinNT, WinXP
description:
  • phrases, proximity search, samples, regular expression search, references
  • book-like indexing, treat upper and lower case separately, show duplicate words separately, analyse characters instead of words
  • sort headwords by order of occurrence, sort word endings using a string sort, sort contexts by string before and string after headword
  • language support including East Asian languages (e.g. Chinese) on Windows 2000/XP
  • user-definable HTML entity translation

 

DBT 3.1 - Data base testuale

 

The website provides information in Italian only; English pages are under construction.

program: DBT 3.1 - overview, under construction
author: Eugenio Picchi
distributor: none
documentation: none
download: demo with different texts
operating system: Win9x, WinNT
description: DBT can do word searches, concordances, and searches for word sets with boolean logic (including wildcards and fuzzy search), either in the main text or in accessory components (notes, apparatus, appendices). Also possible: word lists in different sort orders, index locorum, list of all verses, rhyming dictionary, lists of the most frequent character sequences and word sequences, and handling of images, which can be associated with any part of the text.

 

Eric Johnson's programs

 

program: Eric Johnson's programs
author and distributor: Eric Johnson
documentation: none
download: none
operating system(s): DOS
description: Eric Johnson's programs are written especially for the analysis of plays and poetry. Some of the programs require SGML-tagged texts, and some are limited to certain text corpora (e.g. Jane Austen or Shakespeare).
ACTORS: list of characters on the stage simultaneously (generated each time there is an entrance or exit), co-occurrences of characters on stage, list of possible doubling of roles.
BITZER generates an index of page numbers (or line numbers) for all words.
CONCORD produces a key-word-in-context concordance.
number of words within quotation marks
FINDLIST: comparison of word lists (more than two)
IDENT compares the number and percentage of occurrence of selected words in two text files.
PICKWICK: filter program for scenes or places of a play in tagged texts.
SENT: statistics and graphics on sentence length (in strings).
SHAKWORD: cross-references for selected in tagged texts.
WORDS: wordlists, counts types and tokens
Four programs that process the Oxford Electronic Text Library Edition of the Complete Works of Jane Austen.

 

KURA 1.0

 

program: Kura 1.0
author: Boudewijn Rempt
distributor: Boudewijn Rempt
documentation: manual
download: open source
operating system(s): Win9x, WinNT, Unix/X11, MacOS, Linux
description:

Kura (Nepali for language) is a multi-user open-source linguistic database especially geared towards language description. Read the proposal for a description of the intended functionality and the preliminary design.
The application consists of three independent parts: an application framework that works with the data from the database, a GUI framework that lets the user work with the data, and a server that can present the data in HTML form over the Internet.
Requirements: Kura is developed in Python, with some third-party extension modules for the GUI and the database connectivity.

 

LEXA 7.0 - Corpus Processing Software

 

program: LEXA 7.0
author: Raymond Hickey, University of Essen, Germany
distributor: University of Bergen, Norway
documentation: documentation quite like a manual
download: test
operating system(s): DOS
description: LEXA is an open system based on files. It can perform lemmatisation, word lists, lexical density tables, file comparison, global find and replace, database and corpus management functions (print, sort), statistics on characters, words, and sentences, and string searches across groups of files, also with the wildcards * and ?, and also in databases (DBF files). There are also lots of DOS utilities.
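
Searching with the wildcards * and ? can be sketched with Python's standard `fnmatch` module, where * matches any run of characters and ? matches exactly one. This is an illustration of wildcard matching in general, not of LEXA's implementation; `wildcard_search` is a hypothetical helper.

```python
import fnmatch
import re

def wildcard_search(text, pattern):
    """Return the words in `text` that match a pattern containing * and ? wildcards."""
    words = re.findall(r"\w+", text.lower())
    # fnmatchcase avoids platform-dependent case folding; both sides are lowercased here.
    return [w for w in words if fnmatch.fnmatchcase(w, pattern.lower())]

text = "She walks, he walked, they are walking; we talk and talked."
print(wildcard_search(text, "walk*"))  # every word starting with "walk"
print(wildcard_search(text, "?alk"))   # four-letter words ending in "alk"
```

Note that `fnmatch` also gives square brackets a special meaning, so a pattern containing `[` or `]` behaves differently from a plain two-wildcard syntax.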

 

Metamorph

 

program: Metamorph
distributor: Thunderstone Software
documentation: manual
download: none
operating systems: DOS, Win9x, WinNT, Unix
description: Metamorph is a real-time, concept-based search package. It searches any text without pre-processing steps. Metamorph has an English language vocabulary of 250,000 word and phrase concept associations for natural language queries; boolean logic (with weights) and wildcards can also be used. It also provides proximity control, fuzzy searches, true regular expression matching, and numerical value searches.
The Metamorph API alone is available for most operating systems.

 

MicroConcord

 

program: MicroConcord
author: Mike Scott, Tim Johns
distributor: Mike Scott
documentation: none
download: freeware
operating system(s): DOS
description: MicroConcord is the predecessor of WordSmith. It is faster than Windows but the number of concordance lines is limited to around 1,500, and you can't save a concordance except as a text file.

 

MicroOCP - Oxford Concordance Package

 

The program is no longer available. Outdated information on the web still claims otherwise, but it only leads to dead links.

 

MonoConc Pro 2.0

 

program: MonoConc 2.0
author: Michael Barlow
distributor: Athelstan
documentation: unknown
download: demo limited to 20 hits
operating system(s): Win9x
description: MonoConc is a concordancer. It creates concordances and word lists (with exclusion lists, case-sensitive or case-insensitive), converts texts, and works with tagged texts and with different languages. Searches can use wildcard characters and a variable (multi-line) context, including a whole sentence. Results can be sorted by the words to the left and right, and word collocations can be computed.

 

Phrase Context 1.02

 

program: Phrase Context
author/distributor: Hans J. Klarskov Mortensen
download: test version
documentation: none
operating systems: Windows ?
description: Phrase Context is a versatile program that counts words and phrases, makes concordances, calculates TTR (type-token ratio) and lexical density values, accepts regular expressions as search patterns, and writes XML-formatted output files. The program is still in beta status.
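
The two measures named here can be sketched as follows. The type-token ratio is the number of distinct words divided by the total number of words. Lexical density is taken here as the share of content words among all words, which is one common definition; tools differ on this, and the source does not say which definition Phrase Context uses. The short `FUNCTION_WORDS` list is an assumption made purely for illustration.

```python
import re

# Tiny illustrative list of English function words (an assumption; real tools use fuller lists).
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "it", "was"}

def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens):
    """Distinct words (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def lexical_density(tokens):
    """Share of tokens that are not in the function word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

tokens = tokenize("The cat sat on the mat and the dog sat on the cat")
print(f"TTR: {type_token_ratio(tokens):.3f}")
print(f"Lexical density: {lexical_density(tokens):.3f}")
```

Both values depend on text length and tokenisation, so they are only comparable between texts processed the same way.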

 

Sonar 6.0 (Windows) / 12 (MacOS) Text Retrieval/Document Management Systems

 

program: Sonar 6.0
distributor: Virginia Systems
download: demo
documentation: none
operating systems: Win9x, WinNT, MacOS
description: High-speed program that can process many types of text and word processing files.

 

TACT 2.1.5 - Text Analysis Computing Tools

 

There is no information available on the web, but you can still download the program.

program: TACT 2.1.5
authors: Michael Stairs, John Bradley, Ian Lancashire, Lidio Presutti
download: free for research and teaching
operating system(s): DOS
description: TACT is a system of 15 programs designed for text retrieval and analysis of literary works. Typically, researchers use TACT to retrieve occurrences of a word, word pattern, or word combination. Output takes the form of a concordance, a list, or a table. The programs can also do simple kinds of analysis, such as sorted frequencies of letters, words or phrases, type-token statistics, or ranking of collocates to a word by their strength of association.
TACT is intended for individual literary texts, or small to mid-size groups of such texts. Languages using a roman alphabet and classical Greek are supported. There is also a mailing list for TACT-users.
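
Ranking collocates by strength of association can be sketched as below. The source does not say which association measure TACT uses, so this example substitutes a simplified pointwise mutual information (PMI) score as one common choice; the function name `collocates` and the `span` parameter are hypothetical.

```python
import math
import re
from collections import Counter

def collocates(text, node, span=2):
    """Rank words found within `span` words of `node` by a simplified PMI score."""
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    freq = Counter(words)
    near = Counter()
    for i, word in enumerate(words):
        if word == node:
            # Collect the context window on both sides of this occurrence.
            near.update(words[max(0, i - span):i] + words[i + 1:i + 1 + span])
    scores = {}
    for word, joint in near.items():
        # PMI: log2 of observed co-occurrence relative to chance expectation.
        scores[word] = math.log2(joint * total / (freq[node] * freq[word]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranked = collocates("strong tea and strong coffee but weak tea", "tea", span=1)
for word, score in ranked:
    print(f"{word:10} {score:.2f}")
```

PMI favours rare words, so on real corpora it is usually combined with a minimum frequency threshold.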

 

Textalyser

 

program: Textalyser
author: Bernhard Huber
distributor: none
documentation: self-explanatory
download: none
operating system: runs on a web site
description: Textalyser is a free text analysis tool that counts words, sentences, and syllables and calculates lexical density. It also computes the Gunning readability index. A small but nice tool that counts syllables correctly, at least for English, French, and German. You can cut and paste text or specify a web page.
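
The Gunning index is commonly computed as 0.4 × (average sentence length + percentage of words with three or more syllables). The sketch below follows that formula under simplifying assumptions: syllables are estimated by counting vowel groups, which is a rough English-only heuristic and not the syllable counter the tool itself uses, and the function names are hypothetical.

```python
import re

def count_syllables(word):
    """Rough English heuristic: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def gunning_fog(text):
    """Gunning index: 0.4 * (words per sentence + percent of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

text = "The cat sat. Information retrieval is complicated."
print(f"Gunning index: {gunning_fog(text):.2f}")
```

The vowel-group heuristic miscounts words with a silent final e, so scores from this sketch will differ from those of a tool with a proper syllable counter.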

 

Textstat 2.1

 

program: Textstat 2.1
author: Matthias Hüning
distributor: Matthias Hüning
documentation: manual
download: freeware
operating system: Windows
description: TextSTAT is a simple programme for the analysis of texts. It reads ASCII/ANSI texts and HTML files (directly from the internet) and produces word frequency lists and concordances from these files. The programme runs on MS Windows and is distributed as freeware. Source code in Python is also available for free. The user interface is available in German (default), English, and French.
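
A word frequency list of the kind described can be produced in a few lines of Python with `collections.Counter`. This is a generic sketch, not taken from the TextSTAT source code; `frequency_list` is a hypothetical name.

```python
import re
from collections import Counter

def frequency_list(text, top=None):
    """Return (word, count) pairs, most frequent first; `top` limits the list length."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top)

for word, count in frequency_list("To be or not to be, that is the question", top=3):
    print(f"{count:3}  {word}")
```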

 

WordSmith 4.0

 

program: WordSmith 4.0
author: Mike Scott
distributor: Mike Scott, Liverpool University
documentation: manual
download: beta version is still free
operating system: Win9x, WinNT
description: WordSmith is the successor of MicroConcord.

Text Analysis, Text Mining, and Information Retrieval Software

commercial:

  • Arrowsmith software for supporting discovery from complementary literatures.
  • ClearForest, tools for analysis and visualization of your document collection.
  • Compare Suite, compares texts by keywords, highlights common and unique keywords.
  • Connexor Machinese, discovers the grammatical and semantic information of natural language.
  • Copernic Summarizer, can read and summarize document and Web page text contents in many languages from various applications
  • Corpora, a Natural Language processing company that creates simple tools to help end users deal more effectively with unstructured text.
  • DocMINER offers visual text mining and retrieval capabilities for the explorative analysis of text collections. Functionality comprises search, term statistics, summary, fiel description, annotations, and more.
  • DolphinSearch, text-reading robot powered by a computer model of the extraordinary pattern recognition capabilities of a dolphin's brain.
  • dtSearch, for indexing, searching, and retrieving free-form text files.
  • DS Dataset, a Knowledge Management System.
  • Enkata, providing a range of enterprise-level solutions for text analysis.
  • Entrieva, patented technology indexes, categorizes and organizes unstructured text from virtually any source.
  • Files Search Assistant, quick and efficient search within text documents.
  • FreeText Software Technologies, commercial products and custom applications for "free text" analysis.
  • Intellexer, natural language searching technologies for developing knowledge management tools, document comparison software and document summarization software, custom built search engines and other intelligent software.
  • Insightful InFact, an enterprise search and analysis solution for mining text, images, and numerical data.
  • Inxight, enterprise software solutions for analysis of text and unstructured information.
  • ISYS:desktop, searches over 100 file formats across multiple sources; on-the-fly HTML conversion.
  • Klarity (part of Intology tools), learns, recognises, analyses and labels any textual information for subsequent recall.
  • Kwalitan 5 for Windows, uses codes for text fragments to facilitate textual search, display overviews, build hierarchical trees and more.
  • Leximancer, makes automatic concept maps of text data collections
  • Lextek Onix Toolkit, for adding high performance full-text indexing search and retrieval to applications.
  • Lextek Profiling Engine, for automatically classifying, routing, and filtering electronic text according to user defined profiles.
  • media style text classification (tc) java-based library for high speed text classification.
  • media style information extraction (ie) java-based library, allows user to identify entities in a text like person, organization or locations.
  • Megaputer Text Analyst, offers semantic analysis of free-form texts, summarization, clustering, navigation, and natural language retrieval with search dynamic refocusing.
  • Monarch, data access and analysis tool that lets you transform any report into a live database.
  • Recommind MindServer, uses PLSA (Probabilistic Latent Semantic Analysis) for accurate retrieval and categorization of texts.
  • SAS Text Miner, provides a rich suite of text processing and analysis tools.
  • SPSS LexiQuest, for accessing, managing and retrieving textual information; integrated with SPSS Clementine data mining suite.
  • SPSS Text Mining for Clementine enables you to extract key concepts, sentiments, and relationships from call center notes, blogs, emails and other unstructured data, and convert it to structured format for predictive modeling.
  • Temis-Group Insight Discoverer and Skill Cartridge, text mining engines and software components
  • TeSSI®, software components that perform semantic indexing, semantic searching, coding and information extraction on biomedical literature.
  • Text Analysis Info, offering software and links for Text Analysis and more
  • Textalyser, online text analysis tool, providing detailed text statistics
  • TextPipe Pro, text conversion, extraction and manipulation workbench.
  • TextQuest, text analysis software
  • Readware Information Processor for Intranets and the Internet, classifies documents by content; provides literal and conceptual search; includes a ConceptBase with English, French or German lexicons.
  • Quenza, automatically extracts entities and cross references from free text documents and builds a database for subsequent analysis.
  • VantagePoint provides a variety of interactive graphical views and analysis tools with powerful capabilities to discover knowledge from text databases.
  • VisualText, by TextAI, is a comprehensive GUI development environment for quickly building accurate text analyzers.
  • Wordstat, analysis module for textual information such as responses to open-ended questions, interviews, etc.

free and shareware:

  • INTEXT, MS-DOS version of TextQuest, in public domain since Jan 2, 2003.
  • S-EM (Spy-EM), a text classification system that learns from positive and unlabeled examples.
  • Vivisimo/Clusty web search and text clustering engine.

Many packages above offer free or limited trial versions.