South African journal of information management, 2001-12, Vol.3 (2)
2001

Details

Author(s) / Contributors
Title
Automatic extraction and analysis of financial data from the EDGAR database
Is part of
  • South African journal of information management, 2001-12, Vol.3 (2)
Place / Publisher
AOSIS
Year of publication
2001
Link to full text
Source
EZB Free E-Journals
Descriptions/Notes
  • In this article the authors discuss a new methodology for extracting financial data from the Electronic Data Gathering, Analysis and Retrieval (EDGAR) database of the Securities and Exchange Commission (SEC), which contains financial information on about 68,000 companies. In documents of this database, for example 10-K or 10-Q filings, the beginning of a balance sheet or income statement for a single company and a single year is sometimes introduced by SGML tags, while the financial data itself, such as balance sheet items, is in pure ASCII format. We introduce text mining procedures to detect relevant financial data in these documents. This is accomplished by dextrapi (data extraction API), a wrapper for extracting information from any text-based source. The extracted information is then transformed into machine-understandable XML syntax, enabling and supporting quick trading decisions by stock market investors. The advantage of dextrapi over existing wrappers, for example the World-Wide Web Wrapper Factory (W4F) or the Java Extraction and Dissemination of Information (JEDI) wrapper, lies in its ability to adapt the extraction process to semistructured input, whereas most other wrappers rely on fixed data formats for extraction (e.g. extracting only from HTML documents). Furthermore, we introduce Edgar2xml, a software agent based on the dextrapi wrapper that automates the process of extracting and evaluating balance sheet data and related information from the EDGAR database. Evaluation is done with XML output that conforms to an XML schema, that is, a set of rules describing the underlying structure of the XML document.
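The general idea described in the abstract (find a statement introduced by SGML tags, read the plain-ASCII line items, emit XML) can be illustrated with a minimal Python sketch. This is not dextrapi or Edgar2xml, whose interfaces are not given in this record. The sample text, the choice of <TABLE> and <CAPTION> as the introducing tags, the line-item pattern, and the output element names (statement, item) are all assumptions made for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical filing excerpt (assumption): real filings vary widely in layout.
SAMPLE_FILING = """\
Some narrative text before the statement.
<TABLE>
<CAPTION>
CONSOLIDATED BALANCE SHEET
</CAPTION>
Cash and cash equivalents ........   1,234
Accounts receivable, net              567
Total current assets                1,801
Accrued losses                       (42)
</TABLE>
Narrative text after the statement.
"""

# Assumed line-item shape: a label, optional dot leaders or spaces, then a
# number with optional thousands separators and accounting-style parentheses.
ITEM_RE = re.compile(
    r"^(?P<label>[A-Za-z][^\d(]*?)[\s.]*(?P<value>\(?\d[\d,]*\)?)\s*$"
)


def parse_amount(raw):
    """Convert '1,234' to 1234 and '(42)' to -42."""
    negative = raw.startswith("(") and raw.endswith(")")
    number = int(raw.strip("()").replace(",", ""))
    return -number if negative else number


def extract_statement(text):
    """Return an XML element for the first tagged table, or None if absent."""
    table = re.search(r"<TABLE>(.*?)</TABLE>", text, re.DOTALL | re.IGNORECASE)
    if table is None:
        return None
    block = table.group(1)

    root = ET.Element("statement")  # hypothetical element name
    caption = re.search(
        r"<CAPTION>\s*(.*?)\s*</CAPTION>", block, re.DOTALL | re.IGNORECASE
    )
    if caption:
        root.set("title", caption.group(1).strip())

    # Lines that do not look like "label ... number" are skipped.
    for line in block.splitlines():
        match = ITEM_RE.match(line.strip())
        if match:
            item = ET.SubElement(root, "item", name=match.group("label").strip())
            item.text = str(parse_amount(match.group("value")))
    return root


if __name__ == "__main__":
    statement = extract_statement(SAMPLE_FILING)
    print(ET.tostring(statement, encoding="unicode"))
```

A fixed pattern like this is exactly the kind of rigid format assumption the abstract contrasts with dextrapi's adaptive extraction; the sketch only shows the input and output shapes involved.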
Language
English
Identifiers
ISSN: 2078-1865
eISSN: 1560-683X
DOI: 10.4102/sajim.v3i2.127
Title ID: cdi_doaj_primary_oai_doaj_org_article_1848f47e322b43989201047450498675
Format
