Encyclopedia of Data Warehousing and Mining

Philip Calvert (Victoria University of Wellington)

Online Information Review

ISSN: 1468-4527

Article publication date: 1 May 2006

Citation

Calvert, P. (2006), "Encyclopedia of Data Warehousing and Mining", Online Information Review, Vol. 30 No. 3, pp. 313-315. https://doi.org/10.1108/14684520610675852

Publisher: Emerald Group Publishing Limited

Copyright © 2006, Emerald Group Publishing Limited


Data mining is a term used to describe a range of activities in the extraction and transformation of data sets, and the presentation of the results in a useful form. It is a form of information retrieval in which the user seeks a manageable yet pertinent number of results from a search, though unlike most information retrieval techniques, data mining is focussed on data stored in structured form, with fixed format fields of numeric values, character codes, or short strings. More recently, data mining practitioners have begun trying to extract data from materials such as images, with the objective of turning such data into a structured format. In particular, data mining is targeted at large legacy databases, which often hold huge quantities of data that are never utilised. Often, data mining attempts to discover underlying patterns, trends and relationships in the original data sets. The techniques of data mining can be extended to examine very large enterprise or scientific databases, whether they are held in a single location or distributed globally. This is also the case with the World Wide Web, considered in this context as a massive database.

Data mining is a new and potentially very rewarding addition to information science, but it is a young discipline in which neither the significant research trends nor the leading scholars are yet apparent. There is a significant literature, however, as a recent (and very useful) bibliography shows: Current Essays and Reports in Information Retrieval and Data Mining: An Annotated Bibliography of Shorter Monographs, edited by Alexander, Wilson and Williams (Scarecrow, 2005), has 173 entries for data mining alone.

The two‐volume Encyclopedia of Data Warehousing and Mining from the Idea Group is a significant addition to the literature, which is becoming just about stable enough to warrant a basic reference source such as this. The basic figures will give some idea of its scope. There are more than 361 contributors from 34 countries, and most continents are well represented with the exception of Africa. There are definitions of 1,850 technical and managerial terms and 4,400 references. The meat of the work is the 234 chapters, each four or five pages long. The level of the chapters is appropriate to undergraduate students, who would find them useful for initial investigations preceding more substantial research, or for filling in knowledge gaps when the curriculum has not dealt with the subject in any great detail. Even academics will want to check this work, though they might do it surreptitiously, because with the curriculum changing so rapidly there never seems to be enough time to read all that we want about new topics, and this encyclopedia offers a very simple means of catching up with developments in data mining and warehousing that could have passed us by.

The content of the two volumes is arranged simply in alphabetical order by chapter title, which is not necessarily going to lead readers to the topics they are looking for. The best way of approaching the content is through the index, though it is not as complete as one might hope. This encyclopedia does not have the broad topic classification scheme provided in some of the other Idea Group reference works.

In terms of content, general topics are covered very well. Some chapters take a theoretical approach. There are very many chapters on rule‐based data mining (many more than on other statistical methods), lots on the various mathematical algorithms needed to mine data, and considerable content on web mining. There are several chapters on data warehousing and mining in specific disciplines or industries. The “soft” topics of ethics and social impact are not forgotten. To be ultra‐critical, there may not be as many chapters on data warehousing as might be expected, considering the title, and the coverage of some statistical methods that are becoming significant in the literature, such as neural networks, hierarchical and k‐means clustering, and Kohonen networks, is far from complete.

I recommend this reference work to all academic libraries with computing or information systems courses. The work will also be relevant to academics and practitioners alike.
