A Visual Approach to Exploratory Data Mining

Publisher's description

As the first step in an in-depth data mining analysis, students should become intimately acquainted with the data under study. This paper presents a methodology and a set of custom tools, designed and developed by the authors for use in data mining courses, that allow students to accomplish this task efficiently and effectively. The tools create interactive visual presentations of the data, encouraging students to explore the data in search of patterns or relationships that would then be investigated in subsequent steps using sophisticated statistical and machine learning tools.

Similar whitepapers

Oracle Essbase - Not Just for Financial Analytics

Oracle Essbase is not just for financial analytics anymore. By expanding its use beyond the Finance department, one will begin to see the true value of this flexible and functional tool. This paper shows how one can use an existing sales analysis cube as a foundation for an inventory analysis cube.

89 days ago by Business Intelligence Consulting Group

Grocery Retailer Uses Business Intelligence to Strengthen Supplier Negotiations

Grocery retailer Waitrose needed a more efficient business intelligence solution to support buyers' daily decisions. It designed a solution based on Microsoft technologies, which pulls reporting and analysis into one place. Buyers can quickly access crucial trading figures and supplier data for use in negotiations, helping the retailer to boost sales and profits.

103 days ago by Microsoft

Farmaceuticos Maypo S.A. de C.V. Improves Data Analysis and Information Services to Client Labs

Farmaceuticos Maypo S.A. de C.V. wanted to consolidate its internal information with that of affiliated laboratories to provide a total view of its data. It also wanted to accelerate the availability of reliable, up-to-date internal and external information, and to reduce manual procedures and human errors in the data extraction and reporting process. The challenge was to improve the company's ability to produce actionable reports for management as well as the laboratories and health organizations that it serves.

141 days ago by Oracle

Broadridge Recovers Lost Revenue and Helps Prevent Future Loss With Payment Recovery Solution

Since 2007, Broadridge, formerly a subsidiary of ADP, has provided a range of complete brokerage processing solutions for both retail and institutional firms around the globe. Broadridge wanted to recover funds lost to overpayments, duplicate payments and erroneous payments. It analyzed its supplier transaction database to find financial recovery opportunities and identified outstanding credits and issues that led to payment errors.

141 days ago by Hewlett-Packard (HP)

EMC Corporation Delivers Reports in Minutes With Unified Data Warehouse and Analytics Tools

EMC Corporation wanted to consolidate disparate data sources to reduce total cost of ownership, improve efficiency, and sharpen decision making. It also wanted to provide diverse users with insight into data on various topics, from sales metrics and customer information to Human Resources (HR) and financial figures. The challenge was to grow market share against other high-end storage competitors by leveraging data on customers and markets, and by improving customer service. EMC Corporation implemented a unified data warehouse with Oracle Business Intelligence Enterprise Edition and Oracle Database 11g, eliminating four legacy data warehouses and numerous overlapping business and analytics applications. It also improved accuracy and efficiency in financial reporting by eliminating 13 shadow business applications.

202 days ago by Oracle

5 Ways to Reduce IT Audit Tax

Taxes are certainly not fun, but there is something worse: an audit. Combine the two in a risk and compliance scenario and you have the onerous "audit tax," a figurative term used to describe the expenses a company incurs when deploying resources and manpower to satisfy the burgeoning set of internal and external compliance and audit mandates. The good news is that there are ways to reduce the audit tax burden. This whitepaper outlines five methods organizations should consider to streamline their compliance efforts and thereby reduce their audit tax.

208 days ago by Lumension

Texas A&M University whitepapers

Cyber Crimes Aimed at Publicly Traded Companies: Is Stock Price Affected?

E-commerce has been a boon for business. A great deal of business activity now occurs in the realm of cyberspace on the Web. The downside of cyber-business is cyber crimes, also called electronic crime or simply e-crime. Cyber crime costs publicly traded companies billions of dollars annually in stolen assets and lost business. Further, when a company falls prey to cyber criminals, this may concern customers who worry about the security of their business transactions with the company. As a result, a company can lose future business if it is perceived to be vulnerable to cyber crime. Such vulnerability may even lead to a decrease in the market value of the company, due to legitimate concerns of financial analysts, investors, and creditors.

82 days ago by Texas A&M University

Visual Tools for Initial Exploratory Data Mining

Data mining has been defined as "The process of discovering useful information in large data repositories" (Tan et al., 2006). It focuses on the discovery of valid, previously unknown, and actionable patterns and relationships. In its search for patterns, associations and descriptions, it goes beyond the simple retrieval of base facts and aggregations. Data mining frequently employs complex statistical and machine learning algorithms to extract this information. Given the large volumes of data on which data mining analysts usually work, analysts often ask where to begin. This paper presents an approach, developed by the authors, that uses three locally developed visual tools to conduct some of the introductory steps of data mining.

391 days ago by Texas A&M University

Sorting Based Data Centric Storage

Data-centric storage is an important concept for sensor networks that supports efficient in-network data query and processing. Previous approaches mostly use a hashing function to store data with the same key value on the sensors that are closest to the same geometric location. The paper proposes a new data-centric storage method based on sorting. The method is robust across different network models and works for unlocalized homogeneous sensor networks, i.e., it requires no location information. The idea is to sort the data in the network based on their key values, so that queries can be easily answered. The sorting method balances the storage load very well. The paper presents a sorting algorithm that is both decentralized and very efficient.

686 days ago by Texas A&M University

Internet Explorer and Firefox: Web Browser Features Comparision and Their Future

Internet technology is one of the most significant inventions of the era and has contributed significantly to distributing and collecting data and information. The effectiveness and efficiency of this process depend on the performance of the web browser. Internet Explorer is the leader of the competitive browser market, with Mozilla Firefox as its strongest rival, which has been gaining substantial popularity among internet users. Choosing the best web browser is a difficult task due to the considerably large selection of browser programs and the lack of tangible comparison data. This paper describes and compares vital features of Internet Explorer and Mozilla Firefox, which together represent over 90% of the browser market.

926 days ago by Texas A&M University

Design of an Active Set Top Box in a Wireless Network for Scalable Streaming Services

The popularity of multimedia streaming services over wireless home networks has raised major challenges in improving the quality of services delivered through a Set Top Box (STB). Even though scalable methods have been suggested to enhance the quality of multimedia streaming services, providing scalable streaming services in wireless home networks remains challenging. Previous studies on scalable streaming services eliminate the corrupted stream at the multimedia client. This paper proposes a new method, ActiveSTB, which removes distorted or unsuitable multimedia data early to save scarce resources. The paper uses a network simulation tool, NS-2, to evaluate the method across a range of cross-traffic and error rates.

963 days ago by Texas A&M University

Design of IRA Codes for Non-Coherent Detection With OFDM

Irregular Repeat Accumulate (IRA) codes are a special class of Low-Density Parity-Check (LDPC) codes that perform very close to capacity on memoryless channels with coherent detection. Further, they can be encoded and decoded in linear time. This paper considers the design of a non-coherent Orthogonal Frequency-Division Multiplexing (OFDM) system for block frequency-selective fading channels with non-systematic IRA codes, where the receiver performs iterative channel estimation and decoding. Non-systematic IRA codes are optimized for the given system and are shown to perform within a few tenths of a dB of the coherent receiver with the same channel parameters. The proposed scheme is shown to be robust to changes in the number of channel taps and Doppler.

1074 days ago by Texas A&M University