Exploratory Data Analysis
Today’s professionals in
business, engineering, and science work in complex—often overwhelmingly
complex—environments. To be effective, they must understand masses of
data from a variety of sources. Traditional data mining tools employ
black-box algorithms that generate complex predictive models. The
models can be useful for prediction, but provide little to no insight
into the data. Exploratory data analysis (EDA), the underlying premise
of Data Desk software, is a statistical approach that allows the
decision maker not only to see patterns and relationships in a dataset
but also to get at the causes and effects behind the relationships. EDA
facilitates a sophisticated understanding of what’s really going on in a
body of data.
Most data arise as a byproduct of other activities. A business person
may have data in a spreadsheet intended for tracking sales, data in a
database for human resource management, or data that have been
published by a government or trade organization. A researcher may
collect data to sift through a variety of alternatives, may want to look
in a new way at data originally collected for a different purpose, or
may want to check experimental data for errors or unexpected patterns. That
is why the process of analyzing data needs to be wide open to
possibility.
About a hundred years ago, in its early days, statistics concentrated
on analyses of data, considering effective ways to describe patterns,
trends, and relationships. In the middle decades of the 20th century,
attention moved to developing a solid mathematical foundation,
establishing the properties of various estimators to find the best
methods. In 1962 Dr. John Tukey warned that mathematical statistics was
ignoring real-world data analysis and called for a return to scientific
statistics in which the value of the statistical description of the
data was paramount. In subsequent work, Tukey defined Exploratory Data
Analysis, a philosophy that returned to the original goals of
statistics but used modern methods.
Traditional inferential statistics starts from a hypothesis, performs
an experiment, and then tests the hypothesis. EDA starts instead from
the data and asks what patterns, relationships, or trends they might
hold. In recent years EDA has gained wider acceptance. A large part of
this growth is due to the availability of desktop computers and the
explosion of data for which traditional statistics is just not
suitable. Desktop computers have also made it possible to develop new
graphical methods that support the EDA philosophy in strikingly
effective fashion.
Because EDA relies heavily on data display, makes few assumptions about
the structure of the data, and emphasizes identifying and describing
patterns, it is useful to a wide range of professionals who can
recognize important patterns easily, but may not wish to work with
complex statistical techniques.
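As a rough illustration of describing data before modeling it, the sketch below computes a five-number summary and flags unusual values using only the Python standard library. It is a minimal example under stated assumptions, not how Data Desk works: the function names, the `monthly_sales` figures, and the 1.5 × IQR cutoff are all hypothetical choices made for illustration, and the quartiles follow one common convention (medians of the lower and upper halves of the sorted data).

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Quartiles are taken as the medians of the lower and upper halves of
    the sorted data, excluding the middle value when the count is odd.
    Other quartile conventions exist and give slightly different numbers.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    half = len(xs) // 2
    # With a single value, both halves fall back to the whole list.
    lower = xs[:half] if half else xs
    upper = xs[-half:] if half else xs
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    k=1.5 is a conventional rule of thumb, assumed here for illustration.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low_fence = q1 - k * spread
    high_fence = q3 + k * spread
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data: units sold per month, with one surprising month.
monthly_sales = [12, 15, 14, 13, 16, 15, 14, 48, 13, 15]

print("summary:", five_number_summary(monthly_sales))
print("worth a closer look:", flag_outliers(monthly_sales))
```

The flagged value is a prompt for a question (a data-entry error, a promotion, a seasonal effect?) rather than a conclusion, which is the spirit of starting from the data.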
Data Description’s graphical analytical tools start from the EDA
philosophy. They empower people who have data and want to discover the
patterns hiding within.
©2015, Data Description, Inc.