MARS by
Product Description
MARS (Multivariate
Adaptive Regression Splines) is a companion to CART that focuses on the
development and deployment of accurate and easy-to-understand
regression models. The MARS model is designed to predict continuous
numeric outcomes such as the average monthly bill of a mobile phone
customer or the amount that a shopper is expected to spend in a web
site visit. MARS is also capable of producing high quality probability
models for a yes/no outcome. A dramatic improvement over conventional
stepwise and other automated regression tools, MARS performs variable
selection, variable transformation, interaction detection, and
self-testing, all automatically and at high speed. The MARS model is a
regression but with automatically generated non-linearities and
interactions included. A number of independent scientific studies have
reported that MARS often outperforms neural networks in predictive
accuracy while training from 100 to 1000 times faster.
MARS excels at finding thresholds and breaks in the relationships
between a set of inputs and is thus ideal for detecting changes in the
behavior of individuals or processes over time. Of all the Salford
tools, MARS is the most adept at working with the small data sets
frequently encountered in engineering contexts. MARS has also been
involved in winning data mining competitions focused on large database
customer relationship management (CRM) topics. Areas where MARS has
exhibited very high-performance results include forecasting electricity
demand for power-generating companies, relating customer satisfaction
scores to the engineering specifications of products, and
presence/absence modeling in geographical information systems (GIS).
Principal Characteristics
MARS is an innovative and
flexible modeling tool that automates the building of accurate
predictive models for continuous and binary dependent variables.
Multivariate Adaptive Regression Splines was developed in the early
1990s by Jerry Friedman, a world-renowned statistician and one of the
co-developers of CART. Salford Systems' MARS, based on the original
code, has been substantially enhanced with new features and
capabilities in exclusive collaboration with Friedman.
MARS excels at finding optimal variable transformations and
interactions, the complex data structure that often hides in
high-dimensional data. In doing so, this new generation approach to
data mining uncovers business critical data patterns and relationships
that are difficult, if not impossible, for other approaches to uncover.
Given a target variable and a set of candidate predictor variables, MARS automates all aspects of model development, including:
- Separating relevant from irrelevant predictors
Large numbers of variables are examined using efficient algorithms, and all promising variables are identified.
- Transforming predictor variables exhibiting a nonlinear relationship with the target variable
Every variable selected for entry into the model is repeatedly
checked for non-linear response. Highly non-linear functions can be
traced with precision via essentially piecewise regression.
- Determining interactions between predictor variables
MARS
repeatedly searches through the interactions allowed by the analyst.
Unlike recursive partitioning schemes, MARS models may be constrained
to forbid interactions of certain types, thus allowing some variables
to enter only as main effects, while allowing other variables to enter
as interactions, but only with a specified subset of other variables.
- Handling missing values with new nested variable techniques
Certain variables are deemed to be meaningful (possibly
non-missing) in the model only if particular conditions are met (e.g.,
X has a meaningful non-missing value only if categorical variable Y has
a value in some range).
- Conducting extensive self-tests to protect against overfitting
The user can choose to reserve a random subset of the data for testing, or use
v-fold cross-validation to tune the final model selection parameters.
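The "essentially piecewise regression" mentioned above can be pictured with hinge functions of the form max(0, x - t), the basis functions used in Friedman's MARS. The following is a minimal sketch of how such a model is evaluated, not Salford Systems' implementation; the variable names (`minutes`, `tenure`), knots, and coefficients are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: evaluates a hand-written MARS-style model.
# Variable names, knots, and coefficients are hypothetical.

def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) if direction is +1,
    max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))

def predict(row):
    """Ordinary regression on hinge terms: intercept, main effects,
    and one two-way interaction."""
    bf1 = hinge(row["minutes"], 300.0, +1)  # active above the 300-minute knot
    bf2 = hinge(row["minutes"], 300.0, -1)  # active below the 300-minute knot
    bf3 = hinge(row["tenure"], 12.0, +1)    # active after 12 months of tenure
    # An interaction is the product of two hinge functions.
    return 25.0 + 0.10 * bf1 - 0.02 * bf2 + 0.005 * bf1 * bf3

if __name__ == "__main__":
    print(predict({"minutes": 400.0, "tenure": 24.0}))
```

Each hinge term is zero on one side of its knot, so the fitted curve can change slope at the knot. A model restricted to main effects would simply omit the product term.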
MARS enables analysts to rapidly search through all possible models and
to quickly identify the optimal solution, providing insights that can
lead to a definitive competitive advantage. Because the software can be
exploited via an easy-to-use GUI, intelligent default settings, and
aesthetically appealing output, for the first time analysts at all
levels can easily access MARS' innovations.
MARS for Windows also incorporates two alternative control modes that
extend the program's features and capabilities. In addition to
controlling MARS with the GUI, you can also issue commands at the
command prompt or submit a command file.
- User-Friendly Graphical User Interface
MARS'
easy-to-use GUI allows the user to control the variables and functional
forms to be entered into the model and the interactions to be
considered or forbidden, while allowing the MARS algorithm to optimize
those parts of the model the analyst chooses to leave free. Once the
model is selected, the user can easily remove or add terms, instantly
see the impact of changes on model fit, review diagnostics that assist
in model selection, save the model and apply the model to new data for
prediction.
- MARS Output
MARS output is an easy-to-deploy regression model that can be
automatically applied to new data from within MARS itself or exported
as ready-to-run SAS® and C source code. To facilitate interpretation of
the model, the output also includes interpretive summary reports as
well as exportable two- and three-dimensional curve and surface plots.
Data Translation Engine
The MARS®
data-translation engine supports data conversions for more than 80 file
formats, including popular statistical-analysis packages such as SAS®
and SPSS®, databases such as Oracle and Informix, and spreadsheets such
as Microsoft Excel and Lotus 1-2-3.
Which version do you need?
MARS
requires that all training data reside in RAM, so the larger the data
set to be analyzed, the larger the RAM needed to analyze it. The exact
amount of RAM required will vary from problem to problem. The table
below is intended as a guide for the maximum number of candidate
predictor variables that can be specified in a MARS analysis for the
given sample size and amount of RAM workspace:
Number of Predictor Columns You Can Use for Different Training Sample Sizes and MARS Versions
Sample Size | 64 MB compile [2m]** | 128 MB compile [4.8m] | 256 MB compile [9.6m] | 512 MB compile [22.8m]***
10,000 | 200 | 480 | 960 | 2280
25,000 | 80 | 190 | 380 | 910
50,000 | 40 | 95 | 190 | 455
100,000 | 20 | 45 | 95 | 225
200,000 | 5 | 20 | 45 | 110
These figures assume that MARS is run with default settings and that
the training data contain no missing values or categorical variables,
maximum interactions is set to 1, and maximum basis functions is
set to the number of specified predictors.
NOTE that each variable containing a missing value counts as two predictors.
- ** Bracketed figures give the maximum number of data values (in millions) based on the above assumptions.
- *** Custom compiles up to 32 GB are available on UNIX platforms. The
maximum number of candidate predictor variables that can be specified
regardless of available RAM is 8,192.
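The first row of the table is consistent with dividing the bracketed workspace capacity by the sample size (for example, 2 million values / 10,000 records = 200 columns). The sketch below applies that arithmetic. It is an inference from the published figures, not a documented formula, and the table's entries for larger samples are rounded down or more conservative than this estimate. The function name and its parameters are hypothetical.

```python
# Rough capacity estimate inferred from the table; not a documented formula.
MAX_PREDICTORS = 8192  # stated hard limit, regardless of available RAM

def max_predictor_columns(workspace_values, sample_size, n_with_missing=0):
    """Approximate upper bound on candidate predictors.

    workspace_values: bracketed capacity in data values (2_000_000 for [2m])
    sample_size:      number of training records
    n_with_missing:   predictors containing missing values; each of these
                      counts as two predictors, so it costs one extra slot
    """
    slots = workspace_values // sample_size
    usable = slots - n_with_missing
    return max(0, min(usable, MAX_PREDICTORS))

if __name__ == "__main__":
    # 64 MB compile, 10,000 training records
    print(max_predictor_columns(2_000_000, 10_000))
```

Treat the result as an optimistic ceiling and prefer the table where the two disagree.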
Rule of Thumb for Calculating Required RAM
As a rule of thumb, you can estimate the RAM needed for your data set
by multiplying the data set size by a factor of 3 to 4. For example, if
your data set is 10 megabytes, MARS potentially requires 30 to 40
megabytes of RAM for the analysis.
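The rule of thumb amounts to a one-line calculation. The helper below is a hypothetical convenience function, not part of MARS, that returns the low and high ends of the estimate.

```python
def estimate_ram_mb(dataset_mb, low_factor=3, high_factor=4):
    """Return the (low, high) RAM estimate in MB for a data set of the
    given size, using the 3x to 4x rule of thumb."""
    return dataset_mb * low_factor, dataset_mb * high_factor

if __name__ == "__main__":
    low, high = estimate_ram_mb(10)
    print(f"Plan for roughly {low} to {high} MB of RAM")
```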
Increasing the Number of Variables MARS Can Handle
If
you have a very large list of potential predictors, CART can be used
first to extract the most important variables. MARS can then focus on
the top variables from the CART model, enabling you to fit larger
problem sizes into smaller workspaces and resulting in faster analyses
and more accurate and robust models.
System Technical Requirements
Windows
Minimum System Requirements
- 80486 processor or higher.
- 512 MB of random-access memory (RAM). This value
depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB,
1 GB). While all versions may run with a minimum of 32 MB of RAM, we
CANNOT GUARANTEE that they will. We highly recommend that you follow the
recommended memory configuration that applies to the particular version
you have purchased. Using less than the recommended memory
configuration results in hard drive paging, which reduces performance
significantly, or in application instability.
- Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
- Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
- CD-ROM or DVD drive.
- Windows XP/2003/2008 and Windows 7.
Recommended System Requirements
Because Salford Tools are extremely CPU intensive, the faster your CPU,
the faster they will run. For optimal performance, we strongly
recommend they run on a machine with a system configuration equal to,
or greater than, the following:
- Pentium 4 processor running 2.0+ GHz.
- 2 GB of random-access memory (RAM). This value
depends on the "size" you have purchased (64 MB, 128 MB, 256 MB, 512 MB,
1 GB). While all versions may run with a minimum of 32 MB of RAM, we
CANNOT GUARANTEE that they will. We highly recommend that you follow the
recommended memory configuration that applies to the particular version
you have purchased. Using less than the recommended memory
configuration results in hard drive paging, which reduces performance
significantly, or in application instability.
- Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
- Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
- CD-ROM or DVD drive.
- Windows XP/2003/2008 and Windows 7.
- 2 GB of additional hard disk space available for virtual memory and temporary files.
Ensuring Proper Permissions
If you are installing on a machine that uses security permissions, please read the following note.
- You must belong to the Administrator group on Windows
XP/2003/2008 and Windows 7 to be able to properly install and license
the application. Once the application is installed and licensed, any user with
read/write/modify permissions to the application's /bin and temp
directories can run the application.
Licensing Application
MARS is licensed using an application system ID and an associated unlock key.
Once installation is complete, the user will need to email the
application "system ID." This system ID is clearly displayed in the
License Information window shown the first time the application is
started. Alternatively, you can get to this window by selecting the
Help->License menu option.
Method 1: Fixed License
With a fixed license, each machine must have its own copy of the
licensed program installed. If your license terms permit more than one
copy, then the license must be activated on each machine that will be
used.
Method 2: Floating License
This method of licensing is used if you intend the
application to be used by more than one user concurrently over a
network. A floating license tracks the number of copies "checked out."
When that number exceeds your license terms, a message is provided
informing the user "all copies are checked out." The licensed program
may be installed on a machine that each client machine can access.
Machines that are not connected to the network must be issued a fixed
license (Method 1 above).
A floating license is particularly useful when the number of potential
users exceeds the number of seats specified in your license terms.
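The check-out behavior described above can be pictured with a toy seat counter. This is only a conceptual sketch and does not reflect how the actual license manager is implemented; the class and method names are hypothetical.

```python
class FloatingLicense:
    """Toy model of floating-license seat tracking (illustration only)."""

    def __init__(self, seats):
        self.seats = seats        # concurrent copies permitted by the license terms
        self.checked_out = set()  # users currently holding a copy

    def check_out(self, user):
        """Try to give `user` a copy; return a status message."""
        if user in self.checked_out:
            return "already checked out"
        if len(self.checked_out) >= self.seats:
            return "all copies are checked out"
        self.checked_out.add(user)
        return "ok"

    def check_in(self, user):
        """Release the copy held by `user`, if any."""
        self.checked_out.discard(user)

if __name__ == "__main__":
    pool = FloatingLicense(seats=2)
    for name in ("ana", "ben", "cy"):
        print(name, "->", pool.check_out(name))
```

With two seats and three potential users, the third request is refused until one of the first two users checks a copy back in.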
UNIX/Linux
Supported Architectures
- Alpha: DEC 3000 or AlphaServer running Tru64 UNIX 5.0 or higher
- Linux/i386: i586 or higher processor; Linux 2.4 or higher kernel; glibc 2.3 or higher
- Linux/AMD64: AMD64 or Intel EM64T processor; Linux 2.6 or higher kernel; glibc 2.3 or higher
- Sun: UltraSPARC processor; Solaris 2.6 or higher
- RS/6000: POWER or PowerPC processor; AIX 4.2 or higher
- HP 9000: PA-RISC 1.1 or higher processor; HP-UX 11.x
- SGI: MIPS 4 or higher processor; IRIX 6.5
Minimum System Requirements
- The minimum requirement for all non-GUI applications is 32
MB of random-access memory (RAM). This value depends on the "size" you
have purchased (64 MB, 128 MB, 256 MB, 512 MB, 1 GB).
- Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
- Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
Recommended System Requirements
- Recommended random-access memory (RAM) is 1.5 times
the licensed data limit (32 MB, 64 MB, etc.), up to the maximum
permitted by the target architecture. On UNIX systems, it is generally
recommended that there be at least twice as much swap space as there is
RAM.
- Hard disk with 40 MB of free space for program files, data file access utility, and sample data files.
- Additional hard disk space for scratch files (with the required space contingent on the size of the input data set).
All Salford apps are very CPU intensive, so more memory and a faster CPU are always helpful.
© Copyright 2015 Salford-Systems Inc.