CART - Decision Tree Software - 北京睿驰科技

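CART is the flagship data mining software from Salford Systems: a decision tree tool that automatically sifts through complex data.

Salford Systems was founded in 1983 and is headquartered in San Diego, California. The company provides data mining and business intelligence software and consulting services. Its award-winning software has been applied to complex data analysis, predictive modeling, and segmentation, in applications including credit scoring, target marketing, analytical customer relationship management (CRM), fraud and intrusion detection, website personalization, drug discovery, and manufacturing quality control. Industries using Salford Systems products and services include banking, financial services, insurance, telecommunications, transportation, pharmaceuticals, health care, manufacturing, law enforcement and security, retail and catalog sales, and education. Salford Systems software is installed at 4,500 sites, 300 of them universities. Customers vary in size, and many are Fortune 500 companies, including American Express, Pfizer Pharmaceuticals, General Motors, and Sears, Roebuck and Co.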

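CART® - Classification and Regression Trees

Classification trees:

The CART® modeling engine in Salford Predictive Modeler (SPM) is a classification tree that changed the field of analytics and ushered in the current era of data science. CART is one of the core tools of modern data mining.

Code:

Technically, the CART modeling engine is based on landmark mathematical theory introduced in 1984 by four renowned statisticians at Stanford University and the University of California, Berkeley. The CART modeling engine is SPM's implementation of classification and regression trees, and it is decision tree software that embodies the original code.

Fast and versatile:

Extensions to the CART modeling engine are designed to improve results in market research and web analytics. The engine supports high-speed deployment, so Salford Predictive Modeler models can predict and score in real time at large scale. Over the years, the CART modeling engine has become one of the popular, easy-to-use predictive modeling algorithms available to analysts, and it also serves as a foundation for modern data mining methods based on bagging and boosting.

CART features: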

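Hot-spot detection to discover notable parts of the tree and the corresponding tree rules
Variable importance measures for understanding the variables in the tree
Deploy models and generate predictions in real time or otherwise
User-defined splits at chosen positions in the tree
Differential lift (also called "uplift" or "incremental response") modeling for assessing treatment effects
Automation tools for model tuning and experiments, including:

Automatic recursive feature elimination for advanced variable selection
Experiment with prior probabilities to obtain a model with the desired accuracy rate for a given class
Perform repeated cross-validation
Build CART models on bootstrap samples
Build two linked models, where the first predicts a binary event and the second predicts a numeric value. For example, predict whether someone will buy and how much they will spend.
Discover the impact of different learn and test partitions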
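The "two linked models" idea in the list above can be sketched in a few lines. This is a minimal illustration and not SPM's implementation: each model is reduced to a single-split tree (a stump) on one feature, and the toy data and the names `fit_stump`, `predict`, and `expected_spend` are hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on a single numeric feature.

    Returns (threshold, left_mean, right_mean), choosing the threshold that
    minimises the summed squared error of the two child nodes.
    """
    values = sorted(set(xs))
    if len(values) < 2:                      # nothing to split on
        m = mean(ys)
        return values[0], m, m
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                    # midpoint between adjacent values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]

def predict(stump, x):
    """Return the mean of the child node that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical toy data: site visits, whether the customer bought, amount spent.
visits = [1, 2, 3, 4, 5, 6, 7, 8]
bought = [0, 0, 0, 1, 1, 0, 1, 1]
spend = [0, 0, 0, 20, 30, 0, 50, 60]

# Model 1: binary event. The child-node mean of a 0/1 target is a purchase rate.
buy_model = fit_stump(visits, bought)

# Model 2: numeric value, fitted on buyers only.
buyers = [(v, s) for v, b, s in zip(visits, bought, spend) if b == 1]
spend_model = fit_stump([v for v, _ in buyers], [s for _, s in buyers])

def expected_spend(v):
    """Link the two models: P(buy) times predicted spend given a purchase."""
    return predict(buy_model, v) * predict(spend_model, v)

for v in (2, 5, 8):
    print(v, expected_spend(v))
```

Fitting the second model on buyers only keeps the zero-spend records from dragging the amount estimate down; the product of the two predictions then gives an expected spend per customer.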

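Overview of Salford Systems data mining tools:

CART (Classification and Regression Trees) is decision tree software based on the original CART code developed by Stanford University and University of California, Berkeley statisticians Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone. CART is fast and easy to use, automatically explores the data in depth, and produces understandable predictive models.

MARS (Multivariate Adaptive Regression Splines) is a regression tool for data mining and predictive modeling, and an alternative to neural networks and traditional statistical models.

TreeNet is a new generation of high-speed, error-tolerant predictive modeling tool. TreeNet requires little data preparation, handles flawed data gracefully, adapts automatically to missing fields, and performs extensive self-testing so that the model also holds up when applied to new data. A TreeNet model often consists of 500 or more small decision trees, and its graphs summarize the effect of key variables on the outcome.

RandomForests is a new generation of tree ensemble technology that combines a large number of trees into high-performance classifiers and predictive models. Based on the research of Leo Breiman, RandomForests can extract key information from the structure of the data, and it offers clustering and segmentation with accompanying measures and without prior assumptions.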

CART is an acronym for Classification and Regression Trees, a decision-tree procedure introduced in 1984 by world-renowned UC Berkeley and Stanford statisticians, Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone.

CART uses an intuitive, Windows-based interface, making it accessible to both technical and non-technical users. Underlying the "easy" interface, however, is a mature theoretical foundation that distinguishes CART from other methodologies and other decision trees. CART is the only decision tree system based on the original CART code developed by world-renowned Stanford University and University of California at Berkeley statisticians; this code now includes enhancements that were co-developed by Salford Systems and CART's originators.

Based on a decade of machine learning and statistical research, CART provides stable performance and reliable results.

In addition, CART is an excellent pre-processing complement to other data analysis techniques. For example, CART's outputs (predicted values) can be used as inputs to improve the predictive accuracy of neural nets and logistic regression.

NEW TreeCoder Model Deployment Module

TreeCoder is an add-on module for deploying CART models directly in SAS, quickly and accurately.

The decision logic of a CART tree, including the surrogate rules utilised if primary splitting values are missing, is automatically implemented. The resulting source code can be dropped into a SAS run without modification, thus eliminating errors due to hand coding of decision rules and enabling fast and accurate model deployment.
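To make the surrogate-rule logic concrete, here is a minimal Python sketch of scoring a record through a tree when the primary splitting value may be missing. It is an illustration only, not TreeCoder output (which is SAS code). The `Node` and `Leaf` classes, the `score` function, the variable names, and the fallback to a default branch when every splitter is missing are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    prediction: str

@dataclass
class Node:
    primary: tuple                    # (variable, threshold): go left if value <= threshold
    left: "Node | Leaf"
    right: "Node | Leaf"
    surrogates: list = field(default_factory=list)  # tried in order if primary is missing
    default_left: bool = False        # assumed fallback when every splitter is missing

def score(record, node):
    """Walk the tree, using surrogate splits when the primary value is missing."""
    while isinstance(node, Node):
        go_left = node.default_left
        for variable, threshold in [node.primary, *node.surrogates]:
            value = record.get(variable)
            if value is not None:     # first available splitter decides the branch
                go_left = value <= threshold
                break
        node = node.left if go_left else node.right
    return node.prediction

# Hypothetical one-split tree: income is the primary splitter, age the surrogate.
tree = Node(
    primary=("income", 40000),
    surrogates=[("age", 30)],
    left=Leaf("decline"),
    right=Leaf("approve"),
)

print(score({"income": 30000, "age": 50}, tree))  # primary split is used
print(score({"income": None, "age": 25}, tree))   # falls back to the surrogate
print(score({}, tree))                            # falls back to the default branch
```

Generating this kind of logic mechanically from the fitted tree, rather than retyping the rules, is what removes the hand-coding errors the paragraph above describes.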

COMPONENTS	BASIC	PRO	PROEX	ULTRA
Modeling Engine: CART (Decision Trees)	o	o	o	o
Linear Combination Splits	o	o	o	o
Optimal tree selection based on area under ROC curve	o	o	o	o
User defined splits for the root node and its children		o	o	o
Automation: Generate models with alternative handling of missing values (Battery MVI)		o	o	o
Automation: Build a series of models using all available splitting strategies (six for classification, two for regression) (Battery RULES)		o	o	o
Automation: Build a series of models varying the depth of the tree (Battery DEPTH)		o	o	o
Automation: Build a series of models changing the minimum required size on parent nodes (Battery ATOM)		o	o	o
Automation: Build a series of models changing the minimum required size on child nodes (Battery MINCHILD)		o	o	o
Automation: Explore accuracy versus speed trade-off due to potential sampling of records at each node in a tree (Battery SUBSAMPLE)		o	o	o
Multiple user defined lists for linear combinations			o	o
Constrained trees			o	o
Ability to create and save dummy variables for every node in the tree during scoring			o	o
Report basic stats on any variable of user choice at every node in the tree			o	o
Comparison of learn vs. test performance at every node of every tree in the sequence			o	o
Hot-Spot detection to identify the richest nodes across multiple trees			o	o
Automation: Vary the priors for the specified class (Battery PRIORS)			o	o
Automation: Build a series of models limiting the number of nodes in a tree (Battery NODES)			o	o
Automation: Build a series of models trying each available predictor as the root node splitter (Battery ROOT)			o	o
Automation: Explore the impact of favoring equal sized child nodes (Battery POWER)			o	o
Automation: Build a series of models by progressively removing misclassified records, thus increasing the robustness of trees and possibly reducing model complexity (Battery REFINE)			o	o
Automation: Bagging and ARCing using the legacy code (COMBINE)			o	o
Build a CART tree utilizing the TreeNet engine to gain speed as well as alternative reporting				o
Build a Random Forests model utilizing the CART engine to gain alternative handling of missing values via surrogate splits (Battery BOOTSTRAP RSPLIT)				o
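As an illustration of what varying the priors (the Battery PRIORS row above) does inside a tree, the sketch below recomputes the class probabilities of a single node under different priors. It uses the standard prior-adjusted node probability from the CART literature, where the probability of class j in node t is proportional to prior_j * N_j(t) / N_j. The source does not describe SPM's internals, so treat this as an assumption; the counts and the name `node_class_probs` are hypothetical.

```python
def node_class_probs(node_counts, total_counts, priors):
    """Prior-adjusted class probabilities within one tree node.

    node_counts[c]  : records of class c that fall in the node
    total_counts[c] : records of class c in the whole learn sample
    priors[c]       : prior probability assigned to class c
    """
    weights = {c: priors[c] * node_counts[c] / total_counts[c] for c in priors}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Hypothetical learn sample: 100 fraud records and 900 ok records overall,
# of which 30 fraud and 90 ok land in the node being examined.
total_counts = {"fraud": 100, "ok": 900}
node_counts = {"fraud": 30, "ok": 90}

# Priors matching the data frequencies reproduce the raw node proportions.
p_data = node_class_probs(node_counts, total_counts, {"fraud": 0.1, "ok": 0.9})

# Equal priors treat the rare class as if it were as common as the other one.
p_equal = node_class_probs(node_counts, total_counts, {"fraud": 0.5, "ok": 0.5})

# A small sweep over the prior of the rare class, one setting per model.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    probs = node_class_probs(node_counts, total_counts, {"fraud": p, "ok": 1 - p})
    label = max(probs, key=probs.get)
    print(f"prior(fraud)={p:.1f}  P(fraud|node)={probs['fraud']:.3f}  node class={label}")
```

With data-frequency priors the node is labelled with the majority class, while raising the prior on the rare class flips the label, which is why sweeping priors trades accuracy on one class against the other.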