Molegro Data Modeller is a cross-platform application for Data Mining, Data Modelling, and Data Visualization. Its spreadsheet-centered user interface makes Molegro Data Modeller a simple and affordable alternative to complex workflow-based solutions or command-driven statistical products.
Here are some key features of "Molegro Data Modeller":
Regression
Molegro Data Modeller offers different types of regression methods:
· Multiple Linear Regression models simple linear relations between data, and is fast and efficient.
· Partial Least Squares reduces the dimensionality of the data set before creating a model. It is suitable for data sets with many independent variables.
· Neural Networks are able to model highly non-linear relations.
· Support Vector Machines are also able to model complex relations and tend to be less prone to overfitting than Neural Networks.
· Feature Selection and Cross-Validation
Feature selection is easy to set up in the regression wizard: different schemes can be chosen (Forward, Backward, and Hill Climber selection) and combined with different model selection criteria (Bayesian Information Criterion or cross-validated R^2). Different descriptor rankings can be employed when searching the descriptors.
Cross-validation is just as easy. You can cross-validate by using Leave-One-Out, by using a specified number of random folds, or by manually creating folds.
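As an illustration of what Leave-One-Out cross-validation computes, here is a minimal Python sketch. It is not Molegro Data Modeller's implementation: the function names, the single-descriptor linear model, and the data are assumptions made for the example, and the cross-validated R^2 is taken as 1 - PRESS / total sum of squares, which is one common definition.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def loo_cross_validated_r2(xs, ys):
    """Leave-One-Out: predict each point from a model fitted without it."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    y_bar = mean(ys)
    # PRESS: sum of squared errors of the held-out predictions.
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - press / total


# Hypothetical data: one descriptor and a roughly linear response.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
q2 = loo_cross_validated_r2(descriptor, activity)
print(f"cross-validated R^2 = {q2:.3f}")
```

Using a fixed number of random folds follows the same pattern, except that a whole fold is held out in each round instead of a single row.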
· Visualization
The different visualization types are highly interactive. Selections in the spreadsheet are directly shown in the plots and vice versa. It is also possible to apply different user-defined coloring schemes and to apply jitter (artificial noise added to the plotted data points so that overlapping points become visible).
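Jitter can be sketched in a few lines of Python. This is only an illustration of the idea: the uniform noise, the `amount` parameter, and the sample column are assumptions, not the application's actual settings.

```python
import random


def jitter(values, amount, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical integer-valued descriptor where many points overlap in a plot.
ring_counts = [1, 1, 1, 2, 2, 3, 3, 3, 3]
plotted = jitter(ring_counts, amount=0.2, seed=0)
```

Only the plotted coordinates change; the original column is left untouched.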
· Clustering
Molegro Data Modeller offers two kinds of clustering: K-means clustering (which is very efficient) and a density-based clustering scheme (which is able to capture more complex cluster shapes).
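For readers unfamiliar with K-means, the following Python sketch shows the standard iteration (Lloyd's algorithm). It is a generic textbook version, not the program's own code; the function name, the optional `init` starting centroids, and the sample points are assumptions for the example.

```python
import math
import random


def k_means(points, k, init=None, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2D data with two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(data, k=2)
```

Because every cluster is summarized by a single mean, K-means favors compact, roughly spherical clusters, which is why a density-based scheme is the better fit for more complex shapes.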
· Principal Component Analysis (PCA)
Principal Component Analysis is a method for reducing the dimensionality of a dataset. A new set of principal components is created using linear combinations of the original descriptors. The dimensionality is then reduced by keeping only the principal components that account for most of the variance.
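The idea can be sketched in plain Python by extracting only the first principal component with power iteration on the covariance matrix. This is an illustrative simplification, not the application's algorithm; the function name and data are assumptions, and the sketch assumes the start vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the component with the largest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered descriptors.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    # (assumes the start vector is not orthogonal to that eigenvector).
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Scores: each row projected onto the component (a linear combination
    # of the original descriptors).
    scores = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
    return vec, scores


# Hypothetical data: two strongly correlated descriptors.
rows = [(2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
direction, scores = first_principal_component(rows)
```

Here the two descriptor columns can be replaced by the single `scores` column with little loss of variance.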
· Algebraic Data Transformations
It is possible to work with algebraic transformations directly on columns: for instance, "New Activity = log(Act) + Beta^2" will create a new column based on the expression.
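Outside the application, the same kind of column expression could be evaluated as in the Python sketch below. The table layout and values are hypothetical, and `math.log` is the natural logarithm; which base the application's `log` uses is not stated here.

```python
import math

# Hypothetical columns named as in the expression above.
table = {
    "Act": [1.0, 10.0, 100.0],
    "Beta": [0.5, 1.0, 2.0],
}

# Evaluate the expression row by row and store the result as a new column.
# Assumption: natural logarithm; swap in math.log10 if base 10 is intended.
table["New Activity"] = [
    math.log(act) + beta ** 2
    for act, beta in zip(table["Act"], table["Beta"])
]
```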
· Outlier Detection
Molegro Data Modeller provides two methods for locating abnormal data:
· A quartile-based method which checks how far away a data point is from the 25th and 75th percentiles. This method examines each descriptor individually.
· A density-based method which calculates a local density for each data point. Data points with a low density are far away from other data points and could be outliers.
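The quartile-based approach can be sketched as follows. The exact rule used by the application is not given here, so this example assumes the widely used convention of flagging values more than 1.5 interquartile ranges outside the quartiles; the function name, the `factor` parameter, and the data are assumptions.

```python
from statistics import quantiles


def quartile_outliers(values, factor=1.5):
    """Flag values lying more than factor * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1  # interquartile range
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical descriptor column with one suspicious value.
weights = [310.2, 305.7, 298.4, 312.9, 301.1, 307.6, 612.0, 299.8]
print(quartile_outliers(weights))
```

Since the check looks at one column at a time, it would be run once per descriptor; a point that is unremarkable in every single column but unusual in combination is the kind of case the density-based method is meant to catch.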
· Advanced Subset Creation
Molegro Data Modeller offers a grid-based method for creating a diverse subset of a dataset. It is possible to create grids in an arbitrary number of dimensions, and 2D and 3D grids can be visualized directly in the data plotters.
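One simple way to picture grid-based subset creation is to divide descriptor space into equally sized cells and keep one representative per occupied cell. The Python sketch below does exactly that; it is an assumption about the general technique, not a description of the program's own selection rule, and the function name, `cell_size`, and points are hypothetical.

```python
import math


def grid_subset(points, cell_size):
    """Keep the first point found in each occupied grid cell."""
    chosen = {}
    for point in points:
        # The cell index is the integer grid coordinate along every dimension.
        cell = tuple(math.floor(coord / cell_size) for coord in point)
        chosen.setdefault(cell, point)
    return list(chosen.values())


# Hypothetical 2D descriptor values: a dense cluster plus two isolated points.
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (2.5, 2.7), (5.1, 0.3)]
subset = grid_subset(points, cell_size=1.0)
```

The dense cluster contributes a single representative, so the subset covers the occupied regions more evenly than a random sample would. The same function works for any number of dimensions because the cell index is built per coordinate.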
Other Features
· Scrambling (shuffling) of columns and "replace with random values" for performing y-Randomization.
· Data preparation: scaling, normalization, repair of missing values.
· Data coloring.
· Correlation Matrix.
· Custom Data Views.
· Similarity Browser.
· Gnuplot export (for creating and customizing publication-quality plots).
· Online help and automatic check for updates.
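The y-Randomization mentioned under column scrambling above can be illustrated with a short Python sketch: the response column is shuffled repeatedly, the model is refitted each time, and the scores are compared with the score on the real data. The one-descriptor linear model, the function names, and the data are assumptions made for the example.

```python
import random
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def y_randomization(xs, ys, rounds=100, seed=0):
    """Return R^2 on the real data and the R^2 values for shuffled responses."""
    rng = random.Random(seed)
    scrambled_scores = []
    for _ in range(rounds):
        shuffled = ys[:]       # copy so the original column is untouched
        rng.shuffle(shuffled)  # break the link between descriptor and response
        scrambled_scores.append(r_squared(xs, shuffled))
    return r_squared(xs, ys), scrambled_scores


# Hypothetical data with a genuine linear relation.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
response = [2.2, 3.8, 6.1, 8.3, 9.7, 12.4, 13.8, 16.1, 18.3, 19.9]
real_score, scrambled = y_randomization(descriptor, response)
print(f"real R^2 = {real_score:.3f}, mean scrambled R^2 = {mean(scrambled):.3f}")
```

If the scrambled scores come close to the real one, the model is probably fitting noise rather than a real relation.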