5 of the Best Free and Open Source Data Mining Software | TechSource
Gerald Lushington takes a look at the current status of open source data mining
I’ve recently received some questions about the best open source data mining tools out there. Good excuse to poke my head up out of the sand and see what’s available!
Let me say up front that I use Weka quite religiously, and I would happily recommend it as an efficient and reasonably flexible playground for anyone who needs to make sense of a lot of numbers. I do, however, get the sense that it is a bit static in its methodological offerings, and that its graphics are marginal. So what else is available?
The title link to this post is itself a blog post on this very question. The only deficiency of that post is that it is now 2.5 years old, which might be considered somewhat ‘mature’ in the fast-paced world of open source development. It suggests Weka as one of its top five, as well as RapidMiner, which may well be the king of the data mining platforms, period (open source or not). I was surprised but pleased to see Knime listed: Knime is much beloved in the chemical informatics community, and is a wonderful tool, but I personally had always thought of it as much more of a data integration platform than a data mining engine. I guess I need to open my mind a bit wider sometimes. The remaining two codes profiled by Auza are Orange and JHepWork, neither of which I had heard of, but both are described in compelling and attractive terms: Orange as a powerful but user-friendly platform that sounds suitable for beginners, and JHepWork as perhaps the most graphically endowed of the bunch.
New recommendations
What are some up-and-coming programs that Auza missed, or didn’t include in his top five? Definitely Waffles, which provides a fine general environment within which to carry out a broad range of data analyses. On a somewhat narrower (but still broadly applicable) level, I would definitely suggest ELKI for its sophisticated techniques for model validation and outlier detection. For those people with more specialized interests, the MLoss list should be useful: it includes both general data mining packages and a reasonable number that are highly tuned to specific disciplines or types of computation. Finally, I would mention that if any of you have personal favorites that have not been mentioned here and may be of relevance to chemistry or biology, I would greatly appreciate hearing about them!
Posts in Gerald Lushington’s open source software blog represent the honest opinions of an open source software fan who has not been influenced by commercial interests or other inducements.