Wednesday, August 11, 2010

WEKA: Building an Instances dataset programmatically

For the WEKA API, see the official website: http://www.cs.waikato.ac.nz/ml/weka/

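WEKA has its own file format, ARFF.
But once the data has been preprocessed in memory, having to save it as an *.arff file
and then call the WEKA API to read that *.arff file back in is obviously pretty silly =..=~
After some digging, I finally found a way in the official documentation to build an Instances dataset directly in code!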

See: Creating an ARFF file


Sample code

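// Uses the WEKA 3.6-style API, where Instance is a concrete class.
// In WEKA 3.7 and later, Instance is an interface; use DenseInstance instead.
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

FastVector      atts;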
FastVector      attsRel;
FastVector      attVals;
FastVector      attValsRel;
Instances       data;
Instances       dataRel;
double[]        vals;
double[]        valsRel;
int             i;

// 1. set up attributes
atts = new FastVector();
// - numeric
atts.addElement(new Attribute("att1"));
// - nominal
attVals = new FastVector();
for (i = 0; i < 5; i++)
  attVals.addElement("val" + (i+1));
atts.addElement(new Attribute("att2", attVals));
// - string
atts.addElement(new Attribute("att3", (FastVector) null));
// - date
atts.addElement(new Attribute("att4", "yyyy-MM-dd"));
// - relational
attsRel = new FastVector();
// -- numeric
attsRel.addElement(new Attribute("att5.1"));
// -- nominal
attValsRel = new FastVector();
for (i = 0; i < 5; i++)
  attValsRel.addElement("val5." + (i+1));
attsRel.addElement(new Attribute("att5.2", attValsRel));
dataRel = new Instances("att5", attsRel, 0);
atts.addElement(new Attribute("att5", dataRel, 0));
   
// 2. create Instances object
data = new Instances("MyRelation", atts, 0);
   
// 3. fill with data
// first instance
vals = new double[data.numAttributes()];
// - numeric
vals[0] = Math.PI;
// - nominal
vals[1] = attVals.indexOf("val3");
// - string
vals[2] = data.attribute(2).addStringValue("This is a string!");
// - date
vals[3] = data.attribute(3).parseDate("2001-11-09");
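// - relational
// NOTE: dataRel is created here but never filled or stored, and vals[4] is
// left at its default of 0.0, i.e. relation index 0. That index is first
// assigned by addRelation() for the second instance below, so this row
// ends up printing the second instance's relational data.
dataRel = new Instances(data.attribute(4).relation(), 0);
// add
data.add(new Instance(1.0, vals));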
   
// second instance
vals = new double[data.numAttributes()];  // important: allocate a NEW array; the first instance still references the old one
// - numeric
vals[0] = Math.E;
// - nominal
vals[1] = attVals.indexOf("val1");
// - string
vals[2] = data.attribute(2).addStringValue("And another one!");
// - date
vals[3] = data.attribute(3).parseDate("2000-12-01");
// - relational
dataRel = new Instances(data.attribute(4).relation(), 0);
// -- first instance
valsRel = new double[2];
valsRel[0] = Math.E + 1;
valsRel[1] = attValsRel.indexOf("val5.4");
dataRel.add(new Instance(1.0, valsRel));
// -- second instance
valsRel = new double[2];
valsRel[0] = Math.E + 2;
valsRel[1] = attValsRel.indexOf("val5.1");
dataRel.add(new Instance(1.0, valsRel));
vals[4] = data.attribute(4).addRelation(dataRel);
// add
data.add(new Instance(1.0, vals));

// 4. output data
System.out.println(data);
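
Every value in the listing above ends up as a double inside vals: a numeric value is stored as-is, a nominal value as the index of its label, and a date as a timestamp. The sketch below mimics that encoding with plain JDK classes so it can be tried without WEKA on the classpath. RowEncodingSketch, encodeNominal and encodeDate are hypothetical names made up for this illustration, and the fixed UTC time zone is an assumption added only to make the result reproducible (WEKA's own parseDate is not used here).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper, not part of WEKA: mimics how one row becomes a double[].
final class RowEncodingSketch {

    // Nominal value -> position of the label in the declared label list.
    static double encodeNominal(List<String> labels, String label) {
        int index = labels.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return index;
    }

    // Date value -> milliseconds since the epoch, parsed with the given pattern.
    static double encodeDate(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        // Assumption: a fixed UTC zone, so the number does not depend on the machine.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse date: " + text, e);
        }
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("val1", "val2", "val3", "val4", "val5");

        double[] vals = new double[3];
        vals[0] = Math.PI;                                  // numeric: stored directly
        vals[1] = encodeNominal(labels, "val3");            // nominal: label index
        vals[2] = encodeDate("yyyy-MM-dd", "2001-11-09");   // date: epoch milliseconds

        System.out.println(Arrays.toString(vals));
    }
}
```

This is also why the order of the labels matters: the stored index only makes sense together with the label list declared on the attribute.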

The output is as follows:

@relation MyRelation

@attribute att1 numeric
@attribute att2 {val1,val2,val3,val4,val5}
@attribute att3 string
@attribute att4 date yyyy-MM-dd
@attribute att5 relational
@attribute att5.1 numeric
@attribute att5.2 {val5.1,val5.2,val5.3,val5.4,val5.5}
@end att5

@data
3.141593,val3,'This is a string!',2001-11-09,'3.718282,val5.4\n4.718282,val5.1'
2.718282,val1,'And another one!',2000-12-01,'3.718282,val5.4\n4.718282,val5.1'
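
In the @data section, the relational attribute is written as a single quoted field whose inner rows are separated by a literal \n. As a rough illustration of that layout, here is a small plain-JDK sketch that splits such a field back into rows and cells. RelationalValueSketch and parse are hypothetical names, and the parsing is deliberately naive: it assumes the inner values contain no commas, quotes, or escaped characters, so it is not a replacement for WEKA's own ARFF reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of WEKA: splits a quoted relational field.
final class RelationalValueSketch {

    // Strips the surrounding single quotes, splits on the literal "\n"
    // (backslash followed by 'n'), then splits each inner row on commas.
    static List<String[]> parse(String quoted) {
        String body = quoted;
        if (body.length() >= 2 && body.startsWith("'") && body.endsWith("'")) {
            body = body.substring(1, body.length() - 1);
        }
        List<String[]> rows = new ArrayList<>();
        for (String row : body.split("\\\\n")) {
            rows.add(row.split(","));
        }
        return rows;
    }

    public static void main(String[] args) {
        // The last field of the second data row, with the backslash escaped for Java.
        String field = "'3.718282,val5.4\\n4.718282,val5.1'";
        for (String[] row : parse(field)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```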
