Thursday, 01 May 2014
Hi all Rapidminers!

Please check what accuracy and output you get when you analyze the subject and body separately, without concatenating them. Apply weights to the subject and body in various combinations so that the two weights total 1.0. For example, if you apply a weight of 0.6 to the body, apply 0.4 to the subject. Note the accuracy you get with 10 validations, see how the test emails are classified, and compare the results. Please discuss the outputs here.
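
For anyone who wants to see the weighting idea outside RapidMiner, here is a rough Python sketch. It assumes that "applying a weight" means scaling each field's term-frequency vector before adding the two together; RapidMiner may combine the attributes differently. The function names (`term_frequencies`, `combine`, `weight_grid`) are made up for illustration.

```python
from collections import Counter


def term_frequencies(text):
    """Lower-case the text, split on whitespace, return relative term frequencies."""
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def combine(subject, body, subject_weight):
    """Blend subject and body vectors; the body weight is whatever is left to reach 1.0."""
    body_weight = 1.0 - subject_weight
    combined = Counter()
    for term, value in term_frequencies(subject).items():
        combined[term] += subject_weight * value
    for term, value in term_frequencies(body).items():
        combined[term] += body_weight * value
    return dict(combined)


def weight_grid(steps=10):
    """All (subject_weight, body_weight) pairs in increments of 1/steps."""
    return [(i / steps, (steps - i) / steps) for i in range(steps + 1)]


if __name__ == "__main__":
    for subject_weight, body_weight in weight_grid():
        vector = combine("network down", "the network is down again", subject_weight)
        print(subject_weight, body_weight, round(vector["network"], 3))
```

Each pair from `weight_grid()` corresponds to one run to try, for example 0.4 for subject and 0.6 for body.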
11 years ago
·
#417
Hi,

Please refer to the picture for the output I got. I got the best output under the following conditions:

1. Validations: 10
2. Subject weight: 0.9
3. Body weight: 0.1
4. Operators used under 'Process Documents from Data': Tokenize, Filter Stopwords, Filter Tokens by Length (lower limit: 3, upper limit: 999), and Filter Tokens by Content (string: www, with the 'inverse' condition).
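
The operator chain in item 4 can be approximated in plain Python as below. This is only a sketch: the stopword list is a tiny stand-in, the tokenizer simply splits on non-letters, the length bounds are assumed to be inclusive, and the content filter is assumed to do a "contains" match. The function names are illustrative, not RapidMiner's.

```python
import re

# Tiny illustrative stopword list (assumption); a real English list is much longer.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "you", "your"}


def tokenize(text):
    """Split on any run of non-letter characters and drop empty pieces."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def filter_stopwords(tokens):
    """Drop tokens that appear in the stopword list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in STOPWORDS]


def filter_by_length(tokens, min_chars=3, max_chars=999):
    """Keep tokens whose length lies within the (assumed inclusive) bounds."""
    return [token for token in tokens if min_chars <= len(token) <= max_chars]


def filter_by_content(tokens, string="www", invert=True):
    """With invert=True keep tokens that do NOT contain the string; otherwise keep those that do."""
    if invert:
        return [token for token in tokens if string not in token.lower()]
    return [token for token in tokens if string in token.lower()]


def preprocess(text):
    """Apply the four steps in the order listed above."""
    tokens = tokenize(text)
    tokens = filter_stopwords(tokens)
    tokens = filter_by_length(tokens)
    return filter_by_content(tokens)


if __name__ == "__main__":
    print(preprocess("The phone at www.example.com is down"))
```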

Please post your outputs so we can discuss them and decide on the best combination.
Under these conditions, 2 texts were correctly predicted as Davison.
11 years ago
·
#420
The results change under two different conditions:
1. When 'Filter Stopwords' and 'Stem (Porter)' are used, emails for Davison and the other categories are misclassified.
2. When 'Filter Tokens by Length' and 'Filter Tokens by Content' are used, the emails are classified more accurately.
11 years ago
·
#421
Stem (Porter) is clearly affecting the result, and funnily enough, in a negative way.
11 years ago
·
#422
Davison: 4
Phone & Network: 4
Web: 3
Others: 9
Weight to Subject: 0.9
Weight to Body: 0.1
Model Accuracy: 71.71% +/- 8.44% (mikro: 71.63%)
Screenshot2014-05-0117.17.04.png
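
A note on reading the accuracy line: as far as I understand it, the first figure is the mean of the per-fold accuracies with their spread, and "mikro" is the accuracy over all examples pooled across folds. The sketch below shows the difference. The fold counts are made up purely for illustration, and using the population standard deviation is an assumption on my part.

```python
from statistics import mean, pstdev


def summarize(folds):
    """folds: list of (correct, total) pairs, one per validation fold.

    Returns (mean of per-fold accuracies, their spread, pooled accuracy).
    """
    per_fold = [correct / total for correct, total in folds]
    macro = mean(per_fold)
    spread = pstdev(per_fold)  # population standard deviation (assumption)
    micro = sum(correct for correct, _ in folds) / sum(total for _, total in folds)
    return macro, spread, micro


if __name__ == "__main__":
    # Hypothetical fold results, not taken from the run above.
    example_folds = [(8, 10), (7, 10), (6, 11), (9, 11)]
    macro, spread, micro = summarize(example_folds)
    print(f"{macro:.2%} +/- {spread:.2%} (micro: {micro:.2%})")
```

The two averages differ slightly whenever the folds are not all the same size, which would explain a gap like 71.71% versus 71.63%.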
11 years ago
·
#695
Hi

It seems that the model is working and is automated, though in some cases the emails are misclassified. Still, this is the best result I am getting. See the XML files.

trainingdata_validationscheduled_unscheduled_xml.txt

test_output_xml.txt