C4.5 [release 8] decision tree generator    Fri Jun  1 10:14:45 2001
----------------------------------------

    Options:
        File stem

Read 4142 cases (57 attributes) from sb2.data.7.data

Decision Tree:

word_freq_remove <= 0 :
|   char_freq_$ <= 0.055 :
|   |   word_freq_000 <= 0.25 :
|   |   |   char_freq_! <= 0.378 :
|   |   |   |   word_freq_money <= 0.03 :
|   |   |   |   |   word_freq_font <= 0.12 :
|   |   |   |   |   |   word_freq_free <= 0.19 :
|   |   |   |   |   |   |   word_freq_george > 0 : 0 (600.0)
|   |   |   |   |   |   |   word_freq_george <= 0 :
|   |   |   |   |   |   |   |   word_freq_our <= 0.71 :
|   |   |   |   |   |   |   |   |   word_freq_hp > 0.02 : 0 (503.0/4.0)
|   |   |   |   |   |   |   |   |   word_freq_hp <= 0.02 :
|   |   |   |   |   |   |   |   |   |   word_freq_650 <= 0 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_business <= 0.08 :[S1]
|   |   |   |   |   |   |   |   |   |   |   word_freq_business > 0.08 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_hp > 0 : 1 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_hp <= 0 :[S2]
|   |   |   |   |   |   |   |   |   |   word_freq_650 > 0 :[S3]
|   |   |   |   |   |   |   |   word_freq_our > 0.71 :
|   |   |   |   |   |   |   |   |   word_freq_internet <= 0.5 :
|   |   |   |   |   |   |   |   |   |   word_freq_email <= 0.42 :
|   |   |   |   |   |   |   |   |   |   |   capital_run_length_average <= 3.675 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_receive <= 0.03 :
|   |   |   |   |   |   |   |   |   |   |   |   |   word_freq_edu <= 0.08 :[S4]
|   |   |   |   |   |   |   |   |   |   |   |   |   word_freq_edu > 0.08 :[S5]
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_receive > 0.03 :
|   |   |   |   |   |   |   |   |   |   |   |   |   word_freq_re <= 0.28 : 1 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   word_freq_re > 0.28 : 0 (2.0)
|   |   |   |   |   |   |   |   |   |   |   capital_run_length_average > 3.675 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_will <= 0.4 : 1 (6.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_will > 0.4 : 0 (2.0)
|   |   |   |   |   |   |   |   |   |   word_freq_email > 0.42 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_will <= 1.82 : 1 (4.0)
|   |   |   |   |   |   |   |   |   |   |   word_freq_will > 1.82 : 0 (2.0)
|   |   |   |   |   |   |   |   |   word_freq_internet > 0.5 :
|   |   |   |   |   |   |   |   |   |   word_freq_over <= 0.12 : 1 (8.0)
|   |   |   |   |   |   |   |   |   |   word_freq_over > 0.12 : 0 (2.0)
|   |   |   |   |   |   word_freq_free > 0.19 :
|   |   |   |   |   |   |   word_freq_george > 0.12 : 0 (27.0)
|   |   |   |   |   |   |   word_freq_george <= 0.12 :
|   |   |   |   |   |   |   |   word_freq_cs > 0.08 : 0 (8.0)
|   |   |   |   |   |   |   |   word_freq_cs <= 0.08 :
|   |   |   |   |   |   |   |   |   word_freq_project > 0.31 : 0 (8.0)
|   |   |   |   |   |   |   |   |   word_freq_project <= 0.31 :
|   |   |   |   |   |   |   |   |   |   word_freq_people > 0.48 : 0 (10.0)
|   |   |   |   |   |   |   |   |   |   word_freq_people <= 0.48 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_our <= 0.05 :[S6]
|   |   |   |   |   |   |   |   |   |   |   word_freq_our > 0.05 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_1999 > 0.32 : 0 (4.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_1999 <= 0.32 :[S7]
|   |   |   |   |   word_freq_font > 0.12 :
|   |   |   |   |   |   char_freq_; > 0.895 : 0 (14.0)
|   |   |   |   |   |   char_freq_; <= 0.895 :
|   |   |   |   |   |   |   word_freq_edu <= 0.09 : 1 (17.0)
|   |   |   |   |   |   |   word_freq_edu > 0.09 : 0 (4.0)
|   |   |   |   word_freq_money > 0.03 :
|   |   |   |   |   word_freq_hp > 0.08 : 0 (11.0)
|   |   |   |   |   word_freq_hp <= 0.08 :
|   |   |   |   |   |   word_freq_edu > 0.08 : 0 (9.0)
|   |   |   |   |   |   word_freq_edu <= 0.08 :
|   |   |   |   |   |   |   word_freq_project > 0.17 : 0 (4.0)
|   |   |   |   |   |   |   word_freq_project <= 0.17 :
|   |   |   |   |   |   |   |   capital_run_length_longest <= 9 : 0 (5.0/1.0)
|   |   |   |   |   |   |   |   capital_run_length_longest > 9 : 1 (29.0/1.0)
|   |   |   char_freq_! > 0.378 :
|   |   |   |   capital_run_length_total <= 64 :
|   |   |   |   |   word_freq_free > 0.77 : 1 (21.0/1.0)
|   |   |   |   |   word_freq_free <= 0.77 :
|   |   |   |   |   |   word_freq_hp > 0.52 : 0 (8.0)
|   |   |   |   |   |   word_freq_hp <= 0.52 :
|   |   |   |   |   |   |   capital_run_length_average <= 2.652 :
|   |   |   |   |   |   |   |   char_freq_! <= 0.824 :
|   |   |   |   |   |   |   |   |   capital_run_length_longest <= 8 : 0 (57.0)
|   |   |   |   |   |   |   |   |   capital_run_length_longest > 8 :[S8]
|   |   |   |   |   |   |   |   char_freq_! > 0.824 :
|   |   |   |   |   |   |   |   |   word_freq_business > 0.5 : 1 (5.0)
|   |   |   |   |   |   |   |   |   word_freq_business <= 0.5 :
|   |   |   |   |   |   |   |   |   |   word_freq_our > 0.37 : 1 (4.0/1.0)
|   |   |   |   |   |   |   |   |   |   word_freq_our <= 0.37 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_your > 0.47 : 0 (3.0)
|   |   |   |   |   |   |   |   |   |   |   word_freq_your <= 0.47 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_re > 0.34 : 0 (9.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_re <= 0.34 :
|   |   |   |   |   |   |   |   |   |   |   |   |   char_freq_! > 3.907 : 1 (5.0)
|   |   |   |   |   |   |   |   |   |   |   |   |   char_freq_! <= 3.907 :[S9]
|   |   |   |   |   |   |   capital_run_length_average > 2.652 :
|   |   |   |   |   |   |   |   word_freq_1999 > 0.64 : 0 (2.0)
|   |   |   |   |   |   |   |   word_freq_1999 <= 0.64 :
|   |   |   |   |   |   |   |   |   word_freq_re > 0.6 : 0 (2.0)
|   |   |   |   |   |   |   |   |   word_freq_re <= 0.6 :
|   |   |   |   |   |   |   |   |   |   word_freq_you > 3.2 : 1 (7.0)
|   |   |   |   |   |   |   |   |   |   word_freq_you <= 3.2 :[S10]
|   |   |   |   capital_run_length_total > 64 :
|   |   |   |   |   word_freq_pm > 0.04 : 0 (11.0/1.0)
|   |   |   |   |   word_freq_pm <= 0.04 :
|   |   |   |   |   |   word_freq_people <= 0.15 : 1 (136.0/7.0)
|   |   |   |   |   |   word_freq_people > 0.15 :
|   |   |   |   |   |   |   word_freq_business > 0.22 : 1 (11.0)
|   |   |   |   |   |   |   word_freq_business <= 0.22 :
|   |   |   |   |   |   |   |   capital_run_length_total <= 137 : 0 (7.0)
|   |   |   |   |   |   |   |   capital_run_length_total > 137 : 1 (4.0/1.0)
|   |   word_freq_000 > 0.25 :
|   |   |   char_freq_( > 0.6 : 0 (3.0)
|   |   |   char_freq_( <= 0.6 :
|   |   |   |   word_freq_original > 0.05 : 0 (3.0/1.0)
|   |   |   |   word_freq_original <= 0.05 :
|   |   |   |   |   word_freq_address > 0.01 : 1 (4.0)
|   |   |   |   |   word_freq_address <= 0.01 :
|   |   |   |   |   |   word_freq_report > 0.18 : 1 (10.0)
|   |   |   |   |   |   word_freq_report <= 0.18 :
|   |   |   |   |   |   |   word_freq_edu <= 0.16 : 1 (32.0)
|   |   |   |   |   |   |   word_freq_edu > 0.16 : 0 (2.0)
|   char_freq_$ > 0.055 :
|   |   word_freq_hp > 0.39 : 0 (57.0/1.0)
|   |   word_freq_hp <= 0.39 :
|   |   |   capital_run_length_longest <= 9 :
|   |   |   |   word_freq_email > 1.43 : 1 (7.0)
|   |   |   |   word_freq_email <= 1.43 :
|   |   |   |   |   word_freq_free <= 0.73 : 0 (21.0)
|   |   |   |   |   word_freq_free > 0.73 : 1 (4.0/1.0)
|   |   |   capital_run_length_longest > 9 :
|   |   |   |   word_freq_1999 <= 0 :
|   |   |   |   |   char_freq_( <= 0.282 :
|   |   |   |   |   |   capital_run_length_average > 3.266 : 1 (267.0)
|   |   |   |   |   |   capital_run_length_average <= 3.266 :
|   |   |   |   |   |   |   char_freq_! <= 0.003 :
|   |   |   |   |   |   |   |   word_freq_our > 0.13 : 1 (10.0)
|   |   |   |   |   |   |   |   word_freq_our <= 0.13 :
|   |   |   |   |   |   |   |   |   word_freq_your <= 0.33 : 1 (4.0/1.0)
|   |   |   |   |   |   |   |   |   word_freq_your > 0.33 : 0 (5.0)
|   |   |   |   |   |   |   char_freq_! > 0.003 :
|   |   |   |   |   |   |   |   char_freq_; > 0.048 : 1 (12.0/2.0)
|   |   |   |   |   |   |   |   char_freq_; <= 0.048 :
|   |   |   |   |   |   |   |   |   char_freq_[ > 0.005 : 1 (7.0/1.0)
|   |   |   |   |   |   |   |   |   char_freq_[ <= 0.005 :
|   |   |   |   |   |   |   |   |   |   char_freq_# <= 0.024 : 1 (108.0)
|   |   |   |   |   |   |   |   |   |   char_freq_# > 0.024 :
|   |   |   |   |   |   |   |   |   |   |   char_freq_# <= 0.056 : 0 (3.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   char_freq_# > 0.056 : 1 (18.0)
|   |   |   |   |   char_freq_( > 0.282 :
|   |   |   |   |   |   word_freq_hp > 0.11 : 0 (2.0)
|   |   |   |   |   |   word_freq_hp <= 0.11 :
|   |   |   |   |   |   |   word_freq_project > 0.01 : 0 (2.0)
|   |   |   |   |   |   |   word_freq_project <= 0.01 :
|   |   |   |   |   |   |   |   capital_run_length_average <= 2.2 : 0 (3.0)
|   |   |   |   |   |   |   |   capital_run_length_average > 2.2 : 1 (31.0/1.0)
|   |   |   |   word_freq_1999 > 0 :
|   |   |   |   |   word_freq_edu > 0.26 : 0 (8.0)
|   |   |   |   |   word_freq_edu <= 0.26 :
|   |   |   |   |   |   word_freq_conference > 0.32 : 0 (2.0)
|   |   |   |   |   |   word_freq_conference <= 0.32 :
|   |   |   |   |   |   |   word_freq_email <= 0.39 : 1 (17.0/1.0)
|   |   |   |   |   |   |   word_freq_email > 0.39 :
|   |   |   |   |   |   |   |   char_freq_! <= 0.1 : 1 (2.0)
|   |   |   |   |   |   |   |   char_freq_! > 0.1 : 0 (3.0)
word_freq_remove > 0 :
|   word_freq_hp <= 0.19 :
|   |   word_freq_edu <= 0.08 :
|   |   |   char_freq_! > 0.076 : 1 (540.0/7.0)
|   |   |   char_freq_! <= 0.076 :
|   |   |   |   word_freq_business > 0.05 : 1 (67.0)
|   |   |   |   word_freq_business <= 0.05 :
|   |   |   |   |   word_freq_george > 0.08 : 0 (6.0)
|   |   |   |   |   word_freq_george <= 0.08 :
|   |   |   |   |   |   word_freq_direct > 0.07 : 1 (4.0/1.0)
|   |   |   |   |   |   word_freq_direct <= 0.07 :
|   |   |   |   |   |   |   word_freq_our > 0.08 : 1 (42.0)
|   |   |   |   |   |   |   word_freq_our <= 0.08 :
|   |   |   |   |   |   |   |   word_freq_internet <= 0.18 : 1 (27.0/2.0)
|   |   |   |   |   |   |   |   word_freq_internet > 0.18 : 0 (5.0/1.0)
|   |   word_freq_edu > 0.08 :
|   |   |   word_freq_money <= 0.04 : 0 (7.0/1.0)
|   |   |   word_freq_money > 0.04 : 1 (18.0)
|   word_freq_hp > 0.19 :
|   |   char_freq_$ <= 0.028 : 0 (15.0/1.0)
|   |   char_freq_$ > 0.028 : 1 (12.0/2.0)

Subtree [S1]

capital_run_length_longest <= 12 : 0 (637.0/23.0)
capital_run_length_longest > 12 :
|   word_freq_edu > 0.04 : 0 (87.0/1.0)
|   word_freq_edu <= 0.04 :
|   |   word_freq_over > 0.63 : 1 (7.0/1.0)
|   |   word_freq_over <= 0.63 :
|   |   |   word_freq_original > 0.07 : 0 (7.0)
|   |   |   word_freq_original <= 0.07 :
|   |   |   |   word_freq_project > 0.54 : 0 (6.0)
|   |   |   |   word_freq_project <= 0.54 :
|   |   |   |   |   word_freq_hpl > 0.17 : 0 (8.0)
|   |   |   |   |   word_freq_hpl <= 0.17 :
|   |   |   |   |   |   char_freq_# > 0.011 : 0 (5.0)
|   |   |   |   |   |   char_freq_# <= 0.011 :
|   |   |   |   |   |   |   word_freq_people > 0.04 : 0 (5.0)
|   |   |   |   |   |   |   word_freq_people <= 0.04 :
|   |   |   |   |   |   |   |   word_freq_receive > 0.99 : 0 (8.0)
|   |   |   |   |   |   |   |   word_freq_receive <= 0.99 :
|   |   |   |   |   |   |   |   |   word_freq_conference > 0.26 : 0 (5.0)
|   |   |   |   |   |   |   |   |   word_freq_conference <= 0.26 :
|   |   |   |   |   |   |   |   |   |   word_freq_re > 0.24 : 0 (12.0)
|   |   |   |   |   |   |   |   |   |   word_freq_re <= 0.24 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_meeting > 0.04 : 0 (8.0)
|   |   |   |   |   |   |   |   |   |   |   word_freq_meeting <= 0.04 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_our > 0.04 : 1 (4.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_our <= 0.04 :[S11]

Subtree [S2]

word_freq_make > 0.2 : 1 (4.0)
word_freq_make <= 0.2 :
|   char_freq_! > 0.112 : 1 (2.0)
|   char_freq_! <= 0.112 :
|   |   word_freq_over <= 0.3 : 0 (21.0)
|   |   word_freq_over > 0.3 : 1 (4.0/1.0)

Subtree [S3]

capital_run_length_longest <= 20 : 0 (9.0)
capital_run_length_longest > 20 :
|   word_freq_make <= 0.02 : 1 (14.0/1.0)
|   word_freq_make > 0.02 : 0 (3.0)

Subtree [S4]

word_freq_650 > 0.26 : 0 (5.0)
word_freq_650 <= 0.26 :
|   word_freq_technology > 0.03 : 0 (5.0)
|   word_freq_technology <= 0.03 :
|   |   capital_run_length_longest <= 10 : 0 (58.0/2.0)
|   |   capital_run_length_longest > 10 :
|   |   |   word_freq_hp > 0.14 : 0 (9.0)
|   |   |   word_freq_hp <= 0.14 :
|   |   |   |   word_freq_meeting <= 0.07 : 1 (6.0/1.0)
|   |   |   |   word_freq_meeting > 0.07 : 0 (3.0)

Subtree [S5]

word_freq_mail <= 0.36 : 0 (3.0)
word_freq_mail > 0.36 : 1 (2.0)

Subtree [S6]

word_freq_business > 0.35 : 1 (7.0/1.0)
word_freq_business <= 0.35 :
|   capital_run_length_longest > 44 : 0 (25.0)
|   capital_run_length_longest <= 44 :
|   |   word_freq_meeting > 0.4 : 0 (3.0)
|   |   word_freq_meeting <= 0.4 :
|   |   |   word_freq_make > 0.12 : 0 (4.0)
|   |   |   word_freq_make <= 0.12 :
|   |   |   |   word_freq_credit > 0.49 : 1 (2.0)
|   |   |   |   word_freq_credit <= 0.49 :
|   |   |   |   |   word_freq_receive > 0.12 : 1 (2.0)
|   |   |   |   |   word_freq_receive <= 0.12 :
|   |   |   |   |   |   word_freq_telnet > 0.13 : 0 (2.0)
|   |   |   |   |   |   word_freq_telnet <= 0.13 :
|   |   |   |   |   |   |   word_freq_85 > 0.3 : 0 (2.0)
|   |   |   |   |   |   |   word_freq_85 <= 0.3 :
|   |   |   |   |   |   |   |   word_freq_all > 0.11 : 1 (5.0)
|   |   |   |   |   |   |   |   word_freq_all <= 0.11 :
|   |   |   |   |   |   |   |   |   word_freq_internet > 0.37 : 1 (2.0)
|   |   |   |   |   |   |   |   |   word_freq_internet <= 0.37 :
|   |   |   |   |   |   |   |   |   |   word_freq_free <= 1.1 : 0 (9.0)
|   |   |   |   |   |   |   |   |   |   word_freq_free > 1.1 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_pm > 0.7 : 0 (2.0)
|   |   |   |   |   |   |   |   |   |   |   word_freq_pm <= 0.7 :
|   |   |   |   |   |   |   |   |   |   |   |   char_freq_! > 0.102 : 1 (2.0)
|   |   |   |   |   |   |   |   |   |   |   |   char_freq_! <= 0.102 :[S12]

Subtree [S7]

word_freq_direct > 0.06 : 1 (4.0)
word_freq_direct <= 0.06 :
|   word_freq_project > 0.01 : 1 (3.0)
|   word_freq_project <= 0.01 :
|   |   word_freq_meeting > 0.93 : 0 (2.0)
|   |   word_freq_meeting <= 0.93 :
|   |   |   char_freq_! <= 0.289 : 1 (36.0/4.0)
|   |   |   char_freq_! > 0.289 : 0 (2.0)

Subtree [S8]

capital_run_length_longest > 9 : 0 (14.0)
capital_run_length_longest <= 9 :
|   word_freq_85 > 1.51 : 0 (2.0)
|   word_freq_85 <= 1.51 :
|   |   word_freq_will <= 1.61 : 1 (2.0)
|   |   word_freq_will > 1.61 : 0 (2.0)

Subtree [S9]

word_freq_you <= 1.66 : 0 (17.0/1.0)
word_freq_you > 1.66 :
|   char_freq_# > 0.207 : 1 (2.0/1.0)
|   char_freq_# <= 0.207 :
|   |   char_freq_! <= 1.118 : 1 (2.0)
|   |   char_freq_! > 1.118 : 0 (3.0/1.0)

Subtree [S10]

capital_run_length_average <= 3.578 : 0 (6.0/1.0)
capital_run_length_average > 3.578 : 1 (3.0)

Subtree [S11]

word_freq_technology > 0.42 : 1 (3.0)
word_freq_technology <= 0.42 :
|   word_freq_technology > 0.1 : 0 (4.0)
|   word_freq_technology <= 0.1 :
|   |   word_freq_you > 6.06 : 1 (4.0)
|   |   word_freq_you <= 6.06 :
|   |   |   word_freq_1999 > 0.24 : 0 (4.0)
|   |   |   word_freq_1999 <= 0.24 :
|   |   |   |   word_freq_make > 1.88 : 1 (2.0)
|   |   |   |   word_freq_make <= 1.88 :
|   |   |   |   |   word_freq_make > 0.35 : 0 (2.0)
|   |   |   |   |   word_freq_make <= 0.35 :
|   |   |   |   |   |   word_freq_85 > 0.15 : 0 (2.0)
|   |   |   |   |   |   word_freq_85 <= 0.15 :
|   |   |   |   |   |   |   word_freq_cs > 0.53 : 0 (2.0)
|   |   |   |   |   |   |   word_freq_cs <= 0.53 :
|   |   |   |   |   |   |   |   capital_run_length_average > 5.236 : 0 (8.0)
|   |   |   |   |   |   |   |   capital_run_length_average <= 5.236 :
|   |   |   |   |   |   |   |   |   capital_run_length_longest > 29 : 1 (6.0)
|   |   |   |   |   |   |   |   |   capital_run_length_longest <= 29 :[S13]

Subtree [S12]

word_freq_your > 1.19 : 1 (2.0)
word_freq_your <= 1.19 :
|   word_freq_you > 1.85 : 0 (2.0)
|   word_freq_you <= 1.85 :
|   |   capital_run_length_total <= 13 : 1 (2.0)
|   |   capital_run_length_total > 13 : 0 (3.0/1.0)

Subtree [S13]

capital_run_length_average <= 2.823 : 0 (8.0)
capital_run_length_average > 2.823 :
|   word_freq_you > 2.16 : 1 (2.0)
|   word_freq_you <= 2.16 :
|   |   capital_run_length_longest <= 17 : 1 (4.0/1.0)
|   |   capital_run_length_longest > 17 : 0 (2.0)

Simplified Decision Tree:

word_freq_remove <= 0 :
|   char_freq_$ <= 0.055 :
|   |   word_freq_000 <= 0.25 :
|   |   |   char_freq_! <= 0.378 :
|   |   |   |   word_freq_money <= 0.03 :
|   |   |   |   |   word_freq_font <= 0.12 :
|   |   |   |   |   |   word_freq_free <= 0.19 :
|   |   |   |   |   |   |   word_freq_george > 0 : 0 (600.0/1.4)
|   |   |   |   |   |   |   word_freq_george <= 0 :
|   |   |   |   |   |   |   |   word_freq_our <= 0.71 :
|   |   |   |   |   |   |   |   |   word_freq_hp > 0.02 : 0 (503.0/6.2)
|   |   |   |   |   |   |   |   |   word_freq_hp <= 0.02 :
|   |   |   |   |   |   |   |   |   |   word_freq_650 <= 0 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_business <= 0.08 :[S1]
|   |   |   |   |   |   |   |   |   |   |   word_freq_business > 0.08 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_hp > 0 : 1 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_hp <= 0 :[S2]
|   |   |   |   |   |   |   |   |   |   word_freq_650 > 0 :[S3]
|   |   |   |   |   |   |   |   word_freq_our > 0.71 :
|   |   |   |   |   |   |   |   |   word_freq_internet <= 0.5 :
|   |   |   |   |   |   |   |   |   |   word_freq_email <= 0.42 :[S4]
|   |   |   |   |   |   |   |   |   |   word_freq_email > 0.42 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_will <= 1.82 : 1 (4.0/1.2)
|   |   |   |   |   |   |   |   |   |   |   word_freq_will > 1.82 : 0 (2.0/1.0)
|   |   |   |   |   |   |   |   |   word_freq_internet > 0.5 :
|   |   |   |   |   |   |   |   |   |   word_freq_over <= 0.12 : 1 (8.0/1.3)
|   |   |   |   |   |   |   |   |   |   word_freq_over > 0.12 : 0 (2.0/1.0)
|   |   |   |   |   |   word_freq_free > 0.19 :
|   |   |   |   |   |   |   word_freq_george > 0.12 : 0 (27.0/1.4)
|   |   |   |   |   |   |   word_freq_george <= 0.12 :
|   |   |   |   |   |   |   |   word_freq_cs > 0.08 : 0 (8.0/1.3)
|   |   |   |   |   |   |   |   word_freq_cs <= 0.08 :
|   |   |   |   |   |   |   |   |   word_freq_project > 0.31 : 0 (8.0/1.3)
|   |   |   |   |   |   |   |   |   word_freq_project <= 0.31 :
|   |   |   |   |   |   |   |   |   |   word_freq_people > 0.48 : 0 (10.0/1.3)
|   |   |   |   |   |   |   |   |   |   word_freq_people <= 0.48 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_our <= 0.05 :[S5]
|   |   |   |   |   |   |   |   |   |   |   word_freq_our > 0.05 :
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_1999 > 0.32 : 0 (4.0/1.2)
|   |   |   |   |   |   |   |   |   |   |   |   word_freq_1999 <= 0.32 :[S6]
|   |   |   |   |   word_freq_font > 0.12 :
|   |   |   |   |   |   char_freq_; > 0.895 : 0 (14.0/1.3)
|   |   |   |   |   |   char_freq_; <= 0.895 :
|   |   |   |   |   |   |   word_freq_edu <= 0.09 : 1 (17.0/1.3)
|   |   |   |   |   |   |   word_freq_edu > 0.09 : 0 (4.0/1.2)
|   |   |   |   word_freq_money > 0.03 :
|   |   |   |   |   word_freq_hp > 0.08 : 0 (11.0/1.3)
|   |   |   |   |   word_freq_hp <= 0.08 :
|   |   |   |   |   |   word_freq_edu > 0.08 : 0 (9.0/1.3)
|   |   |   |   |   |   word_freq_edu <= 0.08 :
|   |   |   |   |   |   |   word_freq_project > 0.17 : 0 (4.0/1.2)
|   |   |   |   |   |   |   word_freq_project <= 0.17 :
|   |   |   |   |   |   |   |   capital_run_length_longest <= 9 : 0 (5.0/2.3)
|   |   |   |   |   |   |   |   capital_run_length_longest > 9 : 1 (29.0/2.6)
|   |   |   char_freq_! > 0.378 :
|   |   |   |   capital_run_length_total <= 64 :
|   |   |   |   |   word_freq_free > 0.77 : 1 (21.0/2.5)
|   |   |   |   |   word_freq_free <= 0.77 :
|   |   |   |   |   |   capital_run_length_average <= 2.652 :
|   |   |   |   |   |   |   char_freq_! <= 0.824 : 0 (83.0/3.8)
|   |   |   |   |   |   |   char_freq_! > 0.824 :
|   |   |   |   |   |   |   |   word_freq_business > 0.5 : 1 (5.0/1.2)
|   |   |   |   |   |   |   |   word_freq_business <= 0.5 :
|   |   |   |   |   |   |   |   |   word_freq_our > 0.37 : 1 (4.0/2.2)
|   |   |   |   |   |   |   |   |   word_freq_our <= 0.37 :
|   |   |   |   |   |   |   |   |   |   word_freq_re > 0.34 : 0 (11.0/1.3)
|   |   |   |   |   |   |   |   |   |   word_freq_re <= 0.34 :
|   |   |   |   |   |   |   |   |   |   |   char_freq_! <= 3.907 : 0 (27.0/7.1)
|   |   |   |   |   |   |   |   |   |   |   char_freq_! > 3.907 : 1 (5.0/1.2)
|   |   |   |   |   |   capital_run_length_average > 2.652 :
|   |   |   |   |   |   |   word_freq_1999 > 0.64 : 0 (2.0/1.0)
|   |   |   |   |   |   |   word_freq_1999 <= 0.64 :
|   |   |   |   |   |   |   |   word_freq_re > 0.6 : 0 (2.0/1.0)
|   |   |   |   |   |   |   |   word_freq_re <= 0.6 :
|   |   |   |   |   |   |   |   |   word_freq_you > 3.2 : 1 (7.0/1.3)
|   |   |   |   |   |   |   |   |   word_freq_you <= 3.2 :[S7]
|   |   |   |   capital_run_length_total > 64 :
|   |   |   |   |   word_freq_pm > 0.04 : 0 (11.0/2.5)
|   |   |   |   |   word_freq_pm <= 0.04 :
|   |   |   |   |   |   word_freq_people <= 0.15 : 1 (136.0/9.6)
|   |   |   |   |   |   word_freq_people > 0.15 :
|   |   |   |   |   |   |   word_freq_business > 0.22 : 1 (11.0/1.3)
|   |   |   |   |   |   |   word_freq_business <= 0.22 :
|   |   |   |   |   |   |   |   capital_run_length_total <= 137 : 0 (7.0/1.3)
|   |   |   |   |   |   |   |   capital_run_length_total > 137 : 1 (4.0/2.2)
|   |   word_freq_000 > 0.25 :
|   |   |   char_freq_( > 0.6 : 0 (3.0/1.1)
|   |   |   char_freq_( <= 0.6 :
|   |   |   |   word_freq_original > 0.05 : 0 (3.0/2.1)
|   |   |   |   word_freq_original <= 0.05 :
|   |   |   |   |   word_freq_edu <= 0.16 : 1 (46.0/1.4)
|   |   |   |   |   word_freq_edu > 0.16 : 0 (2.0/1.0)
|   char_freq_$ > 0.055 :
|   |   word_freq_hp > 0.39 : 0 (57.0/2.6)
|   |   word_freq_hp <= 0.39 :
|   |   |   capital_run_length_longest <= 9 :
|   |   |   |   word_freq_email > 1.43 : 1 (7.0/1.3)
|   |   |   |   word_freq_email <= 1.43 :
|   |   |   |   |   word_freq_free <= 0.73 : 0 (21.0/1.3)
|   |   |   |   |   word_freq_free > 0.73 : 1 (4.0/2.2)
|   |   |   capital_run_length_longest > 9 :
|   |   |   |   word_freq_1999 <= 0 :
|   |   |   |   |   char_freq_( <= 0.282 :
|   |   |   |   |   |   capital_run_length_average > 3.266 : 1 (267.0/1.4)
|   |   |   |   |   |   capital_run_length_average <= 3.266 :
|   |   |   |   |   |   |   char_freq_! > 0.003 : 1 (148.0/7.3)
|   |   |   |   |   |   |   char_freq_! <= 0.003 :
|   |   |   |   |   |   |   |   word_freq_our > 0.13 : 1 (10.0/1.3)
|   |   |   |   |   |   |   |   word_freq_our <= 0.13 :
|   |   |   |   |   |   |   |   |   word_freq_your <= 0.33 : 1 (4.0/2.2)
|   |   |   |   |   |   |   |   |   word_freq_your > 0.33 : 0 (5.0/1.2)
|   |   |   |   |   char_freq_( > 0.282 :
|   |   |   |   |   |   word_freq_hp > 0.11 : 0 (2.0/1.0)
|   |   |   |   |   |   word_freq_hp <= 0.11 :
|   |   |   |   |   |   |   word_freq_project > 0.01 : 0 (2.0/1.0)
|   |   |   |   |   |   |   word_freq_project <= 0.01 :
|   |   |   |   |   |   |   |   capital_run_length_average <= 2.2 : 0 (3.0/1.1)
|   |   |   |   |   |   |   |   capital_run_length_average > 2.2 : 1 (31.0/2.6)
|   |   |   |   word_freq_1999 > 0 :
|   |   |   |   |   word_freq_edu > 0.26 : 0 (8.0/1.3)
|   |   |   |   |   word_freq_edu <= 0.26 :
|   |   |   |   |   |   word_freq_conference > 0.32 : 0 (2.0/1.0)
|   |   |   |   |   |   word_freq_conference <= 0.32 :
|   |   |   |   |   |   |   word_freq_email <= 0.39 : 1 (17.0/2.5)
|   |   |   |   |   |   |   word_freq_email > 0.39 :
|   |   |   |   |   |   |   |   char_freq_! <= 0.1 : 1 (2.0/1.0)
|   |   |   |   |   |   |   |   char_freq_! > 0.1 : 0 (3.0/1.1)
word_freq_remove > 0 :
|   word_freq_hp <= 0.19 :
|   |   word_freq_edu <= 0.08 :
|   |   |   char_freq_! > 0.076 : 1 (540.0/9.6)
|   |   |   char_freq_! <= 0.076 :
|   |   |   |   word_freq_business > 0.05 : 1 (67.0/1.4)
|   |   |   |   word_freq_business <= 0.05 :
|   |   |   |   |   word_freq_george > 0.08 : 0 (6.0/1.2)
|   |   |   |   |   word_freq_george <= 0.08 :
|   |   |   |   |   |   word_freq_our > 0.08 : 1 (45.0/2.6)
|   |   |   |   |   |   word_freq_our <= 0.08 :
|   |   |   |   |   |   |   word_freq_internet <= 0.18 : 1 (28.0/3.7)
|   |   |   |   |   |   |   word_freq_internet > 0.18 : 0 (5.0/2.3)
|   |   word_freq_edu > 0.08 :
|   |   |   word_freq_money <= 0.04 : 0 (7.0/2.4)
|   |   |   word_freq_money > 0.04 : 1 (18.0/1.3)
|   word_freq_hp > 0.19 :
|   |   char_freq_$ <= 0.028 : 0 (15.0/2.5)
|   |   char_freq_$ > 0.028 : 1 (12.0/3.6)

Subtree [S1]

capital_run_length_longest <= 12 : 0 (637.0/27.0)
capital_run_length_longest > 12 :
|   word_freq_edu > 0.04 : 0 (87.0/2.6)
|   word_freq_edu <= 0.04 :
|   |   word_freq_over > 0.63 : 1 (7.0/2.4)
|   |   word_freq_over <= 0.63 :
|   |   |   word_freq_project > 0.54 : 0 (6.0/1.2)
|   |   |   word_freq_project <= 0.54 :
|   |   |   |   word_freq_hpl > 0.17 : 0 (10.0/1.3)
|   |   |   |   word_freq_hpl <= 0.17 :
|   |   |   |   |   word_freq_conference > 0.26 : 0 (6.0/1.2)
|   |   |   |   |   word_freq_conference <= 0.26 :
|   |   |   |   |   |   word_freq_re > 0.24 : 0 (16.0/1.3)
|   |   |   |   |   |   word_freq_re <= 0.24 :
|   |   |   |   |   |   |   word_freq_meeting > 0.04 : 0 (10.0/1.3)
|   |   |   |   |   |   |   word_freq_meeting <= 0.04 :
|   |   |   |   |   |   |   |   word_freq_technology > 0.42 : 1 (3.0/1.1)
|   |   |   |   |   |   |   |   word_freq_technology <= 0.42 :
|   |   |   |   |   |   |   |   |   word_freq_technology > 0.1 : 0 (4.0/1.2)
|   |   |   |   |   |   |   |   |   word_freq_technology <= 0.1 :
|   |   |   |   |   |   |   |   |   |   word_freq_you > 6.06 : 1 (4.0/1.2)
|   |   |   |   |   |   |   |   |   |   word_freq_you <= 6.06 :
|   |   |   |   |   |   |   |   |   |   |   word_freq_85 > 0.15 : 0 (2.0/1.0)
|   |   |   |   |   |   |   |   |   |   |   word_freq_85 <= 0.15 :[S8]

Subtree [S2]

word_freq_make > 0.2 : 1 (4.0/1.2)
word_freq_make <= 0.2 :
|   char_freq_! > 0.112 : 1 (2.0/1.0)
|   char_freq_! <= 0.112 :
|   |   word_freq_over <= 0.3 : 0 (21.0/1.3)
|   |   word_freq_over > 0.3 : 1 (4.0/2.2)

Subtree [S3]

capital_run_length_longest <= 20 : 0 (9.0/1.3)
capital_run_length_longest > 20 :
|   word_freq_make <= 0.02 : 1 (14.0/2.5)
|   word_freq_make > 0.02 : 0 (3.0/1.1)

Subtree [S4]

capital_run_length_longest <= 10 : 0 (64.0/5.0)
capital_run_length_longest > 10 :
|   word_freq_hp > 0.14 : 0 (19.0/1.3)
|   word_freq_hp <= 0.14 :
|   |   word_freq_meeting <= 0.07 : 1 (18.0/5.9)
|   |   word_freq_meeting > 0.07 : 0 (3.0/1.1)

Subtree [S5]

word_freq_business > 0.35 : 1 (7.0/2.4)
word_freq_business <= 0.35 :
|   capital_run_length_longest > 44 : 0 (25.0/1.3)
|   capital_run_length_longest <= 44 :
|   |   word_freq_meeting > 0.4 : 0 (3.0/1.1)
|   |   word_freq_meeting <= 0.4 :
|   |   |   word_freq_make > 0.12 : 0 (4.0/1.2)
|   |   |   word_freq_make <= 0.12 :
|   |   |   |   word_freq_85 > 0.3 : 0 (2.0/1.0)
|   |   |   |   word_freq_85 <= 0.3 :
|   |   |   |   |   word_freq_all > 0.11 : 1 (6.0/1.2)
|   |   |   |   |   word_freq_all <= 0.11 :
|   |   |   |   |   |   word_freq_free <= 1.1 : 0 (10.0/1.3)
|   |   |   |   |   |   word_freq_free > 1.1 :
|   |   |   |   |   |   |   word_freq_pm <= 0.7 : 1 (17.0/6.9)
|   |   |   |   |   |   |   word_freq_pm > 0.7 : 0 (2.0/1.0)

Subtree [S6]

word_freq_meeting > 0.93 : 0 (2.0/1.0)
word_freq_meeting <= 0.93 :
|   char_freq_! <= 0.289 : 1 (43.0/6.1)
|   char_freq_! > 0.289 : 0 (2.0/1.0)

Subtree [S7]

capital_run_length_average <= 3.578 : 0 (6.0/2.3)
capital_run_length_average > 3.578 : 1 (3.0/1.1)

Subtree [S8]

capital_run_length_average > 5.236 : 0 (19.0/2.5)
capital_run_length_average <= 5.236 :
|   capital_run_length_longest > 29 : 1 (10.0/2.4)
|   capital_run_length_longest <= 29 :
|   |   capital_run_length_average <= 2.823 : 0 (17.0/1.3)
|   |   capital_run_length_average > 2.823 :
|   |   |   word_freq_you > 2.16 : 1 (2.0/1.0)
|   |   |   word_freq_you <= 2.16 :
|   |   |   |   capital_run_length_longest <= 17 : 1 (7.0/3.4)
|   |   |   |   capital_run_length_longest > 17 : 0 (5.0/1.2)

Tree saved

Evaluation on training data (4142 items):

     Before Pruning           After Pruning
    ----------------   ---------------------------
    Size      Errors   Size      Errors   Estimate

     305   84( 2.0%)    219   99( 2.4%)    ( 6.0%)   <<
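
Note on reading the figures above: a minimal Python sketch, not part of the C4.5 output, that pulls the class label and the bracketed counts out of a leaf line and re-derives the training error percentages in the evaluation summary. The helper names `parse_leaf` and `error_rate_percent` are hypothetical, and the sketch assumes every leaf has the shape `test : class (N)` or `test : class (N/E)`; it makes no claim about what C4.5 itself does internally.

```python
import re

# Assumed leaf shape: "<test> : <class> (N)" or "<test> : <class> (N/E)".
# Internal nodes ("... :") and subtree references ("... :[S1]") do not match.
LEAF_RE = re.compile(r":\s*(\S+)\s*\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def parse_leaf(line):
    """Return (class_label, first_count, second_count) for a leaf line, else None.

    A missing second count, as in "(600.0)", is reported as 0.0.
    """
    match = LEAF_RE.search(line)
    if match is None:
        return None
    label, first, second = match.groups()
    return label, float(first), float(second) if second else 0.0


def error_rate_percent(errors, cases):
    """Error count as a percentage of all cases, rounded to one decimal."""
    return round(100.0 * errors / cases, 1)


# Figures copied from the evaluation summary: 4142 items, 84 and 99 errors.
print(parse_leaf("|   word_freq_hp > 0.02 : 0 (503.0/4.0)"))
print(error_rate_percent(84, 4142), error_rate_percent(99, 4142))
```

The two rates reproduce the 2.0% and 2.4% printed in the summary table; the 6.0% estimate column is not recomputed here because the log does not show how it was obtained.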