1
00:00:17,210 --> 00:00:25,940
Hi! Welcome back! In the last lesson, we looked
at linear regression -- the problem of predicting,

2
00:00:25,949 --> 00:00:30,500
not a nominal class value, but a numeric class
value.

3
00:00:30,500 --> 00:00:31,710
The regression problem.

4
00:00:31,710 --> 00:00:37,370
In this lesson, we're going to look at how
to use regression techniques for classification.

5
00:00:37,370 --> 00:00:42,589
It sounds a bit weird, but regression techniques
can be really good under certain circumstances,

6
00:00:42,589 --> 00:00:47,399
and we're going to see if we can apply them
to ordinary classification problems.

7
00:00:47,399 --> 00:00:50,670
In a 2-class problem, it's quite easy really.

8
00:00:50,670 --> 00:00:57,039
We're going to call the 2 classes 0 and 1
and just use those as numbers, and then come

9
00:00:57,039 --> 00:01:06,460
up with a regression line that, presumably,
for most 0 instances has a pretty low value,

10
00:01:06,460 --> 00:01:11,390
and for most 1 instances has a larger value,
and then come up with a threshold for determining

11
00:01:11,390 --> 00:01:15,909
whether, if it's less than that threshold,
we're going to predict class 0; if it's greater,

12
00:01:15,909 --> 00:01:18,600
we're going to predict class 1.

13
00:01:18,600 --> 00:01:23,820
If we want to generalize that to more than
2 classes, we can use a separate regression

14
00:01:23,820 --> 00:01:25,350
for each class.

15
00:01:25,350 --> 00:01:31,270
We set the output to 1 for instances that
belong to the class, and 0 for instances that don't.

16
00:01:31,270 --> 00:01:36,250
Then come up with a separate regression line
for each class, and given an unknown test

17
00:01:36,250 --> 00:01:40,229
example, we're going to choose a class with
the largest output.

18
00:01:41,400 --> 00:01:48,990
That would give us n regressions for a problem
where there are n different classes.

19
00:01:48,990 --> 00:01:53,990
We could alternatively use pairwise regression:
take every pair of classes -- that's about n squared

20
00:01:53,990 --> 00:02:00,500
over 2 -- and have a linear regression line
for each pair of classes, discriminating an

21
00:02:00,500 --> 00:02:05,770
instance in one class of that pair from the
other class of that pair.

22
00:02:05,770 --> 00:02:11,070
We're going to work with a 2-class problem,
and we're going to investigate 2-class classification

23
00:02:11,070 --> 00:02:12,549
by regression.

24
00:02:12,549 --> 00:02:19,549
I'm going to open diabetes.arff.

25
00:02:19,540 --> 00:02:21,680
Then I'm going to convert the class.

26
00:02:21,680 --> 00:02:25,820
Actually, let's just try to apply regression
to this.

27
00:02:25,820 --> 00:02:28,950
I'm going to try LinearRegression.

28
00:02:28,950 --> 00:02:30,570
You see it's grayed out here.

29
00:02:30,570 --> 00:02:32,290
That means it's not applicable.

30
00:02:32,290 --> 00:02:37,800
I can select it, but I can't start it.

31
00:02:37,800 --> 00:02:43,580
It's not applicable because linear regression
applies to a dataset where the class is numeric,

32
00:02:43,580 --> 00:02:48,320
and we've got a dataset where the class is
nominal.

33
00:02:48,320 --> 00:02:49,300
We need to fix that.

34
00:02:49,300 --> 00:02:55,670
We're going to change this from these 2 labels
to 0 and 1, respectively.

35
00:02:55,670 --> 00:02:58,320
We'll do that with a filter.

36
00:02:58,320 --> 00:03:00,200
We want to change an attribute.

37
00:03:00,200 --> 00:03:04,140
It's unsupervised.

38
00:03:04,140 --> 00:03:10,460
We want to change a nominal to a binary attribute,
so that's the NominalToBinary filter.

39
00:03:10,460 --> 00:03:12,460
We want to apply that to the 9th attribute.

40
00:03:12,460 --> 00:03:18,640
The default will apply it to all the attributes,
but we just want to apply it to the 9th attribute.

41
00:03:18,640 --> 00:03:22,680
I'm hoping it will change this attribute from
nominal to binary.

42
00:03:22,680 --> 00:03:24,110
Unfortunately, it doesn't.

43
00:03:24,110 --> 00:03:29,670
It doesn't have any effect, and the reason
it doesn't have any effect is because these

44
00:03:29,670 --> 00:03:34,450
attribute filters don't work on the class
value.

45
00:03:34,450 --> 00:03:41,420
I can change the class value; we're going
to give this "No class", so now this is not

46
00:03:41,420 --> 00:03:43,620
the class value for the dataset.

47
00:03:43,620 --> 00:03:46,210
Run the filter again.

48
00:03:46,210 --> 00:03:51,490
Now I've got what I want: this attribute "class"
is either 0 or 1.

49
00:03:51,490 --> 00:03:56,490
In fact, this is the histogram -- there are
this number of 0's and this number of 1's,

50
00:03:56,490 --> 00:03:59,800
which correspond to the two different values
in the original dataset.

51
00:04:01,230 --> 00:04:09,860
Now, we've got our LinearRegression, and we
can just run it.

52
00:04:09,860 --> 00:04:11,240
This is the regression line.

53
00:04:11,240 --> 00:04:19,050
It's a line, 0.02 times the "pregnancy" attribute,
plus this times the "plas" attribute, and

54
00:04:19,050 --> 00:04:22,110
so on, plus this times the "age" attribute,
plus this number.

55
00:04:22,110 --> 00:04:26,390
That will give us a number for any given instance.

56
00:04:26,390 --> 00:04:32,610
We can see that number if we select "Output
predictions" and run it again.

57
00:04:33,790 --> 00:04:39,300
Here is a table of predictions for each instance
in the dataset.

58
00:04:39,300 --> 00:04:46,070
This is the instance number; this is the actual
class of the instance, which is 0 or 1; this

59
00:04:46,070 --> 00:04:49,770
is the predicted class, which is a number
-- sometimes it's less than 0.

60
00:04:49,770 --> 00:04:55,690
We would hope that these numbers are generally
fairly small for 0's and generally larger

61
00:04:55,690 --> 00:04:56,570
for 1's.

62
00:04:56,570 --> 00:05:00,730
They sort of are, although it's not really
easy to tell.

63
00:05:00,730 --> 00:05:07,730
This is the error value here in the fourth
column.

64
00:05:09,500 --> 00:05:13,580
I'm going to do more extensive investigation,
and you might ask why are we bothering to

65
00:05:13,580 --> 00:05:17,440
do this? First of all, it's an interesting
idea that I want to explore.

66
00:05:17,440 --> 00:05:21,540
It will lead to quite good performance for
classification by regression, and it will

67
00:05:21,540 --> 00:05:28,200
lead into the next lesson on logistic regression,
which is an excellent classification technique.

68
00:05:28,200 --> 00:05:33,600
Perhaps most importantly, we'll learn how
to do some cool things with the Weka interface.

69
00:05:33,600 --> 00:05:39,670
My strategy is to add a new attribute called
"classification" that gives this predicted

70
00:05:39,670 --> 00:05:46,710
number, and then we're going to use OneR to
optimize a split point for the two classes.

71
00:05:46,710 --> 00:05:51,170
We'll have to restore the class back to its
original nominal value, because, remember,

72
00:05:51,170 --> 00:05:54,920
I just converted it to numeric.

73
00:05:54,920 --> 00:05:56,350
Here it is in detail.

74
00:05:56,350 --> 00:06:01,800
We're going to use a supervised attribute
filter [AddClassification].

75
00:06:01,800 --> 00:06:07,480
This is actually pretty cool, I think.

76
00:06:07,480 --> 00:06:12,250
We're going to add a new attribute called
"classification".

77
00:06:12,250 --> 00:06:20,390
We're going to choose a classifier for that
-- LinearRegression.

78
00:06:20,390 --> 00:06:24,700
We need to set "outputClassification" to "True".

79
00:06:24,700 --> 00:06:28,730
If we just run this, it will add a new attribute
to the dataset.

80
00:06:28,730 --> 00:06:34,540
It's called "classification", and it's got
these numeric values, which correspond exactly

81
00:06:34,540 --> 00:06:41,540
to the numeric values that were predicted
here by the linear regression scheme.

82
00:06:43,390 --> 00:06:49,200
Now, we've got this "classification" attribute,
and what I'd like to do now is to convert

83
00:06:49,200 --> 00:06:52,320
the class attribute back to nominal from numeric.

84
00:06:52,320 --> 00:06:57,300
I want to use OneR now, and OneR will only
work with a nominal class.

85
00:06:57,300 --> 00:07:04,300
Let me convert that.

86
00:07:04,830 --> 00:07:11,830
I want NumericToNominal.

87
00:07:12,050 --> 00:07:18,270
I want to run that on attribute number 9.

88
00:07:18,270 --> 00:07:27,040
Let me apply that, and now, sure enough, I've
got the two labels 0 and 1.

89
00:07:27,040 --> 00:07:30,390
This is a nominal attribute with these two
labels.

90
00:07:30,390 --> 00:07:35,440
I'll be sure to make that one the class attribute.

91
00:07:38,150 --> 00:07:44,410
Then I get the colors back -- 2 colors for
the 2 classes.

92
00:07:44,410 --> 00:07:49,290
Really, I want to predict this "class" based
on the value of "classification", that numeric value.

93
00:07:49,290 --> 00:07:53,820
I'm going to delete all the other attributes.

94
00:07:56,570 --> 00:08:00,820
I'm going to go to my Classify panel here.

95
00:08:01,200 --> 00:08:15,220
I'm going to predict "class" -- this nominal
value "class" -- and I'm going to use OneR.

96
00:08:21,290 --> 00:08:31,100
I think I'll stop outputting the predictions
because they just get in the way; and run that.

97
00:08:31,100 --> 00:08:33,300
It's 72-73%, and that's a bit disappointing.

98
00:08:33,300 --> 00:08:38,360
But actually, when you look at this, OneR
has produced this really overfitted rule.

99
00:08:38,360 --> 00:08:40,270
We want a single split point.

100
00:08:40,270 --> 00:08:44,540
If it's less than this then predict 0, otherwise
predict 1.

101
00:08:44,540 --> 00:08:52,330
We can get around that by changing this "b"
parameter, the minBucketSize parameter, to

102
00:08:52,330 --> 00:08:53,920
be something much larger.

103
00:08:53,920 --> 00:08:58,790
I'm going to change it to 100 and run it again.

104
00:08:58,790 --> 00:09:05,250
Now I've got much better performance, 77%
accuracy, and this is the kind of split I've

105
00:09:05,250 --> 00:09:10,180
got: if the classification -- that is the
regression value -- is less than 0.47 I'm

106
00:09:10,180 --> 00:09:14,700
going to call it a 0; otherwise I'm going
to call it a 1.

107
00:09:14,700 --> 00:09:17,630
So I've got what I wanted, classification
by regression.

108
00:09:17,630 --> 00:09:21,070
We've extended linear regression to classification.

109
00:09:21,070 --> 00:09:27,260
This performance of 76.8% is actually quite
good for this problem.

110
00:09:27,260 --> 00:09:33,460
It was easy to do with 2 classes, 0 and 1;
otherwise you need to have a regression for

111
00:09:33,460 --> 00:09:39,280
each class -- multi-response linear regression
-- or else for each pair of classes -- pairwise

112
00:09:39,280 --> 00:09:41,390
linear regression.

113
00:09:41,390 --> 00:09:43,020
We learned quite a few things about Weka.

114
00:09:43,020 --> 00:09:48,100
We learned about unsupervised attribute filters
to convert nominal attributes to binary, and

115
00:09:48,100 --> 00:09:50,490
numeric attributes back to nominal.

116
00:09:50,490 --> 00:09:55,100
We learned about this cool filter AddClassification,
which adds the classification according to

117
00:09:55,100 --> 00:09:59,210
a machine learning scheme as an attribute
in the dataset.

118
00:09:59,210 --> 00:10:03,140
We learned about setting and unsetting the
class of the dataset, and we learned about

119
00:10:03,140 --> 00:10:08,290
the minimum bucket size parameter to prevent
OneR from overfitting.

120
00:10:08,290 --> 00:10:09,950
That's classification by regression.

121
00:10:09,950 --> 00:10:12,330
In the next lesson, we're going to do better.

122
00:10:12,330 --> 00:10:18,380
We're going to look at logistic regression,
an advanced technique which effectively does

123
00:10:18,380 --> 00:10:22,330
classification by regression in an even more
effective way.

124
00:10:22,330 --> 00:10:23,390
We'll see you soon.

125
00:10:23,390 --> 00:10:24,890
Bye!

