1
00:00:17,850 --> 00:00:18,920
Hello again.

2
00:00:18,920 --> 00:00:23,350
In most courses, there comes a point where
things start to get a little tough.

3
00:00:23,350 --> 00:00:27,480
In the last couple of lessons, you've seen
some mathematics that you probably didn't

4
00:00:27,480 --> 00:00:31,350
want to see, and you might have realized that
you'll never completely understand how all

5
00:00:31,350 --> 00:00:35,650
these machine learning methods work in detail.

6
00:00:35,650 --> 00:00:39,280
I want you to know that what I'm trying to
convey is the gist of modern machine learning

7
00:00:39,280 --> 00:00:41,710
methods, not the details.

8
00:00:41,710 --> 00:00:44,800
What's important is that you can use them
and that you understand a little bit of the

9
00:00:44,800 --> 00:00:47,170
principles behind how they work.

10
00:00:47,170 --> 00:00:49,220
And the math is almost finished.

11
00:00:49,220 --> 00:00:53,650
So hang in there; things will start to get
easier -- and anyway, there's not far to go:

12
00:00:53,650 --> 00:00:55,610
just a few more lessons.

13
00:00:57,000 --> 00:00:59,210
I told you before that I play music.

14
00:00:59,210 --> 00:01:02,300
Someone came round to my house last night
with a contrabassoon.

15
00:01:02,300 --> 00:01:07,550
It's the deepest, lowest instrument in the
orchestra.

16
00:01:07,550 --> 00:01:08,870
You don't often see or hear one.

17
00:01:08,870 --> 00:01:13,640
So, here I am, trying to play a contrabassoon
for the first time.

18
00:01:27,000 --> 00:01:32,960
I think this has got to be the lowest point
of our course, Data Mining with Weka!

19
00:01:32,960 --> 00:01:37,570
Today I want to talk about support vector
machines, another advanced machine learning

20
00:01:37,570 --> 00:01:41,040
technique.

21
00:01:41,040 --> 00:01:46,280
We looked at logistic regression in the last
lesson, and we found that it produces linear

22
00:01:46,280 --> 00:01:48,350
boundaries in the space.

23
00:01:48,350 --> 00:01:56,340
In fact, here I've used Weka's Boundary Visualizer
to show the boundary produced by a logistic

24
00:01:56,340 --> 00:02:06,010
regression machine -- this is on the 2D Iris
data, plotting petalwidth against petallength.

25
00:02:06,010 --> 00:02:13,010
This black line is the boundary between these
classes, the red class and the green class.

26
00:02:14,110 --> 00:02:18,280
It might be more sensible, if we were going
to put a boundary between these two classes,

27
00:02:18,280 --> 00:02:25,940
to try and drive it through the widest channel
between the two classes, the maximum separation

28
00:02:25,940 --> 00:02:28,760
from each class.

29
00:02:28,760 --> 00:02:36,290
Here's a picture where the black line now
is right down the middle of the channel between

30
00:02:36,290 --> 00:02:37,620
the two classes.

31
00:02:37,620 --> 00:02:46,130
Actually, mathematically, we can find that
line by taking the two critical members, one

32
00:02:46,130 --> 00:02:52,130
from each class -- they're called support
vectors; these are the critical points that

33
00:02:52,130 --> 00:02:58,490
define the channel -- and take the perpendicular
bisector of the line joining those two support

34
00:02:58,490 --> 00:03:01,230
vectors.
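
The construction just described can be checked with a small Python sketch. It assumes the simplest case, where exactly one support vector from each class defines the channel. The two points are made up for illustration, and the helper names are hypothetical. The separating line passes through the midpoint of the two support vectors, and its normal points along the segment joining them.

```python
# Sketch: maximum-margin line when exactly two support vectors, one per
# class, define the channel. The example points are hypothetical.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def perpendicular_bisector(p, q):
    """Return (w, b) for the line w.x + b = 0 that perpendicularly bisects
    the segment from p (positive class) to q (negative class)."""
    w = [pi - qi for pi, qi in zip(p, q)]               # normal points from q towards p
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    b = -dot(w, midpoint)                                # the midpoint lies on the line
    return w, b


def classify(w, b, x):
    """+1 on p's side of the line, -1 on q's side, 0 exactly on the line."""
    s = dot(w, x) + b
    return (s > 0) - (s < 0)


p, q = (3.0, 2.0), (1.0, 0.0)   # hypothetical support vectors
w, b = perpendicular_bisector(p, q)
```

Here `w` comes out as `[2.0, 2.0]` and `b` as `-6.0`, so the point `(4, 4)` lands on p's side and `(0, 0)` on q's side.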

35
00:03:01,230 --> 00:03:03,220
That's the idea of support vector machines.

36
00:03:03,220 --> 00:03:08,330
We're going to put a line between the two
classes, but not just any old line that separates them.

37
00:03:08,330 --> 00:03:15,060
We're trying to drive the widest channel between
the two classes.

38
00:03:15,060 --> 00:03:15,730
Here's another picture.

39
00:03:15,730 --> 00:03:20,240
We've got two clouds of points, and I've drawn
a line around the outside of each cloud -- the

40
00:03:20,240 --> 00:03:22,740
green cloud and the brown cloud.

41
00:03:22,740 --> 00:03:29,740
It's clear that any interior points aren't
going to affect this hyperplane, this plane,

42
00:03:30,040 --> 00:03:31,680
this separating line.

43
00:03:31,680 --> 00:03:38,040
I call it a line, but in three dimensions
it would be a plane, or a hyperplane in four

44
00:03:38,040 --> 00:03:39,830
or more dimensions.

45
00:03:39,830 --> 00:03:46,830
There are just a few of the points in each cloud
that define the position of the line: the

46
00:03:46,980 --> 00:03:47,540
support vectors.

47
00:03:47,540 --> 00:03:51,260
In this case, there are three points.

48
00:03:51,260 --> 00:03:53,020
Support vectors define the boundary.

49
00:03:53,020 --> 00:03:57,420
The thing is that all the other instances
in the training data could be deleted without

50
00:03:57,420 --> 00:04:02,600
changing the position of the dividing
hyperplane.
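
The claim that the other instances could be deleted can be illustrated in one dimension, where the "hyperplane" is just a threshold on a single attribute. This is a simplified Python sketch that assumes every negative instance lies below every positive one. The boundary is then the midpoint between the largest negative and the smallest positive value, which act as the two support vectors. The data values are made up.

```python
# Sketch in 1-D: the max-margin boundary is a threshold, assuming every
# negative value lies below every positive value. Data are hypothetical.

def max_margin_threshold(negatives, positives):
    """Midpoint between the closest pair of opposite-class points."""
    lo, hi = max(negatives), min(positives)   # these two are the support vectors
    if lo >= hi:
        raise ValueError("classes are not separable with negatives below positives")
    return (lo + hi) / 2


negatives = [0.5, 1.0, 2.0, 3.0]
positives = [5.0, 6.5, 9.0]

# Boundary computed from all the training data ...
full = max_margin_threshold(negatives, positives)
# ... and from the support vectors alone, after deleting everything else.
reduced = max_margin_threshold([3.0], [5.0])
```

Both calls give a threshold of 4.0: removing the interior points does not move the boundary.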

51
00:04:02,600 --> 00:04:07,720
There's a simple equation, and this is the
last equation in this course.

52
00:04:07,720 --> 00:04:15,570
A simple equation that gives the formula for
the maximum margin hyperplane as a sum over

53
00:04:15,570 --> 00:04:17,460
the support vectors.

54
00:04:17,460 --> 00:04:23,960
Each term involves a dot product between one
of the support vectors and the instance being classified, and the terms are summed.

55
00:04:23,960 --> 00:04:30,030
It's pretty simple to calculate this maximum
margin hyperplane once you've got the support

56
00:04:30,030 --> 00:04:30,930
vectors.

57
00:04:30,930 --> 00:04:35,090
It's a very easy sum, and, like I say, it
only depends on the support vectors.

58
00:04:35,090 --> 00:04:41,960
None of the other points play any part in
this calculation.
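
A common way to write this sum is x = b + Σ α_i y_i (a(i) · a), taken over the support vectors a(i), where y_i is the class value (+1 or -1) of each support vector, a is the instance being classified, and α_i and b are numbers found during training. The Python sketch below simply evaluates that sum. The support vectors and the values of α and b are hypothetical: for two support vectors p = (1, 1) and q = (-1, -1), the standard result α = 2 / ||p - q||² gives α = 0.25, and b = 0 by symmetry.

```python
# Sketch: evaluate b + sum_i alpha_i * y_i * (s_i . x) over the support
# vectors only. All numbers below are hypothetical illustration values.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(support_vectors, labels, alphas, b, x):
    """Positive means class +1, negative means class -1.

    Only support vectors appear in the sum; every other training
    instance would have alpha = 0 and contribute nothing."""
    return b + sum(
        alpha * y * dot(s, x)
        for s, y, alpha in zip(support_vectors, labels, alphas)
    )


svs = [(1.0, 1.0), (-1.0, -1.0)]   # hypothetical support vectors
ys = [+1, -1]                      # their class values
alphas = [0.25, 0.25]              # 2 / ||p - q||^2 for these two points
b0 = 0.0
```

With these values, each support vector sits exactly on its margin (decision value +1 or -1), and the midpoint `(0, 0)` lies on the boundary.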

59
00:04:41,960 --> 00:04:48,130
Now in real life, you might not be able to
drive a straight line between the classes.

60
00:04:48,130 --> 00:04:52,880
Classes are called "linearly separable" if
there exists a straight line that separates

61
00:04:52,880 --> 00:04:54,750
the two classes.

62
00:04:54,750 --> 00:04:58,940
In this picture, the two classes are not linearly
separable.

63
00:04:58,940 --> 00:05:03,140
It might be a little hard to see, but there
are some blue points on the green side of

64
00:05:03,140 --> 00:05:07,060
the line, and a couple of green points on
the blue side of the line.

65
00:05:07,060 --> 00:05:13,190
It's not possible to draw a single straight
line that divides these points.

66
00:05:13,190 --> 00:05:18,370
That makes support vector machines -- the
mathematics -- a little more complicated.

67
00:05:18,370 --> 00:05:25,370
But it's still possible to define the maximum
margin hyperplane under these conditions.
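
One standard way to do this, sketched below in Python, is the "soft margin" formulation: each instance gets a slack value that measures how far it falls inside the margin or onto the wrong side, and training minimizes ½||w||² plus C times the total slack. A larger C penalizes violations more heavily. Weka's SMO exposes a complexity parameter C, which we assume plays this role; the code only evaluates the objective for a given line and does not train anything. The data and the line are hypothetical.

```python
# Sketch of the standard soft-margin objective for a given line (w, b).
# This evaluates the objective; it is not a training algorithm.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, b, X, y):
    """Hinge slack per instance: 0 if on the correct side of the margin,
    positive if inside the margin or on the wrong side of the line."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]


def soft_margin_objective(w, b, X, y, C=1.0):
    """0.5*||w||^2 (rewards a wide margin) plus C * total slack (violations)."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))


# Hypothetical data: the third point is a class -1 instance on the +1 side.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [+1, -1, -1]
w, b = (1.0, 0.0), 0.0
```

Only the misplaced instance gets a nonzero slack (1.5), so the objective is 2.0 with C = 1 and 15.5 with C = 10.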

68
00:05:27,280 --> 00:05:30,340
That's it: support vector machines.

69
00:05:30,340 --> 00:05:32,710
It's a linear decision boundary.

70
00:05:32,710 --> 00:05:38,000
Actually, there's a really clever technique
which allows you to get more complex boundaries.

71
00:05:38,000 --> 00:05:41,880
It's called the "kernel trick".

72
00:05:41,880 --> 00:05:47,900
By using different formulas for the "kernel"
-- and in Weka you just select from some possible

73
00:05:47,900 --> 00:05:54,730
different kernels -- you can get different
shapes of boundaries, not just straight lines.
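
The idea behind the kernel trick can be checked numerically in Python. The decision formula only ever uses dot products, so each dot product can be replaced by a kernel function that equals a dot product in some transformed space, without building that space explicitly. The sketch below uses two common kernels, polynomial and Gaussian (RBF); Weka's choices include kernels of these kinds, but the function names here are hypothetical. For 2-D inputs, the degree-2 polynomial kernel matches the dot product after the explicit mapping (x1², √2·x1·x2, x2²).

```python
import math

# Sketch: a kernel equals a dot product in a transformed feature space.
# Function names are hypothetical, not Weka's API.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z, degree=2):
    """Polynomial kernel (x . z) ** degree, computed in the original space."""
    return dot(x, z) ** degree


def phi(x):
    """Explicit degree-2 feature map for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    diff = [a - b for a, b in zip(x, z)]
    return math.exp(-gamma * dot(diff, diff))


x, z = (1.0, 2.0), (3.0, 4.0)
```

Here `poly_kernel(x, z)` is 121, and `dot(phi(x), phi(z))` gives the same value up to rounding. Substituting a kernel for every dot product in the decision formula is what produces curved boundaries in the original space.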

74
00:05:54,730 --> 00:06:01,420
Support vector machines are fantastic because
they're very resilient to overfitting.

75
00:06:01,420 --> 00:06:08,240
The boundary just depends on a very small
number of points in the dataset.

76
00:06:08,240 --> 00:06:12,120
So it's not going to overfit the dataset,
because it doesn't depend on most of

77
00:06:12,120 --> 00:06:17,720
the points in the dataset, just a few of these
critical points -- the support vectors.

78
00:06:17,720 --> 00:06:23,800
So it's very resilient to overfitting, even
with large numbers of attributes.

79
00:06:23,800 --> 00:06:28,290
In Weka, there are a couple of implementations
of support vector machines.

80
00:06:28,290 --> 00:06:31,650
We could look in the "functions" category
for "SMO".

81
00:06:31,650 --> 00:06:36,630
Let me have a look at that over here.

82
00:06:36,630 --> 00:06:51,060
If I look in "functions" for "SMO", that implements
an algorithm called "Sequential Minimal Optimization"

83
00:06:51,060 --> 00:06:54,110
for training a support vector classifier.

84
00:06:55,720 --> 00:07:00,160
There are a few parameters here, including,
for example, the different choices of kernel.

85
00:07:00,160 --> 00:07:04,540
You can choose different kernels: you can
play around and try out different things.

86
00:07:04,540 --> 00:07:07,060
There are a few other parameters.

87
00:07:07,060 --> 00:07:12,310
Actually, the SMO algorithm itself is a two-class
method; when a dataset has more than two classes,

88
00:07:12,310 --> 00:07:15,040
Weka's SMO trains one classifier per pair of classes.
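
A standard way to apply a two-class learner to a dataset with more classes is pairwise (one-vs-one) classification: train one two-class model for every pair of classes, and let each model vote for one of its two classes. The Python sketch below shows only the voting step. The "classifiers" are hypothetical threshold rules on a single attribute with made-up cut-off values, standing in for trained support vector machines.

```python
from collections import Counter
from itertools import combinations

# Sketch of one-vs-one voting. The binary classifiers are hypothetical
# stand-ins with made-up thresholds, not trained SVMs.

def pairwise_predict(classes, binary_classifiers, x):
    """Each pair (a, b) has a classifier returning a or b for x.
    The class with the most votes wins (ties broken arbitrarily here)."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_classifiers[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]


classes = ["setosa", "versicolor", "virginica"]
classifiers = {
    ("setosa", "versicolor"): lambda x: "setosa" if x < 2.5 else "versicolor",
    ("setosa", "virginica"): lambda x: "setosa" if x < 3.5 else "virginica",
    ("versicolor", "virginica"): lambda x: "versicolor" if x < 4.9 else "virginica",
}
```

With three classes there are three pairwise models, and `pairwise_predict(classes, classifiers, 4.0)` returns `"versicolor"` with two of the three votes.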

89
00:07:15,040 --> 00:07:20,930
There are other, more comprehensive, implementations
of support vector machines in Weka.

90
00:07:20,930 --> 00:07:29,430
There's a library called "LibSVM", an external
library, and Weka has an interface to this library.

91
00:07:29,430 --> 00:07:34,940
This is a wrapper class for the LibSVM tools.

92
00:07:34,940 --> 00:07:39,680
You need to download these separately from
Weka and put them on the Java classpath.

93
00:07:39,680 --> 00:07:44,960
You can see that there are a lot of different
parameters here, and, in fact, a lot of information

94
00:07:44,960 --> 00:07:51,930
on this support vector machine package.

95
00:07:51,930 --> 00:07:55,090
That's support vector machines.

96
00:07:55,090 --> 00:07:56,380
You can read about them in Section 6.4

97
00:07:56,380 --> 00:08:01,850
of the textbook if you like, and please
go and do the associated activity.

98
00:08:01,850 --> 00:08:04,940
See you soon for the last lesson in this class.

99
00:08:04,940 --> 00:08:05,800
Bye!

