1
00:00:17,480 --> 00:00:26,320
Hi! This is the last lesson in the course
Data Mining with Weka: Lesson 5.4, Summary.

2
00:00:26,320 --> 00:00:31,300
We'll just have a quick summary of what we've
learned here.

3
00:00:31,300 --> 00:00:36,839
One of the main points I've been trying to
convey is that there's no magic in data mining.

4
00:00:36,839 --> 00:00:42,710
There's a huge array of alternative techniques,
and they're all fairly straightforward algorithms.

5
00:00:42,710 --> 00:00:45,170
We've seen the principles of many of them.

6
00:00:45,170 --> 00:00:50,170
Perhaps we don't understand the details, but
we've got the basic idea of the main methods

7
00:00:50,170 --> 00:00:53,329
of machine learning used in data mining.

8
00:00:53,329 --> 00:00:57,620
And there is no single, universal best method.

9
00:00:57,620 --> 00:01:00,899
Data mining is an experimental science.

10
00:01:00,899 --> 00:01:06,070
You need to find out what works best on your
problem.

11
00:01:06,070 --> 00:01:07,780
Weka makes it easy for you.

12
00:01:07,780 --> 00:01:11,210
Using Weka you can try out different methods,
you can try out different filters, different

13
00:01:11,210 --> 00:01:12,960
learning methods.

14
00:01:12,960 --> 00:01:14,910
You can play around with different datasets.

15
00:01:14,910 --> 00:01:17,160
It's very easy to do experiments in Weka.

16
00:01:17,160 --> 00:01:22,030
Perhaps you might say it's too easy, because
it's important to understand what you're doing,

17
00:01:22,030 --> 00:01:25,729
not just blindly click around and look at
the results.

18
00:01:25,729 --> 00:01:30,569
That's what I've tried to emphasize in this
course -- understanding and evaluating what

19
00:01:30,569 --> 00:01:31,660
you're doing.

20
00:01:31,660 --> 00:01:36,759
There are many pitfalls you can fall into
if you don't really understand what's going

21
00:01:36,759 --> 00:01:38,030
on behind the scenes.

22
00:01:38,030 --> 00:01:43,649
It's not a matter of just blindly applying
the tools in the workbench.

23
00:01:43,649 --> 00:01:48,550
Throughout the course we've stressed
evaluation: evaluating what you're doing,

24
00:01:48,550 --> 00:01:54,950
and the significance of the results of the
evaluation.

25
00:01:54,950 --> 00:01:57,679
Different algorithms differ in performance,
as we've seen.

26
00:01:57,679 --> 00:02:00,789
In many problems, it's not a big deal.

27
00:02:00,789 --> 00:02:06,060
The differences between the algorithms are
really not very important in many situations,

28
00:02:06,060 --> 00:02:12,080
and you should perhaps be spending more time
on looking at the features and how the problem

29
00:02:12,080 --> 00:02:19,340
is described and the operational context that
you're working in, rather than stressing about

30
00:02:19,349 --> 00:02:21,680
getting the absolute best algorithm.

31
00:02:21,680 --> 00:02:25,280
It might not make all that much difference
in practice.

32
00:02:25,280 --> 00:02:29,080
Use your time wisely.

33
00:02:29,080 --> 00:02:31,709
There's a lot of stuff that we've missed out.

34
00:02:31,709 --> 00:02:35,569
I'm really sorry I haven't been able to cover
more of this stuff.

35
00:02:35,569 --> 00:02:41,299
There's a whole technology of filtered classifiers,
where you want to filter the training data,

36
00:02:41,299 --> 00:02:43,099
but not the test data.

37
00:02:43,099 --> 00:02:49,230
That's especially true when you've got a supervised
filter, where the results of the filter depend

38
00:02:49,230 --> 00:02:53,760
on the class values of the training instances.

39
00:02:53,760 --> 00:02:59,069
You want to filter the training data, but
not the test data, or maybe take a filter

40
00:02:59,069 --> 00:03:04,650
designed for the training data and apply the
same filter to the test data without re-optimizing

41
00:03:04,650 --> 00:03:06,590
it for the test data; doing that would be cheating.

42
00:03:08,800 --> 00:03:11,290
You often want to do this during cross-validation.

43
00:03:11,290 --> 00:03:15,639
The trouble in Weka is that you can't get
hold of those cross-validation folds; it's

44
00:03:15,639 --> 00:03:17,469
all done internally.

45
00:03:17,469 --> 00:03:21,819
Filtered classifiers are a simple way of dealing
with this problem.
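The filtered-classifier idea can be sketched in plain Python (this is not Weka's API). Under stated assumptions: `SupervisedCut` is a hypothetical supervised filter that picks a cut point on one numeric attribute using the training labels, `MajorityPerValue` is a hypothetical base classifier, and `FilteredClassifierSketch` stands in for what a filtered classifier does: fit the filter on the training fold only, then apply that same fitted filter, unchanged, to the test fold.

```python
from collections import Counter


class SupervisedCut:
    """Hypothetical supervised filter: binarizes one numeric attribute at the
    cut point that best separates the classes in the data it is fitted on."""

    def fit(self, xs, ys):
        values = sorted(set(xs))
        # Candidate cut points: midpoints between adjacent distinct values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

        def score(cut):
            # Instances covered correctly if each side predicts its majority class.
            sides = {}
            for x, y in zip(xs, ys):
                sides.setdefault(x > cut, []).append(y)
            return sum(Counter(s).most_common(1)[0][1] for s in sides.values())

        self.cut = max(candidates, key=score)  # depends on the class labels
        return self

    def transform(self, xs):
        # Applies the cut chosen during fit(); nothing is re-optimized here.
        return [x > self.cut for x in xs]


class MajorityPerValue:
    """Hypothetical base classifier: majority class for each filtered value."""

    def fit(self, feats, ys):
        self.default = Counter(ys).most_common(1)[0][0]
        self.table = {
            v: Counter(y for f, y in zip(feats, ys) if f == v).most_common(1)[0][0]
            for v in set(feats)
        }
        return self

    def predict(self, feats):
        return [self.table.get(f, self.default) for f in feats]


class FilteredClassifierSketch:
    """Fits the filter on training data only, then reuses it unchanged."""

    def __init__(self, flt, clf):
        self.flt, self.clf = flt, clf

    def fit(self, xs, ys):
        self.flt.fit(xs, ys)
        self.clf.fit(self.flt.transform(xs), ys)
        return self

    def predict(self, xs):
        return self.clf.predict(self.flt.transform(xs))


def cross_validate(xs, ys, k=3):
    """Per-fold accuracy; the filter is re-fitted inside every training fold."""
    n = len(xs)
    accuracies = []
    for i in range(k):
        test = list(range(i, n, k))  # simple interleaved folds
        train = [j for j in range(n) if j not in test]
        model = FilteredClassifierSketch(SupervisedCut(), MajorityPerValue())
        model.fit([xs[j] for j in train], [ys[j] for j in train])
        preds = model.predict([xs[j] for j in test])
        accuracies.append(sum(p == ys[j] for p, j in zip(preds, test)) / len(test))
    return accuracies


xs = list(range(1, 10))
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validate(xs, ys))
```

Because the cut is chosen inside `fit`, each fold gets its own cut computed from its training instances alone, and the test instances never influence it. In Weka, the FilteredClassifier meta-classifier is intended to handle this wiring for you.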

46
00:03:21,819 --> 00:03:25,680
We haven't talked about costs of different
decisions and different kinds of errors, but

47
00:03:25,680 --> 00:03:29,510
in real life different errors have different
costs.

48
00:03:29,510 --> 00:03:35,999
We've talked about optimizing the error rate,
or the classification accuracy, but really,

49
00:03:35,999 --> 00:03:40,310
in most situations, we should be talking about
costs, not raw accuracy figures, and these

50
00:03:40,310 --> 00:03:43,519
are different things.
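A small worked example of why cost and accuracy are different things. The confusion matrices and the 10-to-1 cost matrix below are invented for illustration, and `accuracy` and `total_cost` are hypothetical helpers, not Weka calls.

```python
def accuracy(confusion):
    """Fraction correct; confusion[actual][predicted] holds instance counts."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


def total_cost(confusion, costs):
    """Sum of count * cost over every (actual, predicted) cell."""
    k = len(confusion)
    return sum(confusion[a][p] * costs[a][p] for a in range(k) for p in range(k))


# Hypothetical two-class problem: row/column 0 = negative, 1 = positive.
costs = [[0, 1],    # a false positive costs 1 unit
         [10, 0]]   # a false negative costs 10 units

model_a = [[90, 10], [0, 100]]   # 10 false positives
model_b = [[100, 0], [10, 90]]   # 10 false negatives

for name, cm in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(cm), total_cost(cm, costs))
```

Both models are 95% accurate, yet model A costs 10 units and model B costs 100, because all of B's errors are the expensive kind.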

51
00:03:43,519 --> 00:03:48,290
There's a whole panel in the Weka Explorer
for attribute selection, which helps you select

52
00:03:48,290 --> 00:03:55,099
a subset of attributes to use when learning,
and in many situations it's really valuable,

53
00:03:55,099 --> 00:04:00,209
before you do any learning, to select an appropriate
small subset of attributes to use.

54
00:04:01,950 --> 00:04:04,170
There are a lot of clustering techniques in
Weka.

55
00:04:04,170 --> 00:04:07,529
Clustering is where you want to learn something
even when there is no class value: you want

56
00:04:07,529 --> 00:04:12,060
to cluster the instances according to their
attribute values.

57
00:04:12,060 --> 00:04:16,380
Association rules are another kind of learning
technique where we're looking for associations

58
00:04:16,380 --> 00:04:17,630
between attributes.

59
00:04:17,630 --> 00:04:22,770
There's no particular class, but we're looking
for any strong associations between any of

60
00:04:22,770 --> 00:04:23,960
the attributes.

61
00:04:23,960 --> 00:04:27,639
Again, that's another panel in the Explorer.

62
00:04:27,639 --> 00:04:29,000
Text classification.

63
00:04:29,000 --> 00:04:35,740
There are some fantastic text filters in Weka
which allow you to handle textual data as

64
00:04:35,740 --> 00:04:41,010
words, or as characters, or n-grams (sequences
of three, four, or five consecutive characters).

65
00:04:42,200 --> 00:04:45,060
You can do text mining using Weka.
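To make character n-grams concrete, here is a plain-Python sketch (not Weka's tokenizer) that lists every run of three to five consecutive characters in a string. The name `char_ngrams` is hypothetical, and the 3 to 5 range simply follows the lecture's example.

```python
from collections import Counter


def char_ngrams(text, min_n=3, max_n=5):
    """All runs of min_n..max_n consecutive characters, shortest first."""
    grams = []
    for n in range(min_n, max_n + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


# Counting the n-grams turns a string into a bag of features.
features = Counter(char_ngrams("data mining"))
print(features.most_common(5))
```

For "mining" this yields min, ini, nin, ing, mini, inin, ning, minin, ining; counting them gives each document a bag of n-gram features that a learner can use.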

66
00:04:45,060 --> 00:04:52,000
Finally, we've focused exclusively on the
Weka Explorer, but the Weka Experimenter is

67
00:04:52,000 --> 00:04:54,340
also worth getting to know.

68
00:04:54,340 --> 00:04:59,880
We've done a fair amount of rather boring,
tedious calculations of means and standard

69
00:04:59,880 --> 00:05:07,260
deviations manually by changing the random-number
seed and running things again.

70
00:05:07,260 --> 00:05:09,400
That's very tedious to do by hand.

71
00:05:09,400 --> 00:05:12,840
The Experimenter makes it very easy to do
this automatically.
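The Experimenter automates a loop like the one below. This is a plain-Python sketch of the loop only, not of the Experimenter itself: `holdout_accuracy`, the 66% split, the toy data, and the midpoint-threshold learner are all hypothetical. Each run reshuffles with a different random-number seed, and the runs are then summarized by their mean and standard deviation.

```python
import random
from statistics import mean, stdev


def holdout_accuracy(xs, ys, seed, train_fraction=0.66):
    """One run: shuffle with the given seed, split, train, and test."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(idx) * train_fraction)
    train, test = idx[:n_train], idx[n_train:]
    # Hypothetical learner: threshold halfway between the two class means.
    m0 = mean(xs[i] for i in train if ys[i] == 0)
    m1 = mean(xs[i] for i in train if ys[i] == 1)
    threshold = (m0 + m1) / 2
    correct = sum((xs[i] > threshold) == (ys[i] == 1) for i in test)
    return correct / len(test)


# Toy data with overlapping classes, so the result depends on the split.
xs = list(range(12)) + list(range(8, 20))
ys = [0] * 12 + [1] * 12

accuracies = [holdout_accuracy(xs, ys, seed) for seed in range(1, 11)]
print(f"mean {mean(accuracies):.3f}, std dev {stdev(accuracies):.3f}")
```

Changing the seed by hand and recording each result, as we did in the course, is exactly the repetition this loop removes.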

72
00:05:12,840 --> 00:05:19,900
So, there's a lot more to learn, and I'm wondering
if you'd be interested in an Advanced Data

73
00:05:19,900 --> 00:05:21,200
Mining with Weka course.

74
00:05:21,200 --> 00:05:25,630
I'm toying with the idea of putting one on,
and I'd like you to let us know what you think

75
00:05:25,630 --> 00:05:29,350
about the idea, and what you'd like to see
included.

76
00:05:30,940 --> 00:05:33,690
Let me just finish off here with a final thought.

77
00:05:33,690 --> 00:05:37,550
We've been talking about data, data mining.

78
00:05:37,550 --> 00:05:43,380
Data is recorded facts, a change of state
in the world, perhaps.

79
00:05:43,380 --> 00:05:48,360
That's the input to our data mining process,
and the output is information, the patterns

80
00:05:48,360 --> 00:05:54,050
-- the expectations -- that underlie that
data: patterns that can be used for prediction

81
00:05:54,050 --> 00:05:58,320
in useful applications in the real world.

82
00:05:58,320 --> 00:06:02,310
We're going from data to information.

83
00:06:02,310 --> 00:06:07,500
Moving up in the world of people, not computers,
"knowledge" is the accumulation of your entire

84
00:06:07,500 --> 00:06:14,500
set of expectations, all the information that
you have and how it works together -- a large

85
00:06:14,680 --> 00:06:20,550
store of expectations and the different situations
where they apply.

86
00:06:20,550 --> 00:06:24,690
Finally, I like to define "wisdom" as the
value attached to knowledge.

87
00:06:24,690 --> 00:06:32,610
I'd like to encourage you to be wise when
using data mining technology.

88
00:06:32,610 --> 00:06:33,910
You've learned a lot in this course.

89
00:06:33,910 --> 00:06:39,280
You've got a lot of power now that you can
use to analyze your own datasets.

90
00:06:39,280 --> 00:06:44,420
Use this technology wisely for the good of
the world.

91
00:06:44,420 --> 00:06:46,390
That's my final thought for you.

92
00:06:47,470 --> 00:06:51,640
There is an activity associated with this
lesson, a little revision activity.

93
00:06:51,640 --> 00:07:00,460
Go and do that, and then do the final assessment,
and we will send you your certificate if you

94
00:07:00,460 --> 00:07:02,060
do well enough.

95
00:07:02,060 --> 00:07:07,590
Good luck! It's been good talking to you,
and maybe we'll see you in an advanced version

96
00:07:07,590 --> 00:07:09,680
of this course.

97
00:07:09,680 --> 00:07:11,100
Bye for now!