1
00:00:18,470 --> 00:00:25,470
Hi! Welcome back to the course on Data Mining
with Weka. I'm Ian up here in New Zealand.

2
00:00:25,720 --> 00:00:32,259
This is Lesson 1.2. Remember there are five
classes in this course, and each class consists

3
00:00:32,259 --> 00:00:37,409
of about six lessons. This is the second lesson
of the first class. We're going to explore

4
00:00:37,409 --> 00:00:43,100
the Explorer, the Weka Explorer interface.
Actually, first we're going to download the

5
00:00:43,100 --> 00:00:46,820
Weka system. This is something you're going
to have to do on your computer. We're going

6
00:00:46,820 --> 00:00:53,820
to download it from this URL. Without delay,
let's go straight there. Here we are.

7
00:00:57,900 --> 00:01:03,530
This is www.cs.waikato.ac.nz/ml/weka. You can
read about Weka here. I'm going to go straight

8
00:01:03,530 --> 00:01:11,330
to the Download button and download and install
Weka on my computer. I'm running on a Windows

9
00:01:11,360 --> 00:01:16,030
machine here, but there are versions down
at the bottom you can see for Mac OS X and

10
00:01:16,030 --> 00:01:23,030
Linux and so on. You need to download the
appropriate version for your machine.

11
00:01:23,030 --> 00:01:33,760
We want Weka 3.6.10. That's the latest version of
Weka. I'm going to download a self-extracting

12
00:01:33,760 --> 00:01:38,040
executable without the Java Virtual Machine.
I already have the Java Virtual Machine on

13
00:01:38,040 --> 00:01:43,909
my computer. I'm going to click here, but
you're going to need to do whatever's appropriate

14
00:01:43,909 --> 00:01:46,320
for your computer.

15
00:01:46,320 --> 00:01:55,380
While it's downloading, let's have a word
about the pronunciation of the word 'Weka'.

16
00:01:55,380 --> 00:02:02,380
It's called Weh-kuh. We don't like calling
it the 'weaker' system. It's not 'weaker', it's

17
00:02:02,520 --> 00:02:07,310
Weka, pronounced to rhyme with 'Mecca'. That's
the name of the bird, that's the name of our

18
00:02:07,310 --> 00:02:21,230
software. Weka. I think it might have downloaded.
I'm going to open it. This is a standard kind

19
00:02:21,230 --> 00:02:28,180
of setup wizard. We're installing Weka 3.6.10.
I'm just going to keep clicking next here.

20
00:02:28,180 --> 00:02:35,180
Yes, I'm happy with this GNU General Public License.
I'm going to have a full install.

21
00:02:36,930 --> 00:02:40,870
I'm going to install it in the default place—just
need to remember the name of this place.

22
00:02:40,870 --> 00:02:46,550
We're going to need to visit there in a moment.
We're going to install the whole thing.

23
00:02:46,550 --> 00:02:53,550
This is going to take a couple of minutes. I'm just
off for a cup of coffee; I'll be back in a second.

24
00:02:56,270 --> 00:02:56,780
 

25
00:02:56,780 --> 00:03:02,560
Now, it's installed. Let's just carry on here.
I want to click Finish, but actually I'm not

26
00:03:02,560 --> 00:03:06,160
going to start Weka. I'm going to uncheck
that, and click Finish, because there are

27
00:03:06,160 --> 00:03:13,160
a couple of things I want to do first. Let's
go and see where Weka is. It's on my computer

28
00:03:16,200 --> 00:03:28,670
in Program Files. It should be down here—Weka
3.6. I'm going to create a shortcut to that,

29
00:03:28,670 --> 00:03:34,960
because we're going to be using it a lot in
this course. I'm just going to put it on the desktop.

30
00:03:38,260 --> 00:03:44,330
Then, I'm going to do one more thing.
I'm going to go inside this folder, and I'm

31
00:03:44,330 --> 00:03:52,190
going to look at the data folder. This contains
a bunch of datasets we're going to be using.

32
00:03:52,190 --> 00:03:59,190
I'm going to take this folder and copy it
and put it somewhere convenient.

33
00:04:00,360 --> 00:04:15,160
Let's copy that, and I'm going to put it in the My Documents
folder. I'm going to rename it Weka datasets.

34
00:04:21,029 --> 00:04:33,229
I'm all set. I finished installing Weka. 
I've got my shortcut to Weka here.

35
00:04:33,850 --> 00:04:41,340
I made my shortcut to the wrong place. I meant to 
make the shortcut to this here. Let me just make

36
00:04:41,340 --> 00:04:54,330
a shortcut here. Create shortcut, put it on the desktop.
That's the one I want. Now, when I click here,

37
00:04:54,330 --> 00:05:01,330
it will open Weka. Back to the slide. There
are four interfaces in Weka. The Explorer

38
00:05:01,900 --> 00:05:05,470
is the one that we'll be using throughout
this course. We're just using the Explorer.

39
00:05:05,470 --> 00:05:12,470
But also, there is the Experimenter for large-scale
performance comparisons of different

40
00:05:12,810 --> 00:05:18,240
machine learning methods on different datasets.
There's the KnowledgeFlow interface, which

41
00:05:18,240 --> 00:05:24,110
is a graphical interface to Weka tools, and
there's a command-line interface. But we're

42
00:05:24,110 --> 00:05:30,960
just going to use the Explorer, so let's get
on with it. Here's the Explorer. Across the

43
00:05:30,960 --> 00:05:37,960
top, there are six panels: the Preprocess
panel;

44
00:05:42,090 --> 00:05:54,139
the Classify panel, where you build classifiers for datasets;
clustering, another procedure Weka is good at, although we won't

45
00:05:54,139 --> 00:05:59,639
be talking about clustering in this course;
association rules; attribute selection; and

46
00:05:59,639 --> 00:06:04,990
visualization. In this course, we'll be using
mainly the Preprocess panel to open files

47
00:06:04,990 --> 00:06:09,759
and so on, the Classify panel to experiment
with classifiers, and the Visualize panel

48
00:06:09,759 --> 00:06:16,509
to visualize our datasets. I'm going to open
a dataset. The dataset that I'm going to open

49
00:06:16,509 --> 00:06:22,210
is the weather data; it's a little toy dataset
that we'll be seeing a lot of in this course.

50
00:06:22,210 --> 00:06:29,210
It has 14 instances, 14 days, and for
each of these days, we have recorded the values

51
00:06:29,600 --> 00:06:34,580
of five attributes. Four to do with the weather:
Outlook, Temperature, Humidity, and Windy.

52
00:06:34,580 --> 00:06:41,580
The fifth, Play, is whether or not we're going
to play a particular, unspecified game.

53
00:06:43,020 --> 00:06:47,940
Actually, what we're going to be doing is predicting
the Play attribute from the other attributes.

54
00:06:47,940 --> 00:06:51,729
Let's not worry about that at the moment.
Let's just open the dataset and take a look

55
00:06:51,729 --> 00:06:58,729
at it in Weka. Here's My Documents. Here are
the Weka datasets; this is what I copied.

56
00:07:00,910 --> 00:07:07,910
I'm going to open weather.nominal.arff. All
Weka data files are called ARFF files.
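
To give a rough idea of what is inside an ARFF file, here is a minimal sketch in Python. The file text is typed in from memory of the standard weather.nominal.arff, so treat it as an assumption and compare it with the copy in your own Weka datasets folder. The function name parse_nominal_arff is made up for this illustration; it only handles simple nominal files like this one, not the full ARFF format.

```python
# Sketch of a minimal reader for nominal-only ARFF files.
# ARFF_TEXT is reproduced from memory of weather.nominal.arff (an assumption);
# check it against the file shipped with Weka.
ARFF_TEXT = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_nominal_arff(text):
    """Return (attributes, rows); attributes maps name -> list of allowed values."""
    attributes = {}
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            # Each data line holds one instance, values in attribute order.
            values = [v.strip() for v in line.split(",")]
            rows.append(dict(zip(attributes, values)))
        elif line.lower().startswith("@attribute"):
            # "@attribute name {v1, v2, ...}" declares a nominal attribute.
            _, name, spec = line.split(None, 2)
            attributes[name] = [v.strip() for v in spec.strip("{}").split(",")]
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_nominal_arff(ARFF_TEXT)
print(len(rows), "instances;", len(attributes), "attributes:", list(attributes))
```

The header declares each attribute and its allowed values, and everything after @data is one instance per line, which is the same table the Explorer shows.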

57
00:07:09,560 --> 00:07:17,460
We'll talk about that later on. This is the weather
data. Just ignore these colorful bars at the moment.

58
00:07:19,990 --> 00:07:26,990
There are 14 instances; those correspond
to the 14 days that we saw in the dataset

59
00:07:27,350 --> 00:07:33,630
on the slide. For each day, we have five attributes:
outlook, temperature, humidity, windy, and

60
00:07:33,630 --> 00:07:39,600
play. If you select one of these attributes—outlook
is selected at the moment—we can see the

61
00:07:39,600 --> 00:07:45,970
values. The values for the outlook attribute
are sunny, overcast, and rainy. These are

62
00:07:45,970 --> 00:07:51,470
the number of times they appear in the dataset:
5 sunny days, 4 overcast days, and 5 rainy

63
00:07:51,470 --> 00:07:58,470
days for a total of 14 days, 14 instances.
If we look at the temperature attribute, hot,

64
00:07:59,479 --> 00:08:04,300
mild, and cool are the possible values, and
these are the number of times they appear

65
00:08:04,300 --> 00:08:11,300
in the dataset. Let's go to the play attribute.
There are two values for play, yes or no.

66
00:08:12,349 --> 00:08:19,349
Now, let's look at these two bars here. Blue
corresponds to yes, and red corresponds to no.

67
00:08:21,419 --> 00:08:28,410
If you look at one of the other attributes,
like outlook, you can see that when the outlook

68
00:08:28,410 --> 00:08:35,410
is sunny—this is like a histogram—there
are three no instances and two yes instances.

69
00:08:37,680 --> 00:08:43,949
When the outlook is overcast, there are four
yes instances and zero no instances.

70
00:08:43,949 --> 00:08:49,940
These are like a histogram of the attribute values
in terms of the attribute we're trying to predict.
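
The colored bars can be imitated with a few lines of Python, as a sketch rather than anything Weka itself does. The 14 rows are typed in from memory of the standard weather data, so that is an assumption to check against your own file, and class_histogram is a made-up helper name.

```python
from collections import Counter

# The 14 weather instances, reproduced from memory (an assumption):
# outlook, temperature, humidity, windy, play
DATA = """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""
NAMES = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [dict(zip(NAMES, line.split(","))) for line in DATA.splitlines()]


def class_histogram(rows, attribute, target="play"):
    """Count instances for each (attribute value, class value) pair."""
    return Counter((row[attribute], row[target]) for row in rows)


hist = class_histogram(rows, "outlook")
for value in ["sunny", "overcast", "rainy"]:
    # A Counter returns 0 for pairs that never occur, such as overcast/no.
    print(value, "yes:", hist[(value, "yes")], "no:", hist[(value, "no")])
```

Each bar in the Explorer corresponds to one attribute value, split by the counts for the class we are trying to predict.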

71
00:08:49,940 --> 00:08:56,940
It makes it kind of useful to click
around and visualize your data. We've opened

72
00:08:57,829 --> 00:09:04,829
the weather data, weather.nominal.arff. We've
looked at the attribute values and the attributes

73
00:09:08,970 --> 00:09:13,069
in Weka. There's one more thing I want to
do before we summarize here. I want to go

74
00:09:13,069 --> 00:09:19,610
to the Edit panel. If I go to the Edit panel,
I see the data in the form that it was on

75
00:09:19,610 --> 00:09:26,269
the slide, with the 14 days down here and
the 5 attributes across here. This is another

76
00:09:26,269 --> 00:09:33,269
view of the data. I can actually change this
dataset. If I click here, I can change this

77
00:09:34,239 --> 00:09:41,239
no to yes. Or, if I click here, I can change
the outlook on this day from rainy to sunny.

78
00:09:46,089 --> 00:09:51,839
If only it were so easy in real life to change
a day from rainy to sunny. Then I can click

79
00:09:51,839 --> 00:09:57,980
OK, and we've got this edited dataset, which
we could save if we'd like. We haven't saved

80
00:09:57,980 --> 00:10:01,739
any of this. The dataset on the disk is still
the same as it was. I'm not going to save

81
00:10:01,739 --> 00:10:05,389
it, and I don't think you should save it,
because we're going to be using this dataset

82
00:10:05,389 --> 00:10:12,389
quite a bit in this course. This is what we've
done in this lesson. We've installed Weka.

83
00:10:13,769 --> 00:10:20,769
We've got the datasets. We've opened the Explorer.
We've looked at a dataset—the weather.nominal.arff

84
00:10:21,720 --> 00:10:26,739
dataset. We've looked at the attributes and
their values. We've edited the dataset, and

85
00:10:26,739 --> 00:10:32,850
we didn't save it. You can read more about
this in the course text. Section 1.2 talks about

86
00:10:32,850 --> 00:10:39,850
the weather data, and Chapter 10 is a little
introduction to the Weka system. Now you should

87
00:10:40,569 --> 00:10:45,239
go and do the activity associated with this
lesson. Good luck, and I'll see you in the

88
00:10:45,239 --> 00:10:52,239
next lesson. Bye for now!

