WEEK-2
Code
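install.packages("dplyr")   # run once; skip if dplyr is already installed
library(dplyr)
# Load the Insurance data set (adjust the path to where your copy is saved)
Rajeshdf <- read.csv("c:\\Insurance.csv")
str(Rajeshdf)       # structure: column names and types
summary(Rajeshdf)   # per-column summary, including NA counts
# Number of records per JOB
agg_tbl <- Rajeshdf %>%
  group_by(JOB) %>%
  summarise(total_count = n(), .groups = "drop")
agg_tbl
# Median HOME_VAL per CAR_TYPE, ignoring missing values
a <- aggregate(x = Rajeshdf$HOME_VAL, by = list(Rajeshdf$CAR_TYPE), FUN = median, na.rm = TRUE)
a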
QUIZ
1. In the Insurance data set, how many Lawyers are there?
A). 1031
2. What famous literary detective solved a crime because a dog did not bark at the criminal?
A). Sherlock Holmes
3. What two prefixes does the instructor use for variables when fixing the missing values? Select all that apply.
A). IMP_, M_
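The two prefixes can be illustrated with a minimal sketch. It assumes that IMP_ marks the imputed copy of a variable and M_ marks a 0/1 flag for rows that were originally missing; the toy data frame and the choice of the median as the fill value are assumptions, not taken from the course.

```r
# Toy data frame standing in for the Insurance data (values are made up)
df <- data.frame(AGE = c(34, NA, 51, 46, NA, 29))

# M_AGE flags rows that were missing before the fix (1 = missing, 0 = present)
df$M_AGE <- is.na(df$AGE) + 0

# IMP_AGE is the imputed copy; the median is one common fill value (assumption)
df$IMP_AGE <- df$AGE
df$IMP_AGE[is.na(df$IMP_AGE)] <- median(df$AGE, na.rm = TRUE)

df
```

Keeping the flag alongside the imputed column preserves the information that a value was originally missing.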
4. What is the median Home Value of a person who drives a Van?
A). 204139
5. In the Insurance data set, how many missing (NA) values does the variable AGE have?
A). 7
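Missing-value counts like the one above can be obtained in base R. The sketch below uses a small made-up data frame rather than the real Insurance file, so the counts it prints are not the quiz answers.

```r
# Toy data frame (made-up values); the column names mirror the Insurance data
df <- data.frame(AGE = c(34, NA, 51, NA), KIDSDRIV = c(0, 1, 0, 2))

# NA count for a single variable
sum(is.na(df$AGE))

# NA count for every column at once
na_counts <- colSums(is.na(df))
na_counts
```

The `summary()` call in the Week 2 code reports the same NA counts for numeric columns.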
1. What is the process called where missing data is fixed?
A). Imputing
2. According to the instructor, approximately what percentage of the analytic time is spent on data preparation?
A). 90%
3. In the Insurance data set, how many Blue Collar workers are there?
A). 2288
4. What is the median Home Value of a person who drives a Panel Truck?
A). 220541
5. In the Insurance data set, how many missing (NA) values does the variable KIDSDRIV have?
A). 0
In the Insurance data set, how many Doctors are there?
A). 321
A). 639
What is the median Home Value of a person who drives a Pickup?
A). 151061
In the Insurance data set, how many missing (NA) values does the variable AGE have?
A). 7
What is the process called that converts categorical variables into flag variables?
A). One Hot Encoding
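One way to produce such flag variables in base R is `model.matrix`. This is a sketch on a made-up CAR_TYPE column, not necessarily the method shown in the course.

```r
# Toy categorical column (made-up values)
df <- data.frame(CAR_TYPE = c("Van", "SUV", "Pickup", "SUV"))

# "- 1" drops the intercept so every level gets its own 0/1 column
flags <- model.matrix(~ CAR_TYPE - 1, data = df)
flags
```

Each row has exactly one flag set to 1, matching that row's category.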
In the Insurance data set, how many missing (NA) values does the variable KIDSDRIV have?
A). 0
In the R programming language, what is one method for converting a TRUE/FALSE variable into a 1/0 variable?
A). Add the number zero (0) to the TRUE/FALSE variable.
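A short illustration of that trick, using a made-up logical vector:

```r
is_van <- c(TRUE, FALSE, TRUE)   # made-up logical vector

# Adding zero coerces the logical values to numeric 1/0
flag_num <- is_van + 0
flag_num

# as.integer() is an explicit alternative that gives integer 1/0
as.integer(is_van)
```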
What is the median Home Value of a person who drives an SUV?
A). 140927
According to the instructor, after a variable with missing values is "fixed", it is a good idea to remove the variable from the data set.
A). True
What is the median Home Value of a person who drives a Minivan?
A). 172269
In the Insurance data set, how many missing (NA) values does the variable YOJ have?
A). 548
In the Insurance data set, how many Home Makers are there?
A). 843
In the Insurance data set, how many Clerical workers are there?
A). 1590
WEEK 5 QUIZ
1. Random Forests and the Gradient Boosting models will usually be more accurate than Decision Tree models.
A. True
2. Which of these modelling techniques is not adversely affected by outliers?
A. All of these
3. Gradient Boosting models are easy to interpret.
A. False
4. Which of these modelling techniques trains many trees, with each tree built on a random subset of variables?
A. Random Forests
5. Which of these modelling techniques tends to use many small trees?
A. Gradient Boosting
6. Which of these modelling techniques is usually the easiest to interpret?
A. Decision Trees
7. Random Forests are easy to interpret.
A. False
8. In the United States, it is probably against the law to use a Gradient Boosting model for Marketing models.
A. False
9. Gradient Boosting models are based on Decision Trees.
A. True
10. Which of these modelling techniques is usually the fastest to train?
A. Decision Trees
11. Random Forests and the Gradient Boosting models will always be more accurate than Decision Tree models.
A. False
12. A Random Forest is more sensitive to a small input change than a Decision Tree.
A. False
13. Which of these modelling techniques trains many trees, with each tree built on a random subset of records?
A. Random Forests
14. Random Forests are based on Decision Trees.
A. True
15. A Gradient Boosting model is less sensitive to a small input change than a Decision Tree.
A. True
16. In the United States, it may be against the law to use a Gradient Boosting model for Credit or Auto Insurance models.
A. True
17. Which of these modelling techniques alters the data in order to oversample records that it incorrectly classified?
A. Gradient Boosting
18. Which of these modelling techniques is usually the easiest to convert into IF-THEN-ELSE rules?
A. Decision Trees
19. In the United States, it may be against the law to use a Random Forest for Credit or Auto Insurance models.
A. True
20. In the United States, it is probably against the law to use a Random Forest for Marketing models.
A. False
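The "random subset of records" and "random subset of variables" ideas from the Random Forest questions above can be sketched in base R. This only draws the samples and does not fit any trees. The tree count, the number of variables per tree, and the seed are hypothetical values chosen for illustration. Real implementations typically resample variables at each split rather than once per tree, so this is a simplification.

```r
set.seed(42)   # arbitrary seed so the sketch is repeatable
n_trees <- 3   # hypothetical number of trees
n_vars  <- 2   # hypothetical number of variables offered to each tree
predictors <- setdiff(names(iris), "Species")

draws <- lapply(seq_len(n_trees), function(i) {
  list(
    rows = sample(nrow(iris), replace = TRUE),  # bootstrap sample of records
    vars = sample(predictors, n_vars)           # random subset of variables
  )
})

# Each element describes the data one tree would be trained on
str(draws)
```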
1. When doing tSNE analysis, setting the Perplexity to a low number will tend to favor local aspects of the data. High numbers will tend to favor global data.
A. True
2. Principal Components are always Orthogonal to one another.
A. True
3. When doing tSNE analysis, setting the Perplexity to a high number will tend to have less well defined groupings.
4. When doing tSNE analysis, setting the Perplexity to a high number will tend to have more well defined groupings.
A. False
5. In PCA analysis, the vectors represent a LINEAR relationship in the data.
A. True
6. Assume that you have 3 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 3
7. Principal Components are always Independent to one another.
A. True
8. In the R programming language, the "prcomp" function allows for scoring data using the "predict" command.
A. True
9. Assume that you have 8 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 8
10. When doing tSNE analysis, setting the Perplexity to a low number will tend to favor global aspects of the data. High numbers will tend to favor local data.
A. False
11. Assume that you have 3 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 3
12) When doing tSNE analysis, setting the Perplexity to a low number will tend to favor global aspects of the data. High numbers will tend to favor local data.
A. False
13) When doing tSNE analysis, setting the Perplexity to a high number will tend to have more well defined groupings.
A. False
14) Assume that you have 2 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 2
15) tSNE vectors are always Orthogonal to one another.
A. False
16) tSNE vectors are always Orthogonal to one another.
A. False
17) Assume that you have 8 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 8
18) Assume that you have 8 continuous variables in your data set, how many Principal Components will be created if you do a tSNE Analysis using Rtsne?
A. 2 or 3
19) In PCA analysis, the vectors represent a NON LINEAR relationship in the data.
A. False
20) Assume that an input data set has four variables: A, B, C, D and they are used to create four Principal Components: PC1, PC2, PC3, and PC4. If A, B, C, D are all highly correlated, then what do you know about the correlation of PC1, PC2, PC3, and PC4?
A. PC1, PC2, PC3, and PC4 are completely uncorrelated from one another.
21) In tSNE analysis, the vectors represent a LINEAR relationship in the data.
A. False
22) Given the following Scree Plot, how many Principal Components should be used?
A. 2 or possibly 3 Principal Components
23) In the R programming language, the "Rtsne" function allows for scoring data using the "predict" command.
A. False
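Several of the PCA answers above can be checked with `prcomp` from base R: one component per input variable, orthogonal loading vectors, and uncorrelated component scores. The sketch uses the four numeric columns of the built-in iris data as a stand-in; choosing to center and scale is an assumption.

```r
# PCA on the four continuous iris variables
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# One Principal Component is created per input variable
ncol(pca$rotation)

# Loadings are orthogonal: their cross-product is the identity matrix
round(crossprod(pca$rotation), 10)

# Scores are uncorrelated: off-diagonal correlations are zero
round(cor(pca$x), 10)
```

The iris measurements are themselves strongly correlated, yet the component scores come out uncorrelated, which is the point of the four-variable question above.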
To answer this question, please refer to the CRAN Packages web page referred to in the course material.
Which of these packages are used for Optical Character Recognition?
A. abbyyR
Using the iris data set in R, generate a box plot by Species of the variable Petal Width.
Using the iris data set in R, generate a box plot by Species of the variable Sepal Width.
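A minimal sketch for the two box plot exercises above, using the formula interface of base R's `boxplot`. The titles and axis labels are illustrative choices.

```r
# Box plot of Petal Width by Species
bp_petal <- boxplot(Petal.Width ~ Species, data = iris,
                    main = "Petal Width by Species", ylab = "Petal.Width")

# Box plot of Sepal Width by Species
bp_sepal <- boxplot(Sepal.Width ~ Species, data = iris,
                    main = "Sepal Width by Species", ylab = "Sepal.Width")
```

`boxplot` also returns the plotted statistics invisibly, so `bp_petal$stats` holds the five-number summary for each species.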
1. What are the two commands that will return the first and last six rows of a Data Frame?
A. head, tail
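For example, on the built-in iris data:

```r
first_rows <- head(iris)   # first six rows by default
last_rows  <- tail(iris)   # last six rows by default

first_rows
last_rows
```

Both functions take an `n` argument, such as `head(iris, n = 10)`, to return a different number of rows.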
2. The R programming language has data sets that are pre-loaded. One of these data sets is the "iris" data set. What command will give you information about this data set?
A. iris
3. In the R data set, ChickWeight, calculate the median weight value by Diet. What is the median weight of a chick that has Diet=1?
A. 88.0
4. In the R data set, ChickWeight, calculate the median weight value by Diet. What is the median weight of a chick that has Diet=2?
A. 104.5
5. In the R data set, ChickWeight, calculate the median weight value by Diet. What is the median weight of a chick that has Diet=3?
A. 125.5
6. In the R data set, ChickWeight, calculate the median weight value by Diet. What is the median weight of a chick that has Diet=4?
A. 129.5
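The per-Diet medians asked for above can be computed in one call with `aggregate`. This is a sketch using the formula interface; the result can be compared against the answers listed above.

```r
# Median chick weight for each Diet group in the built-in ChickWeight data
med_by_diet <- aggregate(weight ~ Diet, data = ChickWeight, FUN = median)
med_by_diet
```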
7. To answer this question, please refer to the CRAN Packages web page referred to in the course material. Which of these packages are used for Reliability and Scoring Routines?
A. AATtools
8. How many records are in the predefined data set named "trees"?
A. 31
9. There is no guarantee that an R Package included in CRAN will be maintained and "up to date".
A. True
10. Which of these packages are used for Combining Multidimensional Arrays?
A. abind
11. How many records are in the predefined data set named "cars"?
A. 50
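Record counts for the built-in data sets can be confirmed with `nrow`:

```r
n_trees_rows <- nrow(trees)   # records in the "trees" data set
n_cars_rows  <- nrow(cars)    # records in the "cars" data set

n_trees_rows
n_cars_rows
```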
12. Which of these packages are used for Bayesian approximation?
A. abc
13. If an R Package is included in CRAN it is guaranteed to be regularly updated, and will always be "up to date".
A. False
R
WEEK-7 QUIZ
1. When doing tSNE analysis, setting the Perplexity to a low number will tend to favor local aspects of the data. High numbers will tend to favor global data.
A. True
2. In the R programming language, the "Rtsne" function allows for scoring data using the "predict" command.
A. False
3. Assume that an input data set has four variables: A, B, C, D and they are used to create four Principal Components: PC1, PC2, PC3, and PC4. If A, B, C, D are all highly correlated, then what do you know about the correlation of PC1, PC2, PC3, and PC4?
A. PC1, PC2, PC3, and PC4 are completely uncorrelated from one another.
4. tSNE vectors are always Independent to one another.
A. False
5. Assume that you have 3 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 3
6. Principal Components are always Orthogonal to one another.
A. True
7. tSNE vectors are always Orthogonal to one another.
A. False
8. When doing tSNE analysis, setting the Perplexity to a low number will tend to favor global aspects of the data. High numbers will tend to favor local data.
A. False
9. Assume that you have 2 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 2
10. Principal Components are always Orthogonal to one another.
A. True
11. In the R programming language, the "Rtsne" function allows for scoring data using the "predict" command.
A. False
12. In tSNE analysis, the vectors represent a LINEAR relationship in the data.
A. False
13. Assume that you have 8 continuous variables in your data set, how many Principal Components will be created if you do a tSNE Analysis using Rtsne?
A. 2 or 3
14. In tSNE analysis, the vectors represent a NON LINEAR relationship in the data.
A. True
15. When doing tSNE analysis, setting the Perplexity to a high number will tend to have less well defined groupings.
A. False
16. In the R programming language, the "prcomp" function allows for scoring data using the "predict" command.
A. True
17. In PCA analysis, the vectors represent a NON LINEAR relationship in the data.
A. False
18. In PCA analysis, the vectors represent a LINEAR relationship in the data.
A. True
20. When doing tSNE analysis, setting the Perplexity to a high number will tend to have more well defined groupings.
A. True
21. Assume that you have 8 continuous variables in your data set, how many Principal Components will be created if you do a PCA Analysis?
A. 8
22. Given the following Scree Plot, how many Principal Components should be used?
A. 1 or possibly 2 Principal Components
23. Principal Components are always Independent to one another.
A. True
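The "scoring" mentioned in the prcomp question above can be sketched as follows: fit the PCA on one set of rows, then project new rows onto the same components with `predict`. The split of iris into the first 100 and last 50 rows is arbitrary and only for illustration.

```r
# Fit PCA on a training portion of iris (arbitrary split, for illustration)
train    <- iris[1:100, 1:4]
new_rows <- iris[101:150, 1:4]
pca_fit  <- prcomp(train, center = TRUE, scale. = TRUE)

# Score unseen rows using the centering, scaling, and loadings from the fit
scores <- predict(pca_fit, newdata = new_rows)
dim(scores)     # one row per new record, one column per component
head(scores)
```

The quiz answers above state that `Rtsne` cannot score data this way, so a tSNE embedding cannot be applied to new records with `predict`.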