Lesson 1 Quiz
One of the key differences between business analytics and data science is their primary focus either on business problems or on mathematical algorithms.
True
Analytics and analysis are essentially the same thing; they both focus on the granular level representation of complex problems through decomposition of the whole into its lower-level parts.
False
If a data scientist is analyzing historical data to identify problems and root causes, he/she is essentially conducting descriptive analytics.
True
ERP stands for enterprise resource planning and is used for the integration of company-wide data.
True
The most important driver behind the popularity of business analytics is the need for business managers to make experience- and intuition-driven business decisions.
False
Business analytics and data science have the same purpose: to convert data into actionable insight through an algorithm-based discovery process.
True
Major commercial business intelligence products and services were established in the early 1970s.
False
If I am distributing funds to different financial products to maximize return, I am essentially doing descriptive analytics.
False
Today, analytics can be defined simply as "the discovery of information/knowledge/insight in data.”
True
Business intelligence is a broad concept that also includes business analytics within its simple taxonomy.
False
Analytics is the art and science of discovering insight to support accurate and timely decision making.
True
Business analytics is the process of developing computer code and novel IT frameworks.
False
Organizations apply analytics to business problems to identify problems, foresee future trends, and make the best possible decisions.
True
DeepQA is a massively parallel, web mining focused, probabilistic computational algorithm developed by the SAS Institute.
False
Descriptive analytics, also called business intelligence, is the entry level in the analytics taxonomy.
True
Lesson 1 Post Assessment
What are the main roadblocks to the adoption of analytics?
All of these
Jim, the marketing manager in the company, is interested in the sales numbers in the south region by each product type for the last six months. What type of analytics would you use to help him?
Descriptive
Which of the following developments is not contributing to facilitating the growth of decision support and analytics?
Locally concentrated workforces
What type of analytics seeks to identify the courses of action to achieve the best performance possible?
Prescriptive
If Jack is interested in identifying the optimal quantity of purchase orders in order to minimize the overall cost, which of the following types of analytics should he use?
Prescriptive
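To make the purchase-order question above concrete, here is a minimal sketch of the prescriptive (optimization) idea, assuming the classic economic order quantity (EOQ) setup; the demand and cost figures are hypothetical, not from the course.

```python
from math import sqrt

# Hypothetical inputs: annual demand, cost per purchase order, and annual holding cost per unit
annual_demand = 12_000      # units per year
order_cost = 50.0           # dollars per purchase order
holding_cost = 2.4          # dollars per unit per year

# Classic EOQ formula: the order quantity that minimizes total ordering + holding cost
eoq = sqrt(2 * annual_demand * order_cost / holding_cost)

# Total annual cost at the optimal quantity
total_cost = (annual_demand / eoq) * order_cost + (eoq / 2) * holding_cost
print(f"Optimal order quantity: {eoq:.0f} units, total annual cost: ${total_cost:,.2f}")
```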
Firms have used analytics to enhance which of the following business activities?
All of these
Which of the following is not commonly used as an enabler of descriptive analytics?
Data mining
Lesson 2 Quiz
1. Association patterns can include capturing the sequence of events and things.
True
2. Cubes in OLAP are defined as a multidimensional representation of the data stored in and retrieved from data warehouses.
True
3. Prediction modeling is often classified under the unsupervised machine learning methods.
False
4. Data mining can be used to predict the results of sporting events to identify means to decrease the odds of winning against a specific opponent.
False
5. In banking and finance, data mining is often used to manage microeconomic movements and overall cash flow outcomes.
False
6. One of the most pronounced reasons for the increasing popularity of data mining is that there are fewer suppliers than the corresponding demand in the business marketplace.
False
7. Novel is a key term in the definition of data mining, which means that the patterns are known by the user within the context of the system being analyzed.
False
8. Segmentation and outlier analysis are part of classification modeling.
False
9. Data mining is primarily concerned with mining (that is, digging out data) from a variety of disparate data sources.
False
10. In the retail industry, association rule mining is frequently called market-basket analysis.
True
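A minimal illustration of the market-basket idea behind association rule mining, using made-up transactions and computing support and confidence for one candidate rule:

```python
# Hypothetical market-basket transactions
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

antecedent, consequent = {"milk"}, {"bread"}
n = len(transactions)

# Support: fraction of all transactions containing both item sets
both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
support = both / n

# Confidence: of the transactions containing the antecedent, the fraction that also contain the consequent
ante = sum(1 for t in transactions if antecedent <= t)
confidence = both / ante

print(f"support(milk -> bread) = {support:.2f}")     # 3/5 = 0.60
print(f"confidence(milk -> bread) = {confidence:.2f}")  # 3/4 = 0.75
```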
11. CRM aims to create one-on-one relationships with customers by developing an intimate understanding of their needs and wants.
True
12. Data mining leverages capabilities of statistics, artificial intelligence, machine learning, management science, information systems, and databases in a systematic and synergistic way.
True
13. The original terminology of data mining commonly refers to discovering known patterns in large and structured data sets.
False
14. Manufacturers use data mining to classify anomalies and commonalities in the production system to improve the manufacturing system.
True
15. Information warfare often refers to identifying and stopping malicious attacks on critical information infrastructures in virtually any and every organization and business.
True
Lesson 2 Post Assessment
In data mining, clustering is classified further into:
segmentation and outlier analysis.
Which of the following is the most commonly used clustering algorithm?
k-means
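A minimal k-means sketch, assuming scikit-learn is available, run on toy two-dimensional points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])

# k-means with k=2: assign each point to the nearest of two centroids,
# then iteratively re-compute the centroids until assignments stabilize
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

print("labels:   ", km.labels_)
print("centroids:", km.cluster_centers_)
```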
What kinds of patterns can data mining discover?
Each correct answer represents a complete solution. Choose all that apply.
Clustering
Classification
Optimization
Forecasting
Association
What are the most common reasons why data mining has gained overwhelming attention in the business world?
All of these
In retailing, data mining is most commonly used to:
predict future sales.
Which of the following statements is true about clustering?
Assigns customers to different segments
What is the primary difference between statistics and data mining?
Statistics starts with a well-defined proposition and hypothesis, whereas data mining starts with a loosely defined discovery statement.
The important part of the KDD process is the feedback loop that allows the process flow to redirect backward, from any step to any other previous steps, for rework and readjustments.
True
A centralized data repository that combines data from a variety of sources to support managerial decisions is known as a data warehouse.
True
In the SEMMA process, the accuracy and usefulness of the models are evaluated in the Assess step.
True
In the SEMMA process, visualization and description of the data are carried out in the Modify step.
False
The CRISP-DM methodology was proposed by Fayyad et al. in 1996.
False
In the model building task, both the CRISP-DM and SEMMA methodologies build and test various models.
True
Define, Explore, Measure, and Assess are the steps involved in the Six Sigma process.
False
In the testing and evaluation step of the CRISP-DM methodology, monitoring and maintenance of the models are important.
False
During the model building step in the CRISP-DM process, the data mining methods and algorithms are applied to the current data set.
True
The Six Sigma process promotes an error-free/perfect business execution.
True
The Modify step in Six Sigma involves the process of assessing the mapping between organizational data repositories and the business problem.
False
In the project finalization task, both the CRISP-DM and SEMMA methodologies prescribe deploying the results.
False
Identifying the most pressing problem and defining the goals and objectives can be done in the Define step of the Six Sigma process.
True
When compared with all other methodologies, CRISP-DM is the most popular data mining process that is being used in data analytics.
True
In the CRISP-DM process, it is not important or necessary to follow the sequential order of each step. That is, the steps can be executed in an arbitrary sequence.
False
During which step of the SEMMA process does the analyst search for unanticipated trends and anomalies to gain a better understanding of the data set?
Explore
Which of the following steps of the CRISP-DM process is commonly called the data preprocessing step that produces the data identified in the data understanding step for analysis?
Data preparation
Which of the following is the most relevant methodology that is used to implement data science and business analytics projects?
CRISP-DM
During which step of the Six Sigma process are the identified data sources consolidated and transformed into a format that is amenable to machine processing?
Measure
Which of the following steps of the CRISP-DM process identifies the relevant data from different sources?
Data understanding
Which of the following substeps are involved in the Sample step of the SEMMA process?
Training, validation, and test
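A minimal sketch of the Sample step's three-way partition, assuming scikit-learn and an arbitrary 60/20/20 split on made-up data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 3 features, binary target
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# First carve out the test set (20%), then split the remainder into training and validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

print(len(X_train), len(X_val), len(X_test))  # 60, 20, 20
```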
Which of the following steps of the CRISP-DM process identifies the goals, purpose, and requirements of the customers?
Business understanding
Customer credit ratings such as bad, fair, and excellent are considered what type of data?
Ordinal
The ratio of accurately classified instances (positives and negatives) divided by the total number of instances is defined as the overall accuracy metric.
True
Handling the missing values in the data is typically performed in the data consolidation phase.
False
The F1 metric is simply the harmonic mean of precision and recall.
True
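A small worked sketch of the classification metrics mentioned in these items, computed directly from hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts for a binary classifier
TP, FP, TN, FN = 40, 10, 45, 5

accuracy    = (TP + TN) / (TP + FP + TN + FN)   # all correct / all instances
precision   = TP / (TP + FP)                    # correct positives / predicted positives
recall      = TP / (TP + FN)                    # correct positives / actual positives (sensitivity)
specificity = TN / (TN + FP)                    # correct negatives / actual negatives
f1          = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} F1={f1:.3f}")
```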
A typical example of interval scale measurement is the temperature on the Celsius scale.
True
Apriori and FP-Growth algorithms are part of the association type data mining tasks.
True
The ratio of correctly classified positives divided by the total positive count is defined as a precision metric.
False
If a classification problem is not binary, you cannot use a confusion matrix to tabulate prediction outcomes.
False
The k-means algorithm is part of the prediction type of data mining methods.
False
The bootstrapping methodology is similar to the leave-one-out methodology, where it can be used to calculate accuracy by leaving out one sample at each iteration of the estimation process.
False
Balancing skewed data means oversampling the more represented class records and undersampling the less represented class records.
False
Decision trees are part of the regression type prediction methods.
False
The multi split methodology partitions data into exactly two mutually exclusive subsets called training set and test set.
False
The purpose of data preparation (commonly called data preprocessing) is to eliminate the possibility of GIGO errors.
True
How and why the model arrives at certain predictions is conveyed by the interpretability characteristic of the prediction method.
True
The area under the ROC curve is a graphical assessment technique for binary classification problems, in which sensitivity is plotted on the y-axis and the specificity is plotted on the x-axis.
False
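A minimal AUC sketch, assuming scikit-learn; note that the conventional ROC plot puts sensitivity (true positive rate) on the y-axis and 1 − specificity (false positive rate) on the x-axis. The labels and scores below are made up.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # fpr = 1 - specificity, tpr = sensitivity
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")
```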
Which clustering method is based on the basic idea that nearby objects are more related to each other than are those that are farther away from each other?
Hierarchical
Which cross-validation methodology achieves random sampling of a fixed number of instances from the original data with replacement to construct the training data set?
Bootstrapping
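A minimal sketch of bootstrap sampling (random sampling with replacement) to build a training set, with the out-of-bag rows held back for testing; the data is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10
X = np.arange(n)  # stand-in for 10 data rows

# Draw n row indices with replacement: some rows repeat, some are left out
boot_idx = rng.integers(0, n, size=n)
oob_idx = np.setdiff1d(np.arange(n), boot_idx)   # out-of-bag rows that were not drawn

print("bootstrap sample:", X[boot_idx])
print("out-of-bag rows: ", X[oob_idx])
```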
Which classification method uses conditional probabilities to build classification models?
Bayesian classifiers
Which of the following is defined as the ratio of correctly classified negatives divided by the total negative count?
Specificity
Which of the following factors refers to a model's ability to make reasonably accurate predictions, given noisy data or data with missing and erroneous values?
Robustness
Which method takes into account the partial membership of class labels to predefined categories while building models for classification problems?
Rough sets
Time series is a sequence of data points of interest measured and represented at consecutive and regular time intervals.
True
In linear regression, the independence of errors assumption is also known as homoscedasticity.
False
Multicollinearity can be triggered by having two or more perfectly correlated explanatory variables present in the model.
True
In linear regression, hypothesis testing reveals the existence of relationships between explanatory variables.
False
The Naive Bayes method requires output variables to have numeric values.
False
In prediction, linear regression uses a mathematical equation to identify additive mathematical relationships between explanatory variables and the response variable.
True
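A minimal sketch of that additive linear form, y ≈ b0 + b1·x1 + b2·x2, fit by ordinary least squares with scikit-learn on made-up data with known coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((50, 2))                                              # two explanatory variables
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.05, 50)    # known additive relationship plus noise

model = LinearRegression().fit(X, y)
print("intercept:", round(model.intercept_, 2))        # close to 3.0
print("coefficients:", np.round(model.coef_, 2))       # close to [2.0, -1.5]
```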
In the normality of error assumption of linear regression, the response variables' values are expected to be randomly distributed.
False
In time-series forecasting, an estimator's mean squared error measures the average absolute error between the estimated and the actual values.
False
Correlation is meant to represent the linear relationships between two nominal input variables.
False
k-NN is a prediction method used not only for classification but also for regression-type prediction problems.
True
To deploy a developed SVM model, the model coefficients can be extracted and integrated directly into the decision support system.
True
Logistic regression is like linear regression in that both are used to predict a numeric target variable.
False
Linear regression aims to capture the functional relationships between one or more numeric input variables and a categorical output variable.
False
Homoscedasticity states that the response variables must have the same variance in their error, regardless of the explanatory variables' values.
True
In the SVM model, normalization's main benefit is to avoid having attributes in greater numeric ranges dominate those in smaller numeric ranges.
True
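A minimal sketch of scaling before SVM training, assuming scikit-learn and toy data; without scaling, the large-range feature would dominate the kernel's distance computations.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Toy data: feature 0 ranges in the thousands, feature 1 stays between 0 and 1
X = np.array([[1000, 0.2], [2000, 0.8], [1500, 0.3],
              [3000, 0.9], [1200, 0.1], [2800, 0.7]])
y = np.array([0, 1, 0, 1, 0, 1])

# MinMaxScaler rescales every attribute into [0, 1] before the SVM sees it
clf = make_pipeline(MinMaxScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.predict([[2500, 0.6]]))
```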
In prediction analytics, variance refers to the error, and bias refers to the consistency in the predictive accuracy of models applied to other data sets.
False
A data set is imbalanced when the distribution of different classes in the input variables is significantly dissimilar.
False
Overfitting is the notion of making the model too specific to the training data to capture not only the signal but also the noise in the data set.
True
Information fusion type model ensembles utilize meta-modeling called super learners.
False
Bias is often defined as the difference between a model's prediction output and the actual values for a given prediction problem.
True
Model ensembles are known to be more robust against outliers and noise in the data compared to individual models.
True
Bagging type ensembles can be used in both regression and classification type prediction problems.
True
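A minimal bagging sketch, assuming scikit-learn; BaggingClassifier covers the classification case shown here, and BaggingRegressor covers the regression case. The synthetic data is only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Bagging: each base learner (a decision tree by default) is trained on a
# bootstrap sample of the rows; predictions are combined by majority vote
bag = BaggingClassifier(n_estimators=50, random_state=0)
print("CV accuracy:", round(cross_val_score(bag, X, y, cv=5).mean(), 3))
```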
In explainable AI, the LIME and SHAP methods are considered as global interpreters.
False
Sensitivity analysis based on the leave-one-out methodology can be applied to any predictive analytics method because of its model agnostic implementation methodology.
True
A model with low variance is the one that captures both noise and generalized patterns in the data and therefore produces an overfit model.
False
In ensemble modeling, bagging uses the bootstrap sampling of cases to create a collection of decision trees.
True
Model ensembles are much easier and faster to develop than individual models.
False
In ensemble modeling, boosting builds several independent simple trees for the resultant prediction model.
False
Underfitting is mainly characterized on the bias–variance trade-off continuum as a low-bias/low-variance outcome.
False
Sensitivity analysis based on input value perturbation is often used in trained feed-forward neural network modeling, where all of the input variables are numeric and standardized.
True
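A minimal sketch of perturbation-based sensitivity analysis on a trained model, assuming scikit-learn and standardized numeric inputs; each input is nudged in turn and the average change in the predictions is recorded. The data and network size are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = 4 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)   # input 0 matters most by construction

Xs = StandardScaler().fit_transform(X)
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=1).fit(Xs, y)

base = net.predict(Xs)
for j in range(Xs.shape[1]):
    Xp = Xs.copy()
    Xp[:, j] += 0.5                                   # perturb one standardized input at a time
    delta = np.abs(net.predict(Xp) - base).mean()     # larger change = more influential input
    print(f"input {j}: mean |prediction change| = {delta:.3f}")
```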
Lesson 7 Quiz
Clustering is a supervised learning process in which objects are assigned to a predetermined number of artificial groups called clusters.
False
Text-to-speech is a text processing function that reads textual content and detects and corrects syntactic and semantic errors.
False
In the context of the text mining process, both structured and unstructured data are extracted from the data sources and converted into context-specific knowledge.
True
SCM and ERP are the first two beneficiaries of the NLP and WordNet.
False
True
True
False
False
True
Automatic summarization is a program that is used to assign documents into a predefined set of categories.
False
True
True
In the context of text mining, lemmatization is a process of syntactically reducing words to their stem/root form.
False
False
False
In the context of text mining, which of the following is a part of NLP that studies the internal structure of words (that is, the patterns of word formation within a language or across languages)?
Morphology
Which of the following are the most commonly used normalization methods?
Log, binary, and inverse document frequencies
Which of the following are the best options available to manage the TDM matrix size?
Labor-intensive process, eliminate terms, and singular value decomposition
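A minimal sketch of the singular value decomposition option mentioned above: build a term-document matrix with TF-IDF weighting, then collapse it to a few latent dimensions, assuming scikit-learn and three made-up documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "data mining discovers patterns in large data sets",
    "text mining extracts knowledge from unstructured text",
    "association rules find items that are bought together",
]

# Term-document matrix with TF-IDF (inverse document frequency) normalization
tdm = TfidfVectorizer().fit_transform(docs)
print("original matrix shape:", tdm.shape)

# Singular value decomposition collapses the many term columns into a few latent dimensions
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(tdm)
print("reduced matrix shape:", reduced.shape)
```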
Which of the following are the common challenges that are associated with the implementation of NLP?
All of these
Which of the following is not among the steps involved in sentiment analysis?
Latent Dirichlet allocation
In the knowledge extraction method of the text mining process, ____________ refers to the natural grouping, analysis, and navigation of large text collections, such as web pages.
Clustering
Which of the following applications utilize the capabilities of text mining?
Marketing applications
Security applications
Biomedical applications
In which of the following categories of knowledge extraction method is the task of text categorization achieved?
Classification
Hadoop is an open-source framework for processing, storing, and analyzing massive amounts of distributed, wide variety of data.
True
The term velocity in big data analytics refers to how fast digitized data is created and processed.
True
Big data comes from a variety of sources within an organization, including marketing and sales transactions, inventory records, financial transactions, and human resources and accounting records.
False
Hadoop is a batch-oriented computing framework, which implies it does not support real-time data processing and analysis.
True
A stream in stream analytics is defined as a discrete and aggregated level of data elements.
False
MapReduce is a contemporary programming language designed to be used by computer programmers.
False
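MapReduce is a programming model rather than a language; a minimal pure-Python word-count sketch of the map/shuffle/reduce idea, with made-up input documents:

```python
from collections import defaultdict

documents = ["big data needs big storage", "hadoop stores big data"]

# Map: emit (word, 1) pairs from every document
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the intermediate pairs by key (the word)
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum the counts for each word
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)   # e.g. {'big': 3, 'data': 2, ...}
```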
Among the variety of factors, the key driver for big data analytics is the business needs at any level, including strategic, tactical, or operational.
True
Grid computing increases efficiency, lowers total cost, and enhances production by processing computational jobs in a shared, centrally managed pool of ordinary computing resources.
True
HDFS (Hadoop Distributed File System) was invented before Google developed MapReduce. Hence, the early versions of MapReduce relied on HDFS.
False
The main benefit of Hadoop is that it allows enterprises to process and analyze large volumes of structured and semi-structured data on specialized hardware.
False
Hadoop is not just about data volume but also about processing a diversity of data types.
True
A data scientist's main objective is to organize and analyze large amounts of data, to solve complex problems, often using software specifically designed for the task.
True
The term veracity in big data analytics refers to the processing of different types and formats of data, structured and unstructured.
False
Hadoop is a replacement for a data warehouse which stores and processes large amounts of structured data.
False
In typical data stream mining applications, the purpose is to predict the class or value of new instances in the data stream, given some knowledge about the class membership or values of previous instances in the data stream.
True
The main characteristic of deep learning solutions is that they use AI (artificial intelligence) to understand and organize data, predict the intent of a search query, improve the relevancy of results, and automatically tune the relevancy of results over time.
False
Human–computer interaction is a critical component of cognitive systems that allows users to interact with cognitive machines and define their needs.
True
Deep learning analytics is a term that refers to cognitive computing-branded technology platforms, such as IBM Watson, that specialize in processing and analyzing large, unstructured data sets.
False
In a typical neural network, the goal of the testing process is to adjust the network weights and biases such that the network output for each set of inputs is adequately close to its corresponding target value.
False
Connection weights are the key elements of an artificial neural network (ANN). They produce the final value through the summation and transfer function.
False
AI (artificial intelligence) has the capability to find hidden patterns in a variety of data sources to identify problems and provide potential solutions.
True
Cognitive computing has the capability to simulate human thought processes to assist humans in finding solutions to complex problems.
True
In artificial neural networks, neurons are processing units, also called processing elements, that perform predefined mathematical operations on the numeric values from the input variables or the other neuron outputs to create and push out their own outputs.
True
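A minimal sketch of such a processing element: a weighted sum of the inputs plus a bias, pushed through a sigmoid transfer function. The inputs, weights, and bias are hypothetical.

```python
import math

def neuron(inputs, weights, bias):
    # Summation function: weighted sum of the inputs plus the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Transfer (activation) function: a sigmoid squashing z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(neuron([0.5, 0.2, 0.8], weights=[0.4, -0.6, 0.9], bias=0.1))
```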
The term long short-term memory network refers to a network that is used to remember what happened in the past for a long enough time that it can be leveraged in accomplishing the task when needed.
True
Multilayer perceptron type deep networks are also known as feedforward networks because the flow of information through them always moves forward, and no feedback connections are allowed.
True
In representation learning, the emphasis is on automatically discovering the features to be used for analytics purposes.
True
Delta (or an error) is defined as the difference between the network weights in two consecutive iterations.
False
The purpose of artificial intelligence is to augment human capability.
False
The main characteristic of the convolutional networks is having at least one layer involving a convolution weight function instead of general matrix multiplication.
True
Deep learning is an extension of neural networks that deal with more complicated tasks with a higher level of sophistication by employing many layers of connected neurons.
True