Fastai Course Chapter 2 Q&A on WSL2 | by David Littlefield



An answer key for the questionnaire at the end of the chapter

Image by Adam Do

The 2nd chapter of the textbook provides an overview of building a deep learning model and taking it into production. It covers the capabilities, limitations, challenges, and considerations that are related to building the model. It also covers the challenges and considerations that are related to deploying the model into production.

We’ve spent many weeks writing the questionnaires. And the reason for that, is because we tried to think about what we wanted you to take away from each chapter. So if you read the questionnaire first, you can find out which things we think you should know before you move on, so please make sure to do the questionnaire before you move onto the next chapter.

— Jeremy Howard,

1. Where do text models currently have a major deficiency?

Text models still struggle to produce factually correct responses when asked questions about factual information. They can generate responses that appear compelling to the layman but are entirely incorrect. This deficiency can also be attributed to challenges in natural language processing that affect accuracy, such as contextual words, homonyms, synonyms, sarcasm, and ambiguity.

2. What are the possible negative societal implications of text generation models?

The negative societal implications of text generation models include fake news and the spread of disinformation. These models could be used to produce compelling content on a massive scale with far greater efficiency and lower barriers to entry. They could also be used to carry out socially harmful activities that rely on text such as spam, phishing, abuse of legal and government processes, fraudulent academic essay writing, and social engineering pretexting.

3. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

A good alternative to fully automating the process is augmented intelligence, which expects humans to interact closely with the models. According to the textbook, it can make humans orders of magnitude more productive than using strictly manual methods. It can also produce more accurate processes than using humans alone.

4. What kind of tabular data is deep learning particularly good at?

Deep learning is particularly good at analyzing tabular data that contains columns with plain text and high-cardinality categorical variables which have many possible values. It can outperform popular machine learning algorithms under these conditions. It also takes longer to train, is harder to interpret, involves hyperparameter tuning, and requires GPU hardware.

5. What’s a key downside of directly using a deep learning model for recommendation systems?

A key downside of directly using a deep learning model for recommendation systems is that nearly all models only recommend products the user might like rather than products they might need or find useful. They tend to recommend similar products based on purchase history, product sales, and product ratings. They also struggle to recommend novel products that haven’t been discovered by many users yet.

6. What are the steps of the Drivetrain Approach?

The Drivetrain Approach is a framework that’s used in machine learning to design a system that can solve a complex problem. It uses data to produce actionable outcomes rather than just generate more data in the form of predictions. It also uses the following 4-step process to build data products:

  1. Define a clear outcome you want to achieve
  2. Identify the levers you can pull to influence the outcome
  3. Consider the data you would need to produce the outcome
  4. Determine the models you can build to achieve the outcome

7. How do the steps of the Drivetrain Approach map to a recommendation system?

The outcome is to capture additional sales by recommending products to customers who wouldn’t have purchased them without the recommendation. The lever is the method that’s used to choose the recommendations that are shown to customers. The data is collected to identify the recommendations that cause new sales, which requires conducting randomized experiments that test a wide range of recommendations for a wide range of customers.

The model is actually two models that predict the purchase probability for products based on whether customers were shown the recommendation. It computes the difference between the purchase probabilities to decide the best recommendations to display. It also accounts for customers that ignore recommendations and would’ve purchased without the recommendation.
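
The two-model idea above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the textbook’s implementation: the product names and probabilities are made up, and `uplift` and `best_recommendation` are hypothetical helpers.

```python
# Hypothetical purchase probabilities for one customer (made-up numbers).
# p_shown: probability of purchase if the product is recommended.
# p_not_shown: probability of purchase if it is not recommended.
p_shown = {"novel": 0.30, "headphones": 0.12, "kettle": 0.08}
p_not_shown = {"novel": 0.28, "headphones": 0.03, "kettle": 0.05}


def uplift(product):
    """Extra purchase probability caused by showing the recommendation."""
    return p_shown[product] - p_not_shown[product]


def best_recommendation(products):
    """Pick the product whose recommendation adds the most purchase probability."""
    return max(products, key=uplift)


best = best_recommendation(p_shown)
print(best)  # headphones
```

The novel has the highest purchase probability, but the customer would likely have bought it anyway, so recommending the headphones is what changes the outcome.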

8. Create an image recognition model using data you curate and deploy it on the web.

The textbook recommends deploying the initial prototype of an application as an interactive Jupyter Notebook using Binder. It allows users to create sharable notebooks that can be accessed with a single link. It also assigns a virtual machine to run the application which allocates the storage space to store all the files that are needed to run the Jupyter Notebook in the cloud.

9. What are DataLoaders?

DataLoaders is a class that’s used in Fastai to store the DataLoader objects that are passed to it and make them available as the training and validation sets. A DataLoader is the class that’s used in PyTorch and Fastai to feed the data from the dataset to the model in the format that’s needed. It also mostly gets used for batching the data, shuffling the data, and loading the data in parallel.
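
The batching and shuffling that a data loader performs can be sketched in plain Python. This is only an illustration of the idea, not the PyTorch or Fastai class: `iterate_batches` is a hypothetical helper, the file names are made up, and parallel loading is left out.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of up to batch_size items (the last batch may be shorter)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# A toy dataset of (input, label) pairs.
dataset = [(f"image_{i}.jpg", i % 2) for i in range(10)]
batches = list(iterate_batches(dataset, batch_size=4, shuffle=True, seed=0))

print([len(batch) for batch in batches])  # [4, 4, 2]
```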

10. What four things do we need to tell Fastai to create DataLoaders?

Data Block is a class that’s used in Fastai to build datasets and data loaders objects. It must specify the blocks, get_items, splitter, and get_y parameters to build the data loaders object. It can also use various combinations of the parameters to build different types of data loaders for deep learning models.

  1. blocks: Sets the functions for the input (left) and output (right) type
  2. get_items: Sets the function that retrieves the items, such as get_image_files for image file paths
  3. splitter: Sets the function for splitting the training and validation sets
  4. get_y: Sets the labels function that extracts the labels from the dataset

11. What does the splitter parameter to DataBlock do?

Splitter is a parameter in the DataBlock class that’s used in Fastai to split the dataset into subsets. It sets the function that defines how to split the dataset into training and validation subsets. It mostly uses the RandomSplitter function to randomly split the data, but Fastai also provides several other splitters, such as GrandparentSplitter, which splits by folder name.

12. How do we ensure a random split always gives the same validation set?

Random Seed is a number that’s used in machine learning to initialize the random number generator. It makes the random number generator produce the same sequence of numbers every time, so passing a fixed seed to RandomSplitter selects the same validation set on every run. It also lets users train the model with the same code and data to produce comparable results.
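
The effect of a fixed seed on a random split can be sketched with the standard library. This is a simplified stand-in for a random splitter rather than Fastai’s own code, and `random_split` is a hypothetical helper.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_indices, valid_indices) for a dataset of n_items."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # private generator, seeded once
    n_valid = int(n_items * valid_pct)
    return indices[n_valid:], indices[:n_valid]


# The same seed gives the same validation set on every call.
train_a, valid_a = random_split(100, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(100, valid_pct=0.2, seed=42)

print(valid_a == valid_b)  # True
```

Leaving the seed as None draws a fresh shuffle each time, so the validation set would change between runs and the results would be harder to compare.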

13. What letters are often used to signify the independent and dependent variables?

The Independent Variable is the variable that’s used in machine learning to represent the input value that’s being manipulated. Its value is expected to affect the output value but it’s not affected by any of the other variables in the experiment. It also usually gets signified by the letter “x” in equations.

The Dependent Variable is the variable that’s used in machine learning to represent the output value that’s being predicted. Its value depends on the independent variable which means it only changes when the independent variable changes. It also usually gets signified by the letter “y” in equations.

14. What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

Crop is a technique that’s used in data augmentation to crop the images to fit a square shape of the requested size. It can help the model generalize better by adding images to the training set where the object isn’t fully visible. It can also lose important details in the images that get cropped out.

Pad is a technique that’s used in data augmentation to add pixels on each side of the images. It can help resize the images to the size that the model expects where the aspect ratio is preserved. It can also waste computation on blank spaces and lower the resolution of the useful part of the images.

Squish is a technique that’s used in data augmentation to either squeeze or stretch the images. It can help resize the images to the size that the model expects where the aspect ratio isn’t preserved. It can also cause unrealistic proportions in the images which confuses the model and lowers accuracy.

Each of the techniques has its disadvantages so the best technique depends on the problem and dataset. The textbook suggests randomly cropping a different part of each image, which helps the model learn to focus on different things in the images. It also reflects how images work in the real world where the same object might be framed in different ways.
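
The trade-offs between the three approaches come down to simple arithmetic, which can be sketched as follows. This is not how Fastai implements resizing: `resize_plan` is a hypothetical helper that only reports the scale factors and how many pixels are cut off or filled in along the affected side.

```python
def resize_plan(width, height, size, method):
    """Describe how a width x height image becomes a size x size square."""
    if method == "squish":
        # Each axis is scaled independently, so the aspect ratio changes.
        return {"scale_x": size / width, "scale_y": size / height,
                "cropped": 0, "padded": 0}
    if method == "crop":
        # Scale the shorter side to `size`, then cut the excess off the longer side.
        scale = size / min(width, height)
        excess = round(max(width, height) * scale) - size
        return {"scale_x": scale, "scale_y": scale, "cropped": excess, "padded": 0}
    if method == "pad":
        # Scale the longer side to `size`, then fill the gap on the shorter side.
        scale = size / max(width, height)
        gap = size - round(min(width, height) * scale)
        return {"scale_x": scale, "scale_y": scale, "cropped": 0, "padded": gap}
    raise ValueError(f"unknown method: {method}")


for method in ("crop", "pad", "squish"):
    print(method, resize_plan(400, 200, 100, method))
```

For a 400 x 200 image resized to 100 x 100, cropping discards half of the resized width, padding leaves half of the square blank, and squishing shrinks the width twice as much as the height.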

15. What is data augmentation? Why is it needed?

Data Augmentation is a technique that’s used in machine learning to artificially increase the size of a training dataset by creating modified versions of the images in the dataset. It can involve flipping, rotating, scaling, padding, cropping, translating, and transforming images. It can also help prevent overfitting when training machine learning models.
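
Two of the transformations listed above, flipping and rotating, can be sketched on a tiny image stored as a list of pixel rows. This is a toy illustration with hypothetical helper names, not the transforms that Fastai applies.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [[1, 2, 3],
         [4, 5, 6]]

# One original image becomes three training examples.
augmented = [image, flip_horizontal(image), rotate_90_clockwise(image)]

print(augmented[1])  # [[3, 2, 1], [6, 5, 4]]
print(augmented[2])  # [[4, 1], [5, 2], [6, 3]]
```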

16. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.

The bear classification model might work poorly in production when the training data doesn’t match the production data. This could happen because the training images were downloaded from the internet, where bears are shown more clearly and artistically than they would appear in the real world.

17. What is the difference between item_tfms and batch_tfms?

Item Transforms is the parameter that’s used in Fastai to apply one or more transformations to each individual image using the CPU before the images are grouped into batches. It usually gets used to resize all the images to the same size, which is required before they can be collated into a batch.

Batch Transforms is the parameter that’s used in Fastai to apply one or more transformations to the batches after they are formed. It relies on item transforms to have resized all the images to the same size, which lets it apply the batch transformations to a whole batch at once using the GPU.
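
The order of the two stages can be sketched with lists of numbers standing in for images. This is a simplified illustration under stated assumptions: `resize_item` and `scale_batch` are hypothetical helpers, and the GPU is not modeled.

```python
def resize_item(item, size):
    """Item transform: crop or zero-pad one sequence to a fixed length."""
    return (item + [0] * size)[:size]


def scale_batch(batch, factor):
    """Batch transform: apply the same operation to every item in one step."""
    return [[value * factor for value in item] for item in batch]


# Items of different sizes cannot be stacked into a rectangular batch yet.
items = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]

resized = [resize_item(item, size=3) for item in items]  # one item at a time
batch = scale_batch(resized, factor=2)                   # whole batch at once

print(resized)  # [[1, 2, 3], [6, 7, 0], [8, 9, 10]]
print(batch)    # [[2, 4, 6], [12, 14, 0], [16, 18, 20]]
```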

18. What is a confusion matrix?

The Confusion Matrix is a table that’s used in machine learning to evaluate the performance of the classification model. It compares the actual labels to the predicted values and provides a holistic view of how well the model is performing. It also displays the actual labels in the rows and the predicted values in the columns where the diagonal squares represent the correct predictions and the rest of the squares represent the incorrect predictions.
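
The layout described above can be sketched by counting label pairs. This is a minimal example with made-up labels and predictions, and `confusion_matrix` is a hypothetical helper rather than the Fastai interpretation class.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["black", "grizzly", "teddy"]
actual = ["black", "black", "grizzly", "grizzly", "teddy", "teddy"]
predicted = ["black", "grizzly", "grizzly", "grizzly", "teddy", "black"]

matrix = confusion_matrix(actual, predicted, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))  # the diagonal

for label, row in zip(labels, matrix):
    print(label, row)
print(correct, "of", len(actual), "correct")  # 4 of 6 correct
```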

19. What does export save?

Export is a function that’s used in Fastai to save the trained model to make predictions in production. It saves everything that’s needed to rebuild the learner which includes the architecture and trained parameters. It also includes the data loader parameters that define how to transform the data.
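
The idea of bundling everything into one file can be sketched with the standard pickle module, which is also the format of the exported file. The dictionary below is a made-up stand-in for a learner, and its keys and values are assumptions for illustration only.

```python
import pickle

# A hypothetical bundle holding the three things described above.
learner_state = {
    "architecture": "resnet18",
    "parameters": {"layer1.weight": [0.1, -0.2], "layer1.bias": [0.0]},
    "data_config": {"resize": 128, "vocab": ["black", "grizzly", "teddy"]},
}

blob = pickle.dumps(learner_state)  # what would be written to disk
restored = pickle.loads(blob)       # what production code would load

print(restored["data_config"]["vocab"])  # ['black', 'grizzly', 'teddy']
```

Because the data configuration travels with the parameters, the production code applies the same transformations and label names that were used during training.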

20. What is it called when we use a model for making predictions, instead of training?

Inference is the process of using the trained model to make predictions about unseen data. It can make predictions by performing the forward pass without including the backward pass to compute the error and update the weights. It can also be optimized to improve the throughput, response time, and power and memory consumption before being used in the real world.
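
A forward pass without any training step can be sketched with a tiny linear classifier. This is a minimal illustration: the weights, biases, and labels are made-up values standing in for trained parameters, and `predict` is a hypothetical helper.

```python
import math

# Hypothetical trained parameters for a 2-feature, 3-class linear classifier.
weights = [[2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.5]
labels = ["black", "grizzly", "teddy"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict(features):
    """Forward pass only: no loss, no gradients, no weight updates."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = predict([1.0, 0.0])
print(label, round(confidence, 2))  # black 0.79
```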

21. What are IPython widgets?

IPython Widget is a GUI element that’s used in Jupyter Notebook to enhance the interactive features in the notebook. It includes widgets such as buttons, sliders, and dropdowns that combine Python and JavaScript functionality in the web browser. It also lets users control the data and visualize changes in the data by responding to events and invoking specified event handlers.

22. When would you use a CPU for deployment? When might a GPU be better?

CPUs are general-purpose processors that do a decent job at inference even though they have considerably lower throughputs and higher latencies than GPUs. They can be cost-effective for applications that analyze single pieces of data where speed isn’t very important. They can also be cheaper to rent because there’s more market competition in CPU servers than GPU servers.

GPUs are parallel coprocessors that are well suited to inference because they have considerably higher throughputs and lower latencies than CPUs. They can be cost-effective for applications that have a high enough volume to analyze a batch of data at a time. They can also require additional complexities like memory management and queuing systems.

23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?

The textbook provides four examples of the downsides to deploying the model to a server, which include limited accessibility, longer wait times, added security requirements, and extra costs. It would require users to have an internet connection to use the model, and they would experience longer delays while the data was transmitted to and from the server. It would also require protecting the sensitive data that’s uploaded by users, and the complexity of managing, scaling, and protecting the server would increase the overhead.

24. What are three examples of problems that could occur when rolling out a bear warning system in practice?

The bear warning system could make accurate predictions that detect bears but be unable to produce an actionable outcome that’s helpful. It can make inaccurate predictions which trigger false alarms that are unhelpful. It can also not work at all because the training and production data are different.

25. What is out-of-domain data?

Out of Domain Data is production data in machine learning that’s largely different in some aspect from the training data that was used to train the model. It can cause unexpected behaviors from the model that lead to all kinds of problems in practice. It can also be mitigated by using a carefully thought-out process and by doing first-hand data collection and labeling.

26. What is domain shift?

Domain Shift is a problem in machine learning where the production data changes over time until it no longer represents the training data that was used to train the model. It can cause the model to be less effective and even ineffective. It can also be partially mitigated by using a thought-out process.

27. What are the three steps in the deployment process?

The first step of the deployment process is to use an entirely manual process where the model is run in parallel with human supervision and not used to drive any actions. It requires humans to be involved in the process to look at the model outputs to make sure they make sense and check for problems.

The second step of the deployment process is to limit the scope of the model and carefully supervise it. It can be implemented in a small geographical area with time constraints as a trial using the model-driven approach. It can also require a person to approve each prediction before any action is taken.

The third step of the deployment process is to gradually expand the scope of the model while gradually decreasing human supervision. It requires good reporting systems to check for any significant changes to the actions taken compared to the manual process.


Contextual words are words that carry different meanings that depend on the context of the sentence such as “running to the store” and “running out of milk.” It can be difficult for the text model to differentiate between these kinds of words in context even though it has learned all of the definitions.

Homonyms are words that are spelled and pronounced the same but have different meanings such as “bank,” as in the financial institution and the land along a river. It can be difficult for the text model to perform question answering and speech-to-text when the words aren’t written in text form.

Synonyms are words that have the same meaning as other words such as “big” and “large.” It can be difficult for the text model to understand the correct meaning of synonyms because some words have the same meaning in certain contexts but not all contexts such as “big brother” and “large brother.”

Sarcasm refers to words that may have a positive or negative sentiment by definition but actually imply the opposite. It can be difficult for the text model to detect sarcasm because it requires an understanding of the context of the situation, the specific topic, and the environment that’s referenced.

Ambiguity refers to sentences that have multiple interpretations such as “I saw a dog on the beach with my binoculars.” It can be difficult for the text model to interpret ambiguity because some words strongly depend on the sentence context which makes it impossible to define polarity in advance.

Artificial Intelligence (AI) is a wide area of computer science that builds smart machines that are capable of performing tasks that usually require human intelligence. It enables machines to simulate human perception, learning, problem-solving, and decision-making. It also includes concepts such as machine learning, deep learning, and artificial neural networks.

Augmented Intelligence is an alternative use of artificial intelligence that focuses on technology as a tool to enhance human intelligence rather than replace it. It can relieve humans from demanding, time-consuming, and repetitive tasks. It can also support human thinking and decision making but the interpretations and decisions are made entirely by humans.

Deep Learning (DL) is a subcategory of machine learning that uses special algorithms to learn how to perform a specific task with increasing accuracy. It has four learning methods which include supervised, semi-supervised, unsupervised, and reinforcement learning. It also produces models based on an artificial neural network that contains two or more hidden layers.

Machine Learning (ML) is a subcategory of artificial intelligence that uses algorithms to analyze data, learn from that data, and make decisions or predictions about new data. It has four learning methods which include supervised, semi-supervised, unsupervised, and reinforcement learning. It also produces many kinds of models, such as linear models, decision trees, and artificial neural networks.

PyTorch is a library that’s used in Python to build, train, and deploy deep learning models, with a core that’s written in C++. It offers high performance, usability, and flexibility. It was also designed around Python, which led to better memory usage, optimizations, error messages, model structure, and model behavior.

Fastai is a library that’s used in Python for deep learning. It provides a high-level API that’s built on top of a hierarchy of lower-level APIs which can be rebuilt to customize the high-level functionality. It also provides support for computer vision, natural language processing, and tabular data processing.

Jupyter Notebook is a program that’s used to create, modify, and distribute notebooks that contain code, equations, visualizations, and narrative text. It provides an interactive coding environment that runs in the web browser. It also has become a preferred tool for machine learning and data science.

Python is an object-oriented language that’s known for its simple syntax, code readability, flexibility, and scalability. It mostly gets used to develop web and software applications. It also has become one of the most popular languages for artificial intelligence, machine learning, and data science.

