Top 50 Terraform Interview Questions & Answers

This post covers the most frequently asked Terraform interview questions, categorized from fundamental concepts to advanced real-world scenarios. Whether you're just starting or are an experienced DevOps professional, this list will help you prepare for your next interview.

Ⅰ. Fundamental Concepts

1. What is Terraform and why is it used?

Answer: Terraform is an open-source Infrastructure as Code (IaC) tool created by HashiCorp. It is used to define, provision, and manage infrastructure resources (like virtual machines, networks, and storage) in a safe, repeatable, and efficient manner. You write declarative configuration files in HCL (HashiCorp Configuration Language) to describe your desired infrastructure, and Terraform creates a plan to achieve that state and then executes it.
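As a minimal illustration of the declarative style, a single resource block describes the desired end state (the AMI ID and instance type below are placeholders, not values from any real project):

```hcl
# Declares the desired state: one EC2 instance with these attributes.
# Terraform figures out the API calls needed to make reality match.
resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t3.micro"

  tags = {
    Name = "example-instance"
  }
}
```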

2. What is Infrastructure as Code (IaC)?

Answer: Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable configuration files, rather than through manual processes or interactive configuration tools. This allows for versioning, automation, and collaboration, treating infrastructure with the same rigor as application code.

3. What does it mean that Terraform is "declarative"?

Answer: A declarative language focuses on describing the desired end state (the "what"), not the specific steps to get there (the "how"). You declare what resources you want and their configuration, and Terraform is responsible for figuring out the sequence of API calls (create, update, delete) needed to reach that state. This is in contrast to an imperative approach, where you would write a script detailing each step.

4. What is the difference between Terraform and Ansible?

Answer:

  • Terraform is primarily a provisioning tool used to create, manage, and orchestrate infrastructure itself (e.g., spin up a VM). It uses a declarative approach.

  • Ansible is primarily a configuration management tool used to install and manage software on existing servers (e.g., install a web server on a VM). It uses a procedural (imperative) approach. They are often used together: Terraform provisions the servers, and Ansible configures them.

5. What are providers in Terraform?

Answer: Providers are plugins that Terraform uses to interact with a specific cloud provider, SaaS provider, or other API. Each provider adds a set of resource types and data sources that Terraform can manage. For example, the aws provider gives Terraform the ability to interact with AWS resources, while the azurerm provider is for Microsoft Azure.
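A typical configuration pins the provider it needs in a `required_providers` block and then configures it (region and version constraint below are examples):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # example version constraint; adjust as needed
    }
  }
}

provider "aws" {
  region = "us-east-1" # example region
}
```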

6. What is the purpose of the terraform init command?

Answer: terraform init is the first command you run in a new Terraform project. It performs three main tasks:

  1. Provider Installation: It downloads and installs the providers defined in the configuration (required_providers block).

  2. Backend Initialization: It configures the backend where Terraform will store its state file.

  3. Module Installation: It downloads any modules referenced in the configuration.

7. What is the purpose of the terraform plan command?

Answer: terraform plan creates an execution plan. It compares the desired state defined in your configuration files with the current state of the real-world infrastructure (recorded in the state file) and shows you what changes Terraform will make. It's a "dry run" that allows you to review changes before applying them, which helps prevent mistakes.

8. What does terraform apply do?

Answer: terraform apply executes the plan created by terraform plan. It makes the necessary API calls to the cloud provider to create, update, or delete resources to match the desired state in your configuration. It will ask for confirmation before making any changes unless you use the -auto-approve flag.

9. What is the Terraform state file?

Answer: The Terraform state file (by default, terraform.tfstate) is a JSON file that Terraform uses to map the resources in your configuration to the real-world resources it manages. It keeps track of resource IDs, attributes, and dependencies. This file is crucial for Terraform to know what it is managing and how to update it.

10. Why should you not store the state file in your Git repository?

Answer: The state file should not be stored in Git for two main reasons:

  1. Sensitive Data: The state file can contain sensitive information in plain text, such as passwords, access keys, or IP addresses.

  2. Concurrent Access: If multiple people run Terraform at the same time, they might overwrite each other's changes, leading to state corruption. A remote backend provides state locking to prevent this.

Ⅱ. Core Concepts

11. What are Terraform backends and why are they important?

Answer: A backend determines how Terraform loads and stores its state file. By default, Terraform uses the local backend, which stores the terraform.tfstate file on your local disk. For team collaboration, it's essential to use a remote backend (like Amazon S3, Azure Storage, or HashiCorp Terraform Cloud). Remote backends provide state locking, shared access, and versioning of the state file.
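A remote backend is declared inside the `terraform` block. Here is a sketch of an S3 backend with DynamoDB-based locking (bucket and table names are hypothetical):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"            # hypothetical bucket name
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"               # table used for state locking
    encrypt        = true
  }
}
```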

12. What is state locking?

Answer: State locking is a feature provided by remote backends that prevents more than one user from running terraform apply at the same time on the same state. When one user runs apply, the state is "locked." If another user tries to run apply, they will have to wait until the lock is released. This prevents concurrent runs from corrupting the state file.

13. Explain the difference between count and for_each.

Answer: Both count and for_each are meta-arguments used to create multiple instances of a resource.

  • count creates a specified number of identical resources. The resources are stored in a list, and you reference them with an index (e.g., aws_instance.example[0]). If you remove an instance from the middle of the list, Terraform may need to destroy and recreate the subsequent instances.

  • for_each iterates over a map or a set of strings. It creates a resource for each item in the collection. The resources are stored in a map, referenced by a key (e.g., aws_instance.example["web"]). This is more stable because if you remove an item, it doesn't affect the others. for_each is generally the preferred method.
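The difference can be sketched side by side (assuming a `var.ami_id` input variable):

```hcl
# count: three identical instances, referenced by index,
# e.g. aws_instance.by_count[0].
resource "aws_instance" "by_count" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# for_each: one instance per map key, referenced by key,
# e.g. aws_instance.by_key["web"]. Removing a key leaves the others alone.
resource "aws_instance" "by_key" {
  for_each      = { web = "t3.micro", api = "t3.small" }
  ami           = var.ami_id
  instance_type = each.value

  tags = { Role = each.key }
}
```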

14. What are Terraform modules?

Answer: A module is a container for multiple resources that are used together. It's the primary way to package and reuse resource configurations in Terraform. A simple module might be a set of .tf files in a directory. Using modules helps you organize your code, promote reuse, and maintain consistency.

15. What is the difference between a root module and a child module?

Answer:

  • Root Module: The set of configuration files in the main working directory where you run terraform commands is the root module.

  • Child Module: A child module is a separate directory of configuration files that is called from another module (like the root module) using a module block.
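Calling a child module from the root module looks like this (the module path and input names are hypothetical):

```hcl
# Root module calling a local child module.
module "vpc" {
  source = "./modules/vpc" # hypothetical local module path

  cidr_block = "10.0.0.0/16"
  name       = "prod-vpc"
}
```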

16. What are input variables and output values in the context of a module?

Answer:

  • Input Variables (variables.tf) are like function arguments for a module. They allow you to pass in values to customize the resources the module creates.

  • Output Values (outputs.tf) are like function return values. They expose information about the resources created by the module, which can then be used by the parent module.
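A minimal sketch of the two halves (assuming the module creates an `aws_instance.web` resource):

```hcl
# variables.tf — module inputs, like function arguments
variable "instance_type" {
  type        = string
  description = "EC2 instance type for the web tier"
  default     = "t3.micro"
}

# outputs.tf — module outputs, like return values
output "instance_id" {
  description = "ID of the created instance"
  value       = aws_instance.web.id
}
```

The parent module would then reference the value as `module.<name>.instance_id`.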

17. What is the purpose of locals?

Answer: A local value assigns a name to an expression, allowing you to use it multiple times within a module without repetition. They are useful for simplifying complex expressions and improving the readability of your code. They are only visible within the module where they are defined.
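For example, a set of common tags can be named once and reused (bucket name is hypothetical):

```hcl
locals {
  environment = "prod"
  common_tags = {
    Environment = local.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "my-logs-${local.environment}" # hypothetical bucket name
  tags   = local.common_tags
}
```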

18. What are data sources in Terraform?

Answer: A data source allows Terraform to fetch information from an external source (like a cloud provider's API) and use that information in your configuration. This is useful for referencing resources that were not created by Terraform or for getting information like the latest AMI ID.
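A common example is looking up the latest AMI at plan time (the name filter below is an assumption, adjust to the image family you actually use):

```hcl
# Fetch the most recent matching AMI published by Amazon.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"] # example name pattern
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest.id
  instance_type = "t3.micro"
}
```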

19. Explain implicit vs. explicit dependencies in Terraform.

Answer:

  • Implicit Dependency: Terraform automatically infers dependencies between resources by analyzing the expressions in your configuration. For example, if an EC2 instance's configuration references the ID of a security group, Terraform knows it must create the security group before the instance. This is the preferred way to manage dependencies.

  • Explicit Dependency: In rare cases, you might need to manually define a dependency using the depends_on meta-argument. This tells Terraform that one resource must be created before another, even if there's no direct reference between them.
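Both forms side by side (assuming `var.ami_id` and the referenced resources exist elsewhere in the configuration):

```hcl
# Implicit: the reference to aws_security_group.web.id is enough for
# Terraform to order the security group before the instance.
resource "aws_instance" "web" {
  ami                    = var.ami_id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.web.id]
}

# Explicit: no attribute reference exists, so depends_on forces the order.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  depends_on = [aws_s3_bucket.artifacts]
}
```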

20. What is a provisioner in Terraform?

Answer: Provisioners are used to execute scripts or actions on a local or remote machine as part of resource creation or destruction. For example, you could use the remote-exec provisioner to run a configuration script on an EC2 instance after it's created. HashiCorp generally recommends against using provisioners, as they add complexity and are not fully declarative. It's often better to use configuration management tools like Ansible or to bake configuration into a custom machine image.

Ⅲ. Advanced Concepts

21. What is the lifecycle block used for?

Answer: The lifecycle block is a meta-argument that customizes the lifecycle of a resource. It has three main arguments:

  • create_before_destroy: Creates the replacement resource before destroying the old one during an update. This minimizes downtime.

  • prevent_destroy: Prevents Terraform from accidentally deleting a critical resource. An apply will fail if it plans to destroy the resource.

  • ignore_changes: Tells Terraform to ignore changes to specific resource attributes, preventing updates if those attributes are modified outside of Terraform.
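The three arguments in context (resource attributes trimmed for brevity):

```hcl
resource "aws_db_instance" "main" {
  # ... engine, instance_class, etc. omitted ...

  lifecycle {
    prevent_destroy = true   # apply fails if this resource would be destroyed
    ignore_changes  = [tags] # ignore tag edits made outside Terraform
  }
}

resource "aws_launch_template" "web" {
  # ...

  lifecycle {
    create_before_destroy = true # build the replacement before removing the old one
  }
}
```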

22. What are dynamic blocks?

Answer: A dynamic block is used to generate multiple nested blocks (like ingress rules in a security group or setting blocks in an Elastic Beanstalk environment) based on a collection of values. It's a way to write more concise and flexible code when dealing with repeatable nested configuration blocks.
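For example, generating one `ingress` block per port in a list variable:

```hcl
variable "ingress_ports" {
  type    = list(number)
  default = [80, 443]
}

resource "aws_security_group" "web" {
  name = "web-sg"

  # Expands into one ingress block per element of var.ingress_ports.
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```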

23. How do you manage sensitive data in Terraform?

Answer: There are several ways:

  1. Input Variables: Mark variables as sensitive = true. Terraform will redact their values from CLI output.

  2. Environment Variables: Set sensitive values as environment variables (e.g., TF_VAR_api_key).

  3. Secrets Management Tools: The best practice is to integrate with a secrets management tool like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault. You can use a data source to fetch the secrets at runtime.
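A sketch combining approaches 1 and 3 (the secret name is hypothetical):

```hcl
variable "db_password" {
  type      = string
  sensitive = true # value is redacted in plan/apply output
}

# Fetching a secret at runtime instead of storing it in code:
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db-password" # hypothetical secret name
}
```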

24. What is terraform import used for?

Answer: The terraform import command is used to bring existing, manually-created infrastructure under Terraform's management. You provide the resource address and the resource ID, and Terraform imports it into the state file. After importing, you must write the corresponding resource configuration block in your .tf files to match the imported resource.
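The workflow in sketch form (the bucket name is hypothetical; since Terraform 1.5 an `import` block can replace the CLI command):

```hcl
# 1. Write a resource block matching the existing infrastructure:
resource "aws_s3_bucket" "legacy" {
  bucket = "legacy-bucket-name" # hypothetical existing bucket
}

# 2. Then either run the CLI command:
#      terraform import aws_s3_bucket.legacy legacy-bucket-name
#    or, since Terraform 1.5, declare the import in code:
import {
  to = aws_s3_bucket.legacy
  id = "legacy-bucket-name"
}
```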

25. What is terraform taint and why is it deprecated?

Answer: terraform taint marked a resource as "tainted" in the state file. On the next plan and apply, Terraform would plan to destroy and recreate the tainted resource, even if its configuration hadn't changed. It was deprecated in favor of the -replace option for terraform apply (e.g., terraform apply -replace=aws_instance.example), which is more explicit and predictable.

26. What is terraform workspace?

Answer: Workspaces allow you to manage multiple, distinct sets of infrastructure with the same configuration files. Each workspace has its own separate state file. This is useful for creating parallel environments like dev, staging, and prod from the same codebase.

27. What is a "splat" expression?

Answer: A splat expression ([*]) is a syntax shortcut for getting a list of attributes from a list of resources. For example, if you create multiple EC2 instances with count, you can get a list of all their public IP addresses with aws_instance.example[*].public_ip.

28. How can you handle a "resource already exists" error?

Answer: This error usually means that a resource with the same name or unique identifier already exists in the cloud provider, but it's not in your Terraform state. You have two options:

  1. Import: Use terraform import to bring the existing resource under Terraform's control.

  2. Delete: Manually delete the existing resource in the cloud provider and let Terraform create it on the next apply.

29. What is a null_resource?

Answer: The null_resource is a resource from the null provider that doesn't create any actual infrastructure. It's often used as a "dummy" resource to act as a trigger for a provisioner or to manage dependencies in a more complex way.

30. Explain the difference between terraform.tfvars and *.auto.tfvars.

Answer: Both are files used to set values for input variables.

  • terraform.tfvars: Terraform automatically loads variables from this file if it exists. It's a standard way to set environment-specific variables.

  • *.auto.tfvars (or *.auto.tfvars.json): Terraform loads all files with this extension automatically. This is useful for breaking up variable definitions into multiple files. They are loaded in alphabetical order.

Ⅳ. Best Practices & Real-World Scenarios

31. How would you structure a large Terraform project?

Answer: A good structure for a large project often involves:

  • Environment-based Layout: Create separate directories for each environment (dev, staging, prod).

  • Modules: Break down the infrastructure into reusable modules (e.g., vpc, database, app-server).

  • Remote State: Use a remote backend with a separate state file for each environment.

  • File Naming: Use consistent file names within modules (main.tf, variables.tf, outputs.tf).

  • Centralized Modules: Store shared modules in a separate Git repository or a private module registry.

32. What is infrastructure drift and how do you manage it?

Answer: Infrastructure drift is when the real-world state of your infrastructure no longer matches the state defined in your Terraform configuration. This usually happens when someone makes manual changes through the cloud console. You can detect drift by running terraform plan. To manage it, you should enforce a policy where all infrastructure changes are made through Terraform. Tools like Terraform Cloud can also periodically scan for drift.

33. How can you test your Terraform code?

Answer:

  1. Linting and Formatting: Use terraform fmt to format code and tflint to catch common errors.

  2. Validation: terraform validate checks for syntax errors.

  3. Planning: terraform plan is a form of testing to see what changes will be made.

  4. Unit/Integration Testing: Use tools like Terratest (Go library) or Kitchen-Terraform (Ruby library) to write automated tests that provision real infrastructure, run checks, and then tear it down.

34. You need to upgrade the version of a provider. What is your process?

Answer:

  1. Read the provider's changelog to check for any breaking changes between the current and target versions.

  2. Update the version constraint in the required_providers block.

  3. Run terraform init -upgrade to download the new provider version.

  4. Run terraform plan to see if the new version causes any planned changes.

  5. Review the plan carefully. If there are unexpected changes, address them by updating your configuration code.

  6. Apply the changes in a non-production environment first before rolling out to production.

35. A terraform apply fails midway through. What happens to your infrastructure?

Answer: If an apply fails, some resources may have been created or updated while others were not. Terraform records any successful changes in the state file before it exits. When you run terraform apply again, Terraform will read the updated state file, see what has already been done, and create a new plan to complete the remaining changes. The infrastructure is left in a partially applied state, but Terraform knows how to correct it.

36. How do you manage dependencies between separate Terraform configurations?

Answer: The terraform_remote_state data source is the best way. One configuration can use this data source to read the output values from another configuration's remote state file. This creates a loosely coupled dependency, allowing you to share information like a VPC ID or a security group ID between different projects or teams.
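A sketch of one configuration reading another's outputs (bucket, key, and output name are hypothetical; the upstream configuration must declare `private_subnet_id` as an output):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state" # hypothetical bucket
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_id
}
```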

37. What is a "blast radius"?

Answer: The "blast radius" refers to the potential scope of damage that a single change or failure can cause. In Terraform, it's important to keep the blast radius small by breaking up large state files into smaller, more manageable components (e.g., separate state for networking, databases, and applications). This way, a mistake in one component is less likely to affect others.

38. What is the difference between destroy and -refresh=false?

Answer:

  • terraform destroy: This command creates and executes a plan to destroy all infrastructure managed by the current configuration.

  • terraform plan -refresh=false: The -refresh=false flag tells Terraform to skip the step of checking the real-world status of resources. This can speed up planning but can also be risky, as your plan will be based on a potentially stale state file.

39. How would you provision a three-tier architecture on AWS using Terraform?

Answer: I would structure the configuration into modules:

  • VPC Module: Creates the VPC, subnets (public for web, private for app and database), internet gateway, NAT gateway, and route tables.

  • Security Module: Creates security groups for each tier (e.g., web SG allows port 80/443 from the internet, app SG allows traffic from the web tier, DB SG allows traffic from the app tier).

  • Web Tier Module: Creates an Auto Scaling Group of EC2 instances for the web servers in the public subnets, along with a public-facing Application Load Balancer.

  • App Tier Module: Creates an Auto Scaling Group for the application servers in the private subnets, with an internal ALB.

  • Database Tier Module: Creates an RDS database instance in the private database subnets.

40. What is "terragrunt" and how does it relate to Terraform?

Answer: Terragrunt is a thin wrapper for Terraform that provides extra tools for keeping your configurations DRY (Don't Repeat Yourself), managing remote state, and working with multiple Terraform modules. It helps solve some of the challenges of managing large, real-world Terraform projects by allowing you to define your backend configuration and input variables once and inherit them across multiple environments.

Ⅴ. Terraform Cloud & Enterprise

41. What is Terraform Cloud/Enterprise?

Answer: Terraform Cloud (and its self-hosted version, Terraform Enterprise) is a managed service from HashiCorp that provides a stable, collaborative environment for using Terraform. It offers features like remote state management, a private module registry, policy as code (Sentinel), and a UI-driven workflow for planning and applying changes.

42. What is the Private Module Registry?

Answer: The Private Module Registry is a feature of Terraform Cloud/Enterprise that allows you to host and share modules internally within your organization. This makes it easy to discover, version, and reuse standard modules, promoting consistency and best practices.

43. What is Sentinel?

Answer: Sentinel is a Policy as Code (PaC) framework integrated into Terraform Enterprise/Cloud. It allows you to define fine-grained, logic-based policies that are enforced before a terraform apply is executed. For example, you could write a policy to prevent the creation of oversized EC2 instances or to ensure all S3 buckets have encryption enabled.

44. What are the advantages of using Terraform Cloud's remote execution environment?

Answer:

  • Consistency: All runs are executed in a consistent, disposable environment, eliminating "it works on my machine" problems.

  • Collaboration: Team members can review plans and approve applies through the UI.

  • Security: Cloud credentials and other secrets can be stored securely as variables in Terraform Cloud, rather than on individual developer machines.

  • Automation: It can be integrated with VCS (like GitHub) to automatically trigger runs on pull requests and merges.

45. Explain the VCS-driven workflow in Terraform Cloud.

Answer: You connect your Terraform Cloud workspace to a Git repository.

  1. Pull Request: When a developer opens a pull request with code changes, Terraform Cloud automatically runs a terraform plan and posts the result back to the PR for review.

  2. Merge: Once the PR is approved and merged into the main branch, Terraform Cloud can be configured to automatically run terraform apply to deploy the changes. This creates a CI/CD pipeline for your infrastructure.

Ⅵ. Miscellaneous

46. How do you keep up with new Terraform features and provider updates?

Answer: I follow the official HashiCorp blog, the Terraform section on HashiCorp Developer, and the changelogs for both Terraform Core and the specific providers I use. I also participate in community forums and watch talks from events like HashiConf.

47. What is the terraform graph command?

Answer: terraform graph generates a visual representation of the dependency graph of your resources in the DOT format. You can then use a tool like Graphviz to convert this into an image, which is useful for visualizing the relationships between resources in your configuration.

48. Can you use if/else logic in Terraform?

Answer: Yes, you can use a ternary conditional expression: condition ? true_val : false_val. This is often used with the count meta-argument to conditionally create a resource (e.g., count = var.create_resource ? 1 : 0) or to set a variable value based on a condition.
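The conditional-creation pattern in full (alarm attributes are illustrative):

```hcl
variable "create_monitoring" {
  type    = bool
  default = false
}

# The alarm exists only when the flag is true (count = 1), otherwise
# count = 0 and no resource is created.
resource "aws_cloudwatch_metric_alarm" "cpu" {
  count = var.create_monitoring ? 1 : 0

  alarm_name          = "high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
}
```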

49. How do you handle breaking changes in a module you maintain?

Answer:

  1. Semantic Versioning: Use semantic versioning (e.g., v1.2.3) for the module. Increment the major version number (e.g., v1.x to v2.0) for any breaking changes.

  2. Documentation: Clearly document the breaking changes in the module's README and create an upgrade guide.

  3. Communication: Announce the new major version and the required changes to the teams that consume the module.

50. Describe a time you solved a complex problem using Terraform.

Answer: This is a behavioral question. Be prepared to discuss a specific, real-world example. A good answer will follow the STAR method:

  • Situation: Describe the context and the problem (e.g., "We had an inconsistent, manually-managed development environment that was slow to provision.").

  • Task: Explain your goal (e.g., "My task was to automate the entire environment provisioning process using Terraform to ensure consistency and speed.").

  • Action: Detail the steps you took. Mention specific Terraform features you used (e.g., "I created reusable modules for our networking, database, and application tiers. I used for_each to manage multiple application services and the terraform_remote_state data source to link our application stack to our shared networking stack.").

  • Result: Quantify the outcome (e.g., "As a result, we reduced environment provisioning time from 2 days to 15 minutes and eliminated configuration drift, which led to a 30% reduction in environment-related bugs.").
