Software Engineering

Version Control

Michael L. Collard, Ph.D.

Department of Computer Science, The University of Akron

SCM

Software Configuration Management

  • Tracking and controlling changes to files used in software development
  • Based on revision control (version control)
  • Used for managing builds and releases
  • Used for accounting and auditing of the process and product

We use version control for …

  • Coordinating source code for a particular release
  • Collecting metrics on software productivity
  • Studying the process of development
  • Informing non-developers of the current state of the source code
  • Bringing new and absent developers up to date

Version Control (Revision Control)

After an editor and a compiler, version control is the most crucial tool for software development

  • Essential to the coordination of changes among collaborating developers
  • Essential to a solitary developer working on anything non-trivial
  • Maintains a history of changes
  • Manages branches and product families
  • Defines the workflow
  • All parts of development revolve around version control
  • Everything in modern software development depends on version control

Version Control vs. Shared Directory

Version Control (e.g., Git)               | Shared Directory
Tracks history of every change            | Only a snapshot at one moment
Shows authorship of each change           | No authorship information
Allows reverting to earlier versions      | No rollback, must unpack copies
Supports branching and merging            | Single linear copy only
Enables collaboration with multiple users | Must manually share updated files
Detects and resolves conflicts            | Silently overwrites changes
Stores differences efficiently (deltas)   | Full copy of every file for every "version"
Integrates with automation                | No built-in automation support
Provides commit messages for context      | No explanation of changes
Ensures integrity with hashes             | Integrity depends on archive tool
Each change can be signed/verified        | Only the entire archive can be signed/verified
Treats development as iterative           | Treats development as static

diff and patch

  • Distribute changes efficiently
  • A simple form of version handling
  • The diff utility compares two versions of a file and creates a patch file
  • The patch utility applies the patch file to the starting code to create the updated file
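
The diff/patch round trip can be sketched in a few lines of Python. This is a minimal illustration, not how the diff and patch utilities are implemented: the function names `make_patch` and `apply_patch` and the file contents are hypothetical, and the "patch" is an in-memory list of edits rather than a patch file.

```python
import difflib

def make_patch(old_lines, new_lines):
    """Diff step: record only the edits that turn old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Meaning: replace old_lines[i1:i2] with new_lines[j1:j2].
            patch.append((i1, i2, new_lines[j1:j2]))
    return patch

def apply_patch(old_lines, patch):
    """Patch step: apply the recorded edits to the starting lines."""
    result = list(old_lines)
    # Work from the bottom up so earlier line positions stay valid.
    for start, end, replacement in reversed(patch):
        result[start:end] = replacement
    return result

# Hypothetical file contents, for illustration only.
old = ['#include <cstdio>', '', 'int main() {', '    std::printf("Hello\\n");', '}']
new = ['#include <iostream>', '', 'int main() {', '    std::cout << "Hello\\n";', '}']

patch = make_patch(old, new)
print(patch)
print(apply_patch(old, patch) == new)
```

Only the changed lines travel in the patch, which is why distributing a patch is cheaper than distributing the whole updated file.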

Ex: Create a Patch

hello.cpp

Ex: Apply a Patch

hello.cpp

iostream.patch

Common Features

  • Versioning down to file level
  • Text Files: Only understands the lexical level (i.e., a source-code file is a file of characters)
  • No understanding of the syntactic structure of code
  • Does not know what a while statement is
  • Binary Files: Stay at the file level

Management Models

Management Model: File Locking

  • Only one developer at a time has access to a file/resource
  • Lock-Modify-Unlock
  • One developer at a time has the "token"; other developers have to wait
  • Library model
  • Advantage: No merging problems
  • Disadvantage: Prevents other developers from working
  • Disadvantage: Impractical for distributed development due to time differences
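
A toy model of Lock-Modify-Unlock, assuming a single in-memory lock table. The class and method names are hypothetical and do not correspond to any real version control tool.

```python
class LockingRepository:
    """Toy lock-modify-unlock model; all names here are hypothetical."""

    def __init__(self):
        self._locks = {}  # file name -> developer currently holding the lock

    def lock(self, path, developer):
        """Try to take the lock; return False if another developer holds it."""
        holder = self._locks.get(path)
        if holder is not None and holder != developer:
            return False  # someone else has the "token"; the caller must wait
        self._locks[path] = developer
        return True

    def unlock(self, path, developer):
        """Release the lock; only the current holder may do so."""
        if self._locks.get(path) != developer:
            raise PermissionError(f"{developer} does not hold the lock on {path}")
        del self._locks[path]

repo = LockingRepository()
print(repo.lock("hello.cpp", "alice"))  # alice takes the lock
print(repo.lock("hello.cpp", "bob"))    # bob is refused while alice holds it
repo.unlock("hello.cpp", "alice")
print(repo.lock("hello.cpp", "bob"))    # bob can proceed once it is released
```

The refused request in the middle is the disadvantage listed above: bob cannot work on the file until alice releases it.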

Management Model: Version Merging

  • No restrictions on access
  • Developers can work simultaneously
  • Copy-Modify-Merge
  • Advantage: No restrictions on working
  • Disadvantage: Merge issues
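
Copy-Modify-Merge relies on a three-way merge against the common starting version. The sketch below is heavily simplified: it assumes both developers only edited lines in place (no insertions or deletions), and the function name `merge_lines` is hypothetical. Real tools handle far more cases.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge for files whose line count did not change.

    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both copies agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only the other developer changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed it differently: a merge conflict
            merged.append(m)          # keep my line, but report the conflict
            conflicts.append(number)
    return merged, conflicts

base = ["int a = 1;", "int b = 2;", "int c = 3;"]
mine = ["int a = 10;", "int b = 2;", "int c = 3;"]
theirs = ["int a = 1;", "int b = 2;", "int c = 30;"]
print(merge_lines(base, mine, theirs))
```

When the two developers touch different lines, the merge is automatic. The "merge issues" disadvantage arises in the last branch, where a person has to decide which change wins.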

Current Practice

  • The large majority of version control usage follows the Version Merging model
  • File Locking is typically only used for binary files (e.g., MS Word files)
  • You may still find older projects (and developers) that use File Locking

Centralized Version Control

  • e.g., Subversion (SVN), ClearCase, Vault
  • A single central repository, local working copies
  • Access controlled by the server
  • One sequence of version numbers
  • Traditional approach

SVN View

  • (remote) repository:
      • The single, central SVN repository, typically running on a remote machine
  • working copy:
      • Sometimes referred to as a local repository, but it is not
      • The code you checked out into your filesystem
      • Where you modify your files

SVN

  • Versions identified by monotonically increasing numbers
  • URLs identify both the location of a central repository and directories/files in the central repository
  • Each commit has an author
  • Support for per-directory permissions, with some limitations

Common SVN Issues

  • Need access to a server to create a shared repository
  • No distinction between private and public changes
  • Merging is difficult
  • Branching creates problems

Distributed Version Control

  • e.g., Git, Bazaar, Darcs, Mercurial, Monotone, SVK
  • Peer-to-peer, no central repository; all are repository copies
  • No single sequence of version "numbers" (Why?)
  • Access controlled by each repository's owner rather than by one central server

Git

  • Distributed revision control and SCM (Source Code Management) system
  • Created in 2005 by Linus Torvalds for Linux kernel development
  • Used by major companies, e.g., Microsoft, Apple
  • Built into many IDEs
  • Fluency in Git is a requirement for anybody in software engineering

Git View

  • repository
      • Stored in the (hidden) directory .git
      • What you clone from another repository
  • working copy
      • The code you check out into your filesystem
      • The files that you see
      • Where you modify your files

Git Characteristics

  • Each commit is identified by a hash, currently a SHA-1 ID (a 160-bit number written as 40 hexadecimal digits)
  • Each commit has an author and a committer
  • peer-to-peer
  • The URL identifies only the repository's location, not directories or files within it. Branches and tags are part of the repository itself, and the default branch is usually named "main" (previously "master").
  • Each copy is a full-fledged repository and can be worked on locally without access to a central server
  • Each user clones the repository, makes changes, and pushes the changes

Git Benefits

  • Records a complete new version (snapshot) with each commit
  • Handles local and remote repositories
  • Tracks merged data
  • Supports staging changes before committing

Git Comparison

  • Advantages: fast, flexible, powerful, multiuser
  • Disadvantages: complex, challenging to learn, GUI tools less developed than SVN tools
  • Despite disadvantages, Git is a standard tool for software engineering and software development in general
  • Also used as a data format for applications
  • Example: Homebrew (brew) is built on top of Git

GitHub

  • One issue with Git is that to collaborate with others, your repository must be accessible to them (e.g., hosted on a server they can reach)
  • GitHub is a hosting service for software development projects using Git
  • Web-based hosting site for Git repositories
  • About 420 million repositories, over 150 million users
  • Founded in 2008; acquired by Microsoft in 2018
  • An account at GitHub is necessary for software development and software engineering (not just this class)

Software Ecosystems

Git Influence

  • Microsoft:
      • Visual Studio Online 2015
      • Visual Studio Blog: Git
      • Azure Repos
  • Apple:
      • Current: Xcode 26 and macOS Tahoe 26
      • Xcode - Git/SVN support through Xcode 8
      • Xcode 9: Dropped SVN support
      • Xcode 11: SVN deprecated
      • macOS Catalina 10.15: SVN no longer installed on the command line

Git Tools

Recommendation: command-line Git

  • Command-line git is the proper Git; everything else is an approximation
  • Most answers to Git questions show the command-line answer
  • As you use the command line, you start to remember commands (not the case in GUIs)
  • The "easy" GUI solutions are not the only way to fix a problem and are often not the best
  • GUIs overlay proper Git and try to make it simpler, but they fail
  • Can script/automate command-line solutions

Platform

  • Linux - Command-line git
  • macOS - Command-line git
  • Windows - Git Bash in Git for Windows, or command-line git in WSL

SVN/Git Command Comparison

SVN                                    | Git
svn checkout url                       | git clone url (git checkout branch)
svn update                             | git pull
svn commit -m "Add feature"            | git commit -am "Add feature"; git push
svn status                             | git status
svn revert path                        | git checkout -- path (or git restore path)
svn add file; svn rm file; svn mv file | git add file; git rm file; git mv file