Safe File Integration Locking - Best Practices for File Locks, Atomic Claims, and Idempotent Processing

· Go Komura · File Integration, Concurrency Control, Design, Windows Development

Concurrency control becomes a real problem almost immediately in shared-folder workflows, overnight batch jobs, and multi-process file integration. The usual questions are whether a file lock alone is enough, how to stop multiple workers from picking the same file, and how to avoid reading a file that is still being written.

This article organizes file-integration concurrency control around file locks, atomic claims, temp -> rename, and idempotency.

Contents

  1. Short version
  2. Conflict patterns that happen in file integration
  3. Anti-patterns
  4. Best practices
  5. Pseudocode excerpts
  6. Rough rule-of-thumb guide
  7. Summary
  8. References

File integration is a domain where the handover agreement is often more fragile than the code itself. Code passes its unit tests and then fails occasionally, only on production shared folders or in overnight batch runs. That is very normal.

In many cases, the real problem is not the file I/O API itself but the fact that these three things are vague:

  • when the file is safe to read
  • who owns the right to process it
  • how recovery works when something fails

So this article treats file-integration concurrency control not as “just locking,” but as a handover protocol.

1. Short version

  • The most important rule is to make sure that when the final filename becomes visible, the file is already safe to read
  • Express states such as being written, published, processing, and processed through names or directories
  • If multiple workers exist, take an atomic claim before processing
  • Use lock files and OS locks as helpers, but treat idempotency as the final safety net

In other words, the heart of file integration is not just “locking.” It is really the handover protocol.

2. Conflict patterns that happen in file integration

2.1. Reading a file that is still being written

If the sender writes directly to the final filename, this failure appears immediately. With JSON, the closing brace may still be missing. With CSV, the row count may be incomplete. With ZIP, the file may simply be broken.

sequenceDiagram
    participant Sender as Sender
    participant Share as Shared folder
    participant Receiver as Receiver

    Sender->>Share: Create orders.csv with its final name
    Sender->>Share: Still writing rows 1..5000
    Receiver->>Share: Detect orders.csv
    Receiver->>Share: Start reading it immediately
    Note over Receiver: The file is still incomplete
    Sender->>Share: Continue writing the rest
    Note over Receiver: Row shortage / parse failure / partial processing

2.2. Two workers pick the same file at the same time

If the flow is “list files, check whether one is unprocessed, then open it,” two workers can easily grab the same input. That is how double counting and duplicate sending begin.

sequenceDiagram
    participant W1 as Worker 1
    participant W2 as Worker 2
    participant Dir as incoming

    W1->>Dir: Find a.csv
    W2->>Dir: Find a.csv
    W1->>Dir: Start reading
    W2->>Dir: Start reading
    Note over W1,W2: The same input is processed twice

2.3. Everyone stops because of a stale lock

A design that only “drops a lock file” gets stuck very easily after abnormal termination. If you cannot tell who owns the lock, whether it is still alive, or how long it remains valid, the next worker may wait forever.

sequenceDiagram
    participant A as Worker A
    participant Lock as lock file
    participant B as Worker B

    A->>Lock: Create the lock
    Note over A: Abnormal termination happens here
    B->>Lock: See that the lock exists
    B->>Lock: Skip starting work
    B->>Lock: Wait even longer
    Note over B,Lock: Everyone stops because staleness cannot be judged

3. Anti-patterns

3.1. Two-step Exists -> Create checking

The problem here is that checking and claiming are two different operations. Another process can slip in between them, so this is not real exclusion.

sequenceDiagram
    participant A as Process A
    participant B as Process B
    participant FS as File system

    A->>FS: Check whether the lock exists
    B->>FS: Check whether the lock exists
    FS-->>A: It does not exist
    FS-->>B: It does not exist
    A->>FS: Create the lock
    B->>FS: Create the lock
    Note over A,B: Both sides move forward

In C#, the same race looks like this:

if (!File.Exists(lockPath))
{
    File.WriteAllText(lockPath, Environment.ProcessId.ToString());
    ProcessFile();
}

What you really need is a single atomic operation for “create only if absent.” In .NET, that usually means a FileMode.CreateNew-style approach. On POSIX, O_CREAT | O_EXCL is the same idea.
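As a minimal sketch of that “create only if absent” idea in .NET, the following uses FileMode.CreateNew so that the check and the creation are one operation. The class name AtomicLock and the method TryAcquire are hypothetical names chosen for illustration, and writing the process ID into the file is only an assumption about what the lock should contain.

```csharp
using System;
using System.IO;

public static class AtomicLock
{
    // Returns true only for the single caller that actually created the lock file.
    public static bool TryAcquire(string lockPath)
    {
        try
        {
            // FileMode.CreateNew throws IOException if the file already exists,
            // so there is no gap between "check" and "create".
            using var stream = new FileStream(
                lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None);
            using var writer = new StreamWriter(stream);
            writer.Write(Environment.ProcessId);
            return true;
        }
        catch (IOException) when (File.Exists(lockPath))
        {
            // Someone else created the lock first. Other I/O errors
            // (for example a missing directory) are not swallowed here.
            return false;
        }
    }
}
```

A caller would run its work only when TryAcquire returns true. This removes the Exists -> Create race, but it does nothing about stale locks after abnormal termination.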

3.2. Writing directly to the final filename

If the receiver interprets “this filename is visible” as “this file is ready to read,” then writing directly to the final name is already a mistake. Do not assume that visible means safe to read unless the sender's protocol actually guarantees it.

flowchart LR
    A["Final name becomes visible"] --> B["Receiver detects it"]
    B --> C["Sender is still writing"]
    C --> D["Receiver reads incomplete data"]

3.3. Treating “the file size stopped changing” as completion

This looks convenient, but it is fragile. Network copies, sender pauses, buffering, and retries all make it unreliable.

if (currentLength == lastLength && stableSeconds >= 10)
{
    return Ready;
}

Completion decided by guessing will hurt you on shared folders and large files. It is much more stable to declare completion explicitly through a manifest or done file.

3.4. Letting everyone update a shared file

A shared status.csv or counter.json that everybody reads and writes usually ends in “last writer wins.” Once file integration starts acting like a mini-database, this becomes painful quickly.

3.5. Thinking a lock API is universal

Lock APIs matter, but they only work well when all participants play by the same agreement. In heterogeneous system integration, it is safer not to overestimate them.

Examples:

  • Linux flock is advisory, so software that ignores the rule can still write
  • Windows byte-range locks are ignored by memory-mapped file access
  • In other words, do not ask OS locks alone to carry completion signaling and ownership design

4. Best practices

4.1. Publish with temp -> close -> rename / replace

This is the standard path. Keep the file hidden under a temporary name while it is being built, close it, and only then switch it to the final name. The receiver watches only the final name.

flowchart LR
    A["Create a unique temp name"] --> B["Write the full payload to temp"]
    B --> C["Flush / close it"]
    C --> D["Rename / replace to the final name inside the same directory"]
    D --> E["Receiver watches only the final name"]

Important points:

  • temp and final should be in the same directory, or at least on the same volume / file system
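  • on Windows / .NET, File.Move covers publishing under a new name; File.Replace is worth considering when the final file already exists and must be swapped, since it requires an existing destination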
  • make “final filename is visible” mean “the contents are already complete”
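The publish flow above can be sketched roughly as follows. AtomicPublisher and Publish are hypothetical names, the temp naming scheme (a leading dot and a .tmp suffix) is an assumption that the receiver must be told to ignore, and the sketch assumes the final name does not exist yet.

```csharp
using System;
using System.IO;

public static class AtomicPublisher
{
    // Writes the payload under a temp name, then renames it to the final name.
    public static void Publish(string finalPath, Action<Stream> writePayload)
    {
        var fullPath = Path.GetFullPath(finalPath);
        var dir = Path.GetDirectoryName(fullPath)
                  ?? throw new ArgumentException("finalPath must include a file name", nameof(finalPath));

        // Same directory as the final file, so the rename stays on one volume.
        var tempPath = Path.Combine(dir, $".{Path.GetFileName(fullPath)}.{Guid.NewGuid():N}.tmp");

        try
        {
            using (var stream = new FileStream(
                       tempPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
            {
                writePayload(stream);
                stream.Flush(flushToDisk: true); // push buffered data to disk before closing
            }

            // The file is closed here. overwrite: false makes a name collision an error.
            File.Move(tempPath, fullPath, overwrite: false);
        }
        catch
        {
            // Do not leave a half-written temp file behind.
            if (File.Exists(tempPath)) File.Delete(tempPath);
            throw;
        }
    }
}
```

The receiver never sees a partial file under the final name, because that name only appears after the stream has been flushed and closed. How atomic the rename is on a network share depends on the server, so treat this as a sketch for a local or well-behaved file system.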

4.2. Use done / manifest files to declare completeness

It is often much more stable to declare not only the payload itself, but also what exactly is complete in a separate file. This is especially useful in heterogeneous system integration.

Useful manifest fields often include:

  • target filename
  • size
  • hash
  • record count
  • integration ID / idempotency key
  • creation timestamp

The order matters too. If you publish the done file before the payload is actually complete, that is not a completion signal. It is an accident announcement.
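One possible shape for such a done file is sketched below, using JSON with a size and a SHA-256 hash. The Manifest record, the ManifestFile class, and the field names are assumptions made for illustration; the article does not prescribe a format.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text.Json;

public sealed record Manifest(string FileName, long Size, string Sha256, string IdempotencyKey);

public static class ManifestFile
{
    // Computes the manifest values from the payload as it exists on disk.
    public static Manifest Describe(string payloadPath, string idempotencyKey)
    {
        using var stream = File.OpenRead(payloadPath);
        using var sha = SHA256.Create();
        var hash = Convert.ToHexString(sha.ComputeHash(stream));
        return new Manifest(Path.GetFileName(payloadPath), stream.Length, hash, idempotencyKey);
    }

    // Sender side: call this only after the payload has been published.
    public static void Write(string donePath, Manifest manifest)
    {
        var tempPath = donePath + ".tmp";
        File.WriteAllText(tempPath, JsonSerializer.Serialize(manifest));
        File.Move(tempPath, donePath, overwrite: false);
    }

    // Receiver side: true only if name, size, and hash all match the manifest.
    public static bool Verify(string payloadPath, string donePath)
    {
        var expected = JsonSerializer.Deserialize<Manifest>(File.ReadAllText(donePath));
        if (expected is null) return false;
        return Describe(payloadPath, expected.IdempotencyKey) == expected;
    }
}
```

The done file itself is written under a temp name and renamed, so the receiver never parses a half-written manifest. A record count could be added in the same way if the payload format makes counting cheap.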

4.3. Let the receiver take the claim atomically

If multiple workers watch the same incoming directory, a simple pattern is to rename the file into the processing area before reading it. Only the worker whose rename succeeds owns the right to process the file; every other worker's rename fails because the source is already gone.

sequenceDiagram
    participant W1 as Worker 1
    participant W2 as Worker 2
    participant IN as incoming
    participant PR as processing

    W1->>IN: Find a.csv
    W2->>IN: Find a.csv
    W1->>PR: Rename a.csv
    W2->>PR: Rename a.csv
    Note over W1,W2: Only the first successful worker owns it

It also helps operationally to split directories clearly:

flowchart LR
    T[temp] -->|publish| I[incoming]
    I -->|claim| P[processing]
    P -->|success| A[archive]
    P -->|failure| E[error]
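A claim by rename from incoming to processing might look like the sketch below. FileClaim and TryClaim are hypothetical names, and the sketch assumes both directories live on the same volume so that the move is a rename rather than a copy.

```csharp
using System;
using System.IO;

public static class FileClaim
{
    // Tries to move the file into processingDir. Returns true only for the winner.
    public static bool TryClaim(string incomingPath, string processingDir, out string claimedPath)
    {
        claimedPath = Path.Combine(processingDir, Path.GetFileName(incomingPath));
        try
        {
            // File.Move without overwrite fails if the destination already exists.
            File.Move(incomingPath, claimedPath);
            return true;
        }
        catch (FileNotFoundException)
        {
            // The source is gone: another worker claimed it first.
            return false;
        }
        catch (IOException) when (!File.Exists(incomingPath))
        {
            // The source vanished while we were moving it. Same conclusion.
            return false;
        }
    }
}
```

If the source still exists and the move fails anyway (for example because a file with the same name is already sitting in processing), the exception propagates, since that situation needs an operator or a recovery rule rather than a silent skip.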

4.4. If you rely on lock files, make them lease-based

If you use lock files, do not make them empty markers. Make them contain ownership and expiry information.

flowchart TD
    L[lock.json] --> A[ownerId]
    L --> B[host]
    L --> C[pid]
    L --> D[acquiredAt]
    L --> E[expiresAt]
    L --> F[heartbeatAt]

Important points:

  • create them atomically
  • use missing heartbeat updates as one staleness signal
  • in principle, only the creator should remove the lock
  • assume lock leakage can happen and define the recovery path up front
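The following is a rough sketch of a lease-style lock file, not a complete locking scheme. Lease and LeaseLock are hypothetical names, the JSON layout is an assumption, heartbeats are left out (a heartbeat would periodically extend ExpiresAt), and safely taking over a stale lock is deliberately not shown because delete-then-create reintroduces a race.

```csharp
using System;
using System.IO;
using System.Text.Json;

public sealed record Lease(
    string OwnerId, string Host, int Pid, DateTimeOffset AcquiredAt, DateTimeOffset ExpiresAt);

public static class LeaseLock
{
    // Creates the lock atomically and records who owns it and until when.
    public static bool TryAcquire(string lockPath, string ownerId, TimeSpan ttl, DateTimeOffset now)
    {
        var lease = new Lease(ownerId, Environment.MachineName, Environment.ProcessId, now, now + ttl);
        try
        {
            using var stream = new FileStream(
                lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None);
            using var writer = new StreamWriter(stream);
            writer.Write(JsonSerializer.Serialize(lease));
            return true;
        }
        catch (IOException) when (File.Exists(lockPath))
        {
            return false; // the lock is already held
        }
    }

    // A lease counts as stale only when it can be read and has expired.
    public static bool IsStale(string lockPath, DateTimeOffset now)
    {
        try
        {
            var lease = JsonSerializer.Deserialize<Lease>(File.ReadAllText(lockPath));
            return lease is not null && lease.ExpiresAt <= now;
        }
        catch (Exception ex) when (ex is IOException or JsonException)
        {
            return false; // unreadable or mid-write: do not guess
        }
    }

    // Only the recorded owner may remove the lock.
    public static bool Release(string lockPath, string ownerId)
    {
        var lease = JsonSerializer.Deserialize<Lease>(File.ReadAllText(lockPath));
        if (lease?.OwnerId != ownerId) return false;
        File.Delete(lockPath);
        return true;
    }
}
```

Passing now as a parameter keeps the expiry logic testable. Across hosts it also makes the clock dependency visible: lease expiry is only as trustworthy as the clocks being compared.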

4.5. Assume idempotency

Exclusion matters, but in real operation you rarely eliminate double delivery or retries completely. In the end, it helps a lot if the system is designed so that processing the same input again does not break anything.

flowchart LR
    A["Input + idempotency key"] --> B{"Already processed?"}
    B -- yes --> C["Treat as success without re-executing"]
    B -- no --> D["Execute processing"]
    D --> E["Record in processed ledger"]
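The flow above can be sketched with a small file-based processed ledger. ProcessedLedger, RunOnce, and the marker-file layout are hypothetical; a real system might keep the ledger in a database with a unique constraint instead.

```csharp
using System;
using System.IO;

public sealed class ProcessedLedger
{
    private readonly string _ledgerDir;

    public ProcessedLedger(string ledgerDir)
    {
        _ledgerDir = ledgerDir;
        Directory.CreateDirectory(ledgerDir);
    }

    // One marker file per idempotency key; the key is escaped to be a safe file name.
    private string MarkerPath(string key) =>
        Path.Combine(_ledgerDir, Uri.EscapeDataString(key) + ".processed");

    public bool AlreadyProcessed(string key) => File.Exists(MarkerPath(key));

    // Records the key atomically. Returns false if it was already recorded.
    public bool TryRecord(string key)
    {
        try
        {
            using var marker = new FileStream(MarkerPath(key), FileMode.CreateNew);
            return true;
        }
        catch (IOException) when (File.Exists(MarkerPath(key)))
        {
            return false;
        }
    }

    // Runs process() unless the key is already in the ledger.
    // Returns true if processing was executed in this call.
    public bool RunOnce(string key, Action process)
    {
        if (AlreadyProcessed(key)) return false; // treat as success without re-executing
        process();
        TryRecord(key);
        return true;
    }
}
```

A crash between process() and TryRecord still leads to a second execution on retry, so the processing step itself should tolerate being repeated (for example by using the same key in downstream writes). The ledger narrows the window; it does not close it.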

5. Pseudocode excerpts

5.1. A typical broken pattern

var lockPath = finalPath + ".lock";

if (!File.Exists(lockPath))
{
    File.WriteAllText(lockPath, "");
    using var writer = OpenForWrite(finalPath); // writes directly to final name
    WritePayload(writer);

    File.Delete(lockPath);
}

Problems:

  • Exists and WriteAllText are separate operations
  • finalPath becomes visible while writing is still in progress
  • the lock remains behind after abnormal termination

5.2. A healthier direction

// Sender side: build under a temp name, publish, then declare completeness.
var tempPath = MakeTempPathSameDirectory(finalPath);
WritePayload(tempPath);
FlushAndClose(tempPath);

PublishByRenameOrReplace(tempPath, finalPath); // same FS / same volume
PublishDoneFile(finalPath + ".done", new
{
    FileName = Path.GetFileName(finalPath),
    Size = GetFileSize(finalPath),
    Hash = ComputeHash(finalPath),
    IdempotencyKey = integrationId
});

// Receiver side: claim the payload and its done file together before reading.
if (!TryClaimBundleByRename(baseName, incomingDir, processingDir))
{
    return; // another worker already took it
}

var manifest = ReadDoneFile(Path.Combine(processingDir, baseName + ".done"));
VerifyPayload(Path.Combine(processingDir, baseName), manifest);

if (AlreadyProcessed(manifest.IdempotencyKey))
{
    MoveBundle(processingDir, archiveDir, baseName);
    return;
}

Process(Path.Combine(processingDir, baseName));
RecordProcessed(manifest.IdempotencyKey);
MoveBundle(processingDir, archiveDir, baseName);

The implementation details matter, but the order matters more. Keep “write,” “publish,” “take ownership,” and “record completion” as separate steps, in that order.

6. Rough rule-of-thumb guide

  • single writer / single reader / same host -> temp -> rename alone already gets you far
  • multiple consumers -> add a claim rename from incoming to processing
  • heterogeneous systems, NAS, shared folders -> add manifest / done and idempotency as well
  • multiple writers updating the same logical state -> do not force the problem into files; consider a database or queue
  • OS locks are useful inside one controlled application family, but they do not replace the handover protocol itself

The fourth point is also a retreat criterion. Some problems are simply unpleasant when forced into file-based integration.

7. Summary

The real heart of exclusion here is:

  • file-integration concurrency control is not mainly about calling a lock function; it is about defining state transitions
  • representing being written, published, processing, and processed through names and directories reduces accidents a lot

Designs to avoid:

  • Exists -> Create
  • writing directly to the final filename
  • guessing completion from stable file size
  • everybody updating the same shared file
  • asking lock APIs alone to carry the whole protocol

Practical countermeasures that work well:

  • temp -> close -> rename / replace
  • explicit completeness through done / manifest files
  • ownership through claim rename
  • lease rules and idempotency for recovery

The core trick is never to assume that a file which is visible, or even openable, is already safe to read; the protocol has to guarantee it. That single separation eliminates a surprising number of problems that otherwise show up only in the middle of the night.

8. References

Author GitHub

The author of this article, Go Komura, is on GitHub as gomurin0428.

You can also find COM_BLAS and COM_BigDecimal there.
