Dify Contract Review Case Demonstration Based on Bad Cases (Workflow Breakdown)

Written by
Audrey Miles
Updated on: June 7, 2025

Recommendation
An in-depth look at intelligent contract review technology based on Bad Cases.

Core content:

1. Application and challenges of RAG technology in contract review

2. Analysis of limitations of traditional contract review tools

3. Dify workflow case demonstration and solution

Yang Fangxian
Founder of 53A/Most Valuable Expert of Tencent Cloud (TVP)

At the end of April, a reader asked how to implement contract review based on Bad Cases, and contract generation based on contract templates, within a RAG pipeline. This is a very representative direction for advanced RAG applications, and this article introduces and demonstrates the contract review scenario.

 

In the contract review scenario, using historical "Bad Cases" (the original contract text plus the review findings) to assist in reviewing new contracts, rather than relying on predefined rules, is a practical business requirement. However, standard RAG mainly recalls fragments that are semantically similar to the query, which makes it difficult for the LLM to grasp the full context and reference value of a Bad Case. Using entire Bad Cases directly as the knowledge source runs into context-window limits and imprecise recall (a match may be superficially similar without sharing the key problem points).

Drawing on my hands-on experience with related projects over the past few months, this post walks you through a quick-start solution built on a Dify workflow and purpose-designed sample data.

Enjoy.

1

 

Traditional Contract Review Tools and Their Limitations

Before the formal introduction, let's review the contract review tools that were common on the market before LLM and RAG technologies took off, and the rule-engine practices behind them.

1.1

 

Core Practices

Knowledge base construction and rules encoding:

Traditional rule engines are built primarily from legal expertise, industry practice, and a company's risk strategy, and they require continuous manual maintenance and updating. The rules typically cover dimensions such as completeness checks, risk-pattern identification, consistency and accuracy, compliance checks, and quantitative-metric monitoring.

Next, this knowledge is translated into a series of clear, enforceable rules, for example: "Contracts must contain a 'dispute resolution' clause"; "'Limitation of liability' clauses must be capped at no less than X% of the total contract amount"; "If the contract type is 'Non-Disclosure Agreement', the confidentiality period must be no less than N years". Finally, these rules are encoded into the rule engine using specific syntax (e.g., regular expressions, keyword matching, logical predicates).

Automated Scanning and Tagging:

With the rule base in place, the tool reads the contract text (usually via OCR for scanned copies), and the rule engine matches the content against the rules, line by line or in parallel, flagging violations, missing clauses, inconsistencies with standard templates, or warning keywords. The output generally highlights problematic clauses, generates a list of risk alerts, and gives a preliminary risk-level score.
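To make the mechanics concrete, below is a minimal sketch of such a rule engine in Python. The rule names, patterns, and contract text are invented for illustration; a production engine would carry far richer rule syntax and metadata.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    pattern: str      # regular expression to search for in the contract text
    required: bool    # True: a match must exist; False: a match is a red flag
    message: str

# Hypothetical rule base, hand-encoded from legal expertise.
RULES = [
    Rule("dispute_resolution", r"dispute resolution", True,
         "Missing 'dispute resolution' clause."),
    Rule("deemed_acceptance", r"without written objection", False,
         "Possible 'default acceptance' clause; verify acceptance criteria."),
]

def scan(contract_text: str) -> list[str]:
    """Match the contract against every rule and collect risk alerts."""
    alerts = []
    for rule in RULES:
        hit = re.search(rule.pattern, contract_text, re.IGNORECASE) is not None
        if hit != rule.required:  # required-but-missing, or forbidden-but-present
            alerts.append(f"[{rule.name}] {rule.message}")
    return alerts

print(scan("Party A shall accept the delivery from Party B without written objection."))
```

As the limitations below show, everything here hinges on surface matching: a reworded "deemed acceptance" clause that avoids the exact phrase slips straight through.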

 

1.2

 

Limitations

Lack of semantic understanding:

The rule engine's core logic is matching on surface features such as keywords and sentence structure, which makes it difficult to understand linguistic nuance, contextual meaning, or real intent. Wordings or clause variants that are not predefined in the rule base may be missed or misreported even when they mean the same thing. Moreover, a clause that is fine on its own may be risky in combination with other clauses, something surface matching cannot see.

Rigidity and maintenance costs are high:

Such manually driven rule bases also require continuous, extensive updating and maintenance, which is costly and prone to lag. In particular, covering more scenarios makes the number of rules balloon, rules begin to conflict with one another, and management complexity gradually gets out of control.

User experience and interpretability:

Traditional tools generally output a flat list of "rule violations", lacking in-depth risk explanation, business-impact analysis, and specific, targeted modification suggestions; they are not user-friendly enough.

2

 

Generic Rules vs Bad Cases

Having sorted out the traditional rule engine's implementation logic and limitations, let's turn to the workflow orchestration logic this article demonstrates.

2.1

 

Complementary Strengths

First, it should be noted that both are very important in contract review but play different, complementary roles. Generic Rules (the traditional rule engine described above) provide the basic framework and compliance check, acting more like a medical checklist: quickly determining whether a contract is structurally complete, whether core terms are missing, and whether basic, generally applicable legal or business-practice requirements have been met. Generic rules are the cornerstone and starting point of the review, ensuring that the contract's "skeleton" is basically sound.

Bad Cases, as mentioned above, reflect the depth and experience of past reviews. In other words, they are warnings about specific risk scenarios: how the absence, ambiguity, or improper design of certain provisions in real business contracts led to practical problems (disputes, losses, failed deals, and so on). Bad Cases are often closely tied to specific contract types, industry backgrounds, or business logic, which makes their risk analysis more targeted.

To summarize, Generic Rules are a static knowledge base that tells you "what a contract should look like", while Bad Cases are a dynamic, experience-based repository that tells you "what has gone wrong, why it happened, and what the consequences were". In a review, it is common to start with a "scan" using the generic rules to ensure the basics are in order; then, given the contract type, the transaction context, and any areas of concern flagged in the initial scan, the Bad Cases library is searched for similar scenarios to support deeper, more targeted risk assessment and clause optimization.

Which is more important? Both matter, and neither works without the other. Relying only on generic rules, the review may stay superficial and fail to anticipate complex risks; relying only on Bad Cases, it may lack a systematic baseline and miss basic but critical issues.

2.2

 

Bad Cases Recall Strategy

Going further, the following prioritization can be applied when recalling Bad Cases in a Dify (or similar) workflow, to ensure relevance and effectiveness:

Issue Clause Similarity.

If a clause in a new contract is highly similar in wording, structure, or underlying logical flaws to a clause that was explicitly identified in a Bad Case as causing a problem, that Bad Case will have the greatest reference value. Similar problem clauses often imply similar risk logic, regardless of whether the contract type or industry is identical. Example: The acceptance clause in a new contract that states "Party A shall accept the delivery from Party B without written objection" is highly relevant to a Bad Case where a dispute arose over this type of "default acceptance" clause.

Contract Type.

This is self-explanatory: contracts of the same type tend to have similar transaction structures, core objectives, and common risk areas. Software development contracts typically focus on intellectual property, delivery and acceptance, and maintenance responsibilities, while leasing contracts focus more on the condition of the leased property, rental payments, and liability for breach. As a result, Bad Cases of the same contract type are more likely to touch on topics directly relevant to the contract under review. For example, when reviewing a new "Software Outsourcing Development Contract", Bad Cases from historical "Software Development Service Contracts" are usually more informative than those from "Real Estate Lease Contracts".

Business Scenario Similarities.

Industry characteristics and specific business scenarios significantly affect the risk weighting and interpretation of contract terms: what is standard practice in one industry may be a significant risk in another, and the business scenario determines the circumstances under which the contract is performed and the likely points of conflict. For example, for a marketing contract that involves processing large amounts of sensitive personal data, recalling a Bad Case about data compliance and privacy protection (even from a slightly different contract type, as long as the industry is similar, such as Internet advertising or anything involving user-data processing) may matter more than recalling a Bad Case from an ordinary service contract that involves no sensitive data.

To summarize: first look at "what the problem looks like" (does the clause itself match a known risk pattern?), then at the "contract category" (is it the same type of legal relationship and transaction framework?), and then at the "context" (do industry practices and business processes make the risk more salient or give it a different form?). Ideally, the workflow synthesizes these factors and returns a list of Bad Cases weighted by match quality: a Bad Case that is highly similar to the new contract's clauses, has the same contract type, and sits in a similar industry context carries the highest reference value.
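As a rough illustration of that weighting, here is a small Python sketch. The weight values and the precomputed clause-similarity scores are assumptions for demonstration, not parameters from the actual workflow.

```python
from dataclasses import dataclass

@dataclass
class BadCase:
    case_id: str
    contract_type: str
    industry: str
    clause_similarity: float  # 0..1, e.g. cosine similarity from an embedding model

def recall_score(case: BadCase, new_type: str, new_industry: str) -> float:
    """Weighted score: problem-clause similarity first, then contract type, then scenario."""
    score = 0.6 * case.clause_similarity
    score += 0.25 if case.contract_type == new_type else 0.0
    score += 0.15 if case.industry == new_industry else 0.0
    return score

cases = [
    BadCase("BC001", "software_development", "IT services", 0.82),
    BadCase("BC003", "housing_lease", "real estate", 0.91),
]
ranked = sorted(cases, reverse=True,
                key=lambda c: recall_score(c, "software_development", "IT services"))
print([c.case_id for c in ranked])  # BC001 outranks BC003 despite lower raw similarity
```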

3

 

Sample Data Analysis

The sample data used in the following workflow demonstration consists of 3 typical contracts to be reviewed, 5 Bad Cases, and a basic set of review rules. Besides the fact that confidentiality agreements rule out directly using the commercial project data I worked with, simulated data allows contract terms and potential risk points to be designed deliberately, clearly demonstrating the workflow's core logic and its ability to identify, analyze, and handle these specific issues. This is far more focused than using real contracts, whose complex structures contain a lot of extraneous information.

3.1

 

Diversity of contract types:

NC001: Software Outsourcing Development Contract: a common type of technical service contract involving key points such as project scope, delivery, acceptance, and intellectual property; it adequately tests the workflow's understanding of complex business logic.

NC002: One-way Non-Disclosure Agreement (Recipient's Perspective): this contract type centers on information confidentiality, with relatively concentrated clauses but high demands on the precision of definitions and the reasonableness of obligations. Taking the recipient's perspective tests whether the workflow can identify risks that disadvantage one party.

NC003: Marketing Service Contract: such contracts usually contain KPIs, attribution of deliverables, and similar terms, testing the workflow's ability to review service performance metrics, IP transfers, and the like.

 

The three contracts embed typical risk points for their respective scenarios, and many of their clauses correspond directly or indirectly to the generic rules in Basic_rules.md and the case risk points in the Bad Cases folder, making them easy for the workflow to reference and match.

3.2

 

Design of Bad Cases (Bad Cases 1-5)

Covering common contract types and risk areas

BC001: Software Development Service Contract - acceptance, breach of contract, dispute resolution.

BC002: Non-Disclosure Agreement (NDA) - confidentiality scope, duration, breach of contract.

BC003: Housing Lease Contract - delivery criteria, deposit, early termination, renewal.

BC004: Product Purchase Contract - quality standards, acceptance period, limitation of liability.

BC005: Consulting Service Contract - payment terms, delivery of results, intellectual property rights, termination clauses.

This diversity ensures a breadth of knowledge base for the workflow.

Structured Risk Summaries

Each Bad Case clearly lists identified_risks, problematic_clauses_summary, and suggestion_or_lesson. This structured information is ideal for the LLM to learn from and extract, enabling the workflow to understand the context and resolution of historical risks more accurately.

In addition, the issues exposed in these cases, such as vague definitions, unequal rights and obligations, lack of key protections, and unreasonable limitations of liability, are very classic and common types of risks that are well represented in contract review.

 

With this design, when a new contract enters the workflow, the generic rules (Basic_rules.md) provide a universal checklist; potential risk points in the new contract can be quickly matched to similar historical cases via keywords_for_retrieval and identified_risks in the Bad Cases; and suggestion_or_lesson supplies specific modification suggestions and risk warnings for the new contract's review report.
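For reference, a Bad Case record along these lines might look like the sketch below. Only the field names identified_risks, problematic_clauses_summary, suggestion_or_lesson, and keywords_for_retrieval come from the text; the schema and values are illustrative.

```python
# Hypothetical Bad Case record; field names follow those cited above.
bad_case_bc001 = {
    "case_id": "BC001",
    "contract_type": "Software Development Service Contract",
    "identified_risks": [
        "Silence-as-acceptance clause led to a dispute over defective deliverables.",
    ],
    "problematic_clauses_summary": (
        "Clause 7.2 treated Party A's failure to object in writing within 5 days "
        "as acceptance; no objective acceptance criteria were defined."
    ),
    "suggestion_or_lesson": (
        "Define measurable acceptance criteria and a reasonable review window; "
        "avoid silence-as-acceptance wording."
    ),
    "keywords_for_retrieval": ["acceptance", "deemed acceptance", "breach",
                               "dispute resolution"],
}
```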

4

 

Workflow Orchestration Logic

This workflow implements a multi-stage contract review process, which is divided into the following four parts:

4.1

 

Input Collection and Preprocessing

Collect the user-provided contract document, specific concerns, contract type, and generic review rules, then extract and transform the contract text.

Note: the 'Contract Text Extraction' node outputs 'Contract Contents' as a list containing a single string, whereas a downstream LLM or Template node may expect a plain string. I initially overlooked this subtle type difference, which caused prompt-rendering anomalies and LLM misinterpretation; I later added a Template node for explicit type conversion.
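Dify's Template node is Jinja2-based, so the conversion itself is a one-line template. The sketch below shows what it effectively does, assuming the extraction output is bound to a variable named contract_contents:

```python
from jinja2 import Template

# The extraction node outputs a single-element list; the Template node
# unwraps it into the plain string that downstream nodes expect.
contract_contents = ["Party A entrusts Party B with developing..."]  # extractor output
plain_text = Template("{{ contract_contents[0] }}").render(
    contract_contents=contract_contents)
print(type(plain_text))  # <class 'str'>
```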

4.2

 

Initial Risk Identification

An LLM performs a preliminary analysis of the contract text to quickly identify potential risk points.

4.3

 

Similar case retrieval and analysis

Based on the initial risk and contract type, relevant historical bad cases are retrieved from the knowledge base and key risk points from these cases are extracted by another LLM.
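Conceptually, this stage folds the initial risks and the contract type into one retrieval query for the knowledge base. A minimal sketch, in which retrieve() would be Dify's knowledge-retrieval node and every name is illustrative:

```python
def build_retrieval_query(contract_type: str, initial_risks: list[str]) -> str:
    """Fold contract type and flagged risk points into a single retrieval query."""
    return f"{contract_type}; risks: {'; '.join(initial_risks)}"

query = build_retrieval_query(
    "Software Outsourcing Development Contract",
    ["deemed acceptance after silence", "uncapped liability for breach"],
)
# recalled_cases = retrieve(query, top_k=3)  # stand-in for the knowledge-retrieval node
print(query)
```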

4.4

 

Comprehensive Review and Report Generation

The new contract text, preliminary risk analysis, relevant historical case risks, generic review rules, and user-specific concerns are consolidated into a detailed prompt for in-depth review by a powerful LLM, and a structured contract review report is generated.

Note: even if all inputs are passed correctly to the final review LLM, it may still fail to produce a high-quality, targeted report if the prompt is under-optimized. The fix is, first, to prefer a capable large-parameter LLM and, second, to make the final prompt as structured, task-specific, and information-focused as possible.
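For illustration, the consolidated prompt might be structured roughly as below; the section headings and variable names are my assumptions, not the workflow's exact prompt:

```python
FINAL_REVIEW_PROMPT = """You are a senior contract review expert.

## Contract under review ({contract_type})
{contract_text}

## Preliminary risk analysis
{initial_risks}

## Risk points from similar historical Bad Cases
{bad_case_risks}

## Generic review rules
{generic_rules}

## User's specific concerns
{user_concerns}

Produce a structured review report: for each risk, cite the clause,
explain the risk and its business impact, reference any matching Bad Case,
and propose a concrete amendment."""

prompt = FINAL_REVIEW_PROMPT.format(
    contract_type="Software Outsourcing Development Contract",
    contract_text="...", initial_risks="...", bad_case_risks="...",
    generic_rules="...", user_concerns="...",
)
```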

5

 

Showing the results

6

 

Three Optimization Directions

For those interested, I recommend starting from three directions: deepening the quality of input information (Bad Case processing), enhancing the actionability of output results (modification suggestions), and improving the system's overall intelligence and user experience (iteration and feedback).

6.1

 

Analyze the recalled Bad Cases one by one through the Loop node.

The current "Bad Case Risk Point Summarization" passes all recalled Bad Cases to a single LLM call and asks it to summarize "every case that is explicitly present in the input text". Consider introducing a Loop node instead: the loop's input is the list of recalled Bad Cases, and each iteration processes one Bad Case, allowing more detailed correlation analysis, as sketched below.
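In plain Python terms, the Loop node amounts to the sketch below; analyze_case() stands in for the per-iteration LLM call and is hypothetical:

```python
def analyze_case(bad_case: dict, new_contract_risks: list[str]) -> str:
    """One loop iteration: correlate a single Bad Case with the new contract's risks.
    In Dify this would be an LLM node receiving exactly one case per iteration."""
    return f"Correlation of {bad_case['case_id']} with: {', '.join(new_contract_risks)}"

recalled_cases = [{"case_id": "BC001"}, {"case_id": "BC002"}]  # retrieval output
summaries = [analyze_case(c, ["deemed acceptance after silence"])
             for c in recalled_cases]
```

The trade-off is cost and latency: one LLM call per case instead of one call total, in exchange for each case getting the model's full attention.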

6.2

 

Introducing the ability to "locate clause-level risks and generate specific amendment recommendations".

The current workflow outputs a "Contract Review Report" that contains modification recommendations, but it does not directly generate usable, clause-specific amendments. Consider adjusting the LLM prompt in the "Review of New Contract" node to explicitly require that it not only identify risks but also locate the specific problematic clause numbers and original text in the new contract, and then generate 1-2 specific, actionable amendment proposals for each major risk point identified, as in the sketch below.
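The added prompt requirement could read roughly as follows; the wording is my assumption, not the workflow's actual prompt:

```python
CLAUSE_LEVEL_INSTRUCTION = """For each major risk identified:
1. Quote the problematic clause's number and original text from the contract.
2. Explain the risk and its likely business impact.
3. Propose 1-2 specific, ready-to-use replacement wordings for the clause."""
```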

6.3

 

Establish an iterative review and user feedback mechanism

The current workflow is one-way: the user submits a contract and receives a review report. After the report is output, it would help to let the user select a particular risk point in it, triggering a new, more focused LLM call that explains the point in more depth, offers additional alternative modifications, or analyzes the pros and cons of a particular change. (This resembles the interaction logic of OpenAI's and Gemini's Deep Research.)

7

 

Exploring the direction of RAG advancement

Advanced RAG is not just simple information retrieval and Q&A; it treats RAG as one core capability and combines it with more complex logic, multi-step processing, external tool invocation, and collaboration between models to accomplish more complex tasks and drive smarter interactions.

Since 2024, expectations for an Agent explosion have kept rising. In many segmented scenarios, RAG (especially advanced RAG that can process an enterprise's various static documents and databases and connect to external tools and information) will no longer be icing on the cake but the basic requirement and core component of enterprise-grade large-model applications: it mitigates the LLM's hallucination and knowledge-freshness problems and gives the Agent a factual basis for actions and decisions. The contract review workflow in this article is one practice of this idea. The contract generation case is expected to be published in early June; stay tuned.

Anyway, practice makes perfect.