AdaExtract - Multimodal Knowledge Graph

Abstract

AdaExtract is our foundation model for multimodal information extraction. It processes diverse data types with customizable schemas and source tracing.

Everyone can do RAG, but not all RAG pipelines are equally good.

AdaExtract Usage

AdaExtract is an extraction model used in two stages: fine-tuning on a schema with examples, then running on new data.

Figure 1 illustrates this workflow:

graph TD
    A[Schema Definition + Examples] --> B[Fine-tune AdaExtract]
    B --> C[AdaExtract Instance]
    D[New Data] --> E[Run AdaExtract]
    C --> E
    E --> F[Structured Output with Source Attribution]
    style C fill:#f9f,stroke:#333,stroke-width:4px
    style E fill:#bbf,stroke:#333,stroke-width:2px

Figure 1: AdaExtract Usage

Start by defining a schema and collecting examples, then fine-tune the pre-trained AdaExtract model on them. Once fine-tuned, the model processes new data and produces structured output with source attribution.
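As a rough illustration of what "structured output with source attribution" could look like, the sketch below defines a small schema and checks that a set of extracted fields conforms to it. The schema format, the class names (SourceRef, ExtractedField), and the sample values are all assumptions made for this example; they are not AdaExtract's actual interface or output.

```python
from dataclasses import dataclass

# Hypothetical schema: field name -> expected Python type.
# AdaExtract's real schema format is not described in this document.
REPORT_SCHEMA = {"company": str, "period": str, "revenue": float}


@dataclass(frozen=True)
class SourceRef:
    """Where an extracted value came from (illustrative fields only)."""
    document: str
    page: int


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: object
    source: SourceRef


def conforms(fields, schema):
    """Return True if the fields match the schema exactly, by name and type."""
    names = {f.name for f in fields}
    if names != set(schema) or len(names) != len(fields):
        return False
    return all(isinstance(f.value, schema[f.name]) for f in fields)


# Made-up sample output, for illustration only.
output = [
    ExtractedField("company", "ExampleCo", SourceRef("report.pdf", 1)),
    ExtractedField("period", "2023-Q1", SourceRef("report.pdf", 1)),
    ExtractedField("revenue", 120.0, SourceRef("report.pdf", 3)),
]

print(conforms(output, REPORT_SCHEMA))
for f in output:
    print(f.name, "=", f.value, "from", f.source.document, "page", f.source.page)
```

Keeping a source reference on every field, rather than on the record as a whole, is one way to let a reviewer verify each value independently.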

Model Architecture

AdaExtract’s model architecture is designed for flexibility and efficiency. It represents our proprietary research and competitive edge, so the specific implementation details are reserved for licensed use. The approximate overview below covers the key components that contribute to AdaExtract’s performance:

graph TD
    A[Input Data] --> B
    subgraph ModelArchitecture[AdaExtract]
        B[Encoder]
        C[Multimodal Fusion Module]
        D[Extraction Head]
        F[Adaptive Layer]
        B --> C
        C --> D
        B -.-> F
        C -.-> F
    end
    D --> E[Structured Output with Source Attribution]
    style ModelArchitecture fill:#f0f0f0,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#bfb,stroke:#333,stroke-width:2px
    style D fill:#bfb,stroke:#333,stroke-width:2px
    style F fill:#f9f,stroke:#333,stroke-width:2px

Figure 2: Approximate AdaExtract Model Architecture

The architecture consists of the following key components:

Encoder: This module efficiently processes both text and image inputs, leveraging advanced techniques in transfer learning and multimodal representation.
Multimodal Fusion Module: A sophisticated component that combines information from different modalities, enabling effective integration of features from diverse data types.
Extraction Head: This specialized module extracts relevant information based on the provided schema and provides accurate source attribution for extracted information.
Adaptive Layer: This layer contains task-specific weights, enabling efficient fine-tuning for new tasks without modifying the entire model architecture.

The exact implementation details, including the specific neural network architectures, attention mechanisms, and training procedures, are available only for licensed use. This unique architecture allows AdaExtract to achieve its key advantages in customization, efficiency, precision, and traceability.

For more details on the baseline model architecture, please subscribe, and we will announce when the research paper is published on arXiv. AdaNomad deploys a version of the model with improvements to the encoder, fusion module, and extraction head.

Example Extracted Output

The following mermaid graph illustrates an example of extracted output from multimodal data sources using AdaExtract:

graph TD
    A[Multimodal Data] --> B[AdaExtract]
    B --> C[Extracted Output]
    subgraph VideoData[Video Data]
        D[Table Tennis Game] --> E[Extracted Entities]
        E --> F[coach]
        E --> G[chair]
        E --> H[table tennis player]
        E --> I[pingpong racket]
        E --> J[pingpong table]
    end
    C --> VideoData
    style A fill:#f9f,stroke:#333,stroke-width:4px
    style B fill:#bbf,stroke:#333,stroke-width:4px
    style C fill:#bfb,stroke:#333,stroke-width:4px

Figure 3: Example Extracted Output from Multimodal Data

This example shows AdaExtract extracting structured information from video according to the defined schema; the same approach applies to other modalities such as image and text. The extracted entities, activities, and information are linked to their respective data sources, which illustrates AdaExtract’s multimodal capabilities and source attribution.
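The entities in Figure 3 can be written down as plain records that each carry their source. The sketch below is only one possible representation: the record keys (entity, modality, source) are assumptions for illustration, and the actual output format of AdaExtract is not specified here.

```python
from collections import defaultdict

# Entity names taken from Figure 3.
ENTITIES = [
    "coach",
    "chair",
    "table tennis player",
    "pingpong racket",
    "pingpong table",
]

# Hypothetical record layout: every entity keeps a pointer to its source.
records = [
    {"entity": name, "modality": "video", "source": "Table Tennis Game"}
    for name in ENTITIES
]


def group_by_source(records):
    """Group entity names under the (modality, source) they came from."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["modality"], record["source"])].append(record["entity"])
    return dict(grouped)


by_source = group_by_source(records)
for (modality, source), names in by_source.items():
    print(f"{modality} / {source}: {', '.join(names)}")
```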

Advantages Over Existing Solutions

While general-purpose models like GPT-4o and Gemini Flash offer powerful capabilities, AdaExtract is neither a large language model (LLM) nor a purely embedding-based model such as Ada-2 or OpenCLIP. Instead, AdaExtract is a specialized information extraction model that extracts data based on a user-defined schema.

This schema-driven approach allows AdaExtract to focus on extracting only the relevant information requested, ignoring extraneous details that may be present in the input data. As a result, its extraction results are more relevant than those of general-purpose indexing techniques.

AdaExtract is also capable of extracting relationships between entities, creating a knowledge graph representation. This provides significant advantages over unstructured text or embeddings, because downstream applications can query the relationships within the data.
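To make the point about querying relationships concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples. The entity names follow Figure 3, but the relation names and the helper functions are hypothetical and chosen only for illustration.

```python
# (subject, relation, object) triples. Entity names follow Figure 3;
# the relation names are hypothetical.
TRIPLES = [
    ("coach", "instructs", "table tennis player"),
    ("table tennis player", "holds", "pingpong racket"),
    ("table tennis player", "plays_at", "pingpong table"),
    ("coach", "sits_on", "chair"),
]


def objects_of(triples, subject, relation=None):
    """Objects linked from `subject`, optionally filtered by relation."""
    return [o for s, r, o in triples if s == subject and relation in (None, r)]


def subjects_of(triples, obj):
    """Subjects that have any relation pointing at `obj`."""
    return [s for s, _, o in triples if o == obj]


print(objects_of(TRIPLES, "table tennis player"))
print(objects_of(TRIPLES, "coach", "instructs"))
print(subjects_of(TRIPLES, "chair"))
```

A question such as "what is the player holding?" becomes a direct lookup here, whereas with raw text or embeddings it would require another round of retrieval and interpretation.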

The trade-off with this approach is that if the schema changes, the extraction process must be repeated on the same set of documents. We consider this a reasonable trade-off given the significant benefits in relevancy and structured output that AdaExtract provides.
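One way a pipeline might keep track of this trade-off is to record which (document, schema) pairs have already been processed, so that a schema change is detected and triggers re-extraction. The sketch below assumes a JSON-serializable schema and uses a SHA-256 fingerprint; the function names and file names are hypothetical and not part of AdaExtract.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Stable hash of a JSON-serializable schema (key order does not matter)."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_extraction(doc_id, schema, done):
    """True if this document has not yet been extracted under this schema."""
    return (doc_id, schema_fingerprint(schema)) not in done


schema_v1 = {"company": "string", "revenue": "number"}
schema_v2 = {"company": "string", "revenue": "number", "period": "string"}

# Set of (document id, schema fingerprint) pairs already processed.
done = {("report.pdf", schema_fingerprint(schema_v1))}

print(needs_extraction("report.pdf", schema_v1, done))  # False
print(needs_extraction("report.pdf", schema_v2, done))  # True
```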

Compared to existing solutions, AdaExtract offers several key advantages:

Guaranteed Schema-Following: By extracting information based on user-defined schemas, AdaExtract ensures that the output adheres to the desired structure, which is not guaranteed with general LLM fine-tuning.
Multimodal Capability: Unlike many text-focused models, AdaExtract efficiently handles both text and image inputs, making it versatile for various real-world applications.
Efficiency: The lightweight fine-tuning approach permits quick instantiation of new model instances, reducing computational overhead and time-to-deployment.
Traceability: The model provides accurate source attribution for all extracted information, enhancing transparency and facilitating verification.
Low Latency: AdaExtract’s efficient extraction process ensures quick response times, making it suitable for real-time applications.

While fine-tuning large language models (LLMs) can be effective for certain tasks, AdaExtract’s specialized architecture and workflow offer distinct advantages in structured information extraction.

Additionally, AdaExtract’s pre-trained model can be used even in situations where there is limited task-specific data available.

Applications

AdaExtract’s flexibility makes it suitable for a wide range of applications, including but not limited to:

1. Legal Document Analysis

One key application of AdaExtract is in the legal domain, where processing complex contracts and legal documents is a critical task. For example, many large law firms maintain a contract manual, which contains standardized language, fallback alternatives, and explanatory rationale for various contract clauses.

Using AdaExtract, these legal manuals can be ingested and processed, with the model extracting the relevant entities, relationships, and rules defined within the manual. This structured knowledge graph representation can then be used to automate various contract review and drafting tasks.

graph TD
    A[Contract Manual] --> B[AdaExtract]
    B --> C[Extracted Knowledge Graph]
    C --> D[Contract Review and Drafting]
    D --> E[Automated Contract Management]
    E --> F[Continuous Learning]
    F --> B
    style A fill:#f9f,stroke:#333,stroke-width:4px
    style B fill:#bbf,stroke:#333,stroke-width:4px
    style C fill:#bfb,stroke:#333,stroke-width:4px
    style D fill:#bfb,stroke:#333,stroke-width:4px
    style E fill:#bfb,stroke:#333,stroke-width:4px
    style F fill:#bfb,stroke:#333,stroke-width:4px

When presented with a new contract, AdaExtract can analyze the document, identify the relevant clauses, and apply the appropriate rules and fallback options based on the extracted knowledge graph. This allows for faster contract review, negotiation, and drafting, as the model can suggest modifications and alternatives that align with the firm’s established best practices.
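A highly simplified sketch of the review step might look like the following. It assumes the manual has already been turned into per-clause rules (standard language, fallbacks, rationale) and compares clauses by normalized exact match, which is a stand-in for whatever matching a real system would use. The ClauseRule class, the clause texts, and the review_clause function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClauseRule:
    """One manual entry: standard language, fallbacks, and rationale."""
    standard: str
    fallbacks: list = field(default_factory=list)
    rationale: str = ""


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


# Made-up manual content, for illustration only.
MANUAL = {
    "governing_law": ClauseRule(
        standard="This agreement is governed by the laws of State A.",
        fallbacks=["This agreement is governed by the laws of State B."],
        rationale="State A is the preferred jurisdiction.",
    ),
}


def review_clause(clause_type, clause_text, manual):
    """Classify a clause against the manual and suggest alternatives."""
    rule = manual.get(clause_type)
    if rule is None:
        return {"status": "unknown_clause_type"}
    text = normalize(clause_text)
    if text == normalize(rule.standard):
        return {"status": "standard"}
    if text in {normalize(f) for f in rule.fallbacks}:
        return {"status": "accepted_fallback"}
    return {
        "status": "deviation",
        "suggest": rule.standard,
        "alternatives": list(rule.fallbacks),
        "rationale": rule.rationale,
    }


result = review_clause(
    "governing_law",
    "This agreement is governed by the laws of State C.",
    MANUAL,
)
print(result["status"], "->", result["suggest"])
```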

As AdaExtract processes more contracts and interacts with legal experts, it can continuously learn and refine its understanding of contract language and negotiation strategies, providing increasingly accurate and tailored recommendations over time.

2. Medical Record Processing

In the medical domain, AdaExtract can be used to extract relevant information from patient records, which often contain a mix of structured and unstructured data, including text, images, and other multimedia content. By defining a schema that captures the key entities, relationships, and medical concepts, AdaExtract can parse these records and create a comprehensive knowledge graph representation.

graph TD
    A[Patient Medical Records] --> B[AdaExtract]
    B --> C[Extracted Knowledge Graph]
    C --> D[Automated Diagnosis]
    C --> E[Treatment Planning]
    D --> F[Patient Monitoring]
    E --> F
    C --> G[Knowledge Graph Reasoning]
    G --> H[Merge with External Knowledge Graphs]
    H --> I[Enhanced Patient Data]
    H --> J[Biomedical Research Insights]
    style A fill:#f9f,stroke:#333,stroke-width:4px
    style B fill:#bbf,stroke:#333,stroke-width:4px
    style C fill:#bfb,stroke:#333,stroke-width:4px
    style D fill:#fbb,stroke:#333,stroke-width:4px
    style E fill:#ffb,stroke:#333,stroke-width:4px
    style F fill:#bff,stroke:#333,stroke-width:4px
    style G fill:#fbf,stroke:#333,stroke-width:4px
    style H fill:#bff,stroke:#333,stroke-width:4px
    style I fill:#fbb,stroke:#333,stroke-width:4px
    style J fill:#ffb,stroke:#333,stroke-width:4px

This structured output can then be used to support various healthcare applications, such as automated diagnosis, treatment planning, and patient monitoring. The traceability and source attribution features of AdaExtract also help to enhance the transparency and trustworthiness of the extracted information, which is particularly important in the medical field.

The extracted knowledge graph can be merged with external knowledge graphs of patient data and biomedical research, enabling more comprehensive analysis and insights. This integration of multiple data sources through knowledge graph reasoning can lead to improved patient care and accelerated medical research.
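The merge step can be sketched as a union of triples in which each triple keeps the set of sources that assert it, so traceability survives the merge. The graph layout, the identifiers (patient_001, record_2023.pdf, external_kg, registry), and the two-hop query are assumptions for illustration and do not describe AdaExtract's internal representation.

```python
# Each graph maps a (subject, relation, object) triple to the set of
# sources asserting it. All identifiers here are made up.
EXTRACTED = {
    ("patient_001", "diagnosed_with", "type 2 diabetes"): {"record_2023.pdf"},
}
EXTERNAL = {
    ("type 2 diabetes", "treated_with", "metformin"): {"external_kg"},
    ("patient_001", "diagnosed_with", "type 2 diabetes"): {"registry"},
}


def merge_graphs(*graphs):
    """Union the triples of several graphs, combining their source sets."""
    merged = {}
    for graph in graphs:
        for triple, sources in graph.items():
            merged.setdefault(triple, set()).update(sources)
    return merged


def two_hop(graph, start, first_rel, second_rel):
    """Follow first_rel from start, then second_rel from each result."""
    mids = {o for (s, r, o) in graph if s == start and r == first_rel}
    return sorted(o for (s, r, o) in graph if s in mids and r == second_rel)


merged = merge_graphs(EXTRACTED, EXTERNAL)
print(two_hop(merged, "patient_001", "diagnosed_with", "treated_with"))
print(sorted(merged[("patient_001", "diagnosed_with", "type 2 diabetes")]))
```

The two-hop query only succeeds on the merged graph: the diagnosis comes from the extracted record and the treatment link from the external graph.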

3. Financial Report Extraction

Financial institutions and analysts often need to process large volumes of financial reports, earnings statements, and other business documents to extract relevant data points and insights. AdaExtract can be leveraged to automate this process, defining schemas that capture the key financial metrics, company information, and market trends.

graph LR
    A[Financial Reports] --> B[AdaExtract]
    B --> C[Extracted Knowledge Graph]
    C --> D[Data Aggregation]
    C --> E[Trend Analysis]
    D --> F[Decision Support]
    E --> F
    style A fill:#f9f,stroke:#333,stroke-width:4px
    style B fill:#bbf,stroke:#333,stroke-width:4px
    style C fill:#bfb,stroke:#333,stroke-width:4px
    style D fill:#fbb,stroke:#333,stroke-width:4px
    style E fill:#ffb,stroke:#333,stroke-width:4px
    style F fill:#bff,stroke:#333,stroke-width:4px

By processing these reports and creating a structured knowledge graph, AdaExtract can enable faster data aggregation, trend analysis, and decision-making support for financial professionals. The model’s multimodal capabilities also allow it to handle a wide range of financial documents, including those with embedded charts, tables, and other visual elements.
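Once metrics are available as structured records, aggregation and simple trend analysis reduce to ordinary data processing. The sketch below uses made-up companies, values, and a hypothetical record layout (company, metric, period, value, source); it is meant to show the downstream step, not AdaExtract's output format.

```python
from collections import defaultdict

# Made-up records in a hypothetical layout; values are illustrative only.
RECORDS = [
    {"company": "ExampleCo", "metric": "revenue", "period": "2023-Q1",
     "value": 120, "source": "exampleco_q1.pdf"},
    {"company": "ExampleCo", "metric": "revenue", "period": "2023-Q2",
     "value": 150, "source": "exampleco_q2.pdf"},
    {"company": "SampleCorp", "metric": "revenue", "period": "2023-Q1",
     "value": 80, "source": "samplecorp_q1.pdf"},
]


def total_by_company(records, metric):
    """Sum a metric across all periods for each company."""
    totals = defaultdict(int)
    for record in records:
        if record["metric"] == metric:
            totals[record["company"]] += record["value"]
    return dict(totals)


def period_change(records, company, metric):
    """Difference between the last and first period (sorted by period label)."""
    series = sorted(
        (r["period"], r["value"])
        for r in records
        if r["company"] == company and r["metric"] == metric
    )
    if not series:
        return None
    return series[-1][1] - series[0][1]


print(total_by_company(RECORDS, "revenue"))
print(period_change(RECORDS, "ExampleCo", "revenue"))
```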

4. Social Media Analysis

In social media and digital marketing, AdaExtract can be used to analyze user-generated content, such as posts, comments, and reviews, to extract valuable insights. By defining schemas that capture entities, sentiments, and relationships, the model can parse this unstructured data and generate structured knowledge graphs.

graph TD
    A[Social Media Content] --> B[AdaExtract]
    B --> C[Extracted Knowledge Graph]
    C --> D[Brand Monitoring]
    C --> E[Sentiment Analysis]
    C --> F[Influencer Identification]
    C --> G[Merge with External Knowledge Graphs]
    G --> H[Social Interaction Analysis]
    G --> I[Enhanced User Profiling]
    style A fill:#f9f,stroke:#333,stroke-width:4px
    style B fill:#bbf,stroke:#333,stroke-width:4px
    style C fill:#bfb,stroke:#333,stroke-width:4px
    style D fill:#bfb,stroke:#333,stroke-width:4px
    style E fill:#bfb,stroke:#333,stroke-width:4px
    style F fill:#bfb,stroke:#333,stroke-width:4px
    style G fill:#fbf,stroke:#333,stroke-width:4px
    style H fill:#fbb,stroke:#333,stroke-width:4px
    style I fill:#ffb,stroke:#333,stroke-width:4px

These knowledge graphs can then be used to support various applications, such as brand monitoring, customer sentiment analysis, and influencer identification. The traceability and source attribution provided by AdaExtract also enhance the reliability and trustworthiness of the extracted insights, which is crucial in the dynamic and often noisy social media landscape.

Comparing the extracted knowledge graphs with external knowledge graphs of social interactions can provide contrasting insights into user behavior, network dynamics, and social trends. This integration enables more sophisticated social media analysis and targeted marketing strategies.
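One simple way to contrast two graphs is a set comparison of their edges: which relationships appear in both, and which appear in only one. The sketch below assumes both graphs use the same (subject, relation, object) triple format and the same identifiers; the user and brand names are made up.

```python
def compare_edges(extracted, external):
    """Split edges into those shared by both graphs and those unique to each."""
    a, b = set(extracted), set(external)
    return {
        "shared": a & b,
        "only_extracted": a - b,
        "only_external": b - a,
    }


# Made-up triples, for illustration only.
EXTRACTED = [
    ("user_a", "mentions", "brand_x"),
    ("user_b", "mentions", "brand_x"),
]
EXTERNAL = [
    ("user_a", "mentions", "brand_x"),
    ("user_a", "follows", "user_b"),
]

diff = compare_edges(EXTRACTED, EXTERNAL)
for label in ("shared", "only_extracted", "only_external"):
    print(label, sorted(diff[label]))
```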

Ongoing Work

Ongoing research and development efforts for AdaExtract include:

Expanding the training dataset to cover a more diverse range of use cases and domains, enhancing the model’s generalization capabilities.
Developing interactive demonstrations to showcase AdaExtract’s real-time capabilities and facilitate adoption by potential users.
Investigating the integration of additional modalities, such as audio and video, to further expand the model’s applicability.

Conclusion

AdaExtract represents a significant advancement in multimodal information extraction, offering a flexible and efficient solution for processing diverse data types.

The innovative architecture and workflow of AdaExtract allow it to adapt quickly to new tasks while maintaining high levels of precision and traceability in the extracted information.

As research progresses, AdaExtract has the potential to revolutionize information extraction across various industries and academic disciplines.

For more information or technical discussions, contact our engineering team at [email protected]

Interested in working on the research? Reach out to us at [email protected]