# recipe-document-converter

🍳 **A Docker-based import service for converting recipe documents (PDF, Word, Excel, Images) into structured JSON data.**

## 🎯 Project Overview

This repository contains a **microservice architecture** designed to:

1. **Extract** recipe content from multiple document formats
2. **Structure** unstructured data using an LLM (Mistral) for intelligent parsing
3. **Integrate** seamlessly with recipe applications via a REST API
4. **Scale** independently using Docker containerization
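To make "structured JSON" more concrete, here is a minimal sketch of what a parsed recipe could look like. The `Recipe` and `Ingredient` shapes, their field names, and the `isRecipe` guard are assumptions for illustration only; the actual output schema is defined by the service.

```typescript
// Hypothetical output shape (assumption, not the service's real schema).
interface Ingredient {
  name: string;
  quantity?: number;
  unit?: string;
}

interface Recipe {
  title: string;
  ingredients: Ingredient[];
  steps: string[];
}

// Runtime check that an unknown JSON value matches the assumed Recipe shape.
function isRecipe(value: unknown): value is Recipe {
  if (typeof value !== "object" || value === null) return false;
  const { title, ingredients, steps } = value as {
    title?: unknown;
    ingredients?: unknown;
    steps?: unknown;
  };
  return (
    typeof title === "string" &&
    Array.isArray(ingredients) &&
    ingredients.every(
      (i) => typeof i === "object" && i !== null && typeof i.name === "string",
    ) &&
    Array.isArray(steps) &&
    steps.every((s) => typeof s === "string")
  );
}

// Example payload, as it might come back from the service.
const example: unknown = JSON.parse(
  '{"title":"Pancakes","ingredients":[{"name":"flour","quantity":200,"unit":"g"}],"steps":["Mix","Fry"]}',
);
console.log(isRecipe(example));
```

A guard like this lets a consuming application reject malformed responses before using them.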

## 📦 Structure

```
recipe-document-converter/
├── recipe-document-converter/    # Main import service (this is the actual service)
│   ├── src/                      # TypeScript source code
│   ├── Dockerfile                # Container definition
│   ├── docker-compose.yml        # Multi-service orchestration
│   ├── package.json              # Dependencies
│   └── README.md                 # Service documentation
└── README.md                     # This file
```

## 🚀 Quick Start

### Using Docker Compose (Recommended)

```bash
cd recipe-document-converter

# Start the service
docker-compose up -d

# Test the service
curl http://localhost:3000/health
```

### Local Development

```bash
cd recipe-document-converter

# Install dependencies
npm install

# Start development server
npm run start:dev
```

## 📖 API Endpoints

| Method | Endpoint      | Description                   |
|--------|---------------|-------------------------------|
| `GET`  | `/health`     | Health check                  |
| `POST` | `/import/pdf` | Import and extract PDF recipe |
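As a rough sketch, a client could call `POST /import/pdf` from TypeScript as shown here. The multipart form field name `file`, the helper names `buildPdfImportRequest` and `importPdf`, and the assumption that the response body is JSON are all illustrative guesses; check the service documentation for the actual request format. The code relies on the global `fetch`, `FormData`, and `Blob` available in Node.js 18+ and browsers.

```typescript
// Builds the request for POST /import/pdf.
// Assumption: the endpoint accepts multipart/form-data with a field named "file".
function buildPdfImportRequest(
  baseUrl: string,
  pdf: Blob,
  filename: string,
): { url: string; method: "POST"; body: FormData } {
  const body = new FormData();
  body.append("file", pdf, filename);
  return {
    url: new URL("/import/pdf", baseUrl).toString(),
    method: "POST",
    body,
  };
}

// Sends the PDF and returns the parsed response body.
// Assumption: the service answers with JSON; its shape is not specified here.
async function importPdf(
  baseUrl: string,
  pdf: Blob,
  filename: string,
): Promise<unknown> {
  const { url, method, body } = buildPdfImportRequest(baseUrl, pdf, filename);
  const response = await fetch(url, { method, body });
  if (!response.ok) {
    throw new Error(`Import failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the service running as in the Quick Start, `importPdf("http://localhost:3000", pdfBlob, "recipe.pdf")` would send the upload, where `pdfBlob` is a `Blob` holding the PDF bytes.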

## 📚 Full Documentation

See [recipe-document-converter/README.md](recipe-document-converter/README.md) for complete documentation.

## 🔮 Planned Features

- [x] PDF extraction
- [x] Basic recipe structuring
- [ ] Mistral LLM integration
- [ ] Excel support
- [ ] Word support
- [ ] Image OCR support
- [ ] Web scraping

## 🛠️ Tech Stack

- **NestJS**: Node.js framework
- **TypeScript**: Type safety
- **Docker**: Containerization
- **pdf-parse**: PDF extraction
- **Zod**: Schema validation (coming soon)
- **Mistral AI**: LLM integration (coming soon)

## 🤝 Contributing

Contributions welcome! Open an issue or submit a PR.

## 📄 License

MIT