hsh-finetune-dataset

Active

declared in 1.0.0

Made-to-order, answer-verified datasets for LLM fine-tuning. Describe the task (e.g. 'step-by-step math reasoning', 'SQL generation', 'instruction-following for support replies') and we deliver a clean, HuggingFace-ready dataset in Alpaca schema (instruction/input/output), deduplicated and split into train/val/test, with every checkable answer verified in code. Drop the repo straight into Gradients (SN56), TRL, Axolotl, or Unsloth. A verified sample is live at huggingface.co/datasets/HSH-Intelligence/verified-math-reasoning-3k. Pricing: Tier S, 1-2K rows ($75); Tier M, 2-5K rows ($150); Tier L, 5-10K rows ($300). Custom or larger orders are scoped on request.
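For readers checking a delivered dataset, here is a minimal sketch of what "Alpaca schema, deduplicated, train/val/test split" can look like in practice. It is not the vendor's pipeline: the helper names (`is_alpaca_row`, `dedupe`, `split`), the case-insensitive duplicate rule, and the 80/10/10 ratios are all assumptions made for illustration.

```python
import random

# The three Alpaca fields named in the description above.
ALPACA_KEYS = ("instruction", "input", "output")


def is_alpaca_row(row):
    """True if the row has exactly the three Alpaca fields, all strings."""
    return set(row) == set(ALPACA_KEYS) and all(
        isinstance(row[key], str) for key in ALPACA_KEYS
    )


def dedupe(rows):
    """Drop repeated rows, keeping the first occurrence.

    Assumption: two rows are duplicates when all three fields match
    after trimming whitespace and lowercasing.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[k].strip().lower() for k in ALPACA_KEYS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def split(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically, then cut into train/validation/test.

    The default fractions are hypothetical, not the vendor's ratios.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "validation": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


# Tiny made-up sample: ten distinct rows plus one exact duplicate.
sample = [
    {"instruction": f"Add {i} and 1.", "input": "", "output": str(i + 1)}
    for i in range(10)
]
sample.append(dict(sample[0]))

clean = dedupe([row for row in sample if is_alpaca_row(row)])
parts = split(clean)
```

Fixing the shuffle seed keeps the split reproducible, so the same rows land in the same partition on every run.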

Parameters schema

{
  "type": "object",
  "required": [
    "task_description",
    "row_count"
  ],
  "properties": {
    "domain": {
      "type": "string",
      "description": "Subject domain (e.g. 'math', 'SQL', 'customer support', 'legal Q&A')."
    },
    "row_count": {
      "type": "number",
      "description": "Number of training rows needed (1000-10000 standard; larger scoped on request)."
    },
    "schema_hint": {
      "type": "string",
      "description": "Preferred schema. Default: Alpaca instruction/input/output (Gradients-ready)."
    },
    "verification": {
      "enum": [
        "programmatic",
        "llm_judge",
        "none"
      ],
      "type": "string",
      "description": "How answers are checked. Programmatic (code-verified ground truth) where the task allows."
    },
    "target_platform": {
      "enum": [
        "gradients",
        "trl",
        "axolotl",
        "unsloth",
        "generic"
      ],
      "type": "string",
      "description": "Where you'll train — tunes the delivered format."
    },
    "task_description": {
      "type": "string",
      "description": "Plain English: what the model should learn to do (the instruction-following task)."
    }
  }
}
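As an illustration of the schema above, the sketch below builds one parameters object and checks it against the declared constraints (required fields, types, and enum values) using only the standard library. The `validate_params` helper and its error messages are hypothetical and are not part of the tool; the example values are invented. The 1000-10000 range for `row_count` appears only in the field's description, so this sketch does not enforce it.

```python
# Constraints copied from the parameters schema above.
REQUIRED = ("task_description", "row_count")
STRING_FIELDS = (
    "domain",
    "schema_hint",
    "verification",
    "target_platform",
    "task_description",
)
ENUMS = {
    "verification": {"programmatic", "llm_judge", "none"},
    "target_platform": {"gradients", "trl", "axolotl", "unsloth", "generic"},
}


def validate_params(params):
    """Return a list of problems; an empty list means the object passes."""
    errors = []
    for name in REQUIRED:
        if name not in params:
            errors.append(f"missing required field: {name}")
    for name in STRING_FIELDS:
        if name in params and not isinstance(params[name], str):
            errors.append(f"{name} must be a string")
    if "row_count" in params:
        value = params["row_count"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("row_count must be a number")
    for name, allowed in ENUMS.items():
        value = params.get(name)
        if isinstance(value, str) and value not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    return errors


# Example request (values are illustrative only).
example = {
    "task_description": "step-by-step math reasoning",
    "domain": "math",
    "row_count": 3000,
    "verification": "programmatic",
    "target_platform": "trl",
}
problems = validate_params(example)
```

A full JSON Schema validator would cover the same rules; the hand-written check is used here only to keep the example free of third-party dependencies.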

What this tool wraps · 0 endpoints

No endpoints wrapped at confidence ≥ 0.70.

Parent server

HSH Data-on-Demand

https://github.com/hshintelligence/data-on-demand

2/7 registries
