Extract and process different types of blocks from markdown files
MD-Extract is a utility for extracting and processing content from markdown files. It can extract code blocks, normal text blocks, or both, with various output formats and filtering options.
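To make the distinction between code blocks and normal text blocks concrete, here is a minimal Python sketch of how a markdown document can be split into the two kinds of block. It is an illustration only, not MD-Extract's actual implementation: the function name `split_blocks`, the dictionary keys, and the simplified fence handling (three backticks only, no nesting) are all assumptions made for this example.

```python
import re

# Opening fence: three backticks plus an optional language tag (simplified).
FENCE_OPEN = re.compile(r"^`{3}([^\s`]*)\s*$")
FENCE_CLOSE = "`" * 3


def split_blocks(markdown: str) -> list[dict]:
    """Split markdown into 'normal' and 'code' blocks, in document order."""
    blocks: list[dict] = []
    buffer: list[str] = []
    language = None  # None outside a fence; the language tag (maybe "") inside one

    def flush(kind: str) -> None:
        content = "\n".join(buffer).strip("\n")
        buffer.clear()
        if not content:
            return  # skip empty blocks
        block = {"type": kind, "content": content}
        if kind == "code":
            block["language"] = language
        blocks.append(block)

    for line in markdown.splitlines():
        if language is None:
            match = FENCE_OPEN.match(line)
            if match:
                flush("normal")  # text before the fence is a normal block
                language = match.group(1)
            else:
                buffer.append(line)
        elif line.strip() == FENCE_CLOSE:
            flush("code")
            language = None
        else:
            buffer.append(line)

    # Flush whatever is left, including an unterminated fence.
    flush("normal" if language is None else "code")
    return blocks


# Hypothetical sample input, built line by line.
sample = "\n".join([
    "Intro text.",
    "",
    FENCE_CLOSE + "python",
    "def hello():",
    '    print("Hello")',
    FENCE_CLOSE,
    "",
    "This is a normal text block.",
])

for block in split_blocks(sample):
    print(block["type"], block.get("language", "-"))
```

The sketch keeps blocks in document order and tags each with a type, which mirrors the `type`, `language`, and `content` fields used by the YAML output format.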
Extract code blocks from a markdown file:
pinocchio helpers md-extract --file input.md
Extract all blocks (code and normal text):
pinocchio helpers md-extract --file input.md --blocks all
Process markdown from stdin:
cat input.md | pinocchio helpers md-extract
--output <string>
# Simple concatenation of blocks
pinocchio helpers md-extract --file input.md --output concatenated
# Detailed list with block metadata
pinocchio helpers md-extract --file input.md --output list
# YAML format for structured processing
pinocchio helpers md-extract --file input.md --output yaml
--blocks <string>
# Extract all blocks
pinocchio helpers md-extract --file input.md --blocks all
# Extract only normal text blocks
pinocchio helpers md-extract --file input.md --blocks normal
# Extract only code blocks (default)
pinocchio helpers md-extract --file input.md --blocks code
--with-quotes <bool>
pinocchio helpers md-extract --file input.md --with-quotes
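As a rough illustration of what `--with-quotes` changes, the following Python sketch re-wraps code blocks in their fence and language marker when the option is on, and emits the bare content otherwise. The function `render_concatenated`, the block dictionaries, and the newline separator between blocks are assumptions for this example, not the tool's real code.

```python
FENCE = "`" * 3  # the "quotes" placed around a code block


def render_concatenated(blocks: list[dict], with_quotes: bool = False) -> str:
    """Join block contents; optionally restore the fence around code blocks."""
    parts = []
    for block in blocks:
        if block["type"] == "code" and with_quotes:
            language = block.get("language", "")
            parts.append(f"{FENCE}{language}\n{block['content']}\n{FENCE}")
        else:
            # Normal text, or code without quotes: emit the content as-is.
            parts.append(block["content"])
    return "\n".join(parts)


# Hypothetical block, shaped like one entry of the YAML output format.
blocks = [
    {"type": "code", "language": "python", "content": 'def hello():\n    print("Hello")'},
]

print(render_concatenated(blocks, with_quotes=True))
print(render_concatenated(blocks, with_quotes=False))
```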
--allowed-languages <list>
pinocchio helpers md-extract --file input.md --allowed-languages python,go
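The filtering behind `--allowed-languages` can be pictured with a short Python sketch. It assumes, for illustration only, that the comma-separated value is split into a set, that matching is case-insensitive, and that non-code blocks pass through untouched; the real command may differ on any of these points, and `filter_by_language` is a hypothetical name.

```python
def filter_by_language(blocks: list[dict], allowed: str) -> list[dict]:
    """Keep code blocks whose language appears in the comma-separated list."""
    allowed_set = {name.strip().lower() for name in allowed.split(",") if name.strip()}
    return [
        block
        for block in blocks
        # Assumption: only code blocks are filtered; other blocks are kept.
        if block["type"] != "code" or block.get("language", "").lower() in allowed_set
    ]


# Hypothetical extracted blocks.
blocks = [
    {"type": "code", "language": "python", "content": "print('hi')"},
    {"type": "code", "language": "bash", "content": "echo hi"},
    {"type": "code", "language": "go", "content": "package main"},
]

kept = filter_by_language(blocks, "python,go")
print([block["language"] for block in kept])  # prints ['python', 'go']
```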
--file <string>
# Read from file
pinocchio helpers md-extract --file README.md
# Read from stdin
cat README.md | pinocchio helpers md-extract --file -
### Concatenated
Outputs blocks one after another.

For code blocks with --with-quotes:
```python
def hello():
    print("Hello")
```

For code blocks without --with-quotes:

    def hello():
        print("Hello")

For normal text blocks:

    This is a normal text block.

### List
Outputs blocks with metadata:
Language: python
def hello():
    print("Hello")
### YAML
Outputs blocks in structured YAML format:
```yaml
- type: code
  language: python
  content: |
    def hello():
        print("Hello")
- type: normal
  content: This is a normal text block.
```
### Extract Code Examples
# Extract Python code examples from documentation
pinocchio helpers md-extract --file docs.md --allowed-languages python
### Documentation Processing
# Extract all content in structured format
pinocchio helpers md-extract --file README.md --blocks all --output yaml
### Code Block Collection
# Collect all code blocks with language markers
pinocchio helpers md-extract --file tutorial.md --with-quotes
### Text Content Extraction
# Extract only text content
pinocchio helpers md-extract --file article.md --blocks normal