MarkItDown: Convert Documents to Markdown

This project is an ongoing journey of learning AI open-source projects through steady, daily progress. By working hands-on with real projects and AI tooling, the goal is to build the ability to solve complex problems and to document the process along the way. 1. Introduction. 1.1. MarkItDown and Markdown: Clarifying the Relationship. First, it is important to clarify that "MarkItDown" is not a misspelling of the general-purpose markup language "Markdown." MarkItDown is a specific Python library developed and open-sourced by Microsoft. Its name resembles Markdown, and its core purpose is to convert various file formats into Markdown, but MarkItDown is a standalone piece of software. This article analyzes the MarkItDown tool's implementation principles, design philosophy, features, and practical applications, and it refers to the Markdown language itself where relevant as the target output format. ...

April 21, 2025 · 22 min · 4521 words · Xinwei Xiong, Me

Large Language Models: How LLMs Work

Basic theory of LLMs. Introduction to large language models. A Large Language Model (LLM) is an artificial intelligence model designed to understand and generate human language. The term typically refers to language models with tens of billions (or more) of parameters, trained on massive amounts of text data to gain a deep understanding of language. Well-known LLMs from outside China currently include GPT-3.5, GPT-4, PaLM, Claude, and LLaMA. Well-known Chinese LLMs include Wenxinyiyan, iFlytek Spark, Tongyi Qianwen, ChatGLM, and Baichuan. ...

May 15, 2024 · 147 min · 31282 words · Xinwei Xiong, Me