[{"data":1,"prerenderedAt":510},["ShallowReactive",2],{"content-building-chatgpt-sync":3},{"id":4,"title":5,"articleTitleSource":6,"articleValid":7,"articleWarnings":8,"body":9,"comments":7,"date":497,"date_updated":8,"description":498,"extension":499,"icon":500,"image":501,"language":8,"meta":502,"navigation":7,"path":503,"publish-to":504,"seo":505,"stem":506,"titleLines":507,"topics":508,"translationKey":8,"__hash__":509},"content/building-chatgpt-sync.md","Building `chatgpt-sync`","h1",true,null,{"type":10,"value":11,"toc":485},"minimark",[12,26,33,36,46,49,54,69,72,79,83,86,89,92,96,107,110,113,117,120,130,133,136,140,147,154,157,161,168,174,177,180,183,187,190,193,196,200,203,210,213,217,220,223,226,230,238,245,287,292,376,383,388,435,440,452,463,473,481],[13,14,15,25],"p",{},[16,17,21],"a",{"href":18,"rel":19},"https://github.com/pavel-voronin/chatgpt-sync",[20],"nofollow",[22,23,24],"code",{},"chatgpt-sync"," started as a small local tool for exporting ChatGPT conversations into Markdown files. The initial idea was simple: use an already authenticated ChatGPT session, read the conversation list, fetch each conversation, render it into Markdown, and keep the result in a local workspace. After a few iterations, the project became less of a one-shot exporter and more of a sync engine. The difference is mostly operational: an exporter can assume that it runs once and either succeeds or fails, while a sync engine has to handle repeated runs, partial progress, moved files, missing assets, unavailable conversations, backend failures, and changes in the remote data model.",[13,27,28,29,32],{},"The project is a TypeScript/Node CLI. It does not use the public OpenAI API and it does not implement its own login flow. Instead, it connects to a separate Chrome instance through Chrome DevTools Protocol and uses an already authenticated ChatGPT profile. Chrome is used as the authenticated browser context; the tool then performs backend requests from inside that context and writes Markdown files, assets, and an ",[22,30,31],{},"index.json"," file into a local workspace.",[13,34,35],{},"The basic flow now looks like this:",[37,38,44],"pre",{"className":39,"code":41,"language":42,"meta":43},[40],"language-text","authenticated Chrome profile\n→ CDP session\n→ ChatGPT backend requests from browser runtime\n→ conversation JSON\n→ Markdown renderer\n→ local workspace\n","text","",[22,45,41],{"__ignoreMap":43},[13,47,48],{},"This post is a build log of how the architecture moved toward that shape.",[50,51,53],"h2",{"id":52},"starting-with-the-browser-session","Starting with the browser session",[13,55,56,57,60,61,64,65,68],{},"The first implementation decision was to avoid owning authentication. The tool should not store a password, automate a login form, or implement a separate auth protocol. It should assume that the user has a Chrome profile where ChatGPT already works, and it should operate through that profile. This made CDP the natural entry point: the program can inspect tabs, open or reuse a ChatGPT tab, enable ",[22,58,59],{},"Network",", ",[22,62,63],{},"Page",", and ",[22,66,67],{},"Runtime"," domains, and evaluate JavaScript in the page context.",[13,70,71],{},"This approach also means that the normal working setup is a separate Chrome process with a separate profile, not the user’s everyday GUI Chrome. That avoids interfering with the user’s browser and gives the CLI a controlled runtime. The CDP layer stayed small: connect to the WebSocket debugger URL, keep track of pending CDP requests, dispatch events, create or select the ChatGPT tab, apply a few browser-session preparations, and expose helpers for runtime evaluation.",[13,73,74,75,78],{},"The early version relied more on navigation. It opened a conversation page and captured the backend response that the ChatGPT UI loaded. That was enough to prove the path from Chrome session to Markdown output, but it tied export to UI navigation. Later versions changed this: the tool now performs a direct ",[22,76,77],{},"fetch(\"/backend-api/conversation/{id}\")"," from the browser runtime. Chrome is still needed for authentication and browser-originated context, but exporting a conversation no longer depends on navigating the tab to that conversation.",[50,80,82],{"id":81},"backend-api-instead-of-dom","Backend API instead of DOM",[13,84,85],{},"A DOM-based exporter would have been easier to prototype but worse as a sync tool. The DOM is the rendered interface, not the conversation data model. It can miss hidden state, mix UI details into content, depend on layout changes, and make it hard to handle assets, citations, Canvas documents, or alternative branches.",[13,87,88],{},"The project therefore moved toward backend JSON. The list of conversations is read from the ChatGPT backend, and each conversation is exported from its backend payload. This gives the renderer access to the mapping, metadata, content parts, attachments, references, and other structures that are not reliably available as visible page text.",[13,90,91],{},"The cost is that this is not a stable public contract. The tool depends on internal ChatGPT backend endpoints, expected headers, and the current shape of conversation JSON. That trade-off is central to the project: backend JSON is much more useful than DOM text for this task, but it requires defensive code and debug hooks because the format can change.",[50,93,95],{"id":94},"header-capture-became-its-own-problem","Header capture became its own problem",[13,97,98,99,102,103,106],{},"At first it was tempting to think that cookies would be enough. In practice, backend requests need more context. The ChatGPT web app sends authorization and several client/session/build/language/route headers, and some of them are not always available in the first network event. The tool now listens to both ",[22,100,101],{},"Network.requestWillBeSent"," and ",[22,104,105],{},"Network.requestWillBeSentExtraInfo",", merges headers by request id, identifies backend requests by path and target headers, and validates that a usable header context has been collected.",[13,108,109],{},"This also changed the preparation phase. If the selected tab is already on a conversation URL, the tool moves it back to the ChatGPT root before collecting headers. If the right backend requests do not appear, it performs a lightweight probe to the conversations endpoint from the page context. The goal is not to guess headers manually, but to observe the headers that the real web app is using and then reuse the relevant parts for list and conversation fetches.",[13,111,112],{},"That became one of the more important reliability improvements. The sync process now has an explicit “prepare backend context” phase instead of assuming that any visible ChatGPT page is immediately enough.",[50,114,116],{"id":115},"separating-scan-from-export","Separating scan from export",[13,118,119],{},"The next architectural change was splitting the process into scan and export phases. Reading the conversation list and exporting full conversation payloads have different costs and failure modes. List scanning is relatively cheap, can be paginated, and can save summaries as it goes. Conversation export is heavier: it fetches full payloads, renders Markdown, downloads assets, writes files, and updates status.",[13,121,122,123,126,127,129],{},"The scan phase now reads pages of conversation summaries, applies the selected mode, and records which conversations are new or changed. Those conversations become ",[22,124,125],{},"pending"," in ",[22,128,31],{},". The export phase then takes a bounded batch of pending conversations and exports them one by one.",[13,131,132],{},"This made incremental sync easier to reason about. If scanning fails after several pages, the summaries already seen can remain in the index, but the watermark is not advanced. If export fails, the remaining conversations stay pending. A later run can continue without assuming that the previous run completed cleanly.",[13,134,135],{},"The project also distinguishes first-run bootstrap from normal sync. A first run with no watermark has to be explicit: export the latest N conversations, export conversations from the last N days, or scan the full history. After that, incremental sync can use a watermark plus an overlap window to avoid missing borderline updates.",[50,137,139],{"id":138},"the-index-is-sync-state-not-the-archive","The index is sync state, not the archive",[13,141,142,143,146],{},"The local state file is ",[22,144,145],{},"workspace/index.json",". It stores the sync watermark, backend lock information, and per-conversation state such as summary, status, and last synced update marker. The important part is what it does not try to be: it is not the full database of the archive.",[13,148,149,150,153],{},"Earlier versions stored more file metadata in the index. Later versions reduced this and moved toward workspace-driven sync. The Markdown files themselves contain frontmatter with the conversation id, title, source URL, and update timestamp. On startup, the tool recursively scans the workspace, reads frontmatter, and builds a map from ",[22,151,152],{},"conversation_id"," to the current Markdown path.",[13,155,156],{},"This allows the user to move exported files around inside the workspace. New conversations are written to the inbox directory, but existing conversations are updated where they already live. If a previously exported file disappears, the sync engine can mark that conversation as locally removed instead of recreating it blindly. This made the filesystem part of the model instead of treating it as a disposable output directory.",[50,158,160],{"id":159},"rendering-turned-out-to-be-most-of-the-work","Rendering turned out to be most of the work",[13,162,163,164,167],{},"The Markdown renderer became the largest and most product-specific part of the project. A ChatGPT conversation is not just a flat list of messages. It is a tree with a ",[22,165,166],{},"current_node",", message metadata, content parts, attachments, tool messages, possible Canvas state, research reports, citations, and other references.",[13,169,170,171,173],{},"One early issue was branch handling. A recursive walk over children can mix alternative branches of the conversation. The renderer now builds the path from ",[22,172,166],{}," back to the root and renders that path in order. This means the Markdown follows the currently visible branch rather than trying to preserve every alternative response.",[13,175,176],{},"The renderer also filters internal or non-user-facing content. System messages, raw tool calls, model context, reasoning-related internals, and canmore service messages are not useful as normal Markdown transcript content. Some of them are ignored; some are used to reconstruct visible artifacts. This is especially relevant for Canvas. The raw canmore create/update messages are not rendered as a log, but they are used to maintain the active text document state, so the output can contain the resulting document rather than the implementation protocol.",[13,178,179],{},"Deep Research required another special case. Some report content is not stored like a normal assistant message. The renderer detects the relevant metadata, parses the widget state, extracts the report message, and renders it as Markdown. Citations and content references are also handled separately: source footnotes, nav lists, entity references, and inline link lists are converted into Markdown links or source sections where possible.",[13,181,182],{},"This is the part of the project where “exporting chat messages” became too small a description. The renderer has to preserve useful artifacts, not only visible text.",[50,184,186],{"id":185},"assets-and-partial-failure","Assets and partial failure",[13,188,189],{},"Assets are handled through placeholders during render and resolved after files are downloaded. The project supports several placement strategies: a fixed assets folder, assets next to the Markdown file, assets at the workspace root, or a subfolder next to the current Markdown file. Before a conversation is re-exported, old asset artifacts for that Markdown file are removed so that stale files do not remain attached to an updated export.",[13,191,192],{},"Asset download also needed a non-fatal failure mode. A conversation can be exportable even when one file is no longer available or one signed download URL fails. In that case, the Markdown should still be written, and the missing asset should be replaced with a readable note. On the other hand, statuses such as 429 or 5xx are treated as backend-level problems and can stop the run.",[13,194,195],{},"This was a small but important distinction: one missing file should not destroy the text export, but backend pressure or service failure should not be ignored.",[50,197,199],{"id":198},"backend-locks-and-unavailable-conversations","Backend locks and unavailable conversations",[13,201,202],{},"Once the tool became suitable for repeated runs, backend failure handling had to be more explicit. Some errors indicate that continuing is probably wrong: 401, 403, 408, 429, 5xx, or a missing/unknown backend status. These can set a backend lock in the index. A later run can exit early while the lock is active instead of repeatedly hitting the same failing backend.",[13,204,205,206,209],{},"A 404 for one conversation is handled differently. It does not necessarily mean that the backend is unavailable; it can mean that this specific conversation payload cannot be fetched. In that case the conversation is marked ",[22,207,208],{},"unavailable",", and the rest of the sync can continue. This became part of the current error model in the latest version.",[13,211,212],{},"The same general rule appears in several places: distinguish local or item-level failure from global backend failure, keep enough state to retry safely, and avoid turning every error into either a full crash or silent success.",[50,214,216],{"id":215},"current-shape","Current shape",[13,218,219],{},"The current architecture is a small local CLI with a few clear layers: CDP session preparation, backend header capture, list scanning, conversation export, Markdown rendering, asset handling, and index storage. The tool uses ChatGPT’s web app context to access structured backend data, but the user-facing result is just files in a local workspace.",[13,221,222],{},"The main trade-off remains unchanged. This is not an official ChatGPT integration, so the backend contract can break. The project compensates with debug modes, raw JSON dumps, unknown-part rendering, typed backend errors, throttling, batch limits, and a renderer that can be extended when new content shapes appear.",[13,224,225],{},"The final design is still simple in form: run a CLI, use an authenticated Chrome profile, write Markdown files. Most of the work is in the details around making that repeatable without treating the local archive as a temporary dump and without assuming that the remote system is stable.",[50,227,229],{"id":228},"trying-it-locally","Trying it locally",[13,231,232,233,237],{},"The project is on GitHub: ",[16,234,236],{"href":18,"rel":235},[20],"pavel-voronin/chatgpt-sync",". The README has the full setup notes, but the short version is:",[239,240,241],"ol",{},[242,243,244],"li",{},"Clone the repository and install dependencies.",[37,246,250],{"className":247,"code":248,"language":249,"meta":43,"style":43},"language-bash shiki shiki-themes material-theme","git clone https://github.com/pavel-voronin/chatgpt-sync.git\ncd chatgpt-sync\nnpm install\n","bash",[22,251,252,268,278],{"__ignoreMap":43},[253,254,257,261,265],"span",{"class":255,"line":256},"line",1,[253,258,260],{"class":259},"s5Dmg","git",[253,262,264],{"class":263},"sfyAc"," clone",[253,266,267],{"class":263}," https://github.com/pavel-voronin/chatgpt-sync.git\n",[253,269,271,275],{"class":255,"line":270},2,[253,272,274],{"class":273},"sdLwU","cd",[253,276,277],{"class":263}," chatgpt-sync\n",[253,279,281,284],{"class":255,"line":280},3,[253,282,283],{"class":259},"npm",[253,285,286],{"class":263}," install\n",[239,288,289],{"start":270},[242,290,291],{},"Start a dedicated Chrome instance with a CDP endpoint. The Chrome profile used here must already be authenticated in ChatGPT.",[37,293,295],{"className":247,"code":294,"language":249,"meta":43,"style":43},"open -na \"/Applications/Google Chrome.app\" --args \\\n  --headless \\\n  --disable-gpu \\\n  --remote-debugging-port=9222 \\\n  --user-data-dir=\"$HOME/.chrome-chatgpt-sync\" \\\n  --no-first-run \\\n  about:blank\n",[22,296,297,322,329,336,344,362,370],{"__ignoreMap":43},[253,298,299,302,305,309,312,315,318],{"class":255,"line":256},[253,300,301],{"class":259},"open",[253,303,304],{"class":263}," -na",[253,306,308],{"class":307},"sAklC"," \"",[253,310,311],{"class":263},"/Applications/Google Chrome.app",[253,313,314],{"class":307},"\"",[253,316,317],{"class":263}," --args",[253,319,321],{"class":320},"svy0-"," \\\n",[253,323,324,327],{"class":255,"line":270},[253,325,326],{"class":263},"  --headless",[253,328,321],{"class":320},[253,330,331,334],{"class":255,"line":280},[253,332,333],{"class":263},"  --disable-gpu",[253,335,321],{"class":320},[253,337,339,342],{"class":255,"line":338},4,[253,340,341],{"class":263},"  --remote-debugging-port=9222",[253,343,321],{"class":320},[253,345,347,350,352,355,358,360],{"class":255,"line":346},5,[253,348,349],{"class":263},"  --user-data-dir=",[253,351,314],{"class":307},[253,353,354],{"class":320},"$HOME",[253,356,357],{"class":263},"/.chrome-chatgpt-sync",[253,359,314],{"class":307},[253,361,321],{"class":320},[253,363,365,368],{"class":255,"line":364},6,[253,366,367],{"class":263},"  --no-first-run",[253,369,321],{"class":320},[253,371,373],{"class":255,"line":372},7,[253,374,375],{"class":263},"  about:blank\n",[13,377,378,379,382],{},"After the first start, open the same profile without ",[22,380,381],{},"--headless"," if you need to sign in to ChatGPT manually, then restart it in headless mode.",[239,384,385],{"start":280},[242,386,387],{},"Create a local env file and choose the first-run bootstrap mode.",[37,389,391],{"className":247,"code":390,"language":249,"meta":43,"style":43},"cat > .env.local \u003C\u003C'EOF'\nCHATGPT_SYNC_CDP_HTTP=http://127.0.0.1:9222\nCHATGPT_SYNC_WORKSPACE_DIR=./output\nCHATGPT_SYNC_BOOTSTRAP_MODE=count\nCHATGPT_SYNC_BOOTSTRAP_COUNT=5\nEOF\n",[22,392,393,410,415,420,425,430],{"__ignoreMap":43},[253,394,395,398,401,404,407],{"class":255,"line":256},[253,396,397],{"class":259},"cat",[253,399,400],{"class":307}," >",[253,402,403],{"class":263}," .env.local",[253,405,406],{"class":307}," \u003C\u003C",[253,408,409],{"class":307},"'EOF'\n",[253,411,412],{"class":255,"line":270},[253,413,414],{"class":263},"CHATGPT_SYNC_CDP_HTTP=http://127.0.0.1:9222\n",[253,416,417],{"class":255,"line":280},[253,418,419],{"class":263},"CHATGPT_SYNC_WORKSPACE_DIR=./output\n",[253,421,422],{"class":255,"line":338},[253,423,424],{"class":263},"CHATGPT_SYNC_BOOTSTRAP_MODE=count\n",[253,426,427],{"class":255,"line":346},[253,428,429],{"class":263},"CHATGPT_SYNC_BOOTSTRAP_COUNT=5\n",[253,431,432],{"class":255,"line":364},[253,433,434],{"class":307},"EOF\n",[239,436,437],{"start":338},[242,438,439],{},"Run the sync.",[37,441,443],{"className":247,"code":442,"language":249,"meta":43,"style":43},"npm start\n",[22,444,445],{"__ignoreMap":43},[253,446,447,449],{"class":255,"line":256},[253,448,283],{"class":259},[253,450,451],{"class":263}," start\n",[13,453,454,455,458,459,462],{},"The default output goes into ",[22,456,457],{},"./output",". The Markdown files are written there, and sync state is stored in ",[22,460,461],{},"output/index.json",". After the first run, normal incremental runs can be started with the same command:",[37,464,465],{"className":247,"code":442,"language":249,"meta":43,"style":43},[22,466,467],{"__ignoreMap":43},[253,468,469,471],{"class":255,"line":256},[253,470,283],{"class":259},[253,472,451],{"class":263},[13,474,475,476,480],{},"For the full list of options, see the project’s ",[16,477,479],{"href":18,"rel":478},[20],"README",".",[482,483,484],"style",{},"html pre.shiki code .s5Dmg, html code.shiki .s5Dmg{--shiki-default:#FFCB6B}html pre.shiki code .sfyAc, html code.shiki .sfyAc{--shiki-default:#C3E88D}html pre.shiki code .sdLwU, html code.shiki .sdLwU{--shiki-default:#82AAFF}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sAklC, html code.shiki .sAklC{--shiki-default:#89DDFF}html pre.shiki code .svy0-, html code.shiki .svy0-{--shiki-default:#EEFFFF}",{"title":43,"searchDepth":270,"depth":270,"links":486},[487,488,489,490,491,492,493,494,495,496],{"id":52,"depth":270,"text":53},{"id":81,"depth":270,"text":82},{"id":94,"depth":270,"text":95},{"id":115,"depth":270,"text":116},{"id":138,"depth":270,"text":139},{"id":159,"depth":270,"text":160},{"id":185,"depth":270,"text":186},{"id":198,"depth":270,"text":199},{"id":215,"depth":270,"text":216},{"id":228,"depth":270,"text":229},"2026-04-30","A build log on turning ChatGPT conversations into a local Markdown archive through CDP, backend JSON, and incremental sync.","md","streamline-ultimate-color:conversation-sync","og-image.jpg",{},"/building-chatgpt-sync","all",{"title":5,"description":498},"building-chatgpt-sync","1","build log, AI toolchain","563AdLXvGHefpFLMm5PHiVSL0gKUvrg7aIytQKsme2o",1777568642668]