Hoshi Labs · Est. 2024 · Tokyo
Mechanistic interoperability. Frontier research.
We build on the compute nobody else uses.
What we build
Hoshi Node
Real MLX inference on Apple Silicon. Every job produces a cryptographic proof, anchored on Sui. Early node operators join now.
Memora
Local-first memory layer for AI agents. ANE-accelerated embeddings. Zero API keys, zero cloud. Drop-in for any agent framework.
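Memora's actual interface isn't shown here, but the local-first recall idea can be sketched in a few lines: store text in-process, embed it, retrieve by cosine similarity. The `MemoryStore` class and `embed` function below are illustrative stand-ins (a toy deterministic embedding in place of the real ANE-accelerated model), not Memora's API.

```python
# Toy sketch of local-first memory recall: no API keys, no cloud.
# `embed` is a deterministic stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: bucket each token by a deterministic character sum."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[sum(ord(c) for c in token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class MemoryStore:
    """Everything stays in-process; nothing leaves the machine."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def remember(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        # rank stored memories by cosine similarity to the query
        scored = sorted(self.items,
                        key=lambda it: -sum(a * b for a, b in zip(q, it[1])))
        return [text for text, _ in scored[:k]]

store = MemoryStore()
store.remember("apple silicon ANE benchmarks from last week")
store.remember("grocery list: milk, eggs")
print(store.recall("apple silicon")[0])
```

A real drop-in would swap `embed` for a model call; the store and ranking logic stay the same.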
Chimera
First implementation of MTP inference on MLX. We re-enabled the speculative decoding weights every framework strips. Published results.
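MTP-style speculative decoding boils down to a draft-and-verify loop: the cheap MTP head proposes a few future tokens, the base model verifies them, and any mismatch falls back to the verified token. A toy sketch of that loop, assuming stand-in models (`base_model` and `mtp_head` below are illustrative, not Chimera's code):

```python
# Draft-and-verify loop at the heart of MTP speculative decoding.
def base_model(ctx):
    """Expensive 'ground truth' next-token function (toy: increment)."""
    return ctx[-1] + 1

def mtp_head(ctx, k=3):
    """Cheap draft head: proposes k tokens ahead in one shot."""
    drafts, t = [], ctx[-1]
    for _ in range(k):
        t = t + 1  # in this toy, the head happens to agree with the base model
        drafts.append(t)
    return drafts

def speculative_step(ctx, k=3):
    drafts = mtp_head(ctx, k)
    accepted = []
    for d in drafts:
        verified = base_model(ctx + accepted)
        if d == verified:
            accepted.append(d)         # draft matches: this token was free
        else:
            accepted.append(verified)  # mismatch: keep the verified token, stop
            break
    return accepted

print(speculative_step([0, 1, 2]))  # → [3, 4, 5]: all drafts accepted
```

The speedup comes from the accept branch: every matched draft is a token the base model did not have to generate sequentially, which is why stripping the `mtp.*` weights forfeits it.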
Hawkeye
Parallel multi-LLM research engine. Dispatches queries to 7 providers simultaneously — Perplexity, Tavily, DeepSeek, Gemini, and more. Synthesizes consensus in seconds.
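The fan-out-then-synthesize pattern described above can be sketched with `asyncio`: dispatch one query to every provider concurrently, then take a naive consensus. The `provider` coroutine is a stub standing in for real API clients; Hawkeye's actual dispatch and synthesis code is not shown here.

```python
# Concurrent fan-out to N providers, then a majority-vote consensus.
import asyncio
from collections import Counter

async def provider(name: str, query: str) -> str:
    """Stub for a real client (Perplexity, Tavily, DeepSeek, Gemini, ...)."""
    await asyncio.sleep(0)       # stands in for the network round-trip
    return f"answer-to:{query}"  # stub: all providers happen to agree

async def research(query: str, providers: list[str]) -> str:
    # all requests are in flight at once; total latency ~ the slowest provider
    answers = await asyncio.gather(*(provider(p, query) for p in providers))
    # naive consensus: the most common answer wins
    return Counter(answers).most_common(1)[0][0]

result = asyncio.run(research(
    "what is MTP?", ["perplexity", "tavily", "deepseek", "gemini"]))
print(result)
```

With `asyncio.gather` the wall-clock cost of 7 providers is roughly one round-trip, not seven, which is what makes sub-second synthesis plausible.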
Open Research
The ANE is the most underutilized compute substrate in the world. We benchmark it, break it, and publish the results. No paywalls, no preprints — just findings on GitHub.
MTP weights survive 4-bit quantization
Qwen3.5 MTP heads achieve 76–86% accuracy after 4-bit quantization with an fp16 sidecar. The mtp.* weights that MLX strips were never the bottleneck.
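A minimal sketch of the scheme this result refers to: group-wise 4-bit quantization where the per-group scales are kept in higher precision (the "fp16 sidecar"). This assumes a simple absmax scheme; real kernels pack two 4-bit codes per byte and store the scales as fp16.

```python
# Group-wise 4-bit absmax quantization with a higher-precision scale sidecar.
def quantize_4bit(weights, group_size=4):
    """Return (codes in 0..15, per-group scales). Scales are the 'sidecar'."""
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # map absmax to code 7
        scales.append(scale)                           # kept in higher precision
        codes.extend(round(w / scale) + 8 for w in group)  # shift into 0..15
    return codes, scales

def dequantize_4bit(codes, scales, group_size=4):
    return [(c - 8) * scales[i // group_size] for i, c in enumerate(codes)]

w = [0.9, -0.3, 0.05, -0.7, 1.2, 0.4, -1.1, 0.0]
codes, scales = quantize_4bit(w)
restored = dequantize_4bit(codes, scales)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(f"max reconstruction error: {max_err:.3f}")
```

The reconstruction error is bounded by half a quantization step per group, which is why the head's accuracy can survive 4 bits as long as the scale sidecar is preserved.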
RoPE encoding is load-bearing for MTP
Without positional encoding on the MTP head, accuracy collapses to 0%. Every framework that strips these weights also strips RoPE. Both matter.
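To see why stripping RoPE is fatal, here is a minimal rotary-embedding sketch: each (even, odd) pair of dimensions is rotated by a position-dependent angle, so the same vector looks different at each position. Without it, the head receives position-blind inputs. Plain-Python illustration, not Chimera's implementation.

```python
# Minimal rotary position embedding (RoPE) applied to one vector.
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive (even, odd) dim pairs of `vec` by angles set by `pos`."""
    dim = len(vec)
    out = list(vec)
    for i in range(0, dim, 2):
        theta = pos / (base ** (i / dim))   # lower dims rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s              # standard 2-D rotation
        out[i + 1] = x * s + y * c
    return out

v = [1.0, 0.0, 1.0, 0.0]
print(rope(v, 0))  # position 0 is the identity rotation: vector unchanged
print(rope(v, 5))  # any other position rotates the pairs
```

Rotation preserves vector norms, so RoPE injects position without changing magnitudes; dropping it collapses every position onto position 0, matching the 0% accuracy above.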
ANE runs 178 calls/sec/W vs GPU's 12
For MTP heads on Apple Silicon: ANE at ~3W delivers 15× the power efficiency of Metal GPU. The chip Apple built for ML is the right chip for ML.
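The 15× headline is just the ratio of the two quoted throughput-per-watt figures:

```python
# Both numbers taken from the text above.
ane_calls_per_sec_per_watt = 178
gpu_calls_per_sec_per_watt = 12
ratio = ane_calls_per_sec_per_watt / gpu_calls_per_sec_per_watt
print(f"{ratio:.1f}x")  # prints 14.8x, which the text rounds to 15×
```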
Fire-and-forget ANE swarms are 14% faster
127 specialist models running simultaneously with no coordination overhead outperform sequential dispatch by 14%. Discovered during Hoshi Engine benchmarking.
Compute Efficiency — ANE vs GPU vs CPU
M3 Ultra · MTP head inference · Chimera research, 2026