Title changed from "Understanding Chinese encodings in Python: unicode, utf-8, gbk" to "PROJECT IDEAS"
solomonxie commented on Apr 21, 2018
Project: Build my own httpbin, or simply fork the Python version
#httpbin #network #troubleshooting
Deploy it on a server
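solomonxie commented on Apr 21, 2018
Project: Crawl all book information from Goodreads, then crawl the whole web for PDF, txt, Word, and other copies of each book
Then build a complete library of book resources and metadata.
#webspider #webcrawler #goodreads #searchengine
Show a resource-completeness percentage: for example, which book entries have a PDF available and which do not.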
solomonxie commented on Apr 21, 2018
Project: On a Raspberry Pi, use crontab to git pull all repos every day and save them to a local USB drive.
#raspberrypi #crontab #linux #git
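A minimal sketch of the script that cron could run, assuming every repo has already been cloned as a direct subfolder of one directory on the USB drive. The function names, the script name `pull_all.py`, and the paths in the crontab line are all hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_repos(root):
    """Return every direct subdirectory of root that contains a .git folder."""
    return sorted(p for p in Path(root).iterdir() if (p / ".git").is_dir())


def pull_all(root):
    """Run `git pull --ff-only` in each repo; return {repo_name: succeeded}."""
    results = {}
    for repo in find_repos(root):
        proc = subprocess.run(
            ["git", "-C", str(repo), "pull", "--ff-only"],
            capture_output=True,
            text=True,
        )
        results[repo.name] = proc.returncode == 0
    return results


# Hypothetical crontab entry (every day at 03:00):
#   0 3 * * * /usr/bin/python3 /home/pi/pull_all.py /media/usb/repos
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, ok in pull_all(sys.argv[1]).items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

`--ff-only` is a choice made for this sketch so that an unattended pull never creates merge commits in the backup copy.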
solomonxie commented on Apr 21, 2018
Project: Read all images in the issues and download them, then save them to another repo as a backup. Keep a mapping table.
solomonxie commented on Apr 21, 2018
Project: Scrape my own Douban movie reviews and short pieces to GitHub as a blog
#api #douban #webcrawler #webspider #python
Same approach as scraping the issues
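solomonxie commented on Apr 21, 2018
Project: Chrome extension for managing bookmarks
#bookmarks #chrome #automation #youtube
Using this feature, the extension could dynamically query sites such as Facebook and Zhihu, or use some other means, to fetch their notifications and then show the notification count on the bookmark.
For example:

Update:
Design an XML or JSON data structure for bookmark management, so that all bookmarks and their related information live in a single file.
Fields include:
title, type(folder/link), description, icon, script...
The Chrome extension periodically (every minute) visits designated network locations, either through a server-side script or directly from the local machine, and updates the bookmark information: unread email count, remaining cloud-drive capacity, the date on the calendar, remaining items on a TODO list, and so on.
Each folder or link in the bookmarks can specify its own script to achieve a different effect. Ideally the scripts support the full range of API features that Postman does.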
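A rough sketch of what the single-file JSON structure could look like, written in Python. The fields `title`, `type`, `description`, `icon`, and `script` come from the list above; `url`, `badge`, and `children` are assumptions added here so that links have a target, scripts have somewhere to put their result, and folders can nest.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Bookmark:
    title: str
    type: str                      # "folder" or "link"
    description: str = ""
    icon: str = ""
    script: Optional[str] = None   # script that refreshes this entry
    url: Optional[str] = None      # assumption: target of a link
    badge: Optional[str] = None    # assumption: last value the script produced
    children: List["Bookmark"] = field(default_factory=list)  # assumption


def to_json(root):
    """Serialize the whole bookmark tree into one JSON document."""
    return json.dumps(asdict(root), ensure_ascii=False, indent=2)


def from_dict(data):
    """Rebuild a Bookmark tree from the parsed JSON document."""
    children = [from_dict(child) for child in data.get("children", [])]
    return Bookmark(**{**data, "children": children})
```

A tree built this way round-trips through `to_json` and `json.loads` plus `from_dict`, so the extension would only ever need to read and write one file.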
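Update:
Project: Chrome extension, a web content saver that goes beyond Google Keep @oct 30 2017
A Chrome extension for saving web content that goes beyond Google Keep
Organized logically and by thread
It can record all the related articles read while searching for the answer to a question, arrange them by thread, and convert them with one click into a downloadable offline archive (web page PDF or full-page screenshot)
Project: Chrome web page bookmarking extension
Instead of saving links, it stores the full content. Text-based pages are first converted to a simple text layout, like Safari Reader mode, and then stored.
It can also save PDFs, images, GIFs, and so on.
Like Google Keep, the extension connects to cloud storage to provide a more complete content management system.
Audio and video are out of scope.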
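solomonxie commented on Apr 21, 2018
Project: Build a word-cloud neural network
#machinelearning #neuralnetwork #wordcloud #ai
Use the number of times two words appear in the same article as the measure of distance between them.
The higher the count, the closer the two words.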
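The counting step can be sketched as follows. This assumes each article is already tokenized into a list of words and that a pair is counted once per article; the formula `1 / (1 + count)` is only one possible way to turn "more co-occurrences" into "smaller distance" and is not part of the original idea.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles):
    """Count, for every pair of words, how many articles contain both."""
    counts = Counter()
    for words in articles:
        # sorted(set(...)) gives each unordered pair one canonical key
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def distance(counts, a, b):
    """Assumed mapping from count to distance: more shared articles, closer."""
    shared = counts[tuple(sorted((a, b)))]
    return 1.0 / (1 + shared)
```

Words that never share an article get the maximum distance of 1.0 under this formula.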
solomonxie commented on Apr 21, 2018
Project: Download YouTube videos on a server and automatically upload them to Google Drive and Baidu Cloud
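solomonxie commented on Apr 21, 2018
Project: A Python script that monitors a set of files and folders for changes in real time, backs them up into a single folder, and periodically emails an archive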
TODO:
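One possible starting point for the monitoring and backup part, using only the standard library. It polls modification times rather than reacting in real time, and it leaves out the email step entirely; the function names are made up for this sketch.

```python
import shutil
from pathlib import Path


def snapshot(paths):
    """Map every file under the watched paths to its last-modified time."""
    state = {}
    for p in map(Path, paths):
        files = [p] if p.is_file() else [f for f in p.rglob("*") if f.is_file()]
        for f in files:
            state[f] = f.stat().st_mtime_ns
    return state


def changed_files(old, new):
    """Files that are new or whose modification time differs from last time."""
    return sorted(f for f, mtime in new.items() if old.get(f) != mtime)


def backup(files, backup_dir):
    """Copy the files into one flat folder (same-named files overwrite)."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, backup_dir / f.name)
```

A loop would take a snapshot, sleep for a while, take another, and pass `changed_files(old, new)` to `backup`.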
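solomonxie commented on Apr 21, 2018
Project: A dedicated AI debugging system
Based on the current error description and the related characteristics I enter,
it infers the most likely cause and then searches for all relevant solutions.
Note that it needs to be sandboxed: it is given only a little information such as the error code, rather than letting a program run all kinds of checks in the background of the computer.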
solomonxie commented on Apr 21, 2018
Project: Build an iOS dictionary whose results show up directly in Spotlight
Same as 极光词典
solomonxie commented on Apr 21, 2018
Project: A CLI (command-line) resource search engine
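solomonxie commented on Apr 6, 2019
Project: Scraper Practice
Web crawler project
Aggressively crawl a large number of Baidu Pan resources to practice crawler programming.
Crawl only the relevant sites found via Google (which reduces crawling of the search engine itself), then crawl each site individually. Store the content in a local database or text files, then verify the entries one by one.
#Tech #IDEAS #plan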
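solomonxie commented on Apr 6, 2019
Project: Chrome Plugin for Alfred
Build a Mac desktop app in Python, similar to Alfred but with an enhanced lookup feature that directly shows the search term's wiki content, Douban film and TV content, and so on.
It could also be implemented as a Chrome extension or a userscript.
#IDEAS #Tech #plan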
solomonxie commented on Apr 6, 2019
Project: Massive Ebooks to Markdown scraper & converter
Extract large numbers of ebooks and convert them into Markdown-formatted text as raw data, recognizing authors, titles, and chapters. Furthermore, convert the Markdown text into mobile-friendly web pages.
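A small sketch of the chapter-recognition step only, assuming the ebook has already been extracted to plain text and that chapters start with a line like "Chapter 3". The regular expression is an assumption that real books will often break, and the title and author are passed in here rather than recognized.

```python
import re

# Assumption: a chapter heading is a line starting with "Chapter <number>".
CHAPTER_RE = re.compile(r"^chapter\s+\d+.*$", re.IGNORECASE)


def to_markdown(title, author, text):
    """Turn plain ebook text into Markdown with one heading per chapter."""
    lines = [f"# {title}", "", f"*{author}*", ""]
    for line in text.splitlines():
        stripped = line.strip()
        if CHAPTER_RE.match(stripped):
            lines += [f"## {stripped}", ""]
        elif stripped:
            lines += [stripped, ""]
    return "\n".join(lines).rstrip() + "\n"
```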
solomonxie commented on Apr 6, 2019
Project: Video editing app
A video editing app that applies filters and light adjustments to videos, something currently in short supply in the app market.
Maybe we can take the same processing used for photos and apply it to video frame by frame.
solomonxie commented on Apr 6, 2019
Project: Computer Vision (AI)
Extract clear images from photos taken through dusty window glass.
When we look outside through a dusty window, we can see the outside clearly, but photos taken from the same spot do not come out the same way.
We can compare multiple photos taken at the same spot to recognize the dust and exclude it from the photos.
Or we can take screenshots from a video to exclude the dust. This applies to travel videos taken inside a train or car.
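For the train or car case, one way the dust detection might start is sketched below. It assumes the scenery changes between frames while dust stays at the same pixels, so pixels whose brightness barely varies over time are dust candidates. Frames are plain nested lists of grayscale values here, the threshold is arbitrary, and static scenery such as a clear sky would be flagged too.

```python
from statistics import pvariance


def static_pixel_mask(frames, threshold=4.0):
    """Mark pixels whose brightness variance across frames is below threshold.

    frames: list of equally sized 2D lists of grayscale values (0-255).
    Returns a 2D list of booleans; True means "probably dust" under the
    assumption that only dust stays constant while the scene moves.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pvariance([f[y][x] for f in frames]) < threshold for x in range(width)]
        for y in range(height)
    ]
```

The masked pixels could then be filled in from neighbouring pixels or from other frames, which this sketch does not attempt.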
solomonxie commented on Apr 6, 2019
Project: One annotation tool for all
A note-taking aid for every kind of reading material: txt, web pages, Word, PDF, and so on.
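solomonxie commented on Apr 6, 2019
Project: Study English with AI
Collect a large number of English films and TV shows and align them with their subtitles; have a program find the clips where numbers, names, or letters are read aloud, and combine them for my own dictation practice.
Also: use AI to analyze the audio of a large number of English films and TV shows, align it with the subtitles, and archive it in a defined data structure for later English study.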
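solomonxie commented on Apr 6, 2019
Homemade English dictionary
Idea Nov 6 2017
Early stage: scrape and organize existing online dictionaries at scale, including English-English and Chinese-English entries, synonyms, antonyms, derived words, tense forms, example sentences, pronunciations from American TV shows, and so on.
Later stage: present it as a website, an app, a Chrome extension, a Mac dictionary extension, a command-line lookup, and other channels for all-round translation.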
solomonxie commented on Feb 19, 2022
Elasticsearch is good for searching through normalized documents, but not as fast as grep for irregular documents. So it cannot be a general-purpose search engine. Its only good use case is still log search.
solomonxie commented on Feb 19, 2022
Sharding DB Indexing
With sharding, it is difficult to know which partition holds the target records unless we scan every partition and every record.
It is easy to build a hash map keyed by ID that tells us exactly where the row with a given ID lives, but that does not work when we query on other columns.
The combinations of multiple columns can be "infinite", and we cannot build an infinite number of indexes.
BUT, I think the set of query statements is not infinite most of the time. In production programs, the SQL patterns are written in the code and do not change very often.
Since the number of query patterns is stable and controllable, we can build a hash index for each query pattern.
If the DB server sees the filter "where a in (1,2,3) and b = 0", it looks up the existing index for "a, b" to find exactly where the target records are located.
If the index does not exist, the server should just run the scan anyway and build a new index for this query pattern.
Whenever a new record is written to the DB, the server should update every existing index.
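The lookup, build-on-miss, and update-on-write steps above can be sketched in memory like this. Shards are plain dicts of row lists, a query pattern is a tuple of column names, and the class and method names are made up for illustration; a real server would persist the indexes and parse the filter out of the SQL.

```python
from collections import defaultdict


class PatternIndex:
    """One hash index per query pattern, mapping column values to shard ids."""

    def __init__(self, shards):
        self.shards = shards   # shard_id -> list of row dicts
        self.indexes = {}      # ("a", "b") -> {(a_value, b_value): {shard_id}}

    def _build(self, columns):
        # Index miss: scan every shard once and remember where each value lives.
        index = defaultdict(set)
        for shard_id, rows in self.shards.items():
            for row in rows:
                index[tuple(row[c] for c in columns)].add(shard_id)
        self.indexes[columns] = index

    def locate(self, columns, value_tuples):
        """Return the shards that can contain rows matching any value tuple."""
        columns = tuple(columns)
        if columns not in self.indexes:
            self._build(columns)
        index = self.indexes[columns]
        found = set()
        for values in value_tuples:
            found |= index.get(tuple(values), set())
        return found

    def insert(self, shard_id, row):
        """Write a row and update every index that already exists."""
        self.shards.setdefault(shard_id, []).append(row)
        for columns, index in self.indexes.items():
            index[tuple(row[c] for c in columns)].add(shard_id)
```

In this sketch the filter "where a in (1,2,3) and b = 0" becomes `locate(("a", "b"), [(1, 0), (2, 0), (3, 0)])`.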
The problem is that these indexes can also take up a lot of space, but we still think they will not grow as fast as the records themselves, and they are much easier to handle.
We need a SQL proxy that accepts any normal SQL statement and DB connection, like RDS Proxy. It searches the index to get the target shard locations, then rewrites the SQL and sends the actual queries concurrently to each shard server. Then it concatenates the result records.
The proxy server helps with both sharded queries and connection pooling.
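The concurrent fan-out and concatenation in the proxy might look roughly like this. `run_on_shard` is a hypothetical callable standing in for "execute this statement on that shard's connection", and SQL rewriting and connection pooling are not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(sql, params, shard_ids, run_on_shard):
    """Send one statement to each target shard concurrently, then concatenate.

    run_on_shard(shard_id, sql, params) is a hypothetical function that
    returns the list of rows from one shard.
    """
    shard_ids = sorted(shard_ids)
    if not shard_ids:
        return []
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        # pool.map keeps results in the same order as shard_ids
        per_shard = pool.map(lambda s: run_on_shard(s, sql, params), shard_ids)
        return [row for rows in per_shard for row in rows]
```

Only the shards returned by the index lookup are contacted, which is where the saving over a full scatter comes from.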
solomonxie commented on Feb 19, 2022
1 Page Tutorials
Everything on one page
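solomonxie commented on Feb 19, 2022
A data warehouse is based on OLAP
OLAP demands real-time feedback on massive data, so it must use cubes
OLAP uses OLTP, i.e. traditional databases, as its data source
What we need right now is not a data warehouse that serves online real-time data, but offline analysis results
1 TB of disk storage is enough to hold all the data from all current data sources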
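solomonxie commented on Feb 19, 2022
Hidden elites among minor supporting characters
These characters look ordinary but deliver consistently and never drop the ball
Tokyo Ghoul: Take Hirako (平子丈), Hideyoshi Nagachika (永近英良)
Slam Dunk: Yohei Mito (水户洋平)
Gundam: Bright Noa (林有德)
Legend of the Galactic Heroes: Dusty Attenborough (亚典波罗)
Aldnoah.Zero: Inko Amifumi (网文韵子)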
solomonxie commented on Feb 19, 2022
Video Analysis
Extract all frames and keep the "main frames" as pictures. This is very useful for Keynote presentations.
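A sketch of one way to pick the "main frames", assuming that means the frames where the picture changes noticeably from the last one kept. Frames are nested lists of grayscale values that some decoder has already produced; decoding the video itself, and the threshold value, are outside this sketch.

```python
def mean_abs_diff(a, b):
    """Average absolute brightness difference between two same-sized frames."""
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / (len(a) * len(a[0]))


def main_frames(frames, threshold=30.0):
    """Return the indices of frames that differ enough from the last kept one."""
    kept = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(i)
    return kept
```

The first frame is always kept, and each later frame is compared against the most recently kept frame rather than its immediate predecessor, so slow drifts still eventually produce a new picture.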
solomonxie commented on Feb 19, 2022
Python TUI Postman
Build it with either the Urwid or the Rich library