BRAIN STORMS #20

Closed
@solomonxie

Keep recording all sorts of ideas of mine.

© Copyright belongs to the author
Activity

changed the title [-]Understanding Chinese encodings in Python: unicode, utf-8, gbk[/-] [+]PROJECT IDEAS[/+] on Apr 21, 2018

solomonxie commented on Apr 21, 2018

Project: Build my own httpbin, or simply fork the Python version

#httpbin #network #troubleshooting

Deploy it on a server.
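A minimal sketch of what a self-made httpbin could start from, using only the Python standard library. It is not httpbin's actual code: the names `describe_request`, `EchoHandler` and `serve`, the response fields, and port 8000 are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def describe_request(method, raw_path, headers, body=b""):
    """Build an httpbin-style description of a request as a plain dict."""
    parsed = urlparse(raw_path)
    args = {k: v[0] if len(v) == 1 else v for k, v in parse_qs(parsed.query).items()}
    return {
        "method": method,
        "path": parsed.path,
        "args": args,
        "headers": dict(headers),
        "data": body.decode("utf-8", errors="replace"),
    }


class EchoHandler(BaseHTTPRequestHandler):
    """Answer every GET/POST with a JSON description of the request itself."""

    def _echo(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        info = describe_request(self.command, self.path, self.headers.items(), body)
        payload = json.dumps(info).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = _echo


def serve(host="127.0.0.1", port=8000):
    """Run the echo server until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer((host, port), EchoHandler).serve_forever()
```

Calling `serve()` on the server and pointing a client at it echoes back whatever was sent, which is the core of the troubleshooting use case.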

solomonxie commented on Apr 21, 2018

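Project: Crawl all book information on Goodreads, then crawl PDF, txt, Word and other resources from across the web

Then build a complete library of book resources and information.

#webspider #webcrawler #goodreads #searchengine

Show resource completeness as a percentage: for example, which books have a PDF available and which do not.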

solomonxie commented on Apr 21, 2018

Project: A daily crontab job on a Raspberry Pi that runs git pull on all my repos and saves them to a local USB drive.

#raspberrypi #crontab #linux #git

solomonxie commented on Apr 21, 2018

Project: Read all images in the issues and download them, then save them to another repo as a backup. Keep a mapping table.
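A sketch of the extraction and mapping-table part, assuming the issue bodies are already available as Markdown text keyed by issue number. Fetching the issues and downloading the files are left out, and the hash-based file naming is an assumption made here so the mapping stays stable.

```python
import hashlib
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches Markdown images ![alt](url) and HTML <img ... src="url"> tags.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(([^)\s]+)[^)]*\)|<img[^>]+src="([^"]+)"')


def find_image_urls(text):
    """Return image URLs in order of first appearance, without duplicates."""
    urls = []
    for md_url, html_url in IMAGE_PATTERN.findall(text):
        url = md_url or html_url
        if url not in urls:
            urls.append(url)
    return urls


def backup_name(url):
    """Derive a stable local file name: short hash of the URL plus its extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix or ".bin"
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + suffix


def build_mapping(issue_bodies):
    """Map issue number -> {original URL: backup file name}."""
    return {
        number: {url: backup_name(url) for url in find_image_urls(body)}
        for number, body in issue_bodies.items()
    }
```

The resulting dict can be dumped to JSON and committed next to the images as the mapping table.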

solomonxie commented on Apr 21, 2018

Project: Scrape my own Douban film reviews and short pieces to GitHub as a blog

#api #douban #webcrawler #webspider #python

Same approach as scraping the issues.

solomonxie commented on Apr 21, 2018

Project: Chrome extension for managing bookmarks

#书签 #bookmarks #chrome #automation #youtube

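Using this feature, the extension could dynamically query sites such as Facebook and Zhihu, or obtain their notifications in some other way, and then show the notification count on the bookmark.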



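Update:

Design an XML or JSON data structure for bookmark management, so that all bookmarks and their related information live in a single file.
Fields include title, type(folder/link), description, icon, script...

The Chrome extension periodically (every minute) fetches from specified places on the network, either through a script on a server or directly from the local machine, and updates the bookmark information: unread email count, remaining cloud storage, the date on the calendar, remaining TODO items, and so on.

Each folder or link in the bookmarks can specify its own script to achieve a different effect. Ideally the scripts support the full range of API features, the way Postman does.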
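One possible shape for that single-file structure, sketched as a Python dataclass that round-trips through JSON. The fields `title`, `type`, `description`, `icon` and `script` come from the note above; `url`, `badge` and `children` are assumptions added so that links have a target, the fetched value has somewhere to go, and folders can nest.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Bookmark:
    title: str
    type: str                     # "folder" or "link"
    url: Optional[str] = None     # assumption: links need a target URL
    description: str = ""
    icon: str = ""
    script: Optional[str] = None  # per-item script that refreshes the badge
    badge: str = ""               # assumption: text shown on the bookmark
    children: List["Bookmark"] = field(default_factory=list)


def to_json(root):
    """Serialize the whole bookmark tree into one JSON document."""
    return json.dumps(asdict(root), ensure_ascii=False, indent=2)


def from_json(text):
    """Rebuild the bookmark tree from the JSON produced by to_json."""
    def build(node):
        children = [build(child) for child in node.pop("children", [])]
        return Bookmark(children=children, **node)

    return build(json.loads(text))
```

A per-minute refresh would then only need to run each item's script and overwrite its `badge` before saving the file again.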

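Update:

  • Write a separate script that automatically reads my YouTube subscription list and syncs it into a dedicated bookmark folder.
  • Write a separate script that automatically reads all of my GitHub repos (requires permission) and syncs them into a dedicated folder.

Project: Chrome extension for saving web content that goes beyond Google Keep @oct 30 2017

Logical and thread-oriented.
It can record every article visited while searching for the answer to a question, arrange them along that thread, and convert them with one click into a downloadable offline archive (web page PDF or full-page screenshot).

Project: Chrome web favorites extension

Store the full content rather than the link. Text-based pages are first converted into a simply formatted text mode, like Safari's Reader mode, and then stored.
It can also save PDFs, images, GIFs, and so on.
Like Google Keep, the extension connects to cloud storage to provide a more complete content management system.
Audio and video are out of scope.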

solomonxie commented on Apr 21, 2018

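Project: Build a word-cloud neural network

#machinelearning #neuralnetwork #wordcloud #ai

Use the number of times two words appear in the same article as the measure of distance between them.
The more co-occurrences, the closer the two words.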
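The counting step can be sketched as below. Two things are assumptions: words are split on whitespace (Chinese text would need a real tokenizer), and the count is turned into a distance with 1 / (1 + count), which is just one simple way to make more co-occurrences mean a smaller distance.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count, for every pair of words, how many articles contain both."""
    counts = Counter()
    for text in articles:
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))  # each pair once per article
    return counts


def distance(counts, a, b):
    """Assumed mapping from count to distance: 1 / (1 + count)."""
    pair = tuple(sorted((a.lower(), b.lower())))
    return 1.0 / (1 + counts[pair])
```

Pairs that never share an article get the maximum distance of 1.0 under this mapping.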

solomonxie commented on Apr 21, 2018

Project: Download YouTube videos on a server and automatically upload them to Google Drive and Baidu Cloud

solomonxie commented on Apr 21, 2018

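Project: A Python script that monitors a set of files and folders for changes in real time, backs them up to a single folder, and periodically emails an archive

  • The Python background script runs continuously alongside the system
  • Back up commonly used configuration such as hosts, .ssh, .vim, .zsh, .bash, plus some private personal files
  • Keep a git repo inside the backup folder and commit automatically on every change, so there is a trail to follow
  • Local backup: to an SD card
  • Remote backup: to a private repo such as one on Bitbucket, or to Google Drive and Dropbox
  • The monitored files and folders, and the locations they are copied to, are defined in a separate JSON file

TODO:

  • Test driving git from Python, through a Python library or by running commands directly
  • Write pseudocode to design the code
  • Test collecting specified files into a specified location with Python
  • Test comparing file changes with Python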

solomonxie commented on Apr 21, 2018

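Project: A dedicated AI system for debugging

Given the current error description and the relevant characteristics I enter,
it infers the most likely cause and then searches for all related solutions.

Note that it must work in a sandboxed way: it is given only a small amount of information such as the error code, rather than letting the program run all sorts of checks in the background on the computer.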

solomonxie commented on Apr 21, 2018

Project: Build an iOS dictionary whose results show up directly in Spotlight

The same way 极光词典 does.

solomonxie commented on Apr 21, 2018

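Project: CLI resource search engine

  • At first, mainly show piratebay search results for a keyword
  • Add copying of magnet links and downloading torrents to the local machine.
  • Later, merge in search results from other BT resource sites.
  • Can be accessed through a proxy
  • After that, find all PDFs and other resource types from Google
  • It could also have one of my servers do the download over ssh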

160 remaining items

solomonxie commented on Apr 6, 2019

Project: Scraper Practice

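Scraper project

Aggressively crawl a large number of Baidu Pan resources to practice scraper programming.
Crawl only the relevant sites found via Google (which reduces crawling of the search engine itself), then crawl each site individually. Store the content in a local database or text files, then verify it item by item.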

#Tech #IDEAS #plan

solomonxie commented on Apr 6, 2019

Project: Chrome Plugin for Alfred

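Build a Mac desktop app in Python, similar to Alfred but with an enhanced lookup feature that directly shows the wiki content, Douban film and TV content, etc. for the search term.
It could also be implemented as a Chrome extension or a userscript.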

#IDEAS #Tech #plan

solomonxie commented on Apr 6, 2019

Project: Massive Ebooks to markdown scraper & converter

Extract tons of ebooks and convert them into Markdown-formatted text as raw data, recognizing authors, titles, and chapters. Furthermore, convert the Markdown text into mobile-friendly web pages.
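A small sketch of the chapter-recognition step on already extracted plain text. It assumes chapter headings look like "Chapter 1" or "CHAPTER IV: Title", which will miss many real books and can misfire on a few words; title and author are passed in rather than detected.

```python
import re

# Assumption: headings start with "Chapter" followed by an arabic or roman number.
CHAPTER_RE = re.compile(
    r"^\s*chapter\s+([0-9ivxlcdm]+)\b[\s.:\-]*(.*)$", re.IGNORECASE
)


def to_markdown(title, author, raw_text):
    """Turn plain ebook text into Markdown with title, author and chapter headings."""
    out = [f"# {title}", "", f"*{author}*", ""]
    for line in raw_text.splitlines():
        match = CHAPTER_RE.match(line)
        if match:
            number, name = match.group(1), match.group(2).strip()
            heading = f"## Chapter {number}" + (f": {name}" if name else "")
            out.extend(["", heading, ""])
        else:
            out.append(line.rstrip())
    return "\n".join(out).strip() + "\n"
```

The Markdown output is then a convenient intermediate format for generating the mobile-friendly pages.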

solomonxie commented on Apr 6, 2019

Project: Video editing app

A video editing app that applies filters and light adjustments to videos, something the app market currently lacks.
Maybe we can apply the same techniques used for photos to video, frame by frame.

solomonxie commented on Apr 6, 2019

Project: Computer Vision (AI)

Extract clear images from dusty window glass.
When we look outside through a dusty window, we can see clearly, but photos taken from the same spot do not come out the same way.

We can compare multiple photos taken at the same spot to recognize and exclude the dust from the photos.
Or we can take frames from a video to exclude the dust. This applies to travel videos taken inside a train or car.
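One simple way to "compare multiple photos" is a per-pixel median, sketched below on tiny grayscale frames stored as nested lists. This swaps in a standard median-stacking technique as an assumption: it only works if the frames have been aligned on the background so that each speck of dust covers a given pixel in fewer than half of them.

```python
from statistics import median


def remove_transient_specks(frames):
    """Per-pixel median over aligned grayscale frames (lists of rows of ints)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
```

With a fixed camera and fixed dust the median changes nothing, so the video case (moving train or car) is where this approach has a chance.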

solomonxie commented on Apr 6, 2019

Project: One annotation tool for all

Assist note-taking on every kind of reading material: txt, web pages, Word, PDF, and so on.

solomonxie commented on Apr 6, 2019

Project: Study English with AI

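Collect a large number of English films and TV shows and align them with their subtitles; a program finds the clips where numbers, names, or letters are read aloud and stitches them together for my own dictation practice.

Also: use AI to analyze the audio of a large number of English films and TV shows, align it with subtitles, and archive it in a specified data structure for later English study.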
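Finding the "numbers read aloud" clips could start from the subtitle files alone. The sketch below assumes SRT-format subtitles and a small hand-picked list of English number words; it returns the time ranges to cut, and leaves the actual audio or video cutting to another tool.

```python
import re

# Assumption: a short, incomplete list of spoken number words.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "twenty", "thirty", "forty", "fifty",
    "hundred", "thousand", "million",
}
CUE_RE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(text):
    """Yield (start, end, spoken_text) for each cue in an SRT subtitle file."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_RE.search(line)
            if match:
                spoken = " ".join(lines[i + 1:]).strip()
                yield match.group(1), match.group(2), spoken
                break


def cues_with_numbers(text):
    """Return the cues whose text contains digits or a known number word."""
    hits = []
    for start, end, spoken in parse_srt(text):
        words = set(re.findall(r"[a-z]+", spoken.lower()))
        if re.search(r"\d", spoken) or words & NUMBER_WORDS:
            hits.append((start, end, spoken))
    return hits
```

Words like "one" also appear as pronouns, so the hits would still need a manual or smarter filtering pass.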

solomonxie commented on Apr 6, 2019

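Homemade English dictionary

Idea Nov 6 2017

Early stage: crawl and organize existing online dictionaries at scale, including English-English and Chinese-English entries, synonyms, antonyms, derived words, different tenses, example sentences, pronunciation from American TV shows, etc.
Later stage: present it as a website, an app, a Chrome extension, a Mac dictionary extension, a command-line lookup, and other all-round translation tools.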

solomonxie commented on Feb 19, 2022

Elasticsearch is good for searching through normalized documents, but not as fast as grep for irregular documents. So it cannot be a general-purpose search engine. Its only good use case is still log search.

solomonxie commented on Feb 19, 2022

Sharding DB Indexing

With sharding, it’s difficult to know which partition holds the target records unless we scan each partition and each record.
It’s easy to make a hash map of IDs to know exactly where the row with a given ID is, but that doesn’t work if we query on more columns.
The combinations of multiple columns can be “infinite” and we cannot build infinite indexes.

BUT, I think the set of query statements is not infinite most of the time. For production programs, the SQL patterns are written in the code and aren’t changed very often.
Since the number of query patterns is stable and controllable, we can build a hash index for each query pattern.

If the DB server sees the filter “where a in (1,2,3) and b = 0”, it will look up the existing index for “a, b” to know exactly where the target records are located.
If the index does not exist, the server should just run the scan anyway and build a new index for this query pattern.
Whenever a new record is written into the DB, the server should update each existing index.
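A toy, in-memory sketch of that per-pattern index. Shards are plain dicts of row lists, the class and method names are made up, and the index only records which shard holds a match rather than the exact row position, which is a simplification of the idea above.

```python
from collections import defaultdict
from itertools import product


class PatternIndex:
    """Hash index from the values of a column tuple to the shards holding matches."""

    def __init__(self, shards):
        self.shards = shards   # shard id -> list of row dicts
        self.indexes = {}      # ("a", "b") -> {(1, 0): {shard ids}}

    def _build(self, columns):
        index = defaultdict(set)
        for shard_id, rows in self.shards.items():  # the one-off full scan
            for row in rows:
                index[tuple(row[c] for c in columns)].add(shard_id)
        self.indexes[columns] = index

    def locate(self, **filters):
        """filters maps column -> accepted values, e.g. a=[1, 2, 3], b=[0]."""
        columns = tuple(sorted(filters))
        if columns not in self.indexes:
            self._build(columns)  # unknown pattern: scan once, then reuse
        index = self.indexes[columns]
        found = set()
        for key in product(*(filters[c] for c in columns)):
            found |= index.get(key, set())
        return found

    def insert(self, shard_id, row):
        """Write a row and keep every existing pattern index up to date."""
        self.shards.setdefault(shard_id, []).append(row)
        for columns, index in self.indexes.items():
            index[tuple(row[c] for c in columns)].add(shard_id)
```

For the example filter, `locate(a=[1, 2, 3], b=[0])` expands to the keys (1, 0), (2, 0), (3, 0) and returns only the shards that need to be queried.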

So the problem is that these indexes can also take up a lot of space, but we still think they won’t grow as fast as the records themselves, and they are much easier to handle.

We need a SQL proxy that accepts any normal SQL statement and DB connection, like RDS Proxy. What it does is search the index to get the target shard locations, then rewrite the SQL and send the actual queries concurrently to each shard server, then concatenate the result records.
The proxy server helps with both sharded queries and connection pooling.
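The fan-out and concatenation step of such a proxy might look like this. `run_on_shard` is a hypothetical callable standing in for "execute this SQL on that shard's connection"; SQL rewriting and connection pooling are not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(shard_ids, run_on_shard, sql, params=()):
    """Send one statement to each target shard concurrently and concatenate the rows."""
    ordered = sorted(shard_ids)  # fixed order keeps the merged result deterministic
    if not ordered:
        return []
    with ThreadPoolExecutor(max_workers=len(ordered)) as pool:
        results = pool.map(lambda shard: run_on_shard(shard, sql, params), ordered)
        return [row for rows in results for row in rows]
```

Queries with ORDER BY, LIMIT or aggregates would need a merge step in the proxy instead of plain concatenation.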

solomonxie commented on Feb 19, 2022

1 Page Tutorials

Everything in one page

solomonxie commented on Feb 19, 2022

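A data warehouse is built on OLAP
OLAP calls for real-time feedback over massive data, so it has to use cubes
OLAP uses OLTP, i.e. a traditional database, as its data source

What we need right now is not a data warehouse serving online real-time data, but offline analysis results

1 TB of disk storage is enough to hold all the data from all current data sources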

solomonxie commented on Feb 19, 2022

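Hidden elites among the background supporting cast
These characters look ordinary but deliver steadily and never drop the ball

Tokyo Ghoul: Take Hirako (平子丈), Hideyoshi Nagachika (永近英良)
Slam Dunk: Yohei Mito (水户洋平)
Gundam: Bright Noa (林有德)
Legend of the Galactic Heroes: Dusty Attenborough (亚典波罗)
Aldnoah Zero: Inko Amifumi (网文韵子)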

solomonxie commented on Feb 19, 2022

Video Analysis

Extract all frames and get all the “main frames” as pictures. It’s very useful for Keynote presentations.
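Picking the “main frames” could be approximated by keeping a frame only when it differs enough from the last one kept. In this sketch frames are flat lists of grayscale values and the threshold of 30 is arbitrary; both are assumptions, and decoding the video into frames would need a separate tool.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def pick_main_frames(frames, threshold=30.0):
    """Return indexes of frames that differ enough from the last frame kept."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[kept[-1]], frames[i]) >= threshold:
            kept.append(i)
    return kept
```

For a recorded slide talk this tends to keep roughly one frame per slide, which matches the Keynote use case.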

solomonxie commented on Feb 19, 2022

Python TUI postman

Build with either Urwid or Rich library

          BRAIN STORMS · Issue #20 · solomonxie/blog-in-the-issues