Channel: Data Science Archive
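pandas bokeh: someone has already built the wheel I was planning to build six months ago, though there is no shortage of wheels like this one...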
link: https://github.com/PatrikHlobil/Pandas-Bokeh
GitHub
GitHub - PatrikHlobil/Pandas-Bokeh: Bokeh Plotting Backend for Pandas and GeoPandas
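A fairly good applied introduction to FM, covering typical applications such as recommendation and search; a good way to get to know FFM and FM. https://www.m3tech.blog/entry/2019/01/02/090000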
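M3 Tech Blog (エムスリーテックブログ)
Implementation and Numerical Verification of Factorization Machines - M3 Tech Blog
Introduction. Happy New Year. I am Nishiba from Engineering Group (@m_nishiba). I work on natural language processing and recommender systems in the AI and machine learning team. After reading "Latest Trends in Deep Factorization Machines (2018)" on Gunosy's data analysis blog, Factorization Machin…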
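For readers who want the core idea of FM before opening the article, here is a minimal sketch of the standard second-order Factorization Machine score, y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. It is not taken from the linked post; the function names, toy weights, and dense-list representation are assumptions chosen for illustration. The fast version uses the well-known O(n*k) rewrite of the pairwise term and is checked against the explicit double loop.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score for one dense feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n x k list of latent factor rows, one row per feature
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term via the O(n*k) identity:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score with the explicit O(n^2 * k) double loop, for checking."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score


# Toy parameters (random, untrained) just to exercise both code paths.
rng = random.Random(0)
n_features, n_factors = 6, 3
x = [1.0, 0.0, 1.0, 0.0, 0.5, 2.0]
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(n_factors)] for _ in range(n_features)]

fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
print(abs(fast - slow) < 1e-9)
```

FFM differs in that each feature keeps a separate latent vector per field of the other feature, so the O(n*k) rewrite above does not carry over directly.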
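A highly parallel Rust implementation of Parabel. https://github.com/tomtung/parabel-rs
About Parabel: https://dl.acm.org/citation.cfm?doid=3178876.3185998
It looks well suited to large-scale classification problems and the performance is outstanding; saving it to study later.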
GitHub
GitHub - tomtung/omikuji: An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification
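Some of the more important datasets of 2018. I have used SQuAD2.0, CoQA, HotpotQA, and TencentAI ML myself, and all of them are of fairly high quality.
https://medium.com/syncedreview/2018-in-review-10-open-sourced-ai-datasets-696b3b49801f
I also recommend the Chinese embedding that Tencent AI released a while ago: https://ai.tencent.com/ailab/nlp/embedding.html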
Medium
2018 In Review: 10 Open-Sourced AI Datasets
In a boon to AI researchers, the last year witnessed an unprecedented open-sourcing of large datasets by popular AI research projects.
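A nice wheel from Uber AI. I played with it for a day; it is very well suited to running demos and validation, and many state-of-the-art solutions can be validated with it first. https://uber.github.io/ludwig/
Blog introduction: https://eng.uber.com/introducing-ludwig/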
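DVC: a tool for managing data science models. Roughly speaking, it works by combining git with storage such as S3. It is quite useful for multi-person teams and for teams spanning several business units. I used it once while working on a Kaggle competition with teammates and the experience was good. https://github.com/iterative/dvc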
GitHub
GitHub - iterative/dvc: 🦉 Data Versioning and ML Experiments
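The "git plus S3" principle mentioned above can be sketched as content-addressed storage: large files go into a cache keyed by their hash, and only a small pointer file is committed to git. The code below is an illustration of that idea only, not DVC's implementation; the `.ptr.json` pointer format, the function names, and the use of SHA-256 are assumptions made for this sketch, and the local `cache` directory stands in for remote storage.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_hash(path):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path, cache_dir):
    """Copy data into a content-addressed cache and write a small pointer file.

    The pointer file is what would be committed to git; the cache directory
    is what would be synced to remote storage such as S3.
    """
    data_path, cache_dir = Path(data_path), Path(cache_dir)
    digest = file_hash(data_path)
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)
    # Hypothetical pointer format, invented for this sketch.
    pointer = data_path.parent / (data_path.name + ".ptr.json")
    pointer.write_text(json.dumps({"sha256": digest, "path": data_path.name}))
    return pointer


def restore(pointer_path, cache_dir):
    """Recreate the data file next to its pointer from the cache."""
    pointer_path = Path(pointer_path)
    meta = json.loads(pointer_path.read_text())
    digest = meta["sha256"]
    target = pointer_path.parent / meta["path"]
    shutil.copy2(Path(cache_dir) / digest[:2] / digest[2:], target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    workspace, cache = Path(tmp) / "repo", Path(tmp) / "cache"
    workspace.mkdir()
    data = workspace / "train.csv"
    data.write_text("id,label\n1,0\n2,1\n")
    pointer = track(data, cache)
    data.unlink()  # simulate a fresh git checkout that has only the pointer
    restored = restore(pointer, cache)
    roundtrip_ok = restored.read_text() == "id,label\n1,0\n2,1\n"
    print(roundtrip_ok)
```

Because the pointer changes whenever the data changes, ordinary git history is enough to tell teammates which version of a dataset or model a given commit expects.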
FAIR's ELF has released a new version of ELF Go, and more Go bots will probably follow. https://facebook.ai/developers/tools/elf
ELF OpenGo: https://research.fb.com/facebook-open-sources-elf-opengo/
LeCun's FB post: https://www.facebook.com/yann.lecun/posts/10155789997817143
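Tried out JAX this morning. I had been following it for a while, and yesterday I saw Francois mention it again. In short, it is NumPy plus gradients, with GPU acceleration powered by XLA (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/overview.md). It may be a good choice if you want to implement some low-level frameworks. https://github.com/google/jax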
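To make "numeric code plus gradients" concrete without requiring JAX to be installed, here is a small pure-Python sketch of automatic differentiation using forward-mode dual numbers. It does not use JAX and is not how JAX is implemented: the `Dual` class and the local `grad` helper are illustrative names, and `jax.grad` itself uses reverse mode and works on arrays. The point is only that the function `f` below contains no derivative logic, yet its derivative falls out of running it.

```python
import math


class Dual:
    """A number a + b*eps with eps**2 == 0; b carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Wrap plain numbers as constants (derivative 0)."""
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """sin that works on both plain floats and Dual numbers."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)
    return math.sin(x)


def grad(f):
    """Return a function computing df/dx for a scalar function of one argument."""
    def df(x):
        return f(Dual(float(x), 1.0)).deriv
    return df


def f(x):
    return 3 * x * x + sin(x)  # ordinary numeric code, no derivative logic


df = grad(f)
print(df(2.0))  # analytically this is 6*x + cos(x) evaluated at x = 2
```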
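First there was StanfordNLP, and now I have come across https://github.com/zalandoresearch/flair, though by now I am somewhat immune to wheels like this. After reading some of the source, I think the project's code is quite well written. Friends who build their own wheels may want to take a look; you have to read a lot to build well.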
GitHub
GitHub - flairNLP/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
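ignite, a high-level PyTorch API from FAIR. I played with it last night and it is very pleasant to use; the relationship feels a bit like the one between Keras and TF. https://github.com/pytorch/ignite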
GitHub
GitHub - pytorch/ignite: High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
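Foundations of Data Science, material from MSR India; the author is the Data Science lead at MSR India. From a quick look, the quality of the book is very high. https://www.cs.cornell.edu/jeh/book.pdf
A collection of generative models in TF2 + Keras; everything is available on Colab. https://github.com/timsainb/tensorflow2-generative-models/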
GitHub
GitHub - timsainb/tensorflow2-generative-models: Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq…
Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq, VAEGAN, GAIA, Spectrogram Inversion. Everything is self contained in a jupyter notebook for easy export to colab...
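A tutorial on Sequence-Aware Recommender Systems. In my earlier experiments I also found that session-based RNNs work remarkably well for recommendation, especially in scenarios with clearly sequential sessions, such as YouTube series and short-video feeds. https://github.com/mquad/sars_tutorial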
GitHub
GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM RecSys 2018
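As a rough illustration of what "sequence-aware" means in the note above, here is a first-order Markov chain next-item recommender. It is deliberately not an RNN and is not taken from the tutorial repository; it is the kind of simple baseline a session-based model is usually compared against. The function names and the toy session data are made up for this sketch.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count how often each item is immediately followed by each other item."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_item, next_item in zip(session, session[1:]):
            transitions[current_item][next_item] += 1
    return transitions


def recommend_next(transitions, session, k=2):
    """Rank candidates by transition count from the last item in the session."""
    if not session or session[-1] not in transitions:
        return []
    return [item for item, _ in transitions[session[-1]].most_common(k)]


# Toy sessions: episode IDs watched in order (made-up data).
sessions = [
    ["ep1", "ep2", "ep3"],
    ["ep1", "ep2", "ep4"],
    ["ep2", "ep3", "ep4"],
    ["ep1", "ep2", "ep3", "ep4"],
]
model = fit_transitions(sessions)
print(recommend_next(model, ["ep1", "ep2"]))  # ['ep3', 'ep4']
```

This baseline only looks at the last item; a session-based RNN conditions on the whole sequence so far, which is what makes it attractive for series and short-video feeds.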
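BAMBI is a high-level Python API built on top of PyMC3. If you often use Bayesian statistical models, it is worth a try. I have only used PyMC3 so far and plan to try BAMBI; hopefully it works well. https://github.com/bambinos/bambi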
GitHub
GitHub - bambinos/bambi: BAyesian Model-Building Interface (Bambi) in Python.
Catalyst 19.06rc2 has removed all of its TensorFlow dependencies and now uses PyTorch exclusively. I have not tried the new version yet, but dropping tf is good news.
link: https://catalyst-team.github.io/catalyst/index.html
Sergey's introduction: https://docs.google.com/presentation/d/1NQGWb53Kqm-f3hZ2JIoHjX-he3C39eOcSszZzp5o07U/edit#slide=id.p
Google Docs
Catalyst.RL
Catalyst.RL tl;dr Sergey Kolesnikov
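How to manage ML experiment results and models is a perennial question. The tools summarized in this reddit thread are pretty good, and many of the comments under it are worth reading too.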
https://old.reddit.com/r/MachineLearning/comments/bx0apm/d_how_do_you_manage_your_machine_learning/
reddit
r/MachineLearning - [D] How do you manage your machine learning experiments?
184 votes and 68 comments so far on Reddit
Forwarded from AirOnG
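https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust Getting started with data structures and algorithms in Rust. Data structures and algorithms are fundamentals that every programming language has to deal with. Because of Rust's distinctive ownership rules, implementing them takes some technique and also gives a better feel for what makes the language unique. This repo holds all the example code from the book; it can be used to get started or to look up how a specific algorithm is written.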
GitHub
GitHub - PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust: Hands-On Data Structures and Algorithms with Rust, published by Packt
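I have recently been looking at the corpus serialization code in some NLP projects: http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization
The article is a bit old, but the experiments section is still worth a look.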
Matthewrocklin
Efficiently Store Pandas DataFrames
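For a quick feel of the trade-offs when serializing a corpus, here is a small standard-library sketch that round-trips a toy tokenized corpus through pickle, JSON, and CSV and reports the encoded size. The linked article is about storing Pandas DataFrames; this sketch does not reproduce its experiments, and the synthetic corpus and helper names are assumptions for illustration. Sizes and speed will differ on real data.

```python
import csv
import io
import json
import pickle

# Synthetic tokenized corpus (made-up data): one list of tokens per sentence.
corpus = [
    [f"tok{(i * 7 + j) % 50}" for j in range(5 + i % 4)]
    for i in range(1000)
]


def to_pickle(data):
    return pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)


def to_json(data):
    return json.dumps(data).encode("utf-8")


def to_csv(data):
    buffer = io.StringIO()
    csv.writer(buffer).writerows(data)  # one sentence per row, one token per cell
    return buffer.getvalue().encode("utf-8")


def from_csv(blob):
    return [row for row in csv.reader(io.StringIO(blob.decode("utf-8")))]


encoded = {
    "pickle": to_pickle(corpus),
    "json": to_json(corpus),
    "csv": to_csv(corpus),
}
decoded = {
    "pickle": pickle.loads(encoded["pickle"]),
    "json": json.loads(encoded["json"].decode("utf-8")),
    "csv": from_csv(encoded["csv"]),
}

for name, blob in encoded.items():
    print(name, len(blob), "bytes", "round-trip ok:", decoded[name] == corpus)
```

Note that CSV only round-trips here because every cell is a string; numeric or nested data would come back as text and need explicit parsing.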