Data Science Archive (Telegram)

Pandas-Bokeh: someone beat me to a wheel I had planned to build half a year ago, though there is no shortage of wheels like this one.
link: https://github.com/PatrikHlobil/Pandas-Bokeh

GitHub - PatrikHlobil/Pandas-Bokeh: Bokeh Plotting Backend for Pandas and GeoPandas

1.8K views, 小熊猫, 17:40

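A fairly good introduction to applying FM (Factorization Machines), covering typical use cases such as recommendation and search; a good way to get to know FFM and FM. https://www.m3tech.blog/entry/2019/01/02/090000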

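M3 Tech Blog

Implementation and Numerical Verification of Factorization Machines - M3 Tech Blog

Introduction: Happy New Year. I am Nishiba (@m_nishiba) from the Engineering Group. I work on natural language processing and recommender systems in the AI and machine learning team. After reading "Latest Trends in Deep Factorization Machines (2018)" on Gunosy's data analysis blog, Factorization Machin…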
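
For readers who have not seen FM before, here is a minimal sketch of the standard second-order FM score, y = w0 + Σ w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j, in plain Python. It is not taken from the linked post; the names `w0`, `w`, `V` and the toy input are my own assumptions. It compares the explicit pairwise sum with the usual linear-time rewrite of the interaction term.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score using the explicit pairwise sum, O(k * n^2)."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_predict_fast(x, w0, w, V):
    """Same score in O(k * n), using the identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2].
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Toy setup (hypothetical values): 6 features, 3 latent factors.
rng = random.Random(0)
n_features, k = 6, 3
x = [1.0, 0.0, 0.0, 1.0, 0.5, 0.0]
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_features)]

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
print(abs(naive - fast) < 1e-9)
```

The two functions should agree up to floating-point error. The linear-time form is what makes FM practical on sparse, high-dimensional inputs.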

1.6K views, 小熊猫, 19:50

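A highly parallel Rust implementation of Parabel. https://github.com/tomtung/parabel-rs
About Parabel: https://dl.acm.org/citation.cfm?doid=3178876.3185998
It looks well suited to large-scale classification problems and performs exceptionally well; I'll leave it for later study.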

GitHub

GitHub - tomtung/omikuji: An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification

1.7K views, 小熊猫, edited 19:53

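Several of the more important datasets of 2018. I have used SQuAD2.0, CoQA, HotpotQA, and Tencent AI ML myself, and their quality is fairly high.
https://medium.com/syncedreview/2018-in-review-10-open-sourced-ai-datasets-696b3b49801f
I also recommend the Chinese embeddings that Tencent AI released a while ago: https://ai.tencent.com/ailab/nlp/embedding.html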

Medium

2018 In Review: 10 Open-Sourced AI Datasets

In a boon to AI researchers, the last year witnessed an unprecedented open-sourcing of large datasets by popular AI research projects.

1.9K views, 小熊猫, edited 19:57

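A nice wheel from Uber AI. After playing with it for a day, I find it very well suited to running demos and validation; many state-of-the-art solutions can be validated with it first. https://uber.github.io/ludwig/
Blog introduction: https://eng.uber.com/introducing-ludwig/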

1.7K views, 小熊猫, edited 08:59

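DVC: a tool for managing data science models. Roughly, it works by combining git with storage such as S3. It is quite useful for multi-person teams and for teams spanning several business lines; I used it once while working on a Kaggle competition with teammates, and the experience was good. https://github.com/iterative/dvc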

GitHub

GitHub - iterative/dvc: 🦉 Data Versioning and ML Experiments
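
To make the "git plus S3-like storage" idea concrete, here is a rough sketch in plain Python. It is only an illustration of the general pattern (a small pointer file that could live in git, with the large file stored elsewhere under its content hash), not DVC's actual implementation or file format. The function names `track` and `restore`, the `.ptr` suffix, and the local `remote_cache` directory standing in for S3 are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def track(data_path: Path, cache_dir: Path) -> Path:
    """Store the file in a content-addressed cache and write a small pointer file."""
    payload = data_path.read_bytes()
    digest = hashlib.md5(payload).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        cached.write_bytes(payload)
    # The pointer is tiny, so it is the part you would commit to git.
    pointer = data_path.with_name(data_path.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_path.name, "md5": digest}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    target.write_bytes((cache_dir / meta["md5"]).read_bytes())
    return target


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "project"        # hypothetical git working tree
    remote = Path(tmp) / "remote_cache"  # local stand-in for S3
    work.mkdir()
    data = work / "train.csv"
    original = "id,label\n1,0\n2,1\n"
    data.write_text(original)

    ptr = track(data, remote)
    data.unlink()  # simulate a fresh checkout that has only the pointer
    restored = restore(ptr, remote)
    roundtrip_ok = restored.read_text() == original
    print(roundtrip_ok)
```

The point of the split is that teammates share the small pointers through git and fetch the heavy files from shared storage only when needed.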

1.7K views, 小熊猫, edited 09:03

FAIR's ELF has released a new version of ELF Go, and more Go bots will probably follow. https://facebook.ai/developers/tools/elf
ELF OpenGo: https://research.fb.com/facebook-open-sources-elf-opengo/
LeCun's FB post: https://www.facebook.com/yann.lecun/posts/10155789997817143

1.8K views, 小熊猫, edited 03:11

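Tried out JAX this morning. I had been following it for a while, and yesterday I saw Francois mention it again. In short, it is NumPy + gradients, with GPU acceleration powered by XLA (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/overview.md). It may be a good choice if you want to implement low-level framework pieces. https://github.com/google/jax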
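
A small sketch of what "NumPy + gradients" looks like in practice, assuming `jax` is installed. The linear model, toy data, and learning rate are my own made-up example. It uses `jax.grad` for the gradient and `jax.jit` for XLA compilation, and checks the result against a hand-derived gradient.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model, written with NumPy-style ops."""
    return jnp.mean((x @ w - y) ** 2)


# jax.grad differentiates with respect to the first argument (w);
# jax.jit compiles the resulting function through XLA.
grad_loss = jax.jit(jax.grad(loss))


def analytic_grad(w, x, y):
    """Hand-derived gradient, 2/n * X^T (Xw - y), for comparison."""
    return 2.0 / x.shape[0] * x.T @ (x @ w - y)


x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = x @ jnp.array([0.5, -1.0])  # toy targets generated from known weights
w = jnp.zeros(2)

loss_before = loss(w, x, y)
for _ in range(100):
    w = w - 0.01 * grad_loss(w, x, y)  # plain gradient descent
loss_after = loss(w, x, y)

w0 = jnp.zeros(2)
print(jnp.allclose(grad_loss(w0, x, y), analytic_grad(w0, x, y), atol=1e-4))
print(float(loss_before), float(loss_after))
```

The same code runs on CPU or GPU without changes, depending on which backend JAX finds.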

2.0K views, 小熊猫, edited 03:19

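First there was StanfordNLP, and now I have come across https://github.com/zalandoresearch/flair, though by now I am a bit immune to wheels like this. Having read some of the source, I think the project's code is quite well written. Friends who build their own wheels may want to take a look: you only build well after reading a lot.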

GitHub

GitHub - flairNLP/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)

2.3K views, 小熊猫, 03:25

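ignite, a high-level PyTorch API from FAIR. I played with it last night and it is very pleasant to use; its relationship to PyTorch feels a bit like that of Keras to TF. https://github.com/pytorch/ignite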

GitHub

GitHub - pytorch/ignite: High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

2.9K views, 小熊猫, 02:18

A spaCy cheat sheet: http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06

2.1K views, 小熊猫, 18:15

There is also a CS229 cheat sheet: https://stanford.edu/~shervine/teaching/cs-229/

stanford.edu

Teaching - CS 229

Teaching page of Shervine Amidi, Graduate Student at Stanford University.

2.3K views, 小熊猫, 18:17

Foundations of Data Science, material from MSR India; the author is the Data Science lead at MSR India. Take a look, the book is of very high quality. https://www.cs.cornell.edu/jeh/book.pdf

2.0K views, 小熊猫, 03:48

A collection of generative models in TF2 + Keras; everything is on Colab. https://github.com/timsainb/tensorflow2-generative-models/

GitHub

GitHub - timsainb/tensorflow2-generative-models: Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq…

Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq, VAEGAN, GAIA, Spectrogram Inversion. Everything is self contained in a jupyter notebook for easy export to colab...

2.1K views, 小熊猫, 04:10

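A tutorial on Sequence-Aware Recommender Systems. In my own earlier experiments I also found that session-based RNNs work quite well for recommendation, especially in scenarios with clear sequential sessions, such as YouTube series or short-video feeds. https://github.com/mquad/sars_tutorial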

GitHub

GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM…

Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM RecSys 2018 - GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommend...

2.2K views, 小熊猫, 04:12

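BAMBI is a high-level Python API built on PyMC3. If you often work with Bayesian statistical models, it is worth a try. I have only used PyMC3 so far and plan to try BAMBI; hopefully it works well. https://github.com/bambinos/bambi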

GitHub

GitHub - bambinos/bambi: BAyesian Model-Building Interface (Bambi) in Python.

2.4K views, 小熊猫, edited 01:57

Catalyst 19.06rc2 has removed all TensorFlow dependencies and now uses PyTorch exclusively. I have not tried the new version yet, but dropping TF is good news.
link: https://catalyst-team.github.io/catalyst/index.html
Sergey's introduction: https://docs.google.com/presentation/d/1NQGWb53Kqm-f3hZ2JIoHjX-he3C39eOcSszZzp5o07U/edit#slide=id.p

Google Docs

Catalyst.RL

Catalyst.RL tl;dr Sergey Kolesnikov

2.6K views, 小熊猫, 06:57

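How to manage ML experiment results and models is a perennial question. The tools summarized in this Reddit thread are quite good, and many of the comments under it are worth reading too.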
https://old.reddit.com/r/MachineLearning/comments/bx0apm/d_how_do_you_manage_your_machine_learning/

r/MachineLearning - [D] How do you manage your machine learning experiments?

184 votes and 68 comments so far on Reddit

2.6K views, 小熊猫, 02:14

Forwarded from AirOnG

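https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust Getting started with data structures and algorithms in Rust. Data structures and algorithms are fundamentals that every programming language has to deal with. Because of Rust's distinctive ownership rules, implementing them takes some skill, and doing so gives a better feel for what makes the language unique. This repo holds all the example code from the book; it can serve as an introduction or as a reference for how a specific algorithm is written.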

GitHub

GitHub - PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust: Hands-On Data Structures and Algorithms with Rust,…

Hands-On Data Structures and Algorithms with Rust, published by Packt - PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust

470 views, 小熊猫, 09:55

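I have recently been looking at how some NLP projects serialize their corpora: http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization
The article is a bit old, but the experiments section is still worth a look.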

Matthewrocklin

Efficiently Store Pandas DataFrames

2.3K views, 小熊猫, edited 06:31