Discussion

sohucampus

Sohu Staff

If you use open-source data in the competition, please reply here to disclose it

posted in   Sohu2019

April 16, 2019, 9:27 a.m.

23  comments

全体起立

May 7, 2019, 2:51 a.m.

<p><a href="https://github.com/bigzhao/Keyword_Extraction/tree/master/%E5%AD%97%E5%85%B8">https://github.com/bigzhao/Keyword_Extraction/tree/master/%E5%AD%97%E5%85%B8</a><br>Dictionary<br>BERT Chinese pre-trained model<br><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

sys1874

May 5, 2019, 3:35 a.m.

<p><a href="https://github.com/HIT-SCIR/ELMoForManyLangs">https://github.com/HIT-SCIR/ELMoForManyLangs</a> ELMo word vectors</p>

onion1003

May 4, 2019, 6:46 a.m.

<p><a href="https://github.com/bigzhao/Keyword_Extraction/tree/master/%E5%AD%97%E5%85%B8">https://github.com/bigzhao/Keyword_Extraction/tree/master/%E5%AD%97%E5%85%B8</a><br>Dictionary</p>

zhudaxia

May 1, 2019, 3:48 p.m.

<p><a href="https://github.com/Fengfeng1024/SOHU-baseline">https://github.com/Fengfeng1024/SOHU-baseline</a></p>

Clitost

April 30, 2019, 3:56 p.m.

<p>Dictionary data:<br>Link: <a href="https://pan.baidu.com/s/1ggfLba3mqm8ZwKFRoXcs9A">https://pan.baidu.com/s/1ggfLba3mqm8ZwKFRoXcs9A</a><br>Extraction code: nejc</p>

lizp

April 29, 2019, 12:51 a.m.

<p>Dictionary data<br>Link: <a href="https://pan.baidu.com/s/1lIgeI4FEKpr4BXPcvJO5BA">https://pan.baidu.com/s/1lIgeI4FEKpr4BXPcvJO5BA</a><br>Extraction code: shsg</p>

allennlp

April 28, 2019, 9 a.m.

<p><a href="https://github.com/fighting41love/funNLP">https://github.com/fighting41love/funNLP</a><br>Dictionary data</p>
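Several posts in this thread share dictionary (word-list) files. A common way to use such a list is greedy forward maximum matching: scan the text from left to right and always take the longest dictionary word that starts at the current position. Below is a minimal sketch. It assumes each file is UTF-8 with one entry per line and the word in the first whitespace-separated column, which may not match every linked file. `load_dictionary` and `forward_max_match` are hypothetical helper names and do not come from any of the linked repositories.

```python
def load_dictionary(path):
    """Read a word list: one entry per line, word in the first column (assumed format)."""
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                words.add(fields[0])
    return words


def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` greedily left to right, always taking the longest dictionary hit.

    Characters not covered by any dictionary word are emitted one at a time.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)  # guard against an empty-string entry or max_len=0

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary word matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    vocab = {"搜狐", "新闻", "数据", "搜狐新闻"}
    print(forward_max_match("搜狐新闻数据", vocab))  # ['搜狐新闻', '数据']
```

Longest-match prefers multi-character entries such as 搜狐新闻 over 搜狐 + 新闻. It ignores context entirely, so ambiguous spans may still need a statistical segmenter.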

TPF

April 26, 2019, 5:51 a.m.

<p><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a><br><a href="https://github.com/google-research/bert">https://github.com/google-research/bert</a></p>

Passer_by

April 25, 2019, 12:45 p.m.

<p><a href="https://github.com/google-research/bert">https://github.com/google-research/bert</a></p>

peco

April 25, 2019, 2:21 a.m.

<p><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a><br><a href="https://github.com/google-research/bert">https://github.com/google-research/bert</a></p>

zhengguokai

April 25, 2019, 2:15 a.m.

<p><a href="https://github.com/macanv/BERT-BiLSTM-CRF-NER">https://github.com/macanv/BERT-BiLSTM-CRF-NER</a></p>

dachougui

April 23, 2019, 12:29 p.m.

<p><a href="https://drive.google.com/drive/folders/1K_xCYMCEfjpPjedSnMyL9zMVzqbanQX9">BERT Chinese pre-trained model</a></p>

MaggicQ

April 23, 2019, noon

<p>100+ Chinese Word Vectors (hundreds of pre-trained Chinese word vector sets)<br><a href="https://github.com/Embedding/Chinese-Word-Vectors">https://github.com/Embedding/Chinese-Word-Vectors</a></p>

aber

April 23, 2019, 1:57 a.m.

<p><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a><br><a href="https://github.com/google-research/bert">https://github.com/google-research/bert</a><br>BERT pre-trained model and the official BERT source code</p>
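Most of the BERT links in this thread point to `chinese_L-12_H-768_A-12.zip`. According to the google-research/bert README, the model name encodes the architecture: L is the number of layers, H the hidden size, and A the number of attention heads. The sketch below parses those values out of the name. As a cross-check, it can also read `bert_config.json` from the downloaded archive. That second helper assumes the zip contains a file ending in `bert_config.json`, so verify this against your copy. `parse_bert_name` and `read_bert_config` are hypothetical helpers and are not part of the BERT codebase.

```python
import json
import re
import zipfile


def parse_bert_name(name):
    """Extract layers (L), hidden size (H) and attention heads (A) from a BERT model name."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError(f"no L-/H-/A- pattern in {name!r}")
    layers, hidden, heads = (int(g) for g in match.groups())
    return {
        "num_hidden_layers": layers,
        "hidden_size": hidden,
        "num_attention_heads": heads,
        # Each attention head works on hidden_size / num_attention_heads dimensions.
        "head_dim": hidden // heads,
    }


def read_bert_config(zip_path):
    """Load the first bert_config.json found inside a checkpoint archive (assumed layout)."""
    with zipfile.ZipFile(zip_path) as zf:
        config_name = next(
            (n for n in zf.namelist() if n.endswith("bert_config.json")), None
        )
        if config_name is None:
            raise FileNotFoundError(f"no bert_config.json in {zip_path}")
        return json.loads(zf.read(config_name).decode("utf-8"))


if __name__ == "__main__":
    print(parse_bert_name("chinese_L-12_H-768_A-12"))
    # {'num_hidden_layers': 12, 'hidden_size': 768, 'num_attention_heads': 12, 'head_dim': 64}
```

The same parser also works for the multilingual checkpoint linked in this thread (`multilingual_L-12_H-768_A-12`).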

reborn

April 23, 2019, 1:42 a.m.

<p>BERT Chinese pre-trained model<br><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

CallmeDad

April 22, 2019, 12:15 p.m.

<p>Word vectors: link: <a href="https://pan.baidu.com/s/1htC495U">https://pan.baidu.com/s/1htC495U</a>, password: 4ff8</p>

CallmeDad

April 22, 2019, 6:29 a.m.

<p><a href="https://github.com/kyzhouhzau/BERT-NER">https://github.com/kyzhouhzau/BERT-NER</a></p>

超能

April 21, 2019, 5:33 p.m.

<p><a href="https://github.com/wainshine/Chinese-Names-Corpus">https://github.com/wainshine/Chinese-Names-Corpus</a><br><a href="https://github.com/brightmart/nlp_chinese_corpus">https://github.com/brightmart/nlp_chinese_corpus</a><br><a href="https://github.com/SophonPlus/ChineseNlpCorpus">https://github.com/SophonPlus/ChineseNlpCorpus</a><br><a href="https://github.com/ml-distribution/chinese-corpus">https://github.com/ml-distribution/chinese-corpus</a></p>

sxu_nlp

April 20, 2019, 8:49 a.m.

<p>Data I may use:<br>Chinese Wikipedia, Sogou News<br>Link: <a href="https://pan.baidu.com/s/1lAkp0VBHkV2IjzYG6e3GEg">https://pan.baidu.com/s/1lAkp0VBHkV2IjzYG6e3GEg</a><br>Extraction code: oaax<br>BERT Chinese pre-trained model <a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a><br>Pre-trained word vectors<br><a href="https://github.com/Embedding/Chinese-Word-Vectors">https://github.com/Embedding/Chinese-Word-Vectors</a></p>

smybiendata

April 20, 2019, 4:18 a.m.

<p>BERT Chinese pre-trained model <a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

deepfind

April 19, 2019, 3:07 p.m.

<p>BERT Chinese<br><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

ll842316268

April 19, 2019, 3:05 p.m.

<p><a href="https://github.com/PaddlePaddle/LARK">https://github.com/PaddlePaddle/LARK</a> Baidu pre-trained models</p>

NaiveNick

April 19, 2019, 1:03 p.m.

<p>ELMo<br><a href="https://github.com/HIT-SCIR/ELMoForManyLangs">https://github.com/HIT-SCIR/ELMoForManyLangs</a><br>HowNet<br><a href="https://github.com/thunlp/OpenHowNet-API">https://github.com/thunlp/OpenHowNet-API</a><br>LASER<br><a href="https://dl.fbaipublicfiles.com/laser/models">https://dl.fbaipublicfiles.com/laser/models</a><br>ERNIE<br><a href="https://ernie.bj.bcebos.com/ERNIE_stable.tgz">https://ernie.bj.bcebos.com/ERNIE_stable.tgz</a><br>BERT Chinese<br><a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a><br>BERT Multilingual<br><a href="https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip</a></p>

maomao

April 19, 2019, 12:03 p.m.

<p>2014 People's Daily dataset, link: <a href="https://pan.baidu.com/s/1jAJPlYQ_j8__P1386Knz-w">https://pan.baidu.com/s/1jAJPlYQ_j8__P1386Knz-w</a>, extraction code: bevm</p>

李贤

April 19, 2019, 10:51 a.m.

<p>Tencent word vectors<br><a href="https://cloud.tencent.com/developer/article/1356164">https://cloud.tencent.com/developer/article/1356164</a></p>

bluekeroro

April 19, 2019, 7:35 a.m.

<p>Possibly used:<br><a href="http://ltp.ai/download.html">HIT LTP models</a><br>plus the other open-source data already posted in this thread, which I won't repeat here</p>

zhengzx

April 19, 2019, 6:57 a.m.

<p>Open-source dictionaries I may use:<br><a href="https://github.com/thunlp/THUOCL">THUOCL: Tsinghua University Open Chinese Lexicon</a><br><a href="https://github.com/1data-inc">1data (壹沓科技) Chinese lexicon</a><br><a href="https://dumps.wikimedia.org/zhwiki/">Chinese Wikipedia entries</a><br><a href="https://github.com/bigzhao/Keyword_Extraction/tree/master/字典">Dictionary contributed by bigzhao</a><br><a href="https://github.com/fighting41love/funNLP">funNLP</a><br><a href="https://github.com/goto456/stopwords">Chinese stopword lists</a></p>

bigheart

April 19, 2019, 6:29 a.m.

<p>ELMo pre-trained models <a href="https://github.com/HIT-SCIR/ELMoForManyLangs">https://github.com/HIT-SCIR/ELMoForManyLangs</a></p>

tangmingyi

April 19, 2019, 5:43 a.m.

<p>BERT Chinese pre-trained model <a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

虹猫蓝兔仗剑走天涯

April 19, 2019, 2:47 a.m.

<p>Chinese Wikipedia corpus <a href="https://dumps.wikimedia.org/zhwiki/20190401/">https://dumps.wikimedia.org/zhwiki/20190401/</a></p>

Panini

April 18, 2019, 2:47 p.m.

<p>100+ Chinese Word Vectors (hundreds of pre-trained Chinese word vector sets):<br><a href="https://github.com/Embedding/Chinese-Word-Vectors">https://github.com/Embedding/Chinese-Word-Vectors</a></p>

ll842316268

April 18, 2019, 2:28 p.m.

<p><a href="https://github.com/wainshine/Company-Names-Corpus">https://github.com/wainshine/Company-Names-Corpus</a><br>Chinese company-name and person-name corpus</p>

qiguang

April 18, 2019, 2:18 p.m.

<p>BERT Chinese pre-trained model <a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

xiaolalala

April 18, 2019, 3:24 a.m.

<p>Dalian University of Technology affective lexicon ontology <a href="http://ir.dlut.edu.cn/EmotionOntologyDownload">http://ir.dlut.edu.cn/EmotionOntologyDownload</a></p>

pk3725069

April 17, 2019, 2:55 p.m.

<p>News data from across the web (SogouCA) <a href="http://www.sogou.com/labs/resource/ca.php">http://www.sogou.com/labs/resource/ca.php</a><br>TextRank algorithm <a href="https://github.com/letiantian/TextRank4ZH">https://github.com/letiantian/TextRank4ZH</a></p>
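The post above links TextRank4ZH. The following is a rough illustration of the idea behind TextRank keyword extraction, not TextRank4ZH's actual code. It builds an undirected co-occurrence graph over words that have already been segmented and filtered for stopwords. It then runs PageRank-style updates S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u), and keeps the top-scoring words. The function name, the window size of 2, the damping factor d = 0.85 and the fixed 50 iterations are all illustrative choices. Segmentation is assumed to have been done beforehand. The stopword lists linked in this thread could be passed as `stopwords`.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50,
                      top_k=5, stopwords=frozenset()):
    """Rank words by TextRank over a co-occurrence graph of pre-segmented tokens."""
    words = [t for t in tokens if t not in stopwords]

    # Undirected, unweighted graph: link two different words that fall inside
    # the same sliding window of `window` consecutive words.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            other = words[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style updates: S(v) = (1 - d) + d * sum(S(u) / deg(u) for u adjacent to v).
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    tokens = ["搜狐", "发布", "新闻", "搜狐", "数据", "的", "搜狐", "用户"]
    print(textrank_keywords(tokens, top_k=1, stopwords={"的"}))  # ['搜狐']
```

Words that co-occur with many different neighbours (here 搜狐) accumulate the highest scores. Raising `window` connects more distant words and makes the graph denser.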

ll842316268

April 17, 2019, 7:27 a.m.

<p>BERT Chinese pre-trained model <a href="https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip">https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip</a></p>

Zessay

April 17, 2019, 6:59 a.m.

<p>Tencent's open-source Chinese word vectors <a href="https://ai.tencent.com/ailab/nlp/embedding.html">https://ai.tencent.com/ailab/nlp/embedding.html</a></p>

chizhu

April 16, 2019, 9:45 a.m.

<p>fastText word vectors trained on Wikipedia and Common Crawl (pick Chinese) <a href="https://fasttext.cc/docs/en/crawl-vectors.html">https://fasttext.cc/docs/en/crawl-vectors.html</a></p>
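Several word-vector releases in this thread are distributed as plain text, as far as I know: the fastText `.vec` files, the Tencent AI Lab embeddings and Chinese-Word-Vectors. They follow the word2vec text layout. The first line holds the vocabulary size and the dimension, and each later line holds a word followed by its vector components, all separated by spaces. Please confirm this against each release's README. Assuming that layout, below is a minimal pure-Python loader plus a cosine-similarity helper. `parse_text_vectors`, `load_text_vectors` and `cosine` are hypothetical names. For multi-gigabyte files, gensim's `KeyedVectors.load_word2vec_format` (with `binary=False`) is usually more practical.

```python
import math


def parse_text_vectors(lines, limit=None):
    """Parse word2vec-style text: header 'n_words dim', then 'word v1 ... v_dim' per line."""
    lines = iter(lines)
    _, dim = map(int, next(lines).split())
    vectors = {}
    for line in lines:
        if limit is not None and len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines, e.g. tokens that themselves contain spaces
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


def load_text_vectors(path, limit=None):
    """Read a UTF-8 vector file; `limit` keeps only the first N words to save memory."""
    with open(path, encoding="utf-8") as f:
        return parse_text_vectors(f, limit=limit)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vecs, dim = parse_text_vectors(["2 3", "中 0.1 0.2 0.3", "国 0.2 0.4 0.6"])
    print(dim, round(cosine(vecs["中"], vecs["国"]), 6))  # 3 1.0
```

Passing `limit` is a simple way to try a large embedding file on a small machine. Because these files are usually ordered by word frequency, the first N entries tend to cover the common vocabulary.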