
Preface

It has been a long time since I last wrote a blog post. I had been working on the RelexNet project, and I feel it produced some fairly good results. I also just finished the final honours year of my bachelor's degree and the thesis. I am not sure whether the work can be published at a conference, but I will try. Last semester I took ANU's COMP4670 (Statistical Machine Learning), once known as a legendary course. It may not be as difficult as it used to be, but between the course and the PRML book (Pattern Recognition and Machine Learning) I learned a lot, and I now understand some machine learning principles better from a mathematical point of view. I have to say that Bishop really likes to use Bayes to explain everything (lol). I will post about kernel methods first; the earlier chapters will be added gradually.


Introduction to Boolean Retrieval

Bag-of-Words (BoW)

Assumption

A document is a collection of words
e.g.

  • Doc1: Mary married John
  • Doc2: John married Mary
  • These two documents are the same under the BoW assumption

We will use the BoW assumption throughout the IR part, which does not care about word order. The NLP part will cover other approaches that do take word order into account.
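
As a minimal sketch of the BoW assumption, the snippet below turns each document into a multiset of words and compares the two examples above. The helper name `bag_of_words` and the choice of lowercasing plus whitespace splitting are illustrative assumptions, not something the course material prescribes.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: lowercase, split on whitespace, count each word.

    Word order is discarded; only the word counts remain.
    """
    return Counter(text.lower().split())


doc1 = "Mary married John"
doc2 = "John married Mary"

# Both documents contain the same words with the same counts,
# so their bag-of-words representations compare equal.
same_under_bow = bag_of_words(doc1) == bag_of_words(doc2)
print(same_under_bow)
```

A real system would normally use a proper tokenizer instead of `split()`, but the idea is the same: once order is dropped, Doc1 and Doc2 cannot be told apart.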

Field (Zone) in Document

  • A document is semi-structured data
    e.g. Title, Author, Published, Date, Body
  • A user may want to limit the search scope to a certain field
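
To illustrate limiting a search to one field, here is a small sketch. The `Document` class, its three fields, and the `search_field` helper are all hypothetical names chosen for this example; real IR systems typically build a separate index per zone rather than scanning documents.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical fields; a real collection may have more (e.g. Published, Date).
    title: str
    author: str
    body: str


def search_field(docs, field, term):
    """Return the documents whose given field contains `term` as a whole word.

    Matching is case-insensitive and uses simple whitespace tokenization.
    """
    term = term.lower()
    return [d for d in docs if term in getattr(d, field).lower().split()]


docs = [
    Document(title="Mary married John", author="Alice", body="A short story about John"),
    Document(title="Wedding notes", author="Mary", body="Mary married John last year"),
]

# The same query term matches different documents depending on the field.
title_hits = search_field(docs, "title", "mary")
author_hits = search_field(docs, "author", "mary")
print([d.title for d in title_hits])
print([d.title for d in author_hits])
```

Searching for "mary" in the title field finds only the first document, while the same term in the author field finds only the second.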

Introduction to Information Retrieval

What is information retrieval

Concept

  • Google Definition
    Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds.
  • Textbook Definition
    Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

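Kaggle Competition: NLP with Disaster Tweets, Second Attempt

This counts as my second attempt, although it differs little from the first: it was mostly hyperparameter tuning, which also helped me get familiar with the model's parameters. The result improved by about 0.007 over the first attempt. So sometimes being a "parameter tuner" is of some use after all, haha~
Notebook reference: Getting started with NLP-Feature Vectors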


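Kaggle Competition: NLP with Disaster Tweets

This is my first attempt at an NLP competition. It is a beginner-level competition, and a chance to get familiar with some basic NLP methods. The task is to decide, from the text of a tweet, whether it is describing a disaster.

How to start from scratch

First, the best way to start a Kaggle competition is to find a baseline and study it. Here I began with a general introduction to learn about NLP preprocessing and models: Getting started with NLP - A general Introduction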


Writing a Blog Post

The command for writing a blog post:

hexo new "title"

title is the name of the post, i.e. the title shown on the home page.
This command creates an md file named after title in the source/_posts folder. After editing that md file, run

hexo clean
hexo g
hexo d

Running the commands above pushes the modified md file to GitHub, after which it can be viewed on the website. However, it is recommended to first use

hexo clean
hexo g
hexo s


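First Blog Post: Setting Up and Deploying a Hexo Blog

  1. Having survived an intense exam week, I plan to use the holiday to try learning NLP and to study the book "Statistical Learning Methods" (《统计学习方法》), as groundwork for COMP4650 and COMP4670
  2. First, thanks to my good buddy (and "good son") MrSun. I set up the framework by following his blog, and I will also write down the setup process here

    Blog setup process

  3. First, download node.js and git (both are required by the Hexo blog framework)
    node.js
    git
  4. Install Hexo
    First, installing node.js also installs npm (a package management tool) by default

    Because npm uses an overseas registry by default, we switch to the Taobao mirror so that later commands run faster. The command is: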
