Learning Elasticsearch: Understanding Lucene (Part 2)

Indexing documents (creating an index)

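As mentioned in the previous post, indexing a document in Lucene means converting that document into the index data structure. The code below is a working example of indexing documents. Its data source is an array of objects, which simulates a realistic indexing requirement.
In the example, I define the method indexDocument(Product[] products, Analyzer analyzer, String indexPath) to index the documents. The method looks like this: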

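// Imports used by the snippets in this post
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
     * Indexes the documents and builds the index.
     * @param products  array of documents to index
     * @param analyzer  analyzer to use; must be an implementation of Analyzer
     * @param indexPath directory where the index is stored (a relative path here)
     */
    static void indexDocument(Product[] products, Analyzer analyzer, String indexPath) {
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // CREATE replaces any index that already exists in the directory
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        // A relative path is resolved against the working directory, not the classpath
        Path indexFolder = Paths.get(indexPath);
        System.out.println("开始索引文档"); // "Start indexing documents"
        if (!Files.isReadable(indexFolder)) {
            // "Index path does not exist; make sure it exists and try again"
            System.out.println("索引路径不存在，请确认路径存在后重试");
            System.exit(1);
        }
        // try-with-resources closes the writer and the directory even if indexing fails
        try (Directory directory = FSDirectory.open(indexFolder);
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            // Create the Lucene field types
            FieldType id = new FieldType();
            id.setIndexOptions(IndexOptions.DOCS);
            id.setStored(true);
            FieldType name = new FieldType();
            name.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            name.setStored(true);
            name.setTokenized(true);
            FieldType description = new FieldType();
            description.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            description.setStored(true);
            description.setTokenized(true);
            // Also store term vectors (with positions, offsets, and payloads) for this field
            description.setStoreTermVectors(true);
            description.setStoreTermVectorOffsets(true);
            description.setStoreTermVectorPayloads(true);
            description.setStoreTermVectorPositions(true);
            for (Product product : products) {
                indexWriter.addDocument(getDocument(product, id, name, description));
                System.out.println("第" + product.getId() + "条数据索引成功"); // "Record N indexed"
            }
            indexWriter.commit();
            // Report success only when no exception was thrown
            System.out.println("索引文档成功"); // "Documents indexed successfully"
        } catch (IOException e) {
            System.out.println("索引文档失败"); // "Failed to index documents"
            e.printStackTrace();
        }
    }

    /** Converts a Product into a Lucene Document using the given field types. */
    static Document getDocument(Product product, FieldType id, FieldType name, FieldType description) {
        Document document = new Document();
        document.add(new Field("id", String.valueOf(product.getId()), id));
        document.add(new Field("name", product.getName(), name));
        document.add(new Field("description", product.getDescription(), description));
        return document;
    }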

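The rough steps Lucene takes to index a document (that is, to build an index) can be summarized as follows:

1. Initialize an IndexWriter from the storage directory and the configuration.
2. Build the field types and convert each object into a Lucene Document.
3. Submit the documents through the IndexWriter to complete the indexing process.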
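
To make "turning a document into an index" more concrete, here is a toy inverted index in plain Java. It is only a sketch of the idea and not how Lucene is implemented: the class ToyInvertedIndex, its whitespace-splitting analyze method, and the English sample text are all assumptions made for illustration. A real Lucene index also records frequencies, positions, offsets, and stored field values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: maps each term to the ids of the documents that contain it.
class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Hypothetical stand-in for an Analyzer: lowercases and splits on whitespace.
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Conceptually what indexing does: analyze the text, then record
    // the document id under every term the analysis produced.
    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            if (term.isEmpty()) {
                continue; // blank input yields one empty token; skip it
            }
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // Exact term lookup: ids of the documents that contain the term.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(1, "large screen front fingerprint");
        index.addDocument(2, "huge screen rear fingerprint");
        index.addDocument(3, "huge screen front fingerprint");
        System.out.println(index.lookup("front")); // [1, 3]
        System.out.println(index.lookup("huge"));  // [2, 3]
    }
}
```

The point of the sketch is the direction of the mapping: a search starts from a term and reaches documents, which is why the analyzer chosen at indexing time determines what can later be found.
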
The main method used to test this method, together with its output, is shown below:

public static void main(String[] args) {
        Product[] products = {new Product(1, "苹果手机", "大屏、前置指纹识别、面容识别、不支持切换UI", 10),
                new Product(2, "小米手机", "超大屏、后置指纹识别、支持多种UI切换", 25),
                new Product(3, "一加手机", "超大屏、前置指纹识别、两种UI切换", 50)};
        // This example uses the SmartChinese analyzer for Chinese word segmentation
        Analyzer analyzer = new SmartChineseAnalyzer();
        indexDocument(products, analyzer, "indexDirectory");
    }
Output:
开始索引文档
第1条数据索引成功
第2条数据索引成功
第3条数据索引成功
索引文档成功

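Open the indexDirectory folder under the project root and you will see the index files that Lucene generated. At this point, the documents have been indexed successfully.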

A Lucene index can be inspected with the open-source tool Luke. If you need it, you can download it from the Luke project's GitHub page.

More on indexing documents

Take the example above. Each instance of the Product class is an entity in Java, and it corresponds to one document indexed into Lucene.
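
The Product class itself is not shown in this post, so the following is a minimal sketch that is merely consistent with the calls used above (a four-argument constructor plus getId, getName, and getDescription). The name price for the fourth constructor argument and the helper toFields are assumptions for illustration only. toFields mirrors what getDocument does, using a plain map in place of a Lucene Document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Product entity; the original class is not shown in the post.
class Product {
    private final int id;
    private final String name;
    private final String description;
    private final int price; // assumption: the post does not say what the fourth argument means

    Product(int id, String name, String description, int price) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.price = price;
    }

    int getId() { return id; }
    String getName() { return name; }
    String getDescription() { return description; }
    int getPrice() { return price; }

    // Mirrors getDocument: one entry per indexed field.
    // Only id, name, and description become fields; the fourth value is not indexed.
    Map<String, String> toFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", String.valueOf(id));
        fields.put("name", name);
        fields.put("description", description);
        return fields;
    }
}
```

Because getDocument adds only three fields, whatever the fourth constructor argument holds is neither searchable nor retrievable from the index.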

Deleting and updating the index

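In the indexing example above, the index is created mainly through an IndexWriter instance. The same class also provides methods for deleting and updating indexed documents. They compare as follows: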

addDocument method: indexes a document
deleteDocuments method: deletes documents
updateDocument method: updates a document

An example of deleting indexed documents:

/**
     * Deletes the indexed documents that match a keyword.
     * @param analyzer  analyzer to use
     * @param field     field that contains the keyword
     * @param keyWord   keyword
     * @param indexPath directory where the index is stored
     */
    static void deleteDocument(Analyzer analyzer, String field, String keyWord, String indexPath) {
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        Path indexFolder = Paths.get(indexPath);
        // try-with-resources closes the writer and the directory even if the delete fails
        try (Directory directory = FSDirectory.open(indexFolder);
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            // The Term is not analyzed: keyWord must equal a term exactly as it was indexed
            indexWriter.deleteDocuments(new Term(field, keyWord));
            indexWriter.commit();
            System.out.println("删除成功"); // "Delete succeeded"
        } catch (IOException e) {
            System.out.println("删除失败"); // "Delete failed"
            e.printStackTrace();
        }
    }

This example shows only one of several delete APIs. For other needs, consult the official Lucene API documentation.

Similarly, we can update the index through the methods that IndexWriter provides. Example code:

/**
     * Locates documents by keyword and replaces them with a new document.
     * @param analyzer  analyzer to use
     * @param field     field that contains the keyword
     * @param keyWord   keyword
     * @param product   object for the new document
     * @param indexPath directory where the index is stored
     */
    static void updateDocument(Analyzer analyzer, String field, String keyWord, Product product, String indexPath) {
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        Path indexFolder = Paths.get(indexPath);
        // try-with-resources closes the writer and the directory even if the update fails
        try (Directory directory = FSDirectory.open(indexFolder);
             IndexWriter indexWriter = new IndexWriter(directory, config)) {
            // Same field types as in indexDocument
            FieldType id = new FieldType();
            id.setIndexOptions(IndexOptions.DOCS);
            id.setStored(true);
            FieldType name = new FieldType();
            name.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            name.setStored(true);
            name.setTokenized(true);
            FieldType description = new FieldType();
            description.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            description.setStored(true);
            description.setTokenized(true);
            description.setStoreTermVectors(true);
            description.setStoreTermVectorOffsets(true);
            description.setStoreTermVectorPayloads(true);
            description.setStoreTermVectorPositions(true);
            Document document = getDocument(product, id, name, description);
            // Deletes every document containing the term, then adds the new document
            indexWriter.updateDocument(new Term(field, keyWord), document);
            indexWriter.commit();
            System.out.println("更新成功"); // "Update succeeded"
        } catch (IOException e) {
            System.out.println("更新失败"); // "Update failed"
            e.printStackTrace();
        }
    }
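
The delete and update methods above share one behavior that is easy to miss: both select documents by an exact term, and updateDocument works by deleting every matching document and then adding the new one. The plain-Java sketch below models only that behavior. The class ToyDocStore and its helpers are hypothetical, each "document" is just a map from field name to a single exact term, and nothing here reflects how Lucene stores data internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index: each document maps a field name to one exact term.
class ToyDocStore {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Models delete-by-term: removes every document whose field equals the value exactly.
    void deleteDocuments(String field, String value) {
        docs.removeIf(d -> value.equals(d.get(field)));
    }

    // Models update-by-term: delete all matches first, then add the new document.
    void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        addDocument(newDoc);
    }

    int size() {
        return docs.size();
    }

    // Counts documents whose field equals the value exactly.
    long count(String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).count();
    }

    // Small helper to build a two-field document.
    static Map<String, String> doc(String id, String name) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("name", name);
        return d;
    }

    public static void main(String[] args) {
        ToyDocStore store = new ToyDocStore();
        store.addDocument(doc("1", "phone-a"));
        store.addDocument(doc("2", "phone-b"));
        store.updateDocument("id", "2", doc("2", "phone-b-v2")); // replaces document 2
        store.updateDocument("id", "9", doc("9", "phone-c"));    // no match: simply adds
        store.deleteDocuments("id", "1");
        System.out.println(store.size()); // 2
    }
}
```

Two consequences follow from this model. An update whose term matches nothing still adds the new document, and an update whose term matches several documents replaces all of them with a single one. This is why a unique, exactly matching field such as id is the usual choice for the term.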

Summary

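This post covered the basic index operations in Lucene: indexing documents, deleting indexed documents, and updating indexed documents. Beyond these, Lucene's other main feature is full-text search. The next post will continue with Lucene's query operations.
Sample project: lucene-es-demo
For questions or discussion, feel free to email JaydenRansom@outlook.com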