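In short: if the mappings of two kinds of documents are fairly similar, use types (two types under the same index); otherwise, two separate indices are probably the better choice.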
https://www.elastic.co/blog/index-vs-type
Pay attention to the parts highlighted in red:
Who has never wondered whether new data should be put into a new type of an existing index, or into a new index? This is a recurring question for new users, that can’t be answered without understanding how both are implemented.
In the past we tried to make elasticsearch easier to understand by building an analogy with relational databases: indices would be like a database, and types like a table in a database. This was a mistake: the way data is stored is so different that any comparisons can hardly make sense, and this ultimately led to an overuse of types in cases where they were more harmful than helpful.
What is an index?
An index is stored in a set of shards, which are themselves Lucene indices. This already gives you a glimpse of the limits of using a new index all the time: Lucene indices have a small yet fixed overhead in terms of disk space, memory usage and file descriptors used. For that reason, a single large index is more efficient than several small indices: the fixed cost of the Lucene index is better amortized across many documents.
Another important factor is how you plan to search your data. While each shard is searched independently, Elasticsearch eventually needs to merge results from all the searched shards. For instance, if you search across 10 indices that have 5 shards each, the node that coordinates the execution of a search request will need to merge 5x10=50 shard results. Here again you need to be careful: if there are too many shard results to merge and/or if you run a heavy request that produces large shard responses (which can easily happen with aggregations), the task of merging all these shard results can become very resource-intensive, both in terms of CPU and memory. Again this would advocate for having fewer indices.
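The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting argument; the function name `shard_results_to_merge` is made up for this example and is not part of any Elasticsearch API.

```python
def shard_results_to_merge(shards_per_index):
    """Count the shard-level results the coordinating node has to merge.

    `shards_per_index` holds the number of shards of every index that the
    search request targets (one result comes back per searched shard).
    """
    return sum(shards_per_index)


# Ten indices with five shards each, as in the example above.
print(shard_results_to_merge([5] * 10))  # 50

# A single index with five shards holding the same data.
print(shard_results_to_merge([5]))  # 5
```

The number of results to merge grows with the total number of shards searched, not with the number of documents, which is why many small indices make the merge step more expensive.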
What is a type?
This is where types help: types are a convenient way to store several types of data in the same index, in order to keep the total number of indices low for the reasons exposed above. In terms of implementation it works by adding a “_type” field to every document that is automatically used for filtering when searching on a specific type. One nice property of types is that searching across several types of the same index comes with no overhead compared to searching a single type: it does not change how many shard results need to be merged.
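As a rough mental model of the paragraph above, a type behaves like an extra `_type` field that is turned into a filter at search time. The toy Python sketch below mimics that idea with an in-memory list; the documents, the type names and the `search` helper are all hypothetical, and this is not how Elasticsearch is implemented internally.

```python
# One "index" holding documents of two types; "_type" is just another field.
index = [
    {"_type": "tweet", "text": "hello"},
    {"_type": "user", "name": "alice"},
    {"_type": "tweet", "text": "world"},
]


def search(docs, types=None):
    """Return the documents whose _type is in `types` (all documents if None)."""
    if types is None:
        return list(docs)
    return [doc for doc in docs if doc["_type"] in types]


print(len(search(index, types={"tweet"})))          # 2
print(len(search(index, types={"tweet", "user"})))  # 3
```

Searching one type or several types scans the same index either way; only the filter changes, so the number of shard results to merge stays the same.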
However this comes with limitations as well (what are the limitations of types?):
- Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
- Fields that exist in one type will also consume resources for documents of types where this field does not exist. This is a general issue with Lucene indices: they don’t like sparsity. Sparse postings lists can’t be compressed efficiently because of high deltas between consecutive matches. And the issue is even worse with doc values: for speed reasons, doc values often reserve a fixed amount of disk space for every document, so that values can be addressed efficiently. This means that if Lucene establishes that it needs one byte to store all values of a given numeric field, it will also consume one byte for documents that don’t have a value for this field. Future versions of Elasticsearch will have improvements in this area but I would still advise you to model your data in a way that will limit sparsity as much as possible.
- Scores use index-wide statistics, so scores of documents in one type can be impacted by documents from other types.
This means types can be helpful, but only if all types from a given index have mappings that are similar. Otherwise, the fact that fields also consume resources in documents where they don’t exist could make things worse than if the data had been stored in separate indices.
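To make the sparsity cost concrete, here is a back-of-the-envelope calculation in Python. It assumes the simple fixed-width model described above (every document in the index reserves the same number of bytes for a field); the document counts and the one-byte width are invented for illustration.

```python
def doc_values_bytes(total_docs, bytes_per_value):
    """Disk space reserved for one field under the fixed-width model:
    every document in the index pays, whether or not it has a value."""
    return total_docs * bytes_per_value


# Hypothetical data set: type A has the field, type B does not.
docs_type_a = 1_000_000
docs_type_b = 9_000_000

# Both types folded into one index: all 10M documents reserve one byte.
shared = doc_values_bytes(docs_type_a + docs_type_b, 1)

# Separate indices: only type A's index carries the field.
separate = doc_values_bytes(docs_type_a, 1)

print(shared)             # 10000000
print(separate)           # 1000000
print(shared - separate)  # 9000000 bytes spent on documents without the field
```

With dissimilar mappings, most of the space reserved for each field goes to documents that never use it, which is the case where separate indices come out ahead.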
Which one should I use?
This is a tough question, and the answer will depend on your hardware, data and use-case. First it is important to realize that types are useful because they can help reduce the number of Lucene indices that Elasticsearch needs to manage. But there is another way that you can reduce this number: creating indices that have fewer shards. For instance, instead of folding 5 types into the same index, you could create 5 indices with 1 primary shard each.
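The alternative mentioned above, five indices with one primary shard each, could look roughly like the following. The sketch only builds and prints the create-index request bodies using the `number_of_shards` and `number_of_replicas` index settings; the index names and the `index_settings` helper are placeholders, and nothing is sent to a cluster.

```python
import json


def index_settings(primary_shards, replicas=1):
    """Build a create-index request body with the given shard counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


# Hypothetical index names, one per kind of data.
index_names = ["users", "orders", "products", "reviews", "events"]

for name in index_names:
    print(f"PUT /{name}")
    print(json.dumps(index_settings(1), indent=2))

# Total primary shards: the same as a single index with five shards.
total_primary_shards = sum(
    index_settings(1)["settings"]["number_of_shards"] for _ in index_names
)
print(total_primary_shards)  # 5
```

Either layout leaves the cluster with five primary shards, but the five-index variant keeps each mapping separate and avoids the sparsity problem.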
I will try to summarize the questions you should ask yourself to make a decision:
- Are you using parent/child? If yes this can only be done with two types in the same index.
- Do your documents have similar mappings? If no, use different indices.
- If you have many documents for each type, then the overhead of Lucene indices will be easily amortized so you can safely use indices, with fewer shards than the default of 5 if necessary.
- Otherwise you can consider putting documents in different types of the same index. Or even in the same type.
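The checklist above can be read as a small decision procedure. The Python function below is one possible encoding of it, checking the questions in the order they are listed; the function name, parameter names and returned strings are chosen for this sketch only.

```python
def choose_layout(uses_parent_child, similar_mappings, many_docs_per_type):
    """Walk through the checklist in order and return a suggested layout."""
    if uses_parent_child:
        # Parent/child only works with two types in the same index.
        return "same index, separate types"
    if not similar_mappings:
        return "separate indices"
    if many_docs_per_type:
        # The per-index overhead is amortized across many documents.
        return "separate indices, with fewer shards if necessary"
    return "same index, separate types (or even a single type)"


print(choose_layout(uses_parent_child=False,
                    similar_mappings=False,
                    many_docs_per_type=False))
# separate indices
```

In practice the answer also depends on hardware and use case, so the output is a starting point rather than a rule.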
In conclusion, you may be surprised that there are not as many use cases for types as you expected. And this is right: there are actually few use cases for having several types in the same index for the reasons that we mentioned above. Don’t hesitate to allocate different indices for data that would have different mappings, but still keep in mind that you should keep a reasonable number of shards in your cluster, which can be achieved by reducing the number of shards for indices that don’t require a high write throughput and/or will store low numbers of documents.