Sunday, December 15, 2013

Database Query - Performance Tuning ~

 This post was last edited by niushuai666 on 2013-12-06 16:26:19
There is currently a performance bottleneck in my project:

I have three tables: A / B / C.

First I query table B for records with status 1, about 4,000 of them. Then I traverse those 4,000 records:
for each one, I look up the records in table C (about 500,000 rows) whose id equals the id from table B.

So the performance bottleneck is querying table C by id: that means 4,000 separate queries, including the overhead of establishing and closing connections and so on. Completing this module currently takes around four minutes.

My optimization idea is this:

Read all of table C into memory with a single query. The problem is that if I do no processing at all and just get back a List, I would need 4,000 * 500,000 comparisons to finish the task, which would be a disaster. So instead I'm thinking of
select * from C where id in (1,2,3,4,5, ... 4000); and then loading the results into a
HashMap<Integer, List<C>>, so that I can look records up by id.

Does SimpleJdbcTemplate have a function like that???? This is urgent ~~~
------ Solution --------------------------------------------
If table C's data rarely changes, you can consider loading table C into memory when the service starts, using a HashMap<Integer, List<C>> as the data structure; then just call map.get(id from table B) directly. When
table C's data changes, update the list for the corresponding id in the in-memory map accordingly.

Memory usage needs to be considered: if table C has many columns, this could take a lot of memory. If conditions permit, you could move it out to a cache server.
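For illustration only, a minimal sketch of this startup-cache idea, assuming Spring's JdbcTemplate and a C bean with a getId() getter; the CTableCache class name and reload() method are made up here, not part of the original reply:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;

public class CTableCache {
    private final JdbcTemplate jdbcTemplate;
    // id -> all C rows with that id (C is the bean assumed throughout this thread)
    private volatile Map<Integer, List<C>> cache = new HashMap<Integer, List<C>>();

    public CTableCache(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
        reload(); // load once when the service starts
    }

    // Re-read the whole C table and rebuild the map; call again when C changes.
    public void reload() {
        List<C> rows = jdbcTemplate.query("select * from C",
                new BeanPropertyRowMapper<C>(C.class));
        Map<Integer, List<C>> fresh = new HashMap<Integer, List<C>>();
        for (C c : rows) {
            List<C> list = fresh.get(c.getId());
            if (list == null) {
                list = new ArrayList<C>();
                fresh.put(c.getId(), list);
            }
            list.add(c);
        }
        cache = fresh; // swap in one step so readers never see a half-built map
    }

    // Lookup used while traversing the 4000 ids from table B.
    public List<C> getById(Integer id) {
        return cache.get(id);
    }
}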
------ Solution --------------------------------------------
Doesn't SimpleJdbcTemplate have a queryForObject(String sql, RowMapper rowMapper, Object... args) method? It returns an object through the rowMapper's mapRow method. So, OP, as long as the rowMapper you pass in is a BeanPropertyRowMapper, the bean properties will be mapped to the database fields automatically.

final Map<Integer,List<C>> map = new HashMap<Integer,List<C>>();
simpleJdbcTemplate.queryForObject(sql, new BeanPropertyRowMapper(C.class){
   // Override mapRow here and put each C object obtained into the map.
   public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
     // The original method already returns the C object.
     C c = (C) super.mapRow(rs, rowNum);
     // Now just put this C object into the map and return the map.
     Integer id = c.getId();
     List<C> list = map.get(id);
     if (list == null) {
       list = new ArrayList<C>();
       map.put(id, list);
     }
     list.add(c);
     return map;
   }

}, args);
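One caveat with the above: Spring's queryForObject variants expect exactly one result row. A more conventional way to build the same map in a single query is a ResultSetExtractor on the classic JdbcTemplate (reachable from SimpleJdbcTemplate, e.g. via getJdbcOperations()). This is only a sketch; the class and method names here are illustrative, not from the original reply:

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcOperations;
import org.springframework.jdbc.core.ResultSetExtractor;

public class CGroupingDao {

    // Runs one full-table query and groups the C rows by id.
    public Map<Integer, List<C>> loadGroupedById(JdbcOperations jdbc) {
        return jdbc.query("select * from C",
                new ResultSetExtractor<Map<Integer, List<C>>>() {
                    public Map<Integer, List<C>> extractData(ResultSet rs) throws SQLException {
                        BeanPropertyRowMapper<C> rowMapper = new BeanPropertyRowMapper<C>(C.class);
                        Map<Integer, List<C>> map = new HashMap<Integer, List<C>>();
                        int rowNum = 0;
                        while (rs.next()) {
                            // Reuse the automatic bean mapping for each row.
                            C c = rowMapper.mapRow(rs, rowNum++);
                            List<C> list = map.get(c.getId());
                            if (list == null) {
                                list = new ArrayList<C>();
                                map.put(c.getId(), list);
                            }
                            list.add(c);
                        }
                        return map;
                    }
                });
    }
}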

------ Solution --------------------------------------------
Has the OP tried the multi-table join suggested on the 4th and 5th floors? With indexes in place, performance shouldn't be bad at all.

Select * 
From c Join b On c.id = b.id 
Where b.state=1

Here b.state needs an index, and c.id also needs an index (though presumably that column is the primary key and already indexed).

If you still find the join's performance lacking, in an Oracle environment you can create table C as a clustered hash table (hash cluster).
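For what it's worth, a minimal sketch of running that join from Java with the same BeanPropertyRowMapper mentioned above; the C bean and the state column name are assumptions carried over from this reply, not confirmed details:

import java.util.List;

import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;

public class JoinQueryExample {

    // One round trip: the database does the matching instead of 4000 per-id queries.
    public List<C> loadMatchingC(JdbcTemplate jdbcTemplate) {
        return jdbcTemplate.query(
                "select c.* from C c join B b on c.id = b.id where b.state = 1",
                new BeanPropertyRowMapper<C>(C.class));
    }
}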
------ Solution --------------------------------------------
I read the posts above in full.
select * from C where id in (1,2,3, ... 4000);
and
select C.* from C where exists (select 1 from B where B.status = 1 and C.id = B.id);

Of these, the query itself probably doesn't cost that much time.
The key issue is that the result may well be more than 4,000 records: if a single id corresponds to several rows in C, and
if this table has many columns, then reading out the data is bound to take a lot of time.
Since table C's data can be cached (there are no write operations, correct?), and id is an integer, you might as well cache it in a TreeSet<...>.
That is a sorted tree, and fetching the value for a given id is also very fast.
------ Solution --------------------------------------------
Learning along with you, friends ~

Yes, indexes and primary keys do affect the efficiency of insert operations; some projects abandon indexes because inserts and deletes are extremely frequent.

But without primary keys and indexes, you waste a major feature of the database.
------ For reference only ---------------------------------------
To correct myself, it should be
select * from C where id in (1,2,3, ... 4000);
------ For reference only ---------------------------------------
Still hoping for advice ~
------ For reference only ---------------------------------------
Anyone there?!! ~~~~
------ For reference only ---------------------------------------
Correct; an IN list of 4,000 or so values is nothing for this query.

Also, you don't even need that. Just do

select xxx from c join b on c.id = b.id where b.state = 1

and build the indexes as mentioned above.
------ For reference only ---------------------------------------
Change your SQL statement to this:
select C.* from C where exists (select 1 from B where C.id = B.id)
------ For reference only ---------------------------------------
If you need to check the status field in table B, add it to the condition:
select C.* from C where exists (select 1 from B where B.status = 1 and C.id = B.id);
------ For reference only ---------------------------------------
Just the two SQL statements above should do it; why was the problem described in such a complicated way?
------ For reference only ---------------------------------------
Let me explain:

This module's job is to build an index, and that index has to be rebuilt at fixed intervals. The speed of index building strongly affects the user experience, so we have to optimize it.

Right now I run a single select * from C; then build a HashMap keyed by id and use it for the lookups; performance improved by 50%. However, I feel that reinventing the wheel like this isn't great, so I'm asking whether there is a more stable, robust, and efficient way to optimize this module.

Thank you very much.
------ For reference only ---------------------------------------


I am using this method! Performance improved by 50%. But it still takes too long, about 100s. I want to get it down to roughly 30s. We are now distributed across three server rooms: originally the one local room took 3+ minutes to finish this module, and the other two took 10+ minutes because of the network. Now it's roughly one and a half minutes and 5+ minutes; the targets are 30s and 3 minutes. So if you have any special tricks, please show them off!!!

------ For reference only ---------------------------------------
Querying big data is a real headache.
------ For reference only ---------------------------------------
exists is more efficient
------ For reference only ---------------------------------------
Awesome
------ For reference only ---------------------------------------
How can 500K rows possibly take four minutes? You need to build indexes and optimize the SQL statement.
------ For reference only ---------------------------------------

Here you are clearly putting a strong limiting condition into the subquery, which is more efficient than IN.
------ For reference only ---------------------------------------
Hoping for advice; I wouldn't know how to do this myself.
------ For reference only ---------------------------------------
Build the indexes and a multi-table join query will do fine.
------ For reference only ---------------------------------------

------ For reference only ---------------------------------------
Your table structure is obviously the problem here; this isn't even millions of rows, it shouldn't behave like this.

Post the table structure and let the experts analyze it for you.
------ For reference only ---------------------------------------


Uh. That I really don't know... My database fundamentals aren't good enough; I'll have to go study up ~
------ For reference only ---------------------------------------
Good ........ worth a try
------ For reference only ---------------------------------------

Isn't TreeSet internally a red-black tree or some balanced tree? I'm currently using a HashMap<Integer, List<C>> structure. But I'll give it a try. Thanks :)
------ For reference only ---------------------------------------

Isn't TreeSet internally a red-black tree or some balanced tree? I'm currently using a HashMap<Integer, List<C>> structure. But I'll give it a try. Thanks in advance :)   I typed that wrong,

it should be TreeMap<....>, please note the correction.
------ For reference only ---------------------------------------
Personally I feel the effort should go into indexing; a data volume like this can hardly be called large.

select <columns> from C inner join B on C.id = B.id and B.status = 1
First, the join column ID must be indexed.

Second, you are probably not selecting every column, so there may be further room to optimize with a composite index or an INCLUDE index (in SQL Server terms).

I happen to have a query very similar to yours:

80K rows in the master table (the equivalent of your table B) and 500K rows in the child table (the equivalent of your table C); efficiency is excellent, and even paged queries have no performance problems at all.
------ For reference only ---------------------------------------


create table A 
(
 id int identity(1,1),
 Status nvarchar(10),
 Column3 varchar(50),
 Column4 varchar(50)
)
insert into A values('1',NEWID(),NEWID())
go 5000

insert into A values('2',NEWID(),NEWID())
go 5000

insert into A values('3',NEWID(),NEWID())
go 5000

insert into A values('4',NEWID(),NEWID())
go 5000

insert into A values('5',NEWID(),NEWID())
go 5000

insert into A values('6',NEWID(),NEWID())
go 5000

create table B1
(
 ID int ,
 Column2 varchar(50),
 Column3 varchar(50),
 Column4 varchar(50),
 Column5 varchar(50),
 Column6 varchar(50)
)

insert into B1 values (cast(RAND()*100000 as int),NEWID(),NEWID(),NEWID(),NEWID(),NEWID())
go 500000

select RAND()*100000


create index index_id on A(id)


create index index_id on B1(id)

--Tested on my own machine, a junky practice laptop
--Table A has 30K rows and table B1 has 500K, simulating your query
--With B1.ID generated randomly, 20K+ matching rows came back in under a second
--So I still suggest handling this at the database level instead of pulling it out into memory
select * from B1 inner join A on B1.ID=A.ID and A.Status='1'


------ For reference only ---------------------------------------
A direct LEFT JOIN query is fine; a mere 500K rows is no problem.

SELECT B.*,C.* FROM B LEFT JOIN C ON C.BID=B.ID WHERE B.STATE=1;

If it's slow, you can consider creating an index on BID in table C. Once built, the query should finish in about 1s.
My MySQL returns related-data queries over 3 million rows within 1s (with partitioning, each partition around 500K rows).

------ For reference only ---------------------------------------
Aren't you pulling out all of the data? 4000 * 500,000 is not a small number: reading it takes time, and loading that data into memory also takes time.


Please clarify whether you mean the SQL statement's execution time or your program's read-out time.

------ For reference only ---------------------------------------
  This reply was deleted by the administrator at 2013-12-09 09:14:10

------ For reference only ---------------------------------------
First, once the values in an IN list go much beyond 7 or so, performance declines geometrically. Beyond that, it is best to operate on the primary key, using a LEFT join or an EXISTS keyword for the association, and to write every condition you can into the SQL itself rather than pulling data out and fetching it with another query or two; the extra connections are the biggest cost.
------ For reference only ---------------------------------------
Good stuff. Learning. Nice one, OP.
------ For reference only ---------------------------------------
Noting this down. Good stuff.
------ For reference only ---------------------------------------
Good stuff, I'll use it next time.
------ For reference only ---------------------------------------
Good stuff, learning.
------ For reference only ---------------------------------------

  
  I am using this method! Performance improved by 50%. But it still takes too long, about 100s. I want to get it down to roughly 30s. We are now distributed across three server rooms: originally the one local room took 3+ minutes to finish this module, and the other two took 10+ minutes because of the network. Now it's roughly one and a half minutes and 5+ minutes; the targets are 30s and 3 minutes. So if you have any special tricks, please show them off!!!

Since you have the resources for a distributed setup, shouldn't you consider a cache server? With redis, cache the queried data on each cache server, and set an expiry time based on how each table is used. What do you think of this plan?
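A rough sketch of that cache-with-expiry idea using the Jedis client; the host, key format, expiry value and the CDao helper with its loadRowsAsJson method are all hypothetical placeholders, not something from this thread:

import redis.clients.jedis.Jedis;

public class CRowCache {
    private final Jedis jedis = new Jedis("cache-server-host", 6379); // placeholder host
    private static final int C_TABLE_TTL_SECONDS = 10 * 60;           // expiry chosen per table usage

    // Look in redis first; fall back to the database and cache the result with a TTL.
    public String getCRowsJson(Integer id, CDao dao) {
        String key = "C:" + id;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;
        }
        String fresh = dao.loadRowsAsJson(id);          // hypothetical DB lookup helper
        jedis.setex(key, C_TABLE_TTL_SECONDS, fresh);   // store with expiry time
        return fresh;
    }
}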
------ For reference only ---------------------------------------
nice
------ For reference only ---------------------------------------
That could work too; haha
------ For reference only ---------------------------------------

    
  I am using this method! Performance improved by 50%. But it still takes too long, about 100s. I want to get it down to roughly 30s. We are now distributed across three server rooms: originally the one local room took 3+ minutes to finish this module, and the other two took 10+ minutes because of the network. Now it's roughly one and a half minutes and 5+ minutes; the targets are 30s and 3 minutes. So if you have any special tricks, please show them off!!!

  Since you have the resources for a distributed setup, shouldn't you consider a cache server? With redis, cache the queried data on each cache server, and set an expiry time based on how each table is used. What do you think of this plan?
You really are an expert; yes, we have several cache servers running redis. Our main tables and business logic are fairly complex. Under my mentor's guidance I optimized the structure of the database queries and the in-memory handling of the returned data, and building the index once now takes about 20s.

As for further optimization: this is part of our project and the index has to be built, so right now it runs as a scheduled task; in the future the business will need the index built in real time. A long way to go.. (I won't say more, mainly because I'm still too inexperienced..)

------ For reference only ---------------------------------------
Such professional database skills put me to shame, ah.
------ For reference only ---------------------------------------
I want to learn so I don't get crushed later; seeking expert guidance!
------ For reference only ---------------------------------------
Hoping for advice ~
------ For reference only ---------------------------------------
As long as you don't have too many of them, indexes don't hurt performance much, and you should make good use of the database: these algorithms have already been written by others. For small data volumes use a nested-loop join, and for large data volumes use a hash join; that will be much faster than any wheel you build yourself.
------ For reference only ---------------------------------------
SELECT B.COLUMN1, B.COLUMN2 .... B.COLUMNX, B.ID, TEMP.TEMP1, TEMP.TEMP2 ... TEMP.TEMPN FROM (SELECT C.COLUMN1 AS TEMP1, C.COLUMN2 AS TEMP2 ... C.COLUMNX AS TEMPN, C.ID AS TEMPID FROM C) AS TEMP, B WHERE TEMP.TEMPID = B.ID
------ For reference only ---------------------------------------
When I run into an efficiency problem I tend to think in terms of trading space for time. Indexing and database configuration tuning essentially take this approach.

If table C's primary key is auto-incrementing, then matching on id is very fast.

My idea is not to relate table B and table C directly. Use a tmp table that has only an id field and stores the roughly 4,000 ids from table B whose status is valid, then run the query joining the tmp table with table C. The comparison is then 4,000 x 500,000, but since it is a primary-key match the actual work is not large and the speed is very fast. If the data volume really does hurt the response time, table B can supply 200 ids at a time and you loop 20 times.
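A small sketch of that "200 ids per round, 20 rounds" variant, assuming Spring's NamedParameterJdbcTemplate, which expands a List bound to :ids into an IN (...) list; the batch size, class name and C bean are illustrative assumptions:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

public class BatchedLookup {
    private static final int BATCH_SIZE = 200; // 4000 ids -> 20 queries

    public List<C> loadCForIds(NamedParameterJdbcTemplate jdbc, List<Integer> idsFromB) {
        List<C> all = new ArrayList<C>();
        for (int from = 0; from < idsFromB.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, idsFromB.size());
            Map<String, Object> params = new HashMap<String, Object>();
            params.put("ids", idsFromB.subList(from, to));
            // One IN query per batch instead of one query per id.
            all.addAll(jdbc.query("select * from C where id in (:ids)",
                    params, new BeanPropertyRowMapper<C>(C.class)));
        }
        return all;
    }
}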
------ For reference only ---------------------------------------
Write a few different SQL statements, test them, and use the fastest.
------ For reference only ---------------------------------------
Write a few different SQL statements, test them, and use the fastest.
------ For reference only ---------------------------------------
Remember to try the 5th floor's suggestion, with the indexes added.
------ For reference only ---------------------------------------
  This reply was deleted by the administrator at 2013-12-13 09:26:51

------ For reference only ---------------------------------------
Doesn't that get painful when debugging?
------ For reference only ---------------------------------------
First run EXPLAIN to see whether the index is actually being used. Also, why do you need to build the index in real time? You should think through beforehand which indexes to build, decide what should and should not be built, and write that into an authoritative plan. The point of "rebuilding at fixed intervals" is to keep the index from being bypassed.
If you are afraid that too many indexes will hurt insert speed, you can set the database up with read-write splitting.
------ For reference only ---------------------------------------
Really????
