Tuesday, February 28, 2017

[ MongoDB FAQ ] How to stop insertion of Duplicate documents in a mongodb collection

Source From Here 
Question 
Suppose we have a MongoDB collection containing three documents:
db.collection.find() 
{ _id:'...', user: 'A', title: 'Physics', Bank: 'Bank_A' }
{ _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }

We also have this document:
doc = { user: 'B', title: 'Chemistry', Bank:'Bank_A' }

If we use
db.collection.insert(doc)
this duplicate document gets inserted into the collection:
{ _id:'...', user: 'A', title: 'Physics', Bank: 'Bank_A' }
{ _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }

{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }

How can this duplicate be prevented? On which field(s) should an index be created, or is there another approach?

How-To 
Don't use insert. 

Use update with upsert: true. The update method looks for a document that matches your query and modifies the fields you specify; if you also pass upsert: true, it inserts a new document when no document matches the query. The general form is:
db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)
So, for your example, you could use something like this: 
db.collection.update(doc, doc, {upsert:true})
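To make the upsert semantics concrete without a running database, here is a minimal in-memory sketch in Java. It is only an analogy, not the MongoDB driver API: the class `InMemoryCollection` and its `upsert`/`insert` methods are hypothetical, documents are modeled as `Map<String, String>`, and "matching" is simplified to "every field in the query has the same value in the document". It uses `Map.of`, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a collection; NOT the MongoDB API.
public class InMemoryCollection {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // True if every field in the query has the same value in the document.
    private static boolean matches(Map<String, String> doc, Map<String, String> query) {
        for (Map.Entry<String, String> e : query.entrySet()) {
            if (!e.getValue().equals(doc.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Upsert: replace the first matching document, or insert if nothing matches.
    public void upsert(Map<String, String> query, Map<String, String> replacement) {
        for (int i = 0; i < docs.size(); i++) {
            if (matches(docs.get(i), query)) {
                docs.set(i, new HashMap<>(replacement));
                return;
            }
        }
        docs.add(new HashMap<>(replacement));
    }

    // Plain insert: always appends, so duplicates are possible.
    public void insert(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    public int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        InMemoryCollection coll = new InMemoryCollection();
        coll.insert(Map.of("user", "A", "title", "Physics", "Bank", "Bank_A"));
        coll.insert(Map.of("user", "A", "title", "Chemistry", "Bank", "Bank_B"));
        coll.insert(Map.of("user", "B", "title", "Chemistry", "Bank", "Bank_A"));

        Map<String, String> doc = Map.of("user", "B", "title", "Chemistry", "Bank", "Bank_A");
        coll.upsert(doc, doc);              // matches the third document, so nothing new is added
        System.out.println(coll.size());    // prints 3
    }
}
```

The sketch is single-threaded. In MongoDB itself, two concurrent upserts with the same query can still both insert unless the queried fields are covered by a unique index, so the upsert approach is usually paired with one.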
Supplement 
[ MongoDB Docs ] Getting Started - Update Data with Java Driver

[ Java FAQ ] Java Synchronized list

Source From Here 
Question: 
I have a pre-populated ArrayList and multiple threads that remove elements from it. Each thread calls the remove method below and removes one item from the list. Does the following code give me consistent behavior?
List<String> list = Collections.synchronizedList(new ArrayList<String>());

void remove(String item)
{
     // do something (doesn't work on the list)
     list.remove(item);
}
How-To 
Check the sample code below, which does not use any synchronization:
package demo;

import java.util.ArrayList;
import java.util.List;

public class MultiThreads {
    static class RThd implements Runnable{
        List<Integer> list = null;
        int           rc = 0;   // number of remove calls made by this thread

        public RThd(List<Integer> list)
        {
            this.list = list;
        }

        @Override
        public void run() {
            while(!list.isEmpty())
            {
                System.out.printf("\t[%s] Remove %d\n", Thread.currentThread().getName(), list.remove(0));
                rc++;
            }
        }

    }

    public static void main(String[] args) throws Exception{
        List<Integer> list = new ArrayList<Integer>();   // plain ArrayList: NOT thread-safe
        for(int i=0; i<100000; i++) list.add(i);
        List<RThd> thdList = new ArrayList<RThd>();
        ThreadGroup tg1 = new ThreadGroup("Group A");
        for(int i=0; i<10; i++)
        {
            RThd r = new RThd(list);
            thdList.add(r);
            new Thread(tg1, r).start();
        }

        // wait until all worker threads in the group have finished
        while(tg1.activeCount()>0)
        {
            Thread.sleep(500);
        }
        int rcc = 0;
        for(RThd r:thdList) rcc+=r.rc;
        System.out.printf("\t[Info] Done (%d)!\n", rcc);
    }
}
When you run it, you will probably see output showing that the removals go wrong: some threads "remove" null, and the reported total (100113) exceeds the 100000 elements that were in the list:
...
[Thread-7] Remove null
[Thread-7] Remove null

[Thread-3] Remove 91578
[Thread-9] Remove 91577
[Thread-5] Remove 91576
[Thread-1] Remove 91575
[Thread-8] Remove 91685
[Thread-4] Remove 91644
[Thread-6] Remove 91643
[Thread-0] Remove 91642
[Thread-2] Remove 91641
[Info] Done (100113)!

You can resolve this issue with the change below, which wraps the list so that every individual method call on it is synchronized:
List<Integer> list = new ArrayList<Integer>();
list = Collections.synchronizedList(list);
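Note that the wrapper only makes each single call atomic. In RThd.run() the isEmpty() check and the following remove(0) are two separate calls, so another thread can take the last element in between and remove(0) may then throw IndexOutOfBoundsException. A minimal sketch of one way to make the check-then-remove step atomic, assuming the same setup as MultiThreads (the class name SafeDrain and method drain are hypothetical): hold the synchronized list's own lock, the same lock the Javadoc iteration idiom uses, around both calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeDrain {
    // Removes elements from the front until the list is empty and
    // returns how many elements this call removed.
    static int drain(List<Integer> list) {
        int removed = 0;
        while (true) {
            // Lock on the synchronized wrapper so isEmpty() and remove(0)
            // happen as one atomic step with respect to other threads.
            synchronized (list) {
                if (list.isEmpty()) {
                    return removed;
                }
                list.remove(0);
            }
            removed++;   // any per-item work can run outside the lock
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        for (int i = 0; i < 100000; i++) list.add(i);

        AtomicInteger total = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Thread t = new Thread(() -> total.addAndGet(drain(list)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();   // wait for every worker to finish

        System.out.println(total.get());     // 100000: each element removed exactly once
    }
}
```

The sketch waits with Thread.join() instead of polling ThreadGroup.activeCount(), which the Javadoc describes as only an estimate; join() also guarantees that each worker's updates are visible to the main thread afterwards.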
Just be careful if you are also iterating over the list, because in this case you will need to synchronize on it. From the Javadoc: 
It is imperative that the user manually synchronize on the returned list when iterating over it:

List list = Collections.synchronizedList(new ArrayList());
    ...
synchronized (list) {
    Iterator i = list.iterator(); // Must be in synchronized block
    while (i.hasNext())
        foo(i.next());
}
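Alternatively, you can use CopyOnWriteArrayList, which copies its whole backing array on every write (so writes are expensive) but whose iterators work on a snapshot and therefore need no external synchronization.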
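A small single-threaded sketch of the snapshot behavior of CopyOnWriteArrayList: the iterator sees the array as it was when iteration started, so appending during the loop neither throws ConcurrentModificationException nor changes what the loop visits. The class name CowDemo and the helper iterateWhileAdding are illustrative only; List.of assumes Java 9 or later. Snapshot iteration does not make compound actions such as "check if empty, then remove" atomic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    // Iterates over the list while appending to it; returns the elements the loop saw.
    static List<Integer> iterateWhileAdding(List<Integer> list) {
        List<Integer> seen = new ArrayList<>();
        for (Integer x : list) {     // for CopyOnWriteArrayList: iterates a snapshot
            seen.add(x);
            list.add(x * 10);        // each add copies the array; the snapshot is unaffected
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        System.out.println(iterateWhileAdding(cow));   // [1, 2, 3]
        System.out.println(cow);                       // [1, 2, 3, 10, 20, 30]
        // The same loop over a plain ArrayList throws ConcurrentModificationException.
    }
}
```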

Supplement 
ThreadGroup in Java

[ MongoDB FAQ ] Find all objects in collection Java Mongodb3

Source From Here
Question
The code below inserts a document and then retrieves the first document in the collection with findOne():
package database;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;

// based on http://mongodb.github.io/mongo-java-driver/2.13/getting-started/quick-tour/

public class Mongo {

    public void getCon() {
        MongoClient mongoClient;
        try {
            mongoClient = new MongoClient("localhost", 27017);
            DB db = mongoClient.getDB("mydb");
            DBCollection coll = db.getCollection("testCollection");

            BasicDBObject doc = new BasicDBObject("name", "MongoDB")
                    .append("type", "database")
                    .append("count", 1)
                    .append("info",
                            new BasicDBObject("x", 203).append("y", 102));
            coll.insert(doc);

            DBObject first = coll.findOne();   // returns only the first document
            System.out.println(first);
        } catch (UnknownHostException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
}
There does not appear to be a findAll method. How can I find all the documents in the collection testCollection?

How-To
You have to use the DBCollection.find() method, which:
Select all documents in collection and get a cursor to the selected documents.

So, what you have to do is:
DBCursor cursor = coll.find();
try {
    while (cursor.hasNext()) {
        DBObject obj = cursor.next();
        // do your thing
    }
} finally {
    cursor.close();   // release the cursor when done
}

