Google is collecting tiles !  

leifs
Member
 
Posts: 612
Joined: Sun Sep 06, 2009 12:08 pm
Location: Ørsta Norway

Google is collecting tiles !

by leifs » Thu Dec 26, 2013 11:00 am

In the stats from my web host I see (for the first time) that Google is systematically collecting tiles from my Panotour panos.
Are they decoding the XML files? Can they possibly put the tiles together?
Do I have to protect the panos from Google?
Anybody?

leifs
Attachments
google.jpg

HansKeesom
Member
 
Posts: 2168
Joined: Mon Jul 19, 2010 8:53 pm

Re: Google is collecting tiles !

by HansKeesom » Thu Dec 26, 2013 3:49 pm

They collect them because they found a link to them. That is web crawling, as most search engines do it. Nothing out of the ordinary.
If you want to stop it, you can place a robots.txt file in the top directory of your web server.
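
As a rough illustration of that suggestion, the sketch below uses Python's standard `urllib.robotparser` to check what a given robots.txt would permit. The file contents and the `/panotour/` prefix are assumptions made for this example, not the actual configuration of the site discussed here. Keep in mind that robots.txt is only a request that well-behaved crawlers choose to honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Google's image crawler to stay out of the
# tour folders while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /panotour/

User-agent: *
Disallow:
"""

# Example tile path, taken from the log line quoted later in this thread.
TILE = "/panotour/lokeberget/loekebergetdata/lokeberget_17_11_201_23/3/4/11_0.jpg"


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed("Googlebot-Image", TILE))  # False: tiles are disallowed
    print(is_allowed("Googlebot", TILE))        # True: falls under "*"
```

An empty `Disallow:` line means "nothing is disallowed", which is why the catch-all entry leaves ordinary indexing untouched.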

Re: Google is collecting tiles !

by leifs » Thu Dec 26, 2013 4:33 pm

There are no direct links to the tiles. They are JPEGs, yes, but the file names are generated from the XML files by the krpano viewer.
A robots.txt is for denying access to a directory. I don't mind the bots indexing my thumbnails etc., but I have never before seen them systematically grab the tiles.
For now, while I think it over, I have denied the two IP addresses (Google image bots) access to my site using .htaccess. Google and the others can index my site as before.

As seen below, quite a few robots visit the site. This is for December so far.

leifs
Attachments
spiders.jpg

Re: Google is collecting tiles !

by HansKeesom » Thu Dec 26, 2013 4:38 pm

So isn't the conclusion then that they are ignoring your robots.txt and getting the names from the XML files that are referred to in the HTML files?

Re: Google is collecting tiles !

by leifs » Thu Dec 26, 2013 5:02 pm

They are grabbing the XML files. But there are no explicit file names to grab there; the file names are generated from the XML. This is how the names of the thousands of JPEGs are given in virtualtour.XML:

<level tiledimagewidth="898" tiledimageheight="898">
<front url="froystadtua_sphere_64/0/0/%v_%u.jpg"/>
<right url="froystadtua_sphere_64/1/0/%v_%u.jpg"/>
<back url="froystadtua_sphere_64/2/0/%v_%u.jpg"/>
<left url="froystadtua_sphere_64/3/0/%v_%u.jpg"/>
<up url="froystadtua_sphere_64/4/0/%v_%u.jpg"/>
<down url="froystadtua_sphere_64/5/0/%v_%u.jpg"/>
</level>

%v and %u are counters that increase from zero to a maximum integer found in, or calculated from, somewhere else in the XML files.
To me it looks like Google Images has reverse-engineered the way krpano makes tiles and aims to download all of them, perhaps to put them together later and so recover the original image. They have cooperated with the NSA on other issues, so this is probably a piece of cake with that kind of resources.
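
To make the placeholder scheme concrete, here is a minimal sketch of how a client could turn one such `url` template into actual tile names. It assumes, as described above, that both counters start at zero; the function name and the `rows`/`cols` limits are hypothetical, since the real maxima come from elsewhere in the XML and are not shown in this thread.

```python
def expand_tiles(template: str, rows: int, cols: int) -> list[str]:
    """Expand a tile URL template containing %v (row) and %u (column).

    Assumes zero-based counters; rows and cols are illustrative limits.
    """
    return [
        template.replace("%v", str(v)).replace("%u", str(u))
        for v in range(rows)
        for u in range(cols)
    ]


if __name__ == "__main__":
    # Template for the front face, copied from the XML above.
    front = "froystadtua_sphere_64/0/0/%v_%u.jpg"
    for name in expand_tiles(front, rows=2, cols=2):
        print(name)
```

Any client that reads the XML and knows the limits can produce the full list this way, so the absence of literal file names in the XML does not by itself hide the tiles.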

leifs

Re: Google is collecting tiles !

by HansKeesom » Thu Dec 26, 2013 5:08 pm

The code you give is the code as your web server reads it from disk. The web server translates it into explicit code and writes that to the client.

Re: Google is collecting tiles !

by leifs » Thu Dec 26, 2013 5:26 pm

Sure? When I look at the source of the HTML I get from the server, there is no filename.jpg in it. Where is it written on the client, so that Google Images can extract JPEG file names from it?

leifs

Re: Google is collecting tiles !

by HansKeesom » Thu Dec 26, 2013 5:51 pm

In that case your client must interpret the (JavaScript) code into real names, otherwise it could not download them. The robot works like a client: it interprets the (JavaScript) code.

What I described earlier was server-side interpretation; this is client-side.

Re: Google is collecting tiles !

by leifs » Thu Dec 26, 2013 8:02 pm

HansKeesom wrote:The robot works like a client: it interprets the (JavaScript) code.
What I described earlier was server-side interpretation; this is client-side.


In Panotour the HTML loads virtualtour.js (the krpano HTML5 viewer), which then requests the appropriate JPEGs and displays them.
If Google is to get the JPEG names it has to sniff them in some way, because as far as I can see they are not in any file on the client.
If that is the case, Google has written special software for this purpose, and encrypting the XML files with the "krpano protect tool" will not help either.

If Googlebot-Image only opened the HTML, it would grab just the few level 1 JPEGs.
The bot is zooming too! It has grabbed level 4:
"GET /panotour/lokeberget/loekebergetdata/lokeberget_17_11_201_23/3/4/11_0.jpg HTTP/1.1"

I will keep a close eye on the logs for some time.
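
For watching the logs, something along these lines could tally tile requests per zoom level. It is only a sketch: it assumes the request part of each log line looks like the one quoted above, and that the path ends in `<face>/<level>/<v>_<u>.jpg`, which is inferred from the XML templates earlier in the thread rather than from krpano documentation. The function name is made up for this example.

```python
import re
from collections import Counter

# Matches the quoted request in an access-log line and pulls out the
# trailing .../<face>/<level>/<v>_<u>.jpg components (assumed layout).
TILE_RE = re.compile(
    r'"GET (?P<path>\S*/(?P<face>\d+)/(?P<level>\d+)/'
    r'(?P<v>\d+)_(?P<u>\d+)\.jpg) HTTP/[\d.]+"'
)


def tile_requests_per_level(log_lines):
    """Count tile requests per zoom level across an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = TILE_RE.search(line)
        if match:
            counts[int(match.group("level"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '"GET /panotour/lokeberget/loekebergetdata/'
        'lokeberget_17_11_201_23/3/4/11_0.jpg HTTP/1.1"',
        '"GET /index.html HTTP/1.1"',
    ]
    print(tile_requests_per_level(sample))
```

Requests at the deeper levels are the interesting ones, since a visitor that only opened the page would not be expected to fetch them.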

leifs

benji33
Kolor Team
 
Posts: 3051
Joined: Tue Apr 09, 2013 10:59 am
Location: France

Re: Google is collecting tiles !

by benji33 » Wed Jan 15, 2014 12:04 pm

The solution would be to encrypt the images, as krpano can do with its own algorithm. But that is unavailable for the moment.

Re: Google is collecting tiles !

by leifs » Wed Jan 15, 2014 12:45 pm

I posted the issue on the krpano forum and Klaus pointed me to the problem:
Directory listing was allowed (by mistake).
Google had done what it always does: traverse all available folders, including the tile folders.
I have turned listing off and guess that will keep Google at a distance for now.
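
To show why an open directory listing is enough, here is a small sketch of how any crawler could collect file names from an auto-generated index page without understanding krpano at all. The sample HTML and the `/tiles/` path are invented for the example; real index pages differ between servers. On Apache, which the .htaccess mentioned earlier suggests but which is an assumption here, listing is commonly switched off with `Options -Indexes`.

```python
from html.parser import HTMLParser


class ListingLinks(HTMLParser):
    """Collect relative href targets from a directory index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip sort links ("?..."), absolute paths and parent links.
            if name == "href" and value and not value.startswith(("?", "/", "../")):
                self.links.append(value)


def links_in_listing(html: str) -> list[str]:
    """Return the file and folder names linked from a listing page."""
    parser = ListingLinks()
    parser.feed(html)
    return parser.links


# Invented sample of what a server-generated index page might contain.
SAMPLE = """<html><head><title>Index of /tiles/3/4</title></head><body>
<a href="?C=N;O=D">Name</a>
<a href="/tiles/3/">Parent Directory</a>
<a href="11_0.jpg">11_0.jpg</a>
<a href="11_1.jpg">11_1.jpg</a>
</body></html>"""

if __name__ == "__main__":
    print(links_in_listing(SAMPLE))
```

Following such links folder by folder reaches every tile, which matches the behavior seen in the logs.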

leifs

John360
Member
 
Posts: 39
Joined: Thu Dec 12, 2013 3:53 pm

Re: Google is collecting tiles !

by John360 » Wed Jan 15, 2014 12:50 pm

To answer your question about the ability of Google to put it together:

Technically they could probably make a Photosynth out of it if they integrated krpano XML into Photosynth. In the future they may offer tools to post existing 360s to Photosynth. It would be a "nice" way to build up the volume of Photosynth content.

I do not think it's happening now though.

John

Re: Google is collecting tiles !

by leifs » Wed Jan 15, 2014 2:38 pm

John360 wrote:I do not think it's happening now though.
John


After the NSA issues, you never know!

leifs :)

