mongodb-user

To: mongodb-user@xxxxxxxxxxxxxxxx
Subject: Re: [mongodb-user] Re: Error reading compressed BSON with Apache Spark
From: Rafael Aguiar <rafael@xxxxxxx>
Date: Mon, 1 Feb 2016 13:05:14 -0300

I tested it and it works. Thanks again, Luke!

On Wed, Jan 27, 2016 at 4:18 PM, Luke Lovett <luke.lovett@xxxxxxxxx> wrote:

> I just resolved HADOOP-253; this should be fixed now in the master branch.
>
>
> On Wednesday, January 27, 2016 at 9:29:14 AM UTC-8, Luke Lovett wrote:
>>
>> I think it has to do with the code path that the compressed BSON takes.
>> The way the FileSystem is being retrieved is ignoring the scheme in the
>> URI. The fix for this issue is already in the works. By the time that 1.5
>> comes out this will no longer be a problem.
>>
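Luke's diagnosis can be pictured with a small toy model. This is not the connector's or Hadoop's actual code: `FakeFileSystem`, `default_filesystem`, and `filesystem_for_path` are hypothetical names that only mimic the behaviour described in the thread, where a filesystem picked from the configured default rejects a path with a different scheme, while one picked from the path's own URI accepts it.

```python
from urllib.parse import urlparse


class FakeFileSystem:
    """Hypothetical stand-in for a filesystem bound to one URI root."""

    def __init__(self, root_uri):
        self.root_uri = root_uri
        parsed = urlparse(root_uri)
        self.scheme, self.authority = parsed.scheme, parsed.netloc

    def check_path(self, path):
        # Reject any path whose scheme or authority differs from this filesystem's.
        parsed = urlparse(path)
        if (parsed.scheme, parsed.netloc) != (self.scheme, self.authority):
            raise ValueError("Wrong FS: %s, expected: %s" % (path, self.root_uri))
        return path


def default_filesystem(conf):
    # Looks only at the configured default and ignores the path's scheme.
    return FakeFileSystem(conf["fs.defaultFS"])


def filesystem_for_path(path, conf):
    # Honors the scheme in the URI; falls back to the default for bare paths.
    parsed = urlparse(path)
    if not parsed.scheme:
        return default_filesystem(conf)
    return FakeFileSystem("%s://%s" % (parsed.scheme, parsed.netloc))


conf = {"fs.defaultFS": "hdfs://10.0.2.139:9000"}
path = "s3n://my-bucket/compressed_bson.gz"

# Default-only lookup: the s3n path is rejected by the HDFS filesystem.
try:
    default_filesystem(conf).check_path(path)
    message = None
except ValueError as err:
    message = str(err)

# Scheme-aware lookup: the same path is accepted.
resolved = filesystem_for_path(path, conf).check_path(path)
```

In this sketch the first lookup reproduces the shape of the "Wrong FS" message from the original report, and the second shows why resolving the filesystem from the path itself avoids it.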
>> On Wednesday, January 27, 2016 at 7:16:35 AM UTC-8, Rafael Aguiar wrote:
>>>
>>> Luke,
>>>
>>> I can read a regular BSON file from S3; it's only when I try the compressed
>>> ones that I see that error.
>>>
>>> I'll try your suggestion, though.
>>>
>>> On Tuesday, January 26, 2016 at 17:02:41 UTC-3, Luke Lovett
>>> wrote:
>>>>
>>>> It doesn't look like the problem is with the fact that the BSON is
>>>> compressed, but the fact that Hadoop has not been configured to use the s3
>>>> filesystem (it's expecting HDFS, apparently). That the connector doesn't
>>>> pick up on the fact that "s3n://" means s3 is the connector's problem (I
>>>> just filed HADOOP-253), but you can work around the problem by configuring
>>>> Hadoop to use s3 (and only s3) by setting "fs.default.name" and
>>>> "fs.defaultFS".
>>>>
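A minimal sketch of the workaround Luke describes, for releases that predate the HADOOP-253 fix. The thread does not say what value to use for the two keys, so pointing them at the bucket root derived from the input URI is an assumption, as is passing them through the job's `conf` dict rather than the cluster's Hadoop configuration. `bson_input_conf` is a hypothetical helper, and S3 credentials are not covered here.

```python
from urllib.parse import urlparse


def bson_input_conf(input_path):
    """Build a conf dict whose default filesystem matches the input URI.

    Hypothetical helper: the value chosen for the default filesystem
    (scheme plus bucket) is an assumption, not something the thread states.
    """
    parsed = urlparse(input_path)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(
            "expected a full URI such as s3n://bucket/key, got %r" % input_path
        )
    default_fs = "%s://%s" % (parsed.scheme, parsed.netloc)
    return {
        "mapred.input.dir": input_path,
        "fs.defaultFS": default_fs,
        "fs.default.name": default_fs,  # older name for the same setting
    }


conf = bson_input_conf("s3n://my-bucket/compressed_bson.gz")

# With a live SparkContext `sc` and the mongo-hadoop jar on the classpath,
# the dict would be passed to the same call used in the original report:
#
# rdd = sc.newAPIHadoopRDD(
#     inputFormatClass='com.mongodb.hadoop.BSONFileInputFormat',
#     keyClass='org.apache.hadoop.io.Text',
#     valueClass='org.apache.hadoop.io.MapWritable',
#     conf=conf,
# )
```

Because this makes S3 the only default filesystem for the job, as Luke notes, any other step that relies on HDFS being the default would need its paths spelled out in full.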
>>>> On Monday, January 25, 2016 at 10:32:47 PM UTC-8, Rafael Aguiar wrote:
>>>>>
>>>>> I'm using PySpark (on Spark 1.3.1) along with the mongo-hadoop jar,
>>>>> both built from the master branch.
>>>>>
>>>>> rdd = sc.newAPIHadoopRDD(
>>>>>  inputFormatClass='com.mongodb.hadoop.BSONFileInputFormat',
>>>>>  keyClass='org.apache.hadoop.io.Text',
>>>>>  valueClass='org.apache.hadoop.io.MapWritable',
>>>>>  conf={
>>>>>     'mapred.input.dir': 's3n://my-bucket/compressed_bson.gz'
>>>>>  }
>>>>> )
>>>>>
>>>>>
>>>>> When I try to create the RDD above I get the following error:
>>>>>
>>>>> INFO hadoop.BSONFileInputFormat: File s3n://my-bucket/compressed_bson.gz is compressed so cannot be split.
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 6, in <module>
>>>>>   File "/home/hadoop/spark/python/pyspark/context.py", line 547, in newAPIHadoopRDD
>>>>>     jconf, batchSize)
>>>>>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>>>>   File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
>>>>> : java.lang.IllegalArgumentException: Wrong FS: s3n://my-bucket/compressed_bson.gz, expected: hdfs://10.0.2.139:9000
>>>>>
>>>>> Has anyone faced something similar?
>>>>>
>>>> --
> You received this message because you are subscribed to the Google Groups
> "mongodb-user"
> group.
>
> For other MongoDB technical support options, see:
> http://www.mongodb.org/about/support/.
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "mongodb-user" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/mongodb-user/2jcrxOdRuFo/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> mongodb-user+unsubscribe@xxxxxxxxxxxxxxxx.
> To post to this group, send email to mongodb-user@xxxxxxxxxxxxxxxx.
> Visit this group at https://groups.google.com/group/mongodb-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mongodb-user/9b9e529c-9d0d-4d07-835c-1584124b80eb%40googlegroups.com
> <https://groups.google.com/d/msgid/mongodb-user/9b9e529c-9d0d-4d07-835c-1584124b80eb%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Rafael Aguiar
Data Science Engineer

Mobile: +55 81 99730.0415
Skype: rafael_aguiar_
Office: +55 81 3127.0881
Website: inlocomedia.com <http://www.inlocomedia.com/>
